A Physics Professor Tested Whether AI Can Reliably Grade Exams
Professor Lyman Page compared his grades to ones generated by Google’s Gemini
As universities nationwide grapple with how artificial intelligence could reshape education, one Princeton physics professor is testing whether AI can reliably grade exams.
The experiment, conducted in the spring 2026 semester in PHY 104: General Physics II, taught by professor Lyman Page, compared the grades Page assigned on exams with grades generated by Google’s Gemini Pro AI model. The AI-generated scores did not affect students’ grades, which were determined manually, and students were allowed to opt out of the experiment.
“Grading exams is an enormous time sink, and there’s always some subjectivity in it,” Page said. He explained that with more than 230 exams to grade by hand, AI could make grading more efficient and consistent. The experiment was still in progress when this issue went to press.
Page added that physics exams are often hard to read, because students frequently start over, cross out work, or draw arrows across their solutions, so the AI model will need some training to interpret them. Ultimately, he said, it may not work.
“The goal of creating any problem is to figure out what someone knows, not necessarily if they have the exact right answer,” he said.
But if the technology performs well, he said, the class will probably start by using it on small assignments before considering it for other work.
Google’s terms of service say the enterprise version of Gemini Pro does not share uploaded content with outside organizations without permission. Before launching the experiment, the physics department worked with University administrators and the Office of Information Technology. Page said the system uses Princeton’s firewall protections, and no one will be able to scrape the exam from public AI platforms.
Beyond efficiency, Page said AI could help instructors give students more detailed feedback. Especially in large courses, he explained, there are not enough hours in the day to annotate every exam where a student missed a vector sign or pointed a direction the wrong way.
Arav Gupta ’29 said he initially thought that it was a joke when he received Page’s Canvas announcement that Gemini Pro would be used to grade his exam. Once he learned that the exams would be graded manually first, he felt more comfortable participating.
After receiving his official grade, he ran his exam through Gemini and found that it gave him a score roughly 10% lower than the one he received from the hand grader.
Gupta didn’t have an answer key, so his test was not a direct parallel to Page’s experiment, but Gemini did compare its own answers with his responses. “It wasn’t giving me a lot of partial credit the same way the hand grader would,” he said.
Andrew Addo ’29 said he was surprised by the department’s decision but understands the reasoning behind it. “I think I understand, going forward, the trend is that generative AI is going to be used in all aspects of life for efficiency,” he said. Still, he said he wished the department had been more transparent about how the experiment would work.
Princeton is adapting its broader policies surrounding AI. This spring, the University faculty voted to use proctors for in-person exams beginning in the fall semester, citing concerns about AI and electronic devices.
When informed of the decision, Page questioned whether human proctors were the most effective response. “If the purpose is to detect cheating, why not use a camera?” he asked.
Page sees his experiment as reflecting a larger shift in higher education.
AI is “part of all of our lives … . [We have to] figure out how to take advantage of it,” he said. “It’s a powerful tool. We can use it to our benefit.”
Still, he argued that while AI may become increasingly embedded in STEM education, it will not replace the human element of teaching.
“If you could watch a computer and learn everything you need … people would,” he said. “But they don’t, because it doesn’t work. … Humans are important for communicating to humans.”


