From bowl games to GPAs: Can algorithms improve grading?

By Brett Tomlinson

Gordon Scharf ’09’s search for a better grading system started at the dinner table of the Brown Hall Co-op. Engineering students were complaining that high grades were harder to earn in their classes, “but there was no good way of quantifying how much harder it was,” Scharf said.

The initial premise was not necessarily accurate — departments in the natural sciences award the fewest A’s — but the conversation pushed Scharf toward a new research topic. For his senior thesis, the operations research and financial engineering major decided to develop a solution.  

He found an interested adviser, Robert Vanderbei, who happens to be a member of the University’s grading committee; and a constructive critic, physics professor Daniel Marlow, who helped to test Scharf’s model with several different distributions of grades. Vanderbei, Scharf, and Marlow have collaborated on a working paper titled “Assessing Inequity in Grading,” which presents a statistical model to assess both grade inflation and student achievement.

Grading data, Scharf said, form a “landscape of pairwise comparisons,” with thousands of students choosing from hundreds of course options. It’s comparable to major college football, where more than 100 teams compete and each plays just 12 games in a season. A team’s record of wins and losses can mean different things, depending on the strength of its competition, so the Bowl Championship Series has developed algorithms to compare teams and select the top candidates for postseason bowl games.  
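To make the analogy concrete, here is a minimal, hypothetical sketch (not taken from the working paper) of how a rating can be recovered from incomplete pairwise comparisons, in the spirit of the BCS computer rankings: each game constrains the difference between two teams' ratings, and a least-squares fit reconciles all of those constraints at once. The team names and margins below are invented.

```python
# A minimal, hypothetical sketch (not from the working paper): Massey-style
# least-squares ratings from incomplete pairwise comparisons. Each game says
# rating[winner] - rating[loser] should be close to the winning margin.
import numpy as np

games = [                      # (winner, loser, point margin), invented results
    ("A", "B", 7),
    ("B", "C", 3),
    ("A", "D", 14),
    ("C", "D", 10),
]
teams = sorted({t for w, l, _ in games for t in (w, l)})
idx = {t: i for i, t in enumerate(teams)}

# One row per game: +1 for the winner, -1 for the loser, target = margin
X = np.zeros((len(games), len(teams)))
y = np.zeros(len(games))
for row, (w, l, margin) in enumerate(games):
    X[row, idx[w]], X[row, idx[l]] = 1.0, -1.0
    y[row] = margin

ratings, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
ratings -= ratings.mean()                        # ratings matter only up to a shift
print(dict(zip(teams, ratings.round(2))))
```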

Using a similar approach, Vanderbei, Scharf, and Marlow’s model compares grades to calculate an “inflatedness” measure for each course and an “aptitude” measure (or adjusted grade-point average) for each student. Run on actual Princeton grading data, masked for confidentiality, the model has accurately predicted individual grades that were removed from the data set.
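The article does not reproduce the working paper's formulation, but one natural reading of "aptitude" and "inflatedness" is an additive model in which each recorded grade is approximated by a student effect plus a course effect, fit by least squares; holding out a grade and predicting it is the kind of check described above. The sketch below uses invented grade records and is only an illustration of that idea, not the authors' code.

```python
# A hedged sketch, not the authors' model: each grade is approximated by
#   grade(student, course) ~ aptitude(student) + inflatedness(course),
# fit by least squares on invented records, then checked by predicting a
# grade that was held out of the fit.
import numpy as np

records = [          # (student, course, grade on a 4.0 scale), hypothetical
    ("s1", "MAT", 3.3), ("s1", "HIS", 3.9), ("s1", "PHY", 3.0),
    ("s2", "MAT", 3.7), ("s2", "HIS", 4.0), ("s2", "PHY", 3.3),
    ("s3", "MAT", 2.7), ("s3", "HIS", 3.7), ("s3", "PHY", 2.7),
]
held_out = records.pop()          # pretend this grade is unknown

students = sorted({r[0] for r in records})
courses = sorted({r[1] for r in records})
s_idx = {s: i for i, s in enumerate(students)}
c_idx = {c: len(students) + j for j, c in enumerate(courses)}

# One row per grade: grade = aptitude[student] + inflatedness[course] + error
X = np.zeros((len(records), len(students) + len(courses)))
y = np.zeros(len(records))
for row, (s, c, g) in enumerate(records):
    X[row, s_idx[s]] = 1.0
    X[row, c_idx[c]] = 1.0
    y[row] = g

params, *_ = np.linalg.lstsq(X, y, rcond=None)
aptitude = dict(zip(students, params[: len(students)]))
inflatedness = dict(zip(courses, params[len(students):]))

s, c, actual = held_out
predicted = aptitude[s] + inflatedness[c]
print(f"predicted {predicted:.2f} vs. actual {actual:.2f} for {s} in {c}")
```

In a model like this, a constant can be shifted between the student and course effects without changing the fit, so only differences in "inflatedness" across courses, and the predicted grades themselves, are meaningful in this sketch.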

Vanderbei said that the grading model could have several applications, from providing new data on which departments are the toughest and easiest graders to giving contextual data to the committees that select undergraduates for fellowships and honors.  

The model’s “inflatedness” measures for each class also could eliminate the need for grading guidelines like the ones employed at Princeton. Professors could grade however they chose, and the model would correct for those who give inflated marks (as well as those who grade too severely). The faculty probably would not go down that path, Vanderbei said, because “it gets to be a little Big Brother-ish.”
