Current Issue

Oct. 13, 2010

Vol. 111, No. 2


From bowl games to GPAs: Can algorithms improve grading?

By Brett Tomlinson
Published in the October 13, 2010, issue

Gordon Scharf ’09’s search for a better grading system started at the dinner table of the Brown Hall Co-op. Engineering students were complaining that high grades were harder to earn in their classes, “but there was no good way of quantifying how much harder it was,” Scharf said.

The initial premise was not necessarily accurate — departments in the natural sciences award the fewest A’s — but the conversation pushed Scharf toward a new research topic. For his senior thesis, the operations research and financial engineering major decided to develop a solution.

He found an interested adviser, Robert Vanderbei, who happens to be a member of the University’s grading committee; and a constructive critic, physics professor Daniel Marlow, who helped to test Scharf’s model with several different distributions of grades. Vanderbei, Scharf, and Marlow have collaborated on a working paper titled “Assessing Inequity in Grading,” which presents a statistical model to assess both grade inflation and student achievement.

Grading data, Scharf said, form a “landscape of pairwise comparisons,” with thousands of students choosing from hundreds of course options. It’s comparable to major college football, where more than 100 teams compete and each plays just 12 games in a season. A team’s record of wins and losses can mean different things, depending on the strength of its competition, so the Bowl Championship Series has developed algorithms to compare teams and select the top candidates for postseason bowl games.

Using a similar approach, Vanderbei, Scharf, and Marlow’s model compares grades to calculate an “inflatedness” measure for each course and an “aptitude” measure (or adjusted grade-point average) for each student. When it has been run using actual Princeton grading data — masked for confidentiality — the model has been able to accurately predict individual grades removed from the data set.
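The article doesn't give the model's equations. The sketch below shows one simple way to get both kinds of numbers from this sort of data. It assumes each grade is roughly a student's aptitude plus the course's inflatedness, and it fits that additive model by alternating least squares. This is an illustrative stand-in, not the method in "Assessing Inequity in Grading." The function names `fit_additive` and `predict`, the additive form, and the toy data are all hypothetical. The last few lines mimic the validation described above: one grade is held out, the model is fit on the rest, and the missing grade is predicted.

```python
# Hypothetical sketch: grade ≈ aptitude[student] + inflatedness[course],
# fit by alternating least squares. NOT claimed to be the working paper's method.
from collections import defaultdict


def fit_additive(grades, iters=200):
    """grades: list of (student, course, grade_points) triples.

    Returns (aptitude, inflatedness) dicts. Inflatedness is re-centered to
    mean 0 so that the split between student and course effects is unique.
    """
    aptitude = {s: 0.0 for s, _, _ in grades}
    inflatedness = {c: 0.0 for _, c, _ in grades}
    by_student = defaultdict(list)
    by_course = defaultdict(list)
    for s, c, g in grades:
        by_student[s].append((c, g))
        by_course[c].append((s, g))
    for _ in range(iters):
        # Best aptitude given current course effects: mean of (grade - inflatedness).
        for s, rows in by_student.items():
            aptitude[s] = sum(g - inflatedness[c] for c, g in rows) / len(rows)
        # Best inflatedness given current student effects: mean of (grade - aptitude).
        for c, rows in by_course.items():
            inflatedness[c] = sum(g - aptitude[s] for s, g in rows) / len(rows)
        # Shift course effects to mean 0 and absorb the shift into students;
        # this leaves every predicted grade unchanged.
        shift = sum(inflatedness.values()) / len(inflatedness)
        for c in inflatedness:
            inflatedness[c] -= shift
        for s in aptitude:
            aptitude[s] += shift
    return aptitude, inflatedness


def predict(aptitude, inflatedness, student, course):
    """Predicted grade points for a student in a course."""
    return aptitude[student] + inflatedness[course]


# Hypothetical toy data on a 4.0 scale (not Princeton data).
data = [
    ("ann", "phy", 2.8), ("ann", "eco", 3.2), ("ann", "art", 3.6),
    ("ben", "phy", 3.2), ("ben", "eco", 3.6), ("ben", "art", 4.0),
    ("cal", "phy", 2.4), ("cal", "eco", 2.8),  # ("cal", "art", 3.2) held out
]
apt, infl = fit_additive(data)
print(round(predict(apt, infl, "cal", "art"), 2))  # prints 3.2
```

In this toy set "art" comes out as the most inflated course (about +0.4) and "phy" as the toughest (about -0.4). Ben has the highest adjusted GPA. The held-out grade is recovered because the remaining grades still link every student to every course through shared classmates. That link is the same chain of pairwise comparisons that lets computer rankings compare football teams that never played each other.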

Vanderbei said that the grading model could have several applications, from providing new data on which departments are the toughest and easiest graders to giving contextual data to the committees that select undergraduates for fellowships and honors.

The model’s “inflatedness” measures for each class also could eliminate the need for grading guidelines like the ones employed at Princeton. Professors could grade however they chose, and the model would correct for those who give inflated marks (as well as those who grade too severely). The faculty probably would not go down that path, Vanderbei said, because “it gets to be a little Big Brother-ish.”

Students might be more supportive, according to Scharf, who now works in the Beijing office of Intellectual Ventures. He has discussed his work with friends and said that most thought it was “a really cool idea” — even those who probably wouldn’t have benefited from it.
Web Bonus Links
John, Paul, George, and ... A's
A statistical model explores grading inequities at Beatle University.