FIVE YEARS AGO, Erez Lieberman Aiden ’02 was slogging toward the end of a laborious linguistics study when he realized he was going about it all wrong. He and fellow Harvard graduate student Jean-Baptiste Michel wanted to trace how the use of irregular verbs — such as “go” and “sing” — has evolved over time. To help them do this, an undergraduate student spent months paging through a dozen library books on old English grammar to compile a list of irregular verbs, which Aiden and Michel then analyzed for patterns.
To verify their results, the two went back to Harvard’s Widener Library to check out the grammar books again. But this time, they noticed that somebody else also had been borrowing them. The books later showed up on the website of the Google Books project, in which the search giant had begun to digitize the collections of major university libraries. That’s when it hit them: What they were doing “was completely obsolete,” Aiden says.
Aiden realized that Google was creating a digital archive of the written historical record. Searching how the use of words has changed over centuries would make it possible to track cultural, linguistic, and historic trends, the two graduate students saw. They convinced Google to let them develop software tools to probe the company’s digital library. In late 2010, they unveiled what they could do by searching 5 million books — including tracing the rise and fall of inventions, ideas, and individuals’ fame; and detecting the suppression of artists and intellectuals. They declared the birth of a new field that they dubbed “culturomics.” It has shaken up the world of humanities and landed the two on the front pages of The New York Times, The Boston Globe, and The Wall Street Journal.
THAT AIDEN, NOW 32, WOULD TAKE ON such a project at all might be considered surprising: He is first a mathematician and scientist, not a humanist; at the time of his linguistics work, he was working on his Ph.D. in mathematics and biomedical engineering. The project reinforced an important lesson, Aiden says, half-joking: “It seems to be possible to study language change and these kinds of seemingly nutty subjects without completely wrecking one’s scientific career.”
At least that has been true for Aiden, who as a graduate student and postdoc ranged freely across disciplines with the support of his advisers. As a biologist, Aiden produced the first three-dimensional maps of the human genome, work that may help reveal how cells malfunction in disease. As an engineer, he came up with an award-winning idea for an electronic shoe insert — now in testing — that can sense poor balance in the elderly. All six of his research papers were published in the world’s top two science journals, Science and Nature, an astonishing record for a young scientist. He is now a visiting faculty member at Google and a junior fellow of the selective Harvard Society of Fellows, a position that allows him to pursue independent research for three years in any area that interests him — as befits a modern-day Renaissance man.
“Don’t try to figure out what box Erez fits in. He doesn’t really fit in a box,” says Eric Lander ’78, director of the Broad Institute at Harvard and MIT and one of Aiden’s Ph.D. advisers. “He’s what universities should have: people who are just broadly creative and able to be effective in many different fields.”
“Erez is very different from any scientist I know,” says Harvard mathematical biologist Martin Nowak, who advised Aiden on both his Princeton senior thesis and his Ph.D. Most scientists work their way step-by-step through a problem, Nowak says, but Aiden “takes a long time to just think. Then he goes for very big projects.”
Aiden says that’s because he is attracted to research that has a slim chance of working but could have a tremendous payoff. “I have trouble working on projects that aren’t extraordinarily exciting, that I believe are game-changers.
“Boredom is this tremendous warning sign,” he continues. “If when contemplating the big picture, it doesn’t make your heart race with excitement, that’s a warning sign.”
IN AN INTERVIEW, AIDEN COMES ACROSS as an approachable, if intense, young man bursting with ideas that he calls “cool,” “super-exciting,” and “super-interesting.” He is sitting cross-legged on the floor in a basement student lounge on the Harvard campus, sporting his usual dark goatee, gray chinos, knit top, and hiking boots, eating a lunch of pizza and soup. It is spring break in mid-March, and the campus is quiet. He is joined by his wife, Aviva Presser Aiden, who adds another jolt of intellectual firepower to the household: She earned a Ph.D. in applied mathematics and genomics from Harvard and MIT in 2009, received a $100,000 grant from the Gates Foundation to develop a microbial-based cellphone charger for use in Africa, and is now finishing a medical degree at Harvard. The family includes a 2-year-old son; the Aidens are discussing whether their second child, due in June, will have brown eyes like dad or blue eyes like mom.
When Erez steps away to answer a phone call, Aviva recalls how they met, soon after their arrival in Boston as graduate students. While he was “obviously brilliant,” she says, part of the attraction was that he thought about things “that maybe you wonder about when you’re 10, then you stop thinking about them because you’re thinking about practical things, like, ‘Why is my experiment not working?’ ‘Why is my stipend check late?’” Erez, she says, never lets life’s practicalities get in the way of curiosity.
Aiden grew up in Brooklyn, the son of Hungarian and Romanian immigrants who spoke Hebrew at home. His grandparents taught him Hungarian, making English his third language. (He was born Erez Lieberman; he and Aviva added the “Aiden,” which means Eden in Hebrew, when they married in 2005.) On weekends his father often took him to New Jersey to the family’s factory, which made high-precision industrial cutting tools. Father and son would tinker together on inventions such as a tool that produced medical-syringe needles more efficiently. That “made the concept of creating new things something I just got at a very young age,” he says. He attended a private Jewish high school, where he staved off boredom by working in a college molecular biology lab.
For a brainy kid who wasn’t sure what he wanted to focus on, Princeton was “a wonderful place,” Aiden says. “There’s so much faculty and so little Princeton,” he says, referring to the abundance of leading scholars at a relatively small university. He names some who influenced him: scientists and mathematicians like Robert Austin and Andrew Wiles, but also humanists like philosopher Saul Kripke, ethicist Peter Singer, and novelist Toni Morrison.
Although Aiden majored in math and studied physics and philosophy, he felt pulled toward biology because he wanted to do something that “would have more impact on the world on a shorter time horizon.” He approached Nowak, who was then at the nearby Institute for Advanced Study. “He just showed up out of the blue,” says Nowak — which turned into a habit. “It was almost annoying,” Nowak says, except that “any problem I was considering, he could give me good advice about how to solve it.”
Each of Aiden’s two senior theses — in math and philosophy — was selected as the best in its department. Still, in some sense college was a disappointment: “I was under the illusory impression that if I simply studied math, physics, and philosophy, I would be able to make all life decisions from first principles,” Aiden says. “It was a complete fool’s errand.”
SO AIDEN DECIDED HE NEEDED TO FILL a “huge gap” in his education: history. In a master’s program at Yeshiva University, he boned up on general history, from prehistoric to modern European, and took a “deep dive” into a 17th-century Jewish text called the Kol Sachal that critiques rabbinic practices. Aiden argued that its author, a Venetian rabbi, was not attacking Judaism, but instead trying to reconcile it with Christianity. His adviser, Elisheva Carlebach, now at Columbia, says he “devoured the secondary literature, but was able to see his way to a completely original insight.”
On a visit back to Princeton one weekend while working on his history degree, Aiden heard Lander give a seminar about his role co-leading the recently completed project to sequence the human genome — a major scientific achievement. Aiden decided he wanted to work in Lander’s lab. “He just gave such an incredibly energizing talk about what was possible in genomics that I wanted to be part of that,” he remembers.
Once there, however, his first project failed. An attempt to sequence genes involved in the immune system, it sucked up 18 months before Aiden gave up. “I was really, really down about it,” he says. “I was very, very hopeful about that project and had invested myself in it enormously, had spent insane amounts of time on it, and it didn’t work.” But he continues: “In retrospect, it was great” — besides giving him valuable tools, “it led me to dig deep to think of a project that was more creative and interesting.” Failure, he says, is something to “draw on.”
The new project involved zooming in on how the human genome — about two meters long — folds to fit inside the nucleus of a cell. “Usually when we put long strings into small spaces, they start to get tangled,” like headphones in a pocket, Aiden explains. But the genome does not, his project showed. Instead, it nestles into a structure that is dense, but not knotted. It’s much like uncooked Ramen noodles, 100 feet of which fit into a 4-by-4-inch packet, he says.
Scientists already knew that within the cell’s nucleus, our 23 pairs of chromosomes aren’t tangled up but instead occupy distinct sections of the nucleus. But between the whole-chromosome level and short stretches of the DNA helix that were a million times smaller, not much was known.
Aiden invented a way to measure the points where DNA strands touch adjoining strands across the entire genome, a technique called Hi-C. It worked like this: He and his collaborators injected a DNA sample with a chemical (formaldehyde) that acted as a glue and bound the DNA strands at the contact points. They then broke the genome into literally a million short lengths that they fished out and fed through a DNA-sequencing machine. By comparing these contact points to a reference copy of the genome, Aiden worked out which pieces were touching when the chromosomes were folded up inside the cell nucleus.
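For readers who want to see the bookkeeping behind such a map, here is a minimal sketch in Python. It is not the team's actual software: it assumes each glued-together fragment has already been matched to a pair of positions on the reference genome, and the function name, the input format, and the bin size are all hypothetical choices made for illustration.

```python
def contact_matrix(read_pairs, genome_length, bin_size):
    """Tally how often two stretches of the genome were found touching.

    read_pairs    -- iterable of (pos_a, pos_b): the two reference-genome
                     positions recovered from one glued fragment
                     (hypothetical input format)
    genome_length -- length of the reference, in the same units as positions
    bin_size      -- width of each stretch ("bin") used to group positions
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # A contact has no direction, so keep the matrix symmetric.
            matrix[j][i] += 1
    return matrix


# Toy example: a 100-unit "genome" split into four 25-unit bins.
pairs = [(5, 95), (12, 88), (40, 45)]
for row in contact_matrix(pairs, genome_length=100, bin_size=25):
    print(row)
```

In this toy run the two far ends of the sequence (the first and last bins) are counted as touching twice, which is the kind of signal that suggests a fold bringing distant regions together.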
The folding map, published in the journal Science in 2009, revealed two important insights. First, chromosomes tend to move between different compartments of the nucleus, depending on whether genes are turned on or off. Second, when Aiden’s team zoomed in closer, they found that DNA folds into a structure closely related to one that had been proposed by a 19th-century Italian mathematician but never observed in nature before: a fractal globule, which explains how the cell’s long DNA can fit into the cell’s nucleus without becoming tangled.
Because this folding pattern helps determine which genes are switched on, it may explain how a neural cell and a skin cell carrying the same DNA are so different, Aiden says. The three-dimensional maps also could shed light on cancer, a disease in which the genome malfunctions.
EVEN WHILE MAKING GENOME MAPS, however, Aiden couldn’t be pinned down to one field. Around the same time, he was winding up the irregular-verbs project, which had been inspired by a talk he had heard at Princeton by Harvard psychologist Steven Pinker. Pinker had discussed the curious fact that while only a few English verbs are irregular, they are among the most widely used verbs. That mystery, Aiden says, seemed like something that “might be amenable to some sort of mathematical analysis.”
The study he did with Michel — on the evolution of 177 verbs from about 800 A.D. until today — made the cover of the journal Nature in 2007. The researchers found that the less often an irregular form is used, the more quickly it is replaced with the regular form. (“Finded” has not replaced “found,” but the rarely used “chode” has given way to “chided.”) The data did indeed fit a mathematical relationship: Verbs used 100 times less often become regular 10 times as fast.
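The relationship quoted above amounts to a power law with an exponent of minus one-half: the rate of regularization scales with the inverse square root of how often a verb is used. The short Python sketch below works that out from the two numbers in the text; the function names are hypothetical and the calculation is only an illustration of the stated ratio, not the paper's analysis.

```python
import math


def scaling_exponent(frequency_ratio, rate_ratio):
    """Exponent k such that rate_ratio == frequency_ratio ** k."""
    return math.log(rate_ratio) / math.log(frequency_ratio)


def relative_regularization_rate(frequency_ratio, exponent=-0.5):
    """How many times faster a verb regularizes, given how many times
    as often it is used compared with a reference verb."""
    return frequency_ratio ** exponent


# "Used 100 times less often" is a frequency ratio of 1/100;
# "10 times as fast" is a rate ratio of 10.
k = scaling_exponent(1 / 100, 10)
print(round(k, 3))
print(round(relative_regularization_rate(1 / 100), 3))
```

Under the same assumption, a verb used 10,000 times less often would regularize about 100 times as fast.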
After that, the idea of trolling Google’s book collection seemed “irresistible.” Google provided enough access to allow the researchers to begin playing with the data, though it didn’t publicly release full texts for copyright reasons. Aiden and Michel converted the books into a single database of text that could be searched for words or phrases — known as ngrams — using a software tool they called an ngram viewer that produced graphs showing how usage of the word in question rose or fell over time.
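The core idea can be sketched in a few lines of Python. This is a toy illustration, not the researchers' tool: it assumes the input is a list of (year, text) pairs, splits words on whitespace only, and reports a phrase's share of all same-length phrases printed that year. The function name and input format are hypothetical.

```python
from collections import Counter


def ngram_frequency_by_year(books, phrase):
    """Return {year: share of that year's n-grams that match the phrase}.

    books  -- iterable of (year, text) pairs (hypothetical input format)
    phrase -- a word or multi-word phrase, e.g. "ice cream"
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in books:
        tokens = text.lower().split()  # crude tokenizer, for illustration
        # Slide a window of n words across the text.
        grams = list(zip(*(tokens[i:] for i in range(n))))
        totals[year] += len(grams)
        hits[year] += sum(1 for gram in grams if gram == target)
    return {year: hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}


toy_books = [
    (1950, "ice cream is good ice cream"),
    (1980, "sushi and ice cream"),
]
print(ngram_frequency_by_year(toy_books, "ice cream"))
```

Dividing by the year's total matters because far more books were printed in some years than in others; a raw count would mostly track the size of the library rather than the popularity of the phrase.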
Instead of reading a few books closely, the researchers could “read” millions superficially. Their paper was published online in Science in December 2010. “Two weeks before it came out we convinced Google to create a Web version [of the ngram viewer], and it all came out at the same time and it was just super-exciting,” Aiden says.
In the Science paper, which focuses on more than 5 million books in six languages published between 1800 and 2000 (4 percent of all books ever published), Aiden and Michel offer a taste of how this database of 500 billion words can be used to study the evolution of language and culture. The English lexicon is growing fast: The number of words has doubled in the past century, to 1 million. The use of “woman” caught up with “man” only in the mid-1980s. “Ice cream” peaked in 1950, while “sushi” doesn’t take off until about 1980.
The Science paper charts the rise and fall of the famous: Sigmund Freud surpassed Charles Darwin in 1949 in terms of how often his name appears in books; Jimmy Carter’s fame surpasses that of Marilyn Monroe, Mickey Mouse, or Che Guevara. Actors’ fame emerges at about age 30, the researchers reported, while politicians’ comes later, in their 50s. The study found examples of political suppression: The names of Jewish artists such as Marc Chagall and others considered “degenerate” almost vanish from German books in the Nazi era. Aiden and Michel also leavened their paper with jokes: “God is not dead but needs a new publicist,” they wrote, noting that use of the word “God” has fallen off since the mid-19th century.
The study grabbed headlines and popular attention — the ngram viewer recorded 1 million hits in the first 24 hours it was public, Aiden says. But the reaction from scholars wasn’t entirely positive. Some complained that humanists were not among the paper’s authors (false, says Aiden; his co-authors included two literature Ph.D.s and he has a graduate degree in history). Another complaint was that “culturomics” was simply a new name for existing studies within the digital humanities. Aiden responds that the Google ngram viewer is “an insane change in scale” compared to what came before, insisting that humanists would benefit enormously from compiling large datasets the way genome researchers have. (Try the ngram viewer at http://books.google.com/ngrams/.)
“To my mind it’s hugely interesting,” says Princeton history professor Anthony Grafton, who first had feared that culturomics would sweep aside the traditional practice of — and funding for — closely reading books. “If you want to try to understand a culture, you need a lot of data. And this is a more rigorous and powerful way of gathering that data than anything we’ve ever had.”
MUCH OF AIDEN’S WORK TODAY takes place in what he and Michel call their Cultural Observatory. Although the words “Aiden Lab” are written on the oak door in gold, old-fashioned letters, the room looks much like a student study spot, with a worn couch and chairs, a half-dozen carrels, and blackboards covering one of the mint-green walls.
Sitting in one of the carrels is Benjamin Schmidt, a graduate student in history at Princeton who is spending a year at Harvard working at the Cultural Observatory. He shows off a new ngram viewer that the team has created to search arXiv, an online database where most physicists deposit their papers. (The full text is free online.)
Schmidt has been using the Google Books ngram viewer to analyze the accuracy of period television shows. He has found numerous anachronisms, such as the mention of the term “black market,” which came into use during World War II, in Downton Abbey, the British drama set in the 1910s. In Mad Men, the hit show about an advertising firm in the 1960s, the errors are more a matter of emphasis: Characters often say “feel good about,” which rarely was used in the ’60s. Michel and Aiden are working to add newspapers, periodicals, and other print materials to the Google ngram database.
Aiden’s biology work continues as well. A couple of miles away at the Broad Institute, he is developing techniques similar to Hi-C for mapping the three-dimensional structures of proteins. He works with Aviva on her Gates Foundation fuel-cell project. On the “back burner,” he says, is iShoe, the sensor-filled insole project he helped develop as an intern at NASA.
On the side, he and Aviva run a nonprofit, Bears Without Borders, that pays local artisans in developing countries to make stuffed animals for children in hospitals and orphanages. “I have very complex theories about how my work is actually helpful to the universe, and it’s nice, once in a while, to do something that one can see has a tangible and immediate impact,” says Aiden.
But his growing family may force some new realities to take hold. Once known for regularly pulling all-nighters, then sleeping in to catch up, he has been forced into “a normalized sleep schedule,” says Aviva, explaining that Gabriel, their toddler, “is up at 7, no matter what.” Aiden says that’s OK with him.
After all, his son has taught him about learning, too. “In the best possible week,” he says, “I won’t learn as much as Gabriel does.”
Jocelyn Kaiser ’88 is a writer at Science magazine.