One of the liberating things about the Web is the freedom it seems to offer. You can be anyone and do anything online, out of the glare of prying eyes. You can write a scalding anonymous comment, rate a movie under a pseudonym, or buy an embarrassing book, and no one’s the wiser, right?
With his path-breaking work into “de-anonymization,” Princeton computer scientist Arvind Narayanan is showing how wrong we are. You might think your anonymous blog posts can’t be traced back to you, for example, but there’s a fair chance that Narayanan — and governments and companies — could do just that.
“There has been a realignment of privacy as we shift from an analog to a digital world,” says Narayanan, an assistant professor. Take reading sensitive articles online, or buying an item at Amazon.com. In the past, you might have requested the article from a librarian or checked out with a cashier. “Now you have more privacy from the people around you,” he says. “But the companies or the government might be doing the snooping.”
Narayanan’s research on such things as the “anonymity” of rating movies and commenting online shows how quickly the digital fingerprints we leave behind lead directly back to us. “You have these intuitions about what makes you anonymous, and Arvind has gotten out there and said that your intuitions are mistaken,” says Seth Schoen, of the nonprofit Electronic Frontier Foundation. “People are much more unique and much more different than we would think.”
In 2006, Netflix released movie ratings from half a million customers as part of a contest to develop a better algorithm for recommending movies. The company scrubbed the data of people’s names to protect their privacy. Narayanan and a colleague showed later that year that it was still possible to reveal the identities of some users by comparing the Netflix data with consumers’ movie ratings from another website, the Internet Movie Database. The research led to a privacy lawsuit that was settled in 2010.
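To make the idea concrete, here is a minimal sketch of this kind of record linkage, written in Python. It is a toy illustration, not the researchers' published method: the names `overlap_score` and `best_match`, the one-star tolerance, and all of the data are made up for this example.

```python
def overlap_score(anon_ratings, public_ratings, tolerance=1):
    """Count titles rated in both records whose scores differ by at most `tolerance` stars."""
    return sum(
        1
        for title, stars in anon_ratings.items()
        if title in public_ratings
        and abs(public_ratings[title] - stars) <= tolerance
    )


def best_match(anon_ratings, public_profiles):
    """Return (name, score) for the public profile that overlaps most with the anonymized record."""
    scores = {
        name: overlap_score(anon_ratings, ratings)
        for name, ratings in public_profiles.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical data: one "scrubbed" record and two named public profiles.
anonymized_record = {"Film A": 5, "Film B": 2, "Film C": 4, "Film D": 1}
public_profiles = {
    "alice": {"Film A": 5, "Film C": 4, "Film D": 1},
    "bob": {"Film A": 3, "Film B": 5},
}

print(best_match(anonymized_record, public_profiles))  # ('alice', 3)
```

Even in this toy version, a handful of matching ratings is enough to single out one named profile. A realistic attack would also have to cope with noise and with popular films that nearly everyone has rated.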
Narayanan and colleagues have made similar findings in other contexts, from hospital records to anonymous Twitter users to genealogy websites. Last year, he and colleagues at Stanford and Berkeley showed they could unmask anonymous blog commenters by comparing their writing style and word choice to work the authors had published under a byline. They started with a database of 100,000 blogs drawn from the Web service Spinn3r. Using a sample of just three “anonymized” posts from each blog, they correctly identified the author in more than 20 percent of cases, purely by analyzing the writing. And about a third of the time, the author was among the top 20 guesses. While the technique didn’t unmask everyone, the results may unsettle the writers of anonymous blogs, particularly in countries with repressive regimes.
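One simple way to compare writing styles, sketched below in Python, is to measure how often each text uses common function words and then rank candidate authors by how similar those frequencies are. This is only an assumed, simplified stand-in for illustration. The article does not describe the features or classifier the researchers used, and the word list, function names, and sample texts here are hypothetical.

```python
import math
import re
from collections import Counter

# A tiny, arbitrary list of function words chosen for this example.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]


def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_authors(anonymous_text, known_texts):
    """Return author names ordered from most to least stylistically similar."""
    target = style_vector(anonymous_text)
    scored = [
        (cosine(target, style_vector(text)), name)
        for name, text in known_texts.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical bylined samples and one anonymous post.
known = {
    "writer_x": "It is the end of the road, and that is the truth of it.",
    "writer_y": "But to go in a boat, to go in a car, to go at all.",
}
print(rank_authors("That is the heart of the matter, and it is plain.", known))
```

A ranked list like this mirrors the article's point about "top 20 guesses": even when the first guess is wrong, narrowing the field to a short list of candidates can be enough for someone to finish the job by hand.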
“The strength of the de-anonymization attack we have presented is only likely to improve over time as better techniques are developed,” the researchers wrote. “Even if the adversary is unable to identify the author using our methods in a fully automated fashion, he might be able to identify a few tens of candidates for manual inspection. ... Outed anonymous bloggers have faced consequences ranging from firing to arrest and political persecution.”
It takes only 33 independent bits of information about someone to identify that person among the more than 7 billion people in the world. That has led Narayanan to call his influential blog “33 Bits of Entropy.”
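The arithmetic behind that figure is short: each independent yes-or-no fact halves the pool of candidates, so singling out one person among N takes about log2(N) bits. The Python sketch below checks the number using the population figure given above. The function name `bits_to_identify` is just a label chosen for this example.

```python
import math


def bits_to_identify(population):
    """Bits of information needed to single out one member of a population."""
    return math.log2(population)


world = 7_000_000_000  # population figure used in the article

print(round(bits_to_identify(world), 1))   # 32.7
print(math.ceil(bits_to_identify(world)))  # 33
print(2 ** 33)                             # 8589934592
```

Thirty-two bits can distinguish only about 4.3 billion people, so 33 is the smallest whole number of bits that covers everyone.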
How might de-anonymization work on the Web? “In the course of a typical day, you might comment on a news article about your hometown, tweet a recipe from your favorite cooking site, and have a conversation on a friend’s blog,” Narayanan wrote. “You have established a public record of having visited these three specific URLs. How many other people do you expect will have visited all three, and at roughly the same times that you did? With a very high probability, no one else.” Even information as innocuous as the version of your operating system, the timing of your software updates, or what plug-ins you’ve installed can identify you.
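The three-URL example amounts to intersecting visitor lists: each additional page shrinks the set of people who could have visited all of them. Here is a minimal Python sketch with invented page names and user IDs. It ignores timing, which would narrow the field further.

```python
def candidates(visit_log, urls):
    """Return the visitors who appear in the log for every one of the given URLs."""
    return set.intersection(*(visit_log[url] for url in urls))


# Hypothetical visitor logs for three pages.
visit_log = {
    "hometown-news-article": {"u1", "u2", "u3", "u4"},
    "cooking-site-recipe": {"u2", "u3", "u7"},
    "friends-blog-post": {"u3", "u9"},
}

print(candidates(visit_log, ["hometown-news-article"]))  # four candidates
print(candidates(visit_log, list(visit_log)))            # {'u3'}
```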
Narayanan currently is investigating hidden trackers on websites that collect a record of your digital movements, amassing a data profile that can then be sold. He also is looking into “behavioral advertising,” where companies use information they’ve collected about you to deliver ads they believe will be most relevant to you. In a new study led by Narayanan, Princeton researchers have released “bots” — software programs that mimic human behavior — with fake user profiles, and will compare the bots’ search results and the ads, deals, and prices the bots are offered. The researchers then will look for patterns to measure discrimination across different sites.
The risk, Narayanan says, is that “people of different races and ethnicities could experience the Web differently. They might see a different price online, or even a different list of goods and services.” Some experts say such targeted advertising could be used to limit access by certain groups of people to credit, insurance, and other services.
Narayanan is an advocate for training software developers to think about ethical issues related to their work. “Everywhere we look today, we see lots of examples of the kind of power that technologists have today, especially software engineers,” he said in a talk at Princeton in November. “In terms of data collection and privacy — do software engineers have an obligation to think about these questions, or should they just be thinking about their bottom line?”
Not surprisingly, he is reluctant to reveal much about himself. He prefers to expose his data to as few trusted companies as possible. He installs encryption and blocking tools on his Web browsers and cloud services. He adjusts his digital behavior as new vulnerabilities emerge.
In the end, he says, Internet privacy is about how we negotiate our public selves in the world, and those rules are changing. “There are things we only share with our immediate family,” he says. “Once something is on the Internet, you lose the ability to make that aspect of your personality useful to you. That can be devastating.”
Dan Grech ’99 (@dgrech) is a Media Innovation Fellow at Florida International University. He has taught journalism at Princeton and Columbia universities.