Computer Science: Too Much Information

Websites that record users’ sessions for quality control often collect sensitive data

Peter Grundy

By Josephine Wolff ’10

Published March 29, 2018


Websites constantly collect information about their visitors’ activity: which links they click, which items they purchase, how long they spend on a particular page. That process has become increasingly thorough thanks to the proliferation of “session-replay scripts,” a technology that captures, like a video, a user’s entire set of interactions with a website so they can be “replayed” for analysis. A team of Princeton University researchers analyzed data for the top 50,000 websites and found that those running session-replay scripts sometimes fail to protect their users’ personal information and, in some cases, even their passwords. Many companies hire analytics firms to collect and analyze user data to recommend website-design improvements. The opportunity for these firms to access users’ private information and protected credentials is a concern not just to consumers but also to the analytics firms themselves, which would prefer not to hold sensitive data that could make them targets of malicious cyberattacks.

Computer science graduate student Steven Englehardt, with postdoctoral research associate Günes Acar and computer science professor Arvind Narayanan, began writing in 2017 about their research on third-party firms running session-replay scripts on websites and the data those firms can access. Using Walgreens.com as a case study, they found that session-replay firms often inadvertently collect users’ personal information, including data about pharmaceutical prescriptions and medical conditions. In November, they described the first set of their findings in a post on Freedom to Tinker, a blog hosted by Princeton’s Center for Information Technology Policy.

The problem, says Englehardt, “is that websites aren’t following the terms of service put in place by the third-party session-recording services.” For instance, FullStory, which provides Walgreens with its session-replay service, requires that Walgreens strip users’ health and medical data from the records before sending the data. But the websites that rely on these services are not always willing or able to redact personal information effectively.
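To make the redaction requirement concrete, here is a minimal sketch of how a website might mask sensitive form values before a recording leaves the browser or server. The field names and the `redact_event` helper are hypothetical illustrations, not part of FullStory's or any other provider's actual interface.

```python
# Hypothetical field names a site might mark as sensitive (an assumption,
# not drawn from any real session-replay service).
SENSITIVE_FIELDS = {"prescription", "medical_condition", "password"}


def redact_event(event: dict) -> dict:
    """Return a copy of a recorded form event with sensitive values masked."""
    redacted = dict(event)
    if redacted.get("field") in SENSITIVE_FIELDS:
        # Replace each character so the length survives but the content does not.
        redacted["value"] = "*" * len(str(redacted.get("value", "")))
    return redacted


recorded = redact_event({"field": "prescription", "value": "abc"})
untouched = redact_event({"field": "search", "value": "vitamins"})
```

The weak point in a scheme like this is the list itself: any sensitive field the site forgets to name is recorded in full, which is consistent with the researchers' observation that sites do not always redact effectively.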


Photo: David Kelly Crow

That means firms like FullStory end up handling sensitive data that they do not need or want. “FullStory is very explicit that they do not share or sell that data, but it’s possible some other third-party firms allow this data to be used in other ways,” Englehardt says.

Adam Tanner *88, a journalist and fellow at Harvard’s Institute for Quantitative Social Science, says that session-replay scripts that leak personal data to third parties are a “clear example of how government rules protecting consumers often lag far behind invasive technology.” He continues: “We should know who is gathering our data and whether they are selling or sharing it with others. Most importantly, we should have a say in what happens to sensitive information about ourselves and our lives.”

After generating news about their findings on session-replay scripts on Walgreens.com, the team followed up in February with findings about how session-replay firms sometimes collect passwords inadvertently, particularly through the “show password” option on some website logins. Unwittingly sharing user credentials with these third parties presents additional security concerns since it potentially could enable those firms — as well as anyone who hacks into their databases or purchases their data — to access protected user accounts. Many data-analysis companies explicitly exclude login credentials from the data they collect, but the researchers found that analytics provider Mixpanel still accidentally collected user passwords under certain conditions. Following the researchers’ post, Mixpanel announced it had adjusted its techniques to better avoid collecting passwords, but the research team said that Mixpanel and other firms continued to collect credentials in some cases. “There is no foolproof way for these third-party scripts to prevent password collection, given their intended functionality,” the researchers concluded in their post.
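One plausible way a “show password” option could defeat a script's safeguards can be sketched as follows. The sketch assumes, as an illustration rather than a claim about Mixpanel's code, that the script skips fields by checking whether the input type is `password`, and that the “show password” control works by switching the field to plain text. All names here are hypothetical.

```python
def should_record(field: dict) -> bool:
    """Naive filter: skip any input whose type is 'password'."""
    return field["type"] != "password"


def toggle_show_password(field: dict) -> dict:
    """Simulate a 'show password' control that flips the input type."""
    new_type = "text" if field["type"] == "password" else "password"
    return {**field, "type": new_type}


login = {"name": "pw", "type": "password", "value": "hunter2"}
shown = toggle_show_password(login)

hidden_is_recorded = should_record(login)  # False: the filter excludes it
shown_is_recorded = should_record(shown)   # True: same secret, now captured
```

Under these assumptions the field holds the same secret before and after the toggle, but a type-based filter no longer recognizes it as a password.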

The researchers focused on seven of the most popular companies that provide session-replay scripts and identified their services on 482 of the top 50,000 websites, or just under 1 percent. But Englehardt stresses that this number may underestimate how many popular websites actually use session replays, since most websites don’t record every user session but instead capture only a sample, such as every 10th or 100th user’s interactions.
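The undercounting argument can be illustrated with a little arithmetic. The sketch below assumes that a site records sessions independently at a fixed rate and that a measurement visit only observes the recording when its own session is sampled. Both are simplifying assumptions, and the function name is hypothetical.

```python
def detection_probability(sample_every: int, visits: int = 1) -> float:
    """Chance that at least one of `visits` independent visits is recorded
    when a site records one in every `sample_every` sessions."""
    p = 1.0 / sample_every
    return 1.0 - (1.0 - p) ** visits


one_in_ten = detection_probability(10)       # roughly 0.10
one_in_hundred = detection_probability(100)  # roughly 0.01

# Share of the top 50,000 sites where the services were identified.
observed_share = 482 / 50_000                # just under 1 percent
```

Under these assumptions, a single visit to a site that samples every 100th session would catch the recording only about 1 percent of the time, which is why the observed count is best read as a lower bound.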

For users concerned about having their every online move recorded, Englehardt and Acar recommend installing a privacy tool, such as the uBlock browser extension or the EasyPrivacy filter list, both of which have drawn from their research.

1 Response

Norman Ravitch *62

6 Years Ago

Sympathize With Facebook Users?

When I first became aware of what people used Facebook for, I marveled that anyone would reveal themselves to perfect strangers in this way. Now they seem to have regrets. While Zuckerberg and his ilk are clearly culprits in a giant money-making scheme, I have even less sympathy for the fools who have allowed them to do this. People of quality always prided themselves on privacy. Now every fool goes around physically, morally, ethically, and socially naked to the world. Suffer the fools!
