Current Issue

July 16, 2008

Vol. 108, No. 16

Features

Data crusader

Josh Tauberer ’04 is someone a policy wonk could love

By Brett Tomlinson
Published in the July 16, 2008, issue


Photo illustration: Seven Veach, photo courtesy Josh Tauberer ’04

Since graduating from Princeton in 2004, Josh Tauberer has led a double life. By day, he’s a mild-mannered graduate student in linguistics at the University of Pennsylvania. By night, he commands a legion of computer programs, trolling the Internet for data about congressional bills and republishing the information on GovTrack.us, a popular Web site for bloggers, policy wonks, and concerned voters.

Some 10,000 visitors view GovTrack each day — more when a hot bill is up for debate — and its freely available databases feed a handful of government watchdog sites, including OpenCongress.org, a portal of congressional news; and MAPLight.org, which tracks the votes of members of Congress in parallel with the contributions they receive from special-interest groups. At the center of this web of information is Tauberer, GovTrack’s sole employee, who works from a slightly cluttered desktop in his Philadelphia apartment.

He’s just one citizen, doing his part for democracy.

“You could put it that way,” Tauberer says, stifling a laugh, “but ... I happen to enjoy it. It’s not like I get up in the morning and [say], ‘Oh, I’ve got to save the world by making this site.’”

Indeed, when Tauberer began organizing his site as an undergraduate, few thought that there was any need for it. The Library of Congress had been publishing congressional bills on its THOMAS.loc.gov site since 1995. But Tauberer found THOMAS difficult to navigate and filled with cumbersome quirks. So, with hopes of building a better source for legislative data, Tauberer, a largely self-taught computer programmer, began creating “screen-scraping” programs that look for specific patterns on Web pages, copy the information they find, and store it in a database. Technically, screen-scraping is not very difficult, he says, but it can be a hassle to decipher page formats and sort through data that may be incomplete, inconsistent, or unreliable. And when a source Web site is redesigned, the screen-scrapers need to be retooled as well. (“Fortunately, the government doesn’t change anything — ever,” Tauberer jokes.)
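For readers curious about what such a program looks like, here is a minimal sketch of the pattern-copy-store loop described above. It is not Tauberer's code: the page fragment, the regular expression, and the `bills` table are all invented for illustration, and the real THOMAS markup is not described in this article.

```python
import re
import sqlite3

# Hypothetical page fragment, invented for this sketch.
SAMPLE_PAGE = """
<li><b>H.R. 1234</b>: A bill to improve widget safety</li>
<li><b>S. 56</b>: A bill to fund bridge repair</li>
<li><b>H.R. 99</b></li>
"""

# The "specific pattern": a bold bill number followed by a colon and a title.
BILL_PATTERN = re.compile(r"<li><b>((?:H\.R\.|S\.) \d+)</b>: ([^<]+)</li>")


def scrape(page):
    """Copy (number, title) pairs out of the page; entries missing a title are skipped."""
    return [(m.group(1), m.group(2).strip()) for m in BILL_PATTERN.finditer(page)]


def store(rows, conn):
    """Save the scraped rows in a small database table (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS bills (number TEXT PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT OR REPLACE INTO bills VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
rows = scrape(SAMPLE_PAGE)
store(rows, conn)
```

The third entry in the sample has no title, so the pattern passes over it, a small instance of the incomplete data Tauberer mentions. If the page layout changed, the regular expression would stop matching and would need to be rewritten, which is the retooling problem he describes.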

Perseverance paid off for Tauberer when he launched GovTrack in September 2004, more than three years after he first envisioned the site. Users began to take notice later that year after Tauberer was awarded the top prize in a Web development contest run by Technorati.com — the citation called GovTrack “School House Rock on steroids” — and a January 2005 New York Times story about the site provided an additional boost. Today, when Web searchers type a congressional bill number into Google, more often than not the top result is a URL that begins with “www.govtrack.us.” Other GovTrack-supported sites are close behind.

“GovTrack is really the central hub in federal legislative information,” says John Wonderlich, director of the Sunlight Foundation’s Open House Project, which lobbies for better Web access to legislative data. “It’s the clearinghouse for data coming from the Library of Congress, and that’s kind of amazing that [Tauberer] has managed to do that on his own.”

While Tauberer’s hope was to improve government accountability by making it easier to access and digest the details of legislation, he is the first to admit that “information only gets you so far.” Footnotes, references, and amendments to amendments to amendments can make bills nearly indecipherable, even to well-informed readers. So, in addition to publishing the full text, status, and Library of Congress summary for each bill introduced on Capitol Hill, GovTrack provides other useful tools: e-mail alerts linked to specific bills, members of Congress, committees, or topics of interest; detailed maps of congressional districts, created by Tauberer using census data and Google Maps; graphs that illustrate votes on a particular bill; and a blog of legislative analysis, written mainly by unpaid contributors.

Each senator and representative also has a GovTrack page that includes the member’s voting history, links to bills he or she has sponsored, and a graphic that shows the member’s standing on GovTrack’s “Ideometer,” an ideological spectrum that Tauberer created using a statistical analysis of bill sponsorship patterns. John McCain, for example, pushes the Ideometer’s needle to the right, about a third of the way toward the Republican end of the spectrum, while Barack Obama is positioned to the left, about two-thirds of the way toward the Democratic end. Both are labeled “rank-and-file,” which means they fall within the middle 50 percent of their respective parties. Sen. Barbara Boxer (D-Calif.) occupies the far left pole, and Sen. Jim DeMint (R-S.C.) stands on the far right.
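The “middle 50 percent” rule can be illustrated with a short sketch. This is only an assumption about how such a label might be assigned: the article does not say how GovTrack computes Ideometer scores or cutoffs, and the member names, scores, and the function `rank_and_file` below are made up.

```python
from statistics import quantiles


def rank_and_file(scores):
    """Return the names whose scores fall between the party's first and third quartiles."""
    # Quartile cut points for one party's scores; the "inclusive" method is an assumption.
    q1, _, q3 = quantiles(list(scores.values()), n=4, method="inclusive")
    return {name for name, score in scores.items() if q1 <= score <= q3}


# Made-up scores for five hypothetical members of one party.
party_scores = {"A": 10, "B": 30, "C": 50, "D": 70, "E": 90}
middle = rank_and_file(party_scores)
```

With these invented numbers, the members at either extreme fall outside the middle half, while the three between the quartiles would carry the “rank-and-file” label.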

Tauberer would like to add more analytical features like the Ideometer, but he concedes there are limitations to his skills. While he’s a whiz with databases, he lacks the design expertise needed to generate the slick infographics that newspapers and magazines create. And then there’s the simple arithmetic of time. After four years of graduate school, Tauberer is drafting a proposal for his dissertation in linguistics, which he hopes to complete in the coming year.
