In May 2006, Aaron Swartz wrote a blog post titled "The Book That Changed My Life". The book in question, Understanding Power, is a series of transcribed discussions with the MIT linguist Noam Chomsky in which Chomsky analyses and explains the ways in which political power is wielded, acquired and guarded. "Reading the book, I felt as if my mind was rocked by explosions. At times the ideas were too much that I literally had to lie down," Swartz wrote. "Ever since then," he continued, "I've realised that I need to spend my life working to fix the shocking brokenness I'd discovered." After leaving Reddit in 2007, Swartz began to do so in earnest. He had come to believe that free, unimpeded access to information was an inherently political issue, not just a slogan to be monetised. "The system", in all its incarnations, that vague authoritarian stronghold of imprecise menace and organisational inefficiency, had long been Swartz's primary antagonist. If "the system" relies on institutional opacity to conceal its aims and to consolidate its grasp on power, one way to buck that system is to reveal the information that it actively keeps hidden. In July 2008, Swartz put his name to a cri de coeur dubbed the "Guerilla Open Access Manifesto", in which he urged people of good conscience to "take information, wherever it is stored, make our copies and share them with the world". He evangelised on these topics like a man touched with ideological tinnitus, unable to escape the sound of social dysfunction and desperate to make others hear the ringing in his ears.
Swartz had developed several methods of acquiring large data sets. Sometimes he'd purchase them. Sometimes he'd request them directly from government agencies under the Freedom of Information Act. Sometimes he would use scripts and download the material automatically. This last method was quick and easy -- especially for Swartz, who so disliked having to ask other people for help -- but it also had the potential to greatly annoy the database providers.
In a blog post published in January 2013, the librarian Eric Hellman recalled how, upon meeting Swartz, he took him to task about "how some of his mass-downloading was getting people really upset and could have negative consequences for the things he was trying to accomplish. If he would just ask, I told him, he could have an account for an API that DIDN'T crash to smithereens when asked for millions of records. And people were working really hard to make the information he wanted free, it just needed some years to make sure the machinery wouldn't collapse. Aaron sounded embarrassed."
Embarrassed though he may have been, Swartz had no intention of changing his ways. He had made himself into a freelance idealist, one who was uninterested in waiting around for systems to gradually reform themselves. Years after Chomsky's book first sent him reeling with the vertiginous power of its transgressive political ideas, Swartz was surer than ever that systems existed to be overturned. "We need to fight for Guerilla Open Access," he wrote in 2008. He was ready to lead the charge.
In 2008, Swartz set his sights on a US federal database called Public Access to Court Electronic Records, or PACER. The database is a comprehensive online archive of federal court documents. It's an invaluable resource for researchers, who, rather than having to rummage through courthouse archives for the files they need, can access that material from the comfort of their own homes. This convenience comes at a cost, though: PACER users must pay ten cents for every page they access. (The fee is only assessed if more than $US15 worth of charges is accrued in a given quarter.)
At the end of 2007, the US Courts announced that, for a limited time, it would offer completely free access to the PACER database. (At the time, PACER only charged eight cents a page, not ten.) This trial program was made available at sixteen federal depository libraries across the United States, and researchers would have to physically visit these libraries to take advantage of the offer. Swartz sensed an opportunity.
He joined forces with the legendary archivist and public-data activist Carl Malamud, who had enjoined volunteers to visit the depository libraries, download PACER records to portable thumb drives, and then "recycle" that material by uploading it to Malamud's website resource.org, where it would live in perpetuity as a free alternative to PACER. "Is this legal?" Malamud asked, before answering his own rhetorical question. "You betcha! These are public documents."
The terms of the PACER access initiative did not explicitly authorise remote downloading, and this made Malamud nervous. "do you have your library's permission/tacit agreement to drain pacer?" he asked. "no", Swartz replied. "sigh. this is not how we do things. :)," Malamud emailed Swartz on 4 September 2008. "we don't cut corners. we belly up to the bar and get permission." If Swartz wanted to collaborate with Malamud, he would have to play by the rules.
Swartz gave his assent and then, without telling Malamud, ran the program remotely anyway. He persuaded a friend in California to visit the library in Sacramento and surreptitiously download an authentication cookie that Swartz could use from home to fool PACER into thinking he was at the Sacramento library. In Massachusetts, Swartz ran the program, and then sat back and watched the files roll in. "we're going to have fun with this," Malamud told Swartz in late September, after Swartz had estimated that he would be able to capture approximately four terabytes worth of PACER records. "awesome. :-)," Swartz replied.
On 20 September 2008, Swartz revisited the Guerilla Open Access Manifesto in a blog post promoting the launch of a website called guerillaopenaccess.com. "I realised that the Open Access movement simply wasn't enough -- even if we got all journals going forward to be open, the whole history of scientific knowledge would be locked up," he wrote by way of explanation. "I realised what must be done. If we couldn't get free access to this knowledge, folks would have to take it." A week later, the government noticed the unusually high number of downloads purportedly originating from the Sacramento County Public Law Library and severed Swartz's access to PACER. When Malamud learned that Swartz had been running his crawler remotely despite instructions to the contrary, he told Swartz that "you definitely went over the line, even after I specifically told you I didn't want that to happen on my resources." Then, worse came to worst: fearing a security breach, PACER suspended the trial-access program entirely.
Swartz had downloaded almost 20 million pages from PACER, which constituted about 20 per cent of the entire database. Automatically downloading PACER wasn't illegal, as far as Swartz and Malamud knew, but it was certainly unusual, and federal agencies tended to be suspicious of unusual things. In a report dated 6 February 2009, the Washington field office of the FBI noted that, thanks to Swartz's actions, "the PACER system was being inundated with requests. One request was being made every three seconds." Wondering exactly what Swartz and Malamud had been up to, the agency initiated an "information gathering phase".
The file that the FBI started on Swartz contained a précis of his recent activities. It noted Swartz's stated ambitions of "pulling all information about politics, votes, lobbying records, and campaign finance reports under one unified interface". Swartz's personal website, the FBI observed, "includes a section titled 'Aaron Swartz: a lifetime of dubious accomplishments'". In February, the FBI sent a car to surveil Swartz's parents' house in Highland Park, Illinois. On 14 April, an agent called Highland Park hoping to talk with Swartz in person. Swartz wasn't at home, but the FBI agent spoke with his mother, who was spooked enough to send Carl Malamud a frantic email and Twitter message informing him of what had happened. ("tell your mother that twitter is *not* the right way to reach me on this stuff :)", Malamud told Swartz.)
Swartz eventually returned the call. "I'm sure you can guess what this is about. PACER," said Special Agent Kristina Honeycutt, in Swartz's telling. "We're interested in sitting down and talking to you about it, more so to just find out exactly what happened, so we can help the US Courts get their system back up." Honeycutt asked if Swartz would be willing to meet at some point soon for a face-to-face conversation. "If it was something bigger than that," she said, "we wouldn't have called you to ask."
Swartz's lawyer eventually called the FBI and said that his client would agree to meet only if the agency could guarantee that doing so would not work to his detriment. The FBI couldn't make that promise, so Swartz never met with them. The investigation was eventually closed on 20 April 2009. Later, Swartz requested his FBI file and posted the contents online.
Swartz had spent two years downloading and uploading various data sets in a flurry of shotgun activism: spreading his shot wide, not caring particularly about which target he hit. Now, his tactics had backfired. But far from convincing Swartz to curb his ambitions and proceed with more caution, the PACER experience, if anything, just encouraged him to reload. Swartz's guerillaopenaccess.com website linked to the website of a group called the Content Liberation Front, self-described "guerillas of the open access movement". The Content Liberation Front's website was a simple list of projects, the first of which was the acquisition of expired journals.
"Many online journal sites, like JSTOR, even charge for articles which have entered the public domain," the site said. "If you have copies of such articles, please upload them to archive.org and let us know." Uploading public-domain articles was only the beginning: "If you have a bit more skills or time, we suggest liberating entire journal archives from these sites and uploading them to file sharing networks. If anyone does so, let us know we'll post about it here."
The site urged visitors to send hard copies of databases to its mailing address:
The Content Liberation Front
c/o Aaron Swartz
950 Massachusetts Ave., #320
Cambridge, MA 02139
That was Swartz's apartment, between Harvard Square and Central Square, just down the road from the Massachusetts Institute of Technology.
This post was adapted and excerpted from The Idealist: Aaron Swartz and the Rise of Free Culture on the Internet, by Justin Peters. Out now from Scribner.