But the activity was coming from the Massachusetts Institute of Technology—and as the night went on, the scraper started to pick up speed.10
The JSTOR system couldn’t handle these voluminous download requests. Much as a home computer might freeze when launching a dozen programs simultaneously, a computer server can easily stall if hit with lots of requests in rapid succession. It takes a powerful machine to survive such an onslaught, and the JSTOR servers were, apparently, relatively feeble, or at least unprepared.
After some internal debate about what was happening and how to respond, one of the JSTOR staffers sent out the order to “Jack ’em”11—that is, to ban the offending MIT IP address from the system. “You mess with the bull,” said another, “you get the horns.”12
And that, for the moment, was that. (“Time of death: 8:56 pm,” one JSTOR employee noted.)13 But by eight the next morning, the scraper had reactivated at a different IP address and resumed downloading.14 While JSTOR limited the number of articles that a given user could download per session, it did not limit the number of sessions a user could initiate.15 The MIT scraper had identified this loophole and, at peak activity, had initiated over two hundred thousand download sessions in a single hour—an average of 55.5 new sessions per second.16 “This is too much activity for the system,” one staffer wrote, and JSTOR responded by again banning the offending IP address.17 Moves and countermoves: the scraper neatly evaded JSTOR’s second ban by adopting yet another IP address.18 This time, JSTOR retaliated by showing the entire school its horns, temporarily banning a wide range of MIT IP addresses. The downloads ceased, and the JSTOR site slowly recovered. Employees of both MIT and JSTOR proceeded to assess the damage.
In an e-mail to Ellen Finnie Duranceau, JSTOR’s contact at MIT Libraries, JSTOR user services manager Brian Larsen refrained from characterizing the incident as a hostile attack, noting, “This activity is normally a compromised username and password or a student/researcher unaware of the impact of their activities or that this method of gathering PDFs is in violation of our Terms and Conditions of Use.” The method—“robotic harvesting”—was not only prohibited, it was unnecessary. Larsen noted that JSTOR was accustomed to working with scholars who required bulk access to articles for research purposes “and would be happy to do so in this case as well if that turns out to be the motivation.”19
Duranceau and MIT’s information technology department soon determined that the download requests originated on a computer that had logged in to the school’s network with a guest account, which meant that MIT could not precisely identify the guilty party. Nevertheless, MIT recorded the offending computer’s MAC address—basically, an identification number that is unique to every computer’s network adapter, like a fingerprint—and banned that address from the network. Duranceau told Larsen that the harvesting was unlikely to happen again.20
Larsen was glad to hear it. In a follow-up e-mail, he emphasized that the incident wasn’t just a typical instance of overzealous archival research, but “an extreme case” that had affected the performance of JSTOR’s website, which rarely happened. The IP ban successfully stopped the downloads, and Larsen hoped that JSTOR had sent the downloader a sufficiently strong message: “I have no reason to believe we would need to be more heavy handed than that, but, of course, it is possible that it might be warranted at some point. I highly doubt it.”21
His faith was misplaced. On October 9, 2010, a JSTOR employee sent an ominous e-mail informing colleagues, “The MIT scraper is back.”22 Just as before, the scraper would start a session and download a document, then start another session and download another document, then repeat the process ad infinitum. With its servers suffering under the strain of the scraping and other users’ activities affected by these actions,23 JSTOR, in an unprecedented move, blocked access to its database for the entire MIT campus to maintain its server stability.24 Or, as one JSTOR employee put it, “MIT went Rambo on us, and we suspended the whole range.”25
For JSTOR, this was a drastic measure, since it affected all MIT users, not just the overzealous ones. Blanket bans for entire institutions risk eliciting angry screeds from scholars wondering why, exactly, the database had failed them in their hour of need. Access remained suspended for several days before it was finally restored on October 12.26 Drastic though it may have been, the solution seemed to work. The downloads ceased.
An analysis of the two incidents produced some alarming numbers. In October, the scraper had downloaded 8,422 articles in 8,515 total sessions. In September, however, the scraper had acquired 453,570 articles from 562 different journals over 1,256,249 sessions. “This is an extraordinary amount and blows away any recorded abuse case that I am aware of,” Larsen noted.27
What could anyone want with that many articles? The extent and pattern of the robotic harvesting indicated intentionality; the scraping clearly wasn’t the work of a student or a professor who had fallen down a research hole. Worries mounted after a deep dive into the downloaded content pointed toward a disturbing conclusion. The first document that had been downloaded in the October scraping session was the article “The Mystery of Misspelling” from a 1957 issue of the Elementary School Journal. The final article downloaded came from a 1950 issue of the Elementary School Journal. After presumably considering and dismissing the possibility that the system had been breached by a nostalgic fourth-grade English teacher, a JSTOR employee stated what appeared to be obvious: “They’re clearly going after substantially the entire corpus.”28
“The entire corpus” referred to the whole of the JSTOR database: more than 5 million articles from more than a thousand academic journals, all of which had been legally licensed and carefully digitized by the nonprofit organization. In September, MIT told JSTOR that a guest had been responsible for the downloads and that the problem was unlikely to recur. But it had recurred, prompting questions that MIT seemed reluctant or unable to answer. Who was draining the database? And why? JSTOR officials worried that voracious overseas hackers had downloaded the files.29 “By doing a simplified Chinese language Google search on ‘EZProxy password,’ you will find numerous lists with valid authentication information for hundreds if not thousands of schools,” one JSTOR employee wrote, implying that unscrupulous foreigners might be siphoning the archives.30
A senior JSTOR official reacted with alarm, asserting that the “activity noted is outright theft and may merit a call with university counsel, and even the local police, to ensure not only that the activity has stopped but that—e.g. the visiting scholar who left—isn’t leaving with a hard drive containing our database.”31 Another JSTOR employee concurred: “This is an astronomical number of articles—again, real theft (and one can assume willful malfeasance given the use of a robot, etc.). Does the university contact law enforcement? Would they be willing to do so in this instance?”32
* * *
In September 2010, Aaron Swartz purchased a new Acer laptop and visited the Massachusetts Institute of Technology, planning to download as many articles as possible from JSTOR. Logging on to the school’s network under the alias Gary Host (G. Host, or “ghost”), Swartz played patty-cake with JSTOR’s and MIT’s tech teams for months before finding a way to access the database without arousing attention.
His actions shouldn’t have surprised anyone. If the city of Cambridge had compiled a yearbook of all its residents, Aaron Swartz would surely have been named Most Likely to Try to Download the Entire JSTOR Corpus. Swartz was an ideologue who had spent the past few years not only bulk-downloading large data sets that were inaccessible to the public, but also writing and speaking on the moral necessity of doing so. The JSTOR hack derived directly from the Guerilla Open Access playbook and the Content Liberation Front’s to-do list.
In late September 2010, Swartz traveled to Budapest for the Internet at Liberty conference, where he spoke on “online free expression and enforcing ethics & accountability for corporations & governments.”33 At the conference, Noam Scheiber of the New Republic reported, Swartz dined with some activists who, with the backing of George Soros, had tried to get JSTOR to make its archives available to the public. But the price had been prohibitive—securing all the necessary copyrights would have cost Soros hundreds of millions of dollars—and Swartz’s dinner companions decried “the outrageous sum of money it would take to free up JSTOR for public consumption.”34 Scheiber makes clear that Swartz’s companions did not propose any sort of guerilla downloading campaign or suggest that Swartz take matters into his own hands. The conference concluded on September 22, 2010. Three days later, Swartz set up shop at MIT.
There is not necessarily any causal connection to be found here. Swartz never announced his plans for the JSTOR documents—not publicly, at least. If he confided in friends or family members, they have kept his secret. “Maybe he was downloading them because he’d figured out a way to do it and he was going to wait to see what to do next,” his friend Ben Wikler would later suggest. “Maybe he did it so he didn’t have to have an Internet connection to read whatever journal he wanted.”35 Feel free to examine the evidence and draw your own conclusions—the federal government certainly did.
Whatever Swartz’s intentions, the similarities between the JSTOR scrape and the PACER project are manifold. In both cases, Swartz used computer scripts to rapidly drain useful databases that weren’t immediately accessible to the public, prioritizing speed and efficiency over strict compliance with those databases’ terms and conditions of use. In both cases, the service providers initially framed the activity as a possible crime; that frame dictated their subsequent actions. In both cases, Swartz could have escaped notice if he had proceeded more slowly and acted within the databases’ terms of use. In both cases he sacrificed caution for celerity, and in both cases he paid the price.
* * *
By the time he started his JSTOR operation, Aaron Swartz had been living in Cambridge, Massachusetts, for more than two years. “Cambridge is the only place that’s ever felt like home,” he wrote on his blog upon his departure from San Francisco in 2008.36 “Surrounded by Harvard and MIT and Tufts and BC and BU and on and on it’s a city of thinking and of books, of quiet contemplation and peaceful concentration. And it has actual weather, with real snow and seasons and everything, not this time-stands-still sun that San Francisco insists upon.”37
In Cambridge, he began working remotely for the Progressive Change Campaign Committee (PCCC), a political action group he had cofounded. “We had no money and no members and not much of a plan for how to get them,” Swartz wrote, but they eventually figured it out.38 The group found its niche by executing a series of theatrical stunts that drew media attention and attracted new members. Annoyed with the omnipresence of the boisterous television host Jim Cramer, who for all his professed financial acumen had failed to foresee the collapse of the housing market in 2008, Swartz and the PCCC launched an online petition asking CNBC, Cramer’s network, to hire someone—anyone—who hadn’t been wrong about the subprime-mortgage crisis. “We spread the word to friends and bloggers and before we knew it we had nearly 20,000 signatures—20,000 new members,” Swartz recalled.39 The group continued to grow from there.
Swartz worked out of a shambolic old building called the Democracy Center, near Harvard Square, which had become a hub for political activists with grand ambitions and limited budgets. The adjacent office belonged to the political organizer and former Onion contributor Ben Wikler. Wikler was employed by a global activism group called Avaaz. The two became friends.
“I think we first had lunch July twenty-eighth of 2009. I don’t remember for certain, but my big recollection is that Aaron ordered a huge plate of french fries,” Wikler said. They bonded over their shared admiration for Robert Caro’s book The Power Broker, about the urban planner Robert Moses. Swartz and Wikler soon realized they had other things in common, too. “He was game for almost anything. He was someone you didn’t have to plan dinner with—you’d just say, let’s go to dinner, and you’d go. We’d go to movies and talks and parties of hacker people or political fund-raisers,” Wikler remembered. “We had a mutual man-crush because I knew all these people in online activism and organizing, and Aaron knew all the tech bloggers. So we were each other’s tickets into the other’s world.”
Swartz’s connections to the Cambridge tech community dated back a decade. His father, Robert Swartz, had worked as a consultant at the MIT Media Lab since 2000, providing advice on patent issues. Swartz’s blog is replete with stories of him as a teenager and young adult visiting his father’s office in the Media Lab. After Swartz had been accepted to the Summer Founders Program in 2005, his father had even encouraged him to live at MIT during the summer—advice he eventually took. “You could set up in the conference room,” Robert Swartz suggested, not wholly in jest. “There’s a shower just down the hall and coffee every morning. It’d probably be a month before [the boss] finds out.”40
Although Aaron Swartz was never formally enrolled in or employed by MIT, he was nevertheless a member of the broader community there. Officials recalled that Swartz had been “a member of MIT’s Free Culture Group, a regular visitor at MIT’s Student Information Processing Board (SIPB), and an active participant in the annual MIT International Puzzle Mystery Hunt Competition.”41 Mystery Hunt is a puzzle-solving contest that is half scavenger hunt, half Mensa entrance exam. The annual event attracts participants from around the world, many of them grown adults unaffiliated with MIT. Teams spend the weekend of the hunt running around the MIT campus solving a series of difficult puzzles, occasionally sneaking into rooms and campus locations that are technically off-limits.
In Cambridge, Swartz started to treat his own life as a puzzle to be solved. He designed various lifestyle experiments to optimize his efficiency and happiness. He dabbled in creative sleep schedules, then abandoned the experiment when he found it only made him tired. In the spring of 2009, he spent a month away from computers and the Internet for the first time in his adult life. His laptop had become “a beckoning world of IMs to friends, brain-gelatinizing television shows, and an endless pile of emails to answer. It’s like a constant stream of depression,” he wrote. “I want to be human again. Even if that means isolating myself from the rest of you humans.”42
He spent June offline, an experience he later described as revelatory. “I am not happy. I used to think of myself as just an unhappy person: a misanthrope, prone to mood swings and eating binges, who spends his days moping around the house in his pajamas, too shy and sad to step outside. But that’s not how I was offline,” Swartz wrote, recounting how he had come to enjoy simple human pleasures such as shaving and exercising in the absence of perpetual connectivity.43 “Normal days weren’t painful anymore. I didn’t spend them filled with worry, like before. Offline, I felt solid and composed. Online, I feel like my brain wants to run off in a million different directions, even when I try to point it forward.” Swartz vowed to find ways to sustain this serenity and thenceforth tried to make his apartment a computer-free zone. But he would never be able to avoid the Internet entirely.
In 2010, Swartz was named a fellow at the Edmond J. Safra Center for Ethics at Harvard University.44 Lawrence Lessig, who had also returned to Cambridge from Palo Alto, brought him aboard. Lessig was supervising a Safra Center program that examined institutional corruption and its effect on public life. The fellowship was well suited for Swartz, who had spent so much of his life fixated on institutional and personal ethics. Individual ethicality had obsessed Swartz for years, and as he aged, it became perhaps his chief concern.
“It seems impossible to be moral. Not only does everything I do cause great harm, but so does everything I don’t do. Standard accounts of morality assume that it’s difficult, but attainable: don’t lie, don’t cheat, don’t steal. But it seems like living a moral life isn’t even possible,” Swartz declared in August 2009.45 The next month, he extrapolated from this line of thought:
The conclusion is inescapable: we must live our lives to promote the most overall good. And that would seem to mean helping those most in want—the world’s poorest people.
Our rule demands one do everything they can to help the poorest—not just spending one’s wealth and selling one’s possessions, but breaking the law if that will help. I have friends who, to save money, break into buildings on the MIT campus to steal food and drink and naps and showers. They use the money they save to promote the public good. It seems like these criminals, not the average workaday law-abiding citizen, should be our moral exemplars.46
This section ignited a debate in the comments section of Swartz’s blog. Readers chided Swartz for sanctioning the theft of services from MIT. The next day, in a blog post titled “Honest Theft,” Swartz defended his position: “There’s the obvious argument that by taking these things without paying, they’re actually passing on their costs to the rest of the MIT community.” But perhaps that wasn’t as bad as it seemed, since “MIT receives enormous sums from the wealthy and powerful, more than they know how to spend.”
Other readers argued that the freeloaders’ actions just forced MIT to spend more money on security. “I don’t see how that’s true unless the students get caught,” Swartz responded. “Even if they did, MIT has a notoriously relaxed security policy, so they likely wouldn’t get in too much trouble and MIT probably wouldn’t do anything to up their security.”47 Swartz had good reason to think this way. MIT was the birthplace of the hacker ethic. The university tacitly encourages the pranks and exploits of its students; stories abound of clever undergraduates breaking into classrooms, crawling through air ducts, or otherwise evading security measures for various esoteric and delightful reasons, and these antics have been cataloged in museum exhibits and coffee-table books. By officially celebrating these pranks, MIT sends the message that it is an open society, a place where students are encouraged to pursue all sorts of creative projects, even ones that break the rules.