by Aaron Swartz
Now, as far as I know, nobody has ever put up the U.S.’s nuclear missiles on the Internet. I mean, it’s not something I’ve heard about. But that’s sort of the point. He wasn’t having a rational concern, right? It was this irrational fear that things were out of control. Here was this man, a United States senator, and those people on the Internet, they were just mocking him. They had to be brought under control. Things had to be under control. And I think that was the attitude of Congress. And just as seeing that fire in that senator’s eyes scared me, I think those hearings scared a lot of people. They saw this wasn’t the attitude of a thoughtful government trying to resolve trade-offs in order to best represent its citizens. This was more like the attitude of a tyrant. And so the citizens fought back.
The wheels came off the bus pretty quickly after that hearing. First the Republican senators pulled out, and then the White House issued a statement opposing the bill, and then the Democrats, left all alone out there, announced they were putting the bill on hold so they could have a few further discussions before the official vote. And that was when, as hard as it was for me to believe, after all this, we had won. The thing that everyone said was impossible, that some of the biggest companies in the world had written off as kind of a pipe dream, had happened. We did it. We won.
And then we started rubbing it in. You all know what happened next. Wikipedia went black. Reddit went black. Craigslist went black. The phone lines on Capitol Hill flat-out melted. Members of Congress started rushing to issue statements retracting their support for the bill that they were promoting just a couple days ago. And it was just ridiculous. I mean, there’s a chart from the time that captures it pretty well. It says something like “January 14th” on one side and has this big, long list of names supporting the bill, and then just a few lonely people opposing it; and on the other side, it says “January 15th,” and now it’s totally reversed—everyone is opposing it, just a few lonely names still hanging on in support.
I mean, this really was unprecedented. Don’t take my word for it, but ask former senator Chris Dodd, now the chief lobbyist for Hollywood. He admitted, after he lost, that he had masterminded the whole evil plan. And he told the New York Times he had never seen anything like it during his many years in Congress. And everyone I’ve spoken to agrees. The people rose up, and they caused a sea change in Washington—not the press, which refused to cover the story—just coincidentally, their parent companies all happened to be lobbying for the bill; not the politicians, who were pretty much unanimously in favor of it; and not the companies, who had all but given up trying to stop it and decided it was inevitable. It was really stopped by the people, the people themselves. They killed the bill dead; so dead that when members of Congress propose something now that even touches the Internet, they have to give a long speech beforehand about how it is definitely not like SOPA; so dead that when you ask congressional staffers about it, they groan and shake their heads like it’s all a bad dream they’re trying really hard to forget; so dead that it’s kind of hard to believe this story, hard to remember how close it all came to actually passing, hard to remember how this could have gone any other way. But it wasn’t a dream or a nightmare; it was all very real.
And it will happen again. Sure, it will have yet another name, and maybe a different excuse, and probably do its damage in a different way. But make no mistake: The enemies of the freedom to connect have not disappeared. The fire in those politicians’ eyes hasn’t been put out. There are a lot of people, a lot of powerful people, who want to clamp down on the Internet. And to be honest, there aren’t a whole lot who have a vested interest in protecting it from all of that. Even some of the biggest companies, some of the biggest Internet companies, to put it frankly, would benefit from a world in which their little competitors could get censored. We can’t let that happen.
Now, I’ve told this as a personal story, partly because I think big stories like this one are just more interesting at human scale. The director J. D. Walsh says good stories should be like the poster for Transformers. There’s a huge evil robot on the left side of the poster and a huge, big army on the right side of the poster. And in the middle, at the bottom, there’s just a small family trapped in the middle. Big stories need human stakes. But mostly, it’s a personal story, because I didn’t have time to research any of the other part of it. But that’s kind of the point. We won this fight because everyone made themselves the hero of their own story. Everyone took it as their job to save this crucial freedom. They threw themselves into it. They did whatever they could think of to do. They didn’t stop to ask anyone for permission. You remember how Hacker News readers spontaneously organized this boycott of GoDaddy over their support of SOPA? Nobody told them they could do that. A few people even thought it was a bad idea. It didn’t matter. The senators were right: The Internet really is out of control. But if we forget that, if we let Hollywood rewrite the story so it was just big company Google who stopped the bill, if we let them persuade us we didn’t actually make a difference, if we start seeing it as someone else’s responsibility to do this work and it’s our job just to go home and pop some popcorn and curl up on the couch to watch Transformers, well, then next time they might just win. Let’s not let that happen.
COMPUTERS
In 2000, at the age of thirteen, Aaron Swartz coauthored the RDF Site Summary (RSS) 1.0 specification, which became the first major standard for syndicating website and blog content through feeds. It was published a few days after he turned fourteen. It is no easy task to work out a technical standard with nearly a dozen other people—something many adults lack both the patience and maturity to do. I call attention to it because Swartz’s technical achievements show that he practiced what he preached—a very rare quality. He wanted openness, debate, rationality, and critical thinking, and he refused to cut corners—even at the age of thirteen.
RSS itself was fundamentally about sharing, taking the content out of its presented form on a website and allowing it to be redistributed and aggregated by other individuals and entities. Another of Swartz’s projects, the webpage authoring tool Markdown (2004, co-designed with John Gruber), was a lightweight tool to easily generate webpages and blog posts by turning marked-up text into HTML. Both point to one of Swartz’s central driving passions: making the creation, distribution, and freedom of information as easy and frictionless as possible.
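The following Python sketch shows what "taking the content out of its presented form" looks like in practice. It is a minimal illustration and is not code from Swartz or the other RSS 1.0 authors. It reads RSS 1.0 documents, which are RDF/XML, and merges their items into one list, roughly as a feed aggregator would. The two feeds and their example.org URLs are hypothetical. The feeds are also trimmed: a real RSS 1.0 channel also carries a description and an items sequence.

```python
import xml.etree.ElementTree as ET

# Namespaces used by RSS 1.0 documents (RDF plus the RSS 1.0 vocabulary).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

def parse_items(feed_xml):
    """Return (title, link) pairs for each top-level <item> in an RSS 1.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.findall("rss:item", NS):
        title = item.findtext("rss:title", default="", namespaces=NS)
        link = item.findtext("rss:link", default="", namespaces=NS)
        items.append((title.strip(), link.strip()))
    return items

def aggregate(feeds):
    """Merge items from several feeds in order, skipping links already seen."""
    seen = set()
    merged = []
    for feed_xml in feeds:
        for title, link in parse_items(feed_xml):
            if link not in seen:
                seen.add(link)
                merged.append((title, link))
    return merged

# Hypothetical, trimmed feeds for illustration only.
FEED_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/a/">
    <title>Site A</title>
    <link>http://example.org/a/</link>
  </channel>
  <item rdf:about="http://example.org/a/1">
    <title>First post</title>
    <link>http://example.org/a/1</link>
  </item>
</rdf:RDF>"""

FEED_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/b/">
    <title>Site B</title>
    <link>http://example.org/b/</link>
  </channel>
  <item rdf:about="http://example.org/b/1">
    <title>Hello from B</title>
    <link>http://example.org/b/1</link>
  </item>
  <item rdf:about="http://example.org/a/1">
    <title>First post</title>
    <link>http://example.org/a/1</link>
  </item>
</rdf:RDF>"""
```

Calling `aggregate([FEED_A, FEED_B])` yields one entry per distinct link. The item that site B re-syndicated from site A appears only once. This is the redistribution and aggregation the paragraph describes, done by software rather than by a person visiting each site.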
Swartz’s technical skills were obviously superior, but what differentiated him from most programmers, even some of the greatest open-source gurus, was the way he went about his technical projects. Rather than retreating into a “cathedral” of elite programmers, he wanted to keep things simple, include people, and welcome them in by making things as accessible as he could. The technical projects he chose perfectly mirrored this instinct. They all point to his later, more explicitly political work, where two projects stand out. The first is the tor2web proxy project, intended to make hidden deep websites accessible to everyday web users and not just techies. The second is the anonymous leak platform DeadDrop, which the New Yorker deployed as Strongbox and which, renamed SecureDrop, is now used at The Guardian and elsewhere. Swartz saw the deep web as a good platform for sharing information anonymously, and told Wired, “the idea was to kind of produce this hybrid where people could publish stuff using Tor and make it so that anyone on the Internet could view it.” That, in essence, was his technical philosophy: to build things for anyone on the Internet, not just hackers.
Swartz’s remarkable achievement was that he managed to merge political activism and technical know-how to a degree managed by few before—perhaps Edward Felten’s analysis of DRM methods and advocacy against them come closest. His technical efforts to ease and democratize the creation and flow of information aligned perfectly with his political ideals of openness, transparency, and reform. That the Internet is growing farther from his ideals rather than closer signals just how much we lost with him.
—David Auerbach
Excerpt: A Programmable Web
http://www.morganclaypool.com/doi/pdf/10.2200/S00481ED1V01Y201302WBE005
November 2009
Age 22
The following is an excerpt from Aaron Swartz’s A Programmable Web: An Unfinished Work published in 2013 by Morgan & Claypool. Excerpted by permission of Morgan & Claypool Publishers.—Ed.
If you are like most people I know (and, since you’re reading this book, you probably are—at least in this respect), you use the Web. A lot. In fact, in my own personal case, the vast majority of my days are spent reading or scanning web pages—a scroll through my webmail client to talk with friends and colleagues, a weblog or two to catch up on the news of the day, a dozen short articles, a flotilla of Google queries, and the constant turn to Wikipedia for a stray fact to answer a nagging question.
All fine and good, of course; indeed, nigh indispensable. And yet, it is sobering to think that a little over a decade ago none of this existed. Email had its own specialized applications, weblogs had yet to be invented, articles were found on paper, Google was yet unborn, and Wikipedia not even a distant twinkle in Larry Sanger’s eye.
And so, it is striking to consider—almost shocking, in fact—what the world might be like when our software turns to the Web just as frequently and casually as we do. Today, of course, we can see the faint, future glimmers of such a world. There is software that phones home to find out if there’s an update. There is software where part of its content—the help pages, perhaps, or some kind of catalog—is streamed over the Web. There is software that sends a copy of all your work to be stored on the Web. There is software specially designed to help you navigate a certain kind of web page. There is software that consists of nothing but a certain kind of web page. There is software—the so-called “mashups”—that consists of a web page combining information from two other web pages. And there is software that, using “APIs,” treats other web sites as just another part of the software infrastructure, another function it can call to get things done.
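The last two kinds of software, the mashup and the program that calls other sites through APIs, can be sketched in a few lines of Python. This is a hypothetical illustration. The weather and events services, their URLs, and the shape of their JSON responses are all invented. It only shows how a remote site can be wrapped so that the rest of the program calls it like any other function.

```python
import json
from urllib.request import urlopen

def call_web_api(url, fetch=None):
    """Call a (hypothetical) JSON endpoint as if it were a local function."""
    if fetch is None:
        # Default behaviour: a real HTTP GET whose body is assumed to be JSON.
        def fetch(u):
            with urlopen(u, timeout=10) as resp:
                return resp.read().decode("utf-8")
    return json.loads(fetch(url))

def mashup(weather_url, events_url, fetch=None):
    """Combine two hypothetical sites: attach a forecast to each event by date."""
    forecast = call_web_api(weather_url, fetch)  # assumed shape: {"date": "summary"}
    events = call_web_api(events_url, fetch)     # assumed shape: [{"name", "date"}]
    return [
        {**event, "forecast": forecast.get(event["date"], "unknown")}
        for event in events
    ]

# Offline usage: canned responses stand in for the two invented web services.
FAKE_RESPONSES = {
    "https://weather.example/forecast": '{"2009-11-20": "rain", "2009-11-21": "sun"}',
    "https://events.example/list": '[{"name": "Meetup", "date": "2009-11-21"},'
                                   ' {"name": "Talk", "date": "2009-11-22"}]',
}
result = mashup("https://weather.example/forecast",
                "https://events.example/list",
                fetch=FAKE_RESPONSES.__getitem__)
# result == [{"name": "Meetup", "date": "2009-11-21", "forecast": "sun"},
#            {"name": "Talk", "date": "2009-11-22", "forecast": "unknown"}]
```

The `fetch` parameter exists so that the network can be swapped out. From the mashup's point of view, whether the data comes from another web site or from a local table makes no difference, and that is the point the paragraph makes.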
Our computers are so small and the Web so great and vast that this last scenario seems like part of an inescapable trend. Why wouldn’t you depend on other web sites whenever you could, making their endless information and bountiful abilities a seamless part of yours? And so, I suspect, such uses will become increasingly common until, one day, your computer is as tethered to the Web as you yourself are now.
It is sometimes suggested that such a future is impossible, that making a Web that other computers could use is the fantasy of some (rather unimaginative, I would think) sci-fi novelist. That it would only happen in a world of lumbering robots and artificial intelligence and machines that follow you around, barking orders while intermittently unsuccessfully attempting to persuade you to purchase a new pair of shoes.
So it is perhaps unsurprising that one of the critics who has expressed something like this view, Cory Doctorow, is in fact a rather imaginative sci-fi novelist (amongst much else). Doctorow’s complaint is expressed in his essay “Metacrap: Putting the torch to seven straw-men of the meta-utopia.” It is also reprinted in his book of essays Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future (2008, Tachyon Publications) which is likewise available online at http://craphound.com/content/download/.
Doctorow argues that any system that collects accurate “metadata”—the kind of machine-processable data that will be needed to make this dream of computers using-the-Web come true—will run into seven inescapable problems: people lie, people are lazy, people are stupid, people don’t know themselves, schemas aren’t neutral, metrics influence results, and there’s more than one way to describe something. Doctorow proposes that, rather than trying to get people to provide data, we look at the data they produce incidentally while doing other things (like how Google looks at the links people make when they write web pages) and use that.
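Doctorow's alternative can be made concrete with a simplified, PageRank-style calculation over a link graph. Nobody fills in a form here; the ranking comes only from the links authors made while writing their pages. This is a textbook-style sketch and not Google's actual system. The damping factor of 0.85 and the three-page graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. The ranks
    start out equal and always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly along its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical web: two pages link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, "c" ends up ranked highest and "b" (which nobody links to) lowest. No one had to declare which page was important. That judgment falls out of the incidental data, which is the approach Doctorow prefers.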
Doctorow is, of course, attacking a strawman. Utopian fantasies of honest, complete, unbiased data about everything are obviously impossible. But who was trying for that anyway? The Web is rarely perfectly honest, complete, and unbiased—but it’s still pretty damn useful. There’s no reason making a Web for computers to use can’t be the same way.
I have to say, however, the idea’s proponents do not escape culpability for these utopian perceptions. Many of them have gone around talking about the “Semantic Web” in which our computers would finally be capable of “machine understanding.” Such a framing (among other factors) has attracted refugees from the struggling world of artificial intelligence, who have taken it as another opportunity to promote their life’s work.
Instead of the “let’s just build something that works” attitude that made the Web (and the Internet) such a roaring success, they brought the formalizing mindset of mathematicians and the institutional structures of academics and defense contractors. They formed committees to form working groups to write drafts of ontologies that carefully listed (in 100-page Word documents) all possible things in the universe and the various properties they could have, and they spent hours in Talmudic debates over whether a washing machine was a kitchen appliance or a household cleaning device.
With them has come academic research and government grants and corporate R&D and the whole apparatus of people and institutions that scream “pipe dream.” And instead of spending time building things, they’ve convinced people interested in these ideas that the first thing we need to do is write standards. (To engineers, this is absurd from the start—standards are things you write after you’ve got something working, not before!)
And so the “Semantic Web Activity” at the World Wide Web Consortium (W3C) has spent its time writing standard upon standard: the Extensible Markup Language (XML), the Resource Description Framework (RDF), the Web Ontology Language (OWL), tools for Gleaning Resource Descriptions from Dialects of Languages (GRDDL), the Simple Protocol And RDF Query Language (SPARQL) (as created by the RDF Data Access Working Group (DAWG)).
Few have received any widespread use and those that have (XML) are uniformly scourges on the planet, offenses against hardworking programmers that have pushed out sensible formats (like JSON) in favor of overly complicated hairballs with no basis in reality (I’m not done yet!—more on this in chapter 5).
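A small comparison gives a feel for the complaint about XML and JSON. It is only an illustration, not the argument the author makes in chapter 5. The same invented record is encoded both ways and read back with Python's standard library. JSON maps directly onto built-in types, while the XML consumer has to hand-code which elements repeat and which hold numbers.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, once as JSON and once as XML.
AS_JSON = '{"name": "example", "count": 3, "tags": ["web", "rdf"]}'
AS_XML = """<record>
  <name>example</name>
  <count>3</count>
  <tags><tag>web</tag><tag>rdf</tag></tags>
</record>"""

def from_json(text):
    # JSON objects, arrays, strings, and numbers become dict, list, str, int.
    return json.loads(text)

def from_xml(text):
    # XML carries no types: the consumer must know that <count> is an integer
    # and that <tag> may repeat, and must rebuild the structure by hand.
    root = ET.fromstring(text)
    return {
        "name": root.findtext("name"),
        "count": int(root.findtext("count")),
        "tags": [t.text for t in root.iter("tag")],
    }
```

Both functions produce the same dictionary. The JSON path needs one library call, while the XML path needs knowledge of the schema built into the code.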
Instead of getting existing systems to talk to each other and writing up the best practices, these self-appointed guarantors of the Semantic Web have spent their time creating their own little universe, complete with Semantic Web databases and programming languages. But databases and programming languages, while far from perfect, are largely solved problems. People already have their favorites, which have been tested and hacked to work in all sorts of unusual environments, and folks are not particularly inclined to learn a new one, especially for no good reason. It’s hard enough getting people to share data as it is, harder to get them to share it in a particular format, and completely impossible to get them to store it and manage it in a completely new system.
And yet this is what Semantic Webheads are spending their time on. It’s as if to get people to use the Web, they started writing a new operating system that had the Web built-in right at the core. Sure, we might end up there someday, but insisting that people do that from the start would have doomed the Web to obscurity from the beginning.
All of which has led “web engineers” (as this series’ title so cutely calls them) to tune out and go back to doing real work, not wanting to waste their time with things that don’t exist and, in all likelihood, never will. And it’s led many who have been working on the Semantic Web, in the vain hope of actually building a world where software can communicate, to burn out and tune out and find more productive avenues for their attentions.
For an example, look at Sean B. Palmer. In his influential piece, “Ditching the Semantic Web?,” he proclaims “It’s not prudent, perhaps even not moral (if that doesn’t sound too melodramatic), to work on RDF, OWL, SPARQL, RIF, the broken ideas of distributed trust, CWM, Tabulator, Dublin Core, FOAF, SIOC, and any of these kinds of things” and says not only will he “stop working on the Semantic Web” but “I will, moreover, actively dissuade anyone from working on the Semantic Web where it distracts them from working on” more practical projects.
It would be only fair here to point out that I am not exactly an unbiased observer. For one thing, Sean, like just about everyone else I cite in the book, is a friend. We met through working on these things together but since have kept in touch and share emails about what we’re working on and are just generally nice to each other. And the same goes for almost all the other people I cite and criticize.
Moreover, the reason we were working together is that I too did my time in the Semantic Web salt mines. My first web application was a collaboratively written encyclopedia, but my second, which aggregated news headlines from sites around the Web, led me into a downward spiral that ended with many years spent on RDF Core Working Groups and an ultimate decision to get out of the world of computers altogether.
Obviously, that didn’t work out quite as planned. Jim Hendler, another friend and one of the AI transplants I’ve just spent so much time taking a swing at, asked me if I’d write a bit on the subject to kick off a new series of electronic books he’s putting together. I’ll do just about anything for a little cash (just kidding; I just wanted to get published (just kidding; I’ve been published plenty of times (just kidding; not that many times (just kidding; I’ve never been published (just kidding; I have, but I just wanted more practice (just kidding; I practice plenty (just kidding; I never practice (just kidding; I just wanted to publish a book (just kidding; I just wanted to write a book (just kidding; it’s easy to write a book (just kidding; it’s a death march (just kidding; it’s not so bad (just kidding; my girlfriend left me (just kidding; I left her (just kidding, just kidding, just kidding))))))))))))))) and so here I am again, rehashing all the old ground and finally getting my chance to complain about what a mistake all the Semantic Web folks have made.