by Steve Levy
It would seem that book scanning was a good candidate for similar transparency. If Google had a more efficient way to scan books, sharing the improved techniques could benefit the company in the long run—inevitably, much of the output would find its way onto the web, bolstering Google’s indexes. But in this case, paranoia and a focus on short-term gain kept the machines under wraps. “We’ve done a ton of work to try to make those machines an order of magnitude better,” AMac said. “That does give us an advantage in terms of scanning rate and cost, and we actually want to have that advantage for a while.” Page himself dismissed the argument that sharing Google’s scanner technology would help the business in the long run, as well as benefit society. “If you don’t have a reason to talk about it, why talk about it?” he responded. “You’re running a business, and you have to weigh [exposure] against the downside, which can be significant.”
Google got a shock in October 2003, when it learned it was not the only company doing a massive book-scanning project. That was when Amazon.com introduced its “Search Inside the Book” feature. Amazon head Jeff Bezos had ordered the project to see if searching inside books would increase sales. (It did, by about 9 percent.) He had hired Udi Manber (who would later go to Google) to become “chief algorithms officer” and lead the project. Amazon began scanning books, and after the first 10,000, Manber’s engineers began working on ranking algorithms. The results didn’t prove satisfactory until Amazon had around 120,000 books in its indexes (many of its books were scanned in centers Amazon created in India and the Philippines), and putting in a keyword would pull out an apt passage in this virtual library. At that point, says Manber, “it was really eye-opening. It was, wow.” Just after the prototype was operative, Manber had been scheduled to present a report to management on the history of newspapers. Normally, you would Google the subject. But in this case he typed “history of newspapers” into his prototype and was instantly ushered inside a book that explained how newspapers had started in English coffeehouses in port cities, where sailors exchanged stories of their travels. “I bought the book,” says Manber. Bezos would later declare that his goal was to offer consumers the chance to buy any book ever written, in digital form.
Google professed to welcome Amazon’s efforts. “I think it’s an important part of the evolution of the Internet,” said Brin. Cognizant of Google’s own efforts, he observed that Amazon’s project was just an initial step in book search. Then he noted something that would prove more prophetic than he intended: “I do feel that the Internet needs to sort out copyright issues.” (Amazon, which had signed contracts with hundreds of publishers, had no such problems.) Later, Googlers would say that Amazon’s entry had been beneficial to Google because it introduced the concept of massive scanning in a less threatening manner than their project would. “It was like they disturbed The Force before we did,” says Megan Smith.
Nonetheless, Amazon forced an alteration in Google’s plans. Smith had already been working on a project similar to Amazon’s. It was a parallel path to the libraries project, involving books currently on sale that would be scanned with the blessing of publishers. As with Amazon’s plan, the publishers would allow their books to be scanned with snippets of the text exposed to users as teasers for eventual purchase. Google would provide links to online bookstores where people could instantly buy the books that showed up in search results. “We had been working on this project for a while, and so we were a little bit nervous that the publishers would sign exclusive deals with Amazon without knowing about how search was a marketing opportunity for them,” Smith later recalled. “Also, we needed their guidance and to know what they thought of our crazy project.” A team from Google, including Smith, her biz-dev colleague Cathy Gordon, David Drummond, and Susan Wojcicki, hurriedly arranged meetings with top publishers in New York City, creating a slide deck on the flight.
The publishers welcomed Google, in part because they were intrigued by the edgy new company. “The leverage that our name had, even in 2003, was astonishing,” says Cathy Gordon. “Two years before, it was ‘Who’s Google, and what are you doing?’ But by that point everyone was interested. They thought, ‘This Google thing is kind of cool.’” The publishers welcomed Google for another reason: they were concerned about ceding too much power to Amazon. “By the time we started to talk to the publishers, they could tell us everything that Amazon had ticked them off about, which was really useful, because we had no product and no infrastructure then,” says Gordon. Google was more than happy to present itself as an alternative, one that presented no threat to publishers. Google wasn’t competing with physical bookstores but was simply going to alert search customers to books they might want to purchase. Google even agreed to show less content from the books it scanned as part of what it now called Google Print.
The meetings seemed to go smoothly, at least until a power failure hit New York City and the entire northeastern United States on the second afternoon of what was to be two days of back-to-back sessions. (Stuck in the city, the group wound up spending the last night at Cathy Gordon’s mother’s house.) But not all of the publishers found Google charming. Jack Romanos, then CEO of Simon & Schuster, later complained to New York’s John Heilemann about Google’s “innocent arrogance” and “holier-than-thou” attitude. “One minute they’re pretending to be all idealistic, talking about how they’re only in this to expand the world’s knowledge, and the next they’re telling you that you’re going to have to do it their way or no way at all.”
In truth, Google was not dealing with the publishers in an upfront manner. During those first meetings, the Googlers did not even hint at their plans to digitize and index the vast holdings of huge libraries, regardless of copyright status. “We knew that this was going to be an issue,” says Gordon. “But Google does not disclose these kinds of things early. Ever.”
So when Google launched its Google Print in October 2004 at the Frankfurt Book Fair (Ocean was only the code name), with commitments from fifteen publishers including Penguin, Warner Books, and Houghton Mifflin, there was no mention of the library project, even though the scanning facilities were humming away, truckloads of books shuffling out of and back to various libraries every week. Two months later, on December 14, Google announced its separate deal to scan the libraries of Stanford, Harvard, the University of Michigan, Oxford University, and the New York Public Library. The project involved an estimated 10 million books. Google would give each library digital copies of the scans and use its own copies to store the contents of the books in its search indexes, along with the other books that it was scanning as part of the Google Print program, which dealt in authorized digital copies of books in print. (Eventually, Google’s Universal Search feature would display relevant book results in general searches.)
Page was rhapsodic when explaining the deal. At Stanford, he said, he had heard there were 132 miles of books in the libraries, but you couldn’t find what was in them. Google’s project might drive people to go to libraries more often, because now they would know what was in there. “That’s the really big deal,” he said. “A lot of people thought that this was impossible.”
As for Google’s edge in collecting this corpus, he said, “We’re not trying to lock up anything. We’re looking to have good competition.”
The fine print in Google Libraries was a little complicated. Different libraries had different comfort levels about what Google could scan. As far as the user was concerned, it could be baffling, too. Different books had different degrees of accessibility. Public domain books were available in their entirety. With in-print books licensed in the Google Print program, users could see a limited number of sample pages of the book. For “orphan books” from libraries, Google was most conservative, showing a “snippet view” with only the passage that contained the search term. (An orphan book was still in copyright but out of print, and the copyright holder could not be easily contacted.) In all cases, Google showed bibliographic information and, when possible, information on where to find or buy the physical book.
With the announcement of the library project, the publishing industry unleashed its suppressed fury toward the philistines who wanted to transform their treasures into bits. It was one thing to do what Amazon had done, digitizing books as a prelude to sales. Google Print had been seen in the same light. But now Google was making a copy of every book—without permission—to build a library of its own, without paying publishers and authors for the privilege. By what authority? the publishers wanted to know. And what if someone hacked into Google’s archive and stole the contents, distributing them free all over the Internet? There would no longer be any need for anyone to buy a book!
Marissa Mayer thought that bad timing contributed to the troubles. The Google Libraries announcement came out on December 14 to sync with a board of trustees meeting at Harvard. “We missed an opportunity because all the Internet users were Christmas shopping so no one’s reading about this amazing thing to bring books online,” she later said. That year Mayer returned to her hometown in Wisconsin for the holidays and was disappointed to find that even her parents hadn’t gotten the message; they asked her what this troublesome books thing was about. “What do you mean?” she said. “We’re putting all the world’s books online, and you’ll be able to search them from anywhere!” It wasn’t until after the New Year that people began to hear about it, and by that time the publishers had seized the stage.
Indeed, representatives of publishers and authors objected to the project, essentially charging that Google was overstepping boundaries. Instead of a boon to society, they charged, Google’s program was a literary landgrab launched by a powerful corporation that would mine the world’s knowledge for profit and cheat rightful owners of the bounty. The war of words over the war on books proceeded for the next few months, with neither side backing down. On October 19, 2005, several publishers, under the auspices of the Association of American Publishers, filed suit against Google’s “massive, wholesale and systematic copying of entire books still protected by copyright.” The previous month, the Authors Guild had filed a class-action suit charging Google with infringement. The two suits were combined by the court.
Critics of the plan seized on the fact that Google Book Search was scanning the books without permission of authors or publishers. (Google had renamed the product from Google Print to Google Book Search “to reflect the product’s evolution,” and the new name encompassed both the publisher and the library programs.) Google, the lawsuit argued, was within its rights to scan when the book was in the public domain. But for all other books, the process should be “opt in,” meaning that Google should scan no book under copyright unless the rights holder specifically authorized it. Google noted that such a plan would essentially gut its book archive. The vast majority of printed books, around 80 percent, had been published since 1923. Perhaps 5 percent of those were currently in print, and Google was working with publishers to get permission to scan those for Book Search. But almost three-quarters of all books were still in copyright but not in print, and in many cases it was difficult if not impossible to find the rights holder. (When explaining this situation, digital law expert Lawrence Lessig claimed that of the 10,027 books published in 1930, only 174 are still in print. The remaining 9,853 books cannot be reprinted or even copied without the permission of the copyright holder.) Such a process certainly didn’t scale.
Google also regarded the objections of the Authors Guild, which claimed to represent out-of-print authors, as illogical: writers in that category, Google argued, would only be helped by its efforts. “The fact that these books were out of print meant that there was no revenue accruing to an author,” says Google’s Cathy Gordon. “The only way anyone could get such a book was to buy it on the secondhand market.”
Google’s chief economist, Hal Varian, wrote an economic analysis of the Google Libraries in 2006. Not surprisingly, he found that it was “legally sound and economically sensible.” He warned that an opt-in model would be destructive, ruining the value to society of a complete database of book contents.
Imagine receiving a letter that told you you had inherited the copyright on great-uncle Fred’s autobiography. If you signed and returned the enclosed legal document, the book would be added to the Google Library index. What would the response rate be? The response rate would probably be about the same as to those letters telling you you have won the Nigerian lottery.
The law was illogical, and it was as if Google felt that executing a commonsense plan would move the world to the proper view of things. “I anticipated it would be controversial,” says Page of the project. “I think we knew that there would be a lot of interesting issues and the way the laws are structured isn’t really sensible, especially with regard to orphan works. If you were to sit down to write the law knowing what you know now, there’s no way you’d ever write it like that.”
Google’s Book Search team included Random House’s former vice president of new media, Adam Smith, as the managing director. He worked with an engineer named Dan Clancy, who had formerly managed the information services for the NASA Ames Research Center, just down the highway from the Googleplex. Their team supervised the technical work to produce the product but also oversaw what seemed like a public relations war to convince the world that Google’s motives were pure and that if a lawsuit were to kill this beneficial project, the world would suffer.
They had help from various luminaries in the digital realm. A month after the lawsuit was filed, some of the players participated in a public debate at the New York Public Library. Google’s David Drummond was supported by cyberlaw superstar Lawrence Lessig in defending Book Search, while lawyers for the publishers and Authors Guild executive director Paul Aiken spoke against it. Lessig was persuasive in stating the case for the utility of an opt-out system. He had earlier written of the transformation of property law after the emergence of the aviation industry. Originally, the boundaries of one’s property were thought to extend skyward into the universe, and flying over a home owner’s acreage was trespassing. Since it was impossible for an airline to secure permission from the owner of every single piece of property underneath its flight path, society saw fit to recognize a different boundary. The same should apply to books—inclusion in a search engine in a way that doesn’t erode the value of the book was so important to society that it had to be legal.
The lawyers for the publishers and authors, while conceding that there were benefits to a universal book search—including purchases coming from increased exposure to a book—preferred to focus on the narrow fact that the law forbade making even a single unauthorized copy of a book during the scanning process. But the underlying impetus for the suit was the conviction that in a multimillion-dollar enterprise such as Book Search it was unconscionable for authors and publishers not to be paid. After the debate, Aiken laid out the essence of his group’s rationale to an Authors Guild member who told him that he’d like his books discoverable by Google. “Don’t you understand?” Aiken said. “These people in Silicon Valley are billionaires, and they’re making money off you!”
Google, so used to being seen as a scrappy underdog, had failed to appreciate that in this instance it was seen as a digital bully pounding on the vulnerable weaklings of an industry in decline. “Google saw us as patsies,” said Pat Schroeder, a former congresswoman who headed the Association of American Publishers. “They assumed we’d never sue. But they were wrong—so here we are and isn’t it fun?”
To Page, it came down to whether Google’s plan would help the world or not. For him the benefit provided by Book Search outweighed the legal niceties. “Do you really want the whole world not to have access to human knowledge as contained in books, because you really want opt out rather than opt in?” asked Page. “You’ve just got to think about that from a societal point of view.” Page was shocked that people didn’t get that. He dismissed much of the opposition’s passion as phony—a negotiating tactic. “People want to get money out of us, or they want to get other things, so they’re arguing a very untenable position.”
Showdowns like these often concluded with a financial settlement, and many thought that the negotiating sessions between the parties would do just that. But things took an unusual turn relatively early in the process. Instead of the usual chest beating and ultimatums before the calculators were pulled out, a representative from the Authors Guild made a surprising proposal: instead of figuring out what Google had to offer the rights holders to pursue its current plan, what if Google took on an even more ambitious role, not just as an archivist for books but as the designated digital bookstore for the millions of tomes otherwise unavailable? Such a scheme could be complemented by a giant registry of authors and rights holders to determine who should be paid. And, of course, Google would contribute a large sum of money to the plaintiffs to pay off legal bills and compensate them for the wrongs already committed.
The proposal put Google at a critical juncture. Thus far, Google had been arguing on principle. It had defined itself in the conflict as a proxy for the culture itself, indeed for all of civilization. The snippets, it argued, belonged to the people. And it was demanding no exclusivity. If Google won its argument and it was determined that including the text of books in search engine indexes was fair use, anyone could make deals with libraries to do their own scanning. Google might have snapped up some of the plum libraries, but there were dozens of other first-rate collections that a company like Microsoft or Yahoo could scan. (Indeed, Microsoft had embarked on such a plan but eventually abandoned it because of excessive costs.) Or maybe the Library of Congress could digitize its own holdings and license the files to a search engine company.