by Steven Levy
Nonetheless, McLaughlin and Wong spent a lot of time in Sacramento educating legislators on the fine points of Gmail. At Sergey Brin’s suggestion, Google gave Gmail accounts to all the legislators and their aides. At the time that was a valuable commodity, since the service was invitation only. (Invitations were going for $100 on eBay.) The Figueroa bill passed the California senate but never became law.
Even though the legislative challenge fizzled, Gmail became a permanent bête noire to privacy rights organizations. One bone of contention was that Gmail didn’t seem to have a delete button. (It actually did have an option to delete an email, but that choice was buried under several nested menus.) Buchheit later said that the approach had been his idea. Omitting a delete button was supposed to teach you to view email—and information itself—the way Google did. The implicit message was that the only thing that should be deleted was the concept of limited storage. Not everybody at Google subscribed to this philosophy—Eric Schmidt had long before instituted a personal practice of making his emails “go away as quickly as possible” unless specifically asked to retain them. To most people at Google, though, automatic archiving was a cause for celebration, and gripes from privacy do-gooders were viewed as misguided or even cynical, exploiting a phony issue for their own status and fund-raising. “Even to this day, I’ll read people saying that Google keeps your [deleted] email forever. Like, totally false stuff!” says Buchheit. He called his critics “fake privacy organizations” because in his mind “they were primarily interested in getting attention for themselves and were going around telling lies about things.”
But to millions of people whose perceptions were framed by the traditional nature of storage and the control it provided, Gmail was a shrieking alarm that in this new world, privacy was elusive. And Google’s policy people knew that from that point on, everything Google did would have to withstand scrutiny from the angle of privacy, whether or not its engineers thought the charges were valid. “Gmail was game-changing,” says Nicole Wong. Google would now have to figure out answers to questions—mostly legitimate ones—of what happens to personal information stored on Google’s servers.
Ironically, even as the Gmail privacy conflagration moved off the news pages, there was another source of frenzy around Gmail—people who were desperate to get accounts. The strong demand for Gmail accounts confirmed Buchheit’s instinct, supported enthusiastically by Page and Brin, that giving people huge amounts of storage and letting them search all of their emails with lightning speed would be irresistible—even if the service came with sometimes-creepy ads.
Why did Google see this when its competitors, who had offered web-based mail products first, didn’t? About six months after Gmail came out, Bill Gates visited me at Newsweek’s New York headquarters to talk about spam. (His message was that within a year it would no longer be a problem. Not exactly a Nostradamus moment.) We met in my editor’s office. The question came up of whether free email accounts should be supported by advertising. Gates felt that users were more negative than positive on the issue, but if people wanted it, Microsoft would offer it.
“Have you played with Gmail?” I asked him.
“Oh sure, I play with everything,” he replied. “I play with A-Mail, B-Mail, C-Mail, I play with all of them.”
My editor and I explained that the IT department at Newsweek gave us barely enough storage to hold a few days’ mail, and we both forwarded everything to Gmail so we wouldn’t have to spend our time deciding what to delete. Only a few months after starting this, both of us had consumed more than half of Gmail’s 2-gigabyte free storage space. (Google had already doubled the storage from one gig to two.)
Gates looked stunned, as if this offended him. “How could you need more than a gig?” he asked. “What’ve you got in there? Movies? PowerPoint presentations?”
No, just lots of mail.
He began firing questions. “How many messages are there?” he demanded. “Seriously, I’m trying to understand whether it’s the number of messages or the size of messages.” After doing the math in his head, he came to the conclusion that Google was doing something wrong.
The episode is telling. Gates’s implicit criticism of Gmail was that it was wasteful in its means of storing each email. Despite his currency with cutting-edge technologies, his mentality was anchored in the old paradigm of storage being a commodity that must be conserved. He had written his first programs under a brutal imperative for brevity. And Microsoft’s web-based email service reflected that parsimony.
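The exchange turns on a simple calculation. The sketch below is only a back-of-the-envelope; every message-size figure in it is an assumption chosen for illustration, not a number from Gates, Google, or Newsweek. It simply shows why “is it the number of messages or the size of messages” was the right question to ask.

```python
# Back-of-the-envelope only: the message sizes below are assumptions for
# illustration, not figures from Gates, Google, or Newsweek.
AVG_PLAIN_EMAIL_KB = 10       # a typical text-only message (assumed)
AVG_WITH_ATTACHMENT_KB = 500  # an occasional message with attachments (assumed)
ATTACHMENT_RATE = 0.05        # assume 1 in 20 messages carries an attachment

avg_kb = (1 - ATTACHMENT_RATE) * AVG_PLAIN_EMAIL_KB + ATTACHMENT_RATE * AVG_WITH_ATTACHMENT_KB
gigabyte_kb = 1_000_000

print(f"average message: ~{avg_kb:.0f} KB")
print(f"messages per gigabyte: ~{gigabyte_kb / avg_kb:,.0f}")
# With these guesses, a gigabyte holds roughly 30,000 messages -- a volume
# that someone forwarding every piece of work mail could plausibly reach
# within months, which is why the number-versus-size question mattered.
```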
The young people at Google had no such mental barriers. From the moment their company started, they were thinking in terms of huge numbers. Remember, they named their company after the number 1 followed by a hundred zeros! Moore’s Law was as much a fact as air for them, so they understood that the expense of the seemingly astounding 2 gigabytes they gave away in 2004 would be negligible only months later. It would take some months for Gates’s minions to catch up and for Microsoft’s Hotmail to dramatically increase storage. (Yahoo Mail also followed suit.)
“That was part of my justification for doing Gmail,” says Paul Buchheit of the way the service made use of Google’s capacious servers for its storage. “When people said that it should be canceled, I told them it’s really the foundation for a lot of other products. It just seemed obvious that the way things were going, all information was going to be online.”
People would quickly identify that concept as a core value of “cloud computing.” The term described the phenomenon whereby data—even private, proprietary information once stored on one’s own computer—could be accessed via the Internet, no matter where you were. As far as the user was concerned, information lived in a huge data cloud, and you pulled it down and sent it back up without regard to its actual location.
The term originally wasn’t popular at Google. “Internally, we thought of ‘cloud computing’ as a marketing term,” says Urs Hölzle. (“Marketing” being pejorative in this context.) “Technically speaking, it’s cluster computing that you do.” (At Google, a “cluster” refers to a large number of servers—well into the thousands—usually the minimum number of machines needed to serve search results for a query.) But the aptness of the metaphor, as well as the fact that it became standard industry jargon, eventually led Google to accept it. Gmail was a cloud application. “For the first time you said, ‘Gee, there’s a product that could conceivably replace your desktop client,’” says Hölzle. He meant that instead of using Microsoft applications, people might switch to advertising-supported products, with ads supplied by Google. Even more important, the psychology of the cloud matched Google’s worldview: network-based, fast, operating at scale. “On one level, [the cloud is] the business we’ve been in since the day Larry and Sergey founded Google,” says Dave Girouard, a company executive in charge of Google’s cloud-oriented business software. “We have an amazing advantage because we’re a company that was born of the web that has never done anything else.”
What’s more, Google was a company that benefited from the massive adoption of that web. The sooner people migrated to all-digital worlds—where Google could mine the information, deliver it to users, and sell ads targeted to their activities at that very moment—the more Google would be intertwined in their lives. After Gmail, a corollary was added to the all-the-world’s-information axiom: the sooner everyone moved to the cloud, the better it would be for Google.
2
“My job was to get in the car, get on a plane, go find data centers.”
Google’s own cloud would come to reside in a constellation of huge data centers spread around the world, costing more than a billion dollars each and packed with servers Google built itself. Of all of Google’s secrets, this massive digital infrastructure was perhaps its most closely held. It never disclosed the number of these data centers. (According to an industry observer, Data Center Knowledge, there were twenty-four major facilities by 2009, a number Google didn’t confirm or dispute.) Google would not say how many servers it had in those centers. (Google did, however, eventually say that it is the largest computer manufacturer in the world—making its own servers requires it to build more units every year than the industry giants HP, Dell, and Lenovo. Nor did Google spokespeople deny reports that it had more than a million of those servers in operation.) And it never welcomed outsiders to peer into its data centers.
But in 2002, before Google firmly closed the shutters, I was offered a rare glimpse of the company’s data storage. Jim Reese, then the caretaker of the company’s infrastructure, was the guide. He drove to the Exodus colo (colocation center) near San Jose in his car, apologizing for a flapping patch of upholstery on the interior roof as he steered. On the way over, he shared the kind of information that in later years Google would never divulge: real numbers about its servers and its searches. Google, he said, had 10,000 servers to process the 150 million searches its customers launched every day. A sleepy guard waved us in, and we entered a large darkened space with “cages” of servers surrounded by chain-link fences. Air conditioners churned out a steady electronic hum. Reese pointed out who owned the servers in each cage. The cages of companies such as eBay and Yahoo held symmetrically balanced racks of pizza box–style servers, with all the cables tidily secured and labeled. Google’s servers looked half finished—without cases they seemed almost uncomfortably naked—and spewing from them was an unruly tangle of cables. If you could imagine a male college freshman made of gigabytes, this would be his dorm.
Components built to fail, supersophisticated software schemes, and a willingness to discard conventional wisdom would grow Google’s storage capabilities from this puzzling rat’s nest to the world’s biggest data cloud.
A neurosurgeon by training, Reese had drifted to corporate computer maintenance when he applied to Google in June 1999. Google had eighteen employees then. Urs Hölzle conducted an initial interview by phone. In Reese’s interactions with other companies, he’d gotten a few cursory questions about a technical point or two, and then the interviewer would pitch the job. “But in this phone screen, absolutely no recruiting went on,” says Reese. “He questioned me for an hour and a half. Really grilled me.” It was disorienting, even more so with Hölzle’s gruff, accented voice barking one question after another, with never an acknowledgment of whether Reese’s answers were satisfactory. Then Hölzle abruptly thanked him and hung up. The next day, Reese got an invitation to the Palo Alto office, where he went into a tiny conference room with Larry and Sergey, who asked him more technical questions. They paid special attention to his answer concerning the best way to install Linux on bare, white-box (unbranded) computers with blank disk drives and then scale the process to huge numbers of new machines. The founders looked at each other and nodded. Then they invited him into their tiny office.
Larry sat in his desk chair. Sergey sat in his desk chair. Then the Google leaders sheepishly realized that there were no other chairs. “Why don’t you pull up a ball?” they asked him. So Reese was perched on a red physio ball when they asked him to work at Google. The $70,000 salary was the lowest offer of any company he talked to, but he took it anyway. It hadn’t escaped him that between the time of his first interview and the employment offer, Google had announced its $25 million venture capital windfall.
Reese quickly realized that the question about massive Linux installations had not been hypothetical—it was his job to get Google’s jury-rigged machines up and running. At the time Google had about 300 servers, all located at a single colocation facility in Santa Clara, a few miles south of Palo Alto. They occupied about half a cage, which in this facility was a space about the size of a New York City hotel room, bounded by chain-link fence. Reese’s first assignment, and pretty much every assignment after that, dealt with expansion. But he had to do it in the most economical way possible. Larry Page understood why the square-footage rate at colos was so high—“You’re paying for security, fire suppression, air-conditioning, uninterruptible power,” he noted at the time. “The square-footage cost is extremely high—it’s maybe a hundred times what I pay for my apartment.” He told Reese to double the number of servers and fit them all into a single cage. Reese managed to exceed that spec, squeezing not 600 but 800 servers into the cage.
Google was a tough client for Exodus; no company had ever jammed so many servers into so small an area. The typical practice was to put between five and ten servers on a rack; Google managed to get eighty servers on each of its racks. The racks were so closely arranged that it was difficult for a human being to squeeze into the aisle between them. To get an extra rack in, Google had to get Exodus to temporarily remove the side wall of the cage. “The data centers had never worried about how much power and AC went into each cage, because it was never close to being maxed out,” says Reese. “Well, we completely maxed out. It was on an order of magnitude of a small suburban neighborhood.” Exodus had to scramble to install heavier circuitry. Its air-conditioning was also overwhelmed, and the colo bought a portable AC truck. They drove the eighteen-wheeler up to the colo, punched three holes in the wall, and pumped cold air into Google’s cage through PVC pipes.
When Brin and Page hired Reese, they made it clear that they expected exponential growth in Google’s computer power and infrastructure. “They told me that whatever I do, make sure it will work not just for 500 or 5,000 computers but 50,000—that we should build in massive scalability now and that we would have that many computers in just a few years. Which we did,” says Reese.
The key to Google’s efficiency was buying low-quality equipment dirt cheap and applying brainpower to work around the inevitably high failure rate. It was an outgrowth of Google’s earliest days, when Page and Brin had built a server housed in Lego blocks. “Larry and Sergey proposed that we design and build our own servers as cheaply as we can—massive numbers of servers connected to a high-speed network,” says Reese. The conventional wisdom was that an equipment failure should be regarded as, well, a failure. Generally the server failure rate was between 4 and 10 percent. To keep the failures at the lower end of the range, technology companies paid for high-end equipment from Sun Microsystems or EMC. “Our idea was completely opposite,” says Reese. “We’re going to build hundreds and thousands of cheap servers knowing from the get-go that a certain percentage, maybe 10 percent, are going to fail.” Google’s first CIO, Douglas Merrill, once noted that the disk drives Google purchased were “poorer quality than you would put into your kid’s computer at home.”
But Google designed around the flaws. “We built capabilities into the software, the hardware, and the network—the way we hook them up, the load balancing, and so on—to build in redundancy, to make the system fault-tolerant,” says Reese. The Google File System, written by Jeff Dean and Sanjay Ghemawat, was invaluable in this process: it was designed to manage failure by “sharding” data, distributing it to multiple servers. If Google search called for certain information at one server and didn’t get a reply after a couple of milliseconds, there were two other Google servers that could fulfill the request.
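The mechanics are easier to see in miniature. The following sketch is not Google’s code; the timeouts, failure rates, and replica behavior are invented stand-ins. It only illustrates the pattern Reese describes: every shard of data lives on several cheap machines, and a read that stalls or errors on one replica is simply retried against another copy.

```python
import concurrent.futures
import random
import time

# Simulated replicas of one shard of data. On cheap hardware some reads are
# slow and some fail outright; the numbers here are invented for illustration.
def read_from_replica(replica_id, key):
    time.sleep(random.choice([0.002, 0.002, 0.5]))   # usually fast, sometimes stalled
    if random.random() < 0.1:                        # ~10% of machines misbehave
        raise IOError(f"replica {replica_id} failed")
    return f"value-of-{key} (served by replica {replica_id})"

def fault_tolerant_read(key, replicas=(0, 1, 2), per_replica_timeout=0.01):
    """Ask one replica; if it is slow or dead, move on to the next copy."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        for replica in replicas:
            future = pool.submit(read_from_replica, replica, key)
            try:
                return future.result(timeout=per_replica_timeout)
            except (concurrent.futures.TimeoutError, IOError):
                continue  # this copy is unavailable; the same shard lives elsewhere
    raise RuntimeError(f"all replicas failed for {key}")

print(fault_tolerant_read("doc:42"))
```

The real system layered far more on top of this (a master tracking where chunks lived, checksums, automatic re-replication), but the economics show up even in the toy version: once redundancy lives in the software, the reliability of any single cheap disk barely matters.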
“The Google business model was constrained by cost, especially at the very beginning,” says Erik Teetzel, who worked with Google’s data centers. “Every time we would serve a query it cost us money, and generating ad money didn’t happen until later, so Larry and Sergey and Urs set out to build the cheapest infrastructure they could. They didn’t buy the prescribed notion that you must buy your servers from HP and couple it with a Cisco router and software from Linux or Windows. They looked at it holistically, to have control from soup to nuts. That set the stage for this holistic picture where we could do very efficient computing.”
By having only one data center, Google was vulnerable. First, it moved to make sure that it had multiple fiber links into the building—otherwise an errant public works crew could take Google down. “When it comes to a backhoe versus fiber, the backhoe always wins,” says Reese. “So we made sure that we had fiber coming in from different routes.” More significantly, Google needed redundant data centers to keep operating if a catastrophe struck the Exodus center. So the company also took space in a nearby colocation facility in Sunnyvale.
But it wasn’t only redundancy that Google needed at that point; it was speed. Speed had always been an obsession at Google, especially for Larry Page. It was almost instinctual for him. “He’s always measuring everything,” says early Googler Megan Smith. “At his core he cares about latency.” More accurately, he despises latency and is always trying to remove it, like Lady Macbeth washing guilt from her hands. Once Smith was walking down the street with him in Morocco and he suddenly dragged her into a random Internet café with maybe three machines. Immediately, he began timing how long it took web pages to load into a browser there.
Whether due to pathological impatience or a dead-on conviction that speed is chronically underestimated as a factor in successful products, Page had insisted on faster delivery of everything Google did from the very beginning. The minimalism of Google’s home page, allowing for lightning-quick loading, was the classic example. But early Google also innovated by storing cached versions of web pages on its own servers, for redundancy and speed.
“Speed is a feature,” says Urs Hölzle. “Speed can drive usage as much as having bells and whistles on your product. People really underappreciate it. Larry is very much on that line.”
Engineers working for Page learned quickly enough of this priority. “When people do demos and they’re slow, I’m known to count sometimes,” he says. “One one-thousand, two one-thousand. That tends to get people’s attention.” Actually, if your product could be measured in seconds, you’d already failed. Buchheit remembers one time when he was doing an early Gmail demo in Larry’s office. Page made a face and told him it was way too slow. Buchheit objected, but Page reiterated his complaint, charging that the reload took at least 600 milliseconds. (That’s six-tenths of a second.) Buchheit thought, You can’t know that, but when he got back to his own office he checked the server logs. Six hundred milliseconds. “He nailed it,” says Buchheit. “So I started testing myself, and without too much effort, I could estimate times to a hundred milliseconds precision—I could tell if it was 300 milliseconds or 700, whatever. And that happens throughout the company.” (Page himself considered it unexceptional to be able to detect lags of 200 milliseconds, generally thought of as the limit of human perception.)
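Anyone can play the same counting game. Here is a minimal sketch, not anything from Google; the URL is an arbitrary stand-in and the 600-millisecond threshold simply echoes the Gmail anecdote. It times a few page loads and reports them in the unit Page cared about, milliseconds.

```python
import time
import urllib.request

# Illustrative only: time a handful of page loads the way you might
# sanity-check a slow demo. The URL and threshold are arbitrary stand-ins.
def timed_fetch(url, runs=5):
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as response:
            response.read()
        samples.append((time.perf_counter() - start) * 1000)  # milliseconds
    return samples

for ms in timed_fetch("https://www.google.com"):
    verdict = "too slow" if ms > 600 else "ok"
    print(f"{ms:7.0f} ms  {verdict}")
```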