by Steven Levy
Sergey Brin even put a label on his cofounder’s frustration at the tendency of developers to load more and more features into programs, making them run way too slowly. Page’s Law, according to Brin, was the observation that every eighteen months, software becomes twice as slow. Google was determined to avoid this problem. “We want to actually break Page’s law and make our software increasingly fast over time,” says Brin.
“There’s definitely an obsession with speed here,” says Buchheit. “With most people in the world, when you complain that something is too slow, they might say, ‘Well, you just need more patience.’ At Google, they’re like, ‘Yeah, it makes me want to tear my eyes out!’”
The data in Google’s logs justified the obsession with speed. When things go slowly, says Urs Hölzle, “people are unconsciously afraid of doing another search, because it’s slow. Or they are more likely to try another result than rephrase the query. I’m sure if you ask them, none of them would tell you, but in aggregate you really see that.” On the other hand, when you speed things up, they search more. Hölzle would cite Google’s experience when the company boosted the performance of its Picasa web-based photo service, making slide shows run three times as fast. Even though there was no announcement of the improvement, traffic on the site increased 40 percent the first day it was implemented. “It just happened,” says Hölzle. “The only thing we changed was the speed.”
In 2007, Google conducted some user studies that measured the behavior of people whose search results were artificially delayed. One might think that the minuscule amounts of latency involved in the experiment would be negligible—they ranged between 100 and 400 milliseconds. But even those tiny hiccups in delivering search results acted as a deterrent to future searches. The reduction in the number of searches was small but significant, and it was measurable even at just 100 milliseconds (one-tenth of a second) of latency. What’s more, even after the delays were removed, the people exposed to the slower results would take a long time to resume their previous level of searching.
(Microsoft found a similar effect when it conducted its own tests with its Bing search engine. The Bing experiments also showed that when results are delayed, users respond with their own latency, taking longer to click on links after a search is completed. Presumably, during the half second or more that the results are delayed, the users have begun to think about something else and have to refocus before they get around to clicking on a result.)
In 2008, Google issued a Code Yellow for speed. (A Code Yellow is named after a tank top of that color owned by engineering director Wayne Rosing. During a Code Yellow, a leader is given the shirt and can tap anyone at Google and force him or her to drop a current project to help out. Often, the Code Yellow leader escalates the emergency into a war room situation and pulls people out of their offices and into a conference room for a more extended struggle.) This Code Yellow kicked off at a TGIF where Hölzle metered the performance of various Google products around the world, with a running ticker on the big screen in Charlie’s Café pinpointing the deficiencies. “You could hear a pin drop in the room when people were watching how stunningly slow things were, like Gmail in India,” says Gabriel Stricker, a Google PR director. After the Code Yellow, Google set a companywide OKR (Objectives and Key Results, the metric Google uses to set goals) to fight latency. To help meet its goals, the company created a market-based incentive program for product teams to juice up performance—a cap-and-trade model in which teams were assigned latency ceilings, or maximum response times. If a team didn’t make its benchmarks, says Hölzle, it accrued a debt that had to be paid off by barter with a team that exceeded its benchmarks. “You could trade for an engineer or machines. Whatever,” he says.
The metric for this exchange was, oddly enough, human lives. The calculation goes like this: average human life expectancy is seventy years. That’s about two billion seconds. If a product has 100 million users and unnecessarily wastes four seconds of a user’s time every day, that’s more than a hundred people killed in a year. So if the Gmail team wasn’t meeting its goals, it might go to the Picasa team and ask for ten lives to lift its speed budget into the black. In exchange, the Gmailers might yield a thousand servers from their allocation or all their massage tickets for the next month.
But you couldn’t keep borrowing forever. “People have definitely been yelled at,” says Hölzle. If a team got too deep in the hole, the latency police would close down the casino. “There’s a launch gate where if you’re too far in the negative, you can’t launch features. From that point on, you need to focus on latency alone until you’re close to your goal.”
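The arithmetic behind the lives metric, and the debt-and-barter bookkeeping built on top of it, is easy to sketch. The back-of-the-envelope Python below uses only the figures quoted above (a seventy-year life span, 100 million users, four wasted seconds a day), and the exact total it prints depends on how those inputs are rounded; the team names, budget figures, and the trade itself are invented for illustration and are not Google's actual accounting.

```python
# Back-of-the-envelope sketch of the "human lives" latency metric and the
# cap-and-trade bookkeeping described above. Only the quoted figures
# (a 70-year life span, 100 million users, 4 wasted seconds per day) come
# from the text; the team names, budgets, and the trade are invented.

SECONDS_PER_LIFE = 70 * 365 * 24 * 60 * 60    # roughly 2.2 billion, "about two billion seconds"

def lives_per_year(users, wasted_seconds_per_user_per_day):
    """Convert wasted user time into the grim currency of lifetimes per year."""
    return users * wasted_seconds_per_user_per_day * 365 / SECONDS_PER_LIFE

print(round(lives_per_year(100_000_000, 4)))  # about 66 with these particular roundings

# A team over its latency ceiling accrues a debt in lives, settled by barter
# with a team that beat its benchmarks -- servers, massage tickets, whatever.
budgets = {"gmail": -10.0, "picasa": 25.0}    # positive = lives under budget, negative = a debt

def trade(debtor, creditor, lives):
    assert budgets[creditor] >= lives, "creditor has no surplus to sell"
    budgets[debtor] += lives
    budgets[creditor] -= lives

trade("gmail", "picasa", 10)                  # Gmail buys ten lives from Picasa
print(budgets)                                # {'gmail': 0.0, 'picasa': 15.0}
```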
Back in 2000, Google wanted to get speedier by setting up data centers in locations closer to its users. Its first priority was getting servers on the East Coast of the United States. By the spring of that year, Google was occupying space in a colo in northern Virginia. The tricky part of setting up in a new facility was loading all those thousands of servers with the indexes. That involved terabytes of data, which was potentially going to force Google to pay a huge amount of money to the bandwidth provider that owned the fiber. “Networking was very expensive,” says Hölzle. “And our data push would take twenty hours at a gigabyte per second—that would cost us something like $250,000 a month.” To save money, Google devised a trick that exploited a loophole in the billing system for data transfer. Broadband providers used a system known as the 95th Percentile Rule. Over the period of a month, the provider would measure how much data was moving, automatically taking a reading every five minutes. In order to discard unusual spikes in activity, the provider would, when calculating the billing rate, lop off the top 5 percent of measurements and bill the customer at the rate of the 95th percentile.
Google’s exploitation of the rule was like the correct answer to a trick question in one of its hiring interviews. It decided to move all its information during those discounted spikes. “We figured out that if we used zero bandwidth all month, except for thirty hours once a month, we would be under that 5 percent,” says Reese. For two nights a month, from 6 p.m. to 6 a.m. Pacific time, Google moved all the data in its indexes from west to east. “We would push as fast as we could, and that would cause massive traffic to go across, but it was during the lull hours for them…. And of course, the bill came out to be nothing,” says Reese, “because when they lopped off the top 5 percent, our remaining bandwidth was in fact zero, because we didn’t use any otherwise. I literally turned off the router ports for twenty-eight or twenty-nine days a month.”
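How the 95th Percentile Rule produced a bill of nothing is easy to see in a short sketch. The five-minute sampling and the top-5-percent lop come from the description above; the sample values and the 1 Gbps push rate are placeholders, not the provider's real billing code or Google's actual traffic numbers.

```python
# Simplified sketch of 95th-percentile ("burstable") billing as described
# above. Sample values and the 1 Gbps push rate are placeholders.

def billable_mbps(samples):
    """Sort the month's five-minute samples, drop the top 5 percent, bill at the highest one left."""
    ordered = sorted(samples)
    keep = int(len(ordered) * 0.95)   # samples that survive the lopping
    return ordered[keep - 1] if keep else 0.0

# A 30-day month sampled every five minutes: 30 * 24 * 12 = 8,640 samples.
# The discarded top 5 percent is 432 samples, i.e. about 36 hours.
samples = [0.0] * (30 * 24 * 12)

# Google's pattern: idle all month except roughly 30 hours of flat-out pushing,
# which is 30 * 12 = 360 samples -- fewer than the 432 that get thrown away.
for i in range(30 * 12):
    samples[i] = 1000.0               # pretend 1 Gbps during the overnight pushes

print(billable_mbps(samples))         # 0.0, so the bill "came out to be nothing"
```

As long as the bursts fit inside the discarded top 5 percent of samples, the 95th-percentile rate, and with it the bill, stays at zero.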
Eventually, the contract expired and Google negotiated a plan where it actually paid for its bandwidth. But by that time it had decided how to end the need for such contracts entirely: it began to buy its own fiber.
Fiber-optic cable was the most efficient, robust, and speedy means of moving data. Just as Google had taken advantage of the oversupply of data centers in the wake of the dot-com bust, it had a great opportunity to buy fiber-optic cable cheap. In the 1980s and 1990s, a raft of optical networking companies had made huge investments in fiber optics. But they had overestimated the demand, and by the early 2000s, many were struggling or going broke. Google began buying strategically located stretches of fiber. “We would want to pick up pieces that would connect our data center, so we’d identify the owner, negotiate, and take it over,” says Chris Sacca, who did many of the deals. “Then we’d put optical networking equipment on one end in our data center, the same equipment on the data center at the other end, and now we’re running that stretch of fiber. We were paying ten cents on the dollar.” Since fiber-optic cable had huge capacity, Google then made arrangements with broadband companies to fill in the gaps it didn’t own. “We swapped out strands with other guys,” says Sacca.
By the time Google finished with its fiber push, it was in a unique situation. “We owned the fiber. It was ours. Pushing the traffic was nothing,” says Sacca. How much fiber did Google own? “More than anyone else on the planet.”
In 2001, Exodus suffered financial disarray, and some of its data centers fell into the hands of private investors. Google began renting entire data centers from the new owners. As the sole tenant, it had the opportunity to revamp everything that went inside the shell. Its biggest operation was in Atlanta, a former Exodus facility with 200,000 square feet of floor space. It was big enough for Google to maintain an assembly operation where workers could build the servers on the spot.
But there was only so much that could be done when someone else owned the facility. Google’s engineers knew that if they had a chance to design their facilities from the ground up—beginning with the site selection—they could be much more efficient. By mid-2003, Google reluctantly began planning to build its own data centers. “It was a big step,” says Hölzle, “but not a welcome truth. It’s nice if you have something you don’t have to worry about, and we’d been very successful in buying space in bankrupt data centers.” Looking into the future, though, Google saw that the period of oversupply in data center space was coming to an end, and after the current cheap contracts expired, prices would rise to perhaps three times what Google was currently paying. Those high costs would more accurately reflect the true costs that the hosts paid, particularly in terms of power.
Google considered the existing data centers horribly inefficient, particularly in the way they gobbled up power. “They wasted power, both by bad practice and bad buildings,” says Hölzle. If Google designed and built its own data centers, it would be free to innovate new ways to keep costs down. In some cases, all it had to do was apply existing ideas that no one had yet put into practice. There was a lot of unheeded literature about how to cool computers. One paper outlined a potential back-to-back arrangement of servers where the exhaust pipes faced each other and created a warm aisle between the racks. These would alternate with cool aisles, where the intakes on the front of the servers would draw on cooler air. Google tried to implement such arrangements in its colos, but the facilities managers complained. Their job, they would insist, was keeping the temperature in the building at a steady 68 degrees.
In its own data centers, Google could not only implement these energy-saving ideas but also take extreme measures to separate the hot and cold air. Its facilities would have enclosed rooms that segregated the hot air. Inside those separate rooms, the temperature would be much higher—perhaps 120 degrees or even more. If someone had to go into one of those hot rooms, the area could be cooled down temporarily so the person wouldn’t melt while trying to swap out a motherboard. Even in the cold aisles, Google would raise the temperature. “You can save 20 percent just by raising the thermostat,” says Hölzle. “Instead of setting the cold aisle temperature to 68 you can raise it to 80.”
Doing so would put a lot of stress on the equipment, but Google’s attitude was, so what if stuff broke? “You counted on failure,” says Chris Sacca. “We were buying nonspec parts [components rejected for commercial use because they were not rated to perform at high standards], so we didn’t need to coddle them.”
With all these hot and cold rooms, Google had a modular approach to data centers, and it even wondered whether it would make sense to build a data center without a traditional shell, just a scaffolding for stacking truck-size weatherproof containers. For a while, Google even ran a test of a containerized data center in the underground parking lot under Building 40, where Charlie’s Café was located. It was covered with a big tarp to hide its purpose, and a special security guard was posted to make sure no one except the very few Googlers permitted to visit could get a glimpse of the experiment. (Eventually Google would adopt a plan where the modules would reside inside a huge building shell, and a peek inside its centers would reveal what looked like an indoor trailer park. In ensuing years, other companies, including Microsoft, would adopt the container model for some of their data centers.)
Google also mulled over some radical approaches to save energy. What if you put a data center in Iceland and used only native geothermal power? Could you put a data center above the Arctic Circle and use outside air to keep temperatures down? How about buying an old aircraft carrier and cooling it with seawater? There was even one suggestion to use a big old blimp filled with helium as a data center.
Google was still reluctant to take the big step—and the enormous capital expense—of building its own data centers, so it explored the idea of having one of the companies already in the hosting business build a facility especially for Google, collaborating with the company to implement all the efficiencies. None was interested. “People just didn’t believe they should change, basically. There wasn’t a willing partner,” says Hölzle. It was a challenge even for Google to find an engineering firm flexible enough to violate the standard methodology and build things Google style. “We interviewed a number of companies, and they would say, ‘This is crazy talk. We’re professionals. We know how to build facilities,’” Hölzle recalls. Ultimately Google hired a small East Coast company called DLB Associates. “I think they were actually not convinced at all in the beginning, but they were willing to collaborate,” says Hölzle.
So Google, a company that had once focused entirely on building Internet software, prepared to begin a building program that would lead it to construct more than a dozen billion-dollar facilities over the next few years. A key Googler in the process was Chris Sacca. Not quite thirty at the time, Sacca had already been through several careers. Born to a working-class family in Buffalo, he’d been a ski bum, a stock speculator, and a lawyer. During law school at Georgetown, he’d taken a three-month break to help with El Salvador’s telecom privatization project, and after that he thought he could make extra money as a consultant. (To buff up his status, he gave his one-man consultancy a fancy name: The Salinger Group.) When his stock trading put him $4 million in the hole, he took a job at a Silicon Valley law firm, where insistent networking put him on Google’s radar. “They wanted one person who could identify, negotiate, draft, and close data centers,” he says. “My job was to get in the car, get on a plane, go find data centers to buy, lease.”
Now Sacca was charged with finding sites where Google could build its centers. It was a process conducted with utmost stealth. In 2004, prior to the IPO, the company was still hiding its success. “Google didn’t want Microsoft to know how big search was,” says Sacca. “And if you knew how many computers Google was running, you could do some back-of-the-envelope math and see how big an opportunity this was.”
There was the additional consideration that if people in a given locality knew it was Google they were dealing with, they might be less generous in giving tax breaks. In any case, when seeking out locations for a Google data center, Sacca and his colleagues did not let on who employed them. Sacca frequently used the name of his made-up consultancy operation, The Salinger Group. Other times he’d say he was from Hoya Technologies. (“Hoya” is the name of the sports teams at Georgetown, where he went to law school.) At some point, Larry Page noted that the flaw with those names was that people could all too easily Google them; something vaguer was needed. So Sacca became a representative of Design LLC. (“LLC” stands for “limited liability company.”) It was so generic that there were millions of search results for that name.
The basic requirements for a data center were clear: land, power, and water. The last was important because cooling was to be done evaporatively, a process that required running millions of gallons of water through refrigerator-style “chillers” that drop the temperature and then send the cool water through “jackets” that hug the server racks. Then the water—heated up by now—gets run through massive cooling towers, where it trickles down, evaporates, and gets collected back into the system. (The air-conditioning is generally reserved for backup.) All of this required massive power, and before a shovel could be stuck into the ground, it had to be determined whether the local electric utility could provide sufficient amps to power a small city—at bargain rates.
Focusing on Oregon, Sacca and a colleague used maps of power grids and fiber-optic connections to find potential locations. Then Sacca would drop into the local development office and power utility. To make sure someone was at the office that day, he would call from the previous town. “If we were in Coos Bay on Monday, we’d call Tillamook—‘Hey, I’m going to be there Wednesday, will you be there?’” And on Wednesday, a ragged six-foot-tall guy in shorts with his shirttail out—Sacca—would go to some double-wide trailer where the development people worked. “How’s it going?” he’d ask. “I’m up here doing some site selection.” Soon into the conversation he would identify himself as being from a company called Design LLC. And eventually he would reveal his intent to build a massive, massive utility. Uh, do you have any property in this town that has a contiguous fifty to sixty acres with access to power from Bonneville? “It was hilarious,” recalls Sacca. “There’s no reference check you can do, but here’s this kid rolling in there claiming he’s going to spend millions and millions of dollars and he needs your help.”