I'm Feeling Lucky: The Confessions of Google Employee Number 59


by Douglas Edwards


  Ben Smith owned the front-end infrastructure that enabled Google to serve the index to Yahoo. Smith and Craig Silverstein were the experts on the Google Web Server (GWS), the system that actually communicated directly with users. This put Smith in the role of riding herd over latency problems.

  "It was the most miserable couple of months in my life," Smith said about the Yahoo buildup. "I'd be driving home with the sun coming up. I'd get four hours of sleep and then head back to work."

  Every neck felt the hot breath of failure, and every throat tasted the bile waiting to erupt if they fell behind schedule. Though not all felt it with the same intensity. "I wasn't too worried about it," Urs told me. "What we promised Yahoo was a lot smaller than our goal in terms of coverage [the 1B index]. The scary things were the reliability parts, not the quality. They can't measure quality."

  Urs knew that ultimately it was just a business deal and that Yahoo had the upper hand. "If Yahoo wanted to walk away," he conceded, "they could walk away. They didn't even need a pretext. It was a pretty one-sided contract."

  Google would be taking a calculated risk by giving Yahoo guarantees, but Urs made that calculation and felt comfortable enough with the odds that he slept easy at night. "We promised ninety-nine-point-five percent uptime," he said, "and we weren't reeeeaaallllly quite there. So you look at the penalties and say, fine, if it occasionally happens, then we'll pay some of these penalties. Hopefully in a good partnership, people are going to be rewarding you for seriously trying. And we were definitely seriously trying."

  The End of "The"

  So what did all this effort produce?

  "Mostly," Jeff said, "we wanted to get many more queries per second served out of these machines. One of the big things we did was completely change the index format to make it much more compact."

  In layman's terms, Google's index was full of spaces that didn't need to be there—it fit the data like baggy pants in constant danger of hitting the ground. Google wasted precious time searching empty pockets to find the bits it needed. One of JeffnSanjay's innovations was to shove most occurrences of a particular word into a single block in the database. Kind of like putting all your nickels in one pocket, dimes in another, so if you see a nickel, you know not to waste time searching through that pocket for a dime. The software searching the index could tell quickly from the block header that it didn't need anything in that block and skip ahead, which made each machine faster.
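
  The block-header idea can be sketched in a few lines. This is only an illustration under assumptions of my own, not Google's actual index format: the `Block` class, its `word` header field, and the `lookup` function are hypothetical names, and a real index would store compressed postings on disk rather than Python lists.

```python
from dataclasses import dataclass


@dataclass
class Block:
    word: str            # header: the single word this block holds (assumed layout)
    doc_ids: list[int]   # payload: ids of documents containing that word


def lookup(blocks: list[Block], word: str) -> tuple[list[int], int]:
    """Return the matching doc ids and the number of block payloads read."""
    hits: list[int] = []
    payload_reads = 0
    for block in blocks:
        # Check the header first; a mismatch means the payload is never touched.
        if block.word != word:
            continue
        payload_reads += 1
        hits.extend(block.doc_ids)
    return hits, payload_reads


index = [
    Block("hotels", [1, 4, 9]),
    Block("madrid", [4, 7]),
    Block("hotels", [12]),
]
madrid_hits, madrid_reads = lookup(index, "madrid")
```

  Only one of the three payloads is read for "madrid"; the other two blocks are dismissed on their headers alone, which is the nickel-and-dime pocket check described above.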

  "We improved that," Jeff said, "and we added skip tables to skip even larger chunks than just blocks." The goal was to minimize the number of times Google read each hard drive, because physically moving a head across a disk is far, far slower than doing things within an electronic circuit. JeffnSanjay rewrote the disk-scheduling systems to give each disk its own set of code. That cut search times by thirty to forty percent. A thirty-percent improvement was like running a four-minute mile in under three. A stunning accomplishment. But it wasn't enough.

  So Jeff and Sanjay got rid of "the."

  "The" is the most common word in English and conveys little useful information. JeffnSanjay decided to ignore it, freeing up one percent of the space being used by the index. The only downside? It became infinitely harder to find information about the eighties alternative rock band "The The." Engineering lives and dies by its tradeoffs.

  To keep the failure of a single machine from corrupting the data and requiring a restart of the entire crawl, the war room team implemented checkpointing, which saved the state of the crawl so that if things blew up they could go back to the last checkpoint instead of starting over from the beginning.
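
  Checkpointing in general looks something like the sketch below. It is a minimal illustration, not the war room's code: the `crawl` function, the JSON checkpoint file, and the `fail_at` switch that simulates a machine dying are all assumptions made for the example.

```python
import json
import os
import tempfile


def crawl(urls, checkpoint_path, fail_at=None):
    """Process urls in order, saving progress after each one."""
    done = []
    # Resume from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    for position in range(len(done), len(urls)):
        if position == fail_at:
            raise RuntimeError("simulated machine failure")
        done.append(urls[position])  # stand-in for fetching the page
        # Write to a temporary file, then rename, so a crash during the
        # write cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)
    return done


urls = ["a.example", "b.example", "c.example", "d.example"]
path = os.path.join(tempfile.mkdtemp(), "crawl.json")
try:
    crawl(urls, path, fail_at=2)
except RuntimeError:
    with open(path) as f:
        saved = json.load(f)  # progress that survived the failure
resumed = crawl(urls, path)   # picks up at the third URL, not the first
```

  The second call skips the two URLs already recorded and finishes the rest, which is the whole point: a failure costs the work since the last checkpoint rather than the entire crawl.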

  With the hardware on its way to the data centers and the crawler, the indexer, the ranker, and the serving side progressing, only one issue remained. Yahoo wanted its search results to appear current, so it insisted that at least part of its index be updated on a daily basis.

  Think of a card shark at a blackjack table. She carefully arranges the cards to ensure that everyone gets a good hand, but not as good as hers. She starts dealing around the table. Now imagine her trying to add new cards to the deck in her hand as she deals, improving all the results, including her own. It was that kind of problem.

  Google's PageRank algorithm required a full day and a half to score an index. Adding additional information every twenty-four hours meant the pageranker would have to run faster, while integrating the new data in all the appropriate places. "It is a much harder problem to update an index every day than it is to have a static index," Jeff explained. "There are many more moving pieces to deal with."

  Jeff was maxed out. Sanjay was overloaded. Ben Gomes had a full plate. Developing an incremental indexing system could take a dedicated team of programmers years, and there were only weeks before the contract went into effect. Larry and Sergey, understanding the desperate need, threw the resource floodgates open and gave Urs carte blanche to do what was needed. Never one to waste an opportunity, Urs went all out. He hired a guy.

  "I had no experience with crawls," Anurag Acharya recalls, "and Google didn't tell people what they would be working on." Urs had sung his siren song at perfect pitch and persuaded his former UC Santa Barbara colleague to abandon academia for Silicon Valley.

  On his first day, Anurag focused on part of the indexing system. That same evening, Urs stopped by for a chat about his next assignment.

  "I'll take a look at the logs," Anurag suggested, "and see what problems there might be."

  "Why don't you do incremental indexing for a while," Urs casually replied, "and then we'll see?"

  "I say 'Yeah,'" Anurag told me about that conversation, "like I know what doing incremental indexing really means. So there went the next five months."

  Google didn't haze newbies, but Anurag must have felt as if he'd been led blindfolded into a room full of drunken frat boys with wooden paddles. He was hit with the complex issues of how to crawl additional sites, rank them appropriately, and then integrate them seamlessly into the existing index.

  "I don't think I was brought in specifically for the index," he said. "It just happened. I showed up at that point, and at that point, those were the problems."

  "Anurag started and a couple of us in the company knew him," said Ben Smith, who had been Anurag's student at UCSB, "and he basically just disappeared. He wouldn't come down to lunch. He was always in his office. He was there late for two months. What is up with this guy? And then Urs called me into his office and said, 'This is what's coming. Soon. Can you help him out?' Okay. Now I understood why."

  Smith knew exactly what he was getting into. The first time Urs had asked him to take on incremental indexing had been almost a year earlier, on his first day as an intern at Google. Smith had refused. "I said," he told me with a laugh, "'That's way too big for a summer project, nobody really knows how to do that. I don't wanna tackle that.'" Now he and Anurag would have to figure it out in a matter of weeks.

  Smith had already sped up Google's response rate by improving the search engine's ability to cache queries. The first time someone searched for "hotels in Madrid," Google searched the entire index, then stored the query and the results it had found. The next time someone searched for "hotels in Madrid," Smith's code delivered the same results from memory, without having to search the index. Instead of accessing hundreds of machines, a cached query used only one—an enormous reduction in the cost of search. Unfortunately for Smith, the new incremental index threatened to undo his work, because a continuously refreshed index would quickly make cached queries obsolete.
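
  The caching scheme, and the reason a refreshed index threatens it, can be shown with a small sketch. The `CachedSearch` class and its method names are hypothetical, not Smith's code, and the `search_index` argument stands in for the expensive search across hundreds of machines.

```python
class CachedSearch:
    def __init__(self, search_index):
        self._search_index = search_index  # the expensive full-index search
        self._cache = {}                   # query text -> stored results
        self.index_searches = 0            # how often the full index was hit

    def search(self, query):
        # First time: search the whole index and remember the answer.
        if query not in self._cache:
            self.index_searches += 1
            self._cache[query] = self._search_index(query)
        # Every later time: answer from memory.
        return self._cache[query]

    def invalidate(self):
        # A refreshed index makes every stored answer potentially stale.
        self._cache.clear()


engine = CachedSearch(lambda query: ["result for " + query])
engine.search("hotels in Madrid")   # searches the index
engine.search("hotels in Madrid")   # served from the cache
searches_before_refresh = engine.index_searches
engine.invalidate()                 # the index was updated
engine.search("hotels in Madrid")   # has to search the index again
```

  Emptying the whole cache on every refresh is the bluntest possible policy, chosen here only to keep the example short. It shows why the more often the index changes, the less work the cache saves.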

  "Anurag cranked maybe six to eight weeks and he had something that kinda worked," recalls Smith. "He wrote a new server called 'the mixer,' which hid the fact that we were talking to two different indices [a daily index and the main index] and mixed them together."*

  "Anurag and I were very stressed," Smith went on. "For whatever reason, we had to keep it quiet." They couldn't talk about what they were doing or why they were in the office every night after even the vampire coders had gone home. "Many, many days, we'd leave somewhere between three and five a.m. That was the time when Anurag and I could try to plug in our new system, because that was when Google had the least amount of search traffic. There were a lot of days where it was, 'Let's turn it on and see how it works,' because we didn't really know. The mixer would talk to the cache and the mixer would talk to the incremental. And sometimes the mixer would melt down and sometimes the incremental would melt down, because it didn't have enough capacity, and we'd say, 'Okay. Why? What happened and how do you fix it?'"

  The hours and the stress shaved tolerances among the engineers until little remained to insulate their frustrations from the friction of the outside world.

  "For a large fraction of my career here," Smith explained to me, "I worked on infrastructure or on the serving side. Larry seemed much more interested in the product aspect of things. He wasn't interested in the infrastructure side of the Yahoo deal—he didn't even know what was going on regarding it. I remember one time he wandered into my office and made some crack like 'You need to relax more,' and I just chewed him out."

  Because the 1B index devoured almost all the available machines, only a few hundred remained for the incremental team to use. Even if ops could have built them faster, there were no data centers in which to put them. The team struggled on as the last days of May passed and July loomed over the horizon like the Imperial death star.

  Yahoogle

  The final deadline was a week away.

  The machines were built, the data centers filled. The crawler had worked. The indexer had worked. The pageranker had worked. Google had identified a billion URLs and now could search them. We had the superior technology. The Yahoo deal proved we had the business smarts to go with it. It was time to take our light from under its bushel and show it to the world.

  At 2:59 a.m. on Monday, June 26, 2000, Cindy sat in her office, her fingers poised on the keyboard, waiting to hit Send. On her screen was a press release announcing that Google was now the largest search engine on the planet. A minute later, just in time to feed the gaping morning news maw on the East Coast, the message was on its way. Cindy gave the business and technology editors an hour to digest that tantalizing morsel, then served the pièce de résistance: a brief announcement that Google had signed a contract to replace Inktomi as the search technology provider for Yahoo. It was the biggest accomplishment in our company's short life.

  The experts were underwhelmed.

  "Analysts agreed that the announcement may have hurt Inktomi's pride," CNET reported, "but they said the implications for its revenues and profitability are mild ... That side of its business is a money loser that has increasingly played second fiddle to its exploding networking-services division. The search market in general, meanwhile, remains a low-margin, commodity business ... Dick Pierce, Inktomi's chief operating officer, said ... losing the portal as a search licensing partner ... will have 'little impact with respect to profitability.'"

  Wall Street didn't buy the expert view. In fact, it sold heavily. By the end of the day, Inktomi's share price had fallen eighteen percent. This despite the fact that Yahoo had thrown Inktomi a bone, naming them a "corporate search" partner for an initiative launched the same day—because everyone knew the real money in search was on the corporate side.

  With impeccable timing, I had planned my first vacation to coincide with the most momentous week in Google's history. Sunday night I had trouble falling asleep in our Lake Tahoe hotel, and on Monday I was up early flipping through the cable channels looking for news about the blockbuster Yahoogle deal as my family snuggled under their blankets. Much to my surprise, it wasn't the lead story on any of the major networks and, unbelievably, it didn't make headlines in the Tuesday papers. The San Francisco Chronicle had a brief mention in the business section and the Mercury News had slightly more, yet even that thin coverage signified that things had changed. Up to that point, the mainstream media had portrayed Google as another quirky startup and California cultural oddity, with an emphasis on the wacky ways of western entrepreneurs. Now, however, Google was a business-section item, suggesting that the company should be taken seriously as a corporate entity.

  We didn't care what the press said. We knew it was a major win. The Googlers at the Plex celebrated accordingly. On Monday Charlie and his crew prepared a luau lunch and served it up al fresco. The grass was green and freshly mown, the food hot and plentiful, and the spirits high. Music filled the air and margaritas sloshed in paper cups hoisted in salute as Larry and Sergey, wearing plastic leis, introduced Yahoo co-founder David Filo. Filo eschewed the customary rhetorical pats on the back in favor of a brief speech that boiled down to, "Thank you. We have a lot to do. You should really get back to work." Perhaps his absent partner, Jerry Yang, was the party guy.

  Susan Wojcicki handed out t-shirts she had secretly ordered proclaiming "Google and Yahoo got lucky"—Google's first official commemorative garment. If you want to make a killing trading tech stocks, find a friend in the t-shirt business between San Francisco and San Jose and ask to be alerted any time a rush order gets placed. Conventional wisdom in Silicon Valley states, "If it's not on a t-shirt, it didn't really happen."

  Copy That, Good Buddy

  Saturday, July 1. Google was serving the 1B index to all its own users from a new West Coast data center. All that remained was to load a copy of the index into the new data center in Virginia and the old one at Exodus. The Virginia transmission went smoothly, but when ops tried sending a copy to Exodus, it failed. The connection between the data centers couldn't be established, so the data couldn't be sent. Without a copy of the index, the third data center would be useless and Google would be unprepared to handle Yahoo's queries, which were due to start flooding in within forty-eight hours.

  Jim, Schwim, and Zain Kahn piled into Jim's ten-year-old Volvo station wagon and sped off to check it out. The network line between the data centers hadn't gone live yet. Instead of relying on outsiders to activate the cable, they opted for a backup system known among technicians as "sneakerware."

  "We just ripped out the eighty machines that had the index," recalls Schwim, who helped load the machines that held Google's future into the Volvo. The techs climbed in with the hard drives and drove them to Exodus, where they piled them on the floor of the already overcrowded cage. "We stacked up eighty machines on the ground, with nothing around them, not even cabinets, and we plugged them into these ridiculous power strips so we could copy the index off. You have to imagine someone working at Inktomi thinking, we have this beautiful cage and there's a pile of ... 'bleep,' and they got the contract?" As one of the ops guys remarked to me later, "Never underestimate the bandwidth of a truck full of hard drives."

  While Inktomi's cage may have been beautiful, it wasn't completely secure. Google didn't have enough outlets to plug in all their machines, so Zain crawled under the raised floor and snaked out an unused cable from the Inktomi side of the fence. It would have been the ultimate indignity had anyone from Yahoo's jilted partner been around to witness it, but to Google, it was just an opportunity to improvise.

  The Wake of the Flood

  Early on the morning of Sunday, July 2, Howard Gobioff turned his black Honda Nighthawk into the Google parking lot, killed the motor, pulled off his helmet, shook loose his ponytail, and climbed the stairs. Inside, Romanian roller-hockey enforcer Bogdan Cocosel had been up all night as the push propagated the new index to the thousands of servers in all the data centers. Bogdan nursed the system and, when it appeared to hiccup, cursed it with enthusiasm. Howard sat at the terminal to relieve him.

  To those inside the Googleplex, it was a glorious new dawn. There was no going back now. Howard watched as the index skated along the ragged edge of disk capacity. The push held throughout the day, and by the next morning the billion-URL index at last stood locked and loaded and ready to serve. Yahoo would initiate the switchover at eight p.m.

  Monday, July 3, 7:45 p.m. The team floated in and around Urs Hölzle's office, anticipating the opening of the spillway and the rush of the incoming torrent of queries into Google's query stream.

  Eight p.m. came, but the flood did not appear. Not even a trickle came through. There were no queries from Yahoo being passed to Google. Had Yahoo reconsidered? Had Inktomi somehow sabotaged the deal? Urs called Udi. Yes, it was supposed to have happened at eight p.m. Unfortunately, Yahoo was having problems reconfiguring the DNS (domain name server) that would tell the queries to go to Google instead of Inktomi. No one at Yahoo had changed a DNS entry in quite some time and they had forgotten how to do it.

  "You should be seeing it now," Udi told Urs.

  "Hmmm ... No."

  "Now?"

  "Still no. Try changing your DNS expiration time."

  A pause.

  "How about now?"

  Yahoo's traffic came sweeping into Google's data centers and Google itself seemed to swell in magnitude, to be lifted on a crest of queries to the upper tier of online search companies. A loud pop was heard and a cheer went up from the assembled Googlers. Someone had uncorked a single bottle of Dom Perignon and was passing around cups with a sip for each of the dozens of people on hand.

  Urs was even more succinct than his Yahoo counterpart had been. "To something!" he said, raising his glass.

  The engineers downed their champagne in a gulp and dug into the bag of Big Macs Craig Silverstein had brought in from McDonald's. They wiped their greasy fingers on their jeans and then went home to sleep for many hours. The changeover passed flawlessly. Not a single query was lost. Yahoo had licensed only a portion of Google's full data set, a distinction that would probably make no difference to most Yahoo searchers but meant that Google.com retained absolute superiority.

 
