Tubes: A Journey to the Center of the Internet

by Andrew Blum


  It used to be that we kept our data on our (actual) desks, but as we’ve increasingly given up that local control to far-off professionals, the “hard drive”—that most tangible of descriptors—has transformed into a “cloud,” the catchall term for any data or service kept out there, somewhere on the Internet. Needless to say, there is nothing cloudlike about it. According to a 2010 Greenpeace report, 2 percent of the world’s electricity usage can now be traced to data centers, and that usage is growing at a rate of 12 percent a year. By today’s standards, a very large data center might be a 500,000-square-foot building demanding fifty megawatts of power, which is about how much it takes to light a small city. But the largest data center “campus” might contain four of those buildings, totaling more than a million square feet—twice the size of the Javits Center in New York, the same as ten Walmarts. We’ve only just begun building data centers, and already their accumulated impact is enormous.

  I know this intuitively, because a lot of this data is mine. I have gigabytes of email storage in a data center in Lower Manhattan (and growing every day); another sixty gigabytes of online backup storage in Virginia; the cumulative traces of countless Google searches; a season’s worth of episodes of Top Chef downloaded from Apple; dozens of movies streamed from Netflix; pictures on Facebook; more than a thousand tweets and a couple hundred blog posts. Multiply that around the world and the numbers defy belief. In 2011, Facebook reported that nearly six billion photos were uploaded to the service every month. Google confirms at least one billion searches per day—with some estimates tripling that number. All that has to be processed and stored somewhere. So where does it all go?

  I was less interested in the aggregate statistics than in the specifics, the parts of all this online detritus that I could touch. I knew that data centers that once occupied closets had expanded to fill whole floors of buildings; that floors had grown into subdivided warehouses; and that warehouses had transformed into purpose-built campuses, as in The Dalles. What had before been afterthoughts, physically speaking, had now acquired their own architecture; soon, they’d need urban planning. A data center was once like a closet, but now was more like a village. The ever-increasing size of my own appetite for the Internet made it clear why. What was less clear to me was where. What were these enormous buildings doing way up on the Columbia Plateau?

  The Internet’s efficiency at moving traffic—and the success of exchange points at serving as hubs for that traffic—has left the question of where data sleeps remarkably open-ended. When we request information over the Internet, it has to come from somewhere: either from another person or from the place where it’s stored. But the everyday miracle of the Internet allows all that data, in theory, to be stored anywhere—and still the stuff will find its way back to us. Accordingly, for smaller data centers, convenience rules: they are often close to their founders or their customers, or whoever needs to visit them to tweak the machines. But as it happens, the bigger a data center gets, the thornier the question of location becomes. Ironically for such massive, factory-like buildings, data centers can seem quite loosely connected to the earth. But still they cluster.

  Dozens of considerations go into locating a data center, but they almost all come down to making it as cheap as possible to keep a hard drive—much less 150,000 of them—spinning and cool. The engineering of the building itself, especially how its temperature is controlled, has a huge impact on its efficiency. Data center engineers compete to design buildings with the lowest “power usage effectiveness,” or “PUE,” which is sort of like the gas mileage in a car. But among the most important external variables in a good PUE is a building’s location. Just as a car will get better gas mileage in a flat, empty place compared with a hilly city, a data center will run more efficiently where it can draw in outside air to cool its spinning hard drives and powerful computers. But because data centers can be anywhere, seemingly small differences become amplified.
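
  To pin the idea down: PUE is simply the total power a building draws divided by the power that actually reaches the computing gear, so a perfect score is 1.0 and everything above it is overhead, mostly cooling. A minimal sketch in Python, with hypothetical figures:

```python
# A minimal sketch of the PUE ratio described above; all figures are
# hypothetical. PUE = total power entering the building divided by the
# power that reaches the IT equipment, so 1.0 is the unreachable ideal
# and everything above it is overhead (mostly cooling).

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    return total_facility_kw / it_equipment_kw

# A site that leans on energy-hungry chillers in a hot climate...
print(pue(total_facility_kw=20_000, it_equipment_kw=10_000))  # 2.0
# ...versus a cold, dry site that can cool with outside air.
print(pue(total_facility_kw=11_000, it_equipment_kw=10_000))  # 1.1
```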

  Siting a data center is like the acupuncture of the physical Internet: places are chosen with pinpoint precision to exploit one characteristic or another. As competitive companies thrust and parry for advantage, it becomes clear that some places are better than others, and the result is geographic clusters. The largest data centers begin piling up in the same corners of the earth, like snowdrifts.

  Michael Manos has built more data centers than perhaps anyone—by his count around a hundred, first for Microsoft and later for Digital Realty Trust, a major wholesale developer. He is a big, fair-skinned, good-humored guy, and he talks a mile a minute, like John Candy playing a commercial real estate agent. That suits the data center game, which is about finding a deal and driving your stake. When he joined Microsoft in 2005, the company had about ten thousand servers spread out in three separate facilities around the world, running its online services like Hotmail, MSN, and Xbox games. By the time Manos left four years later, he had helped expand Microsoft’s footprint to “hundreds of thousands” of servers spread around the world in “tens” of facilities—“But I still can’t tell you how many,” he told me. The number was still a secret. It was an expansion of unprecedented scale in Microsoft’s history, and one that to this day has been matched by only a handful of other companies. “Not a lot of people on the planet are dealing with these size and scale issues,” Manos said. Even fewer have scoured the world as Manos has.

  At Microsoft, he built a mapping tool that considered fifty-six different criteria to generate a “heat map” indicating the best location for a data center, shaded from green (for good) to red (for bad). But the trick was getting the scale right. At the state level, a place like Oregon looked horrible—mainly because of environmental risks, like earthquakes. But when he zoomed in, the story changed: the earthquake zone is on the western side of the state, while central Oregon has the benefit of being cold and dry—perfect for cooling hard drives using outside air. Surprisingly, what got almost no weight in the equation was the cost of the land itself, or even the cost of the actual building. “If you look at the numbers, eighty-five-ish percent of your cost is in the mechanical and electrical systems inside the building,” Manos explained. “Roughly seven percent, on average, is land, concrete, and steel. That’s nothing! People always ask me, ‘Is it better to build small and tall or big and wide?’ It doesn’t matter. At the end of the day, real estate and the biggest construction costs are literally not an issue for most of these buildings. All your cost is in how much gear can you stick in your box.” And then, of course, how much it costs to plug it in—what data center people call “op-ex,” the operating expenses. “A data center guy is always looking for two things,” Manos said. “My wife used to think I was always looking at the scenery, but actually I was looking at the power lines, and for fiber hanging from those power lines.” In other words, he was looking for the view outside my window in The Dalles.
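
  A heat map like that presumably boils down to a weighted score for each candidate site, shaded by how close it comes to ideal. The sketch below is only a guess at the shape of the idea; the criteria, weights, and scores are all invented, though the emphasis follows Manos’s own numbers (operating costs dominate, land barely registers):

```python
# A toy version of the site-selection "heat map": score each candidate
# location as a weighted sum of criteria, then shade it green (near 1.0)
# to red (near 0.0). Criteria, weights, and scores are all invented;
# the real Microsoft tool weighed fifty-six factors.

WEIGHTS = {
    "power_cost":   0.35,  # op-ex dominates, per Manos
    "climate":      0.25,  # cold and dry favors free-air cooling
    "fiber":        0.20,  # "fiber hanging from those power lines"
    "tax_breaks":   0.12,
    "seismic_risk": 0.05,
    "land_cost":    0.03,  # land and building barely move the needle
}

def site_score(scores: dict[str, float]) -> float:
    """Each criterion is scored from 0.0 (bad) to 1.0 (good)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

central_oregon = {"power_cost": 0.9, "climate": 0.95, "fiber": 0.8,
                  "tax_breaks": 0.85, "seismic_risk": 0.7, "land_cost": 0.9}
print(f"central Oregon: {site_score(central_oregon):.2f}")  # ~0.88: green
```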

  In the late 1990s, the Bonneville Power Administration began mounting fiber-optic cables along its long-haul transmission lines, an amazing network that crisscrossed the Northwest and came together in The Dalles. It was a tricky job, often requiring helicopters to string cable on high towers in rough country, and while the power company’s primary goal was improving internal communications, its leaders saw that it was only incrementally more expensive to install extra fiber—far more, in fact, than the company needed for its own use. Over the strenuous objections of the telecommunications companies, who didn’t believe a government-subsidized utility should be competing with them, the BPA soon began leasing that extra fiber out. It was a big, robust communications system, a regional sweep of heavy-duty fiber protected from errant backhoes by its perch high up on the power lines—catnip for data center developers.

  Microsoft tapped into it from a town called Quincy, up the road from The Dalles in Washington State. “It was the greenest spot in the United States for us,” Manos said, referring to his heat map rather than to trees or environmental considerations. Like The Dalles, Quincy was near the Columbia River and nestled in the tangle of the Bonneville Power Administration’s power and fiber infrastructure. Not surprisingly, Microsoft wasn’t alone for long. Soon after the company broke ground on its 470,000-square-foot, 48-megawatt data center (since joined by a second building), what Manos calls the “Burger King people” showed up—the second movers, the companies who wait until the market leader has built in a particular location and then build next to them. In Quincy, these included Yahoo!, Ask.com, and Sabey, a wholesale data center owner. “Within eighteen months, you had this massive, almost three billion dollars’ worth of data center construction going on in a town that was predominantly known for growing spearmint, beans, and potatoes,” Manos said. “When you drive through downtown now, it’s just big, giant, open farm fields and then these massive monuments of the Internet age sticking out of these corn rows.” Meanwhile, down the road in The Dalles, one of Microsoft’s biggest competitors was writing its own story.

  The Dalles had been a crossroads for centuries, but around 2000, at the crest of the broadband boom, it seemed as if the Internet was passing it by. The Dalles was without high-speed access for businesses and homes, despite the big nationwide backbones that tore right through town along the railroad tracks, and the BPA’s big network. Worse, Sprint, the local carrier, said the city wouldn’t get access for another five to ten years. “It was like being a town that sits next to the freeway but has no off-ramp,” was how Nolan Young, the city manager, explained it to me in his worn office, grand but fluorescent-lit like a high school principal’s, inside the turn-of-the-century Dalles City Hall. Wizened and soft-spoken, with a hobbitlike pitch to his voice, Young had shrugged at the sight of my tape recorder. Like any veteran politician, he was used to nosy journalists—although more than a small town’s share had been through here recently.

  The Dalles had felt the brunt of the industrial collapse of the Pacific Northwest, and the Internet’s neglect added insult to injury. “We said, ‘That’s not quick enough for us! We’ll do it ourselves,’ ” Young recalled. It was an act of both faith and desperation—the ultimate “if you build it they will come” move. In 2002, the Quality Life Broadband Network, or “Q-Life,” was chartered as an independent utility, with local hospitals and schools as its first customers. Construction began on a seventeen-mile fiber loop around The Dalles, from city hall to a hub at the BPA’s Big Eddy substation, on the outskirts of town. Its total cost was $1.8 million, funded half with federal and state grants, and half with a loan. No city funds were used.

  The Dalles’s predicament was typical of towns on the wrong side of the “digital divide,” as politicians call poorer communities’ lack of access to broadband. The big nationwide backbones were quickly and robustly built, but they often passed through rural areas without stopping. The reasons were both economic and technological. Long-distance fiber-optic networks are built in fifty-odd-mile segments, which is the distance light signals in fiber-optic cables can travel before needing to be broken down and reamplified. But even at those “regeneration” points, siphoning off the long-distance signals for local distribution requires expensive equipment and a lot of person-hours to set up. High-capacity, long-distance fiber-optic networks are therefore cheaper to build and to operate if they zoom straight through on their path between hubs. And even if they can be induced to stop, a small town doesn’t have the density of customers needed to push it up the priority list of construction projects for a national company, like Sprint. A “middle mile” network bridges that gap, by laying fiber between a town and the nearest regional hub, connecting small local networks to the long-distance backbones. Network engineers call this the “backhaul,” and there’s no Internet without it. Q-Life was a textbook example of the middle mile—although in The Dalles, the middle mile was actually closer to four miles, from the center of town to the Big Eddy substation, where the BPA’s fiber converged.
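
  The segment arithmetic is easy to make concrete: given roughly fifty-mile spans, the number of regeneration stops on a route follows directly, and each stop where local traffic might be siphoned off is an added cost. The route length below is hypothetical:

```python
# Back-of-the-envelope arithmetic for the "fifty-odd-mile segments"
# above: how many regeneration stops does a long-haul route need?
# The 600-mile route length is made up for illustration.

import math

SEGMENT_MILES = 50  # roughly how far a signal travels before reamplification

def regeneration_stops(route_miles: float) -> int:
    # One stop at the end of every segment except the last.
    return math.ceil(route_miles / SEGMENT_MILES) - 1

# 11 stops, each one a place where a town could, in principle, buy an
# "off-ramp" -- if someone pays for the middle mile to reach it.
print(regeneration_stops(600))  # 11
```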

  Once Q-Life’s fiber was in place, local Internet service providers quickly swooped in to offer the services Sprint wouldn’t. Six months later, Sprint itself even showed up—quite a lot sooner than its original five-year timeline. “We count that as one of our successes,” Young said. “One could say they’re our competitors, but now there were options.” But the town couldn’t have predicted what happened next. At the time, few could have. The Dalles was about to become home to the world’s most famous data center.

  In 2004, just a year after the Q-Life network was completed, a man named Chris Sacca, representing a company with the suspiciously generic name of “Design LLC,” showed up in The Dalles looking for shovel-ready sites in “enterprise zones,” where tax breaks and other incentives were offered to encourage businesses to locate there. He was young, sloppily dressed, and interested in such astronomical quantities of power that a nearby town had suspected him of being a terrorist and called the Department of Homeland Security. The Dalles had a site for him, thirty acres next to a decommissioned aluminum smelter that itself once drew eighty-five megawatts of power—more than the everyday needs of a city many times its size.

  As negotiations began, Sacca wanted total secrecy, and Young started signing nondisclosure agreements. The cost of the land itself wasn’t much at issue (as Manos could have predicted). It was all about power and taxes. The local congressman was called in to help convince the Bonneville Power Administration to steepen its discounts. The governor had to approve the fifteen-year tax break Design LLC demanded, given the hundreds of millions of dollars of equipment it planned to install in The Dalles. But any reasonably sized community in Oregon might have come up with the power and the incentives. The ace in the hole that made Design LLC’s heat map glow bright green over The Dalles was of the town’s own making: Q-Life. “It was visionary—this little town with no tax revenues had figured out that if you want to transform an economy from manufacturing to information, you’ve got to pull fiber,” Sacca later said. In early 2005, the deal was approved: $1.87 million for the land and an option for three more tracts. But still Young had to keep the secret, even after construction began. “I had signed so many agreements that there was a point when I was standing at the site, and someone said, ‘I see they’re building … rrrggrrr … there.’ And I said, ‘What, I don’t see anything!’” But the secret’s out now: Design LLC was Google.

  It’s become a cliché that data centers adhere to the same rules as the secret cage matches in the movie Fight Club: “The first rule of data centers is don’t talk about data centers.” This tendency toward the hush-hush often bleeds into people’s expectations about the other types of the Internet’s physical infrastructure, like exchange points—which are actually quite open. So why all the secrecy about data centers? A data center is a storehouse of information, the closest the Internet has to a physical vault. Exchange points are merely transient places, as Arnold Nipper pointed out in Frankfurt; information passes through (and fast!). But in data centers it’s relatively static, physically contained in equipment that needs to be protected and that itself has enormous value. Yet more often than not, the secrecy isn’t about privacy or theft but about competition. Knowing how big a data center is, how much power it uses, and precisely what’s inside is exactly the kind of proprietary information that technology companies are eager to keep under wraps. (And indeed, Manos and Sacca very well might have run into each other, crisscrossing the Columbia River Valley in search of a site.) This is especially true for data centers built and owned by single companies, where the buildings themselves can be correlated with the products they offer. A culture of secrecy developed in the data center world, with companies fiercely protecting both the full scope of their operations and the particularities of the machines housed inside. The details of a data center became like the formula for Coke, among the most important corporate secrets.

  As a consequence, from a regular Internet user’s perspective, where our data sleeps is often a difficult question to answer. Big web-based companies in particular seem to enjoy hiding within “the cloud.” They are frequently cagey about where they keep your data, sometimes even pretending not to be entirely sure about it themselves. As one data center expert put it to me, “Sometimes the answer to the question ‘where’s my email?’ is more quantum than Newtonian”—a geeky way of saying it appears to be in so many places at once that it’s as if it’s nowhere at all. Sometimes the location of our data is obscured further by what are known as “content delivery networks,” which keep copies of frequently accessed data, like popular YouTube clips or TV shows, on many small servers closer to people’s homes, just as a local store keeps popular items in stock. Being close minimizes the chances of congestion, while also bringing bandwidth costs down. But generally speaking, the cloud asks us to believe that our data is an abstraction, not a physical reality.
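
  The logic of a content delivery network can be sketched in a few lines: check the nearest edge cache first, and go back to the faraway origin only on a miss. Everything here, from the city names to the file names, is invented for illustration:

```python
# A toy content-delivery network: serve from the nearest edge cache on a
# hit; on a miss, fall back to the faraway origin and keep a local copy
# for the next nearby viewer. Cities and file names are invented.

EDGE_CACHES = {
    "portland": {"popular_clip.mp4"},
    "new_york": {"popular_clip.mp4", "tv_episode.mp4"},
}
ORIGIN = "virginia_data_center"  # where the master copy "sleeps"

def serve(item: str, nearest_edge: str) -> str:
    cache = EDGE_CACHES.setdefault(nearest_edge, set())
    if item in cache:
        return f"{item}: served from {nearest_edge} (short hop, cheap)"
    cache.add(item)  # restock locally, like a store shelving a hot item
    return f"{item}: fetched from {ORIGIN}, now cached in {nearest_edge}"

print(serve("popular_clip.mp4", "portland"))        # cache hit
print(serve("obscure_home_movie.mp4", "portland"))  # miss, long haul once
```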

  But that’s disingenuous. While there are moments when our online life really has become discombobulated, with our data broken into ever-smaller pieces to the point that it’s theoretically impossible to know where it is, that’s still the exception. It’s a quarter truth that data center owners seize upon in a deliberate attempt to direct attention away from their actual places—whether for competitive reasons, because of environmental embarrassment, or for other notions of security. But what frustrates me is that feigned obscurity becomes a malignant advantage of the cloud, a condescending purr of “we’ll take care of that for you” that in its plea for our ignorance reminds me of slaughterhouses. Our data is always somewhere, often in two places. Given that it’s ours, I stick to the belief that we should know where it is, how it ended up there, and what it’s like. It seems a basic tenet of today’s Internet: if we’re entrusting so much of who we are to large companies, they should entrust us with a sense of where they’re keeping it all, and what it looks like.

 
