Epitomizing the excesses of the elaborate Amazon setup, in Balaban’s view, were the top-of-the-line “machine learning” Tesla GPUs from Nvidia. He discovered that Nvidia’s gaming chips were not only ten times cheaper but also faster. What mattered to Balaban’s machine-learning algorithms was not all the custom “machine-learning features” but the number of usable floating-point operations per dollar. As Bill Dally had shown at Nvidia, machine learning is essentially a product of Moore’s Law advances in processing speeds and parallelization. If you can do it on a handset, why do you need The Dalles?
Balaban resolved to maximize the usable FLOPS per buck. That meant using game-machine GeForce processors, not the Teslas that were Dally’s pride at Nvidia or the Tensor Processing Units that Urs Hölzle cherished at Google.
The Nvidia representatives would try to frighten him by explaining that the game chips “were not meant for a datacenter. They couldn’t be relied on for machine-learning tasks. We can’t stand behind them.” It was what Silicon Valley calls “FUD”—the fear, uncertainty, and doubt that established producers like IBM spread about cheap alternative devices, such as those Nvidia itself had produced a decade earlier.
But keeping his eye on the key metric of FLOPS per buck, Balaban calculated that the up-market Tesla chips cost around five thousand dollars and delivered 10.6 teraflops of 32-bit floating-point (FP32) performance. The gaming chips (GeForce GTX 1080 Ti) produced 11.3 teraflops and could be bought for $580 per module. It was not a close call. In Balaban’s model of FLOPS per buck, the gaming chips were roughly nine times better.
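The arithmetic is easy to check. Here is a minimal Python sketch using only the figures quoted above (the card prices and FP32 teraflop ratings cited in the text, not current list prices):

```python
# FLOPS-per-dollar comparison using the figures cited in the text.

tesla_tflops, tesla_price = 10.6, 5_000        # data-center Tesla card: FP32 teraflops, price in USD
geforce_tflops, geforce_price = 11.3, 580      # GeForce GTX 1080 Ti: FP32 teraflops, price in USD

tesla_ratio = tesla_tflops / tesla_price       # ~0.0021 teraflops per dollar
geforce_ratio = geforce_tflops / geforce_price # ~0.0195 teraflops per dollar

print(f"GeForce advantage: {geforce_ratio / tesla_ratio:.1f}x")  # roughly 9x on these numbers
```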
At that point, Balaban made the disappointing discovery that Nvidia did not sell its GeForce GPU boards in the minuscule numbers that he needed for his server farm. This looked like a “show-stopper.” But he recalled his discussions with Austin Russell about building GPU clusters of gaming boards for crypto-mining.
The solution was obvious: “buy ’em at Fry’s,” the Valley’s dominant electronics retail chain, which sold boards made by Zotac and Asus in Taiwan. The Lambda team cleared out the Bay Area’s supply of 1080 Tis, provoking something of a crisis for crypto-miners who needed the modules for their own servers.
At that point, in January 2016, Russell invited the Lambda team up to the garage behind the pool at Pony Tracks Ranch. He let them use the space for free; Lambda would just have to pay its power bill. Balaban and his team set out to build the servers from the bottom up, assembling the GPU clusters from the gaming boards. They installed their own one-hundred-amp breakout box, good for twenty-four kilowatts.
At 4:27 a.m. on February 13, 2016, they got the first server up and running on GTX 980 Ti modules built on Nvidia’s Maxwell architecture. Each module peaked at 5.63 teraflops; with four modules per machine, the full cluster totaled 225.2 teraflops, putting them on the list of the world’s top supercomputers with a near-quarter-petaflop cluster.
Among the people taking an interest in Balaban’s progress was Georges Harik, a Silicon Valley titan who had, like Balaban, studied computer science at the University of Michigan and had then gone on to develop Google’s AdWords. Harik observed, “I don’t know how Dreamscope will go, but if you guys are good at Linux systems administration, what you can do is GPU cloud services.” That was the number-ten Google employee recommending that they compete with Google in the cloud. It was an idea.
Balaban and his team had learned how to maximize FLOPS per dollar on their machines, with the immediate result of ending the bills from Amazon. The sixty-thousand-dollar investment paid for itself in six weeks and bought the runway to make Dreamscope a success. They had built up a team of Stephen and his brother, Michael; Chuan Li, their chief scientist and an expert in using neural networks to convert photos into paintings; and Steve Clarkson, who had dropped out of his Berkeley Ph.D. project in software engineering.
By December 2016, the monetization plan for Dreamscope was sort of working—they had many avid users and were making some five thousand dollars a month on millions of downloads. With more money and a longer runway, it might take off as a profitable product.
Balaban, however, decided to downgrade Dreamscope and go into the computer infrastructure market. He would sell boxes, like Michael Dell in his early years. Mixing up the family group, Balaban brought in a high school pal named Jackson Sengle, lured away from a Ph.D. in bioengineering at Dartmouth. He understood the ribosomes that manufacture all the proteins in your body. Why not defy the new Silicon Valley norm and, as Peter Thiel put it to the Fellows in 2014, “do something that everyone thinks is stupid”—sell homemade computers?
They began putting them together manually, step-by-step, deep into the night above Silicon Valley. With their big advantage in GPU costs at $580 per module, they would not have to be especially efficient in assembly. Their product was a GPU workstation containing four GeForce gaming modules from Nvidia, which Lambda priced at ten thousand dollars apiece; rack-mounted for your cloud, they cost twenty-five thousand.
They put it up on their own Lambda Labs website and on Amazon.com. “Not AWS,” Balaban points out, “just Amazon.com.” They called it the “Deep Learning DevBox”—“Dev” for development. They used Google AdWords to advertise. Harik was presumably pleased.
In March 2017, the DevBoxes began to sell. They brought in twenty-five thousand dollars, five times Dreamscope’s monthly revenue. That was something. Then in April they sold seventy-five thousand dollars’ worth, fifteen times Dreamscope. In May they sold $135,000. In August came the eclipse and proposal. By November they were up to nearly $500,000 and ready to launch a datacenter business, with Dreamscope kept on, as Balaban said, to “dogfood our cloud service.”
Harik’s idea that they could go into the Linux administration business for GPU deep-learning clusters seemed to make sense. Another ex-Googler also encouraged Balaban. Ken Patchett, who built Google’s greenfield data centers in Asia and then went on to build data centers for Facebook, explained to him the sources of excess cost in data centers: all that 24/7 reliability, redundancy, and battery backup, all the costly carbon-offset energy cosmetics, all the high-end ASICs and air conditioning.
Perhaps the giant Google data fortresses around the world were becoming seriously suboptimal in FLOPS per watt and FLOPS per dollar. They can surely do searches, but a new Bell’s Law regime is at hand—decentralization, face recognition in handsets, datacenters in cars and in movable containers—opening up a new era of “sky” computing, dispersing the clouds.
CHAPTER 18
The Rise of Sky Computing
Urs Hölzle has been the central figure in the development of Google’s cloud almost from its beginning, directing its expansion from its redoubt in The Dalles to the ends of the earth. Early in 2017, he reported on his feats to the annual Optical Fiber Conference, which gathers the world’s leading optical engineers and scientists to contemplate the unbounded demands for modulated light and the exquisitely crafted machines needed to transmit, amplify, add, drop, shape, shuffle, switch, and carry it.1
Fiber-optic systems deploy lines of silica fiber that stretch unamplified across a distance the length of Long Island, combine thousands of threads in each cable and scores of data-bearing wavelengths in each thread, and are made of glass so pure that you could see through a pane of it forty miles thick. What Hölzle calls “low-power, high-density coherent optics” is one of the heroic feats of engineering in the Information Age, and it allowed him to increase the bandwidth across his data centers fifty-fold in six years. His forty-two-kilohertz global information finder-fetcher, completing forty-two thousand searches per second, each one entailing hundreds or thousands of computing steps, is a historic technological achievement.
In the course of building this megahertzian planetary utility, Google became one of the world’s leading fiber companies. Its third cable across the Pacific, a 12,899-kilometer line running from California to Hong Kong, will carry data at a rate of 144 terabits a second. Such speeds have pushed bandwidth up twenty-nine-fold since 2010, when Google’s Unity cable between the West Coast and Japan began service. In 2018, Google was planning an even more capacious cable all the way from New York to Japan.
For all the awesome achievements enabled by the star photonic talent in his audience, Hölzle might have offered a message of thanks and celebration. But he had come to Los Angeles not so much to celebrate as to complain. His entire endeavor, he declared, was “approaching a wall.”2 These relentless bandwidth gains, up sixtyfold in seven years, were both inadequate and too costly. Facing the disaggregation of memory, storage, and computation across the planet (“Schmidt’s Law,” which I described in chapter 2) and the explosion of demand for Google’s services, he needed almost immediate tenfold step-function improvements in bandwidth and connectivity.
The lasers and wavelength division networks and coherent optics under seven seas and continents were one thing. But input-output—catching the light-speed rush of photons from hundreds of different messages on each ten-micron core of each bundled thread and channeling them to the right addresses—was all too difficult and costly. Hölzle wanted cloud 3.0. He wanted pluggable fiber optics modules made in volume on automated equipment at one-tenth the cost. He wanted fiber optics to improve far faster than microchip gear under Moore’s Law, and he wanted the price to drop even faster still. He wanted the moon, fast and cheap.
Hölzle’s complaint, defying the facts of life in perhaps the world’s most dauntlessly dynamic technological industry, was an early sign of the death of a paradigm. Paradigms die when they no longer fit the conditions of the real world. Propelled beyond the bounds of economic and technological reality by the nearly infinite demand for free goods, Hölzle and the rest of the Google team have no idea of the real demand for their products. Demand is registered by price signals, and none are transmitted. Google was confounded by its commitment to free and its idea of a zero-marginal-cost economy.
The nearly infinite demand implicit in “free” runs into the finitude of bandwidth, optical innovation, and finance—a finitude that reflects the inexorable scarcity of time. This finitude produces not zero marginal costs but spikes of nearly infinite marginal costs—Hölzle’s “wall”—in the face of surging demand for Google’s cornucopia of valuable products at no cost. Think of a billion addicted teenagers around the world turning to “free apps” on their open-source Android phones an average of eighty times a day.3
Google staged a tremendous coup in recentralizing computing around its data centers. It achieved unprecedented scale by a commitment to “free.” But free flow is not cash flow. It bypasses the entrepreneurial learning that is conveyed through the remorseless messaging of price. Without prices, all that is left to confine consumption is the scarcity of time. Beyond the scores of hours a week for its smart phone customers, time was closing in on Google.
Hölzle lived in a dream world of limitless but ultimately illusory demand. A decade had passed and a new paradigm was on its way. That paradigm would leave its data centers—with their exabytes of memory and petaflops of racked computing power, acres of specialized software, and gigantic cooling towers, perched near rivers and glaciers, often linked to archaic arrays of exhibitionist “green” energy from windmills and solar cells—as vast monuments to an epoch that was ending.
Muneeb Ali of Blockstack explains the transition:
Google and Facebook captured value in the application layer but then had to invent a lot of protocols and infrastructure to actually scale (Google File System, MapReduce [database tools], SPDY [to remove latency from link traffic]). Because of the value they created early on, they had the resources.
This [architecture] leads to giant moats because big companies had all the data, but also because no one else had the resources to innovate at the protocol/infrastructure layer. [It was left to Hölzle and his colleagues at Google.]
This innovation is always needed. The question is who has the incentive to lead the charge. In the post-blockchain world, the model gets flipped and there is direct incentive [for many teams outside the giant companies] to work on the hard problems of protocol and infrastructure innovation. This is a major shift.4
Advances in blockchains and cryptography constituted a new Bell’s Law step-function. It would be farther reaching than anything Hölzle was envisaging in his call for ever more bandwidth to carry ever denser images, from 4K pixels wide to 8K pixels wide, and more, all tagged and processed in his data centers by ever more gigaflops of machine intelligence and gigawatts of power.
The new Internet computer architecture and security model of the cryptocosm means the eclipse of the existing Bell’s Law regime of condensed “cloud” processing at data centers teeming with the siloed applications and customer data of a particular giant corporation. On the blockchain, the data will be visible to all and interoperable among all users. The data are exempt, therefore, from exclusive capture by suppliers of infrastructure.
As the cryptocosm gains momentum, these water-cooled clouds will play a declining role. Replacing clouds will be distributed, peer-to-peer architectures, transparent global datasets available to all, and new security models. They will diffuse everywhere on air-cooled laptops and portables. Dispersing clouds, the sky is the limit.
Blockstack, Counterparty, and Rootstock are among the companies providing platforms for secure networking based on identity and data rooted in Satoshi’s bitcoin blockchain. Specialized to be secure for money, bitcoin provides only about eighty bytes of data storage under its OP_RETURN instruction (eighty-three bytes counting opcode overhead). That’s enough for memory pointers and compressed mathematical hashes but not enough for even a full tweet. Bitcoin compensates for its limited capacity with greater security.
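To make the capacity constraint concrete, here is a minimal Python illustration, assuming the commonly cited limit of roughly eighty bytes of payload: a 32-byte SHA-256 hash of an arbitrarily large document fits easily, while even a single maximum-length tweet does not.

```python
import hashlib

OP_RETURN_PAYLOAD_LIMIT = 80  # bytes of data commonly relayed in a standard OP_RETURN output

document = b"Any document, image, or dataset, however large..."
digest = hashlib.sha256(document).digest()   # a compressed mathematical hash, always 32 bytes

tweet = ("x" * 280).encode()                 # a maximum-length tweet

print(len(digest), len(digest) <= OP_RETURN_PAYLOAD_LIMIT)  # 32 True  -> a hash fits
print(len(tweet), len(tweet) <= OP_RETURN_PAYLOAD_LIMIT)    # 280 False -> a tweet does not
```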
As bitcoin is a calculator for money transfers, Ethereum is a global computer for executing programs. As bitcoin is a recorder of debits and credits for “coins” on a public ledger, Ethereum is a “virtual machine” for framing and sending software instructions for smart contracts or conditional transactions. To pay for it all, it also supplies coins—ether.
Embedded in the Ethereum blockchain, smart contracts can carry out financial transactions or monetary deals. Buterin offers the analogy of a vending machine, but any similar stepwise tree algorithm applies (if you insert the correct coin and designate a choice of purchase, then you can collect the widget in the slot below; if not, you can pound the machine fecklessly with your fists).
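The branching logic is simple enough to sketch in a few lines. The snippet below is a toy Python rendering of that decision tree with hypothetical names; an actual Ethereum contract would express the same branches in Solidity and be executed deterministically by every node.

```python
# Toy rendering of the vending-machine decision tree (hypothetical names, not real contract code).

PRICE = 150  # price of a widget, in cents
INVENTORY = {"A1": "widget", "B2": "gizmo"}

def vending_contract(coins_inserted: int, choice: str) -> str:
    """If the correct coin arrives AND the selection is valid, dispense; otherwise refund."""
    if coins_inserted < PRICE:
        return "refund: insufficient payment"
    if choice not in INVENTORY:
        return "refund: unknown selection"
    return f"dispense {INVENTORY[choice]} in the slot below"

print(vending_contract(150, "A1"))  # dispense widget in the slot below
print(vending_contract(100, "A1"))  # refund: insufficient payment
```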
As Buterin declared in announcing his system, he expected Ethereum to enable “protocols around decentralized file storage, decentralized computing and prediction markets, and provide a massive boost to other peer-to-peer protocols by adding an economic layer.” Most of the other crypto-ventures have used this more resourceful Ethereum blockchain and Solidity language to build their infrastructures.
For reach and ingenuity, it is hard to excel Golem. Calling itself, with partial felicity, an “Airbnb for computers,” it offers to rent your computer’s resources when you are not using them. Then it organizes these resources with the resources of others into a virtual supercomputer. Golem rents cycles and software for this supercomputer in the sky.
A distributed blockchain system that records all the contributions and payments among the computers, Golem sprang from a fertile crypto cohort in Warsaw, Poland. It promises to perform dense parallel computation using surplus computer power around the world. For programming, it offers an application registry and application store to be used by authors of software. It provides a firewalled “sandbox” for “validators” to test the integrity of the software without affecting the platform. To tie these systems together, it provides a Golem Network Token (GNT) and transactions framework that arranges for all participants to be paid as specified.
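How those pieces could fit together is easier to see in code than in prose. The following is a schematic sketch only, with entirely hypothetical names (not Golem's actual API), showing a register, validate, compute, and pay loop of the kind described above.

```python
# Schematic register -> validate -> compute -> pay loop (hypothetical names, not Golem's API).
from dataclasses import dataclass, field

@dataclass
class Application:
    author: str
    name: str
    validated: bool = False          # set True only after sandboxed review by validators

@dataclass
class Network:
    registry: list = field(default_factory=list)
    balances: dict = field(default_factory=dict)   # token balances, GNT-style

    def register(self, app: Application) -> None:
        self.registry.append(app)                  # the application registry / app store

    def validate(self, app: Application) -> None:
        # Stand-in for running the software in a firewalled sandbox before it touches the platform.
        app.validated = True

    def run_task(self, app: Application, requestor: str, provider: str, price: int) -> None:
        if not app.validated:
            raise ValueError("unvalidated software may not run on the platform")
        # ... the provider's idle machine would perform the computation here ...
        self.balances[requestor] = self.balances.get(requestor, 0) - price
        self.balances[provider] = self.balances.get(provider, 0) + price

net = Network()
app = Application(author="dev", name="render-job")
net.register(app)
net.validate(app)
net.run_task(app, requestor="studio", provider="idle-laptop", price=10)
print(net.balances)   # {'studio': -10, 'idle-laptop': 10}
```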
The global computer and global network token provide economic incentives for suppliers of compute-cycles from billions of otherwise dormant laptops, tablets, and even smart phones. Golem also provides a smart contract matrix for developers, testers, and validators of software. It is a new computing ecosystem. As the blockchain developer Ivan Liljeqvist comments on his blog Ivan on Tech, “It will be cool if I can program the transactions framework and say I want to be paid in micropayments for every operation as it is executed when the software is used.”5 It could radically change how software is written and sold.
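That pay-per-operation idea can itself be sketched in a few lines. The snippet below is purely illustrative, with hypothetical names rather than any real Golem or Ethereum interface: each time an operation in a library runs, a micropayment is credited to the library's author.

```python
# Illustrative pay-per-operation metering (hypothetical names, not a real token interface).

PRICE_PER_OPERATION = 0.0001  # tokens credited to the author per executed operation

class MeteredLibrary:
    def __init__(self, author_account: dict):
        self.author_account = author_account

    def _charge(self) -> None:
        self.author_account["balance"] += PRICE_PER_OPERATION

    def matrix_multiply(self, a, b):
        self._charge()                # micropayment recorded every time the operation executes
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

author = {"balance": 0.0}
lib = MeteredLibrary(author)
lib.matrix_multiply([[1, 2]], [[3], [4]])
lib.matrix_multiply([[1, 2]], [[3], [4]])
print(author["balance"])              # 0.0002 -> two operations, two micropayments
```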
Golem sees itself in the long term building key elements of Web 3.0, where content of all kinds can be generated and exchanged without middlemen. If it succeeds, the top-down silos of the oligarchs will give way to a decentralized Internet, perhaps attached to the storage realms of the Interplanetary File System of Juan Benet and his Filecoin. Benet is leading a movement of many storage companies to rent unused disk space on a similar model.
Scientists around the world could turn to Golem to compute quantitative financial models, Navier-Stokes fluid-flow equations, climate change atmospheric models, protein-fold geometries, machine-learning weights, and pharmacological sampling statistics. Soon much of the world’s population will turn to the global supercomputer to calculate its passage through virtual reality models of the planet. As a sign of the impact of Golem, the GNT is often the most widely held crypto-asset on the Ethereum platform, and from its issue in November 2016 to mid-2018 its value rose more than fortyfold.
For its initial test market in its “Brass” release, Golem aroused enthusiasm by choosing graphics rendering and visualization. Often called “image synthesis”—the compute-intense process of generating photorealistic or animated scenes from two- or three-dimensional computer models—it is most familiar as computer-generated images in videos or films. Rendering and visualization also pervade architecture, education, construction, computer-aided design and engineering, real estate, and even surgery.
Architects use 3D modeling software to display textures, lighting, and minute details. Surgeons rely on high-quality renders of organ scans to diagnose and treat their patients. A rendering can be as routine as a scene in a 2D South Park cartoon or as complex as an action-packed sequence in Avatar or the gritty graphics of an interactive 3D game. Advances are accelerating, taking us from the era of Ratatouille just a decade ago—when every animated frame took 6.5 hours to render—to the instant real-time rendering of photorealistic scenes on tens of thousands of parallel GPUs in the Amazon cloud today.