Taking an Olympian view of the scene, Andy Bechtolsheim, the Valley’s leading entrepreneur of networking hardware, sells equipment to both Google and its rivals. He is now building four-hundred-gigabit Ethernet switches at his ascendant network company, Arista. If CPUs won’t run much cooler, he reasons, maybe the rest of the computer can be redesigned to keep power consumption to a minimum. That’s his goal. Some industry veterans believe that Bechtolsheim doesn’t count for much in the era of cloud computing. He counted, though, back in 1998, when he supplied the first outside money for Brin and Page. Prior to that, he had made successive fortunes as a founder of Sun Microsystems, as a major early investor in Microsoft, and as the progenitor of Granite Systems, an inventor of gigabit Ethernet switches ultimately snapped up by Cisco. As a founder of the now forgotten Frox, he helped launch many of the major inventions in digital video. Today he is the technical leader of Arista, the router and switch company of the data center era. With Cisco, Google, Microsoft, Sun, and Arista as his royal flush, he is the supreme investor-entrepreneur in Silicon Valley history.
Speaking at double-data-rate in a German accent, Bechtolsheim believes that the move from search to more ambitious services plays to Google’s advantage. “To deliver video, maps, and all the rest on the fly, optimized for a specific customer’s needs, to get the maximum benefit for the advertiser—that requires tremendous hardware, storage, and memory. It takes hundreds of computers, free to each end user. The next tier down doesn’t have the economics to build this stuff.”
I ask, “So is the game over?” Bechtolsheim replies, “Only if no one changes the game.”
Leaning back in his chair, Bechtolsheim observes, “The last few years have been disappointing for people who want to accelerate progress in technology. But now the world is moving faster again.”8
The next wave of innovation will compress today’s parallel solutions in an evolutionary convergence of electronics and optics: 3D and even holographic memory cells; lasers inscribed on the tops of chips, replacing copper pins with streams of photons; and all-optical networks in which thousands of colors of light travel along a single fiber. As these advances find their way into an increasing variety of devices, the petascale computer will shrink from a dinosaur to a teleputer—the successor to today’s handhelds—in your ear or in your signal path. It will access an endless variety of sensors, searchers, and servers.
These innovations will enable participation in metaverses that seem to play to the strength of Google’s cloud, which will link to trillions of sensors around the globe. (The iPhone 8 has sixteen different sensor systems, from an array of radio frequency devices to gyroscopes, accelerometers, barometers, and imagers galore.) A planetary sensorium will give Google a constant knowledge of the physical state of the world, from traffic conditions to the workings of your own biomachine.
Jaron Lanier, the inventor of virtual reality, calls Google’s triumphant, capacious, efficient data centers “Siren Servers,” alluding to the bird-women of Greek mythology who with their irresistible song lured sailors to their deaths on the rocks. The sailors in Lanier’s metaphor are not kayakers on the Columbia but the masters of industry who own the servers. Siren Servers confer on Google its temporary endorphins of dominance, which will be followed, in Lanier’s caustic vision, by shipwreck amid the rocks and waves of a new paradigm.
With this in mind, let us recall Bell’s Law. As we pay a billionth of a cent per byte of storage and a penny per gigabit per second of bandwidth, what kind of machine labors to be born? After all, Bell’s ten years are running out. Will the Sirens offer a new machine of economic growth and progress, investment and capital accumulation, and continued economic dominance? Or is The Dalles a monument to an expiring business strategy? Are the days of centralization over?
CHAPTER 7
Dally’s Parallel Paradigm
Is this Life after Google or what?
Bill Dally is about to take me to the Palo Alto Caltrain station in his self-driving Tesla Model S.1
In the Nvidia garage in Santa Clara, I board the sleek gray boron steel and titanium missile, noting its futuristic payload of a 1,200-pound lithium-ion battery. Should be enough to get me to the station. Fully charged, it can almost replace sixty pounds of gasoline in the tank of an internal combustion engine. That might not seem like much, but in Google-era mathematics it can save the world.
In calculating the energy budgets of its data centers, Google, like the rest of Silicon Valley, is as rigorous as a Kenyan marathoner. But you had better recheck its numbers when it begins rolling out cars gleaming in the sun of solar subsidies. They may cost “waymo” than they say.
This is a Tesla, though, and its self-driving aspiration comes from Nvidia’s industry-leading Drive PX system. To buckle myself into the bucket seat, I have to push aside a flier from the annual Hot Chips conference in nearby Cupertino. While I was analyzing semiconductors for Ben Rosen and Esther Dyson some three decades ago, when chips were still hot, I used to go to Hot Chips conferences to stay up to date. Silicon, then as now, was the foundation, the physical layer, underlying the entire edifice of information technology. I am reassured that Hot Chips lives on even though Google and others assert that “software eats everything.”
Nick Tredennick, the designer of a favorite “hot chip” of yore, the Motorola 68000 microprocessor behind Steve Jobs’s Macintosh computer, used to say that the industry seeks to exploit the “leading edge wedge.” Three overlapping design targets converged in this fertile crescent of chip design: zero delay (fast hot chips), zero power (cool low-energy devices), and zero cost (transistors going for billionths of a penny).2 Between the 1980s and 2017, chips migrated from the hot, fast end toward the cool, cheap end, a trend Dally has led.
In the Tesla’s front seat, I face a two-foot-high screen displaying pale green and striated Google maps. Dally points out that self-driving vehicles “don’t care where the road lanes are. They navigate on maps, register their place on a map. If they have an empty road, they just take a line down the middle, like they are riding a rail. It is only the presence of moving objects, such as pedestrians and other cars, that requires them to use all their motion-sensing capabilities.”
While the maps come from Google, the processing comes from Nvidia GPUs. These chips compute the car’s response to lidar, radar, ultrasound, and camera signals that free the missile to descend from the outer space of Elon Musk’s domains and enter the ever-changing high-entropy world beyond Google Maps.
Dally barks his command: “Navigate to California Avenue Caltrain station,” and the car crisply responds. Dally comments, “In the last couple of years speech recognition has become dramatically better. Thirty percent better. Two years ago it was not really capable of getting it right. But now with machine learning on our Tegra chips, it gets it right every time.” Benefiting are all the users of Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, Google’s Go.
Dally has his hands on the steering wheel now as he negotiates the back streets. “It’s only level-two autonomy,” he explains, using the Society of Automotive Engineers’ classifications, which range from level one, a mere driver assistant, to level five, full self-driving. Musk promises to get Tesla to level five in two years. That’s Elon for you. But for now Dally keeps his eye on the road as the Tesla makes its way, with several high-voltage bursts, up the ramp onto 101. Now Tesla’s self-driving mode enables him to turn and show me his film of the recent solar eclipse—a series of vivid high-contrast images of the rare event.
Machine learning, Dally points out, is mostly accomplished by graphics processing chips from Nvidia. Some advances in artificial intelligence spring from improvements in algorithms, but the real source of these capabilities is the explosive improvement in computer speed achieved through a combination of Moore’s Law and parallel processing. Nvidia’s graphics processors are the climax of Dally’s long career as a prophet of parallel processing, which began thirty years ago at Virginia Tech, where he studied the virtues of multiple processors functioning together.
At a Hot Chips conference at Stanford in August 1991, Dally and Norm Jouppi first emerged as foils in fashioning the future philosophies of computation. Dally introduced his revolutionary massively parallel J-machine, and Jouppi, now at Google, then at Digital Equipment, touted the promise of revving up existing processor pipelines to “five instructions per clock cycle.”3
Those two papers of 1991 polarize all computer science: do you make existing serial von Neumann processors go faster, seeking zero delay, stepping and fetching instructions and data from ever faster remote memories? Or do you diffuse the memory and processing all through the machine? In a massively parallel spread like Dally’s J-machine, the memory is always close to the processor.
Twenty-six years later, Dally and Jouppi are still at it. At the August 2017 Hot Chips in Cupertino, all the big guys were touting their own chips for what they call “deep learning,” the fashionable Silicon Valley term for the massive acceleration of multi-layered pattern recognition, correlation, and correction tied to feedback that results in a cumulative gain in performance. What they call “learning” originated in earlier ventures in AI. Guess, measure the error, adjust the answer, feed it back: these are the canonical steps followed in Google’s data centers, enabling such applications as Google Translate, Google Soundwriter, Google Maps, Google Assistant, Waymo cars, search, Google Now, and so on, in real time.4
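Those canonical steps can be seen in miniature in a few lines of Python. The sketch below is my own toy example, not anything from Google’s data centers: a single weight is guessed, the error against an invented target rule is measured, and the correction is fed back until the guess converges.

```python
# Toy illustration of the canonical loop: guess, measure the error, adjust the
# answer, feed it back. A single weight w is fit to the invented rule y = 3x.
def train(samples, learning_rate=0.01, epochs=100):
    w = 0.0                                  # initial guess
    for _ in range(epochs):
        for x, target in samples:
            guess = w * x                    # guess
            error = guess - target           # measure the error
            w -= learning_rate * error * x   # adjust the answer, feeding the error back
    return w

data = [(x, 3.0 * x) for x in range(1, 6)]   # invented training data
print(round(train(data), 3))                 # converges toward 3.0
```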
As recently as 2012, Google was still struggling with the difference between dogs and cats. YouTube was famous for its cat videos, but it could not efficiently teach its machines to recognize the cats. They could count them; the data center dogs could dance; but it took sixteen thousand microprocessor cores and six hundred kilowatts.5 And it still was a dog, with a 5 percent error rate—not an impressive portent for Google’s human face-recognition project or for car vision systems that need flawlessly to identify remote objects in real time.
As Claude Shannon showed, these success rates of 95 percent, or even 99.999 percent, are deceptive, because you have no way of telling which instances are the errors.6 The vast majority of the home loans in the mortgage crisis were sound, but because no one knew which ones were not, all the securities crashed. You don’t want that problem with self-driving cars.
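A back-of-the-envelope calculation makes the worry concrete; the decision rate below is my own assumption, not a figure from Shannon or from any carmaker.

```python
# Invented numbers, for illustration only: even at 99.999 percent accuracy,
# a vision system making many decisions per second still commits errors,
# and nothing in the aggregate success rate reveals which decisions were wrong.
accuracy = 0.99999
decisions_per_hour = 30 * 3_600              # assume 30 perception decisions per second
expected_errors = decisions_per_hour * (1 - accuracy)
print(f"expected unidentified errors per hour: {expected_errors:.1f}")   # about 1.1
```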
In a joint appearance in 2012 in Aspen, Peter Thiel chided Eric Schmidt: “You don’t have the slightest idea of what you are doing.” He pointed out that the company had amassed some $50 billion in cash at the time and was allowing it to sit in the bank at near-zero interest rates while its vast data centers still could not identify cats as well as a three-year-old could.7
Thiel is the leading critic of Silicon Valley’s prevailing philosophy of “inevitable” innovation. Page, on the other hand, is a machine-learning maximalist who believes that silicon will soon outperform human beings, however you want to define the difference. If the haphazard Turing machine of evolution could produce human brains, just imagine what could be accomplished by Google’s constellation of eminent academics devoting entire data centers full of multi-gigahertz silicon to training machines on petabytes of data. In 2012, though, the results seemed underwhelming.
Simultaneously with the dogs and cats crisis in 2012, the leader of the Google Brain research team, Jeff Dean, raised the stakes by telling Urs Hölzle, Google’s data center dynamo, “We need another Google.” Dean meant that Google would have to double the capacity of its data centers just to accommodate new demand for its Google Now speech recognition services on Android smartphones.
Late in the year, Bill Dally provided an answer. Over breakfast at Dally’s favorite Palo Alto café, his Stanford colleague Andrew Ng, who worked with Dean at Google Brain, was complaining about the naming of cats. Sixteen thousand costly microprocessor cores seemed inefficient. Dally suggested that Nvidia GPUs could help. Graphics processors specialize in the matrix multiplication and floating-point mathematical operations that teach machines to recognize patterns. A graphical image is an array of values readily mapped to a mathematical matrix. With images running through as many as twelve layers of matrices, machine learning could be seen as another form of iterative graphics processing.
Prove it, Ng told Dally, and Google would buy his chips.
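A rough sketch suggests why the proposition was plausible. In the toy Python example below, the layer widths and the cat-versus-dog output are my own illustrative choices, not the Stanford-Google-Nvidia design: an image is simply a matrix of pixel values, and each layer amounts to one large matrix multiplication followed by a simple nonlinearity, exactly the floating-point arithmetic a graphics processor streams in parallel.

```python
# Hypothetical sizes, not the actual configuration: an image becomes a row
# vector, and every layer is a matrix multiplication plus a ReLU nonlinearity.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64)).reshape(1, -1)          # one 64x64 image, flattened to a row

layer_widths = [64 * 64, 1024, 256, 2]               # made-up widths; the final pair: cat vs. dog
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_widths[:-1], layer_widths[1:])]

activations = image
for w in weights:                                    # run the image through the stack of matrices
    activations = np.maximum(activations @ w, 0.0)   # matrix multiply, then the nonlinearity
print(activations.shape)                             # (1, 2): one score per class
```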
The man who built the first crude graphics processor, the precursor of all of Google’s data center neural networks, was Frank Rosenblatt, a psychology professor at Cornell. In 1958 he described his “perceptron” to the New Yorker: “If a triangle is held up to the perceptron’s eye [photosensor], the association units connected with the eye pick up the image of the triangle and convey it along a random succession of lines to the response units [now called neurons], where the image is registered. . . . [A]ll the connections leading to that response are strengthened [i.e., their weights are increased], and if a triangle of a different size and shape is held up to the perceptron, its image will be passed along the track that the first triangle took. If a square is presented, however, a new set of random lines is called into play. . . . The more images the perceptron is permitted to scan, the more adroit its generalizations. . . . It can tell the difference between a dog and a cat.”8
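Rosenblatt’s rule fits in a few lines of Python. The sketch below is purely illustrative; his 1958 perceptron was analog hardware, and the four-pixel “images” are invented, but the update is the one he describes: when the response is wrong, the weights on the active connections are adjusted so that a similar image follows the right track.

```python
# Illustrative one-layer perceptron with made-up four-pixel patterns.
def perceptron_update(weights, pixels, label, lr=0.1):
    """pixels: 0/1 photosensor readings; label: +1 for one shape, -1 for the other."""
    activation = sum(w * p for w, p in zip(weights, pixels))
    prediction = 1 if activation >= 0 else -1
    if prediction != label:                          # only mistakes change the weights
        weights = [w + lr * label * p for w, p in zip(weights, pixels)]
    return weights

weights = [0.0, 0.0, 0.0, 0.0]
samples = [([1, 1, 0, 0], +1), ([0, 0, 1, 1], -1)]   # one toy pattern per class
for _ in range(10):
    for pixels, label in samples:
        weights = perceptron_update(weights, pixels, label)
print(weights)                                       # the learned weights now tell the two patterns apart
```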
Four years later, Ray Kurzweil, then sixteen, visited Rosenblatt after Kurzweil’s MIT mentor Marvin Minsky exposed the limitations of the one-layer perceptron that Rosenblatt had built. Rosenblatt told Kurzweil that he could surmount these limitations by stacking perceptrons on top of one another in layers. “The performance improves dramatically,” he said. Rosenblatt died in a boating accident eight years later, never having built a multilayered machine.
Now at Google, that omission was being remedied. Dally assigned Nvidia’s software guru Bryan Catanzaro to work with Ng on upgrading Nvidia’s proprietary software CUDA (Compute Unified Device Architecture) for use in its CUDA Deep Neural Network library (cuDNN). The Stanford-Google-Nvidia team solved the cat-and-dog problem with merely twelve GPUs burning four kilowatts, all for thirty-three thousand dollars.
Dally was proud of this achievement. The Nvidia machine was roughly 150 times as cost-effective as Google’s previous setup. And that’s not even taking into account the GPUs’ enormous advantage in energy efficiency. Nvidia processors soon pervaded Google’s data centers, giving unprecedented performance in the matrix multiplications and accumulations at the heart of machine learning.
Google now deploys ten-to-twelve-layer neural networks generating thirty exaflops of floating-point computing capacity—and matrix multiplications galore. In accord with Rosenblatt’s prediction that the “more images the perceptron is permitted to scan, the more adroit its generalizations,” the Google machine sorts tens of millions of images according to some one billion parameters. It prompts Google Brain routinely to claim to be “outperforming humans.” Gee, a billion parameters, beats me! In Silicon Valley, where human beings program these machines, it is considered cranky to question the claim of “superhuman” powers.
None of this would faze Dally, except for one crucial change at Google. At the 2017 Hot Chips conference, the company, in a do-it-yourself mood, indicated that it would henceforth replace Nvidia’s devices with its own special-purpose silicon. Jeff Dean celebrated Jouppi’s souped-up “Tensor” matrix multiplier, which eschewed graphics and floating point, focusing on the machine-learning functions alone. It is a matrix-multiplier ASIC (application-specific integrated circuit). Without their Tensor Processing Unit, say the Google guys, they would have had to double the size of their data centers.
Dally points out that it is always possible to make huge temporary gains by putting entire systems onto single slivers of ASIC silicon, special-purpose chips hard-wired to perform one complex function. As Dally tells me, in performing parallel operations, graphics processors are ten times more cost-effective than general-purpose central processing units (CPUs), and ASICs are ten to a hundred times more cost-effective than ordinary GPUs. But with ASICs, your market is reduced to just your chosen special purposes, and your data centers are no longer all-purpose Turing machines. They are ossifying into special-purpose factories like the aluminum plants they succeeded in The Dalles.
Google can afford to make its own custom ASICs for particular slots in its data centers, but Nvidia is dominating the entire domain of massively parallel processing. In the third quarter of 2017, after the Hot Chips “setbacks,” Nvidia announced a 109 percent increase in revenues from cloud computing sales, to $830 million, lifting the company’s market value to almost $130 billion.
Now Nvidia is a potent force providing parallel processors across global industry and new platforms for life after Google. Is all this to come to an end with Google’s new prowess in making hardware as well as software and its hiring of industry hardware titans such as Dave Patterson and Norm Jouppi to contrive world-leading chip architectures?
I was visiting Dally to find out. A fifty-seven-year-old, brown-haired engineer with a black hat and backpack and hiking boots, he is dressed Silicon-Valley-mountaineer style to take me on a high-altitude adventure in microchips and software, ideas and speculations, Google maps and Elon Musk “reality distortion fields” down Route 101 at five o’clock on a late-August Friday evening.
It’s not quite Doc Brown’s Back to the Future ride in a DeLorean, but it will suffice for some modest time travel in the history of computing.
Since writing his college thesis in the late 1970s, Dally has rebelled against the serial step-by-step computing regime known as the von Neumann architecture. After working on the “Cosmic Cube” under Chuck Seitz for his Ph.D. at Caltech (1983), Dally has led the design of parallel machines at MIT (the J-machine and the M-machine), introduced massive parallelism to Cray supercomputers (the T3D and T3E), and pioneered parallel graphics at Stanford (the Imagine project, a streaming parallel device incorporating programmable “shaders,” now ubiquitous in the industry’s graphics processors from Nvidia and others).