
The Signal and the Noise


by Nate Silver

  In the end, Kasparov, not Deep Blue, blinked, although not quite for the reasons that Campbell and his team were expecting.

  Deep Blue was designed with the goal of beating Kasparov and Kasparov specifically. The team tried to predict which opening sequences Kasparov was most likely to use and develop strong counterattacks to them. (Kasparov, indeed, averted the trap by playing opening moves that he had rarely used before in tournament competition.) Because of its mediocre performance against Kasparov in 1996 and its problems against like-minded players in training matches, meanwhile, Deep Blue’s processing power was doubled and its heuristics were refined.44 Campbell knew that Deep Blue needed to probe more deeply (but perhaps more selectively) into the search tree to match wits with Kasparov’s deep strategic thinking. At the same time, the system was designed to be slightly biased toward complicated positions, which played more to its strengths.

  “Positions that are good for computers are complex positions with lots of pieces on the board so there’s lots of legal moves available,” Campbell told me. “We want the positions where tactics are more important than strategy. So you can do some minor things to encourage that.”

  In this sense, Deep Blue was more “human” than any chess computer before or since. Although game theory does not come into play in chess to the same degree it does in games of incomplete information like poker, the opening sequences are one potential exception. Making a slightly inferior move to throw your opponent off-balance can undermine months of his preparation time—or months of yours if he knows the right response to it. But most computers try to play “perfect” chess rather than varying their game to match up well against their opponent. Deep Blue instead did what most human players do and leaned into positions where Campbell thought it might have a comparative advantage.

  Feature or Bug?

  Still, Kasparov’s skills were so superior in 1997 that it was really just a matter of programming Deep Blue to play winning chess.

  In theory, programming a computer to play chess is easy: if you let a chess program’s search algorithms run for an indefinite amount of time, then all positions can be solved by brute force. “There is a well-understood algorithm to solve chess,” Campbell told me. “I could probably write the program in half a day that could solve the game if you just let it run long enough.” In practice, however, “it takes the lifetime of the universe to do that,” he lamented.

  Teaching a chess computer how to beat a World Champion, instead, often comes down to a banal process of trial and error. Does allotting the program more time in the endgame and less in the midgame improve performance on balance? Is there a better way to evaluate the value of a knight vis-à-vis a bishop in the early going? How quickly should the program prune dead-looking branches on its search tree even if it knows there is some residual chance that a checkmate or a trap might be lurking there?
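
  The pruning question can be made concrete with a toy example. The sketch below is not Deep Blue's code. It is a minimal minimax search with alpha-beta pruning, a standard textbook technique, run on a made-up game tree whose leaves are position scores. The names alphabeta and stats, and the tree itself, are assumptions chosen for illustration. Alpha-beta skips only those branches that provably cannot change the result, so it sits at the safe end of the trade-off; the more aggressive pruning described above gives up that guarantee in exchange for speed.

```python
from typing import Dict, List, Optional, Union

# A toy game tree: an int is a leaf score, a list is a position with child moves.
Tree = Union[int, List["Tree"]]


def alphabeta(
    node: Tree,
    maximizing: bool = True,
    alpha: float = float("-inf"),
    beta: float = float("inf"),
    stats: Optional[Dict[str, int]] = None,
) -> float:
    """Return the minimax value of node, skipping branches that cannot matter."""
    if isinstance(node, int):
        if stats is not None:
            stats["leaves"] += 1  # count how many leaves were actually evaluated
        return node

    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, stats))
            alpha = max(alpha, best)
            if alpha >= beta:  # the opponent would never allow this line
                break
        return best

    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, stats))
        beta = min(beta, best)
        if alpha >= beta:  # we already have a better option elsewhere
            break
    return best


# Hypothetical three-move position, each move answered by two replies.
TOY_TREE: Tree = [[3, 5], [2, 9], [0, 1]]

if __name__ == "__main__":
    counter = {"leaves": 0}
    print(alphabeta(TOY_TREE, stats=counter), counter["leaves"])
```

  On this toy tree the search returns a value of 3 after evaluating four of the six leaves; the other two are skipped because no score they could hold would change the choice.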

  By tweaking these parameters and seeing how it played with the changes, Campbell put Deep Blue through many trials. But sometimes it still seemed to make errors, playing strange and unexpected moves. When this happened, Campbell had to ask the age-old programmer’s question: was the new move a feature of the program—a eureka moment that indicated it was growing yet more skilled? Or was it a bug?

  My general advice, in the broader context of forecasting, is to lean heavily toward the “bug” interpretation when your model produces an unexpected or hard-to-explain result. It is too easy to mistake noise for a signal. Bugs can undermine the hard work of even the strongest forecasters.

  Bob Voulgaris, the millionaire basketball bettor I introduced to you in chapter 8, one year decided that he wanted to bet baseball. The simulator he designed consistently recommended “under” bets on the Philadelphia Phillies and the bets weren’t doing very well. It turned out that the error came down to a single misplaced character in 10,000 lines of code: his assistant had mistakenly coded the Phillies’ home ballpark—Citizens Bank Park, a compact field that boosts offense and home runs—as P-H-l rather than P-H-I. That one line of code had been enough to swamp the signal in his program and tie up Voulgaris’s capital in betting on the noise. Voulgaris was so dismayed by the bug that he stopped using his baseball-betting program entirely.
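
  The source does not say how the simulator handled the mistyped code, so what follows is only a guess at the general failure mode, with invented park codes and invented park factors. It contrasts a lookup that silently falls back to a neutral value with one that refuses an unknown code. Under the first, a typo like P-H-l would quietly understate scoring at a hitter-friendly park; under the second, it would stop the program on the first run.

```python
# Hypothetical park factors (1.0 = neutral); the numbers are invented.
PARK_FACTORS = {
    "PHI": 1.05,  # assumed boost for a hitter-friendly park
    "NYY": 1.03,
    "SDP": 0.92,
}


def park_factor_silent(code: str) -> float:
    """Risky pattern: an unknown code quietly becomes a neutral park."""
    return PARK_FACTORS.get(code, 1.0)


def park_factor_strict(code: str) -> float:
    """Safer pattern: an unknown code is reported instead of absorbed."""
    try:
        return PARK_FACTORS[code]
    except KeyError:
        raise ValueError(f"unknown ballpark code: {code!r}") from None


if __name__ == "__main__":
    typo = "PHl"  # lowercase L in place of capital I
    print(park_factor_silent(typo))  # the typo goes unnoticed
    try:
        park_factor_strict(typo)
    except ValueError as err:
        print(err)
```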

  The challenge for Campbell is that Deep Blue long ago became better at chess than its creators. It might make a move that they wouldn’t have played, but they wouldn’t necessarily know if it was a bug.

  “In the early stages of debugging Deep Blue, when it would make a move that was unusual, I would say, ‘Oh, there’s something wrong,’” Campbell told me. “We’d dig in and look at the code and eventually figure out what the problem was. But that happened less and less as time went on. As it continued to make these unusual moves, we’d look in and see that it had figured out something that is difficult for humans to see.”

  Perhaps the most famous moves in chess history were made by the chess prodigy Bobby Fischer in the so-called “Game of the Century” in 1956 (figure 9-7). Fischer, just thirteen years old at the time, made two dramatic sacrifices in his game against the grandmaster Donald Byrne—at one point offering up a knight for no apparent gain, then a few moves later, deliberately leaving his queen unguarded to advance one of his bishops instead. Both moves were entirely right; the destruction that Fischer realized on Byrne from the strategic gain in his position became obvious just a few moves later. However, few grandmasters then or today would have considered Fischer’s moves. Heuristics like “Never give up your queen except for another queen or an immediate checkmate” are too powerful, probably because they serve a player well 99 percent of the time.


  When I put the positions into my midrange laptop and ran them on the computer program Fritz, however, it identified Fischer’s plays after just a few seconds. In fact, the program considers any moves other than the ones that Fischer made to be grievous errors. In searching through all possible moves, the program identified the situations where the heuristic should be discarded.

  We should probably not describe the computer as “creative” for finding the moves; instead, it did so more through the brute force of its calculation speed. But it also had another advantage: it did not let its hang-ups about the right way to play chess get in the way of identifying the right move in those particular circumstances. For a human player, this would have required the creativity and confidence to see beyond the conventional thinking. People marveled at Fischer’s skill because he was so young, but perhaps it was for exactly that reason that he found the moves: he had the full breadth of his imagination at his disposal. The blind spots in our thinking are usually of our own making and they can grow worse as we age. Computers have their blind spots as well, but they can avoid these failures of the imagination by at least considering all possible moves.

  Nevertheless, there were some bugs in Deep Blue’s inventory: not many, but a few. Toward the end of my interview with him, Campbell somewhat mischievously referred to an incident that had occurred toward the end of the first game in their 1997 match with Kasparov.

  “A bug occurred in the game and it may have made Kasparov misunderstand the capabilities of Deep Blue,” Campbell told me. “He didn’t come up with the theory that the move that it played was a bug.”

  The bug had arisen on the forty-fourth move of their first game against Kasparov; unable to select a move, the program had defaulted to a last-resort fail-safe in which it picked a play completely at random. The bug had been inconsequential, coming late in the game in a position that had already been lost; Campbell and team repaired it the next day. “We had seen it once before, in a test game played earlier in 1997, and thought that it was fixed,” he told me. “Unfortunately there was one case that we had missed.”

  In fact, the bug was anything but unfortunate for Deep Blue: it was likely what allowed the computer to beat Kasparov. In the popular recounting of Kasparov’s match against Deep Blue, it was the second game in which his problems originated—when he had made the almost unprecedented error of forfeiting a position that he could probably have drawn. But what had inspired Kasparov to commit this mistake? His anxiety over Deep Blue’s forty-fourth move in the first game—the move in which the computer had moved its rook for no apparent purpose. Kasparov had concluded that the counterintuitive play must be a sign of superior intelligence. He had never considered that it was simply a bug.

  For as much as we rely on twenty-first-century technology, we still have Edgar Allan Poe’s blind spots about the role that these machines play in our lives. The computer had made Kasparov blink, but only because of a design flaw.

  What Computers Do Well

  Computers are very, very fast at making calculations. Moreover, they can be counted on to calculate faithfully—without getting tired or emotional or changing their mode of analysis in midstream.

  But this does not mean that computers produce perfect forecasts, or even necessarily good ones. The acronym GIGO (“garbage in, garbage out”) sums up this problem. If you give a computer bad data, or devise a foolish set of instructions for it to analyze, it won’t spin straw into gold. Meanwhile, computers are not very good at tasks that require creativity and imagination, like devising strategies or developing theories about the way the world works.

  Computers are most useful to forecasters, therefore, in fields like weather forecasting and chess where the system abides by relatively simple and well-understood laws, but where the equations that govern the system must be solved many times over in order to produce a good forecast. They seem to have helped very little in fields like economic or earthquake forecasting where our understanding of root causes is blurrier and the data is noisier. In each of those fields, there were high hopes for computer-driven forecasting in the 1970s and 1980s when computers became more accessible to everyday academics and scientists, but little progress has been made since then.

  Many fields lie somewhere in between these two poles. The data is often good but not great, and we have some understanding of the systems and processes that generate the numbers, but not a perfect one. In cases like these, it may be possible to improve predictions through the process that Deep Blue’s programmers used: trial and error. This is at the core of business strategy for the company we most commonly associate with Big Data today.

  When Trial and Error Works

  Visit the Googleplex in Mountain View, California, as I did in late 2009, and it isn’t always clear when somebody is being serious and when they’re joking around. It’s a culture that fosters creativity, with primary colors, volleyball courts, and every conceivable form of two-wheeled vehicle. Google people, even its engineers and economists, can be whimsical and offbeat.

  “There are these experiments running all the time,” said Hal Varian, the chief economist at Google, when I met him there. “You should think of it as more of an organism, a living thing. I have said that we should be concerned about what happens when it comes alive, like Skynet.* But we made a deal with the governor of California”—at the time, Arnold Schwarzenegger—“to come and aid us.”

  Google performs extensive testing on search and its other products. “We ran six thousand experiments on search last year and probably another six thousand or so on the ad monetization side,” he said. “So Google is doing on a rough order of ten thousand experiments a year.”

  Some of these experiments are highly visible—occasionally involving rolling out a whole new product line. But most are barely noticeable: moving the placement of a logo by a few pixels, or slightly permuting the background color on an advertisement, and then seeing what effect that has on click-throughs or monetization. Many of the experiments are applied to as few as 0.5 percent of Google’s users, depending on how promising the idea seems to be.
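
  How a small slice of users might be routed into an experiment can be sketched in a few lines. This is not a description of Google's system, which the source does not detail; it is one common approach, assumed here for illustration, in which a hash of the user and experiment names decides membership. The function names, user IDs, and experiment name are all hypothetical.

```python
import hashlib


def in_experiment(user_id: str, experiment: str, fraction: float) -> bool:
    """Return True if this user falls in the experiment's treated slice."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 8 hex digits give an integer in [0, 2**32); scale it to [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    return bucket < fraction


def treated_share(n_users: int, experiment: str, fraction: float) -> float:
    """Fraction of n_users synthetic users who land in the treated slice."""
    treated = sum(
        in_experiment(f"user-{i}", experiment, fraction) for i in range(n_users)
    )
    return treated / n_users


if __name__ == "__main__":
    # Roughly 0.5 percent of synthetic users should see the change.
    print(treated_share(100_000, "logo-shift", 0.005))
```

  Because the assignment depends only on the hash, the same user sees the same version on every visit, and a different experiment name reshuffles who is included.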

  When you search for a term on Google, you probably don’t think of yourself as participating in an experiment. But from Google’s standpoint, things are a little different. The search results that Google returns, and the order in which they appear on the page, represent their prediction about which results you will find most useful.

  How is a subjective-seeming quality like “usefulness” measured and predicted? If you search for a term like best new mexican restaurant, does that mean you are planning a trip to Albuquerque? That you are looking for a Mexican restaurant that opened recently? That you want a Mexican restaurant that serves Nuevo Latino cuisine? You probably should have formed a better search query, but since you didn’t, Google can convene a panel of 1,000 people who made the same request, show them a wide variety of Web pages, and have them rate the utility of each one on a scale of 0 to 10. Then Google would display the pages to you in order of the highest to lowest average rating.
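
  The panel procedure amounts to averaging and sorting. A minimal sketch, with invented page names and invented ratings on the 0-to-10 scale:

```python
from statistics import mean
from typing import Dict, List


def rank_by_average_rating(ratings: Dict[str, List[int]]) -> List[str]:
    """Order pages from highest to lowest average panel rating."""
    averages = {page: mean(scores) for page, scores in ratings.items()}
    return sorted(averages, key=averages.get, reverse=True)


# Hypothetical panel ratings for three pages.
PANEL_RATINGS = {
    "albuquerque-dining-guide": [8, 9, 7],
    "taco-chain-homepage": [3, 4],
    "new-openings-review": [10, 9],
}

if __name__ == "__main__":
    print(rank_by_average_rating(PANEL_RATINGS))
```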

  Google cannot do this for every search request, of course—not when they receive hundreds of millions of search requests per day. But, Varian told me, they do use human evaluators on a series of representative search queries. Then they see which statistical measurements are best correlated with these human judgments about relevance and usefulness. Google’s best-known statistical measurement of a Web site is PageRank,45 a score based on how many other Web pages link to the one you might be seeking out. But PageRank is just one of two hundred signals that Google uses46 to approximate the human evaluators’ judgment.
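
  The step of checking which measurements track the human judgments can also be sketched. The example below uses the Pearson correlation as the yardstick, which is an assumption; the source says only that the measurements are "best correlated" with the evaluators' ratings. The two signals and all of the numbers are invented.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_signal(signals: Dict[str, List[float]], human: List[float]) -> str:
    """Name of the signal whose values track the human ratings most closely."""
    return max(signals, key=lambda name: abs(pearson(signals[name], human)))


# Hypothetical evaluator ratings for five pages, and two candidate signals.
HUMAN_RATINGS = [9, 7, 4, 2, 1]
SIGNALS = {
    "inbound_links": [120, 90, 40, 15, 5],
    "page_length": [500, 2000, 800, 1500, 900],
}

if __name__ == "__main__":
    for name, values in SIGNALS.items():
        print(name, round(pearson(values, HUMAN_RATINGS), 3))
    print(best_signal(SIGNALS, HUMAN_RATINGS))
```

  In this made-up data the link count tracks the ratings closely and the page length barely at all, so the link count is the signal worth keeping.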

  Of course, this is not such an easy task—two hundred signals applied to an almost infinite array of potential search queries. This is why Google places so much emphasis on experimentation and testing. The product you know as Google search, as good as it is, will very probably be a little bit different tomorrow.

  What makes the company successful is the way it combines this rigorous commitment to testing with its freewheeling creative culture. Google’s people are given every inducement to do what people do much better than computers: come up with ideas, a lot of ideas. Google then harnesses its immense data to put these ideas to the test. The majority of them are discarded very quickly, but the best ones survive.

  Computer programs play chess in this way, exploring almost all possible options in at least some depth, but focusing their resources on the more promising lines of attack. It is a very Bayesian process: Google is always at a running start, refining its search algorithms, never quite seeing them as finished.

  In most cases, we cannot test our ideas as quickly as Google, which gets feedback more or less instantaneously from hundreds of millions of users around the world. Nor do we have access to a supercomputer, as Deep Blue’s engineers did. Progress will occur at a much slower rate.

  Nevertheless, a commitment to testing ourselves—actually seeing how well our predictions work in the real world rather than in the comfort of a statistical model—is probably the best way to accelerate the learning process.

  Overcoming Our Technological Blind Spot

  In many ways, we are our greatest technological constraint. The slow and steady march of human evolution has fallen out of step with technological progress: evolution occurs on millennial time scales, whereas processing power doubles roughly every other year.

  Our ancestors who lived in caves would have found it advantageous to have very strong, perhaps almost hyperactive pattern-recognition skills—to be able to identify in a split-second whether that rustling in the leaves over yonder was caused by the wind or by an encroaching grizzly bear. Nowadays, in a fast-paced world awash in numbers and statistics, those same tendencies can get us into trouble: when presented with a series of random numbers, we see patterns where there aren’t any. (Advertisers and politicians, possessed of modern guile, often prey on the primordial parts of our brain.)

  Chess, however, makes for a happy ending. Kasparov and Deep Blue’s programmers saw each other as antagonists, but they each taught us something about the complementary roles that computer processing speed and human ingenuity can play in prediction.

  In fact, the best game of chess in the world right now might be played neither by man nor machine.47 In 2005, the Web site, hosted a “freestyle” chess tournament: players were free to supplement their own insight with any computer program or programs that they liked, and to solicit advice over the Internet. Although several grandmasters entered the tournament, it was won neither by the strongest human players nor by those using the most highly regarded software, but by a pair of twentysomething amateurs from New Hampshire, Steven Cramton and Zackary “ZakS” Stephen, who surveyed a combination of three computer programs to determine their moves.48 Cramton and Stephen won because they were neither awed nor intimidated by technology. They knew the strengths and weaknesses of each program and acted less as players than as coaches.

  Be wary, however, when you come across phrases like “the computer thinks the Yankees will win the World Series.” If these are used as shorthand for a more precise phrase (“the output of the computer program is that the Yankees will win the World Series”), they may be totally benign. With all the information in the world today, it’s certainly helpful to have machines that can make calculations much faster than we can.

  But if you get the sense that the forecaster means this more literally—that he thinks of the computer as a sentient being, or the model as having a mind of its own—it may be a sign that there isn’t much thinking going on at all. Whatever biases and blind spots the forecaster has are sure to be replicated in his computer program.

  We have to view technology as what it always has been—a tool for the betterment of the human condition. We should neither worship at the altar of technology nor be frightened by it. Nobody has yet designed, and perhaps no one ever will, a computer that thinks like a human being.49 But computers are themselves a reflection of human progress and human ingenuity: it is not really “artificial” intelligence if a human designed the artifice.



