Still, Craig blitzed through his first game. His crucial clue was another jumbo bet—$12,000, this time—on a Daily Double, in which he identified “small masses of lymphoid tissue in the nasopharynx” (“What are adenoids?”). He chalked up another $34,399 and appeared to be off and running.
But the next match was his undoing. He faced Matt Martin, a police officer from Arlington, Virginia, and Jelisa Castrodale, a sportswriter from North Carolina. Just a day earlier, his luck with Danish kings and atomic elements made him wonder if he was dreaming. Now his fortunes took a cruel turn. Sleep-deprived, he found himself struggling in a category that seemed to be mocking him: “Pillow talk.” Such fluff was hardly his forte. Castrodale identified the “small scattered pillows also known as scatter cushions” (“What is a throw pillow?”) and the “child carrying the pillow in a wedding procession” (“What is a ring-bearer?”). And when a clue asked about “folks with tailbone injuries” sitting on “pillows in the shape of these sweet baked treats,” Martin buzzed in. It was the cop, as Alex Trebek gleefully noted, who answered, “What are donuts?”
Barely a week before Craig’s final show aired, Watson was engaged in a closely fought match with the former champion Justin Bernbach, and they were playing the very same clues. This was the day that Watson, after a dominating morning, faltered and crashed. Its patterns in this game seemed to mirror those of Roger Craig. Like Craig, Watson appeared largely lost on pillow talk. Both of them, however, swept through the category on the ancient civilization of Ur. (When you have a category like that, Craig later explained, “You almost know the answers before they ask the questions.” He listed a few on the fingers of one hand: Iraq, Sumeria, Cyrus the Great, and ziggurats, the terraced monuments they built. “What else can they ask about Ur?”) Watson, though following a different logic, delivered the same winning results. Watson and Craig also thrived in the category “But what am I?” It featured the Latin names for certain animals, along with helpful hints, alerting players, for example, not to confuse “Cyanocitta cristata” with Canadian baseball players (“What are Blue Jays?”). These were easy factoids for computer and computer scientist alike.
As Watson went into Final Jeopardy on that September afternoon, it held a slim lead over Bernbach and a comfortable one over Maxine Levaren, a personal success coach from San Diego. But it lost the game to Bernbach, you might recall, by missing a clue in the category Sports and the Media. It failed to name the city whose newspaper celebrated the previous February 8 with the headline: “Amen! After 43 Years, Our Prayers Are Answered.” The computer had only 13 percent confidence in Chicago, but that was higher than its confidence in its other candidates, including Omaha and two cities associated with prayer, Jerusalem and the Vatican. In retrospect, Watson was scouring its database for events dated February 8. But the machine, raised in the era of instant digital news, ignored the lag at the heart of traditional paper headlines: Most of the events they describe occurred the previous day.
Like Watson, Roger Craig reached Final Jeopardy clinging to a narrow lead, $22,000 to $19,700, over Jelisa Castrodale. The Sports and the Media category looked perfect for the sportswriter. But Craig was a fan as well and a master of sports facts—especially those concerning football. The same clue Watson had botched, featuring forty-three years and answered prayers, popped up on the board, and the contestants wrote their responses. After the jingle, Alex Trebek turned to them. Martin, who lagged far behind, incorrectly guessed: “What is Miami?” Castrodale was next: “What is New Orleans?” That was right. She had bet all but one dollar, which lifted her to $39,399. Craig had anticipated her bet and topped it: He would win by $2 if he got it right. But his answer was 840 miles off target. The six-time champion, who had trained himself with the methods and rigor of computer science, came up with the same incorrect response as his electronic role model: “What is Chicago?”
Was the melding of man and machine leading Craig and Watson through the same thought processes and even to the same errors? Weeks later, sitting in IBM’s empty Jeopardy studio, David Gondek opened his Mac and traced the cognitive route that led Watson to the Windy City. “It really didn’t have any idea,” he said, clicking away. The critical document, Gondek found, turned out to be news about a prayer meeting in Chicago on February 8, which featured a prominent swami. When Watson failed to come up with convincing responses, ones that correlated statistically and semantically with the clue, it fell back on documents like this one, which shared only a few matching words. The machine had negligible confidence in answers from such sources. But in this case, it had no better option.
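The mechanics of that fallback can be pictured with a minimal sketch, assuming a simple confidence-ranked list of candidates; the function name and the confidence figures for the also-rans below are hypothetical, not IBM’s actual pipeline.

```python
# A minimal, hypothetical sketch of confidence-ranked answer selection.
# Watson's real scoring combined many evidence signals; this only shows
# how a weak candidate such as "Chicago" (13 percent confidence) can
# still win by default when every alternative scores even lower.

def pick_answer(candidates):
    """candidates: list of (answer, confidence) pairs, confidence in [0, 1]."""
    return max(candidates, key=lambda pair: pair[1])

final_jeopardy = [
    ("Chicago", 0.13),     # drawn from the Feb. 8 prayer-meeting article
    ("Omaha", 0.08),       # these three confidences are illustrative only
    ("Jerusalem", 0.06),
    ("the Vatican", 0.05),
]

answer, confidence = pick_answer(final_jeopardy)
print(f"What is {answer}? (confidence {confidence:.0%})")
```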
Craig had a different story. In the thirty seconds he had to mull Final Jeopardy, thoughts about a prayer service featuring a swami in Chicago never entered his mind. But his analysis, usually so disciplined, was derailed by an all-too-human foible. He fell prey to suggestion, a suggestion nourished by his environment. Just a short drive north of his home in Delaware, Philadelphia’s ice hockey team, the Flyers, had recently battled their way to the Stanley Cup finals. This awakened hockey fever in the metropolitan area and set off an onslaught of media coverage, along with endless chatter and speculation. Hockey hadn’t been on people’s minds to this degree since the glory years of the franchise, when the “Broad Street Bullies” won back-to-back cups in the mid-1970s. The Flyers ultimately lost to the Chicago Blackhawks, a team that hadn’t won in forty-nine years (six years longer than the Saints). So even though Craig was a “huge football fan” who hadn’t missed watching a Super Bowl since his childhood, he had hockey in his head when he saw the Final Jeopardy clue. Much like the psychology test subjects who mistook Moses for the animal keeper on the ark, Craig focused on a forty-something-year championship drought—and looked right past the crucial February date. The hockey final, after all, had been in June. “I blew it,” he said. So did Watson. But despite their virtuoso talents and similar techniques, in this one example of failure they each remained true to their kind. One was dumb as only a machine can be, the other human to a fault.
During the sparring sessions in the spring, Watson had relied on simple heuristics to guide its strategy. Ferrucci at one point called it brain dead, and David Gondek, who had written the rules, had to agree. You might say that such heuristics are “brain-dead by definition,” he said, since they replace analysis with rules. But what a waste it was to equip Watson, a machine that could carry out billions of calculations per second, with such a rudimentary set of instructions.
There was no reason, of course, for Watson’s strategy to be guided by a handful of simple rules. The machine had plenty of processing power, enough to run a trillion-dollar trading portfolio or to manage all of the air traffic in North America or even the world. Figuring out bets for a single game of Jeopardy was well within its range. But before the machine could become a strategic whiz, Gondek and his team had to turn thousands of Jeopardy games into a crazy quilt of statistical probabilities. Then they had to teach Watson—or help it teach itself—how best to play the game. This took time.
The goal was to have Watson analyze a dizzying assortment of variables, from its track record on anagrams or geography puzzlers to its opponents’ ever-changing scores. Then it would come up with the ideal betting strategy for each point of the game and for each clue. This promised to be much simpler for Watson than the rest of its work. English, after all, was foreign to the machine, and Jeopardy clues, even after years of work, remained challenging. Game strategy, with its statistical crunching of probabilities, played to Watson’s strengths.
To tutor Watson in the art of strategy, Gondek brought in one of IBM’s gaming masters, an intense computer scientist named Gerald Tesauro. Short, dark, and neatly dressed, his polo shirt tucked cleanly into dark slacks, Tesauro was one of the more competitive members of the Jeopardy team. He took pride, for example, in his ability to beat Watson to the buzzer. Once, in a practice match against the machine, he managed to buzz in twenty-four times, he later said, and got eighteen of the clues right. Like a basketball player who’s hitting every shot, he said, he was “in some kind of a zone” (though, to be honest, that 75 percent precision rate would place him in a crowd of Jeopardy also-rans). Even when Tesauro was in the audience, he would play along in his mind, jerking an imaginary buzzer in his fist each time he knew the response.
Tesauro gained global renown in the ’90s when he developed the computer that mastered the five-thousand-year-old game of backgammon. (Sumerians, as Roger Craig may already know, played a variation of it in the ancient city of Ur.) What distinguished Tesauro’s approach was that he didn’t teach the machine a thing. Using neural networks, his system, known as TD-Gammon, learned on its own. Following Tesauro’s instructions, it played games against itself, millions of them. Each time it won or lost, it drew conclusions. Certain moves in certain situations led more often to victory, others to defeat. Although this was primitive feedback—no more than thumbs up, thumbs down—each game delivered a minuscule improvement, Tesauro said. Over the course of millions of games, the machine developed a repertoire of winning moves for countless scenarios. Tesauro’s machine beat champions.
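Tesauro’s feedback loop is worth a sketch. The toy below is only an illustration of self-play learning, with a lookup table standing in for TD-Gammon’s neural network and a trivial dice race standing in for backgammon; every name and number in it is hypothetical.

```python
import random
from collections import defaultdict

# A toy illustration of learning from self-play, in the spirit of
# TD-Gammon but vastly simplified: a lookup table stands in for the
# neural network, and a trivial "race to 10" dice game stands in for
# backgammon. Every game ends in the primitive feedback described
# above: thumbs up for the winner's positions, thumbs down for the
# loser's, each worth only a minuscule adjustment.

value = defaultdict(lambda: 0.5)  # estimated winning chance of each position
ALPHA = 0.01                      # learning rate: a tiny nudge per game

def self_play_game():
    scores, visited = [0, 0], [[], []]
    player = 0
    while max(scores) < 10:
        visited[player].append(scores[player])
        scores[player] += random.randint(1, 3)  # the "move"
        player = 1 - player
    winner = 0 if scores[0] >= 10 else 1
    for state in visited[winner]:      # thumbs up
        value[state] += ALPHA * (1.0 - value[state])
    for state in visited[1 - winner]:  # thumbs down
        value[state] += ALPHA * (0.0 - value[state])

for _ in range(100_000):  # TD-Gammon played millions of games against itself
    self_play_game()

# After many games, positions nearer the goal tend to carry higher values.
```

The real system evaluated full backgammon positions with a neural network, but the loop was the same: play, score the outcome, nudge, repeat.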
Tesauro’s first goal was to run millions of simulated Jeopardy games, just as he had with backgammon. For this he needed mathematical models of three players, Watson and two humans. Modeling Watson wasn’t so hard. “We knew all of its algorithms,” he said, and the team had precise statistics on every aspect of its behavior. The human players were more complicated. Tesauro had to pull together statistics on the thousands of humans who had played Jeopardy: how often they buzzed in, their precision in different levels of clues, their betting patterns for Daily Doubles and Final Jeopardy. From these, the IBM team pieced together statistical models of two humans.
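In spirit, each simulated contestant reduced to a handful of probabilities. The sketch below shows what such a model might look like; the fields and sample figures are purely illustrative, whereas IBM’s actual numbers came from records of thousands of past contestants.

```python
import random
from dataclasses import dataclass

# An illustrative statistical player model. The fields and the sample
# figures are hypothetical stand-ins for the statistics the IBM team
# compiled on real Jeopardy contestants.

@dataclass
class PlayerModel:
    buzz_rate: float        # chance of attempting to buzz on a clue
    precision: float        # chance of answering correctly after buzzing
    dd_bet_fraction: float  # typical Daily Double bet as a share of score

    def attempts_buzz(self) -> bool:
        return random.random() < self.buzz_rate

    def answers_correctly(self) -> bool:
        return random.random() < self.precision

# Two simulated humans, parameterized from (hypothetical) historical averages.
journeyman = PlayerModel(buzz_rate=0.61, precision=0.87, dd_bet_fraction=0.3)
champion = PlayerModel(buzz_rate=0.75, precision=0.92, dd_bet_fraction=0.5)
```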
Then they put them into action against the model of Watson. The games had none of the life or drama of Jeopardy—no suspense, no jokes, no jingle while the digital players came up with their Final Jeopardy responses. They were only simulations of the scoring dynamics of Jeopardy. Yet they were valuable. After millions of games, Tesauro was able to calculate the value of each clue at each state of the game. If Watson was in second place, trailing by $1,500 with $14,400 left on the board, what size bet on a Daily Double maximized its chance of winning? The answer changed with every move, and Tesauro was mapping it all out. Humans, when it came to betting, had only about five seconds to make a decision. They went with their gut. Watson, like its number-crunching brethren in advertising and medicine, was turning its pile of data into science.
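That Daily Double question can be framed as a Monte Carlo search over candidate wagers. In the sketch below everything is drastically thinned out: win_prob_after is a crude, hypothetical stand-in for simulating the rest of the game, and the probabilities are invented, whereas Tesauro’s simulations played out full games against the statistical human models.

```python
import random

P_CORRECT = 0.75  # assumed (hypothetical) chance Watson gets the Daily Double right

def win_prob_after(my_score, opp_score, remaining):
    """Crude stand-in for simulating the rest of the game: a player's
    chance of winning grows with its lead relative to the money left."""
    lead = my_score - opp_score
    return max(0.0, min(1.0, 0.5 + lead / (2 * remaining)))

def evaluate_bet(bet, my_score, opp_score, remaining, trials=20_000):
    """Estimate the chance of winning the game after wagering `bet`."""
    wins = 0
    for _ in range(trials):
        score = my_score + bet if random.random() < P_CORRECT else my_score - bet
        if random.random() < win_prob_after(score, opp_score, remaining):
            wins += 1
    return wins / trials

# Watson in second place, trailing by $1,500 with $14,400 left on the board.
my_score, opp_score, remaining = 10_000, 11_500, 14_400
best = max(range(0, my_score + 1, 1_000),
           key=lambda bet: evaluate_bet(bet, my_score, opp_score, remaining))
print(f"Best simulated Daily Double wager: ${best:,}")
```

Under these toy assumptions the search tends to settle on the largest allowable wager, which foreshadows what Tesauro’s real numbers showed.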
The science, it turned out, was a bit scary. Watson’s model was based on the record it had established following the simple heuristics. And studies showed that the machine, much like risk-averse humans, had been dramatically underbetting. In many stages of the game, according to Tesauro’s results, the computer could maximize its chances by wagering nearly everything it had. (This wasn’t always the case. If Watson enjoyed a big lead late in the game, it made sense to minimize a bet.) When Tesauro adjusted Watson’s strategy toward a riskier blend of bets, it started winning more of the simulated games. He and Gondek concluded that in many scenarios, Watson should bet the farm. “When we first went to Dave Ferrucci about this,” Tesauro recalled, “he turned pale as a sheet and said, ‘You want to do what?’”
“We showed him all the extra wins we were getting with this,” Gondek said. “But he looked at the colossal bets we were making and said, ‘What if you get them wrong?’”
The conflict between rational analysis and intuition was playing out right in the IBM War Room. And Ferrucci, much like the humans who placed small, safe bets every evening on Jeopardy, was turning away from statistics and focusing on a deeper and more primitive concern: survival. Watson was going to be playing only one game on national television. What if it bet big on that day and lost?
“That would really look bad for us,” Tesauro said. Perhaps it would be better to sacrifice a win or two out of a hundred and protect Watson a bit more from prime-time catastrophe. It wasn’t clear. The strategy team continued to crunch numbers.
The numbers flowing in from the real matches, where Watson was playing flesh-and-blood humans, were improving. Through the autumn season, the newer, smarter Watson powered its way past scores of Jeopardy champions. It won nearly 70 percent of its matches; its betting was bolder, its responses more assured. It still crashed from time to time, of course, and routinely made crazy mistakes. On one Daily Double, it was asked to name the company that in 2002 “came out with a product line featuring 2-line Maya Angelou poems.” Watson missed the answer (“What is Hallmark?”) and appeared to pay tribute to its creators, responding: “What is IBM?”
Watson’s greatest weakness was in Final Jeopardy. According to the statistics, after the first sixty clues, Watson was leading in an astounding 91 percent of its games. Yet that final clue, with its more difficult wording and complex wagering dynamics, lowered its winning percentage to 67 percent. Final Jeopardy, in other words, turned Watson from a winner into a loser in roughly one-quarter of its games. This was its vulnerability going into the match, and it would no doubt loom larger against the likes of Ken Jennings and Brad Rutter. The average human got Final Jeopardy right about half the time, according to Gondek. Watson hovered just below 50 percent. Ken Jennings, by contrast, aced Final Jeopardy clues at a 68 percent rate. That didn’t bode well for the machine.
Brad Rutter, undefeated in his Jeopardy career, walked into the cavernous Wheel of Fortune studio. It was mid-November, just two months before he and Ken Jennings would take on Watson. Rutter, thirty-two, is thin and energetic, with sharply chiseled features. His close-cut black beard gives him the look of a vacationing television star. This is appropriate, since he recently moved from his native Lancaster, Pennsylvania, to L.A.’s Beechwood Canyon, right under the famous Hollywood sign. He’s trying to make it as an actor.
On this autumn day, Rutter and Jennings were having their orientation for the upcoming match. They were shuttling back and forth between meetings in the Robert Young Building and interviews in the empty Wheel of Fortune studio. Rutter, clearly fascinated by television, spotted a rack of men’s suits by the stage. “Are those Pat Sajak’s?” he asked, referring to the longtime Wheel of Fortune host. Told that they were, he went over to check the labels. For years, the show announced every evening that Sajak’s wardrobe was provided by Perry Ellis. Rutter, a stickler for facts, wanted to make sure it was true. It was.
The previous evening, Rutter had been given a Blu-ray Disc featuring five of Watson’s sparring rounds. He studied them closely. He noticed right away that Watson hopped around the board, apparently hunting for Daily Doubles. He also focused on Watson’s buzzer speed and was relieved to see that humans often managed to beat the machine. This was crucial for Rutter, who viewed speed as his greatest advantage. He said he was no expert on computers and had only a vague idea of how Watson worked. But he had expected the IBM team to give Watson an intricate timing program to anticipate the buzz. (This was a frightfully complex option that Ferrucci had decided not to pursue.) “That scared me,” Rutter said.
Rutter’s speed is legendary. It fueled his 16-0 record on Jeopardy, including his decisive victories over Jennings. It was such an advantage that IBM’s Gondek referred to Rutter as “Jennings Kryptonite.” Rutter said he wasn’t sure what made his thumb so fast, but he had a theory. “I used to play a lot on the Nintendo Entertainment System when I was a kid,” he said. “And if you played Super Mario Brothers or Metroid, you had to hit the button to jump at exactly the right time. It was not about speed but timing. And that’s what the Jeopardy buzzer is about.” This meant that a computer game trained the human, who would later use those same skills to take on another computer.
But Rutter boasted strengths beyond mere speed. In an Ultimate Tournament of Champions match that aired in May 2005, he found himself in a most unusual position—third place—heading into Final Jeopardy. The category was People and Places, and the clue: “This Mediterranean island shares a name with President Garfield’s nickname for his wife.”
“I started scanning Mediterranean islands,” Rutter said. “OK. Sardinia? No. Corsica? No. Sicily? No. Menorca? Mallorca? Malta?” He figured, “Malta could be a girl’s name,” and wrote it down. But he knew it was wrong. As the theme music played, he continued to think of islands. Lesbos, Rhodes, Ibiza … “With about five seconds left,” he said, “I got to Crete.” All at once the pieces came together. “Crete could be short for Lucretia. That’s a very nineteenth-century name. And then it was an apparition in my head. I’d looked at a list of First Ladies, and somehow Lucretia Garfield popped out at me. I can’t explain it. So I scribbled down Crete. It was barely legible. I was the only one to get it right, and I ended up winning by a dollar.”
A timely spark of human brilliance had saved him. It featured a series of insights and connections Watson would be hard-pressed to match. Indeed, in the coming showdown, Rutter’s competition on that type of clue was more likely to come from the other human on the stage. This led to a question: Was it fair that the two humans had to battle each other in addition to the machine? Mightn’t it be easier for one player to face two machines?
Rutter thought so. “I’ve seen Ken play seventy-four matches,” he said. “I know his strengths and weaknesses pretty well. They’re different than Watson’s. So when I’m picking different categories or clues off the board, who do I attack? Whose weaknesses do I try to get to? That’s a tough question. I haven’t really figured it out yet. I’m going to be thinking about it a lot.”
Jennings, sitting in the same studio, elaborated on the point. “I don’t mean it to sound like I’m making excuses already,” he said, “but there is some inherent disadvantage that there are two humans and one Watson.” The way he saw it, Watson’s algorithms would master “a certain percentage of the Jeopardy canon.” And if the computer was fast on the buzzer, it would dominate in those areas. That left two players to battle over the clues “that only humans can do.”