The issue was speed. The millions of calculations for each question exacted a price in time. Millisecond by millisecond, they added up. A single server took between 90 minutes and 2 hours to answer each clue, more than long enough for Jennifer Chu-Carroll’s lunch break. For Watson to compete in Jeopardy, Ferrucci’s team had to shave that down to a maximum of 5 seconds and an average of 3.
How could Watson speed up by a factor of 1,440? In late 2008, Ferrucci entrusted this job to a five-person team of hardware experts led by Eddie Epstein, a senior researcher. For them, the challenge was to divide the work Watson carried out in two hours into thousands of stand-alone jobs, many of them small sequences of their own. They then had to distribute each job to a different processor for a second or two before analyzing the results cascading in. This work, known as scale-out, required precise choreography, with thousands of jobs calculated to the millisecond, and it would function only on a big load of hardware.
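The factor comes straight from the numbers: two hours is 7,200 seconds, and 7,200 ÷ 5 = 1,440. As a rough illustration of the scale-out idea (split one long computation into stand-alone jobs, fan them out to many workers, then merge the results), here is a minimal Python sketch. It is not IBM’s code. The function names, the toy word-overlap score, and the use of a local thread pool in place of thousands of networked processors are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def score_job(job):
    """Hypothetical stand-alone job: score one candidate answer.

    The "score" is simple word overlap between the clue and a supporting
    passage, a toy stand-in for real evidence scoring.
    """
    clue, candidate, passage = job
    clue_words = set(clue.lower().split())
    passage_words = set(passage.lower().split())
    return candidate, len(clue_words & passage_words)


def scale_out(clue, evidence, workers=8):
    """Fan independent jobs out to a worker pool, then merge the results.

    A real scale-out system would spread jobs across many machines;
    a local thread pool only mimics the split/distribute/merge pattern.
    """
    jobs = [(clue, cand, passage) for cand, passage in evidence.items()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each job to a free worker and returns results in job order.
        results = list(pool.map(score_job, jobs))
    # Merge step: rank candidates by score, best first.
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    evidence = {  # hypothetical candidate answers with supporting passages
        "Po": "the po is the longest river in italy with 141 tributaries",
        "Tiber": "the tiber flows through rome",
        "Arno": "the arno flows through florence",
    }
    print(scale_out("This longest Italian river is fed by 141 tributaries", evidence))
```

The design choice mirrors the passage: because each job needs nothing from the others, jobs can run anywhere in any order, and only the final merge waits on all of them.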
Epstein and his team designed a chunky body for Watson. Packing the computers closely limited the distance the information would have to travel and enhanced the system’s speed. It would develop into a cube of nearly 280 computers, or nodes, each with eight processors—the equivalent of 2,240 computers. The eight towers, each the size of a restaurant refrigerator, carried scores of computers on horizontal shelves, each about as big as a pizza box. The towers were tilted, like the one in Pisa, giving them more surface area for cooling. In its resting state, this assembly of machines emitted a low, whirring hum. But about a half hour before answering a Jeopardy question, the computers would stir into action, and the hum would amplify to a roar. During this process, Watson was moving its trove of data from hard drives onto random access memory (RAM). This is the much faster (and more expensive) memory that can be searched in an instant—without the rotating of disks. Watson, in effect, was shifting its knowledge from its inner recesses closer to the tip of its tongue. As it did, the roar heightened, the heat mounted, and a powerful set of air conditioners kicked into high gear.
It was a snowy day in February 2010 when the marketing team unveiled prototypes of the Watson avatar for David Ferrucci. Ferrucci was working from home with a slow computer connection, so it took him several long minutes to download the video of the avatar in action. “It’s amazing we can get a computer to answer a question in three seconds and it still takes fifteen minutes to download a file,” he muttered. When he finally had the video, the creative team walked him through different possible versions of Watson. They weren’t sure yet whether the avatar would reside in a clear globe, a reddish sphere, or perhaps a simple black screen. However it was deployed, it would continuously shift into numerous states of listening and answering. Miles Gilbert, the art director, explained that the five bars of the Smarter Planet icon would stay idle in the background “and then pop up when he becomes active.”
“This is mesmerizing,” Ferrucci said. But he had some complaints. He thought that the avatar could show more of the computation going on inside the machine. Already, the threads seemed to simulate a cognitive process. They came from different parts of the globe and some grew brighter while others faded. This was actually what was happening computationally, he said, as Watson entertained hundreds of candidate answers and sifted them down to a handful and then just one. Wouldn’t it be possible to feed this type of real-time data to the avatar? “It would be neat if all this movement was less random and meant more,” he said.
It sounded like an awful lot of work for something that might fill a combined six minutes of television time. “You’re suggesting that there should be thousands of threads, and then they’re boiled down to five threads, and ultimately one?” asked a member of the research division’s press team.
“Yeah,” Ferrucci said. “These are threads in massive parallelism. As they come more and more together, they compete with each other. Then you’re down to the five we put on the [answer] panel. One of them’s the brightest, which we put into our answer. This,” he said emphatically, “could be more precise in its meaning.”
There was silence on the line as the artists and PR people digested this contribution from the world of engineering. They moved on to the types of data that Watson could produce for its avatar. Could the system deliver the precise number of candidate answers? Could it show its levels of confidence in each one rising and falling? Ferrucci explained that the machine’s ability to produce data was nearly limitless—though he wanted to make sure that this side job didn’t interfere with its Jeopardy play. “I’m tempted to say something I’ll probably regret,” he said. “We can tell you after each question the probability that we’re going to win the game.” He laughed. “Is there room for that analysis?”
It was around this time that Ferrucci, focusing on the red circular version of Watson, started to carry out image searches on the Internet. He was looking for HAL, the computer in Stanley Kubrick’s 2001. “You probably want to avoid that red-eye look,” he said, “because when it’s pulsating, it looks like HAL. I’m looking at the HAL eye on the Web. It’s red and circular, and kind of global. It’s sort of like Smarter Planet, actually.”
The call ended with Ferrucci promising new streams of Watson data for Joshua Davis and his colleagues at Ogilvy. They had at least until summer to get the avatar up and running. But the rest of Watson—the faceless brain with its new body—was heading into its first round of sparring matches. They would be the first games against real Jeopardy players, a true test of Watson’s speed, judgment, and betting strategy. The humans would carry back a trophy, along with serious bragging rights, if they managed to beat Watson before Ken Jennings and Brad Rutter even reached the podium.
6. Watson Takes On Humans
EARLY IN THE MATCH, David Ferrucci sensed that something was amiss. He was in the observation room next to the improvised Jeopardy studio at IBM Research on a midwinter morning in 2010. On the other side of the window, Watson was battling two humans—and appeared to be melting under the pressure. One Daily Double should have been an easy factoid: “This longest Italian river is fed by 141 tributaries.” Yet the computer inexplicably came up with “What is _____?” No Tiber, no Rubicon, no Po (the correct response). It didn’t come up with a single body of water, Italian or otherwise. It drew a blank.
Ferrucci leaned forward, looking agitated, and said to no one in particular, “It doesn’t feel right. Did you leave off half the system?” His colleagues, all typing on their laptops, kept their heads down and murmured that they hadn’t. To engage Ferrucci when he was in a darkening mood could backfire. No one was looking for a confrontation this early in the morning.
Watson continued to malfunction. As the two Jeopardy players outscored the machine, it developed a small speech defect. Its genial male voice started to add a “D” to words ending in “N.” In the category the Second Largest City, Watson buzzed for the clue, Lahore, and confidently answered, “What is Pakistand?” After a short consultation, the game judge, strictly following the rules, declared the answer incorrect. That turned Watson’s $600 gain into a loss, a difference of $1,200. “This is ridiculous,” Ferrucci muttered.
Then Watson, a still-faceless presence at the far left podium, began to place some ludicrous bets. In one game, it was losing to a journalist and former Jeopardy champion named Greg Lindsay, $12,400 to $6,700. Watson landed on a Daily Double. If it bet big, it could pull even with Lindsay or edge ahead. Yet it wagered a laughable $5. It was Watson’s second strange bet in a row. The researchers groaned in unison. Some of their colleagues were sitting in the studio with the New York Times Magazine’s Clive Thompson, who was writing a piece on Watson. They looked through the window at Ferrucci and shrugged, as if to ask, “What’s up with this beast?”
But Ferrucci didn’t see them. He was staring at David Gondek. Lithe and unusually cheerful, Gondek was a leading member of the team. Unlike most of his suburban colleagues, he lived far south in Greenpoint, Brooklyn, taking the train and biking from the station. He headed up machine learning and game strategy and seemed to have a hand in practically every aspect of Watson. Ferrucci continued to stare wordlessly at him. Gondek, after all, was responsible for programming Watson’s betting strategy, and it looked like the computer was playing to lose. Ferrucci, during this brief interlude, was carrying out an inquisition with his eyes.
Gondek looked up at his boss. “It’s a heuristic,” he explained. He meant that Watson was placing bets according to a simple formula. Gondek and his colleagues were hard at work on a more sophisticated betting strategy, which they hoped would be ready in a month. But for now, the computer relied on a handful of rules to guide its wagers.
“I didn’t realize that it was this stupid!” Ferrucci said. “You never told me it was brain-dead.” He gestured toward Thompson, who was watching the game on the other side of the glass and taking notes on his laptop. “We really enjoy stinking it up for the New York Times writer.”
Gondek started to explain the thinking behind the heuristic. If Watson had barely half the winnings of the leader, one of its rules told it not to risk much in a Daily Double. Its primary goal at this point was not to catch up but to reach Final Jeopardy within striking distance of the leader. If it fell below half of the leader’s total, the game would become a lock: the leader could bet safely in Final Jeopardy, and Watson could not catch up no matter what it wagered. That would be a disaster. So Watson was instructed to be timid in these circumstances, even if it meant losing the game and infuriating the chief scientist.
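To make the idea of a rule-based wager concrete, here is a minimal Python sketch of the kind of “handful of rules” described above. It is a hypothetical illustration, not Gondek’s heuristic: the 0.6 “barely half” threshold and the rules for the leading and comfortably trailing cases are invented. The only real Jeopardy rules it encodes are the $5 minimum wager and the cap at the larger of the player’s score and the top clue value on the board.

```python
MIN_WAGER = 5      # smallest Daily Double wager Jeopardy allows
TIMID_RATIO = 0.6  # hypothetical threshold for "barely half" the leader's total


def heuristic_wager(my_score, rival_score, board_max):
    """Hypothetical rule-based Daily Double wager (illustration only).

    my_score:    the player's current total
    rival_score: the best opponent's current total
    board_max:   the highest clue value on the board this round
    """
    # Jeopardy caps a Daily Double wager at the larger of the player's
    # score and the top clue value on the board in the current round.
    max_wager = max(my_score, board_max)
    if my_score < rival_score and my_score < TIMID_RATIO * rival_score:
        # Barely half the leader: a miss could drop below half and let the
        # leader lock the game before Final Jeopardy, so stay timid.
        return MIN_WAGER
    if my_score < rival_score:
        # Trailing but comfortably in range (hypothetical rule): bet enough to pull even.
        return min(rival_score - my_score, max_wager)
    # Leading (hypothetical rule): risk a quarter of the maximum.
    return max(MIN_WAGER, max_wager // 4)


if __name__ == "__main__":
    print(heuristic_wager(6700, 12400, 2000))
```

With the scores from the Lindsay game ($6,700 against $12,400), this sketch returns the $5 minimum, the same timid bet that made the researchers groan; a more sophisticated strategy would instead weigh Watson’s estimated chance of answering correctly.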
Nearly every week for several months, IBM had been bringing in groups of six players with game experience to match wits with Watson in this new mock-Jeopardy studio. They competed on game boards that had already been played in Culver City but not yet telecast. Friedman’s team would not grant IBM access to the elite players who qualified for Jeopardy’s Tournament of Champions. They didn’t want to give Watson too much exposure to Jeopardy greatness—at least not yet. For sparring partners, the machine had to settle for mere mortals, players who had won no more than two games in televised matches. It was up to Ferrucci’s team to imagine—or, more likely, to calculate—how much more quickly Ken Jennings and Brad Rutter would respond to the buzzer and how many more answers they’d get right.
By the time Watson started the sparring sessions, in November 2009, the machine had already practiced on tens of thousands of Jeopardy clues. But the move from Hawthorne to the Yorktown research center placed the system in a new and surprising laboratory. Playing the game tested new skills, starting with speed. For two years, development at the Hawthorne labs had focused on Watson’s cognitive process—coaxing it to come up with right answers more often, to advance up the Jennings Arc. During games, though, nailing the answer meant nothing if Watson lost the buzz. At the same time, it had to grapple with strategy. This meant calculating its bets in Daily Doubles and Final Jeopardy and estimating its chances on clues it had not yet seen. It also had to anticipate the behavior of its human foes, especially in Final Jeopardy, where single bets often won or lost games.
Perhaps the biggest revelation in the sparring matches came from the spectators: They laughed. They were mostly friends of the players and a smattering of IBM employees, watching from four rows of folding chairs. Watson amused them. This isn’t to say that they weren’t impressed by a machine that came up with some of the most obscure answers in a matter of seconds. But when Watson committed a blooper—and it happened several times a game—they cracked up. They laughed when Watson, exercising its mastery of roman numerals, referred to the civil rights leader Malcolm X as “Malcolm Ten.” They laughed more when Watson, asked what the “Al” in Alcoa stood for, promptly linked the aluminum giant to one of America’s most notorious gangsters: “What is Al Capone?” (Watson, during this stage, often referred to people as things. This established a strange symmetry, since the contestants routinely referred to the Jeopardy machine as “him.”) One Final Jeopardy answer a few weeks later produced more merriment. In the category 19th Century Literature, the clue read: “In Chap. 10, the whole mystery of the handkerchiefs, and the watches, and the jewels … Rushed upon this title boy’s ‘mind.’” Instead of Oliver Twist, Watson somehow came up with a British electronic dance music duo, answering, “What is the Pet Shop Boys?”
From a promotional perspective, an occasional nonsensical answer promised to make Watson a more entertaining television performer, as long as the computer kept it clean. This wasn’t always assured. In one of its first sparring sessions, in late 2009, the machine was sailing along, thrashing a couple of mid-level Jeopardy players in front of an audience that included Harry Friedman and fellow Jeopardy bosses. Then Watson startled everyone with a botched answer for a German four-letter word in the category Just Say No. Somehow the machine came up with “What is Fuck?” and flashed the word for all to see on its electronic answer panel. To Watson’s credit, it didn’t have nearly enough confidence in this response to buzz. (It was a human who correctly responded, “What is nein?”) Still, Ferrucci was mortified. It was a relief, he said, to look over at Friedman and his colleagues and see them laughing.
Such blunders could tarnish IBM’s brand. Watson was the company’s ambassador. It was supposed to represent the future of computing. Machines like this, the company hoped, would soon be answering questions in businesses around the world. But it was clear that Watson could conceivably win the Jeopardy challenge and still be remembered, on YouTube and late-night TV, for its gaffes. After an analysis of Watson’s errors, IBM concluded that 5 percent of them were “embarrassing.” This led Ferrucci, early in 2010, to assign a team of researchers to a brand-new task: keeping Watson from looking dumb. “We call it the stupid team,” said Chu-Carroll. Another team worked on a profanity filter.
As each day’s sparring sessions began, the six Jeopardy players settled into folding chairs between the three contestant podiums, the host’s stand, and the big Jeopardy board, with its familiar grid of thirty clues. David Shepler stood before them. Dark, thin, and impeccably dressed, Shepler ran the logistics of the Jeopardy project. He sweated the details. He made sure that IBM followed to the letter the legal agreements covering the play. He didn’t bend an inch for Watson. (It was his ruling that docked Watson $600 for mispronouncing Pakistan.) In the War Room’s culture of engineers and scientists, Shepler, a former U.S. Air Force intelligence officer, was an outsider. He told them what they could not do, which at times led to resentment. Before each match, he instructed the contestants on the rules. They weren’t to tell anyone or—heaven forbid—blog about the matches, the behavior of Watson, or the clues, which had been entrusted to IBM by Jeopardy. He had them sign lengthy nondisclosure agreements and then introduced David Ferrucci.
On this winter morning, Ferrucci ambled to the front of the room. He was wearing dark slacks and a black pullover bearing IBM’s logo. He outlined the Jeopardy challenge and described the goal of building a question-answering dynamo. He pointed to the window behind them, where a set of blue rectangular towers housed the computers running the Watson program. Through the double-pane window, the players could hear the dull roar of the fans working to cool its processors. Ferrucci, priming the humans for the match ahead, tossed out a couple of Jeopardy clues, which they handled with ease. “Oh, I bet Watson’s getting nervous,” he said. “He could be in for a tough day.”
Still, Watson had made astounding progress since its early days in the War Room. Ferrucci showed a slide of what used to be the Jennings Arc. It had the same constellation of Jennings dots floating high and to the right. But it had been expanded into a Winners Cloud, with blue dots representing hundreds of other Jeopardy winners. Most of the winners occupied the upper right quadrant, though below and to the left of most of Jennings’s dots. The average winner buzzed on about half the questions and got four out of five right. Ferrucci traced Watson’s path on the chart. The computer, which in 2007 produced subhuman results, now came up with confident answers to about two-thirds of the clues it encountered and got more than 80 percent of them right. This level of performance put it smack in the middle of the Winners Cloud. It was not yet in Ken Jennings’s orbit, but it was moving in that direction. Of its thirty-eight games to date against experienced players, Ferrucci said, it had won 66 percent, coming in third only 10 percent of the time.
While explaining Watson’s cognitive process, Ferrucci pointed to a black electronic panel. The players wouldn’t see it during the game, he explained, but this panel would show the audience Watson’s top five candidate answers for each question and how much confidence the machine had in each one. “This gives you a look into Watson’s brain,” he said. Moments later, he gave them a glimpse into his own. Showing how the computer answered a complicated clue featuring the Portuguese explorer Vasco da Gama, Ferrucci pointed to the list of candidate answers. “I was confident and I got it right,” he said. Then, realizing that he was doing a mind meld, he explained that he was speaking for Watson. “I identify with the computer sometimes.”
One of the contestants asked him how Watson “heard” the information. “It reads,” Ferrucci said. “When the clue hits your retina, it hits Watson’s chips.” Another contestant wondered about the algorithms Watson used to analyze the different answers. “Can you tell us how it generates confidence scores?”
“I could tell you,” Ferrucci said, clearly warming to the competitive nature of the Challenge, “but I’d have to shoot you.”
For these sparring rounds, IBM hired a young and telegenic host named Todd Crain. An actor originally from Rockford, Illinois, Crain had blond hair, a square jaw, and a quick wit, and had acted in comedy videos for TheOnion.com. At IBM’s Jeopardy studio, he mastered a fluid and hipper take on Alex Trebek. Unlike Ferrucci’s scientists, who usually referred to Watson as a thing, Crain always addressed Watson as a person. Watson was a character he could relate to, an information prodigy who committed the stupidest and most hilarious errors imaginable. Crain encouraged the machine, flattered it, and upbraided it. Sometimes he closed his eyes theatrically and moaned, “Oooooh, Watson!”
Final Jeopardy Page 12