Final Jeopardy


by Stephen Baker


  “So here we have a machine that’s as fast as your brain, or close,” he said. “But it doesn’t think the way we think. So what would be an appropriate grand challenge that would have high visibility and excite people?” He didn’t remember the idea coming from Lickel or hearing about the Fishkill dinner. In fact, Horn thought the idea might have come from him. In any case, he liked it—and promptly ran into resistance. “The general response was negative,” he recalled. “People said, ‘It can’t be done. It’s too much of a publicity stunt. The only reason that you’re interested in it is because it’s a show on TV.’” But Horn thought that building a savvy answering machine was the ideal challenge for IBM. While he maintained that he viewed the grand challenge as pure research, it also made plenty of business sense.

  IBM’s business had undergone a radical transformation over the course of Horn’s thirty-year career at the company. As late as the 1970s, IBM ruled the computer industry. It launched its first computers for business in 1952. But it was its breakthrough mainframe in 1964, the System/360, that established a single standard of computing in business, industry, and science. IBM pitched itself as a safe, if expensive, bet for companies looking to computerize. Its buttoned-down sales and consulting teams spread a compelling message around the world: “Nobody ever got fired for buying IBM.” Big Blue, a name derived from the massive blue mainframes it sold, grew so big that its rivals, including Sperry, Burroughs, Honeywell, and four other companies, came to be known as the Seven Dwarfs. During this time, IBM researchers at Saarinen’s edifice and at other labs around the world churned out an array of new technologies. They came up with magnetic strips for credit cards and floppy disks for computer data storage. Yet it was computers that drove the business. When Horn arrived at IBM Research in 1979, the greatest threat to IBM appeared to be a decade-long antitrust complaint brought by the U.S. Justice Department. It alleged that IBM had violated the Sherman Act by attempting to monopolize the fast-growing industry for business computers. Whether or not Big Blue had broken the law, its dominance was beyond question.

  By 1982, when the Justice Department dropped the suit for lack of evidence, the computer world was shifting under Big Blue’s feet. The previous year, IBM had unveiled its first personal computer, or PC. Priced at $1,500, it provided both legitimacy and a standard for the young industry. Early on, as corporate customers gobbled up PCs, it seemed as though IBM would go on to dominate this next stage of computing. But there was a crucial difference between these desktop machines and the mainframes. Nearly every component of the mainframes, including their processors and software, was made by IBM. In the lingo of the industry, the computers were vertically integrated. This was not the case with PCs. In order to get to market quickly at a low price, IBM built them from off-the-shelf technology—microprocessors from Intel and a rudimentary operating system, MS-DOS, from a Seattle startup called Microsoft. Since the PC had commodity innards, it took no time at all for newcomers, including Compaq and Taiwan’s Acer, to plug them into cheaper “IBM-compatible” computers, or clones. IBM found itself slugging it out with a slew of upstarts while Intel and Microsoft ran away with the profits and grew into titans. Big Blue was in decline, falling faster than most people imagined. And in 1992, the vast industrial behemoth stunned the business world by registering a $4.97 billion loss, the largest in U.S. history at the time. In the space of a decade, a company that had been synonymous with cutting-edge technology now looked tired and wasteful, a manufacturing titan ill-suited to the Information Age. It almost went under.

  A new chief executive, Louis V. Gerstner, arrived in 1993 and transformed IBM. He sold off or shuttered old manufacturing divisions and steered the company toward businesses based on information. IBM did not have to sell machinery to be a leader in technology, he said. It could focus on the intelligence to run the technology—the software—along with the know-how to put the systems to good use. That was services, including consulting, and it led IBM back to growth.

  Technology, in the early ’90s, was convulsing entire industries and the new World Wide Web promised even more dramatic change. IBM’s customers, which included virtually every blue-chip company on the planet, were confused about how these new networks and services fit into their businesses. Did it make sense to shift design work to China or India and have teams work virtually? Should they remake customer service around the Web? They had loads of questions, and IBM decided it could sell the answers. It could even take over tech operations for some of its customers and charge for the service.

  This push toward services and software continued under Gerstner’s successor, Samuel J. Palmisano. Two months after Charles Lickel came back from Poughkeepsie with the idea for a computer that could play Jeopardy, IBM sold its PC division to Lenovo Group of China. That year IBM Global Services registered $40 billion in sales, more than the $31 billion in hardware sales and a much larger share of profits. (By 2009, services would grow to $55 billion, nearly 60 percent of the company’s revenue. And the consultants working in the division sold lots of IBM software, which registered $21 billion in sales.) Naturally, a Jeopardy computer would run on IBM hardware. But the heart of the system, like IBM itself, would be the software created to answer difficult questions.

  A Jeopardy machine would also respond to another change in technology: the move toward human language. For most of the first half-century of the computer age, machines specialized in orderly rows of numbers and words. If the buyers in a database were listed in one column, the products in another, and the prices in a third, everything was clear: Computers could run the numbers in a flash. But if one of the customers showed up as “Don” in one transaction and “Donny” in another, the computer viewed them as two people: The two names represented different strings of ones and zeros, and therefore Don ≠ Donny. Computers had no sense of language, much less nicknames. In that way, they were clueless. The world, and all of its complexity, had to be simplified, structured and spoon-fed to these machines.
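
  To make the contrast concrete, here is a minimal sketch, not from the book, of the literal comparison described above, with a hypothetical nickname table standing in for the sense of language those early systems lacked:

```python
# A structured table is easy for a computer: values either match or they don't.
transactions = [
    {"customer": "Don", "product": "printer", "price": 149.99},
    {"customer": "Donny", "product": "toner", "price": 39.99},
]

# Literal comparison: to the machine, "Don" and "Donny" are different strings
# of ones and zeros, so they look like two different customers.
print(transactions[0]["customer"] == transactions[1]["customer"])  # False

# A hypothetical nickname table is one crude way to bolt on a little of the
# language awareness described above.
NICKNAMES = {"donny": "don"}

def same_customer(a: str, b: str) -> bool:
    """Treat two names as one customer if their canonical forms match."""
    def canon(name: str) -> str:
        return NICKNAMES.get(name.lower(), name.lower())
    return canon(a) == canon(b)

print(same_customer("Don", "Donny"))  # True
```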

  But consider what hundreds of millions of ordinary people were using computers for by 2004. They were e-mailing and chatting. Some were signing up for new social networks. (Facebook launched in February of that year.) Online humanity was creating mountains of a messy type of digital data: human language. Billions of words were rocketing through networks and piling up in data centers. Those words expressed what millions of people were thinking, desiring, fearing, and scheming. The potential customers of IBM’s clients were out there spilling their lives. Entire industries grew by understanding what people were saying and predicting what they might want to do, where they might want to go, and what they were eager to buy. Google was already mining and indexing words on the Web, using them to build a media and advertising empire. Only months earlier, Google had debuted as a publicly traded company, and the new stock was skyrocketing.

  IBM wasn’t about to mix it up with Google in the commercial Web. But Big Blue needed state-of-the-art tools to provide its corporate customers with the fastest and most insightful read of the words cascading through their networks. To keep a grip on its gold-plated consulting business, IBM required the very smartest, language-savvy technology—and it needed its customers to know and trust that it had it. It was central to IBM’s brand.

  So in mid-2005 Horn took up the challenge with a number of his top researchers, including Ferrucci. A twelve-year veteran at the company, Ferrucci managed a handful of research teams, including the five people who were teaching machines to answer simple questions in English. Their discipline was called question-answering. Ferrucci knew the challenges all too well. The machines stumbled in understanding English and appeared to plateau, in competitions sponsored by the U.S. government, at a success rate of about 35 percent.

  Ferrucci wasn’t a big Jeopardy fan, but he was familiar enough with it to appreciate the obstacles involved. Jeopardy tested a combination of knowledge, speed, and accuracy, along with game strategy. The show featured three contestants, each with a buzzer. In the course of about twenty minutes, they raced to respond to sixty clues representing a combined value of $54,000. Each one—and this was a Jeopardy quirk—was in fact an answer, some far more complex than others. The contestant had to provide the missing question. For example, in an unusual Tournament of Champions game that aired in November 1994, contestants were presented with this $500 clue under the category Furniture: “French term for a what-not, a stand of tiered shelves with slender supports used to display curios.” The host, Alex Trebek, read the clue from the big game board. The moment he finished, a panel around the question lit up, setting off the race to buzz. On average, contestants had about four seconds to read and consider the clue before buzzing. The first to buzz was, in effect, placing a bet. The right response—“What is an étagère?”—was worth $500 and gave the contestant the chance to pick again. (“Let’s try European Capitals for $200.”) A botched response wiped the same amount from a contestant’s score and gave the other two a chance to try. (In this example, no one dared to buzz. Such a clue, uncommon in Jeopardy, is known as a “triple-stumper.”)

  To compete in Jeopardy, a machine not only would need to come up with the answer, posed as a question, within four seconds, but it would also have to gauge its confidence in its response. It would have to know what it knew. “Humans know what they know like that,” Ferrucci said later, snapping his fingers. Replicating such confidence in a computer would be tricky. What’s more, the computer would have to calculate the risk according to where it stood in the game. If it was far ahead and had only middling confidence on “étagère,” it might make more sense not to buzz. In addition to piling up knowledge, a computer would have to learn to play the game.
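
  As a rough illustration, and only a sketch with assumed numbers rather than IBM’s actual logic, the buzz decision can be framed as expected value: buzz when the chance of being right, weighted by the clue’s value, outweighs the penalty for being wrong, and demand a wider margin when protecting a lead.

```python
def expected_value(confidence: float, clue_value: int) -> float:
    """Expected change in score from buzzing: win the clue's value if right,
    lose the same amount if wrong."""
    return confidence * clue_value - (1 - confidence) * clue_value

def should_buzz(confidence: float, clue_value: int, lead: int) -> bool:
    """Buzz when the expected value is positive, but require a larger cushion
    when sitting on a big lead (the 50,000 scaling factor is an assumption)."""
    margin = 0.0 if lead <= 0 else min(0.2, lead / 50_000)
    return expected_value(confidence, clue_value) > margin * clue_value

# Middling confidence on "étagère" with a comfortable lead: stay silent.
print(should_buzz(confidence=0.55, clue_value=500, lead=6_000))  # False
print(should_buzz(confidence=0.80, clue_value=500, lead=6_000))  # True
```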

  Complicating the game strategy were four wild cards. Three of the game’s sixty hidden clues were so-called Daily Doubles. In that 1994 game, a contestant named Rachael Schwartz, an attorney from Bedminster, New Jersey, asked for the $400 clue in the Furniture category. Up popped a Daily Double, giving her the chance to bet some or all of her money on a furniture-related clue she had yet to see. She wagered $500, a third of her winnings, and was faced with this clue: “This store fixture began in 15th century Europe as a table whose top was marked for measuring.” She missed it, guessing “What is a cutting table?” and lost $500. (“What is a counter?” was the correct response.) It was early in the game and didn’t have much impact. The three players were all around the $1,000 mark. But later in a game, Ferrucci saw, Daily Doubles gave contestants the means to storm back from far behind. A computer playing the game would require a clever game program to calibrate its bets.

  The biggest of the wild cards was Final Jeopardy, the last clue of the game. As in Daily Doubles, contestants could bet all or part of their winnings on a single category. But all three contestants participated—as long as they had positive earnings. Often the game boiled down to betting strategies in Final Jeopardy. Take that 1994 contest, in which the betting took a strange turn. Going into Final Jeopardy, Rachael Schwartz led Kurt Bray, a scientist from Oceanside, California, by a slim margin, $9,200 to $8,600. The category was Historic Names. To lock down a win, she had to assume he would bet everything, reaching $17,200. A bet of $8,001 would give her one dollar more, provided she got it right. But if they both bet big and missed, they might fall to the third-place contestant, Brian Moore, a Ph.D. candidate from Pearland, Texas. In the minute or so that they took to place their bets, the two leaders had to map out the probabilities of a handful of different scenarios. They wrote down their dollar numbers and waited for the clue: “Though he spent most of his life in Europe, he was governor of the Bahamas for most of World War II.”

  The second-place player, Bray, was the only one to get it right: “Who was Edward VIII?” Yet he had bet only $500. It was a strange number. It placed him $100 behind the leader, not ahead of her. But the bet kept him beyond the reach of the third-place player. Most players bet at least something on a clue. If Schwartz had wagered and missed, he would win. Indeed, Schwartz missed the clue. She didn’t even bother guessing. But she had bet nothing, leaving herself $100 ahead and winning the game.

  The betting in Final Jeopardy, Ferrucci saw, might actually play to the strength of a computer. A machine could analyze betting patterns over thousands of games. It could crunch the probabilities and devise optimized strategies in a fraction of a second. “Computers are good at that kind of math,” he said.
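
  To show the kind of arithmetic involved, here is a small sketch using the scores from that 1994 game; the function and its name are illustrative, not anything from the show or from IBM. It computes the leader’s standard lockout wager and its downside:

```python
def lockout_wager(leader_score: int, second_score: int) -> int:
    """Smallest bet that guarantees a win if the leader responds correctly:
    one dollar more than the runner-up can reach by doubling up."""
    return 2 * second_score - leader_score + 1

schwartz, bray = 9_200, 8_600  # scores going into Final Jeopardy in the 1994 game

bet = lockout_wager(schwartz, bray)
print(bet)             # 8001
print(schwartz + bet)  # 17201 -- one dollar above Bray's maximum of 17200
print(schwartz - bet)  # 1199  -- the risk if she bets big and misses
```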

  It was the rest of Jeopardy that appeared daunting. The game featured complex questions and a wide use of puns posing trouble for literal-minded computers. Then there was Jeopardy’s nearly boundless domain. Smaller and more specific subject areas were easier for computers, because they offered a more manageable set of facts and relationships to master. They provided context. A word like “leak,” for example, had a specific meaning in deep-sea drilling, another in heart surgery, and a third in corporate press relations. A know-it-all computer would have to recognize different contexts to keep the meanings clear. And Jeopardy’s clues took the concept of a broad domain to a near-ludicrous extreme. The game had an entire category on Famous Understudies. Another was on the oft-forgotten president Rutherford B. Hayes. Worse, from a computer architect’s point of view, the game demanded answers within seconds—and penalized players for getting them wrong. A Jeopardy machine, just like the humans on the show, would have to store all of its knowledge in its internal memory. (The challenge, IBM figured, wouldn’t be nearly as impressive if a bionic player had access to unlimited information on the Web. What’s more, Jeopardy would be unlikely to accept a Web-surfing contestant, since others didn’t have the same privilege.) Beating humans in Jeopardy, it seemed, was more than a stretch goal. It appeared impossible and spelled potential disaster for researchers. To embarrass the company on national television—or, more likely, to flame out before even getting there—was no way to manage a career.

  Ferrucci’s pessimism was also grounded in experience. In annual government competitions, known as TREC (Text REtrieval Conference), his question-answering (Q-A) team developed a system called Piquant. It struggled, performing far below Jeopardy levels even on a much easier test. In TREC, the competing teams were each given a relatively small “corpus” of about one million documents. They then had to train the machines to answer questions based on the material. (In one version from 2004, several of the questions had to do with Tom Cruise and his ex-wife.)

  In answering these questions, the computer, for all its processing power and memory, resembled nothing so much as a student with serious brain damage. An apparently simple question could tie it in knots. In 2005, it was asked: “What is Francis Scott Key best known for?” The first job was to determine which of those words represented the subject of the question, the “entity,” and whether that might be a person, a state, or perhaps an animal or a machine. Each one had different characteristics. “Francis” and “Scott” looked like names. But “Key”? That could be a metal tool to open doors or a mental breakthrough to solve problems. In its hunt, the computer might even spend a millisecond or two puzzling over Key lime pies. Clearing up these doubts might require a visit to the system’s “disambiguation” unit, where the answering program consulted a dictionary or looked for contextual clues in the surrounding words. Could “Key” be something the ingenious Francis Scott invented, collected, planted, or stole? Could he have baked it? Probably not. The structure of the question, with no direct object, made it look like the third name of a person. The capital K on Key strengthened that case.
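
  A toy version of that first step might look like the sketch below, a deliberately crude heuristic and nothing like Piquant’s real machinery: pull out the run of capitalized words and use surface clues, such as the capital K and the word’s position after two name-like tokens, to favor reading “Key” as a surname rather than a door key or a pie.

```python
import re

COMMON_OBJECTS = {"key", "lock", "table", "pie"}

def guess_entity(question: str) -> str:
    """Crude entity detection: grab the first run of two or more capitalized
    words and decide whether the trailing word is part of a name or a common noun."""
    match = re.search(r"((?:[A-Z][a-z]+\s+)+[A-Z][a-z]+)", question)
    if not match:
        return "unknown"
    words = match.group(1).split()
    last = words[-1].lower()
    # If the last capitalized word doubles as a common object ("Key"), its
    # capitalization and its position after two name-like words are weak
    # evidence that it is a surname, not a door key or a lime pie.
    if last in COMMON_OBJECTS and len(words) >= 3:
        return "person: " + " ".join(words)
    return "entity: " + " ".join(words)

print(guess_entity("What is Francis Scott Key best known for?"))
# person: Francis Scott Key
```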

  A person confronting that question either knew or did not know that Francis Scott Key wrote the U.S. national anthem, “The Star-Spangled Banner.” But he or she wasted no time searching for the subject and object in the sentence or wondering if it was a last name, a metal tool, or a tangy South Florida dessert.

  For the machine, things only got worse. The question lacked a verb, which could disorient the computer. If the question were “What did Francis Scott Key write?” the machine could likely find a passage of text with Key writing something, and that something would point to the answer. The only pointer here—“is known for”—was maddeningly vague. Assuming the computer had access to the Internet (a luxury it wouldn’t have on the show), it headed off with nothing but the name. In Wikipedia, it might learn that Key was “an American lawyer, author and amateur poet, from Georgetown, who wrote the words to the United States national anthem, ‘The Star-Spangled Banner.’” For humans, the answer was right there. But the computer, with no verb to guide it, might answer that Key was known as an amateur poet or a lawyer from Georgetown. In the TREC competitions, IBM’s Piquant botched two out of every three questions.

  All too often, the system failed to understand the question or to put it in the right context. For this, a growing school of Artificial Intelligence argued, systems needed to spend more time in the computer equivalent of infancy, mastering the concepts that humans take for granted: time, space, and the basic laws of cause and effect.

  Toddlerhood is a tribulation for computers, because it represents knowledge that is tied to the human experience: the body and the senses. While crawling, we learn about space and physical objects, and we get a sense of time. The toddler reaches for the jar on the table. Moments later pieces of it lie scattered on the floor. What happened between those two states? It fell. Such lessons establish notions of before and after, cause and effect, and the nature of gravity. These experiences, most of them accompanied by a steady stream of human language, set the foundation for practically everything we learn. “You crawl around and bump into things,” said David Gunning, a senior manager at Vulcan Inc., an AI incubator in Seattle. “That’s basic research.” It isn’t just jars that fall, the toddler notices. Practically everything does. (Certain balloons are exceptions, which seem magical.) The child turns these observations into theory. Unlike computers, humans generalize.

 
