Final Jeopardy


by Stephen Baker


  Even the metaphors in our language lead back to the tumbles and accidents seared into our consciousness in our early years. We “fall” for a sales pitch or “fall” in love, and we cringe at hearing “sharp” words or “stinging” rebukes. We process such expressions on such a basic level that they seem closer to feeling than thought (though for humans, unlike computers, the two are intertwined). Over the course of centuries, these metaphors infused language and, consequently, were fundamental to understanding Jeopardy clues. Yet to a machine with no body or experience in the physical world, each one was a puzzle.

  In some Artificial Intelligence labs, scientists were attempting to transmit these elementary experiences to computers. Sajit Rao, a professor at MIT, was introducing computers equipped with vision to rumpus-room learning, showing them objects moving, falling, obstructing paths, and piling on top of one another. The goal was to establish a conceptual understanding so that eventually computers could draw conclusions from visual observations. What would happen, for example, when vehicles blocked a road?

  Several years later, the U.S. Defense Department’s Advanced Research Projects Agency (DARPA) would fund Rao’s research for a program called Mind’s Eye. The idea was to teach machines not only to recognize objects but to be able to reason about what they were doing, where they might have come from. This work, they hoped, would lead to smart surveillance cameras, which would mean that computers could replace humans in the tedious and exhausting task of monitoring a spot—what the Pentagon calls “persistent stare.” Instead of simply recording movements, these systems would interpret them. If a man in Afghanistan went into a building carrying a package and emerged without it, the system would conclude that he had left it there. If he walked toward another person with a suitcase in his hand, it would predict that he was going to give it to him. A seeing and thinking machine that could generate hypotheses based on observations might zero in on potential roadside bombs or rooftop snipers. This type of intelligence, according to DARPA, would extend computer surveillance from objects to actions—from nouns to verbs.

  This skill required the computer to understand relationships—precisely the stumbling block of IBM’s Piquant as it struggled with questions in the TREC competition. But potential breakthroughs such as Mind’s Eye were still in the earliest stages of research and wouldn’t be ready for years—certainly not in time to give a Jeopardy machine a dose of human smarts. What’s more, Ferrucci was busy managing another big software project. So after consulting his team and assembling the discouraging evidence, he broke the news to a disappointed Paul Horn. His team would not pursue the Jeopardy challenge. It was just too hard to guarantee results on a schedule.

  Free of that distraction, the Q-A team returned to its work, preparing Piquant for the next TREC competition. As it turned out, though, Ferrucci had won them only a respite, and a short one at that. Months later, in the summer of 2006, Horn returned with exactly the same question: How about Jeopardy?

  Reluctantly, Ferrucci and his small Q-A team gathered in a small room at the Hawthorne research center, a ten-minute drive south from Yorktown. (It was a far less elegant structure, a cuboid of black glass in an office park. But unlike Yorktown, where the public spaces were bathed in natural light and the offices windowless, Hawthorne’s offices did have views, mostly of parking lots.) The discussion followed the familiar, depressing lines: the team’s travails in the TREC competitions, the insanely broad domain of Jeopardy, and the difficulty of coming up with answers and a betting strategy in three to five seconds. TREC had no time limit at all, and the computer often churned away for minutes trying to answer a single question.

  While the team talked, Ferrucci sat at the back of the room, uncharacteristically quiet. He had a laptop open and was typing away. He was looking up Jeopardy clues online and then searching for answers on Google. The answers certainly didn’t pop up. But in many cases, the search engine led to the right neighborhood. He started thinking about the technologies needed to refine Google’s vague pointer into a precise answer. It would require much of the tech muscle of IBM. He’d have to bring in top natural-language researchers and experts in machine learning. To speed up the answering process, he’d need to spread out the computing across hundreds or even thousands of machines. This would require a crack hardware unit. His team would also need to educate the machine in strategy. Ferrucci had colleagues who focused on game theory. Several of them were training computers to play the ancient board game Go (whose computational complexity made chess look like Tic-Tac-Toe). Putting together all the pieces of this electronic brain would require a large multidisciplinary team and a huge investment—and even then they might fail. But the prospect of success, however remote, was tantalizing. Ferrucci looked up from his computer and said, “Hey, I think we can do this.”

  At the dawn of Artificial Intelligence (AI), a half century ago, scientists predicted that computers would soon be speaking and answering questions fluently. A pioneer in the field, Herbert Simon, declared in 1965 that “machines w[ould] be capable, within twenty years, of doing any work a man can do.” These were the glory days of AI, a period of boundless vision and bounteous funding. Machines, it seemed, would soon master language, recognize faces, and maneuver, as robots, in factories, hospitals, and homes. In short, computer scientists would create a superendowed class of electronic servants. The promises, of course, went unfulfilled, to the point that Artificial Intelligence became a term of derision. Bold projects to build bionic experts and conversational computers lost their sponsors. A long AI winter ensued, lasting through much of the ’80s and ’90s.

  What went wrong? In retrospect, it seems almost inconceivable that leading scientists, including Nobel laureates like Simon, believed it would be so easy. They certainly appreciated the complexity of the human brain. But they also realized that a lot of that complexity was tied up in dreams, memories, guilt, regrets, faith, and desires, along with the controls to maintain the physical body. Machines wouldn’t have to bother with those details. All they needed was to understand the elements of the world and how they were related to one another. Machines trained in the particulars of sick people, ambulances, and hospitals, for example, could conceivably devote their analytical skills to optimizing emergency services. Yet teaching the machines proved extraordinarily difficult. One of the biggest challenges was to anticipate the responses of humans. The machines weren’t up to it. And they had serious trouble with even the most basic forms of perception, such as seeing. For example, researchers struggled to teach machines to perceive the edges of things in the physical world. As it turned out, it required experience, knowledge, and advanced powers of pattern recognition just to look through a window and understand that the oak tree in the yard was a separate entity. It was not connected to the shed on the other side of it, to a pattern on the glass, or to the wallpaper surrounding the window.

  The biggest obstacle, though, was language. In the early days, it looked beguilingly easy. It was just a matter of programming the machine with vocabulary and linking it all together with a few thousand rules—the kind you’d find in a grammar book. If the machine still underperformed? Well, just give it more vocabulary, more rules.

  Once the electronic brain mastered language, it was simply a question of teaching it about the world. Asia’s over there. This is the United States. We have a democracy. That’s the Pacific Ocean between the two. It’s big, and wet. If researchers kept adding facts, millions of them, and defining their relationships, by the end of the grant cycle they might have a talking, thinking machine that “knew” what humans did.

  Language, of course, turns out to be far more complicated. Jaime Carbonell, a top researcher at Carnegie Mellon University, has been teaching language to machines for decades. The way he describes it, our minds are swimming with cultural and historical allusions, accumulated over millennia, along with a complex scheme of who’s who. Words, when spoken or read, vary wildly according to context. (Just imagine if the cops in New York raced off to Citi Field, sirens wailing, every time someone was heard saying, “The Mets are getting killed!”)

  Carbonell, sitting in his Pittsburgh office, gave another example. He issued a statement: “I want a juicy hamburger.” What does it mean? Well, if a child says it to his mother, it’s a request or a plea. If a general says it to a corporal, it’s a tacit command. And if a prisoner says it to a cellmate, it might be nothing more than a wish. Scientists, of course, could attempt to teach a computer those variables as rules. But new layers of complexity pop up. Is the general a vegan or speaking sarcastically? Or maybe “hamburger” means something entirely different in prison lingo?

  This flexibility isn’t a weakness of language but a strength. Humans need words to be inexact; if they were too precise, each person would have a unique vocabulary of several billion words, all of them unintelligible to everyone else. You might have a unique word for the sip of coffee you just took at 7:59 A.M., which was flavored with anxiety about the traffic in the Lincoln Tunnel or along Paris’s Périphérique. (That single word would be as useless to you as to everyone else. A word has to be used at least twice to have any purpose.)

  Each word is a lingua franca, a fragment of a clumsy common language. Imagine a man saying a simple sentence to a friend: “I’m weary.” He’s thinking about something, but what is it? Has he carried a load a long way in the sun? Does he have a sick child or financial troubles? His friend certainly has different ideas, based on his own experience, about what “weary” means. And beyond its various contexts, the word might send other signals. Maybe where the friend comes from, it has a slightly rarefied feel, and he wonders whether the speaker is trumpeting his sophistication. Neither one knows exactly what the other is thinking. But that single word, “weary,” extends a tiny bridge between them.

  Now, with that bridge in place, the word shared, they dig deeper to see if they can agree on its meaning. They study each other’s expression and tone of voice. As Carbonell noted, context is crucial. Someone who has won the Boston Marathon might be contentedly weary. Another, in a divorce hearing, is anything but. One person may let his jaw go slack in an exaggerated way, as if to say “Know what I mean?” In this tiny negotiation, far beyond the range and capabilities of machines, two people can bridge the gap between the formal definition of a word and what they really want to say.

  It’s hard to nail down the exact end of the AI winter. A certain thaw set in when IBM’s computer Deep Blue bested Garry Kasparov in their epic 1997 showdown. Until that match, human intelligence, with its blend of historical knowledge, pattern recognition, and the ability to understand and anticipate the behavior of the person across the board, ruled the game. Human grandmasters pondered a rich store of knowledge, jewels that had been handed down through the decades—from Bobby Fischer’s use of the Sozin Variation in his 1972 match with Boris Spassky to the history of the Queen’s Gambit Declined. Flipping through scenarios at about three per second—a glacial pace by a computer’s standards—these grandmasters looked for a flash of inspiration, an insight, the hallmark of human intelligence.

  Equally important, chess players tried to read the minds of their foes. This is a human specialty, a mark of our intelligence. Cognitive scientists refer to it as “theory of mind”; children develop it at about age four. It’s what enables us to imagine what someone else is experiencing and to build large and convoluted structures based on such analysis. “I wonder what he was thinking I knew when I told him …” Most fiction, from Henry James to Elmore Leonard, revolves around this very human analysis, something other species—and computers—cannot even approach. (It’s also why humans make such expert liars.)

  Unlike previous AI visions, in which a computer would “think” more or less the way we do, Deep Blue set off on a different course. It played on the strengths of a supercomputer: a fabulous memory and extraordinary calculating speed. Statistical approaches to machine intelligence had been around since the dawn of AI, but the numbers mavens had never witnessed anything approaching this level of computing power and speed. Deep Blue didn’t try to read Garry Kasparov’s mind, and it certainly didn’t count on flashes of inspiration. Instead, it raced through a century of grandmaster games, analyzing similar moves and situations. It then constructed the most probable scenarios for each possible move. It analyzed two hundred million moves per second (nearly seventy million for each one the humans considered). A similar approach for a computer writing poetry would be to scrutinize the patterns and vocabulary of every poem ever written before choosing each word.

  Forget inspiration, creativity, or blinding insight. Deep Blue crunched data and won its match by juggling statistics, testing thousands of scenarios, and calculating the odds. Its intelligence was alien to human beings—if it could be considered intelligence at all. IBM at the time described the machine as “less intelligent than the stupidest person.” In fact, the company stressed that Deep Blue did not represent AI, since it didn’t mimic human thinking. But the Deep Blue team made good on a decades-old promise. They taught a machine to win a game that was considered uniquely human. In this, they passed a chess version of the so-called Turing test, an intelligence exam for machines devised by Alan Turing, a pioneer in the field. If a human judge, Turing wrote, were to communicate with both a smart machine and another human, and that judge could not tell one from the other, the machine passed the test. In the limited realm of chess, Deep Blue aced the Turing test—even without engaging in what most of us would recognize as thought.

  But knowledge? That was another challenge altogether. Chess was esoteric. Only a handful of specially endowed people had mastered the game. Yet all of us played the knowledge game. By advancing from chess to Jeopardy, IBM was shifting the focus from a remote island off the coast straight to our cognitive mainland. Here, the computer would grapple with far more than game theory and math. It would be competing in a field utterly defined by human intelligence. The competitors in Jeopardy, as well as the other humans writing the clues, would feast on knowledge tied to experiences and sensations, sights and tastes. The machine, by contrast, would be blind and deaf, with no body, no experience, no life. Its only memories—if you could call them that—would be millions of lists and documents encoded in ones and zeros. And the entire game would be played in endlessly complex and nuanced language—a cinch for humans, a tribulation for machines.

  Picture one of those cartoons in which a land animal, perhaps a coyote, runs off a cliff and continues to run so fast in midair that it manages to fly (at least for a while). Now imagine that animal not only surviving but flying upward and competing with birds. That would be the challenge facing an IBM machine. It would have to use its native strengths in speed and computation to thrive in an utterly foreign setting. Strictly speaking, the machine would be engaged in a knowledge game without “knowing” a thing.

  Still, Ferrucci believed his team had a fighting chance, though he wasn’t quite ready to commit. He code-named the project Blue J—Blue for Big Blue, J for Jeopardy—and right before the holidays, in late 2006, he asked Horn to give him six months to see if it was possible.

  2. And Representing the Humans …

  ON A LATE SUMMER day in 2004, a twenty-nine-year-old software engineer named Ken Jennings placed a mammoth $12,000 bet on a Daily Double in Jeopardy. The category was Literary Pairs. Jennings, who by that point had won a record fifty straight games, was initially flummoxed by the clue: “The film title ‘Eternal Sunshine of the Spotless Mind’ comes from a poem about these ill-fated medieval lovers.” As the seconds passed, Jennings flipped through every literary medieval couple he could conjure up—Romeo and Juliet, Dante and Beatrice, Petrarch and Laura—but he found reasons to disqualify each one. Time was running out. A difference of $24,000 was at stake, enough for a new car. Jennings quickly reviewed the clue. On his second reading, something about the wording suggested to him that the medieval lovers were historical figures, not literary characters or their creators. He said he couldn’t put his finger on it, but it had “a flavor of history.” At that point, the names of the French philosopher Peter Abelard and his student and lover, Heloise, popped into Jennings’s mind. It was the answer. He just knew it. It was their correspondence, the hundreds of letters they exchanged after their tragic separation, that qualified them as a literary pair. He pronounced the names as time ran out, pocketed the $12,000, and moved on to the next clue.

  In answering that single clue, Jennings displayed several peerless qualities of the human mind, ones that IBM’s computer engineers would be hard-pressed to instill in a machine. First, he immediately understood the complex clue. Unlike even the most sophisticated computers, he was a master of human language. Far beyond basic comprehension, he picked up a nuance in the wording so subtle that even he couldn’t quite decode it. Yet it pushed him toward the answer. Once Abelard and Heloise surfaced, more human magic kicked in: He knew he was right. While a Jeopardy computer would no doubt weigh thousands, even millions, of possibilities, humans could look at a mere handful and often pick the right one with utter confidence. Humans just know things. And good Jeopardy players often sense that they’ll find the answer, even before it comes to mind. “It’s an odd feeling,” Jennings wrote in his 2006 memoir, Brainiac. “The answer’s not on the tip of your tongue yet, but a light flashes in the recesses of your brain. A connection has been made, and you find your thumb pressing the buzzer while the brain races to catch up.”

  Perhaps the greatest advantage humans would enjoy over a Jeopardy machine was kinship with the fellow humans who had written the clues. With each clue, participants attempt to read the mind of the writers. What response could they be looking for? For an easy $200 clue, would the writers expect players to recognize a Caribbean nation as small as Saint Lucia? With that offhand reference to “candid,” could they be pointing toward Voltaire’s Candide? Would they ever stack the European Capitals category with two clues featuring Dublin? When playing Jeopardy, Jennings said, “You’re not just parsing the question, you’re getting into the head of the writer.” In this psychological aspect of the game, a computer would be out of its league.

 
