
The One Device

by Brian Merchant


  “I’m going to have to watch what I say on the record here,” says Tom Gruber with a short smile and a nod toward my recorder. That’s because Gruber is head of advanced development for Siri at Apple. We’re both aboard Mission Blue, a seafaring expedition organized by TED, the pop-lecture organization, and Sylvia Earle, the oceanographer, to raise awareness of marine-conservation issues. By night, there are TED Talks. By day, there’s snorkeling.

  Gruber’s easy to spot—he’s the goateed mad scientist flying the drone. He looks like he’s constantly scanning the room for intel. He talks softly but at a whirring clip, often cutting one rapid-fire thought short to begin another. “I’m interested in the human interface,” he says. “That’s really where I put my effort. The AI is there to serve the human interface, in my book.”

  Right now, he’s talking about Siri, Apple’s artificially intelligent personal assistant, who probably needs little introduction.

  Siri is maybe the most famous AI since HAL 9000, and “Hey, Siri” is probably the best-known AI-human interaction since “I’m sorry, Dave, I’m afraid I can’t do that.” One of those AIs, of course, assists us with everyday routines—Siri answered one billion user requests per week in 2015, and two billion in 2016—and the other embodies our deepest fears about machine sentience gone awry.

  Yet if you ask Siri where she—sorry, it, but more on that in a second—comes from, the reply is always the same: “I, Siri, was designed by Apple in California.” But that isn’t the full story.

  Siri is really a constellation of features—speech-recognition software, a natural-language user interface, and an artificially intelligent personal assistant. When you ask Siri a question, here’s what happens: Your voice is digitized and transmitted to an Apple server in the Cloud while a local voice recognizer scans it right on your iPhone. Speech-recognition software translates your speech into text. Natural-language processing parses it. Siri consults what tech writer Steven Levy calls the iBrain—around 200 megabytes of data about your preferences, the way you speak, and other details. If your question can be answered by the phone itself (“Would you set my alarm for eight a.m.?”), the Cloud request is canceled. If Siri needs to pull data from the web (“Is it going to rain tomorrow?”), to the Cloud it goes, and the request is analyzed by another array of models and tools.
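  To make that hand-off concrete, here is a minimal sketch, in Python, of the local-versus-cloud routing described above. Everything in it (the Request class, recognize_locally, can_answer_on_device) is an invented stand-in for illustration, not Apple’s actual code or API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    audio: bytes          # the digitized voice recording
    transcript: str = ""  # filled in by the speech recognizer
    intent: str = ""      # filled in by natural-language parsing


def recognize_locally(request: Request) -> Request:
    """Stand-in for the on-device recognizer that scans speech on the phone."""
    request.transcript = "set my alarm for eight a.m."  # pretend recognition result
    request.intent = "set_alarm"
    return request


def can_answer_on_device(intent: str) -> bool:
    """Alarms, timers, and the like never need the network."""
    return intent in {"set_alarm", "set_timer", "open_app"}


def handle(request: Request) -> str:
    request = recognize_locally(request)
    if can_answer_on_device(request.intent):
        # The parallel cloud request is canceled; the phone answers by itself.
        return f"Handled on device: {request.transcript}"
    # Otherwise the request goes to the cloud, where other models analyze it.
    return f"Sent to the cloud: {request.transcript}"


print(handle(Request(audio=b"...")))
```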

  Before Siri was a core functionality of the iPhone, it was an app on the App Store launched by a well-funded Silicon Valley start-up. Before that, it was a research project at Stanford backed by the Defense Department with the aim of creating an artificially intelligent assistant. Before that, it was an idea that had bounced around the tech industry, pop culture, and the halls of academia for decades; Apple itself had an early concept of a voice-interfacing AI in the 1980s.

  Before that, there was Hearsay II, a proto-Siri speech-recognition system. And Gruber says it was the prime inspiration for Siri.

  Dabbala Rajagopal “Raj” Reddy was born in 1937 in a village of five hundred people south of Madras, India. Around then, the region was hit with a seven-year drought and subsequent famine. Reddy learned to write, he says, by carving figures in the sand. He had difficulty with language, switching from his local dialect to English-only classes when he went to college, where professors spoke with Irish, Scottish, and Italian accents. Reddy headed to the College of Engineering at the University of Madras and afterward landed an internship in Australia. It was then, in 1959, that he first became acquainted with the concept of a computer.

  He got a master’s at the University of New South Wales, worked at IBM for three years, then moved to Stanford, where he’d eventually obtain his doctorate. He was drawn to the dawning study of artificial intelligence, and when his professor asked him to pick a topic to tackle, he gravitated to one in particular: speech recognition.

  “I chose that particular one because I was both interested in languages, having come from India at that time, and in having had to learn three or four languages,” he said in a 1991 interview for the Charles Babbage Institute. “Speech is something that is ubiquitous to humankind.… What I didn’t know was that it was going to be a lifetime problem. I thought it was a class project.”

  Over the next few years, he tried to build a system for isolated word recognition—a computer that could understand the words humans spoke to it. The system he and his colleagues created in the late 1960s, he said, “was the largest that I knew of at that time—about 560 words or something—with a respectable performance of like about 92%.” As with much of the advanced computer research around Stanford then, ARPA was doing the funding. It would mark a decades-long interest in the field of AI from the agency, which would fund multiple speech recognition projects in the 1970s. In 1969, Reddy moved to Carnegie Mellon and continued his work. There, with more ARPA funding, he launched the Hearsay project. It was, essentially, Siri, in its most embryonic form. “Ironically, it was a speech interface,” Gruber says. “A Siri kind of thing. It was 1975, I think; it was something crazy.”

  Hearsay II could correctly interpret a thousand words of English the vast majority of the time.

  “I just think the human mind is kind of the most interesting thing on the planet,” Tom Gruber says. He went to Loyola University in New Orleans to study psychology before discovering he had a knack for computers, which were just emerging on the academic scene. When the school got a Moog synthesizer, he whipped up a computer interface for it. And he created a computer-aided instruction system that’s still used at Loyola’s psych department today. Then Gruber stumbled across a paper published by a group of scientists at Carnegie Mellon University—led by Raj Reddy.

  What Gruber saw in the paper were the worm roots of AI—a speech-recognition system capable of symbolic reasoning. The beginnings of what would, decades later, become Siri. It’s one thing to train a computer to recognize sounds and match them to data stored in a database. But Reddy’s team was trying to figure out how language could be represented within a computer so that the machine could do something useful with it. For that, it had to be able to recognize and break down the different parts of a sentence.

  Symbolic reasoning describes how the human mind uses symbols to represent numbers and logical relationships to solve problems both simple and complex.

  Like: “‘We have an appointment at two to have an interview,’” Gruber says, referring to the time we’d set aside for our talk. “That’s a statement of fact that can be represented in knowledge-representation terms. It can’t be represented as a database entry unless the entire database is nothing but instances of that fact.” So you could, he’s saying, set up a massive database of every possible date and time, teach the computer to recognize it, and play a matching game. “But that’s not a knowledge representation. The knowledge representation is ‘You human being, me human being. Meet at time and place. Maybe ostensibly for a purpose’—and that is the basis for intelligence.”
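  Gruber’s distinction is easier to see in code. Below is a toy sketch, in Python, of what he means by a knowledge representation: the appointment is stored once as a structured fact the program can reason over, rather than as one row in an endless table of every possible date and time. All the names here (Meeting, conflicts, the particular date) are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass
class Meeting:
    participants: Tuple[str, str]  # "You human being, me human being"
    time: datetime                 # "Meet at time..."
    place: str                     # "...and place"
    purpose: str                   # "Maybe ostensibly for a purpose"


# The interview as a single structured fact (the date itself is invented here).
appointment = Meeting(
    participants=("Brian Merchant", "Tom Gruber"),
    time=datetime(2016, 6, 14, 14, 0),
    place="aboard Mission Blue",
    purpose="an interview",
)


def conflicts(a: Meeting, b: Meeting) -> bool:
    """A tiny example of reasoning over the representation:
    two meetings conflict if they share a participant at the same time."""
    return a.time == b.time and bool(set(a.participants) & set(b.participants))


lunch = Meeting(("Tom Gruber", "Sylvia Earle"),
                datetime(2016, 6, 14, 14, 0), "the deck", "lunch")
print(conflicts(appointment, lunch))  # True: Gruber can't be in both places at two
```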

  Gruber graduated summa cum laude in 1981 and headed to grad school at the University of Massachusetts at Amherst, where he looked into ways that AI might benefit the speech-impaired. “My first project was a human-computer interface using artificial intelligence to help with what’s called communication prosthesis,” he says. The AI would analyze the words of people who suffered from speech-impeding afflictions like cerebral palsy, and predict what they were trying to say. “It is actually an ancestor of something that I call ‘semantic autocomplete.’

  “I used it later in Siri,” Gruber says. “The same idea, just modernized.”

  The automated personal assistant is yet another one of our age-old ambitions and fantasies.

  “He was fashioning tripods, twenty in all, to stand around the wall of his well-builded hall, and golden wheels had he set beneath the base of each that of themselves they might enter the gathering of the gods at his wish and again return to his house, a wonder to behold.” That might be the earliest recorded imagining of an autonomous mechanical assistant, and it appears in Homer’s Iliad, written in the eighth century B.C. Hephaestus, the Greek god of blacksmiths, has innovated a little fleet of golden-wheeled tripods that can shuttle to and from a god party upon command—Homeric robot servants.

  Siri, too, is essentially a mechanical servant. As Bruce G. Buchanan, a founding member of the American Association for Artificial Intelligence, puts it, “The history of AI is a history of fantasies, possibilities, demonstrations, and promise.” Before humans had anything approaching the technological know-how to create machines that emulated humans, they got busy imagining what might happen if they did.

  Jewish myths of golems, summoned out of clay to serve as protectors and laborers, but which usually end up running amok, are centuries old. Mary Shelley’s Frankenstein was an AI patched together with corpses and lightning. As far back as the third century B.C., the Daoist text Liezi describes an “artificer” presenting a king with a lifelike automaton, essentially a mechanical human mannequin that could sing and dance. The first time the word robot appeared was to describe the eponymous subjects of the playwright Karel Čapek’s 1920 play Rossum’s Universal Robots. Čapek’s new word was derived from robota, which means “forced labor.” Ever since, the word robot has been used to describe nominally intelligent machines that perform work for humans. From The Jetsons’ robo-maid Rosie to the Star Wars droids, robots are, basically, mechanical assistants.

  Primed by hundreds of years of fantasy and possibility, around the mid-twentieth century, once sufficient computing power was available, the scientific work investigating actual artificial intelligence began. With the resonant opening line “I propose to consider the question, ‘Can machines think?’” in his 1950 paper “Computing Machinery and Intelligence,” Alan Turing framed much of the debate to come. That work discusses his famous Imitation Game, now colloquially known as the Turing Test, which describes criteria for judging whether a machine may be considered sufficiently “intelligent.” Claude Shannon, the communication theorist, published his seminal work on information theory, introducing the concept of the bit as well as a language through which humans might speak to computers. In 1956, John McCarthy (then at Dartmouth, and later the founder of Stanford’s AI lab) and his colleagues coined the term artificial intelligence for a new discipline, and we were off to the races.

  Over the next decade, as the scientific investigation of AI began to draw interest from the public and as, simultaneously, computer terminals became a more ubiquitous machine-human interface, the two future threads—screen-based interfaces and AI—wound into one, and the servile human-shaped robots of yore became disembodied. In the first season of the original Star Trek, Captain Kirk speaks to a cube-shaped computer. And, of course, in 2001: A Space Odyssey, HAL 9000 is an omnipresent computer controlled—for a while—through voice commands.

  “Now, Siri was more about traditional AI being the assistant,” Gruber says. “The idea, the core idea of having an AI assistant, has been around forever. I used to show clips from the Knowledge Navigator video from Apple.” The video he’s referring to, which is legendary in certain tech circles, is a wonderfully odd bit of early design fiction from John Sculley–era Apple. It depicts a professor in an opulent, Ivy League–looking office consulting his tablet (even now, Gruber refers to it as a Dynabook, and Alan Kay was apparently a consultant on the Knowledge Navigator project) by speaking to it. His computer is represented by a bow-tie-wearing dandy who informs Professor Serious about his upcoming engagements and his colleagues’ recent publications. “So that’s a model of Siri; that was there in 1987.”

  Gruber’s 1989 dissertation, which would be expanded into a book, was “The Acquisition of Strategic Knowledge,” and it described training an AI assistant to acquire knowledge from human experts.

  The period during which Gruber attended grad school was, he says, a “peak time when there were two symbolic approaches to AI. There was essentially pure logical representation and generic reasoning.”

  The logic-driven approach to AI included trying to teach a computer to reason using those symbolic building blocks, like the ones in an English sentence. The other approach was data-driven. That model says, “No, actually the problem is a representation of memory, and reasoning is a small part,” Gruber says. “So lawyers, for instance, aren’t great lawyers because they have deep-thinking minds that solve riddles like Einstein’s. What the lawyers are good at is knowing a lot of stuff. They have databases, and they can comb through them rapidly to find the correct matches, the correct solutions.”

  Gruber was in the logic camp, and the approach is “no longer fashionable. Today, people want no knowledge but lots of data and machine learning.”

  It’s a tricky but substantive divide. When Gruber says knowledge, I think he means a firm, robust grasp on how the world works and how to reason. Today, researchers are less interested in developing an AI’s ability to reason and more intent on having machines do more and more complex machine learning, which is not unlike automated data mining. You might have heard the term deep learning. Projects like Google DeepMind’s neural networks work essentially by hoovering up as much data as possible, then getting better and better at simulating desired outcomes. By processing immense amounts of data about, say, Van Gogh’s paintings, a system like this can be instructed to create a Van Gogh painting—and it will spit out a painting that looks kinda-sorta like a Van Gogh. The difference between this data-driven approach and the logic-driven approach is that this computer doesn’t know anything about Van Gogh or what an artist is. It is only imitating patterns—often very well—that it has seen before.
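  As a rough illustration of that difference, here is a tiny Python sketch of pure pattern imitation: a bigram model that records only which word tends to follow which in its training text, then generates text in the same style. It is nothing like a real deep-learning system in scale or sophistication, and the training sentence is invented, but it makes the point: the output imitates patterns the program has seen, while the program knows nothing about what the words mean.

```python
import random
from collections import defaultdict

training_text = (
    "the starry night swirls above the quiet town "
    "the cypress swirls into the starry sky"
)

# "Learn" only which word follows which.
followers = defaultdict(list)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    followers[current_word].append(next_word)

# Generate something that looks kinda-sorta like the training text.
random.seed(0)
word = "the"
output = [word]
for _ in range(10):
    choices = followers.get(word)
    if not choices:
        break  # reached a word that was never followed by anything
    word = random.choice(choices)
    output.append(word)

print(" ".join(output))  # imitates the patterns it has seen, understands none of them
```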

  “The thing that’s good for is perception,” Gruber says. “The computer vision, computer speech, understanding, pattern recognition, and these things did not do well with knowledge representations. They did better with data- and signal-processing techniques. So that’s what’s happened. The machine learning has just gotten really good at making generalizations over training examples.”

  But there is, of course, a deficiency in that approach. “The machine-learned models, no one really has any idea of what the models know or what they mean; they just perform in a way that meets the objective function of a training set”—like producing the Van Gogh painting.

  Scientists have a fairly good grasp on how human perception works—the mechanisms that allow us to hear and see—and those can be modeled pretty fluidly. They don’t, of course, have that kind of understanding of how our brains work. There’s no scientific consensus on how humans understand language, for instance. The databases can mimic how we hear and see, but not how we think. “So a lot of people think that’s AI. But that’s only the perception.”

  After Amherst, Gruber went to Stanford, where he invented hypermail. In 1994, he started his first company, Intraspect—“Essentially… a group mind for corporations.” He spent the next decade or so bouncing between start-ups and research. And then he met Siri. Or, rather, he met what was about to become Siri. It had been a long time in the making.

  Before we can get to Siri, we have to get back to DARPA. The Defense Advanced Research Projects Agency (or ARPA before 1972) had funded a number of AI and speech-recognition projects in the 1960s, leading Raj Reddy and others to develop the field and inspiring the likes of Tom Gruber to join the discipline. In 2003, decades later, DARPA made an unexpected return to the AI game.

  The agency gave the nonprofit research outfit SRI International around two hundred million dollars to organize five hundred top scientists in a concerted research effort to build a virtual AI. The project was dubbed Cognitive Assistant that Learns and Organizes, or CALO—an attempt to wring an acronym out of calonis, which is, somewhat ominously, Latin for “soldier’s servant.” By the 2000s, AI had fallen out of fashion as a research pursuit, so the large-scale effort took some in the field by surprise. “CALO was put together at a time when many people said AI was a waste of time,” Paul Saffo, a technology forecaster at Stanford University, told the Huffington Post. “It had failed multiple times, skepticism was high and a lot of people thought it was a dumb idea.”

  One reason for the DoD’s sudden interest in AI could have been the escalation of the Iraq War, which began in 2003—and indeed, some technology developed under CALO was deployed in Iraq as part of the army’s Command Post of the Future software system. Regardless, AI went from semi-dormant to a field of major activity. CALO was, “by any measure, the largest AI program in history,” said David Israel, one of its lead researchers. Some thirty universities sent their best AI researchers, and proponents of each of the major approaches to AI collaborated for the first time. “SRI had this project,” Gruber says. “They were paid by the government two hundred million bucks to run this project that was creating… a sensing office assistant, like it’d help you with meetings and PowerPoint, stuff like that. They wanted to push the art of AI,” he says.

  After the project drew to a close in 2008, its chief architect, Adam Cheyer, and a key executive, Dag Kittlaus, decided to spin some of the fundamental elements of the research into a start-up.

  “They came up with an architecture for how you’d represent all the bits an assistant needs to know. Like, how do you recognize speech? How do you recognize human language? How do you understand service providers like Yelp or your calendar app, and how do you combine the input with task intent?” Gruber says.

  Cheyer and Kittlaus imagined their assistant as a navigator of a “do engine” that would supplant the search engine as the predominant way people got around online. Proto-Siri would not only be able to scan the web, but also send a car to come pick you up on command, for instance. Originally, however, it wasn’t conceived as a voice interface, Gruber says.

 
