by John Markoff
The idea of the archive would become the guiding principle in the development of Siri. The SRI engineers developed an external memory that provided notes, reminders, schedules, and information, all in the form of a human conversation. The Siri designers adapted the work done on CALO and polished it. They wanted a computer that would take over the task of secretary. They wanted it to be possible to say, “Remind me to call Alan at three thirty or on my drive home.”
Just before Cheyer’s project was renamed Siri, Gruber would arrive to work with the tiny team at SRI that included Cheyer and Dag Kittlaus. Kittlaus had been managing mobile communications projects at Motorola before coming to SRI. They code-named the project HAL, with only a hint of irony.
Cheyer was charming, but he was also fundamentally a highly technical engineer, and for that reason could never be the head of a company. Kittlaus was the opposite. A good-looking, tanned Norwegian executive who straddled the line between technology development and business, he was a quintessentially polished business development operator. He had done early work on the mobile Internet in Europe. Kittlaus arrived with a broad charter, having been asked by the lab’s managers to come in as an “entrepreneur-in-residence.” There wasn’t any particular assignment; he was just supposed to look around and find something promising. It was Kittlaus who found Cheyer. He immediately realized that Cheyer was a hidden gem.
They had first met briefly when Cheyer had been demonstrating prototypes for the wireless industry based on his OAA work in the 1990s. There had been some interest from the telecommunication industry, but Cheyer had realized that there was no way that his toy demos, written in the Prolog artificial intelligence language, would be something that could be used by millions of mobile phone users.
Although SRI later took pains to draw the links between CALO and Siri in order to garner a share of the credit, it was Cheyer who had dedicated his entire career to pursuing the development of a virtual assistant and natural language understanding. When Kittlaus first saw Cheyer’s work on Siri in 2007, he told him, “I can make a company out of this!” Cheyer, however, wasn’t immediately convinced. He didn’t see how Kittlaus could commercialize Siri, but he agreed to help him with the demos. Kittlaus won him over after buying him an iPhone, which had just been released. Cheyer had a very old Nokia and no interest in the new smartphone gadgets. “Play with this!” Kittlaus told him. “This thing is a game changer. Two years from now there will be a competitive response and every handset manufacturer and telco will be desperate to compete with Apple.” Since bandwidth would still be slow and screens would still be small, the companies that tried to compete with Apple would have to look for any competitive advantage they could find.
They were planning a start-up and so they began looking for a technical cofounder, but they also needed an outsider to assess the technology. That search led them to Tom Gruber. Cheyer and Kittlaus prepared a simple demo for Gruber that appeared in Mosaic, an early Web browser. Users could type a question into a search box and it would respond. At the outset Gruber was skeptical.
“I’ve seen this before, you guys are trying to boil the ocean,” he told Cheyer.
The program seemed like a search engine, but then Cheyer began to reveal all the AI components they had integrated into the machine.
Gruber paused. “Wait a moment,” he said. “This isn’t going to be just a search engine, is it?”
“Oh no,” Cheyer responded. “It’s an assistant.”
“But all you’re showing me is a search engine. I haven’t seen anything about an assistant,” Gruber replied. “Just because it talks to me doesn’t mean anything.”
He kept asking questions and Cheyer kept showing him hidden features in the system. As he continued the demonstration, Gruber started to run out of steam and fell silent. Kittlaus chimed in: “We’re going to put it on phones.”
That took Gruber by surprise. At that point, the iPhone had not yet become a huge commercial success.
“This phone is going to be everywhere,” Kittlaus said. “This is going to completely change the world. They are going to leave the BlackBerry behind and we want to be on this phone.” Gruber had spent his time designing for personal computers and the World Wide Web, not mobile phones, so hearing Kittlaus describe the future of computing was a revelation.
In the mid-2000s, keyboards on mobile phones were a limiting factor and so it made more sense to include speech recognition. SRI had been at the forefront of speech recognition research for decades. Nuance, the largest independent speech recognition firm, got its start as an SRI spin-off, so Cheyer understood the capabilities of speech recognition well.
“It’s not quite ready yet,” he said. “But it will be.”
Gruber was thrilled. Cheyer had been the chief architect of the CALO project at SRI, and Kittlaus had deep knowledge of the mobile phone industry. Moreover, Cheyer had access to a team of great programmers who were equipped with the necessary skills to build an assistant. Gruber realized immediately that this project would reach an audience far larger than anything he had worked on before. In order to succeed, though, the team needed to figure out how to design the service to interact well with humans. From his time at Intraspect and Real Travel, Gruber understood how to build computing systems for use by nontechnical consumers. “You need a VP of design,” he told them. It was clear to Gruber that he had the opportunity to work with two of the world’s leading experts in their fields, but he had just left an unsuccessful start-up himself. Did he want to sign up again for the crazy world of a start-up so soon?
Why not?
“Do you need a cofounder?” Gruber asked the two men at the end of the meeting.
The core of the team that would build Siri was now in place.
Independently, the three Siri founders had already spent a lot of time pitching investors in the area for funding for earlier projects. In the past, this had been an onerous chore for Gruber, since it required countless visits to venture capitalists who were often uninterested, arrogant, or both. This time their connection to SRI opened the doors to the Valley’s blue-chip venture firms. Dag Kittlaus was a master showman, and on their tour of the venture capital firms on Sand Hill Road, he developed a witty and charming pitch. He would take Cheyer and Gruber in tow to each fund-raising meeting. The men would then be escorted into a conference room and after they introduced themselves, Kittlaus innocently asked the VCs, “Hey, do any of you have one of those newfangled smartphones?” The VCs thrust their hands in their pockets and almost always retrieved Apple’s then-brand-new iPhone.
“Do you have the latest apps downloaded?” Kittlaus asked.
Yes.
“Do you have Google search?”
Of course!
Kittlaus then placed a twenty-dollar bill on the table and told the VCs, “If you can answer three questions in five minutes, you can walk away with my money.” He then asked the VCs three questions, the answers to which were difficult to search on Google or other similar apps. The venture capitalists listened to the questions and then either said, “Oh, I don’t have that app,” or made their way through multiple browser pages, following various hyperlinks in an effort to synthesize an answer. Inevitably, the VCs failed to answer even one of the questions in the time allowed, and Kittlaus never lost his money.
It was a clever way for the team to force the potential investors to visualize the need for the missing Siri application. To help them, the team put together fake magazine covers. One of them read: “The End of Search—Enter the Age of the Virtual Personal Assistant.” Another one featured an image of Siri crowding Google off the magazine cover. The Siri team also built slides to explain that the Google search was not the end point in the world of information retrieval.
Ultimately the team would be vindicated. Google was slow to come to a broader, more conversational approach to gathering and communicating information. Eventually, however, the search giant would come around to a similar approach. In May of 2013, Amit Singhal, head of the company’s “Knowledge” group, which includes search technology, kicked off a product introduction by proclaiming “the end of search as we know it.” Four years after Siri had arrived, Google acknowledged that the future of search was conversation. Cheyer’s jaw hit the floor when he heard the presentation. Even Google, a company that was all about data, had moved away from static search and in the direction of assistance.
Until they toured Sand Hill for venture capital, Adam Cheyer had been skeptical that the venture community would buy into their business case. He kept waiting for VCs to toss them out of their meetings, but it never happened. At this point, other companies had released less-impressive voice control systems that had gone bust. General Magic, the once high-flying handheld computing Apple spin-off, for example, had tried its hand at a speech-based personal assistant before going out of business in 2002. Gradually, however, Cheyer realized that if the team could develop a really good technical assistant, the venture capitalists and the money would follow.
The team had started looking for money in late 2007 and they were funded before the end of that year. They had initially visited Gary Morgenthaler, one of Silicon Valley’s elder statesmen and an influential SRI contact, for advice, but Morgenthaler liked the idea so much that he invited them back to pitch. In the end, the team picked Morgenthaler and Menlo Ventures, another well-known venture firm.
Before the dot-com era, companies kept their projects under wraps until they were ready to announce their developments at grand publicity events, but that changed during the Silicon Valley buildup to the bubble in the late 1990s. There was a new spirit of openness among more service-oriented new companies, which shared information freely and raced to be first to market. The Siri developers, however, decided to stay quiet; they even used the domain name stealth-company.com as a placeholder and a tease. They found office space in San Jose, far away from the other software start-ups that frequently settled in San Francisco. Having a base in San Jose also made it easy to find new talent. At the time, technical workers with families were moving to the south end of the Peninsula, and commuting to downtown San Jose was a breeze compared to the trek to Mountain View or Palo Alto.
To build the company culture, Adam Cheyer went out and bought picture frames and handed them out to all of the company’s employees. He asked everyone to choose a hero and then put a framed picture of that person on their desks. Then, he asked them to pick a quote that exemplified why that person was important to them. Cheyer hoped this would serve two purposes: it would be interesting to see who people chose, and it would also reflect something about each employee. Cheyer chose Engelbart and attached an early commitment made by the pioneering SRI researcher: “As much as possible, to boost mankind’s collective capability for coping with complex, urgent problems.” For Cheyer, the quote perfectly expressed the tension between automating and augmenting the human experience. He had always harbored a tiny feeling of guilt about his work as he moved between what he thought of as “people-based” systems and artificial intelligence–based projects. His entire career had vacillated between the two poles. It was 2007, the year that he also helped his friends start the activist site change.org, which fell squarely within the Engelbart tradition, and he believed that Siri was moving along the same path. Gruber had wanted to choose Engelbart as well, but when Cheyer chose him first he fell back on his musical hero, Frank Zappa.
Despite having his project sold to Tymnet in the early 1970s, Doug Engelbart had been brought back into the fold at SRI when Cheyer had arrived, and Cheyer had come to know the aging computer scientist as a father figure and a guiding light. Working on projects that were inspired by Engelbart’s augmentation ideas, he had tried to persuade Engelbart that he was working in his tradition. It had been challenging. By the 1990s, Engelbart, who had mapped it all out beginning in the 1960s, was a forlorn figure who felt the world had ignored him. It didn’t matter to Cheyer. He saw the power of Engelbart’s original vision clearly and he took it with him when he left SRI to build Siri.
In college, Cheyer had begun visualizing goals clearly and then systematically working to achieve them. One day, just as the team was getting started, he wandered into an Apple Store and saw posters with an array of colorfully crafted icons representing the most popular iPhone applications. All of the powerful software companies were there: Google, Pandora, Skype. He focused on the advertising display and said to himself: “Someday Siri is going to have its icon right here on the wall of an Apple Store! I can picture it and I’m going to make this happen.”
They went to work. In Gruber’s view, the team was a perfect mix. Cheyer was a world-class engineer, Kittlaus was a great showman, and Gruber was someone who could build high-technology demos that wowed audiences. They knew how to position their project for investors and consumers alike. They not only anticipated the kinds of questions people would ask during a demo; they also researched ideas and technology that would have the most crowd appeal. Convincing the observer that the future was just around the corner became an art form unique to Silicon Valley. The danger, of course, was being too convincing. Promising too much was a clear recipe for disaster. Other personal assistant projects had failed, and John Sculley had publicized a grand vision for Knowledge Navigator, which he never delivered. As Siri’s developers kicked the project into high gear, Gruber dug out a copy of the Knowledge Navigator video. When Apple had shown it years earlier, it had instigated a heated debate within the user interface design community. Some argued, and still argue, against the idea of personifying virtual assistants. Critics, such as Ben Shneiderman, insisted that software assistants were both technically and ethically flawed. They argued for keeping human users in direct control rather than handing off decisions to a software valet.
The Siri team did not shy away from the controversy, and it wasn’t long before they pulled back the curtain on their project, just a bit. By late spring 2009, Gruber was speaking obliquely about the new technology. During the summer of that year he appeared at a Semantic Web conference and described, point by point, how the futuristic technologies in the Knowledge Navigator were becoming a reality: there were now touch screens that enabled so-called gestural interfaces, there was a global network for information sharing and collaboration, developers were coding programs that interacted with humans, and engineers had started to finesse natural and continuous speech recognition. “This is a big problem that has been worked on for a long time, and we’re beginning to see some progress,” he told the audience. Gruber also pointed to developments that were on the horizon, like conversational speech between a computer agent and a human and the delegation of tasks to computers—like telling a computer: “Go ahead, make that appointment.” Finally, he noted, there was the issue of trust. In the Knowledge Navigator video, the professor had let the computer agent handle calls from his mother. If that wasn’t a sign of trust, what was? Gruber hoped his technology would inspire that same level of commitment.
After discussing the technologies forecasted in the Knowledge Navigator video, Gruber teased the audience. “Do we think that this Knowledge Navigator vision is possible today?” he asked. “I’m here to announce”—he paused slightly for effect—“that the answer is still no.” The audience howled with laughter and broke into applause. He added, “But we’re getting there.”
The Siri designers discovered early on that they could quickly improve cloud-based speech recognition. At that point, they weren’t using the SRI-inspired Nuance technology, but instead a rival system called Vlingo. Cheyer noticed that when speech recognition systems were placed on the Web, they were exposed to a torrent of data in the form of millions of user queries and corrections. This data set up a powerful feedback loop to train and improve Siri.
The developers continued to believe that their competitive advantage would be that the Siri service represented a fundamental break with the dominant paradigm for finding information on the Web, the information search, exemplified by Google’s dramatically successful search engine. Siri was not a search engine. It was an intelligent agent in the form of a virtual assistant that was capable of social interaction with humans. Gruber, who was also chief technology officer at Siri, laid out the concepts underlying the service in a series of white papers in the form of technical presentations. Finding information should be a conversation, not a search, he argued. The program should be capable of disambiguating questions to refine the answers to human questions. Siri would provide services, like finding movies and restaurants, not content. It would act as a highly personalized broker for the human user. In early 2010 the Siri team put together an iPhone demonstration for their board of directors. Siri couldn’t speak yet, but the program could interpret spoken queries and converse by responding to human queries in natural language sentences that were displayed in cartoonlike bubbles on screen. The board was enthusiastic and gave the developers more time to tune and polish the program.
In February of 2010, the tiny start-up released the program on the iPhone App Store. They received early positive reviews from the Silicon Valley digerati. Robert Scoble, one of the Valley’s high-profile technology bloggers, referred to it as “the most useful thing that I’ve seen so far this year.” Faint praise perhaps—it was still very early in the year.
Gruber was away at a technology retreat during the release and had almost no access to the Web when the product was first available. He had to rely on secondhand reports—“Dude, have you seen what’s happening to your app?!”—to keep up.