Surveillance Valley


by Yasha Levine


  Wealth, fame, making a mark on the world—these were the things that the young Page fantasized about. Stanford University, and a research program funded by the Defense Advanced Research Projects Agency (previously known as ARPA), would allow him to achieve his dreams.13

  Stanford sits on the edge of the San Francisco Bay, thirty-five miles south of the city. It was founded by Leland Stanford, a local railroad tycoon elected as the state’s governor, then as a senator.14 When the university opened in 1891, New York’s Mail and Express mocked the project, writing, “the need for another university in California is about as great as that of an asylum for decayed sea captains in Switzerland.”15 But the institution and the surrounding area flourished in tandem. In the early twentieth century, the Bay Area developed a thriving radio and electronics industry, emerging as the center of vacuum-tube manufacturing. During World War II, the area boomed again, driven by the need for radio technology and advanced vacuum-tube design to support the military’s radar technology. After the war, Stanford University became the West Coast’s answer to the Massachusetts Institute of Technology, the elite engineering university closely linked to the US military-industrial complex.16 The area surrounding the campus was the epicenter of computer and microprocessor development.

  William Shockley was an MIT chemist and notorious eugenicist who made his name as part of the Bell Labs team that invented the solid-state transistor. In 1956, he returned to his hometown of Palo Alto to start Shockley Semiconductor inside the university’s Stanford Industrial Park.17 His company spawned several other microchip companies, including Intel, and gave Silicon Valley its name. Hewlett-Packard, Eastman Kodak, General Electric, Xerox PARC, and Lockheed Martin also set up shop inside Stanford’s Industrial Park around the same time. There was so much military work going on in Silicon Valley that, throughout the 1960s, Lockheed was the biggest employer in the Bay Area.

  ARPA had a huge presence on campus, too. The Stanford Research Institute did counterinsurgency and chemical warfare work for the agency as part of William Godel’s Project Agile. It also housed the Augmentation Research Center, an ARPANET site run by the acid-dropping Douglas Engelbart. Indeed, the ARPANET was part-born at Stanford.18

  Into the 1990s, Stanford University hadn’t changed all that much. It was still home to cutting-edge computer and networking research and still awash in military cash and cybernetic utopianism. Perhaps the biggest change occurred in the suburbs surrounding the university—Mountain View, Cupertino, San Jose—which became thick with investors and Internet start-ups: eBay, Yahoo!, and Netscape. Stanford was the epicenter of the Bay Area dot-com boom when a young Larry Page parachuted right into the vortex.

  Page started the computer science PhD program at Stanford in the autumn of 1995. He was in his element and immediately started scratching around for a research topic worthy of a dissertation. He toyed with various ideas, including a self-driving car, which Google would later get into in a heavy way. Eventually, he settled on Internet search.19

  In the mid-1990s, the Internet was growing exponentially. The landscape was chaotic: a jumble of random websites, personal webpages, university sites, news sites, and corporate properties. Pages were popping up all over the place. But there was no good central or authoritative directory that could help people navigate to where they wanted to go or find a particular song, article, or webpage. Search engines and directory portals like Yahoo!, AltaVista, and Excite were crude and sometimes had to be curated by hand. Search algorithms were extremely primitive, matching searches word for word without the ability to find the most relevant results. Despite their primitive technology and awful search results, these early search sites attracted huge amounts of traffic and investment. The young programmers who started them were rich beyond belief.

  In the parlance of Silicon Valley, it was a market ripe for disruption. Finding a way to improve search results not only was intellectually challenging but also could prove to be extremely lucrative.

  With Nikola Tesla’s ghost hanging over him, Page tackled the issue with his laser-guided brain. Page’s tinkering was encouraged by his graduate adviser, Terry Winograd, a pioneer in linguistic artificial intelligence who had done work in the 1970s at MIT’s Artificial Intelligence Lab, a part of the bigger ARPANET project. In the 1990s, Winograd was in charge of the Stanford Digital Libraries project, one component of the multi-million-dollar Digital Library Initiative sponsored by seven civilian, military, and law enforcement federal agencies, including NASA, DARPA, the FBI, and the National Science Foundation.20

  The Internet had grown into a vast and labyrinthine ecosystem spanning every type of computer network and data type imaginable: documents, databases, photographs, sound recordings, text, executable programs, videos, and maps.21 The purpose of the Digital Library Initiative was to find a way to organize and index this digital mess. Though the project had a broad civilian mandate, it was also linked to the needs of intelligence and law enforcement agencies. More and more, life was taking place online. People were leaving behind trails of digital information: diaries, blogs, forums, personal photographs, videos. Intelligence and law enforcement agencies wanted a better way of accessing this valuable asset.

  It made sense. Back in the 1960s, when the military was dealing with an avalanche of data and needed new tools to digest and analyze the information, ARPA was tasked with finding a solution. Three decades later, the Digital Library Initiative had evolved into an extension of the same project, driven by the same needs. And just like old times, DARPA played a role.22 Indeed, in 1994, just one year before Page had arrived at Stanford, DARPA’s funding of the Digital Library Initiative at Carnegie Mellon University produced a notable success: Lycos, a search engine named after Lycosidae, the scientific name for the wolf spider family.23

  Larry Page’s interest in search aligned perfectly with the goals of the Digital Library Initiative, and his research was carried out under its umbrella.24 When he finally published his first research paper in 1998, it bore the familiar disclosure: “funded by DARPA.” The agency that had created the Internet remained a central player.

  Larry Page met Sergey Brin on his first day at Stanford, at graduate orientation. The two were at once similar and polar opposites. They fast became friends.

  Page was withdrawn and quiet; some people thought maybe he was a bit autistic. He spoke with a strange lisp that some people mistook for an Eastern European accent.25 Brin was the opposite. He was social and talkative, and into sports. When fellow students recall his time at Stanford, they remember Brin rollerblading through the halls and constantly dropping by the offices of his professors to chew the fat. Unlike Page, Brin was an actual Eastern European. One overarching activity united the two future billionaires: their early experimentation with computers and the Internet.

  Sergey Brin’s family had emigrated from Moscow to the United States in the 1970s and very successfully integrated into the engineering-academic world. His mother, Eugenia, was a NASA scientist. His father, Michael, was a tenured mathematics professor at the University of Maryland.

  Brin was a math prodigy. When he was nine, he discovered the early Internet and spent his time hanging out in chatrooms and playing multiuser dungeon games, or MUDs.26 He spent hours immersed in this new communication technology, souring on it when he realized that it was full of people just like him, “ten-year-old boys trying to talk about sex.”27

  Brin finished high school in 1990, a year early, and enrolled at the University of Maryland with a dual major in math and computer science. He graduated with honors in 1993 and moved to Palo Alto to continue his studies at Stanford under a National Science Foundation Graduate Research Fellowship.28 At Stanford, he became interested in data mining: building computer algorithms that could predict what people would do on the basis of their past actions. What would they buy? What movies would they like?29 He even founded a student group called MIDAS: “Mining Data at Stanford.” In later years, behavioral data mining would prove to be Google’s Midas touch. But that was well into the future. As Brin grew bored with the narrow focus of his data-mining research, he decided to join a new project with his buddy, Larry Page. “I talked to a lot of research groups, and this was the most exciting project, both because it tackled the Web, which represents human knowledge, and because I liked Larry,” Brin recalled in an interview.30

  The core problem of search was relevance. Some web pages were more important and authoritative than others, but the first search engines couldn’t tell the difference. The key, Page understood, was to find a way to incorporate a ranking system into the search results. It was a simple but powerful idea, cribbed from the world of academia, where the importance of a research paper was measured by how many times it had been cited by other research papers. A paper cited a thousand times was assumed to be more important than a paper cited only ten times. Because of its hyperlinked design—with every webpage linking to other pages—the Internet was essentially one giant citation machine. This was Page’s breakthrough. He called the resultant experimental project “PageRank” and with Brin’s help began lashing the thing together.

  They first coded a bot to crawl the entire Internet, scrape its contents, and save it all on their server at Stanford. They then refined and massaged the PageRank algorithm to produce relevant results. Because different links carried different values—a link from a newspaper like the New York Times was much more authoritative than a link from someone’s personal homepage—they tweaked their calculations so that pages were scored by the number of links as well as the scores of those links themselves. In the end, the rank of any given webpage would be the sum total of all the links and their values that pointed to it. Once the values of a few initial webpages entered the PageRank algorithm, new rankings propagated recursively through the whole web. “We converted the entire web into a big equation with several hundred million variables, which are the page ranks of all the web pages,” Brin explained not long after launching Google.31 It was a dynamic mathematical model of the Internet. If one value changed, then the whole thing would be recomputed.32
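  The recursive scoring described above can be illustrated with a small sketch. This is a simplified power-iteration version, not the code Page and Brin ran: the function name, the toy link graph, the damping factor of 0.85, and the handling of pages with no outgoing links are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by the links pointing at them, weighted by the linkers' own scores.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with equal scores, then let them propagate through the links.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "newspaper".
toy_web = {
    "newspaper": ["blog", "homepage"],
    "blog": ["newspaper"],
    "homepage": ["newspaper"],
    "forum": ["newspaper", "blog"],
}
ranks = pagerank(toy_web)
```

  In this toy graph the "newspaper" page ends up with the highest score because the most pages link to it, and "blog" outranks "homepage" because it has one more inbound link. Changing any single link changes the scores of pages downstream of it, which is the sense in which the whole model has to be recomputed.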

  They folded it into an experimental search engine they called “BackRub” and put it up on Stanford’s internal network. The BackRub logo was creepy: it featured a black-and-white photo of a hand attached to a hairy arm rubbing a nude back. But it didn’t matter. As word spread, students started using it—and they were amazed. This student project was better than any commercial search engine available at the time, such as Excite or AltaVista. The dominant search companies were valued in the billions but did not understand their own business. “They were looking only at text and not considering this other signal,” Page said.33

  The search engine, which the pair quickly renamed Google, became so popular it overwhelmed the bandwidth of Stanford’s network connection. Brin and Page realized they’d hit on something very special. Google was much bigger than a research project.

  Even at that early stage, they understood that Google’s search algorithm wasn’t just abstract mathematics. It catalogued and analyzed webpages, read their contents, looked at outgoing links, and ranked pages by importance and relevance. Because webpages were written and built by people, the two Google creators understood that their indexing system essentially depended on a kind of surveillance of the public Internet. “The process might seem completely automated, but in terms of how much human input goes into the final product, there are millions of people who spend time designing their webpages, determining who to link to and how, and that human element goes into it,” Brin said.34

  But there was more.

  Brin was deeply fascinated by the art and science of extracting information from people’s behavior in order to predict their future actions. Cataloguing the contents of the Internet was just the first step. The next was understanding the intent of the person doing the searching. Was it a teenager? A computer scientist? Male, female, or transgender? Where did they live? Where did they shop? If they searched for “cubs,” were they nature lovers or baseball fans? When they typed “buy underwear” were they interested in lacy thongs or boxer shorts? The more Google knew about someone, the better its search results would be.

  As Page and Brin worked on perfecting Google’s relevance algorithm, they began to think about customizing search results to a person’s interests and habits. Some of their initial ideas were rudimentary, including scanning a person’s browser bookmarks or ingesting the contents of their academic homepage, which usually listed personal interests as well as an academic and professional history. “These search engines could save users a great deal of trouble by efficiently guessing a large part of their interests,” the two wrote in the original 1998 paper that described Google’s search methods.35

  This short sentence would define the future company. Collecting data and profiling users became an obsession for them both. It would make them rich beyond belief and transform Google from a mere search engine into a sprawling global platform designed to capture as much information as possible about the people who came into contact with it.

  The Brain Tap

  In 1998, Larry Page and Sergey Brin moved into the garage of a house owned by Susan Wojcicki, the sister of Brin’s future wife, Anne Wojcicki. They had an initial $100,000 check from Andy Bechtolsheim, the cofounder of Sun Microsystems, a powerful computer company that itself had come out of an ARPA-funded 1970s computer research program at Stanford University.36 The initial small investment was followed by a $25 million tranche from two powerful venture capital outfits, Sequoia Capital and Kleiner Perkins.37

  Brin and Page couldn’t have been happier. Flush with cash, the two young entrepreneurs hired a couple of their Stanford Digital Library Initiative colleagues and plowed their energy into improving Google’s still-rudimentary search engine.

  All the early search engine companies, from Lycos to Yahoo!, AltaVista to AOL, realized that they were sitting on something new and magical. “People came to our servers and they’d leave tracks. We could see every day exactly what people thought was important on the Internet,” Tim Koogle, Yahoo’s first CEO, said.38 “The Net is all about connection.… We sat in the middle, connecting people.” Yahoo! tried leveraging the data to gain insight into consumer demand, but its engineers barely scratched the surface of the valuable data they were amassing. Google’s search logs were no different. What separated the company from the pack was the sophistication and aggressiveness Page and Brin brought to mining and monetizing the data trail.

  Initially, Google’s team focused on mining user behavior to improve the search engine to better guess user intent. “If people type something and then go and change their query, you could tell they aren’t happy. If they go to the next page of results, it’s a sign they’re not happy. You can use those signs that someone’s not happy with what we gave them to go back and study those cases and find places to improve search,” explained one Google engineer.39 Studying the logs for patterns, Google engineers turned user behavior into a system of crowdsourced free labor. It acted like a feedback loop that taught the search engine to be “smarter.” An auto-suggest spellchecker feature allowed Google to recognize minor but important quirks in the way people used language in order to guess the meaning of what people typed rather than just matching text to text. “Today, if you type ‘Gandhi bio,’ we know that ‘bio’ means ‘biography.’ And if you type ‘bio warfare,’ it means ‘biological,’” another Google engineer explained.
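  The two signals the engineer mentions, a changed query and a trip to the next page of results, can be sketched as a pass over a search log. The log format, the function name, and the rule that a reformulation only counts when no result was clicked in between are hypothetical; the source does not describe how Google actually stored or processed its logs.

```python
def dissatisfaction_signals(session):
    """Flag queries that look like they left the user unhappy.

    session is a time-ordered list of (action, value) tuples, where action
    is "query", "next_page", or "click". This format is an assumption.
    """
    flagged = []
    last_query = None
    clicked = False
    for action, value in session:
        if action == "query":
            # A new query typed before clicking anything suggests the
            # previous results missed the mark.
            if last_query is not None and not clicked:
                flagged.append((last_query, "reformulated"))
            last_query = value
            clicked = False
        elif action == "next_page" and last_query is not None:
            # Paging onward suggests the first page was not good enough.
            flagged.append((last_query, "next_page"))
        elif action == "click":
            clicked = True
    return flagged


# Hypothetical session: a misspelled query, a retry, then paging and a click.
session = [
    ("query", "gandi bio"),
    ("query", "gandhi biography"),
    ("next_page", 2),
    ("click", "example.org/gandhi"),
]
signals = dissatisfaction_signals(session)
```

  Here the first query is flagged as reformulated and the second as having sent the user to the next page; cases like these are the ones an engineer would go back and study.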

  Steven Levy, a veteran tech journalist whose early career included a stint at Stewart Brand’s Whole Earth Software Catalog in the 1980s, gained unprecedented insider access to write the history of Google. The result was In the Plex: How Google Thinks, Works, and Shapes Our Lives, a hagiographic but highly informative story of Google’s rise to dominance. The book demonstrates that Page and Brin understood early on that Google’s success depended on grabbing and maintaining proprietary control over the behavioral data they captured through their services. This was the company’s biggest asset. “Over the years, Google would make the data in its logs the key to evolving its search engine,” wrote Levy. “It would also use those data on virtually every other product the company would develop. It would not only take note of user behavior in its released products but measure such behavior in countless experiments to test out new ideas and various improvements. The more Google’s system learned, the more new signals could be built into the search engine to better determine relevance.”40

  Improving Google’s usability and relevance helped make it the most popular search engine on the Internet. By the end of 1999, the company was averaging seven million searches daily, a roughly 70,000 percent increase from the previous year.41 Now that Google dominated the market, it was time to make money. It didn’t take long for the company to figure out how.

  In 2000, right after moving to its new expanded office at 2400 Bayshore in Mountain View, right next to the Ames NASA Center and a short drive from the Stanford campus, Page and Brin launched Google’s first money-maker. It was called AdWords, a targeted advertising system that let Google display ads based on the content of a search query. It was simple but effective: an advertiser selected keywords, and if those keywords appeared in a search string, Google would display the ad alongside search results and would only be paid if a user clicked the link.
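  The keyword-matching scheme described above can be sketched in a few lines. The class, the function names, the advertisers, and the prices are all hypothetical, and the sketch leaves out anything the text does not mention, such as how competing ads would be ordered.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    keywords: set          # words the advertiser selected
    cost_per_click: float  # illustrative price, charged only on a click
    clicks: int = 0


def match_ads(query, ads):
    """Return the ads whose keywords appear in the search string."""
    words = set(query.lower().split())
    return [ad for ad in ads if ad.keywords & words]


def record_click(ad):
    """Register a click and return the amount billed to the advertiser."""
    ad.clicks += 1
    return ad.cost_per_click


# Hypothetical advertisers and a sample search.
ads = [
    Ad("ShoeCo", {"shoes", "sneakers"}, 0.25),
    Ad("BookBarn", {"books"}, 0.10),
]
shown = match_ads("buy running shoes", ads)
revenue = record_click(shown[0])
```

  Only the shoe ad is displayed for this query, and money changes hands only when `record_click` is called. An ad that is shown but never clicked earns nothing.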

  Google’s search logs were vital to AdWords. The company figured out that the better it knew the intention and interests of users when they hit the search button, the more effectively the company could pair users with a relevant advertiser, thus increasing the chance users would click ad links. AdWords was initially rudimentary, matching keyword to keyword. It couldn’t always guess a person’s interests with accuracy, but it was close. With time, Google got better at hitting the target, resulting in more relevant ads, more clicks, and more profits for Google. Multiplied by hundreds of millions of searches a day, even a tiny increase in the probability that a searcher would click an advertising link dramatically boosted company revenue. Over the coming years, Google became hungry for more and more data to refine the efficacy of the ad program. “The logs were money—we billed advertisers on the basis of the data they contained,” explained Douglas Edwards.42

 
