by Yasha Levine
Indeed, money began raining from the sky. In 2001, Google hired Sheryl Sandberg, a former chief of staff for President Bill Clinton’s Treasury secretary Larry Summers. She was tasked with developing and running the advertising business side of things, and she succeeded beyond anyone’s expectations. With a targeted system based on user behavior, advertising revenue shot up from $70 million in 2001 to $3.14 billion in 2004, the bulk of it resulting from simply showing the right ad at the right time to the right eyeballs.43 It was like a new form of alchemy: Google was turning useless scraps of data into mountains of gold.44
Barbecued Girl Meat
As Google engineers wrung personal information from their growing millions of users, executives worried the smallest disclosure regarding the operation could trigger a fatal public relations disaster. Page especially realized Google could potentially lose users if people understood the ways the company used their search streams.45 Guarding this secret became bedrock corporate policy.46
Page was incredibly paranoid about disclosing any hint of information. At his insistence, the company’s privacy policy was kept vague and brief, recalled Douglas Edwards in I’m Feeling Lucky. “Larry’s refusal to engage the privacy discussion with the public always frustrated me. I remained convinced we could start with basic information and build an information center that would be clear and forthright about the tradeoffs users made when they entered their queries on Google or any other search engine,” he wrote. “Those who truly cared would see we were being transparent. Even if they didn’t like our policies on data collection or retention, they would know what they were. If they went elsewhere to search, they would be taking a chance that our competitors’ practices were far worse than ours.”47
Page didn’t see things this way.
The founder wanted total secrecy. His paranoia reached such a pitch that he began to worry about a scrolling ticker screen in Google’s Mountain View office lobby that displayed random Google searches from around the world in real time. “Journalists who came to Google stood in the lobby mesmerized by this peek into the global gestalt and later waxed poetical about the international impact of Google and the deepening role search plays in all our lives. Visitors were so entranced that they stared up at the display as they signed in for their temporary badges, not bothering to read the restrictive non-disclosure agreements they were agreeing to,” wrote Edwards. “Larry never cared for the scrolling queries screen. He constantly monitored the currents of public paranoia around information seepage, and the scrolling queries set off his alarm.” Page believed that the rolling marquee gave visitors too much insight into what his company was really doing.
Ironically, a struggling Internet has-been provided the public with a rare and inadvertent glimpse at the kind of intimate data search engines had been storing in their search logs. In August 2006, AOL, the giant prehistoric network provider, released into the public domain a few gigabytes’ worth of anonymized search logs: 20 million search queries made by 657,000 of its customers over a three-month period. The search results had been powered by Google, which owned 5 percent of AOL and ran the company’s search engine.48
Page saw these logs as a lucrative but volatile asset, one that threatened the company’s core business if made public. An AOL research team thought differently: they released the batch of logs as a good deed in the name of furthering social research. As far as the public was concerned, it was a good deed. But for AOL, and by extension Google, the logs were a public relations fiasco, shining light on the massive and systemic privacy intrusion upon which the search economy was based.
Responding to the uproar, AOL claimed its engineers had anonymized the logs by replacing personally identifying user account information with randomized numbers. But journalists quickly discovered that user identities could easily be reverse-engineered with just a half dozen searches. One such user—known in the logs as “4417749”—was easily unmasked by a pair of enterprising New York Times reporters as a grandmotherly senior in rural Georgia:
No. 4417749 conducted hundreds of searches over a three-month period on topics ranging from “numb fingers” to “60 single men” to “dog that urinates on everything.” And search by search, click by click, the identity of AOL user No. 4417749 became easier to discern. There are queries for “landscapers in Lilburn, Ga,” several people with the last name Arnold and “homes sold in shadow lake subdivision gwinnett county georgia.” It did not take much investigating to follow that data trail to Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs.49
The AOL log data revealed something else. Many of the search queries were extremely private, humiliating, disturbing, and possibly incriminating. Interspersed with searches on mundane topics like restaurants, television programs, and digital camera reviews were searches for medical ailments and advice on what to do “the morning after being raped” and, in some cases, queries that seemed to show unstable individuals on the verge of doing something violent and dangerous. To fully grasp the personal nature of the now-public searches, here is a sample of the raw logs:
User 2281868
“how destroy demons that live in apt above”
“is hip hop and rap music a form of satanism”
“are niggers satan or demons or gremlins”
“animal sex”
“do niggers have x-ray vision”
User 6416389
“girls fattened for butchering”
“cooked tender flesh of girls”
“cutting steaks from buttocks of girls”
“girls strangled and eaten”
“girls cut up into steaks”
User 1879967
“i eat my ejaculate and how long can it stay fresh”
“livingontheedge”
“i use my cum as an after shave”
“is it unhealthy to store up seman or cum in a glass and drink it in a week”
“i put cum on face as scent to atract girtls”
I looked through the logs, and one search stream caught my attention. It belonged to user 5342598 and featured multiple queries about an unsolved murder of a woman in San Jose, followed by searches for resources that could help a person determine whether they were a serial killer. Here’s a sample of the stream:
User 5342598
“unsolved murders in san jose”
“tara marowski”
“unsolved murder of tar a marowski”
“tara marowski found dead in car”
“tara found dead in car”
“unsolved mysteries tara marowski”
“san jose police departments cold cases”
“psychological test given to prisoners”
“test to see if you are a serial killer”
Did this person murder someone? Was this a serial killer? Was the other searcher a cannibal? Did the other user really believe the neighbors were demons? Or were these people just searching for weird things on the Internet? It is impossible to say. As for the murder searches, they were a matter for law enforcement to figure out, and indeed search logs have become an increasingly important component of criminal investigations.
One thing was certain in the wake of the AOL release: search logs provided an unadulterated look into the details of people’s inner lives, with all the strangeness, embarrassing quirks, and personal anguish those details divulged. And Google owned it all.
You Have Spy Mail
It’s April 2004 and Google is in crisis mode. Sergey Brin and Larry Page set up a war room and bring top executives from across the company together to deal with a dangerous development. They aren’t hunting for terrorists this time, but repelling an attack in progress.
About a month earlier, Google had started to roll out the beta version of Gmail, its email service. It was a big deal for the young company, representing its first product offering beyond search. At the beginning, everything was going smoothly. Then events quickly spiraled out of control.
Gmail aimed to poach users from established email providers such as Microsoft and Yahoo. To do that, Google shocked everyone by offering one gigabyte of free storage space with every account—an incredible amount of space at the time, considering Microsoft’s Hotmail offered just two megabytes of free storage. Naturally, people rushed to sign up. Some were so eager to get their accounts that Gmail’s prepublic release invites were fetching up to $200 on eBay.50 “One gigabyte changes everything. You no longer live in terror that somebody will send you a photo, thereby exceeding your two-megabyte limit and making all subsequent messages bounce back to their senders,” wrote New York Times tech columnist David Pogue. “In fact, Google argues that with so much storage, you should get out of the habit of deleting messages.”51
The Google service seemed too good to be true, once again upending the laws of economics. Why would a company give away something so valuable? It felt like charity. An example of Internet magic at work. Turned out there was a huge upside for Google.
The search box was a powerful thing. It allowed Google to peer into people’s lives, habits, and interests. But it only worked as long as users stayed on Google’s site. As soon as they clicked a link, they were gone, and their browsing stream vanished. What did people do after they left Google.com? What websites did they visit? How often? When? What were those websites about? To these questions, Google’s search logs offered dead silence. That’s where Gmail came in.
Once users logged their Internet browser in to their email account, Google was able to track their every movement on the Internet, even if they used multiple devices. People could even use a rival search engine, and Google could keep a bead on them. Gmail gave Google something else as well.52
In return for the “free” gigabyte of email storage, users gave the company permission to read and analyze all their email in the same way that the company analyzed their search streams and to display targeted ads based on content. They also gave Google permission to tie their search history and browsing habits to their email address.
In this sense, Gmail opened up a whole new dimension of behavior tracking and profiling: it captured personal and business correspondence, private documents, postcards, vacation photos, love letters, shopping receipts, bills, medical records, bank statements, school records, and anything else people routinely sent and received by email. Google argued that Gmail would benefit users, allowing the company to show them relevant ads rather than inundate them with spam.
Not everyone saw it this way.
Less than a week after Gmail’s public launch, thirty-one privacy and civil liberties organizations, led by the World Privacy Forum, published an open letter addressed to Sergey Brin and Larry Page asking them to immediately suspend the email service. “Google has proposed scanning the text of all incoming emails for ad placement. The scanning of confidential email violates the implicit trust of an email service provider,” the organizations wrote. “Google could—tomorrow—by choice or by court order, employ its scanning system for law enforcement purposes. We note that in one recent case, the Federal Bureau of Investigation obtained a court order compelling an automobile navigation service to convert its system into a tool for monitoring in-car conversations. How long will it be until law enforcement compels Google into a similar situation?”53
The press, which until then had nary a negative thing to say about Google, turned critical. The company got bruised by journalists for its “creepy” scanning of emails. One reporter for Canada’s Maclean’s magazine recounted her experience with using Gmail’s targeted ad system: “I discovered recently just how relevant when I wrote an email to a friend using my Gmail account. My note mentioned a pregnant woman whose husband had an affair. The Google ads didn’t push baby gear and parenting books. Rather, Gmail understood that ‘pregnant’ in this case wasn’t a good thing because it was coupled with the word ‘affair.’ So it offered the services of a private investigator and a marriage therapist.”54
Showing ads for spy services to betrayed mothers? It wasn’t a good look for a company that still draped itself in a progressive “Don’t Be Evil” image.
True to Larry Page’s paranoia about letting the privacy “toothpaste out of the tube,” Google stayed tightlipped about the inner workings of its email scanning program in the face of criticism. But a series of profiling and targeted advertising technology patents filed by the company that year offered a glimpse into how Gmail fit into Google’s multiplatform tracking and profiling system.55 They revealed that all email communication was subject to analysis and parsed for meaning; names were matched to real identities and addresses using third-party databases as well as contact information stored in a user’s Gmail address book; demographic and psychographic data, including social class, personality type, age, sex, personal income, and marital status were extracted; email attachments were scraped for information; even a person’s US residency status was established. All of this was then cross-referenced and combined with data collected through Google’s search and browsing logs, as well as third-party data providers, and added to a user profile. The patents made it clear that this profiling wasn’t restricted to registered Gmail users but applied to anyone who sent email to a Gmail account.
Taken together, these technical documents revealed that the company was developing a platform that attempted to track and profile everyone who came in touch with a Google product. It was, in essence, an elaborate system of private surveillance.
There was another quality to it. The language in the patent filings—descriptions of using “psychographic information,” “personality characteristics,” and “education levels” to profile and predict people’s interests—bore eerie resemblance to the early data-driven counterinsurgency initiatives funded by ARPA in the 1960s and 1970s. Back then, the agency had experimented with mapping the value systems and social relationships of rebellious tribes and political groups, in the hope of isolating the factors that made them revolt and, ultimately, using that information to build predictive models to stop insurgencies before they happened. The aborted Project Camelot was one example. Another was J. C. R. Licklider and Ithiel de Sola Pool’s 1969 ARPA Cambridge Project, which aimed to develop a suite of computer tools that would allow military researchers to build predictive models using complex data, including factors such as “political participation of various countries,” “membership in associations,” “youth movements,” and “peasant attitudes and behavior.”
The Cambridge Project had been an early attempt at the underlying technology that made prediction and analysis possible. Naturally, Google’s predictive system, which arrived thirty years later, was more advanced and sophisticated than ARPA’s crude first-generation database tools. But it was also very similar. The company wanted to ingest search, browsing history, and email data to build predictive profiles capable of guessing the future interests and behavior of its users. There was only one difference: instead of preventing political insurgencies, Google wanted the data to sell people products and services with targeted ads. One was military, the other commercial. But at their core, both systems were dedicated to profiling and prediction. The type of data plugged into them was irrelevant.
UC Berkeley law professor Chris Hoofnagle, an expert on information privacy law, argued before the California Senate that the difference between military and commercial profiling was illusory. He compared Google’s email scanning to the surveillance and prediction project at DARPA’s then-active Total Information Awareness (TIA) program, a predictive policing technology that was initially funded by DARPA and handed to the National Security Agency after the September 11 terrorist attacks.56
A year after Google launched Gmail, Hoofnagle testified at hearings on email and privacy held by California’s Senate Judiciary Committee. “The prospect that a computer could, en masse, view transactional and content data and draw conclusions was the plan of John Poindexter’s Total Information Awareness,” he said, referring to President Ronald Reagan’s national security adviser who, under President George W. Bush, was put in charge of helping DARPA fight terrorism.57 “TIA proposed to look at a wide array of personal information and make inferences for the prevention of terrorism or general crime. Congress rejected Poindexter’s plan. Google’s content extraction is different than TIA in that it is designed to pitch advertising rather than catch criminals.” To Hoofnagle, Google’s data mining wasn’t just technically similar to what the government was doing; it was a privatized version of the same thing. He predicted that the information collected by Gmail would eventually be tapped by the US government. It was a no-brainer. “Allowing the extraction of this content from e-mail messages is likely to have profound consequences for privacy. First, if companies can view private messages to pitch advertising, it is a matter of time before law enforcement will seek access to detect criminal conspiracies. All too often in Washington, one hears policy wonks asking, ‘if credit card companies can analyze your data to sell your cereal, why can’t the FBI mine your data for terrorism?’”58
The language of the patents underscored Hoofnagle’s criticism that there was little difference between commercial and military technology. It also brought the conversation back to the fears of the 1970s, when computer and networking technology was first becoming commonplace. Back then, there was widespread understanding that computers were machines built for spying: gathering data about users for processing and analysis. It didn’t matter if it was stock market data, weather, traffic conditions, or a person’s purchasing history.59
To the Electronic Privacy Information Center, Gmail posed both ethical and legal challenges.60 The organization believed Google’s interception of private digital communication to be a potential violation of California’s wiretapping laws. The organization called on the state’s attorney general to investigate the company.