So when a parallel discussion started in January 2003 about Google's own privacy policy and what we revealed to users about the data we collected from their searches, something clicked in my head. "We need to cross the streams," I thought. It was all related. Scumware. Privacy. The Google toolbar. "Yada," I realized in a moment of transcendental clarity, "yada."
Our "Not the usual yada yada" message had forestalled an uproar over our toolbar tracking users as they moved across the web. Now we could shelter ourselves from a PR cataclysm over privacy and fight scumware at the same time by employing a similar tactic. I knew a firestorm was coming. We were not immune to criticism about our privacy policy—from mild concerns to wild conspiracy theories. We had people's most intimate thoughts in our log files and, soon enough, people would realize it. We didn't know who searched for what, but, as I had seen after 9/11, there were ways to extract that information if someone was motivated to do so.
Chances are you've Googled yourself. Almost all of us have searched for our own names. When you do that, Google sees your IP address, the number corresponding to your computer's connection point on the Internet. If you connect to the Internet via a large commercial Internet service provider (ISP), a new IP address is theoretically assigned each time you log on* and then reissued to others when you turn off your computer. In practice, however, your IP address may not change for days or even months.
If you've used Google before, most likely you also have a Google cookie on your computer—a unique string of digits Google placed there so it can remember your preferences each time you come back (preferences like "apply SafeSearch filtering" or "show results in Chinese"). Google doesn't know your name or your real-world location, though your IP address may reveal your city if your ISP assigns blocks of numbers to specific geographic regions.
Looking at all the searches conducted from one IP address by a computer with a cookie assigned to it over a period of time could give a search engine data about individual user behavior. That information would be invaluable in improving both the relevance of search results and the targeting of advertising.
Why is the information helpful? Say that for a single twenty-four-hour period you threw all the search terms entered by one cookie/IP address combo into a bucket and analyzed them to establish correlations. Then say you compared those correlations with those found in other buckets: other searches conducted by other cookied computers. Patterns would emerge. So if you found that a search for "best sushi in Mountain View" was often followed by a search for "Sushi Tomi restaurant," you might associate Sushi Tomi with the best sushi in Mountain View. A large search engine could compare tens of millions of buckets to determine how terms were related to one another. With that much data, you could derive some fairly definitive answers.
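If I were to sketch that bucket idea in code today, it might look something like the toy example below. Everything in it is invented for illustration, the record format, the sample queries, the field names; it has nothing to do with Google's actual log systems, only with the general shape of the analysis.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical log records: (cookie_id, ip_address, query), all drawn from
# one twenty-four-hour window. The data and field names are made up.
log_records = [
    ("cookie_a", "10.0.0.1", "best sushi in mountain view"),
    ("cookie_a", "10.0.0.1", "sushi tomi restaurant"),
    ("cookie_b", "10.0.0.2", "best sushi in mountain view"),
    ("cookie_b", "10.0.0.2", "sushi tomi restaurant"),
    ("cookie_c", "10.0.0.3", "nervous system"),
]

# One "bucket" per cookie/IP combination.
buckets = defaultdict(set)
for cookie, ip, query in log_records:
    buckets[(cookie, ip)].add(query)

# Count how often two queries land in the same bucket.
pair_counts = defaultdict(int)
for queries in buckets.values():
    for a, b in combinations(sorted(queries), 2):
        pair_counts[(a, b)] += 1

# Pairs that show up together across many buckets suggest related terms.
for (a, b), count in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
    print(f"{count} buckets paired: {a!r} <-> {b!r}")
```

Scale that from five toy records to tens of millions of real buckets and you begin to see why the data was so tempting to keep.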
Using searchers' data, though, creates a fundamental dilemma. How do you protect user privacy while retaining the maximum value of the data for improving the search engine that collected it? Part of Google's answer was to anoint Nikhil Bhatla our "privacy czar." One of the first questions Nikhil raised was about identifying a user strictly from the stream of queries tied to one cookie over time. He shared an anecdote about engineer Jeff Dean, who had been working in the logs system where user search data was recorded. Jeff noticed that one cookie had been conducting a very interesting series of queries on technical topics, using highly sophisticated search techniques. He was impressed by the searcher's acumen. Only after studying the data further did he realize that the query stream he was looking at came from his own computer.
Nikhil's question kicked off a privacy debate among Googlers that dragged on for weeks. No one wanted to identify users or misuse the information we collected. But we also knew we weren't the only ones who might see the data in our logs. It was 2003. The Patriot Act had been the law of the land for a little more than a year, loosening restrictions on the government's ability to access email and other electronic communication records. The Justice Department could request data from Google, and we would be legally bound not to tell users that their information had been passed to law enforcement officials. Attorney General John Ashcroft might soon be knocking on our door.
The arguments raised were so complex and technical that it would be impossible to detail them all here.* The main issues, though, had to do with controlling access to logs data by Google staff, the length of time Google retained user data files, notifying users that we were storing their search information, and giving users the option to delete data we had collected. The tradeoff with each of these would be a reduction in Google's ability to mine logs data to make better products for all its users.
I trusted that my colleagues would make intelligent, ethical decisions on data access and retention. The point I cared most about was notification. I drafted a proposal outlining all the things we could do, and should do, to lead the discussion on privacy and to set an industry standard. Instead of avoiding the issues raised by the collection of user data, I advocated we embrace them. We had nothing to hide. We could establish an advisory committee of outside privacy advocates, set up a public forum in Google groups, post tutorials about data gathering on our site, and give instructions on how to delete cookies to avoid being tracked.
Matt Cutts and Wayne Rosing, our VP of engineering, loudly and publicly supported the plan. I started thinking about ways we could build an area on our site for consumer advocacy. Then Cindy let me know privately that one Googler was not pleased with my proposal. Marissa, Cindy said, claimed that the idea of an advisory panel was hers and that I had neglected to give her credit. I rolled my eyes. I had suggested an advisory panel because we had had one at the Merc. I had not been in any meeting at which Marissa brought up the topic and so had no idea that she had suggested something similar.
I was tempted to fire off a note to that effect, but at my performance review a couple of weeks earlier Cindy had instructed me to stop waging email wars that went on forever. So instead of refuting Marissa over the ether, I set up a face-to-face meeting. It took me a week to get on her calendar, and even then her only available time was after dinner. As dusk fell, we went for a walk around the vacant lot next door to try and clear the air.
Marissa assured me that I was not the only one misappropriating her ideas. I was just the most recent. And, she wanted to know, why didn't she get credit for her work on the homepage promotion lines, which, after all, should really be her responsibility, not marketing's?
I wasn't sure what credit there was to give for a single line of text on the Google.com homepage, or who else in the company might care, but I offered to publicly acknowledge her contributions whenever she made them. I wasn't willing to cede control over them, though. The marketing text on the homepage was the most valuable promotional medium we employed. It reached millions of people, and since promotion was a marketing responsibility, not a product-management one, I insisted that marketing should control the space.
As we headed back into the building, I assured Marissa, with complete sincerity, that I respected her intelligence and opinions and the enormous contribution she made to Google. I viewed her as my most important colleague in terms of the work that lay ahead. We had been working together to improve Google for more than three years, I reminded her. Despite our differing points of view in the past—and probably going forward as well—it was essential we maintain a direct channel of communication. I encouraged her to bring future issues to me and assured her I would do the same.
I told Cindy later that our chat had poured oil on some troubled waters. But, I added, I didn't expect it to be the last conversation of its kind.
Meanwhile, the privacy discussion had grown a thousand heads and was consuming vast quantities of time and mental effort among the engineers and the product team. Was our goal to make Google the most trusted organization on the planet? Or the best search engine in the world? Both goals put user interests first, but they might be mutually exclusive.
Matt Cutts characterized the two main camps in what he termed "the Battle Royal" as hawks and doves, where hawks wanted to keep as much user information as we could gather and doves wanted to delete search data as quickly as we got it. Larry and Sergey were hawks. Matt considered himself one as well.
"We never know how we might use this data," Matt explained. "It's a reflection of what the world is thinking, so how can that not be useful?" As someone who worked on improving the quality of Google's search results, Matt saw limitless possibilities. For example, "You can learn spelling correction even in languages that you don't understand. You can look at the actions of users refining their queries and say, if you see someone type in x, it should be spell-corrected to y."
Well, some engineers asked, why don't we just tell people how we use cookie data to improve our products? We could give Matt's example about the spell checker, which also relied on user data to work its magic with names like the often misspelled "Britney Spears."
We don't tell them, Larry explained, because we don't want our competitors to know how our spell checker works. Larry opposed any path that would reveal our technological secrets or stir the privacy pot and endanger our ability to gather data. People didn't know how much data we collected, but we were not doing anything evil with it, so why begin a conversation that would just confuse and concern everyone? Users would oversimplify the issue with baseless fears and then refuse to let us collect their data. That would be a disaster for Google, because we would suddenly have less insight into what worked and what didn't. It would be better to do the right thing and not talk about it.
Matt understood Larry's position. He also sympathized with Googlers who wanted to compromise by anonymizing the data or encrypting the logs and then throwing away the keys every month. That would keep some data accessible, but the unique identifiers would disappear.
Not that Matt thought it would do any good in stemming public concerns. "Part of the problem," he told me, "was explaining that in real-world terms. As soon as you start talking about symmetric encryption and keys that rotate out, people's eyes turn to glass." The issue was too complicated to offer an easy solution. Even if we agreed to delete data, we couldn't be sure we erased all of it, because of automatic backups stored in numerous places for billing advertisers or maintaining an audit trail. I began to understand the hesitation to even engage in the discussion with users.
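The rotating-keys compromise is easier to show than to explain. Here is a minimal sketch of the idea as I understood it, not anything Google actually built: each cookie ID is replaced by a keyed hash, and when the month's key is destroyed, the pseudonyms can no longer be tied back to the original identifiers, though data within the month can still be grouped.

```python
import hmac
import hashlib
import secrets

# Hypothetical sketch of "encrypt the logs and throw away the keys monthly":
# pseudonymize each cookie ID with a secret that lives only for one month.
monthly_key = secrets.token_bytes(32)  # destroyed when the month rolls over

def pseudonymize(cookie_id: str) -> str:
    """Return a token that is stable within the month, unlinkable after rotation."""
    return hmac.new(monthly_key, cookie_id.encode(), hashlib.sha256).hexdigest()

# Within the month, the same cookie maps to the same token, so queries can
# still be grouped; once monthly_key is gone, so is the mapping.
print(pseudonymize("cookie_a") == pseudonymize("cookie_a"))  # True
print(pseudonymize("cookie_a") == pseudonymize("cookie_b"))  # False
```

Which, as Matt said, is exactly the kind of thing that makes people's eyes turn to glass.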
What if we let users opt out of accepting our cookies altogether? I liked that idea, but Marissa raised an interesting point. We would clearly want to set the default as "accept Google's cookies." If we fully explained what that meant to most users, however, they would probably prefer not to accept our cookie. So our default setting would go against users' wishes. Some people might call that evil, and evil made Marissa uncomfortable. She was disturbed that our current cookie-setting practices made the argument a reasonable one. She agreed that at the very least we should have a page telling users how they could delete their cookies, whether set by Google or by some other website.
Describing how to delete cookies fit neatly with a state-of-the-brand analysis I had been working on. In it, I laid out my thoughts about redirecting our identity from "search and only search" to a leadership role on issues affecting users online. I forecast that user privacy, our near monopoly in search, and censorship demands by foreign governments would be the three trials to bedevil us in the coming year. We needed to prepare—to get out in front and lead the parade rather than be trampled by it. Marissa complimented my analysis but had reservations about my recommendations. Just as I had thought "Don't be evil" overpromised, she feared taking public stands about our ethical positions would result in overly heightened expectations and negative reactions if we failed to live up to them. I understood that perspective (and shared it) but believed we didn't need to claim to be ethically superior. We just needed our actions to demonstrate that we were. Users could draw their own conclusions.
Sergey's feedback was less encouraging. "I find documents like this frightening," he stated. "It's vague and open-ended, which makes specific feedback impossible." Lest I take his lack of comments for assent, he asked me to detail the next steps I intended to take. I had already done that, but evidently he hadn't read past the first page. I wondered if my communication with Sergey would improve if I took him for a walking chat, as I had with Marissa—perhaps along a high cliff overlooking the ocean.
Meanwhile, the privacy discussion bubbled and boiled until at last a meeting could be arranged to hash out once and for all policies on employee access to user data, data retention, and user education about privacy issues.
The meeting raised many other questions, and answered none of them. Eric Schmidt half-jokingly suggested that our privacy policy should start off with the full text of the Patriot Act. Larry argued we should keep all our data until—well, until the time we should get rid of it. If we thought the government was overreaching, we could just encrypt everything and make it unreadable. Besides, Ashcroft would most likely go after the ISPs first, since they had much better data than we did about what users did online.* The meeting ended, but the debate continued for months.
My idea for blazing a path on educating users about privacy never gained the endorsement of Larry and Sergey, and so did not come to fruition. Perhaps they were right that it would have opened a Pandora's box. The issue of privacy would never go away, and trying to explain our rationale might only make things more confusing. Why not let the issue come to us instead of rushing out to meet it? We weren't willing to talk about the wonderful benefits of users sharing their data with us, because we weren't willing to share any information about how we used that data. If we couldn't say something nice, why say anything at all?
That didn't stop me from assuming the most aggressive possible stance when it came to communicating with users about privacy each time a new product launched. I repeated the Yada Yada story to every Googler who would listen, though I found few converts to my vision of users making fully informed decisions about the data they shared with us. Most engineers felt the tradeoff was too high. If users came to Google looking for information about online privacy, they figured, we would help them the way we always did—by sending them somewhere else for answers.
Let the Good Times Scroll
Larry refused to talk directly to users about cookies and log files, and he tried to keep the public from getting curious by minimizing their exposure to the data we collected. He wasn't always successful.
For example, a display of "real time" Google search queries crawled across a video monitor suspended over the receptionist's desk in our lobby. I sometimes sat on the red couch and watched to find out what the world was looking for. The terms scrolled by silently in a steady stream:
new employment in Montana
scheduled zip backup
greeting cards free
nervous system
lynyrd skynyrd tabliature Tuesday
datura metal
tamron lense 500mm
mode chip for playstation
the bone collector
singles chat
Journalists who came to Google stood in the lobby mesmerized by this peek into the global gestalt and later waxed poetical about the international impact of Google and the deepening role search plays in all our lives. Visitors were so entranced that they stared up at the display as they signed in for their temporary badges, not bothering to read the restrictive non-disclosure agreements they were agreeing to.
The query scroll was carefully filtered for offensive terms that might clash with our wholesome image.* Offensive terms written in English, anyway. I recall a group of Japanese visitors pointing and smirking at some of the katakana characters floating across the page. The inability to identify foreign-language porn is just one of the reasons we never used the query scroll widely for marketing purposes, despite its ability to instantly turn esoteric technology into voyeuristic entertainment.
Larry never cared for the scrolling queries screen. He constantly monitored the currents of public paranoia around information seepage, and the scrolling queries set off his alarm. He felt the display could inadvertently reveal personal data, because queries could contain names or information that users would prefer to remain private (for example, "John Smith DUI arrest in Springfield" or "Mary Jones amateur porn movie"). Moreover, it might cause people to think more about their own queries and stir what he deemed to be ungrounded fears over what information was conveyed with each search.
Larry tried to kill the Google Zeitgeist, too. Zeitgeist was a year-end feature that the PR team put together recapping the trends in search terms over the previous twelve months. The press loved Zeitgeist because it gave them another way to wrap up the year, but to Larry it raised too many questions about how much Google knew about users' searches and how long we kept their data. Cindy asked me to come up with a list of reasons to continue the tradition, and my rationale evidently convinced Larry the risk was acceptable, because the year-end Zeitgeist is still published on Google.com.
All the while we wrestled with the issues of what to tell users, our ability to mine their data became better and better. Amit Patel, as his first big project at Google, had built a rudimentary system to make sense of the logs that recorded user interactions with our site. Ironically, the same engineer who did the most to seed the notion of "Don't be evil" in the company's consciousness also laid the cornerstone of a system that would bring into question the purity of Google's intentions.
Amit's system was a stopgap measure. It took three years and an enormous effort from a team of Googlers led by legendary coder Rob Pike to perfect the technology that, since it processed logs, came to be designated "Sawmill." The power of Sawmill when it was activated in 2003 gave Google a clear understanding of user behavior, which in turn enabled our engineers to serve ads more effectively than Yahoo did, to identify and block some types of robotic software submitting search terms, to report revenue accurately enough to meet audit requirements, and to determine which UI features improved the site and which confused users. If engineers were reluctant to delete logs data before Sawmill, they were adamant about retaining it afterward.
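To give a flavor of just one of those uses, here is a toy sketch, my own illustration, not Sawmill, of how mining the logs could flag robotic software submitting search terms: a cookie firing off hundreds of queries a day looks nothing like a person on the red couch looking for greeting cards. The data and the threshold are invented.

```python
from collections import Counter

# Invented query log: one cookie behaves like a robot, one like a person.
query_log = [
    {"cookie": "cookie_bot", "query": f"cheap widgets {i}"} for i in range(500)
] + [
    {"cookie": "cookie_human", "query": "greeting cards free"},
    {"cookie": "cookie_human", "query": "nervous system"},
]

QUERIES_PER_DAY_THRESHOLD = 300  # arbitrary cutoff for this illustration

counts = Counter(record["cookie"] for record in query_log)
suspected_bots = {cookie for cookie, n in counts.items() if n > QUERIES_PER_DAY_THRESHOLD}
print(suspected_bots)  # {'cookie_bot'}
```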