Data and Goliath

by Bruce Schneier


  A Stanford University experiment examined the phone metadata of about 500 volunteers over several months. The personal nature of what the researchers could deduce from the metadata surprised even them, and the report is worth quoting:

  • Participant A communicated with multiple local neurology groups, a specialty pharmacy, a rare-condition management service, and a hotline for a pharmaceutical used solely to treat relapsing multiple sclerosis.

  • Participant B spoke at length with cardiologists at a major medical center, talked briefly with a medical laboratory, received calls from a pharmacy, and placed short calls to a home reporting hotline for a medical device used to monitor cardiac arrhythmias.

  • Participant C made a number of calls to a firearms store that specializes in the AR semiautomatic rifle platform, and also spoke at length with customer service for a firearm manufacturer that produces an AR line.

  • In a span of three weeks, Participant D contacted a home improvement store, locksmiths, a hydroponics dealer, and a head shop.

  • Participant E had a long early morning call with her sister. Two days later, she placed a series of calls to the local Planned Parenthood location. She placed brief additional calls two weeks later, and made a final call a month after.

  That’s a multiple sclerosis sufferer, a heart attack victim, a semiautomatic weapons owner, a home marijuana grower, and someone who had an abortion, all from a single stream of metadata.
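  The inference behind these examples can be sketched in a few lines of code. What follows is a minimal Python illustration, not the Stanford researchers' actual method. It assumes a hypothetical reverse directory that maps phone numbers to the kinds of businesses that answer them. The numbers, categories, `CallRecord`, and `profile` are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One row of phone metadata: who called whom and for how long. No content."""
    caller: str
    callee: str
    seconds: int


# Hypothetical reverse directory: number -> the kind of business that answers it.
DIRECTORY = {
    "555-0101": "neurology practice",
    "555-0102": "specialty pharmacy",
    "555-0103": "MS medication hotline",
    "555-0111": "cardiology clinic",
    "555-0121": "firearms store",
}


def profile(records: list[CallRecord], min_seconds: int = 0) -> dict[str, set[str]]:
    """Map each caller to the categories of businesses they called.

    Calls shorter than `min_seconds` (misdials, say) are ignored, and numbers
    missing from the directory contribute nothing.
    """
    contacted: dict[str, set[str]] = defaultdict(set)
    for record in records:
        category = DIRECTORY.get(record.callee)
        if category is not None and record.seconds >= min_seconds:
            contacted[record.caller].add(category)
    return dict(contacted)


calls = [
    CallRecord("A", "555-0101", 600),
    CallRecord("A", "555-0103", 240),
    CallRecord("B", "555-0111", 1_800),
    CallRecord("B", "555-0199", 60),  # unlisted number: ignored
]
print(profile(calls))
```

  No call content appears anywhere in the input. Every sensitive conclusion comes from matching the dialed numbers against a list of who owns them, and that is the report's point.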

  Web search data is another source of intimate information that can be used for surveillance. (You can argue whether this is data or metadata. The NSA claims it’s metadata because your search terms are embedded in the URLs.) We don’t lie to our search engine. We’re more intimate with it than with our friends, lovers, or family members. We always tell it exactly what we’re thinking about, in words as clear as possible. Google knows what kind of porn each of us searches for, which old lovers we still think about, our shames, our concerns, and our secrets. If Google decided to, it could figure out which of us is worried about our mental health, thinking about tax evasion, or planning to protest a particular government policy. I used to say that Google knows more about what I’m thinking of than my wife does. But that doesn’t go far enough. Google knows more about what I’m thinking of than I do, because Google remembers all of it perfectly and forever.

  I did a quick experiment with Google’s autocomplete feature. This is the feature that offers to finish typing your search queries in real time, based on what other people have typed. When I typed “should I tell my w,” Google suggested “should i tell my wife i had an affair” and “should i tell my work about dui” as the most popular completions. Google knows who clicked on those completions, and everything else they ever searched on.

  Google’s CEO Eric Schmidt admitted as much in 2010: “We know where you are. We know where you’ve been. We can more or less know what you’re thinking about.”

  If you have a Gmail account, you can check for yourself. You can look at your search history for any time you were logged in. It goes back for as long as you’ve had the account, probably for years. Do it; you’ll be surprised. It’s more intimate than if you’d sent Google your diary. And even though Google lets you modify your ad preferences, you have no rights to delete anything you don’t want there.

  There are other sources of intimate data and metadata. Records of your purchasing habits reveal a lot about who you are. Your tweets tell the world what time you wake up in the morning, and what time you go to bed each night. Your buddy lists and address books reveal your political affiliation and sexual orientation. Your e-mail headers reveal who is central to your professional, social, and romantic life.

  One way to think about it is that data is content, and metadata is context. Metadata can be much more revealing than data, especially when collected in the aggregate. When you have one person under surveillance, the contents of conversations, text messages, and e-mails can be more important than the metadata. But when you have an entire population under surveillance, the metadata is far more meaningful, important, and useful.

  As former NSA general counsel Stewart Baker said, “Metadata absolutely tells you everything about somebody’s life. If you have enough metadata you don’t really need content.” In 2014, former NSA and CIA director Michael Hayden remarked, “We kill people based on metadata.”

  The truth is, though, that the difference is largely illusory. It’s all data about us.

  CHEAPER SURVEILLANCE

  Historically, surveillance was difficult and expensive. We did it only when it was important: when the police needed to tail a suspect, or a business required a detailed purchasing history for billing purposes. There were exceptions, and they were extreme and expensive. The exceptionally paranoid East German government had 102,000 Stasi surveilling a population of 17 million: that’s one spy for every 166 citizens, or one for every 66 if you include civilian informants.

  Corporate surveillance has grown from collecting as little data as necessary to collecting as much as possible. Corporations always collected information on their customers, but in the past they didn’t collect very much of it and held it only as long as necessary. Credit card companies collected only the information about their customers’ transactions that they needed for billing. Stores hardly ever collected information about their customers, and mail-order companies only collected names and addresses, and maybe some purchasing history so they knew when to remove someone from their mailing list. Even Google, back in the beginning, collected far less information about its users than it does today. When surveillance information was expensive to collect and store, corporations made do with as little as possible.

  The cost of computing technology has declined rapidly in recent decades. This has been a profoundly good thing. It has become cheaper and easier for people to communicate, to publish their thoughts, to access information, and so on. But that same decline in price has also brought down the price of surveillance. As computer technologies improved, corporations were able to collect more information on everyone they did business with. As data storage became cheaper, they were able to save more data and for a longer time. As big data analysis tools became more powerful, it became profitable to save more information. This led to the surveillance-based business models I’ll talk about in Chapter 4.

  Government surveillance has gone from collecting data on as few people as necessary to collecting it on as many as possible. When surveillance was manual and expensive, it could only be justified in extreme cases. The warrant process limited police surveillance, and resource constraints and the risk of discovery limited national intelligence surveillance. Specific individuals were targeted for surveillance, and maximal information was collected on them alone. There were also strict minimization rules about not collecting information on other people. If the FBI was listening in on a mobster’s phone, for example, the listener was supposed to hang up and stop recording if the mobster’s wife or children got on the line.

  As technology improved and prices dropped, governments broadened their surveillance. The NSA could surveil large groups—the Soviet government, the Chinese diplomatic corps, leftist political organizations and activists—not just individuals. Roving wiretaps meant that the FBI could eavesdrop on people regardless of the device they used to communicate. Eventually, US agencies could spy on entire populations and save the data for years. This dovetailed with a changing threat: the agencies continued espionage against specific governments while expanding mass surveillance of broad populations to look for potentially dangerous individuals. I’ll talk about this in Chapter 5.

  The result is that corporate and government surveillance interests have converged. Both now want to know everything about everyone. The motivations are different, but the methodologies are the same. That is the primary reason for the strong public-private security partnership that I’ll talk about in Chapter 6.

  To see what I mean about the cost of surveillance technology, just look at how cheaply ordinary consumers can obtain sophisticated spy gadgets. On a recent flight, I was flipping through an issue of SkyMall, a catalog that airlines stick in the pocket of every domestic airplane seat. It offered an $80 pen with a hidden camera and microphone, so I could secretly record any meeting I might want evidence about later. I can buy a camera hidden in a clock radio for $100, or one disguised to look like a motion sensor alarm unit on a wall. I can set either one to record continuously or only when it detects motion. Another device allows me to see all the data on someone else’s smartphone—either iPhone or Android—assuming I can get my hands on it. “Read text messages even after they’ve been deleted. See photos, contacts, call histories, calendar appointments and websites visited. Even tap into the phone’s GPS data to find out where it’s been.” Only $120.

  From other retailers I can buy a keyboard logger, or keylogger, to learn what someone else types on her computer—assuming I have physical access to it—for under $50. I can buy call intercept software to listen in on someone else’s cell phone calls for $100. Or I can buy a remote-controlled drone helicopter with an onboard camera and use it to spy on my neighbors for under $1,000.

  These are the consumer items, and some of them are illegal in some jurisdictions. Professional surveillance devices are also getting cheaper and better. For the police, the declining costs change everything. Following someone covertly, either on foot or by car, costs around $175,000 per month—primarily for the salary of the agents doing the following. But if the police can place a tracker in the suspect’s car, or use a fake cell tower device to fool the suspect’s cell phone into giving up its location information, the cost drops to about $70,000 per month, because it only requires one agent. And if the police can hide a GPS receiver in the suspect’s car, suddenly the price drops to about $150 per month—mostly for the surreptitious installation of the device. Getting location information from the suspect’s cell provider is even cheaper: Sprint charges law enforcement only $30 per month.

  The difference is between fixed and marginal costs. If a police department performs surveillance on foot, following two people costs twice as much as following one person. But with GPS or cell phone surveillance, the cost is primarily for setting up the system. Once it is in place, the additional marginal cost of following one, ten, or a thousand more people is minimal. Or, once someone spends the money designing and building a telephone eavesdropping system that collects and analyzes all the voice calls in Afghanistan, as the NSA did to help defend US soldiers from improvised explosive devices, it’s cheap and easy to deploy that same technology against the telephone networks of other countries.
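  The fixed-versus-marginal distinction is easy to see with some arithmetic. Here is a minimal Python sketch. The per-target monthly figures are the ones quoted above. The automated system's $1,000,000 setup cost is a hypothetical number chosen only for illustration, with the $30 carrier fee reused as its per-target cost. `total_cost` and `average_cost` are illustrative helpers, not anyone's actual budgeting model.

```python
# Monthly per-target costs quoted in the text, in US dollars.
PER_TARGET_MONTHLY = {
    "covert tail": 175_000,
    "tracker or fake cell tower": 70_000,
    "hidden GPS receiver": 150,
    "cell provider location data": 30,
}


def total_cost(targets: int, fixed: float, marginal: float) -> float:
    """Cost of watching `targets` people for a month: fixed setup plus a per-target cost."""
    return fixed + marginal * targets


def average_cost(targets: int, fixed: float, marginal: float) -> float:
    """Cost per target, which approaches `marginal` as `targets` grows."""
    return total_cost(targets, fixed, marginal) / targets


# The quoted figures have no separate setup cost, so each scales linearly with targets.
for method, cost in PER_TARGET_MONTHLY.items():
    print(f"{method:<28} 3,000 targets: ${total_cost(3_000, 0, cost):,.0f}")

# Hypothetical automated system: $1,000,000 to build, then $30 per target.
carrier = PER_TARGET_MONTHLY["cell provider location data"]
for n in (1, 1_000, 1_000_000):
    print(f"track {n:>9,} people: ${average_cost(n, 1_000_000, carrier):,.2f} each")
```

  Following people on foot grows in lockstep with the number of targets: $175,000 for one, $350,000 for two, $525,000,000 for 3,000. The hypothetical automated system goes the other way. Its cost per target falls from $1,000,030 for a single target to $31 for a million targets, because the setup cost is spread across everyone it watches.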

  Mass Surveillance

  The result of this declining cost of surveillance technology is not just a difference in price; it’s a difference in kind. Organizations end up doing more surveillance—a lot more. For example, in 2012, after a Supreme Court ruling, the FBI was required to either obtain warrants for or turn off 3,000 GPS surveillance devices installed in cars. It would simply be impossible for the FBI to follow 3,000 cars without automation; the agency just doesn’t have the manpower. And now the prevalence of cell phones means that everyone can be followed, all of the time.

  Another example is license plate scanners, which are becoming more common. Several companies maintain databases of vehicle license plates whose owners have defaulted on their auto loans. Spotter cars and tow trucks mount cameras on their roofs that continually scan license plates and send the data back to the companies, looking for a hit. There’s big money to be made in the repossession business, so lots of individuals participate—all of them feeding data into the companies’ centralized databases. One scanning company, Vigilant Solutions of Livermore, California, claims to have 2.5 billion records and collects 70 million scans in the US per month, along with date, time, and GPS location information.
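  The pipeline described above can be sketched as a central database that checks each scan against a hotlist of plates and keeps every scan it receives. This is a minimal Python illustration under stated assumptions. `Scan`, `ScanDatabase`, and the sample plates are hypothetical, not Vigilant Solutions' or any other company's actual system.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Scan:
    """One camera read: the plate plus the date, time, and GPS location of the read."""
    plate: str
    when: datetime
    lat: float
    lon: float


@dataclass
class ScanDatabase:
    """Hypothetical centralized store fed by many spotter cars and tow trucks."""
    hotlist: set[str]  # plates whose owners defaulted on their auto loans
    scans: list[Scan] = field(default_factory=list)

    def ingest(self, scan: Scan) -> bool:
        """Store the scan, then report whether it matched the hotlist."""
        self.scans.append(scan)  # kept whether or not it is a hit
        return scan.plate in self.hotlist

    def history(self, plate: str) -> list[Scan]:
        """Every stored sighting of one plate, in time order."""
        return sorted((s for s in self.scans if s.plate == plate), key=lambda s: s.when)


db = ScanDatabase(hotlist={"REPO123"})
print(db.ingest(Scan("REPO123", datetime(2014, 5, 1, 8, 0), 37.68, -121.77)))
print(db.ingest(Scan("XYZ789", datetime(2014, 5, 1, 8, 5), 37.70, -121.75)))
db.ingest(Scan("XYZ789", datetime(2014, 5, 1, 7, 30), 37.69, -121.76))
print([s.when.isoformat() for s in db.history("XYZ789")])
```

  `ingest` stores every scan before it checks for a hit. As a result, the database accumulates the movements of every car that passes a camera, not just the cars on the hotlist. `history` can then reconstruct any plate's trail after the fact.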

  In addition to repossession businesses, scanning companies also sell their data to divorce lawyers, private investigators, and others. They sometimes relay it, in real time, to police departments, which combine it with scans they get from interstate highway on-ramps, toll plazas, border crossings, and airport parking lots. They’re looking for stolen vehicles and drivers with outstanding warrants and unpaid tickets. Already, the states’ driver’s license databases are being used by the FBI to identify people, and the US Department of Homeland Security wants all this data in a single national database. In the UK, a similar government-run system based on fixed cameras is deployed throughout the country. It enforces London’s automobile congestion charge system, and searches for vehicles that are behind on their mandatory inspections.

  Expect the same thing to happen with automatic face recognition. Initially, the data from private cameras will most likely be used by bounty hunters tracking down bail jumpers. Eventually, though, it will be sold for other uses and given to the government. Already the FBI has a database of 52 million faces, and facial recognition software that’s pretty good. The Dubai police are integrating custom facial recognition software with Google Glass to automatically identify suspects. With enough cameras in a city, police officers will be able to follow cars and people around without ever leaving their desks.

  This is mass surveillance, impossible without computers, networks, and automation. It’s not “follow that car”; it’s “follow every car.” Police could always tail a suspect, but with an urban mesh of cameras, license plate scanners, and facial recognition software, they can tail everyone—suspect or not.

  Similarly, putting a device called a pen register on a suspect’s land line to record the phone numbers he calls used to be both time-consuming and expensive. But now that the FBI can demand that data from the phone companies’ databases, it can acquire that information about everybody in the US. And it has.

  In 2008, the company Waze (acquired by Google in 2013) introduced a new navigation system for smartphones. The idea was that by tracking the movements of cars that used Waze, the company could infer real-time traffic data and route people to the fastest roads. We’d all like to avoid traffic jams. In fact, all of society, not just Waze’s customers, benefits when people are steered away from traffic jams so they don’t add to them. But are we aware of how much data we’re giving away?

  For the first time in history, governments and corporations have the ability to conduct mass surveillance on entire populations. They can do it with our Internet use, our communications, our financial transactions, our movements . . . everything. Even the East Germans couldn’t follow everybody all of the time. Now it’s easy.

  HIDDEN SURVEILLANCE

  If you’re reading this book on a Kindle, Amazon knows. Amazon knows when you started reading and how fast you read. The company knows if you’re reading straight through, or if you read just a few pages every day. It knows if you skip ahead to the end, go back and reread a section, or linger on a page—or if you give up and don’t finish the book. If you highlight any passages, Amazon knows about that, too. There’s no light that flashes, no box that pops up, to warn you that your Kindle is sending Amazon data about your reading habits. It just happens, quietly and constantly.

  We tolerate a level of electronic surveillance online that we would never allow in the physical world, because it’s not obvious or advertised. It’s one thing for a clerk to ask to see an ID card, or a tollbooth camera to photograph a license plate, or an ATM to ask for a card and a PIN. All of these actions generate surveillance records—the first case may require the clerk to copy or otherwise capture the data on the ID card—but at least they’re overt. We know they’re happening.

  Most electronic surveillance doesn’t happen that way. It’s covert. We read newspapers online, not realizing that the articles we read are recorded. We browse online stores, not realizing that both the things we buy and the things we look at and decide not to buy are being monitored. We use electronic payment systems, not thinking about how they’re keeping a record of our purchases. We carry our cell phones with us, not understanding that they’re constantly tracking our location.

  Buzzfeed is an entertainment website that collects an enormous amount of information about its users. Much of the data comes from traditional Internet tracking, but Buzzfeed also has a lot of fun quizzes, some of which ask very personal questions. One of them—“How Privileged Are You?”—asks about financial details, job stability, recreational activities, and mental health. Over two million people have taken that quiz, not realizing that Buzzfeed saves data from its quizzes. Similarly, medical information sites like WebMD collect data on what pages users search for and read.

  Lest you think it’s only your web browsing, e-mails, phone calls, chats, and other electronic communications that are monitored, old-fashioned paper mail is tracked as well. Through a program called Mail Isolation Control and Tracking, the US Postal Service photographs the exterior, front and back, of every piece of mail sent in the US. That’s about 160 billion pieces annually. This data is available to law enforcement, and certainly other government agencies as well.

  Off the Internet, many surveillance technologies are getting smaller and less obtrusive. In some cities, video cameras capture our images hundreds of times a day. Some are obvious, but we don’t see a CCTV camera embedded in a ceiling light or ATM, or a gigapixel camera a block away. Drones are getting smaller and harder to see; they’re now the size of insects and soon the size of dust.

  Add identification software to any of these image collection systems, and you have an automatic omnipresent surveillance system. Face recognition is the easiest way to identify people on camera, and the technology is getting better every year. In 2014, face recognition algorithms started outperforming people. There are other image identification technologies in development: iris scanners that work at a distance, gait recognition systems, and so on.

 
