Data and Goliath

by Bruce Schneier


  I am not, and this book is not, anti-technology. The Internet, and the information age in general, have brought enormous benefits to society. I believe they will continue to do so. I’m not even anti-surveillance. The benefits of computers knowing what we’re doing have been life-transforming. Surveillance has revolutionized traditional products and services, and spawned entirely new categories of commerce. It has become an invaluable tool for law enforcement. It helps people all around the world in all sorts of ways, and will continue to do so far into the future.

  Nevertheless, the threats of surveillance are real, and we’re not talking about them enough. Our response to all this creeping surveillance has largely been passive. We don’t think about the bargains we’re making, because they haven’t been laid out in front of us. Technological changes occur, and we accept them for the most part. It’s hard to blame us; the changes have been happening so fast that we haven’t really evaluated their effects or weighed their consequences. This is how we ended up in a surveillance society. The surveillance society snuck up on us.

  It doesn’t have to be like this, but we have to take charge. We can start by renegotiating the bargains we’re making with our data. We need to be proactive about how we deal with new technologies. We need to think about what we want our technological infrastructure to be, and what values we want it to embody. We need to balance the value of our data to society with its personal nature. We need to examine our fears, and decide how much of our privacy we are really willing to sacrifice for convenience. We need to understand the many harms of overreaching surveillance.

  And we need to fight back.

  —Minneapolis, Minnesota, and Cambridge, Massachusetts, October 2014

  1

  Data as a By-product of Computing

  Computers constantly produce data. It’s their input and output, but it’s also a by-product of everything they do. In the normal course of their operations, computers continuously document what they’re doing. They sense and record more than you’re aware of.

  For instance, your word processor keeps a record of what you’ve written, including your drafts and changes. When you hit “save,” your word processor records the new version, but your computer doesn’t erase the old versions until it needs the disk space for something else. Your word processor automatically saves your document every so often; Microsoft Word saves mine every 20 minutes. Word also keeps a record of who created the document, and often of who else worked on it.

  Connect to the Internet, and the data you produce multiplies: records of websites you visit, ads you click on, words you type. Your computer, the sites you visit, and the computers in the network each produce data. Your browser sends data to websites about what software you have, when it was installed, what features you’ve enabled, and so on. In many cases, this data is enough to uniquely identify your computer.
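
  To make the identification point concrete, here is a minimal sketch of how a website could fold the details a browser reports into a single identifier. The attribute names and values are hypothetical, chosen only for illustration; real fingerprinting draws on many more signals, and this is not a description of any particular site's method.

```python
import hashlib


def fingerprint(attributes):
    """Hash a browser's reported attributes into one stable identifier."""
    # Sort the keys so the same attributes always produce the same string.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute sets for two different computers.
laptop = {
    "user_agent": "ExampleBrowser/31.0",
    "screen": "1440x900",
    "timezone": "America/Chicago",
    "fonts": "Arial,Courier,Helvetica",
    "plugins": "pdf-viewer,flash",
}
desktop = dict(laptop, screen="1920x1080", fonts="Arial,Verdana")

print(fingerprint(laptop)[:16])
print(fingerprint(desktop)[:16])
```

  No single attribute is unusual, but the combination can be rare enough that the hash behaves like a serial number for the machine.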

  Increasingly we communicate with our family, friends, co-workers, and casual acquaintances via computers, using e-mail, text messaging, Facebook, Twitter, Instagram, Snapchat, WhatsApp, and whatever else is hot right now. Data is a by-product of this high-tech socialization. These systems don’t just transfer data; they also create data records of your interactions with others.

  Walking around outside, you might not think that you’re producing data, but you are. Your cell phone is constantly calculating its location based on which cell towers it’s near. It’s not that your cell phone company particularly cares where you are, but it needs to know where your cell phone is to route telephone calls to you.

  Of course, if you actually use that phone, you produce more data: numbers dialed and calls received, text messages sent and received, call duration, and so on. If it’s a smartphone, it’s also a computer, and all your apps produce data when you use them—and sometimes even when you’re not using them. Your phone probably has a GPS receiver, which produces even more accurate location information than the cell tower location alone. The GPS receiver in your smartphone pinpoints you to within 16 to 27 feet; cell towers, to about 2,000 feet.

  Purchase something in a store, and you produce more data. The cash register is a computer, and it creates a record of what you purchased and the time and date you purchased it. That data flows into the merchant’s computer system. Unless you paid cash, your credit card or debit card information is tied to that purchase. That data is also sent to the credit card company, and some of it comes back to you in your monthly bill.

  There may be a video camera in the store, installed to record evidence in case of theft or fraud. There’s another camera recording you when you use an ATM. There are more cameras outside, monitoring buildings, sidewalks, roadways, and other public spaces.

  Get into a car, and you generate yet more data. Modern cars are loaded with computers, producing data on your speed, how hard you’re pressing on the pedals, what position the steering wheel is in, and more. Much of that is automatically recorded in a black box recorder, useful for figuring out what happened in an accident. There’s even a computer in each tire, gathering pressure data. Take your car into the shop, and the first thing the mechanic will do is access all that data to diagnose any problems. A self-driving car could produce a gigabyte of data per second.

  Snap a photo, and you’re at it again. Embedded in digital photos is information such as the date, time, and location—yes, many cameras have GPS—of the photo’s capture; generic information about the camera, lens, and settings; and an ID number of the camera itself. If you upload the photo to the web, that information often remains attached to the file.

  It wasn’t always like this. In the era of newspapers, radio, and television, we received information, but no record of the event was created. Now we get our news and entertainment over the Internet. We used to speak to people face-to-face and then by telephone; we now have conversations over text or e-mail. We used to buy things with cash at a store; now we use credit cards over the Internet. We used to pay with coins at a tollbooth, subway turnstile, or parking meter. Now we use automatic payment systems, such as E-ZPass, that are connected to our license plate number and credit card. Taxis used to be cash-only. Then we started paying by credit card. Now we’re using our smartphones to access networked taxi systems like Uber and Lyft, which produce data records of the transaction, plus our pickup and drop-off locations. With a few specific exceptions, computers are now everywhere we engage in commerce and most places we engage with our friends.

  Last year, when my refrigerator broke, the serviceman replaced the computer that controls it. I realized that I had been thinking about the refrigerator backwards: it’s not a refrigerator with a computer, it’s a computer that keeps food cold. Just like that, everything is turning into a computer. Your phone is a computer that makes calls. Your car is a computer with wheels and an engine. Your oven is a computer that bakes lasagnas. Your camera is a computer that takes pictures. Even our pets and livestock are now regularly chipped; my cat is practically a computer that sleeps in the sun all day.

  Computers are getting embedded into more and more kinds of products that connect to the Internet. A company called Nest, which Google purchased in 2014 for more than $3 billion, makes an Internet-enabled thermostat. The smart thermostat adapts to your behavior patterns and responds to what’s happening on the power grid. But to do all that, it records more than your energy usage: it also tracks and records your home’s temperature, humidity, ambient light, and any nearby movement. You can buy a smart refrigerator that tracks the expiration dates of food, and a smart air conditioner that can learn your preferences and maximize energy efficiency. There’s more coming: Nest is now selling a smart smoke and carbon monoxide detector and is planning a whole line of additional home sensors. Lots of other companies are working on a wide range of smart appliances. This will all be necessary if we want to build the smart power grid, which will reduce energy use and greenhouse gas emissions.

  We’re starting to collect and analyze data about our bodies as a means of improving our health and well-being. If you wear a fitness tracking device like Fitbit or Jawbone, it collects information about your movements awake and asleep, and uses that to analyze both your exercise and sleep habits. It can determine when you’re having sex. Give the device more information about yourself (how much you weigh, what you eat) and you can learn even more. All of this data you share is available online, of course.

  Many medical devices are starting to be Internet-enabled, collecting and reporting a variety of biometric data. There are already—or will be soon—devices that continually measure our vital signs, our moods, and our brain activity. It’s not just specialized devices; current smartphones have some pretty sensitive motion sensors. As the price of DNA sequencing continues to drop, more of us are signing up to generate and analyze our own genetic data. Companies like 23andMe hope to use genomic data from their customers to find genes associated with disease, leading to new and highly profitable cures. They’re also talking about personalized marketing, and insurance companies may someday buy their data to make business decisions.

  Perhaps the extreme in the data-generating-self trend is lifelogging: continuously capturing personal data. Already you can install lifelogging apps that record your activities on your smartphone, such as when you talk to friends, play games, watch movies, and so on. But this is just a shadow of what lifelogging will become. In the future, it will include a video record. Google Glass is the first wearable device that has this potential, but others are not far behind.

  These are examples of the Internet of Things. Environmental sensors will detect pollution levels. Smart inventory and control systems will reduce waste and save money. Internet-connected computers will be in everything—smart cities, smart toothbrushes, smart lightbulbs, smart sidewalk squares, smart pill bottles, smart clothing—because why not? Estimates put the current number of Internet-connected devices at 10 billion. That’s already more than the number of people on the planet, and I’ve seen predictions that it will reach 30 billion by 2020. The hype level is pretty high, and we don’t yet know which applications will work and which will be duds. What we do know is that they’re all going to produce data, lots of data. The things around us will become the eyes and ears of the Internet.

  The privacy implications of all this connectivity are profound. All those smart appliances will reduce greenhouse gas emissions—and they’ll also stream data about how people move around within their houses and how they spend their time. Smart streetlights will gather data on people’s movements outside. Cameras will only get better, smaller, and more mobile. Raytheon is planning to fly a blimp over Washington, DC, and Baltimore in 2015 to test its ability to track “targets”—presumably vehicles—on the ground, in the water, and in the air.

  The upshot is that we interact with hundreds of computers every day, and soon it will be thousands. Every one of those computers produces data. Very little of it is the obviously juicy kind: what we ordered at a restaurant, our heart rate during our evening jog, or the last love letter we wrote. Rather, much of it is a type of data called metadata. This is data about data—information a computer system uses to operate or data that’s a by-product of that operation. In a text message system, the messages themselves are data, but the accounts that sent and received the message, and the date and time of the message, are all metadata. An e-mail system is similar: the text of the e-mail is data, but the sender, receiver, routing data, and message size are all metadata—and we can argue about how to classify the subject line. In a photograph, the image is data; the date and time, camera settings, camera serial number, and GPS coordinates of the photo are metadata. Metadata may sound uninteresting, but, as I’ll explain, it’s anything but.
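
  The e-mail case can be sketched in a few lines of Python using the standard library's email parser. The message below is made up for illustration, and the split follows the classification given above; the subject line is returned separately because its status is arguable.

```python
from email import message_from_string

# A made-up message, used only for illustration.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Date: Mon, 06 Oct 2014 09:30:00 -0500
Subject: Lunch?

Are you free at noon on Thursday?
"""


def split_data_and_metadata(raw):
    """Separate an e-mail's content (data) from the information about it (metadata)."""
    msg = message_from_string(raw)
    metadata = {
        "sender": msg["From"],
        "receiver": msg["To"],
        "date": msg["Date"],
        "size_bytes": len(raw.encode("utf-8")),
    }
    subject = msg["Subject"]  # data or metadata? The classification is debatable.
    data = msg.get_payload()  # the text of the e-mail itself
    return metadata, subject, data


metadata, subject, data = split_data_and_metadata(RAW_MESSAGE)
print(metadata)
print(subject)
print(data)
```

  Everything in the metadata dictionary is available to the mail system without anyone reading the message body.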

  Still, this smog of data we produce is not necessarily a result of deviousness on anyone’s part. Most of it is simply a natural by-product of computing. This is just the way technology works right now. Data is the exhaust of the information age.

  HOW MUCH DATA?

  Some quick math. Your laptop probably has a 500-gigabyte hard drive. That big backup drive you might have purchased with it can probably store two or three terabytes. Your corporate network might have one thousand times that: a petabyte. There are names for bigger numbers. A thousand petabytes is an exabyte (a billion billion bytes), a thousand exabytes is a zettabyte, and a thousand zettabytes is a yottabyte. To put it in human terms, an exabyte of data is 500 billion pages of text.

  All of our data exhaust adds up. By 2010, we as a species were creating more data per day than we did from the beginning of time until 2003. By 2015, 76 exabytes of data will travel across the Internet every year.

  As we start thinking of all this data, it’s easy to dismiss concerns about its retention and use based on the assumption that there’s simply too much of it to save, and in any case it would be too hard to sift through for nuggets of meaningful information. This used to be true. In the early days of computing, most of this data—and certainly most of the metadata—was thrown away soon after it was created. Saving it took too much memory. But the cost of all aspects of computing has continuously fallen over the years, and amounts of data that were impractical to store and process a decade ago are easy to deal with today. In 2015, a petabyte of cloud storage will cost $100,000 per year, down 90% from $1 million in 2011. The result is that more and more data is being stored.

  You could probably store every tweet ever sent on your home computer’s disk drive. Storing the voice conversation from every phone call made in the US requires less than 300 petabytes, or $30 million, per year. A continuous video lifelogger would require 700 gigabytes per year per person. Multiply that by the US population and you get 2 exabytes per year, at a current cost of $200 million. That’s expensive but plausible, and the price will only go down. In 2013, the NSA completed its massive Utah Data Center in Bluffdale. It’s currently the third largest in the world, and the first of several that the NSA is building. The details are classified, but experts believe it can store about 12 exabytes of data. It has cost $1.4 billion so far. Worldwide, Google has the capacity to store 15 exabytes.
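
  The cost figures above follow directly from the cloud-storage price quoted earlier, $100,000 per petabyte per year. A minimal sketch of that arithmetic, using only numbers from the text and treating 300 petabytes as the upper bound for phone calls:

```python
COST_PER_PETABYTE_YEAR = 100_000  # dollars per year, the 2015 figure quoted in the text
PETABYTES_PER_EXABYTE = 1_000


def yearly_cost(petabytes):
    """Annual cost in dollars of keeping this many petabytes in cloud storage."""
    return petabytes * COST_PER_PETABYTE_YEAR


phone_call_cost = yearly_cost(300)                       # all US voice calls
lifelog_cost = yearly_cost(2 * PETABYTES_PER_EXABYTE)    # 2 exabytes of lifelog video

print(f"Phone calls: ${phone_call_cost:,} per year")
print(f"Video lifelogs: ${lifelog_cost:,} per year")
```

  This reproduces the $30 million and $200 million figures; as the per-petabyte price falls, both totals fall in proportion.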

  What’s true for organizations is also true for individuals, and I’m a case study. My e-mail record stretches back to 1993. I consider that e-mail archive to be part of my brain. It’s my memories. There isn’t a week that goes by that I don’t search that archive for something: a restaurant I visited some years ago, an article someone once told me about, the name of someone I met. I send myself reminder e-mails all the time; not just reminders of things to do when I get home, but reminders of things that I might want to recall years in the future. Access to that data trove is access to me.

  I used to carefully sort all that e-mail. I had to decide what to save and what to delete, and I would put saved e-mails into hundreds of different folders based on people, companies, projects, and so on. In 2006, I stopped doing that. Now, I save everything in one large folder. In 2006, for me, saving and searching became easier than sorting and deleting.

  To understand what all this data hoarding means for individual privacy, consider Austrian law student Max Schrems. In 2011, Schrems demanded that Facebook give him all the data the company had about him. This is a requirement of European Union (EU) law. Two years later, after a court battle, Facebook sent him a CD with a 1,200-page PDF: not just the friends he could see and the items on his newsfeed, but all of the photos and pages he’d ever clicked on and all of the advertising he’d ever viewed. Facebook doesn’t use all of this data, but instead of figuring out what to save, the company finds it easier to just save everything.

  2

  Data as Surveillance

  Governments and corporations gather, store, and analyze the tremendous amount of data we chuff out as we move through our digitized lives. Often this is without our knowledge, and typically without our consent. Based on this data, they draw conclusions about us that we might disagree with or object to, and that can impact our lives in profound ways. We may not like to admit it, but we are under mass surveillance.

  Much of what we know about the NSA’s surveillance comes from Edward Snowden, although people both before and after him also leaked agency secrets. As an NSA contractor, Snowden collected tens of thousands of documents describing many of the NSA’s surveillance activities. In 2013, he fled to Hong Kong and gave them to select reporters. For a while I worked with Glenn Greenwald and the Guardian newspaper, helping analyze some of the more technical documents.

  The first news story to break that was based on the Snowden documents described how the NSA collects the cell phone call records of every American. One government defense, and a sound bite repeated ever since, is that the data collected is “only metadata.” The intended point was that the NSA wasn’t collecting the words we spoke during our phone conversations, only the phone numbers of the two parties, and the date, time, and duration of the call. This seemed to mollify many people, but it shouldn’t have. Collecting metadata on people means putting them under surveillance.

  An easy thought experiment demonstrates this. Imagine that you hired a private detective to eavesdrop on someone. The detective would plant bugs in that person’s home, office, and car. He would eavesdrop on that person’s phone and computer. And you would get a report detailing that person’s conversations.

  Now imagine that you asked the detective to put that person under surveillance. You would get a different but nevertheless comprehensive report: where he went, what he did, who he spoke with and for how long, who he wrote to, what he read, and what he purchased. That’s metadata.

  Eavesdropping gets you the conversations; surveillance gets you everything else.

  Telephone metadata alone reveals a lot about us. The timing, length, and frequency of our conversations reveal our relationships with others: our intimate friends, business associates, and everyone in-between. Phone metadata reveals what and who we’re interested in and what’s important to us, no matter how private. It provides a window into our personalities. It yields a detailed summary of what’s happening to us at any point in time.
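
  A small sketch shows how little analysis this takes. The call records below are invented, and the late-night cutoff (10 p.m. to 6 a.m.) is an arbitrary assumption; the point is only that timing, length, and frequency fall out of a simple tally with no access to what was said.

```python
from collections import defaultdict
from datetime import datetime

# Invented call records: (number called, start time, duration in seconds).
CALLS = [
    ("555-0101", "2014-10-01 08:05", 120),
    ("555-0101", "2014-10-01 23:40", 1800),
    ("555-0101", "2014-10-02 23:55", 2400),
    ("555-0199", "2014-10-02 14:00", 300),
]


def summarize(calls):
    """Tally frequency, total length, and late-night timing per contact."""
    summary = defaultdict(lambda: {"calls": 0, "seconds": 0, "late_night": 0})
    for number, start, seconds in calls:
        hour = datetime.strptime(start, "%Y-%m-%d %H:%M").hour
        entry = summary[number]
        entry["calls"] += 1
        entry["seconds"] += seconds
        if hour >= 22 or hour < 6:  # assumed definition of "late night"
            entry["late_night"] += 1
    return dict(summary)


summary = summarize(CALLS)
for number, stats in summary.items():
    print(number, stats)
```

  Frequent, long, late-night calls to one number suggest a close relationship; a single short afternoon call suggests something more casual.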

 
