Data and Goliath


by Bruce Schneier

5. Any organization creating, maintaining, using, or disseminating records of identifiable personal data must assure the reliability of the data for their intended use and must take precautions to prevent misuses of the data.

  To be sure, making systems more secure will cost money, and corporations will pass those costs on to users in the form of higher prices if they can. But users are already paying extra costs for insecure systems: the direct and indirect costs of privacy breaches. Making companies liable for breaches moves those costs to them and, as a by-product, causes the companies to improve their security. The relevant term from economics is “least cost avoider”: it is economically efficient to assign liability to the entity in possession of the data because it is best positioned to minimize risk. Think about it—what can you do if you want Facebook to better secure your data? Not much. Economic theory says that’s why the company should bear the cost of poor security practices.

  REGULATE DATA USE

  Unlike in the EU, in the US today personal information about you is not your property; it’s owned by the collector. Laws protect specific categories of personal data—financial data, healthcare information, student data, videotape rental records—but we have nothing like the broad privacy protection laws you find in European countries. But broad legal protections are really the only solution; leaving the market to sort this out will lead to even more invasive mass surveillance.

  Here’s an example. Dataium is a company that tracks you as you shop for a car online. It monitors your visits to different manufacturers’ websites: what types of cars you’re looking at, what options you click on for more information, what sorts of financing options you research, how long you linger on any given page. Dealers pay for this information about you—not just information about the cars they sell, but the cars you looked at that are sold by other manufacturers. They pay for this information so that when you walk into a showroom, they can more profitably sell you a car.

  Think about the economics here. That information might cost you (ballpark estimate) $300 extra on the final price when you buy your car. That means it’s worth no more than $300 to protect yourself from Dataium’s tactics. But there are 16 million cars sold annually in the US. Even if you assume that Dataium has customer information relevant to just 2% of them, that means it’s worth about $100 million to the company to ensure that its tactics work.
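
  As a rough sketch only (the $300 figure is the ballpark estimate above, and the 2% coverage is an assumption rather than Dataium's actual number), the arithmetic behind that asymmetry looks like this:

    # Back-of-the-envelope sketch of the individual/aggregate asymmetry
    cars_sold_per_year = 16_000_000   # annual US car sales
    coverage = 0.02                   # assumed share of sales touched by Dataium's data
    value_per_sale = 300              # assumed extra dollars per car the data is worth

    worth_to_one_buyer = value_per_sale
    worth_to_the_company = cars_sold_per_year * coverage * value_per_sale
    print(worth_to_one_buyer)       # 300
    print(worth_to_the_company)     # 96000000.0, i.e. roughly $100 million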

  This asymmetry is why market solutions tend to fail. It’s a collective action problem. It’s worth $100 million to all of us collectively to protect ourselves from Dataium, but we can’t coordinate effectively. Dataium naturally bands the car dealers together, but the best way for us customers to band together is through collective political action.

  The point of use is a sensible place to regulate, because much of the information that’s collected about us is collected because we want it to be. We object when that information is being used in ways we didn’t intend: when it is stored, shared, sold, correlated, and used to manipulate us in some stealthy way. This means that we need restrictions on how our data can be used, especially restrictions on ways that differ from the purposes for which it was collected.

  Other problems arise when corporations treat their underlying algorithms as trade secrets: Google’s PageRank algorithm, which determines what search results you see, and credit-scoring systems are two examples. The companies have legitimate concerns about secrecy. They’re worried both that competitors will copy them and that people will figure out how to game them. But I believe transparency trumps proprietary claims when the algorithms have a direct impact on the public. Many more algorithms can be made public—or redesigned so they can be made public—than currently are. For years, truth in lending and fair lending laws have required financial institutions to ensure that the algorithms they use are both explainable and legally defensible. Mandated transparency needs to be extended to other areas where algorithms hold power over people. And where full openness isn’t possible, there are ways of auditing algorithms for fairness without making them public.

  Corporations tend to be rational risk assessors, and they will abide by regulation. The key to making this work is oversight and accountability. This isn’t unusual: there are many regulated industries in our society, because we know that what they do is both important and dangerous. Personal information and the algorithms used to analyze it are no different. Some regular audit mechanism would ensure that corporations are following the rules, and would penalize them when they aren’t.

  This all makes sense in theory, but actually doing it is hard. The last thing we want is for the government to start saying, “You can only do this and nothing more” with our data. Permissions-based regulation would stifle technological innovation and change. We want rights-based regulation—basically, “You can do anything you want unless it is prohibited.”

  REGULATE DATA COLLECTION AS WELL

  Regulating data use isn’t enough. Privacy needs to be regulated in many places: at collection, during storage, upon use, during disputes. The OECD Privacy Framework sets these stages out nicely, and they’re all essential.

  There’s been a concerted multi-year effort by US corporations to convince the world that we don’t need regulations on data collection, only on data use. Companies seek to eradicate any limitations on data collection because they know that any use limitations will be narrowly defined, and that they can slowly expand the permitted uses once they have our data. (A common argument against any particular data-use regulation is that it’s a form of censorship.) They also know that if collection limitations are in place, it’s much harder to change them. But as with government mass surveillance, the privacy harms come from the simple collection of the data, not only from its use. Remember the discussion of algorithmic surveillance from Chapter 10. Unrestricted corporate collection will result in broad collection, expansive sharing with the government, and a slow chipping away at the necessarily narrow use restrictions.

  We need to fight this campaign. Limitations on data collection aren’t new. Prospective employers are not allowed to ask job applicants whether they’re pregnant. Loan applications are not allowed to ask about the applicants’ race. “Ban the Box” is a campaign to make it illegal for employers to ask about applicants’ criminal pasts. The former US gays-in-the-military compromise, “Don’t Ask Don’t Tell,” was a restriction on data collection. There are restrictions on what questions can be asked by the US Census Bureau.

  Extending this to a world where everything we do is mediated by computers isn’t going to be easy, but we need to start discussing what sorts of data should never be collected. There are some obvious places to start. What we read online should be as private as it is in the paper world. This means we should legally limit the recording of the webpages we read, the links we click on, and our search results. It’s the same with our movements; it should not be a condition of having a cell phone that we subject ourselves to constant surveillance. Our associations—whom we communicate with, whom we meet on the street—should not be continually monitored. Maybe companies should be allowed to use some of this data immediately and then be required to purge it. Maybe they should be allowed to save it for a short period of time.

  One intriguing idea has been proposed by University of Miami Law School professor Michael Froomkin: requiring both government agencies and private companies engaging in mass data collection to file Privacy Impact Notices, modeled after Environmental Impact Reports. This would serve to inform the public about what’s being collected and why, and how it’s being stored and used. It would encourage decision makers to think about privacy early in any project’s development, and to solicit public feedback.

  One place to start is to require opt-in. Basically, there are two ways to obtain consent. Opt-in means that you have to explicitly consent before your data is collected and used. Opt-out is the opposite; your data will be collected and used unless you explicitly object. Companies like Facebook prefer opt-out, because they can make the option difficult to find and know that most people won’t bother. Opt-in is fairer, and the use of a service shouldn’t be contingent on allowing data collection.
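
  To see how small the technical difference is, here is a minimal illustrative sketch (hypothetical names, not any real company’s system): the entire gap between the two consent models is the default value of a single setting.

    from dataclasses import dataclass

    @dataclass
    class ConsentRecord:
        user_id: str
        tracking_allowed: bool

    # Opt-in: collection stays off until the user explicitly turns it on.
    def new_user_opt_in(user_id: str) -> ConsentRecord:
        return ConsentRecord(user_id, tracking_allowed=False)

    # Opt-out: collection is on from day one, unless the user finds the
    # setting and switches it off, which most people never do.
    def new_user_opt_out(user_id: str) -> ConsentRecord:
        return ConsentRecord(user_id, tracking_allowed=True)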

  Right now, there’s no downside to collecting and saving everything. By limiting what companies can collect and what they can do with the data they collect, by making companies responsible for the data under their control, and by forcing them to come clean with customers about what they actually collect and what they do with it, we will influence them to collect and save only the data about us they know is valuable.

  Congress needs to begin the hard work of updating US privacy laws and stop making excuses for inaction. Courts can also play a significant role in safeguarding consumer privacy by enforcing current privacy laws. The regulatory agencies, such as the FTC and the FCC, have some authority to protect consumer privacy in certain domains. But what the United States needs today is an independent data protection agency comparable to those in other countries around the world. And we have to do better than patching problems only after they become sufficiently harmful. These challenges are big and complex, and they require an agency with the expertise and resources to have a meaningful impact.

  MAKE DO WITH LESS DATA

  By and large, organizations could make do collecting much less data, and storing it for shorter periods of time, than they do now. The key is going to be understanding how much data is needed for what purpose.

  For example, many systems that collect identification don’t really need it. Often, authorization is all that’s required. A social networking site doesn’t need to know your real identity. Neither does a cloud storage company.

  Some types of data analysis require you to have data on a lot of people, but not on everyone. Think about Waze. It uses surveillance data to infer traffic flow, but it doesn’t need everyone’s data to do that. If it has enough cars under surveillance to get broad coverage of major roads, that’s good enough. Many retailers rely on ubiquitous surveillance to measure the effectiveness of advertisements, infer buying patterns, and so on; but again, they do not need everyone’s data. A representative sample is good enough for those applications, and sampling was the norm back when data collection was expensive.

  Other applications prefer having everyone’s data simply because it makes them more effective. Sure, Google could do well if it had data on only half of its users, or saved only half of the search queries from all of its users, but it would be a less profitable business. Still other applications actually need all of the data. If you’re a cell phone company trying to deliver mobile phone calls, you need to know where each user is located—otherwise the system won’t work.

  There are also differences in how long a company needs to store data. Waze and your cell phone company only need location data in real time. Advertisers need some historical data, but newer data is more valuable. On the other hand, some data is invaluable for research. Twitter, for example, is giving its data to the Library of Congress.

  We need laws that force companies to collect the minimum data they need and keep it for the minimum time they need it, and to store it more securely than they currently do. As one might expect, the German language has a single word for this kind of practice: Datensparsamkeit.

  GIVE PEOPLE RIGHTS TO THEIR DATA

  The US is the only Western country without basic data protection laws. We do have protections for certain types of information, but those are isolated areas. In general, our rights to our data are spotty. Google “remembers” things about me that I have long forgotten. That’s because Google has my lifelong search history, but I don’t have access to it to refresh my memory. Medtronic maintains that data from its cardiac defibrillators is proprietary to the company, and won’t let the patients in whom they’re implanted have access to it. In the EU, people have a right to know what data companies have about them. This is why the Austrian Max Schrems was able to force Facebook to give him all the personal information the company had about him. Those of us in the US don’t enjoy that right.

  Figuring out how these rights should work is not easy. For example, here is a list of different types of data you produce on a social networking site.

  • Service data: the data you give to a social networking site in order to use it. Depending on the site, such data might include your legal name, your age, and your credit card number.

  • Disclosed data: what you post on your own pages, such as blog entries, photographs, messages, and comments.

  • Entrusted data: what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data once you post it—another user does.

  • Incidental data: what other people post about you. Maybe it’s a paragraph about you that someone else writes, or a picture of you that someone else takes and posts. Not only do you not have any control over it, you didn’t even create it.

  • Behavioral data: data the site collects about your habits by monitoring what you do and whom you do it with.

  • Derived data: data about you that is inferred from all the other data. For example, if 80% of your friends self-identify as gay, you’re probably gay, too.

  What rights should you have regarding each of those types of data? Today, it’s all over the map. Some types are always private, some can be made private, and some are always public. Some can be edited or deleted—I know one site that allows entrusted data to be edited or deleted within a 24-hour period—and some cannot. Some can be viewed and some cannot. In the US there are no rules; those that hold the data get to decide—and of course they have complete access.

  Different platforms give you different abilities to restrict who may see your communications. Until 2011, you could make your Facebook posts readable either by your friends only or by everyone; at that point, Facebook introduced custom friends groups, and you could make posts readable by some of your friends but not by all of them. Tweets are either direct messages or public to the world. Instagram posts can be public, restricted to specific followers, or secret. Pinterest pages have public or secret options.

  Standardizing this is important. In 2012, the White House released a “Consumer Privacy Bill of Rights.” In 2014, a presidential review group on big data and privacy recommended that this bill of rights be the basis for legislation. I agree.

  It’s easy to go too far with this concept. Computer scientist and technology critic Jaron Lanier proposes a scheme by which anyone who uses our data, whether it be a search engine using it to serve us ads or a mapping application using it to determine real-time road congestion, automatically pays us a royalty. Of course, it would be a micropayment, probably even a nanopayment, but over time it might add up to a few dollars. Making this work would be extraordinarily complex, and in the end would require constant surveillance even as it tried to turn that surveillance into a revenue stream for everyone. The more fundamental problem is the conception of privacy as something that should be subjected to commerce in this way. Privacy needs to be a fundamental right, not a property right.

  We should have a right to delete. We should be able to tell any company we’re entrusting our data to, “I’m leaving. Delete all the data you have on me.” We should be able to go to any data broker and say, “I’m not your product. I never gave you permission to gather information about me and sell it to others. I want my data out of your database.” This is what the EU is currently grappling with: the right to be forgotten. In 2014, the European Court of Justice ruled that in some cases search engines need to remove information about individuals from their results. This caused a torrent of people demanding that Google remove search results that reflected poorly on them: politicians, doctors, pedophiles. We can argue about the particulars of the case, and whether the court got the balance right, but this is an important right for citizens to have over data that corporations are profiting from.

  US Consumer Privacy Bill of Rights (2012)

  INDIVIDUAL CONTROL: Consumers have a right to exercise control over what personal data companies collect from them and how they use it.

  TRANSPARENCY: Consumers have a right to easily understandable and accessible information about privacy and security practices.

  RESPECT FOR CONTEXT: Consumers have a right to expect that companies will collect, use, and disclose personal data in ways that are consistent with the context in which consumers provide the data.

  SECURITY: Consumers have a right to secure and responsible handling of personal data.

  ACCESS AND ACCURACY: Consumers have a right to access and correct personal data in usable formats, in a manner that is appropriate to the sensitivity of the data and the risk of adverse consequences to consumers if the data is inaccurate.

  FOCUSED COLLECTION: Consumers have a right to reasonable limits on the personal data that companies collect and retain.

  ACCOUNTABILITY: Consumers have a right to have personal data handled by companies with appropriate measures in place to assure they adhere to the Consumer Privacy Bill of Rights.

  MAKE DATA COLLECTION AND PRIVACY SALIENT

  We reveal data about ourselves all the time, to family, friends, acquaintances, lovers, even strangers. We share with our doctors, our investment counselors, our psychologists. We share a lot of data. But we think of that sharing transactionally: I’m sharing data with you, because I need you to know things/trust you with my secrets/am reciprocating because you’ve just told me something personal.

  As a species, we have evolved all sorts of psychological systems to navigate these complex privacy decisions. And these systems are extraordinarily complex, highly attuned, and delicately social. You can walk into a party and immediately know how to behave. Whom you talk to, what you tell to whom, who’s around you, who’s listening: most of us can navigate that beautifully. The problem is that technology inhibits that social ability. Move that same party onto Facebook, and suddenly our intuition starts failing. We forget who’s reading our posts. We accidentally send something private to a public forum. We don’t understand how our data is monitored in the background. We don’t realize what the technologies we’re using can and cannot do.

 
