radio buttons—to an overarching argument about the development of human
society. Privacy is an outmoded idea, some say, a two-century anomaly of
Western industrialization, the interregnum between village life and social media; privacy makes it possible for us to develop as free-thinking, independent individuals; privacy is an expression of bourgeois hypocrisy and bad faith; privacy is the defense of social diversity … .1 This doesn’t merely show the ways in which the word is used. A moment’s reflection makes clear that
within these uses are divergent concepts. The house of privacy has many
rooms. Some are concerned with the integrity of family life, some with state oppression (now or in the future), some with the utility and value of data, and some with a true inner self that can only emerge in anonymity, and
many have intersections and communicating doors.2 This conceptual diversity carries over into the strategies, practices, technologies, and tactics used to produce, perform, and protect privacy.3 Elsewhere we have shown how many
of these conceptions can be unified under the banner of contextual integrity, but here our concern is with the connections among these interests, as they are specifically articulated, and with how we can defend ourselves accordingly.4
The purpose of this chapter is to describe what obfuscation is and how it
fits into the diverse landscape of privacy interests, threats to those interests, and methods used to address those threats. Privacy is a multi-faceted concept, and a wide range of structures, mechanisms, rules, and practices are available to produce it and defend it. If we open up privacy’s tool chest, drawer by metaphorical drawer, we find policy and law at the local, national, and global levels;
provably secure technologies, such as cryptography; the disclosure actions and practices of individuals; social systems of confidentiality (for example, those of journalists, priests, doctors, and lawyers); steganographic systems; collective withholding and omerta on the part of a community; and more. We find Timothy May’s BlackNet, a proposal applying cryptographic technologies to create a wholly anonymous information marketplace, with untraceable,
untaxable transactions, that fosters corporate espionage and the circulation of military secrets and forbidden and classified materials, with the long-term goal of the “collapse of governments.”5 We find legal work building on the Fourth Amendment to the United States Constitution to establish protections for communications networks and social sites, endeavoring to strike a balance between rights of individual citizens and powers of law enforcement. To this diverse kit we will add obfuscation, both as a method in itself and as an
approach that can be used within and alongside other methods, depending on the goal. We aim to persuade readers that for some privacy problems obfuscation is a plausible solution, and that for some it is the best solution.
Obfuscation, at its most abstract, is the production of noise modeled on
an existing signal in order to make a collection of data more ambiguous, confusing, harder to exploit, more difficult to act on, and therefore less valuable.
The word “obfuscation” was chosen for this activity because it connotes
obscurity, unintelligibility, and bewilderment and because it helps to distinguish this approach from methods that rely on disappearance or erasure.
Obfuscation assumes that the signal can be spotted in some way and adds a
plethora of related, similar, and pertinent signals—a crowd in which an individual can mix, mingle, and, if only for a short time, hide.
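To make the definition concrete, here is a minimal sketch in Python, our own illustration rather than a description of any existing tool: a genuine search query is emitted only as one member of a shuffled batch of plausible decoys, so an observer who logs everything cannot easily tell which entry reflects real interest. The query pool, the mixing ratio, and the function name are invented for the example; a serious system would model its decoys far more carefully on actual behavior.

```python
import random

# Hypothetical pool of plausible decoy queries; a real tool would draw these
# from a model of ordinary search behavior rather than a fixed list.
PLAUSIBLE_QUERIES = [
    "weather tomorrow", "train schedule", "cheap flights",
    "news headlines", "recipe for soup", "library opening hours",
]

def obfuscated_batch(real_query: str, decoy_count: int = 5) -> list[str]:
    """Return the real query hidden in a shuffled crowd of plausible decoys."""
    decoys = random.sample(PLAUSIBLE_QUERIES, k=min(decoy_count, len(PLAUSIBLE_QUERIES)))
    batch = decoys + [real_query]
    random.shuffle(batch)  # the crowd in which the genuine signal can mix and hide
    return batch

if __name__ == "__main__":
    print(obfuscated_batch("symptoms of flu"))
```

The point of the sketch is only the shape of the move: nothing is erased or withheld; the genuine signal is simply made harder to pick out of the noise modeled on it.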
Consider General Sir Arthur St. Clare, the fictional military martyr in G. K.
Chesterton’s short story “The Sign of the Broken Sword.” General St. Clare’s men were slaughtered in an ill-considered attack on an enemy camp. Why did the brilliant strategist attempt an obviously flawed attack on his foe’s superior position? Chesterton’s ecclesiastical detective, Father Brown, answers with a question: “Where does the wise man hide a pebble?” “On the beach,” his friend replies.6 And he hides a leaf in the forest, Brown continues—and if he needs to hide a body, he must produce many dead bodies among which to hide it. To protect his secret, General St. Clare slays one man, then conceals him in the chaos of other dead men, which he creates by commanding a sudden charge
on artillery that has the high ground.
Father Brown’s rhetorical question was repeated by the Rt. Hon. Lord Justice Jacob in a 2007 patent case:
Now it might be suggested that it is cheaper to make this sort of mass disclosure than to consider the documents with some care to decide whether
they should be disclosed. And at that stage it might be cheaper—just run it all through the photocopier or CD maker—especially since doing so is an
allowable cost. But that isn’t the point. For it is the downstream costs caused by over-disclosure which so often are so substantial and so pointless. It can even be said, in cases of massive over-disclosure, that there is a real risk that the really important documents will get overlooked—where does a wise
man hide a leaf?7
From dead soldiers to disclosed documents, we can see that the essence of
obfuscation is in getting things overlooked, and adding to the cost, trouble, and difficulty of doing the looking.
Obfuscation can usefully be compared to camouflage. Camouflage is
often thought of as a tool for outright disappearance—think of the scene in The Simpsons in which Milhouse imagines putting on his camo outfit and melting into the greenery, with only his glasses and smile still visible.8 In practice, both natural and man-made camouflage work with a variety of techniques and goals, only some of which are used to try to vanish from view entirely; others make use of “disruptive patterns” that hide the edges, outline, orientation, and movement of a shape with fragments and suggestions of other possible shapes. Breaking up the outlines doesn’t make a shape disappear entirely, the way a flounder does when it buries itself in sand or an octopus does when it uses its mantle to masquerade as a rock. Rather, for situations in which avoiding observation is impossible—when we move, change positions, or are otherwise exposed—
disruptive patterns and disruptive coloration interfere with assessments of things like range, size, speed, and numbers. They make the individual harder to identify and target, and the members of the group more difficult to count.
Many early military uses of camouflage were devoted to making large, hard-
to-hide things such as artillery emplacements difficult to assess accurately from the air. In situations in which one can’t disappear, producing numerous possible targets or vectors of motion can sow confusion and buy valuable
time. If obfuscation has an emblematic animal, it is the orb-weaving spider Cyclosa mulmeinensis (mentioned in chapter 2), which fills its web with decoys of itself. The decoys are far from perfect copies, but when a wasp strikes they work well enough to give the orb-weaver a second or two to scramble to safety.
Hannah Rose Shell’s history of camouflage, Hide and Seek: Camouflage,
Photography, and the Media of Reconnaissance, develops the theme of “camouflage consciousness,” a way of being and acting based on one’s internal
model of the surveillance technology against which one must work.9 Shell
argues that a camoufleur producing patterns, a specialist training soldiers, and the soldiers on a battlefield were attempting to determine their visibility to binoculars and telescopic rifle sights, to still and film cameras, to airplanes, spotters, and satellites, and to act in ways that mitigated that visibility. This entailed combining research, estimates, modeling, and guesswork to exploit the flaws and limitations of observational technology. Camouflage, whether seeking the complete invisibility of mimicry or the temporary solution of hiding a shape in a mess of other, ambiguous, obfuscating possible shapes, was always a reflection of the capabilities of the technology against which it was developed.
It is the forms of data obfuscation or information obfuscation that concern us here—their technical utility for designers, developers, and activists. Understanding the moral and ethical roles of such forms of obfuscation means
understanding the data-acquisition and data-analysis technologies they can challenge and obstruct. It means understanding the threat models, the goals, and the constraints. Obfuscation is a tool among other tools for the construction and the defense of privacy, and like all tools it is honed on the purposes it can serve and the problems it can solve. To lay out the nature of these problems, we introduce the idea of information asymmetry.
3.2 Understanding information asymmetry: knowledge and power
At this point, let us recall Donald Rumsfeld’s famously convoluted explanation of the calculus of risk in the run-up to the invasion of Iraq: “there are known knowns, which we know we know; known unknowns, which we know we don’t
know; and unknown unknowns, which we do not know we don’t know.”10 As
much as this seems like a deliberate logic puzzle, it distinguishes three very different categories of danger. We can see a surveillance camera mounted on a streetlight, or concealed in a dome of mirrored glass on the ceiling of a hallway, and know we are being recorded. We know that we don’t know
whether the recording is being transmitted only on the site or whether it is being streamed over the Internet to some remote location. We know that we
don’t know how long the recording will be stored, or who is authorized to view it—just a security guard watching live, or an insurance inspector in the event of a claim, or the police?
There is a much larger category of unknown unknowns about something
as seemingly simple as a CCTV recording. We don’t know if the footage can be run through facial-recognition or gait-recognition software, for instance, or if the time code can be correlated with a credit-card purchase, or with the license plate of a car we exited, to connect our image with our identity—in fact, unless we are personally involved with privacy activism or security engineering, we don’t even know that we don’t know that. Confusing as it is, not only is the triple negative in this sentence accurate; it also indicates the layers of uncertainty: we aren’t aware that we can’t be sure that the video file will not be analyzed with predictive demographic tools in order to identify likely criminals or terrorists for questioning. This isn’t even the end of the unknowns, all potentially shaping consequential decisions produced in a dense cloud of our ignorance. And that is merely one CCTV camera, its cable or wireless transmission terminating somewhere, in some hard drive, that may be backed up
somewhere else—under what jurisdictions, what terms, what business
arrangements? Multiply this by making a credit-card purchase, signing up for an email list, downloading a smartphone app (“This app requires access to
your contacts”? “Sure!”), giving a postal code or a birthday or an identification number in response to a reasonable and legitimate request, and on and on
through the day and around the world.
It is obvious that information collection takes place in asymmetrical
power relationships: we rarely have a choice as to whether or not we are monitored, what is done with any information that is gathered, or what is done to us on the basis of conclusions drawn from that information. If you want to take a train, make a phone call, use a parking garage, or buy some groceries, you are going to be subject to information gathering and you are going to give up some or all control over elements of that information. It is rarely a matter of explicit agreement in a space of complete information and informed choice.
You will have to fill out certain forms in order to receive critical resources or to participate in civic life, and you will have to consent to onerous terms of service in order to use software that your job may require. Moreover, the
infrastructure, by default, gathers data on you. Obfuscation is related to this problem of asymmetry of power—as the camouflage comparison suggests, it
is an approach suited to situations in which we can’t easily escape observation but we must move and act—but this problem is only the surface aspect of
information collection, what we know we know. A second aspect, the informational or epistemic asymmetry, is a deeper and more pernicious problem, and plays more of a role in shaping obfuscation in defense of privacy.
Brad Templeton, chair of the Electronic Frontier Foundation, has told a
story of the danger of “time-traveling robots from the future”11 that, with more powerful hardware and sophisticated software than we have today, come back in time and subject us to total surveillance; they connect the discrete (and, we thought, discreet) dots of our lives, turning the flow of our private experience into all-too-clear, all-too-human patterns, shining their powerful analytic light into the past’s dark corners. Those robots from the future are mercenaries working for anyone wealthy enough to employ them: advertisers, industries, governments, interested parties. We are helpless to stop them as they collate and gather our histories, because, unlike them, we can’t travel through time and change our past actions.
Templeton’s story isn’t science fiction, however. We produce enormous
volumes of data every day. Those data stay around indefinitely, and the technology that can correlate them and analyze them keeps improving. Things we once thought were private—if we thought of that at all—become open, visible, and meaningful to new technologies. This is one aspect of the information
asymmetry that shapes our practices of privacy and autonomy: we don’t know what near-future algorithms, techniques, hardware, and databases will be
able to do with our data. There is a constantly advancing front of transition from meaningless to meaningful—from minor life events to things that can
change our taxes, our insurance rates, our access to capital, our freedom to move, or whether we are placed on a list.
That is the future unknown, but there are information asymmetries that
should concern us in the present too. Information about us is valuable, and it moves around. A company that has collected information about us may
connect it with other disparate pools of records (logs of telephone calls, purchase records, personally identifying information, demographic rosters, activity on social networks, geolocative data), and may then package that
information and sell it to other companies—or hand it over in response to a
governmental request or a subpoena. Even if those who run a company promise to keep the information to themselves, it may become part of the
schedule of assets after a bankruptcy and then be acquired or sold off. All the work of correlation and analysis is done with tools and training that, for most of the people they affect, lie beyond anything more than a superficial understanding. The population at large doesn’t have access to the other databases, or to the techniques, the training in mathematics and computer science, or the software and hardware that one must have to comprehend what can be
done with seemingly trivial details from their lives and activities, and how such details can potentially provide more powerful, more nearly complete,
and more revealing analyses than ordinary users could have anticipated—
more revealing, in fact, than even the engineers and analysts could have
anticipated.
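As a toy illustration of the kind of correlation described above, with all records and field names invented, joining two unremarkable datasets on a shared identifier is enough to tie a “trivial” purchase to a named person:

```python
# Invented example data: a purchase log and a loyalty-card roster.
purchases = [
    {"card": "4421", "item": "prepaid phone", "store_zip": "11201"},
    {"card": "8733", "item": "garden hose", "store_zip": "07030"},
]
loyalty_accounts = [
    {"card": "4421", "name": "A. Example", "email": "a@example.com"},
    {"card": "8733", "name": "B. Example", "email": "b@example.com"},
]

def link(purchase_rows, account_rows):
    """Join purchases to named accounts on the shared card identifier."""
    by_card = {acct["card"]: acct for acct in account_rows}
    return [
        {**row, **by_card[row["card"]]}  # the purchase is now tied to a named person
        for row in purchase_rows
        if row["card"] in by_card
    ]

for linked in link(purchases, loyalty_accounts):
    print(linked)
```

Real aggregation works at vastly greater scale and with far messier matching, but the asymmetry is the same: the join is trivial for whoever holds both tables and invisible to the person described in them.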
Tal Zarsky, one of the major theorists of data mining, has described a
subtle trap in predictive software—yet a further step in the asymmetry.
Predictive systems draw on huge existing datasets to produce predictions of human activity: they will make predictions, accurate or inaccurate, which will be used to make decisions and produce coercive outcomes, and people will be punished or rewarded for things they have not yet done. The discriminatory and manipulative possibilities are clear. However, as Zarsky explains, there is another layer to these concerns: “A non-interpretable process might follow from a data-mining analysis which is not explainable in human language.
Here, the software makes its selection decisions based upon multiple variables (even thousands). … It would be difficult for the government to provide a detailed response when asked why an individual was singled out to receive
differentiated treatment by an automated recommendation system. The most
the government could say is that this is what the algorithm found based on previous cases.”12
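A hypothetical sketch may help convey what such a non-interpretable process can look like: raw attributes are hashed into thousands of anonymous feature slots and scored against weights, so the only answer available to “why was this person flagged?” is that the score crossed a threshold. The weights below are random stand-ins rather than a trained model, and every name and value in the example is invented.

```python
import hashlib
import random

random.seed(0)
N_FEATURES = 5000  # thousands of variables, none with a human-readable meaning
WEIGHTS = [random.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]  # stand-in "model"

def feature_vector(record: dict) -> list[float]:
    """Hash raw attributes into anonymous feature slots."""
    vec = [0.0] * N_FEATURES
    for key, value in record.items():
        slot = int(hashlib.sha256(f"{key}={value}".encode()).hexdigest(), 16) % N_FEATURES
        vec[slot] += 1.0
    return vec

def flag_for_review(record: dict, threshold: float = 0.5) -> bool:
    """The decision; the only available 'reason' is the score itself."""
    score = sum(w * x for w, x in zip(WEIGHTS, feature_vector(record)))
    return score > threshold

print(flag_for_review({"zip": "11201", "purchase": "prepaid phone", "age": "29"}))
```

Even with full access to the code and the weights, the person singled out learns nothing that maps onto an account of why in ordinary language.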
Developing these ideas further, Solon Barocas reveals how vulnerable we
are to data aggregation, analytics, and predictive modeling—now popularly
called “big data.” Big data methods take information we have willingly shared, or have been compelled to provide, and produce knowledge from inferences
that few—least of all we individual data subjects—could have anticipated.13 It is not simply that a decision is made and enforced. We can’t even be entirely sure that we know why a decision is made and enforced, because, in the ultimate unknowable unknown of data collection, those who make the decision
don’t know why it is made and enforced. We are reduced to guessing at the inner workings of an opaque operation. We do not understand the grounds for judgment. We are in a state of informational asymmetry.
Insofar as this is an argument built partially on what we don’t—indeed
can’t—know, it runs the risk of being a little abstract. But we can make it thoroughly concrete, and discuss a different facet of the problem of information asymmetry, by turning briefly to the subject of risk. Think of “risk” as in
“credit risk.” As Josh Lauer’s research has shown, the management of credit was crucial in the history of data collection, dossier production, and data mining.14 Transformations in the mercantile and social order of the United States in the nineteenth century obliged businesses to issue credit to customers without having access to the “personal acquaintance and community