Will obfuscation work? Yes—but only in context.
Let’s begin.
An Obfuscation Vocabulary
There are many obfuscation strategies. They are shaped by the user’s purposes (which may range from buying a few minutes of time to permanently interfering with a profiling system), by whether the users work alone or in concert, by the target and the beneficiaries of the obfuscation, by the nature of the information to be obfuscated, and by other parameters we will discuss in part II. (Parts I and II can be read independently—you are encouraged to skip ahead if you have questions about obfuscation’s purposes, about ethical and political quandaries, or about the circumstances that, we argue, make obfuscation a useful addition to the privacy toolkit.) Before we get to that, though, we want you to understand how the many specific circumstances of obfuscation can be
generalized into a pattern. We can link together a family of seemingly disparate events under a single heading, revealing their underlying continuities and suggesting how similar methods can be applied to other contexts and other
problems. Obfuscation is contingent, shaped by the problems we seek to
address and the adversaries we hope to foil or delay, but it is characterized by a simple underlying circumstance: unable to refuse or deny observation, we create many plausible, ambiguous, and misleading signals within which the
information we want to conceal can be lost.
To illustrate obfuscation in the ways that are most salient to its use and development now, and to provide a reference for the rest of the book, we have selected a set of core cases that exemplify how obfuscation works and what it can do. These cases are organized thematically. Though they aren’t suited to a simple typology, we have structured them so that the various choices particular to obfuscation should become clear as you read. In addition to these cases, we present a set of brief examples that illustrate some of obfuscation’s other applications and some of its more unusual contexts. With these cases and
explanations, you will have an index of obfuscation across all the domains in which we have encountered it. Obfuscation—positive and negative, effective and ineffective, targeted and indiscriminate, natural and artificial, analog and digital—appears in many fields and in many forms.
1 Core Cases
1.1 Chaff: defeating military radar
During the Second World War, a radar operator tracks an airplane over
Hamburg, guiding searchlights and anti-aircraft guns in relation to a phosphor dot whose position is updated with each sweep of the antenna. Abruptly, dots that seem to represent airplanes begin to multiply, quickly swamping the
display. The actual plane is in there somewhere, impossible to locate owing to the presence of “false echoes.”1
The plane has released chaff—strips of black paper backed with aluminum foil and cut to half the target radar’s wavelength. Thrown out by the pound and then floating down through the air, they fill the radar screen with signals.
The chaff has exactly met the conditions of data the radar is configured to look for, and has given it more “planes,” scattered all across the sky, than it can handle.
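To make the half-wavelength detail concrete: a strip of that length acts as a resonant dipole, returning an especially strong echo at the radar’s operating frequency. Here is a minimal sketch of the arithmetic in Python; the 560 MHz figure is our illustrative assumption, not a historical specification of the radar over Hamburg.

C = 299_792_458.0  # speed of light, in meters per second

def chaff_strip_length_m(radar_frequency_hz):
    """Half of the radar's wavelength, in meters."""
    wavelength_m = C / radar_frequency_hz
    return wavelength_m / 2.0

freq_hz = 560e6  # assumed radar frequency, for illustration only
print("strip length: %.1f cm" % (chaff_strip_length_m(freq_hz) * 100))

For a 560 MHz radar this gives strips of roughly 27 centimeters: light enough to flutter down slowly, and exactly the size the radar is tuned to see.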
This may well be the purest, simplest example of the obfuscation
approach. Because discovery of an actual airplane was inevitable (there
wasn’t, at the time, a way to make a plane invisible to radar), chaff taxed the time and bandwidth constraints of the discovery system by creating too many potential targets. That the chaff worked only briefly as it fluttered to the ground and was not a permanent solution wasn’t relevant under the circumstances.
It only had to work well enough and long enough for the plane to get past the range of the radar.
As we will discuss in part II, many forms of obfuscation work best as
time-buying “throw-away” moves. They can get you only a few minutes, but
sometimes a few minutes is all the time you need.
The example of chaff also helps us to distinguish, at the most basic level, between approaches to obfuscation. Chaff relies on producing echoes—
imitations of the real thing—that exploit the limited scope of the observer.
(Fred Cohen terms this the “decoy strategy.”2) As we will see, some forms of obfuscation generate genuine but misleading signals—much as you would protect the contents of one vehicle by sending it out accompanied by several other identical vehicles, or defend a particular plane by filling the sky with other planes—whereas other forms shuffle genuine signals, mixing data in an effort to make the extraction of patterns more difficult. Because those who scatter chaff have exact knowledge of their adversary, chaff doesn’t have to do either of these things.
If the designers of an obfuscation system have specific and detailed knowledge of the limits of the observer, the system they develop has to work for only one wavelength and for only 45 minutes. If the system their adversary uses for observation is more patient, or if it has a more comprehensive set of capacities for observation, they have to make use of their understanding of the adversary’s internal agenda—that is, of what useful information the adversary hopes to extract from data obtained through surveillance—and undermine that agenda by manipulating genuine signals.
Before we turn to the manipulation of genuine signals, let’s look at a very different example of flooding a channel with echoes.
1.2 Twitter bots: filling a channel with noise
The two examples we are about to discuss are a study in contrasts. Although producing imitations is their mode of obfuscation, they take us from the
Second World War to present-day circumstances, and from radar to social
networks. They also introduce an important theme.
In chapter 3, we argue that obfuscation is a tool particularly suited to the “weak”—the situationally disadvantaged, those at the wrong end of asymmetrical power relationships. It is a method, after all, that you have reason to adopt if you can’t be invisible—if you can’t refuse to be tracked or surveilled, if you can’t simply opt out or operate within professionally secured networks.
This doesn’t mean that it isn’t also taken up by the powerful. Oppressive or coercive forces usually have better means than obfuscation at their disposal.
Sometimes, though, obfuscation becomes useful to powerful actors—as it did in two elections, one in Russia and one in Mexico. Understanding the choices faced by the groups in contention will clarify how obfuscation of this kind can be employed.
During protests over problems that had arisen in the 2011 Russian parliamentary elections, much of the conversation about ballot-box stuffing and
other irregularities initially took place on LiveJournal, a blogging platform that had originated in the United States but attained its greatest popularity in Russia—more than half of its user base is Russian.3 Though LiveJournal is
quite popular, its user base is very small relative to those of Facebook’s and Google’s various social systems; it has fewer than 2 million active accounts.4
Thus, LiveJournal is comparatively easy for attackers to shut down by means of a distributed denial of service (DDoS) attack—that is, by using computers scattered around the world to issue requests for the site in such volume that the servers making the site available are overwhelmed and legitimate users can’t access it. Such an attack on LiveJournal, in conjunction with the arrests of activist bloggers at a protest in Moscow, was a straightforward approach to censorship.5 When and why, then, did obfuscation become necessary?
The conversation about the Russian protest migrated to Twitter, and the powers interested in disrupting it then faced a new challenge. Twitter has an enormous user base, with infrastructure and security expertise to match. It could not be taken down as easily as LiveJournal. Based in the United States, Twitter was in a much better position to resist political manipulation than LiveJournal’s parent company. (Although LiveJournal service is provided by a company set up in the U.S. for that purpose, the company that owns it, SUP Media, is based in Moscow.6) To block Twitter outright would require direct government intervention. The LiveJournal attack was done independently, by nationalist hackers who may or may not have had the approval and assistance of the Putin/Medvedev administration.7 Parties interested in halting the political conversation on Twitter therefore faced a challenge that will become familiar as we explore obfuscation’s uses: time was tight, and traditional mechanisms for action weren’t available. A direct technical approach—either blocking
Twitter within a country or launching a worldwide denial-of-service attack—
wasn’t possible, and political and legal angles of attack couldn’t be used.
Rather than stop a Twitter conversation, then, attackers can overload it
with noise.
During the Russian protests, the obfuscation took the form of thousands of Twitter accounts suddenly piping up, posting tweets with the same hashtags the protesters were using.8 Hashtags are a mechanism for grouping tweets together; for example, if I add #obfuscation to a tweet, the symbol # turns the word into an active link—clicking it will bring up all other tweets tagged with #obfuscation. Hashtags are useful for organizing the flood of tweets into coherent conversations on specific topics, and #триумфальная (referring to Triumfalnaya, the location of a protest) became one of several tags people could use to vent their anger, express their opinions, and organize further actions. (Hashtags also play a role in how Twitter determines “trending” and significant topics on the site, which can then draw further attention to what is being discussed under that tag—the site’s Trending Topics list often draws news coverage.9)
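Mechanically, a hashtag is just an index key. The following toy sketch, in Python and with invented example tweets, shows what “grouping tweets together” amounts to: each tag maps to the list of tweets that carry it, and “clicking” a tag is a lookup.

import re
from collections import defaultdict

HASHTAG = re.compile(r"#\w+")

def index_by_hashtag(tweets):
    """Map each hashtag to the list of tweets that carry it."""
    index = defaultdict(list)
    for tweet in tweets:
        for tag in HASHTAG.findall(tweet):
            index[tag].append(tweet)
    return index

tweets = [
    "Meet at the square at noon #obfuscation",
    "New reading list is up #obfuscation #privacy",
]
print(index_by_hashtag(tweets)["#obfuscation"])  # both tweets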
If you were following #триумфальная, you would have seen tweet after tweet from Russian activists spreading links to news and making plans. But those tweets began to be interspersed with tweets about Russian greatness, or tweets that seemed to consist of noise, gibberish, or random words and
phrases. Eventually those tweets dominated the stream for #триумфальная,
and those for other topics related to the protests, to such a degree that tweets relevant to the topic were, essentially, lost in the noise, unable to get any attention or to start a coherent exchange with other users. That flood of
new tweets came from accounts that had been inactive for much of their existence. Although they had posted very little from the time of their creation until the time of the protests, now each of them was posting dozens of times
an hour. Some of the accounts’ purported users had mellifluous names, such as imelixyvyq, wyqufahij, and hihexiq; others had more conventional-seeming names, all built on a firstname_lastname model—for example, latifah_xander.10
Obviously, these Twitter accounts were “Twitter bots”—programs purporting to be people and generating automatic, targeted messages. Many of the accounts had been created around the same time. In numbers and in frequency, such messages can easily dominate a discussion, effectively ruining the platform for a specific audience through overuse—that is, obfuscating through the production of false, meaningless signals.
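The arithmetic of flooding is simple. In the toy model below (our illustration, with invented posting rates rather than measurements from either election), the share of genuine tweets a reader encounters in a hashtag stream falls in proportion to the bots’ combined output:

def genuine_share(genuine_per_min, bot_per_min):
    """Steady-state fraction of a hashtag stream that is genuine."""
    return genuine_per_min / float(genuine_per_min + bot_per_min)

# Invented rates: 20 activist tweets per minute against growing
# volumes of bot traffic.
for bots in (0, 20, 200, 2000):
    print("%4d bot tweets/min -> genuine share %.1f%%"
          % (bots, 100 * genuine_share(20, bots)))

At 2,000 bot tweets per minute, only about one tweet in a hundred is genuine: the conversation is still technically present, but no reader scrolling the stream will find it.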
The use of Twitter bots is becoming a reliable technique for stifling Twitter discussion. The highly contentious 2012 Mexican elections provide another
example of this strategy in practice, further refined.11 Protesters opposed to the front-runner, Enrique Peña Nieto, and to the Partido Revolucionario Institucional (PRI), used #marchaAntiEPN as an organizing hashtag for the purposes of aggregating conversation, structuring calls for action, and arranging protest events. Groups wishing to interfere with the protesters’ organizing efforts faced challenges similar to those in the Russian case. Rather than thousands of bots, however, hundreds would do—indeed, when this case was investigated by the American Spanish-language TV network Univision, only about thirty such bots were active. Their approach was both to interfere with the work being done to advance #marchaAntiEPN and to overuse that hashtag. Many of the tweets consisted entirely of variants of “#marchaAntiEPN #marchaAntiEPN #marchaAntiEPN #marchaAntiEPN #marchaAntiEPN #marchaAntiEPN.” Such repetition, particularly by accounts already showing suspiciously bot-like behavior, triggers systems within Twitter that identify attempts to manipulate the hashtagging system and then remove the
hashtags in question from the Trending Topics list. In other words, because the items in Trending Topics become newsworthy and attract attention, spammers and advertisers will try to push hashtags up into that space through repetition, so Twitter has developed mechanisms for spotting and blocking such
activity.12
The Mexican-election Twitter bots were deliberately engaging in bad
behavior in order to trigger an automatic delisting, thereby keeping the impact of #marchaAntiEPN “off the radar” of the larger media. They were making the hashtag unusable and removing its potential media significance. This was
obfuscation as a destructive act. Though such efforts use the same basic tactic as radar chaff (that is, producing many imitations configured to hide the real thing), they have very different goals: rather than just buying time (for example, in the run-up to an election and during the period of unrest afterward), they render certain terms unusable—even, from the perspective of a sorting algorithm, toxic—by manipulating the properties of the data through the use of false signals.
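Twitter’s real anti-manipulation systems are proprietary, but a crude, hypothetical repetition heuristic is enough to illustrate the behavior the bots were exploiting. In this sketch (ours, not Twitter’s algorithm), a tweet that is little more than one hashtag repeated gets flagged, and a flagged tag can then be withheld from trending calculations:

import re

HASHTAG = re.compile(r"#\w+")

def looks_like_trend_spam(tweet, threshold=0.8):
    """Flag tweets whose tokens are overwhelmingly one repeated hashtag.
    A hypothetical heuristic, not Twitter's actual algorithm."""
    tokens = tweet.split()
    if not tokens:
        return False
    tags = [t for t in tokens if HASHTAG.fullmatch(t)]
    if not tags:
        return False
    most_common = max(tags.count(t) for t in set(tags))
    return most_common / len(tokens) >= threshold

print(looks_like_trend_spam("#marchaAntiEPN " * 6))                  # True
print(looks_like_trend_spam("See you at the march #marchaAntiEPN"))  # False

The Mexican bots, in effect, inverted the purpose of such a filter: by tripping it deliberately, they had the platform itself suppress the protesters’ hashtag.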
1.3 CacheCloak: location services without location tracking
CacheCloak takes an approach to obfuscation that is suited to location-based services (LBSs).13 It illustrates two twists in the use of false echoes and imitations in obfuscation. The first of these is making sure that relevant data can still be extracted by the user; the second is trying to find an approach that can work indefinitely rather than as a temporary time-buying strategy.
Location-based services take advantage of the locative capabilities of
mobile devices to create various services, some of them social (e.g., FourSquare, which turns going places into a competitive game), some lucrative
(e.g., location-aware advertising), and some thoroughly useful (e.g., maps and nearest-object searches). The classic rhetoric of balancing privacy against utility, in which utility is often presented as detrimental to privacy, is evident here. If you want the value of an LBS—for example, if you want to be on the network that your friends are on so you can meet with one of them if you and that person are near one another—you will have to sacrifice some privacy,
and you will have to get accustomed to having the service provider know
where you are. CacheCloak suggests a way to reconfigure the tradeoff.
“Where other methods try to obscure the user’s path by hiding parts of it,”
the creators of CacheCloak write, “we obscure the user’s location by surrounding it with other users’ paths”14—that is, through the propagation of ambiguous data. In the standard model, your phone sends your location to the service and gets the information you requested in return. In the CacheCloak model, your phone predicts your possible paths and then fetches the results for
several likely routes. As you move, you receive the benefits of locative awareness—access to what you are looking for, in the form of data cached in advance of potential requests—and an adversary is left with many possible
paths, unable to distinguish the beginning from the end of a route and unable to determine where you came from, where you mean to go, or even where you
are. From an observer’s perspective, the salient data—the data we wish to
keep to ourselves—are buried inside a space of other, equally likely data.
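The papers describing CacheCloak present it in terms of path prediction over a road network. The sketch below is our simplified rendering of that idea in Python, not the authors’ implementation: the client expands every plausible route outward from its current position and prefetches location results for each cell it might reach, so the service sees a bundle of intersecting paths rather than one trajectory.

from collections import deque

# Hypothetical road grid: each cell lists the cells reachable from it.
ROAD_GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C"],
    "E": ["C"],
}

def cells_to_prefetch(position, horizon=2):
    """Every cell reachable within `horizon` steps: the query set sent
    to the location service in place of the single true position."""
    seen = {position}
    frontier = deque([(position, 0)])
    while frontier:
        cell, depth = frontier.popleft()
        if depth == horizon:
            continue
        for nxt in ROAD_GRAPH[cell]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

print(sorted(cells_to_prefetch("A")))  # the true position is one among many

The user still gets an immediate answer wherever they actually go, because the result was fetched in advance; the observer gets a space of equally plausible locations.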
1.4 TrackMeNot: blending genuine and artificial search queries
TrackMeNot, developed in 2006 by Daniel Howe, Helen Nissenbaum, and
Vincent Toubiana, exemplifies a software strategy for concealing activity with imitative signals.15 The purpose of TrackMeNot is to foil the profiling of users through their searches. It was designed in response to the U.S. Department of Justice’s request for Google’s search logs and in response to the surprising discovery by a New York Times reporter that some identities and profiles could be inferred even from anonymized search logs published by AOL Inc.16
Our search queries end up acting as lists of locations, names, interests,
and problems. Whether or not our full IP addresses are included, our identities can be inferred from these lists, and patterns in our interests can be discerned.
Responding to calls for accountability, search companies have offered ways to address people’s concerns about the collection and storage of search queries, though they continue to collect and analyze logs of such queries.17 Preventing any stream of queries from being inappropriately revealing of a particular person’s interests and activities remains a challenge.18
The solution TrackMeNot offers is not to hide users’ queries from search
engines (an impractical method, in view of the need for query satisfaction), but to obfuscate by automatically generating queries from a “seed list” of terms.
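As a rough illustration of the seed-list idea (our sketch in Python, not TrackMeNot’s actual browser-extension code, and with an invented vocabulary; the real extension also evolves its query terms over time), decoy queries can be assembled from seed terms and issued at randomized intervals, interleaving plausible machine-generated searches with the user’s genuine ones:

import random
import time

# Invented seed vocabulary, for illustration only.
SEED_TERMS = [
    "weather radar", "used bicycles", "tax forms", "gardening tips",
    "flight tracker", "stock quotes", "soup recipes",
]

def decoy_query(rng):
    """Assemble a plausible-looking query from one or two seed terms."""
    terms = rng.sample(SEED_TERMS, rng.choice([1, 2]))
    return " ".join(terms)

def issue_decoys(n=5, seed=42):
    rng = random.Random(seed)
    for _ in range(n):
        # A real client would submit this to the search engine
        # alongside the user's genuine queries.
        print("decoy query:", decoy_query(rng))
        time.sleep(rng.uniform(0.0, 2.0))  # randomized pacing

issue_decoys()

From the search engine’s side, the log of queries from this user now mixes real interests with generated ones, and any profile inferred from that log is correspondingly diluted.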