This raises at least a few other public policy avenues to be explored. For example, given the public goods nature of data, there may be circumstances in which public investment in data creation, and public ownership of the data thus created, is worth exploring, particularly when private creation of such data would raise antitrust concerns.
17
Privacy, Algorithms, and Artificial Intelligence
Catherine Tucker
Imagine the following scenario. You are late for a hospital appointment and searching frantically for a parking spot. You know that you often forget where you parked your car, so you use an app you downloaded called "Find my Car." The app takes a photo of your car and then geocodes the photo, enabling you to easily find the right location when you come to retrieve your car. The app accurately predicts when it should provide a prompt. This all sounds very useful. However, this example illustrates a variety of privacy concerns in a world of artificial intelligence.
1. Data Persistence: This data, once created, may persist longer than the human who created it, given the low costs of storing such data.
2. Data Repurposing: It is not clear how such data could be used in the future. Once created, such data can be indefinitely repurposed. For example, in a decade's time, parking habits may be part of the data used by health insurance companies to allocate an individual to a risk premium.
3. Data Spillovers: There are potential spillovers for others who did not take the photo. The photo may record other people, and they may be identifiable through facial recognition, or incidentally captured cars may be identifiable through license plate databases. These other people did not choose to create the data, but my choice to create data may have spillovers for them in the future.
Catherine Tucker is the Sloan Distinguished Professor of Management Science at MIT Sloan School of Management and a research associate of the National Bureau of Economic Research.
For acknowledgments, sources of research support, and disclosure of the author's material financial relationships, if any, please see http://www.nber.org/chapters/c14011.ack.
This article will discuss these concerns in detail, after considering how the theory of the economics of privacy relates to artificial intelligence (AI).
17.1 The Theory of Privacy in Economics and Artificial Intelligence
17.1.1 Current Models of Economics and Privacy and Their Flaws
The economics of privacy has long been plagued by a lack of clarity about how to model privacy over data. Most theoretical work in economics models privacy as an intermediate good (Varian 1996; Farrell 2012). This implies that an individual's desire for data privacy will depend on the effect they anticipate that data will have on future economic outcomes. If, for example, the data leads a firm to charge higher prices based on the behavior it observes, a consumer may desire privacy. If a datum may lead a firm to intrude on their time, then again a consumer may desire privacy.
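In stylized form, this intermediate-good view can be summarized by a participation condition of the following kind (a minimal sketch; the notation is illustrative rather than taken from Varian (1996) or Farrell (2012)): an individual i chooses to share their data d_i only if

\[
\mathbb{E}\big[\, u_i \mid \text{firm observes } d_i \,\big] + b_i \;\ge\; \mathbb{E}\big[\, u_i \mid \text{firm does not observe } d_i \,\big],
\]

where u_i is i's payoff from downstream outcomes such as the prices they are charged or intrusions on their time, and b_i is any direct benefit from sharing (for example, a more convenient service). Privacy has no intrinsic value in this formulation; it matters only through its anticipated effect on those downstream outcomes.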
However, this contrasts with, or at the very least places a different emphasis than, how many policymakers and even consumers think about privacy policy and choice.
First, much of the policy debate involves whether or not consumers are capable of making the right choice surrounding the decision to provide data, and whether "notice and consent" provides sufficient information to consumers so they make the right choice. Work such as McDonald and Cranor (2008) emphasizes that even ten years ago it was unrealistic to think that consumers would have time to properly inform themselves about how their data may be used, as reading through privacy policies would take an estimated 244 hours each year. Since that study, the number of devices (thermostats, smartphones, apps, cars) collecting data has increased dramatically, suggesting that it is, if anything, even more implausible now that a consumer has the time to actually understand the choice they are making in each of these instances.
Second, and relatedly, even if customers are assumed to have been adequately informed, a new "behavioral" literature on privacy shows that well-documented effects from behavioral economics, such as the endowment effect or "anchoring," may also distort the ways customers make decisions surrounding their data (Acquisti, Taylor, and Wagman 2016). Such distortions may create scope for policy interventions of the "nudge" type that help consumers make better decisions (Acquisti 2010).
Third, this theory presupposes that customers will only desire privacy if their data is actually used for something, rather than experiencing distaste at the idea of their data being collected. Indeed, in some of the earliest work on privacy in the internet era, Varian (1996) states, "I don't really care if someone has my telephone number as long as they don't call me during dinner and try to sell me insurance. Similarly, I don't care if someone has my address, as long as they don't send me lots of official-looking letters offering to refinance my house or sell me mortgage insurance."
However, there is evidence to suggest that people do care about the mere fact that their data is collected, to the extent of changing their behavior, even if the chance of their suffering meaningfully adverse consequences from that collection is very small. Empirical analysis of people's reactions to the knowledge that their search queries had been collected by the US National Security Agency (NSA) shows a significant shift in behavior even for searches whose data was never going to be used by the government to identify terrorists, but which were simply personally embarrassing (Marthews and Tucker 2014). Legally speaking, the Fourth Amendment of the US Constitution covers the "unreasonable seizure" as well as the "unreasonable search" of people's "papers and effects," suggesting that governments, and firms acting on a government's behalf, cannot entirely ignore the seizure of data and focus only on whether a search is reasonable. Consequently, a growing consumer market has emerged for "data-light" and "end-to-end encrypted" communications and software solutions, where the firm collects much less or no data about consumers' activities on its platform. These kinds of concerns suggest that the fact of data collection may matter as well as how the data is used.
Last, economic theory often assumes that while customers desire firms to have information that allows them to better match their horizontally differentiated preferences, they do not desire firms to have information that might inform their willingness to pay (Varian 1996). However, the idea that personalization in a horizontal sense may be sought by customers goes against popular reports of consumers finding personalization repugnant or creepy (Lambrecht and Tucker 2013). Instead, it appears that personalization of products using horizontally differentiated taste information is only acceptable or successful if accompanied by a sense of control or ownership over the data used, even where such control is ultimately illusory (Tucker 2014; Athey, Catalini, and Tucker 2017).
17.1.2 Artificial Intelligence and Privacy
Like "privacy," artificial intelligence is often used loosely to mean many things. This article follows Agrawal, Gans, and Goldfarb (2016) and focuses on AI as being associated with reduced costs of prediction. The obvious effect that this will have on the traditional model of privacy is that more types of data will be used to predict a wider variety of economic objectives. Again, the desire (or lack of desire) for privacy will be a function of an individual's anticipation of the consequences of their data being used in a predictive algorithm. If they anticipate that they will face worse economic outcomes if the AI uses their data, they may desire to restrict their data-sharing or data-creating behavior.
It may be that the simple dislike or distaste for data collection will transfer to the use of automated predictive algorithms to process that data: the sense of creepiness attached to the use of data, which leads to a desire for privacy, would then transfer to the algorithms themselves. Indeed, there is some evidence of a similar behavioral process, in which some customers only accept algorithmic prediction if it is accompanied by a sense of control (Dietvorst, Simmons, and Massey 2016).
In this way, the question of AI algorithms seems simply a continuation of the tension that has plagued earlier work in the economics of privacy. So a natural question is whether AI presents new or different problems. This article argues that many of the questions raised by AI concern how it may constrain the ability of customers, as conceived in our traditional model of privacy, to make choices regarding the sharing of their data. I emphasize three themes that I think may distort this process in important and economically interesting ways.
17.2 Data Persistence, AI, and Privacy
Data persistence refers to the fact that once digital data is created, it is difficult to delete completely. This is true from a technical perspective (Adee 2015). Unlike analog records, which can be destroyed with reasonable ease, the intentional deletion of digital data requires resources, time, and care.
17.2.1 Unlike in Previous Eras, Data Created Now Is Likely to Persist
Cost constraints that used to mean that only the largest firms could afford to store extensive data, and even then for a limited time, have essentially disappeared.
Large shifts in the data-supply infrastructure have rendered the tools for gathering and analyzing large swaths of digital data commonplace. Cloud-based resources such as Amazon, Microsoft, and Rackspace mean that access to these tools no longer depends on scale,1 and storage costs for data continue to fall, so that some speculate they may eventually approach zero.2 This allows ever-smaller firms to have access to powerful and inexpensive computing resources. This decrease in costs suggests that data may be stored indefinitely and used in predictive exercises should it be thought of as a useful predictor.
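A back-of-envelope calculation illustrates why indefinite storage is now plausible. The sketch below uses illustrative assumptions (record sizes, sampling rates, and a per-gigabyte price that are not quotes from any particular provider):

```python
# Illustrative back-of-envelope: cost of retaining one user's location trail
# indefinitely. All figures below are assumptions for illustration only.

record_size_bytes = 100          # one timestamped GPS fix, roughly
records_per_day = 24 * 60        # one fix per minute
years = 60                       # a large fraction of a lifetime

total_gb = record_size_bytes * records_per_day * 365 * years / 1e9
price_per_gb_month = 0.02        # assumed object-storage price, USD

monthly_cost = total_gb * price_per_gb_month
print(f"{total_gb:.1f} GB of location history, ~${monthly_cost:.4f}/month to store")
```

Under these assumptions, a minute-by-minute location history spanning most of a lifetime amounts to only a few gigabytes and costs a few cents per month to keep, which is why deletion is rarely forced by economics.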
The chief resource constraint on the deployment of big data solutions is a lack of human beings with the data-science skills to draw appropriate conclusions from analysis of large data sets (Lambrecht and Tucker 2017). As time and skills evolve, this constraint may become less pressing.
1. http://betanews.com/2014/06/27/comparing-the-top-three-cloud-storage-providers/.
2. http://www.enterprisestorageforum.com/storage-management/can-cloud-storage-costs-fall-to-zero-1.html.
Digital persistence may be concerning from a privacy point of view because privacy preferences may change over time. The privacy preference an individual held when they created the data may be inconsistent with the privacy preference of their older self. This is something we documented in Goldfarb and Tucker (2012), where we showed that while younger people tended to be more open with data, as they grew older their preference for withholding data grew. This was a stable effect that persisted across cohorts. It is not the case that young people today are unusually casual about data; all generations are more casual about data when younger, but this pattern was simply less visible previously because social media, and other ways of sharing and creating potentially embarrassing data, did not yet exist. This implies that one concern regarding AI and privacy is that it may use data that was created a long time in the past, which in retrospect the individual regrets creating.
Data that was created at t = 0 may have seemed innocuous at the time, and in isolation may still be innocuous at t + 1, but increased computing power may be able to derive much more invasive conclusions from aggregations of otherwise innocuous data at t + 1 relative to t.
A second concern is that there is a whole variety of data generated about individuals that they do not necessarily consciously choose to create. This includes not only incidental collection of data, such as being photographed by another party, but also data generated by the increased passive surveillance of public spaces, and by the use of cellphone technology without full appreciation of how much data about an individual and their location it discloses to third parties, including the government.
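As a concrete illustration of how aggregating individually innocuous records can support invasive inferences, consider the following sketch, which guesses a person's home location simply by averaging where their device is seen late at night. The data and the method are entirely synthetic and deliberately naive, for illustration only:

```python
from datetime import datetime
from statistics import mean

# Synthetic, illustrative location pings: (timestamp, latitude, longitude).
# Each individual ping is unremarkable; the pattern across pings is not.
pings = [
    (datetime(2017, 8, 1, 2, 14), 42.3601, -71.0942),
    (datetime(2017, 8, 1, 13, 5), 42.3581, -71.0636),   # daytime, elsewhere
    (datetime(2017, 8, 2, 1, 47), 42.3603, -71.0939),
    (datetime(2017, 8, 3, 3, 2), 42.3599, -71.0944),
]

# Keep only late-night observations (midnight to 5 a.m.), when most people are home.
night = [(lat, lon) for ts, lat, lon in pings if 0 <= ts.hour < 5]

home_lat = mean(lat for lat, _ in night)
home_lon = mean(lon for _, lon in night)
print(f"Inferred home location: ({home_lat:.4f}, {home_lon:.4f})")
```

No single ping reveals much, but a handful of them, combined with a trivial rule of thumb, identifies where the person sleeps; richer data and better prediction only sharpen such inferences.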
Though there has been substantial work bringing the insights of behavioral economics into the study of the economics of privacy, there has been less work on time-preference consistency, despite the fact that it is one of the oldest and most studied phenomena in behavioral economics (Strotz 1955; Rubinstein 2006). Introducing the potential for myopia or hyperbolic discounting into the way we model privacy choices over the creation of data therefore seems an important step. Even if the economist concerned rejects behavioral economics or myopia as an acceptable explanation, at the very least it is useful to emphasize that privacy choices should not be modeled as if the time between the creation of the data and the use of the data were trivial, but instead as decisions that may play out over an extended period of time.
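One minimal way to introduce such time inconsistency is the standard quasi-hyperbolic (beta-delta) formulation, applied here to the decision to create data (the payoff structure below is an illustrative assumption rather than a model taken from the privacy literature):

\[
U_0 \;=\; b_0 \;-\; \beta \sum_{t=1}^{T} \delta^{t}\, c_t , \qquad 0 < \beta < 1 ,
\]

where b_0 is the immediate benefit of creating the data (for instance, the convenience of the parking app), c_t is the anticipated privacy cost when the data is used by a predictive algorithm in period t, and β captures present bias. An individual with β < 1 underweights future privacy costs at the moment of creation relative to how they will weigh those same costs once period t arrives, generating exactly the retrospective regret described above.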
17.2.2 How Long Will Data’s Predictive Power Persist?
If we assume that any data created will probably persist, given low storage costs, then the more important question for understanding the dynamics of privacy may be how long that data's predictive power persists.
It seems reasonable to think that much of the data created today does not have much predictive power tomorrow. This is something we investigated in Chiou and Tucker (2014), where we showed that the length of the data retention period imposed on search engines by the European Union (EU) did not appear to affect the success of their algorithms at generating useful search results. Here, the success of a search result was measured by whether or not the user felt compelled to search again. This may make sense in the world of search engines, where many searches are either unique or focused on new events. On August 31, 2017, for example, the top trending search on Google was "Hurricane Harvey," something that could not have been predicted on the basis of search behavior from more than a few weeks prior.3
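For intuition, one simple proxy along these lines can be computed from a query log by flagging cases in which the same user immediately issues another query. This is a hypothetical measure sketched for illustration; it is not necessarily the exact construction used in Chiou and Tucker (2014):

```python
# Illustrative proxy: share of queries followed by another query from the same
# user within a short window, read as "the first result did not satisfy them."
from datetime import datetime, timedelta

# (user_id, timestamp) pairs from a hypothetical query log.
log = [
    ("u1", datetime(2017, 8, 31, 9, 0)),
    ("u1", datetime(2017, 8, 31, 9, 1)),   # re-search within window -> failure
    ("u2", datetime(2017, 8, 31, 9, 0)),
    ("u2", datetime(2017, 8, 31, 11, 30)), # far apart -> treated as a new task
]
window = timedelta(minutes=5)

log.sort()  # group each user's queries together, in time order
failures = sum(
    1
    for (u1, t1), (u2, t2) in zip(log, log[1:])
    if u1 == u2 and (t2 - t1) <= window
)
repeat_search_rate = failures / len(log)
print(f"Repeat-search rate: {repeat_search_rate:.2f}")
```

A retention policy that truncates old data would matter for search quality only if this kind of failure rate rose when the older data was dropped.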
However, there are some forms of data where it is reasonable to think that their predictive power will persist almost indefinitely. The most important example of this is the creation of genetic digital data. As Miller and Tucker (2017) point out, companies such as 23andme.com are creating large repositories of genetic data spanning more than 1.2 million people, and genetic data has the unusual quality that it does not change over time.
While the internet browsing behavior of a twenty-year-old may not prove to be good for predicting their browsing behavior at age forty, the genetic data of a twenty-year-old will almost perfectly predict the genetic data of