The Economics of Artificial Intelligence


edited by Ajay Agrawal, Joshua Gans, and Avi Goldfarb


This raises at least a few other public policy avenues to be explored. For example, given the public goods nature of data, there may be circumstances in which public investment in data creation, and public ownership of the data thus created, is worth exploring, particularly when private creation of such data would lead to antitrust concerns.


17 Privacy, Algorithms, and Artificial Intelligence

Catherine Tucker

Imagine the following scenario. You are late for a hospital appointment and searching frantically for a parking spot. You know that you often forget where you parked your car, so you use an app you downloaded called “Find my Car.” The app takes a photo of your car and then geocodes the photo, enabling you to easily find the right location when you come to retrieve your car. The app accurately predicts when it should provide a prompt. This all sounds very useful. However, this example illustrates a variety of privacy concerns in a world of artificial intelligence.

1. Data Persistence: This data, once created, may persist longer than the human who created it, given the low costs of storing such data.

2. Data Repurposing: It is not clear how such data could be used in the future. Once created, such data can be indefinitely repurposed. For example, in a decade’s time parking habits may be part of the data used by health insurance companies to allocate an individual to a risk premium.

3. Data Spillovers: There are potential spillovers for others who did not take the photo. The photo may record other people, and they may be identifiable through facial recognition, or incidentally captured cars may be identifiable through license plate databases. These other people did not choose to create the data, but my choice to create data may have spillovers for them in the future.
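
To make the scenario concrete, the following is a minimal sketch of the kind of record such an app might persist once it geocodes a photo; the field names, coordinates, and identifiers are hypothetical illustrations, not taken from any real app.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ParkingRecord:
    """Illustrative record a hypothetical car-finding app might store after geocoding a photo."""
    photo_path: str        # the image itself may capture bystanders (spillovers)
    latitude: float        # geocode attached to the photo
    longitude: float
    captured_at: datetime  # timestamp enables long-run behavioral profiling (persistence)
    device_id: str         # ties the record back to an identifiable individual (repurposing)

record = ParkingRecord(
    photo_path="photos/parking_2017-08-31.jpg",  # hypothetical path
    latitude=42.3601,                            # hypothetical coordinates
    longitude=-71.0942,
    captured_at=datetime.now(timezone.utc),
    device_id="device-1234",                     # hypothetical identifier
)
print(record)
```

Each field maps onto one of the three concerns above: the record outlives the parking event, can be repurposed once linked to an identity, and the underlying photo may capture people who never consented to its creation.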

Catherine Tucker is the Sloan Distinguished Professor of Management Science at MIT Sloan School of Management and a research associate of the National Bureau of Economic Research. For acknowledgments, sources of research support, and disclosure of the author’s material financial relationships, if any, please see http://www.nber.org/chapters/c14011.ack.


This article will discuss these concerns in detail, after considering how the theory of the economics of privacy relates to artificial intelligence (AI).

17.1 The Theory of Privacy in Economics and Artificial Intelligence

  17.1.1 Current Models of Economics and Privacy and Their Flaws

The economics of privacy has long been plagued by a lack of clarity about how to model privacy over data. Most theoretical economic models treat privacy as an intermediate good (Varian 1996; Farrell 2012). This implies that an individual’s desire for data privacy will depend on how they anticipate that data will affect future economic outcomes. If, for example, this data leads a firm to charge higher prices based on the behavior it observes in the data, a consumer may desire privacy. If a datum may lead a firm to intrude on their time, then again a consumer may desire privacy.
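
To fix ideas, this intermediate-good view can be summarized in a minimal sketch; the notation below is illustrative rather than drawn from the cited models.

```latex
% Minimal sketch of privacy as an intermediate good (illustrative notation,
% not taken from Varian 1996 or Farrell 2012).
\[
  \text{individual } i \text{ shares datum } d
  \quad\Longleftrightarrow\quad
  \mathbb{E}\big[u_i\big(y_i(d)\big)\big] \;\ge\; \mathbb{E}\big[u_i\big(y_i(\emptyset)\big)\big],
\]
% where y_i(d) denotes the future economic outcomes (prices charged, intrusions
% on time) that i anticipates when a firm observes d, and y_i(\emptyset) the
% outcomes when the datum is withheld. In this framing, privacy is valued only
% through the anticipated difference in outcomes, not as an end in itself.
```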

However, this framing contrasts with, or at the very least places its emphasis differently from, how many policymakers and even consumers think about privacy policy and choice.

First, much of the policy debate involves whether or not consumers are capable of making the right choice surrounding the decision to provide data, and whether “notice and consent” provides sufficient information to consumers so they can make the right choice. Work such as McDonald and Cranor (2008) emphasizes that even ten years ago it was unrealistic to think that consumers would have time to properly inform themselves about how their data may be used, as reading through privacy policies would take an estimated 244 hours each year. Since that study, the number of devices (thermostats, smartphones, apps, cars) collecting data has increased dramatically, suggesting that it is, if anything, even more implausible now that a consumer has the time to actually understand the choice they are making in each of these instances.
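
A back-of-the-envelope reconstruction shows the order of magnitude involved; the inputs below are hypothetical round numbers chosen only to illustrate how an estimate of this size arises, not the parameters actually used by McDonald and Cranor (2008).

```latex
% Illustrative arithmetic only; inputs are hypothetical, not the study's parameters.
\[
  1{,}460 \ \text{policies per year} \times 10 \ \text{minutes per policy}
  = 14{,}600 \ \text{minutes} \approx 243 \ \text{hours per year}.
\]
% Visiting only a handful of new sites per day and actually reading each policy
% already consumes time on the order of the 244-hour figure cited above.
```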

Relatedly, even if customers are assumed to have been adequately informed, a new “behavioral” literature on privacy shows that well-documented effects from behavioral economics, such as the endowment effect or “anchoring,” may also distort the ways customers make decisions surrounding their data (Acquisti, Taylor, and Wagman 2016). Such distortions may create scope for policy interventions of the “nudge” type that help consumers make better decisions (Acquisti 2010).

Third, this theory presupposes that customers will only desire privacy if their data is actually used for something, rather than experiencing distaste at the idea of their data being collected. Indeed, in some of the earliest work on privacy in the internet era, Varian (1996) states, “I don’t really care if someone has my telephone number as long as they don’t call me during dinner and try to sell me insurance. Similarly, I don’t care if someone has my address, as long as they don’t send me lots of official-looking letters offering to refinance my house or sell me mortgage insurance.”


However, there is evidence to suggest that people do care about the mere fact of collection of their data, to the extent of changing their behavior, even if the chance of their suffering meaningfully adverse consequences from that collection is very small. Empirical analysis of people’s reactions to the knowledge that their search queries had been collected by the US National Security Agency (NSA) shows a significant shift in behavior even for searches whose data was never going to be used by the government to identify terrorists, but which were simply personally embarrassing (Marthews and Tucker 2014). Legally speaking, the Fourth Amendment of the US Constitution covers the “unreasonable seizure” as well as the “unreasonable search” of people’s “papers and effects,” suggesting that governments, and firms acting on a government’s behalf, cannot entirely ignore the seizure of data and focus only on whether a search is reasonable. Consequently, a growing consumer market has emerged for “data-light” and “end-to-end encrypted” communications and software solutions, where the firm collects far less data, or none at all, about consumers’ activities on its platform. These kinds of concerns suggest that the fact of data collection may matter as well as how the data is used.

Last, economic theory often assumes that while customers desire firms to have information that allows them to better match customers’ horizontally differentiated preferences, they do not desire firms to have information that might reveal their willingness to pay (Varian 1996). However, this idea that personalization in a horizontal sense may be sought by customers goes against popular reports of consumers finding personalization repugnant or creepy (Lambrecht and Tucker 2013). Instead, it appears that personalization of products using horizontally differentiated taste information is only acceptable or successful if accompanied by a sense of control or ownership over the data used, even where such control is ultimately illusory (Tucker 2014; Athey, Catalini, and Tucker 2017).

17.1.2 Artificial Intelligence and Privacy

Like “privacy,” artificial intelligence is often used loosely to mean many things. This article follows Agrawal, Gans, and Goldfarb (2016) and focuses on AI as being associated with reduced costs of prediction. The obvious effect that this will have on the traditional model of privacy is that more types of data will be used to predict a wider variety of economic objectives. Again, the desire (or lack of desire) for privacy will be a function of an individual’s anticipation of the consequences of their data being used in a predictive algorithm. If they anticipate that they will face worse economic outcomes if the AI uses their data, they may desire to restrict their data-sharing or data-creating behavior.

It may be that the simple dislike or distaste for data collection will transfer to the use of automated predictive algorithms to process that data. The sense of creepiness attached to the use of data, which leads to a desire for privacy, would then be transferred to the algorithms themselves. Indeed, there is some evidence of a similar behavioral process, where some customers only accept algorithmic prediction if it is accompanied by a sense of control (Dietvorst, Simmons, and Massey 2016).

In this way, the question of AI algorithms seems simply a continuation of the tension that has plagued earlier work in the economics of privacy. So a natural question is whether AI presents new or different problems. This article argues that many of the new questions concern forces that will constrain the ability of customers, in our traditional model of privacy, to make choices regarding the sharing of their data. I emphasize three themes that I think may distort this process in important and economically interesting ways.

17.2 Data Persistence, AI, and Privacy

Data persistence refers to the fact that once digital data is created, it is difficult to delete completely. This is true from a technical perspective (Adee 2015). Unlike analog records, which can be destroyed with reasonable ease, the intentional deletion of digital data requires resources, time, and care.

17.2.1 Unlike in Previous Eras, Data Created Now Is Likely to Persist

Cost constraints that used to mean that only the largest firms could afford to store extensive data, and even then for a limited time, have essentially disappeared.

Large shifts in the data-supply infrastructure have rendered the tools for gathering and analyzing large swaths of digital data commonplace. Cloud-based resources such as Amazon, Microsoft, and Rackspace mean these tools no longer depend on scale,1 and storage costs for data continue to fall, so that some speculate they may eventually approach zero.2 This allows ever-smaller firms to have access to powerful and inexpensive computing resources. This decrease in costs suggests that data may be stored indefinitely and can be used in predictive exercises should it be thought of as a useful predictor.

1. http://betanews.com/2014/06/27/comparing-the-top-three-cloud-storage-providers/.
2. http://www.enterprisestorageforum.com/storage-management/can-cloud-storage-costs-fall-to-zero-1.html.
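
As a rough illustration of how cheap indefinite retention has become, the sketch below prices fifty years of storage for one user’s data; the per-gigabyte rate and data volume are assumptions chosen for illustration, not quoted prices from any provider.

```python
# Back-of-the-envelope cost of retaining one user's data for decades.
# All inputs are hypothetical illustrations, not quoted provider prices.
GB_PER_USER = 5.0          # assumed data volume one service holds per user
PRICE_PER_GB_MONTH = 0.02  # assumed object-storage price in USD per GB-month
YEARS = 50                 # retain the data for much of a human lifetime

total_cost = GB_PER_USER * PRICE_PER_GB_MONTH * 12 * YEARS
print(f"Storing {GB_PER_USER:.0f} GB for {YEARS} years costs about ${total_cost:.2f}")
# Under these assumptions the total is on the order of tens of dollars,
# which is why deletion, not retention, is the decision that requires effort.
```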

The chief resource constraint on the deployment of big data solutions is a lack of human beings with the data-science skills to draw appropriate conclusions from analysis of large data sets (Lambrecht and Tucker 2017). As time and skills evolve, this constraint may become less pressing.

Digital persistence may be concerning from a privacy point of view because privacy preferences may change over time. The privacy preference that an individual may have felt when they created the data may be inconsistent with the privacy preference of their older self. This is something we documented in Goldfarb and Tucker (2012). We showed that while younger people tended to be more open with data, as they grew older their preference for withholding data grew. This was a stable effect that persisted across cohorts. It is not the case that young people today are unusually casual about data; all generations are more casual about data when younger, but this pattern was simply less visible previously because social media, and other ways of sharing and creating potentially embarrassing data, did not yet exist. This implies that one concern regarding AI and privacy is that it may use data that was created long ago, which in retrospect the individual regrets creating.

Data that was created at t = 0 may have seemed innocuous at the time, and in isolation may still be innocuous at time t + 1, but increased computing power may be able to derive much more invasive conclusions from aggregations of otherwise innocuous data at t + 1 relative to t. Second, there is a whole variety of data generated on individuals that individuals do not necessarily consciously choose to create. This includes not only incidental collection of data, such as being photographed by another party, but also data generated by the increased passive surveillance of public spaces, and by the use of cellphone technology without full appreciation of how much data about an individual and their location it discloses to third parties, including the government.

Though there has been substantial work bringing the insights of behavioral economics into the study of the economics of privacy, there has been less work on time-preference consistency, despite the fact that it is one of the oldest and most studied phenomena in behavioral economics (Strotz 1955; Rubinstein 2006). Introducing the potential for myopia or hyperbolic discounting into the way we model privacy choices over the creation of data therefore seems an important step. Even if the economist concerned rejects behavioral economics or myopia as an acceptable solution, at the very least it is useful to emphasize that privacy choices should be modeled not as decisions where the time between the creation of the data and the use of the data is trivial, but instead as decisions that may play out over an extended period of time.
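
One way to make this concrete is the standard quasi-hyperbolic (beta-delta) specification; applying it to the data-creation decision is an illustrative sketch, not a model taken from the papers cited above.

```latex
% Quasi-hyperbolic (beta-delta) discounting applied, illustratively, to the
% decision to create data at t = 0 whose privacy costs arrive later.
\[
  U_0 \;=\; u_0(\text{create data})
        \;+\; \beta \sum_{t=1}^{T} \delta^{t}\, u_t(\text{data persists}),
  \qquad 0 < \beta < 1, \quad 0 < \delta < 1.
\]
% With beta < 1 the individual overweights the immediate convenience u_0 relative
% to the delayed privacy costs u_t, so the choice made at t = 0 may be one that
% the individual's older self regrets.
```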

17.2.2 How Long Will Data’s Predictive Power Persist?

If we assume that any data created will probably persist, given low storage costs, it may be that the more important question for understanding the dynamics of privacy is the question of how long data’s predictive power persists.

It seems reasonable to think that much of the data created today does not have much predictive power tomorrow. This is something we investigated in Chiou and Tucker (2014), where we showed that the length of the data retention period to which search engines were restricted by the European Union (EU) did not appear to affect the success of their algorithms at generating useful search results. Here the success of a search result was measured by whether or not the user felt compelled to search again. This may make sense in the world of search engines, where many searches are either unique or focused on new events. On August 31, 2017, for example, the top trending search on Google was “Hurricane Harvey,” something that could not have been predicted on the basis of search behavior from more than a few weeks prior.3

However, there are some forms of data where it is reasonable to think that their predictive power will persist almost indefinitely. The most important example of this is the creation of genetic digital data. As Miller and Tucker (2017) point out, companies such as 23andme.com are creating large repositories of genetic data spanning more than 1.2 million people, and such genetic data has the unusual quality that it does not change over time.

While the internet browsing behavior of a twenty-year-old may not prove to be good for predicting their browsing behavior at age forty, the genetic data of a twenty-year-old will almost perfectly predict the genetic data of

 
