that person when they turn forty.4
17.3 Data Repurposing, AI, and Privacy
The lengthy time frame that the digital persistence of data implies increases uncertainty about how the data will be used. This is because, once created, a piece of data can be reused an infinite number of times. As prediction costs fall, the number of circumstances and occasions in which data may be used generally expands. If an individual is unable to reasonably anticipate how their data may be repurposed, or what the data may predict in a repurposed setting, modeling their choices over the creation of their data becomes more difficult and problematic than in our current, very deterministic models, which assume certainty over how data will be used.
17.3.1 Unanticipated Correlations
There may be correlations in behavior across users that may not be anticipated when data is created, and it is in these kinds of spillovers that the largest potential consequences of AI for privacy may be found.
One famous example of this is that someone liking (or disliking) curly fries
on Facebook would have been unable to reasonably anticipate it would be
3. https://trends.google.com/trends/.
4. As discussed in articles such as http://www.nature.com/news/2008/080624/full/news.2008.913.html, DNA does change somewhat over time, but that change is itself somewhat predictable.
predictive of intelligence (Kosinski, Stillwell, and Graepel 2013) and therefore potentially used as a screening device by algorithms aiming to identify desirable employees or students.5
In these cases, an algorithm could potentially make a projection based on a correlation in the data, using data that was created for a different purpose. The consequence for economic models of privacy is that such models assume a single use of data, rather than allowing for the potential of reuse in unpredictable contexts.
17.3.2 Unanticipated Distortions in Correlations
However, even supposing that individuals were able to reasonably anticipate the repurposing of their data, there are further challenges in thinking about their ability to project the distortions that might come about as a result of that repurposing.
The potential for distortions based on correlations in data is something
we investigate in new research.6
In Miller and Tucker (2018) we document the distribution of advertising by an advertising algorithm that attempts to predict a person's ethnic affinity from their data online. We ran multiple parallel ad campaigns targeted at African American, Asian American, and Hispanic ethnic affinities. We also ran an additional campaign targeted at those judged to not have any of these three ethnic affinities. These campaigns highlighted a federal program designed to enhance pathways to a federal job via internships and career guidance.7 We ran this ad for a week and collected data on how many people
the ad was shown to in each county. We found that, relative to what would be predicted by each county's actual demographic makeup in the census data, the ad algorithm tended to overpredict the presence of African Americans in states with a historical record of discrimination against African Americans. This pattern holds both for states that allowed slavery at the time of the American Civil War and for states that restricted the ability of African Americans to vote in the twentieth century. In such states, it was only the presence of African Americans that was overpredicted, not that of people with Hispanic or Asian American backgrounds.
We show that this cannot be explained by the algorithm responding to behavioral data in these states, as there was no difference in click-through patterns across the different campaigns between states with and without this history of discrimination.
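To make the nature of this county-level comparison concrete, here is a minimal sketch in Python of the kind of calculation described above. It is not the code from Miller and Tucker (2018); the input file, the column names, and the use of pandas and statsmodels are assumptions made purely for illustration.

```python
# Illustrative sketch (not the authors' code) of the county-level comparison
# described above: is the share of impressions that the African American-
# affinity campaign received in a county higher than the county's census share
# of African American residents, and is that gap larger in states with a
# historic record of discrimination? File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

counties = pd.read_csv("county_ad_delivery.csv")  # hypothetical input

# Share of all impressions in the county delivered under the African American-
# affinity campaign, versus the census share of African American residents.
counties["ad_share_aa"] = counties["impressions_aa"] / counties["impressions_total"]
counties["overprediction_aa"] = counties["ad_share_aa"] - counties["census_share_aa"]

# Mean over-prediction split by a 0/1 indicator for a historic record of
# discrimination (e.g., slave state in 1861, or twentieth-century voting
# restrictions).
print(counties.groupby("historic_discrimination")["overprediction_aa"].mean())

# The same comparison expressed as a simple regression (controls and fixed
# effects omitted for brevity).
model = smf.ols("overprediction_aa ~ historic_discrimination", data=counties).fit()
print(model.summary())
```

The grouped means mirror the descriptive comparison in the text, while the regression line is simply one way such a gap would typically be tested.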
5. This study found that the best predictors of high intelligence include Thunderstorms, The Colbert Report, Science, and Curly Fries, whereas low intelligence was indicated by Sephora, I Love Being A Mom, Harley Davidson, and Lady Antebellum.
6. This new research will be the focus of my presentation at the NBER meetings.
7. For details of the program, see https://www.usajobs.gov/Help/working-in-government/unique-hiring-paths/students/.
We discuss how this can be explained by four facts about how the algorithm operates:
1. The algorithm identifies a user as having a particular ethnic affinity based on their liking of cultural phenomena such as celebrities, movies, TV shows, and music.
2. People who have lower incomes are more likely to use social media to
express interest in celebrities, movies, TV shows, and music.
3. People who have higher incomes are more likely to use social media to express their thoughts about politics and the news.8
4. Research in economics has suggested that African Americans are more
likely to have lower incomes in states that have exhibited historic patterns of
discrimination (Sokoloff and Engerman 2000; Bertocchi and Dimico 2014).
The empirical regularity that an algorithm predicting race is more likely to predict that someone is black in geographies with historic patterns of discrimination matters because it highlights the potential for historical persistence in algorithmic behavior. It suggests that the dynamic consequences of earlier history may affect how artificial intelligence makes predictions. When that earlier history is repugnant, this is even more concerning. In this particular case, the issue is the use of a particular piece of data to predict a trait when the generation of that data is itself endogenous.
This emphasizes that privacy policy in a world of predictive algorithms is more complex than in a straightforward world where individuals make binary decisions about their data. In our example, it would seem problematic to bar low-income individuals from expressing their identities via their affinity with the musical or visual arts. However, their doing so is likely to lead to a prediction that they belong to a particular ethnic group. They may not be aware ex ante of the risk that disclosing a musical preference may cause Facebook to infer an ethnic affinity and advertise to them on that basis.
17.3.3 Unanticipated Consequences of Unanticipated Repurposing
In most economic models, a consumer's prospective desire for privacy in their data depends on the consumer being able to accurately forecast
the uses to which the data is put. One problem with data privacy is that AI/
algorithmic use of existing data sets may be reaching a point where data
can be used and recombined in ways that people creating that data in, say,
2000 or 2005, could not reasonably have foreseen or incorporated into their
decision-making at the time.
Again, this brings up legal concerns where an aggregation, or mosaic,
of data on an individual is held to be sharply more intrusive than each datum considered in isolation. In United States v. Jones (2012), Justice Sotomayor wrote in a well-known concurring opinion, “It may be necessary to
8. One of the best predictors of high income on social media is a liking of Dan Rather.
reconsider the premise that an individual has no reasonable expectation of privacy in information voluntarily disclosed to third parties [. . .]. This approach is ill suited to the digital age, in which people reveal a great deal of information about themselves to third parties in the course of carrying out mundane tasks.” Artificial intelligence systems have shown themselves able to develop very detailed pictures of individuals' tastes, activities, and opinions based on the analysis of aggregated information about our now digitally intermediated mundane tasks. Part of the risk in a mosaic approach for firms is that data previously considered neither personally identifiable nor personally sensitive (such as ZIP code, gender, or age to within ten years) may, when aggregated and analyzed by today's algorithms, suffice to identify you as an individual.
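As a concrete illustration of this mosaic risk, the sketch below counts how many people in a tabular data set are uniquely pinned down by ZIP code, gender, and a ten-year age band alone. The file and column names are hypothetical; the point is only that the calculation is trivial once the data are aggregated.

```python
# Illustrative sketch of the "mosaic" point: fields that seem harmless in
# isolation (ZIP code, gender, a ten-year age band) can jointly single people
# out. The file and column names here are hypothetical.
import pandas as pd

people = pd.read_csv("people.csv")  # hypothetical file, one row per person
people["age_band"] = (people["age"] // 10) * 10  # age known only to within ten years

# Size of the (ZIP, gender, age band) cell each person falls into.
cell_sizes = people.groupby(["zip_code", "gender", "age_band"])["age"].transform("size")

# Share of individuals who are the only person in their cell, i.e., potentially
# re-identifiable from these three "non-sensitive" attributes alone.
unique_share = (cell_sizes == 1).mean()
print(f"Share unique on ZIP x gender x age band: {unique_share:.1%}")
```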
This general level of uncertainty surrounding the future use of data, coupled with the certainty that it will potentially be useful to firms, affects the ability of a consumer to make a clear choice about whether to create or share data. Such risk and uncertainty about how private data may be used has implications for how an individual may form and act on their preferences regarding privacy.
17.4 Data Spillovers, AI, and Privacy
In the United States, privacy has been defined as an individual right, specifically an individual's right to be left alone (Warren and Brandeis 1890) (in this specific case, from journalists with cameras).
Economists' attempts to devise a utility function that captures privacy have reflected this individualistic view. A person has a preference for keeping information secret (or not) because of the potential consequences for their interaction with a firm. So far, these privacy models have not reflected the possibility that another person's preferences or behavior could create spillovers for this process.
17.5 Some Types of Data Used by Algorithms May Naturally Generate Spillovers
For example, in the case of genetics, the decision to create genetic data has immediate consequences for family members, since one individual's genetic data is significantly similar to the genetic data of their family members. This creates privacy spillovers for relatives of those who upload their genetic profile to 23andme. Data that predicts I may suffer from bad eyesight or macular degeneration later in life could be used to reasonably predict that those who are related to me by blood may also be more likely to share a similar risk profile.
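A toy calculation along these lines is sketched below. The baseline and predicted risk figures and the simple linear updating rule are illustrative assumptions rather than anything taken from the chapter; only the coefficients of relatedness, roughly the share of DNA shared with each type of relative, are standard.

```python
# Toy illustration of the genetic spillover: once an elevated risk is predicted
# for one person, a naive model would shift relatives' predicted risk toward it
# in proportion to shared ancestry. Risk numbers and the linear updating rule
# are illustrative assumptions; only the relatedness coefficients are standard.
BASELINE_RISK = 0.05            # assumed population risk of the condition
PREDICTED_RISK_UPLOADER = 0.40  # assumed risk predicted for the person who shared data

RELATEDNESS = {
    "identical twin": 1.0,
    "parent, child, or full sibling": 0.5,
    "grandparent, aunt, or uncle": 0.25,
    "first cousin": 0.125,
}

for relative, share in RELATEDNESS.items():
    # Linear interpolation between the population risk and the uploader's
    # predicted risk, weighted by shared ancestry. Real genetic risk models are
    # far more involved; the point is only that the spillover shrinks with
    # relatedness but does not disappear.
    implied = BASELINE_RISK + share * (PREDICTED_RISK_UPLOADER - BASELINE_RISK)
    print(f"{relative}: implied risk ≈ {implied:.0%}")
```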
Of course, one hopes that an individual would be capable of internalizing the potential externalities that revealing genetic data imposes on family members, but
it does not seem far-fetched to imagine situations of estrangement where such internalizing would not happen and there would be a clear externality.
Outside the realm of genetic data, there are other kinds of data that by their nature may create spillovers. These include photo, video, and audio data taken in public places. Such data may be created for one purpose, such as a recreational desire to use video to capture a memory or to enhance security, but may incidentally create data about other individuals whose voices or images are captured without their being aware that data about them is being recorded.
distinguished between the idea of a private realm where an individual has
an expectation of privacy and a public realm where an individual can have
no reasonable expectation of privacy. For example, in the Supreme Court
case California v. Greenwood (1988), the court refused to accept that an
individual had a reasonable expectation of privacy in garbage he had left
on the curb.
However, in a world where people use mobile devices and photo capture extensively, facial recognition allows accurate identification of any individual while out in public, and individuals have difficulty avoiding such identification. Encoded in the notion that we do not have a reasonable expectation of privacy in the public realm are two potential errors: that one's presence in a public space is usually transitory enough not to be recorded, and that any record of one's activities in a public space will not usually be retained, parsed, and exploited for future use. Consequently, the advance
of technology muddies the allocation of property rights over the creation of data. In particular, it is not clear that video footage of my behavior in public spaces, which can potentially be used to accurately predict economically meaningful outcomes such as health outcomes, can simply be dismissed as arising in a context where I had no expectation of privacy, or at least no right to control the creation of the data. In any case, because of the somewhat incidental nature of their creation, these new forms of data seem to undermine the assumption of clear-cut, easily definable property rights over data that is integral to most economic models of privacy.
17.5.1 Algorithms Themselves Will Naturally Create Spillovers across Data
One of the major consequences of AI and its ability to automate prediction is that there may be spillovers between individuals and other economic agents. There may also be spillovers from a person's decision to keep some information secret, if such secrecy itself predicts other aspects of that individual's behavior that AI might be able to project from.
Research has documented algorithmic outcomes that appear to be discriminatory, and has argued that such outcomes may occur because the algorithm itself will learn to be biased on the basis of the behavioral data that
feeds it (O'Neil 2017). Documented allegations of algorithmic bias range from Asians being charged more for test-taking prep software,9 to black names being more likely to produce ads for criminal record checks (Sweeney 2013), to women being less likely to see ads for an executive coaching service (Datta, Tschantz, and Datta 2015).
Such data-based discrimination is often held to be a privacy issue (Custers et al. 2012). The argument is that it is abhorrent for a person's data to be used to discriminate against them, especially if they did not explicitly consent to its collection in the first place. However, though not often discussed in the legally oriented literature on data-based discrimination, there are many links between the fears expressed about the potential for data-based discrimination and the earlier economics literature on statistical discrimination. In much the same way that some find it distasteful when an employer extrapolates from general data on fertility decisions and their consequences among women to project similar expectations of fertility and behavior onto an individual female employee, an algorithm making similar extrapolations may be found equally distasteful. Such instances of statistical discrimination by algorithms may reflect spillovers of predictive power across individuals, which in turn may not necessarily be internalized by each individual.
However, as yet there have been few attempts to understand why ad algorithms can produce apparently discriminatory outcomes, or whether the digital economy itself may play a role in the apparent discrimination. I argue that, above and beyond the obvious similarity to the statistical discrimination literature in economics, apparent discrimination can sometimes best be understood as reflecting spillovers in algorithmic decision-making. This makes the issue of privacy broader than the potential for an individual's own data to be used to discriminate against them.
In Lambrecht and Tucker (forthcoming), we discuss a field study of apparent algorithmic bias. We use data from a field test of the display of an ad for jobs in the science, technology, engineering, and math (STEM) fields. This ad was less likely to be shown to women. This appeared to be an algorithmic outcome, as the advertiser had intended the ad to be gender neutral. We explore various explanations for why the algorithm acted in an apparently discriminatory way, and rule out an obvious set of them. For example, it is not because the predictive algorithm has fewer women to show the ad to, and it is not the case that the predictive algorithm learns that women are less likely to click the ad: conditional on being shown the ad, women are more likely to click on it than men. In other words, this is not simply statistical discrimination. We also show it is not that
9. https://www.propublica.org/article/asians-nearly-twice-as-likely-to-get-higher-price-from-princeton-review. In this case, the alleged discrimination apparently stemmed from the fact that Asians are more likely to live in cities that have higher test prep prices.
the algorithm learned from local behavior that may historically have been
biased against women. We use data from 190 countries and show that the