The Economics of Artificial Intelligence


by Ajay Agrawal


  that person when they turn forty.4

  17.3 Data Repurposing, AI, and Privacy

  The lengthy time frame that digital persistence of data implies increases uncertainty surrounding how the data will be used. This is because once created, a piece of data can be reused an infinite number of times. As prediction costs fall, this generally expands the number of circumstances and occasions where data may be used. If an individual is unable to reasonably anticipate how their data may be repurposed or what the data may predict in this repurposed setting, modeling their choices over the creation of their data becomes more difficult and problematic than in our current, very deterministic models, which assume certainty over how data will be used.

  17.3.1 Unanticipated Correlations

  There may be correlations in behavior across users that may not be anticipated when data is created, and it is in these kinds of spillovers that the largest potential consequences of AI for privacy may be found.

  One famous example of this is that someone liking (or disliking) curly fries on Facebook would have been unable to reasonably anticipate it would be predictive of intelligence (Kosinski, Stillwell, and Graepel 2013) and therefore potentially used as a screening device by algorithms aiming to identify desirable employees or students.5

  3. https://trends.google.com/trends/.
  4. As discussed in articles such as http://www.nature.com/news/2008/080624/full/news.2008.913.html, DNA does change somewhat over time, but that change is itself somewhat predictable.

  17.3.2 Unanticipated Distortions in Correlations

  In these cases, an algorithm could potentially make a projection based on a correlation in the data, using data that was created for a different purpose. The consequence for economic models of privacy is that they assume a single use of data, rather than allowing for the potential of reuse in unpredictable contexts.

  However, even supposing that individuals were able to reasonably anticipate the repurposing of their data, there are additional challenges in thinking about their ability to project the distortions that might come about as a result of that repurposing.

  The potential for distortions based on correlations in data is something we investigate in new research.6

  In Miller and Tucker (2018) we document the distribution of advertising by an advertising algorithm that attempts to predict a person’s ethnic affinity from their data online. We ran multiple parallel ad campaigns targeted at African American, Asian American, and Hispanic ethnic affinities. We also ran an additional campaign targeted at those judged to not have any of these three ethnic affinities. These campaigns highlighted a federal program designed to enhance pathways to a federal job via internships and career guidance.7 We ran this ad for a week and collected data on how many people the ad was shown to in each county.

  We found that, relative to what would be predicted by the actual demographic makeup of that county given the census data, the ad algorithm tended to predict that more African American people are in states where there is a historical record of discrimination against African Americans. This pattern is true for states that allowed slavery at the time of the American Civil War, and also true for states that restricted the ability of African Americans to vote in the twentieth century. In such states, it was only the presence of African Americans that was overpredicted, not people with Hispanic or Asian American backgrounds.

  We show that this cannot be explained by the algorithm responding to behavioral data in these states, as there was no difference in click-through patterns across different campaigns across states, with or without this history of discrimination.

  5. This study found that the best predictors of high intelligence include Thunderstorms, The Colbert Report, Science, and Curly Fries, whereas low intelligence was indicated by Sephora, I Love Being A Mom, Harley Davidson, and Lady Antebellum.

  6. This new research will be the focus of my presentation at the NBER meetings.

  7. For details of the program, see https://www.usajobs.gov/Help/working-in-government/unique-hiring-paths/students/.


  We discuss how this can be explained by four facts about how the algorithm operates (a small illustrative simulation follows the list):

  1. The algorithm identifies a user as having a particular ethnic affinity based on their liking of cultural phenomena such as celebrities, movies, TV shows, and music.
  2. People who have lower incomes are more likely to use social media to express interest in celebrities, movies, TV shows, and music.
  3. People who have higher incomes are more likely to use social media to express their thoughts about politics and the news.8
  4. Research in economics has suggested that African Americans are more likely to have lower incomes in states that have exhibited historic patterns of discrimination (Sokoloff and Engerman 2000; Bertocchi and Dimico 2014).
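  To make the interaction of these four facts concrete, the following is a minimal, purely illustrative simulation. The population shares, income gaps, and posting propensities are invented for exposition, not estimates from Miller and Tucker (2018); the point is only that a classifier that can label a user’s ethnic affinity only when cultural likes are visible will label relatively more African American users in a state where the income gap, and hence the propensity to post cultural likes, is larger.

  # A toy simulation of the four-fact mechanism described above. All numbers
  # are hypothetical; they are chosen only to illustrate the direction of the effect.
  import random

  random.seed(0)

  def simulated_labeled_share(black_share, extra_low_income_prob, n=100000):
      """Share of users an affinity classifier labels as African American.

      black_share: census share of African American residents in the state.
      extra_low_income_prob: additional probability that an African American
          resident is low income (assumed larger in states with a history of
          discrimination, per fact 4).
      """
      labeled = 0
      for _ in range(n):
          is_black = random.random() < black_share
          # Fact 4: historic discrimination widens the income gap.
          p_low_income = 0.3 + (extra_low_income_prob if is_black else 0.0)
          low_income = random.random() < p_low_income
          # Facts 2 and 3: low-income users are more likely to post cultural
          # likes (music, TV, celebrities); high-income users post about news.
          posts_cultural_likes = random.random() < (0.7 if low_income else 0.2)
          # Fact 1: the classifier can only infer ethnic affinity from cultural
          # likes, so it labels a user only when those likes are visible.
          if is_black and posts_cultural_likes:
              labeled += 1
      return labeled / n

  census_share = 0.20  # identical census share in both hypothetical states
  print(simulated_labeled_share(census_share, extra_low_income_prob=0.1))  # ~0.08
  print(simulated_labeled_share(census_share, extra_low_income_prob=0.4))  # ~0.11

  Even though the census share is identical in the two hypothetical states, the classifier labels relatively more users as African American in the state with the larger income gap, which is the direction of the overprediction documented above.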

  The empirical regularity that an algorithm predicting race is more likely to predict someone is black in geographies that have historic patterns of discrimination matters because it highlights the potential for historical persistence in algorithmic behavior. It suggests that dynamic consequences of earlier history may affect how artificial intelligence makes predictions. When that earlier history is repugnant, it is even more concerning. In this particular case the issue is using a particular piece of data to predict a trait when the generation of that data is endogenous.

  This emphasizes that privacy policy in a world of predictive algorithms is more complex than in a straightforward world where individuals make binary decisions about their data. In our example, it would seem problematic to bar low-income individuals from expressing their identities via their affinity with musical or visual arts. However, their doing so could likely lead to a prediction that they belong to a particular ethnic group. They may not be aware ex ante of the risk that disclosing a musical preference may cause Facebook to infer an ethnic affinity and advertise to them on that basis.

  17.3.3 Unanticipated Consequences of Unanticipated Repurposing

  In most economic models, a consumer’s prospective desire for privacy in the data depends on the consumer being able to accurately forecast the uses to which the data is put. One problem with data privacy is that AI/algorithmic use of existing data sets may be reaching a point where data can be used and recombined in ways that people creating that data in, say, 2000 or 2005, could not reasonably have foreseen or incorporated into their decision-making at the time.

  8. One of the best predictors of high income on social media is a liking of Dan Rather.

  Again, this brings up legal concerns where an aggregation, or mosaic, of data on an individual is held to be sharply more intrusive than each datum considered in isolation. In United States v. Jones (2012), Justice Sotomayor wrote in a well-known concurring opinion, “It may be necessary to reconsider the premise that an individual has no reasonable expectation of privacy in information voluntarily disclosed to third parties [ . . . ]. This approach is ill suited to the digital age, in which people reveal a great deal of information about themselves to third parties in the course of carrying out mundane tasks.” Artificial intelligence systems have shown themselves able to develop very detailed pictures of individuals’ tastes, activities, and opinions based on analysis of aggregated information on our now digitally intermediated mundane tasks. Part of the risk in a mosaic approach for firms is that data previously considered not personally identifiable or personally sensitive—such as ZIP Code, gender, or age to within ten years—when aggregated and analyzed by today’s algorithms, may suffice to identify you as an individual.
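  As a rough sketch of this mosaic point, the snippet below counts how often a combination of fields usually treated as non-identifying (ZIP Code, gender, birth year) singles out exactly one record. The records here are fabricated for illustration; in practice such a check would be run over a real data set.

  # Toy check of quasi-identifier uniqueness; the records below are made up.
  from collections import Counter

  def uniqueness_rate(records):
      """Fraction of records whose (ZIP, gender, birth year) combination is unique."""
      counts = Counter(records)
      total = sum(counts.values())
      unique = sum(c for c in counts.values() if c == 1)
      return unique / total

  sample = [
      ("02139", "F", 1984),
      ("02139", "F", 1984),  # shared combination, so not identifying by itself
      ("02139", "M", 1990),
      ("60614", "M", 1972),
      ("94110", "F", 2001),
  ]

  print(uniqueness_rate(sample))  # 0.6: three of the five records are pinned down exactly

  The larger and richer the mosaic of fields, the closer this rate tends to move toward one, which is the re-identification risk described above.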

  This general level of uncertainty surrounding the future use of data, coupled with certainty that it will be potentially useful to firms, affects the ability of a consumer to make a clear choice about whether to create or share data. With large amounts of risk and uncertainty surrounding how private data may be used, this has implications for how an individual may process their preferences regarding privacy.

  17.4 Data Spillovers, AI, and Privacy

  In the United States, privacy has been defined as an individual right, specifically an individual’s right to be left alone (Warren and Brandeis 1890) (in this specific case, from journalists with cameras).

  Economists’ attempts to devise a utility function that reflects privacy have reflected this individualistic view. A person has a preference for keeping information secret (or not) because of the potential consequences for their interaction with a firm. So far, their privacy models have not reflected the possibility that another person’s preferences or behavior could have spillovers on this process.

  17.5 Some Types of Data Used by Algorithms May Naturally Generate Spillovers

  For example, in the case of genetics, the decision to create genetic data has immediate consequences for family members, since one individual’s genetic data is significantly similar to the genetic data of their family members. This creates privacy spillovers for relatives of those who upload their genetic profile to 23andme. Data that predicts I may suffer from bad eyesight or macular degeneration later in life could be used to reasonably predict that those who are related to me by blood may also be more likely to share a similar risk profile.

  Of course, one hopes that an individual would be capable of internalizing the potential externalities on family members of genetic data revelation, but it does not seem far-fetched to imagine situations of estrangement where such internalizing would not happen and there would be a clear externality.

  Outside the realm of binary data, there are other kinds of data that by their nature may create spillovers. These include photo, video, and audio data taken in public places. Such data may be created for one purpose, such as a recreational desire to use video to capture a memory or to enhance security, but may potentially create data about other individuals whose voices or images are captured without them being aware that their data is being recorded. Traditionally, legal models of privacy have distinguished between the idea of a private realm where an individual has an expectation of privacy and a public realm where an individual can have no reasonable expectation of privacy. For example, in the Supreme Court case California v. Greenwood (1988), the court refused to accept that an individual had a reasonable expectation of privacy in garbage he had left on the curb.

  However, in a world where people use mobile devices and photo capture extensively, facial recognition allows accurate identification of any individual while out in public, and individuals have difficulty avoiding such identifications. Encoded in the notion that we do not have a reasonable expectation of privacy in the public realm are two potential errors: that one’s presence in a public space is usually transitory enough to not be recorded, and that the record of one’s activities in the public space will not usually be recorded, parsed, and exploited for future use. Consequently, the advance of technology muddies the allocation of property rights over the creation of data. In particular, it is not clear how video footage of my behavior in public spaces, which can potentially accurately predict economically meaningful outcomes such as health outcomes, can be clearly dismissed as being a context where I had no expectation of privacy, or at least no right to control the creation of data. In any case, these new forms of data, due in some sense to the incidental nature of data creation, seem to undermine the clear-cut assumption of easily definable property rights over the data that is integral to most economic models of privacy.

  17.5.1 Algorithms Themselves Will Naturally Create Spillovers across Data

  One of the major consequences of AI and its ability to automate prediction is that there may be spillovers between individuals and other economic agents. There may also be spillovers across a person’s decision to keep some information secret, if such secrecy predicts other aspects of that individual’s behavior that AI might be able to project from.

  Research has documented algorithmic outcomes that appear to be discriminatory, and has argued that such outcomes may occur because the algorithm itself will learn to be biased on the basis of the behavioral data that feeds it (O’Neil 2017). Documented alleged algorithmic bias ranges from charging more to Asians for test-taking prep software9 to black names being more likely to produce criminal record check ads (Sweeney 2013) to women being less likely to see ads for an executive coaching service (Datta, Tschantz, and Datta 2015).

  Such data-based discrimination is often held to be a privacy issue (Custers et al. 2012). The argument is that it is abhorrent for a person’s data to be used to discriminate against them—especially if they did not explicitly consent to its collection in the first place. However, though not often discussed in the legally orientated data-based discrimination literature, there are many links between the fears expressed about the potential for data-based discrimination and the earlier economics literature on statistical discrimination. In much the same way that some find it distasteful when an employer extrapolates from general data on fertility decisions and consequences among females to project similar expectations of fertility and behavior onto a female employee, an algorithm making similar extrapolations is equally distasteful. Such instances of statistical discrimination by algorithms may reflect spillovers of predictive power across individuals, which in turn may not necessarily be internalized by each individual.

  However, as yet there have been few attempts to understand why ad algorithms can produce apparently discriminatory outcomes, or whether the digital economy itself may play a role in the apparent discrimination. I argue that, above and beyond the obvious similarity to the statistical discrimination literature in economics, sometimes apparent discrimination can be best understood as spillovers in algorithmic decision-making. This makes the issue of privacy not just one of the potential for an individual’s data to be used to discriminate against them.

  In Lambrecht and Tucker (forthcoming), we discuss a field study into apparent algorithmic bias. We use data from a field test of the display of an ad for jobs in the science, technology, engineering, and math (STEM) fields. This ad was less likely to be shown to women. This appeared to be a result of an algorithmic outcome, as the advertiser had intended the ad to be gender neutral. We explore various ways that might explain why the algorithm acted in an apparently discriminatory way. An obvious set of explanations is ruled out. For example, it is not because the predictive algorithm has fewer women to show the ad to, and it is not the case that the predictive algorithm learns that women are less likely to click the ad, since women are more likely to click on it—conditional on being shown the ad—than men. In other words, this is not simply statistical discrimination. We also show it is not that the algorithm learned from local behavior that may historically have been biased against women. We use data from 190 countries and show that the

  9. https://www.propublica.org/article/asians-nearly-twice-as-likely-to-get-higher-price-from-princeton-review. In this case, the alleged discrimination apparently stemmed from the fact that Asians are more likely to live in cities that have higher test prep prices.
