
its promise. However, it is often difficult to discover the real data practice. It is even more difficult to rectify consumer damage from a misrepresented data policy, as a court often requires a “body on the ground”—that is, evidence of a harmful outcome—as well as some confidence that there is a causal link between that outcome and the data collector’s practices.3

1. There could be positive externality from one player to another. For example, a data set that tracks an infectious disease nationwide can generate enormous public health benefits for everyone. But if each data collector accesses only part of the data and there is no way for him to benefit from the final product based on nationwide data, he may have an incentive to under-collect and undershare the data. Here I focus on negative externality, in order to highlight the risk of overcollecting and oversharing.

  2. The argument of negative externality has been discussed in multiple papers, including Swire and Litan (1998) and Odlyzko (2003). See Acquisti, Taylor, and Wagman (2016) for a more comprehensive summary.

3. The Court’s emphasis on tangible harm is best illustrated in an ongoing battle between the Federal Trade Commission (FTC) and LabMD. LabMD is a medical testing laboratory that collects sensitive personal and medical information from consumers. The FTC alleged that LabMD violated the FTC Act by failing to employ reasonable and appropriate measures to prevent unauthorized access to consumers’ personal information. In November 2015, the Administrative Judge of the FTC dismissed the FTC complaint, arguing that complaint counsel failed to prove that LabMD’s data security conduct caused or was likely to cause substantial injury to consumers (https://www.ftc.gov/news-events/press-releases/2015/11/administrative-law-judge-dismisses-ftc-data-security-complaint). This decision was reversed in July 2016, by an Opinion and Final Order from the FTC commissioners (https://www.ftc.gov/news-events/press-releases/2016/07/commission-finds-labmd-liable-unfair-data-security-practices). In November 2016, the 11th US Circuit Court of Appeals granted LabMD’s request to temporarily stop enforcing the FTC order (while the appeals court considers the case), on the grounds that mere emotional harm and actions causing only a low likelihood of consumer harm may not meet the legal definition of unfair practice, even when the exposed data is highly sensitive. The court opinion can be found at http://f.datasrvr.com/fr1/016/73315/2016_1111.pdf. What type of consumer harm is needed for a data security practice to be unfair and illegal remains an open question.


Information asymmetry, externality, and commitment concerns can all be exacerbated by AI. More specifically, by potentially increasing the scope and value of consumer data use, AI can increase the expected benefits and costs of big data. But since the benefits are more internalized to the owner of the data and AI than consumer risks, AI could encourage intrusive use of data despite higher risks to consumers. For the same reason, new benefits enabled by AI—say, cost savings or better sales—could entice a firm to (secretly) abandon its promises on privacy or data security.

In short, big data introduces three “new” problems for consumer privacy: (a) sellers initially have more information about future data use than buyers after the focal transaction; (b) sellers need not fully internalize potential harms to consumers because of the inability to trace harm back to a data collector; and (c) sellers may promise a consumer-friendly data policy at the time of data collection but renege afterward, as it is difficult to detect and penalize it ex post.4 All three encourage irresponsible data collection, data storage, and data use.

4. Jin and Stivers (2017) elaborate on the three information problems in more detail, but they do not associate them with AI or other data technology.

All three problems could be aggravated by AI and other data technologies. Later in the chapter, I will describe a few AI-powered techniques that aim to alleviate the risk to consumer privacy and data security. Hence, the net impact of AI on privacy needs to take both sides into account.

  18.2 Ongoing Risk in Consumer Privacy and Data Security

The risk associated with privacy and data security is real. Fundamentally data driven, the risk can be directly or indirectly related to AI and other data technologies. For example, since AI enhances the expected value of data, firms are encouraged to collect, store, and accumulate data, regardless of whether they will use AI themselves. The ever-growing big data storehouses become a prime target for hackers and scammers.

  18.2.1 Data at Risk

According to the Privacy Rights Clearinghouse, 7,859 data breaches have been made public since 2005, exposing billions of records with personally identifiable information (PII) to potential abuse.5 A closer look at the data is even more alarming: not only do we observe mega breaches that affect millions at once, but also the information lost in a single breach spreads to all kinds of PII. When Target lost 40 million records in December 2013, hackers got mostly debit and credit card numbers. But the recent Equifax breach (September 2017) affected 145 million people, with Social Security numbers, whole credit histories, and even driver’s license and transaction dispute data stolen from the same database. More concerning is the fact that data breaches occur disproportionately at organizations that accumulate massive PII data, including retailers, information aggregators, financial institutions, and nonprofit organizations such as governments, schools, and hospitals.

5. https://www.privacyrights.org/data-breaches, accessed on December 18, 2017.

Causes of data breaches have evolved as well. A decade ago, most data losses were driven by human errors such as unshredded records left in the trash, lost laptops without encrypted data, or data inadvertently uploaded to the open Web. Recent breaches are often the result of targeted hacking and ransomware attacks. If we view a malicious hacker as a thief sneaking in to steal, a ransomware attacker is a kidnapper who takes control of your data system and demands ransom immediately. For instance, the ransomware attack in May 2017 infected computers in ninety-nine countries (including the United States), bringing down transportation, banking, nuclear, and hospital systems in many places.6

6. http://www.bbc.com/news/technology-39901382, accessed on October 20, 2017.

Thomas et al. (2017) follow the dark web from March 2016 to March 2017, passively monitoring forums that trade credential leaks exposed via data breaches, phishing kits that deceive users into submitting their credentials to fake login pages, and off-the-shelf keyloggers that harvest passwords from infected machines. They identify large numbers of potential victims, including 788,000 victims of off-the-shelf keyloggers, 12.4 million victims of phishing kits, and 1.9 billion usernames and passwords exposed via data breaches. After matching these exposed credentials to Google’s internal database, they find that 7 to 25 percent of exposed passwords match a victim’s Google account. More alarmingly, they observe “a remarkable lack of external pressure on bad actors, with phishing kit playbooks and keylogger capabilities remaining largely unchanged since the mid-2000s.”
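To make the matching step concrete, the following minimal sketch (my illustration, not the authors’ Google-internal pipeline) compares salted hashes of leaked credentials against a provider’s account records, so raw passwords are never stored for the comparison:

    import hashlib

    def hash_credential(username: str, password: str, salt: bytes) -> str:
        # Hash the username/password pair; only hashes are compared below.
        return hashlib.sha256(salt + username.encode() + b":" + password.encode()).hexdigest()

    SALT = b"example-salt"  # illustration only; real systems use stronger, per-record schemes

    # Hypothetical inputs: credentials scraped from breach dumps, plus the
    # provider's own records, assumed stored as hashes under the same salt.
    leaked = [("alice", "hunter2"), ("bob", "123456")]
    account_db = {hash_credential("alice", "hunter2", SALT)}

    # Count how many leaked credentials still match a live account.
    matches = sum(1 for u, p in leaked if hash_credential(u, p, SALT) in account_db)
    print(f"{matches}/{len(leaked)} leaked credentials match live accounts")

On this toy data the script prints “1/2”; Thomas et al. report the analogous match rate for Google accounts at 7 to 25 percent.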

  18.2.2 Consumers at Risk

The most concrete harm that could arise from a data breach is identity theft. According to the Bureau of Justice Statistics (BJS), identity theft affects 17.6 million (7 percent) of all US residents age sixteen and older (Harrell 2014). Consistently, identity theft is one of the biggest consumer-complaint categories—first in 2014, second in 2015, and third in 2016 (FTC 2014, 2015, 2016). In 2016, identity theft accounted for 13 percent of consumer complaints, trailing behind debt collection (28 percent) and imposter scams (13 percent), all of which could feed on lost personal data (FTC 2016).

Of course, not all identity thefts are driven by inadequate privacy protection or insufficient data security. Scammers practiced their creative art long before big data and AI existed. However, loss from identity theft is likely a function of data misuse. As reported by BJS (Harrell 2014), 86 percent of identity theft victims experienced fraudulent use of existing account information, and 64 percent reported a direct financial loss from the identity theft incident. Among those who reported direct financial loss, victims of personal information fraud lost an average of $7,761 (with a median of $2,000) and victims of existing bank fraud lost an average of $780 (with a median of $200).7

7. Direct financial loss is not necessarily equal to the actual out-of-pocket loss to identity theft victims, as some financial loss may be reimbursed.

Researchers have attempted to draw a statistical link between data misuse and consumer harm. Romanosky, Acquisti, and Telang (2011) explore differences among state data breach notification laws and find that adoption of data breach disclosure laws reduces identity theft caused by data breaches by an average 6.1 percent. Romanosky, Hoffman, and Acquisti (2014) further examine federal data breach lawsuits from 2000 to 2010. They show that the odds of a firm being sued are 3.5 times greater when individuals suffer financial harm but 6 times lower when the firm provides free credit monitoring (the sketch after this paragraph shows how to read such odds ratios). Telang and Somanchi (2017) look at a more indirect consequence of data misuse. Using detailed transaction data from a US bank, they find that consumers are 3 percentage points more likely to leave the bank if they have experienced an unauthorized fraudulent transaction within six months. While the unauthorized transaction could be a result of previous data breaches, it is difficult to attribute the fraud to a particular data breach. In other words, the bank and the consumer may both suffer from a data breach, but the breached firm bears virtually no share of this suffering.
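As a quick illustration of how to read those odds ratios (the baseline probability below is made up for exposition, not taken from the study):

    # Reading an odds ratio: illustrative numbers only.
    base_odds = 0.02 / 0.98        # suppose a 2 percent baseline chance of being sued

    harm_odds = base_odds * 3.5    # "3.5 times greater" odds under financial harm
    harm_prob = harm_odds / (1 + harm_odds)

    monitor_odds = base_odds / 6   # "6 times lower" odds with free credit monitoring
    monitor_prob = monitor_odds / (1 + monitor_odds)

    print(f"baseline 2.0%, with harm {harm_prob:.1%}, with monitoring {monitor_prob:.1%}")

Under these made-up numbers, a 2 percent baseline chance of suit becomes roughly 6.7 percent when consumers suffer financial harm and roughly 0.3 percent when the firm offers free credit monitoring.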

Tax fraud offers another peek into the harm of data misuse. Through the Government Accountability Office (GAO 2015), the US Internal Revenue Service (IRS) reported a point estimate of attempted identity theft refund fraud (as of 2013). Although the IRS was able to prevent or recover $24.2 billion in fraudulent refunds, it paid out $5.8 billion in tax refunds that were later flagged as identity theft frauds. In May 2015, the IRS disclosed a data breach in which 100,000 taxpayer accounts were compromised through its Get Transcript application. This breach exposed sensitive information such as taxpayers’ prior-year tax filings. More important, it was compromised not because hackers broke a digital backdoor of the IRS, but because hackers were able to clear a multistep authentication process that required prior personal knowledge of the taxpayer’s Social Security number, date of birth, tax filing status, and street address.8 In other words, hackers got in the front door of the IRS, using information they already had or could readily guess. Such information likely came from previous data breaches or data available on the black market. This suggests that data breaches could have a ripple effect: a small vulnerability in one database could undermine data security in a completely unrelated organization.

8. https://www.irs.gov/newsroom/irs-statement-on-the-get-transcript-application, accessed on October 19, 2017.
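The weakness of such knowledge-based authentication is easy to see in code. The sketch below (hypothetical field names and logic; not the IRS’s actual system) shows that the check reduces to comparing a handful of static facts, so anyone replaying breached data passes as easily as the taxpayer:

    # Minimal sketch of knowledge-based authentication (KBA); hypothetical
    # fields, not the IRS's actual implementation.
    taxpayer_record = {
        "ssn": "123-45-6789",
        "dob": "1970-01-01",
        "filing_status": "married_joint",
        "street_address": "1 Main St",
    }

    def kba_verify(claim: dict, record: dict) -> bool:
        # Every field is static and widely stored elsewhere: once a breach
        # exposes them, an imposter is indistinguishable from the taxpayer.
        return all(claim.get(k) == v for k, v in record.items())

    stolen_claim = dict(taxpayer_record)  # an attacker replaying breach data
    print(kba_verify(stolen_claim, taxpayer_record))  # True: the front door opens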

In some situations, data in the wrong hands could cause damage much bigger than fraudulent charges. For instance, the breach of AshleyMadison.com was said to be linked to multiple suicides.9 The ransomware attack in May 2017 was reported to have shut down work in sixteen UK hospitals,10 crippled medical devices,11 and delayed at least one surgery in a US hospital.12 As more medical devices get connected to the internet, compromised data security could generate disruption in surgeries and life support. It is not difficult to imagine similar risks in connected cars and the “internet of things.”

9. http://www.dailymail.co.uk/news/article-3208907/The-Ashley-Madison-suicide-Texas-police-chief-takes-life-just-days-email-leaked-cheating-website-hack.html, http://money.cnn.com/2015/08/24/technology/suicides-ashley-madison/index.html, accessed on October 26, 2017.

10. https://www.theverge.com/2017/5/12/15630354/nhs-hospitals-ransomware-hack-wannacry-bitcoin, accessed on October 20, 2017.

11. https://www.forbes.com/sites/thomasbrewster/2017/05/17/wannacry-ransomware-hit-real-medical-devices/#7666463e425c, accessed on October 20, 2017.

12. https://www.recode.net/2017/6/27/15881666/global-eu-cyber-attack-us-hackers-nsa-hospitals, accessed on October 20, 2017.

One may argue that the ongoing wave of data breaches is more driven by data availability than by data-processing technology. This could be true at the moment, but recent trends suggest that criminals are getting sophisticated and are ready to exploit data technology.

For instance, robocalls—the practice of using a computerized autodialer to deliver a prerecorded message to many telephones at once—have become prevalent because of relatively standard advances in information technology. But improved methods of pattern recognition and delivery appear to have increased the efficacy, and thus prevalence, of these calls. For example, by pretending the call is from a local number that looks familiar to the receiver, the robocaller tricks the receiver into listening to unwanted telemarketing. Similarly, phishing emails have long strived to target people vulnerable to financial and other frauds. Because a phishing attempt can be much more effective if it appears to come from a familiar email address and contains personal information that is supposedly known only to family and friends, effective phishing attempts have been limited by the labor needed to customize each email. This danger can be easily magnified when scammers mass-produce PII-customized phishing emails with individualized targeting, appeals, and mass delivery.
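The mechanics behind that mass production are nothing more exotic than a mail merge over breached PII. A deliberately bland sketch (hypothetical fields, shown only to make the cost collapse concrete, not a working scam) is below: once the template is written, each additional “personalized” message is essentially free:

    # How cheaply PII-keyed customization scales: a plain mail-merge loop
    # over hypothetical breached records.
    records = [
        {"name": "Alice", "bank": "First Example Bank", "city": "Springfield"},
        {"name": "Bob", "bank": "Example Credit Union", "city": "Riverton"},
    ]

    template = ("Dear {name}, we noticed unusual activity on your {bank} "
                "account near {city}. Please verify your details.")

    # The labor is a one-time template; per-message cost is near zero.
    messages = [template.format(**r) for r in records]
    print(len(messages), "customized messages generated")

The point of the sketch is economic rather than technical: the per-message labor that once limited customized phishing disappears once PII is available in bulk.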

Ironically, the same data technology that giant tech firms use for legitimate business can be converted into a tool for data misuse; AI is no exception. On September 6, 2017, Facebook admitted that it received approximately $100,000 in ad revenue from roughly 3,000 ads connected to 470 inauthentic accounts and pages that are affiliated with each other and likely operated out of Russia.13 Such information was estimated to reach as many as 126 million US users.14 Similar discoveries followed from Twitter and Google. The ongoing investigation suggests that these Russian-backed accounts chose their content strategically so that the algorithms embedded in the platforms—including search rank, ad targeting, and post recommendation—helped to broadcast the message to specific demographics.15

It is not going to be long before the same algorithms get exploited for stalking, blackmail, and other shady uses. According to Vines, Roesner, and Kohno (2017), one can spend as little as $1,000 to track someone’s location with mobile ads. This is achieved by exploiting the ad-tracking and ad-targeting algorithms widely used in mobile platforms and mobile apps. We do not know whether this trick has been used in the real world, but it sends two chilling messages. First, personal data is not only available to giant consumer-facing companies that can use AI for mass, individualized but impersonal, marketing but is also within the reach of small, nonmarket parties who can exploit that data for personalized targeting of the consumer. Arguably, the latter is more dangerous to a targeted individual, as small nonmarket parties face less reputation constraint, they are invisible to consumers, and they may be interested in causing more harm than simply getting a consumer to purchase an unwanted product. Second, these bad actors may be able to take advantage of the key algorithms that are designed to reap the benefits of AI for legitimate purposes. As these algorithms are further developed, they could also empower data misuse.

Even if we can keep all data tightly secured and limit AI to its intended use, there is no guarantee that the intended use is harm free to consumers. Predic-
