its promise. However, it is often difficult to discover the real data practice. It is even more difficult to rectify consumer damage from a misrepresented data policy, as a court often requires a “body on the ground”—that is, evidence of a harmful outcome—as well as some confidence that there is a causal link between that outcome and the data collector’s practices.3
1. There could be positive externality from one player to another. For example, a data set that tracks an infectious disease nationwide can generate enormous public health benefits for everyone. But if each data collector accesses only part of the data and there is no way for him to benefit from the final product based on nationwide data, he may have an incentive to under-collect and undershare the data. Here I focus on negative externality, in order to highlight the risk of overcollecting and oversharing.
2. The argument of negative externality has been discussed in multiple papers, including Swire and Litan (1998) and Odlyzko (2003). See Acquisti, Taylor, and Wagman (2016) for a more comprehensive summary.
3. The Court’s emphasis on tangible harm is best illustrated in an ongoing battle between the Federal Trade Commission (FTC) and LabMD. LabMD is a medical testing laboratory that collects sensitive personal and medical information from consumers. The FTC alleged that LabMD violated the FTC Act by failing to employ reasonable and appropriate measures to prevent unauthorized access to consumers’ personal information. In November 2015, the Administrative Law Judge of the FTC dismissed the FTC complaint, arguing that complaint counsel failed to prove that LabMD’s data security conduct caused or was likely to cause substantial injury to consumers (https://www.ftc.gov/news-events/press-releases/2015/11/administrative-law-judge-dismisses-ftc-data-security-complaint). This decision was reversed in July 2016 by an Opinion and Final Order from the FTC commissioners (https://www.ftc.gov/news-events/press-releases/2016/07/commission-finds-labmd-liable-unfair-data-security-practices). In November 2016, the 11th US Circuit Court of Appeals granted LabMD’s request to temporarily stop enforcing the FTC order (while the appeals court considers the case), on the grounds that mere emotional harm and actions causing only a low likelihood of consumer harm may not meet the legal definition of unfair practice, even when the exposed data is highly sensitive. The court opinion can be found at http://f.datasrvr.com/fr1/016/73315/2016_1111.pdf. What type of consumer harm is needed for a data security practice to be unfair and illegal remains an open question.
Information asymmetry, externality, and commitment concerns can all be exacerbated by AI. More specifically, by potentially increasing the scope and value of consumer data use, AI can increase both the expected benefits and the expected costs of big data. But since the owner of the data and the AI internalizes the benefits more fully than the risks to consumers, AI could encourage intrusive use of data despite higher risks to consumers. For the same reason, new benefits enabled by AI—say, cost savings or better sales—could entice a firm to (secretly) abandon its promises on privacy or data security.
In short, big data introduces three “new” problems for consumer privacy: (a) sellers initially have more information about future data use than buyers after the focal transaction; (b) sellers need not fully internalize potential harms to consumers because of the inability to trace harm back to a data collector; and (c) sellers may promise a consumer-friendly data policy at the time of data collection but renege afterward, as such reneging is difficult to detect and penalize ex post.4 All three problems encourage irresponsible data collection, data storage, and data use.
All three problems could be aggravated by AI and other data technologies. Later in the chapter, I will describe a few AI-powered techniques that aim to alleviate the risk to consumer privacy and data security. Hence, any assessment of the net impact of AI on privacy needs to take both sides into account.
18.2 Ongoing Risk in Consumer Privacy and Data Security
The risk associated with privacy and data security is real. Fundamentally data driven, the risk can be directly or indirectly related to AI and other data technologies. For example, since AI enhances the expected value of data, firms are encouraged to collect, store, and accumulate data, regardless of whether they will use AI themselves. The ever-growing big data storehouses become a prime target for hackers and scammers.
18.2.1 Data at Risk
According to the Privacy Rights Clearinghouse, 7,859 data breaches have been made public since 2005, exposing billions of records with personally identifiable information (PII) to potential abuse.5 A closer look at the data is even more alarming: not only do we observe mega breaches that affect millions at once, but the information lost in a single breach also spreads to all kinds of PII.
4. Jin and Stivers (2017) elaborate on the three information problems in more detail, but they do not associate them with AI or other data technologies.
5. https://www.privacyrights.org/data-breaches, accessed on December 18, 2017.
When Target lost 40 million records in December 2013, hackers got mostly debit and credit card numbers. But the recent Equifax breach (September 2017) affected 145 million people, with Social Security numbers, whole credit histories, and even driver’s license and transaction dispute data stolen from the same database. More concerning is the fact that data breaches occur disproportionately at organizations that accumulate massive PII data, including retailers, information aggregators, financial institutions, and nonprofit organizations such as governments, schools, and hospitals.
Causes of data breaches have evolved as well. A decade ago, most data losses were driven by human errors such as unshredded records left in the trash, lost laptops holding unencrypted data, or data inadvertently uploaded to the open Web. Recent breaches are often the result of targeted hacking and ransomware attacks. If we view a malicious hacker as a thief sneaking in to steal, a ransomware attacker is a kidnapper who takes control of your data system and demands ransom immediately. For instance, the ransomware attack in May 2017 infected computers in ninety-nine countries (including the United States), bringing down transportation, banking, nuclear, and hospital systems in many places.6
Thomas et al. (2017) follow the dark web from March 2016 to March 2017, passively monitoring forums that trade credential leaks exposed via data breaches, phishing kits that deceive users into submitting their credentials to fake login pages, and off-the-shelf keyloggers that harvest passwords from infected machines. They identify large numbers of potential victims, including 788,000 victims of off-the-shelf keyloggers, 12.4 million victims of phishing kits, and 1.9 billion usernames and passwords exposed via data breaches. After matching these exposed credentials to Google’s internal database, they find that 7 to 25 percent of exposed passwords match a victim’s Google account. More alarmingly, they observe “a remarkable lack of external pressure on bad actors, with phishing kit playbooks and keylogger capabilities remaining largely unchanged since the mid-2000s.”
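To make the matching step concrete, the sketch below shows how leaked credentials can be checked against a provider’s own records by comparing hashes rather than raw passwords. This is a minimal illustration in Python; the function names and sample data are hypothetical, and Thomas et al.’s actual pipeline is more elaborate than this.

```python
import hashlib

def hash_credential(username: str, password: str) -> str:
    """Hash a username/password pair so raw credentials are never compared directly."""
    return hashlib.sha256(f"{username}:{password}".encode("utf-8")).hexdigest()

def match_rate(leaked_pairs, internal_hashes) -> float:
    """Fraction of leaked credential pairs whose hash appears in the provider's hashed set."""
    if not leaked_pairs:
        return 0.0
    hits = sum(1 for user, pw in leaked_pairs
               if hash_credential(user, pw) in internal_hashes)
    return hits / len(leaked_pairs)

# Hypothetical data, for illustration only.
internal_hashes = {
    hash_credential("alice@example.com", "hunter2"),
    hash_credential("bob@example.com", "correct-horse"),
}
leaked_pairs = [
    ("alice@example.com", "hunter2"),       # password reused -> match
    ("alice@example.com", "old-password"),  # stale leak -> no match
    ("carol@example.com", "123456"),        # not an account holder -> no match
]
print(f"{match_rate(leaked_pairs, internal_hashes):.0%} of leaked credentials match")
```

In practice, a provider would use a salted or keyed hash and a privacy-preserving comparison protocol rather than plain SHA-256, but the counting logic behind a “7 to 25 percent” match rate is the same.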
18.2.2 Consumers at Risk
The most concrete harm that could arise from a data breach is identity theft. According to the Bureau of Justice Statistics (BJS), identity theft affects 17.6 million (7 percent) of all US residents age sixteen and older (Harrell 2014). Consistently, identity theft is one of the biggest consumer-complaint categories—first in 2014, second in 2015, and third in 2016 (FTC 2014, 2015, 2016). In 2016, identity theft accounted for 13 percent of consumer complaints, trailing behind debt collection (28 percent) and imposter scams (13 percent), all of which could feed on lost personal data (FTC 2016). Of course, not all identity thefts are driven by inadequate privacy protection or insufficient data security. Scammers practiced their creative art long before big data and AI existed.
6. http://www.bbc.com/news/technology-39901382, accessed on October 20, 2017.
However, loss from identity theft is likely a function of data misuse. As reported by BJS (Harrell 2014), 86 percent of identity theft victims experienced fraudulent use of existing account information and 64 percent reported a direct financial loss from the identity theft incident. Among those who reported direct financial loss, victims of personal information fraud lost an average of $7,761 (with a median of $2,000) and victims of existing bank fraud lost an average of $780 (with a median of $200).7
Researchers have attempted to draw a statistical link between data misuse and consumer harm. Romanosky, Acquisti, and Telang (2011) explore differences among state data breach notification laws and find that the adoption of data breach disclosure laws reduces identity theft caused by data breaches by an average of 6.1 percent. Romanosky, Hoffman, and Acquisti (2014) further examine federal data breach lawsuits from 2000 to 2010. They show that the odds of a firm being sued are 3.5 times greater when individuals suffer financial harm, but 6 times lower when the firm provides free credit monitoring. Telang and Somanchi (2017) look at a more indirect consequence of data misuse. Using detailed transaction data from a US bank, they find that consumers are 3 percentage points more likely to leave the bank if they have experienced an unauthorized fraudulent transaction within the previous six months. While the unauthorized transaction could be a result of previous data breaches, it is difficult to attribute the fraud to a particular data breach. In other words, the bank and the consumer may both suffer from a data breach, but the breached firm bears virtually none of this suffering.
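To interpret the odds ratios reported by Romanosky, Hoffman, and Acquisti (2014): an odds ratio of 3.5 scales the odds of being sued, not the probability. A worked example, where the 2 percent baseline is an assumed number for illustration rather than a figure from the paper:

```latex
% Odds-ratio arithmetic with a hypothetical 2 percent baseline
\[
\text{odds} = \frac{p}{1-p}, \qquad
p_0 = 0.02 \;\Rightarrow\; \text{odds}_0 = \tfrac{0.02}{0.98} \approx 0.0204 .
\]
\[
\text{odds}_1 = 3.5 \times \text{odds}_0 \approx 0.0714
\;\Rightarrow\;
p_1 = \frac{0.0714}{1 + 0.0714} \approx 6.7\% .
\]
```

So when baseline probabilities are small, an odds ratio of 3.5 roughly triples the chance of a lawsuit; the “6 times lower” credit-monitoring effect works the same way in reverse.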
Tax fraud offers another peek into the harm of data misuse. Through the Government Accountability Office (GAO 2015), the US Internal Revenue Service (IRS) reported a point estimate of attempted identity theft refund fraud (as of 2013). Although the IRS was able to prevent or recover $24.2 billion in fraudulent refunds, it paid out $5.8 billion in tax refunds that were later flagged as identity theft fraud; that is, roughly one dollar in five of the estimated $30 billion in attempted refund fraud got through. In May 2015, the IRS disclosed a data breach in which 100,000 taxpayer accounts were compromised through its Get Transcript application. This breach exposed sensitive information such as taxpayers’ prior-year tax filings. More important, the application was compromised not because hackers broke through a digital backdoor of the IRS, but because they were able to clear a multistep authentication process that required prior personal knowledge of the taxpayer’s Social Security number, date of birth, tax filing status, and street address.8 In other words, hackers got in through the front door of the IRS, using information they already had or could readily guess. Such information likely came from previous data breaches or from data available on the black market. This suggests that data breaches could have a ripple effect: a small vulnerability in one database could undermine data security in a completely unrelated organization.
7. Direct financial loss is not necessarily equal to the actual out-of-pocket loss to identity theft victims, as some financial loss may be reimbursed.
8. https://www.irs.gov/newsroom/irs-statement-on-the-get-transcript-application, accessed on October 19, 2017.
In some situations, data in the wrong hands could cause damage much bigger than fraudulent charges. For instance, the breach of AshleyMadison.com was said to be linked to multiple suicides.9 The ransomware attack in May 2017 was reported to have shut down work in sixteen UK hospitals,10 crippled medical devices,11 and delayed at least one surgery in a US hospital.12 As more medical devices get connected to the internet, compromised data security could generate disruption in surgeries and life support. It is not difficult to imagine similar risks in connected cars and the “internet of things.”
One may argue that the ongoing wave of data breaches is driven more by data availability than by data-processing technology. This could be true at the moment, but recent trends suggest that criminals are growing more sophisticated and are ready to exploit data technology.
For instance, robocalls—the practice of using a computerized autodialer to deliver a prerecorded message to many telephones at once—have become prevalent because of relatively standard advances in information technology. But improved methods of pattern recognition and delivery appear to have increased the efficacy, and thus the prevalence, of these calls. For example, by pretending that a call comes from a local number that looks familiar to the receiver, a robocaller tricks the receiver into listening to unwanted telemarketing. Similarly, phishing emails have long strived to target people vulnerable to financial and other frauds. Because a phishing attempt can be much more effective if it appears to come from a familiar email address and contains personal information that is supposedly known only to family and friends, effective phishing attempts have been limited by the labor needed to customize each email. This danger can easily be magnified when scammers mass-produce PII-customized phishing emails with individualized targeting, appeals, and mass delivery.
Ironically, the same data technology that giant tech firms use for legitimate business can be converted into a tool for data misuse; AI is no exception. On September 6, 2017, Facebook admitted that it received approximately $100,000 in ad revenue from roughly 3,000 ads connected to 470 inauthentic accounts and pages that are affiliated with each other and likely operated out of Russia.13
9. http://www.dailymail.co.uk/news/article-3208907/The-Ashley-Madison-suicide-Texas-police-chief-takes-life-just-days-email-leaked-cheating-website-hack.html, http://money.cnn.com/2015/08/24/technology/suicides-ashley-madison/index.html, accessed on October 26, 2017.
10. https://www.theverge.com/2017/5/12/15630354/nhs-hospitals-ransomware-hack-wannacry-bitcoin, accessed on October 20, 2017.
11. https://www.forbes.com/sites/thomasbrewster/2017/05/17/wannacry-ransomware-hit-real-medical-devices/#7666463e425c, accessed on October 20, 2017.
12. https://www.recode.net/2017/6/27/15881666/global-eu-cyber-attack-us-hackers-nsa-hospitals, accessed on October 20, 2017.
Such information was estimated to reach as many as 126 million US users.14 Similar discoveries followed from Twitter and Google. The ongoing investigation suggests that these Russian-backed accounts chose their content strategically so that the algorithms embedded in the platforms—including search ranking, ad targeting, and post recommendation—helped to broadcast the messages to specific demographics.15
It is not going to be long before the same algorithms are exploited for stalking, blackmail, and other shady uses. According to Vines, Roesner, and Kohno (2017), one can spend as little as $1,000 to track someone’s location with mobile ads, by exploiting the ad tracking and ad targeting algorithms widely used in mobile platforms and mobile apps. We do not know whether this trick has been used in the real world, but it sends two chilling messages. First, personal data is not only available to giant consumer-facing companies that can use AI for mass, individualized but impersonal, marketing; it is also within the reach of small, nonmarket parties who can exploit that data for personalized targeting of a consumer. Arguably, the latter is more dangerous to a targeted individual: small nonmarket parties face fewer reputation constraints, they are invisible to consumers, and they may be interested in causing more harm than simply getting a consumer to purchase an unwanted product. Second, these bad actors may be able to take advantage of the key algorithms that are designed to reap the benefits of AI for legitimate purposes. As these algorithms are further developed, they could also empower data misuse.
Even if we can keep all data tightly secured and limit AI to its intended use, there is no guarantee that the intended use is harm-free to consumers. Predic-