prices vary with respect to quantity or quality), and
3. third degree (group pricing based on membership).
Fully personalized pricing is unrealistic, but prices based on fine-grained
features of consumers may well be feasible, so the line between third degree
and first degree is becoming somewhat blurred. Shiller (2013) and Dubé and
Misra (2017) have investigated how much consumer surplus can be extracted
using ML models.
Second-degree price discrimination can also be viewed as pricing by
group membership, but recognizing the endogeneity of group membership
and behavior. Machine learning using observational data will be of limited
help in designing such pricing schemes. However, reinforcement learning
techniques such as multiarmed bandits may also be helpful.
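To make the bandit idea concrete, here is a minimal sketch of price experimentation with an epsilon-greedy multiarmed bandit. The candidate prices, the epsilon value, and the simulated demand function are hypothetical placeholders chosen for illustration, not something taken from the sources cited above.

```python
import random

# Hypothetical candidate prices (the "arms" of the bandit).
PRICES = [4.99, 5.99, 6.99, 7.99]
EPSILON = 0.1                     # probability of trying a random price
counts = [0] * len(PRICES)        # times each price has been offered
revenue = [0.0] * len(PRICES)     # cumulative revenue earned at each price

def simulated_purchase(price):
    """Stand-in for real customer behavior: purchase probability falls with price."""
    return random.random() < max(0.0, 1.0 - 0.1 * price)

def choose_price_index():
    """Epsilon-greedy choice: try untried prices first, explore occasionally,
    otherwise pick the price with the highest average revenue so far."""
    if 0 in counts:
        return counts.index(0)
    if random.random() < EPSILON:
        return random.randrange(len(PRICES))
    return max(range(len(PRICES)), key=lambda i: revenue[i] / counts[i])

for _ in range(10_000):           # each iteration is one customer arrival
    i = choose_price_index()
    counts[i] += 1
    if simulated_purchase(PRICES[i]):
        revenue[i] += PRICES[i]

for price, c, r in zip(PRICES, counts, revenue):
    print(f"price {price:.2f}: offered {c} times, average revenue {r / c:.3f}")
```

The experimentation itself generates the data needed to price well, which is exactly what observational data alone cannot provide.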
According to most noneconomists, the only thing worse than price dif-
ferentiation is price discrimination! However, most economists recognize
that price differentiation is often beneficial from both an efficiency and an
equity point of view. Price differentiation allows markets to be served that
would otherwise not be served and often those unserved markets involve
low-income consumers.
DellaVigna and Gentzkow (2017) suggest that “the uniform pricing we
document significantly increases the prices paid by poorer households rela-
tive to the rich." This effect can be substantial. The authors show that "con-
sumers of [food] stores in the lowest income decile pay about 0.7 percent
higher prices than they would pay under flexible pricing, but consumers of
stores in the top income decile pay about 9.0 percent lower prices than under
flexible pricing."
16.3.5 Returns to Scale
There are at least three types of returns to scale that could be relevant for
machine learning.
1. Classical supply-side returns to scale (decreasing average cost).
2. Demand-side returns to scale (network effects).
3. Learning by doing (improvement in quality or decrease in cost due to
experience).
Supply-Side Marginal Returns
It might seem like software is the paradigm case of supply-side returns
to scale: there is a large fixed cost of developing the software, and a small
variable cost of distributing it. But if we compare this admittedly simple
model to the real world, there is an immediate problem.
Software development is not a one-time operation; almost all software is
updated and improved over time. Mobile phone operating systems are a case
in point: there are often monthly releases of bug fixes and security improve-
ments coupled with yearly releases of major upgrades.
Note how different this is from physical goods—true, there are bug fixes
for mechanical problems in a car, but the capabilities of the car remain more
or less constant over time. A notable exception is the Tesla brand, where new
updated operating systems are released periodically.
As more and more products become network enabled we can expect to
see this happen more often. Your TV, which used to be a static device, will
be able to learn new tricks. Many TVs now have voice interaction, and we
can expect that machine learning will continue to advance in this area. This
means that your TV will become more and more adept at communication
and likely will become better at discerning your preferences for various sorts
of content. The same goes for other appliances—their capabilities will no
longer be fixed at time of sale, but will evolve over time.
This raises interesting economic questions about the distinction between
goods and services. When someone buys a mobile phone, a TV, or a car,
they are not just buying a static good, but rather a device that allows them
to access a whole panoply of services. This, in turn, raises a whole range of
questions about pricing and product design.
Demand-Side Returns to Scale
Demand-side economies of scale, or network effects, come in different
varieties. There are direct network effects, where the value of a product or
service to an incremental adopter depends on the total number of other adopt-
ers, and there are indirect network effects where there are two or more types
of complementary adopters. Users prefer an operating system with lots of
applications and developers prefer operating systems with lots of users.
Direct network effects could be relevant to choices of programming lan-
guages used in machine-learning systems, but the major languages are open
source. Similarly, it is possible that prospective users might prefer cloud
providers that have a lot of other users. However, it seems to me that this is
no different than many other industries. Automobile purchasers may well
have a preference for popular brands since dealers, repair shops, parts, and
mechanics are readily available.
There is a concept that is circulating among lawyers and regulators called
“data network effects.” The model is that a firm with more customers can
collect more data and use this data to improve its product. This is often
true—the prospect of improving operations is what makes ML attractive—
but it is hardly novel. And it is certainly not a network effect! This is essen-
tially a supply-side effect known as “learning by doing” (also known as the
“experience curve” or “learning curve”). The classical exposition is Arrow
(1962); Spiegel and Hendel (2014) contain some up-to-date citations and a
compelling example.
Learning by Doing
Learning by doing is generally modeled as a process where unit costs
decline (or quality increases) as cumulative production or investment
increases. The rough rule of thumb is that a doubling of output leads to a
unit cost decline of 10 to 25 percent. Though the reasons for this efficiency
increase are not firmly established, the important point is that learning by
doing requires intention and investment by the firm, as described in Stiglitz
and Greenwald (2014).
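As a worked illustration of that rule of thumb (the exponents below follow arithmetically from the 10 to 25 percent figure; they are not taken from Arrow or from Spiegel and Hendel), the experience curve is commonly written as

```latex
% Experience (learning) curve: unit cost c as a function of cumulative output Q.
% c_0 is the unit cost at a reference cumulative output Q_0; b > 0 is the learning elasticity.
c(Q) = c_0 \left(\frac{Q}{Q_0}\right)^{-b},
\qquad
\frac{c(2Q)}{c(Q)} = 2^{-b}.
% A 10 percent cost decline per doubling means 2^{-b} = 0.90, so b = \log_2(1/0.90) \approx 0.15;
% a 25 percent decline means 2^{-b} = 0.75, so b = \log_2(1/0.75) \approx 0.42.
```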
This distinguishes learning by doing from demand-side or supply-side
network effects that are typically thought to be more or less automatic.
This is not really true either; entire books have been written about strategic
behavior in the presence of network effects. But there is an important dif-
ference between learning by doing and so-called “data network effects.” A
company can have huge amounts of data, but if it does nothing with the
data it produces no value.
In my experience the problem is not lack of resources but lack of skills.
A company that has data but no one to analyze it is in a poor position to
take advantage of that data. If there is no existing expertise internally, it is
hard to make intelligent choices about what skills are needed and how to
fi nd and hire people with those skills. Hiring good people has always been a
critical issue for competitive advantage. But since the widespread availability
of data is comparatively recent, this problem is particularly acute. Automo-
bile companies can hire people who know how to build automobiles, since
that is part of their core competency. They may or may not have sufficient
internal expertise to hire good data scientists, which is why we can expect
to see heterogeneity in productivity as this new skill percolates through the
labor markets. Bessen (2016, 2017) has written perceptively about this issue.
16.3.6 Algorithmic Collusion
It has been known for decades that there are many equilibria in repeated
games. The central result in this area is the so-called “folk theorem,” which
says that virtually any outcome can be achieved as an equilibrium in a
repeated game. For various formulations of this result, see the surveys by
Fudenberg (1992) and Pearce (1992).
Interaction of oligopolists can be viewed as a repeated game, and in this
case particular attention is focused on collusive outcomes. There are very
simple strategies that can be used to facilitate collusion.
Rapid Response Equilibrium. For example, consider the classic example
of two gas stations across the street from each other who can change prices
quickly and serve a fixed population of consumers. Initially, they are both
pricing above marginal cost. If one drops its price by a penny, the other
quickly matches the price. In this case, both gas stations are worse off because
they are selling at a lower price. Hence, there is no reward to price cutting
and high prices prevail. Strategies of this sort may have been used in online
competition, as described in Varian (2000). Borenstein (1997) documents
related behavior in the context of airfare pricing.
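A minimal numerical sketch of the rapid-response logic follows. The prices, the zero marginal cost, and the fixed customer count are hypothetical; the point is only that a price cut that is matched immediately leaves both stations worse off.

```python
# Two stations share a fixed population of customers; prices are in cents
# so the arithmetic stays exact, and marginal cost is set to zero.
CUSTOMERS = 1000

def profits(p1, p2):
    """The lower-priced station captures the whole market; equal prices split it."""
    if p1 < p2:
        return p1 * CUSTOMERS, 0
    if p2 < p1:
        return 0, p2 * CUSTOMERS
    return p1 * CUSTOMERS // 2, p2 * CUSTOMERS // 2

HIGH = 200   # both stations charging $2.00
print("both price high:       ", profits(HIGH, HIGH))   # (100000, 100000)

# Station 1 cuts by a penny; station 2 matches at once, so the outcome that
# actually prevails is both stations at the lower price.
CUT = HIGH - 1
print("cut, instantly matched:", profits(CUT, CUT))      # (99500, 99500)
```

Since the deviation never yields even a temporary gain in market share, cutting price is pointless and the high price persists.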
Repeated Prisoner’s Dilemma. In the early 1980s, Robert Axelrod (1984)
conducted a prisoner’s dilemma tournament. Researchers submitted algo-
rithmic strategies that were played against each other repeatedly. The winner
by a large margin was a simple strategy submitted by Anatol Rapoport called
“tit for tat.” In this strategy, each side starts out cooperating (charging high
prices). If either player defects (cuts its price), the other player matches.
Axelrod then constructed a tournament where strategies reproduced accord-
ing to their payoffs in the competition. He found that the best-performing
strategies were very similar to tit for tat. This suggests that artificial agents
might learn to play cooperative strategies in a classic duopoly game.
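The tournament logic is easy to sketch. The payoff numbers below are a standard prisoner's dilemma recast as high/low pricing, not Axelrod's actual tournament payoffs; they show that two tit-for-tat players sustain the cooperative (high-price) outcome, while an unconditional defector does not.

```python
# Payoffs per round, (row player, column player); hypothetical duopoly values:
# both price high -> 3 each, both price low -> 1 each,
# undercutting a high-pricer -> 5, being undercut -> 0.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_history, their_history):
    """Cooperate (price high) first, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

print("tit-for-tat vs tit-for-tat:  ", play(tit_for_tat, tit_for_tat))    # (600, 600)
print("tit-for-tat vs always-defect:", play(tit_for_tat, always_defect))  # (199, 204)
```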
NASDAQ Price Quotes. In the early 1990s, price quotes in the NASDAQ
were made in eighths of a dollar rather than cents. So if a bid was two-eighths
and an ask was three-eighths, a transaction would occur with the buyer
paying three-eighths and the seller receiving two-eighths. The differ-
ence between the bid and the ask was the “inside spread,” which compen-
sated the traders for risk bearing and maintaining the capital necessary to
participate in the market. Note that the bigger the inside spread, the larger
the compensation to the market makers doing the trading.
In the mid-1990s, two economists, William Christie and Paul Schultz,
examined trades for the top seventy listed companies in NASDAQ and
found to their surprise that there were virtually no transactions made at odd-
eighth prices. The authors concluded that “our results most likely reflected
an understanding or implicit agreement among the market makers to avoid
the use of odd-eighth price fractions when quoting these stocks” (Christie
and Schultz 1995, 203).
A subsequent investigation was launched by the Department of Justice
(DOJ), which was eventually settled by a $1.01 billion fine that, at the time,
was the largest fine ever paid in an antitrust case.
As these examples illustrate, it appears to be possible for implicit (or per-
haps explicit) cooperation to occur in the context of repeated interaction—
what Axelrod refers to as the “evolution of cooperation.”
Recently, issues of this sort have reemerged in the context of “algorith-
mic collusion.” In June 2017, the Organisation for Economic Co-operation
and Development (OECD) held a roundtable on algorithms and collusion
as a part of their work on competition in the digital economy. See OECD
(2017) for a background paper and Ezrachi and Stucke (2017) for a repre-
sentative contribution to the roundtable.
There are a number of interesting research questions that arise in this con-
text. The folk theorem shows that collusive outcomes can be an equilibrium
of a repeated game, but does not describe a specific algorithm that leads to
such an outcome. It is known that very simplistic algorithms, such as finite
automata with a small number of states, cannot discover all equilibria (see
Rubinstein 1986).
There are auction-like mechanisms that can be used to approximate mo-
nopoly outcomes; see Segal (2003) for an example. However, I have not seen
similar mechanisms in an oligopoly context.
16.4 Structure of ML-Provision Industries
So far we have looked at industries that use machine learning, but it is also
of interest to look at companies that provide machine learning.
As noted above, it is likely that ML vendors will offer several related ser-
vices. One question that immediately arises is how easy it will be to switch
among providers. Technologies such as containers have been developed
specifically to make it easy to port applications from one cloud provider to
another. Open-source implementations such as Docker and Kubernetes are
readily available. Lock-in will not be a problem for small- and medium-size
applications, but of course, there could be issues involving large and complex
applications that require customization.
Computer hardware also exhibits at least constant returns to scale due to
the ease of replicating hardware installations at the level of the chip, mother-
board, racks, or data centers themselves. The classic replication argument
for constant returns applies here since the basic way to increase capacity is
to just replicate what has been done before: add more cores to the processors,
add more boards to racks, add more racks to the data center, and build more
data centers.
I have suggested earlier that cloud computing is more cost effective for
most users than building a data center from scratch. What is interesting is
that companies that require lots of data processing power have been able
to replicate their existing infrastructure and sell the additional capacity to
other, smaller entities. The result is an industry structure somewhat different
than an economist might have imagined. Would an auto company build
excess capacity that it could then sell off to other companies? This is not
unheard of, but it is rare. Again it is the general purpose nature of comput-
ing that enables this model.
16.4.1 Pricing of ML Services
As with any other information-based industry, software is costly to pro-
duce and cheap to reproduce. As noted above, computer hardware also
exhibits at least constant returns to scale due to the ease of replicating
hardware installations at the level of the chip, motherboard, racks, or data
centers themselves.
If services become highly standardized, then it is easy to fall into Bertrand-
like price cutting. Even in these early days, ML pricing appears to be
intensely competitive. For example, image recognition services cost about a
tenth of a cent per image at all major cloud providers. Presumably, we will
see vendors try to differentiate themselves along dimensions of speed and
capabilities. Those firms that can provide better services may be able to
charge premium prices, to the extent that users are willing to pay for premium
service. However, current speeds and accuracy are very high and it is unclear
how users value further improvement in these dimensions.
16.5 Policy Questions
We have already discussed issues involving data ownership, data access,
differential pricing, returns to scale, and algorithmic collusion, all of which
have significant policy aspects. The major policy areas remaining are secu-
rity and privacy. I start with a few remarks about security.
16.5.1 Security
One important question that arises with respect to security is whether
firms have appropriate incentives in this regard. In a classic article, Ander-
son (1993) compares US and UK policy with respect to automatic teller
machines (ATMs). In the United States, the user was right unless the bank
could prove them wrong, while in the United Kingdom, the bank was right
unless the user could prove them wrong. The result of this liability assign-
ment was that US banks invested in security practices such as security cam-
eras, while the UK banks didn’t bother with such elementary precautions.
This industry indicates how important liability assignment is in creating
appropriate incentives for investment in security. The law and economics
analysis of tort law is helpful in understanding the implications of different
liability assignments and what optimal assignments might look like.
One principle that emerges is that of the “due care” standard. If a firm fol-
lows certain standard procedures such as installing security fixes within a few
days of their being released, implementing two-factor authentication, edu-