by Ajay Agrawal
able cost (renting time on the data center). An organization can purchase
virtually any amount of cloud services, so even small companies can start at
a minimal level and be charged based on usage. Cloud computing is much
more cost-effective than owning your own data center, since compute and
data resources can be purchased on an as-needed basis. Needless to say,
most tech start-ups today use a cloud provider for their hardware, software,
and networking needs.
Cloud providers also offer various machine-learning services such as
voice recognition, image recognition, translation, and so on. These systems
are already trained by the vendor and can be put to immediate use by the
customer. It is no longer necessary for each company to develop its own
software for these tasks.
Competition among the cloud providers is intense. Highly detailed and
specific image recognition capabilities are offered at a cost of a tenth of a
cent per image or less, with volume discounts on top of that price.
A user may also have idiosyncratic data relevant to its own business, like
the point-of-sale data mentioned above. The cloud provider also provides
up-to-date, highly optimized hardware and software that implements
popular machine-learning algorithms. This allows the user immediate access
to high-powered tools . . . provided that they have the expertise to use them.
If the hardware, software, and expertise are available, all that is needed is
the labeled data. There are a variety of ways to acquire such data.
• As a By-Product of Operations. Think of a chain of restaurants where
some perform better than others, and management may be interested in
factors that are associated with performance. Much of the data in the
Kaggle competitions mentioned above are generated as a by-product of
day-to-day operations.
• Web Scraping. This is a commonly used way to extract data from websites.
There is a legal debate about what exactly is permitted with respect
to both the collection of data and how it is used. The debate is too complex
to discuss here, but the Wikipedia entry on Web scraping is good.
An alternative is to use data that others have scraped. For example, the
Common Crawl database contains petabytes of data compiled over
eight years of Web crawling.
• Offering a Service. When Google started its work on voice recognition,
it had no expertise and no data. It hired the expertise and they came up
404 Hal Varian
with the idea of a voice-input telephone directory as a way to acquire
data. Users would say “Joe’s Pizza, University Avenue, Palo Alto” and
the system would respond with a phone number. The digitized question
and the resulting user choices were uploaded to the cloud and machine
learning was used to evaluate the relationship between Google’s answer
and the user action—for example, to call the suggested number. The
ML training used data from millions of individual number requests and
learned rapidly. ReCAPTCHA applies a similar model where humans
label images to prove they are human and not a simple bot.
• Hiring Humans to Label Data. Mechanical Turk and other systems can
be used to pay people to label data (see Hutson 2017).
• Buying Data from Providers. There are many providers of various sorts
of data such as mail lists, credit scores, and so on.
• Sharing Data. It may be mutually advantageous to parties to share
data. This is common among academic researchers. The Open Images
Dataset contains about nine million labeled images contributed by
universities and research labs. Sharing may be mandated for a variety
of reasons, such as concerns for public safety. Examples are black boxes
from airplanes or medical data on epidemics.
• Data from Governments. There are vast amounts of data available
from governments, universities, research labs, and nongovernmental
agencies.
• Data from Cloud Providers. Many cloud providers also provide public
data repositories. See, for example, Google Public Datasets, Google
Patents Public Dataset, or AWS Public Datasets.
• Computer-Generated Data. The AlphaGo Zero system mentioned earlier
generated its own data by playing Go games against itself. Machine-vision
algorithms can be trained using “synthetic images,” which are
actual images that have been shifted, rotated, and scaled in various
ways.
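The idea of synthetic images in the last item can be made concrete with a minimal sketch. It assumes an image is represented as a small grid of pixel values (a list of rows); the function names `shift_right`, `rotate_90`, and `scale_up` are hypothetical and chosen only for illustration. Real machine-vision pipelines use far richer transformations, so this should be read as a toy illustration rather than as any particular system's method.

```python
def shift_right(img, k=1):
    """Shift every row k pixels to the right, wrapping around the edge."""
    return [row[-k:] + row[:-k] for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def scale_up(img, factor=2):
    """Enlarge the image by repeating each pixel `factor` times in both directions."""
    out = []
    for row in img:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

# One labeled image (a hypothetical 3x3 grid of pixel values) ...
image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]

# ... yields several additional training examples that carry the same label.
synthetic = [shift_right(image), rotate_90(image), scale_up(image)]
```

Each transformed copy keeps the label of the original image, which is why this approach enlarges a labeled data set without additional human labeling effort.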
16.2.3 Important Characteristics of Data
Information science uses the concept of a “data pyramid” to depict the
relationship between data, information, and knowledge. Some system has
to collect the raw data, and subsequently organize and analyze that data
in order to turn it into information: something such as a textual document
or image that can be understood by humans. Think of the pixels in an
image being turned into human-readable labels. In the past this was done
by humans; in the future more and more of this will be done by machines.
(See figure 16.1.)
The insights from the information can then be turned into knowledge, which
generally is embodied in humans. We can think of data being stored in bits,
information stored in documents, and knowledge stored in humans. There
are well-developed markets and regulatory environments for information
Artificial Intelligence, Economics, and Industrial Organization 405
Fig. 16.1 The information pyramid
(books, articles, web pages, music, videos) and for knowledge (labor markets,
consultants). Markets for data—in the sense of unorganized collections of
bits—are not as developed. Perhaps this is because raw data is often heavily
context dependent and is not very useful until it is turned into information.
Data Ownership and Data Access
It is said that “data is the new oil.” Certainly, they are alike in one respect:
both need to be refined in order to be useful. But there is an important
distinction. Oil is a private good and consumption of oil is rival: if one person
consumes oil, there is less available for someone else to consume. But data
is nonrival: one person’s use of data does not reduce or diminish another
person’s use.
So instead of focusing on data “ownership” (a concept appropriate for
private goods), we really should think about data access. Data is rarely
“sold” in the same way private goods are sold; rather, it is licensed for specific
uses. Currently there is a policy debate in Europe about “who should own
autonomous vehicle data?” A better question to ask is “who should have
access to autonomous vehicle data and what can they do with it?” This
formulation emphasizes that many parties can simultaneously access autonomous
vehicle data. In fact, from the viewpoint of safety it seems very likely
that multiple parties should be allowed to access autonomous vehicle data.
There could easily be several data collection points in a car: the engine, the
navigation system, mobile phones in riders’ pockets, and so on. Requiring
exclusivity without a good reason for doing so would unnecessarily limit
what can be done with the data.
Ross Anderson’s description of what happens when there is an aircraft
crash makes an important point illustrating why it may be important to
allow several parties to access data.
When an aircraft crashes, it is front page news. Teams of investigators
rush to the scene, and the subsequent enquiries are conducted by experts
from organisations with a wide range of interests—the carrier, the insurer,
the manufacturer, the airline pilots’ union, and the local aviation authority.
Their findings are examined by journalists and politicians, discussed
in pilots’ messes, and passed on by flying instructors. In short, the flying
community has a strong and institutionalised learning mechanism.
(Anderson 1993)
Should we not want the same sort of learning mechanism for autonomous
vehicles? Some sorts of information can be protected by copyright. But in
the United States, raw data such as a telephone directory is not protected
by copyright. (See the Wikipedia entry on the legal case Feist Publications, Inc.
v. Rural Telephone Service Co.)
Despite this, data providers may compile some data and offer to license it on
certain terms to other parties. For example, there are several data companies
that merge US census data with other sorts of geographic data and offer
to license this data. These transactions may prohibit resale or relicensing.
Even though there is no protectable intellectual property, the terms of the
contract form a private contract that can be enforced by courts, as with any
other private contract.
Decreasing Marginal Returns
Finally, it is important to understand that data typically exhibits decreas-
ing returns to scale like any other factor of production. The same general
principle applies for machine learning. Figure 16.2 shows how the accuracy
of the Stanford dog breed classification behaves as the amount of training
data increases. As one would expect, accuracy improves as the number of
training images increases, but it does so at a decreasing rate.
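A small numerical sketch may help fix the idea of decreasing returns to training data. It assumes a purely hypothetical learning curve in which the error rate falls as a power law in the number of training images; the functional form and the parameters `ceiling`, `scale`, and `exponent` are illustrative assumptions and are not estimated from the Stanford data shown in the figure.

```python
def accuracy(n, ceiling=0.95, scale=0.9, exponent=0.35):
    """Hypothetical learning curve: accuracy approaches `ceiling` as n grows."""
    return ceiling - scale * n ** (-exponent)

# Accuracy gained from each successive doubling of the training set.
sizes = [100, 200, 400, 800, 1600]
gains = [accuracy(2 * n) - accuracy(n) for n in sizes]

for n, gain in zip(sizes, gains):
    print(f"{n:>5} -> {2 * n:>5} images: accuracy gain {gain:.4f}")
```

Under these assumptions every doubling of the data still improves accuracy, but each doubling adds less than the one before it, which is the pattern the text describes.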
Figure 16.3 shows how the error rate in the ImageNet competition has
declined over the last several years. An important fact about this competition
is that the number of training and test observations has been fixed during
this period. This means that the improved performance of the winning systems
cannot depend on sample size since it has been constant. Other factors
such as improved algorithms, improved hardware, and improved expertise
have been much more important than the number of observations in the
training data.
16.3 Structure of ML- Using Industries
As with any new technology, the advent of machine learning raises several
economic questions.
Fig. 16.2 Machine-learning adoption by economic sector
Source: http://vision.stanford.edu/aditya86/ImageNetDogs/.
Fig. 16.3 ImageNet image recognition
Source: Eckersley and Nasser (2017).
Fig. 16.4 Number of AI-related technologies adopted at scale or in a core part of
the business
Source: McKinsey (2017).
• Which firms and industries will successfully adopt machine learning?
• Will we see heterogeneity in the timing of adoption and the ability to
use ML effectively?
• Can later adopters imitate early adopters?
• What is the role of patents, copyright, and trade secrets?
• What is the role of geography in adoption patterns?
• Is there a large competitive advantage for early, successful adopters?
Bughin and Hazan (2017) recently conducted a survey of 3,000 “AI
Aware” C-level executives about adoption readiness. Of these executives,
20 percent are “serious adopters,” 40 percent are “experimenting,” and
28 percent feel their firms “lack the technical capabilities” to implement ML.
McKinsey identifies key enablers of adoption to be leadership, technical
ability, and data access. Figure 16.4 breaks down how ML adoption varies
across economic sectors. Not surprisingly, sectors such as telecom, tech, and
energy are ahead of less tech-savvy sectors such as construction and travel.
16.3.1 Machine Learning and Vertical Integration
A key question for industrial organization is how machine-learning tools
and data can be combined to create value. Will this happen within or across
corporate boundaries? Will ML users develop their own ML capabilities or
purchase ML solutions from vendors? This is the classic make-versus-buy
question that is the key to understanding much of real-world industrial
organization.
As mentioned earlier, cloud vendors provide integrated hardware and
software environments for data manipulation and analysis. They also offer
access to public and private databases, provide labeling services, consulting,
and other related services that enable one-stop shopping for data manipulation
and analysis. Special-purpose hardware provided by cloud providers,
such as GPUs and TPUs, has become a key technology for differentiating
provider services.
As usual there is a tension between standardization and differentiation.
Cloud providers are competing intensely to provide standardized environments
that can be easily maintained. At the same time, they want to provide
services that differentiate their offerings from competitors.
Data manipulation and machine learning are natural areas to compete
with respect to product speed and performance.
16.3.2 Firm Size and Boundaries
Will ML increase or decrease minimum efficient scale? The answer depends
on the relationship between fixed costs and variable costs. If firms
have to spend significant amounts to develop customized solutions to their
problems, we might expect that fixed costs are significant and firm size must
be large to amortize those costs. On the other hand, if firms can buy
off-the-shelf services from cloud vendors, we would expect fixed costs and
minimum efficient scale to be small.
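The relationship between fixed costs and minimum efficient scale can be illustrated with a short sketch. It assumes a simple cost structure with a fixed cost F and a constant marginal cost c, so that average cost is F/q + c, and it treats minimum efficient scale as the smallest output at which average cost is within a chosen tolerance of marginal cost. That operational definition, the function names, and every number below are assumptions made for illustration only.

```python
import math

def average_cost(q, fixed_cost, marginal_cost):
    """Average cost per unit when a fixed cost is spread over q units."""
    return fixed_cost / q + marginal_cost

def minimum_efficient_scale(fixed_cost, marginal_cost, tolerance=0.05):
    """Smallest q with average cost within `tolerance` (a fraction) of marginal cost.

    F/q + c <= (1 + tolerance) * c  is equivalent to  q >= F / (tolerance * c).
    """
    return math.ceil(fixed_cost / (tolerance * marginal_cost))

# Hypothetical firm that builds a customized ML system in-house:
# large fixed cost, low cost per unit served.
in_house = minimum_efficient_scale(fixed_cost=500_000, marginal_cost=10)

# Hypothetical firm that buys an off-the-shelf cloud service:
# small fixed cost, somewhat higher cost per unit served.
cloud = minimum_efficient_scale(fixed_cost=5_000, marginal_cost=12)

print(in_house, cloud)
```

With these illustrative numbers the firm that builds in-house needs roughly a hundred times the volume of the cloud customer before its fixed cost is adequately amortized, which is the sense in which off-the-shelf services can lower minimum efficient scale.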
Suppose, for example, that an oil change service would like to greet return-
ing customers by name. They can accomplish this using a database that joins
license plate numbers with customer names and service history. It would be
prohibitively expensive for a small provider to write the software to enable
this, so only the large chains could provide such services. On the other hand,
a third party might develop a smartphone app that could provide this service
for a nominal cost. This service might allow minimum efficient scale to
decrease. The same considerations apply for other small service providers
such as restaurants, dry cleaners, or convenience stores.
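The database described in the oil change example is simple enough to sketch. The following uses Python's built-in sqlite3 module with an in-memory database; the table names, column names, sample rows, and the `greet` function are all hypothetical and serve only to show what joining license plate numbers with customer names and service history might look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (plate TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE services (plate TEXT, service_date TEXT, service TEXT);
    INSERT INTO customers VALUES ('7ABC123', 'Pat');
    INSERT INTO services VALUES ('7ABC123', '2017-03-01', 'oil change');
    INSERT INTO services VALUES ('7ABC123', '2017-09-15', 'oil change');
""")

def greet(plate):
    """Look up a license plate and build a greeting from name and last visit."""
    row = conn.execute(
        """
        SELECT c.name, MAX(s.service_date)
        FROM customers AS c
        LEFT JOIN services AS s ON s.plate = c.plate
        WHERE c.plate = ?
        GROUP BY c.name
        """,
        (plate,),
    ).fetchone()
    if row is None:
        return "Welcome!"  # plate not in the database: a new customer
    name, last_visit = row
    if last_visit is None:
        return f"Welcome, {name}!"  # known customer with no recorded service
    return f"Welcome back, {name}! Your last visit was on {last_visit}."

print(greet("7ABC123"))
print(greet("UNKNOWN"))
```

The logic itself is a few lines; the expense for a small provider lies in building and maintaining the surrounding system, which is the part a third-party service can supply at nominal cost.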
Nowadays new start-ups are able to outsource a variety of business processes
since there are several providers of business services. Just as fast-food
providers could perfect a model with a single establishment and then
go national, business service companies can build systems once and replicate
them globally.
Here is a list of how a start-up might outsource a dozen business processes.
• Fund your project on Kickstarter.
• Use cloud computing and networking from Google, Amazon, or Microsoft.
• Use open-source software like Linux, Python, TensorFlow, and so forth.
• Manage your software using GitHub.
• Become a micromultinational and hire programmers from abroad.
• Set up a Kaggle competition for machine learning.
• Use Skype, Hangouts, Google Docs, and so forth for team communication.
• Use Nolo for legal documents (company, patents, NDAs).
• Use QuickBooks for accounting.
• Use AdWords, Bing, or Facebook for marketing.
• Use ZenDesk for user support.
This is only a partial list. Most start-ups in Silicon Valley and SOMA
avail themselves of several of these business-process services. By choosing
standardized business processes, the start-ups can focus on their core
competency and purchase services as necessary as they scale. One would
expect to see more entry and more innovation as a result of the availability
of these business-process services.
16.3.3 Pricing
The availability of cloud computing and machine learning offers lots of
opportunities to adjust prices based on customer characteristics. Auctions
and other novel pricing mechanisms can be implemented easily. The fact
that prices can be so easily adjusted implies that various forms of differential
pricing can be implemented. However, it must be remembered that customers
are not helpless; they can also avail themselves of enhanced search capabilities.
For example, airlines can adopt strategies that tie purchase price to
departure date. But services can be created that reverse-engineer the airline
algorithms and advise consumers about when to purchase (see, e.g., Etzioni
et al. 2003). See Acquisti and Varian (2005) for a theoretical model of
attempts to base prices on consumer history and of how consumers can
respond to such attempts.
16.3.4 Price Differentiation
Traditionally, price differentiation has been classified into three categories:
1. First degree (personalized),
2. second degree (versioning: same price menu for all consumers, but