ing the different roles of prediction and judgment. Drawing inspiration from Bolton and Faure-Grimaud (2009), we then build the baseline model with two states of the world and uncertainty about payoffs to actions in each state. We explore the value of judgment in the absence of any prediction technology, and then the value of prediction technology when there is no judgment. We finish the discussion of the baseline model with an exploration of the interaction between prediction and judgment, demonstrating that prediction and judgment are complements as long as judgment isn't too difficult. We then separate prediction quality into prediction frequency and prediction accuracy. As judgment improves, accuracy becomes more important relative to frequency. Finally, we examine complex environments where the number of potential states is large. Such environments are common in economic models of automation, contracting, and boundaries of the firm.
We show that the effect of improvements in prediction on the importance of judgment depends a great deal on whether the improvements in prediction enable automated decision-making.
3.2 AI and Prediction Costs
We argue that the recent advances in artificial intelligence are advances in the technology of prediction. Most broadly, we define prediction as the ability to take known information to generate new information. Our model emphasizes prediction about the state of the world.
Most contemporary artificial intelligence research and applications come from a field now called "machine learning." Many of the tools of machine learning have a long history in statistics and data analysis, and are likely familiar to economists and applied statisticians as tools for prediction and classification.2 For example, Alpaydin's (2010) textbook Introduction to Machine Learning covers maximum likelihood estimation, Bayesian estimation, multivariate linear regression, principal components analysis, clustering, and nonparametric regression. In addition, it covers tools that may be less familiar, but also use independent variables to predict outcomes:
2. We define prediction as using known information to generate new information. Therefore, classification techniques such as clustering are prediction techniques in which the new information to be predicted is the appropriate category or class.
regression trees, neural networks, hidden Markov models, and reinforce-
ment learning. Hastie, Tibshirani, and Friedman (2009) cover similar topics.
The 2014 Journal of Economic Perspectives symposium on big data covered
several of these less familiar prediction techniques in articles by Varian
(2014) and Belloni, Chernozhukov, and Hansen (2014).
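To make the "known information generates new information" framing concrete, the following is a minimal sketch (ours, not from the chapter) of prediction as classification, using the scikit-learn library; the features, labels, and numbers are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Known information: labeled examples (features X, outcomes y).
# The two features and the 0/1 labels below are made up for illustration.
X = np.array([[1.0, 0.2], [0.9, 0.1], [0.8, 0.3],
              [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# New information: the predicted category (and category probabilities)
# for a case whose outcome has not yet been observed.
x_new = np.array([[0.15, 0.85]])
print(model.predict(x_new))        # predicted class
print(model.predict_proba(x_new))  # predicted probabilities over classes
```

In this sense, classification is prediction: the "new information" generated is the appropriate category for the unobserved case.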
While many of these prediction techniques are not new, recent advances in computer speed, data collection, data storage, and the prediction methods themselves have led to substantial improvements. These improvements have transformed the computer science research field of artificial intelligence. The Oxford English Dictionary defines artificial intelligence as "[t]he theory and development of computer systems able to perform tasks normally requiring human intelligence." In the 1960s and 1970s, artificial intelligence research was primarily rules-based, symbolic logic. It involved human experts generating rules that an algorithm could follow (Domingos 2015, 89). These are not prediction technologies. Such systems became very good chess players and they guided factory robots in highly controlled settings; however, by the 1980s, it became clear that rules-based systems could not deal with the complexity of many nonartificial settings. This led to an "AI winter" in which research funding for artificial intelligence projects largely dried up (Markoff 2015).
Over the past ten years, a different approach to artificial intelligence has taken off. The idea is to program computers to "learn" from example data or experience. In the absence of the ability to predetermine the decision rules, a data-driven prediction approach can conduct many mental tasks. For example, humans are good at recognizing familiar faces, but we would struggle to explain and codify this skill. By connecting data on names to image data on faces, machine learning solves this problem by predicting which image data patterns are associated with which names. As a prominent artificial intelligence researcher put it, "Almost all of AI's recent progress is through one type, in which some input data (A) is used to quickly generate some simple response (B)" (Ng 2016). Thus, the progress is explicitly about improvements in prediction. In other words, the suite of technologies that have given rise to the recent resurgence of interest in artificial intelligence use data collected from sensors, images, videos, typed notes, or anything else that can be represented in bits to fill in missing information, recognize objects, or forecast what will happen next.
To be clear, we do not take a position on whether these prediction tech-
nologies really do mimic the core aspects of human intelligence. While Palm
Computing founder Jeff Hawkins argues that human intelligence is—in
essence—prediction (Hawkins 2004), many neuroscientists, psychologists,
and others disagree. Our point is that the technologies that have been given
the label artificial intelligence are prediction technologies. Therefore, in
order to understand the impact of these technologies, it is important to
assess the impact of prediction on decisions.
3.3 Case: Radiology
Before proceeding to the model, we provide some intuition of how predic-
tion and judgment apply in a particular context where prediction machines
are expected to have a large impact: radiology. In 2016, Geoff Hinton—one
of the pioneers of deep learning neural networks—stated that it was no lon-
ger worth training radiologists. His strong implication was that radiologists
would not have a future. This is something that radiologists have been con-
cerned about since 1960 (Lusted 1960). Today, machine-learning techniques
are being heavily applied in radiology by IBM using its Watson computer
and by a start-up, Enlitic. Enlitic has been able to use deep learning to detect
lung nodules (a fairly routine exercise)3 but also fractures (which is more
complex). Watson can now identify pulmonary embolism and some other
heart issues. These advances are at the heart of Hinton’s forecast, but have
also been widely discussed among radiologists and pathologists (Jha and
Topol 2016). What does the model in this chapter suggest about the future
of radiologists?
If we consider a simplified characterization of the job of a radiologist,
it would be that they examine an image in order to characterize and clas-
sify that image and return an assessment to a physician. While often that
assessment is a diagnosis (i.e., “the patient has pneumonia”), in many cases,
the assessment is in the negative (i.e., "pneumonia not excluded"). In that
regard, this is stated as a predictive task to inform the physician of the
likelihood of the state of the world. Using that, the physician can devise a
treatment.
These predictions are what machines are aiming to provide. In particular, a machine might provide a differential diagnosis of the following kind:
Based on Mr Patel's demographics and imaging, the mass in the liver has a 66.6 percent chance of being benign, a 33.3 percent chance of being malignant, and a 0.1 percent chance of not being real.4
The action is whether some intervention is needed. For instance, if a
potential tumor is identified in a noninvasive scan, then this will inform
whether an invasive examination will be conducted. In terms of identifying
the state of the world, the invasive exam is costly but safe—it can deduce a
cancer with certainty and remove it if necessary. The role of a noninvasive
exam is to inform whether an invasive exam should be forgone. That is, it
is to make physicians more confident about abstaining from treatment and
further analysis. In this regard, if the machine improves prediction, it will
lead to fewer invasive examinations.
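One way to see how prediction and judgment interact here is to work through the expected payoffs directly. The sketch below is ours, not the chapter's: the payoff numbers for each action/state pair are hypothetical stand-ins for the judgment a physician would supply, the first set of probabilities is the differential diagnosis quoted above, and the sharper second prediction is invented for comparison.

```python
# Hypothetical payoffs (judgment): utility of each action in each state.
# States: benign, malignant, not real. All numbers are made up for illustration.
payoff_biopsy = {"benign": -1.0, "malignant": 10.0, "not_real": -1.0}
payoff_wait   = {"benign":  0.0, "malignant": -20.0, "not_real":  0.0}

def expected(payoffs, probs):
    return sum(payoffs[s] * probs[s] for s in probs)

# Prediction 1: the differential diagnosis quoted above.
coarse = {"benign": 0.666, "malignant": 0.333, "not_real": 0.001}
# Prediction 2: a sharper (hypothetical) prediction from a better machine.
sharp = {"benign": 0.980, "malignant": 0.019, "not_real": 0.001}

for name, probs in [("coarse prediction", coarse), ("sharp prediction", sharp)]:
    eb, ew = expected(payoff_biopsy, probs), expected(payoff_wait, probs)
    choice = "biopsy" if eb > ew else "no biopsy"
    print(f"{name}: E[biopsy]={eb:.2f}, E[wait]={ew:.2f} -> {choice}")
```

Under these made-up payoffs, the coarse prediction still makes the invasive exam the better choice, while the sharper prediction makes it optimal to forgo it; this is the sense in which better prediction can lead to fewer invasive examinations.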
3. "You did not go to medical school to measure lung nodules." http://www.medscape.com/viewarticle/863127#vp_2.
4. http://www.medscape.com/viewarticle/863127#vp_3.
Judgment involves understanding the payoffs. What is the payoff to conducting a biopsy if the mass is benign, malignant, or not real? What is the payoff to not doing anything in those three states? The issue for radiologists in particular is whether a trained specialist radiologist is in the best position to make this judgment, or whether it will occur further along the chain of decision-making or involve new job classes that merge diagnostic information, such as a combined radiologist/pathologist (Jha and Topol 2016). Next, we formalize these ideas.
3.4 Baseline Model
Our baseline model is inspired by the "bandit" environment considered by Bolton and Faure-Grimaud (2009), although it departs significantly in the questions addressed and base assumptions made. Like them, in our baseline model, we suppose there are two states of the world, {θ₁, θ₂}, with prior probabilities of {μ, 1 − μ}. There are two possible actions: a state-independent action with known payoff of S (safe) and a state-dependent action with two possible payoffs, R or r, as the case may be (risky).
As noted in the introduction, a key departure from the usual assumptions of rational decision-making is that the decision maker does not know the payoff from the risky action in each state and must apply judgment to determine that payoff.5 Moreover, decision makers need to be able to make a judgment for each state that might arise in order to formulate a plan that would be the equivalent of payoff maximization. In the absence of such judgment, the ex ante expectation that the risky action is optimal in any state is v (which is independent between states). To make things more concrete, we assume R > S > r.6 Thus, we assume that v is the probability in any state that the risky payoff is R rather than r. This is not a conditional probability of the state. It is a statement about the payoff, given the state.
In the absence of knowledge regarding the specific payoffs from the risky action, a decision can only be made on the basis of prior probabilities. Then the safe action will be chosen if
μ(vR + (1 − v)r) + (1 − μ)(vR + (1 − v)r) = vR + (1 − v)r ≤ S.
5. Bolton and Faure-Grimaud (2009) consider this step to be the equivalent of a thought experiment where thinking takes time. To the extent that our results can be interpreted as a statement about the comparative advantage of humans, we assume that only humans can do judgment.
6. Thus, we assume that the payoff function, u, can only take one of three values, {R, r, S}. The issue is which combinations of state realization and action lead to which payoffs. However, we assume that S is the payoff from the safe action regardless of state and so this is known to the decision maker. As it is the relative payoffs from actions that drive the results, this assumption is without loss of generality. Requiring this property of the safe action to be discovered would just add an extra cost. Implicitly, as the decision maker cannot make a decision in complete ignorance, we are assuming that the safe action's payoff can be judged at an arbitrarily low cost.
So that the payoff is: V₀ = max{vR + (1 − v)r, S}. To make things simpler, we will focus our attention on the case where the safe action is—in the absence of prediction or judgment—the default. That is, we assume that
(A1) (Safe Default) vR + (1 − v)r ≤ S.
This assumption is made for simplicity only and will not change the qualitative conclusions.7 Under (A1), in the absence of knowledge of the payoff function or a signal of the state, the decision maker would choose S.
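As a quick numerical check of this baseline (our numbers, not the chapter's), the sketch below picks R, r, S, and v that satisfy R > S > r and (A1), and computes V₀.

```python
# Illustrative parameters only: R > S > r and (A1) vR + (1 - v)r <= S.
R, S, r = 10.0, 5.0, 0.0
v = 0.4

assert R > S > r
assert v * R + (1 - v) * r <= S  # (A1): the safe action is the default

V0 = max(v * R + (1 - v) * r, S)
print(V0)  # 5.0 -> without judgment or prediction, the safe action is chosen
```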
3.4.1 Judgment in the Absence of Prediction
Prediction provides knowledge of the state. The process of judgment pro-
vides knowledge of the payoff function. Judgment therefore allows the deci-
sion maker to understand which action is optimal for a given state should
it arise. Suppose that this knowledge is gained without cost (as would be assumed under the usual assumptions of economic rationality). In other words, the decision maker has knowledge of the optimal action in a given state. Then the risky action will be chosen (a) if it is the preferred action in
both states (which arises with probability v²); (b) if it is the preferred action in θ₁ but not θ₂ and μR + (1 − μ)r > S (with probability v(1 − v)); or (c) if it is the preferred action in θ₂ but not θ₁ and μr + (1 − μ)R > S (with probability v(1 − v)). Thus, the expected payoff is
v²R + v(1 − v) max{μR + (1 − μ)r, S} + v(1 − v) max{μr + (1 − μ)R, S} + (1 − v)²S.
Note that this is greater than V₀. The reason for this is that, when there is
uncertainty, judgment is valuable because it can identify actions that are
dominant or dominated—that is, that might be optimal across states. In
this situation, any resolution of uncertainty does not matter as it will not
change the decision made.
A key insight is that judgment itself can be consequential.
Result 1: If max{μR + (1 − μ)r, μr + (1 − μ)R} > S, it is possible that judgment alone can cause the decision to switch from the default action (safe) to the alternative action (risky).
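A small numerical illustration (again with made-up parameters) of the costless-judgment payoff and of Result 1: with μ = 0.7 and the same R, r, S, and v as above, the risky action is preferred on priors whenever judgment reveals it to be optimal in state θ₁, so judgment alone can flip the decision away from the safe default.

```python
# Illustrative parameters only (not from the chapter).
R, S, r = 10.0, 5.0, 0.0
v, mu = 0.4, 0.7

assert v * R + (1 - v) * r <= S  # (A1) still holds
V0 = max(v * R + (1 - v) * r, S)

# Expected payoff under costless judgment (no prediction):
V_judgment = (v**2 * R
              + v * (1 - v) * max(mu * R + (1 - mu) * r, S)
              + v * (1 - v) * max(mu * r + (1 - mu) * R, S)
              + (1 - v)**2 * S)
print(V0, V_judgment)  # 5.0, 6.28 -> judgment raises the expected payoff

# Result 1: max{mu*R + (1-mu)*r, mu*r + (1-mu)*R} = 7 > S = 5, so when judgment
# reveals the risky action is optimal in state 1 only, the decision switches to risky.
print(max(mu * R + (1 - mu) * r, mu * r + (1 - mu) * R) > S)  # True
```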
As we are motivated by understanding the interplay between prediction
and judgment, we want to make these consequential. Therefore, we make the
following assumption to ensure prediction always has some value:
(A2) (Judgment Insufficient) max{μR + (1 − μ)r, μr + (1 − μ)R} ≤ S.
Under this assumption, if different actions are optimal in each state and this is known, the decision maker will not change to the risky action. This, of course, implies that the expected payoff is
7. Bolton and Faure-Grimaud (2009) make the opposite assumption. Here, as our focus is on the impact of prediction, it is better to consider environments where prediction has the effect of reducing uncertainty over riskier actions.
v²R + (1 − v²)S.
Note that, absent any cost, full judgment improves the decision maker’s
expected payoff.
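Continuing the same illustrative numbers, but now choosing μ so that (A2) holds (e.g., μ = 0.5, which makes both mixed payoffs equal to S), the expected payoff with costless judgment reduces to v²R + (1 − v²)S:

```python
# Illustrative parameters satisfying both (A1) and (A2); not from the chapter.
R, S, r = 10.0, 5.0, 0.0
v, mu = 0.4, 0.5

assert v * R + (1 - v) * r <= S                                # (A1)
assert max(mu * R + (1 - mu) * r, mu * r + (1 - mu) * R) <= S  # (A2)

V_judgment = v**2 * R + (1 - v**2) * S
print(V_judgment)  # 5.8 > V0 = 5.0: full judgment still helps, absent any cost
```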
Judgment does not come for free. We assume here that it takes time
(although the formulation would naturally match with the notion that it
takes costly effort). Suppose the discount factor is δ < 1. A decision maker can spend time in a period determining what the optimal action is for a particular state. If they choose to apply judgment with respect to state θᵢ, then there is a probability λᵢ that they will determine the optimal action in that period and can make a choice based on that judgment. Otherwise, they can
choose to apply judgment to that problem in the next period.
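One way to see the cost of this delay (a sketch under our reading of the timing, not a derivation from the chapter): if judgment on a single state succeeds with probability λ in each period it is attempted, and the decision payoff is realized in the period judgment succeeds, then the expected discount factor applied to that payoff is λ/(1 − (1 − λ)δ), which falls as either λ or δ falls.

```python
# Illustrative numbers; lam is the per-period success probability of judgment
# and delta is the discount factor, both chosen arbitrarily for this sketch.
delta, lam = 0.9, 0.6

# Closed form: sum over t >= 0 of lam * ((1 - lam) * delta)**t,
# i.e., succeed after t failed periods, with payoff discounted by delta**t.
closed_form = lam / (1 - (1 - lam) * delta)

# Check against a truncated series.
series = sum(lam * ((1 - lam) * delta) ** t for t in range(1000))
print(round(closed_form, 4), round(series, 4))  # 0.9375 0.9375
```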
It is useful, at this point, to consider what judgment means once it has
been applied. The initial assumption we make here is that the knowledge
of the payoff function depreciates as soon as a decision is made. In other
words, applying judgment can delay a decision (and that is costly) and it
can improve that decision (which is its value) but it cannot generate experi-
ence that can be applied to other decisions (including future ones). In other
words, the initial conception of judgment is the application of thought rather
than the gathering of experience.8 Practically, this reduces our examination
to a static model. However, in a later section, we consider the experience
formulation and demonstrate that most of the insights of the static model
carry over to the dynamic model.
In summary, the timing of the game is as follows:
1. At the beginning of a decision stage, the decision maker chooses
whether to apply judgment and to what state or whether to simply choose
an action without judgment. If an action is chosen, uncertainty is resolved
and payoffs are realized and we move to a new decision stage.
2. If judgment is chosen, with probability 1 − λᵢ, they do not find out the payoffs for the risky action in that state, a period of time elapses and