A Theory of Decision-Making and Artificial Intelligence

by Ajay Agrawal, Joshua Gans, and Avi Goldfarb

ing the different roles of prediction and judgment. Drawing inspiration from Bolton and Faure-Grimaud (2009), we then build the baseline model with two states of the world and uncertainty about payoffs to actions in each state. We explore the value of judgment in the absence of any prediction technology, and then the value of prediction technology when there is no judgment. We finish the discussion of the baseline model with an exploration of the interaction between prediction and judgment, demonstrating that prediction and judgment are complements as long as judgment isn't too difficult. We then separate prediction quality into prediction frequency and prediction accuracy. As judgment improves, accuracy becomes more important relative to frequency. Finally, we examine complex environments where the number of potential states is large. Such environments are common in economic models of automation, contracting, and boundaries of the firm. We show that the effect of improvements in prediction on the importance of judgment depends a great deal on whether the improvements in prediction enable automated decision-making.

  3.2 AI and Prediction Costs

We argue that the recent advances in artificial intelligence are advances in the technology of prediction. Most broadly, we define prediction as the ability to take known information to generate new information. Our model emphasizes prediction about the state of the world.

Most contemporary artificial intelligence research and applications come from a field now called "machine learning." Many of the tools of machine learning have a long history in statistics and data analysis, and are likely familiar to economists and applied statisticians as tools for prediction and classification.2 For example, Alpaydin's (2010) textbook Introduction to Machine Learning covers maximum likelihood estimation, Bayesian estimation, multivariate linear regression, principal components analysis, clustering, and nonparametric regression. In addition, it covers tools that may be less familiar, but also use independent variables to predict outcomes: regression trees, neural networks, hidden Markov models, and reinforcement learning. Hastie, Tibshirani, and Friedman (2009) cover similar topics. The 2014 Journal of Economic Perspectives symposium on big data covered several of these less familiar prediction techniques in articles by Varian (2014) and Belloni, Chernozhukov, and Hansen (2014).

2. We define prediction as using known information to generate new information. Therefore, classification techniques such as clustering are prediction techniques in which the new information to be predicted is the appropriate category or class.

While many of these prediction techniques are not new, recent advances in computer speed, data collection, data storage, and the prediction methods themselves have led to substantial improvements. These improvements have transformed the computer science research field of artificial intelligence. The Oxford English Dictionary defines artificial intelligence as "[t]he theory and development of computer systems able to perform tasks normally requiring human intelligence." In the 1960s and 1970s, artificial intelligence research was primarily rules-based, symbolic logic. It involved human experts generating rules that an algorithm could follow (Domingos 2015, 89). These are not prediction technologies. Such systems became very good chess players and they guided factory robots in highly controlled settings; however, by the 1980s, it became clear that rules-based systems could not deal with the complexity of many nonartificial settings. This led to an "AI winter" in which research funding for artificial intelligence projects largely dried up (Markoff 2015).

Over the past ten years, a different approach to artificial intelligence has taken off. The idea is to program computers to "learn" from example data or experience. In the absence of the ability to predetermine the decision rules, a data-driven prediction approach can conduct many mental tasks. For example, humans are good at recognizing familiar faces, but we would struggle to explain and codify this skill. By connecting data on names to image data on faces, machine learning solves this problem by predicting which image data patterns are associated with which names. As a prominent artificial intelligence researcher put it, "Almost all of AI's recent progress is through one type, in which some input data (A) is used to quickly generate some simple response (B)" (Ng 2016). Thus, the progress is explicitly about improvements in prediction. In other words, the suite of technologies that have given rise to the recent resurgence of interest in artificial intelligence use data collected from sensors, images, videos, typed notes, or anything else that can be represented in bits to fill in missing information, recognize objects, or forecast what will happen next.
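As a minimal sketch of this input-(A)-to-response-(B) pattern, consider a nearest-neighbor rule that predicts a name from an image-like feature vector. All data below are fabricated stand-ins of our own; nothing beyond the idea that labeled examples support prediction comes from the text.

```python
# A toy illustration of prediction from labeled examples: predict a name
# (the "simple response B") from a fabricated feature vector (the "input
# data A"). The feature values and names are invented for illustration.
import math

# Training data: (feature vector, name) pairs linking names to patterns.
examples = [
    ((0.90, 0.10, 0.40), "Ada"),
    ((0.20, 0.80, 0.70), "Grace"),
    ((0.85, 0.20, 0.35), "Ada"),
]

def predict_name(features):
    """Return the name attached to the closest stored example."""
    return min(examples, key=lambda ex: math.dist(ex[0], features))[1]

print(predict_name((0.88, 0.15, 0.38)))   # -> "Ada"
```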

To be clear, we do not take a position on whether these prediction technologies really do mimic the core aspects of human intelligence. While Palm Computing founder Jeff Hawkins argues that human intelligence is—in essence—prediction (Hawkins 2004), many neuroscientists, psychologists, and others disagree. Our point is that the technologies that have been given the label artificial intelligence are prediction technologies. Therefore, in order to understand the impact of these technologies, it is important to assess the impact of prediction on decisions.


3.3 Case: Radiology

Before proceeding to the model, we provide some intuition of how prediction and judgment apply in a particular context where prediction machines are expected to have a large impact: radiology. In 2016, Geoff Hinton—one of the pioneers of deep learning neural networks—stated that it was no longer worth training radiologists. His strong implication was that radiologists would not have a future. This is something that radiologists have been concerned about since 1960 (Lusted 1960). Today, machine-learning techniques are being heavily applied in radiology by IBM using its Watson computer and by a start-up, Enlitic. Enlitic has been able to use deep learning to detect lung nodules (a fairly routine exercise)3 but also fractures (which is more complex). Watson can now identify pulmonary embolism and some other heart issues. These advances are at the heart of Hinton's forecast, but have also been widely discussed among radiologists and pathologists (Jha and Topol 2016). What does the model in this chapter suggest about the future of radiologists?

If we consider a simplified characterization of the job of a radiologist, it would be that they examine an image in order to characterize and classify that image and return an assessment to a physician. While often that assessment is a diagnosis (i.e., "the patient has pneumonia"), in many cases, the assessment is in the negative (i.e., "pneumonia not excluded"). In that regard, this is stated as a predictive task to inform the physician of the likelihood of the state of the world. Using that, the physician can devise a treatment.

These predictions are what machines are aiming to provide. In particular, a machine might provide a differential diagnosis of the following kind:

Based on Mr. Patel's demographics and imaging, the mass in the liver has a 66.6 percent chance of being benign, 33.3 percent chance of being malignant, and a 0.1 percent chance of not being real.4

The action is whether some intervention is needed. For instance, if a potential tumor is identified in a noninvasive scan, then this will inform whether an invasive examination will be conducted. In terms of identifying the state of the world, the invasive exam is costly but safe—it can deduce a cancer with certainty and remove it if necessary. The role of a noninvasive exam is to inform whether an invasive exam should be forgone. That is, it is to make physicians more confident about abstaining from treatment and further analysis. In this regard, if the machine improves prediction, it will lead to fewer invasive examinations.
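A back-of-the-envelope sketch can show how such a predicted differential feeds the intervention decision. The probabilities below are the ones quoted above; the payoff entries attached to each action and state are purely illustrative assumptions of ours, standing in for exactly the judgment discussed below.

```python
# How a machine's predicted differential combines with judged payoffs.
# Probabilities are the quoted differential; payoffs are hypothetical.

# Predicted probabilities for the mass: benign, malignant, not real.
prediction = {"benign": 0.666, "malignant": 0.333, "not_real": 0.001}

# Hypothetical payoffs (the product of judgment): action -> state -> payoff.
payoffs = {
    "biopsy":     {"benign": -1.0, "malignant": 10.0,  "not_real": -1.0},
    "do_nothing": {"benign":  0.0, "malignant": -20.0, "not_real":  0.0},
}

def expected_payoff(action):
    """Expected payoff of an action under the predicted probabilities."""
    return sum(prediction[s] * payoffs[action][s] for s in prediction)

for action in payoffs:
    print(f"{action}: {expected_payoff(action):+.3f}")
print("chosen action:", max(payoffs, key=expected_payoff))
```

With these illustrative numbers the biopsy is chosen; better prediction or different judged payoffs could flip the choice, which is the complementarity the model below formalizes.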

3. "You did not go to medical school to measure lung nodules." http://www.medscape.com/viewarticle/863127#vp_2.

4. http://www.medscape.com/viewarticle/863127#vp_3.


Judgment involves understanding the payoffs. What is the payoff to conducting a biopsy if the mass is benign, malignant, or not real? What is the payoff to not doing anything in those three states? The issue for radiologists in particular is whether a trained specialist radiologist is in the best position to make this judgment, or whether it will occur further along the chain of decision-making or involve new job classes that merge diagnostic information, such as a combined radiologist/pathologist (Jha and Topol 2016). Next, we formalize these ideas.

3.4 Baseline Model

Our baseline model is inspired by the "bandit" environment considered by Bolton and Faure-Grimaud (2009), although it departs significantly in the questions addressed and base assumptions made. Like them, in our baseline model, we suppose there are two states of the world, {θ1, θ2}, with prior probabilities of {μ, 1 – μ}. There are two possible actions: a state-independent action with known payoff of S (safe) and a state-dependent action with two possible payoffs, R or r, as the case may be (risky).

As noted in the introduction, a key departure from the usual assumptions of rational decision-making is that the decision maker does not know the payoff from the risky action in each state and must apply judgment to determine that payoff.5 Moreover, decision makers need to be able to make a judgment for each state that might arise in order to formulate a plan that would be the equivalent of payoff maximization. In the absence of such judgment, the ex ante expectation that the risky action is optimal in any state is v (which is independent between states). To make things more concrete, we assume R > S > r.6 Thus, we assume that v is the probability in any state that the risky payoff is R rather than r. This is not a conditional probability of the state. It is a statement about the payoff, given the state.

In the absence of knowledge regarding the specific payoffs from the risky action, a decision can only be made on the basis of prior probabilities. Then the safe action will be chosen if

μ(vR + (1 – v)r) + (1 – μ)(vR + (1 – v)r) = vR + (1 – v)r ≤ S.

5. Bolton and Faure-Grimaud (2009) consider this step to be the equivalent of a thought experiment where thinking takes time. To the extent that our results can be interpreted as a statement about the comparative advantage of humans, we assume that only humans can do judgment.

6. Thus, we assume that the payoff function, u, can only take one of three values, {R, r, S}. The issue is which combinations of state realization and action lead to which payoffs. However, we assume that S is the payoff from the safe action regardless of state and so this is known to the decision maker. As it is the relative payoffs from actions that drive the results, this assumption is without loss in generality. Requiring this property of the safe action to be discovered would just add an extra cost. Implicitly, as the decision maker cannot make a decision in complete ignorance, we are assuming that the safe action's payoff can be judged at an arbitrarily low cost.


So that the payoff is V0 = max{vR + (1 – v)r, S}. To make things simpler, we will focus our attention on the case where the safe action is—in the absence of prediction or judgment—the default. That is, we assume that

(A1) (Safe Default) vR + (1 – v)r ≤ S.

This assumption is made for simplicity only and will not change the qualitative conclusions.7 Under (A1), in the absence of knowledge of the payoff function or a signal of the state, the decision maker would choose S.
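A quick numerical check of this baseline, with illustrative parameter values of our own choosing that satisfy R > S > r and (A1):

```python
# No-judgment baseline: decide on priors alone and earn V0.
R, S, r = 10.0, 4.0, 0.0   # risky-high, safe, and risky-low payoffs
v = 0.3                    # prob. the risky payoff is R in a given state

risky_on_priors = v * R + (1 - v) * r   # expected risky payoff on priors
assert risky_on_priors <= S             # (A1) Safe Default holds

V0 = max(risky_on_priors, S)
print(f"vR + (1 - v)r = {risky_on_priors}, V0 = {V0}")  # 3.0 and 4.0: choose safe
```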

  3.4.1 Judgment in the Absence of Prediction

Prediction provides knowledge of the state. The process of judgment provides knowledge of the payoff function. Judgment therefore allows the decision maker to understand which action is optimal for a given state should it arise. Suppose that this knowledge is gained without cost (as it would be under the usual assumptions of economic rationality). In other words, the decision maker has knowledge of the optimal action in a given state. Then the risky action will be chosen (a) if it is the preferred action in both states (which arises with probability v²); (b) if it is the preferred action in θ1 but not θ2 and μR + (1 – μ)r > S (with probability v(1 – v)); or (c) if it is the preferred action in θ2 but not θ1 and μr + (1 – μ)R > S (with probability v(1 – v)). Thus, the expected payoff is

v²R + v(1 – v)max{μR + (1 – μ)r, S} + v(1 – v)max{μr + (1 – μ)R, S} + (1 – v)²S.

Note that this is greater than V0. The reason for this is that, when there is uncertainty, judgment is valuable because it can identify actions that are dominant or dominated—that is, that might be optimal across states. In this situation, any resolution of uncertainty does not matter as it will not change the decision made.

  A key insight is that judgment itself can be consequential.

Result 1: If max{μR + (1 – μ)r, μr + (1 – μ)R} > S, it is possible that judgment alone can cause the decision to switch from the default action (safe) to the alternative action (risky).
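Continuing the illustrative numbers from the earlier sketch, with a prior μ of our own choosing under which the condition in Result 1 holds:

```python
# Costless judgment versus the no-judgment baseline (illustrative values).
R, S, r, v = 10.0, 4.0, 0.0, 0.3
mu = 0.5   # prior probability of state theta_1

# Expected payoff under free judgment (the displayed formula above):
V_judged = (v**2 * R
            + v * (1 - v) * max(mu * R + (1 - mu) * r, S)
            + v * (1 - v) * max(mu * r + (1 - mu) * R, S)
            + (1 - v)**2 * S)
V0 = max(v * R + (1 - v) * r, S)

# Here max{mu R + (1 - mu) r, mu r + (1 - mu) R} = 5.0 > S = 4.0, so once
# judgment reveals the risky action is optimal in one state, the decision
# switches from the safe default to the risky action (Result 1).
print(f"V0 = {V0}, payoff with judgment = {V_judged}")   # 4.0 vs. 4.96
```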

As we are motivated by understanding the interplay between prediction and judgment, we want to make these consequential. Therefore, we make the following assumption to ensure prediction always has some value:

(A2) (Judgment Insufficient) max{μR + (1 – μ)r, μr + (1 – μ)R} ≤ S.

Under this assumption, if different actions are optimal in each state and this is known, the decision maker will not change to the risky action. This, of course, implies that the expected payoff is

7. Bolton and Faure-Grimaud (2009) make the opposite assumption. Here, as our focus is on the impact of prediction, it is better to consider environments where prediction has the effect of reducing uncertainty over riskier actions.


v²R + (1 – v²)S.

Note that, absent any cost, full judgment improves the decision maker's expected payoff.
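A brief check that, under (A2), the general expression collapses to v²R + (1 – v²)S: both max{., S} terms equal S. The parameters below are again our own, chosen so that (A2) holds.

```python
# Verify the collapse of the judgment payoff under (A2) numerically.
R, S, r, v, mu = 10.0, 4.0, -4.0, 0.3, 0.5

assert max(mu * R + (1 - mu) * r, mu * r + (1 - mu) * R) <= S   # (A2)

V_general = (v**2 * R
             + v * (1 - v) * max(mu * R + (1 - mu) * r, S)
             + v * (1 - v) * max(mu * r + (1 - mu) * R, S)
             + (1 - v)**2 * S)
V_collapsed = v**2 * R + (1 - v**2) * S
assert abs(V_general - V_collapsed) < 1e-12
print(f"expected payoff = {V_collapsed}")   # 4.54, above V0 = 4.0
```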

Judgment does not come for free. We assume here that it takes time (although the formulation would naturally match with the notion that it takes costly effort). Suppose the discount factor is δ < 1. A decision maker can spend time in a period determining what the optimal action is for a particular state. If they choose to apply judgment with respect to state θi, then there is a probability λi that they will determine the optimal action in that period and can make a choice based on that judgment. Otherwise, they can choose to apply judgment to that problem in the next period.
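Under our reading of this timing (success with probability λi in the current period, failure deferring the problem one discounted period), the value of a payoff x reached through repeated judgment solves the recursion V = λx + (1 – λ)δV. This recursion is our own gloss, not a formula quoted from the text; a minimal sketch:

```python
# Effective discount imposed by time-consuming judgment: each period the
# attempt succeeds with probability lam (the chapter's lambda_i for the
# state being judged); failure defers one period, discounted by delta.
def judged_value(x, lam, delta):
    """Expected discounted value of payoff x obtained via repeated judgment,
    from V = lam * x + (1 - lam) * delta * V."""
    return lam * x / (1 - (1 - lam) * delta)

delta = 0.9
for lam in (0.2, 0.5, 0.9):
    print(f"lam = {lam}: multiplier on x = {judged_value(1.0, lam, delta):.3f}")
# As lam -> 1 the multiplier -> 1 (judgment is effectively immediate);
# as lam -> 0 it -> 0 (the decision is delayed indefinitely).
```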

It is useful, at this point, to consider what judgment means once it has been applied. The initial assumption we make here is that the knowledge of the payoff function depreciates as soon as a decision is made. In other words, applying judgment can delay a decision (and that is costly) and it can improve that decision (which is its value), but it cannot generate experience that can be applied to other decisions (including future ones). In other words, the initial conception of judgment is the application of thought rather than the gathering of experience.8 Practically, this reduces our examination to a static model. However, in a later section, we consider the experience formulation and demonstrate that most of the insights of the static model carry over to the dynamic model.

In summary, the timing of the game is as follows:

1. At the beginning of a decision stage, the decision maker chooses whether to apply judgment and to what state, or whether to simply choose an action without judgment. If an action is chosen, uncertainty is resolved, payoffs are realized, and we move to a new decision stage.

2. If judgment is chosen, with probability 1 – λi, they do not find out the payoffs for the risky action in that state, a period of time elapses and

 
