A Theory of Decision-Making and Artificial Intelligence

by Ajay Agrawal, Joshua Gans, and Avi Goldfarb

ing the different roles of prediction and judgment. Drawing inspiration from Bolton and Faure-Grimaud (2009), we then build the baseline model with two states of the world and uncertainty about payoffs to actions in each state. We explore the value of judgment in the absence of any prediction technology, and then the value of prediction technology when there is no judgment. We finish the discussion of the baseline model with an exploration of the interaction between prediction and judgment, demonstrating that prediction and judgment are complements as long as judgment isn't too difficult. We then separate prediction quality into prediction frequency and prediction accuracy. As judgment improves, accuracy becomes more important relative to frequency. Finally, we examine complex environments where the number of potential states is large. Such environments are common in economic models of automation, contracting, and boundaries of the firm. We show that the effect of improvements in prediction on the importance of judgment depends a great deal on whether the improvements in prediction enable automated decision-making.

  3.2 AI and Prediction Costs

We argue that the recent advances in artificial intelligence are advances in the technology of prediction. Most broadly, we define prediction as the ability to take known information to generate new information. Our model emphasizes prediction about the state of the world.

Most contemporary artificial intelligence research and applications come from a field now called "machine learning." Many of the tools of machine learning have a long history in statistics and data analysis, and are likely familiar to economists and applied statisticians as tools for prediction and classification.2 For example, Alpaydin's (2010) textbook Introduction to Machine Learning covers maximum likelihood estimation, Bayesian estimation, multivariate linear regression, principal components analysis, clustering, and nonparametric regression. In addition, it covers tools that may be less familiar, but also use independent variables to predict outcomes: regression trees, neural networks, hidden Markov models, and reinforcement learning. Hastie, Tibshirani, and Friedman (2009) cover similar topics. The 2014 Journal of Economic Perspectives symposium on big data covered several of these less familiar prediction techniques in articles by Varian (2014) and Belloni, Chernozhukov, and Hansen (2014).

2. We define prediction as using known information to generate new information. Therefore, classification techniques such as clustering are prediction techniques in which the new information to be predicted is the appropriate category or class.

While many of these prediction techniques are not new, recent advances in computer speed, data collection, data storage, and the prediction methods themselves have led to substantial improvements. These improvements have transformed the computer science research field of artificial intelligence. The Oxford English Dictionary defines artificial intelligence as "[t]he theory and development of computer systems able to perform tasks normally requiring human intelligence." In the 1960s and 1970s, artificial intelligence research was primarily rules-based, symbolic logic. It involved human experts generating rules that an algorithm could follow (Domingos 2015, 89). These are not prediction technologies. Such systems became very good chess players and they guided factory robots in highly controlled settings; however, by the 1980s, it became clear that rules-based systems could not deal with the complexity of many nonartificial settings. This led to an "AI winter" in which research funding for artificial intelligence projects largely dried up (Markoff 2015).

Over the past ten years, a different approach to artificial intelligence has taken off. The idea is to program computers to "learn" from example data or experience. In the absence of the ability to predetermine the decision rules, a data-driven prediction approach can conduct many mental tasks. For example, humans are good at recognizing familiar faces, but we would struggle to explain and codify this skill. By connecting data on names to image data on faces, machine learning solves this problem by predicting which image data patterns are associated with which names. As a prominent artificial intelligence researcher put it, "Almost all of AI's recent progress is through one type, in which some input data (A) is used to quickly generate some simple response (B)" (Ng 2016). Thus, the progress is explicitly about improvements in prediction. In other words, the suite of technologies that have given rise to the recent resurgence of interest in artificial intelligence use data collected from sensors, images, videos, typed notes, or anything else that can be represented in bits to fill in missing information, recognize objects, or forecast what will happen next.
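As a minimal sketch of this input-(A)-to-response-(B) pattern, consider a nearest-neighbor rule that predicts a name from an image-like feature vector. All data below are fabricated stand-ins of our own; nothing beyond the idea that labeled examples support prediction comes from the text.

```python
# A toy illustration of prediction from labeled examples: predict a name
# (the "simple response B") from a fabricated feature vector (the "input
# data A"). The feature values and names are invented for illustration.
import math

# Training data: (feature vector, name) pairs linking names to patterns.
examples = [
    ((0.90, 0.10, 0.40), "Ada"),
    ((0.20, 0.80, 0.70), "Grace"),
    ((0.85, 0.20, 0.35), "Ada"),
]

def predict_name(features):
    """Return the name attached to the closest stored example."""
    return min(examples, key=lambda ex: math.dist(ex[0], features))[1]

print(predict_name((0.88, 0.15, 0.38)))   # -> "Ada"
```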

To be clear, we do not take a position on whether these prediction technologies really do mimic the core aspects of human intelligence. While Palm Computing founder Jeff Hawkins argues that human intelligence is—in essence—prediction (Hawkins 2004), many neuroscientists, psychologists, and others disagree. Our point is that the technologies that have been given the label artificial intelligence are prediction technologies. Therefore, in order to understand the impact of these technologies, it is important to assess the impact of prediction on decisions.


3.3 Case: Radiology

Before proceeding to the model, we provide some intuition of how prediction and judgment apply in a particular context where prediction machines are expected to have a large impact: radiology. In 2016, Geoff Hinton—one of the pioneers of deep learning neural networks—stated that it was no longer worth training radiologists. His strong implication was that radiologists would not have a future. This is something that radiologists have been concerned about since 1960 (Lusted 1960). Today, machine-learning techniques are being heavily applied in radiology by IBM using its Watson computer and by a start-up, Enlitic. Enlitic has been able to use deep learning to detect lung nodules (a fairly routine exercise)3 but also fractures (which is more complex). Watson can now identify pulmonary embolism and some other heart issues. These advances are at the heart of Hinton's forecast, but have also been widely discussed among radiologists and pathologists (Jha and Topol 2016). What does the model in this chapter suggest about the future of radiologists?

If we consider a simplified characterization of the job of a radiologist, it would be that they examine an image in order to characterize and classify that image and return an assessment to a physician. While often that assessment is a diagnosis (i.e., "the patient has pneumonia"), in many cases, the assessment is in the negative (i.e., "pneumonia not excluded"). In that regard, this is stated as a predictive task to inform the physician of the likelihood of the state of the world. Using that, the physician can devise a treatment.

These predictions are what machines are aiming to provide. In particular, a machine might provide a differential diagnosis of the following kind:

Based on Mr. Patel's demographics and imaging, the mass in the liver has a 66.6 percent chance of being benign, 33.3 percent chance of being malignant, and a 0.1 percent chance of not being real.4

The action is whether some intervention is needed. For instance, if a potential tumor is identified in a noninvasive scan, then this will inform whether an invasive examination will be conducted. In terms of identifying the state of the world, the invasive exam is costly but safe—it can deduce a cancer with certainty and remove it if necessary. The role of a noninvasive exam is to inform whether an invasive exam should be forgone. That is, it is to make physicians more confident about abstaining from treatment and further analysis. In this regard, if the machine improves prediction, it will lead to fewer invasive examinations.
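A back-of-the-envelope sketch can show how such a predicted differential feeds the intervention decision. The probabilities below are the ones quoted above; the payoff entries attached to each action and state are purely illustrative assumptions of ours, standing in for exactly the judgment discussed below.

```python
# How a machine's predicted differential combines with judged payoffs.
# Probabilities are the quoted differential; payoffs are hypothetical.

# Predicted probabilities for the mass: benign, malignant, not real.
prediction = {"benign": 0.666, "malignant": 0.333, "not_real": 0.001}

# Hypothetical payoffs (the product of judgment): action -> state -> payoff.
payoffs = {
    "biopsy":     {"benign": -1.0, "malignant": 10.0,  "not_real": -1.0},
    "do_nothing": {"benign":  0.0, "malignant": -20.0, "not_real":  0.0},
}

def expected_payoff(action):
    """Expected payoff of an action under the predicted probabilities."""
    return sum(prediction[s] * payoffs[action][s] for s in prediction)

for action in payoffs:
    print(f"{action}: {expected_payoff(action):+.3f}")
print("chosen action:", max(payoffs, key=expected_payoff))
```

With these illustrative numbers the biopsy is chosen; better prediction or different judged payoffs could flip the choice, which is the complementarity the model below formalizes.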

3. "You did not go to medical school to measure lung nodules." http://www.medscape.com/viewarticle/863127#vp_2.

4. http://www.medscape.com/viewarticle/863127#vp_3.


Judgment involves understanding the payoffs. What is the payoff to conducting a biopsy if the mass is benign, malignant, or not real? What is the payoff to not doing anything in those three states? The issue for radiologists in particular is whether a trained specialist radiologist is in the best position to make this judgment, or whether it will occur further along the chain of decision-making or involve new job classes that merge diagnostic information, such as a combined radiologist/pathologist (Jha and Topol 2016). Next, we formalize these ideas.

3.4 Baseline Model

Our baseline model is inspired by the "bandit" environment considered by Bolton and Faure-Grimaud (2009), although it departs significantly in the questions addressed and base assumptions made. Like them, in our baseline model, we suppose there are two states of the world, {θ1, θ2}, with prior probabilities of {μ, 1 – μ}. There are two possible actions: a state-independent action with known payoff of S (safe) and a state-dependent action with two possible payoffs, R or r, as the case may be (risky).

As noted in the introduction, a key departure from the usual assumptions of rational decision-making is that the decision maker does not know the payoff from the risky action in each state and must apply judgment to determine that payoff.5 Moreover, decision makers need to be able to make a judgment for each state that might arise in order to formulate a plan that would be the equivalent of payoff maximization. In the absence of such judgment, the ex ante expectation that the risky action is optimal in any state is v (which is independent between states). To make things more concrete, we assume R > S > r.6 Thus, we assume that v is the probability in any state that the risky payoff is R rather than r. This is not a conditional probability of the state. It is a statement about the payoff, given the state.

In the absence of knowledge regarding the specific payoffs from the risky action, a decision can only be made on the basis of prior probabilities. Then the safe action will be chosen if

μ(vR + (1 – v)r) + (1 – μ)(vR + (1 – v)r) = vR + (1 – v)r ≤ S.

5. Bolton and Faure-Grimaud (2009) consider this step to be the equivalent of a thought experiment where thinking takes time. To the extent that our results can be interpreted as a statement about the comparative advantage of humans, we assume that only humans can do judgment.

6. Thus, we assume that the payoff function, u, can only take one of three values, {R, r, S}. The issue is which combinations of state realization and action lead to which payoffs. However, we assume that S is the payoff from the safe action regardless of state and so this is known to the decision maker. As it is the relative payoffs from actions that drive the results, this assumption is without loss in generality. Requiring this property of the safe action to be discovered would just add an extra cost. Implicitly, as the decision maker cannot make a decision in complete ignorance, we are assuming that the safe action's payoff can be judged at an arbitrarily low cost.


So that the payoff is V0 = max{vR + (1 – v)r, S}. To make things simpler, we will focus our attention on the case where the safe action is—in the absence of prediction or judgment—the default. That is, we assume that

(A1) (Safe Default) vR + (1 – v)r ≤ S.

This assumption is made for simplicity only and will not change the qualitative conclusions.7 Under (A1), in the absence of knowledge of the payoff function or a signal of the state, the decision maker would choose S.
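A quick numerical check of this baseline, with illustrative parameter values of our own choosing that satisfy R > S > r and (A1):

```python
# No-judgment baseline: decide on priors alone and earn V0.
R, S, r = 10.0, 4.0, 0.0   # risky-high, safe, and risky-low payoffs
v = 0.3                    # prob. the risky payoff is R in a given state

risky_on_priors = v * R + (1 - v) * r   # expected risky payoff on priors
assert risky_on_priors <= S             # (A1) Safe Default holds

V0 = max(risky_on_priors, S)
print(f"vR + (1 - v)r = {risky_on_priors}, V0 = {V0}")  # 3.0 and 4.0: choose safe
```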

  3.4.1 Judgment in the Absence of Prediction

Prediction provides knowledge of the state. The process of judgment provides knowledge of the payoff function. Judgment therefore allows the decision maker to understand which action is optimal for a given state should it arise. Suppose that this knowledge is gained without cost (as it would be under the usual assumptions of economic rationality). In other words, the decision maker has knowledge of the optimal action in a given state. Then the risky action will be chosen (a) if it is the preferred action in both states (which arises with probability v²); (b) if it is the preferred action in θ1 but not θ2 and μR + (1 – μ)r > S (with probability v(1 – v)); or (c) if it is the preferred action in θ2 but not θ1 and μr + (1 – μ)R > S (with probability v(1 – v)). Thus, the expected payoff is

v²R + v(1 – v)max{μR + (1 – μ)r, S} + v(1 – v)max{μr + (1 – μ)R, S} + (1 – v)²S.

Note that this is greater than V0. The reason for this is that, when there is uncertainty, judgment is valuable because it can identify actions that are dominant or dominated—that is, that might be optimal across states. In this situation, any resolution of uncertainty does not matter as it will not change the decision made.

  A key insight is that judgment itself can be consequential.

Result 1: If max{μR + (1 – μ)r, μr + (1 – μ)R} > S, it is possible that judgment alone can cause the decision to switch from the default action (safe) to the alternative action (risky).
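Continuing the illustrative numbers from the earlier sketch, with a prior μ of our own choosing under which the condition in Result 1 holds:

```python
# Costless judgment versus the no-judgment baseline (illustrative values).
R, S, r, v = 10.0, 4.0, 0.0, 0.3
mu = 0.5   # prior probability of state theta_1

# Expected payoff under free judgment (the displayed formula above):
V_judged = (v**2 * R
            + v * (1 - v) * max(mu * R + (1 - mu) * r, S)
            + v * (1 - v) * max(mu * r + (1 - mu) * R, S)
            + (1 - v)**2 * S)
V0 = max(v * R + (1 - v) * r, S)

# Here max{mu R + (1 - mu) r, mu r + (1 - mu) R} = 5.0 > S = 4.0, so once
# judgment reveals the risky action is optimal in one state, the decision
# switches from the safe default to the risky action (Result 1).
print(f"V0 = {V0}, payoff with judgment = {V_judged}")   # 4.0 vs. 4.96
```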

As we are motivated by understanding the interplay between prediction and judgment, we want to make these consequential. Therefore, we make the following assumption to ensure prediction always has some value:

(A2) (Judgment Insufficient) max{μR + (1 – μ)r, μr + (1 – μ)R} ≤ S.

Under this assumption, if different actions are optimal in each state and this is known, the decision maker will not change to the risky action. This, of course, implies that the expected payoff is

7. Bolton and Faure-Grimaud (2009) make the opposite assumption. Here, as our focus is on the impact of prediction, it is better to consider environments where prediction has the effect of reducing uncertainty over riskier actions.


v²R + (1 – v²)S.

Note that, absent any cost, full judgment improves the decision maker's expected payoff.
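A brief check that, under (A2), the general expression collapses to v²R + (1 – v²)S: both max{., S} terms equal S. The parameters below are again our own, chosen so that (A2) holds.

```python
# Verify the collapse of the judgment payoff under (A2) numerically.
R, S, r, v, mu = 10.0, 4.0, -4.0, 0.3, 0.5

assert max(mu * R + (1 - mu) * r, mu * r + (1 - mu) * R) <= S   # (A2)

V_general = (v**2 * R
             + v * (1 - v) * max(mu * R + (1 - mu) * r, S)
             + v * (1 - v) * max(mu * r + (1 - mu) * R, S)
             + (1 - v)**2 * S)
V_collapsed = v**2 * R + (1 - v**2) * S
assert abs(V_general - V_collapsed) < 1e-12
print(f"expected payoff = {V_collapsed}")   # 4.54, above V0 = 4.0
```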

Judgment does not come for free. We assume here that it takes time (although the formulation would naturally match with the notion that it takes costly effort). Suppose the discount factor is δ < 1. A decision maker can spend time in a period determining what the optimal action is for a particular state. If they choose to apply judgment with respect to state θi, then there is a probability λi that they will determine the optimal action in that period and can make a choice based on that judgment. Otherwise, they can choose to apply judgment to that problem in the next period.
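Under our reading of this timing (success with probability λi in the current period, failure deferring the problem one discounted period), the value of a payoff x reached through repeated judgment solves the recursion V = λx + (1 – λ)δV. This recursion is our own gloss, not a formula quoted from the text; a minimal sketch:

```python
# Effective discount imposed by time-consuming judgment: each period the
# attempt succeeds with probability lam (the chapter's lambda_i for the
# state being judged); failure defers one period, discounted by delta.
def judged_value(x, lam, delta):
    """Expected discounted value of payoff x obtained via repeated judgment,
    from V = lam * x + (1 - lam) * delta * V."""
    return lam * x / (1 - (1 - lam) * delta)

delta = 0.9
for lam in (0.2, 0.5, 0.9):
    print(f"lam = {lam}: multiplier on x = {judged_value(1.0, lam, delta):.3f}")
# As lam -> 1 the multiplier -> 1 (judgment is effectively immediate);
# as lam -> 0 it -> 0 (the decision is delayed indefinitely).
```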

It is useful, at this point, to consider what judgment means once it has been applied. The initial assumption we make here is that the knowledge of the payoff function depreciates as soon as a decision is made. In other words, applying judgment can delay a decision (and that is costly) and it can improve that decision (which is its value), but it cannot generate experience that can be applied to other decisions (including future ones). In other words, the initial conception of judgment is the application of thought rather than the gathering of experience.8 Practically, this reduces our examination to a static model. However, in a later section, we consider the experience formulation and demonstrate that most of the insights of the static model carry over to the dynamic model.

In summary, the timing of the game is as follows:

1. At the beginning of a decision stage, the decision maker chooses whether to apply judgment and to what state, or whether to simply choose an action without judgment. If an action is chosen, uncertainty is resolved, payoffs are realized, and we move to a new decision stage.

2. If judgment is chosen, with probability 1 – λi, they do not find out the payoffs for the risky action in that state, a period of time elapses and

 
