3
Prediction, Judgment, and Complexity
A Theory of Decision-Making and Artificial Intelligence
Ajay Agrawal, Joshua Gans, and Avi Goldfarb

Ajay Agrawal is the Peter Munk Professor of Entrepreneurship at the Rotman School of Management, University of Toronto, and a research associate of the National Bureau of Economic Research. Joshua Gans is professor of strategic management and holder of the Jeffrey S. Skoll Chair of Technical Innovation and Entrepreneurship at the Rotman School of Management, University of Toronto (with a cross appointment in the Department of Economics), and a research associate of the National Bureau of Economic Research. Avi Goldfarb holds the Rotman Chair in Artificial Intelligence and Healthcare and is professor of marketing at the Rotman School of Management, University of Toronto, and is a research associate of the National Bureau of Economic Research.

Our thanks to Andrea Prat, Scott Stern, Hal Varian, and participants at the AEA (Chicago), NBER Summer Institute (2017), NBER Economics of AI Conference (Toronto), Columbia Law School, Harvard Business School, MIT, and University of Toronto for helpful comments. Responsibility for all errors remains our own. The latest version of this chapter is available at joshuagans.com. For acknowledgments, sources of research support, and disclosure of the authors’ material financial relationships, if any, please see http://www.nber.org/chapters/c14010.ack.
3.1 Introduction
There is widespread discussion regarding the impact of machines on
employment (see Autor 2015). In some sense, the discussion mirrors a long-
standing literature on the impact of the accumulation of capital equipment
on employment; specifically, whether capital and labor are substitutes or
complements (Acemoglu 2003). But the recent discussion is motivated by
the integration of software with hardware and whether the role of machines
goes beyond physical tasks to mental ones as well (Brynjolfsson and McAfee
2014). Because mental tasks were seen as ever-present and essential,
human comparative advantage in them was taken to be the main reason why, at
least in the long term, capital accumulation would complement employment
by enhancing labor productivity in those tasks.
The computer revolution has blurred the line between physical and mental
tasks. For instance, the invention of the spreadsheet in the late 1970s
fundamentally changed the role of bookkeepers. Prior to that invention,
there was a time-intensive task involving the recomputation of outcomes in
spreadsheets as data or assumptions changed. That human task was substi-
tuted by the spreadsheet software that could produce the calculations more
quickly, cheaply, and frequently. However, at the same time, the spreadsheet
made the jobs of accountants, analysts, and others far more productive.
In the accounting books, capital was substituting for labor, but the mental
productivity of labor was being changed. Thus, the impact on employment
critically depended on whether there were tasks that “computers cannot do.”
These assumptions persist in models today. Acemoglu and Restrepo
(2017) observe that capital substitutes for labor in certain tasks while at the
same time technological progress creates new tasks. They make what they
call a “natural assumption” that only labor can perform the new tasks as
they are more complex than previous ones.1 Benzell et al. (2015) consider
the impact of software more explicitly. Their environment has two types of
labor—high-tech (who can, among other things, code) and low-tech (who
are empathetic and can handle interpersonal tasks). In this environment,
it is the low-tech workers who cannot be replaced by machines, while the
high-tech ones are employed initially to create the code that will eventually
displace their kind. The results of the model depend, therefore, not only on a class
of worker who cannot be substituted directly for capital, but also on the
inability of workers themselves to substitute between classes.
In this chapter, our approach is to delve into the weeds of what is hap-
pening currently in the field of artificial intelligence (AI). The recent wave
of developments in AI all involve advances in machine learning. Those
advances allow for automated and cheap prediction; that is, providing a
forecast (or nowcast) of a variable of interest from available data (Agrawal,
Gans, and Goldfarb 2018b). In some cases, prediction has enabled full auto-
mation of tasks—for example, self-driving vehicles where the process of
data collection, prediction of behavior and surroundings, and actions are
all conducted without a human in the loop. In other cases, prediction is a
standalone tool—such as image recognition or fraud detection—that may
or may not lead to further substitution of human users of such tools by
machines. Thus far, substitution between humans and machines has focused
mainly on cost considerations. Are machines cheaper, more reliable, and
more scalable (in their software form) than humans? This chapter, however,
considers the role of prediction in decision-making explicitly and from that
examines the complementary skills that may be matched with prediction
within a task.
1. To be sure, their model is designed to examine how automation of tasks causes a change in factor prices that biases innovation toward the creation of new tasks that labor is more suited to.
Our focus, in this regard, is on what we term judgment. While judgment
is a term with broad meaning, here we use it to refer to a very specific skill.
To see this, consider a decision. That decision involves choosing an action, x, from a set, X. The payoff (or reward) from that action is defined by a function, u(x, θ), where θ is a realization of an uncertain state drawn from a distribution, F(θ). Suppose that, prior to making a decision, a prediction (or signal), s, can be generated that results in a posterior, F(θ | s). Thus, the decision maker would solve

$$\max_{x \in X} \int u(x, \theta)\, dF(\theta \mid s).$$
In other words, a standard problem of choice under uncertainty. In this
standard world, the role of prediction is to improve decision-making. The
payoff, or utility function, is known.
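To make the formal problem concrete, the following minimal sketch (ours, not the chapter's) evaluates that maximization for a discrete state space, where the posterior F(θ | s) is represented as a probability vector; the actions, states, payoffs, and posterior below are all hypothetical.

```python
import numpy as np

# A discrete sketch of max_{x in X} ∫ u(x, θ) dF(θ | s):
# the payoff function u is known (as in the standard setup), and the
# prediction s has already been folded into the posterior over states.
actions = ["x1", "x2"]              # the action set X
posterior = np.array([0.7, 0.3])    # F(θ | s) over two states θ1, θ2

# u[i, j] = u(actions[i], θj); hypothetical numbers.
u = np.array([
    [5.0, -10.0],   # x1 pays off well in θ1, badly in θ2
    [1.0,   1.0],   # x2 is a safe, state-independent action
])

expected_u = u @ posterior          # ∫ u(x, θ) dF(θ | s) for each x
best_action = actions[int(np.argmax(expected_u))]
print(expected_u, "->", best_action)  # [0.5 1. ] -> x2
```

A sharper prediction, say a posterior of (0.9, 0.1), would flip the choice to x1, which is exactly the sense in which prediction improves decision-making in this standard world.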
To create a role for judgment, we depart from this standard set-up in
statistical decision theory and ask how a decision maker comes to know the
function, u(x, θ)? We assume that this is not simply given or a primitive of the
decision-making model. Instead, it requires a human to undertake a costly
process that allows the mapping from (x, θ) to a particular payoff value, u, to be
discovered. This is a reasonable assumption given that, beyond some rudimentary
experimentation in closed environments, there is no current way for
an AI to impute a utility function that resides with humans. Additionally, this
process separates the costs of providing the mapping for each pair, (x, θ).
(Actually, we focus, without loss of generality, on situations where u(x, θ) ≠
u(x) for all θ, and presume that if a payoff to an action is state independent, that payoff is known.) In other words, while prediction can obtain a signal
of the underlying state, judgment is the process by which the payoffs from
actions that arise based on that state can be determined. We assume that
this process of determining payoffs requires human understanding of the
situation: it is not a prediction problem.
For intuition on the difference between prediction and judgment, consider
the example of credit card fraud. A bank observes a credit card transaction.
That transaction is either legitimate or fraudulent. The decision is whether
to approve the transaction. If the bank knows for sure that the transaction
is legitimate, the bank will approve it. If the bank knows for sure that it is
fraudulent, the bank will refuse the transaction. Why? Because the bank
knows the payoff of approving a legitimate transaction is higher than the
payoff of refusing that transaction. Things get more interesting if the bank
is uncertain about whether the transaction is legitimate. The uncertainty
means that the bank also needs to know the payoff from refusing a legitimate
transaction and from approving a fraudulent transaction. In our model,
judgment is the process of determining these payoffs. It is a costly activity,
in the sense that it requires time and effort.
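The fraud example can be written down in the same form. The sketch below (again ours, with made-up payoff numbers) shows how the four payoffs that judgment must determine combine with the predicted probability that a transaction is legitimate to yield an approve/refuse rule.

```python
def approve(p_legit: float,
            u_approve_legit: float = 1.0,    # payoff: approve a legitimate transaction
            u_approve_fraud: float = -20.0,  # payoff: approve a fraudulent one
            u_refuse_legit: float = -0.5,    # payoff: refuse a legitimate customer
            u_refuse_fraud: float = 0.0) -> bool:
    """Approve iff the expected payoff of approving beats refusing.

    p_legit is the prediction; the four payoff arguments are what
    judgment must supply. All numbers here are hypothetical.
    """
    eu_approve = p_legit * u_approve_legit + (1 - p_legit) * u_approve_fraud
    eu_refuse = p_legit * u_refuse_legit + (1 - p_legit) * u_refuse_fraud
    return eu_approve >= eu_refuse

# With these payoffs the bank approves only when p_legit >= 20/21.5 ≈ 0.93:
for p in (0.99, 0.95, 0.90):
    print(p, approve(p))  # True, True, False
```

Note how certainty makes half of the judgment unnecessary: at p_legit = 1 only the payoffs in the legitimate state matter, which is why the text observes that judgment becomes interesting precisely when the bank is uncertain.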
As the new developments regarding AI all involve making prediction
more readily available, we ask: How do judgment and its endogenous application
change the value of prediction? Are prediction and judgment substitutes
or complements? Does the value of prediction change monotonically
with the difficulty of applying judgment? In complex environments
(as they relate to automation, contracting, and the boundaries of the firm),
how do improvements in prediction affect the value of judgment?
We proceed by first providing supportive evidence for our assumption that
recent developments in AI overwhelmingly impact the costs of prediction.
We then use the example of radiology to provide a context for understand-