tal data on bargaining and on risky choice. The second idea is that some common limits on human prediction might be understood as the kinds of errors made by poor implementations of machine learning. The third idea is that it is important to study how AI technology used in firms and other institutions can both overcome and exploit human limits. The fullest understanding of this tech-human interaction will require new knowledge from behavioral economics about attention, the nature of assembled preferences, and perceived fairness.
24.2 Machine Learning to Find Behavioral Variables
Behavioral economics can be defined as the study of natural limits on computation, willpower, and self-interest, and the implications of those
Colin F. Camerer is the Robert Kirby Professor of Behavioral Finance and Economics at the California Institute of Technology.
For acknowledgments, sources of research support, and disclosure of the author's material financial relationships, if any, please see http://www.nber.org/chapters/c14013.ack.
588 Colin F. Camerer
limits for economic analysis (market equilibrium, IO, public finance, etc.). A different approach is to define behavioral economics more generally, as simply being open-minded about what variables are likely to influence economic choices.
This open-mindedness can be defined by listing neighboring social sciences that are likely to be the most fruitful source of explanatory variables. These include psychology, sociology (e.g., norms), anthropology (cultural variation in cognition), neuroscience, political science, and so forth. Call this the "behavioral economics trades with its neighbors" view.
But the open-mindedness could also be characterized even more generally, as an invitation to machine-learn how to predict economic outcomes from the largest possible feature set. In the "trades with its neighbors" view, features are constructs that are contributed by different neighboring sciences. These could be loss aversion, identity, moral norms, in-group preference, inattention, habit, model-free reinforcement learning, individual polygenic scores, and so forth.
But why stop there?
In a general ML approach, predictive features could be—and should be—any variables that predict. (For policy purposes, variables that could be controlled by people, firms, and governments may be of special interest.) These variables can be measurable properties of choices, the set of choices, affordances and motor interactions during choosing, measures of attention, psychophysiological measures of biological states, social influences, properties of individuals who are doing the choosing (SES, wealth, moods, personality, genes), and so forth. The more variables, the merrier.
From this perspective, we can think about what sets of features are contributed by different disciplines and theories. What features does textbook economic theory contribute? Constrained utility maximization in its most familiar and simple form points to only three kinds of variables—prices, information (which can inform utilities), and constraints.
Most propositions in behavioral economics add some variables to this list of features, such as reference-dependence, context-dependence (menu effects), anchoring, limited attention, social preference, and so forth.
Going beyond familiar theoretical constructs, the ML approach to behavioral economics specifies a very long list of candidate variables (= features) and includes all of them in the analysis. This approach has two advantages: First, simple theories can be seen as bets that only a small number of features will predict well; that is, some effects (such as prices) are hypothesized to be first-order in magnitude. Second, if longer lists of features predict better than a short list of theory-specified features, then that finding establishes a plausible upper bound on how much potential predictability is left to understand. The results are also likely to create raw material for theory to figure out how to consolidate the additional predictive power into crystallized theory (see also Kleinberg, Liang, and Mullainathan 2015).
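The idea that simple theories are bets on a small number of predictive features can be illustrated with a toy regularized regression. The sketch below is purely synthetic and is not drawn from any study: an L1-penalized (LASSO-style) fit is given forty candidate features of which only three truly matter, and soft-thresholding zeroes out most of the irrelevant ones.

```python
import numpy as np

# Toy illustration on synthetic data (not from any study): a long list of
# candidate features is fed to an L1-penalized regression, which "bets"
# that only a few coefficients are truly nonzero.
def lasso_cd(X, y, lam, sweeps=200):
    """LASSO fit via cyclic coordinate descent with soft-thresholding."""
    n, k = X.shape
    b = np.zeros(k)
    col_ss = (X ** 2).sum(axis=0)           # per-column sum of squares
    for _ in range(sweeps):
        for j in range(k):
            r = y - X @ b + X[:, j] * b[j]  # residual excluding feature j
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return b

rng = np.random.default_rng(0)
n, k = 500, 40                              # 40 candidate features
X = rng.normal(size=(n, k))
beta = np.zeros(k)
beta[:3] = [2.0, -1.5, 1.0]                 # only three features truly predict
y = X @ beta + rng.normal(scale=0.5, size=n)

coef = lasso_cd(X, y, lam=50.0)
selected = np.flatnonzero(np.abs(coef) > 1e-8)
```

With a penalty of this size the fit recovers a sparse model; weakening the penalty lets more spurious features in, which is exactly the trade-off between short theory-specified lists and long ML feature lists.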
Artificial Intelligence and Behavioral Economics 589
If behavioral economics is recast as open-mindedness about what variables might predict, then ML is an ideal way to do behavioral economics because it can make use of a wide set of variables and select which ones predict. I will illustrate it with some examples.
Bargaining. There is a long history of bargaining experiments trying to predict what bargaining outcomes (and disagreement rates) will result from structural variables using game-theoretic methods. In the 1980s there was a sharp turn in experimental work toward noncooperative approaches in which the communication and structure of bargaining were carefully structured (e.g., Roth 1995 and Camerer 2003 for reviews). In these experiments the possible sequence of offers in the bargaining is heavily constrained and no communication is allowed (beyond the offers themselves). This shift to highly structured paradigms occurred because game theory, at the time, delivered sharp, nonobvious new predictions about what outcomes might result depending on the structural parameters—particularly, costs of delay, time horizon, the exogenous order of offers and acceptance, and available outside options (payoffs upon disagreement). Given the difficulty of measuring or controlling these structural variables in most field settings, experiments provided a natural way to test these structured-bargaining theories.1
Early experiments made it clear that concerns for fairness or outcomes of others influenced utility, and the planning ahead assumed in subgame perfect theories is limited and cognitively unnatural (Camerer et al. 1994; Johnson et al. 2002; Binmore et al. 2002). Experimental economists became wrapped up in understanding the nature of apparent social preferences and limited planning in structured bargaining.
However, most natural bargaining is not governed by rules about structure as simple as those in the theories and experiments that were the focus from 1985 to 2000 and beyond. Natural bargaining is typically "semi-structured"—that is, there is a hard deadline and a protocol for what constitutes an agreement, and otherwise there are no restrictions on which party can make what offers at what time, including the use of natural language, face-to-face meetings or use of agents, and so on.
The revival of experimental study of unstructured bargaining is a good idea for three reasons (see also Karagözoğlu, forthcoming). First, there are now a lot more ways to measure what happens during bargaining in laboratory conditions (and probably in field settings as well). Second, the large number of features that can now be generated are ideal inputs for ML to predict bargaining outcomes. Third, even when bargaining is unstructured it is possible to produce bold, nonobvious, precise predictions (thanks to the revelation principle). As we will see, ML can then test whether the features
1. Examples include Binmore, Shaked, and Sutton (1985, 1989); Neelin, Sonnenschein, and Spiegel (1988); Camerer et al. (1994); and Binmore et al. (2002).
Fig. 24.1 A, initial offer screen (for informed player I, white bar); B, example cursor locations after three seconds (indicating amount offered by I, white, or demanded by U, dark gray); C, cursor bars match, which indicates an offer, consummated at six seconds; D, feedback screen for player I. Player U also receives feedback about pie size and profit if a trade was made (otherwise the profit is zero).
predicted by game theory to affect outcomes actually do, and how much predictive power other features add (if any).
These three properties are illustrated by experiments of Camerer, Nave, and Smith (2017).2 Two players bargain over how to divide an amount of money worth $1–$6 (in integer values). One informed (I) player knows the amount; the other, uninformed (U) player, doesn't know the amount. They are bargaining over how much the uninformed U player will get. But both players know that I knows the amount.
They bargain over ten seconds by moving cursors on a bargaining number line (figure 24.1). The data created in each trial is a time series of cursor locations, which are a series of step functions coming from a low offer to higher ones (representing increases in offers from I) and from higher demands to lower ones (representing decreasing demands from U).
Suppose we are trying to predict whether there will be an agreement or not based on all variables that can be observed. From a theoretical point of view, efficient bargaining based on revelation principle analysis predicts an exact rate of disagreement for each of the amounts $1–$6, based only on the different amounts available. Remarkably, this prediction is process-free.
2. This paradigm builds on seminal work on semistructured bargaining by Forsythe, Kennan, and Sopher (1991).
Fig. 24.2 ROC curves showing combinations of false and true positive rates in predicting bargaining disagreements
Notes: Improved forecasting is represented by curves moving to the upper left. The combination of process (cursor location features) and "pie" (amount) data is a clear improvement over either type of data alone.
However, from an ML point of view there are lots of features representing what the players are doing that could add predictive power (besides the process-free prediction based on the amount at stake). Both cursor locations are recorded every twenty-five msec. The time series of cursor locations is associated with a huge number of features—how far apart the cursors are, the time since the last concession (= cursor movement), the size of the last concession, interactions between concession amounts and times, and so forth.
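As a concrete (and entirely hypothetical) illustration of such feature construction, the sketch below takes two cursor time series sampled every 25 msec, the informed player's rising offers and the uninformed player's falling demands, and computes a few of the features named above. The function name and the toy trial are invented for illustration.

```python
import numpy as np

# Hypothetical sketch of turning raw cursor time series into ML features.
# `offers` (informed player, stepping up) and `demands` (uninformed player,
# stepping down) are sampled every 25 msec over a ten-second trial.
def cursor_features(offers, demands, dt=0.025):
    offers = np.asarray(offers, dtype=float)
    demands = np.asarray(demands, dtype=float)
    gap = demands - offers                             # distance between cursors
    off_moves = np.flatnonzero(np.diff(offers) > 0)    # offer concessions
    dem_moves = np.flatnonzero(np.diff(demands) < 0)   # demand concessions
    last_move = max(off_moves[-1] if off_moves.size else 0,
                    dem_moves[-1] if dem_moves.size else 0)
    return {
        "final_gap": float(gap[-1]),
        "mean_gap": float(gap.mean()),
        "n_offer_concessions": int(off_moves.size),
        "n_demand_concessions": int(dem_moves.size),
        "time_since_last_concession": (len(offers) - 1 - last_move) * dt,
    }

# Invented toy trial (400 samples = 10 s): offers step 0.5 -> 1.0 -> 1.5,
# demands step 3.0 -> 2.0; the trial ends with the cursors 0.5 apart.
offers = [0.5] * 100 + [1.0] * 150 + [1.5] * 150
demands = [3.0] * 200 + [2.0] * 200
feats = cursor_features(offers, demands)
```

Interactions (e.g., concession size times concession time) and higher-order summaries can be built the same way, so the feature list grows very quickly from a single trial.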
Figure 24.2 shows an ROC curve indicating test-set accuracy in predicting whether a bargaining trial ends in a disagreement (= 1) or not. The ROC curves sketch out combinations of true positive rates, P(predict disagree | disagree), and false positive rates, P(predict disagree | agree). An improved ROC curve moves up and to the left, reflecting more true positives and fewer false positives. As is evident, predicting from process data only is about as accurate as using just the amount ("pie") sizes (the ROC curves with black circle and empty square markers). Using both types of data improves prediction substantially (curve with empty circle markers).
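To make the ROC construction concrete, here is a minimal sketch with made-up labels and scores (not the experiment's data): sweeping a threshold over a classifier's disagreement scores traces out the true and false positive rate pairs, and the area under the resulting curve summarizes accuracy.

```python
import numpy as np

# Minimal ROC sketch with made-up labels and scores (not the experiment's
# data). y = 1 codes a disagreement; a higher score means the model is
# more confident the trial ends in disagreement.
def roc_points(y_true, scores):
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    tpr, fpr = [], []
    for t in np.unique(scores)[::-1]:       # sweep thresholds high to low
        pred = scores >= t
        tpr.append((pred & (y_true == 1)).sum() / (y_true == 1).sum())
        fpr.append((pred & (y_true == 0)).sum() / (y_true == 0).sum())
    return np.array(fpr), np.array(tpr)

y = np.array([1, 1, 1, 0, 0, 0, 1, 0])
s = np.array([0.9, 0.8, 0.55, 0.6, 0.4, 0.3, 0.7, 0.2])
fpr, tpr = roc_points(y, s)
# Trapezoid-rule area under the curve: 0.5 = chance, 1.0 = perfect.
auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))
```

Comparing such areas across feature sets (process only, pie only, both) is one standard way to quantify the improvement the figure shows visually.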
Machine learning is able to find predictive value in details of how the bargaining occurs (beyond the simple, and very good, prediction based only on the amount being bargained over). Of course, this discovery is the
beginning of the next step for behavioral economics. It raises questions that include: What variables predict? How do emotions,3 face-to-face communication, and biological measures (including whole-brain imaging)4 influence bargaining? Do people consciously understand why those variables are important? Can ML methods capture the effects of motivated cognition in unstructured bargaining, when people can self-servingly disagree about case facts?5 Can people constrain expression of variables that hurt their bargaining power? Can mechanisms be designed that record these variables and then create efficient mediation, in which people will voluntarily participate (capturing all gains from trade)?6
Risky Choice. Peysakhovich and Naecker (2017) use machine learning to analyze decisions between simple financial risks. The set of risks are randomly generated triples ($y, $x, $0) with associated probabilities (p_y, p_x, p_0). Subjects give a willingness-to-pay (WTP) for each gamble.
The feature set is the five probability and amount variables (excluding the $0 payoff), quadratic terms for all five, and all two- and three-way interactions among the linear and quadratic variables. For aggregate-level estimation this creates 5 + 5 + 45 + 120 = 175 variables.
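The count of 175 can be checked directly: the ten building blocks are the five linear terms and their five squares, and the interactions are all unordered pairs and triples of those ten blocks. A quick sketch (variable names are placeholders):

```python
from itertools import combinations
from math import comb

# Sanity check of the feature count: ten building blocks (five linear
# terms plus their five squares), then all unordered pairs and triples.
blocks = [f"x{i}" for i in range(5)] + [f"x{i}^2" for i in range(5)]
two_way = list(combinations(blocks, 2))     # C(10, 2) = 45 interactions
three_way = list(combinations(blocks, 3))   # C(10, 3) = 120 interactions
n_features = len(blocks) + len(two_way) + len(three_way)
```

So 5 + 5 + 45 + 120 = 175, as stated.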
Machine learning predictions are derived from regularized regression with a linear penalty (LASSO) or squared penalty (ridge) for (absolute) coefficients. Participants were N = 315 MTurk subjects who each gave ten usable responses. The training set consists of 70 percent of the observations, and 30 percent are held out as a test set.
They also estimate predictive accuracy of a one-variable expected utility model (EU, with power utility) and a prospect theory (PT) model, which adds one additional parameter to allow nonlinear probability weighting (Tversky and Kahneman 1992) (with separate weights, not cumulative ones). For these models there are only one or two free parameters per person.7
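A minimal sketch of such a two-parameter PT valuation is below, assuming a power value function and a Tversky-Kahneman-style weighting function applied separately to each probability. The parameter values and the reading of WTP as a certainty equivalent are illustrative assumptions, not the paper's estimates.

```python
# Illustrative sketch of a two-parameter prospect theory valuation:
# power value function u(x) = x**alpha and a Tversky-Kahneman-style
# probability weighting function applied separately to each probability.
# Parameter values are illustrative, not estimates from the paper.
def weight(p, gamma):
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def pt_wtp(outcomes, probs, alpha=0.88, gamma=0.61):
    """WTP modeled as the certainty equivalent of a gamble over gains."""
    value = sum(weight(p, gamma) * x ** alpha for x, p in zip(outcomes, probs))
    return value ** (1 / alpha)

# Gamble: $10 with prob. 0.1, $5 with prob. 0.4, $0 otherwise.
wtp = pt_wtp([10.0, 5.0], [0.1, 0.4])
```

With alpha = gamma = 1 the formula collapses to expected value, so EU with power utility is the special case gamma = 1; the one extra parameter gamma is what lets PT overweight small probabilities.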
The aggregate data estimation uses the same set of parameters for all subjects. In this analysis, the test set accuracy (mean squared error) is almost exactly the same for PT and for both LASSO and ridge ML predictions, even though PT uses only two variables and the ML methods use 175 variables.
Individual-level analysis, in which each subject has their own parameters, has about half the mean squared error of the aggregate analysis. The PT and ridge ML are about equally accurate.
The fact that PT and ML are equally accurate is a bit surprising because the ML method allows quite a lot of flexibility in the space of possible
3. Andrade and Ho (2009).
4. Lohrenz et al. (2007) and Bhatt et al. (2010).
5. See Babcock et al. (1995) and Babcock and Loewenstein (1997).
6. See Krajbich et al. (2008) for a related example of using neural measures to enhance efficiency in public good production experiments.
7. Note, however, that the ML feature set does not exactly nest the EU and PT forms. For example, a weighted combination of the linear outcome X and the quadratic term X^2 does not exactly equal a power function of X.
predictions. Indeed, the authors' motivation was to use ML to show how a model with a huge amount of flexibility could fit, possibly to provide a ceiling in achievable accuracy. If the ML predictions were more accurate than EU or PT, the gap would show how much improvement could be had by more complicated combinations of outcome and probability parameters. But the result, instead, shows that much busier models are not more accurate than the time-tested two-parameter form of PT, for this domain of choices.
Limited Strategic Thinking. The concept of subgame perfection in game theory presumes that players look ahead to what other players might do at future choice nodes (even choice nodes that are unlikely to be reached), in order to compute the likely consequences of their current choices. This psychological presumption does have some predictive power in short, simple games. However, direct measures of attention (Camerer et al. 1994; Johnson et al. 2002) and inference from experiments (e.g., Binmore et al. 2002) make it clear that players with limited experience do not look far ahead.
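One simple way to formalize limited look-ahead is to truncate a backward induction recursion after a fixed number of steps. The sketch below is an illustrative toy model, not one of the games used in those experiments: a finite alternating-offer bargaining game whose pie shrinks by a factor delta each round.

```python
# Illustrative toy model (not the experimental games cited above):
# backward induction in a finite alternating-offer bargaining game whose
# pie shrinks by a factor `delta` each round. Truncating the recursion
# after `lookahead` rounds is one crude way to model limited planning.
def proposer_share(rounds_left, delta, lookahead=None):
    if rounds_left == 1 or lookahead == 1:
        return 1.0      # last round considered: proposer takes everything
    k = None if lookahead is None else lookahead - 1
    # The responder becomes proposer next round, when the pie is delta as big.
    responder_value = delta * proposer_share(rounds_left - 1, delta, k)
    return 1.0 - responder_value

full = proposer_share(3, 0.9)                 # subgame-perfect share: 0.91
myopic = proposer_share(3, 0.9, lookahead=2)  # two-round planner: 0.10
```

Even in this toy version, shortening the planning horizon reverses the first proposer's predicted share, which is the kind of systematic deviation from subgame perfection the eye-tracking and inference studies document.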
More generally, in simultaneous games, there is now substantial evidence that even highly intelligent and educated subjects do not all process information in a way that leads to optimized choices given (Nash) "equilibrium" beliefs—that is, beliefs that accurately forecast what other players will do. More important, two general classes of theories have emerged that can account for deviations from optimized equilibrium theory. One class, quantal response equilibrium (QRE), consists of theories in which beliefs are statistically accurate but choices are noisy (e.g., Goeree, Holt, and Palfrey 2016). Another type of theory presumes that deviations from Nash equilibrium result from a cognitive hierarchy of levels of strategic thinking. In these theories there are levels of thinking, starting from nonstrategic thinking, based presumably on salient features of strategies (or, in the absence of distinctive salience, random choice). Higher-level thinkers build up a model of what lower-level thinkers do (e.g., Stahl and Wilson 1995; Camerer, Ho, and Chong 2004; Crawford, Costa-Gomes, and Iriberri 2013). These models have been applied to hundreds of experimental games with some degree of imperfect cross-game generality, and to several field settings.8
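A hierarchy of this kind is easy to sketch in the classic p-beauty contest (players pick numbers in [0, 100]; the winner is whoever is closest to p times the average). In the simplest level-k variant below, level-0 plays the midpoint and each higher level best-responds to the level just beneath it; all parameter values are illustrative.

```python
# Illustrative level-k sketch for the classic p-beauty contest: players
# pick numbers in [0, 100] and the winner is closest to p times the
# average. Level-0 here plays the midpoint (50); each higher level
# best-responds to the level just beneath it.
def level_k_choice(k, p=2/3, level0_choice=50.0):
    choice = level0_choice
    for _ in range(k):
        choice *= p     # best response to the previous level's choice
    return choice

choices = [level_k_choice(k) for k in range(4)]   # levels 0 through 3
```

Choices shrink geometrically toward the Nash equilibrium of 0 as the level rises, which matches the typical experimental finding that observed choices cluster at a few low levels rather than at equilibrium.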