The Economics of Artificial Intelligence

The Impact of Machine Learning on Economics

by Susan Athey


when the goal is semiparametric estimation or when there are a large number of covariates relative to the number of observations. Machine learning has great strengths in using data to select functional forms flexibly.

A second theme is that a key advantage of ML is that it views empirical analysis as “algorithms” that estimate and compare many alternative models. This approach contrasts with economics, where (in principle, though rarely in reality) the researcher picks a model based on principles and estimates it once. Instead, ML algorithms build in “tuning” as part of the algorithm. The tuning is essentially model selection, and in an ML algorithm the model selection is data driven. There are a whole host of advantages of this approach, including improved performance as well as enabling researchers to be systematic and to fully describe the process by which their model was selected. Of course, cross-validation has also been used historically in economics, for example, for selecting the bandwidth for a kernel regression, but it is viewed as a fundamental part of an algorithm in ML.
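To make the bandwidth example concrete, here is a minimal sketch, assuming only NumPy and simulated data, of choosing a kernel regression bandwidth by cross-validation. The Nadaraya-Watson form, the Gaussian kernel, and the bandwidth grid are illustrative choices, not a method prescribed in the chapter.

```python
# Sketch: selecting the bandwidth of a Nadaraya-Watson kernel regression by
# leave-one-out cross-validation. Data, kernel, and grid are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = np.sin(x) + rng.normal(scale=0.3, size=200)

def nw_predict(x_train, y_train, x_query, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E[Y | X = x_query]."""
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

def loo_mse(h):
    """Leave-one-out MSE: predict each point from all other observations."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = nw_predict(x[mask], y[mask], x[i:i + 1], h)[0]
        errs.append((pred - y[i]) ** 2)
    return np.mean(errs)

bandwidths = [0.1, 0.3, 0.5, 1.0, 2.0]
best_h = min(bandwidths, key=loo_mse)  # data-driven tuning of the bandwidth
print("selected bandwidth:", best_h)
```

Here the tuning step is part of the procedure itself, which is the sense in which ML treats cross-validation as a fundamental part of the algorithm rather than an afterthought.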

A third, closely related theme is that “outsourcing” model selection to an algorithm works very well when the problem is “simple”—for example, prediction and classification tasks, where the performance of a model can be evaluated by looking at goodness of fit in a held-out test set. Those are typically not the problems of greatest interest for empirical researchers in economics, who instead are concerned with causal inference, where there is typically not an unbiased estimate of the ground truth available for comparison. Thus, more work is required to apply an algorithmic approach to economic problems. The recent literature at the intersection of ML and causal inference, reviewed in this chapter, has focused on providing the conceptual framework and specific proposals for algorithms that are tailored for causal inference.

A fourth theme is that the algorithms also have to be modified to provide valid confidence intervals for estimated effects when the data is used to select the model. Many recent papers make use of techniques such as sample splitting, leave-one-out estimation, and other similar techniques to provide confidence intervals that work both in theory and in practice. The upside is that using ML can provide the best of both worlds: the model selection is data driven, systematic, and a wide range of models are considered; yet, the model-selection process is fully documented, and confidence intervals take into account the entire algorithm.
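A minimal sketch of the sample-splitting idea, assuming scikit-learn, statsmodels, and simulated data: model selection happens on one half of the sample, and the selected specification is then estimated once on the other half, so the reported confidence intervals are not contaminated by the search. The lasso screening rule is an illustrative stand-in for whatever selection procedure a researcher might use.

```python
# Sketch of sample splitting for valid inference: select covariates on one
# half of the data, estimate and report confidence intervals on the other.
# Data are simulated; the selection rule (lasso screening) is illustrative.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))
y = 1.0 * X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=1000)

X_sel, X_est, y_sel, y_est = train_test_split(X, y, test_size=0.5,
                                              random_state=0)

# Step 1: data-driven model selection on the selection half only.
chosen = np.flatnonzero(LassoCV(cv=5).fit(X_sel, y_sel).coef_)

# Step 2: a single conventional OLS fit on the held-out half; its standard
# confidence intervals are not invalidated by the selection step, because
# this half of the data played no role in choosing the specification.
ols = sm.OLS(y_est, sm.add_constant(X_est[:, chosen])).fit()
print(ols.conf_int())
```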

Finally, the combination of ML and newly available data sets will change economics in fairly fundamental ways, ranging from new questions, to new approaches, to collaboration (larger teams and interdisciplinary interaction), to a change in how involved economists are in the engineering and implementation of policies.

  21.2 What Is Machine Learning and What Are Early Use Cases?

It is harder than one might think to come up with an operational definition of ML. The term can be (and has been) used broadly or narrowly; it can refer to a collection of subfields of computer science, but also to a set of topics that are developed and used across computer science, engineering, statistics, and increasingly the social sciences. Indeed, one could devote an entire article to the definition of ML, or to the question of whether the thing called ML really needed a new name other than statistics, the distinction between ML and AI, and so on. However, I will leave this debate to others and focus on a narrow, practical definition that will make it easier to distinguish ML from the econometric approaches most commonly used in applied work until very recently.1 For readers coming from a machine-learning background, it is also important to note that applied statistics and econometrics have developed a body of insights on topics ranging from causal inference to efficiency that have not yet been incorporated in mainstream machine learning, while other parts of machine learning overlap with methods that have been used in applied statistics and the social sciences for many decades.

1. I will also focus on the most popular parts of ML; like many fields, it is possible to find researchers who define themselves as members of the field of ML doing a variety of different things, including pushing the boundaries of ML with tools from other disciplines. In this chapter I will consider such work to be interdisciplinary rather than “pure” ML, and will discuss it as such.

Starting from a relatively narrow definition, machine learning is a field that develops algorithms designed to be applied to data sets, with the main areas of focus being prediction (regression), classification, and clustering or grouping tasks. These tasks are divided into two main branches, supervised and unsupervised ML. Unsupervised ML involves finding clusters of observations that are similar in terms of their covariates, and thus can be interpreted as “dimensionality reduction”; it is commonly used for video, images, and text. There are a variety of techniques available for unsupervised learning, including k-means clustering, topic modeling, community detection methods for networks, and many more. For example, the Latent Dirichlet Allocation model (Blei, Ng, and Jordan 2003) has frequently been applied to find “topics” in textual data. The output of a typical unsupervised ML model is a partition of the set of observations, where observations within each element of the partition are similar according to some metric, or a vector of probabilities or weights that describe a mixture of topics or groups that an observation might belong to. If you read in the newspaper that a computer scientist “discovered cats on YouTube,” that might mean that they used an unsupervised ML method to partition a set of videos into groups, and when a human watches the largest group, they observe that most of the videos in it contain cats. This is referred to as “unsupervised” because there were no “labels” on any of the images in the input data; only after examining the items in each group does an observer determine that the algorithm found cats or dogs. Not all dimensionality reduction methods involve creating clusters; older methods such as principal components analysis can be used to reduce dimensionality, while modern methods include matrix factorization (finding two low-dimensional matrices whose product well approximates a larger matrix), regularization on the norm of a matrix, hierarchical Poisson factorization (in a Bayesian framework) (Gopalan, Hofman, and Blei 2015), and neural networks.
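As a concrete illustration of this unsupervised workflow, the sketch below fits an LDA topic model with scikit-learn; the toy corpus, the choice of two topics, and the variable names are hypothetical, chosen only to show the shape of the output the text describes (a vector of topic weights per document).

```python
# A minimal sketch of unsupervised learning on text, assuming scikit-learn.
# The corpus and the number of topics are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "restaurant review great food friendly service",
    "terrible food slow service noisy restaurant",
    "news article election policy government vote",
    "government announces new economic policy today",
]

# Build the document-term matrix (observations by word counts).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Fit LDA (Blei, Ng, and Jordan 2003): each document is represented as a
# mixture (vector of weights) over latent topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
weights = lda.fit_transform(X)  # one row of topic shares per document

print(weights.round(2))  # reviews load on one topic, news on the other
```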

In my view, these tools are very useful as an intermediate step in empirical work in economics. They provide a data-driven way to find similar newspaper articles, restaurant reviews, and so forth, and thus create variables that can be used in economic analyses. These variables might be part of the construction of either outcome variables or explanatory variables, depending on the context. For example, if an analyst wishes to estimate a model of consumer demand for different items, it is common to model consumer preferences over characteristics of the items. Many items are associated with text descriptions as well as online reviews. Unsupervised learning could be used to discover items with similar product descriptions in an initial phase of finding potentially related products, and it could also be used to find subgroups of similar products. Unsupervised learning could further be used to categorize the reviews into types. An indicator for the review group could be used in subsequent analysis without the analyst having to use human judgement about the review content; the data would reveal whether a certain type of review was associated with higher consumer perceived quality, or not. An advantage of using unsupervised learning to create covariates is that the outcome data is not used at all; thus, concerns about spurious correlation between constructed covariates and the observed outcome are less problematic. Despite this, Egami et al. (2016) have argued that researchers may be tempted to fine-tune their construction of covariates by testing how they perform in terms of predicting outcomes, thus leading to spurious relationships between covariates and outcomes. They recommend the approach of sample splitting, whereby the model tuning takes place on one sample of data, and then the selected model is applied on a fresh sample of data.
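A minimal sketch of that recommendation, under assumed simulated data: the covariate construction (here a hypothetical k-means clustering of review features, tuned by silhouette score) happens on one split only, and the frozen model is then applied on the fresh split where the outcome analysis takes place. None of these specific choices come from Egami et al.; only the split structure does.

```python
# Sketch of the sample-splitting recommendation: tune covariate construction
# on one half, apply the frozen model on a fresh half. Data are simulated.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
text_features = rng.normal(size=(1000, 20))  # stand-in for vectorized reviews
outcome = rng.normal(size=1000)              # stand-in for perceived quality

X_tune, X_fresh, y_tune, y_fresh = train_test_split(
    text_features, outcome, test_size=0.5, random_state=0)

# Tune the covariate-construction step (the number of review clusters) on
# the tuning half only; the outcome variable is never consulted here.
models = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_tune)
          for k in (2, 5, 10)}
best_k = max(models, key=lambda k: silhouette_score(X_tune, models[k].labels_))

# Freeze the selected model; construct covariates on the fresh half, where
# the outcome analysis (e.g., comparing means by review group) takes place.
review_group = models[best_k].predict(X_fresh)
for g in range(best_k):
    print(g, y_fresh[review_group == g].mean())
```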

Unsupervised learning can also be used to create outcome variables. For example, Athey, Mobius, and Pál (2017) examine the impact of Google’s shutdown of Google News in Spain on the types of news consumers read. In this case, the share of news in different categories is an outcome of interest. Unsupervised learning can be used to categorize news in this type of analysis; that paper uses community detection techniques from network theory. In the absence of dimensionality reduction, it would be difficult to meaningfully summarize the impact of the shutdown on all of the different news articles consumed in the relevant time frame.

Supervised machine learning typically entails using a set of features or covariates (X) to predict an outcome (Y). When using the term prediction, it is important to emphasize that the framework focuses not on forecasting, but rather on a setting where there are some labeled observations where both X and Y are observed (the training data), and the goal is to predict outcomes (Y) in an independent test set based on the realized values of X for each unit in the test set. In other words, the goal is to construct μ̂(x), which is an estimator of μ(x) = E[Y | X = x], in order to do a good job predicting the true values of Y in an independent data set. The observations are assumed to be independent, and the joint distribution of X and Y in the training set is assumed to be the same as that in the test set. These assumptions are the only substantive assumptions required for most machine-learning methods to work.
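A minimal sketch of this setup, assuming scikit-learn and simulated data: construct μ̂(x) on a training set and judge it purely by squared error on an independent test set. The data-generating process and the random forest are illustrative choices among the many methods discussed below.

```python
# Sketch of supervised learning as described above: estimate mu_hat(x) on
# labeled training data, then evaluate predictions on an independent test
# set drawn from the same joint distribution. Data are simulated.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)     # construct mu_hat from the training data

mu_hat = model.predict(X_test)  # predictions on the independent test set
print("test MSE:", mean_squared_error(y_test, mu_hat))
```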

In the case of classification, the goal is to accurately classify observations. For example, the outcome could be the animal depicted in an image, the “features” or covariates are the pixels in the image, and the goal is to correctly classify images into the correct animal depicted. A related but distinct estimation problem is to estimate Pr(Y = k | X = x) for each of the k = 1, . . . , K possible realizations of Y.
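To make the distinction concrete, a hedged sketch with simulated data and scikit-learn: the same fitted classifier can return hard class labels (the classification task) or estimated class probabilities Pr(Y = k | X = x) (the related estimation problem).

```python
# Sketch distinguishing classification (predict the label) from the related
# problem of estimating Pr(Y = k | X = x). Data and model are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

labels = clf.predict(X_test)       # classification: one label per observation
probs = clf.predict_proba(X_test)  # estimates of Pr(Y = k | X = x), k = 1..K
print(labels[:5], probs[:5].round(2))
```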

It is important to emphasize that the ML literature does not frame itself as solving estimation problems—so estimating μ(x) or Pr(Y = k | X = x) is not the primary goal. Instead, the goal is to achieve goodness of fit in an independent test set by minimizing deviations between actual outcomes and predicted outcomes. In applied econometrics, we often wish to understand an object like μ(x) in order to perform exercises like evaluating the impact of changing one covariate while holding others constant. This is not an explicit aim of ML modeling.

There are a variety of ML methods for supervised learning, such as regularized regression (LASSO, ridge, and elastic net), random forest, regression trees, support vector machines, neural nets, matrix factorization, and many others, such as model averaging. See Varian (2014) for an overview of some of the most popular methods and Mullainathan and Spiess (2017) for more details. (Also note that White [1992] attempted to popularize neural nets in economics in the early 1990s, but at the time they did not lead to substantial performance improvements and did not become popular in economics.) What leads us to categorize these methods as ML methods rather than traditional econometric or statistical methods? First is simply an observation: until recently, these methods were neither used in published social science research, nor taught in social science courses, while they were widely studied in the self-described ML and/or “statistical learning” literatures. One exception is ridge regression, which received some attention in economics, and LASSO had also received some attention. But from a more functional perspective, one common feature of many ML methods is that they use data-driven model selection. That is, the analyst provides the list of covariates or features, but the functional form is at least in part determined as a function of the data; rather than performing a single estimation (as is done, at least in theory, in econometrics), the method is better described as an algorithm that might estimate many alternative models and then select among them to maximize a criterion.
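That algorithmic view can be illustrated with one of the methods just listed: LASSO with its penalty chosen by cross-validation, so that many candidate models are estimated and one is selected to maximize a criterion. The sketch assumes scikit-learn and simulated data in which only 3 of 50 candidate covariates matter.

```python
# Sketch of data-driven model selection: LassoCV estimates the model for a
# grid of penalty levels and selects the one with the best cross-validated
# fit. The analyst supplies only the list of candidate covariates.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=500)

# The algorithm, not the analyst, picks the penalty (and hence which
# covariates survive) via 5-fold cross-validation.
lasso = LassoCV(cv=5).fit(X, y)

print("selected penalty:", lasso.alpha_)
print("nonzero coefficients:", np.flatnonzero(lasso.coef_))
```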

There is typically a trade-off between expressiveness of the model (e.g., more covariates included in a linear regression) and risk of overfitting, which occurs when the model is too rich relative to the sample size. (See Mullainathan and Spiess [2017] for more discussion of this.) In the latter case, the goodness of fit of the model when measured on the sample where the model is estimated is expected to be much better than the goodness of fit of the model when evaluated on an independent test set. The ML literature uses a variety of techniques to balance expressiveness against overfitting. The most common approach is cross-validation, whereby the analyst repeatedly estimates a model on part of the data (a “training fold”) and then evaluates it on the complement (the “test fold”). The complexity of the model is selected to minimize the average of the mean-squared error of the prediction (the squared difference between the model prediction and the actual outcome) on the test folds. Other approaches used to control overfitting include averaging many different models, sometimes estimating each model on a subsample of the data (one can interpret the random forest in this way).
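A minimal sketch of that trade-off, assuming scikit-learn and simulated data: as a polynomial regression grows richer, in-sample MSE keeps falling, while the average MSE on the cross-validation test folds eventually turns up, and the complexity minimizing the latter would be selected.

```python
# Sketch of cross-validation for complexity selection: in-sample fit always
# improves with model richness, but average test-fold MSE reveals where
# overfitting begins. Data are simulated.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(x[:, 0]) + rng.normal(scale=0.3, size=100)

for degree in (1, 3, 5, 10, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    in_sample = ((model.fit(x, y).predict(x) - y) ** 2).mean()
    # 5-fold cross-validation: average MSE over the held-out test folds.
    cv_mse = -cross_val_score(model, x, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    print(f"degree {degree:2d}: in-sample MSE {in_sample:.3f}, "
          f"CV MSE {cv_mse:.3f}")
```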

In contrast, in much of cross-sectional econometrics and empirical work in economics, the tradition has been that the researcher specifies one model, estimates the model on the full data set, and relies on statistical theory to estimate confidence intervals for estimated parameters. The focus is on the estimated effects rather than the goodness of fit of the model. For much empirical work in economics, the primary interest is in the estimate of a causal effect, such as the effect of a training program, a minimum wage increase, or a price increase. The researcher might check robustness of this parameter estimate by reporting two or three alternative specifications. Researchers often check dozens or even hundreds of alternative specifications behind the scenes, but rarely report this practice because it would invalidate the confidence intervals reported (due to concerns about multiple testing and searching for specifications with the desired results). There are many disadvantages to the traditional approach, including but not limited to the fact that researchers would find it difficult to be systematic or comprehensive in checking alternative specifications, and further that researchers were not honest about the practice, given that they did not have a way to correct for the specification search process. I believe that regularization and systematic model selection have many advantages over traditional approaches, and for this reason will become a standard part of empirical practice in economics. This will particularly be true as we more frequently encounter data sets with many covariates, and also as we see the advantages of being systematic about model selection. As I discuss later, however, this practice must be modified from traditional ML and in general “handled with care” when the researcher’s ultimate goal is to estimate a causal effect rather than maximize goodness of fit in a test set.

To build some intuition about the difference between causal effect estimation and prediction, it can be useful to consider the widely used method of instrumental variables. Instrumental variables are used by economists when they wish to learn a causal effect, for example, the effect of a price on a firm’s sales, but they only have access to observational (nonexperimental) data. An instrument in this case might be an input cost for the firm that shifts over

 
