The Economics of Artificial Intelligence


edited by Ajay Agrawal

(From chapter 21, "The Impact of Machine Learning on Economics," by Susan Athey)


tional consequences, it is possible to constrain the algorithms. Generally, algorithms can be trained to optimize objectives under constraints, and thus it may be easier to impose societal objectives on algorithms than on subjective decisions by humans.

A third issue that arises is stability and robustness, for example, in response to variations in samples or variations in the environment. There are a variety of related ideas in machine learning, including domain adaptation (how do you make a model trained in one environment perform well in another environment), “transfer learning,” and others. The basic concern is that ML algorithms do exhaustive searches across a very large number of possible specifications looking for the best model that predicts Y based on X. The models will find subtle relationships between X and Y, some of which might not be stable across time or across environments. For example, for the last few years there may be more videos of cats with pianos than dogs with pianos. The presence of a piano in a video may thus predict cats. However, pianos are not a fundamental feature of cats that holds across environments, and so if a fad arises where dogs play pianos, the performance of an ML algorithm might suffer. This might not be a problem for a tech firm that reestimates its models with fresh data daily, but predictive models are often used over much longer time periods in industry. For example, credit-scoring models may be held fixed, since changing them makes it hard to assess the risk of the set of consumers who accept credit offers. Scoring models used in medicine might be held fixed over many years. There are many interesting methodological issues involved in finding models that have stable performance and are robust to changing circumstances.
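To make the concern concrete, the toy sketch below is my illustration, not the chapter's: it trains a classifier (assuming scikit-learn is available) in an environment where pianos mostly co-occur with cats, then evaluates it after a fad reverses the correlation.

```python
# Illustrative toy example (not from the chapter): a feature-label
# correlation that holds in the training environment but reverses later.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000

def make_env(p_piano_given_cat):
    """Binary label (1 = cat) and one binary feature (piano in video)."""
    cat = rng.integers(0, 2, n)
    p_piano = np.where(cat == 1, p_piano_given_cat, 1 - p_piano_given_cat)
    piano = rng.binomial(1, p_piano)
    return piano.reshape(-1, 1), cat

X_train, y_train = make_env(0.9)  # training: pianos mostly appear with cats
X_shift, y_shift = make_env(0.1)  # later: the fad has moved to dogs

clf = LogisticRegression().fit(X_train, y_train)
print("accuracy, training environment:", clf.score(X_train, y_train))  # ~0.9
print("accuracy, shifted environment: ", clf.score(X_shift, y_shift))  # ~0.1
```

A model that was excellent in the training environment becomes worse than useless once the piano-cat relationship flips, which is exactly the risk for models held fixed over long periods.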

Another issue is that of manipulability. In the application of using mobile data to do credit scoring, a concern is that consumers may be able to manipulate the data observed by the loan provider (Bjorkegren and Grissen 2017). For example, if certain behavioral patterns help a consumer get a loan, the consumer can make it look like they have these behavioral patterns, for example, by visiting certain areas of a city. If resources are allocated to homes that look poor via satellite imagery, households or villages can possibly modify the aerial appearance of their homes to make them look poorer. An open area for future research concerns how to constrain ML models to make them less prone to manipulation; Athey (2017) discusses some other examples of this.

There are also other considerations that can be brought into ML when it is taken to the field, including computational time, the cost of collecting and maintaining the “features” that are used in a model, and so on. For example, technology firms sometimes make use of simplified models in order to reduce the response time for real-time user requests for information.

Overall, my prediction is that social scientists, particularly economists, together with computer scientists working at the intersection with social science, will contribute heavily to formally defining these types of problems and concerns, and to proposing solutions to them. This will not only provide for better implementations of ML in policy, but will also provide rich fodder for interesting research.

  21.4 A New Literature on Machine Learning and Causal Inference

Despite the fascinating examples of “off-the-shelf” or slightly modified prediction methods, in general ML prediction models are solving fundamentally different problems from much empirical work in social science, which instead focuses on causal inference. A prediction I have is that there will be an active and important literature combining ML and causal inference to create new methods, methods that harness the strengths of ML algorithms to solve causal inference problems. In fact, it is easy to make this prediction with confidence because the movement is already well underway. Here I will highlight a few examples, focusing on those that illustrate a range of themes, while emphasizing that this is not a comprehensive survey or a thorough review.

To see the difference between prediction and causal inference, imagine that you have a data set that contains data about prices and occupancy rates of hotels. Prices are easy to obtain through price comparison sites, but occupancy rates are typically not made public by hotels. Imagine first that a hotel chain wishes to form an estimate of the occupancy rates of competitors, based on publicly available prices. This is a prediction problem: the goal is to get a good estimate of occupancy rates, where posted prices and other factors (such as events in the local area, weather, and so on) are used to predict occupancy. For such a model, you would expect to find that higher posted prices are predictive of higher occupancy rates, since hotels tend to raise their prices as they fill up (using yield management software). In contrast, imagine that a hotel chain wishes to estimate how occupancy would change if the hotel raised prices across the board (that is, if it reprogrammed the yield management software to shift prices up by 5 percent in every state of the world). This is a question of causal inference. Clearly, even though prices and occupancy are positively correlated in a typical data set, we would not conclude that raising prices would increase occupancy. It is well known in the causal inference literature that the question about price increases cannot be answered simply by examining historical data without additional assumptions or structure. For example, if the hotel previously ran randomized experiments on pricing, the data from these experiments can be used to answer the question. More commonly, an analyst will exploit natural experiments or instrumental variables, where the latter are variables that are unrelated to factors that affect consumer demand, but that shift firm costs and thus their prices. Most of the classic supervised ML literature has little to say about how to answer this question.
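A small simulation makes the distinction concrete. The sketch below is my illustration, not from the chapter: the data-generating process, coefficient values, and the use of a cost shifter as the instrument are all assumptions. A predictive (OLS) regression finds a positive price coefficient because demand shocks move price and occupancy together, while the instrument recovers the negative causal effect:

```python
# Illustrative simulation (assumed data-generating process, not from the
# chapter): demand shocks raise both occupancy and price via yield
# management, so the predictive (OLS) slope on price is positive even
# though the true causal effect of raising price is negative (-0.5).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

demand = rng.normal(size=n)   # unobserved demand shock (the confounder)
cost = rng.normal(size=n)     # cost shifter: moves price, not demand
price = demand + cost + rng.normal(size=n)                    # pricing rule
occupancy = -0.5 * price + 2.0 * demand + rng.normal(size=n)  # true effect -0.5

def slope(y, x):
    """Slope from a simple regression of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Predictive regression: biased upward by the demand confounder.
print("OLS slope:", slope(occupancy, price))   # ~ +0.17, not -0.5

# Instrumental variables (one instrument): cov(y, z) / cov(x, z).
iv = np.cov(occupancy, cost)[0, 1] / np.cov(price, cost)[0, 1]
print("IV slope: ", iv)                        # ~ -0.5, the causal effect
```

The OLS model predicts occupancy better in historical data, but only the IV estimate answers the counterfactual pricing question.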

To understand the gap between prediction and causal inference, recall that the foundation of supervised ML methods is that model selection (through, e.g., cross-validation) is carried out to optimize goodness of fit on a test sample. A model is good if and only if it predicts outcomes well in a test set. In contrast, a large body of econometric research builds models that substantially reduce the goodness of fit of a model in order to estimate the causal effect of, say, changing prices. If prices and quantities are positively correlated in the data, any model that estimates the true causal effect (quantity goes down if you change price) will not do as good a job fitting a test data set that has the same joint distribution of prices and quantities as the training data. The place where the econometric model with a causal estimate would do better is at fitting what happens if the firm actually changes prices at a given point in time, that is, at doing counterfactual predictions when the world changes. Techniques like instrumental variables seek to use only some of the information that is in the data (the clean, exogenous, experiment-like variation in price), sacrificing predictive accuracy in the current environment to learn about a more fundamental relationship that will help make decisions about changing price.

However, a new but rapidly growing literature is tackling the problem of using ML methods for causal inference. This new literature takes many of the strengths and innovations of ML methods, but applies them to causal inference. Doing this requires changing the objective function, since the ground truth of the causal parameter is not observed in any test set. Because the truth is not observed in a test set, statistical theory also plays a more important role in evaluating models, since it is more difficult to directly assess how well a parameter estimates the truth, even if the analyst has access to an independent test set. Indeed, this discussion highlights one of the key ways in which prediction is substantially simpler than parameter estimation: for prediction problems, a prediction for a given unit (given its covariates) can be summarized in a single number, the predicted outcome, and the quality of the prediction can be evaluated on a test set without further modeling assumptions. Although the average squared prediction error of a model on a test set is a noisy estimate of the expected value of the mean squared error on a random test set (due to small sample size), the law of large numbers applies to this average and it converges quickly to the truth as the test set size increases. Since the standard deviation of the prediction error can also be easily estimated, it is straightforward to evaluate predictive models without imposing additional assumptions.
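That simplicity is easy to see in code. In the minimal NumPy sketch below (the helper name and inputs are illustrative), both the average squared error and its standard error follow from held-out outcomes and predictions in a few lines, with no further modeling assumptions:

```python
import numpy as np

def test_set_mse(y_test, y_hat):
    """Average squared prediction error on a held-out test set, plus its
    standard error. The law of large numbers makes the average converge
    quickly to the expected squared error as the test set grows, and the
    standard error quantifies the remaining finite-sample noise."""
    sq_err = np.asarray(y_test - y_hat) ** 2
    mse = sq_err.mean()
    se = sq_err.std(ddof=1) / np.sqrt(sq_err.size)
    return mse, se
```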

There are a variety of different problems that can be tackled with ML methods. An incomplete list of some that have gained early attention follows. First, we can consider the type of identification strategy for identifying causal effects. Some that have received attention in the new ML/causal inference literature include:

1. Treatment randomly assigned (experimental data).
2. Treatment assignment unconfounded (conditional on covariates).
3. Instrumental variables.
4. Panel data settings (including difference-in-differences designs).
5. Regression discontinuity designs.
6. Structural models of individual or firm behavior.

In each of those settings, there are different problems of interest:

1. Estimating average treatment effects (or a low-dimensional parameter vector).
2. Estimating heterogeneous treatment effects in simple models or models of limited complexity.
3. Estimating heterogeneous treatment effects nonparametrically.
4. Estimating optimal treatment assignment policies.
5. Identifying groups of individuals that are similar in terms of their treatment effects.

Although the early literature is already too large to summarize all of the contributions to each combination of identification strategy and problem of interest, it is useful to observe that at this point there are entries in almost all of the “boxes” associated with different identification strategies, both for average treatment effects and heterogeneous treatment effects. Here, I will provide a bit more detail on a few leading cases that have received a lot of attention, in order to illustrate some key themes in the literature.

It is also useful to observe that even though the last four problems seem closely related, they are distinct, and the methods used to solve them, as well as the issues that arise, are distinct. These distinctions have not traditionally been emphasized as much in the literature on causal inference, but they matter more in environments with data-driven model selection, because each has a different objective, and the objective function can make a big difference in determining the selected model in ML-based models. Issues of inference are also distinct, as we will discuss further below.

21.4.1 Average Treatment Effects

A large and important branch of the literature on causal inference focuses on estimation of average treatment effects under the unconfoundedness assumption. This assumption requires that potential outcomes (the outcomes a unit would experience in alternative treatment regimes) are independent of treatment assignment, conditional on covariates. In other words, treatment assignment is as good as random after controlling for covariates.
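In the potential-outcomes notation standard in this literature, with treatment indicator $W_i$, potential outcomes $Y_i(0)$ and $Y_i(1)$, and covariates $X_i$, unconfoundedness can be written as

$$\bigl(Y_i(0),\, Y_i(1)\bigr) \;\perp\; W_i \;\big|\; X_i .$$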

From the 1990s through the first decade of the twenty-first century, a literature emerged about using semiparametric methods to estimate average treatment effects (e.g., Bickel et al. [1993], focusing on an environment with a fixed number of covariates that is small relative to the sample size). The methods are semiparametric in the sense that the goal is to estimate a low-dimensional parameter—in this case, the average treatment effect—without making parametric assumptions about the way in which covariates affect outcomes (e.g., Hahn 1998). (See Imbens and Wooldridge [2009] and Imbens and Rubin [2015] for reviews.) In the middle of the first decade of the twenty-first century, Mark van der Laan and coauthors introduced and developed a set of methods called “targeted maximum likelihood” (van der Laan and Rubin 2006). The idea is that maximum likelihood is used to estimate a low-dimensional parameter vector in the presence of high-dimensional nuisance parameters. The method allows the nuisance parameters to be estimated with techniques that have less well-established properties or a slower convergence rate. This approach can be applied to estimate an average treatment effect parameter under a variety of identification assumptions, but importantly, it is an approach that can be used with many covariates.

An early example of the application of ML methods to causal inference in economics (see Belloni, Chernozhukov, and Hansen 2014 and Chernozhukov, Hansen, and Spindler 2015 for reviews) uses regularized regression as an approach to deal with many potential covariates in an environment where the outcome model is “sparse,” meaning that only a small number of covariates actually affect mean outcomes (but there are many observables, and the analyst does not know which ones are important). In an environment with unconfoundedness, some covariates are correlated with both the treatment assignment and the outcome; if the analyst does not condition on them, the omission of the confounder will lead to a biased estimate of the treatment effect. Belloni, Chernozhukov, and Hansen propose a double-selection method based on the LASSO. The LASSO is a regularized regression procedure where a regression is estimated using an objective function that balances in-sample goodness of fit with a penalty term that depends on the sum of the magnitudes of the regression coefficients. This form of penalty leads many covariates to be assigned a coefficient of zero, effectively dropping them from the regression. The magnitude of the penalty parameter is selected using cross-validation.
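In its standard form (a general fact about the LASSO rather than anything specific to this paper), the estimator solves

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl( y_i - x_i'\beta \bigr)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert ,$$

where $\lambda$ is the penalty level chosen by cross-validation; the absolute-value penalty is what sets many coefficients exactly to zero.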

The authors observe that if the LASSO is used in a regression of the outcome on both the treatment indicator and the other covariates, the coefficient on the treatment indicator will be a biased estimate of the treatment effect, because confounders that have a weak relationship with the outcome but a strong relationship with the treatment assignment may be zeroed out by an algorithm whose sole objective is to select variables that predict outcomes.
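A minimal sketch of the double-selection idea, assuming scikit-learn is available (the function, variable names, and the use of a cross-validated penalty are my simplifications, not the authors' implementation):

```python
# Sketch of post-double-selection (after Belloni, Chernozhukov, and Hansen):
# select covariates that predict the outcome, select covariates that predict
# the treatment, then regress the outcome on the treatment plus the union.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def double_selection(y, w, X):
    """y: outcomes, w: treatment indicator, X: (n, p) covariate matrix."""
    keep_y = LassoCV(cv=5).fit(X, y).coef_ != 0   # predicts the outcome
    keep_w = LassoCV(cv=5).fit(X, w).coef_ != 0   # predicts the treatment
    keep = keep_y | keep_w                        # union of the selected sets
    Z = np.column_stack([w, X[:, keep]])
    return LinearRegression().fit(Z, y).coef_[0]  # coefficient on treatment
```

Selecting on both equations is what protects against dropping a confounder that predicts treatment strongly but the outcome only weakly.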

A variety of other methods have been proposed for combining machine learning and traditional econometric methods for estimating average treatment effects under the unconfoundedness assumption. Athey, Imbens, and Wager (2016) propose a method they refer to as “residual balancing,” building on work on balancing weights by Zubizarreta (2015). Their approach is similar to a “doubly robust” method for estimating average treatment effects that proceeds by taking the average of the efficient score, which involves an estimate of the conditional mean of outcomes given covariates as well as the inverse of the estimated propensity score; however, residual balancing replaces the inverse propensity score weights with weights obtained using quadratic programming, where the weights are designed to achieve balance between the treatment and control groups. The conditional mean of outcomes is estimated using the LASSO.
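To indicate the shape of such an estimator, here is a sketch in generic notation of my own (not the paper's exact formulation), for the mean counterfactual control outcome of the treated: with a LASSO coefficient estimate $\hat{\beta}_c$ fit on control units and balancing weights $\gamma_i$,

$$\hat{\mu}_c = \bar{X}_t' \hat{\beta}_c + \sum_{i:\, W_i = 0} \gamma_i \bigl( Y_i - X_i' \hat{\beta}_c \bigr),$$

where $\bar{X}_t$ is the average covariate vector among treated units and the $\gamma_i$ are chosen by quadratic programming so that the weighted control covariates $\sum_{i: W_i = 0} \gamma_i X_i$ approximately match $\bar{X}_t$, with a penalty on the dispersion of the weights.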

The main result in the paper is that this procedure is efficient and achieves the same rate of convergence as if the outcome model were known, under a few key assumptions. The most important assumption is that the outcome model is linear and sparse, although there can be a large number of covariates and the analyst does not need to have knowledge of which ones are important. The linearity assumption, while strong, allows the key result to hold in the absence of any assumptions about the structure of the process mapping covariates to the assignment, other

 
