when the goal is semiparametric estimation or when there are a large number of covariates relative to the number of observations. Machine learning has great strengths in using data to select functional forms flexibly.
A second theme is that a key advantage of ML is that it views empirical analysis as "algorithms" that estimate and compare many alternative models. This approach contrasts with economics, where (in principle, though rarely in reality) the researcher picks a model based on principles and estimates it once. Instead, ML algorithms build in "tuning" as part of the algorithm. The tuning is essentially model selection, and in an ML algorithm that selection is data driven. This approach has a whole host of advantages, including improved performance as well as enabling researchers to be systematic and to fully describe the process by which their model was selected. Of course, cross-validation has also been used historically in economics, for example, for selecting the bandwidth for a kernel regression, but in ML it is viewed as a fundamental part of the algorithm.
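The bandwidth-selection example can be made concrete with a short sketch. Everything here is an illustrative assumption of mine rather than anything from the chapter: the simulated data, the Gaussian kernel, and the candidate bandwidth grid are all invented. The sketch picks the bandwidth for a Nadaraya-Watson kernel regression by leave-one-out cross-validation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a smooth signal plus noise (illustrative only).
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

def nw_predict(x0, x, y, h):
    """Nadaraya-Watson kernel regression estimate at x0 with bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

def loo_cv_error(x, y, h):
    """Leave-one-out cross-validation MSE for bandwidth h."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = nw_predict(x[i], x[mask], y[mask], h)
        errs.append((y[i] - pred) ** 2)
    return np.mean(errs)

bandwidths = [0.01, 0.05, 0.1, 0.2, 0.5]
cv_errors = [loo_cv_error(x, y, h) for h in bandwidths]
best_h = bandwidths[int(np.argmin(cv_errors))]
```

Here the data themselves adjudicate between undersmoothing (tiny bandwidth, noisy fit) and oversmoothing (large bandwidth, biased fit), which is exactly the model-selection role that ML treats as a built-in part of the algorithm.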
A third, closely related theme is that "outsourcing" model selection to an algorithm works very well when the problem is "simple"—for example, prediction and classification tasks, where the performance of a model can be evaluated by looking at goodness of fit in a held-out test set. Those are typically not the problems of greatest interest for empirical researchers in economics, who instead are concerned with causal inference, where there is typically not an unbiased estimate of the ground truth available for comparison. Thus, more work is required to apply an algorithmic approach to economic problems. The recent literature at the intersection of ML and causal inference, reviewed in this chapter, has focused on providing the conceptual framework and specific proposals for algorithms that are tailored for causal inference.
A fourth theme is that the algorithms also have to be modified to provide valid confidence intervals for estimated effects when the data is used to select the model. Many recent papers make use of techniques such as sample splitting, leave-one-out estimation, and other similar techniques to provide confidence intervals that work both in theory and in practice. The upside is that using ML can provide the best of both worlds: the model selection is data driven, systematic, and a wide range of models are considered; yet the model-selection process is fully documented, and confidence intervals take into account the entire algorithm.
Finally, the combination of ML and newly available data sets will change
economics in fairly fundamental ways ranging from new questions, to new
The Impact of Machine Learning on Economics 509
approaches, to collaboration (larger teams and interdisciplinary interaction), to a change in how involved economists are in the engineering and implementation of policies.
21.2 What Is Machine Learning and What Are Early Use Cases?
It is harder than one might think to come up with an operational definition of ML. The term can be (and has been) used broadly or narrowly; it can refer to a collection of subfields of computer science, but also to a set of topics that are developed and used across computer science, engineering, statistics, and increasingly the social sciences. Indeed, one could devote an entire article to the definition of ML, or to the question of whether the thing called ML really needed a new name other than statistics, the distinction between ML and AI, and so on. However, I will leave this debate to others and focus on a narrow, practical definition that will make it easier to distinguish ML from the econometric approaches most commonly used in applied work until very recently.1 For readers coming from a machine-learning background, it is also important to note that applied statistics and econometrics have developed a body of insights on topics ranging from causal inference to efficiency that have not yet been incorporated in mainstream machine learning, while other parts of machine learning have overlap with methods that have been used in applied statistics and social sciences for many decades.
Starting from a relatively narrow definition, machine learning is a field that develops algorithms designed to be applied to data sets, with the main areas of focus being prediction (regression), classification, and clustering or grouping tasks. These tasks are divided into two main branches, supervised and unsupervised ML. Unsupervised ML involves finding clusters of observations that are similar in terms of their covariates, and thus can be interpreted as "dimensionality reduction"; it is commonly used for video, images, and text. There are a variety of techniques available for unsupervised learning, including k-means clustering, topic modeling, community detection methods for networks, and many more. For example, the Latent Dirichlet Allocation model (Blei, Ng, and Jordan 2003) has frequently been applied to find "topics" in textual data. The output of a typical unsupervised ML model is a partition of the set of observations, where observations within each element of the partition are similar according to some metric, or a vector of probabilities or weights that describe a mixture of topics or groups that an observation might belong to. If you read in the
1. I will also focus on the most popular parts of ML; like many fields, it is possible to find researchers who define themselves as members of the field of ML doing a variety of different things, including pushing the boundaries of ML with tools from other disciplines. In this chapter I will consider such work to be interdisciplinary rather than "pure" ML, and will discuss it as such.
510 Susan Athey
newspaper that a computer scientist "discovered cats on YouTube," that might mean that they used an unsupervised ML method to partition a set of videos into groups, and when a human watches the largest group, they observe that most of the videos in it contain cats. This is referred to as "unsupervised" because there were no "labels" on any of the images in the input data; only after examining the items in each group does an observer determine that the algorithm found cats or dogs. Not all dimensionality reduction methods involve creating clusters; older methods such as principal components analysis can be used to reduce dimensionality, while modern methods include matrix factorization (finding two low-dimensional matrices whose product well approximates a larger matrix), regularization on the norm of a matrix, hierarchical Poisson factorization (in a Bayesian framework) (Gopalan, Hofman, and Blei 2015), and neural networks.
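As a concrete illustration of the clustering idea, the following sketch implements plain k-means (Lloyd's algorithm) on synthetic two-dimensional data. The data-generating process, the choice of k = 2, and the fixed iteration count are all assumptions made for the example, not anything specific to the applications above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic, well-separated groups of observations (illustrative only).
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))
X = np.vstack([group_a, group_b])

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid updates for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each observation to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points,
        # keeping the old centroid if a cluster happens to be empty.
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers

labels, centers = kmeans(X, k=2)
```

With well-separated groups like these, the recovered partition matches the true groups; on real video, image, or text data one would first construct a numerical feature representation, and the number of clusters would itself typically be chosen by some criterion.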
In my view, these tools are very useful as an intermediate step in empirical work in economics. They provide a data-driven way to find similar newspaper articles, restaurant reviews, and so forth, and thus to create variables that can be used in economic analyses. These variables might be part of the construction of either outcome variables or explanatory variables, depending on the context. For example, if an analyst wishes to estimate a model of consumer demand for different items, it is common to model consumer preferences over characteristics of the items. Many items are associated with text descriptions as well as online reviews. Unsupervised learning could be used to discover items with similar product descriptions in an initial phase of finding potentially related products, and it could also be used to find subgroups of similar products. Unsupervised learning could further be used to categorize the reviews into types. An indicator for the review group could be used in subsequent analysis without the analyst having to use human judgment about the review content; the data would reveal whether a certain type of review was associated with higher consumer perceived quality, or not. An advantage of using unsupervised learning to create covariates is that the outcome data is not used at all; thus, concerns about spurious correlation between constructed covariates and the observed outcome are less problematic. Despite this, Egami et al. (2016) have argued that researchers may be tempted to fine-tune their construction of covariates by testing how they perform in terms of predicting outcomes, thus leading to spurious relationships between covariates and outcomes. They recommend the approach of sample splitting, whereby the model tuning takes place on one sample of data, and then the selected model is applied on a fresh sample of data.
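A minimal sketch of that sample-splitting recommendation follows, under assumptions invented purely for illustration: a single raw covariate, candidate covariate constructions that are threshold indicators, and a simulated outcome whose true threshold is at 6.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative setup: a raw covariate (say, a review score) and an outcome
# that truly jumps at raw > 6.
n = 1000
raw = rng.uniform(0, 10, n)
y = (raw > 6).astype(float) + rng.normal(0, 0.5, n)

# Split the sample: tune the covariate construction on one half only.
idx = rng.permutation(n)
tune, est = idx[:n // 2], idx[n // 2:]

# Candidate constructions: the indicator 1{raw > c} for several cutoffs c.
cutoffs = [2, 4, 6, 8]

def fit_mse(c, rows):
    """MSE of predicting y from the indicator 1{raw > c} on the given rows
    (group means play the role of OLS with a single dummy regressor)."""
    z = raw[rows] > c
    pred = np.where(z, y[rows][z].mean(), y[rows][~z].mean())
    return np.mean((y[rows] - pred) ** 2)

best_c = min(cutoffs, key=lambda c: fit_mse(c, tune))

# The final estimate uses the fresh half, with the construction held fixed.
z_est = raw[est] > best_c
effect = y[est][z_est].mean() - y[est][~z_est].mean()
```

Because the cutoff is chosen using only the tuning half, the quantity computed on the fresh half is not contaminated by the specification search.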
Unsupervised learning can also be used to create outcome variables. For example, Athey, Mobius, and Pál (2017) examine the impact of Google's shutdown of Google News in Spain on the types of news consumers read. In this case, the share of news in different categories is an outcome of interest. Unsupervised learning can be used to categorize news in this type of analysis; that paper uses community detection techniques from network theory. In the absence of dimensionality reduction, it would be difficult to meaningfully summarize the impact of the shutdown on all of the different news articles consumed in the relevant time frame.
Supervised machine learning typically entails using a set of features or covariates (X) to predict an outcome (Y). When using the term prediction, it is important to emphasize that the framework focuses not on forecasting, but rather on a setting where there are some labeled observations where both X and Y are observed (the training data), and the goal is to predict outcomes (Y) in an independent test set based on the realized values of X for each unit in the test set. In other words, the goal is to construct μ̂(x), an estimator of μ(x) = E[Y | X = x], in order to do a good job predicting the true values of Y in an independent data set. The observations are assumed to be independent, and the joint distribution of X and Y in the training set is assumed to be the same as that in the test set. These are the only substantive assumptions required for most machine-learning methods to work.
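To make the notation concrete, here is a minimal sketch of constructing μ̂(x) and judging it by test-set fit. The simulated data (true μ(x) = x² with normal noise) and the k-nearest-neighbor estimator are arbitrary choices of mine for illustration, not methods singled out by the chapter.

```python
import numpy as np

rng = np.random.default_rng(3)

# Training and test sets drawn from the same joint distribution of (X, Y).
def draw(n, rng):
    X = rng.uniform(-2, 2, n)
    Y = X ** 2 + rng.normal(0, 0.2, n)  # true mu(x) = x^2
    return X, Y

X_train, Y_train = draw(500, rng)
X_test, Y_test = draw(200, rng)

def mu_hat(x0, X, Y, k=20):
    """k-nearest-neighbor estimate of E[Y | X = x0]:
    average Y over the k training points closest to x0."""
    nearest = np.argsort(np.abs(X - x0))[:k]
    return Y[nearest].mean()

# Evaluate the estimator the way the ML framework does: by its
# mean-squared prediction error on the independent test set.
preds = np.array([mu_hat(x, X_train, Y_train) for x in X_test])
test_mse = np.mean((preds - Y_test) ** 2)
```

Nothing about the construction uses the test set; its only role is to measure how well μ̂ predicts fresh draws from the same distribution.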
In the case of classification, the goal is to accurately classify observations. For example, the outcome could be the animal depicted in an image, the "features" or covariates are the pixels in the image, and the goal is to correctly classify images according to the animal depicted. A related but distinct estimation problem is to estimate Pr(Y = k | X = x) for each of the k = 1, . . . , K possible realizations of Y.
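The distinction between classifying and estimating Pr(Y = k | X = x) can be illustrated with a small sketch; the simulated data and the nearest-neighbor frequency estimator here are assumptions of mine for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data: three classes whose probabilities depend on a covariate x.
# For x in [k, k+1), class k has probability 0.8 and each other class 0.1.
n = 3000
X = rng.uniform(0, 3, n)
true_probs = np.stack(
    [np.where(np.floor(X) == k, 0.8, 0.1) for k in range(3)], axis=1
)
Y = np.array([rng.choice(3, p=p) for p in true_probs])

def class_probs(x0, X, Y, k_neighbors=200, n_classes=3):
    """Estimate Pr(Y = k | X = x0) by the class frequencies among the
    k_neighbors observations closest to x0."""
    nearest = np.argsort(np.abs(X - x0))[:k_neighbors]
    return np.bincount(Y[nearest], minlength=n_classes) / k_neighbors

p_hat = class_probs(0.5, X, Y)

# A hard classification, by contrast, would report only the argmax.
predicted_class = p_hat.argmax()
```

The probability vector is the estimation object; the classifier discards everything in it except which entry is largest.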
It is important to emphasize that the ML literature does not frame itself as solving estimation problems—so estimating μ(x) or Pr(Y = k | X = x) is not the primary goal. Instead, the goal is to achieve goodness of fit in an independent test set by minimizing deviations between actual outcomes and predicted outcomes. In applied econometrics, we often wish to understand an object like μ(x) in order to perform exercises like evaluating the impact of changing one covariate while holding others constant. This is not an explicit aim of ML modeling.
There are a variety of ML methods for supervised learning, such as regularized regression (LASSO, ridge, and elastic net), random forests, regression trees, support vector machines, neural nets, matrix factorization, and many others, such as model averaging. See Varian (2014) for an overview of some of the most popular methods and Mullainathan and Spiess (2017) for more details. (Also note that White [1992] attempted to popularize neural nets in economics in the early 1990s, but at the time they did not lead to substantial performance improvements and did not become popular in economics.) What leads us to categorize these methods as ML methods rather than traditional econometric or statistical methods? First is simply an observation: until recently, these methods were neither used in published social science research, nor taught in social science courses, while they were widely studied in the self-described ML and/or "statistical learning" literatures. One exception is ridge regression, which received some attention in economics, and LASSO had also received some attention. But from a more functional perspective, one common feature of many ML methods is that they use data-driven model selection. That is, the analyst provides the list of covariates or features, but the functional form is at least in part determined as a function of the data, rather than through a single estimation specified in advance (as is done, at least in theory, in econometrics); the method is thus better described as an algorithm that might estimate many alternative models and then select among them to maximize a criterion.
There is typically a trade-off between the expressiveness of the model (e.g., more covariates included in a linear regression) and the risk of overfitting, which occurs when the model is too rich relative to the sample size. (See Mullainathan and Spiess [2017] for more discussion of this.) In the latter case, the goodness of fit of the model when measured on the sample where the model is estimated is expected to be much better than the goodness of fit of the model when evaluated on an independent test set. The ML literature uses a variety of techniques to balance expressiveness against overfitting. The most common approach is cross-validation, whereby the analyst repeatedly estimates a model on part of the data (a "training fold") and then evaluates it on the complement (the "test fold"). The complexity of the model is selected to minimize the average of the mean-squared error of the prediction (the squared difference between the model prediction and the actual outcome) on the test folds. Other approaches used to control overfitting include averaging many different models, sometimes estimating each model on a subsample of the data (one can interpret the random forest in this way).
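The cross-validation loop just described can be sketched directly. Here the model family is ridge regression, and the simulated design (200 observations, 50 covariates, only five of them with signal) and the candidate penalty grid are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data: many covariates, few with signal, so that the
# regularization strength genuinely matters and must be tuned.
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0  # only the first five covariates affect the outcome
y = X @ beta + rng.normal(0.0, 1.0, n)

def ridge_fit(X, y, lam):
    """Closed-form ridge estimator: (X'X + lam * I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Fix the fold assignment once so every candidate penalty sees the same folds.
perm = rng.permutation(n)
folds = np.array_split(perm, 5)

def cv_mse(lam):
    """Average mean-squared prediction error across the five test folds."""
    errs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        b = ridge_fit(X[train_idx], y[train_idx], lam)
        errs.append(np.mean((y[test_idx] - X[test_idx] @ b) ** 2))
    return np.mean(errs)

lambdas = [0.01, 1.0, 10.0, 100.0, 1000.0]
cv_errors = [cv_mse(lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(cv_errors))]
```

The penalty that minimizes average test-fold error would then be used to refit on the full sample; an extreme penalty such as 1,000 shrinks all coefficients toward zero and should show up as a clearly larger cross-validated error.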
In contrast, in much of cross-sectional econometrics and empirical work in economics, the tradition has been that the researcher specifies one model, estimates the model on the full data set, and relies on statistical theory to estimate confidence intervals for estimated parameters. The focus is on the estimated effects rather than the goodness of fit of the model. For much empirical work in economics, the primary interest is in the estimate of a causal effect, such as the effect of a training program, a minimum wage increase, or a price increase. The researcher might check the robustness of this parameter estimate by reporting two or three alternative specifications. Researchers often check dozens or even hundreds of alternative specifications behind the scenes, but rarely report this practice because it would invalidate the confidence intervals reported (due to concerns about multiple testing and searching for specifications with the desired results). There are many disadvantages to the traditional approach, including but not limited to the fact that researchers would find it difficult to be systematic or comprehensive in checking alternative specifications, and the fact that researchers were not honest about the practice, given that they did not have a way to correct for the specification-search process. I believe that regularization and systematic model selection have many advantages over traditional approaches, and for this reason will become a standard part of empirical practice in economics. This will particularly be true as we more frequently encounter data sets with many covariates, and also as we see the advantages of being systematic about model selection. As I discuss later, however, this practice must be modified from traditional ML and in general "handled with care" when the researcher's ultimate goal is to estimate a causal effect rather than maximize goodness of fit in a test set.
To build some intuition about the difference between causal effect estimation and prediction, it can be useful to consider the widely used method of instrumental variables. Instrumental variables are used by economists when they wish to learn a causal effect, for example, the effect of a price on a firm's sales, but they only have access to observational (nonexperimental) data. An instrument in this case might be an input cost for the firm that shifts over