The Economics of Artificial Intelligence


by Ajay Agrawal


heterogeneous across users. Similarly, each item has latent characteristics that describe users' willingness to travel to patronize the restaurant, and each user has individual-specific preferences for those latent characteristics. Thus, both users' willingness to travel and their base utility for each restaurant vary across user-item pairs. To make the estimation computationally feasible, we build on the methods of Ruiz, Athey, and Blei (2017). We show that our model performs better than more standard competing models such as multinomial logit and nested logit models, in part due to the personalization of the estimates. We demonstrate in particular that our model performs better when predicting consumer responses to restaurant openings and closings, and we analyze how consumers reallocate their demand after a restaurant closes to nearby restaurants versus more distant restaurants with similar characteristics. Since there are several hundred restaurant openings and closings in the data, we are able to use the large number of "natural experiments" in the data to assess performance of the model. Finally, we show how the model can be used to analyze questions involving counterfactuals, such as what type of restaurant would attract the most consumers in a given location.

Another recent paper that makes use of factorization in the context of a structural model of consumer demand is Wan et al. (2017). This paper builds a model of consumer choice that includes choices over categories, purchases within a category, and quantity to purchase. The model allows for individual heterogeneity in preferences and uses factorization techniques to estimate the model.

21.5 Broader Predictions about the Impact of Machine Learning on Economics

My prediction is that there will be substantial changes in how empirical work is conducted; indeed, it is already happening, and so this prediction already can be made with a high degree of certainty. I predict that a number of changes will emerge, summarized as follows:

1. Adoption of off-the-shelf ML methods for their intended tasks (prediction, classification, and clustering, e.g., for textual analysis).

2. Extensions and modifications of prediction methods to account for considerations such as fairness, manipulability, and interpretability.

3. Development of new econometric methods based on machine learning designed to solve traditional social science estimation tasks.

4. No fundamental changes to the theory of identification of causal effects.

  The Impact of Machine Learning on Economics 535

5. Incremental progress to identification and estimation strategies for causal effects that exploit modern data settings, including large-panel data sets and environments with many small experiments.

6. Increased emphasis on model robustness and other supplementary analysis to assess the credibility of studies.

7. Adoption of new methods by empiricists at large scale.

8. Revival and new lines of research in productivity and measurement.

9. New methods for the design and analysis of large administrative data, including merging these sources and privacy-preserving methods.

10. Increase in interdisciplinary research.

11. Changes in the organization, dissemination, and funding of economic research.

12. Economist as engineer engages with firms and government to design and implement policies in digital environments.

13. Design and implementation of digital experimentation, both one-time and as an ongoing process, including multiarmed bandit experimentation algorithms, in collaboration with firms and government.

14. Research on developing high-quality metrics that can be measured quickly, in order to facilitate rapid incremental innovation and experimentation.

15. Increased use of data analysis in all levels of economics teaching; increase in interdisciplinary data science programs.

16. Research on the impact of AI and ML on the economy.

This chapter has discussed the first three predictions in some detail; I will now discuss each of the remaining predictions in turn.

First, as emphasized in the discussion about the benefits from using ML, ML is a very powerful tool for data-driven model selection. Getting the best flexible functional form to fit data is very important for many reasons; for example, when the researcher assumes that treatment assignment is unconfounded, it is still crucial to flexibly control for covariates, and a vast literature has documented that modeling choices matter. A theme highlighted in this chapter is that ML can be used any time that semiparametric methods might have been used in the traditional econometrics literature. However, finding the best functional form is a distinct concern from whether an economic parameter would be identified with sufficient data. Thus, there is no obvious benefit from ML in terms of thinking about identification issues.
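As a toy illustration of what data-driven model selection buys, the sketch below uses cross-validation to choose a flexible functional form rather than justifying one by hand. Everything here is simulated and uses plain NumPy; it is not any specific method from the chapter, just the generic principle of letting held-out fit pick the specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: the outcome depends nonlinearly on a single covariate.
n = 500
x = rng.uniform(-2, 2, size=n)
y = np.sin(1.5 * x) + rng.normal(scale=0.3, size=n)

def cv_mse(degree, folds=5):
    """Mean squared error of a polynomial fit, estimated by k-fold CV."""
    idx = np.arange(n) % folds
    errs = []
    for k in range(folds):
        tr, te = idx != k, idx == k
        coef = np.polyfit(x[tr], y[tr], degree)
        pred = np.polyval(coef, x[te])
        errs.append(np.mean((y[te] - pred) ** 2))
    return float(np.mean(errs))

# Data-driven model selection: pick the degree with the lowest CV error,
# instead of defending one functional form ad hoc.
degrees = range(1, 9)
scores = {d: cv_mse(d) for d in degrees}
best = min(scores, key=scores.get)
print("selected polynomial degree:", best)
```

The held-out error is the "objective measure against which choices can be evaluated": a linear specification is rejected by the data itself, not by the analyst's taste.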

However, the types of data sets that are becoming widely available due to digitization suggest new identification questions. For example, it is common for there to be frequent changes in algorithms on ecommerce platforms. These changes in algorithms create variation in user experiences (as well as in seller experiences in platforms and marketplaces). Thus, a typical user or seller may experience a large number of changes, each of which has modest effects. There are open questions about what can be learned in

  536 Susan Athey

such environments. From an estimation perspective, there is also room to develop ML-inspired algorithms that take advantage of the many sources of variation experienced by market participants. In my 2012 Fisher Schultz lecture, I illustrated the idea of using randomized experiments conducted by technology firms as instruments for estimating position effects for sponsored search advertisements. This idea has since been exploited more fully by others (e.g., Goldman and Rao 2014), but many open questions remain about the best ways to use the information in such data sets.

Digitization is also leading to the creation of many panel data sets that record individual behavior at relatively high frequency over a period of time. There are many open questions about how to make the best use of rich panel data. Previously, we discussed several new papers at the intersection of ML and econometrics that made use of panel data (e.g., Athey, Bayati, et al. 2017), but I predict that this literature will grow dramatically over the next few years.

There are many reasons that empiricists will adopt ML methods at scale. First, many ML methods systematize a variety of arbitrary choices analysts otherwise need to make. In larger and more complex data sets, there are many more choices. Each choice must be documented and justified, and serves as a potential source of criticism of a paper. When systematic, data-driven methods are available, research can be made more principled and systematic, and there can be objective measures against which these choices can be evaluated. Indeed, it would really be impossible for a researcher using traditional empirical methods to fully document the process by which the model specification was selected; in contrast, algorithmic selection (when the algorithm is given the correct objective for the problem) has superior performance while simultaneously being reproducible. Second, one way to conceptualize ML algorithms is that they perform like automated research assistants: they work much faster and more effectively than traditional research assistants at exploring modeling choices, yet the methods that have been customized for social science applications also build in protections so that, for example, valid confidence intervals can be obtained. Although it is crucial to consider carefully the objective that the algorithms are given, in the end they are highly effective. Thus, they help resolve issues like "p-value hacking" by giving researchers the best of both worlds: superior performance as well as correct p-values that take into account the specification-selection process. Third, in many cases, new results can be obtained. For example, if an author has run a field experiment, there is no reason not to search for heterogeneous treatment effects using methods such as those in Athey and Imbens (2016). The method ensures that valid confidence intervals can be obtained for the resulting estimates of treatment effect heterogeneity.
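The sample-splitting logic behind such methods can be caricatured in a few lines. The simulation below is a hypothetical NumPy sketch, not the Athey and Imbens (2016) causal tree algorithm: one half of a randomized experiment is used to search for a subgroup split, and the other half to estimate subgroup effects, so the search cannot bias the estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated randomized experiment: the treatment effect is larger for x > 0.
n = 4000
x = rng.normal(size=n)           # pretreatment covariate
w = rng.integers(0, 2, size=n)   # random treatment assignment
tau = np.where(x > 0, 2.0, 0.5)  # true heterogeneous effect
y = 1.0 + tau * w + rng.normal(size=n)

def effect(mask):
    """Difference-in-means treatment effect within a subgroup."""
    return y[mask & (w == 1)].mean() - y[mask & (w == 0)].mean()

# "Honest" split: one half selects the subgroup definition, the other half
# estimates its effect, so specification search does not contaminate inference.
select = np.zeros(n, dtype=bool)
select[: n // 2] = True
estimate = ~select

# Selection half: crude search for the threshold that separates the two
# most different subgroups.
grid = np.quantile(x[select], np.linspace(0.1, 0.9, 17))
cut = max(grid, key=lambda c: abs(effect(select & (x > c))
                                  - effect(select & (x <= c))))

# Estimation half: subgroup effect estimates here are unbiased
# despite the search performed on the other half.
low = effect(estimate & (x <= cut))
high = effect(estimate & (x > cut))
print(f"threshold {cut:.2f}: effect {low:.2f} below, {high:.2f} above")
```

Because the estimation half never saw the search, ordinary difference-in-means standard errors remain valid for the reported subgroup effects.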

Alongside the adoption of ML methods for old questions, new questions and types of analyses will emerge in the fields of productivity and measurement. Some examples of these have already been highlighted, such as the ability to measure economic outcomes at a granular level over a longer period of time, through, for example, imagery. Glaeser et al. (2018) provides a nice overview of how big data and ML will affect urban economics as a field, as well as the operational efficiency of cities. More broadly, as governments begin to absorb high-frequency, granular data, they will need to grapple with questions about how to maintain the stability of official statistics in a world where the underlying data changes rapidly. New questions will emerge about how to architect a system of measurement that takes advantage of high-frequency, noisy, unstable data, but yields statistics whose meaning and relationship with a wide range of economic variables remains stable. Firms will face similar problems as they attempt to forecast outcomes relevant to their own businesses using noisy, high-frequency data. The emerging literature in academics, government, and industry on "nowcasting" in macroeconomics and ML (e.g., Banbura et al. 2013) begins to address some, but not all, of these issues. We will also see the emergence of new forms of descriptive analysis, some inspired by ML. Examples of these include techniques for describing association (for example, people who do A also do B), as well as interpretations and visualizations of the output of unsupervised ML techniques such as matrix factorization, clustering, and so on. Economists are likely to refine these methods to make them more directly useful quantitatively, and for business and policy decisions.
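A minimal sketch of the "people who do A also do B" style of descriptive analysis uses the standard lift statistic from market-basket analysis; the item names and purchase matrix below are made up for illustration.

```python
import numpy as np

# Toy user-by-item purchase matrix (1 = user bought the item). In practice
# this would be built from transaction logs; the items are hypothetical.
items = ["coffee", "filters", "tea"]
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
])

def lift(a, b):
    """P(B | A) / P(B): how much doing A raises the rate of doing B."""
    i, j = items.index(a), items.index(b)
    p_b = X[:, j].mean()
    p_b_given_a = X[X[:, i] == 1, j].mean()
    return p_b_given_a / p_b

# "People who buy coffee also buy filters": lift well above 1.
print(lift("coffee", "filters"))  # → 1.5
```

A lift above 1 indicates positive association; turning such raw descriptive output into quantities useful for business and policy decisions is exactly the refinement step the text anticipates.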

More broadly, the ability to use predictive models to measure economic outcomes at high granularity and fidelity will change the types of questions we can ask and answer. For example, imagery from satellites or Google's Street View can be used in combination with survey data to train models that can be used to produce estimates of economic outcomes at the level of the individual home, either within the United States or in developing countries where administrative data quality can be problematic (e.g., Jean et al. 2016; Engstrom, Hersh, and Newhouse 2017; Naik et al. 2014).

Another area of transformation for economics will be in the design and analysis of large-scale administrative data sets. We will see attempts to bring together disparate sources to provide a more complete view of individuals and firms. The behavior of individuals in the financial world, the physical world, and the digital world will be connected, and in some cases ML will be needed simply to match different identities from different contexts onto the same individual. Further, we will observe behavior of individuals over time, often with high-frequency measurements. For example, children will leave digital footprints throughout their education, ranging from how often they check their homework assignments to the assignments themselves to comments from teachers, and so on. Children will interact with adaptive systems that change the material they receive based on their previous engagement and performance. This will create the need for new statistical methods, building on existing ML tools, but where the methods are more tailored to a panel-data setting with significant dynamic effects (and possibly peer effects as well; see, for some recent statistical advances designed around analyzing large-scale network data, Ugander et al. 2013; Athey, Eckles, and Imbens 2015; Eckles et al. 2016).

Another area of future research concerns how to analyze personal data without compromising user privacy. There is a literature in computer science around querying data while preserving privacy; the literature is referred to as "differential privacy." Some recent research has brought together the computer science literature with questions about estimating statistical models (see, e.g., Komarova, Nekipelov, and Yakovlev 2015).
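As a sketch of the core idea in that computer science literature, the Laplace mechanism releases a statistic after adding noise calibrated to how much any one individual can move it. The data and parameter choices below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def private_mean(values, lo, hi, epsilon):
    """Release a mean with epsilon-differential privacy (Laplace mechanism).

    Clamping each value to [lo, hi] bounds one individual's influence on
    the mean (the sensitivity is (hi - lo) / n), and Laplace noise with
    scale sensitivity / epsilon masks that influence.
    """
    clipped = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(clipped)
    noise = rng.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

# Hypothetical incomes; a smaller epsilon means stronger privacy guarantees
# and therefore more noise in the released statistic.
incomes = rng.normal(50_000, 10_000, size=10_000)
print(private_mean(incomes, 0, 200_000, epsilon=0.5))
```

With many individuals the added noise is small relative to the statistic, which is why aggregate queries can remain useful for estimation even under strong privacy guarantees.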

I also predict a substantial increase in interdisciplinary work. Computer scientists and engineers may remain closer to the frontier in terms of algorithm design, computational efficiency, and related concerns. As I will expand on further in a moment, academics of all disciplines will be gaining a much greater ability to intervene in the environment in a way that facilitates measurement and causal inference. As digital interactions and digital interventions expand across all areas of society, from education to health to government services to transportation, economists will collaborate with domain experts in other areas to design, implement, and evaluate changes in technology and policy. Many of these digital interventions will be powered by ML, and ML-based causal inference tools will be used to estimate personalized treatment effects of the interventions and design personalized treatment assignment policies.

Alongside the increase in interdisciplinary work, there will also be changes to the organization, funding, and dissemination of economics research. Research on large data sets with complex data creation and analysis pipelines can be labor intensive and also require specialized skills. Scholars who do a lot of complex data analysis with large data sets have already begun to adopt a "lab" model more similar to what is standard today in computer science and many natural sciences. A lab might include a postdoctoral fellow, multiple PhD students, predoctoral fellows (full-time research assistants between their bachelor's and PhD), undergraduates, and possibly full-time staff. Of course, labs of this scale are expensive, and so the funding models for economics will need to adapt to address this reality. One concern is inequality of access to the resources required to do this type of research, given that it is expensive enough that it cannot be supported by traditional funding pools for more than a small fraction of economists at research universities.

Within a lab, we will see increased adoption of collaboration tools such as those used in software firms; tools include GitHub (for collaboration, version control, and dissemination of software), as well as communication tools. For example, my generalized random-forest software is available as an open-source package on GitHub at http://github.com/swager/grf; users report issues through GitHub and can submit requests to pull in proposed changes or additions to the code.


There will also be an increased emphasis on documentation and reproducibility, which are necessary to make a large lab function. This will happen even as some data sources remain proprietary. "Fake" data sets will be created that allow others to run a lab's code and replicate the analysis (except not on the real data). As an example of institutions created to support the lab model, both Stanford GSB and the Stanford Institute for Economic Policy Research have "pools" of predoctoral fellows that are shared among faculty; these programs provide mentorship, training, and the opportunity to take one class each quarter, and they also are demographically more diverse than graduate student populations. The predoctoral fellows have a special form of student status within Stanford. Other public- and private-sector research groups have also adopted similar programs, with Microsoft Research-New England an early innovator in this area, while individual researchers at universities like Harvard and MIT have also been making use of predoctoral research assistants for a number of years.

We will also see changes in how economists engage with government, industry, education, and health. The concept of the "economist as engineer" promoted by market-design experts including Robert Wilson, Paul Mil-
