The Economics of Artificial Intelligence
heterogeneous across users. Similarly, each item has latent characteristics that describe users' willingness to travel to patronize the restaurant, and each user has individual-specific preferences for those latent characteristics. Thus, both users' willingness to travel and their base utility for each restaurant vary across user-item pairs. To make the estimation computationally feasible, we build on the methods of Ruiz, Athey, and Blei (2017). We show that our model performs better than more standard competing models such as multinomial logit and nested logit models, in part due to the personalization of the estimates. We demonstrate in particular that our model performs better when predicting consumer responses to restaurant openings and closings, and we analyze how consumers reallocate their demand after a restaurant closes to nearby restaurants versus more distant restaurants with similar characteristics. Since there are several hundred restaurant openings and closings in the data, we are able to use the large number of "natural experiments" in the data to assess performance of the model. Finally, we show how the model can be used to analyze questions involving counterfactuals, such as what type of restaurant would attract the most consumers in a given location.
Another recent paper that makes use of factorization in the context of
a structural model of consumer demand is Wan et al. (2017). This paper
builds a model of consumer choice that includes choices over categories,
purchases within a category, and quantity to purchase. The model allows for
individual heterogeneity in preferences, and uses factorization techniques
to estimate the model.
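Both papers rest on the same core device: representing each user and each item by a low-dimensional latent vector whose inner product approximates utility. The following is a toy gradient-descent factorization on simulated data, intended only to illustrate that device; it is not the estimation procedure of Ruiz, Athey, and Blei (2017) or Wan et al. (2017), and all dimensions and data are hypothetical.

```python
import numpy as np

# Hypothetical setup: utilities of 50 users over 30 restaurants are
# approximated by inner products of low-dimensional latent vectors.
rng = np.random.default_rng(0)
n_users, n_items, k = 50, 30, 4

# Simulated "observed" utility matrix with a true low-rank structure.
true_u = rng.normal(size=(n_users, k))
true_v = rng.normal(size=(n_items, k))
utilities = true_u @ true_v.T + 0.1 * rng.normal(size=(n_users, n_items))

# Fit user and item factors by gradient descent on squared error.
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
lr = 0.01
for _ in range(2000):
    resid = U @ V.T - utilities       # prediction error
    gU = resid @ V / n_items          # gradient w.r.t. user factors
    gV = resid.T @ U / n_users        # gradient w.r.t. item factors
    U -= lr * gU
    V -= lr * gV

mse = np.mean((U @ V.T - utilities) ** 2)
print(f"reconstruction MSE: {mse:.3f}")
```

The estimated factors personalize predictions: each user's row of `U` captures that user's tastes, so two users facing the same choice set can be predicted to make different choices.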
21.5 Broader Predictions about the Impact of Machine Learning on Economics
My prediction is that there will be substantial changes in how empirical
work is conducted; indeed, it is already happening, and so this prediction
already can be made with a high degree of certainty. I predict that a number
of changes will emerge, summarized as follows:
1. Adoption of off-the-shelf ML methods for their intended tasks (prediction, classification, and clustering, e.g., for textual analysis).
2. Extensions and modifications of prediction methods to account for considerations such as fairness, manipulability, and interpretability.
3. Development of new econometric methods based on machine learning designed to solve traditional social science estimation tasks.
4. No fundamental changes to theory of identification of causal effects.
The Impact of Machine Learning on Economics 535
5. Incremental progress to identification and estimation strategies for causal effects that exploit modern data settings including large-panel data sets and environments with many small experiments.
6. Increased emphasis on model robustness and other supplementary analysis to assess credibility of studies.
7. Adoption of new methods by empiricists at large scale.
8. Revival and new lines of research in productivity and measurement.
9. New methods for the design and analysis of large administrative data, including merging these sources and privacy-preserving methods.
10. Increase in interdisciplinary research.
11. Changes in organization, dissemination, and funding of economic research.
12. Economist as engineer engages with firms and government to design and implement policies in digital environments.
13. Design and implementation of digital experimentation, both one-time and as an ongoing process, including multiarmed bandit experimentation algorithms, in collaboration with firms and government.
14. Research on developing high-quality metrics that can be measured quickly, in order to facilitate rapid incremental innovation and experimentation.
15. Increased use of data analysis in all levels of economics teaching; increase in interdisciplinary data science programs.
16. Research on the impact of AI and ML on the economy.
This chapter has discussed the first three predictions in some detail; I will now discuss each of the remaining predictions in turn.
First, as emphasized in the discussion about the benefits from using ML, ML is a very powerful tool for data-driven model selection. Getting the best flexible functional form to fit data is very important for many reasons; for example, when the researcher assumes that treatment assignment is unconfounded, it is still crucial to flexibly control for covariates, and a vast literature has documented that modeling choices matter. A theme highlighted in this chapter is that ML can be used any time that semiparametric methods might have been used in the traditional econometrics literature. However, finding the best functional form is a distinct concern from whether an economic parameter would be identified with sufficient data. Thus, there is no obvious benefit from ML in terms of thinking about identification issues.
However, the types of data sets that are becoming widely available due to digitization suggest new identification questions. For example, it is common for there to be frequent changes in algorithms in ecommerce platforms. These changes in algorithms create variation in user experiences (as well as in seller experiences in platforms and marketplaces). Thus, a typical user or seller may experience a large number of changes, each of which has modest effects. There are open questions about what can be learned in
536 Susan Athey
such environments. From an estimation perspective, there is also room to develop ML-inspired algorithms that take advantage of the many sources of variation experienced by market participants. In my 2012 Fisher-Schultz Lecture, I illustrated the idea of using randomized experiments conducted by technology firms as instruments for estimating position effects for sponsored search advertisements. This idea has since been exploited more fully by others (e.g., Goldman and Rao 2014), but many open questions remain about the best ways to use the information in such data sets.
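The instrumental-variables logic can be illustrated with a simulated example (all numbers hypothetical, and much simpler than any real position-effects application): a randomized algorithm change shifts an ad's position, and the Wald/two-stage-least-squares ratio recovers the causal position effect even though position is confounded by unobserved ad quality.

```python
import numpy as np

# Hypothetical simulation: experiment assignment z randomly shifts the
# position of a sponsored ad; unobserved ad quality confounds position
# and clicks. Using z as an instrument recovers the position effect.
rng = np.random.default_rng(1)
n = 100_000
quality = rng.normal(size=n)                    # unobserved confounder
z = rng.integers(0, 2, size=n)                  # randomized algorithm change
position = 1.0 * z + 1.5 * quality + rng.normal(size=n)
clicks = -0.5 * position + 2.0 * quality + rng.normal(size=n)  # true effect -0.5

# OLS is biased because quality drives both position and clicks.
ols = np.cov(position, clicks)[0, 1] / np.var(position)

# Wald / 2SLS estimate: cov(z, y) / cov(z, x).
iv = np.cov(z, clicks)[0, 1] / np.cov(z, position)[0, 1]
print(f"OLS: {ols:.2f}, IV: {iv:.2f} (truth: -0.50)")
```

With this data-generating process the OLS coefficient is badly biased upward, while the instrumented estimate sits near the true value of -0.5.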
Digitization is also leading to the creation of many panel data sets that
record individual behavior at relatively high frequency over a period of time.
There are many open questions about how to make the best use of rich
panel data. Previously, we discussed several new papers at the intersection
of ML and econometrics that made use of panel data (e.g., Athey, Bayati,
et al. 2017), but I predict that this literature will grow dramatically over the
next few years.
There are many reasons that empiricists will adopt ML methods at scale. First, many ML methods simplify a variety of arbitrary choices analysts needed to make. In larger and more complex data sets, there are many more choices. Each choice must be documented and justified, and serves as a potential source of criticism of a paper. When systematic, data-driven methods are available, research can be made more principled and systematic, and there can be objective measures against which these choices can be evaluated. Indeed, it would really be impossible for a researcher using traditional empirical methods to fully document the process by which the model specification was selected; in contrast, algorithmic selection (when the algorithm is given the correct objective for the problem) has superior performance while simultaneously being reproducible. Second, one way to conceptualize ML algorithms is that they perform like automated research assistants: they work much faster and more effectively than traditional research assistants at exploring modeling choices, yet the methods that have been customized for social science applications also build in protections so that, for example, valid confidence intervals can be obtained. Although it is crucial to consider carefully the objective that the algorithms are given, in the end they are highly effective. Thus, they help resolve issues like "p-value hacking" by giving researchers the best of both worlds: superior performance as well as correct p-values that take into account the specification-selection process.
Third, in many cases, new results can be obtained. For example, if an author has run a field experiment, there is no reason not to search for heterogeneous treatment effects using methods such as those in Athey and Imbens (2016). The method ensures that valid confidence intervals can be obtained for the resulting estimates of treatment effect heterogeneity.
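A central idea in that line of work is "honesty": one part of the sample is used to discover a subgroup, and a held-out part is used to estimate its effect, so the confidence interval is not contaminated by the search. The sketch below strips that logic down to a single covariate split on hypothetical simulated data; it is an illustration of the sample-splitting principle, not the causal-tree algorithm of Athey and Imbens (2016).

```python
import numpy as np

# Simulated randomized experiment where the treatment effect differs by
# a covariate x: zero effect for x <= 0.5, effect 1.0 for x > 0.5.
rng = np.random.default_rng(2)
n = 20_000
x = rng.uniform(size=n)
w = rng.integers(0, 2, size=n)            # random treatment assignment
tau = np.where(x > 0.5, 1.0, 0.0)         # true heterogeneous effect
y = tau * w + rng.normal(size=n)

half = n // 2
# Step 1 (search sample): scan candidate splits for the largest
# difference in estimated effects between the two sides.
xs, ws, ys = x[:half], w[:half], y[:half]

def effect(mask):
    return ys[mask & (ws == 1)].mean() - ys[mask & (ws == 0)].mean()

cuts = np.linspace(0.1, 0.9, 17)
best_cut = max(cuts, key=lambda c: abs(effect(xs > c) - effect(xs <= c)))

# Step 2 (estimation sample): estimate the subgroup effect on held-out
# data; because the split was chosen elsewhere, the usual CI is valid.
xe, we, ye = x[half:], w[half:], y[half:]
grp = xe > best_cut
t1, t0 = ye[grp & (we == 1)], ye[grp & (we == 0)]
est = t1.mean() - t0.mean()
se = np.sqrt(t1.var() / len(t1) + t0.var() / len(t0))
print(f"split at x > {best_cut:.2f}: effect {est:.2f} +/- {1.96 * se:.2f}")
```

Running the search and the estimation on the same half would tend to overstate the effect and understate its uncertainty; the split removes that bias.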
Alongside the adoption of ML methods for old questions, new questions and types of analyses will emerge in the fields of productivity and measurement. Some examples of these have already been highlighted, such as
the ability to measure economic outcomes at a granular level over a longer period of time, through, for example, imagery. Glaeser et al. (2018) provides a nice overview of how big data and ML will affect urban economics as a field, as well as the operational efficiency of cities. More broadly, as governments begin to absorb high-frequency, granular data, they will need to grapple with questions about how to maintain the stability of official statistics in a world where the underlying data changes rapidly. New questions will emerge about how to architect a system of measurement that takes advantage of high-frequency, noisy, unstable data, but yields statistics whose meaning and relationship with a wide range of economic variables remains stable. Firms will face similar problems as they attempt to forecast outcomes relevant to their own businesses using noisy, high-frequency data.
The emerging literature in academics, government, and industry on "nowcasting" in macroeconomics (e.g., Banbura et al. 2013) and in ML begins to address some, but not all, of these issues. We will also see the emergence of new forms of descriptive analysis, some inspired by ML. Examples of these include techniques for describing association (for example, people who do A also do B), as well as interpretations and visualizations of the output of unsupervised ML techniques such as matrix factorization, clustering, and so on. Economists are likely to refine these methods to make them more directly useful quantitatively, and for business and policy decisions.
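The "people who do A also do B" style of description can be made precise with simple co-occurrence statistics such as support, confidence, and lift. A minimal sketch on hypothetical purchase baskets (the items and counts are invented for illustration):

```python
from itertools import combinations
from collections import Counter

# Hypothetical purchase baskets; the goal is descriptive rules of the
# form "people who buy A also buy B", scored by support/confidence/lift.
baskets = [
    {"coffee", "milk"}, {"coffee", "milk", "sugar"}, {"coffee", "sugar"},
    {"tea", "milk"}, {"coffee", "milk"}, {"tea", "sugar"}, {"coffee"},
]
n = len(baskets)
item_counts = Counter(i for b in baskets for i in b)
pair_counts = Counter(p for b in baskets for p in combinations(sorted(b), 2))

for (a, b), c in pair_counts.most_common(3):
    support = c / n                           # P(A and B)
    confidence = c / item_counts[a]           # P(B | A)
    lift = confidence / (item_counts[b] / n)  # > 1: positive association
    print(f"{a} -> {b}: support={support:.2f} "
          f"conf={confidence:.2f} lift={lift:.2f}")
```

Lift above one indicates the pair co-occurs more often than independence would predict, which is the quantitative content behind the informal "also do B" claim.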
More broadly, the ability to use predictive models to measure economic outcomes at high granularity and fidelity will change the types of questions we can ask and answer. For example, imagery from satellites or Google's Street View can be used in combination with survey data to train models that can be used to produce estimates of economic outcomes at the level of the individual home, either within the United States or in developing countries where administrative data quality can be problematic (e.g., Jean et al. 2016; Engstrom, Hersh, and Newhouse 2017; Naik et al. 2014).
Another area of transformation for economics will be in the design and analysis of large-scale administrative data sets. We will see attempts to bring together disparate sources to provide a more complete view of individuals and firms. The behavior of individuals in the financial world, the physical world, and the digital world will be connected, and in some cases ML will be needed simply to match different identities from different contexts onto the same individual. Further, we will observe behavior of individuals over time, often with high-frequency measurements. For example, children will leave digital footprints throughout their education, ranging from how often they check their homework assignments, the assignments themselves, comments from teachers, and so on. Children will interact with adaptive systems that change the material they receive based on their previous engagement and
performance. This will create the need for new statistical methods, building on existing ML tools, but where the methods are more tailored to a panel-data setting with significant dynamic effects (and possibly peer effects as well; for some recent statistical advances designed around analyzing large-scale network data, see Ugander et al. 2013; Athey, Eckles, and Imbens 2015; Eckles et al. 2016).
Another area of future research concerns how to analyze personal data without compromising user privacy. There is a literature in computer science around querying data while preserving privacy; the literature is referred to as "differential privacy." Some recent research has brought together the computer science literature with questions about estimating statistical models (see, e.g., Komarova, Nekipelov, and Yakovlev 2015).
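The core device of differential privacy can be sketched with the standard Laplace mechanism: a query's answer is released with noise scaled to its sensitivity (how much one person's record can change it) divided by a privacy budget epsilon. The data, bounds, and budget below are hypothetical, and real deployments involve much more than this single step.

```python
import numpy as np

# Minimal Laplace-mechanism sketch: release a mean income with noise
# calibrated to the query's sensitivity and a privacy budget epsilon.
rng = np.random.default_rng(3)
incomes = rng.uniform(20_000, 120_000, size=10_000)

lo, hi = 20_000.0, 120_000.0    # assumed clipping bounds on one record
epsilon = 0.5                   # privacy budget (smaller = more private)
n = len(incomes)

# Sensitivity of a clipped mean: one person moves it by at most (hi-lo)/n.
sensitivity = (hi - lo) / n
clipped_mean = np.clip(incomes, lo, hi).mean()
private_mean = clipped_mean + rng.laplace(scale=sensitivity / epsilon)

print(f"true mean:    {clipped_mean:,.0f}")
print(f"private mean: {private_mean:,.0f} (epsilon={epsilon})")
```

The tension the text points to is visible here: with many records the added noise is small relative to the statistic, but repeated queries consume the privacy budget, which is exactly where the statistical-modeling questions arise.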
I also predict a substantial increase in interdisciplinary work. Computer scientists and engineers may remain closer to the frontier in terms of algorithm design, computational efficiency, and related concerns. As I will expand on further in a moment, academics of all disciplines will be gaining a much greater ability to intervene in the environment in a way that facilitates measurement and causal inference. As digital interactions and digital interventions expand across all areas of society, from education to health to government services to transportation, economists will collaborate with domain experts in other areas to design, implement, and evaluate changes in technology and policy. Many of these digital interventions will be powered by ML, and ML-based causal inference tools will be used to estimate personalized treatment effects of the interventions and design personalized treatment assignment policies.
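One concrete form such ML-powered experimentation takes is the multiarmed bandit mentioned in the predictions above, where an algorithm shifts traffic toward better-performing interventions while the experiment is still running. A minimal Thompson-sampling sketch with hypothetical success rates:

```python
import numpy as np

# Minimal Thompson-sampling bandit: each arm (e.g., a variant of a
# digital intervention) has an unknown success rate; Beta posteriors
# steer traffic toward the better arm while experimentation continues.
rng = np.random.default_rng(4)
true_rates = [0.05, 0.08, 0.11]     # unknown to the algorithm
successes = np.ones(3)              # Beta(1, 1) priors
failures = np.ones(3)
pulls = np.zeros(3, dtype=int)

for _ in range(20_000):
    # Sample a plausible rate for each arm, play the best-looking one.
    arm = int(np.argmax(rng.beta(successes, failures)))
    reward = rng.random() < true_rates[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward
    pulls[arm] += 1

print("pulls per arm:", pulls)
```

Unlike a fixed-split A/B test, most observations end up allocated to the best arm, which is why ongoing-experimentation designs of this kind appeal to firms and governments running many interventions at once.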
Alongside the increase in interdisciplinary work, there will also be changes to the organization, funding, and dissemination of economics research. Research on large data sets with complex data creation and analysis pipelines can be labor intensive and also require specialized skills. Scholars who do a lot of complex data analysis with large data sets have already begun to adopt a "lab" model more similar to what is standard today in computer science and many natural sciences. A lab might include a postdoctoral fellow, multiple PhD students, predoctoral fellows (full-time research assistants between their bachelor's and PhD), undergraduates, and possibly full-time staff. Of course, labs of this scale are expensive, and so the funding models for economics will need to adapt to address this reality. One concern is inequality of access to resources required to do this type of research, given that it is expensive enough that it cannot be supported given traditional funding pools for more than a small fraction of economists at research universities.
Within a lab, we will see increased adoption of collaboration tools such as those used in software firms; tools include GitHub (for collaboration, version control, and dissemination of software) as well as communication tools. For example, my generalized random-forest software is available as an open-source package on GitHub at http://github.com/swager/grf; users report issues through GitHub and can submit pull requests with proposed changes or additions to the code.
There will also be an increased emphasis on documentation and reproducibility, which are necessary to make a large lab function. This will happen even as some data sources remain proprietary. "Fake" data sets will be created that allow others to run a lab's code and replicate the analysis (except not on the real data). As an example of institutions created to support the lab model, both Stanford GSB and the Stanford Institute for Economic Policy Research have "pools" of predoctoral fellows that are shared among faculty; these programs provide mentorship, training, and the opportunity to take one class each quarter, and they are also demographically more diverse than graduate student populations. The predoctoral fellows have a special form of student status within Stanford. Other public- and private-sector research groups have also adopted similar programs, with Microsoft Research New England an early innovator in this area, while individual researchers at universities like Harvard and MIT have also been making use of predoctoral research assistants for a number of years.
We will also see changes in how economists engage with government,
industry, education, and health. The concept of the “economist as engineer”
promoted by market-design experts including Robert Wilson, Paul Mil-