The Economics of Artificial Intelligence


by Ajay Agrawal


heterogeneous across users. Similarly, each item has latent characteristics that describe users' willingness to travel to patronize the restaurant, and each user has individual-specific preferences for those latent characteristics. Thus, both users' willingness to travel and their base utility for each restaurant vary across user-item pairs. To make the estimation computationally feasible, we build on the methods of Ruiz, Athey, and Blei (2017). We show that our model performs better than more standard competing models such as multinomial logit and nested logit models, in part due to the personalization of the estimates. We demonstrate in particular that our model performs better when predicting consumer responses to restaurant openings and closings, and we analyze how consumers reallocate their demand after a restaurant closes to nearby restaurants versus more distant restaurants with similar characteristics. Since there are several hundred restaurant openings and closings in the data, we are able to use the large number of "natural experiments" in the data to assess performance of the model. Finally, we show how the model can be used to analyze questions involving counterfactuals, such as what type of restaurant would attract the most consumers in a given location.

Another recent paper that makes use of factorization in the context of a structural model of consumer demand is Wan et al. (2017). This paper builds a model of consumer choice that includes choices over categories, purchases within a category, and quantity to purchase. The model allows for individual heterogeneity in preferences and uses factorization techniques to estimate the model.

21.5 Broader Predictions about the Impact of Machine Learning on Economics

My prediction is that there will be substantial changes in how empirical work is conducted; indeed, it is already happening, and so this prediction already can be made with a high degree of certainty. I predict that a number of changes will emerge, summarized as follows:

1. Adoption of off-the-shelf ML methods for their intended tasks (prediction, classification, and clustering, e.g., for textual analysis).

2. Extensions and modifications of prediction methods to account for considerations such as fairness, manipulability, and interpretability.

3. Development of new econometric methods based on machine learning designed to solve traditional social science estimation tasks.

4. No fundamental changes to the theory of identification of causal effects.

  The Impact of Machine Learning on Economics 535

5. Incremental progress to identification and estimation strategies for causal effects that exploit modern data settings, including large-panel data sets and environments with many small experiments.

6. Increased emphasis on model robustness and other supplementary analysis to assess the credibility of studies.

7. Adoption of new methods by empiricists at large scale.

8. Revival and new lines of research in productivity and measurement.

9. New methods for the design and analysis of large administrative data, including merging these sources and privacy-preserving methods.

10. Increase in interdisciplinary research.

11. Changes in the organization, dissemination, and funding of economic research.

12. Economist as engineer engages with firms and government to design and implement policies in digital environments.

13. Design and implementation of digital experimentation, both one-time and as an ongoing process, including multiarmed bandit experimentation algorithms, in collaboration with firms and government.

14. Research on developing high-quality metrics that can be measured quickly, in order to facilitate rapid incremental innovation and experimentation.

15. Increased use of data analysis in all levels of economics teaching; increase in interdisciplinary data science programs.

16. Research on the impact of AI and ML on the economy.

This chapter has discussed the first three predictions in some detail; I will now discuss each of the remaining predictions in turn.

First, as emphasized in the discussion about the benefits from using ML, ML is a very powerful tool for data-driven model selection. Getting the best flexible functional form to fit data is very important for many reasons; for example, when the researcher assumes that treatment assignment is unconfounded, it is still crucial to flexibly control for covariates, and a vast literature has documented that modeling choices matter. A theme highlighted in this chapter is that ML can be used any time that semiparametric methods might have been used in the traditional econometrics literature. However, finding the best functional form is a distinct concern from whether an economic parameter would be identified with sufficient data. Thus, there is no obvious benefit from ML in terms of thinking about identification issues.
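As a toy illustration of what data-driven model selection buys, the sketch below uses cross-validation to choose a flexible functional form rather than justifying one by hand. Everything here is simulated and uses plain NumPy; it is not any specific method from the chapter, just the generic principle of letting held-out fit pick the specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: the outcome depends nonlinearly on a single covariate.
n = 500
x = rng.uniform(-2, 2, size=n)
y = np.sin(1.5 * x) + rng.normal(scale=0.3, size=n)

def cv_mse(degree, folds=5):
    """Mean squared error of a polynomial fit, estimated by k-fold CV."""
    idx = np.arange(n) % folds
    errs = []
    for k in range(folds):
        tr, te = idx != k, idx == k
        coef = np.polyfit(x[tr], y[tr], degree)
        pred = np.polyval(coef, x[te])
        errs.append(np.mean((y[te] - pred) ** 2))
    return float(np.mean(errs))

# Data-driven model selection: pick the degree with the lowest CV error,
# instead of defending one functional form ad hoc.
degrees = range(1, 9)
scores = {d: cv_mse(d) for d in degrees}
best = min(scores, key=scores.get)
print("selected polynomial degree:", best)
```

The held-out error is the "objective measure against which choices can be evaluated": a linear specification is rejected by the data itself, not by the analyst's taste.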

However, the types of data sets that are becoming widely available due to digitization suggest new identification questions. For example, it is common for there to be frequent changes in algorithms on ecommerce platforms. These changes in algorithms create variation in user experiences (as well as in seller experiences in platforms and marketplaces). Thus, a typical user or seller may experience a large number of changes, each of which has modest effects. There are open questions about what can be learned in

  536 Susan Athey

such environments. From an estimation perspective, there is also room to develop ML-inspired algorithms that take advantage of the many sources of variation experienced by market participants. In my 2012 Fisher Schultz lecture, I illustrated the idea of using randomized experiments conducted by technology firms as instruments for estimating position effects for sponsored search advertisements. This idea has since been exploited more fully by others (e.g., Goldman and Rao 2014), but many open questions remain about the best ways to use the information in such data sets.

Digitization is also leading to the creation of many panel data sets that record individual behavior at relatively high frequency over a period of time. There are many open questions about how to make the best use of rich panel data. Previously, we discussed several new papers at the intersection of ML and econometrics that made use of panel data (e.g., Athey, Bayati, et al. 2017), but I predict that this literature will grow dramatically over the next few years.

There are many reasons that empiricists will adopt ML methods at scale. First, many ML methods systematize a variety of arbitrary choices analysts otherwise need to make. In larger and more complex data sets, there are many more choices. Each choice must be documented and justified, and serves as a potential source of criticism of a paper. When systematic, data-driven methods are available, research can be made more principled and systematic, and there can be objective measures against which these choices can be evaluated. Indeed, it would really be impossible for a researcher using traditional empirical methods to fully document the process by which the model specification was selected; in contrast, algorithmic selection (when the algorithm is given the correct objective for the problem) has superior performance while simultaneously being reproducible. Second, one way to conceptualize ML algorithms is that they perform like automated research assistants: they work much faster and more effectively than traditional research assistants at exploring modeling choices, yet the methods that have been customized for social science applications also build in protections so that, for example, valid confidence intervals can be obtained. Although it is crucial to consider carefully the objective that the algorithms are given, in the end they are highly effective. Thus, they help resolve issues like "p-value hacking" by giving researchers the best of both worlds: superior performance as well as correct p-values that take into account the specification-selection process. Third, in many cases, new results can be obtained. For example, if an author has run a field experiment, there is no reason not to search for heterogeneous treatment effects using methods such as those in Athey and Imbens (2016). The method ensures that valid confidence intervals can be obtained for the resulting estimates of treatment effect heterogeneity.
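The sample-splitting logic behind such methods can be caricatured in a few lines. The simulation below is a hypothetical NumPy sketch, not the Athey and Imbens (2016) causal tree algorithm: one half of a randomized experiment is used to search for a subgroup split, and the other half to estimate subgroup effects, so the search cannot bias the estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated randomized experiment: the treatment effect is larger for x > 0.
n = 4000
x = rng.normal(size=n)           # pretreatment covariate
w = rng.integers(0, 2, size=n)   # random treatment assignment
tau = np.where(x > 0, 2.0, 0.5)  # true heterogeneous effect
y = 1.0 + tau * w + rng.normal(size=n)

def effect(mask):
    """Difference-in-means treatment effect within a subgroup."""
    return y[mask & (w == 1)].mean() - y[mask & (w == 0)].mean()

# "Honest" split: one half selects the subgroup definition, the other half
# estimates its effect, so specification search does not contaminate inference.
select = np.zeros(n, dtype=bool)
select[: n // 2] = True
estimate = ~select

# Selection half: crude search for the threshold that separates the two
# most different subgroups.
grid = np.quantile(x[select], np.linspace(0.1, 0.9, 17))
cut = max(grid, key=lambda c: abs(effect(select & (x > c))
                                  - effect(select & (x <= c))))

# Estimation half: subgroup effect estimates here are unbiased
# despite the search performed on the other half.
low = effect(estimate & (x <= cut))
high = effect(estimate & (x > cut))
print(f"threshold {cut:.2f}: effect {low:.2f} below, {high:.2f} above")
```

Because the estimation half never saw the search, ordinary difference-in-means standard errors remain valid for the reported subgroup effects.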

Alongside the adoption of ML methods for old questions, new questions and types of analyses will emerge in the fields of productivity and measurement. Some examples of these have already been highlighted, such as the ability to measure economic outcomes at a granular level over a longer period of time, through, for example, imagery. Glaeser et al. (2018) provides a nice overview of how big data and ML will affect urban economics as a field, as well as the operational efficiency of cities. More broadly, as governments begin to absorb high-frequency, granular data, they will need to grapple with questions about how to maintain the stability of official statistics in a world where the underlying data changes rapidly. New questions will emerge about how to architect a system of measurement that takes advantage of high-frequency, noisy, unstable data, but yields statistics whose meaning and relationship with a wide range of economic variables remains stable. Firms will face similar problems as they attempt to forecast outcomes relevant to their own businesses using noisy, high-frequency data. The emerging literature in academics, government, and industry on "nowcasting" in macroeconomics and ML (e.g., Banbura et al. 2013) begins to address some, but not all, of these issues. We will also see the emergence of new forms of descriptive analysis, some inspired by ML. Examples of these include techniques for describing association (for example, people who do A also do B), as well as interpretations and visualizations of the output of unsupervised ML techniques such as matrix factorization, clustering, and so on. Economists are likely to refine these methods to make them more directly useful quantitatively, and for business and policy decisions.
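A minimal sketch of the "people who do A also do B" style of descriptive analysis uses the standard lift statistic from market-basket analysis; the item names and purchase matrix below are made up for illustration.

```python
import numpy as np

# Toy user-by-item purchase matrix (1 = user bought the item). In practice
# this would be built from transaction logs; the items are hypothetical.
items = ["coffee", "filters", "tea"]
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
])

def lift(a, b):
    """P(B | A) / P(B): how much doing A raises the rate of doing B."""
    i, j = items.index(a), items.index(b)
    p_b = X[:, j].mean()
    p_b_given_a = X[X[:, i] == 1, j].mean()
    return p_b_given_a / p_b

# "People who buy coffee also buy filters": lift well above 1.
print(lift("coffee", "filters"))  # → 1.5
```

A lift above 1 indicates positive association; turning such raw descriptive output into quantities useful for business and policy decisions is exactly the refinement step the text anticipates.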

More broadly, the ability to use predictive models to measure economic outcomes at high granularity and fidelity will change the types of questions we can ask and answer. For example, imagery from satellites or Google's Street View can be used in combination with survey data to train models that can be used to produce estimates of economic outcomes at the level of the individual home, either within the United States or in developing countries where administrative data quality can be problematic (e.g., Jean et al. 2016; Engstrom, Hersh, and Newhouse 2017; Naik et al. 2014).

Another area of transformation for economics will be in the design and analysis of large-scale administrative data sets. We will see attempts to bring together disparate sources to provide a more complete view of individuals and firms. The behavior of individuals in the financial world, the physical world, and the digital world will be connected, and in some cases ML will be needed simply to match different identities from different contexts onto the same individual. Further, we will observe behavior of individuals over time, often with high-frequency measurements. For example, children will leave digital footprints throughout their education, ranging from how often they check their homework assignments to the assignments themselves to comments from teachers, and so on. Children will interact with adaptive systems that change the material they receive based on their previous engagement and performance. This will create the need for new statistical methods, building on existing ML tools, but where the methods are more tailored to a panel-data setting with significant dynamic effects (and possibly peer effects as well; see, for some recent statistical advances designed around analyzing large-scale network data, Ugander et al. 2013; Athey, Eckles, and Imbens 2015; Eckles et al. 2016).

Another area of future research concerns how to analyze personal data without compromising user privacy. There is a literature in computer science around querying data while preserving privacy; the literature is referred to as "differential privacy." Some recent research has brought together the computer science literature with questions about estimating statistical models (see, e.g., Komarova, Nekipelov, and Yakovlev 2015).
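As a sketch of the core idea in that computer science literature, the Laplace mechanism releases a statistic after adding noise calibrated to how much any one individual can move it. The data and parameter choices below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def private_mean(values, lo, hi, epsilon):
    """Release a mean with epsilon-differential privacy (Laplace mechanism).

    Clamping each value to [lo, hi] bounds one individual's influence on
    the mean (the sensitivity is (hi - lo) / n), and Laplace noise with
    scale sensitivity / epsilon masks that influence.
    """
    clipped = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(clipped)
    noise = rng.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

# Hypothetical incomes; a smaller epsilon means stronger privacy guarantees
# and therefore more noise in the released statistic.
incomes = rng.normal(50_000, 10_000, size=10_000)
print(private_mean(incomes, 0, 200_000, epsilon=0.5))
```

With many individuals the added noise is small relative to the statistic, which is why aggregate queries can remain useful for estimation even under strong privacy guarantees.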

I also predict a substantial increase in interdisciplinary work. Computer scientists and engineers may remain closer to the frontier in terms of algorithm design, computational efficiency, and related concerns. As I will expand on further in a moment, academics of all disciplines will be gaining a much greater ability to intervene in the environment in a way that facilitates measurement and causal inference. As digital interactions and digital interventions expand across all areas of society, from education to health to government services to transportation, economists will collaborate with domain experts in other areas to design, implement, and evaluate changes in technology and policy. Many of these digital interventions will be powered by ML, and ML-based causal inference tools will be used to estimate personalized treatment effects of the interventions and design personalized treatment assignment policies.

Alongside the increase in interdisciplinary work, there will also be changes to the organization, funding, and dissemination of economics research. Research on large data sets with complex data creation and analysis pipelines can be labor intensive and also require specialized skills. Scholars who do a lot of complex data analysis with large data sets have already begun to adopt a "lab" model more similar to what is standard today in computer science and many natural sciences. A lab might include a postdoctoral fellow, multiple PhD students, predoctoral fellows (full-time research assistants between their bachelor's and PhD), undergraduates, and possibly full-time staff. Of course, labs of this scale are expensive, and so the funding models for economics will need to adapt to address this reality. One concern is inequality of access to the resources required to do this type of research, given that it is expensive enough that it cannot be supported by traditional funding pools for more than a small fraction of economists at research universities.

Within a lab, we will see increased adoption of collaboration tools such as those used in software firms; tools include GitHub (for collaboration, version control, and dissemination of software), as well as communication tools. For example, my generalized random-forest software is available as an open-source package on GitHub at http://github.com/swager/grf; users report issues through GitHub and can submit requests to pull in proposed changes or additions to the code.


There will also be an increased emphasis on documentation and reproducibility, which are necessary to make a large lab function. This will happen even as some data sources remain proprietary. "Fake" data sets will be created that allow others to run a lab's code and replicate the analysis (except not on the real data). As an example of institutions created to support the lab model, both Stanford GSB and the Stanford Institute for Economic Policy Research have "pools" of predoctoral fellows that are shared among faculty; these programs provide mentorship, training, and the opportunity to take one class each quarter, and they also are demographically more diverse than graduate student populations. The predoctoral fellows have a special form of student status within Stanford. Other public- and private-sector research groups have also adopted similar programs, with Microsoft Research-New England an early innovator in this area, while individual researchers at universities like Harvard and MIT have also been making use of predoctoral research assistants for a number of years.

We will also see changes in how economists engage with government, industry, education, and health. The concept of the "economist as engineer" promoted by market-design experts including Robert Wilson, Paul Mil-
