22Of 1082 Indigenous-specific programs found in an online search, the author estimates that only eighty-eight have been, or are in the process of being, evaluated. See Sara Hudson, Mapping the Indigenous Program and Funding Maze, Research Report 18, Sydney: CIS, 2016, p. 23. Similarly, a review of Commonwealth Indigenous programs by the Department of Finance found a lack of robust evidence on the performance of most of them: see Productivity Commission, ‘Better Indigenous policies: The role of evaluation, roundtable proceedings, 22–23 October 2012, Canberra’, Canberra: Productivity Commission, 2012, p. 18.
23Peter Rossi, ‘The iron law of evaluation and other metallic rules’ in Joann L. Miller and Michael Lewis (eds), Research in Social Problems and Public Policy, vol. 4, Greenwich, CT: JAI Press, 1987, pp. 3–20 at p. 3.
24Tim Harford, ‘The random risks of randomised trials’, Financial Times, 25 April 2014.
25Janet Currie, ‘Early childhood education programs’, Journal of Economic Perspectives, vol. 15, no. 2, 2001, pp. 213–38.
26Patrick Kline & Chris Walters, ‘Evaluating public programs with close substitutes: The case of Head Start’, Quarterly Journal of Economics, vol. 131, no. 4, 2016, pp. 1795–1848. See also Roland Fryer, ‘The production of human capital in developed countries: Evidence from 196 randomized field experiments’ in Banerjee & Duflo (eds), Handbook of Field Experiments, pp. 95–322.
27In one randomised evaluation of Head Start, the share of children attending centre-based care was 90 per cent in the treatment group and 43 per cent in the control group: Michael Puma, Stephen Bell, Ronna Cook & Camilla Heid, ‘Head Start impact study final report’, Washington, DC, 2010: US Department of Health and Human Services, Administration for Children and Families.
28The main error was that evaluators were massively overstating the true cost of Head Start. The cost should have been measured not as the total cost of Head Start, but as the difference between Head Start’s cost and the cost of the other publicly provided preschool programs. See Kline and Walters, ‘Evaluating public programs with close substitutes’.
29See for example Andrew Leigh, ‘Employment effects of minimum wages: Evidence from a quasi-experiment’, Australian Economic Review, vol. 36, no. 4, 2003, pp. 361–73 (with erratum in vol. 37, no. 1, pp. 102–5).
30Ian Davidoff & Andrew Leigh, ‘How much do public schools really cost? Estimating the relationship between house prices and school quality’, Economic Record, vol. 84, no. 265, 2008, pp. 193–206.
31Andrew Leigh & Chris Ryan, ‘Estimating returns to education using different natural experiment techniques’, Economics of Education Review, vol. 27, no. 2, 2008, pp. 149–60.
32Paul Burke & Andrew Leigh, ‘Do output contractions trigger democratic change?’ American Economic Journal: Macroeconomics, vol. 2, no. 4, 2010, pp. 124–57.
33Andrew Leigh and Christine Neill, ‘Can national infrastructure spending reduce local unemployment? Evidence from an Australian roads program’, Economics Letters, vol. 113, no. 2, 2011, pp. 150–3.
34Susan Athey, ‘Machine learning and causal inference for policy evaluation’, in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2015, pp. 5–6; Sendhil Mullainathan & Jann Spiess, ‘Machine learning: an applied econometric approach’, Journal of Economic Perspectives, vol. 31, no. 2, 2017, pp. 87–106.
35Peter Passell, ‘Like a new drug, social programs are put to the test’, New York Times, 9 March 1993.
36Joshua Angrist & Jörn-Steffen Pischke, Mostly Harmless Econometrics: An Empiricist’s Companion, Princeton: Princeton University Press, 2009, pp. 4–11.
37Robert J. LaLonde, ‘Evaluating the econometric evaluations of training programs with experimental data’, American Economic Review, vol. 76, no. 4, 1986, pp. 604–20. See also Joshua D. Angrist & Jörn-Steffen Pischke, ‘The credibility revolution in empirical economics: How better research design is taking the con out of econometrics’, Journal of Economic Perspectives, vol. 24, no. 2, 2010, pp. 3–30.
38George Bulman & Robert W. Fairlie, ‘Technology and education: The effects of computers, the Internet and computer assisted instruction on educational outcomes’ in Eric A. Hanushek, Stephen Machin & Ludger Woessmann (eds), Handbook of the Economics of Education, Volume 5, Amsterdam: Elsevier, 2016, pp. 239–80.
39The United States abandoned the gold standard in 1971. A 2012 survey of prominent US economists carried out by the IGM Economic Experts Panel found that none of the forty respondents supported a return to the gold standard.
40For a proposed evidence hierarchy, see Andrew Leigh, ‘What evidence should social policymakers use?’, Economic Roundup, no. 1, 2009, pp. 27–43.
41Jon Baron, quoted in Gueron & Rolston, Fighting for Reliable Evidence, p. 458.
42Sheena S. Iyengar & Mark R. Lepper, ‘When choice is demotivating: Can one desire too much of a good thing?’ Journal of Personality and Social Psychology, vol. 79, no. 6, 2000, pp. 995–1006.
43This example is from Manzi, Uncontrolled, pp. 149–52. At the time of writing, Google Scholar estimated that Iyengar and Lepper’s paper had been cited over 2500 times. I confess that I’m one of those who is guilty of popularising it without reviewing the follow-up studies: Andrew Leigh, The Economics of Just About Everything, Sydney: Allen & Unwin, 2014, p. 10.
44Benjamin Scheibehenne, Rainer Greifeneder & Peter M. Todd, ‘Can there ever be too many options? A meta-analytic review of choice overload’, Journal of Consumer Research, vol. 37, no. 3, 2010, pp. 409–25.
45Alan Gerber & Neil Malhotra, ‘Publication bias in empirical sociological research’, Sociological Methods & Research, vol. 37, no. 1, 2008, pp. 3–30; Alan Gerber & Neil Malhotra, ‘Do statistical reporting standards affect what is published? Publication bias in two leading political science journals’, Quarterly Journal of Political Science, vol. 3, no. 3, 2008, pp. 313–26; E.J. Masicampo & Daniel R. Lalande, ‘A peculiar prevalence of p values just below .05’, Quarterly Journal of Experimental Psychology, vol. 65, no. 11, 2012, pp. 2271–9; Kewei Hou, Chen Xue & Lu Zhang, ‘Replicating anomalies’, NBER Working Paper 23394, Cambridge, MA: National Bureau of Economic Research, 2017.
46Alexander A. Aarts, Joanna E. Anderson, Christopher J. Anderson, et al., ‘Estimating the reproducibility of psychological science’, Science, vol. 349, no. 6251, 2015.
47This represented two out of eighteen papers: John P.A. Ioannidis, David B. Allison, Catherine A. Ball, et al., ‘Repeatability of published microarray gene expression analyses’, Nature Genetics, vol. 41, no. 2, 2009, pp. 149–55.
48This represented six out of fifty-three papers: C. Glenn Begley & Lee M. Ellis, ‘Drug development: Raise standards for preclinical cancer research’, Nature, vol. 483, no. 7391, 2012, pp. 531–3.
49This represented twenty-nine out of fifty-nine papers: Andrew C. Chang & Phillip Li, ‘A preanalysis plan to replicate sixty economics research papers that worked half of the time’, American Economic Review, vol. 107, no. 5, 2017, pp. 60–4.
50John P.A. Ioannidis, ‘Why most published research findings are false’, PLoS Med, vol. 2, no. 8, 2005, e124.
51See, for example, Zacharias Maniadis, Fabio Tufano & John A. List, ‘How to make experimental economics research more reproducible: Lessons from other disciplines and a new proposal’, Replication in Experimental Economics, 2015, pp. 215–30; Regina Nuzzo, ‘How scientists fool themselves – and how they can stop’, Nature, vol. 526, no. 7572, 2015, pp. 182–5.
52Larry Orr, ‘If at first you succeed, try again!’, Straight Talk on Evidence blog, Laura and John Arnold Foundation, 16 August 2017.
53Author’s interview with David Johnson, 16 July 2015.
54The same is true of the question of when research should overturn prior beliefs. Just one study should not necessarily be persuasive, but multiple studies should eventually cause a person to change his or her mind: Luigi Butera & John A. List, ‘An economic approach to alleviate the crises of confidence in science: With an application to the public goods game’, NBER Working Paper No. 23335, Cambridge, MA: NBER, 2017.
55In the Campbell Collaboration database, the share of studies carried out in the United States was 88 per cent prior to 1985, 87 per cent for 1985 to 1994, and 29 per cent for 2005 to 2014: Ames & Wilson, ‘Unleashing the potential’.
56Monique L. Anderson, Karen Chiswell, Eric D. Peterson, Asba Tasneem, James Topping & Robert M. Califf, ‘Compliance with results reporting at ClinicalTrials.gov’, New England Journal of Medicine, vol. 372, no. 11, 2015, pp. 1031–39.
57‘Spilling the beans: Failure to publish the results of all clinical trials is skewing medical science’, Economist, 25 July 2015, pp. 62–3.
58‘Spilling the beans’.
59Ben Goldacre, Henry Drysdale, Anna Powell-Smith, et al., The COMPare Trials Project, 2016, www.COMPare-trials.org; Christopher W. Jones, Lukas G. Keil, Wesley C. Holland, et al., ‘Comparison of registered and published outcomes in randomized controlled trials: A systematic review’, BMC Medicine, vol. 13, no. 1, 2015, pp. 1–12; Padhraig S. Fleming, Despina Koletsi, Kerry Dwan & Nikolaos Pandis, ‘Outcome discrepancies and selective reporting: impacting the leading journals?’ PloS One, vol. 10, no. 5, 2015, e0127495.
60‘For my next trick …’, Economist, 26 March 2016.
61In Grade 5, students in the treatment group scored 10.9 points higher on their numeracy tests (representing about two months’ worth of learning), and had 0.24 kg less body fat (about 11 per cent less fat): Richard D. Telford, Ross B. Cunningham, Robert Fitzgerald, et al., ‘Physical education, obesity, and academic achievement: A 2-year longitudinal investigation of Australian elementary school children’, American Journal of Public Health, vol. 102, no. 2, 2012, pp. 368–74. In Grade 6, the share of children with elevated LDL-C was 23 per cent in the control group, but 14 per cent in the treatment group: Richard D. Telford, Ross B. Cunningham, Paul Waring, et al., ‘Physical education and blood lipid concentrations in children: The LOOK randomized cluster trial’, PloS One, vol. 8, no. 10, 2013, e76124.
12 WHAT’S THE NEXT CHANCE?
1David Wootton, The Invention of Science: A New History of the Scientific Revolution, New York: Harper, 2015, pp. 6–7.
2Wootton, The Invention of Science, p. 355, quoted in Adam Gopnik, ‘Spooked’, New Yorker, 30 November 2015, pp. 84–6.
3The survey asked whether people agreed with the statement that ‘Human beings, as we know them, developed from other species of animals’. Jon D. Miller, Eugenie C. Scott and Shinji Okamoto, ‘Public acceptance of evolution’, Science, vol. 313, no. 5788, 2006, pp. 765–6. Gallup polls show that the share of Americans believing that ‘human beings have evolved over millions of years from other forms of life, but God had no part in this process’ grew from 9 per cent in 1982 to 19 per cent in 2017.
4Economist Intelligence Unit, Gut & gigabytes: Capitalising on the art & science in decision making, New York: PwC, 2014, p. 29.
5Tim Harford, ‘How politicians poisoned statistics’, FT Magazine, 14 April 2016.
6Harry Frankfurt, ‘On bullshit’, Raritan Quarterly Review, vol. 6, no. 2, 1986, pp. 81–100.
7Donald Campbell, ‘The experimenting society’ in William Dunn (ed.), The Experimenting Society: Essays in Honor of Donald T. Campbell, Policy Studies Review Annual, Volume 11, New Brunswick: Transaction Publishers, 1998, p. 39.
8Campbell, ‘The experimenting society’, p. 41.
9Richard Feynman, ‘Cargo cult science’, Caltech Commencement Address, 1974.
10Esther Duflo & Michael Kremer, ‘Use of randomization in the evaluation of development effectiveness’ in William R. Easterly (ed.), Reinventing Foreign Aid, Cambridge, MA: MIT Press, 2008, p. 117.
11Halpern, Inside the Nudge Unit, p. 341.
12Peter Passell, ‘Like a new drug, social programs are put to the test’, New York Times, 9 March 1993, p. C1. Gueron headed the Manpower Demonstration Research Corporation from 1986 to 2004.
13For a discussion of incrementalism in art, economics, sport and dieting, see Stephen Dubner, ‘In praise of incrementalism’, Freakonomics Radio, 26 October 2016.
14Quoted in Lisa Sanders, ‘Medicine’s progress, one setback at a time’, New York Times, 16 March 2003, pp. 29–31.
15Quoted in Colleen M. McCarthy, E. Dale Collins & Andrea L. Pusic, ‘Where do we find the best evidence?’ Plastic and Reconstructive Surgery, vol. 122, no. 6, 2008, pp. 1942–7.
16Quoted in Gomes, The Good Life, p. 84.
17OECD, Entrepreneurship at a Glance 2015, Paris: OECD Publishing, 2015, p. 58.
18William R. Kerr, Ramana Nanda & Matthew Rhodes-Kropf, ‘Entrepreneurship as experimentation’, Journal of Economic Perspectives, vol. 28, no. 3, 2014, pp. 25–48.
19Quoted in Dan Ariely, ‘Why businesses don’t experiment’, Harvard Business Review, vol. 88, no. 4, 2010, pp. 34–36.
20Megan McArdle, The Up Side of Down: Why Failing Well Is the Key to Success, New York: Penguin, 2015.
21Bent Flyvbjerg, Mette K. Skamris Holm & Søren L. Buhl, ‘How (in)accurate are demand forecasts in public works projects? The case of transportation’, Journal of the American Planning Association, vol. 71, no. 2, 2005, pp. 131–46; Robert Bain, ‘Error and optimism bias in toll road traffic forecasts’, Transportation, vol. 36, no. 5, 2009, pp. 469–82; Bent Flyvbjerg & Eamonn Molloy, ‘Delusion, deception and corruption in major infrastructure projects: Causes, consequences, cures’, International Handbook on the Economics of Corruption, vol. 2, 2012, pp. 81–107.
22Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable, 2nd edn, New York: Random House, 2010, p. 154.
23Ola Svenson, ‘Are we all less risky and more skillful than our fellow drivers?’ Acta Psychologica, vol. 47, no. 2, 1981, pp. 143–8.
24Eighteen per cent rated their own beauty as above average, 79 per cent said average, and 3 per cent said below average: Jonathan Kelley, Robert Cushing & Bruce Headey, Codebook for 1984 Australian National Social Science Survey (ICPSR 9084), Ann Arbor, MI: Inter-university Consortium for Political and Social Research, 1989.
25Dominic D.P. Johnson & James H. Fowler, ‘The evolution of overconfidence’, Nature, vol. 477, no. 7364, 2011, pp. 317–20.
26Daniel Kahneman, Thinking, Fast and Slow, New York: Macmillan, 2011, p. 263.
27For a thoughtful discussion of why the legal profession has resisted randomised trials, see James Greiner & Andrea Matthews, ‘Randomized control trials in the United States legal profession’, Annual Review of Law and Social Science, vol. 12, 2016, pp. 295–312. The dearth of random assignment studies (and empirical evidence, for that matter) on antiterrorism strategies is discussed in Anthony Biglan, ‘Where terrorism research goes wrong’, New York Times, 6 March 2015, p. SR12.
28Chris Blattman, ‘Why “what works?” is the wrong question: Evaluating ideas not programs’, chrisblattman.com, 19 July 2016.
29Jens Ludwig, Jeffrey R. Kling & Sendhil Mullainathan, ‘Mechanism experiments and policy evaluations’, Journal of Economic Perspectives, vol. 25, no. 3, 2011, pp. 17–38.
30In 1969, Stanford psychologist Philip Zimbardo tried this on a small scale, smashing the windows on a parked car and then watching to see how members of the community responded. See George Kelling & James Wilson, ‘Broken windows: The police and neighborhood safety’, Atlantic, vol. 249, no. 3, 1982, pp. 29–38.
31See USAID, ‘Frequently Asked Questions about Development Innovation Ventures’, Washington, DC: USAID, 6 February 2017; USAID, ‘FY2015 & FY2016 Development Innovation Ventures Annual Program Statement’, Washington, DC: USAID, 20 October 2015.
32These examples are drawn from the Coalition for Evidence-Based Policy (now incorporated into the Laura and John Arnold Foundation), and a presentation by Adam Gamoran, titled ‘Measuring impact in science education: Challenges and possibilities of experimental design’, NYU Abu Dhabi Conference, January 2009.
33‘In praise of human guinea pigs’, Economist, 12 December 2015, p. 14.
34Education Endowment Foundation, ‘Classification of the security of findings from EEF evaluations’, 21 May 2014.
35‘David Olds speaks on value of randomized controlled trials’, Children’s Health Policy Centre, Faculty of Health Sciences, Simon Fraser University, 26 May 2014.
36Dean Karlan & Daniel H. Wood, ‘The effect of effectiveness: Donor response to aid effectiveness in a direct mail fundraising experiment’, Journal of Behavioral and Experimental Economics, vol. 66, issue C, 2017, pp. 1–8. The mailings were sent in 2007 and 2008, but the description quoted is from the 2008 letters.
TEN COMMANDMENTS
1Gueron & Rolston, Fighting for Reliable Evidence, p. 383.
2Admittedly, this creates the risk that results may come too late to shape policy. When the Indonesian government announced that it planned to double teacher salaries, a team of researchers created a randomised trial by implementing the change early in randomly selected schools. The evaluation showed that the policy – costing over US$5 billion annually – did not improve student learning. As a former finance minister of Indonesia wryly noted afterwards, the result would have been more useful if it had been known before the policy change had been fully implemented: see Karthik Muralidharan & Paul Niehaus, ‘Experimentation at scale’, Journal of Economic Perspectives, vol. 31, no. 4, 2017, pp. 103–24.
3Uri Gneezy & Pedro Rey-Biel, ‘On the relative efficiency of performance pay and noncontingent incentives’, Journal of the European Economic Association, vol. 12, no. 1, 2014, pp. 62–72.
4Charles Ralph Buncher & Jia-Yeong Tsay (eds), Statistics in the Pharmaceutical Industry, 2nd edn, New York: Marcel Dekker, 1994, p. 211.
5Derek Willis, ‘Professors’ research project stirs political outrage in Montana’, New York Times, 28 October 2014.