
Randomistas


by Andrew Leigh


  2. Think creatively about how to create a random difference in the program.

  Sometimes it isn’t practical or ethical to tell a group of people that they won’t ever get the program. If so, consider alternatives to the standard randomised trial. If a policy is already being rolled out over a two-year period, why not randomise who gets it in the first year and who gets it in the second year?2 If you want to evaluate an existing program that has low take-up rates, can you use an information campaign or incentives to randomly encourage some people to access the program?
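  To make the idea concrete, here is a minimal Python sketch (not from the book; the site names and random seed are invented) of a randomised phase-in, where every site eventually receives the program but the rollout year is decided by lot:

```python
import random

# Hypothetical sites awaiting a two-year rollout
sites = [f"site_{i:02d}" for i in range(1, 21)]

rng = random.Random(2024)   # fixed seed so the allocation can be reproduced
rng.shuffle(sites)
half = len(sites) // 2

# First half gets the program in year 1; the rest wait until year 2
rollout_year = {site: 1 for site in sites[:half]}
rollout_year.update({site: 2 for site in sites[half:]})

for site, year in sorted(rollout_year.items()):
    print(site, "-> year", year)
```

  During year 1, the sites waiting until year 2 serve as a temporary control group.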

  3. Consider what the control group will do.

  Put yourself in the shoes of someone who ends up in the control group. What would you do? Recall the evaluations of the US early childhood program Head Start, which initially failed to recognise that many of the children in the control group attended other publicly provided preschool centres. Until this was taken into account, the true benefit–cost ratio was underestimated by a factor of two.
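  A back-of-the-envelope illustration (the numbers below are invented, not Head Start’s actual figures) shows how control-group substitution dilutes the measured effect:

```python
# Hypothetical: preschool raises test scores by 10 points relative to no preschool.
true_effect = 10.0
# Suppose half the control group attends another publicly provided preschool.
substitution_rate = 0.5

control_gain = substitution_rate * true_effect    # controls are partly 'treated'
measured_effect = true_effect - control_gain      # naive treatment-control gap

print(f"Measured effect: {measured_effect:.1f} points; "
      f"true effect: {true_effect:.1f} points; "
      f"understated by a factor of {true_effect / measured_effect:.0f}")
```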

  4. Choose which outcomes to measure.

  Administrative data has the advantage that it’s cheap or free, and you can typically get it for everyone in the experiment. Surveys can be tailored, but if only one-tenth of the people answer your survey, you’ll need to start with a sample that’s ten times as big. Some surveys do repeated follow-ups, while others pay people for responding (an experiment by a chain store found that putting a dollar in the envelope doubled response rates from 8 to 16 per cent).3 When you assess the impact of the intervention, focus only on the random assignment. If a person who started in the control group manages to get their hands on the treatment, you must still analyse the data based on their original status.
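  Analysing people by their original assignment is usually called intention-to-treat analysis. A minimal Python sketch (all figures invented) of how it works:

```python
# Each record: (assigned group, whether they actually received the treatment, outcome)
participants = [
    ("treatment", True,  7.2),
    ("treatment", True,  6.8),
    ("treatment", False, 5.9),   # assigned to treatment, never showed up
    ("control",   False, 5.1),
    ("control",   True,  6.5),   # crossover: got hold of the treatment anyway
    ("control",   False, 4.9),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Intention-to-treat: group people by how they were randomised,
# ignoring what they actually did afterwards.
itt_treatment = mean(o for group, _, o in participants if group == "treatment")
itt_control = mean(o for group, _, o in participants if group == "control")

print(f"Intention-to-treat estimate: {itt_treatment - itt_control:.2f}")
```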

  5. Select the level at which to randomise.

  An educational intervention might randomise across students, across classrooms or across schools. The right answer here depends on practical considerations, ethical concerns and how the policy might spill over from the treatment group to the control group. In some of the early trials of AIDS drugs, patients in the treatment and control groups shared their medication – an understandable reaction, given that the disease was effectively a death sentence at that time.4 Everyone got half a dose of the true drugs, the trial results were useless, and the drugs ended up taking longer to get approved. If the trial had randomised across hospitals, it would have required a larger sample, but it would have been more likely to succeed.
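  A minimal sketch (hospital names invented) of randomising at the cluster level rather than the patient level:

```python
import random

hospitals = ["Hospital A", "Hospital B", "Hospital C", "Hospital D",
             "Hospital E", "Hospital F", "Hospital G", "Hospital H"]

rng = random.Random(7)
# Assign whole hospitals to the treatment arm, so patients within a
# hospital cannot share medication across the treatment-control divide.
treatment_hospitals = set(rng.sample(hospitals, k=len(hospitals) // 2))

for hospital in hospitals:
    arm = "treatment" if hospital in treatment_hospitals else "control"
    print(hospital, "->", arm)
```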

  6. Ensure your study is large enough.

  If you’re expecting the intervention to cause massive differences between treatment and control groups, then a small sample might be sufficient. Recall the Triple P parenting program, which had such a large impact that it led to significant results in a sample of just fifty-one Indigenous families. But if you’re testing something that will only move the needle slightly, you’ll need a larger sample. Remember the problem with estimating the impact of television advertisements: an individual ad has such a small impact on overall buying that it’s almost impossible to detect even in a randomised trial. If you want an indication of how big your experiment needs to be, an internet search for ‘power calculation’ will bring up useful online calculators. If your intended sample isn’t large enough, consider collaborating with researchers in other cities or countries. Not only will it boost the sample, but it will also make people more inclined to believe that your findings are true everywhere – not just in one specific context.
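  As an illustration of what such a calculator does, here is a rough sketch using the Python statsmodels library (the assumed effect size of 0.2 standard deviations is purely for illustration):

```python
from statsmodels.stats.power import TTestIndPower

n_per_arm = TTestIndPower().solve_power(
    effect_size=0.2,      # a small effect that only 'moves the needle slightly'
    alpha=0.05,           # conventional significance threshold
    power=0.8,            # 80 per cent chance of detecting a true effect
    alternative="two-sided",
)
print(f"About {n_per_arm:.0f} participants are needed in each group")
```

  Halve the assumed effect size and the required sample roughly quadruples, which is why subtle interventions demand very large trials.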

  7. Register your trial and get ethics approval.

  If you hope to publish the results, then register your trial with the appropriate medical or social science website. Wherever possible, obtain ethics approval. If your intervention could harm the participants, the ethics committee may require you to establish a data and safety monitoring board, which will keep an eye on the experiment as it runs. While ethics approval can be time-consuming, it does provide insurance in case anything goes wrong. In 2014 a political science experiment sent flyers to 100,000 Montana voters, showing the ideological position of candidates for the state Supreme Court.5 Because the flyers bore the official state seal, the study was found to have breached Montana election law. The researchers might have deflected the blame if they’d had the experiment approved by their universities’ institutional review boards. But they hadn’t.6

  8. Confirm that the key people understand and support randomisation.

  It’s critical to ensure that everyone involved understands why an experiment is being conducted. Supervisors will need to justify randomisation to funders or to managers. Caseworkers may have to turn away needy people based on a random process.7 If these people don’t follow the randomisation, your results are likely to be garbage.8 BETA head Michael Hiscox says that ‘developing the partnerships and getting everyone on board at the start probably accounts for 75 per cent of total effort spent in my trials’.9 Bad experts attempt to muddle through by pulling rank and using jargon. Good experts take time to explain to those on the ground what they hope to learn from a randomised trial, why it will be helpful to the clients and the organisation, and why the trial is ethical.

  9. Use a truly random procedure to split the sample.

  To allocate people to the treatment and control groups, you can toss a coin, draw paper from a hat or use your spreadsheet’s random number generation function. If you’re splitting a list in half, ensure it has been sorted into random order. If you have some background information about the participants, then you can get a bit more statistical precision by balancing the treatment and control groups on observable traits. An evaluation of mentoring programs conducted in the 1930s matched troubled youth into similar pairs, based on age, family background and delinquent behaviour.10 Researchers then tossed a coin within each pair, assigning one to the treatment group and the other to the control group.11
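  A minimal sketch (names and ages invented) of matched-pair randomisation in the spirit of that 1930s study:

```python
import random

# Hypothetical participants, to be matched on age
participants = [("Alf", 12), ("Bill", 12), ("Carl", 13),
                ("Dan", 13), ("Ed", 14), ("Frank", 14)]

rng = random.Random(42)
participants.sort(key=lambda person: person[1])   # crude matching: sort by age

# Pair up neighbours, then 'toss a coin' within each pair
for a, b in zip(participants[0::2], participants[1::2]):
    treated, control = (a, b) if rng.random() < 0.5 else (b, a)
    print(f"Pair ({a[0]}, {b[0]}): treatment = {treated[0]}, control = {control[0]}")
```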

  10. If possible, conduct a small-scale pilot study.

  Just as an athlete goes through their paces before the race starts, it’s helpful to check the integrity of your experiment on a modest scale. The aim isn’t to produce usable results, but to find unanticipated problems in the randomisation, implementation or survey. As Dean Karlan and Jacob Appel note, you might feel like you just want to get on with the full experiment, but ‘pre-testing or small-scale piloting is the best way to preview take-up rates and reveal operational hiccups that could arise during implementation’.12 Fix the bugs, and you’re ready to run your randomised trial!

  ACKNOWLEDGEMENTS

  My interest in randomised trials was first piqued during my studies at the Harvard Kennedy School in the early 2000s. My thesis chair, Christopher Jencks, and my advisers, David Ellwood and Caroline Hoxby, are scholars with an infectious sense of scientific curiosity. Like my wonderful parents, Barbara and Michael Leigh, my advisers taught me the value of asking questions and sifting the evidence as critically as possible. As a professor-turned-politician, I’ve also been influenced by the late US senator Daniel Patrick Moynihan, whose evidence-informed approach to public policy still has a great deal to teach us.

  In the course of researching this book, I learned a great deal from speaking with subject matter experts, as well as my parliamentary colleagues and the engaged electors of Fenner. My particular thanks to interviewees Aileen Ashford, Jon Baron, Vicki Bon, Jeff Borland, John Chalmers, Peter Choong, Tamera Clancy, Tony Davis, Jane Eastgate, Alan Frost, Alan Garner, Kate Glazebrook, Sue Grigg, Alice Hill, Michael Hiscox, Ben Hubbard, David Johnson, Guy Johnson, Brigid Jordan, Anne Kennedy, Tess Lea, Kate Leslie, John List, Angela Merriam, Matthew Morris, Greg Rebetzke, Stefanie Schurer, Adam Story, Andrew Sullivan, Dick Telford, Yi-Ping Tseng, Dave Vicary, Joe Walker, Valerie Wilson and Michael Woolcock. Thanks also to surgeon Peter Choong and his team for allowing me to watch them in action.

  For thoughtful suggestions, my thanks to Andrew Charlton, Philip Clarke, Andrew Davidson, Trevor Duke, Nicholas Faulkner, Rory Gallagher, Nick Green, Sonia Loudon, Eleanor Robson, Peter Siminski, Rocco Weglarz and Jessy Wu. For comments on earlier drafts, I am grateful to Esther Duflo, David Halpern, Ian Harris, Michael Hiscox, Dean Karlan, Barbara Leigh, Jennifer Rayner, Nick Terrell, Damjan Vukcevic and seminar participants at Melbourne’s Royal Children’s Hospital. I also worked with Phil Ames and James Wilson, two expatriate Australians who chose to do their Harvard Kennedy School Policy Analysis Exercise on the topic of randomised policy trials in government. Their research paper is first-rate, and these two randomistas will help shape Australian policy-making in decades to come.

  Fundamentally, this book is about the way that a better feedback loop can help us learn from our mistakes. For spotting my errors and honing my prose, thanks to my extraordinary editors, Chris Feik and Kirstie Innes-Will, as well as to the rest of the team at Black Inc. and Yale University Press for their hard work and dedication to this project.

  There are some excellent books on randomised experiments, from which I’ve learnt a great deal. If you’re interested in reading more, I particularly recommend Ian Harris’s Surgery, the Ultimate Placebo (on medical trials); Dean Karlan and Jacob Appel’s More Than Good Intentions and Abhijit Banerjee and Esther Duflo’s Poor Economics (on trials in developing countries); Uri Gneezy and John List’s The Why Axis (on experiments in business and philanthropy); Alan Gerber and Donald Green’s Get Out the Vote! (on political randomised trials); David Halpern’s Inside the Nudge Unit (on policy trials); and Tim Harford’s Adapt (on the philosophy of experimentation). For devotees, the two-volume Handbook of Field Experiments, edited by Esther Duflo and Abhijit Banerjee, provides a detailed survey of the subject.

  For tips on running your own randomised trial, check out Rachel Glennerster and Kudzai Takavarasha’s Running Randomised Evaluations, Dean Karlan and Jacob Appel’s Failing in the Field, the British Behavioural Insights Team’s Test, Learn, Adapt handbook, and the Australian BETA Unit’s Guide to Developing Behavioural Interventions for Randomised Controlled Trials.

  There was more than a little luck in an Australian economist meeting an American landscape architect in Boston eighteen years ago. To my amazing wife, Gweneth: thank you for taking a chance on me, and for your laughter, wisdom and kindness ever since. To our three remarkable boys, Sebastian, Theodore and Zachary: may you continue to experiment with life, combining optimism with scepticism, and a love of today with a desire to make tomorrow even better.

  NOTES

  1 SCURVY, SCARED STRAIGHT AND SLIDING DOORS

  1Quoted in Stephen Bown, Scurvy: How a Surgeon, a Mariner and a Gentleman Solved the Greatest Medical Mystery of the Age of Sail, New York: Thomas Dunne, 2003, p. 34.

  2Bown, Scurvy, p. 3.

  3Jonathan Lamb, Preserving the Self in the South Seas, 1680–1840, Chicago: University of Chicago Press, 2001, p. 117.

  4Bown, Scurvy, p. 26.

  5We think Lind effectively assigned sailors to the six groups at random, though with the benefit of a few centuries of hindsight it would have been better if he had done so by a formal mechanism, such as drawing their names out of a hat.

  6Lind claimed that scurvy was caused when the body’s perspiration system was blocked, causing ‘excrementitious humours’ to become ‘extremely acrid and corrosive’: quoted in Bown, Scurvy, p. 104.

  7Email from Alan Frost to author, 2 July 2015. See also Alan Frost, Botany Bay Mirages: Illusions of Australia’s Convict Beginnings, Melbourne: Melbourne University Press, 1994, pp. 120–5; James Watt, ‘Medical aspects and consequences of Cook’s voyages’ in Robin Fisher & Hugh Johnston, Captain James Cook and His Times, Vancouver and London: Douglas & McIntyre and Croom Helm, 1979; James Watt, ‘Some consequences of nutritional disorders in eighteenth century British circumnavigations’ in James Watt, E.J. Freeman & William F. Bynum, Starving Sailors: The Influence of Nutrition upon Naval and Maritime History, London: National Maritime Museum, 1981, pp. 54–9.

  8The principal surgeon on the First Fleet wrote: ‘In one of his Majesty’s ships, I was liberally supplied with that powerful antiscorbutic, essence of malt; we had also sour krout.’ John White, Journal of a Voyage to New South Wales, 1790, entry on 6 July 1787.

  9Arthur Phillip, The Voyage of Governor Phillip to Botany Bay with an Account of the Establishment of the Colonies of Port Jackson and Norfolk Island, London: John Stockdale, 1789, Ch. 7.

  10Bown, Scurvy, pp. 170–84.

  11Bown, Scurvy, p. 200.

  12Bown, Scurvy, p. 198.

  13Sally A. Brinkman, Sarah E. Johnson, James P. Codde, et al., ‘Efficacy of infant simulator programmes to prevent teenage pregnancy: A school-based cluster randomised controlled trial in Western Australia’, The Lancet, vol. 388, no. 10057, 2016, pp. 2264–71.

  14Carol Dweck, Mindset: The New Psychology of Success, New York: Random House, 2006.

  15Angus Deaton, ‘Making aid work: Evidence-based aid must not become the latest in a long string of development fads’, Boston Review, vol. 31, no. 4, 2006, p. 13.

  16Chris Van Klaveren & Kristof De Witte, ‘Football to improve math and reading performance’, Education Economics, vol. 23, no. 5, 2015, pp. 577–95.

  17The experiment was conducted with two newspapers, the Washington Times and the Washington Post. Those randomly assigned to get the Washington Post were 8 percentage points more likely to vote for the Democratic Party than those assigned to the control group. Surprisingly, the Washington Times group were also more likely to vote Democrat, though the effect was not statistically significant. Alan S. Gerber, Dean Karlan & Daniel Bergan, ‘Does the media matter? A field experiment measuring the effect of newspapers on voting behavior and political opinions’, American Economic Journal: Applied Economics, vol. 1, no. 2, 2009, pp. 35–52.

  18Luc Behaghel, Clément De Chaisemartin & Marc Gurgand, ‘Ready for boarding? The effects of a boarding school for disadvantaged students’, American Economic Journal: Applied Economics, vol. 9, no. 1, 2017, pp. 140–64.

  19Better stoves turned out to improve people’s health in the first year, but not in the second and subsequent years. Rema Hanna, Esther Duflo & Michael Greenstone, ‘Up in smoke: The influence of household behavior on the long-run impact of improved cooking stoves’, American Economic Journal: Economic Policy, vol. 8, no. 1, 2016, pp. 80–114.

  20Christopher Blattman & Stefan Dercon, ‘Everything we knew about sweatshops was wrong’, New York Times, 27 April 2017.

  21Coalition for Evidence-Based Policy, ‘Evidence summary for Treatment Foster Care Oregon (formerly MTFC)’, Washington, DC: Coalition for Evidence-Based Policy, 2009.

  22For a review of the quasi-experimental and randomised evaluations of Scared Straight, see Anthony Petrosino, Carolyn Turpin-Petrosino & John Buehler, ‘“Scared Straight” and other juvenile awareness programs for preventing juvenile delinquency’ (Updated C2 Review), Campbell Collaboration Reviews of Intervention and Policy Evaluations (C2-RIPE), 2002. See also Robert Boruch & Ning Rui, ‘From randomized controlled trials to evidence grading schemes: Current state of evidence-based practice in social sciences’, Journal of Evidence-Based Medicine, vol. 1, no. 1, 2008, pp. 41–9.

  23The research was published in 1982. James Finckenauer, Scared Straight and the Panacea Phenomenon, Englewood Cliffs, New Jersey: Prentice-Hall, 1982.

  24Quoted in Matthew Syed, Black Box Thinking: Why Most People Never Learn from Their Mistakes – But Some Do, New York: Portfolio, 2015, p. 163.

  25Petrosino, Turpin-Petrosino & Buehler, ‘“Scared Straight” and other juvenile awareness programs’. See also an update: Anthony Petrosino, Carolyn Turpin-Petrosino, Meghan E. Hollis-Peel & Julia G. Lavenberg, ‘“Scared Straight” and other juvenile awareness programs for preventing juvenile delinquency: A systematic review’, Campbell Systematic Reviews, Oslo: Campbell Collaboration, 2013.

  26Howard S. Bloom, Larry L. Orr, Stephen H. Bell, et al., ‘The benefits and costs of JTPA Title II-A programs: Key findings from the National Job Training Partnership Act study’, Journal of Human Resources, vol. 32, no. 3, 1997, pp. 549–76.

  27Many of these studies are reviewed in James J. Heckman, Robert J. LaLonde & Jeffrey A. Smith, ‘The economics and econometrics of active labor market programs’ in Orley Ashenfelter & David Card (eds), Handbook of Labor Economics, vol. 3A, Amsterdam: North Holland, 1999, pp. 1865–2097. Recent evidence suggests that in developing countries, job training may be useful for youths: Orazio Attanasio, Adriana Kugler & Costas Meghir, ‘Subsidizing vocational training for disadvantaged youth in Colombia: Evidence from a randomized trial’, American Economic Journal: Applied Economics, vol. 3, no. 3, 2011, pp. 188–220.

  28Roland G. Fryer, Jr., Steven D. Levitt & John A. List, ‘Parental incentives and early childhood achievement: A field experiment in Chicago Heights’, NBER Working Paper No. 21477, Cambridge, MA: NBER, 2015.

  29Marc E. Wheeler, Thomas E. Keller & David L. DuBois, ‘Review of three recent randomized trials of school-based mentoring: Making sense of mixed findings’, Social Policy Report, vol. 24, no. 3, 2010.

  30Raj Chande, Michael Luca, Michael Sanders, et al., ‘Curbing adult student attrition: Evidence from a field experiment’, Harvard Business School Working Paper No. 15-06, Boston, MA: Harvard Business School, 2015.

  31This phenomenon is known to labour economists as ‘Ashenfelter’s Dip’, after Princeton economist Orley Ashenfelter.

  32Another example: in late 2008 and early 2009, I was seconded from my then job as an economics professor to work as a principal adviser in the Australian Treasury. Given that this was when the global financial crisis broke, the correlation between my work as a Treasury adviser and Australia’s economic growth performance is strongly negative. To ascribe causality would be to greatly overstate the power of my advice.

 
