Randomistas

by Andrew Leigh


  *

  Google’s first randomised experiment was conducted on 27 February 2000.55 Curious about whether the firm should give users more than ten search results, they randomly assigned 1 in 1000 users to receive twenty results, and another 1 in 1000 users to get thirty results. The company couldn’t have gotten a bigger rebuff. Doubling or tripling the number of results slowed the loading times and caused many users to leave the site. They stuck with the top ten.
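
  Mechanically, an experiment like this is little more than carving off a tiny, random slice of traffic for each variant. Here is a minimal sketch in Python of how such bucketing might work; the bucketing scheme, arm names and sample user ID are invented for illustration, not taken from Google:

```python
import hashlib

def assign_arm(user_id: str) -> str:
    """Hash each user into 1,000 buckets so that roughly 1 in 1,000 users
    sees twenty results, another 1 in 1,000 sees thirty, and everyone else
    keeps the default ten. (Scheme and names are invented.)"""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 1000
    if bucket == 0:
        return "twenty_results"
    if bucket == 1:
        return "thirty_results"
    return "ten_results"

print(assign_arm("user-12345"))
```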

  Today, Google is conducting hundreds of randomised experiments on its users: seeking to fine-tune its search algorithm and the way its results are presented.56 In some cases, these involve tiny tweaks – such as increasing the amount of white space around the first search result, bolding query words in the search results or making slight adjustments to the graphics.

  When choosing the colour for its toolbar, Google designers noticed that users were slightly more likely to click through if the toolbar was presented as a greenish-blue colour than if it was coloured plain blue. More click-throughs meant more advertising revenue for the company, so modest improvements had significant revenue implications. Seeing the results, Marissa Mayer, then a vice-president at Google, suggested that they go further. Mayer proposed randomly splitting users into forty equally sized groups. Each group would then see the toolbar in a slightly different shade of blue. The company could then simply choose the colour based on whatever generated the highest click-through rates. Science, not gut instinct, determined the result. With billions of clicks, even a small difference means big bucks. According to journalist Matthew Syed, a Google executive estimated that finding the perfect colour for the toolbar added US$200 million to the company’s bottom line.57
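
  The logic of the forty-shades experiment is equally simple to sketch: split users evenly across the shades, count clicks, and keep whichever shade attracts the highest click-through rate. The toy Python simulation below illustrates that logic; the shades, traffic and click propensities are all invented:

```python
import random

# Forty slightly different blues (invented hex codes for illustration).
shades = [f"#0033{b:02x}" for b in range(40)]
impressions = {s: 0 for s in shades}
clicks = {s: 0 for s in shades}

def serve_toolbar(user_id: int) -> str:
    """Assign each user to one of forty equally sized groups."""
    return shades[user_id % 40]

random.seed(0)
for user_id in range(400_000):
    shade = serve_toolbar(user_id)
    impressions[shade] += 1
    # Invented click propensity that varies slightly by shade.
    if random.random() < 0.030 + 0.0001 * (user_id % 40):
        clicks[shade] += 1

# Choose the colour purely on observed click-through rate.
ctr = {s: clicks[s] / impressions[s] for s in shades}
best_shade = max(ctr, key=ctr.get)
print(best_shade, round(ctr[best_shade], 4))
```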

  At other times, Google tests bigger changes, such as new features or products. But not every new idea at Google gets trialled with live traffic. Indeed, the company estimates that for every idea that gets trialled, three others are considered and rejected before reaching the trial stage. Even among those that are trialled, only one in five leads to a change in the product. In other words, since only one idea in four reaches a trial and only one in five of those succeeds, Google’s ideas have a failure rate of nineteen out of twenty.58 As former Google CEO Eric Schmidt observes, ‘This is a company where it’s absolutely okay to try something that’s very hard, have it not be successful, and take the learning from that.’59

  Some people say that when you have a really big sample, you don’t need randomised trials. Instead, they claim, you can just look for patterns.60 But Google most likely has more data than any other organisation in the world, yet still conducts oodles of in-house experiments. Google’s scientists have access to around 15 exabytes of data, and the company handles around 40,000 searches every second. If Google still gets value from randomised trials, then the same must go for every other researcher on the planet.

  At Netflix, the same data-driven culture prevails. People who sign up for a free trial are randomly assigned to different treatments aimed at turning them into paying customers.61 Regular users are often placed in experiments to test new aspects of the website. As Netflix’s data boffins note, gut feel doesn’t definitively tell you what to offer someone who has just finished watching House of Cards. Should it be the shows that are most similar, most popular or most trendy? As they admit, ‘using our own intuition, even collective intuition, to choose the best variant of a recommendation algorithm also often yields the wrong answer’.62 Better personalisation and recommendations allow Netflix to keep more of its customers. The company estimates that improving its algorithms has saved it over US$1 billion a year.63

  Perhaps because Google’s and Netflix’s experiments are directed at improving the quality of their websites, they haven’t attracted much criticism. For Facebook, the experience was much less positive when it partnered with psychology researchers to carry out a social science experiment on its users.64 For a single week in January 2012, Facebook randomly altered the emotional content in the News Feeds of 700,000 users. Software analysed each post by a Facebook friend, compared it against a list of positive and negative words, and classified it as negative, positive or neutral. People in the experimental treatment either saw 10 per cent fewer negative posts or 10 per cent fewer positive posts.65 The study only affected what people saw in their News Feed; they could still view all the posts by going to a friend’s Wall or Timeline.
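
  In outline, the mechanics were straightforward: classify each post by counting positive and negative words, then randomly withhold a share of posts with the targeted tone. The Python sketch below illustrates that idea; the word lists, sample posts and function names are invented, and the real study used a much larger standard sentiment dictionary:

```python
import random

# Invented word lists for illustration only.
POSITIVE = {"happy", "great", "love", "wonderful", "excited"}
NEGATIVE = {"sad", "awful", "hate", "terrible", "angry"}

def classify(post: str) -> str:
    """Label a post positive, negative or neutral by counting listed words."""
    words = set(post.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def filter_feed(posts, suppress, rate=0.10):
    """Randomly omit about 10 per cent of posts with the suppressed tone."""
    return [p for p in posts if classify(p) != suppress or random.random() >= rate]

feed = ["I love this wonderful day", "what an awful terrible commute", "off to the shops"]
print(filter_feed(feed, suppress="negative"))
```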

  The researchers then watched to see how users responded. They found that there was a small but noticeable impact: when people see more negativity in their friends’ posts, their own posts become more negative. Reading negative material also led to social withdrawal. When you see negative posts, you’re less likely to use Facebook the next day. Before the experiment, some had theorised that Facebook might be driven by a combination of jealousy and schadenfreude. If social comparisons dominate, successful friends should make us morose, while their bad news should make us feel fortunate. The reality turns out to be simpler: our moods follow those of our friends. Positive and negative emotions are contagious.66

  Like Amazon’s pricing experiments, Facebook’s emotional manipulation experiments caused a media firestorm. The researchers were criticised by the British Psychological Society, the Federal Trade Commission was asked to investigate, and the academic journal Proceedings of the National Academy of Sciences published an editorial ‘expression of concern’.67 Sheryl Sandberg, then Facebook’s chief operating officer, told users: ‘It was poorly communicated and for that communication we apologize. We never meant to upset you.’68

  Given that I’m both a randomista and a former professor, you might expect me to side with companies that collaborate with academics to conduct social science experiments. But I can readily see where the complainants are coming from. If you’re a Facebook user, there’s a chance you unwittingly took part in their 2012 emotional manipulation study. You wouldn’t have known it then, and you wouldn’t know it now. And unlike a tweak to the Google algorithm, we can’t be confident the Facebook experiment left its users unscathed. For experiments of this kind, large firms may want to consider encouraging a subset of users to opt in to become ‘A/B testers’, perhaps offering them a perk in exchange for their pioneering spirit. Such a model would give companies the confidence that their guinea pigs at least knew they were in a maze, while giving users who find experimentation a bit ‘creepy’ the chance to say no.

  9

  TESTING THEORIES IN POLITICS AND PHILANTHROPY

  Residents of East Rock, a neighbourhood of New Haven, Connecticut, take Halloween seriously. On a typical Halloween night, around 500 children descend on each house that offers candy to trick-or-treaters. But in 2008 the costumed children came upon an unusual home, owned by economist Dean Karlan. The left side of the porch was decorated with campaign posters and a life-size cut-out of the Democratic presidential candidate, Barack Obama. The right side of the porch featured the Republican candidate, John McCain.1

  East Rock is a heavily Democratic neighbourhood, so it was no surprise that when children were told that they could pick up one piece of candy from either side of the house, four out of five chose the Democratic side. Then the economist homeowner tried something else. Randomly selected children were told that they could get two pieces of candy from the Republican side, or one piece from the Democratic side. In other words, they were offered a sweet inducement to go against their political preferences.

  It turned out that whether children in a Democratic-voting neighbourhood were willing to take candy from a Republican depended on their age. Those aged between four and eight tended to stick to their ideology. Older children, aged nine to fifteen, mostly switched to the Republican side if it meant twice the candy. Four years later, a repeat experiment for Halloween 2012 (this time featuring Barack Obama versus Mitt Romney) produced similar results.

  These Halloween experiments suggest that there may be some truth in the cliché about young people voting with their hearts and older people voting with their wallets. It’s also a reminder that randomistas like Dean Karlan are endlessly imaginative in their quest to test theories with the toss of a coin.

  Indeed, in his first presidential race, Barack Obama’s team was using randomised trials to evaluate campaign strategies. In December 2007, when visitors first went to the Obama website, they saw one of several images, including a colour image of Obama, a black-and-white family photo and a video of Obama speaking.2 Then they were encouraged to subscribe to campaign emails using various messages, including ‘Join Us Now’, ‘Learn More’ and ‘Sign Up’. Take a moment to guess which combination of image and message you think worked best. Is it the colour photo, the black-and-white photo or the video? Would you imagine it is best to ask supporters to Join, Learn or Sign Up?

  There are reasons to like each of the combinations, but on the Obama campaign team, people’s instincts tended to gravitate to the video and the message ‘Sign Up’. They were expert campaigners, but most of them turned out to be wrong. After rotating the variants across 300,000 web visitors, the campaign found that the combination of the black-and-white photo and ‘Learn More’ garnered 41 per cent more email addresses. Over the course of the campaign, the Obama team estimated that this one experiment garnered nearly 3 million more emails, 280,000 more volunteers and US$60 million more in donations. The experts had assumed that people were more likely to sign on when they saw a video. But as campaign digital adviser Dan Siroker told one interviewer, ‘Assumptions tend to be wrong.’3
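
  In mechanical terms, this was a simple rotate-and-compare test: show each visitor one image-and-button combination, tally sign-up rates, and keep the best performer. A toy Python sketch of that logic follows; the visitor and sign-up numbers are made up for illustration:

```python
from itertools import product

images = ["colour_photo", "bw_family_photo", "video"]
buttons = ["Join Us Now", "Learn More", "Sign Up"]

# Made-up counts; only the rotate-and-compare logic reflects the experiment.
results = {}
for img, btn in product(images, buttons):
    signups = 2500
    if img == "bw_family_photo":
        signups += 400   # invented effect size
    if btn == "Learn More":
        signups += 300   # invented effect size
    results[(img, btn)] = {"visitors": 33_000, "signups": signups}

signup_rate = {combo: r["signups"] / r["visitors"] for combo, r in results.items()}
best = max(signup_rate, key=signup_rate.get)
print(best, round(signup_rate[best], 3))
```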

  In this chapter, I’ll discuss the growing use of randomised trials in politics and philanthropy, starting with campaigns to get people to the ballot box, then moving on to randomised trials on fundraising, and concluding with a recent spate of experiments where politicians are the unwitting subjects of randomised experiments.

  *

  In countries where voting is voluntary, a huge amount of effort goes into persuading people to go to the polls. These ‘get out the vote’ campaigns are conducted both by nonpartisan civic groups (who are keen to raise overall turnout) and by candidates (whose aim is to increase turnout among their own supporters).

  In the 1924 US presidential election, political scientist Harold Gosnell noticed that a huge amount of effort was being made to get citizens to vote.4 The National League of Women Voters and the Boy Scouts were knocking on millions of doors, reminding people of their civic duty to cast a ballot. Turnout was indeed higher in 1924 than it had been in 1920. But when it came to the question of what part of the increase had been caused by the doorknocking efforts, Gosnell argued that the only candid answer was ‘we do not know’.

  To find out, Gosnell embarked on what was probably the first randomised trial in political science – investigating the impact on voter turnout of sending letters to homes in Chicago.5 Only by having a credible control group, Gosnell pointed out, could we truly know what impact the mailings had on voting. His studies estimated that those who received a letter were between 1 and 9 percentage points more likely to vote.

  Nearly a century later, direct mail has much less impact on voters than in Gosnell’s day. But he understood – as some campaigners still do not – that randomised trials are one of the best ways of measuring what works in a political campaign. In their bestselling book Get Out the Vote, two leading political randomistas, Alan Gerber and Donald Green, point out that campaigns are often still run by grizzled veterans who know a lot about inputs – the various tactics that can be used – but not much about outputs – the number of extra votes they garner.6 The greybeards have plenty of anecdotes but no control groups. For example, campaign veterans might point out that a candidate spent a lot of time making telephone calls, and then got a swing towards her. The problem is that it’s hard to know the counterfactual: what would have happened if she hadn’t hit the phones?

  Similarly, campaign veterans might point to a part of the electorate where extra street stalls were held, and note that the candidate’s vote was higher in those suburbs. Again, how do we know that those suburbs wouldn’t have favoured the candidate for other reasons? In the past, campaigns have massively overestimated the impact of voter contact on turnout by asking people what contact they had with politicians and whether they voted. The problem is that contact with politicians isn’t random. Campaigns pursue citizens who are more likely to vote, and those citizens who seek out their local candidates are the kinds of people who were always more likely to vote. So simply looking at the correlation between contact and turnout doesn’t tell you much.7

  Yet the myths persist. As a politician, I meet plenty of campaign veterans with their own secret sauce. I’ve met ‘experts’ who are convinced that partisan letters work best when paired with doorknocking, that telephone calls work best in the final week of the campaign, or that posters outside the election booth make a huge difference. But ask them about their evidence base and it’s quickly apparent that their war stories lack a control group.

  Steadily, however, campaigns are beginning to shift towards an approach that is more curious, open-minded and data-driven. Modern political strategists are less likely to think that they know all the answers, and more open to being proven wrong. They have less confidence in any particular campaign technique, but more confidence that they know how to sort effective campaigns from bad ones.

  Following in Gosnell’s footsteps, researchers have now published more than a hundred studies on the impact of various campaign strategies on increasing voter turnout. In the United States, this is facilitated by the fact that whether or not a person voted in a particular election is public information (though which candidate they voted for remains secret). So if campaigners want to boost turnout, they can start with a list of 20,000 people on the electoral roll, send a letter to 10,000 of them before the election and then look at the voter files afterwards to see whether there are turnout differences between the two groups.
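
  In code, such a design amounts to little more than a shuffle and a subtraction: randomly split the roll, mail the treatment half, then compare turnout between the two halves using the public voter file. A minimal Python sketch with simulated data (all names and turnout rates invented):

```python
import random

# Simulated voter roll and turnout; in practice the outcomes would come
# from the public voter file after the election.
random.seed(1)
roll = [f"voter_{i}" for i in range(20_000)]
random.shuffle(roll)
treatment, control = set(roll[:10_000]), set(roll[10_000:])  # mail the first half

voted = {v: random.random() < (0.550 if v in treatment else 0.545) for v in roll}

turnout_treatment = sum(voted[v] for v in treatment) / len(treatment)
turnout_control = sum(voted[v] for v in control) / len(control)
print(f"estimated effect: {100 * (turnout_treatment - turnout_control):+.2f} percentage points")
```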

  Now that the randomistas have been running turnout experiments, we have a wide range of findings on different techniques to get people to the polls. So let’s run through the evidence, starting with traditional approaches (television, radio, letters, phone calls and doorknocking), and finishing with newer strategies (emails, text messages and online advertisements).

  In his 2006 campaign for Texas governor, Rick Perry allowed a group of political scientists to randomise the placement of his radio and television advertisements. Over a three-week period, Governor Perry’s ads – which began ‘I’ve never been more proud to call myself a Texan’ – were assigned across eighteen media markets by researchers. The academics then tested the impact via telephone surveys.8 The experiment showed no effect from the radio advertisements. In the case of television, the maximum volume of advertisements – costing US$3 million a week – increased Governor Perry’s support by 5 percentage points. But by the following week, there was no detectable impact on voter preference. Unlike product commercials, political advertisements at least have a measurable impact.9 But the fact that the effect had disappeared within a week of seeing the ad is a reminder that few ads are memorable.

  The history of political advertising has produced a handful of famous advertisements – such as Ronald Reagan’s ‘Morning in America’ segment, or the attack ad on Michael Dukakis for releasing Willie Horton from jail – but they are the exception. Most political advertising – like Rick Perry’s riff on Texan pride – is pretty forgettable. The Perry study suggests that a last-minute television blitz would indeed increase a candidate’s share of the vote. Conversely, it also implies that a one-week ‘television blackout’ period prior to polling day would entirely eliminate the effect of television ads on the election campaign.

  How about letters? When sent by nonpartisan groups, direct mail has a small but positive impact on turnout. Experiments have now been conducted in low-stakes and high-stakes elections, with researchers sending up to eight pieces of mail to each household. Combining the results of fifty-one randomised experiments conducted in the United States between 1998 and 2014, Alan Gerber and Donald Green concluded that each additional letter raises turnout by about 0.5 percentage points.10 Put another way, nonpartisan groups need to post 200 letters to get one more person to the polls.

  A remarkable 2006 Michigan study suggests that this impact could be magnified using ‘social pressure’.11 Relying on the fact that voter turnout is public, researchers experimented with three kinds of letters that ramped up the social pressure: first, a letter that told people their voting behaviour would be monitored by university researchers; second, a letter that set out the household’s past voting behaviour and promised to update it after the election; and third, a letter that listed the turnout of neighbours living on the same block. Each boosted turnout, with the ‘neighbours’ mailing producing a massive effect: one extra voter for every twelve letters.

  Most other studies have reinforced these findings, although one study in a small city in Texas failed to find any effect of social pressure.12 One intriguing theory for this is that in some settings, the threat to tell the neighbours whether you voted might serve to anger people rather than to embarrass them into voting. Other randomised trials take a more positive approach to social pressure, by thanking people for voting in previous elections, or promising to put citizens with perfect voting records on a public ‘honour roll’.13 These kinds of letters raise turnout, but not by as much as the threat to shame non-voters.

  While letters can persuade people to vote, this effect seems to be restricted to mailings from groups that are interested in raising civic participation. When letters are sent by those with a stake in the outcome, they have little or no impact. Summarising nineteen US randomised experiments involving mail sent by Democratic candidates, Republican candidates and advocacy organisations, Gerber and Green find that such groups would need to send 10,000 letters to persuade one more person to vote.14 This estimate is so imprecise that it cannot reliably be distinguished from zero, leading Gerber and Green to conclude that ‘partisan mail has no effect on turnout’.15

 
