Everything Is Obvious


by Duncan J. Watts


  As creative as they are, these examples of crowdsourcing work best for media sites that already attract millions of visitors, and so automatically generate real-time information about what people like or don’t like. So if you’re not Bravo or Cheezburger or BuzzFeed—if you’re just some boring company that makes widgets or greeting cards or whatnot—how can you tap into the power of the crowd? Fortunately, crowdsourcing services like Amazon’s Mechanical Turk (which Winter Mason and I used to run our experiments on pay and performance that I discussed in Chapter 2) can also be used to perform fast and inexpensive market research. Unsure what to call your next book? Rather than tossing around ideas with your editor, you can run a quick poll on Mechanical Turk and get a thousand opinions in a matter of hours, for about $10—or better yet, have the “turkers” come up with the suggestions as well as voting on them. Looking to get feedback on some design choices for a new product or advertising campaign? Throw the images up on Mechanical Turk and have users vote. Want an independent evaluation of your search engine results? Strip off the labels and throw your results up on Mechanical Turk next to your competitors’, and let real Web users decide. Wondering if the media is biased against your candidate? Scrape several hundred news stories off the Web and have the turkers read them and rate them for positive or negative sentiment—all over a weekend.9

  Clearly Mechanical Turk, along with other potential crowdsourcing solutions, comes with some limitations—most obviously the representativeness and reliability of the turkers. To many people it seems strange that anyone would work for pennies on mundane tasks, and therefore one might suspect either that the turkers are not representative of the general population or else that they do not take the work seriously. These are certainly valid concerns, but as the Mechanical Turk community matures, and as researchers learn more about it, the problems seem increasingly manageable. Turkers, for example, are far more diverse and representative than researchers initially suspected, and several recent studies have shown that they exhibit comparable reliability to “expert” workers. Finally, even where their reliability is poor—which sometimes it is—it can often be boosted through simple techniques, like soliciting independent ratings for every piece of content from several different turkers and taking the majority or the average score.10
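
To make that aggregation step concrete, here is a minimal Python sketch, with invented item names and ratings, of combining several turkers' independent judgments of each item into a single average or majority score:

```python
from collections import defaultdict
from statistics import mean, mode

# Hypothetical input: one (item_id, turker_id, rating) triple per judgment,
# e.g. sentiment ratings of news stories on a 1-5 scale.
judgments = [
    ("story-1", "turker-a", 4), ("story-1", "turker-b", 5), ("story-1", "turker-c", 4),
    ("story-2", "turker-a", 2), ("story-2", "turker-d", 1), ("story-2", "turker-e", 2),
]

ratings_by_item = defaultdict(list)
for item_id, _turker_id, rating in judgments:
    ratings_by_item[item_id].append(rating)

for item_id, ratings in ratings_by_item.items():
    # The average smooths out individual noise; the majority (mode) picks the
    # most common label when ratings are categorical rather than numeric.
    print(item_id, "average:", round(mean(ratings), 2), "majority:", mode(ratings))
```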

  PREDICTING THE PRESENT

  At a higher level, the Web as a whole can also be viewed as a form of crowdsourcing. Hundreds of millions of people are increasingly turning to search engines for information and research, spending ever more time browsing news, entertainment, shopping, and travel sites, and increasingly sharing content and information with their friends via social networking sites like Facebook and Twitter. In principle, therefore, one might be able to aggregate all this activity to form a real-time picture of the world as viewed through the interests, concerns, and intentions of the global population of Internet users. By counting the number of searches for influenza-related terms like “flu” and “flu shots,” for example, researchers at Google and Yahoo! have been able to estimate influenza caseloads remarkably close to those reported by the CDC.11 Facebook, meanwhile, publishes a “gross national happiness” index based on users’ status updates,12 while Yahoo! compiles an annual list of most-searched-for items that serves as a rough guide to the cultural zeitgeist.13 In the near future, no doubt, it will be possible to combine search and update data, along with tweets on Twitter, check-ins on Foursquare, and many other sources to develop more specific indices associated with real estate or auto sales or hotel vacancy rates—not just nationally, but down to the local level.14
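
The flu example amounts to calibrating search activity against an official benchmark and then reading off an estimate from the latest queries. The sketch below is purely illustrative, with made-up numbers and a plain least-squares fit rather than the actual Google or Yahoo! models, but it captures the basic recipe:

```python
import numpy as np

# Hypothetical weekly data: the fraction of all searches that are flu-related,
# and the CDC's reported influenza-like-illness (ILI) rate for the same weeks.
flu_query_share = np.array([0.0010, 0.0013, 0.0019, 0.0024, 0.0031, 0.0028])
cdc_ili_rate    = np.array([1.1,    1.4,    2.0,    2.6,    3.3,    3.0])  # percent of visits

# Calibrate: an ordinary least-squares line relating query share to the CDC rate.
slope, intercept = np.polyfit(flu_query_share, cdc_ili_rate, deg=1)

# "Predict the present": estimate this week's caseload from search data alone,
# days before the official CDC figures would normally appear.
this_week_share = 0.0027
estimated_ili = slope * this_week_share + intercept
print(f"Estimated ILI rate this week: {estimated_ili:.2f}%")
```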

  Once properly developed and calibrated, Web-based indices such as these could enable businesses and governments alike to measure and react to the preferences and moods of their respective audiences—what Google’s chief economist Hal Varian calls “predicting the present.” In some cases, in fact, it may even be possible to use the crowd to make predictions about the near future. Consumers contemplating buying a new camera, for example, may search to compare models. Moviegoers may search to determine the opening date of an upcoming film or to locate cinemas showing it. And individuals planning a vacation may search for places of interest and look up airline costs or price hotel rooms. If so, it follows that by aggregating counts of search queries related to retail activity, moviegoing, or travel, one might be able to make near-term predictions about behavior of economic, cultural, or political interest.

  What kinds of behavior can be predicted using searches, how accurate such predictions are, and over what timescale they can usefully be made are therefore all questions that researchers are beginning to address. For example, my colleagues at Yahoo! and I recently studied the usefulness of search-query volume to predict the opening weekend box office revenues of feature films, the first-month sales of newly released video games, and the Billboard “Hot 100” ranking of popular songs. All these predictions were made at most a few weeks in advance of the event itself, so we are not talking about long-term predictions here—as discussed in the previous chapter, those are much harder to make. Nevertheless, even having a slightly better idea a week in advance of audience interest might help a movie studio or a distributor decide how many screens to devote to which movies in different local regions.15

  What we found is that the improvement one can get from search queries over other types of public data—like production budgets or distribution plans—is small but significant. As I discussed in the last chapter, simple models based on historical data are surprisingly hard to outperform, and the same rule applies to search-related data as well. But there are still plenty of ways in which search and other Web-based data could help with predictions. Sometimes, for example, you won’t have access to reliable sources of historical data—say you’re launching a new game that isn’t like games you’ve launched in the past, or because you don’t have access to a competitor’s sales figures. And sometimes, as I’ve also discussed, the future is not like the past—such as when normally placid economic indicators suddenly increase in volatility or historically rising housing prices abruptly crash—and in these circumstances prediction methods based on historical data can be expected to perform poorly. Whenever historical data is unavailable or is simply uninformative, therefore, having access to the real-time state of collective consciousness—as revealed by what people are searching for—might give you a valuable edge.
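
A hedged sketch of that comparison: fit one regression on conventional predictors (here, an invented production budget and screen count) and a second that also includes search-query volume, then compare their out-of-sample errors. The data are simulated, so the specific numbers mean nothing; the point is the modest improvement the extra signal buys.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data for 200 film openings: budget and screens are the "historical"
# predictors; search volume is a noisy proxy for latent audience interest.
n = 200
budget   = rng.uniform(10, 200, n)               # $ millions
screens  = rng.uniform(500, 4000, n)
interest = 0.5 * budget + rng.normal(0, 20, n)   # latent audience interest
searches = interest + rng.normal(0, 10, n)       # search volume tracks interest
revenue  = 0.3 * budget + 0.01 * screens + 0.4 * interest + rng.normal(0, 10, n)

def rmse(X, y, train, test):
    """Fit least squares on the training rows, report error on the held-out rows."""
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return np.sqrt(np.mean((X[test] @ coef - y[test]) ** 2))

train, test = np.arange(n) < 150, np.arange(n) >= 150
ones = np.ones(n)
baseline  = np.column_stack([ones, budget, screens])
augmented = np.column_stack([ones, budget, screens, searches])

print("baseline RMSE:     ", round(rmse(baseline, revenue, train, test), 2))
print("with search volume:", round(rmse(augmented, revenue, train, test), 2))
```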

  In general, the power of the Web to facilitate measure-and-react strategies ought to be exciting news for businesses, scientists, and governments alike. But it’s important to keep in mind that the principle of measure and react is not restricted to Web-based technology, as indeed the very non-Web company Zara exemplifies. The real point is that our increasing ability to measure the state of the world ought to change the conventional mind-set toward planning. Rather than predicting how people will behave and attempting to design ways to make consumers respond in a particular way—whether to an advertisement, a product, or a policy—we can instead measure directly how they respond to a whole range of possibilities, and react accordingly. In other words, the shift from “predict and control” to “measure and react” is not just technological—although technology is needed—but psychological. Only once we concede that we cannot depend on our ability to predict the future are we open to a process that discovers it.16

  DON’T JUST MEASURE: EXPERIMENT

  In many circumstances, however, merely improving our ability to measure things does not, on its own, tell us what we need to know. For example, a colleague of mine recently related a conversation he’d had with the CFO of a major American corporation who confided that in the previous year his company had spent about $400 million on “brand advertising,” meaning that it was not advertising particular products or services—just the brand. How effective was that money? According to my colleague, the CFO had lamented that he didn’t know whether the correct number should have been $400 million or zero. Now let’s think about that for a second. The CFO wasn’t saying that the $400 million hadn’t been effective—he was saying that he had no idea how effective it had been. As far as he could tell, it was entirely possible that if they had spent no money on brand advertising at all, their performance would have been no different. Alternatively, not spending the money might have been a disaster. He just didn’t know.

  Now, $400 million might seem like a lot of money not to know about, but in reality it’s a drop in the ocean. Every year, US corporations collectively spend about $500 billion on marketing, and there’s no reason to think that this CFO was any different from CFOs at other companies—more honest perhaps, but not any more or less certain. So really we should be asking the same question about the whole $500 billion. How much effect on consumer behavior does it really have? Does anybody have any idea? When pressed on this point, advertisers often quote the department-store magnate John Wanamaker, who is reputed to have said that “half the money I spend on advertising is wasted—I just don’t know which half.” It’s entirely apropos and always seems to get a laugh. But what many people don’t appreciate is that Wanamaker uttered it almost a century ago, around the time when Einstein published his theory of general relativity. How is it that in spite of the incredible scientific and technological boom since Wanamaker’s time—penicillin, the atomic bomb, DNA, lasers, space flight, supercomputers, the Internet—his puzzlement remains as relevant today as it was then?

  It’s certainly not because advertisers haven’t gotten better at measuring things. With their own electronic sales databases, third-party ratings agencies like Nielsen and comScore, and the recent tidal wave of clickstream data online, advertisers can measure many more variables, and at far greater resolution, than Wanamaker could. Arguably, in fact, the advertising world has more data than it knows what to do with. No, the real problem is that what advertisers want to know is whether their advertising is causing increased sales; yet almost always what they measure is the correlation between the two.

  In theory, of course, everyone “knows” that correlation and causation are different, but it’s so easy to get the two mixed up in practice that we do it all the time. If we go on a diet and then subsequently lose weight, it’s all too tempting to conclude that the diet caused the weight loss. Yet often when people go on diets, they change other aspects of their lives as well—like exercising more or sleeping more or simply paying more attention to what they’re eating. Any of these other changes, or more likely some combination of them, could be just as responsible for the weight loss as the particular choice of diet. But because it is the diet they are focused on, not these other changes, it is the diet to which they attribute the effect. Likewise, every ad campaign takes place in a world where lots of other factors are changing as well. Advertisers, for example, often set their budgets for the upcoming year as a function of their anticipated sales volume, or increase their spending during peak shopping periods like the holidays. Both these strategies will have the effect that sales and advertising will tend to be correlated whether or not the advertising is causing anything at all. But as with the diet, it is the advertising effort on which the business focuses its attention; thus if sales or some other metric of interest subsequently increases, it’s tempting to conclude that it was the advertising, and not something else, that caused the increase.17
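
The budgeting point is easy to demonstrate with a toy simulation. In the sketch below, which uses entirely invented numbers, advertising has no causal effect on sales at all, yet because the budget is set as a share of anticipated sales, spend and sales still come out strongly correlated:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monthly data: seasonal demand drives sales; the ad budget is set
# as a fixed share of *anticipated* demand, and the ads themselves do nothing.
months = 48
anticipated_demand = 100 + 30 * np.sin(np.arange(months) * 2 * np.pi / 12)
ad_spend = 0.1 * anticipated_demand + rng.normal(0, 1, months)
sales = anticipated_demand + rng.normal(0, 5, months)   # note: ad_spend plays no role

correlation = np.corrcoef(ad_spend, sales)[0, 1]
print(f"correlation between ad spend and sales: {correlation:.2f}")  # typically around 0.9
# A naive analyst would read this as evidence that the advertising "works,"
# even though by construction it has zero causal effect here.
```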

  Differentiating correlation from causation can be extremely tricky in general. But one simple solution, at least in principle, is to run an experiment in which the “treatment”—whether the diet or the ad campaign—is applied in some cases and not in others. If the effect of interest (weight loss, increased sales, etc.) happens significantly more in the presence of the treatment than it does in the “control” group, we can conclude that it is in fact causing the effect. If it doesn’t, we can’t. In medical science, remember, a drug can be approved by the FDA only after it has been subjected to field studies in which some people are randomly assigned to receive the drug while others are randomly assigned to receive either nothing or a placebo. Only if people taking the drug get better more frequently than people who don’t take the drug is the drug company allowed to claim that it works.
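
In code, the treatment-and-control logic reduces to randomly splitting subjects, applying the treatment to one group only, and asking whether the difference in outcomes is bigger than chance alone would produce. The sketch below runs a standard two-proportion z-test on hypothetical trial numbers; it is the generic recipe, not any particular study:

```python
import math

def two_proportion_ztest(success_t, n_t, success_c, n_c):
    """z-statistic for the difference between two proportions (treatment vs. control)."""
    p_t, p_c = success_t / n_t, success_c / n_c
    p_pool = (success_t + success_c) / (n_t + n_c)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se

# Hypothetical trial: 5,000 people randomly assigned to each arm.
z = two_proportion_ztest(success_t=600, n_t=5000,   # 12% improved with the treatment
                         success_c=500, n_c=5000)   # 10% improved in the control group
print(f"z = {z:.2f}")  # about 3.2, well beyond the usual 1.96 threshold,
# so the difference is very unlikely to be due to chance alone.
```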

  Precisely the same reasoning ought to apply in advertising. Without experiments, it’s actually close to impossible to ascertain cause and effect, and therefore to measure the real return on investment of an advertising campaign. Let’s say, for example, that a new product launch is accompanied by an advertising campaign, and the product sells like hotcakes. Clearly one could compute a return on investment based on how much was spent on the campaign and how much sales were generated, and that’s generally what advertisers do. But what if the item was simply a great product that would have sold just as well anyway, even with no advertising campaign at all? Then clearly that money was wasted. Alternatively, what if a different campaign would have generated twice as many sales for the same cost? Once again, in a relative sense the campaign generated a poor return on investment, even though it “worked.”18
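
A tiny worked example, with invented figures, of why the naive calculation can mislead:

```python
# Invented figures for illustration only.
campaign_cost = 1_000_000
sales_with_campaign = 5_000_000

# Naive ROI: credit every dollar of sales to the campaign.
naive_roi = (sales_with_campaign - campaign_cost) / campaign_cost   # 4.0, looks great

# But if the product would have sold $4.6M with no campaign at all,
# the campaign's true incremental contribution is only $400K.
sales_without_campaign = 4_600_000
incremental_roi = (sales_with_campaign - sales_without_campaign - campaign_cost) / campaign_cost
print(naive_roi, incremental_roi)   # 4.0 vs. -0.6: the campaign actually lost money
```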

  Without experiments, moreover, it’s extremely difficult to measure how much of the apparent effect of an ad was due simply to the predisposition of the person viewing it. It is often noted, for example, that search ads—the sponsored links you see on the right-hand side of a search results page—perform much better than display ads that appear on most other Web pages. But why is that? A big part of the reason is that which sponsored links you see depends very heavily on what you just searched for. People searching for “Visa card” are very likely to see ads for credit card vendors, while people searching for “Botox treatments” are likely to see ads for dermatologists. But these people are also more likely to be interested precisely in what those particular advertisers have to offer. As a result, the fact that someone who clicked on an ad for a Visa card subsequently signs up for one can only be partly attributed to the ad itself, for the simple reason that the same consumer might have signed up for the card anyway.

  This seems like an obvious point, but it is widely misunderstood.19 Advertisers, in fact, often pay a premium to reach customers they think are most likely to buy their products—because they have bought their products (e.g., Pampers) in the past; or because they have bought products in the same category (e.g., a competitor to Pampers); or because their attributes and circumstances make them likely to do so soon (e.g., a young couple expecting their first child). Targeted advertising of this kind is often held up as the quintessence of a scientific approach. But again, at least some of those consumers, and possibly many of them, would have bought the products anyway. As a result, the ads were just as wasted on them as they were on consumers who saw the ads and weren’t interested. Viewed this way, the only ads that matter are those that sway the marginal consumer—the one who ends up buying the product, but who wouldn’t have bought it had they not seen the ad. And the only way to determine the effect on marginal consumers is to conduct an experiment in which the decision about who sees the ad and who doesn’t is made randomly.

  FIELD EXPERIMENTS

  A common objection to running these kinds of randomized experiments is that they can be difficult to do in practice. If you put up a billboard by the highway or place an ad in a magazine, it’s generally impossible to know who sees it—even consumers themselves are often unaware of the ads they have seen. Moreover, the effects can be hard to measure. Consumers may make a purchase days or even weeks later, by which stage the connection between seeing the ad and acting on it has been lost. These are reasonable objections, but increasingly they can be dealt with, as three of my colleagues at Yahoo!—David Reiley, Taylor Schreiner, and Randall Lewis—demonstrated recently in a pioneering “field experiment” involving 1.6 million customers of a large retailer who were also active Yahoo! users.

  To perform the experiment, Reiley and company randomly assigned 1.3 million users to the “treatment” group, meaning that when they arrived at Yahoo!-operated websites, they were shown ads for the retailer. The remaining 300,000, meanwhile, were assigned to the “control” group, meaning that they did not see these ads even if they visited exactly the same pages as the treatment group members. Because the assignment of individuals to treatment and control groups was random, the differences in behavior between the two groups had to be caused by the advertising itself. And because all the participants in the experiment were also in the retailer’s database, the effect of the advertising could be measured in terms of their actual purchasing behavior—up to several weeks after the campaign itself concluded.20
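
To show what the analysis of such a field experiment looks like, here is a sketch with made-up per-user purchase figures and an assumed campaign cost (none of these numbers come from the actual study): compare average purchases in the treatment and control groups, scale the per-user lift to everyone who saw the ads, and weigh the result against what the campaign cost.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical purchase data (dollars per user over the measurement window).
# Assumed effect: exposed users spend about 25 cents more on average.
treatment = rng.exponential(scale=10.0, size=1_300_000) + 0.25  # saw the retailer's ads
control   = rng.exponential(scale=10.0, size=300_000)           # did not see them

lift_per_user = treatment.mean() - control.mean()     # a causal estimate, thanks to randomization
incremental_revenue = lift_per_user * treatment.size  # scale to everyone who saw the ads
campaign_cost = 80_000                                 # assumed figure, for illustration only

print(f"lift per exposed user: ${lift_per_user:.2f}")
print(f"incremental revenue:   ${incremental_revenue:,.0f}")
print(f"revenue / cost ratio:  {incremental_revenue / campaign_cost:.1f}x")  # roughly 4x here
```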

  Using this method, the researchers estimated that the additional revenue generated by the advertising was roughly four times the cost of the campaign in the short run, and possibly much higher over the long run. Overall, therefore, they concluded that the campaign had in fact been effective—a result that was clearly good news both for Yahoo! and the retailer. But what they also discovered was that almost all the effect was for older consumers—the ads were largely ineffective for people under forty. At first, this latter result seems like bad news. But the right way to think about it is that finding out that something doesn’t work is also the first step toward learning what does work. For example, the advertiser could experiment with a variety of different approaches to appeal to younger people, including different formats, different styles, or even different sorts of incentives and offers. It’s entirely possible that something would work, and it would be valuable to figure out what that is in a systematic way.

 
