Super Crunchers
But What If It’s Wrong?
The Lott saga has important lessons for Super Crunchers. First, Lott should be applauded for his exemplary sharing of data. Even though Lott’s reputation has been severely damaged by the Mary Rosh incident and a host of other concerns, Lott’s open-access policy has contributed to a new sharing ethic among data crunchers. I, for one, now share data whenever I legally can. And several journals including my own Journal of Law, Economics, and Organization now require data sharing (or an explanation why you can’t share). Donohue and I would never have been able to evaluate Lott’s work if he had not led the way by giving us the dataset that he worked on.
The Lott saga also underscores why it is so important to have independent verification of results. It’s so easy for people of good faith to make mistakes. Moreover, once a researcher has spent endless hours producing an interesting result, he or she becomes invested in defending it. I include myself in this tendency. It’s easy and true to charge that the intuitivist and experientialist are subject to cognitive biases. Yet the Lott saga shows that empiricists are, too. Lott’s adamant defense of his thesis in the face of such overwhelming evidence underscores this fact. Numbers don’t have emotions or preferences, but the number crunchers that interpret them do.
My contretemps with Lott suggests the usefulness of setting up a formalized system of empirical devil’s advocacy akin to the role of an Advocatus Diaboli in the Roman Catholic Church. For over 500 years, the canonization process followed a formal procedure in which one person (a postulator) presents the case in favor and another (the promoter of the faith) presents the case against. According to Prospero Lambertini (later Pope Benedict XIV [1740–58]):
It is [the promoter of the faith’s duty] to critically examine the life of, and the miracles attributed to, the individual up for sainthood or blessedness. Because his presentation of facts must include everything unfavorable to the candidate, the promoter of the faith is popularly known as the devil’s advocate. His duty requires him to prepare in writing all possible arguments, even at times seemingly slight, against the raising of any one to the honours of the altar.
Corporate boards could create devil’s advocate positions whose job it is to poke holes in pet projects. These professional “No” men could be an antidote to overconfidence bias—without risking their jobs. The Lott story shows that institutionalized counterpunching may also be appropriate for Super Crunchers to make sure that their predictions are robust.
Among academic crunchers, this devil’s advocacy is a two-way street. Donohue and I have crunched numbers testing the robustness of Lott’s “More Guns” thesis. Lott again and again has recrunched numbers that my coauthors and I have run. Lott challenged the robustness of an article that Levitt and I wrote showing that the hidden transmitter LoJack has a big impact on reducing crime. And Lott has also recrunched numbers to challenge a Donohue and Levitt article showing that the legalization of abortion reduced crime. To my mind, none of Lott’s counterpunching crunches has been persuasive. Nonetheless, the real point is that it’s not for us or Lott to decide. By opening up the number crunching to contestation, we’re more likely to get it right. We keep each other honest.
Contestation and counterpunching are especially important for Super Crunching, because the method leads to centralized decision making. When you are putting all your eggs in a single decisional basket, it’s important to try to make sure that the decision is accurate. The carpenter’s creed to “measure twice, cut once” applies. Outside of the academy, however, the useful Lott/Ayres/Donohue/Levitt contestation is often lacking. We’re used to governmental or corporate committee reports giving the supposedly definitive results of some empirical study. Yet agencies and committees usually don’t have empirical checks and balances. Particularly when the underlying data is proprietary or confidential—and this is still often the case with regard to both business and governmental data—it becomes impossible for outsiders like Lott or me to counterpunch. It thus becomes all the more important that these closed Super Crunchers make themselves answerable to loyal opposition within their own organizations. Indeed, I predict that data-quality firms will appear to provide confidential second opinions—just as the big four accounting firms audit your books. Decision makers shouldn’t rely on the word of just one number cruncher.
Most of this book has been chock full of examples where Super Crunchers get it right. We might or might not always like the impact of their predictions on us as consumers, employees, or citizens, but the predictions have tended to be more accurate than those of humans unaided by the power of data mining. Still, the Lott saga underscores the fact that number crunchers are not infallible oracles. We, of course, can and do get it wrong. The world suffers when it relies on bad numbers.
The onslaught of data-based decision making, if not monitored (internally or externally), may unleash a wave of mistaken statistical analysis. Some databases do not easily yield definitive answers. In the policy arena, there are still lively debates about whether (a) the death penalty, or (b) concealed handguns, or (c) abortions reduce crime. Some researchers have so comprehensively tortured the data that their datasets become like prisoners who will tell you anything you want to know. Statistical analysis can cast a patina of scientific integrity over a study, obscuring the misuse of mistaken assumptions.
Even randomized studies, the gold standard of causal testing, may yield distorted predictions. Nobel Prize–winning econometrician James Heckman, for example, has appropriately railed against reliance on randomized results where there is a substantial decline in the number of subjects who complete the experiment. To see the problem, consider a study I’m currently trying to set up: a randomized test of whether Weight Watchers plus a financial incentive to lose weight does better than Weight Watchers alone. The natural way to set this up is to find a bunch of people who are about to start Weight Watchers, flip a coin, and give half a financial incentive while making the other half the control group. The problem comes when we try to collect the results. The constitutional prohibition against slavery is a very good thing, but it means that we can’t mandate that people continue participating in our study. There is almost always attrition as some people, after a while, quit responding to your phone calls. Even though the treatment and control groups were probabilistically identical at the beginning, they may be very different by the end. Indeed, in this example, I worry that people who fail to lose weight are more likely to quit the financial incentive group, leaving me at the end with a self-censored sample of people who have succeeded. That’s not a very good test of whether the financial incentive causes more weight loss.
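The attrition worry can be made concrete with a small simulation. Everything here is hypothetical: I assume the incentive has no real effect at all, and I invent a dropout rate purely for illustration. The only mechanism at work is that incentive-group subjects who fail to lose weight sometimes vanish before follow-up:

```python
import random

random.seed(0)

def simulate(n=10000):
    """Simulate a trial where the incentive has NO true effect,
    but incentive-group failures tend to drop out before follow-up."""
    control, incentive = [], []
    for _ in range(n):
        loss = random.gauss(0, 5)  # pounds lost; same distribution for both groups
        control.append(loss)       # control group: everyone is measured
        # incentive group: subjects who failed to lose weight (loss <= 0)
        # drop out 60% of the time (an invented rate, for illustration)
        if loss > 0 or random.random() > 0.6:
            incentive.append(loss)
    return sum(control) / len(control), sum(incentive) / len(incentive)

ctrl_mean, incent_mean = simulate()
print(f"control mean loss:   {ctrl_mean:+.2f}")
print(f"incentive mean loss: {incent_mean:+.2f}")
```

Even though the incentive does nothing, the surviving incentive-group sample shows markedly more weight loss, purely because the failures self-censored out of it.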
One of the most controversial recent randomized studies concerned an even more basic question: are low-fat diets good for your health? In 2006, the Women’s Health Initiative (WHI) reported the results of a $415 million federal study. The researchers randomly assigned nearly 49,000 women ages fifty to seventy-nine to follow a low-fat diet or not, and then followed their health for eight years.
The low-fat diet group “received an intensive behavioral modification program that consisted of eighteen group sessions in the first year and quarterly maintenance sessions thereafter.” These women did report eating 10.7 percent less fat at the end of the first year and 8.1 percent less fat at the end of year six. (They also reported eating on average each day an extra serving of vegetables or fruit.)
The shocking news was that, contrary to prior accepted wisdom, the low-fat diet did not improve the women’s health. The women assigned to the low-fat diet weighed about the same and had the same rates of breast cancer, colon cancer, and heart disease as those whose diets were unchanged. (There was a slightly lower risk of breast cancer—42 per 10,000 per year in the low-fat diet group, compared with 45 per 10,000 in the regular diet group—but the difference was statistically insignificant.)
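For readers who want to see roughly what “statistically insignificant” means here, a back-of-the-envelope two-proportion z-test is possible. The group sizes below are illustrative assumptions, not the study’s exact figures (the text says only that nearly 49,000 women were assigned in total), and the case counts are reconstructed from the quoted annual rates over roughly eight years:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for the difference between two sample proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x2 / n2 - x1 / n1) / se

# Assumed group sizes (~49,000 total split 40/60); NOT the study's exact numbers.
n_lowfat, n_control = 19600, 29400

# ~8 years at 42 vs. 45 breast-cancer cases per 10,000 women per year.
x_lowfat = round(42 / 10000 * 8 * n_lowfat)
x_control = round(45 / 10000 * 8 * n_control)

z = two_prop_z(x_lowfat, n_lowfat, x_control, n_control)
print(f"z = {z:.2f}")  # below the 1.96 threshold for 5% significance
```

Under these assumptions the z statistic falls short of the conventional 1.96 cutoff, which is consistent with the report that the difference was statistically insignificant.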
Some researchers trumpeted the results as a triumph for evidence-based medicine. For them, this massive randomized trial conclusively refuted earlier studies which suggested that low-fat diets might reduce the incidence of breast or colon cancer. These earlier studies were based on indirect evidence—for example, finding that women who moved to the United States from countries where diets were low in fat acquired a higher risk of cancer. There were also some animal studies showing that a high-fat diet could lead to more mammary cancer.
So the Women’s Health Initiative was a serious attempt to directly test a central and very pressing question. A sign of the researchers’ diligence can be seen in the surprisingly low rate of attrition. After eight years, only 4.7 percent of the women in the low-fat diet group withdrew from participation or were lost to follow-up (compared with 4.0 percent of the women in the regular diet group).
Nonetheless, the study has been attacked. Even supporters of evidence-based medicine have argued that the study wasted hundreds of millions of dollars because it asked the wrong question. Some say that the recommended diet wasn’t low fat enough. The dieters were told that 20 percent of their calories could come from fat. (Only 31 percent of them got their dietary fat that low.) Some critics think—especially because of compliance issues—that the researchers should have recommended 10 percent fat.
Other critics think the study is useless because it tested only for the impact of reducing total fat in the diet instead of testing the impact of reducing saturated fats, which raise cholesterol levels. Randomized studies can’t tell you anything about treatments that you failed to test. So we just don’t know whether reducing saturated and trans fats might still reduce the risk of heart disease. And we’re not likely to get the answer soon. Dr. Michael Thun, who directs epidemiological research for the American Cancer Society, called the WHI study “the Rolls-Royce of studies,” not just because it was high quality, but also because it was so expensive. “We usually have only one shot,” he said, “at a very large-scale trial on a particular issue.”
Similar concerns have been raised about another WHI study, which tested the impact of calcium supplements. A seven-year randomized test of 36,000 women aged fifty to seventy-nine found that taking calcium supplements resulted in no significant reduction in risk of hip fracture (but did increase the risk of kidney stones). Critics again worry that the study asked the wrong question of the wrong set of women. Proponents of calcium supplements want to know whether supplements might still help older women. Others said the researchers should have excluded women who were already getting plenty of calcium in their regular diet, so that the study would have tested the impact of calcium supplements when there is a pre-existing deficiency. And of course some wished they had tested a higher-dose supplement.
Still, even the limited nature of the results gives one pause. Dr. Ethel Siris, president of the National Osteoporosis Foundation, said the new study made her question the advice she had given women to take calcium supplements regardless of what is in their diet. “We didn’t think it hurt, which is why doctors routinely gave it,” Siris said.
When she heard about the results of the calcium study, Siris’s first reaction was to try to pick it apart. She changed her mind when she heard the unreasonable way that people were criticizing some of the WHI studies. Seeing the psychology of resistance in others helped her overcome it in herself. She didn’t want to find herself “thinking there was something wrong with the design of this study because I don’t like the results.”
Much is at stake here. The massive randomized WHI studies are changing physician practice with regard to a host of treatments. Some doctors have stopped recommending low-fat diets to their patients as a way to reduce their heart disease and cancer risk. Others, like Siris, have changed their minds about calcium supplements. Even the best studies need to be interpreted. Done well, Super Crunching is a boon to society. Done badly, database decision making can kill.
The rise of Super Crunching is a phenomenon that cannot be ignored. On net, it has improved, and will continue to improve, our lives. Having more information about “what causes what” is usually good. But the purpose of this chapter has been to point out exceptions to this general tendency. Much of the resistance that we’ve seen over and over in this book can be explained by self-interest. Traditional experts don’t like the loss of control and status that often accompanies a shift toward Super Crunching. But some of the resistance is more visceral. Some people fear numbers. For these people, Super Crunching is their worst nightmare. To them, the spread of data-driven decision making is just the kind of thing they thought they could avoid by majoring in the humanities and then studying something nice and verbal, like law.
We should expect a Super Crunching backlash. The greater its impact, the greater the resistance—at least pockets of resistance. Just as we have seen the rise of hormone-free milk and cruelty-free cosmetics, we should expect to see products that claim to be “data-mining free.” In a sense, we already are. In politics, there is a certain attraction to candidates who are straight shooters, who don’t poll every position, who don’t relentlessly stay on message to follow a focus group–approved script. In business, we find companies like Southwest Airlines that charge one price for any seat on a particular route. Southwest passengers don’t need Farecast to counter-crunch future fares on their behalf, because Southwest doesn’t play the now-you-see-it-now-you-don’t pricing games (euphemistically called “revenue enhancement”) where other airlines try to squeeze as much as they can from every individual passenger.
While price resistance is reasonable, a broader quest for a life untouched by Super Crunching is both infeasible and ill-advised. Instead of a Luddite rejection of this powerful new technology, it is better to become a knowledgeable participant in the revolution. Instead of sticking your head in the sands of innumeracy, I recommend filling your head with the basic tools of Super Crunching.
CHAPTER 8
The Future of Intuition (and Expertise)
Here’s a fable that happens to be true. Once upon a time, I went for a hike with my daughter Anna, who was eight years old at the time. Anna is a talkative girl who is, much to my consternation, developing a fashion sense. She’s also an intricate planner. She’ll start thinking about the theme and details of her birthday party half a year in advance. Recently, she’s taken to designing and fabricating elaborate board games for her family to play.
While we were hiking, I asked Anna how many times in her life she had climbed the Sleeping Giant trail. Anna replied, “Six times.” I then asked what was the standard deviation of her estimate. Anna replied, “Two times.” Then she paused and said, “Daddy, I want to revise my mean to eight.”
Something in Anna’s reply gets at the heart of why “thinking-by-numbers is the new way to be smart.” To understand what was going on in that little mind of hers, we have to step back and learn something about our friend, the standard deviation.
You see, Anna knows that standard deviations are an incredibly intuitive measure of dispersion. She knows that standard deviations give us a way of toggling back and forth between numbers and our intuitions about the underlying variability of some random process. This all sounds horribly abstract and unhelpful, but one concrete fact is now deeply ingrained in Anna’s psyche:
There’s a 95 percent chance that a normally distributed variable will fall within two standard deviations (plus or minus) of its mean.
In our family, we call this the “Two Standard Deviation” rule (or 2SD for short). Understanding this simple rule is really at the heart of understanding variability. So what does it mean? Well, the average IQ score is 100 and the standard deviation is 15. So the 2SD rule tells us that 95 percent of people will have an IQ between 70 (which is 100 minus two standard deviations) and 130 (which is 100 plus two standard deviations). Using the 2SD rule gives us a simple way to translate a standard deviation number into an intuitive statement about variability. Because of the 2SD rule, we can think about variability in terms of something that we understand: probabilities and proportions. Most people (95 percent) have IQs between 70 and 130. If the distribution of IQs were less variable—say, the standard deviation was only 5—then the range of scores that just included 95 percent of the population would be much smaller. We’d be able to say 95 percent of people have IQs between 90 and 110. (In fact, later on, we’ll learn how Larry Summers, the ousted president of Harvard, got into a world of trouble by suggesting that men and women have different IQ standard deviations.)
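The 2SD arithmetic above can be sketched in a few lines of code, using the IQ figures from the text (mean 100, standard deviation 15). The last line computes the exact probability of landing within two standard deviations of the mean, which shows that the rule’s “95 percent” is itself a convenient approximation:

```python
import math

def two_sd_range(mean, sd):
    """Return the interval holding roughly 95% of a normal distribution."""
    return mean - 2 * sd, mean + 2 * sd

print(two_sd_range(100, 15))  # IQ: roughly 95% of people fall between 70 and 130
print(two_sd_range(100, 5))   # a less variable distribution: 90 to 110

# Exact probability of falling within +/- 2 standard deviations,
# from the normal CDF: P(|Z| < 2) = erf(2 / sqrt(2))
p = math.erf(2 / math.sqrt(2))
print(f"{p:.4f}")  # about 0.9545
```

So the precise figure is closer to 95.45 percent; “95 percent” is the memorable rounding that makes the rule usable in your head.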
We now know enough to figure out what was going on in Anna’s eight-year-old mind during that fateful hike. You see, Anna can recite the 2SD rule in her sleep. She knows that standard deviations are our friends and that the first thing you always do whenever you have a standard deviation and a mean is to apply the 2SD rule.
Recall that after Anna said she had hiked Sleeping Giant six times, she said the standard deviation of her estimate was two. She got the number two as her estimate for the standard deviation by thinking about the 2SD rule. Anna asked herself what the 95 percent range of her confidence was and then tried to back out a number that was consistent with her intuitions. She used the 2SD rule to translate her intuitions into a number. (If you want a challenge, see if you can use the 2SD rule and just your intuition to derive a number for the standard deviation for adult male height. You’ll find help at the bottom of the page.)*4
But Anna wasn’t done. The really amazing thing was that after a pause of a few seconds, she said, “Daddy, I want to revise my mean to eight.” During that pause, after she told me her estimate was six and the standard deviation was two, she was silently thinking more about the 2SD rule. The rule told her, of course, that there was a 95 percent chance that she had walked to the top of Sleeping Giant between two and ten times. And here’s the important part: without any prompting she reflected on the truth of this range using nothing more than her experience, her memories. She realized that she had clearly walked it more than two times. Her numbers didn’t fit her intuitions.