by Ian Ayres
Anna could have resolved the contradiction by revising down her standard deviation estimate (“Daddy, I want to revise my standard deviation to one”). But she felt instead that it was more accurate to increase her estimate of the mean. By revising the mean up to eight, Anna was now saying there was a 95 percent chance that she had walked the trail between four and twelve times. Having been her companion on these walks, I can attest that she revised the right number.
I had never been more proud of my daughter. Yet the point of the story is not (only) to kvell about Anna’s talents. (She is a smart kid, but she’s no genius child. Notwithstanding my best attempts to twist her intellect, she’s pretty normal.) No, the point of the story is to show how statistics and intuition can comfortably interact. Anna toggled back and forth between her memories and her knowledge of statistics to come up with a better estimate than either could have produced by itself.
By estimating the 95 percent probability range, Anna actually produced a more accurate estimate of the mean. This is potentially a huge finding. Imagine what it could mean for the examination of trial witnesses, where lawyers often struggle to elicit estimates of when or how many times something occurred. You might even use it yourself when trying to jog your own or someone else’s memory.
The (Wo)Man of the Future
For the rational study of the law the blackletter man may be the man of the present, but the man of the future is the man of statistics…
OLIVER WENDELL HOLMES, JR.
THE PATH OF THE LAW, 1897
The rise of statistical thinking does not mean the end of intuition or expertise. Rather, Anna’s revision underscores how intuition will be reinvented to coexist with statistical thinking. Increasingly, decision makers will switch back and forth between their intuitions and data-based decision making. Their intuitions will guide them to ask new questions of the data that non-intuitive number crunchers would miss. And databases will increasingly allow decision makers to test their intuitions—not just once, but on an ongoing basis.
This dialectic is a two-way street. The best data miners will sit back and use their intuitions and experiential expertise to query whether their statistical analysis makes sense. Statistical results that diverge widely from intuition should be carefully interrogated. While there is now great conflict between dyed-in-the-wool intuitivists and the new breed of number crunchers, the future is likely to show that these tools are complements more than substitutes. Each form of decision making can pragmatically counterbalance the greatest weaknesses of the other.
Sometimes, instead of starting with a hypothesis, Super Crunchers stumble across a puzzling result, a number that shouldn’t be there. That’s what happened to Australian economist Justin Wolfers, when he was teaching a seminar at the University of Pennsylvania’s Wharton School on information markets and sports betting. Wolfers wanted to show his students how accurate Las Vegas bookies were at predicting college basketball games. So Wolfers pulled data on over 44,000 games—almost every college basketball game over a sixteen-year period. He created a simple graph showing what the actual margin of victory was relative to the market’s predicted point spread.
“The graph was bang on a normal bell curve,” he said. Almost exactly 50 percent (50.01 percent) of the time the favored team beat the point spread and almost exactly 50 percent of the time they came up short. “I wanted to show the class that this wasn’t just true in general, but that it would hold true for different-size point spreads.” The graphs that Wolfers made for games where the point spread was less than six points, and for point spreads from six to twelve points, again showed that the Las Vegas line was extremely accurate. Indeed, the following graph for all games where the point spread was twelve or less shows just how accurate:
SOURCE: Justin Wolfers, “Point Shaving: Corruption in NCAA Basketball,” PowerPoint presentation, AEA Meetings (January 7, 2006)
Look how close the actual distribution of victory margins (the solid line) was to the theoretical normal bell curve. This picture gives you an idea of why they call it the “normal” distribution. Many real-world variables are approximately normal, taking the shape of your standard-issue bell curve. Almost nothing is perfectly normal. Still, many actual distributions are close enough to the normal distribution to provide a wickedly accurate approximation until you get several standard deviations into the tails.*5
The problem was when Wolfers graphed games where the point spread was more than twelve points. When he crunched the numbers for his class, this is what he found:
SOURCE: Justin Wolfers, “Point Shaving: Corruption in NCAA Basketball,” PowerPoint presentation, AEA Meetings (January 7, 2006)
Instead of a 50–50 chance that the favored team would beat the point spread, Wolfers found there was only a 47 percent chance that the favored team would beat the spread (and hence a 53 percent chance that they would fail to cover the spread). This six percentage point difference might not sound like a lot, but when you’re talking about millions of dollars bet on thousands of games (more than one-fifth of college games have a point spread of more than twelve points), six percentage points is a big discrepancy. Something about the graph struck Wolfers as very fishy, and he started puzzling over it.
Wolfers was a natural for this investigation. He once worked for a bookie back home in Australia. More importantly, he is a leader in the new style of Super Crunching. A 2007 New York Times article, aptly titled “The Future of Economics Isn’t So Dismal,” singled him out as one of thirteen promising young economists, Super Crunchers all, who were remaking the field. Wolfers has a toothy smile and long, platinum blond hair often worn in a ponytail. To see him present a paper is to experience an endearing mixture of substance and flash. He is very much a rock star of Super Crunching.
So when Wolfers looked at the lopsided graph, it wasn’t just that the favored team didn’t cover the spread often enough that bothered him; it was that they failed by just a few points. It was the hump in the distribution just below the Vegas line that didn’t seem right. Justin began to worry that a small fraction of the time, when the point spread was high, players on the favored team would shave points. Suddenly, it all made a lot of sense. When there was a large point spread, players could shave points without really hurting their team’s chances of still winning the game. Justin didn’t think that all the games were rigged. But the pattern in the graph is what you’d see if about 6 percent of all high-point-spread games were fixed.
Justin didn’t stop there. The future belongs to the Super Cruncher who can work back and forth and back again between his intuitions and numbers. The graph led Justin to hypothesize about point shaving. His hypothesizing led him to look for further tests that could confirm or disconfirm his hypothesis. He dug further in and found that if you looked at the score five minutes before the end of the game, there was no shortfall. The favored team was right on track to beat the spread 50 percent of the time. It was just in the last five minutes that the shortfall appeared. This isn’t proof positive, but it does make a stronger circumstantial case—after all, that’s the safest time for a bribed player to let up a bit, secure in the knowledge that it won’t lead to his team losing.
The future belongs to people like Wolfers who are comfortable with both intuition and numbers. This “new way to be smart” is also for the consumers of Super Crunching. Increasingly, it will be useful for people like Anna to be able to quantify their intuitions. It is also important to be able to restate other people’s Super Crunching results in terms that internally make intuitive sense.
One of the very coolest things about Super Crunching is that it not only predicts but also simultaneously tells you how accurate its prediction is. The standard deviation of the prediction is the crucial measure of accuracy. Indeed, the 2SD rule is the key to understanding whether a prediction is so accurate that Super Crunchers say it is “statistically significant.” When statisticians say that a result is statistically significant, they are really just saying that some
prediction is more than two standard deviations away from some other number. For example, when Wolfers says that the shortfall in favored teams covering the spread is statistically significant, he means that their 47 percent probability of covering is more than two standard deviations below the 50 percent probability he would predict if it was really a fair bet.
The designation of “statistical significance” is taken by a lot of people to be some highly technical determination. Yet it has a very intuitive explanation. There is less than a 5 percent chance that a random variable will be more than two standard deviations away from its expected mean (this is just the flip side of the 2SD rule). If an estimate is more than two standard deviations away from some other number, we say this is a statistically significant difference because it is highly unlikely (i.e., there is less than a 5 percent probability) that the estimated difference happened by chance. So just by knowing the 2SD rule, you know a lot about why “statistical significance” is pretty intuitive.
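To see where that 95 percent figure comes from, you can check the 2SD rule directly against the normal curve. Here is a short Python illustration of mine (not from the book), using the standard-library error function to compute the normal cumulative distribution:

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normally distributed variable."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

# Chance that a normal variable lands within two standard
# deviations of its mean.
within = normal_cdf(2) - normal_cdf(-2)
print(round(within, 4))  # 0.9545 -- the "95 percent" behind the 2SD rule
```

The exact figure is 95.45 percent; the 2SD rule simply rounds this to 95, which is why results more than two standard deviations out are called statistically significant.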
In this chapter, I hope to give you an idea of what it feels like to toggle back and forth between intuitions and numbers. I’m going to do it by introducing you to two valuable quantitative tools for the man or woman of the future. Reading this will not train you enough to be a full-fledged Super Cruncher. Yet learning and playing with these tools will put you well on the road to the wonderful dialectic of combining intuitions and statistics, experience and estimates. You’ve already started to learn how to use the first tool—the intuitive measure of dispersion, the standard deviation. One of the first steps is to see if you can communicate what you know to someone else.
A World of Information in a Single Number
When I taught at Stanford Law School, professors were required to award grades that had a 3.2 mean. Students would still obsess about how professors graded, but instead of focusing on the professor’s mean grade, they’d obsess about how variable the grades were around the mandatory mean. Innumerable students and professors would engage in inane conversations where students would ask if a professor was a “spreader” or “clumper.” Good students would want to avoid clumpers so that they would have a better chance at getting an A, while bad students hated the spreaders who handed out more As but also more Fs.
The problem was that many of the students and many of the professors had no way to express the degree of variability in professors’ grading habits. And it’s not just the legal community. As a nation, we lack a vocabulary of dispersion. We don’t know how to express what we intuitively know about the variability of a distribution of numbers.
The 2SD rule could help give us this vocabulary. A professor who said that her standard deviation was .2 could have conveyed a lot of information with a single number. The problem is that very few people in the U.S. today understand what this means. But you should know and be able to explain to others that only about 2.5 percent of the professor’s grades are above 3.6.
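The arithmetic behind that claim: 3.6 is exactly two standard deviations (2 × .2) above the mandatory 3.2 mean, so only the upper tail beyond 2SD lies above it. A quick Python check of mine (the exact normal tail is 2.3 percent, which the 2SD rule of thumb rounds to 2.5):

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normally distributed variable."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

# Grades with a mandatory 3.2 mean and a 0.2 standard deviation:
# 3.6 sits exactly two standard deviations above the mean.
share_above_3_6 = 1.0 - normal_cdf(3.6, mean=3.2, sd=0.2)
print(round(share_above_3_6, 3))  # 0.023 -- the tail the 2SD rule calls "about 2.5 percent"
```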
It’s amazing the economy of information that can be conveyed in just a few words. We all know that investing in the stock market is risky, but just how risky is risky? Once again, standard deviations and the 2SD rule come to our rescue. Super Crunching regression tells us that the predicted return next year of a diversified portfolio of New York Stock Exchange stocks is 10 percent, but that the standard deviation is 20 percent. Just knowing these two numbers reveals an incredible amount.
Suddenly we know that there’s a 95 percent chance that the return on this portfolio will be between minus 30 percent and positive 50 percent. If you invest $100, there’s a 95 percent chance that you’ll end the year with somewhere between $70 and $150. The actual returns on the stock market aren’t perfectly normal, but they are close enough for us to learn an awful lot from just two numbers, the mean and the standard deviation.
Indeed, once you know the mean and standard deviation of a normal distribution, you know everything there is to know about the distribution. Statisticians call these two values “summary statistics” because they summarize all the information contained in the entire bell curve. Armed with a mean and a standard deviation, we can not only apply the 2SD rule; we can also figure out the chance that a variable will fall within any given range of values. Want to know the chance that the stock market will go down this coming year? Well, if the expected return is 10 percent and the standard deviation is 20 percent, you’re really asking for the chance that the return will fall more than one half of a standard deviation below the mean. Turns out the answer (which takes about thirty seconds to calculate in Excel) is 31 percent.
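That thirty-second Excel computation is just a lookup on the normal cumulative distribution. Here is an equivalent sketch in Python (my illustration, standing in for the spreadsheet):

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normally distributed variable."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

# Expected return of 10 percent, standard deviation of 20 percent.
# A down year means a return below zero, which is half a standard
# deviation below the mean.
p_down_year = normal_cdf(0, mean=10, sd=20)
print(round(p_down_year, 2))  # 0.31 -- the 31 percent chance quoted in the text
```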
Exploiting this ability to figure out the probability that some variable will be above or below a particular value pays even greater dividends in political polls.
Probabilistic Leader of the Pack
The current newspaper conventions on how to report polling data are all screwed up. Newspaper articles tend to say something like: “In a Quinnipiac poll of 1,243 likely voters, Calvin holds a 52 percent to 48 percent advantage over Hobbes for the Senate seat. The poll’s margin of error is plus or minus two percentage points.”
How many people understand what the margin of error really means? Do you? Before going on, write down what you think is the chance that most people in the state really support Calvin.
It should come as no surprise that the margin of error is related to the font of all statistical wisdom, the 2SD rule. The margin of error is nothing more than two standard deviations. So if the newspaper tells you that the margin of error is two percentage points, that means that one standard deviation is one percentage point. We want to know what proportion of people in the entire state population of likely voters support Calvin and Hobbes, but the sample proportions might by chance be unrepresentative of the population proportions. The standard deviation tells us how far the sample predictions might stray by chance from the true population proportions that we care about.
So once again we can apply our friend, the 2SD rule. We start with the sample proportion that supports Calvin, 52 percent, and then construct a range of numbers by adding on and subtracting off the margin of error (which is two standard deviations). That’s 52 percent plus or minus 2 percent. So using the 2SD rule we can say, “There is a 95 percent chance that somewhere between 50 percent and 54 percent of likely voters support Calvin.” Printing something like this would provide a lot more information than the cryptic margin of error disclaimer.
Even this 95 percent characterization fails, however, to emphasize an even more basic result: the probability that Calvin is actually leading. For this example, it’s pretty easy to figure out. Since there is a 95 percent chance that Calvin’s true support in the state is between 50 percent and 54 percent, there is a 5 percent chance that his true support is in one of the two tails of the bell curve—either above 54 percent or below 50 percent. And since the two tails of the bell curve are equal in size, there is just a 2.5 percent chance that Calvin’s statewide support is less than 50 percent. That means there’s about a 97.5 percent chance that Calvin is leading.
Reporters are massively misinformed when it comes to figuring out the probability of leading. If Laverne is leading Shirley 51 percent to 49 percent with a margin of error of 2 percent, news articles will say that the race is “a statistical dead heat.” Balderdash, I say. Laverne’s polling result is a full standard deviation above 50 percent. (Remember, the margin of error is two standard deviations, so in this example one standard deviation is 1 percent.) Crunching these numbers in Excel tells us in a few seconds that there is an 84 percent chance that Laverne currently leads in the polls. If something doesn’t change, she is your likely winner.
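Both probability-of-leading figures come out of the same normal-CDF arithmetic. A Python sketch of mine, standing in for the Excel calculation described above (the helper function `prob_of_leading` is my own name, not the book’s):

```python
from math import erf, sqrt

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normally distributed variable."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def prob_of_leading(sample_pct, margin_of_error):
    """Chance the candidate's true support tops 50 percent.

    The margin of error is two standard deviations, so one
    standard deviation is half the margin of error.
    """
    sd = margin_of_error / 2.0
    return 1.0 - normal_cdf(50.0, mean=sample_pct, sd=sd)

print(round(prob_of_leading(52, 2), 3))  # 0.977 -- Calvin: roughly the 97.5 percent above
print(round(prob_of_leading(51, 2), 2))  # 0.84  -- Laverne: no "statistical dead heat"
```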
In many polls, there are undecideds and third-party candidates, so the proportions of the two leading candidates often add up to less than 100 percent. But the probability of leading tells you just what it says—the probable leader of the pack.
People have a much easier time understanding proportions and probabilities than they do standard deviations and margins of error. The beauty of the 2SD rule is that it
provides a bridge for translating one into the other. Instead of reporting the margin of error, reporters should start telling people something that they intuitively understand, the “probability of leading.” Standard deviations are our friends, and they can be used to tell even the uninitiated about things that we really do care about.
Working Backwards
But wait, there’s more. The stock and survey examples show that if you know the mean and standard deviation, you can work forward to calculate a proportion or probability that tells people something interesting about the underlying process. Yet sometimes it’s useful to work backward, starting with a probability and then estimating the implicit standard deviation that would give rise to that result. Lawrence Summers got into a lot of trouble for doing just this.
On January 14, 2005, the president of Harvard University, Lawrence Summers, touched off a firestorm of criticism when he spoke at a conference on the scarcity of women professors in science and math. A slew of newspaper articles characterized his remarks as suggesting that women are “somehow innately deficient in mathematics.” The New York Times in 2007 characterized Summers’s remarks as claiming that “a lack of intrinsic aptitude could help explain why fewer women than men reach the top ranks of science and math in universities.” The article (like many others) suggested that the subsequent furor over Summers’s speech contributed to his resignation in 2006 (and the decision to replace him with the first female president in the university’s 371-year history).
Summers’s speech did in fact suggest that there might be innate differences in the intelligence of men and women. But he didn’t argue that the average intelligence of women was any less than that of men. He focused instead on the possibility that the intelligence of men is more variable than that of women. He explicitly worked backwards from observed proportions to implicit standard deviations. Here’s what Summers said: