by Nate Silver
The bigger problem, however, is that the frequentist methods—in striving for immaculate statistical procedures that can’t be contaminated by the researcher’s bias—keep him hermetically sealed off from the real world. These methods discourage the researcher from considering the underlying context or plausibility of his hypothesis, something that the Bayesian method demands in the form of a prior probability. Thus, you will see apparently serious papers published on how toads can predict earthquakes,50 or how big-box stores like Target beget racial hate groups,51 which apply frequentist tests to produce “statistically significant” (but manifestly ridiculous) findings.
Data Is Useless Without Context
Fisher mellowed out some toward the end of his career, occasionally even praising Bayes.52 And some of the methods he developed over his long career (although not the ones that are in the widest use today) were really compromises between Bayesian and frequentist approaches. In the last years of his life, however, Fisher made a grievous error of judgment that helps to demonstrate the limitations of his approach.
The issue concerned cigarette smoking and lung cancer. In the 1950s, a large volume of research—some of it using standard statistical methods and some using Bayesian ones53—claimed there was a connection between the two, a connection that is of course widely accepted today.
Fisher spent much of his late life fighting against these conclusions, publishing letters in prestigious publications including The British Medical Journal and Nature.54 He did not deny that the statistical relationship between cigarettes and lung cancer was fairly strong in these studies, but he claimed it was a case of correlation mistaken for causation, comparing it to a historical correlation between apple imports and marriage rates in England.55 At one point, he argued that lung cancer caused cigarette smoking and not the other way around56—the idea, apparently, was that people might take up smoking for relief from their lung pain.
Many scientific findings that are commonly accepted today would have been dismissed as hooey at one point. This was sometimes because of the cultural taboos of the day (such as in Galileo’s claim that the earth revolves around the sun) but at least as often because the data required to analyze the problem did not yet exist. We might let Fisher off the hook if, it turned out, there was not compelling evidence to suggest a linkage between cigarettes and lung cancer by the 1950s. Scholars who have gone back and looked at the evidence that existed at the time have concluded, however, that there was plenty of it—a wide variety of statistical and clinical tests conducted by a wide variety of researchers in a wide variety of contexts demonstrated the causal relationship between them.57 The idea was quickly becoming the scientific consensus.
So why did Fisher dismiss the theory? One reason may have been that he was a paid consultant of the tobacco companies.58 Another may have been that he was a lifelong smoker himself. And Fisher liked to be contrarian and controversial, and disliked anything that smacked of puritanism. In short, he was biased, in a variety of ways.
But perhaps the bigger problem is the way that Fisher’s statistical philosophy tends to conceive of the world. It emphasizes the objective purity of the experiment—every hypothesis could be tested to a perfect conclusion if only enough data were collected. However, in order to achieve that purity, it denies the need for Bayesian priors or any other sort of messy real-world context. These methods neither require nor encourage us to think about the plausibility of our hypothesis: the idea that cigarettes cause lung cancer competes on a level playing field with the idea that toads predict earthquakes. It is, I suppose, to Fisher’s credit that he recognized that correlation does not always imply causation. However, the Fisherian statistical methods do not encourage us to think about which correlations imply causations and which ones do not. It is perhaps no surprise that after a lifetime of thinking this way, Fisher lost the ability to tell the difference.
Bob the Bayesian
In the Bayesian worldview, prediction is the yardstick by which we measure progress. We can perhaps never know the truth with 100 percent certainty, but making correct predictions is the way to tell if we’re getting closer.
Bayesians hold the gambler in particularly high esteem.59 Bayes and Laplace, as well as other early probability theorists, very often used examples from games of chance to explicate their work. (Although Bayes probably did not gamble much himself,60 he traveled in circles in which games like cards and billiards were common and were often played for money.) The gambler makes predictions (good), and he makes predictions that involve estimating probabilities (great), and when he is willing to put his money down on his predictions (even better), he discloses his beliefs about the world to everyone else. The most practical definition of a Bayesian prior might simply be the odds at which you are willing to place a bet.*
And Bob Voulgaris is a particularly Bayesian type of gambler. He likes betting on basketball precisely because it is a way to test himself and the accuracy of his theories. “You could be a general manager in sports and you could be like, Okay, I’ll get this player and I’ll get that player,” he told me toward the end of our interview. “At the end of the day you don’t really know if you’re right or wrong. But at the end of the day, the end of the season, I know if I’m right or wrong because I know if I’m winning money or I’m losing it. That’s a pretty good validation.”
Voulgaris soaks up as much basketball information as possible because everything could potentially shift his probability estimates. A professional sports bettor like Voulgaris might place a bet only when he thinks he has at least a 54 percent chance of winning it. This is just enough to cover the “vigorish” (the cut a sportsbook takes on a winning wager), plus the risk associated with putting one’s money into play. And for all his skill and hard work—Voulgaris is among the best sports bettors in the world today—he still gets only about 57 percent of his bets right. It is just exceptionally difficult to do much better than that.
A small piece of information that improves Voulgaris’s estimate of his odds from 53 percent to 56 percent can therefore make all the difference. This is the sort of narrow margin that gamblers, whether at the poker table or in the stock market, make their living on. Fisher’s notion of statistical significance, which uses arbitrary cutoffs devoid of context* to determine what is a “significant” finding and what isn’t,61 is much too clumsy for gambling.
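The arithmetic behind those narrow margins can be sketched in a few lines. This is only an illustration under a stated assumption: the bettor risks $110 to win $100, a common sportsbook convention that the text itself does not specify. The function names are likewise hypothetical.

```python
def break_even_probability(risk: float, win: float) -> float:
    """Win probability at which a bet risking `risk` to win `win` has zero expected profit."""
    return risk / (risk + win)


def expected_profit(p_win: float, risk: float, win: float) -> float:
    """Average profit per bet for a bettor who wins with probability `p_win`."""
    return p_win * win - (1.0 - p_win) * risk


# Assumption (not from the text): risk $110 to win $100 on each wager.
RISK, WIN = 110.0, 100.0

break_even = break_even_probability(RISK, WIN)  # a little above 52 percent
ev_53 = expected_profit(0.53, RISK, WIN)        # expected profit at 53 percent
ev_56 = expected_profit(0.56, RISK, WIN)        # expected profit at 56 percent

print(f"break-even win rate: {break_even:.4f}")
print(f"expected profit per bet at 53%: ${ev_53:.2f}")
print(f"expected profit per bet at 56%: ${ev_56:.2f}")
```

Under that assumed pricing, a bettor at 53 percent barely clears the break-even point, while one at 56 percent earns several times as much per wager, which is why a three-point shift in the estimate matters so much.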
But this is not to suggest that Voulgaris avoids developing hypotheses around what he’s seeing in the statistics. (The problem with Fisher’s notion of hypothesis testing is not with having hypotheses but with the way Fisher recommends that we test them.)62 In fact, this is critical to what Voulgaris does. Everyone can see the statistical patterns, and they are soon reflected in the betting line. The question is whether they represent signal or noise. Voulgaris forms hypotheses from his basketball knowledge so that he might tell the difference more quickly and more accurately.
Voulgaris’s approach to betting basketball is one of the purer distillations of the scientific method that you’re likely to find (figure 8-7). He observes the world and asks questions: why are the Cleveland Cavaliers so frequently going over on the total? He then gathers information on the problem, and formulates a hypothesis: the Cavaliers are going over because Ricky Davis is in a contract year and is trying to play at a fast pace to improve his statistics. The difference between what Voulgaris does and what a physicist or biologist might do is that he demarcates his predictions by placing bets on them, whereas a scientist would hope to validate her prediction by conducting an experiment.
FIGURE 8-7: SCIENTIFIC METHOD
Step in Scientific Method 63: Sports Betting Example
Observe a phenomenon: Cavaliers games are frequently going over the game total.
Develop a hypothesis to explain the phenomenon: Cavaliers games are going over because Ricky Davis is playing for a new contract and trying to score as many points as possible.
Formulate a prediction from the hypothesis: Davis’s incentives won’t change until the end of the season. Therefore: (i) he’ll continue to play at a fast pace, and, (ii) future Cavaliers games will continue to be high-scoring as a result.
Test the prediction: Place your bet.
If Voulgaris can develop a strong hypothesis about what he is seeing in the data, it can enable him to make more aggressive bets. Suppose, for instance, that Voulgaris reads some offhand remark from the coach of the Denver Nuggets about wanting to “put on a good show” for the fans. This is probably just idle chatter, but it might imply that the team will start to play at a faster pace in order to increase ticket sales. If this hypothesis is right, Voulgaris might expect that an over bet on Nuggets games will win 70 percent of the time as opposed to the customary 50 percent. As a consequence of Bayes’s theorem, the stronger Voulgaris’s belief in his hypothesis, the more quickly he can begin to make profitable bets on Nuggets games. He might be able to do so after watching just a game or two, observing whether his theory holds in practice—quickly enough that Vegas will have yet to catch on. Conversely, he can avoid being distracted by statistical patterns, like the Lakers’ slow start in 1999, that have little underlying meaning but which other handicappers might mistake for a signal.
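A rough sketch of that reasoning follows. It compares a weak and a strong prior belief in the "faster pace" hypothesis after the same two games. The 70 percent and 50 percent figures come from the example above; the two prior values (0.2 and 0.6) and the function names are hypothetical, chosen only for illustration.

```python
P_OVER_IF_TRUE = 0.70   # from the example: overs win 70 percent if the hypothesis holds
P_OVER_IF_FALSE = 0.50  # from the example: the customary 50 percent otherwise


def update(belief: float, went_over: bool) -> float:
    """Apply Bayes's theorem once: revise belief in the hypothesis after one game."""
    like_true = P_OVER_IF_TRUE if went_over else 1.0 - P_OVER_IF_TRUE
    like_false = P_OVER_IF_FALSE if went_over else 1.0 - P_OVER_IF_FALSE
    numerator = belief * like_true
    return numerator / (numerator + (1.0 - belief) * like_false)


def chance_next_game_goes_over(belief: float) -> float:
    """Blend the two scenarios, weighting each by the current belief."""
    return belief * P_OVER_IF_TRUE + (1.0 - belief) * P_OVER_IF_FALSE


# Hypothetical priors: a bettor with a weak hunch versus one with a strong hypothesis.
results = {}
for label, prior in [("weak prior", 0.2), ("strong prior", 0.6)]:
    belief = prior
    for went_over in [True, True]:  # both observed games go over the total
        belief = update(belief, went_over)
    results[label] = (belief, chance_next_game_goes_over(belief))
    print(f"{label}: belief {belief:.3f}, next over {results[label][1]:.3f}")
```

Given identical evidence, the bettor who started with the stronger prior ends up with the higher estimate that the next game goes over, so he crosses whatever threshold he needs to justify a bet sooner.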
The Bayesian Path to Less Wrongness
But are Bob’s probability estimates subjective or objective? That is a tricky question.
As an empirical matter, we all have beliefs and biases, forged from some combination of our experiences, our values, our knowledge, and perhaps our political or professional agenda. One of the nice characteristics of the Bayesian perspective is that, in explicitly acknowledging that we have prior beliefs that affect how we interpret new evidence, it provides for a very good description of how we react to the changes in our world. For instance, if Fisher’s prior belief was that there was just a 0.00001 percent chance that cigarettes cause lung cancer, that helps explain why all the evidence to the contrary couldn’t convince him otherwise. In fact, there is nothing prohibiting you under Bayes’s theorem from holding beliefs that you believe to be absolutely true. If you hold there is a 100 percent probability that God exists, or a 0 percent probability, then under Bayes’s theorem, no amount of evidence could persuade you otherwise.
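The point about absolute certainty can be checked directly. The sketch below applies Bayes's theorem repeatedly to three priors: exactly 0, exactly 1, and the 0.00001 percent figure mentioned above. The strength of each piece of evidence (60 percent likely if the claim is true, 40 percent if it is false) is a hypothetical choice, as are the function names.

```python
def posterior(prior: float, like_if_true: float, like_if_false: float) -> float:
    """Bayes's theorem for a single claim and a single piece of evidence."""
    numerator = prior * like_if_true
    return numerator / (numerator + (1.0 - prior) * like_if_false)


def after_evidence(prior: float, pieces: int) -> float:
    """Update on `pieces` independent observations that each favor the claim."""
    belief = prior
    for _ in range(pieces):
        # Hypothetical evidence strength: 0.6 if the claim is true, 0.4 if false.
        belief = posterior(belief, 0.6, 0.4)
    return belief


never = after_evidence(0.0, 10)      # a prior of exactly 0 percent
always = after_evidence(1.0, 10)     # a prior of exactly 100 percent
skeptic = after_evidence(1e-7, 10)   # 0.00001 percent, written as a probability

print(never, always, skeptic)
```

The priors of 0 and 1 come out exactly where they started, because the prior multiplies everything else in the formula. The tiny but nonzero prior does move, only very slowly, which is the difference between extreme skepticism and a closed mind.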
I’m not here to tell you whether there are things you should believe with absolute and unequivocal certainty or not.* But perhaps we should be more honest about declaiming these. Absolutely nothing useful is realized when one person who holds that there is a 0 percent probability of something argues against another person who holds that the probability is 100 percent. Many wars—like the sectarian wars in Europe in the early days of the printing press—probably result from something like this premise.
This does not imply that all prior beliefs are equally correct or equally valid. But I’m of the view that we can never achieve perfect objectivity, rationality, or accuracy in our beliefs. Instead, we can strive to be less subjective, less irrational, and less wrong. Making predictions based on our beliefs is the best (and perhaps even the only) way to test ourselves. If objectivity is the concern for a greater truth beyond our personal circumstances, and prediction is the best way to examine how closely aligned our personal perceptions are with that greater truth, the most objective among us are those who make the most accurate predictions. Fisher’s statistical method, which saw objectivity as residing within the confines of a laboratory experiment, is less suitable to this task than Bayesian reasoning.
One property of Bayes’s theorem, in fact, is that our beliefs should converge toward one another—and toward the truth—as we are presented with more evidence over time. In figure 8-8, I’ve worked out an example wherein three investors are trying to determine whether they are in a bull market or a bear market. They start out with very different beliefs about this—one of them is optimistic, and believes there’s a 90 percent chance of a bull market from the outset, while another one is bearish and says there’s just a 10 percent chance. Every time the market goes up, the investors become a little more bullish relative to their prior, while every time it goes down the reverse occurs. However, I set the simulation up such that, although the fluctuations are random on a day-to-day basis, the market increases 60 percent of the time over the long run. Although it is a bumpy road, eventually all the investors correctly determine that they are in a bull market with almost (although not exactly, of course) 100 percent certainty.
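A simulation along these lines can be sketched as follows. It is not the code behind the figure. The 90 percent and 10 percent priors and the 60 percent frequency of up days come from the description above. The third investor's 50 percent prior, the assumption that a bear market rises on 40 percent of days, the 1,000-day horizon, and the random seed are all hypothetical choices made for this sketch.

```python
import random

P_UP_IF_BULL = 0.60  # from the text: the market rises 60 percent of the time
P_UP_IF_BEAR = 0.40  # assumption: a bear market is the mirror image


def update(belief_bull: float, market_up: bool) -> float:
    """Revise the probability of a bull market after one day's move."""
    like_bull = P_UP_IF_BULL if market_up else 1.0 - P_UP_IF_BULL
    like_bear = P_UP_IF_BEAR if market_up else 1.0 - P_UP_IF_BEAR
    numerator = belief_bull * like_bull
    return numerator / (numerator + (1.0 - belief_bull) * like_bear)


def simulate(priors: dict, moves: list) -> dict:
    """Run every investor through the same sequence of daily moves."""
    beliefs = dict(priors)
    for market_up in moves:
        for name in beliefs:
            beliefs[name] = update(beliefs[name], market_up)
    return beliefs


# The 0.5 prior for the third investor is an assumption; the text gives only two.
priors = {"optimist": 0.9, "neutral": 0.5, "pessimist": 0.1}

rng = random.Random(8)  # fixed seed so the run is repeatable
moves = [rng.random() < P_UP_IF_BULL for _ in range(1000)]  # truly a bull market

final_beliefs = simulate(priors, moves)
for name, belief in final_beliefs.items():
    print(f"{name}: {belief:.6f}")
```

Because every investor sees the same evidence and applies the same rule, the gap between them shrinks with each day, and the initial disagreement ends up mattering very little.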
FIGURE 8-8: BAYESIAN CONVERGENCE
In theory, science should work this way. The notion of scientific consensus is tricky, but the idea is that the opinion of the scientific community converges toward the truth as ideas are debated and new evidence is uncovered. Just as in the stock market, the steps are not always forward or smooth. The scientific community is often too conservative about adapting its paradigms to new evidence,64 although there have certainly also been times when it was too quick to jump on the bandwagon. Still, provided that everyone is on the Bayesian train,* even incorrect beliefs and quite wrong priors are revised toward the truth in the end.
Right now, for instance, we may be undergoing a paradigm shift in the statistical methods that scientists are using. The critique I have made here about the flaws of Fisher’s statistical approach is neither novel nor radical: prominent scholars in fields ranging from clinical psychology65 to political science66 to ecology67 have made similar arguments for years. But so far there has been little fundamental change.
Recently, however, some well-respected statisticians have begun to argue that frequentist statistics should no longer be taught to undergraduates.68 And some professions have considered banning Fisher’s hypothesis test from their journals.69 In fact, if you read what’s been written in the past ten years, it’s hard to find anything that doesn’t advocate a Bayesian approach.
Bob’s money is on Bayes, too. He does not literally apply Bayes’s theorem every time he makes a prediction. But his practice of testing statistical data in the context of hypotheses and beliefs derived from his basketball knowledge is very Bayesian, as is his comfort with accepting probabilistic answers to his questions.
It will take some time for textbooks and traditions to change. But Bayes’s theorem holds that we will converge toward the better approach. Bayes’s theorem predicts that the Bayesians will win.
9
RAGE AGAINST THE MACHINES
The twenty-seven-year-old Edgar Allan Poe, like many others before him, was fascinated by the Mechanical Turk, a contraption that had once beaten Napoleon Bonaparte and Benjamin Franklin at chess. The machine, constructed in Hungary in 1770 before Poe or the United States of America were born, had come to tour Baltimore and Richmond in the 1830s after having wowed audiences around Europe for decades. Poe deduced that it was an elaborate hoax, its cogs and gears concealing a chess master who sat in its cupboards and manipulated its levers to move the pieces about the board and nod its turban-covered head every time that it put its opponent into check.
Poe is regarded as the inventor of the detective story,1 and some of his work in sleuthing the hoax was uncanny. He was rightly suspicious, for instance, that a man (later determined to be the German chess master William Schlumberger) could always be found packing and unpacking the machine but was nowhere to be seen while the game was being played (aha! he was in the box). What was truly visionary about Poe’s essay on the Mechanical Turk, however, was his grasp of its implications for what we now call artificial intelligence (a term that would not be coined until 120 years later). His essay expressed a very deep and very modern ambivalence about the prospect that computers might be able to imitate, or improve on, the higher functions of man.
FIGURE 9-1: THE MECHANICAL TURK
Poe recognized just how impressive it might be for a machine to play chess at all. The first mechanical computer, what Charles Babbage called the difference engine, had barely been conceived of at the time that Poe wrote his exposé. Babbage’s proposed computer, which was never fully built during his lifetime, might at best hope to approximate some elementary functions like logarithms in addition to carrying out addition, subtraction, multiplication, and division. Poe thought of Babbage’s work as impressive enough—but still, all it did was take predictable inputs, turn a few gears, and spit out predictable outputs. There was no intelligence there—it was purely mechanistic. A computer that could play chess, on the other hand, verged on being miraculous because of the judgment required to play the game well.
Poe claimed that if this chess-playing machine were real, it must by definition play chess flawlessly; machines do not make computational errors. He took the fact that the Turk did not play perfect chess—it won most of its games but lost a few—as further proof that it was not a machine but a human-controlled apparatus, full of human imperfections.
Although Poe’s logic was flawed, this reverence for machines is still with us today. We regard computers as astonishing inventions, among the foremost expressions of human ingenuity. Bill Gates often polls as among the most admired men in America,2 and Apple and Google among our most admired companies.3 And we expect computers to behave flawlessly and effortlessly, somehow overcoming the imperfections of their creators.
Moreover, we view the calculations of computer programs as unimpeachably precise and perhaps even prophetic. In 2012, a pair of British teenagers were accused of defrauding investors out of more than a million dollars by promoting their stock-picking “robot” named MARL,4 which they claimed could process “1,986,832 mathematical calculations per second” while avoiding “human ‘gut feelings,’” allowing investors to double their money every few hours by following MARL’s recommendations for penny stocks.5