The legendary shortstop Derek Jeter was a frequent subject of debate during the Moneyball era. Broadcasters and scouts noticed that Jeter seemed to make an especially large number of diving plays and concluded that he was an exceptional shortstop for that reason. Stat geeks crunched the numbers and detected a flaw in this thinking.1 Although Jeter was a terrific athlete, he often got a slow jump on the ball and dove because he was making up for lost time. In fact, the numbers suggested that Jeter was a fairly poor defensive shortstop, despite having won five Gold Glove awards. The plays that Jeter had to dive for, a truly great defensive shortstop like Ozzie Smith might have made easily—perhaps receiving less credit for them because he made them look routine.
FIGURE C-1: SHORTSTOP DIVING RANGES
Whatever range of abilities we have acquired, there will always be tasks sitting right at the edge of them. If we judge ourselves by what is hardest for us, we may take for granted those things that we do easily and routinely.
One of the most spectacularly correct predictions in history was that of the English astronomer Edmund Halley, who in 1705 predicted that a great comet would return to the earth in 1758. Halley had many doubters, but the comet returned just in the nick of time.2 Comets, which in antiquity were regarded as being wholly unpredictable omens from the gods,3 are now seen as uncannily regular and predictable things.
Astronomers predict that Halley’s Comet will next make its closest approach to the earth on July 28, 2061. By that time, many problems in the natural world that now vex our predictive abilities will have come within the range of our knowledge.
Nature’s laws do not change very much. So long as the store of human knowledge continues to expand, as it has since Gutenberg’s printing press, we will slowly come to a better understanding of nature’s signals, if never all its secrets.
And yet if science and technology are the heroes of this book, there is a risk in the age of Big Data of becoming too starry-eyed about what they might accomplish.
There is no reason to conclude that the affairs of men are becoming more predictable. The opposite may well be true. The same sciences that uncover the laws of nature are making the organization of society more complex. Technology is completely changing the way we relate to one another. Because of the Internet, “the whole context, all the equations, all the dynamics of the propagation of information change,” I was told by Tim Berners-Lee, who invented the World Wide Web in 1990.4
The volume of information is increasing exponentially. But relatively little of this information is useful—the signal-to-noise ratio may be waning. We need better ways of distinguishing the two.
This book is less about what we know than about the difference between what we know and what we think we know. It recommends a strategy so that we might close that gap. The strategy requires one giant leap and then some small steps forward. The leap is into the Bayesian way of thinking about prediction and probability.
Think Probabilistically
Bayes’s theorem begins and ends with a probabilistic expression of the likelihood of a real-world event. It does not require you to believe that the world is intrinsically uncertain. It was invented in the days when the regularity of Newton’s laws formed the dominant paradigm in science. It does require you to accept, however, that your subjective perceptions of the world are approximations of the truth.
This probabilistic element of the Bayesian way may seem uncomfortable at first. Unless we grew up playing cards or other games of chance, we were probably not encouraged to think in this way. Mathematics classrooms spend more time on abstract subjects like geometry and calculus than they do on probability and statistics. In many walks of life, expressions of uncertainty are mistaken for admissions of weakness.
When you first start to make these probability estimates, they may be quite poor. But there are two pieces of favorable news. First, these estimates are just a starting point: Bayes’s theorem will have you revise and improve them as you encounter new information. Second, there is evidence that this is something we can learn to improve. The military, for instance, has sometimes trained soldiers in these techniques,5 with reasonably good results.6 There is also evidence that doctors think about medical diagnoses in a Bayesian manner.7
It is probably better to follow the lead of our doctors and our soldiers than our television pundits.
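To make the mechanics of this revision concrete, here is a minimal sketch in Python. The prior and the likelihoods are invented for illustration (none of these numbers come from the text); the point is only the mechanism, in which each new piece of evidence nudges the estimate.

```python
def update(prior, p_given_true, p_given_false):
    """One application of Bayes's theorem: revise the probability
    that a hypothesis is true after seeing one piece of evidence."""
    numerator = prior * p_given_true
    return numerator / (numerator + (1 - prior) * p_given_false)

belief = 0.20  # a rough initial estimate (illustrative only)

# Each piece of evidence: (P(evidence | hypothesis true), P(evidence | hypothesis false))
evidence = [(0.75, 0.30), (0.60, 0.40), (0.80, 0.25)]

for p_t, p_f in evidence:
    belief = update(belief, p_t, p_f)
    print(f"revised estimate: {belief:.3f}")
```

Even a poor starting estimate gets washed out as the evidence accumulates, which is the sense in which the first guess is "just a starting point."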
• • •
Our brains process information by means of approximation.8 This is less an existential fact than a biological necessity: we perceive far more inputs than we can consciously consider, and we handle this problem by breaking them down into regularities and patterns.
Under high stress, the regularities of life will be stripped away. Studies of people who survived disasters like the September 11 attacks found that they could recall some minute details about their experiences and yet often felt almost wholly disconnected from their larger environments.9 Under these circumstances, our first instincts and first approximations may be rather poor, often failing to recognize the gravity of the threat. Those who had been forced to make decisions under extreme stress before, like on the battlefield, were more likely to emerge as heroes, leading others to safety.10
Our brains simplify and approximate just as much in everyday life. With experience, the simplifications and approximations will be a useful guide and will constitute our working knowledge.11 But they are not perfect, and we often do not realize how rough they are.
Consider the following set of seven statements, which are related to the idea of the efficient-market hypothesis and whether an individual investor can beat the stock market. Each statement is an approximation, but each builds on the last one to become slightly more accurate.
No investor can beat the stock market.
No investor can beat the stock market over the long run.
No investor can beat the stock market over the long run relative to his level of risk.
No investor can beat the stock market over the long run relative to his level of risk and accounting for his transaction costs.
No investor can beat the stock market over the long run relative to his level of risk and accounting for his transaction costs, unless he has inside information.
Few investors beat the stock market over the long run relative to their level of risk and accounting for their transaction costs, unless they have inside information.
It is hard to tell how many investors beat the stock market over the long run, because the data is very noisy, but we know that most cannot relative to their level of risk, since trading produces no net excess return but entails transaction costs, so unless you have inside information, you are probably better off investing in an index fund.
The first approximation—the unqualified statement that no investor can beat the stock market—seems to be extremely powerful. By the time we get to the last one, which is full of expressions of uncertainty, we have nothing that would fit on a bumper sticker. But it is also a more complete description of the objective world.
There is nothing wrong with an approximation here and there. If you encountered a stranger who knew nothing about the stock market, informing him that it is hard to beat, even in the crude terms of the first statement, would be a lot better than nothing.
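A small simulation, with invented parameters, shows why the noisy seventh statement is the more accurate one: even if no investor has any true edge, ordinary luck will make a sizable minority of them look as if they are beating the market.

```python
import random

random.seed(7)

# Invented parameters: every investor has zero true edge, and annual
# returns relative to the market are pure noise.
n_investors, n_years, noise_sd = 1000, 10, 0.15

lucky = 0
for _ in range(n_investors):
    excess = sum(random.gauss(0.0, noise_sd) for _ in range(n_years))
    if excess > 0.20:  # looks like solid outperformance over the decade
        lucky += 1

print(f"{lucky} of {n_investors} skill-free investors still 'beat the market'")
```

With these numbers, roughly a third of the investors outperform by luck alone, which is why noisy data makes genuine skill so hard to identify.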
The problem comes when we mistake the approximation for the reality. Ideologues like Phil Tetlock’s hedgehogs behave in this way. The simpler statements seem more universal, more of a testament to a greater truth or grander theory. Tetlock found, however, that his hedgehogs were very poor at making predictions. They leave out all the messy bits that make life real and predictions more accurate.
We have big brains, but we live in an incomprehensibly large universe. The virtue in thinking probabilistically is that you will force yourself to stop and smell the data—slow down, and consider the imperfections in your thinking. Over time, you should find that this makes your decision making better.
Know Where You’re Coming From
Bayes’s theorem requires us to state—explicitly—how likely we believe an event is to occur before we begin to weigh the evidence. It calls this estimate a prior belief.
Where should our prior beliefs come from? Ideally, we would like to build on our past experience, or better yet, the collective experience of society. This is one of the helpful roles that markets can play. Markets are certainly not perfect, but the vast majority of the time, collective judgment will be better than ours alone. Markets form a good starting point against which to weigh new evidence, particularly if you have not invested much time in studying a problem.
Of course, markets are not available in every case. It will often be necessary to pick something else as a default. Even common sense can serve as a Bayesian prior, a check against taking the output of a statistical model too credulously. (These models are approximations and often rather crude ones, even if they seem to promise mathematical precision.) Information becomes knowledge only when it’s placed in context. Without it, we have no way to differentiate the signal from the noise, and our search for the truth might be swamped by false positives.
What isn’t acceptable under Bayes’s theorem is to pretend that you don’t have any prior beliefs. You should work to reduce your biases, but to say you have none is a sign that you have many. To state your beliefs up front—to say “Here’s where I’m coming from”12—is a way to operate in good faith and to recognize that you perceive reality through a subjective filter.
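One way to see why stating a prior is honest rather than disqualifying is to run the same stream of evidence through two very different starting beliefs. This sketch reuses the update rule from the earlier example; all the numbers are again invented for illustration.

```python
def update(prior, p_given_true, p_given_false):
    # Bayes's theorem, as in the earlier sketch
    posterior = prior * p_given_true
    return posterior / (posterior + (1 - prior) * p_given_false)

skeptic, believer = 0.05, 0.50  # two openly stated, very different priors

# The same (invented) evidence shown to both observers:
# each item is (P(evidence | true), P(evidence | false))
shared_evidence = [(0.8, 0.2)] * 6

for p_t, p_f in shared_evidence:
    skeptic = update(skeptic, p_t, p_f)
    believer = update(believer, p_t, p_f)

print(f"skeptic: {skeptic:.3f}  believer: {believer:.3f}")
```

Given enough shared evidence, the two estimates converge; the prior shapes where you start, not where you must end up.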
Try, and Err
This is perhaps the easiest Bayesian principle to apply: make a lot of forecasts. You may not want to stake your company or your livelihood on them, especially at first.* But it’s the only way to get better.
Bayes’s theorem says we should update our forecasts any time we are presented with new information. A less literal version of this idea is simply trial and error. Companies that really “get” Big Data, like Google, aren’t spending a lot of time in model land.* They’re running thousands of experiments every year and testing their ideas on real customers.
Bayes’s theorem encourages us to be disciplined about how we weigh new information. If our ideas are worthwhile, we ought to be willing to test them by establishing falsifiable hypotheses and subjecting them to a prediction. Most of the time, we do not appreciate how noisy the data is, and so our bias is to place too much weight on the newest data point. Political reporters often forget that there is a margin of error when polls are reported, and financial reporters don’t always do a good job of conveying how imprecise most economic statistics are. It’s often the outliers that make the news.
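One standard way to formalize this weighting (a textbook normal-normal update, not a method the text itself spells out) is to average the old estimate and the new data point in proportion to their precisions, so that a larger margin of error automatically earns a data point less weight. The polling numbers below are invented.

```python
def combine(prior_mean, prior_sd, obs, obs_sd):
    """Precision-weighted average of an existing estimate and a new,
    noisy observation: the larger the observation's margin of error,
    the less it moves the estimate."""
    w_prior, w_obs = 1 / prior_sd**2, 1 / obs_sd**2
    return (w_prior * prior_mean + w_obs * obs) / (w_prior + w_obs)

# Invented numbers: a long-run polling average of 52 and a new poll at 55
print(combine(52.0, prior_sd=1.0, obs=55.0, obs_sd=3.0))  # noisy poll: ~52.3
print(combine(52.0, prior_sd=3.0, obs=55.0, obs_sd=1.0))  # precise poll: ~54.7
```

Treating every new poll this way, rather than headlining the outlier, is the discipline Bayes’s theorem asks for.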
But we can have the opposite bias when we become too personally or professionally invested in a problem, failing to change our minds when the facts do. If an expert is one of Tetlock’s hedgehogs, he may be too proud to change his forecast when the data is incongruous with his theory of the world. Partisans who expect every idea to fit on a bumper sticker will proceed through the various stages of grief before accepting that they have oversimplified reality.
The more often you are willing to test your ideas, the sooner you can begin to avoid these problems and learn from your mistakes. Staring at the ocean and waiting for a flash of insight is how ideas are generated in the movies. In the real world, they rarely come when you are standing in place.13 Nor do the “big” ideas necessarily start out that way. It’s more often with small, incremental, and sometimes even accidental steps that we make progress.
Our Perceptions of Predictability
Prediction is difficult for us for the same reason that it is so important: it is where objective and subjective reality intersect. Distinguishing the signal from the noise requires both scientific knowledge and self-knowledge: the serenity to accept the things we cannot predict, the courage to predict the things we can, and the wisdom to know the difference.14
Our views on how predictable the world is have waxed and waned over the years. One simple measure of it is the number of times the words “predictable” and “unpredictable” are used in academic journals.15 At the dawn of the twentieth century, the two words were used almost exactly as often as one another. The Great Depression and the Second World War catapulted “unpredictable” into the dominant position. As the world healed from these crises, “predictable” came back into fashion, its usage peaking in the 1970s. “Unpredictable” has been on the rise again in recent years.
FIGURE C-2: THE PERCEPTION OF PREDICTABILITY, 1900–2012
These perceptions about predictability are more affected by the fashions of the sciences16 and the shortness of our memories—has anything really bad happened recently?—than by any real change in our forecasting skills. How good we think we are at prediction and how good we really are may even be inversely correlated. The 1950s, when the world was still shaken by the war and was seen as fairly unpredictable, was a time of more economic17 and scientific18 productivity than the 1970s, the decade when we thought we could predict everything, but couldn’t.
These shifting attitudes have reverberated far beyond academic journals. If you drew the same chart based on the use of the words “predictable” and “unpredictable” in English-language fiction, it would look almost exactly the same as in figure C-2.19 An unpredicted disaster, even if it has no direct effect on us, shakes our confidence that we are in control of our fate.
But our bias is to think we are better at prediction than we really are. The first twelve years of the new millennium have been rough, with one unpredicted disaster after another. May we arise from the ashes of these events beaten but not bowed, a little more modest about our forecasting abilities, and a little less likely to repeat our mistakes.
ACKNOWLEDGMENTS
As the author Joseph Epstein has noted, it is a lot better to have written a book than to actually be writing one. Writing a book requires a tremendous amount of patience, organization, and discipline, qualities that I lack and that writing a blog does not very much encourage.
I was therefore highly dependent on many others who had those qualities in greater measure, and whose wisdom helped to shape the book in many large and small ways.
Thank you to my parents, Brian David Silver and Sally Thrun Silver, to whom this book is dedicated, and to my sister, Rebecca Silver.
Thank you to Virginia Smith for being a terrific editor in all respects. She, along with Laura Stickney, Ann Godoff, and Scott Moyers, believed in the vision of the book. They made few compromises in producing a book that fulfilled that vision and yet tolerated many excuses when I needed more time to get it there.
Thank you to my literary agent, Sydelle Kramer, for helping me to conceive of and sell the project. Her advice was invariably the right kind: gentle enough, but never too gentle, on the many occasions when the book seemed at risk of running off the rails.
Thank you to my research assistant, Arikia Millikan, who provided boundless enthusiasm for the book, and whose influence is reflected in its keen interest in science and technology. Thank you to Julia Kamin, whose organizational skills helped point the way forward when the book was at a critical stage. Thank you to Jane Cavolina and Ellen Cavolina Porter, who produced high-quality transcriptions on a demanding schedule.
Thank you to Emily Votruba, Veronica Windholz, Kaitlyn Flynn, Amanda Dewey, and John Sharp for turning the book around against an extremely tight production schedule, and for their understanding that “today” usually meant “tonight” and that “tonight” usually meant “5 in the morning.”
Thank you to Robert Gauldin for his love and support. Thank you to Shashank Patel, Kim Balin, Bryan Joiner, Katie Halper, Jason MacLean, Maryam Saleh, and Jessica Klein for tolerating my rambling on about the book for hours at a time on the one hand or going into hiding for weeks at a time on the other.
Thank you to Micah Cohen at the New York Times, who assisted with this book in more ways than I can count.
Thank you to my bosses and colleagues at the New York Times, especially Megan Liberman, Jim Roberts, David Leonhardt, Lisa Tozzi, Gerry Mullany, Rick Berke, Dick Stevenson, Derek Willis, Matt Ericson, Greg Veis, and Hugo Lindgren, who trusted me to manage the demands of the book production cycle along with those of the news cycle. Thank you to Bill Keller, Gerry Marzorati, and Jill Abramson for bringing me into the New York Times family.
Thank you to John Sides, Andrew Gelman, Tom Schaller, Ed Kilgore, Renard Sexton, Brian McCabe, Hale Stewart, and Sean Quinn for their contributions to the FiveThirtyEight blog.
Thank you to Richard Thaler and Anil Kashyap, of the University of Chicago, for reviewing the chapters related to economics and finance. Thank you to David Carr, Kathy Gauldin, and Page Ashley for reminding me of the importance of finishing the book, and to Will Repko for helping to instill that work ethic that might get it there.
Thank you to Gary Huckabay, Brandon Adams, Rafe Furst, Kevin Goldstein, Keith Urbahn, Matthew Vogel, Rachel Hauser, Jennifer Bloch, Thom Shanker, Kyu-Young Lee, and Mark Goldstein for serving as connectors and facilitators at key points along the way.
Many people were polled on the title of this book. Thank you to Jonah Peretti, Andrea Harner, Kyle Roth, Jessi Pervola, Ruth Welte, Brent Silver, Richard Silver, Amanda Silver, Roie Lindegren, Len Lindegren, Zuben Jelveh, Douglas Jester, Justin Wolfers, J. Stephen Steppard, Robert Erikson, Katie Donalek, Helen Lee, Katha Pollitt, Jeffrey Toobin, David Roberts, Felix Salmon, Hillary Bok, Heather Hurlburt, Art Goldhammer, David Karol, Sara Robinson, Max Sawicky, Michael O’Hare, Marc Tracy, Daniel Davies, E. J. Graff, Paul Starr, Russ Wellen, Jeffrey Hauser, Dana Goldstein, Suzy Khimm, Jonathan Zasloff, Avi Zenilman, James Galbraith, Greg Anrig, Paul Waldman, and Bob Kuttner for providing their advice.
This book is fairly scrupulous about citing the origin of its ideas, but some people I interviewed were more influential in determining its direction than might be inferred by the number of times that they appear in the text. This list includes Daniel Kahneman, Vasik Rajlich, Dr. Alexander “Sandy” McDonald, Roger Pielke Jr., John Rundle, Thomas Jordan, Irene Eckstrand, Phil Gordon, Chris Volinsky, Robert Bell, Tim Berners-Lee, Lisa Randall, Jay Rosen, Simon Jackman, Diane Lauderdale, Jeffrey Sachs, Howard Lederer, Rodney Brooks, Henry Abbott, and Bruce Bueno de Mesquita, among others.