The People Who Appear in this Chapter
Bayes, Thomas.
1702-1761 - English statistician and minister. The theorem which bears his name is a critical part of probability mathematics. He never published it himself, and so we have to thank his friend Richard Price for not losing this work to history.
Atkins, Vera.
1908-2000 - Romanian-British intelligence officer. She received high honors from both the British and French governments for her work for the Allies during World War II.
Probabilistic Thinking
Probabilistic thinking is essentially trying to estimate, using some tools of math and logic, the likelihood of any specific outcome coming to pass. It is one of the best tools we have to improve the accuracy of our decisions. In a world where each moment is determined by an infinitely complex set of factors, probabilistic thinking helps us identify the most likely outcomes. When we know these, our decisions can be more precise and effective.
Are you going to get hit by lightning or not?
Why we need the concept of probabilities at all is worth thinking about. Things either are or are not, right? We either will get hit by lightning today or we won’t. The problem is, we just don’t know until we live out the day. Which doesn’t help us at all when we make our decisions in the morning. The future is far from determined and we can better navigate it by understanding the likelihood of events that could impact us.
Our lack of perfect information about the world gives rise to all of probability theory, and its usefulness. We know now that the future is inherently unpredictable because not all variables can be known and even the smallest error imaginable in our data very quickly throws off our predictions. The best we can do is estimate the future by generating realistic, useful probabilities. So how do we do that?
Probability is everywhere, down to the very bones of the world. The probabilistic machinery in our minds—the cut-to-the-quick heuristics made so famous by the psychologists Daniel Kahneman and Amos Tversky—was evolved by the human species in a time before computers, factories, traffic, middle managers, and the stock market. It served us in a time when human life was about survival, and still serves us well in that capacity.2
But what about today—a time when, for most of us, survival is not so much the issue? We want to thrive. We want to compete, and win. Mostly, we want to make good decisions in complex social systems that were not part of the world in which our brains evolved their (quite rational) heuristics.
For this, we need to consciously add in a needed layer of probability awareness. What is it and how can I use it to my advantage?
There are three important aspects of probability that we need to explain so you can integrate them into your thinking to get into the ballpark and improve your chances of catching the ball:
Bayesian thinking
Fat-tailed curves
Asymmetries
Thomas Bayes and Bayesian thinking: Bayes was an English minister in the first half of the 18th century whose most famous work, “An Essay towards solving a Problem in the Doctrine of Chances”, was brought to the attention of the Royal Society by his friend Richard Price in 1763, two years after his death. The essay concerned how we should adjust probabilities when we encounter new data, and provided the seeds for the great mathematician Pierre-Simon Laplace to develop what we now call Bayes’s Theorem.
The core of Bayesian thinking (or Bayesian updating, as it can be called) is this: given that we have limited but useful information about the world, and are constantly encountering new information, we should probably take into account what we already know when we learn something new. As much of it as possible. Bayesian thinking allows us to use all relevant prior information in making decisions. Statisticians might call it a base rate, taking in outside information about past situations like the one you’re in.
Consider the headline “Violent Stabbings on the Rise.” Without Bayesian thinking, you might become genuinely afraid because your chances of being a victim of assault or murder are higher than they were a few months ago. But a Bayesian approach will have you putting this information into the context of what you already know about violent crime.
You know that violent crime has been declining to its lowest rates in decades. Your city is safer now than it has been since this measurement started. Let’s say your chance of being a victim of a stabbing last year was one in 10,000, or 0.01%. The article states, with accuracy, that violent crime has doubled. Your chance is now two in 10,000, or 0.02%. Is that worth being terribly worried about? The prior information here is key. When we factor it in, we realize that our safety has not really been compromised.
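The arithmetic behind that headline can be sketched in a few lines. This is only an illustration using the figures from the example above; the function name and the idea of reporting the "chance unaffected" are our own additions.

```python
def describe_risk_change(prior_rate: float, multiplier: float) -> dict:
    """Compare a risk before and after it is multiplied by some factor."""
    new_rate = prior_rate * multiplier
    return {
        "prior_rate": prior_rate,
        "new_rate": new_rate,
        # How the headline frames it: the risk relative to what it was.
        "relative_increase": multiplier - 1.0,
        # How it affects you: the extra probability in absolute terms.
        "absolute_increase": new_rate - prior_rate,
        "chance_unaffected": 1.0 - new_rate,
    }


# Figures from the stabbing example: one in 10,000, then doubled.
change = describe_risk_change(prior_rate=1 / 10_000, multiplier=2.0)
print(f"Relative increase: {change['relative_increase']:.0%}")
print(f"Absolute increase: {change['absolute_increase']:.2%}")
print(f"Chance of not being a victim: {change['chance_unaffected']:.2%}")
```

A doubling sounds alarming in relative terms, yet the absolute change is a hundredth of a percentage point. Which of the two numbers you look at depends entirely on whether you bring the prior with you.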
Conversely, if we look at the diabetes statistics in the United States, our application of prior knowledge would lead us to a different conclusion. Here, a Bayesian analysis indicates you should be concerned. In 1958, 0.93% of the population was diagnosed with diabetes. In 2015 it was 7.4%. When you look at the intervening years, the climb in diabetes diagnoses is steady, not a spike. So the prior relevant data, or priors, indicate a trend that is worrisome.
It is important to remember that priors themselves are probability estimates. For each bit of prior knowledge, you are not putting it in a binary structure, saying it is true or not. You’re assigning it a probability of being true. Therefore, you can’t let your priors get in the way of processing new knowledge. In Bayesian terms, how strongly a new piece of information should shift a prior is measured by the likelihood ratio, or Bayes factor. Any new information you encounter that challenges a prior simply means that the probability of that prior being true may be reduced. Eventually some priors are replaced completely. This is an ongoing cycle of challenging and validating what you believe you know. When making uncertain decisions, it’s nearly always a mistake not to ask: What are the relevant priors? What might I already know that I can use to better understand the reality of the situation? — Sidebar: Conditional Probability
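One way to picture this cycle of updating is the odds form of Bayes's Theorem: convert the prior to odds, multiply by the likelihood ratio of the new evidence, and convert back. The sketch below is a minimal illustration, not a recipe. The starting belief of 80% and the likelihood ratio of 0.5 are hypothetical numbers chosen for the example.

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Return the updated probability of a belief after one piece of evidence.

    likelihood_ratio is P(evidence | belief true) / P(evidence | belief false).
    A ratio below 1 counts against the belief; a ratio above 1 supports it.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical: we start 80% confident, then meet three pieces of evidence,
# each twice as likely to appear if the belief is false as if it is true.
belief = 0.80
history = [belief]
for ratio in (0.5, 0.5, 0.5):
    belief = bayes_update(belief, ratio)
    history.append(belief)

print(" -> ".join(f"{p:.2f}" for p in history))
```

No single piece of evidence overturns the prior here, but each one lowers it, and after enough of them the belief is held with much less confidence than before.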
Conditional Probability
Conditional probability is similar to Bayesian thinking in practice, but comes at it from a different angle. When you use historical events to predict the future, you have to be mindful of the conditions that surrounded those events.
Events can be independent, like tossing a coin, or dependent. In the latter case, it means the outcomes of an event are conditional on what preceded them. Let’s say the last three times I’ve hung out with you and we’ve gone for ice cream, I’ve picked vanilla. Do you conclude that vanilla is my favorite, and thus I will always choose it? You want to check first if my choosing vanilla is independent or dependent. Am I the first to choose from 100 flavors? Or am I further down the line, when chocolate is no longer available?
My ice cream choice is independent if all the flavors are available each time someone in my group makes a choice. It is dependent if the preceding choices of my friends reduce what is available to me. In this case, the probability of my choosing vanilla is conditional on what is left after my friends make their choices.
Thus, using conditional probability means being very careful to observe the conditions preceding an event you’d like to understand.
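The ice cream question can be made concrete with a small tally. The sketch below uses an invented log of eight outings (the data are hypothetical, not from the text) and compares how often vanilla is chosen overall with how often it is chosen when chocolate is, or is not, still available.

```python
from fractions import Fraction

# Hypothetical log of outings: (chocolate_available, chose_vanilla)
outings = [
    (True, False), (True, False), (True, True), (True, False),
    (False, True), (False, True), (False, True), (False, False),
]


def conditional_probability(records, condition):
    """P(chose vanilla | chocolate_available == condition), from raw counts."""
    matching = [chose for available, chose in records if available == condition]
    if not matching:
        raise ValueError("no observations for this condition")
    return Fraction(sum(matching), len(matching))


p_overall = Fraction(sum(chose for _, chose in outings), len(outings))
p_given_chocolate = conditional_probability(outings, True)
p_given_no_chocolate = conditional_probability(outings, False)

print(f"P(vanilla)                       = {p_overall}")
print(f"P(vanilla | chocolate available) = {p_given_chocolate}")
print(f"P(vanilla | chocolate gone)      = {p_given_no_chocolate}")
```

In this made-up log, vanilla looks like an even bet overall, but that figure hides two very different situations. Once the condition is taken into account, vanilla turns out to be mostly a fallback.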
Now we need to look at fat-tailed curves: Many of us are familiar with the bell curve, that nice, symmetrical wave that captures the relative frequency of so many things from height to exam scores. The bell curve is great because it’s easy to understand and easy to use. Its technical name is “normal distribution.” If we know we are in a bell curve situation, we can quickly identify our parameters and plan for the most likely outcomes.
Fat-tailed curves are different. Take a look.
At first glance they seem similar enough. Common outcomes cluster together, creating a wave. The difference is in the tails. In a bell curve the extremes are predictable. There can only be so much deviation from the mean. In a fat-tailed curve there is no real cap on extreme events.
The more extreme events that are possible, the longer the tails of the curve get. Any one extreme event is still unlikely, but the sheer number of options means that we can’t rely on the most common outcomes as representing the average. The more extreme events that are possible, the higher the probability that one of them will occur. Crazy things are definitely going to happen, and we have no way of identifying when. — Sidebar: Orders of Magnitude
Think of it this way. In a bell curve type of situation, like displaying the distribution of height or weight in a human population, there are outliers on the spectrum of possibility, but the outliers have a fairly well-defined scope. You’ll never meet a man who is ten times the size of an average man. But in a curve with fat tails, like wealth, the central tendency does not work the same way. You may regularly meet people who are ten, 100, or 10,000 times wealthier than the average person. That is a very different type of world.
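To see how different the two worlds are, compare how quickly the tails die off. The sketch below is only an illustration. It assumes heights follow a normal distribution with a mean of 175 cm and a standard deviation of 7 cm, and that wealth follows a Pareto distribution with a shape of 1.16 (the value often associated with the 80/20 rule). None of these figures come from the text; they are placeholders chosen to make the contrast visible.

```python
import math


def normal_tail(x: float, mean: float, sd: float) -> float:
    """P(X > x) for a normal (bell curve) distribution."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def pareto_tail(x: float, x_min: float, alpha: float) -> float:
    """P(X > x) for a Pareto (fat-tailed) distribution."""
    if x <= x_min:
        return 1.0
    return (x_min / x) ** alpha


# Assumed bell curve for height, in centimeters.
mean_height, sd_height = 175.0, 7.0
p_twice_as_tall = normal_tail(2 * mean_height, mean_height, sd_height)

# Assumed fat-tailed curve for wealth, in arbitrary units.
x_min, alpha = 1.0, 1.16
mean_wealth = alpha * x_min / (alpha - 1)  # mean of a Pareto when alpha > 1
p_10x_wealth = pareto_tail(10 * mean_wealth, x_min, alpha)
p_100x_wealth = pareto_tail(100 * mean_wealth, x_min, alpha)

print(f"P(twice the average height): {p_twice_as_tall:.3g}")
print(f"P(10x the average wealth):   {p_10x_wealth:.3g}")
print(f"P(100x the average wealth):  {p_100x_wealth:.3g}")
```

Under these assumptions, a person twice the average height is so improbable as to be effectively impossible, while people ten or even 100 times wealthier than average remain a small but real fraction of the population.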
Let’s re-approach the example of the risks of violence we discussed in relation to Bayesian thinking. Suppose you hear that you have a greater risk of slipping on the stairs and cracking your head open than of being killed by a terrorist. The statistics, the priors, seem to back it up: 1,000 people slipped on the stairs and died last year in your country and only 500 died of terrorism. Should you be more worried about stairs or terror events?
_
Always be extra mindful of the tails:
They might mean everything.
Some use examples like these to prove that terror risk is low—since the recent past shows very few deaths, why worry?3 The problem is in the fat tails: The risk of terror violence is more like wealth, while stair-slipping deaths are more like height and weight. In the next ten years, how many events are possible? How fat is the tail?
The important thing is not to sit down and imagine every possible scenario in the tail (by definition, it is impossible) but to deal with fat-tailed domains in the correct way: by positioning ourselves to survive or even benefit from the wildly unpredictable future, by being the only ones thinking correctly and planning for a world we don’t fully understand.
— Sidebar: Anti-fragility
Asymmetries: Finally, you need to think about something we might call “metaprobability”—the probability that your probability estimates themselves are any good.
This massively misunderstood concept has to do with asymmetries. If you look at nicely polished stock pitches made by professional investors, nearly every time an idea is presented, the investor looks their audience in the eye and states they think they’re going to achieve a rate of return of 20% to 40% per annum, if not higher. Yet exceedingly few of them ever attain that mark, and it’s not because they don’t have any winners. It’s because they get so many so wrong. They are consistently overconfident in their probabilistic estimates. (For reference, the general stock market has returned no more than 7% to 8% per annum in the United States over a long period, before fees.)
Orders of Magnitude
Nassim Taleb puts his finger in the right place when he points out our naive use of probabilities. In The Black Swan, he argues that any small error in measuring the risk of an extreme event can mean we’re not just slightly off, but way off: off by orders of magnitude, in fact. In other words, not just 10% wrong but ten times wrong, or 100 times wrong, or 1,000 times wrong. Something we thought could only happen every 1,000 years might be likely to happen in any given year! Relying on false prior information in this way leads us to underestimate the probability that the future distribution will differ from the past.
_
Taleb, Nassim. The Black Swan: The Impact of the Highly Improbable, 2nd edition. New York: Random House, 2010.
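A rough way to see how a small measurement error turns into an orders-of-magnitude mistake is to misjudge the spread of a distribution slightly and then ask about an extreme event. The sketch below is our own illustration, not Taleb's calculation. It assumes a simple bell curve, an event six units from the mean, and a true spread that is 20% wider than the one we measured.

```python
import math


def tail_probability(threshold: float, sd: float) -> float:
    """P(X > threshold) for a zero-mean normal distribution with spread sd."""
    return 0.5 * math.erfc(threshold / (sd * math.sqrt(2)))


threshold = 6.0  # the extreme event we are trying to price (assumed)

assumed = tail_probability(threshold, sd=1.0)  # what we think the risk is
actual = tail_probability(threshold, sd=1.2)   # risk if the spread is 20% wider
error_factor = actual / assumed

print(f"Estimated probability: {assumed:.3g}")
print(f"Actual probability:    {actual:.3g}")
print(f"We were off by a factor of about {error_factor:.0f}")
```

A 20% error in the input produces an error of a few hundred times in the output. And this is still inside the well-behaved world of the bell curve; in a fat-tailed domain the mismatch can be worse.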
Anti-fragility
How do we benefit from the uncertainty of a world we don’t understand, one dominated by “fat tails”? The answer to this was provided by Nassim Taleb in a book curiously titled Antifragile.
Here is the core of the idea. We can think about three categories of objects: Ones that are harmed by volatility and unpredictability, ones that are neutral to volatility and unpredictability, and finally, ones that benefit from it. The latter category is antifragile—like a package that wants to be mishandled. Up to a point, certain things benefit from volatility, and that’s how we want to be. Why? Because the world is fundamentally unpredictable and volatile, and large events—panics, crashes, wars, bubbles, and so on—tend to have a disproportionate impact on outcomes.
There are two ways to handle such a world: try to predict, or try to prepare. Prediction is tempting. For all of human history, seers and soothsayers have turned a comfortable trade. The problem is that nearly all studies of “expert” predictions in such complex real-world realms as the stock market, geopolitics, and global finance have proven again and again that, for the rare and impactful events in our world, predicting is impossible! It’s more efficient to prepare.
What are some ways we can prepare—arm ourselves with antifragility—so we can benefit from the volatility of the world?
The first one is what Wall Street traders would call “upside optionality”, that is, seeking out situations that we expect to have good odds of offering us opportunities. Take the example of attending a cocktail party where a lot of people you might like to know are in attendance. While nothing is guaranteed to happen (you may not meet those people, and if you do, it may not go well), you give yourself the benefit of serendipity and randomness. The worst thing that can happen is...nothing. One thing you know for sure is that you’ll never meet them sitting at home. By going to the party, you improve your odds of encountering opportunity.
The second thing we can do is to learn how to fail properly. Failing properly has two major components. First, never take a risk that will do you in completely. (Never get taken out of the game completely.) Second, develop the personal resilience to learn from your failures and start again. With these two rules, you can only fail temporarily.
No one likes to fail. It hurts. But failure carries with it one huge antifragile gift: learning. Those who are not afraid to fail (properly) have a huge advantage over the rest. What they learn makes them less vulnerable to the volatility of the world. They benefit from it, in true antifragile fashion.
Let’s say you’d like to start a successful business, but you have no business experience. Do you attend business school or start a business that might fail? Business school has its benefits, but business itself—the rough, jagged real-world experience of it—teaches through rapid feedback loops of success and failure. In other words, trial and error carries the precious commodity of information.
The antifragile mindset is a unique one. Whenever possible, try to create scenarios where randomness and uncertainty are your friends, not your enemies.
_
Taleb, Nassim. Antifragile. New York: Random House, 2012.
_
The SOE’s primary goal in France was to coordinate and initiate sabotage and other subversive activities against the Germans.
Another common asymmetry is people’s ability to estimate the effect of traffic on travel time. How often do you leave “on time” and arrive 20% early? Almost never? How often do you leave “on time” and arrive 20% late? All the time? Exactly. Your estimation errors are asymmetric, skewing in a single direction. This is often the case with probabilistic decision-making.
Far more probability estimates are wrong on the “over-optimistic” side than the “under-optimistic” side. You’ll rarely read about an investor who aimed for 25% annual return rates who subsequently earned 40% over a long period of time. You can throw a dart at the Wall Street Journal and hit the names of lots of investors who aim for 25% per annum with each investment and end up closer to 10%.
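The gap between aiming for 25% and landing at 10% is larger than it sounds, because returns compound. The sketch below works through the arithmetic. The 20-year horizon and the starting amount of 1 are assumptions made for the illustration, not figures from the text.

```python
def compound(principal: float, annual_return: float, years: int) -> float:
    """Value of an investment after compounding once a year."""
    return principal * (1.0 + annual_return) ** years


years = 20  # assumed horizon

hoped_for = compound(1.0, 0.25, years)  # the pitch: 25% per annum
realized = compound(1.0, 0.10, years)   # closer to what actually happens

print(f"Growth at 25% for {years} years: {hoped_for:.1f}x")
print(f"Growth at 10% for {years} years: {realized:.1f}x")
print(f"The estimate overshoots by a factor of {hoped_for / realized:.1f}")
```

An error that looks like a modest 15 percentage points a year becomes, over two decades, an outcome more than ten times smaller than the one that was promised.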
The spy world
Successful spies are very good at probabilistic thinking. High-stakes survival situations tend to make us evaluate our environment with as little bias as possible.
When Vera Atkins was second in command of the French unit of the Special Operations Executive (SOE), a British intelligence organization reporting directly to Winston Churchill during World War II4, she had to make hundreds of decisions by figuring out the probable accuracy of inherently unreliable information.
Atkins was responsible for the recruitment and deployment of British agents into occupied France. She had to decide who could do the job, and where the best sources of intelligence were. These were literal life-and-death decisions, and all were based in probabilistic thinking.
First, how do you choose a spy? Not everyone can go undercover in high-stress situations and make the contacts necessary to gather intelligence. The result of failure in France in WWII was not getting fired; it was death. What factors of personality and experience show that a person is right for the job? Even today, with advancements in psychology, interrogation, and polygraphs, it’s still a judgment call.
_
Many of the British Intelligence services worked with the French Resistance in WWII. It was a win-win. Expert knowledge of the territory for the British, weapons and financial support for the Resistance.
For Vera Atkins in the 1940s, it was very much a process of assigning weight to the various factors and coming up with a probabilistic assessment of who had a decent chance of success. Who spoke French? Who had the confidence? Who was too tied to family? Who had the problem-solving capabilities? From recruitment to deployment, her development of each spy was a series of continually updated, educated estimates.
Getting an intelligence officer ready to go is only half the battle. Where do you send them? If your information was so great that you knew exactly where to go, you probably wouldn’t need an intelligence mission. Choosing a target is another exercise in probabilistic thinking. You need to evaluate the reliability of the information you have and the networks you have set up. Intelligence is not evidence. There is no chain of command or guarantee of authenticity.
The Great Mental Models Page 9