The expert witness who testified in both cases was a pediatrician. His theory was that both mothers suffered from Munchausen syndrome by proxy, a form of child abuse in which a parent subjects a child to unnecessary medical procedures. More convincing to the juries than this theory, however, was a probability presented to them during the pediatrician’s testimony. The pediatrician testified that the odds against two babies dying of cot death (the term used in Britain for sudden infant death syndrome) in the same family were 73 million to 1. This figure greatly impressed the juries because it made the probability of these deaths happening by chance seem vanishingly small. But the pediatrician had missed a basic rule of applied probability that operates in cases like this. To get his 73 million to 1 figure, he had simply squared the probability of a single cot death. But squaring is the correct calculation only under the assumption that the two deaths were independent events. That assumption is likely false in the case of sudden infant death syndrome, where genetic and environmental factors have been identified that increase the probability of such deaths recurring in the same family.
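To see the arithmetic of the error concretely, here is a minimal sketch in Python. The single-death probability is simply back-calculated from the pediatrician’s own 73 million figure, and the tenfold relative risk for a second death is a purely hypothetical value chosen to illustrate the direction of the effect, not an estimate from the medical literature.

```python
# A minimal sketch of the pediatrician's error. The single-death probability is
# back-calculated from his own 73-million-to-1 figure; the "relative risk" below
# is an arbitrary illustrative value, not a real epidemiological estimate.

p_single = (1 / 73_000_000) ** 0.5      # roughly 1 in 8,544

# The pediatrician's calculation: treat the two deaths as independent events.
p_both_independent = p_single * p_single
print(f"Assuming independence: 1 in {1 / p_both_independent:,.0f}")   # ~1 in 73,000,000

# If shared genetic or environmental factors made a second death, say, 10 times
# more likely once a first has occurred (hypothetical factor), the correct
# multiplication rule uses the conditional probability of the second death.
relative_risk_second = 10
p_both_dependent = p_single * (p_single * relative_risk_second)
print(f"Allowing for dependence: 1 in {1 / p_both_dependent:,.0f}")   # ~1 in 7,300,000
```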
Shortly after the conviction of Mrs. Clark, the British Medical Journal published an essay titled “Conviction by Mathematical Error?” pointing out the errors in the logic of probability in the pediatrician’s trial testimony. In one sense, the error in probabilistic reasoning is trivial. Once it is pointed out to most people, they understand it. That the “square the probabilities” rule requires that the events be independent is stressed by every introductory statistics instructor. But in another sense, the problem seems larger. The mindware of basic probability theory is very inadequately distributed. The pediatrician had failed to learn it, and it was also probably unknown to the judge and jury. Most people leave high school without knowing the multiplication rule of probability, and only a subset of university students takes courses in which it is taught. Intelligence tests do not assess it. Research in cognitive psychology has shown that our natural thinking tendencies (what the cognitive miser relies on) will not yield the right estimates when processing probabilistic information of this type.3 Many important rules of probability theory are not in the stored mindware of most people because they have not been learned through formal instruction. In short, the lack of knowledge of probability theory is a mindware gap that is widespread and thus the source of much irrational thought and action.
In these two examples (facilitated communication and the convictions due to improper use of probabilities), I have illustrated how missing mindware can lead to irrational decisions and action. The two classes of missing mindware that I have illustrated here—rules of scientific thinking and probabilistic thinking, respectively—were chosen deliberately because they account for much irrational thinking. The presence or absence of this mindware determines whether people are rational or not. It is mindware that is often missing even among people of high intelligence (due to lack of exposure or instruction) and thus is a cause of dysrationalia. That is, because the tests do not assess probabilistic reasoning, many people who are deemed highly intelligent might still be plagued by irrational probabilistic judgments. Although many IQ tests do examine whether people have acquired certain types of factual information (for example, vocabulary), the mindware of scientific thinking and probability is not assessed by the tests. If it were, some people would be deemed more intelligent than they are by current tests and other people less so.
The Reverend Thomas Bayes to the Rescue!
The scientific thinking principle illustrated in the facilitated communication example—the necessity of considering alternative hypotheses—has enormous generality. The most basic form of this reasoning strategy—one that might be termed “think of the opposite”—is in fact mindware that can be used in a variety of problems in daily life. Imagine that there is an intriguing-looking restaurant in your neighborhood that you have never visited. One thing that has kept you away is that several of your discerning friends have said that they have eaten there and that it is not very good. Rightly or wrongly (your friends may be unrepresentative—and you may be overly influenced by the vividness of their testimony) you (implicitly) put the probability that the restaurant is any good at .50—that is, 50 percent. Later that month you are at the hairdresser getting a cut, and the proprietor of the restaurant happens to be there as well. The proprietor, recognizing you from the neighborhood, asks you why you have never been in the restaurant. You make up a lame excuse. Perhaps detecting some reluctance on your part, the proprietor says, “Come on, what’s the matter? Ninety-five percent of my customers never complain!”
Does this put you at ease? Does this make you want to go there? Is this evidence that the restaurant is good?
The answer to all of these questions is of course a resounding no. In fact, the proprietor’s statement has made you, if anything, even more hesitant to go there. It certainly hasn’t made you want to raise your implicit probability that it is any good above 50/50. What is wrong with the proprietor’s reasoning? Why is the proprietor wrong in viewing his or her statement as evidence that the restaurant is good and that you should go to it?
The formal answer to this question can be worked out using a theorem discovered by the Reverend Thomas Bayes of Tunbridge Wells, England, in the eighteenth century.4 Bayes’ formula is written in terms of just two fundamental concepts: the focal hypothesis under investigation (labeled H) and a set of data that are collected relevant to the hypothesis (labeled D). In the formula I will show you below, you will see an additional symbol, ∼H (not H). This simply refers to the alternative hypothesis: the mutually exclusive alternative that must be correct if the focal hypothesis, H, is false. Thus, by convention, the probability of the alternative hypothesis, ∼H, is one minus the probability of the focal hypothesis, H. For example, if I think the probability that the fish at the end of my line is a trout is .60, then that is the equivalent of saying that the probability that the fish at the end of my line is not a trout is .40.
Here I should stop and say that this is the most mathematical and technical part of this book. However, it is not the math but the concepts that are important, and they should be clear throughout the discussion even if you are math-phobic and wish to ignore the numbers and formulas. This is a key point. You need not learn anything more than a way of thinking—some verbal rules—in order to be a Bayesian thinker. Formal Bayesian statistics involve calculation to be sure, but to escape the thinking errors surrounding probability you only need to have learned the conceptual logic of how correct thinking about probabilities works.
So, in the formula to come, P(H) is the probability estimate that the focal hypothesis is true prior to collecting the data, and P(∼H) is the probability estimate that the alternative hypothesis is true prior to collecting the data. Additionally, a number of conditional probabilities come into play. For example, P(H/D) represents the probability that the focal hypothesis is true after the data pattern has actually been observed, and P(∼H/D) represents the complement of this—the posterior probability of the alternative hypothesis, given the data observed. P(D/H) is the probability of observing that particular data pattern given that the focal hypothesis is true, and P(D/∼H)—as we shall see below, a very important quantity—is the probability of observing that particular data pattern given that the alternative hypothesis is true. It is important to realize that P(D/H) and P(D/∼H) are not complements (they do not add to 1.0). The data might be likely given both the focal and alternative hypotheses or unlikely given both the focal and alternative hypotheses.
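If it helps to see this last point with numbers, here is a minimal sketch with made-up values: the two conditional probabilities of the data can both be high, while the two posterior probabilities must still sum to one.

```python
# A toy illustration (hypothetical numbers) of the point above: P(D/H) and
# P(D/~H) are separate conditional probabilities and need not sum to 1.
p_data_given_h     = 0.90   # data quite likely if the focal hypothesis is true
p_data_given_not_h = 0.80   # ...and also quite likely if the alternative is true

print(p_data_given_h + p_data_given_not_h)        # 1.7 -- not complements

# By contrast, the posterior probabilities of H and ~H given the same data
# do sum to 1 (here computed with a 50/50 prior via Bayes' rule).
prior_h = 0.5
p_d = p_data_given_h * prior_h + p_data_given_not_h * (1 - prior_h)
p_h_given_d = p_data_given_h * prior_h / p_d
p_not_h_given_d = p_data_given_not_h * (1 - prior_h) / p_d
print(p_h_given_d + p_not_h_given_d)              # 1.0
```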
We will focus here on the most theoretically transparent form of Bayes’ formula—one which is written in so-called odds form:

P(H/D)/P(∼H/D) = [P(D/H)/P(D/∼H)] × [P(H)/P(∼H)]

In this ratio, or odds, form, the three ratio terms, from left to right, represent: the posterior odds favoring the focal hypothesis (H) after receipt of the new data (D); the so-called likelihood ratio (LR), composed of the probability of the data given the focal hypothesis divided by the probability of the data given the alternative hypothesis; and the prior odds favoring the focal hypothesis. Specifically:
posterior odds = P(H/D)/P(∼H/D)
likelihood ratio = P(D/H)/P(D/∼H)
prior odds = P(H)/P(∼H)
The formula tells us that the odds favoring the focal hypothesis (H) after receipt of the data are arrived at by multiplying together the other two terms—the likelihood ratio and the prior odds favoring the focal hypothesis:
posterior odds favoring the focal hypothesis = LR × prior odds
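For readers who like to see the bookkeeping spelled out, here is a minimal sketch in Python of the updating rule just described. The function names are mine and the numbers are hypothetical, but nothing in it goes beyond the formula above.

```python
def posterior_odds(prior_h, p_d_given_h, p_d_given_not_h):
    """Bayes' rule in odds form: posterior odds = likelihood ratio x prior odds."""
    prior_odds = prior_h / (1 - prior_h)
    likelihood_ratio = p_d_given_h / p_d_given_not_h
    return likelihood_ratio * prior_odds

def odds_to_probability(odds):
    """Convert odds favoring H back into a probability of H."""
    return odds / (1 + odds)

# Hypothetical example: prior P(H) = .60, data twice as likely under H as under ~H.
odds = posterior_odds(prior_h=0.60, p_d_given_h=0.80, p_d_given_not_h=0.40)
print(odds)                       # 3.0 -- posterior odds of 3 to 1 favoring H
print(odds_to_probability(odds))  # 0.75 -- posterior probability of H
```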
It is very important to understand, though, that no one is saying that people are irrational if they do not know Bayes’ rule. No one is expected to know the formula. Instead, the issue is whether people’s natural judgments of probabilities follow—to an order of approximation—the dictates of the theorem. It is understood that people making probabilistic judgments are making spontaneous “guesstimates”—the experimental evidence concerns whether these spontaneous judgments capture some of the restrictions that Bayes’ theorem puts on probabilities. When we fall to the ground, our body can be described as behaving according to a law of Newton’s. We do not consciously calculate Newton’s law as our falling behavior is taking place—but we can in fact be described as if we were adhering to that law. The analogous question here is whether people’s judgments can be described as adhering to the model of rational reasoning provided by Bayes’ rule. The probability judgments of people might be described as consistent with Bayes’ rule without their having any knowledge of the formula or being aware of any conscious calculation.
There are several ways in which reasoning has been found to deviate from the prescriptions of Bayes’ rule, but in this section I concentrate on just one:5
Often, when evaluating the diagnosticity of evidence, [P(D/H)/P(D/∼H)], people fail to appreciate the relevance of the denominator term [P(D/∼H)]. They fail to see the necessity of evaluating the probability of obtaining the data observed if the focal hypothesis were false.
This is the formal reason why failing to “think of the opposite” leads to serious reasoning errors. Let’s go back to the proprietor of the restaurant described above. Anyone who thinks that the proprietor’s argument is a good one is making this error. Here is why the proprietor’s reasoning is wrong.
In Bayesian terms, what is happening is that the proprietor is providing you only with information about P(D/H) [the probability of no more than 5 percent complaints if the restaurant is good] and ignoring P(D/∼H) [the probability of no more than 5 percent complaints if the restaurant is not good]. The proprietor wants you to raise your probability because of this high P(D/H), but you are reluctant to do so because you (rightly) see that the critical posterior odds depend on more than this. You, in turn (if you are thinking correctly), are making some assumptions about the term he is not giving you—P(D/∼H)—and realizing that the evidence he is presenting is not very good. In this simple example, you recognize the necessity of obtaining evidence about P(D/∼H). In other words, what is the probability that no more than 5 percent of the customers would complain directly to the proprietor if the restaurant were not good?
What is happening in Bayesian terms is this. Recall the basic formula. Conceptually, it is:
posterior odds = likelihood ratio × prior odds
Let us suppose that you put your prior probability that the restaurant is good at .50—the same probability, .50, that it is bad. The prior odds in favor of the restaurant being good are thus .5/.5, or 1 to 1—even money, in racetrack terms.
What is the likelihood ratio (LR) here? Taking the proprietor at face value, the datum is the fact that 95 percent of the customers never complain. So the likelihood ratio might be portrayed as this:
P(at least 95% of the customers never complain/the restaurant is good)
divided by
P(at least 95% of the customers never complain/the restaurant is bad)
Given that a restaurant is good, it is highly likely that at least 95 percent of its customers won’t complain. In fact, a 5 percent complaint rate is high for any restaurant that stays in business, so it is probably at least .99 likely that a good restaurant will have at least 95 percent of its customers walking away without complaint. The key to the proprietor’s error is the term in the denominator—P(D/∼H): given that a restaurant is bad, what is the probability that at least 95 percent of its customers wouldn’t complain? There are many problems here. Most bad restaurants are not bad all the time. Additionally, most are bad not because customers are gagging on the food (such restaurants close quickly), but because they are consistently mediocre or worse than average for their neighborhood. They are “blah” restaurants—they are not poisoning people. Add to this the fact that, for a host of social reasons, people rarely complain publicly when they are mildly dissatisfied. It seems quite likely that at a bad restaurant—a restaurant that would not poison us, but that we would not want to go to—most people would leave without complaining. This is why the 95 percent figure is unimpressive.
Given it is a bad restaurant, there might be a .90 probability that at least 95 percent of the customers will still leave without complaining. So what happens in Bayes’ theorem when we plug in these numbers for the likelihood ratio is this:
posterior odds = likelihood ratio × prior odds
posterior odds = (.99/.90) × (.5/.5)
posterior odds = 1.1
The odds favoring its being a good restaurant are still only 1.1 to 1 (the probability that it is a good restaurant has gone from 50 percent to only 52.4 percent6). Thus, on the best possible interpretation, it is still not very probable that this is a good restaurant.
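Run through the odds form sketched earlier, the same arithmetic looks like this in Python; the numbers are exactly the ones assumed in the text (.99, .90, and a 50/50 prior), nothing more.

```python
# The restaurant example, plugged into Bayes' rule in odds form.
p_d_given_good = 0.99   # P(at least 95% never complain | restaurant is good)
p_d_given_bad  = 0.90   # P(at least 95% never complain | restaurant is bad)
prior_good     = 0.50   # prior probability that the restaurant is good

likelihood_ratio = p_d_given_good / p_d_given_bad     # 1.1
prior_odds       = prior_good / (1 - prior_good)      # 1.0
posterior_odds   = likelihood_ratio * prior_odds      # 1.1

posterior_prob = posterior_odds / (1 + posterior_odds)
print(round(posterior_odds, 2))        # 1.1
print(round(posterior_prob * 100, 1))  # 52.4 -- barely better than a coin flip
```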
The restaurant proprietor has tried to lure us into a thinking error. The proprietor’s sleight of hand involves three parts:
1. Producing a datum, D, guaranteed to yield a high P(D/H),
2. Hoping that we will fail to consider P(D/∼H), and
3. Implying that the high P(D/H) alone implies a high probability for the focal hypothesis.
A large research literature has grown up demonstrating that ignoring the probability of the evidence given that the nonfocal hypothesis is true—P(D/∼H)—is a ubiquitous psychological tendency. For example, psychologist Michael Doherty and colleagues used a simple paradigm in which subjects were asked to imagine that they were a doctor examining a patient with a red rash.7 Subjects were shown four pieces of information and asked to choose which of them they would need in order to determine the probability that the patient had the disease “Digirosa.” The four pieces of information were:
The percentage of people with Digirosa.
The percentage of people without Digirosa.
The percentage of people with Digirosa who have a red rash.
The percentage of people without Digirosa who have a red rash.
These pieces of information corresponded to the four terms in the Bayesian formula: P(H), P(∼H), P(D/H), and P(D/∼H). Because P(H) and P(∼H) are complements, only three pieces of information are necessary to calculate the posterior probability. However, P(D/∼H)—the percentage of people who have a red rash among those without Digirosa—clearly must be selected because it is a critical component of the likelihood ratio in Bayes’ formula. Nevertheless, 48.8 percent of the individuals who participated in a study by Doherty and colleagues failed to select the P(D/∼H) card. Thus, to many people presented with this problem, the people with a red rash but without Digirosa do not seem relevant—they seem (mistakenly) to be a nonevent.
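A quick way to see why that piece of information cannot be skipped is to hold the other values fixed and vary P(D/∼H). The numbers in the sketch below are hypothetical, chosen only to show how strongly the answer depends on the term people ignore.

```python
# Hypothetical illustration: same base rate and same P(rash | Digirosa), but two
# different values for the ignored term P(rash | no Digirosa).
prior = 0.10                  # P(H): made-up base rate of Digirosa
p_rash_given_disease = 0.90   # P(D/H), also made up

def posterior(prior, p_d_given_h, p_d_given_not_h):
    """Posterior probability of the disease given the rash, via Bayes' rule."""
    numerator = p_d_given_h * prior
    denominator = numerator + p_d_given_not_h * (1 - prior)
    return numerator / denominator

# If a rash is rare without the disease, the rash is strong evidence...
print(round(posterior(prior, p_rash_given_disease, 0.05), 2))   # 0.67
# ...but if rashes are common without the disease, the same rash tells you little.
print(round(posterior(prior, p_rash_given_disease, 0.60), 2))   # 0.14
```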
The importance of P(D/∼H) is not something that is automatically installed in our brain as mindware, so the fact that it is absolutely necessary information often seems counterintuitive. People have to be taught that it is important, or else their default is to ignore it. Thus, for many people, failure to realize the importance of processing P(D/∼H) represents a mindware gap.
A Critical Mindware Gap—Ignoring the Alternative Hypothesis
The failure to attend to the alternative hypothesis—to the denominator of the likelihood ratio when receiving evidence—is not a trivial reasoning error. Paying attention to the probability of the observation under the alternative hypothesis is a critical component of clinical judgment in medicine and many other applied sciences. It is the reason we use control groups. It is essential to know what would have happened if the variable of interest had not been changed. Both clinical and scientific inference are fatally compromised if we have information about only the treated group.
This is perhaps one of the many things that went seriously awry in the facilitated communication case, which was characterized by a failure to think about the necessity of testing alternative hypotheses. Psychologists have done extensive research on the tendency for people to ignore essential comparative (control group) information. For example, in a much-researched covariation detection paradigm, subjects are shown data from an experiment examining the relation between a treatment and patient response.8 They might be told, for instance, that:
200 people were given the treatment and improved
75 people were given the treatment and did not improve
50 people were not given the treatment and improved
15 people were not given the treatment and did not improve
These data represent the equivalent of a 2 × 2 matrix summarizing the results of the experiment. In covariation detection experiments, subjects are asked to indicate whether the treatment was effective. Many think that the treatment in this example is effective. They focus on the large number of cases (200) in which improvement followed the treatment. Secondarily, they focus on the fact that more people who received treatment showed improvement (200) than showed no improvement (75). Because this probability (200/275 = .727) seems high, subjects are enticed into thinking that the treatment works. This is an error of rational thinking, because it ignores the control comparison: among the people who were not given the treatment, 50 of 65 (.769) improved anyway—a rate slightly higher than that of the treated group.
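Here is a short sketch of the comparison that the cognitive miser skips, using exactly the numbers from the example above.

```python
# The 2 x 2 covariation matrix from the example above.
treated_improved,     treated_not_improved     = 200, 75
not_treated_improved, not_treated_not_improved = 50, 15

# P(improvement | treatment): the only rate many subjects attend to.
p_improve_treated = treated_improved / (treated_improved + treated_not_improved)

# P(improvement | no treatment): the control comparison they ignore.
p_improve_untreated = not_treated_improved / (not_treated_improved + not_treated_not_improved)

print(round(p_improve_treated, 3))    # 0.727
print(round(p_improve_untreated, 3))  # 0.769 -- improvement is no more likely with treatment
```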