Such an approach ignores the probability of improvement given that treatment was not given. Since this probability is even higher (50/65 = .769), the particular treatment tested in this experiment can be judged to be completely ineffective. The tendency to ignore the outcomes in the no-treatment condition and to focus on the large number of people in the treatment/improvement group seduces many people into viewing the treatment as effective. Disturbingly, this nonoptimal way of treating evidence has been found even among those who specialize in clinical diagnosis, such as physicians.
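To make the comparison concrete, here is a minimal sketch in Python. The no-treatment figures (50 improved out of 65) come from the problem; the treatment-condition counts below are hypothetical stand-ins for the figures given earlier, chosen only to show how the two conditional probabilities should be set side by side.

```python
# Sketch: judging a treatment by comparing conditional probabilities,
# not by the raw count of treated-and-improved cases.
# The no-treatment figures (50 improved out of 65) are from the text;
# the treatment-condition counts below are hypothetical placeholders.

treated_improved, treated_total = 150, 200            # hypothetical
untreated_improved, untreated_total = 50, 65           # from the text

p_improve_given_treatment = treated_improved / treated_total          # 0.75
p_improve_given_no_treatment = untreated_improved / untreated_total   # ~0.769

# The treatment looks impressive if you stare only at the many treated
# people who improved, but it is ineffective: improvement is at least
# as likely without it.
print(p_improve_given_treatment, p_improve_given_no_treatment)
print("effective?", p_improve_given_treatment > p_improve_given_no_treatment)
```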
More Mindware of Scientific Thinking: Falsifiability
Just as people have difficulty learning to assess data in light of an alternative hypothesis, people have a hard time thinking about evidence and tests that could falsify their focal hypotheses. Instead, people tend to seek to confirm theories rather than falsify them. One of the most investigated problems in four decades of reasoning research illustrates this quite dramatically. The task was invented by Peter Wason, one of the most creative scientists to study human rationality in the modern era, and has been investigated in dozens, if not hundreds, of studies.9 Try to answer it before reading ahead: Imagine four rectangles, each representing a card lying on a table. Each one of the cards has a letter on one side and a number on the other side. Here is a rule: If a card has a vowel on its letter side, then it has an even number on its number side. Two of the cards are letter-side up, and two of the cards are number-side up. Your task is to decide which card or cards must be turned over in order to find out whether the rule is true or false. Indicate which cards must be turned over. The four cards confronting you have the stimuli K, A, 8, and 5 showing.
This task is called the four-card selection task and has been intensively investigated for two reasons—most people get the problem wrong and it has been devilishly hard to figure out why. The answer seems obvious. The hypothesized rule is: If a card has a vowel on its letter side, then it has an even number on its number side. So the answer would seem to be to pick the A and the 8—the A, the vowel, to see if there is an even number on its back, and the 8 (the even number) to see if there is a vowel on the back. The problem is that this answer—given by about 50 percent of the people completing the problem—is wrong. The second most common answer, to turn over the A card only (to see if there is an even number on the back)—given by about 20 percent of the responders—is also wrong. Another 20 percent of the responders turn over other combinations (for example, K and 8) that are also not correct.
If you were like 90 percent of the people who have completed this problem in dozens of studies during the past several decades, you answered it incorrectly too (and in your case, you even missed it despite the hint given by my previous discussion of falsifiability!). Let’s see how most people go wrong. First, where they don’t go wrong is on the K and A cards. Most people don’t choose the K and they do choose the A. Because the rule says nothing about what should be on the backs of consonants, the K is irrelevant to the rule. The A is not. It could have an even or odd number on the back, and although the former would be consistent with the rule, the latter is the critical potential outcome—it could prove that the rule is false. In short, in order to show that the rule is not false, the A must be turned. That is the part that most people get right.
However, it is the 8 and 5 that are the hard cards. Many people get these two cards wrong. They mistakenly think that the 8 card must be chosen. This card is mistakenly turned because people think that they must check to see if there is a vowel rather than a nonvowel on the back. But, for example, if there were a K on the back of the 8, it would not show that the rule is false because, although the rule says that vowels must have even numbers on the back, it does not say that even numbers must have vowels on the back. So finding a nonvowel on the back says nothing about whether the rule is true or false. In contrast, the 5 card, which most people do not choose, is absolutely essential. The 5 card might have a vowel on the back and, if it did, the rule would be shown to be false because all vowels would not have even numbers on the back. In short, in order to show that the rule is not false, the 5 card must be turned.
In summary, the rule is in the form of an “if P then Q” conditional, and it can be shown to be false only by showing an instance of P and not-Q, so the P and not-Q cards (A and 5 in our example) are the only two that need to be turned to determine whether the rule is true or false. If the P and not-Q combination is there, the rule is false. If it is not there, then the rule is true.
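For readers who find it easier to see the logic run mechanically, here is a small sketch, assuming the rule and cards exactly as stated; the helper function and its name are mine, not part of the original task.

```python
# Sketch: which cards must be turned to test "if vowel on one side,
# then even number on the other"? A card needs turning only if some
# possible hidden face could falsify the rule (a vowel paired with an odd number).

VOWELS = set("AEIOU")

def must_turn(visible):
    if visible.isalpha():
        # A letter card can falsify the rule only if it shows a vowel
        # and hides an odd number, so only vowel cards need turning.
        return visible in VOWELS
    else:
        # A number card can falsify the rule only if it is odd and hides
        # a vowel, so only odd-number cards need turning.
        return int(visible) % 2 == 1

print([card for card in ["K", "A", "8", "5"] if must_turn(card)])
# -> ['A', '5']: the P and not-Q cards, not the intuitive A and 8.
```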
Why do most people answer incorrectly when this problem, once explained, is so easy? Many theories exist, but one of the oldest, and one that certainly plays at least a partial role in the poor performance, holds that people focus on confirming the rule. This is what leads them to turn the 8 card (in hopes of confirming the rule by observing a vowel on the other side) and the A card (in search of the confirming even number). What they do not set about doing is looking for what would falsify the rule—a thought pattern that would immediately suggest the relevance of the 5 card (which might have a disconfirming vowel on the back). As I have noted, there are many other theories of the poor performance on the task, but regardless of which of these descriptive theories explains the error, there is no question that a concern for falsifiability would rectify it.
As useful as the falsifiability principle is in general reasoning, there is a large amount of evidence indicating that it is not a natural strategy. The reason is that the cognitive miser does not automatically construct models of alternative worlds, but instead models the situation as given. This is why, for most people, the mindware of seeking falsifying evidence must be taught.
Another paradigm which illustrates the problems that people have in dealing with falsification is the so-called 2-4-6 task, another famous reasoning problem invented by Peter Wason.10 In the 2-4-6 task, subjects are told that the experimenter has a rule in mind that classifies sets of three integers (triplets). They are told that the triplet 2-4-6 conforms to the rule. The subjects are then to propose triplets and, when they do, the experimenter tells them whether their triplet conforms to the rule. Subjects are to continue proposing triplets and receiving feedback until they think they have figured out what the experimenter’s rule is, at which time they should announce what they think the rule is.
The experimenter’s rule in the 2-4-6 task is actually “any set of three increasing numbers.” Typically, subjects have a very difficult time discovering this rule because they initially adopt an overly restrictive hypothesis about what the rule is. They develop rules like “even numbers increasing” or “numbers increasing in equal intervals” and proceed to generate triplets that are consistent with their overly restrictive hypothesis. Subjects thus receive much feedback from the experimenter that their triplets are correct, and when they announce their hypothesis they are often surprised when told it is not correct. For example, a typical sequence is for the subject to generate triplets like: 8-10-12; 14-16-18; 40-42-44. Receiving three confirmations, they announce the rule “numbers increasing by two.” Told this is incorrect, they then might proceed to generate 2-6-10; 0-3-6; and 1-50-99—again receiving confirmatory feedback. They then proceed to announce a rule like “the rule is that the difference between numbers next to each other is the same”—which again is incorrect. What they fail to do with any frequency is to generate sequences seriously at odds with their hypothesis so that they might falsify it—sequences like 100-90-80 or 1-15-2.
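A brief sketch, with rule definitions mirroring the text (the function names are mine), shows why confirming triplets are uninformative here: every triplet that fits the narrow guess also fits the true rule, so only a triplet that violates the guess can reveal the difference.

```python
# Sketch: in the 2-4-6 task, triplets that confirm a narrow hypothesis
# ("numbers increasing by two") also satisfy the true rule ("any three
# increasing numbers"), so confirming tests cannot tell the two apart.

def true_rule(t):          # the experimenter's rule
    return t[0] < t[1] < t[2]

def narrow_hypothesis(t):  # a typical subject's overly restrictive guess
    return t[1] - t[0] == 2 and t[2] - t[1] == 2

confirming_tests = [(8, 10, 12), (14, 16, 18), (40, 42, 44)]
falsifying_test = (1, 15, 99)   # a triplet seriously at odds with the guess

for t in confirming_tests:
    # Both rules say "yes", so the experimenter's feedback is uninformative.
    print(t, narrow_hypothesis(t), true_rule(t))

# Only a triplet the narrow hypothesis rejects can reveal the difference:
print(falsifying_test, narrow_hypothesis(falsifying_test), true_rule(falsifying_test))
# -> rejected by the guess but still "correct" per the experimenter,
#    which is exactly the feedback that falsifies the narrow guess.
```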
That subjects are not seriously attempting to refute their focal hypothesis is suggested by one manipulation that has been found to strongly facilitate performance. Ryan Tweney and colleagues ran an experiment in which the subject was told that the experimenter was thinking of two rules—one rule would apply to a group of triplets called DAX and the other to a set of triplets called MED. Each time the subject announced a triplet, he or she was told whether it was a DAX or a MED. The subject was told that 2-4-6 was a DAX, and the experiment proceeded as before. DAX was defined, as before, as “any set of three increasing numbers” and MED was defined as “anything else.” Under these conditions, the subjects solved the problem much more easily, often alternating between positive tests of DAX and MED. Of course—now—a positive test of MED is an attempt to falsify DAX. The subject is drawn into falsifying tests of DAX because there is another positive, salient, and vivid hypothesis to focus upon (MED). Because the alternative exhausts the universe of hypotheses and is mutually exclusive of the old focal hypothesis, each time the subjects try to confirm one they are simultaneously attempting to falsify the other. In this way, the subjects were drawn to do something they do not normally do—focus on the alternative hypothesis and falsify the focal hypothesis. Of course, the fact that they had to be lured into it in this contrived way only serves to reinforce how difficult it is to focus on the possibility that the focal hypothesis is not true.
Thus, the bad news is that people have a difficult time thinking about the evidence that would falsify their focal hypothesis. The good news is that this mindware is teachable. All scientists go through training that includes much practice at trying to falsify their focal hypothesis, and they automatize the verbal query “What alternative hypotheses should I consider?”
Base Rates: More Bayesian Mindware
Assigning the right probability values to future events is another critical aspect of rational thought. Interestingly, research has shown that people are quite good at dealing implicitly with probabilistic information (when it needs only to be tracked by the autonomous mind), but at the same time, when probabilities must be reasoned about explicitly people have considerable difficulty. Consider a problem that concerns the estimation of medical risk and has been the focus of considerable research, including some involving medical personnel:11
Imagine that the XYZ virus causes a serious disease that occurs in 1 in every 1000 people. Imagine also that there is a test to diagnose the disease that always indicates correctly that a person who has the XYZ virus actually has it. Finally, imagine that the test has a false-positive rate of 5 percent. This means that the test wrongly indicates that the XYZ virus is present in 5 percent of the cases where the person does not have the virus. Imagine that we choose a person randomly and administer the test, and that it yields a positive result (indicates that the person is XYZ-positive). What is the probability (expressed as a percentage ranging from 0 to 100) that the individual actually has the XYZ virus, assuming that we know nothing else about the individual’s personal or medical history?
Don’t read on until you have taken a stab at the problem. Do not feel that you must calculate the answer precisely (although if you think you can, go ahead). Just give your best guesstimate. The point is not to get the precise answer so much as to see whether you are in the right ballpark. The answers of many people are not. They show a tendency to overweight concrete and vivid single-case information when it must be combined with more abstract probabilistic information.
The most common answer is 95 percent. The correct answer is approximately 2 percent! People vastly overestimate the probability that a positive result truly indicates the XYZ virus. Although the correct answer to this problem can again be calculated by means of Bayes’ rule, a little logical reasoning can help to illustrate the profound effect that base rates have on probabilities. We were given the information that, of 1000 people, just one will actually be XYZ-positive. If the other 999 (who do not have the disease) are tested, the test will indicate incorrectly that approximately 50 of them have the virus (.05 multiplied by 999) because of the 5 percent false-positive rate. Thus, of the 51 patients testing positive, only 1 (approximately 2 percent) will actually be XYZ-positive. In short, the base rate is such that the vast majority of people do not have the virus. This fact, combined with a substantial false-positive rate, ensures that, in absolute numbers, the majority of positive tests will be of people who do not have the virus.
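The same arithmetic can be written out as a short calculation using only the numbers given in the problem:

```python
# Sketch: the XYZ-virus numbers from the text, worked through directly.
base_rate = 1 / 1000          # 1 in 1000 people has the virus
sensitivity = 1.0             # the test always detects a true case
false_positive_rate = 0.05    # 5% of healthy people test positive anyway

p_positive = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
p_virus_given_positive = (sensitivity * base_rate) / p_positive

print(round(p_virus_given_positive, 4))   # ~0.0196, i.e. about 2 percent
```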
In this problem there is a tendency to overweight individual-case evidence and underweight statistical information. The case evidence (the laboratory test result) seems “tangible” and “concrete” to most people—it is more vivid. In contrast, the probabilistic evidence seems, well—probabilistic! This reasoning is of course fallacious because case evidence itself is always probabilistic. A clinical test misidentifies the presence of a disease with a certain probability. The situation is one in which two probabilities, the probable diagnosticity of the case evidence and the prior probability, must be combined if one is to arrive at a correct decision. There are right and wrong ways of combining these probabilities, and more often than not—particularly when the case evidence gives the illusion of concreteness—people combine the information in the wrong way.
I cannot emphasize enough that I do not wish to imply in this discussion of Bayesian reasoning that we do, or should, actually calculate using the specific Bayesian formula in our minds.12 It is enough that people learn to “think Bayesian” in a qualitative sense—that they have what might be called “Bayesian instincts,” not that they have memorized the rule, which is unnecessary. It is enough, for example, simply to realize the importance of the base rate. That would allow a person to see the critical insight embedded in the XYZ virus problem—that when a test with a substantial false alarm rate is applied to a disease with a very small base rate, then the majority of individuals with a positive test will not have the disease. This is all the knowledge of the Bayesian mindware regarding base rate that is needed (of course, greater depth of understanding would be an additional plus). Such a qualitative understanding will allow a person to make a guesstimate that is close enough to prevent serious errors in action in daily life. It is likewise with P(D/∼H). Good thinkers need not always actually calculate the likelihood ratio. They only need enough conceptual understanding to recognize the reason why the restaurant proprietor’s argument is a poor one.
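One way to see the qualitative point is the odds form of Bayes’ rule, sketched below with the XYZ-virus numbers; this odds-form presentation is my illustration of the “Bayesian instinct,” not a formula the reader is expected to memorize.

```python
# Sketch: the qualitative "Bayesian instinct" in odds form, using the
# XYZ-virus numbers. The likelihood ratio P(D|H) / P(D|~H) says how much
# a positive test should shift belief; the prior odds say where belief starts.

prior_odds = 1 / 999                 # 1 infected person per 999 healthy ones
likelihood_ratio = 1.0 / 0.05        # P(positive|virus) / P(positive|no virus) = 20

posterior_odds = prior_odds * likelihood_ratio          # ~0.02
posterior_prob = posterior_odds / (1 + posterior_odds)  # ~0.0196

# A test 20 times more likely to fire on the sick than on the healthy
# still cannot overcome prior odds of 1 to 999 against having the disease.
print(round(posterior_odds, 3), round(posterior_prob, 4))
```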
Mindware for Probability Assessment
Consider another problem that is famous in the literature of cognitive psychology, the so-called Linda problem.13
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Please rank the following statements by their probability, using 1 for the most probable and 8 for the least probable.
a. Linda is a teacher in an elementary school ____
b. Linda works in a bookstore and takes Yoga classes ____
c. Linda is active in the feminist movement ____
d. Linda is a psychiatric social worker ____
e. Linda is a member of the League of Women Voters ____
f. Linda is a bank teller ____
g. Linda is an insurance salesperson ____
h. Linda is a bank teller and is active in the feminist movement ____
Most people make what is called a “conjunction error” on this problem. Because alternative h (Linda is a bank teller and is active in the feminist movement) is the conjunction of alternatives c and f, the probability of h cannot be higher than that of either c (Linda is active in the feminist movement) or f (Linda is a bank teller). All feminist bank tellers are also bank tellers, so h cannot be more probable than f—yet often over 80 percent of the subjects in studies rate alternative h as more probable than f, thus displaying a conjunction error. It is often argued that attribute substitution is occurring when subjects answer incorrectly on this problem. Rather than think carefully and see the problem as a probabilistic scenario, subjects instead answer on the basis of a simpler similarity assessment (a feminist bank teller seems to overlap more with the description of Linda than does the alternative “bank teller”).
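The underlying rule can be made concrete with a tiny sketch; the probability values assigned to Linda below are hypothetical, and the point holds for any values one might choose.

```python
# Sketch: the conjunction rule behind the Linda problem. Whatever
# probabilities one assigns (the values below are hypothetical), the
# conjunction "bank teller AND feminist" can never exceed either conjunct.

p_feminist = 0.90                # hypothetical judgment about Linda
p_bank_teller = 0.05             # hypothetical judgment about Linda
p_feminist_given_teller = 0.90   # hypothetical: most such tellers would be feminists

p_feminist_bank_teller = p_bank_teller * p_feminist_given_teller   # 0.045

# The conjunction is bounded above by the less probable conjunct.
assert p_feminist_bank_teller <= min(p_feminist, p_bank_teller)
print(p_bank_teller, p_feminist_bank_teller)   # alternative h can only be <= f
```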
Of course, logic dictates that the subset (feminist bank teller)/superset (bank teller) relationship should trump assessments of similarity when judgments of probability are at issue. If the relevant probability relationships are well learned, then using similarity reflects an error of the cognitive miser. In contrast, if the relevant rules of probability are not learned well enough for this problem to be perceived as within the domain of probabilistic logic, then the thinking error might be reclassified as a case of a mindware gap (rather than one of attribute substitution based on similarity and vividness).
An additional error in dealing with probabilities—one with implications for real-life decision making—is the inverting of conditional probabilities. The inversion error in probabilistic reasoning is thinking that the probability of A, given B, is the same as the probability of B, given A. The two are not the same, yet they are frequently treated as if they were. For example, Robyn Dawes described an article in a California newspaper whose headline implied that a survey had shown that use of marijuana leads to the use of hard drugs. The headline implied that the survey was about the probability of a student’s using hard drugs, given previous smoking of marijuana. But, actually, the article was about the inverse probability: the probability of having smoked marijuana, given that the student was using hard drugs. The problem is that the two probabilities are vastly different. The probability that students use hard drugs, given that they have smoked marijuana, is much, much smaller than the probability that they have smoked marijuana, given that they use hard drugs. The reason is that most people who smoke marijuana do not use hard drugs, but most people who use hard drugs have tried marijuana.
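A sketch with hypothetical survey counts (not the figures from the article Dawes described) makes the asymmetry plain:

```python
# Sketch: why P(hard drugs | marijuana) and P(marijuana | hard drugs) differ.
# The survey counts below are hypothetical, chosen only to illustrate the
# asymmetry described in the text.

students = 10_000
smoked_marijuana = 4_000    # hypothetical: many students have tried marijuana
use_hard_drugs = 200        # hypothetical: few students use hard drugs
both = 180                  # hypothetical: most hard-drug users have tried marijuana

p_hard_given_mj = both / smoked_marijuana   # 0.045 -- small
p_mj_given_hard = both / use_hard_drugs     # 0.90  -- large

print(round(p_hard_given_mj, 3), round(p_mj_given_hard, 2))
# Inverting the conditional turns a 4.5% risk into a seemingly alarming 90%.
```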