Think Again: How to Reason and Argue
Both of these arguments are inductive, because they are clearly not valid. It is possible that only 10% of your sample plays golf, but many more people who use online dating services play golf. It is also possible that 10% of people who use online dating services play golf, but it is much more likely that this individual plays golf. Because these possibilities are so obvious, these arguments are probably not intended to be valid.
How strong are these inductive arguments? That depends on the probability of the conclusion given the premises. To assess that, we need to ask a series of questions to determine how each argument could go astray.
The first question to ask about the generalization is whether its premise is true. Did only one out of your sample of ten play golf in the last six months? Even if only one reported playing golf then, maybe more of them played golf, but they chose to ignore that question; or maybe they played golf but forgot about it; or maybe they denied playing golf because they thought you were asking your question in order to weed out dates who play golf too often. People on online dating sites are not always trustworthy. What a surprise!
The second question is whether your sample is big enough. It is better to ask ten than to ask only three, but it would be better yet to ask a hundred, although it would take a long time to gather such a large sample. A sample of ten, thus, gives your argument some strength, but not much. Whether it is strong enough depends on how much is at stake. If the sample is too small, then the argument commits a fallacy called hasty generalization.
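To put a rough number on "some strength, but not much," here is a minimal Python sketch. It borrows a standard statistical tool that this chapter does not itself discuss, the Wilson score interval for a proportion, and it assumes the respondents are a simple random sample (that is, unbiased). The function name `wilson_interval` and the 95% setting (`z = 1.96`) are illustrative choices, not anything from the text.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a population proportion.

    Assumes the n respondents form a simple random sample; the formula
    says nothing about whether the sample is biased.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# The same observed rate (10% golfers) at three different sample sizes.
for golfers, sample in [(1, 10), (10, 100), (100, 1000)]:
    low, high = wilson_interval(golfers, sample)
    print(f"{golfers}/{sample}: plausible range {low:.1%} to {high:.1%}")
```

With only ten respondents the plausible range runs from roughly 2% to 40%, while a thousand respondents narrow it to roughly 8% to 12%. The observed rate is identical in all three cases; only the size of the sample changes how much weight the premise can bear.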
The third question is whether your sample is biased. A sample is biased when the percentage of the sample with the feature you are seeking is significantly higher or lower than the percentage of the whole group with that feature. Notice that even a large sample (such as 100 or 1,000 online daters) can be biased. This bias could occur if most golfers use a different online dating website, which reduces the number of golfers who use the website that you are sampling. Then you should not use your sample to draw any conclusion about how many people who use online dating services in general play golf. Even if you are interested only in this particular website, your sample might be biased if your application mentioned that you play golf, and the website used this information to suggest possible contacts. Then the names that you received might include many more golfers than is representative of the website as a whole. Or the website might send you only names of local users, and you might live in an area with fewer (or more) golfers than other areas.
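The point that even a large sample can be biased can be checked with simple arithmetic. The sketch below assumes, hypothetically, that 10% of a site's users play golf but that the site suggests golfers three times as often as non-golfers because your profile mentions golf. The function `expected_sample_share` and its parameter names are invented for illustration.

```python
def expected_sample_share(population_rate: float, oversampling: float) -> float:
    """Expected share of golfers in a sample when each golfer is
    `oversampling` times as likely as a non-golfer to be included.

    Sample size does not appear anywhere: a bigger sample only estimates
    this biased share more precisely, it does not remove the bias.
    """
    golfers = population_rate * oversampling
    non_golfers = (1 - population_rate) * 1.0
    return golfers / (golfers + non_golfers)


# Hypothetical numbers: 10% of users golf, golfers suggested 3x as often.
share = expected_sample_share(0.10, 3.0)
print(f"Expected golfer share in your sample: {share:.0%}")
```

Under these made-up numbers, your sample would contain about 25% golfers even though only 10% of the site's users golf, and gathering 1,000 names instead of 10 would not change that figure.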
Another way to bias your sample is by asking leading or misleading questions. The percentage of affirmative answers would probably have been much higher if you had asked, “Would you ever be willing to play golf?” and much lower if you had asked, “Are you fanatical about golf?” To avoid this way of pushing your results in one direction or the other, you asked, “How often did you play golf in the last six months?” This apparently neutral question still might have hidden biases. If you ask it in April, many golfers in snowy climates will not have played golf in six months, even though they will play as much as they can after the snow melts and their courses open. To avoid this problem, you should have asked about a full year. Or maybe they really do like to play golf, but they have nobody to play golf with, so they are also looking for a partner who plays golf. Then you should have asked whether they want to play golf. The results of generalizations are often affected by the questions used to gather a sample.
Overall, every inductive generalization from a sample needs to meet several standards. First, its premises must be true. (Duh! That is obvious, but people often forget it.) Second, its sample must be large enough. (Obvious again! But people rarely bother to ask how big the sample was.) Third, its sample must not be biased. (Bias is often less clear, because it is hidden in the sampling methods.) You will be fooled less often if you get in the habit of asking whether all three standards are met whenever you encounter or give an inductive generalization.
Application
The next kind of induction applies generalizations back down to individuals. Our example was this argument: “This person uses an online dating website, and only 10% of online dating website users play golf, so this person probably does not play golf.” How strong is this argument?
As always, the first question that you need to ask is whether its premises are true. If not (and if you should know this), then this argument does not give you a strong reason to believe the conclusion. But let’s assume that the premises are true.
You also need to ask whether the percentage is high (or low) enough. Your argument would provide a stronger reason for its conclusion if its premise cited 1% instead of 10% and a weaker reason for its conclusion if its premise cited 30% instead of 10%. And if its premise were that 90% of online daters play golf, then it could provide a strong reason for the opposite conclusion that this person probably does play golf. These numbers affect the strength of this kind of inductive argument.
Another kind of mistake is more subtle and quite common. What if the person contacted you on the dating website because your profile mentioned golf? Suppose, in addition, that 80% of users who contact people because their profiles mention golf are themselves golfers. We can build this new information into a conflicting statistical application: This person contacted you because your profile mentioned golf, and 80% of users who contact people because their profiles mention golf are themselves golfers, so this person probably does play golf (more precisely, there is an 80% chance that this person plays golf).
Now we have statistical applications with opposite conclusions. The first said that this person probably does not play golf. The second says that this person probably does play golf. Which is more accurate? Which should we trust? The crucial difference to notice is that these arguments cite different classes, called reference classes. The first argument cites percentages within the class of online dating website users, whereas the second cites percentages within the class of those special online dating website users who contact people because their profiles mention golf. The latter class is smaller and a proper subset of the former class. In cases like this, assuming that the premises are true and equally justified, the argument with the narrower reference class usually provides a stronger reason, because its information is more specific to the case at hand.
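The rule of preferring the narrower reference class can be sketched in a few lines of Python. This is only a simplified model under stated assumptions: each reference class is represented, hypothetically, by the set of conditions its members satisfy, so a class defined by more conditions counts as narrower. The data, condition names, and the function `narrowest_rate` are all invented for illustration, and the sketch ignores the text's proviso that the premises must be equally justified.

```python
from typing import Optional

# Hypothetical data: each reference class is defined by the conditions its
# members satisfy, mapped to the share of that class who play golf.
GOLF_RATES: dict[frozenset[str], float] = {
    frozenset({"uses_dating_site"}): 0.10,
    frozenset({"uses_dating_site", "contacted_you_because_profile_mentions_golf"}): 0.80,
}


def narrowest_rate(person: set[str], rates: dict[frozenset[str], float]) -> Optional[float]:
    """Return the rate from the most specific reference class that the
    person belongs to, or None if there is no unique most specific class."""
    # Classes whose defining conditions the person satisfies.
    applicable = [cls for cls in rates if cls <= person]
    # Keep only classes that no other applicable class is strictly narrower than.
    most_specific = [c for c in applicable if not any(c < other for other in applicable)]
    if len(most_specific) != 1:
        return None  # no data, or incomparable classes: the rule does not decide
    return rates[most_specific[0]]


person = {"uses_dating_site", "contacted_you_because_profile_mentions_golf"}
rate = narrowest_rate(person, GOLF_RATES)
print(f"Estimated chance this person plays golf: {rate:.0%}")
```

For this person the sketch reports 80%, because the second class is a proper subset of the first. In practice, a very narrow class backed by a tiny or unreliable sample may deserve less trust than a broader one, which is why the text says the narrower class only *usually* provides the stronger reason.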
Conflicting reference classes are often overlooked by people who apply generalizations to individuals. This mistake combined with the fallacy of hasty generalization lies behind a great deal of stereotyping and prejudice. We all depend on generalizations and stereotypes in some cases, but mistakes about disadvantaged and vulnerable ethnic, racial, and gender groups can be especially harmful. A bigot might run into one stupid, violent, or dishonest member of an ethnic group. Every group has bad apples. The bigot then hastily generalizes to the conclusion that everyone in that ethnic group is similarly stupid, violent, or dishonest. Then the bigot meets a new member of that ethnic group, and applies the hasty generalization. The bigot concludes that this new individual is also stupid, violent, or dishonest, without considering the fact that this new individual also has other features that indicate intelligence, pacifism, and honesty. The bigot’s small sample and failure to consider such narrower conflicting reference classes show how bad reasoning can play a role in originating and maintaining prejudice. Bad reasoning is not the whole story, of course, since emotion, history, and self-interest also fuel bigotry, but we still might be able to reduce some prejudice to some degree by avoiding simple mistakes in inductive arguments.
WHY DID THAT HAPPEN?
Our next form of inductive reasoning is inference to the best explanation. It might be the most common form of all. When a cake does not rise, the baker needs to figure out the best explanation of this catastrophe. When a committee member does not show up for a meeting, colleagues wonder why. When a car does not start in the morning, its owner needs to find the best explanation in order to figure out which part to fix. This kind of inductive argument is also what detectives (like Sherlock Holmes) use to catch criminals. Detectives infer a conclusion about who did it because that conclusion provides the best explanation of their observations of the crime scene, the suspects, and other evidence. Many crime dramas are, in effect, long inferences to the best explanation. Science also postulates theories as the best explanation of observed results in experiments, such as when Sir Isaac Newton postulates gravity to explain tides or paleontologists hypothesize a meteor to explain the extinction of the dinosaurs.
These arguments share a certain form:
(1) Observation: Some surprising phenomenon needs to be explained.
(2) Hypothesis: A certain hypothesis explains the observations in (1).
(3) Comparison: The explanation in (2) is better than any alternative explanation of the observations in (1).
(4) Conclusion: The hypothesis in (2) is correct.
In our examples, the observations in (1) are the cake not rising, the colleague missing the meeting, the car not starting, the crime occurring, the tides rising, and the dinosaurs disappearing. Each argument then needs a set of competing hypotheses to compare plus some reasons to prefer one of those explanations.
Inferences to the best explanation are clearly not valid, since it is possible for the conclusion (4) to be false when the premises (1)–(3) are all true. That lack of validity is, however, a feature rather than a bug. Inferences to the best explanation are not intended to be valid, so it is unfair to criticize them for failing to be valid—just as it would be unfair to criticize a bicycle for failing to work in the ocean.
Inferences to the best explanation still need to meet other standards. They can go astray when any of their premises is false. Sometimes an inference to the best explanation is defective because the observation in premise (1) is not accurate. A detective might be misled when he tries to explain what he takes to be blood on the car seat, when the stain is really beetroot juice. An inference to the best explanation can also go astray when the hypothesis in premise (2) does not really explain the observation. You might think that your car did not start because it was out of fuel, when actually the starter did not even turn over. Lack of fuel cannot explain that observation, since the starter still turns over when the tank is empty (though not when the electrical system fails). Perhaps the most common problem for inferences to the best explanation is when premise (3) is false, either because a competing hypothesis is better than the arguer thinks or because the arguer overlooked an alternative hypothesis that provides an even better explanation. You might think that your colleague missed the meeting because she forgot, when really she was hit by a car on the way to the meeting. Such mistakes can lead to regret and apologies.
Overall, some inferences to the best explanation can provide strong reasons to believe their conclusions, as when a detective provides evidence beyond a reasonable doubt that a defendant is guilty. In contrast, other inferences to the best explanation fail miserably, such as when beetroot juice is mistaken for blood. In order to determine how strong an inference to the best explanation is, we need to look carefully at each premise and also at the conclusion.
Hussein’s Tubes
Let’s try this with a controversial example. Some of the most important inferences to the best explanation lie behind political decisions, such as the decision by the United States to start the Iraq war. In his testimony before the United Nations Security Council on February 5, 2003, United States Secretary of State Colin Powell gave this argument:
Saddam Hussein is determined to get his hands on a nuclear bomb. He is so determined that he has made repeated covert attempts to acquire high-specification aluminum tubes from eleven different countries . . . . There is controversy about what these tubes are for. Most U.S. experts think they are intended to serve as rotors in centrifuges to enrich uranium. Other experts, and the Iraqis themselves, argue that they are really to produce the rocket bodies for a conventional weapon, a multiple rocket launcher . . . . First, it strikes me as quite odd that these tubes are manufactured to a tolerance that far exceeds U. S. requirements for comparable rockets. Maybe Iraqis just manufacture their rockets to a higher standard than we do, but I don’t think so. Second, we actually have examined tubes from several different batches that were seized clandestinely before they reached Baghdad. What we notice in these different batches is a progression to higher and higher levels of specification . . . . Why would they continue refining the specifications? Why would they go to all the trouble for something that, if it was a rocket, would soon be blown into shrapnel when it went off? . . . These illicit procurement efforts show that Saddam Hussein is very much focused on putting in place the key missing piece from his nuclear weapons program, the ability to produce fissile material.7
Of course, I do not endorse this argument. There are many reasons to doubt its premises and conclusion, especially given what we learned later. My goal is only to understand it.
The most natural way to understand Powell’s argument is as an inference to the best explanation. He mentions a surprising phenomenon that needs to be explained and compares three potential explanations of that phenomenon, so his argument fits cleanly into the form above:
(1*) Observation: Saddam Hussein made repeated covert attempts to acquire high-specification aluminum tubes that were increasingly refined.
(2*) Hypothesis: Hussein’s desire to produce fissile material and use it to make a nuclear bomb could explain why he made the attempts described in (1*).
(3*) Comparison: The explanation in (2*) is better than any alternative explanation of the observations in (1*), including Hussein’s reported desire to produce conventional rocket bodies and higher standards in Iraqi manufacturing.
(4*) Conclusion: Hussein desires to produce fissile material for a nuclear bomb.
Powell adds more to back up his premises, but let’s start with the central argument (1*)–(4*). Reconstructing the argument in this form should reveal or clarify how its premises work together to provide some reason to believe its conclusion. But how strong is that reason? To assess the strength of the argument, we need to go through the premises and conclusion carefully.
Premise (1*) raises several questions. How high were the specifications of the tubes that Hussein tried to obtain? How do we know that he insisted on such high specifications? How many attempts did he make? How long ago? Were they covert in the sense of being hidden from everyone or only from the United States? Why did he hide them? Although such questions are important, Powell could probably answer them, and he does cite evidence of Hussein’s attempts in other parts of his testimony, so it makes sense here to focus attention on his other premises.
Premise (2*) adds that the phenomena in (1*) can be explained by Hussein’s desire to produce fissile material for a nuclear bomb. This makes sense. People who desire to make fissile material will want to acquire what is necessary to make it, and high-specification aluminum tubes were needed to produce fissile material. Indeed, the high specifications were needed only for fissile material of the kind used in nuclear bombs, and there would be little use for this kind of fissile material except in making nuclear bombs. At least that is what Powell assumes.
The most serious problems arise in premise (3*). This premise compares Powell’s preferred explanation in (2*) with two competitors: a desire to produce conventional rocket bodies and higher Iraqi standards in manufacturing rockets. Powell focuses on rocket bodies because that explanation was offered by Hussein himself. Still, Powell’s argument would fail if any other explanation was as strong as Powell’s preferred explanation in (2*), so we need to consider both alternatives.
Powell criticizes the alternative explanation in terms of conventional rockets by asking rhetorical questions: “Why would they continue refining the specifications? Why would they go to all the trouble for something that, if it was a rocket, would soon be blown into shrapnel when it went off?” His point here is that the explanation in terms of conventional rockets fails to explain the continual refinements because rockets do not require these refinements, whereas his preferred explanation in terms of nuclear bombs succeeds in explaining these additional observations. Its ability to explain more observations is what is supposed to make his explanation better.
This increased explanatory power is a common ground for preferring one explanation to another. Suppose that the hypothesis that Gregor killed Maxim explains why the boot prints outside the murder scene are size 14, because Gregor wears size 14 boots, but this hypothesis cannot explain why those boot prints have their distinctive tread pattern, because Gregor does not own any boots with that tread pattern. Then that explanation is not as good as the hypothesis that Ivan killed Maxim, if Ivan wears size 14 and also owns boots with that distinctive tread pattern. We prefer hypotheses that explain more. Powell is simply applying this general principle to the case of aluminum tubes.
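The principle that we prefer hypotheses that explain more can be modeled, very crudely, by counting which observations each hypothesis accounts for. The Python sketch below encodes the Gregor and Ivan example; the observation labels and the function `best_explanation` are hypothetical, and real comparisons of explanations weigh more than a simple count.

```python
from typing import Optional

OBSERVATIONS = {"prints_are_size_14", "prints_have_distinctive_tread"}

# Hypothetical encoding of the example: which observations each hypothesis explains.
EXPLAINS: dict[str, set[str]] = {
    "Gregor killed Maxim": {"prints_are_size_14"},
    "Ivan killed Maxim": {"prints_are_size_14", "prints_have_distinctive_tread"},
}


def best_explanation(observations: set[str], explains: dict[str, set[str]]) -> Optional[str]:
    """Pick the hypothesis that explains the most observations; None on a tie."""
    if not explains:
        return None
    scores = {h: len(covered & observations) for h, covered in explains.items()}
    top = max(scores.values())
    leaders = [h for h, s in scores.items() if s == top]
    return leaders[0] if len(leaders) == 1 else None


print(best_explanation(OBSERVATIONS, EXPLAINS))  # Ivan killed Maxim
```

If the only observation were the boot size, both hypotheses would explain it equally well and the function would return `None`: the tread pattern is what breaks the tie, just as the continual refinements of the tubes are supposed to break the tie in Powell’s argument.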
This argument is still subject to many objections. Critics could deny or doubt that Iraq did continue refining the specifications, in which case there would be no need to explain this. Or they could reply that these continual refinements were needed for conventional rockets, so the alternative hypothesis does explain the observations. To avoid these objections, Powell needs background arguments that are not included in the quoted passage. Still, even without delving deeper, our reconstruction has pinpointed at least two issues for further exploration.
The other alternative that Powell mentions is that “Iraqis just manufacture their rockets to a higher standard than we do.” Here Powell seems to have his tongue in his cheek. That is why he thinks all he needs to say in response is simply, “I don’t think so.” This sarcastic assurance seems to build on the assumption that US manufacturing is at least as precise as Iraqi manufacturing. That assumption might be obvious to this audience, but it is striking that Powell does not explicitly give any reason to favor his own explanation above this alternative.