Expert Political Judgment

by Philip E. Tetlock

Figure 7.8. Inevitability and impossibility curves for the Cuban missile crisis. The inevitability curve displays gradually rising likelihood judgments of some form of peaceful resolution. The lower impossibility curve displays gradually declining likelihood judgments of the set of all alternative more violent endings. The higher impossibility curve was derived by adding the experts’ likelihood judgments of six specific subsets of these alternative violent possible endings. Adding values of the lower impossibility curve to the corresponding values of the inevitability curve yields sums only slightly above 1.0. But inserting values from the higher impossibility curve yields sums well above 1.0. The shaded area between the two impossibility curves represents the cumulative impact of unpacking on the subjective probability of counterfactual alternatives to reality.

  Figure 7.8 demonstrates that logical anomalies do indeed emerge. Three findings stand out: (a) the power of unpacking ~x counterfactual alternatives to reality to inflate subjective probabilities beyond reason. Observers consistently judged the whole set of alternative violent outcomes to be less likely than the sum of its exclusive and exhaustive parts. The shaded area between the two impossibility curves represents the magnitude of this effect: the cumulative increase in the judged probability of counterfactual possibilities when experts generated impossibility curves not for the whole set of more violent outcomes (lower curve) but rather for each of the six unpacked subsets of more violent outcomes (higher curve). When we sum the values on the higher impossibility curve with the values for the corresponding dates on the inevitability curve, the sums routinely exceed 1.0; (b) the tendency of unpacking effects to grow gradually smaller as we move toward the end of the crisis. Experts who were unpacking ~x possibilities saw less and less wiggle room for rewriting history as the end approached; (c) the power of unpacking to mess up our understanding of the past. In the no-unpacking control group, simple linear equations captured 82 percent of the variance in judgments of the undifferentiated sets of peaceful outcomes and more violent alternatives. The past looks like a smooth linear progression toward a predestined outcome. In the unpacking condition, the past looks more like a random walk, albeit around a discernible trend, with three noticeable shifts in direction (violations of monotonicity). A fourth-order polynomial equation is necessary to explain 80 percent of the variance in these retrospective likelihood judgments.
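The arithmetic behind the subadditivity finding can be made concrete with a small sketch. The numbers below are hypothetical stand-ins, not the study's actual data: a coherent judge's probability for a peaceful resolution plus the probability of all violent alternatives should sum to roughly 1.0, but summing judgments of the unpacked subsets one at a time pushes the total well past that bound.

```python
# Minimal sketch of the subadditivity check described above.
# All numbers are hypothetical illustrations, not data from the study.

inevitability = 0.70          # judged P(some peaceful resolution) on a given date
packed_impossibility = 0.32   # judged P(whole set of more violent endings)

# The same set, unpacked into exclusive and exhaustive subsets and
# judged one subset at a time (hypothetical values):
unpacked_subsets = [0.10, 0.08, 0.12, 0.09, 0.11, 0.07]
unpacked_impossibility = sum(unpacked_subsets)

packed_total = inevitability + packed_impossibility      # only slightly above 1.0
unpacked_total = inevitability + unpacked_impossibility  # well above 1.0

print(f"packed sum:        {packed_total:.2f}")
print(f"unpacked sum:      {unpacked_total:.2f}")
print(f"subadditivity gap: {unpacked_impossibility - packed_impossibility:.2f}")
```

The "gap" printed last corresponds to the shaded area between the two impossibility curves on a single date: the extra probability that appears only because the set was judged piece by piece.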

  Although figure 7.8 does not display it, foxes who unpacked counterfactual possibilities exhibited the strongest sub-additivity effects (summed probability judgments exceeding 1.0). Averaging across dates, their combined inevitability and impossibility judgments summed to 1.38, significantly greater than for foxes in the control group (M = 1.07) or for hedgehogs in either the unpacking (M = 1.18) or control (M = 1.04) condition. Foxes were also more prone to twilight-zone effects, in which self-contradiction became particularly flagrant. There were periods of time for 85 percent of foxes, but only about 60 percent of hedgehogs, during which peace seemed inevitable (modal inevitability date = Oct. 27) but war still possible (modal impossibility date = Oct. 28).

  A final sign of the power of unpacking comes from cross-condition comparisons of correlations between theoretical beliefs, such as the robustness of nuclear deterrence, and reactions to close-call counterfactuals raising the specter of nuclear war. The correlation is greater in the control than in the unpacking condition (r[28] = 0.61 versus r[32] = 0.27). This drop-off is consistent with the notion that, under unpacking, observers shift from an abstract, covering-law mode of thinking to a more idiographic, case-by-case mode.

  UNMAKING THE WEST EXPERIMENT

  A second experiment replicated the previous results but on a grander measurement canvas. Experts drawn from the World History Association judged possibilities that stretched over one thousand years, not just fourteen days. The target issue was the rise of the West: How did it come to pass that a small number of Europeans, working from unpromising beginnings one thousand years ago, came to wield such disproportionate geopolitical influence? We saw in chapter 5 that hedgehogs who believe in the survival of the fittest among civilizations tended to see this mega-outcome as the product of deep and immutable causes and to be dismissive of close-call counterfactuals that implied otherwise.

  There were two experimental conditions. In the control condition, experts received no encouragement, one way or the other, to think about alternative historical outcomes. We merely presented two measures. The starting question for the inevitability curve exercise was: At what point did some form of Western geopolitical domination become inevitable? The starting point for the impossibility curve exercise was: At what point did all possible alternatives to Western geopolitical domination become impossible? After identifying their inevitability and impossibility points, experts estimated how the likelihood of each class of historical outcome waxed or waned prior to those points. By contrast, the intensive-unpacking condition broke the set of all possible alternatives to Western domination into more refined subsets of scenarios in which either no civilization achieves global dominance or a non-Western civilization achieves global dominance. It then broke the no-hegemon world into subsets in which this outcome is brought about either by enfeebling the West (e.g., more lethal plagues, deeper Mongol incursions) or by empowering one of the Rest (e.g., Islam, China), and it broke the alternative-hegemon world into subsets in which Islam, China, or some other civilization achieves global power projection capabilities. Experts then judged the likelihood of each subset by plotting inevitability and impossibility curves.

  The results replicated the missile crisis study in several key respects: (a) unpacking counterfactual alternatives to reality again inflated subjective probabilities beyond reason. As figure 7.9 shows, observers consistently judged the whole set of alternatives to Western domination to be less likely than the sum of its exclusive and exhaustive parts. The shaded area between the two impossibility curves captures the cumulative magnitude of this effect; (b) unpacking effects again grew smaller as we moved toward the end of the historical sequence; (c) unpacking again had the power to mess up our understanding of the past, transforming a smooth progression toward a foreordained outcome into a far more erratic journey. We need a fifth-order polynomial equation to capture 80 percent of the variance in the zigzaggy perceptions of the likelihood of unpacked outcomes, whereas a simple linear equation does the same work in the no-unpacking control condition; (d) unpacking again cut into the power of covering-law beliefs to constrain perceptions of the possible, with correlations dropping from .63 in the control condition to .25 in the unpacking condition; (e) foxes were again more susceptible to unpacking effects and made more subadditive probability judgments. Their inevitability and impossibility curve judgments averaged 1.41, markedly greater than for foxes in the control group (M = 1.09) or for hedgehogs in either the control (M = 1.03) or unpacking (M = 1.21) groups; (f) foxes who unpacked counterfactual possibilities again displayed longer twilight-zone periods. In the control group, foxes and hedgehogs did not differ by a significant margin: the twilight-zone period was roughly eighteen years long (they judged Western domination to be inevitable by 1731, on average, but considered alternatives to be still possible as late as 1749). But in the unpacking condition, the fox twilight zone stretched for forty-seven years, compared with the hedgehogs’ twenty-five years (foxes judged Western domination to be inevitable by 1752 but alternatives to be still possible as late as 1799).
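The linear-versus-polynomial contrast that recurs in both studies can be illustrated with a short sketch. The series below are invented for illustration, not the experts' actual judgments: a smoothly drifting series is captured almost perfectly by a straight line, while a series with several reversals of direction needs a higher-order polynomial before most of its variance is explained.

```python
import numpy as np

# Illustrative sketch (all numbers hypothetical) of the linear-vs-polynomial
# contrast described above: "control-like" judgments drift smoothly, while
# "unpacking-like" judgments reverse direction several times.

def variance_explained(x, y, degree):
    """R-squared of a least-squares polynomial fit of the given degree."""
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

x = np.linspace(-1.5, 1.5, 13)           # stand-in for dates in the sequence
smooth = 0.1 + 0.3 * (x + 1.5) / 3.0     # steady monotonic drift
zigzag = x**4 - 2 * x**2 + 0.3 * x       # three shifts in direction

r2_smooth_linear = variance_explained(x, smooth, 1)   # a line suffices
r2_zigzag_linear = variance_explained(x, zigzag, 1)   # a line fits poorly
r2_zigzag_quartic = variance_explained(x, zigzag, 4)  # a quartic is needed

print(f"smooth, linear fit:  R^2 = {r2_smooth_linear:.2f}")
print(f"zigzag, linear fit:  R^2 = {r2_zigzag_linear:.2f}")
print(f"zigzag, quartic fit: R^2 = {r2_zigzag_quartic:.2f}")
```

A degree-4 polynomial can have up to three turning points, which is why a fourth- or fifth-order fit is the natural tool once retrospective judgments show several violations of monotonicity.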

  Figure 7.9. Inevitability and impossibility curves for the Rise of the West. The inevitability curve displays rising likelihood judgments of Western geopolitical dominance. The lower impossibility curve displays declining likelihood judgments of all possible alternatives to Western dominance. The higher impossibility curve was derived by adding experts’ likelihood judgments of six specific subsets of alternative historical outcomes. Adding values of the lower impossibility curve to the corresponding values of the inevitability curve yields sums only slightly above 1.0. Inserting values from the higher impossibility curve yields sums well above 1.0. The shaded area between the two impossibility curves represents the cumulative impact of unpacking on the subjective probability of counterfactual alternatives to reality.

  Thoughts on “Debiasing” Thinking about Possible Pasts

  Chapters 5 and 7 bring into sharp relief the strengths and weaknesses of fox and hedgehog styles of historical reasoning. Chapter 5 showed that hedgehogs wove tighter mental connections between their abstract theoretical beliefs and specific opinions about what was possible at particular times and places. Their trademark approach was deductive. For example, if I believe the neorealist theory of balancing is correct, and that the historical context under examination is one in which a would-be hegemon (Philip II, Napoleon, Hitler, etc.) was squelched by a coalition of nervous rivals, then I can confidently rule out close-call counterfactuals that imply that, with minor tweaking of background conditions, the bid to achieve hegemony could have succeeded.

  There was not much room in chapter 5 to contest the facts: the repeated demonstrations of the joint effects of cognitive style and theoretical beliefs on resistance to dissonant close-call scenarios. But there was room to contest how to “spin” those facts. In postexperimental conversations, many hedgehogs defended their deductive orientation toward history on the grounds of parsimony. A few hedgehogs also remarked on the foolishness of treating “things that perhaps almost happened” as serious arguments against an otherwise robust generalization. “Open-mindedness” here shaded into “credulousness.” As one participant commented: “I’ll change my mind in response to real but not imaginary evidence. Show me actual cases of balancing failing. Then we can talk about when the proposition holds.” Hedgehogs also saw nothing commendably open-minded about endorsing a generalization at one moment and at the next endorsing close-call counterfactuals that poked so many holes in the generalization as to render it “practically useless.” Thinking of that sort just looked “flip-floppy” to them.

  When it came to invidious intellectual stereotyping, however, the foxes gave as good as they got. Some foxes were “appalled by the breathtaking arrogance” of the deductive covering-law approach, which they derogated as “pseudoscience,” “conjecture that capitalizes on hindsight,” and—the capstone insult—“tone-deaf to history.” They saw tight linkages between abstract theoretical beliefs and specific historical ones not as a strength (“nothing to brag about”) but rather as a potential weakness (an ominous sign of a “dogmatic approach to the past”). They saw looser linkages between general and specific beliefs not as a weakness (muddled and incoherent) but rather as a potential strength (a mature recognition of how riddled with coincidence and exceptions history is).

  In chapter 7, we see similar disagreements over the right normative “spin” to put on the facts. But chapter 7 offers the first evidence of systematic bias more pronounced among foxes than among hedgehogs. Foxes were markedly more affected by the “unpacking of scenario” manipulations going both forward and backward in time. Some hedgehogs found the foxes’ predicament amusing. Shown the aggregate data in a debriefing, one participant quipped: “I’ll bet they’re good talkers, but let them try to talk their way out of the knots that they have tied themselves into here.” The greater susceptibility of foxes to “sub-additivity” effects reinforced the suspicion of hedgehogs that there was something sloppy about the fox approach to history. It looks disturbingly easy to lure foxes into the inferential equivalent of wild goose chases that cause them to assign too much likelihood to too many scenarios.

  The foxes’ reaction to their falling short was both similar to and different from the hedgehogs’ reactions to their falling short on correspondence and coherence measures in chapters 3 and 4. Like the hedgehogs, the foxes’ first response was to challenge the messenger. “You set us up” was a common refrain—although that raised the question of why foxes were easier to set up, and invited the counter that, if foxes could be led astray by so simple a manipulation as scenario proliferation, then surely the experiments reveal a threat to good judgment in their professional lives. Like hedgehogs caught in earlier epistemic predicaments, many foxes tried to defend what they had done. No one put it exactly this way, but they said, in essence, that foolish consistency is the hobgoblin of little minds. Do not worry that unpacking possibilities creates or reveals contradictions within belief systems. Preserving formal logic is not nearly as important as appreciating that life is indeterminate and full of surprises.

  Unlike hedgehogs, though, the more introspective foxes showed considerable curiosity about the mental processes underlying the scenario effect and the potential implications of the effect. One fox quickly connected two observations: on the one hand, the debiasing studies of hindsight showed that encouraging experts to imagine alternatives to reality “made things better” by one standard (more accurate recall of past states of mind) and on the other hand, the Cuban missile crisis and “Unmaking the West” studies showed that unpacking alternatives to reality “made things worse” by another standard (more incoherent probability judgments). Anticipating my own preferred value spin on the results, he observed: “Well, you have two offsetting sources of error. We need to figure out how to manage them.”

  CLOSING OBSERVATIONS

  Chapter 7 does not tell us whether, in any given case, observers struck the right balance between theory- and imagination-driven thinking. But the findings do sharpen our understanding of the criteria we use to make attributions of good judgment. On the one hand, scenario exercises can check hindsight bias and occasionally improve forecasting accuracy by stretching our conceptions of the possible. On the other hand, it is easy to overdo it when we start imagining “possible worlds.” Taking too many scenarios too seriously ties us into self-contradictory knots. Balancing these arguments, we might say that scenario exercises check theory-driven biases by activating countervailing imagination-driven biases, the cognitive equivalent of fighting fire with fire. And, though imagination-driven biases have not loomed large in this book as threats to good judgment, one could argue that people who make sub-additive probability judgments are at as much risk of making flawed decisions as people who are overconfident and poky belief updaters.

  Indeed, if we were to design correctives to imagination-driven biases, they would look like mirror images of the scenario generation exercises that we designed to correct theory-driven biases. To check runaway unpacking effects, people need plausibility pruners for cutting off speculation that otherwise grows like Topsy over the bounds of probability. And people naturally rely on their preconceptions about causality to figure out where to start pruning, where to start saying “That couldn’t happen because….”

  These tensions capture a metacognitive trade-off. Whether we know it or not, we are continually making decisions about how to decide, about how best to mix theory-driven and imagination-driven modes of thinking. Theory-driven thinking confers the benefits of closure and parsimony but desensitizes us to nuance, complexity, contingency, and the possibility that our theory is wrong. Imagination-driven thinking sensitizes us to possible worlds that could have been but exacts a price in confusion and even incoherence.

  Hedgehogs and foxes disagree over how to manage this trade-off. Hedgehogs put more faith in theory-driven judgments and keep their imaginations on tighter leashes than do foxes. Foxes are more inclined to entertain dissonant scenarios that undercut their own beliefs and preferences. Insofar as there are advantages to be accrued by engaging in self-subversive thinking—benefits such as appropriately qualifying conditional forecasts and acknowledging mistakes—foxes will reap them. Insofar as there are prices for suspending disbelief—diluting one’s confidence in sound predictions and being distracted by ephemera—foxes will pay them. To link this argument to nature-versus-nurture debates over the heritability of cognitive styles—admittedly a stretch—it would be surprising from a population genetics perspective if both cognitive styles were not well represented in the human genome today. Foxes were better equipped to survive in rapidly changing environments in which those who abandoned bad ideas quickly held the advantage. Hedgehogs were better equipped to survive in static environments that rewarded persisting with tried-and-true formulas. Our species—Homo sapiens—is better off for having both temperaments, and so too are the communities of specialists brought under the cognitive microscope in this volume.

  It would be a mistake, however, to depict theory- and imagination-driven cognition as equally powerful forces in mental life. Most of the time, theory-driven cognition trumps imagination-driven cognition for foxes and hedgehogs alike. The differences that arise are matters of degree, not reversals of sign. We all do it, but theory-driven hedgehogs are less apologetic about applying demanding “Must I believe this?” tests to disagreeable evidence. Just how overwhelming evidence must be to break this barrier is illustrated by the ridiculously high thresholds of proof that partisans set for conceding their side did something scandalous. It required the Watergate recordings to force Nixon defenders to acknowledge that he had obstructed justice, and it required DNA testing of Monica Lewinsky’s dress to compel Clinton defenders to concede that something improper had occurred in the Oval Office (at which point the defenders shifted into another belief system defense—trivialization). And we all do it, but theory-driven hedgehogs are also less apologetic about applying lax “Can I believe this?” tests to agreeable evidence. Just how easy it is to break this barrier is illustrated by the ridiculously low thresholds of proof that partisans set for rustling up evidence that supports their side or casts aspersions on the other. When we use this standard, we risk becoming the mental repositories of absurdities such as “Extraterrestrials are warning us to be better custodians of our planet: vote for Gore in 2000” or “Bill Clinton is an agent of the People’s Republic of China.”
