
Expert Political Judgment


by Philip E. Tetlock


  Of all the possible ways to tinker with probability scores, fuzzy-set adjustments most polarized reviewers of this book. Positivists warned of opening Pandora’s box. We can never conclude that anyone ever made a mistake if we lower the bar to the point where we accept at face value all rationalizations that smart people concoct to save face. By contrast, constructivists saw fuzzy-set adjustments as a welcome break from the “naïve” practice of coding reality into dichotomous on/off, zero/one, categories. We live in a world of shades of gray. Sometimes it makes sense to say that things that did not happen almost did or that things that have not happened still might or that exogenous shocks have thrown off the predictions of a sound theory.

  Our probability-scoring procedures are flexible enough to transform an irreconcilable philosophical feud into a tractable measurement problem. As the Technical Appendix describes, we drew on fuzzy-set theory to transform binary variables into continuous ones.11 With no adjustment, the probability score for an expert who assigned a .6 likelihood to a future that did not occur would be .36 (higher scores indicate worse fits between judgments and reality). But with adjustments that shrink reality-probability gaps in proportion to the frequency with which different groups of experts invoked belief system defenses and in proportion to the credibility weights we assign those defenses, the probability scores can be cut in half or more.
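  The scoring arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming a quadratic (Brier-style) score; the function names and the .3 credibility weight are hypothetical, chosen only to show how a proportional gap-shrinking adjustment can roughly halve a score.

```python
def probability_score(forecast: float, outcome: int) -> float:
    """Quadratic (Brier-style) score: squared gap between forecast and
    reality; higher scores indicate worse fits."""
    return (forecast - outcome) ** 2

def fuzzy_adjusted_score(forecast: float, outcome: int,
                         credibility_weight: float) -> float:
    """Shrink the forecast-reality gap in proportion to the credibility
    weight assigned to a belief system defense, then re-score."""
    gap = forecast - outcome
    return (gap * (1 - credibility_weight)) ** 2

# A .6 forecast of an event that did not occur (outcome = 0):
print(round(probability_score(0.6, 0), 4))           # 0.36
# Granting the defense a credibility weight of .3 shrinks the .6 gap
# to .42, cutting the score roughly in half:
print(round(fuzzy_adjusted_score(0.6, 0, 0.3), 4))   # 0.1764
```

With a credibility weight near 1, the gap, and hence the score, shrinks toward zero, which is why sufficiently generous weights can erase almost any forecasting error.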

  Can generous fuzzy-set adjustments eradicate the stubborn performance gap between hedgehogs and foxes? It depends on how defensible one views the belief system defenses. Hedgehogs gain ground from fuzzy-set adjustments for three reasons: first, the gaps between probability judgments and reality were bigger for hedgehogs (hence, hedgehogs benefit more in absolute terms from percentage-shrinkage, fuzzy-set adjustments); second, these initial probability-reality gaps were larger because hedgehogs more consistently exaggerated the likelihood of change for both better and worse (hence hedgehogs catch up only when we define the forecasting goal as distinguishing the status quo from change in either direction); third, hedgehogs resorted roughly twice as often as foxes to belief system defenses that triggered fuzzy-set adjustments (hence hedgehogs get the benefit of roughly twice as many adjustments). Figure 6.5 shows the predictable result: the advantage foxes enjoyed on forecasting skill disappears when we focus on distinguishing the status quo from change and assign large credibility weights to belief system defenses (greater than .6).

  This “victory” for hedgehogs is purchased, though, at a price many see as prohibitive. Positivists suspect that I have now made the transition from collegial open-mindedness to craven appeasement of solipsistic nonsense. Fuzzy-set adjustments mean that, when we take at face value the rationalizations that poor forecasters offer for being off the mark, these forecasters look as accurate as their less defensive and more accurate colleagues. Positivist critics also remind us of how selectively experts, especially hedgehogs, advanced close-call counterfactual, exogenous-shock, and off-on-timing interpretations of political outcomes. Chapter 4 showed these interpretive strategies became most popular when experts had embarrassingly large gaps between subjective probabilities and objective reality that needed closing. Experts rarely gave intellectual rivals who bet on the wrong outcome the same benefit of the close-call doubt. Large fuzzy-set adjustments thus reward self-serving biases in reasoning. If we reduce the adjustments in proportion to the selectivity with which belief system defenses were invoked (thus punishing self-serving reasoning), the performance gaps in forecasting skill reappear.

  What role remains for fuzzy-set adjustments? The answer is, fittingly, fuzzy. Constructivists are right: there is a certain irreducible ambiguity about which rationales for forecasting glitches should be dismissed as rationalizations and which should be taken seriously. But the positivists are right: hedgehogs can only achieve cognitive parity if we permit implausibly large fuzzy-set adjustments that reflect the greater propensity of hedgehogs to explain away mistakes.

  Figure 6.5. The gap between hedgehogs and foxes narrows, and even disappears, when we apply fuzzy-set adjustments that give increasingly generous credibility weights to belief system defenses. Crossover occurs with sufficiently extreme adjustments when we define the forecasting task as distinguishing continuation of the status quo from change in either direction (up or down). Lower scores on y axis are better scores.

  A PARADOX: WHY CATCH-UP IS FAR MORE ELUSIVE FOR THE AVERAGE INDIVIDUAL THAN FOR THE GROUP AVERAGE

  The previous defenses tried, rather futilely, to raise the average hedgehog forecaster to parity with the average fox forecaster. But all scoring adjustments were applied at an individual level of analysis. There may be a better way to salvage hedgehogs’ collective reputation in the forecasting arena.

  It is an intriguing mathematical fact that the inferiority of the average forecaster from a group need not imply the inferiority of the average forecast from that group.12 With respect to the current dataset, for instance, Jensen’s inequality tells us that, for quadratic variables such as probability scores, the average accuracy of individual forecasters will typically be worse than the accuracy of the average of all their forecasts. Jensen’s inequality also implies that this gap between the average forecaster and the average forecast will be greater for groups—such as hedgehogs—that make more extreme (higher variance) forecasts. Consistent with this analysis, we find that, whereas the probability score for the average of all fox forecasts is only slightly superior to that of the average fox forecaster (.181 versus .186), the probability score for the average of all hedgehog forecasts is massively superior to that of the average hedgehog forecaster (.184 versus .218). The average fox forecast beats about 70% of foxes; the average hedgehog forecast beats 95% of hedgehogs.
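  The Jensen's-inequality point can be verified with a toy computation; the forecast values below are invented for illustration and are not the book's data.

```python
def brier(forecast: float, outcome: int) -> float:
    """Quadratic probability score (lower is better)."""
    return (forecast - outcome) ** 2

outcome = 0  # the predicted event did not occur

groups = {
    "foxes": [0.3, 0.4, 0.5],        # moderate, low-variance forecasts
    "hedgehogs": [0.05, 0.4, 0.9],   # extreme, high-variance forecasts
}

for label, forecasts in groups.items():
    # Score of the average forecaster: average the individual scores.
    avg_of_scores = sum(brier(f, outcome) for f in forecasts) / len(forecasts)
    # Score of the average forecast: average first, then score.
    score_of_avg = brier(sum(forecasts) / len(forecasts), outcome)
    print(label, round(avg_of_scores, 4), round(score_of_avg, 4))
```

For the low-variance fox group, averaging the forecasts barely improves matters (0.1667 versus 0.16), while for the high-variance hedgehog group the improvement is dramatic (0.3242 versus 0.2025), mirroring the pattern reported above: offsetting extreme errors is what rescues the hedgehog composite.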

  Why do we finally find catch-up here? The political-psychological intuition behind this result is that, relative to the congenitally balanced foxes, the intellectually aggressive hedgehogs make more extreme mistakes in all possible directions and thus derive disproportionate benefit when we let their errors offset each other in composite forecasts. Foxes do intuitively what averaging does statistically, and what hedgehogs individually largely fail to do: blend perspectives with nonredundant predictive information. Defenders of hedgehogs could take this as a vindication of sorts: national security advisers do not do appreciably worse relying on the average predictions of hedgehog analysts than on those of fox analysts. But it seems more reasonable to the author to take this result as reinforcing the analysis in chapter 3 of why hedgehogs performed consistently more poorly than foxes. Hedgehogs lost because their cognitive style was less well suited for tracking the trajectories of complex, evolving social systems.

  Really Not Incorrigibly Closed-minded Defense

  The focus shifts here from “who got what right” to “who changed their minds when they were wrong.” Defenders of hedgehogs can try to parry the charge that their clients are bad Bayesians by arguing that the indictment rests on bad philosophy of science. Drawing on “post-positivist philosophy of science,” they warn us of the perils of “naïve falsificationism” and admonish us that all hypothesis testing rests on background assumptions about what constitutes a fair test.13 We can justify modus tollens, the inference from “if hypothesis p is true, then q” to “~q therefore ~p,” only if we can show that the conditions for applying hypothesis p to the world have been satisfied and q truly did not occur.

  But this challenge too ultimately stands or falls on the defensibility of the belief system defenses that losers of reputational bets invoke to sever modus tollens. These bets asked forecasters to estimate the probabilities of possible futures conditional on the correctness of their perspective on the underlying drivers of events at work and then conditional on the correctness of a rival perspective. If I say that the survival of the Soviet Union is likely given my view of the world (.8) but unlikely given your view of the world (.2), and the Soviet Union collapses, I am under a Bayesian obligation to change my mind about the relative merits of our worldviews and to do so in proportion to the extremity of the odds I originally offered (4:1).
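  The Bayesian obligation described here can be written out as an odds-form update. In this sketch, the even (1:1) prior odds between the two worldviews are an illustrative assumption; only the .8/.2 likelihoods come from the passage above.

```python
def posterior_odds(prior_odds: float, p_evidence_given_mine: float,
                   p_evidence_given_yours: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_evidence_given_mine / p_evidence_given_yours)

# My worldview put Soviet survival at .8, hence collapse at .2;
# your worldview put collapse at .8.
p_collapse_given_mine = 1 - 0.8   # 0.2
p_collapse_given_yours = 1 - 0.2  # 0.8

# Observing the collapse should shift even prior odds to 1:4 against my
# worldview, i.e. a posterior probability of .2 that my view is right.
odds = posterior_odds(1.0, p_collapse_given_mine, p_collapse_given_yours)
prob_mine = odds / (1 + odds)
print(round(odds, 2), round(prob_mine, 2))   # 0.25 0.2
```

The belief system defenses in the next paragraph work precisely by blocking this update: if the test conditions were never satisfied, the likelihood ratio arguably no longer applies.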

  But should I change my mind if the necessary conditions for applying “my theory” to the world were not fulfilled, if exogenous shocks undercut the ceteris paribus requirement for all fair tests of hypotheses, if the predicted event almost occurred and still might, and if I now believe prediction exercises to be so riddled with indeterminacy as to be meaningless? Each defense provides a plausible reason for supposing that, whatever experts thought the odds were years ago, they now see things in a different light that justifies their refusals to change their minds.

  There is endless room for wrangling over the merits of specific applications of specific defenses. But there is still no avoiding the observation in Chapter 4 that experts activated defenses in a curiously asymmetrical fashion. They almost never mobilized defenses to help out adversaries whose most likely scenarios failed to materialize. Virtually no one says: “Don’t hold the nonoccurrence of y against those who assigned steep odds in favor of the proposition ‘if x, then y’ because the conditions for x were never satisfied.” The inescapable conclusion is that experts are far fussier about signing off on the background conditions for testing their own pet ideas than they are for testing those of their opponents.

  In a similar vein, one virtually never hears forecasters try to spare their rivals embarrassment by insisting that, although the predicted outcome did not occur in the specified time frame, it almost did (close-call defense) and perhaps soon will (off-on-timing defense). These defenses acknowledge that, although technically y failed to occur, it almost did, and those who said it would deserve credit for being almost right, not blame for being wrong by so whisker-thin a margin. Hypothesis testing can be easily thrown off by lustful presidents, drunken coup plotters, determined assassins, and the exact timing of the puncturing of market bubbles—as long as the hypothesis at stake is one’s own.

  How should we weigh these arguments? On the one hand, we can exonerate hedgehogs of charges of reneging more often on reputational bets if we allow generous fuzzy-set adjustments that assign large credibility weights (.7 or higher) to the close-call, off-on-timing, or exogenous-shock defenses. On the other hand, we also know that experts in general, and hedgehogs in particular, invoked these defenses in suspiciously self-serving ways. A balanced appraisal is that, although the normative verdicts reached in chapters 3 and 4 may need more case-specific qualifications, the overall verdicts stand.

  Rebutting Accusations of Double Standards

  Chapter 5 showed that experts, especially hedgehogs, advanced three reasons for dismissing dissonant evidence that they rarely applied to congenial evidence: challenging the authenticity of documents, the representativeness of documents, and the motives of investigators. Follow-up interviews revealed that hedgehogs saw little need to apologize for upholding a lenient standard for congenial evidence and a tough standard for disagreeable evidence. Their attitude was: “The more ludicrous the claim, the higher the hurdle its promoters should jump.” Claims that contradict established knowledge merit sharp scrutiny.14

  By contrast, foxes were more flustered by the unveiling of their own double standards and quicker to sense the risks of the hedgehog defense of double standards: one will never change one’s mind if one always accepts poor research with agreeable conclusions and hammers away at good research with disagreeable conclusions. The question becomes: When do double standards become unacceptable? Many philosophers of science argue that one reaches that point when one becomes insensitive to variations in research quality and attends only to the agreeableness of the results.15 One’s viewpoint then becomes impregnable to evidence. Using this standard, reanalysis of the Kremlin archives study reveals that, although foxes and hedgehogs were both more inclined to accept discoveries that meshed with their ideological preconceptions, foxes were at least moderately responsive to the quality of the underlying research, whereas hedgehogs were seemingly oblivious. This finding is a warning that, although setting higher standards for dissonant claims is sometimes defensible, hedgehogs risk taking the policy too far.

  Rebutting Accusations of Using History to Prop Up One’s Prejudices

  Chapter 5 showed that experts, especially hedgehogs with strong convictions, used a number of lines of logical defense to neutralize close-call counterfactuals that undercut pet theories. We also saw that hedgehogs saw no need to apologize for their dismissive stance toward close-call counterfactuals. They felt that learning from history would become impossible if we paid attention to every whimsical what-if that might stray through the minds of dreamy observers. Confronted with comparisons to foxes’ greater openness to close-call scenarios, hedgehogs saw the contrast in a light favorable to themselves: “We (hedgehogs) know how to follow through on the historical implications of an argument” and they (foxes) “tie themselves up in self-contradictory knots.” Foxes, of course, saw things differently: they suspected hedgehogs of being “heavy-handed determinists” (“How can they be so confident about things no one could know?”). Foxes thought that they knew when to rein in the impulse to generalize.

  Lacking definitive correspondence standards for assessing how re-routable history was at given junctures, some say we are left with utterly subjective judgment calls.16 Logic and evidence cannot, however, be completely shut out. A higher-order inconsistency bedevils many hedgehogs and foxes alike, but especially hedgehogs: the tendency to fit the distant past into neat deterministic patterns, coupled with the tendency, especially right after forecasting failures, to portray the recent past as riddled with contingency. Inasmuch as there is no compelling reason to suppose that the recent past is more stochastic than the distant past, this pattern is suggestive of cognitive illusion.

  Defending the Hindsight Bias

  Chapter 4 also showed that hedgehogs were more susceptible to hindsight effects: to exaggerate the degree to which “they saw it coming all along.” Defenders of hedgehogs can, however, challenge the characterization of hindsight bias as a cognitive defect. Hindsight may be an adaptive mechanism that “unclutters our minds by tossing out inaccurate information—a kind of merge/purge for the mind … a mental bargain, a cheap price for a much larger cognitive gain: a well-functioning memory that can forget what we do not need—such as outdated knowledge—and that constantly updates our knowledge, thereby increasing the accuracy of our inferences.”17 As one exasperated expert commented, “We all once believed in Santa Claus. You don’t think I keep track of every screwy belief I once held.”

  Granting that the hindsight effect is cognitively convenient does not, however, alter its status as a mistake in the strictest correspondence meaning of the term—a deviation from reality that makes it difficult to learn from experience. Hindsight bias forecloses the possibility of surprises, and surprises—because they are hard to forget—play a critical role in learning when and where our beliefs failed us. Indeed, neuroscientists have begun to pinpoint the mechanisms involved: surprise and learning are both linked to neural activation in the dorsolateral prefrontal cortex. The key point should not, though, require magnetic resonance imaging of brain functions. Aesop’s fables warn us of the dangers of a smug “knew it all along” attitude toward the world.

  We Posed the Wrong Questions

  Imperfect though they may be, there are still standards for judging replies to forecasting and belief-updating questions. But it gets progressively harder to tease out testable implications from the sixth, seventh, and eighth hedgehog defenses: “The researchers asked the wrong questions of the wrong people at the wrong time.” The only response is sometimes: “Well, if you think you’d get different results by posing different types of questions to different types of people, go ahead.” That is how science is supposed to proceed.

  In fairness to hedgehogs, though, some offered the “wrong questions” protest well before we knew who got what right. They told me from the outset that their province is the long view, measured not in months or years but rather in decades, generations, and occasionally centuries.18 One hedgehog expressed the view of many: “I cannot tell you what will happen tomorrow in Kosovo or Kashmir or Korea. These idiotic squabbles will go on until exhaustion sets in. But I can tell you that irreversible trends are at work in the world today.”

  We noted in chapter 3 that hedgehogs found comprehensive worldviews congenial and were drawn to three major offerings from the late twentieth-century marketplace of ideas: optimistic-rationalist positions that predict that states will be compelled by competitive pressures to become increasingly democratic and capitalistic; more pessimistic identity politics–neorealist positions that predict that peoples, and the nation-states they inhabit, will be forever divided by bitter in-group–out-group distinctions; and still more depressing neo-Malthusian views that predict ever-nastier ecocatastrophes and conflicts between the haves and the have-nots.

  Each school of thought spun elaborate justifications for its profoundly different view of the long-term future. And although it will take a long time to sort out which predictions will prove prophetic, there is no reason to believe that the relative performance of hedgehogs would improve even if we waited decades. Hedgehogs are ideologically diverse. Only a few of them can be right in each long-term competition. The best bet thus remains on the less audacious foxes.

  We Failed to Talk to Properly Qualified and/or Properly Motivated People at the Right Time

 
