Expert Political Judgment

by Philip E. Tetlock


  Good judgment, then, is a precarious balancing act. We often learn we have gone too far in one direction only after it is too late to pull back. Executing this balancing act requires cognitive skills of a high order: the capacity to monitor our own thought processes for telltale signs of excessive closed- or open-mindedness and to strike a reflective equilibrium faithful to our conceptions of the norms of fair intellectual play. We need to cultivate the art of self-overhearing, to learn how to eavesdrop on the mental conversations we have with ourselves as we struggle to strike the right balance between preserving our existing worldview and rethinking core assumptions. This is no easy art to master. If we listen to ourselves carefully, we will often not like what we hear. And we will often be tempted to laugh off the exercise as introspective navel-gazing, as an infinite regress of homunculi spying on each other … all the way down. No doubt, such exercises can be taken to excess. But, if I had to bet on the best long-term predictor of good judgment among the observers in this book, it would be their commitment—their soul-searching Socratic commitment—to thinking about how they think.

  1 “Judgmental performance” here refers only to those most readily measurable aspects of good judgment: the empirical accuracy and logical coherence of subjective probability forecasts. This chapter does not test the claim—central to the livelihoods of scenario consultants but notoriously difficult to quantify—that scenario exercises stimulate contingency planning that more than compensates for the transaction costs of the scenario exercises.

  2 Koriat, Fischhoff, and Lichtenstein, “Over-Confidence”; P. E. Tetlock and J. Kim, “Accountability and Overconfidence in a Personality Prediction Task,” Journal of Personality and Social Psychology 52 (1987): 700–709.

  3 C. Anderson, “Inoculation and Counter-explanation: Debiasing Techniques in the Perseverance of Social Theories,” Social Cognition 1 (1982): 126–139.

  4 Hawkins and Hastie, “Hindsight.”

  5 The Economist in October 2001 characterized scenario planning as the most popular approach to protecting big organizations from nasty surprises lurking in the ill-defined future.

  6 Schwartz, “Long View.”

  7 Schwartz, “Long View.”

  8 J. A. Ogilvy, P. Schwartz, and J. Flower, China’s Futures (San Francisco, CA: Jossey-Bass, 2000).

  9 Moreover, there is a good chance, given the capacious capacity of human beings to rationalize choices and the stingy feedback history provides, that consumers themselves are not “in the know.” If we had relied in this project on experts’ self-assessments of whether they were overconfident or self-justifying—instead of assessing their performance—we would have concluded that “all’s well because the experts tell us so.” Even now, it is a safe bet that few readers think of themselves as systematically biased thinkers.

  10 These imagination-driven biases are fueled by dramatizing scenarios in ways that, on the one hand, make it easier to transport ourselves into the fictional universe but, on the other, make the overall scenario increasingly improbable by any logical standard. As a result of this perverse inverse relationship between the psychological impact of stories and the cumulative likelihood of their event linkages, more imaginative thinkers become more susceptible to making self-contradictory judgments that violate basic precepts of probability theory. They find themselves endorsing oxymoronic assertions such as “I believe that by this point outcome x was inevitable but alternatives to x were still possible.” They also assign higher likelihoods to vividly embellished scenarios than they assign to the abstract sets of possibilities from which those scenarios were derived and of which they are therefore subsets. The result is exactly what Amos Tversky’s support theory predicts: reverse Gestalt effects in which people judge the probability of the whole to be less than the sum of its exclusive and exhaustive parts.
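  A hedged arithmetic illustration of this inverse relationship (the probabilities here are hypothetical, chosen only to make the point): if each of three event linkages in a scenario is judged 70 percent likely given the links that precede it, the cumulative probability of the full scenario can be no higher than

\[
P(A \cap B \cap C) = P(A)\,P(B \mid A)\,P(C \mid A \cap B) = 0.7 \times 0.7 \times 0.7 \approx 0.34,
\]

yet each added layer of vivid detail typically raises, rather than lowers, the intuitive plausibility of the whole. Support theory makes the complementary “unpacking” prediction noted above: the judged probabilities of exclusive and exhaustive parts sum to more than the judged probability of the whole they jointly constitute.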

  11 Support theory has held up in many samples—from undergraduates to options traders to physicians—so presumably it also applies to professional observers of the political scene.

  12 A. Tversky and D. Kahneman, “Extensional vs. Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment,” Psychological Review 90 (1983): 293–315.

  13 P. E. Tetlock, “The Logic and Psycho-logic of Counterfactual Thought Experiments in the Rise-of-the-West Debate,” in Unmaking the West: What-If Scenarios That Rewrite World History, ed. P. E. Tetlock, R. N. Lebow, and G. Parker (Ann Arbor: University of Michigan Press, 2006); M. C. Green and T. C. Brock, “The Role of Transportation in the Persuasiveness of Narratives,” Journal of Personality and Social Psychology 79(5) (2000): 701–721; A. Tversky and C. Fox, “Weighting Risk and Uncertainty,” Psychological Review 102(2) (1995): 269–283.

  14 J. S. Carroll, “The Effect of Imagining an Event on Expectations for the Event: An Interpretation in Terms of the Availability Heuristic,” Journal of Experimental Social Psychology 14 (1978): 88–96; L. Ross, M. R. Lepper, F. Strack, and J. Steinmetz, “Social Explanation and Social Expectation: Effects of Real and Hypothetical Explanations on Subjective Likelihood,” Journal of Personality and Social Psychology 35 (1977): 817–829.

  15 Foxes’ greater susceptibility to scenario-generation effects is a result that holds up well in this chapter, regardless of whether experts were contemplating possible futures or possible pasts. These results nicely parallel those of experimental work demonstrating the greater susceptibility of low-need-for-structure (or closure) respondents to divergent-thinking manipulations. See E. Hirt, F. Kardes, and K. Markman, “Activating a Mental-simulation Mindset through Generation of Alternatives: Implications for Debiasing in Related and Unrelated Domains,” Journal of Experimental Social Psychology 40 (2004): 374–383.

  16 This manipulation bears a strong similarity to those employed in laboratory work on debiasing. See Hawkins and Hastie, “Hindsight.”

  17 P. E. Tetlock, R. N. Lebow, and G. Parker, eds., Unmaking the West: What-If Scenarios That Rewrite World History (Ann Arbor: University of Michigan Press, 2006); Tetlock and Lebow, “Poking Counterfactual Holes.”

  CHAPTER 8

  Exploring the Limits on Objectivity and Accountability

  “When I use a word,” Humpty Dumpty said, in a rather scornful tone, “it means just what I choose it to mean—neither more nor less.” “The question is,” said Alice, “whether you can make words mean so many different things.” “The question is,” said Humpty Dumpty, “which is to be master—that’s all.”

  —LEWIS CARROLL

  You never know what is enough unless you know what is more than enough.

  —WILLIAM BLAKE

  OBJECTIVITY was the bedrock principle on which professional societies of historians and social scientists were founded in the nineteenth century. The disciplinary mandate was to move closer, by successive imperfect approximations, toward the truth, a truth unadorned by mocking quotation marks. Well before the twentieth century’s end, however, scholars started chiseling at this foundation, raising doubts about the positivist project and the feasibility of sharp distinctions between observer and observed, fact and value, and even fact and fiction.1 Constructivist and relativist epistemologies—which depicted “truth” as perspectival and demanded to know “whose truth”—garnered considerable respectability.

  My own sympathies should not be in doubt. This research program has been unabashedly neopositivist in conception and design. In study after study, I exhorted experts to translate inchoate private hunches into precise public assertions that I classified as accurate or inaccurate, defensible or indefensible, and duly entered into my expandable correlation matrix of indicators and predictors of good judgment. From a neopositivist perspective, it is tempting to close on a defiant note, to declare that until someone comes up with something demonstrably better, these imperfect measures are reasonable approximations of an elusive construct.

  I divide this final chapter into two sections. The first section grapples with the philosophical objections raised in chapter 1 against this entire enterprise. It is organized around a Socratic exchange among rival perspectives on what the behavioral and social sciences have to offer. Constructivist and postmodernist critics of objectivity remind us that, as they prophesied, the pursuit of objectivity did occasionally bog down. “Truth” is to some degree perspectival and socially constructed: what counts as good judgment hinges on judgment calls on the “right” value, probability weighting, and controversy adjustments to probability scores, on the “best” estimates of base rates for difficulty adjustments, and on the “correct” credibility weights to assign experts’ rationales for refusing to change their minds when the unexpected occurs. Positivist proponents of objectivity remind us of the dangers of cutting experts too much slack by permitting too many adjustments. It is possible to define empirical and logical standards of accountability that transcend partisan wrangling and that allow us to gauge the judgmental performance of experts, from diverse points of view, on common metrics.

  If there is a grand moral here, it is that there is no quick measurement fix to the traditional tension between subjectivist and objectivist approaches to human judgment. The significance of this effort does not lie in the exact balance struck between conflicting views of good judgment; it lies in the broader precedent of using objectivist methods to factor difficult-to-quantify, subjectivist objections into the measurement process. The crisscrossing epistemological divisions of our day—qualitative versus quantitative, subjective versus objective, constructivist versus positivist—are not as irreconcilable as some suppose.

  The second section of the chapter grapples with the policy implications of this effort to objectify standards for judging judgment. The motivation for the project was not solely a basic science one: the opportunity to test hypotheses about the political-psychological correlates and determinants of susceptibility to judgmental biases. The motivation was also, in part, an applied science one. I suspected at the outset that we as a society would be better off if we held experts—be they pundits in the public arena or intelligence analysts behind the scenes—systematically accountable to standards of evidence that command broad assent across the spectrum of reasonable opinion. Subsequent findings from this project—as well as events over the last twenty years—have reinforced my suspicion that there is something wrong with existing mechanisms for getting to the truth both in the media-driven marketplace of ideas and in the top-secret world of intelligence analysis. Indeed, one of the more disconcerting results of this project has been the discovery of an inverse relationship between how well experts do on our scientific indicators of good judgment and how attractive these experts are to the media and other consumers of expertise. The same self-assured hedgehog style of reasoning that suppresses forecasting accuracy and slows belief updating translates into compelling media performances: attention-grabbing bold predictions that are rarely checked for accuracy and that, when found to be wrong, forecasters steadfastly defend as “soon to be right,” “almost right,” or the “right mistakes” to have made given the available information and choices.

  From a broadly nonpartisan perspective, the situation cries out for remedy. And from the scientific vantage point offered by this project, the natural remedy is to apply our performance metrics to actual controversies: to pressure participants in debates—be they passionate partisans or dispassionate analysts—to translate vague claims into testable predictions that can be scored for empirical accuracy and logical defensibility. Of course, the resistance would be fierce, especially from those with the most to lose—those with grand reputations and humble track records. But I do still believe it possible to raise the quality of debate by tracking the quality of claims and counterclaims that people routinely make about what could have been (if you had had any sense, you would have listened to us!) or might yet be (if you have any sense, listen now!). The knowledge that one’s forecasting batting average and reputation for honoring reputational bets are at stake may motivate even the most prone-to-self-justification hedgehogs, and the most prone-to-groupthink groups, to try harder to distinguish what they really know about the future from what they suspect, hope, or fear might be the case.2

  CLASHING PHILOSOPHICAL PERSPECTIVES

  Relativists have doubted this undertaking from the start.3 I find it useful, though, to distinguish degrees of doubt. Adamant relativists give no ground. Look at chapter 6, they chortle, and count the dubious assumptions and scoring adjustments that the author had to make in gauging whether hedgehogs are worse forecasters or pokier belief updaters than foxes. Look at chapter 7 and enumerate the judgment calls the author had to make in weighing the mind-opening benefits of scenario exercises against the costs of cognitive chaos. Good judgment, the harsh indictment runs, is a quicksilver concept forever slipping from our positivist grasp. Less doctrinaire relativists give some ground: they concede we may have learned something useful about the linkages between styles of reasoning and good judgment variously conceived, but reaffirm their antipathy toward any effort to “objectify” anything as profoundly “intersubjective” as good judgment.

  I also find it useful to array neopositivist replies to these complaints along a conciliatoriness continuum. At the confrontational end are hardliners who believe I made too many concessions to the “thinly disguised whining” of sore losers who refuse to admit their mistakes. At the conciliatory end, where I place myself, are more accommodating responses to relativists: “Yes, we do live in a controversy-charged, ambiguity-laden world that maps imperfectly onto right-wrong classification schemes” and “Yes, the assumptions underlying our measures of good judgment are open to moral, metaphysical, and historical challenges.” But even we “accommodationists” are willing to give up only so much to those who insist on the futility of all efforts to objectify good judgment. We have to draw the line somewhere.

  Let us then populate our “Socratic dialogue” with four speakers: unrelenting and reasonable relativist critics as well as accommodating and hard-line neopositivist defenders. Readers can judge for themselves who proves the most incisive interlocutor.

  Unrelenting Relativist

  Occasionally, the author comes tantalizingly close to grasping the self-contradictions in which he has ensnared himself. He tried to capture a value-laden construct with a net of value-neutral measures and, not surprisingly, he came up empty. We have to wait until chapter 6, though, for the author finally to recognize that his centerpiece correspondence measure of good judgment, the probability score, is flawed in a multitude of ways. And even here, he refuses to acknowledge the key implication of these flaws: the impossibility of developing a theory-free, value-free measure of “getting it right.” Instead, the author resorts to desperate patch-ups: difficulty adjustments to cope with variation in the unpredictability of forecasting tasks, value adjustments to cope with variation in the priorities that forecasters attach to avoiding different mistakes, controversy adjustments to cope with variation in views on what really happened, and fuzzy-set adjustments to cope with variation in views on what almost happened or might yet happen. The author is delusional if he thinks these patch-ups bring him closer to “Platonic true” scores on a good-judgment continuum.

  Each patch-up raises more questions than it answers. Consider the challenge of computing the “correct” base rate for difficulty-adjusted probability scores. Which time frames should we use to ascertain how often there is leader turnover or shifts in central government expenditure or …? Which comparison cases should we use for Ethiopia in the early 1990s: “sub-Saharan African countries” or “former Soviet bloc nations transitioning from Communism” or “dictatorships transitioning to democracy”? Is it even meaningful to talk about base rates for many outcomes? There was only one Soviet Union. What do we learn by lumping it with other multiethnic empires (a small set populated by diverse entities that resist all but the most circumscribed comparisons)?

  Base rates represent inappropriate intrusions of probability theory into domains where well-defined sets of comparison cases do not exist. As one defiant participant declared: “This ain’t survey research.” And I am not swayed by the author’s tinkering with alternative base-rate estimates in difficulty adjustments. These pseudoscientific “sensitivity analyses”—raising or lowering arbitrary estimates by arbitrary fudge factors—are fig leaves for ignorance.

  Or consider the daunting problems of value-adjusting probability scores. It requires only glancing familiarity with the political scene to guess where true believers will line up. Those on the left have traditionally tried to avoid false alarms that treat status quo states as expansionist or that push harsh austerity measures on developing countries. Those on the right have harbored the mirror-image concerns. But, of course, more nuanced thinkers are less predictable. Their error-avoidance priorities take different forms in different circumstances—judgments that require the “excessively generous value adjustments” the author explicitly abjures.4

  The author is too stingy in granting value adjustments. But I have a more fundamental objection. Like difficulty adjustments, value adjustments are utterly arbitrary. In both cases, the author tries to conceal the capriciousness of the process under the scientific rhetoric of “gauging robustness” and “assessing boundary conditions.” Tinkering with alternative value adjustments—shrinking or expanding gaps between subjective probabilities and “objective reality” by plucking coefficients from the air—is just another fig leaf.

  And that brings us to the most desperate of all the patch-ups—controversy and fuzzy-set adjustments—where we can no longer dodge the intersubjective character of objectivity. Underlying all of the author’s correspondence and belief-updating measures of good judgment is the naïve assumption that things either did or did not happen and, once we know the truth, we can code reality as zero or 1.0, and then assess forecasting accuracy (by computing deviations between subjective probabilities and objective realities) or appropriate belief updating (by computing deviations between observed change and that required by earlier reputational bets).
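  To make concrete the correspondence measure the relativist is attacking, here is a minimal sketch, assuming the probability score takes its conventional Brier form: the mean squared deviation between subjective probability forecasts and outcomes coded as zero or 1.0. The function name and the toy data are purely illustrative, not drawn from the project itself.

```python
# Minimal sketch of a Brier-style probability score: mean squared deviation
# between subjective probability forecasts and reality coded as 0.0 or 1.0.
# Lower scores indicate better correspondence; 0.0 is a perfect record.

def probability_score(forecasts, outcomes):
    """Average squared gap between forecast probabilities and coded outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical example: three forecasts, of which only the first event occurred.
forecasts = [0.9, 0.2, 0.6]
outcomes = [1.0, 0.0, 0.0]
print(probability_score(forecasts, outcomes))  # ≈ 0.137
```

  The difficulty, value, controversy, and fuzzy-set adjustments discussed above can be read as modifications to the inputs or weights of a computation of roughly this kind, which is exactly why the relativist insists that so much hinges on how those modifications are chosen.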

 
