Expert Political Judgment


by Philip E. Tetlock


  Of the thirteen items in the Styles-of-Reasoning Questionnaire, we drew eight from Kruglanski’s need-for-closure scale: (1) “Having clear rules and order at work is essential for success”; (2) “Even after I have made up my mind about something, I am always eager to consider a different opinion”; (3) “I dislike questions that can be answered in many different ways”; (4) “I usually make important decisions quickly and confidently”; (5) “When considering most conflict situations, I can usually see how both sides could be right”; (6) “It is annoying to listen to someone who cannot seem to make up his or her mind”; (7) “I prefer interacting with people whose opinions are very different from my own”; (8) “When trying to solve a problem I often see so many possible options that it is confusing.” The remaining items (9–13) were as follows: (9) “In a famous essay, Isaiah Berlin classified intellectuals as hedgehogs or foxes. The hedgehog knows one big thing and tries to explain as much as possible within that conceptual framework, whereas the fox knows many small things and is content to improvise explanations on a case-by-case basis. I place myself toward the hedgehog or fox end of this scale”; (10) “Scholars are usually at greater risk of exaggerating how complex the world is than they are of underestimating how complex it is”; (11) “We are closer than many think to achieving parsimonious explanations of politics”; (12) “I think politics is more cloudlike than clocklike (‘cloudlike’ meaning inherently unpredictable; ‘clocklike’ meaning perfectly predictable if we have adequate knowledge)”; (13) “The more common error in decision making is to abandon good ideas too quickly, not to stick with bad ideas too long.” Maximum likelihood factor analysis of all thirteen items (with quartimin rotation) yielded the two-factor solution reported in table 2 of chapter 3. Our analyses focused on the first and largest factor—the hedgehog-fox factor—in large part because the second factor (decisiveness) explained so little of the variance in theoretically significant outcomes. The high-loading items on the first factor (0.25 and greater) defined the hedgehog-fox measure used in most of the analyses in the book (Cronbach’s alpha = 0.81).
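
  For readers unfamiliar with the reliability statistic cited for the hedgehog-fox measure, the following is a minimal sketch of how Cronbach's alpha is conventionally computed from item responses. The function name `cronbach_alpha` and the toy ratings are hypothetical illustrations, not the study's data or code, and the sketch assumes that all items are already scored in the same direction (any reverse-keyed items flipped beforehand).

```python
from statistics import variance


def cronbach_alpha(items):
    """Conventional Cronbach's alpha.

    items: list of k lists, one per questionnaire item, each holding the
    scores of the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent, summed across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the individual item variances (sample variance).
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings: 3 items answered by 5 respondents.
toy_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(toy_items)
```

  Alpha rises toward 1.0 as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.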

  Research Procedures and Materials

  All respondents were given a Possible-Futures Questionnaire that introduced the study in this way: “Although political forecasting is obviously an inexact science, educated guesswork is still critical for setting priorities and making contingency plans. Your answers to the forecasting questions posed here will not be traceable either to you personally or to any institution with which you may be affiliated. Our goal is not to proclaim ‘winners’ and ‘losers’ in a forecasting contest but rather to study how highly trained professionals reason about complex real-world processes under conditions of uncertainty.”

  We began systematic collection of forecasts, and reactions to the degrees of confirmation or disconfirmation of those forecasts, in 1987–1988 and continued in periodic spurts through 2003. The forecasting exercises solicited subjective probability judgments of possible futures of approximately sixty nations. These nations had been clustered into nine categories: (1) the Soviet bloc, which initially included the Soviet Union (time series discontinued at end of 1991 and broken into Russia, the Ukraine, and Kazakhstan), Poland, German Democratic Republic (discontinued in 1990 with German reunification), Czechoslovakia (broken into the Czech Republic and Slovakia in 1993), Hungary, Romania, Bulgaria, and non–Warsaw Pact member Yugoslavia (discontinued in 1991 and broken into three of its republics, Slovenia, Croatia, and Serbia); (2) a European Union cluster that included the four largest economies—the United Kingdom, France, Federal Republic of Germany, and Italy; (3) North America (United States and Canada); (4) Central and Latin America, including Mexico, Cuba, Venezuela, Brazil, Argentina, and Chile; (5) the Arab world, including Egypt, Syria, Iraq, Saudi Arabia, Libya, and Sudan, plus Israel, Turkey, and Iran; (6) sub-Saharan Africa, including a “Horn of Africa” subgroup (Somalia, Ethiopia, and, as of 1993, Eritrea), a west Africa subgroup (Nigeria, Ghana, Ivory Coast, Sierra Leone, and Liberia), a central Africa group (Zaire, Angola, Zimbabwe, Uganda, Rwanda, and Burundi), and a southern Africa group (South Africa and Mozambique); (7) China; (8) Northeast Asia (Japan, North and South Korea); (9) Southeast Asia (Vietnam, Thailand, Malaysia, and Indonesia). We also conducted several more specialized exercises that cut across regional expertise (described later). The pool of respondents included at least ten specialists in each region or functional domain examined here, and, in the cases of the Soviet bloc, the Arab world, North America, and the European Union, in excess of twenty.

  The typical testing session was divided into three phases. First, experts answered the previously described questions that probed their professional backgrounds, preferred styles of thinking, and ideological and theoretical commitments. Second, they judged the probabilities of short- and longer-term futures for at least two nations within their regional specialty. Third, experts played the role of “dilettantes” and ventured probability judgments of possible futures for at least two nations drawn from regions of the world with which they were less familiar (nations chosen to balance the number of dilettante predictions obtained across regions of the world). Experts working within their specialty were also sometimes asked to make more complex reputational bets that required estimating the likelihood of possible futures conditional, first, on their own perspective on the underlying forces at work being correct and conditional, second, on the most influential rival perspective being correct. These reputational bets allowed us to assess the degree to which experts updated their beliefs like good Bayesians and are the focus of chapter 4 and section II of this appendix.
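
  The logic of a reputational bet can be sketched with Bayes' rule. The example below is only an illustration under stated assumptions: it treats the expert's own perspective and the most influential rival perspective as the only two candidates, and the function name `bayes_posterior` and all numbers are hypothetical rather than taken from the study.

```python
def bayes_posterior(prior_own, p_outcome_given_own, p_outcome_given_rival):
    """Posterior probability that the expert's own perspective is correct
    once the outcome has occurred.

    Assumes (for illustration) that the own and rival perspectives are
    mutually exclusive and jointly exhaustive.
    """
    prior_rival = 1.0 - prior_own
    numerator = prior_own * p_outcome_given_own
    evidence = numerator + prior_rival * p_outcome_given_rival
    if evidence == 0:
        raise ValueError("outcome was judged impossible under both perspectives")
    return numerator / evidence


# Hypothetical expert: 80% confident in own view, but the outcome that
# occurred was judged three times likelier under the rival view.
posterior = bayes_posterior(prior_own=0.8,
                            p_outcome_given_own=0.2,
                            p_outcome_given_rival=0.6)
```

  Comparing a value such as `posterior` with the confidence the expert actually reports after the outcome is known gives one way to ask whether belief change is as large as Bayes' rule prescribes.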

  We assured experts that we understood no human being could possess detailed knowledge of all the topics covered in these exercises and urged experts to assign just-guessing confidence whenever they felt they knew nothing that would justify elevating one set of possibilities over the others. We also gave participants brief facts-on-file summaries of recent developments (“to refresh memories and ensure a minimal common knowledge base”). And it is worth emphasizing that all data were collected with strict guarantees of confidentiality to both individuals and to the organizations with which they were affiliated. These assurances were necessary for both practical reasons (many experts would participate only under such ground rules) and substantive reasons (the objective of our measurement efforts is to tap into what experts really think, not what public positions experts deem it prudent to take).

  Forecasting questions had to satisfy five criteria:

  1. Pass the clairvoyance test. This test requires defining possible futures clearly enough that, if a genuine clairvoyant were in the room, that person could gaze into her crystal ball and tell you whether the forecast was right or wrong, with no need to return to the forecaster with bothersome requests for ex post respecifications of the sort typically needed in less formal forecasting exercises (“What exactly did you mean by ‘a Polish Peron’ or ‘continuing tension in Kashmir’ or ‘moderate growth in Japan’?”). We sought easily verifiable public indicators.

  2. Pass the exclusivity and exhaustiveness tests. We relied on formal probability theory to assess the accuracy and coherence of the probability judgments that forecasters attached to possible futures. But probabilities are supposed to add to 1.0 if and only if the possibilities do not overlap and if and only if the possibilities exhaust the universe of outcomes. It was necessary therefore to define the boundaries between possible futures with care. Sometimes this was easy. Certain criterion variables had “natural” break points. In the language of measurement theory, they formed either nominal scales that took 0 or 1 values or ordinal scales that permitted rough rank-order comparisons of degree of change. Examples include leadership transitions (e.g., Will “X” still be president or prime minister?), border changes (e.g., Will the state’s borders remain the same, expand, or contract?), and entry into or exit from international security regimes (e.g., NATO, Warsaw Pact, nonproliferation treaty) or trade regimes/monetary union (e.g., GATT, WTO, EU, NAFTA).

  Other criterion variables were continuous in character. In the language of measurement theory, they formed ratio scales with equal-spaced intervals between values and nonarbitrary zero points. Examples include GDP growth rates, central government debt as percentage of GDP, state-owned enterprises as percentage of GDP, defense spending as percentage of GDP, stock market closes, and currency exchange rates. The confidence interval was usually defined by plus or minus 0.5 of a standard deviation of the previous five or ten years of values of the variable. Experts were then asked to judge the subjective probability of future values falling below, within, or above the specified band. For example, if GDP growth had been 2.5 percent in the most recently available year, and if the standard deviation of growth values in the last ten years had been 1.5 percent, then the confidence band would have been bounded by 1.75 percent and 3.25 percent.

  3. Pass the “don’t bother me too often with dumb questions” test. Questions that made sense in some parts of the world made little sense in others. No one expected a coup in the United States or United Kingdom, but many regarded coups as serious possibilities in Saudi Arabia, Nigeria, and so on. Experts guffawed at judging the nuclear proliferation risk posed by Canada or Norway but not at the risks posed by Pakistan or North Korea. Some “ridiculous questions” were thus deleted.

  4. Include questions that vary in difficulty and allow us to assess wide individual or group differences in forecasting skill. Range of item difficulty was ensured by varying the temporal range of forecasts (short versus long-term), the regional focus (zone of turbulence versus stability) and the anticipated variance in outcomes (judging from past base rates, border and regime changes are rare, whereas shifts in unemployment or inflation are quite common).

  5. Avoid value-charged language. This “just the facts, ma’am” rule requires defining possible futures in a reasonably neutral fashion designed to minimize offense among even stridently partisan participants. We could say that central government debt rises above 120 percent of GDP but not that pork-barrel politics is out of control; that cross-border violence is rising but not that the bloodthirsty Israeli or Pakistani aggressors have struck again; that nondemocratic regimes change but not that nations have been liberated or have fallen under even more oppressive yokes. Of course, perfect value neutrality is unattainable, but it is worth trying to approximate it.

  Scoring Rules

  Although experts sometimes made “0” and “1” predictions (saying that x was impossible or inevitable), they mostly expressed degrees of uncertainty about what would happen. And—with the exception of card-carrying Bayesians—they mostly preferred to express that uncertainty via familiar verbal hedges: Gorbachev will “almost certainly” fail or John Major will “probably” lose or there is a “good chance” Pakistan will set off a nuclear test or the likelihood of peaceful transition to majority rule is “vanishingly small.”

  We had to coax participants to translate these idiosyncratic estimates of uncertainty onto standardized probability scales. From a psychometric perspective, however, the advantages of quantifying uncertainty outweighed the inevitable complaints about “the pseudoscientific artificiality” of affixing probability estimates to unique events. There is just no systematic way to check the accuracy of casual talk about alternative futures, but we can check the accuracy of subjective probability judgments. Quantification gives us a framework for assessing—across many forecasts on many occasions—the correspondence between the subjective likelihoods and objective frequencies of events (e.g., calibration and discrimination measures).
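
  To make the idea of comparing subjective likelihoods with objective frequencies concrete, here is a minimal sketch of one standard way to compute calibration and discrimination indices (the terms of the Murphy decomposition of the Brier score). It is offered as an illustration, not as the exact scoring code used in the study; the function name and the toy forecasts are hypothetical.

```python
from collections import defaultdict


def calibration_and_discrimination(forecasts):
    """forecasts: list of (stated_probability, outcome) pairs, where
    outcome is 1 if the event occurred and 0 otherwise.

    Returns (calibration_index, discrimination_index). Lower calibration
    values and higher discrimination values indicate better forecasting.
    """
    # Group outcomes by the probability value the forecaster stated.
    groups = defaultdict(list)
    for p, outcome in forecasts:
        groups[p].append(outcome)

    n = len(forecasts)
    base_rate = sum(outcome for _, outcome in forecasts) / n

    calibration = 0.0
    discrimination = 0.0
    for p, outcomes in groups.items():
        observed_freq = sum(outcomes) / len(outcomes)
        # Gap between stated probability and observed frequency.
        calibration += len(outcomes) * (p - observed_freq) ** 2
        # Spread of observed frequencies around the overall base rate.
        discrimination += len(outcomes) * (observed_freq - base_rate) ** 2
    return calibration / n, discrimination / n


# Hypothetical forecaster who says 0.0 or 1.0 and is always right.
toy = [(0.0, 0), (0.0, 0), (1.0, 1), (1.0, 1)]
cal, disc = calibration_and_discrimination(toy)
```

  A forecaster who always states the base rate would also score a calibration index of zero but would show no discrimination, which is why the two measures are reported together.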

  The methods of eliciting subjective probabilities were quite similar for variables with different metric properties. We carved possible futures into logically exhaustive and mutually exclusive sets, usually three of them. Experts then assigned probabilities to each set of possibilities, with the constraint that those judgments had to sum to 1.0. A typical three-possible-futures scale looked like this:

  We told experts that, for three-possible-futures forecasts, they should treat 0.33 as the point of maximum uncertainty. We gave them a special “maximum uncertainty” box, which, we stressed, they should use only if they felt that they had no relevant knowledge for judging one possible set of futures to be more likely than the other(s) (“just-guessing” values). We also told experts the conditions for assigning the values of zero (only if sure that it is impossible one of the possible futures could occur in the specified period) and of 1.0 (only if sure that it is inevitable one of the possible futures will occur in the specified period). Although at opposing ends of the scale, ratings of zero and 1.0 share a critical psychological property: both represent movement from uncertainty into certainty.
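
  The sum-to-1.0 constraint and the "just-guessing" value can be expressed as a simple check. This is a minimal sketch with hypothetical function names; note that the 0.33 mentioned above is the rounded form of 1/3, so the sketch uses the exact fraction.

```python
import math


def just_guessing(n_futures):
    """Maximum-uncertainty forecast: equal probability for each future."""
    return [1.0 / n_futures] * n_futures


def check_forecast(probs, tol=1e-9):
    """Return True if probs is a coherent forecast over mutually exclusive
    and exhaustive futures; raise ValueError otherwise."""
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("each probability must lie between 0 and 1.0")
    if not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError("probabilities must sum to 1.0")
    return True


# A three-possible-futures forecast (below, within, above the band).
ok = check_forecast([0.2, 0.5, 0.3])
uniform_ok = check_forecast(just_guessing(3))
```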

  For continuous, ratio-scale variables, experts were given the most recently available value of the variable and presented with a confidence interval around that value. The confidence interval was usually defined by plus or minus 0.5 of a standard deviation of the previous five years of values of the variable (or previous five elections). Experts then judged the subjective probability of future values falling below, within, or above the specified band. For example, if GDP growth had been 2.5 percent in the most recently available year, and if the standard deviation of growth values in the last five years had been 1.5 percent, then the confidence band would have been bounded by 1.75 percent and 3.25 percent. For experimental purposes, we occasionally broke the set of possible futures into four or even more categories to capture variation in judgments of extreme possibilities (e.g., severe recessions or sustained economic booms) as well as to assess the impact of “unpacking” possible futures into increasingly differentiated subsets.
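
  The band construction in the GDP example can be reproduced in a few lines. This is a minimal sketch with hypothetical function names. Two details are assumptions, since the text does not specify them: the sample standard deviation is used when computing from raw history, and values exactly on a boundary are counted as "within".

```python
from statistics import stdev


def confidence_band(latest_value, sd, half_width_in_sd=0.5):
    """Band of plus or minus half_width_in_sd standard deviations
    around the most recently available value."""
    margin = half_width_in_sd * sd
    return latest_value - margin, latest_value + margin


def band_from_history(history):
    """Build the band from past values, oldest first. Uses the sample
    standard deviation (an assumption; the text does not say which)."""
    return confidence_band(history[-1], stdev(history))


def classify(future_value, band):
    """Which of the three possible futures a realized value falls in.
    Boundary values are treated as 'within' (an assumption)."""
    lower, upper = band
    if future_value < lower:
        return "below"
    if future_value > upper:
        return "above"
    return "within"


# GDP example from the text: latest growth 2.5, standard deviation 1.5.
band = confidence_band(2.5, 1.5)   # (1.75, 3.25)
outcome = classify(4.0, band)      # "above"
```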

  In principle, the total number of subjective probability forecasts obtained in the regional forecasting exercises should have numbered 95,472: the logical result of 284 forecasters each making short-term and long-term predictions for each of four nations (two inside and two outside their domains of expertise) on seventeen outcome variables (on average), each of which was typically broken down into three possible futures and thus required three separate probability estimates. In reality, as the result of substantial amounts of missing data due to forecasters not answering each question posed, there were 82,361 subjective probability estimates (derived from responses to approximately 27,450 forecasting questions).

  Types of Forecasting Questions

  Most questions were posed in exercises that were conducted in 1988 and 1992, came in both short-time-horizon and longer-time-horizon versions, and fell in one of the following four broad “content” categories.

  CONTINUITY OF DOMESTIC POLITICAL LEADERSHIP

  For established democracies, should we expect that, after either the next election (short-term) or the next two elections (longer-term), the party that currently has the most representatives in the legislative branch(es) of government will retain this status, will lose this status, or will strengthen its position (separate judgments for bicameral systems)? For democracies with presidential elections, should we expect that after the next election or next two elections, the current incumbent/party will lose control, will retain control with reduced popular support, or will retain control with greater popular support? Confidence bands around the status quo option were based on the variance either in seats controlled or in popular vote over the last five elections.

  For states with shakier track records of competitive elections, should we expect that, in either the next five or ten years, the individuals and (separate judgment) political parties/movements currently in charge will lose control, will retain control but weather major challenges to their authority (e.g., coup attempts, major rebellions), or will retain control without major challenges? Also, for less stable polities, should we expect the basic character of the political regime to change in the next five or ten years and, if so, will it change in the direction of increased or reduced economic freedom, increased or reduced political freedom, and increased or reduced corruption? Should we expect over the next five or ten years that interethnic and other sectarian violence will increase, decrease, or remain about the same? Finally, should we expect state boundaries—over the next ten or twenty-five years—to remain the same, expand, or contract and—if boundaries do change—will it be the result of peaceful or violent secession by a subnational entity asserting independence or the result of peaceful or violent annexation by another nation-state?

  DOMESTIC POLICY AND ECONOMIC PERFORMANCE

  With respect to policy, should we expect—over the next two or five years—increases, decreases, or essentially no changes in marginal tax rates, central bank interest rates, central government expenditures as percentage of GDP, annual central government operating deficit as percentage of GDP, and the size of state-owned sectors of the economy as percentage of GDP? Should we expect—again over the next two or five years—shifts in government priorities such as percentage of GDP devoted to education or to health care? With respect to economic performance, should we expect—again over the next two or five years—growth rates in GDP to accelerate, decelerate, or remain about the same? What should our expectations be for inflation and unemployment over the next two or five years? Should we expect—over the next five or ten years—entry into or exit from membership in free-trade agreements or monetary unions?

 
