THE EXPERT PROBLEM, OR THE TRAGEDY OF THE EMPTY SUIT
So far we have not questioned the authority of the professionals involved but rather their ability to gauge the boundaries of their own knowledge. Epistemic arrogance does not preclude skills. A plumber will almost always know more about plumbing than a stubborn essayist and mathematical trader. A hernia surgeon will rarely know less about hernias than a belly dancer. But their probabilities, on the other hand, will be off—and, this is the disturbing point, you may know much more on that score than the expert. No matter what anyone tells you, it is a good idea to question the error rate of an expert’s procedure. Do not question his procedure, only his confidence. (As someone who was burned by the medical establishment, I learned to be cautious, and I urge everyone to be: if you walk into a doctor’s office with a symptom, do not listen to his odds of its not being cancer.)
I will separate the two cases as follows. The mild case: arrogance in the presence of (some) competence, and the severe case: arrogance mixed with incompetence (the empty suit). There are some professions in which you know more than the experts, who are, alas, people for whose opinions you are paying—instead of them paying you to listen to them. Which ones?
What Moves and What Does Not Move
There is a very rich literature on the so-called expert problem, running empirical testing on experts to verify their record. But it seems to be confusing at first. On one hand, we are shown by a class of expert-busting researchers such as Paul Meehl and Robyn Dawes that the “expert” is the closest thing to a fraud, performing no better than a computer using a single metric, their intuition getting in the way and blinding them. (As an example of a computer using a single metric, the ratio of liquid assets to debt fares better than the majority of credit analysts.) On the other hand, there is abundant literature showing that many people can beat computers thanks to their intuition. Which one is correct?
There must be some disciplines with true experts. Let us ask the following questions: Would you rather have your upcoming brain surgery performed by a newspaper’s science reporter or by a certified brain surgeon? On the other hand, would you prefer to listen to an economic forecast by someone with a PhD in finance from some “prominent” institution such as the Wharton School, or by a newspaper’s business writer? While the answer to the first question is empirically obvious, the answer to the second one isn’t at all. We can already see the difference between “know-how” and “know-what.” The Greeks made a distinction between technē and epistēmē. The empirical school of medicine of Menodotus of Nicomedia and Heraclides of Tarentum wanted its practitioners to stay closest to technē (i.e., “craft”), and away from epistēmē (i.e., “knowledge,” “science”).
The psychologist James Shanteau undertook the task of finding out which disciplines have experts and which have none. Note the confirmation problem here: if you want to prove that there are no experts, then you will be able to find a profession in which experts are useless. And you can prove the opposite just as well. But there is a regularity: there are professions where experts play a role, and others where there is no evidence of skills. Which are which?
Experts who tend to be experts: livestock judges, astronomers, test pilots, soil judges, chess masters, physicists, mathematicians (when they deal with mathematical problems, not empirical ones), accountants, grain inspectors, photo interpreters, insurance analysts (dealing with bell curve–style statistics).
Experts who tend to be … not experts: stockbrokers, clinical psychologists, psychiatrists, college admissions officers, court judges, counselors, personnel selectors, intelligence analysts (the CIA’s record, in spite of its costs, is pitiful), unless one takes into account some great dose of invisible prevention. I would add these results from my own examination of the literature: economists, financial forecasters, finance professors, political scientists, “risk experts,” Bank for International Settlements staff, august members of the International Association of Financial Engineers, and personal financial advisers.
Simply, things that move, and therefore require knowledge, do not usually have experts, while things that don’t move seem to have some experts. In other words, professions that deal with the future and base their studies on the nonrepeatable past have an expert problem (with the exception of the weather and businesses involving short-term physical processes, not socioeconomic ones). I am not saying that no one who deals with the future provides any valuable information (as I pointed out earlier, newspapers can predict theater opening hours rather well), but rather that those who provide no tangible added value are generally dealing with the future.
Another way to see it is that things that move are often Black Swan–prone. Experts are narrowly focused persons who need to “tunnel.” In situations where tunneling is safe, because Black Swans are not consequential, the expert will do well.
Robert Trivers, an evolutionary psychologist and a man of supernormal insights, has another answer (he became one of the most influential evolutionary thinkers since Darwin with ideas he developed while trying to go to law school). He links it to self-deception. In fields where we have ancestral traditions, such as pillaging, we are very good at predicting outcomes by gauging the balance of power. Humans and chimps can immediately sense which side has the upper hand, and make a cost-benefit analysis about whether to attack and take the goods and the mates. Once you start raiding, you put yourself into a delusional mind-set that makes you ignore additional information—it is best to avoid wavering during battle. On the other hand, unlike raids, large-scale wars are not something present in human heritage—we are new to them—so we tend to misestimate their duration and overestimate our relative power. Recall the underestimation of the duration of the Lebanese war. Those who fought in the Great War thought it would be a mere cakewalk. So it was with the Vietnam conflict, so it is with the Iraq war, and just about every modern conflict.
You cannot ignore self-delusion. The problem with experts is that they do not know what they do not know. Lack of knowledge and delusion about the quality of your knowledge come together—the same process that makes you know less also makes you satisfied with your knowledge.
Next, instead of the range of forecasts, we will concern ourselves with the accuracy of forecasts, i.e., the ability to predict the number itself.
How to Have the Last Laugh
We can also learn about prediction errors from trading activities. We quants have ample data about economic and financial forecasts—from general data about large economic variables to the forecasts and market calls of the television “experts” or “authorities.” The abundance of such data and the ability to process it on a computer make the subject invaluable for an empiricist. If I had been a journalist, or, God forbid, a historian, I would have had a far more difficult time testing the predictive effectiveness of these verbal discussions. You cannot process verbal commentaries with a computer—at least not so easily. Furthermore, many economists naïvely make the mistake of producing a lot of forecasts concerning many variables, giving us a database of economists and variables, which enables us to see whether some economists are better than others (there is no consequential difference) or if there are certain variables for which they are more competent (alas, none that are meaningful).
I was in a seat to observe from very close our ability to predict. In my full-time trader days, a couple of times a week, at 8:30 A.M., my screen would flash some economic number released by the Department of Commerce, or Treasury, or Trade, or some such honorable institution. I never had a clue about what these numbers meant and never saw any need to invest energy in finding out. So I would not have cared the least about them except that people got all excited and talked quite a bit about what these figures were going to mean, pouring verbal sauce around the forecasts. Among such numbers you have the Consumer Price Index (CPI), Nonfarm Payrolls (changes in the number of employed individuals), the Index of Leading Economic Indicators, Sales of Durable Goods (dubbed “doable girls” by traders), the Gross Domestic Product (the most important one), and many more that generate different levels of excitement depending on their presence in the discourse.
The data vendors allow you to take a peek at forecasts by “leading economists,” people (in suits) who work for the venerable institutions, such as J. P. Morgan Chase or Morgan Stanley. You can watch these economists talk, theorizing eloquently and convincingly. Most of them earn seven figures and they rank as stars, with teams of researchers crunching numbers and projections. But the stars are foolish enough to publish their projected numbers, right there, for posterity to observe and assess their degree of competence.
Worse yet, many financial institutions produce booklets every year-end called “Outlook for 200X,” reading into the following year. Of course they do not check how their previous forecasts fared after they were formulated. The public might have been even more foolish in buying the arguments without requiring the following simple tests—easy though they are, very few of them have been done. One elementary empirical test is to compare these star economists to a hypothetical cabdriver (the equivalent of Mikhail from Chapter 1): you create a synthetic agent, someone who takes the most recent number as the best predictor of the next, while assuming that he does not know anything. Then all you have to do is compare the error rates of the hotshot economists and your synthetic agent. The problem is that when you are swayed by stories you forget about the necessity of such testing.
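To make the test concrete, here is a minimal sketch of what comparing the star economists to the synthetic cabdriver might look like, assuming you have the realized values of one series and an economist’s one-step-ahead forecasts for the same periods (all numbers and names below are invented for illustration, not actual data):

```python
import numpy as np

# Hypothetical data: realized values of one economic series and a star
# economist's one-step-ahead forecasts for the same periods.
actual    = np.array([2.1, 2.4, 1.9, 2.8, 1.2, 3.0])
economist = np.array([2.3, 2.2, 2.5, 2.0, 2.9, 1.4])

# The synthetic agent ("cabdriver"): the most recent observed number
# is taken as the best predictor of the next one.
naive  = actual[:-1]   # forecasts for periods 1..n-1
target = actual[1:]    # what actually happened in those periods

def mae(forecast, realized):
    """Mean absolute error of a forecast series."""
    return float(np.mean(np.abs(forecast - realized)))

print("economist MAE:", mae(economist[1:], target))
print("naive MAE:    ", mae(naive, target))
# If the economist's error is not clearly smaller than the naive agent's,
# the expensive forecasts add nothing.
```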
Events Are Outlandish
The problem with prediction is a little more subtle. It comes mainly from the fact that we are living in Extremistan, not Mediocristan. Our predictors may be good at predicting the ordinary, but not the irregular, and this is where they ultimately fail. All you need to do is miss one interest-rates move, from 6 percent to 1 percent in a longer-term projection (what happened between 2000 and 2001) to have all your subsequent forecasts rendered completely ineffectual in correcting your cumulative track record. What matters is not how often you are right, but how large your cumulative errors are.
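A toy illustration of this point, with invented numbers: a forecaster who is “right” almost every period but misses the one big move can end up with a worse cumulative record than one who is never exactly right but stays closer to the surprise.

```python
import numpy as np

# Invented series: rates hover near 6 percent, then collapse to 1 percent.
actual  = np.array([6.0, 6.1, 6.0, 5.9, 6.0, 1.0])
precise = np.array([6.0, 6.1, 6.0, 5.9, 6.0, 6.0])  # right five times, misses the collapse
vague   = np.array([5.5, 5.5, 5.5, 5.5, 5.5, 3.0])  # never exactly right, closer to the big move

print("precise forecaster, total absolute error:", np.sum(np.abs(precise - actual)))  # 5.0
print("vague forecaster,   total absolute error:", np.sum(np.abs(vague - actual)))    # 4.5
```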
And these cumulative errors depend largely on the big surprises, the big opportunities. Not only do economic, financial, and political predictors miss them, but they are quite ashamed to say anything outlandish to their clients—and yet events, it turns out, are almost always outlandish. Furthermore, as we will see in the next section, economic forecasters tend to fall closer to one another than to the resulting outcome. Nobody wants to be off the wall.
Since my testing has been informal, for commercial and entertainment purposes, for my own consumption and not formatted for publishing, I will use the more formal results of other researchers who did the dog work of dealing with the tedium of the publishing process. I am surprised that so little introspection has been done to check on the usefulness of these professions. There are a few—but not many—formal tests in three domains: security analysis, political science, and economics. We will no doubt have more in a few years. Or perhaps not—the authors of such papers might become stigmatized by their colleagues. Out of close to a million papers published in politics, finance, and economics, there have been only a small number of checks on the predictive quality of such knowledge.
Herding Like Cattle
A few researchers have examined the work and attitude of security analysts, with amazing results, particularly when one considers the epistemic arrogance of these operators. In a study comparing them with weather forecasters, Tadeusz Tyszka and Piotr Zielonka document that the analysts are worse at predicting, while having a greater faith in their own skills. Somehow, the analysts’ self-evaluation did not decrease their error margin after their failures to forecast.
Last June I bemoaned the dearth of such published studies to Jean-Philippe Bouchaud, whom I was visiting in Paris. He is a boyish man who looks half my age though he is only slightly younger than I, a matter that I half jokingly attribute to the beauty of physics. Actually he is not exactly a physicist but one of those quantitative scientists who apply methods of statistical physics to economic variables, a field that was started by Benoît Mandelbrot in the late 1950s. This community does not use Mediocristan mathematics, so they seem to care about the truth. They are completely outside the economics and business-school finance establishment, and survive in physics and mathematics departments or, very often, in trading houses (traders rarely hire economists for their own consumption, but rather to provide stories for their less sophisticated clients). Some of them also operate in sociology with the same hostility on the part of the “natives.” Unlike economists who wear suits and spin theories, they use empirical methods to observe the data and do not use the bell curve.
He surprised me with a research paper that a summer intern had just finished under his supervision and that had just been accepted for publication; it scrutinized two thousand predictions by security analysts. What it showed was that these brokerage-house analysts predicted nothing—a naïve forecast made by someone who takes the figures from one period as predictors of the next would not do markedly worse. Yet analysts are informed about companies’ orders, forthcoming contracts, and planned expenditures, so this advanced knowledge should help them do considerably better than a naïve forecaster looking at the past data without further information. Worse yet, the forecasters’ errors were significantly larger than the average difference between individual forecasts, which indicates herding. Normally, forecasts should be as far from one another as they are from the predicted number. But to understand how they manage to stay in business, and why they don’t develop severe nervous breakdowns (with weight loss, erratic behavior, or acute alcoholism), we must look at the work of the psychologist Philip Tetlock.
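Before moving on, a rough sketch of the herding check just described (all data invented): if the forecasters’ average error against reality is much larger than their average distance from one another, they are clustering around each other rather than around the outcome.

```python
import numpy as np

actual = 4.0                                      # the realized number
forecasts = np.array([2.0, 2.2, 1.9, 2.1, 2.3])   # five analysts' forecasts (hypothetical)

avg_error      = np.mean(np.abs(forecasts - actual))             # distance from reality
avg_dispersion = np.mean(np.abs(forecasts - forecasts.mean()))   # distance from one another

print("average error vs. reality:    ", avg_error)       # 1.9
print("average spread among analysts:", avg_dispersion)   # 0.12
# Error far exceeds dispersion: the analysts herd.
```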
I Was “Almost” Right
Tetlock studied the business of political and economic “experts.” He asked various specialists to judge the likelihood of a number of political, economic, and military events occurring within a specified time frame (about five years ahead). The outcomes represented a total number of around twenty-seven thousand predictions, involving close to three hundred specialists. Economists represented about a quarter of his sample. The study revealed that experts’ error rates were clearly many times what they had estimated. His study exposed an expert problem: there was no difference in results whether one had a PhD or an undergraduate degree. Well-published professors had no advantage over journalists. The only regularity Tetlock found was the negative effect of reputation on prediction: those who had a big reputation were worse predictors than those who had none.
But Tetlock’s focus was not so much to show the real competence of experts (although the study was quite convincing with respect to that) as to investigate why the experts did not realize that they were not so good at their own business, in other words, how they spun their stories. There seemed to be a logic to such incompetence, mostly in the form of belief defense, or the protection of self-esteem. He therefore dug further into the mechanisms by which his subjects generated ex post explanations.
I will leave aside how one’s ideological commitments influence one’s perception and address the more general aspects of this blind spot toward one’s own predictions.
You tell yourself that you were playing a different game. Let’s say you failed to predict the weakening and precipitous fall of the Soviet Union (which no social scientist saw coming). It is easy to claim that you were excellent at understanding the political workings of the Soviet Union, but that these Russians, being exceedingly Russian, were skilled at hiding from you crucial economic elements. Had you been in possession of such economic intelligence, you would certainly have been able to predict the demise of the Soviet regime. It is not your skills that are to blame. The same might apply to you if you had forecast the landslide victory for Al Gore over George W. Bush. You were not aware that the economy was in such dire straits; indeed, this fact seemed to be concealed from everyone. Hey, you are not an economist, and the game turned out to be about economics.
You invoke the outlier. Something happened that was outside the system, outside the scope of your science. Given that it was not predictable, you are not to blame. It was a Black Swan and you are not supposed to predict Black Swans. Black Swans, NNT tells us, are fundamentally unpredictable (but then I think that NNT would ask you, Why rely on predictions?). Such events are “exogenous,” coming from outside your science. Or maybe it was an event of very, very low probability, a thousand-year flood, and we were unlucky to be exposed to it. But next time, it will not happen. This focus on the narrow game and linking one’s performance to a given script is how the nerds explain the failures of mathematical methods in society. The model was right, it worked well, but the game turned out to be a different one than anticipated.
The “almost right” defense. Retrospectively, with the benefit of a revision of values and an informational framework, it is easy to feel that it was a close call. Tetlock writes, “Observers of the former Soviet Union who, in 1988, thought the Communist Party could not be driven from power by 1993 or 1998 were especially likely to believe that Kremlin hardliners almost overthrew Gorbachev in the 1991 coup attempt, and they would have if the conspirators had been more resolute and less inebriated, or if key military officers had obeyed orders to kill civilians challenging martial law or if Yeltsin had not acted so bravely.”