In short, for a healthy person, there is a small probability of disastrous outcomes (discounted because unseen and not taken into account), and a high probability of mild benefits.
FIGURE 32. Nonlinearities in biology. The convex-concave shape necessarily flows from any dose-response that is increasing (monotone, i.e., never decreasing) and bounded, with maximum and minimum values, i.e., one that does not reach infinity on either side. At low levels, the dose response is convex (gradually more and more effective); additional doses then tend to become gradually ineffective or start hurting. The same can apply to anything consumed with too much regularity. This type of graph necessarily applies to any situation bounded on both sides, with a known minimum and maximum (saturation), which includes happiness.
For instance, if one considers that there exists a maximum level of happiness and unhappiness, then the general shape of this curve with convexity on the left and concavity on the right has to hold for happiness (replace “dose” with “wealth” and “response” with “happiness”). Kahneman-Tversky prospect theory models a similar shape for “utility” of changes in wealth, which they discovered empirically.
FIGURE 33. Recall the hypertension example. On the vertical axis, we have the benefits of a treatment, on the horizontal, the severity of the condition. The arrow points at the level where probabilistic gains match probabilistic harm. Iatrogenics disappears nonlinearly as a function of the severity of the condition. This implies that when the patient is very ill, the distribution shifts to antifragile (thicker right tail), with large benefits from the treatment over possible iatrogenics, little to lose.
Note that if you increase the treatment you hit concavity from maximum benefits, a zone not covered in the graph—seen more broadly, it would look like the preceding graph.
FIGURE 34. The top graph shows hormesis for an organism (similar to Figure 19): we can see a stage of benefits as the dose increases (initially convex), slowing down into a phase of harm as we increase the dose a bit further (initially concave); then we see things flattening out at the level of maximum harm (beyond a certain point the organism is dead, so there is such a thing as a bounded and known worst-case scenario in biology). To the right, a wrong graph of hormesis as shown in medical textbooks: the beginning looks linear or slightly concave, missing the initial convexity.
THE INVERSE TURKEY PROBLEM
FIGURE 35. Antifragile, Inverse Turkey Problem: The unseen rare event is positive. When you look at a positively skewed (antifragile) time series and make inferences about the unseen, you miss the good stuff and underestimate the benefits (the Pisano, 2006a, 2006b, mistake). On the bottom, the other Harvard problem, that of Froot (2001). The filled area corresponds to what we do not tend to see in small samples, from insufficiency of points. Interestingly, the shaded area increases with model error. The more technical sections call these zones ωB (turkey) and ωC (inverse turkey).
DIFFERENCE BETWEEN POINT ESTIMATES AND DISTRIBUTIONS
Let us apply this analysis to how planners make the mistakes they make, and why deficits tend to be worse than planned:
FIGURE 36. The gap between predictions and reality: probability distribution of outcomes from costs of projects in the minds of planners (top) and in reality (bottom). In the first graph they assume that the costs will be both low and quite certain. The graph on the bottom shows outcomes to be both worse and more spread out, particularly with higher possibility of unfavorable outcomes. Note the fragility increase owing to the swelling left tail.
This misunderstanding of the effect of uncertainty applies to government deficits, plans that have IT components, travel time (to a lesser degree), and many more. We will use the same graph to show model error from underestimating fragility by assuming that a parameter is constant when it is random. This is what plagues bureaucrat-driven economics (next discussion).
Appendix II (Very Technical):
WHERE MOST ECONOMIC MODELS FRAGILIZE AND BLOW PEOPLE UP
When I said “technical” in the main text, I may have been fibbing. Here I am not.
The Markowitz incoherence: Assume that someone tells you that the probability of an event is exactly zero. You ask him where he got this from. “Baal told me” is the answer. In such case, the person is coherent, but would be deemed unrealistic by non-Baalists. But if on the other hand, the person tells you “I estimated it to be zero,” we have a problem. The person is both unrealistic and inconsistent. Something estimated needs to have an estimation error. So probability cannot be zero if it is estimated, its lower bound is linked to the estimation error; the higher the estimation error, the higher the probability, up to a point. As with Laplace’s argument of total ignorance, an infinite estimation error pushes the probability toward ½.
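A minimal numeric sketch of this point (my construction, not the author's): treat a probability "estimated at zero" as a true value lying somewhere in [0, 1] around the point estimate, with spread equal to the estimation error. As the error grows, the implied probability is pushed toward ½.

```python
import numpy as np

rng = np.random.default_rng(42)

def implied_probability(p_hat, est_error, n=1_000_000):
    """Mean of a probability 'estimated' as p_hat with a given estimation
    error, modeled here (an assumption) as Gaussian noise truncated to
    the admissible interval [0, 1]."""
    draws = rng.normal(p_hat, est_error, n)
    draws = draws[(draws >= 0) & (draws <= 1)]  # keep only admissible values
    return draws.mean()

# A point estimate of "exactly zero" with a growing estimation error:
for err in [0.01, 0.1, 0.5, 2.0, 10.0]:
    print(f"error = {err:5.2f} -> implied probability ~ {implied_probability(0.0, err):.3f}")
# The implied probability climbs toward 1/2 as the error grows,
# echoing Laplace's argument of total ignorance.
```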
We will return to the implication of the mistake; take for now that anything estimating a parameter and then putting it into an equation is different from estimating the equation across parameters (same story as the health of the grandmother, the average temperature, here “estimated” is irrelevant, what we need is average health across temperatures). And Markowitz showed his incoherence by starting his “seminal” paper with “Assume you know E and V” (that is, the expectation and the variance). At the end of the paper he accepts that they need to be estimated, and what is worse, with a combination of statistical techniques and the “judgment of practical men.” Well, if these parameters need to be estimated, with an error, then the derivations need to be written differently and, of course, we would have no paper—and no Markowitz paper, no blowups, no modern finance, no fragilistas teaching junk to students.… Economic models are extremely fragile to assumptions, in the sense that a slight alteration in these assumptions can, as we will see, lead to extremely consequential differences in the results. And, to make matters worse, many of these models are “back-fit” to assumptions, in the sense that the hypotheses are selected to make the math work, which makes them ultrafragile and ultrafragilizing.
Simple example: Government deficits.
We use the following deficit example owing to the way calculations by governments and government agencies currently miss convexity terms (and have a hard time accepting it). Really, they don’t take them into account. The example illustrates:
(a) missing the stochastic character of a variable known to affect the model but deemed deterministic (and fixed), and
(b) missing that F, the function of such a variable, is convex or concave with respect to that variable.
Say a government estimates unemployment for the next three years as averaging 9 percent; it uses its econometric models to issue a forecast balance B of a two-hundred-billion deficit in the local currency. But it misses (like almost everything in economics) that unemployment is a stochastic variable. Unemployment over a three-year period has fluctuated by 1 percent on average. We can calculate the effect of the error with the following:
Unemployment at 8%, Balance B(8%) = −75 bn (improvement of 125 bn)
Unemployment at 9%, Balance B(9%) = −200 bn
Unemployment at 10%, Balance B(10%) = −550 bn (worsening of 350 bn)
The concavity bias, or negative convexity bias, from underestimation of the deficit is −112.5 bn, since ½ {B(8%) + B(10%)} = −312.5 bn, not −200 bn. This is the exact case of the inverse philosopher’s stone.
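The arithmetic of the example, as a short Python sketch (numbers taken from the text above):

```python
# Forecast balance (bn, local currency) at the three unemployment levels:
B = {8: -75, 9: -200, 10: -550}

point_estimate = B[9]                # the planner's single-value forecast
interpolated = (B[8] + B[10]) / 2    # average over the +/- 1% perturbation

print(interpolated)                   # -312.5 bn, not -200 bn
print(interpolated - point_estimate)  # -112.5 bn: the concavity bias
```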
FIGURE 37. Nonlinear transformations allow the detection of both model convexity bias and fragility. Illustration of the example: histogram from Monte Carlo simulation of government deficit as a left-tailed random variable simply as a result of randomizing unemployment, of which it is a concave function. The point-estimate method would assume a Dirac stick at −200, thus underestimating both the expected deficit (−312.5) and its tail fragility. (From Taleb and Douady, 2012.)
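A sketch of the Monte Carlo experiment the figure describes. The concave quadratic fitted through the three values above and the Gaussian model for unemployment are my assumptions for illustration, not the authors' specification:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fit a concave quadratic B(u) through the three points of the example
u_pts = np.array([8.0, 9.0, 10.0])
b_pts = np.array([-75.0, -200.0, -550.0])
coeffs = np.polyfit(u_pts, b_pts, 2)   # concave: leading coefficient < 0

# Randomize unemployment around 9% with a 1% deviation (assumption)
u = rng.normal(9.0, 1.0, 1_000_000)
deficits = np.polyval(coeffs, u)

print(np.polyval(coeffs, 9.0))      # Dirac-stick point estimate: -200 bn
print(deficits.mean())              # expected deficit: about -312.5 bn
print(np.percentile(deficits, 1))   # the swollen left tail in the figure
```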
Application: Ricardian Model and Left Tail—The Price of Wine Happens to Vary
For almost two hundred years, we’ve been talking about an idea by the economist David Ricardo called “comparative advantage.” In short, it says that a country should have a certain policy based on its comparative advantage in wine or clothes. Say a country is good at both wine and clothes, better than its neighbors with whom it can trade freely. Then the visible optimal strategy would be to specialize in either wine or clothes, whichever fits the best and minimizes opportunity costs. Everyone would then be happy. The analogy by the economist Paul Samuelson is that if someone happens to be the best doctor in town and, at the same time, the best secretary, then it would be preferable to be the higher-earning doctor—as it would minimize opportunity losses—and let someone else be the secretary and buy secretarial services from him.
I agree that there are benefits in some form of specialization, but not from the models used to prove it. The flaw with such reasoning is as follows. True, it would be inconceivable for a doctor to become a part-time secretary just because he is good at it. But, at the same time, we can safely assume that being a doctor ensures some professional stability: people will not cease to get sick, and there is a higher social status associated with the profession than with that of secretary, making the profession more desirable. But assume now that in a two-country world, a country specialized in wine, hoping to sell its specialty in the market to the other country, and that suddenly the price of wine drops precipitously. Some change in taste caused the price to change. Ricardo’s analysis assumes that both the market price of wine and the costs of production remain constant, and there is no “second order” part of the story.
[Table: costs of production of cloth and wine in Britain and Portugal, normalized to a selling price of one unit each.]
The logic: The table above shows the cost of production, normalized to a selling price of one unit each, that is, assuming that these trade at equal price (1 unit of cloth for 1 unit of wine). The apparent paradox is as follows: Portugal produces cloth more cheaply than Britain, yet it should buy its cloth from Britain instead, using the gains from the sales of wine. In the absence of transaction and transportation costs, it is efficient for Britain to produce just cloth and for Portugal to produce only wine.
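Since the table itself did not survive reproduction here, a sketch of its logic with Ricardo's canonical numbers (labor cost per unit; the book's own normalization may differ): Britain at 100 for cloth and 120 for wine, Portugal at 90 and 80.

```python
# Labor cost per unit of output (Ricardo's canonical figures; the
# book's table may normalize these differently)
cost = {
    "Britain":  {"cloth": 100, "wine": 120},
    "Portugal": {"cloth": 90,  "wine": 80},
}

for country, c in cost.items():
    # Opportunity cost of one unit of cloth, in units of wine forgone
    print(country, "-> cloth costs", round(c["cloth"] / c["wine"], 3), "units of wine")
# Britain:  0.833 -> cloth is relatively cheap there: specialize in cloth
# Portugal: 1.125 -> wine is relatively cheap there: specialize in wine
# Hence Portugal, though cheaper at both, still buys cloth from Britain.
```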
The idea has always attracted economists because of its paradoxical and counterintuitive aspect. For instance, in an article “Why Intellectuals Don’t Understand Comparative Advantage” (Krugman, 1998), Paul Krugman, who fails to understand the concept himself, as this essay and his technical work show him to be completely innocent of tail events and risk management, makes fun of other intellectuals such as S. J. Gould who understand tail events albeit intuitively rather than analytically. (Clearly one cannot talk about returns and gains without discounting these benefits by the offsetting risks.) The article shows Krugman falling into the critical and dangerous mistake of confusing function of average and average of function. (Traditional Ricardian analysis assumes the variables are endogenous, but does not add a layer of stochasticity.)
Now consider the prices of wine and cloth to be variable—which Ricardo did not assume—with the numbers above being the unbiased long-term average values. Further assume that they follow a fat-tailed distribution. Or consider that their costs of production vary according to a fat-tailed distribution.
If the price of wine in the international markets rises by, say, 40 percent, then there are clear benefits. But should the price drop by an equal percentage, −40 percent, then massive harm would ensue, in magnitude larger than the benefits should there be an equal rise. There are concavities to the exposure—severe concavities.
And clearly, should the price drop by 90 percent, the effect would be disastrous. Just imagine what would happen to your household should you get an instant and unpredicted 40 percent pay cut. Indeed, we have had problems in history with countries specializing in some goods, commodities, and crops that happen to be not just volatile, but extremely volatile. And disaster does not necessarily come from variation in price, but problems in production: suddenly, you can’t produce the crop because of a germ, bad weather, or some other hindrance.
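One way to see the concavity numerically (a stylized sketch; the logarithmic welfare function is my assumption, standing in for any concave dependence on export revenue):

```python
import math

# Welfare assumed concave (here logarithmic) in export revenue -- an
# illustrative assumption, not a claim about any particular country
def welfare(revenue):
    return math.log(revenue)

base = 1.0
gain = welfare(1.4 * base) - welfare(base)  # price up 40%:   about +0.34
loss = welfare(0.6 * base) - welfare(base)  # price down 40%: about -0.51
print(gain, loss)                           # harm outweighs the equal-sized benefit
print(welfare(0.1 * base) - welfare(base))  # down 90%: about -2.30, disaster
```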
A bad crop, such as the one that caused the Irish potato famine in the decade around 1850, caused the death of a million and the emigration of a million more (Ireland’s entire population at the time of this writing is only about six million, if one includes the northern part). It is very hard to reconvert resources—unlike the case in the doctor-secretary story, countries don’t have the ability to change. Indeed, monoculture (focus on a single crop) has turned out to be lethal in history—one bad crop leads to devastating famines.
The other part missed in the doctor-secretary analogy is that countries don’t have family and friends. A doctor has a support community, a circle of friends, a collective that takes care of him, a father-in-law to borrow from in the event that he needs to reconvert into some other profession, a state above him to help. Countries don’t. Further, a doctor has savings; countries tend to be borrowers.
So here again we have fragility to second-order effects.
Probability Matching: The idea of comparative advantage has an analog in probability: if you sample from an urn (with replacement) and get a black ball 60 percent of the time, and a white one the remaining 40 percent, the optimal strategy, according to textbooks, is to bet 100 percent of the time on black. The strategy of betting 60 percent of the time on black and 40 percent on white is called “probability matching” and considered to be an error in the decision-science literature (which I remind the reader is what was used by Triffat in Chapter 10). People’s instinct to engage in probability matching appears to be sound, not a mistake. In nature, probabilities are unstable (or unknown), and probability matching is similar to redundancy, as a buffer. So if the probabilities change, in other words if there is another layer of randomness, then the optimal strategy is probability matching.
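A small simulation of this claim (the flipped-regime setup is my own stylized assumption): with stable urn odds the textbook strategy wins, but once the odds themselves can change, matching caps the damage, acting as a buffer:

```python
import numpy as np

rng = np.random.default_rng(7)

def hit_rate(p_black, bet_black_prob, n=100_000):
    """Fraction of correct guesses when balls are black with probability
    p_black and we guess black with probability bet_black_prob."""
    balls = rng.random(n) < p_black
    guesses = rng.random(n) < bet_black_prob
    return np.mean(balls == guesses)

for p in [0.6, 0.1]:  # stable odds vs. a flipped regime (assumption)
    always = hit_rate(p, 1.0)    # textbook optimum: always bet black
    matching = hit_rate(p, 0.6)  # probability matching on the old 60%
    print(f"p(black) = {p}: always-black {always:.2f}, matching {matching:.2f}")
# p = 0.6: always-black ~0.60 beats matching ~0.52 (the textbook answer)
# p = 0.1: always-black collapses to ~0.10; matching still gets ~0.42
# Matching trades peak performance for robustness to the change in odds.
```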
How specialization works: The reader should not interpret what I am saying to mean that specialization is not a good thing—only that one should establish such specialization after addressing fragility and second-order effects. Now I do believe that Ricardo is ultimately right, but not from the models shown. Organically, systems without top-down controls would specialize progressively, slowly, and over a long time, through trial and error, get the right amount of specialization—not through some bureaucrat using a model. To repeat, systems make small errors, design makes large ones.
So the imposition of Ricardo’s insight-turned-model by some social planner would lead to a blowup; letting tinkering work slowly would lead to efficiency—true efficiency. The role of policy makers should be to, via negativa style, allow the emergence of specialization by preventing what hinders the process.
A More General Methodology to Spot Model Error
Model second-order effects and fragility: Assume we have the right model (which is a very generous assumption) but are uncertain about the parameters. As a generalization of the deficit/employment example used in the previous section, say we are using f, a simple function: f(x | ᾱ), where ᾱ is supposed to be the average expected input variable, and where we take φ as the distribution of α over its domain $\mathcal{D}_\alpha$.
The philosopher’s stone: The mere fact that α is uncertain (since it is estimated) might lead to a bias if we perturbate from the inside (of the integral), i.e., stochasticize the parameter deemed fixed. Accordingly, the convexity bias is easily measured as the difference between (a) the function f integrated across values of potential α, and (b) f estimated for a single value of α deemed to be its average. The convexity bias (philosopher’s stone) ωA becomes:

$$\omega_A = \int_{\mathcal{D}_\alpha} f(x \mid \alpha)\,\varphi(\alpha)\,d\alpha \;-\; f(x \mid \bar{\alpha})$$
The central equation: Fragility is a partial philosopher’s stone below K, hence ωB, the missed fragility, is assessed by comparing the two integrals below K in order to capture the effect on the left tail:

$$\omega_B(K) = \int_{-\infty}^{K} \int_{\mathcal{D}_\alpha} f(x \mid \alpha)\,\varphi(\alpha)\,d\alpha\,dx \;-\; \int_{-\infty}^{K} f(x \mid \bar{\alpha})\,dx$$
which can be approximated by an interpolated estimate obtained from two values of α separated from their midpoint ᾱ by Δα, the mean deviation of α, estimating:

$$\omega_B(K) \approx \int_{-\infty}^{K} \tfrac{1}{2}\left[f(x \mid \bar{\alpha} + \Delta\alpha) + f(x \mid \bar{\alpha} - \Delta\alpha)\right] dx \;-\; \int_{-\infty}^{K} f(x \mid \bar{\alpha})\,dx$$
Note that antifragility, ωC, is the same integration but from K to infinity. We can probe ωB by point estimates of f at a level X ≤ K:

$$\omega_B'(X) = \tfrac{1}{2}\left[f(X \mid \bar{\alpha} + \Delta\alpha) + f(X \mid \bar{\alpha} - \Delta\alpha)\right] - f(X \mid \bar{\alpha})$$
so that

$$\omega_B(K) = \int_{-\infty}^{K} \omega_B'(x)\,dx,$$
which leads us to the fragility detection heuristic (Taleb, Canetti, et al., 2012). In particular, if we assume that ω′B(X) has a constant sign for X ≤ K, then ωB(K) has the same sign. The detection heuristic is a perturbation in the tails to probe fragility, by checking the function ω′B(X) at any level X.
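A sketch of the probe on a toy model. The Gaussian density with an uncertain scale parameter α is my choice for illustration; the published heuristic (Taleb, Canetti, et al., 2012) applies the same perturbation to portfolios and institutions:

```python
from scipy.stats import norm

def omega_B_prime(X, alpha_bar, d_alpha, mu=0.0):
    """Point probe: average density under the two perturbed scale
    parameters, minus the density at the point estimate."""
    f_up = norm.pdf(X, mu, alpha_bar + d_alpha)
    f_down = norm.pdf(X, mu, alpha_bar - d_alpha)
    return 0.5 * (f_up + f_down) - norm.pdf(X, mu, alpha_bar)

def omega_B(K, alpha_bar, d_alpha, mu=0.0):
    """Missed fragility below K: left-tail mass averaged over the
    perturbed parameters, minus tail mass at the point estimate."""
    cdf_up = norm.cdf(K, mu, alpha_bar + d_alpha)
    cdf_down = norm.cdf(K, mu, alpha_bar - d_alpha)
    return 0.5 * (cdf_up + cdf_down) - norm.cdf(K, mu, alpha_bar)

# Deep in the left tail, perturbing the "known" volatility fattens the tail:
print(omega_B_prime(-4.0, alpha_bar=1.0, d_alpha=0.25))  # > 0: fragility
print(omega_B(-3.0, alpha_bar=1.0, d_alpha=0.25))        # > 0: missed tail mass
```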
Portfolio fallacies: Note one fallacy promoted by Markowitz users: portfolio theory entices people to diversify, hence it is better than nothing. Wrong, you finance fools: it pushes them to optimize, hence overallocate. It does not drive people to take less risk based on diversification, but causes them to take more open positions owing to perception of offsetting statistical properties—making them vulnerable to model error, and especially vulnerable to the underestimation of tail events. To see how, consider two investors facing a choice of allocation across three items: cash, and securities A and B. The investor who does not know the statistical properties of A and B and knows he doesn’t know will allocate, say, the portion he does not want to lose to cash, the rest into A and B—according to whatever heuristic has been in traditional use. The investor who thinks he knows the statistical properties, with parameters σA, σB, ρA,B, will allocate weights wA and wB in a way to put the total risk at some target level (let us ignore the expected return for this). The lower his perception of the correlation ρA,B, the worse his exposure to model error. Assuming he thinks that the correlation ρA,B is 0, he will be overallocated by 1⁄3 for extreme events. But if the poor investor has the illusion that the correlation is −1, he will be maximally overallocated to his A and B investments. If the investor uses leverage, we end up with the story of Long-Term Capital Management, which turned out to be fooled by the parameters. (In real life, unlike in economic papers, things tend to change; for Baal’s sake, they change!) We can repeat the idea for each parameter σ and see how lower perception of this σ leads to overallocation.
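A sketch of the overallocation mechanics (stylized: two securities with identical volatility, a fixed perceived-risk target, expected returns ignored as in the text; the jump of correlations toward 1 in a tail event is my assumption):

```python
import math

def weight_per_asset(sigma, rho_assumed, risk_target):
    """Equal weight w in each of A and B that hits the perceived risk
    target, using portfolio stdev = w * sigma * sqrt(2 + 2 * rho)."""
    return risk_target / (sigma * math.sqrt(2 + 2 * rho_assumed))

sigma, target = 0.2, 0.1
for rho in [0.5, 0.0, -0.5, -0.9, -0.99]:
    w = weight_per_asset(sigma, rho, target)
    # In a tail event, correlations tend to jump toward 1 (assumption):
    realized = w * sigma * math.sqrt(2 + 2 * 1.0)
    print(f"assumed rho = {rho:+.2f}: weight {w:.2f} per asset, "
          f"risk if correlations go to 1: {realized:.2f} vs target {target}")
# The lower the assumed correlation, the larger the positions taken for
# the same perceived risk; at rho -> -1 perceived risk vanishes, hence
# maximal overallocation, and the realized tail risk dwarfs the target.
```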