The Black Swan

by Nassim Nicholas Taleb


  Company size 1.5

  People killed in terrorist attacks 2 (but possibly a much lower exponent)

  * Source: M.E.J. Newman (2005) and the author’s own calculations.

  Let me show the different measured exponents for a variety of phenomena.

  Let me tell you upfront that these exponents mean very little in terms of numerical precision. We will see why in a minute, but just note for now that we do not observe these parameters; we simply guess them, or infer them from statistical information, which makes it hard at times to know the true parameters, if they in fact exist. Let us first examine the practical consequences of an exponent.

  TABLE 3: THE MEANING OF THE EXPONENT

  Exponent   Share of the top 1%   Share of the top 20%
  1          99.99%*               99.99%
  1.1        66%                   86%
  1.2        47%                   76%
  1.3        34%                   69%
  1.4        27%                   63%
  1.5        22%                   58%
  2          10%                   45%
  2.5        6%                    38%
  3          4.6%                  34%

  Table 3 illustrates the impact of the highly improbable. It shows the contributions of the top 1 percent and 20 percent to the total. The lower the exponent, the higher those contributions. But look how sensitive the process is: between 1.1 and 1.3 you go from 66 percent of the total to 34 percent. Just a 0.2 difference in the exponent changes the result dramatically—and such a difference can come from a simple measurement error. This difference is not trivial: just consider that we have no precise idea what the exponent is because we cannot measure it directly. All we do is estimate from past data or rely on theories that allow for the building of some model that would give us some idea—but these models may have hidden weaknesses that prevent us from blindly applying them to reality.
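  The sensitivity described above can be checked with a short sketch. It assumes a pure Pareto tail with exponent alpha, for which the share of the total held by the top fraction p is p raised to the power (alpha - 1) / alpha. The text does not say how Table 3 was computed, so this formula and the function name top_share are assumptions made for illustration; the output lands close to the rows of the table.

```python
def top_share(p, alpha):
    """Share of the total held by the top fraction p, assuming a pure Pareto tail.

    Assumption (not stated in the text): survival function P(X > x) ~ x**(-alpha),
    which gives a top-p share of p ** ((alpha - 1) / alpha) when alpha > 1.
    """
    if alpha <= 1:
        # With alpha <= 1 the mean is infinite and the formula no longer applies.
        raise ValueError("alpha must be greater than 1")
    return p ** ((alpha - 1) / alpha)


# Print the top 1% and top 20% shares for the exponents listed in Table 3.
for alpha in (1.1, 1.2, 1.3, 1.4, 1.5, 2.0, 2.5, 3.0):
    print(f"{alpha:>4}  top 1%: {top_share(0.01, alpha):6.1%}  "
          f"top 20%: {top_share(0.20, alpha):6.1%}")
```

  Moving alpha from 1.1 to 1.3 roughly halves the top 1 percent share in this sketch, which is the jump the paragraph above points to.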

  So keep in mind that the 1.5 exponent is an approximation, that it is hard to compute, that you do not get it from the gods, at least not easily, and that you will have a monstrous sampling error. You will observe that the number of books selling above a million copies is not always going to be 8; it could be as high as 20, or as low as 2.

  More significantly, this exponent begins to apply at some number called “crossover,” and addresses numbers larger than this crossover. It may start at 200,000 books, or perhaps only 400,000 books. Likewise, wealth has different properties above, say, $600 million, where inequality grows, than it does below such a number. How do you know where the crossover point is? This is a problem. My colleagues and I worked with around 20 million pieces of financial data. We all had the same data set, yet we never agreed on exactly what the exponent was in our sets. We knew the data revealed a fractal power law, but we learned that one could not produce a precise number. But what we did know (that the distribution is scalable and fractal) was sufficient for us to operate and make decisions.
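  To make the role of the crossover concrete, here is a minimal sketch of one common way to estimate a tail exponent, the maximum-likelihood (Hill) estimator. The text does not say which method the author and his colleagues used, so the estimator, the function name hill_exponent, and the synthetic data are all assumptions. Note that the crossover has to be supplied by the analyst, and that raising it leaves fewer observations in the tail.

```python
import math
import random


def hill_exponent(data, crossover):
    """Maximum-likelihood (Hill) estimate of the tail exponent above `crossover`.

    Only observations strictly above the crossover are used; the crossover
    itself is an analyst's choice, not something the data hands over.
    """
    tail = [x for x in data if x > crossover]
    if len(tail) < 2:
        raise ValueError("too few observations above the crossover")
    return len(tail) / sum(math.log(x / crossover) for x in tail)


# Synthetic data only: an idealized pure Pareto sample with a known exponent of 1.5.
rng = random.Random(0)
sample = [rng.paretovariate(1.5) for _ in range(100_000)]

# The same data set, read through three different crossovers.
for crossover in (1.0, 5.0, 50.0):
    n_tail = sum(x > crossover for x in sample)
    estimate = hill_exponent(sample, crossover)
    print(f"crossover={crossover:>5}  points in tail={n_tail:>6}  estimate={estimate:.2f}")
```

  Even with this idealized sample, the estimate gets noisier as the crossover rises and the tail thins out. Real data are not a pure Pareto, so the choice of crossover can move the estimate further still.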

  The Problem of the Upper Bound

  Some people have researched and accepted the fractal “up to a point.” They argue that wealth, book sales, and market returns all have a certain level when things stop being fractal. “Truncation” is what they propose. I agree that there is a level where fractality might stop, but where? Saying that there is an upper limit but I don’t know how high it is, and saying there is no limit carry the same consequences in practice. Proposing an upper limit is highly unsafe. You may say, Let us cap wealth at $150 billion in our analyses. Then someone else might say, Why not $151 billion? Or why not $152 billion? We might as well consider that the variable is unlimited.

  Beware the Precision

  I have learned a few tricks from experience: whichever exponent I try to measure will be likely to be overestimated (recall that a higher exponent implies a smaller role for large deviations)—what you see is likely to be less Black Swannish than what you do not see. I call this the masquerade problem.

  Let’s say I generate a process that has an exponent of 1.7. You do not see what is inside the engine, only the data coming out. If I ask you what the exponent is, odds are that you will compute something like 2.4. You would do so even if you had a million data points. The reason is that it takes a long time for some fractal processes to reveal their properties, and you underestimate the severity of the shock.

  Sometimes a fractal can make you believe that it is Gaussian, particularly when the cutpoint starts at a high number. With fractal distributions, extreme deviations of that kind are rare enough to smoke you: you don’t recognize the distribution as fractal.

  The Water Puddle Revisited

  As you have seen, we have trouble knowing the parameters of whichever model we assume runs the world. So with Extremistan, the problem of induction pops up again, this time even more significantly than at any previous time in this book. Simply, if a mechanism is fractal it can deliver large values; therefore the incidence of large deviations is possible, but how possible, how often they should occur, will be hard to know with any precision. This is similar to the water puddle problem: plenty of ice cubes could have generated it. As someone who goes from reality to possible explanatory models, I face a completely different spate of problems from those who do the opposite.

  I have just read three “popular science” books that summarize the research in complex systems: Mark Buchanan’s Ubiquity, Philip Ball’s Critical Mass, and Paul Ormerod’s Why Most Things Fail. These three authors present the world of social science as full of power laws, a view with which I most certainly agree. They also claim that there is universality of many of these phenomena, that there is a wonderful similarity between various processes in nature and the behavior of social groups, which I agree with. They back their studies with the various theories on networks and show the wonderful correspondence between the so-called critical phenomena in natural science and the self-organization of social groups. They bring together processes that generate avalanches, social contagions, and what they call informational cascades, which I agree with.

  Universality is one of the reasons physicists find power laws associated with critical points particularly interesting. There are many situations, both in dynamical systems theory and statistical mechanics, where many of the properties of the dynamics around critical points are independent of the details of the underlying dynamical system. The exponent at the critical point may be the same for many systems in the same group, even though many other aspects of the system are different. I almost agree with this notion of universality. Finally, all three authors encourage us to apply techniques from statistical physics, avoiding econometrics and Gaussian-style nonscalable distributions like the plague, and I couldn’t agree more.

  But all three authors, by producing, or promoting precision, fall into the trap of not differentiating between the forward and the backward processes (between the problem and the inverse problem)—to me, the greatest scientific and epistemological sin. They are not alone; nearly everyone who works with data but doesn’t make decisions on the basis of these data tends to be guilty of the same sin, a variation of the narrative fallacy. In the absence of a feedback process you look at models and think that they confirm reality. I believe in the ideas of these three books, but not in the way they are being used—and certainly not with the precision the authors ascribe to them. As a matter of fact, complexity theory should make us more suspicious of scientific claims of precise models of reality. It does not make all the swans white; that is predictable: it makes them gray, and only gray.*

  As I have said earlier, the world, epistemologically, is literally a different place to a bottom-up empiricist. We don’t have the luxury of sitting down to read the equation that governs the universe; we just observe data and make an assumption about what the real process might be, and “calibrate” by adjusting our equation in accordance with additional information. As events present themselves to us, we compare what we see to what we expected to see. It is usually a humbling process, particularly for someone aware of the narrative fallacy, to discover that history runs forward, not backward. As much as one thinks that businessmen have big egos, these people are often humbled by reminders of the differences between decision and results, between precise models and reality.

  What I am talking about is opacity, incompleteness of information, the invisibility of the generator of the world. History does not reveal its mind to us—we need to guess what’s inside of it.

  From Representation to Reality

  The above idea links all the parts of this book. While many study psychology, mathematics, or evolutionary theory and look for ways to take it to the bank by applying their ideas to business, I suggest the exact opposite: study the intense, uncharted, humbling uncertainty in the markets as a means to get insights about the nature of randomness that is applicable to psychology, probability, mathematics, decision theory, and even statistical physics. You will see the sneaky manifestations of the narrative fallacy, the ludic fallacy, and the great errors of Platonicity, of going from representation to reality.

  When I first met Mandelbrot I asked him why an established scientist like him who should have more valuable things to do with his life would take an interest in such a vulgar topic as finance. I thought that finance and economics were just a place where one learned from various empirical phenomena and filled up one’s bank account with f*** you cash before leaving for bigger and better things. Mandelbrot’s answer was, “Data, a gold mine of data.” Indeed, everyone forgets that he started in economics before moving on to physics and the geometry of nature. Working with such abundant data humbles us; it provides the intuition of the following error: traveling the road between representation and reality in the wrong direction.

  The problem of the circularity of statistics (which we can also call the statistical regress argument) is as follows. Say you need past data to discover whether a probability distribution is Gaussian, fractal, or something else. You will need to establish whether you have enough data to back up your claim. How do we know if we have enough data? From the probability distribution—a distribution does tell you whether you have enough data to “build confidence” about what you are inferring. If it is a Gaussian bell curve, then a few points will suffice (the law of large numbers once again). And how do you know if the distribution is Gaussian? Well, from the data. So we need the data to tell us what the probability distribution is, and a probability distribution to tell us how much data we need. This causes a severe regress argument.

  This regress does not occur if you assume beforehand that the distribution is Gaussian. It happens that, for some reason, the Gaussian yields its properties rather easily. Extremistan distributions do not do so. So selecting the Gaussian while invoking some general law appears to be convenient. The Gaussian is used as a default distribution for that very reason. As I keep repeating, assuming its application beforehand may work with a small number of fields such as crime statistics, mortality rates, matters from Mediocristan. But not for historical data of unknown attributes and not for matters from Extremistan.

  Now, why aren’t statisticians who work with historical data aware of this problem? First, they do not like to hear that their entire business has been canceled by the problem of induction. Second, they are not confronted with the results of their predictions in rigorous ways. As we saw with the Makridakis competition, they are grounded in the narrative fallacy, and they do not want to hear it.

  ONCE AGAIN, BEWARE THE FORECASTERS

  Let me take the problem one step higher up. As I mentioned earlier, plenty of fashionable models attempt to explain the genesis of Extremistan. In fact, they are grouped into two broad classes, but there are occasionally more approaches. The first class includes the simple rich-get-richer (or big-get-bigger) style model that is used to explain the lumping of people around cities, the market domination of Microsoft and VHS (instead of Apple and Betamax), the dynamics of academic reputations, etc. The second class concerns what are generally called “percolation models,” which address not the behavior of the individual, but rather the terrain in which he operates. When you pour water on a porous surface, the structure of that surface matters more than does the liquid. When a grain of sand hits a pile of other grains of sand, how the terrain is organized is what determines whether there will be an avalanche.

  Most models, of course, attempt to be precisely predictive, not just descriptive; I find this infuriating. They are nice tools for illustrating the genesis of Extremistan, but I insist that the “generator” of reality does not appear to obey them closely enough to make them helpful in precise forecasting. At least to judge by anything you find in the current literature on the subject of Extremistan. Once again we face grave calibration problems, so it would be a great idea to avoid the common mistakes made while calibrating a nonlinear process. Recall that nonlinear processes have greater degrees of freedom than linear ones (as we saw in Chapter 11), with the implication that you run a great risk of using the wrong model. Yet once in a while you run into a book or articles advocating the application of models from statistical physics to reality. Beautiful books like Philip Ball’s illustrate and inform, but they should not lead to precise quantitative models. Do not take them at face value.

  But let us see what we can take home from these models.

  Once Again, a Happy Solution

  First, in assuming a scalable distribution, I accept that an arbitrarily large number is possible. In other words, inequalities should not stop above some known maximum bound.

  Say that the book The Da Vinci Code sold around 60 million copies. (The Bible sold about a billion copies but let’s ignore it and limit our analysis to lay books written by individual authors.) Although we have never known a lay book to sell 200 million copies, we can consider that the possibility is not zero. It’s small, but it’s not zero. For every three Da Vinci Code–style bestsellers, there might be one superbestseller, and though one has not happened so far, we cannot rule it out. And for every fifteen Da Vinci Codes there will be one superbestseller selling, say, 500 million copies.

  Apply the same logic to wealth. Say the richest person on earth is worth $50 billion. There is a nonnegligible probability that next year someone with $100 billion or more will pop out of nowhere. For every three people with more than $50 billion, there could be one with $100 billion or more. There is a much smaller probability of there being someone with more than $200 billion—one third of the previous probability, but nevertheless not zero. There is even a minute, but not zero probability of there being someone worth more than $500 billion.
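  The wealth example can be turned into a small calculation. Under a scalable (power-law) tail, the ratio of the number of people above a multiple of x to the number above x depends only on the multiple, not on x itself. The sketch below assumes that pure power-law form; the exponent it derives from the “one in three per doubling” figure is an inference for illustration, not a number given in the text, and the function names are hypothetical.

```python
import math


def implied_exponent(multiple, ratio):
    """Exponent alpha such that P(X > multiple * x) / P(X > x) equals `ratio`.

    Assumes a pure power-law tail, P(X > x) proportional to x ** (-alpha).
    """
    return -math.log(ratio) / math.log(multiple)


def tail_ratio(multiple, alpha):
    """P(X > multiple * x) / P(X > x) under the same power-law assumption."""
    return multiple ** (-alpha)


# "For every three people with more than $50 billion, one with $100 billion or more":
# doubling the threshold divides the count by three.
alpha = implied_exponent(2, 1 / 3)
print("implied exponent:", round(alpha, 3))

# $200 billion is two doublings away from $50 billion, $500 billion is a factor of ten.
print("share of the $50bn group above $200bn:", tail_ratio(4, alpha))
print("share of the $50bn group above $500bn:", tail_ratio(10, alpha))
```

  Two doublings give one third of one third, which matches the “one third of the previous probability” in the paragraph above. The probability at ten times the threshold is small but still not zero.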

  This tells me the following: I can make inferences about things that I do not see in my data, but these things should still belong to the realm of possibilities. There is an invisible bestseller out there, one that is absent from the past data but that you need to account for. Recall my point in Chapter 13: it makes investment in a book or a drug better than statistics on past data might suggest. But it can make stock market losses worse than what the past shows.

  Wars are fractal in nature. A war that kills more people than the devastating Second World War is possible—not likely, but not a zero probability, although such a war has never happened in the past.

  Second, I will introduce an illustration from nature that will help to make the point about precision. A mountain is somewhat similar to a stone: it has an affinity with a stone, a family resemblance, but it is not identical. The word to describe such resemblances is self-affine, not the precise self-similar, but Mandelbrot had trouble communicating the notion of affinity, and the term self-similar spread with its connotation of precise resemblance rather than family resemblance. As with the mountain and the stone, the distribution of wealth above $1 billion is not exactly the same as that below $1 billion, but the two distributions have “affinity.”

  Third, I said earlier that there have been plenty of papers in the world of econophysics (the application of statistical physics to social and economic phenomena) aiming at such calibration, at pulling numbers from the world of phenomena. Many try to be predictive. Alas, we are not able to predict “transitions” into crises or contagions. My friend Didier Sornette attempts to build predictive models, which I love, except that I cannot use them to make predictions (but please don’t tell him; he might stop building them). That I can’t use them as he intends does not invalidate his work; it just makes the interpretations require broad-minded thinking, unlike models in conventional economics that are fundamentally flawed. We may be able to do well with some of Sornette’s phenomena, but not all.

  WHERE IS THE GRAY SWAN?

  I have written this entire book about the Black Swan. This is not because I am in love with the Black Swan; as a humanist, I hate it. I hate most of the unfairness and damage it causes. Thus I would like to eliminate many Black Swans, or at least to mitigate their effects and be protected from them. Fractal randomness is a way to reduce these surprises, to make some of the swans appear possible, so to speak, to make us aware of their consequences, to make them gray. But fractal randomness does not yield precise answers. The benefits are as follows. If you know that the stock market can crash, as it did in 1987, then such an event is not a Black Swan. The crash of 1987 is not an outlier if you use a fractal with an exponent of three. If you know that biotech companies can deliver a megablockbuster drug, bigger than all we’ve had so far, then it won’t be a Black Swan, and you will not be surprised, should that drug appear.

  Thus Mandelbrot’s fractals allow us to account for a few Black Swans, but not all. I said earlier that some Black Swans arise because we ignore sources of randomness. Others arise when we overestimate the fractal exponent. A gray swan concerns modelable extreme events; a black swan is about unknown unknowns.

  I sat down and discussed this with the great man, and it became, as usual, a linguistic game. In Chapter 9 I presented the distinction economists make between Knightian uncertainty (incomputable) and Knightian risk (computable); this distinction cannot be so original an idea to be absent in our vocabulary, and so we looked for it in French. Mandelbrot mentioned one of his friends and prototypical heroes, the aristocratic mathematician Marcel-Paul Schützenberger, a fine erudite who (like this author) was easily bored and could not work on problems beyond their point of diminishing returns. Schützenberger insisted on the clear-cut distinction in the French language between hasard and fortuit. Hasard, from the Arabic az-zahr, implies, like alea, dice—tractable randomness; fortuit is my Black Swan—the purely accidental and unforeseen. We went to the Petit Robert dictionary; the distinction effectively exists there. Fortuit seems to correspond to my epistemic opacity, l’imprévu et non quantifiable; hasard to the more ludic type of uncertainty that was proposed by the Chevalier de Méré in the early gambling literature. Remarkably, the Arabs may have introduced another word to the business of uncertainty: rizk, meaning property.

 
