But stock price changes are not normally distributed, which has implications for our notions of risk and uncertainty, market timing, and money management. More specifically, stock price change distributions show high kurtosis—the peak around the mean is higher, and the tails fatter, than in a normal distribution. (We may still say that a distribution exists that characterizes the market; it’s just not a normal distribution.) These return outliers are of particular interest in understanding the characteristics of stock market returns over time.
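As a rough, self-contained illustration of what high kurtosis looks like (a hypothetical sketch using simulated data, not the S&P 500 series discussed next), the snippet below compares the sample excess kurtosis of normally distributed returns with that of a fat-tailed alternative:

```python
import numpy as np
from scipy.stats import kurtosis, norm, t

rng = np.random.default_rng(0)
n = 7_000  # roughly the number of trading days in a three-decade sample

# Normally distributed "returns" versus a fat-tailed Student-t alternative
normal_returns = norm.rvs(scale=0.01, size=n, random_state=rng)
fat_tailed_returns = t.rvs(df=5, scale=0.01, size=n, random_state=rng)

# scipy reports excess kurtosis: about 0 for a normal sample, well above 0 for fat tails
print("excess kurtosis, normal sample:    ", round(kurtosis(normal_returns), 2))
print("excess kurtosis, fat-tailed sample:", round(kurtosis(fat_tailed_returns), 2))
```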
To demonstrate this point, I looked at daily S&P 500 Index price changes from January 3, 1978, to March 30, 2007. The index’s annual return (excluding dividends) in that period was 9.5 percent. I then knocked out the fifty worst days, and then the fifty best days, from the sample (over 7,000 days). Had you been able to avoid the worst fifty days, your annual return would have been 18.2 percent, versus the realized 9.5 percent. Without the fifty best days, the return was just 0.6 percent.
This analysis may be attention-grabbing, but it lacks a point of reference. To provide better context, I calculated a mean and standard deviation based on the actual underlying data and used those statistics to generate a random sample of the same size and characteristics. When I knocked out the fifty worst days from the sample I created, the return was just 15.2 percent (versus 18.2 percent for the real data). Likewise, when I knocked out the fifty best days, the return was 3.5 percent, significantly higher than that for the real data.
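The knock-out exercise itself is easy to reproduce. The sketch below is a minimal version under stated assumptions: it uses simulated daily returns in place of the actual FactSet series, compounds geometrically, and assumes 252 trading days per year; annualized_return and knock_out are hypothetical helper names.

```python
import numpy as np

def annualized_return(daily_returns, trading_days=252):
    """Geometric annualized return from a series of daily simple returns."""
    total_growth = np.prod(1.0 + daily_returns)
    years = len(daily_returns) / trading_days
    return total_growth ** (1.0 / years) - 1.0

def knock_out(daily_returns, n=50, worst=True):
    """Drop the n worst (or n best) daily returns from the series."""
    order = np.argsort(daily_returns)            # ascending: worst days first
    drop = order[:n] if worst else order[-n:]
    return np.delete(daily_returns, drop)

# Placeholder data: swap in the actual S&P 500 daily price changes to reproduce the text.
rng = np.random.default_rng(1)
returns = rng.normal(loc=0.0004, scale=0.01, size=7_300)

print("full sample:      ", round(100 * annualized_return(returns), 1), "%")
print("worst 50 removed: ", round(100 * annualized_return(knock_out(returns, worst=True)), 1), "%")
print("best 50 removed:  ", round(100 * annualized_return(knock_out(returns, worst=False)), 1), "%")
```

Running the same routine on the real series and on the matched normal sample is what produces the comparison in the text: the fat-tailed real data are far more sensitive to removing the extreme days.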
EXHIBIT 5.1 Volatility Is Clustered, January 1978-March 2007
Source: FactSet and author analysis.
In plain words, this analysis shows that extreme-return days play a much more significant role in shaping the market’s total returns than a normal distribution suggests. It also makes a strong case against market timing, unless an investor has a systematic way to anticipate extreme-return days.
A final thought on extreme-return days is that they do not appear randomly throughout the time series but rather tend to cluster (see exhibit 5.1). So our exercise of knocking out high and low return days is not very realistic because in the real data these extreme days (up and down) come in bunches.
How Predictions Change Future Payoffs
A lot of issues surround prediction, but in this discussion of risk and uncertainty, I focus on how, in markets, acting on a prediction can change the prediction’s outcome.
One way to think about it is to contrast a roulette wheel with a pari-mutuel betting system. If you play a fair game of roulette, whatever prediction you make will not affect the outcome. The prediction’s outcome is independent of the prediction itself. Contrast that with a prediction at the racetrack. If you believe a particular horse is likely to do better than the odds suggest, you will bet on the horse. But your bet will help shape the odds. For instance, if all bettors predict a particular horse will win, the odds will reflect that prediction, and the return on investment will be poor.
The analogy carries into the stock market. If you believe a stock is undervalued and start to buy it, you will help raise the price, thus driving down prospective returns. This point underscores the importance of expected value, a central concept in any probabilistic exercise. Expected value formalizes the idea that your return on an investment is the product of the probabilities of various outcomes and the payoff from each outcome.6
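To make both ideas concrete, here is a small hypothetical illustration: pari-mutuel payoffs are set by how the pool is bet, and expected value is the probability-weighted payoff. The pool size, track take, and win probability below are invented for the example.

```python
# Hypothetical pari-mutuel pool: the crowd's bets, not a bookmaker, set the payoff.
pool = 1_000.0           # total dollars wagered on the race
track_take = 0.15        # fraction the track keeps
bet_on_favorite = 800.0  # dollars the crowd has piled onto the favorite

# Payoff per dollar if the favorite wins: the net pool split among its backers
payoff_per_dollar = pool * (1 - track_take) / bet_on_favorite
print("payoff per $1 on the favorite:", round(payoff_per_dollar, 2))   # ~1.06

# Expected value: sum over outcomes of probability x payoff (a losing ticket pays 0)
win_probability = 0.70   # your estimate that the favorite wins
expected_value = win_probability * payoff_per_dollar + (1 - win_probability) * 0.0
print("expected value per $1 wagered:", round(expected_value, 2))      # ~0.74, a poor bet
```

Even a horse you believe wins 70 percent of the time is a poor bet once the crowd’s money has compressed the payoff.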
Peter Bernstein once said, “The fundamental law of investing is the uncertainty of the future.” As investors, our challenge is to translate those uncertainties into probabilities and payoffs in the search for attractive securities. An ability to classify probability statements can be very useful in this endeavor.
6
Are You an Expert?
Experts and Markets
Overall, the evidence suggests there is little benefit to expertise. . . . Surprisingly, I could find no studies that showed an important advantage for expertise.
—J. Scott Armstrong, “The Seer-Sucker Theory: The Value of Experts in Forecasting”1
Man Versus Machine
If you enter a hospital with chest pains, doctors will quickly administer an electrocardiogram (EKG) test. The EKG measures electrical impulses in your heart and translates them into squiggles on graph paper. Based in part on the readout, the doctor determines whether or not you’re having a heart attack. Sometimes the readouts are clear. But often they’re equivocal, which means you are relying on the doctor’s expertise to come to a proper diagnosis.
So how good are doctors at reading EKGs? In a 1996 showdown, Lund University researcher Lars Edenbrandt pitted his computer against Dr. Hans Ohlin, a leading Swedish cardiologist. An artificial intelligence expert, Edenbrandt had trained his machine by feeding it thousands of EKGs and indicating which readouts were indeed heart attacks. The fifty-year-old Ohlin routinely read as many as 10,000 EKGs a year as part of his practice.
Edenbrandt chose a sample of over 10,000 EKGs, exactly half of which showed confirmed heart attacks, and gave them to machine and man. Ohlin took his time evaluating the charts, spending a week carefully separating the stack into heart-attack and no-heart-attack piles. The battle was reminiscent of Garry Kasparov versus Deep Blue, and Ohlin was fully aware of the stakes.
As Edenbrandt tallied the results, a clear-cut winner emerged: the computer correctly identified the heart attacks in 66 percent of the cases, Ohlin only in 55 percent. The computer proved 20 percent more accurate than a leading cardiologist in a routine task that can mean the difference between life and death.2
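The training process the story describes is standard supervised learning: feed the machine labeled examples and let it generalize to new cases. The sketch below is a loose, generic stand-in on synthetic data (a logistic regression rather than the study’s actual model), intended only to show the shape of that workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 10_000                                   # "over 10,000 EKGs" in the story
labels = rng.integers(0, 2, size=n)          # 1 = heart attack, 0 = not
# Synthetic "readout" features that are only noisily related to the label
features = rng.normal(size=(n, 5)) + labels[:, None] * 0.8

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)
print("accuracy on held-out cases:", round(accuracy_score(y_test, model.predict(X_test)), 2))
```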
Our society tends to hold experts in high esteem. Patients routinely surrender their care to doctors, investors listen to financial advisors, and receptive TV viewers tune in to pundits of all stripes. But what is the basis for this unquestioning faith in experts?
Where Do Experts Do Well?
In some domains, experts clearly and consistently outperform the average person: just imagine playing chess against a grandmaster, trading volleys on Wimbledon’s center court, or performing brain surgery. Yet in other domains experts add very little value, and their opinions are routinely inferior to collective judgments. Further, experts in some fields tend to agree most of the time (e.g., weather forecasters), while in other fields they often stand at complete odds with one another. What’s going on?
Let’s narrow our discussion to cognitive tasks. One way to look at expert effectiveness is to consider the nature of the problems experts address. We can consider problem types on a continuum.3 One side captures straightforward problems inherent to static, linear, and discrete systems. The opposite side reflects dynamic, nonlinear, and continuous problems. Exhibit 6.1 offers additional adjectives for each of the two extremes.
While tens of thousands of hours of deliberate practice allows experts to internalize many of their domain’s features, this practice can also lead to reduced cognitive flexibility. Reduced flexibility leads to deteriorating expert performance as problems go from the simple to the complex.
Two concepts are useful here. The first is what psychologists call functional fixedness, the idea that when we use or think about something in a particular way we have great difficulty in thinking about it in new ways. We have a tendency to stick to our established perspective and are very slow to consider alternative perspectives.
EXHIBIT 6.1 Edges of the Problem Continuum
Simple extreme | Complex extreme
Discrete | Continuous
Static | Dynamic
Sequential | Simultaneous
Mechanical | Organic
Separable | Interactive
Universal | Conditional
Homogenous | Heterogeneous
Regular | Irregular
Linear | Nonlinear
Superficial | Deep
Single | Multiple
Stationary | Nonstationary
Source: Paul J. Feltovich, Rand J. Spiro, and Richard L. Coulsen, “Issues of Expert Flexibility in Contexts Characterized by Complexity and Change,” in Expertise in Context: Human and Machine, ed. Paul J. Feltovich, Kenneth M. Ford, and Robert R. Hoffman (Menlo Park, Cal.: AAAI Press and Cambridge, Mass.: MIT Press, 1997), 128-9 and author.
The second idea, reductive bias, says that we tend to treat non-linear, complex systems (the right-hand side of the continuum) as if they are linear, simple systems. A common resulting error is evaluating a system based on attributes versus considering the circumstances. For example, some investors focus solely on statistically cheap stocks (attribute) and fail to consider whether or not the valuation indicates value (circumstance).
Reductive bias also presents a central challenge for economists, who attempt to model and predict complex systems using tools and metaphors from simpler equilibrium systems. The bias introduces a number of conceptual challenges, including the failure to consider novel approaches, to recognize clues of novelty, and to anticipate system change.
None of this is to say that experts are inflexible automatons. Experts act with demonstrably more flexibility than novices in a particular domain. Psychologists specify two types of expert flexibility. In the first type, the expert internalizes many of the domain’s salient features and hence sees and reacts to most of the domain’s contexts and their effects. This flexibility operates effectively in relatively stable domains.
The second type of flexibility is more difficult to exercise. This flexibility requires experts to recognize when their cognitively accessible models are unlikely to work, forcing the experts to go outside their routines and their familiar frameworks to solve problems. This flexibility is crucial to success in nonlinear, complex systems.
So how do experts ensure they incorporate both types of flexibility? Advocates of cognitive flexibility theory suggest the major determinant in whether or not an expert will have more expansive flexibility is the amount of reductive bias during deliberate practice.4 More reductive bias may improve efficiency but will reduce flexibility. To mitigate reductive bias, the theory prescribes exploring abstractions across diverse cases to capture the significance of context dependence. Experts must also look at actual case studies and see when rules do and don’t work.
Exhibit 6.2 consolidates these ideas and offers a quick guide to expert performance in various types of cognitive domains. Consistent with Exhibit 6.1, the exhibit arranges domains from the simplest on the left to the most complex on the right. It shows that expert performance is largely a function of the type of problem the expert addresses.
For rules-based systems with limited degrees of freedom, computers consistently outperform individual humans.5 Humans perform well, but the computers are better and often cheaper. Computer algorithms beat people for reasons psychologists have documented: humans are easily influenced by suggestion, recent experience, and how information is presented. Humans also do a poor job of weighing variables.6 Because most decisions in these systems are rules based, experts tend to agree. The EKG-reading story illustrates this point.
EXHIBIT 6.2 Expert Performance Depends on the Problem Type
Source: Beth Azar, “Why Experts Often Disagree,” APA Monitor Online 30, no. 5 (May 1999) and author.
The next column shows rules-based systems with high degrees of freedom. Experts tend to add the most value here. For example, while Deep Blue narrowly beat chess champion Garry Kasparov, no computer is even close to beating a top player in Go, a game with simple rules but a larger 19-by-19 board.7 Improving computing power, however, will eventually challenge the expert edge in this domain type. Agreement among experts in this domain remains reasonably high.
A move to the right reveals a probabilistic domain with limited degrees of freedom. The value of experts declines because outcomes are probabilistic, but experts still hold their own versus computers and collectives. Expert agreement dips again in these domains. Statistics can improve expert decision making with these problems, a point Michael Lewis develops fully for professional baseball player selection in his bestseller Moneyball.
The right-most column shows the most difficult environment: a probabilistic domain with high degrees of freedom. Here the evidence clearly shows that collectives outperform experts.8 The stock market provides an obvious case in point, and it comes as no surprise that the vast majority of investors add no value. In this domain, experts can, and often do, hold diametrically opposite views on the same issue.9
We often rely on experts. But how good are their predictions, really? Psychologist Phil Tetlock asked nearly three hundred experts to make literally tens of thousands of predictions over nearly two decades. These were difficult predictions related to political and economic outcomes—similar to the types of problems investors tackle.
The results were unimpressive. Expert forecasters improved little, if at all, on simple statistical models. Further, when Tetlock confronted the experts with their poor predicting acuity, they went about justifying their views just like everyone else does. Tetlock doesn’t describe in detail what happens when the expert opinions are aggregated, but his research certainly shows that ability, defined as expertise, does not lead to good predictions when the problems are hard.
Decomposing the data, Tetlock found that while expert predictions were poor overall, some were better than others. What mattered in predictive ability was not who the people were or what they believed, but rather how they thought. Using a metaphor from Archilochus (via Isaiah Berlin), Tetlock segregated the experts into hedgehogs and foxes. Hedgehogs know one big thing and extend the explanatory reach of that thing to everything they encounter. Foxes, in contrast, tend to know a little about a lot and are not wedded to a single explanation for complex problems.
Two of Tetlock’s discoveries are particularly relevant. The first is a correlation between media contact and poor predictions. Tetlock notes that “better-known forecasters—those more likely to be fêted by the media—were less calibrated than their lower-profile colleagues.”10 The research provides yet another reason to be wary of the radio and television talking heads.
Second, Tetlock found foxes tend to be better predictors than hedgehogs. He writes: “High scorers look like foxes: thinkers who know many small things (tricks of their trade), are skeptical of grand schemes, see explanation and prediction not as deductive exercises but rather exercises in flexible ‘ad hocery’ that require stitching together diverse sources of information, and are rather diffident about their own forecasting prowess.”11
We can say that hedgehogs have one power tool while foxes have many tools in their toolbox. Of course, hedgehogs solve certain problems brilliantly—they certainly get their fifteen minutes of fame—but don’t predict as well over time as the foxes do, especially as conditions change. Tetlock’s research provides scholarly evidence of diversity’s power.
7
The Hot Hand in Investing
What Streaks Tell Us About Perception, Probability, and Skill
Long streaks are, and must be, a matter of extraordinary luck imposed on great skill.
—Stephen Jay Gould, “The Streak of Streaks”
Anyone can theoretically roll 12 sevens in a row.
—Bill Gross, Barron’s
Finding the Hot Shot
Humans are natural pattern seekers. One well-known example is the hot hand in basketball. A player who makes a few baskets in a row is considered to have a hot hand, which implies that he has a higher-than-normal chance of sinking his next shot. Research shows that sports fans, and the athletes themselves, believe in the hot hand phenomenon.
There’s only one problem: The hot hand does not exist. Scientists studied a season’s worth of shooting statistics of the Philadelphia 76ers and free-throw records of the Boston Celtics and found no evidence for the hot hand. Players did make successive shots, of course, but those streaks were completely consistent with probabilities. Streaks and slumps lie within the domain of chance.1
We see patterns where none exist because we’re wired to expect that the characteristics of chance show up not just in a total sequence but also in small parts of the sequence. Psychologists Amos Tversky and Daniel Kahneman call this “belief in the law of small numbers.”
For example, if you show someone a short section of a long coin-toss series, he will expect to see a fifty/fifty mix between heads and tails even though a short section will generally deviate systematically from chance. Even a short sequence of repeated heads is enough to convince most people (falsely) that the longer sequence is not random. That’s the reason we believe in hot hands.2
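A quick simulation makes the point. The sketch below (with hypothetical parameters: 10-flip windows, runs of four heads) slices a long, fair coin-toss series into short sections and checks how often those sections look “balanced” and how often they contain streaks.

```python
import numpy as np

rng = np.random.default_rng(7)
flips = rng.integers(0, 2, size=10_000)   # one long, fair coin-toss series
windows = flips.reshape(-1, 10)           # short 10-flip sections of that series

def has_run_of_heads(window, length=4):
    """True if the window contains `length` or more consecutive heads (1s)."""
    run = 0
    for flip in window:
        run = run + 1 if flip == 1 else 0
        if run >= length:
            return True
    return False

exactly_half = np.mean(windows.sum(axis=1) == 5)
with_streaks = np.mean([has_run_of_heads(w) for w in windows])
print("10-flip sections that are exactly 5 heads / 5 tails:", round(exactly_half, 2))  # ~0.25
print("10-flip sections containing a run of 4+ heads:      ", round(with_streaks, 2))  # ~0.25
```

Only about a quarter of the short sections come out exactly even, and runs of four heads appear about as often, purely by chance.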
The main point here, though, is not that humans are poor at relating probabilities to sequences of outcomes. The more important issue is that streaks inform us about probabilities. In human endeavors, unlike a fair coin toss, the probabilities of success or failure are not the same for each individual. Long success streaks happen to the most skillful in a field precisely because their general chance of success is higher than average.
Streaks and Skill
Here’s an illustration of the link between streaks and skill. Let’s say you have two basketball players, Sally Swish and Allen Airball. Sally, the more skilled of the two, makes 60 percent of her shot attempts. Allen only makes 30 percent of his. What are the probabilities of each player making five shots in a row? For Sally, the likelihood is (0.6)^5, or 7.8 percent. That means that Sally will get five in a row about once every thirteen sequences. Allen’s chances are only (0.3)^5, or 0.24 percent. So Allen’s only going to hit five straight once every 412 sequences. Without violating any probability principle, Sally is going to have a lot more streaks than Allen.3
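The arithmetic generalizes: if each shot is an independent make with probability p, any given five-shot window is all makes with probability p^5. A quick simulation with the two hypothetical shooters (invented shot counts, counting non-overlapping streaks) shows how lopsided the streak totals become:

```python
import numpy as np

def count_streaks(make_prob, shots=1_000, streak_len=5, seed=0):
    """Count non-overlapping runs of `streak_len` consecutive makes."""
    rng = np.random.default_rng(seed)
    makes = rng.random(shots) < make_prob
    streaks, run = 0, 0
    for made in makes:
        run = run + 1 if made else 0
        if run == streak_len:
            streaks += 1
            run = 0
    return streaks

print("P(5 in a row), 60% shooter:", round(0.6 ** 5, 4))   # 0.0778
print("P(5 in a row), 30% shooter:", round(0.3 ** 5, 4))   # 0.0024
print("Sally's streaks in 1,000 shots:", count_streaks(0.60))
print("Allen's streaks in 1,000 shots:", count_streaks(0.30))
```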
Consistent with this thesis, Wilt Chamberlain drained eighteen consecutive shots on February 24, 1967, to earn the NBA record for the longest field-goal streak in a game. Chamberlain made 54 percent of his field-goal attempts over his career, placing him among the game’s top twenty in shooting accuracy.
Baseball hitting streaks are another good way to test the notion that we should associate long streaks with skill (as well as luck). In Major League Baseball history, forty-two players have recorded hitting streaks of thirty or more games. The average lifetime batting average of these players is .311. To put that in perspective, a .311 lifetime average would place a hitter among the top one hundred in the history of the game.