(2013) and for more detail refer to Poon (2005). Here, I have two general observations.
The GARCH Family and Trading
The simplest forecasting model is to assume that the volatility over
the next N days will be the same as it was over the previous N
days. Mathematically,
$$\sigma^2_{t,\,t+N} = \sigma^2_{t-N,\,t} \qquad (3.1)$$
This has two major problems. First is the “windowing” effect
where a large single return affects the volatility calculation for N
days, then drops out of the sample. This creates jumps in the
volatility measurements and hence the forecast. An example is
given in Figure 3.1, where we calculate the 30-day volatility of
Maximus, Inc. (MMS) from June 15, 2019, to September 30,
2019.
FIGURE 3.1 The rolling 30-day close-to-close volatility of Maximus, Inc.
The typical daily move of this stock was about 0.7% but on August
8 it jumped by 12% because of earnings. This caused the 30-day
volatility to jump from 17.8% to 39.3%. Thirty days later, the
earnings day dropped out of the calculation and volatility fell
back to 23.3%. If we know which events are outliers, we can
avoid this problem by removing them from the data. Here, we can
simply throw out the earnings-day return.
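The windowing effect is easy to reproduce. The following is a minimal sketch on synthetic returns (alternating ±0.7% days with a single 12% jump), not the actual MMS data; the function name `rolling_volatility` and the 252-day annualization are assumptions made for illustration.

```python
import math

def rolling_volatility(returns, window=30, periods_per_year=252):
    """Annualized close-to-close volatility over each trailing window."""
    vols = []
    for end in range(window, len(returns) + 1):
        sample = returns[end - window:end]
        mean = sum(sample) / window
        variance = sum((r - mean) ** 2 for r in sample) / (window - 1)
        vols.append(math.sqrt(variance * periods_per_year))
    return vols

# Synthetic data (not MMS prices): alternating +/-0.7% days with one 12% jump.
returns = [0.007 if i % 2 == 0 else -0.007 for i in range(100)]
returns[50] = 0.12
vols = rolling_volatility(returns)

before = vols[20]  # window covers days 20..49, so the jump is not yet included
during = vols[21]  # window covers days 21..50, so the jump has just entered
after = vols[51]   # window covers days 51..80, so the jump has dropped out
print(f"before={before:.1%} during={during:.1%} after={after:.1%}")
```

The forecast jumps on the day the outlier enters the window and falls back exactly 30 observations later, even though nothing about the stock changed on that second date.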
A bigger problem is that this method doesn't take volatility
clustering into account. Periods of exceptionally high or low
volatility will persist for only a short time. The exponentially
weighted moving average (EWMA) model takes this into account.
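This says variance evolves as
$$\sigma_t^2 = \lambda \sigma_{t-1}^2 + (1-\lambda) r_t^2 \qquad (3.2)$$
where $r_t$ is the most recent return and λ is usually chosen to be between 0.9 and 1.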
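As a sketch of the recursion, assuming the standard form in which each new variance is a λ-weighted blend of the previous variance and the latest squared return, the update can be written in a few lines. The function name and the seeding choice (the first squared return) are illustrative assumptions.

```python
def ewma_variance(returns, lam=0.95, initial_variance=None):
    """Return the path of EWMA variances, one value per return."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    # Assumption: seed with the first squared return unless a value is supplied.
    variance = returns[0] ** 2 if initial_variance is None else initial_variance
    path = []
    for r in returns:
        variance = lam * variance + (1.0 - lam) * r * r
        path.append(variance)
    return path

# A single 10% shock followed by quiet days: the variance decays by lam each day.
shock_path = ewma_variance([0.10, 0.0, 0.0, 0.0], lam=0.9, initial_variance=0.0)
print(shock_path)
```

Unlike the rolling window, the shock never drops out abruptly. Its weight shrinks geometrically, so the forecast has no second jump.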
The GARCH (generalized autoregressive conditional
heteroskedasticity) family of models extends this idea to allow for
mean reversion to the long-term variance. The GARCH(1,1) model
(so-called because it contains only first-order lagged terms) is
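$$\sigma_t^2 = \gamma V + \alpha \sigma_{t-1}^2 + \beta r_t^2 \qquad (3.3)$$
where α, β, and γ sum to 1, V is the long-term variance, and γV is its weighted contribution.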
GARCH is both an insightful improvement on naively assuming
future volatility will be like the past and also wildly overrated as a
predictor. GARCH models capture the essential characteristics of
volatility: volatility tomorrow will probably be close to what it is
today and volatility in the long term will probably be whatever the
historical long-term average has been. Everything in between is
interpolation, and it is in the interpolation that the models in the
family differ. As an example, Figure 3.2 shows the term structure of forecast volatility for SPY on August 1, 2019, using GARCH(1,1)
and GJR-GARCH(1,1), which also accounts for the asymmetry of
positive and negative returns. Both models are estimated from the
previous four years of daily returns using MLE.
From a practical perspective, the difference is negligible. And this
is what has led to the proliferation of GARCH-type models. They
are all roughly the same. No model is clearly better than the
others. In any situation where there are many competing theories
it is a sign that all of the theories are bad. There is one
Schrödinger equation. It works very well. There are thousands of
GARCH variants. None work very well.
In fact, it has been shown that the forecasts from GARCH
generally are no better than a simple EWMA model, and most
professional traders are reluctant to use GARCH. Part of this
reluctance is due to the instability of the MLE parameters. These
can change considerably week to week. MLE also requires about a
thousand data points to get a good estimate. This means that if we
are using daily data, our forecast will be using information from
four years ago. This isn't good.
But there is a practical way to combine the robustness of EWMA
and the decay to a long-term mean that GARCH allows. When a
trader uses EWMA, he arbitrarily chooses the decay parameter
instead of fitting to historical data and using MLE. We can do the
same with GARCH. Choose a model, choose the parameters, and
use it consistently. This means that eventually we will develop
intuition by “seeing” the market through the lens of this model.
For indices, choosing α close to 0.9 and β between 0.02 and 0.04 seems to work.
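A fixed-parameter GARCH(1,1) forecast might be sketched as follows. I am assuming here that α weights the previous variance and β the latest squared return (which is what the suggested magnitudes imply), that γ = 1 − α − β, and that the expected variance decays toward the long-term level at rate α + β per day. The function name and the inputs are hypothetical.

```python
import math

def garch_term_structure(current_var, last_return, long_run_var,
                         alpha=0.9, beta=0.03,
                         horizons=(1, 5, 21, 63, 252), periods_per_year=252):
    """Annualized volatility forecast for each horizon (in trading days)."""
    gamma = 1.0 - alpha - beta
    if gamma <= 0.0:
        raise ValueError("alpha + beta must be below 1 for mean reversion")
    # One-step update: blend long-run variance, previous variance, squared return.
    next_var = gamma * long_run_var + alpha * current_var + beta * last_return ** 2
    persistence = alpha + beta
    curve = {}
    for n in horizons:
        # Expected daily variance h days ahead decays geometrically toward long_run_var.
        total = sum(long_run_var + persistence ** h * (next_var - long_run_var)
                    for h in range(n))
        curve[n] = math.sqrt(periods_per_year * total / n)
    return curve

# Illustrative inputs only: 30% current vol, 15% long-run vol, a 2% last return.
curve = garch_term_structure(current_var=0.30 ** 2 / 252,
                             last_return=0.02,
                             long_run_var=0.15 ** 2 / 252)
for days, vol in curve.items():
    print(f"{days:>3} days: {vol:.1%}")
```

Because the parameters are chosen rather than fitted, the only inputs that change from day to day are the current variance and the latest return, which is what makes the output stable enough to build intuition on.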
FIGURE 3.2 Term structure of forecast volatility for SPY using GARCH(1,1) (solid line) and GJR-GARCH (dashed line).
Implied Volatility as a Predictor
Implied volatility can be used to predict future realized volatility if
we account for the variance premium. So a forecast of the 30-day
volatility for the S&P 500 would be given by subtracting the
appropriate variance premium for the current VIX level (refer to
Table 4.3) from the VIX.
Most underlying products do not have a calculated VIX index. The
first way to deal with this is to follow the CBOE's published
methodology and construct a VIX. An easier way is to create a
weighted average of the appropriate ATM volatilities and use that
as a proxy. This methodology was used to create the original VIX
(ticker symbol VXO). VXO and the VIX returns have an 88%
correlation and the average difference between their values is
about 0.5% of the VIX level. This approximation isn't ideal but will
usually be the best there is.
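One simple way to build such a proxy is to interpolate the ATM implied volatilities of the two expirations that bracket 30 days. The sketch below interpolates linearly in total variance; this weighting is an assumption chosen for illustration, not the CBOE's VXO or VIX calculation, and the function name is hypothetical.

```python
import math

def constant_maturity_atm_vol(near_days, near_vol, far_days, far_vol, target_days=30):
    """Interpolate two ATM implied vols to a constant maturity.

    Assumption: total variance (vol^2 * days) is linear in time between
    the two expirations.
    """
    if near_days >= far_days or not near_days <= target_days <= far_days:
        raise ValueError("expirations must bracket the target maturity")
    weight_near = (far_days - target_days) / (far_days - near_days)
    near_total_var = near_vol ** 2 * near_days
    far_total_var = far_vol ** 2 * far_days
    target_total_var = weight_near * near_total_var + (1.0 - weight_near) * far_total_var
    return math.sqrt(target_total_var / target_days)

# Illustrative quotes: 18% ATM vol with 16 days left, 22% with 44 days left.
proxy = constant_maturity_atm_vol(16, 0.18, 44, 0.22)
print(f"30-day ATM proxy: {proxy:.2%}")
```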
Ensemble Predictions
The volatility market is now mature enough that any time series–
based volatility method will probably not provide forecasts that
are good enough to profit in the option market. A better approach
is to combine a number of different forecasts. This idea of the
usefulness of information aggregation is far from new. One of the
earliest advocates for the “wisdom of crowds” was Sir Francis
Galton. In 1878, he photographically combined many different
portraits to show that “all the portraits are better looking than
their components because the averaged portrait of many persons
is free from the irregularities that variously blemish the look of
each of them.” His experiment has been repeated and his
conclusions validated numerous times using more advanced
equipment.
An aggregate forecast can be better than any of the components
that make it up. This can be demonstrated with a simple example.
Imagine that we ask 100 people the multiple-choice question,
“What is the capital of Italy?” with the possibilities being Rome,
Milan, Turin, and Venice. Twenty of the group are sure about the
correct answer (Rome). The remaining 80 just guess so their
choices are equally divided among all the choices, which get 20
votes each. So, Rome receives 40 votes (the 20 people who knew
and 20 votes from guesses) and the other cities get 20 votes each.
Even though only a small proportion of the people had genuine
knowledge, the signal was enough to easily swamp the noise from
the guesses of the guessers.
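The arithmetic of this example can be checked directly. The function below is just a tally of the scenario described above, with the guessers assumed to split exactly evenly across the four options.

```python
def tally_votes(n_people=100, n_informed=20,
                options=("Rome", "Milan", "Turin", "Venice"), correct="Rome"):
    """Count votes when informed people pick the right answer and the rest guess evenly."""
    guessers = n_people - n_informed
    per_option, remainder = divmod(guessers, len(options))
    if remainder:
        raise ValueError("guessers must divide evenly among the options")
    votes = {option: per_option for option in options}
    votes[correct] += n_informed
    return votes

votes = tally_votes()
print(votes)  # {'Rome': 40, 'Milan': 20, 'Turin': 20, 'Venice': 20}
```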
This example also shows that for forecast combinations to be most
useful they need to contain diverse information. We need the
people who are wrong to be uncorrelated sources of noise. That
isn't the case with volatility time series models. Most models will
have very high correlation with each other. However, simply
averaging the predictions from a number of simple models will
still improve predictions. I used five volatility models to predict
subsequent 30-day S&P 500 volatility from 1990 to the end of
2018. Table 3.1 shows the summary statistics for each model and
also for a simple average.
The error of the average is only beaten by that of the simple 30-
day average (the least sophisticated model) but it beats it when we
consider the dispersion of results. Interestingly averaging the 0.9
and 0.95 EWMA models also leads to a slight improvement. This
is shown in Table 3.2.
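The mechanics of the comparison can be sketched with a few lines of code. The numbers below are toy values, not the S&P 500 data behind the tables, and I am assuming the error is defined as forecast minus realized volatility; the function names are hypothetical.

```python
import statistics

def average_forecasts(*forecasts):
    """Equal-weight average of several forecast series of the same length."""
    return [sum(values) / len(values) for values in zip(*forecasts)]

def error_summary(forecast, realized):
    """Mean and standard deviation of the forecast errors (forecast minus realized)."""
    errors = [f - r for f, r in zip(forecast, realized)]
    return {"mean_error": statistics.fmean(errors),
            "sd_error": statistics.stdev(errors)}

# Toy numbers in volatility points, for illustration only.
realized = [10.0, 20.0, 15.0, 25.0]
model_a = [13.0, 19.0, 17.0, 23.0]
model_b = [9.0, 23.0, 13.0, 25.0]
combined = average_forecasts(model_a, model_b)

for name, series in [("A", model_a), ("B", model_b), ("average", combined)]:
    print(name, error_summary(series, realized))
```

In this toy case the errors of the two models partly offset, so the averaged forecast has a smaller error dispersion than either component.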
Even very similar models can be usefully averaged. This is
probably the best way to apply this concept. Average over every
GARCH model possible, a wide range of time scales, and a wide
range of parameters. Ideally, the models that are averaged would
be based on totally different ideas or data, but with volatility this
won't happen.
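A short variance calculation shows why averaging still helps when the models are highly correlated. As a sketch, assume (purely for illustration, this is not taken from the tables) that each of n forecasts has an error $e_i$ with the same variance $s^2$ and that every pair of errors has the same correlation ρ.

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right)
  = \frac{1}{n^2}\left[\, n s^2 + n(n-1)\rho s^2 \,\right]
  = s^2\left[\rho + \frac{1-\rho}{n}\right] \le s^2
  \quad \text{for } \rho \le 1 .
```

With ρ = 0.9 and n = 2 the error variance falls to 0.95 s², a small but real improvement. As n grows the variance approaches ρs², so the gain is capped by how correlated the models are.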
TABLE 3.1 Thirty-Day Volatility Forecasts for the S&P 500 from 1990 to the End of 2018

                                    Average   30-Day Historical   EWMA        EWMA         VIX     GARCH(1,1)
                                              Volatility          (λ = 0.9)   (λ = 0.95)
Average Error (volatility points)   0.27      −0.10               −0.92       −0.76        0.30    0.44
SD of Error                         5.2       6.0                 5.8         5.9          5.2     5.9
10th Percentile                     −5.1      −5.8                −7.3        −6.6         −6.4    −6.3
90th Percentile                     5.2       5.8                 4.1         5.0          8.9     5.3
R-Squared                           0.65      0.62                0.62        0.60         0.64    0.60
TABLE 3.2 Thirty-Day Volatility EWMA Forecasts for the S&P 500 from 1990 to the End of 2018

                                    Average   EWMA (λ = 0.9)   EWMA (λ = 0.95)
Average Error (volatility points)   −1.1      −1.5             −0.76
SD of Error                         5.7       5.8              5.9
10th Percentile                     −6.9      −7.3             −6.6
90th Percentile                     4.5       4.1              5.0
R-Squared                           0.61      0.62             0.60
Conclusion
Realized volatility is reasonably forecastable for a financial time
series. Unfortunately, this means that it is hard to make a good
forecast that differs significantly from the market's consensus.
However, volatility predictions are essential even when they are
not the basis for finding edge. In particular, any sensible sizing
scheme will need a prediction of future volatility.
Summary
All trading strategies can be categorized as either model driven
or based on special situations. Each type has weaknesses and
strengths.
An ensemble prediction of volatility will usually outperform
time series methods.
CHAPTER 4
The Variance Premium
In finance, everything that is agreeable is unsound and everything that is
sound is disagreeable.
—Winston Churchill
The variance premium (also known as the volatility premium) is the tendency
for implied volatility to be higher than subsequently realized volatility.
This is not a recent phenomenon. In his 1906 book The Put and Call, Leonard Higgins writes how traders on the London Stock Exchange first determine a
statistical fair value for options, then “add to the ‘average value’ of the put and call an amount which will give a fair margin of profit.” That is, a variance
premium was added.
The variance premium exists in equity indices, the VIX, bonds, commodities,
currencies, and many stocks. It is probably the most important factor to be
aware of when trading options. Even traders who are not trying to directly
monetize the effect need to know of it and understand it. It is the tide that long option positions need to overcome to be profitable. Even traders who only use
options to trade directionally need to take this into account. Even if directional predictions are correct, it is very hard to make money if one is consistently
paying too much for options (see Chapters Six and Seven for more discussion of
this point).
This effect can be monetized in many ways. The variance premium is so large and
persistent that the precise details of a strategy often aren't very important. Practically any strategy that sells implied volatility has a
significant head start on being profitable if the premium is there.
In this chapter we will discuss the characteristics of the variance premium in
various products; look at the relationships among the variance premium,
correlation, and skewness; and give some possible reasons for the existence of
the effect.
Aside: The Implied Variance Premium
The variance premium refers to the difference between implied volatility, which can be defined by either BSM implied volatilities or variance swaps, and
subsequent realized volatility. There is a related phenomenon that occurs
entirely in the implied space. Being short VIX futures is generally a profitable strategy (although not a wildly successful one). Figure 4.1 shows the results of always being short the VIX front month future from June 2015 to October 2019.
The VIX itself doesn't decay in the same way (refer to Figure 4.2). This is really a
term-structure effect in the futures.
FIGURE 4.1 Profit from selling 1 front-month VIX future.
FIGURE 4.2 The VIX index from June 2015 to October 2019.
According to the rational expectations hypothesis, the VIX futures curve should
be an unbiased predictor of where the VIX index will be on the expiration date.
The narrowing of the basis as expiration approaches should therefore come
mainly from the cash index moving toward the futures price.
The theory of rational expectations has been tested on many different
commodity futures and it is generally a poor description of price movements.
Futures tend to move toward the cash. Alternatively, the cash VIX is a better
predictor of future VIX levels than the futures are.
It is probably not surprising that this also occurs in the VIX. VIX futures are
unusual. Generally, futures are priced by first assuming that they are forwards, then constructing an arbitrage-free portfolio of the underlying and the future.
However, the VIX index cannot be traded so this method is not useful for
pricing VIX futures. Given that VIX futures are not constrained by tight, no-
arbitrage bounds, there is even more room for inefficiencies.
On its own this doesn't mean short positions have to be profitable. But the VIX
term structure is usually in contango (from the time VIX futures were listed in
2006 to the start of 2019, the term structure has been in contango 81% of the
time). This means that the futures are above the cash and tend to decline toward
it. The best discussion of this effect is in Simon and Campasano (2014). Selling a
future only when the previous day's prices were in contango considerably
improves this strategy. Figure 4.3 shows the results of being short the VIX
front-month future from June 2015 to October 2019, when the term structure is
in contango.
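The contango filter can be expressed as a simple rule on daily closes. The sketch below is a deliberately stripped-down illustration: it ignores contract rolls, transaction costs, and margin, treats "contango" as the front-month future closing above the cash index on the previous day, and uses made-up prices. The function name is hypothetical, and the 1,000 multiplier is passed as a parameter so it can be changed.

```python
def short_front_month_pnl(cash, front, contango_only=True, multiplier=1000):
    """Cumulative P&L from being short one front-month future, one day at a time.

    cash, front: daily closes of the index and the front-month future.
    With contango_only=True the position is held over a day only if the
    previous day's future closed above the previous day's cash index.
    """
    pnl = 0.0
    for day in range(1, len(front)):
        in_contango = front[day - 1] > cash[day - 1]
        if contango_only and not in_contango:
            continue
        # A short position gains when the future falls.
        pnl += (front[day - 1] - front[day]) * multiplier
    return pnl

# Made-up closes for illustration only.
cash = [15.0, 16.0, 20.0, 18.0]
front = [16.0, 15.5, 19.0, 18.5]
print(short_front_month_pnl(cash, front, contango_only=True))
print(short_front_month_pnl(cash, front, contango_only=False))
```

In this toy path the filter keeps the position flat during the two days that start in backwardation, which is where the unfiltered short takes its loss.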
FIGURE 4.3 Profit from selling 1 front-month VIX future when the term structure is in contango.
Variance Premium in Equity Indices
Figure 4.4 shows the VIX and the subsequent 30-day realized volatility of the S&P 500 from 1990 to the end of 2018.
On average the VIX was four volatility points higher than the realized volatility and the premium is positive 85% of the time. Figure 4.5 shows the premium in
volatility points. Figure 4.6 shows the distribution of the daily premia and Table
4.1 gives the summary statistics.
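The premium series itself is just the implied volatility on each day minus the volatility realized over the following 30 days. A minimal sketch of that calculation and a few of the summary statistics, using toy numbers rather than the VIX and S&P 500 history, might look like this (the function names are hypothetical):

```python
import statistics

def variance_premium(implied, subsequent_realized):
    """Premium in volatility points: implied minus subsequently realized."""
    return [iv - rv for iv, rv in zip(implied, subsequent_realized)]

def premium_summary(premia):
    """Mean, median, and the fraction of observations with a positive premium."""
    return {"mean": statistics.fmean(premia),
            "median": statistics.median(premia),
            "share_positive": sum(p > 0 for p in premia) / len(premia)}

# Toy values in volatility points, for illustration only.
implied = [20.0, 15.0, 18.0, 30.0]
realized = [14.0, 12.0, 25.0, 22.0]
premia = variance_premium(implied, realized)
print(premium_summary(premia))
```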
FIGURE 4.4 The VIX and the subsequent 30-day realized S&P 500 volatility.
FIGURE 4.5 The S&P 500 variance premium (VIX minus realized volatility).
FIGURE 4.6 The S&P 500 variance premium distribution.
TABLE 4.1 Summary Statistics for the S&P 500 Variance Premium

Mean                 4.08
Standard deviation   5.96
Skewness             −2.33
Maximum              31.21
Minimum              −53.34
Median               4.63
90th percentile      9.62
10th percentile      −1.45
The Dow Jones 30, NASDAQ 100, and Russell 2000 indices have similar
variance premia. The summary statistics for these are given in Table 4.2.
TABLE 4.2 Summary Statistics for the Dow Jones, NASDAQ 100, and Russell 2000 Variance Premia
Index                Dow Jones     NASDAQ 100    Russell 2000
                     (from 1998)   (from 2001)   (from 2004)
Mean                 3.50          3.41          3.24
Standard deviation   6.18          6.99          6.58
Skewness
Positional Option Trading (Wiley Trading) Page 7