Positional Option Trading (Wiley Trading)

by Euan Sinclair


  (2013) and for more detail refer to Poon (2005). Here, I have two general observations.

  The GARCH Family and Trading

  The simplest forecasting model is to assume that the volatility over

  the next N days will be the same as it was over the previous N

  days. Mathematically,

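  $$\hat{\sigma}_{t,\,t+N} = \sigma_{t-N,\,t} \tag{3.1}$$

  where σ with two subscripts denotes the volatility measured between those two days.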

  This has two major problems. First is the “windowing” effect, where a single large return affects the volatility calculation for N days, then drops out of the sample. This creates jumps in the volatility measurements and hence the forecast. An example is given in Figure 3.1, where we calculate the 30-day volatility of Maximus, Inc. (MMS) from June 15, 2019, to September 30, 2019.

  FIGURE 3.1 The rolling 30-day close-to-close volatility of Maximus, Inc.

  The typical daily move of this stock was about 0.7%, but on August 8 it jumped by 12% because of earnings. This caused the 30-day volatility to jump from 17.8% to 39.3%. Thirty days later, the earnings day dropped out of the calculation and volatility fell back to 23.3%. If we know which events are outliers, we can avoid this problem by removing them from the data. Here, we can just throw out the earnings-day return.
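  The windowing effect is easy to reproduce. The following is a minimal sketch on synthetic returns, not the MMS data: it assumes a zero-mean, close-to-close estimator annualized with 252 trading days, and the function name and the return series are made up for illustration.

```python
import math

def rolling_vol(returns, window=30, periods_per_year=252):
    """Annualized volatility over each trailing window of daily returns.

    Uses the zero-mean convention (root mean squared return). That choice is
    an assumption of this sketch, not something the text specifies.
    """
    vols = []
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        mean_sq = sum(r * r for r in chunk) / window
        vols.append(math.sqrt(mean_sq * periods_per_year))
    return vols

# Synthetic series: quiet 0.7% daily moves of alternating sign, plus one 12% jump.
returns = [0.007 * (1 if i % 2 == 0 else -1) for i in range(100)]
returns[50] = 0.12

vols = rolling_vol(returns, window=30)
before = vols[20]  # window covers days 20..49, just before the jump
during = vols[21]  # window covers days 21..50, so it includes the jump
after = vols[51]   # window covers days 51..80, so the jump has dropped out

print(f"before {before:.1%}, during {during:.1%}, after {after:.1%}")
```

  The estimate steps up the day the large return enters the window and steps back down exactly 30 observations later, even though nothing about the stock changed on that second date.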

  A bigger problem is that this method doesn't take volatility

  clustering into account. Periods of exceptionally high or low

  volatility will persist for only a short time. The exponentially

  weighted moving average (EWMA) model takes this into account.

  This says variance evolves as

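  $$\sigma_t^2 = \lambda\,\sigma_{t-1}^2 + (1-\lambda)\,r_{t-1}^2 \tag{3.2}$$

  where σ_t² is the variance estimate for day t, r_{t−1} is the previous day's return, and λ is usually chosen to be between 0.9 and 1.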
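  As a rough illustration, the EWMA update can be coded in a few lines. This sketch assumes the usual recursion in which the new variance is λ times the old variance plus (1 − λ) times the latest squared return. The seeding rule (first squared return unless a starting variance is supplied) and the function name are my own choices for the example.

```python
def ewma_variance(returns, lam=0.95, initial_variance=None):
    """Return the path of EWMA variance estimates.

    path[0] is the seed; path[i + 1] is the estimate after seeing returns[i].
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    if not returns:
        raise ValueError("need at least one return")
    # Seeding with the first squared return is an assumption of this sketch.
    var = returns[0] ** 2 if initial_variance is None else initial_variance
    path = [var]
    for r in returns:
        var = lam * var + (1.0 - lam) * r * r
        path.append(var)
    return path

# A single 10% shock hitting a 1%-per-day variance estimate, with lam = 0.9.
shock_path = ewma_variance([0.10, 0.0, 0.0], lam=0.9, initial_variance=0.01 ** 2)
```

  After the shock the estimate decays geometrically by a factor of λ per day instead of dropping off a cliff N days later, which is the practical advantage over the fixed window.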

  The GARCH (generalized autoregressive conditional heteroskedasticity) family of models extends this idea to allow for mean reversion to the long-term variance. The GARCH(1,1) model (so-called because it contains only first-order lagged terms) is

  $$\sigma_t^2 = \gamma V + \alpha\,\sigma_{t-1}^2 + \beta\,r_{t-1}^2 \tag{3.3}$$

  where, as in the EWMA model, σ² is the variance and r is the daily return; α, β, and γ sum to 1; and V is the long-term variance.

  GARCH is both an insightful improvement on naively assuming

  future volatility will be like the past and also wildly overrated as a

  predictor. GARCH models capture the essential characteristics of

  volatility: volatility tomorrow will probably be close to what it is

  today and volatility in the long term will probably be whatever the

  historical long-term average has been. Everything in between is

  interpolation, and it is in the interpolation that the models in the

  family differ. As an example, Figure 3.2 shows the term structure of forecast volatility for SPY on August 1, 2019, using GARCH(1,1)

  and GJR-GARCH(1,1), which also accounts for the asymmetry of

  positive and negative returns. Both models are estimated from the

  previous four years of daily returns using MLE.

  From a practical perspective, the difference is negligible. And this

  is what has led to the proliferation of GARCH-type models. They

  are all roughly the same. No model is clearly better than the

  others. In any situation where there are many competing theories

  it is a sign that all of the theories are bad. There is one

  Schrödinger equation. It works very well. There are thousands of

  GARCH variants. None work very well.

  In fact, it has been shown that the forecasts from GARCH

  generally are no better than a simple EWMA model, and most

  professional traders are reluctant to use GARCH. Part of this

  reluctance is due to the instability of the MLE parameters. These

  can change considerably week to week. MLE also requires about a

  thousand data points to get a good estimate. This means that if we

  are using daily data, our forecast will be using information from

  four years ago. This isn't good.

  But there is a practical way to combine the robustness of EWMA

  and the decay to a long-term mean that GARCH allows. When a

  trader uses EWMA, he arbitrarily chooses the decay parameter

  instead of fitting to historical data and using MLE. We can do the

  same with GARCH. Choose a model, choose the parameters, and

  use it consistently. This means that eventually we will develop

  intuition by “seeing” the market through the lens of this model.

  For indices, choosing α of about 0.9 and β between 0.02 and 0.04 seems to work.
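  To make the fixed-parameter idea concrete, here is a hedged sketch of a GARCH(1,1) volatility term structure. It assumes that α is the weight on yesterday's variance and β the weight on yesterday's squared return, which is the reading consistent with the ranges quoted above (many references swap the two names). It also uses the standard result that the expected variance decays geometrically toward the long-term variance at rate α + β. The input variances are hypothetical.

```python
import math

def garch_vol_term_structure(next_var, long_run_var, alpha=0.9, beta=0.03,
                             horizons=(1, 5, 21, 63, 252), periods_per_year=252):
    """Annualized volatility forecast for each horizon (in trading days).

    next_var is tomorrow's daily variance from the GARCH(1,1) update;
    long_run_var is the daily long-term variance V.
    """
    persistence = alpha + beta
    if not 0.0 <= persistence < 1.0:
        raise ValueError("alpha + beta must lie in [0, 1)")
    curve = {}
    for n in horizons:
        # Expected daily variance k days after tomorrow reverts toward V.
        expected = [long_run_var + persistence ** k * (next_var - long_run_var)
                    for k in range(n)]
        curve[n] = math.sqrt(periods_per_year * sum(expected) / n)
    return curve

daily_long_run_var = 0.01 ** 2  # hypothetical: about 15.9% annualized
daily_next_var = 0.02 ** 2      # hypothetical: about 31.7% annualized
curve = garch_vol_term_structure(daily_next_var, daily_long_run_var)
for days, vol in curve.items():
    print(f"{days:>3} days: {vol:.1%}")
```

  The short end of the curve sits at the current level and the long end approaches the long-term level. The parameters only control how fast the forecast travels between the two.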

  FIGURE 3.2 Term structure of forecast volatility for SPY using GARCH(1,1) (solid line) and GJR-GARCH (dashed line).

  Implied Volatility as a Predictor

  Implied volatility can be used to predict future realized volatility if

  we account for the variance premium. So a forecast of the 30-day

  volatility for the S&P 500 would be given by subtracting the

  appropriate variance premium for the current VIX level (refer to

  Table 4.3) from the VIX.

  Most underlying products do not have a calculated VIX index. The

  first way to deal with this is to follow the CBOE's published

  methodology and construct a VIX. An easier way is to create a

  weighted average of the appropriate ATM volatilities and use that

  as a proxy. This methodology was used to create the original VIX

  (ticker symbol VXO). VXO and the VIX returns have an 88%

  correlation and the average difference between their values is

  about 0.5% of the VIX level. This approximation isn't ideal but will

  usually be the best there is.

  Ensemble Predictions

  The volatility market is now mature enough that any time series–based

  volatility method will probably not provide forecasts that

  are good enough to profit in the option market. A better approach

  is to combine a number of different forecasts. This idea of the

  usefulness of information aggregation is far from new. One of the

  earliest advocates for the “wisdom of crowds” was Sir Francis

  Galton. In 1878, he photographically combined many different

  portraits to show that “all the portraits are better looking than

  their components because the averaged portrait of many persons

  is free from the irregularities that variously blemish the look of

  each of them.” His experiment has been repeated and his

  conclusions validated numerous times using more advanced

  equipment.

  An aggregate forecast can be better than any of the components

  that make it up. This can be demonstrated with a simple example.

  Imagine that we ask 100 people the multiple-choice question,

  “What is the capital of Italy?” with the possibilities being Rome,

  Milan, Turin, and Venice. Twenty of the group are sure about the

  correct answer (Rome). The remaining 80 just guess so their

  choices are equally divided among all the choices, which get 20

  votes each. So, Rome receives 40 votes (the 20 people who knew

  and 20 votes from guesses) and the other cities get 20 votes each.

  Even though only a small proportion of the people had genuine

  knowledge, the signal was enough to easily swamp the noise from

  the guessers.
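  The arithmetic of this example can be checked directly. The sketch below uses the idealized assumption from the text that the guesses split exactly evenly across the options; the function name is just for illustration.

```python
from collections import Counter

def tally_votes(options, correct, n_informed, n_guessers):
    """Vote counts when guessers split evenly and informed voters pick `correct`."""
    if n_guessers % len(options) != 0:
        raise ValueError("guessers must divide evenly among the options")
    votes = Counter({option: n_guessers // len(options) for option in options})
    votes[correct] += n_informed
    return votes

votes = tally_votes(["Rome", "Milan", "Turin", "Venice"], "Rome",
                    n_informed=20, n_guessers=80)
print(votes.most_common())
```

  Rome ends up with twice the votes of any other city even though only one voter in five actually knew the answer.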

  This example also shows that for forecast combinations to be most

  useful they need to contain diverse information. We need the

  people who are wrong to be uncorrelated sources of noise. That

  isn't the case with volatility time series models. Most models will

  have very high correlation with each other. However, simply

  averaging the predictions from a number of simple models will

  still improve predictions. I used five volatility models to predict

  subsequent 30-day S&P 500 volatility from 1990 to the end of

  2018. Table 3.1 shows the summary statistics for each model and

  also for a simple average.

  The error of the average is only beaten by that of the simple 30-day

  average (the least sophisticated model) but it beats it when we

  consider the dispersion of results. Interestingly, averaging the 0.9

  and 0.95 EWMA models also leads to a slight improvement. This

  is shown in Table 3.2.

  Even very similar models can be usefully averaged. This is

  probably the best way to apply this concept. Average over every

  GARCH model possible, a wide range of time scales, and a wide

  range of parameters. Ideally, the models that are averaged would

  be based on totally different ideas or data, but with volatility this

  won't happen.
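  A simple equal-weight ensemble and the kind of error statistics reported in the tables can be sketched as follows. The numbers are toy values, not the S&P 500 data. Defining the error as forecast minus realized volatility is my assumption about the sign convention, and the R-squared row is omitted.

```python
import statistics

def ensemble_forecast(forecasts):
    """Equal-weight average across models, date by date."""
    return [statistics.mean(values) for values in zip(*forecasts)]

def error_summary(forecast, realized):
    """Mean, standard deviation, and 10th/90th percentiles of forecast errors."""
    errors = [f - r for f, r in zip(forecast, realized)]
    deciles = statistics.quantiles(errors, n=10)  # nine cut points
    return {
        "mean_error": statistics.mean(errors),
        "sd_error": statistics.stdev(errors),
        "p10": deciles[0],
        "p90": deciles[-1],
    }

# Toy volatility numbers (in volatility points) for two hypothetical models.
realized = [10.0, 12.0, 15.0, 11.0, 14.0]
model_a = [12.0, 11.0, 17.0, 10.0, 15.0]
model_b = [8.0, 13.0, 13.0, 12.0, 13.0]

combined = ensemble_forecast([model_a, model_b])
summary_a = error_summary(model_a, realized)
summary_combined = error_summary(combined, realized)
```

  In this contrived case the two models' errors offset exactly, so the average is perfect. Real volatility models are highly correlated, so the improvement from averaging is far smaller.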

  TABLE 3.1 Thirty-Day Volatility Forecasts for the S&P 500 from 1990 to the End of 2018

                                      Average   30-Day Historical   EWMA        EWMA         VIX     GARCH(1,1)
                                                Volatility          (λ = 0.9)   (λ = 0.95)
  Average Error (volatility points)   0.27      −0.10               −0.92       −0.76        0.30    0.44
  SD of Error                         5.2       6.0                 5.8         5.9          5.2     5.9
  10th Percentile                     −5.1      −5.8                −7.3        6.6          −6.4    −6.3
  90th Percentile                     5.2       5.8                 4.1         5.0          8.9     5.3
  R-Squared                           0.65      0.62                0.62        0.60         0.64    0.60

  TABLE 3.2 Thirty-Day Volatility EWMA Forecasts for the S&P 500 from 1990 to the End of 2018

                                      Average   EWMA (λ = 0.9)   EWMA (λ = 0.95)
  Average Error (volatility points)   −1.1      −1.5             −0.76
  SD of Error                         5.7       5.8              5.9
  10th Percentile                     −6.9      −7.3             6.6
  90th Percentile                     4.5       4.1              5.0
  R-Squared                           0.61      0.62             0.60

  Conclusion

  Realized volatility is reasonably forecastable for a financial time

  series. Unfortunately, this means that it is hard to make a good

  forecast that differs significantly from the market's consensus.

  However, volatility predictions are essential even when they are

  not the basis for finding edge. In particular, any sensible sizing

  scheme will need a prediction of future volatility.

  Summary

  All trading strategies can be categorized as either model driven

  or based on special situations. Each type has weaknesses and

  strengths.

  An ensemble prediction of volatility will usually outperform

  time series methods.

  CHAPTER 4

  The Variance Premium

  In finance, everything that is agreeable is unsound and everything that is

  sound is disagreeable.

  —Winston Churchill

  The variance premium (also known as the volatility premium) is the tendency

  for implied volatility to be higher than subsequently realized volatility.

  This is not a recent phenomenon. In his 1906 book The Put and Call, Leonard Higgins writes how traders on the London Stock Exchange first determine a

  statistical fair value for options, then “add to the ‘average value’ of the put and call an amount which will give a fair margin of profit.” That is, a variance

  premium was added.

  The variance premium exists in equity indices, the VIX, bonds, commodities,

  currencies, and many stocks. It is probably the most important factor to be

  aware of when trading options. Even traders who are not trying to directly

  monetize the effect need to know of it and understand it. It is the tide that long option positions need to overcome to be profitable. Even traders who only use

  options to trade directionally need to take this into account. Even if directional predictions are correct, it is very hard to make money if one is consistently

  paying too much for options (see Chapters Six and Seven for more discussion of

  this point).

  This effect can be monetized in many ways. The size and persistence of the

  variance premium is so strong that the precise details of a strategy often aren't very important. Practically any strategy that sells implied volatility has a

  significant head start on being profitable if the premium is there.

  In this chapter we will discuss the characteristics of the variance premium in

  various products; look at the relationships among the variance premium,

  correlation, and skewness; and give some possible reasons for the existence of

  the effect.

  Aside: The Implied Variance Premium

  The variance premium refers to the difference between implied volatility, which can be defined by either BSM implied volatilities or variance swaps, and

  subsequent realized volatility. There is a related phenomenon that occurs

  entirely in the implied space. Being short VIX futures is generally a profitable strategy (although not a wildly successful one). Figure 4.1 shows the results of always being short the VIX front month future from June 2015 to October 2019.

  The VIX itself doesn't decay in the same way (refer to Figure 4.2). This is really a

  term-structure effect in the futures.

  FIGURE 4.1 Profit from selling 1 front-month VIX future.

  FIGURE 4.2 The VIX index from June 2015 to October 2019.

  According to the rational expectations hypothesis, the VIX futures curve should

  be an unbiased predictor of where the VIX index will be on the expiration date.

  The narrowing of the basis as time approaches the expiration date should be

  more dependent on the cash index moving toward the futures price.

  The theory of rational expectations has been tested on many different

  commodity futures and it is generally a poor description of price movements.

  Futures tend to move toward the cash. Put another way, the cash VIX is a better

  predictor of future VIX levels than the futures are.

  It is probably not surprising that this also occurs in the VIX. VIX futures are

  unusual. Generally, futures are priced by first assuming that they are forwards, then constructing an arbitrage-free portfolio of the underlying and the future.

  However, the VIX index cannot be traded so this method is not useful for

  pricing VIX futures. Given that VIX futures are not constrained by tight, no-

  arbitrage bounds, there is even more room for inefficiencies.

  On its own this doesn't mean short positions have to be profitable. But the VIX

  term structure is usually in contango (from the time VIX futures were listed in

  2006 to the start of 2019, the term structure has been in contango 81% of the

  time). This means that the futures are above the cash and tend to decline toward

  it. The best discussion of this effect is in Simon and Campasano (2014). Selling a

  future only when the previous day's prices were in contango considerably

  improves this strategy. Figure 4.3 shows the results of being short the VIX

  front-month future from June 2015 to October 2019, when the term structure is

  in contango.
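  The contango filter can be sketched as a simple rule. This is a rough illustration, not the backtest behind the figures: it treats contango as the front-month future closing above the cash VIX, uses the previous day's close to decide whether to be short today, ignores contract rolls and costs, and assumes a $1,000 contract multiplier. The prices are invented.

```python
def short_in_contango_pnl(front, spot, multiplier=1000.0):
    """Daily P&L of being short one front-month future only after contango closes.

    front and spot are aligned lists of daily closing prices. The decision for
    day t uses only day t-1 prices, so there is no look-ahead.
    """
    if len(front) != len(spot):
        raise ValueError("front and spot must have the same length")
    pnl = []
    for t in range(1, len(front)):
        in_contango_yesterday = front[t - 1] > spot[t - 1]
        if in_contango_yesterday:
            # A short position gains when the future falls.
            pnl.append(-(front[t] - front[t - 1]) * multiplier)
        else:
            pnl.append(0.0)
    return pnl

# Invented closing prices for four days.
front = [15.0, 14.0, 16.0, 15.0]
spot = [13.0, 14.5, 15.0, 16.0]
pnl = short_in_contango_pnl(front, spot)
```

  On the second day the future closed below the cash, so the rule stays flat on the third day and misses the rally in the future that would have hurt a permanent short.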

  FIGURE 4.3 Profit from selling 1 front-month VIX future when the term structure is in contango.

  Variance Premium in Equity Indices

  Figure 4.4 shows the VIX and the subsequent 30-day realized volatility of the S&P 500 from 1990 to the end of 2018.

  On average the VIX was four volatility points higher than the realized volatility and the premium was positive 85% of the time. Figure 4.5 shows the premium in

  volatility points. Figure 4.6 shows the distribution of the daily premia and Table

  4.1 gives the summary statistics.
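  The premium series itself is straightforward to compute. The sketch below is illustrative only. It approximates the 30-day window with the next 21 trading days, uses a zero-mean realized volatility annualized with 252 days, and runs on toy inputs instead of the actual VIX and S&P 500 data.

```python
import math

def subsequent_realized_vol(returns, start, window=21, periods_per_year=252):
    """Annualized realized volatility (in points) over the days after `start`."""
    chunk = returns[start + 1:start + 1 + window]
    if len(chunk) < window:
        return None  # not enough future data yet
    mean_sq = sum(r * r for r in chunk) / window
    return 100.0 * math.sqrt(periods_per_year * mean_sq)

def variance_premium(implied, returns, window=21):
    """Implied volatility minus subsequently realized volatility, day by day."""
    premia = []
    for t, iv in enumerate(implied):
        rv = subsequent_realized_vol(returns, t, window)
        if rv is not None:
            premia.append(iv - rv)
    return premia

# Toy inputs: implied volatility fixed at 20, daily moves of exactly 1%.
returns = [0.01 * (1 if i % 2 == 0 else -1) for i in range(50)]
implied = [20.0] * 50
premia = variance_premium(implied, returns)
share_positive = sum(p > 0 for p in premia) / len(premia)
```

  With real data the same series feeds directly into the mean, percentile, and percent-positive figures quoted above.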

  FIGURE 4.4 The VIX and the subsequent 30-day realized S&P 500 volatility.

  FIGURE 4.5 The S&P 500 variance premium (VIX minus realized volatility).

  FIGURE 4.6 The S&P 500 variance premium distribution.

  TABLE 4.1 Summary Statistics for the S&P 500 Variance Premium

  Mean                  4.08
  Standard deviation    5.96
  Skewness              −2.33
  Maximum               31.21
  Minimum               −53.34
  Median                4.63
  90th percentile       9.62
  10th percentile       −1.45

  The Dow Jones 30, NASDAQ 100, and Russell 2000 indices have similar

  variance premia. The summary statistics for these are given in Table 4.2.

  TABLE 4.2 Summary Statistics for the Dow Jones, NASDAQ 100, and Russell 2000 Variance Premia

  Index                 Dow Jones      NASDAQ 100     Russell 2000
                        (from 1998)    (from 2001)    (from 2004)
  Mean                  3.50           3.41           3.24
  Standard deviation    6.18           6.99           6.58
  Skewness

 
