Monday, 15 May 2017

People are worried about the VIX

"Today the VIX traded below 10 briefly intraday. A pretty rare occurrence. Since 1993, there have been only 18 days where it traded below 10 intraday and only 9 days where it closed below 10." (source: some random dude on my linkedin feed)

... indeed 18 observations is a long.... long... way from anything close to a statistically significant sample size. (my response to random dude)

You can't move on the internet these days for scare stories about the incredibly low level of the VIX, a measure of US implied stock market volatility. Notably the VIX closed below 10 on a couple of days last week, although it has since slightly ticked up. Levels of the VIX this low are very rare - they've only happened on 11 days since 1990 (as of the date I'm writing this).

The VIX in all its glory


The message is that we should be very worried about this. The logic is simple - "the calm before the storm". Low levels of the VIX seem to presage scary stuff happening in the near future. Really low levels, then, must mean a very bad storm indeed.

Consider for example the VIX in early 2007:

Pootling around at 10 in late 2006, early 2007, the VIX responded to the failure of two Bear Stearns hedge funds which (as we know now) marked the beginning of the credit crunch. 18 months later there was a full blown panic happening.

This happened then, therefore it will happen again.

It struck me that this story is an example of what behavioural finance type people call narrative bias; the tendency of human beings to extrapolate single events into a pattern. But we need to use some actual statistics to see if we can really extend this anecdotal evidence into a full blown forecasting rule.

There has been some sensible attempt to properly quantify how worried we should be, most notably here on the FT alphaville site, but I thought it worth doing my own little analysis on the subject. Spoiler alert for the terminally lazy: there is probably nothing to be worried about. If you're going to read the rest of the post then along the way you'll also learn a little about judging uncertainty when forecasting, the effect of current vol on future price movements, and predicting volatility generally.

(Note: Explanations for the low level of the VIX abound, and self-appointed finance "experts" can be found pontificating on this subject. It's also puzzling that the VIX is so low when apparently serious-sized traders are buying options on it in bucketload-sized units (this guy thinks he knows why). I won't be dealing with this conundrum here. I'm only concerned about making money. To make money we just need to judge if the level of the VIX really has any predictive power. We probably don't need to know why the VIX is low.)


Does the level of VIX predict stock prices?


If this was an educational piece I'd work up to this conclusion gradually, but as it's clickbait I'll deal with the question everyone wants to know first (fully aware that most people will then stop reading).

This graph shows the distribution of rolling 20 business day (about one month) US stock returns since 1997:


(To be precise it's the return of the S&P 500 futures contract since I happened to have that lying around; strictly speaking you'd add LIBOR to these. The S&P data goes back to 1997. I've also done this analysis with actual US stock monthly returns going back to 1990. The results are the same - I'm only using the futures here as I have daily returns which makes for nicer, more granular, plots.) 
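(For the curious, here is a rough sketch of this sort of calculation in pandas. The file name and column names are made up for illustration; this isn't the exact code behind the plot.)

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file of daily S&P 500 futures prices with columns: date, close
prices = pd.read_csv("sp500_futures.csv", index_col="date",
                     parse_dates=True)["close"]

# Rolling 20 business day (roughly one month) percentage returns
monthly_returns = prices.pct_change(20).dropna()

# Unconditional distribution: no conditioning information yet
monthly_returns.hist(bins=100)
plt.title("Rolling 20 day S&P 500 returns")
plt.show()
```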

Important point here: this is an unconditional plot. It tells us how (un)predictable one month stock returns are in the absence of any conditioning information. Now let's add some conditioning information - the level of spot VIX:

I've split history in half - times when VIX was low (below 19.44%) shown in red, and when it was high (above 19.44%), which are in blue (overlaps are in purple). Things I notice about this plot are:


  • The average return doesn't seem to be any different between the two periods of history
  • The blue distribution is wider than the red one. In other words if spot VIX is high, then returns are likely to be more volatile. Really this is just telling us that implied vol (what the VIX is measuring) is a pretty good predictor of realised vol (what actually happens). I'll talk more about predicting vol, rather than the direction of returns, later in the post.
  • Digging in a bit more it looks like there are more bad returns in the blue period (negative skew to use the jargon)


The upshot of the first bullet point is that spot VIX doesn't predict future equity returns very well. In fact the average monthly return is 0.22% when vol is low, and 0.38% when vol is high; a difference of 0.16% a month. That doesn't seem like a big difference - and it's hard to see from the plot - but can we test that properly?

Yes we can. This plot shows the distribution of the differences in averages:

This was produced by Monte Carlo: repeatedly comparing the difference between random independent draws from the two distributions. This is better than using something like a 't-test', which assumes a particular distribution.
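(Something along these lines, assuming you have the low VIX and high VIX return samples as arrays; the exact procedure behind my plot may differ in the details.)

```python
import numpy as np

def mean_difference_distribution(low_vix_returns, high_vix_returns, n_draws=10000):
    """Bootstrap the difference (low VIX mean return - high VIX mean return)."""
    rng = np.random.default_rng(1)
    low = np.asarray(low_vix_returns)
    high = np.asarray(high_vix_returns)
    diffs = np.empty(n_draws)
    for i in range(n_draws):
        low_mean = rng.choice(low, size=len(low), replace=True).mean()
        high_mean = rng.choice(high, size=len(high), replace=True).mean()
        diffs[i] = low_mean - high_mean
    return diffs

# Proportion of draws where high VIX beats low VIX (negative differences):
# diffs = mean_difference_distribution(low_sample, high_sample)
# print((diffs < 0).mean())
```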

A negative number here means that high VIX gives a higher return than low VIX. We already know this is true, but the distribution plot shows us that this difference is actually reasonably significant. In fact 94.4% of the differences above are below zero. That isn't quite at the 95% level that many statisticians use for significance testing, but it's close.

To put it another way we can be 94.4% confident that the expected return for a low VIX (below 20%) environment will be lower than that for days when VIX is high (above 20%).

A moment's thought shows it would be surprising if we got a different result. In finance we expect that with higher risk comes a higher return. We know that when VIX is high, returns will have a higher volatility. So it's not shocking that they also come with higher average returns.

So a better way of testing this is to use risk adjusted returns. This isn't the place to debate the best way of risk adjusting returns; I'm going to use the Sharpe Ratio, and that is that. Here I define the Sharpe as the 20 business day return divided by the volatility of that return, and then annualised.

(You can see now why using the futures contract is better, because to calculate Sharpe Ratios I don't need to deduct the risk free rate)
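(In code, one plausible reading of that definition looks like this; `daily_returns` is an assumed pandas Series of daily futures returns, and I'm using roughly 256 business days in a year.)

```python
import numpy as np

def rolling_annualised_sharpe(daily_returns, window=20, days_in_year=256):
    """Sharpe over each 20 day window: mean daily return / daily vol, annualised.

    No risk free rate is deducted, since futures returns are already excess returns.
    """
    mean_daily = daily_returns.rolling(window).mean()
    std_daily = daily_returns.rolling(window).std()
    return (mean_daily / std_daily) * np.sqrt(days_in_year)
```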

Now we've adjusted for risk there is little to choose between the high VIX and low VIX environments. In fact things have reversed, with low VIX having a higher Sharpe Ratio than high VIX. But the difference in Sharpes is just 0.04, which isn't very much.


We can only be 63% confident that low VIX is better than high VIX. This is little better than chance, which would be 50% confidence.

An important point: notice that although the difference in Sharpes isn't significant, we do know it with reasonably high confidence, as each bucket of observations (high or low VIX) is quite large. We can be almost 100% confident that the difference was somewhere between -0.04 and +0.04.

"Hang on a minute!", I hear you cry. The point now is that vol is really really low now. The analysis above is for VIX above and below 20%. You want to know what happens to stock returns when VIX is incredibly low - below 10%.

The conditional Sharpe Ratio for VIX below 10 is actually negative (-0.14) versus the positive Sharpe we get the rest of the time (0.14). Do we have a newspaper story here?

Here is the plot of Sharpe Ratios for very low VIX below 10% (red), and the rest of the time (blue):

But hang on, where are the red bars in the plot? Well remember there are only a tiny number of observations where we see vol below 10. You can just about make them out at the bottom of the plot. In statistics when we have a small number of observations we can also be much less certain about any inference we can draw from them.

Here for example is the plot of the difference between the Sharpe Ratio of returns for very low VIX and 'normal' VIX.

Notice that the amount of uncertainty about the size of the difference is substantial. Earlier it was between -0.04 and 0.04, now it's between -1 and 0.5; a much larger range. To reiterate this is because one of the samples we're using to calculate the expected difference in Sharpe Ratios is very small indeed. It does look however as if there is a reasonable chance that returns are lower when VIX is low; we can be 86% confident that this is the case.

Perhaps we should do a "proper" quant investigation: take the top and bottom 10% of VIX observations, plus the middle, and compare and contrast. That way we can get some more data. After all, although statistics can allow us to make inferences from tiny sample sizes (like the 11 days the VIX closed below 10), it doesn't mean we should.



The big blue area is obviously the middle of the VIX distribution; whilst the purple (actually red on blue) is relatively low VIX, and the green is relatively high VIX.

It's not obvious from the plot but there is actually a nice pattern here. When the VIX is very low the average SR is 0.071; when it's in the middle the SR is 0.139, and when it's really high the SR is 0.20. 
Comparing these numbers the differences are actually highly significant (99.3% chance mid VIX is better than low VIX, 98.4% chance high VIX is better than mid VIX, and 99.999% chance high VIX is better than low VIX).
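(Roughly how you'd set that comparison up, assuming aligned daily pandas Series `vix` and `daily_returns` - both hypothetical names. The quantile cut-offs are for illustration; note also that the rolling windows overlap, which is why the Monte Carlo comparisons above are used for significance rather than naive t-tests.)

```python
import numpy as np
import pandas as pd

def sharpe_by_vix_bucket(vix, daily_returns, window=20, days_in_year=256):
    # Sharpe Ratio over the *subsequent* 20 business days
    fwd_mean = daily_returns.rolling(window).mean().shift(-window)
    fwd_std = daily_returns.rolling(window).std().shift(-window)
    fwd_sharpe = (fwd_mean / fwd_std) * np.sqrt(days_in_year)

    # Bottom 10%, middle 80%, and top 10% of the VIX distribution
    lo, hi = vix.quantile(0.1), vix.quantile(0.9)
    buckets = pd.cut(vix, bins=[-np.inf, lo, hi, np.inf],
                     labels=["low", "mid", "high"])
    return fwd_sharpe.groupby(buckets).mean()
```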

So it looks like there might be something here - an inverse relationship between VIX and future equity returns. However to be clear you should still expect to make money owning S&P 500 when the VIX is relatively low - just a little bit less money than normal. Buying equities when the VIX is above 30 also looks like a good strategy. It will be interesting to see if market talking heads start pontificating on that idea when, at some point, the VIX gets back to that level.

"Hang on another minute!!", I hear you unoriginally cry, again. The original story I told at the top of this post was about VIX spiking in February 2007, and the stock market reacting about 18 months later. Perhaps 20 business days is just too short a period to pick up the effect we're expecting. Let's use a year instead.




The results here are more interesting. The best time to invest is when VIX is very high (average SR in the subsequent year, 1.94). So the 'buy when everyone else is terrified' mantra is true. But the second best time to invest is when VIX is relatively low! (average SR 1.14). These are both higher Sharpes than what you get when the VIX is just middling (around 0.94). Again these are also statistically significant differences (low VIX versus average VIX is 97% confidence, the other pairs of tests are >99%).

I could play with permutations of these figures all day, and I'd be rightly accused of data mining. So let me summarise. Buying when the VIX is really high (say above 30) will probably result in you doing well, but you'll need nerves of steel to do it. Buying when the VIX is really low (say less than 15) might give you results that are a little worse than usual, or they might not.

However there is nothing special about the VIX being below 10. We just can't extrapolate from the tiny number of times it has happened and say anything concrete.


Does the level of VIX predict vol?


Whilst the VIX isn't that great for predicting the direction of equity markets, I noted in passing above that it looks like it's pretty good at predicting their future volatility.

We're still conditioning on low, middling, and high VIX here but the response variable is the annualised level of volatility over the subsequent 20 days. You can see that most of the red (turning purple) low VIX observations are on the left hand side of the plot - low VIX means vol will continue to be low. The green (high VIX) observations are spread out over a wider area, but they extend over to the far right.

Summarising:

  • Low VIX (below 12.5): Average subsequent vol 8%
  • Medium VIX: Average subsequent vol 12.3%
  • High VIX: Average subsequent vol 21.9%

These numbers are massively statistically significant from each other (above 99.99%). I get similar numbers for trying to predict one year volatility. 
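(The same bucketing code sketched earlier works here; just swap the response variable for realised vol over the subsequent 20 days. Again `vix` and `daily_returns` are assumed names, and the quantile cut-offs are illustrative rather than the exact thresholds above.)

```python
import numpy as np
import pandas as pd

def forward_vol_by_vix_bucket(vix, daily_returns, window=20, days_in_year=256):
    # Annualised realised vol over the *next* 20 business days
    fwd_vol = daily_returns.rolling(window).std().shift(-window) * np.sqrt(days_in_year)

    lo, hi = vix.quantile(0.1), vix.quantile(0.9)
    buckets = pd.cut(vix, bins=[-np.inf, lo, hi, np.inf],
                     labels=["low", "mid", "high"])
    return fwd_vol.groupby(buckets).mean()
```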

So it looks like the current low level of VIX means that prices probably won't move very much. 


Does the level of vol predict vol?


The VIX is a forward looking measure of future volatility, and it turns out a pretty good one. However there is an even simpler predictor of future vol, and that is recent vol. The level of the VIX, and the level of recent volatility, are very similar - their correlation is around 0.77.

Skipping to the figures, how well does recent vol (over the last 20 days) predict subsequent vol (over the next 20 days)?

  • Recent Vol less than 6.7%: Average subsequent vol 7.9%
  • Recent Vol between 6.7% and 21.7%: Average subsequent vol 13.2%
  • Recent Vol over 21.7%: Average subsequent vol 23.4%

These are also hugely significant differences (>99.99% probability). 
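(A sketch of the relevant calculations, with the same assumed inputs as before.)

```python
import numpy as np

def vol_persistence_stats(vix, daily_returns, window=20, days_in_year=256):
    # Recent realised vol, annualised and in percentage points like the VIX
    recent_vol = daily_returns.rolling(window).std() * np.sqrt(days_in_year) * 100
    # Realised vol over the *next* 20 days
    subsequent_vol = recent_vol.shift(-window)
    return {
        "corr(recent vol, VIX)": recent_vol.corr(vix),
        "corr(recent vol, subsequent vol)": recent_vol.corr(subsequent_vol),
    }
```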


The best way of predicting volatility


Interestingly if you use the VIX to try and predict what the VIX will be in one month's time, you find it is also very good. Basically both recent vol and implied vol (as measured by the VIX) cluster - high values tend to follow high values, and vice versa. Over the longer run vol tends not to stay high, but will mean revert to more average levels - and this applies to both implied vol (so the VIX) and realised vol.

So a complete model for forecasting future volatility should include the following:

  1. recent vol (+ve effect)
  2. current implied vol (the VIX) (+ve)
  3. recent vol relative to long run average (-ve)
  4. recent level of spot VIX relative to long run average
  5. (You can chuck in intraday returns and option smile if you have time on your hands)
However there is decreasing benefit from including each of these things. Recent vol does a great job of telling you what vol is probably going to be in the near future. Including the current level of the VIX improves your predictive power, but not very much.
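(If you wanted to turn that list into something concrete, a crude version might be a linear regression of subsequent vol on those terms. The functional form here is my own invention, purely to illustrate the shape of the thing; `vix` and `daily_returns` are assumed inputs as before.)

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fit_vol_forecast(vix, daily_returns, window=20, long_run=256 * 5):
    recent_vol = daily_returns.rolling(window).std() * np.sqrt(256)
    subsequent_vol = recent_vol.shift(-window)  # what we're trying to predict

    regressors = pd.DataFrame({
        "recent_vol": recent_vol,                                            # 1. (+ve)
        "implied_vol": vix,                                                  # 2. (+ve)
        "vol_vs_longrun": recent_vol - recent_vol.rolling(long_run).mean(),  # 3. (-ve)
        "vix_vs_longrun": vix - vix.rolling(long_run).mean(),                # 4.
    })
    data = pd.concat([subsequent_vol.rename("subsequent_vol"), regressors],
                     axis=1).dropna()
    # Beware: the regressors are highly correlated, so individual coefficients
    # will be unstable even if the overall fit looks reasonable
    return sm.OLS(data["subsequent_vol"],
                  sm.add_constant(data[regressors.columns])).fit()

# print(fit_vol_forecast(vix, daily_returns).summary())
```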


Summary


The importance of the VIX to future equity returns is somewhat overblown. It's just plain silly to say we can forecast anything from something that's only happened on a handful of occasions in the past (granted that the handful in question belongs to someone with 11 fingers). Low VIX might be a signal that returns will be a little lower than average in the short term, but it by no means signals that inevitable impending doom is fast approaching.

If there is a consistent lesson here it's that very high levels of VIX are a great buy signal. 

The VIX is also helpful for predicting future volatility - but if you have room in your life for just one forecasting rule, using recent realised vol is better.


Tuesday, 2 May 2017

Some reflections on QuantCon 2017

As you'll know if you've been following any of my numerous social media accounts, I spent the weekend in New York at QuantCon, a conference organised by Quantopian, who provide a cloud platform for backtesting systematic trading strategies in Python.

Quantopian had kindly invited me to come and speak, and you can find the slides of my presentation here. A video of the talk will also be available in a couple of weeks to attendees and live feed subscribers. If you didn't attend this will cost you $199 less a discount using the code CarverQuantCon2017 (That's for the whole thing - not just my presentation! I should also emphasise I don't get any of this money so please don't think I'm trying to flog you anything here).

Is a bit less than $200 worth it? Well read the rest of this post for a flavour of the quality of the conference. If you're willing to wait a few months then I believe that the videos will probably become publicly available at some point (this is what happened last year).

The whole event was very interesting and thought provoking; and I thought it might be worth recording some of the more interesting thoughts that I had. I won't bother with the less interesting thoughts like "Boy it's much hotter here than I'd expected it to be" and "Why can't they make US dollars of different denominations more easily distinguishable from each other?!".


Machine learning (etc etc) is very much a thing


Cards on the table - I'm not super keen on machine learning (ML), artificial intelligence (AI), neural networks (NN), and deep learning (DL) (or any mention of Big Data, or people calling me a Data Scientist behind my back - or to my face for that matter). Part of that bias is because of ignorance - it's a subject I barely understand, and part is my natural suspicion of anything which has been massively over-hyped.

But it's clearly the case that all this stuff is very much in vogue right now, to the point where at the conference I was told it's almost impossible to get a QuantJob unless you profess expertise in this subject (since I have none I'd be stuck with a McJob if I tried to break into the industry now); and universities are renaming courses on statistics "machine learning"... although the content is barely changed. And at QuantCon there was a cornucopia of presentations on these kinds of topics. Mostly I managed to avoid them. But the first keynote was about ML, and the last keynote, which was purportedly about portfolio optimisation (it was excellent, by the way, and I'll return to it later), strayed into ML territory too; so I didn't manage to avoid it completely.

I also spent quite a bit of time during the 'off line' part of the conference talking to people from the ML / NN / DL / AI side of the fence. Most of them were smart, nice and charming which was somewhat disconcerting (I felt like a heretic who'd met some guys from the Spanish inquisition at a party, and discovered that they were all really nice people who just happened to have jobs that involved torturing people). Still it's fair to say we had some very interesting, though very civilised, debates.

Most of these guys for example were very open about the fact that financial price forecasting is a much harder problem than forecasting likely credit card defaults or recognising pictures of cats on the internet (an example that Dr Ernie Chan was particularly fond of using in his excellent talk, which I'll return to later. I guess he likes cats. Or watches a lot of youtube).

Also, this cartoon:

Source: https://xkcd.com/1831/ This is uncannily similar to what DJ Trump recently said about healthcare reform.


The problem I have here is that "machine learning" is a super vague term which nobody can agree on a definition for. If for example I run the most simple kind of optimisation where I do a grid search over possible parameters and pick the best, is that machine learning? The machine has "learnt" what the best parameters are. Or I could use linear regression (200+ years old) to "learn" the best parameters. Or to be a bit fancier, if I use a Markov process (~100 years old) and update my state probabilities in some rolling out of sample Bayesian way, isn't that what an ML guy would call reinforcement learning?
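(To make the point concrete, here's the sort of trivial grid search I mean - pick whichever moving average lookback was most profitable in a backtest. The "machine" has "learnt" a parameter. The strategy itself is a throwaway example, not a recommendation, and the names are made up.)

```python
import pandas as pd

def best_lookback(prices, candidates=range(5, 105, 5)):
    """Grid search over moving average lookbacks; return the most profitable."""
    def crude_backtest(lookback):
        # Long when yesterday's price was above its moving average, else flat
        signal = (prices > prices.rolling(lookback).mean()).shift(1, fill_value=False)
        return prices.pct_change().where(signal, 0.0).sum()
    return max(candidates, key=crude_backtest)
```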

It strikes me as pretty arbitrary whether a particular technique is machine learning or considered to be "old school" statistics. Indeed look at this list of ML techniques that Google just found for me, here:

  1. Linear Regression
  2. Logistic Regression
  3. Decision Tree
  4. SVM
  5. Naive Bayes
  6. KNN
  7. K-Means
  8. Random Forest
  9. Dimensionality Reduction Algorithms
  10. Gradient Boost & Adaboost

Some of these machine learning techniques don't seem to be very fancy at all. Linear and logistic regression are machine learning? And also Principal Components Analysis? (which apparently is now a "dimensionality reduction algorithm". Which is like calling a street cleaner a "refuse clearance operative")

Heck, I've been using clustering algorithms like K-Means for donkey's years, mainly in portfolio construction (of which more later in the post). But apparently that's also now "machine learning".

Perhaps the only important distinction then is between unsupervised and supervised machine learning. It strikes me as fundamentally different to classical techniques when you let the machine go and do its learning, drawing purely from the data to determine what the model should look like. It also strikes me as potentially dangerous. As I said in my own talk, I wouldn't trust a new employee with no experience in the financial markets to do their fitting without supervision. I certainly wouldn't trust a machine.

Still this might be the only way of discovering a genuinely novel and highly non linear pattern in some rich financial data. Which is why I personally think high frequency trading is one of the more likely applications for these techniques (I particularly enjoyed Domeyard's Christina Qi's presentation on this subject, which most of us only know about through books like Flash Boys).

I think it's fair to say that I am a bit better disposed towards those on the other side of the fence than I was before the conference. But don't expect me to start using neural networks anytime soon.


... but "Classical" statistics are still important


One of my favourite talks, which I've already mentioned, was by Dr Ernie Chan, who talked about using some fairly well known techniques to enhance the statistical significance of backtests (not, alas, to identify pictures of cats on YouTube), with a specific example of a multi factor equity regression.


Source: https://twitter.com/saeedamenfx

Although I didn't personally learn anything new in this talk, I found it extremely interesting and useful in reminding everyone about the core issues in financial analysis. Fancy ML algorithms can't help solve the fundamental problem that we usually have insufficient data, and what we have has a pretty low ratio of signal to noise. Indeed most of these fancy methods need a shedload of data to work, especially if you run them on an expanding or rolling out of sample basis, as I would strongly suggest. There are plenty of sensible "old school" methods that can help with this conundrum, and Ernie did a great job of providing an overview of them.

Another talk I went to was about detecting structural breaks in relative value fixed income trading, presented by Edith Mandel of Greenwich Street Advisors. Although I didn't actually agree with the approach being used, this stuff is important. Fundamentally this business is about trying to use the past to predict the future. It's really important to have good robust tests to distinguish when this is no longer working, so we know that the world has fundamentally changed and it isn't just bad luck. Again this is something that classical statistical techniques like Markov chains are very much capable of doing.


It's all about the portfolio construction, baby


As some of you know I'm currently putting the final touches to a modest volume on the ever fascinating subject of portfolio construction. So it's something I'm particularly interested in at the moment. There were stacks of talks on this subject at QuantCon, but I only managed to attend two in person.

Firstly the final keynote talk, which was very well received, was on "Building Diversified Portfolios that Outperform Out-of-Sample", or to be more specific Hierarchical Risk Parity (HRP), by Dr. Marcos López de Prado:

Source: https://twitter.com/quantopian. As you can see Dr. Marcos is both intelligent, and also rather good looking (at least as far as I, a heterosexual man, can tell).

HRP is basically a combination of a clustering method to group assets and risk parity (essentially holding positions inversely scaled to a volatility estimate). So in some ways it is not hugely dissimilar to an automated version of the "handcrafted" method I describe in my first book. Although it smells a lot like machine learning, I really enjoyed this presentation; and if you can't use handcrafting because it isn't sophisticated enough, then HRP is an excellent alternative.
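(For flavour, here is a heavily stripped-down sketch of the general idea - cluster the assets, then weight inversely to volatility within and across clusters. To be clear, this is not Dr Marcos's actual algorithm, which uses quasi-diagonalisation and recursive bisection; see the links below for the real thing.)

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def clustered_inverse_vol_weights(returns, n_clusters=3):
    """returns: DataFrame of asset returns (one column per asset)."""
    corr = returns.corr()
    vols = returns.std()

    # Turn correlations into distances, then cluster hierarchically
    dist = np.sqrt(0.5 * (1.0 - corr))
    tree = linkage(squareform(dist.values, checks=False), method="single")
    labels = fcluster(tree, n_clusters, criterion="maxclust")

    # Inverse vol weights within each cluster, equal budget across clusters
    weights = pd.Series(0.0, index=returns.columns)
    for label in set(labels):
        members = returns.columns[labels == label]
        inv_vol = 1.0 / vols[members]
        weights[members] = (inv_vol / inv_vol.sum()) / len(set(labels))
    return weights
```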

There were also some interesting points raised in the presentation (and Q&A, and the bar afterwards) more generally about testing portfolio construction methods. Firstly Dr Marcos is a big fan (as am I) of using random data to test things. I note in passing that you can also use bootstrapping of real data to get an idea of whether one technique is just lucky, or genuinely better.

Secondly one of the few criticisms I heard was that Dr Marcos chose an easy target - naive Markowitz - to benchmark his approach against. Bear in mind that (a) nobody uses naive Markowitz, and (b) there are plenty of alternatives which would provide a sterner test. Future QuantCon presenters on this subject should beware - this is not an easy audience to please! In fairness other techniques are used as benchmarks in the actual research paper.

If you want to know more about HRP there is more detail here.

I also found a hidden gem in one of the more obscure conference rooms, this talk by Dr. Alec (Anatoly) Schmidt on "Using Partial Correlations for Increasing Diversity of Mean-variance Portfolio".

Source: https://twitter.com/quantopian


That is more interesting than it sounds - I believe this relatively simple technique could be something genuinely special and novel, allowing us to get bad old Markowitz to do a better job with relatively little work, without introducing the biases of techniques like shrinkage, and without the problems that come with constraints or bootstrapping. I plan to do some of my own research on this topic in the near future, so watch this space. Until then amuse yourself with the paper from SSRN.


Dude, QuantCon is awesome


Finance and trading conferences have a generally bad reputation, which they mostly deserve. "Retail" end conferences are normally free or very cheap, but mostly consist of a bunch of snake oil salesmen. "Professional" conferences are normally very pricey (though nobody there is buying their ticket with their own money), and mostly consist of a bunch of better dressed, and slightly subtler, snake oil salespeople.

QuantCon is different. Snake oil sales people wouldn't last 5 minutes in front of the audience at this conference, even if they'd somehow managed to get booked to speak. This was probably the single biggest concentration of collective IQ under one roof in finance conference history (both speakers and attendees). The talks I went to were technically sound, and almost without exception presented by engaging speakers.

Perhaps the only downside of QuantCon is that the sheer quantity and variety of talks makes decisions difficult, and results in a huge amount of regret at not being able to go to a talk because something only slightly better is happening in the next room. Still I know that I will have offended many other speakers by (a) not going to their talk, and (b) not writing about it here.

So I feel obligated to mention this other review of the event from Saeed Amen, and this one from Andreas Clenow, who are amongst the speakers whose presentations I sadly missed.

PS If you're wondering how much I'm getting paid by QuantCon to write this, the answer is zero. Regular readers will know me well enough to know that I do not shill for anybody; the only thing I have to gain from posting this is an invite to next year's conference!