Tuesday, 11 November 2025

Is predicting vol better worth the effort and does the VIX help?

 I'm a vol scaler.

There I've said it. Yes I adjust my position size inversely to vol. And so should you. 

But to do this well we need to be able to predict future vol; where the 'future' here is roughly the period over which we expect to hold our positions. 

Some people spend a lot of effort on this. They use implied vol from options, high(er) frequency data, GARCH or stochastic vol models. Other people don't spend a lot of effort on this. They look at the vol from the last month or so, and use that. I'm somewhere in the middle (though biased massively towards simplicity); I use an exponentially weighted moving average of recent vol combined with a much slower average.

An obvious question with any research effort is this: is the extra effort worth it? If we were trading options, then sure it would be. But we're not.

In this post I answer that 'is it worth spending time on this nonsense' question and look at the actual improvements we can gain from moving from the most rudimentary vol forecasting to the slightly more complex stuff I do. I also see if we can use a simple indicator of future volatility - the VIX - to improve things further. This was suggested by someone on Twitter(X). 


Is it worth predicting vol better?

I've mentioned this experiment a few times in the past, but I don't think I have ever blogged about it. You run two backtests: one with your normal historic vol estimation, and one with perfect foresight, i.e. a vol estimate equal to the ex-post vol over the next 30 days. The latter is the theoretical best possible job we could do if we really worked hard at forecasting vol. We can't do any better than a crystal ball. 

Then you check out the improvement. If vol is worth forecasting, there will be a big improvement in performance.

[This is a 'workhorse' test simulation with 100 liquid futures and 4 signals: 40% carry, and 20% in each of EWMAC 16, 32 and 64]

We begin with the simplest possible predictor of vol: a backward looking standard deviation estimate with an infinite window. Essentially this is a fixed vol estimate without any in sample estimation issues. We then compare that to the perfect foresight model.
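For concreteness, here's a minimal pandas sketch of the two vol estimates being compared. The function names and the annualisation factor of 16 (roughly sqrt(256)) are illustrative; this isn't the exact code behind the backtests.

import pandas as pd

def expanding_window_vol(returns: pd.Series) -> pd.Series:
    # 'Infinite window' estimate: at each date use every return seen so far
    return returns.expanding(min_periods=20).std() * 16

def perfect_foresight_vol(returns: pd.Series, horizon: int = 30) -> pd.Series:
    # Crystal ball benchmark: the ex-post vol over the *next* 30 days
    return returns.rolling(horizon).std().shift(-horizon) * 16

# Positions are then scaled inversely to whichever estimate we use,
# e.g. position = risk_target / vol_estimate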

Let's begin by looking at what the vol outcome is like. This is a one month rolling vol estimate (the realised vol of the strategy returns); clearly foresight does a better job of vol targeting.


Above are the cumulated returns. That sure looks like a decent improvement, and since the vol of perfect foresight is lower, it's better than it looks. It's a half unit improvement in SR points, from 0.76 to 1.24. The skew has dropped from over 1.0 monthly to 0.12, but you know from my previous posts that a small dip in skew won't be enough to destroy the huge CAGR advantage given by this sort of SR premium. The Sortino is much better: more than double. 

So the short answer is yes, it's worth predicting vol better. Let's see how close we can get.


What size window

The obvious thing to do is to shorten our estimation window from forever to something a little shorter. Here is a graph I like to show people:

The x-axis shows the window size for a historic vol estimator in business days. The y-axis shows the R squared regressing the realised vol for a given future time period against the estimator / predictor of future vol. We're looking for the point on the x-axis that maximises R squared. Each line is a different length of future time period. So for example, to get the best prediction of vol one month ahead (about 21 business days) we look at the purple line for 21 days, and we can see this peaks at around 25 days. 

This is also the highest R squared of any of the lines. We are better at predicting vol one month ahead than over other periods, and to do so we should use the previous one month of vol (actually slightly more than a month). 

We don't do quite as well predicting shorter periods, and it looks like we might need slightly less data to predict, say, 5 day vol. We do worse predicting longer periods, and it looks like we need more data: for 365 day ahead vol, the best R squared is obtained somewhere between 40 days (around 2 months) and 100 days (around 5 months). 
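Here's a sketch of how one line of that chart could be generated, using a simple rolling standard deviation as the predictor and statsmodels for the regression. The window and horizon grids are arbitrary and the names are mine; treat it as an illustration rather than the code behind the plot.

import pandas as pd
import statsmodels.api as sm

def r_squared_for_window(returns: pd.Series, window: int, horizon: int) -> float:
    # Predictor: trailing vol over `window` business days
    predictor = returns.rolling(window).std()
    # Target: realised vol over the *next* `horizon` business days
    realised = returns.rolling(horizon).std().shift(-horizon)
    both = pd.concat([predictor.rename("x"), realised.rename("y")], axis=1).dropna()
    fit = sm.OLS(both["y"], sm.add_constant(both["x"])).fit()
    return fit.rsquared

# One line of the chart: R squared against window size, for a 21 day horizon
# r2_by_window = {w: r_squared_for_window(returns, w, 21) for w in range(5, 201, 5)}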

Note: these are good R squared values! In my last post a monthly holding period with an R squared of 0.1 would give us a SR of over 1, which is good. Here we are seeing R squared values of over 0.30, which equates to a SR of nearly 2. That is very good - if we were as good at predicting returns as we are at predicting vol, our SR would be two!

With that in mind, let's go from an infinite lookback to a 25 business day lookback and see what happens.

First the rolling vol:

We can already see a fair improvement from the spikiness of the benchmark. How about the returns?

It looks like we are doing better than the benchmark and are competitive with foresight. However some of this is just higher vol; our SR is 1.03, which still falls short of the 1.24 of the perfect foresight model, though it is obviously much better than the 0.76 of the infinite lookback benchmark.

To recap:

Infinite previous vol                  SR 0.76
One month simple rolling vol           SR 1.03
Perfect foresight                      SR 1.24


From simple to exponential moving average

Now let's be a little fancier and go to an EWM of vol rather than a simple equally weighted measure. This might not get us a better forecast of vol, but it should be smoother. A 36 day span in the pandas EWM function has the same half life as a 25 day SMA.
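In pandas that's just a change of method. A minimal sketch (the placeholder returns are only there to make the snippet self contained):

import numpy as np
import pandas as pd

# Placeholder daily returns, purely so the snippet runs stand-alone
returns = pd.Series(np.random.normal(0, 0.01, 2500))

# Simple 25 business day rolling vol, annualised with sqrt(256) ~= 16
simple_vol = returns.rolling(25).std() * 16

# Exponentially weighted version with a 36 day span
ewm_vol = returns.ewm(span=36, min_periods=10).std() * 16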

As before, here's the vol targeting, which is now almost identical:


And for profits....


Again we aren't quite vol matched, but EWM does in fact add a small increment in SR of 0.04 units. Around a quarter of that modest bump comes from lower costs (a saving of around 24 bp a year). 


Infinite previous vol                  SR 0.76
One month simple rolling vol           SR 1.03
One month EWM rolling vol              SR 1.06
Perfect foresight                      SR 1.24


I already looked at this in my book AFTS, but if we combine the standard 25 day EWM vol with a very long run average (10 years) of the same vol, we get another small bump. This is the vol measure I use myself.
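Something along these lines, in pandas. The 70/30 weighting shown here is illustrative rather than a quote from the book, so check AFTS for the exact blend:

import numpy as np
import pandas as pd

# Placeholder daily returns so the snippet runs stand-alone
returns = pd.Series(np.random.normal(0, 0.01, 5000))

# Short run estimate: EWM vol with a 36 day span (~25 day SMA equivalent)
short_run_vol = returns.ewm(span=36, min_periods=10).std()

# Very long run average of that same vol, roughly ten years of business days
long_run_vol = short_run_vol.rolling(2560, min_periods=256).mean()

# Blend the two; the weights here are purely illustrative
blended_vol = 0.7 * short_run_vol + 0.3 * long_run_vol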


Introducing the VIX

We are still some way short of perfect foresight vol. So let's do something else, for fun. We know that implied vol should be a good predictor of future vol, once we account for the well known vol premium (we get paid for being short gamma, hence implied is persistently higher than expected future vol).

Here's the simple rolling 25 day standard deviation measure for the S&P 500, and the VIX:

Note: I would like to thank Paul Calluzzo for pointing out a stupid mistake I had made in the first version of this post

A couple of things to notice. Firstly, the vol premium is larger after 2008 due to a general level of scaredy-cat-ness, and it seems to have narrowed somewhat in the last few years. Over the last few years there have been a lot of dumb retail people selling vol and pushing the price down! 

Secondly, it looks like the VIX tracks rather than predicts increases in risk, at least for those unexpected events which cause the biggest spikes. Which suggests its predictive power will be somewhat limited.

If we regress future vol on historic vol plus the VIX, the VIX coefficient is 0.14 and the historic vol comes in at 0.71. That suggests historic vol does most of the explaining, with the VIX not adding much to the party. I get similar results if I put the vol premium (VIX - historic vol) plus historic vol into the regression to reduce potential collinearity. 
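For anyone who wants to replicate that regression, here's a sketch with statsmodels. The input series names are assumptions, and the units need to be comparable, so historic and future vol are expressed in annualised percentage points like the VIX:

import pandas as pd
import statsmodels.api as sm

def vol_regression(returns: pd.Series, vix: pd.Series, horizon: int = 21):
    # Historic vol: 25 day rolling standard deviation, annualised, in % points
    historic = returns.rolling(25).std() * 16 * 100
    # Future realised vol over the next `horizon` business days, same units
    future = (returns.rolling(horizon).std() * 16 * 100).shift(-horizon)
    data = pd.concat(
        [future.rename("future"), historic.rename("historic"), vix.rename("vix")],
        axis=1).dropna()
    X = sm.add_constant(data[["historic", "vix"]])
    return sm.OLS(data["future"], X).fit()

# fit = vol_regression(sp500_returns, vix_series)
# print(fit.params)  # compare the historic vol and VIX coefficients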

Summary

There are significant performance benefits to be gained from forecasting vol well even in a directional system that doesn't trade optionality. Over half of those benefits can be captured by just using the right amount of lookback on a simple historical estimate. Further complexity can probably improve vol targeting but is unlikely to lead to significant performance improvements. Finally, the VIX is not especially helpful in predicting future volatility; mostly this is explained pretty well by historic vol.



Saturday, 1 November 2025

R squared and Sharpe Ratio

 Here's some research I did whilst writing my new book (coming next year, and aimed at relatively inexperienced traders). Imagine the scene. You're a trader who produces forecasts (a scaled number which predicts future risk adjusted returns, or at least you hope it does) and who wants to evaluate how good you are. After all, you've read Carver, and you know you should use your expected Sharpe Ratio to determine your risk target and cost budget.

But you don't have access to cutting edge backtesting software, or even dodgy home brew backtesting software like my own pysystemtrade; instead you just have Excel (substitute your own favourite spreadsheet, god knows I certainly don't use the Micros*it product myself). You're not enough of a spreadsheet whizz to construct a backtest, but you can just about manage a linear regression. But how do we get a Sharpe Ratio from a regression?

If that is too much of a stretch for the typical reader of this blog, instead imagine that you do fancy yourself as a bit of a data scientist, and naturally you begin your research by regressing your risk adjusted returns on your forecasts to identify 'features' (I'm given to understand this is the way these people speak) before going anywhere near your backtester, because you've read Lopez De Prado.

Feels like we're watching a remake of that classic scene in Good Will Hunting doesn't it "Of course that's your contention. You're a first year data scientist. You just finished some financial economist, Lopez De Prado prob'ly, and so naturally that's what you believe until next month when you get to Rob Carver and get convinced that momentum is a risk factor. That'll last until sometime in your second year..."

But you're wondering whether an R squared of 0.05 is any good or not. Unlike the Sharpe Ratio, where you know that 1 is good, 2 is brilliant, and 3 means you are either the next RenTech or, more likely, you've overfitted.

So I thought it would be 'fun' to model the relationship between these two measures of performance. Also, like I said, it's useful for the book. Which is very much aimed at the tech novice trader rather than the data scientist, but I guess the data scientist can just get the result for free from this blogpost as they're unlikely to buy the book.

There are three ways we can do this. We can use a closed form formula, we can use random data, or we can use actual data. I'm going to do all three: partly to verify the formula works in the real world, and partly to see whether real forecasts behave anything like idealised random data.

There is code here; you'll need pysystemtrade to run it though.

Edit notes: I'd like to thank LacertaXG1 and Vivek Rao for reminding me that a closed form formula exists for this problem.


Closed form formula

From the book known only as G&K we have one of my favourite laws, LAM - the law of active management. This is where the famous 'Sharpe Ratio (actually Information Ratio, but we're amongst friends) is proportional to sqrt active bets' comes from, a result we use in both portfolio size space (the IDM for a portfolio of N uncorrelated assets ought to be sqrt N) and in time space (for a given success rate the SR for a trading strategy with holding period T will be sqrt 2 times better if we halve our holding period). 

Anyway, under LAM an R squared of 0.01 at an annual holding period equates to an IC/SR of 0.10. At a daily holding period we'd expect the same R squared to produce a SR that is sqrt(256) = 16 times higher, i.e. a SR of 1.6. Let's see how well this is borne out by the data.


Real data and forecasts

This is the easiest one. We're going to get some real forecasts, for things like carry, momentum. You know the sort of thing I do. If not, read some books. Or if you're a cheapskate, the rest of this blog. And we get the price of the things the forecasts are for. And because I do indeed have fancy backtesting software I can measure the SR for a given forecast/price pairing*. 

* to do this we need a way of mapping from forecast to positions; basically I just do inverse vol position scaling with my standard simple vol estimate, which is roughly the last month of daily returns. The overall forecast scaling doesn't really matter, because we're not interested in the estimated coefficients of the regression, just the R squared. 

And because I can do import statsmodels in python, I can also do regressions. What's the regression I do? Well, since forecasts are for predicting future risk adjusted returns, I regress:

(price_t+h - price_t)/vol_estimate_t = alpha + beta * (forecast_t) + epsilon_t 

Where t is time index, and h is the forecast horizon in calendar days, which I measure simply by working out the forecast turnover (by counting the typical frequency of forecast sign changes from negative to positive in a year), and then dividing 365 by the turnover. 
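Roughly, in python, the horizon measurement and the regression look something like this. It's a sketch under my reading of the method above; shifting the price by h business days is an approximation of the calendar day horizon, and all the names are mine:

import pandas as pd
import statsmodels.api as sm

def forecast_horizon_in_days(forecast: pd.Series) -> float:
    # Count sign changes from negative to positive, express per year, then invert
    positive = forecast > 0
    previous = positive.shift(1, fill_value=False)
    crossings_per_year = (positive & ~previous).sum() / (len(forecast) / 256)
    return 365 / crossings_per_year

def regression_r_squared(price: pd.Series, forecast: pd.Series,
                         vol: pd.Series, h: int) -> float:
    # Regress (price_t+h - price_t) / vol_t on forecast_t and return R squared
    future_return = (price.shift(-h) - price) / vol
    data = pd.concat([future_return.rename("y"), forecast.rename("x")],
                     axis=1).dropna()
    fit = sm.OLS(data["y"], sm.add_constant(data["x"])).fit()
    return fit.rsquared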

Strictly speaking we should remove overlapping periods, as they will inflate our R squared; but as long as we are consistent in not removing them, our results will be fine.

Beta we don't care about as long as it's positive (it's some arbitrary scaling factor that will depend on the size of h and the forecast scaling), and alpha will be any bias in the forecast which we also don't care about. All we care about is how well the regression fits, and for that we use R squared. 

Note: We could also look at the statistical significance of the beta estimate, but that's going to depend on the length of time period we have. I'd rather look at the statistical significance of the SR estimate once we have it, so we'll leave that to one side. 

Anyway we end up with a collection of SR and the counterpart R squared for the relevant regression. Which we'll plot in a minute, but let's get random data first.


Random data

This is the slightly harder one. To help out, let's think about the regression we're going to end up running:

(price_t+h - price_t)/vol_estimate_t = alpha + beta * (forecast_t)  + epsilon_t 

And let's move some stuff around:

forecast_t = (1/beta) * (price_t+h - price_t)/vol_estimate_t + (alpha/beta) + (epsilon_t/beta)

If we assume that alpha is zero, and we're not bothered about arbitrary beta scaling, then we can see that:

forecast_t = (price_t+h - price_t)/vol_estimate_t + noise

This means we can do the following:
  • Create a random price series, compounded gaussian random is fine, and scaling doesn't matter
  • Measure its backward looking vol estimate
  • Work out the future risk adjusted price return at any given point for some horizon, h
  • Add noise to it (as a multiple of the gaussian standard deviation)
  • Voila! As the French would say. We have a forecast! (Or nous avons une prévision! As the French would say)
We now have a price, and a forecast. So we can repeat the exercise of measuring a SR and doing a regression from which we get the R squared. And we'll get the behaviour we expect; more noise equals lower SR and a worse R squared. We can run this bad boy many times for different horizons, and also for different levels of noise.
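A single trial of the random data exercise might look something like the sketch below. The noise scaling and the mapping from forecast to position are my assumptions about reasonable choices, not the exact code behind the plots; by construction the forecast contains look-ahead information, which is the whole point.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def one_random_trial(h: int = 21, noise_multiple: float = 3.0, n_days: int = 2560):
    # Random price: cumulated gaussian daily returns (scaling is irrelevant)
    returns = pd.Series(np.random.normal(0, 1.0, n_days))
    price = returns.cumsum()

    # Backward looking vol estimate
    vol = returns.rolling(25).std()

    # Future risk adjusted return over horizon h, plus noise, gives the forecast
    future_ra_return = (price.shift(-h) - price) / vol
    noise = np.random.normal(0, noise_multiple * future_ra_return.std(), n_days)
    forecast = future_ra_return + noise

    # Crude inverse vol position scaling, then the strategy SR
    position = forecast / vol
    strategy_returns = (position.shift(1) * returns).dropna()
    sharpe = 16 * strategy_returns.mean() / strategy_returns.std()

    # R squared from regressing future risk adjusted returns on the forecast
    data = pd.concat([future_ra_return.rename("y"), forecast.rename("x")],
                     axis=1).dropna()
    r_squared = sm.OLS(data["y"], sm.add_constant(data["x"])).fit().rsquared
    return sharpe, r_squared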


Results

Without further ado, here are some nice pictures. We'll start with the fake data. Each of the points on these graphs is the mean SR and R squared from 500 random price series. The x-axis is a LOG scale for R squared: 10^-2 is 0.01 and so on, you know the drill. The y-axis is the SR, not logged. The titles are the forecast horizons in business days, so 5 days is a week, etc.

As we're trading quickly, we get pretty decent SR even for R squared that would make you sad. An R squared of 0.01, which sounds rubbish, gives you a SR of around 0.7. 

Here's around a monthly holding period:


Two months:


Three months:


Six months:

And finally, one year:



Right, so what are the conclusions? There is some fun intuition here. We can see that an R squared of 0.01 equates to a SR of 0.1 at an annual holding period, as the theory suggests. It's also clear that an R squared of 0.1, which is very high for financial data, isn't going to help that much if your holding period is a year: your SR will still only be around 0.30. Whereas if you're trading fifty times faster, around once a week, it will be around 2.30 with an R squared of 0.1. The ratio between these two numbers (7.6) is almost exactly equal to the square root of fifty (7.1), and this is no accident; our results are in line with the law of active management, which is a nice touch.

Neatly, an R squared of 1 equates exactly to a SR of 1 at a one year holding period.

Now how about some real results. Here we don't know what the forecast horizon is, instead we measure it from the forecast. This does mean we won't have neat graphs for a given horizon, but we can do each graph for a range of horizons. And we don't have to make up the forecast by reversing the regression equation, we just have forecasts already. And the price, well of course we have prices.

Important note! Unlike with fake data where we're unlikely to lose money on average, with real data we can lose money. So we remove all the negative SR before plotting.

Here's for a horizon of about 5 days:

No neat lines here; each scatter point represents an instrument and trading rule (probably mostly fast momentum). Remember this from earlier for the 5 day plot with fake data: "An R squared of 0.01, which sounds rubbish, gives you a SR of around 0.7". You can see that is still true here. And also the general shape is similar to what we'd expect; a gentle upward curve. We just have more really low SR, and (sadly!) fewer higher SR than in the fake data.

About two weeks:

About a month:

About two months:
About three months:

About six months... notice things are getting sparser

And finally about a year:

There is very little to go on here, but an R squared of 0.1 which before gave a SR of 0.3 isn't a million miles away at 0.5. In general I'd say the real results come close to confirming the fake results.


Summary

Both data scientists and neophyte traders alike can use the fake data graphs to get a SR without doing a backtest. Do your regression at some forecast horizon for which a fake data graph exists. Don't remove overlapping periods. If the beta is negative then you're losing money. If the beta is positive then you can look up the SR implied by the R squared.

You can also use any graph, and then correct the results for LAM. For example, if you want the results for 1 day, then you can use the results for 5 days and multiply the SR by sqrt(5). But you want a closed form solution. So here is one, assuming 256 business days in a year:

The SR for N days holding period is equal to 16 * sqrt(R squared / N)
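In code, and checked against a few of the numbers quoted above:

import math

def sharpe_from_r_squared(r_squared: float, holding_period_days: float) -> float:
    # SR = 16 * sqrt(R squared / N), assuming 256 business days in a year
    return 16 * math.sqrt(r_squared / holding_period_days)

print(sharpe_from_r_squared(0.01, 256))  # annual holding period -> 0.10
print(sharpe_from_r_squared(0.01, 5))    # weekly holding period -> ~0.72
print(sharpe_from_r_squared(0.10, 256))  # annual, R squared 0.1 -> ~0.32
print(sharpe_from_r_squared(1.00, 256))  # R squared of 1, annual -> exactly 1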