Tuesday 3 October 2023

The State Of Vol

 I'm sometimes asked where I get my ideas for new trading strategies from. The boring truth is I rarely test new trading strategies, and I mostly steal ideas when I feel in the mood. Today for example I saw this tweet post on twitter X:

The original paper is here  (requires subscription or academic institution membership)

Now I've written in the past about how volatility levels affect the profits of trading strategies, in particular momentum where there is a pretty striking effect (see my new book Advanced Futures Trading Strategies [AFTS] for more info), and I've also written about how the level of past vol affects future vol (and indeed, this incredible predictability of vol is a cornerstone of the inverse volatility sizing formula that lies at the heart of my trading strategies), but I don't think I've ever written about the level of volatility's effect on future price movements.

There is quite a simple story here, which if you can see the paper is in figure 1: absolute returns are pretty flat across different vol regimes, and because vol is pretty autocorrelated from month to month, we'd expect return/vol to be higher if recent vol was lower. To put it another way, there is a link here with my previous post about CAPM. This story is sort of about the time series version of CAPM, in which higher vol should be rewarded with higher returns (and hence Sharpe Ratios should not show any pattern conditional on relative volatility levels). In single stocks the opposite happens, in an analogy to the fact that CAPM doesn't work in cross section for single stocks; in fact we should 'bet against beta' and buy low beta stocks.

But the paper is written for single stocks; a risky asset, and one with some idiosyncratic distributional features. Will the results hold for the wider universe of futures markets? 


Data and definitions

I am not interested in reproducing the results in the paper since I'm more concerned with whether this is a profitable trading strategy sitting within my usual framework, so I will do things a bit differently.

I use my usual set of futures markets; after removing duplicates (eg micro and mini versions, or cross listings) it comes in at 211 instruments. Data goes back to 1970, and I use my standard vol estimate calculated in % terms, which I divide by an exponential moving average of the same with a 10 year halflife to get a relative vol measure. If this is 1, then the vol we're seeing is typical of the level over the last couple of decades or so; if it's higher than 1 then the vol is higher than usual, and so on. Again readers of AFTS will recognise this measure. The paper doesn't quite do this; it just looks at the level of vol versus the in sample historic distribution. This means we can't use that forecast as a trading signal.
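As a rough sketch of that relative vol measure (not the production code), the ratio can be written in a few lines of pandas. The function name and the 2500 business day halflife (about ten years) are assumptions made for illustration, and the input series is synthetic.

```python
import numpy as np
import pandas as pd


def relative_vol(daily_pct_vol: pd.Series, halflife_days: int = 2500) -> pd.Series:
    """Current vol divided by its own long run exponentially weighted average."""
    # Assumption: ~250 business days a year, so 2500 days is roughly a 10 year halflife
    long_run_vol = daily_pct_vol.ewm(halflife=halflife_days, min_periods=10).mean()
    return daily_pct_vol / long_run_vol


# Synthetic example: vol sits at 1% a day, then doubles for the final 50 days
index = pd.bdate_range("2000-01-03", periods=1000)
vol = pd.Series(np.r_[np.full(950, 1.0), np.full(50, 2.0)], index=index)
rel = relative_vol(vol)
```

While vol is constant the ratio sits at 1; after the jump it moves well above 1 but stays below 2, because the long run average slowly starts to absorb the new level.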

For the y-axis of response, I will use the returns in the following month normalised by the volatility estimate (in price differences this time) at the beginning of the month for reasons I have discussed before. As I'm trading futures, this is also the Sharpe Ratio. The original paper uses a straight one month moving window of returns for vol estimation, and the pure ex-post Sharpe Ratio using the realised vol in the following month, so some small difference there. However because of the autocorrelation of vol, it shouldn't affect things too much; if anything my results probably should be a bit better because I'm dividing an unknown future estimate of mean by a known current estimate of standard deviation to get Sharpe Ratio, rather than having unknown mean/unknown sigma.

The original paper uses single stocks; I use a massive variety of futures, allowing me to see if this effect persists at the stock index level and for other asset classes. 


Some comparable graphs

Let's start with producing the same kind of graphs as in the paper. So each month I look at the current level of vol versus its historic average, and then rank it on an in sample basis (yes I know, but for the time being...), and then bucket the ranking into quintiles. I measure the risk adjusted return in the following month and take a median of those Sharpe Ratios; medians being somewhat more robust than means.
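The bucketing step can be sketched along these lines. This is a minimal illustration rather than the code behind the plots: the function name is made up, the inputs are assumed to be aligned so that each row holds the relative vol at the start of a month and the normalised return over that month, and the toy data is deliberately deterministic.

```python
import numpy as np
import pandas as pd


def median_response_by_bucket(signal: pd.Series, response: pd.Series, n_buckets: int = 5) -> pd.Series:
    """Median of the response within in sample quantile buckets of the signal."""
    both = pd.concat({"signal": signal, "response": response}, axis=1).dropna()
    # In sample ranking: the whole history is used to decide which bucket a month falls in
    ranks = both["signal"].rank(method="first")
    buckets = pd.qcut(ranks, n_buckets, labels=False) + 1  # 1 = lowest vol, 5 = highest
    return both["response"].groupby(buckets).median()


# Toy data in which higher relative vol always means a lower following return
toy_signal = pd.Series(np.linspace(0.5, 2.0, 100))
toy_response = 1.0 - toy_signal
bucket_medians = median_response_by_bucket(toy_signal, toy_response)
```

With the toy data the medians fall steadily from bucket 1 to bucket 5, which is the pattern the original paper reports for single stocks.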

Here is the result for the S&P 500:




I use the same ordering; 1 on the left is the lowest quantile of vol, 5 on the right is the highest. That picture is not quite as compelling as the original paper; but ignoring the 2nd bar it has the right pattern: higher recent vol means lower risk adjusted returns in the following month. Nevertheless we are only using 492 data points compared to the thousands we can get with single stocks; plus I am suspicious of anyone that uses a single instrument to prove anything, so let's expand out to all equity futures (with months stacked, which will give a higher weight to instruments with more history):



That is .... not what we'd hoped for.

The name is Bond:




Not great. What about Vol (VSTOXX, VIX)?




OK only two markets, but there is an apparent reverse effect here. You don't want to be long vol on average, but perhaps the worst time to be is when vol of vol is very low.

FX:





At best noise, at worst the reverse of what we expect.

Metals




That is the strongest, wrong way round, effect we have seen. Agricultural:




Just noise really. And finally energies:



The wrong way round, and inconsistent.

It's probably futile, but what happens if we pool our results across all markets?



There could be a story there- higher vol means higher risk adjusted returns, unless vol is very high; or we could be looking at the aggregated result of jamming together instruments that behave very differently*, plus a bunch of instruments that are just noise. 

[* it looks like markets fall into two camps, one where the effect is the 'right way' round (high vol, lower returns; perhaps the S&P 500 is here), others where it is the wrong way (most strikingly, vol), and the third of the two camps doesn't really have a strong effect]

Just to give a point of comparison, here's the bucket plot if I replace the ratio of vol with the level of the 32/128 day momentum forecast level (something we know is very successful as a forecast at a portfolio level, although it is weak in say equities):


Clearly this vol effect is not as strong as momentum.

A trading rule

Although the results aren't what we expect, this could be profitable as a 'wrong way round' rule, which buys when vol is higher, and sells when vol is lower. This won't work for very high vol (since quintile 5 shows a worse return than quintiles 2, 3 and 4), but then we would hit forecast capping up there anyway. That isn't much different to what happens with faster momentum rules anyway; if you plot response versus forecast level it's not as good for extreme forecasts (see AFTS for more).

In my first book, Systematic Trading, I said that implementing a rule as a wrong way rule after discovering it is a form of data mining, so bear in mind that the backtest results here will effectively be overstated [in reality, you couldn't have implemented this rule until enough historical evidence that the effect was the 'wrong' way round existed; until that moment of statistically significant realisation we would either be trading the money losing 'right way round' rule, or a combination of the two, either of which is inferior to being all in on 'wrong way round' from the start of the backtest].

I'd add a couple of tweaks as well; firstly rather than quantile buckets which are too coarse and in sample; I will use the percentile of the ratio versus historic levels as the forecast (the same approach I use for the volatility overlay rule in AFTS) , and also chuck in my standard ewm(span=10) smooth to reduce noise on what should be a monthly holding forecast (this reduces turnover from 25 times a year to 10 times without affecting profitability).

Before continuing, it might be worth thinking about how this interacts with the volatility overlay on a trend following rule (which cuts exposure long or short when vol is high).

  • If vol is low, we'd be SHORT from a vol rule
    • ... and we'd have a stronger trend following signal
      • if markets are trending up we'd be net flat
      • If markets are trending down we'd have a strong sell
  • If vol sits in the middle, we'd be FLAT from a vol rule
    • .... and we'd have an unadulterated trend signal
      • If markets are trending up we'd be long
      • If markets are trending down we'd be short
  • If vol is high, we'd be LONG from a vol rule
    • .... and have a weaker trend following signal
      • If markets are trending up we'd be modestly long
      • If markets are trending down we'd be flat
Clearly this will work differently for risk on/risk off markets; risky markets like equities will most probably go up with low vol and down with high vol, thus occupying the two 'flat' lines above (low vol and trending up, high vol and trending down). In these situations we'd have no position on (assuming these are the only two trading rules we're using). Since trend following doesn't work super well in equity indices, this might improve our lives somewhat.

Risk off markets, the most extreme of which is VIX/VSTOXX, will occupy the 'modestly long' and 'strong sell' lines, as they have high vol when they are going up, and low vol on the way down. Notice that this will introduce a short bias (modest long on the way up, strong short on the way down); since these markets tend to lose money on the long side, again we might expect a benefit here.

Other markets - which form the majority of futures instruments - will be less clear, so let's see how it goes.

Here's the code for the purely backward looking vol quantile calculation, and the actual trading rule:

import numpy as np
import pandas as pd


## Method on a system stage class (hence self)
def get_vol_quantile_points(self, instrument_code):
    self.log.debug("Calculating vol quantile for %s" % instrument_code)
    daily_vol = self.parent.rawdata.get_daily_percentage_volatility(instrument_code)
    ## 2500 business days is roughly ten years
    ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean()
    normalised_vol = daily_vol / ten_year_vol

    normalised_vol_q = quantile_of_points_in_data_series(normalised_vol)

    return normalised_vol_q


def quantile_of_points_in_data_series(data_series: pd.Series) -> pd.Series:
    ## With thanks to https://github.com/PurpleHazeIan for this implementation
    numpy_series = np.array(data_series)
    results = []

    for irow in range(len(data_series)):
        current_value = numpy_series[irow]
        ## purely backward looking: only count earlier points below the current value
        count_less_than = (numpy_series < current_value)[:irow].sum()
        results.append(count_less_than / (irow + 1))

    results_series = pd.Series(results, index=data_series.index)
    return results_series


def vol_rule(vol_quantile_points: pd.Series, smooth: int = 10):
    # vol quantile points sits in space 0 to 1.0
    raw_forecast = (vol_quantile_points - 0.5) * 40  ## sits in space -20 to +20
    smoothed_forecast = raw_forecast.ewm(span=smooth).mean()
    return smoothed_forecast


System test

I'm testing this using my current trading system with 147 instruments and relevant instrument weights. I'll be using the 'static' rather than 'dynamically optimised' flavour of the system, to get a feel for pure performance before the noise added by optimisation. This also means I'm going to put in an unrealistically large slug of capital, and remove a few markets that are too expensive to trade, bringing me down to 138 markets. I estimate instrument weights and the IDM (which peaks at 2.15) using my usual optimisation defaults.

To begin with, let's look at the performance of the vol strategy by itself. You're all dying to see it, so here is the money shot with the full account curve before and after costs:


Well it's not terrible but it's not amazing either. The Sharpe is a mere 0.10, which is not exactly knocking it out of the park... at best we've tapped the ball and it's dribbled a few feet away. 

Can this thing add value when combined with momentum (particularly given the discussion above)? Let's keep it simple and just use a single ewmac rule, 16/64. Correlation between the rules is actually a little negative, so let's do some god-awful in sample fitting and allocate 10% of our portfolio to the new vol rule.

It's not really worth plotting as these two systems will both look pretty similar, but this relatively small allocation to our new putative signal does indeed push up the Sharpe Ratio from 1.10 to 1.15, with the Sortino rising by a similar amount. Costs are slightly reduced, skew falls slightly (from 0.55 to 0.48), but the more robust lower tail ratio is unchanged (see AFTS for a definition). 
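As a back of the envelope check on why a weak but negatively correlated rule can still help, here is the standard two asset Sharpe Ratio arithmetic. It is only a sketch under loud assumptions: both return streams are treated as having equal vol, the 10% is treated as a risk weight on returns rather than a forecast weight, and the correlation of -0.1 is purely illustrative (the text only says 'a little negative').

```python
import math


def blended_sharpe(sr_a: float, sr_b: float, weight_b: float, correlation: float) -> float:
    """Sharpe Ratio of a two way blend, assuming both return streams have equal vol."""
    weight_a = 1.0 - weight_b
    expected = weight_a * sr_a + weight_b * sr_b
    risk = math.sqrt(weight_a**2 + weight_b**2 + 2 * weight_a * weight_b * correlation)
    return expected / risk


# Standalone Sharpe Ratios from the text; the correlation is an illustrative guess
sr_blend = blended_sharpe(sr_a=1.10, sr_b=0.10, weight_b=0.10, correlation=-0.1)
```

With these inputs the blend comes out at roughly 1.12, so simple diversification arithmetic explains some, though not all, of the improvement seen in the backtest.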


Summary


It does seem like the predictive effect of vol in single equities isn't replicated across the futures universe; if anything the effect is reversed although it is not as strong as in the original paper. Rather than buying instruments with lower vol to get higher risk adjusted returns, we should do the opposite. Rather than 'time series CAPM' failing, and a 'bet against time series standard deviation levels', we in fact see an even stronger 'time series CAPM' where risk adjusted returns aren't constant with relative vol levels, but actually improve when volatility is relatively high.

If we use this idea to construct a simple trading rule the result is not the world's greatest standalone signal. But there does seem to be some promise in adding it to a trend following strategy due to its negative correlation.









Tuesday 26 September 2023

Does CAPM work across and within asset classes - done correctly

 I haven't posted much recently because I've been busy with other stuff, and I only post when I feel like I have something to say (the advantages of not having a paid for subscription service!). But I was compelled to post by this tweet:


Which links to this article: https://mailchi.mp/verdadcap/asset-class-capm

... which in turn generated a fair amount of heat and light, since there are two key mistakes in the article and tweet. In truth these are a manifestation of a single mistake, which is a mis-definition of CAPM. CAPM, remember, says that there is only one risk factor, market risk, and that excess security returns are equal to Beta (the covariance of security and market returns, scaled by the variance of market returns) multiplied by excess market returns. And excess returns are returns over and above the risk free rate. 

But in the article they plot standard deviation versus mean, minus inflation. So they are not only confusing inflation with the risk free rate, but also getting covariance and standard deviation mixed up. The latter error was pointed out by several posters on twitter, although there is a mini argument suggesting that in the uber CAPM model with freely available leverage all securities should lie on the capital markets line, and hence have the same Sharpe Ratio, and hence all you need to do is plot excess return vs standard deviation (although again, excess return is versus the risk free rate NOT inflation!). 
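For reference, this is the textbook statement of CAPM in standard notation (not taken from the article), with $R_f$ the risk free rate, $R_m$ the market return and $\rho_{i,m}$ the correlation of security $i$ with the market:

```latex
\mathbb{E}[R_i] - R_f = \beta_i \left( \mathbb{E}[R_m] - R_f \right),
\qquad
\beta_i = \frac{\operatorname{Cov}(R_i, R_m)}{\operatorname{Var}(R_m)}
        = \rho_{i,m} \, \frac{\sigma_i}{\sigma_m}
```

Beta is only a fixed multiple of standard deviation if every security has the same correlation with the market, which is why swapping one for the other matters.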

Anyway I thought it would be interesting to redo this plot, but correctly. After all it's an interesting topic that speaks to the benefits of diversification. TLDR: the original authors' conclusion is correct (CAPM works across asset classes, but not within) even if their methods are badly flawed.


Data

I use monthly returns data pulled from my dataset of over 200 futures instruments, from which I annualise mean, standard deviation and Sharpe Ratios. As futures markets these are automatically excess returns. I have history back to 1970 for some markets, and the original plot only goes back to 1973, but for reasons that I will explain in a minute I will start my analysis in 1983. To define an asset class 'market' I start with a simple equal weighted index of all the futures that had returns in that month. Arguably I should use market cap weightings, but I don't have these to hand and in any case the results probably won't change much (since there are, eg, more US futures equity indices and the US is a big part of the global equity market). Note that this means due to diversification in theory the standard deviation of each market index will fall over time. I could correct for this, but it is not significant.
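A minimal sketch of those two steps (the equal weighted index and the annualised statistics) might look like this. The function names and the tiny data set are invented for illustration; the NaNs stand in for months before an instrument has any history.

```python
import numpy as np
import pandas as pd


def equal_weighted_index(monthly_returns: pd.DataFrame) -> pd.Series:
    """Average across whichever instruments have a return in each month."""
    # mean() skips NaN, so instruments without history that month are ignored
    return monthly_returns.mean(axis=1, skipna=True)


def annualised_stats(monthly_returns: pd.Series) -> pd.Series:
    mean = monthly_returns.mean() * 12
    std = monthly_returns.std() * np.sqrt(12)
    return pd.Series({"mean": mean, "std": std, "sr": mean / std})


# Invented data: instrument B only starts trading in the third month
toy = pd.DataFrame({"A": [0.01, 0.02, -0.01, 0.03], "B": [np.nan, np.nan, 0.01, 0.01]})
asset_class_index = equal_weighted_index(toy)
stats = annualised_stats(asset_class_index)
```

Because the number of instruments in the average grows over time, the index gets more diversified as history accumulates, which is the effect on standard deviation noted above.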

Another slightly weird thing about the original plot is that it actually splits out certain asset classes, which seems to rather undermine the argument; for example small and big equity markets, short and long term bonds (ST, LT), and different credit quality bonds. I don't have enough futures with enough history to do a split between small and big equity, nor do I have enough HY/IG bond futures to be confident the results would be meaningful, but I am able to include a lot more markets and asset classes. So I have:

  • Bonds (and at this stage I won't separate these into ST/LT) - these are mostly government bonds (39 markets)
  • Equities of all types (58)
  • Metals (rather than just Gold in the original piece, 21)
  • Energies (rather than just Oil, 20)
  • Agricultural (38)

I don't include FX, since you can argue if it's really an asset class, and because it includes a mishmash of things that are bets for and against the dollar, emerging markets, and so on. And I don't include volatility, since this usually only has two markets in it (VIX and VSTOXX). 

Equity indices are late to the futures trading party, and my data for these doesn't start until late 1982. So for strict comparability I remove everything before January 1983. Again, this doesn't affect the final results all that much.


Plotting Sharpe Ratio

Let's first drop the incorrect definition of excess return, and plot excess mean versus standard deviation (to reiterate, as these are futures the returns are automatically in excess of the risk free rate). Note that this means that everything on a straight line through the origin will have the same Sharpe Ratio.



Looks pretty good! And indeed if we look at the statistics including the Sharpe Ratios, we can see there is not that much difference between the SR, certainly nothing statistically significant:

        mean   std    sr
Ags     0.02  0.12  0.20
Bond    0.02  0.04  0.55
Equity  0.08  0.16  0.48
Metals  0.04  0.18  0.21
OilGas  0.09  0.28  0.34
Although we only have five data points, it does seem that there is a roughly positive relationship between excess mean over risk free and standard deviation.

Bringing in Beta

Having verified the original results after substituting the risk free rate for inflation, let's now bring in Beta. Under CAPM we'd expect that if we plotted excess mean against covariance rather than standard deviation, we'd again find a positive relationship. That should make assets with lower correlations look more attractive; that reminds me here's the correlation matrix:

          Ags   Bond  Equity  Metals  OilGas
Ags      1.00  -0.11    0.21    0.37    0.27
Bond    -0.11   1.00    0.09   -0.06   -0.14
Equity   0.21   0.09    1.00    0.26    0.12
Metals   0.37  -0.06    0.26    1.00    0.28
OilGas   0.27  -0.14    0.12    0.28    1.00

The problem of course is how to measure Beta, i.e. what is the 'market' that we are regressing our returns on. That's a hard enough problem when considering equities, but here we should really include every investable asset in the world, weighted by market cap back to 1983. I don't have those figures to hand!!

Instead I'm going to opt for another quick and dirty solution, namely to create a market index in the following proportions:

  • Ags 10%
  • Bonds 40%
  • Equities 30%
  • Metals 10%
  • Oil and Gas 10%

This is based on some roughly true things; bonds and equities form most of the investable universe and there are more bonds issued than equities. And since most people are probably starting with a bonds/equities based portfolio, considering the diversification available versus something that is mostly that is probably a reasonable thing to do.

If you prefer you can do something else like risk parity (which would be about 50% bonds, with the other asset classes roughly splitting the rest), but it probably won't make that much difference. 

This market index has a standard deviation of 7.4% a year, and a mean of 4.8%; its SR of 0.64 as you would expect is superior to its constituents.

Let's have a look at the betas and alphas, also correlation with the market (corr), standard deviations and Sharpe Ratios:

          std   corr   beta     sr   alpha
Ags     0.119  0.471  0.765  0.199  -0.011
Bond    0.039  0.183  0.098  0.553   0.017
Equity  0.162  0.826  1.822  0.484  -0.008
Metals  0.176  0.566  1.353  0.215  -0.026
OilGas  0.277  0.541  2.027  0.338  -0.006

We can see that to an extent higher standard deviation also means higher beta, but not always; equities and metals have virtually the same standard deviation but equities have a higher beta because they are more correlated. There is also a weak relationship between alpha and SR.
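A sketch of how a table like this could be produced is below. It is not the code used for the post: the function name is made up, the input is assumed to be a DataFrame of monthly asset class returns with no gaps, and the example feeds in random numbers purely to exercise the arithmetic. The weights are the quick and dirty proportions given above.

```python
import numpy as np
import pandas as pd

MARKET_WEIGHTS = {"Ags": 0.10, "Bond": 0.40, "Equity": 0.30, "Metals": 0.10, "OilGas": 0.10}


def capm_table(asset_returns: pd.DataFrame, weights: dict, periods_per_year: int = 12) -> pd.DataFrame:
    """Correlation, beta and annualised alpha of each asset class versus a weighted index."""
    w = pd.Series(weights)
    market = (asset_returns[w.index] * w).sum(axis=1)
    rows = {}
    for name in w.index:
        returns = asset_returns[name]
        beta = returns.cov(market) / market.var()
        alpha = (returns.mean() - beta * market.mean()) * periods_per_year
        rows[name] = {"corr": returns.corr(market), "beta": beta, "alpha": alpha}
    return pd.DataFrame(rows).T


# Random returns, purely to check the mechanics
rng = np.random.default_rng(0)
fake_returns = pd.DataFrame(rng.normal(0.005, 0.03, size=(120, 5)), columns=list(MARKET_WEIGHTS))
table = capm_table(fake_returns, MARKET_WEIGHTS)
weighted_beta = (table["beta"] * pd.Series(MARKET_WEIGHTS)).sum()
```

A useful sanity check is that the weighted sum of betas must be exactly one when the market is built from the same assets; the betas in the table above pass it (0.1 x 0.765 + 0.4 x 0.098 + 0.3 x 1.822 + 0.1 x 1.353 + 0.1 x 2.027 is about 1.00).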

Let's now redo the scatter plot but this time with Beta on the x-axis and adding the market portfolio:



The obvious outperformance of Bonds aside, this again does look like a clear case of supporting the CAPM across asset classes; if anything it's clearer than before.


Intra market

Now let us address the point in the post which is mentioned but briefly; the fact that CAPM doesn't work within asset classes. This is not a new finding. Indeed there is the mysterious result of Beta making an excellent counter signal ('Betting against Beta' Pedersen and Frazzini JFE 2014) at least in individual equities. It seems that lower Beta stocks have excess Alpha compared to higher Beta stocks; one story that explains that is that if Beta is synonymous with standard deviation (which as discussed, it ain't exactly), then we'd need higher leverage to hold low Beta stocks and not everyone can or wants to leverage to the hilt.

This is perhaps a more interesting study to do, since we could potentially use any positive result here as a trading signal; buying instruments within an asset class that have low Beta (or low standard deviation), and shorting those that are high Beta. Once again we run up against the definition of 'the market' in each asset class, but I will stick with the simple equal weighted across time version I have been using so far.

Here follows a blizzard (correct collective noun?) of plots. Firstly, here's excess mean against standard deviation (the original Sharpe Ratio plot):








A big caveat here is that different instruments may have wildly different data histories. With that said, there is mostly no evidence here of a similar Sharpe Ratio. The exception is bonds. There does seem to be a relationship between duration (which is highly correlated to standard deviation) and excess return; and we also see that High Yield, which is riskier than most of the government bonds, has a higher return. In other words, the bastardised version of the CAPM using vol rather than Beta does work within one asset class, which is perhaps why the authors of the original post decided to treat bonds as several different asset classes :-)

Now let's do things 'properly' and look at excess mean versus Beta:







Interestingly the positive result in Bonds is slightly different here; it mostly holds true that we get higher excess return for more Beta with the exception of high yield bonds. These are negatively correlated to the rest of the universe, and as a result have negative Beta. My returns for the high yield bond future go back to 2000, so this isn't a fluke down to a limited number of returns. However for government bonds, again it seems that CAPM holds true.

For a giggle let's reproduce the plots from the 'Betting Against Beta' paper, and plot Alpha vs Beta. CAPM predicts a horizontal line, whilst the original paper found a downward sloping line.





With the possible exception of oil and gas, there isn't much to write home about here. It doesn't look like CAPM or Betting against Beta is particularly compelling within asset classes that contain futures. 

(Note that in any case this isn't a proper test of Betting against Beta as a trading signal, since everything is in sample and not time varying)

Summary

Sloppy execution aside, the key findings of the original paper are correct; CAPM doesn't really work within asset classes, unless you lump all bonds into a single asset class in which case it works just fine, but it does work across asset classes. 

Friday 12 May 2023

Clustering trading rule p&l

 I recently upgraded my live production system to include all the extra instruments I've added on recently. I also did a little consolidation of trading rules, simplifying things slightly by removing some rules that didn't really have much allocation, and adding a couple from my new book. As usual I set the instrument weights and forecast weights using my handcrafting methodology, which is basically a top down method that involves clustering things into groups in a hierarchical fashion.

In my backtests I do this clustering using the correlation matrix as a guide, but for production weights I use heuristics. So for instruments I say things like 'bonds are probably more correlated with each other than with other assets' and form the clusters initially as asset classes. And for forecast weights, which allocate across trading rules, I say things like 'momentum type trading rules are probably more correlated with each other', so I end up using a hierarchy like this:

  • Convergent (eg carry and mean reversion), Divergent (eg momentum)
  • Generic trading rule (eg EWMAC)
  • Specific trading rule variation (eg EWMAC2,8)

Now I recently tested this clustering method for instruments in this blog post. OK it was 17 months ago, but it felt recent to me. Basically I used a clustering methodology and threw in the actual correlation matrix to see how the grouping turned out. It was quite interesting. So I thought it would be quite interesting to do a similar thing with forecast weights. Effectively I am sense checking my heuristic guidelines to see if they are completely nuts, or vaguely okay.

Some code.


Getting the correlation matrix

Well you might think this is easy, but it's not. The correlation matrix here is the correlation of returns for a given set of trading rules and variations. But returns of what? A single instrument, like the S&P 500? That obviously may be unrepresentative of the sample generally, and we're not going to do this exercise for the 200+ instruments I have in my dataset now. What about the correlation of average returns taken across a bunch of instruments, or perhaps the average of the correlations taken across the same bunch - note these aren't quite the same thing. For example an average of correlations will give every instrument the same weight, whereas an average of returns will give a higher weighting to instruments with more data history.

And if we are doing averaging, do we just do a simple average - which will be biased since 37% of my futures are equities? Or do we use the instrument weights?

The good news is it probably won't make too much difference. Given enough history, the correlation of trading rule forecast returns is pretty similar across instruments. But we probably want to avoid overweighting certain asset classes, or equally weighting instruments without much history. So I'm going to go for taking the return correlation of portfolios for each trading rule. Each portfolio consists of the same trading rule being traded in all the instruments I trade, weighted by my actual instrument weights. 

Now I don't actually trade all rules in all instruments, because of trading costs, and sometimes because the instrument has certain flaws, but what we are trying to get here is as much information as possible to build a robust correlation matrix. I will also use pre-cost returns; not that it will make any difference, but the point here is to discover how similar rules are to each other, which doesn't depend on costs.

Finally note that I have 135 instruments with instrument weights, because some of my 208 are duplicates (eg micro and mini S&P 500), or I can't legally trade them, or for some other reason.
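The portfolio approach described above could be sketched like this. It is an illustration only: the function name, the dictionary layout (one DataFrame of pre-cost returns per rule, with a column per instrument) and the toy numbers are all assumptions, and months where an instrument has no data simply contribute zero.

```python
import pandas as pd


def rule_portfolio_returns(returns_by_rule: dict, instrument_weights: dict) -> pd.DataFrame:
    """One weighted portfolio return series per trading rule."""
    w = pd.Series(instrument_weights)
    portfolios = {}
    for rule_name, instrument_returns in returns_by_rule.items():
        aligned = instrument_returns.reindex(columns=w.index)
        # Missing data is skipped by sum(), ie treated as a zero return
        portfolios[rule_name] = (aligned * w).sum(axis=1)
    return pd.DataFrame(portfolios)


# Toy example with two instruments and two rules that are mirror images of each other
index = pd.bdate_range("2020-01-01", periods=5)
trend = pd.DataFrame(
    {"SP500": [0.01, -0.02, 0.03, 0.00, 0.01], "US10": [0.00, 0.01, -0.01, 0.02, 0.00]},
    index=index,
)
toy_returns = {"momentum16": trend, "carry30": -trend}
toy_weights = {"SP500": 0.5, "US10": 0.5}
rule_corr = rule_portfolio_returns(toy_returns, toy_weights).corr()
```

The resulting matrix has one row and column per trading rule variation, which is the input the clustering needs.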


Results: N=2

Let's kick things off then:

Cluster 1 'convergent'
['mrinasset160', 'carry10', 'carry30', 'carry60', 'carry125', 'relcarry',
'skewabs365', 'skewabs180', 'skewrv365', 'skewrv180']
Cluster 2 'divergent'
['breakout10', 'breakout20', 'breakout40', 'breakout80', 'breakout160', 'breakout320',
'relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80', 
'assettrend2', 'assettrend4', 'assettrend8', 'assettrend16', 
'assettrend32', 'assettrend64', 
'normmom2', 'normmom4', 'normmom8', 'normmom16', 'normmom32', 'normmom64', 
'momentum4', 'momentum8', 'momentum16', 'momentum32', 'momentum64', 
'accel16', 'accel32', 'accel64']


An absolutely perfect convergent vs divergent split. The labels by the way are added by me, not the code.
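For readers who want to try something similar, here is a bare bones sketch of clustering a correlation matrix into N groups. It is a simple greedy average-linkage procedure written for illustration, not the clustering code actually used for these results, and the toy correlation numbers are invented.

```python
import numpy as np
import pandas as pd


def cluster_correlation(corr: pd.DataFrame, n_clusters: int) -> list:
    """Repeatedly merge the two groups with the highest average cross correlation."""
    clusters = [[name] for name in corr.columns]
    while len(clusters) > n_clusters:
        best_pair, best_corr = None, -np.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average correlation between the members of the two groups
                average = corr.loc[clusters[i], clusters[j]].to_numpy().mean()
                if average > best_corr:
                    best_pair, best_corr = (i, j), average
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Invented correlations: two momentum rules, two carry rules, nothing in common across types
names = ["momentum16", "momentum32", "carry30", "carry60"]
toy_corr = pd.DataFrame(
    [[1.0, 0.8, 0.0, 0.0], [0.8, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.7], [0.0, 0.0, 0.7, 1.0]],
    index=names,
    columns=names,
)
two_clusters = cluster_correlation(toy_corr, n_clusters=2)
```

On the toy matrix this splits the rules into a momentum group and a carry group. Calling it with a larger n_clusters stops the merging earlier, which mirrors the way the results below are presented for increasing N.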


Results: N=3


Cluster 1 'convergent' (Unchanged)
['mrinasset160', .... ]

Cluster 2 'fast divergent'
['breakout10', 'breakout20',
'relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80', 
'assettrend2', 'assettrend4', 
'normmom2', 'normmom4', 
'momentum4', 'accel16']

Cluster 3 'medium and slow divergent'
['breakout40', 'breakout80', 'breakout160', 'breakout320',
'assettrend8', 'assettrend16', 'assettrend32', 'assettrend64', 
'normmom8', 'normmom16', 'normmom32', 'normmom64', 
'momentum8', 'momentum16', 'momentum32', 'momentum64', 
'accel32', 'accel64']
This is why we are doing this exercise - we've just discovered something interesting: fast momentum like trading rules have more in common with other fast momentum trading rules, than they do with slow variations of themselves.

Results: N=4

Cluster 1 'convergent mean reversion'
['mrinasset160']
Cluster 2 'convergent skew and carry'
['carry10', 'carry30', 'carry60', 'carry125', 'relcarry', 'skewabs365', 'skewabs180', 'skewrv365', 'skewrv180']
Cluster 3 'fast divergent - unchanged'
['breakout10', 'breakout20', ....]
Cluster 4 'medium and slow divergent - unchanged'
['breakout40', 'breakout80', ....]


Results: N=5

Now it's the turn of the (relatively) slow divergent to be split up:

Cluster 1 'convergent mean reversion (unchanged)'
['mrinasset160', 'mrwrings4']
Cluster 2 'convergent skew and carry (unchanged)'
['carry10', 'carry30', 'carry60', ....]
Cluster 3 'fast divergent - unchanged'
['breakout10', 'breakout20', ....]
Cluster 4 'slow divergent'
['breakout160', 'breakout320',
'assettrend32', 'assettrend64', 
'normmom32', 'normmom64', 
'momentum32', 'momentum64']
Cluster 5 'medium speed divergent'
['breakout40', 'breakout80',
'assettrend8', 'assettrend16', 
'normmom8', 'normmom16', 
'momentum8', 'momentum16', 
'accel32', 'accel64']
Again it's the speed of trading that is the differentiator here, not the trading rule.


Results: N=6

We break relative momentum off from its counterparts in what was previously cluster 3:

Cluster 1 'convergent mean reversion' (unchanged)
['mrinasset160']
Cluster 2 'convergent skew and carry' (unchanged)
['carry10', 'carry30', 'carry60', ...]
Cluster 3 'relative momentum'
['relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80']
Cluster 4 'fast divergent'
['breakout10', 'breakout20',
'assettrend2', 'assettrend4', 
'normmom2', 'normmom4', 
'momentum4', 'accel16']
Cluster 5 'slow divergent' (unchanged - was cluster 4)
['breakout160', 'breakout320', ...]
Cluster 6 'medium speed divergent' (unchanged - was cluster 5)
['breakout40', 'breakout80', ...]

Results: N=7

And now acceleration comes away from the other slow rules:

Cluster 1 'convergent mean reversion' (unchanged)
['mrinasset160']
Cluster 2 'convergent skew and carry' (unchanged)
['carry10', 'carry30', 'carry60', ...]
Cluster 3 'relative momentum' (unchanged)
['relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80']
Cluster 4 'fast divergent' (unchanged)
['breakout10', 'breakout20', ...]
Cluster 5 'slow divergent ex. accel'
['breakout160', 'breakout320', 'assettrend32', 'assettrend64', 'normmom32', 'normmom64', 'momentum32', 'momentum64']
Cluster 6 'slow acceleration'
['accel32', 'accel64']
Cluster 7 'medium speed divergent' (unchanged - was cluster 6)
['breakout40', 'breakout80', ...]

Results: N=8

Skew and carry separate:


Cluster 1 'convergent mean reversion' (unchanged)
['mrinasset160']
Cluster 2 'skew'
['skewabs365', 'skewabs180', 'skewrv365', 'skewrv180']
Cluster 3 'carry'
['carry10', 'carry30', 'carry60', 'carry125', 'relcarry']
Cluster 4 'relative momentum' (unchanged)
['relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80']
Cluster 5 'fast divergent' (unchanged)
['breakout10', 'breakout20', ...]
Cluster 6 'slow divergent ex. accel'
['breakout160', 'breakout320',...]
Cluster 7 'slow acceleration' (unchanged)
['accel32', 'accel64']
Cluster 8 'medium speed divergent' (unchanged)
['breakout40', 'breakout80', ...]

Results: N=11

Let's skip ahead a bit, and also show all the trading rules in each group for this final iteration:

Cluster 1 'slow asset mean reversion'
['mrinasset160']
Cluster 2 'skew'
['skewabs365', 'skewabs180', 'skewrv365', 'skewrv180']
Cluster 3 'carry'
['carry10', 'carry30', 'carry60', 'carry125', 'relcarry']
Cluster 4 'fast relative momentum'
['relmomentum10', 'relmomentum20']
Cluster 5 'slow relative momentum'
['relmomentum40', 'relmomentum80']
Cluster 6 'divergent speed 2'
['breakout20', 'assettrend4', 'normmom4', 'momentum4']
Cluster 7 'divergent speed 1 (fastest)'
['breakout10', 'assettrend2', 'normmom2', 'accel16']
Cluster 8 'divergent speed 5 (slowest)'
['breakout160', 'breakout320', 'assettrend32', 'assettrend64', 'normmom32', 'normmom64', 'momentum32', 'momentum64']
Cluster 9 'slow acceleration'
['accel32', 'accel64']
Cluster 10 'divergent speed 4'
['breakout80', 'assettrend16', 'normmom16', 'momentum16']
Cluster 11 'divergent speed 3'
['breakout40', 'assettrend8', 'normmom8', 'momentum8']
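
A minimal sketch of how nested groupings like these can arise as the number of clusters N grows. It assumes average-linkage agglomerative clustering with 1 - correlation as the distance, which may not be the method used to produce the lists above. The function name cluster_rules and the correlation values are hypothetical and made up purely for illustration; they are not backtest results.

```python
def cluster_rules(names, corr, n_clusters):
    """Average-linkage agglomerative clustering using 1 - correlation as distance."""
    clusters = [[i] for i in range(len(names))]  # start with one cluster per rule

    def distance(a, b):
        # Average distance over every pair of rules drawn from the two clusters
        return sum(1.0 - corr[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them
        _, x, y = min(
            (distance(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[names[i] for i in members] for members in clusters]


# ASSUMPTION: toy correlation matrix, invented for illustration only
NAMES = ["momentum4", "breakout10", "momentum64", "breakout320", "carry60"]
TOY_CORR = [
    [1.00, 0.80, 0.20, 0.20, 0.00],
    [0.80, 1.00, 0.20, 0.20, 0.00],
    [0.20, 0.20, 1.00, 0.85, 0.30],
    [0.20, 0.20, 0.85, 1.00, 0.30],
    [0.00, 0.00, 0.30, 0.30, 1.00],
]

for n in (2, 3):
    print(n, cluster_rules(NAMES, TOY_CORR, n))
```

With these toy numbers the fast and slow trend rules group by speed rather than by rule type, which is the same qualitative pattern as in the lists above.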

A new hierarchy for handcrafting trading rules

With that all in mind, a better hierarchy would be something a bit like this:

  • Convergent
    • Mean reversion
    • Skew
      • Equal split between 4 skew rules
    • Carry
      • Outright carry
      • Relative carry
  • Divergent
    • Speed 1 (fastest: turnover > 45)
      • acceleration - nothing fast enough
      • relmomentum10
      • other trend
        • breakout10
        • assettrend2
        • normmom2
        • momentum4
    • Speed 2 (22 < turnover < 45)
      • acceleration16
      • relmomentum20
      • other trend
        • breakout20
        • assettrend4
        • normmom4
        • momentum8
    • Speed 3 (12 < turnover < 22)
      • acceleration32
      • relmomentum40
      • other trend
        • breakout40
        • assettrend8
        • normmom8
        • momentum16
    • Speed 4 (7 < turnover < 12)
      • acceleration64
      • relmomentum80
      • other trend
        • breakout80
        • assettrend16
        • normmom16
        • momentum32
    • Speed 5 (4 < turnover < 7)
      • other trend
        • breakout160
        • assettrend32
        • normmom32
        • momentum64
    • Speed 6 (turnover < 4)
      • other trend
        • breakout320
        • assettrend64
        • normmom64
As you can see, I (roughly) used turnovers to group the divergent rules. These groupings aren't quite right, but I thought it better to go for some nice neat sequences. This also doesn't exactly follow how the clustering above works either. But it would certainly be a better way of doing things than grouping entirely by trading rule, which as we've seen doesn't make sense for divergent rules, where speed is more important.
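
The turnover bands in the list can be written as a small lookup. This is only a sketch: the function name speed_bucket is hypothetical, the slowest band is read as turnover below 4, and the treatment of a turnover that sits exactly on a boundary (it goes to the faster bucket) is an assumption, since the list uses strict inequalities on both sides.

```python
from bisect import bisect_right

# Turnover boundaries between speed groups, ascending (taken from the list above)
TURNOVER_BOUNDARIES = [4, 7, 12, 22, 45]


def speed_bucket(turnover):
    """Map an annual turnover to a speed group: 1 is fastest, 6 is slowest."""
    # bisect_right counts how many boundaries are <= turnover;
    # none passed means the slowest group (6), all five passed means the fastest (1).
    # ASSUMPTION: a turnover exactly on a boundary falls into the faster group.
    return 6 - bisect_right(TURNOVER_BOUNDARIES, turnover)


for example_turnover in (50, 30, 15, 10, 5, 3):
    print(example_turnover, "-> speed", speed_bucket(example_turnover))
```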

Now of course there are a lot of caveats with this. First of all, it's entirely in sample; but given how stable and persistent the correlations of trading rule returns are over time, we'd probably get very similar results with a purely backward looking approach. Secondly, we're ignoring things like costs, and the possibility that some rules may do better than others, but we can deal with that when we actually use the above structure to set instrument weights.
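
To make the top-down use of this structure concrete, here is a minimal sketch that splits weight equally between the branches at every level of the hierarchy. The equal split at each level, the assignment of carry10 to carry125 to 'outright carry', and the names HIERARCHY and flatten_weights are all assumptions for illustration. As noted in the caveats, it ignores costs and relative performance.

```python
# ASSUMPTION: nested layout transcribed from the bullet list; branch names are illustrative
HIERARCHY = {
    "convergent": {
        "mean reversion": ["mrinasset160"],
        "skew": ["skewabs365", "skewabs180", "skewrv365", "skewrv180"],
        "carry": {
            "outright": ["carry10", "carry30", "carry60", "carry125"],
            "relative": ["relcarry"],
        },
    },
    "divergent": {
        "speed 1": {
            "relmomentum": ["relmomentum10"],
            "other trend": ["breakout10", "assettrend2", "normmom2", "momentum4"],
        },
        "speed 2": {
            "acceleration": ["acceleration16"],
            "relmomentum": ["relmomentum20"],
            "other trend": ["breakout20", "assettrend4", "normmom4", "momentum8"],
        },
        "speed 3": {
            "acceleration": ["acceleration32"],
            "relmomentum": ["relmomentum40"],
            "other trend": ["breakout40", "assettrend8", "normmom8", "momentum16"],
        },
        "speed 4": {
            "acceleration": ["acceleration64"],
            "relmomentum": ["relmomentum80"],
            "other trend": ["breakout80", "assettrend16", "normmom16", "momentum32"],
        },
        "speed 5": {
            "other trend": ["breakout160", "assettrend32", "normmom32", "momentum64"],
        },
        "speed 6": {
            "other trend": ["breakout320", "assettrend64", "normmom64"],
        },
    },
}


def flatten_weights(node, weight=1.0):
    """Split `weight` equally among a node's children, down to individual rules."""
    if isinstance(node, str):
        return {node: weight}  # a leaf: a single trading rule
    children = list(node.values()) if isinstance(node, dict) else list(node)
    share = weight / len(children)
    weights = {}
    for child in children:
        weights.update(flatten_weights(child, share))
    return weights


rule_weights = flatten_weights(HIERARCHY)
for rule, w in sorted(rule_weights.items(), key=lambda item: -item[1])[:5]:
    print(f"{rule:15s} {w:.4f}")
```

An equal split gives a lone rule such as mrinasset160 a large weight simply because it has a branch to itself, which is one reason costs and performance would need to be layered on top before using weights like these.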


Friday 5 May 2023

Trading and investing performance year nine - part 2: Futures trading


Here is part two of my annual review. Part one looked at my overall portfolio, including long only, but there was only a cursory look at my futures. Here in this second part I will be looking at my futures trading account in a lot more detail.

It's important to say why I'm doing this. I'm certainly not doing it so I can upweight good strategies, and delete badly performing ones. A year of data on top of a 50 year backtest is meaningless. But it's interesting to know what did well or badly, whether my trading costs were higher than expected, how closely my live performance matches simulation, and whether my new dynamic optimisation is adding value compared to the simpler static system I was trading until late 2021 (as requested by Christina). 



A reminder of my overall futures performance

As I noted at the start of the post, I'll just put a very cursory look at my futures trading in here, with a subsequent follow up post to look at more details. All figures are as a % of my notional capital, which will usually be more than I have in my account. 

MTM: -9.7%
Interest: 1.3%
Fees: -0.06%
Commissions: -0.21%

Net futures trading: -8.7%

'Interest' includes dividends on 'cash like' short bond ETFs I hold to make a slightly more efficient use of my cash; I've recently (in this financial year) added a bit more to this sub portfolio. It's quite interesting how interest has gone from being irrelevant to actually adding something to performance.

As I've done in previous years I compare this to two benchmarks: 'Bench2', the SGA CTA index, and 'Bench1', a fund run by AHL, my ex-employers. My loss was worse than both; on a vol corrected basis Bench1 made 5% (admittedly on the back of a loss last year), and Bench2 only dropped 1.3%.

It might be interesting to look at the performance of me versus those benchmarks since I started trading my own money:

                   Me   Bench1   Bench2
2014 – 2015     58.2%    70.2%    50.7%
2015 – 2016     23.2%    -8.7%    -1.6%
2016 – 2017    -14.0%    -6.2%   -25.5%
2017 – 2018     -3.7%     7.5%    -4.4%
2018 – 2019      5.2%     8.1%     0.8%
2019 – 2020     39.7%    22.6%     9.3%
2020 – 2021      0.4%     0.8%    12.7%
2021 – 2022     27.0%    -5.2%    38.3%
2022 – 2023    -8.71%     5.0%   -1.30%

Mean            14.1%    10.4%     8.8%
Std dev         24.3%    24.3%    23.1%
SR               0.58     0.43     0.38
Geom mean       11.9%     8.5%     6.7%
Correlation               0.71     0.80
alpha                     6.7%     6.8%
beta                      0.71     0.84
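
The summary rows can be recomputed from the annual figures. This sketch assumes a sample standard deviation, an SR with no risk-free adjustment, and alpha and beta from a simple regression of my returns on each benchmark; the function names are hypothetical. Under those assumptions the mean, standard deviation, SR and geometric mean for the 'Me' column come out close to the table, but whether the alpha and beta rows match exactly depends on how they were originally calculated.

```python
from math import prod
from statistics import mean, stdev

# Annual returns in percent, copied from the table (2014-15 through 2022-23)
me = [58.2, 23.2, -14.0, -3.7, 5.2, 39.7, 0.4, 27.0, -8.71]
bench1 = [70.2, -8.7, -6.2, 7.5, 8.1, 22.6, 0.8, -5.2, 5.0]
bench2 = [50.7, -1.6, -25.5, -4.4, 0.8, 9.3, 12.7, 38.3, -1.30]


def sharpe_ratio(returns):
    # ASSUMPTION: no risk-free rate deducted, sample standard deviation
    return mean(returns) / stdev(returns)


def geometric_mean_return(returns):
    # Compound the annual returns, then take the n-th root
    growth = prod(1 + r / 100 for r in returns)
    return (growth ** (1 / len(returns)) - 1) * 100


def beta_and_alpha(y, x):
    # Ordinary least squares of y on x: beta = cov(x, y) / var(x)
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    beta = cov / stdev(x) ** 2
    alpha = mean_y - beta * mean_x
    return beta, alpha


print(f"mean {mean(me):.1f}%  std dev {stdev(me):.1f}%  SR {sharpe_ratio(me):.2f}")
print(f"geometric mean {geometric_mean_return(me):.1f}%")
for name, bench in (("Bench1", bench1), ("Bench2", bench2)):
    beta, alpha = beta_and_alpha(me, bench)
    print(f"{name}: beta {beta:.2f}  alpha {alpha:.1f}%")
```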


Monthly performance, returns vol normalised to 'me' in sample

Still looks pretty healthy. I seem to have been hurt less badly by the sell off that occurred in March, possibly due to a lower trend following component (more discussion of that later).


Market by market

Here are the numbers by asset class:

  OilGas  -3.19
     Ags  -2.78
      FX  -2.00
  Equity  -2.14
  Sector  -1.64
  Metals  -0.94
     Vol   1.52
    Bond   1.56

A big turnaround for energy and ags markets, the darlings of the 2021/22 accounting period. And here are some of the worst and best:

0    GAS_US_mini   -2.0
1            SMI   -1.7
2          WHEAT   -1.5
3            AUD   -1.4
4          US10U   -1.4
5       EU-BASIC   -1.2
6           IRON   -1.2
7   CRUDE_W_mini   -1.2
8         SOYOIL   -1.0
9         GBPEUR   -1.0
....
47          US20    1.0
48       SOYMEAL    1.1
49           VIX    1.4
50           EUR    1.7
51           JPY    2.0
52          BUND    2.0


Interesting that I had p&l in 53 instruments last year; about half the number I was actually trading. Again this is the work of the dynamic optimiser; before that I was trading only ~30 markets and it's unlikely I would have positions in all of them during the year.

Trading rules

Presented initially without comment, a bunch of plots showing p&l for each trading rule group:










The most obvious thing is how depressingly bad all of these graphs are. Pretty much every group of trading rules had a small net loss over the year. Even the diversifying carry and skew strategies weren't much help, although they did make money back in the sell off that caught out all the trend following style rules, narrowing my underperformance against the benchmark for the year. Only mean reversion (within asset classes - basically a value strategy), and relative carry were decently profitable. 

It's this lack of signal diversification that brought me to my second worst loss since I started trading: -8.7%. Of course, most equity long only managers would murder half their family for that to be their second worst annual loss, so let's get some perspective here.



Live vs sim

Now let's turn to see how well my live performance matches what my backtest thinks I got. The dynamic optimisation introduces some new jeopardy here, since it results in some path dependence; if the starting positions are different at the start of the time period it's more likely that things will diverge thereafter (I could deal with this by populating my actual starting positions into the backtest on the first of April, but that's a lot of hassle).




'dynamic' here is the backtest, 'live' is live.

Well there are certainly some differences here; you can see when I went on holiday, but interestingly we end up in exactly the same place. There are so many reasons why these will end up different, none of which I am that bothered about exploring.


Costs and slippage

I already noted above that my commission came in at 21bp (basis points = 0.21%) of account value, but how about slippage?

The cost I would have paid had I crossed the spread every time I traded (market orders) would have been 91bp. However, by sometimes executing passively, my execution algo saved me 22bp, i.e. around a quarter of the total. So my net slippage was 69bp, for total costs of 90bp.
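
The arithmetic behind those figures, as a short sketch. The variable names are mine; the inputs are the basis point figures quoted above, and the total here covers only slippage plus commission.

```python
# All figures in basis points of notional capital, as quoted in the text
spread_crossing_cost_bp = 91   # cost if every trade had crossed the spread
passive_saving_bp = 22         # saved by the execution algo executing passively
commission_bp = 21             # commissions, noted earlier as 0.21%

net_slippage_bp = spread_crossing_cost_bp - passive_saving_bp
total_cost_bp = net_slippage_bp + commission_bp
saving_fraction = passive_saving_bp / spread_crossing_cost_bp

print(f"net slippage: {net_slippage_bp}bp")
print(f"total costs: {total_cost_bp}bp")
print(f"share of spread cost saved: {saving_fraction:.0%}")
```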

This is a lot less than last year's 3% (due to a one off strategy change), and a little less than my backtest which comes in at around 1% a year. It looks like my new dynamic optimisation algo is doing its thing.


Dynamic vs static optimisation

Finally let's compare the performance of what I currently trade (dynamic optimisation with over 100 instruments) versus what I was trading before (static optimisation with fewer than 30). I'm going to compare backtest vs backtest here - I no longer trade static optimisation with real money so there is no other way of doing it; and it seems fairest to compare like with like. Plus we've already seen the difference between the dynamic optimisation backtest and its live production returns.

Naturally one year doesn't prove anything, and it's also true that the results of a static backtest can be unusually flattered by getting lucky with your choice of instruments (something I discuss at length in my new book). 

An important point here is that it's generally a good thing to store the code and config you use for past trading systems so you can do this exercise. It might also be worth noting down the commit hash of the repo version you used with the code: firstly in case you fix bugs in the backtest that change the results (or introduce new bugs!), although arguably it is fairer if you run the same code for both backtests; and secondly in case you make changes that are backwards incompatible and the old code just doesn't run.
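
One way to do this, as a minimal sketch: record the current commit alongside the config each time a backtest is run. It assumes the code lives in a git repository and that git is on the path. The function names, the record layout and the example config keys are hypothetical, not part of any existing library.

```python
import json
import subprocess
from datetime import datetime, timezone


def current_git_commit(repo_path="."):
    """Return the commit hash of the checked-out code (requires git on the path)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def backtest_provenance(config, commit_hash):
    """Bundle the config with the code version that produced a backtest."""
    return {
        "commit": commit_hash,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "config": config,
    }


def save_provenance(record, path):
    """Write the record as JSON so an old system can be re-run later."""
    with open(path, "w") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)


# Example usage (hypothetical config keys and file name):
# record = backtest_provenance({"optimisation": "static"}, current_git_commit())
# save_provenance(record, "static_backtest_provenance.json")
```

Checking out the stored commit later gives you the old code even if subsequent changes would stop it running against the current repo.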

First the long view (well since 2000, which is when my stored backtest begins):



Again this is a case of the static backtest getting lucky. What about last year?

There is a bit of dynamic outperformance, and it's certainly a smoother ride. 


What next

I'm not as interested in some of these statistics as other people are, with the exception of costs, and as long as my live p&l is in the same ballpark as my backtest. But hopefully your curiosity has been sated.

My short term plan is to add another bunch of instruments to my strategy, since I've added a bunch more to my database. Then I'm going to have a look at implementing some of the novel strategies in my book, albeit with some fun twists.