Tuesday, 19 November 2024

CTA index replication and the curse of dimensionality

Programming note: 

So, first I should apologise for the LONG.... break between blogposts. This started when I decided not to do my usual annual review of performance - it is a lot of work, and I decided that the effort wasn't worth the value I was getting from it (in the interests of transparency, you can still find my regularly updated futures trading performance here). Since then I have been busy with other projects, but I now find myself with more free time and a big stack of things I want to research and write blog posts on.

Actual content begins here:

To the point then - if you have heard me talking on the TTU podcast you will know that one of my pet subjects for discussion is the thorny idea of replicating - specifically, replicating the performance of a CTA index using a relatively modest basket of futures which is then presented inside something like an ETF or other fund wrapper as an alternative to investing in the CTA index itself (or to be more precise, investing in the constituents because you can't actually invest in an index).

Reasons why this might be a good thing are: 

  • that you don't have to pay fat fees to a bunch of CTA managers, just slightly thinner ones to the person providing you with the ETF. 
  • potentially lower transaction costs outside of the fee charged
  • Much lower minimum investment ticket size
  • Less chance of idiosyncratic manager exposure if you were to deal with the ticket size issue by investing in just a subset of managers rather than the full index
How is this black magic achieved? In an abstract way there are three ways we can replicate something using a subset of the instruments that the underyling managers are trading:
  • If we know the positions - by finding the subset of positions which most closely matches the joint positions held by the funds in the index. This is how my own dynamic optimisation works, but it's not really practical or possible in this context.
  • Using the returns of individual instruments: doing a top down replication where we try and find the basket of  current positions that does the best job of producing those returns.
  • If we know the underlying strategies - by doing a bottom up replication where we try and find the basket of strategies that does the best job of producing those returns.

In this post I discuss in more detail some more of my thoughts on replication, and why I think bottom up is superior to top down (with evidence!).

I'd like to acknowledge a couple of key papers which inspired this post, and from which I've liberally stolen:



Why are we replicating?

You may think I have already answered this; replication allows us to get close to the returns of an index more cheaply and with lower minimum ticket size than if we invested in the underlying managers. But we need to take a step back: why do we want the returns of the <insert name of CTA index> index?




For many institutional allocators of capital the goal is indeed closely matching and yet beating the returns of a (relatively) arbitrary benchmark. In which case replication is probably a good thing.

If on the other hand you want to get exposure to some latent trend following (and carry, and ...) return factors that you believe are profitable and/or diversifying then other options are equally valid, including investing in a selected number of managers, or doing DIY trend following (and carry, and ...). In both cases you will end up with a lower correlation to the index than with replication, but frankly you probably don't care.

And of course for retail investors where direct manager investment (in a single manager, let alone multiple managers) and DIY trend following aren't possible (both requiring $100k or more) then a half decent and chearp ETF that gives you that exposure is the only option. Note such a fund wouldn't neccessarily need to do any replication - it could just consist of a set of simple CTA type strategies run on a limited universe of futures and that's probably just fine. 

(There is another debate about how wide that universe of futures should be, which I have also discussed in recent TTU episodes and for which this article is an interesting viewpoint). 

For now let's assume we care deeply, deeply, about getting the returns of the index and that replication is hence the way to go.


What exactly are we replicating?

In a very abstract way, we think of there being C_0....C_N CTA managers in an index. For example in the SG CTA index there are 20 managers, whilst in the BTOP50 index there are... you can probably guess. No, not 50, it's currently 20. The 50 refers to the fact it's trying to capture at least 50% of the investable universe. 

In theory the managers could be weighted in various ways (AUM, vol, number of Phds in the front office...) but both of these major indices are equally weighted. It doesn't actually matter what the weighting is for our purposes today.

Each manager trades in X underlying assets with returns R_0.....R_X. At any given time they will have positions in each of these assets, P_c_x (so for manager 0, P_0_0.... P_0_X, for manager 1, P_1_0...P_1_X and in total there will be X*N positions at each time interval). Not every manager has to trade every asset, so many of these positions could be persistently zero.

If we sum positions up across managers for each underlying asset, then there will be a 'index level' position in each underlying asset P_0.... P_X. If we knew that position and were able to know instantly when it was changing, we could perfectly track the index ignoring fees and costs. In practice, we're going to do a bit better than the index in terms of performance as we will get some execution cost netting effects (where managers trade against each other we can net those off), and we're not paying fees. 

Note that not paying performance fees on each manager (the 20 part of '2&20') will obviously improve our returns, but it will also lower our correlation with the index. Management fee savings however will just go straight to our bottom line without reducing correlation. There will be additional noise from things like how we invest our spare margin in different currencies, but this should be tiny. All this means that even in the world of perfectly observable positions we will never quite get to a correlation of 1 with the index.

But we do not know those positions! Instead, we can only observe the returns that the index level positions produce. We have to infer what the positions are from the returns. 


The curse of dimensionality and non stationarity, top down version

How can we do this inference? Well we're finance people, so the first thing we would probably reach for is a regression (it doesn't have to be a regression, and no doubt younger people reading this blog would prefer something a bit more modern, but the advantage of a regression is it's very easy to understand it's flaws and problems unlike some black box ML technique and thus illustrate what's going wrong here).

On the left hand side of the regression is the single y variable we are trying to predict - the returns of the index. On the right hand side we have the returns of all the possible instruments we know our managers are trading. This will probably run into the hundreds, but the maximum used for top down replication is typically 50 which should capture the lions share of the positions held. The regressed 'beta' coefficients on each of these returns will be the positions that we're going to hold in each instrument in our replicating portfolio: P_0... P_X. 

Is this regression even possible? Well, as a rule you want to have lots more data points than you do coefficients to estimate. Let's call the ratio between these the Data Ratio. It isn't called that! But it's as good a name as any. There is a rule of thumb that you should have at least 10x the number of variables in data points. I've been unable to find a source for who invented this rule, so let's call it The Rule Of Thumb.

There are over 3800 data points available for the BTOP50 - 14 years of daily returns, so having say 50 coefficients to estimate gives us a ratio of over 70. So we are all good.

Note - We don't estimate an intercept as we want to do this replication without help or hindrance from a systematic return bias.

In fact we are not good at all- we have a very big problem, which is that the correct betas will change every day as the positions held change every day. In theory then that means we will have to estimate 200 variables with just one piece of data - todays daily return. That's a ratio of 0.005x; well below 10!

Note - we may also have the returns for each individual manager in the index, but a moments thought 
will tell you that this is not actually helpful as it just means we will have twenty regressions to do, each with exactly the same dimensionality problem.

We can get round this. One good thing is that these CTAs aren't trading that quickly, so the position weights we should use today are probably pretty similar to yesterdays. So we can use more than one day of returns to estimate the correct current weights. The general approach in top down replication is to use rolling windows in the 20 to 40 day range. 

We now have a ratio of 40 datapoints: 50 coefficients - which is still less than ten.

To solve this problem we must reduce the number of betas we're trying to estimate by reducing the number of instruments in our replacing portfolio. This can be done by picking a set of reasonably liquid and uncorrelated instruments (say 10 or 15) to the point where we can actually estimate enough position weights to somewhat replicate the portfolio. 

However with 40 days of observations we need to have just four instruments to meet our rule of thumb. It would be hard to find a fixed group of four instruments that suffice to do a good job of replicating a trend index that actually has hundreds of instruments underlying it.

To deal withs problem, we can use some fancy econometrics. With regularisation techniques like LASSO or ridge regression; or stepwise regressions, we can reduce the effective number of coefficients we have to estimate. We would effectively be estimating a small number of coefficients, but they would be the coefficients of four different instruments over time (yes this is a hand waving sentence) which give us the best current fit.

Note that there is a clear trade off here between the choice of lookback window, and the number of coefficients estimated (eithier as an explicit fixed market choice, dynamically through stepwise regression, or in an implicit way through regularisation):

  • Very short windows will worsen the curse of dimensionality. Longer windows won't be reactive enough to position changes.
  • A smaller set of markets means a better fit, and means we can be more reactive to changes in positions held by the underlying markets, but it also means we're going to do a poorer job of replicating the index.


Introducing strategies and return factors

At this point if we were top down replicators, we would get our dataset and start running regressions. But instead we're going to pause and think a bit more deeply. We actually have additional information about our CTA managers - we know they are CTA managers! And we know that they are likely to do stuff like trend following, as well as other things like carry and no doubt lots of other exotic things. 

That information can be used to improve the top down regression. For example, we know that CTA managers probably do vol scaling of positions. Therefore, we can regress against the vol scaled returns of the underlying markets rather than the raw returns. That will have the benefit of making the betas more stable over time, as well as making the Betas comparable and thus more intuitive when interpreting the results.

But we can also use this information to tip the top down idea on it's head. Recall:

Each manager trades in X underlying assets with returns R_0.....R_X. At any given time they will have positions in each of these assets, P_c_x (so for manager 0, P_0_0.... P_0_X, for manager 1, P_1_0...P_1_X so there will be X*N positions at each time interval). 

Now instead we consider the following:

Each manager trades in Y underlying strategies with returns r_0.....r_Y. At any given time they will have weights in each of these strategies, w_c_y (so for manager 0, w_0_0.... w_0_Y, for manager 1, w_1_0...w_1_Y so there will be Y*N positions at each time interval). 

Why is this good? Well because strategy weights, unlike positions, are likely to be much more stable. I barely change my strategy weights. Most CTAs probably do regular refits, but even if they do then the weights they are using now will be very similar to those used a year ago. Instead of a 40 day window, it wouldn't be unreasonable to use a window length that could be measured in years: thousands of days. This considerably improves the curse of dimensionality problem.


Some simple tables

For a given number of X instruments, and a given number of Y strategies, Z for each instrument:



                                Top down              Bottom up

Approx optimal window size      40 days               2000 days

Number of coefficients            X                     X*Z

Data ratio                  40 / X                   2000 / X*Z


Therefore as long as Z is less than 50 the data ratio of the bottom up strategy will be superior. For example, with some real numbers - 20 markets and 5 strategies per market:

                  


                                Top down              Bottom up

Approx optimal window size        40 days               2000 days
Number of coefficients            20                    100

Data ratio                        2                      20


Alternatively, we could calculate the effective number of coefficients we could estimate to get a data ratio of 10 (eithier as a fixed group, or implicit via regularisation):



                                Top down              Bottom up

Approx optimal window size        40 days               2000 days

Data ratio                        10                    10

Number of coefficients            4                     20


It's clear that with bottom up replication we should get a better match as we can smuggle in many more coefficients, regardless of how fancy our replication is.

A very small number of caveats


There are some "but..."'s, and some "hang on a moment's" though. We potentially have a much larger number of strategies than instruments, given that we probably use more than one strategy on each instrument. Two trend following speeds plus one carry strategy is probably a minimum; tripling the number of coefficients we have to estimate. It could be many more times that.

There are ways round this - the same ways we would use to get round the 'too many instruments' problem we had before. And ultimately the benefit from allowing a much longer window length is significantly greater than the increase in potential coefficients from multiple strategies per instrument. Even if we ended up with thousands of potential coefficients, we'd still end up selecting more of them than we would with top down replication.

A perhaps unanswerable 'but...' is that we don't know for sure which strategies are being used by the various managers, whereas we almost certainly know all the possible underlying instruments they are trading. For basic trend following that's not a problem; it doesn't really matter how you do trend following you end up with much the same return stream. But it's problematic for managers doing other things.

A sidebar on latent factors


Now one thing I have noticed in my research is that asset class trends seem to explain most of instrument trend following returns (see my latest book for details). To put it another way, if you trend follow a global equity index you capture much of the p&l from trend following the individual constituents. In a handwaving way, this is an example of a latent return factor. Latent factors are the reason why both top down and bottom up replication work as well as they do so it's worth understanding them.

The idea is that there are these big and unobservable latent factors that drive returns (and risk), and individual market returns are just manifestations of those. So there is the equity return factor for example, and also a bond one. A standard way of working out what these factors are is to do a decomposition of the covariance matrix and find out what the principal components are. The first few PC will often explain most of the returns. The factor loadings are relatively static and slow moving; the S&P 500 is usually going to have a big weight in the equity return factor.

Taking this idea a step further, there could also be 'alternative' return factors; like the trend following factor or carry factor (or back in equity land, value and quality). These have dynamic loadings versus the underyling instruments; sometimes the trend following factor will be long S&P 500 and sometimes short. This dynamic loading is what makes top down replication difficult.

Bottom up regression reverses this process and begins with some known factors; eg the returns from trend following the S&P 500 at some speed with a given moving average crossover, and then tries to work out the loading on those factors for a given asset - in this case the CTA index. 

Note that this also suggests some interesting research ideas such as using factor decomposition to reduce the number of instruments or strategies required to do top down or bottom up replication, but that is for another day. 

If factors didn't exist and all returns were idiosyncratic both types of replication would be harder; the fact they do seem to exist makes replication a lot easier as it reduces the number of coefficients required to do a good job.



Setup of an empirical battle royale


Let's do a face off then of the two methodologies. The key thing here isn't to reproduce the excellent work done by others (see the referenced papers for examples), or neccessarily to find the best possible way of doing eithier kind of replication, but to understand better how the curse of dimensionality affects each of them. 

My choice of index is the BTOP50, purely because daily returns are still available for free download. My set of instruments will be the 102 I used in my recent book 'AFTS' (actually 103, but Eurodollar is no longer trading) which represent a good spread of liquid futures instruments across all the major asset classes. 

I am slightly concerned about using daily returns, because the index snapshot time is likely to be different from the closing futures price times I am using. This could lead to lookahead bias, although that is easily dealt with by introducing a conservative two day lag in betas as others have done. However it could also make the results worse since a systematic mismatch will lower the correlation between the index returns and underyling instrument returns (and thus also the strategy returns in a bottom up replication). To avoid this I also tested a version using two day returns but it did not affect the results.

For the top down replication I will use six different window sizes from 8 business days up to 256 (about a year) with all the powers of 2 in between. These window sizes exceed the range typically used in this application, deliberately because I want to illustrate the tradeoffs involved. For bottom up replication I will use eight window sizes from 32 business days up to 4096 (about sixteen years, although in practice we only have 14 years of data for the BTP50 so this means using all the available data). 

We will do our regressions every day, and then use an exponential smooth on the resulting coefficients with a span equal to twice the window size. For better intuition, a 16 day exponential span such as we would use with an 8 day window size has a halife of around 5.5 days. The maximum smooth I use is a span of 256 days.

For bottom up replication, I will use seven strategies: three trend following EWMA4,16, EWMAC16,64, EWMAC64,256 and a carry strategy (carry60); plus some additional strategies: acceleration32, mrinasset1000, and skewabs180. For details of what these involve, please see AFTS or various blogposts; suffice to say they can be qualitiatively described as fast, medium and slow trend following, carry, acceleration (change in momentum), mean reversion, fast momentum and skew respectively. Note that in the Resolve paper they use 13 strategies for each instrument, but these are all trend following over different speeds and are likely to be highly correlated (which is bad for regression, and also not helpful for replication).

I will use a limited set of 15 instruments, the same as those used in the Newfound paper, which gives me 15*7 = 105 coefficients to estimate - roughly the same as in the top down replication.

I'm going to use my standard continous forecasting method just because that is the code I have to hand; the Resolve paper does various kinds of sensitivity analysis and concludes that both binary and continous produce similar results (with a large enough universe of instruments, it doesn't matter so much exactly how you do the CTA thing). 

Note - it could make sense to force the coefficients on bottom up replication to be positive, however we don't know for sure if a majority of CTAs are using some of these strategies in reverse, in particular the divergent non trend following strategies.


Approx data ratios with different window sizes if all ~100 coefficients estimated:

                               

8 days                            0.08
16 days                           0.16
32 days                           0.32
64 days                           0.64
128 days                          1.28
256 days                          2.56
512 days                          5.12
1024 days                         10.2    
2048 days                         20.5
4096 days                         41.0


In both cases I need a way to reduce the number of regressors on the right hand side from somewhere just over 100 to something more reasonable. This will clearly be very important with an 8 day window!

Various fancy techniques are commonly used for this including LASSO and ridge regression. There is a nice summary of the pros and cons of these in an appendix of the Resolve paper; one implication being that the right technique will depend on whether we are doing bottom up or top down replication. They also talk about elastic net, a technique that combines both of these techniques. For simplicity I use LASSO, as there is only one hyperparameter to fit (penalty size).



Results

Here are the correlation figures for the two methods with different lookback windows:


As you can see, the best lookback for the top down method needs to be quite short to capture changing positions. Since strategy weights are more stable, we can use a longer lookback for the bottom up method. For any reasonable length of lookback the correlation produced by the top down method is pretty stable, and significantly better than the bottom up method.


Footnote: Why not do both?

One of the major contributions of the Resolve paper is the idea of combining both top down and bottom up methods. We can see why this make sense. Although bottom up is superior as it causes less dimensionality issues, it does suffer because there might be some extra 'secret sauce' that our bottom up models don't capture. By including the top down element as well we can possibly fill this gap.


Footnote on 'Creating a CTA from scratch'

You may have seen some bottom up 'replication' articles that don't use any regression, such as this one. They just put together a set of simple strategies with some sensible weights and then do an ex-post cursory check on correlation with the index. The result, without trying, is a daily correlation of 0.6 with the SG CTA index, in line with the best bottom up results above without any of the work or the risks involved with doing potentially unstable regressions on small amounts of data. Indeed, my own trading strategies (monthly) correlation with the SG CTA index was 0.8 last time I checked. I have certainly done no regressions to get that that!

As I mentioned above, if you are a retail investor or an institutional investor who is not obsessed with benchmarking, then this might be the way to go. There is then no limit on the number of markets and strategies you can include.


Conclusion

I guess my conclusion comes back to why... why are we doing this.

If we really want to replicate the index then we should be agnostic about methodology and go with what is best. This will involve mostly bottom up with a longish window for the reasons discussed above, although it can probably be improved by including an averaging with top down.

But if we are trying to get 'exposure to some trend following factors' without caring about the index then I would probably start with the bottom up components of simple strategies on a diversified set of instruments with sensible but dumb 'no-information' weights that probably use some correlation information but not much else (see all the many posts I have done on portfolio optimisation). Basically the 'CTA from scratch' idea.

And then it might make sense to move in the direction of trying to do a bottom up replication of the index if you did decide to reduce your tracking error, though I'd probably use a robust regression to avoid pulling the strategy weights too far from the dumb weights.






Wednesday, 6 March 2024

Fitting with: exponential weighting, alpha and the kitchen sink

 I've talked at some length before about the question of fitting forecast weights, the weights you use to allocate risk amongst different signals used to trade a particular instrument. Generally I've concluded that there isn't much point wasting time on this, for example consider my previous post on the subject here.

However it's an itch I keep wanting to scratch, and in particular there are three things I'd like to look at which I haven't considered before:

  • I've generally used ALL my data history, weighted equally. But there are known examples of trading rules which just stop working during the backtested period, for example faster momentum pre-cost (see my last book for a discussion). 
  • I've generally used Sharpe ratio as my indicator of performance of choice. But one big disadvantage of it is that will tend to favour rules with more positive Beta exposure on markets that have historically gone up 
  • I've always used a two step process where I first fit forecast weights, and then instrument weights. This seperation makes things easier. But we can imagine many examples where it would produce a suboptimal performance.
In this post I discuss some ideas to deal with these problems:

  • Exponential weighting, with more recent performance getting a higher weight.
  • Using alpha rather than Sharpe ratio to fit.
  • A kitchen sink approach where both instrument and forecast weights are fitted together.

Note I have a longer term project in mind where I re-consider the entire structure of my trading system, but that is a big job, and I want to put in place these changes before the end of the UK tax year, when I will also be introducing another 50 or so instruments into my traded universe, something that would require some fitting of some kind to be done anyway.


Exponential weighting

Here is the 2nd most famous 'hockey stick' graph in existence:

(From my latest book, Advanced Futures Trading Strategies AFTS)

Focusing on the black lines, which show the net performance of the two fastest EWMAC trading rules across a portfolio of 102 futures contracts, there's a clear pattern. Prior to 1990 these rules do pretty well, then afterwards they flatline (EWMAC4 in a very clear hockey stick pattern) and do badly (EWMAC2). 

I discuss some reasons why this might have happened in the book, but that isn't what concerns us now. What bothers me is this; if I allocate my portfolio across these trading strategies using all the data since 1970 then I'm going to give some allocation to EWMAC4 and even a little bit to EWMAC2. But does that really make sense, to put money in something that's been flat / money losing for over 30 years?

Fitting by use of historic data is a constant balance between using more history, to get more robust statistically significant results, and using more recent data that is more likely to be relevant and also accounts for alpha decay. The right balance depends on both the holding period of our strategies (HFT traders use months of data, I should certainly be using decades), and also the context (to predict instrument standard deviation, I use something equvalent to using about a month of returns, whereas for this problem a much longer history would be appropriate). 

Now I am not talking about crazy and doing something daft like allocating everything to the strategy that did best last week, but it does seem reasonable to use something like a 15 year halflife when estimating means and Sharpe Ratios of trading strategy returns.

That would mean I'd currently be giving about 86% of any weighting to the period after 1990, compared to about 62% now with equal weighting. So it's not a pure rolling window; the distant past still has some value, but the recent past is more important. 


Using alpha rather than Sharpe Ratio to fit

One big difference between Quant equity people and Quant futures traders is that the former are obsessed with alpha. They get mugs from their significant others with 'worlds best alpha generator' on them for christmas. They wear jumpers with the alpha symbol on. You get the idea. Beta is something to be hedged out. Much of the logic is that we're probably getting our daily diet of Beta exposure elsewhere, so the holistic optimal portfolio will consist of our existing Beta plus some pure alpha.

Quant futures traders are, broadly speaking, more concerned with outright return. I'm guilty of this myself. Look at the Sharpe Ratio in the backtest. Isn't it great? And you know what, that's probably fine. The correlation of a typical managed futures strategy with equity/bond 60:40 is pretty low. So most of our performance will be alpha anyway. 

However evaluating different trading strategies on outright performance is somewhat problematic. Certain rules are more likely to have a high correlation with underlying markets. Typically this will include carry in assets where carry is usually positive (eg bonds), and slower momentum on anything that has most usually gone up in the past (eg equities)*.  To an extent some of this is fine since we want to collect some risk premia, but if we're already collecting those premia elsewhere in long only portfolios**, why bother? 

* This also means that any weighting of instrument performance will be biased towards things that have gone up in the past - not a problem for me right now as I generally ignore it, but could be a problem if we adopt a 'kitchen sink' approach as I will discuss later.

** Naturally 'Trend following plus nothing' people will prefer to collect their risk premia inside their trend following portfolios, but they are an exception. I note in passing that for a retail investor who has to pay capital gains when their futures roll, it is likely that holding futures positions is an inferior way of collecting alpha.

I'm reminded of a comment by an old colleague of mine* on investigating different trading rules in the bond sector (naturally evalutin. After several depressing weeks he concluded that 'Nothing I try is any better than long only'.

*Hi Tom!

So in my latest book AFTS (sorry for the repeated plugs, but you're reading this for free so there has to be some advertising and at least it's more relevant than whatever clickbait nonsense the evil algo would serve up to you otherwise) I did redress this slightly by looking at alpha and not just outright returns. For example my slowest momentum rule (EWMAC64,256) has a slightly higher SR than one of my fastest (EWMAC8,32), but an inferior alpha even after costs.


Which benchmark?

Well this idea of using alpha is all very well, but what benchmark are we regressing on to get it? This isn't US equities now mate, you can't just use the S&P 500 without thinking. Some plausible candidates are:

  1. The S&P 500,.
  2. The 60:40 portfolio that some mythical investor might own as well as this, or a more tailored version to my own requirementsThis would be roughly equivalent to long everything on a subset of markets, with sector risk weights of about 80% in equities and 20% in bonds. Frankly this wouldn't be much different to the S&P 500.
  3. The 'long everything' portfolio I used in AFTS, which consists of all my futures with a constant positive fixed forecast (the system from chapter 4, as readers will know).
  4. A long only portfolio just for the sector a given instrument is trading in.
  5. A long only position just on the given instrument we are trading.

There are a number of things to consider here. What is the other portfolio that we hold? It might well be the S&P 500 or just the magnificent 7; it's more likely to consist of a globally diversified bunch of bonds and stocks; it's less likely to have a long only cash position in some obscure commodities contract. 

Also not all things deliver risk premia in their naked Beta outfits. Looking at median long only constant forecast SR in chapter 3 of AFTS, they appear lower in the non financial assets (0.07 in ags, 0.27 in metals and 0.32 in energy; versus 0.40 in short vol, 0.46 in equity and 0.59 in bonds; incidentally FX is also close to zero at 0.09, but EM apart there's no reason why we should earn a risk premium here). This implies we should be veering towards loading up on Beta in financials and Alpha in non financials). 

But it's hard to disaggregate what is the natural risk premium from holding financial assets, versus what we've earned just from a secular downtrend in rates and inflation that lasted for much of the 50 odd years of the backtest. Much of the logic for doing this exercise is because I'm assuming that these long only returns will be lower in the future because that secular trend has now finished.

Looking at the alpha just on one instrument will make it a bit weird when comparing alphas across different instruments. It might sort of make more sense to do the regression on a sector Beta. This would be more analogus to what the equity people do.

On balance I think the 'long everything' benchmark I used in AFTS is the best compromise. Because trends have been stronger in equities and bonds it will be reasonably correlated to 60:40 anyway. Regressing against this will thus give a lower Beta and potentially better Alpha for instruments outside of those two sectors.

One nice exercise to do is to then see what a blend of long everything and the alpha optimised portfolio looks like. This would allow us to include a certain amount of general Beta into the strategy. We probably shouldn't optimise for this.


Optimising with alpha

We want to allocate more to strategies with higher alpha. We also want that alpha to be statistically significant. We'll get more statistical significance with more observations, and/or a better fit to the regression. 

Unlike with means and Sharpe Ratios, I don't personally have any well developed methodologies, theories, or heuristics, for allocating weights according to alpha or significance of alpha. I did consider developing a new heuristic, and wasted a bit of time with toy formula involving the product of (1- p_value) and alpha.

But I quickly realised that it's fairly easy to adapt work I have done on this before. Instead of using naked return streams, we use residual return streams; basically the return left over after subtracting Beta*benchmark return. We can then divide this by the target return to get a Sharpe Ratio, which is then plugged in as normal.

How does this fit into an exponential framework? There are a number of ways of doing this, but I decided against the complexity of writing code (which would be slow) to do my regression in a full exponential way. Instead I estimate my Betas on first a rolling, then an expanding, 30 year window (which trivially has a 15 year half life). I don't expect Betas to vary that much over time. I estimate my alphas (and hence Sharpe ratios) with a 15 year half life on the residuals. Betas are re-estimated every year, and the most up to date estimate is then used to correct returns in the past year (otherwise the residual returns would change over time which is a bit weird and also computationally more expensive).


Kitchen sink

I've always done my optimisation in a two step process. First, what is the best way to forecast the price of this market (what is the best allocation across trading rules, i.e. what are my forecast weights)? Second, how should I put together a portfolio of these forecasters (what is the best allocation across instruments, i.e. what are my instrument weights)? 

Partly that reflects the way my trading strategy is constructed, but this seperation also makes things easier. But it does reflect a forecasting mindset, rather than a 'diversified set of risk premia' mindset. Under the latter mindset, it would make sense to do a joint optimisation where the individual 'lego bricks' are ONE trading rule and ONE instrument. 

It strikes me that this is also a much more logical approach once we move to maximising alpha rather than maximising Sharpe Ratio. 

Of course there are potential pain points here. Even for a toy portfolio of 10 trading rules and 50 instruments we are optimising 500 assets. But the handcrafting approach of top down optimisation ought to be able to handle this fairly easily (we shall see!).



Testing

Setup

Let's think about how to setup some tests for these ideas. For speed and interpretation I want to keep things reasonably small. I'm going to use my usual five outright momentum EWMAC trading rules, plus 1 carry (correlations are pretty high here, I will use carry60), plus one of my skew rules (skewabs180 for those who care), asset class mean reversion - a value type rule (mrinasset1000), and both asset class momentum (assettrend64) and relative momentum (relmomentum80). My expectation is that the more RV type rules - relative momentum, skew, value - will get a higher weight than when we are just considering outright performance. I'm also expecting that the very fastest momentum will have a lower weight when exponential weighting is used.

The rules profitability is shown above. You can see that we're probably going to want to have less MR (mean reversion), as it's rubbish; and also if we update our estimates for profitability we'd probably want less faster momentum and relative momentum. There is another hockey stick from 2009 onwards when many rules seem to flatten off somewhat.

(Frankly we could do with more rules that made more money recently; but I don't want to be accused of overegging the pudding on overfitting here)

For instruments, to avoid breaking my laptop with repeated optimisation of 200+ instruments I kept it simple and restricted myself to only those with at least 20 years of trading history. There are 39 of these old timers:

'BRE', 'CAD', 'CHF', 'CORN', 'DAX', 'DOW', 'EURCHF', 'EUR_micro', 'FEEDCOW', 'GASOILINE', 'GAS_US_mini', 'GBP', 'GBPEUR', 'GOLD_micro', 'HEATOIL', 'JPY', 'LEANHOG', 'LIVECOW', 'MILK', 'MSCISING', 'MXP', 'NASDAQ_micro', 'NZD', 'PALLAD', 'PLAT', 'REDWHEAT', 'RICE', 'SILVER', 'SOYBEAN_mini', 'SOYMEAL', 'SOYOIL', 'SP400', 'SP500_micro', 'US10', 'US20', 'US5', 'WHEAT', 'YENEUR', 'ZAR'

On the downside there is a bit of a sector bias here (12 FX, 11 Ags,  6 equity, 4 metals, and only 3 bonds amd 3 energy), but that also gives more work for the optimiser (FWIW my full set of instruments has biased towards equities, so you can't really win).

For my long only benchmark used for regressions I'm going to use a fixed forecast of +10, which in laymans terms means it's a risk parity type portfolio. I will set the instrument weights using my handcrafting method, but without any information about Sharpe Ratio, just correlations. IDM is estimated on backward looking data of course.

I will then have something that roughly resembles my current system (although clearly with fewer markets and trading rules, and without using dynamic optimisation of positions). I also use handcrafting, but I fit forecast weights and instrument weights seperately, again without using any information on performance just correlations. 

I then check the effect of introducing the following features:

  • 'SR' Allowing Sharpe Ratio information to influence forecast and instrument weights
  • 'Alpha' Using alpha rather than Sharpe Ratio
  • 'Short' Using a 15 year halflife rather than all the data to estimate Sharpe Ratios and correlations
  • 'Sink' Estimating the weights for forecast and instrument weights at the same time

Apart from SR and alpha which are mutually exclusive, this gives me the following possible permutations:

  • Baseline: Using no peformance information 
  • 'SR' 
  • 'SR+Short' 
  • 'Sink' 
  • 'SR+Sink' 
  • 'SR+Short+Sink' 
  • 'Alpha' 
  • 'Alpha+Short'
  • 'Alpha+Sink'
  • 'Alpha+Short+Sink'
In terms of performance I'm going to check both the outright performance, but also the overall portfolio alpha. I will also look seperately at the post 2008 period and the pre 2008 period. Naturally everything is done out of sample, with robust optimisation, and after costs.

Finally, as usual in all cases I discard trading rules which don't meet my 'speed limit'. This also means that I don't trade the Milk future at all.


Long only benchmark

Some fun facts, here are the final instrument weights by asset class:

{'Ags': 0.248, 'Bond': 0.286, 'Equity': 0.117, 'FX': 0.259, 'Metals': 0.0332, 'OilGas': 0.0554}

The final diversification multiplier is 2.13. It has a SR of around 0.6, and costs of around 0.4% a year.


Baseline

Here is a representative set of forecast weights (S&P 500):

relmomentum80       0.105

momentum4           0.094
momentum8           0.048
momentum16          0.048
momentum32          0.054
momentum64          0.054
assettrend64        0.102

carry60             0.155
mrinasset1000       0.238
skewabs180          0.102

The massive weight to mrinasset is due to the fact it is very diversifying, and we are only using correlations here. But mrinasset isn't very good, so smuggling in outright performance would probably be a good thing to do.

SR of this thing is 0.98 and costs are a bit higher as we'd expect at 0.75% annualised. Always amazing how well just a simple diversified system can do. The Beta to our long only model is just 0.09 (again probably due to that big dollop of mean reversion which is slightly negative Beta if anything), so perhaps unsurprising the net alpha is 18.8% a year (dividing by the vol gets to a SR of 0.98 again just on the alpha). BUT...



Performance has declined over time. 


'SR'

I'm now going to allow the fitting process for both forecast and instrument weighs to use Sharpe ratio. Naturally I'm doing this in a sensible way so the weights won't go completely nuts.

Let's have a look at the forecast weights for comparison:

momentum4           0.112

momentum8           0.065

momentum16          0.068

momentum32          0.075

momentum64          0.072

assettrend64        0.122

relmomentum80       0.097


mrinasset1000       0.135

skewabs180          0.105

carry60             0.149


We can see that money losing MR has a lower weight, and in general the non trendy part of the portfolio has dropped from about half to under 40%. But we still have lots of faster momentum as we're using the whole period to fit.

Instrument weights by asset class meanwhile look like this:

{'Ags': 0.202, 'Bond': 0.317, 'Equity': 0.191, 'FX': 0.138 'Metals': 0.0700, 'OilGas': 0.0820}

Not dramatic changes, but we do get a bit more of the winning asset classes. 


'SR+Short'

Now what happens if we change our mean and correlation estimates so they have a 15 year halflife, rather than using all the data?

Forecast weights:

momentum4           0.095

momentum8           0.055

momentum16          0.059

momentum32          0.067

momentum64          0.066

relmomentum80       0.095

assettrend64        0.122

skewabs180          0.117

carry60             0.142

mrinasset1000       0.183



There's definitely been a shift out of faster momentum, and into things that have done better recently such as skew. We are also seeing more MR which seems counterintuitive, my initial theory is that it's because MR becomes more diversifying over time and this is indeed the case.


'Sink+SR+Short'

So far we've just been twiddling around a little at the edges really, but this next change is potentially quite different - jointly optimising the forecast and instrument weights. Let's look at the results with the SR using the 15 year halflife.

Here are the S&P 500 forecast weights - note that unlike for other methods, these could be wildly different across instruments:

momentum4           0.136
momentum8           0.188
momentum16          0.147
momentum32          0.046
momentum64          0.048
assettrend64        0.098
relmomentum80       0.012

carry60             0.035
mrinasset1000       0.000
skewabs180          0.290

Here we see decent amounts of faster momentum - maybe because it's a cheaper instrument or just happens to work better - but no mean reversion which apparently is shocking here. A better way of doing this is seeing the forecast weights added up across all instruments:

momentum4        0.181185
momentum8        0.138984
momentum16       0.088705
momentum32       0.079942
momentum64       0.098399
assettrend64     0.107174
relmomentum80    0.052196

skewabs180       0.086572
mrinasset1000    0.063426
carry60          0.103418


Perhaps surprisingly now we're seeing brutally large amounts of fast momentum, and less of the more diversifying rules. 
 


Interlude - clustering when everything is optimised together


To understand a little better what's going on, it might be helpful to do a cluster analysis to see how things are grouping together when we do our top down optimisation across the 10 rules and 37 instruments: 370 things altogether. Using the final correlation matrix to do the clustering, here are the results for 2 clusters:

Instruments {'CAD': 10, 'FEEDCOW': 10, 'GAS_US_mini': 10, 'GBPEUR': 10, 'LEANHOG': 10, 'LIVECOW': 10, 'MILK': 10, 'RICE': 10, 'SP400': 10, 'YENEUR': 10, 'ZAR': 10, 'DAX': 9, 'DOW': 9, 'MSCISING': 9, 'NASDAQ_micro': 9, 'SP500_micro': 9, 'GASOILINE': 4, 'GOLD_micro': 4, 'HEATOIL': 4, 'JPY': 4, 'PALLAD': 4, 'PLAT': 4, 'SILVER': 4, 'CHF': 3, 'CORN': 3, 'EURCHF': 3, 'EUR_micro': 3, 'NZD': 3, 'REDWHEAT': 3, 'SOYBEAN_mini': 3, 'SOYMEAL': 3, 'SOYOIL': 3, 'US10': 3, 'US5': 3, 'WHEAT': 3, 'BRE': 2, 'GBP': 2, 'US20': 2, 'MXP': 1}
Rules {'skewabs180': 37, 'mrinasset1000': 35, 'carry60': 33, 'relmomentum80': 25, 'assettrend64': 16, 'momentum4': 16, 'momentum8': 16, 'momentum16': 16, 'momentum32': 16, 'momentum64': 16}

Instruments {'MXP': 9, 'BRE': 8, 'GBP': 8, 'US20': 8, 'CHF': 7, 'CORN': 7, 'EURCHF': 7, 'EUR_micro': 7, 'NZD': 7, 'REDWHEAT': 7, 'SOYBEAN_mini': 7, 'SOYMEAL': 7, 'SOYOIL': 7, 'US10': 7, 'US5': 7, 'WHEAT': 7, 'GASOILINE': 6, 'GOLD_micro': 6, 'HEATOIL': 6, 'JPY': 6, 'PALLAD': 6, 'PLAT': 6, 'SILVER': 6, 'DAX': 1, 'DOW': 1, 'MSCISING': 1, 'NASDAQ_micro': 1, 'SP500_micro': 1}
Rules {'assettrend64': 23, 'momentum4': 23, 'momentum8': 23, 'momentum16': 23, 'momentum32': 23, 'momentum64': 23, 'relmomentum80': 14, 'carry60': 6, 'mrinasset1000': 4, 'skewabs180': 2}


Interepration here is that for each cluster I count the number of instruments present, and then trading rules. So for example the first cluster has 10 examples of CAD - since there are 10 trading rules that means all the CAD is in this cluster. It also has 37 examples of the skewabs180 rules, again this means that all the skew rules have been collected here.

This first cluster split clearly shows a split between divergent rules in cluster 1, and trendy type rules in cluster 2. The instrument split is less helpful.

Jumping ahead, here are N=10 clusters with my own labels in bold:

Cluster 1 EQUITY TREND
Instruments {'DAX': 6, 'DOW': 5, 'NASDAQ_micro': 5, 'SP400': 5, 'SP500_micro': 5}
Rules {'assettrend64': 5, 'momentum16': 5, 'momentum32': 5, 'momentum64': 5, 'momentum8': 4, 'mrinasset1000': 1, 'skewabs180': 1}

Cluster 2 ???
Instruments {'GAS_US_mini': 9, 'EURCHF': 2, 'GOLD_micro': 2, 'MSCISING': 2, 'NASDAQ_micro': 2, 'PALLAD': 2, 'PLAT': 2, 'RICE': 2, 'SILVER': 2, 'SP500_micro': 2, 'BRE': 1, 'CAD': 1, 'DAX': 1, 'DOW': 1, 'EUR_micro': 1, 'GASOILINE': 1, 'REDWHEAT': 1, 'SOYBEAN_mini': 1, 'SOYMEAL': 1, 'SOYOIL': 1, 'SP400': 1, 'US5': 1, 'WHEAT': 1}
Rules {'mrinasset1000': 15, 'carry60': 8, 'skewabs180': 6, 'relmomentum80': 5, 'assettrend64': 1, 'momentum4': 1, 'momentum8': 1, 'momentum16': 1, 'momentum32': 1, 'momentum64': 1}

Cluster 3 ???
Instruments {'FEEDCOW': 10, 'GBPEUR': 10, 'LEANHOG': 10, 'LIVECOW': 10, 'MILK': 10, 'YENEUR': 10, 'ZAR': 10, 'CAD': 9, 'RICE': 8, 'MSCISING': 7, 'HEATOIL': 4, 'JPY': 4, 'SP400': 4, 'CHF': 3, 'CORN': 3, 'DOW': 3, 'GASOILINE': 3, 'NZD': 3, 'US10': 3, 'DAX': 2, 'EUR_micro': 2, 'GBP': 2, 'GOLD_micro': 2, 'NASDAQ_micro': 2, 'PALLAD': 2, 'PLAT': 2, 'REDWHEAT': 2, 'SILVER': 2, 'SOYBEAN_mini': 2, 'SOYMEAL': 2, 'SOYOIL': 2, 'SP500_micro': 2, 'US20': 2, 'US5': 2, 'WHEAT': 2, 'BRE': 1, 'EURCHF': 1, 'GAS_US_mini': 1, 'MXP': 1}
Rules {'skewabs180': 30, 'carry60': 25, 'relmomentum80': 20, 'mrinasset1000': 19, 'momentum4': 15, 'momentum8': 11, 'assettrend64': 10, 'momentum16': 10, 'momentum32': 10, 'momentum64': 10}

Cluster 4 US RATES TREND+CARRY
Instruments {'US20': 8, 'US10': 7, 'US5': 7}
Rules {'carry60': 3, 'assettrend64': 3, 'momentum4': 3, 'momentum8': 3, 'momentum16': 3, 'momentum32': 3, 'momentum64': 3, 'mrinasset1000': 1}

Cluster 5 EURCHF
Instruments {'EURCHF': 7}
Rules {'assettrend64': 1, 'momentum4': 1, 'momentum8': 1, 'momentum16': 1, 'momentum32': 1, 'momentum64': 1, 'skewabs180': 1}

Cluster 6 EQUITY MR+REL MOMENTUM
Instruments {'DAX': 1, 'DOW': 1, 'MSCISING': 1, 'NASDAQ_micro': 1, 'SP500_micro': 1}
Rules {'mrinasset1000': 3, 'relmomentum80': 2}

Cluster 7 G10 FX TREND
Instruments {'GBP': 8, 'CHF': 7, 'EUR_micro': 7, 'NZD': 7, 'JPY': 6}
Rules {'assettrend64': 5, 'momentum4': 5, 'momentum8': 5, 'momentum16': 5, 'momentum32': 5, 'momentum64': 5, 'relmomentum80': 4, 'carry60': 1}

Cluster 8 EM FX
Instruments {'MXP': 9, 'BRE': 8}
Rules {'relmomentum80': 2, 'carry60': 2, 'assettrend64': 2, 'momentum4': 2, 'momentum8': 2, 'momentum16': 2, 'momentum32': 2, 'momentum64': 2, 'skewabs180': 1}

Cluster 9 AGS TREND
Instruments {'CORN': 7, 'REDWHEAT': 7, 'SOYBEAN_mini': 7, 'SOYMEAL': 7, 'SOYOIL': 7, 'WHEAT': 7}
Rules {'relmomentum80': 6, 'assettrend64': 6, 'momentum4': 6, 'momentum8': 6, 'momentum16': 6, 'momentum32': 6, 'momentum64': 6}

Cluster 10 ENERGY/METAL TREND
Instruments {'GASOILINE': 6, 'GOLD_micro': 6, 'HEATOIL': 6, 'PALLAD': 6, 'PLAT': 6, 'SILVER': 6}
Rules {'assettrend64': 6, 'momentum4': 6, 'momentum8': 6, 'momentum16': 6, 'momentum32': 6, 'momentum64': 6}

We can see that there are some richer things going on here than we could capture in the simple 2-dimensional fit of first forecast weights, then instrument weights. 


'Alpha'

Let's now see what happens if we replace the use of Sharpe Ratio on raw returns to measure performance with optimisation with the use of a Sharpe Ratio on residual returns after adjusting for Beta exposure; alpha basically.

Here are our usual forecast weights for S&P 500:

momentum4        0.061399
momentum8        0.067279
momentum16       0.037172
momentum32       0.037112
momentum64       0.069207
relmomentum80    0.170218
assettrend64     0.176815

skewabs180       0.105298
carry60          0.102799
mrinasset1000    0.172700

Very interesting; we're steering very much away from all speeds of 'vanilla' momentum here, and once again we have a lump of money in the very much diversifying but money losing mean reversion in assets.


Results


Right so you have waded through all this crap, and here is your reward, what are the results like?

                     SR   beta   r_SR  H1_SR  H1_beta  H1_r_SR  H2_SR  H2_beta  H2_r_SR
LONG_ONLY
0.601 1.000 0.000 0.740 1.000 0.000 0.191 1.000 0.000
BASELINE 0.983 0.094 0.940 1.246 0.094 1.192 0.197 0.058 0.188
SR 1.087 0.308 0.932 1.298 0.338 1.084 0.458 0.151 0.438
SR_short 1.089 0.324 0.927 1.318 0.353 1.100 0.375 0.166 0.350
sink 1.025 0.323 0.871 1.252 0.330 1.055 0.356 0.260 0.319
SR_sink 0.975 0.362 0.804 1.167 0.378 0.947 0.388 0.265 0.349
SR_short_sink 0.929 0.384 0.753 1.121 0.395 0.899 0.323 0.305 0.277
alpha 0.993 0.199 0.891 1.212 0.220 1.069 0.345 0.081 0.336
alpha_short 1.023 0.238 0.904 1.237 0.249 1.082 0.367 0.160 0.343
alpha_sink_short 0.888 0.336 0.735 1.090 0.336 0.903 0.257 0.299 0.210
The columns are the SR, beta, and 'residual SR' (alpha divided by standard deviation) for the whole period, then for H1 (not really the first half, but pre 2009), then for H2 (after 2009). Green values are the best or very close to it, red is the worst (excluding long only, natch).

Top line is everything looks worse after 2009 for both outright and residual performance. Looking at the entire period, there are some fitting methods that do better than the baseline on SR, but on residual SR they fail to improve. Focusing on the second half of the data, there is a better improvement on SR over the baseline for all the fitting methods, but one which also survives the use of a residual SR. 

But the best model of all in that second half was almost much the simplest, just using the SR alone to robustly fit weights - but for the entire period, and sticking to the two stage process of fitting forecast weights and then instrument weights. 

I checked, and the improvement over the baseline from just using SR was statistically significant with a p-value of about 0.02. The p-value versus the competing 'apha' fit wasn't so good - just 0.2; but Occams Rob's razor says we should use the simplest possible model unless there is a more complex model that is significantly better. SR is significantly better than the simpler baseline model at least in the more critical second half of the data, so we should use it and only use a more complex model if they are better. We don't need a decent p-value to justify using SR over alpha, since the latter is more complex.


Coda: Forecast or instrument weights?

One thing I was curious about was whether the improvements from using SR are down to fitting forecast weights or instrument weights. I'm a bit more bullish on the former, as I feel there is more data and thus more likelihood of getting robust statistics. Every time I have looked at instrument performance, I've not seen any stastistically significant differences.

(If you have say 30 years of data history, then for each instrument you have 30 years worth of information, but for each trading rule you have evidence from each instrument so you end up with 30*30 years which means you have root(30) = 5.5 times more information).

My hope / expectation is that all the work is being done by forecast weight fitting, so I checked to see what happened if I ran a SR fit just on instruments, and just on forecasts:

               SR   beta   r_SR  H1_SR  H1_beta  H1_r_SR  H2_SR  H2_beta  H2_r_SR
LONG_ONLY 0.601 1.000 0.000 0.740 1.000 0.000 0.191 1.000 0.000
BASELINE 0.983 0.094 0.940 1.246 0.094 1.192 0.197 0.058 0.188
SR 1.087 0.308 0.932 1.298 0.338 1.084 0.458 0.151 0.438
SR_forecasts 1.173 0.330 1.002 1.416 0.364 1.179 0.441 0.154 0.420
SR_instruments 0.948 0.118 0.892 1.179 0.121 1.106 0.259 0.076 0.248


Sure enough we can see that the benefit is pretty much entirely coming from the forecast weight fitting.


Conclusions

I've shyed away from using performance rather than just correlations for fitting, but as I said earlier it is an itch I wanted to scratch. It does seem that none of the fancy alternatives I've considered in this post add value; so I will keep searching for the elusive bullet of quick wins through portfolio optimisation. 

Meanwhile for the exercise of updating my trading strategy with new instruments, I will probably be using Sharpe Ratio information to robustly fit forecast weights but not instrument weighs (I still need to hold my nose a bit!).







Monday, 5 February 2024

Introducing max-GM (median@p), a new(?) performance statistic

Do you remember this post? https://qoppac.blogspot.com/2022/06/vol-targeting-cagr-race.html

Here I introduced a performance metric, the best annualised compounding return at the optimal leverage level for that strategy. This is equivalent to finding the highest geometric return once a strategy is run at it's Kelly optimal leverage.

I've since played with that idea a bit, for example in this more recent post amongst others I considered the implications of that if we have different tolerances for uncertainty and used bootstrapping of returns when optimising a stock and bond portfolio with leverage, whilst this post from last month does the same exercise in a bitcoin/equity portfolio without leverage.

In this post I return to the idea of focusing on this performance metric - the maximised geometric mean at optimal leverage, but now I'm going to take a more abstract view to try and get a feel for in general what sort of strategies are likely to be favoured when we use this performance metric. In particular I'd like to return to the theme of the original post, which is the effect that skew and other return moments have on the maximum achievable geometric mean. 

Obvious implications here are in comparing different types of strategies, such as trend following versus convergent strategies like relative value or option selling; or even 'classic' trend following versus other kinds.



Some high school maths

(Note I use 'high school' in honor of my US readers, and 'maths' for my British fans)

To kick off let's make the very heroric assumption of Gaussian returns, and assume we're working at the 'maximum Kelly' point which means we want to maximise the median of our final wealth distribution - same as aiming for maximum geometric return - and are indifferent to how much risk such a portfolio might generate.

Let's start with the easy case where we assume the risk free rate is zero; which also implies we pay no interest for borrowing (apologies I've just copied and pasted in great gobs of LaTeX output since it's easier than inserting each formula manually):

Now that is a very nice intuitive result!

Trivially then if we can use as much leverage as we like, and we are happy to run at full Kelly, and our returns are Gaussian, then we should choose the strategy with the highest Sharpe Ratio. What's more if we can double our Sharpe Ratio, we will quadruple our Geometric mean!

Truely the Sharpe Ratio is the one ratio to rule them all!

Now let's throw in the risk free rate:


We have the original term from before (although the SR now deducts the risk free rate), and we add on the risk free rate reflecting the fact that the SR deducts it, so we add it back on again to get our total return. Note that the higher SR = higher geometric mean at optimal leverage relationship is still true. Even with a positive risk free rate we still want to choose the strategy with the highest Sharpe Ratio!

Consider for example a classic CTA type strategy with 10% annualised return and 20% standard deviation, a miserly SR of 0.5 with no risk free rate; and contrast with a relative value fixed income strategy that earns 6.5% annualised return with 6% standard deviation, a SR of 1.0833

Now if the risk free rate is zero we would prefer the second strategy as it has a higher SR, and indeed should return a much higher geometric mean (since the SR is more than double, it should be over four times higher). Let's check. The optimal leverages are 2.5 times and 18.1 (!) times respectively. At those leverages the arithmetic means are 25% and 117% respectively, and the geometric means using the approximation are 12.5% for the CTA and 58.7% for the RV strategy.

But what if the risk free rate was 5%? Our Sharpe ratios are now equal: both are 0.25. The optimal leverages are also lower, 1.25 and 4.17. The arithmetic means come in at 12.5% and 27.1%, with geometric means of 9.4% and 24%. However we have to include the cost of interest; which is just 1.25% for the CTA strategy (borrowing just a quarter of it's capital at a cost of 5% remember) but a massive 15.8% for the RV. Factoring those in the net geometric means drop to 8.125% for both strategies - we should be indifferent between them, which makes sense as they have equal SR.



The horror of higher moments

Now there is a lot wrong with this analysis. We'll put aside the uncertainty around being able to measure exactly what the Sharpe Ratio of a strategy is likely to be (which I can deal with by drawing off more conservative points of the return distribution, as I have done in several previous posts), and the assumption that returns will be the same in the future. But that still leaves us with the big problem that returns are not Gaussian! In particular a CTA strategy is likely to have positive skew, whilst an RV variant is more likely to be a negatively skewed beast, both with fat tails in excess of what a Gaussian model would deliver. In truth in the stylised example above I'd much prefer to run the CTA strategy rather than a quadruple leveraged RV strategy with horrible left tail properties.

Big negative skewed strategies tend to have worse one day losses; or crap VAR if you prefer that particular measure. The downside of using high leverage is that we will be saddled with a large loss on a day when we have high leverage, which will significantly reduce our geometric mean.

There are known ways to deal with modifying the geometric mean calculation to deal with higher moments like skew and kurtosis. But my aim in this post is to keep things simple; and I'd also rather not use the actual estimate of kurtosis from historic data since it has large sampling error and may underestimate the true horror of a bad day that can happen in the future (the so called 'peso problem'); I also don't find the figures for kurtosis particuarly intuitive. 

(Note that I did consider briefly using maximum drawdown as my idiots tail effect here. However maximum drawdown is only relevant if we can't reduce our leverage into the drawdown. And perhaps counter intuitively, negative skewed strategies actually have lower and shorter drawdown)

Instead let's turn to the tail ratio, which I defined in my latest book AFTS. A lower tail ratio of 1.0 means that that the left tail is Gaussian in size, whilst a higher ratio implies a fatter left tail.

I'm going to struggle to rewrite the relatively simple 0.5SR^2 formulae to include a skew and left tail term, which in case will require me to make some distributional assumptions. Instead I'm going to use some bootstrapping to generate some distributions, measure the tail properties, find the optimal leverage, and then work out the geometric return at the optimal leverage point. We can then plot maximal geometric means against empirical tail ratios to get a feel for what sort of effect these have.



Setup

To generate distributions with different tail properties I will use a mixture of two Gaussian distributions; including one tail distribution with a different mean and standard deviation* which we draw from with some probability<0.5. It will then be straightforward to adjust the first two moments of the sample distribution of returns to equalise Sharpe Ratio so we are comparing like with like.

* you will recognise this as the 'normal/bliss' regime approach used in the paper I discussed in my prior post around optimal crypto allocation, although of course it will only be bliss if the tail is nicer which won't be the case half the time.

As a starting point then my main return distribution will have daily standard deviation 1% and mean 0.04% which gives me an annualised SR of 0.64, and will be 2500 observations (about 10 years) in length - running with different numbers won't affect things very much. For each sample I will draw the probability of a tail distribution from a uniform distribution between 1% and 20%, and the tail distribution daily mean from uniform [-5%, 5%], and for the tail standard deviation I will use 3% (three times the normal). 

All this is just to give me a series of different return distributions with varying skew and tail properties. I can then ex-post adjust the first two moments so I'm hitting them dead on, so the mean, standard deviation and SR are identical for all my sample runs. The target standard deviation is 16% a year, and the target SR is 0.64, all of which means that if the returns are Gaussian we'd get a maximum leverage of 4.0 times.

As always with these things it's probably easier to look at code, which is here (just vanilla python only requirements are pandas/numpy).



Results

Let's start with looking at optimal leverage. Firstly, how does this vary with skew?


Ignore the 'smearing' effect this is just because each of the dots in a given x-plane will have the same skew and first two moments, but slightly different distribution otherwise. As we'd expect given the setup of the problem the optimal leverage with zero skew is 4.0

For a pretty nasty skew of -3 the leverage should be about 10% lower - 3.6; whilst for seriously positive skew of +3 you could go up to 4.5. These aren't big changes! Especially when you consider that few real world trading strategies have absolute skew values over 1 with the possible exception of some naked option buying/selling madness. The most extreme skew value I could find in my latest book was just over 2.0, and that was for single instruments trading carry in just one asset class (metals).

What about the lower tail? I've truncated the plot on the x-axis for reasons I will explain in a second.



Again for the base case with a tail ratio of 1.0 (Gaussian) the optimal leverage is 4.0; for very thin lower tails again it goes up to 4.4, but for very fat lower tails it doesn't really get below about 3.6. Once again, tail ratios of over 4.0 are pretty rare in real strategies (though not individual instruments), although my sampling does sometimes generate very high tail ratios the optimal leverages never go below 3.6 even for a ratio of nearly 100.

Upper tail, again truncated:


Again a tail ratio of 1.0 corresponds to optimal leverage of roughly 4.0; and once again even for very fat upper tail ratios (of the sort that 'classical' trend followers like to boast about, the optimal leverage really isn't very much higher.

Now let's turn to Geometric mean, first as affected by skew:


As we'd expect we can generate more geometric mean with more skew... but not a lot! Even a 3 unit improvement in skew barely moves the return needle from 22.5% to 24.5%. To repeat myself it's rare to find trading strategies above or below 1.0, never mind 3.0. For example the much vaunted Mulvaney capital has a return skew on monthly returns of about 0.45. The graph above shows that will only add about 0.25% to expected geometric return at optimal leverage versus a Gaussian normal return distribution (this particular fund has extremely large standard deviation as they are one of the few funds in the world that actually runs at optimal leverage levels).

For completeness, here are the tail ratios. First the lower tail ratio:

Fat left tails do indeed reduce maximum optinmal geometric means, but not a lot.

Now for the upper tail:



Summary

I do like neat formulae, and the results for Gaussian normal distributions do have the property of being nice. Of course they are based on unreal expectations, and anyway who runs full Kelly on a real strategy expecting the returns to be normal and the SR to be equal to the backtested SR is an idiot. A very optimistic idiot, no doubt with a sunny disposition, but an idiot nonetheless.

For the results later in the post it does seem surprising that even if you have something with a very ugly distribution that you'd not really adjust your optimal leverage much, and hence see barely on impact on geometric mean. But remember here that I'm fixing the first two moments of the distribution; which means I'm effectively assuming that I can measure these with certainty and also that the future will be exactly like the past. These are not realistic expectations! 

And that is why in the past rather than choose the leverage that maximised geometric mean, I've chosen to maximise some more conservative point on the distribution of terminal wealth (where geometric mean would be the median of that distribution). Doing that would cause some damage to the negative skewed, fat left tail distributions resulting in lower optimal leverage and thus lower geometric means.

I have thought of a way of doing this analysis with distributional points, but it's quite computationally intensive and this post is already on the long side, so let's stop here.