Friday, 6 December 2024

Taking an income from your trading account - probabilistic Kelly with regular withdrawals

Programming note: This post has been in draft since ... 2016!

One question you will see me asked a lot is 'how much money do I need to become a full time trader?'. And I usually have a handwaving answer along the lines of 'Well if you think your strategy will earn you 10% a year, then you probably want to be able to cover 5 years of expenses with no income from your trading strategy, so you need 15x your annual living expenses as an absolute minimum'. Which curiously often isn't the answer people want to hear, since objectively at that point they would already be rich and they want to trade purely to become rich (a terrible idea! people should only trade for fun with money they can afford to lose); and also because they want to start trading right now with the $1,000 they have saved up, which wouldn't be enough to cover next month's rent. 

But behind that slightly trite question there is a deeper and more meaningful one. It is a variation of a question which I've talked about a lot on this blog and in my various books:

"Given an expected distribution of trading strategy returns, what is the appropriate standard deviation or leverage target to run your strategy at?"

And the variation we address here is:

How does the answer to the above question change if you are regularly taking a specific % withdrawal from your account?

This has obvious applications to retail traders like me (although I don't currently take a regular withdrawal from my trading account which is only a proportion of my total investments, rather I sporadically take profits). But it could also have applications to institutional investors creating some kind of structured product with a fixed coupon (do people still do that?).

There is generic python code here (no need to install any of my libraries first, except optionally to use the progressBar function) to follow along with.


A brief reminder of prior art(icles)


For new readers and those with poor memories, here's a quick run through what I mean by 'probabilistic Kelly'. If you are completely new to this and find I'm going too quickly, you might want to read some prior articles:


If you know this stuff backwards, then you can skim through very quickly just to make sure you haven't remembered it wrong.

Here goes then: the best measure of performance is having the most money at the end of your horizon (which for this blogpost I will assume is 10 years, i.e. around 2,560 working days). We maximise this by maximising final wealth, or equivalently log(final wealth); doing so in expectation is the Kelly criterion. The amount of money you will have at the end of time is equal to your starting capital C, multiplied by the product (1+r_0)(1+r_1)...(1+r_T), where r_t is the return in a given time period. The T'th root of that product, minus one, is the geometric mean return per period. So to get the most money, we maximise the annual geometric mean of returns, which is also known in noob retail trading circles as the CAGR.

If we can use any amount of leverage then for Gaussian returns the optimal standard deviation will be equal to the Sharpe Ratio (i.e. average arithmetic excess return / standard deviation). For example, if we have a strategy with a return of 15%, with risk free rate of 5%, and standard deviation of 20%; then the Sharpe ratio will be (15-5)/20 = 0.50; the optimal standard deviation is 0.50 = 50%; and the leverage required to get that will be 50%/20% = 2.5.
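To make the arithmetic concrete, here is a minimal Python sketch of that calculation (nothing more than the numbers quoted above; the variable names are mine):

```python
# The worked example above: 15% return, 5% risk free rate, 20% standard deviation
mean_return = 0.15
risk_free = 0.05
stdev = 0.20

sharpe_ratio = (mean_return - risk_free) / stdev    # (15 - 5) / 20 = 0.50
optimal_stdev = sharpe_ratio                        # optimal risk target: 50%
optimal_leverage = optimal_stdev / stdev            # 50% / 20% = 2.5x

print(sharpe_ratio, optimal_stdev, optimal_leverage)
```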

Note: For the rest of the post I'm going to assume Gaussian normal returns since we're interested in the relative effects of what happens when we introduce cash withdrawal, rather than the precise numbers involved. As a general rule if returns are negatively skewed, then this will reduce the optimal leverage and standard deviation target, and hence the safe cash withdrawal rate. 

Enough maths: it's probably easier to look at some pictures. For the arbitrary strategy with the figures above, let's see what happens to return characteristics as we crank up leverage (x-axis; leverage 1 means no leverage and fully invested, >1 means we are applying leverage, <1 means we keep some cash in reserve):

x-axis: leverage, y-axis: various statistics

The raw mean in blue shows the raw effect of applying leverage; doubling leverage doubles the annual mean from 15% to 30%. Similarly, doubling leverage doubles the standard deviation in green from 20% to 40%. However when we use leverage we have to borrow money; so the orange line showing the adjusted mean return is lower than the blue line (for leverage >1), as we have to pay interest.

The geometric mean is shown in red. This initially increases, and is highest at 2.5 times leverage - the figure calculated above, before falling. Note that the geometric mean is always less than the mean; and the gap between them gets larger the riskier the strategy gets. They will only be equal if the standard deviation is zero. Note also that using half the optimal leverage doesn't halve the geometric return; it falls to around 14.4% a year down from just over 17.5% a year with the optimal leverage. But doubling the leverage to 5.0 times results in the geometric mean falling to zero (this is a general result). Something to bear in mind then is that using less than the optimal leverage doesn't hurt much, using more hurts a lot.
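If you want to reproduce the shape of that red line without running a full simulation, a rough sketch using the standard Gaussian approximation (geometric mean is roughly the arithmetic mean minus half the variance) gets you close to the figures quoted above. This is an approximation of mine, not the linked code:

```python
# Approximate geometric mean at a given leverage, borrowing at the risk free rate
mean_return, risk_free, stdev = 0.15, 0.05, 0.20

def approx_geometric_mean(leverage):
    levered_mean = risk_free + leverage * (mean_return - risk_free)
    levered_stdev = leverage * stdev
    return levered_mean - 0.5 * levered_stdev ** 2   # mean minus variance drag

for leverage in [1.0, 1.25, 2.5, 5.0]:
    print(leverage, round(approx_geometric_mean(leverage), 4))

# Under this approximation the maximum is at (mean_return - risk_free) / stdev**2 = 2.5;
# at twice that leverage the geometric mean net of the risk free rate falls back to zero.
```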

Here is another plot showing the geometric mean (Left axis, blue) and final account value where initial capital C=1 (right axis, orange); just to confirm the maximum occurs at the same leverage point:

x-axis: leverage, y-axis LHS: geometric mean (blue), y-axis RHS: final account value (orange)

Remember the assumption we're making here is that we can use as much leverage as possible. That means that if we have a typical relative value (stat arb, equity long short, LTCM...) hedge fund with low standard deviation but high Sharpe ratio, then we would need a lot of leverage to hit the optimal point. 

If we label our original asset A, then now consider another asset B with excess mean 10%, standard deviation 10%, and thus Sharpe Ratio of 1.0. For this second asset, assuming it is Gaussian (and assets like this are normally left skewed in reality) the optimal standard deviation will be equal to the SR, 100%; and the leverage required to get that will be 100/10 = 10x. Which is a lot. Here is what happens if we plot the geometric mean against leverage for both assets.



Optimal leverage (x axis) occurs at maximum geometric mean (y axis) which is at leverage 2.5 for A, and at leverage 10 for B (which as you would expect has a much higher geometric mean at that point). 

But if we plot the geometric mean (y axis) against standard deviation (x axis) we can see the optimum risk target is 50% (A) and 100% (B) respectively:

x-axis: standard deviation, y-axis: geometric mean
 

Bringing in uncertainty


Now this would be wonderful except for one small issue: we don't actually know with certainty what our distribution of future returns will be. If we assume (heroically!) that there is no upward bias in our estimated returns (for example, the bias you get because they come from a backtest), that the 'data generating process (DGP)' for our returns will not change, and that our statistical model (Gaussian) is appropriate for future returns; then we are still left with the problem that the parameters we are estimating are subject to sampling estimation error, or what I called in my second book 'Smart Portfolios' the "uncertainty of the past".

There are at least three ways to calculate estimation error for something like a Sharpe Ratio, and they are:

  • With a distributional assumption, using a closed form formula eg the variance of the estimate will be (1+.5SR^2)/N where N is the number of observations, if returns are Gaussian. For our 2560 daily returns and an annual SR of 0.5 that will come out to a standard deviation of estimate for the SR of 0.32; eg that would give a 95% confidence interval for annual SR (approx +/- 2 s.d.) of approximately -0.1 to 1.1
  • With non parametric bootstrapping where we sample with replacement from the original time series of returns
  • With parametric monte carlo where we fix some distribution, estimate the distributional parameters from the return series and resample from those distributions
Calculation: annual SR = 0.5, daily SR = 0.5/sqrt(256) = 0.03125. Variance of estimate = (1+.5*.03125^2)/2560 = 0.000391, standard deviation of estimate = 0.0197, annualised = 0.0197*sqrt(256) = 0.32
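That footnote calculation in code, for anyone who wants to check it (this is just the closed form formula above, nothing from my libraries):

```python
import numpy as np

annual_SR = 0.5
days_per_year = 256
N = 2560                                         # ten years of daily returns

daily_SR = annual_SR / np.sqrt(days_per_year)    # 0.03125
var_of_estimate = (1 + 0.5 * daily_SR ** 2) / N  # ~0.000391
annual_se = np.sqrt(var_of_estimate) * np.sqrt(days_per_year)
print(annual_se)                                 # ~0.32
```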

For simplicity and since I 'know' the parameters of the distribution I'm going to use the third method in this post. 

(it would be equally valid to use the other methods, and I've done so in the past...)

So what we do is generate a number of new return series from the same distribution of returns as in the original strategy, and the same length (10 years). For each of these we calculate the final account value given various leverage levels. We then get a distribution of account values for different leverage levels. 

The full Kelly optimal would just find the leverage level which maximises the account value at the median (50% percentile) point of this distribution. Instead however we're going to take some more conservative distributional point, something less than 50%, for example 20%. In plain English, we want the leverage level that maximises a relatively pessimistic outcome: the account value that we would expect to beat in, say, eight out of ten future 10 year periods (assuming all our assumptions about the distribution are true). 

Note this is a more sophisticated way of doing the crude 'half Kelly' targeting used by certain people, as I've discussed in previous blog posts. It also gives us some comfort in the case of our returns not being normally distributed, but where we've been unable to accurately estimate the likely left hand tail from the existing historic data ('peso problem').
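Here is a minimal sketch of the procedure described above (a simplified reconstruction, not the linked code; the parameter names and number of monte carlo runs are mine):

```python
import numpy as np

mean_return, risk_free, stdev = 0.15, 0.05, 0.20
days_per_year, years, n_monte = 256, 10, 1000
daily_rf = risk_free / days_per_year
daily_mean = mean_return / days_per_year
daily_stdev = stdev / np.sqrt(days_per_year)

def final_value(returns, leverage):
    # leveraged daily returns, paying interest on borrowed money
    levered = daily_rf + leverage * (returns - daily_rf)
    return np.prod(1 + levered)

rng = np.random.default_rng(0)
leverages = np.arange(0.5, 5.1, 0.5)
finals = np.zeros((n_monte, len(leverages)))
for i in range(n_monte):
    returns = rng.normal(daily_mean, daily_stdev, days_per_year * years)
    finals[i] = [final_value(returns, lev) for lev in leverages]

# The leverage that maximises final value at various percentile points
for quantile in [0.1, 0.2, 0.5, 0.75]:
    best = leverages[np.argmax(np.quantile(finals, quantile, axis=0))]
    print(f"percentile {quantile}: best leverage {best}")
```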

Let's return to asset A and show the final value at different points of the monte carlo distribution, for different leverage levels:


x-axis leverage level, y-axis final value of capital, lines: different percentile points of distribution

Each line is a different point on the distribution, eg 0.5 is the median, 0.2 is the 20% percentile and so on. As we get more pessimistic (lower values of percentile), the final value curve slips down for a given level of leverage; but the optimal leverage which maximizes final value also reduces. If you are super optimistic (75% percentile) you would use 3.5x leverage; but if you were really conservative (10% percentile) you would use about 0.5x leverage (eg keep half your money in cash). 

As I said in my previous post your choice of line is down to your tolerance for uncertainty. This is not quite the same as a risk tolerance, since here we are assuming that you are happy to maximise geometric mean and therefore you are happy to take as much standard deviation risk as that involves. I personally feel that the choice of uncertainty tolerance is much more intuitive to most people than choosing a standard deviation risk limit / target, or god forbid a risk tolerance penalty variable.


Introducing withdrawals


Now we are all caught up with the past, let's have a look at what happens if we withdraw money from our portfolio over time. The first decision to make is what our utility function is. Do we still want to maximise final value? Or are we happy to potentially end up with very little, or even nothing, at the end of time? For some of us, the answer will depend on how much we love our children :-) To keep things simple, I'm initially going to assume that we want to maximise final value, subject to that being at least equal to our starting capital. As my compounding calculations assume an initial wealth of 1.0, that means a final account value of at least 1.0.

Initially then I'm going to look at what happens in the non probabilistic case. In the following graph, the x-axis is leverage as before, and the y-axis this time is final value. Each of the lines shows what will happen at a different withdrawal rate: 0 is no withdrawal, 0.005 is 0.5% a year, and so on up to 0.2, i.e. 20% a year.


x-axis leverage, y-axis final value. Each line is a different annual withdrawal rate

At higher withdrawal rates we make less money - duh! - but the optimal leverage remains unchanged. That makes sense. Regardless of how much money we are withdrawing, we're going to want to run at the same optimal amount of leverage. 

And for all withdrawal rates of 17% or less, we end up with a final account value of at least 1.0 (our starting capital), so we can use the optimal leverage without any worries. For higher withdrawal rates, eg 20%, we can never safely withdraw that much, regardless of how much leverage we use: we'll always end up with less than our starting capital, even at the optimal leverage ratio.

For this Sharpe Ratio level then, to end up with at least 1.0 of our account value, it looks like our safe withdrawal rate is around 17% (In fact, I calculate it later to be more like 18%).
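A quick sketch of the non probabilistic case, using the same Gaussian geometric mean approximation as before and treating the withdrawal as a simple deduction from the annual growth rate (a simplification of mine), lands in the same 17-18% region:

```python
years = 10
mean_return, risk_free, stdev = 0.15, 0.05, 0.20

def approx_geometric_mean(leverage):
    levered_mean = risk_free + leverage * (mean_return - risk_free)
    return levered_mean - 0.5 * (leverage * stdev) ** 2

def approx_final_value(leverage, withdrawal_rate):
    growth = approx_geometric_mean(leverage) - withdrawal_rate
    return (1 + growth) ** years

# At the optimal 2.5x leverage, final value stays above 1.0 until the
# withdrawal rate reaches roughly the geometric mean (about 17.5% here)
for w in [0.0, 0.05, 0.17, 0.20]:
    print(w, round(approx_final_value(2.5, w), 3))
```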


Safe withdrawals versus Sharpe Ratio


OK that's for a Sharpe of 0.5, but what if we have a strategy which is much better or worse? What is the relationship between a safe withdrawal rate, and the Sharpe Ratio of the underlying strategy?  Let's assume that we want to end up with at least 1.0x our starting capital after 10 years, and we push our withdrawal rate up to that point. 

X-axis Sharpe Ratio, y-axis safe withdrawal rate leaving capital unchanged at starting level

That looks a bit exponential-esque, which kind of makes sense since we know that returns gross of funding costs scale with the square of SR: If our returns double with the same standard deviation we double our SR, then we can double our risk target, which means we can use twice as much leverage, so we end up with four times the return. It isn't exactly exponential, because we have to fund borrowing. 

The above result is indifferent to the standard deviation of the underlying asset as we'd expect (I did check!), but how does it vary when we change the other key values in our calculation: the years to run the strategy over and the proportion of our starting capital we want to end up with?

x-axis amount of starting capital to end up with, y-axis withdrawal rate, lines different time periods in years

Each of these plots has the same format. The Sharpe Ratio of the underlying strategy is fixed, and is in the title. The y-axis shows the safe withdrawal rate, for a given amount of remaining starting capital on the x-axis (where 1.0 means we want to end up with all our remaining capital). Each line shows the results for a different number of years.

The first thing to notice is that if we want to maintain our starting capital, the withdrawal rate will be unchanged regardless of the number of years we are trading for. That makes sense - this is a 'steady state' where we are withdrawing exactly what we make each year. If we are happy to end up with less of our capital, then with shorter horizons we can afford to take a lot more out of our account each year. Again, this makes sense. However if we want to end up with more money than we started with, and our horizon is short, then we have to take less out to let everything compound up. In fact for a short enough time horizon we can't end up with twice our capital as there just isn't enough time to compound up (at what here is quite a poor Sharpe Ratio). 

x-axis amount of starting capital to end up with, y-axis withdrawal rate, lines different time periods in years


With a higher Sharpe, the pattern is similar but the withdrawal rates that are possible are much larger. 


x-axis amount of starting capital to end up with, y-axis withdrawal rate, lines different time periods in years



Withdrawals probabilistically


Notice that if you really are going to consistently hit a SR of exactly 1, and you're prepared to run at full Kelly, then a very high withdrawal rate of 50% is apparently possible. But hitting a SR of exactly 1 is unlikely because of parameter uncertainty. 

So let's see what happens if we introduce the idea of distributional monte carlo into withdrawals. To keep things simple, I'm going to stick to my original goal of saying that we want to end up with exactly 100% of our capital remaining when we finish. That means we can solve the problem for an arbitrary number of years (I'm going to use 30, which seems reasonable for someone in the withdrawal phase of their investment career post retirement). 

What I'm going to do then is generate a large number of random 30 year daily return series, drawn from a return distribution appropriate for a given Sharpe Ratio; for each of those calculate what the optimal leverage would be (which remember from earlier is invariant to withdrawal rate); and then find the maximum annual withdrawal rate that means I still have my starting capital at the end of the investment period. This will give me a distribution of withdrawal rates. 

From that distribution I then take a different quantile point, depending on whether I am being optimistic or pessimistic versus the median.
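In sketch form (again a simplified reconstruction with my own parameter choices, not the exact linked code; I assume the withdrawal is taken pro-rata daily as a fraction of the current account value), the procedure looks something like this:

```python
import numpy as np

days_per_year, years, n_monte = 256, 30, 200
risk_free, stdev, sharpe = 0.05, 0.20, 0.5
daily_rf = risk_free / days_per_year
daily_mean = daily_rf + sharpe * stdev / days_per_year
daily_stdev = stdev / np.sqrt(days_per_year)

def final_value(returns, leverage, withdrawal):
    # withdrawal deducted daily as an annualised fraction of current capital
    levered = daily_rf + leverage * (returns - daily_rf) - withdrawal / days_per_year
    return np.prod(1 + levered)

def max_safe_withdrawal(returns, leverage):
    low, high = 0.0, 2.0
    for _ in range(30):            # bisect for final value == starting capital
        mid = (low + high) / 2
        if final_value(returns, leverage, mid) >= 1.0:
            low = mid
        else:
            high = mid
    return low

rng = np.random.default_rng(2)
leverages = np.arange(0.25, 6.0, 0.25)
rates = []
for _ in range(n_monte):
    returns = rng.normal(daily_mean, daily_stdev, days_per_year * years)
    best_lev = leverages[np.argmax([final_value(returns, lev, 0) for lev in leverages])]
    rates.append(max_safe_withdrawal(returns, best_lev))

print(np.quantile(rates, [0.1, 0.2, 0.3, 0.5, 0.75]))
```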



X-axis: Sharpe Ratio. Y-axis: withdrawal rate (where 0.5 is 50% a year). Line colours: different percentiles of the monte carlo withdrawal rate distribution, eg 0.5 is the median, 0.1 is the very conservative 10% percentile.


Here is the same data in a table:

Annual withdrawal rate, % per year

                     Percentile
SR      0.10    0.20    0.30    0.50    0.75
0.10     4.7     4.8     4.9     5.5     7.40
0.25     4.9     5.3     6.0     7.9    11.00
0.50     9.0    11.0    13.0    18.0    24.25
0.75    18.0    23.0    26.0    33.0    43.00
1.00    33.0    39.0    45.0    55.0    66.00
1.50    83.9    96.0   102.0   117.0   135.00
2.00   162.0   176.0   185.7   205.0   228.25

We can see our old friend 18% in the median 0.50 percentile column, for the 0.50 SR row. As before we can withdraw more with higher Sharpe Ratios.
Now though, as you would expect, as we get more optimistic about the quantile point we use, we can use a higher withdrawal rate. For example, for a SR of 1.0 the withdrawal rates vary from 33% a year at the very conservative 10% percentile, right up to 66% at the highly optimistic 75% percentile.
As I've discussed before nobody should ever use more than the median 50% (penultimate column), which means you're basically indifferent to uncertainty, and I'd be very wary of the bottom few rows with very high Sharpe Ratios, unless you're actually running an HFT shop or Jane Street, in which case good luck.
Footnote: All of the above numbers were calculated with a 5% risk free rate. Here are the same figures with a 0% risk free rate. They are roughly, but not exactly, the above minus 5%. This means that for low enough SR values and percentile points we can't safely withdraw anything and expect to end up with our starting capital intact.

Annual withdrawal rate, % per year (0% risk free rate)

                     Percentile
SR      0.10    0.20    0.30    0.50    0.75
0.10     0.0     0.0     0.0     0.4     2.8
0.25     0.0     0.6     1.4     3.6     7.5
0.50     3.6     5.9     8.1    12.0    19.0
0.75    13.0    17.0    21.0    28.0    38.0
1.00    29.0    34.0    39.0    48.0    62.0
1.50    79.0    90.0    96.0   110.0   131.0
2.00   150.9   167.0   178.0   198.5   223.0



Conclusion



As a procrastination technique (I'm supposed to be writing my second book) I've been watching the videos of Anton Kriel on youtube. For those of you who don't know him he's an english ex goldman sachs guy who retired at the age of 27, "starred" in the post modern turtle traders based reality trading show "million dollar traders", and now runs something called the institute of trading that offers very high priced training and mentoring courses.

I find him strangely compelling and also very annoying. He is always sniggering and has a permanent smug look on his face. The videos are mostly set in exotic places where we are presumably supposed to envy Anton's lifestyle which seems to involve spending a lot of time flying around the world - not something I'd personally aspire to.

I can't comment on the quality of his education but at least he has the pedigree. He also has some interesting opinions about non trading subjects but then so do most trading "gurus". Mostly on trading, and on the financial industry generally, from what I've seen he talks mostly sense.

Anyway, one interesting thing he said is that you shouldn't use trading for income but only to grow capital. Something I mostly agree with: most people who want to trade for income don't have anywhere near enough capital to do so safely.

http://www.elitetrader.com/et/index.php?threads/how-much-did-you-save-up-before-you-decided-to-trade-full-time.298251/
I'm quite a conservative person, so I'd probably conservatively assume my SR was around 0.50 (in backtest it's much higher than that, and even in live trading it's been a little bit higher), and use the most conservative 10% percentile. That implies that with a withdrawal rate a shade under 4% plus the risk free rate, I'd still have my starting capital intact after any given period of time. 

Since I've used a risk free rate of 5%, that implies withdrawing the risk free rate plus another 4% on top, for a total of 9%.

If you're more aggressive, and have good reason to expect a higher Sharpe Ratio, then you could consider a withdrawal rate up to perhaps 30%. But this should only be done by someone with a track record of achieving those kinds of returns over several years, and who is comfortable with the fact that their chances of maintaining their capital are only a coin flip.

Note that one reason this is quite low is that in a conservative 10% quantile scenario I'd rarely be using the full Kelly (remember 0.50 SR implies a 50% risk target); this is consistent with what I actually do which is use a 25% risk target. With a 25% risk target, and SR 0.5 in theory I will make the risk free rate plus 12.5%. So I'm withdrawing around a third of my expected profits, which sounds like a good rule of thumb for someone who is relatively risk averse. 

Obviously my conservative 9% is higher than the 4% suggested by most retirement planners (which is a bit arbitrary as it doesn't seem to change when the risk free rate changes), but that is for long only portfolios where the Sharpe probably won't be even as good as 0.50; and more importantly where leverage isn't possible. Getting even to the half Kelly risk target of 25% isn't going to be possible without leverage with a portfolio that doesn't just contain small cap stocks or crypto.... it will be impossible with 60:40 for sure! But also bear in mind that my starting capital won't be worth what it's currently worth in real terms in the future, so I might want to reduce that figure further. 

Tuesday, 19 November 2024

CTA index replication and the curse of dimensionality

Programming note: 

So, first I should apologise for the LONG.... break between blogposts. This started when I decided not to do my usual annual review of performance - it is a lot of work, and I decided that the effort wasn't worth the value I was getting from it (in the interests of transparency, you can still find my regularly updated futures trading performance here). Since then I have been busy with other projects, but I now find myself with more free time and a big stack of things I want to research and write blog posts on.

Actual content begins here:

To the point then - if you have heard me talking on the TTU podcast you will know that one of my pet subjects for discussion is the thorny idea of replicating - specifically, replicating the performance of a CTA index using a relatively modest basket of futures which is then presented inside something like an ETF or other fund wrapper as an alternative to investing in the CTA index itself (or to be more precise, investing in the constituents because you can't actually invest in an index).

Reasons why this might be a good thing are: 

  • that you don't have to pay fat fees to a bunch of CTA managers, just slightly thinner ones to the person providing you with the ETF. 
  • potentially lower transaction costs outside of the fee charged
  • Much lower minimum investment ticket size
  • Less chance of idiosyncratic manager exposure if you were to deal with the ticket size issue by investing in just a subset of managers rather than the full index

How is this black magic achieved? In an abstract way there are three ways we can replicate something using a subset of the instruments that the underlying managers are trading:
  • If we know the positions - by finding the subset of positions which most closely matches the joint positions held by the funds in the index. This is how my own dynamic optimisation works, but it's not really practical or possible in this context.
  • Using the returns of individual instruments: doing a top down replication where we try and find the basket of  current positions that does the best job of producing those returns.
  • If we know the underlying strategies - by doing a bottom up replication where we try and find the basket of strategies that does the best job of producing those returns.

In this post I discuss in more detail some more of my thoughts on replication, and why I think bottom up is superior to top down (with evidence!).

I'd like to acknowledge a couple of key papers which inspired this post, and from which I've liberally stolen:



Why are we replicating?

You may think I have already answered this; replication allows us to get close to the returns of an index more cheaply and with lower minimum ticket size than if we invested in the underlying managers. But we need to take a step back: why do we want the returns of the <insert name of CTA index> index?




For many institutional allocators of capital the goal is indeed closely matching and yet beating the returns of a (relatively) arbitrary benchmark. In which case replication is probably a good thing.

If on the other hand you want to get exposure to some latent trend following (and carry, and ...) return factors that you believe are profitable and/or diversifying then other options are equally valid, including investing in a selected number of managers, or doing DIY trend following (and carry, and ...). In both cases you will end up with a lower correlation to the index than with replication, but frankly you probably don't care.

And of course for retail investors, where direct manager investment (in a single manager, let alone multiple managers) and DIY trend following aren't possible (both requiring $100k or more), a half decent and cheap ETF that gives you that exposure is the only option. Note such a fund wouldn't necessarily need to do any replication - it could just consist of a set of simple CTA type strategies run on a limited universe of futures, and that's probably just fine. 

(There is another debate about how wide that universe of futures should be, which I have also discussed in recent TTU episodes and for which this article is an interesting viewpoint). 

For now let's assume we care deeply, deeply, about getting the returns of the index and that replication is hence the way to go.


What exactly are we replicating?

In a very abstract way, we think of there being C_0....C_N CTA managers in an index. For example in the SG CTA index there are 20 managers, whilst in the BTOP50 index there are... you can probably guess. No, not 50, it's currently 20. The 50 refers to the fact it's trying to capture at least 50% of the investable universe. 

In theory the managers could be weighted in various ways (AUM, vol, number of PhDs in the front office...) but both of these major indices are equally weighted. It doesn't actually matter what the weighting is for our purposes today.

Each manager trades in X underlying assets with returns R_0.....R_X. At any given time they will have positions in each of these assets, P_c_x (so for manager 0, P_0_0.... P_0_X, for manager 1, P_1_0...P_1_X and in total there will be X*N positions at each time interval). Not every manager has to trade every asset, so many of these positions could be persistently zero.

If we sum positions up across managers for each underlying asset, then there will be an 'index level' position in each underlying asset P_0.... P_X. If we knew that position and were able to know instantly when it was changing, we could perfectly track the index ignoring fees and costs. In practice, we're going to do a bit better than the index in terms of performance, as we will get some execution cost netting effects (where managers trade against each other we can net those off), and we're not paying fees. 

Note that not paying performance fees on each manager (the 20 part of '2&20') will obviously improve our returns, but it will also lower our correlation with the index. Management fee savings however will just go straight to our bottom line without reducing correlation. There will be additional noise from things like how we invest our spare margin in different currencies, but this should be tiny. All this means that even in the world of perfectly observable positions we will never quite get to a correlation of 1 with the index.

But we do not know those positions! Instead, we can only observe the returns that the index level positions produce. We have to infer what the positions are from the returns. 


The curse of dimensionality and non stationarity, top down version

How can we do this inference? Well we're finance people, so the first thing we would probably reach for is a regression (it doesn't have to be a regression, and no doubt younger people reading this blog would prefer something a bit more modern, but the advantage of a regression is that it's very easy to understand its flaws and problems, unlike some black box ML technique, and thus illustrate what's going wrong here).

On the left hand side of the regression is the single y variable we are trying to predict - the returns of the index. On the right hand side we have the returns of all the possible instruments we know our managers are trading. This will probably run into the hundreds, but the maximum used for top down replication is typically 50, which should capture the lion's share of the positions held. The regressed 'beta' coefficients on each of these returns will be the positions that we're going to hold in each instrument in our replicating portfolio: P_0... P_X. 

Is this regression even possible? Well, as a rule you want to have lots more data points than you do coefficients to estimate. Let's call the ratio between these the Data Ratio. It isn't called that! But it's as good a name as any. There is a rule of thumb that you should have at least 10x the number of variables in data points. I've been unable to find a source for who invented this rule, so let's call it The Rule Of Thumb.

There are over 3800 data points available for the BTOP50 - 14 years of daily returns, so having say 50 coefficients to estimate gives us a ratio of over 70. So we are all good.

Note - We don't estimate an intercept as we want to do this replication without help or hindrance from a systematic return bias.

In fact we are not good at all - we have a very big problem, which is that the correct betas will change every day, as the positions held change every day. In theory that means we will have to estimate 200 variables with just one piece of data - today's daily return. That's a ratio of 0.005; well below 10!

Note - we may also have the returns for each individual manager in the index, but a moment's thought will tell you that this is not actually helpful, as it just means we will have twenty regressions to do, each with exactly the same dimensionality problem.

We can get round this. One good thing is that these CTAs aren't trading that quickly, so the position weights we should use today are probably pretty similar to yesterday's. So we can use more than one day of returns to estimate the correct current weights. The general approach in top down replication is to use rolling windows in the 20 to 40 day range. 

We now have a ratio of 40 data points to 50 coefficients - i.e. 0.8, which is still well below ten.

To solve this problem we must reduce the number of betas we're trying to estimate, by reducing the number of instruments in our replicating portfolio. This can be done by picking a set of reasonably liquid and uncorrelated instruments (say 10 or 15), to the point where we can actually estimate enough position weights to somewhat replicate the portfolio. 

However with 40 days of observations we need to have just four instruments to meet our rule of thumb. It would be hard to find a fixed group of four instruments that suffice to do a good job of replicating a trend index that actually has hundreds of instruments underlying it.

To deal with this problem, we can use some fancy econometrics. With regularisation techniques like LASSO or ridge regression, or stepwise regressions, we can reduce the effective number of coefficients we have to estimate. We would effectively be estimating a small number of coefficients, but they would be the coefficients of four different instruments, changing over time, which give us the best current fit (yes, this is a hand waving sentence).
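As a sketch (my own illustrative code, not what was used for the results later in this post), the top down approach with LASSO looks something like this:

```python
import numpy as np
from sklearn.linear_model import Lasso

def rolling_topdown_positions(index_returns, instrument_returns, window=40, penalty=1e-5):
    """index_returns: (T,) array; instrument_returns: (T, n_instruments) array.
    Returns the implied position (beta) on each instrument, re-estimated daily
    over a short rolling window. The penalty needs tuning; LASSO shrinks most
    coefficients to zero, which is how we cope with too many instruments."""
    T, n_instruments = instrument_returns.shape
    betas = np.full((T, n_instruments), np.nan)
    for t in range(window, T):
        model = Lasso(alpha=penalty, fit_intercept=False)  # no intercept, as noted above
        model.fit(instrument_returns[t - window:t], index_returns[t - window:t])
        betas[t] = model.coef_
    return betas
```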

Note that there is a clear trade off here between the choice of lookback window, and the number of coefficients estimated (either as an explicit fixed market choice, dynamically through stepwise regression, or in an implicit way through regularisation):

  • Very short windows will worsen the curse of dimensionality. Longer windows won't be reactive enough to position changes.
  • A smaller set of markets means a better fit, and means we can be more reactive to changes in positions held by the underlying markets, but it also means we're going to do a poorer job of replicating the index.


Introducing strategies and return factors

At this point if we were top down replicators, we would get our dataset and start running regressions. But instead we're going to pause and think a bit more deeply. We actually have additional information about our CTA managers - we know they are CTA managers! And we know that they are likely to do stuff like trend following, as well as other things like carry and no doubt lots of other exotic things. 

That information can be used to improve the top down regression. For example, we know that CTA managers probably do vol scaling of positions. Therefore, we can regress against the vol scaled returns of the underlying markets rather than the raw returns. That will have the benefit of making the betas more stable over time, as well as making the Betas comparable and thus more intuitive when interpreting the results.

But we can also use this information to tip the top down idea on its head. Recall:

Each manager trades in X underlying assets with returns R_0.....R_X. At any given time they will have positions in each of these assets, P_c_x (so for manager 0, P_0_0.... P_0_X, for manager 1, P_1_0...P_1_X so there will be X*N positions at each time interval). 

Now instead we consider the following:

Each manager trades in Y underlying strategies with returns r_0.....r_Y. At any given time they will have weights in each of these strategies, w_c_y (so for manager 0, w_0_0.... w_0_Y, for manager 1, w_1_0...w_1_Y so there will be Y*N positions at each time interval). 

Why is this good? Well because strategy weights, unlike positions, are likely to be much more stable. I barely change my strategy weights. Most CTAs probably do regular refits, but even if they do then the weights they are using now will be very similar to those used a year ago. Instead of a 40 day window, it wouldn't be unreasonable to use a window length that could be measured in years: thousands of days. This considerably improves the curse of dimensionality problem.
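The bottom up equivalent is almost identical code, except the right hand side variables are strategy returns rather than instrument returns, and the window is much longer (again, an illustrative sketch of mine rather than the code used later):

```python
import numpy as np
from sklearn.linear_model import Lasso

def bottomup_strategy_weights(index_returns, strategy_returns, window=2000, penalty=1e-5):
    """strategy_returns: (T, n_strategies) array, one column per
    instrument/strategy combination. Because strategy weights are slow moving,
    a single long window regression (or a slowly rolling one) is enough."""
    model = Lasso(alpha=penalty, fit_intercept=False)
    model.fit(strategy_returns[-window:], index_returns[-window:])
    return model.coef_          # implied weight on each strategy
```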


Some simple tables

For a given number of instruments X, and Z strategies traded on each instrument (so X*Z strategy return streams in total):



                                Top down        Bottom up
Approx optimal window size      40 days         2000 days
Number of coefficients          X               X*Z
Data ratio                      40 / X          2000 / (X*Z)


Therefore as long as Z is less than 50 the data ratio of the bottom up strategy will be superior. For example, with some real numbers - 20 markets and 5 strategies per market:

                  


                                Top down        Bottom up
Approx optimal window size      40 days         2000 days
Number of coefficients          20              100
Data ratio                      2               20


Alternatively, we could calculate the effective number of coefficients we could estimate to get a data ratio of 10 (either as a fixed group, or implicitly via regularisation):



                                Top down        Bottom up
Approx optimal window size      40 days         2000 days
Data ratio                      10              10
Number of coefficients          4               200


It's clear that with bottom up replication we should get a better match as we can smuggle in many more coefficients, regardless of how fancy our replication is.

A very small number of caveats


There are some "but..."'s, and some "hang on a moment's" though. We potentially have a much larger number of strategies than instruments, given that we probably use more than one strategy on each instrument. Two trend following speeds plus one carry strategy is probably a minimum; tripling the number of coefficients we have to estimate. It could be many more times that.

There are ways round this - the same ways we would use to get round the 'too many instruments' problem we had before. And ultimately the benefit from allowing a much longer window length is significantly greater than the increase in potential coefficients from multiple strategies per instrument. Even if we ended up with thousands of potential coefficients, we'd still end up selecting more of them than we would with top down replication.

A perhaps unanswerable 'but...' is that we don't know for sure which strategies are being used by the various managers, whereas we almost certainly know all the possible underlying instruments they are trading. For basic trend following that's not a problem; it doesn't really matter how you do trend following you end up with much the same return stream. But it's problematic for managers doing other things.

A sidebar on latent factors


Now one thing I have noticed in my research is that asset class trends seem to explain most of instrument trend following returns (see my latest book for details). To put it another way, if you trend follow a global equity index you capture much of the p&l from trend following the individual constituents. In a handwaving way, this is an example of a latent return factor. Latent factors are the reason why both top down and bottom up replication work as well as they do so it's worth understanding them.

The idea is that there are these big and unobservable latent factors that drive returns (and risk), and individual market returns are just manifestations of those. So there is the equity return factor for example, and also a bond one. A standard way of working out what these factors are is to do a decomposition of the covariance matrix and find out what the principal components are. The first few PCs will often explain most of the returns. The factor loadings are relatively static and slow moving; the S&P 500 is usually going to have a big weight in the equity return factor.
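A minimal sketch of that decomposition (standard PCA on the covariance matrix; nothing specific to this post, and the function name is mine):

```python
import numpy as np

def principal_components(returns, n_factors=3):
    # returns: (T, n_instruments) array of (ideally vol scaled) instrument returns
    cov = np.cov(returns, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]               # largest variance first
    explained = eigenvalues[order] / eigenvalues.sum()
    # columns of the first returned array are the latent factor loadings
    return eigenvectors[:, order[:n_factors]], explained[:n_factors]
```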

Taking this idea a step further, there could also be 'alternative' return factors; like the trend following factor or carry factor (or back in equity land, value and quality). These have dynamic loadings versus the underlying instruments; sometimes the trend following factor will be long S&P 500 and sometimes short. This dynamic loading is what makes top down replication difficult.

Bottom up regression reverses this process and begins with some known factors; eg the returns from trend following the S&P 500 at some speed with a given moving average crossover, and then tries to work out the loading on those factors for a given asset - in this case the CTA index. 

Note that this also suggests some interesting research ideas such as using factor decomposition to reduce the number of instruments or strategies required to do top down or bottom up replication, but that is for another day. 

If factors didn't exist and all returns were idiosyncratic both types of replication would be harder; the fact they do seem to exist makes replication a lot easier as it reduces the number of coefficients required to do a good job.



Setup of an empirical battle royale


Let's do a face off then of the two methodologies. The key thing here isn't to reproduce the excellent work done by others (see the referenced papers for examples), or necessarily to find the best possible way of doing either kind of replication, but to understand better how the curse of dimensionality affects each of them. 

My choice of index is the BTOP50, purely because daily returns are still available for free download. My set of instruments will be the 102 I used in my recent book 'AFTS' (actually 103, but Eurodollar is no longer trading) which represent a good spread of liquid futures instruments across all the major asset classes. 

I am slightly concerned about using daily returns, because the index snapshot time is likely to be different from the closing futures price times I am using. This could lead to lookahead bias, although that is easily dealt with by introducing a conservative two day lag in betas, as others have done. However it could also make the results worse, since a systematic mismatch will lower the correlation between the index returns and underlying instrument returns (and thus also the strategy returns in a bottom up replication). To check for this I also tested a version using two day returns, but it did not affect the results.

For the top down replication I will use six different window sizes from 8 business days up to 256 (about a year), with all the powers of 2 in between. These window sizes exceed the range typically used in this application, deliberately, because I want to illustrate the tradeoffs involved. For bottom up replication I will use eight window sizes from 32 business days up to 4096 (about sixteen years, although in practice we only have 14 years of data for the BTOP50, so this means using all the available data). 

We will do our regressions every day, and then use an exponential smooth on the resulting coefficients with a span equal to twice the window size. For better intuition, a 16 day exponential span, such as we would use with an 8 day window size, has a half-life of around 5.5 days. The maximum smooth I use is a span of 256 days.
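The smoothing step is just a pandas exponential moving average over the daily coefficient estimates; a sketch:

```python
import pandas as pd

def smooth_coefficients(daily_betas: pd.DataFrame, window: int) -> pd.DataFrame:
    # daily_betas: one row per day, one column per instrument (or strategy)
    span = min(window * 2, 256)          # span of twice the window, capped at 256
    return daily_betas.ewm(span=span).mean()
```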

For bottom up replication, I will use seven strategies: three trend following rules (EWMAC4,16, EWMAC16,64, EWMAC64,256) and a carry strategy (carry60); plus some additional strategies: acceleration32, mrinasset1000, and skewabs180. For details of what these involve, please see AFTS or various blogposts; suffice to say they can be qualitatively described as fast, medium and slow trend following, carry, acceleration (change in momentum), mean reversion, and skew respectively. Note that in the Resolve paper they use 13 strategies for each instrument, but these are all trend following over different speeds and are likely to be highly correlated (which is bad for regression, and also not helpful for replication).

I will use a limited set of 15 instruments, the same as those used in the Newfound paper, which gives me 15*7 = 105 coefficients to estimate - roughly the same as in the top down replication.

I'm going to use my standard continuous forecasting method, just because that is the code I have to hand; the Resolve paper does various kinds of sensitivity analysis and concludes that both binary and continuous produce similar results (with a large enough universe of instruments, it doesn't matter so much exactly how you do the CTA thing). 

Note - it could make sense to force the coefficients on bottom up replication to be positive, however we don't know for sure if a majority of CTAs are using some of these strategies in reverse, in particular the divergent non trend following strategies.


Approx data ratios with different window sizes if all ~100 coefficients estimated:

                               

Window size       Data ratio
8 days            0.08
16 days           0.16
32 days           0.32
64 days           0.64
128 days          1.28
256 days          2.56
512 days          5.12
1024 days         10.2
2048 days         20.5
4096 days         41.0


In both cases I need a way to reduce the number of regressors on the right hand side from somewhere just over 100 to something more reasonable. This will clearly be very important with an 8 day window!

Various fancy techniques are commonly used for this, including LASSO and ridge regression. There is a nice summary of the pros and cons of these in an appendix of the Resolve paper; one implication being that the right technique will depend on whether we are doing bottom up or top down replication. They also talk about elastic net, a technique that combines the two. For simplicity I use LASSO, as there is only one hyperparameter to fit (the penalty size).



Results

Here are the correlation figures for the two methods with different lookback windows:


As you can see, the best lookback for the top down method needs to be quite short to capture changing positions. Since strategy weights are more stable, we can use a longer lookback for the bottom up method. For any reasonable length of lookback the correlation produced by the bottom up method is pretty stable, and significantly better than that of the top down method.


Footnote: Why not do both?

One of the major contributions of the Resolve paper is the idea of combining both top down and bottom up methods. We can see why this makes sense. Although bottom up is superior, as it causes fewer dimensionality issues, it does suffer because there might be some extra 'secret sauce' that our bottom up models don't capture. By including the top down element as well we can possibly fill this gap.


Footnote on 'Creating a CTA from scratch'

You may have seen some bottom up 'replication' articles that don't use any regression, such as this one. They just put together a set of simple strategies with some sensible weights and then do an ex-post cursory check on correlation with the index. The result, without trying, is a daily correlation of 0.6 with the SG CTA index, in line with the best bottom up results above, without any of the work or the risks involved in doing potentially unstable regressions on small amounts of data. Indeed, my own trading strategy's (monthly) correlation with the SG CTA index was 0.8 last time I checked. I have certainly done no regressions to get that!

As I mentioned above, if you are a retail investor or an institutional investor who is not obsessed with benchmarking, then this might be the way to go. There is then no limit on the number of markets and strategies you can include.


Conclusion

I guess my conclusion comes back to why... why are we doing this?

If we really want to replicate the index then we should be agnostic about methodology and go with what is best. This will involve mostly bottom up with a longish window for the reasons discussed above, although it can probably be improved by including an averaging with top down.

But if we are trying to get 'exposure to some trend following factors' without caring about the index then I would probably start with the bottom up components of simple strategies on a diversified set of instruments with sensible but dumb 'no-information' weights that probably use some correlation information but not much else (see all the many posts I have done on portfolio optimisation). Basically the 'CTA from scratch' idea.

And then it might make sense to move in the direction of trying to do a bottom up replication of the index if you did decide to reduce your tracking error, though I'd probably use a robust regression to avoid pulling the strategy weights too far from the dumb weights.






Wednesday, 6 March 2024

Fitting with: exponential weighting, alpha and the kitchen sink

 I've talked at some length before about the question of fitting forecast weights, the weights you use to allocate risk amongst different signals used to trade a particular instrument. Generally I've concluded that there isn't much point wasting time on this, for example consider my previous post on the subject here.

However it's an itch I keep wanting to scratch, and in particular there are three things I'd like to look at which I haven't considered before:

  • I've generally used ALL my data history, weighted equally. But there are known examples of trading rules which just stop working during the backtested period, for example faster momentum pre-cost (see my last book for a discussion). 
  • I've generally used Sharpe ratio as my indicator of performance of choice. But one big disadvantage of it is that it will tend to favour rules with more positive Beta exposure on markets that have historically gone up.
  • I've always used a two step process where I first fit forecast weights, and then instrument weights. This separation makes things easier. But we can imagine many examples where it would produce suboptimal performance.

In this post I discuss some ideas to deal with these problems:

  • Exponential weighting, with more recent performance getting a higher weight.
  • Using alpha rather than Sharpe ratio to fit.
  • A kitchen sink approach where both instrument and forecast weights are fitted together.

Note I have a longer term project in mind where I re-consider the entire structure of my trading system, but that is a big job, and I want to put in place these changes before the end of the UK tax year, when I will also be introducing another 50 or so instruments into my traded universe, something that would require some fitting of some kind to be done anyway.


Exponential weighting

Here is the 2nd most famous 'hockey stick' graph in existence:

(From my latest book, Advanced Futures Trading Strategies AFTS)

Focusing on the black lines, which show the net performance of the two fastest EWMAC trading rules across a portfolio of 102 futures contracts, there's a clear pattern. Prior to 1990 these rules do pretty well, then afterwards they flatline (EWMAC4 in a very clear hockey stick pattern) and do badly (EWMAC2). 

I discuss some reasons why this might have happened in the book, but that isn't what concerns us now. What bothers me is this; if I allocate my portfolio across these trading strategies using all the data since 1970 then I'm going to give some allocation to EWMAC4 and even a little bit to EWMAC2. But does that really make sense, to put money in something that's been flat / money losing for over 30 years?

Fitting by use of historic data is a constant balance between using more history, to get more robust statistically significant results, and using more recent data that is more likely to be relevant and also accounts for alpha decay. The right balance depends on both the holding period of our strategies (HFT traders use months of data, I should certainly be using decades), and also the context (to predict instrument standard deviation, I use something equivalent to about a month of returns, whereas for this problem a much longer history would be appropriate). 

Now I am not talking about going crazy and doing something daft like allocating everything to the strategy that did best last week, but it does seem reasonable to use something like a 15 year halflife when estimating means and Sharpe Ratios of trading strategy returns.

That would mean I'd currently be giving about 86% of any weighting to the period after 1990, compared to about 62% now with equal weighting. So it's not a pure rolling window; the distant past still has some value, but the recent past is more important. 
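Those two percentages are easy to verify (approximate calendar years, just to reproduce the comparison above):

```python
import numpy as np

years = np.arange(1970, 2024)
age = years.max() - years                   # age of each year of data
weights = 0.5 ** (age / 15)                 # 15 year half-life
post_1990 = years >= 1990
print(weights[post_1990].sum() / weights.sum())   # roughly 0.86
print(post_1990.mean())                           # roughly 0.63 with equal weights
```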


Using alpha rather than Sharpe Ratio to fit

One big difference between Quant equity people and Quant futures traders is that the former are obsessed with alpha. They get mugs from their significant others with 'world's best alpha generator' on them for Christmas. They wear jumpers with the alpha symbol on. You get the idea. Beta is something to be hedged out. Much of the logic is that we're probably getting our daily diet of Beta exposure elsewhere, so the holistic optimal portfolio will consist of our existing Beta plus some pure alpha.

Quant futures traders are, broadly speaking, more concerned with outright return. I'm guilty of this myself. Look at the Sharpe Ratio in the backtest. Isn't it great? And you know what, that's probably fine. The correlation of a typical managed futures strategy with equity/bond 60:40 is pretty low. So most of our performance will be alpha anyway. 

However evaluating different trading strategies on outright performance is somewhat problematic. Certain rules are more likely to have a high correlation with underlying markets. Typically this will include carry in assets where carry is usually positive (eg bonds), and slower momentum on anything that has most usually gone up in the past (eg equities)*.  To an extent some of this is fine since we want to collect some risk premia, but if we're already collecting those premia elsewhere in long only portfolios**, why bother? 

* This also means that any weighting of instrument performance will be biased towards things that have gone up in the past - not a problem for me right now as I generally ignore it, but could be a problem if we adopt a 'kitchen sink' approach as I will discuss later.

** Naturally 'Trend following plus nothing' people will prefer to collect their risk premia inside their trend following portfolios, but they are an exception. I note in passing that for a retail investor who has to pay capital gains when their futures roll, it is likely that holding futures positions is an inferior way of collecting alpha.

I'm reminded of a comment by an old colleague of mine* on investigating different trading rules in the bond sector (naturally evaluating them on outright performance). After several depressing weeks he concluded that 'Nothing I try is any better than long only'.

*Hi Tom!

So in my latest book AFTS (sorry for the repeated plugs, but you're reading this for free so there has to be some advertising and at least it's more relevant than whatever clickbait nonsense the evil algo would serve up to you otherwise) I did redress this slightly by looking at alpha and not just outright returns. For example my slowest momentum rule (EWMAC64,256) has a slightly higher SR than one of my fastest (EWMAC8,32), but an inferior alpha even after costs.


Which benchmark?

Well this idea of using alpha is all very well, but what benchmark are we regressing on to get it? This isn't US equities now mate, you can't just use the S&P 500 without thinking. Some plausible candidates are:

  1. The S&P 500.
  2. The 60:40 portfolio that some mythical investor might own as well as this, or a version more tailored to my own requirements. This would be roughly equivalent to long everything on a subset of markets, with sector risk weights of about 80% in equities and 20% in bonds. Frankly this wouldn't be much different to the S&P 500.
  3. The 'long everything' portfolio I used in AFTS, which consists of all my futures with a constant positive fixed forecast (the system from chapter 4, as readers will know).
  4. A long only portfolio just for the sector a given instrument is trading in.
  5. A long only position just on the given instrument we are trading.

There are a number of things to consider here. What is the other portfolio that we hold? It might well be the S&P 500 or just the magnificent 7; it's more likely to consist of a globally diversified bunch of bonds and stocks; it's less likely to have a long only cash position in some obscure commodities contract. 

Also, not all things deliver risk premia in their naked Beta outfits. Looking at median long only constant forecast SR in chapter 3 of AFTS, they appear lower in the non financial assets (0.07 in ags, 0.27 in metals and 0.32 in energy; versus 0.40 in short vol, 0.46 in equity and 0.59 in bonds; incidentally FX is also close to zero at 0.09, but EM apart there's no reason why we should earn a risk premium here). This implies we should be veering towards loading up on Beta in financials and Alpha in non financials.

But it's hard to disaggregate what is the natural risk premium from holding financial assets, versus what we've earned just from a secular downtrend in rates and inflation that lasted for much of the 50 odd years of the backtest. Much of the logic for doing this exercise is because I'm assuming that these long only returns will be lower in the future because that secular trend has now finished.

Looking at the alpha just on one instrument will make it a bit weird when comparing alphas across different instruments. It might make more sense to do the regression on a sector Beta. This would be more analogous to what the equity people do.

On balance I think the 'long everything' benchmark I used in AFTS is the best compromise. Because trends have been stronger in equities and bonds it will be reasonably correlated to 60:40 anyway. Regressing against this will thus give a lower Beta and potentially better Alpha for instruments outside of those two sectors.

One nice exercise to do is to then see what a blend of long everything and the alpha optimised portfolio looks like. This would allow us to include a certain amount of general Beta into the strategy. We probably shouldn't optimise for this.


Optimising with alpha

We want to allocate more to strategies with higher alpha. We also want that alpha to be statistically significant. We'll get more statistical significance with more observations, and/or a better fit to the regression. 

Unlike with means and Sharpe Ratios, I don't personally have any well developed methodologies, theories, or heuristics for allocating weights according to alpha or the significance of alpha. I did consider developing a new heuristic, and wasted a bit of time on a toy formula involving the product of (1 - p_value) and alpha.

But I quickly realised that it's fairly easy to adapt work I have done on this before. Instead of using naked return streams, we use residual return streams; basically the return left over after subtracting Beta*benchmark return. We can then divide the mean of this by its standard deviation to get a Sharpe Ratio, which is then plugged in as normal.
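To make that concrete, here is a minimal sketch of a residual Sharpe Ratio calculation on daily returns, assuming a 256 day year. The function name and variable names are mine, purely for illustration, and this is not the exact code I used:

import pandas as pd
import statsmodels.api as sm

def residual_sharpe(strategy_returns: pd.Series, benchmark_returns: pd.Series) -> float:
    # Align the two daily return series and estimate a single full sample Beta
    both = pd.concat([strategy_returns, benchmark_returns], axis=1,
                     keys=["strat", "bench"]).dropna()
    fit = sm.OLS(both["strat"], sm.add_constant(both["bench"])).fit()
    beta = fit.params["bench"]

    # Residual (alpha) returns: whatever is left after stripping out the Beta exposure
    residuals = both["strat"] - beta * both["bench"]

    # Annualised residual Sharpe Ratio: mean residual return over its standard deviation
    return (residuals.mean() / residuals.std()) * (256 ** 0.5)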

How does this fit into an exponential framework? There are a number of ways of doing this, but I decided against the complexity of writing code (which would be slow) to do my regression in a full exponential way. Instead I estimate my Betas on first an expanding, then a rolling, 30 year window (which trivially has a 15 year half life). I don't expect Betas to vary that much over time. I estimate my alphas (and hence Sharpe ratios) with a 15 year half life on the residuals. Betas are re-estimated every year, and the most up to date estimate is then used to correct returns in the past year (otherwise the residual returns would change over time which is a bit weird and also computationally more expensive).
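A rough sketch of that windowing scheme, assuming daily returns on a DatetimeIndex; this is only meant to show the mechanics, not the actual implementation, and all the names are made up:

import numpy as np
import pandas as pd

DAYS_IN_YEAR = 256
BETA_WINDOW_YEARS = 30
HALF_LIFE_YEARS = 15

def annually_reestimated_residuals(strategy_returns: pd.Series,
                                   benchmark_returns: pd.Series) -> pd.Series:
    # Beta is re-estimated once a year on an expanding window capped at 30 years,
    # and the latest estimate is used to residualise that year's returns
    # (so, as noted above, residuals within a year don't shift around later)
    both = pd.concat([strategy_returns, benchmark_returns], axis=1,
                     keys=["strat", "bench"]).dropna()
    chunks = []
    for year in sorted(set(both.index.year)):
        window = both.loc[both.index.year <= year].tail(BETA_WINDOW_YEARS * DAYS_IN_YEAR)
        if len(window) < DAYS_IN_YEAR:
            continue  # need at least a year of history before estimating Beta
        beta = window["strat"].cov(window["bench"]) / window["bench"].var()
        this_year = both.loc[both.index.year == year]
        chunks.append(this_year["strat"] - beta * this_year["bench"])
    return pd.concat(chunks)

def ew_residual_sharpe(residuals: pd.Series) -> float:
    # Alpha, and hence the residual Sharpe Ratio, estimated with a 15 year half life
    halflife = HALF_LIFE_YEARS * DAYS_IN_YEAR
    mean = residuals.ewm(halflife=halflife).mean().iloc[-1]
    stdev = residuals.ewm(halflife=halflife).std().iloc[-1]
    return (mean / stdev) * np.sqrt(DAYS_IN_YEAR)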


Kitchen sink

I've always done my optimisation in a two step process. First, what is the best way to forecast the price of this market (what is the best allocation across trading rules, i.e. what are my forecast weights)? Second, how should I put together a portfolio of these forecasters (what is the best allocation across instruments, i.e. what are my instrument weights)? 

Partly that reflects the way my trading strategy is constructed, but this separation also makes things easier. It does however reflect a forecasting mindset, rather than a 'diversified set of risk premia' mindset. Under the latter mindset, it would make sense to do a joint optimisation where the individual 'lego bricks' are ONE trading rule and ONE instrument. 

It strikes me that this is also a much more logical approach once we move to maximising alpha rather than maximising Sharpe Ratio. 

Of course there are potential pain points here. Even for a toy portfolio of 10 trading rules and 50 instruments we are optimising 500 assets. But the handcrafting approach of top down optimisation ought to be able to handle this fairly easily (we shall see!).
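To make the 'lego bricks' idea concrete, here's a sketch of the kind of return matrix the joint optimisation would work on. The names (returns_by_brick, substrategy_returns) are mine for illustration only:

import pandas as pd

def substrategy_returns(returns_by_brick: dict) -> pd.DataFrame:
    # returns_by_brick maps (instrument, rule) -> a pd.Series of daily returns
    # for that ONE rule traded on that ONE instrument, each scaled to the same
    # annualised volatility target so that the bricks are comparable
    data = {f"{instrument}:{rule}": series
            for (instrument, rule), series in returns_by_brick.items()}
    # One column per brick: 10 rules x 50 instruments = 500 'assets' to optimise
    return pd.DataFrame(data)

The joint optimisation (handcrafting or otherwise) is then run over the correlation matrix of this single wide DataFrame, instead of fitting forecast weights and then instrument weights in two separate passes.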



Testing

Setup

Let's think about how to setup some tests for these ideas. For speed and interpretation I want to keep things reasonably small. I'm going to use my usual five outright momentum EWMAC trading rules, plus 1 carry (correlations are pretty high here, I will use carry60), plus one of my skew rules (skewabs180 for those who care), asset class mean reversion - a value type rule (mrinasset1000), and both asset class momentum (assettrend64) and relative momentum (relmomentum80). My expectation is that the more RV type rules - relative momentum, skew, value - will get a higher weight than when we are just considering outright performance. I'm also expecting that the very fastest momentum will have a lower weight when exponential weighting is used.

The rules' profitability is shown above. You can see that we're probably going to want less MR (mean reversion), as it's rubbish; and also that if we update our estimates for profitability we'd probably want less of the faster momentum and relative momentum. There is another hockey stick from 2009 onwards, when many rules seem to flatten off somewhat.

(Frankly we could do with more rules that made more money recently; but I don't want to be accused of overegging the pudding on overfitting here)

For instruments, to avoid breaking my laptop with repeated optimisation of 200+ instruments I kept it simple and restricted myself to only those with at least 20 years of trading history. There are 39 of these old timers:

'BRE', 'CAD', 'CHF', 'CORN', 'DAX', 'DOW', 'EURCHF', 'EUR_micro', 'FEEDCOW', 'GASOILINE', 'GAS_US_mini', 'GBP', 'GBPEUR', 'GOLD_micro', 'HEATOIL', 'JPY', 'LEANHOG', 'LIVECOW', 'MILK', 'MSCISING', 'MXP', 'NASDAQ_micro', 'NZD', 'PALLAD', 'PLAT', 'REDWHEAT', 'RICE', 'SILVER', 'SOYBEAN_mini', 'SOYMEAL', 'SOYOIL', 'SP400', 'SP500_micro', 'US10', 'US20', 'US5', 'WHEAT', 'YENEUR', 'ZAR'

On the downside there is a bit of a sector bias here (12 FX, 11 Ags, 6 equity, 4 metals, and only 3 bonds and 3 energy), but that also gives more work for the optimiser (FWIW my full set of instruments is biased towards equities, so you can't really win).

For my long only benchmark used for regressions I'm going to use a fixed forecast of +10, which in layman's terms means it's a risk parity type portfolio. I will set the instrument weights using my handcrafting method, but without any information about Sharpe Ratio, just correlations. IDM is estimated on backward looking data of course.
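For anyone wondering what a fixed forecast of +10 boils down to in practice, here's a simplified sketch: always long, with each instrument scaled to the same risk, then combined with the correlation-only handcrafted weights and the IDM. The 20% vol target and 35 day span are illustrative assumptions, not the exact settings used:

import pandas as pd

ANNUAL_VOL_TARGET = 0.20     # illustrative risk target
DAYS_IN_YEAR = 256

def long_only_benchmark(instrument_returns: pd.DataFrame,
                        instrument_weights: pd.Series,
                        idm: float) -> pd.Series:
    # A constant forecast of +10 (the average absolute forecast) just means
    # 'always long, at average risk': each instrument's returns are scaled
    # to hit the same vol target, which is why it looks like risk parity
    daily_vol = instrument_returns.ewm(span=35).std()
    daily_target = ANNUAL_VOL_TARGET / (DAYS_IN_YEAR ** 0.5)
    scaled = instrument_returns * (daily_target / daily_vol).shift(1)  # lagged, no lookahead
    # Combine with handcrafted (correlation only) instrument weights and the IDM
    return (scaled * instrument_weights).sum(axis=1) * idm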

I will then have something that roughly resembles my current system (although clearly with fewer markets and trading rules, and without using dynamic optimisation of positions). I also use handcrafting, but I fit forecast weights and instrument weights separately, again without using any information on performance, just correlations. 

I then check the effect of introducing the following features:

  • 'SR' Allowing Sharpe Ratio information to influence forecast and instrument weights
  • 'Alpha' Using alpha rather than Sharpe Ratio
  • 'Short' Using a 15 year halflife rather than all the data to estimate Sharpe Ratios and correlations
  • 'Sink' Estimating the weights for forecast and instrument weights at the same time

Apart from SR and alpha which are mutually exclusive, this gives me the following possible permutations:

  • Baseline: Using no performance information 
  • 'SR' 
  • 'SR+Short' 
  • 'Sink' 
  • 'SR+Sink' 
  • 'SR+Short+Sink' 
  • 'Alpha' 
  • 'Alpha+Short'
  • 'Alpha+Sink'
  • 'Alpha+Short+Sink'

In terms of performance I'm going to check both the outright performance and the overall portfolio alpha. I will also look separately at the post 2008 period and the pre 2008 period. Naturally everything is done out of sample, with robust optimisation, and after costs.

Finally, as usual in all cases I discard trading rules which don't meet my 'speed limit'. This also means that I don't trade the Milk future at all.


Long only benchmark

Some fun facts, here are the final instrument weights by asset class:

{'Ags': 0.248, 'Bond': 0.286, 'Equity': 0.117, 'FX': 0.259, 'Metals': 0.0332, 'OilGas': 0.0554}

The final diversification multiplier is 2.13. It has a SR of around 0.6, and costs of around 0.4% a year.
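As a reminder, the diversification multiplier is calculated roughly like this; a sketch assuming the correlation matrix and instrument weights are already estimated, with negative correlations floored at zero so the multiplier isn't inflated by estimation noise:

import numpy as np

def diversification_multiplier(weights: np.ndarray, correlations: np.ndarray) -> float:
    # IDM = 1 / sqrt(w' H w), where H is the correlation matrix
    floored = np.clip(correlations, 0.0, 1.0)
    return 1.0 / np.sqrt(weights @ floored @ weights)

For example, two uncorrelated assets at 50% weight each would give 1/sqrt(0.5), roughly 1.41; with 39 instruments and mostly modest correlations you can see how it gets above 2.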


Baseline

Here is a representative set of forecast weights (S&P 500):

relmomentum80       0.105

momentum4           0.094
momentum8           0.048
momentum16          0.048
momentum32          0.054
momentum64          0.054
assettrend64        0.102

carry60             0.155
mrinasset1000       0.238
skewabs180          0.102

The massive weight to mrinasset is due to the fact it is very diversifying, and we are only using correlations here. But mrinasset isn't very good, so smuggling in outright performance would probably be a good thing to do.

SR of this thing is 0.98, and costs are a bit higher, as we'd expect, at 0.75% annualised. Always amazing how well just a simple diversified system can do. The Beta to our long only model is just 0.09 (again probably due to that big dollop of mean reversion which is slightly negative Beta if anything), so it's perhaps unsurprising that the net alpha is 18.8% a year (dividing by the vol gets to a SR of 0.98 again just on the alpha). BUT...



Performance has declined over time. 


'SR'

I'm now going to allow the fitting process for both forecast and instrument weights to use Sharpe Ratio. Naturally I'm doing this in a sensible way so the weights won't go completely nuts.

Let's have a look at the forecast weights for comparison:

momentum4           0.112
momentum8           0.065
momentum16          0.068
momentum32          0.075
momentum64          0.072
assettrend64        0.122
relmomentum80       0.097

mrinasset1000       0.135
skewabs180          0.105
carry60             0.149

We can see that money losing MR has a lower weight, and in general the non trendy part of the portfolio has dropped from about half to under 40%. But we still have lots of faster momentum as we're using the whole period to fit.

Instrument weights by asset class meanwhile look like this:

{'Ags': 0.202, 'Bond': 0.317, 'Equity': 0.191, 'FX': 0.138, 'Metals': 0.0700, 'OilGas': 0.0820}

Not dramatic changes, but we do get a bit more of the winning asset classes. 


'SR+Short'

Now what happens if we change our mean and correlation estimates so they have a 15 year halflife, rather than using all the data?

Forecast weights:

momentum4           0.095
momentum8           0.055
momentum16          0.059
momentum32          0.067
momentum64          0.066
relmomentum80       0.095
assettrend64        0.122

skewabs180          0.117
carry60             0.142
mrinasset1000       0.183



There's definitely been a shift out of faster momentum, and into things that have done better recently such as skew. We are also seeing more MR, which seems counterintuitive; my initial theory was that MR has become more diversifying over time, and this does indeed turn out to be the case.


'SR+Short+Sink'

So far we've just been twiddling around a little at the edges really, but this next change is potentially quite different - jointly optimising the forecast and instrument weights. Let's look at the results with the SR using the 15 year halflife.

Here are the S&P 500 forecast weights - note that unlike for other methods, these could be wildly different across instruments:

momentum4           0.136
momentum8           0.188
momentum16          0.147
momentum32          0.046
momentum64          0.048
assettrend64        0.098
relmomentum80       0.012

carry60             0.035
mrinasset1000       0.000
skewabs180          0.290

Here we see decent amounts of faster momentum - maybe because the S&P 500 is a cheaper instrument, or fast momentum just happens to work better here - but no weight at all on mean reversion, which apparently is shockingly bad here. A better way of looking at this is to see the forecast weights averaged across all instruments:

momentum4        0.181185
momentum8        0.138984
momentum16       0.088705
momentum32       0.079942
momentum64       0.098399
assettrend64     0.107174
relmomentum80    0.052196

skewabs180       0.086572
mrinasset1000    0.063426
carry60          0.103418


Perhaps surprisingly, we're now seeing brutally large amounts of fast momentum, and less of the more diversifying rules. 
 


Interlude - clustering when everything is optimised together


To understand a little better what's going on, it might be helpful to do a cluster analysis to see how things are grouping together when we do our top down optimisation across the 10 rules and 37 instruments: 370 things altogether. Using the final correlation matrix to do the clustering, here are the results for 2 clusters:

Instruments {'CAD': 10, 'FEEDCOW': 10, 'GAS_US_mini': 10, 'GBPEUR': 10, 'LEANHOG': 10, 'LIVECOW': 10, 'MILK': 10, 'RICE': 10, 'SP400': 10, 'YENEUR': 10, 'ZAR': 10, 'DAX': 9, 'DOW': 9, 'MSCISING': 9, 'NASDAQ_micro': 9, 'SP500_micro': 9, 'GASOILINE': 4, 'GOLD_micro': 4, 'HEATOIL': 4, 'JPY': 4, 'PALLAD': 4, 'PLAT': 4, 'SILVER': 4, 'CHF': 3, 'CORN': 3, 'EURCHF': 3, 'EUR_micro': 3, 'NZD': 3, 'REDWHEAT': 3, 'SOYBEAN_mini': 3, 'SOYMEAL': 3, 'SOYOIL': 3, 'US10': 3, 'US5': 3, 'WHEAT': 3, 'BRE': 2, 'GBP': 2, 'US20': 2, 'MXP': 1}
Rules {'skewabs180': 37, 'mrinasset1000': 35, 'carry60': 33, 'relmomentum80': 25, 'assettrend64': 16, 'momentum4': 16, 'momentum8': 16, 'momentum16': 16, 'momentum32': 16, 'momentum64': 16}

Instruments {'MXP': 9, 'BRE': 8, 'GBP': 8, 'US20': 8, 'CHF': 7, 'CORN': 7, 'EURCHF': 7, 'EUR_micro': 7, 'NZD': 7, 'REDWHEAT': 7, 'SOYBEAN_mini': 7, 'SOYMEAL': 7, 'SOYOIL': 7, 'US10': 7, 'US5': 7, 'WHEAT': 7, 'GASOILINE': 6, 'GOLD_micro': 6, 'HEATOIL': 6, 'JPY': 6, 'PALLAD': 6, 'PLAT': 6, 'SILVER': 6, 'DAX': 1, 'DOW': 1, 'MSCISING': 1, 'NASDAQ_micro': 1, 'SP500_micro': 1}
Rules {'assettrend64': 23, 'momentum4': 23, 'momentum8': 23, 'momentum16': 23, 'momentum32': 23, 'momentum64': 23, 'relmomentum80': 14, 'carry60': 6, 'mrinasset1000': 4, 'skewabs180': 2}


Interpretation here is that for each cluster I count the number of instruments present, and then trading rules. So for example the first cluster has 10 examples of CAD - since there are 10 trading rules, that means all of CAD is in this cluster. It also has 37 examples of the skewabs180 rule; again this means that all the skew rules have been collected here.

This first cluster split clearly shows a split between the more diversifying rules (skew, mean reversion, carry and most of relative momentum) in cluster 1, and trendy type rules in cluster 2. The instrument split is less helpful.
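If you want to replicate the grouping, it's only a few lines with scipy. A sketch, not necessarily the exact linkage method I used:

import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_assignments(correlation: pd.DataFrame, n_clusters: int) -> pd.Series:
    # Turn the correlation matrix into distances: highly correlated bricks are
    # 'close', negatively correlated ones are far apart
    distance = 1.0 - correlation.values
    np.fill_diagonal(distance, 0.0)
    condensed = squareform(distance, checks=False)

    # Agglomerative clustering, then cut the tree into the requested number of groups
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return pd.Series(labels, index=correlation.columns)

Counting how many columns from each instrument, and from each rule, land in a given cluster gives the tallies shown here.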

Jumping ahead, here are N=10 clusters with my own labels in bold:

Cluster 1 EQUITY TREND
Instruments {'DAX': 6, 'DOW': 5, 'NASDAQ_micro': 5, 'SP400': 5, 'SP500_micro': 5}
Rules {'assettrend64': 5, 'momentum16': 5, 'momentum32': 5, 'momentum64': 5, 'momentum8': 4, 'mrinasset1000': 1, 'skewabs180': 1}

Cluster 2 ???
Instruments {'GAS_US_mini': 9, 'EURCHF': 2, 'GOLD_micro': 2, 'MSCISING': 2, 'NASDAQ_micro': 2, 'PALLAD': 2, 'PLAT': 2, 'RICE': 2, 'SILVER': 2, 'SP500_micro': 2, 'BRE': 1, 'CAD': 1, 'DAX': 1, 'DOW': 1, 'EUR_micro': 1, 'GASOILINE': 1, 'REDWHEAT': 1, 'SOYBEAN_mini': 1, 'SOYMEAL': 1, 'SOYOIL': 1, 'SP400': 1, 'US5': 1, 'WHEAT': 1}
Rules {'mrinasset1000': 15, 'carry60': 8, 'skewabs180': 6, 'relmomentum80': 5, 'assettrend64': 1, 'momentum4': 1, 'momentum8': 1, 'momentum16': 1, 'momentum32': 1, 'momentum64': 1}

Cluster 3 ???
Instruments {'FEEDCOW': 10, 'GBPEUR': 10, 'LEANHOG': 10, 'LIVECOW': 10, 'MILK': 10, 'YENEUR': 10, 'ZAR': 10, 'CAD': 9, 'RICE': 8, 'MSCISING': 7, 'HEATOIL': 4, 'JPY': 4, 'SP400': 4, 'CHF': 3, 'CORN': 3, 'DOW': 3, 'GASOILINE': 3, 'NZD': 3, 'US10': 3, 'DAX': 2, 'EUR_micro': 2, 'GBP': 2, 'GOLD_micro': 2, 'NASDAQ_micro': 2, 'PALLAD': 2, 'PLAT': 2, 'REDWHEAT': 2, 'SILVER': 2, 'SOYBEAN_mini': 2, 'SOYMEAL': 2, 'SOYOIL': 2, 'SP500_micro': 2, 'US20': 2, 'US5': 2, 'WHEAT': 2, 'BRE': 1, 'EURCHF': 1, 'GAS_US_mini': 1, 'MXP': 1}
Rules {'skewabs180': 30, 'carry60': 25, 'relmomentum80': 20, 'mrinasset1000': 19, 'momentum4': 15, 'momentum8': 11, 'assettrend64': 10, 'momentum16': 10, 'momentum32': 10, 'momentum64': 10}

Cluster 4 US RATES TREND+CARRY
Instruments {'US20': 8, 'US10': 7, 'US5': 7}
Rules {'carry60': 3, 'assettrend64': 3, 'momentum4': 3, 'momentum8': 3, 'momentum16': 3, 'momentum32': 3, 'momentum64': 3, 'mrinasset1000': 1}

Cluster 5 EURCHF
Instruments {'EURCHF': 7}
Rules {'assettrend64': 1, 'momentum4': 1, 'momentum8': 1, 'momentum16': 1, 'momentum32': 1, 'momentum64': 1, 'skewabs180': 1}

Cluster 6 EQUITY MR+REL MOMENTUM
Instruments {'DAX': 1, 'DOW': 1, 'MSCISING': 1, 'NASDAQ_micro': 1, 'SP500_micro': 1}
Rules {'mrinasset1000': 3, 'relmomentum80': 2}

Cluster 7 G10 FX TREND
Instruments {'GBP': 8, 'CHF': 7, 'EUR_micro': 7, 'NZD': 7, 'JPY': 6}
Rules {'assettrend64': 5, 'momentum4': 5, 'momentum8': 5, 'momentum16': 5, 'momentum32': 5, 'momentum64': 5, 'relmomentum80': 4, 'carry60': 1}

Cluster 8 EM FX
Instruments {'MXP': 9, 'BRE': 8}
Rules {'relmomentum80': 2, 'carry60': 2, 'assettrend64': 2, 'momentum4': 2, 'momentum8': 2, 'momentum16': 2, 'momentum32': 2, 'momentum64': 2, 'skewabs180': 1}

Cluster 9 AGS TREND
Instruments {'CORN': 7, 'REDWHEAT': 7, 'SOYBEAN_mini': 7, 'SOYMEAL': 7, 'SOYOIL': 7, 'WHEAT': 7}
Rules {'relmomentum80': 6, 'assettrend64': 6, 'momentum4': 6, 'momentum8': 6, 'momentum16': 6, 'momentum32': 6, 'momentum64': 6}

Cluster 10 ENERGY/METAL TREND
Instruments {'GASOILINE': 6, 'GOLD_micro': 6, 'HEATOIL': 6, 'PALLAD': 6, 'PLAT': 6, 'SILVER': 6}
Rules {'assettrend64': 6, 'momentum4': 6, 'momentum8': 6, 'momentum16': 6, 'momentum32': 6, 'momentum64': 6}

We can see that there are some richer things going on here than we could capture in the simple two-stage fit of first forecast weights, then instrument weights. 


'Alpha'

Let's now see what happens if we replace the Sharpe Ratio on raw returns with a Sharpe Ratio on residual returns after adjusting for Beta exposure: alpha, basically.

Here are our usual forecast weights for S&P 500:

momentum4        0.061399
momentum8        0.067279
momentum16       0.037172
momentum32       0.037112
momentum64       0.069207
relmomentum80    0.170218
assettrend64     0.176815

skewabs180       0.105298
carry60          0.102799
mrinasset1000    0.172700

Very interesting; we're steering very much away from all speeds of 'vanilla' momentum here, and once again we have a lump of money in the highly diversifying but money losing mean reversion in assets.


Results


Right, so you have waded through all this crap, and here is your reward: what are the results like?

                  SR     beta   r_SR   H1_SR  H1_beta  H1_r_SR  H2_SR  H2_beta  H2_r_SR
LONG_ONLY         0.601  1.000  0.000  0.740  1.000    0.000    0.191  1.000    0.000
BASELINE          0.983  0.094  0.940  1.246  0.094    1.192    0.197  0.058    0.188
SR                1.087  0.308  0.932  1.298  0.338    1.084    0.458  0.151    0.438
SR_short          1.089  0.324  0.927  1.318  0.353    1.100    0.375  0.166    0.350
sink              1.025  0.323  0.871  1.252  0.330    1.055    0.356  0.260    0.319
SR_sink           0.975  0.362  0.804  1.167  0.378    0.947    0.388  0.265    0.349
SR_short_sink     0.929  0.384  0.753  1.121  0.395    0.899    0.323  0.305    0.277
alpha             0.993  0.199  0.891  1.212  0.220    1.069    0.345  0.081    0.336
alpha_short       1.023  0.238  0.904  1.237  0.249    1.082    0.367  0.160    0.343
alpha_sink_short  0.888  0.336  0.735  1.090  0.336    0.903    0.257  0.299    0.210

The columns are the SR, beta, and 'residual SR' (alpha divided by standard deviation) for the whole period, then for H1 (not really the first half, but pre 2009), then for H2 (after 2009). Green values are the best or very close to it, red is the worst (excluding long only, natch).

The top line is that everything looks worse after 2009, for both outright and residual performance. Looking at the entire period, there are some fitting methods that do better than the baseline on SR, but on residual SR they fail to improve. Focusing on the second half of the data, there is a bigger improvement in SR over the baseline for all the fitting methods, and one which also survives the use of a residual SR. 

But the best model of all in that second half was pretty much the simplest: just using SR to robustly fit weights, estimated over the entire period rather than with a halflife, and sticking to the two stage process of fitting forecast weights and then instrument weights. 

I checked, and the improvement over the baseline from just using SR was statistically significant with a p-value of about 0.02. The p-value versus the competing 'alpha' fit wasn't so good - just 0.2; but Occam's (Rob's) razor says we should use the simplest possible model unless a more complex model is significantly better. SR is significantly better than the simpler baseline model, at least in the more critical second half of the data, so we should use it; and we should only move to a more complex model if that model is significantly better. We don't need a decent p-value to justify using SR over alpha, since the latter is more complex.
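The significance test is nothing exotic; something along these lines, a paired t-test on aligned daily returns, assuming both account curves are run at the same vol target so this is effectively a test on Sharpe Ratios. A sketch rather than the exact code:

import pandas as pd
from scipy.stats import ttest_rel

def improvement_p_value(candidate: pd.Series, baseline: pd.Series) -> float:
    # One sided p-value for 'the candidate beats the baseline', on daily returns
    both = pd.concat([candidate, baseline], axis=1).dropna()
    result = ttest_rel(both.iloc[:, 0], both.iloc[:, 1])
    if result.statistic > 0:
        return result.pvalue / 2
    return 1 - result.pvalue / 2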


Coda: Forecast or instrument weights?

One thing I was curious about was whether the improvements from using SR are down to fitting forecast weights or instrument weights. I'm a bit more bullish on the former, as I feel there is more data and thus more likelihood of getting robust statistics. Every time I have looked at instrument performance, I've not seen any statistically significant differences.

(If you have say 30 years of data history and around 30 instruments, then for each instrument you have 30 years worth of information; but for each trading rule you have evidence from every instrument, so you end up with 30*30 = 900 instrument-years, which in standard error terms is roughly root(30) = 5.5 times more information.)
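Spelling out that back of the envelope sum (and glossing over the fact that instruments are correlated, which flatters it somewhat):

import numpy as np

years = 30
instruments = 30   # roughly the number of old timers used here

# Standard errors shrink with the square root of the sample size, so pooling
# evidence across instruments improves precision by about root(N)
per_instrument = np.sqrt(years)
per_rule = np.sqrt(years * instruments)
print(per_rule / per_instrument)   # ~5.5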

My hope / expectation is that all the work is being done by forecast weight fitting, so I checked to see what happened if I ran a SR fit just on instruments, and just on forecasts:

                SR     beta   r_SR   H1_SR  H1_beta  H1_r_SR  H2_SR  H2_beta  H2_r_SR
LONG_ONLY       0.601  1.000  0.000  0.740  1.000    0.000    0.191  1.000    0.000
BASELINE        0.983  0.094  0.940  1.246  0.094    1.192    0.197  0.058    0.188
SR              1.087  0.308  0.932  1.298  0.338    1.084    0.458  0.151    0.438
SR_forecasts    1.173  0.330  1.002  1.416  0.364    1.179    0.441  0.154    0.420
SR_instruments  0.948  0.118  0.892  1.179  0.121    1.106    0.259  0.076    0.248


Sure enough we can see that the benefit is pretty much entirely coming from the forecast weight fitting.


Conclusions

I've shied away from using performance rather than just correlations for fitting, but as I said earlier it is an itch I wanted to scratch. It does seem that none of the fancy alternatives I've considered in this post add value; so I will keep searching for the elusive silver bullet of quick wins through portfolio optimisation. 

Meanwhile for the exercise of updating my trading strategy with new instruments, I will probably be using Sharpe Ratio information to robustly fit forecast weights but not instrument weights (I still need to hold my nose a bit!).