Programming note:
So, first I should apologise for the LONG.... break between blogposts. This started when I decided not to do my usual annual review of performance - it is a lot of work, and I decided that the effort wasn't worth the value I was getting from it (in the interests of transparency, you can still find my regularly updated futures trading performance here). Since then I have been busy with other projects, but I now find myself with more free time and a big stack of things I want to research and write blog posts on.
Actual content begins here:
To the point then - if you have heard me talking on the TTU podcast you will know that one of my pet subjects for discussion is the thorny idea of replicating - specifically, replicating the performance of a CTA index using a relatively modest basket of futures which is then presented inside something like an ETF or other fund wrapper as an alternative to investing in the CTA index itself (or to be more precise, investing in the constituents because you can't actually invest in an index).
Reasons why this might be a good thing are:
- that you don't have to pay fat fees to a bunch of CTA managers, just slightly thinner ones to the person providing you with the ETF.
- potentially lower transaction costs outside of the fee charged
- Much lower minimum investment ticket size
- Less chance of idiosyncratic manager exposure if you were to deal with the ticket size issue by investing in just a subset of managers rather than the full index
How is this black magic achieved? In an abstract way there are three ways we can replicate something using a subset of the instruments that the underyling managers are trading:
- If we know the positions - by finding the subset of positions which most closely matches the joint positions held by the funds in the index. This is how my own dynamic optimisation works, but it's not really practical or possible in this context.
- Using the returns of individual instruments: doing a top down replication where we try and find the basket of current positions that does the best job of producing those returns.
- If we know the underlying strategies - by doing a bottom up replication where we try and find the basket of strategies that does the best job of producing those returns.
In this post I discuss in more detail some more of my thoughts on replication, and why I think bottom up is superior to top down (with evidence!).
I'd like to acknowledge a couple of key papers which inspired this post, and from which I've liberally stolen:
Why are we replicating?
You may think I have already answered this; replication allows us to get close to the returns of an index more cheaply and with lower minimum ticket size than if we invested in the underlying managers. But we need to take a step back: why do we want the returns of the <insert name of CTA index> index?
For many institutional allocators of capital the goal is indeed closely matching and yet beating the returns of a (relatively) arbitrary benchmark. In which case replication is probably a good thing.
If on the other hand you want to get exposure to some latent trend following (and carry, and ...) return factors that you believe are profitable and/or diversifying then other options are equally valid, including investing in a selected number of managers, or doing DIY trend following (and carry, and ...). In both cases you will end up with a lower correlation to the index than with replication, but frankly you probably don't care.
And of course for retail investors where direct manager investment (in a single manager, let alone multiple managers) and DIY trend following aren't possible (both requiring $100k or more) then a half decent and chearp ETF that gives you that exposure is the only option. Note such a fund wouldn't neccessarily need to do any replication - it could just consist of a set of simple CTA type strategies run on a limited universe of futures and that's probably just fine.
(There is another debate about how wide that universe of futures should be, which I have also discussed in recent TTU episodes and for which
this article is an interesting viewpoint).
For now let's assume we care deeply, deeply, about getting the returns of the index and that replication is hence the way to go.
What exactly are we replicating?
In a very abstract way, we think of there being C_0....C_N CTA managers in an index. For example in the SG CTA index there are 20 managers, whilst in the BTOP50 index there are... you can probably guess. No, not 50, it's currently 20. The 50 refers to the fact it's trying to capture at least 50% of the investable universe.
In theory the managers could be weighted in various ways (AUM, vol, number of Phds in the front office...) but both of these major indices are equally weighted. It doesn't actually matter what the weighting is for our purposes today.
Each manager trades in X underlying assets with returns R_0.....R_X. At any given time they will have positions in each of these assets, P_c_x (so for manager 0, P_0_0.... P_0_X, for manager 1, P_1_0...P_1_X and in total there will be X*N positions at each time interval). Not every manager has to trade every asset, so many of these positions could be persistently zero.
If we sum positions up across managers for each underlying asset, then there will be a 'index level' position in each underlying asset P_0.... P_X. If we knew that position and were able to know instantly when it was changing, we could perfectly track the index ignoring fees and costs. In practice, we're going to do a bit better than the index in terms of performance as we will get some execution cost netting effects (where managers trade against each other we can net those off), and we're not paying fees.
Note that not paying performance fees on each manager (the 20 part of '2&20') will obviously improve our returns, but it will also lower our correlation with the index. Management fee savings however will just go straight to our bottom line without reducing correlation. There will be additional noise from things like how we invest our spare margin in different currencies, but this should be tiny. All this means that even in the world of perfectly observable positions we will never quite get to a correlation of 1 with the index.
But we do not know those positions! Instead, we can only observe the returns that the index level positions produce. We have to infer what the positions are from the returns.
The curse of dimensionality and non stationarity, top down version
How can we do this inference? Well we're finance people, so the first thing we would probably reach for is a regression (it doesn't have to be a regression, and no doubt younger people reading this blog would prefer something a bit more modern, but the advantage of a regression is it's very easy to understand it's flaws and problems unlike some black box ML technique and thus illustrate what's going wrong here).
On the left hand side of the regression is the single y variable we are trying to predict - the returns of the index. On the right hand side we have the returns of all the possible instruments we know our managers are trading. This will probably run into the hundreds, but the maximum used for top down replication is typically 50 which should capture the lions share of the positions held. The regressed 'beta' coefficients on each of these returns will be the positions that we're going to hold in each instrument in our replicating portfolio: P_0... P_X.
Is this regression even possible? Well, as a rule you want to have lots more data points than you do coefficients to estimate. Let's call the ratio between these the Data Ratio. It isn't called that! But it's as good a name as any. There is a rule of thumb that you should have at least 10x the number of variables in data points. I've been unable to find a source for who invented this rule, so let's call it The Rule Of Thumb.
There are over 3800 data points available for the BTOP50 - 14 years of daily returns, so having say 50 coefficients to estimate gives us a ratio of over 70. So we are all good.
Note - We don't estimate an intercept as we want to do this replication without help or hindrance from a systematic return bias.
In fact we are not good at all- we have a very big problem, which is that the correct betas will change every day as the positions held change every day. In theory then that means we will have to estimate 200 variables with just one piece of data - todays daily return. That's a ratio of 0.005x; well below 10!
Note - we may also have the returns for each individual manager in the index, but a moments thought
will tell you that this is not actually helpful as it just means we will have twenty regressions to do, each with exactly the same dimensionality problem.
We can get round this. One good thing is that these CTAs aren't trading that quickly, so the position weights we should use today are probably pretty similar to yesterdays. So we can use more than one day of returns to estimate the correct current weights. The general approach in top down replication is to use rolling windows in the 20 to 40 day range.
We now have a ratio of 40 datapoints: 50 coefficients - which is still less than ten.
To solve this problem we must reduce the number of betas we're trying to estimate by reducing the number of instruments in our replacing portfolio. This can be done by picking a set of reasonably liquid and uncorrelated instruments (say 10 or 15) to the point where we can actually estimate enough position weights to somewhat replicate the portfolio.
However with 40 days of observations we need to have just four instruments to meet our rule of thumb. It would be hard to find a fixed group of four instruments that suffice to do a good job of replicating a trend index that actually has hundreds of instruments underlying it.
To deal withs problem, we can use some fancy econometrics. With regularisation techniques like LASSO or ridge regression; or stepwise regressions, we can reduce the effective number of coefficients we have to estimate. We would effectively be estimating a small number of coefficients, but they would be the coefficients of four different instruments over time (yes this is a hand waving sentence) which give us the best current fit.
Note that there is a clear trade off here between the choice of lookback window, and the number of coefficients estimated (eithier as an explicit fixed market choice, dynamically through stepwise regression, or in an implicit way through regularisation):
- Very short windows will worsen the curse of dimensionality. Longer windows won't be reactive enough to position changes.
- A smaller set of markets means a better fit, and means we can be more reactive to changes in positions held by the underlying markets, but it also means we're going to do a poorer job of replicating the index.
Introducing strategies and return factors
At this point if we were top down replicators, we would get our dataset and start running regressions. But instead we're going to pause and think a bit more deeply. We actually have additional information about our CTA managers - we know they are CTA managers! And we know that they are likely to do stuff like trend following, as well as other things like carry and no doubt lots of other exotic things.
That information can be used to improve the top down regression. For example, we know that CTA managers probably do vol scaling of positions. Therefore, we can regress against the vol scaled returns of the underlying markets rather than the raw returns. That will have the benefit of making the betas more stable over time, as well as making the Betas comparable and thus more intuitive when interpreting the results.
But we can also use this information to tip the top down idea on it's head. Recall:
Each manager trades in X underlying assets with returns R_0.....R_X. At any given time they will have positions in each of these assets, P_c_x (so for manager 0, P_0_0.... P_0_X, for manager 1, P_1_0...P_1_X so there will be X*N positions at each time interval).
Now instead we consider the following:
Each manager trades in Y underlying strategies with returns r_0.....r_Y. At any given time they will have weights in each of these strategies, w_c_y (so for manager 0, w_0_0.... w_0_Y, for manager 1, w_1_0...w_1_Y so there will be Y*N positions at each time interval).
Why is this good? Well because strategy weights, unlike positions, are likely to be much more stable. I barely change my strategy weights. Most CTAs probably do regular refits, but even if they do then the weights they are using now will be very similar to those used a year ago. Instead of a 40 day window, it wouldn't be unreasonable to use a window length that could be measured in years: thousands of days. This considerably improves the curse of dimensionality problem.
Some simple tables
For a given number of X instruments, and a given number of Y strategies, Z for each instrument:
Top down Bottom up
Approx optimal window size 40 days 2000 days
Number of coefficients X X*Z
Data ratio 40 / X 2000 / X*Z
Therefore as long as Z is less than 50 the data ratio of the bottom up strategy will be superior. For example, with some real numbers - 20 markets and 5 strategies per market:
Top down Bottom up
Approx optimal window size 40 days 2000 days
Number of coefficients 20 100
Data ratio 2 20
Alternatively, we could calculate the effective number of coefficients we could estimate to get a data ratio of 10 (eithier as a fixed group, or implicit via regularisation):
Top down Bottom up
Approx optimal window size 40 days 2000 days
Data ratio 10 10
Number of coefficients 4 20
It's clear that with bottom up replication we should get a better match as we can smuggle in many more coefficients, regardless of how fancy our replication is.
A very small number of caveats
There are some "but..."'s, and some "hang on a moment's" though. We potentially have a much larger number of strategies than instruments, given that we probably use more than one strategy on each instrument. Two trend following speeds plus one carry strategy is probably a minimum; tripling the number of coefficients we have to estimate. It could be many more times that.
There are ways round this - the same ways we would use to get round the 'too many instruments' problem we had before. And ultimately the benefit from allowing a much longer window length is significantly greater than the increase in potential coefficients from multiple strategies per instrument. Even if we ended up with thousands of potential coefficients, we'd still end up selecting more of them than we would with top down replication.
A perhaps unanswerable 'but...' is that we don't know for sure which strategies are being used by the various managers, whereas we almost certainly know all the possible underlying instruments they are trading. For basic trend following that's not a problem; it doesn't really matter how you do trend following you end up with much the same return stream. But it's problematic for managers doing other things.
A sidebar on latent factors
Now one thing I have noticed in my research is that asset class trends seem to explain most of instrument trend following returns (see my
latest book for details). To put it another way, if you trend follow a global equity index you capture much of the p&l from trend following the individual constituents. In a handwaving way, this is an example of a
latent return factor. Latent factors are the reason why both top down and bottom up replication work as well as they do so it's worth understanding them.
The idea is that there are these big and unobservable latent factors that drive returns (and risk), and individual market returns are just manifestations of those. So there is the equity return factor for example, and also a bond one. A standard way of working out what these factors are is to do a decomposition of the covariance matrix and find out what the principal components are. The first few PC will often explain most of the returns. The factor loadings are relatively static and slow moving; the S&P 500 is usually going to have a big weight in the equity return factor.
Taking this idea a step further, there could also be 'alternative' return factors; like the trend following factor or carry factor (or back in equity land, value and quality). These have dynamic loadings versus the underyling instruments; sometimes the trend following factor will be long S&P 500 and sometimes short. This dynamic loading is what makes top down replication difficult.
Bottom up regression reverses this process and begins with some known factors; eg the returns from trend following the S&P 500 at some speed with a given moving average crossover, and then tries to work out the loading on those factors for a given asset - in this case the CTA index.
Note that this also suggests some interesting research ideas such as using factor decomposition to reduce the number of instruments or strategies required to do top down or bottom up replication, but that is for another day.
If factors didn't exist and all returns were idiosyncratic both types of replication would be harder; the fact they do seem to exist makes replication a lot easier as it reduces the number of coefficients required to do a good job.
Setup of an empirical battle royale
Let's do a face off then of the two methodologies. The key thing here isn't to reproduce the excellent work done by others (see the referenced papers for examples), or neccessarily to find the best possible way of doing eithier kind of replication, but to understand better how the curse of dimensionality affects each of them.
My choice of index is the BTOP50, purely because daily returns are still available for free download. My set of instruments will be the 102 I used in my
recent book 'AFTS' (actually 103, but Eurodollar is no longer trading) which represent a good spread of liquid futures instruments across all the major asset classes.
I am slightly concerned about using daily returns, because the index snapshot time is likely to be different from the closing futures price times I am using. This could lead to lookahead bias, although that is easily dealt with by introducing a conservative two day lag in betas as others have done. However it could also make the results worse since a systematic mismatch will lower the correlation between the index returns and underyling instrument returns (and thus also the strategy returns in a bottom up replication). To avoid this I also tested a version using two day returns but it did not affect the results.
For the top down replication I will use six different window sizes from 8 business days up to 256 (about a year) with all the powers of 2 in between. These window sizes exceed the range typically used in this application, deliberately because I want to illustrate the tradeoffs involved. For bottom up replication I will use eight window sizes from 32 business days up to 4096 (about sixteen years, although in practice we only have 14 years of data for the BTP50 so this means using all the available data).
We will do our regressions every day, and then use an exponential smooth on the resulting coefficients with a span equal to twice the window size. For better intuition, a 16 day exponential span such as we would use with an 8 day window size has a halife of around 5.5 days. The maximum smooth I use is a span of 256 days.
For bottom up replication, I will use seven strategies: three trend following EWMA4,16, EWMAC16,64, EWMAC64,256 and a carry strategy (carry60); plus some additional strategies: acceleration32, mrinasset1000, and skewabs180. For details of what these involve, please see AFTS or various blogposts; suffice to say they can be qualitiatively described as fast, medium and slow trend following, carry, acceleration (change in momentum), mean reversion, fast momentum and skew respectively. Note that in the Resolve paper they use 13 strategies for each instrument, but these are all trend following over different speeds and are likely to be highly correlated (which is bad for regression, and also not helpful for replication).
I will use a limited set of 15 instruments, the same as those used in the Newfound paper, which gives me 15*7 = 105 coefficients to estimate - roughly the same as in the top down replication.
I'm going to use my standard continous forecasting method just because that is the code I have to hand; the Resolve paper does various kinds of sensitivity analysis and concludes that both binary and continous produce similar results (with a large enough universe of instruments, it doesn't matter so much exactly how you do the CTA thing).
Note - it could make sense to force the coefficients on bottom up replication to be positive, however we don't know for sure if a majority of CTAs are using some of these strategies in reverse, in particular the divergent non trend following strategies.
Approx data ratios with different window sizes if all ~100 coefficients estimated:
16 days 0.16
32 days 0.32
64 days 0.64
128 days 1.28
256 days 2.56
512 days 5.12
1024 days 10.2
2048 days 20.5
4096 days 41.0
In both cases I need a way to reduce the number of regressors on the right hand side from somewhere just over 100 to something more reasonable. This will clearly be very important with an 8 day window!
Various fancy techniques are commonly used for this including LASSO and ridge regression. There is a nice summary of the pros and cons of these in an appendix of the
Resolve paper; one implication being that the right technique will depend on whether we are doing bottom up or top down replication. They also talk about elastic net, a technique that combines both of these techniques. For simplicity I use LASSO, as there is only one hyperparameter to fit (penalty size).
Here are the correlation figures for the two methods with different lookback windows:
As you can see, the best lookback for the top down method needs to be quite short to capture changing positions. Since strategy weights are more stable, we can use a longer lookback for the bottom up method. For any reasonable length of lookback the correlation produced by the top down method is pretty stable, and significantly better than the bottom up method.
Footnote: Why not do both?
One of the major contributions of the Resolve paper is the idea of combining both top down and bottom up methods. We can see why this make sense. Although bottom up is superior as it causes less dimensionality issues, it does suffer because there might be some extra 'secret sauce' that our bottom up models don't capture. By including the top down element as well we can possibly fill this gap.
Footnote on 'Creating a CTA from scratch'
You may have seen some bottom up 'replication' articles that don't use any regression, such as
this one. They just put together a set of simple strategies with some sensible weights and then do an ex-post cursory check on correlation with the index. The result, without trying, is a daily correlation of 0.6 with the SG CTA index, in line with the best bottom up results above without any of the work or the risks involved with doing potentially unstable regressions on small amounts of data. Indeed, my own trading strategies (monthly) correlation with the SG CTA index was 0.8 last time I checked. I have certainly done no regressions to get that that!
As I mentioned above, if you are a retail investor or an institutional investor who is not obsessed with benchmarking, then this might be the way to go. There is then no limit on the number of markets and strategies you can include.
Conclusion
I guess my conclusion comes back to why... why are we doing this.
If we really want to replicate the index then we should be agnostic about methodology and go with what is best. This will involve mostly bottom up with a longish window for the reasons discussed above, although it can probably be improved by including an averaging with top down.
But if we are trying to get 'exposure to some trend following factors' without caring about the index then I would probably start with the bottom up components of simple strategies on a diversified set of instruments with sensible but dumb 'no-information' weights that probably use some correlation information but not much else (see all the many posts I have done on portfolio optimisation). Basically the 'CTA from scratch' idea.
And then it might make sense to move in the direction of trying to do a bottom up replication of the index if you did decide to reduce your tracking error, though I'd probably use a robust regression to avoid pulling the strategy weights too far from the dumb weights.
Hi Rob,
ReplyDeleteThis is a very interesting take on the replication question. I just wanted to point out that I think you might have a typo in the Results section, where I think you meant to say that
"the best lookback for the TOP DOWN method needs to be quite short to capture changing positions".
I do love however that in this case the best solution to the replication question is simply the simplest solution.
Fixed thank you
ReplyDeleteNice post Rob. Glad to see that your blog lives on!
ReplyDeleteThe whole replication bother seems like a bit of a dead end. It's a poor solution to the problem which is simply that CTAs charge too much.
If/when a CTA with a solid track record gets together with a big ETF provider and they put together an ETF trading 40+ markets with sub 1% fees, replication will be forgotten.
The average pair-wise correlation of most CTAs is around 60%, so one could pick a CTA at random and do almost as well. Feels like much tail-chasing for very little deviation from H0...
ReplyDelete