This will be the first in a series of posts about portfolio optimisation. Main reason being I'm planning to write a book about backtesting, and that will include a big chunk of material on optimisation. Yes, I know, my latest book isn't out yet (it's out in December - in time for Christmas). But this backtesting book is going to be quite deep (and probably long!) so I need to start researching now if it's going to be written any time soon. Today's post is not that deep, and is quite short. It has literally been written whilst waiting for the rather extensive testing of the second post to finish. Anyway, let's begin.
One of the issues when fitting is to decide how to structure and order the process. In very abstract terms, a component of a trading strategy will consist of a forecast to predict the price of an instrument. A forecast might be something like momentum16,64 - that's the exponentially weighted moving average crossover with spans of 16 and 64 days to you my good sir or madam. An instrument is something like the US 10 year bond future. We can represent all these options in a grid like so:
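To make the forecast idea concrete, here is a minimal sketch of the raw crossover behind something like momentum16,64: a fast exponentially weighted moving average minus a slow one. The function name `ewmac_raw` and the price series are made up for illustration, and a production version would normally also adjust for volatility and scale the result, which this sketch skips.

```python
import numpy as np
import pandas as pd


def ewmac_raw(price: pd.Series, fast_span: int = 16, slow_span: int = 64) -> pd.Series:
    """Fast EWMA of price minus slow EWMA of price.

    Positive values mean the fast average is above the slow one (uptrend).
    This is the raw crossover only: no volatility adjustment, no scaling.
    """
    fast = price.ewm(span=fast_span, adjust=False).mean()
    slow = price.ewm(span=slow_span, adjust=False).mean()
    return fast - slow


# Made-up, steadily rising price series, purely for illustration
dates = pd.bdate_range("2023-01-02", periods=300)
price = pd.Series(np.linspace(100.0, 130.0, 300), index=dates)

forecast = ewmac_raw(price)  # hypothetical stand-in for momentum16,64
```

On a steadily rising price the fast average sits above the slow one, so the raw forecast is positive after the first day.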
| | momentum16,64 | momentum4,16 | carry10 |
|---|---|---|---|
| US10 | X | X | X |
| SP500 | X | X | X |
| US5 | X | X | X |
.... where each 'X' is a place on the grid. If those were white squares, and if some forecasts were missing from certain instruments and shown as black squares, then we'd have a crossword grid. Yes that's all I've got. Quite a weak link. Apologies. You think it's easy coming up with catchy blog titles?
And note that this is a tiny subset of the full grid. Instead of these 9 possibilities my full trading system currently has 10,373 options. That's 40 trading rules across 260 different instruments. Some of those instruments are duplicated (e.g. SP500 mini and micro), and some are no longer traded; but that still leaves 204 instruments and over 8,100 options.
Anyway, it should be obvious that in doing our fitting we have a few options:
- A joint fit where we fit everything in one go. 8,100 options. In one go. Let that sink in.
- A natural clustering where we cluster together things that are correlated.
- A down and then across structured clustering where we first fit within rules - so for example working out what the best blend of US10, SP500 and US5 is within the carry10 rule - and then across rules - so estimating the best blend of carry10, momentum16,64 and momentum4,16.
- An across and then down structured clustering where we first fit within instruments - so for example working out what the best blend of carry10, momentum16,64 and momentum4,16 is within US10 - and then across instruments - so estimating the best blend of US10, SP500 and US5.
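As a rough illustration of the last option, here is a sketch of an across and then down fit on the 3x3 example grid. Everything in it is an assumption made for illustration: the returns are random numbers, the column layout is a hypothetical `(instrument, rule)` MultiIndex, and `fit_weights` is a placeholder inverse-volatility weighting standing in for whatever optimiser you'd really use.

```python
import numpy as np
import pandas as pd


def fit_weights(returns: pd.DataFrame) -> pd.Series:
    """Placeholder optimiser (an assumption): inverse-volatility weights summing to 1."""
    inv_vol = 1.0 / returns.std()
    return inv_vol / inv_vol.sum()


def across_then_down(returns: pd.DataFrame) -> pd.Series:
    """Fit across rules within each instrument, then across instruments.

    `returns` must have MultiIndex columns with levels (instrument, rule).
    """
    rule_weights = {}
    instrument_returns = {}
    for instrument in returns.columns.get_level_values("instrument").unique():
        sub = returns[instrument]  # columns are now the rules for this instrument
        w = fit_weights(sub)
        rule_weights[instrument] = w
        # Blend the rules into one return stream for this instrument
        instrument_returns[instrument] = sub.mul(w, axis=1).sum(axis=1)

    # Second stage: weights across the blended instrument return streams
    instrument_weights = fit_weights(pd.DataFrame(instrument_returns))

    # Final weight for each grid cell = instrument weight x rule weight
    final = {
        (instrument, rule): instrument_weights[instrument] * w[rule]
        for instrument, w in rule_weights.items()
        for rule in w.index
    }
    return pd.Series(final)


# Random weekly returns for the example grid, purely for illustration
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_product(
    [["US10", "SP500", "US5"], ["momentum16,64", "momentum4,16", "carry10"]],
    names=["instrument", "rule"],
)
returns = pd.DataFrame(rng.normal(size=(520, 9)), columns=columns)

weights = across_then_down(returns)
```

Down and then across is the same idea with the two levels swapped: group by rule first, then fit across the blended rule return streams.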
Before I begin, I did decide to limit the analysis to the last 10 years. It's kind of slow just calculating a correlation matrix from 52 years of data, and many instruments don't have data except for the last 10 years anyway. I also resampled the returns to a weekly frequency. This is what I do when optimising anyway. Unless you're trading very quickly, this won't affect correlations much. That leaves us with 520 weeks, or rows in a dataframe (I know that isn't exactly 10 years. Let me check to see if I give a toss. No, I don't), with over 8,100 columns, remember. That's about the limit as to what my laptop can calculate a correlation for; and it's quite a painful process to cluster these bad boys as well.
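For anyone wanting to try this at home, here is a small sketch of that preparation on made-up data: resample daily returns to weekly, keep the last 520 weeks, calculate the correlation matrix, and cluster it. Several things here are my assumptions for illustration rather than a description of the actual code: weekly returns are approximated by summing daily returns, and the clustering uses scipy's average-linkage hierarchical clustering on a distance of one minus correlation. The function names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def weekly_correlation(daily_returns: pd.DataFrame, weeks: int = 520) -> pd.DataFrame:
    """Resample daily returns to weekly (by summing) and correlate the last `weeks` rows."""
    weekly = daily_returns.resample("W").sum()
    return weekly.iloc[-weeks:].corr()


def cluster_labels(corr: pd.DataFrame, n_clusters: int) -> pd.Series:
    """Label each column with a cluster, using distance = 1 - correlation."""
    distance = 1.0 - corr.to_numpy()
    np.fill_diagonal(distance, 0.0)  # remove floating point noise on the diagonal
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return pd.Series(labels, index=corr.index)


# Made-up daily returns: two independent drivers, two noisy copies of each
rng = np.random.default_rng(1)
dates = pd.bdate_range("2013-01-01", periods=2870)  # roughly 11 years of business days
driver_a = rng.normal(size=len(dates))
driver_b = rng.normal(size=len(dates))
daily = pd.DataFrame(
    {
        "a1": driver_a + 0.1 * rng.normal(size=len(dates)),
        "a2": driver_a + 0.1 * rng.normal(size=len(dates)),
        "b1": driver_b + 0.1 * rng.normal(size=len(dates)),
        "b2": driver_b + 0.1 * rng.normal(size=len(dates)),
    },
    index=dates,
)

corr = weekly_correlation(daily)
labels = cluster_labels(corr, n_clusters=2)
```

With four columns this is instant. With over 8,100 columns the correlation matrix alone has around 66 million entries, which is why the real thing is painful on a laptop.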
Anyway let's begin. I've come up with quite a fun way to visualise these clusters, which you can see here for the first plot, with two clusters:
OK as you can see each cluster has a subplot. Each subplot has two stacked bars. The lefthand side shows the composition by asset class. The righthand side shows the composition by trading rule. You can just about make out from the legend what the various colours mean. Note these aren't portfolio weights, and just reflect the number of instrument/rule combos in each category.
Here is the three cluster plot. All three clusters look very similar for both instruments and rules, again suggesting there isn't much going on here yet on either axis.
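The numbers behind each pair of stacked bars are just counts, and could be put together along these lines. This is a sketch on a toy example: the cluster assignments and the instrument to asset class mapping are invented for illustration, and it only builds the count tables rather than drawing anything.

```python
import pandas as pd

# Toy cluster assignments for six instrument/rule combos (invented for illustration)
combos = pd.DataFrame(
    {
        "cluster": [1, 1, 1, 2, 2, 2],
        "instrument": ["US10", "US5", "SP500", "US10", "US5", "SP500"],
        "rule": [
            "momentum16,64",
            "momentum16,64",
            "carry10",
            "carry10",
            "carry10",
            "momentum4,16",
        ],
    }
)

# Hypothetical mapping from instrument to asset class
asset_class = {"US10": "Bonds", "US5": "Bonds", "SP500": "Equity"}
combos["asset_class"] = combos["instrument"].map(asset_class)

# One row per cluster; each row holds the segment sizes for one stacked bar
by_asset_class = pd.crosstab(combos["cluster"], combos["asset_class"])
by_rule = pd.crosstab(combos["cluster"], combos["rule"])
```

Each row of `by_asset_class` is the lefthand bar for a cluster and each row of `by_rule` is the righthand bar. If every cluster's row looks much the same, the clustering hasn't split things along either axis.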
I look forward to anyone who can give me a coherent story as to why those things are lumped together.
Personally I'm taking the absence of any contradictory evidence as evidence that I should continue to do what I've done before: fit across and then down. Doing some kind of all group clustering or all in one fit, or doing down and then across: none of these seems to offer clear advantages. So why not stick with a simple thing that works?
Arguably this has been a waste of time, but the good news is I can recycle this code to visualise forecast weights across a strategy so that's something....







