This Blog is Systematic: To Cluster Or Not To Cluster That is the Question...

This is the sixth (!) post in a series I'm writing on portfolio optimisation. A quick reminder of the story so far:

In the first post I showed that if you are optimising across forecasts from different trading rules and instruments, the rules within an instrument naturally cluster together, suggesting you should fit first within, and then across, instruments. Luckily, this is what I've always done.
In my second post I ran some optimisation experiments with random data. The results showed a supreme indifference between joint winners: Monte Carlo and bootstrapping, and a shrinkage methodology with a tiny bit of SR shrinkage. Using a more conservative viewpoint didn't affect the results much.
... before moving into the world of real data for post three, where I showed that the predictability of the sampling distribution of parameter estimates was much worse with real data than with random data.
Then in post number four I reran my experiments, this time with real data. Unsurprisingly I found that the previous winners did badly. Instead a middle ground of shrinkage was the winner across most time periods, unless the in sample period was too short.
Post number five attempted a spin on #4 by averaging out the weights produced by different shrinkage levels. It was an abject failure. We shall never speak of it again.

Importantly, all these experiments were done with a relatively small number of assets: nine. Bear in mind that when I fit forecast weights within instruments I have 40 assets; and when fitting instrument weights I have over 200.

Now you will remember that my previous favourite fitting method involves hierarchical clustering. I gave this the grand name of handcrafting, and in its simplest form it doesn't use SR at all and just clusters by correlations or by some user assigned labels (e.g. asset class).
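To make the label-based variant concrete, here is a minimal sketch. It assumes equal weights across clusters and equal weights within each cluster, which is a simplification of handcrafting and not the full method. The function name and the rule names are hypothetical, purely for illustration.

```python
from collections import defaultdict


def label_cluster_weights(labels):
    """Equal weight across clusters, then equal weight within each cluster.

    labels maps asset name -> user assigned cluster label.
    ASSUMPTION: equal weighting at both levels; no SR or correlation is used.
    """
    groups = defaultdict(list)
    for asset, label in labels.items():
        groups[label].append(asset)

    cluster_weight = 1.0 / len(groups)
    weights = {}
    for members in groups.values():
        for asset in members:
            # Each cluster's share is split evenly between its members
            weights[asset] = cluster_weight / len(members)
    return weights


# Hypothetical trading rule names and labels
example_labels = {
    "ewmac_fast": "trend",
    "ewmac_slow": "trend",
    "breakout": "trend",
    "carry": "carry",
}
example_weights = label_cluster_weights(example_labels)
```

In this toy example the single carry rule ends up with half the portfolio, and the three trend rules share the other half. That is the point of clustering: a crowded group doesn't dominate just because it has more members.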

There is plenty of empirical and theoretical evidence elsewhere as to why this makes sense for larger portfolios, but as I have the code and framework to do so, I thought why not battle test for myself; as part of this (long) exercise in checking that all my portfolio optimisation assumptions are correct. The question we want to answer then is what is the optimal number of clusters as a proportion of the total portfolio size? And if it's 1, then we don't cluster.

As before I can also vary the amount of available in sample data, and data used for out of sample evaluation. And this should be a joint test with the amount of shrinkage required. With clusters, we might expect less shrinkage to be required, since that is building in some robustness already. However in the interests of time, I won't be using Monte Carlo or Bootstrapping (doing that for each cluster in turn would be .... very..... slow..... indeed).

As before I will be doing this exercise using forecast weights (for which I generally have 40 assets). I can do some random selection, alternating between 20, 30 and 40 assets; randomly choosing instruments to do this with.

Listing all the permutations:

Select 1, 5 or 10 years of in sample data
Select 1 or 5 years of out of sample data
Pick a random instrument, ensuring there is enough history available (between 2 and 15 years). We will only choose from instruments with sufficient history for the time required.
Randomly pick N=20, 30 or 40 assets from those available (if 40, just use them all)

Then for a given dataset of in and out of sample:

Select correlation shrinkage (from this list of options: 0, 0.7, 0.75, 0.8) and SR shrinkage (from this list: 0.0, 0.25, 0.5, 0.75, 1.0)
Select the number of clusters from 1 (no clustering), 2, 3, 4, 6, 8, 10.
Run in sample optimisation and out of sample optimisation on all the options above.
Repeat a few hundred times (it's quite slow!).
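The permutations above can be sketched roughly as follows. This is not my actual experiment code: the constant and function names are made up for illustration, and the step of picking an instrument and its rules from real data is left out.

```python
import itertools
import random

# Values taken from the lists above; the names are hypothetical
IN_SAMPLE_YEARS = [1, 5, 10]
OUT_OF_SAMPLE_YEARS = [1, 5]
ASSET_COUNTS = [20, 30, 40]
CORR_SHRINKAGE = [0.0, 0.7, 0.75, 0.8]
SR_SHRINKAGE = [0.0, 0.25, 0.5, 0.75, 1.0]
CLUSTER_COUNTS = [1, 2, 3, 4, 6, 8, 10]  # 1 means no clustering


def draw_dataset_spec(rng):
    """Randomly pick the shape of one in sample / out of sample dataset."""
    in_years = rng.choice(IN_SAMPLE_YEARS)
    out_years = rng.choice(OUT_OF_SAMPLE_YEARS)
    return {
        "in_sample_years": in_years,
        "out_of_sample_years": out_years,
        # The chosen instrument needs at least this much history
        "required_history_years": in_years + out_years,
        "n_assets": rng.choice(ASSET_COUNTS),
    }


def parameter_grid():
    """Every (correlation shrinkage, SR shrinkage, clusters) combination."""
    return list(itertools.product(CORR_SHRINKAGE, SR_SHRINKAGE, CLUSTER_COUNTS))


grid = parameter_grid()
spec = draw_dataset_spec(random.Random(0))
```

That is 4 x 5 x 7 = 140 optimisations for every dataset drawn, which goes some way to explaining why this is slow.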

As I have done before I will also be checking speed - obviously adding clusters will increase optimisation time; firstly because the clustering and aggregation process itself adds time, but mainly because we will end up doing more optimisations. For example, for N=20 doing ten clusters would require 11 optimisations, one for each cluster, and one across clusters. Whilst the individual optimisations on smaller portfolios might be slightly faster due to faster convergence, this is still probably going to take longer than a single optimisation.
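The arithmetic on the number of optimisations is trivial, but for completeness here is a sketch. The function name is hypothetical, and it assumes a single level of clustering (one optimisation inside each cluster, plus one across clusters).

```python
def optimisations_required(n_clusters):
    """Number of optimisations for a single level of clustering.

    ASSUMPTION: one optimisation inside each cluster, plus one across clusters.
    With one cluster (no clustering) there is just a single optimisation.
    """
    if n_clusters <= 1:
        return 1
    return n_clusters + 1


# Number of optimisations for each cluster count tested in this post
counts = {k: optimisations_required(k) for k in [1, 2, 3, 4, 6, 8, 10]}
```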

The time taken by the optimisations varied between 28 seconds and 294 seconds. Longer in sample time means slightly slower estimation time, more assets means slower convergence, less shrinkage also means slower convergence and, as discussed just now, smaller clusters also slow things down.

Since we're trying to establish here how much clustering to do, the only thing to note is that clustering imposes a speed penalty.

TLDR: This is a very long post due to the exhaustive number of combinations. There are numerous pretty plots to look at. But if you get bored and skip to the end to see the results, I won't judge you. Honest.

Results - forecast weights

20 assets - one year in sample, one year out of sample

In the previous couple of posts I showed the results as a matrix with different levels of shrinkage, for different time periods. Here it's a bit trickier, since we're considering the performance on 3 axes: two shrinkage, plus cluster size (where 1 is no clustering). But my screen only has two axes. Hence - in the words of many people's relationship status on facebook circa 2010 - it's complicated.

This is a heatmap and the right hand rectangle is the key. The left hand is the data. The x-axis is the amount of correlation shrinkage. Apologies for the bunched up numbers: the values are 0,.7,.75,.8. Zero is there for fun; the other values are approximately. On the y-axis are the SR shrinkage and the number of clusters. So 1.0,8 is full shrinkage and 8 clusters. The colours are the median SR for each out of sample optimisation except that I've done what I did before in post four. I found the optimum value (zero correlation shrinkage, 0.75 SR shrinkage, cluster size of 5). I then did a t-test to compare that optimum SR value against all the other values. Where that test failed at a critical value of 0.1; in other words when the relevant value isn't significantly different from the optimum; I coloured in the square with the same colour SR value as the optimum.
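For anyone who prefers code to words, the colouring logic can be sketched like this. It is only an illustration: to stay within the standard library it uses a normal approximation to a Welch style t-test rather than an exact t-test, the function names are hypothetical, and the numbers are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, median, variance


def approx_p_value(sample_a, sample_b):
    """Two sided p-value for a difference in means.

    ASSUMPTION: normal approximation to a Welch style t-test, which is
    reasonable with a few hundred samples but is not an exact t-test.
    """
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    if se == 0:
        return 1.0 if mean(sample_a) == mean(sample_b) else 0.0
    t_stat = (mean(sample_a) - mean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


def masked_heatmap_values(results, critical_value=0.1):
    """results maps (corr shrinkage, SR shrinkage, clusters) -> list of out of sample SR.

    Cells that are not significantly different from the best cell are given
    the best cell's median SR, so they end up the same colour.
    """
    medians = {key: median(srs) for key, srs in results.items()}
    best_key = max(medians, key=medians.get)
    masked = {}
    for key, srs in results.items():
        p_value = approx_p_value(results[best_key], srs)
        masked[key] = medians[best_key] if p_value > critical_value else medians[key]
    return best_key, masked


# Made up SR figures for three cells of the heatmap
example_results = {
    (0.0, 0.75, 4): [0.9, 1.0, 1.1, 1.0, 0.95, 1.05],
    (0.7, 0.75, 4): [0.85, 0.95, 1.0, 0.9, 1.0, 1.05],
    (0.0, 1.0, 1): [0.1, 0.2, 0.15, 0.1, 0.2, 0.15],
}
best_key, masked = masked_heatmap_values(example_results)
```

Here the second cell is indistinguishable from the best one, so it gets the same colour; the third is clearly worse and keeps its own value.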

If that explanation doesn't make sense you should probably reread post #4.

The interpretation here is that with a few exceptions pretty much any value is fine as almost everything is one colour.

30 assets - one year in sample, one year out of sample

This is similar to 20 assets - nothing significant.

40 assets - one year in sample, one year out of sample

Here anything works, except:

Too much SR shrinkage (the coloured area at the bottom of the plot)
Not clustering

20 assets - one year in sample, five years out of sample

Again it's easier to say what doesn't work:

No correlation shrinkage
Too much SR shrinkage

The amount of clustering doesn't really influence the results.

30 assets - one year in sample, five years out of sample

Similar; perhaps a hint that smaller clusters underperform but just that.

40 assets - one year in sample, five years out of sample

Whilst there is more significance here, and a clear dislike for zero or full SR shrinkage, plus zero correlation shrinkage; again it does look like applying any degree of clustering is equally valid.

20 assets - five years in sample, one year out of sample

At this stage some people will be regretting their decision to print out my blogpost. In colour. For their sakes, I will only be reporting results where there is significance.

And for this combo, nothing is really significantly bad.

30 assets - five years in sample, one year out of sample

Nothing is really significantly bad.

40 assets - five years in sample, one year out of sample

Certainly a preference here for lower amounts of SR shrinkage; with weaker evidence that middling amounts of clustering work well.

20 assets - five years in sample, five years out of sample

Focusing purely on clusters; it looks like 2 clusters is the one to avoid here.

30 assets - five years in sample, five years out of sample

Nothing is really significantly bad.

40 assets - five years in sample, five years out of sample

20 assets - ten years in sample, one year out of sample

The whole plot is the same colour. Literally nothing to see here.

30 assets - ten years in sample, one year out of sample

40 assets - ten years in sample, one year out of sample

Again it looks like a modest amount of clustering, with N between 4 and 6, might be best.

20 assets - ten years in sample, five years out of sample

Looks like a case for larger cluster size...I think?

30 assets - ten years in sample, five years out of sample

Shrink the correlation and do some kind of clustering and you'll be fine mate.

40 assets - ten years in sample, five years out of sample

The final plot - and the one with the most significance. Shrinkage of about 0.75 on both and big clusters is the way to go here. More generally again it looks like middling cluster sizes are about right.

Summary

I think it is fair to say there aren't many definitive conclusions one can draw from that... experience. I would say however that there is some weak evidence that some level of clustering is better than none at all. And I would say there is even weaker evidence that you don't want your clusters to be too small, i.e. you don't want too many clusters.

As in the previous post I could test the effect of combining portfolio weights derived with different cluster sizes. But given that we have struggled to find much statistical significance here it seems unlikely we'd get much satisfaction.

When choosing a parameter in the face of no evidence there are two things I like - heuristic rules and powers of 2. Therefore let's say you should use 6 clusters when you're doing your thing. That's roughly equivalent to the number of distinct trading rules I have, and the number of asset classes when optimising for instrument weights. That isn't a power of 2, but 4 clusters seems a bit on the low side, and 8 a little high. When choosing a parameter in the face of no evidence there are three things I like - heuristic rules, powers of 2, and taking an average of potential values.

This Blog is Systematic

Thursday, 18 June 2026
