This Blog is Systematic: One of These Things (Is Not Like the Others). Or is it? Pooling rule p&l estimates across instruments.

This is the eighth post in a series I'm writing on portfolio optimisation. I haven't done one of these for a few posts, so here is the story so far:

In the first post I showed that if you are optimising across forecasts from different trading rules and instruments, then you should first fit within; and then across, instruments. As I do anyway.
In my second post I ran some experiments with optimising with random data. The results showed a supreme indifference between joint winners: monte carlo and bootstrapping, and a shrinkage methodology with a tiny bit of SR shrinkage.
For post three I showed that the predictability of the sampling distribution of parameter estimates was much worse with real data than with random data.
Number four saw me rerun post #2 with real data. A middle ground of some shrinkage was the winner across most time periods; unless the in sample period was too short.
Post number five was a twist on Bayesian shrinkage and an abject failure.
Number six was about clustering. It turns out clustering is good, though not staggeringly so. A cluster size of six was arbitrarily chosen as being reasonable enough.
The post whose number is seven was about structural breaks in estimates of SR. For forecast/instrument pairings these occur between 13% and half the time depending on the critical value used. However out of sample optimisation did not show a significant SR improvement if only history after a break was used for estimation.

This post has something in common with number seven. We know that we want more data for estimation. We can get more data with more frequent data (not helpful if you are trading slowly) or more history of data. For more history we only want history that is relevant. If the history occurred before a structural break then we should throw it away (at least in theory - that didn't work so well in practice).

There is another way to get more data and that is in the cross section. For example, we could combine the p&l for the trading rule momentum16 across all the US government bond futures, or the entire fixed income sector. Alternatively we could go the whole hog and combine all the p&l across every future (all this is ignoring costs, which are different for each instrument, and would have to be deducted ex-post this exercise).

A key question however when doing this is 'are these things similar enough?'. For example, should we use the same SR estimate for momentum16 across all instruments? Or can we say that a particular instrument 'just doesn't trend well'? Consider Cocoa, the stand out star of many trend following portfolios in 2024. Here is its performance using the momentum16 trading rule:

Is it wise to assume that a particular instrument is poor at trending, just because it has been historically (or at least from 1980 to 2020)?

Note: this isn't the same as assuming a particular instrument is poor - that decision will come when we look at instrument weights.

Some of you might remember this plot from my third book, Leveraged Trading:

That shows the SR for that same trading rule, momentum16, with error bars across the data set used in the book. The error bars suggest that there aren't any SR that are statistically different from each other, so we should pool everything.

If I recalculate these numbers using the 214 instruments in my current data set, excluding duplicates, then the estimated SR trading momentum16 varies across instruments between -1.85 and 1.95. That range is, to be fair, larger than in the plot above.

If we test the SR difference between the samples of the worst and best instrument (Lead-LME and House-US FWIW) we do get a significant t-statistic (around 4 to 5, depending on whether we only test on the overlapping period or do an independent test of all the data from both instruments). So they are significantly different. It's even more impressive when you consider there are only 3 or 5 years of data used to make this evaluation (again either the overlapping or the entire time period). The p-value is around 0.003%. Much less than the p-value we would crudely expect to see by fluke with 214 instruments (which would be a little below 0.5%).

It's also worth noting that this is quite a wide range of p&l for a trading rule: from -1.85 to 1.94 is a range of 3.8. For momentum64 it's from -1.0 to +1.35; and for carry10 it's from -0.86 to 1.46. The wider ranges of SR seem to be only present for the faster trend style rules; a range of 3.1 SR units for momentum4 (from -1.0 to +2.1 pre costs), 2.8 for breakout10, 3.3 for relmomentum10 and so on. Some of those more extreme figures are because some instruments don't have as much data (in the box plot above, they'd have wider boxes).

But could this just be luck? What kind of SR range would you expect across 214 random Gaussian p&l streams with an underlying expected SR of 0.19 and say 4 years of data? The answer is about 2.68; with a 90% range of 2.44 to 3.11. Another way of looking at a distribution is the cross sectional standard deviation: across 214 random instruments the standard deviation of SR estimates with 4 years of data is around 0.5 (and that is much more stable than the range).
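For anyone who wants to repeat the random data experiment, here is a minimal sketch. It assumes daily Gaussian returns, 256 business days a year and independence between instruments; none of those choices come from my actual code, and the function name is made up for illustration. The exact numbers will wobble with the seed and number of trials.

```python
import numpy as np

DAYS_PER_YEAR = 256  # assumption: business days used to annualise


def simulate_cross_section(n_instruments=214, years=4, true_sr=0.19,
                           n_trials=200, seed=0):
    """Simulate the cross sectional range and standard deviation of SR estimates.

    Every 'instrument' is an independent Gaussian daily return stream with the
    same true annualised SR, so any spread in the estimates is pure luck.
    """
    rng = np.random.default_rng(seed)
    n_days = years * DAYS_PER_YEAR
    daily_mean = true_sr / np.sqrt(DAYS_PER_YEAR)  # daily vol is fixed at 1.0
    ranges = np.empty(n_trials)
    stdevs = np.empty(n_trials)
    for trial in range(n_trials):
        daily = rng.normal(daily_mean, 1.0, size=(n_instruments, n_days))
        # annualised SR estimate for each instrument
        sr = np.sqrt(DAYS_PER_YEAR) * daily.mean(axis=1) / daily.std(axis=1, ddof=1)
        ranges[trial] = sr.max() - sr.min()
        stdevs[trial] = sr.std(ddof=1)
    return ranges, stdevs


if __name__ == "__main__":
    ranges, stdevs = simulate_cross_section()
    print("median SR range:", np.median(ranges))
    print("5% and 95% points of range:", np.percentile(ranges, [5, 95]))
    print("average cross sectional std:", stdevs.mean())
```

The standard deviation is the more useful statistic here, since the range depends entirely on the two most extreme draws.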

Now, if we subsample random 4 year periods for each instrument across the actual performance of momentum16 we get much bigger numbers. The SR range is around 3.88, and the standard deviation comes in around 0.59. So we can see that the real data has around a fifth more variability than what we'd expect purely from randomness.

Those four year cross sectional standard deviations come in at similar values for all trading rules; the lowest is 0.53 (momentum64) and almost all are between 0.5 and 0.6 with one standout exception: a value of 0.866 for mrinasset1000 (a slow cross sectional mean reversion within asset classes).

All this arseing around is just to confirm that there is indeed more cross sectional variability in SR estimates than luck would suggest. Which means it is possible that a full pooling approach (use all data to estimate everything) might be beatable by another more selective pooling approach.

Some selective pooling approaches

There are a number of ways we could approach this exercise:

We could cluster things together that have similarities, such as by asset class. I have done this in the past but it is slightly unsatisfactory to use a method which could have subjectivity and can't 'learn' from the data. Let's ignore that for now.
We could do it on an estimate by estimate basis. We could compare the distribution of returns for say carry10 on US10 year bonds, and on US2 year bonds; and say "Well these distributions aren't significantly different. Let's pool the returns together".
Or we could look at the estimates of SR across different rules. You could have a vector of the SR for carry10, carry20 and so on. And you'd look at that vector of estimates, and calculate <some measure of> distance between them, and if the distance is low enough, then you'd pool the returns for all the rules for those two instruments.
Or we could do it on portfolio weights. We could for example fit the weights for 2 year bonds, and for 10 year bonds, and then see if they were significantly different. We could then pool the returns if they were not that different. In fact I have looked at this before*.

* Note: we could also take an average of the weights, but that risks 'over robustness' if the weights were already produced using some kind of shrinking technology; i.e. you would end up averaging the weights too much.

An advantage of the second approach is we don't need to worry if instruments have different sets of trading rules available to them. Another is that it's obvious what significantly different means - we can just run plain vanilla t-tests like we did with the previous post on structural breaks at some critical value (CV). A disadvantage is that we need to do a lot more computation (one set for every rule separately); yet another is that because we are testing so many things the danger of finding false positives is greatly increased. The standard response to this is to reduce the CV but then we're likely to miss some real differences. Yes, the classic dilemma of statistical testing.

For the third and fourth approaches we're basically saying "is this instrument like this other instrument". That radically reduces the set of comparisons we have to do (just comparing instruments with instruments). The disadvantage is that it's harder to compare instruments with different rules; though not impossible. If the rule sets are similar enough then we can do our comparisons with a zero weight to the missing rules; and then just add back the rules that aren't in the shared dataset. Then we have the calibration of significance. We can use random data, like I did above, to work out the likelihood of a particular distance between estimates or weights happening by pure fluke.

It does get difficult however to incorporate missing or additional forecasts when we are comparing weights. But that's straightforward with the third method. If some rules are missing from the pooled returns, then we can just calculate the additional statistics we need using returns only for that instrument.

Note that all four approaches can incorporate different cost levels. We just do everything in pre-cost world, and then as a final step deduct costs from the gross returns before optimising a given instrument. Note we couldn't do that as easily if we were averaging weights for the fourth approach.

As I have looked at using weights before, I'm going to look at SR based grouping in this post - approach #3. So to be clear what I am doing is:

"Look at the estimates of SR across different rules. You could have a vector of the SR for carry10, carry20 and so on. And you'd look at that vector of estimates, and calculate <some measure of> distance between them, and if the distance is low enough, then you'd pool the returns for all the rules for those two instruments."

Calibrating the threshold

This then is the sort of thing we are looking at:

                 US10    US5  SP500
breakout10      -0.02   0.10  -0.43
breakout20       0.23   0.28  -0.08
breakout40       0.36   0.39   0.07
breakout80       0.35   0.38   0.25
breakout160      0.28   0.30   0.34
breakout320      0.30   0.41   0.37
relmomentum10    0.09   0.23  -0.15
relmomentum20   -0.02   0.11  -0.10
relmomentum40    0.12   0.13  -0.14
relmomentum80    0.17   0.26  -0.13
mrinasset1000   -0.06  -0.15  -0.49
carry10          0.51   0.50   0.10
carry30          0.53   0.50   0.10
carry60          0.53   0.48   0.12
carry125         0.54   0.49   0.18
assettrend2      0.02   0.03  -0.17
assettrend4      0.25   0.25  -0.17
assettrend8      0.41   0.40  -0.05
assettrend16     0.43   0.44   0.17
assettrend32     0.39   0.39   0.25
assettrend64     0.40   0.40   0.26
normmom2         0.05   0.14  -0.35
normmom4         0.25   0.30  -0.27
normmom8         0.39   0.38  -0.10
normmom16        0.42   0.42   0.08
normmom32        0.40   0.42   0.25
normmom64        0.37   0.43   0.32
momentum4        0.18   0.27  -0.23
momentum8        0.35   0.42  -0.02
momentum16       0.39   0.46   0.16
momentum32       0.35   0.42   0.30
momentum64       0.32   0.41   0.35
relcarry         0.04   0.20   0.14
skewabs365      -0.02  -0.12   0.24
skewabs180       0.17   0.02   0.32
skewrv365        0.29   0.22  -0.06
skewrv180        0.26   0.16   0.19
accel16          0.31   0.31  -0.11
accel32          0.22   0.28  -0.16
accel64          0.03   0.14  -0.04

That shows you the vector of Sharpe Ratios for each trading rule for three different instruments. The question then is, should we pool the returns of US5 and US10? And what about SP500? Or are these vectors of SR distinctively different and we should not pool at all?

Just by eye the two bonds do look very similar. The S&P 500 not so much. A simple Euclidean distance metric gives a distance of 0.07 between the two bonds; and around 0.31 between the equity and the bonds. Now this distance measure is crude. It doesn't take into account the length of data each asset has. A proper statistical test if the time periods were matched would also look at the correlation of the two matched return distributions. But regular followers of this blog will know that I love crude. So let's run with this.

Note: The distance metric is just sqrt(average((s_i_1 - s_i_2)^2)) for SR estimates s on rules i=1...N and for instruments 1 and 2.
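Here is a quick sketch of that calculation on the table above, reading the metric as the root mean squared difference between two SR vectors. The function name `sr_distance` is just for illustration. Because the table is rounded to two decimal places the results only come out close to the figures quoted in the text.

```python
import numpy as np

# SR vectors copied from the table above, in the same rule order (40 rules)
us10 = np.array([-0.02, 0.23, 0.36, 0.35, 0.28, 0.30, 0.09, -0.02, 0.12, 0.17,
                 -0.06, 0.51, 0.53, 0.53, 0.54, 0.02, 0.25, 0.41, 0.43, 0.39,
                 0.40, 0.05, 0.25, 0.39, 0.42, 0.40, 0.37, 0.18, 0.35, 0.39,
                 0.35, 0.32, 0.04, -0.02, 0.17, 0.29, 0.26, 0.31, 0.22, 0.03])
us5 = np.array([0.10, 0.28, 0.39, 0.38, 0.30, 0.41, 0.23, 0.11, 0.13, 0.26,
                -0.15, 0.50, 0.50, 0.48, 0.49, 0.03, 0.25, 0.40, 0.44, 0.39,
                0.40, 0.14, 0.30, 0.38, 0.42, 0.42, 0.43, 0.27, 0.42, 0.46,
                0.42, 0.41, 0.20, -0.12, 0.02, 0.22, 0.16, 0.31, 0.28, 0.14])
sp500 = np.array([-0.43, -0.08, 0.07, 0.25, 0.34, 0.37, -0.15, -0.10, -0.14, -0.13,
                  -0.49, 0.10, 0.10, 0.12, 0.18, -0.17, -0.17, -0.05, 0.17, 0.25,
                  0.26, -0.35, -0.27, -0.10, 0.08, 0.25, 0.32, -0.23, -0.02, 0.16,
                  0.30, 0.35, 0.14, 0.24, 0.32, -0.06, 0.19, -0.11, -0.16, -0.04])


def sr_distance(sr_a, sr_b):
    """Root mean squared difference between two vectors of SR estimates."""
    return float(np.sqrt(np.mean((sr_a - sr_b) ** 2)))


if __name__ == "__main__":
    print("US10 vs US5:  ", round(sr_distance(us10, us5), 3))
    print("US10 vs SP500:", round(sr_distance(us10, sp500), 3))
    print("US5 vs SP500: ", round(sr_distance(us5, sp500), 3))
```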

How can we calibrate this distance measure? To put it another way, is 0.07 very low, and is 0.31 very high? Should we pool everything? Just the bonds?

Well, my bias is towards pooling. Before writing this blog post I've generally pooled everything without giving it a moment's thought. So I would only be not pooling if there is a high chance that two instruments are distinctly different. That suggests my rule is to pool two instruments, unless their vector distance is above some critical value. In the simple example above, any critical value in the range 0.08 to 0.29 would imply pooling the two bonds, but would say not to pool the S&P. A critical value below 0.07 would result in no pooling. A value of 0.32 or above would imply pooling everything.

How to find these critical values? Easy! I can set the critical value using random data at a level where we would pool unless there is a (say) 95% chance that the instruments are actually significantly different. We know that we typically have about 20 years of data. We know that the average trading rule on the typical instrument has a SR of around 0.15. If we generate 40 lots of random returns with that, and measure the SR, we'll get something like one of the columns above. Repeat that for another non existent instrument and we have two random 'instruments', each with its own SR vector. Then we calculate the distance between those two random vectors. Finally we rinse and repeat many times. We then get a distribution of distances. Voila:

We can certainly say that a distance of 0.31 is what we'd easily expect through random chance, whereas 0.07 is very unlikely to happen. You may be wondering how different that would be for say an instrument with just 5 years of data. Wonder no longer:

With less data and more variability in SR larger distances are more common. And what about the upper end with say 40 years of data (roughly what we have for the two bonds and S&P above):

There is less variability in outcomes with longer periods, so the distances are smaller. We're still not seeing a distance of 0.07 here though. It's incredibly unlikely to be a coincidence that the two bonds have roughly the same SR vectors. At the other end of the spectrum, it's also pretty unlikely that the S&P and the two bonds are actually drawn from the same distribution; since 0.31 is at the right edge of the distribution of distances.

Now we should apply a pinch of salt correction to numbers from random data as we know from post three that real data doesn't behave like random data with a fixed SR, and shows more variability. So we would expect larger distances in real data than we see here, particularly for longer periods. For a given critical value calibrated using random data, it's likely that with actual data we will see more apparent significant differences, and therefore slightly less pooling going on.

The other thing to point out is that there are (N^2-N)/2 possible pairwise comparisons of SR vectors; or for 214 instruments 23,000 or so. Were we to set our threshold for pooling at (say) 99% we'd expect to see over 200 apparent significant differences just by chance even if the instruments had no significant difference in true SR.
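The arithmetic behind that, as a tiny sketch (the variable names are mine):

```python
from math import comb

n_instruments = 214
n_pairs = comb(n_instruments, 2)  # same as (N^2 - N) / 2

# With a 99% threshold, 1% of truly identical pairs will look different by chance
false_positive_rate = 0.01
expected_false_positives = n_pairs * false_positive_rate

print(n_pairs, round(expected_false_positives))  # prints 22791 228
```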

Fortunately there isn't that much difference in the tails for the distribution of distances over different time periods from random data:

        1 year  5 years  10 years  20 years  30 years  40 years
95%       1.65     0.74      0.53      0.37      0.30      0.26
99%       1.81     0.80      0.57      0.40      0.33      0.28
99.9%     1.91     0.85      0.61      0.43      0.35      0.30
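Critical values like these can be approximated with a short simulation. This is only a sketch under my stated setup (40 rules, true SR of 0.15) plus some extra assumptions that are not necessarily what I used: daily Gaussian returns, 256 business days a year, and rules that are independent of each other. The function names are hypothetical.

```python
import numpy as np

DAYS_PER_YEAR = 256  # assumption: business days used to annualise


def random_sr_vector(rng, years, n_rules=40, true_sr=0.15):
    """SR estimates for one random 'instrument': one Gaussian stream per rule."""
    n_days = int(years * DAYS_PER_YEAR)
    daily_mean = true_sr / np.sqrt(DAYS_PER_YEAR)  # daily vol is fixed at 1.0
    daily = rng.normal(daily_mean, 1.0, size=(n_rules, n_days))
    return np.sqrt(DAYS_PER_YEAR) * daily.mean(axis=1) / daily.std(axis=1, ddof=1)


def random_distances(years, n_trials=500, seed=0):
    """Distances between pairs of random instruments that share the same true SR."""
    rng = np.random.default_rng(seed)
    distances = np.empty(n_trials)
    for trial in range(n_trials):
        sr_a = random_sr_vector(rng, years)
        sr_b = random_sr_vector(rng, years)
        distances[trial] = np.sqrt(np.mean((sr_a - sr_b) ** 2))
    return distances


def simulated_critical_value(years, percentile=95.0, n_trials=500, seed=0):
    """Distance exceeded by chance only (100 - percentile)% of the time."""
    return float(np.percentile(random_distances(years, n_trials, seed), percentile))


if __name__ == "__main__":
    for years in (1, 5, 10, 20):
        print(years, "years:", round(simulated_critical_value(years), 2))
```

More trials give smoother tail estimates, which matters most for the 99.9% row.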

Note that using the bottom row would just barely still result in S&P 500 and the bonds getting separated (distance 0.31 with 40 years or so of data); and with a little less history they would be pooled together. Despite the higher risk of false positives, my bias to pooling isn't so absurd that I think that should happen. Anyway, I'm going to use the top row, which implies:

We will pool instruments unless there is a 95% chance or greater that their SR vectors are significantly different. That will be the case if their SR vectors have distances greater than 1.65 (one year or less) up to 0.26 (40 years or less).

What if we have instruments with different amounts of history? Given our bias is towards pooling, I would always use the higher critical value for distance. For example, if we had one instrument with 5 years of returns and another with 40 years, I'd use the critical value for 5 years. That also means that instruments with less history are more likely to be pooled when they first enter the data set. Which feels like the correct approach.

The pooling algo in full

OK so what we do is (for a given point in time as this will be on an in and out of sample basis):

estimate the SR for each rule on each instrument
for that vector of SR, work out the distance between that instrument and all other instruments
calculate the critical value for each pairing, using the lowest number of years available for one of the instruments.
assuming there is at least one pair where the distance is less than the critical value, find the pair with the smallest distance
combine the returns of those instruments together into a new pseudo pooled instrument
calculate the SR vector, and distances between this new pseudo instrument and all existing instruments (all the previously calculated distances between the remaining instruments will remain the same and don't need recalculating). Note that the number of years available for a pseudo instrument will be equal to the sum of the years on the individual instruments.
repeat until no remaining pair has a distance less than its critical value (a distance below the critical value means there is a less than 95% chance that the SR are significantly different).
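The steps above can be sketched roughly as follows. This is an illustration rather than my production code, and it makes several assumptions: each instrument is a 2D array of daily pre-cost returns (rows are days, columns are rules, with the same rules everywhere), pooling simply stacks the rows, the 95% critical values are read off the table in this post as a step function, and for brevity every distance is recalculated on each pass. All names are hypothetical.

```python
import itertools
import numpy as np

DAYS_PER_YEAR = 256  # assumption: business days used to annualise

# (maximum years of history, 95% critical value), taken from the table in this post
CRITICAL_95 = [(1, 1.65), (5, 0.74), (10, 0.53), (20, 0.37), (30, 0.30), (40, 0.26)]


def sr_vector(returns):
    """Annualised SR per rule; returns has shape (n_days, n_rules)."""
    return np.sqrt(DAYS_PER_YEAR) * returns.mean(axis=0) / returns.std(axis=0, ddof=1)


def critical_value(years):
    """Step lookup; beyond 40 years the last value is reused (an assumption)."""
    for max_years, value in CRITICAL_95:
        if years <= max_years:
            return value
    return CRITICAL_95[-1][1]


def pool_instruments(returns_by_name):
    """Greedily merge the closest pair until no pair is below its critical value."""
    pools = {(name,): np.asarray(r, dtype=float) for name, r in returns_by_name.items()}
    while len(pools) > 1:
        best = None
        for key_a, key_b in itertools.combinations(pools, 2):
            distance = np.sqrt(np.mean((sr_vector(pools[key_a]) - sr_vector(pools[key_b])) ** 2))
            # use the shorter history, which gives the higher critical value
            years = min(len(pools[key_a]), len(pools[key_b])) / DAYS_PER_YEAR
            if distance < critical_value(years) and (best is None or distance < best[0]):
                best = (distance, key_a, key_b)
        if best is None:
            break  # every remaining pair is distinctly different
        _, key_a, key_b = best
        # stacking rows means the pseudo instrument's years are the sum of its members
        pools[key_a + key_b] = np.vstack([pools.pop(key_a), pools.pop(key_b)])
    return pools
```

The keys of the returned dictionary are tuples of the original instrument names, so a singleton tuple is an instrument that wasn't pooled with anything.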

We now have a mixture of pseudo instruments and possibly some unpooled instruments. We use the pre-cost returns for each pool to optimise the appropriate portfolio weights. There is some additional logic around handling distinct and missing forecasts, and also costs, but for now let's keep things simple.

Just for fun

Just for fun, let's run a single in sample test and see what gets pooled together in instrument space. This is akin to clustering exercises I have done before but there I just used underlying instrument returns.

This takes a while, and you may be surprised by some of the first pairs of instruments that are pooled together:

Pooling DistanceKeys(key1='SOFR', key2='EDOLLAR') # both STIR

Pooling DistanceKeys(key1='US5', key2='US10') # the example we have been using

Pooling DistanceKeys(key1='PLAT', key2='REDWHEAT') # WTF!

Pooling DistanceKeys(key1='HEATOIL', key2='CHF') # WATF?!

Pooling DistanceKeys(key1='GILT', key2='CAD10') # both 10 year bonds fine

Pooling DistanceKeys(key1='ZAR', key2='DAX') # WATFF?!?!?

...

Anyway once finished we end up with 54 pooled returns rather than 214 of our original distinct instruments.

There are two huge groups that take in a big chunk of the instruments. Here is the first with 103 instruments:

['AUD_micro', 'BBCOMM', 'BOBL', 'BONO', 'BRE', 'BRENT_W', 'BTP', 'BTP3', 'BUND', 'BUXL', 'CAD', 'CAD10', 'CANOLA', 'CH10', 'CHEESE', 'CHF', 'COCOA', 'COCOA_LDN', 'COFFEE', 'COPPER-micro', 'CORN', 'COTTON', 'COTTON2', 'CRUDE_ICE', 'CRUDE_W_micro', 'DAX', 'DOW', 'DX', 'EDOLLAR', 'EU-BANKS', 'EU-DJ-UTIL', 'EURCHF', 'EUR_micro', 'FANG', 'FEEDCOW', 'FTSE250', 'FTSECHINAA', 'GAS-PEN', 'GASOIL', 'GASOILINE', 'GAS_US_mini', 'GBP', 'GICS', 'GILT', 'GOLD_micro', 'HANG_mini', 'HEATOIL', 'HIGHYIELD', 'IBEX_mini', 'IRS', 'JGB', 'JGB-SGX-mini', 'JPY', 'KOSPI_mini', 'KR3', 'LEANHOG', 'LIVECOW', 'LUMBER', 'MILLWHEAT', 'MSCISING', 'MXP', 'NASDAQ_micro', 'NIFTY', 'NIKKEI', 'OAT', 'OATIES', 'OJ', 'OMX', 'PALLAD', 'PLAT', 'PLN', 'R1000', 'RAPESEED', 'REDWHEAT', 'RICE', 'ROBUSTA', 'RUR', 'SGX', 'SHATZ', 'SILVER', 'SMI-MID', 'SOFR', 'SOYBEAN_mini', 'SOYMEAL', 'SOYOIL', 'SP400', 'SP500_micro', 'SUGAR11', 'SUGAR_WHITE', 'TOPIX', 'US-DISCRETE', 'US-HEALTH', 'US-TECH', 'US10', 'US10U', 'US2', 'US20', 'US30', 'US5', 'VIX_mini', 'WHEAT', 'YENEUR', 'ZAR']

Note that this does include both the S&P 500 and the US bond markets!

The second group of fifty instruments is mostly stock sectors, but not entirely:

['AEX', 'BOVESPA', 'CAC', 'CAD2', 'CAD5', 'CLP', 'CZK', 'DJSTX-SMALL', 'EU-AUTO', 'EU-BASIC', 'EU-CHEM', 'EU-CONSTRUCTION', 'EU-DIV30', 'EU-DJ-TELECOM', 'EU-FOOD', 'EU-HEALTH', 'EU-MID', 'EU-OIL', 'EU-REALESTATE', 'EU-RETAIL', 'EU-TECH', 'EU-TRAVEL', 'EURCAD', 'EURO600', 'EUROSTX', 'EUROSTX-SMALL', 'FTSE100', 'FTSECHINAH', 'GBPCHF', 'GBPEUR', 'IG', 'INR', 'MSCIASIA', 'MSCIEAFA', 'NICKEL_LME', 'NOK', 'NZD', 'RUSSELL', 'SEK', 'SMI', 'SPI200', 'US-ENERGY', 'US-FINANCE', 'US-INDUSTRY', 'US-MATERIAL', 'US-PROPERTY', 'US-REALESTATE', 'US-STAPLES', 'US-UTILS', 'V2X']

Again in terms of weirdness, the Canadian 10 year bond is in group 1 whilst the other Canadian bonds (CAD2 and CAD5) are in group 2. VIX is in group 1, and V2X in group 2.

Next there are a few small groups, which mostly don't have any internal logic:

BRENT-LAST, USIRS5, USIRS10 (two out of three make sense)

MILK, WHEY, KR10, WHEAT_ICE (two out of four make sense)

FED, COAL-GEORDIE (nope, I got nothing)

IRON, ETHANOL

EURIBOR-ICE, COAL

SUGAR16/MILKDRY (the "what you shouldn't put in your coffee" group)

This leaves 46 instruments which can't be pooled with anything else:

['ALUMINIUM', 'AUDJPY', 'BB3M', 'BITCOIN', 'BUTTER', 'CHFJPY', 'CHINAA-CON', 'CNH', 'COPPER_LME', 'ETHER-micro', 'EU-HOUSE', 'EU-INSURE', 'EU-MEDIA', 'EUA', 'EURAUD', 'FTSEINDO', 'FTSETAIWAN', 'FTSEVIET', 'GBPJPY', 'HANGENT_mini', 'HANGTECH', 'HOUSE-US', 'JP-REALESTATE', 'KOSDAQ', 'KRWUSD_mini', 'LEAD_LME', 'LUMBER-new', 'MIB', 'MILKWET', 'MSCIEMASIA', 'MSCITAIWAN', 'MSCIWORLD', 'MUMMY', 'RUBBER', 'SARONA', 'SGD', 'SONIA3', 'STEEL', 'SWISSLEAD', 'TIN_LME', 'TWD', 'US3', 'USIRS2ERIS', 'USIRS5ERIS', 'VNKI', 'ZINC_LME']

Oh yes crypto people, Bitcoin and Ethereum are 'special'. As special as Tin and Rubber anyway. The smallest distance remaining after all that pooling is just under 0.31 which just exceeds the relevant critical value.

Evaluating the results

You should be used to the procedure by now if you've been following the blog posts. I will do the usual thing of cycling through different lengths of in sample (5 years, 10 years) and out of sample (1 year and 5 years) lengths of time. For shorter time periods that will allow me to subsample different historic periods. For speed and to get some alternative paths I'm not going to consider all the instruments. Instead I will randomly subsample 50 instruments out of the 214 available. Note that the pool of available instruments will be smaller when I am using e.g. 15 years of in and out of sample data, which is why I'm not going to a 40 year in sample period as I've done before when investigating structural breaks.

Then for a given set of returns I will either use fully pooled, distance weight pooled, or unpooled returns. Then I will optimise for each instrument based on the relevant returns, using the shrinkage method with SR shrinkage of 0.5 and correlation of 0.75. Finally I will take the out of sample SR of the portfolio that equally weights the 50 instruments. Note this should be better than the average SR for each instrument. I could do better with some kind of instrument weight allocation, but that is for another day. I should still be able to pick up whether I am losing diversification through pooling.

... with a twist

Basically everything I have done up until now includes implicit in sample fitting, because I'm only selecting from trading rules that actually work. This will inflate the backtest results, but until now at least won't have a serious effect on the calibrations I have been running. But with this step of looking at pooling I am worried that there will be rules that just don't work on some instruments. To try and alleviate that in sample fitting problem, I'm going to include the opposite of each trading rule as well as the original rule. Then at each optimisation we only choose the positive SR option. Note there are no trading costs at this stage of my research so the p&l of the opposite rule is exactly equal to -1* the 'correct' rule.
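A minimal sketch of that twist, assuming pre-cost returns held in arrays with one column per rule (the function names are made up for illustration). With no costs the sign of a rule's SR is the same as the sign of its mean p&l, so choosing the positive SR option just means flipping any rule whose in sample mean is negative.

```python
import numpy as np


def choose_rule_signs(in_sample_returns):
    """+1 keeps the original rule, -1 swaps in the opposite rule.

    in_sample_returns has shape (n_periods, n_rules) and is pre-cost, so the
    opposite rule's p&l is exactly -1 times the original rule's p&l.
    """
    mean_pnl = in_sample_returns.mean(axis=0)
    return np.where(mean_pnl >= 0.0, 1.0, -1.0)


def apply_rule_signs(returns, signs):
    """Returns of the chosen version of each rule, in or out of sample."""
    return returns * signs


if __name__ == "__main__":
    in_sample = np.array([[0.01, -0.02], [0.03, -0.01]])
    out_of_sample = np.array([[-0.01, 0.02]])
    signs = choose_rule_signs(in_sample)
    print(signs)                                   # first rule kept, second flipped
    print(apply_rule_signs(out_of_sample, signs))  # both lose out of sample here
```

The signs are fixed using in sample data only and then applied out of sample, which is what makes the second set of results below more honest.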

Anyway on with the results.

5 years in sample, 1 year out of sample

               SR    pvalue
unpooled     0.320     0.0
all pooled   0.447     0.0
algo pooled  0.514     NaN

               SR    pvalue
unpooled    -0.104     0.0
all pooled   0.391     NaN
algo pooled  0.016     0.0

The first table shows the results as I've been analysing up to now, with implicit fitting and only the 'correct' version of the trading rules included. In the second table I've allowed the possibility of the opposite rule to be included. Note the much lower performance that results; and a difference of opinion on whether we are better pooling everything or using the algo.

5 years in sample, 5 year out of sample

               SR    pvalue
unpooled     0.624     0.0
all pooled   0.431     0.0
algo pooled  0.657     NaN

               SR    pvalue
unpooled     0.339     0.0
all pooled   0.346     0.0
algo pooled  0.404     NaN

Once again the SR are reduced by being more honest, but with longer out of sample the algo pooling method is now superior.

10 years in sample, 1 year out of sample


               SR    pvalue
unpooled     0.862     NaN
all pooled   0.436     0.0
algo pooled  0.626     0.0

               SR    pvalue
unpooled     0.460     NaN
all pooled   0.304     0.0
algo pooled  0.248     0.0

The one thing we haven't got here is consistency... now not pooling at all is the correct thing to do!

10 years in sample, 5 years out of sample

               SR    pvalue
unpooled     0.683     0.0
all pooled   0.584     0.0
algo pooled  0.735     NaN


               SR    pvalue
unpooled     0.781     NaN
all pooled   0.491     0.0
algo pooled  0.650     0.0

A bit of an unusual case here since we do better on unpooled when including opposite rules, but it can happen just by luck. Anyway things really are inconsistent here...

Summary

Although the results above do seem quite messy, if we focus on the more honest figures that include opposite rules we can see a pattern if we look at the best method in each case:

5 year 1 year: All pooled

5 year 5 year: Algo pooled

10 year 1 year: Unpooled

10 year 5 year: Unpooled

Hence the more data we have, the more it seems we can allow each instrument to have its own parameter estimates rather than sharing with other instruments.

Anyway, what to do? I am struggling here. I like more SR as much as the next guy, but I also have biases towards simplicity (Occam's razor), robustness and not changing things if I can avoid them. Sticking with what I currently do - pooling everything - is very tempting. It's simple, and it is also likely to be very robust. Not pooling at all is possibly even simpler; and with enough data history does seem to perform better. But it also worries me! Although we're ensuring robustness by using shrinkage, so maybe it's okay.

The Algo method is cool and fun, but definitely massively complicates matters. The method also doesn't produce 'nice' results. When I ran the original 'all instruments' grouping exercise, the long tail of instruments that don't fit elsewhere was slightly concerning. I had hoped to get groups that were congruent with asset classes or at least had some obvious logic, and I certainly didn't. This does suggest that the pooling by weights I have attempted before is worth a second look.

Alternatively I could use some simple heuristic like:

If an instrument has less than 5 years of data history, use pooled returns
If it has more than 25 years of history, use individual returns
With between 5 and 25 years of history, use weights that are an average of these; where the weight on pooled returns for N years of returns is (25-N)*0.05 and obviously the weight on individual returns is one minus that.
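As a sketch, that heuristic might look like this. The function names are hypothetical, and I'm assuming the thing being blended is a set of portfolio weights fitted once on pooled returns and once on the instrument's own returns.

```python
import numpy as np


def pooled_weight(years_of_history):
    """Weight on the pooled estimate: 1 below 5 years, 0 above 25, linear between."""
    if years_of_history < 5:
        return 1.0
    if years_of_history > 25:
        return 0.0
    return (25 - years_of_history) * 0.05


def blend_weights(weights_from_pooled, weights_from_individual, years_of_history):
    """Average of the two sets of weights, tilted by the amount of history."""
    w = pooled_weight(years_of_history)
    pooled = np.asarray(weights_from_pooled, dtype=float)
    individual = np.asarray(weights_from_individual, dtype=float)
    return w * pooled + (1.0 - w) * individual


if __name__ == "__main__":
    # 15 years of history sits halfway, so the result is a simple average
    print(blend_weights([0.6, 0.4], [0.2, 0.8], years_of_history=15))
```

If both inputs sum to one then so does the blend, so no renormalisation is needed.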

So there is another blog post to come at some point where I revisit the issue of pooling.

But for now we can put pooling by SR vector in the bin, the concept permanently damaged by the sharp edge of Occam's razor (topical political reference there!).

This Blog is Systematic

Monday, 29 June 2026
