Thursday, 27 May 2021

Fit forecast weights by instrument, by group or fit across all markets? Or all three?

I've long been a critic of the sort of people who think that one should run a different trading system for each instrument they trade. It is the sort of thing that makes intuitive sense; surely the S&P 500 is a completely different animal to the Corn future? And that's probably true for high frequency traders, but not at the sort of timescales that I tend to trade over (holding periods of a couple of weeks up to a couple of months). There I'm using rules that I expect to work over pretty much any instrument I trade, and to perform consistently over long periods of time.

So I've generally advocated pooling information across markets when fitting. My preferred method is to pool gross returns, then apply the costs for each individual instrument, so more expensive instruments will end up trading slower; otherwise everything will look pretty similar.

But... might instrument specific fitting actually work? Or even if that doesn't work, what about pooling together information for similar instruments? Or... is there some way of getting the best out of all three worlds here: using a blend of instrument specific, globally pooled, and similarity pooled information?

Let's find out.



What exactly is wrong with fitting by instrument?

Let's think about a simple momentum system, where the combined forecast is a weighted average of N different trend signals, each with different speeds. These could be moving average crossovers of varying lengths, or breakouts with varying windows. The only fitting that can be done in this kind of system is to allocate risk weightings differently to different speeds of momentum. Naturally this is a deliberate design decision to avoid 'free-form' fitting of large numbers of parameters, and reduce the issue to a portfolio optimisation problem (which is relatively well understood) with just N-1 degrees of freedom.
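To make that structure concrete, here is a minimal sketch of combining N rule forecasts with a set of weights. It assumes the forecasts are already risk-normalised and held in a NumPy array with one column per rule; the function name is hypothetical, and a full system would typically apply further scaling and capping to the combined forecast, which are left out here. Because the weights are normalised to sum to one, only N-1 of them are free parameters.

```python
import numpy as np


def combined_forecast(forecasts: np.ndarray, weights) -> np.ndarray:
    """Weighted average of N rule forecasts.

    forecasts: array of shape (T, N), one column per trading rule
    weights:   N non-negative raw weights (normalised here to sum to 1)
    """
    weights = np.asarray(weights, dtype=float)
    if weights.shape[0] != forecasts.shape[1]:
        raise ValueError("need one weight per trading rule")
    if np.any(weights < 0):
        raise ValueError("forecast weights must be non-negative")
    # Normalising to sum to 1 leaves only N-1 free parameters
    weights = weights / weights.sum()
    return forecasts @ weights


# Three momentum speeds, two days of (already risk-normalised) forecasts
signals = np.array([[10.0, 20.0, -10.0],
                    [0.0, 5.0, 5.0]])
print(combined_forecast(signals, [1, 1, 2]))
```

Scaling all the raw weights by a constant gives the same combined forecast, which is another way of seeing that there are only N-1 degrees of freedom.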

The decision we have to make is this: What forecast weights should a given instrument have?

Important note: my trading systems are carefully designed to abstract away any differences in instruments, mostly by the use of risk scaling or risk normalisation. Thus we don't need to estimate or re-estimate 'magic numbers' for each instrument, or calibrate them separately to account for differences in volatility. Similarly, forecasts from each trading rule are normalised to have the same expected risk, so no magic numbers are required here either. This is done automatically by the use of forecast scalars and risk normalisation.
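As a rough illustration of forecast scaling, the sketch below divides a raw trend measure by price volatility and then picks a scalar so that the rule's forecasts have a target average absolute value. The target of 10 and the function names are assumptions for illustration; the sketch also estimates the scalar on the whole history, whereas a proper backtest would only use data available at each point in time (and the estimate could be pooled across instruments).

```python
import numpy as np

TARGET_ABS_FORECAST = 10.0  # assumed target for the average absolute forecast


def risk_normalised_forecast(price_trend: np.ndarray, price_vol: np.ndarray) -> np.ndarray:
    """Divide a raw trend measure by price volatility so it is unit-free."""
    return price_trend / price_vol


def forecast_scalar(raw_forecast: np.ndarray, target: float = TARGET_ABS_FORECAST) -> float:
    """Scalar that makes the average absolute forecast equal to `target`.

    Simplification: uses the whole history, not just past data.
    """
    mean_abs = np.nanmean(np.abs(raw_forecast))
    return target / mean_abs


raw = risk_normalised_forecast(np.array([2.0, -6.0, 4.0, -4.0]),
                               np.array([2.0, 2.0, 2.0, 2.0]))
scaled = raw * forecast_scalar(raw)
print(np.mean(np.abs(scaled)))
```

Because every rule is scaled to the same average absolute forecast, no rule-specific or instrument-specific 'magic numbers' are needed downstream.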

In a simple portfolio optimisation where all assets have the same expected volatility, what determines the weights? Basically correlation and relative Sharpe Ratio (equivalent to relative mean, given the identical volatilities).
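A small sketch shows why only correlation and Sharpe Ratio matter when volatilities are identical. This is plain unconstrained mean-variance optimisation (weights proportional to the inverse covariance matrix times the means), not the handcrafting method used in this post; the correlation matrix and Sharpe Ratios are made-up numbers.

```python
import numpy as np


def mvo_weights_equal_vol(corr, sharpe, vol: float = 0.2) -> np.ndarray:
    """Unconstrained mean-variance weights when every asset has volatility `vol`."""
    corr = np.asarray(corr, dtype=float)
    sharpe = np.asarray(sharpe, dtype=float)
    cov = corr * vol ** 2        # identical vols: covariance is just scaled correlation
    mean = sharpe * vol          # identical vols: mean is just scaled Sharpe Ratio
    raw = np.linalg.solve(cov, mean)   # w is proportional to inverse(cov) @ mean
    return raw / raw.sum()


corr = [[1.0, 0.5, 0.2],
        [0.5, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
sharpe = [0.4, 0.3, 0.2]
w_low_vol = mvo_weights_equal_vol(corr, sharpe, vol=0.1)
w_high_vol = mvo_weights_equal_vol(corr, sharpe, vol=0.3)
print(np.round(w_low_vol, 3))
```

The common volatility cancels out after normalisation, so the weights depend only on the correlation matrix and the Sharpe Ratios.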

But it turns out that when you analyse the correlations between trading rules for different instruments, you get very similar results.

(There are chunks of pysystemtrade code scattered throughout this post, but hopefully the general approach will be applicable to your own trading system. You may find it helpful to read my posts on optimising with costs, and on my preferred optimisation method, handcrafting.)


def corr_from(system, instrument):
    y = system.combForecast.calculation_of_raw_estimated_monthly_forecast_weights(instrument)
    return y.optimiser_over_time.optimiser.calculate_correlation_matrix_for_period(
        y.optimiser_over_time.fit_dates[-1]).as_pd().round(2)

corr_from(system, "CORN")
            momentum16  momentum32  momentum4  momentum64  momentum8
momentum16        1.00        0.88       0.65        0.61       0.89
momentum32        0.88        1.00       0.41        0.88       0.64
momentum4         0.65        0.41       1.00        0.21       0.89
momentum64        0.61        0.88       0.21        1.00       0.37
momentum8         0.89        0.64       0.89        0.37       1.00

corr_from(system, "SP500")
            momentum16  momentum32  momentum4  momentum64  momentum8
momentum16        1.00        0.92       0.60        0.79       0.90
momentum32        0.92        1.00       0.40        0.94       0.71
momentum4         0.60        0.40       1.00        0.29       0.85
momentum64        0.79        0.94       0.29        1.00       0.57
momentum8         0.90        0.71       0.85        0.57       1.00

We can see that the results are fairly similar: in fact they'd result in very similar weights (all other things being equal). 

This is partly because my handcrafted method is robust to correlation differences that aren't significant, but even a vanilla MVO wouldn't result in radically different weights. In fact I advocate using artificial data to estimate the correlations for momentum rules of different speed, since it will give a robust but accurate result.
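The post doesn't say exactly how the artificial data is generated, so the following is only one assumption-laden sketch: simulate a driftless Gaussian random walk, run EWMAC momentum rules of several speeds on it (with a slow span of four times the fast span, also an assumption), and correlate the resulting rule returns. Real artificial data sets might also include trends of various lengths.

```python
import numpy as np
import pandas as pd


def ewmac(price: pd.Series, fast_span: int) -> pd.Series:
    """Exponentially weighted moving average crossover; slow span assumed to be 4x fast."""
    return price.ewm(span=fast_span).mean() - price.ewm(span=4 * fast_span).mean()


def artificial_rule_correlations(speeds=(4, 8, 16, 32, 64), n_days=20000, seed=0) -> pd.DataFrame:
    """Correlation of momentum rule returns on a driftless Gaussian random walk."""
    rng = np.random.default_rng(seed)
    # Unit volatility daily returns with no trends, so no vol normalisation is needed
    daily_returns = pd.Series(rng.standard_normal(n_days))
    price = daily_returns.cumsum()
    rule_returns = {}
    for fast in speeds:
        forecast = ewmac(price, fast)
        # P&L: yesterday's forecast times today's return
        rule_returns["momentum%d" % fast] = forecast.shift(1) * daily_returns
    return pd.DataFrame(rule_returns).dropna().corr()


corr = artificial_rule_correlations()
print(corr.round(2))
```

Because the simulated prices contain no real-world idiosyncrasies, the resulting correlations reflect only the mechanical overlap between the rules, which is what makes the estimate robust.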

(Things are a bit different for carry and other more exotic trading rules, but I'll be bringing those in later)

What about Sharpe Ratios? Well, there are indeed some differences...

import numpy as np

def SR_from(system, instrument):
    y = system.combForecast.calculation_of_raw_estimated_monthly_forecast_weights(instrument)
    optimiser = y.optimiser_over_time.optimiser
    fit_period = y.optimiser_over_time.fit_dates[-1]
    ## divide every rule's mean by the same (average) standard deviation
    std = np.mean(list(optimiser.calculate_stdev_for_period(fit_period).values()))
    means = optimiser.calculate_mean_for_period(fit_period)
    SR = dict([
        (key, round(mean / std, 3)) for key, mean in means.items()
    ])

    return SR

SR_from(system, "CORN")
{'momentum16': 0.39, 'momentum32': 0.296, 'momentum4': -0.25, 'momentum64': 0.102,
 'momentum8': 0.206}

SR_from(system, "SP500")
{'momentum16': 0.147, 'momentum32': 0.29, 'momentum4': -0.207, 'momentum64': 0.359,
 'momentum8': -0.003}

We can see the well known effect that faster momentum isn't much cop for equity indices, as well as some other differences.
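One rough way to ask whether differences like these are significant is a test on the difference between two correlated Sharpe Ratios. The sketch below uses the Jobson-Korkie statistic with Memmel's correction, which assumes i.i.d. normal returns (trading rule returns only roughly satisfy this). It also assumes the figures above are annualised Sharpe Ratios and that we have 30 years of data; the post doesn't state the data length, so treat that as a placeholder.

```python
import math


def sr_difference_tstat(sr1: float, sr2: float, rho: float, years: float,
                        periods_per_year: int = 256) -> float:
    """Approximate t-statistic for the difference of two annualised Sharpe Ratios.

    Jobson-Korkie test with Memmel's correction; rho is the correlation
    between the two return streams. Assumes i.i.d. normal returns.
    """
    n = years * periods_per_year
    s1 = sr1 / math.sqrt(periods_per_year)  # convert to per-period Sharpe Ratios
    s2 = sr2 / math.sqrt(periods_per_year)
    var_per_period = (2 * (1 - rho) + 0.5 * (s1 ** 2 + s2 ** 2 - 2 * s1 * s2 * rho ** 2)) / n
    se_annual = math.sqrt(var_per_period * periods_per_year)
    return (sr1 - sr2) / se_annual


# SP500: momentum64 vs momentum16, correlation 0.79, assuming 30 years of data
print(round(sr_difference_tstat(0.359, 0.147, 0.79, years=30), 2))
```

Under these assumptions the t-statistic is roughly 1.8, short of the usual 1.96 threshold, even though this is one of the larger gaps in the table and the two rules are highly correlated (which tightens the test).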

But are they significant differences? Are they significant enough that we should use them in determining what weights to use? Here are the forecast weights with no pooling of gross returns for each instrument:


system.config.forecast_weight_estimate['pool_gross_returns'] = False
system.combForecast.get_forecast_weights("CORN").iloc[-1].round(2)
momentum16 0.39
momentum4 0.00
momentum8 0.13
momentum64 0.16
momentum32 0.32
system.combForecast.get_forecast_weights("SP500").iloc[-1].round(2)
momentum16 0.22
momentum4 0.00
momentum8 0.08
momentum64 0.36
momentum32 0.33


The weights are certainly a bit different, although my use of a robust optimisation process (handcrafting) means they're not that crazy. Or maybe it makes more sense to pool our results:

system.config.forecast_weight_estimate['pool_gross_returns'] = True
system.combForecast.get_forecast_weights("CORN").iloc[-1].round(2)
momentum16 0.21
momentum4 0.00
momentum8 0.11
momentum64 0.30
momentum32 0.38
system.combForecast.get_forecast_weights("SP500").iloc[-1].round(2)
momentum16 0.22
momentum4 0.01
momentum8 0.16
momentum64 0.24
momentum32 0.37

(The small differences here arise because we're still using the specific costs for each instrument; it's only gross returns that we pool.)
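The idea of pooling gross returns but keeping instrument-specific costs can be sketched in a few lines. This is a stripped-down illustration, not pysystemtrade's actual implementation: it assumes daily, risk-normalised gross rule returns per instrument and an annualised cost per rule and instrument expressed in Sharpe Ratio units. All instrument names and numbers are hypothetical toy data.

```python
import pandas as pd

DAYS_IN_YEAR = 256  # business days; annualise daily Sharpe Ratios by sqrt(256) = 16


def net_sr_with_pooled_gross(gross_returns: dict, cost_sr: dict) -> dict:
    """Pool gross rule returns across instruments, then subtract each instrument's own costs.

    gross_returns: {instrument: DataFrame of daily risk-normalised gross returns, one column per rule}
    cost_sr:       {instrument: {rule: annualised cost in Sharpe Ratio units}}
    """
    pooled = pd.concat(list(gross_returns.values()), axis=0)  # stack every instrument's history
    pooled_gross_sr = pooled.mean() / pooled.std() * DAYS_IN_YEAR ** 0.5
    return {
        instrument: {rule: pooled_gross_sr[rule] - costs[rule] for rule in pooled.columns}
        for instrument, costs in cost_sr.items()
    }


gross = {
    "CHEAP": pd.DataFrame({"fast": [0.01, 0.03, -0.01], "slow": [0.02, 0.00, 0.01]}),
    "EXPENSIVE": pd.DataFrame({"fast": [0.02, 0.00, 0.01], "slow": [0.01, 0.02, -0.01]}),
}
costs = {"CHEAP": {"fast": 0.05, "slow": 0.01}, "EXPENSIVE": {"fast": 0.30, "slow": 0.03}}
net = net_sr_with_pooled_gross(gross, costs)
print(net)
```

Every instrument sees the same pooled gross Sharpe Ratio for a given rule, so the only differences between instruments come from their costs: the fast rule looks much worse for the expensive instrument, which is what pushes expensive instruments towards slower rules.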

There is a tension here: We want more data to get robust fitting results (which implies pooling across instruments is the way to go) and yet we want to account for idiosyncratic differences in performance between instruments (which implies not pooling).

At the moment there is just a binary choice: we either pool gross returns, or we don't (we could also pool costs, and hence net returns, but to me that doesn't make a lot of sense - I think the costs of an instrument should determine how it is traded).

And the question is more complex again, because which instruments should we pool across? Everything, regardless of costs? Or would it make more sense to pool across instruments within the same asset class? This is effectively what was done at AHL when I worked there, since we ran separate teams for each asset class (I was head of fixed income), and each team fitted their own models (what they do now, I dunno. Probably some fancy machine learning nonsense).

Really, we have three obvious alternatives:

  • Fit by instrument, reflecting the idiosyncratic nature of each instrument
  • Fit with information pooled across similar instruments (same asset class? Perhaps)
  • Fit with information pooled across all instruments

So the point of this post is to test these alternatives out. But what I also want to try is something else: a method which uses a blend of all three methods. In this post I develop a methodology to do this kind of 'blended weights' (not a catchy name! Suggestions are welcome!).



A brief interlude: The speed limit

In my first book I introduce the idea of a 'speed limit' on costs, measured in annualised risk adjusted terms (so effectively a Sharpe Ratio). The idea is that, on a per instrument, per trading rule basis, it's unlikely (without overfitting) that you will get an average SR before costs of more than about 0.40, and you wouldn't want to spend more than a third of that on costs (about 0.13). Therefore it makes no sense to include any trading rules which breach this limit for a given instrument (which will happen if they trade too quickly, and the instrument concerned is relatively expensive to trade).
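The speed limit arithmetic can be sketched as a simple filter. The sketch assumes that a rule's annual cost in Sharpe Ratio units is its turnover (trades per year) multiplied by a per-trade cost in SR units; the function names, turnovers and the per-trade cost are all hypothetical.

```python
MAX_PRE_COST_SR = 0.40        # rough ceiling on achievable pre-cost SR per rule and instrument
MAX_COST_FRACTION = 1.0 / 3   # spend at most a third of that on costs
SPEED_LIMIT = MAX_PRE_COST_SR * MAX_COST_FRACTION  # about 0.13


def annual_cost_sr(turnover_per_year: float, cost_per_trade_sr: float) -> float:
    """Annual cost in Sharpe Ratio units = trades per year x cost per trade (SR units)."""
    return turnover_per_year * cost_per_trade_sr


def rules_within_speed_limit(rule_turnovers: dict, cost_per_trade_sr: float,
                             limit: float = SPEED_LIMIT) -> list:
    """Names of rules whose expected annual cost does not breach the speed limit."""
    return [rule for rule, turnover in rule_turnovers.items()
            if annual_cost_sr(turnover, cost_per_trade_sr) <= limit]


# Hypothetical turnovers (trades per year) and a hypothetical cost of 0.005 SR units per trade
turnovers = {"momentum4": 60.0, "momentum16": 20.0, "momentum64": 5.0}
print(rules_within_speed_limit(turnovers, cost_per_trade_sr=0.005))
```

For a more expensive instrument (a higher per-trade cost) the same rules would have proportionally higher annual costs, so more of the fast rules would be excluded.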

Now whilst I do like the idea of the speed limit, one could argue that it is unduly conservative. For starters, expensive rules are going to be penalised anyway, since I optimise on after-cost returns and I take SR into account when deciding the correct weights to use. In fact they get penalised twice, since I include a scaling factor of 2.0 on all costs when optimising. Secondly, a fast rule might not affect turnover on the entire system much once added to a bunch of slower rules, especially if it has some diversifying effects. Thirdly, I apply buffering to the final position for a given instrument, which reduces turnover and thus costs anyway, so the marginal effect of allocating to a faster rule might be very small.
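Position buffering, mentioned in the third point, can be sketched as follows: only trade when the current position has drifted outside a band around the optimal position, and then only trade to the nearest edge of that band. The band width of 10% of the average absolute position is an assumption for illustration, and a real system would also round to whole contracts.

```python
def buffered_position(current: float, optimal: float, average_abs_position: float,
                      buffer_fraction: float = 0.10) -> float:
    """Only trade when the current position is outside a buffer around the optimal one,
    and then only trade to the nearest edge of the buffer."""
    buffer = buffer_fraction * average_abs_position
    lower, upper = optimal - buffer, optimal + buffer
    if current < lower:
        return lower
    if current > upper:
        return upper
    return current  # inside the buffer: no trade


print(buffered_position(current=10.0, optimal=10.5, average_abs_position=10.0))
```

Small changes in the combined forecast, such as those caused by adding a modest weight to a faster rule, often leave the position inside the buffer and so generate no extra trading at all.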

It turns out that this question of whether to apply the speed limit is pretty important. It will result in different individually fitted instrument weights, different asset groupings, and different results. For this reason I'll be running the results both with and without the speed limit. And of course I'll be checking what effect this difference has on the pre-cost and after-cost SR.



The setup


Although just looking at momentum rules alone is an interesting exercise to get a feel for the process, and make sure the code worked (I did find a few bugs!), the fact is the rules involved are far too similar to produce meaningfully different results; especially because the handcrafting method I use for optimisation is designed to produce robust weights. 

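Instead I decided to use a more interesting set of rules, which basically constitute an evenly spread sample from the rules I use myself:

  • assettrend32
  • breakout10
  • breakout160
  • carry10
  • kurtS_abs30
  • momentum4
  • momentum64
  • mrinasset160
  • normmom32
  • relcarry
  • relmomentum20
  • skewabs90
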
Here's the correlation matrix for these guys (pooling all instrument returns together):

               assettrend32  breakout10  breakout160  carry10  kurtS_abs30  momentum4
assettrend32           1.00        0.16         0.75     0.29        -0.04       0.29
breakout10             0.16        1.00         0.18     0.08        -0.01       0.82
breakout160            0.75        0.18         1.00     0.37        -0.04       0.35
carry10                0.29        0.08         0.37     1.00        -0.05       0.12
kurtS_abs30           -0.04       -0.01        -0.04    -0.05         1.00      -0.02
momentum4              0.29        0.82         0.35     0.12        -0.02       1.00
momentum64             0.73        0.15         0.89     0.46        -0.03       0.28
mrinasset160           0.02       -0.05        -0.38    -0.11         0.03      -0.11
normmom32              0.80        0.18         0.89     0.33        -0.04       0.34
relcarry               0.04        0.00         0.19     0.63        -0.02       0.02
relmomentum20          0.02        0.25         0.19     0.05         0.01       0.42
skewabs90             -0.02        0.01         0.02     0.11        -0.06      -0.03

               momentum64  mrinasset160  normmom32  relcarry  relmomentum20  skewabs90
assettrend32         0.73          0.02       0.80      0.04           0.02      -0.02
breakout10           0.15         -0.05       0.18      0.00           0.25       0.01
breakout160          0.89         -0.38       0.89      0.19           0.19       0.02
carry10              0.46         -0.11       0.33      0.63           0.05       0.11
kurtS_abs30         -0.03          0.03      -0.04     -0.02           0.01      -0.06
momentum4            0.28         -0.11       0.34      0.02           0.42      -0.03
momentum64           1.00         -0.41       0.87      0.25           0.16       0.08
mrinasset160        -0.41          1.00      -0.45     -0.19          -0.26      -0.01
normmom32            0.87         -0.45       1.00      0.13           0.22      -0.03
relcarry             0.25         -0.19       0.13      1.00           0.03       0.10
relmomentum20        0.16         -0.26       0.22      0.03           1.00      -0.06
skewabs90            0.08         -0.01      -0.03      0.10          -0.06       1.00

There are some rules with high correlation, mostly momentum of similar speeds defined differently. And the mean reversion rule is obviously negatively correlated with the trend rules; whilst the skew and kurtosis rules are clearly doing something quite different.

Here are the Sharpe Ratios (using data pooled across instruments):

{'momentum4': 0.181, 'momentum64': 0.627, 'carry10': 0.623, 
'breakout10': -0.524, 'breakout160': 0.714, 'mrinasset160': -0.271, 
'relmomentum20': 0.058, 'assettrend32': 0.683, 'normmom32': 0.682, 
'relcarry': 0.062, 'skewabs90': 0.144, 'kurtS_abs30': -0.600}


Not all of these rules are profitable! That's because I didn't cherry pick rules which I know made money; I want the optimiser to decide - otherwise I'm doing implicit fitting.

As this exercise is quite time-consuming, I also used a subset of my full list of instruments, randomly picked mainly to see how well the clustering of groups worked (so there is quite a lot of fixed income, for example):

'AEX', 'AUD', 'SP500', 'BUND', "SHATZ",'BOBL','US10', 'US2','US5', 'EDOLLAR', 'CRUDE_W', 'GAS_US', 'CORN', 'WHEAT'



Fit weights for each individual instrument

Step one is to fit weights for each individual instrument. We'll use these for three different purposes:

  • To test instrument specific fitting
  • To decide what instruments to pool together for 'pool similar' fitting
  • To provide some of the weights to blend together for 'blended' weights


from systems.provided.futures_chapter15.basesystem import futures_system

## create the system first, then change its config before anything is calculated
system = futures_system()

system.config.forecast_weight_estimate['ceiling_cost_SR'] = 9999 # Set to 0.13 to get weights with speed limit
system.config.forecast_weight_estimate['pool_gross_returns'] = False
system.config.forecast_weight_estimate['equalise_SR'] = False
system.config.use_forecast_weight_estimates = True
system.config.instruments = ['AEX', 'AUD', 'SP500', 'BUND', 'SHATZ', 'BOBL', 'US10', 'US2', 'US5', 'EDOLLAR', 'CRUDE_W', 'GAS_US', 'CORN', 'WHEAT']

wts_dict = {}
for instrument in system.get_instrument_list():
    wts_dict[instrument] = system.combForecast.get_forecast_weights(instrument)


Get instrument groupings


The next stage is to decide which instruments to group together for fitting purposes. Now I could, as I said, do this by asset class. But it seems to make more sense to let the actual forecast weights tell me how they should be clustered, whilst also avoiding any implicit fitting through human selection of what constitutes an asset class. I'll use k-means clustering, which I also used for handcrafting. This takes the wts_dict we produced above as its argument (remember this is a dict of pandas DataFrames, one per instrument):

import pandas as pd
from sklearn.cluster import KMeans


def get_grouping_pd(wts_dict, n_clusters=4):
    all_wts_common_columns_as_dict = create_aligned_dict_of_weights(wts_dict)
    ## all aligned so can use a single index
    all_wts_as_list_common_index = list(all_wts_common_columns_as_dict.values())[0].index

    ## only cluster at around 40 evenly spaced dates, or we'll be here all day
    step = max(1, int(len(all_wts_as_list_common_index) / 40))
    sample_range = range(0, len(all_wts_as_list_common_index), step)
    list_of_groupings = [
        get_grouping_for_index_date(all_wts_common_columns_as_dict,
                                    index_number, n_clusters=n_clusters)
        for index_number in sample_range]

    pd_of_groupings = pd.DataFrame(list_of_groupings)
    date_index = [all_wts_as_list_common_index[idx] for idx in sample_range]
    pd_of_groupings.index = date_index

    return pd_of_groupings


def get_grouping_for_index_date(all_wts_common_columns_as_dict: dict,
                                index_number: int, n_clusters=4):
    print("Grouping for %d" % index_number)
    as_pd = get_df_of_weights_for_index_date(all_wts_common_columns_as_dict, index_number)
    results_as_dict = get_clusters_for_pd_of_weights(as_pd, n_clusters=n_clusters)

    print(results_as_dict)

    return results_as_dict


def get_df_of_weights_for_index_date(all_wts_common_columns_as_dict: dict,
                                     index_number: int):
    ## one row per instrument, one column per trading rule
    dict_for_index_date = dict()
    for instrument in all_wts_common_columns_as_dict.keys():
        wts_as_dict = dict(all_wts_common_columns_as_dict[instrument].iloc[index_number])
        wts_as_dict = dict([
            (str(key), float(value))
            for key, value in wts_as_dict.items()
        ])
        dict_for_index_date[instrument] = wts_as_dict
    as_pd = pd.DataFrame(dict_for_index_date)
    as_pd = as_pd.transpose()

    ## rules an instrument can't trade have no weight
    as_pd = as_pd.fillna(0.0)

    return as_pd


def get_clusters_for_pd_of_weights(as_pd, n_clusters=4):
    ## cluster ids are arbitrary labels; only which instruments share an id matters
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(as_pd)
    klabels = list(kmeans.labels_)
    row_names = list(as_pd.index)
    results_as_dict = dict([
        (instrument, cluster_id) for instrument, cluster_id in
        zip(row_names, klabels)
    ])

    return results_as_dict
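The code above calls create_aligned_dict_of_weights, which isn't shown in the post. The following is a hypothetical stand-in, assuming the helper only needs to give every instrument's weights the same columns (the union of all trading rules) and the same date index; the real helper may well differ.

```python
from functools import reduce

import pandas as pd


def create_aligned_dict_of_weights(wts_dict: dict) -> dict:
    """Hypothetical version of the helper used above.

    Gives every instrument's weights DataFrame the same columns (the union of
    all trading rules) and the same index (the union of all dates).
    """
    all_rules = reduce(lambda a, b: a.union(b), [df.columns for df in wts_dict.values()])
    all_dates = reduce(lambda a, b: a.union(b), [df.index for df in wts_dict.values()])
    aligned = {}
    for instrument, wts in wts_dict.items():
        wts = wts.reindex(columns=all_rules).reindex(all_dates).ffill()
        # rules an instrument doesn't use, and dates before its weights start, get zero weight
        aligned[instrument] = wts.fillna(0.0)
    return aligned


a = pd.DataFrame({"carry10": [0.5, 0.6]}, index=pd.to_datetime(["2020-01-01", "2020-02-03"]))
b = pd.DataFrame({"carry10": [0.3], "momentum64": [0.7]}, index=pd.to_datetime(["2020-01-15"]))
aligned = create_aligned_dict_of_weights({"A": a, "B": b})
print(aligned["A"])
```

Treating dates before an instrument's weights start as zero weights is a simplification; dropping those dates would be an equally reasonable choice.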

As an example, here are the groupings for the final month of data (I've done this particular fit with a subset of the trading rules to make the results easier to view):

all_wts_common_columns_as_dict = create_aligned_dict_of_weights(wts_dict)
get_grouping_for_index_date(all_wts_common_columns_as_dict, -1)
{'AEX': 3, 'AUD': 3, 'BOBL': 0, 'BUND': 0, 'CORN': 1, 'CRUDE_W': 3, 'EDOLLAR': 0,
'GAS_US': 3, 'SHATZ': 0, 'SP500': 0, 'US10': 0, 'US2': 2, 'US5': 0, 'WHEAT': 1}

There are four groups (I use 4 clusters throughout, a completely arbitrary decision that seems about right with 14 instruments):

- A bond group containing BOBL, BUND, EDOLLAR, SHATZ, US5 and US10; but curiously also SP500

- An Ags group: Corn and Wheat

- US 2 year

- The rest: Crude & Gas; AEX and AUD

These are close but not quite the same as asset classes (for which you'd have a bond group, an Ags group, Energies, and equities/currency). Let's have a look at the weights to see where these groups came from (remember I'm using a subset here):

get_df_of_weights_for_index_date(all_wts_common_columns_as_dict, -1).round(2)

         carry10  momentum16  momentum32  momentum4  momentum64  momentum8
AEX         0.39        0.06        0.08       0.31        0.14       0.02
AUD         0.42        0.16        0.10       0.03        0.11       0.19
CRUDE_W     0.40        0.15        0.15       0.05        0.13       0.12
GAS_US      0.42        0.09        0.08       0.12        0.11       0.18

         carry10  momentum16  momentum32  momentum4  momentum64  momentum8
CORN        0.17        0.28        0.23       0.00        0.12       0.20
WHEAT       0.28        0.18        0.23       0.00        0.24       0.07

         carry10  momentum16  momentum32  momentum4  momentum64  momentum8
US2         1.00        0.00        0.00       0.00        0.00       0.00

         carry10  momentum16  momentum32  momentum4  momentum64  momentum8
BOBL        0.66        0.10        0.11       0.00        0.13       0.00
BUND        0.64        0.06        0.13       0.01        0.13       0.03
EDOLLAR     0.67        0.00        0.19       0.00        0.14       0.00
SHATZ       0.79        0.00        0.00       0.00        0.21       0.00
SP500       0.64        0.08        0.10       0.04        0.11       0.04
US10        0.60        0.12        0.12       0.00        0.11       0.06
US5         0.56        0.12        0.13       0.00        0.13       0.06


It's a pretty convincing grouping, I think! The key difference between the groups is the amount of carry they have: a lot (bonds, S&P), a little (the Ags markets) or some (Energies, and markets beginning with the letter A). (Note that US 2 year can only trade carry in this particular run, which has the speed limit on rules switched on. The other rules are too expensive, due to US2's very low volatility. Shatz is a tiny bit cheaper and can also trade very slow momentum. This is enough to put it in the same group as the other bonds for now.)


Fit the system by group


Now we want to fit the system with data pooled for the groups we've just created. These weights will be used for:

  • To test group fitting
  • To provide weights to blend together

It would be straightforward (and in-sample cheating!) to use a static set of groups for fitting, but we want to use different groups for different time periods.

So the code here is a bit complicated, but it's here in this gist if you're interested.
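The gist isn't reproduced here, but the key step (working out which instruments to pool with at each point in time) can be sketched as below. This helper is hypothetical; it assumes the groupings DataFrame produced by get_grouping_pd above (rows are dates, columns are instruments, values are cluster ids) and only looks at groupings dated on or before the fitting date, so no future information leaks in.

```python
import pandas as pd


def peers_as_of(pd_of_groupings: pd.DataFrame, instrument: str, date) -> list:
    """Instruments in the same cluster as `instrument`, using the latest grouping
    dated on or before `date`.

    pd_of_groupings: rows are dates, columns are instruments, values are cluster ids.
    """
    past = pd_of_groupings.loc[pd_of_groupings.index <= date]
    if past.empty:
        return [instrument]  # no grouping available yet: fall back to the instrument alone
    latest = past.iloc[-1]
    # cluster ids are arbitrary per date, so only compare within the same row
    return sorted(latest[latest == latest[instrument]].index)


groupings = pd.DataFrame(
    [{"BUND": 0, "US10": 0, "CORN": 1}, {"BUND": 0, "US10": 1, "CORN": 1}],
    index=pd.to_datetime(["2010-01-01", "2015-01-01"]),
)
print(peers_as_of(groupings, "US10", pd.Timestamp("2012-06-30")))
```

The gross returns of the returned peers would then be pooled when fitting that instrument's weights for that period.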



Fit the entire system with everything pooled


Now for the final fitting, where we pool the gross returns of every instrument. The key configuration changes from the default are these two:

system.config.forecast_weight_estimate['pool_gross_returns'] = True # obviously!
system.config.forecast_weight_estimate['ceiling_cost_SR'] = 9999 # ensures all markets grouped

The removal of the speed limit (Sharpe Ratio ceiling) is key; otherwise the system will only pool returns for instruments with similar costs. Without the ceiling we'll pool gross returns across every instrument. I can modify the fitted weights to remove rules that exceed the SR ceiling in post-processing, when I want to look at the results with the speed limit included.
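That post-processing step can be sketched as: set the weight of any rule whose cost exceeds the ceiling to zero, then rescale the remaining weights so they sum to one again. The renormalisation, the function name and the example numbers are assumptions for illustration.

```python
import pandas as pd


def apply_speed_limit(weights: pd.Series, cost_sr: pd.Series, ceiling: float = 0.13) -> pd.Series:
    """Zero the weights of rules whose cost (in SR units) exceeds the ceiling, then renormalise."""
    too_expensive = cost_sr.reindex(weights.index) > ceiling  # rules with no cost estimate are kept
    limited = weights.where(~too_expensive, 0.0)
    total = limited.sum()
    if total <= 0:
        raise ValueError("no rules pass the speed limit")
    return limited / total


weights = pd.Series({"carry10": 0.5, "momentum4": 0.3, "momentum64": 0.2})
cost_sr = pd.Series({"carry10": 0.01, "momentum4": 0.20, "momentum64": 0.05})
print(apply_speed_limit(weights, cost_sr).round(3))
```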



Use a blended set of weights

Now for the final system: using a blended set of weights. I don't need to do any optimisation here, just take an average of:

  • The individual weights
  • The group fitted weights
  • Weights from results pooled across the entire system

I did originally think I'd do something funky here, perhaps using weights for the averaging which reflected e.g. the amount of data an individual instrument had (which would increase over time). But I decided to keep things simple and just take a simple average of all three weighting schemes. In any case the handcrafting method already accounts for the length of data when deciding how much faith to put in the SR estimates for a given instrument, so an instrument with less data would have weights that were less extreme anyway.
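A simple average of the three sets of weights might look like the sketch below. Rules missing from one set (for example because they were too expensive) count as zero, and the result is renormalised to sum to one; both choices are assumptions. The blended weights in the tables below are only roughly equal to such an average, so treat this as an illustration of the idea rather than the exact calculation. The example weights are made up.

```python
import pandas as pd


def blend_weights(*weight_sets: pd.Series) -> pd.Series:
    """Simple average of several sets of forecast weights, renormalised to sum to 1."""
    aligned = pd.concat(weight_sets, axis=1).fillna(0.0)  # a rule missing from one set counts as zero
    blended = aligned.mean(axis=1)
    return blended / blended.sum()


individual = pd.Series({"carry10": 0.6, "momentum64": 0.4})
group = pd.Series({"carry10": 0.3, "momentum64": 0.5, "mrinasset160": 0.2})
entire = pd.Series({"carry10": 0.3, "momentum64": 0.4, "mrinasset160": 0.3})
print(blend_weights(individual, group, entire).round(3))
```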


What do the weights look like?

To get a feel for the process, here are the weights for US 2 year (with the speed limit imposed, so only rules cheap enough to trade are included). As already noted this is an expensive instrument, so the use of a speed limit will reduce the number of rules it can actually trade (making the results more tractable). There are some noticeable effects; in particular slow intra asset mean reversion does very badly for US 2 year, but pretty well within its group and across the entire set of instruments.

                Ind  Group  Entire  Blend
assettrend32   0.24   0.24    0.23   0.24
breakout160    0.14   0.06    0.06   0.09
carry10        0.27   0.17    0.17   0.22
mrinasset160   0.00   0.24    0.32   0.14
normmom32      0.11   0.12    0.06   0.11
relcarry       0.24   0.16    0.16   0.20

Were we to look at the rule weightings for Shatz (another expensive short duration bond market), we'd find that the individual weights were different (again, look at mrinasset160), but the grouped and entire system weights would be very similar, since they are in the same group in this case. The exceptions are that Shatz has an extra rule that is too expensive for US2, and that the instrument costs are a little different:

                Ind  Group  Entire  Blend
assettrend32   0.17   0.23    0.21   0.20
breakout160    0.06   0.06    0.06   0.06
carry10        0.33   0.17    0.15   0.24
momentum64     0.06   0.06    0.11   0.06
mrinasset160   0.19   0.21    0.28   0.19
normmom32      0.06   0.12    0.05   0.10
relcarry       0.14   0.15    0.14   0.15

Similarly the rules for S&P 500 would be different again, both individually and for the group, but for the entire system they'd be fairly similar (except again, that SP500 has a few more rules it can trade, and is cheaper).

                Ind  Group  Entire  Blend
assettrend32   0.09   0.16    0.14   0.13
breakout10     0.02   0.04    0.03   0.03
breakout160    0.02   0.04    0.04   0.04
carry10        0.15   0.09    0.10   0.12
kurtS_abs30    0.18   0.17    0.05   0.18
momentum4      0.06   0.05    0.05   0.05
momentum64     0.02   0.09    0.07   0.06
mrinasset160   0.06   0.07    0.09   0.07
normmom32      0.04   0.04    0.04   0.04
relcarry       0.07   0.06    0.06   0.06
relmomentum20  0.10   0.07    0.07   0.08
skewabs90      0.17   0.12    0.27   0.13

In all three cases the 'blended' weights are (roughly) an average of the first three columns.


The results!


Remember we have eight possible schemes here:

  • Fitting individually
  • Fitting across groups
  • Fitting across everything
  • A blend of the above
... and each of these can be done with or without a 'speed limit' on costs (any trading rule with a Sharpe Ratio of costs above 0.13 will have its weight set to zero, regardless of what the fitted weights are). We also need a benchmark. Let's use equal weights, which will be hard to beat with the selection of rules we have (there's no unusual correlation structure, and no deliberately bad rules).

Let's just show raw Sharpe Ratios. 


                 All rules                     Speed limit
Individual          0.602                          0.545
Groups              0.656                          0.546
Everything          0.651                          0.587
Blend               0.656                          0.587
Equal wt.           0.657                          0.726

Now these are very similar Sharpe Ratios. We'd get more dramatic results if we used a crap, non robust, fitting method which didn't account for noise: something like Naive Markowitz for example. In this case we'd expect very poor results from the individual instrument weighting, and probably very good results from the blended method, with the other two methods coming somewhere between.

The first thing we notice is that using all rules is consistently better, after costs, than excluding expensive rules based on my 'speed limit' figure. Remember we'll still be giving those costly rules a lower weight for the relevant instruments because of their higher costs; but a rule that handily outperforms even its cost penalty will get a decent weight.

(The exception is for equal weights; if we just equally weight *all* trading rules, that will include some that are far too expensive to trade. Equally weighting only those that pass the speed limit is a great method, and beats everything else!)
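The winning benchmark is easy to express: give every rule that passes the speed limit the same weight, and give the rest zero. A minimal sketch, assuming each rule's cost for the instrument is already known in Sharpe Ratio units (the rule costs below are made up):

```python
def equal_weights_within_speed_limit(cost_sr: dict, ceiling: float = 0.13) -> dict:
    """Equal weights across the rules whose cost (SR units) is within the speed limit; zero otherwise."""
    cheap = [rule for rule, cost in cost_sr.items() if cost <= ceiling]
    if not cheap:
        raise ValueError("no rules pass the speed limit")
    return {rule: (1.0 / len(cheap) if rule in cheap else 0.0) for rule in cost_sr}


print(equal_weights_within_speed_limit({"carry10": 0.01, "breakout10": 0.25, "momentum64": 0.03}))
```

No optimisation is involved, so there is nothing to overfit; the only inputs are the rule list and the cost estimates.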

What is going on here? Let's look at the effects of costs on SR:

def net_costs(system):
    return system.accounts.portfolio().gross.sharpe() - system.accounts.portfolio().sharpe()


                 All rules                       Speed limit
Individual          0.120                           0.083
Groups              0.109                           0.085
Everything          0.096                           0.079
Blend               0.090                           0.083
Equal wt.           0.178                           0.074

Ignoring equal weights, removing the speed limit does increase costs a little, by roughly 1 to 4 SR basis points (hundredths of a Sharpe Ratio); versus an improvement in net performance of roughly 6 to 11 SR basis points (which means gross performance must have gone up by roughly 8 to 13 basis points).

(For equal weights we have an extra 10 basis points of costs, but only 3 basis points of gross return improvement; hence a net loss of 7 basis points of net return)
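The cost and performance changes discussed in the two paragraphs above can be recomputed directly from the two tables, using the fact that gross SR equals net SR plus the cost figure computed by net_costs. The figures below are copied from the tables; the function name is just for illustration.

```python
# (all rules, speed limit) figures copied from the two tables above
net_sr_table = {
    "Individual": (0.602, 0.545),
    "Groups": (0.656, 0.546),
    "Everything": (0.651, 0.587),
    "Blend": (0.656, 0.587),
    "Equal wt.": (0.657, 0.726),
}
cost_sr_table = {
    "Individual": (0.120, 0.083),
    "Groups": (0.109, 0.085),
    "Everything": (0.096, 0.079),
    "Blend": (0.090, 0.083),
    "Equal wt.": (0.178, 0.074),
}


def effect_of_removing_speed_limit(net_sr, cost_sr):
    """Change in net SR, costs and implied gross SR, in SR basis points (0.01 SR)."""
    effects = {}
    for scheme, (net_all, net_limited) in net_sr.items():
        cost_all, cost_limited = cost_sr[scheme]
        d_net = net_all - net_limited
        d_cost = cost_all - cost_limited
        # gross = net + costs, so the change in gross is the sum of the two changes
        effects[scheme] = {
            "net": round(100 * d_net, 1),
            "cost": round(100 * d_cost, 1),
            "gross": round(100 * (d_net + d_cost), 1),
        }
    return effects


for scheme, effect in effect_of_removing_speed_limit(net_sr_table, cost_sr_table).items():
    print(scheme, effect)
```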

The next thing we notice is that pooling across groups and pooling across everything is better than fitting on an individual instrument (the ranking is slightly different, depending on whether we are using the speed limit or not). Blending weights together does about as well as any other option. Equal weights is as good as or better than that.

It doesn't surprise me that pooling across everything is better than fitting by instrument; that was my original opinion. Pooling across groups is equally good; and in fact with more instruments in the portfolio I'd expect the two to end up pretty similar. What might be surprising is that pooling across groups doesn't help much when we only choose cheap rules. But think about how we formed our groups; we clustered things that had similar weights together; with the speed limit these are things that are likely to have the same level of costs. 

It isn't surprising that blended weights do at least as well as every other fitting method, as it's a well known effect that averaging weights generally improves robustness and therefore out-of-sample performance. Nor is it surprising that equal weights does so well; although it wouldn't look as good with a more esoteric set of trading rules (including ones I hadn't already pre-selected as profitable).


Summary

To an extent this kind of post is a bit pointless, as trying to improve your optimisation technique is a time-sink that will not result in any serious improvement in performance - though it might result in more robustness. Still it's an itch I had to scratch, and I got to play with my favourite ML tool - clustering.

Why is it pointless? Well, it doesn't matter so much what data you use to find your trading rule portfolio weights if you're already using a robust method like handcrafting for fitting. The robust method will mostly correct for anything you do that is stupid.

Having said that a clearly stupid thing to do is to fit weights for each instrument individually - there just isn't enough meaningful information in a single market. Fitting by grouped instruments will make things more robust and also probably improve performance. By the way in my full portfolio I'd probably use more clusters since I have a lot more instruments.

Fitting across the entire portfolio seems to do okay here; but I can't help thinking there are situations in which different instruments will behave differently; I'm thinking for example of a set of rules that includes both slower momentum and faster mean reversion, where the boundary between one and the other working is fluid and depends on the instrument involved (there is some of that in the rules I've got, but not much).

Using a blend of weights is a cop-out if you can't decide what is best, which has other advantages: any kind of averaging makes things more robust. The bad news is that there is quite a lot of work involved here to get the blended weights. A compromise would be to use an average of individual instrument weights and weights fitted across the entire portfolio; this will speed things up a lot as it is estimating the grouped weights that really slows you down.

A more interesting question - and a more surprising result - is whether I should stick to using my 'speed limit' concept. Given most of the rules I trade are fairly slow anyway, it might be worth increasing it a little. This will be especially true if I change my system to one that directly optimises positions in the presence of costs, rather than just buffering.

Finally if you're dealing with a list of trading rules that you know work fairly well across instruments, and you've filtered out those that are too expensive, and the correlation structure is fairly regular: You'd be mad not to use equal weights. 

10 comments:

  1. Hi Rob,

    Great post as always! With regard to your last point, I was wondering what you would consider to be a 'regular' correlation structure?

  2. Rob, probably a dumb question. When you go through this process, you are weighting the various rules to produce ONE signal for each instrument, right? Not running all the rules individually and having the weights determine the allocation in the overall portfolio, correct?

    1. ". When you go through this process, you are weighting the various rules to produce ONE signal for each instrument, right?"

      Right.

  3. Hi Rob.

    Thanks for addressing my question about conditioning carry forecasts based on asset class on your Dec 11th TTU episode. In it, you said that your "realised carry" stat for asset classes ranged from:
    - equities, bonds, vol: >100%
    - fx, metals: ~100%
    - energy: 77%
    - ag: 46%

    I understand your concern that doing anything with these results is overfitting, but I think something is going on here. Namely, asset classes which you'd expect to have more variation in their carry forecasts due to seasonality and storage costs have lower realised carry.

    It's been my experience trading your system so far that I tend to get the strongest carry forecasts in ags and the weakest in equities (disclaimer, less than 1 year!).

    Do you know the historical carry forecasts by asset class? My guess is that they might resemble the inverse of the realised carry result. In which case, we'd be systematically betting on carry the most where our forecasts are historically weakest, and vice versa.

    Conditioning each carry forecast to that asset class's history could weight the carry rule more evenly across asset classes.

    What do you think?

    Richard

    1. These figures are actually in my new book :-) But 'weighting the carry rule more evenly across asset classes' is IMHO a bad idea.

    2. Great, it can't arrive soon enough!

      I conjecture that your system is currently weighting the carry rule across asset classes the wrong way: betting biggest where the carry rule performs the worst (ags), and betting smallest where it performs the best (equities).

      It's like you're listening to the loudest voice (which has a history of exaggeration) rather than the quietest voice (which has a history of understatement).

      Does this not concern you?

    3. Actually generally it makes sense to allow asset classes with more carry forecast to bet more, although yes generally financial assets deliver more than they promise. There is more in the book but what you propose smells like overfitting sorry. But I will look at in a future blog post.

    4. Book arrived (yay!) and the first thing was to find the historical carry forecasts for each asset class. Following your lead and ignoring vol as it only has 2 instruments, we have:

      Asset class [avg absolute carry forecast, % carry realised]

      Equities [0.17, 170%]
      Metals [0.24, 115%]
      FX [0.44, 109%]
      Energy [0.72, 77%]
      Bonds [0.73, 142%]
      Ags [0.82, 46%]

      It looks like the relationship does exist. That is, asset classes with more variation in carry forecast realise less of their carry. You could even consider the one outlier in that list (bonds) a special case with the secular bull market inflating their realised carry.

      I was surprised to see footnote 163: "If we had used a different estimate of the forecast scalar for each asset class, then this linkage between promised and realised carry would be broken. We would have a larger forecast scalar, and hence a higher standard deviation, in asset classes where carry was systematically low such as equities. We would be taking on more risk with very poor rewards for our efforts."

      I don't understand the last sentence. It appears that we would be rewarded handsomely for taking more risk in equity carry and less risk in energy & ags carry.

      Whether it's overfitting or not is of course a separate issue, but I did make this hypothesis before seeing the data!

      Interested to hear your thoughts Rob.

