I've long been a critic of the sort of people who think you should run a different trading system for each instrument you trade. It's the sort of thing that makes intuitive sense; surely the S&P 500 is a completely different animal to the Corn future? That's probably true for high frequency traders, but not at the sort of timescales I tend to trade over (holding periods of a couple of weeks up to a couple of months). At those horizons I'm using rules that I expect to work on pretty much any instrument I trade, and to perform consistently over long periods of time.
So I've generally advocated pooling information across markets when fitting. My preferred method is to pool gross returns, then apply the costs for each individual instrument, so more expensive instruments will end up trading slower; otherwise everything will look pretty similar.
But... might instrument specific fitting actually work? Or even if that doesn't work, what about pooling together information for similar instruments? Or.... is there some way of getting the best out of all three worlds here: using a blend of instrument specific, globally pooled, and similarity pooled information?
Let's find out.
What exactly is wrong with fitting by instrument?
Let's think about a simple momentum system, where the combined forecast is a weighted average of N different trend signals, each with a different speed. These could be moving average crossovers of varying lengths, or breakouts with varying window sizes. The only fitting that can be done in this kind of system is to allocate risk weightings differently to different speeds of momentum. Naturally this is a deliberate design decision to avoid 'free-form' fitting of large numbers of parameters, and to reduce the issue to a portfolio optimisation problem (which is relatively well understood) with just N-1 degrees of freedom.
The decision we have to make is this: What forecast weights should a given instrument have?
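To make the combination step concrete, here's a minimal sketch (the forecast series and names are hypothetical; in pysystemtrade this is all handled by the combForecast stage, including a forecast diversification multiplier):

import pandas as pd

def combined_forecast(forecasts: dict, weights: dict, fdm: float = 1.0) -> pd.Series:
    # forecasts: rule name -> forecast series, each already scaled to the
    # same expected absolute size; weights: rule name -> forecast weight,
    # summing to one; fdm: forecast diversification multiplier
    combined = sum(weights[rule] * forecasts[rule] for rule in weights)
    return fdm * combined

# e.g. combined_forecast(dict(momentum16=f16, momentum64=f64),
#                        dict(momentum16=0.6, momentum64=0.4), fdm=1.1)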
Important note: my trading systems are carefully designed to abstract away any differences in instruments, mostly by the use of risk scaling or risk normalisation. Thus we don't need to estimate or re-estimate 'magic numbers' for each instrument, or calibrate them separately to account for differences in volatility. Similarly, forecasts from each trading rule are normalised to have the same expected risk, so there are no magic numbers required here either. This is done automatically by the use of forecast scalars and risk normalisation.
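As a rough sketch of what that normalisation looks like (in my framework raw forecasts are multiplied by a scalar so they average 10 in absolute value, then capped at +/-20; in production the scalar is estimated more carefully, e.g. pooled across instruments and on an expanding window):

import pandas as pd

def scale_and_cap_forecast(raw_forecast: pd.Series,
                           target_abs: float = 10.0,
                           cap: float = 20.0) -> pd.Series:
    # Multiply by a forecast scalar so the average absolute forecast
    # hits the target, then cap to limit extreme positions
    forecast_scalar = target_abs / raw_forecast.abs().mean()
    return (raw_forecast * forecast_scalar).clip(-cap, cap)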
In a simple portfolio optimisation where all assets have the same expected volatility, what matters in determining the weights is basically correlation and relative Sharpe Ratio (equivalent to relative mean, given the identical volatilities).
But it turns out that when you analyse the correlations across trading rules for different instruments, you get very similar results.
(There's chunks of pysystemtrade code scattered throughout this post, but hopefully the general approach will be applicable to your own trading system. You may find it helpful to read my posts on optimising with costs, and my preferred optimisation method, handcrafting)
def corr_from(system, instrument):
    y = system.combForecast.calculation_of_raw_estimated_monthly_forecast_weights(instrument)
    return y.optimiser_over_time.optimiser.calculate_correlation_matrix_for_period(
        y.optimiser_over_time.fit_dates[-1]).as_pd().round(2)

corr_from(system, "CORN")
momentum16 momentum32 momentum4 momentum64 momentum8
momentum16 1.00 0.88 0.65 0.61 0.89
momentum32 0.88 1.00 0.41 0.88 0.64
momentum4 0.65 0.41 1.00 0.21 0.89
momentum64 0.61 0.88 0.21 1.00 0.37
momentum8 0.89 0.64 0.89 0.37 1.00
corr_from(system, "SP500")
momentum16 momentum32 momentum4 momentum64 momentum8
momentum16 1.00 0.92 0.60 0.79 0.90
momentum32 0.92 1.00 0.40 0.94 0.71
momentum4 0.60 0.40 1.00 0.29 0.85
momentum64 0.79 0.94 0.29 1.00 0.57
momentum8 0.90 0.71 0.85 0.57 1.00
We can see that the results are fairly similar: in fact they'd result in very similar weights (all other things being equal).
This is partly because my handcrafting method is robust to correlation differences that aren't significant, but even a vanilla MVO wouldn't result in radically different weights. In fact I advocate using artificial data to estimate the correlations for momentum rules of different speeds, since that gives a result which is robust but still accurate.
(Things are a bit different for carry and other more exotic trading rules, but I'll be bringing those in later)
What about Sharpe Ratio? Well there are indeed some differences....
import numpy as np

def SR_from(system, instrument):
    y = system.combForecast.calculation_of_raw_estimated_monthly_forecast_weights(instrument)
    fit_period = y.optimiser_over_time.fit_dates[-1]
    std = np.mean(list(y.optimiser_over_time.optimiser.calculate_stdev_for_period(fit_period).values()))
    means = y.optimiser_over_time.optimiser.calculate_mean_for_period(fit_period)
    SR = dict([
        (key, round(mean / std, 3)) for key, mean in means.items()
    ])
    return SR
SR_from(system, "CORN")
{'momentum16': 0.39, 'momentum32': 0.296, 'momentum4': -0.25, 'momentum64': 0.102, 'momentum8': 0.206}
SR_from(system, "SP500")
{'momentum16': 0.147, 'momentum32': 0.29, 'momentum4': -0.207, 'momentum64': 0.359, 'momentum8': -0.003}
We can see the well known effect that faster momentum isn't much cop for equity indices, as well as some other differences.
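One thing to bear in mind when eyeballing these numbers is the sampling error on a SR estimate. A rough sketch, using the standard Lo (2002) approximation (strictly this is for a single SR; the error on the difference between two correlated SRs will be somewhat smaller):

import numpy as np

def sharpe_standard_error(sr: float, n_years: float) -> float:
    # Approximate standard error of an annualised SR estimate,
    # following Lo (2002): SE ~ sqrt((1 + SR^2 / 2) / T)
    return np.sqrt((1 + 0.5 * sr ** 2) / n_years)

# With around 40 years of data the SE on a single SR is roughly 0.16,
# so many of the differences above are within the noise
print(sharpe_standard_error(0.3, 40))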
But are they significant differences? Are they significant enough that we should use them in determining what weights to use? Here are the forecast weights with no pooling of gross returns for each instrument:
system.config.forecast_weight_estimate['pool_gross_returns'] = False
system.combForecast.get_forecast_weights("CORN").iloc[-1].round(2)
momentum16 0.39
momentum4 0.00
momentum8 0.13
momentum64 0.16
momentum32 0.32

system.combForecast.get_forecast_weights("SP500").iloc[-1].round(2)
momentum16 0.22
momentum4 0.00
momentum8 0.08
momentum64 0.36
momentum32 0.33
The weights are certainly a bit different, although my use of a robust optimisation process (handcrafting) means they're not that crazy. Or maybe it makes more sense to pool our results:
system.config.forecast_weight_estimate['pool_gross_returns'] = True
system.combForecast.get_forecast_weights("CORN").iloc[-1].round(2)
momentum16 0.21
momentum4 0.00
momentum8 0.11
momentum64 0.30
momentum32 0.38
system.combForecast.get_forecast_weights("SP500").iloc[-1].round(2)
momentum16 0.22
momentum4 0.01
momentum8 0.16
momentum64 0.24
momentum32 0.37
(The small differences here are because we're still using the specific costs for each instrument - it's only gross returns that we pool).
There is a tension here: We want more data to get robust fitting results (which implies pooling across instruments is the way to go) and yet we want to account for idiosyncratic differences in performance between instruments (which implies not pooling).
At the moment there is just a binary choice: we either pool gross returns, or we don't (we could also pool costs, and hence net returns, but to me that doesn't make a lot of sense - I think the costs of an instrument should determine how it is traded).
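As a toy sketch of what 'pool gross, don't pool costs' means (the real optimisation works on return series rather than summary SRs, and the helper names here are made up):

def net_sr_by_rule(pooled_gross_sr: dict,
                   cost_sr_this_instrument: dict,
                   cost_multiplier: float = 2.0) -> dict:
    # Gross SR per rule is estimated on returns pooled across instruments;
    # costs are this instrument's own, scaled up by a penalty multiplier,
    # so expensive instruments get pushed towards slower rules
    return dict([
        (rule, pooled_gross_sr[rule]
               - cost_multiplier * cost_sr_this_instrument[rule])
        for rule in pooled_gross_sr.keys()
    ])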
And the question is more complex again, because which instruments should we pool across? Maybe it would make more sense to pool across instruments within the same asset class? This is effectively what was done at AHL when I worked there, since we ran separate teams for each asset class (I was head of fixed income), and each team fitted their own models (what they do now, I dunno - probably some fancy machine learning nonsense). Or should we pool across everything, regardless of costs?
Really, we have three obvious alternatives:
- Fit by instrument, reflecting the idiosyncratic nature of each instrument
- Fit with information pooled across similar instruments (same asset class, perhaps?)
- Fit with information pooled across all instruments
So the point of this post is to test these alternatives out. But I also want to try something else: a method which uses a blend of all three. In this post I develop a methodology to produce this kind of 'blended weights' (not a catchy name - suggestions are welcome!).
A brief interlude: The speed limit
In my first book I introduced the idea of a 'speed limit' on costs, measured in annualised risk adjusted terms (so effectively a Sharpe Ratio). The idea is that, on a per instrument, per trading rule basis, it's unlikely (without overfitting) that you will get an average SR before costs of more than about 0.40, and you wouldn't want to spend more than a third of that on costs (about 0.13). Therefore it makes no sense to include any trading rules which breach this limit for a given instrument (which will happen if they trade too quickly, and the instrument concerned is relatively expensive to trade).
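In code, the speed limit is just a comparison in SR units (a sketch with hypothetical inputs: turnover in round trips per year, and the cost per round trip in annualised SR units):

SPEED_LIMIT_SR = 0.40 / 3.0   # roughly 0.13 SR units per year

def passes_speed_limit(turnover_per_year: float,
                       cost_per_trade_sr: float) -> bool:
    # Annualised risk adjusted cost of trading this rule on this instrument
    annual_cost_sr = turnover_per_year * cost_per_trade_sr
    return annual_cost_sr <= SPEED_LIMIT_SR

# e.g. a rule turning over 40 times a year on an instrument costing
# 0.01 SR units per trade spends 0.40 SR units a year: well over the limit
passes_speed_limit(40, 0.01)   # False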
Now whilst I do like the idea of the speed limit, one could argue that it is unduly conservative. For starters, expensive rules are going to be penalised anyway, since I optimise on after-costs returns and take SR into account when deciding the correct weights to use. In fact they get penalised twice, since I include a scaling factor of 2.0 on all costs when optimising. Secondly, a fast rule might not affect the turnover of the entire system much once added to a bunch of slower rules, especially if it has some diversifying effects. Thirdly, I apply buffering to the final position for a given instrument, which reduces turnover and thus costs anyway, so the marginal effect of allocating to a faster rule might be very small.
It turns out that this question of whether to apply the speed limit is pretty important. It will result in different individually fitted instrument weights, different asset groupings, and different results. For this reason I'll be running the results both with, and without the speed limit. And of course I'll be checking what effect this difference has on the pre-cost and after-costs SR.
The setup
I used the following trading rules:
- 'momentum4' EWMAC 4,16
- 'momentum64' EWMAC 64,256
- 'carry10' Carry with a 10 day smooth
- 'breakout10' Breakout 10 day window
- 'breakout160' Breakout 160 day window
- 'mrinasset160' Mean reversion within asset classes, 160 day window
- 'relmomentum20' Cross sectional momentum within asset class, 20 day window
- 'assettrend32' Momentum for asset class, EWMAC 32,64
- 'normmom32' Normalised momentum EWMAC 32, 64
- 'relcarry' Relative carry within asset classes
- 'skewabs90' Skew 90 day window
- 'kurtS_abs30' Kurtosis conditioned on skew 30 day window
Here are the correlations between the returns of these rules:
assettrend32 breakout10 breakout160 carry10 kurtS_abs30 momentum4
assettrend32 1.00 0.16 0.75 0.29 -0.04 0.29
breakout10 0.16 1.00 0.18 0.08 -0.01 0.82
breakout160 0.75 0.18 1.00 0.37 -0.04 0.35
carry10 0.29 0.08 0.37 1.00 -0.05 0.12
kurtS_abs30 -0.04 -0.01 -0.04 -0.05 1.00 -0.02
momentum4 0.29 0.82 0.35 0.12 -0.02 1.00
momentum64 0.73 0.15 0.89 0.46 -0.03 0.28
mrinasset160 0.02 -0.05 -0.38 -0.11 0.03 -0.11
normmom32 0.80 0.18 0.89 0.33 -0.04 0.34
relcarry 0.04 0.00 0.19 0.63 -0.02 0.02
relmomentum20 0.02 0.25 0.19 0.05 0.01 0.42
skewabs90 -0.02 0.01 0.02 0.11 -0.06 -0.03
momentum64 mrinasset160 normmom32 relcarry relmomentum20 skewabs90
assettrend32 0.73 0.02 0.80 0.04 0.02 -0.02
breakout10 0.15 -0.05 0.18 0.00 0.25 0.01
breakout160 0.89 -0.38 0.89 0.19 0.19 0.02
carry10 0.46 -0.11 0.33 0.63 0.05 0.11
kurtS_abs30 -0.03 0.03 -0.04 -0.02 0.01 -0.06
momentum4 0.28 -0.11 0.34 0.02 0.42 -0.03
momentum64 1.00 -0.41 0.87 0.25 0.16 0.08
mrinasset160 -0.41 1.00 -0.45 -0.19 -0.26 -0.01
normmom32 0.87 -0.45 1.00 0.13 0.22 -0.03
relcarry 0.25 -0.19 0.13 1.00 0.03 0.10
relmomentum20 0.16 -0.26 0.22 0.03 1.00 -0.06
skewabs90 0.08 -0.01 -0.03 0.10 -0.06 1.00
There are some rules with high correlation, mostly momentum of similar speeds defined differently. And the mean reversion rule is obviously negatively correlated with the trend rules; whilst the skew and kurtosis rules are clearly doing something quite different.
Here are the Sharpe Ratios (using data pooled across instruments):
{'momentum4': 0.181, 'momentum64': 0.627, 'carry10': 0.623,
'breakout10': -0.524, 'breakout160': 0.714, 'mrinasset160': -0.271,
'relmomentum20': 0.058, 'assettrend32': 0.683, 'normmom32': 0.682,
'relcarry': 0.062, 'skewabs90': 0.144, 'kurtS_abs30': -0.600}
Not all of these rules are profitable! That's because I didn't cherry pick rules which I know made money; I want the optimiser to decide - otherwise I'm doing implicit fitting.
As this exercise is quite time consuming, I also used a subset of my full list of instruments, randomly picked, mainly to see how well the clustering into groups worked (so there is quite a lot of fixed income, for example):
'AEX', 'AUD', 'SP500', 'BUND', 'SHATZ', 'BOBL', 'US10', 'US2', 'US5', 'EDOLLAR', 'CRUDE_W', 'GAS_US', 'CORN', 'WHEAT'
Fit weights for individual instruments
Step one is to fit weights for each individual instrument. We'll use these for three different purposes:
- To test instrument specific fitting
- To decide what instruments to pool together for 'pool similar' fitting
- To provide some of the weights to blend together for 'blended' weights
# create the system first, then override the config before anything is calculated
system = futures_system()
system.config.use_forecast_weight_estimates = True
system.config.forecast_weight_estimate['ceiling_cost_SR'] = 9999 # Set to 0.13 to get weights with speed limit
system.config.forecast_weight_estimate['pool_gross_returns'] = False
system.config.forecast_weight_estimate['equalise_SR'] = False
system.config.instruments = ['AEX', 'AUD', 'SP500', 'BUND', 'SHATZ', 'BOBL',
                             'US10', 'US2', 'US5', 'EDOLLAR', 'CRUDE_W',
                             'GAS_US', 'CORN', 'WHEAT']

wts_dict = {}
for instrument in system.get_instrument_list():
    wts_dict[instrument] = system.combForecast.get_forecast_weights(instrument)
Get instrument groupings
The next stage is to decide which instruments to group together for fitting purposes. Now I could, as I said, do this by asset class. But it seems to make more sense to let the actual forecast weights tell me how they should be clustered, whilst also avoiding any implicit fitting through human selection of what constitutes an asset class. I'll use k-means clustering, which I also used for handcrafting. This takes the wts_dict we produced above as its argument (remember this is a dict of pandas DataFrames, one per instrument):
import pandas as pd
from sklearn.cluster import KMeans

def get_grouping_pd(wts_dict, n_clusters=4):
    # create_aligned_dict_of_weights (not shown) aligns every instrument's
    # weights to a common set of columns and a common date index
    all_wts_common_columns_as_dict = create_aligned_dict_of_weights(wts_dict)

    ## all aligned so can use a single index
    all_wts_as_list_common_index = list(all_wts_common_columns_as_dict.values())[0].index

    ## weights are monthly; subsample to roughly annual points or we'll be here all day
    annual_range = range(0, len(all_wts_as_list_common_index),
                         int(len(all_wts_as_list_common_index) / 40))

    list_of_groupings = [
        get_grouping_for_index_date(all_wts_common_columns_as_dict,
                                    index_number, n_clusters=n_clusters)
        for index_number in annual_range]

    pd_of_groupings = pd.DataFrame(list_of_groupings)
    date_index = [all_wts_as_list_common_index[idx] for idx in annual_range]
    pd_of_groupings.index = date_index

    return pd_of_groupings

def get_grouping_for_index_date(all_wts_common_columns_as_dict: dict,
                                index_number: int, n_clusters=4):
    print("Grouping for %d" % index_number)
    as_pd = get_df_of_weights_for_index_date(all_wts_common_columns_as_dict,
                                             index_number)
    results_as_dict = get_clusters_for_pd_of_weights(as_pd, n_clusters=n_clusters)
    print(results_as_dict)

    return results_as_dict

def get_df_of_weights_for_index_date(all_wts_common_columns_as_dict: dict,
                                     index_number: int):
    dict_for_index_date = dict()
    for instrument in all_wts_common_columns_as_dict.keys():
        wts_as_dict = dict(all_wts_common_columns_as_dict[instrument].iloc[index_number])
        wts_as_dict = dict([
            (str(key), float(value))
            for key, value in wts_as_dict.items()
        ])
        dict_for_index_date[instrument] = wts_as_dict

    as_pd = pd.DataFrame(dict_for_index_date)
    as_pd = as_pd.transpose()
    as_pd[as_pd.isna()] = 0.0

    return as_pd

def get_clusters_for_pd_of_weights(as_pd, n_clusters=4):
    kmeans = KMeans(n_clusters=n_clusters).fit(as_pd)
    klabels = list(kmeans.labels_)
    row_names = list(as_pd.index)
    results_as_dict = dict([
        (instrument, cluster_id) for instrument, cluster_id in
        zip(row_names, klabels)
    ])

    return results_as_dict
As an example, here are the groupings for the final month of data (I've done this particular fit with a subset of the trading rules to make the results easier to view):
get_grouping_for_index_date(all_wts_common_columns_as_dict, -1)
{'AEX': 3, 'AUD': 3, 'BOBL': 0, 'BUND': 0, 'CORN': 1, 'CRUDE_W': 3, 'EDOLLAR': 0,
'GAS_US': 3, 'SHATZ': 0, 'SP500': 0, 'US10': 0, 'US2': 2, 'US5': 0, 'WHEAT': 1}
There are four groups (I use 4 clusters throughout, a completely arbitrary decision that seems about right with 14 instruments):
- A bond group containing BOBL, BUND, EDOLLAR, SHATZ, US5 and US10; but curiously also SP500
- An Ags group: Corn and Wheat
- US 2 year
- The rest: Crude & Gas; AEX and AUD
These are close but not quite the same as asset classes (for which you'd have a bond group, an Ags group, Energies, and equities/currency). Let's have a look at the weights to see where these groups came from (remember I'm using a subset here):
get_df_of_weights_for_index_date(all_wts_common_columns_as_dict, -1).round(2)
carry10 momentum16 momentum32 momentum4 momentum64 momentum8
AEX 0.39 0.06 0.08 0.31 0.14 0.02
AUD 0.42 0.16 0.10 0.03 0.11 0.19
CRUDE_W 0.40 0.15 0.15 0.05 0.13 0.12
GAS_US 0.42 0.09 0.08 0.12 0.11 0.18
carry10 momentum16 momentum32 momentum4 momentum64 momentum8
CORN 0.17 0.28 0.23 0.00 0.12 0.20
WHEAT 0.28 0.18 0.23 0.00 0.24 0.07
carry10 momentum16 momentum32 momentum4 momentum64 momentum8
US2 1.00 0.00 0.00 0.00 0.00 0.00
carry10 momentum16 momentum32 momentum4 momentum64 momentum8
BOBL 0.66 0.10 0.11 0.00 0.13 0.00
BUND 0.64 0.06 0.13 0.01 0.13 0.03
EDOLLAR 0.67 0.00 0.19 0.00 0.14 0.00
SHATZ 0.79 0.00 0.00 0.00 0.21 0.00
SP500 0.64 0.08 0.10 0.04 0.11 0.04
US10 0.60 0.12 0.12 0.00 0.11 0.06
US5 0.56 0.12 0.13 0.00 0.13 0.06
It's a pretty convincing grouping, I think! The key difference between the groups is the amount of carry weight that they have: a lot (bonds, S&P), a little (the Ags markets) or some (Energies and markets beginning with the letter A). (Note that US 2 year can only trade carry in this particular run - which is with the speed limit on rules - as the other rules are too expensive, due to US2's very low volatility. SHATZ is a tiny bit cheaper and can also trade very slow momentum; that's enough to put it in the same group as the other bonds for now.)
Fit the system by group
Next, we re-fit with gross returns pooled across the instruments within each group found above - a sketch of one way to do this follows below. We'll use these group-fitted weights for two purposes:
- To test group fitting
- To provide weights to blend together
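A sketch of the fitting step in pysystemtrade terms: re-run the system once per group, restricted to that group's instruments, with gross returns pooled. (This uses a single static grouping for simplicity - grouping_as_dict as produced by get_grouping_for_index_date above - whereas the groupings are actually re-estimated over time, and exact config handling may differ.)

from collections import defaultdict

# invert {instrument: cluster_id} into {cluster_id: [instruments]}
groups = defaultdict(list)
for instrument, cluster_id in grouping_as_dict.items():
    groups[cluster_id].append(instrument)

group_wts_dict = {}
for cluster_id, instruments_in_group in groups.items():
    group_system = futures_system()
    group_system.config.use_forecast_weight_estimates = True
    group_system.config.forecast_weight_estimate['pool_gross_returns'] = True
    group_system.config.instruments = instruments_in_group
    for instrument in instruments_in_group:
        group_wts_dict[instrument] = group_system.combForecast.get_forecast_weights(instrument)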
Fit the entire system with everything pooled
The third alternative pools gross returns across every instrument:
system.config.forecast_weight_estimate['pool_gross_returns'] = True # obviously!
system.config.forecast_weight_estimate['ceiling_cost_SR'] = 9999 # ensures all markets grouped
Use a blended set of weights
The blended weights are an average of three sets of weights (a sketch follows the list):
- The individual instrument weights
- The group fitted weights
- Weights from results pooled across the entire system
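A sketch of how the blending could work, assuming a simple equal-weighted average of the three sets (the actual blend could weight the three sources differently; here the three frames are assumed to share a date index):

import pandas as pd

def blend_forecast_weights(ind_wts: pd.DataFrame,
                           group_wts: pd.DataFrame,
                           pooled_wts: pd.DataFrame) -> pd.DataFrame:
    # Align rule columns across the three sources; a rule missing
    # from one source counts as zero weight there
    all_cols = ind_wts.columns.union(group_wts.columns).union(pooled_wts.columns)
    aligned = [wts.reindex(columns=all_cols).fillna(0.0)
               for wts in (ind_wts, group_wts, pooled_wts)]

    # Equal-weighted average of the three sources
    blended = (aligned[0] + aligned[1] + aligned[2]) / 3.0

    # Renormalise so the weights sum to one at each date
    return blended.div(blended.sum(axis=1), axis=0)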
What do the weights look like?
Ind Group Entire Blend
assettrend32 0.24 0.24 0.23 0.24
breakout160 0.14 0.06 0.06 0.09
carry10 0.27 0.17 0.17 0.22
mrinasset160 0.00 0.24 0.32 0.14
normmom32 0.11 0.12 0.06 0.11
relcarry 0.24 0.16 0.16 0.20
Ind Group Entire Blend
assettrend32 0.17 0.23 0.21 0.20
breakout160 0.06 0.06 0.06 0.06
carry10 0.33 0.17 0.15 0.24
momentum64 0.06 0.06 0.11 0.06
mrinasset160 0.19 0.21 0.28 0.19
normmom32 0.06 0.12 0.05 0.10
relcarry 0.14 0.15 0.14 0.15
Ind Group Entire Blend
assettrend32 0.09 0.16 0.14 0.13
breakout10 0.02 0.04 0.03 0.03
breakout160 0.02 0.04 0.04 0.04
carry10 0.15 0.09 0.10 0.12
kurtS_abs30 0.18 0.17 0.05 0.18
momentum4 0.06 0.05 0.05 0.05
momentum64 0.02 0.09 0.07 0.06
mrinasset160 0.06 0.07 0.09 0.07
normmom32 0.04 0.04 0.04 0.04
relcarry 0.07 0.06 0.06 0.06
relmomentum20 0.10 0.07 0.07 0.08
skewabs90 0.17 0.12 0.27 0.13
The results!
I compare the four alternatives:
- Fitting individually
- Fitting across groups
- Fitting across everything
- A blend of the above
Costs in SR units are measured as the difference between gross and net SR:

def net_costs(system):
    return system.accounts.portfolio().gross.sharpe() - system.accounts.portfolio().sharpe()
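Usage would be along these lines (assuming a hypothetical dict systems, with one fitted system per method):

for name, fitted_system in systems.items():
    acc = fitted_system.accounts.portfolio()
    print("%s: net SR %.3f, costs %.3f SR units" %
          (name, acc.sharpe(), net_costs(fitted_system)))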
[Results table: gross and net SRs for each of the four fitting methods, run both with all rules and with the speed limit applied]
Comments

Hi Rob,
Great post as always! With regard to your last point, I was wondering what you would consider to be a 'regular' correlation structure?
All the off diagonals are equal.
Rob, probably a dumb question. When you go through this process, you are weighting the various rules to produce ONE signal for each instrument, right? Not running all the rules individually and having the weights determine the allocation in the overall portfolio, correct?
ReplyDelete". When you go through this process, you are weighting the various rules to produce ONE signal for each instrument, right?"
DeleteRight.
Hi Rob.
Thanks for addressing my question about conditioning carry forecasts based on asset class on your Dec 11th TTU episode. In it, you said that your "realised carry" stat for asset classes ranged from:
- equities, bonds, vol: >100%
- fx, metals: ~100%
- energy: 77%
- ag: 46%
I understand your concern that doing anything with these results is overfitting, but I think something is going on here. Namely, asset classes which you'd expect to have more variation in their carry forecasts due to seasonality and storage costs have lower realised carry.
It's been my experience trading your system so far that I tend to get the strongest carry forecasts in ags and the weakest in equities (disclaimer, less than 1 year!).
Do you know the historical carry forecasts by asset class? My guess is that they might resemble the inverse of the realised carry result. In which case, we'd be systematically betting on carry the most where our forecasts are historically weakest, and vice versa.
Conditioning each carry forecast to that asset class's history could weight the carry rule more evenly across asset classes.
What do you think?
Richard
These figures are actually in my new book :-) But 'weighting the carry rule more evenly across asset classes' is IMHO a bad idea.
Great, it can't arrive soon enough!
I conjecture that your system is currently weighting the carry rule across asset classes the wrong way: betting biggest where the carry rule performs the worst (ags), and betting smallest where it performs the best (equities).
It's like you're listening to the loudest voice (which has a history of exaggeration) rather than the quietest voice (which has a history of understatement).
Does this not concern you?
Actually it generally makes sense to allow asset classes with more carry forecast to bet more, although yes, financial assets generally deliver more than they promise. There is more in the book, but what you propose smells like overfitting, sorry. But I will look at it in a future blog post.
That would be amazing, thanks.
Book arrived (yay!) and the first thing was to find the historical carry forecasts for each asset class. Following your lead and ignoring vol as it only has 2 instruments, we have:
Asset class [avg absolute carry forecast, % carry realised]
Equities [0.17, 170%]
Metals [0.24, 115%]
FX [0.44, 109%]
Energy [0.72, 77%]
Bonds [0.73, 142%]
Ags [0.82, 46%]
It looks like the relationship does exist. That is, asset classes with more variation in carry forecast realise less of their carry. You could even consider the one outlier in that list (bonds) a special case with the secular bull market inflating their realised carry.
I was surprised to see footnote 163: "If we had used a different estimate of the forecast scalar for each asset class, then this linkage between promised and realised carry would be broken. We would have a larger forecast scalar, and hence a higher standard deviation, in asset classes where carry was systematically low such as equities. We would be taking on more risk with very poor rewards for our efforts."
I don't understand the last sentence. It appears that we would be rewarded handsomely for taking more risk in equity carry and less risk in energy & ags carry.
Whether it's overfitting or not is of course a separate issue, but I did make this hypothesis before seeing the data!
Interested to hear your thoughts Rob.