Wednesday 6 March 2024

Fitting with: exponential weighting, alpha and the kitchen sink

 I've talked at some length before about the question of fitting forecast weights, the weights you use to allocate risk amongst different signals used to trade a particular instrument. Generally I've concluded that there isn't much point wasting time on this, for example consider my previous post on the subject here.

However it's an itch I keep wanting to scratch, and in particular there are three things I'd like to look at which I haven't considered before:

  • I've generally used ALL my data history, weighted equally. But there are known examples of trading rules which just stop working during the backtested period, for example faster momentum pre-cost (see my last book for a discussion). 
  • I've generally used Sharpe ratio as my indicator of performance of choice. But one big disadvantage of it is that it will tend to favour rules with more positive Beta exposure on markets that have historically gone up.
  • I've always used a two step process where I first fit forecast weights, and then instrument weights. This separation makes things easier. But we can imagine many examples where it would produce a suboptimal performance.
In this post I discuss some ideas to deal with these problems:

  • Exponential weighting, with more recent performance getting a higher weight.
  • Using alpha rather than Sharpe ratio to fit.
  • A kitchen sink approach where both instrument and forecast weights are fitted together.

Note I have a longer term project in mind where I re-consider the entire structure of my trading system, but that is a big job, and I want to put in place these changes before the end of the UK tax year, when I will also be introducing another 50 or so instruments into my traded universe, something that would require some fitting of some kind to be done anyway.


Exponential weighting

Here is the 2nd most famous 'hockey stick' graph in existence:

(From my latest book, Advanced Futures Trading Strategies AFTS)

Focusing on the black lines, which show the net performance of the two fastest EWMAC trading rules across a portfolio of 102 futures contracts, there's a clear pattern. Prior to 1990 these rules do pretty well, then afterwards they flatline (EWMAC4 in a very clear hockey stick pattern) and do badly (EWMAC2). 

I discuss some reasons why this might have happened in the book, but that isn't what concerns us now. What bothers me is this; if I allocate my portfolio across these trading strategies using all the data since 1970 then I'm going to give some allocation to EWMAC4 and even a little bit to EWMAC2. But does that really make sense, to put money in something that's been flat / money losing for over 30 years?

Fitting by use of historic data is a constant balance between using more history, to get more robust statistically significant results, and using more recent data that is more likely to be relevant and also accounts for alpha decay. The right balance depends on both the holding period of our strategies (HFT traders use months of data, I should certainly be using decades), and also the context (to predict instrument standard deviation, I use something equivalent to using about a month of returns, whereas for this problem a much longer history would be appropriate). 

Now I am not talking about going crazy and doing something daft like allocating everything to the strategy that did best last week, but it does seem reasonable to use something like a 15 year halflife when estimating means and Sharpe Ratios of trading strategy returns.

That would mean I'd currently be giving about 86% of any weighting to the period after 1990, compared to about 62% now with equal weighting. So it's not a pure rolling window; the distant past still has some value, but the recent past is more important. 
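As a rough check on those percentages, here is a minimal sketch. It assumes the history runs from 1970 to early 2024 and works in whole years; the function name is made up for illustration. It gives roughly 63% with equal weighting and 86% with a 15 year halflife, close to the figures quoted above.

```python
def share_of_weight(recent_years, total_years, halflife_years=None):
    """Fraction of the total estimation weight that falls in the most recent
    `recent_years` of a history that is `total_years` long."""
    if halflife_years is None:
        # Equal weighting: every year counts the same
        return recent_years / total_years
    # The weight on a return that is `age` years old is 0.5 ** (age / halflife),
    # so the cumulative weight on the last T years is proportional to
    # 1 - 0.5 ** (T / halflife)
    recent = 1.0 - 0.5 ** (recent_years / halflife_years)
    everything = 1.0 - 0.5 ** (total_years / halflife_years)
    return recent / everything


total_years = 2024 - 1970   # assumed length of the backtest
recent_years = 2024 - 1990  # the period after 1990

equal_weighted = share_of_weight(recent_years, total_years)
exp_weighted = share_of_weight(recent_years, total_years, halflife_years=15)

print(f"Equal weighting: {equal_weighted:.0%} of weight after 1990")
print(f"15 year halflife: {exp_weighted:.0%} of weight after 1990")
```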


Using alpha rather than Sharpe Ratio to fit

One big difference between Quant equity people and Quant futures traders is that the former are obsessed with alpha. They get mugs from their significant others with 'world's best alpha generator' on them for Christmas. They wear jumpers with the alpha symbol on. You get the idea. Beta is something to be hedged out. Much of the logic is that we're probably getting our daily diet of Beta exposure elsewhere, so the holistic optimal portfolio will consist of our existing Beta plus some pure alpha.

Quant futures traders are, broadly speaking, more concerned with outright return. I'm guilty of this myself. Look at the Sharpe Ratio in the backtest. Isn't it great? And you know what, that's probably fine. The correlation of a typical managed futures strategy with equity/bond 60:40 is pretty low. So most of our performance will be alpha anyway. 

However evaluating different trading strategies on outright performance is somewhat problematic. Certain rules are more likely to have a high correlation with underlying markets. Typically this will include carry in assets where carry is usually positive (eg bonds), and slower momentum on anything that has most usually gone up in the past (eg equities)*.  To an extent some of this is fine since we want to collect some risk premia, but if we're already collecting those premia elsewhere in long only portfolios**, why bother? 

* This also means that any weighting of instrument performance will be biased towards things that have gone up in the past - not a problem for me right now as I generally ignore it, but could be a problem if we adopt a 'kitchen sink' approach as I will discuss later.

** Naturally 'Trend following plus nothing' people will prefer to collect their risk premia inside their trend following portfolios, but they are an exception. I note in passing that for a retail investor who has to pay capital gains when their futures roll, it is likely that holding futures positions is an inferior way of collecting alpha.

I'm reminded of a comment by an old colleague of mine* on investigating different trading rules in the bond sector (naturally evalutin. After several depressing weeks he concluded that 'Nothing I try is any better than long only'.

*Hi Tom!

So in my latest book AFTS (sorry for the repeated plugs, but you're reading this for free so there has to be some advertising and at least it's more relevant than whatever clickbait nonsense the evil algo would serve up to you otherwise) I did redress this slightly by looking at alpha and not just outright returns. For example my slowest momentum rule (EWMAC64,256) has a slightly higher SR than one of my fastest (EWMAC8,32), but an inferior alpha even after costs.


Which benchmark?

Well this idea of using alpha is all very well, but what benchmark are we regressing on to get it? This isn't US equities now mate, you can't just use the S&P 500 without thinking. Some plausible candidates are:

  1. The S&P 500.
  2. The 60:40 portfolio that some mythical investor might own as well as this, or a more tailored version to my own requirements. This would be roughly equivalent to long everything on a subset of markets, with sector risk weights of about 80% in equities and 20% in bonds. Frankly this wouldn't be much different to the S&P 500.
  3. The 'long everything' portfolio I used in AFTS, which consists of all my futures with a constant positive fixed forecast (the system from chapter 4, as readers will know).
  4. A long only portfolio just for the sector a given instrument is trading in.
  5. A long only position just on the given instrument we are trading.

There are a number of things to consider here. What is the other portfolio that we hold? It might well be the S&P 500 or just the magnificent 7; it's more likely to consist of a globally diversified bunch of bonds and stocks; it's less likely to have a long only cash position in some obscure commodities contract. 

Also not all things deliver risk premia in their naked Beta outfits. Looking at median long only constant forecast SR in chapter 3 of AFTS, they appear lower in the non financial assets (0.07 in ags, 0.27 in metals and 0.32 in energy; versus 0.40 in short vol, 0.46 in equity and 0.59 in bonds; incidentally FX is also close to zero at 0.09, but EM apart there's no reason why we should earn a risk premium here). This implies we should be veering towards loading up on Beta in financials and Alpha in non financials. 

But it's hard to disaggregate what is the natural risk premium from holding financial assets, versus what we've earned just from a secular downtrend in rates and inflation that lasted for much of the 50 odd years of the backtest. Much of the logic for doing this exercise is because I'm assuming that these long only returns will be lower in the future because that secular trend has now finished.

Looking at the alpha just on one instrument will make it a bit weird when comparing alphas across different instruments. It might sort of make more sense to do the regression on a sector Beta. This would be more analogous to what the equity people do.

On balance I think the 'long everything' benchmark I used in AFTS is the best compromise. Because trends have been stronger in equities and bonds it will be reasonably correlated to 60:40 anyway. Regressing against this will thus give a lower Beta and potentially better Alpha for instruments outside of those two sectors.

One nice exercise to do is to then see what a blend of long everything and the alpha optimised portfolio looks like. This would allow us to include a certain amount of general Beta into the strategy. We probably shouldn't optimise for this.


Optimising with alpha

We want to allocate more to strategies with higher alpha. We also want that alpha to be statistically significant. We'll get more statistical significance with more observations, and/or a better fit to the regression. 

Unlike with means and Sharpe Ratios, I don't personally have any well developed methodologies, theories, or heuristics, for allocating weights according to alpha or significance of alpha. I did consider developing a new heuristic, and wasted a bit of time with toy formula involving the product of (1- p_value) and alpha.

But I quickly realised that it's fairly easy to adapt work I have done on this before. Instead of using naked return streams, we use residual return streams; basically the return left over after subtracting Beta*benchmark return. We can then divide the average of this by the target risk (standard deviation) to get a Sharpe Ratio, which is then plugged in as normal.
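To make the residual idea concrete, here is a minimal sketch using only the standard library. It is not the code used for this post: the function names, the synthetic return series and the 256 business days a year used for annualising are all assumptions for illustration.

```python
import math
import random
import statistics


def ols_beta(strategy, benchmark):
    """Slope from regressing strategy returns on benchmark returns."""
    mean_s = statistics.fmean(strategy)
    mean_b = statistics.fmean(benchmark)
    covariance = sum((s - mean_s) * (b - mean_b) for s, b in zip(strategy, benchmark))
    variance = sum((b - mean_b) ** 2 for b in benchmark)
    return covariance / variance


def residual_returns(strategy, benchmark, beta):
    """Return left over after subtracting beta * benchmark return.
    The regression intercept (the alpha) is deliberately left in."""
    return [s - beta * b for s, b in zip(strategy, benchmark)]


def annualised_sharpe(returns, periods_per_year=256):
    """Mean over standard deviation, annualised (256 days a year is an assumption)."""
    return statistics.fmean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


# Synthetic daily returns: a strategy with some benchmark exposure plus its own edge
rng = random.Random(42)
benchmark = [rng.gauss(0.0003, 0.01) for _ in range(2560)]
strategy = [0.0004 + 0.3 * b + rng.gauss(0.0, 0.008) for b in benchmark]

beta = ols_beta(strategy, benchmark)
residuals = residual_returns(strategy, benchmark, beta)

print(f"beta {beta:.2f}")
print(f"raw SR {annualised_sharpe(strategy):.2f}")
print(f"residual SR {annualised_sharpe(residuals):.2f}")
```

The residual Sharpe Ratio can then be fed into whatever weighting method normally consumes a Sharpe Ratio.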

How does this fit into an exponential framework? There are a number of ways of doing this, but I decided against the complexity of writing code (which would be slow) to do my regression in a full exponential way. Instead I estimate my Betas on first a rolling, then an expanding, 30 year window (which trivially has a 15 year half life). I don't expect Betas to vary that much over time. I estimate my alphas (and hence Sharpe ratios) with a 15 year half life on the residuals. Betas are re-estimated every year, and the most up to date estimate is then used to correct returns in the past year (otherwise the residual returns would change over time which is a bit weird and also computationally more expensive).
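One way to read that procedure is sketched below. This is an illustration under stated assumptions rather than the actual implementation: the Beta window is taken to be all data up to each year end, capped at the most recent 30 years; exponential weights are applied per calendar year rather than per day; and every function name and the synthetic data are invented.

```python
import math
import random
import statistics

WINDOW_YEARS = 30    # maximum window used to estimate Beta
HALFLIFE_YEARS = 15  # halflife used for means and standard deviations of residuals


def ols_beta(strategy, benchmark):
    """Slope from regressing strategy returns on benchmark returns."""
    mean_s = statistics.fmean(strategy)
    mean_b = statistics.fmean(benchmark)
    covariance = sum((s - mean_s) * (b - mean_b) for s, b in zip(strategy, benchmark))
    variance = sum((b - mean_b) ** 2 for b in benchmark)
    return covariance / variance


def yearly_residuals(returns_by_year):
    """returns_by_year maps year -> list of (strategy_return, benchmark_return).
    Beta is re-estimated once a year on data up to that year end (at most
    WINDOW_YEARS of it) and applied to that year's returns only, so residuals
    for earlier years never change afterwards."""
    years = sorted(returns_by_year)
    residuals = {}
    for year in years:
        window = [y for y in years if year - WINDOW_YEARS < y <= year]
        pairs = [pair for y in window for pair in returns_by_year[y]]
        beta = ols_beta([s for s, _ in pairs], [b for _, b in pairs])
        residuals[year] = [s - beta * b for s, b in returns_by_year[year]]
    return residuals


def ew_mean_and_std(values_by_year, halflife_years=HALFLIFE_YEARS):
    """Exponentially weighted mean and standard deviation, weighting by year."""
    last_year = max(values_by_year)
    weighted = [
        (0.5 ** ((last_year - year) / halflife_years), value)
        for year, values in values_by_year.items()
        for value in values
    ]
    total_weight = sum(w for w, _ in weighted)
    mean = sum(w * v for w, v in weighted) / total_weight
    variance = sum(w * (v - mean) ** 2 for w, v in weighted) / total_weight
    return mean, math.sqrt(variance)


# Synthetic example: 40 years of daily returns (256 days a year is an assumption)
rng = random.Random(1)
history = {}
for year in range(1984, 2024):
    bench = [rng.gauss(0.0003, 0.01) for _ in range(256)]
    history[year] = [(0.0003 + 0.3 * b + rng.gauss(0.0, 0.008), b) for b in bench]

mean, std = ew_mean_and_std(yearly_residuals(history))
print(f"Exponentially weighted residual SR: {mean / std * math.sqrt(256):.2f}")
```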


Kitchen sink

I've always done my optimisation in a two step process. First, what is the best way to forecast the price of this market (what is the best allocation across trading rules, i.e. what are my forecast weights)? Second, how should I put together a portfolio of these forecasters (what is the best allocation across instruments, i.e. what are my instrument weights)? 

Partly that reflects the way my trading strategy is constructed, but this separation also makes things easier. But it does reflect a forecasting mindset, rather than a 'diversified set of risk premia' mindset. Under the latter mindset, it would make sense to do a joint optimisation where the individual 'lego bricks' are ONE trading rule and ONE instrument. 

It strikes me that this is also a much more logical approach once we move to maximising alpha rather than maximising Sharpe Ratio. 

Of course there are potential pain points here. Even for a toy portfolio of 10 trading rules and 50 instruments we are optimising 500 assets. But the handcrafting approach of top down optimisation ought to be able to handle this fairly easily (we shall see!).
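To give a feel for the size of the joint problem, here is a small sketch that builds the 'lego bricks' for the toy case above. The rule and instrument names are placeholders, not anything from my system.

```python
from itertools import product

# Placeholder names for a toy portfolio of 10 rules and 50 instruments
rules = [f"rule_{i}" for i in range(10)]
instruments = [f"instrument_{i}" for i in range(50)]

# Each brick is ONE trading rule applied to ONE instrument
bricks = list(product(instruments, rules))

n_assets = len(bricks)
# Number of distinct pairwise correlations the optimiser has to deal with
n_correlations = n_assets * (n_assets - 1) // 2

print(n_assets, n_correlations)
```

With 500 bricks there are 124,750 distinct pairwise correlations, which is why a top down method that works on clusters rather than on the full matrix at once is attractive here.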



Testing

Setup

Let's think about how to set up some tests for these ideas. For speed and interpretation I want to keep things reasonably small. I'm going to use my usual five outright momentum EWMAC trading rules, plus 1 carry (correlations are pretty high here, I will use carry60), plus one of my skew rules (skewabs180 for those who care), asset class mean reversion - a value type rule (mrinasset1000), and both asset class momentum (assettrend64) and relative momentum (relmomentum80). My expectation is that the more RV type rules - relative momentum, skew, value - will get a higher weight than when we are just considering outright performance. I'm also expecting that the very fastest momentum will have a lower weight when exponential weighting is used.

The rules profitability is shown above. You can see that we're probably going to want to have less MR (mean reversion), as it's rubbish; and also if we update our estimates for profitability we'd probably want less faster momentum and relative momentum. There is another hockey stick from 2009 onwards when many rules seem to flatten off somewhat.

(Frankly we could do with more rules that made more money recently; but I don't want to be accused of overegging the pudding on overfitting here)

For instruments, to avoid breaking my laptop with repeated optimisation of 200+ instruments I kept it simple and restricted myself to only those with at least 20 years of trading history. There are 39 of these old timers:

'BRE', 'CAD', 'CHF', 'CORN', 'DAX', 'DOW', 'EURCHF', 'EUR_micro', 'FEEDCOW', 'GASOILINE', 'GAS_US_mini', 'GBP', 'GBPEUR', 'GOLD_micro', 'HEATOIL', 'JPY', 'LEANHOG', 'LIVECOW', 'MILK', 'MSCISING', 'MXP', 'NASDAQ_micro', 'NZD', 'PALLAD', 'PLAT', 'REDWHEAT', 'RICE', 'SILVER', 'SOYBEAN_mini', 'SOYMEAL', 'SOYOIL', 'SP400', 'SP500_micro', 'US10', 'US20', 'US5', 'WHEAT', 'YENEUR', 'ZAR'

On the downside there is a bit of a sector bias here (12 FX, 11 Ags, 6 equity, 4 metals, and only 3 bonds and 3 energy), but that also gives more work for the optimiser (FWIW my full set of instruments is biased towards equities, so you can't really win).

For my long only benchmark used for regressions I'm going to use a fixed forecast of +10, which in layman's terms means it's a risk parity type portfolio. I will set the instrument weights using my handcrafting method, but without any information about Sharpe Ratio, just correlations. IDM is estimated on backward looking data of course.

I will then have something that roughly resembles my current system (although clearly with fewer markets and trading rules, and without using dynamic optimisation of positions). I also use handcrafting, but I fit forecast weights and instrument weights separately, again without using any information on performance, just correlations. 

I then check the effect of introducing the following features:

  • 'SR' Allowing Sharpe Ratio information to influence forecast and instrument weights
  • 'Alpha' Using alpha rather than Sharpe Ratio
  • 'Short' Using a 15 year halflife rather than all the data to estimate Sharpe Ratios and correlations
  • 'Sink' Estimating the weights for forecast and instrument weights at the same time

Apart from SR and alpha which are mutually exclusive, this gives me the following possible permutations:

  • Baseline: Using no performance information 
  • 'SR' 
  • 'SR+Short' 
  • 'Sink' 
  • 'SR+Sink' 
  • 'SR+Short+Sink' 
  • 'Alpha' 
  • 'Alpha+Short'
  • 'Alpha+Sink'
  • 'Alpha+Short+Sink'
In terms of performance I'm going to check both the outright performance and the overall portfolio alpha. I will also look separately at the post 2008 period and the pre 2008 period. Naturally everything is done out of sample, with robust optimisation, and after costs.

Finally, as usual in all cases I discard trading rules which don't meet my 'speed limit'. This also means that I don't trade the Milk future at all.


Long only benchmark

Some fun facts, here are the final instrument weights by asset class:

{'Ags': 0.248, 'Bond': 0.286, 'Equity': 0.117, 'FX': 0.259, 'Metals': 0.0332, 'OilGas': 0.0554}

The final diversification multiplier is 2.13. It has a SR of around 0.6, and costs of around 0.4% a year.


Baseline

Here is a representative set of forecast weights (S&P 500):

relmomentum80       0.105

momentum4           0.094
momentum8           0.048
momentum16          0.048
momentum32          0.054
momentum64          0.054
assettrend64        0.102

carry60             0.155
mrinasset1000       0.238
skewabs180          0.102

The massive weight to mrinasset is due to the fact it is very diversifying, and we are only using correlations here. But mrinasset isn't very good, so smuggling in outright performance would probably be a good thing to do.

SR of this thing is 0.98 and costs are a bit higher as we'd expect at 0.75% annualised. Always amazing how well just a simple diversified system can do. The Beta to our long only model is just 0.09 (again probably due to that big dollop of mean reversion which is slightly negative Beta if anything), so perhaps unsurprising the net alpha is 18.8% a year (dividing by the vol gets to a SR of 0.98 again just on the alpha). BUT...



Performance has declined over time. 


'SR'

I'm now going to allow the fitting process for both forecast and instrument weights to use Sharpe ratio. Naturally I'm doing this in a sensible way so the weights won't go completely nuts.

Let's have a look at the forecast weights for comparison:

momentum4           0.112
momentum8           0.065
momentum16          0.068
momentum32          0.075
momentum64          0.072
assettrend64        0.122
relmomentum80       0.097

mrinasset1000       0.135
skewabs180          0.105
carry60             0.149


We can see that money losing MR has a lower weight, and in general the non trendy part of the portfolio has dropped from about half to under 40%. But we still have lots of faster momentum as we're using the whole period to fit.

Instrument weights by asset class meanwhile look like this:

{'Ags': 0.202, 'Bond': 0.317, 'Equity': 0.191, 'FX': 0.138, 'Metals': 0.0700, 'OilGas': 0.0820}

Not dramatic changes, but we do get a bit more of the winning asset classes. 


'SR+Short'

Now what happens if we change our mean and correlation estimates so they have a 15 year halflife, rather than using all the data?

Forecast weights:

momentum4           0.095
momentum8           0.055
momentum16          0.059
momentum32          0.067
momentum64          0.066
relmomentum80       0.095
assettrend64        0.122
skewabs180          0.117
carry60             0.142
mrinasset1000       0.183



There's definitely been a shift out of faster momentum, and into things that have done better recently such as skew. We are also seeing more MR, which seems counterintuitive. My initial theory was that it's because MR becomes more diversifying over time, and this is indeed the case.


'Sink+SR+Short'

So far we've just been twiddling around a little at the edges really, but this next change is potentially quite different - jointly optimising the forecast and instrument weights. Let's look at the results with the SR using the 15 year halflife.

Here are the S&P 500 forecast weights - note that unlike for other methods, these could be wildly different across instruments:

momentum4           0.136
momentum8           0.188
momentum16          0.147
momentum32          0.046
momentum64          0.048
assettrend64        0.098
relmomentum80       0.012

carry60             0.035
mrinasset1000       0.000
skewabs180          0.290

Here we see decent amounts of faster momentum - maybe because it's a cheaper instrument or just happens to work better - but no mean reversion, which apparently does shockingly badly here. A better way of doing this is to look at the forecast weights added up across all instruments:

momentum4        0.181185
momentum8        0.138984
momentum16       0.088705
momentum32       0.079942
momentum64       0.098399
assettrend64     0.107174
relmomentum80    0.052196

skewabs180       0.086572
mrinasset1000    0.063426
carry60          0.103418


Perhaps surprisingly now we're seeing brutally large amounts of fast momentum, and less of the more diversifying rules. 
 


Interlude - clustering when everything is optimised together


To understand a little better what's going on, it might be helpful to do a cluster analysis to see how things are grouping together when we do our top down optimisation across the 10 rules and 39 instruments: 390 things altogether. Using the final correlation matrix to do the clustering, here are the results for 2 clusters:

Instruments {'CAD': 10, 'FEEDCOW': 10, 'GAS_US_mini': 10, 'GBPEUR': 10, 'LEANHOG': 10, 'LIVECOW': 10, 'MILK': 10, 'RICE': 10, 'SP400': 10, 'YENEUR': 10, 'ZAR': 10, 'DAX': 9, 'DOW': 9, 'MSCISING': 9, 'NASDAQ_micro': 9, 'SP500_micro': 9, 'GASOILINE': 4, 'GOLD_micro': 4, 'HEATOIL': 4, 'JPY': 4, 'PALLAD': 4, 'PLAT': 4, 'SILVER': 4, 'CHF': 3, 'CORN': 3, 'EURCHF': 3, 'EUR_micro': 3, 'NZD': 3, 'REDWHEAT': 3, 'SOYBEAN_mini': 3, 'SOYMEAL': 3, 'SOYOIL': 3, 'US10': 3, 'US5': 3, 'WHEAT': 3, 'BRE': 2, 'GBP': 2, 'US20': 2, 'MXP': 1}
Rules {'skewabs180': 37, 'mrinasset1000': 35, 'carry60': 33, 'relmomentum80': 25, 'assettrend64': 16, 'momentum4': 16, 'momentum8': 16, 'momentum16': 16, 'momentum32': 16, 'momentum64': 16}

Instruments {'MXP': 9, 'BRE': 8, 'GBP': 8, 'US20': 8, 'CHF': 7, 'CORN': 7, 'EURCHF': 7, 'EUR_micro': 7, 'NZD': 7, 'REDWHEAT': 7, 'SOYBEAN_mini': 7, 'SOYMEAL': 7, 'SOYOIL': 7, 'US10': 7, 'US5': 7, 'WHEAT': 7, 'GASOILINE': 6, 'GOLD_micro': 6, 'HEATOIL': 6, 'JPY': 6, 'PALLAD': 6, 'PLAT': 6, 'SILVER': 6, 'DAX': 1, 'DOW': 1, 'MSCISING': 1, 'NASDAQ_micro': 1, 'SP500_micro': 1}
Rules {'assettrend64': 23, 'momentum4': 23, 'momentum8': 23, 'momentum16': 23, 'momentum32': 23, 'momentum64': 23, 'relmomentum80': 14, 'carry60': 6, 'mrinasset1000': 4, 'skewabs180': 2}


Interpretation here is that for each cluster I count the number of instruments present, and then trading rules. So for example the first cluster has 10 examples of CAD - since there are 10 trading rules that means all the CAD is in this cluster. It also has 37 examples of the skewabs180 rule, which means that nearly all the skew rules have been collected here (the other 2 are in the second cluster).
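The counting itself is simple. Here is a minimal sketch, assuming each cluster is just a list of (instrument, rule) pairs; the function name and the toy cluster are invented for illustration.

```python
from collections import Counter


def summarise_cluster(members):
    """members is a list of (instrument, rule) pairs assigned to one cluster.
    Returns two dicts of counts, most frequent first."""
    instrument_counts = Counter(instrument for instrument, _ in members)
    rule_counts = Counter(rule for _, rule in members)
    return dict(instrument_counts.most_common()), dict(rule_counts.most_common())


# Toy cluster: both CAD bricks are here, but only one of the MXP bricks
toy_cluster = [
    ("CAD", "carry60"),
    ("CAD", "skewabs180"),
    ("MXP", "skewabs180"),
]

instruments, rules = summarise_cluster(toy_cluster)
print("Instruments", instruments)
print("Rules", rules)
```

An instrument whose count equals the number of trading rules sits entirely inside that cluster, and likewise a rule whose count equals the number of instruments.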

This first split clearly shows a divide between convergent rules (skew, mean reversion, carry) in cluster 1, and trendy type rules in cluster 2. The instrument split is less helpful.

Jumping ahead, here are N=10 clusters with my own labels in capitals:

Cluster 1 EQUITY TREND
Instruments {'DAX': 6, 'DOW': 5, 'NASDAQ_micro': 5, 'SP400': 5, 'SP500_micro': 5}
Rules {'assettrend64': 5, 'momentum16': 5, 'momentum32': 5, 'momentum64': 5, 'momentum8': 4, 'mrinasset1000': 1, 'skewabs180': 1}

Cluster 2 ???
Instruments {'GAS_US_mini': 9, 'EURCHF': 2, 'GOLD_micro': 2, 'MSCISING': 2, 'NASDAQ_micro': 2, 'PALLAD': 2, 'PLAT': 2, 'RICE': 2, 'SILVER': 2, 'SP500_micro': 2, 'BRE': 1, 'CAD': 1, 'DAX': 1, 'DOW': 1, 'EUR_micro': 1, 'GASOILINE': 1, 'REDWHEAT': 1, 'SOYBEAN_mini': 1, 'SOYMEAL': 1, 'SOYOIL': 1, 'SP400': 1, 'US5': 1, 'WHEAT': 1}
Rules {'mrinasset1000': 15, 'carry60': 8, 'skewabs180': 6, 'relmomentum80': 5, 'assettrend64': 1, 'momentum4': 1, 'momentum8': 1, 'momentum16': 1, 'momentum32': 1, 'momentum64': 1}

Cluster 3 ???
Instruments {'FEEDCOW': 10, 'GBPEUR': 10, 'LEANHOG': 10, 'LIVECOW': 10, 'MILK': 10, 'YENEUR': 10, 'ZAR': 10, 'CAD': 9, 'RICE': 8, 'MSCISING': 7, 'HEATOIL': 4, 'JPY': 4, 'SP400': 4, 'CHF': 3, 'CORN': 3, 'DOW': 3, 'GASOILINE': 3, 'NZD': 3, 'US10': 3, 'DAX': 2, 'EUR_micro': 2, 'GBP': 2, 'GOLD_micro': 2, 'NASDAQ_micro': 2, 'PALLAD': 2, 'PLAT': 2, 'REDWHEAT': 2, 'SILVER': 2, 'SOYBEAN_mini': 2, 'SOYMEAL': 2, 'SOYOIL': 2, 'SP500_micro': 2, 'US20': 2, 'US5': 2, 'WHEAT': 2, 'BRE': 1, 'EURCHF': 1, 'GAS_US_mini': 1, 'MXP': 1}
Rules {'skewabs180': 30, 'carry60': 25, 'relmomentum80': 20, 'mrinasset1000': 19, 'momentum4': 15, 'momentum8': 11, 'assettrend64': 10, 'momentum16': 10, 'momentum32': 10, 'momentum64': 10}

Cluster 4 US RATES TREND+CARRY
Instruments {'US20': 8, 'US10': 7, 'US5': 7}
Rules {'carry60': 3, 'assettrend64': 3, 'momentum4': 3, 'momentum8': 3, 'momentum16': 3, 'momentum32': 3, 'momentum64': 3, 'mrinasset1000': 1}

Cluster 5 EURCHF
Instruments {'EURCHF': 7}
Rules {'assettrend64': 1, 'momentum4': 1, 'momentum8': 1, 'momentum16': 1, 'momentum32': 1, 'momentum64': 1, 'skewabs180': 1}

Cluster 6 EQUITY MR+REL MOMENTUM
Instruments {'DAX': 1, 'DOW': 1, 'MSCISING': 1, 'NASDAQ_micro': 1, 'SP500_micro': 1}
Rules {'mrinasset1000': 3, 'relmomentum80': 2}

Cluster 7 G10 FX TREND
Instruments {'GBP': 8, 'CHF': 7, 'EUR_micro': 7, 'NZD': 7, 'JPY': 6}
Rules {'assettrend64': 5, 'momentum4': 5, 'momentum8': 5, 'momentum16': 5, 'momentum32': 5, 'momentum64': 5, 'relmomentum80': 4, 'carry60': 1}

Cluster 8 EM FX
Instruments {'MXP': 9, 'BRE': 8}
Rules {'relmomentum80': 2, 'carry60': 2, 'assettrend64': 2, 'momentum4': 2, 'momentum8': 2, 'momentum16': 2, 'momentum32': 2, 'momentum64': 2, 'skewabs180': 1}

Cluster 9 AGS TREND
Instruments {'CORN': 7, 'REDWHEAT': 7, 'SOYBEAN_mini': 7, 'SOYMEAL': 7, 'SOYOIL': 7, 'WHEAT': 7}
Rules {'relmomentum80': 6, 'assettrend64': 6, 'momentum4': 6, 'momentum8': 6, 'momentum16': 6, 'momentum32': 6, 'momentum64': 6}

Cluster 10 ENERGY/METAL TREND
Instruments {'GASOILINE': 6, 'GOLD_micro': 6, 'HEATOIL': 6, 'PALLAD': 6, 'PLAT': 6, 'SILVER': 6}
Rules {'assettrend64': 6, 'momentum4': 6, 'momentum8': 6, 'momentum16': 6, 'momentum32': 6, 'momentum64': 6}

We can see that there are some richer things going on here than we could capture in the simple 2-dimensional fit of first forecast weights, then instrument weights. 


'Alpha'

Let's now see what happens if, when optimising, we replace the Sharpe Ratio on raw returns as our measure of performance with a Sharpe Ratio on residual returns after adjusting for Beta exposure; alpha, basically.

Here are our usual forecast weights for S&P 500:

momentum4        0.061399
momentum8        0.067279
momentum16       0.037172
momentum32       0.037112
momentum64       0.069207
relmomentum80    0.170218
assettrend64     0.176815

skewabs180       0.105298
carry60          0.102799
mrinasset1000    0.172700

Very interesting; we're steering very much away from all speeds of 'vanilla' momentum here, and once again we have a lump of money in the very much diversifying but money losing mean reversion in assets.


Results


Right so you have waded through all this crap, and here is your reward, what are the results like?

                     SR   beta   r_SR  H1_SR  H1_beta  H1_r_SR  H2_SR  H2_beta  H2_r_SR
LONG_ONLY         0.601  1.000  0.000  0.740    1.000    0.000  0.191    1.000    0.000
BASELINE          0.983  0.094  0.940  1.246    0.094    1.192  0.197    0.058    0.188
SR                1.087  0.308  0.932  1.298    0.338    1.084  0.458    0.151    0.438
SR_short          1.089  0.324  0.927  1.318    0.353    1.100  0.375    0.166    0.350
sink              1.025  0.323  0.871  1.252    0.330    1.055  0.356    0.260    0.319
SR_sink           0.975  0.362  0.804  1.167    0.378    0.947  0.388    0.265    0.349
SR_short_sink     0.929  0.384  0.753  1.121    0.395    0.899  0.323    0.305    0.277
alpha             0.993  0.199  0.891  1.212    0.220    1.069  0.345    0.081    0.336
alpha_short       1.023  0.238  0.904  1.237    0.249    1.082  0.367    0.160    0.343
alpha_sink_short  0.888  0.336  0.735  1.090    0.336    0.903  0.257    0.299    0.210

The columns are the SR, beta, and 'residual SR' (alpha divided by standard deviation) for the whole period, then for H1 (not really the first half, but pre 2009), then for H2 (after 2009). Green values are the best or very close to it, red is the worst (excluding long only, natch).

Top line is that everything looks worse after 2009, for both outright and residual performance. Looking at the entire period, there are some fitting methods that do better than the baseline on SR, but on residual SR they fail to improve. Focusing on the second half of the data, there is a bigger improvement on SR over the baseline for all the fitting methods, and it is one which also survives the use of a residual SR. 

But the best model of all in that second half was almost the simplest: just using the SR alone to robustly fit weights, but for the entire period, and sticking to the two stage process of fitting forecast weights and then instrument weights. 

I checked, and the improvement over the baseline from just using SR was statistically significant with a p-value of about 0.02. The p-value versus the competing 'alpha' fit wasn't so good - just 0.2; but Occams Rob's razor says we should use the simplest possible model unless there is a more complex model that is significantly better. SR is significantly better than the simpler baseline model, at least in the more critical second half of the data, so we should use it, and only switch to a more complex model if that is significantly better still. We don't need a decent p-value to justify using SR over alpha, since the latter is more complex.
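The post doesn't spell out which test produced those p-values. One common choice, sketched here purely as an assumption, is a one-sided paired test on the difference between two return streams that are run at the same risk target, using a normal approximation (reasonable with thousands of daily returns). The function name is hypothetical.

```python
import math
import statistics


def p_value_a_beats_b(returns_a, returns_b):
    """One-sided p-value for the null hypothesis that the mean of (a - b) is
    not positive. Uses a normal approximation to the paired t-test, so it is
    only sensible for long return series."""
    differences = [a - b for a, b in zip(returns_a, returns_b)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    t_stat = statistics.fmean(differences) / standard_error
    return 1.0 - statistics.NormalDist().cdf(t_stat)


# Toy data: the first pair has no edge on average, the second clearly does
no_edge = p_value_a_beats_b([0.01, -0.01], [0.0, 0.0])
clear_edge = p_value_a_beats_b([0.011, 0.009, 0.010, 0.012, 0.008], [0.0] * 5)

print(f"{no_edge:.2f} {clear_edge:.4f}")
```

Because both accounts target the same volatility, a difference in mean return translates directly into a difference in Sharpe Ratio, which is why testing the return difference is a reasonable stand-in here.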


Coda: Forecast or instrument weights?

One thing I was curious about was whether the improvements from using SR are down to fitting forecast weights or instrument weights. I'm a bit more bullish on the former, as I feel there is more data and thus more likelihood of getting robust statistics. Every time I have looked at instrument performance, I've not seen any statistically significant differences.

(If you have say 30 years of data history, then for each instrument you have 30 years worth of information, but for each trading rule you have evidence from each instrument so you end up with 30*30 years which means you have root(30) = 5.5 times more information).
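The arithmetic behind that claim, as a tiny sketch. It assumes 30 instruments (implied by the 30*30) and, optimistically, that returns are independent across instruments; with correlated instruments the true gain would be smaller.

```python
import math

years_of_history = 30
n_instruments = 30  # assumption implied by the 30*30 in the text

# Evidence available for one instrument versus one rule pooled over instruments
instrument_years = years_of_history
rule_years = years_of_history * n_instruments

# Statistical precision grows with the square root of the amount of data
information_gain = math.sqrt(rule_years / instrument_years)

print(round(information_gain, 1))  # 5.5
```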

My hope / expectation is that all the work is being done by forecast weight fitting, so I checked to see what happened if I ran a SR fit just on instruments, and just on forecasts:

                   SR   beta   r_SR  H1_SR  H1_beta  H1_r_SR  H2_SR  H2_beta  H2_r_SR
LONG_ONLY       0.601  1.000  0.000  0.740    1.000    0.000  0.191    1.000    0.000
BASELINE        0.983  0.094  0.940  1.246    0.094    1.192  0.197    0.058    0.188
SR              1.087  0.308  0.932  1.298    0.338    1.084  0.458    0.151    0.438
SR_forecasts    1.173  0.330  1.002  1.416    0.364    1.179  0.441    0.154    0.420
SR_instruments  0.948  0.118  0.892  1.179    0.121    1.106  0.259    0.076    0.248


Sure enough we can see that the benefit is pretty much entirely coming from the forecast weight fitting.


Conclusions

I've shied away from using performance rather than just correlations for fitting, but as I said earlier it is an itch I wanted to scratch. It does seem that none of the fancy alternatives I've considered in this post add value; so I will keep searching for the elusive bullet of quick wins through portfolio optimisation. 

Meanwhile for the exercise of updating my trading strategy with new instruments, I will probably be using Sharpe Ratio information to robustly fit forecast weights but not instrument weights (I still need to hold my nose a bit!).







14 comments:

  1. To clarify, you're using Sharpe on an expanding window to adjust weights? It is interesting that there was no benefit to adjusting instrument weightings this way!

    What if you used a rolling 48 or 96-month window or such for rules? I've done some testing on systems that showed a benefit to selecting the best subsystems based on this criteria - though not for Futures. Unfortunately they still had severe look-back bias as the systems were created post the test data...

    1. 48/96 months is *way* too short to get statistical significance

    2. Thank you. I've seen the same return degradation you found in futures in equity selection and taa (etf) asset class rotation and was hoping to also find a solution.

      I'm a moron though here - is it simply the number of price points that account for it being okay for high frequency traders to alter signals in only weeks, but not for us to use 48 months? So for 3 weeks a high frequency trader would have 7,200 price points with minute data, whereas 48 months only accounts for 1460 price points using daily data?

    3. Another mistake: 252 days * 4 years is only 1,008.

    4. "Is it simply the number of price points that account for it being okay for high frequency traders to alter signals in only weeks, but not for us to use 48 months?" To be precise, it's the amount of information: broadly speaking, the number of decisions you are making. So if I were to test my strategy on tick data, that wouldn't give me more information, since my holding period is still ~1 month. Also, statistical significance grows with the square root of time.
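
      A rough sketch of the square-root-of-time point, under simplifying assumptions that are not from the post: returns are independent and identically distributed, and the standard error of an annualised Sharpe ratio is approximated as 1/sqrt(years). The function names are hypothetical, and this tests a single Sharpe ratio against zero rather than comparing two correlated rules.

```python
import math


def sharpe_t_stat(annual_sr: float, years: float) -> float:
    """Approximate t-statistic of an annualised Sharpe ratio versus zero.

    Assumes i.i.d. returns, so the standard error is roughly 1 / sqrt(years).
    """
    return annual_sr * math.sqrt(years)


def years_needed(annual_sr: float, t_threshold: float = 2.0) -> float:
    """Years of data needed for the t-statistic to reach t_threshold."""
    return (t_threshold / annual_sr) ** 2


# A Sharpe ratio of 0.5 measured over a 48-month (4 year) window
print(sharpe_t_stat(0.5, 4))   # 1.0
# Doubling the window to 96 months only lifts the t-stat by sqrt(2)
print(sharpe_t_stat(0.5, 8))
# Years required before the same Sharpe ratio reaches t = 2
print(years_needed(0.5))       # 16.0
```

      Under these assumptions, quadrupling the history only doubles the t-statistic, which is why short rolling windows struggle to separate one rule from another.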

    5. Thank you for your help. I went back to my program with statsmodels and your advice. It showed significance vs 60-40, but when comparing to the best individual strategies there was no significance and very low power with both the expanding and rolling windows (even though I was aggregating 12 out of 96). I think I basically over-engineered sorting an Excel sheet by Sharpe and pretending I picked a few of the best back in 1970.

  2. Very interesting post, sir. Quick question: is there any statistical 'incompleteness' or hazard from using p-values as opposed to, say, confidence intervals for comparing SR to alpha in your returns estimates?

  3. With regard to actual implementation: you are using your hand-crafted weights and then applying a formula to exponentially weight a Sharpe ratio adjustment over time? Is this somehow applied to and adjusts the dataframe that forecast_weights_for_instruments() pulls (and just starts with handcrafted weights set in a config file)? Would you consider releasing the formula used and code? Thank you!

    1. Yeah, it's 'handcrafting' but not done by hand. You're welcome to look at the code, though it's horrible: https://gist.github.com/robcarver17/58b3668407fdbd05954c34373c63d9ed

    2. Wow, thank you very much. Quite a bit of work went into this paper/post!

    3. I tried to understand the code and what's happening in PySystemTrade, but had a bit of trouble:

      Is this right?
      1) Taking an expanding window of returns over the rolling standard deviation to calculate Sharpe (*not* a rolling window of returns).
      2) Applying the ew_lookback to the above (i.e. 15 years)
      3) Adjusting weights depending on confidence

    4. Sorry about the above. I was trying to implement a similar setup in my own (far, far, far simpler) Python scripts. I finally simply exponentially weighted the mean() and std() and created the Sharpe, then ranked. I did see a consistent effect across three different systems that ranked 90+ TAA (etf) strategies and two systems that used a variety of equity systems (many with decade plus out of sample).

      For fun I also tried to expand on your tests here. There appeared to be a small Sharpe bump from applying sr_equalize False only to forecasts and not to instruments (an idea you alluded to at the end), and significant bumps from separately testing the full rule set with these limited instruments, and testing the full instrument set with these limited rules. Not surprising, of course.
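
      The simpler approach described in this comment (exponentially weight the mean and standard deviation, form a Sharpe ratio, then rank) might be sketched as follows. This is an illustration only and is not the code in the gist or in pysystemtrade; the function names, the halflife parameterisation and the toy return series are all assumptions.

```python
import math


def ew_weights(n: int, halflife: float) -> list:
    """Normalised weights for n observations (oldest first) that halve
    every `halflife` observations going back in time."""
    decay = 0.5 ** (1.0 / halflife)
    raw = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def ew_sharpe(returns: list, halflife: float, periods_per_year: int = 12) -> float:
    """Annualised Sharpe ratio from an exponentially weighted mean and std."""
    w = ew_weights(len(returns), halflife)
    mean = sum(wi * r for wi, r in zip(w, returns))
    var = sum(wi * (r - mean) ** 2 for wi, r in zip(w, returns))
    if var == 0:
        raise ValueError("zero variance: Sharpe ratio is undefined")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def rank_by_ew_sharpe(returns_by_strategy: dict, halflife: float) -> list:
    """Strategy names ordered from highest to lowest weighted Sharpe ratio."""
    scores = {k: ew_sharpe(v, halflife) for k, v in returns_by_strategy.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy monthly returns: one strategy worked early on, the other only recently
toy = {
    "old_winner": [0.03, 0.01] * 10 + [-0.03, -0.01] * 10,
    "new_winner": [-0.03, -0.01] * 10 + [0.03, 0.01] * 10,
}
print(rank_by_ew_sharpe(toy, halflife=5))
```

      With a short halflife the ranking is driven almost entirely by recent performance; as the halflife grows, the result converges on the ordinary equally weighted Sharpe ratio.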

  4. Great post as usual Rob, thanks so much. I've been playing around with this as well. I generated the unweighted performance of each trading rule; for the fast trading rules I filtered out the markets that were too expensive (so it only reflects performance on markets that were viable to trade), then ran those weekly returns through your full handcrafting code included in Pysystemtrade, and below is what popped out. Interesting results and some interesting divergences from the manually handcrafted weights. I believe your full handcrafting script already includes Sharpe (along with correlations) as a weighting criterion, similar to the Sharpe-only test here, no?

    accel16: 0.02581
    accel32: 0.00000
    accel64: 0.00000
    assettrend16: 0.07681
    assettrend2: 0.00684
    assettrend32: 0.00822
    assettrend4: 0.00129
    assettrend64: 0.00745
    assettrend8: 0.01186
    breakout10: 0.00475
    breakout160: 0.00029
    breakout20: 0.00667
    breakout320: 0.05110
    breakout40: 0.00606
    breakout80: 0.01203
    carry10: 0.00082
    carry125: 0.10106
    carry30: 0.00274
    carry60: 0.02006
    momentum16: 0.00184
    momentum32: 0.02651
    momentum4: 0.00121
    momentum64: 0.00255
    momentum8: 0.00398
    mrinasset1000: 0.15134
    normmom16: 0.01894
    normmom2: 0.00121
    normmom32: 0.03793
    normmom4: 0.00138
    normmom64: 0.01305
    normmom8: 0.00953
    relcarry: 0.00000
    relmomentum10: 0.09064
    relmomentum20: 0.10128
    relmomentum40: 0.00223
    relmomentum80: 0.00759
    skewabs180: 0.01640
    skewabs365: 0.07140
    skewrv180: 0.03902
    skewrv365: 0.05809

    Tree:
    Contains 3 sub portfolios
      [0] Contains 3 sub portfolios
        [0][0] Contains 3 sub portfolios
          [0][0][0] Contains ['relmomentum40', 'relmomentum80']
          [0][0][1] Contains 3 sub portfolios
            [0][0][1][0] Contains ['breakout80', 'momentum16', 'normmom16']
            [0][0][1][1] Contains ['assettrend16']
            [0][0][1][2] Contains ['accel64']
          [0][0][2] Contains 3 sub portfolios
            [0][0][2][0] Contains 2 sub portfolios
              [0][0][2][0][0] Contains ['breakout160', 'momentum32', 'normmom32']
              [0][0][2][0][1] Contains ['assettrend32']
            [0][0][2][1] Contains ['momentum64', 'normmom64']
            [0][0][2][2] Contains ['assettrend64', 'breakout320']
        [0][1] Contains 3 sub portfolios
          [0][1][0] Contains ['assettrend8', 'momentum8', 'normmom8']
          [0][1][1] Contains ['breakout40']
          [0][1][2] Contains ['accel32']
        [0][2] Contains 3 sub portfolios
          [0][2][0] Contains ['assettrend2', 'breakout10', 'normmom2']
          [0][2][1] Contains 2 sub portfolios
            [0][2][1][0] Contains ['assettrend4', 'momentum4', 'normmom4']
            [0][2][1][1] Contains ['breakout20']
          [0][2][2] Contains ['accel16']
      [1] Contains 3 sub portfolios
        [1][0] Contains 3 sub portfolios
          [1][0][0] Contains ['carry10', 'carry30', 'carry60']
          [1][0][1] Contains ['carry125']
          [1][0][2] Contains ['relcarry']
        [1][1] Contains ['relmomentum10', 'relmomentum20']
        [1][2] Contains 2 sub portfolios
          [1][2][0] Contains ['skewabs180', 'skewabs365']
          [1][2][1] Contains ['skewrv180', 'skewrv365']
      [2] Contains ['mrinasset1000']

  5. You could try the residual SR to weight, instead of the alphas.
