Friday 12 May 2023

Clustering trading rule p&l

I recently upgraded my live production system to include all the extra instruments I've added recently. I also did a little consolidation of trading rules, simplifying things slightly by removing some rules that didn't really have much allocation, and adding a couple from my new book. As usual I set the instrument weights and forecast weights using my handcrafting methodology, which is basically a top down method that involves clustering things into groups in a hierarchical fashion.

In my backtests I do this clustering using the correlation matrix as a guide, but for production weights I use heuristics. So for instruments I say things like 'bonds are probably more correlated with each other than with other assets' and form the clusters initially as asset classes. And for forecast weights, which allocate across trading rules, I say things like 'momentum type trading rules are probably more correlated with each other', so I end up using a hierarchy like this:

  • Convergent (eg carry and mean reversion), Divergent (eg momentum)
  • Generic trading rule (eg EWMAC)
  • Specific trading rule variation (eg EWMAC2,8)

Now I recently tested this clustering method for instruments in this blog post. OK it was 17 months ago, but it felt recent to me. Basically I used a clustering methodology and threw in the actual correlation matrix to see how the grouping turned out. It was quite interesting, so I thought I would do a similar thing with forecast weights. Effectively I am sense checking my heuristic guidelines to see if they are completely nuts, or vaguely okay.

Some code.

Getting the correlation matrix

Well you might think this is easy, but it's not. The correlation matrix here is the correlation of returns for a given set of trading rules and variations. But returns of what? A single instrument, like the S&P 500? That obviously may be unrepresentative of the sample generally, and we're not going to do this exercise for the 200+ instruments I have in my dataset now. What about the correlation of average returns taken across a bunch of instruments, or perhaps the average of the correlations taken across the same bunch? Note these aren't quite the same thing. For example an average of correlations will give every instrument the same weight, whereas an average of returns will give a higher weighting to instruments with more data history.
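To make the difference between the two averaging approaches concrete, here is a minimal sketch using simulated data. Everything in it is an assumption made for illustration: the two rule names, the two made-up instruments (one with a long history, one with a short one) and the correlations used to generate the returns. It is not the code used for this post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rules = ["rule_a", "rule_b"]  # hypothetical rule names


def simulated_rule_returns(n_days, corr, index):
    """Simulate daily returns for two rules with a chosen true correlation."""
    cov = [[1.0, corr], [corr, 1.0]]
    data = rng.multivariate_normal([0.0, 0.0], cov, size=n_days)
    return pd.DataFrame(data, index=index, columns=rules)


dates = pd.bdate_range("2000-01-03", periods=5000)
per_instrument = {
    # long history, rules highly correlated (made-up numbers)
    "LONG_HISTORY": simulated_rule_returns(5000, 0.8, dates),
    # short history (last 500 days only), rules weakly correlated
    "SHORT_HISTORY": simulated_rule_returns(500, 0.2, dates[-500:]),
}

# Option 1: average of correlations. Every instrument gets the same weight.
corr_matrices = [returns.corr() for returns in per_instrument.values()]
average_of_correlations = sum(corr_matrices) / len(corr_matrices)

# Option 2: correlation of average returns. On days when only one instrument
# has data the "average" is just that instrument, so long histories dominate.
average_returns = pd.DataFrame(
    {
        rule: pd.concat(
            [returns[rule] for returns in per_instrument.values()], axis=1
        ).mean(axis=1)
        for rule in rules
    }
)
correlation_of_average = average_returns.corr()

print(average_of_correlations.loc["rule_a", "rule_b"])  # roughly midway between 0.8 and 0.2
print(correlation_of_average.loc["rule_a", "rule_b"])  # pulled towards the long history value
```

With these made-up inputs the first number sits near the simple average of the two instruments' correlations, while the second is much closer to the figure for the instrument with more data.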

And if we are doing averaging, do we just do a simple average - which will be biased since 37% of my futures are equities? Or do we use the instrument weights?

The good news is it probably won't make too much difference. Given enough history, the correlation of trading rule forecast returns is pretty similar across instruments. But we probably want to avoid overweighting certain asset classes, or equally weighting instruments without much history. So I'm going to go for taking the return correlation of portfolios for each trading rule. Each portfolio consists of the same trading rule being traded in all the instruments I trade, weighted by my actual instrument weights. 

Now I don't actually trade all rules in all instruments, because of trading costs, and sometimes because the instrument has certain flaws, but what we are trying to get here is as much information as possible to build a robust correlation matrix. I will also use pre-cost returns; not that it will make any difference, but the point here is to discover how similar rules are to each other, which doesn't depend on costs.
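As a rough sketch of the approach described above, the snippet below builds one portfolio return series per trading rule (the same rule traded in every instrument, weighted by instrument weight) and then takes the correlation across rules. The function name `rule_portfolio_returns`, the three instruments, their weights and the random returns are all hypothetical. Treating missing history as a zero return is also an assumption of this sketch rather than something stated in the post.

```python
import numpy as np
import pandas as pd


def rule_portfolio_returns(returns_by_instrument, instrument_weights):
    """
    returns_by_instrument: dict of instrument code -> DataFrame (dates x rules)
                           of pre-cost returns for each trading rule
    instrument_weights:    dict of instrument code -> weight
    Returns a DataFrame (dates x rules): for each rule, the weighted sum of
    its returns across all instruments.
    """
    total = None
    for code, weight in instrument_weights.items():
        weighted = returns_by_instrument[code] * weight
        # fill_value=0.0: an instrument with no data on a date contributes nothing
        total = weighted if total is None else total.add(weighted, fill_value=0.0)
    return total


# Made-up example data: three instruments, two rules, random returns
rng = np.random.default_rng(1)
dates = pd.bdate_range("2020-01-01", periods=250)
rules = ["momentum16", "carry30"]
returns_by_instrument = {
    code: pd.DataFrame(
        rng.normal(size=(len(dates), len(rules))), index=dates, columns=rules
    )
    for code in ["SP500", "BUND", "WHEAT"]
}
instrument_weights = {"SP500": 0.2, "BUND": 0.5, "WHEAT": 0.3}  # hypothetical

portfolio = rule_portfolio_returns(returns_by_instrument, instrument_weights)
rule_correlation = portfolio.corr()  # rules x rules correlation matrix
print(rule_correlation)
```

The resulting matrix has one row and column per trading rule variation, which is the input a clustering step would need.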

Finally note that I have 135 instruments with instrument weights, because some of my 208 are duplicates (eg micro and mini S&P 500), or I can't legally trade them, or for some other reason.

Results: N=2

Let's kick things off then:

Cluster 1 'convergent'
['mrinasset160', 'carry10', 'carry30', 'carry60', 'carry125', 'relcarry',
'skewabs365', 'skewabs180', 'skewrv365', 'skewrv180']
Cluster 2 'divergent'
['breakout10', 'breakout20', 'breakout40', 'breakout80', 'breakout160', 'breakout320',
'relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80', 
'assettrend2', 'assettrend4', 'assettrend8', 'assettrend16', 
'assettrend32', 'assettrend64', 
'normmom2', 'normmom4', 'normmom8', 'normmom16', 'normmom32', 'normmom64', 
'momentum4', 'momentum8', 'momentum16', 'momentum32', 'momentum64', 
'accel16', 'accel32', 'accel64']

An absolutely perfect convergent vs divergent split. The labels by the way are added by me, not the code.

Results: N=3

Cluster 1 'convergent' (Unchanged)
['mrinasset160', .... ]

Cluster 2 'fast divergent'
['breakout10', 'breakout20',
'relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80', 
'assettrend2', 'assettrend4', 
'normmom2', 'normmom4', 
'momentum4', 'accel16']

Cluster 3 'medium and slow divergent'
['breakout40', 'breakout80', 'breakout160', 'breakout320',
'assettrend8', 'assettrend16', 'assettrend32', 'assettrend64', 
'normmom8', 'normmom16', 'normmom32', 'normmom64', 
'momentum8', 'momentum16', 'momentum32', 'momentum64', 
'accel32', 'accel64']
This is why we are doing this exercise - we've just discovered something interesting: fast momentum like trading rules have more in common with other fast momentum trading rules, than they do with slow variations of themselves.

Results: N=4

Cluster 1 'convergent mean reversion'
Cluster 2 'convergent skew and carry'
['carry10', 'carry30', 'carry60', 'carry125', 'relcarry', 'skewabs365', 'skewabs180', 'skewrv365', 'skewrv180']
Cluster 3 'fast divergent' (unchanged)
['breakout10', 'breakout20', ....]
Cluster 4 'medium and slow divergent' (unchanged)
['breakout40', 'breakout80', ....]

Results: N=5

Now it's the turn of the (relatively) slow divergent to be split up:

Cluster 1 'convergent mean reversion' (unchanged)
['mrinasset160', 'mrwrings4']
Cluster 2 'convergent skew and carry' (unchanged)
['carry10', 'carry30', 'carry60', ....]
Cluster 3 'fast divergent' (unchanged)
['breakout10', 'breakout20', ....]
Cluster 4 'slow divergent'
['breakout160', 'breakout320',
'assettrend32', 'assettrend64', 
'normmom32', 'normmom64', 
'momentum32', 'momentum64']
Cluster 5 'medium speed divergent'
['breakout40', 'breakout80',
'assettrend8', 'assettrend16', 
'normmom8', 'normmom16', 
'momentum8', 'momentum16', 
'accel32', 'accel64']
Again it's the speed of trading that is the differentiator here, not the trading rule.

Results: N=6

We break relative momentum off from its counterparts in what was previously cluster 3:

Cluster 1 'convergent mean reversion' (unchanged)
Cluster 2 'convergent skew and carry' (unchanged)
['carry10', 'carry30', 'carry60', ...]
Cluster 3 'relative momentum'
['relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80']
Cluster 4 'fast divergent'
['breakout10', 'breakout20',
'assettrend2', 'assettrend4', 
'normmom2', 'normmom4', 
'momentum4', 'accel16']
Cluster 5 'slow divergent' (unchanged - was cluster 4)
['breakout160', 'breakout320',...
Cluster 6 'medium speed divergent' (unchanged - was cluster 5)
['breakout40', 'breakout80',...

Results: N=7

And now acceleration comes away from the other slow rules:

Cluster 1 'convergent mean reversion' (unchanged)
Cluster 2 'convergent skew and carry' (unchanged)
['carry10', 'carry30', 'carry60', ...]
Cluster 3 'relative momentum' (unchanged)
['relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80']
Cluster 4 'fast divergent' (unchanged)
['breakout10', 'breakout20'...
Cluster 5 'slow divergent ex. accel'
['breakout160', 'breakout320', 'assettrend32', 'assettrend64', 'normmom32', 'normmom64', 'momentum32', 'momentum64']
Cluster 6 'slow acceleration'
['accel32', 'accel64']
Cluster 7 'medium speed divergent' (unchanged - was cluster 6)
['breakout40', 'breakout80',...

Results: N=8

Skew and carry separate:

Cluster 1 'convergent mean reversion' (unchanged)
Cluster 2 'skew'
['skewabs365', 'skewabs180', 'skewrv365', 'skewrv180']
Cluster 3 'carry'
['carry10', 'carry30', 'carry60', 'carry125', 'relcarry']
Cluster 4 'relative momentum' (unchanged)
['relmomentum10', 'relmomentum20', 'relmomentum40', 'relmomentum80']
Cluster 5 'fast divergent' (unchanged)
['breakout10', 'breakout20'...
Cluster 6 'slow divergent ex. accel'
['breakout160', 'breakout320',...]
Cluster 7 'slow acceleration' (unchanged)
['accel32', 'accel64']
Cluster 8 'medium speed divergent' (unchanged)
['breakout40', 'breakout80',...

Results: N=11

Let's skip ahead a bit, and also show all the trading rules in each group for this final iteration:

Cluster 1 'slow asset mean reversion'
Cluster 2 'skew'
['skewabs365', 'skewabs180', 'skewrv365', 'skewrv180']
Cluster 3 'carry'
['carry10', 'carry30', 'carry60', 'carry125', 'relcarry']
Cluster 4 'fast relative momentum'
['relmomentum10', 'relmomentum20']
Cluster 5 'slow relative momentum'
['relmomentum40', 'relmomentum80']
Cluster 6 'divergent speed 2'
['breakout20', 'assettrend4', 'normmom4', 'momentum4']
Cluster 7 'divergent speed 1 (fastest)'
['breakout10', 'assettrend2', 'normmom2', 'accel16']
Cluster 8 'divergent speed 5 (slowest)'
['breakout160', 'breakout320', 'assettrend32', 'assettrend64', 'normmom32', 'normmom64', 'momentum32', 'momentum64']
Cluster 9 'slow acceleration'
['accel32', 'accel64']
Cluster 10 'divergent speed 4'
['breakout80', 'assettrend16', 'normmom16', 'momentum16']
Cluster 11 'divergent speed 3'
['breakout40', 'assettrend8', 'normmom8', 'momentum8']

A new hierarchy for handcrafting trading rules

With that all in mind, a better hierarchy would be something a bit like this:

  • Convergent
    • Mean reversion
    • Skew
      • Equal split between 4 skew rules
    • Carry
      • Outright carry
      • Relative carry
  • Divergent
    • Speed 1 (fastest: turnover > 45)
      • acceleration - nothing fast enough
      • relmomentum10
      • other trend
        • breakout10
        • assettrend2
        • normmom2
        • momentum4
    • Speed 2 (22 < turnover < 45)
      • acceleration16
      • relmomentum20
      • other trend
        • breakout20
        • assettrend4
        • normmom4
        • momentum8
    • Speed 3 (12 < turnover < 22)
      • acceleration32
      • relmomentum40
      • other trend
        • breakout40
        • assettrend8
        • normmom8
        • momentum16
    • Speed 4 (7 < turnover < 12)
      • acceleration64
      • relmomentum80
      • other trend
        • breakout80
        • assettrend16
        • normmom16
        • momentum32
    • Speed 5 (4 < turnover < 7)
      • other trend
        • breakout160
        • assettrend32
        • normmom32
        • momentum64
    • Speed 6 (turnover < 4)
      • other trend
        • breakout320
        • assettrend64
        • normmom64
As you can see I (roughly) used turnovers to group the divergent rules. These groupings aren't quite right, but I thought it better to go for some nice neat sequences. This also doesn't exactly follow how the clustering above works either. But this would certainly be a better way of doing things than grouping entirely by trading rule, which as we've seen doesn't make sense for divergent rules, where speed is more important.
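To show how a hierarchy like this turns into forecast weights, here is a small sketch that splits weight equally at every level, top down. The function name is hypothetical, the hierarchy is heavily abridged (only two of the six speeds, and I've assumed 'mrinasset160' sits under mean reversion), and a pure equal split is a simplifying assumption: a fuller handcrafting implementation may adjust the weights at each level.

```python
def equal_split_weights(tree, weight=1.0):
    """
    Split `weight` equally between the children at each level of a nested
    hierarchy. Branches are dicts; the bottom level is a list of rule names.
    """
    if isinstance(tree, list):
        return {rule: weight / len(tree) for rule in tree}

    weights = {}
    share = weight / len(tree)
    for branch in tree.values():
        weights.update(equal_split_weights(branch, share))
    return weights


# Abridged, illustrative version of the hierarchy above
hierarchy = {
    "convergent": {
        "mean reversion": ["mrinasset160"],
        "skew": ["skewabs365", "skewabs180", "skewrv365", "skewrv180"],
        "carry": {
            "outright": ["carry10", "carry30", "carry60", "carry125"],
            "relative": ["relcarry"],
        },
    },
    "divergent": {
        "speed 1": {
            "relmomentum": ["relmomentum10"],
            "other trend": ["breakout10", "assettrend2", "normmom2", "momentum4"],
        },
        "speed 6": {
            "other trend": ["breakout320", "assettrend64", "normmom64"],
        },
    },
}

forecast_weights = equal_split_weights(hierarchy)
for rule, weight in forecast_weights.items():
    print(f"{rule:15s} {weight:.4f}")
```

Note how a rule that sits alone in its group (such as relcarry here) ends up with more weight than each of the four outright carry variations that share a group, which is the whole point of grouping similar things together before allocating.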

Now of course there are a lot of caveats with this; first of all it's entirely in sample, but given how stable and persistent the correlations of trading rule returns are over time, we'd probably get very similar results with a purely backward looking approach. Secondly we're ignoring things like costs, and the possibility that some rules may do better than others, but we can deal with that when we actually use the above structure to set forecast weights.

Friday 5 May 2023

Trading and investing performance year nine - part 2: Futures trading

Here is part two of my annual review. Part one looked at my overall portfolio, including long only, but there was only a cursory look at my futures. Here in this second part I will be looking at my futures trading account in a lot more detail.

It's important to say why I'm doing this. I'm certainly not doing it so I can upweight good strategies, and delete badly performing ones. A year of data on top of a 50 year backtest is meaningless. But it's interesting to know what did well or badly, whether my trading costs were higher than expected, how closely my live performance matches simulation, and whether my new dynamic optimisation is adding value compared to the simpler static system I was trading until late 2021 (as requested by Christina). 

A reminder of my overall futures performance

As I noted at the start of the post, I'll just put a very cursory look at my futures trading in here, with a subsequent follow up post to look at more details. All figures are as a % of my notional capital, which will usually be more than I have in my account. 

MTM: -9.7%
Interest: 1.3%
Fees: -0.06%
Commissions: -0.21%

Net futures trading: -8.7%

'Interest' includes dividends on 'cash like' short bond ETFs I hold to make a slightly more efficient use of my cash; I've recently (in this financial year) added a bit more to this sub portfolio. It's quite interesting how interest has gone from being irrelevant to actually adding something to performance.

As I've done in previous years I compare this to two benchmarks: 'Bench2', the SGA CTA index, and 'Bench1', a fund run by AHL, my ex employers. My loss was worse than both; on a vol corrected basis Bench1 made 5% (admittedly on the back of a loss last year), and Bench2 only dropped 1.3%.

It might be interesting to look at the performance of me versus those benchmarks since I started trading my own money:

              Me       Bench1   Bench2
2014 – 2015   58.2%    70.2%    50.7%
2015 – 2016   23.2%    -8.7%    -1.6%
2016 – 2017   -14.0%   -6.2%    -25.5%
2017 – 2018   -3.7%    7.5%     -4.4%
2018 – 2019   5.2%     8.1%     0.8%
2019 – 2020   39.7%    22.6%    9.3%
2020 – 2021   0.4%     0.8%     12.7%
2021 – 2022   27.0%    -5.2%    38.3%
2022 – 2023   -8.71%   5.0%     -1.30%

Mean          14.1%    10.4%    8.8%
Std dev       24.3%    24.3%    23.1%
SR            0.58     0.43     0.38
Geom mean     11.9%    8.5%     6.7%

Correlation            0.71     0.80
alpha                  6.7%     6.8%
beta                   0.71     0.84
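For the curious, the summary rows can be approximately reproduced from the annual figures in the table. The sketch below is my reading of how those rows are defined rather than a confirmed method: it assumes the standard deviation is the sample standard deviation, the SR is simply mean divided by standard deviation with no risk free rate deducted, and alpha and beta come from a simple regression of my returns on each benchmark. Small differences from the table are to be expected since the inputs are rounded.

```python
import math

# Annual returns in percent, copied from the table (2014-15 to 2022-23)
me = [58.2, 23.2, -14.0, -3.7, 5.2, 39.7, 0.4, 27.0, -8.71]
bench1 = [70.2, -8.7, -6.2, 7.5, 8.1, 22.6, 0.8, -5.2, 5.0]
bench2 = [50.7, -1.6, -25.5, -4.4, 0.8, 9.3, 12.7, 38.3, -1.30]


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_std(values):
    return math.sqrt(sample_cov(values, values))


def summary(returns_pct):
    """Mean, standard deviation, Sharpe ratio and geometric mean (in percent)."""
    avg = mean(returns_pct)
    std = sample_std(returns_pct)
    growth = math.prod(1 + r / 100 for r in returns_pct)
    geometric = (growth ** (1 / len(returns_pct)) - 1) * 100
    return {"mean": avg, "std": std, "sharpe": avg / std, "geom_mean": geometric}


def versus_benchmark(mine, bench):
    """Correlation, plus alpha and beta from regressing my returns on the benchmark."""
    beta = sample_cov(mine, bench) / sample_cov(bench, bench)
    corr = sample_cov(mine, bench) / (sample_std(mine) * sample_std(bench))
    alpha = mean(mine) - beta * mean(bench)
    return {"correlation": corr, "beta": beta, "alpha": alpha}


my_stats = summary(me)
vs_bench1 = versus_benchmark(me, bench1)
vs_bench2 = versus_benchmark(me, bench2)
print(my_stats)
print(vs_bench1)
print(vs_bench2)
```

Because the benchmark returns have been vol normalised to roughly match mine, beta and correlation come out almost identical for Bench1.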

Monthly performance, returns vol normalised to 'me' in sample

Still looks pretty healthy. I seem to have been hurt less badly by the sell off that occurred in March, possibly due to a lower trend following component (more discussion of that later).

Market by market

Here are the numbers by asset class

  OilGas  -3.19
     Ags  -2.78
      FX  -2.00
  Equity  -2.14
  Sector  -1.64
  Metals  -0.94
     Vol   1.52
    Bond   1.56

A big turnaround for energy and ags markets, the darlings of the 2021/22 accounting period. And here are some worst and best:

0    GAS_US_mini   -2.0
1            SMI   -1.7
2          WHEAT   -1.5
3            AUD   -1.4
4          US10U   -1.4
5       EU-BASIC   -1.2
6           IRON   -1.2
7   CRUDE_W_mini   -1.2
8         SOYOIL   -1.0
9         GBPEUR   -1.0

47          US20    1.0
48       SOYMEAL    1.1
49           VIX    1.4
50           EUR    1.7
51           JPY    2.0
52          BUND    2.0

Interesting that I had profits in 53 instruments last year; about half the number I was actually trading. Again this is the work of the dynamic optimiser; before that I was trading only ~30 markets and it's unlikely I would have positions in all of them during the year.

Trading rules

Presented initially without comment, a bunch of plots showing p&l for each trading rule group:

The most obvious thing is how depressingly bad all of these graphs are. Pretty much every group of trading rules had a small net loss over the year. Even the diversifying carry and skew strategies weren't much help, although they did make money back in the sell off that caught out all the trend following style rules, narrowing my underperformance against the benchmark for the year. Only mean reversion (within asset classes - basically a value strategy), and relative carry were decently profitable. 

It's this lack of signal diversification that brought me to my second worst loss since I started trading: -8.7%. Of course, most equity long only managers would murder half their family for that to be their second worst annual loss, so let's get some perspective here.

Live vs sim

Now let's turn to see how well my live performance matches what my backtest thinks I got. The dynamic optimisation introduces some new jeopardy here, since it results in some path dependence; if the starting positions are different at the start of the time period it's more likely that things will diverge thereafter (I could deal with this by populating my actual starting positions into the backtest on the first of April, but that's a lot of hassle).

'dynamic' here is the backtest, 'live' is live.

Well there are certainly some differences here; you can see when I went on holiday, but interestingly we end up in exactly the same place. There are so many reasons why these will end up different, none of which I am that bothered about exploring.

Costs and slippage

I already noted above that my commission came in at 21bp (basis points = 0.21%) of account value, but how about slippage?

The cost I would have paid had I crossed the spread every time I traded (market orders) would have been 91bp. However my execution algo by sometimes executing passively saved me 22bp, i.e. around a quarter of the total. So my net slippage was 69bp, for total costs of 90bp.
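The arithmetic is simple enough, but for clarity here it is spelled out using the figures quoted above (all in basis points of account value; the variable names are just for illustration):

```python
# Figures quoted in the text, in basis points (1bp = 0.01%)
commission_bp = 21.0       # commissions paid
spread_cost_bp = 91.0      # cost if every trade had crossed the spread
passive_saving_bp = 22.0   # saved by sometimes executing passively

# Slippage actually paid, after the saving from passive execution
net_slippage_bp = spread_cost_bp - passive_saving_bp

# Total costs: slippage plus commissions
total_cost_bp = net_slippage_bp + commission_bp

# Share of the potential spread cost that the execution algo saved
saving_fraction = passive_saving_bp / spread_cost_bp

print(f"Net slippage: {net_slippage_bp:.0f}bp")
print(f"Total costs:  {total_cost_bp:.0f}bp")
print(f"Saving:       {saving_fraction:.0%} of the spread cost")
```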

This is a lot less than last year's 3% (due to a one off strategy change), and a little less than my backtest which comes in at around 1% a year. It looks like my new dynamic optimisation algo is doing its thing.

Dynamic vs static optimisation

Finally let's compare the performance of what I currently trade (dynamic optimisation with over 100 instruments) versus what I was trading before (static optimisation with fewer than 30). I'm going to compare backtest vs backtest here - I no longer trade static optimisation with real money so there is no other way of doing it; and it seems fairest to compare like with like. Plus we've already seen the difference between the dynamic optimisation backtest and its live production returns.

Naturally one year doesn't prove anything, and it's also true that the results of a static backtest can be unusually flattered by getting lucky with your choice of instruments (something I discuss at length in my new book). 

An important point here is that it's generally a good thing to store the code and config you use for past trading systems so you can do this exercise. It might also be worth noting down the hash number of the repo version you used with the code; firstly in case you fix bugs in the backtest that change the results (or introduce new bugs!) - although arguably if you run the same code with both backtests that is fairer; secondly in case you make changes that are backwardly incompatible and the old code just doesn't run.
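One possible way to note down the repo version alongside a backtest is sketched below. It assumes the code lives in a git repository and that git is installed; the function names and the config file name 'static_2021.yaml' are made up for the example.

```python
import subprocess
from datetime import date


def current_git_hash(repo_path="."):
    """Return the commit hash of the repo at repo_path, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_path, "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a git repository, or git is not installed
        return None
    return result.stdout.strip()


def backtest_record(config_name, repo_path="."):
    """Bundle the things needed to rerun an old backtest later on."""
    return {
        "config": config_name,
        "code_version": current_git_hash(repo_path),
        "run_date": date.today().isoformat(),
    }


record = backtest_record("static_2021.yaml")  # hypothetical config file name
print(record)
```

Saving a record like this next to the stored config means you can later check out exactly the code that produced the old results, or deliberately rerun both systems on the latest code.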

First the long view (well since 2000, which is when my stored backtest begins):

Again this is a case of the static backtest getting lucky. What about last year:
There is a bit of dynamic outperformance, and it's certainly a smoother ride. 

What next

I'm not as interested in some of these statistics as other people are; with the exception of costs, and as long as my live p&l is in the same ball park as my backtest. But hopefully your curiosity has been sated.

My short term plan is to add another bunch of instruments to my strategy, since I've added a bunch more to my database. Then I'm going to have a look at implementing some of the novel strategies in my book, albeit with some fun twists.