Tuesday, 13 April 2021

Trading and investing performance - year seven

 It's April, which means the birds are singing, the trees are leafing, and I'm doing my annual review of my investing and trading performance. 

The format will be familiar from previous years, but I'm going to be using the fact I've upgraded my live trading system to include a lot more detail about my futures trading performance.


TLDR: Last year my futures trading bailed me out during a market meltdown in which my long only portfolio underperformed a market that didn't do that well. My futures trading, and some (for me) very active and slightly discretionary ETF reallocations helped push me to a smaller loss. Since then the market has rebounded strongly, and my long only portfolio has done even better - especially in UK stockpicking. And as the diversification wheel spins round, my futures trading detracted somewhat from that performance, although not by very much.

I've also taken some steps to simplify my portfolio; reducing the number of ETFs I hold, clearing all the long only investments out of my trading account, and closing down my futures hedge.

 

Overview of my world


My investments fall into the following categories:


  • In my investment accounts:
    • 1 UK stocks
    • 2 Various ETFs, covering stocks, bonds, gold and property
    • 3 Usually some uninvested cash
  • In my trading account:
    • 4 Various ETFs, covering stocks and bonds
    • 5 A futures contract hedge against those long only ETFs (item 4 above), so that the net Beta is around zero (see the sizing sketch just after this list)
    • 6 Futures contracts traded by my fully automated trading system
    • 7 Cash needed for futures margin, and to cover potential trading losses (there is also some cash in my investment accounts, but it's pretty much a rounding error)
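
To make the hedge mechanics concrete, here's a minimal sketch of the sizing arithmetic. Every number below is hypothetical, purely for illustration, and not taken from my actual accounts:

# Rough sizing for a beta hedge: short enough index futures so that the
# futures exposure offsets the beta-weighted exposure of the long only ETFs.
# All figures are hypothetical.
etf_value = 250_000       # value of long only ETFs in the trading account
portfolio_beta = 0.95     # beta of those ETFs against the index being shorted
futures_price = 3_900     # price of the index future
multiplier = 10           # contract multiplier (currency per index point)

contracts_to_short = (portfolio_beta * etf_value) / (futures_price * multiplier)
print(round(contracts_to_short, 1))   # ~6.1 contracts for a net Beta near zero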


Excluded from this analysis is:

  • Net property equity - my house
  • My 'cash float', roughly 6 months of household expenditure that is kept separately from my investment and trading accounts. 


For the purposes of benchmarking it makes most sense to lump my investments in the following way:


  • A: UK single stocks
    • Benchmarked against ISF, a cheap FTSE 100 ETF (FTSE 350 is probably a better benchmark but these ETFs tend to be more expensive).
  • B: Long only investments: All ETFs (in both investment and trading accounts) and UK stocks
    • Benchmarked against a cheap 60:40 fund. This is the type of top down asset allocation portfolio I deal with in my second book.
  • C: Equity neutral: The ETFs in my trading account, plus the equity hedge. 
    • Benchmark is zero.
  • D: Futures trading: Return from the futures contracts traded by my fully automated system. This is the type of portfolio I deal with in my first book, and in my third book. The denominator of performance here is the notional capital at risk in my account (usually close to, but not exactly the same as the account value).
    • Benchmarks are a similar fund run by my ex employers AHL, and the SG CTA index, adjusted for volatility.
  • E: Trading account value: This is essentially everything in my trading account, and consists of equity neutral + futures trading. 
    • No relevant benchmark.
  • F: Everything: Long only investments, plus futures hedge, plus futures trading. I include the value of any cash included in my trading or investment accounts, since if I wasn't trading I could invest this. 
    • For the benchmark here again I use a cheap 60:40 fund.


If you prefer maths, then the relationship to the first set of categories is:

A = 1
B = 1 + 2 + 4
C = 4 + 5
D = 6 + 7
E = 4 + 5 + 6 + 7 = C + D
F = 1 + 2 + 3 + 4 + 5 + 6 + 7 = B + 3 + 5 + D

(Things will be simpler next year. See the end of the post)


Performance contribution


The figures shown are the contribution of each category to my total investment performance:

A, or 1) UK equities +10.4%
2) ETFs +17.5%
B) Long only investments +27.9%
C) Equity hedge -0.9%
D) Systematic futures trading +0.3%
E) Trading account: -0.8%
F) Total +27.4%

Here are the same figures as 'internal rates of return' - the Excel function XIRR (so you can't add these up, but they are comparable and account for flows between categories):

A) UK equities +64.3% Benchmark +24.6%
B) Long only investments +34.8%   Benchmark +21.5%
D) Systematic futures trading  +0.4% Benchmarks +0.7% +10.9%
F) Total +27.4% Benchmark+21.5%
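
(If you want to reproduce these internal rates of return outside of Excel, here's a minimal sketch of an XIRR calculation using scipy root finding on dated cashflows; the example cashflow is made up:)

from datetime import date
import numpy as np
from scipy.optimize import brentq

def xirr(cashflows):
    # cashflows: list of (date, amount); money invested is negative,
    # withdrawals and the final valuation are positive
    t0 = cashflows[0][0]
    years = np.array([(d - t0).days / 365.25 for d, _ in cashflows])
    amounts = np.array([amt for _, amt in cashflows])

    def npv(rate):
        return float(np.sum(amounts / (1.0 + rate) ** years))

    return brentq(npv, -0.99, 10.0)

# e.g. 10,000 invested at the start of the tax year, worth 12,740 a year later
print(xirr([(date(2020, 4, 6), -10_000), (date(2021, 4, 5), 12_740)]))   # ~0.27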

Clearly it's been an outstanding year on the 'long only' side, both in absolute terms and relative to the benchmark I use (Vanguard 60:40), with a particularly good year in UK stocks. Futures have been less impressive, but overall this is a substantial improvement on last year's carnage.



UK Equities


This portfolio uses a mechanical system described here, plus enforced sector diversification. 

At the start of the year this subportfolio looked like this:

HMSO Hammerson PLC (LSE:HMSO)               6.1%
IUKD    UK High yielding ETF 93.9%

Here's what I wrote last year:

"So after month end, once some data was available, I decided to start rebuilding my UK equity portfolio. Only the first of these purchases, Hammerson, is included in the year end figures above. It dropped by a quarter in value the day after I bought it, but now shows a modest profit.

Now a large number of stocks are passing my filter, but I am wary of going 'all in' and trying to bottom fish whilst risk levels are still elevated and trends are showing a mixed picture (decidedly up in the short term, still down in the long term).

My rule about sector diversification is more important than ever here; a huge number of housebuilders look great value, but they are probably more exposed than average to the bear case and so you would not want to exclusively own this sector. I decided to invest a proportion of my target portfolio every week in the stock that shows the best value, initially limiting myself to one stock per sector, with an eye to eventually holding about 20 - 25 stocks. I will miss out on some bargains by averaging in like this, but will hopefully pick up some stocks that are currently overvalued, and will obviously benefit if the current rally turns out to be a false dawn.

At the moment I am using cash for this, but at some point I will sell the 'placeholder' IUKD ETFs and also do a modest reallocation from bonds. It will take about 6 months to fully re-invest this sub-portfolio."

Over the course of the next few months (from April to November) I did indeed buy, buy, buy; and this is what we have now:

RMG   Royal Mail PLC (LSE:RMG)                      7.1%
INVP  Investec PLC (LSE:INVP)                       7.1%
MCRO  Micro Focus International PLC (LSE:MCRO)      5.9%
GNC   Greencore Group PLC (LSE:GNC)                 5.6%
ITV   ITV PLC (LSE:ITV)                             5.6%
CRST  Crest Nicholson Holdings PLC (LSE:CRST)       4.7%
SNR   Senior PLC (LSE:SNR)                          4.6%
DC.   Dixons Carphone PLC (LSE:DC.)                 4.5%
AV.   Aviva PLC (LSE:AV.)                           4.4%
VSVS  Vesuvius PLC (LSE:VSVS)                       4.4%
MKS   Marks & Spencer Group PLC (LSE:MKS)           4.2%
RAT   Rathbone Brothers PLC (LSE:RAT)               4.0%
CNA   Centrica PLC (LSE:CNA)                        3.9%
MGNS  Morgan Sindall Group PLC (LSE:MGNS)           3.9%
MGAM  Morgan Advanced Materials PLC (LSE:MGAM)      3.8%
SYNT  Synthomer PLC (LSE:SYNT)                      3.7%
LLOY  Lloyds Banking Group PLC (LSE:LLOY)           3.6%
BT.A  BT Group PLC (LSE:BT.A)                       3.4%
IMB   Imperial Brands PLC (LSE:IMB)                 3.0%
GOG   Go-Ahead Group (The) PLC (LSE:GOG)            2.9%
MRW   Morrison (Wm) Supermarkets PLC (LSE:MRW)      2.8%
DLG   Direct Line Insurance Group PLC (LSE:DLG)     2.6%
UKW   Greencoat UK Wind (LSE:UKW)                   2.3%
BAB   Babcock International Group PLC (LSE:BAB)     1.9%
HMSO  Hammerson PLC (LSE:HMSO)                      0.2%

(The buying was funded partly from selling the stopgap IUKD, partly from selling bond ETFs as part of my risk exposure optimisation, and also with a big chunk of cash).

Apart from Investec, which I bought a double helping of, all of these had the same initial investment so the order above also reflects their relative performance. In fact only Hammerson, Babcock and UK Wind are currently showing losses. In contrast the total return of Royal Mail is a stellar 177% (since July, so even more on an annualised basis), presumably due to all the rubbish that we are all ordering from the internet; and MCRO and ITV are both up more than 100%.

(Strictly speaking, Hammerson hit its stop loss and then some right at the start of the year, but I didn't get round to selling it. Its value is now so low it is not worth selling; so I'm keeping it as an out of the money call option on UK commercial property.)

The IRR of 64.3% compares extremely well with the benchmark, the ISF FTSE 100 fund, which returned 'just' 24.6%. It isn't down to timing: on a weighted average basis my purchases happened when the index had already rallied 10% from the start of the year. So it's all down to stock selection. My idea of buying the cheapest stock in a sector I did not yet own was perfectly suited to a year in which sector and factor rotation was churning like crazy; even if I missed out on the first leg up of the market rally.
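
(To make the mechanics concrete, here's a stylised sketch of the 'cheapest stock in a sector I don't yet own' idea; the dataframe, value metric and numbers are illustrative, not my actual screening code:)

import pandas as pd

def next_purchase(candidates: pd.DataFrame, owned_sectors: set) -> pd.Series:
    # candidates: one row per stock passing the filter, with columns
    # 'ticker', 'sector' and 'value_metric' (lower = cheaper)
    eligible = candidates[~candidates["sector"].isin(owned_sectors)]
    if eligible.empty:
        # every sector already owned: relax the constraint
        eligible = candidates
    return eligible.sort_values("value_metric").iloc[0]

candidates = pd.DataFrame([
    ("RMG", "Industrials", 0.35),
    ("LLOY", "Financials", 0.42),
    ("AV.", "Financials", 0.47),
    ("ITV", "Media", 0.51),
], columns=["ticker", "sector", "value_metric"])

print(next_purchase(candidates, owned_sectors={"Industrials"}))   # picks LLOY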

Still, the UK market has done very poorly compared to most; global equities are up over 50%, mainly due to US equities being up over 60%. In this company my 64.3% would be good, but not brilliant. 

Dividends only added about 1% to the performance above; partly due to COVID as firms weren't paying, and partly because I only owned these stocks for 7 months or so on average. In fact I received dividends in only 10 of the stocks above, with only a couple (IMB and AV.) giving me a yield on the purchase price above 5%.


2016 - 2017 XIRR  29.2%, benchmark  22.7%
2017 - 2018 XIRR  18.3%, benchmark   2.2%
2018 - 2019 XIRR  -2.3%, benchmark   7.6%
2019 - 2020 XIRR -23.1%, benchmark -24.3%
2020 - 2021 XIRR +64.3%, benchmark +24.6%

UK stock picking has been, with one blip, the best part of my portfolio and this year is no exception. On average I'm ahead of the benchmark by over 8% a year: the geometric means are 13.5% and 4.9% respectively; the Sharpe Ratios are 0.52 and 0.33. 
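
(Those geometric means and Sharpe Ratios can be reproduced from the annual figures above, using a zero risk free rate and the sample standard deviation:)

import numpy as np

mine      = np.array([0.292, 0.183, -0.023, -0.231, 0.643])   # 2016/17 to 2020/21
benchmark = np.array([0.227, 0.022,  0.076, -0.243, 0.246])

def geo_mean(annual_returns):
    return np.prod(1 + annual_returns) ** (1 / len(annual_returns)) - 1

def sharpe(annual_returns):
    # zero risk free rate, annual data, sample standard deviation
    return np.mean(annual_returns) / np.std(annual_returns, ddof=1)

print(geo_mean(mine), geo_mean(benchmark))   # ~13.5% and ~4.9%
print(sharpe(mine), sharpe(benchmark))       # ~0.52 and ~0.33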



ETFs and funds


All my non UK and non equity exposure is in ETFs, with a smattering of investment trusts. As usual trading was done for tax optimisation, to generate funds for SIPP and ISA investment, and to get the right risk exposure (discussed later). As the markets continued to be volatile I did a lot more trading than usual (though not as much as in the crazy days of March last year). This has slowed down in the last few months of the year as my exposure targets stopped moving so much.


I don't look at the performance or risk of my ETF portfolio separately, only in conjunction with UK shares.

It's worth noting that I took the opportunity to realise tax losses and clean out a lot of ETFs from my trading account. This will make the performance reporting for that account cleaner, since I also closed out the equity hedge. All that is in that account now is cash and a cash like ETF (XSTR).

Overall then I've gone from about 42 ETF names to 'just' 18. This is sufficiently few that I can actually print the entire list:

Equity EM     Dividend        SEDY
Equity Europe Dividend        EUDV
Equity Europe Dividend        IDVY
Equity US     Dividend        HDLG
Equity Asia   Normal          PAXG
Bond   UK     High yield      ISXF
Equity Global Dividend        VHYL
Equity Asia   Dividend        IAPD
Bond   Europe High yield      SHYG
Bond   EM     Government      SEMB
Equity Europe Dividend        EQDS
Bond   UK     Cash like       XSTR
Bond   EM     Corporate       EMCP
Equity Asia   Dividend        PADV
Equity US     Dividend        USDV
Equity EM     Normal          HMEF
Equity Global Infrastructure  PGIT
Bond   Global High yield      VSL


Mostly these follow the recommendations in my 'model portfolio'; however there are some funds which I owned before I did that exercise, and which may be slightly more expensive. The last two funds aren't strictly ETFs but UK listed investment trusts; I've dithered about where to classify these, but they seem to make most sense here.



Long only


Calculating a joint XIRR for both stocks and ETFs, I get 34.8%, which compares extremely well to my usual benchmark, a Vanguard 60:40 fund, which earned 21.5%. Dividends accounted for 3.2% of that: not bad considering the whole global pandemic thing.

Perhaps half of that is down to tactical asset allocation; I started the year with an equity:bond allocation of 61:39 (ignoring the relatively high cash load I had, and at this point ignoring my futures trading entirely); and by December I was running 85:15 due to deallocating from bonds on strongly negative relative momentum (as not only did bonds go down, but equities rallied strongly). 

An 80:20 benchmark would have earned about 30%, and 100% equities would have been more like 40%. Had I just invested in Vanguard funds and adjusted the risk exposure to match the adjustments I made to my asset allocation (i.e. to try and factor out the component of my returns from asset class allocation), I would probably have earned something in the high 20's.

The rest then is intra-asset allocation. Some of that was definitely negative: I'm perennially underallocated to the horrifically overpriced US market (started at 20% of equities, ended up at around 12%, compared to a market cap share of over 50%); and as we know the US market did extremely well: up over 60% in the relevant period.

But as we've seen my UK equity investing was top notch (moving my entire long only XIRR from about 27% without the UK, to nearly 35% with it included). I also did relatively less badly in bonds, as I sold government bonds but held on to riskier corporates, which haven't done so badly.


2016 - 2017 XIRR  22.3%, benchmark   17.7%
2017 - 2018 XIRR   1.3%, benchmark   1.3%
2018 - 2019 XIRR   4.0%, benchmark   7.2%
2019 - 2020 XIRR -17.5%, benchmark -10.5%
2020 - 2021 XIRR +34.8%, benchmark +21.5%

The geometric means come out at 7.5% and 6.8% respectively; a small beat for me, but nowhere near statistically significant (and my Sharpe Ratio is actually lower).

Note: I've included the 'cash like' ETF XSTR that sits in my trading account in the ETF figures. Without this my performance in this bucket would have been a little higher.



Systematic futures trading and equity hedge



The systematic futures trading system I run is effectively what you can find in "Systematic Trading", and which I've blogged about at length.


This year I made a few changes. I turned off my old code, and switched to the implementation in pysystemtrade. As I noted above, I also closed out a bunch of ETF positions, and my equity hedge. So going forward this will be a much simpler report. All that is in this account now is futures, cash, and a 'cash like' ETF: XSTR.


However things are still a little more complicated for 2020-21, and for my trading account as a whole the breakdown looks like this (all numbers are as a % of the notional maximum capital at risk):

Hedging futures: -4.5%
Hedged stocks, total return: +0.1%

Net equity neutral: -4.3%

Futures trading:
MTM: 4.1%
Interest: -0.09%
Fees: -0.09%
Commissions: -0.47%
FX gain/loss: -3.1%

Net futures trading: +0.39%

Grand total: -4.0%


So I actually made money on my futures positions, but gave most of it back on FX, and lost outright on the equity 'neutral' component. The FX loss came because the value of the cash I hold in non GBP currencies went down (GBPUSD rallied from about 1.23 to 1.37 during the year, though fortunately GBP was flat against the euro). 


The absolute size of this currency gain/loss is not out of line with earlier years. I could probably be a bit more pro-active in terms of managing my FX exposure, sweeping as much as possible back into GBP on a regular basis, but at the end of the day I will always have to hold quite a bit of foreign currency in my account for margin to avoid paying borrowing charges, and I'm not bothered enough about it that I'm going to hedge it out (though perhaps I should be).


Let's turn to the benchmarks.

 'Bench1' is a GBP denominated AHL fund, using monthly returns from April to March in each year, and a new benchmark 'Bench2' is the SG CTA index, with matching daily returns. Both have returns scaled up to match my volatility: multiplying by 1.45 and 1.8 respectively. As I redo this calculation every year the benchmark numbers don't always match previous years.
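
(The scaling factor is just the ratio of my realised volatility to each benchmark's, something like this; the return series in the example are placeholders:)

import numpy as np

def vol_scalar(my_returns, benchmark_returns):
    # factor to multiply benchmark returns by, so both series have the same volatility
    return np.std(my_returns, ddof=1) / np.std(benchmark_returns, ddof=1)

# illustrative numbers only (monthly % returns)
my_rets    = np.array([2.0, -3.5, 4.1, -1.2, 0.8])
bench_rets = np.array([1.1, -2.0, 2.4, -0.9, 0.5])
scaled_bench = bench_rets * vol_scalar(my_rets, bench_rets)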

Remember the benchmark should only be compared against futures trading, not the equity neutral component of the portfolio.


Year:    14/15   15/16   16/17   17/18  18/19  19/20    20/21

Total:   57.2%   39.6%    0.3%    0.4%   6.1%   35.7%   -4.0%
Hedge:   -1.1%   16.3%   14.4%    4.1%   1.0%   -4.0%   -4.3%
Futures: 58.2%   23.2%  -14.0%   -3.7%   5.2%   39.7%   +0.4%

Bench1:  66.9%   -8.3%   -5.8%    7.1%   7.7%   21.6%   +0.7%
Bench2:          -6.7%* -21.9%   -3.8%   0.7%    8.0%   10.9%

* From 13th April 2015


So a pretty poor year, all things considered, and the first time the SG index has beaten me (and it would have done so even without the FX noise). The first benchmark didn't have a great year either: their trading style is (unsurprisingly) closer to mine than the rest of the industry's.

My futures only annualised geometric mean of 13.1% and Sharpe Ratio of 0.6 are still a tiny bit better than the AHL fund's (12.9% and 0.5 respectively), and still an awful lot better than the SG index.

(Incidentally, my Sharpe ratio calculated on monthly or daily returns is actually over 1, which is in line with my backtest).

On balance it will be nice to have removed the equity hedge; it did add to performance over the years I've had it, but it's also been a source of unwanted noise.


Let's do some python


In previous years I would have stopped here, but now we have pysystemtrade with all its lovely diagnostics.

Let's start with the account curve:

from sysproduction.data.capital import dataCapital

d = dataCapital()
# accumulated capital (the account's equity curve), and its daily changes
pandl = d.get_series_of_accumulated_capital()
pandl_returns = pandl.diff()
# scale daily p&l by the maximum capital at risk
pandl_capital = d.get_series_of_maximum_capital()
profits = pandl_returns / pandl_capital
profits.cumsum().plot()

And zoomed in to last year

So I hit a high water mark (HWM) in June, lost that and some more, then recovered, and have been gently wobbling around without much happening for the last few months.
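
(Given the profits series from the snippet above, the distance from the high water mark is easy to plot:)

# cumulative profit and drawdown relative to the high water mark
cum_profit = profits.cumsum()
drawdown = cum_profit - cum_profit.cummax()
drawdown.plot()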

I can use the pandl_report function to zoom in on various markets and instruments:

====================================
P&L by instrument for all strategies
====================================

      codes  pandl
1   EUROSTX  -4.31  (used as a hedge only)
2   EDOLLAR  -2.86
3       JPY  -1.51
4      US10  -1.48
5      KR10  -1.08
6   LEANHOG  -0.99
7   CRUDE_W  -0.72
8    GAS_US  -0.66
9      CORN  -0.52
... various small numbers removed
17     BOBL   0.52
18      OAT   0.56
19      VIX   1.01
20  SOYBEAN   1.10
21      CAC   1.32
22      MXP   1.81
23      SMI   2.24
24      AUD   2.47
25     PLAT   2.81
26      BTP   3.59
27    WHEAT   4.53



==================
P&L by asset class
==================

    codes  pandl
0    STIR  -2.86
1  OilGas  -1.39
2  Equity  -0.76
3     Ags  +0.02
4     Vol   1.12
5    Bond   1.44
6      FX   2.59
7  Metals   2.81






Total investment return


My total return on all my investments, including cash held for futures margin, came in at an XIRR of +27.9% (the simple return which we've already seen above was slightly lower: +27.4%). This is lower than the 34.8% for long only for a couple of reasons; firstly I did lose money trading futures (a loss in my equity hedge, partly offset with a gain in my actual trading). Secondly, and more importantly, the denominator is larger as it includes the cash I use for futures margin. Once again Vanguard 60:40 seems an appropriate benchmark (since if I wasn't trading futures I could throw all my cash into that fund), at +21.5%


Let's look at some history


2016 - 2017 XIRR  18.2%, benchmark  19.3%
2017 - 2018 XIRR   0.6%, benchmark   1.3%
2018 - 2019 XIRR   4.4%, benchmark   7.2%
2019 - 2020 XIRR  -6.6%, benchmark -10.5%
2020 - 2021 XIRR +27.9%, benchmark +21.5%


I noted last year that I was in a dead heat with the benchmark in the geometric mean race; obviously this year puts me in the lead, with a geometric mean of 9.3% versus 7.1% (my Sharpe Ratio is also a little higher: 0.64 to 0.58, using a zero risk free interest rate). To put it another way, futures have added about 1.8% annually to my performance, though quite unevenly: 


2016 - 2017 Total XIRR  18.2%,  22.3% without futures
2017 - 2018 Total XIRR   0.6%,   1.3% without futures
2018 - 2019 Total XIRR   4.4%,   4.0% without futures
2019 - 2020 Total XIRR  -6.6%, -17.5% without futures
2020 - 2021 Total XIRR +27.9%, +34.8% without futures


(They would almost certainly have added substantially to my performance in 2014-2015, and 2015-2016, though I wasn't formally measuring my entire portfolio at this point).



Risk


This year I started using the model portfolio (outlined here) as a basis for my risk allocation. So the figures here should look pretty similar to those (with the addition of cash and futures exposure). During the year I sold down my cash, and also shifted from bonds to equities in line with the 12 month momentum signals I use for this allocation.
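
(To be clear about what I mean by a 12 month momentum signal, here's a stylised sketch; the tilt sizing below is purely illustrative and isn't my actual rule:)

import pandas as pd

def twelve_month_momentum(prices: pd.Series) -> float:
    # trailing 12 month total return, assuming roughly 252 business days of daily prices
    return prices.iloc[-1] / prices.iloc[-252] - 1.0

def tilted_equity_weight(equity_prices: pd.Series, bond_prices: pd.Series,
                         base_equity_weight: float = 0.5, max_tilt: float = 0.25) -> float:
    # tilt towards whichever asset class has the stronger trailing momentum,
    # capped so the tilt never exceeds max_tilt either way
    momentum_gap = twelve_month_momentum(equity_prices) - twelve_month_momentum(bond_prices)
    tilt = max(min(momentum_gap, max_tilt), -max_tilt)
    return min(max(base_equity_weight + tilt, 0.0), 1.0)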

Cash allocation

|Asset   |Start of year|Current|
--------------------------------
|Bonds   |   28.9%     | 12.7% |
|Equity  |   45.8%     | 73.4% |
|Other   |    0.2%     |  0.0% |
|Cash    |   25.2%     | 13.9% |

"Cash" excludes money held in bank accounts which will cover ~6 months of living expenses that I do not deem part of my investment portfolio. 
Note: I've included the 'cash like' ETF XSTR that sits in my trading account in the Bond ETF figures, not in the cash. It amounts to 1.8% of the total allocation figures shown here. You could argue that the bond ETF figures are more like 10.9% and cash 15.7%.

Other: I had some random property and metals ETFs that I've cleared out in the cull of small holdings I mentioned above.


Risk allocation


|Asset    |Strategic|Start of year|Current| 
-------------------------------------------
|Bonds    |   25%   |   17.4%     |  7.0% |   
|Equity   |   50%   |   50.2%     | 71.6% |
|Futures  |   25%   |   35.2%     | 21.4% |
|Other    |    0%   |    0.3%     |  0.0% |


The bond/equity change is again based on momentum. My futures risk has gone down, but only as a percentage; my futures risk in money terms is unchanged, but my overall risk is higher because I'm holding less cash and more equities.
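
(The difference between the cash and risk tables is essentially a volatility weighting; a toy example with made up volatilities, ignoring correlations and treating the futures book separately:)

# toy conversion of cash weights into risk weights (illustrative numbers only)
cash_weights = {"Bonds": 0.30, "Equity": 0.55, "Cash": 0.15}
assumed_vol  = {"Bonds": 0.06, "Equity": 0.16, "Cash": 0.00}   # annualised standard deviation

risk_exposure = {k: cash_weights[k] * assumed_vol[k] for k in cash_weights}
total = sum(risk_exposure.values())
risk_weights = {k: v / total for k, v in risk_exposure.items()}
print(risk_weights)   # equities take ~83% of the risk from a 55% cash weight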



Current regional exposures (rows add up to 100%)


|      | Asia | EM   | Euro  |  UK   |  US  |
---------------------------------------------
|Bonds | 0%   | 39%  | 17%   |  42%  |  2% |
|Equity| 12%  | 23%  | 23%   |  30%  |  12% |


Relative to the tactical allocations I would like to have (in the absence of paying capital gains tax), I'm overweight UK and EM bonds, and European equities; and underweight Asian, EM and US equities. 



Summary, and what now



Performance has obviously been good, but I prefer to focus on process. Things have gradually calmed down this year, and I've brought some more rigour to my methodology (in the form of the model portfolio) and also done some cleaning and tidying to simplify things.

So my structure is now much cleaner:

  • In my investment accounts:
    • 1 UK stocks
    • 2 Various ETFs, covering stocks, bonds, gold and property
    • 3 Usually some uninvested cash
  • In my trading account:
    • 4 Futures contracts traded by my fully automated trading system
    • 5 Cash needed for futures margin, and to cover potential trading losses (there is also some cash in my investment accounts, but it's pretty much a rounding error). Some of this cash is held in the form of a 'cash like' ETF XSTR.

For the purposes of benchmarking it makes most sense to lump my investments in the following way:

  • A: UK single stocks
    • Benchmarked against ISF
  • B: Long only investments: All ETFs (in both investment and trading accounts) and UK stocks
    • Benchmarked against a cheap 60:40 fund. 
  • C: Futures trading: Return from the futures contracts traded by my fully automated system. 
    • Benchmarks are a similar fund run by my ex employers AHL, and the SG CTA index, adjusted for volatility.
  • D: Everything: Long only investments, plus futures trading. I include the value of any cash included in my trading or investment accounts, since if I wasn't trading I could invest this. 
    • For the benchmark here again I use a cheap 60:40 fund.


If you prefer maths, then the relationship to the first set of categories is:

A = 1
B = 1 + 2 
C = 4 + 5
D = 1 + 2 + 3 + 4 + 5

UK stockpicking has obviously done well, but I have no interest in substantially increasing the size of my book here; it's now up to 25 names and is also a fraction over its tactical target allocation. If some stop losses start being hit then I will revisit the target, and if necessary adjust the amount of capital I put into any new shares. 

I'm happy with my new model portfolio based approach to managing my ETF exposure; it's now much more formally in line with the sort of thing outlined in Smart Portfolios, and only a few minutes of work every month to keep updated. I still have to do this year's ISA and pension allocations, which should allow me to get a bit closer to the ideal tactical allocations. Almost nothing I own has the potential for a capital gains tax loss, so it might be hard to do a huge amount of rebalancing.

I'm currently in the middle of a massive refactoring of my pysystemtrade research code; once completed I hope to research and implement some new futures trading systems. Watch this space!

Thursday, 4 March 2021

Does it make sense to change your trading behaviour in different periods of volatility?

 A few days ago I was browsing on the elitetrader.com forum site when someone posted this:

I am interested to know if anyone change their SMA/EMA/WMA/KAMA/LRMA/etc. when volatility changes? Let say ATR is rising, would you increase/decrease the MA period to make it more/less sensitive? And the bigger question would be, is there a relationship between volatility and moving average?

Interesting I thought, and I added it to my very long list of things to think about (in fact I've researched something vaguely like this before, but I couldn't remember what the results were, and the research was done whilst at my former employers, which means it's currently behind a firewall and a 150 page non disclosure agreement). 

Then a couple of days ago I ran a poll off the back of this post as to what my blogpost this month should be about (though mainly the post was an excuse to reminisce about the Fighting Fantasy series of books).

And lo and behold, this subject is what people wanted to know about. But even if you don't want to know about it, and were one of the 57% that voted for the other two options, this is still probably a good post to read. I'm going to be discussing principles and techniques that apply to any evaluation of this kind of system modification.

However: spoiler alert - this little piece of research took an unexpected turn. Read on to find out what happened...



Why this is topical


This is particularly topical because during the market crisis that consumed much of 2020 it was faster moving averages that outperformed slower. Consider these plots which show the average Sharpe Ratio for different kinds of trading rule averaged across instruments. The first plot is for all the history I have (back to the 1970's), then the second is for the first half of 2020, and finally for March 2020 alone:



The pattern is striking: going faster works much better than it did in the overall sample. What's more, it seems to be confined to the financial asset classes (FX, Rates and especially equities) where vol exploded the most:



Furthermore, we can see a similar effect in another notoriously turbulent year:

If we were sell side analysts that would be our nice little research paper finished, but of course we aren't... a few anecdotes do not make up a serious piece of analysis.


Formally specifying the problem

Rewriting the above in fancy sounding language, and bearing in mind the context of my trading system, I can write the above as:

Are the optimal forecast weights across trading rules of different speeds different when conditioned on the current level of volatility?

As I pointed out in my last post this leaves a lot of questions unanswered. How should we define the current level of volatility? How do we define 'optimality'? How do we evaluate the performance of this change to our simple unconditional trading rules?



Defining the current level of volatility


For this to be a useful thing to do, 'current' is going to have to be based on backward looking data only. It would have been very helpful to have known in early February last year (2020) that vol was about to rise sharply, and thus perhaps different forecast weights were required, but we didn't actually own the keys to a time machine so we couldn't have known with certainty what was about to happen (and if we had, then changing our forecast weights would not have been high up our to-do list!).

So we're going to be using some measure of historic volatility. The standard measure of vol I use in my trading system (exponentially weighted, equivalent to a lookback of around a month) is a good starting point which we know does a good job of predicting vol over the next 30 days or so (although it does suffer from biases, as I discuss here). Arguably a shorter measure of vol would be more responsive, whilst a longer measure of vol would mean that our forecast weights aren't changing as much thus reducing the costs.
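
(For concreteness, the kind of vol estimate I mean is just an exponentially weighted standard deviation of daily returns, something like this rough standalone sketch rather than the exact pysystemtrade implementation:)

import pandas as pd

def ewma_daily_vol(prices: pd.Series, span: int = 35) -> pd.Series:
    # exponentially weighted standard deviation of daily returns;
    # a span of ~35 business days is roughly a one month lookback
    daily_returns = prices.pct_change()
    daily_vol_pct = daily_returns.ewm(span=span).std() * 100   # daily vol, % of price
    return daily_vol_pct   # multiply by ~16 (sqrt of 256) to annualise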

Now how do we define the level of volatility? In that previous post I used the current vol estimate divided by a 10 year rolling average of vol for the relevant instrument. That seems pretty reasonable. 

Here for example is the rolling % vol for SP500:

import pandas as pd
from systems.provided.futures_chapter15.basesystem import *

system = futures_system()

instrument_list = system.get_instrument_list()

all_perc_vols = [system.rawdata.get_daily_percentage_volatility(code) for code in instrument_list]



 And here's the same, after dividing by 10 year vol:

ten_year_averages = [vol.rolling(2500, min_periods=10).mean() for vol in all_perc_vols]
normalised_vol_level = [vol / ten_year_vol for vol, ten_year_vol in zip(all_perc_vols, ten_year_averages)]




The picture is very similar, but importantly we can now compare and pool results across instruments.

import numpy as np
import matplotlib.pyplot

def stack_list_of_pd_series(x):
    stacked_list = []
    for element in x:
        stacked_list = stacked_list + list(element.values)

    return stacked_list

stacked_vol_levels = stack_list_of_pd_series(normalised_vol_level)

stacked_vol_levels = [x for x in stacked_vol_levels if not np.isnan(x)]
matplotlib.pyplot.hist(stacked_vol_levels, bins=1000)

What's immediately obvious, once we stack up the normalised vols across all markets and plot the distribution, is that it is very skewed:

Update: There was a small bug in my code that didn't affect the conclusions, but had a significant effect on the scale of the normalised vol. Now fixed. Thanks to Rafael L. for pointing this out.






The mean is 0.98 - as you'd expect - but look at that right tail! About 1% of the observations are over 2.5, and the maximum value is nearly 6.7. You might think this is due to some particularly horrible markets (VIX?), but nearly all the instruments have normalised vol that is distributed like this.
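
(The figures quoted here come straight off the stacked series from the snippet above:)

import numpy as np

stacked = np.array(stacked_vol_levels)
print(np.mean(stacked))         # ~0.98
print(np.mean(stacked > 2.5))   # fraction of observations above 2.5, roughly 1%
print(np.max(stacked))          # ~6.7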

At this point we need to think about how many vol regimes we're going to have, and how they should be selected. More regimes will mean we can more closely fit our speed to what is going on, but we'd end up with fewer data points (I'm reminded of this post where someone had inferred behaviour from just 18 days when the VIX was especially low). Fewer data points will mean our forecast weights will either revert to an average or, worse, take extreme values if we're not fitting robustly.

I decided to use three regimes:
  • Low: Normalised vol in the bottom 25% quantile [using the entire historical period so far to determine the quantile] (over the whole period, normalised vol between 0.16 and 0.7 times the ten year average)
  • Medium: Between 25% and 75% (over the whole period, normalised vol 0.7 to 1.14 times the ten year average)
  • High: Between 75% and 100% (over the whole period, normalised vol 1.14 to 6.6 times the ten year average)
There could be a case for making these regimes equal size, but I think there is something about relatively high vol that is unique, so I made that group smaller (with low vol the same size for symmetry). Equally, there is a case for making them more extreme. There certainly isn't a case for jumping ahead and seeing which range of regimes performs the best - that would be implicit fitting!

def historic_quantile_groups(system, instrument_code, quantiles = [.25,.5,.75]):
    daily_vol = system.rawdata.get_daily_percentage_volatility(instrument_code)
    # We shift by one day to avoid forward looking information
    ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean().shift(1)
    normalised_vol = daily_vol / ten_year_vol

    quantile_points = [get_historic_quantile_for_norm_vol(normalised_vol, quantile) for quantile in quantiles]
    stacked_quantiles_and_vol = pd.concat(quantile_points+[normalised_vol], axis=1)
    quantile_groups = stacked_quantiles_and_vol.apply(calculate_group_for_row, axis=1)

    return quantile_groups

def get_historic_quantile_for_norm_vol(normalised_vol, quantile_point):
    return normalised_vol.rolling(99999, min_periods=4).quantile(quantile_point)

def calculate_group_for_row(row_data: pd.Series) -> int:
    values = list(row_data.values)
    if any(np.isnan(values)):
        return np.nan
    vol_point = values.pop(-1)
    group = 0  # lowest group
    for comparision in values[1:]:
        if vol_point <= comparision:
            return group
        group = group + 1

    # highest group will be len(quantiles)-1
    return group

Over all instruments pooled together...
quantile_groups = [historic_quantile_groups(system, code) for code in instrument_list]
stacked_quantiles = stack_list_of_pd_series(quantile_groups)
.... the size of each group comes out at:
  • Low vol: 53% of observations
  • Medium vol: 22% 
  • High vol: 25%
That's different from the 25,50,25 you'd expect. That's because vol isn't stable over this period, and we're using backward looking quantiles, rather than doing a forward looking cheat where we use the entire period to determine our quantiles (which would give us exactly 25,50,25).

Still, we've got a quarter in our high vol group, which is what we were aiming for. And I feel it would be some kind of cheating to go back and change the quantile cutoffs having seen these numbers.
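
(For completeness, the group proportions quoted above are just a value count on the stacked series:)

import pandas as pd

group_sizes = pd.Series(stacked_quantiles).value_counts(normalize=True).sort_index()
print(group_sizes)   # fraction of observations in groups 0 (low), 1 (medium), 2 (high)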


Unconditional performance of momentum speeds


Let's get the unconditional returns for the rules in our trading system: momentum using exponentially weighted moving average crossovers, from 2_8 (2 day lookback, 8 day) up to 64_256, plus the carry rule (not strictly speaking part of the problem we're looking at today, but what the hell: we can use it as a proxy for determining whether 'divergent' / momentum or 'convergent' systems do worse or better when vol is high or low). These are average returns across instruments, which won't be as good as the portfolio level returns for each rule (we'll look at those later).

rule_list = list(system.rules.trading_rules().keys())
perf_for_rule = {}
for rule in rule_list:
    perf_by_instrument = {}
    for code in instrument_list:
        perf_for_instrument_and_rule = system.accounts.pandl_for_instrument_forecast(code, rule)
        perf_by_instrument[code] = perf_for_instrument_and_rule

    perf_for_rule[rule] = perf_by_instrument

# stack
stacked_perf_by_rule = {}
for rule in rule_list:
    acc_curves_this_rule = perf_for_rule[rule].values()
    stacked_perf_this_rule = stack_list_of_pd_series(acc_curves_this_rule)
    stacked_perf_by_rule[rule] = stacked_perf_this_rule

def sharpe(x):
    # assumes daily data
    return 16*np.nanmean(x) / np.nanstd(x)

for rule in rule_list:
    print("%s:%.3f" % (rule, sharpe(stacked_perf_by_rule[rule])))

ewmac2_8:0.064
ewmac4_16:0.202
ewmac8_32:0.303
ewmac16_64:0.345
ewmac32_128:0.351
ewmac64_256:0.339
carry:0.318

Similar to the plot we saw earlier; unconditionally, medium and slow momentum (and carry) tend to outperform fast momentum.

Now what if we condition on the current state of vol?
historic_quantiles = {}
for code in instrument_list:
    historic_quantiles[code] = historic_quantile_groups(system, code)

conditioned_perf_for_rule_by_state = []

for condition_state in [0,1,2]:
    print("State:%d \n\n\n" % condition_state)

    conditioned_perf_for_rule = {}
    for rule in rule_list:
        conditioned_perf_by_instrument = {}
        for code in instrument_list:
            perf_for_instrument_and_rule = perf_for_rule[rule][code]
            condition_vector = historic_quantiles[code]==condition_state
            condition_vector = condition_vector.reindex(perf_for_instrument_and_rule.index).ffill()
            conditioned_perf = perf_for_instrument_and_rule[condition_vector]

            conditioned_perf_by_instrument[code] = conditioned_perf

        conditioned_perf_for_rule[rule] = conditioned_perf_by_instrument

    conditioned_perf_for_rule_by_state.append(conditioned_perf_for_rule)

    stacked_conditioned_perf_by_rule = {}
    for rule in rule_list:
        acc_curves_this_rule = conditioned_perf_for_rule[rule].values()
        stacked_perf_this_rule = stack_list_of_pd_series(acc_curves_this_rule)
        stacked_conditioned_perf_by_rule[rule] = stacked_perf_this_rule

    print("State:%d \n\n\n" % condition_state)
    for rule in rule_list:
        print("%s:%.3f" % (rule, sharpe(stacked_conditioned_perf_by_rule[rule])))

State:0  (Low vol)
ewmac2_8:0.207
ewmac4_16:0.334
ewmac8_32:0.432
ewmac16_64:0.481
ewmac32_128:0.492
ewmac64_256:0.462
carry:0.442

Interesting! These numbers are better than the unconditional figures we saw above, but fast momentum still looks poor relatively speaking (these numbers, like all those in this post, are after costs). But overall the pattern isn't that different from the unconditional performance; nowhere near enough to justify changing forecast weights very much.

State:1 (Medium vol)
ewmac2_8:0.139
ewmac4_16:0.255
ewmac8_32:0.335
ewmac16_64:0.380
ewmac32_128:0.397
ewmac64_256:0.340
carry:0.195

The 'medium' level of vol is more similar to the unconditional figures. Again this is nothing to write home about in terms of differences in relative performance, although relatively speaking fast is looking a little worse.


State:2 (High vol)
ewmac2_8:-0.299
ewmac4_16:-0.106
ewmac8_32:0.027
ewmac16_64:0.043
ewmac32_128:0.003
ewmac64_256:0.002
carry:0.103


Now you've probably noticed a pattern here, and I know everyone is completely distracted by it, but just for a moment let's focus on relative performance, which is what this post is supposed to be about. Relatively speaking fast is still worse than slow, and it's now much worse. 

Carry has markedly improved, but.... oh what the hell I can't contain myself anymore. There is nothing that interesting or useful in the relative performance, but what is clear is that the absolute performance of everything is reducing as we get to a higher volatility environment.

Update: 
A regular reader (Mike N) asked me how much the above figures were affected by costs. So I re-ran the above, but excluding costs:

Unconditional figures:
ewmac2_8:0.135
ewmac4_16:0.245
ewmac8_32:0.334
ewmac16_64:0.368
ewmac32_128:0.369
ewmac64_256:0.343
carry:0.309

Low vol:
ewmac2_8:0.277
ewmac4_16:0.368
ewmac8_32:0.449
ewmac16_64:0.491
ewmac32_128:0.497
ewmac64_256:0.466
carry:0.443

Medium vol:
ewmac2_8:0.209
ewmac4_16:0.290
ewmac8_32:0.353
ewmac16_64:0.390
ewmac32_128:0.403
ewmac64_256:0.345
carry:0.196


High vol:
ewmac2_8:-0.232
ewmac4_16:-0.071
ewmac8_32:0.045
ewmac16_64:0.053
ewmac32_128:0.009
ewmac64_256:0.007
carry:0.104

So not just a cost story.

Testing the significance of overall performance in different vol environments

I really ought to end this post here, as the answer to the original question is a firm no: you shouldn't change your speed as vol increases. 

However we've now been presented with a new hypothesis: "Momentum and carry will do badly when vol is relatively high"

Let's switch gears and test this hypothesis.

First of all let's consider the statistical significance of the differences in return we saw above:

from scipy import stats

for rule in rule_list:
    perf_group_0 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[0][rule].values())
    perf_group_1 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[1][rule].values())
    perf_group_2 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[2][rule].values())

    t_stat_0_1 = stats.ttest_ind(perf_group_0, perf_group_1)
    t_stat_1_2 = stats.ttest_ind(perf_group_1, perf_group_2)
    t_stat_0_2 = stats.ttest_ind(perf_group_0, perf_group_2)

    print("Rule: %s , low vs medium %.2f medium vs high %.2f low vs high %.2f" % (rule,
                                                                                  t_stat_0_1.pvalue,
                                                                                  t_stat_1_2.pvalue,
                                                                                  t_stat_0_2.pvalue))

Rule: ewmac2_8 , low vs medium 0.37 medium vs high 0.00 low vs high 0.00
Rule: ewmac4_16 , low vs medium 0.25 medium vs high 0.00 low vs high 0.00
Rule: ewmac8_32 , low vs medium 0.12 medium vs high 0.00 low vs high 0.00
Rule: ewmac16_64 , low vs medium 0.08 medium vs high 0.00 low vs high 0.00
Rule: ewmac32_128 , low vs medium 0.07 medium vs high 0.00 low vs high 0.00
Rule: ewmac64_256 , low vs medium 0.03 medium vs high 0.00 low vs high 0.00
Rule: carry , low vs medium 0.00 medium vs high 0.32 low vs high 0.00

These are p-values, so a low number means statistical significance. Generally speaking, with the exception of carry, the biggest effect is when we jump from medium to high vol; the jump from low to medium doesn't usually result in a significantly worse performance.

So it's something special about the high vol environment where returns get badly degraded.


Is this an effect we can actually capture?


One concern I have is how quickly we move in and out of the different vol regimes; here for example is Eurodollar:




To exploit this effect we're going to have to do something like radically reduce our leverage whenever an instrument enters 'zone 2: high vol'. That clearly would have worked in early 2020, when there was a persistent high vol environment for some reason that escapes me now. But would we really have got the chance to do very much in those brief few days in late 2019 when Eurodollar entered the highest vol zone?

Above you may have noticed I put in a one day lag on the vol estimate - this is to ensure we aren't conditioning today's return on a vol estimate that uses today's return - clearly we couldn't change our leverage or otherwise react until we actually got the close of business price.

[In my backtest I automatically lag trades by a day, so when I finally come to test anything this shift can be removed]

In fact I have a confession to make... when first running this code I omitted the shift(1) lag, and the results were even stronger, with heavily negative returns for all trading rules in the highest vol region (except carry, which was barely positive). So this makes me suspicious that we wouldn't have the chance to react in time to make much of this.

Still, repeating the results with a 2 and even a 3 day lag I still get some pretty low p-values, so there is probably something in it. Also, interestingly, with these greater lags there is more differentiation between the low and medium regimes. Here for example are the p-values for a 3 day lag:

Rule: ewmac2_8, low vs medium 0.06 medium vs high 0.01 low vs high 0.00
Rule: ewmac4_16, low vs medium 0.16 medium vs high 0.04 low vs high 0.00
Rule: ewmac8_32, low vs medium 0.13 medium vs high 0.08 low vs high 0.00
Rule: ewmac16_64, low vs medium 0.03 medium vs high 0.06 low vs high 0.00
Rule: ewmac32_128, low vs medium 0.01 medium vs high 0.06 low vs high 0.00
Rule: ewmac64_256, low vs medium 0.02 medium vs high 0.14 low vs high 0.00
Rule: carry, low vs medium 0.08 medium vs high 0.46 low vs high 0.01


A more graduated system


Rather than using regimes, I think it would make more sense to do something involving a more continuous variable: the vol quantile percentile itself, rather than the regime bucket that it falls into. Then we won't drastically shift gears between regimes.

Recall our three regimes:
  • Low: Normalised vol in the bottom 25% quantile
  • Medium: Between 25% and 75%
  • High: Between 75% and 100% 
One temptation is to introduce something just for the high regime, where we start degearing when our quantile percentile is above 75%; but that makes me feel queasy (it's clearly implicit fitting), plus the results with higher lags indicate that it might not be a 'high vol is especially bad' effect, but rather a general 'as vol gets higher we make less money' effect.

After some thought (well 10 seconds) I came up with the following:

Multiply raw forecasts by L where (if Q is the percentile expressed as a decimal, eg 1 = 100%):

L = 2 - 1.5Q

That will vary L between 2 (if vol is really low) and 0.5 (if vol is really high). The reason we're not turning off the system completely for high vol is for all the usual reasons: although this is a strong effect, it's still not a certainty. 

I use the raw forecast here. I do this because there is no guarantee that the above will result in the forecast retaining the correct scaling. So if I then estimate forecast scalars using these transformed forecasts, I will end up with something that has the right scaling.

These forecasts will then be capped at -20, +20, which may undo some of the increase in leverage applied when vol is particularly low.

 

Smoothing vol forecast attenuation


The first thing I did was to see what the L factor actually looks like in practice. Here it is for Eurodollar [I will give you the code in a few moments]:


It sort of seems to make sense; there for example you can see the attenuation backing right off in early 2020 when we had the COVID inspired high vol. However it worries me that this thing is pretty noisy. Laid on top of a relatively smooth slow moving average this thing is going to boost trading costs quite a lot. I think the appropriate thing to do here is smooth it before applying it to the raw forecast. Of course if we smooth it too much then we'll be lagging the vol period.

Once again, the wrong thing to do here would be some kind of optimisation of post cost returns to find the best smoothing lookback, or something that was keyed to the speed of the relevant trading rule; instead I'm just going to plump for an EWMA with a 10 day span. 


Testing the attenuation, rule by rule


Here then is the code that implements the attenuation:

from systems.forecast_scale_cap import *

class volAttenForecastScaleCap(ForecastScaleCap):

    @diagnostic()
    def get_vol_quantile_points(self, instrument_code):
        ## More properly this would go in raw data perhaps
        self.log.msg("Calculating vol quantile for %s" % instrument_code)
        daily_vol = self.parent.rawdata.get_daily_percentage_volatility(instrument_code)
        ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean()
        normalised_vol = daily_vol / ten_year_vol

        normalised_vol_q = quantile_of_points_in_data_series(normalised_vol)

        return normalised_vol_q

    @diagnostic()
    def get_vol_attenuation(self, instrument_code):
        normalised_vol_q = self.get_vol_quantile_points(instrument_code)
        vol_attenuation = normalised_vol_q.apply(multiplier_function)

        smoothed_vol_attenuation = vol_attenuation.ewm(span=10).mean()

        return smoothed_vol_attenuation

    @input
    def get_raw_forecast_before_attenuation(self, instrument_code, rule_variation_name):
        ## original code for get_raw_forecast
        raw_forecast = self.parent.rules.get_raw_forecast(
            instrument_code, rule_variation_name
        )

        return raw_forecast

    @diagnostic()
    def get_raw_forecast(self, instrument_code, rule_variation_name):
        ## overridden method: this will be called downstream, so don't change the name
        raw_forecast_before_atten = self.get_raw_forecast_before_attenuation(instrument_code, rule_variation_name)

        vol_attenutation = self.get_vol_attenuation(instrument_code)

        attenuated_forecast = raw_forecast_before_atten * vol_attenutation

        return attenuated_forecast


def quantile_of_points_in_data_series(data_series):
    results = [quantile_of_points_in_data_series_row(data_series, irow) for irow in range(len(data_series))]
    results_series = pd.Series(results, index=data_series.index)

    return results_series

from statsmodels.distributions.empirical_distribution import ECDF

# this is a little slow so suggestions for speeding up are welcome
def quantile_of_points_in_data_series_row(data_series, irow):
    if irow < 2:
        return np.nan
    historical_data = list(data_series[:irow].values)
    current_value = data_series[irow]
    ecdf_s = ECDF(historical_data)

    return ecdf_s(current_value)

def multiplier_function(vol_quantile):
    if np.isnan(vol_quantile):
        return 1.0

    return 2 - 1.5*vol_quantile

And here's how to implement it in a new futures system (we just copy and paste the futures_system code and change the object passed for the forecast scaling/capping stage):
from systems.provided.futures_chapter15.basesystem import *

def futures_system_with_vol_attenuation(data=None, config=None, trading_rules=None, log_level="on"):

    if data is None:
        data = csvFuturesSimData()

    if config is None:
        config = Config(
            "systems.provided.futures_chapter15.futuresconfig.yaml")

    rules = Rules(trading_rules)

    system = System(
        [
            Account(),
            Portfolios(),
            PositionSizing(),
            FuturesRawData(),
            ForecastCombine(),
            volAttenForecastScaleCap(),
            rules,
        ],
        data,
        config,
    )

    system.set_logging_level(log_level)

    return system

And now I can set up two systems, one without attenuation and one with:
system = futures_system()
# will equally weight instruments
del(system.config.instrument_weights)

# need to do this to deal fairly with attenuation
# do it here for consistency
system.config.use_forecast_scale_estimates = True
system.config.use_forecast_div_mult_estimates = True

# will equally weight forecasts
del(system.config.forecast_weights)

# standard stuff to account for instruments coming into the sample
system.config.use_instrument_div_mult_estimates = True

system_vol_atten = futures_system_with_vol_attenuation()
del(system_vol_atten.config.forecast_weights)
del(system_vol_atten.config.instrument_weights)
system_vol_atten.config.use_forecast_scale_estimates = True
system_vol_atten.config.use_forecast_div_mult_estimates = True
system_vol_atten.config.use_instrument_div_mult_estimates = True

rule_list = list(system.rules.trading_rules().keys())

for rule in rule_list:
    sr1 = system.accounts.pandl_for_trading_rule(rule).sharpe()
    sr2 = system_vol_atten.accounts.pandl_for_trading_rule(rule).sharpe()

    print("%s before %.2f and after %.2f" % (rule, sr1, sr2))

Let's check out the results:
ewmac2_8 before 0.43 and after 0.52
ewmac4_16 before 0.78 and after 0.83
ewmac8_32 before 0.96 and after 1.00
ewmac16_64 before 1.01 and after 1.07
ewmac32_128 before 1.02 and after 1.07
ewmac64_256 before 0.96 and after 1.00
carry before 1.07 and after 1.11

Now these aren't huge improvements, but they are very consistent across every single trading rule. But are they statistically significant?
from syscore.accounting import account_test

for rule in rule_list:
    acc1 = system.accounts.pandl_for_trading_rule(rule)
    acc2 = system_vol_atten.accounts.pandl_for_trading_rule(rule)
    print("%s T-test %s" % (rule, str(account_test(acc2, acc1))))

ewmac2_8 T-test (0.005754898313025798, Ttest_relResult(statistic=4.23535684665446, pvalue=2.2974165336647636e-05))
ewmac4_16 T-test (0.0034239182014355815, Ttest_relResult(statistic=2.46790714210943, pvalue=0.013603190422737766))
ewmac8_32 T-test (0.0026717541872894254, Ttest_relResult(statistic=1.8887927423648214, pvalue=0.058941593401076096))
ewmac16_64 T-test (0.0034357601899108192, Ttest_relResult(statistic=2.3628815728522112, pvalue=0.018147935814311716))
ewmac32_128 T-test (0.003079560056791747, Ttest_relResult(statistic=2.0584403445859034, pvalue=0.03956754085349411))
ewmac64_256 T-test (0.002499427499123595, Ttest_relResult(statistic=1.7160401190191614, pvalue=0.08617825487582882))
carry T-test (0.0022278238232666947, Ttest_relResult(statistic=1.3534155676590192, pvalue=0.17594617201514515))

A mixed bag there, but with the exception of carry there does seem to be a reasonable amount of improvement; most markedly with the very fastest rules.
Again, I could do some implicit fitting here to only use the attenuation on momentum, or use less of it on slower momentum. But I'm not going to do that.

Summary


To return to the original question: yes, we should change our trading behaviour as vol changes. But not in the way you might think, especially if you had extrapolated the performance from March 2020.

As vol gets higher, faster trading rules do relatively badly, but actually the bigger story is that all momentum rules suffer (as does carry, a bit). Not what I had expected to find, but very interesting. So a big thanks to the internet's hive mind for voting for this option.