It's for this reason that only 2 out of the 75 posts I've published on this blog have been about trading rules (this on trend following and carry; and this one on my 'breakout' system). But ... if I look at my inbox or blog comments or my thread on elitetrader.com the most common request is for me to "write about X"... where X is some trading rule I may have casually mentioned in passing that I use, but haven't written about.
So I have mixed feelings writing this post (in which the metaphorical kimono will be completely opened: there are no more secret trading rules hiding inside my system). I'm hoping that this will satisfy the clamour for information about other trading rules that I run. Of course it's also worth adding these rules to my open source Python project pysystemtrade, since I hope that will eventually replace the legacy system I use for my own trading, and I won't want to do that unless I have a complete set of trading rules that matches what I currently use.
But I'd like to (re-)emphasise that there is much, much, much more to successful systems trading than throwing every possible trading rule into your back test and hoping for the best. Adding trading rules should be your last resort once you have a decent framework, and have done as much instrument diversification as your capital can cope with.
Pre-requisites: Although there is some messy pysystemtrade python code for this post here you don't need to use it. It will however be helpful to have a good understanding of my existing trading rules: Carry and EWMAC (Exponentially weighted Moving Average Crossover) which you can glean from my first book or this post - most of the rules I discuss here are built upon those two basic ideas.
PS You'll probably notice that I won't talk in detail about how you'd develop a new trading rule; but don't panic, that's the subject of this post.
Short volatility
I'm often asked "What do you think your trading edge is?" A tiresome question (don't ask it again if you want to stay in my good books). If I have any 'edge' it's that I've learned, the hard way, the importance of correct position sizing and sticking to your trading system. My edge certainly doesn't lie in creating novel trading rules.
Instead the rules I use all capitalise on well known risk factors: momentum and carry for example. You'll sometimes see these called return factors but you don't get return without risk. Of course we all have different risk tolerances, but if you are happy to hold positions that the average investor finds uncomfortably risky, then you'll earn a risk premium (at least it will look like a premium if you use standard measures of risk when doing your analysis). A comprehensive overview of the world of return factors can be found in this excellent book or in this website.
One well known risk factor is the volatility premium. Simply put investors are terrified of the market falling, and bid up the price of options. This means that implied volatility (effectively the price of volatility implied by option prices) will on average be higher than expected realised volatility.
How can a systematic futures trader earn the volatility premium? You could of course build a full blown options trading strategy, like my ex AHL colleague. But this is a huge amount of work. A much simpler way is to just sell volatility futures (the US VIX, and European V2TX); in my framework that equates to using a constant forecast of -10, or what I call in my book the "no rule" trading rule (note because of position scaling we'll still have smaller positions when the volatility of volatility was higher, and vice versa).
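To see why a constant forecast still produces positions that change size, here is a minimal sketch. It assumes a simplified form of position scaling (position proportional to the forecast divided by the cash volatility of one contract); the function name, parameters and numbers are hypothetical, not pysystemtrade code.

```python
def position_from_forecast(forecast, daily_cash_vol_target, instrument_value_vol):
    """Contracts to hold for a given forecast.

    instrument_value_vol is the expected daily cash volatility of ONE contract
    (an assumed input here; in practice it would be estimated from prices).
    """
    vol_scalar = daily_cash_vol_target / instrument_value_vol
    # A forecast of 10 (in absolute terms) is treated as an average sized bet
    return forecast * vol_scalar / 10.0


SHORT_VOL_FORECAST = -10.0  # the constant "no rule" forecast: always short

# Made-up numbers: same risk target, but the contract is 4x riskier when stressed
calm_position = position_from_forecast(SHORT_VOL_FORECAST, 1000.0, 250.0)
stressed_position = position_from_forecast(SHORT_VOL_FORECAST, 1000.0, 1000.0)
print(calm_position, stressed_position)
```

The forecast never changes, but the short position shrinks when the vol of vol rises.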
And here is a nice picture showing a backtest of this rule:
"With hindsight Rob realised that starting his short vol strategy in late 2007 may not have been ideal timing...."
Earning this particular premium isn't for the faint hearted. You will usually earn a consistent return with occasional, horrific, drawdowns. This is what I call a negative skew / insurance selling strategy. Indeed, based on monthly returns the skew of the above is a horror show: -0.664. This isn't as bad as the underlying price series, because vol scaling helps improve skew, but still pretty ugly (on S&P500 using the same strategy it's a much nicer 0.36).
It is a good complement to the positive skew trend following rules that form the core of my system (carry is broadly skew neutral, depending on the asset class). For various reasons I don't recommend using the first contract when trading vol futures (in my data the back adjusted price is based on holding the second contract). One of these good reasons is that the skew is really, really bad on the first contract.
But... we already have trend following and carry in vol? Do we need a short bias as well?
I already include the VIX, and V2TX, in my trend following and carry strategies. That means to an extent I am already earning a volatility premium.
How come? Well imagine you're holding the first VIX contract, due to expire in a month's time. The price of that (implied vol) will be higher than the current level of the VIX (which I'll call, inaccurately, spot vol), reflecting the desire of investors to pay up for protection against volatility in the next month. As the contract ages the price will drift down to spot levels, assuming nothing changes; a rolldown effect on futures prices. That's exactly what the carry strategy is designed to capture.
This isn't exactly the same as the implied versus spot vol premium; but it's very closely related.
Now consider trend following. Assuming you use back adjusted futures prices then in an environment when spot vol doesn't move, but in which there is negative rolldown for the reasons described above, then the back adjusted price will drift downwards. This will create a trend in which the trend following strategy will want to participate.
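A toy illustration of that drift, under loud assumptions: spot vol is fixed, every new contract starts at spot plus the same premium, and it decays linearly to spot by expiry. The function and its parameters are made up for this sketch; real back adjustment works on actual contract prices.

```python
def back_adjusted_series(spot, premium, days_per_contract, n_contracts):
    """Stitch together contracts that each roll down from spot+premium to spot."""
    series = []
    level = spot + premium  # price of the first contract on day 0
    for _ in range(n_contracts):
        for day in range(days_per_contract):
            # Within a contract the price loses premium/days_per_contract each day
            series.append(level - premium * day / days_per_contract)
        # At the roll the new contract is 'premium' more expensive; back adjustment
        # removes that jump, so the stitched series carries on from the lower level
        level -= premium
    return series


# Made-up numbers: spot vol 15, 2 points of premium, 20 business days per contract
stitched = back_adjusted_series(spot=15.0, premium=2.0, days_per_contract=20, n_contracts=6)
print(stitched[0], stitched[-1])
```

Spot never moves, yet the stitched price falls steadily: exactly the kind of downtrend a trend following rule will want to be short.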
Arguably trend following and carry are actually better than being short vol, since they are reactive to changing conditions. In 2008 a short vol strategy would have remained stubbornly short in the face of rapidly rising vol levels. But trend following would have ended up going long vol (eventually, depending on the speed of the rule variation). Also in a crisis the vol curve tends to invert (further out vol becoming cheaper than nearer vol) - in this situation a carry strategy would buy vol.
The vol curve tends to invert in a crisis
So.... what happens if I throw carry and trend following back into the mix? Using the default optimisation method in pysystemtrade (bayesian shrinkage) the short biased signal gets roughly a 10% weight (sticking to just VIX and V2X). That equates to an improvement in Sharpe Ratio on the overall account curve of the two vol futures of just 0.03, a difference that isn't statistically significant. And the skew gets absolutely horrific.
So... is this worth doing? I'll discuss this general issue at the end of the post. But on the face of it using trend following and carry on vol futures might be a better way of capturing the vol premium than just a fixed short bias. Using all three of course could be even better.
An aside: What about other asset classes?
An excellent question is why we don't incorporate a bias to other asset classes that are known to earn a risk premium; for example long equities (earning the equity risk premium) or long bonds (earning the term premium)?*
* I'm not convinced that there is a risk premium in Commodities, at best these might act as an inflation hedge but without a positive expected return. It's not obvious what the premium you'd earn in FX is, or which way round you should be to earn it.
This might make sense if all your capital was in systematic futures trading (which I don't recommend - it's extremely difficult to earn a regular income purely from trading). But I, like most people, own a chunk of shares and ETFs which nicely cover the equity and bond universe (and which pay relatively steady dividends which I'm happy to earn an income from). I don't really need any more exposure to these traditional asset classes.
And of course the short vol strategy has a relationship with equity prices; crashes in equities normally happen alongside spikes in the VIX / V2X (I deliberately say relationship here rather than correlation, since the relationship is highly non linear). Having both long equity and short vol in the same portfolio is effectively loading up massively on short black swan exposure.
Relative carry
The next rule I want to consider is also relatively simple - it's a relative version of the carry rule that I describe in my book and which is already implemented in pysystemtrade. As the authors of this seminal paper put it:
"For each global asset class, we construct a carry strategy that invests in high-carry securities while short selling low-carry instruments, where each instrument is weighted by the rank of its carry"
Remember that for carry the original forecast is quite noisy; to avoid that we need to smooth it. In my own system I use a fixed smooth of 90 business days (as many futures roll quarterly) for both absolute and relative carry.
Mathematically the relative carry measure for some instrument x will be:
Rx_t = Cx_t - median(Ca_t, Cb_t, ...)
Where Ca_t is the smoothed carry forecast for some instrument a, Cb_t for instrument b and so on; where a,b, c....x are all in the same asset class.
Note - some people will apply a further normalisation here to reflect periods when the carry values are tightly clustered within an asset class, or when they are further apart - the normalisation will ensure a consistent expected cross sectional standard deviation for the forecast. However this is leveraging up on weak information - not usually a good idea.
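The relative carry formula above can be sketched for a single date as follows. This is a minimal illustration with made-up carry values and a hypothetical function name; it assumes the carry forecasts passed in have already been smoothed.

```python
from statistics import median


def relative_carry(smoothed_carry):
    """Subtract the asset class median from each instrument's smoothed carry.

    smoothed_carry: dict of instrument name -> smoothed carry forecast on one
    date, where every instrument belongs to the same asset class.
    """
    asset_class_median = median(smoothed_carry.values())
    return {name: carry - asset_class_median for name, carry in smoothed_carry.items()}


# Made-up smoothed carry forecasts for three bond futures
bonds = {"US10": 8.0, "BUND": 5.0, "JGB": -1.0}
relative = relative_carry(bonds)
print(relative)
```

Note that an instrument can have positive absolute carry and still get a negative relative forecast if its peers are carrying more.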
This rule isn't super brilliant by itself. Here it is, tested using the full set of futures in my dataset:
It clearly underperforms its cousin, absolute carry. More interestingly though the predictors look to be doing relatively different things (correlation is much lower than you might expect at around 0.6), and the optimisation actually gives the relative carry predictor around 40% of the weight when I just run a backtest with only these two predictors.
Lobbing together a backtest with both relative and absolute carry the Sharpe ratio is improved from 0.508 to 0.524 (monthly returns, annualised). Again hardly an earth shattering improvement, but it all helps.
Normalised momentum
Now for something completely different. Most trading rules rely on the idea of filtering the price series to capture certain features (the other school of thought within the technical analysis campus is that one should look for patterns, which I'm less enthusiastic about). For example an EWMAC trend following rule is a filter which tries to see trends in data. Filtering is required because price series are noisy, and a lot of that noise just contributes to potentially higher trading costs rather than giving us new information.
But there is another approach - we could normalise the price series to make it less noisy, and then apply a filter to the resulting data. The normalised series is cleaner, and so the filters have less work to do.
The normalisation I use is the cumulative normalised return. So given a price series P_0, P_1 ... P_T the normalised return is:
R_t = (P_t - P_[t-1] ) / sigma (P_0.... P_t)
Where sigma is a standard deviation calculation. Also to avoid really low vol or bad prices screwing things up I apply a cap of 6.0 in absolute values on R_t. Then the normalised price on any given day t will be:
N_t = R_1 + R_2 + R_3 + .... R_t
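A minimal sketch of the two formulas above, in plain Python. One assumption to flag loudly: sigma is implemented here as an expanding standard deviation of daily price changes up to and including day t, which is only one reading of the formula (an exponentially weighted estimate would be another). The `min_obs` parameter and function names are made up for the sketch.

```python
from itertools import accumulate
from statistics import stdev

CAP = 6.0  # cap on the absolute value of each normalised return


def normalised_returns(prices, min_obs=3):
    """Price change divided by a volatility estimate, capped at +/- CAP."""
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    result = []
    for i, change in enumerate(changes):
        history = changes[: i + 1]  # ASSUMPTION: expanding window, includes day t
        if len(history) < min_obs:
            result.append(0.0)  # not enough data for a vol estimate yet
            continue
        vol = stdev(history)
        if vol == 0:
            result.append(0.0)  # flat prices: avoid dividing by zero
            continue
        result.append(max(-CAP, min(CAP, change / vol)))
    return result


def normalised_price(prices):
    """Cumulative sum of the normalised returns."""
    return list(accumulate(normalised_returns(prices)))


example_prices = [100.0, 101.0, 99.0, 102.0, 101.5, 103.0]
print(normalised_price(example_prices))
```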
NOTE: For scholars of financial history I've personally never seen this trading rule used elsewhere - it's something I dreamed up myself about three years ago. However it comes under the "too simple not to have been already thought of" category so I expect to see comments pointing out that this was invented by some guy, or gal, in 1952. If nobody does then I will not feel too embarrassed to call this "Carvers Normalised Momentum".
Perceptive readers will note:
- You probably shouldn't use normalised prices to identify levels since the level of the price is stripped out by the normalisation.
- These price series will not show exponential growth; the returns will be roughly normal rather than log normal. This is a good thing since over long horizons using prices that show exponential growth tends to screw up most filters since they don't know about exponential growth. Over relatively short horizons however it makes no difference.
- Simple returns calculated using the change in normalised price can be directly compared and aggregated across different instruments, asset classes and time periods; something that you can't do with ordinary prices. We'll use this fact later.
Rather boringly I am now going to apply my favourite EWMAC filter to these normalised price series, although frankly you could apply pretty much anything you like to them.
Minor point: The volatility normalisation stage of an EWMAC calculation [remember it's (ewma_fast - ewma_slow) / volatility] isn't strictly necessary when applied to normalised price series which will have a constant expected volatility but it's more hassle to take it out so I leave it in here.
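For reference, a bare-bones version of that EWMAC calculation in plain Python. It is a sketch rather than the pysystemtrade implementation: the EWMA is the simple recursive form, the volatility is passed in as a ready-made list, the spans used are illustrative, and forecast scaling and capping are left out.

```python
def ewma(values, span):
    """Recursive exponentially weighted moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1.0)
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def ewmac(series, fast_span, slow_span, volatility):
    """(ewma_fast - ewma_slow) / volatility, one value per observation.

    volatility: list of volatility estimates aligned with series
    (an assumed input; estimate it however you prefer).
    """
    fast = ewma(series, fast_span)
    slow = ewma(series, slow_span)
    return [(f - s) / v for f, s, v in zip(fast, slow, volatility)]


# A steadily rising normalised price; expected vol of a normalised series is about 1
rising = [float(i) for i in range(50)]
forecast = ewmac(rising, fast_span=4, slow_span=16, volatility=[1.0] * 50)
print(forecast[-1])
```

Because the fast average hugs a rising series more closely than the slow one, the raw forecast is positive in an uptrend and negative in a downtrend.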
Normalised momentum
Performance wise there isn't much to choose between normalised and the use of standard EWMAC on the actual price; but these things aren't perfectly correlated, and that can only be a good thing.
Aggregate momentum
It's generally accepted that momentum doesn't work that well on individual stocks. It does however sort of work on industries. And it is relatively better again when applied to country level equity indices.
I have an explanation for this. The price of an individual equity is going to be related to the global equity risk premium, plus country specific, industry specific, and idiosyncratic firm specific factors. The global equity risk premium seems to show pretty decent trends. The other factors less so; and indeed by the time you are down to within industries mean reversion tends to dominate (though you might call it the value factor, which if per share fundamentals are unchanged amounts to the same thing).
Value type strategies then tend to work best when we're comparing similar assets, like equities in the same country and industry; also because accounting ratios are more comparable across two Japanese banks, than across a Japanese bank and a Belgian chocolate manufacturer. There is a more complete expounding of this idea in my new book, to be released later this year.
So trading equity index futures then means we're trying to pick up the momentum in global equity prices through a noisy measurement (the price of the equity index) with a dollop of mean reverting factor added on top.
If you follow this argument to its logical conclusion then the best places to see momentum will be at the global asset class level*. There we will have the best measure of the underlying risk factor, without any pesky mean reversion effects getting in the way.
* A future research project is to go even further. I could for example create super asset classes, like "all risky assets" [equities, vol, IMM FX which are all short USD in the numeraire, commodities...?] and "all safe assets" [bonds, precious metals, STIR, ...]. I could even try and create a single asset class using some kind of PCA analysis to identify the single most important global factor.
How do we measure momentum at the asset class level? This is by no means a novel idea (see here) so there are plenty of suggestions out there. We could use benchmarks like MSCI world for equities, but that would involve dipping into another data source (and having to adjust because futures returns are excess returns, whilst MSCI world is a total return); and it's not obvious what we'd use for certain other asset classes. Instead I'm going to leverage off the idea of normalised prices and normalised returns which I introduced above.
The normalised return for an asset class at time t will be:
RA_t = median(Ra_t, Rb_t, Rc_t, ...)
Where Ra_t, Rb_t are the normalised returns for the individual instruments within that asset class (eg for equities that might include SP500 futures, EUROSTOXX and so on). You could take a weighted average, using market cap, or your own risk allocations to each instrument, but I'm not going to bother and just use a simple unweighted median, as in the formula above.
Then the normalised price for an asset class is just:
NA_t = RA_1 + RA_2 + RA_3 + .... RA_t
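A minimal sketch of the asset class calculation: take the cross sectional median of the normalised returns on each date, then accumulate. It assumes the return lists are already aligned by date with no missing values; the function name and the numbers are made up.

```python
from itertools import accumulate
from statistics import median


def asset_class_normalised_price(returns_by_instrument):
    """Median normalised return across instruments per date, then cumulated.

    returns_by_instrument: dict of instrument name -> list of normalised
    returns, all the same length and aligned by date (an assumption).
    """
    per_date = zip(*returns_by_instrument.values())
    asset_class_returns = [median(day) for day in per_date]
    return list(accumulate(asset_class_returns))


# Made-up normalised returns for three equity index futures over three days
equities = {
    "SP500": [0.5, -1.0, 2.0],
    "EUROSTOXX": [0.1, -0.2, 0.4],
    "NIKKEI": [-0.3, -0.6, 1.0],
}
equity_class_price = asset_class_normalised_price(equities)
print(equity_class_price)
```

Using the median rather than the mean means a single wild return in one instrument has little effect on the asset class series.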
Next step is to apply a trend following filter to the normalised price... yes why not use EWMAC?
Minor point of order - it's definitely worth keeping the volatility normalisation part of EWMAC here because the volatility of NA is not constant even when the volatility of each Na, Nb... is - if equities become less correlated then the volatility of NA will fall, and vice versa; as more assets are added to the data basket and diversification increases again the volatility of NA will fall. Indeed NA should have an expected volatility that is lower than the expected volatility of any of Na, Nb...
Having done that we have a forecast that will be the same for all instruments in a particular asset class.
If I compare this to standard, and normalised, momentum:
... again performance wise not much to see here, but there is clearly diversification despite all three rules using EWMAC with identical speeds!
Cross sectional within assets
So we can improve our measure of momentum using aggregated returns across an asset class. This works because the price of an instrument within an asset class is affected by the global asset class underlying latent momentum, plus a factor that is mostly mean reverting. Won't it also make sense then to trade that mean reversion? In concrete terms if for example the NASDAQ has been outperforming the DAX, shouldn't we bet on that no longer happening?
Mathematically then, if NA_t is the normalised price for an asset class, and Nx_t is the normalised price for some instrument within that asset class, then the amount of outperformance (or if you prefer, Disequilibrium) over a given time horizon (tau, t) is:
Dx_t = [Nx_t - Nx_tau] - [NA_t - NA_tau]
Be careful of making t - tau too large: remember the slightly different properties of Nx and NA; the former has constant expected vol whilst the latter will, by construction, have lower and time varying vol. But also be careful of making it too small - you need sufficient time to estimate an equilibrium. A value of around 6 months probably makes sense.
And my personal favourite measure of mean reversion is a smooth of this out-performance:
- EWMA(Dx_t, span)
Where EWMA is the usual exponentially weighted moving average; this basically ensures we don't trade too much whilst betting on the mean reversion. The minus sign is there to show mean reversion is expected to occur (I prefer this explicit reminder, rather than reversing the stuff inside Dx).
Using my usual heuristic, finger in the air, combined with some fake data I concluded that a good value to use for the EWMA span was one quarter of the horizon length, t - tau.
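Putting the pieces together, here is a minimal sketch of the mean reversion forecast: the outperformance Dx over a fixed horizon, smoothed with an EWMA whose span is a quarter of the horizon, with the sign flipped. The function names are hypothetical, the EWMA is the simple recursive form, and forecast scaling and capping are omitted.

```python
def ewma(values, span):
    """Recursive exponentially weighted moving average with alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1.0)
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def mean_reversion_forecast(instrument_price, asset_class_price, horizon):
    """-EWMA(Dx, span) where Dx = [Nx_t - Nx_tau] - [NA_t - NA_tau].

    Both inputs are normalised price lists aligned by date; horizon is
    t - tau measured in observations.
    """
    span = max(1, horizon // 4)  # one quarter of the horizon length
    outperformance = [
        (instrument_price[t] - instrument_price[t - horizon])
        - (asset_class_price[t] - asset_class_price[t - horizon])
        for t in range(horizon, len(instrument_price))
    ]
    return [-value for value in ewma(outperformance, span)]


# Made-up data: the instrument steadily outperforms its asset class
nx = [0.2 * i for i in range(30)]
na = [0.1 * i for i in range(30)]
reversion = mean_reversion_forecast(nx, na, horizon=8)
print(reversion[-1])
```

With daily data a six month horizon would be roughly 125 business days (my assumption for the day count), giving a span of about 31.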
Here is an example for US 10 year bond futures. First of all the normalised prices:
Blue is US 10 year normalised price. Orange is the normalised price for all bond futures.
US 10 year bond future normalised price - Bond asset class normalised price
Notice how the system first bets strongly on mean reversion occurring during the taper tantrum, but then re-estimates the equilibrium and cuts its bet. With any mean reversion system it's important to have some mechanism to stop the falling knife being caught; whether it be something simple like this, a formal test for a structural break, or a stop loss mechanism (also note that forecast capping does some work here).
What about performance? You know what - it isn't great:
Performance across all my futures markets of mean reversion rule
BUT this is a really nice rule to have, since by construction it's strongly negatively correlated with all the trend following rules we have (in case you have lost count there are now four!: original EWMAC, breakout, normalised momentum, and aggregate momentum; with just two carry rules - absolute and relative; plus the odd one out - short volatility). Rules that are negatively correlated are like buying an insurance policy - you shouldn't expect them to be profitable (because insurance companies make profits in the long run) but you'll be glad you bought them when your car is stolen.
In fact I wouldn't expect this rule to perform very well, since plenty of people have found that cross sectional momentum works sort of okay in some asset classes (read this: thank you my ex-colleagues at AHL) and this is doing the opposite (sort of). But strong negative correlation means we can afford to have a little slack in accepting a rule that isn't stellar in isolation (a negatively correlated asset with a positive expected return can be used to create a magic money machine).
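Some back of the envelope arithmetic makes the point. This sketch assumes two return streams scaled to the same volatility, so that the combined Sharpe Ratio depends only on the weights, the individual Sharpe Ratios and the correlation. All the numbers are made up for illustration; they are not backtest results.

```python
from math import sqrt


def portfolio_sharpe(sharpe_a, sharpe_b, weight_a, correlation):
    """Sharpe Ratio of a two asset mix, assuming both assets have equal volatility."""
    weight_b = 1.0 - weight_a
    expected_return = weight_a * sharpe_a + weight_b * sharpe_b
    risk = sqrt(weight_a ** 2 + weight_b ** 2 + 2.0 * weight_a * weight_b * correlation)
    return expected_return / risk


# Made-up inputs: a decent rule (SR 0.5) mixed 70/30 with a weak one (SR 0.1)
with_negative_correlation = portfolio_sharpe(0.5, 0.1, weight_a=0.7, correlation=-0.5)
with_high_correlation = portfolio_sharpe(0.5, 0.1, weight_a=0.7, correlation=0.9)
print(round(with_negative_correlation, 3), round(with_high_correlation, 3))
```

With these inputs the weak rule lifts the combined Sharpe Ratio above 0.5 when it is negatively correlated, and drags it below 0.5 when it is highly correlated.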
Note: This rule is similar in spirit to the "Value" measure defined for commodity futures in this seminal paper (although the implementation in the paper isn't cross sectional). To reconcile this it's worth noting that momentum and value mostly operate on different time frequencies - in the paper the value measure is based on 5 year mean reversion [I use 6 months], whilst the authors use a 12 month measure for momentum [roughly congruent to my slowest variation].
Summary
Does adding these rules improve the performance of a basic trend following using EWMAC on price, plus carry strategy? It doesn't (I did warn you right at the start of the post!) but is it still worth doing? I use a variation of Occam's Razor when evaluating changes to my trading strategy. Does the change provide a statistically significant improvement in performance? If not is it worth the effort? (By the way I make exceptions for simplifying and instrument diversifying changes when applying these rules).
I'd expect there to be a small improvement in performance given these rules are diversifying, and given that there isn't enough evidence to suggest that these rules are better or worse than any of my existing rules, but in practice it actually comes out with slightly worse performance; although not with a statistically significant difference.
But I don't care. I have a Bayesian view that the 'true' Sharpe Ratio of the expanded set of rules is higher; even if one sample (the actual backtest) comes out slightly different, that doesn't dissuade me. I'm also a bit wary of relying on just one form of momentum rule to pick up trends in the future, even if it has been astonishingly successful in the past. I'd rather have some diversification.
Note if I had dropped any of the 'dud' rules like mean reversion, I'd be guilty of in sample implicit [over]fitting. Instead I choose to keep them in the backtest, and let the optimisation downweight them in as much as there was statistically significant evidence they weren't any good.
The new rules have less of a long bias to assets that have gone up consistently in the backtest period; so arguably they have more 'alpha' though I haven't formally judged that.
Although on the face of it there is no compelling case for adding all these extra rules I'm prepared to make an exception. Although I don't like making my system more complex without good reason there is complexity, and there is complexity. I would rather have (a) a relatively large number of simple rules combined in a linear way, with no fancy portfolio construction, than (b) a single rule which has an insane number of parameters and is used to determine expected returns in a full blown Markowitz optimisation.
So I'm going to be keeping all these numerous rule variations in my portfolio.