I've had quite a few requests for a simulated back-test of this system. Although the system has done very well since I began trading in April 2014, this has been at a time when most trend following systems have done very well. In particular the flagship fund of quant shop AHL (where I used to work) made about 34% last year.

(I'll be providing a more thorough review of my own performance after I've done a full year of trading, in a few weeks' time.)

So there is natural curiosity as to whether 2009 - 2013, much worse for trend following, would also have been bad for my system.

This will also be an educational exercise, as I'll talk through some of the issues involved in making a backtest as realistic as possible, and avoiding the deathly curse of "overfitting". Overfitted back-tests can look amazing, but they are unlikely to do well in actual trading.

### Futures markets

I will use data from 43 futures markets to simulate the model. These have been chosen to cover a wide range of asset classes, and also based on factors like trading cost and data availability. One slight wrinkle is that I don't have a long series of price history for all my instruments. The data I get from my broker only goes back to late 2013 when I started collecting prices, and although www.quandl.com has been a great resource for backfilling longer price data, it doesn't cover every market. If anyone knows of another site for getting (free) historical daily price data for individual futures contracts, preferably which provides .csv files or an API, I'd love to hear about it.

Here you can see how many markets I have data for over time:

Notice the big jump in 2013 when I started getting broker data. This should mean the backtest is a little conservative, since you get better performance from more markets (I know this from simulating performance of similar systems in my old job where I had access to much more data).

However it also means I've needed to take care that the weights given to different instruments in the portfolio are rescaled and fitted properly as new series of data arrive.

### Trading rules

I have four main kinds of trading rules:

- Trend following
- Carry
- Relative value (within an asset class)
- Selling volatility (this 'rule' just amounts to a modest short bias on the VIX and V2X markets)

How do we decide how much weight to give to each trading rule? I use a technique called non parametric bootstrapping to do my portfolio optimisation. Bootstrapping automatically gives you the right weights depending on how different the underlying data is from random noise, so it produces less extreme portfolios.
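A minimal sketch of the bootstrapping idea, assuming a matrix of per-period returns with one column per trading rule. The function name `bootstrap_weights`, the number of draws, and the crude clipped mean-variance optimiser are illustrative assumptions, not the procedure actually used for this backtest:

```python
import numpy as np

def bootstrap_weights(returns, n_draws=200, seed=0):
    """Average optimised weights over bootstrapped samples of the returns.

    returns: 2-D array, rows are periods, columns are trading rules.
    """
    rng = np.random.default_rng(seed)
    n_periods, n_assets = returns.shape
    equal = np.full(n_assets, 1.0 / n_assets)
    draws = []
    for _ in range(n_draws):
        # Non-parametric step: resample whole rows with replacement
        rows = rng.integers(0, n_periods, size=n_periods)
        sample = returns[rows]
        mean = sample.mean(axis=0)
        cov = np.cov(sample, rowvar=False)
        # Crude optimiser (an assumption): unconstrained mean-variance
        # weights, negatives clipped to zero, then normalised to sum to one
        raw = np.clip(np.linalg.pinv(cov) @ mean, 0.0, None)
        draws.append(raw / raw.sum() if raw.sum() > 0 else equal)
    # Noisy data gives very different weights on each draw, so the
    # average ends up close to equal weights
    return np.mean(draws, axis=0)
```

Because each draw produces a different answer when the data is mostly noise, averaging across draws pulls the final weights away from extreme values.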

This is done on an expanding window out of sample. For example to trade in 1987 I used data from 1978 to 1986 to fit my weights. For 2015 I used data from 1978 to 2014. So I'm only using the past, not forward looking data.
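The expanding window can be sketched as a generator of (fitting years, trading year) pairs. The helper name `expanding_windows` and the `min_years` parameter are hypothetical, used only for illustration:

```python
def expanding_windows(first_year, last_year, min_years=1):
    """Yield (fit_years, trade_year) pairs; fitting never sees trade_year."""
    for trade_year in range(first_year + min_years, last_year + 1):
        # Everything from the first year up to the year before trading
        fit_years = list(range(first_year, trade_year))
        yield fit_years, trade_year

for fit_years, trade_year in expanding_windows(1978, 2015):
    if trade_year in (1987, 2015):
        print(trade_year, "fitted on", fit_years[0], "to", fit_years[-1])
```

For 1987 this gives a fitting period of 1978 to 1986, and for 2015 a fitting period of 1978 to 2014, matching the examples above.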

To avoid over fitting I pool the pre-cost returns across all the instruments for which I have data. I've rarely found enough consistent evidence that different trading rules work better pre-cost on different kinds of instrument to justify doing anything else, especially given the paucity of available data in the past.

I then work out after cost returns, so it's likely that on expensive markets there will be less weight on faster trading rules.

### Over fitting and data mining

Other than making sure you account properly for the effect of costs, the main issue to worry about is over fitting, AKA data mining. As you can see I am quite careful not to use forward looking information, and bootstrapping ensures we don't over fit based on limited data.

However I can't get away from the fact that I am using trading rules that I know will work, based on my own experience and general market knowledge. So there will be some **implicit** data mining going on before the backtest is even run.

This issue is discussed briefly in this blog. It will be discussed more thoroughly in my forthcoming book (details to follow, but hopefully out later this year), where there will also be more information about backtesting and fitting generally.

But my rules are generally simple, and having a number of variations for each rule should minimise the bias this causes. Still, I wouldn't expect to realise the Sharpe Ratio that I see in this back-test (this is also because future asset returns generally aren't likely to be as high as in the simulated period, when a secular fall in inflation caused large one-off repricing gains). But it's much more realistic than an overfitted version would be.

### A portfolio of futures

I then use a similar procedure to get weights for the instruments in my portfolio, with a few tweaks. I use weekly returns, otherwise the correlations are unrealistically low due to different market closing times (all other work is done with daily data). Obviously I don't pool data from different instruments together!

However if I don't have at least a year of data for an instrument when I start trading it I use average returns from the rest of the asset class, plus some noise such that the new asset will be 80% correlated on average with the other instruments of the same group. This gives me reasonable weights until I have enough data to fit them more precisely.
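One way to build such a placeholder series is sketched below. It assumes the target is an 80% correlation with the asset class average, which is a simplification of "80% correlated on average with the other instruments". The name `proxy_returns` and the Gaussian noise are assumptions:

```python
import numpy as np

def proxy_returns(group_returns, target_corr=0.8, seed=0):
    """Synthetic returns for a new instrument with little history.

    group_returns: 2-D array, rows are periods, columns are the
    existing instruments in the same asset class.
    """
    rng = np.random.default_rng(seed)
    group_mean = group_returns.mean(axis=1)
    # corr(mean + noise, mean) = s_mean / sqrt(s_mean^2 + s_noise^2),
    # so solve for the noise standard deviation that hits the target
    noise_std = group_mean.std() * np.sqrt(1.0 / target_corr**2 - 1.0)
    noise = rng.normal(0.0, noise_std, size=group_mean.shape)
    return group_mean + noise
```

With a target of 0.8 the noise needs a standard deviation of 0.75 times that of the asset class average.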

I also don't take pre cost performance into account (again there isn't much evidence that this is statistically different between markets); although because I'm bootstrapping it wouldn't change the weights much anyway.

Here are the final weights from the bootstrapping procedure, for each asset class:

- Agricultural: 21.5%
- Bonds and STIR: 17.5%
- Equity index, including volatility: 17.3%
- FX: 19.1%
- Metals: 16.7%
- Oil and Gas: 8.3%

These are nice and even.

### Risk targeting

I assume here that we start with £500,000; and are targeting risk such that our annualised returns will have an average volatility of 25% of this, £125,000 (this is the same percentage risk target, but not the same size portfolio as I have).
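The arithmetic behind the risk target, with the daily figure derived on the assumption of roughly 256 trading days a year (that day count is an assumption, not something stated above):

```python
capital = 500_000          # starting capital in GBP
vol_target_pct = 0.25      # annualised volatility target, as a fraction of capital

annual_cash_vol = capital * vol_target_pct       # 125,000 a year
# Volatility scales with the square root of time; sqrt(256) = 16
daily_cash_vol = annual_cash_vol / 256 ** 0.5    # 7,812.50 a day

print(annual_cash_vol, daily_cash_vol)
```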

It's imperative that we know we're getting this right. Here is an estimate of the realised rolling annualised volatility of returns. Higher peaks mean that we have strong forecasts from our trading rules, or that correlations are particularly high, or that the markets were more volatile than we hoped when we originally put on our positions. However the average is about right; and if anything is a little lower and more conservative than it should be.

(This is to do with a risk management overlay that I use in my model, which reduces risk when it thinks there is potential for large losses)

### And the winner is...

Here is what you've all been waiting for - the veritable money shot.

Some statistics:

- Sharpe Ratio: 0.88
- Realised annualised standard deviation: 19%
- Average drawdown: 9.2%
- Ratio of winning days to losing day returns: 1.006
- Proportion of winning days: 54%
- Worst drawdown: 33%
- Proportion of days spent in drawdown: 94%
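For anyone wanting to compute similar figures for their own system, here is a sketch working from daily returns expressed as a fraction of capital. The exact definitions used for the numbers above aren't spelled out, so the non-compounded account curve, the 256 day year and the name `account_stats` are all assumptions:

```python
import numpy as np

def account_stats(daily_returns, days_per_year=256):
    """Summary statistics from daily returns (fractions of capital)."""
    r = np.asarray(daily_returns, dtype=float)
    curve = np.cumsum(r)                      # non-compounded account curve
    # High water mark so far, never below the starting level of zero
    high_water = np.maximum(np.maximum.accumulate(curve), 0.0)
    drawdown = high_water - curve             # zero whenever at a new high
    ann_std = r.std(ddof=1) * np.sqrt(days_per_year)
    return {
        "sharpe": r.mean() * days_per_year / ann_std,
        "annual_std": ann_std,
        "avg_drawdown": drawdown.mean(),
        "worst_drawdown": drawdown.max(),
        "prop_winning_days": (r > 0).mean(),
        "prop_days_in_drawdown": (drawdown > 0).mean(),
    }
```

A system that only occasionally makes new highs will spend most of its time in drawdown, which is why a figure like 94% is less alarming than it first looks.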

Note that without costs the Sharpe would be higher, around 0.94. So I'm paying 0.06 SR in costs. This is an outcome of how I excluded faster trading rules for more expensive instruments.

These returns assume we maintain the same risk target. However all traders should reduce their risk when they lose money. Most will also want to increase exposure as their account value grows. In the latter case the returns shown above are effectively a log graph of what your returns would be. Since the system makes 16% a year on average over 32 years the compounded returns would be pretty good.

I reduce my capital when I make losses, but keep it at a capped maximum when I am at my high water mark. This would slightly increase the Sharpe shown above and reduce the drawdowns, at the expense of a lower total gain.
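One possible reading of this capital rule, sketched under the assumption that profits above the cap are simply not added back to trading capital (the function name `capped_capital` and the recursive form are illustrative, not a description of the live system):

```python
def capped_capital(daily_pnl, max_capital):
    """Trading capital path: falls with losses, never exceeds the cap."""
    capital = max_capital
    path = []
    for pnl in daily_pnl:
        # Losses reduce capital; gains rebuild it only up to the cap
        capital = min(max_capital, capital + pnl)
        path.append(capital)
    return path

print(capped_capital([10, -5, 3, 4], max_capital=100))
```

Here a gain of 10 at the cap leaves capital at 100, the loss of 5 takes it to 95, and the following gains bring it back to 100 but no higher.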

Here are returns we get from the different styles of trading (don't worry about the units on the y-axis):

Note that in calculating profits I always lag my trades by one day, and assume they are done at the next day's closing price, paying half the usual spread on the market, and the normal commission. This is all fairly conservative.
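A sketch of how the lag and costs might enter a profit calculation for a single instrument. It assumes daily closing prices, a target position decided at each close, a spread quoted in price units and a commission per contract. The name `lagged_pnl` and all parameters are hypothetical:

```python
import numpy as np

def lagged_pnl(prices, target_positions, spread, commission, point_value=1.0):
    """Daily P&L with trades filled at the following day's close."""
    prices = np.asarray(prices, dtype=float)
    target = np.asarray(target_positions, dtype=float)
    # A position decided at the close of day t is only held from the close of t+1
    held = np.concatenate(([0.0], target[:-1]))
    trades = np.diff(held, prepend=0.0)
    # Gross P&L on day t comes from the position held at the close of day t-1
    gross = np.concatenate(([0.0], held[:-1] * np.diff(prices))) * point_value
    # Each contract traded pays half the spread plus the commission
    costs = np.abs(trades) * (0.5 * spread * point_value + commission)
    return gross - costs

print(lagged_pnl([100, 101, 103, 102], [1, 1, 1, 1], spread=0.1, commission=0.05))
```

In this example the signal on the first day only becomes a position at the second day's close, so the first price rise is missed and the second day shows only the trading cost.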

These simulated returns don't include interest charges, gains or losses on converting FX for margin payments, or data fees. In my annual review of actual performance I'll give you some idea of how large these elements are (sneak preview, not that large).

If you'd like any more detail or stats, then please comment on this post. I hope this has been interesting.

Great post, Rob. Thanks for the insight into your system - looking forward to more details in your later post and book!

One question, which you'll probably handle later: how do you handle lot size issues with the account? If you're trading £500k with that risk level, maybe not such an issue. But I would imagine 1 lot of each of your markets would test the risk bounds...

I do think about minimum size. Ideally I'd want to have a maximum position of at least 4 contracts in each instrument. However there are a few where I don't quite get to that. For example for US 20 year bonds I'd only ever have one contract, unless the vol falls a lot. However only about half a dozen instruments are affected so it doesn't mess up my desired risk too much.

Clearly I still need to have a maximum of at least one contract. That is why I don't trade DAX anymore, or the longest German and US bonds (never mind JGB).

This comment has been removed by a blog administrator.

Darrin, sorry, I accidentally deleted your comment. Repeated here:

"Just bought your book and have been reading it non stop. Great stuff!

Initially, I'm finding the correlation matrix and diversification multiplier confusing. When/if you test new strategies, with several variation rules, is your first step always calculating forecast weights? Can you give a quick example of the initial steps you took when you first developed your system?

I'm on chapter 10, but my understanding thus far is that the flow goes something like this: Create a rule with a few variations, Calculate forecast weights, then the forecast signal for the instrument is created and so on through the process. Is this correct?"

That's correct. Once you have the forecast weights and you've come up with some correlations you'd calculate the forecast diversification multiplier.

Rob

What type of relative value trading do you do? Thanks

Relative carry (rolldown, contango... choose your favourite term) within an asset class.

Hi there

Bought the book, very good. In it you list 6 markets to trade for a small account.

Could you put in a spreadsheet the list of the 43 markets you trade, and also add those you would trade if account size permitted.

Thanks

Peter

https://docs.google.com/spreadsheets/d/1RQEcpVK7esBHmWRlugoXGI_RKBp634m9MkU_XrLPdAQ/edit?usp=sharing

I've dropped a couple of markets since, now down to 39

The 'wishlist' in this sheet is one I put together a while ago. It's by no means comprehensive, and may include some markets that aren't as liquid as I'd hoped.

This is another source you could use

https://fimag.fia.org/sites/default/files/content_attachments/2014%20FIA%20Annual%20Volume%20Survey%20%E2%80%93%20Charts%20and%20Tables.pdf

(but it uses contract volume rather than value so misses out a lot of contracts I trade)

Hi Rob, Your book, blog and additional resources have been very helpful in opening my eyes to different methods of system design.

Regarding the summary statistics above and the "money shot" above, did you find that the drawdowns diminished in magnitude over time as you added markets? It appears that the worst drawdown of 33% happened near the beginning of your backtesting. While hard to tell from the graph, what was the frequency of drawdowns in excess of 20-25%?

Are you surprised by the remarkably good performance this program has generated thus far? Or concerned that it isn't performing closer to your original expectations?

Thanks for the transparent view you have provided into this project.

John

Hi John

Theoretically drawdowns should reduce as diversification increases and the Sharpe ratio goes up.

In practice the drawdown from any given account curve is random:

http://qoppac.blogspot.co.uk/2015/11/using-random-data.html

So it's perfectly possible that you'd get a larger drawdown later in the backtest.

I don't have a full drawdown chart but I'll include one when I get a chance.

I'm not surprised by the live performance of my system so far, though I of course am pleased. Returns are random remember. Assuming my backtested Sharpe ratio is correct, last year's return was a one in ten year event (91st percentile of returns).

This year is at the 70th percentile, so far at least.

In the post you mention one of your rules is selling volatility; also you use a "short bias" on VIX. Did this bias come from backtesting, and does it change for each rebalance? Also, in the book you mention it's possible to combine a positive skew rule with a negative skew asset, i.e. shortselling the VIX. So is selling volatility a rule or a vanilla instrument? Thanks in advance,

No I have a fixed short on the VIX and V2X which is equivalent to selling options and delta hedging.

This is kind of a hybrid between rule and instrument; technically it's a "no rule" position on a single instrument, but I could also use a rule for trading options which would give the same effect.

Thanks

Hi Rob,

For buffering you 'trade to the edge'. Just so I have it right, if the optimal position is 50% and the buffer % is 10% and the current position is 65%, say, you rebalance back to 60%?

If so, lets say the next 2-days returns are .002% and 1%. Thus bringing the positions back above 60%. Do you just keep rebalancing back down to 60% on both those days as well?

For buffering you 'trade to the edge'. Just so I have it right, if the optimal position is 50% and the buffer % is 10% and the current position is 65%, say, you rebalance back to 60%?

Yes.

If so, lets say the next 2-days returns are .002% and 1%. Thus bringing the positions back above 60%. Do you just keep rebalancing back down to 60% on both those days as well?

Yes - in theory. But with futures you're unlikely to do such tiny trades. For equities you may also want to impose a minimum trade size to avoid doing 1 or 2 share clips with a minimum brokerage fee that will hurt you.
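The 'trade to the edge' logic from this exchange can be sketched as follows, with positions and buffer in the same units (percent here). The function name `buffered_position` is just for illustration:

```python
def buffered_position(current, optimal, buffer):
    """Return the position to hold after applying a no-trade buffer."""
    lower, upper = optimal - buffer, optimal + buffer
    if current > upper:
        return upper      # trade down, but only to the edge of the buffer
    if current < lower:
        return lower      # trade up, but only to the edge of the buffer
    return current        # inside the buffer: do nothing

print(buffered_position(current=65, optimal=50, buffer=10))  # 60
```

Trading to the edge rather than back to the optimum keeps turnover down, since a position that drifts just outside the buffer only needs a small trade.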

Hello Robert,

In practice do you actually lag your trades by one day?

Many thanks.

Well if you're using the closing price from today (Tuesday) then the earliest you can trade is shortly after the open on Wednesday; and unless the market is closed I'd like to have done my trade before Wednesdays close. So in practice the lag is between 0.1 and 0.9 days (ignoring the overnight close and weekends).

Hello Robert,

Could you perhaps expand on:

This is to do with a risk management overlay that I use in my model, which reduces risk when it thinks there is potential for large losses

Many thanks.

Very briefly (will be the subject of an entire post at some point): reduce positions when total estimated risk is too high, reduce positions when risk calculated using pessimistically high volatility is too high, reduce positions when risk calculated using shocked correlation matrix (all +1 or -1, whatever is worse) is too high.
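A rough sketch of how three checks like these could be combined into a single position multiplier. Everything here is an assumption made for illustration: the function names, the idea of scaling by limit over risk, the separate limit for each check, and the stressed volatilities being supplied as an input:

```python
import numpy as np

def portfolio_risk(weights, vols, corr):
    """Annualised portfolio standard deviation from weights, vols and correlations."""
    exposure = np.asarray(weights, dtype=float) * np.asarray(vols, dtype=float)
    variance = float(exposure @ np.asarray(corr, dtype=float) @ exposure)
    return max(variance, 0.0) ** 0.5

def risk_overlay_multiplier(weights, vols, corr, stressed_vols, limits):
    """Fraction of the desired positions to hold (1.0 means no reduction)."""
    exposure = np.abs(np.asarray(weights, dtype=float) * np.asarray(vols, dtype=float))
    risks = {
        "normal": portfolio_risk(weights, vols, corr),
        "stressed_vol": portfolio_risk(weights, stressed_vols, corr),
        # With every correlation at +1 or -1, whichever is worse,
        # the absolute exposures simply add up
        "shocked_corr": float(exposure.sum()),
    }
    ratios = [limits[name] / risk for name, risk in risks.items() if risk > 0]
    # The most binding check determines how far positions are cut
    return min([1.0] + ratios)
```

For example, a long and a short position in two instruments with 20% volatility and 0.5 correlation has a normal risk of 20%, but a shocked correlation risk of 40%; with a 30% limit on the latter, positions would be scaled to 75%.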

Thanks for the answer.

Looking forward to the post. As you rightly say, risk and costs are the two things an investor can control. A post on a solid, implementable risk system would probably add a lot of value to a lot of investors.

Hi Rob,

Really enjoyed your book. Just thinking through your trend following, carry, and short vol bias rules. Do you find a high correlation between these rules when applied to VIX and V2X futures? I would think that generally, since the VIX curve is usually upwards sloping, the carry always appears high, leading you to short VIX futures. The persistent decline in VIX futures would also lead your trend following system to be shorting it most of the time as well. And you also have the "always short" rule.

The other somewhat worrying thing is that vol of VIX itself is rather jumpy, but in times like these one would potentially be seeing a high carry/trend forecast owing to the steep curve and low realized vol of VIX futures, and also shorting more VIX futures to reach the vol target.

Yes these rules would be correlated, but no more so than for say eurodollar or bonds; or any contract where you generally have strong positive carry.

The jumpiness of vol is pretty unpleasant I agree. One measure to take is to limit the leverage you take when vol is very low. Read this, https://qoppac.blogspot.co.uk/2016/09/systematic-risk-management.html section "Risk management within the system (endogeonous)"

Thank you! That seems like a sensible way to limit the risk of vol targeting when realized vol is very low.

Hi Rob, an older post but hoping you don’t mind a new question on this.

I’m intrigued why you backtest with a one day lag at the closing price, rather than just using the open price following a signal at close?

Thanks - big fan of your books - looking forward to the next one? :)

David

A few reasons: (a) I don't have OHLC data in my current database, although I could easily take the first sample each day as open, (b) It's more conservative, if the market is fast moving you might not get the open price. I suppose a better measure might be the average of close and open on the following day (close to VWAP).
