Thursday 5 March 2015

Simulating my futures system

As most of you know I'm running a fully automated futures trading system. This system uses a number of different signals to forecast price movements, but is mostly a trend following system.

I've had quite a few requests for a simulated back-test of this system. Although the system has done very well since I began trading in April 2014, this has been at a time when most trend following systems have done very well.  In particular the flagship fund of quant shop AHL (where I used to work) made about 34% last year.

(I'll be providing a more thorough review of my own performance after I've done a full year of trading, in a few weeks' time.)

So there is natural curiosity as to whether 2009 - 2013, much worse for trend following, would also have been bad for my system.

This will also be an educational exercise, as I'll talk through some of the issues involved in making a backtest as realistic as possible, and avoiding the deathly curse of "overfitting". Overfitted back-tests can look amazing, but they are unlikely to do well in actual trading.

Futures markets

I will use data from 43 futures markets to simulate the model. These have been chosen to cover a wide range of asset classes, and also based on factors like trading cost and data availability. One slight wrinkle is that I don't have a long series of price history for all my instruments. The data I get from my broker only goes back to late 2013 when I started collecting prices, and although has been a great resource for backfilling longer price data, it doesn't cover every market. If anyone knows of another site for getting (free) historical daily price data for individual futures contracts, preferably which provides .csv files or an API, I'd love to hear about it.

Here you can see how many markets I have data for over time:

Notice the big jump in 2013 when I started getting broker data. This should mean the backtest is a little conservative, since you get better performance from more markets (I know this from simulating performance of similar systems in my old job where I had access to much more data).

However it also means I've needed to take care that the weights given to different instruments in the portfolio are rescaled and fitted properly as new series of data arrive.

Trading rules

I have four main kinds of trading rules:

 Most rules have variations that work at different speeds. So there is a total of 32 possible rules that can be used. However unless an instrument is very cheap to trade we won't be able to access many of the faster rules, as they'll be too expensive. I drop these rules before proceeding.

How do we decide how much weight to give to each trading rule? I use a technique called non parametric bootstrapping to do my portfolio optimisation. Bootstrapping automatically gives you the right weights depending on how different the underlying data is from random noise, so it produces less extreme portfolios.

This is done on an expanding window out of sample. For example to trade in 1987 I used data from 1978 to 1986 to fit my weights. For 2015 I used data from 1978 to 2014. So I'm only using the past, not forward looking data.
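A minimal sketch of bootstrapped weights fitted on an expanding window might look like the following. The optimiser used inside each bootstrap draw (mean-variance, clipped to be long-only), the number of draws, the random data and all function names are assumptions made for illustration; they are not taken from the system described here.

```python
import numpy as np


def single_draw_weights(returns):
    """Long-only weights from one sample: mean-variance, clipped at zero.

    This inner optimiser is an assumption made for illustration only.
    """
    n_assets = returns.shape[1]
    mean = returns.mean(axis=0)
    cov = np.cov(returns, rowvar=False)
    raw = np.clip(np.linalg.pinv(cov) @ mean, 0.0, None)
    if raw.sum() <= 0.0:
        # Nothing looks attractive in this draw: fall back to equal weights
        return np.full(n_assets, 1.0 / n_assets)
    return raw / raw.sum()


def bootstrap_weights(returns, n_draws=100, seed=0):
    """Average the weights fitted on many resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n_obs = returns.shape[0]
    draws = [
        single_draw_weights(returns[rng.integers(0, n_obs, size=n_obs)])
        for _ in range(n_draws)
    ]
    return np.mean(draws, axis=0)


def expanding_window_weights(returns, years, first_trading_year):
    """Weights used in each trading year, fitted only on earlier years."""
    weights_by_year = {}
    for year in range(first_trading_year, int(years.max()) + 1):
        past = returns[years < year]  # strictly in the past: no look-ahead
        weights_by_year[year] = bootstrap_weights(past)
    return weights_by_year


# Hypothetical data: 8 years of daily returns for three trading rule variations
rng = np.random.default_rng(42)
years = np.repeat(np.arange(1978, 1986), 250)
rule_returns = rng.normal(0.0005, 0.01, size=(years.size, 3))

weights = expanding_window_weights(rule_returns, years, first_trading_year=1980)
print({year: w.round(2) for year, w in weights.items()})
```

Averaging over many resamples is what pulls the weights towards something less extreme: when the data is close to noise, individual draws disagree with each other and the average ends up near equal weights.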

To avoid over fitting I pool the pre-cost returns across all the instruments for which I have data. I've rarely found enough consistent evidence that different trading rules work better pre-cost on different kinds of instrument to justify doing anything else, especially given the paucity of available data in the past.
I then work out after cost returns, so it's likely that on expensive markets there will be less weight on faster trading rules.

Over fitting and data mining

Other than making sure you account properly for the effect of costs, the main issue to worry about is over fitting, AKA data mining. As you can see I am quite careful not to use forward looking information, and bootstrapping ensures we don't over fit based on limited data.

However I can't get away from the fact that I am using trading rules that I know will work, based on my own experience and general market knowledge. So there will be some implicit data mining going on before the backtest is even run.

This issue is discussed briefly in this blog. It will be discussed more thoroughly in my forthcoming book (details to follow, but hopefully out later this year), where there will also be more information about backtesting and fitting generally.

But my rules are generally simple, and having a number of variations for each rule should minimise the bias this causes. Still, I wouldn't expect to realise the Sharpe Ratio that I see in this back-test (this is also because future asset returns generally aren't likely to be as high as in the simulated period, when a secular fall in inflation caused large one off repricing gains). But it's much more realistic than an overfitted version would be.

A portfolio of futures

I then use a similar procedure to get weights for the instruments in my portfolio, with a few tweaks. I use weekly returns, otherwise the correlations are unrealistically low due to different market closing times (all other work is done with daily data). Obviously I don't pool data from different instruments together!

However if I don't have at least a year of data for an instrument when I start trading it I use average returns from the rest of the asset class, plus some noise such that the new asset will be 80% correlated on average with the other instruments of the same group. This gives me reasonable weights until I have enough data to fit them more precisely.
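One simple way to build such placeholder returns is sketched below. It is an assumption-laden simplification: it targets an 80% correlation with the asset class average (rather than with each individual instrument), uses independent Gaussian noise, and runs on a long hypothetical sample so that the realised correlation can be checked. The function name and the data are invented for the example.

```python
import numpy as np


def synthetic_returns(group_average, target_corr=0.8, seed=0):
    """Placeholder returns: group average plus independent Gaussian noise.

    If the noise is independent of the average, then
    corr(new, average) = s / sqrt(s**2 + n**2), where s and n are the
    standard deviations of the average and of the noise. Solving for n gives
    n = s * sqrt(1 / target_corr**2 - 1).
    """
    rng = np.random.default_rng(seed)
    noise_std = group_average.std() * np.sqrt(1.0 / target_corr**2 - 1.0)
    return group_average + rng.normal(0.0, noise_std, size=group_average.shape)


# Hypothetical asset class: three existing instruments sharing a common factor
rng = np.random.default_rng(1)
common = rng.normal(0.0, 0.02, size=(20000, 1))
existing = common + rng.normal(0.0, 0.01, size=(20000, 3))
group_average = existing.mean(axis=1)

new_instrument = synthetic_returns(group_average)
realised_corr = np.corrcoef(new_instrument, group_average)[0, 1]
print(round(realised_corr, 2))
```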

I also don't take pre cost performance into account (again there isn't much evidence that this is statistically different between markets); although because I'm bootstrapping it wouldn't change the weights much anyway.

Here are the final weights from the bootstrapping procedure, for each asset class:

Agricultural: 21.5%
Bonds and STIR: 17.5%
Equity index, including volatility: 17.3%
FX: 19.1%
Metals: 16.7%
Oil and Gas:  8.3%

These are nice and even.

Risk targeting

I assume here that we start with £500,000; and are targeting risk such that our annualised returns will have an average volatility of 25% of this, £125,000 (this is the same percentage risk target, but not the same size portfolio as I have).

It's imperative that we know we're getting this right. Here is an estimate of the realised rolling annualised volatility of returns. Higher peaks mean that we have strong forecasts from our trading rules, or that correlations are particularly high, or that the markets were more volatile than we hoped when we originally put on our positions. However the average is about right; and if anything is a little lower and more conservative than it should be.

(This is to do with a risk management overlay that I use in my model, which reduces risk when it thinks there is potential for large losses)
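The arithmetic of the risk target, and one way of estimating realised rolling volatility, can be sketched as follows. The 256 business days per year convention, the one year window and the simulated returns are assumptions for the example rather than details given in this post.

```python
import numpy as np

BUSINESS_DAYS_PER_YEAR = 256  # assumption: a common convention, square root is 16

capital = 500_000.0
annual_vol_target = 0.25

# Annualised cash volatility target, and its daily equivalent
annual_cash_vol_target = capital * annual_vol_target
daily_cash_vol_target = annual_cash_vol_target / np.sqrt(BUSINESS_DAYS_PER_YEAR)


def rolling_annualised_vol(daily_returns, window=BUSINESS_DAYS_PER_YEAR):
    """Annualised standard deviation of daily returns over a trailing window."""
    daily_returns = np.asarray(daily_returns, dtype=float)
    return np.array([
        daily_returns[end - window:end].std(ddof=1) * np.sqrt(BUSINESS_DAYS_PER_YEAR)
        for end in range(window, len(daily_returns) + 1)
    ])


# Hypothetical daily percentage returns with a true annualised vol of 25%
rng = np.random.default_rng(0)
simulated = rng.normal(0.0, annual_vol_target / 16.0, size=2560)
realised = rolling_annualised_vol(simulated)

print(annual_cash_vol_target, daily_cash_vol_target)
print(realised.mean().round(3))
```

Comparing the average of the rolling estimate with the target is the check described above: peaks and troughs are expected, but the long run average should sit close to 25%.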

And the winner is...

Here is what you've all been waiting for - the veritable money shot.

You can see that the last year has been exceptionally good. Overall though this is a good, but not unbelievable performance. It would have been very easy to get a much better curve by fitting in sample, and by using more aggressive fitting techniques. But that would prove nothing, and I'd probably be doing much worse in real trading.

Some statistics:

Sharpe Ratio: 0.88
Realised annualised standard deviation: 19%
Average drawdown: 9.2%
Ratio of winning days to losing day returns: 1.006
Proportion of winning days: 54%
Worst drawdown: 33%
Proportion of days spent in drawdown: 94%

Note that without costs the Sharpe Ratio would be higher, around 0.94. So I'm paying 0.06 SR in costs. This is an outcome of how I excluded faster trading rules for more expensive instruments.
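For anyone wanting to compute similar statistics on their own account curve, here is a minimal sketch. The exact definitions behind the numbers above are not spelled out in this post, so the choices below are assumptions: returns are daily percentages of a fixed capital base and are not compounded, the Sharpe Ratio ignores the risk free rate, there are 256 business days in a year, and the average drawdown is taken over all days.

```python
import numpy as np

DAYS_PER_YEAR = 256  # assumption


def account_curve_stats(daily_returns):
    """Summary statistics for non-compounded daily percentage returns."""
    r = np.asarray(daily_returns, dtype=float)
    cumulative = np.cumsum(r)
    # High water mark starts at zero, the value of the curve before day one
    high_water_mark = np.maximum.accumulate(np.maximum(cumulative, 0.0))
    drawdown = high_water_mark - cumulative
    return {
        "sharpe": r.mean() / r.std(ddof=1) * np.sqrt(DAYS_PER_YEAR),
        "annualised_std": r.std(ddof=1) * np.sqrt(DAYS_PER_YEAR),
        "average_drawdown": drawdown.mean(),
        "worst_drawdown": drawdown.max(),
        "proportion_in_drawdown": (drawdown > 0).mean(),
        "proportion_winning_days": (r > 0).mean(),
    }


# Tiny hypothetical example: five days of returns
stats = account_curve_stats([0.01, -0.02, 0.01, 0.03, -0.01])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```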

These returns assume we maintain the same risk target. However all traders should reduce their risk when they lose money. Most will also want to increase exposure as their account value grows. In the latter case the returns shown above are effectively a log graph of what your returns would be. Since the system makes 16% a year on average over 32 years the compounded returns would be pretty good.

I reduce my capital when I make losses, but keep it at a capped maximum when I am at my high water mark. This would slightly increase the Sharpe shown above and reduce the drawdowns, at the expense of a lower total gain.

Here are returns we get from the different styles of trading (don't worry about the units on the y-axis):

You can see that trend following (which contributes about 60% of my risk), as has been well documented, did poorly from 2011-2013. However the other trading rules saved the day; in particular Carry. On the other hand 2014 was a great year for trend following, and this is reflected in my overall performance and those of large funds with similar styles such as AHL, Bluetrend, Winton and Cantab.

Note that in calculating profits I always lag my trades by one day, and assume they are done at the next day's closing price, paying half the usual spread on the market, and the normal commission. This is all fairly conservative.
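The lag and cost assumptions above can be sketched like this. The function name, the per contract cost inputs and the toy prices are hypothetical; the only things taken from the text are the one day lag, filling at the next close, and paying half the spread plus commission on every contract traded.

```python
import numpy as np


def lagged_net_pnl(prices, target_positions, half_spread, commission, point_value=1.0):
    """Daily net P&L when a target set at one close is filled at the next close."""
    prices = np.asarray(prices, dtype=float)
    targets = np.asarray(target_positions, dtype=float)

    # Position held after each close: the previous day's target, filled today
    held = np.concatenate([[0.0], targets[:-1]])
    trades = np.diff(held, prepend=0.0)

    # Today's P&L comes from the position held since yesterday's close
    price_change = np.diff(prices, prepend=prices[0])
    held_yesterday = np.concatenate([[0.0], held[:-1]])
    gross = held_yesterday * price_change * point_value

    # Every contract traded pays half the spread plus commission
    costs = np.abs(trades) * (half_spread * point_value + commission)
    return gross - costs


# Hypothetical closing prices and target positions (in contracts)
prices = [100.0, 101.0, 103.0, 102.0, 104.0]
targets = [1, 1, 2, 2, 0]

net = lagged_net_pnl(prices, targets, half_spread=0.05, commission=0.01)
print(net, net.sum())
```

Note that the target of one contract set on the first day earns nothing on the second day's rise from 100 to 101, because it is only filled at the second day's close.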

These simulated returns don't include interest charges, gains or losses on converting FX for margin payments, or data fees. In my annual review of actual performance I'll give you some idea of how large these elements are (sneak preview, not that large).

If you'd like any more detail or stats, then please comment on this post. I hope this has been interesting.


  1. Great post, Rob. Thanks for the insight into your system - looking forward to more details in your later post and book!

    One question, which you'll probably handle later: how do you handle lot size issues with the account? If you're trading £500k with that risk level, maybe not such an issue. But I would imagine 1 lot of each of your markets would test the risk bounds...

  2. I do think about minimum size. Ideally I'd want to have a maximum position of at least 4 contracts in each instrument. However there are a few where I don't quite get to that. For example for US 20 year bonds I'd only ever have one contract, unless the vol falls a lot. However only about half a dozen instruments are affected so it doesn't mess up my desired risk too much.

    Clearly I still need to have a maximum of at least one contract. That is why I don't trade DAX anymore, or the longest German and US bonds (never mind JGB).

  3. This comment has been removed by a blog administrator.

    1. Darrin, sorry accidentally deleted your comment. Repeated here:

      "Just bought your book and have been reading it non stop. Great stuff!

      Initially, I'm finding the correlation matrix and diversification multiplier confusing. When/if you test new strategies, with several variation rules, is your first step always calculating forecast weights? Can you give a quick example of the initial steps you took when you first developed your system?

      I'm on chapter 10, but my understanding thus far is that the flow goes something like this: Create a rule with a few variations, Calculate forecast weights, then the forecast signal for the instrument is created and so on through the process. Is this correct?"

      That's correct. Once you have the forecast weights and you've come up with some correlations you'd calculate the forecast diversification multiplier.


  4. What type of relative value trading do you do? Thanks

    1. Relative carry (rolldown, contango... choose your favourite term) within an asset class.

  5. Hi there

    Bought the book, very good. In it you list 6 markets to trade for a small account.
    Could you put in a spreadsheet the list of the 43 markets you trade, and also add those you would trade if account size permitted.




      I've dropped a couple of markets since, now down to 39

      The 'wishlist' in this sheet is one I put together a while ago. It's by no means comprehensive, and may include some markets that aren't as liquid as I'd hoped.

      This is another source you could use

      (but it uses contract volume rather than value so misses out a lot of contracts I trade)

  6. Hi Rob, Your book, blog and additional resources have been very helpful in opening my eyes to different methods of system design.

    Regarding the summary statistics above and the "money shot" above, did you find that the drawdowns diminished in magnitude over time as you added markets? It appears that the worst drawdown of 33% happened near the beginning of your backtesting. While hard to tell from the graph, what was the frequency of drawdowns in excess of 20-25%?

    Are you surprised by the remarkably good performance this program has generated thus far? Or concerned that it isn't performing closer to your original expectations?

    Thanks for the transparent view you have provided into this project.


    1. Hi John

      Theoretically drawdowns should reduce as diversification increases and the Sharpe Ratio goes up.

      In practice the drawdown from any given account curve is random:

      So it's perfectly possible that you'd get a larger drawdown later in the backtest.

      I don't have a full drawdown chart but I'll include one when I get a chance.

      I'm not surprised by the live performance of my system so far, though I of course am pleased. Returns are random, remember. Assuming my backtested Sharpe Ratio is correct, last year's return was a one in ten year event (91st percentile of returns).

      This year is at the 70th percentile, so far at least.

  7. In the post you mention one of your rules is selling volatility; also you use a "short bias" on VIX. Did this bias come from backtesting, and does it change for each rebalance? Also, in the book you mention it's possible to combine a positive skew rule with a negative skew asset, i.e. short selling the VIX. So is selling volatility a rule or a vanilla instrument? Thanks in advance,

    1. No I have a fixed short on the VIX and V2X which is equivalent to selling options and delta hedging.

      This is kind of a hybrid between rule and instrument; technically it's a "no rule" position on a single instrument, but I could also use a rule for trading options which would give the same effect.

  8. Hi Rob,

    For buffering you 'trade to the edge'. Just so I have it right, if the optimal position is 50% and the buffer % is 10% and the current position is 65%, say, you rebalance back to 60%?

    If so, let's say the next two days' returns are .002% and 1%, bringing the position back above 60%. Do you just keep rebalancing back down to 60% on both those days as well?

  9. Yes - in theory. But with futures you're unlikely to do such tiny trades. For equities you may also want to impose a minimum trade size to avoid doing 1 or 2 share clips with a minimum brokerage fee that will hurt you.
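The example in the question can be written as a few lines of code. This is a sketch of the 'trade to the edge' logic exactly as described in this exchange, with the buffer expressed in the same units as the position (percentage points here); the function name is made up.

```python
def trade_to_edge(current, optimal, buffer):
    """Return the position to hold after applying a no-trade buffer.

    Inside the buffer nothing is traded; outside it, trade only as far as
    the nearest edge of the buffer rather than all the way to the optimum.
    """
    lower, upper = optimal - buffer, optimal + buffer
    if current > upper:
        return upper
    if current < lower:
        return lower
    return current


# Optimal 50%, buffer 10%, current 65%: rebalance back to 60%
position = trade_to_edge(current=65.0, optimal=50.0, buffer=10.0)
print(position)

# A 1% gain the next day pushes the position above the edge again,
# so in theory it is trimmed back to 60% once more
position_next_day = trade_to_edge(current=position * 1.01, optimal=50.0, buffer=10.0)
print(position_next_day)
```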

  10. Hello Robert,

    In practice do you actually lag your trades by one day?

    Many thanks.

    1. Well if you're using the closing price from today (Tuesday) then the earliest you can trade is shortly after the open on Wednesday; and unless the market is closed I'd like to have done my trade before Wednesday's close. So in practice the lag is between 0.1 and 0.9 days (ignoring the overnight close and weekends).

  11. Hello Robert,

    Could you perhaps expand on:

    This is to do with a risk management overlay that I use in my model, which reduces risk when it thinks there is potential for large losses

    Many thanks.

    1. Very briefly (will be the subject of an entire post at some point): reduce positions when total estimated risk is too high, reduce positions when risk calculated using pessimistically high volatility is too high, reduce positions when risk calculated using shocked correlation matrix (all +1 or -1, whatever is worse) is too high.
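To make the three checks concrete, here is a rough sketch. Everything specific in it is a hypothetical placeholder: the risk limits, the doubling of volatility as the "pessimistic" case, the exposures, and the idea of scaling all positions by a single multiplier. Only the three kinds of check come from the reply above.

```python
import numpy as np


def portfolio_risk(exposures, vols, corr):
    """Annualised portfolio standard deviation from exposures, vols and correlations."""
    risk_per_position = np.asarray(exposures) * np.asarray(vols)
    return float(np.sqrt(risk_per_position @ corr @ risk_per_position))


def worst_case_corr(exposures):
    """Correlations of +1 or -1, choosing whichever is worse for each pair.

    Using the outer product of the position signs means every position
    adds to total risk, whether it is long or short.
    """
    signs = np.where(np.asarray(exposures) >= 0, 1.0, -1.0)
    return np.outer(signs, signs)


def position_multiplier(exposures, vols, corr, stressed_vols,
                        normal_limit, stressed_vol_limit, shocked_corr_limit):
    """Scale factor (at most 1) that brings all three risk measures within limits."""
    checks = [
        (portfolio_risk(exposures, vols, corr), normal_limit),
        (portfolio_risk(exposures, stressed_vols, corr), stressed_vol_limit),
        (portfolio_risk(exposures, vols, worst_case_corr(exposures)), shocked_corr_limit),
    ]
    return min([1.0] + [limit / risk for risk, limit in checks if risk > 0])


# Hypothetical three instrument portfolio (exposures as a fraction of capital)
exposures = [1.0, -0.5, 0.8]
vols = np.array([0.10, 0.20, 0.15])
corr = np.eye(3)

shocked_risk = portfolio_risk(exposures, vols, worst_case_corr(exposures))
multiplier = position_multiplier(
    exposures, vols, corr, stressed_vols=2.0 * vols,
    normal_limit=0.30, stressed_vol_limit=0.40, shocked_corr_limit=0.30,
)
print(round(shocked_risk, 4), round(multiplier, 4))
```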

    2. I need your advice on setting up risk controls. I trade FX spot (20 times leverage) and just one symbol, GBPJPY; and FX futures and futures options for trading EUR, to make the most of the SPAN margin system. I mostly trade and manage risk manually. Returns are too good to be true: 10% a month (9% drawdown). Since FundSeeder has captured it, I have to believe it.
      I want to make an auto trader for GBPJPY with lot size, risk and margin management rules. How do I go about it? Thanks

    3. I can't possibly answer a question like that in a box this size!

  12. Thanks for the answer.

    Looking forward to the post. As you rightly say, risk and costs are the two things an investor can control. A post on a solid, implementable risk system would probably add a lot of value to a lot of investors.

  13. Hi Rob,
    Really enjoyed your book. Just thinking through your trend following, carry, and short vol bias rules. Do you find a high correlation between these rules when applied to VIX and V2X futures? I would think that generally, since the VIX curve is usually upwards sloping, the carry always appears high, leading you to short VIX futures. The persistent decline in VIX futures would also lead your trend following system to be shorting it most of the time as well. And you also have the "always short" rule.

    The other somewhat worrying thing is that vol of VIX itself is rather jumpy, but in times like these one would potentially be seeing a high carry/trend forecast owing to the steep curve and low realized vol of VIX futures, and also shorting more VIX futures to reach the vol target.

    1. Yes these rules would be correlated, but no more so than for say eurodollar or bonds; or any contract where you generally have strong positive carry.

      The jumpiness of vol is pretty unpleasant I agree. One measure to take is to limit the leverage you take when vol is very low. Read this, section "Risk management within the system (endogeonous)"

    2. Thank you! That seems like a sensible way to limit the risk of vol targeting when realized vol is very low.

  14. Hi Rob, an older post but hoping you don’t mind a new question on this.

    I’m intrigued why you backtest with a one day lag at the closing price, rather than just using the open price following a signal at close?

    Thanks - big fan of your books - looking forward to the next one? :)


    1. A few reasons: (a) I don't have OHLC data in my current database, although I could easily take the first sample each day as open, (b) It's more conservative, if the market is fast moving you might not get the open price. I suppose a better measure might be the average of close and open on the following day (close to VWAP).

  15. Never mind my previous comment about picking rule weights. I have found the answer at

  16. I am inspired by your books and have just started working on my own futures trading portfolio. In your calculated portfolio in this blog post, about 50% is allocated to commodities. Do you think it would be concerning if I did the same with my portfolio? Or are the correlations of the 3 commodity classes with each other low enough to justify the high weight? Your recommendation in your book was to start with the 4 asset groups - 1) bond, 2) equity, 3) fx, 4) commodities - and once done to go to the asset classes next, including the 3 commodity classes. So I am a bit confused. Thanks for your advice!

    1. Commodities are relatively uncorrelated with other asset classes so a higher weight is justified. The difference with my book is that I used heuristic weighting, whereas the above are optimised.

  17. Hi Rob,

    Nice post here. Regarding futures data, I find Norgate Data to be pretty good, though a fee (6 months = USD 148.50, 12 months = USD 270) would have to be paid to access the data.

    Anyway, your post got me thinking about this question.

    As of March 2015 (when you wrote this post), you had data for 43 futures, whereas you had data for approximately 20 futures (based on my interpretation of the graph on this post) as of year 2000.

    If I could ask, “Wouldn’t the backtest be more appropriate if it is done on the 20 futures over the period, 2000 to 2015, since there is actual historical data for all these 20 futures over the whole period, instead of an expanding universe of futures (from around 4 to 43) over time (from 1978 to 2015) based on data availability? ”

    While diversification will definitely be lower if 20 futures instead of 43 futures were used, at least in the former scenario there is consistency in the instruments which the strategy is backtested on over those 15 years whereas in the latter scenario, the optimal trading parameters are identified from fitting the strategy on data with different instruments at different time period?

    1. This is a very big question, which has to do with the consistency of parameter estimates over time and across instruments, and which of those you think is more important or problematic. And the honest answer is, I don't know. But my general experience is that parameters for the sort of trading system I run are quite stable over time, suggesting the approach of using more time for fewer instruments is the right one. Having said that, this is a serious question which would require a considerable amount of research to properly answer.

  18. Rob, thanks for your response.

    If I could ask further, let’s say if I find that the optimal parameter for a strategy via backtesting on the 43 futures is 5 over the period 1980 to 2015. Would checking for the uniformity of backtested performance between shorter periods (ie 1980 – 1990 vs 1990 – 2000 vs 2000-2010 vs 2010-2015) for the strategy with parameter 5 help to ensure that the introduction of futures with shorter history after 1980 does not create any bias towards any instrument with shorter history?

    1. Yes: I'd want to check whether the difference in parameter estimates was statistically significant (it might be that one data set comes up with a parameter value of 10, but in fact there is so much noise you can't actually distinguish between the estimates).

    2. ok sure. thanks Rob.

  19. Hi Robert - quick question. When you say that you sell volatility with a modest short bias, does it mean you keep a permanent short position, so that trend following and carry rules do not apply to VIX and V2X? Thanks!

    1. No, I apply those rules but I also apply a constant -10 forecast rule; so the result is a short bias compared to the situation if the always short rule wasn't there.

    2. Thanks Rob - so for volatility based futures only, what weight do you attribute to this -10 forecast rule if you also had momentum and carry rules in the mix? A third each?

    3. 20% short bias
      20% carry
      60% momentum
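As a quick worked example of what these weights imply, the sketch below combines the three forecasts as a weighted sum. The carry and momentum forecast values are invented, and any further scaling or capping of the combined forecast (such as a forecast diversification multiplier) is deliberately left out.

```python
def combined_forecast(forecasts, weights):
    """Weighted sum of individual rule forecasts."""
    return sum(weights[name] * forecasts[name] for name in weights)


volatility_weights = {"short_bias": 0.2, "carry": 0.2, "momentum": 0.6}

# The short bias rule is always -10; the other two values are hypothetical
forecasts = {"short_bias": -10.0, "carry": 5.0, "momentum": 10.0}

with_bias = combined_forecast(forecasts, volatility_weights)
bias_contribution = volatility_weights["short_bias"] * forecasts["short_bias"]
print(with_bias, bias_contribution)
```

With these weights the always short rule shifts the combined forecast down by a constant two points, whatever carry and momentum are saying.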

  20. Robert - thanks for your reply.

    Another QQ. In your book “Systematic Trading”, you state that “I have 8 rules drawn from 5 different themes”. I believe momentum and carry are two of these 5 themes? Without going into much details, would you be able to share the other 3 themes? Thanks, R

    1. Not actually sure what I meant by 'theme' at the time, but here's a list of the eight rules:

      Momentum for a single instrument


      Normalised momentum for a single instrument

      Mean reversion within asset classes

      Relative carry within asset classes

      Momentum for a synthetic asset class.

      Systematic short volatility.


    2. Thanks Robert. May I ask what the optimal weights are to each of the 8 rules? Although I believe that the systematic short volatility is specific to VIX only instruments? Thanks, R

    3. Not sure about the 'optimal' weights, but the weights I use vary by instrument. However a simple average is:

      short vol 1%
      breakout 33%
      Mean reversion in asset 10%
      carry 20%
      Momentum for a synthetic asset class. 11%
      Normalised momentum 10%
      Single instrument momentum 10%
      Relative carry 5%

    4. Thanks for your reply. If one had only access to the following rules, would you agree with the following weights?

      1. single instrument momentum - 20%
      2. Single instrument breakout - 30%
      3. Single instrument Carry - 50%

      I have done some simple back testing and it seems that breakout makes more money than momentum. Intuitively, this might make sense because 75% of the time asset prices are in a range and the breakout signal will be quite weak if prices are hovering around the roll_mean. Whilst pure momentum will lose more money in trading ranges due to whipsaws of prices. May I ask what your thoughts are on this?


    5. They're not crazy weights, and I would be happy to use them, although I wouldn't do so based on an in sample glance at a backtest and some intuition that doesn't make any sense and hasn't been properly checked (sorry if that sounds harsh, but what did you expect :-) )

    6. Thanks for being direct, Rob, I don’t mind. ;-)

      I first heard of this statement - that markets move in trading ranges 75% of the time - from a Paul Tudor Jones’s interview. I was also sceptical about this given that trend following works but when I measure the times markets are in a trading range vs trending using “classic breakout” rule on a 60 day look back on crude, I found out that it was actually true!

      At the moment I don’t have access to prices of other assets so I cannot expand my search but there might be more substance than it seems. That’s why I thought I would ask you.


    7. I'm prepared to believe that markets don't trend 75% of the time, otherwise I'd be a lot richer :-) However given how highly correlated breakout and EWMA momentum are, I don't really think there is much difference between them for a *continuous* system, as both will have smaller positions on when the trend is weaker.

