Monday, 18 January 2016

pysystemtrader: Estimated forecast scalars

I've just added the ability to estimate forecast scalars to pysystemtrade. So rather than a fixed scalar specified in the config you can let the code estimate a time series of what the scalar should be.

If you don't understand what the heck a forecast scalar is, then you might want to read my book (chapter 7). If you haven't bought it, then you might as well know that the scalar is used to modify a trading rule's forecast so that it has the correct average absolute value, normally 10.

Here are some of my thoughts on the estimation of forecast scalars, with a quick demo of how it's done in the code. I'll be using this "pre-baked system" as our starting point:

from systems.provided.futures_chapter15.estimatedsystem import futures_system

The code I use for plots etc in this post is in this file.

Even if you're not using my code it's probably worth reading this post as it gives you some idea of the real "craft" of creating trading systems, and some of the issues you have to consider.

Targeting average absolute value

The basic idea is that we're going to take the natural, raw, forecast from a trading rule variation and look at it's absolute value. We're then going to take an average of that value. Finally we work out the scalar that would give us the desired average (usually 10).

Notice that for non symmetric forecasts this will give a different answer to measuring the standard deviation, since this will take into account the average forecast. Suppose you had a long biased forecast that varied between +0 and +20, averaging +10. The average absolute value will be about 10, but the standard deviation will probably be closer to 5.

Neither approach is "right" but you should be aware of the difference (or better still avoid using biased forecasts).

Use a median, not a mean

From above: "We're then going to take an average of that value..."

Now when I say average I could mean the mean. Or the median. (Or the mode but that's just silly). I prefer the median, because it's more robust to having occasional outlier values for forecasts (normally for a forecast normalised by standard deviation, when we get a really low figure for the normalisation).

The default function I've coded up uses a rolling median (syscore.algos.forecast_scalar_pooled). Feel free to write your own and use that instead:


Pool data across instruments

In chapter 7 of my book I state that good forecasts should be consistent across multiple instruments, by using normalisation techniques so that a forecast of +20 means the same thing (strong buy) for both S&P500 and Eurodollar.

One implication of this is that the forecast scalar should also be identical for all instruments. And one implication of that is that we can pool data across multiple markets to measure the right scalar. This is what the code defaults to doing.

Of course to do that we should be confident that the forecast scalar ought to be the same for all markets. This should be true for a fast trend following rule like this:

## don't pool

for instrument_code in system.get_instrument_list():
    results.append(round(float(system.forecastScaleCap.get_forecast_scalar(instrument_code, "ewmac2_8").tail(1).values),2))

[13.13, 13.13, 13.29, 12.76, 12.23, 13.31]

Close enough to pool, I would say. For something like carry you might get a slightly different result even when the rule is properly scaled; it's a slower signal so instruments with short history will (plus some instruments just persistently have more carry than others - that's why the rule works).

for instrument_code in system.get_instrument_list():

   results.append(round(float(system.forecastScaleCap.get_forecast_scalar(instrument_code, "carry").tail(1).values),2))

[10.3, 58.52, 11.26, 23.91, 21.79, 18.81]

The odd one's out are V2X (with a very low scalar) and Eurostoxx (very high) - both have only a year and a half of data - not really enough to be sure of the scalar value.

One more important thing, the default function takes a cross sectional median of absolute values first, and then takes a time series average of that. The reason I do it that way round, rather than time series first, is otherwise when new instruments move into the average they'll make the scalar estimate jump horribly.

Finally if you're some kind of weirdo (who has stupidly designed an instrument specific trading rule), then this is how you'd estimate everything individually:

## don't pool

## need a different function

Use an expanding window

As well as being consistent across instruments, good forecasts should be consistent over time. Sure it's likely that forecasts can remain low for several months, or even a couple of years if they're slower trading rules, but a forecast scalar shouldn't average +10 in the 1980's, +20 in the 1990's, and +5 in the 2000's.

For this reason I don't advocate using a moving window to average out my estimate of average forecast values; better to use all the data we have with an expanding window.

For example, here's my estimate of the scalar for a slow trend following rule, using a moving window of one year. Notice the natural peaks and troughs as we get periods with strong trends (like 2008, 2011 and 2015), and periods without them.
There's certainly no evidence that we should be using a varying scalar for different decades (though things are a little crazier in the early part of the data, perhaps because we have fewer markets contributing). The shorter estimate just adds noise, and will be a source of additional trading in our system.

If you insist on using a moving window, here's how:

## By default we use an expanding window by making this parameter *large* eg 1000 years of daily data
## Here's how I'd use a four year window (4 years * 250 business days)

Goldilocks amount of minimum data - not too much, not too little

The average value of a forecast will vary over the "cycle" of the forecast. This also means that estimating the average absolute value over a short period may well give you the wrong answer.

For example suppose you're using a very slow trend following signal looking for 6 month trends, and you use just a month of data to find the initial estimate of your scalar. You might be in a period of a really strong trend, and get an unrealistically high value for the average absolute forecast, and thus a scalar that is biased downwards.

Check out this, the raw forecast for a very slow trend system on Eurodollars:
Notice how for the first year or so there is a very weak signal. If I'm now crazy enough to use a 50 day minimum, and fit without pooling, we get the following estimate for the forecast scalar:

Not nice. The weak signal has translated into a massive overestimate of the scalar.

On the other hand, using a very long minimum window means we'll eithier have to burn a lot of data, or effectively be fitting in sample for much of the time (depending on whether we backfill - see next section).

The default is two years, which feels about right to me, but you can easily change it, eg to one year:

## use a year (250 trading days)

Cheat, a bit

So if we're using 2 years of minimum data, then what do we do if we have less than 2 years? It isn't so bad if we're pooling, since we can use another instrument's data before we get our own, but what if this is the first instrument we're trading. Do we really want to burn our first two years of precious data?

I think it's okay to cheat here, and backfill the first valid value of a forecast scalar. We're not optimising for maximum performance here, so this doesn't feel like a forward looking backtest.

Just be aware that if you're using a really long minimum window then you're effectively fitting in sample during the period that is backfilled.

Naturally if you disagree, you can always change the parameter:



Perhaps the most interesting thing about this post is how careful and thoughtful we have to be about something as mundane as estimating a forecast scalar.

And if you think this is bad, some of the issues I've discussed here were the subject of debates lasting for years when I worked for AHL!

But being a successful systematic trader is mostly about getting a series of mundane things right. It's not usually about discovering some secret trading rule that nobody else has thought of. If you can do that, great, but you'll also need to ensure all the boring stuff is right as well.


  1. Hi Rob,

    For variations within a trading rule ie differing lookbaks for the EWMAC are you calculating a diversification multiplier between variations? If you are then is it using a bootstrapping method or are you just looking at the entire history?

    I'm piecing together how it's being done from your code but so far my attempts have produced a very slow version of code.

    1. I don't do it this way, no. I throw all my trading rules into a single bit pot, and on that work out FDM and weights.

      There is an alternative which is to group rules together as you suggest, work out weights and an FDM for the group; and then repeat the same exercise across groups. However it isn't something I intend to code up.

  2. Dear Rob,

    I've been playing with the sytem and estimated forecasts for ES EWMAC(18,36) rule from 2010 to 2016 using both MA and EWMA volatility approach.

    Scaled ewma forecast looks fine, but scaled MA forecast is biased upwards oscilating around 10 with range -20 to 50. I believe this is due to low volatility (uptrend in equity prices and index futures) especialy in the period from 2012 to 2014. Should i just floor the volatility at some lvl when calculating forecast using MA approach? Also, should I annualize stddev when working out the MA forecast?

    From your experience, which approach is better - MA or EWMA when calculating forecasts?

  3. I'd never use MA. When you get a large return the MA will pop up. When it drops out of the window the MA pops down. So it's super jumpy, and this affects it's general statistical properties. Fixing the vol at a floor is treating the symptom, not the problem, and will have side affects (the position won't respond to an initial spike in vol when below the floor).

    Note: The system already includes a vol floor (syscore.algos.robust_vol_calc) which defaults to flooring the vol at the 5% quantile of vol in the last couple of years.

  4. Dear Rob, could I ask for a bit of clarification (it could just be my own made confusion, but I would appreciate your help)

    In the book under "Summary for Combining forecasts" you mention that the order is to first combine the raw forecast from all trading variations and then apply the forecast scaler. Whereas above here the forecast scaler is at the trading rule variation level.
    My confusion is that 1 - which approach do you really recommend
    2 - if the scaler is to be applied on combined forecast , how to assess the absolute average forecast across variations ?

    Thank you in advance . Puneet

    1. I think you've misread the book (or it's badly written). The forecast scalar is for each individual trading rule variation.

  5. Rob thank you for the clarification. Book is fantastic. The confusions are of my own making

    one more for you on Forecast Scalers - When you talk about looking at absolute average value of forecast across instruments, are these forecast raw or volatility adjusted? If the forecasts are not vol. adjusted, the absolute value of the instrument will play a part in the forecast scaler determination ? That would not be right in my opinion. More so if we want a single scaler across instruments. Thanks in advance for your help.

    1. Yes forecasts should always be adjusted so that they are 'scale-less' and consistent across instruments before scaling. This will normally involve volatility adjusting.

  6. I probably missed it but do you explicitly test that forecast magnitude correlates with forecast accuracy? For example, do higher values of EWMAs result in greater accuracy? I'm assuming there's an observed albeit weak correlation. I haven't yet immersed myself in your book so I may be barking up the wrong tree.

    1. I have tested that specific relationship, and yes it happens. If you think about it logically most forecasting rules "should" work that way.

  7. Rob thank you for being so generous with your expertise. I really enjoyed your book and recommended it to several friends who also purchased it.

    In your post, you show that the instrument forecast scalars are fairly consistent for the EWMAC trading rule but not for the carry rule. Do you still then pool and use one forecast scalar for all instruments when using the carry rule?

    1. Yes I use a single scalar for all instruments with carry as well.

  8. Dear Rob,

    So for pooling, I do following:

    1. I generate forecasts for each instrument separatley.
    2. I take cross sectional median for forecast absolute values for all the instruments I'm pooling. What if dates i have data for each instrument are inconsistent?
    3. I take time average of cross-sectional median found above.
    4. I divide 10 into this average to get forecast scalar.

    Am I correct on that?

    Also, not related to this topic, but rather to trading rules:

    What do you think of an idea behind using a negative skew rule such as mean reversion to some average - e.g. take a linear deviaion from some moving average - the further the price from it, the stronger the forecast but with opposite sign. Wouldn't this harm EWMAC trend rule?


    1. 1. yes
      2. yes. It just means in earlier periods you will have fewer data points in your median.
      3, 4, yes.

      "e.g. take a linear deviaion from some moving average - the further the price from it, the stronger the forecast but with opposite sign. Wouldn't this harm EWMAC trend rule? "

      It's fine. Something like this would probably work at different time periods to trend following which seems to break down for holding periods of just a few days or for over a year. It would also be uncorrelated which is obviously good. But running it at the same frequency as EWMAC would be daft, as you'd be running the inverse of a trading rule at the same time as the rule itself.

  9. Dear Rob,

    I'm still struggling with forecast scalars.

    1. For example for one of my ewmac rule variations i get forecast scalars ranging from 7 to 20 for different instruments and one weird outlier with monotonically declining scalar from 37 to 20. Do I still pool data to get one forecast scalar for all instruments or not?

    2. Also, forecast scalar changes with every new data point, which value do i use to scale forecasts?

    for example i get following forecasts nd scalars for 5 dta points

    5 2
    4 2.5
    4 2.5
    6 1.7
    7 1.25

    for a scaled forecast do i just multiply values by line e.g. 5*2, 4*2.5 ... 7*1.25, or use the latest value of the forecast scalar for all my forecast data points e.g. 5*1.25, 4*1.25 ... 7*1.25?

    I would love to show a picture, however i have no idea how to attach it to the comments, is it possible?

    1. 1. If you are sure you have created a forecast that ought to be consistent across instruments (and ewmac should be) then yes pool. Such wildly different values suggest you have *very* short data history. Personally in this situation I'd use the forecast scalars from my book rather than estimating them.

      2. In theory you should use the value of the scalar that was available at that time, otherwise your backtest is forward looking.

      I don't think you can attach pictures here. You could try messaging me on twitter, facebook or linkedin.

  10. Hi Rob (and P_Ser)

    If you want to share pictures, it can be done via screencast for example. I've used it to ask my question below :-)

    For my understanding, is the way of working in the following link correct with what you mean in this article :

    First step : take the median of all instruments for each date (= cross section median I suppose)
    Second step : take a normal average (or mean) of all this medians

    In the screenshot you can see this worked out for some dummy data.

    I also have the idea to calculate the forecast scalers only yearly and then smoothing them out like described in this link :

    I think it has no added value to calculate the forecast scaler each date like Peter mentioned, do you agree with this ?


    1. Hopefully it's clear from function forecast_scalar what I do.

      # Take CS average first
      # we do this before we get the final TS average otherwise get jumps in
      # scalar
      if xcross.shape[1] == 1:
      x = xcross.abs().iloc[:, 0]
      x = xcross.ffill().abs().median(axis=1)

      # now the TS
      avg_abs_value = pd.rolling_mean(x, window=window, min_periods=min_periods)
      scaling_factor = target_abs_forecast / avg_abs_value

    2. Hi,

      I've already found the code on github but I must confess that reading other people's code in a language that I don't know is very difficult to me... but that's my problem. So everything I write my own is based on the articles you wrote.

      For so far I can read the code, it is the same as what I mean in the Excel-screenshot.

      Thanks, Kris

  11. Let's assume that you choose a forecast scalar such that the forecast is distributed as a normal distribution centred at 0 with expected absolute value 10. Since the expected absolute value of a Gaussian is equal to the stddev * sqrt(2/pi), this then seems to imply that the standard deviation of your forecast will be (10/sqrt(2/pi))=12.5.

    Looking at Chapter 10 of your book, when you calculate what size position to take based on your forecast, it appears that you choose a size assuming that the stddev of the forecast is 10. You use this assumption to ensure that the stddev of your daily USD P&L equals your daily risk target. But since the stddev is actually 12.5 rather than 10, doesn't this mean that you are taking on 25% too much risk?

    However, I think the effect is less than this in practice because we're truncating the ends of the forecast Gaussian (at +-20) which will tend to bring the realised stddev down again. Applying the formula from Wikipedia for the stddev of a truncated Gaussian with underlying stddev 12.5, mean 0, and limits +-20, I get that the stddev of the resulting distribution is 9.69, which is almost exactly 10 again.

    So, I think this explains why we end up targeting the right risk, despite fudging the difference between the expected absolute value and the stddev.

    This does imply that you shouldn't change the limits of the forecast -- increasing the limits from +-20 to +-30 will cause you to take 18% too much risk!

    Python code for calculation:

    1. Yes I used E(abs(forecast)) in the book as it's easier to explain than the expected standard deviation. The other difference, of course, is that if the mean isn't zero (which for slow trend following and carry is very likely) then the standard deviation will be biased downwards compared to abs() although that would be easy to fix by using MAD rather than STDEV.

      But yes you're right the effect of applying a cap is to bring things down to the right risk again. You can add this to the list of reasons why using a cap is a good thing.

      Incidentally the caps on FDM and IDM also tend to supress risk; and I've also got a risk management overlay in my own system which does a bit more of that and which I'll blog about in due course.

      Good spot, this is a very subtle effect! And of course in practice forecast distributions are rarely gaussian and risk isn't perfectly predictable so the whole thing is an approximation anyway and +/- 25% on realised risk is about as close as you can get.

    2. Thanks for the reply Robert - interesting points. I look forward to your post on risk management.

      For the record I realised that my reasoning above is slightly wrong because we don't discard forecasts outside +-20 but cap them, so we can't apply the formula for the truncated Gaussian, we need to use a slightly modified version (see g at With this fix, the stddev of the forecast turns out to be 11.3 i.e. it causes us to take on 13% too much risk.