A few days ago I was browsing on the elitetrader.com forum site when someone posted this:

I am interested to know if anyone change their SMA/EMA/WMA/KAMA/LRMA/etc. when volatility changes? Let say ATR is rising, would you increase/decrease the MA period to make it more/less sensitive? And the bigger question would be, is there a relationship between volatility and moving average?

Interesting, I thought, and I added it to my very long list of things to think about. (In fact I've researched something vaguely like this before, but I couldn't remember what the results were, and the research was done whilst at my former employers, which means it is currently behind a firewall and a 150 page non-disclosure agreement.)

Then a couple of days ago I ran a poll off the back of this post as to what my blogpost this month should be about (though mainly the post was an excuse to reminisce about the Fighting Fantasy series of books).

And lo and behold, this subject is what people wanted to know about. But even if you don't want to know about it, and were one of the 57% that voted for the other two options, this is still probably a good post to read. I'm going to be discussing principles and techniques that apply to any evaluation of this kind of system modification.

However, spoiler alert: this little piece of research took an unexpected turn. Read on to find out what happened...

### Why this is topical

This is particularly topical because during the market crisis that consumed much of 2020 it was faster moving averages that outperformed slower ones. Consider these plots, which show the average Sharpe Ratio for different kinds of trading rule averaged across instruments. The first plot covers all the history I have (back to the 1970s), the second is for the first half of 2020, and the third is for March 2020 alone:

The pattern is striking: going faster works much better than it did in the overall sample. What's more, it seems to be confined to the financial asset classes (FX, Rates and especially equities) where vol exploded the most:

Furthermore, we can see a similar effect in another notoriously turbulent year:

If we were sell side analysts that would be our nice little research paper finished, but of course we aren't... and a few anecdotes do not make up a serious piece of analysis.

### Formally specifying the problem

Rewriting the above in fancy-sounding language, and bearing in mind the context of my trading system, I can restate the question as:

**Are the optimal forecast weights across trading rules of different speeds different when conditioned on the current level of volatility?**

As I pointed out in my last post, this leaves a lot of questions unanswered. How should we define the current level of volatility? How do we define 'optimality'? How do we evaluate the performance of this change to our simple unconditional trading rules?

### Defining the current level of volatility

For this to be a useful thing to do, 'current' is going to have to be based on backward looking data only. It would have been very helpful to have known in early February last year (2020) that vol was about to rise sharply, and thus perhaps different forecast weights were required, but we didn't actually own the keys to a time machine so we couldn't have known with certainty what was about to happen (and if we had, then changing our forecast weights would not have been high up our to-do list!).

So we're going to be using some measure of historic volatility. The standard measure of vol I use in my trading system (exponentially weighted, equivalent to a lookback of around a month) is a good starting point which we know does a good job of predicting vol over the next 30 days or so (although it does suffer from biases, as I discuss here). Arguably a shorter measure of vol would be more responsive, whilst a longer measure of vol would mean that our forecast weights aren't changing as much thus reducing the costs.

Now how do we define the level of volatility? In that previous post I used current vol estimate / 10 year rolling average of the vol for the relevant instrument. That seems pretty reasonable.
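For anyone who wants to experiment without pysystemtrade, here is a minimal pandas sketch of the same two steps under stated assumptions: an exponentially weighted standard deviation of daily returns (the `span=35` is an assumed stand-in for "around a month", not necessarily the exact setting used in the system), divided by a rolling 2500 business day (roughly 10 year) average of that vol, shifted by a day so only past data is used. The function names and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def ewma_daily_vol(daily_returns: pd.Series, span: int = 35) -> pd.Series:
    # exponentially weighted std dev of daily returns;
    # span=35 is an assumed stand-in for 'around a month'
    return daily_returns.ewm(span=span, min_periods=10).std()


def normalise_vol(daily_vol: pd.Series, window: int = 2500) -> pd.Series:
    # ~10 years of business days; shift(1) keeps the average strictly backward looking
    ten_year_vol = daily_vol.rolling(window, min_periods=10).mean().shift(1)
    return daily_vol / ten_year_vol


# usage on synthetic data: a calm period followed by a vol spike
rng = np.random.default_rng(42)
dates = pd.bdate_range("2000-01-03", periods=3250)
returns = pd.Series(
    np.concatenate([rng.normal(0, 0.01, 3000), rng.normal(0, 0.03, 250)]),
    index=dates,
)
norm_vol = normalise_vol(ewma_daily_vol(returns))
print(norm_vol.iloc[-1])  # well above 1 after the spike
```

During the calm stretch the ratio hovers around 1; once vol triples it jumps well above 1, which is exactly the kind of reading the "high vol" regime below is meant to capture.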

Here for example is the rolling % vol for SP500:

```python
import pandas as pd

from systems.provided.futures_chapter15.basesystem import *

system = futures_system()
instrument_list = system.get_instrument_list()

# daily percentage vol for every instrument in the system
all_perc_vols = [system.rawdata.get_daily_percentage_volatility(code) for code in instrument_list]
```

And here's the same, after dividing by 10 year vol:

```python
import numpy as np
import matplotlib.pyplot

ten_year_averages = [vol.rolling(2500, min_periods=10).mean() for vol in all_perc_vols]
normalised_vol_level = [vol / ten_year_vol for vol, ten_year_vol in zip(all_perc_vols, ten_year_averages)]


def stack_list_of_pd_series(x):
    # concatenate the values of several pd.Series into one flat list
    stacked_list = []
    for element in x:
        stacked_list.extend(element.values)
    return stacked_list


stacked_vol_levels = stack_list_of_pd_series(normalised_vol_level)
stacked_vol_levels = [x for x in stacked_vol_levels if not np.isnan(x)]

matplotlib.pyplot.hist(stacked_vol_levels, bins=1000)
```

**Update: There was a small bug in my code that didn't affect the conclusions, but had a significant effect on the scale of the normalised vol. Now fixed. Thanks to Rafael L. for pointing this out.**

- Low: Normalised vol in the bottom 25% quantile [using the entire historical period so far to determine the quantile] (over the whole period, normalised vol between 0.16 and 0.7 times the ten year average)
- Medium: Between 25% and 75% (over the whole period, normalised vol 0.7 to 1.14 times the ten year average)
- High: Between 75% and 100% (over the whole period, normalised vol 1.14 to 6.6 times the ten year average)

```python
def historic_quantile_groups(system, instrument_code, quantiles=[.25, .5, .75]):
    daily_vol = system.rawdata.get_daily_percentage_volatility(instrument_code)
    # We shift by one day to avoid forward looking information
    ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean().shift(1)
    normalised_vol = daily_vol / ten_year_vol

    quantile_points = [get_historic_quantile_for_norm_vol(normalised_vol, quantile) for quantile in quantiles]
    stacked_quantiles_and_vol = pd.concat(quantile_points + [normalised_vol], axis=1)
    quantile_groups = stacked_quantiles_and_vol.apply(calculate_group_for_row, axis=1)

    return quantile_groups


def get_historic_quantile_for_norm_vol(normalised_vol, quantile_point):
    # a 99999 row window is effectively 'all history so far'
    return normalised_vol.rolling(99999, min_periods=4).quantile(quantile_point)


def calculate_group_for_row(row_data: pd.Series) -> int:
    values = list(row_data.values)
    if any(np.isnan(values)):
        return np.nan

    vol_point = values.pop(-1)  # last column is the current normalised vol
    group = 0  # lowest group
    # values[1:] skips the first quantile point, so with the default quantiles
    # the comparisons are against the 50% and 75% points
    for comparison in values[1:]:
        if vol_point <= comparison:
            return group
        group = group + 1

    # highest group will be len(quantiles)-1
    return group


quantile_groups = [historic_quantile_groups(system, code) for code in instrument_list]
stacked_quantiles = stack_list_of_pd_series(quantile_groups)
```

- Low vol: 53% of observations
- Medium vol: 22%
- High vol: 25%
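These shares can be computed from the flat list of group labels. A minimal sketch, assuming the input is something like `stacked_quantiles` from the code above (group labels 0, 1, 2 with NaN where there isn't yet enough history); the helper name `regime_proportions` is hypothetical:

```python
import numpy as np
import pandas as pd


def regime_proportions(stacked_groups) -> pd.Series:
    # drop rows without enough history, then express each group as a share of the total
    groups = pd.Series(stacked_groups, dtype=float).dropna()
    return groups.value_counts(normalize=True).sort_index()


# toy example: group 0 = low vol, 1 = medium, 2 = high
print(regime_proportions([0, 0, 1, 2, np.nan]))
```

On the real data the call would be `regime_proportions(stacked_quantiles)`.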

### Unconditional performance of momentum speeds

```python
rule_list = list(system.rules.trading_rules().keys())

perf_for_rule = {}
for rule in rule_list:
    perf_by_instrument = {}
    for code in instrument_list:
        # P&L for this rule's forecast on this instrument
        perf_for_instrument_and_rule = system.accounts.pandl_for_instrument_forecast(code, rule)
        perf_by_instrument[code] = perf_for_instrument_and_rule
    perf_for_rule[rule] = perf_by_instrument

# stack
stacked_perf_by_rule = {}
for rule in rule_list:
    acc_curves_this_rule = perf_for_rule[rule].values()
    stacked_perf_this_rule = stack_list_of_pd_series(acc_curves_this_rule)
    stacked_perf_by_rule[rule] = stacked_perf_this_rule


def sharpe(x):
    # assumes daily data
    return 16 * np.nanmean(x) / np.nanstd(x)


for rule in rule_list:
    print("%s:%.3f" % (rule, sharpe(stacked_perf_by_rule[rule])))
```
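The factor of 16 in `sharpe()` is the square root of 256, the assumed number of business days in a year: annualising multiplies the daily mean by 256 and the daily standard deviation by √256, so the ratio picks up a factor of 16. A small standalone check on a toy return stream (the function name is hypothetical):

```python
import numpy as np


def annualised_sharpe(daily_returns, days_per_year: int = 256) -> float:
    # (256 * mean) / (16 * std) = 16 * mean / std; NaNs are ignored
    x = np.asarray(daily_returns, dtype=float)
    return np.sqrt(days_per_year) * np.nanmean(x) / np.nanstd(x)


# alternating +1% and -0.5% days: mean 0.25%, std 0.75%, so SR = 16 * 0.25 / 0.75
toy_returns = [0.01, -0.005] * 500 + [np.nan]
print(annualised_sharpe(toy_returns))  # roughly 5.33
```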

```python
historic_quantiles = {}
for code in instrument_list:
    historic_quantiles[code] = historic_quantile_groups(system, code)

conditioned_perf_for_rule_by_state = []
for condition_state in [0, 1, 2]:
    conditioned_perf_for_rule = {}
    for rule in rule_list:
        conditioned_perf_by_instrument = {}
        for code in instrument_list:
            perf_for_instrument_and_rule = perf_for_rule[rule][code]
            condition_vector = historic_quantiles[code] == condition_state
            # align to the P&L dates; dates before the first vol reading count as 'not in this state'
            condition_vector = (
                condition_vector.reindex(perf_for_instrument_and_rule.index)
                .ffill()
                .fillna(False)
                .astype(bool)
            )
            conditioned_perf = perf_for_instrument_and_rule[condition_vector]
            conditioned_perf_by_instrument[code] = conditioned_perf
        conditioned_perf_for_rule[rule] = conditioned_perf_by_instrument

    conditioned_perf_for_rule_by_state.append(conditioned_perf_for_rule)

    stacked_conditioned_perf_by_rule = {}
    for rule in rule_list:
        acc_curves_this_rule = conditioned_perf_for_rule[rule].values()
        stacked_perf_this_rule = stack_list_of_pd_series(acc_curves_this_rule)
        stacked_conditioned_perf_by_rule[rule] = stacked_perf_this_rule

    print("State:%d \n\n\n" % condition_state)
    for rule in rule_list:
        print("%s:%.3f" % (rule, sharpe(stacked_conditioned_perf_by_rule[rule])))
```

*relative* performance, which is what this post is supposed to be about. Relatively speaking fast is still worse than slow, and it's now much worse.

*everything* is reducing as we get to a higher volatility environment.

**Update:**

### Testing the significance of overall performance in different vol environments

**you shouldn't change your speed as vol increases.**

```python
from scipy import stats

for rule in rule_list:
    perf_group_0 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[0][rule].values())
    perf_group_1 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[1][rule].values())
    perf_group_2 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[2][rule].values())

    # two-sample t-tests of mean daily return between pairs of vol regimes
    t_stat_0_1 = stats.ttest_ind(perf_group_0, perf_group_1)
    t_stat_1_2 = stats.ttest_ind(perf_group_1, perf_group_2)
    t_stat_0_2 = stats.ttest_ind(perf_group_0, perf_group_2)

    print("Rule: %s , low vs medium %.2f medium vs high %.2f low vs high %.2f" % (
        rule,
        t_stat_0_1.pvalue,
        t_stat_1_2.pvalue,
        t_stat_0_2.pvalue))
```

```
Rule: ewmac2_8 , low vs medium 0.37 medium vs high 0.00 low vs high 0.00
Rule: ewmac4_16 , low vs medium 0.25 medium vs high 0.00 low vs high 0.00
Rule: ewmac8_32 , low vs medium 0.12 medium vs high 0.00 low vs high 0.00
Rule: ewmac16_64 , low vs medium 0.08 medium vs high 0.00 low vs high 0.00
Rule: ewmac32_128 , low vs medium 0.07 medium vs high 0.00 low vs high 0.00
Rule: ewmac64_256 , low vs medium 0.03 medium vs high 0.00 low vs high 0.00
Rule: carry , low vs medium 0.00 medium vs high 0.32 low vs high 0.00
```
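Each p-value above comes from a two-sample t-test on pooled daily returns from two vol regimes. A standalone sketch of the same comparison on synthetic data follows; the dictionary layout and helper name are hypothetical, and NaNs are dropped first because `stats.ttest_ind` otherwise propagates them. With large pooled samples like these, even fairly small differences in mean daily return produce p-values near zero.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def pairwise_regime_pvalues(returns_by_regime: dict) -> dict:
    # independent two-sample t-test of mean daily return for every pair of regimes
    pvalues = {}
    for (name_a, a), (name_b, b) in combinations(returns_by_regime.items(), 2):
        a = np.asarray(a, dtype=float)
        b = np.asarray(b, dtype=float)
        result = stats.ttest_ind(a[~np.isnan(a)], b[~np.isnan(b)])
        pvalues[(name_a, name_b)] = result.pvalue
    return pvalues


rng = np.random.default_rng(1)
synthetic = {
    "low": rng.normal(0.001, 0.01, 5000),
    "medium": rng.normal(0.001, 0.01, 5000),
    "high": rng.normal(-0.001, 0.01, 5000),
}
for pair, p in pairwise_regime_pvalues(synthetic).items():
    print(pair, "%.2f" % p)
```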

### Is this an effect we can actually capture?

### A more graduated system

- Low: Normalised vol in the bottom 25% quantile
- Medium: Between 25% and 75%
- High: Between 75% and 100%

**Multiply raw forecasts by L where (if Q is the percentile expressed as a decimal, eg 1 = 100%):**

**L = 2 - 1.5Q**
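To make the formula concrete: with Q = 0 (vol at its historic low) forecasts are doubled, at the median (Q = 0.5) they are multiplied by 1.25, and at the historic high (Q = 1) they are halved. If Q were spread roughly evenly between 0 and 1, the average multiplier would be 1.25 rather than 1, which is one reason the forecast scaling can't be taken for granted after this transformation. A quick check (the function name is hypothetical):

```python
import numpy as np


def vol_attenuation_multiplier(q):
    # L = 2 - 1.5Q, with Q the historic quantile of normalised vol (0 to 1);
    # works on floats or numpy arrays
    return 2.0 - 1.5 * q


for q in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(q, vol_attenuation_multiplier(q))

# average multiplier if Q were spread evenly over [0, 1]
grid = np.linspace(0.0, 1.0, 1001)
print(np.mean(vol_attenuation_multiplier(grid)))  # approximately 1.25
```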

*raw* forecast here. I do this because there is no guarantee that the above will result in the forecast retaining the correct scaling. So if I then estimate forecast scalars using these transformed forecasts, I will end up with something that has the right scaling.
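Re-estimating the scalar doesn't undo the attenuation: the scalar comes from the average absolute forecast over a long window, so it is close to constant and only restores the *average* forecast size, while the relative shrinking of forecasts in high vol periods survives. A rough sketch with synthetic forecasts, assuming a target average absolute forecast of 10 and a simple expanding-window estimate (pysystemtrade's own estimator will differ in its details); all names here are hypothetical:

```python
import numpy as np
import pandas as pd

TARGET_ABS_FORECAST = 10.0  # assumed target for the average absolute forecast


def expanding_forecast_scalar(raw_forecast: pd.Series, min_periods: int = 250) -> pd.Series:
    # the scalar at each date uses only forecasts up to that date
    avg_abs = raw_forecast.abs().expanding(min_periods=min_periods).mean()
    return TARGET_ABS_FORECAST / avg_abs


rng = np.random.default_rng(7)
n = 10_000
raw = pd.Series(rng.normal(0, 3.0, n))   # unscaled forecast
q = pd.Series(rng.uniform(0, 1, n))      # stand-in for the vol quantile Q
attenuated = raw * (2.0 - 1.5 * q)       # L = 2 - 1.5Q applied to the raw forecast
scaled = attenuated * expanding_forecast_scalar(attenuated)

late = scaled.iloc[n // 2:]
late_q = q.iloc[n // 2:]
print(late.abs().mean())  # close to the target of 10
# forecasts on low vol days are still much larger than on high vol days
print(late[late_q < 0.2].abs().mean(), late[late_q > 0.8].abs().mean())
```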

### Smoothing vol forecast attenuation

### Testing the attenuation, rule by rule

```python
import numpy as np
import pandas as pd
from statsmodels.distributions.empirical_distribution import ECDF

from systems.forecast_scale_cap import *


class volAttenForecastScaleCap(ForecastScaleCap):
    @diagnostic()
    def get_vol_quantile_points(self, instrument_code):
        ## More properly this would go in raw data perhaps
        self.log.msg("Calculating vol quantile for %s" % instrument_code)
        daily_vol = self.parent.rawdata.get_daily_percentage_volatility(instrument_code)
        ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean()
        normalised_vol = daily_vol / ten_year_vol
        normalised_vol_q = quantile_of_points_in_data_series(normalised_vol)

        return normalised_vol_q

    @diagnostic()
    def get_vol_attenuation(self, instrument_code):
        normalised_vol_q = self.get_vol_quantile_points(instrument_code)
        vol_attenuation = normalised_vol_q.apply(multiplier_function)
        smoothed_vol_attenuation = vol_attenuation.ewm(span=10).mean()

        return smoothed_vol_attenuation

    @input
    def get_raw_forecast_before_attenuation(self, instrument_code, rule_variation_name):
        ## original code for get_raw_forecast
        raw_forecast = self.parent.rules.get_raw_forecast(
            instrument_code, rule_variation_name
        )

        return raw_forecast

    @diagnostic()
    def get_raw_forecast(self, instrument_code, rule_variation_name):
        ## overridden method: this will be called downstream so don't change the name
        raw_forecast_before_atten = self.get_raw_forecast_before_attenuation(instrument_code, rule_variation_name)
        vol_attenuation = self.get_vol_attenuation(instrument_code)
        attenuated_forecast = raw_forecast_before_atten * vol_attenuation

        return attenuated_forecast


def quantile_of_points_in_data_series(data_series):
    results = [quantile_of_points_in_data_series_row(data_series, irow) for irow in range(len(data_series))]
    results_series = pd.Series(results, index=data_series.index)

    return results_series


# this is a little slow so suggestions for speeding up are welcome
def quantile_of_points_in_data_series_row(data_series, irow):
    if irow < 2:
        return np.nan

    # empirical CDF of everything before this point, evaluated at the current value
    historical_data = list(data_series.iloc[:irow].values)
    current_value = data_series.iloc[irow]
    ecdf_s = ECDF(historical_data)

    return ecdf_s(current_value)


def multiplier_function(vol_quantile):
    if np.isnan(vol_quantile):
        return 1.0

    return 2 - 1.5 * vol_quantile
```
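On the speed question in the code comment: the row-by-row version builds a fresh ECDF from the whole history at every step, re-sorting it each time. One possible speed-up, offered as a sketch rather than a drop-in replacement, is to keep a sorted list of past values and use binary search from Python's `bisect` module. The function name is hypothetical, and unlike the original it skips NaNs in the history and returns NaN for a NaN current value, so early values can differ slightly; on NaN-free data it gives the same "fraction of earlier values at or below today's" as the ECDF version.

```python
import bisect

import numpy as np
import pandas as pd


def expanding_ecdf_quantile(data_series: pd.Series, min_history: int = 2) -> pd.Series:
    # For each point: share of earlier (non-NaN) values that are <= the current value,
    # i.e. the ECDF of the history evaluated at today's value.
    sorted_history = []
    results = []
    for value in data_series.to_numpy(dtype=float):
        if np.isnan(value) or len(sorted_history) < min_history:
            results.append(np.nan)
        else:
            # bisect_right counts how many past values are <= value
            results.append(bisect.bisect_right(sorted_history, value) / len(sorted_history))
        if not np.isnan(value):
            bisect.insort(sorted_history, value)  # keep the history sorted
    return pd.Series(results, index=data_series.index)


print(expanding_ecdf_quantile(pd.Series([1.0, 2.0, 3.0, 0.5, 2.5])).tolist())
# [nan, nan, 1.0, 0.0, 0.75]
```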

And here's how to implement it in a new futures system (we just copy and paste the `futures_system` code and change the object passed for the forecast scaling/capping stage):

```python
from systems.provided.futures_chapter15.basesystem import *


def futures_system_with_vol_attenuation(data=None, config=None, trading_rules=None, log_level="on"):
    if data is None:
        data = csvFuturesSimData()

    if config is None:
        config = Config(
            "systems.provided.futures_chapter15.futuresconfig.yaml")

    rules = Rules(trading_rules)

    system = System(
        [
            Account(),
            Portfolios(),
            PositionSizing(),
            FuturesRawData(),
            ForecastCombine(),
            volAttenForecastScaleCap(),  # replaces the standard ForecastScaleCap()
            rules,
        ],
        data,
        config,
    )

    system.set_logging_level(log_level)

    return system
```

And now I can set up two systems, one without attenuation and one with:

```python
system = futures_system()
# will equally weight instruments
del(system.config.instrument_weights)

# need to do this to deal fairly with attenuation
# do it here for consistency
system.config.use_forecast_scale_estimates = True
system.config.use_forecast_div_mult_estimates = True

# will equally weight forecasts
del(system.config.forecast_weights)

# standard stuff to account for instruments coming into the sample
system.config.use_instrument_div_mult_estimates = True

system_vol_atten = futures_system_with_vol_attenuation()
del(system_vol_atten.config.forecast_weights)
del(system_vol_atten.config.instrument_weights)
system_vol_atten.config.use_forecast_scale_estimates = True
system_vol_atten.config.use_forecast_div_mult_estimates = True
system_vol_atten.config.use_instrument_div_mult_estimates = True

rule_list = list(system.rules.trading_rules().keys())

for rule in rule_list:
    sr1 = system.accounts.pandl_for_trading_rule(rule).sharpe()
    sr2 = system_vol_atten.accounts.pandl_for_trading_rule(rule).sharpe()
    print("%s before %.2f and after %.2f" % (rule, sr1, sr2))
```

Let's check out the results:

```
ewmac2_8 before 0.43 and after 0.52
ewmac4_16 before 0.78 and after 0.83
ewmac8_32 before 0.96 and after 1.00
ewmac16_64 before 1.01 and after 1.07
ewmac32_128 before 1.02 and after 1.07
ewmac64_256 before 0.96 and after 1.00
carry before 1.07 and after 1.11
```

Now these aren't huge improvements, but they are very consistent across every single trading rule. But are they statistically significant?

```python
from syscore.accounting import account_test

for rule in rule_list:
    acc1 = system.accounts.pandl_for_trading_rule(rule)
    acc2 = system_vol_atten.accounts.pandl_for_trading_rule(rule)
    print("%s T-test %s" % (rule, str(account_test(acc2, acc1))))
```

```
ewmac2_8 T-test (0.005754898313025798, Ttest_relResult(statistic=4.23535684665446, pvalue=2.2974165336647636e-05))
ewmac4_16 T-test (0.0034239182014355815, Ttest_relResult(statistic=2.46790714210943, pvalue=0.013603190422737766))
ewmac8_32 T-test (0.0026717541872894254, Ttest_relResult(statistic=1.8887927423648214, pvalue=0.058941593401076096))
ewmac16_64 T-test (0.0034357601899108192, Ttest_relResult(statistic=2.3628815728522112, pvalue=0.018147935814311716))
ewmac32_128 T-test (0.003079560056791747, Ttest_relResult(statistic=2.0584403445859034, pvalue=0.03956754085349411))
ewmac64_256 T-test (0.002499427499123595, Ttest_relResult(statistic=1.7160401190191614, pvalue=0.08617825487582882))
carry T-test (0.0022278238232666947, Ttest_relResult(statistic=1.3534155676590192, pvalue=0.17594617201514515))
```

A mixed bag there, but with the exception of carry there does seem to be a reasonable amount of improvement; most markedly with the very fastest rules.

Again, I could do some implicit fitting here to only use the attenuation on momentum, or use less of it on slower momentum. But I'm not going to do that.
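The `Ttest_relResult` in the output above suggests that `account_test` runs a paired t-test (scipy's `ttest_rel`) on the two daily return streams, which suits this comparison because both systems trade the same markets on the same days. The internals of `account_test` aren't shown here, so treat this standalone sketch as an approximation: the helper name is hypothetical, and returning the mean daily difference alongside the test result is an assumption, not a description of what the first number in the output means.

```python
import numpy as np
import pandas as pd
from scipy import stats


def paired_return_test(returns_new: pd.Series, returns_old: pd.Series):
    # keep only days where both return streams exist, so each pair is the same day
    aligned = pd.concat([returns_new, returns_old], axis=1).dropna()
    new, old = aligned.iloc[:, 0], aligned.iloc[:, 1]
    mean_daily_difference = (new - old).mean()
    return mean_daily_difference, stats.ttest_rel(new, old)


rng = np.random.default_rng(11)
dates = pd.bdate_range("2010-01-01", periods=5000)
old_returns = pd.Series(rng.normal(0.0, 0.01, 5000), index=dates)
# the 'new' system is the old one plus a small, noisy improvement
new_returns = old_returns + 0.001 + pd.Series(rng.normal(0.0, 0.001, 5000), index=dates)

diff, result = paired_return_test(new_returns, old_returns)
print(diff, result.statistic, result.pvalue)
```

Pairing matters: the two systems' returns are highly correlated day to day, so testing the daily differences is far more powerful than treating the two streams as independent samples.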

## Summary

To return to the original question: yes, we should change our trading behaviour as vol changes. But not in the way you might think, especially if you had extrapolated the performance from March 2020. As vol gets higher, faster trading rules do relatively badly, but actually the bigger story is that all momentum rules suffer (as does carry, a bit). Not what I had expected to find, but *very* interesting. So a big thanks to the internet's hive mind for voting for this option.

As volatility rises, doesn't there tend to be more swings in both directions and therefore more trading? (This could be because our vol adjustments lag the actual jump in vol and systems can't reduce their trade size fast enough.) If so, faster systems (shorter lookbacks) would indeed perform worse since they have higher trading costs relative to Sharpe.

In other words, when you say, "As vol gets higher faster trading rules do relatively badly, but actually the bigger story is that all momentum rules suffer," how much of it is a trading cost story? Is there a way to determine this, like rerunning it excluding trading costs?

In theory yes, but I've just totally borked my python in an attempt to upgrade everything... let me get back to you

I just think that vol tends to mean-revert, so over-reacting (faster trading) makes the situation worse as the system gets whipsawed.

I guess the question is, when you get around to it, is what percentage of the decline in results is from costs and what percent from other things, e.g., bad signals and whipsaws.

And more importantly, what difference, if any, does this make?

Thanks for the great post and follow up!

Updated the post to reflect this.

Wow. I expected trading costs to be a more significant factor than they are. Not necessarily in the general decline in performance but rather in the relative poor performance of faster trading rules. But costs don't seem to be significant here either (if my math is right).

Either way, it seems like the best thing to do is to scale back your positions during periods of high vol. Of course, those who dynamically vol target/scale are already doing this.

Thanks Rob for doing the research, and the extra research too.

Hi Rob, thanks for a great analysis (as usual)! Doesn’t this prove the old (anecdotal) saying that markets take the stairs up (low(er) vol, steady momentum) and the elevator down (high vol, momentum crash)?

I'm not sure it proves it, but it is certainly consistent with it.

DeleteHi Rob, thanks for a great analysis (as usual)! Doesn’t this prove the old (anecdotal) saying that markets take the stairs up (low(er) vol, steady momentum) and the elevator down (high vol, momentum crash)?

So as vol gets higher, mean reversion, not trend following should do better.

I didn't actually test this... but maybe.

Great findings Rob. You binned the data into 3 regimes: what if volatility is fluctuating between low and medium volatility? Doesn't that fragment the time series data? Suppose the 16 EMA crosses the 64 EMA to generate a buy signal in the low vol regime, then the market enters the medium vol regime and now the 16 EMA is below the 64; what trade gets registered?

The trade you have on wouldn't be affected, but the size of the trade would change. Yes, you'd get that kind of behaviour, which is why I decided not to implement that but instead used a measure of vol regime that was continuous.

Great post, Rob!

I didn't quite get the reasoning behind an implementation detail. Maybe you could clarify this.

The estimate for the 10 year vol that you are using in this post applies a rolling standard deviation over an EWMA short term vol. Wouldn't it make more sense to apply a simple rolling average over this short term vol in order to obtain a more reasonable estimate for the 10 year vol?

"ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean().shift(1)"

instead of

"ten_year_vol = daily_vol.rolling(2500, min_periods=10).std().shift(1)"

In the previous post, where you investigated slower vol terms in a simple linear weighted vol model (with short and long term components), you applied a simple moving average as a proxy for the long term vol. So why are you using standard deviation instead of mean this time?

Copying and pasting error. It should indeed be ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean().shift(1)

Will now fix, thanks.

Thanks for clarifying, Rob. I was thinking whether it could be some sort of weird Z-score-like measure. Fortunately, it is just a pasting error.

Thank you for the great post.

Hi Rob - I would like to ask you a clarification on the following statement:” I use the raw forecast here. I do this because there is no guarantee that the above will result in the forecast retaining the correct scaling. So if I then estimate forecast scalars using these transformed forecasts, I will end up with something that has the right scaling”.

If we apply “L” to the “raw forecast” and then we apply the “forecast scalars” to the raw forecast adjusted by L, then are we not diluting / losing the overall signal coming from L? I mean, the overall signal might well say: hey, it’s a high vol environment, let’s rein in risk; but then our forecast scalars will say, no, we need to adjust the position upward to make it more like an average bet, whilst we don’t want to take an average bet if the forecast adjusted for L is low.

I hope I have been clear. If not let me know.

Thanks,

Ric

You've been clear, but you are missing the fact that the forecast scalars are effectively fixed.

That means they won't 'rein in' risk as you describe, since we're going to multiply the modified forecast by the same scalar regardless of environment.

Now for the disclaimer: they're not exactly fixed, since they are estimated on a rolling out of sample basis. That means if we start in say an (ex-post) low vol environment then the forecast scalars will be too low (since the raw forecast is higher, and the scalar doesn't need to be as large); but after a few years it should be sorted out.

Thanks for your reply, Robert.

Follow-up question: what if we applied “L” to the “adjusted forecast” instead? Would that be acceptable?

Thanks, Ric

What's the 'adjusted' forecast??

Apologies for not being clear. For “adjusted forecast”, I mean the “raw forecast” adjusted by “forecast scalars”.

So, what if we applied “L” to this “adjusted forecast”? Would that be acceptable?

Thanks!

Yes on reflection, this would probably make more sense than the version I described.
