Thursday, 4 March 2021

Does it make sense to change your trading behaviour in different periods of volatility?

A few days ago I was browsing the elitetrader.com forum when someone posted this:

I am interested to know if anyone change their SMA/EMA/WMA/KAMA/LRMA/etc. when volatility changes? Let say ATR is rising, would you increase/decrease the MA period to make it more/less sensitive? And the bigger question would be, is there a relationship between volatility and moving average?

Interesting, I thought, and I added it to my very long list of things to think about. (In fact I've researched something vaguely like this before, but I couldn't remember what the results were, and the research was done whilst at my former employer, which means it's currently behind a firewall and a 150 page non-disclosure agreement.)

Then a couple of days ago I ran a poll off the back of this post as to what my blogpost this month should be about (though mainly the post was an excuse to reminisce about the Fighting Fantasy series of books).

And lo and behold, this subject is what people wanted to know about. But even if you don't want to know about it, and were one of the 57% that voted for the other two options, this is still probably a good post to read. I'm going to be discussing principles and techniques that apply to any evaluation of this kind of system modification.

However, spoiler alert: this little piece of research took an unexpected turn. Read on to find out what happened...



Why this is topical


This is particularly topical because during the market crisis that consumed much of 2020 it was faster moving averages that outperformed slower ones. Consider these plots, which show the average Sharpe Ratio for different kinds of trading rule, averaged across instruments. The first plot is for all the history I have (back to the 1970s), the second is for the first half of 2020, and the third for March 2020 alone:
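The plots come out of pysystemtrade; here's a minimal sketch of how the underlying numbers can be computed (the date range is just an example, and the 16 × mean over std annualisation is the same quick approximation I use later in the post):

import numpy as np
from systems.provided.futures_chapter15.basesystem import futures_system

system = futures_system()
instrument_list = system.get_instrument_list()
rule_list = list(system.rules.trading_rules().keys())

def average_sharpe_for_rule(rule, start="2020-01-01", end="2020-06-30"):
    # p&l for each instrument traded with just this rule, sliced to the subperiod
    curves = [system.accounts.pandl_for_instrument_forecast(code, rule)[start:end]
              for code in instrument_list]
    # crude annualised Sharpe per instrument, then averaged across instruments
    sharpes = [16 * np.nanmean(curve.values) / np.nanstd(curve.values) for curve in curves]
    return np.nanmean(sharpes)

for rule in rule_list:
    print("%s: %.3f" % (rule, average_sharpe_for_rule(rule)))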



The pattern is striking: going faster works much better than it did in the overall sample. What's more, it seems to be confined to the financial asset classes (FX, Rates and especially equities) where vol exploded the most:



Furthermore, we can see a similar effect in another notoriously turbulent year:

If we were sell-side analysts that would be our nice little research paper finished; but of course we aren't... a few anecdotes do not make up a serious piece of analysis.


Formally specifying the problem

Rewriting the above in fancy-sounding language, and bearing in mind the context of my trading system, I can frame the question as:

Are the optimal forecast weights across trading rules of different speeds different when conditioned on the current level of volatility?

As I pointed out in my last post this leaves a lot of questions unanswered. How should we define the current level of volatility? How do we define 'optimality'? How do we evaluate the performance of this change to our simple unconditional trading rules?



Defining the current level of volatility


For this to be a useful thing to do, 'current' is going to have to be based on backward looking data only. It would have been very helpful to have known in early February last year (2020) that vol was about to rise sharply, and thus perhaps different forecast weights were required, but we didn't actually own the keys to a time machine so we couldn't have known with certainty what was about to happen (and if we had, then changing our forecast weights would not have been high up our to-do list!).

So we're going to be using some measure of historic volatility. The standard measure of vol I use in my trading system (exponentially weighted, equivalent to a lookback of around a month) is a good starting point, which we know does a good job of predicting vol over the next 30 days or so (although it does suffer from biases, as I discuss here). Arguably a shorter measure of vol would be more responsive, whilst a longer measure would mean that our forecast weights don't change as much, thus reducing costs.
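To be concrete, here is a minimal sketch of the kind of estimator I mean (the production version in pysystemtrade adds a vol floor and a few other robustness tweaks; the 35 day span is my shorthand for 'about a month'):

import pandas as pd

def simple_ew_perc_vol(daily_price: pd.Series, span: int = 35) -> pd.Series:
    # daily percentage returns, then an exponentially weighted standard deviation
    perc_returns = daily_price.pct_change()
    return 100.0 * perc_returns.ewm(span=span).std()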

Now how do we define the level of volatility? In that previous post I used the current vol estimate divided by a 10 year rolling average of the vol for the relevant instrument. That seems pretty reasonable.

Here for example is the rolling % vol for SP500:

import pandas as pd
from systems.provided.futures_chapter15.basesystem import *

system = futures_system()

instrument_list = system.get_instrument_list()

all_perc_vols = [system.rawdata.get_daily_percentage_volatility(code) for code in instrument_list]



 And here's the same, after dividing by 10 year vol:

ten_year_averages = [vol.rolling(2500, min_periods=10).mean() for vol in all_perc_vols]
normalised_vol_level = [vol / ten_year_vol for vol, ten_year_vol in zip(all_perc_vols, ten_year_averages)]




The picture is very similar, but importantly we can now compare and pool results across instruments.

import numpy as np
import matplotlib.pyplot as plt

def stack_list_of_pd_series(x):
    stacked_list = []
    for element in x:
        stacked_list = stacked_list + list(element.values)

    return stacked_list

stacked_vol_levels = stack_list_of_pd_series(normalised_vol_level)

stacked_vol_levels = [x for x in stacked_vol_levels if not np.isnan(x)]
plt.hist(stacked_vol_levels, bins=1000)

What's immediately obvious, when we stack up all the normalised vols across markets and plot the distribution, is that it is very skewed:

Update: There was a small bug in my code that didn't affect the conclusions, but had a significant effect on the scale of the normalised vol. Now fixed. Thanks to Rafael L. for pointing this out.






The mean is 0.98 - as you'd expect - but look at that right tail! About 1% of the observations are over 2.5, and the maximum value is nearly 6.7. You might think this is due to some particularly horrible markets (VIX?), but nearly all the instruments have normalised vol that is distributed like this.
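Those tail numbers come straight off the stacked series, with something like:

stacked = np.array(stacked_vol_levels)
print("Mean %.2f" % np.mean(stacked))
print("Proportion above 2.5: %.3f" % np.mean(stacked > 2.5))
print("Max %.2f" % np.max(stacked))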

At this point we need to think about how many vol regimes we're going to have, and how they should be selected. More regimes will mean we can more closely fit our speed to what is going on, but we'd end up with fewer data points in each (I'm reminded of this post where someone had inferred behaviour from just 18 days when the VIX was especially low). Fewer data points will mean our forecast weights will either revert to an average, or worse, take extreme values if we're not fitting robustly.

I decided to use three regimes:
  • Low: Normalised vol in the bottom 25% quantile [using the entire historical period so far to determine the quantile] (over the whole period, normalised vol between 0.16 and 0.7 times the ten year average)
  • Medium: Between 25% and 75% (over the whole period, normalised vol 0.7 to 1.14 times the ten year average)
  • High: Between 75% and 100% (over the whole period, normalised vol 1.14 to 6.6 times the ten year average)
There could be a case for making these regimes equal in size, but I think there is something unique about relatively high vol, so I made that regime smaller (with low vol the same size for symmetry). Equally, there is a case for making them more extreme. There certainly isn't a case for jumping ahead and seeing which range of regimes performs the best - that would be implicit fitting!

import numpy as np
import pandas as pd

def historic_quantile_groups(system, instrument_code, quantiles=[.25, .5, .75]):
    daily_vol = system.rawdata.get_daily_percentage_volatility(instrument_code)
    # We shift by one day to avoid forward looking information
    ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean().shift(1)
    normalised_vol = daily_vol / ten_year_vol

    quantile_points = [get_historic_quantile_for_norm_vol(normalised_vol, quantile) for quantile in quantiles]
    stacked_quantiles_and_vol = pd.concat(quantile_points + [normalised_vol], axis=1)
    quantile_groups = stacked_quantiles_and_vol.apply(calculate_group_for_row, axis=1)

    return quantile_groups

def get_historic_quantile_for_norm_vol(normalised_vol, quantile_point):
    # expanding window (all history to date) quantile of the normalised vol
    return normalised_vol.rolling(99999, min_periods=4).quantile(quantile_point)

def calculate_group_for_row(row_data: pd.Series) -> int:
    values = list(row_data.values)
    if any(np.isnan(values)):
        return np.nan
    vol_point = values.pop(-1)
    group = 0  # lowest group
    for comparison in values[1:]:
        if vol_point <= comparison:
            return group
        group = group + 1

    # highest group will be len(quantiles)-1
    return group

Over all instruments pooled together...
quantile_groups = [historic_quantile_groups(system, code) for code in instrument_list]
stacked_quantiles = stack_list_of_pd_series(quantile_groups)
.... the size of each group comes out at:
  • Low vol: 53% of observations
  • Medium vol: 22% 
  • High vol: 25%
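(Checking those proportions is just a value_counts on the stacked series, something like:)

group_sizes = pd.Series(stacked_quantiles).value_counts(normalize=True).sort_index()
print(group_sizes)   # index 0, 1, 2 = low, medium, high vol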
That's different from the 25,50,25 you'd expect. That's because vol isn't stable over this period, and we're using backward looking quantiles rather than doing a forward looking cheat where we use the entire period to determine our quantiles (which would give us exactly 25,50,25).

Still, we've got a quarter of the observations in our high vol group, which is what we were aiming for. And I feel it would be some kind of cheating to go back and change the quantile cutoffs having seen these numbers.


Unconditional performance of momentum speeds


Let's get the unconditional returns for the rules in our trading system: momentum using exponentially weighted moving average crossovers, from 2_8 (a 2 day lookback crossed with an 8 day) up to 64_256, plus the carry rule (not strictly speaking part of the problem we're looking at today, but what the hell: we can use it as a proxy for whether 'divergent'/momentum or 'convergent' systems do worse or better when vol is high or low). These are average returns across instruments, which won't be as good as the portfolio level returns for each rule (we'll look at those later).

rule_list = list(system.rules.trading_rules().keys())
perf_for_rule = {}
for rule in rule_list:
    perf_by_instrument = {}
    for code in instrument_list:
        perf_for_instrument_and_rule = system.accounts.pandl_for_instrument_forecast(code, rule)
        perf_by_instrument[code] = perf_for_instrument_and_rule

    perf_for_rule[rule] = perf_by_instrument

# stack
stacked_perf_by_rule = {}
for rule in rule_list:
    acc_curves_this_rule = perf_for_rule[rule].values()
    stacked_perf_this_rule = stack_list_of_pd_series(acc_curves_this_rule)
    stacked_perf_by_rule[rule] = stacked_perf_this_rule

def sharpe(x):
    # assumes daily data
    return 16 * np.nanmean(x) / np.nanstd(x)

for rule in rule_list:
    print("%s:%.3f" % (rule, sharpe(stacked_perf_by_rule[rule])))

ewmac2_8:0.064
ewmac4_16:0.202
ewmac8_32:0.303
ewmac16_64:0.345
ewmac32_128:0.351
ewmac64_256:0.339
carry:0.318

Similar to the plot we saw earlier: unconditionally, medium and slow momentum (and carry) tend to outperform fast momentum.

Now what if we condition on the current state of vol?
historic_quantiles = {}
for code in instrument_list:
    historic_quantiles[code] = historic_quantile_groups(system, code)

conditioned_perf_for_rule_by_state = []

for condition_state in [0, 1, 2]:
    conditioned_perf_for_rule = {}
    for rule in rule_list:
        conditioned_perf_by_instrument = {}
        for code in instrument_list:
            perf_for_instrument_and_rule = perf_for_rule[rule][code]
            # boolean vector: is this instrument in the relevant vol regime on each day?
            condition_vector = historic_quantiles[code] == condition_state
            condition_vector = condition_vector.reindex(perf_for_instrument_and_rule.index).ffill()
            conditioned_perf = perf_for_instrument_and_rule[condition_vector]

            conditioned_perf_by_instrument[code] = conditioned_perf

        conditioned_perf_for_rule[rule] = conditioned_perf_by_instrument

    conditioned_perf_for_rule_by_state.append(conditioned_perf_for_rule)

    stacked_conditioned_perf_by_rule = {}
    for rule in rule_list:
        acc_curves_this_rule = conditioned_perf_for_rule[rule].values()
        stacked_perf_this_rule = stack_list_of_pd_series(acc_curves_this_rule)
        stacked_conditioned_perf_by_rule[rule] = stacked_perf_this_rule

    print("State:%d \n\n\n" % condition_state)
    for rule in rule_list:
        print("%s:%.3f" % (rule, sharpe(stacked_conditioned_perf_by_rule[rule])))

State:0  (Low vol)
ewmac2_8:0.207
ewmac4_16:0.334
ewmac8_32:0.432
ewmac16_64:0.481
ewmac32_128:0.492
ewmac64_256:0.462
carry:0.442

Interesting! These numbers are better than the unconditional figures we saw above, but fast momentum still looks poor relatively speaking (these numbers, like all those in this post, are after costs). But overall the pattern isn't that different from the unconditional performance; nowhere near enough to justify changing forecast weights very much.

State:1 (Medium vol)
ewmac2_8:0.139
ewmac4_16:0.255
ewmac8_32:0.335
ewmac16_64:0.380
ewmac32_128:0.397
ewmac64_256:0.340
carry:0.195

The 'medium' level of vol is more similar to the unconditional figures. Again this is nothing to write home about in terms of differences in relative performance, although relatively speaking fast is looking a little worse.


State:2 (High vol)
ewmac2_8:-0.299
ewmac4_16:-0.106
ewmac8_32:0.027
ewmac16_64:0.043
ewmac32_128:0.003
ewmac64_256:0.002
carry:0.103


Now you've probably noticed a pattern here, and I know everyone is completely distracted by it, but just for a moment let's focus on relative performance, which is what this post is supposed to be about. Relatively speaking fast is still worse than slow, and it's now much worse.

Carry has markedly improved, but.... oh what the hell I can't contain myself anymore. There is nothing that interesting or useful in the relative performance, but what is clear is that the absolute performance of everything is reducing as we get to a higher volatility environment.

Update: 
A regular reader (Mike N) asked me how much the above figures were affected by costs. So I re-ran the above, but excluding costs.
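The only change needed is to swap in the gross (pre-cost) account curve for each instrument and rule; a minimal sketch, assuming the .gross attribute on the account curve object:

for rule in rule_list:
    perf_by_instrument = {}
    for code in instrument_list:
        # .gross gives the pre-cost version of the account curve
        gross_perf = system.accounts.pandl_for_instrument_forecast(code, rule).gross
        perf_by_instrument[code] = gross_perf

    perf_for_rule[rule] = perf_by_instrument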

Unconditional figures:
ewmac2_8:0.135
ewmac4_16:0.245
ewmac8_32:0.334
ewmac16_64:0.368
ewmac32_128:0.369
ewmac64_256:0.343
carry:0.309

Low vol:
ewmac2_8:0.277
ewmac4_16:0.368
ewmac8_32:0.449
ewmac16_64:0.491
ewmac32_128:0.497
ewmac64_256:0.466
carry:0.443

Medium vol:
ewmac2_8:0.209
ewmac4_16:0.290
ewmac8_32:0.353
ewmac16_64:0.390
ewmac32_128:0.403
ewmac64_256:0.345
carry:0.196


High vol:
ewmac2_8:-0.232
ewmac4_16:-0.071
ewmac8_32:0.045
ewmac16_64:0.053
ewmac32_128:0.009
ewmac64_256:0.007
carry:0.104

So not just a cost story.

Testing the significance of overall performance in different vol environments

I really ought to end this post here, as the answer to the original question is a firm no: you shouldn't change your speed as vol increases. 

However we've now been presented with a new hypothesis: "Momentum and carry will do badly when vol is relatively high"

Let's switch gears and test this hypothesis.

First of all let's consider the statistical significance of the differences in return we saw above:

from scipy import stats

for rule in rule_list:
    perf_group_0 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[0][rule].values())
    perf_group_1 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[1][rule].values())
    perf_group_2 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[2][rule].values())

    t_stat_0_1 = stats.ttest_ind(perf_group_0, perf_group_1)
    t_stat_1_2 = stats.ttest_ind(perf_group_1, perf_group_2)
    t_stat_0_2 = stats.ttest_ind(perf_group_0, perf_group_2)

    print("Rule: %s , low vs medium %.2f medium vs high %.2f low vs high %.2f" % (rule,
                                                                                  t_stat_0_1.pvalue,
                                                                                  t_stat_1_2.pvalue,
                                                                                  t_stat_0_2.pvalue))

Rule: ewmac2_8 , low vs medium 0.37 medium vs high 0.00 low vs high 0.00
Rule: ewmac4_16 , low vs medium 0.25 medium vs high 0.00 low vs high 0.00
Rule: ewmac8_32 , low vs medium 0.12 medium vs high 0.00 low vs high 0.00
Rule: ewmac16_64 , low vs medium 0.08 medium vs high 0.00 low vs high 0.00
Rule: ewmac32_128 , low vs medium 0.07 medium vs high 0.00 low vs high 0.00
Rule: ewmac64_256 , low vs medium 0.03 medium vs high 0.00 low vs high 0.00
Rule: carry , low vs medium 0.00 medium vs high 0.32 low vs high 0.00

These are p-values, so a low number means statistical significance. Generally speaking, with the exception of carry, the biggest effect is when we jump from medium to high vol; the jump from low to medium doesn't usually result in a significantly worse performance.

So it's something special about the high-vol environment where returns get badly degraded.


Is this an effect we can actually capture?


One concern I have is how quickly we move in and out of the different vol regimes; here for example is Eurodollar:




To exploit this effect we're going to have to do something like radically reduce our leverage whenever an instrument enters 'zone 2: high vol'. That clearly would have worked in early 2020 when there was a persistent high vol environment for some reason that escapes me now. But would we really get the chance to do very much for those brief few days in late 2019 when Eurodollar enters the highest vol zone?

Above you may have noticed I put in a one day lag on the vol estimate - this is to ensure we aren't conditioning today's return on a vol estimate that uses today's return - clearly we couldn't change our leverage or otherwise react until we actually got the close of business price.

[In my backtest I automatically lag trades by a day, so when I finally come to test anything this shift can be removed]

In fact I have a confession to make... when first running this code I omitted the shift(1) lag, and the results were even stronger, with heavily negative returns for all trading rules in the highest vol regime (except carry, which was barely positive). So this makes me suspicious that we wouldn't have the chance to react in time to make much of this.
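Re-running with a longer lag just means parameterising the shift; a sketch (lag_days is my addition, and the helper functions are as defined above):

import pandas as pd

def historic_quantile_groups_lagged(system, instrument_code, lag_days=1,
                                    quantiles=[.25, .5, .75]):
    daily_vol = system.rawdata.get_daily_percentage_volatility(instrument_code)
    # shift by lag_days so today's return is never conditioned on a vol
    # estimate that already includes today's price
    ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean().shift(lag_days)
    normalised_vol = daily_vol / ten_year_vol

    quantile_points = [get_historic_quantile_for_norm_vol(normalised_vol, quantile)
                       for quantile in quantiles]
    stacked_quantiles_and_vol = pd.concat(quantile_points + [normalised_vol], axis=1)

    return stacked_quantiles_and_vol.apply(calculate_group_for_row, axis=1)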

Still, repeating the results with a 2 and even 3 day lag I still get some pretty low p-values, so there is probably something in it. Also, interestingly, with these greater lags there is more differentiation between the low and medium regimes. Here for example are the p-values for a 3 day lag:

Rule: ewmac2_8, low vs medium 0.06 medium vs high 0.01 low vs high 0.00
Rule: ewmac4_16, low vs medium 0.16 medium vs high 0.04 low vs high 0.00
Rule: ewmac8_32, low vs medium 0.13 medium vs high 0.08 low vs high 0.00
Rule: ewmac16_64, low vs medium 0.03 medium vs high 0.06 low vs high 0.00
Rule: ewmac32_128, low vs medium 0.01 medium vs high 0.06 low vs high 0.00
Rule: ewmac64_256, low vs medium 0.02 medium vs high 0.14 low vs high 0.00
Rule: carry, low vs medium 0.08 medium vs high 0.46 low vs high 0.01


A more graduated system


Rather than using regimes, I think it would make more sense to do something involving a more continuous variable: the quantile percentile itself, rather than the regime bucket it falls into. Then we won't drastically shift gears between regimes.

Recall our three regimes:
  • Low: Normalised vol in the bottom 25% quantile
  • Medium: Between 25% and 75%
  • High: Between 75% and 100% 
One temptation is to introduce something just for the high regime, where we start degearing when our quantile percentile is above 75%; but that makes me feel queasy (it's clearly implicit fitting), plus the results with higher lags indicate that it might not be a 'high vol is especially bad' effect, but rather a general 'as vol gets higher we make less money'.

After some thought (well 10 seconds) I came up with the following:

Multiply raw forecasts by a multiplier L, where (if Q is the vol quantile expressed as a decimal, e.g. 1 = 100%):

L = 2 - 1.5Q

That will vary L between 2 (if vol is really low) and 0.5 (if vol is really high). The reason we're not turning off the system completely for high vol is for all the usual reasons: although this is a strong effect, it's still not a certainty.
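A quick check of what that gives at a few points:

for Q in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print("Q=%.2f L=%.3f" % (Q, 2 - 1.5 * Q))

# Q=0.00 L=2.000
# Q=0.25 L=1.625
# Q=0.50 L=1.250
# Q=0.75 L=0.875
# Q=1.00 L=0.500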

I use the raw forecast here because there is no guarantee that applying the multiplier will leave the forecast with the correct scaling. If I then estimate forecast scalars using these transformed forecasts, I will end up with something that has the right scaling.
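In other words, the forecast scalar does the re-normalising for me. A rough sketch of the idea (the real pysystemtrade estimator works on a rolling out-of-sample basis, but the target is the same: an average absolute forecast of 10):

import numpy as np

def rough_forecast_scalar(attenuated_raw_forecast, target_abs_forecast=10.0):
    # scale the (attenuated) raw forecast so its average absolute value is 10
    return target_abs_forecast / np.nanmean(np.abs(attenuated_raw_forecast.values))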

These forecasts will then be capped at -20,+20, which may undo some of the increase in leverage applied when vol is particularly low.

Smoothing vol forecast attenuation


The first thing I did was to see what the L factor actually looks like in practice. Here it is for Eurodollar [I will give you the code in a few moments]:


It sort of seems to make sense; you can see, for example, the attenuation backing right off in early 2020 when we had the COVID inspired high vol. However, it worries me that this thing is pretty noisy. Laid on top of a relatively smooth slow moving average, it is going to boost trading costs quite a lot. I think the appropriate thing to do here is to smooth it before applying it to the raw forecast. Of course, if we smooth it too much then we'll be lagging changes in vol.

Once again, the wrong thing to do here would be some kind of optimisation of post-cost returns to find the best smoothing lookback, or something keyed to the speed of the relevant trading rule; instead I'm just going to plump for an EWMA with a 10 day span.


Testing the attenuation, rule by rule


Here then is the code that implements the attenuation:

import numpy as np
import pandas as pd

from systems.forecast_scale_cap import *

class volAttenForecastScaleCap(ForecastScaleCap):

    @diagnostic()
    def get_vol_quantile_points(self, instrument_code):
        ## More properly this would go in raw data perhaps
        self.log.msg("Calculating vol quantile for %s" % instrument_code)
        daily_vol = self.parent.rawdata.get_daily_percentage_volatility(instrument_code)
        ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean()
        normalised_vol = daily_vol / ten_year_vol

        normalised_vol_q = quantile_of_points_in_data_series(normalised_vol)

        return normalised_vol_q

    @diagnostic()
    def get_vol_attenuation(self, instrument_code):
        normalised_vol_q = self.get_vol_quantile_points(instrument_code)
        vol_attenuation = normalised_vol_q.apply(multiplier_function)

        smoothed_vol_attenuation = vol_attenuation.ewm(span=10).mean()

        return smoothed_vol_attenuation

    @input
    def get_raw_forecast_before_attenuation(self, instrument_code, rule_variation_name):
        ## original code for get_raw_forecast
        raw_forecast = self.parent.rules.get_raw_forecast(
            instrument_code, rule_variation_name
        )

        return raw_forecast

    @diagnostic()
    def get_raw_forecast(self, instrument_code, rule_variation_name):
        ## overridden method; this will be called downstream so don't change the name
        raw_forecast_before_atten = self.get_raw_forecast_before_attenuation(
            instrument_code, rule_variation_name
        )

        vol_attenuation = self.get_vol_attenuation(instrument_code)

        attenuated_forecast = raw_forecast_before_atten * vol_attenuation

        return attenuated_forecast


def quantile_of_points_in_data_series(data_series):
    results = [quantile_of_points_in_data_series_row(data_series, irow) for irow in range(len(data_series))]
    results_series = pd.Series(results, index=data_series.index)

    return results_series


from statsmodels.distributions.empirical_distribution import ECDF

# this is a little slow so suggestions for speeding up are welcome
def quantile_of_points_in_data_series_row(data_series, irow):
    if irow < 2:
        return np.nan
    historical_data = list(data_series[:irow].values)
    current_value = data_series[irow]
    ecdf_s = ECDF(historical_data)

    return ecdf_s(current_value)


def multiplier_function(vol_quantile):
    if np.isnan(vol_quantile):
        return 1.0

    return 2 - 1.5 * vol_quantile

And here's how to implement it in a new futures system (we just copy and paste the futures_system code and change the object passed for the forecast scaling/capping stage):
from systems.provided.futures_chapter15.basesystem import *


def futures_system_with_vol_attenuation(data=None, config=None, trading_rules=None, log_level="on"):

    if data is None:
        data = csvFuturesSimData()

    if config is None:
        config = Config(
            "systems.provided.futures_chapter15.futuresconfig.yaml")

    rules = Rules(trading_rules)

    system = System(
        [
            Account(),
            Portfolios(),
            PositionSizing(),
            FuturesRawData(),
            ForecastCombine(),
            volAttenForecastScaleCap(),
            rules,
        ],
        data,
        config,
    )

    system.set_logging_level(log_level)

    return system

And now I can set up two systems, one without attenuation and one with:
system = futures_system()
# will equally weight instruments
del(system.config.instrument_weights)

# need to do this to deal fairly with attenuation
# do it here for consistency
system.config.use_forecast_scale_estimates = True
system.config.use_forecast_div_mult_estimates = True

# will equally weight forecasts
del(system.config.forecast_weights)

# standard stuff to account for instruments coming into the sample
system.config.use_instrument_div_mult_estimates = True

system_vol_atten = futures_system_with_vol_attenuation()
del(system_vol_atten.config.forecast_weights)
del(system_vol_atten.config.instrument_weights)
system_vol_atten.config.use_forecast_scale_estimates = True
system_vol_atten.config.use_forecast_div_mult_estimates = True
system_vol_atten.config.use_instrument_div_mult_estimates = True

rule_list = list(system.rules.trading_rules().keys())

for rule in rule_list:
    sr1 = system.accounts.pandl_for_trading_rule(rule).sharpe()
    sr2 = system_vol_atten.accounts.pandl_for_trading_rule(rule).sharpe()

    print("%s before %.2f and after %.2f" % (rule, sr1, sr2))

Let's check out the results:
ewmac2_8 before 0.43 and after 0.52
ewmac4_16 before 0.78 and after 0.83
ewmac8_32 before 0.96 and after 1.00
ewmac16_64 before 1.01 and after 1.07
ewmac32_128 before 1.02 and after 1.07
ewmac64_256 before 0.96 and after 1.00
carry before 1.07 and after 1.11

Now these aren't huge improvements, but they are very consistent across every single trading rule. But are they statistically significant?
from syscore.accounting import account_test

for rule in rule_list:
    acc1 = system.accounts.pandl_for_trading_rule(rule)
    acc2 = system_vol_atten.accounts.pandl_for_trading_rule(rule)
    print("%s T-test %s" % (rule, str(account_test(acc2, acc1))))

ewmac2_8 T-test (0.005754898313025798, Ttest_relResult(statistic=4.23535684665446, pvalue=2.2974165336647636e-05))
ewmac4_16 T-test (0.0034239182014355815, Ttest_relResult(statistic=2.46790714210943, pvalue=0.013603190422737766))
ewmac8_32 T-test (0.0026717541872894254, Ttest_relResult(statistic=1.8887927423648214, pvalue=0.058941593401076096))
ewmac16_64 T-test (0.0034357601899108192, Ttest_relResult(statistic=2.3628815728522112, pvalue=0.018147935814311716))
ewmac32_128 T-test (0.003079560056791747, Ttest_relResult(statistic=2.0584403445859034, pvalue=0.03956754085349411))
ewmac64_256 T-test (0.002499427499123595, Ttest_relResult(statistic=1.7160401190191614, pvalue=0.08617825487582882))
carry T-test (0.0022278238232666947, Ttest_relResult(statistic=1.3534155676590192, pvalue=0.17594617201514515))

A mixed bag there, but with the exception of carry there does seem to be a reasonable amount of improvement; most markedly with the very fastest rules.
Again, I could do some implicit fitting here to only use the attenuation on momentum, or use less of it on slower momentum. But I'm not going to do that.

Summary


To return to the original question: yes, we should change our trading behaviour as vol changes.
But not in the way you might think, especially if you had extrapolated from the performance of March 2020.

As vol gets higher, faster trading rules do relatively badly, but the bigger story is that all momentum rules suffer
(as does carry, a bit). Not what I had expected to find, but very interesting. So a big thanks to the internet's hive mind for voting for this option.


38 comments:

  1. As volatility rises, doesn't there tend to be more swings in both directions and therefore more trading? (This could be because our vol adjustments lag the actual jump in vol and systems can't reduce their trade size fast enough.) If so, faster systems (shorter lookbacks) would indeed perform worse since they have higher trading costs relative to Sharpe.

    In other words, when you say, "As vol gets higher faster trading rules do relatively badly, but actually the bigger story is that all momentum rules suffer," how much of it is a trading cost story? Is there a way to determine this, like rerunning it excluding trading costs?

    ReplyDelete
    Replies
    1. In theory yes, but I've just totally borked my python in an attempt to upgrade everything... let me get back to you

      Delete
    2. I just think that vol tends to mean-revert, so over-reacting (faster trading) makes the situation worse as the system gets whipsawed.

      Delete
  2. I guess the question is, when you get around to it, is what percentage of the decline in results is from costs and what percent from other things, e.g., bad signals and whipsaws.
    And more importantly, what difference, if any, does this make?
    Thanks for the great post and follow up!

    ReplyDelete
    Replies
    1. Updated the post to reflect this.

      Delete
    2. Wow. I expected trading costs to be a more significant factor than they are. Not necessarily in the general decline in performance but rather in the relative poor performance of faster trading rules. But costs don't seem to be significant here either (if my math is right).

      Either way, it seems like the best thing to do is to scale back your positions during periods of high vol. Of course, those who dynamically vol target/scale are already doing this.

      Thanks Rob for doing the research, and the extra research too.

      Delete
  3. Hi Rob, thanks for a great analysis (as usual)! Doesn’t this prove the old (anecdotal) saying that markets take the stairs up (low(er) vol, steady momentum) and the elevator down (high vol, momentum crash)?

    ReplyDelete
    Replies
    1. I'm not sure it proves it, but it is certainly consistent with it.

      Delete
  4. Hi Rob, thanks for a great analysis (as usual)! Doesn’t this prove the old (anecdotal) saying that markets take the stairs up (low(er) vol, steady momentum) and the elevator down (high vol, momentum crash)?

    ReplyDelete
  5. So as vol gets higher, mean reversion, not trend following should do better.

    ReplyDelete
  6. Great findings Rob, you binned the data by 3 regimes, but what if volatility is fluctuating between low and medium volatility? Doesn't that fragment the time series data? Suppose 16 EMA crosses 64 EMA to generate a buy signal in the low vol regime, then the market enters the medium vol regime and now 16 EMA is below 64 - what trade gets registered?

    ReplyDelete
    Replies
    1. The trade you have on wouldn't be affected, but the size of the trade would change. Yes, you'd get that kind of behaviour, which is why I decided not to implement that, but instead used a measure of vol regime that was continuous.

      Delete
  7. Great post, Rob!

    I didn't quite get the reasoning behind an implementation detail. Maybe you could clarify this.

    The estimate for the 10 year vol that you are using in this post applies a rolling standard deviation over an EWMA short term vol. Wouldn't it make more sense to apply a simple rolling average over this short term vol, in order to obtain a more reasonable estimate for the 10 year vol?

    "ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean().shift(1)"

    instead of

    "ten_year_vol = daily_vol.rolling(2500, min_periods=10).std().shift(1)"

    In the previous post where you investigated slower vol terms in a simple linear weighted vol model (with short and long term components), you applied a simple moving average as a proxy for the long term vol. So why are you using standard deviation instead of mean this time?

    ReplyDelete
    Replies
    1. Copying and pasting error. It should indeed be ten_year_vol = daily_vol.rolling(2500, min_periods=10).mean().shift(1)

      Will now fix, thanks.

      Delete
    2. Thanks for clarifying, Rob. I was thinking whether it could be some sort of weird Z-score-like measure. Fortunately, it is just a pasting error.

      Delete
  8. Thank you for the great post.

    ReplyDelete
  9. Hi Rob - I would like to ask you a clarification on the following statement:” I use the raw forecast here. I do this because there is no guarantee that the above will result in the forecast retaining the correct scaling. So if I then estimate forecast scalars using these transformed forecasts, I will end up with something that has the right scaling”.

    If we apply “L” to the “raw forecast” and then we apply the “forecast scalars” to the raw forecast adjusted by L, then are we not diluting / losing the overall signal coming from L? I mean, the overall signal might well say - hey, it’s a high vol environment, let’s rein in risk; but then our forecast scalars will say, no, we need to adjust the position upward to make it more like an average bet, whilst we don’t want to take an average bet if the forecast adjusted for L is low.

    I hope I have been clear. If not let me know.

    Thanks,
    Ric

    ReplyDelete
    Replies
    1. You've been clear, but you are missing the fact that the forecast scalars are effectively fixed.

      That means they won't 'rein in' risk as you describe, since we're going to multiply the modified forecast by the same scalar regardless of environment.

      Now for the disclaimer: they're not exactly fixed, since they are estimated on a rolling out of sample basis. That means if we start in say an (ex-post) low vol environment then the forecast scalars will be too low (since the raw forecast is higher, and the scalar doesn't need to be as large); but after a few years it should be sorted out.

      Delete
    2. Thanks for your reply, Robert.

      Follow-up question: what if we applied “L” to the “adjusted forecast” instead? Would that be acceptable ?

      Thanks, Ric

      Delete
    3. What's the 'adjusted' forecast??

      Delete
  10. Apologies for not being clear. For “adjusted forecast”, I mean the “raw forecast” adjusted by “forecast scalars”.
    So, what if we applied “L” to this “adjusted forecast”? Would that be acceptable?

    Thanks!

    ReplyDelete
    Replies
    1. Yes on reflection, this would probably make more sense than the version I described.

      Delete
  11. Hi Rob - first of all, it was great to see you on YouTube yesterday and discuss the systematic topics. Thanks for keeping educating us.

    A couple of questions on the forecast scalars:

    1) in your book and supporting spreadsheets, you advise to use a “fixed” scalar for a given trading rule variation. May I know how you fit these? Let’s suppose I have 10 years of backtest with related raw forecasts - do you use the entire 10 years to come up with your “fixed forecast scalar”, which will then be applied to live trading system?

    2) why did you not use a Z score methodology to transform raw forecasts into normalised forecasts? Is the Z-score methodology an inferior one to the one proposed by you?

    3) if I wanted to get training and a certification in systematic trading, what steps would you advice?

    Thanks in advance for your replies.

    Riccardo

    ReplyDelete
    Replies
    1. 1) Yes, although in practice I do it in a backward looking way in the backtest.

      2) What I'm doing is actually pretty similar to a time series Z score. The main difference is that in a Z score you would subtract the mean. I don't want to do that; I want to preserve any long/short bias in the forecast. For example, consider a slow momentum on an asset that has been going up through most of history. The raw forecast will mostly be long that asset, and I want to preserve that 'longness'. The other minor difference is that a Z score uses the standard deviation and I use the mean absolute deviation.

      3) The only reputable course I am aware of is this https://www.sbs.ox.ac.uk/programmes/executive-education/online-programmes/oxford-algorithmic-trading-programme

      Of course I haven't done the course, but the list of contributors is pretty top notch.

      Delete
  12. Hi Rob - how would you differentiate trend following from global macro investing?

    Also, could one apply your trading system framework to other types of investing - I.e. fundamental investing - or do you think raw forecasts, forecast scalars, etc will have to be calculated differently?

    Thanks,
    R

    ReplyDelete
    Replies
    1. I'm not sure. I've never seen a definition of what 'global macro' investing is. I guess it's investing at a global level, so equity indices, currencies... things you buy with futures? And 'macro', does that mean only using macroeconomic indicators? Or a broader set of signals? I guess one could use a trend following system as part of that.

      Answer to your second question is yes, absolutely. It's less relevant for relative value / stat arb / ... however, but can be made to work with some modifications.

      Delete
    2. Hi Rob - I hope you are well.

      Would you mind explaining how your system and the calculation of forecasts can be adapted if we were to try and implement a relative value strategy / stat arb strategy or signal?

      Thanks in advance for your help.

      Delete
    3. For eg any pair trading you create a synthetic instrument that embodies the spread, then your forecast would be something like -(price of spread - fair value), and then you translate the synthetic instrument back to actual positions. I'll probably blog about this at some point.

      Delete
    4. Thanks Rob.

      Just to understand this a bit better in the meantime. Assuming I want to implement a Value signal using country ETFs.

      A) UK equity earnings yield = 8%
      B) Australia equity earnings yield = 4%
      C) MSCI World earnings yield = 6%

      “Relative Value” Raw Forecast UK Eq = -(8%-4%)-6%= +2%

      Relative value raw forecast Aussie Eq= [-(4%-8%)]-6%= -2%

      Is my interpretation correct?

      Thanks,
      R

      Delete
    5. Yes except you'd probably want to Z-score rather than just demeaning.

      Delete
  13. Thanks Rob.

    If I was going to build a trend following model like yours but using a set of country etfs - as many as there are in the chosen equity bm - with the idea of beating the MSCI ACWI, then how can I account for the fact that forecasts might signal a buy trade across the board for all of them, if all the equity markets are trending in the same direction?

    Do I have to calculate an average forecast across them and then subtract such an average from each forecast?

    Or shall I subtract the minimum value between forecasts from each of them?

    I hope I have been clear where I am coming from. Your trading system is absolute oriented, whilst I am trying to beat a defined benchmark and I wonder how I can adapt your system to that.

    It would be great if you could share what the best practices are in such situation.

    Thanks in advance.

    ReplyDelete
    Replies
    1. So I assume you are trying to put together a long only type portfolio? And do you want to buy a selection with equal weights (eg 'buy 10 out of the FTSE 100') or in theory buy everything with different weights?

      Delete
  14. Yes. My benchmark could be the FTSE 100 and I have the ability to buy the underlying FTSE 100 GICS Sector ETFs, 11 of them - energy, consumer staples, industrials etc. We are in a market environment where most of the sectors have strong up trends but some of them have weaker trends.

    How can we compute the forecasts in such a way that the sum of the deviations of each sector ETF from BM is equal to zero?

    The mandate is like beating the FTSE 100 with a TAA approach, where at each point in time we will be long some sectors and short some others but the sum of all the deviations is zero.

    What’s the best practice to achieve that?

    Thanks

    ReplyDelete
    Replies
    1. Well you just calculate the forecasts, and then subtract the average forecast for each sector... right?

      Delete
  15. Thanks Rob.

    I thought about it but it felt too simple to me in the first instance, hence I wanted to get your opinion.

    Thanks

    ReplyDelete
  16. Hi Rob,
    Thanks for the post. Slightly unrelated question – Do you have any views on how trading volumes affect trend following?
    How would one go about investigating this question in the same format you have used above for volatility? E.g. how do you recommend measuring trading volumes?

    ReplyDelete
    Replies
    1. Very hard to measure volumes for futures because of rolls. I've never found any systematic effects of volume on price.

      Delete

Comments are moderated. So there will be a delay before they are published. Don't bother with spam, it wastes your time and mine.