Monday 13 January 2020

Skew and Kurtosis as trading rules

This is part X of my series of blog posts on skew and kurtosis, where 2<X<5. Part X, because it depends on how you number them! If you were to read them in a logical order then the series looks something like this:


  • A post on skew: measuring it, and its impact on future returns
  • A post on kurtosis: measuring it, its impact on future returns, and its interaction with skew
  • A post on trend following and skew (which I actually wrote first, hence the confusion!)
  • This post: on using skew and kurtosis as trading rules
This series acts as a little demonstration of how we can take an idea and run with it to extremes, possibly even taking things too far (this post will reveal whether that is the case).

This post will also demonstrate how we should test an ensemble of trading rules without committing the sins of implicit fitting (basically dropping variations that don't work from our backtest). And it will use pysystemtrade. There will be some pretty geeky stuff showing you how to implement novel trading rules in the aforementioned python library.


The trading rules


In the last couple of posts I explained that if we know what skew and kurtosis have been recently (which we do) we can use that as conditioning information for what returns will be in the future (which we don't normally know). The obvious thing to do with this is to turn it into a trading rule; in fact there will be 12 trading rules, because I have 3 kinds of rule:


  • a pure skew rule ('skew')
  • a skew conditioned on kurtosis rule ('skewK')
  • a kurtosis conditioned on skew rule ('kurtS')
And each of these rules can be applied in 4 different ways (essentially 4 kinds of demeaning):
  • Absolute: versus the average across all assets and time periods [an alternative for pure skew is to use zero as the average here, but let's be consistent] ('_abs')
  • Relative to this particular asset's history (where history is the last 10 years) ('_ts' for time series)
  • Relative to the current cross sectional average across all assets ('_cs')
  • Relative to the current cross sectional average within the relevant asset class ('_rv' i.e. relative value)

Finally each of these rules will have 6 variations, for the six periods over which skew/kurtosis will be measured:


  • 7 days ('_7D')
  • 14 days ('_14D')
  • 1 month ('_30D')
  • 3 months ('_90D')
  • 6 months ('_180D')
  • 12 months ('_365D')

Thus an absolute skew with kurtosis conditioning over 3 months will be known as 'skewK_abs_90'. Catchy. That's a total of 72 different possibilities to consider!
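To keep the naming convention straight, here is a minimal sketch (pure illustration, not pysystemtrade code) that enumerates all 72 rule names, using the shorter form, e.g. 'skewK_abs_90', that appears in the backtest output below:

rule_types = ['skew', 'skewK', 'kurtS']
demean_types = ['abs', 'ts', 'cs', 'rv']
lookbacks = [7, 14, 30, 90, 180, 365]

all_rule_names = ['%s_%s_%d' % (rule_type, demean_type, lookback)
                  for rule_type in rule_types
                  for demean_type in demean_types
                  for lookback in lookbacks]
assert len(all_rule_names) == 72  # 3 rule types * 4 demeanings * 6 lookbacks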

Some of these will probably be too expensive to trade on one or more instruments, but pysystemtrade will take care of that for us. Still, we will need to winnow the list down to a more sensible size.


A brief geeky diversion


A precursor to using pysystemtrade to test new trading rules is to add any raw data they will access to the relevant python code (or, specifically for futures, here). If you're doing your own experiments you should do this by inheriting from the base object in the relevant file and adding bells and whistles in the form of additional methods; but since 'my gaff (code), my rules', I've updated the actual code. So, for example, if we calculate the skew here then we can re-use it many times across the various rules.

However there is a weakness with this code, which is that we can't pass arguments into the raw data function. So we couldn't for example pass in the length of time used.

This isn't a problem in most cases since we can do the relevant work inside the actual trading rule, pulling in raw percentage returns as the input into our function. This is slower, but it works. It is however a problem for anything that needs access to the skew (or kurtosis) for other instruments (cs and rv rules), since trading rule functions work on the forecast for a single instrument at a time.

There are two options here: one is to modify the pysystemtrade code so it can deal with this; the other is to fix the horizon length used in the cs and rv rules. The latter is the approach I took in the relative carry rule, discussed here.

I'm usually against making things more complex, but I think that changing the code is the right thing to do here. See here to see how this was done.


Coding up the rules


Here then is the raw data code. As you can see I've coded up a couple of methods to calculate skew and kurtosis, and then kept things generic with a whole bunch of 'factor' methods that can be used for any predictive factor (at some point I'll replace the relative carry code so it uses this pattern).

Notice that we have both a positive skew and a negative skew method. The latter will be used for the standalone skew rule, and the former as a conditioning method.
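The real methods live in the pysystemtrade raw data stage, but as a rough standalone sketch of the underlying calculation (the function names here are illustrative, not the real method names, and I'm assuming daily percentage returns with a datetime index) it would look something like this:

import pandas as pd

def rolling_skew(perc_returns: pd.Series, lookback_days: int = 365) -> pd.Series:
    # rolling estimate of the third standardised moment
    return perc_returns.rolling("%dD" % lookback_days).skew()

def rolling_kurtosis(perc_returns: pd.Series, lookback_days: int = 365) -> pd.Series:
    # rolling estimate of the fourth standardised moment
    # (pandas reports excess kurtosis: a normal distribution scores 0)
    return perc_returns.rolling("%dD" % lookback_days).kurt()

def rolling_neg_skew(perc_returns: pd.Series, lookback_days: int = 365) -> pd.Series:
    # negative skew, for the standalone rule: more negatively skewed
    # assets have historically earned higher returns
    return -rolling_skew(perc_returns, lookback_days)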

Here are the trading rules:

import numpy as np

from syscore.algos import robust_vol_calc

def factor_trading_rule(demean_factor_value, smooth=90):
    # normalise the demeaned factor by its own volatility, making it scale free
    vol = robust_vol_calc(demean_factor_value)
    normalised_factor_value = demean_factor_value / vol

    # smooth the forecast to reduce turnover, since the raw estimates are jumpy
    smoothed_normalised_factor_value = normalised_factor_value.ewm(span=smooth).mean()

    return smoothed_normalised_factor_value

def conditioned_factor_trading_rule(demean_factor_value, condition_demean_factor_value, smooth=90):
    # normalise the main factor by its own volatility, as above
    vol = robust_vol_calc(demean_factor_value)
    normalised_factor_value = demean_factor_value / vol

    # condition on the sign of the second (demeaned) factor
    sign_condition = condition_demean_factor_value.apply(np.sign)
    sign_condition_resample = sign_condition.reindex(normalised_factor_value.index).ffill()

    conditioned_factor = normalised_factor_value * sign_condition_resample
    smoothed_conditioned_factor = conditioned_factor.ewm(span=smooth).mean()

    return smoothed_conditioned_factor

As you can see these are actually quite generic trading rules, which is a consequence of how I've written the raw data methods. This also means we can do much of our work in the configuration stage, rather than by writing many different rules.

Notice also that I've added a smoothing function that wasn't in the original code. When I examined the output it was quite jumpy; this is because the skew and kurtosis estimators aren't exponentially weighted, so when one exceptionally bad or good return drops in or out of the window it can cause a big change. This meant that even the very long windows had high turnover, which is undesirable. I've set the smooth at a tenth of the length of the lookback (not tested, but it seems sensible).

Here is a gist showing how to set up the 72 rules (in code, though this could just as easily be done as configuration). A snippet is below:

smooth = int(np.ceil(lookback_days / 10.0))
kurtS_rv = TradingRule(conditioned_factor_trading_rule,
                       data=['rawdata.get_demeanded_factor_value',
                             'rawdata.get_demeanded_factor_value'],
                       other_args=dict(smooth=smooth,
                                       _factor_name="kurtosis",
                                       _demean_method="average_factor_value_in_asset_class_for_instrument",
                                       _lookback_days=lookback_days,
                                       __factor_name="skew",
                                       __demean_method="average_factor_value_in_asset_class_for_instrument",
                                       __lookback_days=lookback_days))

You can really see here how the very generic functions are being configured. For the conditioned rule we pass two items of data; both are factors which have been demeaned, hence the identical method names. In the other args, the smooth is passed to the trading rule itself; the single underscore prefixed arguments (_factor_name, _demean_method, _lookback_days) are passed to the first method in the data list ('rawdata.get_demeanded_factor_value'); and the double underscore prefixed arguments are passed to the second method (which happens to be the same method here). On this second call the lookback and demeaning method are identical, but the factor name differs: kurtosis is the main factor being traded, and skew is the conditioning factor.
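For comparison, here is a sketch of how the unconditioned 'skew_abs' variation might be configured; it needs only one data source and single underscore arguments. The factor name and demean method strings here are placeholders, not the real ones, so check the raw data code for the actual method names:

# sketch only: factor name and demean method below are placeholder strings
skew_abs = TradingRule(factor_trading_rule,
                       data=['rawdata.get_demeanded_factor_value'],
                       other_args=dict(smooth=smooth,
                                       _factor_name="negskew",  # placeholder
                                       _demean_method="average_factor_value_across_all_assets",  # placeholder
                                       _lookback_days=lookback_days))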


Checking behaviour, correlation and costs


Before piling in to see whether any of these 72 (!) putative strategies makes money, let's check that they make sense from a behaviour, cost and correlation perspective; hopefully we can drop some of the numerous variations. Now, I've been very vocal in the past about the use of fake data to do this part of fitting trading strategies.

However, in this case we'd need to generate data with interesting skew and kurtosis properties that were also time varying. To avoid this I decided to use a single market: the S&P 500. I chose the S&P because it has a reasonable length of history, and it's the second cheapest market I trade (the NASDAQ is slightly cheaper but doesn't have the same history). So if the S&P can't trade a particular rule, we can definitely ignore it.

This is slightly cheating, but I won't use any performance data to make in-sample decisions.

First let's set up the backtest (assuming we've already got the trading rules using the gist code above):


ordered_rule_names = list(all_trading_rules.keys())
config = temp_config

# estimate forecast scalars, forecast weights, and both diversification
# multipliers; don't estimate instrument weights (deleting the fixed
# weights means the system falls back to its defaults)
config.use_forecast_div_mult_estimates = True
config.use_forecast_scale_estimates = True
config.use_instrument_div_mult_estimates = True
config.use_instrument_weight_estimates = False
config.use_forecast_weight_estimates = True
del config.instrument_weights

system = futures_system(trading_rules=all_trading_rules, config=config)

Now let's check the costs:

# SR cost of trading each rule variation on the S&P 500, sorted cheapest first
SR_costs_for_rules = []
for rule in ordered_rule_names:
    SR_costs_for_rules.append((rule,
          system.accounts.get_SR_cost_for_instrument_forecast("SP500", rule)))

SR_costs_for_rules.sort(key=lambda x: x[1])

Looking at the last few observations, all the rules with a 7 day lookback have costs greater than my normal cutoff (0.13 SR units; see "Systematic Trading" to understand why). So we can drop those from our consideration.
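Since anything the S&P 500 can't afford can be dropped everywhere, a quick sketch of the filter:

# drop any rule whose SR cost on the S&P 500 exceeds the cutoff
max_SR_cost = 0.13
cheap_enough_rules = [rule for rule, SR_cost in SR_costs_for_rules
                      if SR_cost <= max_SR_cost]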

Now for correlations:

# p&l for each trading rule on the S&P 500, before forecast weights are applied
rule_returns = system.accounts.pandl_for_instrument_rules_unweighted("SP500").to_frame()
rule_returns = rule_returns[ordered_rule_names]
corr_matrix = rule_returns.corr()

First let's look at the 'internal' correlations within each rule. For example:

select_rules = ['skew_abs_14', 'skew_abs_30', 'skew_abs_90', 'skew_abs_180', 'skew_abs_365']
corr_matrix.loc[select_rules, select_rules]
              skew_abs_14  skew_abs_30  skew_abs_90  skew_abs_180  skew_abs_365
skew_abs_14      1.000000     0.530610     0.158682      0.104764      0.022758
skew_abs_30      0.530610     1.000000     0.445712      0.218372      0.039874
skew_abs_90      0.158682     0.445712     1.000000      0.619104      0.305271
skew_abs_180     0.104764     0.218372     0.619104      1.000000      0.580179
skew_abs_365     0.022758     0.039874     0.305271      0.580179      1.000000

It looks like there are pleasingly low correlations between adjacent trading rules. I checked this for all the rules, with similar results.
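Something like this loop (a sketch, assuming the naming convention above) does that check in one go:

# repeat the within-family correlation check for every combination
# of rule type and demeaning
for rule_type in ['skew', 'skewK', 'kurtS']:
    for demean_type in ['abs', 'ts', 'cs', 'rv']:
        family = [rule for rule in ordered_rule_names
                  if rule.startswith('%s_%s_' % (rule_type, demean_type))]
        if family:
            print(corr_matrix.loc[family, family])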

Now let's check across the variations of the skew rule, e.g.:

             skew_abs_14  skew_rv_14  skew_ts_14  skew_cs_14
skew_abs_14     1.000000    0.542259    0.996992    0.952949
skew_rv_14      0.542259    1.000000    0.543386    0.562397
skew_ts_14      0.996992    0.543386    1.000000    0.948784
skew_cs_14      0.952949    0.562397    0.948784    1.000000

Wow! Looks like the absolute, time series and cross sectional variations are basically doing the same thing. Checking the other rules I see similarly high correlations, although they tend to be a bit lower for longer lookbacks.

Whipping out Occam's razor, it seems to make most sense to drop the time series and cross sectional rules completely, since they are more complex implementations of the basic 'abs' rule but add little diversification. We'll keep the within asset class relative value ('rv') rule for now, since it does something quite different.

Now let's check across styles:

                carry  ewmac4_16  skew_abs_14  skewK_abs_14  kurtS_abs_14
carry         1.000000   0.079025    -0.020398      0.018712      0.053978
ewmac4_16     0.079025   1.000000     0.129336      0.077702      0.080301
skew_abs_14  -0.020398   0.129336     1.000000      0.184635      0.120404
skewK_abs_14  0.018712   0.077702     0.184635      1.000000      0.821673
kurtS_abs_14  0.053978   0.080301     0.120404      0.821673      1.000000


Skew conditioned on kurtosis, and kurtosis conditioned on skew, seem to have a highish correlation with each other. That's also true for the cross sectional variants:

                carry  ewmac4_16  skew_cs_30  skewK_cs_30  kurtS_cs_30
carry        1.000000   0.079025    0.039870     0.032401     0.053643
ewmac4_16    0.079025   1.000000    0.118919     0.012837     0.044516
skew_cs_30   0.039870   0.118919    1.000000     0.151807     0.000230
skewK_cs_30  0.032401   0.012837    0.151807     1.000000     0.843337
kurtS_cs_30  0.053643   0.044516    0.000230     0.843337     1.000000

That pattern holds true all the way up to the longest lookbacks. It probably doesn't make sense to have two skew rules, so let's drop skew conditioned on kurtosis - again, this is the more complex rule.

This leaves us with the following rules:
  • a pure skew rule ('skew')
  • a kurtosis conditioned on skew rule ('kurtS')
And each of these rules can be applied in two different ways (essentially two kinds of demeaning):
  • Absolute: versus the average across all assets and time periods [an alternative for pure skew is to use zero as the average here, but let's be consistent] ('_abs')
  • Relative to the current cross sectional average within the relevant asset class ('_rv' i.e. relative value)

Finally each of these rules will have 5 variations, for the five periods over which skew/kurtosis will be measured:
  • 14 days ('_14D')
  • 1 month ('_30D')
  • 3 months ('_90D')
  • 6 months ('_180D')
  • 12 months ('_365D')
So we now have 'just' 5*2*2 = 20 rules. Much more manageable.
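As a sketch (again assuming the naming convention above), the survivors can be pulled out of the full rule dictionary like so:

surviving_prefixes = ['%s_%s_' % (rule_type, demean_type)
                      for rule_type in ('skew', 'kurtS')
                      for demean_type in ('abs', 'rv')]
surviving_lookbacks = ('14', '30', '90', '180', '365')

kept_rules = dict([(name, rule) for name, rule in all_trading_rules.items()
                   if any(name.startswith(prefix) for prefix in surviving_prefixes)
                   and name.split('_')[-1] in surviving_lookbacks])
assert len(kept_rules) == 20  # 2 rule types * 2 demeanings * 5 lookbacks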


Trading rule allocation


Proceeding with S&P 500 for now, let's see how my handcrafting method allocates weights:

portfolio = system.combForecast.calculation_of_raw_estimated_forecast_weights("SP500").results[-1].diag['hc_portfolio']
portfolio.show_subportfolio_tree()


[' Contains 3 sub portfolios', 
 ['[0] Contains 3 sub portfolios', (Skew and RV kurtosis)
  ['[0][0] Contains 3 sub portfolios', (Slower skew rules)
   ["[0][0][0] Contains ['skew_abs_180', 'skew_abs_365', 'skew_abs_90']"], 
   ["[0][0][1] Contains ['skew_rv_180', 'skew_rv_90']"], 
   ["[0][0][2] Contains ['skew_rv_365']"]], 
  ['[0][1] Contains 2 sub portfolios', (Faster skew rules)
   ["[0][1][0] Contains ['skew_abs_14', 'skew_rv_14']"], (very fast skew)
   ["[0][1][1] Contains ['skew_abs_30', 'skew_rv_30']"]], (fastish skew)
  ['[0][2] Contains 3 sub portfolios', (Mostly RV kurtosis)
   ["[0][2][0] Contains ['kurtS_rv_180', 'kurtS_rv_365']"], 
   ["[0][2][1] Contains ['kurtS_abs_14', 'kurtS_rv_14']"], 
   ["[0][2][2] Contains ['kurtS_rv_30', 'kurtS_rv_90']"]]], 
 ['[1] Contains 3 sub portfolios',  (Carry and most absolute kurtosis)
  ["[1][0] Contains ['carry', 'kurtS_abs_180', 'kurtS_abs_365']"], 
  ["[1][1] Contains ['kurtS_abs_30']"], 
  ["[1][2] Contains ['kurtS_abs_90']"]], 
 ['[2] Contains 3 sub portfolios', (Momentum)
  ["[2][0] Contains ['ewmac2_8', 'ewmac4_16']"], (Fast mom) 
  ["[2][1] Contains ['ewmac32_128', 'ewmac64_256']"], (Slow mom)
  ["[2][2] Contains ['ewmac16_64', 'ewmac8_32']"]]] (medium mom)


I've added some notes manually; the algo doesn't do this labelling for us.

Summary of weights:

[(rule, weight) for rule,weight in zip(list(portfolio.all_instruments), portfolio.cash_weights)]

Carry 9.1%
EWMAC 12.9%
skew_abs 19.9%
skew_rv 17.0%
kurtS_abs 10.8%
kurtS_rv 29.2%
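For reference, here is a sketch of the aggregation that produced the summary above; style_of is a hypothetical helper mapping rule names to the buckets shown:

from collections import defaultdict

def style_of(rule_name):
    # hypothetical helper: 'skew_abs_365' -> 'skew_abs',
    # 'ewmac16_64' -> 'ewmac', 'carry' -> 'carry'
    if rule_name.startswith('ewmac'):
        return 'ewmac'
    parts = rule_name.split('_')
    return '_'.join(parts[:2]) if len(parts) > 1 else rule_name

weight_by_style = defaultdict(float)
for rule, weight in zip(list(portfolio.all_instruments), portfolio.cash_weights):
    weight_by_style[style_of(rule)] += weight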


Performance


Okay, it's time for the moment of truth. How well do these trading rules actually perform?

First let's check out the skew rules:

select_rules = ['skew_abs_14', 'skew_abs_30', 'skew_abs_90', 'skew_abs_180', 'skew_abs_365']
system.accounts.pandl_for_all_trading_rules_unweighted().to_frame()[select_rules].cumsum().plot()


The best performing 'vanilla' skew rule is the one with a 365 day lookback. A one year lookback is also what was used in the canonical paper on skew and futures (more on this later). It has an SR of 0.33. That's not up there with the EWMAC and carry rules, which come in at an SR of 0.9 plus (excluding the fastest EWMAC, which manages just 0.5), but it's positive at least. Thereafter there is a very clear pattern, with faster skew rules doing worse.
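(The Sharpe ratios quoted here came from something like this sketch, where pandl_for_trading_rule aggregates a rule's p&l across all instruments:)

# sketch: Sharpe ratio per skew variation, across all instruments
for rule in select_rules:
    print(rule, system.accounts.pandl_for_trading_rule(rule).sharpe())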

Incidentally the 'flat spot' on the blue line is because it can only be traded by the cheaper markets, none of which have data before the year 2000.

What about RV skew?

A similar(ish) pattern here, with the slowest skew rules coming in at an SR of around 0.35, and the faster rules being rather unhelpful.

Now for kurtosis (conditioned on skew):


Hmmm. Nothing to shoot the lights out there either.


Rule selection part N


The holy grail is a trading rule that is negatively correlated with something we've already got, and has a positive Sharpe Ratio. In my original post on trend following and skew I noted that, for interesting reasons, skew was likely to be negatively correlated with momentum at certain speeds, and seemed to have positive performance.

In this post the negative correlation seems to have been borne out (or at least the correlation is basically zero), but the positive performance is patchy. Nevertheless, in my 'ideas first' paradigm (described here), I will sometimes use rules that don't have statistically significant performance if their original motivation is well founded. So it might be worth chucking some skew and kurtosis into the mix.

The slower skew rules (of both flavours) do a reasonable job, and they are logical and straightforward rules with a well motivated reason as to why they should work. Thanks to my prior work, I also have a good understanding of how they interact with momentum.

I'm a little less comfortable with the kurtosis rules; the conditioning makes them a little more complex than something I'd normally contemplate using. I think here I got a little carried away with demonstrating how clever I could be (okay, (K - K_mu) * sign(S - S_mu) isn't exactly the general theory of relativity, but it's much more complex than EWMA_f - EWMA_s). On balance I would prefer not to use the kurtosis rules, even though their cumulative SR is similar to skew.


Some thoughts on fitting


It's worth noting that the 365 day skew rule, which did the best here, uses the same lookback as this paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2671165. Here is an opportunity for me to quickly (remind / tell) you about my framework for three kinds of fitting.

Tacit fitting would have happened if I had used the 365 day rule having read it in that paper. We know that academic papers which don't have useful results are rarely published, so there is a chance that the academics in question tried different formulations before settling on 365 days. Of course this might not be true: they could have just used 365 days, realised it worked, and moved on*. The fact that this is a 365 day lookback, and not 275.4 days, makes that more plausible. Still, the risk is there.

* And they could also have got the 365 days from another paper, whose authors tried different variations. Same problem.

Implicit fitting would be if I had run these backtests and chosen the best performing rule variations to use in my trading system (which, as it happens, were skew_abs_365 and skew_rv_180 if you're interested). Then when I ran my backtest again it would have looked pretty damn good.

Explicit fitting is what I've actually done: used a mechanical rule to decide which rules are good and should get more capital, and which are poor. This is the best kind, as long as you do it in a robust way that understands signal:noise ratios, and in a backward looking, rolling, out of sample fashion.

Having stated that, going forward, I will only use the two skew rules, am I guilty of implicit fitting? After all, I have modified the configuration of my backtest after peeking at all the data. To a degree this is true. But I offer two defences. Firstly, I'm still using all the different variations of the rules, from 14 day to 365 day lookbacks, and allowing the system to weight them appropriately. Secondly, removing the kurtosis rules doesn't really affect the performance of the system one way or the other. So it's not like I'm biasing my backtest SR upwards by 50 basis points.


Portfolio level results


Having done all this, what effect does adding the two types of skew rule to my standard backtest have?


The orange line is the original backtest, and the blue line is the new one. It looks decent enough, but the improvement is only 7bp of SR. Still, I've always been a fan of systems that use lots of simple rules, each adding a little extra to the mix, and even 7bp is better than a punch in the face with a sharp stick.


Conclusion


This exercise has been a little disappointing, as I had hoped the skew rules would be a little more exciting performance-wise, but I've demonstrated some important practices along the way. I've also had some fun adding extra flexibility to pysystemtrade.

13 comments:

  1. Hi

    What is your opinion on the bond market and allocation to bonds as of now? I just heard that some banks will invest more in infrastructure and property instead of bonds.

    Regards

  2. I wonder what you think of this recent paper https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3505422

    Cross-Asset Skew

  3. Hi Rob,
    I've been studying distribution modeling & parameterization of the moments, and in a thread on nuclearphynance.com, some members (including N. Taleb) were discussing power-law distributions and how skewness & kurtosis might affect diversification. To wit:

    "One thing I would like to see you address is how skewness and kurtosis affect diversification. In the gaussian world, diversification is the great free lunch...
    But in the non-gaussian world, life is not as good as in the gaussian fairy tale. And let's not even talk about correlation."

    "There is the start of an answer to that in "Bouchaud and Potters"
    Theory of financial risk and derivative pricing.

    " for the sum of independant random variable, all the cumulant simply adds (..)
    Normalized cumulant thus decay with N (number of asset) for n (order of the cumulant) >2. The higher the cumulant the faster the decay \lambda^N_n >N^{1-n/2}
    Kurtosis, defined as the fourth order normalized cumulant thus decreases as 1/N"

    They note however further that for case of power law (instead of gaussian) the cumulant diverge and the sum of power law still behaves as a power law (page 22-23). They stress that the central limit theorem doesn't say anything about the tails of the portfolio, just about the "center".

    Further in the book they discuss portfolio theory using power law distribution while minimizing the VaR." (source: https://nuclearphynance.com/Show%20Post.aspx?PostIDKey=166742)

    I'm not promoting that site, I just frequent it when trying to get some street-wise perspective to all the math-heavy word-salad theory.

    Anyway, I'm curious as to whether you think any of the notions presented there have merit w.r.t. your experience. I didn't see discussion of finer details like frequency of their trades, and only some notes on types of strategies.

    Replies
    1. It's a well known stylised fact that most negative skew assets have high correlation in down markets (most vol selling short gamma like strategies got tanked in 2008). Therefore, without needing to look at the maths, it's obvious that a basket of -ve skew assets isn't really as diversified as you think it is.

      Personally I'm not too happy estimating parameters like co-skewness, because like skewness and kurtosis they are badly affected by a small number of outliers. A monte-carlo 'shuffle' type analysis will give more robust and insightful results, assuming that the data history is long enough to include at least one crisis period.

      (A shuffle analysis is basically sample from positions series, then sample from returns, then calculate a distribution of returns. This will give you a tail distribution that reflects the worst thing that could have happened to the worst kind of portfolio you had in the past)
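      A minimal sketch of that shuffle analysis (assuming aligned daily pandas Series of positions and returns; illustrative only):

      import numpy as np
      import pandas as pd

      def shuffled_pandl_distribution(positions, returns, n_monte=5000, horizon=250):
          # sample positions and returns independently, so the worst past
          # positions can meet the worst past returns over each horizon
          total_pandls = []
          for _ in range(n_monte):
              sampled_positions = np.random.choice(positions.values, size=horizon)
              sampled_returns = np.random.choice(returns.values, size=horizon)
              total_pandls.append((sampled_positions * sampled_returns).sum())
          # the left tail of this distribution is the worst thing that could
          # have happened to the worst kind of portfolio you had in the past
          return pd.Series(total_pandls)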

    2. Thanks for the thoughtful response.

      In general (i.e. w/o skew analysis), the increased correlations have definitely been noticeable across indexes, energy futs, etc since the virus scare started this year.

      On a separate note, have you looked at power-law distributions for modeling markets/returns much? TBH it's a new one to me, and a few sources (notably Gabaix et. al "A Theory of Power Law Distributions in Financial Market Fluctuations") *seem* to imply that "conventional" returns distribution modeling is insufficient w.r.t. the speed/size of bigger market participants moving big volumes. From that paper, one excerpt jumped out: "crashes do not appear to be outliers of the distribution."

      A few plots of power-law vs Gaussian/normal that I saw appeared to show a significant lead in power-law curvature vs. lag in the Gaussian.

    3. P.S. I might be grossly mischaracterizing the applicability of power-law distributions here...

    4. Personally I am not excited about this stuff. It's neccessary if you're writing option pricing models, but for normal trading risk management I find that assuming a joint normal distribution plus a little common sense and maybe some monte carlo risk modelling goes a long way. And by common sense, I mean not loading up a portfolio with obviously negative skew strategies, and limiting your weight to super fat tailed stuff like VIX.

  4. Hi Rob, great post as always, I believe when calculating the skew forecast, the skew is divided by the daily vol (percentage points), I was just wondering the intuition for this. I believe skew will have a natural absolute average of 0.5, inferring a scalar of 20, however as each instrument will have a differing vol, the forecast will differ per market, but given the position sizing is controlled by the inverse volatility, was just wondering the need to normalise the skew by the percentage volatility prior to adjusting the position via the inverse volatility?

    Replies
    1. I guess if i have a more volatile market, the skew might be more apparent, so i should normalise it by dividing by the vol, resulting in the forecast scalar needing to be recalculated as it most likely will not be 0.5

    2. We divide by the daily vol of the skew, not of the returns of the underlying instrument. That is a generic technique that will turn any factor into a scale free factor, assuming that the returns are roughly normal.

      Any scale free factor can then be divided by instrument volatility to get a position size.

  5. also the returns are likely to be far from normal and hence more extreme skew.

  6. Clearly lockdown affecting me, that makes more sense!

