Tuesday 12 November 2019

Kurtosis and expected returns

In my last post, I stated my intention to write a series of posts about skew.

Slight change of plan, since one loyal reader suggested that I write about kurtosis. I thought that might be fun, since I haven't thought about kurtosis much, and the literature on kurtosis isn't as well developed. It turns out that considering both together leads to some very interesting results.

The plan is to basically repeat my previous analysis of skew for kurtosis. Then my subsequent posts on this subject will discuss both skew AND kurtosis. Hope that makes some kind of sense.

Not "everything you always wanted to know about kurtosis, but were afraid to ask", but enough to understand this post

The first four moments of a distribution are:

- mean
- standard deviation
- skew
- kurtosis

In layman's terms, these define:

- how positive or negative the "middle"* of a distribution is
- how wide the distribution is
- how symmetric the distribution is, or is not
- whether the distribution is mostly lumped into the middle, or spreads out to the edges

* notice I've used the comically imprecise term 'middle' here to avoid any mean vs median arguments

High kurtosis then means extreme events are more common than a vanilla Gaussian distribution would suggest. High kurtosis means fat tails. High kurtosis means every financial time series, ever.

Interpreting the first 3 moments is pretty easy, kurtosis less so. In this post I'm going to be using the standard kurtosis measure used in pandas. For reasons too boring to go into, the kurtosis of a normal distribution is 3. The pandas measure looks at excess kurtosis, so a figure of 0 means 'the normal amount of kurtosis'*, and any positive number means more than that. What does a kurtosis of 1.0 mean? Or 10.0? No idea really, since I don't have any intuition for the figure - one of the reasons to do this post is to get some.

* It's worth checking this yourself by generating random Gaussian data and measuring the kurtosis. I leave this as an exercise for the reader.
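For anyone who wants to try that exercise, here is a minimal sketch in plain Python (no pandas needed). The `excess_kurtosis` helper is hypothetical; it implements the bias-corrected Fisher ("G2") estimator, which as far as I understand the pandas documentation is what `Series.kurt()` reports. A normal sample should come out near 0. Uniform data (theoretical excess kurtosis -1.2) and Laplace data (theoretical 3) are included only to give some feel for what smaller and larger figures look like.

```python
import random


def excess_kurtosis(xs):
    """Bias-corrected excess kurtosis (G2); a normal distribution scores about 0."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    m4 = sum((x - mean) ** 4 for x in xs)  # sum of fourth-power deviations
    first = n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
    adjustment = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - adjustment


rng = random.Random(42)
n = 100_000
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
uniform = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# Laplace draws: an exponential variate with a random sign
laplace = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(n)]

for name, sample in [("gaussian", gaussian), ("uniform", uniform), ("laplace", laplace)]:
    print(f"{name:>8}: {excess_kurtosis(sample):6.2f}")
```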

I won't be repeating the code since it's identical to the previous post, but with the string 'skew' replaced with 'kurtosis'.

Niggle: Not quite true, thanks to the bizarre and ever changing pandas API. These will both give a single figure for the whole Series:

percentage_returns[code].kurtosis(0) # omitting the zero will give a rolling figure(!)
percentage_returns[code].kurt() # same as kurtosis(0)

Variance in kurtosis estimates

So how accurately can we measure kurtosis? Here are some bootstrapped distributions of sample kurtosis. Firstly, here it is for EURUSD, which has a relatively low kurtosis:

Now here are US 2 year bonds, which have a relatively high kurtosis:

Somewhat unsurprisingly, the higher the kurtosis the wider the range of estimates. Big kurtosis means outliers; when resampling we'll sometimes catch few of the outliers and sometimes get more than our fair share of them, so there is a big variation in the estimated kurtosis.
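To see the effect without any market data, here is a minimal sketch of a simple bootstrap (resample N daily returns with replacement, re-estimate kurtosis, repeat). The two input series are synthetic and hypothetical: a Gaussian "calm" series, and the same series with a +/-8 jump every 100th day. The helper names and the 5% to 95% interval are my own choices for illustration.

```python
import random


def excess_kurtosis(xs):
    """Bias-corrected excess kurtosis (G2); about 0 for normal data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    m4 = sum((x - mean) ** 4 for x in xs)
    return (n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def bootstrap_kurtosis(returns, n_draws=500, seed=0):
    """Resample the returns with replacement and re-estimate kurtosis each time."""
    rng = random.Random(seed)
    n = len(returns)
    return [excess_kurtosis(rng.choices(returns, k=n)) for _ in range(n_draws)]


def interval(values, lower=0.05, upper=0.95):
    """Crude empirical percentile interval of a list of estimates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


rng = random.Random(1)
calm = [rng.gauss(0.0, 1.0) for _ in range(1_000)]
# Hypothetical fat-tailed series: the same data with a +/-8 jump every 100th day
jumpy = list(calm)
for i in range(0, len(jumpy), 100):
    jumpy[i] = 8.0 if i % 200 == 0 else -8.0

for name, series in [("calm", calm), ("jumpy", jumpy)]:
    lo, hi = interval(bootstrap_kurtosis(series))
    print(f"{name:>5}: point {excess_kurtosis(series):6.2f}, "
          f"90% interval {lo:6.2f} to {hi:6.2f}")
```

The jumpy series gets a much wider interval because each resample contains a different number of the jumps.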

Let's do a boxplot for everything:

Unlike skew there aren't any obvious patterns here; assets we'd expect to have similar kurtosis (like US bond futures) are all over the shop: 20 years are low, 5 years have a bit more, 10 years a bit more again, and 2 years have absolutely loads of the stuff.

Some of this is due to time varying regimes, a problem which will go away later in the post. For example here are US 2 year daily returns:

Pretty crazy in the financial crisis, and then they settle down somewhat. Measuring kurtosis post 2009 is likely to give a very different answer.
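Regime changes can create kurtosis on their own. As a minimal sketch with hypothetical synthetic data: if returns are Gaussian within each regime but a quarter of the days have three times the volatility, each regime measured separately shows roughly zero excess kurtosis, while the whole sample should come out somewhere around 4 (the theoretical value for this particular mixture).

```python
import random


def excess_kurtosis(xs):
    """Bias-corrected excess kurtosis (G2); about 0 for normal data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    m4 = sum((x - mean) ** 4 for x in xs)
    return (n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


rng = random.Random(7)
# Hypothetical regimes: Gaussian within each, but with different volatility
crisis = [rng.gauss(0.0, 3.0) for _ in range(4_000)]
calm = [rng.gauss(0.0, 1.0) for _ in range(12_000)]
whole = crisis + calm

print(f"crisis only: {excess_kurtosis(crisis):5.2f}")
print(f"calm only:   {excess_kurtosis(calm):5.2f}")
print(f"whole:       {excess_kurtosis(whole):5.2f}")
```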

Do assets with generally higher kurtosis have higher returns?

It was easy to tell a story about skew preference; negative skew is clearly a bad thing and we should be rewarded for holding it.

Should we get paid for high kurtosis? There are two possibilities. If the kurtosis is coming from a positive fat tail we might expect people to overpay for the chance to 'win a lottery': they will prefer high kurtosis, and there will be higher expected returns for low kurtosis assets. But if the kurtosis is coming from a negative fat tail then people will dislike it a lot.

Anyway, on average we're not paid for kurtosis:

X-axis: kurtosis over whole sample. Y-axis: average daily return

Ignoring the two vol markets, whose kurtosis is nowhere near as bad as their skew, the relationship looks ... slightly negative?

Here's a boxplot showing the distribution of resampled daily returns for high kurtosis (over 5.5) versus low kurtosis (under 5.5) instruments:

It does look like low kurtosis is better than high, suggesting the 'lottery ticket preference' is holding up here: people overpay for high kurtosis. But we need to condition on skew to determine whether that hypothesis is correct or not.

Incidentally if you are worried about the vol markets, VIX comes in as low kurtosis and V2X as high.
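The high versus low split itself is simple. Here is a minimal sketch, assuming each instrument is just a list of daily returns and using the 5.5 cutoff from the text; the instrument codes and their return series are synthetic and purely hypothetical, so the output says nothing about real markets.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Bias-corrected excess kurtosis (G2); about 0 for normal data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    m4 = sum((x - mean) ** 4 for x in xs)
    return (n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def average_return_by_kurtosis(returns_by_instrument, threshold=5.5):
    """Split instruments at a kurtosis threshold; map each to its mean daily return."""
    high, low = {}, {}
    for code, rets in returns_by_instrument.items():
        bucket = high if excess_kurtosis(rets) > threshold else low
        bucket[code] = statistics.fmean(rets)
    return high, low


def make_series(rng, jump_every=None, n=2_000):
    """Hypothetical daily returns: Gaussian noise, optionally with +/-10% jumps."""
    rets = [rng.gauss(0.0003, 0.01) for _ in range(n)]
    if jump_every:
        for i in range(0, n, jump_every):
            rets[i] = 0.1 if (i // jump_every) % 2 == 0 else -0.1
    return rets


rng = random.Random(3)
universe = {
    "CALM_A": make_series(rng),
    "CALM_B": make_series(rng),
    "JUMPY_A": make_series(rng, jump_every=50),
    "JUMPY_B": make_series(rng, jump_every=50),
}
high, low = average_return_by_kurtosis(universe)
print("high kurtosis:", sorted(high), "mean", statistics.fmean(high.values()))
print("low kurtosis: ", sorted(low), "mean", statistics.fmean(low.values()))
```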

Does the above finding still hold for risk adjusted returns, i.e. Sharpe Ratios?

Looks like something is still there, although perhaps not as significant as for outright returns. Lower kurtosis assets seem to have about 0.2 SR points extra per year.

Does an asset with currently higher kurtosis outperform one which has lower current kurtosis? (time series forecasting)

As with skew I'm going to measure kurtosis over different time periods; from a week up to a year of historic returns. Then I will do a t-test to see if assets that currently have higher kurtosis than the global median (about ) outperform those with lower kurtosis.
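Here is a minimal sketch of that test for a single instrument. It assumes non-overlapping forward windows and a Welch two-sample t-statistic; the post doesn't say which t-test variant was used, so that choice (and all the function names) is mine. The cutoff defaults to this instrument's own median, but the global median across instruments can be passed in instead, which is closer to what the text describes. The returns are synthetic.

```python
import math
import random
import statistics


def excess_kurtosis(xs):
    """Bias-corrected excess kurtosis (G2); about 0 for normal data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    m4 = sum((x - mean) ** 4 for x in xs)
    return (n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def welch_t(a, b):
    """Welch two-sample t-statistic for mean(a) - mean(b)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def kurtosis_vs_forward_returns(returns, lookback, horizon):
    """Pair trailing kurtosis with the mean return over the next `horizon` days.

    Windows step forward by `horizon`, so the forward periods do not overlap.
    """
    pairs = []
    for t in range(lookback, len(returns) - horizon + 1, horizon):
        trailing = excess_kurtosis(returns[t - lookback:t])
        forward = statistics.fmean(returns[t:t + horizon])
        pairs.append((trailing, forward))
    return pairs


def high_minus_low_t(pairs, cutoff=None):
    """t-statistic of forward returns when kurtosis is above vs at-or-below the cutoff."""
    if cutoff is None:
        cutoff = statistics.median([k for k, _ in pairs])
    high = [f for k, f in pairs if k > cutoff]
    low = [f for k, f in pairs if k <= cutoff]
    return welch_t(high, low)


rng = random.Random(11)
returns = [rng.gauss(0.0, 0.01) for _ in range(2_000)]  # hypothetical daily returns
pairs = kurtosis_vs_forward_returns(returns, lookback=20, horizon=5)
print(f"{len(pairs)} observations, t = {high_minus_low_t(pairs):.2f}")
```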

First the Sharpe Ratios, conditional on recent kurtosis:

It looks like, unlike skew, the preference for high kurtosis is something that appears most strongly at short horizons.

Now the t-statistics, comparing low and high kurtosis:

Basically noise.

How do current skew and kurtosis forecast future returns? (time series forecasting)

As I mentioned above there is a big difference between high kurtosis coming from positive returns, and the same from negative returns. Perhaps we will see something more interesting if we look at the combination of skew and kurtosis.

Same as before, different frequencies, but this time we look at both skew and kurtosis preceding the date when we estimate a forward looking SR. First Sharpe Ratios:

This is without doubt the most interesting graph so far.

(Sure, but quite a low bar to beat...)

Remember we had two hypotheses about kurtosis:
  • People dislike lumpy returns, and want to avoid them. High kurtosis should always pay more than low kurtosis.
  • People only dislike lumpy returns if they're negative. They're happy to pay more for lottery tickets. High kurtosis should outperform low kurtosis for positive skew assets. For negative skew assets the relationship should be reversed

Let's summarise the findings of this graph and the previous post:

  • Negative skew* assets outperform, most strongly at longer horizons (from my previous blog post).
  • High kurtosis assets underperform, most strongly at shorter horizons (discussed earlier)
  • Within assets that have high kurtosis, at short horizons positive skew is rewarded. At longer horizons there is nothing meaningful.
  • Within assets that have low kurtosis, at longer horizons negative skew is rewarded. At shorter horizons there is nothing meaningful.
  • Within assets that have positive skew, at short horizons high kurtosis is rewarded 
  • Within assets that have negative skew, kurtosis is irrelevant

* incidentally I'm using zero as my skew cutoff here for simplicity, as in the previous post I determined it didn't make much difference. For kurtosis there is no 'natural' cutoff, so I'm sticking to the historic sample median of around 5.5
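As a minimal sketch of the conditioning: label each observation by skew relative to zero and kurtosis relative to a cutoff (5.5 here, the historic median quoted above), then compute an annualised Sharpe ratio of the following returns in each bucket. The function names, the (trailing skew, trailing kurtosis, next-day return) input layout, and the 256-trading-day annualisation are my assumptions, not necessarily how the original analysis was coded.

```python
import math
import statistics
from collections import defaultdict


def quadrant(skew, kurtosis, skew_cutoff=0.0, kurtosis_cutoff=5.5):
    """Label an observation by which side of each cutoff it falls on."""
    skew_label = "pos skew" if skew > skew_cutoff else "neg skew"
    kurt_label = "high kurt" if kurtosis > kurtosis_cutoff else "low kurt"
    return skew_label, kurt_label


def annualised_sharpe(daily_returns, days_per_year=256):
    """Mean over standard deviation of daily returns, scaled to annual."""
    return (statistics.fmean(daily_returns) / statistics.stdev(daily_returns)
            * math.sqrt(days_per_year))


def sharpe_by_quadrant(observations, **cutoffs):
    """observations: iterable of (trailing skew, trailing kurtosis, next-day return)."""
    groups = defaultdict(list)
    for skew, kurt, fwd in observations:
        groups[quadrant(skew, kurt, **cutoffs)].append(fwd)
    # A Sharpe ratio needs at least two returns to have a standard deviation
    return {q: annualised_sharpe(rets) for q, rets in groups.items() if len(rets) > 1}
```

In use, the observations would be pooled across instruments and dates, one per (instrument, date) pair.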

Or to put it another way:

  • The dominance of negative skew assets at longer horizons is only relevant for assets with low kurtosis.
  • The outperformance of high kurtosis assets at shorter horizons is only relevant for assets with positive skew.

Unlike a simple high skew / low skew strategy, it's difficult to pick apart which relationships we should focus on here, especially if we want to put things into a continuous forecast framework without implicit fitting (it would be very easy to create a set of binary rules which embodied the above findings, and which would therefore embed loads of forward-looking information). Implicit overfitting is particularly likely here, as we don't have the simple intuitive results we got from outright skew.

I decided to boil the above down to two simple trading rules:

The skew rule

{(skew - Average skew) / Sigma [Skew]} 
* sign (Kurtosis - average kurtosis)

The kurtosis rule

{(Kurtosis - Average kurtosis) / Sigma [Kurtosis]} 
* sign (Skew - average skew)

... where for this specification the average is across all instruments and all past history (currently done entirely in sample, but in a trading rule will be based on an expanding window), and sigma is a standard deviation based on the past history of all instruments (the sigma is not important now, but will be when we come to design trading rules to ensure we have properly normalised forecasts).

This has the advantage of being relatively parsimonious and symmetrical, albeit a bit non-linear. There is still potentially an issue with implicit fitting, but we can deal with that in later posts.

Under these conditions we'd have the following positions in both rules:

  • High kurtosis, high skew: both long (profitable at short horizons)
  • High kurtosis, low skew: both short (does relatively badly at short horizons)
  • Low kurtosis, high skew: both short (does relatively badly, especially at long horizons)
  • Low kurtosis, low skew: both long (does relatively well at long horizons)

(It's possible to combine these into a single rule; however, I like the idea of having separate skew and kurtosis rules, since the effects work differently at varying horizons.)
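The two rules can be sketched in a few lines. This is a minimal sketch, not a production forecast: `skew_history` and `kurtosis_history` are hypothetical pooled lists of past estimates across all instruments (in a real trading rule they would grow on an expanding window using only data available at the time), and no forecast scaling or capping is applied.

```python
import statistics


def sign(x):
    """+1 for positive, -1 for negative, 0 for exactly zero."""
    return (x > 0) - (x < 0)


def skew_rule(skew, kurtosis, skew_history, kurtosis_history):
    """Normalised skew, flipped when kurtosis is below its average."""
    z_skew = (skew - statistics.fmean(skew_history)) / statistics.stdev(skew_history)
    return z_skew * sign(kurtosis - statistics.fmean(kurtosis_history))


def kurtosis_rule(skew, kurtosis, skew_history, kurtosis_history):
    """Normalised kurtosis, flipped when skew is below its average."""
    z_kurt = ((kurtosis - statistics.fmean(kurtosis_history))
              / statistics.stdev(kurtosis_history))
    return z_kurt * sign(skew - statistics.fmean(skew_history))


# Hypothetical pooled history of past skew and kurtosis estimates
skew_history = [-1.0, -0.5, 0.0, 0.5]
kurtosis_history = [2.0, 4.0, 6.0, 8.0]

cases = [
    ("high kurtosis, high skew", 0.5, 8.0),
    ("high kurtosis, low skew", -1.0, 8.0),
    ("low kurtosis, high skew", 0.5, 2.0),
    ("low kurtosis, low skew", -1.0, 2.0),
]
for label, skew, kurt in cases:
    s = skew_rule(skew, kurt, skew_history, kurtosis_history)
    k = kurtosis_rule(skew, kurt, skew_history, kurtosis_history)
    print(f"{label:26}: skew rule {s:+.2f}, kurtosis rule {k:+.2f}")
```

The signs reproduce the four cases listed above: both rules long in the two "matching" quadrants and both short in the two "opposite" ones.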

Let's look at the t-statistics:

(For example 'pos skew rule' is the kurtosis rule applied when skew is positive and so on; really this ought to be 'pos skew, kurtosis rule' but you get the idea).

Here positive t-statistic means a rule is working. It looks like all the rules work pretty well at a one month frequency, with the skew rule working especially well for longer periods when kurtosis is low.

Does an asset with different skew / kurtosis than normal perform better than average (normalised time series)?

We can modify the rules above so that instead of using the average across all assets we will actually use the average for a given instrument (we can also modify the standard deviation once we get to producing actual forecasts).
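A minimal sketch of the time-series demeaning step, assuming we only ever subtract the average of an instrument's strictly earlier estimates so nothing from the future leaks in. The function name and the `min_history` parameter are hypothetical; the demeaned values would then take the place of "value minus pooled average" in the rules.

```python
def expanding_demean(values, min_history=2):
    """Subtract from each value the mean of all strictly earlier values.

    Returns None until `min_history` earlier values are available.
    """
    if min_history < 1:
        raise ValueError("min_history must be at least 1")
    demeaned, running_sum = [], 0.0
    for i, value in enumerate(values):
        # i is the number of earlier values seen so far
        demeaned.append(value - running_sum / i if i >= min_history else None)
        running_sum += value
    return demeaned


# Hypothetical history of one instrument's rolling skew estimates
print(expanding_demean([1.0, 3.0, 5.0, 7.0]))
```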

Interestingly the rules seem to be bad at the original sweet spot, although skew conditioned on low kurtosis still does very well at longer horizons.

Now let's demean on the current average across all assets:

Ouch. A pretty poor performance. The skew rules (red and green) in particular are very sensitive to frequency.

Finally let's do the same thing, but this time demean by the current median skew and kurtosis for a given asset class. 
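Both cross-sectional variants can be sketched the same way: subtract today's average across all instruments, or today's median within each instrument's asset class. The instrument codes, values and asset-class mapping below are hypothetical; in practice the same adjustment would be applied to both current skew and current kurtosis before they go into the rules.

```python
import statistics
from collections import defaultdict


def cross_sectional_demean(current):
    """Subtract today's average across all instruments from each instrument's value."""
    average = statistics.fmean(current.values())
    return {code: value - average for code, value in current.items()}


def asset_class_demean(current, asset_class):
    """Subtract today's median within each instrument's asset class."""
    by_class = defaultdict(list)
    for code, value in current.items():
        by_class[asset_class[code]].append(value)
    medians = {cls: statistics.median(values) for cls, values in by_class.items()}
    return {code: value - medians[asset_class[code]] for code, value in current.items()}


# Hypothetical snapshot of current skew estimates and asset classes
current_skew = {"BOND1": 1.0, "BOND2": 3.0, "BOND3": 10.0, "EQ1": 2.0, "EQ2": 6.0}
asset_class = {"BOND1": "bond", "BOND2": "bond", "BOND3": "bond",
               "EQ1": "equity", "EQ2": "equity"}

print(cross_sectional_demean(current_skew))
print(asset_class_demean(current_skew, asset_class))
```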

Again, not really the best result.


  • It's hard to estimate kurtosis with any certainty, even harder when kurtosis is large (outliers)
  • Unexpectedly we don't get paid for owning assets with high kurtosis
  • .... and then it gets complicated

Yes, there are an awful lot of results in this post!

The key finding is that, as you may expect, skew and kurtosis have more forecasting power when they are conditioned on each other. Generally we want to own instruments that have had high kurtosis and relatively positive skew: these are lottery tickets which for some reason the market undervalues. We also want to own instruments that have low kurtosis and relatively negative skew; here we get rewarded for negative skew without suffering too many outliers. Instruments where skew and kurtosis are in opposite directions are less attractive.

These effects don't persist that well when we use different demeaning techniques, unlike in skew world where they hold up quite well.

It's worth reflecting on what I have done so far. In the last post I considered 4 different skew trading rules (outright, time series demean, cross sectional demean, asset class cross sectional demean). In this one I've effectively come up with another 8: 4 for skew conditioned on kurtosis, 4 for kurtosis conditioned on skew. That's a total of 12 different trading rules, each of which potentially has 6 different variations for different lookbacks.

Though it would be tempting to select a few of these for further testing that would be implicit fitting; I would be doing so based on the analysis I have done so far having looked at all the data. Instead the right and proper thing will be to take forward all 12 rules into an analysis where their risk weights are fitted systematically in a backward looking framework. So that's the next post.


  1. Thank you very much, Rob, for performing the kurtosis tests so quickly.

  2. Thank you Rob, very interesting analysis! I have been pondering for quite a while about skew and kurtosis, and I find that considering both simultaneously gives more intuitive results despite being slightly more complex.

    It looks like what is needed now is a robust way to estimate those higher order moments of the distribution. I've read some people suggesting the use of quantiles to reconstruct a distribution, while in other fields I've seen instead using maximum likelihood estimates to fit high order polynomial distributions. From those reconstructed distributions they claim it is then possible to compute those higher order moments in a more robust or cheaper way. What's your take on that?

    1. It's certainly more robust to measure some distribution points, eg 1,25,50,75,99; and then look at the ratios and differences between these to measure both skew and kurtosis. These measures may or may not be more robust, they remove outliers but on the downside they're only looking at a few points of the distribution.

      I'm personally reasonably happy with the standard measure of skew, and an easy way to make it more robust is by only looking at values in the q1-q99 range. It's also possible to measure kurtosis in the same way, although a quantile based measure of kurtosis might have the advantage of being more intuitive.

      What's intriguing is the idea of creating a single joint measure of skew and kurtosis which exactly matches what we want to predict (distinguishing between a scenario when skew and kurtosis are lined up, versus when they are not), using quantile points as the input. I need to think about this more deeply.
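One possible quantile-based kurtosis measure, sketched under my own assumptions: take the ratio of the 1st-to-99th percentile range to the interquartile range, and divide by the same ratio for a normal distribution so a Gaussian sample scores about 1. This is just one illustrative choice of ratio, not a measure used in the post. For Laplace data the theoretical value is roughly 1.6.

```python
import random
import statistics
from statistics import NormalDist


def quantile_kurtosis(xs):
    """(q99 - q1) / (q75 - q25), scaled so that a normal distribution scores about 1."""
    q = statistics.quantiles(xs, n=100)  # 99 cut points; q[k - 1] is the k-th percentile
    ratio = (q[98] - q[0]) / (q[74] - q[24])
    nd = NormalDist()
    normal_ratio = ((nd.inv_cdf(0.99) - nd.inv_cdf(0.01))
                    / (nd.inv_cdf(0.75) - nd.inv_cdf(0.25)))
    return ratio / normal_ratio


rng = random.Random(5)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
laplace = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(100_000)]
print(f"gaussian {quantile_kurtosis(gaussian):.2f}, laplace {quantile_kurtosis(laplace):.2f}")
```

Because it only looks at a handful of percentiles, a single extreme day barely moves it, which is the robustness trade-off described above.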

  3. Your 'kurtosis rule' formula is shown as an exact duplicate of the 'skew rule'...is that correct, or did you mean to substitute kurtosis for skew in the latter?

    1. Yes well spotted. Fixed.

    2. Thank you for writing this post; very interesting, and a topic I had bookmarked for analysis (among dozens of other things).
      Another question - where you mention 'high skew' and 'low skew', are these terms just representing positive and negative skew, respectively, and named so to keep consistency with kurtosis term?

    3. ...perhaps that's a never-mind, looks like you answered it with "(For example 'pos skew rule' is the kurtosis rule applied when skew is positive and so on; really this ought to be 'pos skew, kurtosis rule' but you get the idea)"

    4. Not quite: high skew means higher than average. Since the average is a bit negative (over the long run) it's possible to have high skew that's actually still negative. For the cross sectional and normalised stuff, the average could easily be very positive or very negative depending on the instrument and time period considered.

  4. Dear author, I have two questions.

    1. Do you use the market breadth indicator, options put-call ratio and things like that?

    2. In the Systematic Trading book you say that you use the weighted sum of three exponential moving averages crossovers for forecasting. It is quite complicated. Would you please post some proofs that it works better than just a single EMA crossover?

    1. 1. I don't. Doesn't mean they aren't valid, I just don't use them.

      2. Proof is simple; assuming correlation < 1 and the same Sharpe ratio, a diversified set of trading rules will always outperform a single trading rule.
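The arithmetic behind that answer, under simplifying assumptions that are mine (equal weights, equal volatility, equal Sharpe ratio S for every rule, equal pairwise correlation ρ, no costs): N rules combined have a Sharpe ratio of S·sqrt(N / (1 + (N - 1)ρ)), which is above S whenever N > 1 and ρ < 1. With correlations as high as 0.9 the gain is only a few percent, which is why it is hard to detect statistically.

```python
import math


def diversified_sharpe(single_sharpe, n_rules, correlation):
    """Sharpe ratio of an equally weighted mix of rules with equal Sharpe ratio,
    equal volatility and equal pairwise correlation."""
    return single_sharpe * math.sqrt(n_rules / (1 + (n_rules - 1) * correlation))


for rho in (1.0, 0.9, 0.5):
    print(rho, round(diversified_sharpe(0.5, 3, rho), 3))
```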

    2. You mean that 8-32 EMA crossover is considered more noisy than 16-64 EMA crossover, but the difference in performance won't be statistically significant?

    3. If by more noisy you mean lower realised Sharpe Ratio, then yes the difference in performance between different crossovers is not significant.

  5. How about describing in detail the calculation of the annualized Sharpe ratio and correlation between instruments using financial time series bootstrapping?

    1. Well briefly; since you can always look at the code or google this; for a return series length N you randomly select with replacement N values to form a new series and calculate the required statistic Si. You do this for i=0... some big number, and then you have the distribution of Si for whatever purpose you require.
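A minimal sketch of that procedure for the annualised Sharpe ratio, assuming 256 trading days a year (many people use 252) and synthetic, hypothetical daily returns. This is the plain i.i.d. bootstrap described in the reply, so it resamples days independently and ignores any autocorrelation or volatility clustering. The helper names are my own.

```python
import math
import random


def annualised_sharpe(daily_returns, days_per_year=256):
    """Mean over sample standard deviation of daily returns, scaled by sqrt(days per year)."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    variance = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return mean / math.sqrt(variance) * math.sqrt(days_per_year)


def bootstrap_statistic(returns, statistic, n_draws=500, seed=0):
    """Draw N values with replacement, compute the statistic, repeat n_draws times."""
    rng = random.Random(seed)
    n = len(returns)
    return [statistic(rng.choices(returns, k=n)) for _ in range(n_draws)]


rng = random.Random(2)
returns = [rng.gauss(0.0004, 0.01) for _ in range(2_500)]  # hypothetical daily returns
draws = sorted(bootstrap_statistic(returns, annualised_sharpe))
print(f"point estimate {annualised_sharpe(returns):.2f}, "
      f"90% interval {draws[25]:.2f} to {draws[474]:.2f}")
```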

    2. Simple bootstrap is not a good idea for stock price time series. I have read the bootstrap 101 appendix in your book, but many questions remain, and I can't find good answers.

  6. One more question. Have you explored Kalman filter as an alternative to moving averages crossover as trading signal?

    1. Sure it works well enough but I don't understand it so well so I don't use it.

    2. My real question is whether studying Kalman filter for replacing the moving averages crossover is worth the effort.

    3. My real answer is probably not, at least if you are doing it with the expectation of getting a performance gain. You might get a small boost from running *both* Kalman and EWMA. You might prefer Kalman for other reasons.

    4. You might get a small boost from running *both* Kalman and EWMA - wow, great idea, thanks.

      You might prefer Kalman for other reasons - would you please explain?

    5. If you find its behaviour more intuitive, for example.

  7. What is your opinion on Renko charts (bricks)?

    1. I don't know what they are. So I'm not really qualified to comment.

