Tuesday 10 December 2019

New and Improved Sharpe Ratio adjustment in the handcrafting method

In my recent posts on skew and kurtosis I've put together a large number of ideas for possible trading strategies. The next step will be to create and test these ideas out. However I already know from my initial analysis that many of these ideas will probably have poor performance. This leaves me in something of a conundrum. In fact this is just one example of the conundrum that strategy developers face all the time.

Do I ignore these strategies completely, which will result in my historic back test being overstated (a sin I have named implicit fitting)? Or do I include them in a portfolio of strategies, and hope that the allocation methodology will downweight them because of their poor performance?

The problem is that methods for portfolio allocation tend to fall into one of two camps. There are robust methods that completely ignore performance, from the simple 'one over n' equal weighting all the way up to more complex methods like Hierarchical Risk Parity (HRP). Those that do use the performance of strategies, like the basic naive Markowitz optimisation, tend to be very un-robust. Small differences in mean result in big differences in allocation.

There is no easy answer here. On the one hand, the mean (or equivalently Sharpe Ratio) is the moment of the distribution with the largest sampling error, and which is the hardest to predict. On the other hand, there really ought to be some way to downweight and ultimately drop consistently money losing strategies.

In the handcrafting method, which I wrote about in some detail this time last year (the first post is here), I use a simple heuristic technique to adjust weights for different Sharpe Ratios. Basically it's the yellow line on this graph:

The x-axis shows the relative SR of an asset compared to the average of a portfolio (or sub-portfolio). The y-axis is the weighting that is applied. So for an asset with the average SR, the weighting factor is exactly 1.

The method is then simple: we multiply all the weights in a (sub)portfolio by the weighting factor given by their relative SR. We then re-normalise the weights so they add up to 100%. Assets with higher SR have higher weights, and vice-versa. But extreme weights don't occur, since the weight multiplier is never below 0.6 or above 1.4.

(As an aside, the handcrafting method also accounts for uncertainty in correlations, but not in standard deviations. The latter is relatively small and has minimal effect on portfolio weights)

The heuristic has the advantage of being robust, fast and intuitive. The disadvantage is that this technique does not account for important factors. In particular, we can become more confident about our weighting when (a) we have more data and (b) the correlation between assets is higher. In the right circumstances a weight of almost zero would be justified - effectively entirely removing an asset or trading rule variation from the portfolio.

In this post I modify the simple heuristic to account for these factors. Although this slightly increases the calculation time for the handcrafted optimisation, it is worth doing. There will be python code.

Where does the heuristic come from?

The heuristic essentially comes from doing a portfolio optimisation on a very simple arbitrary portfolio, which has two assets with:

  • Asset one: some arbitrary Sharpe Ratio (I use SR=0.5 which makes sense for the sorts of trading rules I use)
  • Asset two: A higher or lower Sharpe Ratio
  • Both assets: Some arbitrary identical standard deviation (I use 15% though in fact this doesn't affect the results)
  • Some correlation
If both assets had the same Sharpe Ratio then the optimal allocation would be 50% each, irrespective of correlation. We can check this:

### Following code is boilerplate for optimising
from scipy.optimize import minimize
import numpy as np
import scipy.stats as stats
import pandas as pd

def optimise(mean_list, avg_correlation, std): corr_matrix = boring_corr_matrix(len(mean_list), offdiag=avg_correlation) stdev_list = [std]*len(mean_list) sigma = sigma_from_corr_and_std(stdev_list, corr_matrix) return optimise_with_sigma(sigma, mean_list) def optimise_with_sigma(sigma, mean_list): mus = np.array(mean_list, ndmin=2).transpose() number_assets = sigma.shape[1] start_weights = [1.0 / number_assets] * number_assets # Constraints - positive weights, adding to 1.0
    bounds = [(0.0, 1.0)] * number_assets
    cdict = [{'type': 'eq', 'fun': addem}]
    ans = minimize(
        start_weights, (sigma, mus),
    weights = ans['x']

    return weights

def neg_SR(weights, sigma, mus):
    # Returns minus the Sharpe Ratio (as we're minimising)    
    weights = np.matrix(weights)
    estreturn = (weights * mus)[0, 0]
    std_dev = (variance(weights, sigma)**.5)

    return -estreturn / std_dev

def variance(weights, sigma):
    # returns the variance (NOT standard deviation) given weights and sigma    
    return (weights * sigma * weights.transpose())[0, 0]

def sigma_from_corr_and_std(stdev_list, corrmatrix):
    stdev = np.array(stdev_list, ndmin=2).transpose()
    sigma = stdev * corrmatrix * stdev

    return sigma

def addem(weights):
    # Used for constraints    
    return 1.0 - sum(weights)

def boring_corr_matrix(size, offdiag=0.99, diag=1.0):
    """    Create a boring correlation matrix
    :param size: dimensions    
    :param offdiag: value to put in off diagonal    
    :param diag: value to put in diagonal    
    :return: np.array 2 dimensions, size    
    size_index = range(size)

    def _od(i, j, offdiag, diag):
        if i == j:
            return diag
            return offdiag

    m = [[_od(i, j, offdiag, diag) for i in size_index] for j in size_index]
    m = np.array(m)
    return m

# Both have same mean, and hence SR, just vary the correlation
>>> optimise([0.05, 0.05], 0.0, 0.15)
array([0.5, 0.5])
>>> optimise([0.05, 0.05], 0.9, 0.15)
array([0.5, 0.5])
>>> optimise([0.05, 0.05], -0.9, 0.15)
array([0.5, 0.5])

A small difference in means will result in crazy weights, especially if we have highish correlation:

optimise([0.05, 0.06], 0.9, 0.15)
array([1.11022302e-16, 1.00000000e+00])

There is more of this kind of analysis in Smart Portfolios if you're curious

We need some way of accounting for the uncertainty in our mean estimate (in principal we should do the same for correlation and vol; but the handcrafting method already does this for correlation, and vol is relatively predictable).

Let's begin with the textbook formula for sampling error for a difference in means:

Line 1: Standard deviation of the mean estimate, asset i, given sigma_i (annualised standard deviation) and N is the number of years of data
Line 2: Ditto for asset j
Line 3: Standard deviation of the difference in means
Line 4: If both assets have the same standard deviation this simplifies to the final line.

This gives us the standard deviation of the distribution of the differences in mean estimate. The mean of that distribution is just the difference between the two sample means. Thanks to central limit theorem, the distribution is Gaussian .So we can work out the cumulative probability of any particular difference in mean estimate.

Here is an implementation of this in python:

def calculate_omega_difference(std, years_of_data, avg_correlation):
    omega_one_asset = std / (years_of_data)**.5    
    omega_variance_difference = 2*(omega_one_asset**2)*(1- avg_correlation)
    omega_difference = omega_variance_difference**.5
    return omega_difference

def calculate_confident_mean_difference(std, years_of_data, mean_difference, confidence_interval, avg_correlation):
    omega_difference = calculate_omega_difference(std, years_of_data, avg_correlation)
    confident_mean_difference = stats.norm(mean_difference, omega_difference).ppf(confidence_interval)

    return confident_mean_difference

# The 0.5 point will just be the mean difference
>>> calculate_confident_mean_difference(0.15, 10, 0.3, 0.5, 0.0)

# It's plausible (5% chance) the difference is quite a bit lower than 0.3 with enough data
>>> calculate_confident_mean_difference(0.15, 10, 0.3, 0.05, 0.0)

# With insufficient date we can't be confident the mean is even positive
>>> calculate_confident_mean_difference(0.15, 1, 0.3, 0.05, 0.0)

# ... unless the correlation is really high 
>>> calculate_confident_mean_difference(0.15, 1, 0.3, 0.05, 0.95)

# Trivially if we increase the mean difference the distribution is just shifted
>>> calculate_confident_mean_difference(0.15, 1, 0.6, 0.05, 0.95)

It's worth playing around with this code to get a feel for how the mean distribution is affected by (a) the correlation and (b) the amount of data available (it's less interesting to change the standard deviation or average mean).

We can now see how different levels of confidence in our mean estimate affect our portfolio weights. This function optimises, fixing the Sharpe Ratio of the second asset, and applying an adjustment to the first Sharpe Ratio.

def weights_given_SR_diff(SR_diff,  avg_correlation,  confidence_interval, years_of_data,
                          avg_SR=0.5, std=0.15, how_many_assets=2):
    """    Return the ratio of weight to 1/N weight for an asset with unusual SR
    :param SR_diff: Difference between the SR and the average SR. 0.0 indicates same as average    
    :param avg_correlation: Average correlation amongst assets    
    :param years_of_data: How long has this been going one    
    :param avg_SR: Average SR to use for other asset    
    :param confidence_interval: How confident are we about our mean estimate (i.e. cdf point)    
    :param how_many_assets: .... are we optimising over (I only consider 2, but let's keep it general)    
    :param std: Standard deviation to use
    :return: Ratio of weight, where 1.0 means no difference    
    # From SR to means
    average_mean = avg_SR * std
    asset1_mean = (SR_diff + avg_SR)*std

    mean_difference = asset1_mean - average_mean

    ## Work out what the mean is with appropriate confidence    
    confident_mean_difference = calculate_confident_mean_difference(std, years_of_data, mean_difference, confidence_interval, avg_correlation)

    confident_asset1_mean = confident_mean_difference + average_mean

    mean_list = [confident_asset1_mean]+[average_mean]*(how_many_assets-1)

    weights = optimise(mean_list, avg_correlation, std)

    return list(weights)

# Median estimate (50%, 0.5 point) for a modest SR improvement (0.2) with lowish
# correlation (0.3) and a decade of data (10). 
# Equivalent to optimise([0.7*.15, 0.5*.15], 0.3, 0.15)
>>> weights_given_SR_diff(0.2, 0.3, 0.5, 10)
[0.6529536546603577, 0.3470463453396423]

# It's plausible the weight to the first asset could be lower (10% point, 0.1)...
>>> weights_given_SR_diff(0.2, 0.3, 0.1, 10)
[0.14015358673119246, 0.8598464132688076]

# or higher (90% point, 0.9)...
>>> weights_given_SR_diff(0.2, 0.3, 0.9, 10)
[0.8741474140876935, 0.12585258591230658]

I'm going to do a sort of monte-carlo exercise, but not really. If we consider a number of fixed points on a distribution, equally spaced, then each are equally likely. For example the nine points, 10%, 20%, .... 90% are all equally likely to occur. If we optimise the portfolio weights for a 10% degree of confidence, then 20%, and so on; then take an average of the weights, we basically get the same effect as if we repeatedly sampled the return distribution in the normal way (the non-parametric bootstrapping I discussed in Systematic Trading). But the advantage of using this method is that we only need to run N optimisations (where N is hopefully not too big - to be determined shortly), and more interestingly we don't get the randomness of bootstrapping - the same parameters should always produce the same result.

def mini_bootstrap_ratio_given_SR_diff(SR_diff,  avg_correlation, years_of_data, avg_SR=0.5, std=0.15, how_many_assets=2,
    Do a parametric bootstrap of portfolio weights to tell you what the ratio should be between an asset which       
    has a higher backtested SR (by SR_diff) versus another asset(s) with average Sharpe Ratio (avg_SR)
    All assets are assumed to have same standard deviation and correlation
    :param SR_diff: Difference in performance in Sharpe Ratio (SR) units between one asset and the rest    
    :param avg_correlation: Average correlation across portfolio    
    :param years_of_data: How many years of data do you have (can be float for partial years)    
    :param avg_SR: Should be realistic for your type of trading    
    :param std: Standard deviation (doesn't affect results, just a scaling parameter)    
    :param how_many_assets: How many assets in the imaginary portfolio    
    :param p_step: Step size to go through in the CDF of the mean estimate    
    :return: float, ratio of weight of asset with different SR to 1/n weight    
    dist_points = np.arange(p_step, stop=(1-pstep)+0.000001, step=p_step)
    list_of_weights = [weights_given_SR_diff(SR_diff,  avg_correlation,
                                          confidence_interval, years_of_data, avg_SR=avg_SR, std=std,
                     for confidence_interval in dist_points]

    array_of_weights = np.array(list_of_weights)
    average_weights = np.nanmean(array_of_weights, axis=0)
    ratio_of_weights = weight_ratio(average_weights)

    if np.sign(ratio_of_weights-1.0)!=np.sign(SR_diff):
        # This shouldn't happen, and only occurs because weight distributions get curtailed at zero        
        return 1.0

    return ratio_of_weights

def weight_ratio(weights):
    Return the ratio of weight of first asset to 1/n weight
    :param weights:    
    :return: float    
    one_over_N_weight = 1.0/len(weights)
    weight_first_asset = weights[0]

    return weight_first_asset/one_over_N_weight

Notice that the function returns the 'weight ratio': the ratio of the optimised  weight to a 1/N weight, since that is what is used in the heuristic method.

There are several open questions here, relating to the parameters we have to specify. Average correlation and the amount of available data are key variables which I will explore in a moment. The choice of average Sharpe Ratio will influence our results to a lesser degree, but to keep things simple I will be sticking to avg_SR=0.5. Changing the standard deviation will have absolutely no effect on the results as it is just a scaling factor.

Finally there is a straightforward methodological issue - the choice of 'p-step' i.e.  how many of these optimisations do we need to do for an accurate answer? Is 0.1,0.2,... 0.9 sufficient (in the function, p-step=0.1) or do we need to be more granular?

import matplotlib
how_many_assets = 2
avg_SR = 0.5
avg_correlation = 0.5
years_of_data = 10
SR_diff_list = list(np.arange(-0.5, stop=0.5, step=0.05))
list_of_psteps = [0.2, 0.1, 0.05, 0.1, 0.01, 0.001]

results_over_psteps = dict()
for pstep in list_of_psteps:
    results_over_SR_diff = []
    for SR_diff in SR_diff_list:
        print("Pstep %f SR_diff %f Years of data %d" % (pstep, SR_diff, years_of_data))
        ratio = mini_bootstrap_ratio_given_SR_diff(SR_diff, how_many_assets,
                                                   avg_correlation, years_of_data, avg_SR=avg_SR,
                                                   p_step = pstep)
    results_over_psteps[pstep] = results_over_SR_diff

df_results = pd.DataFrame(results_over_psteps, index=SR_diff_list)
plot_title = "(Avg correlation %f Number of assets %d, Average SR %f Years of data %d; Lines show pstes %f:%f)" \
             % (avg_correlation, how_many_assets, avg_SR, years_of_data, list_of_psteps[0], list_of_psteps[-1])
matplotlib.pyplot.xlabel("SR relative to portfolio average SR")

I would say that a p-step of 0.01 (i.e. distributional points at 0.01, 0.02, 0.03...0.99) is more than sufficient to give an accurate result (very close to the much more finely grained 0.001).

A minor point: in this and all the following plots the lines do not (as you might expect) go through the point (0.0, 1.0). This is an artifact of the portfolio optimisation process, and not a bug. If I was using these plots in a book, I'd shift them to get them to look nice. As it is, because weights are always renormalised to 100%, it won't make any difference.

How is the heuristic weighting affected by having more data?

how_many_assets = 2
SR_diff_list = list(np.arange(-0.5, stop=0.5, step=0.05))
various_lengths_of_history = [2,5,10,20,30,40]
p_step = 0.01avg_correlation=0.5results_over_years = dict()
for years_of_data in various_lengths_of_history:
    results_over_SR_diff = []
    for SR_diff in SR_diff_list:
        print("Correlation %f SR_diff %f Years of data %d" % (avg_correlation, SR_diff, years_of_data))

        ratio = mini_bootstrap_ratio_given_SR_diff(SR_diff, avg_correlation, years_of_data, avg_SR=avg_SR,
                                                   std=std, how_many_assets=how_many_assets, p_step = p_step)
    results_over_years[years_of_data] = results_over_SR_diff

df_results = pd.DataFrame(results_over_years, index = SR_diff_list)
plot_title = "Avg correlation %f (Number of assets %d, Average SR %f; Lines show years of data %d:%d)" \
                        % (avg_correlation,     how_many_assets,   avg_SR,  various_lengths_of_history[0], various_lengths_of_history[-1])
df_results.plot(title = plot_title)
matplotlib.pyplot.xlabel("SR relative to portfolio average SR")
What we expect to see here is more data -> increase in confidence -> more extreme weights, when we have more history. And that is exactly what we see; with 2 years of data the lowest weight we can get to is around 0.5, whilst with 30 years plus it's almost zero.

How is the weighting affected by average correlations?

how_many_assets = 2
SR_diff_list = list(np.arange(-0.5, stop=0.5, step=0.05))
years_of_data = 10
p_step = 0.01
various_correlations=[0.0, 0.3, 0.5, 0.7, 0.9]
results_over_correl = dict()
for avg_correlation in various_correlations:
    results_over_SR_diff = []
    for SR_diff in SR_diff_list:
        print("Correlation %f SR_diff %f Years of data %d" % (avg_correlation, SR_diff, years_of_data))

        ratio = mini_bootstrap_ratio_given_SR_diff(SR_diff, avg_correlation, years_of_data, avg_SR=avg_SR,
                                                   std=std, how_many_assets=how_many_assets, p_step = p_step)
    results_over_correl[avg_correlation] = results_over_SR_diff

df_results = pd.DataFrame(results_over_correl, index = SR_diff_list)
plot_title = "Years %d (Number of assets %d, Average SR %f; Lines show correlations %d:%d)" \
                        % (years_of_data,     how_many_assets,   avg_SR,  various_correlations[0], various_correlations[-1])
df_results.plot(title = plot_title)
matplotlib.pyplot.xlabel("SR relative to portfolio average SR")

The higher the correlation between two strategies, the more confident we can be that a small difference in Sharpe Ratios is meaningful, and the more extreme weights we can use.

Plugging into pysystemtrade

Now seems a good opportunity to test how effective this method is. So let's plug in the above code into the handcrafting implementation in my open source python backtesting engine, pysystemtrade.

The optimisation code is a bit of a mess, but as a starting point towards refactoring I've moved the core optimisation routines which are needed here into another file to avoid double coding or messing up the heirarchy. I've also updated handcrafting.py to include the new method.

Let's check things are working as we expect with a simple example. Imagine we're trading the VIX with some vanilla trading rules, plus a dumb one: Always go long the VIX:

from systems.provided.futures_chapter15.estimatedsystem import futures_system
from systems.provided.moretradingrules.morerules import long_bias
from systems.forecasting import TradingRule

system = futures_system(log_level="on")
config = system.config
new_rule = TradingRule(long_bias) config.trading_rules['long'] = new_rule

# Following line means we won't be adjusting
system = futures_system(log_level="on", config=config)

First we'll check what happens without any Sharpe Ratio adjustment.

instrument_returns = system.combForecast.get_returns_for_optimisation("VIX").to_frame()
instrument_returns = instrument_returns.resample("W").sum()

p=Portfolio(instrument_returns, use_SR_estimates=False)

>>> p.sharpe_ratio
array([ 0.6591312 ,  0.50752412,  0.5209934 ,  0.48374054, -0.52792834])

>>> p.corr_matrix
                carry  ewmac16_64  ewmac32_128  ewmac64_256      long
carry        1.000000    0.681041     0.844301     0.913307 -0.872324
ewmac16_64   0.681041    1.000000     0.914988     0.782395 -0.528207
ewmac32_128  0.844301    0.914988     1.000000     0.946146 -0.659741
ewmac64_256  0.913307    0.782395     0.946146     1.000000 -0.726869
long        -0.872324   -0.528207    -0.659741    -0.726869  1.000000

>>> p.show_subportfolio_tree()
Natural top level grouping used
[' Contains 3 sub portfolios', 
["[0] Contains ['carry', 'ewmac32_128', 'ewmac64_256']"], 
["[1] Contains ['ewmac16_64']"], ["[2] Contains ['long']"]]

>>> p.cash_weights
Natural top level grouping used
[0.06658, 0.3790, 0.08106, 0.0653, 0.4079]

It's very diversifying, so get's it's own group and a chunky weight.

Now with the adjustment applied:

system = futures_system(log_level="on", config=config)

instrument_returns = system.combForecast.get_returns_for_optimisation("VIX").to_frame()
instrument_returns = instrument_returns.resample("W").sum()

>>> p.cash_weights
Natural top level grouping used
[0.1092, 0.5553, 0.1329, 0.09569, 0.1067]

Let's look at the time series of weights to see how they change as we get more data:


The "always long" trading rule weight starts off pretty high after 2008 when it performed very well. Subsequently however it's performance fades, and so does it's weight (albeit there is some jumping around as the group weights change).


I'm happy that the heuristic for Sharpe Ratio now deals properly with assets that have different correlations and lengths of available data. In the new year I'll be using the upgraded handcrafting method to test the trading rules for skew and kurtosis that I've been playing with for the last couple of posts.

Tuesday 12 November 2019

Kurtosis and expected returns

In my last post, I stated my intention to write a series of posts about skew.

Slight change of plan, since one loyal reader suggested that I write about kurtosis. I thought that might be fun, since I haven't thought about kurtosis much, and the literature on kurtosis isn't as well developed. It turns out that considering both together leads to some very interesting results.

The plan is to basically repeat my previous analysis of skew for kurtosis. Then my subsequent posts on this subject will discuss both skew AND kurtosis. Hope that makes some kind of sense.

Not "everything you always wanted to know about kurtosis, but we're afraid to ask", but enough to understand this post

The first four moments of a distribution are:

- mean
- standard deviation
- skew
- kurtosis

In laymans terms, these define:

- how positive or negative the "middle"* of a distribution is
- how wide the distribution is
- how symettric the distribution is, or is not
- whether the distribution is mostly lumped into the middle, or spreads out to the edges

* notice I've used the comically imprecise term 'middle' here to avoid any mean vs median arguments

High kurtosis then means extreme events are more common than a vanilla Gaussian distribution would suggest. High kurtosis means fat tails. High kurtosis means every financial time series, ever.

Interpreting the first 3 moments is pretty easy, kurtosis less so. In this post I'm going to be using the standard kurtosis measure used in pandas. For reasons to boring to go into, the kurtosis of a normal distribution is 3. The pandas measure looks at excess kurtosis, so a figure of 0 means 'the normal amount of kurtosis'*, and any positive number means more than that. What does a kurtosis of 1.0 mean? Or 10.0? No idea really, since I don't have any intuition for the figure - one of the reasons to do this post is to get some.

* It's worth checking this yourself by generating random Gaussian data and measuring the kurtosis. I leave this as an exercise for the reader.

I won't be repeating the code since it's identical to the previous post, but with the string 'skew' replaced with 'kurtosis'.

Niggle: Not quite true, thanks to the bizzare and ever changing pandas API. These will all give averages over the whole Series:

percentage_returns[code].kurtosis(0) # omitting the zero will give a rolling figure(!)
percentage_returns[code].kurt() # same as kurtosis(0)

Variance in kurtosis estimates

So how accurately can we measure kurtosis? Here are some bootstrapped distributions of sample variance. Firstly, here it is for  EURUSD, which has a relatively low kurtosis:

Now here are US 2 year bonds, which has a relatively high kurtosis:

Somewhat unsurprisingly, the higher the Kurtosis the wider the estimate. Big kurtosis means outliers; resampling means we'll sometimes catch the outliers and sometimes will get more than our fair share of them: so a big variation in potential kurtosis.

Let's do a boxplot for everything:

Unlike skew there aren't any obvious patterns here; assets we'd expect to have similar kurtosis (like US bond futures) are all over the shop: 20 years are low, 5 years have a bit more, 10 years a bit more again, and 2 years have absolutely loads of the stuff.

Some of this is due to time varying regimes, a problem which will go away later in the post. For example here are US 2 year daily returns:

Pretty crazy in the financial crisis, and then they settle down somewhat. Measuring kurtosis post 2009 is likely to give a very different answer.

Do assets with generally higher kurtosis have higher returns?

It was easy to tell a story about skew preference; negative skew is clearly a bad thing and we should be rewared for holding it.

Should we get paid for high kurtosis? There are two possibilities. If the kurtosis is coming from a positive fat tail we might expect people to overpay for the chance to 'win a lottery': they will prefer high kurtosis, and there will be higher expected returns for low kurtosis assets. But if the kurtosis is coming from a negative fat tail then people will dislike it a lot.

Anyway, on average we're not paid for kurtosis:

X-axis, kurtosis over whole sample. Y-axis: average daily return

Ignoring the two vol markets, whose kurtosis is nowhere near as bad as their skew, the relationship looks ... slightly negative?

Here's a boxplot showing the distribution of resampled daily returns for high kurtosis (over 5.5) versus low kurtosis (under 5.5) instruments:

It does looks like low kurtosis is better than high, suggesting the 'lottery ticket preference' is holding up here: people overpay for high kurtosis. But we need to condition on skew to determine whether that hypothesis is correct or not.

Incidentally if you are worried about the vol markets, VIX comes in as low kurtosis and V2X as high.

Does the about finding still hold for risk adjusted returns, i.e. Sharpe Ratios?

Looks like something is still there, although perhaps not as significant as for outright returns. Lower kurtosis assets seem to have about 0.2 SR points extra per year.

Does an asset with currently higher kurtosis outperform one which has lower current kurtosis? (time series forecasting)

As with skew I'm going to measure kurtosis over different time periods; from a week up to a year of historic returns. Then I will do a t-test to see if assets that currently have higher kurtosis than the global median (about ) outperform those with lower kurtosis.

First the Sharpe Ratios, conditional on recent kurtosis:

It looks like, unlike skew, the preference for high kurtosis is something that appears most strongly at short horizons.

Now the t-statistics, comparing low and high kurtosis:

Basically noise.

How do current skew and kurtosis forecast future returns? (time series forecasting)

As I mentioned above there is a big difference between high kurtosis coming from positive returns, and the same from negative returns. Perhaps we will see something more interesting if we look at the combination of skew and kurtosis.

Same as before, different frequencies, but this time we look at both skew and kurtosis preceeding the date when we estimate a forward looking SR. First Sharpe Ratios:

This is without doubt the most interesting graph so far.

(Sure, but quite a low bar to beat...)

Remember we had two hypothesis about kurtosis:
  • People dislike lumpy returns, and want to avoid them. High kurtosis should always pay more than low kurtosis.
  • People only dislike lumpy returns if they're negative. They're happy to pay more for lottery tickets. High kurtosis should outperform low kurtosis for positive skew assets. For negative skew assets the relationship should be reversed

Let's summarise the findings of this graph and the previous post:

  • Negative skew* assets outperform, most strongly at longer horizons (from my previous blog post).
  • High kurtosis assets underperform, most strongly at shorter horizons (discussed earlier)
  • Within assets that have high kurtosis, at short horizons positive skew is rewarded. At longer horizons there is nothing meaningful.
  • Within assets that have low kurtosis, at longer horizons negative skew is rewarded. At shorter horizons there is nothing meaningful.
  • Within assets that have positive skew, at short horizons high kurtosis is rewarded 
  • Within assets that have negative skew, kurtosis is irrelevant

* incidentally I'm using zero as my skew cutoff here for simplicity, as in the previous post I determined it didn't make much difference. For kurtosis there is no 'natural' cutoff, so I'm sticking to the historic sample median of around 5.5

Or to put it another way:

  • The dominance of negative skew assets at longer horizons is only relevant for assets with low kurtosis.
  • The outperformance of high kurtosis assets at shorter horizons is only relevant for assets with positive skew.
Unlike a simple high skew / low skew strategy it's difficult to pick apart what relationships we should focus on here, especially if we want to put things into a continous forecast framework without implicit fitting (it would be very easy to create a set of binary rules which embodied the above findings, and which would embody loads of forward looking information). Implicit overfitting is particularly likely here as we don't have the simple intuitive results we got from outright skew.

I decided to boil the above down to two simple trading rules:

The skew rule

{(skew - Average skew) / Sigma [Skew]} 
* sign (Kurtosis - average kurtosis)

The kurtosis rule

{(Kurtosis - Average kurtosis) / Sigma [Kurtosis]} 
* sign (Skew - average skew)

... where for this specification the average is across all instruments and all past history (currently done entirely in sample, but in a trading rule will be based on an expanding window), and sigma is a standard deviation based on the past history of all instruments (the sigma is not important now, but will be when we come to design trading rules to ensure we have properly normalised forecasts).

This have the advantage of being relatively parsimonious and symettrical, albeit a bit non linear. There is still potentially an issue with implicit fitting, but we can deal with that in later posts.

Under these conditions we'd have the following positions in both rules:

  • High kurtosis, high skew: Both Long (profitable at short horizons)
  • High kurtosis, low skew: Short (Does relatively badly at short horizons)
  • Low kurtosis, high skew: Short (Does relatively badly especially at long horizons)
  • Low kurtosis, low skew: Long (does relatively well at long horizons)

(It's possible to combine these into a single rule, however I like the idea of having a skew and a kurtosis rule and the effects work differently at varying horizons)

Let's look at the t-statistics:

(For example 'pos skew rule' is the kurtosis rule applied when skew is positive and so on; really this ought to be 'pos skew, kurtosis rule' but you get the idea).

Here positive t-statistic means a rule is working. It looks like all the rules work pretty well at a one month frequency, with the skew rule working especially well for longer periods when kurtosis is low.

Does an asset with different skew / kurtosis than normal perform better than average (normalised time series)?

We can modify the rules above so that instead of using the average across all assets we will actually use the average for a given instrument (we can also modify the standard deviation once we get to producing actual forecasts).

Interestingly the rules seem to be bad at the original sweet spot, although skew conditioned on low kurtosis still does very well at longer horizons.

Now let's demean on the current average across all assets:

Ouch. A pretty poor performance. The skew rules (red and green) in particular are very sensitive to frequency.

Finally let's do the same thing, but this time demean by the current median skew and kurtosis for a given asset class. 

Again, not really the best result.


  • It's hard to estimate kurtosis with any certainty, even harder when kurtosis is large (outliers)
  • Unexpectedly we don't get paid for owning assets with high kurtosis
  • .... and then it gets complicated

Yes, there's an awful lot of results in this post! 

The key finding is that, as you may expect, skew and kurtosis have more forecasting power when they are conditioned on each other. Generally we want to own instruments that have had high kurtosis and relatively positive skew: these are lottery tickets which for some reason the market undervalues. We also want to own instruments that have low kurtosis and relatively negative skew; here we get rewarded for negative skew without suffering too many outliers. Instruments where skew and kurtosis are in opposite directions are less attractive.

These effects don't persist that well when we use different demeaning techniques, unlike in skew world where they hold up quite well.

It's worth reflecting on what I have done so far. In the last post I considered 4 different skew trading rules (outright, time series demean, cross sectional demean, asset class cross sectional demean). In this one I've effectively come up with another 8: 4 for skew conditioned on kurtosis, 4 for kurtosis conditioned on skew. That's a total of 12 different trading rules, each of which potentially has 6 different variations for different lookbacks.

Though it would be tempting to select a few of these for further testing that would be implicit fitting; I would be doing so based on the analysis I have done so far having looked at all the data. Instead the right and proper thing will be to take forward all 12 rules into an analysis where their risk weights are fitted systematically in a backward looking framework. So that's the next post.

Monday 21 October 2019

Skew and expected returns

Some bloke* once said "The most overlooked characteristic of a strategy is the expected skew of it's returns, i.e. how symmetrical they are"

* It was me. "Systematic Trading" page 40

Skew then is an important concept, and one which I find myself thinking about a lot. So I've decided to write a series of posts about skew, of which is the first.

In fact I've already written a substantive post on trend following and skew, so this post is sort of the prequel to that. This then is actually the second post in the series, but really it's the first, because you should read this one first. Don't tell me you're confused, I know for a fact everyone reading this is fine with the fact that the Star Wars films came out in the order 4,5,6,1,2,3,7,8,9.

In this post I'll talk about two things. Firstly I will (briefly) discuss the difficulties of measuring skew: yes it's my old favourite subject sampling variation. Secondly I'll talk (at great length) about how skew can predict expected returns by answering the following questions:

  • Do futures with negative skew generally have higher returns than those with positive skew? (strategic allocation)
  • Does this effect still hold when we adjust returns for risk (using standard deviations only)? (risk adjusted returns)
  • Are these rewards because we're taking on more risk in the form of 
  • Does an asset that currently has negative skew outperform one that has positive skew? (time series and forecasting)
  • Does an asset with lower skew than normal perform better than average (normalised time series)?
  • Do these effects hold within asset classes? (relative value)

Some of these are well known results, others might be novel (I haven't checked - this isn't an academic paper!). In particular, this is the canonical paper on skew for futures: but it focuses on commodity futures. There has also been research in single equities that might be relevant (or it may not "Aggregate stock market returns display negative skewness. Firm-level stock returns display positive skewness." from here).

This is the sort of 'pre-backtest' analysis that you should do with a new trading strategy idea, to get a feel for it. What we need to be wary of is implicit fitting; deciding to pursue certain variations and not others having seen the entire sample. I will touch on this again later.

I will assume you're already familiar with the basics of skew, if you're not then you can (i) read "Systematic Trading" (again), (ii) read the first part of this post, or (iii) use the magical power of the internet, or if you're desperate (iv) read a book on statistics.

Variance in skew sample estimates

Really quick reminder: The variance in a sample estimate tells us how confident we can be in a particular estimate of some property. The degree of confidence depends on how much data we have (more data: more confident), the amount of variability in the data (e.g. for sample means, less volatility: more confident), and the estimator we are using (estimates of standard deviation: low variation)

We can estimate sampling distributions in two ways: using closed form formulae, or using a monte carlo estimator.

Closed form formulae are available for things like mean, standard deviation, Sharpe ratio and correlation estimates; but they usually assume i.i.d. returns (Gaussian and serially uncorrelated). For example, the formula for the variance of the mean estimate is sigma / N^2, where sigma is the sample variance estimate and N is the number of data points.

What is the closed form formula for skew? Well, assuming Gaussian returns the formula is as follows:

Skew = 0

Obviously that isn't much use! To get a closed form we'd need to assume our returns had some other distribution. And the closed forms tend to be pretty horific, and also the distributions aren't normally much use if there are outliers (something which, as we shall see, has a fair old effect on the variance). So let's stick to using monte carlo.

Obviously to do that, we're going to need some data. Let's turn to my sadly neglected open source project, pysystemtrade.

import numpy as np
from systems.provided.futures_chapter15.estimatedsystem import *
system = futures_system()

del(system.config.instruments) # so we can get results for everything
instrument_codes = system.get_instrument_list()

percentage_returns = dict()

for code in instrument_codes:
    denom_price = system.rawdata.daily_denominator_price(code)
    instr_prices = system.rawdata.get_daily_prices(code)

    num_returns = instr_prices.diff()
    perc_returns = num_returns / denom_price.ffill()

    # there are some false outliers in the data, let's remove them
    vol_norm_returns = system.rawdata.norm_returns(code)
percentage_returns[code] = perc_returns
We'll use this data throughout the rest of the post; if you want to analyse your own data then feel free to substitute it in here.

Pandas has a way of measuring skew:


We're ignoring for now the question of whether we should use daily, weekly or whatever returns to define skew.

However this doesn't capture the uncertainty in this estimate. Let's write a quick function to get that information:

import random
def resampled_skew_estimator(data, monte_carlo_count=500):
    """    Get a distribution of skew estimates
    :param data: some time series    :param monte_carlo_count: number of goes we monte carlo for    :return: list    """
    skew_estimate_distribution = []
    for _notUsed in range(monte_carlo_count):
        resample_index = [int(random.uniform(0,len(data))) for _alsoNotUsed in range(len(data))]
        resampled_data = data[resample_index]
        sample_skew_estimate = resampled_data.skew()

    return skew_estimate_distribution

Now I can plot the distribution of the skew estimate for an arbitrary market:

import matplotlib.pyplot as pyplot

data = percentage_returns['VIX']
x=resampled_skew_estimator(data, 1000)
pyplot.hist(x, bins=30)

Boy... that is quite a range. It's plausible that the skew of VIX (one of the most positively skewed assets in my dataset) could be zero. It's equally possible it could be around 0.6. Clearly we should be quite careful about interpreting small differences in skew as anything significant.

Let's look at the distribution across all of our different futures instruments

# do a boxplot for everything
import pandas as pd

df_skew_distribution = dict()
for code in instrument_codes:
    x = resampled_skew_estimator(percentage_returns[code],1000)
    y = pd.Series(x)

df_skew_distribution = pd.DataFrame(df_skew_distribution)

df_skew_distribution = df_skew_distribution.reindex(df_skew_distribution.mean().sort_values().index, axis=1)


It looks like:

  • Most assets are negatively skewed
  • Positively skewed assets are kind of logical: V2X, VIX (vol), JPY (safe haven currency), US 10 year, Eurodollar (bank accounts backed by the US government). 
  • The most negatively skewed assets include stock markets and carry currencies (like MXP, AUD) but also some commodities
  • Several pinches of salt should be used here as very few assets have statistically significant skew in eithier direction.
  • More negatively and positively skewed assets have wider confidence intervals
  • Positive skewed assets have a postively skewed estimate for skew; and vice versa for negative skew
  • There are some particularly fat tailed assets whose confidence intervals are especially wide: Corn, V2X, Eurodollar, US 2 year. 

Bear in mind that not all instruments have the same length of data, and in particular many do not include 2008.

Do assets with negative skew generally have higher returns than those with positive skew? 

Let's find out.

# average return vs skew
avg_returns = [percentage_returns[code].mean() for code in instrument_codes]
skew_list = [percentage_returns[code].skew() for code in instrument_codes]

fig, ax = pyplot.subplots()
ax.scatter(skew_list, avg_returns, marker="")

for i, txt in enumerate(instrument_codes):
    ax.annotate(txt, (skew_list[i], avg_returns[i]))

Ignoring the two vol markets, it looks like there might be a weak relationship there. But there is huge uncertainty in return estimates. Let's bootstrap the distribution of mean estimates, and plot them with the most negative skew on the left and the most positive skew on the right:

def resampled_mean_estimator(data, monte_carlo_count=500):
    """    Get a distribution of mean estimates
    :param data: some time series    :param monte_carlo_count: number of goes we monte carlo for    :return: list    """
    mean_estimate_distribution = []
    for _notUsed in range(monte_carlo_count):
        resample_index = [int(random.uniform(0, len(data))) for _alsoNotUsed in range(len(data))]
        resampled_data = data[resample_index]
        sample_mean_estimate = resampled_data.mean()

    return mean_estimate_distribution

df_mean_distribution = dict()
for code in instrument_codes:
    x = resampled_mean_estimator(percentage_returns[code],1000)
    y = pd.Series(x)

df_mean_distribution = pd.DataFrame(df_mean_distribution)

df_mean_distribution = df_mean_distribution[df_skew_distribution.columns]


Again, apart from the vol, hard to see much there. Let's lump together all the countries with above average skew (high skew), and those with below average (low skew):

skew_by_code = df_skew_distribution.mean()
avg_skew = np.mean(skew_by_code.values)
low_skew_codes = list(skew_by_code[skew_by_code<avg_skew].index)
high_skew_codes = list(skew_by_code[skew_by_code>=avg_skew].index)

def resampled_mean_estimator_multiple_codes(percentage_returns, code_list, monte_carlo_count=500):
    :param percentage_returns: dict of returns    :param code_list: list of str, a subset of percentage_returns.keys    :param monte_carlo_count: how many times    :return: list of mean estimtes    """
    mean_estimate_distribution = []
    for _notUsed in range(monte_carlo_count):
        # randomly choose a code        code = code_list[int(random.uniform(0, len(code_list)))]
        data = percentage_returns[code]
        resample_index = [int(random.uniform(0,len(data))) for _alsoNotUsed in range(len(data))]
        resampled_data = data[resample_index]
        sample_mean_estimate = resampled_data.mean()

    return mean_estimate_distribution

df_mean_distribution_multiple = dict()
df_mean_distribution_multiple['High skew'] = resampled_mean_estimator_multiple_codes(percentage_returns,high_skew_codes,1000)
df_mean_distribution_multiple['Low skew'] = resampled_mean_estimator_multiple_codes(percentage_returns,low_skew_codes,1000)

df_mean_distribution_multiple = pd.DataFrame(df_mean_distribution_multiple)

Incidentally I've truncated the plots here because there is a huge tail of negative returns for high skew: basically the vol markets. The mean and medians are instructive, multiplied by 250 to annualise the mean return is -6.6% for high skew and +1.8% for low skew. Without that long tail having such an impact the medians are much closer: +0.9% for high skew and +2.2% for low skew.

If I take out the vol markets I get means of 0.6% and 1.7%, and medians of 1.2% and 2.3%. The median is unaffacted, but the ridiculously low mean return for high vol markets is taken out.

So: there is something there, of the order of a 1.0% advantage in extra annual returns for owning markets with lower than average skew. If you're an investor with a high tolerance for risk who can't use leverage; well you can stop reading now.

Does this effect still hold when we adjust returns for risk (using standard deviations only)?

Excellent question. Let's find out.

# sharpe ratio  vs skew
sharpe_ratios = [16.0*percentage_returns[code].mean()/percentage_returns[code].std() for code in instrument_codes]
skew_list = [percentage_returns[code].skew() for code in instrument_codes]

fig, ax = pyplot.subplots()
ax.scatter(skew_list, sharpe_ratios, marker="")
for i, txt in enumerate(instrument_codes):
    ax.annotate(txt, (skew_list[i], sharpe_ratios[i]))

Hard to see any relationship here, although the two vol markets still stand out as outliers.

Let's skip straight to the high skew/low skew plot, this time for Sharpe Ratios:

def resampled_SR_estimator_multiple_codes(percentage_returns, code_list, monte_carlo_count=500, avoiding_vol=False):
    :param percentage_returns: dict of returns    :param code_list: list of str, a subset of percentage_returns.keys    :param monte_carlo_count: how many times    :return: list of SR estimtes    """
    SR_estimate_distribution = []
    for _notUsed in range(monte_carlo_count):
        # randomly choose a code        # comment in these lines to avoid vol        if avoiding_vol:
            code = "VIX"            while code in ["VIX", "V2X"]:
                code = code_list[int(random.uniform(0, len(code_list)))]
            code = code_list[int(random.uniform(0, len(code_list)))]

        data = percentage_returns[code]
        resample_index = [int(random.uniform(0,len(data))) for _alsoNotUsed in range(len(data))]
        resampled_data = data[resample_index]
        SR_estimate = 16.0*resampled_data.mean()/resampled_data.std()

    return SR_estimate_distribution

df_SR_distribution_multiple = dict()
df_SR_distribution_multiple['High skew'] = resampled_SR_estimator_multiple_codes(percentage_returns,high_skew_codes,1000)
df_SR_distribution_multiple['Low skew'] = resampled_SR_estimator_multiple_codes(percentage_returns,low_skew_codes,1000)

df_SR_distribution_multiple = pd.DataFrame(df_SR_distribution_multiple)

Hard to see what, if any, is the difference there. The summary statistics are even more telling:

Mean: High skew 0.22, Low skew 0.26
Median: High skew 0.22, Low skew 0.20

Once we adjust for risk, or at least risk measured by the second moment of the distribution, then uglier skew (the third moment) doesn't seem to be rewarded by an improved return.

Things get even more interesting if we remove the vol markets again:

Mean: High skew 0.37, Low skew 0.24
Median: High skew 0.29, Low skew 0.17

A complete reversal! Probably not that significant, but a surprising turn of events none the less.

Does an asset that currently has negative skew outperform one that has positive skew? (time series and forecasting)

Average skew and average returns aren't that important or interesting; but it would be cool if we could use the current level of skew to predict risk adjusted returns in the following period.

An open question is, what is the current level of skew? Should we use skew defined over the last week? Last month? Last year? I'm going to check all of these, since I'm a big fan of time diversification for trading signals.

I'm going to get the distribution of risk adjusted returns (no need for bootstrapping) for the following N days, where skew over the previous N days has been higher or lower than average. Then I do a t-test to see if the realised Sharpe Ratio is statistically significantly better in a low skew versus high skew environment.

all_SR_list = []
all_frequencies = ["7D", "14D", "1M", "3M", "6M", "12M"]

for freqtouse in all_frequencies:
    all_results = []
    for instrument in instrument_codes:
            # we're going to do rolling returns            perc_returns = percentage_returns[instrument]
            start_date = perc_returns.index[0]
            end_date = perc_returns.index[-1]

            periodstarts = list(pd.date_range(start_date, end_date, freq=freqtouse)) + [

            for periodidx in range(len(periodstarts) - 2):
                # avoid snooping                p_start = periodstarts[periodidx]+pd.DateOffset(-1)
                p_end = periodstarts[periodidx+1]+pd.DateOffset(-1)
                s_start = periodstarts[periodidx+1]
                s_end = periodstarts[periodidx+2]

                period_skew = perc_returns[p_start:p_end].skew()
                subsequent_return = perc_returns[s_start:s_end].mean()
                subsequent_vol = perc_returns[s_start:s_end].std()
                subsequent_SR = 16*(subsequent_return / subsequent_vol)

                if np.isnan(subsequent_SR) or np.isnan(period_skew):
                    continue                else:
                    all_results.append([period_skew, subsequent_SR])

    all_results=pd.DataFrame(all_results, columns=['x', 'y'])

    subsequent_sr_distribution = dict()
    subsequent_sr_distribution['High_skew'] = all_results[all_results.x>=avg_skew].y
    subsequent_sr_distribution['Low_skew'] = all_results[all_results.x<avg_skew].y

    subsequent_sr_distribution = pd.DataFrame(subsequent_sr_distribution)

    med_SR =subsequent_sr_distribution.median()
    tstat = stats.ttest_ind(subsequent_sr_distribution.High_skew, subsequent_sr_distribution.Low_skew, nan_policy="omit").statistic

all_tstats = pd.Series(all_tstats, index=all_frequencies)

Here are the T-statistics:

Large negative numbers mean a bigger difference in performance. It looks like we get significant results measuring skew over at least the last 1 month or so.

How much is this worth to us? Here are the conditional median returns:

all_SR_list = pd.DataFrame(all_SR_list, index=all_frequencies)
Nice. An extra 0.1 to 0.4 Sharpe Ratio units. One staggering thing about this graph is this, when measuring skew over the last 3 to 12 months, assets with higher than average skew make basically no money.

Should we use 'lower than average skew' or 'negative skew' as our cutoff / demeaning point?

Up to now we've been using the median skew as our cutoff (which in a contionous trading system would be our demeaning point, i.e. we'd have positive forecasts for skew below the median, and negative forecasts for skew which was above). This cuttoff hasn't quite been zero, since on average more assets have negative skew. But is there something special about using a cutoff of zero?

Easy to check:

#avg_skew=all_results.x.median()# instead:
avg_skew = 0

With a less symettric split we'd normally to get better statistical significance (since the 'high skew' group is a bit smaller now), however the results are almost tthe same. Personally I'm going to stick to using the median as my cutoff, since it will make my trading system more symettric.

Does an asset with lower skew than normal perform better than average (normalised time series)?

The results immediately above can be summarised as:

- Assets which currently have more negative skew than average - measured as an average across all assets over all time

This confound three possible effects:

- Assets with on average (over all time) more negative skew perform better on average (the first thing we checked - and on a risk adjusted basis the effect is pretty weak and mostly confined to the vol markets)
- Assets which have currently more negative skew than their own average perform better
- Assets which currently have more negative skew than the current average perform better than other assets

Let's check two and three.

First let's check the second effect, which can be rephrased as is skew demeaned by the average for an asset predictive of future performance for that asset?
I'm going to use the average skew for the last 10 years to demean each asset.

Code the same as above, except:

perc_returns = percentage_returns[instrument]
all_skew = perc_returns.rolling("3650D").skew()
period_skew = perc_returns[p_start:p_end].skew()
avg_skew = all_skew[:p_end][-1]
period_skew = period_skew - avg_skew
subsequent_return = perc_returns[s_start:s_end].mean()

Similar to before, but not quite as significant. The weaker effect at one month has vanished. Here are the Sharpe Ratios:

The 'skew bonus' has reduced somewhat to around 0.2 SR points for the last 12 months of returns.

Now let's check the third effect, which can be rephrased as is skew demeaned by the average of current skews across all assets asset predictive of future performance for that asset?

Code changes:
all_SR_list = []
all_frequencies = ["7D", "14D", "30D", "90D", "180D", "365D"]
for freqtouse in all_frequencies:
    all_results = []
    # relative value skews need averaged
    skew_df = {}
    for instrument in instrument_codes:
        # rolling skew over period        instrument_skew = percentage_returns[instrument].rolling(freqtouse).skew()
        skew_df[instrument] = instrument_skew

    skew_df_all = pd.DataFrame(skew_df)
    skew_df_median = skew_df_all.median(axis=1)

    for instrument in instrument_codes:
period_skew = perc_returns[p_start:p_end].skew()
avg_skew = skew_df_median[:p_end][-1]
period_skew = period_skew - avg_skew
subsequent_return = perc_returns[s_start:s_end].mean()

Plots, as before:

To summarise then:

- Using recent skew is very predictive of future returns, if 'recent' means using at least 1 month of returns, and ideally more. The effect is strongest if we use the last 6 months or so of returns.
- Some, but not all, of this effect persists if we normalise skew by the long run average for an asset. So, for example, even for assets which generally have positive skew, you're better off investing in them when their skew is lower than normal
- Some, but not all, of this effect persists if we normalise skew by the current average level of skew. So, for example, even in times when skew generally is negative (2008 anyone?) it's better to invest in the assets with the most negative skew.

We could formally decompose the above effects with for example a regression, but I'm more of a fan of using simple trading singles which are linearly weighted, with weights conditional on correlations between signals.

Do these effects hold within asset classes? (relative value)

Rather than normalising skew by the current average across all assets, maybe it would be better to consider the average for that asset class. So we'd be comparing S&500 current skew with Eurostoxx, VIX with V2X, and so on.

all_SR_list = []
all_frequencies = ["7D", "14D", "30D", "90D", "180D", "365D"]
asset_classes = list(system.data.get_instrument_asset_classes().unique())

for freqtouse in all_frequencies:
    all_results = []
    # relative value skews need averaged
    skew_df_median_by_asset_class = {}
    for asset in asset_classes:
        skew_df = {}
        for instrument in system.data.all_instruments_in_asset_class(asset):
            # rolling skew over period            instrument_skew = percentage_returns[instrument].rolling(freqtouse).skew()
            skew_df[instrument] = instrument_skew

        skew_df_all = pd.DataFrame(skew_df)
        skew_df_median = skew_df_all.median(axis=1)
        # will happen if only one asset class        skew_df_median[skew_df_median==0] = np.nan

        skew_df_median_by_asset_class[asset] = skew_df_median

    for instrument in instrument_codes:
            # we're going to do rolling returns            asset_class = system.data.asset_class_for_instrument(instrument)

            perc_returns = percentage_returns[instrument]
period_skew = perc_returns[p_start:p_end].skew()

avg_skew = skew_df_median_by_asset_class[asset_class][:p_end][-1]
period_skew = period_skew - avg_skew
subsequent_return = perc_returns[s_start:s_end].mean()

That's interesting: the effect is looking a lot weaker except for the longer horizons. The worse t-stats could be explained by the fact that we have less data (long periods when only one asset is in an asset class and we can't calculate this measure), but the relatively small gap between Sharpe Ratios isn't affected by this.

So almost all of the skew effect is happening at the asset class level. Within asset classes, for futures at least, if you normalise skew by asset class level skew you get a not significant 0.1 SR units or so of benefit, and then only for fairly slow time frequencies.


This has been a long post. And it's been quite a heavy, graph and python laden, post. Let's have a quick recap:

  • Most assets have negative skew
  • There is quite a lot of sampling uncertainty around skew, which is worse for assets with outliers (high kurtosis) and extreme absolute skew
  • Assets which on average have lower (more negative) skew will outperform in the long run.
  • This effect is much smaller when we look at risk adjusted returns (Sharpe Ratios), and is driven mainly by the vol markets (VIX, V2X)
  • Assets with lower skew right now will outperform those with higher skew right now. This is true for skew measuring and forecasting periods of at least 1 month, and is strongest around the 6 month period. In the latter case an average improvement of 0.4 SR units can be expected.
  • This effect is the same regardless of wether skew is compared to the median or compared to zero.
  • This effect mostly persists even if we demean skew by the rolling average of skew for a given asset over the last 10 years: time series relative value
  • This effect mostly persists if we deman skew by the average of current skews across all assets: cross sectional relative value
  • But this effect mostly vanishes if the relative value measure is taken versus the average for the relevant asset class.
This is all very interesting, but it mostly compares, and it still isn't a trading strategy. So in the next post I will consider the implementation of these ideas as a suite of trading strategies: 

  • Skew measured over the last N days, relative to a long term average across all assets
  • Skew measured over the last N days, relative to a long term average for this asset
  • Skew measured over the last N days, relative to the current average for all assets
  • Skew measured over the last N days, relative to the current average for the relevant asset class
Where N will be in the ballpark 10... 250 business days.

For now it's important to bear in mind that I must not discard any of the above ideas because of likely poor performance:
  • lower values of N (though some might be removed because their trading costs are too high), eg 2 weeks
  • Asset class relative value