Tuesday 10 December 2019

New and Improved Sharpe Ratio adjustment in the handcrafting method

In my recent posts on skew and kurtosis I've put together a large number of ideas for possible trading strategies. The next step will be to create and test these ideas out. However I already know from my initial analysis that many of these ideas will probably have poor performance. This leaves me in something of a conundrum. In fact this is just one example of the conundrum that strategy developers face all the time.

Do I ignore these strategies completely, which will result in my historic back test being overstated (a sin I have named implicit fitting)? Or do I include them in a portfolio of strategies, and hope that the allocation methodology will downweight them because of their poor performance?

The problem is that methods for portfolio allocation tend to fall into one of two camps. There are robust methods that completely ignore performance, from the simple 'one over n' equal weighting all the way up to more complex methods like Hierarchical Risk Parity (HRP). Those that do use the performance of strategies, like the basic naive Markowitz optimisation, tend to be very un-robust. Small differences in mean result in big differences in allocation.

There is no easy answer here. On the one hand, the mean (or equivalently Sharpe Ratio) is the moment of the distribution with the largest sampling error, and which is the hardest to predict. On the other hand, there really ought to be some way to downweight and ultimately drop consistently money losing strategies.

In the handcrafting method, which I wrote about in some detail this time last year (the first post is here), I use a simple heuristic technique to adjust weights for different Sharpe Ratios. Basically it's the yellow line on this graph:

The x-axis shows the relative SR of an asset compared to the average of a portfolio (or sub-portfolio). The y-axis is the weighting that is applied. So for an asset with the average SR, the weighting factor is exactly 1.

The method is then simple: we multiply all the weights in a (sub)portfolio by the weighting factor given by their relative SR. We then re-normalise the weights so they add up to 100%. Assets with higher SR have higher weights, and vice-versa. But extreme weights don't occur, since the weight multiplier is never below 0.6 or above 1.4.

(As an aside, the handcrafting method also accounts for uncertainty in correlations, but not in standard deviations. The latter is relatively small and has minimal effect on portfolio weights)

The heuristic has the advantage of being robust, fast and intuitive. The disadvantage is that this technique does not account for important factors. In particular, we can become more confident about our weighting when (a) we have more data and (b) the correlation between assets is higher. In the right circumstances a weight of almost zero would be justified - effectively entirely removing an asset or trading rule variation from the portfolio.

In this post I modify the simple heuristic to account for these factors. Although this slightly increases the calculation time for the handcrafted optimisation, it is worth doing. There will be python code.

Where does the heuristic come from?

The heuristic essentially comes from doing a portfolio optimisation on a very simple arbitrary portfolio, which has two assets with:

  • Asset one: some arbitrary Sharpe Ratio (I use SR=0.5 which makes sense for the sorts of trading rules I use)
  • Asset two: A higher or lower Sharpe Ratio
  • Both assets: Some arbitrary identical standard deviation (I use 15% though in fact this doesn't affect the results)
  • Some correlation
If both assets had the same Sharpe Ratio then the optimal allocation would be 50% each, irrespective of correlation. We can check this:

### Following code is boilerplate for optimising
from scipy.optimize import minimize
import numpy as np
import scipy.stats as stats
import pandas as pd

def optimise(mean_list, avg_correlation, std): corr_matrix = boring_corr_matrix(len(mean_list), offdiag=avg_correlation) stdev_list = [std]*len(mean_list) sigma = sigma_from_corr_and_std(stdev_list, corr_matrix) return optimise_with_sigma(sigma, mean_list) def optimise_with_sigma(sigma, mean_list): mus = np.array(mean_list, ndmin=2).transpose() number_assets = sigma.shape[1] start_weights = [1.0 / number_assets] * number_assets # Constraints - positive weights, adding to 1.0
    bounds = [(0.0, 1.0)] * number_assets
    cdict = [{'type': 'eq', 'fun': addem}]
    ans = minimize(
        start_weights, (sigma, mus),
    weights = ans['x']

    return weights

def neg_SR(weights, sigma, mus):
    # Returns minus the Sharpe Ratio (as we're minimising)    
    weights = np.matrix(weights)
    estreturn = (weights * mus)[0, 0]
    std_dev = (variance(weights, sigma)**.5)

    return -estreturn / std_dev

def variance(weights, sigma):
    # returns the variance (NOT standard deviation) given weights and sigma    
    return (weights * sigma * weights.transpose())[0, 0]

def sigma_from_corr_and_std(stdev_list, corrmatrix):
    stdev = np.array(stdev_list, ndmin=2).transpose()
    sigma = stdev * corrmatrix * stdev

    return sigma

def addem(weights):
    # Used for constraints    
    return 1.0 - sum(weights)

def boring_corr_matrix(size, offdiag=0.99, diag=1.0):
    """    Create a boring correlation matrix
    :param size: dimensions    
    :param offdiag: value to put in off diagonal    
    :param diag: value to put in diagonal    
    :return: np.array 2 dimensions, size    
    size_index = range(size)

    def _od(i, j, offdiag, diag):
        if i == j:
            return diag
            return offdiag

    m = [[_od(i, j, offdiag, diag) for i in size_index] for j in size_index]
    m = np.array(m)
    return m

# Both have same mean, and hence SR, just vary the correlation
>>> optimise([0.05, 0.05], 0.0, 0.15)
array([0.5, 0.5])
>>> optimise([0.05, 0.05], 0.9, 0.15)
array([0.5, 0.5])
>>> optimise([0.05, 0.05], -0.9, 0.15)
array([0.5, 0.5])

A small difference in means will result in crazy weights, especially if we have highish correlation:

optimise([0.05, 0.06], 0.9, 0.15)
array([1.11022302e-16, 1.00000000e+00])

There is more of this kind of analysis in Smart Portfolios if you're curious

We need some way of accounting for the uncertainty in our mean estimate (in principal we should do the same for correlation and vol; but the handcrafting method already does this for correlation, and vol is relatively predictable).

Let's begin with the textbook formula for sampling error for a difference in means:

Line 1: Standard deviation of the mean estimate, asset i, given sigma_i (annualised standard deviation) and N is the number of years of data
Line 2: Ditto for asset j
Line 3: Standard deviation of the difference in means
Line 4: If both assets have the same standard deviation this simplifies to the final line.

This gives us the standard deviation of the distribution of the differences in mean estimate. The mean of that distribution is just the difference between the two sample means. Thanks to central limit theorem, the distribution is Gaussian .So we can work out the cumulative probability of any particular difference in mean estimate.

Here is an implementation of this in python:

def calculate_omega_difference(std, years_of_data, avg_correlation):
    omega_one_asset = std / (years_of_data)**.5    
    omega_variance_difference = 2*(omega_one_asset**2)*(1- avg_correlation)
    omega_difference = omega_variance_difference**.5
    return omega_difference

def calculate_confident_mean_difference(std, years_of_data, mean_difference, confidence_interval, avg_correlation):
    omega_difference = calculate_omega_difference(std, years_of_data, avg_correlation)
    confident_mean_difference = stats.norm(mean_difference, omega_difference).ppf(confidence_interval)

    return confident_mean_difference

# The 0.5 point will just be the mean difference
>>> calculate_confident_mean_difference(0.15, 10, 0.3, 0.5, 0.0)

# It's plausible (5% chance) the difference is quite a bit lower than 0.3 with enough data
>>> calculate_confident_mean_difference(0.15, 10, 0.3, 0.05, 0.0)

# With insufficient date we can't be confident the mean is even positive
>>> calculate_confident_mean_difference(0.15, 1, 0.3, 0.05, 0.0)

# ... unless the correlation is really high 
>>> calculate_confident_mean_difference(0.15, 1, 0.3, 0.05, 0.95)

# Trivially if we increase the mean difference the distribution is just shifted
>>> calculate_confident_mean_difference(0.15, 1, 0.6, 0.05, 0.95)

It's worth playing around with this code to get a feel for how the mean distribution is affected by (a) the correlation and (b) the amount of data available (it's less interesting to change the standard deviation or average mean).

We can now see how different levels of confidence in our mean estimate affect our portfolio weights. This function optimises, fixing the Sharpe Ratio of the second asset, and applying an adjustment to the first Sharpe Ratio.

def weights_given_SR_diff(SR_diff,  avg_correlation,  confidence_interval, years_of_data,
                          avg_SR=0.5, std=0.15, how_many_assets=2):
    """    Return the ratio of weight to 1/N weight for an asset with unusual SR
    :param SR_diff: Difference between the SR and the average SR. 0.0 indicates same as average    
    :param avg_correlation: Average correlation amongst assets    
    :param years_of_data: How long has this been going one    
    :param avg_SR: Average SR to use for other asset    
    :param confidence_interval: How confident are we about our mean estimate (i.e. cdf point)    
    :param how_many_assets: .... are we optimising over (I only consider 2, but let's keep it general)    
    :param std: Standard deviation to use
    :return: Ratio of weight, where 1.0 means no difference    
    # From SR to means
    average_mean = avg_SR * std
    asset1_mean = (SR_diff + avg_SR)*std

    mean_difference = asset1_mean - average_mean

    ## Work out what the mean is with appropriate confidence    
    confident_mean_difference = calculate_confident_mean_difference(std, years_of_data, mean_difference, confidence_interval, avg_correlation)

    confident_asset1_mean = confident_mean_difference + average_mean

    mean_list = [confident_asset1_mean]+[average_mean]*(how_many_assets-1)

    weights = optimise(mean_list, avg_correlation, std)

    return list(weights)

# Median estimate (50%, 0.5 point) for a modest SR improvement (0.2) with lowish
# correlation (0.3) and a decade of data (10). 
# Equivalent to optimise([0.7*.15, 0.5*.15], 0.3, 0.15)
>>> weights_given_SR_diff(0.2, 0.3, 0.5, 10)
[0.6529536546603577, 0.3470463453396423]

# It's plausible the weight to the first asset could be lower (10% point, 0.1)...
>>> weights_given_SR_diff(0.2, 0.3, 0.1, 10)
[0.14015358673119246, 0.8598464132688076]

# or higher (90% point, 0.9)...
>>> weights_given_SR_diff(0.2, 0.3, 0.9, 10)
[0.8741474140876935, 0.12585258591230658]

I'm going to do a sort of monte-carlo exercise, but not really. If we consider a number of fixed points on a distribution, equally spaced, then each are equally likely. For example the nine points, 10%, 20%, .... 90% are all equally likely to occur. If we optimise the portfolio weights for a 10% degree of confidence, then 20%, and so on; then take an average of the weights, we basically get the same effect as if we repeatedly sampled the return distribution in the normal way (the non-parametric bootstrapping I discussed in Systematic Trading). But the advantage of using this method is that we only need to run N optimisations (where N is hopefully not too big - to be determined shortly), and more interestingly we don't get the randomness of bootstrapping - the same parameters should always produce the same result.

def mini_bootstrap_ratio_given_SR_diff(SR_diff,  avg_correlation, years_of_data, avg_SR=0.5, std=0.15, how_many_assets=2,
    Do a parametric bootstrap of portfolio weights to tell you what the ratio should be between an asset which       
    has a higher backtested SR (by SR_diff) versus another asset(s) with average Sharpe Ratio (avg_SR)
    All assets are assumed to have same standard deviation and correlation
    :param SR_diff: Difference in performance in Sharpe Ratio (SR) units between one asset and the rest    
    :param avg_correlation: Average correlation across portfolio    
    :param years_of_data: How many years of data do you have (can be float for partial years)    
    :param avg_SR: Should be realistic for your type of trading    
    :param std: Standard deviation (doesn't affect results, just a scaling parameter)    
    :param how_many_assets: How many assets in the imaginary portfolio    
    :param p_step: Step size to go through in the CDF of the mean estimate    
    :return: float, ratio of weight of asset with different SR to 1/n weight    
    dist_points = np.arange(p_step, stop=(1-pstep)+0.000001, step=p_step)
    list_of_weights = [weights_given_SR_diff(SR_diff,  avg_correlation,
                                          confidence_interval, years_of_data, avg_SR=avg_SR, std=std,
                     for confidence_interval in dist_points]

    array_of_weights = np.array(list_of_weights)
    average_weights = np.nanmean(array_of_weights, axis=0)
    ratio_of_weights = weight_ratio(average_weights)

    if np.sign(ratio_of_weights-1.0)!=np.sign(SR_diff):
        # This shouldn't happen, and only occurs because weight distributions get curtailed at zero        
        return 1.0

    return ratio_of_weights

def weight_ratio(weights):
    Return the ratio of weight of first asset to 1/n weight
    :param weights:    
    :return: float    
    one_over_N_weight = 1.0/len(weights)
    weight_first_asset = weights[0]

    return weight_first_asset/one_over_N_weight

Notice that the function returns the 'weight ratio': the ratio of the optimised  weight to a 1/N weight, since that is what is used in the heuristic method.

There are several open questions here, relating to the parameters we have to specify. Average correlation and the amount of available data are key variables which I will explore in a moment. The choice of average Sharpe Ratio will influence our results to a lesser degree, but to keep things simple I will be sticking to avg_SR=0.5. Changing the standard deviation will have absolutely no effect on the results as it is just a scaling factor.

Finally there is a straightforward methodological issue - the choice of 'p-step' i.e.  how many of these optimisations do we need to do for an accurate answer? Is 0.1,0.2,... 0.9 sufficient (in the function, p-step=0.1) or do we need to be more granular?

import matplotlib
how_many_assets = 2
avg_SR = 0.5
avg_correlation = 0.5
years_of_data = 10
SR_diff_list = list(np.arange(-0.5, stop=0.5, step=0.05))
list_of_psteps = [0.2, 0.1, 0.05, 0.1, 0.01, 0.001]

results_over_psteps = dict()
for pstep in list_of_psteps:
    results_over_SR_diff = []
    for SR_diff in SR_diff_list:
        print("Pstep %f SR_diff %f Years of data %d" % (pstep, SR_diff, years_of_data))
        ratio = mini_bootstrap_ratio_given_SR_diff(SR_diff, how_many_assets,
                                                   avg_correlation, years_of_data, avg_SR=avg_SR,
                                                   p_step = pstep)
    results_over_psteps[pstep] = results_over_SR_diff

df_results = pd.DataFrame(results_over_psteps, index=SR_diff_list)
plot_title = "(Avg correlation %f Number of assets %d, Average SR %f Years of data %d; Lines show pstes %f:%f)" \
             % (avg_correlation, how_many_assets, avg_SR, years_of_data, list_of_psteps[0], list_of_psteps[-1])
matplotlib.pyplot.xlabel("SR relative to portfolio average SR")

I would say that a p-step of 0.01 (i.e. distributional points at 0.01, 0.02, 0.03...0.99) is more than sufficient to give an accurate result (very close to the much more finely grained 0.001).

A minor point: in this and all the following plots the lines do not (as you might expect) go through the point (0.0, 1.0). This is an artifact of the portfolio optimisation process, and not a bug. If I was using these plots in a book, I'd shift them to get them to look nice. As it is, because weights are always renormalised to 100%, it won't make any difference.

How is the heuristic weighting affected by having more data?

how_many_assets = 2
SR_diff_list = list(np.arange(-0.5, stop=0.5, step=0.05))
various_lengths_of_history = [2,5,10,20,30,40]
p_step = 0.01avg_correlation=0.5results_over_years = dict()
for years_of_data in various_lengths_of_history:
    results_over_SR_diff = []
    for SR_diff in SR_diff_list:
        print("Correlation %f SR_diff %f Years of data %d" % (avg_correlation, SR_diff, years_of_data))

        ratio = mini_bootstrap_ratio_given_SR_diff(SR_diff, avg_correlation, years_of_data, avg_SR=avg_SR,
                                                   std=std, how_many_assets=how_many_assets, p_step = p_step)
    results_over_years[years_of_data] = results_over_SR_diff

df_results = pd.DataFrame(results_over_years, index = SR_diff_list)
plot_title = "Avg correlation %f (Number of assets %d, Average SR %f; Lines show years of data %d:%d)" \
                        % (avg_correlation,     how_many_assets,   avg_SR,  various_lengths_of_history[0], various_lengths_of_history[-1])
df_results.plot(title = plot_title)
matplotlib.pyplot.xlabel("SR relative to portfolio average SR")
What we expect to see here is more data -> increase in confidence -> more extreme weights, when we have more history. And that is exactly what we see; with 2 years of data the lowest weight we can get to is around 0.5, whilst with 30 years plus it's almost zero.

How is the weighting affected by average correlations?

how_many_assets = 2
SR_diff_list = list(np.arange(-0.5, stop=0.5, step=0.05))
years_of_data = 10
p_step = 0.01
various_correlations=[0.0, 0.3, 0.5, 0.7, 0.9]
results_over_correl = dict()
for avg_correlation in various_correlations:
    results_over_SR_diff = []
    for SR_diff in SR_diff_list:
        print("Correlation %f SR_diff %f Years of data %d" % (avg_correlation, SR_diff, years_of_data))

        ratio = mini_bootstrap_ratio_given_SR_diff(SR_diff, avg_correlation, years_of_data, avg_SR=avg_SR,
                                                   std=std, how_many_assets=how_many_assets, p_step = p_step)
    results_over_correl[avg_correlation] = results_over_SR_diff

df_results = pd.DataFrame(results_over_correl, index = SR_diff_list)
plot_title = "Years %d (Number of assets %d, Average SR %f; Lines show correlations %d:%d)" \
                        % (years_of_data,     how_many_assets,   avg_SR,  various_correlations[0], various_correlations[-1])
df_results.plot(title = plot_title)
matplotlib.pyplot.xlabel("SR relative to portfolio average SR")

The higher the correlation between two strategies, the more confident we can be that a small difference in Sharpe Ratios is meaningful, and the more extreme weights we can use.

Plugging into pysystemtrade

Now seems a good opportunity to test how effective this method is. So let's plug in the above code into the handcrafting implementation in my open source python backtesting engine, pysystemtrade.

The optimisation code is a bit of a mess, but as a starting point towards refactoring I've moved the core optimisation routines which are needed here into another file to avoid double coding or messing up the heirarchy. I've also updated handcrafting.py to include the new method.

Let's check things are working as we expect with a simple example. Imagine we're trading the VIX with some vanilla trading rules, plus a dumb one: Always go long the VIX:

from systems.provided.futures_chapter15.estimatedsystem import futures_system
from systems.provided.moretradingrules.morerules import long_bias
from systems.forecasting import TradingRule

system = futures_system(log_level="on")
config = system.config
new_rule = TradingRule(long_bias) config.trading_rules['long'] = new_rule

# Following line means we won't be adjusting
system = futures_system(log_level="on", config=config)

First we'll check what happens without any Sharpe Ratio adjustment.

instrument_returns = system.combForecast.get_returns_for_optimisation("VIX").to_frame()
instrument_returns = instrument_returns.resample("W").sum()

p=Portfolio(instrument_returns, use_SR_estimates=False)

>>> p.sharpe_ratio
array([ 0.6591312 ,  0.50752412,  0.5209934 ,  0.48374054, -0.52792834])

>>> p.corr_matrix
                carry  ewmac16_64  ewmac32_128  ewmac64_256      long
carry        1.000000    0.681041     0.844301     0.913307 -0.872324
ewmac16_64   0.681041    1.000000     0.914988     0.782395 -0.528207
ewmac32_128  0.844301    0.914988     1.000000     0.946146 -0.659741
ewmac64_256  0.913307    0.782395     0.946146     1.000000 -0.726869
long        -0.872324   -0.528207    -0.659741    -0.726869  1.000000

>>> p.show_subportfolio_tree()
Natural top level grouping used
[' Contains 3 sub portfolios', 
["[0] Contains ['carry', 'ewmac32_128', 'ewmac64_256']"], 
["[1] Contains ['ewmac16_64']"], ["[2] Contains ['long']"]]

>>> p.cash_weights
Natural top level grouping used
[0.06658, 0.3790, 0.08106, 0.0653, 0.4079]

It's very diversifying, so get's it's own group and a chunky weight.

Now with the adjustment applied:

system = futures_system(log_level="on", config=config)

instrument_returns = system.combForecast.get_returns_for_optimisation("VIX").to_frame()
instrument_returns = instrument_returns.resample("W").sum()

>>> p.cash_weights
Natural top level grouping used
[0.1092, 0.5553, 0.1329, 0.09569, 0.1067]

Let's look at the time series of weights to see how they change as we get more data:


The "always long" trading rule weight starts off pretty high after 2008 when it performed very well. Subsequently however it's performance fades, and so does it's weight (albeit there is some jumping around as the group weights change).


I'm happy that the heuristic for Sharpe Ratio now deals properly with assets that have different correlations and lengths of available data. In the new year I'll be using the upgraded handcrafting method to test the trading rules for skew and kurtosis that I've been playing with for the last couple of posts.