Remember the handcrafting method, which I described in this series of posts?
Motivating portfolio construction
Methodology
Implementing
Testing
Adjusting portfolio weights for Sharpe Ratios
All very nice, all very theoretically grounded, except for one thing: the 'candidate matrices'. Remember, what we do is group our portfolio into subportfolios of 2 or 3 assets, and find some initial risk weights (following which there is some funny business involved IDMs and SR adjustments).
For two assets the risk weights are trivial (50% each), and then for three assets we have the problem of coming up with some portfolio weights that account for different correlations, and yet are robust to the uncertainty in correlation estimates.
Note: I account for uncertainty in Sharpe Ratios. I ignore uncertainty in volatility estimates, as that is relatively small and has minimal effect on portfolio weights.
I do this by coming up with 'candidate' matrices. Basically a series of correlation matricies that cover the possible combinations of three assets. Each of these then has a matching set of portfolio weights. We find the closest correlation matrix and voila, we have the right weights! (possibly with some extrapolation if no matrix is perfect).
Now this has always been a slightly unsatisfactory part of the method, which reflects it's roots as a heuristic method used by humans requiring minimal computation. As humans, it is nice to glance at correlation matrices and say 'ah this looks like a classic #3 matrix. I have some nice robust weights for this in a draw somewhere'. But as I've made the transition towards fully systematizing and automating this method, we can do better.
To be precise, we can do what we need with the SR adjustments. In the original post, with the method taken from my first book, this was a somewhat vague method that took no precise account for how much uncertainty there is in SR estimates. In my subsequent post on the subject, I did exactly that: computed the expected distribution of parameter estimates, and took the average portfolio weight over various points
So in this post, I'm going to take a similar approach. To whit, I will replace the 'candidate matching' stage in handcrating with the following:
- Get central estimates for correlations in a given asset portfolio
- Get the amount of time for which those correlations are estimated over
- Calculate the parameter distribution for the correlations
- Optimise the portfolio weights over different points on the correlation distribution
- Take an average of those portfolio weights
I'd recommend reading this post even if you're not a fan of the handcrafting method. It will give you some interesting insight into the uncertainty of correlation estimates, their effect on portfolio weights, and (bonus feature!) the optimal amount of time to estimate correlations over.
Setup
Let's start by setting ourselves up with some boiler plate optimisation code that will give us some weights for a three asset portfolio, based purely on the correlations (I'll assume SR=0.5 and some arbitrary standard deviation for all three assets).
from scipy.optimize import minimize
import numpy as np
def optimise_for_corr_matrix(corr_matrix):
mean_list = [.05]*3
std = [.1]*3
return optimise_inner(mean_list, corr_matrix, std)
def optimise_inner(mean_list, corr_matrix, std):
stdev_list = [std]*len(mean_list)
sigma = sigma_from_corr_and_std(stdev_list, corr_matrix)
return optimise_with_sigma(sigma, mean_list)
def optimise_with_sigma(sigma, mean_list):
mus = np.array(mean_list, ndmin=2).transpose()
number_assets = sigma.shape[1]
start_weights = [1.0 / number_assets] * number_assets
# Constraints - positive weights, adding to 1.0
bounds = [(0.0, 1.0)] * number_assets
cdict = [{'type': 'eq', 'fun': addem}]
ans = minimize(
neg_SR,
start_weights, (sigma, mus),
method='SLSQP',
bounds=bounds,
constraints=cdict,
tol=0.00001)
weights = ans['x']
return weights
def neg_SR(weights, sigma, mus):
# Returns minus the Sharpe Ratio (as we're minimising)
weights = np.matrix(weights)
estreturn = (weights * mus)[0, 0]
std_dev = (variance(weights, sigma)**.5)
return -estreturn / std_dev
def variance(weights, sigma):
# returns the variance (NOT standard deviation) given weights and sigma
return (weights * sigma * weights.transpose())[0, 0]
def sigma_from_corr_and_std(stdev_list, corrmatrix):
stdev = np.array(stdev_list, ndmin=2).transpose()
sigma = stdev * corrmatrix * stdev
return sigma
def addem(weights):
# Used for constraints
return 1.0 - sum(weights)
labelledCorrelations = namedtuple("labelledCorrelationList", 'ab ac bc')
def three_asset_corr_matrix(labelled_correlations):
"""
:return: np.array 2 dimensions, size
"""
ab = labelled_correlations.ab
ac = labelled_correlations.ac
bc = labelled_correlations.bc
m = [[1.0, ab, ac], [ab, 1.0, bc], [ac, bc, 1.0]]
m = np.array(m)
return m
Let's check a few stylised examples:
>>> optimise_for_corr_matrix(three_asset_corr_matrix(labelledCorrelations(ab=0,ac=0,bc=0)))
array([0.33333333, 0.33333333, 0.33333333])
>>> optimise_for_corr_matrix(three_asset_corr_matrix(labelledCorrelations(ab=0.9,ac=0,bc=0)))
array([0.25627057, 0.25627056, 0.48745887])
>>> optimise_for_corr_matrix(three_asset_corr_matrix(labelledCorrelations(ab=0.9,ac=0.9,bc=0)))
array([0. , 0.49999999, 0.50000001])
>>> optimise_for_corr_matrix(three_asset_corr_matrix(labelledCorrelations(ab=0.9,ac=0.9,bc=0.9)))
array([0.33333333, 0.33333333, 0.33333333])
That all looks pretty sensible.
Correlation uncertainty
Now, what is the correlation uncertainty given some sample size and the correlation value? I could paste in some LaTex, but instead let me point you towards the wiki page, and show you some code that implements it:
def get_correlation_distribution_point(corr_value, data_points, conf_interval):
fisher_corr = fisher_transform(corr_value)
point_in_fisher_units = \
get_fisher_confidence_point(fisher_corr, data_points, conf_interval)
point_in_natural_units = inverse_fisher(point_in_fisher_units)
return point_in_natural_units
def fisher_transform(corr_value):
return 0.5*np.log((1+corr_value) / (1-corr_value)) # also arctanh
def get_fisher_confidence_point(fisher_corr, data_points, conf_interval):
if conf_interval<0.5:
confidence_in_fisher_units = fisher_confidence(data_points, conf_interval)
point_in_fisher_units = fisher_corr - confidence_in_fisher_units
elif conf_interval>0.5:
confidence_in_fisher_units = fisher_confidence(data_points, 1-conf_interval)
point_in_fisher_units = fisher_corr + confidence_in_fisher_units
else:
point_in_fisher_units = fisher_corr
return point_in_fisher_units
def fisher_confidence(data_points, conf_interval):
data_point_root =fisher_stdev(data_points)
conf_point = get_confidence_point(conf_interval)
return data_point_root * conf_point
def fisher_stdev(data_points):
data_point_root = 1/((data_points-3)**.5)
return data_point_root
def get_confidence_point(conf_interval):
conf_point = norm.ppf(1-(conf_interval/2))
return conf_point
def inverse_fisher(fisher_corr_value):
return (np.exp(2*fisher_corr_value) - 1) / (np.exp(2*fisher_corr_value) + 1)
Let's have a play with this. First of all, the 0.5 point should give us the central estimate for correlation:
>>> get_correlation_distribution_point(0.9, 100, 0.5)
0.9
>>> get_correlation_distribution_point(0.0, 100, 0.5)
0.0
We can back out lower and upper estimates:
>>> get_correlation_distribution_point(0.0, 100, 0.05)
-0.1964181176820594
>>> get_correlation_distribution_point(0.0, 100, 0.95)
0.19641811768205936
These get wider if we use less data:
>>> get_correlation_distribution_point(0.0, 10, 0.05)
-0.6296263003883293
>>> get_correlation_distribution_point(0.0, 10, 0.95)
0.6296263003883295
The distribution gets narrower and slightly assymetric at extremes:
>>> get_correlation_distribution_point(0.99, 100, 0.05)
0.9851477380139932
>>> get_correlation_distribution_point(0.99, 100, 0.95)
0.9932723911926727
And so on.
Three asset portfolio weights under conditions of portfolio uncertainty
For a three asset portfolio we can now back out the optimised portfolio weights given the relevant points on the correlation distributions:
def optimise_for_corr_matrix_with_uncertainty(corr_matrix, conf_intervals, data_points):
labelled_correlations = extract_asset_pairwise_correlations_from_matrix(corr_matrix)
labelled_correlation_points = calculate_correlation_points_from_tuples(labelled_correlations, conf_intervals, data_points)
corr_matrix = three_asset_corr_matrix(labelled_correlation_points)
weights = optimise_for_corr_matrix(corr_matrix)
return weights
def extract_asset_pairwise_correlations_from_matrix(corr_matrix):
ab = corr_matrix[0][1]
ac = corr_matrix[0][2]
bc = corr_matrix[1][2]
return labelledCorrelations(ab=ab, ac=ac, bc=bc)
def calculate_correlation_points_from_tuples(labelled_correlations, conf_intervals, data_points):
correlation_point_list = [get_correlation_distribution_point(corr_value, data_points, confidence_interval)
for corr_value, confidence_interval in
zip(labelled_correlations, conf_intervals)]
labelled_correlation_points = labelledCorrelations(*correlation_point_list)
return labelled_correlation_points
Let's try this out:
>>> corr_matrix = three_asset_corr_matrix(labelledCorrelations(0,0,0))
>>> conf_intervals = labelledCorrelations(.5,.5,.5)
>>> optimise_for_corr_matrix_with_uncertainty(corr_matrix, conf_intervals, 100)
array([0.33333333, 0.33333333, 0.33333333])
# Makes sense, taking the median point off the distribution just recovers equal weights
>>> conf_intervals = labelledCorrelations(.5,.5,.95)
>>> optimise_for_corr_matrix_with_uncertainty(corr_matrix, conf_intervals, 100)
array([0.37431944, 0.31284028, 0.31284028])
# with a higher confidence point for correlation BC we put more in asset A
>>> conf_intervals = labelledCorrelations(.5,.5,.05)
>>> optimise_for_corr_matrix_with_uncertainty(corr_matrix, conf_intervals, 100)
array([0.286266, 0.356867, 0.356867])
# with a lower correlation for BC we put less in asset A
>>> optimise_for_corr_matrix_with_uncertainty(corr_matrix, conf_intervals, 10)
array([0.15588428, 0.42205786, 0.42205786])
# With less data the confidence intervals get wider
Seems to be working. Now I need to use the same technique as in the previous post on SR adjustments, which is get the portfolio weights over a number of points on the distribution, and then take an average.
def optimised_weights_given_correlation_uncertainty(corr_matrix, data_points, p_step=0.1):
dist_points = np.arange(p_step, stop=(1-p_step)+0.000001, step=p_step)
list_of_weights = []
for conf1 in dist_points:
for conf2 in dist_points:
for conf3 in dist_points:
conf_intervals = labelledCorrelations(conf1, conf2, conf3)
weights = optimise_for_corr_matrix_with_uncertainty(corr_matrix, conf_intervals, data_points)
list_of_weights.append(weights)
array_of_weights = np.array(list_of_weights)
average_weights = np.nanmean(array_of_weights, axis=0)
return average_weights
Slightly niggly thing here is that there are three inputs, so I need to iterate over 3 sets of distributions. Hence I'm using a 'p-step' of 0.1 (meaning we have to do 9^3 optimisations); using smaller values doesn't change the results (whilst being exponentially slower!), larger values are a bit too coarse.
Recalculating the candidate matrices
Now I'm going to recalculate the candidate matrices used in the original handcrafting. This involves doing this kind of thing:
>>> corr_matrix = three_asset_corr_matrix(labelledCorrelations(0.,0.,0.))
# get raw optimised weights, no uncertainty
>>> optimise_for_corr_matrix(corr_matrix)
array([0.33333333, 0.33333333, 0.33333333])
# get optimised weights with uncertainty, using data_periods = 5
>>> optimised_weights_given_correlation_uncertainty(corr_matrix, 5)
array([0.33333332, 0.33333346, 0.33333323])
Here are the results. First the raw weights with no uncertainty (first 3 columns are pairwise correlations, next 3 are weights):
Corr AB Corr AC Corr BC Raw A Raw B Raw C
0 0.5 0 0.286 0.428 0.286
0 0.9 0 0.256 0.487 0.256
0.5 0 0.5 0.5 0 0.5
0 0.5 0.9 0.5 0.5 0
0.9 0 0.9 0.5 0 0.5
0.5 0.9 0.5 0.263 0.473 0.263
0.9 0.5 0.9 0.5 0 0.5
0 0 0 0.33 0.33 0.33
Now for the original handcrafted weights:
Corr AB Corr AC Corr BC H/C A H/C B H/C C
0 0.5 0 0.3 0.4 0.3
0 0.9 0 0.27 0.46 0.27
0.5 0 0.5 0.37 0.26 0.27
0 0.5 0.9 0.45 0.45 0.1
0.9 0 0.9 0.39 0.22 0.39
0.5 0.9 0.5 0.29 0.42 0.29
0.9 0.5 0.9 0.42 0.16 0.42
0 0 0 0.33 0.33 0.33
Now let's do the new stuff, with 5 data points (anything less than 4 will break, for obvious reasons):
Corr AB Corr AC Corr BC 5dp A 5dp B 5dp C
0 0.5 0 0.298 0.404 0.298
0 0.9 0 0.264 0.472 0.264
0.5 0 0.5 0.379 0.241 0.379
0 0.5 0.9 0.462 0.349 0.189
0.9 0 0.9 0.44 0.12 0.44
0.5 0.9 0.5 0.279 0.441 0.279
0.9 0.5 0.9 0.411 0.178 0.411
0 0 0 0.33 0.33 0.33
Some observations:
- For 'nice' matrices (like equal correlations or the first row: 0,.5,0) the results of both the original handcrafting and the new method are very close to the raw optimisation
- In most cases the original handcrafting and the new method give very similar results. The main exceptions are the fourth row and the third row.
Now let's see how things change with more data, say data_points=20:
Corr AB Corr AC Corr BC 20dp A 20dp B 20dp C
0 0.5 0 0.278 0.444 0.278
0 0.9 0 0.252 0.496 0.252
0.5 0 0.5 0.437 0.125 0.437
0 0.5 0.9 0.497 0.443 0.06
0.9 0 0.9 0.5 0 0.5
0.5 0.9 0.5 0.256 0.489 0.256
0.9 0.5 0.9 0.4955 0.009 0.4955
0 0 0 0.33 0.33 0.33
For the 'nicer' correlation matrices, not very much happens as we have more data. However for the more extreme versions, like rows 3 and 4, we get closer to the raw optimised weights (since we can be more confident that the central correlation estimates are correct). In fact for row 5 we end up with the raw weights, and the penultimate row isn't far off. And finally data_points=100:
Corr AB Corr AC Corr BC 100dp A 100dp B 100dp C
0 0.5 0 0.283 0.434 0.283
0 0.9 0 0.252 0.496 0.252
0.5 0 0.5 0.471 0.059 0.471
0 0.5 0.9 0.5 0.5 0
0.9 0 0.9 0.5 0 0.5
0.5 0.9 0.5 0.256 0.489 0.256
0.9 0.5 0.9 0.5 0 0.5
0 0 0 0.33 0.33 0.33
How many data points should we use?
Although the confidence intervals are tighter (since the correlation is closer to 1) the record is just as bad: 14 misses in 18 years. No secular trend here, but the typical pattern of highly correlated assets: correlations break down in a crisis (1999, 2008). Again the true 95% range is probably 0.04 rather than the 0.01 show here.
def fisher_confidence(data_points, conf_interval):
data_point_root =fisher_stdev(data_points)*4
conf_point = get_confidence_point(conf_interval)
return data_point_root * conf_point
>>> get_correlation_distribution_point(-0.25, 1000, 0.025)
-0.31528119007914723
>>> get_correlation_distribution_point(-0.25, 1000, 0.975)
-0.18236395028476593
>>> get_correlation_distribution_point(.96, 1000, 0.025)
0.9540384541438645
>>> get_correlation_distribution_point(.96, 1000, 0.975)
0.9652020604250287
>>> get_correlation_distribution_point(-0.25, 1000, 0.025)
-0.4925007507489921
>>> get_correlation_distribution_point(-0.25, 1000, 0.975)
>>> 0.028523195562247212
>>> get_correlation_distribution_point(.96, 1000, 0.025)
0.9304815655421306
>>> get_correlation_distribution_point(.96, 1000, 0.975)
0.9771329891421301
def apply_min_weight(average_weights):
weights_with_min = [min_weight(weight) for weight in average_weights]
adj_weights = weights_sum_to_total(weights_with_min)
return adj_weights
def min_weight(weight):
if weight<0.1:
return 0.1
else:
return weight
def weights_sum_to_total(weights_with_min):
sum_weights = np.sum(weights_with_min)
adj_weights = weights_with_min / sum_weights
return adj_weights
>>> apply_min_weight(average_weights)
array([0.45454545, 0.09090909, 0.45454545])
Should we use all of our data to estimate correlations?
- Underlying instrument returns for different instruments
- Trading subsystem returns for different instruments
- Returns of trading rules
from systems.provided.futures_chapter15.basesystem import futures_system
system = futures_system()
## underlying returnsdel(system.config.instrument_weights)
perc_returns = [system.rawdata.get_percentage_returns(instrument_code) for instrument_code in instrument_list]
instrument_list = system.get_instrument_list()
perc_returns_df = pd.concat(perc_returns, axis=1)
perc_returns_df.columns = instrument_list
## subsystem returns
subsys_returns = [system.accounts.pandl_for_subsystem(instrument_code) for instrument_code in instrument_list]
subsys_returns_df = pd.concat(subsys_returns, axis=1)
subsys_returns_df.columns = instrument_list
## trading rule returnsdef get_rule_returns(instrument_code):
rules = list(system.rules.trading_rules().keys())
rule_returns = [system.accounts.pandl_for_instrument_forecast(instrument_code, rule) for rule in rules]
rule_returns_df = pd.concat(rule_returns, axis=1)
rule_returns_df.columns = rules
return rule_returns_df
use_returns = perc_returns_df # change as required
use_returns = use_returns[pd.datetime(1998,1,1):] # common timestamp for fair comparison
use_returns = use_returns.resample("5B").sum() # good enough approx for weekly returns
def get_forecast_and_future_corr(Nweeks_back, Nweeks_forward):
forecast = get_historic_correlations(Nweeks_back)
future = get_future_correlations(Nweeks_forward)
pd_result = merge_forecast_and_future(forecast, future, Nweeks_forward)
return pd_result
def merge_forecast_and_future(forecast, future, Nweeks_forward):
assets = forecast.columns # should be the same won't check
pd_result = []
for asset in assets:
result_for_asset = pd.concat([forecast[asset], future[asset]], axis=1)
# remove tail with nothing
result_for_asset = result_for_asset[:-Nweeks_forward]
# remove overlapping periods which bias R^2
selector = range(0, len(result_for_asset.index), Nweeks_forward)
result_for_asset = result_for_asset.iloc[selector]
result_for_asset.columns = ['forecast', 'turnout']
pd_result.append(result_for_asset)
pd_result = pd.concat(pd_result, axis=0)
return pd_result
def get_future_correlations(Nweeks_forward):
corr = get_rolling_correlations(use_returns, Nweeks_forward)
corr = corr.ffill()
future_corr = corr.shift(-Nweeks_forward)
return future_corr
def get_historic_correlations(Nweeks_back):
corr = get_rolling_correlations(use_returns, Nweeks_back)
corr = corr.ffill()
return corr
def get_rolling_correlations(use_returns, Nperiods):
roll_df = use_returns.rolling(Nperiods, min_periods=4).corr()
perm_names = get_asset_perm_names(use_returns)
roll_list = [get_rolling_corr_for_perm_pair(perm_pair, roll_df) for perm_pair in perm_names]
roll_list_df = pd.concat(roll_list, axis=1)
roll_list_df.columns = ["%s/%s" % (asset1, asset2) for (asset1, asset2) in perm_names]
return roll_list_df
def get_asset_perm_names(use_returns):
asset_names = use_returns.columns
permlist = []
for asset1 in asset_names:
for asset2 in asset_names:
if asset1==asset2:
continue
pairing = [asset1, asset2]
if pairing in permlist:
continue
pairing.reverse()
if pairing in permlist:
continue
permlist.append(pairing)
return permlist
def get_rolling_corr_for_perm_pair(perm_pair, roll_df):
return roll_df[perm_pair[0]][:,perm_pair[1]]
Nweeks_forward = 52 # use 13 weeks for underlying returns, 52 for others
import statsmodels.api as sm
import matplotlib.pyplot as pyplot
pyplot.rcParams.update({'font.size': 16})
Nweeks_list = [4, 7, 13, 26,52, 104, 208, 520]
r_squared = []
for Nweeks_back in Nweeks_list:
print(Nweeks_back)
pd_result = get_forecast_and_future_corr(Nweeks_back, Nweeks_forward)
pd_result = pd_result.dropna()
x = pd_result.forecast
x = sm.add_constant(x)
y = pd_result.turnout
est = sm.OLS(y, x).fit()
r2 = est.rsquared_adj
r_squared.append(r2)
ax = pyplot.gca()
ax.scatter(Nweeks_list, r_squared)
ax.set_xscale('log')
Nweeks_forward = 52 # use 13 weeks for underyling returns, 52 for others
r_squared = []
for Nweeks_back in Nweeks_list:
print(Nweeks_back)
all_r2 = []
for instrument_code in instrument_list:
print("%s/%s" % (instrument_code, Nweeks_back))
use_returns = get_rule_returns(instrument_code)
use_returns = use_returns[pd.datetime(1998, 1, 1):] # common timestamp
use_returns = use_returns.resample("5B").sum() # good enough approx for weekly returns
pd_result = get_forecast_and_future_corr(Nweeks_back, Nweeks_forward)
pd_result = pd_result.replace([np.inf, -np.inf], np.nan)
pd_result = pd_result.dropna()
x = pd_result.forecast
x = sm.add_constant(x)
y = pd_result.turnout
est = sm.OLS(y, x).fit()
r2_this_instr = est.rsquared_adj
all_r2.append(r2_this_instr)
r2 = np.mean(all_r2)
r_squared.append(r2)
- A lookback of less than 2 years is likely to do a relatively poor job of forecasting future correlations over the next year
- Lookbacks of more than 10 years may do a little better at forecasting, but from above we know they will produce weights that aren't as robust even after applying a fudge factor
- Shorter lookbacks may also result in assets moving between groups, which will produce changes in weights regardless of how robust our method is.
Plugging into pysystemtrade
from systems.provided.futures_chapter15.estimatedsystem import futures_system
system = futures_system()
system.config.instruments = ['US10', 'US5', 'SP500']
system = futures_system(config = system.config)
x=system.combForecast.get_forecast_weights("US10")
x.plot()
Hi Rob! love the post. I am just not sure how and when the weights are calculated. Is the forecast of the instruments used to calculate the weights?
ReplyDeleteBest,
Andrew
optimise_with_sigma
DeleteHi Rob.
ReplyDeleteRegarding the weighting of trading rules, do you have any thoughts on deriving correlations directly from trading rule characteristics themselves rather than returns of synthetic trading rule portfolios?
As an example, let’s say we're trading a multifactor equities strategy, using P/E, P/B, and momentum. The rolling correlations based on returns will depend on the recent strength of the factors. E.g., when value stocks all sell off, the price correlation of P/E and P/B will rise.
The characteristic correlations, however, should be quite stable: low P/E stocks are often low P/B stocks, regardless of whether value stocks have had co-movement lately.
If by using correlations in the weighting of trading rules, we’re essentially saying “I’m not going to reward Bob for telling me what Alice already told me,” perhaps using characteristics gets to the heart of the matter better than using returns?
Thanks,
Richard
Yes, and in fact you can do this for momentum
Deleteeg https://photos.app.goo.gl/rHo7HACFKEkm5Xcj9 (credit Soc Gen via Riccardo Ronco). Such derivations rely on assumptions however which are unlikely to be true. The advantage is you definitely can't be accused of overfitting.
In the example you linked, I'm guessing the difference between empirical and theoretical correlations of the momentum signals is due to autocorrelation?
ReplyDeleteWhat are the assumptions that are unlikely to be true?
Auto correlation and non Gaussian distributions.
DeleteHi Rob. I've got 2 questions regarding correlations.
ReplyDeleteQ1.
I'm struggling to understand one of the scenarios in your original handcrafted weights.
In row 11: correlations(AB,AC,BC) of [0.9,0.5,0.9], use weights(A,B,C) of [42%,16%,42%]. So far so good.
In row 9: correlations(AB,AC,BC) of [0.9, 0, 0.9], use weights(A,B,C) of [39%,22%,39%]. In this scenario, AC has a lower correlation, so shouldn't assets A & C have higher weights and B a lower weight relative to row 11? What am I missing?
Q2.
I find the intuitions you include as justifications for your decisions hugely valuable (e.g. SR max of 0.4 for an instrument subsystem). However, I can't grasp your aversion to zero weights for correlated trading rules.
My understanding of the correlation uncertainty problem is that for high value correlations (e.g. 0.9), negative skew means that small samples will usually overstate underlying correlation. However, you've shown that with enough data points this problem goes away.
What exactly are you trying to protect against by avoiding zero weights?
Thanks,
Richard
PS Another hack you could use: you could apply a linear shrinkage factor to all correlations before solving for max diversification. I think this would bring the weights closer to what you want.
Q1: There is a good chance the weights are transposed, but I can't easily find the ancient code I used to calculate them to check.
DeleteQ2: Zero weights assume the data is purely stationary. Even if we had a zillion data points, it seems heroic to assume that something really should have a zero weight. Zillions of data points only improve the sampling error for the past data which doesn't help us if the correlations change.
I've used shrinkage in the past, but it is a bit of a 'hacky' solution to this problem without the elegant direct line from sampling uncertainty to weights that we have here.
Thanks for your response Rob.
DeleteEven if we assume non-stationarity, out-of-sample correlations may have to change by a lot for a zero weight to become non-zero (when we force weights to be non-negative).
To use the example of 3 different speed EWMA rules with correlations(AB,AC,BC) of [0.9/X/0.9], optimising for max diversification doesn't allocate a weight to asset B until AC rises above 0.8.
In fact, with correlations(AB,AC,BC) of [X/Y/X], Y must be greater than 2X-1 before asset B receives a non-zero weight (no proof, just derived empirically). Therefore, even given non-stationarity, it appears that zero weights could actually be quite stable.
What do you think?
Well it isn't just correlations that aren't stable. Let's put them aside and think more deeply about why I don't like zero weights. Financial assets aren't nice joint Gaussian things, whether they are underlying instrument returns. In some ways they have more in common with an envelope full of lottery tickets. Imagine I had an envelope containing all possible tickets for tonights lottery draw, and I allowed you to allocate portfolio weights to them. Of course the optimal weight would be 1/n, but more importantly you would be insane to allocate a zero weight to any ticket, because that could be the one that hits the jackpot. So even if you had some skill at predicting lottery draw numbers, and had a strong forecast that a certain set of numbers would come up, you would still want a non zero allocation in everything else.
DeleteOk, cast the Gaussian theory into the bonfire, let’s talk in bets!
DeleteIn your lottery ticket analogy, I agree that it'd be wrong to have a zero weight. But I don't think your analogy is fair because lottery tickets are uncorrelated (well, negatively correlated actually) and max diversification only ever allocates a zero weight because correlations are high!
Let me try my hand at analogy.
Say you have 3 friends, each of whom has a profitable system for picking the profitable line in a basketball game.
Alex runs statistical models using historical possession data to make profitable bets. She wins 55% of bets.
Ben has a natural gift for handicapping and watches every game. He also wins 55% of bets, but his bets don't always match up with Alex's.
Charlie gets tips from Alex & Ben, and bets according to who he likes more at the time. Clearly Charlie also wins 55% of bets.
How would you allocate to a portfolio of these betting systems?
Charlie's system is more correlated to both Alex's and Ben's than either Alex's or Ben's are to each other, but the correlations aren’t stable. However, because Charlie's system sits inside Alex's and Ben's systems, it can never provide any diversification benefit to the portfolio, and therefore zero-weighting Charlie makes sense.
Isn’t it possible that a similar relationship exists between different speed trend rules? A medium speed trend rule is intrinsically correlated to the faster and slower rules (as it’s calculated using many of the same values), and so it could be possible that it never provides any diversification benefit?
I appreciate you taking the time on this, it’s certainly right down in the weeds.
Of course I wouldn't bet on Charlie.
DeleteI guess the point I was that asset returns are sometimes like ABC, and sometimes like the lottery. The main point of the lottery wasn't so much the zero correlation, more that a linear correlation measure doesn't really make sense when the relationship between assets is non linear. And when you move from having well known correlation linear correlation estimates to having unknown non linear relationships, zero weights make less sense.
To put it another way, for many portfolio optimisation problems things are as you describe, except every now and then Charlie has a flash of brilliance and does something unique - he makes money when the other two don't. Not often enough to affect the linear correlation measure - but enough to justify a non zero weight.
I must admit for correlated trend rules, there is definitely a case to say they are more like the ABC example than the lottery example. It's a special case where I can actually calculate what the correlations from the rules can be without empirical data.
But I am not so sure that is quite so true for trading rules that work quite differently from each other.
And I think we can agree that instrument weights covering substrategies trading 'outlier' trading rules like momentum are much more like the lottery example than the ABC, and therefore we should avoid zero weights (assuming we have the capital to do so, or are using something like my dynamic optimisation technique which allows you to trade a large number of instruments with limited dosh).
No problem discussing this - it's something I'll probably include in my 5th or 6th book, so you're providing me with free material....
Hi Rob, something that confuses me about the correlation scores above and their relationship to the handcrafted weights (and also seen in Ch4, Table 8, page 79 of your Sys trading book) is that you assign a higher weight to a higher correlation value. If one asset is 0.9 correlated to another shouldn't it receive a lower weight? Below series is pasted from above as an example; I would expect more funds/risk in the 0 correlated assets but 0.9 has the highest weighted value (0.472). Its the same in the book where 0.0 correlated assets get a lower weighting than higher correlation scores. What am i missing?
ReplyDelete0 0.9 0 0.264 0.472 0.264