Fortunately there are a few methods we can use to deal with this problem. Some of these are straightforward, like the handcrafted method I describe in my book. Others, such as bootstrapping and shrinkage, are more complicated. On any particular set of asset price history one method or another may perform better. It's even possible that, just by fluke, the simplest naive method will do best.
I believe that to get a good grasp of which portfolio optimisation method is best you need to use random data, and I'll do exactly that in this post.
This is the third and final post in a series on using random data. The first, which is worth reading before this one, can be found here. The second post is optional reading. You may also want to read this post which shows some optimisation with real data and explains some of the concepts here in more detail.
I'm using straightforward Markowitz optimisation. We maximise the Sharpe Ratio of a portfolio, over some portfolio weights, given the expected mean, standard deviation, and correlation of returns.
Because I'm optimising the weights of trading systems I constrain weights to be positive, and to sum to 100%. I can also assume that my systems have the same expected volatility of returns, so all I need is the mean (or Sharpe Ratio) and correlation. Finally I don't expect to see negative correlations which makes the problem more stable.
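To make the setup concrete, here is a minimal sketch of this kind of constrained optimisation. It is not the code behind the results in this post: the function names are hypothetical, and the brute-force grid search over three-asset weights is an assumption chosen for readability rather than a proper numerical optimiser. With equal volatilities, the portfolio Sharpe Ratio is the weighted sum of asset Sharpe Ratios divided by the square root of w'Cw, where C is the correlation matrix.

```python
import numpy as np


def portfolio_sharpe(weights, sharpes, corr):
    """Sharpe Ratio of a portfolio whose assets all have the same volatility."""
    weights = np.asarray(weights, dtype=float)
    expected_return = weights @ np.asarray(sharpes, dtype=float)
    risk = np.sqrt(weights @ np.asarray(corr, dtype=float) @ weights)
    return expected_return / risk


def optimise_weights(sharpes, corr, step=0.01):
    """Grid search over positive weights that sum to 100% (three assets only)."""
    n_steps = int(round(1 / step))
    best_weights, best_sr = None, -np.inf
    for i in range(n_steps + 1):
        for j in range(n_steps + 1 - i):
            # The third weight is whatever is left, so the weights always sum to 1.
            w = np.array([i, j, n_steps - i - j]) / n_steps
            sr = portfolio_sharpe(w, sharpes, corr)
            if sr > best_sr:
                best_weights, best_sr = w, sr
    return best_weights, best_sr


# Illustrative inputs: equal Sharpe Ratios, one highly correlated pair.
example_corr = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
example_weights, example_sr = optimise_weights([0.5, 0.5, 0.5], example_corr)
print(example_weights, example_sr)
```

With equal Sharpe Ratios the optimiser is effectively minimising risk, so the uncorrelated third asset should end up with close to half the portfolio.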
The 'one period' or naive method is to take all past returns and estimate Sharpe Ratios and correlations, and then plug these estimates into the model. The flaw of this method is that it ignores the huge uncertainty in the estimates.
Method two - bootstrapping - involves repeating our optimisation many times over different parts of our data, taking the resulting weights, and averaging them out. So the weights are the average of many optimisations, rather than one optimisation on the average of all data.
The logic for bootstrapping is simple. I believe that the past is a good guide to the future; but I don't know which part of the past will be repeated. To hedge my bets I assume there is an equal chance of seeing any particular historical period. So it's logical to use an average of the portfolios which did best over all previous periods.
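The idea of averaging many optimisations can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the results here: I assume each "part of the data" is a sample of days drawn with replacement, the function names are hypothetical, and the optimiser is a coarse grid search that only handles three assets.

```python
import numpy as np


def best_weights(returns, step=0.05):
    """Positive weights summing to 1 with the highest Sharpe Ratio on this sample."""
    n_steps = int(round(1 / step))
    # Work with Sharpe Ratios and correlations, i.e. assume equal volatility.
    sharpes = returns.mean(axis=0) / returns.std(axis=0, ddof=1)
    corr = np.corrcoef(returns, rowvar=False)
    best_w, best_sr = None, -np.inf
    for i in range(n_steps + 1):
        for j in range(n_steps + 1 - i):
            w = np.array([i, j, n_steps - i - j]) / n_steps
            sr = (w @ sharpes) / np.sqrt(w @ corr @ w)
            if sr > best_sr:
                best_w, best_sr = w, sr
    return best_w


def bootstrap_weights(returns, n_draws=100, seed=0):
    """Average the optimal weights found on many resampled histories."""
    rng = np.random.default_rng(seed)
    n_obs = returns.shape[0]
    all_weights = []
    for _ in range(n_draws):
        # Assumption: resample whole days with replacement.
        rows = rng.integers(0, n_obs, size=n_obs)
        all_weights.append(best_weights(returns[rows]))
    return np.mean(all_weights, axis=0)
```

Each individual optimisation tends to produce extreme weights, but because the extremes differ from draw to draw, the average is usually much less concentrated than a single naive optimisation.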
Shrinkage is a Bayesian method which involves taking the estimated Sharpe Ratios and correlation, and then "shrinking" them towards a "prior". There are numerous ways to do this, but I'm going to use a relatively simple variation. "Shrinking" involves taking a weighted average of the estimated correlation matrix (or mean vector) and the prior; the weighting on the "prior" is the shrinkage factor. For "priors" I'm going to use the "zero information" priors - equal Sharpe Ratios, and identical correlations.
By the way the nemesis of all optimisations, the very simplest method of equal weights, is equivalent to using the shrinkage method with a shrinkage factor of 100%. It should also be obvious that using 0% shrinkage is the same as a naive optimisation.
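The weighted average described above can be sketched like this. The function names are hypothetical, and the choice of prior levels is my assumption: I take the "equal Sharpe Ratios" prior to be the average of the estimated Sharpe Ratios, and the "identical correlations" prior to be the average of the estimated off-diagonal correlations.

```python
import numpy as np


def shrink_sharpes(est_sharpes, factor):
    """Weighted average of estimated Sharpe Ratios and an equal-SR prior."""
    est = np.asarray(est_sharpes, dtype=float)
    # Assumption: the prior gives every asset the average estimated SR.
    prior = np.full_like(est, est.mean())
    return factor * prior + (1 - factor) * est


def shrink_correlations(est_corr, factor):
    """Weighted average of the estimated correlation matrix and an identical-correlation prior."""
    est = np.asarray(est_corr, dtype=float)
    n = est.shape[0]
    # Assumption: the prior uses the average off-diagonal correlation everywhere.
    off_diagonal = est[~np.eye(n, dtype=bool)]
    prior = np.full((n, n), off_diagonal.mean())
    np.fill_diagonal(prior, 1.0)
    return factor * prior + (1 - factor) * est
```

A factor of 0 returns the raw estimates (the naive inputs). A factor of 1 on both returns inputs in which every asset looks the same, so an optimiser fed with them has no reason to prefer anything other than equal weights.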
Handcrafting is a simple robust method I describe in chapter four of my book. We estimate correlations, and then use a lookup table to determine what weights to use. I'm going to use the extended table which is here.
* There is an extension in my book to the hand crafted method which incorporates estimated Sharpe Ratios; but to keep life simple I won't be discussing it here.
I'm going to use the technique I described in the first post in this series for producing random returns for 3 assets. I'm going to use three because it will make the results more tractable and intuitive. It also means I can use the handcrafted method without creating a complex grouping algorithm. The general results will still apply with more assets.
The portfolio I'm going to generate data from has identical Sharpes (in expectation), and correlations of 0.8, 0, 0. So it's very similar to the portfolio of two equity indices and one bond index I consider in this post.
Note that the set of correlations is deliberately different (mostly) from the set used by the handcrafted method, just to ensure there is no favouritism here.
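For readers who want to experiment, here is one simple way to produce returns with this correlation structure. It is a stand-in sketch and not necessarily the technique from the first post: I assume multivariate normal daily returns, and the Sharpe Ratio of 0.5, the 20% annual volatility, the 256 trading days per year and the function name are all illustrative assumptions. I also assume the 0.8 correlation is between the first two assets.

```python
import numpy as np


def random_returns(n_years, sharpes=(0.5, 0.5, 0.5), annual_vol=0.2,
                   days_per_year=256, seed=None):
    """Daily returns for three assets with correlations of 0.8, 0 and 0."""
    corr = np.array([[1.0, 0.8, 0.0],
                     [0.8, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
    # Convert annual figures to daily ones; all assets share the same volatility.
    daily_vol = annual_vol / np.sqrt(days_per_year)
    daily_mean = np.asarray(sharpes, dtype=float) * annual_vol / days_per_year
    cov = corr * daily_vol ** 2
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(daily_mean, cov, size=n_years * days_per_year)


returns = random_returns(20, seed=42)
print(np.corrcoef(returns, rowvar=False).round(2))
```

Passing a different `sharpes` tuple gives a world where the assets have different expected performance.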
The usual caveat about random data applies here. The expected Sharpe Ratio, volatility and correlation of returns in this random world are fixed; but in reality they vary a lot. Nevertheless I think we can still draw some useful conclusions by using random data. Be warned though that non robust methods (such as the classic naive method) will do even worse in the real world than they do here.
All my testing will be done on out of sample windows*. So I'll use data in the past to estimate Sharpe Ratios and correlations, come up with some weights, and then run those weights for a year to see how the portfolio behaves. The amount of history you have is critical, particularly with this stylised example where there is a fixed data generation process to be 'discovered'.
I ran tests with available asset price histories of 1 year, 5 years, 10 years and 20 years for the in sample period, with 1 year for the out of sample period.
* this term is explained more in chapter three of my book, and in this post.
To keep things simple I'll just be using the average out of sample Sharpe Ratio as my measure of how successful a given method is. Just for fun I'll also measure the average degradation of Sharpe Ratio for the optimised weights between in sample and out of sample.
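The two measures can be sketched as follows for a single run. The function names and the 256 trading days per year are assumptions for illustration; the weights are taken as given, and in a real test they would be fitted using only the in sample part of the history.

```python
import numpy as np


def annualised_sharpe(portfolio_returns, days_per_year=256):
    """Annualised Sharpe Ratio of a series of daily returns."""
    r = np.asarray(portfolio_returns, dtype=float)
    return np.sqrt(days_per_year) * r.mean() / r.std(ddof=1)


def in_and_out_of_sample(returns, weights, days_per_year=256):
    """Treat the final year as out of sample and everything before it as in sample."""
    returns = np.asarray(returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    split = returns.shape[0] - days_per_year
    in_sr = annualised_sharpe(returns[:split] @ weights, days_per_year)
    out_sr = annualised_sharpe(returns[split:] @ weights, days_per_year)
    # Negative degradation means the weights did worse out of sample.
    return in_sr, out_sr, out_sr - in_sr
```

Averaging the second and third values over many random histories gives the average out of sample Sharpe Ratio and the average degradation for a method.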
I'm going to compare the four methods against each other, and see which does best on an out of sample basis. Actually I'm going to compare 18 (!) methods: handcrafted and bootstrapping, plus shrinkage with a shrinkage factor of 0% (which is equivalent to the naive method), 33%, 66% and 100% (which is equivalent to equal weighting). Because I allow different shrinkage factors on the mean and correlation, shrinkage alone accounts for 16 of those possibilities.
All these graphs have the same format. The x axis shows which of the 18 methods is used. Note the abbreviations: H/C is handcrafted (which is also the benchmark), BS is bootstrapped and Sab implies shrinkage of a on asset Sharpe Ratios and b on correlations; where 0, 3, 6 and 1 represent 0%, 33%, 66% and 100%. Naturally S00 is the naive method and S11 is equal weights. The y axis shows either the average out of sample SR versus a benchmark (S00), or the average degradation from in sample to out of sample (negative means returns got worse).
I plot the methods with shrinkage for the mean increasing as we go to the right (S11 - equal weights, BS and H/C are on the extreme right; the naive portfolio S00 on the extreme left).
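The labelling scheme is easy to reproduce. In this small sketch the function name is hypothetical, and the ordering of the correlation shrinkage within each level of mean shrinkage is my assumption; the text above only fixes the ordering by mean shrinkage.

```python
from itertools import product

# Label digit -> shrinkage factor, in increasing order of shrinkage.
SHRINKAGE_CODES = {"0": 0.0, "3": 0.33, "6": 0.66, "1": 1.0}


def method_labels():
    """All 18 method labels in plotting order, naive (S00) first."""
    # First digit: shrinkage on Sharpe Ratios; second digit: shrinkage on correlations.
    shrinkage = [f"S{a}{b}" for a, b in product(SHRINKAGE_CODES, repeat=2)]
    return shrinkage + ["BS", "H/C"]


print(method_labels())
```

The 4 x 4 grid of shrinkage factors gives the 16 shrinkage variants, and bootstrapping and handcrafting bring the total to 18.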
Naturally because this is random data I've generated a number of series of asset returns for each set of asset Sharpe Ratios and correlations, and the results are averaged over those.
There is some (messy) code here.
Unsurprisingly without sufficient shrinkage on the means there is massive degradation in the Sharpe Ratio going from in sample to out of sample. Look at S00 the naive method. Out of sample it has an average SR of 0.58; in sample it is 0.78 higher than this, at 1.36! Using a non robust optimisation method over such a short period of data is going to give you seriously .
There is still a degradation in performance going out of sample, but it is much smaller than before.
To an extent these results are a function of the underlying portfolio. If we ran these tests with a portfolio that had significant mean differences then shrinking the mean wouldn't be such a good idea. Here for example are the results with the same correlations as before, but Sharpe Ratios of [0.0, 0.5, 1.0].
First one year:
Now after twenty years:
Handcrafting, which in the simple form here does not account for differences in Sharpe Ratios, doesn't do as well as bootstrapping, which does. It also loses out once the shrinkage methods have enough data to use the difference in Sharpe Ratios properly.
However we don't know in advance what kind of portfolio we have... and significant differences in correlations are more common than statistically different Sharpe Ratios.
I naturally have a soft spot for my preferred methods of bootstrapping and handcrafting. Shrinkage can be a good alternative, but it's hard to get the shrinkage factor correct. In general you need to shrink mean estimates more than correlations, and shrink more when you have less data history. Using insufficient shrinkage, or none at all as with the naive method, will also lead to massive degradation from in sample to out of sample returns.
This was the final post in a series on using random data.
First post: Introducing random data
Second post: Does equity curve trading work?