Fortunately there are a few methods we can use to deal with this problem. Some of these are straightforward, like the handcrafted method I describe in my book. Others, such as bootstrapping and shrinkage, are more complicated. On any particular set of asset price history one method or another may perform better. It's even possible that, just by fluke, the simplest naive method will do best.
I believe that to get a good grasp of which portfolio optimisation method is best you need to use random data, and I'll do exactly that in this post.
This is the third and final post in a series on using random data. The first, which is worth reading before this one, can be found here. The second post is optional reading. You may also want to read this post which shows some optimisation with real data and explains some of the concepts here in more detail.
Optimisation 101
I'm using straightforward Markowitz optimisation. We maximise the Sharpe Ratio of a portfolio, over some portfolio weights, given the expected mean, standard deviation, and correlation of returns.
Because I'm optimising the weights of trading systems, I constrain weights to be positive and to sum to 100%. I can also assume that my systems have the same expected volatility of returns, so all I need is the mean (or Sharpe Ratio) and correlation. Finally, I don't expect to see negative correlations, which makes the problem more stable.
The 'one period' or naive method is to take all past returns and estimate Sharpe Ratios and correlations; and then plug these estimates into the model. The flaw of this method is that it ignores the huge uncertainty in those estimates.
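As a concrete illustration, here is a minimal sketch of what the naive optimisation might look like in Python with scipy. The function name naive_weights is my own, and the use of daily returns and a sample covariance matrix are illustrative assumptions; only the long-only, fully-invested constraints come from the description above.

```python
# A sketch of the naive ("one period") method: estimate means and covariances
# from all past returns, then maximise the portfolio Sharpe ratio over
# long-only weights that sum to 100%.
import numpy as np
from scipy.optimize import minimize

def naive_weights(returns: np.ndarray) -> np.ndarray:
    """returns: T x N matrix of daily asset returns."""
    mean = returns.mean(axis=0)
    sigma = np.cov(returns, rowvar=False)

    def neg_sharpe(w):
        # negative Sharpe ratio, since scipy minimises
        return -(w @ mean) / np.sqrt(w @ sigma @ w)

    n = returns.shape[1]
    bounds = [(0.0, 1.0)] * n                                        # no short positions
    constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]   # fully invested
    start = np.repeat(1.0 / n, n)
    return minimize(neg_sharpe, start, bounds=bounds, constraints=constraints).x
```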
Method two - bootstrapping - involves repeating our optimisation many times over different parts of our data, taking the resulting weights, and averaging them out. So the weights are the average of many optimisations, rather than one optimisation on the average of all data.
The logic for bootstrapping is simple. I believe that the past is a good guide to the future; but I don't know which part of the past will be repeated. To hedge my bets I assume there is an equal chance of seeing any particular historical period. So it's logical to use an average of the portfolios which did best over all previous periods.
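Here is a minimal sketch of that idea, reusing the hypothetical naive_weights function from the sketch above. The number of draws and the length of each resampled block are arbitrary illustrative choices, not the settings used in the post's actual code.

```python
# Bootstrapped weights: optimise on many randomly drawn samples of the history
# and average the resulting weights.
import numpy as np

def bootstrap_weights(returns: np.ndarray, n_draws: int = 100,
                      sample_length: int = 256, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    T, n = returns.shape
    all_weights = []
    for _ in range(n_draws):
        idx = rng.integers(0, T, size=min(sample_length, T))  # sample rows with replacement
        all_weights.append(naive_weights(returns[idx]))        # optimise on this sample
    return np.mean(all_weights, axis=0)                        # average of many optimisations
```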
Shrinkage is a Bayesian method which involves taking the estimated Sharpe Ratios and correlations, and then "shrinking" them towards a "prior". There are numerous ways to do this, but I'm going to use a relatively simple variation. "Shrinking" involves taking a weighted average of the estimated correlation matrix (or mean vector) and the prior; the weighting on the "prior" is the shrinkage factor. For "priors" I'm going to use the "zero information" priors - equal Sharpe Ratios, and identical correlations.
By the way, the nemesis of all optimisations, the very simplest method of equal weights, is equivalent to using the shrinkage method with a shrinkage factor of 100%. It should also be obvious that using 0% shrinkage is the same as the naive optimisation.
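A minimal sketch of that shrinkage step is below. The function name is hypothetical, and taking the average off-diagonal correlation and the average Sharpe Ratio as the "zero information" prior values is my own assumption about how to implement it; 0% shrinkage recovers the naive estimates and 100% gives identical inputs for every asset.

```python
# Shrink estimated Sharpe ratios and correlations towards "zero information"
# priors: equal Sharpe ratios and identical correlations.
import numpy as np

def shrink_estimates(sharpes: np.ndarray, corr: np.ndarray,
                     shrink_mean: float, shrink_corr: float):
    n = len(sharpes)
    # prior for the means: every asset gets the average Sharpe ratio
    prior_sharpes = np.repeat(sharpes.mean(), n)
    # prior for correlations: identical off-diagonal values, ones on the diagonal
    avg_off_diag = corr[~np.eye(n, dtype=bool)].mean()
    prior_corr = np.full((n, n), avg_off_diag)
    np.fill_diagonal(prior_corr, 1.0)

    shrunk_sharpes = (1 - shrink_mean) * sharpes + shrink_mean * prior_sharpes
    shrunk_corr = (1 - shrink_corr) * corr + shrink_corr * prior_corr
    return shrunk_sharpes, shrunk_corr
```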
Handcrafting is a simple robust method I describe in chapter four of my book. We estimate correlations, and then use a lookup table to determine what weights to use. I'm going to use the extended table which is here.
* There is an extension in my book to the handcrafted method which incorporates estimated Sharpe Ratios, but to keep life simple I won't be discussing it here.
The analysis
I'm going to use the technique I described in the first post in this series for producing random returns for 3 assets. I'm going to use three because it will make the results more tractable and intuitive. It also means I can use the handcrafted method without creating a complex grouping algorithm. The general results will still apply with more assets.
The portfolio I'm going to generate data from has identical Sharpes (in expectation), and correlations of 0.8, 0, 0. So it's very similar to the portfolio of two equity indices and one bond index I consider in this post.
Note that the set of correlations is deliberately different (mostly) from the set used by the handcrafted method, just to ensure there is no favouritism here.
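For illustration, here is a minimal sketch of how such correlated random daily returns could be generated. It assumes Gaussian returns, an annualised Sharpe ratio of 0.5 and annualised volatility of 15% for every asset, and a 256 day year; those figures and the function name are my own assumptions, not taken from the post.

```python
# Generate correlated random daily returns for three assets with identical
# expected Sharpe ratios and correlations of 0.8, 0, 0.
import numpy as np

def random_returns(n_years: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    days_per_year = 256
    ann_vol, ann_sharpe = 0.15, 0.5
    daily_vol = ann_vol / np.sqrt(days_per_year)
    daily_mean = ann_sharpe * ann_vol / days_per_year

    # assets 1 and 2 are 0.8 correlated; asset 3 is uncorrelated with both
    corr = np.array([[1.0, 0.8, 0.0],
                     [0.8, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
    cov = corr * daily_vol ** 2            # identical vols, so covariance = corr * vol^2
    return rng.multivariate_normal(np.repeat(daily_mean, 3), cov,
                                   size=n_years * days_per_year)
```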
The usual caveat about random data applies here. The expected Sharpe ratio, volatility and correlation of returns in this random world is fixed; but in reality it varies a lot. Nevertheless I think we can still draw some useful conclusions by using random data. Be warned though that non robust methods (such as the classic naive method) will do even worse in the real world than they do here.
All my testing will be done on out of sample windows*. So I'll use data in the past to estimate Sharpe ratios and correlations, coming up with some weights, and then run those weights for a year to see how the portfolio behaves. The amount of history you have is critical, particularly with this stylised example where there is a fixed data generation process to be 'discovered'.
I ran tests with available asset price histories of 1 year, 5 years, 10 years and 20 years for the in sample period, with 1 year for the out of sample period.
* this term is explained more in chapter three of my book, and in this post.
To keep things simple I'll just be using the average out of sample Sharpe ratio as my measure of how successful a given method is. Just for fun I'll also measure the average degradation of Sharpe ratio for the optimised weights between in sample, and out of sample.
I'm going to compare the four methods against each other, and see which does best on an out of sample basis. Actually I'm going to compare 18 (!) methods: handcrafted and bootstrapping, plus shrinkage with shrinkage factors of 0% (which is equivalent to the naive method), 33%, 66% and 100% (which is equivalent to equal weighting). Because I allow different shrinkage factors on the mean and on the correlations, the shrinkage variations add up to 16 possibilities.
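A minimal sketch of one such test is below, with fit_weights standing in for any of the methods above and the helper names being my own. A full run would repeat this over many windows and many random series and average the results, as described further down.

```python
# Fit weights on an in-sample window, then measure the Sharpe ratio of those
# fixed weights over the following out of sample year, along with the
# degradation relative to the in-sample Sharpe ratio of the same weights.
import numpy as np

def sharpe(daily_returns: np.ndarray) -> float:
    return daily_returns.mean() / daily_returns.std() * np.sqrt(256)

def out_of_sample_test(returns: np.ndarray, fit_weights, in_sample_years: int):
    day = 256
    in_sample = returns[: in_sample_years * day]
    out_sample = returns[in_sample_years * day: (in_sample_years + 1) * day]

    weights = fit_weights(in_sample)
    sr_in = sharpe(in_sample @ weights)
    sr_out = sharpe(out_sample @ weights)
    return sr_out, sr_out - sr_in        # out of sample SR, and the degradation
```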
Results
All these graphs have the same format. The x axis shows which of the 18 methods are used. Note the abbreviations: H/C is handcrafted (which is also the benchmark), BS is bootstrapped and Sab implies shrinkage of a on asset Sharpe Ratios and b on correlations; where 0, 3, 6 and 1 represent 0%, 33%, 66% and 100%. Naturally S00 is the naive method and S11 is equal weights. The y axis shows either the average out of sample SR versus a benchmark (S00), or the average degradation from in sample to out of sample (negative means returns got worse).
I plot the methods with shrinkage on the mean increasing as we go to the right (S11, equal weights, BS and H/C are on the extreme right; the naive portfolio S00 is on the extreme left).
Naturally, because this is random data, I've generated a number of series of asset returns for each set of asset Sharpe ratios and correlations; and the results are averaged over those.
There is some (messy) code here.
One year
The results above are from using just one year of data. Adding shrinkage to correlations seems to make things slightly worse, but shrinking the means improves things. The handcrafted method does best, with bootstrapping coming in second. The scale is a little misleading; going from the worst to the best method is an improvement in SR from around 0.53 to 0.68. Unsurprisingly, without sufficient shrinkage on the means there is massive degradation in the Sharpe Ratio going from in sample to out of sample. Look at S00, the naive method. Out of sample it has an average SR of 0.58; in sample it is 0.78 higher than this, at 1.36! Using a non robust optimisation method over such a short period of data is going to give you seriously unrealistic expectations of future performance.
Five years
Ten years
Twenty years
With twenty years of data the naive method is doing a little better. Shrinking the correlations definitely penalises performance - we now have enough data to be confident that the estimated correlations are significant. However there is still a benefit from shrinking the means, even shrinking them away completely. The advantage of handcrafting and bootstrapping has been whittled away, but they are still amongst the best methods.
There is still a degradation in performance going out of sample, but it is much smaller than before.
Postscript
To an extent these results are a function of the underlying portfolio. If we ran these tests with a portfolio that had significant mean differences then shrinking the mean wouldn't be such a good idea. Here for example are the results with the same correlations as before, but Sharpe Ratios of [0.0, 0.5, 1.0].
First one year:
Now after twenty years:
Here the optimal shrinkage for the mean seems to be between 33% and 66%.
Handcrafting, which in the simple form here does not account for differences in Sharpe ratios, doesn't do as well as bootstrapping, which does. It also loses out once the shrinkage methods have enough data to use the differences in Sharpe ratios properly.
However we don't know in advance what kind of portfolio we have... and significant differences in correlations are more common than statistically different Sharpe Ratios.
Conclusion
I naturally have a soft spot for my preferred methods of bootstrapping and handcrafting. Shrinkage can be a good alternative, but it's hard to get the shrinkage factor right. In general you need to shrink mean estimates more than correlations, and shrink more when you have less data history. Using insufficient shrinkage, or none at all as in the naive method, will also lead to massive degradation from in sample to out of sample returns.
This was the final post in a series on using random data.
First post: Introducing random data
Second post: Does equity curve trading work?
Have you read Pedersen's "Enhanced Portfolio Optimization" paper? He finds that correlation shrinkage fixes errors both in the covariance matrix and in the expected returns. I'd like to know what you think, since his results look quite different from yours (obviously the results depend on the assumptions, but still...). Thanks
I haven't read the paper, but it makes sense. Shrinking correlations makes you less sensitive to small differences in means and standard deviations - not an empirical result, just a property of the maths.
Hi Robert, love what you do!
I've tried to replicate-ish your idea, to check, for a given system, which trading horizon would produce the highest Sharpe ratios. So I backtested the system across many different instruments, for 4 different trading speeds each time (not accounting for costs): 1min, 5min, 10min and 30min. My intuition, based on maths, was that the longer the horizon, the higher the Sharpe should be (as StDev only increases with the SQRT of time).
But what I observe is the opposite: the shorter the horizon, the higher the SR.
I am a little bit surprised and this sparked a debate in my team. What would you say?
That's exactly what I would expect - in theory you get more independent bets the faster you trade, and therefore, according to the law of active management (Grinold and Kahn), your SR goes up.
Delete"My intuition, based on maths was that the longer the horizon, the higher the Sharpe should be (as StDev only increases with the SQRT of time)." yes but that only implies that for a given underlying SR, the SR calculated over longer periods will increase eg a daily SR of 0.10 translates to an annualised SR of approx 1.60
Thanks vm, very useful