As I said in my last post I'm currently in the process of a mega-sized research project on fitting. In the first post I examined the correct way to cluster combinations of trading rules and instruments.
This next post is rather meatier, and is about evaluating and calibrating some portfolio optimisation techniques. We might call this 'meta optimisation', since we want to find the best way to do optimisation, which itself is effectively a form of optimisation - we are choosing between alternatives based on some utility function.
And because it's optimisation, it can be done in a bad in sample way. And often is. People have a habit of taking a particular data set, working out which optimisation works best on it, and then using that. They think they are good people because the optimisation itself is running in a nice robust out of sample fashion, but they are not good people, because the choice of optimisation has been made having seen all the data.
To avoid this I'm initially going to use random data to evaluate and calibrate the various optimisation techniques. Then no real data will be harmed. A subsequent post will use some real data.
Note: I've sort of had a go at this before, here. However this is a much more thorough look at the problem, whereas the previous post was very limited in scope both of data and also of methodologies. There is also a link here to my multiple posts about probabilistic evaluation of outcomes.
Note 2: whilst researching this post I found a 'new' shrinkage based method, EPO, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3530390 developed by one of my favourite authors (Pedersen of AQR) with co-authors. The main reason I like this is the allusion to the more famous EPO, which as someone who cycles and follows cycling I find quite ironically entertaining. (I have described it as 'new' but it's several years old, and I can now see that it was highlighted by younggotti in a comment on my earlier post, which I didn't follow up.)
Note 3: I also came across this relatively new book: https://portfoliooptimizationbook.com/ which is quite a nice survey of the field.
The data
For the data I'm going to keep things real simple. I will be doing an optimisation with nine assets. That might seem an odd number*, but it's because in a subsequent post I will be repeating some of this with real assets, and I have nine specific ones in mind (spoiler alert: three instruments trading three different trading rules/methods: two speeds of trend, plus carry). The true SR of the assets will be drawn with equal probability from this list: [-0.5, -0.25, 0, 0.25, 0.5, 0.75, 1]. The true correlation of the assets will be drawn with equal probability from this list: [-0.25, 0, 0.25, 0.5, 0.75, 0.9]. These lists are not fully symmetric, since trading strategy returns of the type I am optimising tend not to have substantial negative correlations or very high/low SR.
* obviously it is indeed an odd number, but it might also appear to be an arbitrary choice.
I will then randomly draw returns from those multivariate Gaussian distributions, generating 2000 outcomes, each with 35 years of history (each 'year' is 256 business days). Why 35 years? Well, I will be varying the length of the in sample period. To be precise, I will use in sample periods between 1 year and 30 years, and evaluate out of sample on a five year basis. Using a (shorter) longer out of sample period would just (increase) reduce the variance between different outcomes; it won't affect their relative efficacy. It seems unlikely anyone will go more than five years without refitting (I do it every year in backtest), so this seems about right.
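A minimal sketch of how data like this could be generated with numpy. It is illustrative only and not the code behind the results in this post. The text does not say whether every pair of assets gets its own correlation draw, or what happens when the drawn matrix is not a valid correlation matrix, so the pairwise draw and the eigenvalue clipping in `nearest_valid_correlation` are assumptions, as are all the function names.

```python
import numpy as np

SR_CHOICES = [-0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]
CORR_CHOICES = [-0.25, 0.0, 0.25, 0.5, 0.75, 0.9]
DAYS_PER_YEAR = 256


def nearest_valid_correlation(corr, floor=1e-6):
    # Assumption: repair an invalid matrix by clipping negative eigenvalues,
    # then rescaling so the diagonal is exactly one again.
    eigval, eigvec = np.linalg.eigh(corr)
    fixed = (eigvec * np.clip(eigval, floor, None)) @ eigvec.T
    scale = np.sqrt(np.diag(fixed))
    return fixed / np.outer(scale, scale)


def draw_true_parameters(n_assets, rng):
    # Each asset's true SR is drawn with equal probability from the list.
    sr = rng.choice(SR_CHOICES, size=n_assets)
    # Assumption: each pair of assets gets an independent correlation draw.
    corr = np.eye(n_assets)
    upper = np.triu_indices(n_assets, k=1)
    corr[upper] = rng.choice(CORR_CHOICES, size=len(upper[0]))
    corr = corr + corr.T - np.eye(n_assets)
    return sr, nearest_valid_correlation(corr)


def simulate_daily_returns(sr, corr, years, rng, annual_vol=1.0):
    # Every asset has the same standard deviation, so the covariance
    # matrix is just the correlation matrix scaled by the variance.
    daily_mean = sr * annual_vol / DAYS_PER_YEAR
    daily_cov = corr * annual_vol**2 / DAYS_PER_YEAR
    return rng.multivariate_normal(daily_mean, daily_cov, size=years * DAYS_PER_YEAR)


rng = np.random.default_rng(42)
true_sr, true_corr = draw_true_parameters(9, rng)
history = simulate_daily_returns(true_sr, true_corr, years=35, rng=rng)
```

Repeating the last three lines 2000 times, with the first 1 to 30 years of each history used in sample and the final five years held out, would give the set of outcomes described above.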
I will generate a certain number of histories and then evaluate the relative performance of each optimiser on each history; thus avoiding the role of luck if one optimiser happens to get a lucky break.
Note if it isn't obvious I'm assuming I am a SR maximiser (equivalent to a CAGR maximiser for a leveraged investor with Gaussian returns), and I'm assuming all assets have the same expected standard deviation. As a futures trader this is fine. I'm also assuming weights will be positive. These are my standard boilerplate assumptions for optimising trading strategy returns.
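Under these assumptions the basic optimisation step can be sketched as follows. This is a hedged illustration using scipy's SLSQP solver, not necessarily the solver used for the results here. The function name `max_sharpe_weights` and the fully invested constraint (weights sum to one) are assumptions added for the example.

```python
import numpy as np
from scipy.optimize import minimize


def max_sharpe_weights(expected_sr, corr):
    """Positive weights summing to one that maximise portfolio SR,
    assuming every asset has the same standard deviation."""
    expected_sr = np.asarray(expected_sr, dtype=float)
    corr = np.asarray(corr, dtype=float)
    n_assets = len(expected_sr)

    def negative_sr(weights):
        # With equal standard deviations the common volatility cancels,
        # so SRs act as means and correlations act as covariances.
        portfolio_mean = weights @ expected_sr
        portfolio_std = np.sqrt(weights @ corr @ weights)
        return -portfolio_mean / portfolio_std

    result = minimize(
        negative_sr,
        x0=np.full(n_assets, 1.0 / n_assets),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_assets,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return result.x


# Two highly correlated assets and one uncorrelated asset, all with the same SR:
# the uncorrelated asset should end up with the largest weight.
example_corr = np.array([[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]])
example_weights = max_sharpe_weights([0.5, 0.5, 0.5], example_corr)
```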
Random data is not real data
Well duh. But why is this important? Because random data is drawn from a fixed and well behaved distribution. This means the optimiser only has to discover / estimate the parameters of that distribution as more data is revealed to it. But real data doesn't have a fixed and known distribution. It doesn't actually have any distribution at all. We just model it hoping it does.
Essentially random data sets a lower bound on robustness calibration. For example, suppose we determine that the correct shrinkage for the vector of expected SR on a Bayesian portfolio optimisation using random data is 0.1 (which means we average using 90% of the estimated SR, and 10% of the prior SR). Then it's likely the correct shrinkage on real data will be higher than 0.1.
This also means that less robust methods will be flattered compared to more robust methods when using random rather than real data. For this reason we need to treat the results with some caution; and in a future post I will be sense checking them against some real data.
The criteria
On what basis should we evaluate an optimiser? Clearly we are interested in the out of sample performance - Sharpe Ratio in this case. But it's the probabilistic performance that interests me. Using random data means we can look at a distribution of outcomes. And I'm not just interested in the central, median point of that distribution. I'm concerned with optimisers that produce extreme, sparse, weights. On average these might look fine, but their downside will be worse than a more robust optimiser which produces more reasonable weights. So I am also going to evaluate performance at a more cautious 5th percentile point (the level we use for statistical significance).
Of course, there are other criteria for optimising. Speed is an important one that will be bad for gridsearch, bootstrap and monte carlo type methods. Related to that is convergence - how quickly does a bootstrap or monte carlo converge on weights that are 'good enough'. If convergence is quite quick then the penalty of running multiple optimisations won't be as large.
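The two summary statistics can be sketched like this. It is a minimal illustration assuming numpy. The function names and the dictionary keys are made up for the example, and the 256 day year comes from the data section above.

```python
import numpy as np

DAYS_PER_YEAR = 256


def annualised_sharpe(daily_returns):
    # Annualise a daily Sharpe Ratio using the 256 business day year.
    daily_returns = np.asarray(daily_returns, dtype=float)
    return np.sqrt(DAYS_PER_YEAR) * daily_returns.mean() / daily_returns.std(ddof=1)


def out_of_sample_sharpe(weights, out_of_sample_returns):
    # Portfolio returns are the weighted sum of asset returns on each day.
    portfolio_returns = np.asarray(out_of_sample_returns) @ np.asarray(weights)
    return annualised_sharpe(portfolio_returns)


def summarise_outcomes(sharpe_ratios):
    # One out of sample SR per random history goes in; the median and the
    # more cautious 5th percentile of that distribution come out.
    sharpe_ratios = np.asarray(sharpe_ratios, dtype=float)
    return {
        "median": float(np.median(sharpe_ratios)),
        "pct_5": float(np.percentile(sharpe_ratios, 5)),
    }
```

An optimiser that produces sparse, extreme weights can match a robust one on `median` while looking clearly worse on `pct_5`, which is why both numbers are worth tracking.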
The optimisers
Let's quickly run through the competitors in this little olympics:
- monte carlo (random, parametric)
- bootstrapping (random, non parametric)
- double shrinkage (shrinking SR towards average SR, and correlations to zero). Shrinkage can range from zero (no shrinkage) to one (full shrinkage). Note that with the right parameters this encompasses some other methods including:
- NMV naive mean variance (no shrinkage on anything)
- EW equal weights (both full shrinkage)
- MD maximum diversification (no shrinkage on correlation, full shrinkage on SR)
- EPO (we just shrink the correlation matrix to some degree)
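The double shrinkage described in the list above can be sketched as two small functions. This is a hedged illustration assuming numpy, with function names invented for the example. The priors follow the text: the average SR for the SR vector, and zero correlation (an identity matrix) for the correlation matrix.

```python
import numpy as np


def shrink_sharpe_ratios(estimated_sr, shrinkage):
    # shrinkage=0.1 means 90% of the estimated SR and 10% of the prior,
    # where the prior is the average SR across all assets.
    estimated_sr = np.asarray(estimated_sr, dtype=float)
    prior = np.full_like(estimated_sr, estimated_sr.mean())
    return (1.0 - shrinkage) * estimated_sr + shrinkage * prior


def shrink_correlations(estimated_corr, shrinkage):
    # The prior is zero correlation between every pair of assets,
    # so the diagonal stays at one for any shrinkage value.
    estimated_corr = np.asarray(estimated_corr, dtype=float)
    prior = np.eye(len(estimated_corr))
    return (1.0 - shrinkage) * estimated_corr + shrinkage * prior


# (SR shrinkage, correlation shrinkage) for the named special cases.
# The 0.75 for EPO is the value quoted in this post.
NAMED_CASES = {
    "NMV": (0.0, 0.0),
    "EW": (1.0, 1.0),
    "MD": (1.0, 0.0),
    "EPO": (0.0, 0.75),
}
```

Feeding the shrunk SR vector and correlation matrix into a long-only SR maximiser then gives the weights for that pair of shrinkage values. With full shrinkage on both, every asset looks identical and uncorrelated, which is why the result is equal weights.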
Notice I am not at this stage using any kind of clustering or hierarchical method, such as my own 'handcrafting'. My intention is to first, in this post, establish the best way to optimise relatively small portfolios. Then in a subsequent post I will properly evaluate the performance / speed tradeoff of using this small portfolio optimisation inside a top down clustering method.
There are a whole bunch of other methods we could use, but I have a good understanding of the methods above and I don't feel the need to go very fancy.
Note that within the shrinkage team we have a number of competitors, as we can vary the shrinkage in a range of let's say 0 (no shrinkage, use empirical results) to 1.0 (full shrinkage). For correlation shrinkage I'm going to use these nine steps: 0, 0.2, 0.4, 0.6, 0.7, 0.75, 0.8, 0.9, 1.0 (the optimal EPO shrinkage is 0.75, hence the extra granularity around there). For SR shrinkage I'm going to use these nine values: [0, 0.25, 0.5, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0], because I know that minimal amounts of mean shrinkage don't achieve much. That gives me 81 possible shrinkage methods; but one is actually equal weights (shrinkage of 1 on SR and 1 on correlations); another is maximum diversification (1,0), a third is naive mean variance (0,0), and there are seven that are EPO (0, 0.2... 0.9). So there are actually 71 shrinkage methods of various strengths.
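The counting above can be checked with a few lines of plain Python. The labelling function is a hypothetical helper written for this example. Treating only correlation shrinkage strictly between 0 and 1 as EPO is a reading of the "(0, 0.2... 0.9)" in the text.

```python
from itertools import product

CORR_SHRINKAGE = [0, 0.2, 0.4, 0.6, 0.7, 0.75, 0.8, 0.9, 1.0]
SR_SHRINKAGE = [0, 0.25, 0.5, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0]


def label_method(sr_shrinkage, corr_shrinkage):
    # Name the special cases; everything else is generic double shrinkage.
    if sr_shrinkage == 1 and corr_shrinkage == 1:
        return "EW"
    if sr_shrinkage == 1 and corr_shrinkage == 0:
        return "MD"
    if sr_shrinkage == 0 and corr_shrinkage == 0:
        return "NMV"
    if sr_shrinkage == 0 and 0 < corr_shrinkage < 1:
        return "EPO"
    return "shrinkage"


# Every (SR shrinkage, correlation shrinkage) pair in the grid.
grid = list(product(SR_SHRINKAGE, CORR_SHRINKAGE))

counts = {}
for sr_shrinkage, corr_shrinkage in grid:
    label = label_method(sr_shrinkage, corr_shrinkage)
    counts[label] = counts.get(label, 0) + 1
```

Here `len(grid)` is 81 and `counts` comes out as one each for EW, MD and NMV, seven for EPO, and 71 for everything else.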
Establishing convergence speed
Before we begin we need to establish the number of runs required to establish convergence for the bootstrapping and monte carlo methods.
There are a couple of ways we can establish convergence speed:
- How quickly the weights 'settle down', i.e. do not change very much
- How quickly the probabilistic out of sample SR 'settles down'
Note also that this convergence will take longer with larger portfolios, since there is a bigger possible space to cover.
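To make the difference between the two random methods concrete, here is a hedged sketch of the resampling step, assuming numpy. All function names are invented for the example. `crude_positive_weights` is a deliberately simple stand-in for a proper long-only SR maximiser, included only so the block runs on its own. Averaging the rows of the result to get final weights is also an assumption, since the text does not spell out how runs are combined.

```python
import numpy as np


def bootstrap_sample(returns, rng):
    # Non-parametric: resample whole days (rows) with replacement.
    n_days = returns.shape[0]
    return returns[rng.integers(0, n_days, size=n_days)]


def monte_carlo_sample(returns, rng):
    # Parametric: fit a Gaussian to the history, then draw a fresh history.
    mean = returns.mean(axis=0)
    cov = np.cov(returns, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=returns.shape[0])


def crude_positive_weights(returns):
    # Stand-in optimiser (assumption): unconstrained mean-variance weights,
    # floored at zero and rescaled to sum to one.
    mean = returns.mean(axis=0)
    cov = np.cov(returns, rowvar=False)
    raw = np.clip(np.linalg.solve(cov, mean), 0.0, None)
    if raw.sum() == 0.0:
        return np.full(len(mean), 1.0 / len(mean))
    return raw / raw.sum()


def resampled_weights(returns, sampler, optimiser, n_runs, rng):
    # One row of weights per resampled history.
    return np.array([optimiser(sampler(returns, rng)) for _ in range(n_runs)])


rng = np.random.default_rng(0)
in_sample = rng.normal(0.001, 0.01, size=(256, 3))  # one 'year' of fake data
bootstrap_runs = resampled_weights(in_sample, bootstrap_sample, crude_positive_weights, 50, rng)
monte_carlo_runs = resampled_weights(in_sample, monte_carlo_sample, crude_positive_weights, 50, rng)
final_bootstrap_weights = bootstrap_runs.mean(axis=0)
```

The cost of both methods is one full optimisation per run, which is why the number of runs needed for convergence matters so much.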
The following shows, for bootstrapping on one random one year long in sample period, how the average* error narrows between the 'correct' weights (run with the maximum of 2000 iterations) and the weights we get with fewer iterations:
* sqrt(sum(error^2))
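The error measure in the footnote can be sketched as follows, assuming numpy and assuming that the weights after k iterations are the average of the first k runs (the text does not state how runs are combined). The function name is made up for the example.

```python
import numpy as np


def convergence_errors(weights_per_run):
    """Error after each run, sqrt(sum(error^2)) across assets, measured
    against the weights obtained using every available run."""
    weights_per_run = np.asarray(weights_per_run, dtype=float)
    n_runs = weights_per_run.shape[0]
    run_counts = np.arange(1, n_runs + 1)[:, None]
    # Row k holds the average weights over the first k+1 runs.
    running_average = np.cumsum(weights_per_run, axis=0) / run_counts
    # The 'correct' weights are those from the full set of runs.
    reference = running_average[-1]
    return np.sqrt(((running_average - reference) ** 2).sum(axis=1))


# Fake per-run weights for nine assets, standing in for 2000 bootstrap runs.
rng = np.random.default_rng(1)
fake_runs = rng.dirichlet(np.ones(9), size=2000)
errors = convergence_errors(fake_runs)
```

Plotting `errors` against the run number gives the kind of narrowing curve described above. By construction the final value is zero, since the last running average is the reference itself.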

The giant test
Qualifying round, shrinkage
In each table below, rows are SR shrinkage and columns are correlation shrinkage.

SR \ corr   0.00   0.20   0.40   0.60   0.70   0.75   0.80   0.90   1.00
0.00        0.87   0.86   0.84   0.81   0.79   0.79   0.77   0.75   0.72
0.25        0.88   0.86   0.83   0.81   0.79   0.77   0.77   0.74   0.70
0.50        0.87   0.85   0.83   0.79   0.77   0.75   0.74   0.72   0.67
0.75        0.83   0.82   0.78   0.74   0.72   0.70   0.69   0.67   0.62
0.80        0.81   0.80   0.77   0.73   0.71   0.69   0.67   0.65   0.60
0.85        0.77   0.77   0.75   0.70   0.67   0.67   0.65   0.62   0.59
0.90        0.73   0.73   0.70   0.65   0.65   0.63   0.62   0.61   0.57
0.95        0.65   0.64   0.61   0.58   0.57   0.56   0.56   0.57   0.55
1.00        0.41   0.40   0.40   0.39   0.38   0.38   0.38   0.37   0.38

SR \ corr   0.00   0.20   0.40   0.60   0.70   0.75   0.80   0.90   1.00
0.00       -0.12  -0.13  -0.13  -0.15  -0.16  -0.15  -0.16  -0.18  -0.20
0.25       -0.12  -0.13  -0.13  -0.13  -0.14  -0.15  -0.17  -0.19  -0.22
0.50       -0.13  -0.13  -0.14  -0.17  -0.18  -0.18  -0.19  -0.22  -0.25
0.75       -0.21  -0.18  -0.19  -0.21  -0.23  -0.24  -0.25  -0.26  -0.29
0.80       -0.21  -0.21  -0.21  -0.22  -0.24  -0.26  -0.28  -0.28  -0.31
0.85       -0.27  -0.25  -0.23  -0.27  -0.29  -0.28  -0.30  -0.30  -0.34
0.90       -0.35  -0.34  -0.33  -0.33  -0.32  -0.34  -0.33  -0.34  -0.36
0.95       -0.50  -0.48  -0.45  -0.44  -0.43  -0.43  -0.40  -0.37  -0.39
1.00       -0.72  -0.69  -0.68  -0.66  -0.66  -0.66  -0.66  -0.66  -0.47

SR \ corr   0.00   0.20   0.40   0.60   0.70   0.75   0.80   0.90   1.00
0.00        1.19   1.18   1.16   1.15   1.13   1.12   1.11   1.08   1.04
0.25        1.19   1.18   1.16   1.14   1.12   1.11   1.09   1.06   1.01
0.50        1.19   1.18   1.16   1.13   1.10   1.08   1.06   1.02   0.95
0.75        1.13   1.13   1.10   1.04   1.01   0.98   0.95   0.89   0.81
0.80        1.09   1.09   1.06   1.01   0.96   0.93   0.90   0.84   0.77
0.85        1.05   1.04   1.01   0.95   0.90   0.87   0.84   0.78   0.72
0.90        0.97   0.95   0.93   0.86   0.82   0.80   0.76   0.71   0.66
0.95        0.84   0.82   0.79   0.73   0.69   0.67   0.65   0.61   0.58
1.00        0.47   0.47   0.45   0.43   0.42   0.41   0.41   0.40   0.38

SR \ corr   0.00   0.20   0.40   0.60   0.70   0.75   0.80   0.90   1.00
0.00        0.32   0.31   0.31   0.30   0.29   0.28   0.27   0.25   0.21
0.25        0.32   0.31   0.31   0.29   0.28   0.27   0.26   0.23   0.19
0.50        0.31   0.31   0.29   0.28   0.25   0.24   0.22   0.19   0.14
0.75        0.24   0.24   0.21   0.19   0.17   0.15   0.13   0.08   0.02
0.80        0.19   0.20   0.19   0.15   0.13   0.11   0.09   0.05  -0.02
0.85        0.14   0.16   0.15   0.11   0.07   0.06   0.04  -0.01  -0.08
0.90        0.04   0.09   0.08   0.03  -0.00  -0.03  -0.04  -0.09  -0.15
0.95       -0.15  -0.09  -0.10  -0.12  -0.13  -0.16  -0.17  -0.21  -0.24
1.00       -0.60  -0.59  -0.57  -0.56  -0.55  -0.55  -0.55  -0.54  -0.47

- MC monte carlo (random, parametric) with instances depending on in sample length
- BS bootstrapping (random, non parametric) with instances depending on in sample length
- OS Optimal shrinkage (SR shrinkage=0.25, nothing on correlations)
- EPO shrinkage (correlation shrinkage = 0.75, nothing on SR)
- CS Cautious shrinkage (SR shrinkage = 0.5, correlation shrinkage = 0.4)
- NMV Naive mean variance (no shrinkage on either)
- EW Equal weights (full shrinkage on both)
Summary and what's next
- We got some ballpark for MC/bootstrap convergence rates
- Even in this context some shrinkage on the mean is optimal
- The gold standard for weights is bootstrap/MC, which also don't require any shrinkage meta-parameters, but they are bloody slow.






