This is my third post in a series about optimisation and fitting. In my previous post I used random data to calibrate and evaluate many portfolio optimisation techniques. It's worth quoting in full from that post:
Random data is not real data: Well duh. But why is this important? Because random data is drawn from a fixed and well behaved distribution. This means the optimiser only has to discover / estimate the parameters of that distribution as more data is revealed to it. But real data doesn't have a fixed and known distribution. It doesn't actually have any distribution at all. We just model it hoping it does.
To summarise then, random data from a fixed distribution differs from real data in three important ways:
- There is no distribution! We just assume there is one.
- The distribution (which doesn't exist) is not known, and thus it's likely the distribution we assume is the wrong one. This is especially true for modelling underlying financial price returns with joint Gaussian models.
- The distribution (which again, doesn't exist) isn't fixed, but can change over time.
And it has one thing in common:
- The parameters of the distribution are unknown and have to be learned over time.



Those might be flukes, so let's look at lots of random results. I'm going to pick an instrument and trading rule at random, and measure its final RMSE number. I will then generate some random returns of the same length from the same SR distribution (by measuring the full sample SR for the relevant instrument/rule pairing), and measure the RMSE of those. I will then select another rule for the same instrument, get the correlation of the two p&l streams, and generate some more random returns with the given expected correlation. Finally, I will measure the correlation RMSE for the two sets of real returns, and for the two sets of random returns.
If I consider the ratio [RMSE real data / RMSE random data] (both for the next one year), then the median of this over a few thousand randomly selected trading strategy components is 1.06 for Sharpe Ratios, and around 5.6 for correlations.
In simple terms, we are a little bit worse at forecasting Sharpe Ratios one year ahead in real data than we would be with random data, but a LOT worse with correlations.
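To make the random half of that comparison concrete, here is a minimal sketch of how a pair of random return series with a given Sharpe Ratio and expected correlation could be generated. It is not the code used for the results in this post: the function names, the joint Gaussian assumption, the 256 business days a year, and the 20% annualised volatility are all assumptions I've made for illustration.

```python
import numpy as np

DAYS_PER_YEAR = 256  # assumption: business days used for annualisation


def random_returns_pair(n_days, annual_sr_a, annual_sr_b, correlation,
                        annual_vol=0.20, seed=None):
    """Draw two daily return series from a fixed joint Gaussian.

    Hypothetical helper: the Sharpe Ratios and correlation would come from
    the full-sample estimates for the real instrument/rule pairings.
    """
    rng = np.random.default_rng(seed)
    daily_vol = annual_vol / np.sqrt(DAYS_PER_YEAR)
    # Daily mean that gives the requested annualised Sharpe Ratio
    daily_means = np.array([annual_sr_a, annual_sr_b]) * annual_vol / DAYS_PER_YEAR
    cov = daily_vol ** 2 * np.array([[1.0, correlation],
                                     [correlation, 1.0]])
    return rng.multivariate_normal(daily_means, cov, size=n_days)


def annualised_sharpe(returns):
    """Annualised Sharpe Ratio of each column of daily returns."""
    return np.sqrt(DAYS_PER_YEAR) * returns.mean(axis=0) / returns.std(axis=0, ddof=1)


# Usage: ten years of random returns for two strategy components
pair = random_returns_pair(10 * DAYS_PER_YEAR, 0.5, 1.0, 0.6, seed=1)
sample_sr = annualised_sharpe(pair)
sample_corr = np.corrcoef(pair.T)[0, 1]
```

Because the draws come from a fixed distribution, the sample Sharpe Ratios and correlation will wander around the requested values purely through sampling error, which is exactly the benchmark the real data is being compared against.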
Partly this is because we are pretty terrible at forecasting SR one year ahead anyway, even with a stable underlying distribution, so we don't do much worse with real data. However, it does seem that correlations are far more unstable in reality than in randomly generated data. Note that these are correlations for trading strategy component returns. In some cases they are mathematically related (eg EWMAC of different speeds) and could be derived with some assumptions, a pencil, and a napkin. They are certainly more stable than the correlations between the returns of the underlying instruments themselves (think about the changing correlation of stocks and bonds in different inflation environments).
(Note: These numbers are about the same for five years ahead and also ten years ahead)
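For anyone wanting to reproduce the ratio, the sketch below shows one way it could be calculated. I'm assuming here that the RMSE compares an expanding-window estimate with the value realised over the following year, stepping forward a year at a time; that definition, the function names and the 256 day year are illustrative assumptions rather than the exact method behind the numbers above.

```python
import numpy as np

DAYS_PER_YEAR = 256  # assumption: business days per year


def sharpe(returns):
    """Annualised Sharpe Ratio of a 1-D array of daily returns."""
    return np.sqrt(DAYS_PER_YEAR) * returns.mean() / returns.std(ddof=1)


def one_year_ahead_rmse(returns, statistic=sharpe, min_years=1):
    """RMSE of forecasting next year's statistic with the estimate so far.

    Assumed definition: at each year end, forecast with all data to date
    and compare against the statistic realised over the next year.
    """
    errors = []
    first_end = min_years * DAYS_PER_YEAR
    for end in range(first_end, len(returns) - DAYS_PER_YEAR + 1, DAYS_PER_YEAR):
        forecast = statistic(returns[:end])
        realised = statistic(returns[end:end + DAYS_PER_YEAR])
        errors.append(forecast - realised)
    if not errors:
        raise ValueError("need at least min_years + 1 years of returns")
    return float(np.sqrt(np.mean(np.square(errors))))


def median_rmse_ratio(real_series, random_series, statistic=sharpe):
    """Median of [RMSE real data / RMSE random data] over matched pairs."""
    ratios = [
        one_year_ahead_rmse(real, statistic) / one_year_ahead_rmse(rand, statistic)
        for real, rand in zip(real_series, random_series)
    ]
    return float(np.median(ratios))
```

For correlations, the same loop works if `statistic` is swapped for a function that takes a two-column array and returns, say, `np.corrcoef(x.T)[0, 1]`.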
If we recall from the prior post that the optimal shrinkage is zero on correlations with random data, we can now see why with actual data we'd probably want to opt for some correlation shrinkage: purely because the sampling error is much larger in practice. That is the empirical finding of the EPO paper. It does feel a bit weird, since up to now my gut feeling has been that we have to shrink means a lot because they are much harder to forecast, and because they have an outsized effect on portfolio weights compared to differences in correlation. Whilst the latter is still true, it seems the former is not.
Food for thought. Anyway, the next step is to repeat the 'Ultimate Fitting Championships' battle, but this time with real data.
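As a reminder of what correlation shrinkage means mechanically, here is a minimal sketch of linear shrinkage of an estimated correlation matrix towards a prior. Using the identity matrix (zero correlation) as the default prior is an assumption for illustration, as is the function name; other priors, such as a matrix of equal average correlations, slot in the same way.

```python
import numpy as np


def shrink_correlation(sample_corr, shrinkage, prior_corr=None):
    """Blend an estimated correlation matrix with a prior.

    shrinkage = 0 keeps the estimate untouched; shrinkage = 1 returns the
    prior. The identity prior is an illustrative assumption.
    """
    if not 0.0 <= shrinkage <= 1.0:
        raise ValueError("shrinkage must be between 0 and 1")
    sample_corr = np.asarray(sample_corr, dtype=float)
    if prior_corr is None:
        prior_corr = np.eye(sample_corr.shape[0])
    return shrinkage * np.asarray(prior_corr, dtype=float) + (1.0 - shrinkage) * sample_corr


# Usage: halve the off-diagonal correlations of a two-asset estimate
estimate = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
shrunk = shrink_correlation(estimate, 0.5)
```

With random data the best value of `shrinkage` was zero; the larger sampling error in real correlations is what would push it above zero.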







