Monday 7 June 2021

Optimising my way out of a small fund problem - part one

This is part one of a series of posts about using optimisation to get the best possible portfolio given a relatively small amount of capital.

In this short post I present the idea, and discuss some issues that I need to resolve. It's a bit of a stream of consciousness! It's less of a blog post, and more my random jottings on the subject converted from scribbles to electronic prose. It's a precursor to further posts where I will start designing and testing the method.

I am sorry for my size

There is a little known book about the City of London in the 1980s (The buck stops here), in which there is quite an amusing anecdote. The stockbroker - who has recently been fired - goes for a meal / drink with a Japanese client:

"His enzymes had let him down again, and he was a bit drunk, in a benign sort of way 'I am sorry, Mr Parton for my size' he kept on muttering. I caught the stares of a few passers-by and wanted to say to them, this man does not mean what you think he means."

Of course the Japanese fund manager is referring to the size of his fund which is relatively modest (and this is why the broker has been canned in the first place. As a specialist in selling European equities to Japanese investors who prefer to invest domestically, or at a push in the US, he is doomed).

My fund, or rather my trading account, is also relatively modest. It's larger than the average retail account, but by no means the multi billion dollars I used to jockey back in the days when I had a proper job. 

This is.... unfortunate. Why does it matter? Obviously it means fewer bragging rights in Soho wine bars, but that doesn't bother me (especially as at the time of writing, Soho wine bars are outside table service with NHS track and trace enabled only). No, what bothers me is this:

The graph shows the increase in expected Sharpe Ratio as you add instruments to a simple trading strategy consisting of a single moving average crossover (And like a good boy, I've put error bars around the Sharpe Ratio estimates). So with one (randomly chosen) instrument the average SR is around 0.24; but if I add another (randomly chosen) instrument it goes up to around 0.3. And with a few wiggles, the increase continues pretty monotonically. And as those error bars show, the improvement is statistically significant.

(I could do even better especially for the first few assets if I deliberately added instruments that diversified the existing pool at each stage, rather than just randomly choosing).

This graph is striking, especially if I compare it to another graph where I added trading rules but kept the number of instruments constant. There the increase is slower, and also begins to show reduced marginal gains. Here we're still getting fairly steady improvements in performance at the 33 instrument mark. If there is an optimal number of instruments one could trade (the point at which the marginal improvement becomes non existent), it's clearly much more than 33, or even the 37 or so I've traded (give or take) since 2014.
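As a rough sanity check on the shape of that curve, here is a stylised calculation, not the backtest behind the graph. It assumes every instrument has the same standalone Sharpe Ratio, equal risk weights, and one uniform pairwise correlation. The function name and the 0.1 correlation are purely illustrative assumptions.

```python
import math


def diversified_sharpe(single_sr, n_instruments, avg_corr):
    """Sharpe Ratio of an equal-risk-weighted portfolio of n instruments.

    Stylised assumption: identical standalone SR and a uniform pairwise
    correlation between every pair of instruments.
    """
    return single_sr * math.sqrt(
        n_instruments / (1.0 + (n_instruments - 1) * avg_corr)
    )


# 0.24 is the single-instrument SR quoted above; 0.1 is an assumed correlation
for n in (1, 2, 5, 10, 33, 100):
    print(n, round(diversified_sharpe(0.24, n, 0.1), 3))
```

Under these assumptions the gains keep coming well past 33 instruments, although they flatten as the correlation term starts to dominate.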

To make a famous quote more accurate:
Diversification across instruments is the only free lunch in finance.

However it isn't actually a free lunch. Every extra instrument you trade will use up capital (this isn't true for trading rules, at least not the way my system is implemented). This problem is most pressing for futures traders, since you can't trade fractions of a futures contract, and most contracts are very large in dollar risk compared to the average person's trading account.

This means that with less capital you can't trade the 400+ or so instruments traded by AHL and other large CTAs.  Even if we put aside the OTC instruments and cash equities that these funds trade, and just stick to futures, there are something like 70 additional futures markets I don't already trade which are liquid enough, not massive in size, have cheap data, and don't cost too much. But there is no way I could trade over 100 markets with my capital.

And this is a serious problem for retail traders, which is why I wrote a whole book about how to make the best use of scarce capital (the subject is also discussed at length in my first and second books). Diversification across instruments is the main competitive advantage that large funds have.

So I'm stuck with around 37 instruments, and I can only manage that many because of an ugly hack that I wrote about at some length here

That ugly hack is worth a brief discussion (though you are welcome to read the post). It relies on the fact that, with some exceptions, a larger forecast (my scaled measure of expected risk adjusted return) implies a larger ex-post risk adjusted return. This is something I analysed in more detail in this more recent post

So in the ugly hack I ignore forecasts that are too small, and then scale up forecasts beyond some threshold more aggressively (to ensure that the scaling properties of the forecast are unchanged). I have to do this in markets where the problem of my modest capital is most pressing: those with relatively large contract sizes.
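A minimal sketch of what such a thresholded mapping could look like. The function name, the threshold of 10, the cap of 20 and the slope of 3 are all hypothetical values chosen for illustration, and are not necessarily the ones used in the post linked above.

```python
def map_forecast(forecast, threshold=10.0, cap=20.0, slope=3.0):
    """Hypothetical non-linear forecast mapping.

    Forecasts with absolute value below `threshold` are ignored; beyond it
    the forecast is scaled up more steeply, after capping at +/- `cap`.
    """
    capped = max(-cap, min(cap, forecast))
    excess = abs(capped) - threshold
    if excess <= 0:
        return 0.0
    sign = 1.0 if capped > 0 else -1.0
    return sign * excess * slope


for raw in (-25, -15, -5, 0, 5, 12, 20, 40):
    print(raw, map_forecast(raw))
```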

The important point here is that larger forecasts are better - hold on to that point.

Optimisation to the rescue

Now any financial quant worth their salt would read what I've just written and say 'Pff! That's just an optimisation problem'.

'Pff?' I'd reply.

'Mais Oui*. All you need to do is take the expected returns and covariance matrix, limit the optimisation weights to discrete values, and press F9**'  

* Thanks to their excellent Grand Ecole system, most quants are French

** Surprisingly large amounts of the financial system, especially on the sell side, run in Excel

'But where do I get the expected returns from?'

'Boff! You already have the, how do you say, forecasts? A higher forecast means a higher expected return, does it not?'

'Yes, but there is no obvious mapping... Also aren't optimisations somewhat.... well not robust?'

'Only if handled by an inexperienced Rosbif like yourself. For a suitable fee I can of course help you out....'

Now I can't afford to pay this imaginary Quant a fee, and of course she is imaginary, so we'll have to come up with a better solution using a methodology that I understand (no doubt much simpler than is taught in the hallowed lecture theatres of the Ecole Polytechnique). And the building block we're going to use is Black-Litterman.

A brief idiot's guide to Black-Litterman


Well Black and Litterman are of course the legendary (and sadly missed) Fischer Black of BSM and BDT; and GSAM legend Bob Litterman. And their model deals with the problem I highlighted above 'But where do I get the expected returns from?'

And the answer is you get them from an inverse portfolio optimisation. You start with a portfolio of weights (let's put aside for the moment the question of where they come from). Then you estimate a covariance matrix. Then you run the classical Markowitz optimisation (find the optimal weights given a vector of expected returns and a covariance matrix, and some risk tolerance or utility function) in reverse so it becomes find the expected returns given a vector of weights and a covariance matrix.
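For the unconstrained mean-variance problem, the reverse step has a simple closed form: implied returns are the risk aversion times the covariance matrix times the weights. A minimal sketch, with made-up numbers for the two-asset covariance matrix, the weights and the risk aversion:

```python
def implied_returns(weights, cov, risk_aversion):
    """Reverse optimisation: mu = lambda * Sigma * w (no constraints)."""
    return [
        risk_aversion * sum(c * w for c, w in zip(row, weights))
        for row in cov
    ]


# Hypothetical example: annualised vols of 20% and 10%, correlation 0.5
cov = [
    [0.04, 0.01],
    [0.01, 0.01],
]
weights = [0.6, 0.4]

mu = implied_returns(weights, cov, risk_aversion=2.0)
print(mu)
```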

BL (as I will say henceforth) used the market portfolio for their starting weights, and hence the resulting implied returns are the 'equilibrium returns'; the returns that are expected given that the 'average' (in a cumulative sense) investor must hold the market portfolio by construction.

Once you have your expected returns you can combine them with some forecasted returns. Perhaps you want to include the discretionary opinion of your chief economist. Or maybe you've got some kind of systematic model for forecasting returns. In any case you take a weighted average of the original equilibrium returns and your forecasts (so this is Bayesian in character as we shrink our forecasts towards the equilibrium returns). Now with your new vector of expected returns you run the normal optimisation forward; using the same covariance matrix you derive a new set of optimal weights.

(The full paper is here)

BL portfolios have some nice properties. If you make no changes at all to the expected returns then you'll recover the original weights (this is a good way to check your code is working!). If you replace them completely, you'll basically have the portfolio implied by your forecasts (which will usually be not very robust at all, with the usual extreme weights problem highlighted). But a blend of the two sets of expected returns, if weighted mostly towards the equilibrium returns, will produce robust portfolios that are tilted away from the market cap weights to reflect our forecasts.
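The round trip check is easy to sketch. The code below uses a plain weighted average for the blend, as described above, which is a simplification of the confidence-weighted formula in the full BL paper. The covariance matrix, starting weights, views and blend weight are all invented for illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def reverse_opt(weights, cov, lam):
    """Implied returns: mu = lam * Sigma * w."""
    return [lam * x for x in mat_vec(cov, weights)]


def forward_opt(mu, cov, lam):
    """Unconstrained maximiser of w'mu - (lam / 2) w'Sigma w."""
    return [x / lam for x in solve(cov, mu)]


def blend(equilibrium, views, weight_on_views):
    """Shrink the views towards the equilibrium returns."""
    return [
        (1.0 - weight_on_views) * e + weight_on_views * v
        for e, v in zip(equilibrium, views)
    ]


# Hypothetical inputs
cov = [
    [0.0400, 0.0060, 0.0020],
    [0.0060, 0.0100, 0.0030],
    [0.0020, 0.0030, 0.0225],
]
start_weights = [0.5, 0.3, 0.2]
lam = 3.0

equilibrium = reverse_opt(start_weights, cov, lam)
recovered = forward_opt(equilibrium, cov, lam)  # should match start_weights

views = [0.08, 0.02, 0.05]
tilted = forward_opt(blend(equilibrium, views, 0.2), cov, lam)

print("recovered:", [round(w, 6) for w in recovered])
print("tilted:   ", [round(w, 6) for w in tilted])
```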

I'm a fan of BL because it accounts, to an extent, for the hierarchy of inputs to a portfolio optimisation. Expected returns are the hardest to forecast, and small changes have a big effect on the output. Standard deviations are relatively easy to forecast, and small changes have a small effect on the output. Correlations fall somewhere in the middle. BL effectively assumes we can predict standard deviations and correlations perfectly, but doesn't make the same assumption about expected returns.

But I don't actually use BL for optimisation, mainly because in the kind of problem I'm usually dealing with (e.g. deciding how to linearly weight a variety of trading rules and instruments) it isn't obvious what the 'market cap portfolio' should be. And I'm not going to use it for its intended purpose here either.

The brilliant idea

We can use the BL methodology to do something rather cool and interesting, and fun (and completely different from the original intent). We can run the backward optimisation, and then the forward, without making any changes to the expected returns. Instead we make some other change to the optimisation. Most commonly this would be the introduction of constraints; like a limit on Emerging market exposure, or a position size limit, or ... and this is relevant.... a discrete position size constraint.

So the plan looks something like this:

  • Run my standard position generation function, which will produce a vector of desired contract positions across instruments, all of which will be non integer. Let's call this the 'original' portfolio weights. The main inputs into this calculation are the forecast, instrument weight (as a proportion of risk capital allocated), current volatility of the instrument, long run target volatility and the instrument diversification multiplier (see here, and search for 'why does expected risk vary')
  • Estimate a covariance matrix Σ and a risk aversion coefficient λ
  • Using a reverse Markowitz, BL style, calculate the implied expected returns for each instrument, µ. There is a closed form for the reverse optimisation, since this doesn't have constraints: µ = λΣw
  • Run the optimisation forward using µ, Σ, λ, with a constraint that only integer contract positions can be taken.
Intuitively the sort of thing this process would do is to trade more of instruments with smaller contract size, if they are positively correlated with instruments that are too big to trade. So it's going to be superior to something that just gives you the rounded version of the optimal portfolio (like for example minimising Euclidean distance); which if you have enough instruments and insufficient capital is going to be a vector of zero weights.
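The four steps above can be sketched on a toy problem small enough to brute force. Everything here is an assumption made for illustration: three instruments with the same volatility, two of them 90% correlated, invented contract sizes, and a search restricted to the same side as the original position. A real implementation would need something smarter than an exhaustive grid.

```python
import itertools
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def utility(weights, mu, cov, lam):
    """Mean-variance utility w'mu - (lam / 2) w'Sigma w."""
    variance = sum(w * x for w, x in zip(weights, mat_vec(cov, weights)))
    expected_return = sum(w * m for w, m in zip(weights, mu))
    return expected_return - 0.5 * lam * variance


def optimise_integer_contracts(optimal_contracts, weight_per_contract,
                               cov, lam, extra=4):
    """Exhaustively search integer contract vectors for the best utility."""
    original_weights = [
        c * s for c, s in zip(optimal_contracts, weight_per_contract)
    ]
    mu = [lam * x for x in mat_vec(cov, original_weights)]  # reverse step
    grids = []
    for c in optimal_contracts:
        limit = math.ceil(abs(c)) + extra
        grids.append(range(0, limit + 1) if c >= 0 else range(-limit, 1))
    best, best_utility = None, -math.inf
    for candidate in itertools.product(*grids):  # forward step, integers only
        weights = [n * s for n, s in zip(candidate, weight_per_contract)]
        u = utility(weights, mu, cov, lam)
        if u > best_utility:
            best, best_utility = list(candidate), u
    return best, best_utility, mu


# Hypothetical inputs: instruments 0 and 2 are highly correlated, but one
# contract of instrument 0 is ten times the size of one of instrument 2
vol = 0.2
corr = [
    [1.0, 0.1, 0.9],
    [0.1, 1.0, 0.1],
    [0.9, 0.1, 1.0],
]
cov = [[vol * vol * c for c in row] for row in corr]
optimal_contracts = [0.4, 0.3, 2.0]    # unrounded 'original' positions
weight_per_contract = [1.0, 1.0, 0.1]  # notional per contract / capital
lam = 2.0

best, best_utility, mu = optimise_integer_contracts(
    optimal_contracts, weight_per_contract, cov, lam
)
rounded = [round(c) for c in optimal_contracts]
rounded_weights = [n * s for n, s in zip(rounded, weight_per_contract)]

print("integer optimum:", best, round(best_utility, 6))
print("naive rounding: ", rounded,
      round(utility(rounded_weights, mu, cov, lam), 6))
```

In this made-up example the search holds more of the small contract than naive rounding does, standing in for the large contract it is correlated with.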

The brilliant idea is harder than it first sounds: some small problems

Now there are a lot of unanswered questions here. I've spent a long time thinking about this idea (over 18 months); and it's actually much more complicated than it might first seem.

For the discrete optimisation we're probably going to want to use some kind of grid search. That's going to be slow, especially if I end up with my 'dream' portfolio of 100+ instruments.

In fact it's worse than that, because a great feature of this approach is we can calculate forecasts for instruments we have no intention of trading (because they aren't sufficiently liquid, or are too expensive) as well as instruments that we'd like to trade but the contract size is inordinately large so we can't. And then we can use their forecasts to inform us what our overall portfolio should look like once we apply the discrete constraints; for example the (way too large) Ethereum contract could give me useful information about how to trade the micro Bitcoin future. 

In fact my full wish list currently stands at a total of 228 instruments. Anything we can do to reduce the area that has to be searched would be good! For example, I'd be reluctant to put more than 10% of my risk capital in a single instrument. That sets an upper and lower limit on position size.

I'd also be unhappy changing the sign of a position as a result of an optimisation. I don't want to end up with weird spreading behaviour: just because two instruments are negatively correlated doesn't mean I want to go long/short if, say, both forecasts are positive. So the lower limit would be zero for a long optimal position, and the upper limit would be zero for a short optimal.

It would probably make a lot of sense to do some kind of coarse to fine search, but I'll discuss specific options for that later. 

It's possible that contracts will move in and out of the 'tradeable / not tradeable' region over time, and rather than adding/removing them manually it would be better to allow an optimisation to do this. There would need to be a list of instruments in a state of 'reduce only', for which the maximum would be the current position (if long, the minimum if short). This list would be updated automatically for instruments that fell below or suddenly qualified for my required criteria for volume and costs.  There would be no need to eliminate instruments that were 'too big to trade'; this would happen naturally if 10% of risk capital wasn't sufficient to take even a single contract of position.

It's plausible that there could also be instruments that we couldn't trade at all - either permanently or temporarily. For these the maximum and minimum would be equal to the current position. For example, it might be that I get end of day data from one source, for a market I can't afford to get live L1 data for to trade with.
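The limits described in the last few paragraphs can be collapsed into a single pair of bounds per instrument. A minimal sketch, assuming the 10% risk capital limit has already been converted into a whole number of contracts (`max_abs`) elsewhere. The function and argument names are hypothetical, as is the choice to pin an instrument with a zero optimal position at zero.

```python
def position_bounds(optimal, current, max_abs,
                    reduce_only=False, no_trade=False):
    """Return (lower, upper) contract limits for one instrument.

    optimal:  unrounded desired position from the strategy (sign matters)
    current:  integer position currently held
    max_abs:  largest absolute position allowed by the risk capital limit
    """
    if no_trade:
        # Cannot trade at all: stuck with whatever is currently held
        return current, current
    # Never flip the sign of the strategy's optimal position
    if optimal > 0:
        lower, upper = 0, max_abs
    elif optimal < 0:
        lower, upper = -max_abs, 0
    else:
        lower, upper = 0, 0
    if reduce_only:
        # May only move towards zero from the current position
        lower = max(lower, min(current, 0))
        upper = min(upper, max(current, 0))
    return lower, upper


print(position_bounds(optimal=5.2, current=3, max_abs=10))
print(position_bounds(optimal=5.2, current=3, max_abs=10, reduce_only=True))
print(position_bounds(optimal=-4.1, current=-2, max_abs=10, reduce_only=True))
print(position_bounds(optimal=0.3, current=0, max_abs=0))  # too big to trade
print(position_bounds(optimal=1.7, current=4, max_abs=10, no_trade=True))
```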

Notice that for these last few points the optimisation would need to have knowledge of the current positions held by the system. In production this means it would make most sense in the pysystemtrade layer that generates 'instrument orders', which sits between the strategy optimal position generation and the execution layer.

In my current trading system this layer currently implements a buffering algo to reduce turnover. It would make sense to replace this with an optimisation that considered explicit costs in its calculation. It's trivial to calculate the expected cost per contract to do a given trade, assuming you have expected slippage and commission data (which I have). An open question is whether those costs should be amortised over the required holding period rather than assume we're optimising until the next optimisation (in 24 hours presumably), or whether a multiplier should be applied to reflect that costs are more certain than returns (for example, I apply a multiplier of 2 in my normal optimisation of forecast and instrument weights).
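One simple way the cost term might enter is as a penalty subtracted from the optimisation's objective, expressed as a proportion of capital. This sketch takes no view on the amortisation question raised above. The function name and all of the numbers are illustrative assumptions.

```python
def trading_cost_penalty(current_contracts, proposed_contracts,
                         cost_per_contract, capital, cost_multiplier=1.0):
    """Cost of moving from current to proposed positions, as a proportion
    of capital, optionally scaled up by a multiplier."""
    total_cost = sum(
        abs(proposed - current) * cost
        for current, proposed, cost in zip(
            current_contracts, proposed_contracts, cost_per_contract
        )
    )
    return cost_multiplier * total_cost / capital


# Hypothetical example: costs per contract already in account currency
penalty = trading_cost_penalty(
    current_contracts=[2, 0, -1],
    proposed_contracts=[3, 0, 1],
    cost_per_contract=[4.0, 10.0, 2.5],
    capital=500_000,
    cost_multiplier=2.0,
)
print(penalty)
```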

Something I have skated over is the fact that my initial strategy will produce desired weights in contract units, and the final optimisation also needs to know about discrete contract units, but 'w' is expressed as a notional position size as a proportion of capital (costs per contract would be in £,$,... units, but one can easily convert that to be a proportion of capital). So I'd need to work out what a single contract was in units of w when determining what the possible discrete step sizes were for each instrument.
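The conversion itself is a one-liner: the notional exposure of one contract, in account currency, divided by capital. The names and numbers below are hypothetical.

```python
def weight_per_contract(price, contract_multiplier, fx_rate, capital):
    """Notional exposure of one contract as a proportion of capital.

    fx_rate converts the instrument's currency into the account currency.
    """
    return price * contract_multiplier * fx_rate / capital


def contracts_to_weights(contracts, per_contract_weights):
    """Convert positions in contracts into 'w' units (proportion of capital)."""
    return [n * s for n, s in zip(contracts, per_contract_weights)]


# Hypothetical example: price 4200, multiplier 5, same currency as the account
step = weight_per_contract(
    price=4200.0, contract_multiplier=5.0, fx_rate=1.0, capital=500_000.0
)
print(step)                                # step size in w for one contract
print(contracts_to_weights([2.6], [step]))
```

The value of `step` is then the discrete increment available to the optimiser for that instrument.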

Finally one can imagine extending this further; for example by introducing margin requirements into the optimisation.

And some big problems

All of the above problems are mostly just <vaguely waves hand> engineering. I know what needs to be done, it's just a matter of coding it up.

A more difficult question lies around the coefficient of risk aversion, λ. I'm not used to thinking in terms of that at all. However in theory it won't actually matter what λ is set to, as long as we use the same λ in both the reverse and forward optimisation (trivially, so that it is consistent with the closed form of the initial reverse optimisation, the form of the forward optimisation must be to maximise w'µ − λ w'Σw / 2 rather than the more modern version where we specify a maximum risk and solve for highest return). That should naturally result in a portfolio which has about the same amount of risk as the original. Which is important, because there is information in the amount of risk that the original strategy positions want to take.
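The algebra behind that claim can be written out. This is a sketch that assumes the same Σ is used in both directions and that there is no cost term; w_0 denotes the original weights.

```latex
\begin{aligned}
\text{Forward:}\quad
  & \max_{w}\; U(w) = w^{\top}\mu - \tfrac{\lambda}{2}\, w^{\top}\Sigma w
  \;\Rightarrow\; \mu - \lambda\Sigma w^{*} = 0
  \;\Rightarrow\; w^{*} = \tfrac{1}{\lambda}\,\Sigma^{-1}\mu \\
\text{Reverse:}\quad
  & \mu = \lambda\,\Sigma w_{0} \\
\text{Round trip:}\quad
  & w^{*} = \tfrac{1}{\lambda}\,\Sigma^{-1}\left(\lambda\,\Sigma w_{0}\right) = w_{0} \\
\text{With constraints:}\quad
  & U(w) = \lambda\left(w^{\top}\Sigma w_{0} - \tfrac{1}{2}\, w^{\top}\Sigma w\right)
  = \tfrac{\lambda}{2}\, w_{0}^{\top}\Sigma w_{0}
    - \tfrac{\lambda}{2}\,(w - w_{0})^{\top}\Sigma\,(w - w_{0})
\end{aligned}
```

For any λ > 0 the ranking of candidate portfolios is unchanged. So under these assumptions the constrained forward step amounts to picking the feasible w that minimises (w − w_0)'Σ(w − w_0), which is the variance of the difference between the two portfolios.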

[Note that I could still impose a maximum risk constraint (at say twice my expected annualised target risk of 25% a year); this would replace part of my exogenous risk overlay which effectively fulfills the same function.]

I've left the hardest problem until last, and this is 'What covariance matrix should we use'? Remember a covariance matrix is just the offspring of a correlation matrix and a vector of standard deviations that love each other very much.

Well the easy part is the standard deviations; I'll just use estimates of percentage annualised risk for each instrument (since we're dealing in w units as a proportion of capital, % risk is the most appropriate). And this seems as good a time as any to introduce a blended estimate of volatility (as discussed here, which will also make another part of my exogenous risk overlay redundant, since it includes a mean reverting component). But what about the correlation?
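Putting the two parents together is mechanical: each covariance is the correlation multiplied by the two standard deviations. A minimal sketch, with invented annualised percentage volatilities and correlations:

```python
def covariance_from_corr(stdev, corr):
    """Build a covariance matrix: cov[i][j] = corr[i][j] * sd[i] * sd[j]."""
    n = len(stdev)
    return [
        [corr[i][j] * stdev[i] * stdev[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical annualised % risk for three instruments
stdev = [0.16, 0.02, 0.05]
corr = [
    [1.0, -0.2, -0.1],
    [-0.2, 1.0, 0.8],
    [-0.1, 0.8, 1.0],
]

cov = covariance_from_corr(stdev, corr)
for row in cov:
    print([round(x, 6) for x in row])
```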

Should we use the correlation of instrument returns, or should we use the correlation of trading subsystem returns, which after all is what was used (although not directly) to calculate the instrument weights? And which of these should we use in our initial ('reverse') and second ('forward') optimisations?

Let's look at an example. Suppose that we have a 50% instrument weight in SP500, 25% each in US2 and US5 (because the trading subsystems for the two bonds are highly correlated, and historically they've been relatively uncorrelated with SP500), and also suppose those weights are a result of doing a naive Markowitz optimisation with some specific correlation matrix of trading subsystem returns (not true in practice, but we'll come to that).

And suppose also that we have equal positive forecasts in all three assets (we expect the same risk adjusted return). We'll have long positions, but with a larger long position in SP500 than in the other two assets (ignoring the effect of risk; in practice we'd have to apply risk scaling to these positions).

What will the implied expected returns look like for these assets after we do the initial reverse optimisation? 

Well if we use the correlation of trading subsystem returns, then in theory we'd end up with expected returns that were equal (actually risk adjusted returns that were equal, but we're ignoring risk and focusing on correlation for now). Which is all fine and correct - since the forecasts are equal.

Let's also suppose however that right now the current correlations of the instrument returns of US2, US5 and SP500 are all equal and positive (so the world has changed, and stocks and bonds are now highly correlated). Then if we were to use this correlation matrix in the initial reverse optimisation then our implied expected returns would be higher for SP500 than for US2 and US5 (ignoring risk again). This doesn't seem right.
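The example can be checked numerically. This is a sketch that ignores risk (unit volatilities), sets λ to 1, and uses two invented correlation matrices. The 'subsystem' matrix is a stylised limiting case (bonds perfectly correlated with each other, uncorrelated with equities), for which the 50/25/25 weights imply exactly equal returns. The 'current' matrix assumes a correlation of 0.8 everywhere.

```python
def implied_returns(weights, corr, lam=1.0):
    """Reverse optimisation ignoring risk: mu = lam * C * w."""
    return [lam * sum(c * w for c, w in zip(row, weights)) for row in corr]


weights = [0.5, 0.25, 0.25]  # SP500, US2, US5

# Stylised subsystem correlations: the two bonds move together,
# and neither is correlated with equities
subsystem_corr = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
]

# Hypothetical current instrument correlations: everything equal and positive
current_corr = [
    [1.0, 0.8, 0.8],
    [0.8, 1.0, 0.8],
    [0.8, 0.8, 1.0],
]

mu_subsystem = implied_returns(weights, subsystem_corr)
mu_current = implied_returns(weights, current_corr)

print("subsystem correlation:", mu_subsystem)  # equal across assets
print("current correlation:  ", mu_current)    # SP500 is highest
```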

Now what happens if we run the forward optimisation with each of the two matrices? The better option, for me, is to use the current correlation of instrument returns. This deals with the problem I highlighted here. If we were to use the long run matrix of subsystem returns we wouldn't be taking into account the change in risk conditions (stocks becoming more correlated with bonds), which is arguably a major flaw of the type of trading system I like to use (forecasts developed independently, and expected risk does not take changes in correlation into account).

We have four cases:

Reverse / Forward optimisation: which correlation matrix used

A: Subsystem correlation / Subsystem correlation

Implied expected returns will be correct (see above). Final positions will take no account of the fact that stocks are now more correlated with bonds. Using identical matrices will result in consistency and more intuitive results. 

B: Current instrument correlation / current instrument correlation

Implied expected returns will be wrong (see above). Final positions will take account of changes in stock and bond correlations. Using identical matrices will result in consistency and more intuitive results. 

C: Current instrument correlation / Subsystem correlation

Implied expected returns will be wrong (see above). Final positions will take no account of the fact that stocks are now more correlated with bonds. Using different matrices will result in less intuitive results, may not result in robust portfolios, and could result in unhelpful effects around expected risk targeting.

D: Subsystem correlation / Current instrument correlation

Implied expected returns will be correct (see above). Final positions will take account of changes in stock and bond correlations. Using different matrices will result in less intuitive results, may not result in robust portfolios, and could result in unhelpful effects around expected risk targeting.

We can discount option C right away; it really is the worst of all worlds. 

I don't like option B, since it will result in the 'wrong' expected returns, but perhaps that doesn't actually matter as much as I think it should in practice. As it's using current correlations, it will be more adaptive to different risk conditions. And as it's the same matrix in both optimisations, the BL machinery will work as expected.

Option A will also work, but it won't give us the nice property of giving us a more holistic and dynamic adaptation to portfolio risk. It will be much more like the existing system in character.  

I'm intrigued by option D. In some ways it's the best of both worlds. If it works, then in the example it would have the correct identical expected returns, but then allocate away from SP500 due to its (temporarily) higher correlation with the other two assets. It gives us a nice holistic and dynamic adaptation. However I'm worried that using two different correlation matrices will make the thing rather weird. It strikes me as likely that the character of the resulting portfolio could be very different from the original, even if we prevent the signs of positions changing.

Also, will it produce the same amount of required risk if I use the same coefficient of risk aversion? Or do I need to calculate the required target risk (for example by scaling the long run risk target I use, 25%, by the aggregate strength of forecasts) and then run the forward optimisation using a maximum standard deviation rather than a coefficient of risk aversion?

There is a technical issue with both options A and D as the correct correlation matrix of subsystem returns (that will result in the expected returns being implied as 'correct' i.e. proportional to forecasts) won't be the same as one you just estimate, because the instrument weights aren't just naively derived from a given correlation matrix; they are robustly optimised. Perhaps that doesn't matter so much for option A since it's the same correlation matrix in both cases (to an extent the correlation matrix is arbitrary). For option D however all the benefit of recovering the correct expected returns will be lost if we don't have the 'right' matrix on the initial optimisation.

I think I have to dismiss option D on the grounds of complexity.

It comes down then to a fight between using the long run correlation of subsystem returns for both forward and reverse optimisation (option A), and using the current correlation of instrument returns for both (option B).  

I'll need to test both of these options to get a feel for how well they work, and whether they have the properties I expect and want.

Some thoughts on testing

I'd be surprised if I was able to run a full backtest with 228 instruments doing a daily optimisation for 40 odd years of data without my laptop committing digital suicide. Instead I'm going to work with a smaller universe of instruments to test what is going on. As well as checking the optimisation does things that make sense, I'm interested in the tracking error between the p&l that would be possible with a large amount of capital versus what the system can actually produce through the optimisation, and whether the expected risk is broadly similar for the original and optimised portfolio.
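The tracking error comparison could be measured along these lines. This is a sketch that assumes two aligned series of daily percentage returns and annualises with 256 business days. The function name, the day count and the sample numbers are all illustrative assumptions.

```python
import math
import statistics


def annualised_tracking_error(unconstrained_returns, optimised_returns,
                              periods_per_year=256):
    """Annualised standard deviation of the difference in daily returns."""
    differences = [
        a - b for a, b in zip(unconstrained_returns, optimised_returns)
    ]
    return statistics.stdev(differences) * math.sqrt(periods_per_year)


# Hypothetical daily returns for the two versions of the system
large_capital = [0.010, -0.020, 0.015, 0.000]
optimised = [0.000, -0.010, 0.005, 0.010]

tracking_error = annualised_tracking_error(large_capital, optimised)
print(round(tracking_error, 4))
```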

What next

In the next post I go ahead and test this crazy idea out.


  1. This is really interesting, thanks for posting. I wonder what the compute requirements would be if you added a faster moving trading sub-system in addition to your trendfollower and then performed a multi-period optimisation...

    ______________________________ . \ | / .
    / / \ \ \ / /
    | | ========== - -
    \____________________________\_/ / / \ \

    Could be expensive.

    1. Not if the cost penalty does its job properly.

      Of course that doesn't always work in practice.

    2. I meant for your cloud compute bill :)

  2. Rob, this is a very good subject; I struggled for years trying to understand it. I have currently been doing forecast filtering, but am less than happy with it. I am going to start coding up this approach; don't suppose you have any worked examples to share, or is that for part deux! Thanks once again for the great thought leadership, now all we need is some JGB micro contracts!

    1. Merci! No, I'm currently writing the code for part two. So watch this space.

    2. thanks rob as always! I will be implementing mine in java (long story why java, but mainly cos i have not got the time to re-implement in python (although at some point I will have no choice) ) but happy to collaborate in anyway I can, or to cross check the results with java as a secondary check.

