Monday, 14 November 2022

If you're so smart, how come you're not SBF? The Kelly criterion and choice of expectations and utility function when bet sizing



There has been a very interesting discussion on twitter, relating to some stuff said by Sam Bankman-Fried (SBF), who at the time of writing has just completely vaporized billions of dollars in record time via the medium of his crypto exchange FTX, and provided a useful example to future school children of the meaning of the phrase nominative determinism*.

* Sam, Bank Man: Fried. Geddit? 

Read the whole thread from the top:

https://twitter.com/breakingthemark/status/1591114381508558849

TLDR the views of SBF can be summarised as follows:

  • Kelly criterion maximises log utility
  • I don't have a log utility function. It's probably closer to linear.
  • Therefore I should bet at higher than Kelly. Up to 5x would be just fine.
I, and many others, have pointed out that SBF is an idiot. Of course it's easier to do this when he's just proven his business incompetence on a grand scale, but to be fair I was barely aware of the guy until a week ago. Specifically, he's wrong about the chain of reasoning above*. 

* It's unclear whether this is specifically what brought SBF down. At the time of writing he appears to have taken money from his exchange to prop up his hedge fund, so maybe the hedge fund was using >>> Kelly leverage, and this really is the case. 

In this post I will explain why he was wrong, with pictures. To be clearer, I'll discuss how the choice of expectation and utility function affects optimal bet sizing. 

I've discussed parts of this subject briefly before, but you don't need to read the previous post.


Scope and assumptions


To keep it tight, and relevant to finance, this post will ignore arguments seen on twitter related to one-off bets, and whether you should bet differently if you are considering your contribution to society as a whole. These are mostly philosophical discussions, which are hard to solve with pictures. So the set up we have is:

  • There is an arbitrary investment strategy, which I assume consists of a data generating process (DGP) producing Gaussian returns with a known mean and standard deviation (this ignores parameter uncertainty, which I've banged on about often enough, but effectively would result in even lower bet sizing).
  • We make a decision as to how much of our capital we allocate to this strategy for an investment horizon of some arbitrary number of years, let's say ten.
  • We're optimising L, the leverage factor, where L =1 would be full investment, 2 would be 100% leverage, 0.5 would be 50% in cash 50% in the strategy and so on.
  • We're interested in maximising the expectation of f(terminal wealth) after ten years, where f is our utility function.
  • Because we're measuring expectations, we generate a series of possible future outcomes based on the DGP and take the expectation over those.
Note that I'm using the continuous version of the Kelly criterion here, but the results would be equally valid for the sort of discrete bets that appear in the original discussion.


Specific parameters

Let's take a specific example. Set mean =10% and standard deviation = 20%, which is a Sharpe ratio of 0.5, and therefore Kelly should be maxed at 50% risk, equating to L = 50/20 = 2.5. SBF optimal leverage would be around 5 times that, L = 12.5. We start with wealth of 1 unit, and compound it over 10 years.

I don't normally paste huge chunks of code in these blog posts, but this is a fairly short chunk:

import pandas as pd
import numpy as np
from math import log

ann_return = 0.1
ann_std_dev = 0.2

BUSINESS_DAYS_IN_YEAR = 256
daily_return = ann_return / BUSINESS_DAYS_IN_YEAR
daily_std_dev = ann_std_dev / (BUSINESS_DAYS_IN_YEAR**.5)

years = 10
number_days = years * BUSINESS_DAYS_IN_YEAR


def get_series_of_final_account_values(monte_return_streams,
                                       leverage_factor=1):
    # terminal account value for each Monte Carlo return stream
    account_values = [account_value_from_returns(returns,
                                                 leverage_factor=leverage_factor)
                      for returns in monte_return_streams]

    return account_values


def get_monte_return_streams():
    # 10,000 possible ten-year daily return streams drawn from the DGP
    monte_return_streams = [get_return_stream() for __ in range(10000)]

    return monte_return_streams


def get_return_stream():
    return np.random.normal(daily_return,
                            daily_std_dev,
                            number_days)


def account_value_from_returns(returns, leverage_factor: float = 1.0):
    # compound up the leveraged daily returns, starting from wealth of 1
    one_plus_return = np.array(
        [1 + (return_item * leverage_factor)
         for return_item in returns])
    cum_return = one_plus_return.cumprod()

    return cum_return[-1]


monte_return_streams = get_monte_return_streams()

Utility function: Expected log(wealth) [Kelly]

Kelly first. We want to maximise the expected log final wealth:

def expected_log_value(monte_return_streams,
                       leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)
    log_values_over_account_values = [log(account_value)
                                      for account_value in series_of_account_values]

    return np.mean(log_values_over_account_values)

And let's plot the results:

def plot_over_leverage(monte_return_streams, value_function):
    leverage_ratios = np.arange(1.5, 5.1, 0.1)
    values = []
    for leverage in leverage_ratios:
        print(leverage)
        values.append(
            value_function(monte_return_streams, leverage_factor=leverage)
        )

    leverage_to_plot = pd.Series(
        values, index=leverage_ratios
    )

    return leverage_to_plot


leverage_to_plot = plot_over_leverage(monte_return_streams,
                                      expected_log_value)
leverage_to_plot.plot()

In this plot, and nearly all of those to come, the x-axis shows the leverage L and the y-axis shows the value of the expected utility. To find the optimal L we look to see where the highest point of the utility curve is.

As we'd expect:

  • Max expected log(wealth) is at L=2.5. This is the optimal Kelly leverage factor.
  • At twice optimal we expect to have log wealth of zero, equivalent to making no money at all (since starting wealth is 1).
  • Not plotted here, but at SBF leverage (12.5) we'd have expected log(wealth) of <undefined> and have lost pretty much all of our money.
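We can sanity check the location of the peak, and the zero crossing at twice Kelly, with the standard continuous-time approximation for annual expected log growth, g(L) = L*mu - 0.5*L^2*sigma^2. This is a minimal sketch of that check, separate from the simulation code above (at L = 12.5 the approximation gives a large negative number; in the discrete simulation some paths are wiped out entirely, which is why the expected log is undefined):

mu, sigma = 0.1, 0.2   # annual mean and standard deviation, as above

def expected_log_growth(leverage):
    # continuous-time approximation to annual expected log growth
    return leverage * mu - 0.5 * (leverage ** 2) * (sigma ** 2)

for L in [1.0, 2.5, 5.0, 12.5]:
    print(L, expected_log_growth(L))

# peaks at L = 2.5 (0.125 per year), hits zero at L = 5,
# and is -1.875 per year at L = 12.5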


Utility function: Expected (wealth) [SBF?]

Now let's look at a linear utility function, since SBF noted that his utility was 'roughly close to linear'. Here our utility is just equal to our terminal wealth, so it's purely linear.

def expected_value(monte_return_streams,
                   leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)

    return np.mean(series_of_account_values)


leverage_to_plot = plot_over_leverage(monte_return_streams,
                                      expected_value)

You can see where SBF was coming from, right? Expected utility gets exponentially higher as we add more leverage. Five times leverage is a lot better than 2.5 times, the Kelly optimum. Five times Kelly, or 2.5 × 5 = 12.5, would be better still.
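That intuition also drops straight out of a closed-form check: with daily arithmetic mean mu_d, the mean terminal wealth at leverage L is just (1 + L*mu_d)^N, because volatility has no effect on the arithmetic mean of compounded wealth. A quick sketch using the daily parameters from the simulation setup:

daily_mu = 0.1 / 256    # daily mean return, as in the simulation setup
n_days = 10 * 256       # ten years of business days

def mean_terminal_wealth(leverage):
    # the mean of compounded i.i.d. returns ignores volatility entirely
    return (1 + leverage * daily_mu) ** n_days

for L in [1.0, 2.5, 5.0, 12.5]:
    print(L, mean_terminal_wealth(L))

# rises monotonically (and steeply) with L, so a mean-maximiser with
# linear utility wants as much leverage as they can get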


Utility function: Median(wealth) 

However there is an important assumption above, which is the use of the mean for the expectation operator. This is dumb. It would mean (pun, sorry), for example, that of the following:

  1. An investment that lost $1,000 99 times out of 100; and paid out $1,000,000 1% of the time
  2. An investment that is guaranteed to gain $9,000 

... we would theoretically prefer option 1 since it has an expected value of $9,010, higher than the trivial expected value of $9,000 for option 2. There might be some degenerate gamblers who prefer 1 to 2, but not many.

(Your wealth would also affect which of these you would prefer. If $1,000 is a relatively trivial amount to you, you might prefer 1. If this is the case consider if you'd still prefer 1 to 2 if the figures were 1000 times larger, or a million times larger). 

I've discussed this before, but I think the median is more appropriate. What the median implies in this context is something like this:

Considering all possible future outcomes, how can I maximise the utility I receive in the outcome that will occur half the time?

I note that the median of option 1 above is -$1,000 (it loses money 99 times out of 100), whilst the median of option 2 is $9,000. Option 2 is now far more attractive.
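If you want to check those figures, here's a trivial sketch of the two payoff distributions, with option 1 written out as 100 equally likely outcomes:

import numpy as np

option_1 = np.array([-1_000] * 99 + [1_000_000])   # loses $1,000 99% of the time
option_2 = np.array([9_000] * 100)                 # guaranteed $9,000

print(np.mean(option_1), np.median(option_1))      # 9010.0  -1000.0
print(np.mean(option_2), np.median(option_2))      # 9000.0  9000.0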


def median_value(monte_return_streams,
                 leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)

    return np.median(series_of_account_values)


The spooky result here is that the optimal leverage is now 2.5, the same as the Kelly criterion.

Even with linear utility, if we use the median expectation, Kelly is the optimal strategy.

The reason why people prefer to use mean(log(wealth)) rather than median(wealth), even though the two are (asymptotically) equivalent, is that the former is more computationally attractive.

Note the well-known fact that Kelly also maximises the geometric return.

With Kelly we aren't really making any assumptions about the utility function: our assumption is effectively that the median is the correct expectations operator.
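You can see the equivalence directly in the simulation by comparing the exponential of the mean log wealth with the median wealth; reusing the functions defined above, the two should be close at any leverage, and both peak in the same place:

from math import exp

for L in [1.0, 2.5, 4.0]:
    geometric = exp(expected_log_value(monte_return_streams, leverage_factor=L))
    median = median_value(monte_return_streams, leverage_factor=L)
    print(L, round(geometric, 2), round(median, 2))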

The entire discussion about utility is really a red herring. It's very hard to measure utility functions, and everyone probably has a different one. I think it's much better to focus on expectations.


Utility function: Nth percentile(wealth) 

Well, you might be thinking that SBF seems like a particularly optimistic kind of guy. He isn't interested in the median outcome (the 50th percentile). Surely there must be some percentile at which it makes sense to bet five times Kelly? Maybe he is interested in the 75th percentile outcome?

QUANTILE = .75

def value_quantile(monte_return_streams,
                   leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)

    return np.quantile(series_of_account_values, QUANTILE)

Now the optimal is around L=3.5. This is considerably higher than the Kelly max of L=2.5, but it is still nowhere near the SBF optimal L of 12.5.

Let's plot the utility curves for a bunch of different quantile points:

list_over_quantiles = []
quantile_ranges = np.arange(.4, 0.91, .1)
# the loop rebinds the module-level QUANTILE, which value_quantile reads
for QUANTILE in quantile_ranges:
    leverage_to_plot = plot_over_leverage(monte_return_streams,
                                          value_quantile)
    list_over_quantiles.append(leverage_to_plot)

pd_list = pd.DataFrame(list_over_quantiles)
pd_list.index = quantile_ranges
pd_list.transpose().plot()

It's hard to see what's going on here, legend floating point representation notwithstanding, but you can hopefully see that the maximum L (hump of each curve) gets higher as we go up the quantile scale, as the curves themselves get higher (as you would expect).

But at none of these quantiles do we get anywhere near an optimal L of 12.5. Even at the 90% quantile - evaluating an outcome that only happens one time in ten - the optimal L is under 4.5.

Now there will be some quantile point at which L=12.5 is indeed optimal. Returning to my simple example:

  1. An investment that lost $1,000 99 times out of 100; and paid out $1,000,000 1% of the time
  2. An investment that is guaranteed to gain $9,000 

... if we focus on outcomes that will happen less than one in a million times (the 99.9999% quantile and above) then yes sure, we'd prefer option 1.

So at what quantile point does a leverage factor of 12.5 become optimal? I couldn't find out exactly, since to look at extremely rare quantile points requires very large numbers of outcomes*. I actually broke my laptop before I could work out what the quantile point was. 

* For example, if you want ten observations to accurately measure the quantile point, then for the 99.99% quantile you would need 10 * (1/(1 - 0.9999)) = 100,000 outcomes.

But even for a quantile of 99.99% (!), we still aren't at an optimal leverage of 12.5! 


You can see that the optimal leverage is 8 (around 3.2 x Kelly), still way short of 12.5.


Summary

Rather than utility functions, I think it's easier to ask people which likelihood of outcome they are concerned about. I'd argue that sensible people would think about the median outcome, which is what you expect to happen half the time. And if you are a bit risk averse, you should probably consider an even lower quantile.

In contrast, SBF went for bet sizing that would only make sense for outcomes that happen significantly less than 0.01% of the time. That is insanely optimistic; and given he was dealing with billions of dollars of other people's money, it was also insanely irresponsible.

Was SBF really that recklessly optimistic, or dumb? In this particular case I'd argue the latter. He had a very superficial understanding of Kelly bet sizing, and because of that he thought he could ignore it. 
This is a classic example of 'a little knowledge is a dangerous thing'. A dumb person doesn't understand anything, but reads on the internet somewhere that half Kelly is the correct bet sizing. So they use it. A "smart" person like SBF glances at the Kelly formula, thinks 'oh but I don't have log utility' and leverages up five times Kelly and thinks 'Wow I am so smart look at all my money'. And that ended well...

A truly enlightened person understands that it isn't about the utility function, but about the expectation operator. They also understand about uncertainty, optimistic backtesting bias, and a whole bunch of other factors that imply even 0.5 x Kelly is a little reckless. I, for example, use something south of a quarter Kelly. 

Which brings us back to the meme at the start of the post:



Note I am not saying I am smarter than SBF. On pure IQ, I am almost certainly much, much dumber. In fact, it's because I know I am not a genius that I'm not arrogant enough to completely follow or ignore the Kelly criterion without first truly understanding it.

Whilst this particular misunderstanding might not have brought down SBF's empire, it shows that really really smart people can be really dumb - particularly when they think that they are so smart they don't need to properly understand something before ignoring it*.

* Here is another example of him getting something completely wrong


Postscript (16th November 2022)

I had some interesting feedback from Edwin Teejay on twitter, which is worth addressing here as well. Some of the feedback I've incorporated into the post already.

(Incidentally, Edwin is a disciple of Ergodic Economics, which has a lot of very interesting stuff to say about the entire problem of utility maximisation)

First he commented that the max(median) = max(log) relationship is only true for a long sequence of bets, i.e. asymptotically. We effectively have around 2,500 daily bets in our ten year return sequence. As I said originally, I framed this as a typical asset optimisation problem rather than a one-off bet (or small number of one-off bets).

He then gives an example of a one-off bet decision where the median would be inappropriate:
  1. 100% win $1
  2. 51% win $0 / 49% win $1,000,000
The expected values (mean expectation) are $1 and $490,000 respectively, but the medians are $1 and $0. Yet any sane person would pick the second option.

My retort to this is essentially the same as before - this isn't something that could realistically happen in a long sequence of bets. Suppose we are presented with making the bet above every single week for 5 weeks. The distribution of wealth outcomes for option 1 is single peaked - we earn $5. The distribution of wealth outcomes for option 2 will vary from $0 (with probability 3.4%) to $5,000,000 (with a slightly lower probability of 2.8% - I am ignoring 'compounding', e.g. the possibility of buying more bets with money we've already won), with a mean of $2.45 million. 

But the median is pretty good: $2 million. So we'd definitely pick option 2. And that is with just 5 bets in the sequence. So the moment we are looking at any kind of repeating bet, the law of large numbers gets us closer and closer to the median being the optimal decision. We are just extremely unlikely to see the sort of payoff structure shown in that bet arise from a long series of repeated bets.
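Those figures are just binomial arithmetic, which you can verify in a few lines (a quick sketch, again ignoring compounding):

from math import comb

n, p = 5, 0.49   # five independent weekly bets, each with a 49% chance of $1m

def pmf(k):
    # probability of exactly k wins out of n
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(pmf(0))    # ~0.034: win nothing at all
print(pmf(n))    # ~0.028: win all five times, $5 million

print(sum(k * pmf(k) for k in range(n + 1)) * 1_000_000)   # mean: ~$2.45 million

# median number of wins: the smallest k where the cumulative probability hits 50%
cumulative = 0
for k in range(n + 1):
    cumulative += pmf(k)
    if cumulative >= 0.5:
        print(k)   # 2, i.e. a median wealth of $2 million
        break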

Now what about the example I posted:
  1. An investment that lost $1,000 99 times out of 100; and paid out $1,000,000 1% of the time
  2. An investment that is guaranteed to gain $9,000 
Is it realistic to expect this kind of payoff structure in a series of repeated bets? Well consider instead the following:
  1. An investment that lost $1 most of the time; and paid out $1,000,000 0.001% of the time
  2. An investment that is guaranteed to gain $5

The means of these bets are ~$9 and $5, and the medians are -$1 and $5.

Is this unrealistic? Well, these sorts of payoffs do exist in the world - they are called lottery tickets (albeit it is rare to get a lottery ticket with a $9 positive expected value!). And this is something closer to the SBF example, since I noted that he would have to be looking at somewhere north of the 99.99% quantile to justify choosing 5x Kelly leverage.

Now what happens if we run the above as a series of 5,000 repeated bets (again with no compounding for simplicity)? We end up with the following distributions:
  1. An investment that loses $5,000 95.1% of the time, and makes $1 million or more about 5% of the time.
  2. An investment that is guaranteed to gain $25,000

Since there is no compounding we can just multiply up the individual numbers to get the means ($45,000 and $25,000 respectively). The medians are -$5,000 and $25,000. Personally, I still prefer option 2! You might prefer option 1 if spending $5,000 on lottery tickets over ten years is a trivial proportion of your wealth, but I refer you to the previous discussion on this topic.
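Again the figures are straightforward to reproduce: the chance of never hitting the $1 million payout over 5,000 independent tries is (1 - 0.00001)^5000, which is about 95% (a quick sketch, again with no compounding):

n_bets = 5_000
p_win = 0.00001     # the $1,000,000 payout happens 0.001% of the time

p_no_wins = (1 - p_win) ** n_bets
print(p_no_wins)            # ~0.951: lose $1 per bet, $5,000 in total
print(1 - p_no_wins)        # ~0.049: hit the jackpot at least once

mean_per_bet = p_win * 1_000_000 - (1 - p_win) * 1
print(n_bets * mean_per_bet)    # mean of option 1: ~$45,000
print(n_bets * 5)               # option 2 is a guaranteed $25,000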

So I would argue that in a long run of bets we are more likely in real life to get payoff structures of the kind I posited, than the closer to 50:50 bet suggested by Edwin. Ultimately, I think we agree that for long sequences of bets the median makes more sense (with a caveat). I personally think long run decision making is more relevant to most people than one-off bets. 

What is the caveat? Edwin also said that the choice of the median is 'arbitrary'. I disagree here. The median is 'what happens half the time'. I still think for most people that is a logical reference point for 'what I expect to happen', as well as in terms of the maths: both median and mean are averages after all. I personally think it's fine to be more conservative than this if you are risk averse, but not to be more aggressive - bear in mind that will mean you are betting at more than Kelly.

But anyway, as Matt Hollerbach, whose original series of tweets inspired this post, said:

"The best part of Robs framework is you don't have to use the median,50%. You could use 60%  or 70%  or 40% if your more conservative.  And it intuitively tells you what the chance of reaching your goal is. You don't get duped into a crazy long shot that the mean might be hiding in."  (typos corrected from original tweet)

This fits well into my general framework for thinking about uncertainty. Quantify it, and be aware of it. Then if you still do something crazy/stupid, well at least you know you're being an idiot...


Wednesday, 2 November 2022

Optimal trend following allocation under conditions of uncertainty and without secular trends

Few people are brave enough to put their entire net worth into a CTA fund or home grown trend following strategy (my fellow co-host on the TTU podcast, Jerry Parker, being an honorable exception with his 'Trend following plus nothing' portfolio allocation strategy). Most people have considerably less than 100% - and I include myself firmly in that category. And it's probably true that most people have less than the sort of optimal allocation that is recommended by portfolio optimisation engines.

Still it is a useful exercise to think about just how much we should allocate to trend following, at least in theory. The figure that comes out of such an exercise will serve as both a ceiling (you probably don't want any more than this), and a target (you should be aiming for this). 

However any sort of portfolio optimisation based on historical returns is likely to be deeply flawed. I've covered the problems involved at length before, in particular in my second book and in this blogpost, but here's a quick recap:

  1. Standard portfolio optimisation techniques are not very robust
  2. We often assume normal distributions, but financial returns are famously abnormal
  3. There is uncertainty in the parameter estimates we make from the data
  4. Past returns distributions may be biased and unlikely to repeat in the future

As an example of the final effect, consider the historically strong performance of equities and bonds in a 60:40 style portfolio during my own lifetime, at least until 2022. Do we expect such a performance to be repeated? Given it was driven by a secular fall in inflation from high double digits, and a resulting fall in interest rates and equity discount rates, probably not. 

Importantly, a regime change to lower bond and equity returns will have varying impact on a 60:40 long only portfolio (which will get hammered), a slow trend following strategy (which will suffer a little), and a fast trend following strategy (which will hardly be affected). 

Consider also the second issue: non Gaussian return distributions. In particular equities have famously negative skew, whilst trend following - especially the speedier variation - is somewhat positive in this respect. Since skew affects optimal leverage, we can potentially 'eat' extra skew in the form of higher leverage and returns. 

In conclusion then, some of the problems of portfolio optimisation are likely to be especially toxic when we're looking at blends of standard long only assets combined with trend following. In this post I'll consider some tricks we can use to alleviate these problems, and thus come up with a sensible suggestion for allocating to trend following. 

If nothing else, this is a nice toy model for considering the issues we have when optimising, something I've written about at length eg here. So even if you don't care about this problem, you'll find some interesting ways to think about robust portfolio optimisation within.

Credit: This post was inspired by this tweet.

Some very messy code, with hardcoding galore, is here.


The assets

Let's first consider the assets we have at our disposal. I'm going to make this a very simple setup so we can focus on what is important whilst still learning some interesting lessons. For reasons that will become apparent later, I'm limiting myself to 3 assets. We have to decide how much to allocate to each of the following three assets:

  • A 60:40 long only portfolio of bonds and equities, represented by the US 10 year and S&P 500
  • A slow/medium speed trend following strategy, trading the US 10 year and S&P 500 future with equal risk allocation, with a 12% equity-like annualised risk target. This is a combination of EWMAC crossovers: 32,128 and 64,256
  • A relatively fast trend following strategy, trading the US 10 year and S&P 500 future with equal risk allocation, with a 12% annualised risk target. Again this is a combination of EWMAC crossovers: 8, 32 and 16,64

Now there is a lot to argue with here. I've already explained why I want to allocate separately to fast and slow trend following: it will highlight the effect of secular trends.

The reason for the relatively low standard deviation target is that I'm going to use a non risk adjusted measure of returns, and if I used a more typical CTA style risk (25%) it would produce results that are harder to interpret.

You may also ask why I don't have any commodities in my trend following fund. But what I find especially interesting here is the effect on correlations between these kinds of strategies when we adjust for long term secular trends. These correlations will be dampened if there are other instruments in the pot. The implication of this is that the allocation to a properly diversified trend following fund running futures across multiple asset classes will likely be higher than what is shown here.

Why 60:40? Rather than 60:40, I could directly try and work out the optimal allocation to a universe of bonds and equities separately. But I'm taking this as exogenous, just to simplify things. Since I'm going to demean equity and bond returns in a similar way, this shouldn't affect their relative weightings.

50:50 risk weights on the mini trend following strategies is more defensible; again I'm using fixed weights here to make things easier and more interpretable. For what it's worth the allocation within trend following for an in sample backtest would be higher for bonds than for equities, and this is especially true for the faster trading strategy.

Ultimately three assets makes the problem both tractable and intuitive to solve, whilst giving us plenty of insight.


Characteristics of the underlying data

Note I am going to use futures data even for my 60:40, which means all the returns I'm using are excess returns.

Let's start with a nice picture:


So the first thing to note is that the vol of the 60:40 is fairly low at around 12%; as you'd expect given it has a chunky allocation to bonds (vol ~6.4%). In particular, check out the beautifully smooth run from 2009 to 2022. The two trading strategies also come in around the 12% annualised vol mark, by design. In terms of Sharpe Ratio, the respective figures are 0.31 (fast trading strategy), 0.38 (long only) and 0.49 (slow trading strategy). However as I've already noted, the performance of the long only and slow strategies is likely to be flattered by the secular trends in equities and bonds seen since 1982 (when the backtest starts).

Correlations matter, so here they are:

          60:40   Fast TF   Slow TF
60:40      1.00     -0.02      0.25
Fast TF   -0.02      1.00      0.68
Slow TF    0.25      0.68      1.00

What about higher moments? The monthly skews are -1.44 (long only), 0.08 (slow) and 0.80 (fast). Finally, what about the tails? I have a novel method for measuring these which I discuss in my new book, but all you need to know is that a figure greater than one indicates a non-normal distribution. The lower tail ratios are 1.26 (fast), 1.35 (slow) and 2.04 (long only); whilst the uppers are 1.91 (fast), 1.74 (slow) and 1.53 (long only). In other words, the long only strategy has nastier skew and worse tails than the fast trading strategy, whilst the slow strategy comes somewhere in between.


Demeaning

To reiterate, again, the performance of the long only and slow strategies is likely to be flattered by the secular trends in equities and bonds, caused by valuation rerating in equities and falling interest rates in bonds. 

Let's take equities. The P/E ratio in September 1982 was around 9.0, versus 20.1 now. This equates to 2.0% a year in returns coming from the rerating of equities. Over the same period US 10 year bond yields have fallen from around 10.2% to 4.0% now, equating to around 1.2% a year in returns. I can do a simple demeaning to reduce the returns achieved by the appropriate amounts.
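The equity figure is just the annualised P/E rerating over the roughly forty-year sample, and the bond figure is (approximately) the average annual fall in yields multiplied by something like the duration of a 10 year note. A rough sketch of both calculations; the duration of 8 is my own assumption, not a figure from the post:

years = 40    # roughly September 1982 to late 2022

# equities: annualised return from the P/E rerating alone
pe_start, pe_end = 9.0, 20.1
print((pe_end / pe_start) ** (1 / years) - 1)    # ~0.020, i.e. about 2.0% a year

# bonds: average annual yield fall times an assumed duration of about 8
yield_start, yield_end = 0.102, 0.040
annual_yield_fall = (yield_start - yield_end) / years
print(annual_yield_fall * 8)                     # ~0.012, i.e. about 1.2% a year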

Here are the demeaned series with the original backadjusted prices. First S&P:

And for US10:


What effect does the demeaning have? It doesn't significantly affect standard deviations, skew, or tail ratios. But it does affect the Sharpe Ratio:


            Original   Demeaned   Difference
Long only     0.38       0.24       -0.14
Slow TF       0.49       0.41       -0.08
Fast TF       0.31       0.25       -0.06

This is exactly what we would expect. The demeaning has a larger effect on the long only 60:40, and to a lesser extent the slower trend following. 

And the correlations are also a little different:

          60:40   Fast TF   Slow TF
60:40      1.00     -0.06      0.18
Fast TF   -0.06      1.00      0.66
Slow TF    0.18      0.66      1.00

Both types of trend have become slightly less correlated with 60:40, which makes sense.


The optimisation

Any optimisation requires (a) a utility or fitness function that we are maximising, and (b) a method for finding the highest value of that function. In terms of (b) we should bear in mind the comments I made earlier about robustness, but let's first think about (a).

An important question here is whether we should be targeting a risk adjusted measure like Sharpe Ratio, and hence assuming leverage is freely available, which is what I normally do. But for an exercise like this a more appropriate utility function will target outright return and assume we can't access leverage. Hence our portfolio weights will need to sum to no more than 100% (we don't force them to sum to exactly 100%, to allow for the possibility of holding some cash, though this is unlikely to be optimal). 

It's more correct to use geometric return, also known as CAGR, rather than arithmetic mean, since that is effectively the same as maximising the (log) final value of your portfolio (the Kelly criterion). Using geometric mean also means that negative skew and high kurtosis strategies will be punished, as will excessive standard deviation. By assuming a CAGR maximiser, I don't need to worry about the efficient frontier; I can maximise for a single point. It's for this reason that I've created TF strategies with similar vol to 60:40.

I'll deal with uncertainty by using a resampling technique. Basically, I randomly sample with replacement from the joint distribution of daily returns for the three assets I'm optimising for, to create a new set of account curves (this will preserve correlations, but not autocorrelations. This would be problematic if I was using drawdown statistics, but I'm not). For a given set of instrument weights, I then measure the utility statistic (CAGR) for the resampled returns. I repeat this exercise a few times, and then I end up with a distribution of CAGR for a given set of weights. This allows us to take into account the effect of uncertainty. 
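The messy code linked above does the real work, but the core of the resampling idea looks something like this sketch (my own simplified version, not the code used for the plots; returns_df is assumed to be a DataFrame of daily returns with one column per asset, in the order 60:40, slow TF, fast TF):

import numpy as np
import pandas as pd

BUSINESS_DAYS_IN_YEAR = 256

def cagr(daily_returns: pd.Series) -> float:
    # geometric (compound) annual growth rate of a daily return series
    final_wealth = (1 + daily_returns).prod()
    years = len(daily_returns) / BUSINESS_DAYS_IN_YEAR
    return final_wealth ** (1 / years) - 1

def resampled_cagrs(returns_df: pd.DataFrame,
                    weights: np.ndarray,
                    n_draws: int = 100) -> np.ndarray:
    # bootstrap whole days with replacement: this preserves the correlation
    # between assets, but not autocorrelation
    cagrs = []
    for _ in range(n_draws):
        sample = returns_df.sample(n=len(returns_df), replace=True)
        portfolio_returns = sample @ weights
        cagrs.append(cagr(portfolio_returns))
    return np.array(cagrs)

# e.g. the median CAGR for a 40/50/10 allocation:
# np.median(resampled_cagrs(returns_df, np.array([0.4, 0.5, 0.1])))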

Finally we have the choice of optimisation technique. Given we have just three weights to play with, and only two degrees of freedom, it doesn't seem too heroic to use a simple grid search. So let's do that.
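The grid search is then just a double loop over two of the weights, with the third being whatever is left over; here's a minimal sketch that reuses the resampled_cagrs helper from the sketch above:

def grid_search_median_cagr(returns_df: pd.DataFrame,
                            step: float = 0.05) -> pd.DataFrame:
    # rows (y-axis): weight on slow TF, columns (x-axis): weight on 60:40;
    # the fast TF weight is the remainder, so only cells where the two
    # weights sum to no more than one are populated
    grid_points = np.arange(0, 1 + step / 2, step)
    results = pd.DataFrame(index=grid_points, columns=grid_points, dtype=float)
    for w_slow in grid_points:
        for w_6040 in grid_points:
            w_fast = 1.0 - w_6040 - w_slow
            if w_fast < -1e-9:
                continue    # no leverage allowed: weights must sum to one
            weights = np.array([w_6040, w_slow, max(w_fast, 0.0)])
            results.loc[w_slow, w_6040] = np.median(
                resampled_cagrs(returns_df, weights))
    return results

# the resulting DataFrame can be shown as a heatmap, e.g. seaborn.heatmap(results)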


Some pretty pictures

Because we only have two degrees of freedom, we can plot the results on a 2-d heatmap. Here are the results for the median CAGR, with the original set of returns before demeaning:

Sorry for the illegible labels - you might have to click on the plots to see them. The colour shown reflects the CAGR. The x-axis is the weight for the long only 60:40 portfolio, and the y-axis for slow trend following. The weight to fast trend following will be whatever is left over. The top diagonal isn't populated, since that would require weights greater than 1; the diagonal line from top left to bottom right is where there is zero weight to fast trend following; top left is 100% slow TF and bottom right is 100% long only.

Ignoring uncertainty then, the optimal weight (brightest yellow) is 94% in slow TF and 6% in long only. More than most people have! However, note that there is a fairly large range of yellow cells with quite similar CAGRs. 

The 30% quantile of the resampled CAGR distribution at the optimal weights is 4.36, and the 70% quantile is 6.61. Let's say we'd be indifferent between any weights whose median CAGR falls in that range (in practice then, anything whose median CAGR is greater than 4.36). If I replace everything that is statistically indistinguishable from the maximum with white space, and redo the heatmap, I get this:

This means that, for example, a weight of 30% in long only, 34% in slow trend following, and 36% in fast trend following is just inside the whitespace, and thus is statistically indistinguishable from the optimal set of weights. Perhaps of more interest, the maximum weight we can have in long only and still remain within this region (at the bottom left, just before the diagonal line reappears) is about 80%.
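The whitespace trick itself is simple once you have the grid of median CAGRs and the resampled distribution at the best cell: take the 30% quantile of that distribution as a threshold, and blank out every cell whose median clears it. A rough sketch of the idea, continuing from my grid sketch above (not the original plotting code):

def mask_indistinguishable(results: pd.DataFrame,
                           returns_df: pd.DataFrame,
                           quantile: float = 0.30) -> pd.DataFrame:
    # locate the best cell in the grid of median CAGRs
    best_slow = results.max(axis=1).idxmax()
    best_6040 = results.loc[best_slow].idxmax()
    best_weights = np.array([best_6040, best_slow,
                             1.0 - best_6040 - best_slow])

    # lower edge of the 'indifference' band around the optimum
    lower_bound = np.quantile(resampled_cagrs(returns_df, best_weights), quantile)

    # NaN cells plot as white space: blank out everything that is
    # statistically indistinguishable from the maximum
    return results.mask(results >= lower_bound)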

Implication: We should have at least 20% in trend following.

If I had to choose an optimal weight, I'd go for the centroid of the convex hull of the whitespace. I can't be bothered to code that up, but by eye it's at roughly 40% 60/40, 50% slow TF, 10% fast TF.

Now let's repeat this exercise with the secular trends removed from the data.

The plot is similar, but notice that the top left has got much better than the bottom right; we should have a lower weight to 60:40 than in the past. In fact the optimal is 100% in slow trend following; zilch, nil, zero, nada in both fast TF and 60:40.

But let's repeat the whitespace exercise to see how robust this result is:

The whitespace region is much smaller than before, and is heavily biased towards the top left. Valid portfolio weights that are indistinguishable from the maximum include 45% in 60:40 and 55% in slow TF (and 45% is the most you should have in 60:40 whilst remaining in this region). We've seen a shift away from long only (which we'd expect), but interestingly no shift towards fast TF, which we might have expected as it is less affected by demeaning.

The optimal (centroid, convex hull, yada yada...) is somewhere around 20% 60:40, 75% slow TF and 5% in fast TF.


Summary: practical implications

This has been a highly stylised exercise, deliberately designed to shine a light on some interesting facts and show you some interesting ways to visualise the uncertainty in portfolio optimisation. You've hopefully seen how we need to consider uncertainty in optimisation, and I've shown you a nice intuitive way to produce robust weights.

The bottom line then is that a robust set of allocations would be something like 40% 60/40, 50% slow TF, 10% fast TF; but with a maximum allocation to 60/40 of about 80%. If we use data that has had past secular trends removed, we're looking at an even higher allocation to TF, with the maximum 60/40 allocation reducing considerably, to around 45%.

Importantly, this has of course been an entirely in-sample exercise. Although we've made an effort to make things more realistic by demeaning, much of the result depends on the finding that slow TF has a higher SR than 60:40, an advantage that is increased by demeaning. Correcting for this would result in a higher weight to 60:40, but also to fast TF.

Of course if we make this exercise more realistic, it will change these results:
  • Improving 60:40 equities- Introducing non US assets, and allocating to individual equities
  • Improving 60:40 bonds - including more of the term structure, inflation and corporate bonds
  • Improving 60:40 by including other non TF alternatives
  • Improving the CTA offering - introducing a wider set of instruments across asset classes (there would also be a modest benefit from widening beyond a single type of trading rule)
  • Adding fees to the CTA offering 
I'd expect the net effect of these changes to result in a higher weight to TF, as the diversification benefit of going from two instruments to, say, 100 is considerable, and far outweighs the effect of fees and improved diversification in the long only space.