Thursday, 5 January 2023

Scream if you want to go faster

Happy new year.

I didn't post very much in 2022, because I was in the process of writing a new book (out in April!). Save for a few loose ends, my work on that project is pretty much done. Now I have some research topics I will be looking at this year, with the intention of returning to something like a monthly posting cycle. 

To be clear, this is *not* a New Year's resolution, and therefore is *not* legally or morally binding.

The first of these research projects relates to expensive (high cost) trading strategies. Now I deal with these strategies in a fairly disdainful way, since I'm conservative when it comes to trading fast and I prefer to avoid throwing away too much return on (certain) costs in the pursuit of (uncertain) returns. 

Put simply: I don't trade anything that is too quick. To be more precise, I do not allocate risk capital to trading rules where turnover * cost per trade > 'speed limit'. The effect of this is that instruments that are cheaper to trade get an allocation to expensive trading rules that have a high turnover; but most don't. And there are a whole series of potential trading rules which are far too rich for all but the very cheapest instruments that I trade.
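As a minimal sketch, that per-rule check looks something like this (the function itself is hypothetical; the 0.13 speed limit in SR units a year, and the example turnovers, are figures quoted later in this post):

```python
def passes_speed_limit(annual_turnover: float,
                       sr_cost_per_trade: float,
                       speed_limit: float = 0.13) -> bool:
    """True if a rule's annualised cost stays inside the speed limit.

    annual_turnover: round trips per year for this trading rule
    sr_cost_per_trade: cost of one round trip, in Sharpe ratio units
    speed_limit: maximum annualised SR we will spend on trading costs
    """
    annual_cost_in_sr_units = annual_turnover * sr_cost_per_trade
    return annual_cost_in_sr_units <= speed_limit

# A fast rule on a moderately expensive instrument fails the check...
print(passes_speed_limit(annual_turnover=42, sr_cost_per_trade=0.01))   # False
# ...but a slow rule on the same instrument is fine
print(passes_speed_limit(annual_turnover=2.1, sr_cost_per_trade=0.01))  # True
```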

My plan is to do a series of posts that explore in more depth whether this is the correct approach, and whether there is actually some room for faster trading strategies in my quiver. Broadly speaking, it might be that either:

  • The layers of buffering and optimisation in my system mean it is possible to add faster strategies without worrying about the costs they incur. 
  • Or there could be some other way of smuggling in faster trading, perhaps via an additional execution layer that optimises the execution of orders coming from the (slow) core strategy. 

Part of the motivation for this is that in my new book I introduce some relatively quick trading strategies which are viable with most instruments, but which require a different execution architecture from my current system (a portfolio that is dynamically optimised daily, allowing me to include over 100 instruments despite a relatively modest capital base). Thus the quicker strategies are not worth trading with my retail-sized trading account, since to do so would result in a severe loss of instrument diversification which would more than negate any advantages. I'll allude to these specific strategies further at some point in the series, when I determine if there are other ways to sneak them into the system.

For this first post I'm going to explore the relationship between instrument cost and momentum performance. Mostly this will be an exercise in determining whether my current approach to fitting instrument and forecast weights is correct, or if I am missing something. But it will also allow me to judge whether it is worth trying to 'smuggle in' faster momentum trading rules that are too pricey for most instruments to actually trade.

The full series of posts:

  • This is the first post

Note 1: There is some low quality python code here, for those that use my open source package pysystemtrade. And for those who haven't yet pre-ordered the book, you may be interested to know that this post effectively fleshes out some more work I present in chapter nine around momentum speed.

Note 2: The title comes from this song, but please don't draw any implications from it. It's not my favourite Geri Halliwell song (that's 'Look at Me'), and Geri isn't even my favourite Spice Girl (that's Emma), and the Spice Girls are certainly not my favourite band (they aren't even in my top 500).



The theories

There are two dogs in this fight. 

Well, metaphorical dogs; I don't believe in dog fighting. Don't cancel me! But the fight is very real and not metaphorical at all. 

I'm interested in how much truth there is in the following two hypotheses:

  • We have a prior belief that the expected pre-cost performance for a given trading rule is the same regardless of underlying instrument cost. 
  • "No free lunch": expected pre-cost performance is higher for instruments that cost more to trade, but costs exactly offset this effect. Therefore expected post-cost performance for a given trading rule will be identical regardless of instrument cost.

I use expectations here because in reality, as we shall see, there is huge variation in performance across instruments, but quite a lot of it isn't statistically significant.

What implications do these hypotheses have? If the first is true, then we shouldn't bother trading expensive instruments (or at least we should radically downweight them, since there will be diversification benefits). They are just expensive ways of getting the same pre-cost performance. But if the second statement is true, then an expensive instrument is just as good as a cheaper one.

Note that my current dynamic optimisation allows me to sidestep this issue; I generate signals for instruments regardless of their costs and I don't set instrument weights according to costs, but then I use a cost penalty optimisation which makes it unlikely I will actually trade expensive instruments to implement my view. Of course there are some instruments I don't trade at all, as their costs are just far too high. 

Moving on, the above two statements have the following counterparts:

  • We have a prior belief that the expected pre-cost performance across trading rules for a given instrument is constant. 
  • Expected post-cost performance across trading rules for a given instrument will be identical.

What implications do these theories have for deciding how fast to trade a given instrument, and how much forecast weight to give to a given speed of momentum? If pre-cost SR is equal, we should give a higher weight to slower trading rules; particularly if an instrument is expensive to trade. If post-cost SR is equal, then we should probably give everything equal weights.

Note that I currently don't do either of these things: I completely delete rules that are too quick beyond some (relatively arbitrary) boundary, then set the other weights to be equal. In some ways this is a compromise between these two extreme approaches.
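A sketch of that compromise approach (the function is hypothetical; the 0.13 speed limit and the approximate rule turnovers are the figures given later in this post):

```python
def forecast_weights_with_speed_cutoff(turnover_by_rule: dict,
                                       sr_cost_per_trade: float,
                                       speed_limit: float = 0.13) -> dict:
    # Delete rules whose annualised cost (turnover * cost per trade)
    # breaches the speed limit...
    viable = [rule for rule, turnover in turnover_by_rule.items()
              if turnover * sr_cost_per_trade <= speed_limit]
    if not viable:
        return {}  # instrument is too expensive to trade at all
    # ...then set the surviving forecast weights to be equal
    return {rule: 1.0 / len(viable) for rule in viable}

turnovers = {"ewmac2": 42, "ewmac4": 20, "ewmac8": 9,
             "ewmac16": 3.8, "ewmac32": 2.3, "ewmac64": 2.1}

# For an instrument costing 0.01 SR units a trade, the two fastest
# rules are deleted and the remaining four get 25% each
print(forecast_weights_with_speed_cutoff(turnovers, sr_cost_per_trade=0.01))
```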

A brief footnote: 

You may ask why I am looking into this now. Well, I was pointed to this podcast by an ex-colleague, in which another ex-colleague of mine makes a rather interesting claim:

"if you see a market that is more commoditised [lower costs] you tend to see faster momentum disappear."

Incidentally it's worth listening to the entire podcast, which as you'd expect from a former team member of yours truly is excellent in every way.

Definitely interesting! This would be in line with the 'no free lunch' theory. If fast momentum is only profitable before costs for instruments that cost a lot to trade, then it won't be possible to exploit it. And the implication of this is that I am doing exactly the wrong thing: if an instrument is cheap enough to trade faster momentum, I shouldn't just unthinkingly let it. Conversely if an instrument is expensive, it might be worth considering faster momentum if we can work out some way of avoiding those high costs. 

It's also worth saying that there are already some stylised facts that support this theory. Principally, faster momentum signals stopped working in the 1990s, particularly in equity markets; and equity markets became the cheapest instruments to trade in the same period.

Incidentally, I explore the change in momentum profitability over time more in the new book. 



The setup

I started with my current set of 206 instruments, and removed duplicates (e.g. mini S&P 500, for which the results would be the same as micro), and those with less than one year of history (for reasons that will become clearer later, but basically this is to ensure my results are robust). This left me with 160 instruments - still a decent sample.

I then set up my usual six exponentially weighted moving average crossover (EWMAC) trading rules, all of the form N,4N so EWMAC2 denotes a 2 day span minus an 8 day span: EWMAC2, EWMAC4, EWMAC8, EWMAC16, EWMAC32, EWMAC64.
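For reference, the raw N,4N crossover can be sketched as below. This is a bare-bones version: the real rules also normalise by volatility, and scale and cap the forecast, which I've omitted.

```python
import pandas as pd

def ewmac(price: pd.Series, n: int) -> pd.Series:
    """Raw EWMAC N,4N: fast EWMA of price minus slow EWMA of price."""
    fast_ewma = price.ewm(span=n).mean()
    slow_ewma = price.ewm(span=4 * n).mean()
    return fast_ewma - slow_ewma

# In a rising market the fast average sits above the slow one,
# so the raw forecast ends up positive
price = pd.Series([100.0, 101, 103, 102, 105, 107, 106, 109])
print(ewmac(price, n=2))
```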

I'm going to use Sharpe ratio as my quick and dirty measure of performance, and measure trading costs per instrument as the cost per trade in SR units. However, because the range of SR trading costs is very large, covering several orders of magnitude (from 0.0003 for NASDAQ to over 90 for the rather obscure Euribor contract I have in my database), I will use log(costs) as my fitting variable and for plotting purposes. 

Before proceeding, there is an effect we have to bear in mind, which is the different lengths of data involved. Some instruments in this dataset have 40+ years of data, others just one. But in a straight median or mean over instrument SRs they will get the same weighting. We need to check that there is no bias; for example, because equities generally have less data and are also the cheapest:

Here the y-axis shows the SR cost per trade (log axis) and the x-axis shows the number of days of data. 

There doesn't seem to be a clear bias (e.g. more recent instruments being especially cheap), so it's probably safe to ignore the data length issue.


Gross SR performance by trading rule versus costs

Now to consider the relationship between the cost of an instrument, and the performance of a given trading rule. Each of these scatter plots has one point per instrument; each point shows the SR cost per trade (log) on the x-axis, and the gross SR performance of a given trading rule on the y-axis. I've also added regression lines, and R squared for those regressions.

If the 'no free lunch' rule of equal post cost SR applied, we'd see a positive slope in these charts, whereas if SR were equal pre-cost we'd see a flat line.







So there is something interesting here. For the very fastest trading rules, there is a weak positive relationship (R squared of 0.075) whereby the more expensive an instrument is, the higher the gross SR. That relationship gets monotonically weaker as we slow down, and completely vanishes for the very slowest momentum rule. This does seem to chime with the 'no free lunch' theory: gross profits on very fast momentum are only available for instruments where it would be too expensive to trade.

Perhaps then there is something in the idea that we can use very fast trading rules on expensive instruments, if we can get round the pesky trading costs!


Net SR performance by trading rule versus costs


What happens if we repeat these plots, but with net performance? If the no free lunch (equal post-cost SR) rule is exactly correct, then this should be a horizontal line, with no relationship between net SR and the cost of trading a given instrument. Of course if the other hypothesis (equal pre-cost SR) is true, then we'd see a downward sloping line, and because we have log(costs) on the x-axis it would slope downward exponentially. Let's have a look at the very fastest rule:


I haven't dropped a regression line on here, because there clearly isn't a linear relationship. The red vertical line shows the point at which I'd currently stop trading this rule for a given instrument. Everything to the right of this line is an instrument that is too expensive for this trading rule: an SR cost above 0.0031 units. It's clear that we lose more and more money for more expensive instruments; the small improvement in gross SR we saw before for costlier instruments is completely dominated by much higher costs. 

Technically I should allow for the effect of rolls on turnover, but for simplicity I ignore those when drawing the red line, since roll frequency is different for each instrument. They will only have an effect for very expensive instruments at very slow speeds.

But to the left of it things aren't as obvious:


There aren't that many data points here, but there doesn't seem to be much of an upward or downward slope, which is what we'd expect from the no free lunch theory. To put it another way: if costs aren't too high, then we can treat the post-cost SR as equal, which is a vindication of my forecast weight allocation process. We can confirm that by adding a regression line, fitted only on the points to the left of the line:


There is a very slight downward slope, indicating that this very fast momentum might be a little closer to the 'equal pre-cost' than the 'equal post-cost' hypothesis. But the R squared is very small, and there aren't many data points, reflecting the very small number of instruments that can trade this rule.

Let's continue using this approach for slower rules:





By the time we get to the very slowest trading rule, it looks much more like the assumption of equal post cost SR is true. The R squared is barely in double figures, so this isn't a very clear result, but it does look like you would want to downweight expensive instruments, as well as removing those that exceed the cost threshold and are to the right of the red line. This remains true even if we're only trading the very slowest EWMAC64 speed on those instruments, which we would be. 

Indeed, if we trust the regression line, it looks like the cost ceiling for EWMAC64 (and therefore the global cost ceiling for deciding whether to trade an instrument at all, in the absence of any cheaper trading rules) should be something like a log cost of -5, or 0.007 SR cost units (the point at which the regression line crosses zero expected SR). 

That's actually a little more conservative than my current global maximum for instrument costs, which is 0.01 SR units (discussed here), but on the other hand the use of dynamic cost penalties means I can probably relax a little on this front.


Optimal trading speed

Let's return to the second set of statements we want to test:

  1. We have a prior belief that the expected pre-cost performance across trading rules for a given instrument is constant. 
  2. Expected post-cost performance across trading rules for a given instrument will be identical.

To put it another way, in a pre-cost world the optimal trading speed will be either:

  1. Identical regardless of instrument costs
  2. Faster for more expensive instruments

And in a post-cost world, optimal trading speed will be:

  1. Slower for more expensive instruments
  2. Identical regardless of instrument costs

How do we measure optimal trading speed? This is a bit trickier than just measuring the SR of the rule, since it's effectively the result of a portfolio optimisation. A full blown optimisation would seem a bit much, but just using the EWMAC with the highest SR would be far too noisy. 

I decided to use the following method, which effectively allocates in proportion to SR (where positive):

speed_as_list = np.array([1, 2, 3, 4, 5, 6])

def optimal_trading_rule_for_instrument(instrument_code, curve_type="gross"):
    sr_by_rule = pd.Series([
        sr_for_rule_type_instrument(rule_name, instrument_code, curve_type=curve_type)
        for rule_name in list_of_rules])

    # Rules with negative SR get no allocation
    sr_by_rule[sr_by_rule < 0] = 0
    if sr_by_rule.sum() == 0:
        # Nothing has a positive SR: return the 'don't bother' speed number
        return 7.0

    # Allocate in proportion to SR, then take the weighted average speed
    sr_by_rule_as_weight = sr_by_rule / sr_by_rule.sum()
    weight_by_speed = sr_by_rule_as_weight * speed_as_list
    optimal_speed = weight_by_speed.sum()

    return optimal_speed

This returns a 'speed number'. The optimal speed number will be 1 (if EWMAC 2 is the best), 2 (if it's EWMAC 4, or something like a combination of EWMAC2,4, and 8 which works out as an average of 4), 3 (EWMAC8), 4 (EWMAC16).... 6 (EWMAC64) or 7 if there are no trading rules with a positive Sharpe (which could be due to very high costs; or just bad luck).
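To make the speed number concrete, here's a worked example with made-up Sharpe ratios (these are not results from any real instrument):

```python
import numpy as np
import pandas as pd

speed_as_list = np.array([1, 2, 3, 4, 5, 6])

# Hypothetical SRs for EWMAC2, 4, 8, 16, 32, 64 on some instrument
sr_by_rule = pd.Series([-0.1, 0.2, 0.2, 0.2, 0.0, 0.0])

sr_by_rule[sr_by_rule < 0] = 0           # negative SR rules get nothing
weights = sr_by_rule / sr_by_rule.sum()  # allocate in proportion to SR
speed_number = (weights * speed_as_list).sum()

print(speed_number)  # ~3: the same as putting everything in EWMAC8
```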


Optimal trading speed with gross SR

Let's repeat the exercise of scatter plotting. Log(costs) of instrument is still on the x-axis, but on the y-axis we have the optimal trading speed, as a number between 1 (fast!) and 6 (very slow!), or 7 (don't bother).

Well this looks pretty flat. The optimal trading speed is roughly 3.5 (somewhere between EWMAC8 and EWMAC16) for very cheap instruments, and perhaps 3 for very expensive ones. But it's noisy as anything. It's probably safe to assume that there is no clear relationship between optimal speed and instrument costs, if we only use gross returns.


Optimal trading speed with net SR

Now let's do the same thing, but this time we find the optimal speed using net rather than gross returns.

As before I've added a red vertical line. Instruments to the right of this are too expensive to trade, even with 100% weight on my slowest trading rule, since their costs would exceed my speed limit of 0.13 SR units per year. 

There are a lot more '7's here, as you'd expect, especially for high cost instruments (which all have negative SR after costs for all momentum speeds), but there are also quite a few lower cost instruments with the same problem. This is just luck - we know that SR by trading rule is noisy, so by bad luck we'd have a few instruments which have negative SR for all our trading rules. 

As we did before, let's ignore the instruments above the red line, and run a regression on what's left over:


That's certainly a strong result, and very much in favour of trading more slowly as instrument costs rise.

However it might be unduly influenced by the '7' points, so let's drop those and see what it looks like without them:

There is still something there, but it's a bit weaker. Roughly speaking, for the very cheapest instruments the optimal trading speed is around 3.5 (something like an equal weight of EWMAC8 and EWMAC16), and for the costliest it's around 5 (EWMAC32, or equivalently an equal weight of 16, 32 and 64). 

It's probably worth contrasting this with the weights I currently allocate. Rule turnovers are roughly 42 (EWMAC2), 20, 9, 3.8, 2.3, and 2.1 (EWMAC64). To trade all six rules I would need an instrument SR cost of less than 0.00283 (assuming quarterly rolls), around -5.9 in log space. Such an instrument would have equal weights across all six rules, and therefore a speed number of around 3.5. That is a little faster than the above regression would suggest (-5.9 is closer to an optimal speed number of 4.1), but the regression is very noisy. 

To trade EWMAC64 and nothing else, I'd require an instrument SR cost of less than 0.021 (again assuming quarterly rolls; it would be higher for monthly rolls), or -3.8 in log space. With costs higher than that I couldn't trade anything at all. 
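Those two ceilings drop out of the same calculation (a sketch assuming, as in the text, a speed limit of 0.13 SR units a year, with four rolls a year added to the rule's turnover):

```python
import numpy as np

SPEED_LIMIT = 0.13   # maximum annualised cost in SR units
ROLLS_PER_YEAR = 4   # quarterly rolls add to effective turnover

def max_sr_cost_per_trade(rule_turnover: float) -> float:
    """Highest per-trade cost at which a rule stays inside the speed limit."""
    return SPEED_LIMIT / (rule_turnover + ROLLS_PER_YEAR)

# Trading all six rules means affording the fastest, EWMAC2, turnover ~42:
print(max_sr_cost_per_trade(42), np.log(max_sr_cost_per_trade(42)))
# ~0.00283, around -5.9 in log space

# Just EWMAC64, turnover ~2.1:
print(max_sr_cost_per_trade(2.1), np.log(max_sr_cost_per_trade(2.1)))
# ~0.021, around -3.8 in log space
```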

Note that this is to the left of the red line, since the red line ignores the effect of rolling on turnover. 

Just EWMAC64 is a speed number of 6, and the regression suggests a speed number of 5.9 with -3.8 log costs. That is a pretty good match.


Summary and implications


Let's deal with the issue of optimal speed first, since the implications here are more straightforward. Broadly speaking it is correct that pre-cost optimal speed is flat, and therefore we should slow down as instruments get more expensive to trade. Although the results are noisy, they suggest that with my current simplistic method for allocating forecast weights I'm spot on with the most expensive instruments, but I might be trading the very cheapest instruments a tiny bit too quickly. However the difference isn't enough to worry about.

Turning to the issue of performance of momentum according to instrument cost, I draw two main conclusions. 

Firstly, the post-cost results suggest that trading rule performance gets worse for more expensive instruments, even if we're trading them slowly. Hence, if I was trading a static system without dynamic optimisation, then I would give consideration to penalising the instrument weight of instruments with high costs (but which were still cheap enough to trade). It's important to note that this is at odds with the approach I've taken before, and discussed in my first book, where I generally set instrument weights without considering costs (assuming post cost SR is equal). Also, my 'cheap enough to trade' bar of 0.01 SR units may be set a little aggressively; 0.007 could be closer to the mark.

However, as I'm currently using dynamic optimisation with a cost penalty, I'm less worried about these issues. This will naturally allocate less to more expensive instruments, and trade them less.

Secondly, the pre-cost performance of momentum versus instrument costs suggests that there is some truth in the idea that more expensive instruments can be traded with fast momentum if you don't have to worry about costs. This is quite a weak result, but I will bear it in mind when I think about 'smuggling in' faster trading rules.

In conclusion, I'm happy that my current pragmatic and simplistic approach to fitting is good enough, but it's been a useful exercise to properly interrogate my assumptions on trading costs and find some surprising results. 

Monday, 14 November 2022

If you're so smart, how come you're not SBF? The Kelly criterion and choice of expectations and utility function when bet sizing



There has been a very interesting discussion on twitter, relating to some stuff said by Sam Bankman-Fried (SBF), who at the time of writing has just completely vaporized billions of dollars in record time via the medium of his crypto exchange FTX, and provided a useful example to future school children of the meaning of the phrase nominative determinism*.

* Sam, Bank Man: Fried. Geddit? 

Read the whole thread from the top:

https://twitter.com/breakingthemark/status/1591114381508558849

TL;DR: the views of SBF can be summarised as follows:

  • Kelly criterion maximises log utility
  • I don't have a log utility function. It's probably closer to linear.
  • Therefore I should bet at higher than Kelly. Up to 5x would be just fine.

I, and many others, have pointed out that SBF is an idiot. Of course it's easier to do this when he's just proven his business incompetence on a grand scale, but to be fair I was barely aware of the guy until a week ago. Specifically, he's wrong about the chain of reasoning above*. 

* It's unclear whether this is specifically what brought SBF down. At the time of writing he appears to have taken money from his exchange to prop up his hedge fund, so maybe the hedge fund was using >>> Kelly leverage, and this really is the case. 

In this post I will explain why he was wrong, with pictures. To be more precise, I'll discuss how the choice of expectation and utility function affects optimal bet sizing. 

I've discussed parts of this subject briefly before, but you don't need to read the previous post.


Scope and assumptions


To keep it tight, and relevant to finance, this post will ignore arguments seen on twitter relating to one-off bets, and whether you should bet differently if you are considering your contribution to society as a whole. These are mostly philosophical discussions which are hard to solve with pictures. So the set up we have is:

  • There is an arbitrary investment strategy, which I assume consists of a data generating process (DGP) producing Gaussian returns with a known mean and standard deviation (this ignores parameter uncertainty, which I've banged on about often enough, but effectively would result in even lower bet sizing).
  • We make a decision as to how much of our capital we allocate to this strategy for an investment horizon of some arbitrary number of years, let's say ten.
  • We're optimising L, the leverage factor, where L = 1 would be full investment, 2 would be 100% leverage, 0.5 would be 50% in cash and 50% in the strategy, and so on.
  • We're interested in maximising the expectation of f(terminal wealth) after ten years, where f is our utility function.
  • Because we're measuring expectations, we generate a series of possible future outcomes based on the DGP and take the expectation over those.

Note that I'm using the continuous version of the Kelly criterion here, but the results would be equally valid for the sort of discrete bets that appear in the original discussion.


Specific parameters

Let's take a specific example. Set mean = 10% and standard deviation = 20%, which is a Sharpe ratio of 0.5; therefore the Kelly maximum is at 50% risk, equating to L = 50/20 = 2.5. SBF optimal leverage would be around 5 times that, L = 12.5. We start with wealth of 1 unit, and compound it over 10 years.
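Spelling out that arithmetic (using the standard result that the Kelly-optimal risk target, as an annualised standard deviation, equals the Sharpe ratio):

```python
ann_return = 0.1
ann_std_dev = 0.2

sharpe_ratio = ann_return / ann_std_dev            # 0.5
kelly_risk_target = sharpe_ratio                   # optimal annualised risk: 50%
kelly_leverage = kelly_risk_target / ann_std_dev   # 0.5 / 0.2 = 2.5
sbf_leverage = 5 * kelly_leverage                  # 12.5

print(kelly_leverage, sbf_leverage)  # 2.5 12.5
```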

I don't normally paste huge chunks of code in these blog posts, but this is a fairly short chunk:

import pandas as pd
import numpy as np
from math import log

ann_return = 0.1
ann_std_dev = 0.2

BUSINESS_DAYS_IN_YEAR = 256
daily_return = ann_return / BUSINESS_DAYS_IN_YEAR
daily_std_dev = ann_std_dev / (BUSINESS_DAYS_IN_YEAR**.5)

years = 10
number_days = years * BUSINESS_DAYS_IN_YEAR


def get_series_of_final_account_values(monte_return_streams,
                                       leverage_factor=1):
    account_values = [account_value_from_returns(returns,
                                                 leverage_factor=leverage_factor)
                      for returns in monte_return_streams]

    return account_values


def get_monte_return_streams():
    monte_return_streams = [get_return_stream() for __ in range(10000)]

    return monte_return_streams


def get_return_stream():
    return np.random.normal(daily_return,
                            daily_std_dev,
                            number_days)


def account_value_from_returns(returns, leverage_factor: float = 1.0):
    one_plus_return = np.array(
        [1 + (return_item * leverage_factor)
         for return_item in returns])
    cum_return = one_plus_return.cumprod()

    return cum_return[-1]


monte_return_streams = get_monte_return_streams()

Utility function: Expected log(wealth) [Kelly]

Kelly first. We want to maximise the expected log final wealth:

def expected_log_value(monte_return_streams,
                       leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)
    log_values_over_account_values = [log(account_value)
                                      for account_value in series_of_account_values]

    return np.mean(log_values_over_account_values)

And let's plot the results:

def plot_over_leverage(monte_return_streams, value_function):
    leverage_ratios = np.arange(1.5, 5.1, 0.1)
    values = []
    for leverage in leverage_ratios:
        print(leverage)
        values.append(
            value_function(monte_return_streams, leverage_factor=leverage)
        )

    leverage_to_plot = pd.Series(
        values, index=leverage_ratios
    )

    return leverage_to_plot


leverage_to_plot = plot_over_leverage(monte_return_streams,
                                      expected_log_value)
leverage_to_plot.plot()

In this plot, and nearly all of those to come, the x-axis shows the leverage L and the y-axis shows the value of the expected utility. To find the optimal L we look to see where the highest point of the utility curve is.

As we'd expect:

  • Max expected log(wealth) is at L=2.5. This is the optimal Kelly leverage factor.
  • At twice optimal we expect to have log wealth of zero, equivalent to making no money at all (since starting wealth is 1).
  • Not plotted here, but at SBF leverage (12.5) we'd have expected log(wealth) of <undefined> and have lost pretty much all of our money.
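We can check those bullet points against the closed-form expected log growth for the continuous Kelly case, g(L) = L * mu - (L**2) * (sigma**2) / 2 (a standard result; small discretisation effects in the daily simulation aside):

```python
ann_return = 0.1
ann_std_dev = 0.2

def expected_log_growth(leverage: float) -> float:
    """Annualised expected log growth under continuous compounding."""
    return leverage * ann_return - (leverage ** 2) * (ann_std_dev ** 2) / 2

print(expected_log_growth(2.5))   # ~0.125 a year: the Kelly maximum
print(expected_log_growth(5.0))   # ~0: twice Kelly, no expected growth at all
print(expected_log_growth(12.5))  # ~-1.875 a year: SBF leverage, near-certain ruin
```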


Utility function: Expected (wealth) [SBF?]

Now let's look at a linear utility function, since SBF noted that his utility was 'roughly close to linear'. Here our utility is just equal to our terminal wealth, so it's purely linear.

def expected_value(monte_return_streams,
                   leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)

    return np.mean(series_of_account_values)


leverage_to_plot = plot_over_leverage(monte_return_streams,
                                      expected_value)

You can see where SBF was coming from, right? Utility gets exponentially higher as we add more leverage. Five times leverage is a lot better than 2.5 times, the Kelly criterion. Five times Kelly, or 2.5 * 5 = 12.5, would be even better.


Utility function: Median(wealth) 

However there is an important assumption above, which is the use of the mean for the expectation operator. This is dumb. It would mean (pun, sorry), for example, that of the following:

  1. An investment that lost $1,000 99 times out of 100; and paid out $1,000,000 1% of the time
  2. An investment that is guaranteed to gain $9,000 

... we would theoretically prefer option 1 since it has an expected value of $9,010, higher than the trivial expected value of $9,000 for option 2. There might be some degenerate gamblers who prefer 1 to 2, but not many.

(Your wealth would also affect which of these you would prefer. If $1,000 is a relatively trivial amount to you, you might prefer 1. If this is the case consider if you'd still prefer 1 to 2 if the figures were 1000 times larger, or a million times larger). 

I've discussed this before, but I think the median is the more appropriate choice. What the median implies in this context is something like this: 

Considering all possible future outcomes, how can I maximise the utility I receive in the outcome that will occur half the time?

I note that the median of option 1 above is zero, whilst the median of option 2 is $9,000. Option 2 is now far more attractive.


def median_value(monte_return_streams,
                 leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)

    return np.median(series_of_account_values)


The spooky result here is that the optimal leverage is now 2.5, the same as the Kelly criterion.

Even with linear utility, if we use the median expectation, Kelly is the optimal strategy.

The reason why people prefer to use mean(log(wealth)) rather than median(wealth), even though they are equivalent, is that the former is more computationally attractive.
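Why are they equivalent? If final wealth is (roughly) lognormal, then the median of wealth is the exponential of the mean of log wealth; and since exp is monotonic, whatever leverage maximises one maximises the other. A quick numerical sketch, with arbitrary made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assume log final wealth is Gaussian, i.e. wealth is lognormal
log_wealth = rng.normal(loc=0.125, scale=0.3, size=1_000_000)
wealth = np.exp(log_wealth)

# median(wealth) and exp(mean(log(wealth))) agree closely,
# both near exp(0.125)
print(np.median(wealth))
print(np.exp(np.mean(log_wealth)))
```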

Note also the well known fact that Kelly also maximises the geometric return.

With Kelly we aren't really making any assumptions about the utility function: our assumption is effectively that the median is the correct expectations operator.

The entire discussion about utility is really a red herring. It's very hard to measure utility functions, and everyone probably has a different one; I think it's much better to focus on expectations.


Utility function: Nth percentile(wealth) 

Well, you might be thinking that SBF seems like a particularly optimistic kind of guy. He isn't interested in the median outcome (which is the 50th percentile). Surely there must be some percentile at which it makes sense to bet 5 times Kelly? Maybe he is interested in the 75th percentile outcome?

QUANTILE = .75

def value_quantile(monte_return_streams,
                   leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)

    return np.quantile(series_of_account_values, QUANTILE)

Now the optimal is around L=3.5. This is considerably higher than the Kelly max of L=2.5, but it is still nowhere near the SBF optimal L of 12.5.

Let's plot the utility curves for a bunch of different quantile points:

list_over_quantiles = []
quantile_ranges = np.arange(.4, 0.91, .1)
for QUANTILE in quantile_ranges:
    leverage_to_plot = plot_over_leverage(monte_return_streams,
                                          value_quantile)
    list_over_quantiles.append(leverage_to_plot)

pd_list = pd.DataFrame(list_over_quantiles)
pd_list.index = quantile_ranges
pd_list.transpose().plot()

It's hard to see what's going on here, legend floating point representation notwithstanding, but you can hopefully see that the maximum L (hump of each curve) gets higher as we go up the quantile scale, as the curves themselves get higher (as you would expect).

But none of these quantiles gets us anywhere near an optimal L of 12.5. Even at the 90% quantile - evaluating something that only happens one in ten times - we have a maximum L of under 4.5.

Now there will be some quantile point at which L=12.5 is indeed optimal. Returning to my simple example:

  1. An investment that lost $1,000 99 times out of 100; and paid out $1,000,000 1% of the time
  2. An investment that is guaranteed to gain $9,000 

... if we focus on outcomes that will happen less than one in a million times (the 99.9999% quantile and above) then yes sure, we'd prefer option 1.

So at what quantile point does a leverage factor of 12.5 become optimal? I couldn't find out exactly, since to look at extremely rare quantile points requires very large numbers of outcomes*. I actually broke my laptop before I could work out what the quantile point was. 

* for example, if you want ten observations to accurately measure the quantile point, then for the 99.99% quantile you would need 10 * (1/(1-0.9999)) = 100,000 outcomes.
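To make the footnote concrete, here is a short sketch of the required sample sizes (the function name is my own, not from any library):

```python
def outcomes_needed(quantile, obs_in_tail=10):
    """Outcomes required for roughly `obs_in_tail` of them to land
    beyond the given quantile point."""
    return round(obs_in_tail / (1 - quantile))

print(outcomes_needed(0.9999))    # 100000, as in the footnote
print(outcomes_needed(0.999999))  # 10000000 for a one-in-a-million outcome
```

You can see why looking at very extreme quantiles quickly becomes laptop-breaking.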

But even for a quantile of 99.99% (!), we still aren't at an optimal leverage of 12.5! 


You can see that the optimal leverage is 8 (around 3.2 x Kelly), still way short of 12.5.


Summary

Rather than utility functions, I think it's easier to ask people what likelihood of outcome they are concerned about. I'd argue that sensible people would think about the median outcome, which is what you expect to happen 50% of the time. And if you are a bit risk averse, you should probably consider an even lower quantile.

In contrast SBF went for bet sizing that would only make sense in the set of outcomes that happens significantly less than 0.01% of the time. That is insanely optimistic; and given he was dealing with billions of dollars of other people's money it was also insanely irresponsible.

Was SBF really that recklessly optimistic, or dumb? In this particular case I'd argue the latter. He had a very superficial understanding of Kelly bet sizing, and because of that he thought he could ignore it. 
This is a classic example of 'a little knowledge is a dangerous thing'. A dumb person doesn't understand anything, but reads on the internet somewhere that half Kelly is the correct bet sizing. So they use it. A "smart" person like SBF glances at the Kelly formula, thinks 'oh but I don't have log utility', leverages up to five times Kelly, and thinks 'Wow I am so smart look at all my money'. And that ended well...

A truly enlightened person understands that it isn't about the utility function, but about the expectation operator. They also understand about uncertainty, optimistic backtesting bias, and a whole bunch of other factors that imply even 0.5 x Kelly is a little reckless. I, for example, use something south of a quarter Kelly. 

Which brings us back to the meme at the start of the post:



Note I am not saying I am smarter than SBF. On pure IQ, I am almost certainly much, much dumber. In fact, it's because I know I am not a genius that I'm not arrogant enough to completely follow or ignore the Kelly criteria without first truly understanding it.

Whilst this particular misunderstanding might not have brought down SBF's empire, it shows that really really smart people can be really dumb - particularly when they think that they are so smart they don't need to properly understand something before ignoring it*.

* Here is another example of him getting something completely wrong


Postscript (16th November 2022)

I had some interesting feedback from Edwin Teejay on twitter, which is worth addressing here as well. Some of the feedback I've incorporated into the post already.

(Incidentally, Edwin is a disciple of Ergodic Economics, which has a lot of very interesting stuff to say about the entire problem of utility maximisation)

First he commented that the max(median) = max(log) relationship is only true for a long sequence of bets, i.e. asymptotically. We effectively have 5000 bets in our ten year return sequence. As I said originally, I framed this as a typical asset optimisation problem rather than a one off bet (or small number of one off bets).

He then gives an example of a one off bet decision where the median would be inappropriate:
  1. 100% win $1
  2. 51% win $0 / 49% win $1,000,000
The expected values (mean expectation) are $1 and $490,000 respectively, but the medians are $1 and $0. But any sane person would pick the second option.

My retort to this is essentially the same as before - this isn't something that could realistically happen in a long sequence of bets. Suppose we are presented with making the bet above every single week for 5 weeks. The distribution of wealth outcomes for option 1 is single peaked - we earn $5. The distribution of wealth outcomes for option 2 will vary from $0 (with probability 3.4%) to $5,000,000 (with a slightly lower probability of 2.8% - I am ignoring 'compounding', eg the possibility to buy more bets with money we've already won), with a mean of $2.45 million. 

But the median is pretty good: $2 million (the median number of wins from a Binomial(5, 0.49) is two). So we'd definitely pick option 2. And that is with just 5 bets in the sequence. So the moment we are looking at any kind of repeating bet, the law of large numbers gets us closer and closer to the median being the optimal decision. We are just extremely unlikely to see the sort of payoff structure in the bet shown in a series of repeated bets.
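These five-bet figures can be checked directly, since with no compounding the number of winning bets is just Binomial(5, 0.49); this is my own sketch, not code from the post:

```python
from math import comb

# Five repeats of the 49%-win bet, no compounding: the number of wins
# is Binomial(5, 0.49) and each win is worth $1,000,000
n_bets, p_win, payout = 5, 0.49, 1_000_000
probs = [comb(n_bets, k) * p_win**k * (1 - p_win)**(n_bets - k)
         for k in range(n_bets + 1)]

p_nothing = probs[0]                   # 0.51**5, roughly 3.4%
p_jackpot = probs[-1]                  # 0.49**5, roughly 2.8%
mean_wealth = n_bets * p_win * payout  # $2.45 million

# median: smallest wealth with cumulative probability of at least 50%
cumulative = 0.0
for k, p in enumerate(probs):
    cumulative += p
    if cumulative >= 0.5:
        median_wealth = k * payout     # two wins: $2 million
        break
```

The median lands at two winning bets, comfortably enough to prefer option 2.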

Now what about the example I posted:
  1. An investment that lost $1,000 99 times out of 100; and paid out $1,000,000 1% of the time
  2. An investment that is guaranteed to gain $9,000 
Is it realistic to expect this kind of payoff structure in a series of repeated bets? Well consider instead the following:
  1. An investment that lost $1 most of the time; and paid out $1,000,000 0.001% of the time
  2. An investment that is guaranteed to gain $5

The mean of these bets is ~$9 and $5, and the medians are $-1 and $5.

Is this unrealistic? Well, these sorts of payoffs do exist in the world - they are called lottery tickets (albeit it is rare to get a lottery ticket with a $9 positive mean!). And this is something closer to the SBF example, since I noted that he would have to be looking at somewhere north of the 0.01% quantile to choose 5x Kelly leverage.

Now what happens if we run the above as a series of 5000 repeated bets (again with no compounding for simplicity).  We end up with the following distributions:
  1. An investment that lost $5000 95.1% of the time, and makes $1 million or more 5% of the time.
  2. An investment that is guaranteed to gain $25,000

Since there is no compounding we can just multiply up the individual numbers to get the means ($45,000 and $25,000 respectively). The medians are -$5,000 and $25,000. Personally, I still prefer option 2! You might prefer option 1 if spending $5,000 on lottery tickets over 10 years reflects a small proportion of your wealth, but I refer you to the previous discussion on this topic.
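The 5,000-bet version is equally easy to verify (again, my own sketch):

```python
# 5000 repeated lottery-style bets, no compounding: each bet loses $1,
# except 0.001% of the time when it pays out $1,000,000
n_bets, p_win = 5000, 0.00001

p_lose_everything = (1 - p_win) ** n_bets   # no wins at all: lose $5,000
p_at_least_one_win = 1 - p_lose_everything  # make $1 million or more
mean_outcome = n_bets * (p_win * 1_000_000 - (1 - p_win) * 1)

print(f"{p_lose_everything:.1%}")   # 95.1%
print(f"{p_at_least_one_win:.1%}")  # 4.9%
print(f"${mean_outcome:,.0f}")      # $45,000
```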

So I would argue that in a long run of bets we are more likely in real life to get payoff structures of the kind I posited, than the closer to 50:50 bet suggested by Edwin. Ultimately, I think we agree that for long sequences of bets the median makes more sense (with a caveat). I personally think long run decision making is more relevant to most people than one off bets. 

What is the caveat? Edwin also said that the choice of the median is 'arbitrary'. I disagree here. The median is 'what happens half the time'. I still think for most people that is a logical reference point for 'what I expect to happen', as well as in terms of the maths: both median and mean are averages after all. I personally think it's fine to be more conservative than this if you are risk averse, but not to be more aggressive - bear in mind that will mean you are betting at more than Kelly.

But anyway, as Matt Hollerbach, whose original series of tweets inspired this post, said:

"The best part of Robs framework is you don't have to use the median,50%. You could use 60%  or 70%  or 40% if your more conservative.  And it intuitively tells you what the chance of reaching your goal is. You don't get duped into a crazy long shot that the mean might be hiding in."  (typos corrected from original tweet)

This fits well into my general framework for thinking about uncertainty. Quantify it, and be aware of it. Then if you still do something crazy/stupid, well at least you know you're being an idiot...


Wednesday, 2 November 2022

Optimal trend following allocation under conditions of uncertainty and without secular trends

Few people are brave enough to put their entire net worth into a CTA fund or home grown trend following strategy (my fellow co-host on the TTU podcast, Jerry Parker, being an honorable exception with his 'Trend following plus nothing' portfolio allocation strategy). Most people have considerably less than 100% - and I include myself firmly in that category. And it's probably true that most people have less than the sort of optimal allocation that is recommended by portfolio optimisation engines.

Still it is a useful exercise to think about just how much we should allocate to trend following, at least in theory. The figure that comes out of such an exercise will serve as both a ceiling (you probably don't want any more than this), and a target (you should be aiming for this). 

However any sort of portfolio optimisation based on historical returns is likely to be deeply flawed. I've covered the problems involved at length before, in particular in my second book and in this blogpost, but here's a quick recap:

  1. Standard portfolio optimisation techniques are not very robust
  2. We often assume normal distributions, but financial returns are famously abnormal
  3. There is uncertainty in the parameter estimates we make from the data
  4. Past returns distributions may be biased and unlikely to repeat in the future

As an example of the final effect, consider the historically strong performance of equities and bonds in a 60:40 style portfolio during my own lifetime, at least until 2022. Do we expect such a performance to be repeated? Given it was driven by a secular fall in inflation from high double digits, and a resulting fall in interest rates and equity discount rates, probably not. 

Importantly, a regime change to lower bond and equity returns will have varying impact on a 60:40 long only portfolio (which will get hammered), a slow trend following strategy (which will suffer a little), and a fast trend following strategy (which will hardly be affected). 

Consider also the second issue: non Gaussian return distributions. In particular equities have famously negative skew, whilst trend following - especially the speedier variation - is somewhat positive in this respect. Since skew affects optimal leverage, we can potentially 'eat' extra skew in the form of higher leverage and returns. 

In conclusion then, some of the problems of portfolio optimisation are likely to be especially toxic when we're looking at blends of standard long only assets combined with trend following. In this post I'll consider some methods we can use to alleviate these problems, and thus come up with a sensible suggestion for allocating to trend following. 

If nothing else, this is a nice toy model for considering the issues we have when optimising, something I've written about at length eg here. So even if you don't care about this problem, you'll find some interesting ways to think about robust portfolio optimisation within.

Credit: This post was inspired by this tweet.

Some very messy code with hardcoding galore, is here.


The assets

Let's first consider the assets we have at our disposal. I'm going to make this a very simple setup so we can focus on what is important whilst still learning some interesting lessons. For reasons that will become apparent later, I'm limiting myself to 3 assets. We have to decide how much to allocate to each of the following three assets:

  • A 60:40 long only portfolio of bonds and equities, represented by the US 10 year and S&P 500
  • A slow/medium speed trend following strategy, trading the US 10 year and S&P 500 future with equal risk allocation, with a 12% equity-like annualised risk target. This is a combination of EWMAC crossovers: 32,128 and 64,256
  • A relatively fast trend following strategy, trading the US 10 year and S&P 500 future with equal risk allocation, with a 12% annualised risk target. Again this is a combination of EWMAC crossovers: 8, 32 and 16,64

Now there is a lot to argue with here. I've already explained why I want to allocate separately to fast and slow trend following; it will highlight the effect of secular trends.

The reason for the relatively low standard deviation target is that I'm going to use a non risk adjusted measure of returns, and if I used a more typical CTA style risk (25%) it would produce results that are harder to interpret.

You may also ask why I don't have any commodities in my trend following fund. But what I find especially interesting here is the effect on correlations between these kinds of strategies when we adjust for long term secular trends. These correlations will be dampened if there are other instruments in the pot. The implication of this is that the allocation to a properly diversified trend following fund running futures across multiple asset classes will likely be higher than what is shown here.

Why 60:40? Rather than 60:40, I could directly try and work out the optimal allocation to a universe of bonds and equities separately. But I'm taking this as exogenous, just to simplify things. Since I'm going to demean equity and bond returns in a similar way, this shouldn't affect their relative weightings.

50:50 risk weights on the mini trend following strategies is more defensible; again I'm using fixed weights here to make things easier and more interpretable. For what it's worth the allocation within trend following for an in sample backtest would be higher for bonds than for equities, and this is especially true for the faster trading strategy.

Ultimately three assets makes the problem both tractable and intuitive to solve, whilst giving us plenty of insight.


Characteristics of the underlying data

Note I am going to use futures data even for my 60:40, which means all the returns I'm using are excess returns.

Let's start with a nice picture:


So the first thing to note is that the vol of the 60:40 is fairly low at around 12%; as you'd expect given it has a chunky allocation to bonds (vol ~6.4%). In particular, check out the beautifully smooth run from 2009 to 2022. The two trading strategies also come in around the 12% annualised vol mark, by design. In terms of Sharpe Ratio, the relative figures are 0.31 (fast trading strategy), 0.38 (long only) and 0.49 (slow trading strategy). However as I've already noted, the performance of the long only and slow strategies is likely to be flattered by the secular trends in equities and bonds seen since 1982 (when the backtest starts).

Correlations matter, so here they are:

           60:40   Fast TF   Slow TF
60:40       1.00     -0.02      0.25
Fast TF    -0.02      1.00      0.68
Slow TF     0.25      0.68      1.00

What about higher moments? The monthly skews are -1.44 (long only), 0.08 (slow) and 0.80 (fast). Finally what about the tails? I have a novel method for measuring these which I discuss in my new book, but all you need to know is that a figure greater than one indicates a non-normal distribution. The lower tail ratios are 1.26 (fast), 1.35 (slow) and 2.04 (long only); whilst the uppers are 1.91 (fast), 1.74 (slow) and 1.53 (long only). In other words, the long only strategy has nastier skew and worse tails than the fast trading strategy, whilst the slow strategy comes somewhere in between.


Demeaning

To reiterate, again, the performance of the long only and slow strategies is likely to be flattered by the secular trends in equities and bonds, caused by valuation rerating in equities and falling interest rates in bonds. 

Let's take equities. The P/E ratio in September 1982 was around 9.0, versus 20.1 now. This equates to 2.0% a year in returns coming from the rerating of equities. Over the same period US 10 year bond yields have fallen from around 10.2% to 4.0% now, equating to around 1.2% a year in returns. I can do a simple demeaning to reduce the returns achieved by the appropriate amounts.
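For what it's worth, the demeaning arithmetic can be reproduced in a few lines; the bond duration of 7.5 is my own assumption, used to convert the yield fall into an annual price return:

```python
years = 40  # September 1982 to roughly the time of writing

# Equities: annualised return from the P/E rerating alone
pe_start, pe_end = 9.0, 20.1
equity_rerating = (pe_end / pe_start) ** (1 / years) - 1
print(f"{equity_rerating:.1%}")  # ~2.0% a year

# Bonds: yield fall converted to price return via an assumed duration
yield_fall = 0.102 - 0.040
assumed_duration = 7.5
bond_rerating = yield_fall * assumed_duration / years
print(f"{bond_rerating:.1%}")  # ~1.2% a year
```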

Here are the demeaned series with the original backadjusted prices. First S&P:

And for US10:


What effect does the demeaning have? It doesn't significantly affect standard deviations, skew, or tail ratios. But it does affect the Sharpe Ratio:


             Original    Demeaned   Difference
Long only      0.38        0.24       -0.14
Slow TF        0.49        0.41       -0.08
Fast TF        0.31        0.25       -0.06

This is exactly what we would expect. The demeaning has a larger effect on the long only 60:40, and to a lesser extent the slower trend following. 

And the correlation is also a little different:

           60:40   Fast TF   Slow TF
60:40       1.00     -0.06      0.18
Fast TF    -0.06      1.00      0.66
Slow TF     0.18      0.66      1.00

Both types of trend have become slightly less correlated with 60:40, which makes sense.


The optimisation

Any optimisation requires (a) a utility or fitness function that we are maximising, and (b) a method for finding the highest value of that function. In terms of (b) we should bear in mind the comments I made earlier about robustness, but let's first think about (a).

An important question here is whether we should be targeting a risk adjusted measure like Sharpe Ratio, and hence assuming leverage is freely available, which is what I normally do. But for an exercise like this a more appropriate utility function will target outright return and assume we can't access leverage. Hence our portfolio weights will need to sum to exactly 100% (we could allow them to sum to less, leaving some money in cash, but that is unlikely to be optimal here). 

It's more correct to use geometric return, also known as CAGR, rather than arithmetic mean since that is effectively the same as maximising the (log) final value of your portfolio (Kelly criteria). Using geometric mean also means that negative skew and high kurtosis strategies will be punished, as will excessive standard deviation. By assuming a CAGR maximiser, I don't need to worry about the efficient frontier, I can maximise for a single point. It's for this reason that I've created TF strategies with similar vol to 60:40.
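As a sketch, the utility function is just the annualised geometric return of the daily portfolio returns; maximising it is equivalent to maximising log final wealth. Assuming 256 trading days a year:

```python
import numpy as np

def cagr(daily_returns, days_per_year=256):
    """Annualised geometric return of a daily return series.
    Equivalent to maximising the log of final portfolio value."""
    log_final_wealth = np.sum(np.log1p(daily_returns))
    years = len(daily_returns) / days_per_year
    return np.exp(log_final_wealth / years) - 1
```

Because log is monotonic, ranking portfolios by this CAGR or by log final wealth picks out the same optimum, and volatile or negatively skewed return streams are automatically penalised.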

I'll deal with uncertainty by using a resampling technique. Basically, I randomly sample with replacement from the joint distribution of daily returns for the three assets I'm optimising for, to create a new set of account curves (this will preserve correlations, but not autocorrelations. This would be problematic if I was using drawdown statistics, but I'm not). For a given set of instrument weights, I then measure the utility statistic (CAGR) for the resampled returns. I repeat this exercise a few times, and then I end up with a distribution of CAGR for a given set of weights. This allows us to take into account the effect of uncertainty. 
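A minimal sketch of the resampling step, assuming `returns` is a (days x assets) array of daily returns; sampling whole days with replacement preserves the cross-asset correlations, but not the autocorrelations:

```python
import numpy as np

def resampled_cagrs(returns, weights, n_resamples=100,
                    days_per_year=256, seed=0):
    """Bootstrap the distribution of CAGR for fixed portfolio weights.

    Whole days (rows) are sampled jointly with replacement, so
    contemporaneous correlations between the assets are preserved.
    """
    rng = np.random.default_rng(seed)
    n_days = returns.shape[0]
    cagrs = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n_days, size=n_days)
        portfolio_returns = returns[idx] @ np.asarray(weights)
        growth = np.prod(1 + portfolio_returns)
        cagrs.append(growth ** (days_per_year / n_days) - 1)
    return np.array(cagrs)
```

Taking quantiles of the returned array then gives the median, 30% and 70% points used below.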

Finally we have the choice of optimisation technique. Given we have just three weights to play with, and only two degrees of freedom, it doesn't seem too heroic to use a simple grid search. So let's do that.
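And the grid search itself, in sketch form; `utility` is a placeholder for something like the median of the resampled CAGRs for a given set of weights:

```python
import numpy as np

def grid_search_weights(utility, step=0.05):
    """Exhaustive search over the two free weights; the third weight
    (fast TF) is whatever is left over, mirroring the heatmap axes."""
    best_value, best_weights = -np.inf, None
    grid = np.arange(0, 1 + step / 2, step)
    for w_long_only in grid:
        for w_slow_tf in grid:
            w_fast_tf = 1 - w_long_only - w_slow_tf
            if w_fast_tf < -1e-9:
                continue  # the unpopulated top diagonal: weights sum > 1
            weights = (w_long_only, w_slow_tf, max(w_fast_tf, 0.0))
            value = utility(weights)
            if value > best_value:
                best_value, best_weights = value, weights
    return best_weights, best_value
```

With a 5% step and two degrees of freedom this is only a couple of hundred utility evaluations, which is why a grid search isn't heroic here.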


Some pretty pictures

Because we only have two degrees of freedom, we can plot the results on a 2-d heatmap. Here's the results for the median CAGR, with the original set of returns before demeaning:
Sorry for the illegible labels - you might have to click on the plots to see them. The colour shown reflects the CAGR. The x-axis is the weight for the long only 60:40 portfolio, and the y-axis for slow trend following. The weight to fast trend following will be whatever is left over. The top diagonal isn't populated since that would require weights greater than 1; the diagonal line from top left to bottom right is where there is zero weight to fast trend following; top left is 100% slow TF and bottom right is 100% long only.

Ignoring uncertainty then, the optimal weight (brightest yellow) is 94% in slow TF and 6% in long only. More than most people have! However note that there is a fairly large range of yellow CAGR that are quite similar. 

The 30% quantile estimate for the optimal weights is a CAGR of 4.36, and for the 70% quantile it's 6.61. Let's say we'd be indifferent between any weights whose median CAGR falls in that range (in practice then, anything whose median CAGR is greater than 4.36). If I replace everything that is statistically indistinguishable from the maximum with white space, and redo the heatmap I get this:

This means that, for example, a weight of 30% in long only, 34% in slow trend following, and 36% in fast trend following; is just inside the whitespace and thus is statistically indistinguishable from the optimal set of weights. Perhaps of more interest, the maximum weight we can have to long only and still remain within this region (at the bottom left, just before the diagonal line reappears) is about 80%.

Implication: We should have at least 20% in trend following.

If I had to choose an optimal weight, I'd go for the centroid of the convex hull of the whitespace. I can't be bothered to code that up, but by eye it's at roughly 40% 60/40, 50% slow TF, 10% fast TF.

Now let's repeat this exercise with the secular trends removed from the data.

The plot is similar, but notice that the top left has got much better than the bottom right; we should have a lower weight to 60:40 than in the past. In fact the optimal is 100% in slow trend following; zilch, nil, zero, nada in both fast TF and 60:40.

But let's repeat the whitespace exercise to see how robust this result is:

The whitespace region is much smaller than before, and is heavily biased towards the top left. Valid portfolio weights that are indistinguishable from the maximum include 45% in 60:40 and 55% in slow TF (and 45% is the most you should have in 60:40 whilst remaining in this region). We've seen a shift away from long only (which we'd expect), but interestingly no shift towards fast TF, which we might have expected as it is less affected by demeaning.

The optimal (centroid, convex hull, yada yada...) is somewhere around 20% 60:40, 75% slow TF and 5% in fast TF.


Summary: practical implications

This has been a highly stylised exercise, deliberately designed to shine a light on some interesting facts and show you some interesting ways to visualise the uncertainty in portfolio optimisation. You've hopefully seen how we need to consider uncertainty in optimisation, and I've shown you a nice intuitive way to produce robust weights.

The bottom line then is that a robust set of allocations would be something like 40% 60/40, 50% slow TF, 10% fast TF; but with a maximum allocation to 60/40 of about 80%. If we use data that has had past secular trends removed, we're looking at an even higher allocation to TF, with the maximum 60/40 allocation reducing considerably, to around 45%.

Importantly, this has of course been an entirely in sample exercise. Although we've made an effort to make things more realistic by demeaning, much of the results depend on the finding that slow TF has a higher SR than 60:40, an advantage that is increased by demeaning. Correcting for this would result in a higher weight to 60:40, but also to fast TF.

Of course if we make this exercise more realistic, it will change these results:
  • Improving 60:40 equities- Introducing non US assets, and allocating to individual equities
  • Improving 60:40 bonds -  including more of the term structure, inflation and corporate bonds, 
  • Improving 60:40 by including other non TF alternatives
  • Improving the CTA offering - introducing a wider set of instruments across asset classes (there would also be a modest benefit from widening beyond a single type of trading rule)
  • Adding fees to the CTA offering 
I'd expect the net effect of these changes to result in a higher weight to TF, as the diversification benefit of going from two instruments to say 100 is considerable; and far outweighs the effect of fees and improved diversification in the long only space.