Monday, 14 November 2022

If you're so smart, how come you're not SBF? The Kelly criterion and choice of expectations and utility function when bet sizing



There has been a very interesting discussion on twitter, relating to some stuff said by Sam Bankman-Fried (SBF), who at the time of writing has just completely vaporized billions of dollars in record time via the medium of his crypto exchange FTX, and provided a useful example to future school children of the meaning of the phrase nominative determinism*.

* Sam, Bank Man: Fried. Geddit? 

Read the whole thread from the top:

https://twitter.com/breakingthemark/status/1591114381508558849

TLDR the views of SBF can be summarised as follows:

  • Kelly criterion maximises log utility
  • I don't have a log utility function. It's probably closer to linear.
  • Therefore I should bet at more than Kelly. Up to 5x Kelly would be just fine.

I, and many others, have pointed out that SBF is an idiot. Of course it's easier to do this when he's just proven his business incompetence on a grand scale, but to be fair I was barely aware of the guy until a week ago. Specifically, he's wrong about the chain of reasoning above*.

* It's unclear whether this is specifically what brought SBF down. At the time of writing he appears to have taken money from his exchange to prop up his hedge fund, so maybe the hedge fund was using >>> Kelly leverage, and this really is the case. 

In this post I will explain why he was wrong, with pictures. To be clearer, I'll discuss how the choice of expectation and utility function affects optimal bet sizing. 

I've discussed parts of this subject briefly before, but you don't need to read the previous post.


Scope and assumptions


To keep it tight, and relevant to finance, this post will ignore arguments seen on twitter related to one-off bets, and whether you should bet differently if you are considering your contribution to society as a whole. These are mostly philosophical discussions which are hard to resolve with pictures. So the set up we have is:

  • There is an arbitrary investment strategy, which I assume consists of a data generating process (DGP) producing Gaussian returns with a known mean and standard deviation (this ignores parameter uncertainty, which I've banged on about often enough, but effectively would result in even lower bet sizing).
  • We make a decision as to how much of our capital we allocate to this strategy for an investment horizon of some arbitrary number of years, let's say ten.
  • We're optimising L, the leverage factor, where L = 1 would be full investment, 2 would be 100% leverage, 0.5 would be 50% in cash and 50% in the strategy, and so on.
  • We're interested in maximising the expectation of f(terminal wealth) after ten years, where f is our utility function.
  • Because we're measuring expectations, we generate a series of possible future outcomes based on the DGP and take the expectation over those.

Note that I'm using the continuous version of the Kelly criterion here, but the results would be equally valid for the sort of discrete bets that appear in the original discussion.


Specific parameters

Let's take a specific example. Set mean = 10% and standard deviation = 20%, which is a Sharpe ratio of 0.5. The Kelly-optimal risk target is therefore 50%, equating to a leverage factor of L = 50/20 = 2.5. SBF's 'optimal' leverage would be around 5 times that, L = 12.5. We start with wealth of 1 unit, and compound it over 10 years.
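Just to spell out that arithmetic (a back of the envelope sketch, not part of the main code below):

ann_return = 0.1
ann_std_dev = 0.2

# for Gaussian returns the Kelly-optimal risk target equals the Sharpe ratio,
# so the optimal leverage is Sharpe / std dev, or equivalently mean / variance
sharpe_ratio = ann_return / ann_std_dev            # 0.5
kelly_leverage = ann_return / ann_std_dev ** 2     # 2.5
sbf_leverage = 5 * kelly_leverage                  # 12.5

print(sharpe_ratio, kelly_leverage, sbf_leverage)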

I don't normally paste huge chunks of code in these blog posts, but this is a fairly short chunk:

import pandas as pd
import numpy as np
from math import log

# annualised Gaussian return assumptions: a Sharpe ratio of 0.5
ann_return = 0.1
ann_std_dev = 0.2

BUSINESS_DAYS_IN_YEAR = 256
daily_return = ann_return / BUSINESS_DAYS_IN_YEAR
daily_std_dev = ann_std_dev / (BUSINESS_DAYS_IN_YEAR**.5)

years = 10
number_days = years * BUSINESS_DAYS_IN_YEAR


def get_series_of_final_account_values(monte_return_streams,
                                        leverage_factor=1):
    # terminal wealth for each Monte Carlo path at a given leverage
    account_values = [account_value_from_returns(returns,
                                                 leverage_factor=leverage_factor)
                      for returns in monte_return_streams]

    return account_values

def get_monte_return_streams():
    # 10,000 independent possible futures drawn from the DGP
    monte_return_streams = [get_return_stream() for __ in range(10000)]

    return monte_return_streams

def get_return_stream():
    # one ten year stream of daily Gaussian returns
    return np.random.normal(daily_return,
                            daily_std_dev,
                            number_days)

def account_value_from_returns(returns, leverage_factor: float = 1.0):
    # compound the leveraged daily returns, starting from wealth of 1
    one_plus_return = np.array(
        [1 + (return_item * leverage_factor)
         for return_item in returns])
    cum_return = one_plus_return.cumprod()

    return cum_return[-1]

monte_return_streams = get_monte_return_streams()

Utility function: Expected log(wealth) [Kelly]

Kelly first. We want to maximise the expected log final wealth:

def expected_log_value(monte_return_streams,
                       leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)
    log_values_over_account_values = [log(account_value)
                                      for account_value in series_of_account_values]

    return np.mean(log_values_over_account_values)

And let's plot the results:

def plot_over_leverage(monte_return_streams, value_function):
    leverage_ratios = np.arange(1.5, 5.1, 0.1)
    values = []
    for leverage in leverage_ratios:
        print(leverage)
        values.append(
            value_function(monte_return_streams, leverage_factor=leverage)
        )

    leverage_to_plot = pd.Series(
        values, index=leverage_ratios
    )

    return leverage_to_plot

leverage_to_plot = plot_over_leverage(monte_return_streams,
                                      expected_log_value)
leverage_to_plot.plot()

In this plot, and nearly all of those to come, the x-axis shows the leverage L and the y-axis shows the value of the expected utility. To find the optimal L we look to see where the highest point of the utility curve is.

As we'd expect:

  • Max expected log(wealth) is at L=2.5. This is the optimal Kelly leverage factor.
  • At twice optimal we expect to have log wealth of zero, equivalent to making no money at all (since starting wealth is 1).
  • Not plotted here, but at SBF leverage (L = 12.5) the expected log(wealth) is hugely negative (arguably undefined, since with that much leverage you can go bust), and we'd have lost pretty much all of our money. The quick check below confirms this.
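
Those bullet points also drop straight out of the standard closed form for the expected log growth rate, g(L) = L x mean - 0.5 x L^2 x variance per year. Here's a sketch to check the simulation against (it isn't part of the main code):

# closed form expected log growth: g(L) = L * mean - 0.5 * (L**2) * variance
ann_return = 0.1
ann_std_dev = 0.2
years = 10

def expected_log_growth(leverage):
    return years * (leverage * ann_return
                    - 0.5 * (leverage ** 2) * ann_std_dev ** 2)

print(expected_log_growth(2.5))     # ~1.25: the maximum, at full Kelly
print(expected_log_growth(5.0))     # ~0: twice Kelly, no expected growth in log terms
print(expected_log_growth(12.5))    # ~-18.75: SBF leverage, i.e. wealth of roughly exp(-18.75), almost nothing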


Utility function: Expected (wealth) [SBF?]

Now let's look at a linear utility function, since SBF noted that his utility was 'roughly close to linear'. Here our utility is just equal to our terminal wealth, so it's purely linear.

def expected_value(monte_return_streams,
                   leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)

    return np.mean(series_of_account_values)

leverage_to_plot = plot_over_leverage(monte_return_streams,
                                      expected_value)
leverage_to_plot.plot()

You can see where SBF was coming from, right? Expected wealth gets exponentially higher and higher as we add more leverage. Five times leverage is a lot better than the Kelly-optimal 2.5 times. Five times Kelly, or 2.5 x 5 = 12.5, would be even better.
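
To see why the mean behaves like this, note that with independent daily returns the mean of the compounded outcome is just the compounded mean return, which keeps on growing as you pile on leverage. Here's a sketch using the same parameters as before (not part of the main code):

# expected terminal wealth with independent daily returns:
# E[wealth] = (1 + L * daily_return) ** number_days
ann_return = 0.1
BUSINESS_DAYS_IN_YEAR = 256
daily_return = ann_return / BUSINESS_DAYS_IN_YEAR
number_days = 10 * BUSINESS_DAYS_IN_YEAR

for leverage in [1, 2.5, 5, 12.5]:
    mean_wealth = (1 + leverage * daily_return) ** number_days
    print(leverage, round(mean_wealth, 1))

# at L=12.5 the mean is in the hundreds of thousands, but it is driven by a
# handful of absurdly lucky paths - which is exactly the problem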


Utility function: Median(wealth) 

However there is an important assumption above, which is the use of the mean for the expectation operator. This is dumb. It would mean (pun, sorry), for example, that of the following:

  1. An investment that lost $1,000 99 times out of 100; and paid out $1,000,000 1% of the time
  2. An investment that is guaranteed to gain $9,000 

... we would theoretically prefer option 1 since it has an expected value of $9,010, higher than the trivial expected value of $9,000 for option 2. There might be some degenerate gamblers who prefer 1 to 2, but not many.

(Your wealth would also affect which of these you would prefer. If $1,000 is a relatively trivial amount to you, you might prefer 1. If this is the case consider if you'd still prefer 1 to 2 if the figures were 1000 times larger, or a million times larger). 

I've discussed this before, but I think the median is the more appropriate expectation to use. What the median implies in this context is something like this:

Considering all possible future outcomes, how can I maximise the utility I receive in the outcome that will occur half the time?

I note that the median of option 1 above is a $1,000 loss, whilst the median of option 2 is a $9,000 gain. Option 2 is now far more attractive.
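
A couple of lines of numpy make the point explicit (a toy sketch of the two options above, nothing to do with the main simulation code):

import numpy as np

# option 1: lose $1,000 ninety nine times out of 100, win $1,000,000 once
option_1 = np.array([-1_000] * 99 + [1_000_000])

# option 2: a certain gain of $9,000
option_2 = np.array([9_000] * 100)

print(np.mean(option_1), np.median(option_1))    # 9010.0  -1000.0
print(np.mean(option_2), np.median(option_2))    # 9000.0   9000.0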


def median_value(monte_return_streams,
                 leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)

    return np.median(series_of_account_values)


The spooky result here is that the optimal leverage is now 2.5, the same as the Kelly criterion.

Even with linear utility, if we use the median expectation, Kelly is the optimal strategy.

The reason why people prefer to use mean(log(wealth)) rather than median(wealth), even though the two give the same answer (at least over a long sequence of bets - see the postscript), is that the former is more computationally attractive.

Note also the well-known fact that Kelly maximises the geometric mean return.

With Kelly we aren't really making any assumption about the utility function: our assumption is effectively that the median is the correct expectation operator.
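
If you want to convince yourself of the equivalence, here is a quick check (a sketch that just reuses monte_return_streams and the helper functions defined above):

import numpy as np
from math import log

# terminal wealth at full Kelly leverage for every simulated path
account_values = get_series_of_final_account_values(monte_return_streams,
                                                    leverage_factor=2.5)

median_wealth = np.median(account_values)
exp_of_mean_log_wealth = np.exp(np.mean([log(v) for v in account_values]))

# because terminal wealth is roughly lognormal, exp(mean(log(wealth))) and
# median(wealth) come out very close, so maximising one maximises the other
print(median_wealth, exp_of_mean_log_wealth)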

The entire discussion about utility is really a red herring. It's very hard to measure utility functions, and everyone probably has a different one. I think it's much better to focus on the choice of expectation operator.


Utility function: Nth percentile(wealth) 

Well, you might be thinking that SBF seems like a particularly optimistic kind of guy. He isn't interested in the median outcome (which is the 50th percentile). Surely there must be some percentile at which it makes sense to bet at 5 times Kelly? Maybe he is interested in the 75th percentile outcome?

QUANTILE = .75

def value_quantile(monte_return_streams,
                   leverage_factor=1):

    series_of_account_values = get_series_of_final_account_values(
        monte_return_streams=monte_return_streams,
        leverage_factor=leverage_factor)

    return np.quantile(series_of_account_values, QUANTILE)

Now the optimal is around L=3.5. This is considerably higher than the Kelly max of L=2.5, but it is still nowhere near the SBF optimal L of 12.5.

Let's plot the utility curves for a bunch of different quantile points:

list_over_quantiles = []
quantile_ranges = np.arange(.4, 0.91, .1)

for QUANTILE in quantile_ranges:
    leverage_to_plot = plot_over_leverage(monte_return_streams,
                                          value_quantile)
    list_over_quantiles.append(leverage_to_plot)

pd_list = pd.DataFrame(list_over_quantiles)
pd_list.index = quantile_ranges
pd_list.transpose().plot()

It's hard to see what's going on here (the floating point representation in the legend doesn't help), but you can hopefully see that the optimal L (the hump of each curve) gets higher as we go up the quantile scale, and the curves themselves get higher too (as you would expect).

But at none of these quantiles are we anywhere near an optimal L of 12.5. Even at the 90% quantile - evaluating an outcome that only happens one time in ten - the optimal L is under 4.5.

Now there will be some quantile point at which L=12.5 is indeed optimal. Returning to my simple example:

  1. An investment that lost $1,000 99 times out of 100; and paid out $1,000,000 1% of the time
  2. An investment that is guaranteed to gain $9,000 

... if we focus on outcomes that will happen less than one in a million times (the 99.9999% quantile and above) then yes sure, we'd prefer option 1.

So at what quantile point does a leverage factor of 12.5 become optimal? I couldn't find out exactly, since looking at extremely rare quantile points requires very large numbers of outcomes*. I actually broke my laptop before I could work out what the quantile point was.

* For example, if you want ten observations beyond the quantile point to measure it accurately, then for the 99.99% quantile you would need 10 x (1/(1-0.9999)) = 100,000 outcomes.

But even for a quantile of 99.99% (!), we still aren't at an optimal leverage of 12.5! 


You can see that the optimal leverage is 8 (around 3.2 x Kelly), still way short of 12.5.
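
For the curious, the experiment is just the earlier quantile code with a much more extreme quantile point, a lot more Monte Carlo paths, and a wider range of leverage factors to scan. Something like the sketch below would do it (treat the path count as indicative; this is the sort of thing that broke my laptop):

import numpy as np
import pandas as pd

# ~100,000 paths gives only ~10 observations beyond the 99.99% point,
# and needs a couple of GB of memory, so this is slow
# value_quantile reads the global QUANTILE defined earlier
QUANTILE = 0.9999
many_return_streams = [get_return_stream() for __ in range(100_000)]

# scan a wider range of leverage than before, since the hump is a long way out
leverage_ratios = np.arange(1.5, 15.1, 0.5)
values = [value_quantile(many_return_streams, leverage_factor=leverage)
          for leverage in leverage_ratios]

quantile_curve = pd.Series(values, index=leverage_ratios)
print(quantile_curve.idxmax())    # the chart above puts this around L=8, still well short of 12.5
quantile_curve.plot()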


Summary

Rather than asking about utility functions, I think it's easier to ask people what likelihood of outcome they are concerned about. I'd argue that sensible people would think about the median outcome, which is what you expect to happen half the time. And if you are a bit risk averse, you should probably consider an even lower quantile.

In contrast, SBF went for bet sizing that would only make sense in the set of outcomes that happens significantly less than 0.01% of the time. That is insanely optimistic; and given he was dealing with billions of dollars of other people's money, it was also insanely irresponsible.

Was SBF really that recklessly optimistic, or dumb? In this particular case I'd argue the latter. He had a very superficial understanding of Kelly bet sizing, and because of that he thought he could ignore it.
This is a classic example of 'a little knowledge is a dangerous thing'. A dumb person doesn't understand anything, but reads on the internet somewhere that half Kelly is the correct bet sizing. So they use it. A "smart" person like SBF glances at the Kelly formula, thinks 'oh, but I don't have log utility', leverages up to five times Kelly and thinks 'Wow, I am so smart, look at all my money'. And that ended well...

A truly enlightened person understands that it isn't about the utility function, but about the expectation operator. They also understand uncertainty, optimistic backtesting bias, and a whole bunch of other factors that imply that even 0.5 x Kelly is a little reckless. I, for example, use something south of a quarter Kelly.

Which brings us back to the meme at the start of the post:



Note I am not saying I am smarter than SBF. On pure IQ, I am almost certainly much, much dumber. In fact, it's because I know I am not a genius that I'm not arrogant enough to either completely follow or completely ignore the Kelly criterion without first truly understanding it.

Whilst this particular misunderstanding might not have brought down SBF's empire, it shows that really really smart people can be really dumb - particularly when they think that they are so smart they don't need to properly understand something before ignoring it*.

* Here is another example of him getting something completely wrong


Postscript (16th November 2022)

I had some interesting feedback from Edwin Teejay on twitter, which is worth addressing here as well. Some of the feedback I've incorporated into the post already.

(Incidentally, Edwin is a disciple of Ergodic Economics, which has a lot of very interesting stuff to say about the entire problem of utility maximisation)

First he commented that the max(median) = max(log) relationship is only true for a long sequence of bets, i.e. asymptotically. We effectively have 2,560 daily bets (256 business days times ten years) in our return sequence. As I said originally, I framed this as a typical asset optimisation problem rather than a one-off bet (or a small number of one-off bets).

He then gives an example of a one-off bet decision where the median would be inappropriate:
  1. 100% win $1
  2. 51% win $0 / 49% win $1,000,000
The expected values (mean expectation) are $1 and $490,000 respectively, but the medians are $1 and $0. Yet any sane person would pick the second option.

My retort to this is essentially the same as before - this isn't something that could realistically happen in a long sequence of bets. Suppose we are presented with making the bet above every single week for 5 weeks. The distribution of wealth outcomes for option 1 is single peaked - we earn $5. The distribution of wealth outcomes for option 2 will vary from $0 (with probability 3.4%) to $5,000,000 (with a slightly lower probability of 2.8% - I am ignoring 'compounding', eg the possibility to buy more bets with money we've already won), with a mean of $2.45 million. 

But the median is pretty good: $2 million (the median number of winning weeks out of five is two). So we'd definitely pick option 2. And that is with just 5 bets in the sequence. So the moment we are looking at any kind of repeated bet, the law of large numbers gets us closer and closer to the median being the right thing to optimise. We are just extremely unlikely to see the sort of payoff structure shown in that bet in a series of repeated bets.
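
Here is a quick Monte Carlo check of those numbers (a sketch, not from the original discussion):

import numpy as np

# Edwin's bet repeated for 5 weeks, no compounding:
# each week there is a 49% chance of winning $1,000,000, otherwise nothing
rng = np.random.default_rng(0)
wins = rng.binomial(n=5, p=0.49, size=1_000_000)
wealth = wins * 1_000_000

print((wins == 0).mean())    # ~0.034: win nothing in all five weeks
print((wins == 5).mean())    # ~0.028: win every single week
print(wealth.mean())         # ~$2.45 million mean
print(np.median(wealth))     # $2 million: the median number of wins is two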

Now what about the example I posted:
  1. An investment that lost $1,000 99 times out of 100; and paid out $1,000,000 1% of the time
  2. An investment that is guaranteed to gain $9,000 
Is it realistic to expect this kind of payoff structure in a series of repeated bets? Well consider instead the following:
  1. An investment that lost $1 most of the time; and paid out $1,000,000 0.001% of the time
  2. An investment that is guaranteed to gain $5

The mean of these bets is ~$9 and $5, and the medians are $-1 and $5.

Is this unrealistic? Well, these sorts of payoffs do exist in the world - they are called lottery tickets (albeit it is rare to get a lottery ticket with a $9 positive mean!). And this is something closer to the SBF example, since I noted that he would have to be looking somewhere north of the 99.99% quantile - outcomes that happen less than 0.01% of the time - to choose 5x Kelly leverage.

Now what happens if we run the above as a series of 5,000 repeated bets (again with no compounding for simplicity)? We end up with the following distributions:
  1. An investment that loses $5,000 95.1% of the time, and makes $1 million or more about 5% of the time.
  2. An investment that is guaranteed to gain $25,000

Since there is no compounding we can just multiply up the individual numbers to get the means ($45,000 and $25,000 respectively). The medians are -$5,000 and $25,000. Personally, I still prefer option 2! You might prefer option 1 if spending $5,000 on lottery tickets over ten years is a trivial proportion of your wealth, but I refer you to the previous discussion on this topic: make the figures 1,000 times larger and ask yourself if you still feel the same.
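
Again, a quick check of those figures (a sketch using exact arithmetic rather than simulation):

# the lottery ticket bet repeated 5,000 times with no compounding:
# each bet loses $1, except with probability 0.001% (1e-5) it pays $1,000,000
p_win = 1e-5
n_bets = 5_000

prob_no_wins = (1 - p_win) ** n_bets
print(prob_no_wins)        # ~0.951: lose $5,000 about 95.1% of the time
print(1 - prob_no_wins)    # ~0.049: hit at least one $1m payout about 5% of the time

mean_per_bet = p_win * 1_000_000 - (1 - p_win) * 1
print(n_bets * mean_per_bet)    # ~$45,000 mean, versus a certain $25,000 for option 2

# since 'no wins at all' happens more than half the time, the median outcome
# for option 1 is just 5,000 losses of $1, i.e. -$5,000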

So I would argue that in a long run of bets we are more likely, in real life, to get payoff structures of the kind I posited than the closer-to-50:50 bet suggested by Edwin. Ultimately, I think we agree that for long sequences of bets the median makes more sense (with a caveat). I personally think long run decision making is more relevant to most people than one-off bets.

What is the caveat? Edwin also said that the choice of the median is 'arbitrary'. I disagree here. The median is 'what happens half the time'. I still think for most people that is a logical reference point for 'what I expect to happen', as well as in terms of the maths: both median and mean are averages after all. I personally think it's fine to be more conservative than this if you are risk averse, but not to be more aggressive - bear in mind that will mean you are betting at more than Kelly.

But anyway, as Matt Hollerbach, whose original series of tweets inspired this post, said:

"The best part of Robs framework is you don't have to use the median,50%. You could use 60%  or 70%  or 40% if your more conservative.  And it intuitively tells you what the chance of reaching your goal is. You don't get duped into a crazy long shot that the mean might be hiding in."  (typos corrected from original tweet)

This fits well into my general framework for thinking about uncertainty. Quantify it, and be aware of it. Then if you still do something crazy/stupid, well at least you know you're being an idiot...


6 comments:

  1. In the example you give, isn't the expectation $9,010 instead of $9,100? (-1,000 * 99/100 + 1,000,000 * 1/100 = 9,010)

  2. Hi Rob, excellent post! I think part of the problem is in how it's framed: when one talks about two possible bets with different payout structures, one is probably induced to consider the possible outcomes of those two strategies when applied only once.
    If it's very clear from the beginning that the two strategies are to be repeated n times (with or without compounding), then it would be clear that one should try to characterise the distribution of the outcomes after those n trials. If the resulting distribution is far from normal (as happens with compounding), then it would be clear that it's more reasonable to look at the median than at the mean, and from there backtrack to what the best strategy is. This is probably pretty obvious, especially after this great post, but what I feel is more fundamental and probably underestimated is how important (and clear) that initial framing is.

  3. The median is a really bad expectation operator. It ignores tails entirely: in this post you're often able to present this as the conservative, responsible, risk-managing choice, but that's because (as far as I can tell) you only examine right-skewed distributions, with likely low outcomes and unlikely very high outcomes. In left-skewed distributions the situation is reversed: the median is happy to pick up pennies in front of a steamroller no matter how few the pennies or how large the steamroller – as long as you're less than 50% likely to be run over, that outcome just doesn't matter at all. You need to take the mean if you want to incorporate the risk of catastrophe into your decision.

    As you rightly point out, some of the problems with the median are fixed by considering a long sequence of repeated bets, but:
    - as I'm picking up my pennies in front of my steamroller, the median endorses every single penny I pick up, and it's only if I ask about all the pennies at once that it says no, which leaves me in a situation where what answers I get depend on how I choose to aggregate my questions,
    - in any case, by the law of large numbers, if you repeat a bet many times, the median converges to the mean: to the extent that repeating the bet gets you to the right answer, it's because the mean was already there.

    Replies
      1. This is fair, but SBF was talking about right-tailed, option-like distributions, so that's why I framed it in that way. Naturally for a penny/steamroller situation we'd correctly end up using a much lower leverage than if the distribution was Gaussian or right-tailed; in fact I have talked about this several times before, in my books and in other posts.

