Thursday, 6 February 2025

How much should we get paid for skew risk? Not as much as you think!

A bit of a theme in my posts a few years ago was my 'battle' with the 'classic' trend followers, which can perhaps be summarised as:

Me: Better Sharpe!

Them: Yeah, but Skew!!

My final post on the subject (when I realised it was a futile battle, as we were playing on different fields - me on the field of empirical evidence, them on .... a different field) was this one, in which the key takeaway was this:

The backtest evidence shows that you can achieve a higher maximum CAGR with vol targeting, because it has a large Sharpe Ratio advantage that is only partly offset by its small skew disadvantage. For lower levels of relative leverage, at more sensible risk targets, vol targeting still has a substantially higher CAGR. The slightly worse skew of vol targeting does not become problematic enough to overcome the SR advantage, except at extremely high levels of risk; well beyond what any sensible person would run.

And another more recent post was on Bitcoin, and why your allocation to it would depend on your appetite for skew. 

With those in mind I recently came to the insight that I could use my framework of 'maximising expected geometric mean / final wealth at different quantile points of the expectation distribution given you can use leverage or not'* to give an intuitive answer to an intriguing question - probably one of the core questions in finance:

"What should the price of risk be?"

* or MEGMFWADQPOTED for short - looking actively for a better acronym - which I used in the Bitcoin post linked to above, but explain better in the first half of this post and also this one from a year ago

The whole academic risk factor literature often assumes the price of risk without much reasoning. We can work out the size of the exposure, and the risk of the factor, but that doesn't really justify its price. After all, academics spent a long time justifying the equity risk premium.

I think it would be fun to think about the price of different kinds of risk. Given the background above, I thought only about skew (3rd moment) risk but I will also briefly discuss standard deviation (2nd moment) risk. Generally speaking the idea is to answer the question "What additional Sharpe Ratio should an investor require for each unit of additional risk in the form of X?" Whilst this has certainly been covered by academics at some length, I think the approach of wrapping this up by expressing risk preference as optimising for different distributional points is novel and means pretty graphs.

I'm going to assume you're familiar with the idea of maximising geometric return / CAGR / log(final wealth) at some distributional point (50% median or more conservative points like 10%, 25%), to find some optimal level of leverage. If not, enjoy reading the prior work.


The "price" of standard deviation risk - with and without leverage

To an investor who can use leverage, for Gaussian normal returns, this is trivial. We want the highest Sharpe Ratio asset, irrespective of what its standard deviation is. Therefore the 'price' of standard deviation is zero. We don't mind getting additional standard deviation risk as long as it doesn't affect our Sharpe Ratio - we don't need a higher SR to compensate. Indeed in practice, we might prefer higher standard deviations since they require less leverage, which could be problematic if we are wrong about our SR estimates or assumptions about return distributions.

In classical Markowitz finance, to an investor who cannot use leverage, the price of standard deviation is negative. We will happily pay for higher risk in the form of a lower Sharpe Ratio. We want higher returns at all costs; that may come at the cost of higher standard deviation so we aren't fully compensated for the additional risk, but we don't care. This is the 'betting against beta' explanation from the classic Pedersen paper. Consider for example an investment with a mean of 5% and a standard deviation of 10% for a Sharpe Ratio of 0.5 (I set the risk free rate to zero without loss of generality). If the standard deviation doubles to 20%, but the mean only rises to 6%, well we'd happily take that higher mean. We'd even take it if the mean only increased by 0.00001%. That means the 'price' of higher standard deviation is not only negative, but a very big negative number.

But we are not maximising arithmetic mean. Instead we're maximising geometric mean, which is penalised by higher standard deviation. That means there will be some point at which the higher standard deviation penalty for greater mean is just too high. For the median point on the quantile distribution, which corresponds to a full Kelly investor, that will be once the standard deviation has gone above the Kelly optimal level. Until that point the price of risk will be negative; above it, the price will turn positive.

Consider again an arbitrary investment with a mean of 5% and a standard deviation of 10%; SR = 0.5. If returns are Gaussian then the geometric mean will be 4.5%. The Kelly optimal risk is much higher at 50%, which means it's likely the local price of risk is still negative. So for example, if the standard deviation goes up to 20%, with the mean rising to say 6.5%, for a new (lower) SR of 0.325; we'd still end up with the same geometric mean of 4.5%. In this simple case the price of 10 percentage points of risk is a SR penalty of 0.175; we are willing to pay 0.0175 units of SR for each 1% unit of standard deviation.

If however the standard deviation goes up another 10%, then the maximum SR penalty for equal geometric mean we would accept is 0.025 units (getting us to a SR of 0.3, or returns of 9% a year on 30% standard deviation, equating again to a geometric mean of 4.5%); and for any further increase in standard deviation we will have to be paid SR units. This is because the standard deviation is now 30% and so is the SR; we are at the Kelly optimal point. We wouldn't want to take on any additional standard deviation risk unless it is at a higher SR, which will then push the Kelly optimal point upwards.

So we'd need to get paid SR units to push the standard deviation up to say 40%. With 40% standard deviation we'd only be interested in taking the additional risk if we could get a SR of 0.3125 to maintain the geometric mean at 4.5%. Something weird happens here however: since 40% is higher than the new Kelly optimal risk of 31.25%, we can actually get a higher geometric mean if we used less risk (basically by splitting our investment between cash and the new asset). To actually want to use that 40% of risk the SR would trivially have to be 0.4. For someone who is remaining fully invested, the price of standard deviation risk once you hit the Kelly optimal is going to be 1:1 (1% of standard deviation risk requiring 0.01 of SR benefit).
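The arithmetic above is easy to check in a few lines of Python, using the Gaussian approximation geometric mean = arithmetic mean - variance/2 (function names are just for illustration, this isn't from any of my libraries):

```python
def geometric_mean(mean, stdev):
    # Gaussian approximation: geometric mean = mean - stdev^2 / 2
    return mean - 0.5 * stdev ** 2

def breakeven_sharpe(base_mean, base_stdev, new_stdev):
    # SR required at new_stdev to keep the geometric mean unchanged
    required_mean = geometric_mean(base_mean, base_stdev) + 0.5 * new_stdev ** 2
    return required_mean / new_stdev

# Base asset: mean 5%, stdev 10% (SR 0.5, geometric mean 4.5%)
for new_sd in (0.20, 0.30, 0.40):
    print(new_sd, breakeven_sharpe(0.05, 0.10, new_sd))
```

This reproduces the breakeven SRs of 0.325, 0.3 and 0.3125 quoted above.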

That is all for a Kelly optimal investor, but how would using my probabilistic methodology with a lower quantile point than the median change this? Well clearly, that would penalise higher standard deviations more, lowering the standard deviation at which the price of risk turns positive.

Because the interaction of leverage and the Kelly optimal is complex, and will depend on exactly how close the initial asset is to the cutoff point, I'm not going to do more detailed analysis on this as it would be time-consuming to write, and to read, and not add more intuition than the above. Suffice to say there is a reason why I usually assume we can get as much leverage as required!


The "price" of skew - with leverage

Now let's turn to skew (and let's also drop the annoying lack of leverage which makes our life so complicated). The question we now want to answer is "What is the price of skew: how many additional points of SR do we need to compensate us for a unit change in skew, assuming we can freely use leverage? And how does this change at different distributional points?". Returning to the debate that heads this post; is an extra 0.50 units of skew worth a 0.30 drop in SR when we go from continuous to 'classical' trend following? We know that would only be the case if we were allowed to use a lot of leverage; which implies we were unlikely to be anything but a full Kelly optimising median distributional point investor. But at what distributional point does that sort of tradeoff become worth it?

To answer this, I'm going to recycle some code from this post and adapt it. That code uses a brute force technique, mixing Gaussian returns to produce returns with different levels of skewness and fat tailed-ness, but with the same given Sharpe Ratio. We then bootstrap those returns at different leverage levels. That gives us a distribution of returns for each leverage level. We can then choose the optimal leverage that produces the maximum geometric return at a given distributional point (eg median for full Kelly, 10% to be conservative and so on). I then have an expected CAGR level at a given SR, for a given level of skew and fat tailed-ness. By modifying the SR, skew and fat tailed-ness I can see how the geometric return varies, and construct planes where the CAGR is constant. From that I can derive the price of skew (and fat tailed-ness, but I will look at that in a moment) in SR units at different distributional points. Phew!

(Be prepared to set aside many hours of compute time for this exercise if you want to replicate...)
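If you want a flavour of the method without digging out the original code, here is a minimal, uncalibrated sketch. To be clear: the mixture parameters and function names here are purely illustrative, and this is not the code used to generate the plots that follow.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixed_gaussian_returns(n, annual_sr=0.5, jump_prob=0.05, jump_scale=3.0, skew_sign=-1):
    # Crude mixture: mostly small Gaussian moves, plus occasional larger
    # one-sided moves, standardised then rescaled to 1% daily vol with
    # the target annualised Sharpe Ratio
    base = rng.normal(0.0, 1.0, n)
    jumps = (rng.uniform(size=n) < jump_prob) * rng.exponential(jump_scale, n) * skew_sign
    r = base + jumps
    r = (r - r.mean()) / r.std()
    return 0.01 * (r + annual_sr / 16)  # 16 ~ sqrt(256 business days)

def geo_mean_at_quantile(returns, leverage, quantile=0.5, n_boot=100, horizon=256):
    # Bootstrap the distribution of geometric means at this leverage,
    # then read off the chosen quantile (0.5 = full Kelly median)
    geo = []
    for _ in range(n_boot):
        growth = 1 + leverage * rng.choice(returns, horizon)
        growth = np.maximum(growth, 1e-4)  # floor to keep the log finite
        geo.append(np.exp(np.log(growth).mean()) - 1)
    return np.quantile(geo, quantile)

returns = mixed_gaussian_returns(5_000)
leverages = np.arange(0.5, 10.5, 0.5)
optimal = max(leverages, key=lambda lev: geo_mean_at_quantile(returns, lev))
```

The real exercise repeats this over a grid of SR, skew and fat tail values, which is where the many hours of compute come from.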


The "price" of skew: Kelly investor

Let's begin by looking at the results for the Kelly maximiser who focuses on the median point of the distribution when calculating their optimal leverage. 

The plots show 'indifference curves' at which the geometric mean is approximately equal. Each coloured line is for a different level of geometric mean. The plots are 'cross plots' that show statistical significance and the median, since due to the brute force approach there is a cloud of points underneath each line.

Even then, there is still some non-monotonic behaviour. But hopefully the broad message is clear; for this sort of person skew is not worth paying much for! At most we might be willing to give up 4 SR basis points to go from a skew of -3 to +3, which is a pretty massive range.



The "price" of skew: very conservative investor

Now let's consider someone who is working at the 10% quantile point.

If anything these curves are slightly flatter; at most the price of skew might be a couple of basis points. The intuition for this is that these people are working at much lower levels of leverage. They are much less likely to see a penalty from high negative skew, or much of a benefit from a high positive skew.


The "price" of lower tail risk: Kelly investor

Now let's consider the lower tail risk. Remember, a ratio of 1 means we have a Gaussian distribution, and a value above 1 means the left tail is fatter.


This may seem surprising; with a more extreme left tail it looks like you can have a higher SR. But the improvement is modest again, perhaps 5bp of SR at most.


The "price" of lower tail risk: 10% quantile investor

Once again, investors at a lower point on the quantile spectrum are less affected by changes in tail risk, requiring perhaps 3bp of SR in compensation.


How does the optimal leverage / skew relationship change at different percentiles?

As we have the data we can update the plots done earlier and consider how optimal leverage changes with skew. First for the Kelly investor:




Here each coloured line is for a different SR. We can see that for the lowest SR the optimal leverage goes from around 2.7 to 3.7 between the largest negative and positive skews; and for the highest from around 4.2 to 5.6. This is the same result as the last post: leverage can be higher if skew is positive, but not that much higher (from skew of -2 to +2 we can leverage up by around a third).

Here is the 10% investor:




The optimal leverage is lower as you would expect, since we are scaredy cats. It looks like the leverage range is higher though; for the highest SR strategies we go from around 1.7 to 2.8; a two thirds increase. And for the lower SR the rise in optimal leverage is even more dramatic. 


 

One final cut of the data cake

Finally another way to slice the cake is to draw different coloured lines for each level of skew and then see how the geometric mean varies as we change Sharpe Ratio. First the Kelly guy:


This is really reinforcing the point that skew is second order compared to Sharpe Ratio. Each of the bunches of coloured lines is very close to each other. At the very lowest SR at around 0.52 we only get a modest improvement in CAGR going from skew of -2.4 (purple) to +2.4 (red). We get a bigger improvement in CAGR when we add around 3bp of SR and move along the x-axis. Hence 5 units of skew are worth less than 3bp in SR. It's only at relatively high levels of SR that skew becomes more valuable; perhaps 5bp of SR for each 5 units of skew.


Here is the 10% person:


As we noted before there is almost no benefit from skew for the conservative investor (coloured lines close together at each SR point), until SR ramps up. At the end 5 units of skew are worth the same as around 6bp of SR.


Conclusion: Skew isn't as valuable as you might think

I started this post harking back to this question: is an extra 0.50 units of skew from 'traditional' trend following worth a 0.30 drop in SR? And the answer is, almost certainly not. The best price we get for skew is around 6bp for 5 units of skew. At that price, 0.5 units of skew should cost us less than 1bp in SR penalty. We're being charged about 50 times the correct price!!!

And this is for Kelly investors. For those with a lower risk tolerance, much of the time there is basically no significant benefit from skew.

That doesn't mean that you shouldn't know what your skew is, as it will affect your optimal leverage, particularly as we saw above if you are a conservative utility person (being such a person will also protect you if you think your skew or Sharpe ratio is better than it actually is, and that's no bad thing). And negatively skewed strategies à la LTCM with very low natural vol that have to be run at insane leverage will always be dangerous, particularly if you don't realise they are negatively skewed.

But part of the problem with the original debate is a false argument: taking a true statement, 'highly negatively skewed strategies are very dangerous with leverage', and extending it to 'you should be happy to suffer a significantly lower Sharpe Ratio to get a marginally more positive skew' (which I have demonstrated is false).

Anyway outside of that argument I think I have shown that to an extent the obsession with getting positive skew is a bit of an unhealthy one. Sure, get it if it's free, but don't pay much for it otherwise. 









Tuesday, 7 January 2025

Do less liquid assets trend better or is it that they are just more diversified?

As most of you know, one of the many projects / things I am involved with is the TTU Systematic Investor podcast series where I'm one of the rotating cast of co-hosts.

On a recent episode (at 24:05) we discussed the reasons why 'alt' CTAs tend to do better than traditional CTAs. Examples of alt-CTAs mentioned in that segment are the Man-AHL Evolution fund which I was heavily involved with when I was at AHL, and the Florin Court product which is run by some ex-AHL colleagues.

(Other funds are available and this is not an endorsement or financial advice, which I am not regulated to provide. It may be utterly illegal for you to even be aware of these products in your jurisdiction, never mind invest in them, and that is your problem not mine)

An 'alt-CTA' is one that trades non-traditional markets, but in a traditional way (eg mostly by trend following). These could be less liquid futures markets, but are more likely to be non-futures markets like options, OTC derivatives or cash equities. In this article I'm going to focus on the 'less liquid futures' definition of alt-, because that is the data I happen to have. This means that the analysis is also analogous to one of the classical issues in financial economics - the small cap effect in equities.

In that episode I mentioned some research I had once done on that very topic; albeit many, many years ago, and that document certainly isn't available on my blog. So I thought it worth redoing this exercise.


Reasons why alt CTAs might do better

There are a number of reasons why one CTA might outperform another, but we're going to focus on just three here:

  • more diversification (the products they trade have lower correlations with each other, and/or nice co-skewness properties)
  • better pre-cost performance from the products they are trading
  • lower costs

Now of course we would expect higher costs from less liquid futures; the key question is whether we get enough extra pre-cost performance to compensate.... or no extra performance at all. In which case, is the extra alt-CTA juice coming from the diversification properties of the alt-markets (either linear correlation or something funkier in the higher moments)? Or will my simple analysis fail to uncover any extra alt-performance, either because the alt-CTAs have some extra magic, or because their special black magic power can only be found in non-futures markets? Or because they've just been lucky?

In any case we'll see if the equivalent of the 'small cap' effect in stocks is present in futures, or if it's something that was around in the past but has gone.

Note: There is some debate about whether the small cap effect, either outright, or in combination with the value effect, is still a thing.


What we are measuring

We need a way of measuring:

  •  the liquidity of futures
  • the trend following performance

To keep things simple, for trend following performance I'm going to use the Sharpe Ratio of an EWMAC16,64 trend following continuous forecast with my usual vol based position sizing. To calculate the Sharpe Ratio for a given period (eg a year), I'll use the annualised average daily percentage return divided by the expected annual percentage standard deviation. So this is a Sharpe Ratio based on the vol targeted, not the realised vol. This is because for short periods we might have a weak signal producing a high SR on a contract we didn't actually make any significant money out of.
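In code that measure is trivial (the function name is just illustrative):

```python
import numpy as np

def vol_targeted_sharpe(daily_perc_returns, expected_annual_stdev):
    # SR measured against the vol we were targeting, not realised vol,
    # so weak-signal periods don't produce spuriously high ratios
    annualised_mean = np.mean(daily_perc_returns) * 256
    return annualised_mean / expected_annual_stdev
```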

For futures liquidity, I'm going to use the 30 day rolling average of daily volume in $ million of annualised risk units for the contract that currently has the highest volume. That is the same measure I track daily here. And then I'm going to log(x) this volume, as these figures vary by many orders of magnitude.

Note: I currently set this measure at a minimum of $1.5 million to trade a given future. 

Note: The definition of $ annualised risk units is the number of contracts of volume, multiplied by the annual standard deviation in price units, multiplied by the $ value of each price unit.
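As a worked example of that definition, using a purely hypothetical contract:

```python
import math

def dollar_risk_volume_millions(contracts_per_day, annual_stdev_price_units, dollar_per_price_unit):
    # $ millions of annualised risk traded per day, per the definition above
    return contracts_per_day * annual_stdev_price_units * dollar_per_price_unit / 1e6

# Hypothetical contract: 5,000 lots a day, annual stdev of 120 price
# units, $50 per price unit -> $30m of daily risk units
volume = dollar_risk_volume_millions(5_000, 120, 50)
log_volume = math.log(volume)  # natural log, since volumes span orders of magnitude
```

Note that on this scale the $1m minimum discussed later corresponds to a log(volume) of zero.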

There could be other ways of measuring liquidity; for example open interest, or the cost of trading. I'm wary of using open interest since there are contracts with large open interest and small volume, and the reverse is also true. Personally I think unless you are a massive trader the size of the volume is more important than the open interest. I don't want to use cost of trading as a measure of liquidity, since I will be analysing that separately.

Normally when I do this kind of analysis, I exclude instruments for all kinds of reasons including because they are too expensive or illiquid to trade. In this case I don't want to do that. I will however exclude instruments in my data set that are:

  • Duplicates. For example, I don't analyse both the micro and mini S&P 500. The instruments in my dataset are those which meet my minimum requirements for liquidity but have the smallest contract size. Note that the definition of which is the duplicate contract to trade could have been different in the past. For example, immediately after the micro future came into being it wouldn't have met my requirements for liquidity, so I would have in practice used the mini future. This will affect the results in a small number of edge cases, but mostly for high volume instruments.
  • Ignored. These instruments either have garbage data, or they are spread instruments.

I won't exclude instruments that have:

  • Trading Restrictions - mostly ICE markets for which I don't have access to live data so don't currently trade, and certain US derivatives I'm banned from trading
  • 'Bad markets' - these are those that are too expensive or illiquid for me to trade - I want to see if there are size effects so I want to keep these. 

This gives me 205 instruments to analyse. Finally, I have around 12 years of data since I don't have volume data prior to 2013 in my dataset. 


Results across all years

Let's start by just plotting the average volume across all the available data, versus the pre-cost trend following p&l, by instrument.



That isn't especially suggestive of a strong relationship; although our eyes are drawn to the outlier in the top left (US housing equity sector if you care). If I do this as a 'bin cross' plot, which shows statistical significance (explained in more detail in chapter 12 of AFTS), then we can see there is really nothing there - in fact there is a slight tendency for very liquid markets to have a higher trend following SR:




What about costs?

Perhaps a slightly better relationship here - lower volume means higher costs - but not super consistent. There are instruments with very low volume but not bad costs, such as the CLP and CZK FX markets at the extreme left. However these costs are based on sampled bid-ask spreads so are unlikely to be indicative of what you could actually achieve trading in any size.

The cross plot shows that very illiquid markets do indeed cost more, but beyond that the relationship is relatively non linear. There is a 'zone of increasing costs' up to around $20m of volume in annual risk units, but beyond that risk adjusted costs are relatively flat. Again, this applies to bid-ask spreads (and commissions) only, and for institutional size traders the 'zone of increasing costs' would apply to more instruments.



Trend following p&l: Year by year results

This kind of market analysis has a fatal flaw; it doesn't account for the fact that some instruments will have been trading across the entire 12 year dataset whilst others will only have a few years of data. It also doesn't account for time series effects such as a given instrument seeing an increase or decrease in volume over the relevant period. To get around this, instead I'm going to break the results down into year by year results. So each point on the following scatter plot is the SR and volume for a given instrument and a given year. 

There is little point doing this for costs, since the costs in my backtest aren't actual costs, but here are the results for pre-cost returns. I haven't bothered with a scatter plot as it will be insanely noisy; here is the cross plot:


As with costs it does look like there is something there for very illiquid instruments; roughly those with less than $1m of volume units per day. But it's not statistically significant. The results incidentally survive the application of costs:

The median SR for log(volume) less than 0 (volume units < $1m per day) is 0.04 SR units higher even after costs, and the less robust mean SR is 0.12 units higher.


Measuring diversification via IDM

OK so it looks like very illiquid markets might have a slight edge in performance. But this isn't enough to explain the outperformance of alt-CTAs (with all the caveats from before); I'd also like to look at diversification.

Expected linear diversification can be measured easily by using what I call the 'IDM'. Intuitively, it's the multiplication factor required to leverage up a portfolio of assets with some weightings and correlations. See any of my books on trading for details. A portfolio of assets with all correlations=1 will have an IDM of 1. A portfolio of N assets with all correlations zero will have an IDM of sqrt(N).
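The IDM is just 1 / sqrt(w'Hw), where w is the vector of weights and H the correlation matrix; here is a quick sanity check of the two limiting cases just mentioned:

```python
import numpy as np

def idm(weights, corr_matrix):
    # Instrument Diversification Multiplier: 1 / sqrt(w' . H . w)
    w = np.asarray(weights)
    return 1.0 / np.sqrt(w @ np.asarray(corr_matrix) @ w)

n = 4
equal_weights = np.full(n, 1.0 / n)
print(idm(equal_weights, np.ones((n, n))))  # all correlations one -> 1.0
print(idm(equal_weights, np.eye(n)))        # zero correlations -> sqrt(N) = 2.0
```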

Note: We can also measure the actual diversification (which will confound both linear and non linear effects) by looking at the ratio between the portfolio SR and the SR of individual instruments - the Sharpe Ratio Ratio (SRR). This tends to be higher than we'd expect from looking at the IDM, as I note in AFTS and here; there is also another take from an ex colleague here. It's tricky to do here however as there are a lot of instruments jumping in and out of the portfolio.

So what I need to do is create portfolios of different liquidity instrument trend following sub-strategies and measure their diversifications (not the correlation of the underlying returns!). An open question is how these portfolios are weighted. I will do this two ways; firstly with equal weights. Secondly, using my handcrafting method (H/C) but in its simplest form with just correlations (but naturally, using out of sample optimisation).

This will be a crude in-sample test, where I look at the average volume over the entire period for which we have volume figures, and then use that to split the portfolio into different buckets; because I'm trying to work out the why, not the how, of exploiting this result. I will use the final IDM (likely an overestimate given the IDM should increase as more instruments are added).

First by cutting off the portfolio at the median log(volume) of 2.8 (about $16m of daily volume units):

                          IDM EW                        IDM H/C 
Low volume                 2.30                           2.10
High volume                2.28                           2.14

That's... not very much difference. Here are the results as a time series, just to check it isn't a weird end of days effect:


Notice that IDMs fall over time, probably because correlations generally are rising. Earlier in the period when more diversification is available, the less liquid markets do better. But the differences aren't especially substantial.

But above it did seem that the better performance effect only kicked in once we were at very low volumes - below log(volume) of 0 (less than $1m in volume units). Let's go a bit more granular and cut our list of instruments into four groups of ~50 instruments, and for simplicity just look at handcrafted results:


Note the key is in log(volume) units. Note also that there isn't much going on here.


A very silly comparison

The one thing we haven't yet done is plot an account curve, so let's see what the portfolio p&l is like for each of the four buckets of liquidity (which essentially will confound both any improvement in per instrument trend following, plus the realised diversification both linear and non linear). To make this really silly, I'm going to do this for the whole of history despite only using volumes from 2013 to the present to decide which instrument goes in which bucket. This is a shocking idea for a huge number of reasons, almost too many to elucidate here.

With all that in mind, this is the strongest effect yet, with less liquid markets underperforming. However this is very likely to be luck; and it's confined mostly to the period prior to 1985, when the less liquid market sets probably contained only a few instruments which happened to do badly. After that there really isn't much in it.

Summary

On an individual market basis there is indeed a faint 'small cap' effect in futures, at least at this single speed. But it doesn't look like there is much of a difference in measurable diversification benefits. 

As I warned, none of this goes very far to explaining the puzzle of the alt-CTAs' outperformance, mainly because I don't really have the data to do this properly (so perhaps Man AHL or Florin Court could do so?) - the benefits of being an alt- aren't so much from having a higher exposure to illiquid futures, as from trading things that aren't futures at all.

Although perhaps it really was luck, since the outperformance has started to fade recently and the five year track records for say Evo and AHL Alpha are now very similar. That could be because the diversification benefit has fallen off in alts more than in liquid futures, or because the alt- markets have 'matured' and become less 'trendy'.

What we haven't done here is look at the effect of including less liquid instruments in an existing portfolio of liquid instruments; ceteris paribus that should be a good thing, since my starting assumption is that more diversification is better, especially as many of the less liquid instruments are commodities rather than another flipping US bond future.

Perhaps I should rethink my very strict policy on what I trade (minimum liquidity of $1.5m volume units per day); after all one of the advantages of being a smaller trader is being able to trade less liquid markets, and not all of the instruments with that sort of volume are super expensive as one of the earlier plots showed. 


Friday, 6 December 2024

Taking an income from your trading account - probabilistic Kelly with regular withdrawals

Programming note: This post has been in draft since ... 2016!

One question you will see me asked a lot is 'how much money do I need to become a full time trader?'. And I usually have a handwaving answer along the lines of 'Well, if you think your strategy will earn you 10% a year, then you probably want to be able to cover 5 years of expenses with no income from your trading strategy, so you need 15x your annual living expenses as an absolute minimum'. Which curiously often isn't the answer people want to hear, since objectively at that point they would already be rich and they want to trade purely to become rich (a terrible idea! people should only trade for fun with money they can afford to lose); and also because they want to start trading right now with the $1,000 they have saved up, which wouldn't be enough to cover next month's rent.

But behind that slightly trite question there is a deeper and more meaningful one. It is a variation of this question, which I've talked about a lot on this blog and in my various books:

"Given an expected distribution of trading strategy returns, what is the appropriate standard deviation or leverage target to run your strategy at?"

And the variation we address here is:

How does the answer to the above question change if you are regularly taking a specific % withdrawal from your account?

This has obvious applications to retail traders like me (although I don't currently take a regular withdrawal from my trading account, which is only a proportion of my total investments; rather I sporadically take profits). But it could also have applications to institutional investors creating some kind of structured product with a fixed coupon (do people still do that?).

There is generic python code here (no need to install any of my libraries first, except optionally to use the progressBar function) to follow along with.


A brief reminder of prior art(icles)


For new readers and those with poor memories, here's a quick run through what I mean by 'probabilistic Kelly'. If you are completely new to this and find I'm going too quickly, you might want to read some prior articles:


If you know this stuff backwards, then you can skim through very quickly just to make sure you haven't remembered it wrong.

Here goes then: The best measure of performance is the following - having the most money at the end of your horizon (which for this blogpost I will assume is 10 years, eg around 2,560 working days). We maximise this by maximising final wealth, or log(final wealth). This is known as the Kelly criterion. The amount of money you will have at the end of the horizon is equal to your starting capital C, multiplied by the product of (1+r0)(1+r1)...(1+rT) where rt is the return in a given time period. The T'th root of all of that lot, minus one, is equal to the geometric mean. So to get the most money, we maximise the annual geometric mean of returns, which is also known in noob retail trading circles as the CAGR.

If we can use any amount of leverage then for Gaussian returns the optimal standard deviation will be equal to the Sharpe Ratio (i.e. average arithmetic excess return / standard deviation). For example, if we have a strategy with a return of 15%, with risk free rate of 5%, and standard deviation of 20%; then the Sharpe ratio will be (15-5)/20 = 0.50; the optimal standard deviation is 0.50 = 50%; and the leverage required to get that will be 50%/20% = 2.5.
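That worked example, spelled out as a sketch:

```python
mean_return = 0.15   # annual arithmetic return of the strategy
risk_free = 0.05     # risk free rate
stdev = 0.20         # annual standard deviation

sharpe = (mean_return - risk_free) / stdev   # (15 - 5) / 20 = 0.50
optimal_stdev = sharpe                        # full Kelly: risk target equals SR
optimal_leverage = optimal_stdev / stdev      # 0.50 / 0.20 = 2.5
```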

Note: For the rest of the post I'm going to assume Gaussian normal returns since we're interested in the relative effects of what happens when we introduce cash withdrawal, rather than the precise numbers involved. As a general rule if returns are negatively skewed, then this will reduce the optimal leverage and standard deviation target, and hence the safe cash withdrawal rate. 

Enough maths: it's probably easier to look at some pictures. For the arbitrary strategy with the figures above, let's see what happens to return characteristics as we crank up leverage (x-axis; leverage 1 means no leverage and fully invested, >1 means we are applying leverage, <1 means we keep some cash in reserve):

x-axis: leverage, y-axis: various statistics

The raw mean in blue shows the pure effect of applying leverage: doubling leverage doubles the annual mean from 15% to 30%. Similarly, doubling leverage doubles the standard deviation in green from 20% to 40%. However when we use leverage we have to borrow money; so the orange line showing the adjusted mean return is lower than the blue line (for leverage >1), as we have to pay interest.

The geometric mean is shown in red. This initially increases, and is highest at 2.5 times leverage (the figure calculated above), before falling. Note that the geometric mean is always less than the mean; and the gap between them gets larger the riskier the strategy gets. They will only be equal if the standard deviation is zero. Note also that using half the optimal leverage doesn't halve the geometric return; it falls to around 14.4% a year, down from just over 17.5% a year with the optimal leverage. But doubling the leverage to 5.0 times results in the geometric mean falling to zero (this is a general result). Something to bear in mind then is that using less than the optimal leverage doesn't hurt much; using more hurts a lot.
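Under the Gaussian assumption the red line can be approximated in closed form: the arithmetic mean net of funding, less half the levered variance (a sketch of my own; the linked code compounds returns exactly, and here "falling to zero" means zero excess over the risk free rate):

```python
def geo_mean_at_leverage(L, mean=0.15, rf=0.05, stdev=0.20):
    # Arithmetic mean after paying to borrow (L - 1) units at the risk free
    # rate, less the variance drag of the levered position
    arithmetic = rf + L * (mean - rf)
    return arithmetic - 0.5 * (L * stdev) ** 2

g_optimal = geo_mean_at_leverage(2.5)    # ~17.5% a year
g_half    = geo_mean_at_leverage(1.25)   # ~14.4%: a modest sacrifice
g_double  = geo_mean_at_leverage(5.0)    # 5%: just the risk free rate left
```

The asymmetry is visible in the numbers: halving leverage costs about 3% a year of geometric return, while doubling it gives up the entire 12.5% excess.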

Here is another plot showing the geometric mean (Left axis, blue) and final account value where initial capital C=1 (right axis, orange); just to confirm the maximum occurs at the same leverage point:

x-axis: leverage, y-axis LHS: geometric mean (blue), y-axis RHS: final account value (orange)

Remember the assumption we're making here is that we can use as much leverage as possible. That means that if we have a typical relative value (stat arb, equity long short, LTCM...) hedge fund with low standard deviation but high Sharpe ratio, then we would need a lot of leverage to hit the optimal point. 

If we label our original asset A, then now consider another asset B with excess mean 10%, standard deviation 10%, and thus Sharpe Ratio of 1.0. For this second asset, assuming it is Gaussian (and assets like this are normally left skewed in reality) the optimal standard deviation will be equal to the SR, 100%; and the leverage required to get that will be 100/10 = 10x. Which is a lot. Here is what happens if we plot the geometric mean against leverage for both assets.



Optimal leverage (x axis) occurs at maximum geometric mean (y axis) which is at leverage 2.5 for A, and at leverage 10 for B (which as you would expect has a much higher geometric mean at that point). 

But if we plot the geometric mean (y axis) against standard deviation (x axis) we can see the optimum risk target is 50% (A) and 100% (B) respectively:

x-axis: standard deviation, y-axis: geometric mean
 

Bringing in uncertainty


Now this would be wonderful except for one small issue; we don't actually know with certainty what our distribution of future returns will be. If we assume (heroically!) that there is no upward bias in our returns (which there could well be if, eg, they come from a backtest); that the 'data generating process' (DGP) for our returns will not change; and that our statistical model (Gaussian) is appropriate for future returns; then we are still left with the problem that the parameters we are estimating are subject to sampling error, or what I called in my second book 'Smart Portfolios' the "uncertainty of the past".

There are at least three ways to calculate estimation error for something like a Sharpe Ratio, and they are:

  • With a distributional assumption, using a closed form formula eg the variance of the estimate will be (1+.5SR^2)/N where N is the number of observations, if returns are Gaussian. For our 2560 daily returns and an annual SR of 0.5 that will come out to a standard deviation of estimate for the SR of 0.32; eg that would give a 95% confidence interval for annual SR (approx +/- 2 s.d.) of approximately -0.1 to 1.1
  • With non parametric bootstrapping where we sample with replacement from the original time series of returns
  • With parametric monte carlo where we fix some distribution, estimate the distributional parameters from the return series and resample from those distributions
Calculation: annual SR = 0.5, daily SR = 0.5/sqrt(256) = 0.03125. Variance of estimate = (1+.5*.03125^2)/2560 = 0.000391, standard deviation of estimate = 0.0197, annualised = 0.0197*sqrt(256) = 0.32
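The first bullet's arithmetic, as a quick sketch using the closed form formula (Gaussian assumption):

```python
from math import sqrt

annual_sr = 0.5
days_per_year = 256
n_days = 2560                                # ten years of daily returns

daily_sr = annual_sr / sqrt(days_per_year)   # 0.03125
var_of_estimate = (1 + 0.5 * daily_sr ** 2) / n_days
daily_sd = sqrt(var_of_estimate)             # ~0.0197
annual_sd = daily_sd * sqrt(days_per_year)   # ~0.32

# Approximate 95% interval for the annual SR: +/- 2 standard deviations
lower, upper = annual_sr - 2 * annual_sd, annual_sr + 2 * annual_sd
```

Even with ten years of daily data the interval is wide enough to include zero: a salutary reminder of how noisy Sharpe Ratio estimates are.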

For simplicity and since I 'know' the parameters of the distribution I'm going to use the third method in this post. 

(it would be equally valid to use the other methods, and I've done so in the past...)

So what we do is generate a number of new return series from the same distribution of returns as in the original strategy, and the same length (10 years). For each of these we calculate the final account value given various leverage levels. We then get a distribution of account values for different leverage levels. 

The full Kelly optimal would just find the leverage level at which the median (50% percentile) of this distribution of final account values was maximised; for compounded returns, maximising expected log wealth is the same as maximising the median of final wealth. Instead however we're going to take some more conservative distributional point, something less than 50%: for example 20%. In plain English, we want the leverage level that maximises the account value we'd fail to reach only two times out of ten in a future 10 year period (assuming all our assumptions about the distribution are true).
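A minimal sketch of that procedure (parametric monte carlo with parameter names and grid of my own choosing, not the linked code):

```python
import numpy as np

rng = np.random.default_rng(0)

MEAN, RF, STDEV = 0.15, 0.05, 0.20
YEARS, DAYS, N_PATHS = 10, 256, 2000

# One set of resampled Gaussian daily return paths, reused for every leverage
paths = rng.normal(MEAN / DAYS, STDEV / np.sqrt(DAYS), (N_PATHS, YEARS * DAYS))

def final_values(leverage):
    # Lever the daily returns, pay interest on the borrowed (L - 1) units,
    # then compound each path up to a final account value
    levered = leverage * paths - (leverage - 1) * RF / DAYS
    return np.prod(1 + levered, axis=1)

def best_leverage(quantile, leverages=np.arange(0.25, 5.01, 0.25)):
    # The leverage maximising the chosen percentile of final wealth
    outcomes = [np.quantile(final_values(L), quantile) for L in leverages]
    return leverages[int(np.argmax(outcomes))]
```

With this setup the median picks a leverage near the full Kelly 2.5x, while the conservative 10% percentile collapses to well under 1x and the optimistic 75% percentile pushes above 3x.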

Note this is a more sophisticated way of doing the crude 'half Kelly' targeting used by certain people, as I've discussed in previous blog posts. It also gives us some comfort in the case of our returns not being normally distributed, but where we've been unable to accurately estimate the likely left hand tail from the existing historic data ('peso problem').

Let's return to asset A and show the final value at different points of the monte carlo distribution, for different leverage levels:


x-axis leverage level, y-axis final value of capital, lines: different percentile points of distribution

Each line is a different point on the distribution, eg 0.5 is the median, 0.2 is the 20% percentile and so on. As we get more pessimistic (lower values of percentile), the final value curve slips down for a given level of leverage; but the optimal leverage which maximizes final value also reduces. If you are super optimistic (75% percentile) you would use 3.5x leverage; but if you were really conservative (10% percentile) you would use about 0.5x leverage (eg keep half your money in cash). 

As I said in my previous post your choice of line is down to your tolerance for uncertainty. This is not quite the same as a risk tolerance, since here we are assuming that you are happy to maximise geometric mean and therefore you are happy to take as much standard deviation risk as that involves. I personally feel that the choice of uncertainty tolerance is much more intuitive to most people than choosing a standard deviation risk limit / target, or god forbid a risk tolerance penalty variable.


Introducing withdrawals


Now we are all caught up with the past, let's have a look at what happens if we withdraw money from our portfolio over time. First decision to make is what our utility function is. Do we still want to maximise final value? Or are we relaxed about ending up with very little, or even nothing, at the end of time? For some of us, the answer will depend on how much we love our children :-) To keep things simple, I'm initially going to assume that we want to maximise final value, subject to that being at least equal to our starting capital. As my compounding calculations assume an initial wealth of 1.0, that means a final account value of at least 1.0.

Initially then I'm going to look at what happens in the non probabilistic case. In the following graph, the x-axis is leverage as before, and the y-axis this time is final value. Each of the lines shows what will happen at a different withdrawal rate. 0 is no withdrawal, 0.005 is 0.5% a year, and so on up to 0.2; 20% a year.


x-axis leverage, y-axis final value. Each line is a different annual withdrawal rate

At higher withdrawal rates we make less money - duh! - but the optimal leverage remains unchanged. That makes sense. Regardless of how much money we are withdrawing, we're going to want to run at the same optimal amount of leverage. 

And for all withdrawal rates of 17% or less, the final account value is at least 1.0 (our starting capital), so we can use the optimal leverage without any worries. For higher withdrawal rates, eg 20%, we can never safely withdraw that amount, regardless of how much leverage we use: we'll always end up with less than our starting capital, even at the optimal leverage ratio.
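The shape of those curves can be reproduced with a deterministic sketch (my own Gaussian approximation: geometric growth is the levered mean net of funding, less half the levered variance, with the withdrawal taken off pro-rata daily):

```python
def final_value(leverage, withdrawal, years=10,
                mean=0.15, rf=0.05, stdev=0.20, days=256):
    # Annual geometric growth at this leverage: levered mean net of funding
    # on borrowed money, less half the levered variance (Gaussian drag)
    growth = rf + leverage * (mean - rf) - 0.5 * (leverage * stdev) ** 2
    value = 1.0
    for _ in range(years * days):
        value *= 1 + (growth - withdrawal) / days
    return value

leverages = [0.5 + 0.25 * i for i in range(19)]   # grid from 0.5 to 5.0
best_no_withdrawal = max(leverages, key=lambda L: final_value(L, 0.0))
best_with_withdrawal = max(leverages, key=lambda L: final_value(L, 0.10))
```

The optimal leverage comes out at 2.5 whatever the withdrawal rate, and the break-even withdrawal at that leverage is about 17.5% a year, close to the 17-18% figure in the post.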

For this Sharpe Ratio level then, to end up with at least 1.0 of our account value, it looks like our safe withdrawal rate is around 17% (In fact, I calculate it later to be more like 18%).


Safe withdrawals versus Sharpe Ratio


OK that's for a Sharpe of 0.5, but what if we have a strategy which is much better or worse? What is the relationship between a safe withdrawal rate, and the Sharpe Ratio of the underlying strategy?  Let's assume that we want to end up with at least 1.0x our starting capital after 10 years, and we push our withdrawal rate up to that point. 

X-axis Sharpe Ratio, y-axis safe withdrawal rate leaving capital unchanged at starting level

That curve looks roughly quadratic, which kind of makes sense since we know that returns gross of funding costs scale with the square of SR: if our returns double with the same standard deviation, we double our SR; then we can double our risk target, which means we can use twice as much leverage, so we end up with four times the return. It isn't exactly quadratic, because we have to fund our borrowing.

The above result is indifferent to the standard deviation of the underlying asset as we'd expect (I did check!), but how does it vary when we change the other key values in our calculation: the years to run the strategy over and the proportion of our starting capital we want to end up with?

x-axis amount of starting capital to end up with, y-axis withdrawal rate, lines different time periods in years

Each of these plots has the same format. The Sharpe Ratio of the underlying strategy is fixed, and is in the title. The y-axis shows the safe withdrawal rate, for a given amount of remaining starting capital on the x-axis (where 1.0 means we want to end up with all our remaining capital). Each line shows the results for a different number of years.

The first thing to notice is that if we want to maintain our starting capital, the withdrawal rate will be unchanged regardless of the number of years we are trading for. That makes sense - this is a 'steady state' where we are withdrawing exactly what we make each year. If we are happy to end up with less of our capital, then with shorter horizons we can afford to take a lot more out of our account each year. Again, this makes sense. However if we want to end up with more money than we started with, and our horizon is short, then we have to take less out to let everything compound up. In fact for a short enough time horizon we can't end up with twice our capital as there just isn't enough time to compound up (at what here is quite a poor Sharpe Ratio). 

x-axis amount of starting capital to end up with, y-axis withdrawal rate, lines different time periods in years


With a higher Sharpe, the pattern is similar but the withdrawal rates that are possible are much larger. 


x-axis amount of starting capital to end up with, y-axis withdrawal rate, lines different time periods in years



Withdrawals probabilistically


Notice that if you really are going to consistently hit a SR of exactly 1, and you're prepared to run at full Kelly, then a very high withdrawal rate of 50% is apparently possible. But hitting a SR of exactly 1 is unlikely because of parameter uncertainty. 

So let's see what happens if we introduce the idea of distributional monte carlo into withdrawals. To keep things simple, I'm going to stick to my original goal of saying that we want to end up with exactly 100% of our capital remaining when we finish. That means we can solve the problem for an arbitrary number of years (I'm going to use 30, which seems reasonable for someone in the withdrawal phase of their investment career post retirement).

What I'm going to do then is generate a large number of random 30 year daily return series, drawn from a return distribution appropriate for a given Sharpe Ratio. For each of those I calculate what the optimal leverage would be (which, remember from earlier, is invariant to withdrawal rate), and then find the maximum annual withdrawal rate that means I still have my starting capital at the end of the investment period. This will give me a distribution of withdrawal rates.
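A simplified sketch of that monte carlo (my own version: leverage is fixed at the theoretical full Kelly level rather than re-optimised on each realised path, so the conservative percentiles come out lower than the post's figures):

```python
import numpy as np

rng = np.random.default_rng(1)

def withdrawal_rates(sharpe, rf=0.05, stdev=0.20,
                     years=30, days=256, n_paths=1000):
    # For each resampled 30 year path, compound at full Kelly leverage and
    # back out the withdrawal rate leaving the starting capital intact
    mean = rf + sharpe * stdev
    leverage = sharpe / stdev            # invariant to the withdrawal rate
    n = years * days
    rets = rng.normal(mean / days, stdev / np.sqrt(days), (n_paths, n))
    levered = leverage * rets - (leverage - 1) * rf / days
    daily_geo = np.prod(1 + levered, axis=1) ** (1 / n) - 1
    # A daily withdrawal of w / days offsets a daily geometric growth of
    # w / days, so the safe annual rate is the annualised geometric mean
    return daily_geo * days

rates = withdrawal_rates(0.5)
conservative, median = np.quantile(rates, [0.1, 0.5])
```

The median lands near the familiar 17-18% for a 0.5 Sharpe, while the 10% percentile is much lower.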

From that distribution I then take a different quantile point, depending on whether I am being optimistic or pessimistic versus the median.



X-axis: Sharpe Ratio. Y-axis: withdrawal rate (where 0.5 is 50% a year). Line colours: different percentiles of the monte carlo withdrawal rate distribution, eg 0.5 is the median, 0.1 is the very conservative 10% percentile.


Here is the same data in a table:

                     Percentile
SR       0.10    0.20    0.30    0.50    0.75
0.10      4.7     4.8     4.9     5.5    7.40
0.25      4.9     5.3     6.0     7.9   11.00
0.50      9.0    11.0    13.0    18.0   24.25
0.75     18.0    23.0    26.0    33.0   43.00
1.00     33.0    39.0    45.0    55.0   66.00
1.50     83.9    96.0   102.0   117.0  135.00
2.00    162.0   176.0   185.7   205.0  228.25

(Figures are annual withdrawal rates in %.)

We can see our old friend 18% in the median 0.50 percentile column, for the 0.50 SR row. As before, we can withdraw more with higher Sharpe Ratios.

Now, as you would expect, as we get more optimistic about the quantile point, we would use a higher withdrawal rate. For example, for a SR of 1.0 the withdrawal rates vary from 33% a year at the very conservative 10% percentile, right up to 66% at the highly optimistic 75% percentile.

As I've discussed before, nobody should ever use more than the median 50% (penultimate column), which means you're basically indifferent to uncertainty; and I'd be very wary of the bottom few rows with very high Sharpe Ratios, unless you're actually running an HFT shop or Jane Street, in which case good luck.
Footnote: All of the above numbers were calculated with a 5% risk free rate. Here are the same figures with a 0% risk free rate. They are roughly, but not exactly, the above minus 5%. This means that for low enough SR values and percentile points we can't safely withdraw anything and expect to end up with our starting capital intact.

                     Percentile
SR       0.10    0.20    0.30    0.50    0.75
0.10      0.0     0.0     0.0     0.4     2.8
0.25      0.0     0.6     1.4     3.6     7.5
0.50      3.6     5.9     8.1    12.0    19.0
0.75     13.0    17.0    21.0    28.0    38.0
1.00     29.0    34.0    39.0    48.0    62.0
1.50     79.0    90.0    96.0   110.0   131.0
2.00    150.9   167.0   178.0   198.5   223.0

(Figures are annual withdrawal rates in %.)



Conclusion



As a procrastination technique (I'm supposed to be writing my second book) I've been watching the videos of Anton Kriel on youtube. For those of you who don't know him he's an english ex goldman sachs guy who retired at the age of 27, "starred" in the post modern turtle traders based reality trading show "million dollar traders", and now runs something called the institute of trading that offers very high priced training and mentoring courses.

I find him strangely compelling and also very annoying. He is always sniggering and has a permanent smug look on his face. The videos are mostly set in exotic places where we are presumably supposed to envy Anton's lifestyle which seems to involve spending a lot of time flying around the world - not something I'd personally aspire to.

I can't comment on the quality of his education but at least he has the pedigree. He also has some interesting opinions about non trading subjects but then so do most trading "gurus". Mostly on trading, and on the financial industry generally, from what I've seen he talks mostly sense.

Anyway, one interesting thing he said is that you shouldn't use trading for income but only to grow capital. Something I mostly agree with.

http://www.elitetrader.com/et/index.php?threads/how-much-did-you-save-up-before-you-decided-to-trade-full-time.298251/
I'm quite a conservative person, so I'd probably conservatively assume my SR was around 0.50 (in backtest it's much higher than that, and even in live trading it's been a little bit higher), and use the most conservative 10% percentile. That implies that with a withdrawal rate a shade under 4% plus the risk free rate, I'd still have my starting capital intact after any given period of time.

Since I've used a risk free rate of 5%, that implies withdrawing the risk free rate plus another 4% on top, for a total of 9%.

If you're more aggressive, and have good reason to expect a higher Sharpe Ratio, then you could consider a withdrawal rate up to perhaps 30%. But this should only be done by someone with a track record of achieving those kinds of returns over several years, and who is comfortable with the fact that their chances of maintaining their capital are only a coin flip.

Note that one reason this is quite low is that in a conservative 10% quantile scenario I'd rarely be using the full Kelly (remember 0.50 SR implies a 50% risk target); this is consistent with what I actually do which is use a 25% risk target. With a 25% risk target, and SR 0.5 in theory I will make the risk free rate plus 12.5%. So I'm withdrawing around a third of my expected profits, which sounds like a good rule of thumb for someone who is relatively risk averse. 
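That paragraph's arithmetic, spelled out (the ~4% excess withdrawal figure comes from the tables above; the "third of expected profits" compares excess withdrawal to excess return):

```python
sharpe, risk_target, rf = 0.5, 0.25, 0.05

expected_excess_return = sharpe * risk_target     # 12.5% a year over cash
withdrawal_over_cash = 0.04                       # ~4% over cash (10% percentile)
fraction_of_profits = withdrawal_over_cash / expected_excess_return
# ~0.32: withdrawing roughly a third of expected excess profits
```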

Obviously my conservative 9% is higher than the 4% suggested by most retirement planners (a figure which is a bit arbitrary, as it doesn't seem to change when the risk free rate changes); but that is for long only portfolios, where the Sharpe probably won't even be as good as 0.50, and more importantly where leverage isn't possible. Getting even to the half Kelly risk target of 25% isn't going to be possible without leverage, unless your portfolio contains nothing but small cap stocks or crypto.... it will be impossible with 60:40 for sure! But also bear in mind that my starting capital won't be worth what it's currently worth in real terms in the future, so I might want to reduce that figure further.