Risk. Love it or hate it, as a trader you have to deal with it, even though none of us really like it. No, we'd all prefer to be one of those mythical traders you hear about on YouTube or Instagram, who consistently make $1000 a day and never lose any money. Sadly I am not in that unicorn-like category; and as only real people read this blog, neither are you - unless you are one of the HFT fund managers who read this blog purely for comic relief.
("Dimitri! This guy is excited about his Sharpe Ratio... wait for it... it's 1.2! No, not daily Sharpe, annualised!! So funny!")
Risk is important, because without knowing what our risk is we can't size our positions appropriately, or accurately know whether a -2% day is an aberration or well within statistical expectations.
("Ha, imagine having a 2% down day! What a loser! I can't even imagine having a day where you make less than 2%!")
As regular readers know I prefer to measure risk according to the annualised daily standard deviation of returns. This makes some assumptions about the distribution of returns that are extremely heroic for underlying assets, but not too bad once you have normalised returns for recent standard deviations, and hence sort of okay for a trading system that dynamically adjusts its risk.
I've previously written about why I think 'trade based' risk management is flawed, and why the ATR is a reasonable approximation to my preferred method.
However there is one form of risk measurement that I've never really dealt with, except in very short answers to questions on this blog, short answers like "No. Never do that". I'm talking about maximum drawdowns.
Maximum drawdowns are an extremely popular way of measuring risk, and of using it to size positions. The latter is usually done in the context of the entire backtested account curve, rather than for individual instruments.
But why do I keep saying "No. Never do that" (where 'that' is using maximum drawdowns to size positions)? Well, in this post I explain where my bias against this particular statistic comes from.
But... spoiler alert... I find that when properly used maximum drawdowns are not as bad as you think!
Why do people love max drawdowns?
First, in the unlikely event there is someone reading this who doesn't know, a maximum drawdown is defined as follows. Let's start by defining a drawdown. It's when you look at your cumulated account curve, either for live or backtested trading, and find the most recent high watermark. Then you look at where you are now, and the difference between these two points is your drawdown. You can measure both the size and length (in days) of this drawdown, as well as of any previous drawdowns that have now finished (because the strategy broke the previous high water mark).
If you do this exercise for your backtest, then you will find that you have a history of drawdowns, and the largest one (in size) is the maximum drawdown.
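Measuring the drawdown series and its maximum is only a few lines of code. Here's a minimal sketch (the function names are mine, and the account curve is assumed to already be in cumulated percentage terms, a choice discussed further below):

```python
import numpy as np

def drawdown_series(cum_returns):
    """Drawdown at each point: distance below the running high watermark."""
    high_watermark = np.maximum.accumulate(cum_returns)
    return cum_returns - high_watermark

def max_drawdown(cum_returns):
    """Largest (most negative) drawdown over the whole curve."""
    return drawdown_series(cum_returns).min()

# Example: a cumulated % return curve that rallies, dips, and recovers
curve = np.array([0.0, 2.0, 5.0, 3.0, 1.0, 4.0, 6.0])
print(max_drawdown(curve))  # -4.0: the fall from the high of 5 down to 1
```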
So far, so good. There is no harm in measuring this. It's what you choose to do next that could be dangerous.
What you do next is something like this "Well my maximum drawdown in backtest is only 20%, so I can afford to leverage up my strategy by a factor of 2.5, and still survive the worst possible drawdown with one half of my capital".
This is, in fact, crazy talk. I'll explain why, but first we need to dig into how we actually measure maximum drawdowns, and get a feel for how they relate to other risk statistics.
A slightly nerdy detour: measuring maximum drawdowns
First we need to make a slight detour to discuss how maximum drawdowns should be measured. This is slightly technical, and it relates to a previous post I've done on whether returns should be measured in log space/cumulated percentages.
So we measure percentage returns as the change in account value, divided by capital at risk. But what is capital? The simplest possible interpretation is that it's the current value of your account. All your profits, and all your losses, will affect the size of your capital (this is what I describe as full compounding in this post).
This leads to some difficulty in measuring your maximum drawdown. Suppose for example that you start with £10,000; but then you lose 1% of your capital for the next ten days. The first 1% is £100, but the next is just £99, whilst the third is only £98. In total you lose £956, or 9.6%.
So what's your maximum drawdown? Is it 10% or 9.6%?
Now suppose a few years later you get another drawdown, this time from a high point of £20,000. Now you lose 0.5% a day for ten days. Your total loss is £977.
What is your drawdown? Is it 977/20000 = 4.9%? Is it 0.5%*10 = 5%? Or is it 977/10000 = 9.8%? Is this now the maximum drawdown, or is it the earlier drawdown of 9.6%?
Another way to run your account is to do what I do, which is to not compound any gains above the high water mark, but to reduce capital when I make subsequent losses. In this case my capital would still be £10,000 for the second drawdown, so the drawdown would be either 4.9% or 5%; in either case the first drawdown would now clearly be the maximum drawdown.
Finally, you could run your account at a fixed capital, in which case things are easy: the first drawdown is 10% and the second is 5%. However that isn't compatible with sensible position sizing methodologies: you should reduce bet size as you lose money.
In practice what I suggest doing in backtests is calculating account curves as cumulated percentages (which is equivalent to a log scale), in other words using a notional fixed capital. So the first drawdown would be 10% (the sum of ten 1% losses), and the second would be 5% (the sum of ten 0.5% losses). The first is now the maximum drawdown. This is slightly conservative, since if we reduce our capital when we make losses, the actual drawdowns will be smaller.
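To make the arithmetic concrete, here's the £10,000 example above worked through in plain Python, contrasting full compounding with the fixed notional capital (cumulated percentage) convention:

```python
start = 10_000.0
daily_loss = 0.01  # lose 1% a day for ten days

# Full compounding: each 1% comes off a shrinking account
compounded = start * (1 - daily_loss) ** 10
dd_compounded = (start - compounded) / start   # ~9.6% (a loss of ~£956)

# Fixed notional capital (cumulated percentages): just sum the daily losses
dd_fixed = daily_loss * 10                     # 10%

print(round(dd_compounded * 100, 1), round(dd_fixed * 100, 1))  # 9.6 10.0
```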
For example, using this method my largest drawdown in live trading has been around 20%. But on the FundSeeder platform, which uses the current account value for capital, it's only 18%. Not that much difference perhaps, but the gap becomes much bigger as losses mount.
Because of this calculation quirk you will see maximum drawdown figures of more than 100% in the rest of this post, which of course couldn't actually happen in practice; as you lost more money you'd be reducing your capital and the actual dollar amount lost wouldn't be all your capital. But in practice something that shows up as a 100% drawdown here will still be pretty awful in reality.
There is another reason for doing it this way which will become apparent below.
The relationship between max drawdowns and other risk/return measures
Now we know how to measure maximum drawdowns, let's get a feel for how they relate to other risk measurements. We could do this using a specific set of one or more backtests, but it's actually better to use simulated data so we can draw general conclusions rather than ones that are highly specific to the way a particular strategy turned out. In fact in a very old post, I did exactly that for average drawdowns. I didn't look at maximum drawdowns specifically, so let's do that now.
Here's some code which uses the accounting function from pysystemtrade to make life simpler. What this does, initially, is generate a series of account curves with some given properties (length in years, expected annualised Sharpe Ratio, expected standard deviation, and expected skew). For each of these account curves we calculate the worst drawdown (using the cumulative method). We then take the median value of those drawdown figures to give us an expectation for the worst drawdown over the given period.
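Since the pysystemtrade version is a little involved, here is a minimal self-contained numpy sketch of the same exercise. Everything here is illustrative: Gaussian daily returns (so no skew parameter), 256 trading days a year, and function and parameter names of my own invention:

```python
import numpy as np

def median_worst_drawdown(years=10, sharpe=0.5, ann_std=0.20,
                          n_curves=1000, days_per_year=256, seed=42):
    """Simulate Gaussian daily return curves (cumulated %, i.e. fixed
    notional capital) and return the median of the worst drawdowns."""
    rng = np.random.default_rng(seed)
    daily_std = ann_std / np.sqrt(days_per_year)
    daily_mean = sharpe * ann_std / days_per_year
    n_days = years * days_per_year

    worst = []
    for _ in range(n_curves):
        returns = rng.normal(daily_mean, daily_std, n_days)
        curve = np.cumsum(returns)              # cumulated percentages
        drawdowns = curve - np.maximum.accumulate(curve)
        worst.append(drawdowns.min())
    return np.median(worst)

print(median_worst_drawdown())  # a median worst drawdown, somewhere around -0.45
```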
For example, if we do this for a 10 year backtest, SR 0.5, skew 0 (so Gaussian: the results don't change much for other values of skew), and a standard deviation of 20%, then I get a median drawdown of ~45%. Because of randomness you may get a slightly different result.
An obvious question is how we'd expect these numbers to vary given different inputs. So let's find out.
Max drawdown and standard deviation
Let's begin with a very dull picture indeed.
|X-axis: standard deviation of returns, annualised % per year. Y-axis: Median worst 10 year drawdown as a percentage using fixed capital. Period: 10 years. SR: 0.5. Skew 0.0|
Keeping all the other numbers as above and just varying the standard deviation, we get a linear relationship (not exactly linear, purely because of the inherent noise in the random backtests). Doubling your vol target will double your expected drawdown. A 10% vol gives you an expected drawdown of 23%, whereas a 20% vol pushes you down to around 46%.
Note this is another reason for using cumulated % returns; if I hadn't used them we wouldn't get such a clean linear result.
With that in mind, let's stick to using a 20% vol target with the knowledge that we can very easily generalise our results to other vol levels just by applying a pro-rata adjustment.
Max drawdown and Sharpe Ratio
Now let's turn to the Sharpe Ratio:
|X-axis:annual Sharpe Ratio of returns. Y-axis: Median worst 10 year drawdown as a percentage using fixed capital. Period: 10 years. Standard deviation: 20% per year. Skew 0.0|
Unsurprisingly, a higher Sharpe Ratio means a smaller expected maximum drawdown. Of course we have to ask ourselves how realistic those Sharpe Ratios greater than 1.0 are; something I will return to later (HFT people: I'm not talking to you. Why are you still here? Sniggering at the back, as usual).
Max drawdown and period length
How are maximum drawdowns affected by the length of time involved?
|X-axis: Length of backtest in years. Y-axis: Median worst drawdown as a percentage using fixed capital. SR 0.5, Standard deviation: 20% per year. Skew 0.0|
Clearly the longer we are trading the more likely we'll see a big drawdown. Over one year we can expect to see a maximum drawdown of under 20%; about the same size as the standard deviation of returns. But over 10 years the drawdown could easily be twice as big.
There is something clearly nonsensical about the idea of using a raw maximum drawdown from a backtest to set capital sizing. Someone with a one year backtest (and believe me, such people exist!) will confidently set their capital sizing on the assumption that their maximum drawdown is, say, 50% (which would imply a vol target of around 55%).
But if they trade for two years they can expect to get a drawdown that is a third larger: around 67%. If they trade for ten years, then they can expect to go bust, with an expected drawdown of 117% (they won't actually lose more than 100% if they adjust their capital downwards as they lose money, but they will certainly lose a hell of a lot).
Of course that is easily fixed, as you can just apply a correction factor to the maximum drawdown in your backtest using the above graph as a guide.
Beware of backtests bearing attractively small drawdowns
Apart from the minor niggle about backtest length (which is easily corrected), I still haven't really explained why it is so bad to use maximum drawdowns.
Come on, I hear you say, using your maximum drawdown to calibrate your risk is so satisfyingly simple. Look (you argue), the most I can lose at this risk level is half my capital. It's right there in the backtest.
But you can't trust your backtest, even if it's more than a year in length.
- There are the usual stupid things people do when backtesting: survivorship bias, forward looking data... and so on.
- There is the sin of overfitting in all its glorious flavours, which will bias up your past returns, even if you were extremely careful (and most of us aren't careful enough).
- Finally there is the likelihood that the future just won't be quite as good as the past, either because your fancy HFT algo has decayed in the last five minutes, or - at the other end of the time scale - because the massive tailwind from falling interest rates is no longer there.
Now to be fair these are problems that affect other backtest parameters as well, but specifically for maximum drawdowns (and other 'worst loss' statistics) there is a good chance that your backtest doesn't accurately reflect the worst that can happen. Does your equity index data cover 2010? What about 2008? And 1987? You might be with me so far, but I doubt very much your 1 minute bar data goes back to 1929! And what may look okay on daily returns could be hiding an intraday move that would have blown well past your maximum drawdown.
And even if you have included the worst of the past, that may not accurately reflect the potential worst of the future. EURCHF had never fallen by 20% in a matter of minutes, until it did in 2015.
The wide, wide sampling distribution of maximum drawdowns
Were this any other post, at this point I'd trot out the other problem with backtests: that the parameters we estimate from them are subject to sampling uncertainty. And the uncertainty of Sharpe Ratios in particular is hideously large. If we don't know exactly what the SR was in the past, well, we have no chance of confidently knowing it in the future. And since the SR affects the expected maximum drawdown, we'd also expect the sampling uncertainty of maximum drawdowns to be big.
So let's go straight to the horse's mouth and model the sampling distribution of maximum drawdowns. That's easily done, since we generated such a distribution earlier, before taking its median. Here is a distribution for our original set of parameter values:
|Distribution of maximum drawdown estimate. 10 year backtests. SR 0.5, Standard deviation: 20% per year. Skew 0.0|
Notice how wide this distribution is; in comparison the estimate for standard deviation (my preferred measure of risk) would be extremely tight. That's because we're taking very few pieces of data into account when calculating the maximum drawdown; so a few outliers here and there can produce some extreme results.
Remember this parameter set had a median max DD value of around -46%, but you can see from the left skew that the mean will be worse: around -49%. In fact there are some pretty evil values in that left tail (though remember, none are really more than 100%), which is why I'd be very cautious of just blithely taking the drawdown from one backtest and extrapolating a safe level of risk.
For example, with a backtested 10 year max drawdown of 30% you might think you can increase your risk target to say 45%. But you might just have been lucky (a 30% max DD would put you just outside the top 10% of the observations above), and instead you end up with a live 10 year account curve that is in the bottom 10% of outcomes (which would be a maximum drawdown of over 70% before allowing for the increased risk target, that would put you in theoretical bankruptcy and real trouble).
This kind of exercise is very useful to calibrate your expectations in live trading, given your backtest performance, by comparing your actual max drawdown with the distribution of what could be expected given the SR in your backtest, length of time you have been trading, and vol target.
For example, take my own account. At one point in 2020 I was down nearly 20%, the maximum in the nearly 7 years since I started live trading my own money. It's good to know that I have actually been extremely lucky: using my standard deviation target of 25% and assuming my backtest Sharpe Ratio of 1.0 is accurate, that puts a 20% max drawdown at around the 99th percentile of outcomes. Using my slightly lower realised standard deviation and higher achieved live trading SR, I'm still around the 92nd percentile.
Either way I shouldn't be too disheartened if I subsequently get a maximum drawdown of 42% (the median expectation over ten years using my backtest SR and vol target) or even higher in the future.
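This calibration exercise is easy to sketch with the same kind of simulation (again Gaussian returns and 256 day years; the function name is mine). Using the figures above - roughly 7 years of trading, a 25% vol target, and a backtest SR of 1.0 - we can ask how rare a worst drawdown of only 20% would be:

```python
import numpy as np

def worst_dd_distribution(years, sharpe, ann_std,
                          n_curves=1000, days_per_year=256, seed=1):
    """Worst drawdown (cumulated %) for each of n_curves random curves."""
    rng = np.random.default_rng(seed)
    daily_mean = sharpe * ann_std / days_per_year
    daily_std = ann_std / np.sqrt(days_per_year)
    returns = rng.normal(daily_mean, daily_std, (n_curves, years * days_per_year))
    curves = np.cumsum(returns, axis=1)
    return (curves - np.maximum.accumulate(curves, axis=1)).min(axis=1)

# Where does a realised 20% worst drawdown sit after ~7 years of trading,
# with a 25% vol target and a backtest SR of 1.0?
dist = worst_dd_distribution(years=7, sharpe=1.0, ann_std=0.25)
frac_better = (dist >= -0.20).mean()  # runs that never drew down more than 20%
print(f"{frac_better:.0%} of runs had a worst drawdown of 20% or smaller")
```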
Drawdowns and Kelly
So maximum drawdowns have problems. But to be fair, most methods for setting risk targets have the same problems.
Remember that the Kelly criterion is the way I have always said we should set risk, and that in simple terms we set the vol target equal to the Sharpe Ratio. So for example, with the SR of 0.5 we've used in our canonical examples that would equate to an annual standard deviation target of 50%. And that's usually kind of crazy so a good rule of thumb is to use half that value (though in the post linked to above I use a more sophisticated method).
As I note at length in my two books on trading there are other constraints that may reduce the target risk below what Kelly says it should be.
But Kelly shares many of the problems of maximum drawdowns. It relies on Sharpe Ratio estimates, which in turn rely on unbiased backtests without overfitting - just as maximum drawdown does. We also don't have precise estimates for Sharpe Ratios; like maximum drawdowns, our estimates have sampling uncertainty, which in turn means that our half Kelly vol target estimates also have sampling uncertainty.
So let's compare half Kelly with another capital rule, where we set the vol target such that we expect over 10 years to get a maximum drawdown of exactly 50% of our capital. It sounds like this is fairly close to half Kelly in spirit, since in both cases we are setting our risk at half of the absolute maximum it could possibly be and still survive.
For example, with a Sharpe Ratio of 0.5 we already know that the median maximum drawdown with 20% standard deviation over 10 years is 46%. So our risk target could be slightly higher; about 22% (easy to calculate because of the linear scaling property of maximum drawdowns to standard deviation). The half Kelly risk level would be slightly higher than that, at precisely half the Sharpe Ratio of 0.5: 25%.
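The arithmetic behind those two numbers, as a quick sketch (the 46% median drawdown figure comes from the simulations above):

```python
# Max DD rule: scale vol so the expected 10 year worst drawdown is 50%.
# Linearity of drawdowns in vol (shown earlier) makes this a one-liner.
base_vol = 0.20          # vol used in the simulation
median_worst_dd = 0.46   # median 10 year worst drawdown at that vol, SR 0.5

dd_rule_vol = base_vol * 0.50 / median_worst_dd   # ~22%
half_kelly_vol = 0.5 * 0.5                        # half the SR: 25%

print(round(dd_rule_vol, 3), half_kelly_vol)  # 0.217 0.25
```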
Here are the results from some other Sharpe Ratios over ten year backtests:
|X-axis expected annualised Sharpe Ratio. Y-axis: appropriate risk target, annual standard deviation of returns. Blue line: Risk targeting based on Max DD. Orange line: risk targeting based on Kelly. Skew 0, 10 year backtest|
Now let's do the same with a 30 year backtest:
|X-axis expected annualised Sharpe Ratio. Y-axis: appropriate risk target, annual standard deviation of returns. Blue line: Risk targeting based on Max DD. Orange line: risk targeting based on Kelly. Skew 0, 30 year backtest|
Well, that is interesting. It turns out that, except for very low Sharpe Ratios and short backtests, our maximum drawdown capital rule is usually more conservative than half Kelly. That's mainly because the fixed capital measurement method overstates drawdowns: our actual maximum drawdown will never be as large as it appears there, so a rule calibrated on those inflated figures ends up erring on the side of caution.
But we also need to consider the uncertainty in both estimates, rather than just looking at the median worst drawdown and expected Sharpe Ratio, and using those point estimates to calculate the required vol target. Let's stick to a 10 year backtest and a SR of 0.5, since both methods give fairly similar capital targets: around 22% and 25% for worst DD and Kelly respectively.
Here is a distribution of the correct vol target using the 'lose half at maximum drawdown' methodology:
|Distribution of optimal vol targets calculated as 0.2*0.5/Max_dd where Max_dd is the worst drawdown over 10 years on a random series of data calculated using a 20% annual standard deviation of returns|
You can see that the median comes in at approximately 23% as expected. But there are times when the vol target is much lower, due to seriously large drawdowns, and other times when it is bigger because the drawdowns don't come out too badly.
And here it is using the half Kelly criterion (using the same set of 1000 random account curves so the results are precisely comparable):
|Distribution of optimal vol targets calculated as 0.5*SR where SR is the realised Sharpe Ratio over 10 years for the same bootstrap runs as the previous plot|
Again the expected mean here is exactly 25% (half of 0.5; the actual distribution will have a slightly different mean because of randomness) but sometimes it's negative when the strategy loses money, and sometimes it's much larger when the SR happens to be excellent for a given random set of returns.
However comparing the two distributions is interesting. Very interesting. For a start the distribution using maximum drawdown is tighter; the 10% and 90% quantiles are 14% and 33%; versus 6% and 45% for half Kelly. The vol target is also never negative for the max drawdown method - it can't be by construction; whereas there are a few outcomes when the Sharpe Ratio is negative and hence the half Kelly vol target is also negative.
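Here's a minimal sketch of how the two distributions can be generated from the same set of random curves (Gaussian returns again; the exact quantiles will wobble a little with the random seed):

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, root_days = 256 * 10, 16          # ten years; sqrt(256) annualisation
daily_mean = 0.5 * 0.20 / 256             # SR 0.5 at 20% annual vol
daily_std = 0.20 / root_days

dd_rule, half_kelly = [], []
for _ in range(1000):
    r = rng.normal(daily_mean, daily_std, n_days)
    curve = np.cumsum(r)                  # cumulated percentages
    worst_dd = -(curve - np.maximum.accumulate(curve)).min()
    dd_rule.append(0.20 * 0.50 / worst_dd)        # 'worst drawdown = 50%' rule
    half_kelly.append(0.5 * (r.mean() / r.std()) * root_days)  # half realised SR

for name, targets in (("max DD rule", dd_rule), ("half Kelly", half_kelly)):
    q10, q90 = np.quantile(targets, [0.1, 0.9])
    print(f"{name}: 10% quantile {q10:.0%}, 90% quantile {q90:.0%}")
```

Note that the max DD rule can never produce a negative vol target by construction, whereas half Kelly goes negative whenever the realised SR does.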
So if you're going to do something a bit dumb; just take the account curve from a single backtest with a decade of data, well then you are probably better off using the worst drawdown method to do it rather than some Kelly based method. But better still: don't be dumb!
Don't get me wrong, it's useful to know your likely maximum drawdown: expectations should be managed.
But I wouldn't just pull a number off a backtest and assume that you can safely use that value to determine how much capital you need. There is way too much uncertainty, and backtests just can't be trusted that much.
Equally, you shouldn't do that with the Kelly criterion either! Don't just take the Sharpe Ratio from a single backtest, assume it's realistic, and halve it to be 'conservative' to get your vol target. And in fact if you are going to just take a number off a single backtest to calculate a vol target, well, it turns out you are better off using the 'set worst drawdown to 50%' method than half Kelly. It's a bit more conservative and has a narrower distribution. That was the big surprise for me when writing this post.
Personally I'm going to stick to using the Kelly criterion, as it's what I know, and I like the intuition and simplicity. But I use Kelly with a considerable helping of distributional uncertainty, plus a dollop of backtest skepticism. Having said that, there is no harm in checking the distribution of worst drawdowns given the statistics of your account curve. If it comes out at a median worst drawdown of more than a 50% loss, well, you may want to consider reining your vol target back in - regardless of what Kelly says.
Postscript/footnote (added 6th January 2021)
Following my original publication I had some very interesting feedback on twitter from @MichaelENewton1, making an additional point I hadn't thought of:
@MichaelENewton1: Another big difference you didn't mention (unless I missed it) is that using Sharpe and Kelly is constantly adjusting as new data comes in, whereas max drawdown will be stable for years until a new max drawdown appears and your system has a major readjustment.
Me: Replying to @MichaelENewton1
That's a very good point. Although if you have a lot of data already the SR won't change that much.
@MichaelENewton1: That's my point. The SR and Kelly will constantly adjust but never by very much. Max drawdown won't adjust at all until a single event when it has to adjust drastically. I personally prefer the former option, even though I don't do it quite that way.
(Michael is one of those annoying people who appears to be good at at least two things: quant finance and history - his actual day job).