Tuesday, 7 January 2025

Do less liquid assets trend better, or is it that they are just more diversified?

As most of you know, one of the many projects/things I am involved with is the TTU Systematic Investor podcast series, where I'm one of the rotating cast of co-hosts.

On a recent episode (at 24:05) we discussed the reasons why 'alt' CTAs tend to do better than traditional CTAs. Examples of alt-CTAs mentioned in that segment are the Man-AHL Evolution fund, which I was heavily involved with when I was at AHL, and the Florin Court product, which is run by some ex-AHL colleagues.

(Other funds are available, and this is not an endorsement or financial advice, which I am not regulated to provide. It may be utterly illegal for you to even be aware of these products in your jurisdiction, never mind invest in them, and that is your problem not mine.)

An 'alt-CTA' is one that trades non traditional markets, but in a traditional way (eg mostly by trend following). These could be less liquid futures markets, but are more likely to be non futures markets like options, OTC derivatives or cash equities. In this article I'm going to focus on the 'less liquid futures' definition of alt-, because that is the data I happen to have. This means that the analysis is also analogous to one of the classical issues in financial economics - the small cap effect in equities.

In that episode I mentioned some research I had once done on that very topic; albeit many, many years ago, and that document certainly isn't available on my blog. So I thought it worth redoing this exercise.


Reasons why alt CTAs might do better

There are a number of reasons why one CTA might outperform another, but we're going to focus on just three here:

  • more diversification (the products they trade have lower correlations with each other, and/or nice co-skewness properties)
  • better pre-cost performance from the products they are trading
  • lower costs

Now of course we would expect higher costs from less liquid futures; the key question is whether we get enough extra pre-cost performance to compensate.... or no extra performance at all. In which case, is the extra alt-CTA juice coming from the diversification properties of the alt-markets (either linear correlation or something funkier in the higher moments)? Or will my simple analysis fail to uncover any extra alt-performance, either because the alt-CTAs have some extra magic, or because their special black magic power can only be found in non futures markets? Or because they've just been lucky?

In any case we'll see if the equivalent of the 'small cap' effect in stocks is present in futures, or if it's something that was around in the past but has gone.

Note: There is some debate about whether the small cap effect, either outright, or in combination with the value effect, is still a thing.


What we are measuring

We need a way of measuring:

  • the liquidity of futures
  • the trend following performance

To keep things simple, for trend following performance I'm going to use the Sharpe Ratio of an EWMAC16,64 trend following continuous forecast with my usual vol based position sizing. To calculate the Sharpe Ratio for a given period (eg a year), I'll use the annualised average daily percentage return divided by the expected annual percentage standard deviation. So this is a Sharpe Ratio based on the vol targeted, not the realised vol. This is because for short periods we might have a weak signal producing a high SR on a contract we didn't actually make any significant money out of.
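As a rough sketch of that measure (the function and parameter names here are illustrative, not lifted from my actual code, and the 20% vol target is just a placeholder):

```python
import pandas as pd

BUSINESS_DAYS_IN_YEAR = 256

def period_sharpe_ratio(daily_perc_returns: pd.Series,
                        annual_vol_target: float = 0.20) -> float:
    # Annualised average daily percentage return, divided by the *expected*
    # (targeted) annual standard deviation rather than the realised vol
    annualised_mean = daily_perc_returns.mean() * BUSINESS_DAYS_IN_YEAR
    return annualised_mean / annual_vol_target

# e.g. SR for 2020 only, given a Series of daily sub-strategy returns:
# period_sharpe_ratio(ewmac16_64_returns.loc["2020"])
```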

For futures liquidity, I'm going to use the 30 day rolling average of daily volume in $ million of annualised risk units for the contract that currently has the highest volume. That is the same measure I track daily here. And then I'm going to log(x) this volume, as these figures vary by many orders of magnitude.

Note: I currently set this measure at a minimum of $1.5 million to trade a given future. 

Note: The definition of $ annualised risk units is the number of contracts of volume, multiplied by the annual standard deviation in price units, multiplied by the $ value of each price unit.
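In code terms that definition looks something like this (again, the names are purely illustrative):

```python
def dollar_annualised_risk_volume(contracts_of_volume: float,
                                  annual_price_std: float,
                                  dollar_value_per_price_unit: float) -> float:
    # contracts of volume x annual std dev in price units x $ value of each
    # price unit, expressed here in $ millions
    return (contracts_of_volume * annual_price_std
            * dollar_value_per_price_unit) / 1e6

# The liquidity measure is then the log of a 30 day rolling average of this,
# for the contract (delivery month) with the highest volume, e.g.:
# log_volume = np.log(daily_risk_volume_series.rolling(30).mean())
```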

There could be other ways of measuring liquidity; for example open interest, or the cost of trading. I'm wary of using open interest since there are contracts with large open interest and small volume, and the reverse is also true. Personally I think unless you are a massive trader the size of the volume is more important than the open interest. I don't want to use cost of trading as a measure of liquidity, since I will be analysing that separately.

Normally when I do this kind of analysis, I exclude instruments for all kinds of reasons including because they are too expensive or illiquid to trade. In this case I don't want to do that. I will however exclude instruments in my data set that are:

  • Duplicates. For example, I don't analyse both the micro and mini S&P 500. The instruments in my dataset are those which meet my minimum requirements for liquidity but have the smallest contract size. Note that the definition of which is the duplicate contract to trade could have been different in the past. For example, immediately after the micro future came into being it wouldn't have met my requirements for liquidity, so I would have in practice used the mini future. This will affect the results in a small number of edge cases, but mostly for high volume instruments.
  • Ignored. These instruments either have garbage data, or they are spread instruments.

I won't exclude instruments that have:

  • Trading Restrictions - mostly ICE markets for which I don't have access to live data so don't currently trade, and certain US derivatives I'm banned from trading
  • 'Bad markets' - these are those that are too expensive or illiquid for me to trade - I want to see if there are size effects so I want to keep these. 

This gives me 205 instruments to analyse. Finally, I have around 12 years of data since I don't have volume data prior to 2013 in my dataset. 


Results across all years

Let's start by just plotting the average volume across all the available data, versus the pre-cost trend following p&l, by instrument.



That isn't especially suggestive of a strong relationship; although our eyes are drawn to the outlier in the top left (US housing equity sector if you care). If I do this as a 'bin cross' plot, which shows statistical significance (explained in more detail in chapter 12 of AFTS), then we can see there is really nothing there - in fact there is a slight tendency for very liquid markets to have a higher trend following SR:




What about costs?

Perhaps a slightly better relationship here - lower volume means higher costs - but not super consistent. There are instruments with very low volume but not bad costs, such as the CLP and CZK FX markets at the extreme left. However these costs are based on sampled bid-ask spreads so are unlikely to be indicative of what you could actually achieve trading any size.

The cross plot shows that very illiquid markets do indeed cost more, but beyond that the relationship is relatively non linear. There is a 'zone of increasing costs' up to around $20m of volume in annual risk units, but beyond that risk adjusted costs are relatively flat. Again, this applies to bid-ask spreads only (and commissions), and for institutional size traders the 'zone of increasing costs' would apply to more instruments.



Trend following p&l: Year by year results

This kind of market analysis has a fatal flaw; it doesn't account for the fact that some instruments will have been trading across the entire 12 year dataset whilst others will only have a few years of data. It also doesn't account for time series effects such as a given instrument seeing an increase or decrease in volume over the relevant period. To get around this, instead I'm going to break the results down into year by year results. So each point on the following scatter plot is the SR and volume for a given instrument and a given year. 

There is little point doing this for costs, since the costs in my backtest aren't actual costs, but here are the results for pre-cost returns. I haven't bothered with a scatter plot as it will be insanely noisy; here is the cross plot:


As with costs it does look like there is something there for very illiquid instruments; roughly those with less than $1m of volume units per day. But it's not statistically significant. The results incidentally survive the application of costs:

The median SR for log(volume) less than 0 (volume units < $1m per day) is 0.04 SR units higher even after costs, and the less robust mean SR is 0.12 units higher.


Measuring diversification via IDM

OK so it looks like very illiquid markets might have a slight edge in performance. But this isn't enough to explain the outperformance of alt-CTAs (with all the caveats from before); I'd also like to look at diversification.

Expected linear diversification can be measured easily by using what I call the 'IDM'. Intuitively, it's the multiplication factor required to leverage up a portfolio of assets with some weightings and correlations. See any of my books on trading for details. A portfolio of assets with all correlations=1 will have an IDM of 1. A portfolio of N assets with all correlations zero will have an IDM of sqrt(N).
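As a minimal sketch of the calculation (names are mine):

```python
import numpy as np

def idm(weights: np.ndarray, corr_matrix: np.ndarray) -> float:
    # Instrument diversification multiplier: 1 / sqrt(w' . rho . w)
    return 1.0 / np.sqrt(weights @ corr_matrix @ weights)

# Sanity checks matching the text: all correlations of one gives an IDM of 1,
# and N uncorrelated assets with equal weights gives sqrt(N)
equal_weights = np.ones(4) / 4
print(idm(equal_weights, np.ones((4, 4))))   # 1.0
print(idm(equal_weights, np.eye(4)))         # 2.0 == sqrt(4)
```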

Note: We can also measure the actual diversification (which will confound both linear and non linear effects) by looking at the ratio between the portfolio SR and the SR of individual instruments - the Sharpe Ratio Ratio (SRR). This tends to be higher than we'd expect from looking at the IDM, as I note in AFTS and here; there is also another take from an ex colleague here. It's tricky to do here however as there are a lot of instruments jumping in and out of the portfolio.

So what I need to do is create portfolios of different liquidity instrument trend following sub-strategies and measure their diversification (not the correlation of the underlying returns!). An open question is how these portfolios are weighted. I will do this two ways; firstly with equal weights. Secondly, using my handcrafting method (H/C), but in its simplest form with just correlations (but naturally, using out of sample optimisation).

This will be a crude in sample test where I look at the average volume over the entire period for which we have volume figures, and then use that to split the portfolio into different buckets; this is fine because I'm trying to work out the why, not the how, of exploiting this result. I will use the final IDM (likely an overestimate, given the IDM should increase as more instruments are added).

First by cutting off the portfolio at the median log(volume) of 2.8 (about $16m of daily volume units):

                          IDM EW                        IDM H/C 
Low volume                 2.30                           2.10
High volume                2.28                           2.14

That's... not very much difference. Here are the results as a time series, just to check it isn't a weird end of days effect:


Notice that IDMs fall over time, probably because correlations generally are rising. Earlier in the period, when more diversification is available, the less liquid markets do better. But the differences aren't especially substantial.

But above it did seem that the better performance effect only kicked in once we were at very low volumes - below log(volume) of 0 (less than $1m in volume units). Let's go a bit more granular and cut our list of instruments into four groups of ~50 instruments, and for simplicity just look at handcrafted results:


Note the key is in log(volume) units. Note also that there isn't much going on here.


A very silly comparison

The one thing we haven't yet done is plot an account curve, so let's see what the portfolio p&l is like for each of the four buckets of liquidity (which essentially will confound both any improvement in per instrument trend following, plus the realised diversification both linear and non linear). To make this really silly, I'm going to do this for the whole of history, despite only using volumes from 2013 to the present to decide which instrument goes in which bucket. This is a shocking idea for a huge number of reasons, almost too many to elucidate here.

With all that in mind, this is the strongest effect yet, with less liquid markets underperforming. However this is very likely to be luck; and it's confined mostly to the period prior to 1985, when the less liquid market sets probably contained only a few instruments which happened to do badly. After that there really isn't much in it.

Summary

On an individual market basis there is indeed a faint 'small cap' effect in futures, at least at this single speed. But it doesn't look like there is much of a difference in measurable diversification benefits. 

As I warned, none of this goes very far towards explaining the puzzle of alt-CTA outperformance, mainly because I don't really have the data to do this properly (so perhaps Man AHL or Florin Court could do so?) - the benefits of being an alt- aren't so much having higher exposure to illiquid futures, as trading things that aren't futures at all.

Although perhaps it really was luck, since the outperformance has started to fade recently and the five year track records for say Evo and AHL Alpha are now very similar. That could be because the diversification benefit has fallen off in alts more than in liquid futures, or because the alt- markets have 'matured' and become less 'trendy'.

What we haven't done here is look at the effect of including less liquid instruments in an existing portfolio of liquid instruments; ceteris paribus that should be a good thing, since my starting assumption is that more diversification is better, especially as many of the less liquid instruments are commodities rather than another flipping US bond future.

Perhaps I should rethink my very strict policy on what I trade (minimum liquidity of $1.5m volume units per day); after all one of the advantages of being a smaller trader is being able to trade less liquid markets, and not all of the instruments with that sort of volume are super expensive as one of the earlier plots showed. 


Friday, 6 December 2024

Taking an income from your trading account - probabilistic Kelly with regular withdrawals

Programming note: This post has been in draft since ... 2016!

One question you will see me asked a lot is 'how much money do I need to become a full time trader?'. And I usually have a handwaving answer along the lines of 'Well if you think your strategy will earn you 10% a year, then you probably want to be able to cover 5 years of expenses with no income from your trading strategy, so you need 15x your annual living expenses as an absolute minimum' (10x so that a 10% return covers your annual spending, plus a 5 year buffer). Which curiously often isn't the answer people want to hear, since objectively at that point they would already be rich and they want to trade purely to become rich (a terrible idea! people should only trade for fun with money they can afford to lose); and also because they want to start trading right now with the $1,000 they have saved up, which wouldn't be enough to cover next month's rent.

But behind that slightly trite question there is a more deep and meaningful one. It is a variation of this question, which I've talked about a lot on this blog and in my various books:

"Given an expected distribution of trading strategy returns, what is the appropriate standard deviation or leverage target to run your strategy at?"

And the variation we address here is:

How does the answer to the above question change if you are regularly taking a specific % withdrawal from your account?

This has obvious applications to retail traders like me (although I don't currently take a regular withdrawal from my trading account which is only a proportion of my total investments, rather I sporadically take profits). But it could also have applications to institutional investors creating some kind of structured product with a fixed coupon (do people still do that?).

There is generic python code here (no need to install any of my libraries first, except optionally to use the progressBar function) to follow along with.


A brief reminder of prior art(icles)


For new readers and those with poor memories, here's a quick run through what I mean by 'probabilistic Kelly'. If you are completely new to this and find I'm going too quickly, you might want to read some prior articles:


If you know this stuff backwards, then you can skim through very quickly just to make sure you haven't remembered it wrong.

Here goes then: The best measure of performance is the following - having the most money at the end of your horizon (which for this blogpost I will assume is 10 years, eg around 2,560 working days). We maximise this by maximising final wealth, or log(final wealth). This is known as the Kelly criterion. The amount of money you will have at the end of time is equal to your starting capital C, multiplied by the product of (1+r0)(1+r1)...(1+rT), where rt is the return in a given time period. The T'th root of all of that lot, minus one, is equal to the geometric mean. So to get the most money, we maximise the annual geometric mean of returns, which is also known in noob retail trading circles as the CAGR.

If we can use any amount of leverage then for Gaussian returns the optimal standard deviation will be equal to the Sharpe Ratio (i.e. average arithmetic excess return / standard deviation). For example, if we have a strategy with a return of 15%, with risk free rate of 5%, and standard deviation of 20%; then the Sharpe ratio will be (15-5)/20 = 0.50; the optimal standard deviation is 0.50 = 50%; and the leverage required to get that will be 50%/20% = 2.5.
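To make the arithmetic concrete, here is a minimal sketch using the standard 'arithmetic mean minus half the variance' approximation for the geometric mean (the plots below may be produced differently, so don't expect the curves to match this exactly; all names are for illustration):

```python
def approx_geometric_mean(leverage: float, mean: float = 0.15,
                          vol: float = 0.20, risk_free: float = 0.05) -> float:
    # Levered arithmetic mean net of borrowing costs, minus half the levered variance
    levered_mean = risk_free + leverage * (mean - risk_free)
    levered_vol = leverage * vol
    return levered_mean - 0.5 * levered_vol ** 2

sharpe_ratio = (0.15 - 0.05) / 0.20      # 0.5
optimal_leverage = sharpe_ratio / 0.20   # 2.5, equivalent to a 50% vol target
print(approx_geometric_mean(optimal_leverage))        # ~17.5% a year
print(approx_geometric_mean(optimal_leverage / 2))    # ~14.4% a year
```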

Note: For the rest of the post I'm going to assume Gaussian normal returns since we're interested in the relative effects of what happens when we introduce cash withdrawal, rather than the precise numbers involved. As a general rule if returns are negatively skewed, then this will reduce the optimal leverage and standard deviation target, and hence the safe cash withdrawal rate. 

Enough maths: it's probably easier to look at some pictures. For the arbitrary strategy with the figures above, let's see what happens to return characteristics as we crank up leverage (x-axis; leverage 1 means no leverage and fully invested, >1 means we are applying leverage, <1 means we keep some cash in reserve):

x-axis: leverage, y-axis: various statistics

The raw mean in blue shows the raw effect of applying leverage; doubling leverage doubles the annual mean from 15% to 30%. Similarly doubling leverage doubles the standard deviation in green from 20% to 40%. However when we use leverage we have to borrow money; so the orange line showing the adjusted mean return is lower than the blue line (for leverage >1) as we have to pay interest.

The geometric mean is shown in red. This initially increases, and is highest at 2.5 times leverage - the figure calculated above, before falling. Note that the geometric mean is always less than the mean; and the gap between them gets larger the riskier the strategy gets. They will only be equal if the standard deviation is zero. Note also that using half the optimal leverage doesn't halve the geometric return; it falls to around 14.4% a year down from just over 17.5% a year with the optimal leverage. But doubling the leverage to 5.0 times results in the geometric mean falling to zero (this is a general result). Something to bear in mind then is that using less than the optimal leverage doesn't hurt much, using more hurts a lot.

Here is another plot showing the geometric mean (Left axis, blue) and final account value where initial capital C=1 (right axis, orange); just to confirm the maximum occurs at the same leverage point:

x-axis: leverage, y-axis LHS: geometric mean (blue), y-axis RHS: final account value (orange)

Remember the assumption we're making here is that we can use as much leverage as possible. That means that if we have a typical relative value (stat arb, equity long short, LTCM...) hedge fund with low standard deviation but high Sharpe ratio, then we would need a lot of leverage to hit the optimal point. 

If we label our original asset A, then now consider another asset B with excess mean 10%, standard deviation 10%, and thus Sharpe Ratio of 1.0. For this second asset, assuming it is Gaussian (and assets like this are normally left skewed in reality) the optimal standard deviation will be equal to the SR, 100%; and the leverage required to get that will be 100/10 = 10x. Which is a lot. Here is what happens if we plot the geometric mean against leverage for both assets.



Optimal leverage (x axis) occurs at maximum geometric mean (y axis) which is at leverage 2.5 for A, and at leverage 10 for B (which as you would expect has a much higher geometric mean at that point). 

But if we plot the geometric mean (y axis) against standard deviation (x axis) we can see the optimum risk target is 50% (A) and 100% (B) respectively:

x-axis: standard deviation, y-axis: geometric mean
 

Bringing in uncertainty


Now this would be wonderful except for one small issue; we don't actually know with certainty what our distribution of future returns will be. If we assume (heroically!) that there is no upward bias in our returns, eg because they are from a backtest; and we also assume that the 'data generating process (DGP)' for our returns will not change, and that our statistical model (Gaussian) is appropriate for future returns; then we are still left with the problem that the parameters we are estimating for our returns are subject to sampling estimation error, or what I called in my second book 'Smart Portfolios' the "uncertainty of the past".

There are at least three ways to calculate estimation error for something like a Sharpe Ratio, and they are:

  • With a distributional assumption, using a closed form formula eg the variance of the estimate will be (1+.5SR^2)/N where N is the number of observations, if returns are Gaussian. For our 2560 daily returns and an annual SR of 0.5 that will come out to a standard deviation of estimate for the SR of 0.32; eg that would give a 95% confidence interval for annual SR (approx +/- 2 s.d.) of approximately -0.1 to 1.1
  • With non parametric bootstrapping where we sample with replacement from the original time series of returns
  • With parametric monte carlo where we fix some distribution, estimate the distributional parameters from the return series and resample from those distributions
Calculation: annual SR = 0.5, daily SR = 0.5/sqrt(256) = 0.03125. Variance of estimate = (1+.5*.03125^2)/2560 = 0.000391, standard deviation of estimate = 0.0197, annualised = 0.0197*sqrt(256) = 0.32
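Or in a few lines of python, just reproducing the calculation above:

```python
import numpy as np

N_DAYS = 2560                                       # ten years of daily data
annual_sr = 0.5
daily_sr = annual_sr / np.sqrt(256)                 # 0.03125

var_of_estimate = (1 + 0.5 * daily_sr ** 2) / N_DAYS              # ~0.000391
annual_std_of_estimate = np.sqrt(var_of_estimate) * np.sqrt(256)  # ~0.32

# Approximate 95% confidence interval for the annual SR (+/- 2 s.d.)
print(annual_sr - 2 * annual_std_of_estimate,
      annual_sr + 2 * annual_std_of_estimate)       # roughly -0.1 to 1.1
```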

For simplicity and since I 'know' the parameters of the distribution I'm going to use the third method in this post. 

(it would be equally valid to use the other methods, and I've done so in the past...)

So what we do is generate a number of new return series from the same distribution of returns as in the original strategy, and the same length (10 years). For each of these we calculate the final account value given various leverage levels. We then get a distribution of account values for different leverage levels. 

The full Kelly optimal would just find the leverage level at which the median account value was maximised, i.e. the 50% percentile point of this distribution. Instead however we're going to take some more conservative distributional point which is something less than 50%, like for example 20%. In plain English, we want the leverage level that maximises the account value that we expect to get, say, two out of ten times in a future 10 year period (assuming all our assumptions about the distribution are true).
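Here is a minimal sketch of that procedure (the grid of leverage levels, the number of simulations, and all the names are arbitrary choices for illustration, not the code linked above):

```python
import numpy as np

def probabilistic_kelly_leverage(mean=0.15, vol=0.20, risk_free=0.05,
                                 years=10, n_monte=1000, quantile=0.2,
                                 leverage_grid=np.arange(0.25, 5.25, 0.25)):
    # Simulate many Gaussian daily return paths, then for each candidate
    # leverage level compound up the levered returns (net of borrowing costs)
    # and take a conservative quantile of the distribution of final values
    days = years * 256
    paths = np.random.normal(mean / 256, vol / np.sqrt(256), (n_monte, days))
    daily_rf = risk_free / 256
    best_leverage, best_value = None, -np.inf
    for leverage in leverage_grid:
        final_values = np.prod(1 + daily_rf + leverage * (paths - daily_rf), axis=1)
        value_at_quantile = np.quantile(final_values, quantile)
        if value_at_quantile > best_value:
            best_leverage, best_value = leverage, value_at_quantile
    return best_leverage

print(probabilistic_kelly_leverage(quantile=0.2))  # normally lower than the full Kelly 2.5x
```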

Note this is a more sophisticated way of doing the crude 'half Kelly' targeting used by certain people, as I've discussed in previous blog posts. It also gives us some comfort in the case of our returns not being normally distributed, but where we've been unable to accurately estimate the likely left hand tail from the existing historic data ('peso problem').

Let's return to asset A and show the final value at different points of the monte carlo distribution, for different leverage levels:


x-axis leverage level, y-axis final value of capital, lines: different percentile points of distribution

Each line is a different point on the distribution, eg 0.5 is the median, 0.2 is the 20% percentile and so on. As we get more pessimistic (lower values of percentile), the final value curve slips down for a given level of leverage; but the optimal leverage which maximizes final value also reduces. If you are super optimistic (75% percentile) you would use 3.5x leverage; but if you were really conservative (10% percentile) you would use about 0.5x leverage (eg keep half your money in cash). 

As I said in my previous post your choice of line is down to your tolerance for uncertainty. This is not quite the same as a risk tolerance, since here we are assuming that you are happy to maximise geometric mean and therefore you are happy to take as much standard deviation risk as that involves. I personally feel that the choice of uncertainty tolerance is much more intuitive to most people than choosing a standard deviation risk limit / target, or god forbid a risk tolerance penalty variable.


Introducing withdrawals


Now we are all caught up with the past, let's have a look at what happens if we withdraw money from our portfolio over time. The first decision to make is what our utility function is. Do we still want to maximise final value? Or are we happy to end up with less than we started with at the end of time? For some of us, the answer will depend on how much we love our children :-) To keep things simple, I'm initially going to assume that we want to maximise final value, subject to that being at least equal to our starting capital. As my compounding calculations assume an initial wealth of 1.0, that means a final account value of at least 1.0.

Initially then I'm going to look at what happens in the non probabilistic case. In the following graph, the x-axis is leverage as before, and the y-axis this time is final value. Each of the lines shows what will happen at a different withdrawal rate. 0 is no withdrawal, 0.005 is 0.5% a year, and so on up to 0.2; 20% a year.
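As a sketch of the deterministic case (I'm treating the withdrawal as a fixed annual amount equal to a percentage of the starting capital, taken at the end of each year; that convention and all names are assumptions for illustration):

```python
def final_value_with_withdrawals(leverage, withdrawal_rate, mean=0.15,
                                 vol=0.20, risk_free=0.05, years=10):
    # Compound at the approximate geometric mean for this leverage level,
    # taking out withdrawal_rate x starting capital at the end of each year
    geo_mean = risk_free + leverage * (mean - risk_free) - 0.5 * (leverage * vol) ** 2
    value = 1.0
    for _ in range(years):
        value = value * (1 + geo_mean) - withdrawal_rate
    return value

for rate in [0.0, 0.10, 0.17, 0.20]:
    print(rate, final_value_with_withdrawals(leverage=2.5, withdrawal_rate=rate))
```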


x-axis leverage, y-axis final value. Each line is a different annual withdrawal rate

At higher withdrawal rates we make less money - duh! - but the optimal leverage remains unchanged. That makes sense. Regardless of how much money we are withdrawing, we're going to want to run at the same optimal amount of leverage. 

And for all withdrawal rates of 17% or less, we end up with at least 1.0 of our final account value, so we can use the optimal leverage without any worries. For higher withdrawal rates, eg 20%, we can never safely withdraw all that amount, regardless of how much leverage we use. We'll always end up with less than our final account value even at the optimal leverage ratio.

For this Sharpe Ratio level then, to end up with at least 1.0 of our account value, it looks like our safe withdrawal rate is around 17% (In fact, I calculate it later to be more like 18%).


Safe withdrawals versus Sharpe Ratio


OK that's for a Sharpe of 0.5, but what if we have a strategy which is much better or worse? What is the relationship between a safe withdrawal rate, and the Sharpe Ratio of the underlying strategy?  Let's assume that we want to end up with at least 1.0x our starting capital after 10 years, and we push our withdrawal rate up to that point. 

X-axis Sharpe Ratio, y-axis safe withdrawal rate leaving capital unchanged at starting level

That looks a bit exponential-esque, which kind of makes sense since we know that returns gross of funding costs scale with the square of SR: If our returns double with the same standard deviation we double our SR, then we can double our risk target, which means we can use twice as much leverage, so we end up with four times the return. It isn't exactly exponential, because we have to fund borrowing. 

The above result is indifferent to the standard deviation of the underlying asset as we'd expect (I did check!), but how does it vary when we change the other key values in our calculation: the years to run the strategy over and the proportion of our starting capital we want to end up with?

x-axis amount of starting capital to end up with, y-axis withdrawal rate, lines different time periods in years

Each of these plots has the same format. The Sharpe Ratio of the underlying strategy is fixed, and is in the title. The y-axis shows the safe withdrawal rate, for a given amount of remaining starting capital on the x-axis (where 1.0 means we want to end up with all our remaining capital). Each line shows the results for a different number of years.

The first thing to notice is that if we want to maintain our starting capital, the withdrawal rate will be unchanged regardless of the number of years we are trading for. That makes sense - this is a 'steady state' where we are withdrawing exactly what we make each year. If we are happy to end up with less of our capital, then with shorter horizons we can afford to take a lot more out of our account each year. Again, this makes sense. However if we want to end up with more money than we started with, and our horizon is short, then we have to take less out to let everything compound up. In fact for a short enough time horizon we can't end up with twice our capital as there just isn't enough time to compound up (at what here is quite a poor Sharpe Ratio). 

x-axis amount of starting capital to end up with, y-axis withdrawal rate, lines different time periods in years


With a higher Sharpe, the pattern is similar but the withdrawal rates that are possible are much larger. 


x-axis amount of starting capital to end up with, y-axis withdrawal rate, lines different time periods in years



Withdrawals probabilistically


Notice that if you really are going to consistently hit a SR of exactly 1, and you're prepared to run at full Kelly, then a very high withdrawal rate of 50% is apparently possible. But hitting a SR of exactly 1 is unlikely because of parameter uncertainty. 

So let's see what happens if we introduce the idea of distributional monte carlo into withdrawals. To keep things simple, I'm going to stick to my original goal of saying that we want to end up with exactly 100% of our capital remaining when we finish. That means we can solve the problem for an arbitrary number of years (I'm going to use 30, which seems reasonable for someone in the withdrawal phase of their investment career post retirement).

What I'm going to do then is generate a large number of random 30 year daily return series, drawn from a return distribution appropriate for a given Sharpe Ratio; and for each of those calculate what the optimal leverage would be (which remember from earlier is invariant to withdrawal rate), and then find the maximum annual withdrawal rate that means I still have my starting capital at the end of the investment period. This will give me a distribution of withdrawal rates.

From that distribution I then take a different quantile point, depending on whether I am being optimistic or pessimistic versus the median.
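A rough sketch of that monte carlo (the names, leverage grid and bisection are all my own choices; it should land in the same ballpark as the tables below but won't reproduce them exactly):

```python
import numpy as np

def safe_withdrawal_rates(sharpe=0.5, vol=0.20, risk_free=0.05,
                          years=30, n_monte=500):
    # For each simulated path: find the leverage maximising final value with no
    # withdrawals, then bisect for the largest fixed annual withdrawal (as a
    # fraction of starting capital) that leaves the starting capital intact
    mean, days = risk_free + sharpe * vol, 256
    rates = []
    for _ in range(n_monte):
        daily = np.random.normal(mean / days, vol / np.sqrt(days), (years, days))

        def final_value(leverage, withdrawal):
            levered = risk_free / days + leverage * (daily - risk_free / days)
            value = 1.0
            for yearly_growth in np.prod(1 + levered, axis=1):
                value = value * yearly_growth - withdrawal
            return value

        grid = np.arange(0.25, 6.0, 0.25)
        best_lev = grid[np.argmax([final_value(lev, 0.0) for lev in grid])]

        low, high = 0.0, 5.0
        for _ in range(30):          # bisection on the withdrawal rate
            mid = (low + high) / 2.0
            low, high = (mid, high) if final_value(best_lev, mid) >= 1.0 else (low, mid)
        rates.append(low)

    return np.quantile(rates, [0.1, 0.2, 0.3, 0.5, 0.75])

print(safe_withdrawal_rates(sharpe=0.5))
```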



X-axis: Sharpe Ratio. Y-axis: withdrawal rate (where 0.5 is 50% a year). Line colours: different percentiles of the monte carlo withdrawal rate distribution, eg 0.5 is the median, 0.1 is the very conservative 10% percentile.


Here is the same data in a table:

(Withdrawal rates in % per year)

                             Percentile
SR         0.10     0.20     0.30     0.50     0.75
0.10        4.7      4.8      4.9      5.5      7.40
0.25        4.9      5.3      6.0      7.9     11.00
0.50        9.0     11.0     13.0     18.0     24.25
0.75       18.0     23.0     26.0     33.0     43.00
1.00       33.0     39.0     45.0     55.0     66.00
1.50       83.9     96.0    102.0    117.0    135.00
2.00      162.0    176.0    185.7    205.0    228.25

We can see our old friend 18% in the median 0.50 percentile column, for the 0.50 SR row. As before we can withdraw more with higher Sharpe Ratios.
Now though, as you would expect, as we get more optimistic about the quantile, we would use a higher withdrawal rate. For example, for a SR of 1.0 the withdrawal rates vary from 33% a year at the very conservative 10% percentile, right up to 66% at the highly optimistic 75% percentile.
As I've discussed before, nobody should ever use more than the median 50% (penultimate column), which means you're basically indifferent to uncertainty; and I'd be very wary of the bottom few rows with very high Sharpe Ratios, unless you're actually running an HFT shop or Jane Street, in which case good luck.
Footnote: All of the above numbers were calculated with a 5% risk free rate. Here are the same figures with a 0% risk free rate. They are roughly, but not exactly, the above minus 5%. This means that for low enough SR values and percentile points we can't safely withdraw anything and expect to end up with our starting capital intact.

(Withdrawal rates in % per year)

                             Percentile
SR         0.10     0.20     0.30     0.50     0.75
0.10        0.0      0.0      0.0      0.4      2.8
0.25        0.0      0.6      1.4      3.6      7.5
0.50        3.6      5.9      8.1     12.0     19.0
0.75       13.0     17.0     21.0     28.0     38.0
1.00       29.0     34.0     39.0     48.0     62.0
1.50       79.0     90.0     96.0    110.0    131.0
2.00      150.9    167.0    178.0    198.5    223.0



Conclusion



As a procrastination technique (I'm supposed to be writing my second book) I've been watching the videos of Anton Kriel on YouTube. For those of you who don't know him, he's an English ex-Goldman Sachs guy who retired at the age of 27, "starred" in the post-modern turtle traders based reality trading show "Million Dollar Traders", and now runs something called the Institute of Trading that offers very high priced training and mentoring courses.

I find him strangely compelling and also very annoying. He is always sniggering and has a permanent smug look on his face. The videos are mostly set in exotic places where we are presumably supposed to envy Anton's lifestyle, which seems to involve spending a lot of time flying around the world - not something I'd personally aspire to.

I can't comment on the quality of his education but at least he has the pedigree. He also has some interesting opinions about non trading subjects, but then so do most trading "gurus". On trading, and on the financial industry generally, from what I've seen he talks mostly sense.

Anyway, one interesting thing he said is that you shouldn't use trading for income, but only to grow capital. Something I mostly agree with.

http://www.elitetrader.com/et/index.php?threads/how-much-did-you-save-up-before-you-decided-to-trade-full-time.298251/

I'm quite a conservative person, so I'd probably conservatively assume my SR was around 0.50 (in backtest it's much higher than that, and even in live trading it's been a little bit higher), and use the most conservative 10% percentile. That implies that with a withdrawal rate a shade under 4% plus the risk free rate, I'd still have my starting capital intact after any given period of time.

Since I've used a risk free rate of 5%, that implies withdrawing the risk free rate plus another 4% on top, for a total of 9%.

If you're more aggressive, and have good reason to expect a higher Sharpe Ratio, then you could consider a withdrawal rate up to perhaps 30%. But this should only be done by someone with a track record of achieving those kinds of returns over several years, and who is comfortable with the fact that their chances of maintaining their capital are only a coin flip.

Note that one reason this is quite low is that in a conservative 10% quantile scenario I'd rarely be using the full Kelly (remember 0.50 SR implies a 50% risk target); this is consistent with what I actually do which is use a 25% risk target. With a 25% risk target, and SR 0.5 in theory I will make the risk free rate plus 12.5%. So I'm withdrawing around a third of my expected profits, which sounds like a good rule of thumb for someone who is relatively risk averse. 

Obviously my conservative 9% is higher than the 4% suggested by most retirement planners (which is a bit arbitrary as it doesn't seem to change when the risk free rate changes), but that is for long only portfolios where the Sharpe probably won't be even as good as 0.50; and more importantly where leverage isn't possible. Getting even to the half Kelly risk target of 25% isn't going to be possible without leverage with a portfolio that doesn't just contain small cap stocks or crypto.... it will be impossible with 60:40 for sure! But also bear in mind that my starting capital won't be worth what it's currently worth in real terms in the future, so I might want to reduce that figure further. 

Tuesday, 19 November 2024

CTA index replication and the curse of dimensionality

Programming note: 

So, first I should apologise for the LONG.... break between blogposts. This started when I decided not to do my usual annual review of performance - it is a lot of work, and I decided that the effort wasn't worth the value I was getting from it (in the interests of transparency, you can still find my regularly updated futures trading performance here). Since then I have been busy with other projects, but I now find myself with more free time and a big stack of things I want to research and write blog posts on.

Actual content begins here:

To the point then - if you have heard me talking on the TTU podcast you will know that one of my pet subjects for discussion is the thorny idea of replicating - specifically, replicating the performance of a CTA index using a relatively modest basket of futures which is then presented inside something like an ETF or other fund wrapper as an alternative to investing in the CTA index itself (or to be more precise, investing in the constituents because you can't actually invest in an index).

Reasons why this might be a good thing are: 

  • that you don't have to pay fat fees to a bunch of CTA managers, just slightly thinner ones to the person providing you with the ETF. 
  • potentially lower transaction costs outside of the fee charged
  • Much lower minimum investment ticket size
  • Less chance of idiosyncratic manager exposure if you were to deal with the ticket size issue by investing in just a subset of managers rather than the full index

How is this black magic achieved? In an abstract way, there are three ways we can replicate something using a subset of the instruments that the underlying managers are trading:

  • If we know the positions - by finding the subset of positions which most closely matches the joint positions held by the funds in the index. This is how my own dynamic optimisation works, but it's not really practical or possible in this context.
  • Using the returns of individual instruments: doing a top down replication where we try and find the basket of current positions that does the best job of producing the index returns.
  • If we know the underlying strategies - by doing a bottom up replication where we try and find the basket of strategies that does the best job of producing those returns.

In this post I discuss in more detail some more of my thoughts on replication, and why I think bottom up is superior to top down (with evidence!).

I'd like to acknowledge a couple of key papers which inspired this post, and from which I've liberally stolen:



Why are we replicating?

You may think I have already answered this; replication allows us to get close to the returns of an index more cheaply and with lower minimum ticket size than if we invested in the underlying managers. But we need to take a step back: why do we want the returns of the <insert name of CTA index> index?




For many institutional allocators of capital the goal is indeed closely matching and yet beating the returns of a (relatively) arbitrary benchmark. In which case replication is probably a good thing.

If on the other hand you want to get exposure to some latent trend following (and carry, and ...) return factors that you believe are profitable and/or diversifying then other options are equally valid, including investing in a selected number of managers, or doing DIY trend following (and carry, and ...). In both cases you will end up with a lower correlation to the index than with replication, but frankly you probably don't care.

And of course for retail investors, where direct manager investment (in a single manager, let alone multiple managers) and DIY trend following aren't possible (both requiring $100k or more), then a half decent and cheap ETF that gives you that exposure is the only option. Note such a fund wouldn't necessarily need to do any replication - it could just consist of a set of simple CTA type strategies run on a limited universe of futures, and that's probably just fine.

(There is another debate about how wide that universe of futures should be, which I have also discussed in recent TTU episodes and for which this article is an interesting viewpoint). 

For now let's assume we care deeply, deeply, about getting the returns of the index and that replication is hence the way to go.


What exactly are we replicating?

In a very abstract way, we think of there being C_0....C_N CTA managers in an index. For example in the SG CTA index there are 20 managers, whilst in the BTOP50 index there are... you can probably guess. No, not 50, it's currently 20. The 50 refers to the fact it's trying to capture at least 50% of the investable universe. 

In theory the managers could be weighted in various ways (AUM, vol, number of PhDs in the front office...) but both of these major indices are equally weighted. It doesn't actually matter what the weighting is for our purposes today.

Each manager trades in X underlying assets with returns R_0.....R_X. At any given time they will have positions in each of these assets, P_c_x (so for manager 0, P_0_0.... P_0_X, for manager 1, P_1_0...P_1_X and in total there will be X*N positions at each time interval). Not every manager has to trade every asset, so many of these positions could be persistently zero.

If we sum positions up across managers for each underlying asset, then there will be an 'index level' position in each underlying asset P_0.... P_X. If we knew that position and were able to know instantly when it was changing, we could perfectly track the index ignoring fees and costs. In practice, we're going to do a bit better than the index in terms of performance, as we will get some execution cost netting effects (where managers trade against each other we can net those off), and we're not paying fees.

Note that not paying performance fees on each manager (the 20 part of '2&20') will obviously improve our returns, but it will also lower our correlation with the index. Management fee savings however will just go straight to our bottom line without reducing correlation. There will be additional noise from things like how we invest our spare margin in different currencies, but this should be tiny. All this means that even in the world of perfectly observable positions we will never quite get to a correlation of 1 with the index.

But we do not know those positions! Instead, we can only observe the returns that the index level positions produce. We have to infer what the positions are from the returns. 


The curse of dimensionality and non stationarity, top down version

How can we do this inference? Well we're finance people, so the first thing we would probably reach for is a regression (it doesn't have to be a regression, and no doubt younger people reading this blog would prefer something a bit more modern, but the advantage of a regression is that it's very easy to understand its flaws and problems, unlike some black box ML technique, and thus illustrate what's going wrong here).

On the left hand side of the regression is the single y variable we are trying to predict - the returns of the index. On the right hand side we have the returns of all the possible instruments we know our managers are trading. This will probably run into the hundreds, but the maximum used for top down replication is typically 50, which should capture the lion's share of the positions held. The regressed 'beta' coefficients on each of these returns will be the positions that we're going to hold in each instrument in our replicating portfolio: P_0... P_X.

Is this regression even possible? Well, as a rule you want to have lots more data points than you do coefficients to estimate. Let's call the ratio between these the Data Ratio. It isn't called that! But it's as good a name as any. There is a rule of thumb that you should have at least 10x the number of variables in data points. I've been unable to find a source for who invented this rule, so let's call it The Rule Of Thumb.

There are over 3800 data points available for the BTOP50 - 14 years of daily returns, so having say 50 coefficients to estimate gives us a ratio of over 70. So we are all good.

Note - We don't estimate an intercept as we want to do this replication without help or hindrance from a systematic return bias.

In fact we are not good at all - we have a very big problem, which is that the correct betas will change every day as the positions held change every day. In theory then that means we will have to estimate 200 variables with just one piece of data - today's daily return. That's a ratio of 0.005x; well below 10!

Note - we may also have the returns for each individual manager in the index, but a moment's thought will tell you that this is not actually helpful, as it just means we will have twenty regressions to do, each with exactly the same dimensionality problem.

We can get round this. One good thing is that these CTAs aren't trading that quickly, so the position weights we should use today are probably pretty similar to yesterday's. So we can use more than one day of returns to estimate the correct current weights. The general approach in top down replication is to use rolling windows in the 20 to 40 day range.

We now have 40 data points to 50 coefficients - a ratio of 0.8, which is still well below ten.

To solve this problem we must reduce the number of betas we're trying to estimate, by reducing the number of instruments in our replicating portfolio. This can be done by picking a set of reasonably liquid and uncorrelated instruments (say 10 or 15), to the point where we can actually estimate enough position weights to somewhat replicate the portfolio.

However with 40 days of observations we need to have just four instruments to meet our rule of thumb. It would be hard to find a fixed group of four instruments that suffice to do a good job of replicating a trend index that actually has hundreds of instruments underlying it.

To deal with this problem, we can use some fancy econometrics. With regularisation techniques like LASSO or ridge regression, or stepwise regressions, we can reduce the effective number of coefficients we have to estimate. We would effectively be estimating a small number of coefficients, but they would be the coefficients of four different instruments over time (yes, this is a hand waving sentence) which give us the best current fit.
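As an illustrative sketch of the rolling regularised regression (all names and the penalty value are arbitrary; this isn't the exact methodology of any of the papers):

```python
import pandas as pd
from sklearn.linear_model import Lasso

def rolling_top_down_betas(index_returns: pd.Series,
                           instrument_returns: pd.DataFrame,
                           window: int = 40,
                           penalty: float = 1e-4) -> pd.DataFrame:
    # Each day, regress the last `window` days of index returns on the
    # instrument returns, with a LASSO penalty so that only a handful of
    # coefficients end up away from zero. No intercept, since we don't want
    # help from a systematic return bias.
    betas = {}
    for end in range(window, len(index_returns)):
        y = index_returns.iloc[end - window:end].values
        X = instrument_returns.iloc[end - window:end].values
        fitted = Lasso(alpha=penalty, fit_intercept=False).fit(X, y)
        betas[index_returns.index[end]] = fitted.coef_
    return pd.DataFrame(betas, index=instrument_returns.columns).T
```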

Note that there is a clear trade off here between the choice of lookback window, and the number of coefficients estimated (either as an explicit fixed market choice, dynamically through stepwise regression, or in an implicit way through regularisation):

  • Very short windows will worsen the curse of dimensionality. Longer windows won't be reactive enough to position changes.
  • A smaller set of markets means a better fit, and means we can be more reactive to changes in positions held by the underlying markets, but it also means we're going to do a poorer job of replicating the index.


Introducing strategies and return factors

At this point if we were top down replicators, we would get our dataset and start running regressions. But instead we're going to pause and think a bit more deeply. We actually have additional information about our CTA managers - we know they are CTA managers! And we know that they are likely to do stuff like trend following, as well as other things like carry and no doubt lots of other exotic things. 

That information can be used to improve the top down regression. For example, we know that CTA managers probably do vol scaling of positions. Therefore, we can regress against the vol scaled returns of the underlying markets rather than the raw returns. That will have the benefit of making the betas more stable over time, as well as making them comparable across instruments and thus more intuitive when interpreting the results.

But we can also use this information to tip the top down idea on its head. Recall:

Each manager trades in X underlying assets with returns R_0.....R_X. At any given time they will have positions in each of these assets, P_c_x (so for manager 0, P_0_0.... P_0_X, for manager 1, P_1_0...P_1_X so there will be X*N positions at each time interval). 

Now instead we consider the following:

Each manager trades in Y underlying strategies with returns r_0.....r_Y. At any given time they will have weights in each of these strategies, w_c_y (so for manager 0, w_0_0.... w_0_Y, for manager 1, w_1_0...w_1_Y so there will be Y*N positions at each time interval). 

Why is this good? Well because strategy weights, unlike positions, are likely to be much more stable. I barely change my strategy weights. Most CTAs probably do regular refits, but even if they do then the weights they are using now will be very similar to those used a year ago. Instead of a 40 day window, it wouldn't be unreasonable to use a window length that could be measured in years: thousands of days. This considerably improves the curse of dimensionality problem.
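Mechanically this is the same regression as before; reusing the hypothetical sketch from the top down section, only the right hand side variables and the window change (substrategy_returns is an assumed DataFrame with one column per instrument and strategy pair):

```python
# Right hand side is now the daily returns of sub-strategies (e.g. EWMAC16,64
# on the S&P 500) rather than raw instrument returns, and the window can be
# measured in years rather than weeks
bottom_up_betas = rolling_top_down_betas(index_returns, substrategy_returns,
                                         window=2000, penalty=1e-4)
```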


Some simple tables

For X instruments, with Z strategies per instrument (so Y = X*Z strategies in total):



                                Top down              Bottom up

Approx optimal window size      40 days               2000 days
Number of coefficients          X                     X*Z
Data ratio                      40 / X                2000 / (X*Z)


Therefore as long as Z is less than 50 the data ratio of the bottom up strategy will be superior. For example, with some real numbers - 20 markets and 5 strategies per market:

                  


                                Top down              Bottom up

Approx optimal window size      40 days               2000 days
Number of coefficients          20                    100
Data ratio                      2                     20


Alternatively, we could calculate the effective number of coefficients we could estimate to get a data ratio of 10 (either as a fixed group, or implicit via regularisation):



                                Top down              Bottom up

Approx optimal window size      40 days               2000 days
Data ratio                      10                    10
Number of coefficients          4                     200


It's clear that with bottom up replication we should get a better match as we can smuggle in many more coefficients, regardless of how fancy our replication is.

A very small number of caveats


There are some "but..."'s, and some "hang on a moment's" though. We potentially have a much larger number of strategies than instruments, given that we probably use more than one strategy on each instrument. Two trend following speeds plus one carry strategy is probably a minimum; tripling the number of coefficients we have to estimate. It could be many more times that.

There are ways round this - the same ways we would use to get round the 'too many instruments' problem we had before. And ultimately the benefit from allowing a much longer window length is significantly greater than the increase in potential coefficients from multiple strategies per instrument. Even if we ended up with thousands of potential coefficients, we'd still end up selecting more of them than we would with top down replication.

A perhaps unanswerable 'but...' is that we don't know for sure which strategies are being used by the various managers, whereas we almost certainly know all the possible underlying instruments they are trading. For basic trend following that's not a problem; it doesn't really matter how you do trend following you end up with much the same return stream. But it's problematic for managers doing other things.

A sidebar on latent factors


Now one thing I have noticed in my research is that asset class trends seem to explain most of instrument trend following returns (see my latest book for details). To put it another way, if you trend follow a global equity index you capture much of the p&l from trend following the individual constituents. In a handwaving way, this is an example of a latent return factor. Latent factors are the reason why both top down and bottom up replication work as well as they do so it's worth understanding them.

The idea is that there are these big and unobservable latent factors that drive returns (and risk), and individual market returns are just manifestations of those. So there is the equity return factor for example, and also a bond one. A standard way of working out what these factors are is to do a decomposition of the covariance matrix and find out what the principal components are. The first few PCs will often explain most of the returns. The factor loadings are relatively static and slow moving; the S&P 500 is usually going to have a big weight in the equity return factor.
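For intuition, a minimal sketch of that decomposition (names are mine):

```python
import numpy as np
import pandas as pd

def principal_components(returns: pd.DataFrame, n_factors: int = 3):
    # Eigendecomposition of the covariance matrix of instrument returns; the
    # eigenvectors with the largest eigenvalues are the latent factors, and the
    # normalised eigenvalues are the share of variance each factor explains
    eigenvalues, eigenvectors = np.linalg.eigh(returns.cov().values)
    order = np.argsort(eigenvalues)[::-1][:n_factors]
    explained = eigenvalues[order] / eigenvalues.sum()
    loadings = pd.DataFrame(eigenvectors[:, order], index=returns.columns)
    return explained, loadings
```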

Taking this idea a step further, there could also be 'alternative' return factors; like the trend following factor or carry factor (or back in equity land, value and quality). These have dynamic loadings versus the underlying instruments; sometimes the trend following factor will be long S&P 500 and sometimes short. This dynamic loading is what makes top down replication difficult.

Bottom up regression reverses this process and begins with some known factors; eg the returns from trend following the S&P 500 at some speed with a given moving average crossover, and then tries to work out the loading on those factors for a given asset - in this case the CTA index. 

Note that this also suggests some interesting research ideas such as using factor decomposition to reduce the number of instruments or strategies required to do top down or bottom up replication, but that is for another day. 

If factors didn't exist and all returns were idiosyncratic both types of replication would be harder; the fact they do seem to exist makes replication a lot easier as it reduces the number of coefficients required to do a good job.



Setup of an empirical battle royale


Let's do a face off then of the two methodologies. The key thing here isn't to reproduce the excellent work done by others (see the referenced papers for examples), or necessarily to find the best possible way of doing either kind of replication, but to understand better how the curse of dimensionality affects each of them.

My choice of index is the BTOP50, purely because daily returns are still available for free download. My set of instruments will be the 102 I used in my recent book 'AFTS' (actually 103, but Eurodollar is no longer trading) which represent a good spread of liquid futures instruments across all the major asset classes. 

I am slightly concerned about using daily returns, because the index snapshot time is likely to be different from the closing futures price times I am using. This could lead to lookahead bias, although that is easily dealt with by introducing a conservative two day lag in betas as others have done. However it could also make the results worse since a systematic mismatch will lower the correlation between the index returns and underyling instrument returns (and thus also the strategy returns in a bottom up replication). To avoid this I also tested a version using two day returns but it did not affect the results.

For the top down replication I will use six different window sizes from 8 business days up to 256 (about a year), with all the powers of 2 in between. These window sizes exceed the range typically used in this application, deliberately, because I want to illustrate the tradeoffs involved. For bottom up replication I will use eight window sizes from 32 business days up to 4096 (about sixteen years, although in practice we only have 14 years of data for the BTOP50, so this means using all the available data).

We will do our regressions every day, and then use an exponential smooth on the resulting coefficients with a span equal to twice the window size. For better intuition, a 16 day exponential span, such as we would use with an 8 day window size, has a half-life of around 5.5 days. The maximum smooth I use is a span of 256 days.
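In pandas terms the smoothing step is just the following (assuming raw_betas is a DataFrame of the daily regression coefficients and window is the regression window in days):

```python
# Exponential smooth with a span of twice the regression window, capped at 256
# days; e.g. an 8 day window gets a 16 day span (half-life of roughly 5.5 days)
span = min(2 * window, 256)
smoothed_betas = raw_betas.ewm(span=span, min_periods=1).mean()
```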

For bottom up replication, I will use seven strategies: three trend following (EWMAC4,16, EWMAC16,64, EWMAC64,256) and a carry strategy (carry60); plus some additional strategies: acceleration32, mrinasset1000, and skewabs180. For details of what these involve, please see AFTS or various blogposts; suffice to say they can be qualitatively described as fast, medium and slow trend following, carry, acceleration (change in momentum), mean reversion, and skew respectively. Note that in the Resolve paper they use 13 strategies for each instrument, but these are all trend following over different speeds and are likely to be highly correlated (which is bad for regression, and also not helpful for replication).

I will use a limited set of 15 instruments, the same as those used in the Newfound paper, which gives me 15*7 = 105 coefficients to estimate - roughly the same as in the top down replication.

I'm going to use my standard continuous forecasting method just because that is the code I have to hand; the Resolve paper does various kinds of sensitivity analysis and concludes that both binary and continuous produce similar results (with a large enough universe of instruments, it doesn't matter so much exactly how you do the CTA thing).

Note - it could make sense to force the coefficients on bottom up replication to be positive, however we don't know for sure if a majority of CTAs are using some of these strategies in reverse, in particular the divergent non trend following strategies.


Approx data ratios with different window sizes if all ~100 coefficients estimated:

                               

Window size                       Data ratio
8 days                            0.08
16 days                           0.16
32 days                           0.32
64 days                           0.64
128 days                          1.28
256 days                          2.56
512 days                          5.12
1024 days                         10.2
2048 days                         20.5
4096 days                         41.0


In both cases I need a way to reduce the number of regressors on the right hand side from somewhere just over 100 to something more reasonable. This will clearly be very important with an 8 day window!

Various fancy techniques are commonly used for this, including LASSO and ridge regression. There is a nice summary of the pros and cons of these in an appendix of the Resolve paper; one implication being that the right technique will depend on whether we are doing bottom up or top down replication. They also talk about elastic net, which combines the two. For simplicity I use LASSO, as there is only one hyperparameter to fit (penalty size).



Results

Here are the correlation figures for the two methods with different lookback windows:


As you can see, the best lookback for the top down method needs to be quite short to capture changing positions. Since strategy weights are more stable, we can use a longer lookback for the bottom up method. For any reasonable length of lookback the correlation produced by the bottom up method is pretty stable, and significantly better than that of the top down method.


Footnote: Why not do both?

One of the major contributions of the Resolve paper is the idea of combining both top down and bottom up methods. We can see why this makes sense. Although bottom up is superior as it causes fewer dimensionality issues, it does suffer because there might be some extra 'secret sauce' that our bottom up models don't capture. By including the top down element as well we can possibly fill this gap.


Footnote on 'Creating a CTA from scratch'

You may have seen some bottom up 'replication' articles that don't use any regression, such as this one. They just put together a set of simple strategies with some sensible weights and then do an ex-post cursory check on correlation with the index. The result, without trying, is a daily correlation of 0.6 with the SG CTA index, in line with the best bottom up results above, without any of the work or the risks involved with doing potentially unstable regressions on small amounts of data. Indeed, my own trading strategy's (monthly) correlation with the SG CTA index was 0.8 last time I checked. I have certainly done no regressions to get that!

As I mentioned above, if you are a retail investor or an institutional investor who is not obsessed with benchmarking, then this might be the way to go. There is then no limit on the number of markets and strategies you can include.


Conclusion

I guess my conclusion comes back to why... why are we doing this.

If we really want to replicate the index then we should be agnostic about methodology and go with what is best. This will involve mostly bottom up with a longish window for the reasons discussed above, although it can probably be improved by including an averaging with top down.

But if we are trying to get 'exposure to some trend following factors' without caring about the index then I would probably start with the bottom up components of simple strategies on a diversified set of instruments with sensible but dumb 'no-information' weights that probably use some correlation information but not much else (see all the many posts I have done on portfolio optimisation). Basically the 'CTA from scratch' idea.

And then it might make sense to move in the direction of trying to do a bottom up replication of the index if you did decide to reduce your tracking error, though I'd probably use a robust regression to avoid pulling the strategy weights too far from the dumb weights.