Thursday, 4 March 2021

Does it make sense to change your trading behaviour in different periods of volatility?

A few days ago I was browsing on a forum site when someone posted this:

I am interested to know if anyone change their SMA/EMA/WMA/KAMA/LRMA/etc. when volatility changes? Let say ATR is rising, would you increase/decrease the MA period to make it more/less sensitive? And the bigger question would be, is there a relationship between volatility and moving average?

Interesting, I thought, and I added it to my very long list of things to think about. (In fact I've researched something vaguely like this before, but I couldn't remember what the results were, and the research was done whilst at my former employers, which means it's currently behind a firewall and a 150 page non disclosure agreement.)

Then a couple of days ago I ran a poll off the back of this post as to what my blogpost this month should be about (though mainly the post was an excuse to reminisce about the Fighting Fantasy series of books).

And lo and behold, this subject is what people wanted to know about. But even if you don't want to know about it, and were one of the 57% that voted for the other two options, this is still probably a good post to read. I'm going to be discussing principles and techniques that apply to any evaluation of this kind of system modification.

However: spoiler alert - this little piece of research took an unexpected turn. Read on to find out what happened...

Why this is topical

This is particularly topical because during the market crisis that consumed much of 2020 it was faster moving averages that outperformed slower. Consider these plots which show the average Sharpe Ratio for different kinds of trading rule averaged across instruments. The first plot is for all the history I have (back to the 1970's), then the second is for the first half of 2020, and finally for March 2020 alone:

The pattern is striking: going faster works much better than it did in the overall sample. What's more, it seems to be confined to the financial asset classes (FX, Rates and especially equities) where vol exploded the most:

Furthermore, we can see a similar effect in another notoriously turbulent year:

If we were sell side analysts that would be our nice little research paper finished, but of course we aren't... a few anecdotes do not make a serious piece of analysis.

Formally specifying the problem

Rewriting the above in fancy sounding language, and bearing in mind the context of my trading system, I can write the above as:

Are the optimal forecast weights across trading rules of different speeds different when conditioned on the current level of volatility?

As I pointed out in my last post this leaves a lot of questions unanswered. How should we define the current level of volatility? How do we define 'optimality'? How do we evaluate the performance of this change to our simple unconditional trading rules?

Defining the current level of volatility

For this to be a useful thing to do, 'current' is going to have to be based on backward looking data only. It would have been very helpful to have known in early February last year (2020) that vol was about to rise sharply, and thus perhaps different forecast weights were required, but we didn't actually own the keys to a time machine so we couldn't have known with certainty what was about to happen (and if we had, then changing our forecast weights would not have been high up our to-do list!).

So we're going to be using some measure of historic volatility. The standard measure of vol I use in my trading system (exponentially weighted, equivalent to a lookback of around a month) is a good starting point which we know does a good job of predicting vol over the next 30 days or so (although it does suffer from biases, as I discuss here). Arguably a shorter measure of vol would be more responsive, whilst a longer measure of vol would mean that our forecast weights aren't changing as much thus reducing the costs.
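To make that concrete, the sort of vol estimate I have in mind looks something like this sketch (the span of 35 business days is my stand-in for 'around a month'; this is not the exact production code):

```python
import pandas as pd

def ewm_daily_vol(price: pd.Series, span: int = 35) -> pd.Series:
    # Exponentially weighted standard deviation of daily percentage returns.
    # A span of roughly 35 business days gives the 'about a month' lookback
    # described above; a sketch, not the exact production estimator.
    perc_returns = price.pct_change()
    return perc_returns.ewm(span=span, min_periods=10).std()
```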

Now how do we define the level of volatility? In that previous post I used the current vol estimate divided by a 10 year rolling average of the vol for the relevant instrument. That seems pretty reasonable.

Here for example is the rolling % vol for SP500:

import pandas as pd
import numpy as np

from systems.provided.futures_chapter15.basesystem import *

system = futures_system()

instrument_list = system.get_instrument_list()

all_perc_vols = [system.rawdata.get_daily_percentage_volatility(code) for code in instrument_list]

And here's the same, after dividing by 10 year vol:

ten_year_averages = [vol.rolling(2500, min_periods=10).std() for vol in all_perc_vols]
normalised_vol_level = [vol / ten_year_vol for vol, ten_year_vol in zip(all_perc_vols, ten_year_averages)]

The picture is very similar, but importantly we can now compare and pool results across instruments.

import numpy as np
import matplotlib.pyplot as plt

def stack_list_of_pd_series(x):
    stacked_list = []
    for element in x:
        stacked_list = stacked_list + list(element.values)

    return stacked_list

stacked_vol_levels = stack_list_of_pd_series(normalised_vol_level)

stacked_vol_levels = [x for x in stacked_vol_levels if not np.isnan(x)]
plt.hist(stacked_vol_levels, bins=1000)

What's immediately obvious is that this is a very skewed distribution, which becomes clear once we stack up all the normalised vols across markets and plot the distribution:

The median is 2.6, the mean is 2.9, but look at that right tail! About 1% of the observations are over 8.4, and the maximum value is nearly 30. You might think this is due to some particularly horrible markets (VIX?), but nearly all the instruments have seen normalised vol of more than 10 times the ten year average at some point, and the 1% tail is at least 6 times normal vol in every single instrument.
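The summary statistics quoted above can be pulled out along these lines (a sketch; `tail_summary` is a helper name I've made up for illustration):

```python
import numpy as np

def tail_summary(vol_levels):
    # Median, mean, 99th percentile and maximum of the stacked normalised
    # vol observations, ignoring missing values
    v = np.asarray([x for x in vol_levels if not np.isnan(x)])
    return {
        "median": np.median(v),
        "mean": np.mean(v),
        "p99": np.percentile(v, 99),
        "max": np.max(v),
    }
```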

At this point we need to think about how many vol regimes we're going to have, and how they should be selected. More regimes will mean we can more closely fit our speed to what is going on, but we'd end up with fewer data points in each (I'm reminded of this post where someone had inferred behaviour from just 18 days when the VIX was especially low). Fewer data points will mean our forecast weights will either revert to an average, or worse take extreme values if we're not fitting robustly.

I decided to use three regimes:
  • Low: Normalised vol in the bottom 25% quantile [using the entire historical period so far to determine the quantile] (over the whole period, normalised vol between a quarter and 1.85 times the ten year average)
  • Medium: Between 25% and 75% (over the whole period, normalised vol 1.85 to 3.5 times the ten year average)
  • High: Between 75% and 100% (over the whole period, normalised vol 3.5 to 30 times more than the ten year average)
There could be a case for making these regimes equal size, but I think there is something unique about relatively high vol, so I made that regime smaller (with low vol the same size for symmetry). Equally, there is a case for making them more extreme. There certainly isn't a case for jumping ahead and seeing which range of regimes performs the best - that would be implicit fitting!

def historic_quantile_groups(system, instrument_code, quantiles=[.25, .5, .75]):
    daily_vol = system.rawdata.get_daily_percentage_volatility(instrument_code)
    # We shift by one day to avoid forward looking information
    ten_year_vol = daily_vol.rolling(2500, min_periods=10).std().shift(1)
    normalised_vol = daily_vol / ten_year_vol

    quantile_points = [get_historic_quantile_for_norm_vol(normalised_vol, quantile) for quantile in quantiles]
    stacked_quantiles_and_vol = pd.concat(quantile_points + [normalised_vol], axis=1)
    quantile_groups = stacked_quantiles_and_vol.apply(calculate_group_for_row, axis=1)

    return quantile_groups

def get_historic_quantile_for_norm_vol(normalised_vol, quantile_point):
    return normalised_vol.rolling(99999, min_periods=4).quantile(quantile_point)

def calculate_group_for_row(row_data: pd.Series) -> int:
    values = list(row_data.values)
    if any(np.isnan(values)):
        return np.nan
    vol_point = values.pop(-1)
    group = 0  # lowest group
    for comparison in values[1:]:
        if vol_point <= comparison:
            return group
        group = group + 1

    # highest group will be len(quantiles)-1
    return group

Over all instruments pooled together...

quantile_groups = [historic_quantile_groups(system, code) for code in instrument_list]
stacked_quantiles = stack_list_of_pd_series(quantile_groups)

.... the size of each group comes out at:
  • Low vol: 59.5% of observations
  • Medium vol: 20.7%
  • High vol: 19.8%
That's different from the 25,50,25 you'd expect. That's because vol isn't stable over this period and we're using backward looking quantiles, rather than doing a forward looking cheat where we use the entire period to determine our quantiles (which would give us exactly 25,50,25).

Still, we've got almost a quarter in our high vol group, which is close to what we were aiming for. And I feel it would be some kind of cheating to go back and change the quantile cutoffs having seen these numbers.
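As a toy illustration of why backward looking quantiles behave like this: if vol drifts over the sample, each new observation sits systematically high (or low) within its own expanding history, so the buckets won't split 25/50/25. This sketch uses pandas' expanding rank (available from pandas 1.4):

```python
import numpy as np
import pandas as pd

# Vol that drifts upward over time: most points end up near the top of
# their own expanding history, so far more than 25% land in the 'high'
# bucket even though it is nominally the top quartile
np.random.seed(0)
drifting_vol = pd.Series(np.linspace(1.0, 3.0, 500) + np.random.rand(500) * 0.1)
expanding_pct = drifting_vol.expanding(min_periods=20).rank(pct=True)
share_high = (expanding_pct > 0.75).mean()
print("share of observations in 'high' bucket: %.0f%%" % (100 * share_high))
```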

Unconditional performance of momentum speeds

Let's get the unconditional returns for the rules in our trading system: momentum using exponentially weighted moving average crossovers, from 2_8 (2 day lookback versus 8 days) up to 64_256, plus the carry rule (not strictly speaking part of the problem we're looking at today, but what the hell: we can use it as a proxy for whether 'divergent' (momentum) or 'convergent' systems do better or worse when vol is high or low). These are average returns across instruments, which won't be as good as the portfolio level returns for each rule (we'll look at those later).

rule_list = list(system.rules.trading_rules().keys())
perf_for_rule = {}
for rule in rule_list:
    perf_by_instrument = {}
    for code in instrument_list:
        perf_for_instrument_and_rule = system.accounts.pandl_for_instrument_forecast(code, rule)
        perf_by_instrument[code] = perf_for_instrument_and_rule

    perf_for_rule[rule] = perf_by_instrument

# stack
stacked_perf_by_rule = {}
for rule in rule_list:
    acc_curves_this_rule = perf_for_rule[rule].values()
    stacked_perf_this_rule = stack_list_of_pd_series(acc_curves_this_rule)
    stacked_perf_by_rule[rule] = stacked_perf_this_rule

def sharpe(x):
    # assumes daily data; annualisation factor 16 ≈ sqrt(256 business days)
    return 16 * np.nanmean(x) / np.nanstd(x)

for rule in rule_list:
    print("%s:%.3f" % (rule, sharpe(stacked_perf_by_rule[rule])))


Similar to the plot we saw earlier: unconditionally, medium and slow momentum (and carry) tend to outperform fast momentum.

Now what if we condition on the current state of vol?
historic_quantiles = {}
for code in instrument_list:
    historic_quantiles[code] = historic_quantile_groups(system, code)

conditioned_perf_for_rule_by_state = []

for condition_state in [0, 1, 2]:
    conditioned_perf_for_rule = {}
    for rule in rule_list:
        conditioned_perf_by_instrument = {}
        for code in instrument_list:
            perf_for_instrument_and_rule = perf_for_rule[rule][code]
            condition_vector = historic_quantiles[code] == condition_state
            condition_vector = condition_vector.reindex(perf_for_instrument_and_rule.index).ffill()
            conditioned_perf = perf_for_instrument_and_rule[condition_vector]

            conditioned_perf_by_instrument[code] = conditioned_perf

        conditioned_perf_for_rule[rule] = conditioned_perf_by_instrument

    # keep the results for each state, so we can test significance later
    conditioned_perf_for_rule_by_state.append(conditioned_perf_for_rule)

    stacked_conditioned_perf_by_rule = {}
    for rule in rule_list:
        acc_curves_this_rule = conditioned_perf_for_rule[rule].values()
        stacked_perf_this_rule = stack_list_of_pd_series(acc_curves_this_rule)
        stacked_conditioned_perf_by_rule[rule] = stacked_perf_this_rule

    print("State:%d \n\n\n" % condition_state)
    for rule in rule_list:
        print("%s:%.3f" % (rule, sharpe(stacked_conditioned_perf_by_rule[rule])))

State:0  (Low vol)

Interesting! These numbers are better than the unconditional figures we saw above, but fast momentum still looks poor relatively speaking (these numbers, like all those in this post, are after costs). But overall the pattern isn't that different from the unconditional performance; nowhere near enough to justify changing forecast weights very much.

State:1 (Medium vol)

The 'medium' level of vol is more similar to the unconditional figures. Again this is nothing to write home about in terms of differences in relative performance, although relatively speaking fast is looking a little worse.

State:2 (High vol)

Now you've probably noticed a pattern here, and I know everyone is completely distracted by it, but just for a moment let's focus on relative performance, which is what this post is supposed to be about. Relatively speaking fast is still worse than slow, and it's now much worse.

Carry has markedly improved, but.... oh what the hell, I can't contain myself anymore. There is nothing that interesting or useful in the relative performance, but what is clear is that the absolute performance of everything reduces as we get into a higher volatility environment.

Testing the significance of overall performance in different vol environments

I really ought to end this post here, as the answer to the original question is a firm no: you shouldn't change your speed as vol increases. 

However we've now been presented with a new hypothesis: "Momentum and carry will do badly when vol is relatively high"

Let's switch gears and test this hypothesis.

First of all let's consider the statistical significance of the differences in return we saw above:

from scipy import stats

for rule in rule_list:
    perf_group_0 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[0][rule].values())
    perf_group_1 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[1][rule].values())
    perf_group_2 = stack_list_of_pd_series(conditioned_perf_for_rule_by_state[2][rule].values())

    t_stat_0_1 = stats.ttest_ind(perf_group_0, perf_group_1)
    t_stat_1_2 = stats.ttest_ind(perf_group_1, perf_group_2)
    t_stat_0_2 = stats.ttest_ind(perf_group_0, perf_group_2)

    print("Rule: %s , low vs medium %.2f medium vs high %.2f low vs high %.2f" % (rule,
        t_stat_0_1.pvalue, t_stat_1_2.pvalue, t_stat_0_2.pvalue))

Rule: ewmac2_8, low vs medium 0.26 medium vs high 0.00 low vs high 0.00
Rule: ewmac4_16, low vs medium 0.85 medium vs high 0.00 low vs high 0.00
Rule: ewmac8_32, low vs medium 0.96 medium vs high 0.00 low vs high 0.00
Rule: ewmac16_64, low vs medium 0.60 medium vs high 0.00 low vs high 0.00
Rule: ewmac32_128, low vs medium 0.20 medium vs high 0.00 low vs high 0.00
Rule: ewmac64_256, low vs medium 0.08 medium vs high 0.01 low vs high 0.00
Rule: carry, low vs medium 0.04 medium vs high 0.40 low vs high 0.00
These are p-values, so a low number means statistical significance. Generally speaking, with the exception of carry, the biggest effect comes when we jump from medium to high vol; the jump from low to medium doesn't result in significantly worse performance (indeed where the p-value is well above 0.5 there is essentially no difference at all).

So it's something special about the high vol environment where returns get badly degraded.

Is this an effect we can actually capture?

One concern I have is how quickly we move in and out of the different vol regimes; here for example is Eurodollar:

To exploit this effect we're going to have to do something like radically reduce our leverage whenever an instrument enters 'zone 2: high vol'. That clearly would have worked in early 2020 when there was a persistent high vol environment for some reason that escapes me now. But would we really get the chance to do very much for those brief few days in late 2019 when Eurodollar enters the highest vol zone?

Above you may have noticed I put in a one day lag on the vol estimate - this is to ensure we aren't conditioning today's return on a vol estimate that uses today's return - clearly we couldn't change our leverage or otherwise react until we actually had the close of business price.

[In my backtest I automatically lag trades by a day, so when I finally come to test anything this shift can be removed]

In fact I have a confession to make... when first running this code I omitted the shift(1) lag, and the results were even stronger, with heavily negative returns for all trading rules in the highest vol regime (except carry, which was barely positive). So this makes me suspicious that we wouldn't get the chance to react in time to make much of this.

Still, repeating the results with a 2 and even 3 day lag I still get some pretty low p-values, so there is probably something in it. Also, interestingly, with these greater lags there is more differentiation between the low and medium regimes. Here for example are the p-values for a 3 day lag:

Rule: ewmac2_8, low vs medium 0.06 medium vs high 0.01 low vs high 0.00
Rule: ewmac4_16, low vs medium 0.16 medium vs high 0.04 low vs high 0.00
Rule: ewmac8_32, low vs medium 0.13 medium vs high 0.08 low vs high 0.00
Rule: ewmac16_64, low vs medium 0.03 medium vs high 0.06 low vs high 0.00
Rule: ewmac32_128, low vs medium 0.01 medium vs high 0.06 low vs high 0.00
Rule: ewmac64_256, low vs medium 0.02 medium vs high 0.14 low vs high 0.00
Rule: carry, low vs medium 0.08 medium vs high 0.46 low vs high 0.01

A more graduated system

Rather than using regimes, I think it would make more sense to use a more continuous variable: the quantile percentile itself, rather than the regime bucket it falls into. Then we won't drastically shift gears between regimes.

Recall our three regimes:
  • Low: Normalised vol in the bottom 25% quantile
  • Medium: Between 25% and 75%
  • High: Between 75% and 100% 
One temptation is to introduce something just for the high regime, where we start degearing when our quantile percentile is above 75%; but that makes me feel queasy (it's clearly implicit fitting), plus the results with higher lags indicate that it might not be a 'high vol is especially bad' effect, but rather a general 'as vol gets higher we make less money'.

After some thought (well 10 seconds) I came up with the following:

Multiply raw forecasts by L where (if Q is the percentile expressed as a decimal, eg 1 = 100%):

L = 2 - 1.5Q

That will vary L between 2 (if vol is really low) and 0.5 (if vol is really high). The reason we're not turning off the system completely for high vol is for all the usual reasons: although this is a strong effect, it's still not a certainty.
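Plugging some values into the formula (a trivial sketch):

```python
def forecast_multiplier(q: float) -> float:
    # L = 2 - 1.5 * Q: scale the raw forecast up when vol is historically
    # low, and down towards a half when vol is historically high
    return 2 - 1.5 * q

# Q = 0   (lowest ever vol)  -> L = 2.0
# Q = 0.5 (median vol)       -> L = 1.25
# Q = 1   (highest ever vol) -> L = 0.5
```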

I apply this to the raw forecast, because there is no guarantee that multiplying by L will leave the forecast with the correct scaling. If I then estimate forecast scalars using these attenuated forecasts, I will end up with something that has the right scaling.

These forecasts will then be capped at -20,+20; which may undo some of the increase in leverage applied when vol is particularly low.


Smoothing vol forecast attenuation

The first thing I did was to see what the L factor actually looks like in practice. Here it is for Eurodollar [I will give you the code in a few moments]:

It sort of seems to make sense; here for example you can see the attenuation backing right off in early 2020 when we had the COVID inspired high vol. However it worries me that this thing is pretty noisy. Laid on top of a relatively smooth slow moving average, it is going to boost trading costs quite a lot. I think the appropriate thing to do is smooth it before applying it to the raw forecast. Of course if we smooth it too much then we'll lag the changes in vol.

Once again, the wrong thing to do here would be some kind of optimisation of post cost returns to find the best smoothing lookback, or something keyed to the speed of the relevant trading rule; instead I'm just going to plump for an EWMA with a 10 day span.

Testing the attenuation, rule by rule

Here then is the code that implements the attenuation:

import numpy as np
import pandas as pd
from statsmodels.distributions.empirical_distribution import ECDF

from systems.forecast_scale_cap import *

class volAttenForecastScaleCap(ForecastScaleCap):

    def get_vol_quantile_points(self, instrument_code):
        ## More properly this would go in raw data perhaps
        self.log.msg("Calculating vol quantile for %s" % instrument_code)
        daily_vol = self.parent.rawdata.get_daily_percentage_volatility(instrument_code)
        ten_year_vol = daily_vol.rolling(2500, min_periods=10).std()
        normalised_vol = daily_vol / ten_year_vol

        normalised_vol_q = quantile_of_points_in_data_series(normalised_vol)

        return normalised_vol_q

    def get_vol_attenuation(self, instrument_code):
        normalised_vol_q = self.get_vol_quantile_points(instrument_code)
        vol_attenuation = normalised_vol_q.apply(multiplier_function)

        smoothed_vol_attenuation = vol_attenuation.ewm(span=10).mean()

        return smoothed_vol_attenuation

    def get_raw_forecast_before_attenuation(self, instrument_code, rule_variation_name):
        ## original code for get_raw_forecast
        raw_forecast = self.parent.rules.get_raw_forecast(
            instrument_code, rule_variation_name
        )

        return raw_forecast

    def get_raw_forecast(self, instrument_code, rule_variation_name):
        ## overridden method: this will be called downstream so don't change the name
        raw_forecast_before_atten = self.get_raw_forecast_before_attenuation(instrument_code, rule_variation_name)

        vol_attenuation = self.get_vol_attenuation(instrument_code)

        attenuated_forecast = raw_forecast_before_atten * vol_attenuation

        return attenuated_forecast

def quantile_of_points_in_data_series(data_series):
    results = [quantile_of_points_in_data_series_row(data_series, irow) for irow in range(len(data_series))]
    results_series = pd.Series(results, index=data_series.index)

    return results_series

# this is a little slow so suggestions for speeding up are welcome
def quantile_of_points_in_data_series_row(data_series, irow):
    if irow < 2:
        return np.nan
    historical_data = list(data_series[:irow].values)
    current_value = data_series.values[irow]
    ecdf_s = ECDF(historical_data)

    return ecdf_s(current_value)

def multiplier_function(vol_quantile):
    if np.isnan(vol_quantile):
        return 1.0

    return 2 - 1.5 * vol_quantile
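As an aside, since the quantile function above is slow: an almost equivalent vectorised version can be had from pandas' expanding rank (pandas 1.4 or later). Unlike the ECDF-of-history-only version it includes the current point in the distribution, so the values differ slightly, but it avoids rebuilding an ECDF on every row:

```python
import pandas as pd

def quantile_of_points_in_data_series_fast(data_series: pd.Series) -> pd.Series:
    # Percentile rank of each point within its own expanding history
    # (including the current point); a vectorised near-equivalent of the
    # row-by-row ECDF calculation above
    return data_series.expanding(min_periods=3).rank(pct=True)
```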

And here's how to implement it in a new futures system (we just copy and paste the futures_system code and change the object passed for the forecast scaling/capping stage):

from systems.provided.futures_chapter15.basesystem import *

def futures_system_with_vol_attenuation(data=None, config=None, trading_rules=None, log_level="on"):

    if data is None:
        data = csvFuturesSimData()

    if config is None:
        config = Config(
            "systems.provided.futures_chapter15.futuresconfig.yaml")

    rules = Rules(trading_rules)

    system = System(
        [
            Account(),
            Portfolios(),
            PositionSizing(),
            FuturesRawData(),
            ForecastCombine(),
            # the only change from the standard futures_system stage list:
            volAttenForecastScaleCap(),
            rules,
        ],
        data,
        config,
    )

    system.set_logging_level(log_level)

    return system

And now I can set up two systems, one without attenuation and one with:
system = futures_system()
# will equally weight instruments

# need to do this to deal fairly with attenuation
# do it here for consistency
system.config.use_forecast_scale_estimates = True

# will equally weight forecasts

# standard stuff to account for instruments coming into the sample
system.config.use_instrument_div_mult_estimates = True

system_vol_atten = futures_system_with_vol_attenuation()
system_vol_atten.config.use_forecast_scale_estimates = True
system_vol_atten.config.use_instrument_div_mult_estimates = True

rule_list = list(system.rules.trading_rules().keys())

for rule in rule_list:
    sr1 = system.accounts.pandl_for_trading_rule(rule).sharpe()
    sr2 = system_vol_atten.accounts.pandl_for_trading_rule(rule).sharpe()

    print("%s before %.2f and after %.2f" % (rule, sr1, sr2))

Let's check out the results:
ewmac2_8 before 0.43 and after 0.52
ewmac4_16 before 0.78 and after 0.83
ewmac8_32 before 0.96 and after 1.00
ewmac16_64 before 1.01 and after 1.07
ewmac32_128 before 1.02 and after 1.07
ewmac64_256 before 0.96 and after 1.00
carry before 1.07 and after 1.11

Now these aren't huge improvements, but they are very consistent across every single trading rule. But are they statistically significant?
from syscore.accounting import account_test

for rule in rule_list:
    acc1 = system.accounts.pandl_for_trading_rule(rule)
    acc2 = system_vol_atten.accounts.pandl_for_trading_rule(rule)
    print("%s T-test %s" % (rule, str(account_test(acc2, acc1))))

ewmac2_8 T-test (0.005754898313025798, Ttest_relResult(statistic=4.23535684665446, pvalue=2.2974165336647636e-05))
ewmac4_16 T-test (0.0034239182014355815, Ttest_relResult(statistic=2.46790714210943, pvalue=0.013603190422737766))
ewmac8_32 T-test (0.0026717541872894254, Ttest_relResult(statistic=1.8887927423648214, pvalue=0.058941593401076096))
ewmac16_64 T-test (0.0034357601899108192, Ttest_relResult(statistic=2.3628815728522112, pvalue=0.018147935814311716))
ewmac32_128 T-test (0.003079560056791747, Ttest_relResult(statistic=2.0584403445859034, pvalue=0.03956754085349411))
ewmac64_256 T-test (0.002499427499123595, Ttest_relResult(statistic=1.7160401190191614, pvalue=0.08617825487582882))
carry T-test (0.0022278238232666947, Ttest_relResult(statistic=1.3534155676590192, pvalue=0.17594617201514515))

A mixed bag there, but with the exception of carry there does seem to be a reasonable amount of improvement, most markedly with the very fastest rules.
Again, I could do some implicit fitting here to only use the attenuation on momentum, or to use less of it on slower momentum. But I'm not going to do that.


To return to the original question: yes, we should change our trading behaviour as vol changes. But not in the way you might think, especially if you had extrapolated the performance from March 2020.

As vol gets higher, faster trading rules do relatively badly; but the bigger story is that all momentum rules suffer (as does carry, a bit). Not what I had expected to find, but very interesting. So a big thanks to the internet's hive mind for voting for this option.

Monday, 1 March 2021

Does X work, some brief thoughts and choose your adventure

When I was a spotty teenager I was a walking nerd cliche. I liked computers; both for programming and games. I was terrified of girls. I was rubbish at nearly all sports*.  And I played D&D (and Tunnels and Trolls, and Runequest).

* Nearly all: No, I'm not talking about the 'sport' of Chess: I was also rubbish at Chess and still am. But due to some weird anomaly I was a dinghy sailing champion at school, and later a world champion.

I also remember reading the Fighting Fantasy books written by Games Workshop founders Steve Jackson and Ian Livingstone. 

Copyright image used without permission, but if you click here you can buy this book so that seems fair

These books were 'choose your adventure' style. So you'd read page 1 and it would say something like 'You are outside a mysterious castle... long dull description of castle follows.... Do you (a) climb the castle wall (p.34), (b) dress up as a washerwoman and try to enter the castle through the front gate (p.172), or (c) attack the sentries head on (p.91)?' And after making your choice you'd turn to page 34, or 172, or 91; and you'd find out what the consequences of your actions were, and there would be a new set of options, unless you'd already died (hint: option (c) is a poor choice).

As well as playing the book properly you could do 'fun' stuff like reverse engineering the network graph for the pages and working out the optimal shortest route through the book.

Now anyone under the age of 35 will be open mouthed at how archaic this sounds, but yes, this is what we had to do for entertainment in the 1980's. And that was partly because even then parents were worried about screen time, and if you were curled up in a corner with an actual book they would be happier than if you were sat in front of your ZX Spectrum playing Jet Set Willy (kids: google it). Of course now I spend entire days in front of a screen, and my idea of relaxation is to read books, so go figure.

You may be wondering what this has to do with anything, but bear with me for a bit as I'm now going to radically change the subject. 

Quite a few of the queries I get asked run along the lines of 'Does X make sense?'. Here are a few recent examples:

  • Does changing your moving average speed make sense in different periods of volatility?
  • Does it make sense to use a different system on the long or the short side?
  • Does it make sense to use a different system in bull or bear markets?
  • Should I use different parameters for different instruments?
These 'does it make sense' questions are interesting. For starters, they all seem to make intuitive sense. It seems crazy that the market would behave in exactly the same way in periods of high and low vol. It's obvious the market doesn't go up and down in the same way. And naturally, trading Eurodollar is completely different from trading VIX.

These 'does it make sense' questions are also dangerous. Every single one of them is an invitation to make your system more complex and less intuitive. Every single one is replete with the opportunity to overfit. They introduce a new set of parameters, to define market state; and then multiply the existing set of parameters by allowing different values for different states.

These questions are also complicated, and involve answering several different questions. They involve modifying or changing your system according to some kind of exogenous input, but exactly what that input should be isn't always obvious. It isn't always clear how we should make the change. And they also bring up some interesting questions about how we should evaluate the relevant changes.

Take the first question as an example. First of all we need to define what a different level of volatility looks like, how it should be measured, how many states there should be, and so on. There is room for a lot of implicit fitting there, so we should probably keep things simple and try and stick to some predefined measure; maybe 2 to 4 states of volatility using quartile measures based on the full history of the relevant instrument.

We then need to work out how to change our moving average speed (which for once is a very tightly defined 'how'). For me that at least is relatively straightforward: I'd change the forecast weights that determine how my forecasts are linearly combined. That at least can be done with explicit fitting to avoid exploding the number of parameters we have to consider. 

As for evaluation, well that isn't so bad: we can just compare a system before and after. Of course we need to decide if we're just going to look at Sharpe Ratio, or whether we also care about skew. Also, are we happy with any improvement no matter how small? Or should we seek a higher threshold given we're making our system more complicated? Or is this a change that makes sense in its own right, which we should be happy to make even if it performs slightly worse (although not worse in a statistically significant sense)?

What about the long and short system? Defining long and short is easy enough, but what parameters should we change? Something like the response function to the forecast (this kind of plot) perhaps. That might mean that we take on smaller positions when short in certain markets, which I suppose is what people would like and expect to see. 

However, should we evaluate such a change based on pure Sharpe Ratio? If we did this, then we'd end up never going short markets which have barely gone down in the past, and basically increase our long bias to eg bonds (we can make things worse, incidentally, by not forcing the response function to go through zero). That would only make sense for someone who is only investing in this trading strategy, and doesn't also have something like a 60:40 portfolio already. 

So a better evaluation would be something like 'alpha', where we are looking for improved returns versus the market rather than outright improvements (of course how we define 'the market' is another moot point).

Bull and bear markets is a related question. The first question, once again, is how to define bull and bear in a non forward looking way. Something like the risk adjusted 200 day EWMA seems reasonable without being tempted down the path of overfitting. I used changes in US interest rates in this post as a proxy (something that's relevant after the wild changes in rates products over the last few days).

How should we implement the change? I'd be most tempted to allow my forecast weights to change; rather than modifying the behaviour of any individual rule. And again, evaluation should rightly consider 'alpha' not just outright Sharpe Ratio. And consistency of performance should probably come into it; higher SR isn't as good as seeing better performance in bear markets as that 'insurance' payoff is part of what people like to buy CTA type strategies for.

Different parameters for different markets is another massive can of worms. The same issues of how to change things and how to evaluate them is relevant. Again, I'd be most tempted to fit different forecast weights, which I do a bit anyway because different instruments have different cost levels and I take those into account. 

But there is surely some value in pooling data from other markets as well. Shouldn't we fit based on some blend of an instrument's own data, plus data from other markets (where the blend would depend on how much data the relevant instrument has)? Should we also give higher weight to markets that are more similar (same asset class, same country, in some kind of correlation cluster...)?
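One simple way to express that blend is shrinkage: pull an instrument's own fitted forecast weights towards the pooled cross-instrument weights, with the pull getting weaker the more data the instrument has. The 30-year 'full weight' horizon below is an arbitrary illustrative choice:

```python
def blended_weights(own_weights: dict, pooled_weights: dict,
                    n_years_of_data: float,
                    full_weight_years: float = 30.0) -> dict:
    """Shrink an instrument's own fitted forecast weights towards the
    pooled (cross-instrument) weights when it has little data."""
    own_share = min(n_years_of_data / full_weight_years, 1.0)
    return {rule: own_share * own_weights[rule]
                  + (1 - own_share) * pooled_weights[rule]
            for rule in own_weights}
```

An instrument with only a few years of history would then trade mostly on pooled weights, while one with decades of data would mostly use its own fit.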

Also, what if we do this and one or more of the other changes? Should we trade long versus short differently for US 10 year bonds than we do for VIX? 


As you can see, these 'does X work' questions can quickly become very complicated. So you can see why I'm wary of doing them.

But I've decided to do one of these investigations this month, and you get to choose which one!

Yes, just for this month, I'm going to make this a 'choose your adventure' blog where you get to decide the ending. These are the options:
  • Does changing your moving average speed make sense in different periods of volatility?
  • Does it make sense to use a different system on the long or the short side?
  • Does it make sense to use a different system in bull or bear markets?
I haven't included 'fitting for instruments' since that is a significant project rather than something I can do in a few days, although I will probably look at it in the future.

Let me know which of the ideas above you'd like me to investigate, and I'll write a longer blog post about the one that is the most popular. The poll is here on twitter and on quiz-maker....


The votes are in! Here they are (in traditional reverse order):

  • Does it make sense to use a different system in bull or bear markets? 23.8%
  • ... to use a different system on the long or the short side? 32.9%
  • ... to change your moving average speed in different periods of volatility? 43.3% - the winner!
The vote breakdown is here, including Twitter (TWTR), quiz-maker (QM) and below the line (BTL) comments:

              TWTR  BTL  QM  Total
Vol & speed     44    0  18     62
Long & Short    33    3  11     47
Bull & Bear     28    0   6     34

So at some point in the very near future I'll be posting on "Does changing your moving average speed make sense in different periods of volatility?"

UPDATE 4th March

Monday, 1 February 2021

So you want to be a quant/systematic trader?

 One of the upsides of having a (very, very minor) public profile is that you get a lot of people asking you for advice, which is flattering (and if you say otherwise, you need to consider just how first world that particular 'problem' is). The only downside of this is you get asked the same sort of question a number of different times. At some point it becomes worth writing a blog article about the subject, which saves time, but also means the person asking will get a much better answer.

(Also, cynically, posts like this get more clicks than ones about obscure corners of portfolio optimisation)

The generic question this article seeks to answer is "How do I become like you, Rob?" And by 'like you', they don't mean "How do I become a bald middle-aged bloke with three kids, a mortgage, and an awesome shed?" They want to know how to become a systematic / quantitative trader.

Now there is a trite answer to this which is 'read all my books and stop bothering me you peasant', but of course even the most arrogant and prolific author cannot really believe that their canon alone is sufficient reading material to prepare someone for their future career.

This post is divided into three parts; firstly I define what I mean precisely by the end goal of becoming a systematic/quantitative trader. Secondly I discuss routes to market, how you can actually end up in this lofty position. Finally I talk about the resources I would recommend to help you.

Where do you want to end up?

The phrase 'quant / systematic trader' that I began this post with is deliberately vague; it's not clear exactly what it means. And that's because I don't want this post to be limited to readers who want to end up exactly like me: trading futures with a holding period averaging a few weeks, with a fully automated system lovingly coded in Python, using mostly momentum and carry type signals.

For starters there are a whole bunch of different trading styles and assets that are ripe for trading in a systematic or quantitative way; options, ETFs, equities, cash bonds, swaps and CDS; and you can trade those from high frequency up to buy and HODL forever; using valuation factors, relative value basis, or by providing liquidity, or in a thousand different ways.

And of course you can trade purely systematically, or in a purely discretionary way but guided by numbers (so still a quant), or in some mixture of the two; with or without a fully automated system.

And there is more to finance than trading; there is risk management, portfolio management, execution trading, quant software development, quant pricing and many other associated jobs.

I'm pointing this out for a few reasons. Firstly, there is a lot of overlap between the skill sets required for these jobs. So even if you don't want to become a medium speed fully automated python futures trader with a bias towards momentum and carry (to be abbreviated to M.S.F.A.P.F.T.B.M.A.C. for the rest of this post), a lot of what I will say will still be relevant to you. 

For example, pretty much everyone working in the math'y end of finance will need to code. But there is coding, and there is coding. So quant developers in high frequency trading will probably need to be fluent in C, and at the other extreme quant options traders of the 'shift-F9' monkey flavour will need to know some VBA but little else.

Secondly, and this will become important in the next part of this post, it's not uncommon for people to transfer between these roles. Almost nobody in finance is still doing the job they started doing. Just today I had a LinkedIn message from an old colleague whose CV looks like this: Maths PhD -> statistical forecasting -> rates trader -> teacher -> software engineer. 

Remember that you don't necessarily know where you will end up, and it's good to keep an open mind. Two things are very valuable in finance, and equally valid in life:

  • Optionality: keep your options open
  • Diversification: don't put your eggs in one basket

Routes to market

OK, so let's assume you have at least a vague idea of where you want to end up, how do you get there?

I did do a post on this some time ago, which is still worth reading, and I've also written about it elsewhere. A key differentiator is whether you want to end up trading your own money, or other people's. Many people assume the correct approach is to trade your own money first, build up an amazing track record, and then fight off all the hedge fund managers who will be desperately trying to recruit you, or the outside investors who will be throwing money at you.

But there are a number of reasons why this is extremely unlikely. In practice the journey in the other direction is more common; the world is full of ex-professional money managers like me sitting in their sheds (or if they are more successful than I was, in their ski lodge in Verbier) trading their own money, but there are relatively few ex-shed dwellers working on Wall Street (at least pre-pandemic; in this time of COVID pretty much everyone is currently working in an actual or metaphorical shed).

For most people then the answer is to:

  • get a fancy finance job, and either do it forever or at some point retire and trade your own money
  • have another job, earn enough money to trade with, and then at some point hopefully have enough money to stop working and just live off your trading earnings

The skillsets for these two routes do have some overlap, but there are some important differences. For example, if you are going to try and make a living as a finance professional it helps to have some political and people skills, even amongst the rough and tumble of a trading floor or the autistic spectrum of a cliched quant group. 

Joking(?) aside, formal qualifications are extremely important in the world of professional finance (and they will also matter to outside investors if you were to go for the lottery ticket option of starting your own fund) but will not matter at all if you can only lose your own capital.

So the first step if you are going down the pro-route is to get a degree... and probably more than one. The CEO at AHL who I worked under was hired straight out of uni in 1991 with an undergraduate degree. Fifteen years after that, I was hired in 2006 with a masters (and some experience). Another fifteen years later, in 2021, it will be much harder to get an elite front office quant job without a PhD.

It goes without saying the degree should probably be in maths, science, engineering, computer science or some variety of economics. And from as good a university as you can get into. From a job perspective it's better to be doing a less prestigious degree at a good university rather than vice versa, as long as you're going to get at least a 2:1: a 2:1 from a good university is seen as better than a first from a poorer one by most recruiters (wrongly! but this is the world we live in), while a 2:2 even from Cambridge won't get you through the door (clearly this is a UK centric opinion). It's also better to do a degree in a more traditional subject; computer science rather than game design, for example.

Having said all that, if you really love history and get a place at a good university to do it then you should do it. Yes it's unlikely you will end up writing option pricing code (lucky you!), but there are still plenty of excellent jobs in finance that you can do, and you will also be able to do lots of other jobs as well: optionality.

The next piece of advice I give everybody is to think about the following hierarchy:

  1. The job you want at the place you want to work
  2. The job you want at a place that isn't quite as good
  3. Another front office job at the place you want to work
  4. Another front office job at a place that isn't quite as good
  5. The job you want at somewhere that's not good at all
  6. Something else that uses your skill set, not in finance
  7. Something else in finance

Clearly if you have a choice you should probably prioritise 1 above 2, and so on. I'd say generally it's better to have the job you want, even if it means working at Morgan Stanley rather than Goldmans: people hop between firms all the time, and if you're good you will have no trouble moving up the IB ranking or HF AUM table. The exception is (5), because having somewhere rubbish on your CV can harm your future career. 

So it's probably unwise to take a job as a 'trader' at some third rate bucket shop (where you'll spend all your time hedging customer flow and earning a relatively meagre income, as well as not being able to look at yourself in the mirror because of all the poor slobs you are ripping off). Better to work in risk management at a half decent bank, where you will get a feel for what the opportunities are, and have a reasonable chance of becoming a proper trader if it turns out that is what floats your boat.

I've spoken to several students who have said things like 'Well I was offered a job in sales at <tier one investment bank>, but I really want to be a hedge fund trader so I've turned them down'. This is very stupid! From sales in IB to hedge fund trader is two or three hops on the snakes and ladders board of life, and none of those hops is insurmountably large. 

And it may turn out that you're much more suited to sales anyway; you never know, those recruitment people may have seen something in you that you didn't see in yourself (and I speak as someone who interviewed for a banking research job, and ended up getting an offer from the trading desk: "Yes this guy is a total nerd and on the face of it ideal research fodder. But his personality profile indicates a strong psychopathic tendency, so he's our man").

I know dozens of people who started out as quants, or developers, or risk managers; and are now systematic portfolio managers or quant traders. Better to accept a job doing that, as long as it's at a half decent firm, than hold out for a lottery ticket that may never pay off. As I said above, it's unlikely that you even know at the age of 21 (or whatever) what you want to end up doing. 

This also means you shouldn't prioritise any job in finance over everything else. If you have a degree in computer science, and have a choice between a grunt middle office finance job writing SQL queries for some legacy big iron database, or a more interesting job at a data science startup, then for god's sake take the second option, even if it pays less. 

Although the SQL grunt is on the same org chart and possibly in the same building as the trader (though unlikely the same floor), the reality is that the journey from the former to the latter is very difficult. Whereas if you become an expert in using big data, your chances of getting hired by a hedge fund to do the same are exponentially higher, and as I've already said, from quant developer to quant trader is a relatively common journey. 

What's more, the second job leaves you with more options open, both inside and outside of finance. The likely paths from SQL grunt include 0.001% of paths where you end up as a trader, 0.999% of paths where you get stuck somewhere on the journey, and 99% of paths where you remain an SQL grunt until someone finally works out how to copy the data into MongoDB, at which point you get fired.

This also means that if you are interested in trading your own money, then you should be doing something right now that you enjoy and are good at, and that, if you are really lucky, also pays well enough to save money. Don't do a degree in Economics just because you think you need to. Do something you love. If you do hit the career or trading-your-own-money jackpot, you don't want to be one of those desperately boring people who retire at the age of 40 or 50 with no interests outside of finance, and aren't actually interested in finance anyway.


Resources

One of the fun things about this 'job' is that it requires a wide variety of skills to do well. This is doubly true if you're an independent trader, since you have to do everything yourself. That means this section has a lot of headings!

However a few caveats:

  • As I said above, there are a wide variety of things you can do in this field and the emphasis will be different depending on exactly what role you want to end up doing.
  • This list will inevitably be weak in areas where I am weak myself; I've never worked as a high frequency trader or options valuation quant. 
  • Like everything I write, this list is tainted by my subjective preferences and experiences.
  • I am old! I still think fondly of textbooks I was using as an undergraduate 20 years ago. More recent ones may have passed me by.
  • Other people have produced lists like this, and done a more rigorous job, for example here, and here
This section of the post is mostly a truncated version of this page, where I've focused only on the books and websites that are directly relevant for the problem in hand, and cut out most of the 'nice to haves' in favour of the 'must haves'. Nevertheless, I encourage you to check out the longer list of books on that page.


"How do I learn to code" is another question I get asked a lot. And it's very difficult for me to answer it. I learned to code nearly 40 years ago, at the age of seven, in BASIC on one of these:

TRS-80 color computer
By Bilby - Own work, CC BY 3.0,

Since then I've learned and mostly forgotten at least 30 other languages (I've even forgotten the names of some of them). So when someone asks "How do I learn python like you did", well the truthful answer is to go back in time 40 years and learn BASIC, assembler, C, SQL ..... Matlab, R, S-plus, and then learn Python. If the questioner is a 20 year old student that isn't helpful.

In all seriousness, there are dozens of websites which teach you how to code for free, and I can do no more than point you towards them for Python specifically. 

A question I can answer is "How do you become a better Python programmer". This is in fact two questions, how do you write better Python? And how do you become a better programmer?

Better Python:

  • Python cookbook, Beazley and Jones
  • Classic computer science problems in python, Kopec
  • Effective python, Slatkin (some overlap with the cookbook, but a lot shorter and therefore cheaper)

Better programmer:

  • Clean code, Martin: Concise and brilliant 
  • The Art of Unix programming, Raymond: Useful even for non Unix people 
  • Code complete, McConnell: Large reference manual 

Alongside this, there is some specific Python that it's super useful to know for finance. I don't actually own these, and I haven't read the third or fourth, but the author is highly rated. 

  • Python for finance, Hilpisch.
  • Python for data analysis, by the creator of pandas, Wes McKinney
  • Derivatives Analytics with Python, Hilpisch.
  • Python for Algorithmic Trading, Hilpisch (note covers OANDA and FXCM but not IB)

Of course there are other languages than Python like R and Matlab or C (all of which I've used in the past) and Java (which I haven't used extensively, and therefore I naturally hate). This isn't the place for a language war (there is some discussion here of what might work best), but if you want references on material for other languages you might try here (for R), and here (for C++).

There are some coding blogs and websites that I've found particularly useful and interesting.

Automated trading (with interactive brokers)

A very specific coding need is to send orders to a broker. If you use Interactive Brokers like me (via ib_insync and using the IB controller), then you'll need to become very familiar with the following web addresses:

You may also want to look at my open source backtesting and trading engine, plus my series of posts on using the python TWS API.

Econometrics, statistics and all that jazz

The problem with young people today is that they think they know everything because they've played around with some black box machine learning package. But they haven't got a firm grasp of the basics. Which means they are very likely to end up overfitting the hell out of everything.

  • Fundamental methods of Mathematical Economics, Chiang. Good starting point if you've forgotten a lot of maths
  • Econometric Analysis, Greene: Best introductory econometrics textbook mainly because of the absurdly long but endlessly entertaining chapter endnotes
  • Market models, Alexander. 
  • The Elements of Statistical Learning, Hastie. The classic ML book.
  • Advances in Financial Machine Learning, Lopez de Prado. You're only allowed to read this once you've got the basics under your belt. Read my review.

Derivatives pricing and trading

Clearly what you read here depends on whether you are going to be a pricing quant, in which case you need to be able to throw around Ito's lemma in your sleep, or just punt around a few futures.

  • Quantitative finance for dummies, Bell. Good for dummies.
  • Paul Wilmott introduces quantitative finance, by .... well guess. Good for beginners.
  • Options, futures and other derivatives, Hull. The absolute classic, but overkill for many people. By law, though, it has to be on this list.
  • Derivative securities, Jarrow & Turnbull. Similar level to Hull, and actually (whispers) I prefer it.
  • Dynamic Hedging, Taleb. A bit of a marmite book (like Taleb himself really) but I found it very helpful when I was working as an options trader.

Risk management

  • Red-Blooded Risk: The Secret History of Wall Street, Aaron Brown. Non technical history of quant risk management over recent years from a dude that was there. 
  • Quantitative risk management, McNeil, Frey, Embrechts. Technical manual for risk managers.

Behavioural finance

  • Beyond greed and fear, Shefrin. Quite an old book now but a very good accessible introduction to the world of behavioural finance and relatively brief.  I suggest you read Thinking Fast and Slow after this if you are in a hurry; otherwise reverse the order.
  • Thinking Fast and Slow, Kahneman. Not just a great finance book. This book will literally change the way you think about thinking (see what I did there). Arguably it isn't necessary to read this to follow the behavioural finance literature. However if you care about whether behavioural finance has some kind of underpinning then its an absolute must.


  • How to predict the unpredictable, Poundstone.  
  • The signal and the noise, Silver. Yes it's the 538 guy
  • Thinking in Bets: Making Smarter Decisions When You Don't Have All the Facts/ Annie Duke
  • Forecast: What Physics, Meteorology, and the Natural Sciences Can Teach Us About Economics. Mark Buchanan
  • Radical Uncertainty: Decision-making for an unknowable future. Mervyn King, John Kay

Financial economics

  • Fortune's Formula. Superb non-technical book about the Kelly criterion; entertaining but also incredibly instructive on the history of the links between gambling and the financial markets.
  • A random walk down Wall Street. This book has been around longer than me; and it's like marmite, you either agree with its efficient markets hypothesis creed or you don't. Certainly the later editions have drifted from being a useful survey of the various factor inefficiencies to being yet another 'how to' on personal investment. If you find an earlier edition in a second hand bookshop it's worth buying; otherwise Expected Returns is a better use of your money.
  • Expected Returns, Antti Ilmanen. Absolute classic on return factors.
  • Irrational exuberance. Excellent book by Robert Shiller on speculative bubbles.
  • Capital ideas and Capital Ideas Evolving. Interesting history of the whole efficient market hypothesis approach.
  • Adaptive markets, Lo. 
  • Active Portfolio Management, Grinold and Kahn: A quantative approach for producing superior returns and selecting superior money managers.
  • Narrative Economics: How Stories Go Viral and Drive Major Economic Events. Robert J. Shiller
  • Modern Investment Management: An Equilibrium Approach: Bob Litterman et al. Absolute bible.

High frequency trading

These are very good general reading albeit somewhat polemical; I would like to see a recommendation for a good technical book on this subject:
  • Dark pools, Patterson.
  • Flash boys, Lewis. 

Fixed income

There isn't much here that is asset specific, but I've spent slightly more time trading fixed income than anything else, so:
  • The Handbook of Fixed Income Securities, Fabozzi.
  • STIR futures, Aiken

General interest quant books

  • Nerds on wall street, Leinweber. Entertaining book written by someone who was there as the whole quant thing developed.
  • The Predictors: How a Band of Maverick Physicists Used Chaos Theory to Trade Their Way to a Fortune on Wall Street, Bass. Not as cheesy as the subtitle suggests. This is the book that got me into the systematic investment game. Doyne Farmer, now at Oxford, is one of the more interesting people in the finance world and a great speaker if you get the chance to listen to him. Also worth reading (though a little less relevant to finance) is the prequel, The Eudaemonic Pie, which is about betting on roulette.
  • The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution, Gregory Zuckerman. "Rentech. Probably the best hedge fund in the world." Also launched Donald Trump thanks to Bob Mercer's money, but nobody's perfect.

Books by traders

  • The education of a speculator, Victor Niederhoffer. Incredibly random, with no attempt to impose a coherent worldview or grand theory of everything. Imposing such an overview would be a ridiculous thing to do anyway, but Taleb and Soros would have tried to do so...
  • Market wizards series, Schwager. You must have heard of this guy. Surely.
  • Why Aren't They Shouting?: A Banker's Tale of Change, Computers and Perpetual Crisis. Kevin Rodgers. Great history of the markets.
  • All those books that Nassim Taleb guy has written. 'Fooled by Randomness' is my favourite. They get a little more mad and harder to follow as time goes on.

Trading books

  • Following the trend: Diversified Managed Futures Trading, Andreas Clenow. Nice book on trading futures CTA style.
  • Stocks on the move, Clenow. Trading equities with momentum.
  • A Complete Guide To The Futures Markets, Jack Schwager. Buy this rather than the other futures books Jack has written. Unless you really like Jack, and would like him to have as much of your money as is humanly possible.
  • Trading systems and methods, Perry Kaufman. A massive book with a four figure page count. Nevertheless it really is the bible of trading signals, and that is why everyone should buy it. Perry - is my cheque in the post?
  • Efficiently inefficient, Pedersen. Excellent book on trading some popular hedge fund strategies, interspersed with interviews.
  • The rise of Carry, Lee & Coldiron. My review.
  • Ernie Chan's various books.
  • Systematic Trading, Robert Carver
  • Leveraged Trading, Robert Carver

Useful blogs and websites


    As always please feel free to comment below (then wait until I have the time to moderate your comment before publishing it). I'm especially looking for ideas for additional resources that I haven't come across, which I'll add to the lists above.