Monday, 8 September 2025

PCA analysis of Futures returns for fun and profit, part deux

In my previous post I discussed what would happen if you did the crazy thing of doing a PCA on the whole universe of futures across assets, rather than just within US equities or bonds like The Man would want you to. In this post I explore how we could do something useful with the results. There is some messy code here; to run all of it you'll need pysystemtrade, but you can exploit big chunks with your own data even if you don't have it.


The big problem: sign flipping

Before we get to any p&l generating activity, however, we need to deal with an outstanding issue from the previous post.

TLDR, most of the time factor one is 'risk on /equities are go' and factor two is global interest rates; although not always. Factor sign flipping was a problem however (thanks to people below the line for that insight). So sometimes factor one was long equities, sometimes short equities. Sometimes it was something else entirely. 

As an example, remember this plot from part one? It's the factor exposure of the S&P 500 over time for factors 1 (0), 2 (1) and 3 (2).

Note there are 'blips' when we have a short exposure to factor 1, mostly in the period since 2008 when we're normally long factor 1. That's clearly a temporary sign flip. We probably want to get rid of those. But there is also the long period in the early 2000's when we're persistently short factor 1. That might be a 'sign flip'; but it could also be that factor one in this period was something more interesting than just 'long equity risk'. 

A couple of ideas spring to mind here. One is just smoothing the factor weights. That would easily solve the blips; and the smooth need only be a few weeks to get rid of them. But a longer smooth, of the length needed to get rid of the other periods, would reduce the information about the factors; in particular we'd be missing out on interesting times when something other than boring old risk on and off is driving the market.

Another bright idea I had was to reverse the sign on weights when the largest absolute value weight was negative. My expectation was that generally the largest weight on factor 1 (mostly risk on) would usually be equities, and when that factor flipped sign we'd flip it back again. However that didn't produce the expected results. If I were less lazy (and eager to get back to writing book #5), I'd probably do some research; eg I'm pretty sure the answer is somewhere in Gappy's new book but I haven't got there yet. 

In the end I decided to relax and ignore the sign flipping; I can do this because of the four ideas I outlined:

1- own the factors

2- trade the factors

3- buy assets with persistent alpha (+ve residual) 

4- mean revert the cumulative residual 

.... it's only really 1 and 2 that are affected by sign flipping. And I feel I already have things in my armoury for 1 and 2. For example my aggregate momentum signal (blogged about here, and also in my most recent book AFTS) is basically like 2, and on assets with a long bias that will also give us a chunk of 1 as well. 

<Sidebar * note to browser not actually HTML>

Arguably my relative momentum and long term mean reversion are also a bit like 3 and 4. Yet another idea is to build 'asset classes' using clustering as I did here, and then use those for the purposes of 1,2 and possibly 3 and 4. 

So we have three different ways of forming 'factors': exogenously determine asset classes, PCA, and clustering; and four different ways of trading each of them. Those won't give radically different results since clusters mostly follow asset classes, but they could be a little different.

<\Sidebar * see previous note>

But, I hear you cry, why can you flippantly ignore sign flipping when trading only the residuals? Well it's pretty simple; consider a standard APT type equation with a single PCA k and market i:

r_i,t = a_i + (b_i,k * r_k,t) + e_i,t

If we now do a sign flip, then the beta (b_i,k) will have a minus one in front of it, but the PCA return r_k,t will also have a minus one in front of it. These cancel, and estimation of both the persistent bias (alpha, a_i) and the temporary error (epsilon, e_i,t) will be unaffected.
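A quick numerical check of the cancellation argument above, as a minimal sketch on made-up data. The return series and the helper `ols_single` are hypothetical and not taken from the code behind this post:

```python
import random

def ols_single(y, x):
    """Closed-form OLS of y on one regressor x; returns (alpha, beta, residuals)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    return alpha, beta, residuals

random.seed(42)
# Made-up factor returns r_k,t and instrument returns r_i,t
factor = [random.gauss(0.0, 1.0) for _ in range(250)]
instrument = [0.02 + 0.7 * f + random.gauss(0.0, 0.5) for f in factor]

alpha, beta, resid = ols_single(instrument, factor)

# Flip the sign of the factor, as a PCA refit might do
flipped = [-f for f in factor]
alpha_f, beta_f, resid_f = ols_single(instrument, flipped)

max_resid_gap = max(abs(a - b) for a, b in zip(resid, resid_f))
print(f"beta {beta:.4f} vs flipped {beta_f:.4f}")     # same size, opposite sign
print(f"alpha {alpha:.6f} vs flipped {alpha_f:.6f}")  # unchanged
print(f"largest residual difference: {max_resid_gap:.2e}")
```

Only the beta changes sign; the intercept and every residual come out the same, which is why ideas 3 and 4 can shrug off the flipping.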


Trading the alpha

So we have two basic ideas; we generate our PCA and then run regressions that look like this:

r_i,t = a_i + (b_i,k * r_k,t) + ... + e_i,t

Where there are one or more PCA k.... And then we either buy positive a_i and sell negative; or we sell things with recent cumulative positive e_i.

There are still many design questions to resolve here. How many PCA do we include? Too few, and we'll probably end up missing something interesting. Too many and there is a risk we'll end up without clear signals. Over what period should we estimate betas and alphas? Basically, how persistent are they likely to be? Over what period should we cumulate epsilon? Are there periods in which epsilon will be trending rather than mean reverting; eg assets that have outperformed their factor adjusted return will continue to do so (which will look an awful lot like buying positive alpha)?

For the PCA I'm going to keep it simple and initially use three PCA, which happens to be the most I can plot and get my head around. I'm also going to stick to estimating my alphas and betas over a 12 month period, which is the arbitrary period I used before to estimate the PCA themselves (seems weird to use a different period). For the question of epsilon decay I will risk the wrath of the overfitting gods and do a time sensitivity analysis.

To summarise then: At the start of each month we look at the last 12 months of normalised returns, do a PCA, and then regress each instrument on the returns of each component. We then have an alpha intercept coefficient, and some betas (at most three, one for each PCA). We can see how well the alpha predicts returns in the following month(s). Then for the following month we can also calculate the residual of performance vs the fitted model. We can cumulate up these residuals and see how they forecast performance.
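The monthly procedure just summarised could be sketched roughly as follows. This is not the pysystemtrade code: the function names, the made-up data, and the choice of an SVD on demeaned returns to get the components are all assumptions for illustration.

```python
import numpy as np

def fit_month(returns, n_components=3):
    """Fit PCA and per-instrument regressions on a (days x instruments) array
    of vol normalised returns covering the lookback window."""
    demeaned = returns - returns.mean(axis=0)
    # Rows of vt are the component weights (the eigenvectors of the
    # covariance matrix), ordered by variance explained
    _, _, vt = np.linalg.svd(demeaned, full_matrices=False)
    weights = vt[:n_components]
    factor_returns = returns @ weights.T
    # Regress every instrument on an intercept plus the component returns
    design = np.column_stack([np.ones(len(returns)), factor_returns])
    coefs, *_ = np.linalg.lstsq(design, returns, rcond=None)
    return coefs[0], coefs[1:], weights  # alphas, betas, weights

def next_month_residuals(next_returns, alphas, betas, weights):
    """Out of sample residuals: realised return minus the fitted model."""
    fitted = alphas + (next_returns @ weights.T) @ betas
    return next_returns - fitted

# Made-up data: 256 days of lookback plus 22 days of the following month
rng = np.random.default_rng(0)
common = rng.normal(size=(278, 1))
all_returns = 0.6 * common + rng.normal(size=(278, 10))
lookback, following = all_returns[:256], all_returns[256:]

alphas, betas, weights = fit_month(lookback)
residuals = next_month_residuals(following, alphas, betas, weights)
print(alphas.shape, betas.shape, residuals.shape)
```

The weights fitted on the lookback window are reused to build the factor returns for the following month, so nothing in the residuals peeks at data from after the fit.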


Alpha

Let's start with the alphas. Here be a massive scatter plot:


Each point is the alpha calculated at the start of a given month for an instrument, and the normalised ex-post return for the following month. It looks like there might be a weak positive relationship there, so let's do some stats.

                           OLS Regression Results                            
==============================================================================
Dep. Variable:         ex_post_return   R-squared:                       0.006
Model:                            OLS   Adj. R-squared:                  0.006
Method:                 Least Squares   F-statistic:                     169.0
Date:                Mon, 08 Sep 2025   Prob (F-statistic):           1.62e-38
Time:                        11:46:09   Log-Likelihood:                 1910.6
No. Observations:               26311   AIC:                            -3817.
Df Residuals:                   26309   BIC:                            -3801.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0052      0.001      3.735      0.000       0.002       0.008
alpha          0.3144      0.024     12.999      0.000       0.267       0.362
==============================================================================

There: we have the classic undergrad stats exercise question "Why is my t-stat big but my R squared is low?". Answer: there is something here, but it's weak. This often happens if you use a large dataset (26,000 observations here).
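For a regression with a single explanatory variable the two numbers are tied together by a simple identity, so the table can be sanity checked by hand. A small worked sketch using only figures from the output above (the 100 observation case is a hypothetical comparison, not something I ran):

```python
t_stat = 12.999    # t-statistic on alpha from the regression table
df_resid = 26309   # residual degrees of freedom from the table

f_stat = t_stat ** 2                      # with one regressor, F = t squared
r_squared = f_stat / (f_stat + df_resid)  # identity linking F and R squared
print(f"F = {f_stat:.1f}, R squared = {r_squared:.4f}")

# Same R squared, but with only 100 observations (98 residual df)
small_df = 98
t_small = (r_squared * small_df / (1 - r_squared)) ** 0.5
print(f"t-stat with 100 observations: {t_small:.2f}")
```

The same tiny R squared that gives a t-stat of 13 on 26,000 observations would give a t-stat below one on 100 observations: the size of the effect hasn't changed, only our certainty that it isn't zero.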

To show this differently, the conditional ex-post average daily return over the following month with a positive alpha is 0.02, and with a negative alpha is -0.01 (both conditional subsets are roughly half the dataset overall). The t-statistic comparing these is a hefty 10, corresponding to a p-value of the order of 10^-26. So again, alpha definitely has an effect, but is that difference really that big? Hard to tell.

But we know that low R squared are pretty common in finance, so is this a problem? To test this I tried using the alphas as a forecast, and then calculated the Sharpe Ratio of each forecast. The median across all instruments is a SR of 0.1. Remember that trend following gives us around 0.3 to 0.4 for each instrument, so this isn't especially interesting.

It might be that I would get better results from a different lookback to calculate the alphas (remember we use one year), so I tried everything from a one month lookback to a lookback of 10 years. What about using fewer, or more, principal components? Remember we're going with three. It turns out that a one year lookback is pretty optimal compared to shorter or longer; but using one PC is better than using two or more. Still, the very best we can do is a one year lookback with one PC, and that gives us a SR of 0.12, which is hardly in wallet busting territory and also not significantly different from the result with three factors.


Trading the residual

Let's turn then to trading the residuals. We're going to cumulate up residuals over various periods and see how well that predicts future returns. To avoid a forward looking forecast, the residuals are calculated on the out of sample month following the point at which the model is fitted. Otherwise the regression coefficients would be forward looking, and hence so would the forecast.

Note that because the model changes slightly each month, the coefficients used to calculate residuals will also change slightly. Such is life. But we'll keep stacking up the residuals month by month even though they are using different models.

We now have 3 knobs to twiddle on our overfitting machine; lookback and number of PCs as before, but also the number of days we sum up residuals. To keep things relatively simple I will initially sum them up using 22 days (about a month of business days). So our base case is:
  • One year lookback to do PCA and calculate coefficients
  • 3 principal components
  • 22 days summing up of residuals; our forecast is minus the summed residuals
And jumping straight to Sharpe Ratio calculation like an impatient toddler, we get a SR of -0.04. The sign is wrong, showing that positive residuals lead to more positive performance, and the effect is also v.v.v. weak.
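The base case forecast can be sketched in a few lines. This is a minimal illustration, assuming the daily out of sample residuals for one instrument are already in a list; the function name is hypothetical and any forecast scaling or capping is left out:

```python
def residual_forecast(residuals, window=22):
    """Forecast for each day: minus the sum of the last `window` daily residuals.
    Returns None until a full window of residuals is available."""
    forecasts = []
    running = 0.0
    for day, resid in enumerate(residuals):
        running += resid
        if day >= window:
            # Drop the residual that has just fallen out of the window
            running -= residuals[day - window]
        forecasts.append(-running if day >= window - 1 else None)
    return forecasts

# Tiny example with a three day window so the arithmetic is easy to follow
example = residual_forecast([1, 2, 3, -1, -2], window=3)
print(example)
```

With the three day window the sums are 6, 4 and 0, so the forecasts are -6, -4 and zero: an instrument that has recently beaten its fitted model gets a short forecast.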

Does increasing the residual summing period work; eg mean reversion works over longer time periods? Nope. Anything up to a year is actually worse. Going down to a week (which would be v.v.v. costly to trade) does at least push the SR into positive territory, but only just.

Dropping to one PC (which was marginally better than for alpha above), changing the lookback on the PCA, .... nothing produces useful results. This idea is an already dead donkey that has been subsequently thrown off a cliff and then burned**.

** no actual donkeys were harmed in the creation of this blog post

 

Back to alpha

So we had a not too promising individual SR using alpha on an instrument level, but how does that look on a portfolio level? Surprisingly, quite good. Here are the combined results for 100 instruments:


That bad boy has a SR of 0.84! Some of that is diversification, but some of it is because a more accurately calculated median SR per instrument is 0.15, higher than the 0.10 calculated earlier (because, many reasons, like buffering and what not). Still, that's an extremely high realised diversification multiplier of over 5 (0.84 / 0.15 is about 5.6). Let's compare it to the 'gold standard' of single momentum models, EWMAC16,64 with the same instruments:


OK, the alpha model is not as good; even allowing for the different vol, ewmac comes in with a SR of 1.08. But their correlation is a relatively lowly 0.3. That suggests a modest allocation to alpha persistence will earn some money. Chucking 10% of your forecast weights into alpha persistence bumps up the SR of ewmac16,64 from 1.08 to 1.12. Going to the (arguably in sample fitted) best model with one principal component improves the SR of the alpha model by itself to 1.02, but also increases the correlation with momentum; so that the joint SR with 10% in alpha persistence and 90% in ewmac produces a pretty much unchanged SR of 1.13.
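As a rough sanity check on that improvement, here is the textbook formula for the Sharpe Ratio of a two strategy blend. It is only a back of the envelope sketch: it assumes both strategies run at the same volatility and that the 10% / 90% split is a risk weighting, whereas the backtest blends forecast weights, so the numbers will not match exactly. The function name is hypothetical.

```python
from math import sqrt

def blended_sharpe(sr_a, sr_b, weight_a, correlation):
    """Sharpe Ratio of a two strategy blend, assuming equal volatility
    for both strategies and risk weights that sum to one."""
    weight_b = 1.0 - weight_a
    expected = weight_a * sr_a + weight_b * sr_b
    vol = sqrt(weight_a ** 2 + weight_b ** 2
               + 2 * weight_a * weight_b * correlation)
    return expected / vol

# 10% alpha persistence (SR 0.84), 90% ewmac16,64 (SR 1.08), correlation 0.3
blend = blended_sharpe(0.84, 1.08, 0.10, 0.3)
print(round(blend, 2))  # 1.13
```

Under these simplifying assumptions the blend comes out at about 1.13, in the same region as the 1.12 from the backtest, so the modest gain is about what the low correlation would lead you to expect.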


Summary

Research in systematic trading tends to result in a lot of blind alleys. I thought this would be another one. Certainly the idea of mean reverting the errors, a classic from the equity stat arb crowd, doesn't really work in this context. However there does seem to be some modest performance gain to basic momentum from including a PCA derived alpha persistence model. The gain is small however, so it's debatable whether it's worth what would be quite a lot of additional work. Not a blind alley then, but not a very pleasant one to spend much time in.




Tuesday, 1 July 2025

PCA analysis of Futures returns for fun and profit, part #1

 I know I had said I wouldn't be doing any substantive blog posts because of book writing (which is going well, thanks for asking) but this particular topic has been bugging me for a while. And if you listened to the last episode of Top Traders Unplugged you will hear me mention this in response to a question. So it's an itch I feel I need to scratch. Who knows, it might lead to a profitable trading system.

Having said all that, this post will be quite short as it's really going to be an introduction to a series of posts.


Given factor analysis

So at its heart this is a post about factors. Factors are the source of returns, and of risk. This concept came from the land of equities, specifically the long short factor sorts beloved of Messrs Fama and French; and it also spawned an entire industry: the modern equity market neutral hedge funds (although Alfred Winslow Jones actually implemented the whole hedge fund idea whilst Fama and French were still in high school).

At its core then we have the idea of the APT risk model which is basically a linear regression:

r_i,t = a_i + B_1,i * r_1,t + ... + e_i,t

Where r_i,t is the return on asset i at time t, a_i is the alpha on asset i (assumed to be zero), B_1,i is the Beta of asset i on the first risk factor, r_1,t is the return of the first risk factor, there are more terms like this, and e_i,t is an error term with mean zero. Strictly speaking the returns on both i and the risk factor should be excess returns with risk free rate deducted, but we're futures traders so that detail can be safely ignored.

In its simplest form with a single factor that is 'the market', this is basically just the OG CAPM/EMH, and B_1 is just Beta. In a more complex form we can include things like the sorted portfolios of Fama and French. Notice that risk and return are intrinsically linked here. The factor is assumed to be some kind of risk that we get paid a price for exposure to. That price is the B_N term.

(Should B_N be estimated in a time varying way? Perhaps. Although if you vol normalise everything first, you will find your B_N are much more stable, as well as being more interpretable).

Note that for both the market and the Fama French factors (FFF), the factors are given. To be precise, in both cases the factors consist of portfolios of the underlying assets, with some portfolio weights. For the market portfolio, those portfolio weights are (usually) market cap weights. For the FFF they are the +1 for top quartile, -1 for bottom quartile sort of thing. 


What can we do with factors?

Many things! The dual nature of factors as risk and return drivers leads them to multiple uses. So for example, we could own the factors. They are just portfolios, and going long if you think the factor will earn you a risk premium is not a bad idea. If you buy an S&P 500 ETF, well congratulations you have gone long the equity market beta factor. With the ability to go long and short we can own FFF as easily as the market factor. Indeed there are funds that allow you to get exposure to FFF factors or similar, though sometimes only on the long side. 

We could also trade the factors. My own work in my previous book, AFTS, suggests that 70% of the returns of a momentum portfolio come from trading an asset class index. That is an equal vol weighted rather than market cap weighted portfolio, but the overall effect is similar. Trading, i.e. market timing, the FFF or similar is a little more difficult and if you try to do it Cliff Asness will turn up at your house and hit you repeatedly with a stick.

If we treat the factors as risk we don't want, and we don't buy the idea of an efficient market, then we can buy high alpha / sell low alpha. If a stock looks like it has excess return, over and above what that market and FFF say it should have, then maybe it is a good bet? Although financial economists will scoff at you and say you are exposed to a risk that is not in your regression for which you are earning a risk premium, you can just point to your Porsche and explain in great detail how you don't care.

Perhaps we believe in the efficient market hypothesis in the long term, but not in the short term. We wouldn't trust those alphas to be persistent as far as we could throw them. But if we take the residual term, e, well that will most likely show a lovely mean reverting pattern when cumulated. So we can mean revert the residual. Big upward swings away from efficiency that we can short the asset on, and lovely downward pulls we can go long on.

There are more esoteric things people do with factors, mainly to do with risk management. You can for example use them to construct robust correlation matrices, hedging portfolios and what not. Risk management isn't my principal concern here, but that is still good to know.


PCA factor analysis

This is all lovely, especially in equities, but in futures things are a bit more mysterious. For starters, we can do things at an asset class level (which is closer in spirit to the equity market neutral world, although we're still at a level higher as our components are e.g. equity indices, not individual equities); but we can also uniquely do a 'whole market' look by considering futures as a whole.

We could probably take a stab at creating an 'asset class' factor in each market that would be like Beta, and indeed I did that in AFTS with my equal risk weighted index. We know that there are certain bellwether markets like the S&P 500 that we could use as proxies for 'the market' in individual asset classes.

But for futures as a whole, things are much harder. Is the 'market' really just long everything? Even VIX/VSTOXX where we know the risk premium is on the short side? My gut feeling is that our most important factor will be some kind of risk on/off, but then there will be times like 2022 when it would plausibly have been more inflation related. And what would the second factor be?

So we will switch tactics, and rather than use given factors, we will use discovered factors. The idea here is that data itself can tell us what the main latent drivers of returns are, if we just look hard enough. Sure in many cases that will give us the first factor as basically the market portfolio, but the subsequent factors will be more interesting. And in the specific case of futures, where we don't know what the likely factors are, it's going to be quite intriguing.

We use a PCA to discover these factors, with vol normalised returns as the starting point. For each factor we end up with a set of portfolio weights (can be long or short), which can then be helpful to interpret the factor. Note the weights are on vol normalised returns, which are more intuitive.
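As a rough sketch of what that involves, here is one way to get the portfolio weights and the share of variance each factor explains. The details are assumptions for illustration rather than my actual code: vol normalisation here is just a division by the standard deviation over the window, the PCA is an eigen-decomposition of the covariance of the normalised returns, and the data is made up.

```python
import numpy as np

def pca_on_normalised_returns(returns):
    """returns: (days x instruments) array of raw returns for one window.
    Gives back one row of weights per component and the share of
    variance explained by each, largest first."""
    normalised = returns / returns.std(axis=0)
    covariance = np.cov(normalised, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]  # eigh returns ascending order
    weights = eigenvectors[:, order].T
    explained = eigenvalues[order] / eigenvalues.sum()
    return weights, explained

# Made-up data: four correlated 'equity like' markets with high vol,
# and four unrelated markets with much lower vol
rng = np.random.default_rng(1)
risk_on = rng.normal(size=(256, 1))
raw = np.hstack([
    0.02 * (risk_on + rng.normal(size=(256, 4))),
    0.005 * rng.normal(size=(256, 4)),
])
weights, explained = pca_on_normalised_returns(raw)
print(np.round(explained[:3], 3))
print(np.round(weights[0], 2))
```

Without the normalisation step the high vol markets would dominate the first component simply because they move more, which is the main reason the weights are easier to interpret on vol normalised returns. The sign of each row of weights is arbitrary.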


Sidebar: PCA meta factor analysis (on strategy returns)

Just a brief note on this, as I don't intend to cover it here, but it was touched on in the podcast. If we started with the returns of trading eg momentum on a bunch of instruments, rather than the underlying returns themselves, then that might be useful for someone who was thinking about replicating a hedge fund index or risk managing a CTA, or perhaps constructing a CTA where they have hedged out the principal component(s) of CTA risk. I've written about replication before, and I've already said I'm not really concerned with risk management here, so I won't talk about this again.


Some nice pictures

This won't be a long post, as I said, as I won't be looking at how to use the PCA returns now I have them. Instead I'm going to focus on visualising the PCAs and interpreting them. Which will be a bit of fun anyway. Methodological points: I used vol normalised returns and one year rolling windows to estimate my PCA. There is a debate to be had as to whether a year is best at compromising between having enough data and a stable result, or whether we need to adapt quicker to changing market conditions.

I estimated at least two, and up to N/2 PCA depending on how many markets N had data. I used 100 liquid futures markets with daily data back to 1970 where possible.

Let's start with the contribution to variance. This is how important each PCA is.

We can see that the first PCA for the last 12 months at least explains 18% of the variance, the second 12.5% and so on. In contrast if we did this for US equities we'd find the first PCA explained 50%, and for US bonds it would be 70%. There is a lot more going on here.

If we look at how the first two factors contributions vary over time:


... we can see that there has been a bit of a downward trend as more factors arrive in the sample, but more generally the first PCA does hover around 20% and the second around 10%. There are exceptions like 2008 where I would imagine a big risk off bet drove the market. The same was true during COVID.

What is the first PCA? Well currently it looks like this:

For clarity I've only included the top 20 and bottom 20 instruments by weight. Still you may be struggling to read the labels. The top markets are pretty much all stocks, with European equities getting a bigger weight. The S&P does just sneak into the top 20. The bottom 20 starts with VIX and VSTOXX, but mostly the weights here are quite small. So the first PCA right now is "Equity Beta, with a tilt towards Europe".

What about the second PCA?

The first 6 positive instruments are all US bonds, and nearly all the rest are government bonds of one flavour or another. Only EU-Utility stocks get to crash this party (interest rate sensitive?). On the short side we have some FX and quite a few energy futures. So this second factor is "Long bonds / Short energies"


PCA 3 is long a whole bunch of FX, which means it's short USD, and also some metals and random commodities. On the short side it's short EU-Health equities, CNHUSD FX and a whole bunch of European bonds. Feels a bit trade related. Shall we call this the Trump factor?

Anyway I could continue, but more intuitive would be to understand how these factors have changed over time. We'll pick some key markets with lots of history. We will then plot the weight each has in a given PCA over time. 

Here is the S&P 500:

We can see that it is mostly positive on PCA1 and negative on PCA2,3 but there are periods when that is not the case. The sharp drops in weighting suggest that perhaps we ought to run at something longer than a year, or use an EWMA of weights to smooth things out.

Here is US10 year:


Again, this mostly loads positive on PCA2 but not always. You can see the increase in correlation of bonds and equities happening as PCA1 creeps up in the last few years.

Here is the first PCA weighting in June 2004, one of those interesting periods.


You can see that it was all about currencies in that period; plus silver and gold, various other metals, bonds and energies. So very much a short Dollar, long metals trade.

We're nearly done for today. Last job is to plot the factors. Here are the cumulated returns for PCA1:



That looks a lot like a vol normalised equity market; note the drops in 2009 and 2020.

And here is PCA2:

Again that could plausibly be bonds, mostly up with the exception of the post 2022 period.

This suggests another research idea which is to use the S&P 500 and US interest rates as 'given factors' which might be more stable than using PCA. Still that would mean missing out on times like 2004 when other things were driving the market. 


What's next

Next step would be to look at some of those opportunities for factor use and misuse outlined above, and see if there is profit as well as fun in this game!

Monday, 2 June 2025

Quickies #1: Overfitting and EWMAC forecast scalars

 I'm now in full book writing mode, so I don't have the time to do full blog posts. Instead I plan to do a series of quick posts where I share some research I did for the book. Cynically, there is also a chance it will encourage you to buy the book, as long as I don't overshare like one of those movie trailers that gives away the plot and includes all the best action scenes.

Overfitting - A pictorial guide

Here's a slide I use a lot:


Although we know intuitively that complexity improves in sample performance, and at some point makes out of sample performance worse, can we visualise it?

Let’s assume I think that the pattern of up and down price movements over the last 64 business days will predict the return in the following month (about 20 business days). I can analyze these patterns by:

    • Using one period: just looking at the last 64 days of trading and seeing if the price moved up or down. Because of weekends and holidays this is about three months.

    • Using two periods: Looking at the first 32, and second 32 days of trading. And now you see why I'm using 64 days.

    • Four periods: Examining the first 16 days, second 16 days, third 16 days, and then the final 16 days.

In theory I could end up looking at each day individually, but in practice I will stop at 16 periods of four days which turns out to be sufficiently interesting (For four periods there are 2^4 = 16 possible states, at 2^64 it would just be silly). For my analysis I see what the market did on average after each pattern of daily returns. For example, if I’m using two 32 day returns for my patterns, then there are four possibilities: up and up, up and down, down and up; and down and down. I can then see whether the market went up or down after twenty days in each of these four states.

Subsequently to trade this method I will then work out which state the market is currently in, and then buy if that state has led to the market going up by more than average in the past. This is a long only book, and hence I can't short if I expect the market to go down by more than average. Instead I'll just stay flat.
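For the curious, the pattern method described above might look something like this. It is a minimal sketch on a made-up random walk rather than the code used for the book; the function names are hypothetical, and a state never seen in the fitting data is simply treated as flat here.

```python
import random
from collections import defaultdict

def pattern_state(prices, end, n_periods, lookback=64):
    """Tuple of True (up) / False (down), one per sub-period of the
    lookback window that finishes at index `end`."""
    step = lookback // n_periods
    start = end - lookback
    return tuple(
        prices[start + (k + 1) * step] > prices[start + k * step]
        for k in range(n_periods)
    )

def fit_states(prices, n_periods, lookback=64, horizon=20):
    """Average forward return after each state, plus the overall average."""
    by_state = defaultdict(list)
    for end in range(lookback, len(prices) - horizon):
        forward = prices[end + horizon] / prices[end] - 1.0
        state = pattern_state(prices, end, n_periods, lookback)
        by_state[state].append(forward)
    all_forward = [r for rets in by_state.values() for r in rets]
    overall = sum(all_forward) / len(all_forward)
    state_means = {s: sum(r) / len(r) for s, r in by_state.items()}
    return state_means, overall

def position(state, state_means, overall):
    """Long only: 1 if this state has beaten the average, otherwise flat."""
    return 1 if state_means.get(state, overall) > overall else 0

# Made-up price series standing in for a real stock
random.seed(7)
prices = [100.0]
for _ in range(1500):
    prices.append(prices[-1] * (1.0 + random.gauss(0.0003, 0.01)))

state_means, overall = fit_states(prices, n_periods=2)
current_state = pattern_state(prices, len(prices) - 1, n_periods=2)
print(current_state, position(current_state, state_means, overall))
```

With two periods there are at most four states to estimate; with 16 periods there are up to 2^16, which is far more than the number of observations in any realistic price history, and that is where the trouble starts.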

The plot shows the results of performing this exercise with Microsoft. I’d get the same kind of picture with pretty much any instrument – this is one occasion where it’s not necessary to use a huge pool of data to get robust results. Each line is a different number of periods.

Unfortunately we can’t actually realize these high levels of profit. The problem is that we are in sample fitting the method to all the available price history. So now we follow the usual procedure to get out of sample testing: fit the method to the earlier part of the data, and then test that calibrated method on later data periods. Note we might see a state we haven't seen before, in which case we forward fill from the last known state.

 Let’s see what happens if I fit my method using 16 periods of four days on all history up to 2015, and then test it on the last ten years.


The black line uses the original procedure; we take all the price history and calibrate our method accordingly. It does very well, but as I’ve already noted we couldn’t get this performance in reality. The dark gray line is only fitted up to the start of 2015. Its performance is very similar to the black line up to then, which isn’t surprising as the methods are virtually the same and share around 70% of the data used to fit them. 

However after 2015 when it is trading on price history it hasn’t seen, its performance radically deteriorates. Another useful experiment is to see how well the method does on different instruments. The light gray line shows what happens if I use the 2015 Microsoft method to subsequently trade Apple (AAPL). You can clearly see that the performance is terrible. This method is too closely fitted to the returns of Microsoft pre-2015; it does not perform well in later years, especially with a different stock.

What happens if I do the same procedure with a much simpler model, which just looks at the entire 64 day period? Since we know stock prices trend over periods of a few months, we can already guess what this fitted method looks like. It will buy if the price has gone up in the last 64 days, otherwise it won’t do anything. 


As the figure shows, the fitted method ends up looking exactly the same regardless of whether it was built using price history up to 2015 or for the entire period. Consequently, the dark grey line is exactly overlaid on top of the black line. You can also see that this very simple method does very well on Apple stock, which it hadn’t seen before. 


Calculating not measuring forecast scalars for EWMAC

My previous books have been full of estimated forecast scalars for my various trading rules. Many of these follow a square root of time rule. So for example, the scalars for EWMACN, 4N using daily standard deviation of returns to normalise are around 15 for 2,8, 10 for 4,16 and so on. We get the next scalar by dividing by sqrt(2). Once you estimate 2,8 you can derive the other numbers easily.
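The square root of time rule is easy to write down. A minimal sketch, taking the approximate value of 15 for EWMAC 2,8 quoted above as the seed (so the derived numbers are only as good as that rounded starting point):

```python
from math import sqrt

seed_scalar = 15.0  # approximate scalar for EWMAC 2,8 quoted in the text

scalars = {}
fast, scalar = 2, seed_scalar
while fast <= 64:
    # Each doubling of the speed pair divides the scalar by sqrt(2)
    scalars[(fast, 4 * fast)] = round(scalar, 1)
    scalar /= sqrt(2)
    fast *= 2

for (fast_span, slow_span), value in scalars.items():
    print(f"EWMAC {fast_span},{slow_span}: {value}")
```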

That's great, but how do we get the 'seed' value for different starting multipliers other than 2,8? I started by generating a full table of multipliers for all powers of 2:


The rows are fast, the columns are slow. Since fast<slow for trend following the bottom diagonal is empty. We need to calculate the numbers in the top row. Note that this should also work for any value of S in EWMAC 2,S; but it's easier to do it in power of 2 space for fitting.

I used..... ChatGPT! Yes, I just pasted in the numbers and asked it to fit them. It fitted a power law, and with some prompting (I asked it to include an intercept) it even gave me the required python code. It's not a perfect fit because empirical data, but as it says in the footnote of my book that all this effort was done for: 

To calculate the scaling factor for any value of f=2 and a given value of s, use the formula 2.2+(184÷s^1.25) for the appropriate value of s. 

The fitting gets gradually worse as we get longer values of S, but that makes sense as things like the long bias in returns tend to have more of an effect.
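The footnote formula as a function, in case you want to plug in your own slow spans. The function name is hypothetical, and this is just the quoted fit, which only covers a fast span of 2:

```python
def ewmac_seed_scalar(slow, fast=2):
    """Fitted forecast scalar for EWMAC 2,slow using the formula quoted
    from the book footnote: 2.2 + 184 / slow^1.25."""
    if fast != 2:
        raise ValueError("the quoted fit only covers fast=2")
    return 2.2 + 184 / slow ** 1.25

for slow in (8, 16, 32, 64):
    print(f"EWMAC 2,{slow}: {ewmac_seed_scalar(slow):.2f}")
```

For EWMAC 2,8 this gives about 15.9, close to the empirical value of around 15 mentioned above.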