This Blog is Systematic: PCA analysis of Futures returns for fun and profit, part #1

I know I had said I wouldn't be doing any substantive blog posts because of book writing (which is going well, thanks for asking), but this particular topic has been bugging me for a while. And if you listened to the last episode of Top Traders Unplugged, you will have heard me mention this in response to a question. So it's an itch I feel I need to scratch. Who knows, it might lead to a profitable trading system.

Having said all that, this post will be quite short as it's really going to be an introduction to a series of posts.

Given factor analysis

So at its heart this is a post about factors. Factors are the source of returns, and of risk. This concept came from the land of equities, specifically the long/short factor sorts beloved of Messrs Fama and French; and it also spawned an entire industry: the modern equity market neutral hedge funds (although Alfred Winslow Jones actually implemented the whole hedge fund idea whilst Fama and French were still in high school).

At its core, then, we have the idea of the APT (Arbitrage Pricing Theory) risk model, which is basically a linear regression:

$$r_{i,t} = a_i + B_{1,i}\,r_{1,t} + B_{2,i}\,r_{2,t} + \dots + e_{i,t}$$

Where r_i,t is the return of asset i at time t, a_i is the alpha of asset i (assumed to be zero), B_1_i is the beta of asset i on the first risk factor, r_1_t is the return of the first risk factor at time t, there are further terms like this for the other factors, and e is an error term with mean zero. Strictly speaking the returns on both the asset and the risk factors should be excess returns, with the risk free rate deducted, but we're futures traders so that detail can be safely ignored: a futures return is already an excess return, since the notional value of the position doesn't have to be funded.
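To make the regression concrete, here is a minimal sketch that estimates a_i and the B terms for a single asset by ordinary least squares with NumPy. The function name `estimate_factor_model` and the synthetic data are illustrative assumptions, not the method behind the charts in this post.

```python
import numpy as np

def estimate_factor_model(asset_returns, factor_returns):
    """Estimate a_i and B_k_i for one asset by ordinary least squares.

    asset_returns: shape (T,), the returns r_i,t of a single asset
    factor_returns: shape (T, K), the returns r_k,t of K factors
    Returns (alpha, betas, residuals).
    """
    T = len(asset_returns)
    # A column of ones lets the intercept play the role of a_i
    X = np.column_stack([np.ones(T), factor_returns])
    coefs, *_ = np.linalg.lstsq(X, asset_returns, rcond=None)
    alpha, betas = coefs[0], coefs[1:]
    residuals = asset_returns - X @ coefs  # the error term e
    return alpha, betas, residuals

# Synthetic example: two factors with true betas 0.8 and -0.3, zero alpha
rng = np.random.default_rng(0)
factors = rng.normal(size=(1000, 2))
asset = factors @ np.array([0.8, -0.3]) + 0.1 * rng.normal(size=1000)
alpha, betas, resid = estimate_factor_model(asset, factors)
```

With enough data the estimated betas land close to 0.8 and -0.3, and the residuals are what you would work with if you wanted to trade e rather than the factors.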

In its simplest form, with a single factor that is 'the market', this is basically just the OG CAPM/EMH, and B_1 is just beta. In a more complex form we can include things like the sorted portfolios of Fama and French. Notice that risk and return are intrinsically linked here. Each factor is assumed to be some kind of risk that we get paid for being exposed to: B_N measures how much exposure an asset has to factor N, and the price we are paid per unit of that exposure is the expected return (the risk premium) on the factor itself.

(Should B_N be estimated in a time varying way? Perhaps. Although if you vol normalise everything first, you will find your B_N are much more stable, as well as being more interpretable).
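One simple way to vol normalise, sketched here as an assumption rather than the exact method used in this post, is to divide each day's return by the standard deviation of the preceding `lookback` days. The 25 day default and the name `vol_normalise` are hypothetical choices.

```python
import numpy as np

def vol_normalise(returns, lookback=25):
    """Divide each day's return by the standard deviation of the previous
    `lookback` returns, so every market is expressed in units of its own
    recent volatility. The window ends the day before, avoiding look-ahead.
    The first `lookback` rows are NaN because there is no history yet.
    """
    returns = np.asarray(returns, dtype=float)
    normalised = np.full_like(returns, np.nan)
    for t in range(lookback, len(returns)):
        trailing_std = returns[t - lookback:t].std(axis=0, ddof=1)
        normalised[t] = returns[t] / trailing_std
    return normalised

# Two synthetic markets whose volatilities differ by a factor of five
rng = np.random.default_rng(1)
raw = rng.normal(size=(2000, 2)) * np.array([0.01, 0.05])
normed = vol_normalise(raw)
# After normalising, both columns have a standard deviation of roughly 1
```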

Note that for both the market and the Fama French factors (FFF), the factors are given. To be precise, in both cases the factors consist of portfolios of the underlying assets, with some portfolio weights. For the market portfolio, those portfolio weights are (usually) market cap weights. For the FFF they are the +1 for top quartile, -1 for bottom quartile sort of thing.
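As a sketch of those two kinds of given-factor portfolio weights: `market_cap_weights` and `quartile_sort_weights` are hypothetical helper names, and the breakpoints follow the "+1 top quartile, -1 bottom quartile" description above rather than the published Fama French construction, which uses its own breakpoints and weighting.

```python
import numpy as np

def market_cap_weights(market_caps):
    """Market portfolio: weights proportional to market capitalisation."""
    caps = np.asarray(market_caps, dtype=float)
    return caps / caps.sum()

def quartile_sort_weights(characteristic):
    """Long/short sort: +1 for the top quartile of a characteristic,
    -1 for the bottom quartile, 0 otherwise (unscaled, illustrative)."""
    x = np.asarray(characteristic, dtype=float)
    lower, upper = np.quantile(x, [0.25, 0.75])
    weights = np.zeros_like(x)
    weights[x >= upper] = 1.0
    weights[x <= lower] = -1.0
    return weights

# The factor return on any day is then just weights @ asset_returns
sort_weights = quartile_sort_weights(np.arange(8))
```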

What can we do with factors?

Many things! The dual nature of factors as drivers of both risk and return leads to multiple uses. So for example, we could own the factors. They are just portfolios, and going long if you think a factor will earn you a risk premium is not a bad idea. If you buy an S&P 500 ETF, well, congratulations, you have gone long the equity market beta factor. With the ability to go long and short we can own the FFF as easily as the market factor. Indeed there are funds that give you exposure to the FFF or similar factors, though sometimes only on the long side.

We could also trade the factors. My own work in my previous book, AFTS, suggests that 70% of the returns of a momentum portfolio come from trading an asset class index. That is an equal vol weighted rather than market cap weighted portfolio, but the overall effect is similar. Trading, i.e. market timing, the FFF or similar is a little more difficult and if you try to do it Cliff Asness will turn up at your house and hit you repeatedly with a stick.

If we treat the factors as risk we don't want, and we don't buy the idea of an efficient market, then we can buy high alpha / sell low alpha. If a stock looks like it has excess return, over and above what the market and FFF say it should have, then maybe it is a good bet? Financial economists will scoff at you and say you are exposed to a risk that is not in your regression, for which you are earning a risk premium; you can just point to your Porsche and explain in great detail how you don't care.

Perhaps we believe in the efficient market hypothesis in the long term, but not in the short term. We wouldn't trust those alphas to be persistent as far as we could throw them. But if we take the residual term, e, and cumulate it, that will most likely show a lovely mean reverting pattern. So we can mean revert the residual: big upward swings away from efficiency are an opportunity to short the asset, and lovely downward pulls are an opportunity to go long.
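A minimal sketch of mean reverting the cumulated residual, assuming you already have the residuals e from the factor regression: the position is minus the z-score of the cumulated residual relative to its own trailing mean, capped at plus or minus two. The function name, the 20 period lookback and the cap are illustrative assumptions, not a tested trading rule.

```python
import numpy as np

def residual_reversion_position(residuals, lookback=20, cap=2.0):
    """Position from mean reverting the cumulated residual e.

    The cumulated residual is compared with its own trailing mean: short
    when it is stretched above that mean, long when it is below. The
    position is capped at +/- `cap`; the first `lookback` values are zero.
    The position computed at t would be held over the following period.
    """
    cumulated = np.cumsum(residuals)
    position = np.zeros_like(cumulated)
    for t in range(lookback, len(cumulated)):
        window = cumulated[t - lookback:t]
        std = window.std(ddof=1)
        if std > 0:
            z = (cumulated[t] - window.mean()) / std
            position[t] = np.clip(-z, -cap, cap)
    return position

# Small noisy residuals followed by one big upward swing on the last day
rng = np.random.default_rng(4)
resid = 0.1 * rng.normal(size=60)
resid[-1] = 5.0
pos = residual_reversion_position(resid)
# The final position is the maximum short, -2.0
```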

There are more esoteric things people do with factors, mainly to do with risk management. You can for example use them to construct robust correlation matrices, hedging portfolios and whatnot. Risk management isn't my principal concern here, but that is still good to know.

PCA factor analysis

This is all lovely, especially in equities, but in futures things are a bit more mysterious. For starters, we can do things at an asset class level (which is closer in spirit to the equity market neutral world, although we're still a level higher, as our components are e.g. equity indices, not individual equities); but we can also, uniquely, take a 'whole market' view by considering futures as a whole.

We could probably take a stab at creating an 'asset class' factor in each market that would be like beta, and indeed I did that in AFTS with my equal risk weighted index. We know that there are certain bellwether markets like the S&P 500 that we could use as proxies for 'the market' in individual asset classes.

But for futures as a whole, things are much harder. Is the 'market' really just long everything? Even VIX/VSTOXX where we know the risk premium is on the short side? My gut feeling is that our most important factor will be some kind of risk on/off, but then there will be times like 2022 when it would plausibly have been more inflation related. And what would the second factor be?

So we will switch tactics, and rather than use given factors, we will use discovered factors. The idea here is that the data itself can tell us what the main latent drivers of returns are, if we just look hard enough. Sure, in many cases that will give us the first factor as basically the market portfolio, but the subsequent factors will be more interesting. And in the specific case of futures, where we don't know what the likely factors are, it's going to be quite intriguing.

We use a PCA to discover these factors, with vol normalised returns as the starting point. For each factor we end up with a set of portfolio weights (can be long or short), which can then be helpful to interpret the factor. Note the weights are on vol normalised returns, which are more intuitive.
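Here is a minimal sketch of that step, assuming a complete (T, N) array of vol normalised returns with no missing values: PCA via an eigen-decomposition of their covariance matrix (for vol normalised returns this is close to the correlation matrix). Each eigenvector is one factor's set of long/short weights. `pca_factors` is a hypothetical name; scikit-learn's `PCA` class would do the same job.

```python
import numpy as np

def pca_factors(normalised_returns, n_components):
    """PCA via eigen-decomposition of the covariance of vol normalised returns.

    normalised_returns: shape (T, N), no missing values.
    Returns (weights, explained): weights has shape (N, n_components), one
    column of long/short portfolio weights per factor; explained is each
    factor's share of total variance.
    """
    X = normalised_returns - normalised_returns.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # ascending order
    order = np.argsort(eigenvalues)[::-1]             # largest first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    explained = eigenvalues / eigenvalues.sum()
    return eigenvectors[:, :n_components], explained[:n_components]

# Synthetic example: one common driver of the first three of five markets
rng = np.random.default_rng(3)
common = rng.normal(size=2000)
rets = np.outer(common, [1, 1, 1, 0, 0]) + 0.3 * rng.normal(size=(2000, 5))
w, expl = pca_factors(rets, 2)
```

Here the first factor puts large weights on the first three markets and explains most of the variance. The sign of each eigenvector is arbitrary: multiplying a column by -1 gives an equally valid factor, so a factor can flip sign between estimation windows.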

Sidebar: PCA meta factor analysis (on strategy returns)

Just as a brief note, as I don't intend to cover this here, but it was touched on in the podcast. If we started with the returns of trading e.g. momentum on a bunch of instruments, rather than the underlying returns themselves, then that might be useful for someone who was thinking about replicating a hedge fund index or risk managing a CTA, or perhaps constructing a CTA where they have hedged out the principal component(s) of CTA risk. I've written about replication before, and I've already said I'm not really concerned with risk management here, so I won't talk about this again.

Some nice pictures

This won't be a long post, as I said, as I won't be looking at how to use the PCA returns now that I have them. Instead I'm going to focus on visualising the PCAs and interpreting them, which will be a bit of fun anyway. Methodological points: I used vol normalised returns and one year rolling windows to estimate my PCA. There is a debate to be had as to whether a year strikes the right balance between having enough data for a stable result and adapting quickly enough to changing market conditions.

I estimated at least two, and up to N/2, principal components, where N is the number of markets with data at the time. I used 100 liquid futures markets, with daily data going back to 1970 where possible.
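A sketch of that rolling estimation, under stated assumptions: roughly 256 trading days per year, re-estimating every 21 days (both hypothetical choices), using only markets with a complete set of returns in each window, and keeping max(2, N // 2) components where N is the number of such markets. The function name is illustrative.

```python
import numpy as np

def rolling_pca_weights(normalised_returns, window=256, step=21):
    """Re-estimate PCA weights on rolling (roughly one year) windows.

    normalised_returns: shape (T, N), NaN where a market has no data yet.
    In each window only markets with a full set of observations are used,
    and max(2, N // 2) components are kept, N being that count.
    Returns a list of (end_index, market_indices, weights) tuples.
    """
    results = []
    T = normalised_returns.shape[0]
    for end in range(window, T + 1, step):
        block = normalised_returns[end - window:end]
        has_data = ~np.isnan(block).any(axis=0)
        n_markets = int(has_data.sum())
        if n_markets < 2:
            continue  # nothing meaningful to decompose yet
        n_components = max(2, n_markets // 2)
        X = block[:, has_data] - block[:, has_data].mean(axis=0)
        eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
        order = np.argsort(eigenvalues)[::-1][:n_components]  # largest first
        results.append((end, np.flatnonzero(has_data), eigenvectors[:, order]))
    return results

# Six synthetic markets; the sixth only has data from day 300 onwards
rng = np.random.default_rng(2)
data = rng.normal(size=(600, 6))
data[:300, 5] = np.nan
estimates = rolling_pca_weights(data)
```

Early windows use five markets and keep two components; once the sixth market has a full year of history it joins, and three components are kept.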

Let's start with the contribution to variance. This is how important each PCA is.

We can see that, for the last 12 months at least, the first PCA explains 18% of the variance, the second 12.5%, and so on. In contrast, if we did this for US equities we'd find the first PCA explained 50%, and for US bonds it would be 70%. There is a lot more going on here.
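A rough way to read these numbers, under the strong simplifying assumption that every pair of markets has the same correlation rho: the first principal component then has eigenvalue 1 + (N - 1) rho out of a total variance of N, so its share is rho + (1 - rho) / N, which is roughly rho when N is large. The sketch below checks that closed form numerically (the function name is hypothetical). With N around 100, a first PCA share of about 18% corresponds to an average correlation of only about 0.17, whereas shares of 50% or 70% need correlations of roughly 0.5 or 0.7. Real correlation matrices are far from equicorrelated, so treat this as intuition only.

```python
import numpy as np

def first_pc_share_equicorrelated(n_markets, rho):
    """Share of variance explained by the first principal component when
    every pair of (vol normalised) markets has the same correlation rho."""
    corr = np.full((n_markets, n_markets), rho)
    np.fill_diagonal(corr, 1.0)
    eigenvalues = np.linalg.eigvalsh(corr)
    return eigenvalues.max() / eigenvalues.sum()

# Closed form: the largest eigenvalue is 1 + (N - 1) * rho, total variance is N
n, rho = 100, 0.17
closed_form = (1 + (n - 1) * rho) / n
numerical = first_pc_share_equicorrelated(n, rho)
```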

If we look at how the first two factors contributions vary over time:

... we can see that there has been a bit of a downward trend as more factors arrive in the sample, but more generally the first PCA does hover around 20% and the second around 10%. There are exceptions like 2008, where I would imagine a big risk off bet drove the market. The same was true during COVID.

What is the first PCA? Well currently it looks like this:

For clarity I've only included the top 20 and bottom 20 instruments by weight. Still, you may be struggling to read the labels. The top markets are pretty much all stocks, with European equities getting a bigger weight. The S&P does just sneak into the top 20. The bottom 20 starts with VIX and VSTOXX, but mostly the weights here are quite small. So the first PCA right now is "Equity Beta, with a tilt towards Europe".

What about the second PCA?

The first 6 positive instruments are all US bonds, and nearly all the rest are government bonds of one flavour or another. Only EU-Utility stocks get to crash this party (interest rate sensitive?). On the short side we have some FX and quite a few energy futures. So this second factor is "Long bonds / Short energies".

PCA 3 is long a whole bunch of FX, which means it's short USD, and also some metals and random commodities. On the short side it's short EU-Health equities, CNHUSD FX and a whole bunch of European bonds. Feels a bit trade related. Shall we call this the Trump factor?

Anyway, I could continue, but it would be more intuitive to understand how these factors have changed over time. We'll pick some key markets with lots of history, and plot the weight each has in a given PCA over time.

Here is the S&P 500:

We can see that it is mostly positive on PCA1 and negative on PCA2 and PCA3, but there are periods when that is not the case. The sharp drops in weighting suggest that perhaps we ought to use a window longer than a year, or an EWMA of the weights, to smooth things out.
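A minimal sketch of the EWMA idea for one component, assuming a fixed set of markets across estimates (in practice the market set changes as new markets arrive). Because eigenvectors are only defined up to sign, each new estimate is first flipped if it points the opposite way to the current smoothed weights; otherwise the EWMA would average a factor with its own mirror image. The function name and the span parameter are hypothetical.

```python
import numpy as np

def smooth_pca_weights(weight_history, span=12):
    """EWMA-smooth a sequence of PCA weight vectors for one component.

    weight_history: shape (n_estimates, N), one row per re-estimation,
    all over the same N markets. Each row is sign-aligned with the
    previous smoothed weights before it is blended in.
    """
    weight_history = np.asarray(weight_history, dtype=float)
    alpha = 2.0 / (span + 1.0)
    smoothed = np.empty_like(weight_history)
    smoothed[0] = weight_history[0]
    for k in range(1, len(weight_history)):
        w = weight_history[k]
        if np.dot(w, smoothed[k - 1]) < 0:
            w = -w  # undo an arbitrary eigenvector sign flip
        smoothed[k] = alpha * w + (1 - alpha) * smoothed[k - 1]
    return smoothed

# The same factor estimated four times, with its sign flipping each time
history = [[0.6, 0.8], [-0.6, -0.8], [0.6, 0.8], [-0.6, -0.8]]
smoothed = smooth_pca_weights(history, span=3)
```

With the sign alignment, the flips disappear and every smoothed row stays at [0.6, 0.8]; without it, the alternating estimates would partly cancel out. Note that the smoothed vectors are no longer exactly unit length.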

Here is the US 10 year:

Again, this mostly loads positive on PCA2 but not always. You can see the increase in correlation of bonds and equities happening as PCA1 creeps up in the last few years.

Here is the first PCA weighting in June 2004, one of those interesting periods.

You can see that it was all about currencies in that period; plus silver and gold, various other metals, bonds and energies. So very much a short Dollar, long metals trade.

We're nearly done for today. Last job is to plot the factors. Here are the cumulated returns for PCA1:

That looks a lot like a vol normalised equity market; note the drops in 2009 and 2020.
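For reference, a minimal sketch of how cumulated factor returns like these can be computed, assuming a (T, N) array of vol normalised returns and a matching array of weights in which row t only uses information available before day t (for example the PCA weights from the previous window). Missing returns are treated as a zero contribution. The function name is hypothetical, and this is not necessarily how the charts here were produced.

```python
import numpy as np

def cumulated_factor_returns(normalised_returns, weights):
    """Daily factor return = portfolio weights times vol normalised returns,
    then cumulated (summed) through time.

    weights: shape (T, N); row t holds the weights known before day t.
    NaN returns (markets without data) contribute nothing that day.
    """
    daily = np.nansum(normalised_returns * weights, axis=1)
    return np.cumsum(daily)

returns = np.array([[1.0, 2.0], [3.0, 4.0]])
weights = np.array([[0.5, 0.5], [1.0, 0.0]])
curve = cumulated_factor_returns(returns, weights)
```

Since eigenvector signs are arbitrary, a factor whose sign has flipped will appear inverted in a chart like this unless the weights are sign-aligned first.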

And here is PCA2:

Again that could plausibly be bonds, mostly up with the exception of the post 2022 period.

This suggests another research idea, which is to use the S&P 500 and US interest rates as 'given factors', which might be more stable than using PCA. Still, that would mean missing out on times like 2004 when other things were driving the market.

What's next

Next step would be to look at some of those opportunities for factor use and misuse outlined above, and see if there is profit as well as fun in this game!

3 comments:

William, 2 July 2025 at 03:20
I really appreciate you answering the question on TTU and am looking forward to this series.

I keep coming back to this idea related to Hierarchical PCA (https://arxiv.org/abs/2010.04140), which partly motivated my question. It seems your correlation based clustering and hand crafting approach could be used to create the hierarchy for HPCA, hopefully generating more stable and interpretable factors (basically provides a prior for potential PCA factors). It's an idea I want to eventually look into, but thought it was worth asking first.

Thanks again for your time, I hope you don't get too distracted from book writing!
Arvind Damarla, 2 July 2025 at 22:55
Note that PCA1 looks like the equity market factor after 2000, but it looks like its inverse before 2000.
ollie, 3 July 2025 at 14:13
Instability of PCA is one of the main drawbacks. Signs of first factor flipping (which is what it may well be) is easy to take care of though.

I've spent a fair bit of time on macro factors in various contexts. I always found PCA to be an interesting guide, but always ended up using pre-determined factors, as suggested in the conclusion of the article. I believe adding energies makes sense too for a baseline model.
From there, one can build as many additional factors as required. Either based on asset classes (credit, FX, etc.) or, taking a cue from equity portfolio management, based on characteristics (regions, sectors, mom, carry, value...), using orthogonalisation. There are infinite ways to skin a cat though.

Interested to see what Rob's final approach to it will be.


Tuesday, 1 July 2025
