Friday, 14 November 2025

Wordle (TM) and the one simple hack you need to pass funded trader challenges

An unusual (but quick) mid-month post; as this is a live issue I thought I'd publish it whilst it's still relevant.

There has been some controversy on X/Twitter about 'pay to play' prop shops (see this thread and this one) and in particular Raen Trading. It's fair to say the industry has a bad name, and perhaps this is unfairly tarnishing what may pass for good actors in this space. It's also perhaps fair to say that many of those criticising these firms, including myself, aren't as familiar with that part of the trading industry and our ignorance could be problematic. 

But putting all that aside, a question I thought I would try to answer is this: how hard is it to actually pass one of these challenges? As a side effect, it will also tell us what the optimal vol target is if we're taking part in one of these challenges. Hence the clickbait article heading. I know from experience this will open me up to having to filter out 500 spam comments a day, but f*** it. 

As well as modelling Raen, I also model a much dodgier challenge later in the post, from another company which I will name only as prop firm #2. Finally I close with some generic and unquantified thoughts on the subject. 

Standalone Python code here. You can play with this to model another firm's challenge.

TLDR: 

  • Raen: you have a reasonable chance of passing their first round challenge, and you should use a vol target of [scroll down to find out!] to maximise your chances.
  • Prop firm #2 and most of the 'industry': use a very long bargepole, I can lend you mine
  • I remain skeptical of pay to play

As to what any of this has to do with the word game Wordle (TM), read on to find out.


IMPORTANT: This is not an endorsement of Raen. I have no association with them and I remain skeptical of this entire industry. Their CEO reached out to me after this blogpost was initially published, confirmed my understanding of the challenge parameters was correct, and gave me permission to use the firm's name. I made one small correction to the post as a result of that contact.


The (relatively) good guys 

The rules of the Raen challenge are this:

  • You must make 20%
  • You can't lose more than 2% in a single day. There is no maximum trailing drawdown. So if you lose 1.99% every day forever, you're still in the game.
  • You must trade for at least 30 trading days before passing the challenge
  • It costs $300 a month to do the challenge. This isn't exactly the same as Raen, which charges a little more, but a rounder number makes it easier to see how many months we expect the challenge to take by backing out from the cost per month. I assume this is paid at the start of the month.
Note: this is just the 1st stage of the challenge. The rules for the 2nd stage are much more nebulous, but to be fair there are no charges for that. Like I said, this prop firm appears to be amongst the relatively good guys. 
 
I've also got these parameters:
  • 256 business days a year, 22 business days a month (it's actually more like 21, but again this higher figure will make the prop firm look good)
  • Random gaussian returns generated with no autocorrelation. This is extremely kind as it ignores the chance of fat tails that are somewhat common in finance.
  • If we get stopped out we try again, which means restarting the challenge from scratch. There are no reset fees. I assume that this reset doesn't affect the timing of monthly fees (I can't find the answer to this question on the website, but this must be the case; otherwise resetting would effectively be free, your best strategy would be to make massive bets every day, and you would pass eventually having only ever paid for the first month).
  • We give up if we can't pass after trying for a year (there are no time limits in the challenge, but this speeds up the computation and seems like reasonable behaviour).
  • I assume there are no other limits which make it hard to hit a given risk target. This is unlikely to be a constraint except for suboptimally high vol targets.
There are two clear variables we are missing: the expected Sharpe Ratio, and the vol target, both required to generate the gaussian returns. The former is assumed to be exogenous (one question to answer is how hard these challenges are to pass - if you need a SR of 4 to pass them, that suggests they are probably too hard), whilst the latter we can optimise for. Note that due to the drawdown and self imposed time limit the optimal vol target won't be equal to the usual Kelly optimum. In fact this subject is intellectually interesting as well as topical, since it's the first time I've looked at optimisation with a drawdown/time constraint. 

I run this as a bootstrap exercise, and optimise for two things: (a) minimising the median cost, and (b) maximising the probability of being funded before we give up. 
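To make that concrete, here is a minimal sketch of the sort of Monte Carlo involved, using the rules and parameters above. This isn't the standalone code linked to earlier, just an illustration: returns are non compounded for simplicity, the fee rounding is approximate, and the function and variable names are mine.

import numpy as np

DAYS_YEAR, DAYS_MONTH, MONTHLY_FEE = 256, 22, 300

def run_challenge(sr_annual, vol_annual, target=0.20, daily_stop=-0.02,
                  min_days=30, max_days=DAYS_YEAR, seed=None):
    """Simulate one trader attempting stage one, with resets after a daily stop out.
    Returns (passed, months_of_fees_paid)."""
    rng = np.random.default_rng(seed)
    daily_mean = sr_annual * vol_annual / DAYS_YEAR
    daily_vol = vol_annual / np.sqrt(DAYS_YEAR)
    cum_return, days_in_attempt = 0.0, 0
    for day in range(max_days):
        r = rng.normal(daily_mean, daily_vol)
        if r <= daily_stop:                       # breached the 2% daily loss limit
            cum_return, days_in_attempt = 0.0, 0  # start again; no reset fee, fee clock keeps running
            continue
        cum_return += r                           # non compounded, to keep the sketch simple
        days_in_attempt += 1
        if cum_return >= target and days_in_attempt >= min_days:
            return True, day // DAYS_MONTH + 1    # fees paid at the start of each month
    return False, max_days // DAYS_MONTH + 1      # gave up after a year

def summarise(sr_annual, vol_annual, n_trials=1000):
    results = [run_challenge(sr_annual, vol_annual, seed=i) for i in range(n_trials)]
    median_cost = -MONTHLY_FEE * np.median([months for _, months in results])
    pass_rate = np.mean([passed for passed, _ in results])
    return median_cost, pass_rate

print(summarise(sr_annual=1.5, vol_annual=0.15))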

OK so two simple graphs then. Each has a different line for each SR, and the x-axis is the vol target we are running at. The y-axis on graph one is the cost, with a minus sign so we have the natural thing of a high y-axis being good. On graph two the y-axis is the probability of passing before we give up, again obviously high y-axis is good.


Median cost of getting to stage two, lines are SR, x axis is annual vol target, y axis is cost (bigger minus numbers are higher costs)

Note that for SR/vol combinations where we have less than a 50% chance of succeeding within a year, the median cost will be equal to the monthly cost * 12. This is the case for SR<1.5



Probability of getting to stage two, lines are SR, x axis is annual vol target, y axis is probability of success



What conclusions can we draw from this?
  • The optimal vol target depends on your SR and whether you are focusing on costs or probability*
  • To get a greater than 50% chance of passing we need an expected SR of 1.5 or higher. 
  • The expected median cost with optimal vol is going to be $2000 for a SR of 1.5, which you can get down to $1500 if you are the next RenTech (SR of 3). 
  • The expected median time to pass is going to be about 7 months for a SR of 1.5 or about 5 months if you are the next RenTech
* Experts will recognise the vol target choice as the Wordle (TM) starting word problem (yes we finally got there). The best starting word for Wordle will depend on whether you are maximising your probability of winning, or trying to minimise the number of guesses you make. Similarly, are we trying to maximise our chance of passing the challenge, or minimising our likely cost? They are not quite the same thing.

The optimal vol looking at costs is around 15 - 20%. Looking at probability of passing, it's around 12% for very high SR traders, and more like 22% for low SR traders. Basically if you're crap you have to take a bit more risk to have a chance. If you're good you can chill. Given we're assuming Gaussian returns I'd be tempted to mark these figures down a bit, although note that for high SR traders using less than optimal vol is quite harmful (very steep lines) whilst using more than optimal is less painful (this is completely at odds with Kelly of course).

Since nobody knows what their SR is, I'd suggest using 15% as a vol target. If you are incredible that is slightly more than optimal, but you still have an 80% chance of passing. If you are less incredible it may be slightly less than optimal, but then you have no business passing this challenge anyway.


The not so good guys: prop firm #2


Here is an example of another firm's level 1 challenge. I won't name them, but they are currently on the 1st page of Google results for the search term "trading prop challenge", so that narrows it down. This firm has several challenge tiers in the futures space; I've chosen the lowest, but all the conditions are the same, just with different notional capital and $ costs. 

The rules of the challenge are this:

  • You must make 6%
  • The maximum drawdown is 4%; trailing based on daily balances.
  • If you lose more than 2% in a day, well basically you're stopped out at 2% but the challenge doesn't end. So your max loss in a day is 2%. In practice it would be slightly more because of slippage, but let's be generous here.
  • There is a one time activation fee of $130 (again not exact figures, but ballpark).
  • You have to do the challenge in 30 days. It costs $100 to start each challenge. If you want to extend the challenge by 30 days it costs another $100. This equates to a monthly fee of $100, so we'll model it like that.
  • If you need to reset (start again because you've gone boom) it's $80. This is on top of the monthly cost since it doesn't reset the number of days to zero before you have to pay a monthly fee again.
  • There are optional data fees we will ignore, because there are enough fees here already.
Now, it's worth saying that there are many other terms and conditions that make firm #2 much dodgier and less likely to fund you or give you your profit share once funded (of course we're assuming firm #1 sticks to their word as well); but we're purely here to model the challenge itself.
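Purely for illustration, the firm #2 rules above could be parameterised something like this and plugged into a simulator like the sketch earlier (which would also need to track the trailing drawdown, the capped daily loss and the reset fees); all the names here are hypothetical.

# Hypothetical parameterisation of the firm #2 rules above; names are illustrative only
FIRM_2_RULES = dict(
    profit_target=0.06,       # must make 6%
    trailing_drawdown=0.04,   # 4% trailing max drawdown on daily balances
    daily_loss_cap=0.02,      # daily losses capped at 2%, but the challenge continues
    activation_fee=130,       # one off, approximate
    monthly_fee=100,          # $100 to start, and again for each 30 day extension
    reset_fee=80,             # paid each time you blow up, on top of the monthly fees
    min_trading_days=0,       # no minimum number of trading days
)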

Here are the graphs:


Median cost of getting to stage two, lines are SR, x axis is annual vol target, y axis is cost (bigger minus numbers are higher costs)


Probability of getting to stage two, lines are SR, x axis is annual vol target, y axis is probability of success

This is not what I had expected. I had expected the challenge to be much harder, so the firm could keep collecting the fees. But this challenge is easy to pass: just use a vol target of more than 25%. Basically you get to flip a coin a couple of times, and sooner or later it will turn up heads. This strategy will work even if you are a losing trader (SR -0.5) as shown. The only benefit of being a better trader is that you will pass quicker and thus pay less. 

This is an incredibly badly designed challenge. It rewards higher volatility. It doesn't discriminate at all between good and bad traders. 

So either (i) there are other conditions in the (very hard to find) small print that in practice make the challenge hard to pass, or (ii) it's a deliberate strategy to allow almost anyone to get to the next stage. The biggest red flag is that trading with this particular firm is sim only even after you have passed the challenge. They don't want to make the initial challenge too hard; they want you as a paying customer ASAP. And people who use too much vol are ideal customers for bucket shops.


The prop firm's view

Of course what we're not doing here is looking at things from the prop firm's point of view. The challenge is designed to answer the question: "is this potential trader any good or just lucky?". At least that is if you are assuming they are genuinely looking for good traders. Which prop firm #2 definitely isn't, so let's focus on Raen.

The main shortcoming of these challenges is that 30 days or even a year is a wholly insufficient time to determine if anyone has any skill, unless they are very highly skilled indeed. And again, to be fair, the initial challenge of Raen is purely a screening exercise that will essentially tell you (a) if someone has a vague idea of how to manage risk and avoid a 2% daily drawdown and (b) is either very good (SR somewhere over 1) or just very lucky.

Someone who shoots for a vol target that is too high will almost certainly fail. However there is still a chance of a crap trader being lucky. But hopefully the second stage will weed them out. So we aren't too worried about type 1 errors.

However even relatively highly skilled traders (say SR 1 to 1.5) will only have a coinflip chance of passing. So there is still quite a big chance of a type 2 error: missing out on the next Nav Sarao*. Perhaps that's okay. They're probably only interested in people with a SR of over 2 anyway, where the passing percentage over a year will be over 60% if they use optimal vol. Of course I'm assuming all these people have several thousand dollars to stump up for a year's worth of monthly fees. There will be many who don't, and who therefore also miss out on potentially being funded even if they are good traders. So I would say the possibility of a type 2 error is quite high.

* This famous gentleman, who for all his faults was an incredibly successful futures prop trader even when he wasn't breaking the law.
 
I'd say on balance that Raen's challenge is relatively well designed given all the caveats. It's simple, its difficulty rating feels about right, and the 2% daily drawdown acts as a simple anti muppet filter. The fact there is an optimal vol is satisfying. It would be interesting to see their internal numbers on how many people pass the first and then the second challenge, and then go on to become good traders. That would tell us what their type 1 error actually is. 


But is this all really a good idea? Some unquantified and unqualified opinions

Putting aside the statistical debate, is this all really a good thing? For the traders trying out, or for the firms themselves (assuming again that they are genuine)? There are many red flags in this industry. Having to pay to be considered for a 'job' is always bad (although Raen's CEO clarified to me that they also accept applicants who haven't passed the challenge, presumably with some kind of filter on experience), and the fact that many places are purely bucket shops where you trade against the broker is awful (again, not Raen). Frankly the whole thing makes my stomach churn, but I'm trying to be as fair as possible here and put emotions aside.

The world of trading has changed an awful lot. In this post the founder of Raen says their shop is for people who would never have the opportunity to get into Jane Street (JS). But JS is looking for people with a very particular set of skills to do a certain kind of trading which you can't do unless you have the sort of resources JS has. 

Yes we can argue that the Jane Street filter is too strict (though they hired SBF, a man who did not understand how to size trading positions, so maybe not strict enough), but it's pretty silly to pretend that Jane Street would be interested in hiring the sort of people who have the ability to be point and click futures traders. It's really not the same at all.

Raen apparently has ex JS people working there and they are 'very successful'. I am sure they are. I'm also sure that they're almost certainly not point and click traders either. But is it really realistic to replicate JS by hiring a completely different set of people, without any of the filters JS uses to get specific sets of skills, using a totally different process from what JS uses, and without most of JS's resources; and then sit them next to ex JS traders from whom they will presumably absorb brilliance by osmosis?

Basically if for some reason you are trying to be the next JS why are you using a hiring process which is clearly for point and click traders? There are no references to APIs that I can see on any of these challenge websites, so I assume it's point and click they are looking for.

So, is the world of point and click prop traders too inaccessible? It's probably more accessible than it was 20 years ago from an IT and cost perspective. But admittedly if you're not trading costly and dodgy retail assets, and want to trade futures, then no, you can't really do this with the $3,000 or so you'd need to pass even a good trading challenge. The $100k of (notional? real?) money you get from Raen is the bare minimum I suggest in my book. You would need less in equities, though to day trade US stocks under the PDT rule you need $25k (for now). 

But from my perspective, the whole point and click futures industry seems very... niche. The vast majority of professional traders now are basically quants, or heavily supported by quants, and/or using data other than the charts and order books fancied by the dozen monitor setups of the cliched point and click trader. It's an area of the market that really is very efficient and where the vast majority of point and click humans can't compete even if supported by execution algos, which is why I deliberately trade much... more... slowly. 

In fact I'd say there are now significantly more people employed by the likes of JS than by genuine and profitable point and click firms. 

So we're talking about getting access to a relatively tiny industry that is frankly a bit quaint and probably still shrinking. I can understand why many people want to get into it though. Who wouldn't want to gamble for a living, make millions of dollars a year, in a job which requires no qualifications (no PhD in astrophysics needed here!), which so many films and YouTube videos have glamorised, and which either requires almost no work or where hard work and effort will be rewarded (depending on which video you watch). 

I can believe that there are a very small number of people who have pointed and clicked for so long, that they really do have an ability to 'feel' a particular market very well, they can glance at an L2 order book and see patterns, they know which news and statistics to focus on, they know what other markets to look at, they know how to manage risk and size positions, they have built execution algos that enable them to compete not with HFT but certainly in the sub one day area... and they are certainly better traders than me or my systems. 

As to how you would select such people, I do not know. They are not my people. Personally I am very skeptical that there really are people who can sit at a computer, having never traded futures before except maybe in a simulator, with no training or market experience, and have some innate trading ability that gives them a high probability of passing a trading challenge with the sort of SR that would make most hedge funds weep with envy; and I'm equally skeptical that those challenges are the best way of telling that someone has that innate ability. 

But once again, I'm not in this industry so what do I know. 

Summary

Raen: not a bad initial screening, and a more than 50% chance of a pass with a SR above 1.5. But it will cost you more than $300. Budget for several thousand bucks and use a vol target of around 15% to optimise your chances.

Unnamed prop firm #2 I googled, and most of this industry: stay away, for god's sake.

Pay to play: morally dubious IMHO

Random futures traders having some sort of innate talent that can be found in this way: I doubt it








Tuesday, 11 November 2025

Is predicting vol better worth the effort and does the VIX help?

 I'm a vol scaler.

There I've said it. Yes I adjust my position size inversely to vol. And so should you. 

But to do this well we need to be able to predict future vol; where the 'future' here is roughly how long we expect to hold our positions for.

Some people spend a lot of effort on this. They use implied vol from options, high(er) frequency data, GARCH or stochastic vol models. Other people don't spend a lot of effort on this. They look at the vol from the last month or so, and use that. I'm somewhere in the middle (though biased massively towards simplicity); I use an exponentially weighted moving average of recent vol combined with a much slower average.

An obvious question with any research effort is this: is the extra effort worth it? If we were trading options, then sure it would be. But we're not.

In this post I answer that 'is it worth spending time on this nonsense' question and look at the actual improvements we can gain from moving from the most rudimentary vol forecasting to the slightly more complex stuff I do. I also see if we can use a simple indicator of future volatility - the VIX - to improve things further. This was suggested by someone on Twitter(X). 


Is it worth predicting vol better?

I've mentioned this experiment a few times in the past, but I don't think I have ever blogged about it. Basically you run two backtests: one with your normal historic vol estimation, and the other with perfect foresight, i.e. a vol estimate equal to the ex-post vol over the next 30 days. The latter is the theoretical best possible job we could do if we really worked hard at forecasting vol. We can't do any better than a crystal ball. 

Then you check out the improvement. If vol is worth forecasting, there will be a big improvement in performance.

[This is a 'workhorse' test simulation with 100 liquid futures and 4 signals: 40% carry, and 20% in each of ewmac 16, 32 and 64]

We begin with the simplest possible predictor of vol, a backward looking standard deviation estimate with an infinite window. Essentially this is a fixed vol estimate without any in sample estimation issues. We then compare that to the perfect foresight model.
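As a rough sketch of the two estimators being compared (not the workhorse simulation itself), assuming a pandas Series of daily percentage returns:

import numpy as np
import pandas as pd

returns = pd.Series(np.random.default_rng(0).normal(0, 0.01, 2000))  # stand-in for real returns

# 'infinite window' backward looking estimate: an expanding standard deviation
benchmark_vol = returns.expanding(min_periods=30).std()

# perfect foresight: the ex-post vol over the *next* 30 days (the crystal ball)
foresight_vol = returns.rolling(30).std().shift(-30)

# positions in each backtest are then scaled by 1 / the relevant vol estimate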

Let's begin by looking at what the vol outcome is like. This is the one month rolling vol estimate (the realised vol of the strategy returns); clearly foresight does a better job of vol targeting.


Above are the cumulated returns. That sure looks like a decent improvement, and as the vol of perfect foresight is lower, it's better than it looks. It's a half unit improvement in SR points, from 0.76 to 1.24. The skew has dropped off from over 1.0 monthly to 0.12, but you know from my previous posts that a small dip in skew won't be enough to destroy the huge CAGR advantage given by this sort of SR premium. The sortino is much better; more than double. 

So the short answer is yes, it's worth predicting vol better. Let's see how close we can get without a crystal ball.


What size window

The obvious thing to do is to shorten our estimation window from forever to something a little shorter. Here is a graph I like to show people:

The x-axis shows the window size for a historic vol estimator in business days. The y-axis shows the R squared regressing the realised vol for a given future time period against the estimator / predictor of future vol. We're looking for the point on the x-axis that maximises R squared. Each line is a different length of future time period. So for example, to get the best prediction of vol one month ahead (about 21 business days) we look at the purple line for 21 days, and we can see this peaks at around 25 days. 

This is also the highest R squared overall. We are better at predicting one month ahead vol than any other period, and to do so we should use the previous one month of vol (actually slightly more than a month). 

We don't do quite as well predicting shorter periods, and it looks like we might need slightly less data to predict eg 5 day vol. We do worse predicting longer periods, and it looks like we need more data. For 365 days ahead vol, the best R squared is obtained at somewhere between 40 days (around 2 months) and 100 days (around 5 months). 

Note: these are good R squared! In my last post a monthly holding period with an R squared of 0.1 would give us a SR of over 1, which is good. Here we are seeing R squared of over 0.30, which equates to a SR of nearly 2. That is very good - if we were as good at predicting returns as vol our SR would be two!
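If you want to reproduce this kind of graph, here is a sketch of the experiment, assuming a Series of daily returns and not removing overlapping windows. You'd need real futures returns to get the actual curves; the toy Gaussian data below has constant vol, so its R squared will be close to zero.

import numpy as np
import pandas as pd

def r_squared_for_window(returns: pd.Series, window: int, horizon: int) -> float:
    # the R squared of a univariate regression is just the squared correlation
    predictor = returns.rolling(window).std()
    realised = returns.rolling(horizon).std().shift(-horizon)
    both = pd.concat([predictor, realised], axis=1).dropna()
    return both.corr().iloc[0, 1] ** 2

returns = pd.Series(np.random.default_rng(1).normal(0, 0.01, 5000))  # replace with real returns
for window in (5, 10, 25, 50, 100):
    print(window, round(r_squared_for_window(returns, window, horizon=21), 4))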

With that in mind, let's go from an infinite lookback to a 25 business day lookback and see what happens.

First the rolling vol:

We can already see a fair improvement from the spikiness of the benchmark. How about the returns?

It looks like we are doing better than the benchmark and are competitive with foresight. However some of this is just higher vol; our SR is 1.03, which still falls short of the 1.24 of the perfect foresight model, though it is obviously much better than the infinite lookback benchmark.

To recap:

Infinite previous vol                  SR 0.76
One month simple rolling vol           SR 1.03
Perfect foresight                      SR 1.24


From simple to exponential moving average

Now let's be a little fancier and go to EWM of vol rather than a simple equally weighted measure. This might not get us a better forecast of vol, but we should be smoother. A 36 day span in the pandas EWM function has the same half life as a 25 day SMA.
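In pandas terms the change is a one liner; this is just the comparison being made, on stand-in random data:

import numpy as np
import pandas as pd

returns = pd.Series(np.random.default_rng(2).normal(0, 0.01, 1000))  # stand-in for real returns

sma_vol = returns.rolling(25).std()    # one month simple rolling vol
ewm_vol = returns.ewm(span=36).std()   # exponentially weighted, similar half life, smoother
print(pd.concat([sma_vol, ewm_vol], axis=1, keys=["sma", "ewm"]).tail())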

As before, here's the vol targeting, which is now almost identical:


And for profits....


Again we aren't quite vol matched, but EWM does in fact add a small increment in SR of 0.04 units. Around a quarter of that modest bump comes from lower costs (a saving of around 24 bp a year). 


Infinite previous vol                  SR 0.76
One month simple rolling vol           SR 1.03
One month EWM rolling vol              SR 1.06
Perfect foresight                      SR 1.24


I already looked at this in my book AFTS, but if we combine the standard EWM vol above with a very long run average (10 years) of the same vol, we get another small bump. This is the vol measure I use myself.
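A minimal sketch of that blended estimate, assuming (purely for illustration) a 70% weight on the recent EWM vol and 30% on its ten year average; see AFTS for the exact recipe and weights I actually use:

import numpy as np
import pandas as pd

returns = pd.Series(np.random.default_rng(3).normal(0, 0.01, 5000))  # stand-in for real returns

recent_vol = returns.ewm(span=36).std()
ten_year_avg = recent_vol.rolling(2560, min_periods=256).mean()   # roughly ten years of business days
blended_vol = 0.7 * recent_vol + 0.3 * ten_year_avg               # illustrative weights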


Introducing the VIX

We are still some way short of getting close to perfect foresight vol. So let's do something else, for fun. We know that implied vol should be a good predictor of future vol; accounting for the well known vol premium (we get paid for being short gamma, hence implied is persistently higher than expected future vol).

Here's the simple rolling 25 day standard deviation measure for the S&P 500, and the VIX:

Note: I would like to thank Paul Calluzzo for pointing out a stupid mistake I had made in the first version of this post

A couple of things to notice. Firstly, the vol premium is larger after 2008 due to a general level of scaredy-cat-ness, and it seems to have narrowed somewhat in the last few years. Over the last few years there have been a lot of dumb retail people selling vol and pushing the price down! 

Secondly, it looks like VIX tracks rather than predicts increases in risk, at least for those unexpected events which cause the biggest spikes. Which suggests its predictive power will be somewhat limited.
If we regress future vol on historic vol plus the VIX, the VIX coefficient is 0.14 and the historic vol comes in at 0.71. That suggests historic vol does most of the explaining, with VIX not adding much to the party. I get similar results if I put the vol premium (VIX - historic vol) plus historic vol into the regression to reduce potential collinearity. 
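For reference, here is a sketch of that regression using statsmodels. It assumes you have three aligned daily Series - the realised vol over the following month, the 25 day historic vol, and the VIX rescaled to comparable units - and the names are just illustrative:

import pandas as pd
import statsmodels.api as sm

def future_vol_regression(future_vol: pd.Series, historic_vol: pd.Series, vix: pd.Series):
    # regress realised future vol on the historic estimate and the VIX
    exog = sm.add_constant(pd.concat([historic_vol.rename("historic"),
                                      vix.rename("vix")], axis=1))
    data = pd.concat([future_vol.rename("future"), exog], axis=1).dropna()
    model = sm.OLS(data["future"], data.drop(columns="future")).fit()
    return model.params, model.rsquared   # coefficients on historic vol and the VIX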

Summary

There are significant performance benefits to be gained from forecasting vol well even in a directional system that doesn't trade optionality. Over half of those benefits can be captured by just using the right amount of lookback on a simple historical estimate. Further complexity can probably improve vol targeting but is unlikely to lead to significant performance improvements. Finally, the VIX is not especially helpful in predicting future volatility; mostly this is explained pretty well by historic vol.



Saturday, 1 November 2025

R squared and Sharpe Ratio

Here's some research I did whilst writing my new book (coming next year, and aimed at relatively inexperienced traders). Imagine the scene. You're a trader who produces forecasts (a scaled number which predicts future risk adjusted returns, or at least you hope it does) and who wants to evaluate how good you are. After all, you've read Carver, and you know you should use your expected Sharpe Ratio to determine your risk target and cost budget.

But you don't have access to cutting edge backtesting software, or even dodgy home brew backtesting software like my own pysystemtrade; instead you just have Excel (substitute your own favourite spreadsheet, god knows I certainly don't use the Micros*it product myself). You're not enough of a spreadsheet whizz to construct a backtest, but you can just about manage a linear regression. But how do we get a Sharpe Ratio from a regression?

If that is too much of a stretch for the typical reader of this blog, instead imagine that you do fancy yourself as a bit of a data scientist, and naturally you begin your research by regressing your risk adjusted returns on your forecasts to identify 'features' (I'm given to understand this is the way these people speak) before going anywhere near your backtester, because you've read Lopez De Prado.

Feels like we're watching a remake of that classic scene in Good Will Hunting, doesn't it? "Of course that's your contention. You're a first year data scientist. You just finished reading some financial economist, Lopez De Prado prob'ly, and so naturally that's what you believe until next month when you get to Rob Carver and get convinced that momentum is a risk factor. That'll last until sometime in your second year..."

But you're wondering whether an R squared of 0.05 is any good or not. Unlike the Sharpe Ratio, where you know that 1 is good, 2 is brilliant, and 3 means you are either the next RenTech or, more likely, you've overfitted.

So I thought it would be 'fun' to model the relationship between these two measures of performance. Also, like I said, it's useful for the book. Which is very much aimed at the tech novice trader rather than the data scientist, but I guess the data scientist can just get the result for free from this blogpost as they're unlikely to buy the book.

There are three ways we can do this. We can use a closed form formula, we can use random data, or we can use actual data. I'm going to do all three: partly to verify the formula works in the real world, and partly to produce some graphs we can use as a lookup table later on.

There is code here; you'll need pysystemtrade to run it though.

Edit notes: I'd like to thank LacertaXG1 and Vivek Rao for reminding me that a closed form formula exists for this problem.


Closed form formula

From the book known only as G&K we have one of my favourite laws, LAM - the law of active management. This is where the famous 'Sharpe Ratio (actually Information Ratio, but we're amongst friends) is proportional to sqrt active bets' comes from, a result we use in both portfolio size space (the IDM for a portfolio of N uncorrelated assets ought to be sqrt N) and in time space (for a given success rate the SR for a trading strategy with holding period T will be sqrt 2 times better if we halve our holding period). 

Anyway, under LAM at an annual holding period an R squared of 0.01 equates to an IC/SR of 0.10. At a daily holding period we'd expect the same R squared to result in a SR that is sqrt(256) = 16 times higher, i.e. 1.6. Let's see how well this is borne out by the data.


Real data and forecasts

This is the easiest one. We're going to get some real forecasts, for things like carry, momentum. You know the sort of thing I do. If not, read some books. Or if you're a cheapskate, the rest of this blog. And we get the price of the things the forecasts are for. And because I do indeed have fancy backtesting software I can measure the SR for a given forecast/price pairing*. 

* to do this we need a way of mapping from forecast to positions, basically I just do inverse vol position scaling with my standard simple vol estimate which is roughly the last month of daily returns, and the overall forecast scaling doesn't really matter because we're not interested in the estimated coefficients of the regression just the R squared.

And because I can do import statsmodel in python, I can also do regressions. What's the regression I do? Well since forecasts are for  predicting future risk adjusted returns, I regress:

(price_t+h - price_t)/vol_estimate_t = alpha + beta * (forecast_t) + epsilon_t 

Where t is time index, and h is the forecast horizon in calendar days, which I measure simply by working out the forecast turnover (by counting the typical frequency of forecast sign changes from negative to positive in a year), and then dividing 365 by the turnover. 

Strictly speaking we should remove overlapping periods, as these will inflate our R squared, but as long as we are consistent about not removing overlapping periods then our results will be fine.

Beta we don't care about as long as it's positive (it's some arbitrary scaling factor that will depend on the size of h and the forecast scaling), and alpha will be any bias in the forecast which we also don't care about. All we care about is how well the regression fits, and for that we use R squared. 

Note: We could also look at the statistical significance of the beta estimate, but that's going to depend on the length of time period we have. I'd rather look at the statistical significance of the SR estimate once we have it, so we'll leave that to one side. 
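Putting the last few paragraphs together, here is a sketch of the horizon calculation and the regression, assuming a datetime indexed price Series, a daily vol estimate, and a forecast Series on the same index. Shifting by rows stands in for the calendar day horizon, and all the names are illustrative:

import numpy as np
import pandas as pd
import statsmodels.api as sm

def forecast_horizon_days(forecast: pd.Series) -> float:
    # turnover = typical number of sign changes from negative to positive per year
    sign = np.sign(forecast)
    upcrossings = ((sign > 0) & (sign.shift() < 0)).sum()
    years = (forecast.index[-1] - forecast.index[0]).days / 365.25
    turnover = max(upcrossings / years, 0.5)   # floor to avoid dividing by roughly zero
    return 365 / turnover

def forecast_r_squared(price: pd.Series, vol: pd.Series, forecast: pd.Series, h: int) -> float:
    # regress future risk adjusted returns on the forecast; overlapping periods not removed
    y = ((price.shift(-h) - price) / vol).rename("y")
    data = pd.concat([y, forecast.rename("x")], axis=1).dropna()
    model = sm.OLS(data["y"], sm.add_constant(data["x"])).fit()
    return model.rsquared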

Anyway we end up with a collection of SR and the counterpart R squared for the relevant regression. Which we'll plot in a minute, but let's get random data first.


Random data

This is the slightly harder one. To help out, let's think about the regression we're going to end up running:

(price_t+h - price_t)/vol_estimate_t = alpha + beta * (forecast_t)  + epsilon_t 

And let's move some stuff around:

forecast_t = ((1/beta) * (price_t+h - price_t)/vol_estimate_t) + (alpha/beta) + (epsilon_t/beta)

If we assume that alpha is zero, and we're not bothered about arbitrary beta scaling, then we can see that:

forecast_t = ((price_t+h - price_t)/vol_estimate_t) + noise

This means we can do the following:
  • Create a random price series, compounded gaussian random is fine, and scaling doesn't matter
  • Measure its backward looking vol estimate
  • Work out the future risk adjusted price return at any given point for some horizon, h
  • Add noise to it (as a multiple of the gaussian standard deviation)
  • Voila! As the French would say. We have a forecast! (Or nous avons une prévision! As the French would say)
We now have a price, and a forecast. So we can repeat the exercise of measuring a SR and doing a regression from which we get the R squared. And we'll get the behaviour we expect; more noise equals lower SR and a worse R squared. We can run this bad boy many times for different horizons, and also for different levels of noise.
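Here is a minimal version of that recipe; the noise multiple and horizon are the knobs we turn, and the names are illustrative:

import numpy as np
import pandas as pd

def make_price_and_forecast(n_days=2500, horizon=21, noise_multiple=5.0, seed=0):
    rng = np.random.default_rng(seed)
    returns = pd.Series(rng.normal(0, 0.01, n_days))
    price = returns.cumsum()                      # additive rather than compounded; fine for a sketch
    vol = returns.rolling(25).std()               # backward looking vol estimate
    future_risk_adj = (price.shift(-horizon) - price) / vol
    noise = rng.normal(0, noise_multiple * future_risk_adj.std(), n_days)
    forecast = future_risk_adj + noise            # voila, a forecast with tunable skill
    return price, forecast

price, forecast = make_price_and_forecast()       # then measure SR and R squared as before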


Results

Without any further ado, here are some nice pictures. We'll start with the fake data. Each of the points on these graphs is the mean SR and R squared from 500 random price series. The x-axis is a LOG scale for R squared: 10^-2 is 0.01 and so on, you know the drill. The y axis is the SR. No logging. The titles are the forecast horizons in business days, so 5 days is a week, etc etc.

As we're trading quickly, we get pretty decent SR even for R squared that would make you sad. An R squared of 0.01, which sounds rubbish, gives you a SR of around 0.7. 

Here's around a monthly holding period:


Two months:


Three months:


Six months:

And finally, one year:



Right, so what are the conclusions? There is some fun intuition here. We can see that an R squared of 0.01 equates to a SR of 0.1 at an annual holding period, as the theory suggests. It's also clear that an R squared of 0.1, which is very high for financial data, isn't going to help that much if your holding period is a year: your SR will still only be around 0.30. Whereas if you're trading fifty times faster, around once a week, you'll get a SR of around 2.30 with an R squared of 0.1. The ratio between these two numbers (7.6) is almost exactly equal to the square root of fifty (7.1), and this is no accident; our results are in line with the law of active management, which is a nice touch.

Neatly, an R squared of 1 equates exactly to a SR of 1 at a one year holding period.

Now how about some real results. Here we don't know what the forecast horizon is, instead we measure it from the forecast. This does mean we won't have neat graphs for a given horizon, but we can do each graph for a range of horizons. And we don't have to make up the forecast by reversing the regression equation, we just have forecasts already. And the price, well of course we have prices.
Important note! Unlike with fake data where we're unlikely to lose money on average, with real data we can lose money. So we remove all the negative SR before plotting.

Here's for a horizon of about 5 days:

No neat lines here; each scatter point represents an instrument and trading rule (probably mostly fast momentum). Remember this from earlier for the 5 day plot with fake data: "An R squared of 0.01, which sounds rubbish, gives you a SR of around 0.7". You can see that is still true here. And also the general shape is similar to what we'd expect; a gentle upward curve. We just have more really low SR, and (sadly!) fewer higher SR than in the fake data.

About two weeks:

About a month:

About two months:
About three months:
About six months... notice things are getting sparser
And finally about a year:
There is very little to go on here, but an R squared of 0.1 which before gave a SR of 0.3 isn't a million miles away at 0.5. In general I'd say the real results come close to confirming the fake results.


Summary

Both data scientists and neophyte traders alike can use the fake data graphs to get SR without doing a backtest. Do your regression at some forecast horizon for which a fake data graph exists. Don't remove overlapping periods. If the beta is negative then you're losing money. If the beta is positive then you can lookup the SR inferred by the R squared.

You can also use any graph, and then correct the results for LAM. For example, if you want the results for 1 day, then you can use the results for 5 days and multiply the SR by sqrt(5). But you want a closed form solution. So here is one, assuming 256 business days in a year:

The SR for N days holding period is equal to 16 * sqrt(R squared / N)
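Or, as a trivial bit of Python you can paste anywhere:

from math import sqrt

def sharpe_from_r_squared(r_squared: float, holding_period_days: float) -> float:
    # assumes 256 business days in a year, so sqrt(256) = 16
    return 16 * sqrt(r_squared / holding_period_days)

print(sharpe_from_r_squared(0.01, 256))   # annual holding: SR of 0.1
print(sharpe_from_r_squared(0.01, 5))     # weekly holding: SR of roughly 0.7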



Tuesday, 7 October 2025

Is the degradation of trend following performance a cohort effect, instrument decay, or an environmental problem?

It's probably bad luck to say this, but the most recent poor performance of CTAs and trend following managers this year appears to have been reversed. My own system is up over 12% since the nadir of the summer drawdown, and is now up for the year; admittedly only by 5.5%. 

Nevertheless, it's true to say that trend following performance appears to have been degrading over the last few decades. If I can literally talk my own book (the book in question being Advanced Futures Trading Strategies - AFTS), then in one chapter I note:

Having said that, it does look like the returns from strategy nine are falling over time. The inflationary 1970s were particularly strong, with a total non-compounded return of over 500% over the decade (if we had been compounding, our returns would have been even more spectacular). We then made over 200% in each of the next three decades. But since 2010 our average return has roughly halved.

But were the better returns in the 1970's (and to a lesser extent the 80's, 90's and 00's) because we had a better environment for trend following, or because we had better instruments, or because instrument performance decays over time?

Let me explain. Back in 1972 when my backtest begins, there were just a few instruments. In the first few years only 11 instruments were around, out of the 100 or so in my usual list of liquid instruments I use for testing. And they were weird: seven were agricultural commodities, two were currencies and two were metals. Perhaps the better performance of trend following in the 1970's was because the instruments we had then were just better at trend following? Or, perhaps it's just that when an instrument is first traded it does very well, because there aren't many other smart people hanging around to extract 'alpha'?

So, let's see which of these three explanations is most likely.

I'm going to start with the fastest EWMAC 2,8 crossover I use. In AFTS I noted that the two fastest crossovers have suffered particularly bad degeneration in performance since about 1990. These returns are before costs, so the after cost performance would be even worse.

There are quite a few graphs in this format, so let me explain. Each line is a different cohort of instruments. So the blue line, for example, is all the instruments that began trading* in the period 1971-1980 inclusive. On the y-axis is the average SR for those instruments in the five year period beginning at the date on the x-axis. So for example, for the instruments that began trading in the first ten years, their average SR over 1981-1985 was around 0.30; a little higher than the next cohort of instruments that had just come in. 

* Important note. The date an instrument starts trading is the earliest point I have data for it. It may have been trading long before then.
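For the curious, the cohort lines can be built with something like the following sketch. It assumes you already have a DataFrame of per instrument SRs for each five year period (periods as rows, instruments as columns) and a Series giving the first year of data for each instrument; the names are illustrative.

import pandas as pd

def cohort_average_sr(sr_by_period: pd.DataFrame, start_year: pd.Series) -> pd.DataFrame:
    # label each instrument by the decade in which its data starts, eg 1971-1980 -> 1971
    cohort_label = ((start_year - 1) // 10) * 10 + 1
    # average SR across the instruments in each cohort: one column (line on the plot) per cohort
    return sr_by_period.T.groupby(cohort_label).mean().T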

Let's now review three possible explanations for the reduction in trend following p&l over the last fifty years or so, and see which best matches the empirical results.

First of all we have a cohort effect explanation. Instruments which enter the dataset later have worse performance, so they drag down the average performance. This would look something like this:


You can see that the older an instrument is, the better its performance. To combat this effect we could just avoid trading newer instruments.

Next we have an instrument lifetime decay effect. Each instrument does well when it first enters the dataset, but then its performance decays as it ages. As a result, as more instruments become old and fewer are new, the average performance falls. An extreme version of that effect would look something like this (I've drawn the horizontal lines slightly apart for clarity, they should be on top of each other):

To deal with this we'd have to be constantly adding new instruments that haven't yet been affected by the influx of sophisticated traders looking for alpha. This is the opposite of what we'd do with a cohort effect.

Finally we have the general environment effect. All instruments have roughly similar performance in each period, which gets worse over time. This would result in something a bit like this (again I've drawn the lines slightly apart for clarity, they should be on top of each other): 

There is no solution here. We are buggered. We need to get out of the trend following game. Or at least hope that this is just a temporary setback; after all people have tested trend following rules over hundreds of years so a few decades of bad performance is nothing to worry about.

Looking back at the EWMAC 2,8 graph, there is no support for the cohort effect. It looks like there is some evidence of instrument decay, most strikingly for the 1991 cohort. However that cohort only contains 7 instruments, making it one of the smallest, so the significance is questionable. Overall though, this does look like an environment effect with noise. 

Let's step down the speed to EWMAC4,8:

Looks like a very similar picture. How about EWMAC8,16?
Again, apart from the 1991 cohort, this looks pretty much like a gradual decline due to environmental effects. Now let's turn to EWMAC16,64, which readers of AFTS will know is the best performing of my momentum indicators:
Feels like we're watching the same film over and over again, doesn't it?

EWMAC32,128


EWMAC64,256

Not quite as clear, but we're trading very slowly now so would expect more noise.

To summarise then, it looks like the decay of momentum performance has been solely down to general environmental effects, rather than a cohort effect or the decay of instrument performance. This is bad, because it means we can't do anything about it by restricting ourselves to older or newer instruments. This is good, because instrument diversification really is our best chance of being profitable traders and making the most out of a weakening signal. I wouldn't want to suggest that you should do anything different than trading all the instruments you can get your hands on.

And to reiterate, let's hope this is a temporary situation. For me personally, as someone who isn't tied into a CTA box, I'm happy to continue trying to trade as many different risk / return factors as possible.

Bonus postscript: Here's carry!

This does look a little bit more like a decaying instrument story...



Monday, 8 September 2025

PCA analysis of Futures returns for fun and profit, part deux

In my previous post I discussed what would happen if you did the crazy thing of doing a PCA on the whole universe of futures across assets, rather than just within US equities or bonds like The Man would want you to. In this post I explore how we could do something useful with the results. There is some messy code here; to run all of it you'll need pysystemtrade, but you can exploit big chunks of it with your own data even if you don't.


The big problem: sign flipping

Before hitting some p&l generating activity, however, we first need to deal with an outstanding issue from the previous post.

TLDR, most of the time factor one is 'risk on /equities are go' and factor two is global interest rates; although not always. Factor sign flipping was a problem however (thanks to people below the line for that insight). So sometimes factor one was long equities, sometimes short equities. Sometimes it was something else entirely. 

As an example, remember this plot from part one? It's the factor exposure of the S&P 500 over time for factors 1 (labelled 0), 2 (labelled 1) and 3 (labelled 2).

Note there are 'blips' when we have a short exposure to factor 1, mostly in the period since 2008 when we're normally long factor 1. That's clearly a temporary sign flip. We probably want to get rid of those. But there is also the long period in the early 2000's when we're persistently short factor 1. That might be a 'sign flip'; but it could also be that factor one in this period was something more interesting than just 'long equity risk'. 

A couple of ideas spring to mind here. One is just smoothing the factor weights. That would easily solve the blips, and the smooth need only be a few weeks long to get rid of them. But a longer smooth, of the length needed to get rid of the other periods, would reduce the information in the factors; in particular we'd be missing out on interesting times when something other than boring old risk on and off is driving the market.

Another bright idea I had was to reverse the sign on the weights whenever the largest absolute value weight was negative. My expectation was that the largest weight on factor 1 (mostly risk on) would usually be equities, and when that factor flipped sign we'd flip it back again. However that didn't produce the expected results. If I were less lazy (and less eager to get back to writing book #5), I'd probably do some more research; eg I'm pretty sure the answer is somewhere in Gappy's new book, but I haven't got there yet. 

In the end I decided to relax and ignore the sign flipping; I can do this because of the four ideas I outlined:

1- own the factors

2- trade the factors

3- buy assets with persistent alpha (+ve residual) 

4- mean revert the cumulative residual 

.... it's only really 1 and 2 that are affected by sign flipping. And I feel I already have things in my armoury for 1 and 2. For example my aggregate momentum signal (blogged about here, and also in my most recent book AFTS) is basically like 2, and on assets with a long bias that will also give us a chunk of 1 as well. 

<Sidebar * note to browser not actually HTML>

Arguably my relative momentum and long term mean reversion are also a bit like 3 and 4. Yet another idea is to build 'asset classes' using clustering as I did here, and then use those for the purposes of 1,2 and possibly 3 and 4. 

So we have three different ways of forming 'factors': exogenously determine asset classes, PCA, and clustering; and four different ways of trading each of them. Those won't give radically different results since clusters mostly follow asset classes, but they could be a little different.

<\Sidebar * see previous note>

But, I hear you cry, why can you flippantly ignore sign flipping when trading only the residuals? Well it's pretty simple; consider a standard APT type equation with a single PCA k and market i:

r_i,t = a_i + (b_i,k * r_k,t) + e_i,t

If we now do a sign flip, then the beta (b_i,k) will have a minus one in front of it, but the market or PCA return r_k,t will also have a minus one in front of it. These cancel, so the estimation of both the persistent bias (alpha, a_i) and the temporary error (epsilon, e_i,t) will be unaffected. 


Trading the alpha

So we have two basic ideas; we generate our PCA and then run regressions that look like this:

r_i,t = a_i + (b_i,k * r_k,t) + ... + e_i,t

Where there are one or more PCAs k... And then we either buy positive a_i and sell negative; or we sell things with recent cumulative positive e_i.

There are still many design questions to resolve here. How many PCAs do we include? Too few, and we'll probably end up missing something interesting. Too many, and there is a risk we'll end up without clear signals. Over what period should we estimate betas and alphas? Basically, how persistent are they likely to be? Over what period should we cumulate epsilon? Are there periods in which epsilon will be trending rather than mean reverting; eg assets that have outperformed their factor adjusted return will continue to do so (which would look an awful lot like buying positive alpha)?

For the PCA I'm going to keep it simple and initially use three PCAs, which happens to be the most I can plot and get my head around. I'm also going to stick to estimating my alphas and betas over a 12 month period, which is the arbitrary period I used before to estimate the PCA themselves (it seems weird to use a different period). For the question of epsilon decay I will risk the wrath of the overfitting gods and do a time sensitivity analysis.

To summarise then: At the start of each month we look at the 12 months of normalised returns, do a PCA, and then regress each instrument on the returns of each component. We then have an alpha intercept coefficient, and some betas (at most three, one for each PCA). We can see how predictive the alpha is of returns in the following month(s). Then for the following month we can also calculate the residual of performance vs the fitted model. We can cumulate these residuals and see how they forecast performance.
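Here is a sketch of one monthly fitting step, assuming a DataFrame of daily vol normalised returns (instruments as columns) covering the trailing 12 months. sklearn and statsmodels stand in for whatever is in the messy code linked above, and the names are illustrative.

import pandas as pd
import statsmodels.api as sm
from sklearn.decomposition import PCA

def fit_alphas_and_betas(normalised_returns: pd.DataFrame, n_components: int = 3) -> dict:
    clean = normalised_returns.dropna(axis=1)          # keep instruments with a full year of data
    factors = pd.DataFrame(PCA(n_components=n_components).fit_transform(clean),
                           index=clean.index)          # principal component 'returns'
    fitted = {}
    for instrument in clean.columns:
        model = sm.OLS(clean[instrument], sm.add_constant(factors)).fit()
        fitted[instrument] = dict(alpha=model.params["const"],
                                  betas=model.params.drop("const"))
    return fitted
# the alphas become forecasts for the following month; the betas give that month's residuals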


Alpha

Let's start with the alphas. Here be a massive scatter plot:


Each point is the alpha calculated at the start of a given month for an instrument, and the normalised ex-post return for the following month. It looks like there might be a weak positive relationship there, so let's do some stats.

                           OLS Regression Results                            
==============================================================================
Dep. Variable:         ex_post_return   R-squared:                       0.006
Model:                            OLS   Adj. R-squared:                  0.006
Method:                 Least Squares   F-statistic:                     169.0
Date:                Mon, 08 Sep 2025   Prob (F-statistic):           1.62e-38
Time:                        11:46:09   Log-Likelihood:                 1910.6
No. Observations:               26311   AIC:                            -3817.
Df Residuals:                   26309   BIC:                            -3801.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0052      0.001      3.735      0.000       0.002       0.008
alpha          0.3144      0.024     12.999      0.000       0.267       0.362
==============================================================================

There: we have the classic undergrad stats exercise question "Why is my t-stat big but my R squared is low?". Answer: there is something here, but it's weak. This often happens if you use a large dataset (26,000 observations here).

To show this differently, the conditional ex-post average daily return over the following month with a positive alpha is 0.02, and with a negative alpha is -0.01 (both conditional subsets are roughly half the dataset overall). The t-statistic comparing these is a hefty 10, corresponding to a p_value of the order of 10^-26. So again, alpha definitely has an effect, but is that difference really that big? Hard to tell.
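For what it's worth, that conditioning comparison is just a two sample test; a sketch, assuming stacked Series of alphas and following month returns aligned across instruments and months:

from scipy.stats import ttest_ind

def conditional_alpha_test(alphas, ex_post_returns):
    # compare average following-month returns when the fitted alpha was positive vs negative
    positive = ex_post_returns[alphas > 0]
    negative = ex_post_returns[alphas < 0]
    return positive.mean(), negative.mean(), ttest_ind(positive, negative)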

But we know that low R squared are pretty common in finance, so is this a problem? To test this I tried using the alphas as a forecast, and then calculated the Sharpe Ratio of each forecast. The median across all instruments is a SR of 0.1. Remember that trend following gives us around 0.3 to 0.4 for each instrument, so this isn't especially interesting.

It might be that I would get better results from a different lookback to calculate the alphas (remember we use one year); I tried everything from a 1 month lookback up to 10 years. What about using fewer, or more, principal components? Remember we're going with three. It turns out that a one year lookback is pretty optimal compared to shorter or longer, but using one PC is better than using two or more. Still, the very best we can do is a one year lookback with one PC, and that gives us a SR of 0.12, which is hardly wallet busting territory and also not significantly different from the result with three factors.


Trading the residual

Let's turn then to trading the residuals. We're going to cumulate up residuals over various periods and see how well that predicts future returns. To avoid a forward looking forecast, the residuals are calculated on the out of sample month following the point at which the model is fitted. Otherwise the regression coefficients would be forward looking, and hence so would the forecast.

Note that because the model changes slightly each month, the coefficients used to calculate residuals will also change slightly. Such is life. But we'll keep stacking up the residuals month by month even though they are using different models.

We now have 3 knobs to twiddle on our overfitting machine; lookback and number of PCs as before, but also the number of days we sum up residuals. To keep things relatively simple I will initially sum them up using 22 days (about a month of business days). So our base case is:
  • One year lookback to do PCA and calculate coefficients
  • 3 principal components
  • 22 days summing up of residuals; our forecast is minus the summed residuals
And jumping straight to Sharpe Ratio calculation like an impatient toddler, we get a SR of -0.04. The sign is wrong, showing that positive residuals lead to more positive performance, and the effect is also v.v.v. weak.

Does increasing the residual summing period work; i.e. does mean reversion work over longer time periods? Nope. Anything up to a year is actually worse. Going down to a week (which would be v.v.v. costly to trade) does at least push the SR into positive territory, but only just.

Dropping to one PC (which was marginally better than for alpha above), changing the lookback on the PCA, .... nothing produces useful results. This idea is an already dead donkey that has been subsequently thrown off a cliff and then burned**.

** no actual donkeys were harmed in the creation of this blog post

 

Back to alpha

So we had a not too promising individual SR using alpha on an instrument level, but how does that look on a portfolio level? Surprisingly, quite good. Here are the combined results for 100 instruments:


That bad boy has a SR of 0.84! Some of that is diversification, but some of it is because a more accurately calculated median SR per instrument is 0.15, higher than the 0.10 calculated earlier (for many reasons, like buffering and whatnot). Still, that's an extremely high realised diversification multiplier of over 5. Let's compare it to the 'gold standard' of single momentum models, EWMAC16,64 with the same instruments:


OK, not as good; even accounting for the different vol, ewmac comes in with a SR of 1.08. But their correlation is a relatively lowly 0.3, which suggests a modest allocation to alpha persistence will earn some money. Chucking 10% of your forecast weights into alpha persistence bumps up the SR of ewmac16,64 from 1.08 to 1.12. Going to the (arguably in sample fitted) best model with one principal component improves the SR of the alpha model by itself to 1.02, but also increases the correlation with momentum; so the joint SR with 10% in alpha persistence and 90% in ewmac is a pretty much unchanged 1.13.


Summary

Research in systematic trading tends to result in a lot of blind alleys. I thought this would be another one. Certainly the idea of mean reverting the errors, a classic from the equity stat arb crowd, doesn't really work in this context. However there does seem to be some modest performance gain to basic momentum from including a PCA derived alpha persistence model. The gain is small however, so it's debatable whether it's worth what would be quite a lot of additional work. Not a blind alley then, but not a very pleasant one to spend much time in.




Tuesday, 1 July 2025

PCA analysis of Futures returns for fun and profit, part #1

 I know I had said I wouldn't be doing any substantive blog posts because of book writing (which is going well, thanks for asking) but this particular topic has been bugging me for a while. And if you listened to the last episode of Top Traders Unplugged you will hear me mention this in response to a question. So it's an itch I feel I need to scratch. Who knows, it might lead to a profitable trading system.

Having said all that, this post will be quite short as it's really going to be an introduction to a series of posts.


Given factor analysis

So at its heart this is a post about factors. Factors are the source of returns, and of risk. This concept came from the land of equities, specifically the long short factor sorts beloved of Messrs Fama and French; and it also spawned an entire industry: the modern equity market neutral hedge funds (although Alfred Winslow Jones actually implemented the whole hedge fund idea whilst Fama and French were still in high school).

At its core then we have the idea of the APT risk model, which is basically a linear regression:

r_i,t = a_i + B_1,i * r_1,t + ... + B_N,i * r_N,t + e_i,t

Where r_i,t is the return on asset i at time t, a_i is the alpha on asset i (assumed to be zero), B_1,i is the Beta of asset i on the first risk factor, r_1,t is the return of the first risk factor, there are N such factor terms in total, and e_i,t is an error term with mean zero. Strictly speaking the returns on both asset i and the risk factors should be excess returns with the risk free rate deducted, but we're futures traders so that detail can be safely ignored.

In its simplest form, with a single factor that is 'the market', this is basically just the OG CAPM/EMH, and B_1 is just Beta. In a more complex form we can include things like the sorted portfolios of Fama and French. Notice that risk and return are intrinsically linked here. The factor is assumed to be some kind of risk that we get paid for bearing; the B_N term measures our exposure to that risk, and hence how much of the factor's risk premium we collect.

(Should B_N be estimated in a time varying way? Perhaps. Although if you vol normalise everything first, you will find your B_N are much more stable, as well as being more interpretable).
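To make that concrete, here is a minimal sketch of estimating a_i and the Betas for a single asset by OLS, assuming we already have vol normalised daily returns for the asset and the factors (the variable names are hypothetical):

    import numpy as np
    import pandas as pd

    def estimate_betas(asset_rets: pd.Series, factor_rets: pd.DataFrame):
        # OLS of the asset's (vol normalised) returns on the factor returns,
        # with an intercept playing the role of a_i.
        X = np.column_stack([np.ones(len(factor_rets)), factor_rets.values])
        coefs, *_ = np.linalg.lstsq(X, asset_rets.values, rcond=None)
        alpha, betas = coefs[0], coefs[1:]
        # The residuals are the e term; they come in handy later when we
        # talk about mean reverting the residual.
        residuals = pd.Series(asset_rets.values - X @ coefs, index=asset_rets.index)
        return alpha, pd.Series(betas, index=factor_rets.columns), residuals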

Note that for both the market and the Fama French factors (FFF), the factors are given. To be precise, in both cases the factors consist of portfolios of the underlying assets, with some portfolio weights. For the market portfolio, those portfolio weights are (usually) market cap weights. For the FFF they are the +1 for top quartile, -1 for bottom quartile sort of thing. 


What can we do with factors?

Many things! The dual nature of factors as risk and return drivers gives them multiple uses. So for example, we could own the factors. They are just portfolios, and going long if you think the factor will earn you a risk premium is not a bad idea. If you buy an S&P 500 ETF, well congratulations, you have gone long the equity market beta factor. With the ability to go long and short we can own FFF as easily as the market factor. Indeed there are funds that allow you to get exposure to FFF factors or similar, though sometimes only on the long side. 

We could also trade the factors. My own work in my previous book, AFTS, suggests that 70% of the returns of a momentum portfolio come from trading an asset class index. That is an equal vol weighted rather than market cap weighted portfolio, but the overall effect is similar. Trading, i.e. market timing, the FFF or similar is a little more difficult and if you try to do it Cliff Asness will turn up at your house and hit you repeatedly with a stick.

If we treat the factors as risk we don't want, and we don't buy the idea of an efficient market, then we can buy high alpha / sell low alpha. If a stock looks like it has excess return, over and above what the market and FFF say it should have, then maybe it is a good bet? Although financial economists will scoff at you and say you are exposed to a risk that is not in your regression, for which you are earning a risk premium, you can just point to your Porsche and explain in great detail how you don't care.

Perhaps we believe in the efficient market hypothesis in the long term, but not in the short term. We wouldn't trust those alphas to be persistent as far as we could throw them. But if we take the residual term, e, well that will most likely show a lovely mean reverting pattern when cumulated. So we can mean revert the residual. Big upward swings away from efficiency that we can short the asset on, and lovely downward pulls we can go long on.

There are more esoteric things people do with factors, mainly to do with risk management. You can for example use them to construct robust correlation matrices, hedging portfolios and what not. Risk management isn't my principal concern here, but that is still good to know.


PCA factor analysis

This is all lovely, especially in equities, but in futures things are a bit more mysterious. For starters, we can do things at an asset class level (which is closer in spirit to the equity market neutral world, although we're still one level higher, as our components are e.g. equity indices rather than individual equities); but we can also, uniquely, take a 'whole market' view by considering futures as a whole.

We could probably take a stab at creating an 'asset class' factor in each asset class that would be like Beta, and indeed I did that in AFTS with my equal risk weighted index. We know that there are certain bellwether markets like the S&P 500 that we could use as proxies for 'the market' in individual asset classes. 

But for futures as a whole, things are much harder. Is the 'market' really just long everything? Even VIX/VSTOXX where we know the risk premium is on the short side? My gut feeling is that our most important factor will be some kind of risk on/off, but then there will be times like 2022 when it would plausibly have been more inflation related. And what would the second factor be?

So we will switch tactics, and rather than use given factors, we will use discovered factors. The idea here is that the data itself can tell us what the main latent drivers of returns are, if we just look hard enough. Sure, in many cases that will give us the first factor as basically the market portfolio, but the subsequent factors will be more interesting. And in the specific case of futures, where we don't know what the likely factors are, it's going to be quite intriguing.

We use a PCA to discover these factors, with vol normalised returns as the starting point. For each factor we end up with a set of portfolio weights (which can be long or short), which are then helpful for interpreting the factor. Note the weights are applied to vol normalised returns, which makes them more intuitive.
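Here is a minimal sketch of that discovery step using sklearn, assuming vol_norm_rets is a DataFrame of vol normalised daily returns with one column per instrument (illustrative only, not necessarily how my own code does it):

    import pandas as pd
    from sklearn.decomposition import PCA

    def discover_factors(vol_norm_rets: pd.DataFrame, n_components: int = 3):
        # Fit a PCA to vol normalised returns. Each row of `weights` is a
        # discovered factor: a set of long/short portfolio weights on the
        # instruments. Signs are arbitrary, so flip them if interpretation
        # demands it.
        clean = vol_norm_rets.dropna()
        pca = PCA(n_components=n_components).fit(clean.values)
        weights = pd.DataFrame(pca.components_,
                               columns=vol_norm_rets.columns,
                               index=["PCA%d" % (i + 1) for i in range(n_components)])
        return weights, pca.explained_variance_ratio_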


Sidebar: PCA meta factor analysis (on strategy returns)

Just as a brief note, as I don't intend to cover this here, but it was touched on in the podcast. If we started with the returns of trading e.g. momentum on a bunch of instruments, rather than the underlying returns themselves, then that might be useful for someone who was thinking about replicating a hedge fund index or risk managing a CTA, or perhaps constructing a CTA where they have hedged out the principal component(s) of CTA risk. I've written about replication before, and I've already said I'm not really concerned with risk management here, so I won't talk about this again.


Some nice pictures

This won't be a long post, as I said, as I won't be looking at how to use the PCA returns now that I have them. Instead I'm going to focus on visualising the PCAs and interpreting them. Which will be a bit of fun anyway. Methodological points: I used vol normalised returns and one year rolling windows to estimate my PCA. There is a debate to be had as to whether a year is the best compromise between having enough data for a stable result and adapting quickly enough to changing market conditions. 

I estimated at least two, and up to N/2, principal components, where N is the number of markets with data at a given point in time. I used 100 liquid futures markets with daily data going back to 1970 where possible.
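A sketch of that rolling estimation, in the same spirit as the discover_factors function above (the 22 day re-estimation step and the data handling here are illustrative simplifications, not my exact methodology):

    import pandas as pd
    from sklearn.decomposition import PCA

    def rolling_pca(vol_norm_rets: pd.DataFrame, window: int = 256,
                    min_components: int = 2) -> dict:
        # Re-estimate the PCA roughly every month (every 22 business days)
        # on a one year (256 day) rolling window. Returns a dict keyed by
        # estimation date with explained variance ratios and factor weights.
        results = {}
        for date in vol_norm_rets.index[window::22]:
            window_rets = vol_norm_rets.loc[:date].tail(window).dropna(axis=1)
            n_markets = window_rets.shape[1]
            if n_markets < 2 * min_components:
                continue
            n_comp = max(min_components, n_markets // 2)
            pca = PCA(n_components=n_comp).fit(window_rets.values)
            results[date] = dict(
                explained=pca.explained_variance_ratio_,
                weights=pd.DataFrame(pca.components_, columns=window_rets.columns),
            )
        return results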

Let's start with the contribution to variance. This is how important each PCA is.

We can see that the first PCA, for the most recent 12 month window at least, explains 18% of the variance, the second 12.5%, and so on. In contrast if we did this for US equities we'd find the first PCA explained 50%, and for US bonds it would be 70%. There is a lot more going on here.

If we look at how the contributions of the first two factors vary over time:


... we can see that there has been a bit of a downward trend as more factors arrive in the sample, but more generally the first PCA does hover around 20% and the second around 10%. There are exceptions like 2008, where I would imagine a big risk off bet drove the market. The same was true during COVID.

What is the first PCA? Well currently it looks like this:

For clarity I've only included the top 20 and bottom 20 instruments by weight. Still you may be struggling to read the labels. The top markets are pretty much all stocks, with European equities getting a bigger weight. The S&P does just sneak into the top 20. The bottom 20 starts with VIX and VSTOXX, but mostly the weights here are quite small. So the first PCA right now is "Equity Beta, with a tilt towards Europe".

What about the second PCA?

The first 6 positive instruments are all US bonds, and nearly all the rest are government bonds of one flavour or another. Only EU-Utility stocks get to crash this party (interest rate sensitive?). On the short side we have some FX and quite a few energy futures. So this second factor is "Long bonds / Short energies".


PCA 3 is long a whole bunch of FX, which means it's short USD, and also some metals and random commodities. On the short side it's short EU-Health equities, CNHUSD FX and a whole bunch of European bonds. Feels a bit trade related. Shall we call this the Trump factor?

Anyway I could continue, but more intuitive would be to understand how these factors have changed over time. We'll pick some key markets with lots of history. We will then plot the weight each has in a given PCA over time. 
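One way to build those weight histories from the rolling PCA results above (the sign alignment convention here is just one possibility, and may well differ from what I actually used for the plots):

    import pandas as pd

    def weight_history(results: dict, instrument: str, component: int = 0) -> pd.Series:
        # Extract one instrument's weight in a given PC across estimation
        # dates. PCA signs are arbitrary from window to window, so flip each
        # window's weights to line up with the previous window's weights.
        history = {}
        prev = None
        for date in sorted(results):
            w_df = results[date]["weights"]
            if component >= len(w_df):
                continue
            weights = w_df.iloc[component]
            if prev is not None:
                common = weights.index.intersection(prev.index)
                if len(common) and (weights[common] * prev[common]).sum() < 0:
                    weights = -weights
            prev = weights
            if instrument in weights.index:
                history[date] = weights[instrument]
        return pd.Series(history)

    # e.g. weight_history(results, "SP500", component=0).plot()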

Here is the S&P 500:

We can see that it is mostly positive on PCA1 and negative on PCA2 and PCA3, but there are periods when that is not the case. The sharp drops in weighting suggest that perhaps we ought to run with something longer than a year, or use an EWMA of the weights to smooth things out.

Here is US10 year:


Again, this mostly loads positively on PCA2, but not always. You can see the increase in the correlation of bonds and equities happening as the PCA1 loading creeps up in the last few years.

Here is the first PCA weighting in June 2004, one of those interesting periods.


You can see that it was all about currencies in that period; plus silver and gold, various other metals, bonds and energies. So very much a short Dollar, long metals trade.

We're nearly done for today. Last job is to plot the factors. Here are the cumulated returns for PCA1:



That looks a lot like a vol normalised equity market; note the drops in 2009 and 2020.
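For reference, one way to construct those cumulated factor returns out of sample from the rolling results above (again a sketch; among other things it ignores the window to window sign flipping issue mentioned earlier):

    import pandas as pd

    def factor_returns(results: dict, vol_norm_rets: pd.DataFrame,
                       component: int = 0) -> pd.Series:
        # Apply each window's weights to the returns *after* that window's
        # estimation date, up to the next estimation date, then stitch the
        # pieces together. Cumulate with .cumsum() to get plots like these.
        dates = sorted(results)
        pieces = []
        for start, end in zip(dates, dates[1:] + [vol_norm_rets.index[-1]]):
            weights = results[start]["weights"].iloc[component]
            rets = vol_norm_rets.loc[start:end, weights.index].iloc[1:]
            pieces.append(rets.fillna(0.0) @ weights)
        return pd.concat(pieces)

    # cumulated_pca1 = factor_returns(results, vol_norm_rets, component=0).cumsum()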

And here is PCA2:

Again that could plausibly be bonds, mostly up with the exception of the post 2022 period.

This suggests another research idea, which is to use the S&P 500 and US interest rates as 'given factors', which might be more stable than using PCA. Still, that would mean missing out on times like 2004 when other things were driving the market. 


What's next

The next step will be to look at some of those opportunities for factor use and misuse outlined above, and see if there is profit as well as fun in this game!