Friday, 1 October 2021

Mr Greedy and the Tale of the Minimum Tracking Error Variance [optimising portfolios for small accounts dynamic optimisation testing / yet another method / IT WORKS!]

 This is the sixth (!) post in a (loosely defined) series about finding the best way to trade futures with a relatively small account size.

  • This first (old) post, which wasn't consciously part of a series, uses an 'ugly hack': a non-linear rescaling of forecasts such that we only take positions for relatively large forecast size. This is the method I was using for many years.
  • These two posts (first and second) discuss an idea for dynamic optimisation of positions. Which doesn't work.
  • This post discusses a static optimisation method for selecting the best set of instruments to include in a system given limited capital. This works! This is the method I've been using more recently.
  • Finally, in the last post I go back to dynamic optimisation, but this time using a simpler heuristic method. Again, this didn't work very well.
(Incidentally, if you're new to this problem, it's probably worth reading the first post on dynamic optimisation in this series.)

That would be the end of the story, apart from the fact that I got a mysterious twitter reply:


After some back and forth I finally got round to chatting to Doug in mid September. Afterwards, Doug sent me over some code in R, a language I haven't used for 14 years. I did manage to translate it into Python, integrated it into https://github.com/robcarver17/pysystemtrade, and added in some variations of my own (more realistic costs, and some constraints for use in live trading). Then I tested it. And blimey, it actually bloody worked.


What was Doug's brilliant idea?


Doug's idea had two main insights:
  • It's far more stable to minimise the variance of the tracking error portfolio, rather than using my original idea (maximising the utility of the target portfolio, having extracted the expected return from the optimal portfolio). And indeed this is a well known technique in quant finance (people who run index funds are forever minimising tracking error variance).
  • A grid search is unnecessary given that in portfolio optimisation we usually have a lot of very similar portfolios, all of which are equally good, and finding the global optimum doesn't usually give us much value. So using a greedy algorithm is a sufficiently good search method, and also a lot faster as it doesn't require exponentially more time as we add assets.
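
To put some rough numbers on the second point, here is a back-of-envelope sketch. The function names are hypothetical, and the figures of 5 candidate position levels per instrument and 20 greedy iterations are purely illustrative assumptions, not values from my backtest.

```python
def grid_search_evaluations(n_assets: int, levels_per_asset: int) -> int:
    """Candidate portfolios an exhaustive grid search has to evaluate."""
    return levels_per_asset ** n_assets


def greedy_evaluations(n_assets: int, iterations: int) -> int:
    """Each greedy iteration tries one extra contract in each asset in turn."""
    return n_assets * iterations


for n_assets in (5, 10, 45):
    print(
        n_assets,
        grid_search_evaluations(n_assets, levels_per_asset=5),
        greedy_evaluations(n_assets, iterations=20),
    )
```

The grid search count grows exponentially with the number of instruments, whilst the greedy count only grows linearly.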

Mr Greedy
https://mrmen.fandom.com/wiki/Mr._Greedy?file=Mr_greedy_1A.png



Here's the core code (well my version of Doug's R code to be precise):

from copy import copy

import numpy as np


def greedy_algo_across_integer_values(
    obj_instance: objectiveFunctionForGreedy
) -> np.array:

    ## Starting weights
    ## These will either be all zero, or in the presence of constraints will include the minima
    weight_start = obj_instance.starting_weights_as_np()

    ## Evaluate the starting weights. We're minimising here
    best_value = obj_instance.evaluate(weight_start)
    best_solution = weight_start

    while True:
        new_best_value, new_solution = _find_possible_new_best(
            best_solution=best_solution,
            best_value=best_value,
            obj_instance=obj_instance,
        )

        if new_best_value < best_value:
            # reached a new optimum (we're minimising remember)
            best_value = new_best_value
            best_solution = new_solution
        else:
            # we can't do any better (or we're in a local minimum, but such is greedy algorithm life)
            break

    return best_solution


def _find_possible_new_best(
    best_solution: np.array,
    best_value: float,
    obj_instance: objectiveFunctionForGreedy,
) -> tuple:

    new_best_value = best_value
    new_solution = best_solution

    per_contract_value = obj_instance.per_contract_value_as_np
    direction = obj_instance.direction_as_np

    count_assets = len(best_solution)
    for i in range(count_assets):
        # Try adding one contract's worth of weight to instrument i
        temp_step = copy(best_solution)
        temp_step[i] = temp_step[i] + per_contract_value[i] * direction[i]

        temp_objective_value = obj_instance.evaluate(temp_step)
        if temp_objective_value < new_best_value:
            # Keep the single best one-contract move found so far
            new_best_value = temp_objective_value
            new_solution = temp_step

    return new_best_value, new_solution

Hopefully that's pretty clear and obvious. A couple of important notes:

  • The direction will always be the sign of the optimal position. So we'd normally start at zero (weight_start), and then get gradually longer (if the optimal position is positive), or gradually shorter (if the optimal position is negative). This means we're only ever moving in one direction, which is what makes the greedy algorithm work. Note: this is different in certain corner cases in the presence of constraints. See the end of the post.
  • We move in steps of per_contract_value. Since everything is being done in portfolio weights space (eg 150% means the notional value of our position is equal to 1.5 times our capital: see the first post for more detail), not contract space, these won't be integers; and the per_contract_value will be different for each instrument we're trading.
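
As a rough illustration of those two inputs, here is a minimal sketch. The notional values and optimal weights are made up, and I'm assuming the per contract weight is simply the notional exposure of one contract divided by capital.

```python
import numpy as np

capital = 100_000.0  # account size (illustrative)

# Hypothetical notional exposure of one contract in each instrument,
# expressed in the account currency
notional_per_contract = np.array([132_000.0, 24_000.0, 18_000.0])

# Hypothetical unrounded optimal weights, as a fraction of capital
weights_optimal = np.array([-3.08, -0.14, 0.0])

# Portfolio weight added (or removed) by trading a single contract
per_contract_value = notional_per_contract / capital

# +1 if we want to be long, -1 if short, 0 if we want no position
direction = np.sign(weights_optimal)

print(per_contract_value)
print(direction)
```

Each greedy step then moves one instrument by per_contract_value * direction.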

Let's have a little look at the objective function (the interesting bit, not the boilerplate). Here 'weights_optimal_as_np' are the fractional weights we'd want to take if we could trade fractional contracts:


class objectiveFunctionForGreedy:
    ...

    def evaluate(self, weights: np.array) -> float:
        # Tracking error portfolio: gap between candidate and optimal weights
        solution_gap = weights - self.weights_optimal_as_np
        # Standard deviation of the tracking error portfolio
        track_error = (
            solution_gap.dot(self.covariance_matrix_as_np).dot(solution_gap)
        ) ** 0.5

        trade_costs = self.calculate_costs(weights)

        return track_error + trade_costs

    def calculate_costs(self, weights: np.array) -> float:
        if self.no_prior_weights_provided:
            return 0.0
        # Cost of trading away from the previous weights, scaled by the shadow cost
        trade_gap = weights - self.weights_prior_as_np
        costs_per_trade = self.costs_as_np
        trade_costs = sum(abs(costs_per_trade * trade_gap * self.trade_shadow_cost))

        return trade_costs

The tracking error portfolio is just the portfolio whose weights are the gap between our current weights and the optimal unrounded weights, and what we are trying to minimise is the standard deviation of that portfolio.

The covariance matrix used to calculate the standard deviation is the one for instrument returns (not trading subsystem returns); if you've followed the story you will recall that I spent some time grappling with this decision before and I see no reason to change my mind.  For this particular methodology the use of instrument returns is a complete no-brainer.

The shadow cost is required because portfolio standard deviation and trading costs are in completely different units, so we can't just add them together without a scaling factor. It defaults to 10 in my code: some experimentation reveals that roughly this value gives the same turnover as the original system before the optimisation occurs. (As you'll see later, I haven't optimised this for performance.)
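
To see the search and the objective working together, here is a self-contained toy version with two instruments. It is only a sketch under stated assumptions, not the pysystemtrade code: ToyObjective and toy_greedy are hypothetical names, trading costs are left out, and the covariance matrix, optimal weights and per contract weights are made up.

```python
from copy import copy

import numpy as np


class ToyObjective:
    """Toy stand-in for the objective function (no costs, no constraints)."""

    def __init__(self, weights_optimal, covariance, per_contract_value):
        self.weights_optimal = np.asarray(weights_optimal, dtype=float)
        self.covariance = np.asarray(covariance, dtype=float)
        self.per_contract_value = np.asarray(per_contract_value, dtype=float)
        self.direction = np.sign(self.weights_optimal)

    def evaluate(self, weights):
        # Standard deviation of the tracking error portfolio
        gap = weights - self.weights_optimal
        return float(gap.dot(self.covariance).dot(gap) ** 0.5)


def toy_greedy(obj):
    best_solution = np.zeros(len(obj.weights_optimal))
    best_value = obj.evaluate(best_solution)
    while True:
        new_value, new_solution = best_value, best_solution
        for i in range(len(best_solution)):
            # Try one more contract in instrument i
            step = copy(best_solution)
            step[i] += obj.per_contract_value[i] * obj.direction[i]
            value = obj.evaluate(step)
            if value < new_value:
                new_value, new_solution = value, step
        if new_value < best_value:
            best_value, best_solution = new_value, new_solution
        else:
            return best_solution


# Two instruments with 10% volatility each and a correlation of 0.9
covariance = np.array([[0.01, 0.009], [0.009, 0.01]])
obj = ToyObjective(
    weights_optimal=[0.5, 0.5],      # we'd like 50% in each
    covariance=covariance,
    per_contract_value=[2.0, 0.25],  # the first contract is far too big for us
)
result = toy_greedy(obj)
print(result)
```

In this made-up case the first instrument is too lumpy to hold at all, so the second, highly correlated instrument ends up with a weight of 1.0 and carries the exposure for both.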


Performance


And let's have a look at some results (45 instruments, $100K in capital; I have tried this with other systems to the same effect). All systems are using the same basic 3 momentum crossover+carry rules I introduce in chapter 15 of my first book.



So if we were able to take fractional positions (which we're not!) we could make the green line (which has a Sharpe Ratio of 1.17). But we can't take fractional positions! (sad face and a Sharpe Ratio of just 1.06). But if we run the optimiser, we can achieve the blue line, even without requiring fractional positions. Which has a Sharpe Ratio of .... drumroll... 1.19. 

Costs? About 0.13 SR units for all three versions.

Tiny differences in Sharpe Ratio aside, the optimisation process does a great job in getting pretty close to the raw unrounded portfolio (correlation of returns 0.94). OK, the rounded portfolio is even more correlated (0.97), but I think that's a price worth paying.

The cost figure is a little higher than what you will have seen in previous backtests. The reason is I'm now including holding costs in my calculations. I plan to exclude some instruments from trading whose holding costs are a little high, which will bring the figure down, but for now I've left them in.

If I do a t-test comparing the returns of the three account curves I find that the optimised version is indistinguishable from the unrounded fractional position version. And I also find that the optimised version is significantly better than the rounded positions: a T-statistic of 3.7 with a tiny p-value. Since these are the versions we can actually run in practice, that is a stonking win for the optimisation.
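
For anyone wanting to run this sort of comparison on their own account curves, here is a minimal sketch. It assumes a paired test on the daily return differences, which is only one of several possible t-tests; the function name is hypothetical and the four returns in the example are made up.

```python
import numpy as np


def paired_t_statistic(returns_a, returns_b):
    """t-statistic for the mean of the daily return differences (a - b)."""
    diff = np.asarray(returns_a, dtype=float) - np.asarray(returns_b, dtype=float)
    standard_error = diff.std(ddof=1) / np.sqrt(len(diff))
    return float(diff.mean() / standard_error)


# Made-up daily returns for two account curves
optimised_returns = [0.01, 0.02, 0.03, 0.02]
rounded_returns = [0.00, 0.01, 0.01, 0.02]

t_stat = paired_t_statistic(optimised_returns, rounded_returns)
print(round(t_stat, 3))
```

With real account curves you would pass in the two aligned daily return series.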


Risk


The interesting thing about the greedy algorithm is that it gradually adds risk until it finds the optimal position, whilst trying to reduce the variance of the delta portfolio, so it should hopefully target similar risk. Here is the expected (ex-ante) risk for all three systems:

Red line: original system with rounded positions, Green line: original system with unrounded positions, Blue line: optimised system with integer positions


It's quite hard to see what's going on there, so let's zoom into more recent data:


You can see that the optimiser does a superb job of targeting the required risk, compared to the systematically under-risked (in this period - sometimes it will be over-risked) and lumpy risk presented by the rounded positions. The ex-post risk over the entire dataset comes in at 22.4% (unrounded), 20.9% (rounded) and 21.6% (optimised); versus a target of 20%. 
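
The expected risk of a set of portfolio weights is just the standard deviation implied by the covariance matrix. A minimal sketch, assuming an annualised covariance matrix of instrument returns; the function name and the numbers are made up for illustration:

```python
import numpy as np


def expected_annualised_risk(weights, covariance_annual):
    """Ex-ante standard deviation of a portfolio with the given weights."""
    weights = np.asarray(weights, dtype=float)
    covariance_annual = np.asarray(covariance_annual, dtype=float)
    return float(np.sqrt(weights.dot(covariance_annual).dot(weights)))


# Two uncorrelated instruments with annual volatility of 10% and 20%
covariance_annual = np.array([[0.01, 0.0], [0.0, 0.04]])
risk = expected_annualised_risk([1.0, 0.5], covariance_annual)
print(round(risk, 4))
```

Running this on each day's weights for the three systems gives the kind of series plotted above.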


How many positions do we typically take?


An interesting question that Doug asked me is how many positions the optimiser typically takes.

"Curious how many positions it uses of the 45??

I guess I don’t know how much capital you are running, but given there are really only maybe 10 independent bets there (or not many more) does it find a parsimonious answer that you might use even if you had a giant portfolio to run?"

Remember we have 45 markets we could choose from in this setup. How many do we actually use?

Blue line: Total number of instruments with data. Orange line: Instruments with positions from optimiser


The answer is, not many! The average is 6, and the maximum is 18; mostly it's less than 12. And (apart from at the very beginning) the number of instruments chosen hasn't increased as we add more possible instruments to our portfolio. Of course we couldn't really run our system with just 12 instruments, since the 12 instruments we're using vary from period to period. But as Doug notes (in an email):

"Pretty cool. The dimension of the returns just isn’t all that high even though there are so many things moving around." 

Interestingly, here are some statistics showing the % of time any given instrument has a position on. I've done this over the last 7 years, as otherwise it would be biased towards instruments for which we have far more data:

PALLAD          0.00
COPPER          0.00
GASOILINE       0.00
FEEDCOW         0.00
PLAT            0.00
HEATOIL         0.00
GBP             0.01
REDWHEAT        0.02
WHEAT           0.02
EUR             0.02
NZD             0.03
CRUDE_W_mini    0.03
US20            0.03
US10            0.03
SOYOIL          0.04
LEANHOG         0.04
AUD             0.04
LIVECOW         0.05
JPY             0.05
SOYMEAL         0.06
CAC             0.07
SOYBEAN         0.08
OATIES          0.08
OAT             0.09
RICE            0.10
US5             0.11
MXP             0.12
CORN            0.13
BTP             0.14
SMI             0.14
AEX             0.14
GOLD_micro      0.19
EUROSTX         0.23
KR10            0.24
BITCOIN         0.26
EU-DIV30        0.33
BUND            0.39
EDOLLAR         0.39
BOBL            0.43
NASDAQ_micro    0.47
VIX             0.55
SP500_micro     0.60
KR3             0.66
US2             0.73
SHATZ           0.74


Notice we end up taking positions in all but six instruments. And even if we never take a position in those instruments, their signals are still contributing to the information we have about other markets. Remember from the previous posts, I may want to include instruments in the optimisation that are too illiquid or expensive to trade, and then subsequently not take positions in them.
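
Both statistics (instruments held per day, and the fraction of time each instrument has a position on) are easy to get from a table of positions. A minimal sketch with a made-up positions array, where rows are days and columns are instruments:

```python
import numpy as np

# Hypothetical optimised positions in contracts: 4 days, 3 instruments
positions = np.array(
    [
        [0, 2, 0],
        [1, 2, 0],
        [0, 1, 0],
        [0, 0, 0],
    ]
)

has_position = positions != 0

# How many instruments have a position on each day
instruments_held_per_day = has_position.sum(axis=1)

# Fraction of days on which each instrument has a position
fraction_of_time_held = has_position.mean(axis=0)

print(instruments_held_per_day)
print(fraction_of_time_held)
```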

I've highlighted in bold the 16 instruments we trade the most. You might want to take the approach of only trading these instruments: effectively ignoring the dynamic nature of the optimisation and saying 'This is a portfolio that mostly reproduces the exposure I want'. 

However notice that they're all financial instruments (Gold and Bitcoin, quasi financial), reflecting that the big trends of the last 7 years have all been financial. So we'd probably want to go further back. Here are the instruments with the most positions over all the data:

PLAT            0.24
RICE            0.25
CRUDE_W_mini    0.26
NASDAQ_micro    0.27
SOYMEAL         0.33
US2             0.35
US5             0.38
WHEAT           0.39
LIVECOW         0.40
LEANHOG         0.41
CORN            0.43
SOYOIL          0.44
EDOLLAR         0.46
SP500_micro     0.56
GOLD_micro      0.56
OATIES          0.61

That's a much more diversified set of instruments. But I still don't think this is the way forward.


Tracking error


I like to think of tracking error as quantified regret. Regret that you're missing out on trends in markets you don't have a full size position in...

What does the tracking error look like? Here are the differences in daily returns between the unrounded and optimised portfolio:

The standard deviation is 0.486%. It also looks like the tracking error has grown a little over time, but 2020 aside it has been fairly steady for a number of years. My hunch is that as we get more markets in the dataset it becomes more likely that we'll have fractional positions in a lot of markets that the optimiser can't match very well. And indeed, if I plot the tracking error of rounded versus unrounded portfolios, it shows a similar pattern. The standard deviation for that tracking error is also virtually identical: 0.494%. 
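
A minimal sketch of how these two summary figures (the standard deviation of the return differences, and the correlation of returns) can be calculated from a pair of daily return series. The function name is hypothetical and the returns are made up:

```python
import numpy as np


def tracking_error_stats(returns_target, returns_actual):
    """Std dev of daily return differences, and correlation of the returns."""
    target = np.asarray(returns_target, dtype=float)
    actual = np.asarray(returns_actual, dtype=float)
    difference = actual - target
    return {
        "tracking_error_std": float(difference.std(ddof=1)),
        "correlation": float(np.corrcoef(target, actual)[0, 1]),
    }


# Made-up daily returns for the unrounded and optimised portfolios
unrounded = [0.01, -0.02, 0.03, 0.00]
optimised = [0.02, -0.02, 0.02, 0.00]

stats = tracking_error_stats(unrounded, optimised)
print(stats)
```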



Checking the cost effects


It's worth checking to see what effect the cost penalty is having. I had a quick look at Eurodollar, since we know from the above analysis that it's a market we normally have a position on. Zooming in to recent history to make things clearer:




The green line is the unrounded position we'd ideally want to have on, whereas the red line shows our simple rounded (and buffered) position. The blue line shows what the optimiser would do without a cost penalty. It trades - a lot! And the orange line shows our optimised position. It's trading a bit more than the red line, and interestingly often has a larger position on (where it's probably taking on more of the load of being long fixed income from other instruments), but it's definitely trading a lot less than the blue line.

Interestingly the addition of the cost penalty doesn't reduce backtested costs much, and reduces net performance a little: about 1 SR point, but I'd still rather have the penalty thanks very much. 



Much lower capital

To see how robust this approach is, let's repeat some of the analysis above with just $50K in capital.




So the optimised version is still better than the rounded (SR improvement around 0.1), but nowhere near as good as the unrounded (SR penalty around 0.2). We can't work miracles! With 50K we just don't have enough capital to accurately reflect the exposures we'd like to take in 45 markets. The tracking error vs the unrounded portfolio is 0.72% for the optimiser (versus 0.49% earlier with 100K), but is even higher for the rounded portfolio (0.76%). The correlation between the optimised and ideal unrounded returns has dropped to 0.86 (0.94 with 100K); but for rounded positions it is even lower: 0.84 (0.97 with 100K).

Less capital makes it harder to match the unrounded position of a large portfolio of instruments, but relatively speaking the dynamic optimisation is still the superior method.


What does this mean?

Let's just review what our options are if we have limited capital, of say $100K:
  • We could win the lottery and trade the unrounded positions.
  • We could try and trade a lot of instruments - say 45 - and just let our positions be rounded. This gives us a SR penalty of around 0.1 SR versus what we could do with unrounded positions. The penalty would be larger (in expectation) with less capital and / or more instruments (eg it's 0.3SR with 50k). The tracking error would also get larger for smaller capital, relative to the number of instruments.
  • We could try and choose a set of static instruments and just trade those. In this post I showed that we could probably choose 16 instruments using a systematic approach. This would also give us a SR penalty of around 0.1 SR in expectation, but the tracking error would be larger than for rounded positions. Again with less capital / more instruments both penalties and tracking error would be larger.
  • We could use the 'principal components' approach, to choose a static list of the 16 instruments that are normally selected by the optimiser. I've highlighted these in the list of instruments above. Our tracking error would be a little smaller (in expectation) than for rounded positions, but we'd still have a SR penalty of around 0.1 SR.
  • We could have as many instruments as we liked in our portfolio and use the dynamic optimisation approach to decide which of those to hold positions for today. Normally this means we'll only have positions in around 10 instruments or so, but the 10 involved will change from day to day. Our tracking error would be similar to that for rounded positions, but we'd not be giving up much in terms of SR (if anything). With smaller capital or more instruments we'd get some SR penalty (but less than the alternatives), and higher tracking error (but again better than the alternatives).
Ignoring the first option, it strikes me that dynamic optimisation brings significant benefits, which for me overcome the additional complexity it introduces into our trading.


Live trading


If you're clever, you will have noticed that the algo code above doesn't include any provision for some of the features I specified in my initial post on this subject:

  • A 'reduce only' constraint, so I can gradually remove instruments which no longer meet liquidity and cost requirements
  • A 'do not trade' constraint, if for some reason 
  • Maximum position constraints (which could be for any reason really) 
The pysystemtrade version of the code here covers these possibilities. It adjusts the starting weights and direction depending on the constraints above, and also introduces minima and maxima into the optimisation (and prevents the greedy algorithm from adjusting weights any further once they've hit those). It's a bit complicated because there are quite a few corner cases to deal with, but hopefully it makes sense.
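
To give a flavour of what the minima and maxima might look like, here is a simplified sketch for a single instrument. This is my own illustration rather than the pysystemtrade logic: weight_bounds and its arguments are hypothetical names, and it ignores most of the corner cases mentioned above.

```python
def weight_bounds(
    previous: float,
    max_abs_weight: float = float("inf"),
    no_trade: bool = False,
    reduce_only: bool = False,
) -> tuple:
    """Return (minimum, maximum) allowed portfolio weight for one instrument."""
    if no_trade:
        # We're stuck with whatever we held yesterday
        return previous, previous

    lower, upper = -max_abs_weight, max_abs_weight
    if reduce_only:
        # Only allowed to move between the previous weight and zero
        lower = max(lower, min(previous, 0.0))
        upper = min(upper, max(previous, 0.0))

    return lower, upper


print(weight_bounds(2.47, max_abs_weight=1.23))  # position limit
print(weight_bounds(-1.59, no_trade=True))       # can't trade
print(weight_bounds(0.0, reduce_only=True))      # flat and reduce only
```

Notice that a reduce only instrument with no existing position ends up with a minimum and maximum of zero, which is the same as not being able to trade it.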

Note: I could use a more exhaustive grid search for live trading, which only optimises once a day, but I wouldn't be able to backtest it with 100+ instruments so I'll stick with the greedy algorithm, which also has the virtue of being a very robust and stable process and avoids duplicating code.

Let's have a play with this code and see how well it works. Here's what the optimised positions are for a particular day in the data. In the first column is the portfolio weight per contract. The previous day's portfolio weights are shown in the next column. The third column shows the optimal portfolio weights we'd have in the absence of rounding. The optimised positions are in the final column. I've sorted by optimal position, and removed some very small weights for clarity:

              per contract  previous  optimal  optimised
SHATZ                 1.32    -10.56    -3.08      -5.27
BOBL                  1.59     -1.59    -1.09       0.00
OAT                   1.97      0.00    -0.31       0.00
VIX                   0.24      0.00    -0.14      -0.24
EUR                   1.47      0.00    -0.13       0.00
... snip ...
GOLD_micro            0.18      0.00    -0.01      -0.18
... snip ...
AEX                   1.84      1.86     0.12       0.00
EUROSTX               0.48      0.00     0.13       0.48
EU-DIV30              0.22      0.22     0.14       0.00
MXP                   0.25      0.00     0.16       0.00
US10                  1.33      1.33     0.17       1.33
SMI                   1.27      0.00     0.19       0.00
SP500_micro           0.22      0.00     0.21       0.00
US20                  1.64      0.00     0.23       0.00
NASDAQ_micro          0.30      0.31     0.26       0.30
US5                   1.23      2.47     0.33       2.47
KR10                  1.07      0.00     0.41       0.00
BUND                  2.01      0.00     0.91       0.00
EDOLLAR               2.47      0.00     1.44       0.00
KR3                   0.94      3.77     2.96       3.77
US2                   2.20      8.81    17.44       8.81
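
Since the optimised weights are always a whole number of contracts, dividing them by the weight per contract recovers the position in contracts. A quick sketch using a few rows from the table above (the figures are only shown to two decimal places, hence the rounding):

```python
# Weight per contract and optimised weight, copied from the table above
per_contract = {"SHATZ": 1.32, "US5": 1.23, "KR3": 0.94, "US2": 2.20}
optimised = {"SHATZ": -5.27, "US5": 2.47, "KR3": 3.77, "US2": 8.81}

contracts = {
    instrument: round(optimised[instrument] / per_contract[instrument])
    for instrument in per_contract
}
print(contracts)
```

So the optimiser is short four Schatz contracts, and long two US 5 year, four Korean 3 year and four US 2 year contracts.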

Now let's suppose we could only take a single contract position in US 5 year bonds, which is a maximum portfolio weight of 1.23:

              weight per contract  previous  optimal  optimised  with max position
SHATZ                        1.32    -10.56    -3.08      -5.27              -2.63
BOBL                         1.59     -1.59    -1.09       0.00               0.00
OAT                          1.97      0.00    -0.31       0.00               0.00
VIX                          0.24      0.00    -0.14      -0.24              -0.24
... snip ...
GOLD_micro                   0.18      0.00    -0.01      -0.18               0.00
... snip ...
AEX                          1.84      1.86     0.12       0.00               0.00
EUROSTX                      0.48      0.00     0.13       0.48               0.48
EU-DIV30                     0.22      0.22     0.14       0.00               0.00
MXP                          0.25      0.00     0.16       0.00               0.00
US10                         1.33      1.33     0.17       1.33               1.33
SMI                          1.27      0.00     0.19       0.00               0.00
SP500_micro                  0.22      0.00     0.21       0.00               0.00
US20                         1.64      0.00     0.23       0.00               0.00
NASDAQ_micro                 0.30      0.31     0.26       0.30               0.30
US5                          1.23      2.47     0.33       2.47               1.23
KR10                         1.07      0.00     0.41       0.00               0.00
BUND                         2.01      0.00     0.91       0.00               0.00
EDOLLAR                      2.47      0.00     1.44       0.00               0.00
KR3                          0.94      3.77     2.96       3.77               3.77
US2                          2.20      8.81    17.44       8.81               8.81
That works. Notice that to compensate we reduce our short in two correlated markets (German 2 year bonds and Gold, both of which have correlations above 0.55); for some reason this is a better option than increasing our long position elsewhere.

Now suppose we can't currently trade German 5 year bonds, Bobls, (but we remove the position constraint):

              weight per contract  previous  optimal  optimised  with no trade
SHATZ                        1.32    -10.56    -3.08      -5.27          -2.63
BOBL                         1.59     -1.59    -1.09       0.00          -1.59
OAT                          1.97      0.00    -0.31       0.00           0.00
VIX                          0.24      0.00    -0.14      -0.24          -0.24
... snip ...
GOLD_micro                   0.18      0.00    -0.01      -0.18          -0.18
... snip ...
AEX                          1.84      1.86     0.12       0.00           0.00
EUROSTX                      0.48      0.00     0.13       0.48           0.48
EU-DIV30                     0.22      0.22     0.14       0.00           0.00
MXP                          0.25      0.00     0.16       0.00           0.00
US10                         1.33      1.33     0.17       1.33           2.66
SMI                          1.27      0.00     0.19       0.00           0.00
SP500_micro                  0.22      0.00     0.21       0.00           0.00
US20                         1.64      0.00     0.23       0.00           0.00
NASDAQ_micro                 0.30      0.31     0.26       0.30           0.30
US5                          1.23      2.47     0.33       2.47           1.23
KR10                         1.07      0.00     0.41       0.00           0.00
BUND                         2.01      0.00     0.91       0.00           0.00
EDOLLAR                      2.47      0.00     1.44       0.00           0.00
KR3                          0.94      3.77     2.96       3.77           3.77
US2                          2.20      8.81    17.44       8.81           8.81
Our position in Bobl remains the same, and to compensate for the extra short we go less short 2 year Shatz, longer 10 year German bonds (Bunds), and there is also some action in US 5 year and 10 year bonds.

Finally consider a case when we can only reduce our position. There are a limited number of markets where this will do anything in this example, so let's do it with Gold and Eurostoxx (which had zero position the previous day, so this will be equivalent to not trading):

              weight per contract  previous  optimal  optimised  reduce only
SHATZ                        1.32    -10.56    -3.08      -5.27        -5.27
BOBL                         1.59     -1.59    -1.09       0.00         0.00
OAT                          1.97      0.00    -0.31       0.00         0.00
VIX                          0.24      0.00    -0.14      -0.24        -0.24
EUR                          1.47      0.00    -0.13       0.00         0.00
... snip ...
GOLD_micro                   0.18      0.00    -0.01      -0.18         0.00
... snip ...
EUROSTX                      0.48      0.00     0.13       0.48         0.00
EU-DIV30                     0.22      0.22     0.14       0.00         0.22
MXP                          0.25      0.00     0.16       0.00         0.00
US10                         1.33      1.33     0.17       1.33         1.33
SMI                          1.27      0.00     0.19       0.00         0.00
SP500_micro                  0.22      0.00     0.21       0.00         0.22
US20                         1.64      0.00     0.23       0.00         0.00
NASDAQ_micro                 0.30      0.31     0.26       0.30         0.30
US5                          1.23      2.47     0.33       2.47         1.23
KR10                         1.07      0.00     0.41       0.00         0.00
BUND                         2.01      0.00     0.91       0.00         0.00
EDOLLAR                      2.47      0.00     1.44       0.00         0.00
KR3                          0.94      3.77     2.96       3.77         3.77
US2                          2.20      8.81    17.44       8.81         8.81

Once again the exposure we can't take in Eurostoxx is pushed elsewhere: into EU-DIV30 (another European equity index) and S&P 500; the fact we can't go as short in Gold is compensated for by a slightly smaller long in US5 year bonds.


What's next


I've prodded and poked this methodology in backtesting, and I'm fairly confident it's working well and does what I expect it to. The next step is to write an order generation layer (the bit of code that basically takes optimal positions and current live positions, and issues orders: that will replace the current layer, which just does buffering), and develop some additional diagnostic reports for live trading (the sort of dataframes in the section above would work well, to get a feel for how maxima and minima are affecting the results). I'll then create a paper trading system which will include the 100+ instruments I currently have data for. 

At some point I'll be ready to switch my live trading to this new system. The nice thing about the methodology is that it will gradually transition out of whatever positions I happen to have on into the optimal positions, so there won't be a 'cliff edge' effect of changing systems (I might impose a temporarily higher shadow cost to make this process even smoother).

In the meantime, if anyone has any ideas for further diagnostics that I can run to test this idea out I'd be happy to hear them.

Finally I'd like to thank Doug once again for his valuable insight. 



Monday, 6 September 2021

Truth and Liebor

 This will be a bit different from my normal posts. It's basically some personal reflections on the LIBOR fixing scandal, prompted by having just read this book written by Stelios Contogoulas.





This post isn't really a book review, although I will say that the book is definitely worth buying. Most of you have probably already read the excellent Spider Network. That is arguably better written than Stelios' book (as it's written by a professional journalist, and as anyone who has read my books knows ex-traders are not always naturally gifted writers - Nassim Taleb is a black swan in this respect). Stelios' book is less polished, but he still does a good job of hooking you into the narrative and it got very exciting towards the end.

More importantly, as far as I am aware Stelios is the only person who has written a book about this scandal from the inside. And his book is very thoughtful and reflective, and his reflection has inspired some personal thoughts of my own.



Three traders


This post is about three people. One of them is Stelios. Another is an Italian by the name of Carlo Palombo. And the third is me.


Stelios


Carlo


Me


What do we have in common? Well, we're all in our forties, and our hair has long since departed our scalps. But more importantly we were all trading interest rate derivatives at Barclays Capital (as the investment banking arm of Barclays bank was known at the time) at the same time: from around September 2002 to February 2004 (when I left the bank). 

In fact until early 2004 the life and career of myself and Stelios followed an eerily similar track. Stelios of course grew up in Greece not England and is three years older than I am, but like me he lived abroad as an expat child. Like me he was interested in computers, and like me he decided a career in IT was not for him (in my case I dropped out after my first year at University, in his case after several years in IT consulting).

We both returned to education a little later in life, attending the University of Manchester at the same time. I was a mature Economics undergraduate, whilst Stelios was doing an MBA. We overlapped by about 18 months but we probably never met, although many of our lectures would have been in the same building.

Stelios was hired by Barclays in early 2002 as an associate after doing an internship (at the same time as I was doing an internship at AHL). When I was being interviewed for a position on the Barclays analyst programme, he had probably just started in the Canary Wharf office (5 North Colonnade - the home of Barclays investment bank then, and now, at least until next year). We were interviewed by some of the same people, a few months apart. 

We were both hired, I suspect, for ulterior motives. Stelios' computing experience meant that he didn't start properly trading for a couple of years, as he was initially tasked with rebuilding the bank's yield curve systems. My instinct is that I was hired because I had the right personality and was a few years older than the other graduates - more of that in a second.

In September 2002 I started on the graduate programme. The programme covered around 75 analysts and associates, covering back, middle and front office. I was one of only two traders. The other was Carlo Palombo. 



Derivatives trading at Barclays


Stelios and Carlo were working within a few metres of each other, both working on the interest rate swaps desk (which also traded FRAs). I was on the next bank of desks, but no more than 10 metres from each of them. My job was a little fancier; at least in theory. I was working on the exotics rates desk, which confusingly covered both vanilla options (swaptions, caps and floors) as well as actual exotics (bermudans, CMS, PRDMC...). However like Stelios and Carlo I was very much a junior trader.

My line manager was the desk MD, a very smart and decent guy who looked like a bouncer. But I reported day to day to the desk's senior trader, who ran the main vega book (options maturing in over a year; there was also a gamma book for shorter options which I eventually took over, plus various traders trading FX, caps/floors, inflation; and we also had an on desk quant / trader for the very fancy stuff). 

A thinly disguised version of this bloke appears in my first book ('Sergei'). He was an extremely unpleasant person to work for. I suspect I was hired - despite not having the PhD everyone else on the desk had - because it was thought with a few years of work experience I would be able to deal with this character better than a 21 year old neophyte or fresh faced PhD. It sort of worked - at least for me; I didn't end up being a glorified coffee boy like most junior traders as I refused to take any crap.

But Carlo was reporting to a guy called Jay (who traded the short Euro swaps and FRAs), who made my senior trader look like a social worker. He really gave Carlo hell, and the poor guy practically cowered under the tirade of abuse he got if he made even the slightest error. I felt sorry for Carlo, as I was working relatively relaxed hours (7am to 5pm), much less than the other analysts on the IB programme, and also a lot less than Carlo who practically had to sleep under his desk to keep up with the workload. Interestingly in Stelios' book he refers to Jay as:

 '... very demanding as a person- particularly with juniors - but when he liked someone, he was a great manager and mentor'. 

OK. Maybe I just didn't see his good side - perhaps he didn't like me or need to like me, or maybe I'm just a snowflake who was too soft to work on the trading floor. I certainly couldn't have worked on the swaps desk which was much larger than ours, and always seemed to have at least five people yelling abuse at each other. 

Outside of business I knew Carlo reasonably well as there were often nights at the pub or house parties with the other members of the grad programme, but I probably only spoke to Stelios half a dozen times during my time at Barclays. 



The crucial post it note


We didn't have a huge amount of interaction with 'the delta desk' as we disparagingly called the swaps traders, although we were supposed to do our hedging with them internally, and we also used to occasionally get them to clear up the fixing risk on our books. Sometimes a complex deal would need co-ordination between the desks, but mostly we had a friendly(ish) distant rivalry. We thought the delta traders were a bit simple (how hard could it be to trade swaps and FRAs, compared to bermudan swaptions?), and they probably thought we were a bit lazy and arrogant. As a junior trader from the stuck up exotics desk I tried to avoid the very scary looking senior swaps traders like Jay wherever possible.

One day however we had some large expiries in our book, and the market price was very close to the strike. 

About 15 minutes before the expiry (and fixing time) 'Sergei' leant over to me and in an uncharacteristically quiet voice said 

"Go tell X that we have a large expiry on this morning". 'X' was a senior swaps trader

'Oh come on, don't make me walk over there. Why don't I just message or call him' I moaned, not fancying running the gauntlet of the swaps desk.

'Don't be so f***** stupid. Go over and tell him, face to face.' hissed Sergei in reply. I rolled my eyes.

'For f**** sake' he muttered, and grabbed a post it note 'Just do it. Here is the expiry we have on. I've written it down so you don't forget it. Make sure you get it right. And make sure you bring that post-it note right back here'

Now I was intrigued. This was more like a spy mission than the normal humdrum business of trading. I wandered over to X (who fortunately was one of the nicer blokes on the swaps desk), and passed the crucial information on.

'We have this expiry today' I said, and read off the post it note. X nodded sagely but said nothing. I stood there for a few moments, not sure exactly what was supposed to happen next. He turned back to his screen, which was obviously my cue to leave. 

I returned to my desk, and sat down. Sergei held out his hand without looking at me.

'Post it note' he snapped. I pulled the scrap of yellow paper out of the pocket I had stuffed it into, and passed it over. I watched as he methodically tore it into tiny pieces, and then put the pieces into his own pocket. Then he turned to me and winked. Belatedly, I realised what had just happened.

Some background information: swaptions (options on swaps) were mostly cash settled against something which you can think of as a bit like a 'Swap Libor' fixing. Like LIBOR it was calculated daily from an average of figures given by a panel of banks. The swaps desk was responsible for submitting their estimated figures of where swaps were trading at a specific time each morning.

Note here the direct analogy with LIBOR:

The swaptions desk will gain / lose if swaps fix in a particular place
-   The swaps desk will gain / lose if LIBOR fixes in a particular place
The swaptions desk are not responsible for submitting the swap fix - the swaps desk are
-   The swaps desk are not responsible for submitting the LIBOR fix - the cash desk are
To influence the swap fix the swaptions desk will have to speak to the swaps desk
-    To influence the LIBOR fix the swaps desk will have to speak to the cash desk

Now, I am not saying that Sergei was trying to influence the swaps fix that day in favour of our expiry. And indeed, the message I had passed on was not 'We'd like the fix to be higher today please'. All I had told X was the position that we had on. Of course, X could have easily inferred where we would like the fix to be. And he could have used that to change the rate he submitted. 

All in all, it seemed a bit fishy. If this was kosher, why the secrecy? Why didn't Sergei want any electronic or taped record of my conversation with X to exist? Why had he torn up the post it note, and even been careful enough not to put it in the bin by his desk, but presumably take it home for more secure disposal? 

To be clear: I didn't even have the slightest thought that it might be illegal; nothing like this had been covered in either my regulatory exams or in the training the bank had provided. And I'd had no formal training whatsoever on the swap fix, or even the expiry process. Still, it was definitely a step beyond my own moral boundaries. I turned to Sergei and said as confidently as I could:

'I'd rather not do that again if it's okay with you'

He looked at me and smirked. 'Whatever. Now see if you can find a broker to buy us some lunch. I fancy some Ubon today.'

I felt like I'd failed some kind of test, but whatever he thought, I was never asked again. In case you're wondering, I don't remember there being anything 'weird' about the expiry that day, nor do I remember if we ended up in a profitable position. I have no idea whatsoever if X did anything at all, or if he was just being polite and pretending to do us a favour. 

And, for what it's worth, I never saw any evidence that any further requests were made by Sergei or anyone else. Perhaps he was just very discreet, perhaps it was a very rare event which I just happened to be part of, or perhaps I'd shocked him into a more virtuous life (although that seems unlikely). 
 

What I did next


Over the next few months there were other things that seemed fishy to me, but I couldn't avoid doing most of them. One of them I have talked about for several years now, here, in the newspapers, to the UK parliament, and on TV: the practice of selling embedded derivatives to local authorities and housing associations as part of 'LOBO' loans.

Importantly, there was nothing secretive about the LOBO business: communication was done properly over recorded lines, and there were no post it notes bandied around. I remember only one exception, which I described in my earlier blog post:

"On this particular deal the commission was so large in percentage terms that it exceeded internal limits. Even the most hard nosed traders on the trading desk were feeling pangs of.... well not guilt perhaps but fear that this kind of thing might one day be written on a blog. But the broker agreed to take half of the commission spread over subsequent deals, so that was okay."

For that trade there was indeed a lot of whispering, and the real commission was never written down or discussed in a recorded setting - not even on a post it note (it might have been written in biro on someone's hand). 

Again it was clear to me that what was going on was definitely immoral, but I never even considered it might be illegal. And of course, no court has ever found that Barclays (or any other bank) were engaged in illegal activity in relation to the LOBO deals and there has been no regulatory action. But the banks have 'voluntarily' agreed to 'tear up' many of the LOBO deals and replace them with straightforward loans, often taking significant mark to market losses in the process. 

(I remember going to a compulsory course on ethics at Barclays where they told us not to do anything that could end up on the front page of the newspaper, even if it was legal. That amused me no end when I was quoted on the front page of the FT in reference to the LOBO scandal).

The morally grey activities and the stress of working on the sell side all got a bit much for me. I decided in February 2004 to leave Barclays. My MD tried to make me stay; he even broke the rules and told me what my year end bonus would be if I stayed at least until April. I pointed out that I was probably giving up a lot more money in the long run, but this wasn't for me, and I wasn't entirely happy with a lot of the stuff we were doing.

You know the rest if you've followed my blog; I did a couple of years at an economics think tank and then joined AHL in 2006 where I lived happily ever after (at least until 2013, when I left and now live happily ever after writing stuff for you guys to read). 


What Stelios and Carlo did next


What happened to Stelios and Carlo? Well they both stayed at Barclays, and not long after I left Stelios was allowed to begin trading properly, initially on the sterling FRA book, then subsequently covering USD short end swaps for the London desk. And at some point, both were asked to pass on requests to cash desks to ensure LIBOR and/or EURIBOR fixes reflected their trading book.

It's worth quoting from Stelios' book:

"One morning, Fred stood up from his chair...  'Come with me, there's someone I want you to meet'

The two of us walked a few rows away on the edge of the trading floor... Sitting there was Peter Johnson... He was an Englishman in his early fifties with already a long career at Barclays. He was an established, successful, and very senior trader.

'Stelios this is Peter....' said Fred 'He is the US cash trader here at Barclays and he's the person responsible for submitting LIBOR rates for the bank. Alex and I will be asking you on occasion to relay some information to him, relating to LIBOR rates and our preference on it. So, all you have to do is to let him know, OK?'

Peter got up... 'Nice to meet you, Stelios. Just let me know whenever you boys need something and I'll do my best to help out' he said."

And thus the die was cast.


The LIBOR scandal


When the rumours about LIBOR first surfaced in 2008 (and ironically, I think it was Tim Bond from Barclays who brought 'lowballing' to everyone's attention), I immediately remembered the incident from five years earlier. My first thought was 'Yes, that's absolutely what would have been happening', and then 'Wait, is that really illegal?'. 

The rest is history not worth repeating here; but for Stelios and Carlo it did not end well, as both were prosecuted for LIBOR and EURIBOR fixing respectively.  I won't tell you what happened to Stelios, you can google it if you like or better still read his book. Sadly, Carlo was sentenced to four years in prison, and could be there until 2023 (although hopefully he will qualify for an earlier release).

Several other traders were also found guilty, of which the most high profile was certainly Tom Hayes who was finally released a few months ago.


Why them, and not me?


I'm not going to discuss the rights and wrongs of the scandal here, I'm not going to debate as to whether any law was actually broken; nor will I tell you how I feel that only relatively junior people got prosecuted whilst their bosses got away with murder. You can read Stelios' book, as he's basically in broad agreement with me on all of these issues.

But there is one point I want to finish with. In Stelios' book he includes this line:

"Try to put yourself in my shoes and think about how you would have acted in my place"

For me this is especially poignant. It really could have been me. I wasn't actually in Stelios' shoes, but I was standing (or rather sitting) just a few metres away. And yet I acted quite differently.

I'd like to think that it's because I have an especially finely tuned moral compass, but if I'm being brutally honest I'm not sure that's the case (and to be fair to Stelios, in my limited personal dealings with him, and in his book, he comes across as a pretty decent guy).

Realistically, if I was in Stelios' shoes, or Carlo's for that matter, I probably would have done what he / they did. After all, we had a lot in common, quite apart from our near parallel career tracks. We had no training whatsoever on the legal or regulatory ramifications of rate fixing. Furthermore, we were working as juniors for domineering bosses who brooked no disagreement, although Stelios and I probably coped better than Carlo.

There are two main reasons why I didn't make the same decisions. Firstly, we were doing jobs that were quite different. Rate fixing had a much bigger impact on the swaps book than on ours (to use some jargon, we were running much smaller delta positions), so seeking to influence fixing rates just doesn't seem to have been such a big part of the job.

And secondly, if you reread the accounts of my brush with rate fixing and Stelios' description, they are quite different. There is none of the furtive nature of Sergei's instructions when you read what Stelios writes. There is no reason for Stelios to suspect that anything fishy is going on. It's just presented as completely normalised behaviour.

I am still not completely sure why Sergei was so secretive, given the practice of adjusting fixes was so commonplace. Perhaps he had some prescience about what was going to happen in the future, perhaps it was for his own amusement as part of the 'test', or perhaps it was just his Russian upbringing. 



"Try to put yourself in my shoes and think about how you would have acted in my place"


Thursday, 2 September 2021

The three kinds of (over) fitting

This post is something that I've banged on about in many presentations at several conferences* (most complete slides are here), and in various interviews, but never actually formally described in a blog post. In fact this post has existed in draft form since 2015 (!).

* you know, when you leave your house and listen to someone else speaking. Something that in late 2021 is a distant memory, although I will actually be speaking at an event later this year.

So there won't be new information here if you've been following my work closely, but it's still nice to write it down in one place.

(I'm trying to stick to my self imposed target of one blog post per month, but you will appreciate that I don't always have time for the research involved in producing them - unless it's a by product of something I'm already working on)

Trivially, it's about the fitting of trading systems and the different ways you can screw this up:

  • Explicit (over)fitting
  • Implicit (over)fitting
  • Tacit (over)fitting


What is fitting

I find it hard to believe that anyone reading this doesn't already know this, unless you've accidentally landed here after googling some unrelated search term, but let me define my terms.

The act of fitting a trading system can formally be defined as the process of discovering which combination of trading rule and parameter set(s) will produce the optimal trading system when tested on historic data: a combination I call the trading rule variation. The unspoken assumption of all quant finance is that this variation will also be the optimal system to use in the future.

A trading rule is a specific set of instructions which tells you how to trade; for example something like 'Buy if the N day return is negative, otherwise sell'. In this case the parameter set would consist only of a specific value of N.

Optimality can mean many things, but for the purposes of this post let's assume it's maximising Sharpe Ratio (it isn't that important which measure we choose in the context of our discussion here).

So for this particular example fitting could involve considering alternative values of N, and finding the value which had the highest Sharpe Ratio in an historic backtest. Alternatively, it could also involve trying out different rules - for example 'Sell if the N day return is negative, otherwise buy'. But note that these approaches are equivalent; we could parameterize this alternative set of rules as 'Buy X*units if the N day return is negative, otherwise sell X*units' where X is either +1 (so we buy) or -1 (so we sell). Now we have two parameters, N and X, and our fitting process will try and find the optimal joint parameter values. 

Of course there are still numerous rules that we haven't considered here, such as selling if the N hour return is negative, or if the most recent non farm payroll was greater than N, or if there was a vomiting camel chart pattern on the Nth Wednesday in the month. So when fitting we will do so over a given parameter space, which includes the range of possible values for all our parameters. Here the parameter space will be X = [-1,1] and N = [1,2,3......] (assuming we have daily closing data). The product of possible values of X and N can loosely be thought of as the 'degrees of freedom' of the fitting process. 
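To make the idea of a parameter space a bit more concrete, here is a minimal sketch of a brute force fit over X and N for the rule above. Everything in it is an assumption made for illustration: the function names, the random walk standing in for real prices, the cap of N at 20, and the annualisation factor of 256. It is not code from pysystemtrade.

```python
import itertools
import math
import random
import statistics


def rule_positions(prices, n, x):
    """Buy x units if the n day return is negative, otherwise sell x units.

    Returns one position per day t, for t = n .. len(prices) - 2, using only
    prices known at the close of day t.
    """
    positions = []
    for t in range(n, len(prices) - 1):
        n_day_return = prices[t] - prices[t - n]
        positions.append(x if n_day_return < 0 else -x)
    return positions


def sharpe_ratio(prices, n, x, days_per_year=256):
    """Annualised Sharpe Ratio of holding each position over the following day."""
    positions = rule_positions(prices, n, x)
    pnl = [
        position * (prices[t + 1] - prices[t])
        for position, t in zip(positions, range(n, len(prices) - 1))
    ]
    stdev = statistics.pstdev(pnl)
    if stdev == 0:
        return 0.0
    return statistics.mean(pnl) / stdev * math.sqrt(days_per_year)


def fit(prices, x_values=(-1, 1), n_values=range(1, 21)):
    """Try every (X, N) in the parameter space and keep the highest Sharpe Ratio."""
    space = list(itertools.product(x_values, n_values))
    best = max(space, key=lambda xn: sharpe_ratio(prices, n=xn[1], x=xn[0]))
    return best, len(space)


# Hypothetical data: a random walk, so any 'optimum' found here is pure noise
random.seed(0)
prices = [100.0]
for _ in range(500):
    prices.append(prices[-1] + random.gauss(0, 1))

(best_x, best_n), n_variations = fit(prices)
print(best_x, best_n, n_variations)
```

Note that the fit happily returns a 'best' variation even though the data is a random walk with nothing to find, which is rather the point of the rest of this post.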

All fitting thus involves the choice of some possible trading strategies from a tiny subset of all possible strategies.

The number of units to buy or sell is another question entirely, which I discuss in this series of posts.

Fitting can be done in an automated fashion, purely manually, or using some combination of the two. For example, we could get some backtesting software and ask it to find the optimal values of X and N. Or we could manually test each possible variation. Or we could run the backtesting software once for X=1 (buy if N day return is negative), and then again for X=-1, each time finding the best value of N. The third option is the most common amongst quant traders.


What is overfitting and why it be bad

Consider the following:

Hastie et al (2009) “The Elements of Statistical Learning” Springer. Figure 2.11


How does this relate to the fitting of trading systems? Well, we can think of 'prediction error' as 'Sharpe Ratio on an inverted scale' such that a low value is good. And 'model complexity' is effectively the degrees of freedom of the trading strategy.

What is the graph telling us? Well first consider the 'training sample' - the set of data we used to do the fitting on - the dirty red line. As we add complexity we will get a better performing trading strategy (in expectation). In fact it's possible to create a trading strategy with zero prediction error, and thus infinite Sharpe Ratio, if the degrees of freedom are sufficiently large (in a hand waving way, if the complexity in the strategy is equal to the amount of entropy in the data). 

How? Well consider a trading strategy which has the form 'Buy X*units if it's January', 'Buy X*units if it's February'.... If we fit this on past data it's going to do pretty well. Now let's make it even more complex: 'Buy X* units if it's January 3rd 2015', 'Buy X* units if it's January 4th 2015' .... (where January 3rd 2015 is the first day of our price history). This will perfectly predict every single day in the backtest, and thus have infinite Sharpe Ratio.

(More mathematically, if we fit a sufficiently high degree polynomial to the price data, we can get a perfect fit)
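As a rough illustration of the polynomial point, the sketch below passes a degree 9 polynomial exactly through 10 days of prices using Lagrange interpolation. The prices are a made up random walk and the function name is hypothetical; the only claim is that the in sample error is zero, which says nothing about how good the next day's forecast will be.

```python
import random


def lagrange_predict(xs, ys, x):
    """Evaluate at x the unique polynomial of degree len(xs) - 1 through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical data: ten days of a random walk
random.seed(1)
days = list(range(10))
prices = [100.0]
for _ in range(9):
    prices.append(prices[-1] + random.gauss(0, 1))

# In sample: the polynomial reproduces every price it was fitted to
in_sample_errors = [
    abs(lagrange_predict(days, prices, d) - p) for d, p in zip(days, prices)
]

# Out of sample: extrapolate one day beyond the fitted data
next_day_forecast = lagrange_predict(days, prices, 10)
print(max(in_sample_errors), next_day_forecast)
```

Ten parameters for ten data points is the extreme right hand end of the complexity axis: a perfect backtest, and an extrapolation that there is no reason to trust.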

On the out of sample (dirty green) line notice that we always do worse (in expectation) than the red line. That's because we'll never do as well in predicting a different data set to what we have trained / fitted our model on. Also notice that the gap between the red and the green line grows as the model gets more complex. The more closely our model fits the backtest period, the less likely it is that it will be able to predict a novel future. 

This means that the green line has a minimum error (~maximum Sharpe Ratio) where we have the optimal amount of complexity (~degrees of freedom). Anything to the right of this point is overfitting (also known as curve fitting).

Sadly, we don't get paid based on how well we predict the in sample data. We get paid for predicting out of sample performance: for predicting the future. And this is much harder! And the Sharpe Ratios will be lower! 

At least in theory! In practice, if you're an academic then you get paid for publishing papers with nice results: papers that predict the past. If you're working for a quant hedge fund then you may be getting paid for coming up with nice backtests that also predict the past. And even as a humble independent trader, we get a kick out of a nice backtest. So for this reason it's very easy to be drawn towards trying to make the in sample line look as good as possible: which we'll do by making the model more complicated.

Basically: our incentives make us prone to overfitting and towards confounding the red and the green lines.



Explicit fitting


We're now ready to discuss the three kinds of (over)fitting.

The first is explicit fitting. It's what most people think of as fitting. The basic idea being that you get some kind of automated algo to select the best possible set of parameters. This could be very easy: a grid search for example that just tries every possible strategy variation. Or it could be much more complex: some kind of fancy AI technique like a neural network. 

The good news about explicit fitting is that it's possible to do it properly. By which I mean we can:
 
  • Restrict ourselves to fewer degrees of freedom
  • Enforce a realistic separation between in and out of sample data in the backtest (the 'no time machine' rule) 
  • Use robust fitting techniques to avoid wandering into the overly complex overfitting end of the figure above.

Of course it's also possible to do explicit fitting badly (and plenty of people do!), but at least it's possible to avoid overfitting if you're careful enough.


Fewer degrees of freedom


Consider a more realistic example of a moving average crossover trading rule (MAC) which can be defined using two parameters A and B: signal = MA_A - MA_B, where MA_x is a moving average with lookback x days, and A<>B. Note that if A<B then this will be a momentum rule, whereas if A>B it will be a mean reversion rule. We assume that A and B can take any values in the range 1 to 256 (where 256 is roughly the number of business days in a year); anything longer than this would be an 'investment' rather than a 'trading' strategy.
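A minimal sketch of the MAC signal, assuming a plain (equally weighted) moving average; the text above doesn't specify the type of average, and the function names here are hypothetical.

```python
def moving_average(prices, lookback):
    """Simple moving average; None until `lookback` prices are available."""
    averages = []
    for t in range(len(prices)):
        if t + 1 < lookback:
            averages.append(None)
        else:
            window = prices[t + 1 - lookback : t + 1]
            averages.append(sum(window) / lookback)
    return averages


def mac_signal(prices, a, b):
    """signal = MA_A - MA_B, with A != B. Positive means go long."""
    if a == b:
        raise ValueError("A and B must differ")
    ma_a = moving_average(prices, a)
    ma_b = moving_average(prices, b)
    return [
        None if (av_a is None or av_b is None) else av_a - av_b
        for av_a, av_b in zip(ma_a, ma_b)
    ]


# Hypothetical steadily rising price series
rising_prices = [float(p) for p in range(1, 21)]
momentum = mac_signal(rising_prices, a=2, b=4)
mean_reversion = mac_signal(rising_prices, a=4, b=2)
print(momentum[-1], mean_reversion[-1])
```

With A<B the signal is positive in an uptrend (momentum); swapping A and B flips the sign, which is the mean reversion version of the same rule.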

If we try and fit all 65,280 possible values of A and B individually for each instrument we trade then we're very likely to overfit. We can reduce our degrees of freedom in various ways:

  • Restrict A<B [so just momentum]
  • Set B = k.A; fit k first, then fit A  [I do this!]
  • Restrict A and B to be in the set {1,2,4,8,16,32, ... 256}  [I do this!]
  • Use the same A, B for all instruments in a given asset class [discussed here]
  • Use the same A,B for all instruments [perhaps after accounting for costs]
Notice that this effectively involves making fitting decisions outside of the explicit fitting... I discuss this some more later. But for now you can note that it's possible to make these kinds of decisions without using real data at all.
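To get a feel for how much each restriction shrinks the parameter space, the variations can simply be counted. This is only a counting exercise; the ratio k=4 is borrowed from the B = 4A example later in this post and is otherwise an arbitrary choice.

```python
lookbacks = range(1, 257)

# Every (A, B) pair with A != B
all_pairs = [(a, b) for a in lookbacks for b in lookbacks if a != b]

# Restrict A < B, so momentum only
momentum_only = [(a, b) for a, b in all_pairs if a < b]

# Restrict the lookback to powers of two: 1, 2, 4, ... 256
powers_of_two = [2 ** i for i in range(9)]

# Set B = k.A (k = 4 assumed here), keeping B within the 256 day limit
k = 4
fixed_ratio = [(a, k * a) for a in powers_of_two if k * a <= 256]

print(len(all_pairs), len(momentum_only), len(powers_of_two), len(fixed_ratio))
```

So the first restriction halves the space, and combining a fixed ratio with the powers of two grid leaves just seven variations to choose between.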


No time machine


By 'no time machine', I mean that a parameter set should only be tested on a period of data if it has been fitted only on data that was available before the start of the testing period.

So for example if we fit from 2000 - 2020, and then test on the same period, then we're cheating - we couldn't have done this without a time machine. If we fit from 2000-2010, and then test from 2011 - 2020; then that's okay. But if we then do a classic ML technique and subsequently fit from 2011-2020 to test from 2000-2010 then we've cheated.

There are two honest options:

  • An expanding window; first we fit using data for 2000 (assuming a year gives us enough data to fit with; if we're doing a robust fit that would be fine) and test that model in the year 2001; then we fit using 2000 and 2001, and test that second model in 2002..... then we fit using 2000 - 2019, and then test in the year 2020.
  • A rolling window. Say we want to use a maximum of 10 years to fit our data, then we would proceed initially as for an expanding window until we get to .... we fit using 2000 - 2009 and test in the year 2010, then we fit using 2001 - 2010 and test in the year 2011.... then finally we fit using 2010-2019 and then test in the year 2020. 
In practice the choice between expanding and rolling windows is a tension between using as much data as possible (to reduce the chances that we overfit to a small sample), and the fact that markets change over time. A medium speed trend follower that needs decades worth of data to fit will probably want to use an expanding window: they are exploiting market effects that are relatively low Sharpe Ratio (high entropy in the data) but will also hopefully not go away. An HFT shop will want to use a rolling window, with a duration of the order of a few months: they are looking for high SR effects that will be quickly degraded once the competition finds out about them.
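The two honest options can be sketched as generators of (fitting years, test year) pairs, using the same annual steps as the examples above. The function names are hypothetical, and a real backtest would of course refit on actual dates rather than whole calendar years.

```python
def expanding_windows(first_year, last_year):
    """Yield (fit_years, test_year): fit on every year before the test year."""
    for test_year in range(first_year + 1, last_year + 1):
        yield list(range(first_year, test_year)), test_year


def rolling_windows(first_year, last_year, max_years):
    """As expanding, but keep only the most recent `max_years` years for fitting."""
    for fit_years, test_year in expanding_windows(first_year, last_year):
        yield fit_years[-max_years:], test_year


expanding = list(expanding_windows(2000, 2020))
rolling = list(rolling_windows(2000, 2020, max_years=10))

# The 'no time machine' rule: every fitting year is before its test year
for fit_years, test_year in expanding + rolling:
    assert max(fit_years) < test_year

print(expanding[0], rolling[-1])
```

The rolling version behaves exactly like the expanding one until there are more than ten years of history, after which the oldest year drops out each time.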


A robust fitting technique 


A robust fitting technique is one which accounts for the amount of entropy in the data; basically it will not over reach itself based on limited evidence that one parameter set is better than another.  

Consider for example the following:

A and B are the parameters for a MAC model trading Eurodollar futures. The best possible combination sits neatly in the centre of this plot: A=10, B=20 (a trend following model of medium speed). The Z-axis compares this optimum with all other values shown in the plot; a high value (yellow) indicates the optimum is significantly better than the relevant point.

I have removed all values below 2.0, which roughly corresponds to statistical significance. The large white area covers all possible values of A and B that can't be distinguished from the optimum. Even though we have over 30 years of data here, there is enough entropy that we can only rule out all the mean reversion systems (top triangle of the plot), and the faster momentum models (wedge at top left).

Contrast this with the picture for Mexican Peso:


Here I only have a few years of data. There is almost no evidence to suggest that the optimum parameter set (which lies at the bottom right of the plot) is any better than almost any other set of parameters. 
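The post doesn't say exactly how the plotted comparison against the optimum was calculated. One plausible way to get a number where 'roughly 2.0' corresponds to significance is a t-statistic on the difference in returns between the optimum and each alternative, sketched below. The function name and the toy numbers are assumptions, not the method actually used for the plots.

```python
import math
import statistics


def t_statistic_of_difference(returns_optimum, returns_other):
    """t-statistic for the mean of (optimum return - other return), period by period.

    Assumes both lists cover the same periods and that the differences
    are not all identical (otherwise the standard error is zero).
    """
    differences = [o - x for o, x in zip(returns_optimum, returns_other)]
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return statistics.mean(differences) / standard_error


# Toy numbers chosen so the differences are 1, 2, 3, 4, 5
optimum = [2.0, 4.0, 6.0, 8.0, 10.0]
other = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat = t_statistic_of_difference(optimum, other)
print(t_stat, t_stat > 2.0)
```

Because the standard error shrinks with the square root of the number of observations, a few years of data will rarely push this statistic above 2.0 unless the difference in performance is enormous.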

A simple example of robust fitting is the method I use myself: I construct a number of different parameter variations and then allocate weights to them. 

This is now a portfolio optimisation problem, a domain where there are plenty of available techniques for robust fitting (my favourite is discussed at length, in the posts that begin here). We can do this in a purely backward looking fashion (not breaking the 'no time machine' rule). A robust fitting technique will allocate equally to all considered variations where there is too much entropy and insufficient evidence that any is worth allocating more to (in the form of heterogeneous correlation matrices, different cost levels, or differing pre-cost Sharpe Ratios). 

But when there is compelling evidence available it will tilt its allocation to more diversifying, cheaper, and higher performing rule variations. It is usually a tilt rather than a wholesale reallocation, since there is rarely enough information to prove that one trading rule variation is better than all the others.
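One simple way to get this 'tilt rather than reallocate' behaviour is to shrink some raw fitted weights towards equal weights. To be clear, this is a generic shrinkage sketch and not the method referred to above; the function name, the raw weights and the shrinkage value are all made up for illustration.

```python
def shrink_towards_equal(raw_weights, shrinkage):
    """Blend normalised raw weights with equal weights.

    shrinkage = 1.0 gives pure equal weights (no evidence trusted),
    shrinkage = 0.0 gives the raw weights (all evidence trusted).
    """
    if not 0.0 <= shrinkage <= 1.0:
        raise ValueError("shrinkage must be between 0 and 1")
    total = sum(raw_weights)
    equal_weight = 1.0 / len(raw_weights)
    return [
        shrinkage * equal_weight + (1.0 - shrinkage) * (weight / total)
        for weight in raw_weights
    ]


# Hypothetical raw weights from an over-confident optimiser
raw = [0.7, 0.2, 0.1]
tilted = shrink_towards_equal(raw, shrinkage=0.8)
print(tilted)
```

With heavy shrinkage the best looking variation still gets the largest weight, but the allocation stays close to a third each.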



Implicit fitting


We can now think about the second form of fitting: implicit fitting. Implicit fitting occurs when you make any decision having seen the results of testing with both in and out of sample data.

Implicit fitting comes in degrees of badness. From worst to least bad, examples of implicit fitting could include:

  • Run a few different backtests with different parameter values. Pick the one you like the best. Basically this is explicit in sample fitting, done manually. As an example, consider what I wrote earlier:  "Or we could run the backtesting software once for X=1 (buy if N day return is negative), and then again for X=-1, each time finding the best value of N." This is implicit fitting.
  • Run an explicitly fitted backtest, then modify the parameter space (eg restricting A<50) before running it again
  • Run a proper backtest, then modify the trading rule in some way before running it again (again, with explicit fitting, so you can pat yourself on the back). If this improves things, keep the modified rule.
  • Run a series of backtests, changing the fitting hyper parameters until you get a result you like. Examples of hyper parameters include expanding window lookbacks, shrinkage on robust Bayesian fitting, deciding whether to fit on a per instrument or per asset basis, and all kinds of wonderful things if you're doing fancy AI.
  • Run a series of backtests, changing some 'non core' parameters until you get a result you like. Examples include the volatility estimation lookback on your risk scaling, or the buffer window.
  • Run a single backtest to try out an idea. The idea doesn't work, so you forget about it completely.
You can probably see why these are all 'cheating': we're basically making use of a time machine. So for the last example, what we really ought to do is have a 'fund level' backtest in which every single idea we've ever considered is stored, and gets a risk allocation at the start of our testing period (which is then modified as the backtest fitting learns more about the historic performance of the model). Poor ideas will not appear in our 'live' model (assuming there is sufficient evidence by the end of the backtest), but it will mean that our historic 'fund level' account curve won't be inflated by only ever having good ideas within it.

Other ways to deal with this also rely on knowing how many backtests you have run for a given idea; they include correcting your significance level for the number of trials you have done (which I don't like, since it treats a major case of parameter cheating the same as a tiny hyper parameter tweak), and testing on multiple paths to catch especially egregious over fitting (something like CPCV).
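The simplest version of correcting the significance level for the number of trials is the Bonferroni correction: divide the significance level you would have used for one test by the number of backtests you ran. The text above doesn't name a specific correction, so treat this choice, and the function name, as an assumption.

```python
def bonferroni_threshold(alpha, n_trials):
    """Per-backtest significance level after n_trials attempts at the same idea."""
    if n_trials < 1:
        raise ValueError("need at least one trial")
    return alpha / n_trials


# Hypothetical: a 5% significance level and 20 backtests of variations on one idea
threshold = bonferroni_threshold(alpha=0.05, n_trials=20)
print(threshold)
```

This also shows the weakness mentioned above: the correction only counts trials, so twenty tiny hyper parameter tweaks are penalised exactly as heavily as twenty wholesale changes to the trading rule.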

But ultimately, you should know when you are doing implicit fitting. Try not to do it! As much as possible, if something needs fitting (and most things don't) fit in a proper explicit robust out of sample fashion. 



Tacit fitting


Barbara is a quant trader. She's read all about explicit and implicit fitting. She decides to fit a MAC model to capture momentum. First she restricts the parameter space using artificial data (as I discuss here):

  • Restrict A<B [so just momentum]
  • Set B = 4A [using artificial data]
  • Restrict A to be in the set {1,2,4,8,16,32,64}  [using artificial data]
  • Drop values of A that are too expensive for a given instrument [using artificial data]

Then she fits a series of risk weights using a robust out of sample expanding window with real data, pooling data across all instruments. Barbara is pleased with her results and goes ahead to trade the strategy.

The question is this, has Barbara used a time machine? Surely not!

In fact she has. Consider the first decision that she made:

  • Restrict A<B [so just momentum]
Could Barbara have made this decision without a time machine? Had she really been at the start of her backtest data (which we'll assume goes back to the beginning of financial market data; for the sake of argument let's say that's 1900), would she have known that momentum is more likely to be profitable than mean reversion (at least for the sort of assets and time scales that I tend to focus on, as does Barbara)? Strictly speaking the answer is no. Barbara only knows that momentum is better because of one or more pieces of tacit knowledge. Most likely:

  • She's done this backtest before  (perhaps at another shop where they were less strict about overfitting)
  • And/ or her boss has done this backtest before, and told her to fit a momentum model
  • And/ or she saw a conference presentation where someone said that momentum works 
  • ... She read a classic academic paper on the subject
  • ... Her Uber driver to the airport was an ex pit trader who favoured momentum
  • She is one of my students
  • She's read all of my books
None of this information would have been available to Barbara in 1900. By restricting A<B she's massively inflating her backtested performance over what would have been really possible had the backtest software realistically discovered over time that momentum was better. It's also possible that she will miss out on some profitable trading strategies just because she isn't looking for them (for example, some models of mean reverting A>B seem to be profitable for small A). 

Solving the problem of tacit fitting is very hard. Here are some possible ideas:

  • Widen the parameter space and fit in the wider space (so don't restrict A<B in this simple example). Of course that will result in more degrees of freedom, so you will need to be far more careful with using a robust fitting technique.
  • Use some kind of fancy neural network or similar to fit a highly general model. Even with modern computational power it is unrealistic to fit a model that would be sufficiently general to avoid any possibility of tacit fitting (for example, if you only feed such a model daily price data, then you've arguably made a tacit decision that daily prices can predict future returns).
  • Hire people who know nothing about finance (and once they've learned, kill or brainwash them. You can't just fire them - they'll tell people your secrets!). This is surprisingly common amongst top quant funds (the hiring of ignorant people, not the killing and brainwashing).


And finally....




And if you want to get fancy, read this book.

Now go away, and overfit no more.