Here's a nice picture from a lovely book written by a top bloke:
It shows the cumulative p&l from different speeds of momentum (for portfolios containing 102 instruments) over 50 years of data. Notice how the two fastest speeds (2 and 4) get worse in the second half of the sample. I've called line #2 here the 'second most famous hockey stick graph in history'. It certainly looks like something changed in 1990.
This is important. If we're optimising portfolios of such things we only want to consider data that is relevant, but we also want as much data as possible for statistical significance. Now if I were a simpleton I'd do this by looking at graphs like that and going 'aha i only need to use data after 1990'. (As a simpleton I don't use capital letters.) But I am a big fan of not doing in sample fitting, even of meta parameters like this; and I am an even bigger fan of doing things automatically, which means not wading through thousands of graphs like that (since there are thousands of SR estimates in my forecast p&l space, plus a good chunk of correlations).
So we need an automatic way of identifying such breaks. Fortunately this is not a new problem, as you will know if, like me, you did undergraduate econometrics. Finding structural breaks is an entire industry. We need two things: a test for how likely it is that a break has occurred between two sub-samples A and B, and an algorithm for going through all the options for A and B.
And in case you haven't realised this is the seventh post in my summer 2026 series on portfolio optimisation.
What parameters
The first question to think about is what parameters we're going to apply this process to. I do two kinds of optimisation:
- Forecast weights
- Instrument weights
What test
What algo / procedure
- We compare year 1 with years 2 to 50. Since year 1 is very small it's unlikely we'll find a break; but if we do then we do a split (see below).
- If no split has occurred, we compare years 1-2 with years 3-50. Again if we find a break, we split.
- If no split has occurred, we compare years 1-3 with years 4-50...
- ...
- If no split has occurred, we compare years 1-45 with years 46-50. If we still don't find a break, then we use the entire period for estimation (since the period 47-50 will only have four years, we terminate here).
- We compare year 21 with years 22 to 50. If we find a break, then we split again.
- If no split has occurred, we compare years 21-22 with years 23-50. Again if we find a break, we split.
- If no split has occurred, we compare years 21-23 with years 24-50...
- ...
- If no split has occurred, we compare years 21-45 with years 46-50. If we still don't find a break, then we use the entire period from years 21-50 for estimation.
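To make the sequence of comparisons concrete, here is a minimal sketch that just enumerates the candidate splits for a given window, assuming whole years and a five year minimum for the second sub-sample. The name `candidate_splits` is mine and purely illustrative; it does no testing, it only lists which pairs of periods would be compared.

```python
from typing import Iterator, Tuple

YearRange = Tuple[int, int]


def candidate_splits(
    first_year: int, last_year: int, min_years: int = 5
) -> Iterator[Tuple[YearRange, YearRange]]:
    # Hypothetical helper: yield (first sub-sample, second sub-sample) as
    # inclusive year ranges, stopping once the second sub-sample would
    # have fewer than min_years in it.
    for split_year in range(first_year, last_year - min_years + 1):
        yield (first_year, split_year), (split_year + 1, last_year)


full_sample = list(candidate_splits(1, 50))
after_break_at_year_20 = list(candidate_splits(21, 50))

print(full_sample[0], full_sample[-1])
print(after_break_at_year_20[0], after_break_at_year_20[-1])
```

With 50 years of data the first comparison is year 1 against years 2-50 and the last is years 1-45 against years 46-50. If a break had been found after year 20, the same enumeration simply restarts from year 21.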
An example
Some code
from copy import copy
from typing import List

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

BUS_DAYS_IN_YEAR = 256
MIN_NUMBER_OF_YEARS = 5


def identify_and_plot_breaks(all_returns: pd.Series, CV: float = 0.01):
    ## CV is the critical value (p-value threshold) for the break test
    breaks_as_dict = identify_all_breaks(all_returns, CV=CV)
    breaks_as_df = pd.DataFrame(breaks_as_dict)
    breaks_as_df = breaks_as_df.bfill(axis=1)
    breaks_as_df.cumsum().plot()
    plt.show(block=True)


def identify_all_breaks(all_returns: pd.Series, CV: float):
    ## returns a dict, turn into a dataframe and you can plot
    returns_to_consider = copy(all_returns)
    returns_to_consider = returns_to_consider.dropna()
    broken_list = identify_all_breaks_recursively(
        returns_to_consider=returns_to_consider,
        list_of_returns_broken_off=[],
        CV=CV,
    )
    ## after reversing, key 0 is the most recent sub-period
    broken_list.reverse()
    broken_dict = dict([(idx, value) for idx, value in enumerate(broken_list)])

    return broken_dict


def identify_all_breaks_recursively(
    returns_to_consider: pd.Series, list_of_returns_broken_off: List, CV: float
) -> List:
    years_in_returns = how_many_years_approx(returns_to_consider)
    for i in range(years_in_returns):
        year_idx = i + 1
        first_sample, second_sample = split_sample_after_n_years(
            returns_to_consider, year_idx
        )
        if len(second_sample) < (MIN_NUMBER_OF_YEARS * BUS_DAYS_IN_YEAR):
            break
        is_broken_here = test_a_break(first_sample, second_sample, CV=CV)
        if is_broken_here:
            ## break off the first sample, then look for more breaks in the rest
            list_of_returns_broken_off.append(first_sample)
            return identify_all_breaks_recursively(
                second_sample,
                list_of_returns_broken_off=list_of_returns_broken_off,
                CV=CV,
            )
        else:
            continue

    ## No breaks identified or sample size too short
    list_of_returns_broken_off.append(returns_to_consider)

    return list_of_returns_broken_off


def how_many_years_approx(returns: pd.Series):
    return int(np.floor(len(returns) / BUS_DAYS_IN_YEAR))


def split_sample_after_n_years(all_returns: pd.Series, n_years: int):
    idx = n_years * BUS_DAYS_IN_YEAR
    return all_returns[:idx], all_returns[idx:]


def test_a_break(first_sample: pd.Series, second_sample: pd.Series, CV: float):
    ## Normalise by standard deviation before considering means
    norm_first_sample = first_sample / first_sample.std()
    norm_second_sample = second_sample / second_sample.std()

    return ttest_ind(norm_first_sample, norm_second_sample).pvalue < CV
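It's worth seeing what normalising by standard deviation does to the test. The sketch below is a standalone toy, not part of the code above: `looks_broken` is a hypothetical name that mirrors `test_a_break`, and the returns are synthetic. Because each sub-sample is divided by its own standard deviation, the t-test is effectively comparing mean over standard deviation, i.e. something like a daily Sharpe Ratio, rather than raw mean returns.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def looks_broken(first: pd.Series, second: pd.Series, cv: float = 0.01) -> bool:
    # Hypothetical stand-in for test_a_break: scale each sample by its own
    # standard deviation, then run a two-sample t-test on the scaled means.
    norm_first = first / first.std()
    norm_second = second / second.std()
    return bool(ttest_ind(norm_first, norm_second).pvalue < cv)


# Synthetic daily returns: roughly ten years at 256 business days a year
rng = np.random.default_rng(42)
base = pd.Series(rng.normal(loc=0.0005, scale=0.01, size=256 * 10))

same_sharpe_more_vol = base * 3.0  # three times the vol, identical mean/std
higher_sharpe = base + 0.01        # same vol, much higher mean

vol_change_flagged = looks_broken(base, same_sharpe_more_vol)
sharpe_change_flagged = looks_broken(base, higher_sharpe)
print(vol_change_flagged)     # False
print(sharpe_change_flagged)  # True
```

A pure change in volatility is not flagged, whereas a change in risk adjusted performance is, which is what we want if the thing being estimated is the SR.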
A summary of results
You can see that breaks are quite rare with only 13% or so of instrument/rules having at least one break. This also suggests that the Sharpe Ratios for trading rule performance are actually quite stable over time; or at least stable enough that they won't fail any statistical tests at a 1% critical value.

Although five is pushing it, there are certainly three regimes there (pre 2000, 2000 - 2010, and 2010 onwards), and using post 2010 data seems to make some kind of sense.
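To make 'using the data after the last break' concrete, here is a rough sketch of estimating an annualised SR on only the most recent regime. The helper names and the synthetic data are my assumptions for illustration; the only thing borrowed from above is the 256 business days in a year. The most recent regime is picked by its start date, so it doesn't matter which way round the list of sub-periods is ordered.

```python
import numpy as np
import pandas as pd

BUS_DAYS_IN_YEAR = 256


def annualised_sharpe(returns: pd.Series) -> float:
    # Daily mean over daily standard deviation, scaled by sqrt(256) = 16
    return float(returns.mean() / returns.std() * np.sqrt(BUS_DAYS_IN_YEAR))


def most_recent_regime(segments) -> pd.Series:
    # Hypothetical helper: choose the sub-period with the latest start date
    return max(segments, key=lambda s: s.index[0])


# Synthetic example: a flat old regime followed by a profitable recent one
dates = pd.bdate_range("2000-01-03", periods=1000)
rng = np.random.default_rng(1)
noise = rng.normal(scale=0.01, size=1000)
old_values = noise[:600] - noise[:600].mean()            # mean zero
new_values = noise[600:] - noise[600:].mean() + 0.001    # mean 0.1% a day
old = pd.Series(old_values, index=dates[:600])
recent = pd.Series(new_values, index=dates[600:])
full = pd.concat([old, recent])

sr_full = annualised_sharpe(full)
sr_recent = annualised_sharpe(most_recent_regime([recent, old]))
print(round(sr_full, 2), round(sr_recent, 2))
```

In this toy the full sample estimate is dragged down by the old regime, which is exactly the effect we are hoping to avoid when a break is real, at the cost of using fewer data points.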
An optimisation test
- Select 10, 20, 30 or 40 years of in sample data (I need at least 10 years because, with a minimum of five years required for estimation, with anything shorter I either certainly won't find any breaks, or I risk finding a break and not having five years of data left over)
- Select 1 or 5 years of out of sample data
- Pick a random instrument, ensuring there is enough history available (between 11 and 45 years). We will only choose from instruments with sufficient history for the time required.
- Randomly pick N=9 forecasting rules from those available (the same number as in posts #2 and #3)
- Cycle through using no breaks (0% CV), 1% CV, 5% CV and 10% CV
- Estimate SR on the in sample data using either all the data (0% CV), or the data after the last break given some critical value.
- Estimate correlation using all the in sample data
- Use fixed shrinkage levels (estimated here): SR shrinkage 0.5, correlation 0.75 (since we'll always have at least five years of in sample data we don't need to worry about the higher levels of shrinkage required when we have insufficient data). The results won't be much different with any vaguely similar shrinkage.
- Run in sample optimisation and out of sample optimisation on all the options above
- Get the median SR from the distribution of subsamples
- Find the optimal CV with the highest SR
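- Test to see if that median is significantly higher than the others (in the tables below each row is a CV, and NaN in the p-value columns marks the CV with the highest SR, which has nothing to be compared against)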
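The last three steps can be sketched roughly as follows. This is an assumption-heavy toy: the function name `summarise_by_cv` is made up, the data is synthetic, and I'm using a plain two-sample t-test from scipy as a stand-in for the significance test, which is not necessarily the test behind the tables below.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def summarise_by_cv(sr_by_cv: pd.DataFrame) -> pd.DataFrame:
    # sr_by_cv: one row per random subsample, one column per critical value,
    # each cell holding the out of sample SR for that subsample and CV
    median_sr = sr_by_cv.median()
    best_cv = median_sr.idxmax()
    pvalues = {}
    for cv in sr_by_cv.columns:
        if cv == best_cv:
            # Nothing to compare the best CV against
            pvalues[cv] = np.nan
        else:
            # Assumed test: two-sample t-test of best CV versus this CV
            pvalues[cv] = ttest_ind(sr_by_cv[best_cv], sr_by_cv[cv]).pvalue
    return pd.DataFrame({"SR": median_sr, "pvalue": pd.Series(pvalues)})


# Synthetic SRs for 200 subsamples; the 10% CV is made artificially better
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=[0.0, 0.01, 0.05, 0.10])
demo[0.10] += 5.0

summary = summarise_by_cv(demo)
print(summary)
```

The output has one row per CV, with the median SR and a p-value for the comparison against the best CV.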
10 years in sample, one year out of sample
SR pvalue all pvalue distinct
0.00 -0.021 0.247 0.247
0.01 -0.018 0.295 0.295
0.05 -0.019 0.204 0.204
0.10 -0.014 NaN NaN
10 years in sample, five years out of sample
SR pvalue all pvalue distinct
0.00 0.166 0.168 0.168
0.01 0.160 0.003 0.003
0.05 0.167 0.007 0.007
0.10 0.170 NaN NaN
20 years in sample, one year out of sample
SR pvalue all pvalue distinct
0.00 -0.160 0.542 0.542
0.01 -0.160 0.323 0.323
0.05 -0.160 0.050 0.050
0.10 -0.155 NaN NaN
20 years in sample, five years out of sample
SR pvalue all pvalue distinct
0.00 0.123 NaN NaN
0.01 0.116 0.0 0.0
0.05 0.105 0.0 0.0
0.10 0.098 0.0 0.0
30 years in sample, one year out of sample
SR pvalue all pvalue distinct
0.00 0.092 NaN NaN
0.01 -0.019 0.0 0.0
0.05 -0.078 0.0 0.0
0.10 -0.066 0.0 0.0
30 years in sample, five years out of sample
SR pvalue all pvalue distinct
0.00 -0.005 0.001 0.001
0.01 -0.011 0.000 0.000
0.05 -0.008 0.014 0.014
0.10 -0.003 NaN NaN
40 years in sample, one year out of sample
SR pvalue all pvalue distinct
0.00 -0.142 NaN NaN
0.01 -0.249 0.000 0.000
0.05 -0.189 0.000 0.000
0.10 -0.178 0.002 0.002
40 years in sample, five years out of sample
SR pvalue all pvalue distinct
0.00 -0.060 0.177 0.177
0.01 -0.051 NaN NaN
0.05 -0.055 0.000 0.000
0.10 -0.055 0.000 0.000




