Although this post will make more sense if you've read the book, it can also be read independently as I'll be dropping brief explanations in as we go. Hopefully it will whet your appetite!

You can get the code you need from here:

https://github.com/robcarver17/systematictradingexamples/blob/master/optimisation.py

The code also includes a function for generating "expanding window", "rolling window" and "in sample" back test time periods which could be useful for general fitting.

### The problem

*

*You can think of this as a synthetic constant maturity bond, or what you'd get if you held the 20 year US bond future and also earned interest on the cash you saved from getting exposure via a derivative.*

Some of the issues I will explore in this post are:

- This is a **backtest** that we're running here - a historical simulation. So how do we deal with the fact that 10 years ago we wouldn't have had data from 2005 to 2015? How much data should we use to fit?
- These assets have quite different volatility. How can we express our portfolio weights in a way which accounts for this?
- Standard portfolio optimisation techniques produce very unstable and extreme weights. Should we use them, or another method like **bootstrapping** which takes account of the noise in the data?
- Most of the instability in weights comes from having slightly different estimates of the mean return. Should we just assume all assets have the same mean return?

### In sample

Let's begin by doing some simple **in sample** testing. Here we cheat, and assume we have all the data at the start.

I'm going to do the most 'vanilla' optimisation possible:

opt_and_plot(data, "in_sample", "one_period", equalisemeans=False, equalisevols=False)

This is a very boring plot, but it shows that we would have put 78% of our portfolio into US 20 year bonds and 22% into S&P500, with nothing in NASDAQ. Because we're cheating we have the same information throughout the backtest so the weights don't change. We haven't accounted for the uncertainty in our data; nor done anything with our estimated means - this is just vanilla 'one period' optimisation - so the weights are pretty extreme.

Let's deal with the first problem - different volatility. In my book I use the technique of **volatility normalisation** to make sure that the assets we are optimising weights for have the same expected risk. That isn't the case here. Bonds are much less volatile than stocks. To compensate for this they have a much bigger weight.

We can change the optimisation function so it does a type of normalisation; measure the standard deviation of returns in the dataset and change all the returns so they have some arbitrary annualised risk (20% by default). This has the effect of turning the covariance matrix into a correlation matrix.
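As a rough sketch of what this normalisation step does (this isn't the actual code from optimisation.py; the function name and the 256-business-day annualisation factor are my assumptions):

```python
import pandas as pd

def equalise_vols(returns: pd.DataFrame, target_ann_vol: float = 0.20) -> pd.DataFrame:
    """Rescale each column of daily returns so that its annualised
    standard deviation equals target_ann_vol (20% by default)."""
    BDAYS_IN_YEAR = 256  # assumed annualisation convention
    ann_vols = returns.std() * (BDAYS_IN_YEAR ** 0.5)
    return returns * (target_ann_vol / ann_vols)
```

After this transformation the covariance matrix of the scaled returns is, up to the common 20% scale, just the correlation matrix - which is the point made above.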

opt_and_plot(data, "in_sample", "one_period", equalisemeans=False, equalisevols=True)

Now things are looking slightly more reasonable. The weights we are seeing here are 'risk allocations'; they are conditional on the assets having the same volatility. Even if we aren't lucky enough to have assets like that it's more intuitive to look at weights in this vol adjusted space.

However it's still a pretty extreme portfolio. Poor NASDAQ doesn't get a look in. A very simple way of dealing with this is to throw away the information we have about expected mean returns, and assume all assets have the same mean return (notice that as we have equalised volatility this is the same as assuming the same Sharpe ratio for all assets; and indeed this is actually what the code does).

opt_and_plot(data, "in_sample", "one_period", equalisemeans=True, equalisevols=True)

Now we have something I could actually live with. The only information we're using here is correlations; clearly bonds are uncorrelated with equities and get almost half the weight (which is what they'd get with **handcrafting** - the simple, no computer required, method I discuss in my book). S&P 500 is, for some reason, slightly less diversifying than NASDAQ in this dataset, and gets a slightly higher weight.

However what if our assets do have different expected returns, and in a statistically significant way? A better way of doing the optimisation is not to throw away the means, but to use **bootstrapping**. With bootstrapping we pull returns out of our data at random (500 times in this example); do an optimisation on each sample of returns, and then take an average of the weights from each sample.

opt_and_plot(data, "in_sample", "bootstrap", equalisemeans=False, equalisevols=True, monte_carlo=500)

Notice the weights are 'wiggling' around slightly. This is because although the code is using the same data (as we're optimising in sample), it's doing a new set of 500 optimisations each year, and each will be slightly different due to the randomness of each sample. If I'd used a smaller value for monte_carlo then there would be even more noise. I quite like this 'wiggliness' - it exposes the underlying uncertainty in the data.

Looking at the actual weights they are similar to the previous example with no means, although NASDAQ (which did really badly in this sample) is slightly downweighted. In this case using the distribution of average returns (and correlations, for what it's worth) hasn't changed our minds very much. There isn't a statistically significant difference in the returns of these three assets over this period.
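In code, the bootstrap idea sketched above looks roughly like this (a sketch only - the function and parameter names are mine, not those used in optimisation.py, and any one-period optimiser can be plugged in):

```python
import numpy as np

def bootstrap_weights(returns, optimise_one_period, n_draws=500,
                      sample_length=250, seed=42):
    """Average the optimal weights over many resampled sets of returns.
    `returns` is an (n_days, n_assets) array; `optimise_one_period`
    maps a sample of returns to a vector of portfolio weights."""
    rng = np.random.default_rng(seed)
    all_weights = []
    for _ in range(n_draws):
        # draw days at random, with replacement
        idx = rng.integers(0, len(returns), size=sample_length)
        all_weights.append(optimise_one_period(returns[idx]))
    return np.mean(all_weights, axis=0)
```

Averaging over many noisy draws is what smooths out the extreme weights a single 'one period' optimisation produces; fixing the random seed is one way to make the result reproducible from run to run.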

### Rolling window

To begin with let's use 'one period' optimisation with a lookback of a single year.

opt_and_plot(data, "rolling", "one_period", rollyears=1, equalisemeans=False, equalisevols=True)

As I explain at length in my book one year is wholly inadequate to give you significant information about returns. Notice how unstable and extreme these weights are. What about 5 years?

opt_and_plot(data, "rolling", "one_period", rollyears=5, equalisemeans=False, equalisevols=True)

These are a little more stable, but still very extreme. In practice you usually need a lot more than 5 years of data to do any kind of optimisation, and chapter 3 of my book expands on this point.

I won't show the results for bootstrapping with a rolling window; this is left as an exercise for the reader.

### Expanding window

It's my preference to use an **expanding window** (sometimes called *anchored fitting*). Here we use all the data that we have available as we step through each year. So our window gets bigger over time.
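The three schemes (in sample, rolling, expanding) differ only in which slice of history each period's fit is allowed to see. A minimal sketch of a window-generating function (illustrative names and conventions, not the actual function in the repository):

```python
def fit_windows(n_periods, method="expanding", rollperiods=5):
    """Return one (fit_start, fit_end) pair per period, where the fit
    uses data[fit_start:fit_end]. 'in_sample' cheats and sees everything;
    'rolling' sees only the last rollperiods periods; 'expanding'
    (anchored) sees all data up to the current period."""
    if method == "in_sample":
        return [(0, n_periods)] * n_periods
    if method == "expanding":
        return [(0, end) for end in range(1, n_periods + 1)]
    if method == "rolling":
        return [(max(0, end - rollperiods), end) for end in range(1, n_periods + 1)]
    raise ValueError("method must be in_sample, rolling or expanding")
```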

opt_and_plot(data, "expanding", "one_period", equalisemeans=False, equalisevols=True)

These weights are more stable as we get more data; by the end of the period we're only adding 7% more information so it doesn't affect the results that much. However the weights are still extreme. Adding more data to a one-shot optimisation is only helpful up to a point.

Let's go back to the bootstrapped method. This is my own personal favourite optimisation method:

opt_and_plot(data, "expanding", "bootstrap", equalisemeans=False, equalisevols=True)

Things are a bit hairy in the first year* but the weights quickly settle down to non extreme values, gradually adjusting as we get more data as the window expands.

*

*I'm using 250 days - about a year - of data in each bootstrap sample (you can change this with the monte_length parameter). With the underlying sample also only a year long this is pushing things to their limit - I normally suggest you use a window size around 10% of the total data. If you must optimise with only a year of data then you should probably use samples of around 25 business days. However my simple code doesn't support a varying window size; though it would be easy to use the 10% guideline, e.g. by adding monte_length=int(0.1*len(returns_to_bs.index)) to the start of the function bootstrap_portfolio.*

Just to reinforce the point that these are 'risk weightings' here is the same optimisation done with the actual 'cash' weights and no normalisation of volatility:

opt_and_plot(data, "expanding", "bootstrap", equalisemeans=False, equalisevols=False)

### Conclusion

I hope this has been useful both to those who have bought my book, and those who haven't *yet* bought it (I'm feeling optimistic!). If there is any python code that I've used to write the book you would like to see, let me know.

Bootstrapping of returns isn't always a good idea. Such "take the mean over various optimisation problems" approaches are often not useful for sparse portfolios. Assume you have a universe of 500 stocks but can only invest in, say, 50 at the same time. Obviously taking your average over sparse portfolios would destroy this. On top of this it's computationally demanding, and it takes some effort to reproduce exactly the same results (that's important in a backtest). The noise around your average will cause further artificial trading costs. The method you are suggesting reminds me of a book by Michaud. You may want to cite his work. Having the same mean return is just a more extreme form of shrinkage. I guess you could look into a less brutal shrinkage.

Hi Thomas

Great comments.

You're right bootstrapping isn't great for sparse portfolios. However the kinds of portfolios I deal with in my book aren't sparse; generally you'd have an investment in all of them. Still the point is well made.

I think the issues of computational demand are less problematic than they were in the past; 100 or 200 monte carlo runs is enough to get pretty good results.

One could use some kind of buffering or smoothing to reduce the jumps in weights. However it's probably better to overstate trading costs in a backtest. In reality one would probably use the final weights from the backtest; and then recheck annually that they were still pretty close to the latest bootstrapped values. It's also worth bearing in mind that these jumps are much smaller than you'd get from most other methods, except one with a massive amount of shrinkage.

I actually like getting slightly different results when I run a backtest. I think it reminds us that any backtest is just a single random sample from an unknown universe. But then I'm weird like that :-)

I think this is a bit different from Michaud who does something a bit more sophisticated than me, resampling the efficient frontier rather than the weights of a single optimal point. In my book I credit Jobson and Korkie who I think came up with this non parametric method in the 1980's. I'm happy to recredit them here.

I've also used shrinkage in the past. It does require a bit more skill / work, as you need to (a) come up with a prior and (b) decide how much to shrink given the amount of noise in the data. It's my experience that it's easier to get things wrong with shrinkage methods than with bootstrapping.

Hi Rob

This question may be somewhat related to the previous question.

Let's say I'm systematically trading a portfolio of stocks and I have $ available to open N more positions.

From the universe of many/many stocks what metric should I use to select the stocks in order to maintain a balanced (low correlation) portfolio?

What metric am I trying to minimize/maximize?

I feel like I've seen this question posed and answered other places and that maybe it's a standard portfolio composition/optimization question ... but I'm not sure.

Thanks

So you already have X positions, and you want to open up N more. I'm assuming you don't want to do anything with the X positions you already have. This means you are only optimising part of the portfolio.

Basically you want to do a standard optimisation, but hold the weights you already have constant.

If you just want to minimise variance (max diversification if everything is vol normalised), then you can throw away the mean information.

Suppose you had $A and you now have $B more, and you had weights w1, w2, w3, ... So you have (B / [A+B]) of your portfolio left to allocate and (A / [A+B]) will remain fixed.

Then you need to work out the new effective weights R*w1, R*w2, ... where R = (A / [A+B]).

Then to apply the weights you change this line (137 in the .py file):

bounds=[(0.0,1.0)]*number_assets

to bounds=old_weights + [(0.0,1.0)]*number_other_assets

Where number_other_assets is the number of extra positions (N) you could open, and old_weights=[(R*w1-epsilon, R*w1+epsilon), (R*w2-epsilon, R*w2+epsilon), ...]

Where epsilon is small, but bigger than the tolerance (tol=0.00001); perhaps tol*2.
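Putting that together, the bounds list might be built like this (a hypothetical helper, not code from the repository; R and epsilon are as described above):

```python
def build_bounds(old_weights, n_new_assets, ratio, epsilon=2e-5):
    """Bounds for an optimiser that keeps existing positions (almost)
    fixed while freely allocating the rest to new assets.
    `ratio` is R = A / (A + B); epsilon should be slightly bigger than
    the solver tolerance (e.g. tol*2)."""
    # pin each existing weight inside a tight band around R*w
    fixed = [(ratio * w - epsilon, ratio * w + epsilon) for w in old_weights]
    # new assets are free to take any weight between 0 and 1
    free = [(0.0, 1.0)] * n_new_assets
    return fixed + free
```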

Hi Rob - thanks for your feedback.

Hmmm ... interesting. I think you answered my question but am not 100% sure ...

The actual scenario I'm trying to understand how to automate is:

1) I have an empty portfolio and I want to add positions to it until it's full (however I measure that)

2) I close a position(s) and want to add position(s) to the portfolio until it's full

I assume I wouldn't want to fill the portfolio with highly correlated positions - e.g., all semiconductor stocks or if trading futures all equity futures or grains.

I'd like to write some code to automate both 1) and 2) above but am not quite getting the picture of what metric I should be optimizing .. is it covariance?

Maybe I'm over complicating things and a simple heuristic like never have > N% of the portfolio in a single sector (semiconductor, equity futures, grains, etc) would suffice.

The engineer in me wants to optimize some number to make myself feel good that the portfolio is mathematically "well balanced" however you define that.

Hi Rob

The metric you're trying to optimise is Sharpe ratio, just like a standard optimisation; if all assets have the same expected Sharpe ratio and volatility then that will just be a function of the correlation matrix (if my tired old brain is correct, the weights will always be proportional to the inverse of that matrix); i.e. the minimum variance / maximum diversified portfolio.
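That claim can be checked numerically: with equal Sharpe ratios and volatilities, the unconstrained mean-variance solution is proportional to the row sums of the inverse correlation matrix, w ∝ C⁻¹1. A quick illustration (not code from the post; the constrained optimiser can differ when weights hit their bounds):

```python
import numpy as np

def max_diversification_weights(corr):
    """Unconstrained minimum variance weights for vol-normalised assets:
    proportional to the inverse correlation matrix times a vector of ones."""
    raw = np.linalg.inv(corr) @ np.ones(len(corr))
    return raw / raw.sum()
```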

I think the code I have posted will deal with situation (2). But let's try and think of a step by step method, something like:

We have an existing portfolio, with a set of weights, and 'space' for more (i.e. the weights don't add up to 1).

We assume we want to add an asset to that portfolio, with some given weight (so this is a bit different, we're not finding the weights)

To make things easy let's ignore the 'space' we have except what we need for this asset. This makes the problem recursive.

So we have existing weights w1, w2, ..., wn-1. And we have decided to include something with weight wn.

So the problem becomes which new asset N will give me the highest expected sharpe ratio, given weights w1....wn for my portfolio and existing assets 1....n-1.

I could do this with an optimisation, but it's quite unstable. So in practice let's just iterate over all possible assets in my universe which I could add, calculate the Sharpe ratio for each resulting portfolio, and pick the best.

So the solution is something like this, assuming the total size of the portfolio is S, and all assets will have equal weights:

0) Add your first asset. This should be the asset with the lowest average correlation to other assets. You now have 1- 1/S of your portfolio left to allocate.

1) Find the asset which gives you the highest sharpe ratio, given a particular covariance / correlation matrix given a weight on the new asset of 1/S, and existing weights unchanged.

2) Repeat step 1, but stop when the portfolio is 'full' (we have S assets)

If you close one or more positions, repeat step 1 until the portfolio is 'full' again.

Does that make sense? Of course simple heuristics are good too. The 'handcrafted' method is a simple heuristic; and I actually use that, not bootstrapping, in my 'live' portfolio weights.

Hi Rob,

I think situation 2 is really the same as 1. E.g., I was stopped out of all positions at once and I now need to refill the whole portfolio.

I'm going to go through the steps you listed and see if I understand:

0) Add your first asset. This should be the asset with the lowest average correlation to other assets. You now have 1- 1/S of your portfolio left to allocate.

0.1) I assume the portfolio can contain S positions and for this exercise each are equally weighted as 1/S ... OK?

0.2) I assume I'm calculating the correlation between the price series (closes) of the potential new asset and those already in the portfolio ... correct?

1) Find the asset which gives you the highest sharpe ratio, given a particular covariance / correlation matrix given a weight on the new asset of 1/S, and existing weights unchanged.

1.1) I've already selected an asset (the min avg correlation asset) in step 0 and added it to the portfolio ... correct?

1.2) Re "Find the asset which gives you the highest sharpe ratio".

Can you elaborate a bit on what's going in this step as I'm not sure what I'm "finding" here and what I do with it once found. I.e. what's the output of this step as you've already added a new asset in step 0?

1.3) If I'm calculating the Sharpe ratio of asset a-sub-n I assume I'm using the returns of this asset as would have been produced by the system... correct?

Re "The 'handcrafted' method is a simple heuristic; and I actually use that, not bootstrapping, in my 'live' portfolio weights."

- What is the 'handcrafted' method?

Thanks

No sorry I haven't explained it properly. I'll try again, renumbering the steps for clarity and changing the way the weights work (so they don't correspond to my prior explanation. Just wipe that from your mind. Forget I said it.)

Step 0:

You start with an empty portfolio and some number of potential assets P. We want to find S assets to fill our portfolio. P>>S. We have returns for all P assets, so we can construct a correlation matrix (I'm assuming volatility normalisation, and the same expected average return; as we're just focusing on maximum diversification here).

Step 1:

Then in step 1 you add one asset - your first - the most diversifying. To find this asset you get the correlation matrix of all potential assets. Find the average correlation of all assets with all other assets (this is just the average of each column in the correlation matrix, after you've removed the '1's). Pick the asset with the lowest average correlation.

We now have one asset in our portfolio, and S-1 assets left to find out of a pool of P-1. The weight of this asset - for now - is 100%

In step 2 we take our current portfolio. This consists of N assets (here N=1) with existing weights W1....WN. By definition all the existing weights add up to 100% (here W1=100%).

We're going to add another asset. To make space for this we give that asset a weight of (1/(N+1)). All existing assets must also have a weight of (1/(N+1)). To achieve this we need to multiply the existing weights by N/(N+1). For the trivial case of N=1, the new thing gets a weight of 1/2, and we multiply the existing weight W1=100% by 1/2. So the original asset has a weight of 50%, and the new one a weight of 50%.

W1=.5, W2=.5

Okay so now we look at all the assets left over (P-1 in the trivial case). We calculate the expected portfolio sharpe for:

- a portfolio of the original asset with weight 50%, and the first possible candidate asset with weight 50%

- a portfolio of the original asset with weight 50%, and the next possible candidate asset with weight 50%

....

- a portfolio of the original asset with weight 50%, and the last possible candidate asset with weight 50%

We find which of these portfolios has the highest sharpe ratio. We then select the candidate asset which forms part of that portfolio.

We now have two assets in our portfolio, and S-2 assets left to find out of a pool of P-2. Each asset has 50% weight.

Step 3 is very similar to step 2

Our current portfolio consists of N assets (here N=2) with existing weights W1....WN. (here W1=50%, W2=50%).

We're going to add another asset. For N=2, the new thing gets a weight of 1/3, and we multiply the existing weights by 2/3. So the original assets have weights of 50%*2/3 = 33%, and the new one a weight of 33%.

W1=.333, W2=.333, W3=.333

Okay so now we look at all the assets left over (P-2). We calculate the expected portfolio sharpe for:

- a portfolio of the original asset with weight 33%, the second asset with weight 33%, and the first possible candidate asset with weight 33%

....

We find which of these portfolios has the highest sharpe ratio. We then select the candidate asset which forms part of that portfolio.

Step 4 - and so on, until our portfolio has S elements.

I hope that makes more sense. It would probably be more concise to write it in code, but I'm feeling lazy.
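For what it's worth, the steps above could be sketched in Python roughly as follows (an illustrative sketch only: the names are made up, ties are broken by lowest asset index, and this is not the code from the repository):

```python
import numpy as np

def iteratively_build_portfolio(corr, S):
    """Greedily pick S equally weighted assets out of P candidates,
    maximising diversification (equivalent to maximising the Sharpe
    ratio when all assets share the same expected return and vol).
    `corr` is the P x P correlation matrix of all candidate assets."""
    P = len(corr)
    # Step 1: start with the asset with the lowest average correlation
    avg_corr = (corr.sum(axis=0) - 1.0) / (P - 1)  # drop the diagonal '1'
    chosen = [int(np.argmin(avg_corr))]
    # Steps 2+: add whichever remaining asset gives the lowest portfolio
    # variance under equal weights, i.e. the highest Sharpe ratio here
    while len(chosen) < S:
        best, best_var = None, np.inf
        for candidate in sorted(set(range(P)) - set(chosen)):
            trial = chosen + [candidate]
            w = np.ones(len(trial)) / len(trial)  # every asset gets 1/(N+1)
            var = w @ corr[np.ix_(trial, trial)] @ w
            if var < best_var:
                best, best_var = candidate, var
        chosen.append(best)
    return chosen
```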

Aha - you either haven't bought my book or you haven't got to chapter 4 yet! Sorry, I'm going to be awkward and say if you want to learn about the handcrafted method you're going to have to spend some money, or read a bit. Suffice to say it's a simple but effective heuristic method for portfolio optimisation.

First, no I haven't yet purchased your book but have it on my list after reading the review on the Reading The Markets blog - my go-to source for market book reviews.

OK, I almost 100% completely understand ... almost! I'm going to ask a few questions step by step.

Will start with Step 1 now as I have to think a bit about Step 2 before posing questions. Those will probably come tomorrow.

Step-1:

What two time series are you using to calculate the correlation? Assume we're using daily soybeans, corn for this example.

a) Daily closing values of beans, corn

b) Daily returns of beans, corn

c) Log(daily returns of beans), Log(daily returns of corn)

d) something else

Hi Robert,

Glad to hear you are a potential purchaser.

I'd probably use weekly % returns. The reason for % is that it's more stable over time. The reason I use weekly is that using daily tends to understate correlations especially when you're trading across time zones.
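For example, with pandas that correlation might be computed along these lines (an illustrative sketch, not code from the post):

```python
import pandas as pd

def weekly_return_correlation(prices: pd.DataFrame) -> pd.DataFrame:
    """Correlation matrix of weekly % returns: % returns are more stable
    over time, and weekly sampling avoids the understated correlations
    you can get from daily data across time zones."""
    weekly_prices = prices.resample("W").last()
    weekly_returns = weekly_prices.pct_change().dropna()
    return weekly_returns.corr()
```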

Hi Rob,

Thanks for the correlation info.

Re Step 1:

It seems odd, at least to me, that if we're trying to build a maximally diversified portfolio, as mentioned in Step 0, the correlation matrix (built in Step 0) is used only once, in Step 1, and then never again.

It seems that as it's used to pick the initial market and nothing else, one could (seemingly) just pick the initial market at random and get similar results ... yes?

Re Step 2:

Is the method to calculate the portfolio Sharpe detailed in your book?

I'm not exactly sure how to calculate this and it may be a bit involved for this Q&A.

Thanks -- Robert

Hi Robert

You know what, I'm going to do what I should have done to begin with, which is write some code to make the method clear. When I've done that I'll write a little post. Give me a few days...

Rob

Thanks Rob. No hurry as this is one of those things on my "I wonder how to do this" list.

Regards, -- Robert

Hi Robert

Try this

https://github.com/robcarver17/systematictradingexamples/blob/master/iteratedoptimisation.py

Hi Rob,

If one were to tackle this problem by optimizing a portfolio consisting of all of the available assets, and then selecting the subset which had the highest weights in the "universe portfolio" (in your code example, optimize a portfolio of all three assets, and then select the two which had the highest weights), how would that compare to your iteration approach? Would that yield different results and why?

The code fragment works purely on maximum diversification, so let's assume you use the same approach and throw away the asset with the least diversification. They would get similar results, but because of path dependence they wouldn't necessarily end up in the same place.

Hi!

I'm a complete beginner in developing automated trading strategies. The most "sophisticated" tool that I have access to when testing my strategies is walk forward testing, where I can test my strategies on unseen data. Do you think this tool is "enough" for testing the robustness of my strategies?

Thanks for taking your time.


Actually having a clumsy method for testing is probably a good thing. It will make it much more painful to fit multiple variations of a trading rule, and thus reduce the incentive to do so.

Hi Rob!

DeleteWould you be so kind and elaborate what you mean by " a clumsy method"?

Thanks in advance.

The method you described in your original post would be clumsy. Alternatively imagine something that automatically sweeps through your data, fitting automatically and spitting out an overfitted trading rule at the end.

Hi!

When testing strategies - what do you think is the most powerful way to test whether the strategies have a chance of working in real-time trading?

I am not sure I understand the question. Perhaps you can elucidate or give me an example. But I can think of a few ways to make it less likely you will make money in real time trading:

- overfitting

- fitting in sample

- not accounting properly for costs

Any method which avoids these pitfalls will have a better chance of producing good results than one that makes these errors.

Rob

Hi!

Alright - I've read a lot about overfitting. What do you think is the most common way to overfit a system?

About fitting in sample - are "out-of-sample" tests a good way to respond to this matter?

Thanks for answering.

There are two common ways to overfit.

One is where you do it in an 'implicit' way. This is where you manually backtest your system, look at the entire account curve, change it a bit to improve its performance, and repeat.

The second is where you explicitly fit in sample using a complicated method and/or too many parameters. For example suppose you use a neural network to fit on 20 years of data, reserving one year of data for out of sample testing. By luck your network did well on the out of sample year. But this is a meaningless fluke. Chances are you still have a horrendously overfitted strategy.

You should always use expanding or rolling windows for out of sample testing. This will help avoid explicit overfitting.

Hi Rob,

Thanks for this great resource. I have read through your book but I need to read it again to fully grasp all the concepts. I was going to post a question on how exactly to derive the Forecast Scalars in Table 49, p. 285, but I have now found the example spreadsheet on your support site which does exactly that. So thanks!

Keep up the good work.

Andy

Hi Rob,

Enjoying your book and blog. In your book you refer to volatility normalizing your time series before optimizing portfolio weights. How exactly are you doing your vol normalizing?

Thanks!

If I'm optimising the weights of trading rules or sets of trading rules for each instrument then I don't need to do anything - the expected standard deviation will be identical.

If I'm doing it for assets then I would probably measure the standard deviation of the assets over the last couple of years (although in practice for these examples in this post I took the easy option of measuring standard deviation for the whole period before the point in the backtest where I'm calculating the weights).

Thanks, Rob. Just as a follow-up, so are you merely scaling the assets' time series until their historical vols are equal?

Yes

Dear Rob,

DeleteWhat lookback would you suggest to use when voltility scaling a trading rule forecast? In the book you say "recent stdev", then in spreadsheets there is default 36 EMA. Should i take into account the speed the rule trades when normalizing it?

Also, I don't quite understand how you calculate the stddev for the two trading rule forecasts you've provided:

You've got variance = EMA36(ret^2), missing the -EMA36(ret)^2 term, with the return itself being not a true return, but a price difference. Is there any justification for this approach?

Thank you!

In the past I've looked at using a different vol scaling for different speed rules. It doesn't seem to make much sense / make much difference.

When calculating a trading rule forecast for EWMAC we have a price difference in the numerator, so the standardisation in the denominator should also be in price difference units (I assume by 'a true return' you mean % return)

Dear Robert,

Thanks for the wonderful work within your book and here. I'm struggling to grasp the idea of bootstrapping, so please clarify if I'm thinking correctly:

1. Case of in-sample bootstrapping.

opt_and_plot(data, "in_sample", "bootstrap", equalisemeans=False, equalisevols=True, monte_carlo=500)

So you have roughly 2500 data points (10 years) of returns. What you do is draw 250 returns at random with replacement (for year one) and calculate their statistics (mean, std, correlation matrix), then you do that 500 times and take the average of those statistics. Then you do the same thing for years 2 to 10, is this correct?

2. For the case of expanding window

opt_and_plot(data, "expanding", "bootstrap", equalisemeans=False, equalisevols=True).

Pretty much everything is the same - draw 250 samples at random with replacement, first out of 250 data points (1 yr), do that 500 times, find the statistic averages; then out of 500 data points (2 yrs), do that 500 times, find averages, etc. Is this correct?

3. From your personal experience, is an expanding window better than a rolling window?

Thank You!

Yes what you have written is correct.

I prefer expanding windows unless you have a *lot* of data (say 50 years plus)

Hi Robert

ReplyDeleteI really enjoy your book and going to implement this methods on my trading.

In your book on page 167 you describe that we are going to share our capital across a portfolio of subsystems. However, from my backtesting results I see that my subsystems are not always traded. For each of the subsystems there are periods (sometimes pretty long) where there are no trades. If I'm going to use bootstrapping on these "gapped" subsystems, is my result then realistic?

Thanks

Kris

I'm curious: How long are these periods? What kind of trading rules are you using?

It's fair to say the framework works best when you're trading most markets most of the time (I think I'm normally in about half of my markets).

To answer your question I guess the solution here is to use a longer bootstrap window. I default to a year, but if there is a high probability of not getting any returns for a particular asset with a one year draw, then you should increase that.

In the extreme case you can actually have a window size greater than your data length, if you sample with replacement.

Hi Rob,

The periods vary from 1-2 months to sometimes more than a year. But I must confess that I use only one variant of a Trend Following system at this moment. Since I'm new to trading, my first focus was on building my own data management & backtest system and expanding this with a portfolio framework. When this is done, I'm going to focus on the implementation of strategies.

So I think your statement is right that with a mix of strategies there are fewer gaps.

Kris

Hi Rob,

In determining the asset weights for your own system, do you constrain all weights in your optimisation such that they are bound by zero and 1, meaning "no short sales" based on the weights? I believe this is what you do, as your trading rules determine long/short positions. Am I right?

Yes. It makes no sense to give a trading rule or subsystem a negative weight.

One question I am struggling with is the following: if I am performing a mean-variance Markowitz optimisation (via bootstrapping), I believe this will put a downward bias on the weights of winning shorts, so that they are a smaller part of the portfolio even though they are profitable. Why? Because if we constrain the weights to be positive when setting the instrument weights, the assets with negative returns (which could have been profitable shorts) will have very low/zero weights in the optimisations. What is your take on this?

I'm confused. Either (a) you're trying to create a long/short portfolio, in which case weights aren't constrained to be positive, or (b) you're creating a long only portfolio. In case (a) profitable shorts will have just as much chance of getting a decent negative weight as profitable longs. In case (b) they'll get a zero weight, but then you're running a portfolio where you can't go short, or it doesn't make sense to (see previous comment).

I am attempting to apply your approach, in that I (i) determine the weights allocated to each instrument and (ii) determine the weights allocated to each rule. For both (i) and (ii) I apply mean-variance (m/v) optimisation via bootstrapping. Both are assumed to be long only, but the trading rules in (ii) can include shorts. If I allocate instrument weights in (i) via m/v, some instruments will receive a zero weight if they yielded negative returns. However, the system would be more profitable if we included instruments with negative returns, as the trading rules from (ii) would allow us to go short these assets. Is this correct? Also, are there any instruments in your portfolio that receive zero instrument weights?

Step 1: We create trading rules to forecast prices for each instrument. Forecasts can be long or short. (chapter 7 of my book)

Step 2: We combine our forecasts to get one forecast per instrument. This is an optimisation problem, where the weights are the forecast weights, and the inputs are the returns of each trading rule. Weights are bounded below at zero and total 100% (chapter 8)

Step 3: We now have a portfolio of *subsystems*, miniature trading systems, one per instrument. We allocate our capital amongst this portfolio. This is an optimisation problem: the weights are the amount of capital in each instrument subsystem, and the returns are the amount of money each subsystem makes. Weights are bounded below at zero and total 100% (chapter 11)

Notice that at no stage do we allocate capital directly to positions in instruments. Therefore we never allocate a negative weight. A short in an instrument will arise if its combined forecast is negative.

If a trading rule makes good forecasts on the short side for a particular instrument, then it will get a higher weight, and we'll happily short the relevant instrument.

In theory it's possible for an instrument subsystem or trading rule to get a zero weight through bootstrapping, but only if its performance is incredibly bad and it's highly correlated to another rule / subsystem.
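The three steps can be caricatured in a few lines of Python. This is purely my illustrative sketch (names and numbers invented, not code from the book); it just shows how a positive instrument weight can still produce a short position.

```python
def combined_forecast(rule_forecasts, forecast_weights):
    """Step 2: weighted sum of per-rule forecasts for one instrument."""
    return sum(w * f for w, f in zip(forecast_weights, rule_forecasts))

def subsystem_positions(forecasts, instrument_weights, capital):
    """Step 3: capital per subsystem, signed by the combined forecast
    (scaled so a forecast of +10 means a full-sized long position)."""
    return {name: capital * instrument_weights[name] * (f / 10.0)
            for name, f in forecasts.items()}

# one instrument: two rules say long (+10), one says short (-5)
fc = combined_forecast([10, 10, -5], [0.4, 0.4, 0.2])  # net long forecast
positions = subsystem_positions({"US20": fc, "SP500": -10},
                                {"US20": 0.6, "SP500": 0.4},
                                capital=100000)
# SP500 ends up short, despite its positive instrument weight
```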

Hi Rob

Just to say I think your work has been an eye-opener to how the hedge fund world operates. So greatly appreciated.

I have no financial or programming background but have been trying to implement your methods in C sharp, which I know a little. So I have done the variance covariance matrix on random portfolios with different characteristics etc etc. I understand the bootstrapping principle to select the rules but come to an abrupt stop when it comes to optimisation.

The code you use in your example uses method='SLSQP' as part of the markosolver function. Is this something you have coded yourself in Python, or is it something freely available to users of Python?

The other question I have is that the code you exhibit is really just the overall structure. I presume you have coded the detail behind the framework, or is a lot of this already pre-coded "library" code available to users of Python?

Many thanks

Chris

SLSQP (sequential least squares programming) is part of a standard Python package, scipy. Clearly I had to code a lot of stuff, but the Python pandas library handles things like storing and manipulating time series, and I certainly don't fancy programming my own non-linear optimiser.
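For anyone searching: the optimiser lives in scipy.optimize. A stripped-down sketch of a long-only SLSQP call - this is my own minimal minimum-variance example, not the actual function from the post, which also works with estimated means:

```python
import numpy as np
from scipy.optimize import minimize

def min_variance_weights(cov):
    """Long-only, fully-invested minimum-variance weights via SLSQP."""
    n = cov.shape[0]
    start = np.ones(n) / n                    # equal-weight starting point
    bounds = [(0.0, 1.0)] * n                 # no short sales
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]  # sum to 100%
    res = minimize(lambda w: w @ cov @ w, start,
                   method="SLSQP", bounds=bounds, constraints=cons)
    return res.x

cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])                # toy 2-asset covariance matrix
w = min_variance_weights(cov)
print(w.round(3))                             # roughly [0.727, 0.273]
```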

Thanks Rob

Looks like I would save a lot of time by learning python and having a look at your open source code! Can't be that difficult! Will download python tonight!

Chris

Hello Rob,

Do you consider/discuss research into an 'optimal window length', as a function of Sharpe ratio or another metric, for expanding window fitting? If so, in which blog article (or part of your book) is it located? If not, what's your critique of the prospect of doing so?

Also, would it be technically correct to call expanding or rolling window testing 'cross-validation'?

I've never thought about "optimal window length". If you're trading slowly (holding period of weeks or months) then the optimal window length is infinity (or at least far more data than you probably have). If you're a high frequency trader then it's probably a few months.

So as a rule of thumb: optimal window length = holding period * a very big number

Technically speaking an expanding or moving window isn't cross validation, but it is "good" in the sense that our training and testing data are always different.

Thanks for commenting - as far as the question of an optimal window length, I got the notion from a blog here (no affiliation with them whatsoever): robotwealth.com/optimal-data-windows-for-training-a-machine-learning-model-for-financial-prediction/

Not apples-to-apples, as they're going over cross-validation, and on a different trading strategy than what you might use...

The result for optimal window length in that article isn't statistically significant - all the account curves shown are indistinguishable from each other and from noise.

Hi Rob, thanks for the interesting book. Can you comment on the use of fx-adjusted returns versus local returns in the asset allocation example? When bootstrapping weights on a portfolio similar to the book's, as a GBP investor I see FTSE and UK bond weights increase for fx-adjusted returns. I think this probably makes sense, since the bulk of the assets are not GBP, so this diversifies the currency exposure. I appreciate that one could borrow the foreign currencies or use fx derivatives, but that is not available to all accounts. If you go down the route of foreign borrowing, do people account for the cost of repatriating profits and losses ('quanto' in derivatives terminology) in their Sharpe ratios?

Essentially I'm assuming that we mark to market all foreign currency p&l daily; in other words there is an implicit assumption that (a) there is no margin, (b) there is no interest or fx risk on capital, and (c) all returns made in foreign currency are immediately repatriated on a daily basis.

You will appreciate that properly accounting for real world effects on all these things is very difficult, so the question is: how much does it matter?

In terms of a more real world example if I look at the effect of FX + net interest on my own account over the last three years the figures are: 1.6%, 3.2%, -1.1%. That compares to p&l swings of 0% to 50+%.

So the effect is small compared to the variability of returns overall. Indeed because of the way my account is structured, with equities funding cash for futures margin, the effect of FX is probably higher than for an all cash funded account.

I'm currently re-estimating the weights every 21 days using bootstrapping.

I'm taking gross returns that are all normalised/scaled to a min/max of +/-20. Then for each day I take all the historic returns data available up to that date across all (17) assets, and from them randomly sample and combine 20 x 256 day blocks to create one long block. I then make 200 MC runs across that long block and average the samples (256 days each) to get the weights.

When doing this using 200 MC runs, I noticed some fairly big variations between re-estimations. A weight might be 0.15 on one occasion but then 0.5 on the next. I realise this isn't necessarily 'wrong' (returns change) but as an experiment I tried upping the MC runs to 1000 and found that the largest change for individual weights between re-estimations dropped to max of about 0.1 (e.g. 0.15 might become 0.25). (I tried various other numbers of MC runs and found that 1000 was about as 'good' as it got - negligible further change reduction above 1000 and pretty much a linear increase below that down to 200 MC iterations.)

This prompted me to wonder if there was a conceptually 'better' way to do this:

1. Simply increase the MC runs per estimation to 1000

2. Calculate an average of the average weights from each estimation (200 runs) either cumulatively or on a rolling window basis and use that average of averages for the weights

3. Both of the above

Would much value your view on this (even if you think it is just a futile exercise in false precision). Many thanks.

Note: I'm using C not Python (so extra MC cycles aren't very computationally expensive) and the weights are based on the same 6 sets of EMA lengths as in your book, but I'm applying those lengths to low pass filters instead of EMAs (because I noticed that the correlation between LP filters of the same periods as the EMAs was considerably lower).

Increasing the number of MC runs will improve things, but asymptotically (decreasing returns to more runs). I'm not sure whether taking an average of an average is better... I think in the limit and on average the average of X runs of N length should have the same properties as a run of X*N.

So if you can afford it computationally then yes, just increase the number of MC runs.

Rob - thanks for the useful posts, books, etc. Long-time lurker here. Quick(ish) question: when bootstrapping returns, what's the best way to keep the serial correlation (and other such properties, I guess)? Block re-sampling or something else?

Yes, block sampling is the best way.
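A sketch of block resampling, in case it helps anyone: instead of drawing single days, draw contiguous runs of days so that short-run serial correlation survives inside each block. The 20-day block length here is an arbitrary choice for illustration, not a recommendation from the post.

```python
import numpy as np

def block_bootstrap(returns, n_days, block_length=20, seed=None):
    """Resample contiguous blocks of returns (with replacement) until
    we have n_days of data, preserving within-block autocorrelation."""
    rng = np.random.default_rng(seed)
    blocks = []
    total = 0
    while total < n_days:
        start = rng.integers(0, len(returns) - block_length + 1)
        blocks.append(returns[start:start + block_length])
        total += block_length
    return np.concatenate(blocks)[:n_days]

data = np.random.default_rng(0).normal(0, 0.01, 1000)
resampled = block_bootstrap(data, n_days=256, seed=1)
print(len(resampled))  # 256
```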

DeleteHi Rob, I just wanted to double check I was correctly calculating the returns from individual EMA parameter sets. I'm simply taking the volatility standardised asset returns for each day for each asset and multiplying them by the normalised (i.e. scaled +/-20) predictions generated by each EMA parameter set (i.e. (2,8), (4,16)) for each day. Is that correct? (I'm coding this in C rather than using your Python version)

Apologies for the v basic question, but I'm getting v odd portfolio weights from Markowitz for the slowest two parameter sets when only those two alone are being bootstrapped. They are both always zero, even when the mean returns (as calculated above) for both are positive.

Many thanks.

That sounds about right, but without seeing your code it's hard to be sure (and please don't show it to me - I don't look at other people's code, and that goes doubly so for C).
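For what it's worth, my reading of the calculation being described, with made-up numbers (note the forecast must be lagged, so each day's p&l only uses information available at the time):

```python
import numpy as np

rng = np.random.default_rng(0)
vol_norm_returns = rng.normal(0, 1.0, 256)           # daily return / daily vol
forecast = np.clip(rng.normal(5, 10, 256), -20, 20)  # a stand-in forecast, capped at +/-20

# lag the forecast one day: today's p&l uses yesterday's forecast
lagged = np.roll(forecast, 1)
lagged[0] = 0.0

# divide by 10 so an average-strength forecast gives one unit of risk
rule_returns = vol_norm_returns * lagged / 10.0
print(rule_returns.shape)  # (256,)
```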

Hi Rob, I wonder what your thoughts are on incorporating the autoregressive nature of returns in your bootstrapping? How would you go about this? IIRC there are methods which use the residuals of an ARIMA process in the bootstrapping.

Block bootstrapping.

DeleteHi Rob,

I managed to write my own bootstrapper. But currently it has the constraint that all of the instruments must have history going back to a certain point in time (2009).

I suppose that's fine as long as I'm using well-established instruments, but I was hoping to bootstrap some stocks and ETFs that were launched later, say as late as 2015 or so.

So obviously, I cannot estimate correlations between these instruments over the period 2009 through 2014. And any bootstrap iteration that includes any sliver of that date range cannot use the new instruments.

Do you know of a way to bootstrap with a combination of the established and newer instruments yet still have relatively stable weights?

Basically, if I need a weight for a particular instrument in a period (I do a rolling optimisation, so early on in the sample that may not be an issue) and it is absent from the current historical data up to my reference date, then I allocate a pro-rata weight. For example, suppose I have 4 instruments, 2 with data, 2 without. The two with data will get whatever weights the optimisation gives them. The other 2 will get 25% weights. I call these 'cleaned' weights. You can also be a bit more conservative and only allocate, say, half the pro-rata allocation, which in this case would give you 12.5% in each of the instruments with no data.

The issue then becomes what to do when we start to get data for a given instrument where there was none before. The bootstrapped weights will be biased downwards for the instruments without much data, because they don't appear in the samples very much. You can argue this is the 'right' behaviour, since we allocate less to things we don't understand yet. But this is inconsistent with equally weighting when we know nothing at all!

One solution is to apply an exponentially weighted smooth to your weights. I do this anyway to reduce trading costs when weights are re-estimated in a backtest. But the smooth will have to be rather slow to deal with the problem.

Another is to use a linear function to transition between pro-rata 'clean' weights and weights from a sample. So with zero years of data you'd use clean weights; with, say, 5 years of data you'd use the weights from the sample; and with 2.5 years of data you'd use an average of those two sets of weights.
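The pro-rata 'cleaning' described a few comments up can be sketched like this (hypothetical helper name; this is the equal-shares version rather than the conservative half-share variant):

```python
def clean_weights(weights, have_data):
    """Give each instrument without data an equal 1/N share, and scale
    the optimised weights of the rest into the remaining capital."""
    n = len(weights)
    missing = [i for i in range(n) if not have_data[i]]
    if not missing:
        return list(weights)
    each = 1.0 / n                               # equal share per missing asset
    remaining = 1.0 - each * len(missing)        # capital left for assets with data
    total = sum(weights[i] for i in range(n) if have_data[i])
    return [each if i in missing else weights[i] * remaining / total
            for i in range(n)]

# 4 instruments: 2 optimised to 80%/20%, 2 with no data yet
print(clean_weights([0.8, 0.2, 0.0, 0.0], [True, True, False, False]))
# -> [0.4, 0.1, 0.25, 0.25]
```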

Wow, thanks for the info! This leaves room for much experimentation.

Hi Rob,

Thanks for your excellent post.

For the volatility normalisation, how do you apply the "risk weightings" in a real portfolio? Is any adjustment necessary, given that the returns were adjusted by volatility beforehand?

If your risk weightings are r1, r2, ..., then you divide each one by the volatility of that asset to get raw cash weights, and then renormalise the resulting cash weights so they add up to 100%.

So w1 = r1 / s1, ... and then renormalise all w please?

By renormalise I mean: suppose you had risk weights of 50% in each asset, and the risks were 10% and 20% respectively. The raw cash weights are 50%/10% = 500% and 50%/20% = 250%. The total is 750%. Now I divide by that: 500/750 = 66.7% and 250/750 = 33.3%.

(If memory serves, in 'Smart Portfolios' I suggest you multiply by the ratio of target standard deviation to asset risk. But it doesn't really matter, as the target cancels out.)
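The worked example above in code form (a trivial sketch, but it makes the renormalisation step explicit):

```python
def cash_weights(risk_weights, vols):
    """Divide each risk weighting by that asset's volatility, then
    renormalise so the cash weights sum to 100%."""
    raw = [r / s for r, s in zip(risk_weights, vols)]
    total = sum(raw)
    return [x / total for x in raw]

# 50% risk weight in each asset; volatilities of 10% and 20%
w = cash_weights([0.5, 0.5], [0.10, 0.20])
print([round(x, 4) for x in w])  # [0.6667, 0.3333]
```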

Hi Rob,

I hope you're doing well.

I've been exploring the portfolio optimization techniques outlined in your work, specifically the bootstrapping with replacement approach combined with an expanding window. This has led me to a couple of questions regarding the methodology and its application:

Monte Carlo Length Adjustment: In the context of using an expanding window for bootstrapping, would it make sense to increase the length of each Monte Carlo simulation (monte_length) proportionally to the size of the expanding window? This adjustment could potentially capture more of the evolving data characteristics over time. I'm curious about your perspective on this approach.

Optimization Frequency: From the analysis presented, it seems that the optimizer's output weights are kept static on a yearly basis. Could you share more about the rationale behind this decision? Was the frequency of running the optimizer determined based on empirical analysis, theoretical considerations, or a discretionary choice?

Application to Sparse Data: My current project involves data with a weekly frequency, but I only have around 600 weeks' worth of data. Given the relatively limited dataset:

How would you suggest adjusting the frequency of recalculating forecast weights?

Are there any recommendations for setting the window size for the expanding window and the number of bootstrap runs in this context?

Thank you for your time and for sharing your expertise!

"would it make sense to increase the length of each Monte Carlo simulation (monte_length) proportionally to the size of the expanding window? " Yes in fact I would use the rule monte_length = max(length of window, 1 year).

Delete"Optimization Frequency: From the analysis presented, it seems that the optimizer's output weights are kept static on a yearly basis. Could you share more about the rationale behind this decision?" it's very unlikely that we would get any interesting new information with less than an additional year of data and this slows things down a lot. In reality I fit these weights less frequently than annually- almost never.

Delete"Application to Sparse Data: My current project involves data with a weekly frequency, but I only have around 600 weeks' worth of data."

(Pedantically, this isn't sparse data but data with limited history; not quite the same thing.)

" Given the relatively limited dataset:

How would you suggest adjusting the frequency of recalculating forecast weights?" I wouldn't.

"Are there any recommendations for setting the window size for the expanding window and the number of bootstrap runs in this context?" No again I'd use something like max(1 year, available data) for window size and # of bootstraps as many as possible without killing your CPU (though with smaller window sizes you will find you don't need as many bootstraps as there are fewer unique combinations of samples).