Friday, 3 February 2023

Percentage or price differences when estimating standard deviation - that is the question

In a lot of my work, including my new book, I use two different ways of measuring standard deviation. The first method, which most people are familiar with, is to use some series of recent percentage returns. Given a series of prices p_t you might imagine the calculation would be something like this:

Sigma_% = f([p_t - p_t-1]/p_t-1, [p_t-1 - p_t-2]/p_t-2, ....)

NOTE: I am not concerned with the form that the function f takes in this post, but for the sake of argument let's say it's a simple moving average standard deviation. So we would take the last N of these terms, subtract the rolling mean from them, square them, take the average, and then take the square root.
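For concreteness, here's what that looks like in pandas: a minimal sketch, assuming perc_returns is a pandas Series of the percentage returns above, and using an illustrative window length of 30.

import numpy as np
import pandas as pd

N = 30  # illustrative window length

# For each trailing window of N returns: demean within the window,
# square, average, and take the square root (a population standard deviation)
sigma_perc = perc_returns.rolling(N).apply(
    lambda window: np.sqrt(np.mean((window - window.mean()) ** 2)),
    raw=False,
)

# pandas' built-in rolling std does the same job in one line
# (ddof=0 matches the calculation above; pandas defaults to ddof=1)
sigma_perc_builtin = perc_returns.rolling(N).std(ddof=0)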

For futures trading we have two options for p_t: the 'current' price of the contract, and the back adjusted price. These will only be the same in the days since the last roll. In fact, because the back adjusted price can go to zero or become negative, I strongly advocate using the 'current' price as the denominator in the above equation, and the change in back adjusted price as the numerator. If we used the change in current price, we'd see a pop upwards in volatility every time there was a futures roll. So if p*_t is the current price of a contract, and p_t is the back adjusted price, then:

Sigma_% = f([p_t - p_t-1]/p*_t-1, [p_t-1 - p_t-2]/p*_t-2, ....)

The alternative method is to use a series of price differences:

Sigma_d = f([p_t - p_t-1], [p_t-1 - p_t-2], ....)

Here these are all differences in back adjusted prices.

If I wanted to convert this standard deviation into terms comparable with the % standard deviation, then I would divide it by the current price (*not* the back adjusted price):

Sigma_d% = Sigma_d / p*_t

Now, clearly these are not going to give exactly the same answer, except in the tedious case where there has been no volatility (and perhaps a few other odd corner cases). This is illustrated nicely by the following little figure-ette (figure-ine? figure-let? figure-ling?):

import pandas as pd

# px is the back adjusted price series; pxc is the 'current' contract price series
perc = (px.diff() / pxc.shift(1)).rolling(30, min_periods=3).std()
diff = (px.diff()).rolling(30, min_periods=3).std().ffill() / pxc.ffill()
both = pd.concat([perc, diff], axis=1)
both.columns = ['%', 'diff']
both.plot()



The two series are tracking pretty closely, except in the extreme vol of late 2008, and even then they aren't that different.

Here is another one:

That's WTI crude oil during COVID, and there is quite a big difference there. Incidentally, the difference could have been far worse. I was trading the December 2020 contract at the time... the front contract in this period (May 2020) went below zero for several days.

Now, most people are more familiar with % standard deviations, which is why I have used them so much, but what you may not realise is that the price difference standard deviation is far more important.

How come? Well, consider the basic position sizing equation that I have used throughout my work:

N = Capital × τ ÷ (Multiplier × Price × FX rate × σ_% )
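As a minimal sketch of that equation in code (the function and argument names are mine, and I'm assuming τ and σ_% are both annualised and expressed as fractions):

def position_size(capital, tau, multiplier, price, fx_rate, sigma_perc):
    # N = Capital x tau / (Multiplier x Price x FX rate x sigma_%)
    # price is the *current* contract price, not the back adjusted one
    return capital * tau / (multiplier * price * fx_rate * sigma_perc)

For example, with made up numbers, position_size(500000, 0.2, 5, 4000, 1.0, 0.16) gives 31.25, which we'd round to 31 contracts.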

(That equation is the version in my latest book, but very similar versions appear in my first and third books). Ignoring most things, we get:

N = X ÷ (Price × σ_%)

So the number of contracts held is proportional to one divided by the product of the price and the percentage standard deviation estimate. The price here is, if you've been paying attention, the current price, not the back adjusted one. But remember:

Sigma_d% = Sigma_d / p*_t

Hence the position is actually proportional to the standard deviation in price difference terms. We can either estimate this directly, or, as the equation suggests, recover it from the standard deviation in percentage terms, which we then multiply by the current futures price.
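To see why, substitute the conversion into the simplified equation:

N = X ÷ (p*_t × Sigma_%) = X ÷ (p*_t × Sigma_d ÷ p*_t) = X ÷ Sigma_d

The current price cancels out, and we are left with a position that depends only on the price difference standard deviation.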

As the graphs above suggest, in the majority of cases it won't make much difference which of these methods you choose. But for the corner case of prices close to zero, it will be more robust to use price differences. In conclusion: I recommend using price differences to estimate the standard deviation.

Finally, there are also times when it still makes sense to use % returns. For example, when estimating risk it's more natural to do this using percentages (I do this when getting a covariance matrix for my exogenous risk overlay and dynamic optimisation). When percentage standard deviation is required I usually divide my price difference estimate by the absolute value of the current futures price. That will handle prices close to zero and negative prices, but it will result in temporarily very high % standard deviations. This is mostly unavoidable, but at least the problem is confined to a small part of the strategy, and the most likely outcome is that we won't take positions in these markets (probably not a bad thing!).
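As a sketch, assuming sigma_diff is the price difference estimate and px_current is the series of current futures prices (both names are mine):

# Recover a % standard deviation from the price difference estimate,
# using the absolute current price so negative prices don't flip the sign
sigma_perc = sigma_diff / px_current.abs()

When px_current is near zero this ratio blows up, which is exactly the temporarily very high % standard deviation mentioned above.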

Footnote: Shout out to the diff(log price) people. How did those negative prices work out for you guys?



4 comments:

  1. I'm guessing this approach isn't applicable for long time-frame backtesting or measuring vol over long spans? It is not clear to me how we can use back-adjusted prices generated years in advance of the first testing period to compute standard deviation with differences in back-adjusted prices in the numerator and (then held contract) current prices in the denominator. Let's take Soybeans as an example. Back-adjusting prices from 8/31/2023 all the way back to 9/20/85 generates a starting back-adjusted price of -$77.5 on 9/20/85. The current price at that time was $536.50. Given the disparity, we cannot hope to get an accurate measure of volatility for the early years if we use the difference in back-adjusted prices in the numerator and the current price in the denominator. Nor can we use this method to get long-term estimates of volatility, because the further one goes back the greater the potential for price disparity. How does your model deal with this in long term backtesting?

    Replies
    1. I've been thinking and working on this since I wrote it a few days ago. Back adjusted prices are changed for every historical price at each roll. I think that requires that each roll will have its own isolated back-adjusted price series history. This is the only setup I can think of to avoid lookahead bias created by a continuous back-adjusted price series that extends beyond the backtest trade dates. How else could we create a proper EWMAC calculation on continuous pricing? How else could we measure volatility at any point in time in the series using back-adjusted data, given that the methods in the blog entry and book rely on back-adjusted prices? I wrote a script that back-adjusts prices at each roll interval separately, which creates a unique back adjusted series per roll going back to the beginning of data availability for a given futures contract. If I am thinking about this incorrectly, please let me know.

    2. Nope. You can forward adjust if you're worried about forward looking bias. But a well designed trading system that doesn't use absolute price levels will be indifferent between these two methods.

    3. Ok, that's helpful, and I can see now that my idea for keeping separate BAP series is unnecessary.

      Separate question: In the Appendix of AFTS you recommend, for position sizing, the BAP change/currently held % method for standard deviation, and, for risk adjusting forecasts, the daily price changes estimate of standard deviation (not annualised). In this blog post, however, you recommend the price change method for position sizing (I think)? I was confused by this follow-up: "Hence the position is actually proportional to the standard deviation in price difference terms. We can either estimate this directly, or as the equation suggests recover it from the standard deviation in percentage terms, which we then multiply by the current futures price." So σ(d) = σ% (from BAP change/currently held) * Price? I think this means we are indifferent to which of the two methods we use to position size, as long as we adjust the % method to price in your position sizing formula?

