## Wednesday 17 December 2014

### Should billionaires and bricklayers have the same investments?

I permanently cut the risk of my trend following futures trading system last week and invested my profits in bonds, because I was feeling richer. Let me explain why...

### The basics

There are many ways to measure risk. My favourite is the expected daily standard deviation of your portfolio returns, and I usually look at the annualised version of this - multiply by 16. Until recently I was targeting 50% a year (or about 3% a day). I hasten to add that I only have a fraction of my net worth in my futures trading system - this would be an inappropriately high level if it was my entire asset base. I've now cut it to 25% a year, which is closer to the 10 - 20% that most systematic trend following firms target.
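The "multiply by 16" shortcut can be sketched as follows. This is a minimal illustration, assuming roughly 256 business days in a year (so the square root is exactly 16) and returns that are independent from day to day; the function names are made up for this example.

```python
import math

# Assumption: about 256 business days a year, so sqrt(256) = 16.
BUSINESS_DAYS_PER_YEAR = 256
ANNUALISATION_FACTOR = math.sqrt(BUSINESS_DAYS_PER_YEAR)


def annualise_daily_risk(daily_stdev: float) -> float:
    """Scale a daily standard deviation of returns up to an annual figure."""
    return daily_stdev * ANNUALISATION_FACTOR


def daily_risk_from_annual(annual_stdev: float) -> float:
    """Scale an annualised standard deviation back down to a daily figure."""
    return annual_stdev / ANNUALISATION_FACTOR


for target in (0.50, 0.25):
    daily = daily_risk_from_annual(target)
    print(f"{target:.0%} a year is roughly {daily:.1%} a day")
```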

It's a well known result that to be consistent with the continuous Kelly criterion you should target the same standard deviation as you expect your Sharpe Ratio (SR) to be. I'll talk about what the right Sharpe Ratio might be in a moment.

So if you expect a SR of 0.5, you should run at 50% annualised risk. However that is a little bit rich for me, and for any sensible person. Consider the following graph:

 Recovering the Kelly criterion from simulated data. Source: Author's research.

Focusing on the blue line for the moment (apologies to the colour blind, it's the middle one once we get to the right) you can see it peaks at around 0.50, or 50%, which is Kelly optimal for a portfolio (trading system, or long only portfolio with one or more assets in it) with a true Sharpe Ratio of 0.5 as we have here. However suppose you don't know what your true Sharpe is, which is the normal state of affairs.

Suppose you think that your SR is 1.0, in which case you would be betting at a risk target of 100% annualised risk. As the picture shows if the true SR is really 0.5 you would on average lose money in the long run, and in many cases you'd lose a lot. A far safer bet is to run at 'Half-Kelly'. Expecting a SR of 1.0 you'd run at 50% risk. If you thought you'd get a SR of 0.50 as in the graph then 25% annualised risk is fine. This isn't optimal, your average annual return will be about a third less than 'Full Kelly', but it's better than risking too much and ending up on the right hand side of the peak.
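The shape of the curve can be roughly reproduced with the textbook continuous-time approximation, in which the long run growth rate is the Sharpe Ratio times the risk target, less a variance drag of half the risk target squared. This is an assumption for illustration (Gaussian returns, no skew), not the simulation behind the graph.

```python
def expected_growth(true_sharpe: float, risk_target: float) -> float:
    """Approximate long run (geometric) growth rate for an annualised risk target.

    Textbook approximation assuming Gaussian returns: the arithmetic return
    is SR * risk, less a variance drag of risk**2 / 2.
    """
    return true_sharpe * risk_target - 0.5 * risk_target ** 2


TRUE_SHARPE = 0.5  # as in the graph

scenarios = [
    ("half Kelly", 0.25),
    ("full Kelly", 0.50),
    ("Kelly for an assumed SR of 1.0", 1.00),
    ("well beyond that", 1.50),
]
for label, risk in scenarios:
    growth = expected_growth(TRUE_SHARPE, risk)
    print(f"{label:32s} risk {risk:4.0%} -> growth {growth:+.3f} a year")
```

In this idealised version the growth rate peaks where the risk target equals the true Sharpe Ratio, falls to zero at twice that level, and is negative beyond it. The simulated curves in the graph, particularly the skewed ones, will not match these numbers exactly.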

That is the result for a normal asset with symmetric returns. The other two lines show you the results of different kinds of assets. The green line (bottom line on the right) is for a negative skew asset - like a trading system that sells volatility either directly or through running relative value type trades. The red line (top line on the right) is for a positive skew asset like trend following. As you can see the negative skew asset becomes toxic much quicker than the other two.

Overestimating the Sharpe ratio of a negative skew asset and Kelly betting accordingly is a one way ticket to bankruptcy. This is made worse by the fact that these strategies normally have quite low natural volatility, so to get up to the likes of 50 or 100% annualised risk they will need enormous leverage.

In contrast the positive skew asset is relatively benign at larger risk percentages. It's still better to run at the optimal Kelly, and safer to run at half Kelly, but running too much risk isn't quite as damaging.

### What is a reasonable Sharpe Ratio to expect?

All this is well and good but what sort of Sharpe should we expect? Most people would at this point just get some estimates of past returns and volatility, or if you run a trading system you fire up some back test software. Two reasons why you should take what comes of this with a pinch of salt.

Firstly asset returns in the future are unlikely to be as high as they were in the past. Take stocks. Even with the financial crisis over the last 40 years they have done pretty well. A good chunk of that comes from effects that won't be repeated (falling inflation) or could well reverse (rising proportion of GDP as corporate profits, rerating of earnings:price ratios). This also affects trading systems, since if assets have generally been going up then trend following for example will work better.

The second problem is that most back tests are overfitted, unless you've genuinely put in the first set of trading rules you thought of, not looked at the performance, not thrown anything away, and done a pure backward looking optimisation. Even if you do all of these things, chances are you're still using trading rules that somebody else has come up with, using past data or experience.

You can either apply a very sophisticated method, adjusting past asset class performance to take out secular effects and using statistical techniques to estimate the effect of overfitting, or just use a reasonable rule of thumb which is to cut the expected back test performance in half.

In the long only world a SR of 0.2 is likely for a single average stock. For a diversified portfolio of equities you could get up to 0.3. Diversifying across asset classes might get you up to a SR of 0.5. Adding a trading system on top of these numbers could add half again; with a mixture of styles you could probably double this.

For a very well diversified system like mine (45 futures markets over all major asset classes, 8 types of signal over three different styles) a backtested SR of 2.0 translates to an expectation of 1.0.
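Putting the two rules of thumb together (halve the backtested Sharpe Ratio, then bet at half Kelly) gives a quick way to turn a backtest into a risk target. The sketch below is only that arithmetic; the function names and default arguments are assumptions made for the example.

```python
def expected_sharpe(backtested_sharpe: float, haircut: float = 0.5) -> float:
    """Rule of thumb from the text: keep only half of the backtested Sharpe Ratio."""
    return backtested_sharpe * haircut


def risk_target(sharpe: float, kelly_fraction: float = 0.5) -> float:
    """Annualised risk target. Full Kelly equals the Sharpe Ratio; the default is half Kelly."""
    return sharpe * kelly_fraction


backtest_sr = 2.0
realistic_sr = expected_sharpe(backtest_sr)
print(f"backtested SR {backtest_sr} -> expected SR {realistic_sr}")
print(f"half Kelly risk target: {risk_target(realistic_sr):.0%} a year")
```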

Unless you're in high frequency world, and benefiting from low latency technology or have market maker advantages, then I don't believe a SR above this is realistic.

So far I haven't justified why I cut my annual risk percentage by half, since if I was expecting a Sharpe of 1.0 then my 50% target was probably okay. So now we need to think about how wealth influences risk taking.

### Should wealth determine the amount of risk you take, and the kinds of investments you have?

Economic theory generally assumes constant relative risk aversion. This would imply that wealth doesn't affect your desire for risk. A bricklayer who somehow managed to become a billionaire would maintain the same level of risk as a percentage of their portfolio. Financial theory also assumes that everyone should have the same portfolio of investments, with the highest possible Sharpe Ratio, and then leverage as required to get the risk they want.

I am not picking on bricklayers for any reason, except for the alliterative opportunities they offer here.

In practice this doesn't seem to happen. For example under prospect theory the bricklayer would probably become more risk averse as they get richer, for fear of losing their new found gains. Secondly most people also aren't comfortable using leverage, except when buying residential property.

Imagine you're a 64 year old bricklayer, who will be retiring next week. You only have a state pension and no other investments, except £10,000 in cash. Economically you own an annuity (the pension) worth perhaps £180,000 plus the cash which is 5.3% of your net worth.

Is the best use of £10,000 to invest it in a Sharpe ratio 1.0 opportunity which will return 10%, or to buy lottery tickets? The latter is more likely and also makes more sense. The £1,000 of expected profit isn't going to make any difference at all (adding 0.53% to wealth, and if invested risk free about the same to income). But on the 2 million to one or so chance of hitting a lottery jackpot and winning £10 million, the bricklayer would be much better off.

Point one: people who don't / can't use leverage and need / want high returns will pay for risky investments - lottery tickets, growth story stocks, 100-1 horses - even if they have a negative expectation.

If the bricklayer could infinitely leverage up his £10,000 the lottery ticket would make no sense as it would be dominated by a leveraged form of the SR 1.0 investment. This could net him £10 million (a leverage factor so large I can't be bothered to work it out) with a positive expectation. But that would be well beyond half or even full Kelly. Betting at half Kelly - five times leverage - would still only expect to earn £5,000 again - not enough to make a huge difference (2.5% of wealth). It's more likely the builder will bet beyond half or even full Kelly, even if they don't go all the way to lottery like levels.
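The bricklayer's numbers can be checked with a few lines of arithmetic. This sketch takes the figures from the text (£180,000 pension value, £10,000 cash, a SR 1.0 investment returning 10%) and assumes, for simplicity, that leverage is free; the variable names are made up for the example.

```python
PENSION_VALUE = 180_000      # economic value of the state pension, from the text
CASH = 10_000
net_worth = PENSION_VALUE + CASH

SHARPE = 1.0
UNLEVERAGED_RETURN = 0.10                        # "will return 10%"
unleveraged_risk = UNLEVERAGED_RETURN / SHARPE   # implied annualised risk of 10%


def expected_profit(leverage: float) -> float:
    """Expected annual profit on the cash at a given leverage (financing costs ignored)."""
    return CASH * UNLEVERAGED_RETURN * leverage


half_kelly_risk = 0.5 * SHARPE                   # half Kelly: risk target of half the SR
half_kelly_leverage = half_kelly_risk / unleveraged_risk

print(f"cash as a share of net worth: {CASH / net_worth:.1%}")
for label, leverage in [("no leverage", 1.0), ("half Kelly", half_kelly_leverage)]:
    profit = expected_profit(leverage)
    print(f"{label}: {leverage:.0f}x, expected profit £{profit:,.0f} "
          f"({profit / net_worth:.1%} of net worth)")
```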

Point two: people who have a low level of financial wealth, which is dominated by other income, will often use too much leverage or go for riskier investments.

Now suppose you are a billionaire, with a billion quid, and 5.3% or £53 million spare. You could certainly afford to throw it away on lottery tickets, or buy a football team, both of which have negative expectation. However it's much more likely that you will put it into the SR 1.0 investment - that after all is how you became rich, not by making stupid financial decisions but by making good ones.

Or maybe you inherited the money, in which case good decision to be born to the right parents. Go you!

I also think it's much more likely that you will be very cautious, investing at most half-Kelly, and probably not even leveraging at all. You don't need the extra income, so preserving your wealth is more important than taking additional risk to get it. This is why rich people like investments with consistent returns. Prospect theory tells us that fear of losing new found wealth makes people more risk averse than if they are trying to recover losses.

This also opens things up for the billionaires. They can invest in high SR, but low return, investments that other people would spurn.

This effect applies to all levels of wealth. Now hopefully you understand why after a good run on the futures markets I wanted to lower the risk of my portfolio, by scaling back on my leveraged derivative exposure and putting the money into relatively low risk bonds.

Point three: As people get more wealth they become risk averse, able to invest in low risk but high SR investments, and they use less leverage.

### Let's get a bit more sophisticated...

Apart from risk preferences can we say anything else about preferences for different wealth levels? I was inspired to write this post by the following which also generated some discussion with my ex colleague Matt. The paper argues that wealthier investors are more likely to be 'value' investors, whereas others are 'momentum' investors.

By cutting my exposure to momentum (which I did before reading about the Lettau et al paper) I have definitely followed this track, at least to a degree.

The authors postulate that investors with different wealth levels are hedging different risk exposures that they already have.

"Thus shareholders in the bottom 90% of the wealth distribution may seek to hedge risks associated with an increase in the capital share by chasing returns and sticking to stocks whose prices have appreciated most recently. On the other hand, those in the top 10%, such as corporate executives whose fortunes are highly correlated with recent stock market gains, may have compensation structures that are already momentum-like. These shareholders may seek to hedge their compensation structures by undertaking contrarian investment strategies that go long in stocks whose prices are low or recently depreciated."

There may be other reasons. It's possible we can wrap this up with what we already know from above. Pure value strategies are relative value, exactly the kind of high SR, naturally low risk strategy that rich people like. Momentum strategies tend to have higher natural risk, due to low futures margin and the positive skew that means you can safely run higher risk targets.

Another explanation relates to liquidity. Billionaires are more likely to be owners of 'patient capital', money that can be tied up for years or decades in family trusts. Value strategies - buying stuff that's cheap - particularly illiquid stuff like private equity or land - do better if they don't have to suddenly liquidate after losses due to redemptions by impatient investors. Again momentum strategies tend to be in more liquid futures, which for the common or garden retired investor who relies on regular returns for income is a good thing.

### Concluding thought

Although the story in the paper is an interesting one, and might have some truth to it, ultimately having a good mix of investment styles is undoubtedly better than favouring one or another, and will give you a higher Sharpe Ratio overall. So although getting a little bit richer might be a good excuse for reducing your risk appetite and leverage, it doesn't justify trusting all your money to one investing style.

## Monday 8 December 2014

### Why you need two systems for running automated trading strategies

Running a fully automated trading strategy requires very little time. Apart from the 6 months or so of flat out coding you need to do first of course. Before doing this coding there is a chicken or the egg question to resolve. Do you write backtesting code and then some extra bits to make it trade live, or do you write live trading code which you then try and backtest?

### Some background

If you haven't had the pleasure of writing an automated trading strategy, perhaps because you use prebaked software like "Me Too! Trader" or an online platform such as https://www.quantopian.com/, you may wonder what on earth I am talking about.

The issue is that there are two completely different user requirements for what is usually one piece of software. The first user is a researcher. They want something highly flexible that they can use to test different, and novel, trading strategies; and to simulate their profitability by "backtesting". Any component needs to be interactive, dynamic and easy to modify.

The next user - let's call them the implementor - does not rate flexibility, indeed it may be viewed as potentially dangerous. They want something that is ultra robust and can run with minimal human intervention. Every component must be unit tested to the eyeballs; modifications should be minimal and rigorously tested. Interaction is strongly discouraged and should be limited to reading diagnostic output. The code needs to be stuffed full of fail safes, "what ifs?", and corner case catchers.

Ultimately you won't benefit from a systematic trading strategy unless both users are happy. You will end up with a product which is either untested with market data and which may not be profitable (unhappy researcher), or with one which should be profitable but is so badly implemented it will either crash daily or produce fat finger class errors and buy 10e6 too many contracts (unhappy implementor).

Weirdly of course if, like myself, you're trading with your own money these users are the same person!

### Ideas have to be tested

In the vast majority of cases the backtest code comes first, for the same reason that when it comes to building a new car you don't just weld together a bunch of panels and see what they look like; you get out your little clay model (or in this less romantic world, your CAD package). Pretty much every design discipline uses a 'sandbox' environment to develop ideas. Important fact: The people playing in the sandpit aren't usually professionally trained programmers (including yours truly).

Either in a greenfield corporate context, or if you are developing your own stuff, the first thing you will do is write some code that turns prices or other data into positions; and then a little routine to pretend you were actually trading live in the past to see how much money you did, or didn't make.
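A first pass at that kind of sandbox code might look something like the sketch below: one function that turns prices into positions, and one that pretends to have traded them in the past. The trading rule, the function names, and the prices are all made up for illustration, and costs are ignored.

```python
from typing import List


def positions_from_prices(prices: List[float], lookback: int = 3) -> List[int]:
    """Toy trend rule: long (+1) above the trailing average price, short (-1) below it."""
    positions = []
    for i in range(len(prices)):
        if i + 1 < lookback:
            positions.append(0)              # not enough history yet
            continue
        window = prices[i + 1 - lookback: i + 1]
        average = sum(window) / lookback
        if prices[i] > average:
            positions.append(1)
        elif prices[i] < average:
            positions.append(-1)
        else:
            positions.append(0)
    return positions


def simulate_pnl(prices: List[float], positions: List[int]) -> List[float]:
    """Pretend we traded in the past: yesterday's position earns today's price change."""
    return [positions[i - 1] * (prices[i] - prices[i - 1]) for i in range(1, len(prices))]


prices = [100.0, 101.0, 103.0, 102.0, 99.0, 98.0, 100.0]   # made-up prices
positions = positions_from_prices(prices)
pnl = simulate_pnl(prices, positions)
print("positions:", positions)
print("total profit:", sum(pnl))
```

Note that the position is lagged by one period before it earns anything, so the simulation never trades on a price it could not have seen.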

If you're sensible then you might even have some of your core mathematical routines tidied up and unit tested so they are properly reusable. You can try and modularise the code as much as possible, so running a different trading rule just involves repointing one line of code. You could get quite fancy and have code that is flexible and is configurable by file or arguments, rather than "configuration" by script. Your simulation of backtested performance can get quite sophisticated.

At some point though you're going to want to run real money on this.

### Productionization - bringing in the grownups

This simulation code isn't normally up to the job of running with real money. In theory all you need to do is write a script that runs the simulation every day / hour / minute and then another piece of code that turns the output of that into actual real live trades.

I suspect most people who are running their own money go down this path. However I would estimate that only 10% of my own code base (of which more in a moment) is needed to run a simulation. What that means in practice is you start with code that isn't sufficiently robust to run in a fully automated way (because it's missing most of the other 90%) and if you're lucky you end up with a vast jerry built structure of things tacked on when you realise you needed them.

If you are trading your own money and not interested in the machinations of corporate fund management politics you'll probably want to skip ahead to 'Two systems'.

Alternatively what tends to happen next in a corporate context is some proper programmers get brought in to productionize the system. The simulation code is normally treated as a specification document, and a seriously incomplete and badly written one at that, rather than as a prototype. The rest of the spec, which is the stuff you need to do the 90%, then has to be written by the implementor.

The result is a robust trading system, but one on which it's now impossible to do any research. The reason is that it's very hard to unpick the 10% of code that can be mucked about with, muck about with it and then re-run it to see what will happen.

### When lunatics run the asylum: need for innovation

What usually happens next is that the research user comes up with some clever idea that the solid monolithic tank like existing production code isn't capable of doing. Given that most quant finance businesses have an oversupply of clever people with clever ideas, and an under supply of people who can actually make things work properly, they will then be faced with a choice. Either wait for many months for some programming talent to become available, or try and twist the arm of management to let them implement the simulation system with real money.

Most quant finance businesses are run by quants (Which you might think is the natural order of things. But being very clever and insightful AND being a great business person are quite unusual skills to find in the same person. Perhaps it is sometimes better to have the business run by a glorified COO whilst you stick to what you're good at, which is usually the cool and fun stuff. Tech company bosses also take note). Which means that the simulation system ends up being used to trade real money, despite this being an insane idea. By the way having quants in charge is also why there is an under supply of builders AKA programmers versus architects AKA researchers in these businesses. That and separate reporting / manpower budget lines for CTO's.

Anyway the bottom line is that rather than modify the existing production code to do the new new thing the programmers then often have to work with a hacked up backtest pretending to be a swan like production system. But because this is actually running real money it's treated more as a prototype than a badly written spec. This means a lot of crud gets ported across into the production system, and the process of productionizing takes a lot longer.

Eventually we end up with a robust system again. Until that is some bright spark has another clever idea, and the cycle begins again.

### Two systems: An aside on testing and matching

One way - indeed the best way - of dealing with this is to keep your two code bases completely separate. Once your bright idea is fully developed then you show the programmers your code. After laughing hard at your pathetic attempt they then incorporate it into the production system. You then continue to use your simulation code.

You can also do this as an individual, although you probably won't laugh at your own code, not if you've just written it anyway. As an individual programmer and trader having to maintain two systems is also a serious time overhead, but ultimately worth it.

Back in the corporate world an obvious problem with this is you still have the bottleneck of needing enough programmers to keep up with the flow of wonderful ideas. However at least they aren't wasting their time trying to deal with hurriedly rewriting cruddy simulation systems that are already running real money before they blow up.

A slight problem with this is that you have created two ways to do something. Corporate types running systematic fund businesses have an unhealthy obsession with things being 'right'. You have to prove that the position coming from your production system is 'right'. If you have a simulation the most obvious way of doing this is to run that and crosscheck them. In this way the simulation becomes a glorified integration test of the production code.

This is a recipe for tens of thousands of person hours of wasted time and effort trying to work out why the two are slightly different. This is completely stupid. For starters there is no 'right'. All trading rules are guesses anyway. Under this logic a trading rule that did exactly what it was 'supposed' to do, but lost a billion dollars would be better than one which was a bit wayward but which made the same amount in profit.

Second of all this is a very stupid way of testing anything. You should have a spec as to what the trading system should do. In case it isn't obvious, I don't think the simulation code should be the spec. At best it's a starting point for writing the spec. But should you reproduce a bug in the simulation if it isn't what was intended? No. You should find out what's intended, write it down, and that is what you should implement. You then write tests to check the production code meets the spec. And mostly they should be unit tests. This is very obvious indeed to anyone working in any kind of other industry where you build stuff after prototyping it.
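As an illustration of testing against a written spec rather than against the simulation, here is a minimal sketch. The two spec items (no forecast means no position; the position never exceeds a cap) and the function `position_from_forecast` are hypothetical, invented purely for this example.

```python
import math
from typing import Optional


def position_from_forecast(forecast: Optional[float], max_contracts: int) -> int:
    """Hypothetical production function: whole contracts, capped in both directions."""
    if forecast is None or math.isnan(forecast):
        return 0                                   # spec item 1: no forecast, no position
    capped = max(-max_contracts, min(max_contracts, forecast))
    return int(round(capped))                      # spec item 2: never exceed the cap


def test_missing_forecast_gives_flat_position():
    assert position_from_forecast(None, max_contracts=10) == 0
    assert position_from_forecast(float("nan"), max_contracts=10) == 0


def test_position_is_capped():
    assert position_from_forecast(1e6, max_contracts=10) == 10
    assert position_from_forecast(-1e6, max_contracts=10) == -10


def test_ordinary_forecast_is_rounded():
    assert position_from_forecast(3.4, max_contracts=10) == 3


# Run each spec test; a test runner such as pytest would normally collect these.
for test in (test_missing_forecast_gives_flat_position,
             test_position_is_capped,
             test_ordinary_forecast_is_rounded):
    test()
print("all spec tests passed")
```

Each test maps to a line in the spec, so a failure says which intended behaviour has been broken, rather than just that two systems disagree.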

### Production first?!?!

When it came to writing my own trading system about a year ago I did something radical. Since I knew exactly what I wanted to implement, I just sat down and wrote the production code. Of course I was in the unusual position of having already designed enough trading systems to know what I wanted to do, albeit in a corporate context and I had never written an end to end production system before.

I don't think writing production first is unattainable even if you don't know exactly what you're going to do. If you have the pleasure of working in a greenfield setting you have two main jobs to do. The first is to write a production system, and the second is to come up with some new and profitable ideas. Don't wait until you've come up with ideas to hire your proper programmers, hire them now. Get them to code up a simple trading rule in a robust production system. Meanwhile you can do your clever stuff. Occasionally they will come and confront you with questions, and hopefully this will force you to direct your cleverness in the direction of clarifying what your investment process might end up being.

Similarly if you are writing your own stuff then it might be worth coding the simplest possible production system first before you do your research. You could even do both jobs in parallel. It's quite nice being able to shift to doing some hard core econometrics when you've been coding up corner cases for trading algorithms, and some mindless script writing can be just the ticket when you are stuck for inspiration and the great trading ideas just aren't coming.

If your code is modular enough you should be able to subsequently write the simulation code from production rather than vice versa. The simulation code will just be some scaffolding around the core trading rule part of your production code (the 10% bit, remember?).

With my own system I did get round to doing this, but only after I'd been trading for 6 months. But to be honest I don't really run my simulation code that much, and I certainly don't check it against my production code. It's only used for what it should be used for - a sandbox for playing in. If I come up with any new ideas then I'll then have to go and implement them in the production code. So I am firmly in the two system world, although I approached it from the other direction than what we normally see.

### Nirvana?

No not the early 90's grunge band, but the idea of some perfect system existing that can do both. A giant uber-system which can meet both requirements. I don't think such a nirvana is attainable, for a couple of reasons. Firstly the work involved is substantial - I would estimate at least four times as much as developing a separate production and simulation system.

Secondly, in a corporate context, there is usually too big a disparity in the needs of different users particularly on the research side. Often there is a temptation to over specify the flashy aspects of the project, such as the user interface having lots of interactive graphics. This often happens because the senior managers with the authority to order such large IT projects haven't done much coding for a while and need more of a point and click interface.

### Small steps

I don't believe in the fairy story of 'one system to rule them all'. Instead I believe that two systems probably works best, but with some sensible code reuse where it makes sense. Here are some of the small steps you can take.

As I've already mentioned your core utilities, like calculate a moving average*, should be shared, and tested to death, so you can trust them.

* Okay bad example, since I get pandas to do this for me. But you get the idea.
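A shared, well tested utility of this kind might look like the sketch below. It is written in plain Python so that it runs without dependencies; the function name and the choice to return `None` until a full window is available are assumptions made for the example.

```python
from typing import List, Optional, Sequence


def moving_average(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until a full window is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            result.append(sum(values[i + 1 - window: i + 1]) / window)
    return result


# The sort of checks that let both code bases trust the utility.
assert moving_average([1.0, 2.0, 3.0, 4.0], window=2) == [None, 1.5, 2.5, 3.5]
assert moving_average([], window=3) == []
assert moving_average([5.0], window=1) == [5.0]
```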

You can't possibly reuse code unless you have good modularity. The wrapper around the 10% of my production code that is reusable for simulations looks like this:

data1 = get_live_data_to_do_step_one(*live_data_args)
config1 = get_live_config_to_do_step_one(*live_config_args)
diag = diagnostic(*live_diag_args)  # defines where live diagnostics are written to

output1 = do_step_one(data=data1, config=config1, diag=diag)

data2 = get_live_data_to_do_step_two(*live_data_args)
config2 = get_live_config_to_do_step_two(*live_config_args)
output2 = do_step_two(somedata=output1, moredata=data2, config=config2, diag=diag)

Hopefully I don't need to spell out how the simulation code is different, or how it would be hard to replace step one with a different step one in a research context if the code wasn't broken down like this.
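For readers who would like it spelled out anyway, here is a deliberately tiny, runnable sketch of the idea. The toy `do_step_one` and `do_step_two` bodies, the `run` and `run_simulation` helpers, and the config keys are all invented for illustration and are much simpler than the pseudo code above: the point is only that the core steps are shared, while the data, config and diagnostics are supplied by different scaffolding.

```python
from typing import Callable, Dict, List

Config = Dict[str, float]
Diag = Callable[[str], None]


# --- shared core (the reusable "10%") ---------------------------------------
def do_step_one(data: List[float], config: Config, diag: Diag) -> float:
    """Toy forecast: scaled difference between the latest and the first price."""
    forecast = config["scalar"] * (data[-1] - data[0])
    diag(f"step one forecast={forecast}")
    return forecast


def do_step_two(forecast: float, config: Config, diag: Diag) -> int:
    """Toy position: the forecast rounded to whole contracts and capped."""
    cap = int(config["max_position"])
    position = max(-cap, min(cap, round(forecast)))
    diag(f"step two position={position}")
    return position


# --- scaffolding: only the data, config and diagnostics differ --------------
def run(data: List[float], config: Config, diag: Diag) -> int:
    return do_step_two(do_step_one(data, config, diag), config, diag)


def run_simulation(history: List[float], config: Config) -> List[int]:
    """Replay history, giving the core only the data available at each point."""
    log: List[str] = []                      # simulated diagnostics stay in memory
    return [run(history[: i + 1], config, log.append) for i in range(1, len(history))]


config = {"scalar": 2.0, "max_position": 3}
live_position = run([100.0, 101.0], config, print)        # live: diagnostics to stdout
sim_positions = run_simulation([100.0, 101.0, 99.0, 100.5], config)
print(live_position, sim_positions)
```

Swapping in a different step one for research then means passing a different function into the same scaffolding, without touching the production wrapper.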

Try and separate out the parts that do all the corner case and type testing from the actual algorithm. The latter part you will want to play with and look at. This does however mean you can't have a simple 'doughnut' model of production and simulation code, where there is just a different 'scaffolding' around a core position generation function (which I realise is what my pseudo code implies...). It needs to be more dynamic than that.
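One way to keep the checking apart from the algorithm is sketched below: a small pure function holds the trading logic, and a separate wrapper does the type and corner case handling before calling it. Both functions and the rule itself are hypothetical, made up for this example.

```python
import math
from typing import List


def raw_forecast(prices: List[float]) -> float:
    """The algorithm itself: short enough for a researcher to read, plot and change."""
    return prices[-1] - sum(prices) / len(prices)


def checked_forecast(prices: List[float], min_history: int = 3) -> float:
    """Production wrapper: the type and corner case checks live here, not in the algorithm."""
    if not isinstance(prices, list):
        raise TypeError("prices must be a list of numbers")
    clean = [p for p in prices
             if isinstance(p, (int, float)) and not math.isnan(p)]
    if len(clean) < min_history:
        return 0.0                       # too little usable data: stand aside
    return raw_forecast(clean)


print(raw_forecast([1.0, 2.0, 3.0]))
print(checked_forecast([1.0, float("nan"), 2.0, 3.0]))
print(checked_forecast([1.0]))
```

In a research session you would call `raw_forecast` directly on data you have already eyeballed; production only ever goes through the wrapper.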

Don't make stuff reusable for the sake of it. For example I toyed with creating a fancy accounting object which could analyse either live or simulated profitability. But ultimately I didn't think it was worth it, just because it would have been cool. Instead I wrote a lot of small routines that did various small analyses, that I could stick together in different ways for each task.

As well as code reuse you can also have data reuse. It doesn't make any sense to have two databases of price data, one for simulation and one for live data. If there are certain prebaked calculations that you always do, such as working out price volatility, then you should have your production system work them out as often as it needs to and dump the results where the rest of your system, including your simulation code, can get it.

### Go forth and code

That's it then. Hopefully I've convinced you that the two system model makes sense. Now if you will excuse me I'm going to go and hack some back testing code ....

## Thursday 20 November 2014

A few years ago I was working at a quant hedge fund with the glorious job title of Head of Fundamental Trading Models. It sounds grand I know, but it was just three of us: me, a mad Canadian guy and a madder Italian guy; occasionally augmented with some poor interns. When we weren't building and managing trading strategies we used to amuse ourselves with turf wars with the Head of Technical Trading Models, and his merry band of rocket scientists.

Fast forward to the present day and I entrust a big chunk of my net worth to a purely technical trading system (the rest is in a bunch of shares and exchange traded funds which I 'manage', or rather idly neglect, on a vaguely discretionary fundamental basis).

So which is better?

#### First some context

Systematic trading strategies vary in the source of data they use, using either technical or fundamental information. Strategies which are technical use only prices as an input. Technicians believe that all necessary information is already impounded in the market price, and other inputs are futile.

Non price, fundamental data, comes in two main flavors. Micro data is about a specific asset, for example the yield of a particular bond or the Price Earnings ratio of a company's stock. Macro data is about entire economies and could include inflation or GDP growth. There are also forecasts available for many kinds of fundamental data.

It's time for the big head to head: Technical vs Fundamental systematic trading systems. Who will be victorious?

### Data collection and cleaning

#### Technical:

We get a price. Sometimes it might be a bit dodgy, but given an estimate of typical variability of price between collection points, we can set up simple automatic filters that will catch unexpected moves.
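A filter of this kind can be very simple. The sketch below flags any price whose change since the last collection point is many times the typical move; the function name and the threshold of 8 typical moves are assumptions for illustration, not the values I actually use.

```python
def is_suspect_price(new_price: float, last_price: float,
                     typical_move: float, max_multiple: float = 8.0) -> bool:
    """Flag a price change that is many times the typical move between collection points.

    typical_move is an estimate of the usual absolute price change over the
    collection interval, for example a standard deviation of past changes.
    """
    if typical_move <= 0:
        raise ValueError("typical_move must be positive")
    return abs(new_price - last_price) > max_multiple * typical_move


print(is_suspect_price(100.5, last_price=100.0, typical_move=0.5))   # ordinary move
print(is_suspect_price(110.0, last_price=100.0, typical_move=0.5))   # needs a human look
```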

#### Fundamental:

We get a price. And a whole bunch of other stuff. Accounting ratios for stocks, interest rates, inflation,... the list goes on. So if I have 10 variables for each instrument I'm trading, then I need to collect 10 times as much data. Right now I spend about two minutes per month checking each of the 50 plus futures contracts I trade, or roughly 5 minutes a day in total (basically when the aforementioned filters are triggered). Do I really want to make that 50 minutes a day? When would I have time to watch TV?

### Data bias and manipulation

#### Technical:

It's a price. Unlike certain other financial variables, which will remain nameless, traded prices are mostly hard to manipulate or generally muck about with. Well okay, there are some games you can play, but they don't move prices enough to seriously interfere with the signal of a trading system.

#### Fundamental:

Anyone who thinks that accounting data can't be manipulated needs to read The Smartest Guys in the Room, at least twice. Economic data is probably reasonably okay, at least in developed countries. But it can be biased. And there are these lovely indices which try and predict / encompass other fundamental variables, which at some point in the past had their weights fitted with an in sample regression.

### Data frequency, lags and revisions

#### Technical:

Even if you have no money you can get a price with a 15 minute delay free from at least 8 million websites on the internet. If you're prepared to stump up for a live feed, or your broker will give you one in exchange for the vast amounts of commission you are paying them, you can get your price within milliseconds. And once you have the price, it isn't going to get changed. And within a fraction of a second of getting it, you can get a new price (well it might not have moved, but in principle you can get a new one).

#### Fundamental:

You want GDP? Sorry you're going to have to wait sir, it's only quarterly. Yes, four times a year, the clue's in the name. It's the end of the quarter? You're going to have to wait until a few weeks have elapsed. We need to add a bunch of stuff up, or something. And just when you think you've got it, we're going to change it in 3 months' time. Then we'll change it again. And again. And so on for several years. You want to know what the original number was, so your backtest isn't biased? Sorry we don't have it. Or we'll charge you an arm, a leg, and a kidney for it.
By the way did you know that 10 years ago the publication delay was 3 months, not 3 weeks? You'll have to factor that into your backtest as well. If you can even find that detail out...
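To make the publication delay point concrete, here is a minimal sketch of how a backtest might only "see" a quarterly number once it has been released. The figures, the 21 day lag and the function name `as_known_at` are all made up for illustration, and the sketch assumes you have the originally published values rather than the revised ones.

```python
import pandas as pd

def as_known_at(series, publication_lag_days):
    """Re-index each observation to the date it could first have been seen."""
    shifted = series.copy()
    shifted.index = shifted.index + pd.Timedelta(days=publication_lag_days)
    return shifted

# Hypothetical quarterly figures, indexed by the end of the period they describe
gdp = pd.Series(
    [1.0, 1.2, 0.8],
    index=pd.to_datetime(["2014-03-31", "2014-06-30", "2014-09-30"]),
)

# Assumption: a constant three week publication delay
known = as_known_at(gdp, publication_lag_days=21)

# What the backtest is allowed to know on each calendar day
daily = pd.date_range("2014-03-31", "2014-10-31", freq="D")
visible = known.reindex(daily, method="ffill")
```

If the delay changed over time, as in the 3 months versus 3 weeks example, the single lag would need to become a per-observation lag.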

### Surprise, Surprise

#### Technical:

Is the price now what we expected it to be yesterday? Possibly. Is it what the market expected it to be? What does that even mean (except in the rare case of forward prices being unbiased predictors of future spot price movements...)?

#### Fundamental:

Are non farm payrolls today what we expected yesterday? I can tell you the expectations, mean and distribution, and how they've changed since the last number (though that data isn't free). Same thing for the earnings announcement of Google that's coming out (free on Yahoo finance, et al).

Forecasts are another source of data. Comparing forecasts against reality - surprise - adds yet another dimension we can investigate.

### Comparability

#### Technical:

The changes in the price of US 10 year bond futures can be easily compared against the same for German 2 year bonds, or live hogs for that matter; and they can also be compared against the price of US 10 year bonds thirty years ago (once you've done some volatility normalisation anyway).
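As a rough illustration of what volatility normalisation might look like, the sketch below divides daily price changes by a trailing estimate of their standard deviation. The 25 day exponentially weighted window, the function name and the simulated prices are my assumptions here, not a description of any particular system.

```python
import numpy as np
import pandas as pd

def vol_normalised_changes(prices, span=25):
    """Daily price changes divided by a trailing estimate of their volatility."""
    changes = prices.diff()
    vol = changes.ewm(span=span, min_periods=span).std()
    # Use yesterday's estimate so today's move is not scaled by itself
    return changes / vol.shift(1)

# Two simulated instruments with very different price levels and volatilities
rng = np.random.default_rng(0)
bond = pd.Series(100 + np.cumsum(rng.normal(0, 0.5, 500)))
hogs = pd.Series(70 + np.cumsum(rng.normal(0, 2.0, 500)))

bond_norm = vol_normalised_changes(bond)
hogs_norm = vol_normalised_changes(hogs)
```

After normalisation both series are in units of "recent standard deviations", so a move of 2.0 means roughly the same thing in either market.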

#### Fundamental:

Can we compare US inflation against UK? Not easily. Investment banks hire whole teams of economic analysts to understand this kind of stuff. It's hard to write automated software that can deal with this issue. Similarly can you compare the price book ratio of a UK bank, against a German manufacturer, versus a Japanese retailer? Probably not. What about earnings price ratios now, versus a time when the level of share buybacks was different? Tricky. The only way to deal with this is either to ignore it and live with the consequences, or be careful only to compare like for like (eg UK banks against their peers), which reduces the size of portfolios you can trade.

#### Technical:

I am constantly surprised by the number of technical trading rules that are out there, and even more amazed by some of the very silly names they have. Especially since nearly all of them are just trying to pick up trends. Something you can do very easily with one or two rules.

#### Fundamental:

It can be shown mathematically that the number of possible strategies is loads bigger if you have more than one variable you can measure.
Number of technical strategies = Number of variables (1) x Human capacity to come up with silly names
Number of fundamental strategies = Number of variables N! x Human capacity to run panel regressions.

### Rich cross section of data

#### Technical:

Suppose you've got twenty years of daily price data for instrument X. That means you've got about 5000 data points. Collecting it more frequently will give you more data points, but not much extra information. Pooling data across X, Y and Z will also help. But there's limited chance of finding something interesting, that hasn't already been discovered and given a silly name, without full on data mining.

#### Fundamental:

So with ten variables you've got 50,000 data points for X. Loads more once you've pooled it with Y and Z. That's a very meaty data set. Lots to dig into before running up against the boundary of statistical significance.

### Fun and knowledge

#### Technical:

Oh look it's a wavy line. I don't really need to understand why it's wavy, just predict the next wave. I can hire any old PhD to analyse it. They don't have to have any knowledge about trading markets, and there is no point in them learning anyway. And after a few years of looking at the wavy line they're going to be bored sick, and go back to something useful like looking for Higgs Bosons and landing stuff on comets.

#### Fundamental:

You want to trade oil futures, with fundamentals? You need to know about peak oil, weather, Saudi Arabian politics, shale gas, oil rig maintenance cycles, electric car take up rates, Putin's psychological state and a hundred other things.
Then you need to work out a way of putting that into a fundamental model. You can spend the rest of your life learning how to do this stuff, it's fascinating. You'll never get bored.

### And there's the final whistle...

It's a draw. And no, there are no penalties. Are you really surprised?
I have worked extensively with both kinds of data and I have no strong preference. Technical systems are easier to build and run, but the additional effort required for including fundamental rules will usually be rewarded with additional profits.

Unfortunately as I don't have any staff, the additional effort is not something I am willing to put in right now, so I am sticking with my simple technical systems.

## Tuesday 28 October 2014

### Using sqlite3 to store static and time series data

I've had a request to help out with using sqlite in python to store data for systematic trading. There are three kinds of data I generally keep; static data, state data and timeseries data.

Static data include:

• Futures, and specific contract details
• System parameters

Timeseries data includes:

• Price data
• Volume data
• Fundamental data if used, eg PE ratios for equities
• Accounting data
• Diagnostic data, storing what the system did at various points in the past

State data relates to the control of the trading system and I won't be going into details here - it's just static data which is frequently modified.

Code is in the usual git repo. You will need pandas and sqlite3 (which came with my python distro automatically, so check your own).

### Creating the database

dbname="mydb"
dbfilename=get_db_filename(dbname)

setup_blank_tables(dbfilename, ["CREATE TABLE timeseries (datetime text, code text, price float)",
"CREATE TABLE static (code text, fullname text)"])

Here we're creating a database file with two tables in it. Notice the use of database names to abstract away from where they are stored. I find the performance of sqlite3 with massive files isn't great so I tend to stick to one table per file in practice, but for this simple example we don't need to.

If you're a SQL whizz you'll see that I am not doing any relational stuff here.
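The helpers `get_db_filename` and `setup_blank_tables` live in the repo and aren't reproduced here. Purely as a sketch of what they might do, assuming database files sit in a single directory (the temp directory below is my stand-in) and that "blank" means any existing file is thrown away:

```python
import os
import sqlite3
import tempfile

# Assumption: all database files live in one directory
DB_DIRECTORY = tempfile.gettempdir()

def get_db_filename(dbname):
    """Translate an abstract database name into a file path."""
    return os.path.join(DB_DIRECTORY, dbname + ".db")

def setup_blank_tables(dbfilename, create_statements):
    """Create a fresh database file and run each CREATE TABLE statement."""
    if os.path.exists(dbfilename):
        os.remove(dbfilename)  # assumption: start from scratch every time
    conn = sqlite3.connect(dbfilename)
    try:
        for statement in create_statements:
            conn.execute(statement)
        conn.commit()
    finally:
        conn.close()

dbfilename = get_db_filename("sketchdb")
setup_blank_tables(
    dbfilename,
    [
        "CREATE TABLE timeseries (datetime text, code text, price float)",
        "CREATE TABLE static (code text, fullname text)",
    ],
)
```

Keeping the path logic in one function is what lets the rest of the code refer to databases by name only.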

### Static data

st_table=staticdata(dbname)
st_table.modify("FTSE", "FTSE all share")
st_table.delete("FTSE")

Notice that we use staticdata so we don't need to use any SQL in these commands (in case the underlying table structure changes and to avoid having reams of repetitive nonsense), and within that the connection object ensures that the staticdata code isn't specific to sqlite3.

The sqlite3 read returns lists of tuples, which staticdata.read() resolves to a single string.
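To show the idea of hiding the SQL behind a small class, here is a cut down sketch. The class name `StaticDataSketch` and its internals are hypothetical (the real staticdata class is in the repo), and it uses an in-memory database so it can be run on its own.

```python
import sqlite3

class StaticDataSketch:
    """Hypothetical stand-in for staticdata: callers never write any SQL."""

    def __init__(self, conn):
        self.conn = conn

    def modify(self, code, fullname):
        # Replace any existing row so each code appears only once
        self.conn.execute("DELETE FROM static WHERE code=?", (code,))
        self.conn.execute("INSERT INTO static VALUES (?, ?)", (code, fullname))
        self.conn.commit()

    def read(self, code):
        rows = self.conn.execute(
            "SELECT fullname FROM static WHERE code=?", (code,)
        ).fetchall()
        # sqlite3 hands back a list of tuples; resolve it to a single string
        if len(rows) == 0:
            return None
        return str(rows[0][0])

    def delete(self, code):
        self.conn.execute("DELETE FROM static WHERE code=?", (code,))
        self.conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE static (code text, fullname text)")

st_table = StaticDataSketch(conn)
st_table.modify("FTSE", "FTSE all share")
name = st_table.read("FTSE")
st_table.delete("FTSE")
after_delete = st_table.read("FTSE")
```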

### Timeseries data

dt_table=tsdata(dbname)
someprices=pd.Series(range(100), pd.date_range('1/1/2014', periods=100))

We use a pandas Series with a datetime index as the input and output (this was written with the pd.TimeSeries alias, which current pandas releases no longer provide), which is then translated into database terms. sqlite has no native datetime format, only text or float, so we need to translate between pandas/datetime and text. I define a specific format for the text representation to be precise and ensure the database is forward compatible with any changes in pandas.
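A minimal sketch of that translation, assuming a fixed text format of my own choosing (`DATE_FORMAT` below) and two hypothetical helper functions rather than the real tsdata class:

```python
import sqlite3
import pandas as pd

# Assumption: one explicit text format, chosen so that text order equals time order
DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def write_series(conn, code, series):
    """Store a datetime-indexed Series as (text datetime, code, price) rows."""
    rows = [
        (ts.strftime(DATE_FORMAT), code, float(value))
        for ts, value in series.items()
    ]
    conn.executemany("INSERT INTO timeseries VALUES (?, ?, ?)", rows)
    conn.commit()

def read_series(conn, code):
    """Read the rows for one code back into a datetime-indexed Series."""
    rows = conn.execute(
        "SELECT datetime, price FROM timeseries WHERE code=? ORDER BY datetime",
        (code,),
    ).fetchall()
    index = pd.to_datetime([r[0] for r in rows], format=DATE_FORMAT)
    return pd.Series([r[1] for r in rows], index=index)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timeseries (datetime text, code text, price float)")

someprices = pd.Series(range(100), pd.date_range("1/1/2014", periods=100))
write_series(conn, "FTSE", someprices)
restored = read_series(conn, "FTSE")
```

Because the format is fixed in one place, the stored text doesn't depend on however a given pandas version chooses to print a timestamp.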

### The end

This is a very brief and simple example and is missing a lot of error handling it really ought to have, but it should provide you with enough help to get started even if you're not a SQL whizz.

## Friday 24 October 2014

### The world's simplest execution algo

As you will know I run a fully automated systematic trading system. As it's fully automated, due to my extreme laziness, all the trades are put into the market completely automatically.

When I first started running my system I kept the execution process extremely simple:

• Check that the best bid (if selling) or best offer (if buying) was large enough to absorb my order
• Submit a market order

Yes, what a loser. Since I am trading futures, and my broker only has fancy orders for equities, this seemed the easiest option.

I then compounded my misery by creating a nice daily report to tell me how much each trade had cost me. Sure enough most of the time I was paying half the inside spread (the difference between the mid price, and the bid or offer).
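For what it's worth, the cost calculation behind a report like that can be sketched in a few lines. The function below is an illustration with made up prices, not the code from my report; it measures a fill against the mid price and expresses it as a fraction of the full bid-offer spread.

```python
def execution_cost(fill_price, bid, offer, is_buy):
    """Cost of a fill versus the mid price, as a fraction of the bid-offer spread.

    Positive means we paid up; negative means we did better than the mid.
    """
    spread = offer - bid
    if spread <= 0:
        raise ValueError("Need a positive spread to measure cost against")
    mid = (bid + offer) / 2.0
    signed_cost = (fill_price - mid) if is_buy else (mid - fill_price)
    return signed_cost / spread

# A market order to buy fills at the offer: half the spread paid
market_buy = execution_cost(101.0, bid=99.0, offer=101.0, is_buy=True)

# A passive buy that gets filled at the bid: half the spread earned
passive_buy = execution_cost(99.0, bid=99.0, offer=101.0, is_buy=True)
```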

After a couple of months of this, and getting fed up with seeing this report add up my losses from trading every day, I decided to bite the bullet and do it properly.

Creating cool execution algorithms (algos) isn't my area of deep expertise, so I had to work from first principles. I also don't have much experience of writing very complicated fast event driven code, and I write in a slowish high level language (python). Finally my orders aren't very large, so there is no need to break them up into smaller slices and track each slice. All this points towards a simple algo being sufficient.

Only one more thing to consider: I get charged for modifying orders. It isn't a big cost, and it's worth much less than the saving from smarter execution, but it does mean that an algo which modifies orders more often than necessary probably isn't worth the extra work or cost.

Finally I can't modify a limit order and turn it into a market order. I would have to cancel the order and submit a new one.

### What does it do?

A good human trader, wanting to execute a smallish buy order and not worrying about game playing or spoofing etc, will probably do something like this:

• Submit a limit order, on the same side of the spread they want to trade, joining the current level. So if we are buying we'd submit a buy at the current best bid level. In the jargon this is passive behaviour, waiting for the market to come to us.
• In an ideal world this initial order would get executed. We'll have gained half the spread in negative execution cost (comparing the mid versus the best bid).
• If:
  • the order isn't being executed after several minutes,
  • or there are signs the market is about to move against them and rally,
  • or the market has already moved up against them
• ... then the smart trader would cut their losses and modify their order to pay up and cross the spread. This is aggressive behaviour.
• The new modified aggressive order would be a buy at the current best offer. In theory this would then be executed, costing half the spread (which if the market has already moved against us, would be more than if we'd just submitted a market order initially).
• If we're too slow and the market continues to move against us, keep modifying the order to stay on the new best offer, until we're all done
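The passive and aggressive prices in the steps above depend only on which way we're trading. As a small sketch (the function name is mine; the code further down calls these two prices the offside price and the side price):

```python
def passive_and_aggressive_prices(best_bid, best_offer, is_buy):
    """Return (passive price, aggressive price) for a trade.

    Passive joins our own side of the book and waits; aggressive crosses the spread.
    """
    if is_buy:
        return best_bid, best_offer
    return best_offer, best_bid

# Buying: wait on the bid, or pay up to the offer
buy_passive, buy_aggressive = passive_and_aggressive_prices(99.0, 101.0, is_buy=True)

# Selling: wait on the offer, or hit the bid
sell_passive, sell_aggressive = passive_and_aggressive_prices(99.0, 101.0, is_buy=False)
```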

Although that's it in a nutshell there are still a few bells and whistles in getting an algo like this to work, and in such a way that it can deal robustly with anything that gets thrown at it. Below is the detail of the algo. Although this is shown as python code, it's not executable since I haven't included many of the relevant subroutines. However it should give you enough of an idea to code something similar up yourself.

It's somewhat dangerous dropping an algo trade into the mix if the market isn't liquid enough; this routine checks that.

"""
Function easy algo runs before getting a new order

contract: object indicating what we are trading
dbtype, tws: handles for which database and tws API server we are dealing with here.

Returns integer indicating size I am happy with

Zero means market can't support order

"""

## Get market data (a list containing inside spread and size)

bookdata=get_market_data(dbtype, tws, contract, snapshot=False, maxstaleseconds=5, maxwaitseconds=5)

## None means the API is not running or the market is closed :-(

if bookdata is None:
return (0, bookdata)

## Check the market is liquid; the spread and the size have to be within certain limits. We use a multiplier because we are less discerning with limit orders - a wide spread could work in our favour!

if not market_liquid:
return (0, bookdata)

## If the market is liquid, but maybe the order is large compared to the size on the inside spread, we can cut it down to fit the order book.
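The liquidity test and the cutting down aren't shown in the listing above. A self-contained sketch of how they might work follows; the limits, the multiplier value of 2.0 and the function name are all assumptions of mine, and unlike the routine above it returns just the signed size.

```python
def size_market_can_support(order_size, bid, offer, bid_size, offer_size,
                            max_spread, min_size, spread_multiplier=2.0):
    """Return the (signed) order size the inside market can absorb, or 0.

    order_size is positive for a buy and negative for a sell.
    """
    spread = offer - bid
    # Limit orders can tolerate a wider spread, hence the multiplier
    if spread <= 0 or spread > max_spread * spread_multiplier:
        return 0
    # The size that matters is on the side we would hit if we crossed the spread
    available = offer_size if order_size > 0 else bid_size
    if available < min_size:
        return 0
    capped = min(abs(order_size), available)
    return capped if order_size > 0 else -capped

# Buy 10 lots when only 4 are offered: cut the order down to 4
cut_buy = size_market_can_support(10, 99.0, 100.0, bid_size=50, offer_size=4,
                                  max_spread=1.0, min_size=1)

# Spread of 6 is wider than 2 x 1, so no trade
too_wide = size_market_can_support(10, 99.0, 105.0, bid_size=50, offer_size=50,
                                   max_spread=1.0, min_size=1)
```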

### New order

Not just one of the best bands in the eighties, also the routine you call when a new order request is issued by the upstream code.

MAX_DELAY=0.03

def EasyAlgo_new_order(order, tws, dbtype, use_orderid, bookdata):

"""
Function easy algo runs on getting a new order

Args:
order - object of my order type containing the required trade
tws - connection object to tws API for interactive brokers
dbtype - database we are accessing
use_orderid- orderid
bookdata- list containing best bid and offer, and relevant sizes

"""

## The s, state, variable is used to ensure that log messages and diagnostics get saved right. Don't worry too much about this

log=logger()
diag=diagnostic(dbtype, system="algo",  system3=str(order.orderid))
s=state_from_sdict(order.orderid, diag, log)

## From the order book, and the trade, get the price we would pay if aggressive (sideprice) and the price we pay if we get passive (offsideprice)

if np.isnan(offsideprice) or offsideprice==0:
log.warning("No offside / limit price in market data so can't issue the order")
return None

if np.isnan(sideprice) or sideprice==0:
log.warning("No sideprice in market data so dangerous to issue the order")
return None

## The order object contains the price recorded at the time the order was generated; check to see if a large move since then (should be less than a second, so unlikely unless market data corrupt)

if not np.isnan(order.submit_price):
delay=abs((offsideprice/order.submit_price) - 1.0)
if delay>MAX_DELAY:
log.warning("Large move since submission - not trading a limit order on that")
return None

## We're happy with the order book, so set the limit price to the 'offside' - best offer if selling, best bid if buying

limitprice=offsideprice

## We change the order so it's now a limit order with the right price

order.modify(lmtPrice = limitprice)
order.modify(orderType="LMT")

## Need to translate from my object space to the API's native objects

iborder=from_myorder_to_IBorder(order)
contract=Contract(code=order.code, contractid=order.contractid)
ibcontract=make_IB_contract(contract)

## diagnostic stuff
## it's important to save this so we can track what happened if orders go squiffy (a technical term)

s.update(dict(limit_price=limitprice, offside_price=offsideprice, side_price=sideprice,
message="StartingPassive", Mode="Passive"))
timenow=datetime.datetime.now()

##  The algo memory table is used to store state information for the algo. Key thing here is the Mode which is PASSIVE initially!

am=algo_memory_table(dbtype)
am.update_value(order.orderid, "Limit", limitprice)
am.update_value(order.orderid, "ValidSidePrice", sideprice)
am.update_value(order.orderid, "ValidOffSidePrice", offsideprice)
am.update_value(order.orderid, "Started", date_as_float(timenow))
am.update_value(order.orderid, "Mode", "Passive")
am.update_value(order.orderid, "LastNotice", date_as_float(timenow))

am.close()

## Place the order
tws.placeOrder(
use_orderid,                                    # orderId,
ibcontract,                                   # contract,
iborder                                       # order
)

## Return the order upstream, so it can be saved in databases etc. Note if this routine terminates early it returns a None; so the upstream routine knows no order was placed.

return order

### Action on tick

A tick comes from the API when any part of the inside order book is updated (best bid or offer, or relevant size).

Within the tws server code I have a routine that keeps marketdata (a list with best bid and offer, and relevant sizes) up to date as ticks arrive, and then calls the relevant routine.

What does this set of functions do?
• If we are in a passive state (the initial state, remember!)
  • ... and more than five minutes has elapsed, change to aggressive.
  • If buying and the current best bid has moved up from where it started (an adverse price movement), change to aggressive.
  • If selling and the current best offer has moved down from where it started (also adverse), change to aggressive.
  • If there is an unfavourable order imbalance (eg five times as many people selling as buying on the inside spread if we're also selling), change to aggressive.
• If we are in an aggressive state
  • ... and more than ten minutes has elapsed in total, cancel the order.
  • If buying and the current best offer has moved up from where it was last (a further adverse price movement), then update our limit to the new best offer (chase the market up).
  • If selling and the current best bid has moved down from where it was last (a further adverse price movement), then update our limit to the new best bid (chase the market down).
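Stripped of all the database and API plumbing, the rules in the list above boil down to one decision per tick. Here is a runnable sketch of just that decision; the function name, the returned action strings and the treatment of an empty opposite side as infinite imbalance are my assumptions for illustration.

```python
PASSIVE_TIME_LIMIT = 5 * 60   # seconds before a passive order turns aggressive
TOTAL_TIME_LIMIT = 10 * 60    # seconds before the order is cancelled altogether
MAX_IMBALANCE = 5.0           # order book imbalance we can cope with

def decide_on_tick(mode, is_buy, seconds_elapsed, current_limit,
                   best_bid, best_offer, bid_size, offer_size):
    """Return 'wait', 'go_aggressive', 'chase' or 'cancel' for one tick."""
    if seconds_elapsed > TOTAL_TIME_LIMIT:
        return "cancel"

    if mode == "Passive":
        if seconds_elapsed > PASSIVE_TIME_LIMIT:
            return "go_aggressive"
        # Adverse move: our passive limit is no longer at the front of the book
        if is_buy and best_bid > current_limit:
            return "go_aggressive"
        if not is_buy and best_offer < current_limit:
            return "go_aggressive"
        # Imbalance: how crowded is our side relative to the other side?
        ours, theirs = (bid_size, offer_size) if is_buy else (offer_size, bid_size)
        imbalance = float("inf") if theirs == 0 else ours / theirs
        if imbalance > MAX_IMBALANCE:
            return "go_aggressive"
        return "wait"

    if mode == "Aggressive":
        # The price we need to cross to has run away from our limit: follow it
        if is_buy and best_offer > current_limit:
            return "chase"
        if not is_buy and best_bid < current_limit:
            return "chase"
        return "wait"

    raise ValueError("Mode %s not known" % mode)

# Passive seller facing six times as much size on the offer as on the bid
action = decide_on_tick("Passive", False, 10, 101.0, 100.0, 101.0,
                        bid_size=5, offer_size=30)
```

Keeping the decision separate from the order handling makes it easy to test each rule without a live connection.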

passivetimelimit=5*60 ## max five minutes
totaltimelimit=10*60 ## max ten minutes in total, ie up to another five minutes aggressive
maximbalance=5.0 ## amount of imbalance we can cope with

def EasyAlgo_on_tick(dbtype, orderid, marketdata, tws, contract):
"""
Function easy algo runs on getting a tick

Args:
dbtype, tws: handles for database and tws API
orderid: the orderid that is associated with a tick
marketdata: summary of the state of current inside spread
contract: what we are actually trading

"""

## diagnostic code
log=logger()
diag=diagnostic(dbtype, system="algo",  system3=str(int(orderid)))
s=state_from_sdict(orderid, diag, log)

am=algo_memory_table(dbtype)

## Can't find this order in our state database!

if Mode is None or Started is None or current_limit is None or trade is None or LastNotice is None:
log.critical("Can't get algo memory values for orderid %d CANCELLING" % orderid)
FinishOrder(dbtype, orderid, marketdata, tws, contract)

Started=float_as_date(Started)
LastNotice=float_as_date(LastNotice)
timenow=datetime.datetime.now()

s.update(dict(limit_price=current_limit, offside_price=offsideprice, side_price=sideprice,
Mode=Mode))

## Work out how long we've been trading, and the time since we last 'noticed' the time

time_since_last=(timenow - LastNotice).seconds

## A minute has elapsed since we last noticed the time

if time_since_last>60:
s.update(dict(message="One minute since last noticed now %s, total time %d seconds - waiting %d %s %s" % (str(timenow), time_trading, orderid, contract.code, contract.contractid)))
am.update_value(orderid, "LastNotice", date_as_float(timenow))

## We've run out of time - cancel any remaining order

s.update(dict(message="Out of time cancelling for %d %s %s" % (orderid, contract.code, contract.contractid)))
FinishOrder(dbtype, orderid, marketdata, tws, contract)
return -1

if not np.isnan(sideprice) and sideprice!=lastsideprice:
am.update_value(orderid, "ValidSidePrice", sideprice)

if not np.isnan(offsideprice) and offsideprice!=lastoffsideprice:
am.update_value(orderid, "ValidOffSidePrice", offsideprice)

am.close()

if Mode=="Passive":

## Out of time (5 minutes) for passive behaviour: panic

s.update(dict(message="Out of time moving to aggressive for %d %s %s" % (orderid, contract.code, contract.contractid)))

SwitchToAggresive(dbtype, orderid, marketdata, tws, contract, trade)
return -1

if np.isnan(offsideprice):
s.update(dict(message="NAN offside price in passive mode - waiting %d %s %s" % (orderid, contract.code, contract.contractid)))
return -5

if offsideprice>current_limit:
## Since we have put in our limit the price has moved up. We are no longer competitive

s.update(dict(message="Adverse price move moving to aggressive for %d %s %s" % (orderid, contract.code, contract.contractid)))

SwitchToAggresive(dbtype, orderid, marketdata, tws, contract, trade)

return -1
## Selling
if offsideprice<current_limit:
## Since we have put in our limit the price has moved down. We are no longer competitive

s.update(dict(message="Adverse price move moving to aggressive for %d %s %s" % (orderid, contract.code, contract.contractid)))

SwitchToAggresive(dbtype, orderid, marketdata, tws, contract, trade)
return -1

## Detect Imbalance (bid size/ask size if we are buying; ask size/bid size if we are selling)

if balancestat>maximbalance:
s.update(dict(message="Order book imbalance of %f developed compared to %f, switching to aggressive for %d %s %s" %(balancestat , maximbalance, orderid, contract.code, contract.contractid)))

SwitchToAggresive(dbtype, orderid, marketdata, tws, contract, trade)
return -1

elif Mode=="Aggressive":

if np.isnan(sideprice):
s.update(dict(message="NAN side price in aggressive mode - waiting %d %s %s" % (orderid, contract.code, contract.contractid)))
return -5

if sideprice>current_limit:
## Since we have put in our limit the price has moved up further. Keep up!

s.update(dict(message="Adverse price move in aggressive mode for %d %s %s" % (orderid, contract.code, contract.contractid)))
SwitchToAggresive(dbtype, orderid, marketdata, tws, contract, trade)

return -1
## Selling
if sideprice<current_limit:
## Since we have put in our limit the price has moved down. Keep up!

s.update(dict(message="Adverse price move in aggressive mode for %d %s %s" % (orderid, contract.code, contract.contractid)))

SwitchToAggresive(dbtype, orderid, marketdata, tws, contract, trade)
return -1

elif Mode=="Finished":
## do nothing, still have tick for some reason
pass

else:
msg="Mode %s not known for order %d" % (Mode, orderid)
s.update(dict(message=msg))

log=logger()
log.critical(msg)
raise Exception(msg)

s.update(dict(message="tick no action %d %s %s" % (orderid, contract.code, contract.contractid)))

diag.close()

return 0

def SwitchToAggresive(dbtype, orderid, marketdata, tws, contract, trade):
"""
What to do... if we want to either change our current order to an aggressive limit order, or move the limit price of an order that is already aggressive

"""
## diagnostics...
log=logger()

diag=diagnostic(dbtype, system="algo",  system3=str(int(orderid)))
s=state_from_sdict(orderid, diag, log)

if tws is None:
log.info("Switch to aggressive didn't get a tws... can't do anything in orderid %d" % orderid)
return -1

## Get the last valid side price (relevant price if crossing the spread) as this will be our new limit order

am=algo_memory_table(dbtype)

ordertable=order_table(dbtype)
ordertable.close()

if np.isnan(sideprice):
s.update(dict(message="To Aggressive: Can't change limit for %d as got nan - will try again" % orderid))
return -1

## updating the order

newlimit=sideprice

order.modify(lmtPrice = newlimit)
order.modify(orderType="LMT")

iborder=from_myorder_to_IBorder(order)
ibcontract=make_IB_contract(contract)

am.update_value(order.orderid, "Limit", newlimit)
am.update_value(order.orderid, "Mode", "Aggressive")
am.close()

# Update the order
tws.placeOrder(
orderid,                                    # orderId,
ibcontract,                                   # contract,
iborder                                       # order
)

s.update(dict(limit_price=newlimit, side_price=sideprice,
message="NowAggressive", Mode="Aggressive"))

return 0

def FinishOrder(dbtype, orderid, marketdata, tws, contract):
"""
Algo hasn't worked, let's cancel this order
"""
diag=diagnostic(dbtype, system="algo",  system3=str(int(orderid)))

log=logger()
s=state_from_sdict(orderid, diag, log)

if tws is None:
log.info("Finish order didn't get a tws... can't do anything in orderid %d" % orderid)
return -1

log=logger()
ordertable=order_table(dbtype)

log.info("Trying to cancel %d because easy algo failure" % orderid)
tws.cancelOrder(int(order.brokerorderid))

order.modify(cancelled=True)
ordertable.update_order(order)

do_order_completed(dbtype, order)

EasyAlgo_on_complete(dbtype, order, tws)

s.update(dict(message="NowCancelling", Mode="Finished"))

am=algo_memory_table(dbtype)
am.update_value(order.orderid, "Mode", "Finished")
am.close()

return -1

### Partial or complete fill

Blimey this has actually worked, we've actually got a fill...

def EasyAlgo_on_partial(dbtype, order, tws):
diag=diagnostic(dbtype, system="algo",  system3=str(int(order.orderid)))

diag.w(order.filledprice, system2="fillprice")

return 0

def EasyAlgo_on_complete(dbtype, order_filled, tws):
"""
Function Easy algo runs on completion of trade
"""

diag=diagnostic(dbtype, system="algo",  system3=str(int(order_filled.orderid)))

diag.w("Finished", system2="Mode")
diag.w(order_filled.filledprice, system2="fillprice")

am=algo_memory_table(dbtype)
am.update_value(order_filled.orderid, "Mode", "Finished")
am.close()

return 0

### And we're done

That's it. It's not perfect and it would be very easy to write high frequency code that would game this kind of strategy. However the proof is in the proverbial traditional English dessert, and my execution costs have reduced by approximately 80% from when I was doing market orders, i.e. I am paying an average of 1/10 of the spread rather than half of it. So it's definitely an improvement, and well worth the day or so it took me to code it up and test it.