Monday, 9 December 2013

Setup for a home based systematic trading system

In this post I am going to talk about my setup for 'home based' systematic trading, both hardware and software.

This post is updated when I change my setup. Notes about changes made in italics.

Before I kick off a really key point is that what setup you have depends a lot on what kind of trading you do. Essentially you have to balance the following criteria:

  • cost
  • speed
  • flexibility
  • degree of difficulty
  • robustness and redundancy
  • ease of automation

The relative importance of those factors will depend on what kind of trading you are doing. I am doing fairly slow, fully automated trading over a fairly large number of markets, with a 'trend following' character. This means that speed isn't very important to me, but the other factors are.

If I was doing faster trading I'd probably need a more powerful machine and I would be worried about latency (delays in getting trades out) from my choice of software and database. If I was doing some kind of complex non linear analysis again I would need more juice on the processing side. If I was doing some kind of 'negative skew / selling optionality' strategy where my return profile was lots of small gains and occasional big losses I would be more relaxed about turning my system off on days where I can't monitor it closely; as it is I want to keep it on to capture big trends when they appear so robustness is ultra important. If I wasn't fully automated and I was doing my own execution manually on a smaller number of markets I might be happier using an 'out of the box' system to produce my signals and there would be more brokers I could potentially use. So the lesson here is that one size won't fit all.

There is also a heavy bias coming in from my own background and experience; like most people I want to stick to what I am used to.

Windows vs Unix

My choice: Unix

To be precise I am running on Linux Mint. Mainly this is due to my own experience; I know how to do many things on Unix systems which would take me a long time to figure out on Windows. There are many other advantages, and if you google 'Linux vs Windows' you will get about eight zillion hits back which will bore you stupid about which is better. I certainly am glad I didn't have to spend any time researching this choice; I think it is a discussion likely to cause physical violence, right up there with 'Which God?' and 'Which football team?', and like those questions there is no right answer (well there is, and it's Unix, but I am not going to waste time trying to unburden you of your ignorance).

As for the choice of Mint, I am no expert on Linux 'distros' and to be honest this is the only one I have installed. What I like about it is the Windows-like user interface (in fact it is more Windows-like than Windows 8). It's probably a bit 'heavy' and a slimmer distro would be sufficient for a trading-only machine (I intend to try Lubuntu soon), but I also use my boxes for other purposes so it's nice to effectively have a complete Windows replacement, even if that means having some disk space filled with media drivers and the like.

Apple? Well, it's Unix now, isn't it, and I have an aversion to expensive brand names; but if you are a slavish fashion victim who can't be seen without the latest shiny white box, then who am I to tell you differently.

Cheap hardware or expensive hardware

My choice: Cheap

I don't need fat processing power as I am doing fairly slow trading, so what matters here is robustness and cost. Given a budget of say £1000, it is better to buy two £500 machines and get redundancy. In practice I am spending much less than that on second hand machines; each is probably more likely to fail than a £1000 new box, but arguably the pair are less likely to fail jointly than a single £1000 machine is to fail at all. Running Linux helps here as well, since you can get much better performance out of a given box than with Windows. Make sure that any relevant drivers are supported by your Linux distro.

I know everyone loves Raspberry Pis, but they are a bit too bare bones for my liking; building a trading system on one strikes me as a pointless CV padding or geek ego boosting exercise.

Currently my dual machine trading estate is:

- 4GB/500GB Mint Box. This is a machine preloaded with Linux Mint. I treated myself to this new, for about £400. It's about the size of a paperback, and looks amazing. It's fanless, but doesn't have an SSD so there's a murmur of noise from the hard disk.

- 8GB/500GB Intense PC2. I wanted a second Mint Box, but they seemed to be out of stock. This is effectively the same Compulab machine as the Mint Box, but without Mint preinstalled (I guess there might be some special drivers on the Mint Box, though everything seems to work just fine on this); you can pay £40 to have Mint installed (obviously I didn't...).

As you can see below I did manage for a while with a series of second hand machines of varying ability. I've now decided to have dual machines as close to each other in spec as possible, and I've also bitten the bullet and spent a few quid on machines that are a reasonable spec and brand new. However there is still quite a lot of change from a thousand pounds after buying these two boxes.

I've also successfully run the system on all of the following.

- 4GB/60GB fanless Zanshuri beta MK1. I got this for £100 off eBay. This is a great little workhorse which can be left on for donkey's years in a corner, and is really quiet.
- 4GB/200GB ancient Toshiba laptop. Now my main workhorse for research. £120 from a second hand laptop site.
- 2GB/100GB almost as ancient Samsung netbook. I actually bought this new, about 8 years ago, so I can't remember what it cost.
- 4GB/80GB Dell SFF Optiplex PC. £120 off Amazon.

If I had enough money and a big office I'd buy a billion monitors and a really fast, overpowered machine. Not because I need them to trade, but because they look cool.

Backup / redundancy / storage:

My choice: Parallel systems, local NAS, Dropbox

As noted above I have two machines running at once. The second machine can either be 'active' (essentially running in parallel, but writing to shadow storage and issuing 'fake' orders) or 'passive' (ready to go but not running in parallel). Either way you need to think about joint storage, how you would do failovers, ensuring synchronisation and so on. Even if you don't go down the dual machine route you need to worry about backups, or you should do.

I use multiple machines with the fallback machines mostly passive (so holding all the right code but not running any core processes, just generating additional diagnostics). Each uses local storage for all its data; i.e. the passive machine won't have its data updated.

The failover then is just to stop the crontab and any active processes on the old active machine, copy the updated data up to my NAS and then down to the old passive machine (which in practice I do daily anyway as part of a backup process) and start the crontab on the new active machine (there is a bit more work if I do this in the middle of a trading day obviously). If a machine is lost the worst that will happen is that I will have to manually put orders traded into the database since the last backup was done (prices and thus signals, at least to daily frequency, are recovered automatically at the next data feed).
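As a very rough illustration, the copy-up-then-down step could be sketched in Python like this (all the paths and the 'trading_data' directory name are invented for the example; in real life rsync over ssh is probably a better tool for the job):

```python
import shutil
from pathlib import Path

def sync_dir(src: Path, dst: Path) -> None:
    """Mirror src into dst. A crude one-way copy, not a true sync:
    it overwrites but never deletes stale files on the destination."""
    dst.mkdir(parents=True, exist_ok=True)
    shutil.copytree(src, dst, dirs_exist_ok=True)

def daily_backup(active_data: Path, nas_mount: Path, passive_data: Path) -> None:
    # Step 1: active machine's data up to the NAS
    sync_dir(active_data, nas_mount / "trading_data")
    # Step 2: NAS down to the passive machine (run from the passive box in practice)
    sync_dir(nas_mount / "trading_data", passive_data)
```

The same two steps then double as the first half of a failover: after the copy, the old passive box has current data and just needs its crontab started.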

So I would normally run the same code and same data (after synching) on both boxes but have the facility to check new code or new implementation by running it in parallel on the 'test' box first. I would then overwrite any data produced by testing with the 'live' data since the exercise is about robustness not parallel checks and spending time trying to reconcile the two sets of data is a waste of effort.

Since I won't necessarily be sitting at the active machine I use ssh (just for looking at logs and running diagnostics), sometimes with screen. I get the active machine to email me its local IP address as part of its power-on sequence, since I haven't worked out how to keep this static. Unfortunately this isn't enough to restart the Interactive Brokers Gateway (necessary after a power cut), which requires a persistent GUI; even using screen within xterm isn't enough. So I have a monitor and keyboard handy to plug into the box.
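As an aside, one common trick for a script to discover its own LAN address (for that power-on email) is to open a UDP socket towards a public address and read back the local end of the connection; no traffic is actually sent. A sketch:

```python
import socket

def local_ip() -> str:
    """Find the box's LAN IP address.

    connect() on a UDP socket sends nothing; it just makes the OS pick
    the outbound interface, whose address we then read back. The 8.8.8.8
    address is arbitrary. Falls back to loopback if there's no route.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        s.close()
```

The result can then be mailed out with smtplib, or however you like.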

I use a LaCie D2 as my single shared storage NAS, which automatically backs up all my data as well as holding the master git repository of my code and key documents (and acts as a music server and general dogsbody storage). This is automatically backed up on to a spare drive on another local machine. With more money I would get a NAS with RAID, although to be honest it's not very high up my list of cool things to own before I am dead.

I used to use a WD My Cloud and parallel active machines, but chose to simplify things; I also found that NAS a bit sluggish, at least over my network, so I prefer to use a NAS for backup only rather than as primary data storage.

You still need offsite backup, and I use Dropbox to take copies of my data and code. Though this isn't a full 'image' backup, it means that with a few hours' work I could reconstruct a working system from a virgin Linux machine, requiring just the installation of the right libraries and so on.

Another issue I haven't really been able to address is the failure of local power and/or internet (although running with laptops does effectively give you a UPS). The best solution would be to buy a virtual linux server on a cloud somewhere and let someone else worry about all that stuff. This is actually quite cheap but I am a bit daunted by the potential complexity of having to set this up and get everything working remotely. Some of the people reading this will think I am some kind of legendary uber technology guru, but others will assure you that I am not.

Bespoke vs Out of the box

My choice: Bespoke

Again clearly this is a matter of choice and personal skill levels; it would take me much longer to get acquainted with something like NinjaTrader than to write a new system from scratch, which is what I have done. And having written that system from scratch, it is ultra flexible to modify as required. Were I starting from a zero base I would probably still build everything myself, even if it took much longer; it is very satisfying, and you learn lots of skills which could be useful if you are doing this with the intention of getting a job in the industry. However if you just want to get something working fast and aren't really interested in the 'plumbing' aspect then I guess an out of the box system would make sense. But I reserve the right to call you a total wimp. Which you are.


Broker

My choice: Interactive Brokers

As I am doing fully automated trading there didn't seem to be any choice on this, as they are the only company that seems to offer a proper API to retail customers; although I would love to be shown alternatives, since it isn't ideal to only have one route to market. What they offer is not perfect and there are lots of things I would like to see changed, like the fact that you can get 15 minute data free with their front end but not with their API. The API itself is also a bit fiddly (I will blog more on this subject in due course) and not very user friendly, but that could also be because I am not very technically gifted, as you will note I keep emphasising.

Gateway or TWS?

My choice: Gateway

This is a very esoteric choice, but if you are using the IB API you can either run the 'server' process as the TWS (which is a fully featured front end) or the Gateway (which does nothing but service API requests). For robustness you should use the Gateway, since it never falls over; I usually leave one session running for a week. If you are doing your own trades then you will have to use TWS.
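If you script around this, a cheap sanity check before sending any API requests is to probe the listening socket. The ports below are, as far as I know, the usual defaults (4001 for the Gateway, 7496 for TWS), but check your own configuration:

```python
import socket

def api_port_open(host: str = "127.0.0.1", port: int = 4001) -> bool:
    """Return True if something (hopefully the IB Gateway or TWS) is
    accepting TCP connections on the given port. Port 4001 is the usual
    Gateway default and 7496 the TWS default; both are configurable."""
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False
```

Handy as the first step of a watchdog script that emails you when the Gateway has died.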

Data feed:

My choice: Interactive Brokers, with Quandl for historic data

This is a good choice if you are using slow signals, since the Quandl stuff is free daily data and very easy to get via their API; the IB stuff has a price depending on the instruments but is pretty cheap. However it can be flaky. I build filters and averaging into my system to deal with the flaky data, but this does make it unsuitable for high frequency trading.
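To give a flavour of the kind of filtering I mean, here is a toy version (not my live code; the thresholds and window are plucked from the air): throw away prices that jump implausibly far from the previous tick, then smooth what's left with a short rolling median.

```python
import pandas as pd

def clean_prices(raw: pd.Series, max_jump: float = 0.10, window: int = 5) -> pd.Series:
    """Crude defence against a flaky feed.

    Drops any observation more than max_jump (as a fraction) away from
    its predecessor in the raw series, then takes a rolling median of
    the survivors. Illustrative only: a real filter should compare
    against the last *good* value, not the last raw one.
    """
    jumps = raw.pct_change().abs().fillna(0)
    good = raw[jumps <= max_jump]
    return good.rolling(window, min_periods=1).median()
```

Note the cost of this sort of thing: genuine fast moves get delayed or suppressed, which is one reason the system is unsuitable for high frequency work.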

I don't use fundamental data like stock PE ratios for automated trading so I can't comment on that. I am still a bit of a luddite with that part of my portfolio and like to check the numbers by looking at the actual annual report, preferably a paper copy; ideally one bound in vellum and written with a quill pen.

There are a few markets I can't trade because of expensive IB data, so a future project is to get a free source, maybe by using bs4 to scrape someone's screen.

Programming language

My choice: Python with numpy and pandas

I've written reasonable chunks of code in Matlab, R, S-Plus and dabbled in a lot of other languages over the years, but Python was the last language I used in a corporate setting. I guess, being objective, it is quite a nice mixture of the statistical languages mentioned above as well as being easier to use for scripting, and from my perspective it is harder to write bad code in Python than in R, which is very important for a clumsy dolt like me. Having said that, the way it deals with matrices, arrays and time series isn't as natural as in those other languages, and can take some getting used to. Like R it's free / open source, which is also nice. There is a lot of Python code out there, and a lot of Python programmers, so if you learn it you may even get some kind of job in the field; although just to warn you, it will probably involve gluing HTML and SQL together.

Of course if I was writing high frequency trading code I would probably have to learn C++, and arguably if I was a real man I would (in my youth there was a saying that real men only used assembler but I guess even real men aren't as real as they used to be).

API access method

My choice: swigpy

Unfortunately IB doesn't offer a native Python API, but some kind people have done open source work to address this oversight. The main offering out there is IbPy. After trying to get this to work, however, I went for swigpy, which is a rather simpler idea; essentially it takes the native IB C++ API and just wraps it word for word (or rather class for class) in Python. This has the advantage that you can just read the C++ documentation directly, and it's theoretically possible to generate a new library yourself with SWIG when a new C++ API is released (although I haven't tried it), so you aren't reliant on one open source guy to update his stuff, or having to update it yourself. The downside is that it can be fiddly to get working; there are five posts to help you from here onwards.

Storage methodology

My choice: sqlite, .csv and HDF5 files

What do you need to store? I would argue you need to store time series data, like prices; static data which is rarely changed, like the native currency of a futures contract; and configuration data, like which futures we trade and when. I use Python HDF5 files to store diagnostic information (which is usually time series), .csv for configurations, and databases for everything else. sqlite is just a joy to use and extremely simple, although it may not be as good if you have a more complex relational scheme (I just use isolated tables, and any relations are effectively done in software), and there might be faster options out there if latency bothers you. I did have some lock issues, but these mostly went away when I partitioned my database into a larger number of files; they only tend to happen now if I try and run large reports when actively trading, rather than when trading has stopped (a solution to this is to run the reports on a cloned database).
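To illustrate what 'isolated tables with relations done in software' means in practice, here is a toy example (the table and column names are invented for the illustration; I'm using an in-memory database so it runs anywhere):

```python
import sqlite3

# Two standalone tables: no foreign keys, no SQL joins. The 'join'
# between positions and static data happens in Python instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (instrument TEXT, contracts INTEGER)")
conn.execute("CREATE TABLE static (instrument TEXT, currency TEXT)")
conn.execute("INSERT INTO positions VALUES ('CRUDE_W', 2)")
conn.execute("INSERT INTO static VALUES ('CRUDE_W', 'USD')")

# Relation done in software: read static data into a dict, look it up
# while iterating over positions.
currency = {row[0]: row[1] for row in conn.execute("SELECT * FROM static")}
for instrument, contracts in conn.execute("SELECT * FROM positions"):
    print(instrument, contracts, currency[instrument])
```

For a handful of small tables this is simpler to reason about than a proper relational schema, at the cost of the database not enforcing consistency for you.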

I switched from pickle to HDF5 as the latter is much more efficient and very easy to use.
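The switch itself is almost a one-liner with pandas (the key name and contents below are made up; pandas needs the PyTables package, 'tables', installed for HDF5 support):

```python
import tempfile
from pathlib import Path

import pandas as pd

# A toy time series diagnostic, of the kind previously pickled.
prices = pd.Series([100.0, 101.5, 99.8],
                   index=pd.date_range("2013-12-09", periods=3))

h5_path = Path(tempfile.mkdtemp()) / "diagnostics.h5"
prices.to_hdf(h5_path, key="crude_price")            # replaces pickle.dump
recovered = pd.read_hdf(h5_path, key="crude_price")  # replaces pickle.load
```

One nice property over pickle is that several keyed objects can live in the same .h5 file, which keeps the diagnostics directory tidy.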

Finally please note that nothing here constitutes a commercial plug for any of the products listed, and it shouldn't be taken as a thorough review of the alternatives. Do your own research etc etc etc. If any of the manufacturers on this page does want to pay me for the mention I will of course accept the money, but I am afraid I will have to mention it in really small writing.