Monday, 9 December 2013

Setup for a home based systematic trading system

The physical stack: From top to bottom - cable/wireless router (out of shot), Mint box (used for monitoring), 8 port switch, then left to right: cable modem, NAS drive, Backup drive, then left to right Primary trading server, Secondary trading server 



In this post I am going to talk about my setup for 'home based' systematic trading, both hardware and software.

This post is updated when I change my setup. 

Last update: February 2021

Before I kick off, a really key point is that the setup you need depends a lot on what kind of trading you do. Essentially you have to balance the following criteria:

  • cost
  • speed
  • flexibility
  • degree of difficulty
  • robustness and redundancy
  • ease of automation

The relative importance of those factors will depend on what kind of trading you are doing. I am doing fairly slow, fully automated, medium speed trading over a fairly large number of markets with a 'trend following character'. This means that speed isn't very important to me but the other factors are.

If I was doing faster trading I'd probably need a more powerful machine and I would be worried about latency (delays in getting trades out) from my choice of software and database. If I was doing some kind of complex non linear analysis again I would need more juice on the processing side. If I was doing some kind of 'negative skew / selling optionality' strategy where my return profile was lots of small gains and occasional big losses I would be more relaxed about turning my system off on days where I can't monitor it closely; as it is I want to keep it on to capture big trends when they appear so robustness is ultra important. If I wasn't fully automated and I was doing my own execution manually on a smaller number of markets I might be happier using an 'out of the box' system to produce my signals and there would be more brokers I could potentially use. So the lesson here is that one size won't fit all.

There is also a heavy bias coming in from my own background and experience; like most people I want to stick to what I am used to.

Windows vs Unix

My choice: Unix


To be precise I am running on Linux Mint. Mainly this is due to my own experience; I know how to do many things on Unix systems which would take me a long time to figure out on Windows. There are many other advantages and if you google 'Linux vs Windows' you will get about eight zillion hits back which will bore you stupid about which is better. I certainly am glad I didn't have to spend any time researching this choice; I think it is a discussion likely to cause physical violence right up there with 'Which God?' and 'Which football team?', and like those questions there is no right answer (well there is, and it's Unix, but I am not going to waste time trying to unburden you of your ignorance).

As for the choice of Mint I am no expert on Linux 'distros' and to be honest this is the only one I have installed. What I like about it is the Windows like user interface (in fact it is more Windows like than Windows 8). It's probably a bit 'heavy' and a slimmer distro might be sufficient for a trading only machine, but I also use my boxes for other purposes so it's nice to effectively have a complete Windows replacement even if that means having some disk space filled with media drivers etc.

Apple? Well it's Unix now, isn't it, and I have an aversion to expensive brand names, but if you are a slavish fashion victim who can't be seen without the latest shiny white box then who am I to tell you differently.


Cheap hardware or expensive hardware

My choice: Cheap(ish)

I don't need fat processing power as I am doing fairly slow trading, so what matters here is robustness and cost. Given a budget of say £1000, it is better to buy two £500 machines and get redundancy. In practice I am spending much less than that on second hand machines. Each of these is probably more likely to fail than a new £1000 box, but the chance of two of them failing at the same time is arguably lower than the chance of a single £1000 machine failing. Running Linux here helps as well, as you can get much better performance for a given box than with Windows. Make sure that any relevant drivers are supported by your Linux distro.
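To put some rough numbers on the redundancy argument, here is a minimal sketch. The failure probabilities are made up purely for illustration (I haven't measured anything), and the calculation assumes the two boxes fail independently, which a power cut or a house fire would happily disprove.

```python
# Illustrative numbers only: assumed probabilities of a machine being out of
# action over some fixed period. These are NOT measured values.
P_FAIL_NEW = 0.05    # assumed: a single new £1000 machine
P_FAIL_USED = 0.15   # assumed: each cheaper second hand machine


def prob_all_fail(p_fail: float, n_machines: int) -> float:
    """Probability that every machine is down, assuming independent failures."""
    return p_fail ** n_machines


single_new = prob_all_fail(P_FAIL_NEW, 1)
pair_used = prob_all_fail(P_FAIL_USED, 2)

print(f"One new machine down:         {single_new:.4f}")
print(f"Both second hand boxes down:  {pair_used:.4f}")
```

With these assumed inputs the pair of less reliable boxes is still the safer bet, but the conclusion flips if the individual failure rate gets high enough or the failures are strongly correlated.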

I know everyone loves Raspberry Pis but they are a bit too bare bones for my liking, so building a trading system on one is clearly a bit of a pointless CV padding or geek ego boosting exercise.

Currently my dual machine trading estate consists of two identical machines (a live machine, and a backup machine I use for testing and occasional research), both built for me by pcspecialist at a total cost of just over a grand:

  • Case: CiT MTX-007B Mini ITX Case 180W
  • CPU: Intel® Core™ i7 Eight Core Processor i7-9700 (3.0GHz) 12MB Cache
  • Motherboard: ASUS® PRIME H310i PLUS R2.0: Mini-ITX, LGA1151, USB 3.1, SATA 6GBs
  • RAM: 32GB Corsair VENGEANCE DDR4 2400MHz (2 x 16GB)
  • Storage: 1TB SEAGATE BARRACUDA 120 2.5" SSD, (up to 560MB/sR | 540MB/sW)

I also have a further machine which I previously used for trading, but I now use purely for monitoring purposes:

- 4GB/500GB Mint Box

As you can see below I did manage for a while with a series of second hand machines of varying ability. But I eventually decided to have identical dual machines, and I've also bitten the bullet and spent a few quid on machines that are reasonable spec and brand new. 

I've successfully run my trading system on all of the following.

- 4GB/500GB Mint Box. £400 new.
- 4GB/60GB fanless Zanshuri beta MK1. I got this for £100 off eBay.
- 4GB/200GB ancient Toshiba laptop. £120 from a second hand laptop site.
- 2GB/100GB almost as ancient Samsung netbook. I actually bought this new, about 8 years ago, so I can't remember what it cost.
- 4GB/80GB Dell SFF Optiplex PC. £120 off Amazon.

If I had enough money and a big office I'd buy a billion monitors and a really fast over powered  machine. Not because I need them to trade, but because they look cool.

If anyone cares, I mostly do my research on a Thinkpad Laptop (T480 i7 16GB 512GB).


Backup / redundancy / storage:

My choice: Parallel systems, local NAS


As noted above I have two machines that can run my code. The second machine can either be 'active' (essentially running in parallel, but writing to shadow storage and issuing 'fake' orders) or 'passive' (ready to go but not running in parallel). Either way you need to think about joint storage, how you would do failovers, ensuring synchronisation and so on. Even if you don't go down the dual machine route you need to worry about backups, or at least you should.
 
I use multiple machines with the fallback machine in a passive state. Each uses local storage to store all its data; i.e. the passive machine won't automatically have its data updated.

The failover then is just to stop the crontab and any active processes on the old active machine, copy the updated data up to my NAS  (which in practice I do daily anyway as part of a backup process) and then down to the old passive machine, pull the latest code and reinstall, and finally start the crontab on the new active machine (there is a bit more work if I do this in the middle of a trading day obviously). If a machine is lost the worst that will happen is that I will have to manually put orders traded into the database since the last backup was done (prices and thus signals, at least to daily frequency, are recovered automatically at the next data feed).
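The failover steps above can be sketched as an ordered checklist. This is only an illustration: the paths and the saved crontab file are hypothetical, it assumes the NAS is mounted on both machines and that each one keeps a copy of the crontab at ~/crontab.saved, and it only builds the commands rather than running them. Stopping running processes and reinstalling the code are left as comments because they depend on how the system is launched and packaged.

```python
from typing import List

# Hypothetical locations, purely for illustration.
NAS_BACKUP = "/mnt/nas/trading_backup/"
DATA_DIR = "/home/trader/data/"
CODE_DIR = "/home/trader/code"


def failover_plan(old_active: str, new_active: str) -> List[str]:
    """Return the ordered shell commands for a manual failover (nothing is run)."""
    return [
        # 1. Stop scheduled jobs on the old active machine, keeping a copy of
        #    its crontab (any still-running processes must be stopped by hand)
        f"ssh {old_active} 'crontab -l > ~/crontab.saved && crontab -r'",
        # 2. Copy the latest data up to the NAS...
        f"ssh {old_active} 'rsync -a {DATA_DIR} {NAS_BACKUP}'",
        # 3. ...and back down to the old passive machine
        f"ssh {new_active} 'rsync -a {NAS_BACKUP} {DATA_DIR}'",
        # 4. Pull the latest code (the reinstall step is omitted here)
        f"ssh {new_active} 'cd {CODE_DIR} && git pull'",
        # 5. Start scheduled jobs on the new active machine
        f"ssh {new_active} 'crontab ~/crontab.saved'",
    ]


for step in failover_plan("primary", "secondary"):
    print(step)
```

If the old machine is actually dead then steps 1 and 2 obviously can't happen, and the data comes from the last daily backup on the NAS instead.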

So I would normally run the same code and same data (after syncing) on either box, but have the facility to check new code or a new implementation by running it in parallel on the 'test' box first. I would then overwrite any data produced by testing with the 'live' data, since the exercise is about robustness not parallel checks, and spending time trying to reconcile the two sets of data is a waste of effort.

Since I won't necessarily be sitting at the active machine I use ssh (for just looking at logs and running diagnostics), sometimes with screen (so I don't need to stay connected). Unfortunately this isn't enough to restart the Interactive Brokers Gateway (necessary after a power cut), which requires a persistent GUI; so I use x11vnc. I also have a monitor and keyboard handy to plug into the box for emergencies (top tip: buy a really, really long HDMI cable if you don't want to move your monitor).

I use a Synology disk station as my single shared storage NAS which automatically backs up all my data as well as holding the master git repository of my code and key documents (and acts as a music server and general dogsbody storage) with RAID. This is automatically backed up nightly on to a USB connected drive, and sporadically on to USB sticks that live in my safe.

I used to use parallel active machines but chose to simplify things; I also found the NAS a bit sluggish, at least over my network, so I prefer to use it as backup only rather than primary data storage.

You still need offsite backup and I use Google Drive to take copies of my data and code (though this isn't a full 'image' backup it means with a few hours work I could reconstruct a working system from a virgin Linux machine, requiring just the installation of the right libraries etc). Most of my code is on GitHub anyway, apart from some private configuration files.

Another issue I haven't really been able to address is the failure of local power and/or internet (although running with laptops does effectively give you a UPS). The best solution would be to buy a virtual linux server on a cloud somewhere and let someone else worry about all that stuff. This is actually quite cheap but I am a bit daunted by the potential complexity of having to set this up and get everything working remotely. Some of the people reading this will think I am some kind of legendary uber technology guru, but others will assure you that I am not. Also: I really like having physical boxes :-)

I have thought about, but never bothered with, RAID.


Bespoke vs Out of the box

My choice: Bespoke


Again clearly this is a matter of choice and personal skill levels; it would take me much longer to get acquainted with something like NinjaTrader than to write a new system from scratch, which is what I have done (twice! The newest system is here). And having written that system from scratch it is ultra flexible to modify as required. Were I starting from a zero base I would probably still build everything myself even if it took much longer, as it is very satisfying and you learn lots of skills which could be useful if you are doing this with the intention of getting a job in the industry. However if you just want to get something working fast and aren't really interested in the 'plumbing' aspect then I guess an out of the box system would make sense. But I reserve the right to call you a total wimp. Which you are.


Broker

My choice: Interactive Brokers


As I am doing fully automated trading there didn't seem to be any choice on this as they are the only company that seems to offer a proper API to retail customers; although I would love to be shown alternatives since it isn't ideal to only have one route to market. What they offer is not perfect and there are lots of things I would like to see changed, like the fact you can get 15 minute data free with their front end but not with their API. The API itself is also a bit fiddly and not very user friendly but that could also be because I am not very technically gifted, as you will note I keep emphasising.


Gateway or TWS?

My choice: Gateway


This is a very esoteric choice but if you are using the IB API you can either run the 'server' process as the TWS (which is a fully featured front end) or the Gateway (which does nothing but service API requests). For robustness you should use the Gateway since it never falls over and I usually leave one session running for a week. If you are doing your own trades then you will have to use TWS.


Data feed:

My choice: Interactive Brokers with barchart.com for historic data

I used to use quandl.com until they stopped supplying free futures data.

This is a good choice if you are using slow signals, since the barchart stuff is relatively cheap and very easy to get via their API. The IB stuff has a price depending on the instruments but is pretty cheap; however it can be flaky. I build in filters and averaging to deal with the flaky data, but this does make my system unsuitable for high frequency trading.
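To give a flavour of what 'filters and averaging' can mean, here is a minimal sketch in plain Python. It is not the filter from my production code: the 5% threshold, the window sizes and the function names are all assumptions chosen for illustration, and it assumes prices are strictly positive.

```python
from statistics import median
from typing import List, Optional


def filter_spikes(
    prices: List[float], max_move: float = 0.05, window: int = 5
) -> List[Optional[float]]:
    """Replace with None any price more than `max_move` (as a fraction) away
    from the median of the previous `window` accepted prices."""
    accepted: List[float] = []
    cleaned: List[Optional[float]] = []
    for price in prices:
        recent = accepted[-window:]
        if recent and abs(price / median(recent) - 1.0) > max_move:
            cleaned.append(None)  # treat as a bad tick and skip it
        else:
            cleaned.append(price)
            accepted.append(price)
    return cleaned


def average_valid(cleaned: List[Optional[float]], window: int = 3) -> Optional[float]:
    """Mean of the last `window` accepted prices, or None if there are none."""
    valid = [p for p in cleaned if p is not None][-window:]
    return sum(valid) / len(valid) if valid else None


ticks = [100.0, 100.2, 100.1, 250.0, 100.3, 100.4]
cleaned = filter_spikes(ticks)
smoothed = average_valid(cleaned)
print(cleaned)
print(smoothed)
```

Note the obvious weakness: a genuine jump bigger than the threshold would be rejected for ever, so a real filter needs some way of eventually accepting a new price level. The averaging also adds delay, which is exactly why this sort of thing is no good for fast trading.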

I don't use fundamental data like stock PE ratios for automated trading so I can't comment on that. I am still a bit of a luddite with that part of my portfolio and like to check the numbers by looking at the actual annual report, preferably a paper copy; ideally one bound in vellum and written with a quill pen.

There are a few markets I can't trade because of expensive IB data, so a future project is to get a free source, maybe http://www.bloomberg.com/markets/chart/data/1D/AAPL:US or using bs4 to scrape someone's screen.


Programming language

My choice: Python with numpy and pandas


I've written reasonable chunks of code in Matlab, R, S plus and dabbled in a lot of other languages over the years, but Python was the last language I used in a corporate setting. I guess being objective it is quite a nice mixture of the statistical languages mentioned above as well as being easier to use for scripting, and from my perspective it is harder to write bad code in Python than R, which is very important for a clumsy dolt like me. Having said that, the way it deals with matrices, arrays and time series isn't as natural as those other languages and can take some getting used to. Like R it's free / open source, which is also nice. There is a lot of Python code out there, and a lot of Python programmers, so if you learn it you may even get some kind of job in the field, although just to warn you it will probably involve glueing HTML and SQL together.

Of course if I was writing high frequency trading code I would probably have to learn C++, and arguably if I was a real man I would (in my youth there was a saying that real men only used assembler but I guess even real men aren't as real as they used to be).


API access method

My choice: ib_insync

In the past I used the native Python API for IB directly, and I've written some more posts explaining how to use it, starting here. In the even more distant past I used swigibpy, which is now deprecated.

I now use ib_insync, which makes life much easier as it means we can treat the IB server as just some random set of objects with state that gets magically updated by a wizard when we're not looking, and not worry about concurrency or threads or anything.
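As a flavour of what that looks like, here is a minimal sketch using `ib_insync` (the package's import name). It assumes a Gateway is already running on the same machine and listening on port 4002; the host, port, client id and the helper name `fetch_positions` are all assumptions for illustration, not my production settings.

```python
from typing import List, Tuple


def fetch_positions(
    host: str = "127.0.0.1", port: int = 4002, client_id: int = 1
) -> List[Tuple[str, float]]:
    """Connect to a running Gateway and return (local symbol, position) pairs."""
    # Imported here so this module still loads on a box without ib_insync
    from ib_insync import IB

    ib = IB()
    ib.connect(host, port, clientId=client_id)  # blocks until connected
    try:
        # positions() just reads state that ib_insync keeps up to date for us;
        # there is no callback or thread handling in our own code
        return [(p.contract.localSymbol, p.position) for p in ib.positions()]
    finally:
        ib.disconnect()
```

Calling `fetch_positions()` with a Gateway running should return the current positions as a plain list; without one it will simply fail to connect.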



Storage methodology

My choice: MongoDB, Arctic, YAML
Previously: SQLite, .csv and HDF5 files

What do you need to store? I would argue you need to store time series data, like prices; static data which is rarely changed, like the native currency of a futures contract; and configuration data, like what futures we trade and when.

I mostly use .yaml files for configuration, but occasionally also .csv. In my production system configuration is also dumped in mongoDB tables. Time series data is put in Arctic, which sits on top of mongoDB. Finally I store backtest diagnostics as simple pickle files.


Finally...

Finally please note that nothing here constitutes a commercial plug for any of the products listed, and it shouldn't be taken as a considered, thorough review of the alternatives. Do your own research etc etc etc. If any of the manufacturers on this page does want to pay me for the mention I will of course accept the money but I am afraid I will have to mention it in really small writing.