Monday 9 December 2013

Setup for a home based systematic trading system

The physical stack: From top to bottom - cable/wireless router (out of shot), Mint box (used for monitoring), 8 port switch, then left to right: cable modem, NAS drive, Backup drive, then left to right Primary trading server, Secondary trading server 



In this post I am going to talk about my setup for 'home based' systematic trading, both hardware and software.

This post is updated when I change my setup. 

Last update: February 2021

Before I kick off a really key point is that what setup you have depends a lot on what kind of trading you do. Essentially you have to balance the following criteria:

  • cost
  • speed
  • flexibility
  • degree of difficulty
  • robustness and redundancy
  • ease of automation

The relative importance of those factors will depend on what kind of trading you are doing. I am doing fully automated, slow to medium speed trading over a fairly large number of markets with a 'trend following character'. This means that speed isn't very important to me but the other factors are.

If I was doing faster trading I'd probably need a more powerful machine and I would be worried about latency (delays in getting trades out) from my choice of software and database. If I was doing some kind of complex non linear analysis again I would need more juice on the processing side. If I was doing some kind of 'negative skew / selling optionality' strategy where my return profile was lots of small gains and occasional big losses I would be more relaxed about turning my system off on days where I can't monitor it closely; as it is I want to keep it on to capture big trends when they appear so robustness is ultra important. If I wasn't fully automated and I was doing my own execution manually on a smaller number of markets I might be happier using an 'out of the box' system to produce my signals and there would be more brokers I could potentially use. So the lesson here is that one size won't fit all.

There is also a heavy bias coming in from my own background and experience; like most people I want to stick to what I am used to.

Windows vs Unix

My choice: Unix


To be precise I am running on Linux Mint. Mainly this is due to my own experience; I know how to do many things on Unix systems which would take me a long time to figure out on Windows. There are many other advantages and if you google Linux vs Windows you will get about eight zillion hits back which will bore you stupid about which is better. I certainly am glad I didn't have to spend any time researching this choice; I think it is a discussion likely to cause physical violence right up there with 'Which God?' and 'Which football team?', and like those questions there is no right answer (well there is, and it's Unix, but I am not going to waste time trying to unburden you of your ignorance).

As for the choice of Mint I am no expert on Linux 'distros' and to be honest this is the only one I have installed. What I like about it is the Windows-like user interface (in fact it is more Windows-like than Windows 8). It's probably a bit 'heavy' and a slimmer distro might be sufficient for a trading only machine, but I also use my boxes for other purposes so it's nice to effectively have a complete Windows replacement even if that means having some disk space filled with media drivers etc.

Apple? Well it's Unix now isn't it, and I have an aversion to expensive brand names, but if you are a slavish fashion victim who can't be seen without the latest shiny white box then who am I to tell you differently.


Cheap hardware or expensive hardware

My choice: Cheap(ish)

I don't need fat processing power as I am doing fairly slow trading, so what matters here is robustness and cost. Given a budget of say £1000 it is better to buy two £500 machines and get redundancy. In practice I am spending much less than that on second hand machines, each of which is probably more likely to fail than a new £1000 box, but arguably the chance of both failing at the same time is lower than the chance of the single £1000 machine failing. Running Linux here helps as well, as you can get much better performance for a given box than with Windows. Make sure that any relevant drivers are supported by your Linux distro.
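To put rough numbers on the redundancy argument, here is a minimal sketch. It assumes failures are independent, and the annual failure probabilities (10% for a cheap second hand box, 3% for a new £1000 box) are made up purely for illustration.

```python
def prob_all_fail(p_single: float, n_machines: int) -> float:
    """Probability that every one of n independent machines fails in the same period."""
    return p_single ** n_machines


# Illustrative, made-up annual failure probabilities (assumptions, not measurements)
P_CHEAP = 0.10      # one cheap second hand box
P_EXPENSIVE = 0.03  # one new 1000 pound box

two_cheap_down = prob_all_fail(P_CHEAP, 2)          # both cheap boxes fail together
one_expensive_down = prob_all_fail(P_EXPENSIVE, 1)  # the single expensive box fails

print(f"Two cheap boxes both down: {two_cheap_down:.3f}")
print(f"One expensive box down:    {one_expensive_down:.3f}")
```

Independence is the big assumption here: a power cut or a dead router takes out both boxes at once, so the real joint failure probability will be higher than the simple product suggests.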

I know everyone loves Raspberry Pis but they are a bit too bare bones for my liking, so building a trading system on one is clearly a bit of a pointless CV padding or geek ego boosting exercise.

Currently my dual machine trading estate consists of two identical machines (a live machine, and a backup machine I use for testing and occasional research), both built for me by pcspecialist at a total cost of just over a grand:

  • Case: CiT MTX-007B Mini ITX Case 180W
  • CPU: Intel® Core™ i7 Eight Core Processor i7-9700 (3.0GHz) 12MB Cache
  • Motherboard: ASUS® PRIME H310i PLUS R2.0: Mini-ITX, LGA1151, USB 3.1, SATA 6GBs
  • RAM: 32GB Corsair VENGEANCE DDR4 2400MHz (2 x 16GB)
  • Storage: 1TB SEAGATE BARRACUDA 120 2.5" SSD, (up to 560MB/sR | 540MB/sW)

I also have a further machine which I previously used for trading, but I now use purely for monitoring purposes:

- 4GB/500GB Mint Box

As you can see below I did manage for a while with a series of second hand machines of varying ability. But I eventually decided to have identical dual machines, and I've also bitten the bullet and spent a few quid on machines that are reasonable spec and brand new. 

I've successfully run my trading system on all of the following.

- 4GB/500GB Mint Box. £400 new.
- 4GB/60GB fanless Zanshuri beta MK1. I got this for £100 off eBay.
- 4GB/200GB ancient Toshiba laptop. £120 from a second hand laptop site.
- 2GB/100GB almost as ancient Samsung netbook. I actually bought this new, about 8 years ago, so I can't remember what it cost.
- 4GB/80GB Dell SFF Optiplex PC. £120 off Amazon.

If I had enough money and a big office I'd buy a billion monitors and a really fast overpowered machine. Not because I need them to trade, but because they look cool.

If anyone cares, I mostly do my research on a Thinkpad Laptop (T480 i7 16GB 512GB).


Backup / redundancy / storage:

My choice: Parallel systems, local NAS


As noted above I have two machines that can run my code. The second machine can either be 'active' (essentially running in parallel, but writing to shadow storage and issuing 'fake' orders) or 'passive' (ready to go but not running in parallel). Either way you need to think about joint storage, how you would do failovers, ensuring synchronisation and so on. Even if you don't go down the dual machine route you need to worry about backups, or you should do.
 
I use multiple machines with the fallback machine in a passive state. Each uses local storage to store all its data; i.e. the passive machine won't automatically have its data updated.

The failover then is just to stop the crontab and any active processes on the old active machine, copy the updated data up to my NAS (which in practice I do daily anyway as part of a backup process) and then down to the old passive machine, pull the latest code and reinstall, and finally start the crontab on the new active machine (there is a bit more work if I do this in the middle of a trading day, obviously). If a machine is lost the worst that will happen is that I will have to manually enter into the database any orders traded since the last backup was done (prices and thus signals, at least to daily frequency, are recovered automatically at the next data feed).
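The failover sequence above can be written down as an ordered checklist. This is only a sketch of the order of operations; the host names `primary`, `secondary` and `nas` are hypothetical, and the steps are descriptions to follow by hand rather than commands that get executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    host: str         # machine the step is carried out on
    description: str  # what to do there


def failover_plan(old_active: str, new_active: str, nas: str) -> list[Step]:
    """Ordered failover steps, mirroring the prose description."""
    return [
        Step(old_active, "stop the crontab and any active trading processes"),
        Step(old_active, f"copy the updated data up to {nas}"),
        Step(new_active, f"copy the data down from {nas}"),
        Step(new_active, "pull the latest code and reinstall"),
        Step(new_active, "start the crontab"),
    ]


# Hypothetical host names, for illustration only
plan = failover_plan("primary", "secondary", "nas")
for number, step in enumerate(plan, start=1):
    print(f"{number}. [{step.host}] {step.description}")
```

The ordering is the useful part: the crontab on the new machine is started last, after the old machine has been stopped and the data copied across.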

So I would normally run the same code and same data (after synching) on either box but have the facility to check new code or a new implementation by running it in parallel on the 'test' box first. I would then overwrite any data produced by testing with the 'live' data, since the exercise is about robustness not parallel checks, and spending time trying to reconcile the two sets of data is a waste of effort.

Since I won't necessarily be sitting at the active machine I use ssh (for just looking at logs and running diagnostics), sometimes with screen (so I don't need to stay connected). Unfortunately this isn't enough to restart the Interactive Brokers Gateway (necessary after a power cut), which requires a persistent GUI; so I use x11vnc. I also have a monitor and keyboard handy to plug into the box for emergencies (top tip: buy a really, really long HDMI cable if you don't want to move your monitor).

I use a Synology disk station as my single shared storage NAS which automatically backs up all my data as well as holding the master git repository of my code and key documents (and acts as a music server and general dogsbody storage) with RAID. This is automatically backed up nightly on to a USB connected drive, and sporadically on to USB sticks that live in my safe.

I used to use parallel active machines but chose to simplify things; I also found the NAS a bit sluggish, at least over my network, so I prefer to use it as backup only rather than primary data storage.

You still need offsite backup and I use Google Drive to take copies of my data and code (though this isn't a full 'image' backup, it means with a few hours work I could reconstruct a working system from a virgin Linux machine, requiring just the installation of the right libraries etc). Most of my code is on GitHub anyway, apart from some private configuration files.

Another issue I haven't really been able to address is the failure of local power and/or internet (although running with laptops does effectively give you a UPS). The best solution would be to buy a virtual linux server on a cloud somewhere and let someone else worry about all that stuff. This is actually quite cheap but I am a bit daunted by the potential complexity of having to set this up and get everything working remotely. Some of the people reading this will think I am some kind of legendary uber technology guru, but others will assure you that I am not. Also: I really like having physical boxes :-)

I have thought about, but never bothered with, RAID.


Bespoke vs Out of the box

My choice: Bespoke


Again clearly this is a matter of choice and personal skill levels; it would take me much longer to get acquainted with something like NinjaTrader than to write a new system from scratch, which is what I have done (twice! The newest system is here). And having written that system from scratch it is ultra flexible to modify as required. Were I starting from a zero base I would probably still build everything myself even if it took much longer, as it is very satisfying and you learn lots of skills which could be useful if you are doing this with the intention of getting a job in the industry. However if you just want to get something working fast and aren't really interested in the 'plumbing' aspect then I guess an out of the box system would make sense. But I reserve the right to call you a total wimp. Which you are.


Broker

My choice: Interactive Brokers


As I am doing fully automated trading there didn't seem to be any choice on this as they are the only company that seems to offer a proper API to retail customers; although I would love to be shown alternatives since it isn't ideal to only have one route to market. What they offer is not perfect and there are lots of things I would like to see changed, like the fact you can get 15 minute data free with their front end but not with their API. The API itself is also a bit fiddly and not very user friendly but that could also be because I am not very technically gifted, as you will note I keep emphasising.


Gateway or TWS?

My choice: Gateway


This is a very esoteric choice but if you are using the IB API you can either run the 'server' process as the TWS (which is a fully featured front end) or the Gateway (which does nothing but service API requests). For robustness you should use the Gateway since it never falls over and I usually leave one session running for a week. If you are doing your own trades then you will have to use TWS.


Data feed:

My choice: Interactive Brokers with barchart.com for historic data

I used to use quandl.com until they stopped supplying free futures data.

This is a good choice if you are using slow signals since the barchart stuff is relatively cheap and very easy to get via their API; the IB stuff has a price depending on the instruments but is pretty cheap, however it can be flaky. I build in filters and averaging to deal with the flaky data but this does make my system unsuitable for high frequency trading.
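As an illustration of the kind of filtering and averaging that can tame a flaky feed, here is a minimal sketch. It is not the filter my system actually uses; the window length, the 5% threshold and the function names are all assumptions chosen for the example.

```python
from statistics import mean, median


def filter_spikes(prices, window=5, max_move=0.05):
    """Drop any price more than max_move (as a fraction) away from the
    median of the last `window` accepted prices."""
    accepted = []
    for price in prices:
        recent = accepted[-window:]
        if recent and abs(price / median(recent) - 1.0) > max_move:
            continue  # treat as a bad tick and skip it
        accepted.append(price)
    return accepted


def smoothed_price(prices, window=5, max_move=0.05):
    """Average of the last `window` prices that survive the spike filter."""
    clean = filter_spikes(prices, window, max_move)
    return mean(clean[-window:])


ticks = [100.0, 100.5, 100.2, 250.0, 100.4, 100.1]  # 250.0 is a bad print
print(filter_spikes(ticks))
print(smoothed_price(ticks))
```

A filter this crude would also reject a genuine jump in price forever, so anything used in production needs some way of accepting a level shift that persists. The averaging is also exactly what adds the delay that rules out high frequency trading.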

I don't use fundamental data like stock PE ratios for automated trading so I can't comment on that. I am still a bit of a luddite with that part of my portfolio and like to check the numbers by looking at the actual annual report, preferably a paper copy; ideally one bound in vellum and written with a quill pen.

There are a few markets I can't trade because of expensive IB data so a future project is to get a free source, maybe http://www.bloomberg.com/markets/chart/data/1D/AAPL:US or using bs4 to scrape someone's screen.


Programming language

My choice: Python with numpy and pandas


I've written reasonable chunks of code in Matlab, R, S plus and dabbled in a lot of other languages over the years but Python was the last language I used in a corporate setting. I guess being objective it is quite a nice mixture of the statistical languages mentioned above as well as being easier to use for scripting, and from my perspective it is harder to write bad code in Python than R, which is very important for a clumsy dolt like me. Having said that, the way it deals with matrices, arrays and time series isn't as natural as those other languages and can take some getting used to. Like R it's free / open source, which is also nice. There is a lot of Python code out there, and a lot of Python programmers, so if you learn it you may even get some kind of job in the field, although just to warn you it will probably involve glueing HTML and SQL together.

Of course if I was writing high frequency trading code I would probably have to learn C++, and arguably if I was a real man I would (in my youth there was a saying that real men only used assembler but I guess even real men aren't as real as they used to be).


API access method

My choice: ib_insync

In the past I used the native Python API for IB directly, and I've written some more posts explaining how to use it, starting here. In the even more distant past I used swigibpy, which is now deprecated.

I now use ib_insync which makes life much easier as it means we can treat the IB server as just some random set of objects with state that gets magically updated by a wizard when we're not looking, and not worry about concurrency or threads or anything.
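To give a flavour of what that looks like, here is a minimal sketch that reads current positions through ib_insync. It assumes a Gateway is already running and logged in on the same machine; the port (4002, the usual Gateway paper trading default) and the client id are assumptions you would change to match your own setup, and `fetch_positions` is just a name made up for this example.

```python
def fetch_positions(host="127.0.0.1", port=4002, client_id=1):
    """Connect to a running IB Gateway, return current positions, then disconnect."""
    # Imported here so this sketch can be loaded without the package installed
    from ib_insync import IB

    ib = IB()
    ib.connect(host, port, clientId=client_id)
    try:
        # ib_insync keeps its view of the account state up to date behind
        # the scenes; we simply read the current list of positions
        return [(p.contract.localSymbol, p.position) for p in ib.positions()]
    finally:
        ib.disconnect()
```

Calling `fetch_positions()` returns a list of (contract symbol, position size) pairs, with no callbacks or threads in sight.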



Storage methodology

My choice: mongoDB, Arctic, Yaml
Previously: SQLite, .csv and HDF5 files

What do you need to store? I would argue you need to store time series data, like prices; static data which is rarely changed, like the native currency of a futures contract; and configuration data, like what futures we trade and when.

I mostly use .yaml files for configuration, but occasionally also .csv. In my production system configuration is also dumped in mongoDB tables. Time series data is put in Arctic, which sits on top of mongoDB. Finally I store backtest diagnostics as simple pickle files.
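The pickle part is about as low tech as it sounds. A minimal sketch of the write once, read many times pattern follows; the contents of the diagnostics dictionary and the file name are invented for the example.

```python
import pickle
import tempfile
from pathlib import Path


def save_diagnostics(diagnostics, path):
    """Write a diagnostics object to disk in one go."""
    with open(path, "wb") as f:
        pickle.dump(diagnostics, f)


def load_diagnostics(path):
    """Read back exactly the object that was saved."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Invented example contents; a real backtest would store much more
diagnostics = {"instrument": "EXAMPLE", "forecast": [1.5, -0.3, 2.0], "position": 3}
path = Path(tempfile.mkdtemp()) / "backtest_diagnostics.pkl"

save_diagnostics(diagnostics, path)
restored = load_diagnostics(path)
print(restored == diagnostics)
```

The attraction is that you get the same Python object back without writing any schema. The usual warning applies: only unpickle files you created yourself, since loading a pickle can execute arbitrary code.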


Finally...

Finally please note that nothing here constitutes a commercial plug for any of the products listed and it shouldn't be taken as a considered, thorough review of the alternatives. Do your own research etc etc etc. If any of the manufacturers on this page does want to pay me for the mention I will of course accept the money but I am afraid I will have to mention it in really small writing.

18 comments:

  1. I am very glad to hear you've discovered Quandl; it is an amazing resource and they seem to be adding more data every day. It seems it hasn't caught on as a big thing yet but I think it will.

    You only mentioned it in passing but GitHub is also amazing. It makes storing, sharing and collaborating on code so easy. I've seamlessly worked on the same code base on five different computers using GitHub.

    Also, you're missing out by not renting a server. I've dabbled with GoDaddy and AWS which seem ok, but Windows Azure is very good (yes, I know, even though it's a Windows product). You can set up a Virtual Machine in like 30mins and with GitHub get everything running in no time. It's running all the time with all the computing power you want (to buy) and you can remote in from anywhere, although that probably doesn't matter if you're at home all day.

    I haven't started using them yet but apparently HDF5 files are the best for saving data, faster than CSVs and the functionality to almost replace databases.

  2. Tom thanks for your comments and a billion apologies for not getting back sooner.

    GitHub, what puts me off is the initial up front cost in terms of time. Given it's only me working on the code, and on at most two machines (master and slave) I get by pretty well just copying and pasting. It's not a priority for me.

    Server rental is a more interesting proposition. This http://www.latentexistence.me.uk/how-to-set-up-a-free-linux-server-on-amazon-ec2/ is rather interesting. I am slightly scared by having an IB API server running I can't literally turn off if it goes haywire.

    HDF5 were used at AHL to store diagnostic type data (for which I currently use the rather low tech 'pickle') eg write once, read multiple times. I think they are brilliant for this application since they do all the work of recovering what the object you saved down looks like. For just storing time series of prices which are updated frequently... not sure.

  3. As you can see Tom I am now using both git and HDF5, so you win.

    Still no virtual server though :-)

  4. Hi Rob, I realize this post was from some time ago, but could you give some more insights into any tools you use to run your trading system. E.g. How do you run any periodic jobs - cron, or similar? How do you run and coordinate your system stages - do you run them as separate processes with any kind of inter process communication mechanism or ESB of some kind? How do you monitor processes which should be running and alert yourself if they unexpectedly stop or fail? Thanks.

    Replies
    1. Hi JWM. I run cron jobs, separate processes (one to get prices and calculate optimal positions, one for execution, plus one for reporting etc) talking via databases. Much more here http://qoppac.blogspot.co.uk/2015/09/systems-building-checks-and-balances.html

  5. Hi Robert,

    I have a question about placing future orders before markets opens.

    I'm in a situation where I have a day job and go to work around 7AM (CEST). I've created my own tool to load data, analyse it and calculate position sizes based on the ideas in your book (yes it was a lot of work but I'm proud of it :-) ).
    I like to trade on the daily closes, so it's possible to analyse the closes of the day before around 6AM and then create some orders. But at that moment markets are not open yet and I can't place any orders. Is there a way to handle this? I read something about LOO (Limit-on-open) and MOO (Market-on-open) orders but don't see these options in the broker software I'm looking at.

    Is it furthermore a good idea to (automatically) place trades at market open without controlling the execution yourself? I've read that markets at open are illiquid and are much more volatile. Maybe it's better to wait 30 minutes after markets open ?

    Thanks for some advice on this


    Replies
    1. Hi Kris.

      For your second point, yes, it's generally a good idea to avoid the market open. Spreads are wider and things can be a bit crazy. The exception is if your trading system is of the type where you will only want to buy if things hit a specific price, in which case it's fine to submit a limit order for that price.

      As for the first point I don't personally use these kinds of orders so I can't really be of much help here. Any advice I give you would be just what you could get from googling it...

    2. Thanks Robert, in the meantime I discovered most markets that I want to trade are open at morning (except two). I've also found your articles about the API from IB (or reseller Lynx in Belgium) and autotrading. Very interesting so I will go deeper in that.

      Thanks for all the interesting stuff on your website!

  6. Hi Robert,

    I recently discovered and purchased your book and I have to admit that I can't put it down now! I am a software engineer by trade and have been writing trading systems for my own use for a few years now. The concepts in your book are helping me improve my algos considerably so I can only thank you for that.

    I do have one question. I use spread betting as my main trading vehicle. The broker I use offers a comprehensive API and I am currently testing a new version of one of the systems that is able to place trades automatically (as opposed to just sending me signals for me to execute through their GUI). Seems to be working okay in the test environment.

    In the book, you use spread betting in the semi-automated example and discuss some of the benefits around not having to worry about contract sizes, currency conversion etc. I was wondering however if you could share some thoughts on using a SB broker as part of a fully automated system? Not too worried about the API/technical piece but more about the merits of SB when compared to a more traditional broker like IB?

    Thanks again,
    -Luis


    Replies
    1. Main benefits - smaller contract size (there normally is one, eg £ per point but smaller than futures)

      Main disadvantages - much more expensive than futures, OTC so counterparty risk and potential liquidity shortage in a crisis.

  7. Hi,

    How come you're using a 500gb harddrive? Would 128gb or lower be too little?

    /Matt

    Replies
    1. Disk space is effectively free so why not. I would need to check to see how much space I actually use.

  8. How come you don't use Amazon Web services to host your code? Also, what do you do in case of power breakdown?

    Replies
    1. A few reasons. When I began trading cloud was less accessible and I didn't know anything about it (both those facts have changed of course). It was also relatively expensive (again, cheaper now) compared to my strategy of buying several cheap machines and running the system across them. My system was (is) relatively relaxed about being shut down for a while because of power or internet failure, since it is trading relatively slowly (if I backtest the effect of delaying trades for a day the effect is zero, and even a two week delay doesn't reduce SR more than a few basis points). So for example I've always left the system running over the holidays. I plan, in the near future, to start using some faster trading systems. Most likely if I go on holiday I'd have to turn them off. However I haven't ruled out putting everything on the cloud at some point in the future.

  9. To clarify, do you 1) store data on both of your 500GB machines as well as run computations there, while the copy in your NAS is simply backup, or 2) store data in your NAS, and have your machines read from NAS to run backtest?

    Right now I simply use my laptop to run computations, but because its HD isn't big enough to store my data, I store that in a WD 4TB external harddrive which is plugged in to the laptop via USB. With some rejiggering it gets mounted on with postgresql running off there. I'm sure this probably isn't the most efficient setup, nor does it take into account redundancy, etc., so would love to hear more about how you set it up.

    Replies
    1. I store data on my machines, the NAS is just a backup. Otherwise the data has to go through the network every time I have a read/write, which is a lot (because I'm logging everything). An external USB drive would be a bit slower than a native hard drive, but probably still okay as long as you're not high frequency trading. But I'd seriously think about a backup.

  10. Please Rob, is there an update for this post?

    Replies
    1. As it says on the top: This post is updated when I change my setup. Last update: February 2021

