In this post I am going to talk about
my setup for 'home based' systematic trading, both hardware and
software.
This post is updated when I change my setup.
Last update: February 2021
Before I kick off, a really key point is that what setup you have depends a lot on what kind of trading you do. Essentially you have to balance the following criteria:
- cost
- speed
- flexibility
- degree of difficulty
- robustness and redundancy
- ease of automation
The relative importance of those
factors will depend on what kind of trading you are doing. I am doing
fairly slow, fully automated, medium speed trading over a fairly
large number of markets with a 'trend following character'. This
means that speed isn't very important to me but the other factors
are.
If I was doing faster trading I'd
probably need a more powerful machine and I would be worried about
latency (delays in getting trades out) from my choice of software and
database. If I was doing some kind of complex non linear analysis
again I would need more juice on the processing side. If I was doing
some kind of 'negative skew / selling optionality' strategy where my
return profile was lots of small gains and occasional big losses I
would be more relaxed about turning my system off on days where I
can't monitor it closely; as it is I want to keep it on to capture
big trends when they appear so robustness is ultra important. If I
wasn't fully automated and I was doing my own execution manually on a
smaller number of markets I might be happier using an 'out of the
box' system to produce my signals and there would be more brokers I
could potentially use. So the lesson here is that one size won't fit
all.
There is also a heavy bias coming in
from my own background and experience; like most people I want to
stick to what I am used to.
Windows vs Unix
My choice: Unix
To be precise I am running on Linux
Mint. Mainly this is due to my own experience; I know how to do many
things on Unix systems which would take me a long time to figure out
on Windows. There are many other advantages, and if you google 'Linux
vs Windows' you will get about eight zillion hits back which will bore
you stupid about which is better. I certainly am glad I didn't have
to spend any time researching this choice; I think it is a discussion
likely to cause physical violence, right up there with 'Which God?'
and 'Which football team?', and like those questions there is no
right answer (well there is, and it's Unix, but I am not going to
waste time trying to unburden you of your ignorance).
As for the choice of Mint, I am no
expert on Linux 'distros' and to be honest this is the only one I
have installed. What I like about it is the Windows-like user
interface (in fact it is more Windows-like than Windows 8). It's
probably a bit 'heavy' and a slimmer distro might be sufficient for a
trading-only machine, but I also use my boxes for other purposes so
it's nice to effectively have a complete Windows replacement even if
that means having some disk space filled with media drivers etc.
Apple? Well it's Unix now, isn't it, and
I have an aversion to expensive brand names, but if you are a slavish
fashion victim who can't be seen without the latest shiny white box
then who am I to tell you differently.
Cheap hardware or expensive hardware
My choice: Cheap(ish)
I don't need fat processing power as I
am doing fairly slow trading, so what matters here is robustness and
cost. Given a budget of say £1000 it is better to buy two £500
machines and get redundancy. In practice I am spending much less than
that on second hand machines, each of which is probably more likely to fail
than a new £1000 box; but arguably the pair are less likely to fail jointly
than a single £1000 machine is to fail on its own. Running Linux here
helps as well, as you can get much better performance for a given box
than with Windows. Make sure that any relevant drivers are supported by your Linux
distro.
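To put some rough numbers on the redundancy argument, here is a back-of-the-envelope sketch. Both failure probabilities are made up purely for illustration (I haven't measured them), and the sum assumes the two boxes fail independently, which a shared power cut or a leaky roof would obviously violate.

```python
# Illustrative only: both probabilities below are assumed, not measured.
p_fail_expensive = 0.02  # hypothetical chance the single new box is dead when needed
p_fail_cheap = 0.10      # hypothetical chance any one second hand box is dead when needed

# With two cheap boxes, trading only stops if BOTH are dead at once
# (assuming the failures are independent of each other).
p_both_cheap_fail = p_fail_cheap ** 2

print(f"single expensive box down: {p_fail_expensive:.2%}")
print(f"both cheap boxes down:     {p_both_cheap_fail:.2%}")
```

With these assumed numbers the pair of cheap boxes comes out ahead even though each one is five times flakier than the expensive machine. The conclusion flips if the cheap boxes are bad enough, or if their failures are strongly correlated.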
I know everyone loves Raspberry Pis but they are a bit too bare bones
for my liking, so building a trading system on one is clearly a bit of
a pointless CV padding or geek ego boosting exercise.
Currently my dual machine trading estate consists of two identical machines (a live machine, and a backup machine I use for testing and occasional research), both built for me by pcspecialist at a total cost of just over a grand:
- Case: CiT MTX-007B Mini ITX Case 180W
- CPU: Intel® Core™ i7 Eight Core Processor i7-9700 (3.0GHz) 12MB Cache
- Motherboard: ASUS® PRIME H310i PLUS R2.0: Mini-ITX, LGA1151, USB 3.1, SATA 6GBs
- RAM: 32GB Corsair VENGEANCE DDR4 2400MHz (2 x 16GB)
- Storage: 1TB SEAGATE BARRACUDA 120 2.5" SSD, (up to 560MB/sR | 540MB/sW)
I also have a further machine which I previously used for trading, but I now use purely for monitoring purposes:
- 4GB/500GB Mint Box
As you can see below I did manage for a while with a series of second hand machines of varying ability. But I eventually decided to have identical dual machines, and I've also bitten the bullet and spent a few quid on machines that are reasonable spec and brand new.
I've successfully run my trading system on all of the following.
- 4GB/500GB Mint Box £400 new
- 4GB/60GB fanless Zanshuri beta MK1. I got this for £100 off eBay.
- 4 GB/ 200 GB ancient Toshiba laptop. £120 from a second hand laptop site.
- 2 GB/100GB almost as ancient Samsung netbook. I actually bought this new, about 8 years ago, so I can't remember what it cost.
- 4 GB/80 GB Dell SFF Optiplex PC. £120 off amazon
If I had enough money and a big office I'd buy a billion monitors and a really fast over powered machine. Not because I need them to trade, but because they look cool.
If anyone cares, I mostly do my research on a Thinkpad Laptop (T480 i7 16GB 512GB).
Backup / redundancy / storage:
My choice: Parallel systems, local NAS
As noted above I have two machines that can run my code. The second machine can either be 'active'
(essentially running in parallel, but writing to shadow storage and
issuing 'fake' orders) or 'passive' (ready to go but not running in
parallel). Either way you need to think about joint storage, how you
would do failovers, ensuring synchronisation and so on. Even if you
don't go down the dual machine route you need to worry about backups,
or at least you should.
I use multiple machines with the fallback machine in a passive state. Each uses local storage to store all its data; i.e. the passive machine won't automatically have its data updated.
The failover then is just to stop the crontab and any active processes on the old active machine, copy the updated data up to my NAS (which in practice I do daily anyway as part of a backup process) and then down to the old passive machine, pull the latest code and reinstall, and finally start the crontab on the new active machine (there is a bit more work if I do this in the middle of a trading day obviously). If a machine is lost the worst that will happen is that I will have to manually put orders traded into the database since the last backup was done (prices and thus signals, at least to daily frequency, are recovered automatically at the next data feed).
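The failover steps above can be written down as a checklist. This is only a dry-run sketch: it builds and prints the steps and never executes anything. The shell commands, the directory layout, the idea that the NAS is mounted as a local path, and keeping the crontab in a file next to the code are all my assumptions for illustration, not a copy of my actual scripts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    machine: str      # which box the step happens on
    description: str  # what the step does, following the text above
    command: str      # an ASSUMED shell command; empty means do it by hand


def failover_plan(data_dir: str, nas_dir: str, code_dir: str,
                  crontab_file: str) -> List[Step]:
    """Build the ordered failover checklist. Nothing is executed."""
    return [
        # 'crontab -r' deletes the installed crontab, so keep a copy in a file.
        Step("old active", "stop the crontab", "crontab -r"),
        Step("old active", "stop any active trading processes", ""),
        Step("old active", "copy updated data up to the NAS",
             f"rsync -a {data_dir}/ {nas_dir}/"),
        Step("new active", "copy data down from the NAS",
             f"rsync -a {nas_dir}/ {data_dir}/"),
        Step("new active", "pull the latest code", f"git -C {code_dir} pull"),
        Step("new active", "reinstall the code", f"pip install {code_dir}"),
        Step("new active", "start the crontab", f"crontab {crontab_file}"),
    ]


# Hypothetical locations, purely for illustration.
plan = failover_plan(
    data_dir="/home/trader/data",
    nas_dir="/mnt/nas/trading_backup",
    code_dir="/home/trader/code",
    crontab_file="/home/trader/code/crontab.txt",
)
for number, step in enumerate(plan, start=1):
    action = step.command or "(manual)"
    print(f"{number}. [{step.machine}] {step.description}: {action}")
```

Keeping the plan as data rather than a script that runs blindly means I can read it through before doing anything, which matters if the failover happens in the middle of a trading day.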
So I would normally run the same code and same data (after syncing) on either box, but have the facility to check new code or a new implementation by running it in parallel on the 'test' box first. I would then overwrite any data produced by testing with the 'live' data, since the exercise is about robustness not parallel checks, and spending time trying to reconcile the two sets of data is a waste of effort.
Since I won't necessarily be sitting at the active machine I use ssh (for just looking at logs and running diagnostics), sometimes with screen (so I don't need to stay connected). Unfortunately this isn't enough to restart the Interactive Brokers Gateway (necessary after a power cut), which requires a persistent GUI; so I use x11vnc. I also have a monitor and keyboard handy to plug into the box for emergencies (top tip: buy a really, really long HDMI cable if you don't want to move your monitor).
I use a Synology DiskStation with RAID as my single shared storage NAS. It automatically backs up all my data as well as holding the master git repository of my code and key documents (and acts as a music server and general dogsbody storage). The NAS itself is automatically backed up nightly on to a USB connected drive, and sporadically on to USB sticks that live in my safe.
I used to use parallel active machines but chose to simplify things; I also found the NAS a bit sluggish, at least over my network, so I prefer to use it as backup only rather than primary data storage.
You still need offsite backup and I use
Google Drive to take copies of my data and code (though this isn't a full
'image' backup, it means that with a few hours' work I could reconstruct a
working system from a virgin Linux machine, requiring just the installation of the right libraries etc). Most of my code is on GitHub anyway, apart from some private configuration files.
Another issue I haven't really been
able to address is the failure of local power and/or internet
(although running with laptops does effectively give you a UPS). The
best solution would be to buy a virtual linux server on a cloud
somewhere and let someone else worry about all that stuff. This is
actually quite cheap but I am a bit daunted by the potential
complexity of having to set this up and get everything working
remotely. Some of the people reading this will think I am some kind
of legendary uber technology guru, but others will assure you that I
am not. Also: I really like having physical boxes :-)
I have thought about, but never bothered with, RAID.
Bespoke vs Out of the box
My choice: Bespoke
Again clearly this is a matter of
choice and personal skill levels; it would take me much longer to get
acquainted with something like NinjaTrader than to write a new system
from scratch, which is what I have done (twice! The newest system is here). And having written that
system from scratch it is ultra flexible to modify as required. Were
I starting from a zero base I would probably still build everything
myself even if it took much longer as it is very satisfying and you
learn lots of skills which could be useful if you are doing this with
the intention of getting a job in the industry. However if you just
want to get something working fast and aren't really interested in
the 'plumbing' aspect then I guess an out of the box system would
make sense. But I reserve the right to call you a total wimp. Which
you are.
Broker
My choice: Interactive Brokers
As I am doing fully automated trading
there didn't seem to be any choice on this as they are the only
company that seems to offer a proper API to retail customers;
although I would love to be shown alternatives since it isn't ideal
to only have one route to market. What they offer is not perfect and
there are lots of things I would like to see changed, like the fact
you can get 15 minute data free with their front end but not with
their API. The API itself is also a bit fiddly and not very user friendly but that
could also be because I am not very technically gifted, as you will
note I keep emphasising.
Gateway or TWS?
My choice: Gateway
This is a very esoteric choice but if
you are using the IB API you can either run the 'server' process as
the TWS (which is a fully featured front end) or the Gateway (which
does nothing but service API requests). For robustness you should use
the Gateway since it never falls over and I usually leave one session
running for a week. If you are doing your own trades then you will
have to use TWS.
Data feed:
My choice: Interactive Brokers with barchart.com for historic data
I used to use quandl.com until they stopped supplying free futures data.
This is a good choice if you are using
slow signals since the barchart stuff is relatively cheap and very easy
to get via their API. The IB stuff has a price depending on the
instruments but is pretty cheap, although it can be flaky. I build in
filters and averaging to deal with the flaky data but this does make
my system unsuitable for high frequency trading.
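To give a flavour of what 'filters and averaging' can look like, here is a minimal sketch. It is not the filter in my production code: the function name, the 5% threshold and the sample prices are all assumptions for illustration. The idea is to throw away any sampled price that is zero, negative or implausibly far from the last price I trusted, and then average whatever survives. It is written in plain Python to stay dependency free; the same idea is a few lines in pandas.

```python
from statistics import mean
from typing import List, Optional


def clean_price(samples: List[float], last_good: float,
                max_jump: float = 0.05) -> Optional[float]:
    """Average the samples within max_jump (as a fraction) of last_good.

    Returns None if every sample is rejected, so the caller can skip the
    update rather than trade on a suspect price.
    """
    accepted = [
        price for price in samples
        if price > 0 and abs(price / last_good - 1.0) <= max_jump
    ]
    if not accepted:
        return None
    return mean(accepted)


# Hypothetical samples: three sensible prices and two obviously bad ticks.
samples = [100.2, 100.4, 0.0, 100.3, 250.0]
print(clean_price(samples, last_good=100.0))
```

Averaging several samples costs time, and the jump filter will also reject a genuine large move until a trusted price catches up. Both are fine for slow signals and exactly why this approach doesn't suit high frequency trading.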
I don't use fundamental data like stock
PE ratios for automated trading so I can't comment on that. I am
still a bit of a luddite with that part of my portfolio and like to
check the numbers by looking at the actual annual report, preferably
a paper copy; ideally one bound in vellum and written with a quill
pen.
There are a few markets I can't trade because of expensive IB data, so a future project is to get a free source, maybe http://www.bloomberg.com/markets/chart/data/1D/AAPL:US or using bs4 to scrape someone's screen.
Programming language
My choice: Python with numpy and pandas
I've written reasonable chunks of code
in Matlab, R, S plus and dabbled in a lot of other languages over the
years but Python was the last language I used in a corporate setting.
I guess being objective it is quite a nice mixture of the statistical
languages mentioned above as well as being easier to use for
scripting, and from my perspective it is harder to write bad code in
Python than R which is very important for a clumsy dolt like me.
Having said that the way it deals with matrices, arrays and time
series isn't as natural as those other languages and can take some
getting used to. Like R it's free / open source, which is also nice.
There is a lot of Python code out there, and a lot of Python
programmers, so if you learn it you may even get some kind of job in
the field, although just to warn you it will probably involve glueing
HTML and SQL together.
Of course if I was writing high
frequency trading code I would probably have to learn C++, and
arguably if I was a real man I would (in my youth there was a saying
that real men only used assembler but I guess even real men aren't as
real as they used to be).
API access method
My choice: ib_insync
In the past I used the native Python API for IB directly, and I've written some more posts explaining how to use it, starting here. In the even more distant past I used swigibpy, which is now deprecated.
I now use ib_insync which makes life much easier as it means we can treat the IB server as just some random set of objects with state that gets magically updated by a wizard when we're not looking, and not worry about concurrency or threads or anything.
Storage methodology
My choice: MongoDB, Arctic, YAML
Previously: SQLite, .csv and HDF5 files
What do you need to store? I would
argue you need to store time series data, like prices; static data
which is rarely changed, like the native currency of a futures
contract; and configuration data, like what futures we trade and when.
I mostly use .yaml files for configuration, but occasionally also .csv. In my production system configuration is also dumped in MongoDB tables. Time series data is put in Arctic, which sits on top of MongoDB. Lastly, I store backtest diagnostics as simple pickle files.
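The pickle part is the simplest of the lot, so here is a minimal sketch of the 'write once, read many times' pattern for diagnostics. The contents of the dictionary and the file name are hypothetical; the real diagnostics depend on the backtest. One caveat worth knowing: only unpickle files you wrote yourself, since loading a pickle can run arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical diagnostics; the real contents depend on the backtest.
diagnostics = {
    "instrument": "EXAMPLE",
    "forecasts": [1.5, -0.3, 2.1],
    "notes": "illustrative data only",
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "backtest_diagnostics.pkl"

    # Write once...
    with path.open("wb") as f:
        pickle.dump(diagnostics, f)

    # ...then read back as often as needed; the object returns in its original shape.
    with path.open("rb") as f:
        restored = pickle.load(f)

print(restored == diagnostics)
```

The appeal is that there is no schema to design: whatever object the backtest produces gets saved and comes back the same way. That is fine for diagnostics, but not something I'd use for frequently updated price series.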
Finally...
Finally please note that nothing here
constitutes a commercial plug for any of the products listed, and it
shouldn't be taken as a considered, thorough review of the
alternatives. Do your own research etc etc etc. If any of the manufacturers on this page does want to pay me for the mention I will of course accept the money but I am afraid I will have to mention it in really small writing.
I am very glad to hear you've discovered Quandl; it is an amazing resource and they seem to be adding more data every day. It seems it hasn't caught on as a big thing yet but I think it will.
You only mentioned it in passing but GitHub is also amazing. It makes storing, sharing and collaborating on code so easy. I've seamlessly worked on the same code base on five different computers using GitHub.
Also, you're missing out by not renting a server. I've dabbled with GoDaddy and AWS which seem ok, but Windows Azure is very good (yes, I know, even though it's a Windows product). You can set up a Virtual Machine in like 30mins and with GitHub get everything running in no time. It's running all the time with all the computing power you want (to buy) and you can remote in from anywhere, although that probably doesn't matter if you're at home all day.
I haven't started using them yet but apparently HDF5 files are the best for saving data, faster than CSVs and the functionality to almost replace databases.
Tom thanks for your comments and a billion apologies for not getting back sooner.
GitHub, what puts me off is the initial up front cost in terms of time. Given it's only me working on the code, and on at most two machines (master and slave) I get by pretty well just copying and pasting. It's not a priority for me.
Server rental is a more interesting proposition. This http://www.latentexistence.me.uk/how-to-set-up-a-free-linux-server-on-amazon-ec2/ is rather interesting. I am slightly scared by having an IB API server running I can't literally turn off if it goes haywire.
HDF5 were used at AHL to store diagnostic type data (for which I currently use the rather low tech 'pickle') eg write once, read multiple times. I think they are brilliant for this application since they do all the work of recovering what the object you saved down looks like. For just storing time series of prices which are updated frequently... not sure.
As you can see Tom I am now using both git and HDF5, so you win.
Still no virtual server though :-)
Hi Rob, I realize this post was from some time ago, but could you give some more insights into any tools you use to run your trading system. E.g. How do you run any periodic jobs - cron, or similar? How do you run and coordinate your system stages - do you run them as separate processes with any kind of inter process communication mechanism or ESB of some kind? How do you monitor processes which should be running and alert yourself if they unexpectedly stop or fail? Thanks.
Hi JWM. I run cron jobs, separate processes (one to get prices and calculate optimal positions, one for execution, one for plus reporting etc) talking via databases. Much more here http://qoppac.blogspot.co.uk/2015/09/systems-building-checks-and-balances.html
Hi Robert,
I have a question about placing futures orders before markets open.
I'm in a situation where I have a day job and go to work around 7AM (CEST). I've created my own tool to load data, analyse it and calculate position sizes based on the ideas in your book (yes it was a lot of work but I'm proud of it :-) ).
I like to trade on the daily closes, so it's possible to analyse the closes of the day before around 6AM and then create some orders. But at that moment markets are not open yet and I can't place any orders. Is there a way to handle this? I read something about LOO (Limit-on-open) orders and MOO (Market-on-open) orders but don't see these options in the broker software I'm looking at.
Furthermore, is it a good idea to (automatically) place trades at market open without controlling the execution yourself? I've read that markets at open are illiquid and are much more volatile. Maybe it's better to wait 30 minutes after markets open?
Thanks for some advice on this
Hi Kris.
For your second point, yes, it's generally a good idea to avoid the market open. Spreads are wider and things can be a bit crazy. The exception is if your trading system is of the type where you will only want to buy if things hit a specific price, in which case it's fine to submit a limit order for that price.
As for the first point I don't personally use these kinds of orders so I can't really be of much help here. Any advice I give you would be just what you could get from googling it...
Thanks Robert, in the meantime I discovered most markets that I want to trade are open at morning (except two). I've also found your articles about the API from IB (or reseller Lynx in Belgium) and autotrading. Very interesting so I will go deeper in that.
Thanks for all the interesting stuff on your website!
Hi Robert,
I recently discovered and purchased your book and I have to admit that I can't put it down now! I am a software engineer by trade and have been writing trading systems for my own use for a few years now. The concepts in your book are helping me improve my algos considerably so I can only thank you for that.
I do have one question. I use spread betting as my main trading vehicle. The broker I use offers a comprehensive API and I am currently testing a new version of one of the systems that is able to place trades automatically (as opposed to just sending me signals for me to execute through their GUI). Seems to be working okay in the test environment.
In the book, you use spread betting in the semi-automated example and discuss some of the benefits around not having to worry about contract sizes, currency conversion etc. I was wondering however if you could share some thoughts on using a SB broker as part of a fully automated system? Not too worried about the API/technical piece but more about the merits of SB when compared to a more traditional broker like IB?
Thanks again,
-Luis
Main benefits - smaller contract size (there normally is one, eg £ per point but smaller than futures)
Main disadvantages - much more expensive than futures, OTC so counterparty risk and potential liquidity shortage in a crisis.
Hi,
How come you're using a 500gb harddrive? Would 128gb or lower be too little?
/Matt
Disk space is effectively free so why not. I would need to check to see how much space I actually use.
How come you don't use Amazon Web services to host your code? Also, what do you do in case of power breakdown?
A few reasons. When I began trading cloud was less accessible and I didn't know anything about it (both those facts have changed of course). It was also relatively expensive (again, cheaper now) compared to my strategy of buying several cheap machines and running the system across them. My system was (is) relatively relaxed about being shut down for a while because of power or internet failure, since it is trading relatively slowly (if I backtest the effect of delaying trades for a day the effect is zero, and even a two week delay doesn't reduce SR more than a few basis points). So for example I've always left the system running over the holidays. I plan, in the near future, to start using some faster trading systems. Most likely if I go on holiday I'd have to turn them off. However I haven't ruled out putting everything on the cloud at some point in the future.
To clarify, do you 1) store data on both of your 500GB machines as well as run computations there, while the copy in your NAS is simply backup, or 2) store data in your NAS, and have your machines read from NAS to run backtest?
Right now I simply use my laptop to run computations, but because its HD isn't big enough to store my data, I store that in a WD 4TB external harddrive which is plugged in to the laptop via USB. With some rejiggering it gets mounted on with postgresql running off there. I'm sure this probably isn't the most efficient setup, nor does it take into account redundancy, etc., so would love to hear more about how you set it up.
I store data on my machines, the NAS is just a backup. Otherwise the data has to go through the network every time I have a read/write, which is a lot (because I'm logging everything). An external USB drive would be a bit slower than a native hard drive, but probably still okay as long as you're not high frequency trading. But I'd seriously think about a backup.
Please Rob, is there an update for this post?
As it says on the top: This post is updated when I change my setup. Last update: February 2021