In this post I am going to talk about
my setup for 'home based' systematic trading, both hardware and
software.
This post is updated when I change my setup.
Last update: February 2021
Before I kick off, a really key point: what setup you have depends a lot on what kind of trading you do. Essentially you have to balance the following criteria:
- cost
- speed
- flexibility
- degree of difficulty
- robustness and redundancy
- ease of automation
The relative importance of those
factors will depend on what kind of trading you are doing. I am doing
fairly slow, fully automated, medium speed trading over a fairly
large number of markets with a 'trend following character'. This
means that speed isn't very important to me but the other factors
are.
If I was doing faster trading I'd
probably need a more powerful machine and I would be worried about
latency (delays in getting trades out) from my choice of software and
database. If I was doing some kind of complex non linear analysis
again I would need more juice on the processing side. If I was doing
some kind of 'negative skew / selling optionality' strategy where my
return profile was lots of small gains and occasional big losses I
would be more relaxed about turning my system off on days where I
can't monitor it closely; as it is I want to keep it on to capture
big trends when they appear so robustness is ultra important. If I
wasn't fully automated and I was doing my own execution manually on a
smaller number of markets I might be happier using an 'out of the
box' system to produce my signals and there would be more brokers I
could potentially use. So the lesson here is that one size won't fit
all.
There is also a heavy bias coming in
from my own background and experience; like most people I want to
stick to what I am used to.
Windows vs Unix
My choice: Unix
To be precise I am running on Linux
Mint. Mainly this is due to my own experience; I know how to do many
things on Unix systems which would take me a long time to figure out
on Windows. There are many other advantages and if you google 'Linux vs Windows' you will get about eight zillion hits back which will bore you stupid about which is better. I certainly am glad I didn't have to spend any time researching this choice; I think it is a discussion likely to cause physical violence, right up there with 'Which God?' and 'Which football team?', and like those questions there is no right answer (well there is, and it's Unix, but I am not going to waste time trying to unburden you of your ignorance).
As for the choice of Mint, I am no expert on Linux 'distros' and to be honest this is the only one I have installed. What I like about it is the Windows-like user interface (in fact it is more Windows-like than Windows 8). It's probably a bit 'heavy' and a slimmer distro might be sufficient for a trading only machine, but I also use my boxes for other purposes so it's nice to effectively have a complete Windows replacement, even if that means having some disk space filled with media drivers etc.
Apple? Well it's Unix now, isn't it, and I have an aversion to expensive brand names, but if you are a slavish fashion victim who can't be seen without the latest shiny white box then who am I to tell you differently.
Cheap hardware or expensive hardware
My choice: Cheap(ish)
I don't need fat processing power as I
am doing fairly slow trading so what matters here is robustness and
cost. Given a budget of say £1000 then it is better to buy two £500
machines and get redundancy. In practice I am spending much less than that on second hand machines. Each is probably more likely to fail than a new £1000 box, but arguably the two are less likely to fail jointly than a single £1000 machine is to fail on its own. Running Linux here helps as well, as you can get much better performance for a given box than with Windows. Make sure that any relevant drivers are supported by your Linux distro.
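To make the redundancy argument concrete, here is a rough sketch of the arithmetic. The failure probabilities are made up purely for illustration, and the calculation assumes the two machines fail independently, which won't be true if they share a power supply or a flooded office.

```python
def prob_all_down(p_single: float, n_machines: int) -> float:
    """Probability that n machines are all down together, ASSUMING independent failures."""
    return p_single ** n_machines


# Illustrative, invented probabilities of a machine being out of action on a given day
p_new_expensive = 0.01  # a single new £1000 box
p_cheap_used = 0.05     # each cheap second hand box, five times less reliable

single = prob_all_down(p_new_expensive, 1)
pair = prob_all_down(p_cheap_used, 2)

print(f"One expensive machine down: {single:.4f}")
print(f"Both cheap machines down:   {pair:.4f}")
```

With these invented numbers the pair of cheap boxes comes out ahead even though each one is individually much flakier; different assumptions would of course give a different answer.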
I know everyone loves Raspberry Pis but they are a bit too bare bones for my liking, so building a trading system on one is clearly a bit of a pointless CV padding or geek ego boosting exercise.
Currently my dual machine trading estate consists of two identical machines (a live machine, and a backup machine I use for testing and occasional research), both built for me by pcspecialist at a total cost of just over a grand:
- Case: CiT MTX-007B Mini ITX Case 180W
- CPU: Intel® Core™ i7 Eight Core Processor i7-9700 (3.0GHz) 12MB Cache
- Motherboard: ASUS® PRIME H310i PLUS R2.0: Mini-ITX, LGA1151, USB 3.1, SATA 6GBs
- RAM: 32GB Corsair VENGEANCE DDR4 2400MHz (2 x 16GB)
- Storage: 1TB SEAGATE BARRACUDA 120 2.5" SSD, (up to 560MB/sR | 540MB/sW)
I also have a further machine which I previously used for trading, but I now use purely for monitoring purposes:
- 4GB/500GB Mint Box
As you can see below I did manage for a while with a series of second hand machines of varying ability. But I eventually decided to have identical dual machines, and I've also bitten the bullet and spent a few quid on machines that are reasonable spec and brand new.
I've successfully run my trading system on all of the following.
- 4GB/500GB Mint Box. £400 new.
- 4GB/60GB fanless Zanshuri beta MK1. I got this for £100 off eBay.
- 4GB/200GB ancient Toshiba laptop. £120 from a second hand laptop site.
- 2GB/100GB almost as ancient Samsung netbook. I actually bought this new, about 8 years ago, so I can't remember what it cost.
- 4GB/80GB Dell SFF Optiplex PC. £120 off Amazon.
If I had enough money and a big office I'd buy a billion monitors and a really fast over powered machine. Not because I need them to trade, but because they look cool.
If anyone cares, I mostly do my research on a Thinkpad Laptop (T480 i7 16GB 512GB).
Backup / redundancy / storage:
My choice: Parallel systems, local NAS
As noted above I have two machines that can run my code. The second machine can either be 'active' (essentially running in parallel, but writing to shadow storage and issuing 'fake' orders) or 'passive' (ready to go but not running in parallel). Either way you need to think about joint storage, how you would do failovers, ensuring synchronisation and so on. Even if you don't go down the dual machine route you need to worry about backups, or at least you should.
I use multiple machines with the fallback machine in a passive state. Each uses local storage to store all its data; i.e. the passive machine won't automatically have its data updated.
The failover is then just: stop the crontab and any active processes on the old active machine; copy the updated data up to my NAS (which in practice I do daily anyway as part of a backup process) and then down to the old passive machine; pull the latest code and reinstall; and finally start the crontab on the new active machine. Obviously there is a bit more work if I do this in the middle of a trading day. If a machine is lost, the worst that will happen is that I have to manually re-enter into the database any orders traded since the last backup (prices and thus signals, at least at daily frequency, are recovered automatically at the next data feed).
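The failover steps can be written down as a checklist in code. The sketch below only builds and prints the commands rather than running them. Every path in it is a hypothetical placeholder, and it assumes a copy of the crontab is kept in a file so it can be reinstalled on the new machine. It is an illustration of the sequence, not my actual script.

```python
from dataclasses import dataclass


@dataclass
class FailoverPaths:
    # All of these paths are hypothetical placeholders, not a real setup
    data_dir: str = "/home/trader/data"
    nas_dir: str = "/mnt/nas/trading_backup"
    code_dir: str = "/home/trader/code"
    crontab_file: str = "/home/trader/crontab.txt"


def failover_plan(paths: FailoverPaths) -> dict:
    """Return the shell commands for each machine, in order. Nothing is executed."""
    old_active = [
        "crontab -r",  # remove the schedule so no new jobs start
        "# stop any trading processes still running (manual step)",
        f"rsync -a {paths.data_dir}/ {paths.nas_dir}/",  # data up to the NAS
    ]
    new_active = [
        f"rsync -a {paths.nas_dir}/ {paths.data_dir}/",  # data down from the NAS
        f"git -C {paths.code_dir} pull",  # latest code
        f"pip install {paths.code_dir}",  # reinstall
        f"crontab {paths.crontab_file}",  # install the schedule last
    ]
    return {"old_active": old_active, "new_active": new_active}


plan = failover_plan(FailoverPaths())
for machine, commands in plan.items():
    print(machine)
    for command in commands:
        print("   ", command)
```

Installing the crontab last matters: nothing should be scheduled on the new machine until the data and code are both up to date.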
So I would normally run the same code and same data (after syncing) on either box, but have the facility to check new code or a new implementation by running it in parallel on the 'test' box first. I would then overwrite any data produced by testing with the 'live' data, since the exercise is about robustness not parallel checks, and spending time trying to reconcile the two sets of data is a waste of effort.
Since I won't necessarily be sitting at the active machine I use ssh (for just looking at logs and running diagnostics), sometimes with screen (so I don't need to stay connected). Unfortunately this isn't enough to restart the Interactive Brokers Gateway (necessary after a power cut), which requires a persistent GUI, so I use x11vnc. I also have a monitor and keyboard handy to plug into the box for emergencies (top tip: buy a really, really long HDMI cable if you don't want to move your monitor).
I use a Synology DiskStation with RAID as my single shared storage NAS. It automatically backs up all my data, and holds the master git repository of my code and key documents (it also acts as a music server and general dogsbody storage). The NAS is itself automatically backed up nightly on to a USB connected drive, and sporadically on to USB sticks that live in my safe.
I used to use parallel active machines but chose to simplify things; I also found the NAS a bit sluggish, at least over my network, so I prefer to use it as backup only rather than as primary data storage.
You still need offsite backup, and I use Google Drive to take copies of my data and code. This isn't a full 'image' backup, but it means that with a few hours' work I could reconstruct a working system from a virgin Linux machine, requiring just the installation of the right libraries etc. Most of my code is on GitHub anyway, apart from some private configuration files.
Another issue I haven't really been
able to address is the failure of local power and/or internet
(although running with laptops does effectively give you a UPS). The
best solution would be to buy a virtual linux server on a cloud
somewhere and let someone else worry about all that stuff. This is
actually quite cheap but I am a bit daunted by the potential
complexity of having to set this up and get everything working
remotely. Some of the people reading this will think I am some kind
of legendary uber technology guru, but others will assure you that I
am not. Also: I really like having physical boxes :-)
I have thought about, but never bothered with, RAID.
Bespoke vs Out of the box
My choice: Bespoke
Again clearly this is a matter of choice and personal skill levels; it would take me much longer to get acquainted with something like NinjaTrader than to write a new system from scratch, which is what I have done (twice! The newest system is here). And having written that
system from scratch it is ultra flexible to modify as required. Were
I starting from a zero base I would probably still build everything
myself even if it took much longer as it is very satisfying and you
learn lots of skills which could be useful if you are doing this with
the intention of getting a job in the industry. However if you just
want to get something working fast and aren't really interested in
the 'plumbing' aspect then I guess an out of the box system would
make sense. But I reserve the right to call you a total wimp. Which
you are.
Broker
My choice: Interactive Brokers
As I am doing fully automated trading
there didn't seem to be any choice on this as they are the only
company that seems to offer a proper API to retail customers;
although I would love to be shown alternatives since it isn't ideal
to only have one route to market. What they offer is not perfect and
there are lots of things I would like to see changed, like the fact
you can get 15 minute data free with their front end but not with
their API. The API itself is also a bit fiddly and not very user friendly but that
could also be because I am not very technically gifted, as you will
note I keep emphasising.
Gateway or TWS?
My choice: Gateway
This is a very esoteric choice but if you are using the IB API you can either run the 'server' process as the TWS (which is a fully featured front end) or the Gateway (which
does nothing but service API requests). For robustness you should use
the Gateway since it never falls over and I usually leave one session
running for a week. If you are doing your own trades then you will
have to use TWS.
Data feed:
My choice: Interactive Brokers with barchart.com for historic data
I used to use quandl.com until they stopped supplying free futures data.
This combination is a good choice if you are using slow signals, since the barchart stuff is relatively cheap and very easy to get via their API. The IB stuff has a price depending on the instruments but is pretty cheap; however it can be flaky. I build in filters and averaging to deal with the flaky data, but this does make my system unsuitable for high frequency trading.
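As a flavour of what 'filters and averaging' can mean, here is a minimal pandas sketch. It is not the filter from my production code: the function name, the 5% threshold and the window lengths are all assumptions chosen for the example. It throws away any price that sits too far from a rolling median and then applies a short moving average to what is left.

```python
import pandas as pd


def clean_prices(prices: pd.Series, max_jump: float = 0.05, smooth_window: int = 3) -> pd.Series:
    """Drop obvious spikes, then smooth. Thresholds are illustrative assumptions."""
    # Reference level: centred rolling median, which a single bad tick barely moves
    reference = prices.rolling(window=5, min_periods=1, center=True).median()
    # Treat any price more than max_jump (as a fraction) away from the reference as bad
    is_spike = (prices / reference - 1.0).abs() > max_jump
    filtered = prices.mask(is_spike)
    # Short trailing average; min_periods=1 lets it skip over the dropped ticks
    return filtered.rolling(window=smooth_window, min_periods=1).mean()


raw = pd.Series([100.0, 100.5, 150.0, 101.0, 100.8])  # 150.0 is a bad tick
cleaned = clean_prices(raw)
print(cleaned)
```

The centred median peeks at later prices, so this form only suits cleaning a history after the fact; a live filter would need a trailing window instead. Either way the averaging adds lag, which is why this sort of approach is fine for slow signals and hopeless for anything fast.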
I don't use fundamental data like stock
PE ratios for automated trading so I can't comment on that. I am
still a bit of a luddite with that part of my portfolio and like to
check the numbers by looking at the actual annual report, preferably
a paper copy; ideally one bound in vellum and written with a quill
pen.
There are a few markets I can't trade because of expensive IB data, so a future project is to get a free source, maybe http://www.bloomberg.com/markets/chart/data/1D/AAPL:US or using bs4 to scrape someone's screen.
Programming language
My choice: Python with numpy and pandas
I've written reasonable chunks of code
in Matlab, R, S plus and dabbled in a lot of other languages over the
years but Python was the last language I used in a corporate setting.
I guess being objective it is quite a nice mixture of the statistical
languages mentioned above as well as being easier to use for
scripting, and from my perspective it is harder to write bad code in
Python than R which is very important for a clumsy dolt like me.
Having said that, the way it deals with matrices, arrays and time series isn't as natural as in those other languages and can take some getting used to. Like R it's free / open source, which is also nice.
There is a lot of python code out there, and a lot of python
programmers so if you learn it you may even get some kind of job in
the field, although just to warn you it will probably involve glueing
HTML and SQL together.
Of course if I was writing high
frequency trading code I would probably have to learn C++, and
arguably if I was a real man I would (in my youth there was a saying
that real men only used assembler but I guess even real men aren't as
real as they used to be).
API access method
My choice: ib_insync
In the past I used the native python API for IB directly, and I've written some more posts explaining how to use it, starting here. In the even more distant past I used swigibpy, which is now deprecated.
I now use ib_insync, which makes life much easier as it means we can treat the IB server as just some random set of objects with state that gets magically updated by a wizard when we're not looking, and not worry about concurrency or threads or anything.
Storage methodology
My choice: mongoDB, Arctic, YAML
Previously: sqlite, .csv and HDF5 files
What do you need to store? I would argue you need to store time series data, like prices; static data which is rarely changed, like the native currency of a futures contract; and configuration data, like what futures we trade and when.
I mostly use .yaml files for configuration, but occasionally also .csv. In my production system configuration is also dumped in mongoDB tables. Time series data is put in Arctic, which sits on top of mongoDB. Finally I store backtest diagnostics as simple pickle files.
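The pickle part is the simplest of these to illustrate. The sketch below uses only the standard library; the function names, the file name and the contents of the diagnostics dictionary are all invented for the example rather than taken from my system.

```python
import pickle
import tempfile
from pathlib import Path


def save_diagnostics(diagnostics: dict, path: Path) -> None:
    """Write a dictionary of backtest diagnostics to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(diagnostics, f)


def load_diagnostics(path: Path) -> dict:
    """Read the diagnostics back. Only unpickle files you wrote yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Invented example contents
example = {"instrument": "EXAMPLE", "sharpe": 0.5, "positions": [1, 0, -1]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "backtest_diagnostics.pck"
    save_diagnostics(example, target)
    restored = load_diagnostics(target)

print(restored)
```

Pickle is convenient precisely because it needs no schema, but the files are tied to the code that produced them, which is fine for throwaway diagnostics and a poor idea for anything you need to keep for years.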
Finally...
Finally please note that nothing here constitutes a commercial plug for any of the products listed, and it shouldn't be taken as a considered, thorough review of the alternatives. Do your own research etc etc etc. If any of the manufacturers on this page does want to pay me for the mention I will of course accept the money, but I am afraid I will have to mention it in really small writing.