Wednesday 18 January 2017

Playing with Docker - some initial results (pysystemtrade)

This post is about using Docker - a containerisation tool - to run automated trading strategies. I'll show you a simple example of how to use Docker with my python backtesting library pysystemtrade to run a backtest in a container, and get the results out. However this post should hopefully be comprehensible even if you don't speak pysystemtrade, or indeed python.

PS: Apologies for the long break between posts: I've been writing my second book, and until the first draft is at the publishers posts will be pretty infrequent.


The logo of docker is a cute whale with some containers on its back. It's fun, but I'm worried that people will think it's okay to start using whales as a cheap alternative to ships. Listen people: It's not okay.
Source: docker.com


What the Heck is Docker and why do I need it?

As you'll know if you've read this post I currently run my trading system with two machines - a live and a backup. A couple of months ago the backup failed; some kind of disk failure. I humanely dispatched it to ebay (it is now in a better place, the new owner managing to successfully replace the disk: yes I am a software guy and I don't do hardware...).

A brief trip to ebay later and I had a new backup machine (actually I accidentally bought two machines, which means I now have a machine I can use for development). A couple of days ago I then had to spend around half a day setting up the new machine.

This is quite a complicated business, as the trading system consists of:


  1. An operating system (Linux mint in my case)
  2. Some essential packages (ssh; emacs, as I can't use vi; x11vnc as the machine won't normally have a monitor attached; git to download useful python code from github or my local network drive)
  3. Drive mountings to network drives
  4. The interactive brokers gateway
  5. A specific version of Python 
  6. A bunch of python libraries that I wrote all by myself
  7. A whole bunch of other python libraries like numpy, pandas etc: again all requiring specific versions to work (my legacy trading system is in frozen development so it is keyed to an obsolete version of various libraries which have since had pesky API changes)
  8. Some random non python package dependencies 
  9. A directory structure for data
  10. The actual data itself 

All that needs to be set up on the machine, which at times can be quite fiddly: many of the dependencies have dependencies of their own, and sometimes a google is required. Although I have a 'build' file - a document mostly saying "do this, then this..." - it can still be a tricky process. And some parts are... very... slow...

While moaning about this problem on twitter I kept hearing about something called Docker. I had also seen references to Docker in the geeky dead trees based comic I occasionally indulge myself with, and most recently at this ultra geeky linux blog written by an ex colleague.

At its simplest level Docker allows the above nightmare to be simplified to just the following steps:

  1. An operating system. Scarily this could be ANY operating system. I discuss this below.
  2. Some essential packages (these could be containerised but probably not worth it)
  3. Drive mountings to network drives
  4. The interactive brokers gateway (could also be put inside Docker; see here).
  5. Install Docker
  6. Run a docker container that contains everything in steps 5 to 10 of the first list

This will certainly save time every time I set up a new trading server; but unless you are running your own data centre that might seem a minimal time saving. Actually there are numerous other advantages to using Docker, which I'll discuss in a second. But first let's look at a real example.


Installing Docker


Installing Docker is, to be fair, a little tricky. However it's something you should only need to do once; and in the long run it will save you much pain installing other crud. It's also much easier than installing, say, pandas.

Here is how it's done: Docker installation
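For what it's worth, on a debian or ubuntu flavoured distro the convenience-script route looks something like this (a sketch, not a substitute for the official instructions; the group change is optional and just lets you drop the sudo prefix I use throughout this post - you'll need to log out and in again for it to take effect):

curl -fsSL https://get.docker.com/ | sh
sudo usermod -aG docker $(whoami)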

It's reasonably straightforward, although I found to my cost that it won't work on a 32 bit linux distro. So I had to spend the first few hours of yesterday reinstalling the OS on my laptop, and on the development machine I intend to use for running pysystemtrade when it gets closer to being live.

Not a bad thing: I had to go through the pain of reinstalling my legacy code and dependencies to remind me of why I was doing this, took the opportunity to switch to a new IDE, pycharm, and as a bonus finally wiped Windows 10 off the hard disk (I'd kept it 'just in case', but I've only used it twice in the last six months: as the rest of my household are still using Windows there are enough machines lying around if I need it).

The documentation for Docker is mostly excellent, although I had to do a little bit of googling to work out how to run the test example I'm going to present now (that's mostly because there is a lot of documentation and I got bored - probably if I'd read it all I wouldn't have had to google the answer).


Example of using Docker with pysystemtrade


The latest version of pysystemtrade on github includes a new directory: pysystemtrade/examples/dockertest/. You should follow along with that. All the command line stuff you see here is linux; windows users might have to read the documentation for Docker to see what's different.


Step one: Creating the docker image (optional)


The docker image is the starting state of your container. Think of a docker image as a little virtual machine (Docker is different from a true virtual machine... but you can google that distinction yourself) preloaded with the operating system, all the software and data you need to do your thing, plus the script that your little machine will run when it's loaded up.

Creating a docker image essentially front loads - and makes repeatable - the job of setting up the machine. You don't have to create your own images, since you can download them from Docker Hub - which is just like GitHub, but for images. Indeed you'll do that in step two. Nevertheless it's worth understanding how an image is created, even if you don't do it yourself.

If you want to create your own image you'll need to copy the file Dockerfile (in the directory of pysystemtrade/examples/dockertest/) to the parent directory of pysystemtrade on your computer. For example the full path name for me is /home/rob/workspace3/pysystemtrade/... so I would move to the directory /home/rob/workspace3/. Then you'll need to run this command:

sudo docker build -t mydockerimage .

Okay; what did that just do? First let's have a look at the Dockerfile:


FROM python
MAINTAINER Rob Carver <rob@qoppac.com>
RUN pip3 install pandas
RUN pip3 install pyyaml
RUN pip3 install scipy
RUN pip3 install matplotlib
COPY pysystemtrade/ /pysystemtrade/
ENV PYTHONPATH /pysystemtrade:$PYTHONPATH
CMD [ "python3", "/pysystemtrade/examples/dockertest/dockertest.py" ]

Ignoring the second line, this does the following (also read this):


  • Loads a base image called python (this defaults to python 3, but you can get earlier versions). I'll talk about base images later in the post.
  • Loads a bunch of python dependencies (the latest versions; but again I could get earlier versions if I wanted)
  • Copies my local version of the pysystemtrade library into the image
  • Ensures that python can see that library
  • Specifies the script within the pysystemtrade library that will run when a container is launched from the image


If you wanted to you could tag this image and push it to Docker Hub, so that other people could use it (see here). Indeed that's exactly what I've done: here.
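For the record, tagging and pushing looks roughly like this (robcarver17/pysystemtrade is my repository name; substitute your own Docker Hub username, and you'll need to log in first):

sudo docker login
sudo docker tag mydockerimage robcarver17/pysystemtrade
sudo docker push robcarver17/pysystemtrade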

Note: Docker Hub gives you one free private repository by default, but as many public ones as you need.

Note 2: If you are running this on a machine without pysystemtrade you will need to add an extra command to pull the code from github. I leave this as an exercise to the reader.
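(For the impatient reader: a minimal, untested sketch of that variation. It assumes git is present in the base image - it is in the standard python image - and it grabs whatever is currently on the master branch.)

FROM python
MAINTAINER Rob Carver <rob@qoppac.com>
RUN pip3 install pandas
RUN pip3 install pyyaml
RUN pip3 install scipy
RUN pip3 install matplotlib
RUN git clone https://github.com/robcarver17/pysystemtrade.git /pysystemtrade
ENV PYTHONPATH /pysystemtrade:$PYTHONPATH
CMD [ "python3", "/pysystemtrade/examples/dockertest/dockertest.py" ]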


Step two: Running the container script


You're now ready to actually use the image you've created in a container. A container is like a little machine that springs into life with the image pre-loaded, does its stuff in a virtual-machine-like way, and then vanishes.

If you haven't created your own image, then you need to run this:

sudo docker run -t -v /home/rob/results:/results robcarver17/pysystemtrade

This will go to docker hub and get my image (warning this may take a few minutes).

OR with your own local image if you followed step one above:

sudo docker run -t -v /home/rob/results:/results mydockerimage


In both cases replace /home/rob/results with your own preferred directory for putting the backtest output. The stuff after the '-v' flag mounts that directory into the docker container, mapping it to the directory /results. Warning: the docker image will have complete power over that directory, so it's best to create a directory just for this purpose in case anything goes wrong.

The docker image will run, executing this script:


from systems.provided.futures_chapter15.basesystem import futures_system
from matplotlib.pyplot import show

resultsdir="/results"
system = futures_system(log_level="on")
print(system.accounts.portfolio().sharpe())
system.pickle_cache("", resultsdir+"/dockertest.pck")


This just runs a backtest, and then saves the result to /results/dockertest.pck. The '-t' flag means you can see it running. Remember the /results directory is actually a volume mapped on to the local machine. After the container has exited it will have created a file on the local machine called /home/rob/results/dockertest.pck


Step three: Perusing the results


Now in a normal local python session run the file dockertestresults.py:

from systems.provided.futures_chapter15.basesystem import futures_system
from matplotlib.pyplot import show

resultsdir="/home/rob/results"
system = futures_system(log_level="on")
system.unpickle_cache("", resultsdir+"/dockertest.pck")
# this will run much faster and reuse previous calculations
print(system.accounts.portfolio().sharpe())

Again you will need to change the resultsdir to reflect where you mapped the Docker volume earlier. This will load the saved back test, and recalculate the p&l (which is not stored in the systems object cache).


Okay... so what (Part one: backtesting)


You might be feeling a little underwhelmed by that example, but there are many implications of what we just did. Let's think about them.


Backtesting server


Firstly, what we just did could have happened on two machines: one to run the container, the other to analyse the results. If your computing setup is like mine (relatively powerful, headless servers, often sitting around doing nothing) that's quite a tasty prospect.


Backtesting servers: Land of clusters


I could also run backtests across multiple machines. Indeed there is a specific docker product (swarm) to make this easier (also check out Docker machine).
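I haven't tried this yet, but in swarm mode (built into Docker from version 1.12) the flavour of it is something like this - the service name is invented, and <token> and <manager-ip> are whatever swarm init prints out:

sudo docker swarm init                                     # on the manager machine
sudo docker swarm join --token <token> <manager-ip>:2377   # on each worker machine
sudo docker service create --name backtest robcarver17/pysystemtrade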


Easy setup


Right at the start I told you about the pain involved with setting up a single machine. With multiple machines in a cluster... that would be a real pain. But not with docker. It's just a case of installing essential services, docker itself, and then launching containers. 


Cloud computing


These multiple machines don't have to be in your house... they could be anywhere. Cloud computing is a good way of getting someone else to keep your machines running (if I was running modest amounts of outside capital, it would be the route I would take). But the task of spinning up and preparing a new cloud environment is a pain. Docker makes it much easier (see Docker cloud).


Operating system independent


You can run the container on any OS that can install Docker... even (spits) Windows. The base image essentially brings its own OS; in the case of the python base image this is just a linux variant with python preloaded. You can also get slimmed-down variants of the base image which have a much smaller footprint.

This means access to a wider variety of cloud providers. It also provides redundancy for local machines: if both my machines fail I am only minutes away from running my trading system. Finally for users of pysystemtrade who don't use Linux it means they can still run my code. For example if you use this Dockerfile:

FROM python
MAINTAINER Rob Carver <rob@qoppac.com>
RUN pip3 install pandas
RUN pip3 install pyyaml
RUN pip3 install scipy
RUN pip3 install matplotlib
COPY pysystemtrade/ /pysystemtrade/
ENV PYTHONPATH /pysystemtrade:$PYTHONPATH


sudo docker build -t mydockerimage .
sudo docker run -t -i mydockerimage

... then you will be inside an interactive python session with access to the pysystemtrade libraries. Some googling indicates it's possible to run ipython and python notebooks inside docker containers as well, though I haven't tried this myself.
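For example - and this is an untested sketch - if you added RUN pip3 install jupyter to the Dockerfile above, something like this ought to give you a notebook server reachable from the host's browser at localhost:8888:

sudo docker run -t -p 8888:8888 mydockerimage jupyter notebook --ip=0.0.0.0 --no-browser --allow-root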


Okay... so what (Part two: production)


Docker also makes running production automated trading systems much easier: in fact I would say this is the main benefit for me personally. For example you can easily spin up one or more new trading machines, agnostic of OS, either locally or on a cloud. Indeed using multiple machines in my trading system is one of the things I've been thinking about for a while (see this series of tweets: one, two, three and so on).


Microservices


Docker makes it easier to adopt a microservices approach, where we have lots of little processes rather than a few big ones. For example instead of running one piece of code to do my execution, I could run multiple pieces, one for each instrument I am trading. Each of these could live in its own container. Then if one container fails, the others keep running (something that doesn't happen right now).

The main advantage of Docker over true virtual machines is that each of those containers would be working off almost identical images (the only difference being the CMD command at the end; in practice you'd use identical images and put the CMD logic into the command line). Docker shares this common stuff, massively reducing the memory load of running a hundred processes instead of one.
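As a sketch of the idea - the script name and instrument list are made up for illustration - the same image could be launched once per instrument, with the per-container logic supplied on the command line:

for instrument in EDOLLAR US10 SP500
do
    sudo docker run -d --name execute_$instrument mydockerimage \
         python3 /pysystemtrade/examples/dockertest/execute_one.py $instrument
done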


Data in container images


In the simple version of pysystemtrade as it now exists the data is effectively static .csv files that live inside the python directory structure. But in the future the data would be elsewhere: probably in databases. However it would make sense to keep some data inside the docker image, eg static information about instruments or configuration files. Then it would be possible to easily test and deploy changes to that static information.
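In Dockerfile terms that's just another COPY line (the filename and destination here are hypothetical); rebuilding and relaunching the image then tests and deploys the new configuration in one step:

COPY instrumentconfig.csv /pysystemtrade/data/futures/instrumentconfig.csv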


Data outside container images


Not all data can live inside images; in particular dynamic data like market prices and system state information like positions held needs to be somewhere else.

Multiple machines means multiple places where data can be stored (machine A, machine B, or a local NAS). Docker volumes allow you to virtualise that, so the container doesn't know or care where the data it's using lives. The only work you'd have to do is define environment variables which might change if data is living in a different place to where it normally is, and then launch your container with the appropriate volume mappings.
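So launching against data on, say, a NAS rather than a local drive might look like this (the environment variable name and the paths are invented for the example):

sudo docker run -t -e DATA_HOST=nas -v /mnt/nas/pricedata:/data mydockerimage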

Okay, there are other ways of doing this (a messy script of symlinks in linux, for example) but this is nice and tidy.

You can also containerise your storage using Docker data volumes but I haven't looked into that yet.


Message bus


I am in two minds about whether using a message bus to communicate between processes is necessary (rather than just shared databases; the approach I use right now). But if I go down that route containers need to be able to speak to each other. It seems like this is possible although this kind of technical stuff is a little beyond me; more investigation is required. 
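The mechanism seems to be user-defined networks: containers launched onto the same network can reach each other by container name. An untested sketch, with invented container names:

sudo docker network create trading
sudo docker run -d --name pricecapture --net trading mydockerimage
sudo docker run -d --name execution --net trading mydockerimage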

(It might be that Swarm removes the need for a message bus in any case, with new containers launched passing key arguments.)

Still, at a minimum, docker containers will need to talk to the IB Gateway (which could also live in a container... see here), so it's reassuring to know that's possible. But my next Docker experiment will probably be seeing if I can launch a Gateway instance (probably outside a container, because of the two factor authentication I've grumbled about before, unless I can use x11vnc to peek inside it) and then get a Docker container to talk to it. This is clearly a "gotcha" - if I can't get this to work then I can't use Docker to trade with! Watch this space.


Scheduling


At the moment my scheduling is very simple: I launch three big processes every morning using cron. Ideally I'd launch processes on demand; eg when a trade is required I'd run a process to execute it. I'd launch price capturing processes only when the market opens. If I introduce event driven trading systems into my life then I'd need processes that launched when specific price targets were reached.

It looks like Docker Swarm will enable this kind of thing very easily. In particular, because I'm not using python to do the process launching, I won't run into the IB multiple gateway connection problem. I imagine I'd then be left with a very simple crontab on each machine to kick everything into life, and perhaps not even that.
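In the meantime each crontab entry becomes a one-liner that just hands off to docker; something like this (the schedule is for illustration):

# launch the daily backtest container at 6:30am each weekday
30 6 * * 1-5 docker run --rm -v /home/rob/results:/results robcarver17/pysystemtrade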


Security


Security isn't a big deal for me, but there is something pleasing about only allowing images access to certain specific directories on the host machine.


Development and release cycle


Finally Docker makes it easier to have a development and release cycle. You can launch a docker container on one machine to test things are working. Then launch it on your production machine. If you have problems then you can easily revert to the last set of images that worked. You don't have to worry about reverting back to old python libraries and generally crossing your fingers and hoping it all works.

You can also easily run automated testing; a good thing if I ever get round to fixing all my tests.

Geeky note: You only get one free private repository in your Docker Hub account, and git isn't ideal for storing large binaries. So another tool might be better for storing copies of images you want to keep private.


Summary


Development of pysystemtrade - as the eventual replacement to my legacy trading system - is currently paused whilst I finish my second book; but I'm sure that Docker will play a big part in it. It's a huge and complex beast with many possibilities which I need to research more. Hopefully this post has given you a glimpse of those possibilities.

Yes: there are other ways of achieving some of these goals (I look forward to people telling me I should use puppet or god knows what), but the massive popularity of Docker exists for a reason: it's very easy to use for someone like me who isn't a professional sysadmin or full-on linux geek, and it offers a complete solution to many of the typical problems involved with running a fully automated system.

PS You should know me better by now, but to be clear: I have no connection with Docker and I am receiving no benefit, pecuniary or otherwise, for writing this post.