Wednesday, 2 September 2015

Systems building - Checks and balances


Driverless cars are, apparently, very close to commercial reality. I don't know about you but there is something pretty scary about a computer being completely in control of a complex process, which could have catastrophic consequences if it went wrong.

Ah it was nothing. You should have seen the other guy...  (From autospies.com)

That might seem a strange attitude. After all I run a fully automated systematic trading strategy. True, if it goes wrong the consequences are 'just' losing money, rather than death and dismemberment; but I am pretty comfortable with letting the computer trade my money for me. Partly that might be because I wrote the software myself; but given I am a competent amateur rather than a professional programmer this shouldn't be that reassuring.

No - what gives me comfort is having information about my system. I have a series of ways to automatically monitor it, to understand if it's behaving properly and whether anything is going wrong.

This post will tell you how I keep a close eye on my automated trading computer. It is the sixth, and final, post in a series on the nuts and bolts of building systematic trading systems. Warning: there are a few python and linux commands sprinkled here. However you don't need to be familiar with these environments to follow the post.


I like driving in my car


Let's return to the car analogy. In my car I have a dashboard with various lights and dials. If I had a nicer car, I'd probably have a built in touch screen interface as well.

Now that is what I call a dashboard ... (knightrideronline.com)

Leaving the specifics aside these various components can be classified as one of:

  • Indication lights. Binary (on/off), and which show if something is happening or not. Example - Turn signal / indicator light.
  • Warning lights. Binary (on/off), and show if something is wrong if lit. Example - low fuel warning.
  • Informative dials and displays.  Variable, show to what degree something is happening. Example - speedometer.
  • Interactive reporting. Can be interrogated to give different information. Example - satellite navigation or in car computer touchscreen.

I'll use the same keywords here to classify the different types of monitors and reports I generate. Note that a warning light ought always to be off, but an indication light could 'legally' be on, or off. If you saw your turn signal blinking when you hadn't selected it that would be a problem.

Note that with all diagnostics the actual reporting component might be broken. So you can get false errors when all is well, or a report that all is well when it really isn't (those with some statistical training might think about type 1 and type 2 errors: usually I'd prefer to get a false error than miss a real error).

This means there is a flaw in having warning lights. If a warning light isn't on, then it might indicate there is no problem. Or it could indicate the warning light itself isn't working, but there is an underlying problem. It's perhaps better to have a light which is green if there is no problem, then goes red when there is.

That's the analogy; so how do we actually do this in practice? We need some way of getting the underlying raw information, and then we need to report it to the user.


Sources of raw information


There are three different underlying sources of information that can be used for system status reporting: logs/echoes, diagnostics and existing data.

Logs and echoes


The most basic information you can use are echoes and logs. Echoes are my name for the trace output from a process (everything that is printed to screen, or stdout if you want to be pedantic). So my unix crontab* begins like this:

run@bilbo ~/echos $ crontab -l
#
# Daily sample FX Prices
#
0 6  * * 1-5 $HOME/workspace/systematic_engine/syscontrol/scripts/samplefx  LIVE >> $HOME/echos/samplefxpricesLIVE 2>&1


* Note for non unix users - a crontab is a schedule of processes that get kicked off on a regular basis. It's one reason why linux / unix is better for automated processes.

The part after >> means that the trace output from calling the script samplefx LIVE will be dumped into the file samplefxpricesLIVE. This is the 'echo' file for this process. Good things about echo files are that they are simple, and that without any explicit error handling they'll always dump the trace of an error failure when it happens. Bad things are that they include everything that a process writes to stdout, and unless you truncate them regularly they just get larger every day; so it can be hard to find stuff.

Logging is a bit more sophisticated. It requires software to say 'I want to log this please' rather than just printing.


run@bilbo ~/logs $ cat fx.log
2015-08-31 06:00:04 INFO    : fx           : sample all fx running  (sample_all_fx)
2015-08-31 06:00:04 INFO    : fx           : Running in AUTO mode (sample_all_fx)
2015-08-31 06:00:04 INFO    : fx           : sampling USD (sample_all_fx)
2015-08-31 06:00:04 INFO    : fx           : sampling AUD (sample_all_fx)
2015-08-31 06:00:04 INFO    : fx           : sampling AUD: getting quandl price (sample_all_fx)
2015-08-31 06:00:11 INFO    : fx           : Added fxprice AUD 2015-08-31 00:00:00 0.716025 ok (add_price_row)


In my system this is produced by simple commands like this:

    log=logger()
    log.info("sample all fx running ")
    log.info("Running in %s mode" % entrymode)


My code:
  • Outputs everything in a nice way, with time stamps, and padding so it's easy to read
  • Handles different levels of log message differently - information, warning, error or critical error.
  • Recognises which module is writing the log, and which parent process it's come from (using the python function inspect.stack)
  • Directs messages to the appropriate log file, depending on what level of error we have and the parent process.
  • Sends me an email when a critical error is logged.

It's based on the python logging package. This package also allows a new log file each day, and the automatic archiving of old log files.
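As a rough sketch of what the daily rollover and padded formatting could look like with the standard logging package (this is not the actual code behind the log shown above; the function name make_logger, the directory argument and the 30 day retention are all assumptions made for illustration):

```python
import logging
import logging.handlers
import os


def make_logger(name, log_dir, backup_days=30):
    """Return a logger writing to <log_dir>/<name>.log, starting a new file at midnight.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate output if called twice for one name

    # Rolls the file over each midnight and keeps the last `backup_days` old files
    handler = logging.handlers.TimedRotatingFileHandler(
        os.path.join(log_dir, name + ".log"),
        when="midnight",
        backupCount=backup_days,
    )
    # Time stamp, padded level, padded logger name, message, calling function
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s %(levelname)-8s: %(name)-12s : %(message)s (%(funcName)s)",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)
    return logger
```

Calling log = make_logger("fx", "/some/log/dir") and then log.info("sample all fx running") inside a function would write a line broadly similar to the ones in fx.log above. The standard library also provides logging.handlers.SMTPHandler, which could be attached at CRITICAL level as one way of getting an email when something goes badly wrong.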

Note that both logs and echoes aren't much use for understanding what is going on. You can do basic monitoring by running tail logfilename -f in a unix terminal, which will show you a live updating file (and is all you can really do if people ask to see your 'system in action'), but there is nothing here like the dashboard of a car.

"Can I see your trading system?". What people expect to see.... (kevinonthestreet.com)

.... and what people actually get.

They also aren't much use as a data layer for a reporting application, since a big unstructured text file isn't the best way to store retrievable data. Normally you'd only go to these files if something had gone badly wrong, and you needed very detailed information about what happened.

Diagnostics


Unlike the free text of logs and echoes, diagnostics are a structured and usually quantified record of what a process has been up to. So rather than logging "Just got an ask price of 124.2 for US 10 year bonds" I'd store the value 124.2 in a place specifically for that purpose.

A diagnostic record is a value (or sometimes text string), for a given date, and for a given set of characteristics, eg if I store the ask price for 10 year bonds, then I could get the time series of such values with a python command like diagnostic(system="prices", instrument="US10", value="ask_price").read()

If you're interested I store my diagnostics in a mixture of database and hdf files, with the read/write methods figuring out what to write and where to read from based on the type of the value being stored.
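To make the idea of a diagnostic record concrete, here is a minimal sketch of a store keyed by system, instrument and value name. It is an illustration only: the class DiagnosticStore, its table layout and the use of sqlite are assumptions, and it ignores the hdf side entirely.

```python
import sqlite3
from datetime import datetime


class DiagnosticStore:
    """Hypothetical store: one row per (system, instrument, name, time stamp)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS diagnostics ("
            "system TEXT, instrument TEXT, name TEXT, stamp TEXT, value REAL, "
            "PRIMARY KEY (system, instrument, name, stamp))"
        )

    def write(self, system, instrument, name, stamp, value):
        # Writing the same key and time stamp twice overwrites the old value
        self.conn.execute(
            "INSERT OR REPLACE INTO diagnostics VALUES (?, ?, ?, ?, ?)",
            (system, instrument, name, stamp.isoformat(), value),
        )
        self.conn.commit()

    def read(self, system, instrument, name):
        # Returns the time series as a list of (datetime, value), oldest first
        rows = self.conn.execute(
            "SELECT stamp, value FROM diagnostics "
            "WHERE system = ? AND instrument = ? AND name = ? ORDER BY stamp",
            (system, instrument, name),
        ).fetchall()
        return [(datetime.fromisoformat(stamp), value) for stamp, value in rows]


store = DiagnosticStore()
store.write("prices", "US10", "ask_price", datetime(2015, 8, 31, 6, 0, 11), 124.2)
series = store.read("prices", "US10", "ask_price")
```

The point of the structure is that a report can ask for exactly one series, rather than grepping a log file for it.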

As with logs/echoes I will rarely look at the naked raw data. But they are used as the data layer for reporting. About two thirds of the information in the 'user facing' reports that I discuss below ultimately comes from diagnostics.


Existing system data


Here I am referring to the data I actually use in my system, eg prices, intermediate calculations, positions and trades. This isn't data specifically stored for diagnostic purposes; it's needed anyway. About one third of the information in my reports is just pulled from existing databases. Since generating and storing diagnostics is costly (both in terms of processing power and disk space), it's better to use data you already have if you can.


Reporting

This isn't my trading dashboard. But it's cool though, isn't it? (dashboardspy.wordpress.com)

No, I don't have a fancy GUI that displays an actual trading dashboard (although my former employers, AHL, have an amazing one displayed on giant touchscreens in their offices if you can talk your way into them); instead I have a mixture of:

  • text reports that are emailed to me each day, 
  • a series of scripts that I can run manually to give me different kinds of information, again text based; 
  • and an interactive method for visually reporting ad-hoc diagnostics.


Reports


All my reports are produced daily, and I produce two versions. The first is emailed to me, and contains only problems (the warning lights). The second is stored on disk, and is more verbose.

I can also run all my reports interactively on an ad hoc basis, perhaps with different arguments; so rather than getting just today's trades I could get them for the last two weeks.

Monitor report


I run two kinds of reports that give indication and warning type information.

The first report is my monitor report. This is a very simple report that:

  • Checks when a particular process last happened (from explicit diagnostics for this purpose) or a particular kind of data was last written (by examining the appropriate database tables).
  • Compares this to when we expected it to happen. Daily events are expected to happen.... yes every week day; other processes during market hours.
  • Reports anything that is delayed, where the definition of delayed for me is 12 hours (fine for a relatively low frequency system).

A typical emailed monitor report might look like this:

Ran monitor report at 2015-08-31 12:08:11
 Following are delayed more than 12 hours


prices-fx                 HKD          504.0     2015-08-10 00:00:00



\ ***************  FINISHED  ******************



This is telling me that I have a delay on FX rates for HKD of 504 hours; more than the 12 hour delay cutoff (this is an extreme example as I've been on holiday - a wet camping holiday if you must know).

This is a picture of me on my holiday, whilst HKD was moving slightly more than usual. Well it's not actually me. But you get the idea. (http://www.bbc.co.uk/)

I'd then need to find the underlying cause; which could involve looking at other diagnostics, or log files (in this case it's because the FX rate - normally virtually fixed - moved more than expected, which requires a manual intervention on my part).

The full verbose version looks like this:

run@bilbo ~/reports $ cat monitor_report.txt | more
Monitor report at 2015-08-31 12:08:11


 prices-fx                HKD          504.0   2015-08-10 00:00:00
last_beat-samplerunner    PLAT         1.3     2015-08-28 18:31:50
last_beat-signalsrunner   PLAT         1.3     2015-08-28 18:31:53
prices-optimal            PLAT         1.3     2015-08-28 18:32:59
prices-adj                EDOLLAR      1.3     2015-08-28 18:36:16
last_beat-samplerunner    EDOLLAR      1.3     2015-08-28 18:36:34
last_beat-signalsrunner   EDOLLAR      1.3     2015-08-28 18:36:39
prices-optimal            EDOLLAR      1.3     2015-08-28 18:37:49
prices-adj                WHEAT        1.2     2015-08-28 18:43:48
last_beat-samplerunner    WHEAT        1.2     2015-08-28 18:43:56
last_beat-signalsrunner   WHEAT        1.2     2015-08-28 18:43:59
prices-optimal            WHEAT        1.1     2015-08-28 18:45:11 


.... and so on. Clearly everything is running smoothly. The delays of just over an hour are misleading; for simplicity I assume that everything 'should' have got a price exactly when the market 'closed' (and for these US markets, that is my system closing time of 8pm local time).

That is why I use a 12 hour threshold - I am happy if my system runs once a day for a given instrument. A different tolerance for delays would make sense for other people; or for those whose system always gets a price for a given instrument at specific times each day.
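The core of a check like this can be sketched in a few lines. This is an illustration rather than the real report code: the function names are made up, and the expected times are assumed to be supplied by the caller rather than worked out from market closing times.

```python
from datetime import datetime


def hours_delayed(last_seen, expected):
    """Hours by which the last event lags the time we expected it."""
    return (expected - last_seen).total_seconds() / 3600.0


def delayed_items(last_seen_by_key, expected_by_key, threshold_hours=12.0):
    """Return (key, delay in hours, last seen) rows over the threshold, worst first.

    Keys are hypothetical (check name, instrument) tuples.
    """
    rows = []
    for key, last_seen in last_seen_by_key.items():
        delay = hours_delayed(last_seen, expected_by_key[key])
        if delay > threshold_hours:
            rows.append((key, round(delay, 1), last_seen))
    return sorted(rows, key=lambda row: row[1], reverse=True)


last_seen = {
    ("prices-fx", "HKD"): datetime(2015, 8, 10),
    ("prices-adj", "WHEAT"): datetime(2015, 8, 28, 18, 43, 48),
}
expected = {
    ("prices-fx", "HKD"): datetime(2015, 8, 31),
    ("prices-adj", "WHEAT"): datetime(2015, 8, 28, 20, 0, 0),
}
problems = delayed_items(last_seen, expected)
```

With these made-up inputs only the HKD row comes back, with a delay of 504.0 hours; the WHEAT row is just over an hour late and so stays out of the 'warning light' version of the report.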


Reconcile report

 

My second report for indication and warning information is the reconcile report. This runs a series of cross checks to see if the various system states are properly aligned. This won't make any sense until I explain it.

An emailed reconcile report would look like this:

Reconcile report issues at 2015-08-31 13:00:03


 No issues with optimal vs actual positions

IB and my positions perfectly matched

IB and my fills perfectly matched

Roll status with action needed

       code                              msg  near_expiry_days  position price_contract  relative_volume rollstatus                 suggest
2   LIVECOW           Roll positions ongoing                 0        -1         201510         0.004920      ALLOW                CONTINUE
7      BOBL         Roll complete *ROLL ADJ*                 7         0         201509         0.012277      ALLOW                ROLL_ADJ


 Run rollinstrument as needed

Missing forecasts

Missing last value for  KR10 cryer
Missing last value for  KR3 cryer
Missing last value for  LIVECOW rushton
Missing last value for  KR10 rushton
Missing last value for  KR3 rushton
Missing last value for  BOBL rushton
Missing last value for  BTP rushton
Missing last value for  BUND rushton
Missing last value for  OAT rushton
Missing last value for  SHATZ rushton
Missing last value for  AEX rushton
Missing last value for  CAC rushton
Missing last value for  SMI rushton
Missing last value for  EUROSTX rushton


no instruments at limit

No stopped process


No instruments with trade controls
 

 
No instruments overrides


No issues with positions (locks, other contract trading, in forward when not rolling)



\ ***************  FINISHED  ******************



Clearly what we want to see is lots of smiley faces (yes I really do have them in the report). These are the 'green' warning lights I mentioned in my original explanation of the dashboard metaphor. If any smiley faces are missing then I might need to investigate further.

Let me quickly explain each of the sections of the report:

  • Are the positions I have (according to my position database) in line with the optimal: what I want to have given current market prices and my trading rules?
  • Are the positions I have (according to my position database) in line with what my broker is reporting to me?
  • Are the fills I have for today (according to my trades database) in line with what my broker is reporting to me?
  • Are any of my futures contracts in a state where I need to think about rolling them forward?
  • Am I missing the last forecast for a given trading rule variation (here shown with internal code names) and instrument? This might be for legitimate reasons, but I need to know about it.
  • Have I exceeded my self imposed trading limits - maximum number of contracts per day; or position limits - maximum number of contracts to hold?
  • Have I manually stopped any of my processes from running? This might be a valid thing to do, but I don't want to forget about this and leave it turned off.
  • Are any of my instruments in a trading state other than "OK TO TRADE"? Again I might want this, but I mustn't forget I've done it.
  • Have I applied an override to a given instrument, eg cut positions by half?
  • Are any of my instruments locked (meaning a trade is taking place)?
  • Do I hold positions in something other than the current, or next contract I want to roll into?
  • Do I hold positions in the next contract, when I am not in the right rolling state?

There is also a longer report on disk, but I won't reproduce it here. Again clearly there are some things here that are quite specific to a futures system, but you should be able to put together a checklist for your own system which reflects your own concerns.
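As an example of one such cross check, here is a minimal sketch of reconciling positions from my own database against the broker. It assumes both sides have already been reduced to a dict of instrument code to signed number of contracts; the function names and the report wording are illustrative, not the actual implementation.

```python
def reconcile_positions(mine, broker):
    """Return {instrument: (my position, broker position)} for every mismatch.

    An instrument missing from one side is treated as a position of zero.
    """
    breaks = {}
    for instrument in sorted(set(mine) | set(broker)):
        my_pos = mine.get(instrument, 0)
        broker_pos = broker.get(instrument, 0)
        if my_pos != broker_pos:
            breaks[instrument] = (my_pos, broker_pos)
    return breaks


def reconcile_message(breaks):
    """One line for the report: all clear, or a list of the breaks."""
    if not breaks:
        return "Broker and my positions perfectly matched"
    details = ", ".join(
        "%s (mine %d, broker %d)" % (code, pos[0], pos[1])
        for code, pos in breaks.items()
    )
    return "Position breaks: " + details


mine = {"LIVECOW": -1, "BOBL": 0, "WHEAT": 2}
broker = {"LIVECOW": -1, "WHEAT": 1}
print(reconcile_message(reconcile_positions(mine, broker)))
```

Treating a missing instrument as zero means a flat position in my database and no entry at the broker count as a match, which is usually what you want.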

 

Risk report


The remainder of my reports are for informative purposes. Probably the most important of these is my risk report. This is a complex report, so I'll go through section by section.

The first section shows instruments where 'stuff has happened' - either in the market, or my system, or both. It's those instruments where closer attention might be warranted if the system has started behaving weirdly.

In the first table I show the instruments for which my forecast has changed the most today. I also show the daily price change, normalised for volatility, and the current size of the forecast. If my intuition suggests that the forecast change is large, or unwarranted given market movements, I will delve deeper.

The next table shows instruments for which the absolute volatility normalised daily price change is bigger than 1. Again I also show the forecast, and the change in forecast. I have similar tables looking at changes in prices, and forecasts, over a 2 week horizon (given the speed of my system they are likely to be more meaningful).
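For anyone unsure what a volatility normalised price change is, here is a small sketch. It divides the latest daily price change by the standard deviation of the preceding daily changes. The simple standard deviation and the 25 day lookback are assumptions chosen for illustration, not necessarily the volatility estimate used in the report.

```python
from statistics import stdev


def normalised_change(prices, lookback=25):
    """Latest daily price change divided by the volatility of earlier changes."""
    if len(prices) < 4:
        raise ValueError("need at least four prices")
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])]
    history, latest = changes[:-1], changes[-1]
    vol = stdev(history[-lookback:])  # sample standard deviation of daily changes
    if vol == 0:
        raise ValueError("no price variation in lookback window")
    return latest / vol


prices = [100, 101, 100, 101, 100, 101, 104]
move = normalised_change(prices)
big_move = abs(move) > 1  # the threshold used for the table described above
```

In this toy series the last move of 3 points is well over twice the typical daily move, so the instrument would appear in the table.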

Section two focuses on my biggest expected risks.

I list all instruments for which my forecast is larger than 15 (where the expected average absolute value of my forecasts is 10). I show the actual positions, and optimal positions, I have in each instrument and show all the numbers used to calculate them, such as instrument weight and diversification multiplier.

Obligatory book plug: This part of the blog will make a little more sense if you read part three of "Systematic Trading".

I also have a table with all the instruments for which the monetary value of my risk exceeds a given amount. I compare the expected risk I'd have if I could hold fractional futures contracts with what I actually have given discrete integer positions. I compare actual versus expected risk (using 2 weeks of returns for the former) for the instruments with the largest risk.

Section three of the report looks at total portfolio risk. I calculate this in various ways, with different assumptions for volatility and correlations. I also include statistics on my cash holdings here (so I can see if I need to do any spot currency transactions).

The final section looks at my return, volatility, and Sharpe Ratio over various periods (from 2 weeks up to one year).

As usual the emailed version includes only the biggest positions and risks; a more verbose version gives values for every single instrument.


Trades report

 

This is simply a list of all the trades I've executed, and my calculation of execution costs - what I could have paid if I'd been able to execute at the mid, what I would have paid if I'd crossed the spread immediately, and what I actually paid (the difference between the latter two due to my simple execution algo). It also shows the time delay between getting a price, calculating the resulting optimal position, issuing the order, and getting filled.

Any trades with large costs (even if they're positive) or long delays might warrant further investigation.
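The three cost measures can be sketched like this, in price units per contract. The function name and the sign convention (positive means worse than mid) are assumptions made for this example, and the real report may well use the opposite sign.

```python
def execution_costs(bid, ask, fill_price, quantity):
    """Compare a fill with the mid price and with crossing the spread.

    quantity is signed: positive for a buy, negative for a sell.
    Costs are per contract in price units; positive means worse than mid.
    """
    if quantity == 0:
        raise ValueError("quantity must be non-zero")
    side = 1 if quantity > 0 else -1
    mid = (bid + ask) / 2.0
    cross_price = ask if side > 0 else bid  # price paid if we cross immediately
    return {
        "cost_if_crossed": side * (cross_price - mid),
        "cost_actual": side * (fill_price - mid),
        "saving_vs_crossing": side * (cross_price - fill_price),
    }


# A buy of 2 contracts, filled passively at the bid
costs = execution_costs(bid=124.0, ask=124.5, fill_price=124.0, quantity=2)
```

Here crossing the spread would have cost 0.25 versus the mid, whereas the fill at the bid was 0.25 better than the mid, a saving of 0.5 against crossing immediately.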

The default emailed report will be of just today's trades (this is one occasion where there is no fuller verbose version), but I can also interactively run the report for a longer period.



Profit and loss report

 

You can easily imagine what this report is for. Read my detailed post on accounting data to see how I use various sources to work out p&l numbers. I default to running a p&l for a single day, at the close, but I can also get it for different periods interactively. The emailed report contains the highlights (biggest profits/losses), whilst the verbose version has every single position I've held during the reporting period.

Very large p&l would pique my interest, but I'd probably look at the risk report to see where it came from (an unusually risky position, or a very large move in price).

 

Scripts


Here is a list of the scripts I can run in a unix terminal to get various nuggets of useful information when required (with various additional arguments). The emphasis here is on small pieces of code to do one particular specific job. Note that most of these diagnostic functions are also used when I generate the emailed reports (the underlying code can be run either by a script and display output; or it can return the raw data to the reporting 'engine').

  • displaylimits: What are my positions, and trades, for each instrument and how do these compare to my self imposed limits on those?    
  • rollstatus: Present either a summary of futures roll status, or a detailed report for an individual instrument.

  • displaydiag: Display the diagnostic values for a given set of characteristics.  
  • lastbeat: Show when each process was last run for each instrument  
  • orderdiags: Display all the diagnostics generated by my execution algo for a particular order.     
  • displayforecasts: Show a pandas data frame of the forecasts I have for each instrument.
  • missingforecasts: Print a list of instruments and trading rule variations for which the last forecast value was missing.

  • showorder: Give me information about a particular trade.
  • displaytickets: Show the orders (both filled and unfilled) for the last N days
  • recfills: Compare the fills data from my database, and from the broker      

  • displaypositions: Show a pandas data frame of the positions (summed up to instrument level).  
  • lastposition: Show me the current position in each futures contract
  • lastopt: Last optimal position for each instrument (the position my system wants to take)  
  • displayopt: Show a pandas data frame of the optimal positions for each instrument.
  • optvspos: Compare my optimal positions to what I actually have.   
  • recpositions: Compare the positions from my database, and from my broker.       

  • displayfx: Show me the time series of FX rates for a particular currency    
  • lastfx: Show me the last FX rates for all currencies.   
  • displayvolume: Show a time series of volume for a particular instrument and futures contract.   
  • displayraw: Show me the time series of raw futures prices for a given futures contract.      
  • displayadj: Show a time series of prices, after futures back adjustment
  • lastadj: Give me the last futures back adjusted price for each instrument  

  • findIB: Find the instrument which has a given broker's identity code.
  • showcontract: Give me all the static information about a particular futures contract.
  • showinstrument: Give me all the static information about a particular instrument.

   

  

Interactive diagnostics


A picture is worth a thousand words. This is why your car dashboard has a little picture of a petrol pump as a fuel warning light, rather than LED lights spelling out "NEARLY OUT OF FUEL!". Finding out what is causing some behaviour, like a particular trade, will often involve looking at a dozen or more individual diagnostic time series.

So I've written some stuff in python to allow me to understand visually what has been going on. Let me give you a quick demo. I usually don't run this on a trading machine to avoid loading it up or the remote chance of a file lock. The diagnostic data is copied every day to my NAS, and then pulled down to my laptop.

This is in a python session:

from syssignals.diags import diagsBlob
from matplotlib.pyplot import show
from sysdiag.picture_subfuncs.display import plot_it

data=diagsBlob("LIVE", "VIX")

This is quick since the class loads in diagnostics 'on demand'. To load all the diagnostics up for a given instrument would take a while, so it doesn't get something unless it has to. Now we're actually going to plot something; when we run this there will be a delay whilst the diagnostics that are needed are 'sucked' in.

ans=data.show_how_combined_signal_calc_dict()
plot_it(ans)


Example of a multiple plot. The diagsBlob method returns a dict of pandas dataframes. plot_it is just a function to plot these.

data.show_position_and_threshold().plot()
show()

Example of a single plot. The diagsBlob method returns a single pandas dataframe, which can be plotted as normal.
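The 'on demand' loading mentioned in the demo is a simple pattern worth copying. The sketch below is a hypothetical stand-in, not the diagsBlob class itself: the class name LazyDiags and the loader function are invented for illustration.

```python
class LazyDiags:
    """Loads each diagnostic the first time it is asked for, then caches it."""

    def __init__(self, loader):
        self._loader = loader  # slow function: diagnostic name -> data
        self._cache = {}
        self.load_count = 0    # how many slow loads have actually happened

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = self._loader(name)
            self.load_count += 1
        return self._cache[name]


def slow_loader(name):
    # Stand-in for reading a diagnostic time series from disk
    return [len(name), len(name) + 1]


data = LazyDiags(slow_loader)      # instant: nothing is loaded yet
first = data.get("position")       # triggers one slow load
again = data.get("position")       # served from the cache
```

Creating the object costs nothing, and asking for the same diagnostic twice only hits the disk once.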



The end


As it's the end of the series, what better picture to conclude than one of a tombstone? Or in this case a gravestone, apparently (forextradingstrategies4u.com) Warning: The author of this post does not endorse the use of candlestick analysis, and would go as far as to say that it is at best scientifically unproven, and at worst complete nonsense.

Yes, this really is the final post in my series on the nuts and bolts of building systematic trading systems.

The first five posts are:

http://qoppac.blogspot.co.uk/2015/04/system-building-data-capture.html
http://qoppac.blogspot.co.uk/2015/05/systems-building-futures-rolling.html
http://qoppac.blogspot.co.uk/2015/06/systems-building-deciding-positions.html
http://qoppac.blogspot.co.uk/2015/07/systems-building-execution.html


If you need more help on coding up an automated system in python on the Interactive Brokers platform, check out this series of posts. If you want advice on designing trading systems then you really ought to buy, and read, my book.

 

5 comments:

  1. This is great!

    A question mostly regarding the use of Linux. I'm sure you 'solved' this issue at AHL. The data I've been using is coming from a Bloomberg Terminal. However, I can't use the terminal on Linux at all. Do you know of a workaround that I can use Linux while still accessing Bloomberg data?

    Thanks again!

    Replies
    1. Mostly at AHL we were using the B-PIPE feed from Bloomberg. Obviously this is platform independent. From a terminal we'd only be doing occasional one off analysis so we'd copy data down to a shared Samba drive from the terminal (windows) then read off in the unix environment. I have a NAS in my house that is a samba drive; I think most of them are as they run linux under the hood and they are a very easy way to deal with files in a cross platform way.

      In a more systematic way I don't know if it's possible to use the bloomberg API with terminal.

      https://www.bloomberglabs.com/api/faq/

  2. This sentence led me to write/ask the below. I would have done so privately but can't seem to find a good way to do so, so here I am..

    > given I am a competent amateur rather than a professional programmer

    I consider myself a professional programmer/engineer (+10y of exp on both writing software and building/running/tuning infrastructure), and an increasingly competent* amateur in finance and trading.

    What would be your advice to someone who like me wanted to immerse themselves into building/learning about systematic trading systems AND the day-to-day markets mechanics? Join a hedge fund? Go work somewhere like an engineer sitting next to a trading desk at GS or JPM ("strats")? Join a quant team somewhere? Work for you? :D

    I'm currently in the PE space so not much exposure to building systematic trading systems and the inner workings of the markets. So the only way to learn for me so far is to read books/blogs like yours and code the/a system by myself. Given you've "seen it all" I thought I would ask someone like you who've been immersed in this world already.

    Thanks in advance.

    *slowly but steadily, as I keep on reading and studying and learning etc - also "competent" depends on who's judging...

    Replies
    1. I find these 'career advice' questions very hard to answer, although I can tell you for sure that working for me isn't an option as my 'family office' AUM isn't enough to support paid staff :-) But my general advice (written in some other posts, when you get to them...) is that it's often easier to first get the wrong job in the right place. By the sounds of it, getting a tech job in a systematic HF shouldn't be too much of a stretch, from where you can hopefully get exposure to the PM / research side. There are fewer seats like that in banks, so that's less promising.

  3. Indeed. Apologies for the question. It's never easy to answer.

    > I can tell you for sure that working for me isn't an option as my 'family office' AUM isn't enough to support paid staff

    Haha lol yes. But I should have probably clarified that in the sense that one could learn by just having direct exposure on someone that's been there, done that , etc. Not being employed.

    Thanks for the advice though. Got a couple of keywords in there. That's useful!

