Saeed Amen
When analysing financial markets, traders are often faced with conflicting objectives, and these objectives can vary significantly between traders. For high frequency traders, it is crucial to digest market data quickly and execute trades based on that analysis as soon as possible. This necessitates the use of languages such as Java or C++, and in some cases more specialised techniques such as GPU or FPGA programming. For traders active at lower frequencies, speed of execution is less important. What is common amongst all traders, however, is that they need to develop trading strategies before putting them into production. The panacea for traders is a language which allows them both to develop a trading strategy quickly and to put it into production. C++ could be one candidate: it compiles down to machine code, but implementation can be time consuming. R is popular amongst statisticians and boasts a very wide array of statistical libraries; however, it is slow and not ideal for implementing large scale systems.
S. Amen (✉)
Cuemacro Limited, Level39, One Canada Square, London E14 5AB, UK
e-mail: saeed@cuemacro.com
Another benefit of using Python for market data analysis is its rich ecosystem of libraries that deal with data analysis. Here we describe some of the best known libraries used in Python based data science, beginning with the SciPy stack.
• arctic [1]—Man-AHL's open sourced Python library, which provides a front end for efficiently storing pandas dataframes in MongoDB
• BLPAPI [2]—Bloomberg's open source API for accessing Bloomberg market data
• Quandl [15]—Provider of a market data API, with many free datasets
If we think of any data science problem, it tends to involve several steps. Below, we
outline these steps albeit from the perspective of developing a trading strategy.
• Step 1—The first step is in a sense the most important. This involves formulating
our hypothesis. In the case of developing a trading strategy, this involves
brainstorming to think of a trading rationale. Is our trading strategy based on
some sort of traditional factor, such as trend following (and if so, what is the
rationale for this factor)? Are we trying to model a certain trader behaviour, such
as specific flows? We can think of the rationale as effectively pruning the search
space of possible trading strategies. A well thought out rationale can also help to reduce the chances that our final result is a product of excessive data mining (where "help" is the key word!). In practice, coming up with a hypothesis is where real world knowledge of markets and deep experience of what price action can mean matter most.
• Step 2—Once we have our hypothesis, we need data. Typically, we need price
data at the very least to compute the returns of the market asset we wish to trade.
In many cases, we need other data to generate a trading signal. This can range
from fundamental data on the economy to more unusual data sources such as
news data and social media.
• Step 3—With both a hypothesis and data, we can now create our trading signal, and also backtest our trading rule if we are developing a systematic one. This involves looking at historical data to see how well our trading signal performed. We can split our analysis into in-sample and out-of-sample sections. Alternatively, we might be doing market analysis to inform a discretionary trading decision.
• Step 4—Lastly, once we have completed our backtest, we need to present our results in written form, along with some data visualisation. A well thought out
visualisation can tell us more about any trading rule than a massive table of
numbers. Furthermore, the importance of visualisation is not purely confined to
the research stage. Understanding how our trading model is performing live is
key in helping us to risk manage it, and to make adjustments if necessary over time.
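The in-sample/out-of-sample split mentioned in Step 3 can be sketched in a few lines of pandas (synthetic data; the 70/30 split ratio is an illustrative choice, not a recommendation):

```python
import pandas as pd
import numpy as np

# hypothetical daily returns for the asset we wish to trade
dates = pd.date_range("2005-01-01", periods=1000, freq="B")
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0, 0.01, len(dates)), index=dates)

# reserve the final 30% of history as out-of-sample data: parameters are
# chosen on the in-sample portion only, then the rule is evaluated once
# on the untouched out-of-sample portion
split = int(len(returns) * 0.7)
in_sample = returns.iloc[:split]
out_of_sample = returns.iloc[split:]
```

Keeping the out-of-sample section untouched until the very end is what gives it any value as a check against data mining.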
Over a year ago, I started a Python based library, pythalesians, to facilitate my own analysis of markets, which centres around building systematic trading strategies. I eventually open sourced that library, and subsequently split it into the several more specialised, rewritten libraries listed below. In general, smaller, more specialised
libraries are more successful as open source projects. It is easier for potential users to
understand what a library does when it is smaller and it is easier to identify how each
library helps with solving each stage of the data science problem. When a library
is very large and encompasses too much functionality, it can become difficult to
understand precisely what it does. It is also easier to find contributors to an open
source project when its purpose is better defined. The libraries are built upon many
other open source Python libraries in particular from the SciPy stack.
• finmarketpy (for backtesting)[7]
• findatapy (for collecting market data)[6]
• chartpy (for data visualisation) [4]
Whilst I’ve extolled the virtues of Python for having a rich data science ecosystem, in many cases we have many choices of underlying library to use. For example, if we wish to visualise data, we could use Plotly, Bokeh or Matplotlib (and this is a curtailed list). Each of these libraries has a different API. If we want to switch between these libraries, we would need to rewrite all our code. The same is true if we are downloading data from multiple sources: each data provider has a different API. If we end up using multiple APIs for visualisation and data collection, our code could become very messy indeed.
Furthermore, it makes it difficult to concentrate on the most important part of our problem: creating a trading signal. In finmarketpy, findatapy and chartpy, I’ve used abstraction to hide away the low level APIs for visualisation and data collection. In their place, I’ve created a common higher level API. Hence, to the end user, downloading data from Bloomberg or Quandl should look very similar, save for the change of a single keyword. This allows the trader to concentrate on the more pressing issue of developing a trading algorithm, rather than fiddling around with lower level details. This approach also makes the code easier to maintain, in particular when we want to add new data sources or new ways of visualising data.
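As an illustration of this abstraction idea, a common fetch function can dispatch to vendor-specific adapters behind a single keyword. This is a toy sketch, not findatapy's actual internals; every class and function name here is hypothetical:

```python
# minimal sketch of hiding vendor-specific APIs behind one interface
class DataProvider:
    def fetch(self, ticker, start):
        raise NotImplementedError

class QuandlProvider(DataProvider):
    def fetch(self, ticker, start):
        # placeholder standing in for a real vendor call
        return f"quandl:{ticker}:{start}"

class BloombergProvider(DataProvider):
    def fetch(self, ticker, start):
        return f"bloomberg:{ticker}:{start}"

PROVIDERS = {"quandl": QuandlProvider(), "bloomberg": BloombergProvider()}

def fetch_market(data_source, ticker, start):
    # the caller only ever changes the data_source keyword
    return PROVIDERS[data_source].fetch(ticker, start)
```

Adding a new data source then means writing one new adapter class, with no change to the calling code.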
Furthermore, for common tasks such as backtesting, I’ve also created templates that make it quicker to test new ideas: we simply define the signal and the assets which are traded. When creating a backtesting environment, the key question is whether
you want to make it event driven or not. By event driven, we mean that every new
tick of market data triggers a computation which decides whether or not to execute
a trade. From a production perspective, this is preferable, because we can use the
same code for production or backtesting. If we are simply using a system to research
a trading strategy (or trading at low frequencies), it is possible to adopt a simpler
approach which involves collecting all our signal data at the outset and all our return
data, and multiplying this data together to generate a historical backtest. This is the
approach I’ve adopted for finmarketpy. In languages like Python, there will also be a significant speed increase from doing this, given we can vectorise our code more easily, exploiting libraries like NumPy. Event driven code, by contrast, would be much slower in a language such as Python.
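The vectorised approach described above can be sketched in a few lines of pandas (a simplified illustration with synthetic data and an illustrative 50D trend rule, not finmarketpy's actual engine):

```python
import numpy as np
import pandas as pd

# synthetic daily spot series
rng = np.random.default_rng(1)
dates = pd.date_range("2010-01-01", periods=500, freq="B")
spot = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))),
                 index=dates)

returns = spot.pct_change()

# illustrative trend rule: long above the 50D moving average, short below
signal = np.sign(spot - spot.rolling(50).mean())

# the shift applies today's signal to tomorrow's return, avoiding
# look-ahead bias; the whole backtest is one vectorised multiplication
strategy_returns = signal.shift(1) * returns
cumulative = (1 + strategy_returns.fillna(0)).cumprod()
```

Everything is computed in whole-series operations, which is where NumPy and pandas are fast; an event driven loop over every tick would do the same work row by row in pure Python.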
We now demonstrate how we can use Python to collect market data using findatapy,
using several external sources including Bloomberg.
We use pandas dataframes as our structures to hold the time series data. In our
first example, we collect daily data from Quandl for S&P500, which is a free data
source. The Market class acts as an interface to lower level market data APIs. We
construct a MarketDataRequest object to describe the nature of the market data we
want to download.
from findatapy.market import MarketDataRequest, Market, MarketDataGenerator

market = Market(market_data_generator=MarketDataGenerator())

md_request = MarketDataRequest(
    start_date="01 Jan 2005",
    data_source='quandl',
    tickers=['S&P500'],
    fields=['close'],
    vendor_tickers=['YAHOO/INDEX_GSPC'],
    vendor_fields=['Close'])

df_sp = market.fetch_market(md_request)
We now repeat the exercise for Bloomberg for S&P500, which requires a valid
Bloomberg data license.
md_request = MarketDataRequest(
    start_date="01 Jan 2005",
    data_source='bloomberg',
    tickers=['S&P500'],
    fields=['close'],
    vendor_tickers=['SPX Index'],
    vendor_fields=['PX_LAST'])

df_sp_bbg = market.fetch_market(md_request)
As we can see, the code is virtually identical in both cases; the only differences are the vendor specific tickers and the data source keyword. We can also download tick data, this time from a retail FX broker, using similar code.
md_request = MarketDataRequest(
    start_date='14 Jun 2016',
    finish_date='15 Jun 2016',
    category='fx',
    fields=['bid', 'ask'],
    freq='tick',
    data_source='dukascopy',
    tickers=['EURUSD'])

df_retail = market.fetch_market(md_request)
Now that we have seen how we can download market data, we discuss ways in which we can analyse and plot this data. Here we show how we can use Python to visualise time series. In this example, we collect intraday EUR/USD data from Bloomberg, and also the event times of the US employment report for the past few months from Bloomberg.
import pandas, datetime
from datetime import timedelta

start_date = datetime.date.today() - timedelta(days=180)

md_request = MarketDataRequest(
    start_date=start_date,
    category='fx',
    freq='intraday',
    data_source='bloomberg',
    tickers=['EURUSD'],
    fields=['close'],
    cache_algo='internet_load_return')

df_fx_bbg = market.fetch_market(md_request)

# a second MarketDataRequest, for the NFP release times, is constructed
# in the same way (its definition is omitted from this excerpt)
df_event_times = market.fetch_market(md_request)

df_event_times = pandas.DataFrame(
    index=df_event_times['NFP.release-date-time-full'])
We now have the raw data available; the next step is to do an event study, where we analyse the moves in EUR/USD during the 3 h following the release of the US employment report. We also do the same for intraday volatility.
from finmarketpy.economics import EventStudy

df_event = EventStudy().get_intraday_moves_over_custom_event(
    df_fx_bbg, df_event_times)
df_event[’Avg’] = df_event.mean(axis=1)
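For intuition, the core of such an event study can be sketched as follows (a simplified illustration with synthetic one-minute data and hypothetical event times; finmarketpy's actual implementation handles alignment, sampling frequency and missing data more carefully):

```python
import numpy as np
import pandas as pd

# synthetic one-minute prices and two hypothetical event timestamps
idx = pd.date_range("2016-06-01 12:00", periods=600, freq="min")
price = pd.Series(np.linspace(100, 101, 600), index=idx)
events = [pd.Timestamp("2016-06-01 13:00"),
          pd.Timestamp("2016-06-01 17:00")]

def event_window_moves(price, events, minutes=180):
    # for each event, the cumulative % move over the following window,
    # rebased so every column starts at zero at the event time
    cols = {}
    for t in events:
        window = price.loc[t:t + pd.Timedelta(minutes=minutes)]
        cols[t] = (window / window.iloc[0] - 1).to_numpy() * 100
    return pd.DataFrame(cols)

df_moves = event_window_moves(price, events)
df_moves["Avg"] = df_moves.mean(axis=1)
```

Each column is one event window, so averaging across columns gives the typical post-event path, just as the Avg column does in the chapter's example.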
Finally, we can plot (see Fig. 29.1) the various event study time series, using the
popular matplotlib library. Alternatively, if we were displaying on a webpage, we
might prefer the plotly library. We can see that the bulk of the volatility occurs over
the actual data release and quickly dissipates.
from chartpy import Chart, Style

style = Style()
style.scale_factor = 3
style.file_output = 'eurusd-nfp.png'
style.title = 'EURUSD spot moves over recent NFP'
style.color_2_series = [df_event.columns[-2], df_event.columns[-1]]
style.color_2 = ['red', 'orange']
style.linewidth_2 = 2
style.linewidth_2_series = style.color_2_series

chart = Chart(engine='matplotlib')
chart.plot(df_event * 100, style=style)
Fig. 29.1 EUR/USD moves during the 3 h following the recent US employment reports (Source: Web)
In a similar way, we can plot the seasonality of the S&P500 through the year (the calculation of day_of_month_seasonality is omitted from this excerpt).

style = Style()
style.date_formatter = '%b'
style.title = 'S&P500 seasonality'
style.scale_factor = 3
style.file_output = "sp-seasonality.png"

chart = Chart()
chart.plot(day_of_month_seasonality, style=style)
Fig. 29.2 S&P500 seasonality by month, indexed around 100, Jan to Dec (Source: Web)
Sometimes, more complicated plots might be relevant. One example of this can be seen with FX volatility. Implied volatility is quoted for a range of strike and tenor combinations. An efficient way to plot this is as a surface. Below, we show how to download FX volatility surface data from Bloomberg and how to plot it (see Fig. 29.3) in chartpy using a plotly backend.
md_request = MarketDataRequest(
    start_date=datetime.datetime.now() - timedelta(days=1),
    data_source='bloomberg',
    cut='NYC',
    category='fx-vol-market',
    tickers=['EURUSD'])

df_vol = market.fetch_market(md_request)

# FXVolFactory extracts the surface grid for a single date
df_vs = FXVolFactory().extract_vol_surface_for_date(
    df_vol, 'EURUSD', -1)
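The object being plotted is essentially a two-dimensional grid. As a sketch (with made-up vol numbers and an illustrative choice of tenors and delta strikes), the surface data can be held in a pandas DataFrame whose values map directly onto the z axis of a surface plot:

```python
import pandas as pd

# hypothetical EUR/USD vol surface: one row per tenor, one column per
# delta strike, values are implied vols in % (illustrative numbers only)
tenors = ["ON", "1W", "1M", "3M", "1Y"]
deltas = ["10DP", "25DP", "ATM", "25DC", "10DC"]
vols = pd.DataFrame(
    [[11.9, 10.8, 9.9, 10.3, 11.4],
     [11.2, 10.2, 9.5, 9.9, 10.8],
     [10.6, 9.8, 9.2, 9.6, 10.3],
     [10.2, 9.6, 9.1, 9.5, 10.0],
     [10.0, 9.5, 9.2, 9.6, 9.9]],
    index=tenors, columns=deltas)

# a grid like this maps directly onto a surface plot: x axis is the
# delta strike, y axis the tenor and z axis the implied vol
z = vols.values
```

The characteristic "smile" is visible in each row: vols rise away from ATM towards the wings.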
Fig. 29.3 EUR/USD implied volatility surface: tenor (ON to 2Y) against delta strike (10DP, 25DP, ATM), with implied vol on the z axis
In this section we show how to create a backtest for a basic trend following style
strategy for FX. We extend the abstract class TradingModel to create our backtest,
creating the TradingModelFXTrend_Example class. First, we do all the appropriate
imports of the other Python modules that we shall use later. In the init function, we define some parameters: where we want to output our results, the name of our strategy, and the underlying plotting engine we shall use for displaying the results.
import datetime

from finmarketpy.backtest import TradingModel, BacktestRequest
from findatapy.market import Market, MarketDataRequest, MarketDataGenerator

class TradingModelFXTrend_Example(TradingModel):

    def __init__(self):
        super(TradingModelFXTrend_Example, self).__init__()

        self.market = Market(market_data_generator=MarketDataGenerator())
        self.DUMP_PATH = ''
        self.FINAL_STRATEGY = 'FX trend'
        self.SCALE_FACTOR = 1
        self.DEFAULT_PLOT_ENGINE = 'matplotlib'
        self.br = self.load_parameters()

        return
Now we define the parameters for the backtest including the start and end date.
We also define leverage parameters here on the signal and portfolio level. For each
asset we define a volatility target of 10%. The idea behind this is that we equalise the
risk in each asset we are trading. If we do not do this, we are essentially allocating
more risk to the higher volatility assets. We also apply a final volatility target to the whole portfolio. In practice, given we don’t know what future realised volatility
will be, we are unlikely to hit our target exactly.
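The volatility targeting idea can be sketched in plain pandas (an illustrative calculation with synthetic returns; finmarketpy applies this internally through the backtest parameters):

```python
import numpy as np
import pandas as pd

# synthetic daily returns for one asset, roughly 30% annualised vol
rng = np.random.default_rng(2)
returns = pd.Series(rng.normal(0, 0.02, 252))

VOL_TARGET = 0.10   # 10% annualised target per asset
ANN_FACTOR = 252

# realised vol estimated on a rolling window; the leverage scales the
# position so each asset contributes roughly equal risk
realised_vol = returns.rolling(60).std() * np.sqrt(ANN_FACTOR)
leverage = VOL_TARGET / realised_vol

# yesterday's leverage is applied to today's return (no look-ahead)
scaled_returns = (leverage.shift(1) * returns).dropna()
```

Because the vol estimate is backward looking, the realised vol of the scaled series will hover around, rather than exactly hit, the 10% target.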
    def load_parameters(self):
        br = BacktestRequest()

        # the start/finish dates and the signal and portfolio level
        # vol targets are also set here (omitted from this excerpt)
        br.spot_tc_bp = 0.5
        br.ann_factor = 252
        br.plot_start = "01 Apr 2015"
        br.calc_stats = True
        br.write_csv = False
        br.plot_interim = True
        br.include_benchmark = True

        # tech params
        br.tech_params.sma_period = 200

        return br
We load up all the market data next. For simplicity, we shall be using spot data from Quandl. We note, however, that in practice, using spot data to calculate FX returns is only an approximation, given it doesn’t include carry.
    def load_assets(self):
        full_bkt = ['EURUSD', 'USDJPY', 'GBPUSD', 'AUDUSD',
                    'USDCAD', 'NZDUSD', 'USDCHF', 'USDNOK', 'USDSEK']

        basket_dict = {}

        br = self.load_parameters()

        self.logger.info("Loading asset data...")

        vendor_tickers = ['FRED/DEXUSEU', 'FRED/DEXJPUS',
                          'FRED/DEXUSUK', 'FRED/DEXUSAL', 'FRED/DEXCAUS',
                          'FRED/DEXUSNZ', 'FRED/DEXSZUS', 'FRED/DEXNOUS',
                          'FRED/DEXSDUS']

        md_request = MarketDataRequest(
            start_date=br.start_date,
            finish_date=br.finish_date,
            freq='daily',
            data_source='quandl',
            tickers=full_bkt,
            fields=['close'],
            vendor_tickers=vendor_tickers,
            vendor_fields=['close'],
            cache_algo='internet_load_return')

        asset_df = self.market.fetch_market(md_request)

        spot_df = asset_df
        spot_df2 = None

        # (the remainder of the method, which returns the assembled data
        # to the backtesting engine, is omitted from this excerpt)
Next, we calculate the signal. This involves calculating a 200D simple moving average: if spot is above the moving average it triggers a buy signal, and if it is below, a sell signal. This is a predefined signal, so it can be defined with very few lines of code.
    def construct_signal(self, spot_df, spot_df2, tech_params, br):
        tech_ind = TechIndicator()
        tech_ind.create_tech_ind(spot_df, 'SMA', tech_params)

        signal_df = tech_ind.get_signal()

        return signal_df
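For intuition, the same 200D moving average rule can be written directly in pandas (a sketch on synthetic data; inside findatapy the TechIndicator class handles this, together with the bookkeeping around multiple assets):

```python
import numpy as np
import pandas as pd

# synthetic spot series of 300 points
rng = np.random.default_rng(3)
spot = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.005, 300))))

# +1 (buy) when spot is above its 200D simple moving average,
# -1 (sell) when it is below; NaN until the window fills
sma = spot.rolling(200).mean()
signal = np.sign(spot - sma)
```

The first 199 points carry no signal, since the 200D average is not yet defined there.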
We also define a benchmark for our trading strategy which is simply being long
EUR/USD in this case (mainly for simplicity).
    def construct_strategy_benchmark(self):
        tsr_indices = MarketDataRequest(
            start_date=self.br.start_date,
            finish_date=self.br.finish_date,
            freq='daily',
            data_source='quandl',
            tickers=["EURUSD"],
            vendor_tickers=['FRED/DEXUSEU'],
            fields=['close'],
            vendor_fields=['close'],
            cache_algo='internet_load_return')

        df = self.market.fetch_market(tsr_indices)
        df.columns = [x.split(".")[0] for x in df.columns]

        return df
We can now kick off the actual calculation by instantiating the TradingModelFXTrend_Example class and running a few commands to do the number crunching. We also plot some of the results.
if __name__ == '__main__':
    model = TradingModelFXTrend_Example()
    model.construct_strategy()
    model.plot_strategy_pnl()
We display some of the plot output from our backtest (see Fig. 29.4).
As a next step we do some sensitivity analysis using the TradeAnalysis class from
finmarketpy. We examine how much transaction costs impact returns. For higher
frequency strategies, transaction costs can make up a larger proportion of returns,
given they trade more rapidly.
Fig. 29.4 Cuemacro FX CTA cumulative returns, 1991–2015 (Port Ret = 5.4%, Vol = 12.7%, IR = 0.43, Dr = –30.2%, Kurt = 0.3; Source: Web)
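The drag from transaction costs mentioned above can be sketched as follows (an illustrative calculation with synthetic returns and a random signal; the half basis point charge here plays the role of the spot_tc_bp backtest parameter):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
returns = pd.Series(rng.normal(0.0005, 0.01, 252))
signal = pd.Series(np.sign(rng.normal(0, 1, 252)))  # random +/-1 positions

TC_BP = 0.5   # half a basis point per unit of notional traded

# gross P&L applies yesterday's position to today's return
gross = signal.shift(1) * returns

# turnover is 2 whenever the position flips sign, 0 otherwise
turnover = signal.diff().abs()
costs = turnover * TC_BP / 10000.0
net = (gross - costs).dropna()
```

The faster the signal flips, the larger the turnover term, which is why higher frequency strategies lose a bigger share of their gross returns to costs.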
# ta is a TradeAnalysis object; parameter_list, pretty_portfolio_names and
# parameter_type define the SMA periods to scan (their setup is omitted here)
ta.run_arbitrary_sensitivity(strategy,
    parameter_list=parameter_list,
    pretty_portfolio_names=pretty_portfolio_names,
    parameter_type=parameter_type)
Fig. 29.5 Cumulative returns of the strategy for different parameter choices, 1990–2014 (Source: Web)
If the strategy only performs well for a narrow set of parameters, it suggests we have gone a bit overboard on data mining (unless we can find a very good explanation as to why other parameters should not work).
29.10 Conclusions
We have discussed the various benefits of using Python to analyse financial markets.
Whilst its speed of execution is slower than languages such as Java or C++, it has a
rich ecosystem for data science. It is also quicker to prototype ideas in Python. More
broadly we discussed the general steps we adopt when creating a trading strategy,
which can be viewed as a data science problem. Later, we briefly discussed our
approach to building a Python based ecosystem for developing trading strategies
and analysing market data. Lastly, we gave some practical Python based examples
showing how to download and analyse data. We also went through an example
demonstrating how to create a simple trend following trading strategy in FX.
References
1. arctic: High performance datastore for time series and tick data. http://github.com/manahl/
arctic
2. BLPAPI: Bloomberg market data library. https://www.bloomberglabs.com/api/libraries/
3. bokeh: Python interactive visualization library. http://bokeh.pydata.org/en/latest/
4. chartpy: Python data visualisation library. http://www.github.com/cuemacro/chartpy
5. Cython: C extensions for Python. http://www.cython.org
6. findatapy: Python financial data library. http://www.github.com/cuemacro/findatapy
7. finmarketpy: Python financial trading library. http://www.github.com/cuemacro/finmarketpy
8. flask: Micro web framework. http://flask.pocoo.org
9. IPython: Enhanced interactive console. http://www.ipython.org
10. matplotlib: Python plotting library. http://www.matplotlib.org
11. NumPy: Package for scientific computing with Python. http://www.numpy.org
12. pandas: Python data analysis library. http://pandas.pydata.org
13. plotly: Collaboration platform for modern data science. http://plot.ly
14. PyMC3: Probabilistic programming in python. http://github.com/pymc-devs/pymc3
15. Quandl: Market data API. http://www.quandl.com/
16. scikit-learn: Machine learning in python. http://scikit-learn.org/stable/
17. SciPy library: Fundamental library for scientific computing. https://www.scipy
18. SymPy: Symbolic mathematics. http://www.sympy.org/en/index.html
19. TensorFlow: Open source library for machine intelligence. https://www.tensorflow.org
20. xlwings: Python for Excel. http://www.xlwings.org/