Building High Frequency Trading Model

This document discusses the development of a high frequency trading model. It introduces 10 factors that could be used to generate trading signals, such as book pressure, accumulation/distribution, and momentum. Each factor is tested individually to see if it correlates with stock returns. The factors are then combined and normalized to create a trading alpha, and backtesting is performed on over 100 stocks to analyze the model's performance.

Uploaded by

Max
Copyright
© All Rights Reserved

ILLINOIS INSTITUTE OF TECHNOLOGY

Building High Frequency Trading Model
In the eternal search for alpha
Alexey Bolotov, Chockalingam Krishnan Chockalingam, Rennet Premnath, Uma Deepthi Kapila
12/4/2010

This project covers the basic building blocks necessary for the development of a trading model. Factors that
lead to trading signals are introduced and evaluated. Further examination is performed to reveal the factors’
compatibility and internal consistency. A trading alpha is constructed based on the weights assigned to
various signals. Simulated trading is performed to evaluate the combination of trading rules and alpha
performance, followed by a post-trading analysis.
I Introduction

The purpose of this project is to show evidence that there are opportunities to generate
alpha in the high frequency trading environment. High frequency trading (HFT) is a strategy
that trades over investment horizons of less than one day and seeks to unwind all
positions before the end of each trading day. Therefore the factors that drive the
selection of an HFT strategy are drastically different from those used in traditional investing.
HFT is generally divided into three groups: market-making, statistical arbitrage and
momentum. This project explores opportunities related only to the last group. We
build our alpha model on factors derived from a series of snapshots taken
every 5 seconds. The original data set includes:

1. Open, High, Low and Close prices for the period
2. Total Volume, Top Bid Size, Top Ask Size
3. Top Ask and Top Bid.

Changes in these values over 5-second intervals were evaluated to explore whether
factors built upon them can serve as an important source of information for
predicting short-term fluctuations in stock returns.

We then combined the available information into 10 factors that represent a
diversified array of industry-followed indicators for short-term price forecasting (see
Factors).

Each factor was individually tested for correlation between its value and the subsequent
stock return (see Edge and IC Testing).

Factor values were then rescaled to be uniformly distributed over the range -1 to 1 for
easier comparison and combination (see Normal Scaling).

Factors were tested for internal consistency (see Cronbach’s Alpha) to form groups of
mutually supporting factors.

The final trade signal was derived from a trading alpha based on the above groups (see
Trading Alpha).

We then back-tested the signal on the 15-minute data set provided for over 100
stocks. Results were analyzed to form an attribution report (see Trading and
Attribution).
II Factors

In selecting factors for our model we took a diversified, theme-based
approach. Some factors are based on the volatility of returns, some use technical
analysis methodology, some have values entirely independent of the price of the
underlying, and some combine several values. Our factors can be divided into
two groups, depending on whether all the information needed to build the factor is present
within a single snapshot or information from previous time frames is also used*.

Group 1:

- Book Pressure. Intuitively, one would expect that when the rate and size of buy
orders exceed those of sell orders, the stock price would have a propensity to drift
up. Our data set was limited to top-of-book information.
- Accumulation / Distribution. Acc/Dist tracks the relationship between price and
volume and acts as a leading indicator of price movements.
- Volatility. Measures the relationship between price movement and the
dispersion of prices within a single time frame.
- Liquidity. Provides information about the relationship between price movement and the
immediate availability of shares on the bid and ask, including the bid/ask
spread.
- Swing Volume. Seeks to detect a relationship between the magnitude of the move
within one time frame and the executed volume.
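To make the single-snapshot factors concrete, the sketch below computes three of them for one 5-second bar. Appendix A holds the authors' formulas. The versions here are common textbook definitions and are assumptions that may differ from them: book pressure as the normalized bid/ask size imbalance, Acc/Dist as the close-location value times volume, and swing volume as the bar's high-low range times volume. The `Bar` container and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One 5-second snapshot (hypothetical container for the data set fields)."""
    high: float
    low: float
    close: float
    volume: float    # volume traded during this bar (difference Total Volume if it is cumulative)
    bid_size: float  # top-of-book bid size
    ask_size: float  # top-of-book ask size


def book_pressure(bar: Bar) -> float:
    """Top-of-book size imbalance in [-1, 1]; positive when bids outweigh asks."""
    total = bar.bid_size + bar.ask_size
    return 0.0 if total == 0 else (bar.bid_size - bar.ask_size) / total


def acc_dist(bar: Bar) -> float:
    """Close-location value (-1 at the low, +1 at the high) times volume."""
    rng = bar.high - bar.low
    if rng == 0:
        return 0.0
    clv = ((bar.close - bar.low) - (bar.high - bar.close)) / rng
    return clv * bar.volume


def swing_volume(bar: Bar) -> float:
    """Size of the bar's high-low range times the executed volume."""
    return (bar.high - bar.low) * bar.volume


bar = Bar(high=101.0, low=99.0, close=100.5, volume=2000.0,
          bid_size=300.0, ask_size=100.0)
print(book_pressure(bar), acc_dist(bar), swing_volume(bar))  # 0.5 1000.0 4000.0
```

All three factors need only the current snapshot, which is what places them in Group 1.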

Group 2:

- Momentum. Provides basic information about the stock’s return since the
last period.
- R factor. A weighted return indicator.
- Trade V Ratio. Aims to establish a relationship between price moves
and the change in executed volume compared with the average volume over the past n
periods.
- MACD. Moving Average Convergence-Divergence is a widely used
momentum indicator. It is used to spot changes in the strength, direction,
momentum and duration of a trend in a stock’s price.
- Trend. Seeks to find a relationship between price movement and the price’s
position relative to its exponential moving average (EMA). The value remains positive
while the price stays above the EMA.

*. See Appendix A for factor formulas.
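Group 2 factors depend on earlier bars. The following is a minimal sketch of three of them, assuming conventional definitions:

- momentum as the one-bar return;
- an EMA with smoothing factor 2 / (span + 1), seeded with the first value;
- MACD as the fast EMA minus the slow EMA, reported as the distance from its own signal EMA;
- trend as price minus its EMA.

The (10, 40, 5) MACD spans match the parameters that appear in Appendix A. The trend span of 10 and all function names are hypothetical.

```python
def momentum(closes: list[float]) -> list[float]:
    """Return since the previous bar; the first bar has no prior close, so 0."""
    return [0.0] + [c / p - 1.0 for p, c in zip(closes, closes[1:])]


def ema(values: list[float], span: int) -> list[float]:
    """Exponential moving average seeded with the first value, k = 2 / (span + 1)."""
    k = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(k * v + (1.0 - k) * out[-1])
    return out


def macd_histogram(closes: list[float], fast: int = 10, slow: int = 40,
                   signal: int = 5) -> list[float]:
    """MACD line (fast EMA minus slow EMA) minus its own signal EMA."""
    macd_line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    signal_line = ema(macd_line, signal)
    return [m - s for m, s in zip(macd_line, signal_line)]


def trend(closes: list[float], span: int = 10) -> list[float]:
    """Price minus its EMA: positive while the price stays above the EMA."""
    return [c - e for c, e in zip(closes, ema(closes, span))]
```

On a steadily rising price series every `trend` value after the first is positive, because the EMA lags the price. On a constant series the MACD histogram stays at zero.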


III Edge and IC Testing

For shorter-term factor testing we used two techniques: the edge test and the IC test. The basic
statistic of the edge test tells us how much, in percentage terms, a stock tends to move in
our favor versus against us during the test period. We looked at the 24 consecutive bars (2 minutes
of 5-second bars) after trade initiation to calculate this statistic.
$$\text{Edge} = \frac{\text{MFE} - \text{MAE}}{\text{MFE} + \text{MAE}}$$

where MFE (maximum favorable excursion) is the largest move of the stock price in our favor
and MAE (maximum adverse excursion) is the largest move against us during the holding period.
To perform this test, we sorted factor values from lowest to highest and concentrated on the top
and bottom 10% of values, which lets us use both long and short positions. The 10% threshold
was chosen arbitrarily; other thresholds could also be explored.
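The edge test can be sketched in a few lines. The sketch assumes closing prices only, although the original spreadsheet may have used bar highs and lows for MFE and MAE. It goes long the top 10% of factor values and short the bottom 10%, then measures MFE and MAE over the next 24 bars. The function names are hypothetical.

```python
def excursions(prices, entry, direction, horizon=24):
    """MFE and MAE, as fractions of the entry price, over the next `horizon` bars.

    direction is +1 for a long and -1 for a short, so a move "in our favor"
    is positive for both.
    """
    p0 = prices[entry]
    moves = [direction * (p - p0) / p0 for p in prices[entry + 1:entry + 1 + horizon]]
    mfe = max([0.0] + moves)                # best move in our favor
    mae = max([0.0] + [-m for m in moves])  # worst move against us
    return mfe, mae


def edge(mfe, mae):
    """Edge = (MFE - MAE) / (MFE + MAE), which lies in [-1, 1]."""
    total = mfe + mae
    return 0.0 if total == 0 else (mfe - mae) / total


def edge_test(factor, prices, tail=0.10, horizon=24):
    """Average edge: long the top `tail` of factor values, short the bottom `tail`."""
    usable = len(prices) - horizon  # entries that have a full holding window
    if usable < 1:
        raise ValueError("need more than `horizon` bars")
    order = sorted(range(usable), key=lambda i: factor[i])
    n = max(1, int(usable * tail))
    results = {}
    for label, entries, direction in (("LONG", order[-n:], 1), ("SHORT", order[:n], -1)):
        edges = [edge(*excursions(prices, i, direction, horizon)) for i in entries]
        results[label] = sum(edges) / len(edges)
    return results
```

An edge of +1 means the price never moved against the position during the window. An edge of -1 means it never moved in its favor.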

Bootstrapping is a standard technique that can tell us whether our test results
are statistically significant. It builds a sampling distribution for a statistic by
resampling from the data at hand. Because our data exhibit a certain degree of
autocorrelation, the sampling size would need to be adjusted. Our data set is not
large enough to perform this analysis.
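The authors did not run this test. The sketch below shows what an autocorrelation-aware bootstrap could look like. It uses a moving-block bootstrap, one standard variant, which resamples contiguous blocks of observations so that short-range dependence survives inside each block. The block length of 12 bars (one minute), the 95% level and the function name are all assumptions.

```python
import random
import statistics


def block_bootstrap_ci(data, stat=statistics.fmean, block=12, n_boot=2000,
                       level=0.95, seed=0):
    """Percentile confidence interval for stat(data) via a moving-block bootstrap."""
    n = len(data)
    if not 1 <= block <= n:
        raise ValueError("block length must be between 1 and len(data)")
    rng = random.Random(seed)
    starts = range(n - block + 1)  # every possible block start
    stats = []
    for _ in range(n_boot):
        sample = []
        while len(sample) < n:  # glue random blocks together until length n
            s = rng.choice(starts)
            sample.extend(data[s:s + block])
        stats.append(stat(sample[:n]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

If the interval for a factor's mean edge excluded zero, that would be evidence of significance. Longer blocks preserve more autocorrelation but need more data, which is exactly the limitation noted above.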

We can also measure the forecasting ability of our signals with what is known as the IC
(information coefficient) test, which shows the correlation of our forecast with actual market
returns. We picked three forward return lengths: 5, 60 and 120 seconds. For each period, the
correlation between the signal and the return was calculated.
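A minimal sketch of the IC test follows. It assumes Pearson correlation (the text does not name the correlation measure) between the factor at bar t and the simple return from t to t + h. With 5-second bars, the three horizons become 1, 12 and 24 bars ahead. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ic_table(factor, prices, horizons=(1, 12, 24), bar_seconds=5):
    """IC of a factor against forward returns, keyed by horizon in seconds.

    With 5-second bars, 1, 12 and 24 bars ahead are 5, 60 and 120 seconds.
    """
    table = {}
    for h in horizons:
        fwd = [prices[t + h] / prices[t] - 1.0 for t in range(len(prices) - h)]
        table[h * bar_seconds] = pearson(factor[:len(fwd)], fwd)
    return table
```

A factor aligned with the prices bar by bar yields a dictionary such as `{5: ..., 60: ..., 120: ...}`, mirroring the columns of the tables below.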

Here and hereafter, we describe the structure of the Excel spreadsheets in the text. The first file in
the series is “1 – PROJECT F”. Worksheets “MSFT (raw)” and “GOOG (raw)” contain the extracted 5-second
data for those two stocks only, and for 02/01/2008 only. Only data for those stocks and that date was
used to derive and test our signals. Worksheet “Calculation Examples” contains the formulas used to
calculate our factors, which are presented in colored columns M through Y. Worksheets “Factors GOOG”
and “Factors MSFT” contain the same information, but with factor values stored as numbers rather than
formulas (which helps with sorting). Columns BA through BL are used to calculate factor statistics.
Column BD sums the MFEs, MAEs and Edge to calculate total edge. The table at column BI shows the
results of the IC calculation. This worksheet also serves as our factor tester: select columns A
through BA and sort by factor value, and all statistics are displayed in columns BA through BL. The
final worksheet, “Trading Alpha”, contains the combination of scaled and fitted factor values that
forms the final alpha (more on that later). The next spreadsheet in the series, “2 – Factor
Statistics F”, summarizes all information in one table. Results are divided into LONG and SHORT
sections, with individual statistics per section showing Hit rate, MFEs, MAEs and Average Edge. The
right-most section of the table sums the edge values into Total Edge.
The table below illustrates the results of the Edge test calculation for each factor, followed
by the IC test results.

Edge Test Results:


IC Test results (GOOG):

Factor          5 sec       60 sec      120 sec
BOOK PRESS      0.017638    0.0122      0.012662
ACC / DIST      0.063413    0.064678    0.012156
VOLATILITY      -0.03017    -0.04629    -0.04713
MOMENTUM        0.033764    0.059673    0.01987
R FACTOR        0.035231    -0.03917    -0.00171
TRADE V RATIO   0.002484    0.002969    0.006315
MACD            -0.00369    0.109233    0.174003
LIQUIDITY       0.000875    0.0029      0.012116
TREND           0.030976    0.058628    -0.02755
SWING VOLUME    -0.03136    -0.03777    -0.0242

IC Test results (MSFT):

Factor          5 sec       60 sec      120 sec
BOOK PRESS      0.170998    0.02656     -0.00677
ACC / DIST      0.006514    0.008651    -0.02467
VOLATILITY      -0.01744    -0.10206    -0.1136
MOMENTUM        0.000516    -0.01057    -0.02314
R FACTOR        0.03274     0.023475    0.035467
TRADE V RATIO   0.010239    0.011291    -0.00074
MACD            0.012377    0.054458    0.032245
LIQUIDITY       0.023906    0.025093    0.040507
TREND           -0.03648    -0.03375    -0.07054
SWING VOLUME    0.019923    -0.09597    -0.09786

IV Normal Scaling

Throughout our research we need to compare factor values, determine the importance of each
factor and assign different weights to them. The original factor values have drastically
different variability and distributions, so they must be expressed in comparable units for the
final alpha calculation. We used distribution fitting to scale the values of all factors to the
range -1 to 1. This technique maps the lowest observation to the lowest score (-1) and the
highest observation to the highest score (+1), and maps the remaining observations according
to their probabilities within the fitted distribution.
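One simple reading of this mapping is empirical percentile-rank scaling, which the sketch below implements:

- the lowest value maps to -1 and the highest to +1;
- every other value maps linearly by its rank;
- tied values share their average rank.

This is an assumption. The spreadsheet may have fitted a parametric distribution instead. The function name is hypothetical.

```python
def scale_to_unit_range(values):
    """Map values to [-1, 1] by empirical rank: lowest -> -1, highest -> +1.

    Tied values receive the average of their ranks, so equal inputs get equal scores.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1                  # extend the run of tied values
        avg = (i + j) / 2           # average 0-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [2.0 * r / (n - 1) - 1.0 for r in ranks]
```

For example, `scale_to_unit_range([5.0, 1.0, 3.0])` returns `[1.0, -1.0, 0.0]`. Because the result is uniformly spread over [-1, 1] regardless of the raw units, factors measured in shares, dollars or ratios become directly comparable.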

[Figure: fitted distributions of factor values for Book Pressure, MACD, R factor, Accumulation / Distribution, Momentum, Trend, Volatility and Swing Volume.]

Please see the third file in the series, “3 – Normal Scaling – F”, for details. Sheet “Raw Data”
contains the original factor values. Sheet “Working Sheet” was used to copy the original values,
including time stamps, re-sort them by value and copy them to column B. Column F then shows the
scaled, fitted values, which are copied back and sorted by time. Worksheet “Normalized” shows the
final result with the new values for each factor.
V Cronbach’s Alpha

To test the internal consistency of our group of factors we used Cronbach’s analysis.
Alpha coefficients were calculated for the entire group of N factors, then for the N
subgroups of N – 1 factors, and so on until we were examining combinations of just
2 factors. This analysis identified two homogeneous groups of factors. Group 1:
Accumulation / Distribution, Momentum, Trend (Cronbach’s Alpha 0.785). Group 2:
Volatility, Swing Volume (Cronbach’s Alpha 0.7347). The remaining factors were not
grouped in any combination because of their very low alpha readings. We used the
standard formula for Cronbach’s Alpha:

$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_{i=1}^{K} \sigma_i^2}{\sigma_X^2}\right)$$

where K is the number of factors, σᵢ² is the variance of factor i and σ_X² is the variance of
the sum of the factors. This formula allowed us to easily switch between, substitute and
eliminate factors.
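The following is a minimal sketch of the calculation and of the step-by-step removal of factors. It assumes the standard (unstandardized) Cronbach’s alpha with population variances. It also assumes a greedy search that, at each step, drops the factor whose removal raises alpha the most. The spreadsheet process may have explored subsets differently. The function names are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """alpha = K / (K - 1) * (1 - sum(var(item)) / var(sum of items)), K >= 2."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(vals) for vals in zip(*items)]      # per-bar sum across factors
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


def backward_elimination(factors):
    """Drop one factor at a time, keeping the subset with the highest alpha.

    factors: {name: list of scaled values}. Returns the (names, alpha) path
    from all factors down to two.
    """
    current = list(factors)
    path = [(tuple(current), cronbach_alpha([factors[n] for n in current]))]
    while len(current) > 2:
        candidates = []
        for drop in current:                           # try each N - 1 subgroup
            keep = [n for n in current if n != drop]
            candidates.append((cronbach_alpha([factors[n] for n in keep]), keep))
        best_alpha, current = max(candidates, key=lambda c: c[0])
        path.append((tuple(current), best_alpha))
    return path
```

Reading the path from the top down mirrors the “10 FACTORS” through smaller-group sheets. A subset whose alpha stays high (around 0.7 or above by common rule of thumb) is a candidate homogeneous group.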

Please see spreadsheet “4 – Cronbach’s Alpha F” for reference. Sheets “10 FACTORS”
through “1 FACTORS” illustrate the progressive removal of factors and the formation of
new groups. Worksheet “GROUP” shows the final two groups with their corresponding
Cronbach’s Alpha.

VI Trading Alpha

The final trading alpha is a weighted combination of scaled factor values. Weighting
schemes considered included:

- Equal weighting,
- Achieved Edge weighting,
- Cronbach’s alpha group-based weighting.

We experimented with all of these schemes. Equal weighting is non-discriminating: it is a
general, all-purpose approach in which each factor is assigned a weight of 0.1. The results
for this trading alpha are very promising and are presented in the table below:

AVE UP EDGE   AVE DN EDGE   AVE TOTAL     Hit Rate Up   Hit Rate Dn
0.070471949   0.144113811   0.106489448   52.9787%      60.0000%

IC      5 sec         60 sec        120 sec
ALPHA   0.029129447   0.039844551   0.033319173

Much better results, however, were achieved by assigning higher weights to the factors
with the highest total edge. Three quarters of the weight was distributed among Book
Pressure, Acc / Dist, MACD and Trend, with the remainder (0.25 of the total) distributed
equally among the other 6 factors:

While having a slightly lower overall edge, this combination achieved a much higher
and more stable IC across all time frames tested.
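The two weighting schemes reduce to simple arithmetic. Under the edge-based scheme each of the four high-edge factors receives 0.75 / 4 = 0.1875, and each of the other six receives 0.25 / 6 ≈ 0.0417. The sketch below builds both weight sets and forms the alpha for one bar as the weighted sum of scaled factor values. The factor labels follow the IC tables; the function names are hypothetical.

```python
FACTORS = ["BOOK PRESS", "ACC / DIST", "VOLATILITY", "MOMENTUM", "R FACTOR",
           "TRADE V RATIO", "MACD", "LIQUIDITY", "TREND", "SWING VOLUME"]
TOP_EDGE = {"BOOK PRESS", "ACC / DIST", "MACD", "TREND"}


def equal_weights():
    """Each of the 10 factors gets 0.1."""
    return {f: 1.0 / len(FACTORS) for f in FACTORS}


def edge_weights(top=TOP_EDGE, top_share=0.75):
    """`top_share` split evenly over the top-edge factors, the rest over the others."""
    rest = [f for f in FACTORS if f not in top]
    weights = {f: top_share / len(top) for f in top}
    weights.update({f: (1.0 - top_share) / len(rest) for f in rest})
    return weights


def trading_alpha(scaled, weights):
    """Weighted sum of scaled factor values (each in [-1, 1]) for one bar."""
    return sum(weights[f] * scaled[f] for f in weights)
```

Because both weight sets sum to 1 and every scaled factor lies in [-1, 1], the resulting alpha also stays within [-1, 1]. This keeps buy and sell cut-offs comparable across weighting schemes.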

This calculation can be seen in Excel spreadsheet “5 – Trading Model F”. This spreadsheet is
designed to run a macro on each stock symbol, produce factor values, scale and fit those values,
apply the appropriate weights to calculate the model’s alpha, plot a graph of the alpha, feed those
values into the trading mechanism, show trading positions and PnL, and, finally, create a separate
sheet with all data displayed. Detailed instructions for this spreadsheet are in Appendix B. Every
new worksheet has columns AA through AJ displaying fitted factor values. Row 2 displays the
corresponding weights, and column AK sums everything up into the final model alpha for that
particular stock.

While performing a similar analysis on MSFT data for 02/01/2008, we reached similar
conclusions. The edge test and IC test for alpha show overall improvement and stability
of the combined weighted indicator. Results are shown below:

Overall, the MSFT data had lower edges for all factors, including the book pressure
factor.

VII Trading and Attribution

In this section we back-test our alpha model. Out of all the equity names provided, we
selected a “universe” of 50 stocks chosen to give fair sector representation. Out of these
50 stocks we randomly selected 20 names and applied the data available for those stocks to
our trading model. Our trading alpha is based on factors from our in-sample analysis of the
MSFT and GOOG data; applying the same model to 20 random stocks essentially tests our
approach out-of-sample. In a real-world setting, a rolling analysis of the past N periods
(where historical data exists) would be used to detect the optimal range for alpha. To
imitate this, we simply peeked forward at the alpha distribution for the test period to
decide on our WTB and WTS cut-off ranges. That is why the cut-offs differ from stock to stock.
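The trading rule can be sketched under stated assumptions:

- WTB and WTS are read off the test period’s own alpha distribution, mirroring the forward peek described above. The 10% tail is a hypothetical choice.
- The position at each bar is +100 when alpha is at or above WTB, -100 when it is at or below WTS, and 0 otherwise. This matches the position values used in the attribution.

Whether the original model held positions between signals is not stated, so this stateless rule is an assumption, as are the function names.

```python
def cutoffs(alpha, tail=0.10):
    """(WTB, WTS) taken from the alpha distribution of the test period itself.

    This mirrors the forward peek; a live system would use a rolling past window.
    """
    ranked = sorted(alpha)
    n = len(ranked)
    k = max(1, int(n * tail))
    return ranked[n - k], ranked[k - 1]


def positions(alpha, wtb, wts, size=100):
    """Long `size` when alpha >= WTB, short when alpha <= WTS, flat otherwise."""
    out = []
    for a in alpha:
        if a >= wtb:
            out.append(size)
        elif a <= wts:
            out.append(-size)
        else:
            out.append(0)
    return out


alpha = [i / 10 for i in range(-10, 11)]
wtb, wts = cutoffs(alpha)
print(wtb, wts)  # 0.9 -0.9
```

Because the cut-offs come from each stock’s own alpha distribution, they naturally differ from stock to stock.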

Worksheet “Trading Positions” in Excel spreadsheet “5 – Trading Model F” contains the list of
20 stocks in our trading portfolio and the positions in each stock. A position value can be
-100 (short), 0 (neutral) or 100 (long). For the attribution analysis we selected the period
between 9:36:00 and 9:36:15. The performance of the 50-stock “universe” and our positions
during this time frame can be seen on sheet “Return and MSP” of the “6 – Attribution F”
spreadsheet. Next we analyze each consecutive time frame step by step. Sheet “Attribution 1”
represents the positions we held at the beginning of 9:36:00 and the returns of the 50 stocks
at 9:36:05; from these we calculate our first 5-second attribution. Sheet “Attribution 2”
represents the next time frame: positions at 9:36:05 are marked against stock performance at
9:36:10. The final sheet, “Final 15 sec Attribution”, sums everything up*.
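A sketch of one 5-second attribution step follows. It assumes an equal-weighted decomposition of each stock’s return into three parts:

- a market part: the universe average return;
- a sector part: the sector average minus the market;
- a stock-specific part: the stock return minus its sector average.

It also assumes P&L equals position times return, treating the ±100 positions as notional. The spreadsheet may have used a different weighting or MSP-based dollar P&L, and the function name is hypothetical.

```python
from collections import defaultdict


def attribute(positions, returns, sectors):
    """Split one interval's P&L into market, sector and stock components.

    positions: {symbol: notional}, portfolio names only (e.g. +100 / -100)
    returns:   {symbol: return over the interval}, whole universe
    sectors:   {symbol: sector name}, whole universe
    """
    market = sum(returns.values()) / len(returns)          # equal-weighted universe return
    by_sector = defaultdict(list)
    for sym, r in returns.items():
        by_sector[sectors[sym]].append(r)
    sector_ret = {s: sum(rs) / len(rs) for s, rs in by_sector.items()}
    parts = {"market": 0.0, "sector": 0.0, "stock": 0.0}
    for sym, pos in positions.items():
        r, rs = returns[sym], sector_ret[sectors[sym]]
        parts["market"] += pos * market
        parts["sector"] += pos * (rs - market)
        parts["stock"] += pos * (r - rs)
    return parts
```

The three components always add up to the position-weighted return, so running this for each 5-second interval and summing the dictionaries gives the 15-second attribution.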

Attribution results:

About 15% of our profit came from market exposure and about 75% from sector and stock
exposure. Our strategy was neither intended to be market (dollar) neutral nor designed to
be speculative. Therefore it is encouraging to see positive results in all three
categories.

VIII Conclusion

This project confirms the presence of information at the top level of the book that can be
used to make successful short-term price forecasts. Several simplifying assumptions were made
regarding trade price (MSP), commissions (none) and slippage (none), which make our project
only a first stepping stone toward further, deeper research.

* See Appendices B and C for an Excel walkthrough.


APPENDIX A

[Factor formulas not recoverable from the extracted text. Surviving fragments include a weighting of 0.94 on a lagged (t − 1) value plus 0.06 on a current term, and MACD parameters (10, 40, 5).]


APPENDIX B

Trading Model - Walk through

List of Sheets in the Trading Model

1. Main Sheet
2. 15 min – Base Model
3. 1 day Base Model
4. Raw Data
5. Trading Positions
6. A sheet for each individual stock in the portfolio

1. Main sheet

Select the stock from the circled drop-down list and click Run Macro. This adds a new sheet
with the 15-minute trading model for that specific stock.
2. 15 Min – Base Model

This is the template of the complete 15-minute trading model that we apply to our stocks.

The sheet starts with the raw data, followed by the factor calculation, the factor
normalization, the alpha calculation based on weights derived from Cronbach’s alpha, and
finally the trading logic and the alpha plot.



3. 1 day – Base Model

This is the template of the complete 1-day trading model. It is the same as the 15 Min – Base
Model, but applied only to GOOG.

4. Raw data

15 min data for all the stocks in the universe


5. Trading position

Collection of trading positions at each 5-second bar for all 20 stocks. This sheet is
dynamic: positions change automatically when they are changed in the stock-specific sheet.
We use the data in this sheet for attribution.
6. Separate sheet for each stock – 15 min trading model applied to each stock

1 ADBE
2 AMZN
3 AAPL
4 BBBY
5 CMCSA
6 COST
7 HANS
8 EBAY
9 CDNS
10 GOOG
11 INFY
12 HOLX
13 DELL
14 LOGI
15 MSFT
16 DISH
17 CSCO
18 SIRI
19 WFMI
20 YHOO
APPENDIX C

Attribution - Walk through

List of Sheets in the Attribution

1. Sector
2. Return and MSP
3. Attribution 05
4. Attribution 10
5. Attribution 15
6. Final 15 sec Attribution

1. Sector

This sheet lists the given stocks with their sectors (grey strip). Out of these we selected 50
to be our universe (green strip), and from those we chose 20 stocks (pink strip) for our portfolio.

2. Return and MSP

This sheet lists the MSP and return for the stock universe and the positions for our portfolio.

3. Attribution 05/10/15/Final15

These are the main attribution sheets. Each of the Attribution 05/10/15 sheets holds a 5-second
attribution summary, and the Final15 sheet sums the first three attributions to give the
15-second attribution summary.
