Building High Frequency Trading Model
This project covers the basic building blocks necessary for the development of a trading model. Factors that
lead to trading signals are introduced and evaluated. Further examination is performed to reveal the factors’
compatibility and internal consistency. Trading alpha is constructed based on the weights assigned to
various signals. Simulated trading is performed to evaluate the combination of trading rules and alpha
performance. Post-trade analysis is then carried out.
I Introduction
The purpose of this project is to show evidence that there are opportunities to generate
alpha in the high frequency trading (HFT) environment. High frequency trading is a strategy
that trades over investment horizons of less than one day and seeks to unwind all
positions before the end of each trading day. Therefore the factors that drive the
selection of an HFT strategy are drastically different from those used in traditional investing.
HFT is generally divided into three groups: market-making, statistical arbitrage and
momentum. This project explores opportunities related only to the last group. We
build our alpha model on factors derived from a series of snapshots taken
every 5 seconds. Original data set includes:
Changes in those values over 5-second intervals were evaluated to explore whether
factors built upon those values can be used as a source of information for
predicting short-term fluctuations of stock returns.
Each factor was individually tested for positive correlation between its
value and stock return (see Edge and IC Testing).
Factor values were then rescaled to be uniformly distributed over the range -1 to 1 for
easier comparison and combination (see Normal Scaling).
Factors were tested for internal consistency (see Cronbach’s Alpha) to form groups of
mutually supporting factors.
The final trade signal was derived from Trading Alpha (see Trading Alpha) based on the
above groups.
We then back-tested the signal over the 15 minute data set for over 100
stocks. Results were analyzed to form an attribution report (see Trading and
Attribution).
II Factors
In selecting factors for our model we tried to take a diversified, theme-based
approach. There are factors based on volatility of returns, factors that use technical
analysis methodology, factors whose values are completely removed from the price of the
underlying, and factors that use a combination of values. Our factors can be divided into
two groups, based on whether all information needed to build the factor is present within one
time frame snapshot or information from previous time frames is also used *.
Group 1:
- Book Pressure. Intuitively, one would expect that when the rate and size of buy
orders exceed those of sell orders, the stock price would tend to drift
up. Our data set was limited to top-of-book information.
- Accumulation / Distribution. Acc/Dist tracks the relationship between price and
volume and acts as a leading indicator of price movements.
- Volatility. This indicator measures the relationship between price movement and
the spread of the price distribution within a single time frame.
- Liquidity. Provides information about the relationship between price movement and
the immediate availability of shares on the bid and ask, including the bid/ask
spread.
- Swing Volume. Intended to detect a relationship between the magnitude of the move
within one time frame and executed volume.
Group 2:
- Momentum. Provides basic information about the return of the stock since
the last period.
- R factor. A weighted return indicator.
- Trade V Ratio. A factor that aims to establish a relationship between price moves
and the change in executed volume compared to the average volume over the past n
periods.
- MACD. Moving Average Convergence-Divergence is a widely used
momentum indicator. It is used to spot changes in the strength, direction,
momentum and duration of a trend in a stock’s price.
- Trend. This factor seeks a relationship between price movement and the position of the price
relative to its EMA. The value remains positive while the price remains above the EMA.
III Edge and IC Testing
For the shorter-term factor testing we used two techniques: the edge test and the IC test. The basic
statistic of the edge test tells us how much, in percentage terms, a stock tends to move in
our favor versus against us during the test period. We looked at 24 consecutive bars after
trade initiation to calculate this statistic.
$$\mathrm{Edge} = \frac{\mathrm{MFE} - \mathrm{MAE}}{\mathrm{MFE} + \mathrm{MAE}}$$
Where MFE (maximum favorable excursion) indicates the maximum positive move and MAE (maximum
adverse excursion) indicates the maximum negative drift of the stock price during the holding
period. To perform this test, we sorted factor values from lowest to highest and concentrated on
the top and bottom 10% of values, which gives us an opportunity to use both long and short
positions. The 10% threshold was chosen arbitrarily; other thresholds could also be explored.
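The edge test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge statistic is taken to be the normalized difference (MFE - MAE) / (MFE + MAE), moves are measured as fractions of the entry price, and the function names `edge_stats` and `decile_entries` are hypothetical, not taken from the project spreadsheets.

```python
def edge_stats(prices, entry, horizon=24, side=1):
    """Return (MFE, MAE, edge) for a trade opened at bar `entry`.

    side=+1 for a long trade, -1 for a short trade. MFE and MAE are
    non-negative fractional moves relative to the entry price.
    ASSUMPTION: edge = (MFE - MAE) / (MFE + MAE).
    """
    p0 = prices[entry]
    window = prices[entry + 1 : entry + 1 + horizon]
    if not window:
        return 0.0, 0.0, 0.0  # no bars left after the entry
    moves = [side * (p - p0) / p0 for p in window]
    mfe = max(0.0, max(moves))   # largest move in our favor
    mae = max(0.0, -min(moves))  # largest move against us
    total = mfe + mae
    edge = (mfe - mae) / total if total > 0 else 0.0
    return mfe, mae, edge


def decile_entries(factor, fraction=0.10):
    """Indices of the lowest and highest `fraction` of factor values."""
    order = sorted(range(len(factor)), key=lambda i: factor[i])
    k = max(1, int(len(factor) * fraction))
    return order[:k], order[-k:]  # (short candidates, long candidates)
```

Averaging `edge` over the long candidates with `side=1` and over the short candidates with `side=-1` would give per-side figures comparable in spirit to the LONG and SHORT sections of the statistics table.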
A standard technique called bootstrapping can tell us whether our test results
are statistically significant. It is based on building a sampling distribution for a statistic by
resampling from the data at hand. There is a certain degree of autocorrelation in our data,
so we would need to adjust the sample size. We do not have enough
data in our set to perform this analysis.
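For readers who do have enough data, one common way to respect autocorrelation is to resample contiguous blocks rather than single observations. The sketch below is a moving-block bootstrap of the mean of a statistic (for example per-trade edge values); it was not run as part of this project, and the block length, resample count and function names are illustrative assumptions.

```python
import random


def block_bootstrap_means(values, block_len=12, n_resamples=1000, seed=0):
    """Moving-block bootstrap of the sample mean.

    Resamples contiguous blocks of `block_len` observations so that
    short-range autocorrelation inside each block is preserved.
    `block_len` and `n_resamples` are hypothetical defaults.
    """
    n = len(values)
    if n < block_len:
        raise ValueError("need at least block_len observations")
    rng = random.Random(seed)
    starts = range(n - block_len + 1)
    means = []
    for _ in range(n_resamples):
        sample = []
        while len(sample) < n:
            s = rng.choice(starts)
            sample.extend(values[s : s + block_len])
        sample = sample[:n]  # trim the last block to the original length
        means.append(sum(sample) / n)
    return means


def fraction_at_or_below_zero(means):
    """Share of bootstrap means that are not positive."""
    return sum(1 for m in means if m <= 0) / len(means)
```

A small value of `fraction_at_or_below_zero` would suggest that a positive average edge is unlikely to be a resampling accident.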
We can also measure the forecasting ability of our signals by performing what is known as an IC
(information coefficient) test. The results show the correlation of our forecast with actual market returns.
We picked three forward return horizons: 5, 60 and 120 seconds. For each horizon the
correlation between the signal and the return was calculated.
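A minimal sketch of the IC calculation follows. It assumes 5-second bars (so the three horizons are 1, 12 and 24 bars ahead), simple forward returns, and a Pearson correlation; the source only says "correlation", so a rank correlation would be an equally plausible reading. All function names are hypothetical.

```python
import math

# Forward horizons expressed in 5-second bars (assumption: one bar = 5 sec).
HORIZONS = {"5 sec": 1, "60 sec": 12, "120 sec": 24}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)


def forward_returns(prices, bars):
    """Simple return from each bar to the bar `bars` steps ahead."""
    return [prices[i + bars] / prices[i] - 1.0 for i in range(len(prices) - bars)]


def information_coefficient(signal, prices, bars):
    """Correlation between the signal at bar t and the return over the next `bars` bars."""
    fwd = forward_returns(prices, bars)
    return pearson(signal[: len(fwd)], fwd)
```

Calling `information_coefficient(signal, prices, bars)` once per entry in `HORIZONS` gives one IC row with three columns, matching the layout of the IC table.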
Here and thereafter, we explain the structure of the Excel spreadsheets in the text. The first file in
the series is “1 – PROJECT F”. Worksheets “MSFT (raw)” and “GOOG (raw)” hold the extracted 5-second
data for those two stocks only, for 02/01/2008 only. Only data for those stocks
and that date was used to derive and test our signals. Worksheets “Calculation Examples” contain
the formulas used to calculate our factors. Factors are presented in the colored columns M through Y.
Worksheets “Factors GOOG” and “Factors MSFT” contain the same information, but with
factor values stored as numbers rather than formulas (which helps with sorting). Columns BA
through BL are used to calculate factor statistics. Column BD sums up the MFEs, MAEs and Edge to
calculate total edge. The table at column BI shows the results of the IC calculation. This worksheet also
serves as our factor tester: select columns A through BA and sort by factor
value, and all statistics are shown in columns BA through BL. The final worksheet, “Trading Alpha”, holds
the combination of scaled and fitted factor values that forms the final alpha; more on that later. The next
spreadsheet in the series, “2 – Factor Statistics F”, sums up all information in one table. Results are
divided into LONG and SHORT sections, with each section showing Hit rate,
MFEs, MAEs and Average Edge. The right-most section of the table sums the edge values into Total
Edge.
The table below shows the results of the edge test for each factor, followed
by the IC test results.
IV Normal Scaling
Throughout our research we want to be able to compare factor values,
determine the importance of the factors and assign different weights to them. The original factor
values have drastically different variability and distributions. The units in which factors are
measured must be comparable for our final alpha calculation. We used distribution fitting to
scale the values of all factors to the range -1 to 1. This technique maps the
lowest observation to the probability that generates the lowest score (-1), the
highest observation to the probability that generates the highest score (+1), and the rest of
the observations to probabilities proportional to their values.
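One simple way to obtain scores spread evenly over -1 to 1 is to map each observation to its rank. The sketch below takes that rank-based route as an assumption; the spreadsheet's actual fitting step may differ. The function name `scale_to_unit_range` is hypothetical, and tied values share the average of their ranks.

```python
def scale_to_unit_range(values):
    """Map observations to [-1, 1] by rank: lowest -> -1, highest -> +1.

    ASSUMPTION: a rank (empirical distribution) mapping stands in for the
    distribution fitting done in the spreadsheet. Ties get their average rank.
    """
    n = len(values)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [2.0 * r / (n - 1) - 1.0 for r in ranks]
```

Because the output keeps the original ordering of the input list, the scaled values stay aligned with their time stamps and no re-sorting by time is needed afterwards.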
[Figures for: MACD, R factor, Accumulation / Distribution, Momentum, Trend, Volatility, Swing Volume]
Please see the third file in the series, “3 – Normal Scaling – F”, for details. Sheet “Raw Data” contains
the original factor values. Sheet “Working Sheet” was used to copy the original values, including the time
stamp, re-sort them by value and copy them to column B. Column F then shows the scaled, fitted values, which are
then copied back and sorted by time. Worksheet “Normalized” shows the final result with new
values for each factor.
V Cronbach’s Alpha
Please see spreadsheet “4 – Cronbach’s Alpha F” for reference. Sheets “10 FACTORS”
through “1 FACTORS” illustrate the step-by-step removal of factors and the formation of new
groups. Worksheet “GROUP” shows the final two groups with their corresponding Cronbach’s
Alpha.
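For reference, Cronbach’s alpha for a group of k factors is k / (k - 1) * (1 - sum of the individual factor variances / variance of the summed factors). The sketch below computes it for scaled factor columns and also reports the alpha obtained after dropping each factor in turn, which is one plausible way to drive the factor removal described above; the function names and the drop-one procedure are assumptions, not a transcription of the spreadsheet.

```python
from statistics import variance


def cronbach_alpha(factors):
    """Cronbach's alpha for a list of equal-length factor columns."""
    k = len(factors)
    if k < 2:
        raise ValueError("need at least two factors")
    n = len(factors[0])
    totals = [sum(col[t] for col in factors) for t in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("summed factors have zero variance")
    item_var = sum(variance(col) for col in factors)
    return k / (k - 1) * (1.0 - item_var / total_var)


def alpha_if_dropped(factors):
    """Alpha of the remaining group after removing each factor in turn."""
    return {
        i: cronbach_alpha(factors[:i] + factors[i + 1:])
        for i in range(len(factors))
    }
```

A factor whose removal raises alpha noticeably is a candidate for leaving the group.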
VI Trading Alpha
- Equal weighting,
- Achieved Edge weighting,
- Cronbach’s alpha group-based weighting.
We experimented with all of the weighting schemes listed above. Equal
weighting of the factors produces a non-discriminating result. It is a general, all-purpose
approach, where each factor is assigned a weight of 0.1. Results for the trading alpha are
very promising and are presented in the table below:
AVE UP EDGE    AVE DN EDGE    AVE TOTAL      Hit Rate Up    Hit Rate Dn
0.070471949    0.144113811    0.106489448    52.9787%       60.0000%

IC       5 sec          60 sec         120 sec
ALPHA    0.029129447    0.039844551    0.033319173
Better results, however, were achieved when using higher weights for the
factors with the highest total edge. Three quarters of the weighting was distributed
among Book Pressure, Acc / Dist, MACD and Trend, with the remainder (0.25 of the
total) equally distributed among the other 6 factors:
While having a slightly lower overall edge, this combination achieves a much higher
and more stable IC over all time frames tested.
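The two weighting schemes can be written out explicitly. In the sketch below the factor names follow Section II; the equal split of the 0.75 share among the four high-edge factors is an assumption (the text only states the total), which gives 0.1875 each, while the remaining six factors receive 0.25 / 6, about 0.0417 each. Function names are hypothetical.

```python
FACTORS = [
    "Book Pressure", "Acc/Dist", "Volatility", "Liquidity", "Swing Volume",
    "Momentum", "R factor", "Trade V Ratio", "MACD", "Trend",
]
HIGH_EDGE = ["Book Pressure", "Acc/Dist", "MACD", "Trend"]


def equal_weights(factors=FACTORS):
    """Each factor gets the same weight (0.1 for ten factors)."""
    return {name: 1.0 / len(factors) for name in factors}


def edge_tilted_weights(factors=FACTORS, high_edge=HIGH_EDGE, top_share=0.75):
    """Give `top_share` of the weight to the high-edge factors.

    ASSUMPTION: the share is split equally inside each of the two groups.
    """
    others = [name for name in factors if name not in high_edge]
    weights = {name: top_share / len(high_edge) for name in high_edge}
    weights.update({name: (1.0 - top_share) / len(others) for name in others})
    return weights


def trading_alpha(scaled_row, weights):
    """Weighted sum of scaled factor values (each in [-1, 1]) for one bar."""
    return sum(weights[name] * scaled_row[name] for name in weights)
```

Since the weights sum to 1 and every scaled factor lies in the range -1 to 1, the resulting alpha also stays within -1 to 1.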
This calculation can be seen in the Excel spreadsheet “5 – Trading Model F”. This spreadsheet is
designed to run a macro on each stock symbol, produce factor values, scale and fit those values,
apply the appropriate weights to calculate the model’s alpha, plot a graph of alpha, plug those values
into the trading mechanism, show trading positions and PnL, and, finally, create a separate sheet with
all data displayed. Detailed instructions for this spreadsheet are in Appendix B. Every new
worksheet has columns AA through AJ displaying fitted factor values. Row 2 displays
the appropriate weights and column AK sums it all up in the final model alpha for that
particular stock.
Performing a similar analysis on MSFT data for 02/01/2008, we came to
similar conclusions. The edge test and IC test for alpha show overall improvement and
stability of the combined weighted indicator. Results are shown below:
Overall, the MSFT data had lower edges for all factors, including the Book Pressure
factor.
VII Trading and Attribution
In this section we back-test our alpha model. Out of all the equity names provided we
selected a “universe” of 50 stocks, chosen to give fair sector representation.
Out of those 50 stocks we randomly selected 20 names and applied the data available
for those stocks to our trading model. We created the trading alpha based on factors
from our in-sample analysis of the MSFT and GOOG data. We apply the same
model to the 20 random stocks and, essentially, test our approach out-of-sample. In a
real-world approach, an analysis of the rolling N past periods would be used (where
historical data exists) to detect the optimal range for alpha. To imitate this, we simply
peek forward at the alpha distribution for the test period to decide on our WTB and
WTS cut-off ranges. That is why they differ from stock to stock.
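As an illustration of per-stock cut-offs, the sketch below reads WTB and WTS as the alpha levels at or above which the model buys and at or below which it sells; that reading, the 10% tail size and the function names are all assumptions made for this example. The cut-offs are taken from the tails of the alpha distribution for the test period, which mirrors the "peek forward" shortcut described above.

```python
def cutoffs_from_distribution(alphas, tail=0.10):
    """Return (wtb, wts) taken from the upper and lower tails of alpha.

    ASSUMPTION: buy at or above the top `tail` fraction of alpha values,
    sell at or below the bottom `tail` fraction.
    """
    ordered = sorted(alphas)
    k = max(1, int(len(ordered) * tail))
    wts = ordered[k - 1]  # highest alpha inside the bottom tail
    wtb = ordered[-k]     # lowest alpha inside the top tail
    return wtb, wts


def trade_signals(alphas, wtb, wts):
    """+1 (buy), -1 (sell) or 0 (no action) for each bar."""
    signals = []
    for a in alphas:
        if a >= wtb:
            signals.append(1)
        elif a <= wts:
            signals.append(-1)
        else:
            signals.append(0)
    return signals
```

Because the cut-offs come from each stock's own alpha distribution, two stocks with different alpha ranges end up with different WTB and WTS levels.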
Attribution results:
About 15% of our profit came from market exposure and about 75% from sector
and stock exposure. Our strategy was not intended to be market (dollar) neutral, nor was
it designed to be speculative. It is therefore encouraging to see positive results in all
three categories.
VIII Conclusion
This project confirms that information available at the top level of the book can be used to
make successful short-term price forecasts. Multiple assumptions were made in
calculating trade price (MSP), commissions (none) and slippage (none), which makes this
project only a first stepping stone for further in-depth research.
[Factor formulas: the equations in this part, including the edge statistic and MACD(10,40,5), are illegible in this copy.]
APPENDIX B
1. Main Sheet
2. 15 min – Base Model
3. 1 day Base Model
4. Raw Data
5. Trading Positions
6. Sheet for each individual stock in the portfolio
1. Main sheet
Select the stock from the drop-down list in the circle and click run macro. This adds a new sheet
with the 15 min trading model for that specific stock.
2. 15 min – Base Model
This is the template of the complete trading model for 15 min that we apply to our stocks,
followed by the alpha calculation based on weights calculated from Cronbach’s alpha.
3. 1 day Base Model
This is the template of the complete trading model for 1 day. Same as the 15 min – Base Model.
Applied only to the GOOG stock.
4. Raw data
5. Trading Positions
Collection of trading positions at each 5 sec bar for all the stocks. This sheet is
dynamic: positions change automatically when changed in the stock-specific sheet. We use the
data in this sheet for attribution.
6. Separate sheet for each stock: 15 min trading model applied to each stock
1 ADBE
2 AMZN
3 AAPL
4 BBBY
5 CMCSA
6 COST
7 HANS
8 EBAY
9 CDNS
10 GOOG
11 INFY
12 HOLX
13 DELL
14 LOGI
15 MSFT
16 DISH
17 CSCO
18 SIRI
19 WFMI
20 YHOO
APPENDIX C
1. Sector
2. Return and MSP
3. Attribution 05
4. Attribution 10
5. Attribution 15
6. Final 15 sec Attribution
1. Sector
This sheet has the list of given stocks with sectors (grey strip); out of that we selected 50 to
be our universe (green strip), and from those we chose 20 stocks (pink strip) to be in our portfolio.
2. Return and MSP
This sheet has the list of MSP and Return for the stock universe and Positions for our portfolio.
3. Attribution 05/10/15/Final15
These are the main attribution sheets. Attribution 05/10/15 each hold a 5 second attribution summary,
and the Final15 sheet holds the sum of the first 3 attributions and captures the 15 sec
attribution summary.