The Price Impact of Order Book Events: Rama Cont, Arseniy Kukanov and Sasha Stoikov March 2011
Abstract
arXiv:1011.6402v3 [q-fin.TR] 13 Apr 2011
We study the price impact of order book events - limit orders, market orders and
cancelations - using the NYSE TAQ data for 50 U.S. stocks. We show that, over short time
intervals, price changes are mainly driven by the order flow imbalance, defined as the
imbalance between supply and demand at the best bid and ask prices. Our study reveals a linear
relation between order flow imbalance and price changes, with a slope inversely proportional
to the market depth. These results are shown to be robust to seasonality effects, and stable
across time scales and across stocks. We argue that this linear price impact model, together
with a scaling argument, implies the empirically observed “square-root” relation between
price changes and trading volume. However, the relation between price changes and trade
volume is found to be noisy and less robust than the one based on order flow imbalance.
Contents
1 Introduction
1.1 Summary
1.2 Outline
5 Conclusion
1.1 Summary
We conduct in this study an empirical investigation of the impact of order book events (market
orders, limit orders and cancelations) on equity prices. Although previous studies give a
relatively complex description of this impact, we argue that, in fact, the impact of these events
on price dynamics may be modeled parsimoniously through a single variable, the order flow
imbalance (OFI), which represents the net order flow at the bid and ask and tracks changes in the
size of the bid and ask queues by
• increasing every time the bid size increases, the ask size decreases or the bid/ask prices
increase
• decreasing every time the bid size decreases, the ask size increases or the bid/ask prices
decrease.
Interestingly, this variable treats a market sell and a cancel buy of the same size as equivalent,
since they have the same effect on the size of the bid queue. We find that this aggregate variable
explains mid-price changes over short time scales in a linear fashion, for a large sample of stocks,
with an average $R^2$ of 65%. The resulting price impact model relates prices, trades, limit orders
and cancelations in a simple way: it is linear, requires the estimation of a single parameter and
it is robust across stocks and across timescales.
The slope of this relation, which we call the price impact coefficient, exhibits intraday
seasonality in line with known intraday patterns observed in spreads, market depth and price
volatility [1, 4, 31, 34] which have been explained in terms of intraday shifts in information
asymmetry [33] or informativeness of trades [21]. Motivated by a stylized model of the order
book, we relate the intraday changes in the price impact coefficient to variations in market
depth and show that price impact is inversely proportional to the depth of the order book. This
allows us to explain intraday patterns in price impact and price volatility using only observable
quantities - the order flow imbalance and the market depth, as opposed to unobservable
parameters previously invoked in the literature, such as information asymmetry or informativeness
of trades.
The intuition that “it takes volume to move prices”, though widely confirmed by empirical
studies [27], is not easy to explain theoretically (see [37, Chapter 6.2]). In Section 4, we show
that our price impact model, together with a scaling argument, leads to an apparent “square
root” relation between price changes and trade volume, similar to some findings in the empirical
literature [11, 40]. However, we argue that this relation is not robust and is a statistical artifact
due to the aggregation of data.
1.2 Outline
The article is structured as follows. In Section 2, motivated by a stylized model of the order
book, we specify a parsimonious model that links stock price changes, order flow imbalance and
market depth. Section 3 describes the trades and quotes data and estimation results for our
model. There, we also show how intraday patterns in depth and order flow imbalance generate
intraday patterns in price impact and price volatility. In Section 4 we discuss the role of trading
volume as an explanatory variable and show that order flow imbalance is more effective in
explaining price moves than variables based on trades. We also derive a scaling relation between
order flow imbalance and traded volume and show how the “square-root” price impact of volume
follows from our model. We present our conclusions in Section 5.
The bid price and size represent the demand for a stock, while the ask price and size
represent the supply. We enumerate these observations by $n$ and compare
$(P^B_{n-1}, q^B_{n-1}, P^A_{n-1}, q^A_{n-1})$ with $(P^B_n, q^B_n, P^A_n, q^A_n)$.
Between two such observations, only one of the following events can occur:
• $P^B_n > P^B_{n-1}$ or $q^B_n > q^B_{n-1}$, signifying an increase in demand
Note that if $q^B$ increases but $P^B$ remains the same, we assign $e_n = q^B_n - q^B_{n-1}$,
representing the size that was added at the bid. If $q^B$ decreases, we also assign
$e_n = q^B_n - q^B_{n-1}$, representing the size that was removed from the bid, whether due to a
market sell or cancel buy order. If $P^B$ increases, we let $e_n = q^B_n$, representing the size
of a price-improving limit order. If $P^B$ decreases, we let $e_n = -q^B_{n-1}$, representing the
size that was removed, whether due to a market order or a cancellation. The same classification
is done for events on the ask side, with signs reversed.
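The classification of events can be sketched in code. The following Python fragment is illustrative only and is not part of the original study: the `Quote` type and the function name are hypothetical, removals are counted as negative contributions, and the bid and ask contributions are summed in case a record changes both sides at once (an assumption that goes beyond the one-event-at-a-time setting of the text).

```python
from typing import NamedTuple


class Quote(NamedTuple):
    """One best-quote observation: prices in dollars, sizes in shares."""
    bid_price: float
    bid_size: int
    ask_price: float
    ask_size: int


def event_contribution(prev: Quote, curr: Quote) -> int:
    """Signed contribution e_n of the change from `prev` to `curr`."""
    # Bid side: demand added counts positively, demand removed negatively.
    if curr.bid_price > prev.bid_price:
        bid = curr.bid_size                      # price-improving limit buy
    elif curr.bid_price < prev.bid_price:
        bid = -prev.bid_size                     # previous best bid queue removed
    else:
        bid = curr.bid_size - prev.bid_size      # size change at an unchanged price

    # Ask side: same classification with signs reversed.
    if curr.ask_price < prev.ask_price:
        ask = -curr.ask_size                     # price-improving limit sell
    elif curr.ask_price > prev.ask_price:
        ask = prev.ask_size                      # previous best ask queue removed
    else:
        ask = -(curr.ask_size - prev.ask_size)   # size change at an unchanged price

    return bid + ask
```

For instance, with a previous quote of 500 shares bid at 10.00 and 300 shares offered at 10.02, an update that raises the bid size to 700 contributes +200, while an update that drops the bid price to 9.99 contributes -500.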
Events affecting the order book occur at random times $\tau_n$, and we define
$N(t) = \max\{n \mid \tau_n \le t\}$ to be the number of events during $[0, t]$. We define the
order flow imbalance over time intervals $[t_{k-1}, t_k]$ as a sum of individual event
contributions $e_n$ over these intervals:
$$OFI_k = \sum_{n=N(t_{k-1})+1}^{N(t_k)} e_n,$$
where $N(t_{k-1}) + 1$ and $N(t_k)$ are the index of the first and the index of the last event in
the interval $[t_{k-1}, t_k]$. The order flow imbalance is a measure of supply/demand imbalance,
which encompasses trades, limit orders and cancelations. Whereas previous studies [10, 20, 22, 29,
38, 43] focused on measures of “trade imbalance”¹, using orders provides a more natural way of
measuring supply and demand.
We also consider mid-price changes (in number of ticks) over the same time grid:
$$\Delta P_k = \frac{P_k - P_{k-1}}{\delta},$$
where $P_k$ is the mid-quote price at time $t_k$ and $\delta$ is the tick size (equal to 1 cent in our data).
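As a rough illustration of the aggregation step, the sketch below sums event contributions and computes mid-price changes in ticks on a uniform grid. It is a minimal example under stated assumptions, not the study's own code: the function name, the `initial_mid` input and the choice to carry the last mid-quote forward through empty intervals are all hypothetical. An event at time $\tau$ is assigned to the interval with $t_{k-1} < \tau \le t_k$, which is what the definition of $N(t)$ implies.

```python
import math


def aggregate(times, contributions, mids, initial_mid, dt=10.0, tick=0.01):
    """Return (ofi, dp) on a uniform grid with step `dt` seconds.

    times[n]         -- event time tau_n in seconds (positive, non-decreasing)
    contributions[n] -- signed event contribution e_n
    mids[n]          -- mid-quote price immediately after event n
    initial_mid      -- mid-quote price at time 0 (hypothetical input)
    """
    n_bins = math.ceil(times[-1] / dt)
    ofi = [0] * n_bins
    end_mid = [None] * n_bins
    for t, e, m in zip(times, contributions, mids):
        k = math.ceil(t / dt) - 1        # event falls in (k*dt, (k+1)*dt]
        ofi[k] += e
        end_mid[k] = m                   # later events overwrite earlier ones
    dp, prev = [], initial_mid
    for k in range(n_bins):
        # An interval without events keeps the previous mid-quote.
        curr = prev if end_mid[k] is None else end_mid[k]
        dp.append(round((curr - prev) / tick, 6))   # change in ticks
        prev = curr
    return ofi, dp
```

Because the mid-quote is an average of two prices on a one-cent grid, the change in ticks can take half-integer values.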
¹ Hopman [24] computes the supply/demand imbalance based on limit orders and trades, but not cancelations.
2.2 A stylized model of the order book
Consider first a stylized model of the order book in which
1. the number of shares at each price level beyond the best bid/ask is equal to D.
2. limit order arrivals and cancelations occur only at the best bid/ask.
We will show that under these assumptions a linear relation holds between order flow imbalance
and price changes. Consider three scenarios, when only market buy orders, limit buy orders or
limit sell cancels happen over some time interval [t, t + ∆t]:
• Market sell orders remove $M^s$ shares from the bid, while limit buy orders add $L^b$ shares
to the bid.
• Market sell orders and limit buy cancels remove $M^s + C^b$ shares from the bid, while limit
buy orders add $L^b$ shares to the bid.
The three variables $M^b$, $C^s$ and $L^s$ for the ask can be defined analogously. Under the above
assumptions, the impact of order book events at the bid (ask) side of the book is additive and
only depends on their net effect on the bid (ask) queue size:
$$\Delta P^b = \lceil (L^b - C^b - M^s)/D \rceil$$
Similarly, for the ask:
$$\Delta P^a = -\lceil (L^s - C^s - M^b)/D \rceil$$
These relations are remarkably simple - they involve no parameters and incorporate the
effects of all order book events on bid and ask prices. Although the following analysis can be
carried out for the bid and the ask prices separately, we take their average (the mid-price) to
simplify the analysis:
$$\Delta P = \frac{1}{2} \lceil (L^b - C^b - M^s)/D \rceil - \frac{1}{2} \lceil (L^s - C^s - M^b)/D \rceil$$
Note that the above is equivalent (up to truncation) to
$$\Delta P = \frac{OFI}{2D} + \epsilon, \qquad (1)$$
where $OFI = L^b - C^b - M^s - L^s + C^s + M^b$ and $\epsilon$ is the truncation error. This
expression for $OFI$ is obtained from its definition by grouping individual order contributions
$e_n$ by their types (limit buys, market sells, etc).
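The size of the truncation error in (1) can be checked numerically. The sketch below is a minimal illustration that reads the brackets in the bid and ask relations as rounding up to the next integer (an assumption about the notation); the function names are hypothetical. Under that reading each side contributes a rounding term between 0 and 1, so the error of the mid-price relation stays strictly between -1/2 and 1/2 tick.

```python
import math


def mid_price_change(Lb, Cb, Ms, Ls, Cs, Mb, D):
    """Mid-price change in ticks under the stylized model, rounding each side up."""
    dp_bid = math.ceil((Lb - Cb - Ms) / D)
    dp_ask = -math.ceil((Ls - Cs - Mb) / D)
    return 0.5 * (dp_bid + dp_ask)


def truncation_error(Lb, Cb, Ms, Ls, Cs, Mb, D):
    """Difference between the exact stylized price change and OFI / (2D)."""
    ofi = Lb - Cb - Ms - Ls + Cs + Mb
    return mid_price_change(Lb, Cb, Ms, Ls, Cs, Mb, D) - ofi / (2 * D)
```

For example, with $D = 100$ and 250 shares of limit buys as the only activity, the stylized price change is 1.5 ticks, the linear term $OFI/(2D)$ is 1.25 ticks and the truncation error is 0.25 ticks.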
$$\Delta P_k = \beta\, OFI_k + \epsilon_k, \qquad (2)$$
where $\beta$ is the price impact coefficient and $\epsilon_k$ is a noise term due to the influence
of deeper levels of the order book and rounding errors. Our earlier discussion suggests that the
price impact coefficient is inversely related to market depth, which is itself subject to intraday
fluctuations. We define a measure of depth by averaging the bid/ask queue sizes over intervals
$[T_{i-1}, T_i]$:
$$AD_i = \frac{1}{2(N(T_i) - N(T_{i-1}) - 1)} \sum_{n=N(T_{i-1})+1}^{N(T_i)} (q^B_n + q^A_n)$$
We therefore specify the following relation between the price impact coefficient $\beta_i$ in the
time interval $[T_{i-1}, T_i]$ and our measure of market depth:
$$\beta_i = \frac{c}{AD_i^{\lambda}} + \nu_i, \qquad (3)$$
where $c$, $\lambda$ are constants and $\nu_i$ is a noise term. Note that the stylized model
presented above corresponds to $\lambda = 1$.
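A short sketch of the depth measure and of the noise-free part of relation (3) follows. It is illustrative only: the function names are hypothetical, and the normalisation copies the printed formula, with the number of quote updates in the interval minus one in the denominator.

```python
def average_depth(bid_sizes, ask_sizes):
    """AD_i from the bid/ask queue sizes observed at each quote update in one interval."""
    n = len(bid_sizes)
    if n < 2 or n != len(ask_sizes):
        raise ValueError("need at least two matching bid/ask observations")
    total = sum(b + a for b, a in zip(bid_sizes, ask_sizes))
    return total / (2 * (n - 1))     # denominator 2(N(T_i) - N(T_{i-1}) - 1)


def price_impact(depth, c=0.5, lam=1.0):
    """Noise-free part of relation (3): beta = c / depth**lam."""
    return c / depth ** lam
```

With the default values `c=0.5` and `lam=1.0` the second function reduces to $1/(2D)$, the slope in (1).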
The specification (2-3) may be regarded as a model of the instantaneous price impact over
a short time interval $[t_{k-1}, t_k]$. An order, submitted at $\tau \in [t_{k-1}, t_k]$, has a
contribution $e_\tau$ and joins the aggregate order flow imbalance $OFI_k$. If the order goes in
the same direction as the majority of the orders ($\mathrm{sgn}(e_\tau) = \mathrm{sgn}(OFI_k)$), it
reinforces the concurrent order flow imbalance and can affect the price. If the order goes against
the concurrent order flow imbalance ($\mathrm{sgn}(e_\tau) = -\mathrm{sgn}(OFI_k)$), it is
compensated by other orders and may have an instantaneous impact of zero. In our model all events
(including trades) have a linear price impact, equal to $\beta$ on average. Their realized impact,
however, depends on the rest of the orders that arrive during the same time interval.
The idea that the concurrent limit order activity can make a difference in terms of trades’
impact was demonstrated by Stephens et al. [41], where the authors show that the shape of the price
impact function essentially depends on the contemporaneous limit order activity. Our approach
can also be related to the model proposed by Bouchaud et al. [12], where order book events
have a linear impact on prices, which depends on their signs and types². The major difference
between our models lies in the aggregation across time and events. As argued in [12], order book
events have complicated auto- and cross-correlation structures on the timescale of individual
events, which typically vanish after 10 seconds. In our data the autocorrelations at a timescale
of 10 seconds are small and quickly vanish as well (ACF plots for a representative stock are shown
in Figure 1). Finally, Hasbrouck and Seppi [22] propose a model similar to (2, 3) for explaining
the price impact of trades. Although their focus is on trades, they also allow the price impact
coefficient to depend on contemporaneous liquidity factors and change through time.
However, the linear equation (2) is quite different from models of price impact that consider
only the size of trades [18, 20, 29, 38, 39, 43]. Instead of modeling price impact of trades as a
(nonlinear) function of trade size, we show that the price impact of all events (including trades)
is a linear function of their size after events are aggregated into a single imbalance variable. In
Section 4 we will argue that, first, the effect of trades on prices is adequately captured by the
order flow imbalance and, second, that if one leaves out all events except trades, relation (2)
leads to an apparent concave relation between price changes and trade volume.
The next section provides an overview of the estimation results for our model.
Figure 1: ACF of the mid-price changes $\Delta P_k$, the order flow imbalance $OFI_k$ and the 5%
significance bounds for the Schlumberger stock (SLB).
² Note that in our case all order book events have the same average impact, equal to $\beta_i$,
regardless of their type. As shown in [12], average impacts of different event types are
empirically very similar, which allows us to reasonably approximate them with a single number.
3 Estimation and results
3.1 The trades and quotes (TAQ) data
Our data set consists of one calendar month (April, 2010) of trades and quotes (TAQ) data
for 50 stocks. The stocks were selected by a random number generator from the S&P 500
constituents. The S&P 500 composition for that month was obtained from Compustat and the
data for individual stocks was obtained from the TAQ consolidated quotes and TAQ consolidated
trades databases. The data were obtained through Wharton Research Data Services (WRDS).
Consolidated quotes contain all changes in queue sizes at the best bid and ask. For each
stock, a data update consists of a timestamp (rounded to the nearest second), bid price, bid
size, ask price, ask size and exchange flag. Consolidated trades (or market orders) consist of
a timestamp, a price and a size. These two data sets are often referred to as Level 1 data, as
opposed to Level 2 data, which also includes quote updates deeper in the book.
Our reason for using TAQ data rather than Level 2 order book data is that it is far
more accessible, yet contains all events at the top of the order book (best bid and ask updates).
We demonstrate that Level 1 TAQ data can be successfully used to study limit orders and we hope
that more empirical studies of that subject will follow. We note that the ratio of the number of
quote updates to the number of trades is roughly 40 to 1 in our data. Many empirical studies have
focused exclusively on trades rather than quotes, but the sheer ratio in the size of these data
sets is a good indicator that more information may be conveyed by the quotes than by trades.
Using a procedure described in detail in the appendix, we aggregate all quote updates to
estimate the National Best Bid and Offer sizes and prices (NBBO) at each quote update. Instead
of aggregating all exchanges in this fashion, one may also simply filter by the exchange flag and
study one exchange at a time. Focusing on one exchange at a time yields similar results.
We use a uniform grid in time $\{t_0, \ldots, t_N\}$ with a timescale
$t_k - t_{k-1} \equiv \Delta t = 10$ seconds to compute the price changes and the order flow
imbalances. To test the robustness of our findings to the choice of the basic timescale, we
repeated our calculations on a subsample of stocks for different values of $\Delta t$, ranging
from 10 quote updates (usually less than half of a second in our data) up to 10 minutes. The fit
of our model generally improves with $\Delta t$, but the rest of the results remain the same. Time
aggregation serves two purposes: first, it alleviates the issue of data discreteness and second,
it mitigates the errors due to the trade matching algorithm (described in the Appendix).
intercept is mostly insignificant. Figure 3 represents the histogram of excess kurtosis values of
the residuals $\hat\epsilon_k$ across subsamples: the relatively low level of kurtosis shows that
the residuals are not predominantly associated with large price changes. Since the regression
residuals demonstrate heteroscedasticity, we used White’s heteroscedasticity-consistent standard
errors for the z-test. To check for higher order/nonlinear dependence we add a quadratic term
$\hat\gamma_{Q,i}\, OFI_k |OFI_k|$ to the regression. The increase in $R^2$, from 65% to 68% on
average, is barely noticeable and the coefficient $\hat\gamma_{Q,i}$ is statistically
insignificant in most samples.
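For readers who wish to reproduce this type of estimate, the sketch below fits a regression of price changes on order flow imbalance with an intercept and computes t-statistics from heteroscedasticity-consistent standard errors. It is a minimal illustration rather than the estimation code behind the tables: it uses the basic HC0 form of White's estimator (an assumption, since the text does not say which variant was used), omits the quadratic term, and the function name is hypothetical.

```python
import math


def ols_white(x, y):
    """Fit y = a + b*x by OLS; return (a, b, t_a, t_b, r2) with HC0 t-statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Rows of (X'X)^{-1} X' for the intercept and for the slope.
    w_a = [1.0 / n - mx * (xi - mx) / sxx for xi in x]
    w_b = [(xi - mx) / sxx for xi in x]
    # HC0 sandwich: each squared residual acts as its own variance estimate.
    se_a = math.sqrt(sum(w * w * e * e for w, e in zip(w_a, resid)))
    se_b = math.sqrt(sum(w * w * e * e for w, e in zip(w_b, resid)))

    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - sum(e * e for e in resid) / sst
    return a, b, a / se_a, b / se_b, r2
```

Here `x` would hold the values of $OFI_k$ in one half-hour subsample and `y` the corresponding $\Delta P_k$. The function assumes that the fit is not exact, so that the standard errors are non-zero.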
Figure 2: Scatter plot of $\Delta P_k$ against $OFI_k$ for the Schlumberger stock (SLB), 04/01/2010
11:30-12:00pm.
Figure 3: Distribution of excess kurtosis of the residuals $\hat\epsilon_k$ across stocks and time.
Table 1. Descriptive statistics
Columns: Name, Ticker, Price, Daily volume (shares), Number of best quote updates, Number of trades, Average spread (cents), Maximum spread (cents), Best quote depth (shares)
Advanced Micro Devices AMD 9.61 20872996 417204 6687 1 1 1035
Apollo Group APOL 62.92 1949337 172942 4095 2 5 15
American Express AXP 45.21 8678723 559701 7748 1 24 79
Autozone AZO 179.03 243197 43682 1081 9 35 7
Bank of America BAC 18.43 164550168 1529395 15008 1 1 3208
Becton Dickinson BDX 78.07 1130362 61029 2968 2 5 15
Bank of New York Mellon BK 31.77 6310701 285619 5518 1 1 122
Boston Scientific BSX 7.13 25746787 309441 6768 1 1 2965
Peabody Energy corp BTU 47.14 5210642 298616 7267 1 3 29
Caterpillar CAT 67.20 6664891 392499 8224 1 2 38
Chubb CB 52.22 1951618 149010 3601 1 2 43
Carnival CCL 40.16 4275911 215427 5503 1 2 53
Cincinnati Financial CINF 29.41 688914 51373 1528 1 2 42
CME Group CME 322.83 418955 38504 1412 31 103 5
Coach COH 41.91 3126469 176795 4458 1 2 41
ConocoPhillips COP 56.09 9644544 426614 8621 1 2 84
Coventry Health Care CVH 24.16 1157022 79305 2213 1 2 38
Denbury Resources DNR 17.88 5737740 263173 4643 1 1 186
Devon Energy DVN 66.98 3260982 177006 5805 2 4 18
Equifax EFX 35.34 799505 62957 1945 1 3 39
Eaton ETN 78.53 1757136 67989 3580 2 6 13
Fiserv FISV 52.56 1038311 58304 2208 1 3 20
Hasbro HAS 39.48 1322037 86040 2672 1 2 34
HCP HCP 32.63 2872521 213045 4357 1 2 48
Starwood Hotels HOT 50.59 3164807 150252 5106 2 4 22
Kohl’s KSS 56.88 3064821 128196 4936 1 3 27
L-3 Communications LLL 94.64 670937 72818 2141 2 6 9
Lockheed Martin LMT 84.14 1416072 88254 3333 2 5 15
Macy’s M 23.40 8324639 491756 6469 1 1 176
Marriott MAR 34.45 5014098 238190 5499 1 2 65
McAfee MFE 40.04 2469324 109073 3561 1 2 40
McGraw-Hill MHP 34.90 1954576 102389 3261 1 2 42
Medco Health Solutions MHS 63.22 2798098 109382 4680 1 3 25
Merck MRK 36.03 13930842 448748 7997 1 1 231
Marathon Oil MRO 32.33 5035354 341408 5522 1 1 143
MeadWestvaco MWV 26.96 1035547 92825 2312 1 3 37
Newmont Mining NEM 53.43 5673718 435295 7717 1 2 38
Omnicom OMC 41.17 3357585 150800 4359 1 2 65
MetroPCS Communications PCS 7.53 4424560 107967 2901 1 1 523
Pultegroup PHM 11.80 6834683 262420 4604 1 1 319
PerkinElmer PKI 23.98 1268774 78114 2127 1 2 72
Ryder System R 44.01 631889 47422 2085 2 5 11
Reynolds American RAI 54.44 773387 56236 2076 1 4 22
Schlumberger SLB 67.94 9476060 440839 10286 1 2 39
Teco Energy TE 16.52 1070815 70318 1807 1 1 148
Time Warner Cable TWC 53.21 1770234 88286 3554 2 3 22
Whirlpool WHR 97.73 1424264 134152 3348 4 9 10
Windstream WIN 11.03 2508830 104887 2937 1 1 798
Watson Pharmaceuticals WPI 42.51 895967 63094 2024 1 3 29
XTO Energy XTO 48.13 7219436 612804 5040 1 7 225
Grand mean 51.75 7512376 223232 4552 2 6 227
Table 1 presents the average mid-price, daily transaction volume, daily number of best
quote updates, daily number of trades, spread and the depth at the best bid and ask for 50
randomly chosen U.S. stocks. All values are calculated from the filtered data, which consist of
21 trading days during April, 2010.
Table 2. Relation between price changes and order flow imbalance.
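Columns: Ticker, average results ($\hat\alpha$, $t(\hat\alpha)$, $\hat\beta$, $t(\hat\beta)$, $\hat\gamma_Q$, $t(\hat\gamma_Q)$, $R^2$), hypothesis testing ($\{\alpha \neq 0\}$, $\{\beta \neq 0\}$, $\{\gamma_Q \neq 0\}$)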
AMD -0.0032 -0.17 0.0008 9.96 1.4E-07 0.68 64% 0% 98% 22%
APOL 0.0038 0.10 0.0555 10.32 -2.2E-04 -1.17 63% 12% 91% 4%
AXP 0.0019 0.08 0.0082 13.87 -3.8E-06 -0.88 69% 11% 100% 5%
AZO 0.0101 0.33 0.1619 6.39 -9.3E-04 -0.89 47% 23% 97% 3%
BAC -0.0018 -0.09 0.0002 18.36 1.9E-09 0.01 79% 1% 100% 8%
BDX -0.0008 -0.06 0.0536 10.08 -1.1E-04 -0.38 63% 9% 100% 8%
BK -0.0078 -0.19 0.0069 14.97 -4.0E-06 -0.57 74% 3% 100% 6%
BSX 0.0000 -0.01 0.0003 6.12 7.8E-08 1.14 58% 0% 81% 22%
BTU 0.0048 0.12 0.0242 14.51 -3.5E-05 -1.26 72% 11% 100% 3%
CAT 0.0147 0.23 0.0194 14.85 -1.9E-05 -1.13 71% 12% 99% 3%
CB -0.0086 -0.07 0.0191 11.97 -3.5E-07 0.00 64% 5% 100% 8%
CCL -0.0067 -0.18 0.0140 13.88 -1.2E-05 -0.64 70% 3% 99% 7%
CINF -0.0030 -0.02 0.0260 10.73 -7.0E-06 0.27 70% 1% 98% 16%
CME 0.0506 0.05 0.6262 4.98 -7.2E-03 -0.99 35% 15% 94% 2%
COH -0.0221 -0.45 0.0179 12.75 -1.7E-05 -0.77 69% 2% 100% 3%
COP -0.0008 0.06 0.0084 12.50 -5.8E-06 -1.17 68% 10% 100% 3%
CVH -0.0034 -0.06 0.0217 10.83 7.6E-06 0.20 65% 3% 99% 10%
DNR -0.0008 -0.04 0.0045 12.76 -1.3E-07 0.19 69% 1% 99% 13%
DVN 0.0112 0.18 0.0370 11.48 -1.0E-04 -1.59 65% 17% 97% 0%
EFX -0.0032 -0.04 0.0222 8.71 6.4E-05 0.64 56% 1% 98% 18%
ETN -0.0076 0.05 0.0712 10.51 -2.3E-04 -1.14 65% 14% 98% 1%
FISV -0.0002 0.06 0.0397 10.42 -2.3E-05 -0.19 63% 4% 100% 8%
HAS -0.0031 -0.02 0.0222 11.45 4.7E-06 0.21 67% 3% 100% 16%
HCP -0.0078 -0.17 0.0150 13.60 -1.4E-05 -0.46 67% 2% 100% 6%
HOT -0.0012 0.05 0.0345 12.64 -7.2E-05 -1.21 68% 10% 99% 2%
KSS -0.0030 -0.04 0.0317 13.82 -5.4E-05 -0.80 71% 10% 98% 3%
LLL 0.0160 0.32 0.1000 11.76 -3.8E-04 -0.75 67% 14% 96% 3%
LMT 0.0006 0.00 0.0520 13.58 -1.2E-04 -0.98 72% 14% 100% 1%
M -0.0010 0.04 0.0043 15.82 8.8E-08 0.13 75% 0% 100% 12%
MAR -0.0039 -0.02 0.0121 14.61 -4.1E-06 -0.23 71% 3% 100% 4%
MFE 0.0087 0.16 0.0205 12.72 -3.8E-05 -0.38 68% 7% 100% 7%
MHP -0.0073 -0.13 0.0211 11.62 5.8E-06 0.14 68% 2% 99% 11%
MHS -0.0055 -0.16 0.0334 11.70 -8.3E-05 -1.10 66% 9% 99% 3%
MRK -0.0065 -0.20 0.0032 12.53 -5.4E-07 -0.38 69% 1% 100% 8%
MRO 0.0018 0.07 0.0058 13.67 -3.6E-07 0.22 69% 5% 100% 13%
MWV -0.0011 0.01 0.0205 11.79 -1.7E-05 -0.25 68% 3% 100% 7%
NEM -0.0102 -0.22 0.0170 13.81 -1.9E-05 -1.36 71% 8% 100% 2%
OMC -0.0099 -0.28 0.0144 11.88 -4.5E-06 -0.01 65% 2% 99% 13%
PCS -0.0006 -0.03 0.0015 5.21 1.8E-06 1.01 53% 0% 79% 24%
PHM 0.0006 0.03 0.0027 10.33 8.4E-07 0.55 66% 1% 98% 21%
PKI -0.0004 -0.03 0.0102 7.25 4.1E-05 1.10 53% 2% 94% 29%
R 0.0006 0.03 0.0667 10.14 3.7E-05 0.01 63% 8% 98% 10%
RAI -0.0070 -0.10 0.0396 10.40 2.6E-05 0.01 66% 5% 100% 11%
SLB -0.0077 -0.15 0.0198 16.76 -1.8E-05 -1.15 76% 7% 100% 1%
TE 0.0011 0.05 0.0049 6.66 1.4E-05 1.45 54% 2% 86% 30%
TWC -0.0130 -0.13 0.0384 11.80 -5.6E-05 -0.44 64% 8% 99% 5%
WHR 0.0628 0.63 0.1278 10.26 -3.3E-04 -0.80 65% 22% 97% 4%
WIN -0.0004 -0.03 0.0009 3.12 1.5E-06 0.76 44% 1% 60% 15%
WPI -0.0090 -0.21 0.0270 10.47 2.9E-05 0.28 66% 3% 98% 14%
XTO -0.0088 -0.18 0.0029 13.28 2.7E-07 0.30 65% 0% 100% 18%
Average 0.0002 -0.02 0.0398 11.47 -2.0E-04 -0.28 65% 6% 97% 9%
Table 2 presents a cross-section of results (averaged across time) for the regressions:
where $\Delta P_k$ are the 10-second mid-price changes and $OFI_k$ are the contemporaneous order flow imbalances. These regressions
were estimated using 273 half-hour subsamples (indexed by $i$) for each stock and their outputs were averaged across
subsamples. Each subsample typically contains about 180 observations (indexed by $k$). The t-statistics were computed
using White’s standard errors. For brevity, we report the $R^2$, the average $\hat\alpha_i$ and the average $\hat\beta_i$ only for the first regression
(with a single $OFI_k$ term). There is almost no difference between averages of estimates $\hat\beta_i$ and $\hat\beta_{Q,i}$ and the $R^2$ in the two
regressions. The last three columns report the percentage of samples where the coefficient(s) passed the z-test at the 5%
significance level.
Next, we estimate the parameters λ and c in (3). For each stock, we first obtain λ̂ fit via
a loglinear regression:
$$\hat\beta_i = \hat\alpha_{M,i} + \frac{\hat c}{AD_i^{\hat\lambda}} + \hat\epsilon_{M,i} \qquad (6)$$
Both regressions are estimated using ordinary least squares. The results are presented in
Table 3: the quality of these fits convincingly demonstrates that the instantaneous price impact
(measured by $\hat\beta_i$) is inversely related to market depth. There are three stocks with bad
fits (namely APOL, AZO and CME) and we note that they also have wide spreads and low values
of depth. It is possible that for these stocks other factors, such as the presence of hidden orders
and depth beyond the best price levels of the order book, may dominate the instantaneous price
impact. The intercept $\hat\alpha_{L,i}$ is highly statistically significant (being an estimate of
parameter $c$) and $\hat\alpha_{M,i}$, which is included to absorb the means, is mostly
insignificant. Since the residuals of these regressions appear to be autocorrelated, the
t-statistics and confidence intervals in Table 3 are computed with Newey-West standard errors.
Consistent with our intuition for (1), estimates $\hat\lambda$ are very close to 1 across stocks
and the hypothesis $\{\lambda = 1\}$ cannot be rejected for 35 out of 50 stocks. The restricted
model (with $\lambda = 1$) also demonstrates a good quality of fit, making this a good
approximation. However, the coefficient $\hat c$ is generally different from $c = \frac{1}{2}$ in
(1). Lower values of $\hat c$ mean that mid-prices are (on average) more resilient to the incoming
orders than indicated by $AD_i$ (which is only a rough measure of market depth). In summary,
$\lambda = 1$ appears to be a good approximation for most of the stocks and only the constant $c$
needs to be calibrated to the data. The general case of regression (5) is illustrated in Figure 4
by a scatter plot for a representative stock.
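The two-step estimation can be sketched as follows. This is a hedged illustration, not the code used for Table 3: it assumes that the loglinear regression has the form $\log \hat\beta_i = \text{const} - \lambda \log AD_i + \text{noise}$, which is what relation (3) suggests, it requires positive values of $\hat\beta_i$, and it leaves out the Newey-West standard errors. The helper `_ols` and the function name are hypothetical.

```python
import math


def _ols(x, y):
    """Slope and intercept of an OLS fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def fit_depth_relation(depths, betas):
    """Return (c_hat, lam_hat) for beta ~ c / depth**lam."""
    # Step 1: the slope of log(beta) on log(depth) estimates -lam.
    slope, _ = _ols([math.log(d) for d in depths], [math.log(b) for b in betas])
    lam_hat = -slope
    # Step 2: regress beta on depth**(-lam_hat) with an intercept; the slope estimates c.
    c_hat, _ = _ols([d ** -lam_hat for d in depths], betas)
    return c_hat, lam_hat
```

On noise-free inputs generated as `beta = 0.3 / depth`, the function recovers values of `lam_hat` and `c_hat` close to 1 and 0.3 respectively.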
Figure 4: Log-log scatter plot of the price impact coefficient estimate $\hat\beta_i$ against average market
depth $AD_i$ for the Schlumberger stock (SLB).
Table 3. Relation between the price impact coefficient and market depth.
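Columns: Ticker, parameter estimates ($\hat c$, $\hat\lambda$, $t(\hat c)$, $t(\hat\lambda)$), 5% confidence intervals ($\hat c_l$, $\hat c_u$, $\hat\lambda_l$, $\hat\lambda_u$), fit measures ($R^2$, $\mathrm{corr}[\hat\beta, \hat{\hat\beta}]^2$, $\mathrm{corr}[\hat\beta, \hat\beta^*]^2$)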
AMD 0.23 0.94 27.74 23.11 0.22 0.25 0.86 1.02 78% 86% 86%
APOL 0.27 0.36 4.43 1.05 0.15 0.39 -0.32 1.04 2% 30% 31%
AXP 0.14 0.83 13.95 26.48 0.12 0.16 0.77 0.89 84% 76% 76%
AZO 0.39 0.67 5.48 5.10 0.25 0.53 0.41 0.92 13% 17% 16%
BAC 0.27 0.96 25.27 19.74 0.25 0.29 0.90 1.03 76% 87% 87%
BDX 0.38 1.04 22.83 18.64 0.35 0.41 0.93 1.15 71% 68% 68%
BK 0.21 0.92 17.52 54.54 0.19 0.24 0.88 0.95 93% 91% 90%
BSX 0.35 0.98 14.98 24.55 0.31 0.40 0.90 1.05 73% 81% 81%
BTU 0.42 1.12 40.90 36.77 0.40 0.44 1.06 1.18 87% 83% 83%
CAT 0.29 0.96 21.70 16.87 0.27 0.32 0.85 1.07 87% 83% 83%
CB 0.32 1.02 27.08 49.61 0.30 0.34 0.98 1.06 92% 89% 89%
CCL 0.26 0.96 24.36 37.55 0.24 0.29 0.91 1.01 87% 83% 83%
CINF 0.31 0.97 20.05 47.39 0.28 0.34 0.93 1.01 92% 88% 88%
CME 1.27 0.50 2.55 1.99 0.29 2.24 0.01 0.99 2% 4% 3%
COH 0.37 1.05 15.29 36.65 0.32 0.43 0.98 1.12 77% 75% 75%
COP 0.13 0.80 8.52 15.95 0.10 0.16 0.70 0.89 75% 66% 66%
CVH 0.32 1.03 26.50 37.51 0.29 0.34 0.98 1.08 89% 89% 89%
DNR 0.23 0.96 32.44 40.90 0.22 0.24 0.92 1.01 91% 89% 89%
DVN 0.26 0.91 13.50 16.66 0.22 0.30 0.80 1.02 45% 56% 56%
EFX 0.30 0.99 20.16 26.13 0.27 0.33 0.92 1.07 84% 79% 79%
ETN 0.45 1.07 11.51 17.34 0.38 0.53 0.95 1.19 60% 56% 56%
FISV 0.34 1.01 23.35 30.70 0.31 0.36 0.94 1.07 84% 77% 77%
HAS 0.32 1.00 26.36 46.00 0.30 0.34 0.96 1.05 89% 83% 83%
HCP 0.19 0.89 22.93 51.27 0.17 0.21 0.86 0.93 94% 90% 90%
HOT 0.44 1.12 19.53 26.59 0.40 0.48 1.04 1.20 82% 80% 79%
KSS 0.39 1.05 24.40 33.17 0.36 0.42 0.99 1.11 85% 78% 78%
LLL 0.43 1.01 13.21 14.45 0.37 0.50 0.87 1.14 51% 58% 58%
LMT 0.50 1.14 7.31 13.49 0.37 0.64 0.98 1.31 60% 52% 52%
M 0.19 0.90 37.41 57.39 0.18 0.20 0.87 0.93 94% 92% 92%
MAR 0.28 0.98 22.58 50.20 0.25 0.30 0.94 1.02 92% 88% 88%
MFE 0.31 1.01 20.28 46.20 0.28 0.34 0.96 1.05 91% 86% 86%
MHP 0.27 0.94 19.60 33.62 0.24 0.30 0.89 1.00 82% 74% 74%
MHS 0.53 1.16 17.03 34.25 0.47 0.59 1.10 1.23 85% 81% 80%
MRK 0.13 0.81 18.07 32.20 0.11 0.14 0.76 0.86 87% 81% 81%
MRO 0.23 0.94 35.54 49.68 0.21 0.24 0.91 0.98 94% 93% 93%
MWV 0.32 1.05 28.07 37.81 0.30 0.34 1.00 1.10 90% 85% 85%
NEM 0.26 0.98 18.79 25.97 0.23 0.28 0.91 1.05 81% 77% 77%
OMC 0.30 0.96 29.47 17.76 0.28 0.32 0.85 1.06 83% 85% 85%
PCS 0.30 1.02 21.27 18.73 0.27 0.33 0.90 1.14 53% 82% 82%
PHM 0.28 0.98 36.43 35.12 0.26 0.29 0.93 1.04 86% 90% 90%
PKI 0.30 1.07 26.59 38.35 0.28 0.32 1.00 1.13 82% 88% 87%
R 0.37 1.02 18.51 15.76 0.33 0.41 0.90 1.15 57% 58% 58%
RAI 0.35 1.03 24.94 40.46 0.32 0.38 0.98 1.08 86% 76% 76%
SLB 0.35 1.06 18.98 40.60 0.31 0.38 1.01 1.12 91% 88% 88%
TE 0.21 1.00 16.18 24.28 0.18 0.24 0.92 1.09 70% 86% 86%
TWC 0.37 1.04 17.70 15.96 0.33 0.42 0.91 1.16 72% 79% 79%
WHR 0.78 1.18 9.24 11.54 0.61 0.94 0.98 1.38 44% 43% 42%
WIN 5.81 1.60 16.09 11.70 5.11 6.52 1.33 1.87 28% 71% 71%
WPI 0.27 0.92 19.33 28.99 0.24 0.30 0.86 0.98 78% 76% 76%
XTO 0.31 1.04 30.85 39.51 0.29 0.33 0.98 1.09 89% 91% 91%
Grand mean 0.45 0.98 20.74 29.53 0.38 0.52 0.88 1.08 74% 75% 75%
where $\hat\beta_i$ is the price impact coefficient for the i-th half-hour subsample and $AD_i$ is the average market depth for that
subsample. These regressions were estimated for each of the 50 stocks, using 273 estimates of $\hat\beta_i$ for that stock, obtained
from (4). The second regression uses estimates $\hat\lambda$ obtained from the first regression. The t-statistics and the confidence
intervals were computed using Newey-West standard errors. Confidence intervals are built with normal critical values.
The last three columns provide three alternative fit measures - the $R^2$ of the linear regression (5), the squared correlation
between $\hat\beta_i$ and fitted values $\hat{\hat\beta}_i = \hat c / AD_i^{\hat\lambda}$ and the squared correlation between $\hat\beta_i$ and $\hat\beta^*_i = \hat c / AD_i$.
3.3 Intraday patterns
The link that we established between the price impact and the market depth has an important
implication. Since the market depth follows a predictable pattern of intraday seasonality ([1],
[31]), the price impact coefficient must also have a predictable intraday pattern. To demonstrate
this, we averaged $\hat\beta_i$ for each stock and each half-hour interval across days, which gives
the intraday seasonality pattern for that stock. We then normalized these values by the average
$\hat\beta_i$ of that stock and averaged the normalized seasonality patterns across stocks. The
same procedure was repeated for $AD_i$ and the results are shown in Figure 5.
Figure 5: Intraday patterns in the price impact coefficient β̂i and the average depth ADi .
Near the market open, depth is two times lower than it is on average, indicating that the
order book is relatively shallow. In a shallow market, the incoming orders can easily affect the
mid-price and the price impact coefficient is two times higher near the market open than on
average. Moreover, price impact is five times higher at the market open compared to the market
close.
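The averaging and normalisation behind these intraday patterns can be sketched as follows. This is a minimal illustration with a hypothetical input layout (a dictionary mapping each ticker to a list of days, each day being a list of half-hour values); it is not the code behind Figure 5.

```python
def intraday_pattern(values):
    """values[ticker][day][slot] -> cross-stock average of normalised slot means."""
    patterns = []
    for days in values.values():
        n_slots = len(days[0])
        # Average each half-hour slot across days for this stock.
        by_slot = [sum(day[s] for day in days) / len(days) for s in range(n_slots)]
        # Normalise by the stock's overall average so that stocks are comparable.
        stock_mean = sum(sum(day) for day in days) / (len(days) * n_slots)
        patterns.append([v / stock_mean for v in by_slot])
    # Average the normalised patterns across stocks.
    return [sum(p[s] for p in patterns) / len(patterns)
            for s in range(len(patterns[0]))]
```

A value of 2 in the first slot of the result would correspond to a quantity that is, on average, twice as large near the market open as over the whole day.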
The intraday pattern in price impact can be used to explain the intraday patterns in
price volatility, observed by many researchers ([1], [4], [21], [33]). Similarly to the price impact
coefficient and the market depth, we computed the intraday patterns in variances of $\Delta P_k$ and
$OFI_k$, using half-hour subsamples (indexed by $i$). Taking the variance on both sides of equation
(2) demonstrates the link between $\mathrm{var}[\Delta P_k]_i$, $\mathrm{var}[OFI_k]_i$ and $\beta_i$:
$$\mathrm{var}[\Delta P_k]_i = \beta_i^2\, \mathrm{var}[OFI_k]_i + \mathrm{var}[\epsilon_k]_i \qquad (7)$$
Figure 6: Intraday seasonality in variances $\mathrm{var}[\Delta P_k]_i$, $\mathrm{var}[OFI_k]_i$, the price impact coefficient
$\hat\beta_i$ and the expression $\beta_i^2\, \mathrm{var}[OFI_k]_i$.
The intraday pattern in price variance was explained by Madhavan et al. [33] in terms
of a structural model. They argued that the volatility is higher in the morning because of the
higher inflow of both public and private information. Similarly, Hasbrouck [21] argued that the
peak of price volatility at market open is mostly due to higher intensity of public information.
Both studies agree that the impact of trades is larger in the morning. Our model contributes to
this discussion by explaining the peak of price volatility using tangible quantities, rather than
unobservable parameters. We also argue that the price impact of trades and the information
asymmetry may be, in fact, two sides of the same coin.
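The variance relation behind this argument can be checked on any subsample. The sketch below is illustrative, with a hypothetical function name: it fits the slope of price changes on order flow imbalance by ordinary least squares and splits the variance of price changes into the part attributed to order flow, $\beta^2\,\mathrm{var}[OFI_k]$, and the residual variance. For a least-squares fit with an intercept the two parts add up to the total exactly, because the residuals are uncorrelated with the regressor in sample.

```python
import statistics


def variance_decomposition(ofi, dp):
    """Return (var_dp, explained, residual) for an OLS fit of dp on ofi."""
    mx, my = statistics.fmean(ofi), statistics.fmean(dp)
    sxx = sum((x - mx) ** 2 for x in ofi)
    beta = sum((x - mx) * (y - my) for x, y in zip(ofi, dp)) / sxx
    resid = [y - my - beta * (x - mx) for x, y in zip(ofi, dp)]
    return (statistics.pvariance(dp),
            beta ** 2 * statistics.pvariance(ofi),   # part driven by order flow
            statistics.pvariance(resid))             # part left to the noise term
```

The ratio of the second to the first returned value equals the $R^2$ of the fit, so a high $R^2$ means that most of the price variance in the subsample is accounted for by $\beta^2\,\mathrm{var}[OFI_k]$.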
First, we associate the higher volatility of order flow imbalance at the market open and close
with a higher rate of trading, that is, higher inflow of public and private information. Second,
if the bid-ask spread is small (it is mostly equal to 1 cent in our data), limit order traders may
avoid being “picked off” only by lowering the number of submitted orders, reducing the depth.
Therefore, if limit order traders are aware of information asymmetry in the morning, the low
depth may simply indicate this asymmetry. In our model, low depth also implies a higher price
impact, making the information advantages harder to realize at the market open.
4 Price impact of trades
4.1 Trade imbalance vs order flow imbalance
The previous section discussed the linear relation between price changes and $OFI_k$, our measure
of supply/demand imbalance. However, little has been said about trade imbalances, which are
widely used in the academic literature [10, 20, 22, 24, 29, 38] and in practice [43]. The aim of
this section is to compare the price impact of trades and order flow imbalance and show that
the (nonlinear) price impact of trade volume may be derived from our linear model for the price
impact of order flow.
For convenience we will call 'buy trade' a transaction initiated by a market buy order and
'sell trade' a transaction initiated by a market sell order. We define the trade imbalance during
a time interval $[t_{k-1}, t_k]$ as the difference between volumes of buy and sell trades during that
interval:
$$TI_k = \sum_{n=N(t_{k-1})+1}^{N(t_k)} b_n - \sum_{n=N(t_{k-1})+1}^{N(t_k)} s_n.$$
Here, $b_n$ is the size of a buyer-initiated trade that occurs at the $n$-th quote; $b_n = 0$ if no buy
trade occurs at that quote. Similarly, $s_n$ is the size of a sell trade that occurs at the $n$-th quote,
or zero if no sell trade occurs. The procedure that matches trades with quotes and classifies them
as buys or sells is described in the Appendix.
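The definition of the trade imbalance can be sketched in Python as follows. This is a minimal illustration, assuming that the per-quote buy and sell sizes and the quote counts $N(t_k)$ are already available as arrays; the function name `trade_imbalance` and the sample numbers are hypothetical.

```python
import numpy as np

def trade_imbalance(buy_sizes, sell_sizes, boundaries):
    """TI_k = sum of b_n minus sum of s_n over quotes n in (N(t_{k-1}), N(t_k)].

    buy_sizes[n-1], sell_sizes[n-1]: b_n and s_n for the n-th quote (0 if no trade).
    boundaries: the quote counts N(t_0) <= N(t_1) <= ... <= N(t_K).
    """
    signed = np.asarray(buy_sizes, dtype=float) - np.asarray(sell_sizes, dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(signed)))   # cum[m] = sum over n <= m
    idx = np.asarray(boundaries, dtype=int)
    return cum[idx[1:]] - cum[idx[:-1]]

b = [100, 0, 0, 300, 0, 0]     # buy sizes at quotes 1..6
s = [0, 0, 200, 0, 0, 500]     # sell sizes at quotes 1..6
ti = trade_imbalance(b, s, boundaries=[0, 3, 6])   # two intervals of three quotes
# ti is [-100., -200.]
```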
To compare the explanatory power of trade and order flow imbalances with respect to price
changes, we perform the following regressions:
The regressions are estimated separately for every half-hour subsample of data (indexed by $i$).
If the effect of trades is included in the order flow imbalance, the coefficients $\hat\theta_{T,i}$ in (8c) must
be indistinguishable from zero. We note that regressions (8a-8c) contain only linear terms,
because we found no evidence of non-linear price impact in our data (for either $OFI_k$ or
$TI_k$). The average results of these regressions are presented in Panel A of Table 4. Clearly,
when $OFI_k$ and $TI_k$ are taken individually, each of them has a statistically significant influence
on price changes. Comparing the two, we observe that $OFI_k$ explains price changes better
than $TI_k$: the average $R^2$ for order flow imbalance is 65%, compared to 32% for the trade
imbalance. When the two variables are used together to explain price changes, the dependence on
trade imbalance becomes questionable. The average t-statistic of $TI_k$ decreases by a factor of
four and the coefficients $\hat\theta_{T,i}$ are statistically significant in only 31% of subsamples. However,
the dependence on $OFI_k$ remains convincingly strong.
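The comparison of the three specifications (order flow imbalance alone, trade imbalance alone, both together) can be sketched as below. This is a hedged illustration rather than the estimation code used in the study: it assumes plain OLS with an intercept, and the helper `ols`, the sample size and the synthetic data-generating process (prices driven by order flow imbalance only, with trades correlated to it) are all hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """OLS with intercept; returns coefficients, t-statistics and R^2."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(x, dtype=float) for x in regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    cov = resid @ resid / dof * np.linalg.inv(X.T @ X)   # homoskedastic covariance
    tstat = coef / np.sqrt(np.diag(cov))
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef, tstat, r2

rng = np.random.default_rng(1)
n = 180                                      # hypothetical number of intervals in a subsample
ofi = rng.normal(size=n)
ti = 0.6 * ofi + 0.8 * rng.normal(size=n)    # trade imbalance correlated with OFI
dp = 1.0 * ofi + 0.5 * rng.normal(size=n)    # assumption: prices respond to OFI only

_, t_ofi, r2_ofi = ols(dp, ofi)              # order flow imbalance alone
_, t_ti, r2_ti = ols(dp, ti)                 # trade imbalance alone
_, t_both, r2_both = ols(dp, ofi, ti)        # both covariates: t_both = [const, OFI, TI]
```

In this synthetic setting the trade imbalance looks significant on its own only through its correlation with order flow imbalance, which is the pattern the joint regression is designed to detect.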
Our findings show that:
1. The order flow imbalance $OFI_k$ explains price movements better than the imbalance of
trades.
Table 4. Comparison of order flow imbalance and trade imbalance.
Panel A: Detailed results for changes in mid prices
Columns 2-5: order flow imbalance alone; columns 6-9: trade imbalance alone; columns 10-15: both covariates.
Ticker  R²  t(β̂)  {β ≠ 0}  F  R²  t(β̂_T)  {β_T ≠ 0}  F  R²  t(θ̂_O)  t(θ̂_T)  {θ_O ≠ 0}  {θ_T ≠ 0}  F
AMD 64% 9.96 98% 382 39% 4.15 86% 140 67% 6.49 1.26 93% 34% 214
APOL 63% 10.32 91% 396 30% 4.14 84% 83 66% 8.00 1.09 89% 26% 211
AXP 69% 13.87 100% 449 34% 4.72 83% 101 71% 10.05 1.50 100% 44% 241
AZO 47% 6.39 97% 179 30% 4.09 90% 87 54% 5.02 2.34 96% 68% 118
BAC 79% 18.36 100% 774 45% 6.31 96% 157 80% 12.55 0.72 99% 19% 397
BDX 63% 10.08 100% 362 28% 4.02 82% 79 65% 7.88 1.23 97% 34% 195
BK 74% 14.97 100% 610 36% 4.58 81% 117 75% 10.68 0.68 99% 17% 313
BSX 58% 6.12 81% 338 31% 2.57 54% 106 62% 4.51 0.57 73% 12% 189
BTU 72% 14.51 100% 527 35% 5.21 88% 103 74% 10.90 1.31 99% 32% 277
CAT 71% 14.85 99% 498 33% 5.01 86% 94 72% 11.27 1.28 99% 38% 262
CB 64% 11.97 100% 378 33% 4.66 88% 102 66% 8.42 1.34 99% 37% 202
CCL 70% 13.88 99% 478 32% 4.55 85% 93 71% 10.50 0.98 99% 26% 247
CINF 70% 10.73 98% 552 39% 4.26 87% 141 72% 7.17 1.01 96% 27% 297
CME 35% 4.98 94% 112 24% 3.39 75% 63 44% 4.10 2.18 92% 59% 78
COH 69% 12.75 100% 457 29% 3.91 82% 80 70% 10.06 0.87 100% 22% 238
COP 68% 12.50 100% 450 35% 4.92 84% 107 70% 9.19 1.42 100% 40% 240
CVH 65% 10.83 99% 418 35% 4.10 84% 114 67% 7.30 1.01 97% 25% 222
DNR 69% 12.76 99% 471 32% 3.98 81% 101 70% 9.29 1.01 97% 24% 246
DVN 65% 11.48 97% 414 33% 4.83 88% 96 68% 8.58 1.70 93% 48% 226
EFX 56% 8.71 98% 289 31% 3.72 80% 101 60% 6.21 1.64 96% 43% 167
ETN 65% 10.51 98% 389 25% 3.59 71% 69 67% 8.66 1.04 98% 29% 209
FISV 63% 10.42 100% 380 28% 3.79 81% 79 65% 8.12 0.88 100% 25% 201
HAS 67% 11.45 100% 427 32% 4.04 84% 97 68% 8.53 0.89 100% 24% 223
HCP 67% 13.60 100% 417 31% 4.43 82% 91 68% 10.01 1.05 100% 32% 217
HOT 68% 12.64 99% 438 27% 3.86 77% 74 70% 9.94 1.17 99% 29% 231
KSS 71% 13.82 98% 525 31% 4.43 81% 91 72% 10.83 0.94 97% 25% 274
LLL 67% 11.76 96% 485 36% 5.07 90% 117 70% 8.58 1.63 94% 44% 270
LMT 72% 13.58 100% 516 35% 4.89 90% 105 73% 10.19 1.50 99% 40% 277
M 75% 15.82 100% 640 35% 4.41 84% 108 76% 11.38 0.97 100% 26% 330
MAR 71% 14.61 100% 498 34% 4.77 89% 105 72% 10.45 1.05 100% 27% 258
MFE 68% 12.72 100% 463 31% 4.17 82% 93 69% 9.06 0.73 99% 18% 239
MHP 68% 11.62 99% 489 31% 3.85 84% 96 70% 8.92 0.77 98% 19% 257
MHS 66% 11.70 99% 414 28% 4.03 77% 80 68% 9.10 1.11 99% 27% 218
MRK 69% 12.53 100% 451 31% 4.08 82% 93 70% 9.20 0.76 100% 20% 235
MRO 69% 13.67 100% 465 35% 4.66 89% 104 70% 9.73 0.91 100% 24% 241
MWV 68% 11.79 100% 452 34% 4.37 86% 102 69% 8.63 0.80 100% 24% 237
NEM 71% 13.81 100% 490 34% 4.99 81% 100 72% 10.24 1.53 99% 43% 260
OMC 65% 11.88 99% 411 30% 4.14 85% 88 67% 8.99 0.96 99% 24% 216
PCS 53% 5.21 79% 297 35% 2.68 59% 169 58% 3.44 0.86 71% 20% 195
PHM 66% 10.33 98% 416 35% 3.87 84% 115 68% 7.28 0.95 93% 29% 224
PKI 53% 7.25 94% 263 28% 3.03 70% 89 57% 5.39 1.24 88% 32% 148
R 63% 10.14 98% 352 27% 3.92 86% 71 65% 8.07 1.20 97% 30% 188
RAI 66% 10.40 100% 422 36% 4.67 89% 111 68% 7.52 1.11 99% 31% 224
SLB 76% 16.76 100% 644 32% 4.54 79% 94 77% 13.02 1.24 100% 36% 336
TE 54% 6.66 86% 301 37% 3.27 67% 175 60% 4.34 1.32 79% 29% 200
TWC 64% 11.80 99% 377 31% 4.26 77% 93 66% 8.46 1.34 99% 37% 201
WHR 65% 10.26 97% 394 29% 4.29 88% 85 67% 8.17 1.43 96% 39% 217
WIN 44% 3.12 60% 243 41% 2.68 54% 249 58% 1.78 1.39 42% 29% 206
WPI 66% 10.47 98% 437 32% 3.91 83% 100 68% 7.82 1.05 97% 30% 232
XTO 65% 13.28 100% 399 21% 3.05 63% 54 66% 10.72 1.05 100% 27% 209
Grand mean 65% 11.47 97% 429 32% 4.18 81% 103 67% 8.49 1.16 95% 31% 231
Panel B: Average results for changes in transaction prices
L = 2 trades 14% 15.74 98% 464 1% 2.69 63% 26 15% 14.17 -2.58 98% 54% 245
L = 5 trades 38% 19.42 98% 753 8% 4.50 75% 113 39% 16.85 -0.20 98% 9% 379
L = 10 trades 51% 17.78 98% 655 13% 4.55 75% 100 51% 14.97 0.57 98% 9% 329
As a robustness check, we repeated regressions (8a-8c) with differences between transaction
prices $P^t_k$ instead of differences in mid-prices $P_k$. This time price differences were computed in
trade time as $\Delta^L P^t_k = P^t_k - P^t_{k-L}$ for $L$ trades. The average results across five stocks, picked at
random,⁵ are presented in Panel B of Table 4. Our findings for transaction prices are essentially
the same as for mid-prices: $OFI_k$ explains price changes better than $TI_k$. Moreover, the effect
of trades on prices seems to be captured by the order flow imbalance. The variable $TI_k$ becomes
statistically insignificant when used together with $OFI_k$ in the regression, and the increase in $R^2$
from adding $TI_k$ as an extra regressor is not economically significant.
Interestingly, we found that the relation between $\Delta^L P^t_k$ and $OFI_k$ is concave in some
samples, and similarly for $\Delta^L P^t_k$ and $TI_k$. We estimated regressions (8a) and (8b) for transaction
price changes with additional quadratic terms $OFI_k|OFI_k|$ (respectively, $TI_k|TI_k|$) and found
that they are significant in nearly half of the samples, with t-statistics of -2.8 on average (-2.3
for $TI_k|TI_k|$). Sampling data at special times (trade times) may introduce biases to the right
side of the regression. One possible explanation is that traders submit their orders when they
expect their impact to be minimal, leading to a concave (sublinear) impact. Supporting this
idea of sampling biases, we found that when mid-prices are sampled at trade times, the price
impact of $OFI_k$ is again concave in some samples. On the other hand, when we regressed last
trade prices sampled at 1-minute frequency on $OFI_k$, we observed the concave price impact
once again. This suggests that using either trade times or trade prices may lead to non-linear
price impact. However, the quadratic term in our regressions is insignificant in about half of the
samples and only marginally significant in the other half.
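The concavity check with a signed quadratic term can be illustrated as follows. This is a sketch on synthetic data, assuming an OLS fit with an intercept; the function name `quadratic_term_tstat` and the two simulated impact shapes (a square-root response and a linear response) are hypothetical and only serve to show the sign of the t-statistic.

```python
import numpy as np

def quadratic_term_tstat(dp, x):
    """t-statistic of c in the OLS fit dp = a + b*x + c*x*|x| + e."""
    dp = np.asarray(dp, dtype=float)
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x * np.abs(x)])
    coef, *_ = np.linalg.lstsq(X, dp, rcond=None)
    resid = dp - X @ coef
    s2 = resid @ resid / (len(dp) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef[2] / se[2]

rng = np.random.default_rng(3)
x = rng.normal(size=2000)                     # synthetic imbalance variable
noise = 0.3 * rng.normal(size=2000)
# Concave (sublinear) impact: the signed quadratic term gets a negative coefficient.
t_concave = quadratic_term_tstat(np.sign(x) * np.sqrt(np.abs(x)) + noise, x)
# Linear impact: the quadratic term should be statistically indistinguishable from zero.
t_linear = quadratic_term_tstat(x + noise, x)
```

The term $x|x|$ keeps the sign of the imbalance, so a negative coefficient means that large imbalances move prices less than a linear extrapolation would suggest.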
where $w_n = b_n + s_n$ is the size of any trade (either buy or sell) if it occurs at the $n$-th quote, or zero
otherwise. Comparing this definition with the definition of $OFI_k$, we note that both quantities
are sums of random variables. As the aggregation window $[t_{k-1}, t_k]$ becomes progressively larger,
the behavior of these sums (under certain assumptions) will be governed by the Law of Large
Numbers and the Central Limit Theorem. We consider a time interval $[0, T)$ and denote by
$N(T)$ the number of order book events during that time interval. We also denote by $OFI(T)$
and $VOL(T)$, respectively, the order flow imbalance and the traded volume during $[0, T)$. The
following proposition shows a link between $OFI(T)$ and $VOL(T)$ as $T$ grows.

⁵ The stock tickers were BDX, CB, MHS, PHM and PKI. We computed price changes for L = 2, 5, 10 trades to
mitigate the possible issues with trade and quote alignment in the TAQ data and we correspondingly computed
order flow imbalances and trade imbalances during the time intervals between 2, 5 or 10 consecutive trades. To
ensure that there is an ample amount of data for each regression, we pooled data across days for each stock and
each time interval.
2. $\{e_i\}_{i=1}^{\infty}$ are i.i.d. random variables with a finite variance $\sigma^2$,
3. $\{w_i\}_{i=1}^{\infty}$ are i.i.d. random variables with a finite mean $\mu\pi$, where $\pi$ is the proportion of
order book events that correspond to trades and $\mu$ is the mean trade size.
Then
$$\frac{\sqrt{\mu\pi}\,OFI(T)}{\sigma\sqrt{VOL(T)}} \Rightarrow \xi, \quad \text{as } T \to \infty, \tag{9}$$
where $\xi \sim N(0, 1)$ is a standard normal random variable and $\Rightarrow$ denotes convergence in
distribution.
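The limit in (9) can be explored with a small Monte Carlo experiment. The sketch below is illustrative only and rests on assumptions that are not taken from the text: the number of events per path is fixed rather than random, the event contributions $e_i$ are drawn as zero-mean normals, trade sizes are exponential, the two sequences are drawn independently, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, mu, pi_trade = 300.0, 200.0, 0.1    # hypothetical values of sigma, mu and pi
n_events, n_paths = 2000, 1000             # events per interval, simulated intervals

# e_i: i.i.d. with variance sigma^2 (zero mean is an assumption of this sketch)
e = rng.normal(0.0, sigma, size=(n_paths, n_events))
# w_i: a trade of mean size mu with probability pi_trade, else 0, so E[w_i] = mu * pi_trade
is_trade = rng.random((n_paths, n_events)) < pi_trade
w = is_trade * rng.exponential(mu, size=(n_paths, n_events))

ofi = e.sum(axis=1)                        # OFI(T) for each simulated interval
vol = w.sum(axis=1)                        # VOL(T) for each simulated interval
ratio = np.sqrt(mu * pi_trade) * ofi / (sigma * np.sqrt(vol))
# Under the proposition, ratio should be approximately standard normal.
```

With these settings the sample mean and standard deviation of `ratio` come out close to 0 and 1; a histogram of `ratio` can be compared with the standard normal density for a visual check.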
Proof: First, we apply the law of large numbers to the traded volume. Assumption (1)
ensures that $N(T) \to \infty$ as $T \to \infty$:
$$\frac{VOL(T)}{N(T)} = \frac{\sum_{i=1}^{N(T)} w_i}{N(T)} \to \mu\pi, \quad \text{w.p.1, as } T \to \infty. \tag{10}$$
Second, event contributions $e_i$ have a finite variance $\sigma^2$ and, under our assumptions, we can
apply the classical central limit theorem to the order flow imbalance:
$$\frac{OFI(T)}{\sigma\sqrt{N(T)}} \equiv \frac{\sum_{i=1}^{N(T)} e_i}{\sigma\sqrt{N(T)}} \Rightarrow \xi, \quad \text{as } T \to \infty, \tag{11}$$
where $\xi \sim N(0, 1)$ is a standard normal random variable. Although the denominator $\sigma\sqrt{N(T)}$
is random, it goes to infinity by assumption (1) and Anscombe's lemma ensures that we can
use such a normalization in the central limit theorem [13, Lemma 2.5.8]. Since the square root
function is continuous, the convergence in (10) takes place almost surely and the limit in (10)
is deterministic, we can combine (10) and (11) in the following way:
$$\frac{\sqrt{\mu\pi}\,OFI(T)}{\sigma\sqrt{VOL(T)}} \equiv \frac{\dfrac{\sum_{i=1}^{N(T)} e_i}{\sigma\sqrt{N(T)}}}{\sqrt{\dfrac{\sum_{i=1}^{N(T)} w_i}{\mu\pi\,N(T)}}} \Rightarrow \xi, \quad \text{as } T \to \infty. \tag{12}$$
If the time interval $[0, T)$ includes a large enough number of order book events and trades,
the above limit argument implies a noisy scaling relation between order flow imbalance and the
square root of traded volume:
$$OFI(T) = \xi\,\frac{\sigma}{\sqrt{\mu\pi}}\,\sqrt{VOL(T)}, \tag{13}$$
where $\mu$, $\pi$ and $\sigma$ are constants and $\xi \sim N(0, 1)$. Now, assume that it holds not just for the first
interval, but for every time interval $[t_{k-1}, t_k)$ of large enough length $\Delta t$, regardless of its index
$k$. Then, (13) can be substituted into our model (2), to yield:
$$\Delta P_k = \theta_k\sqrt{VOL_k} + \epsilon_k, \tag{14}$$
where $\theta_k = \beta_i\,\xi_k\,\frac{\sigma}{\sqrt{\mu\pi}}$ is a slope coefficient and $\xi_k \sim N(0, 1)$ is a noise term due to scaling.
Due to the scaling approximation, the slope $\theta_k$ in (14) is a normal random variable:
$\theta_k \sim N\left(0, \beta_i^2\frac{\sigma^2}{\mu\pi}\right)$. For every time interval $[t_{k-1}, t_k)$ the ratio
$\frac{\sqrt{\mu\pi}\,OFI_k}{\sigma\sqrt{VOL_k}}$ is a different draw from the
$N(0, 1)$ distribution, leading to a different $\theta_k$ in each case. This additional randomness makes
this model considerably less robust than (2) and we do not recommend using it.
Equation (14) shows that even if prices are driven by the order flow imbalance (i.e. even if
$\epsilon_k = 0\ \forall k$), there will be a noisy square-root relation between the price changes and the traded
volume. However, if the assumptions of Proposition 1 do not hold (e.g. $\{e_i\}_{i=1}^{\infty}$ are strongly
dependent or have infinite variance), the price-volume relation may have a different exponent.
A variety of exponents $0 < H < 1$ have been observed in the relation between price changes
and trade sizes [8], suggesting the following model:
$$\Delta P_k = \theta_k\,VOL_k^H + \epsilon_k. \tag{15}$$
To estimate the exponent $H$, we put $\epsilon_k = 0$ and $\theta_k = \bar\theta_i\,\xi_k$ in (15) and fit a logarithmic regression
to every half-hour subsample, indexed by $i$:
$$\log|\Delta P_k| = \log\hat{\bar\theta}_i + \hat H_i \log VOL_k + \log\hat\xi_k \tag{16}$$
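The logarithmic regression (16) can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the estimation code of the study: the function name `estimate_exponent`, the lognormal volumes, the scale 0.01 and the rule of dropping observations with a zero price change or zero volume (for which the logarithm is undefined) are assumptions.

```python
import numpy as np

def estimate_exponent(dp, vol):
    """Slope and intercept of the OLS fit of log|dp| on log(vol)."""
    dp = np.asarray(dp, dtype=float)
    vol = np.asarray(vol, dtype=float)
    ok = (dp != 0) & (vol > 0)      # assumption: drop points where a log is undefined
    slope, intercept = np.polyfit(np.log(vol[ok]), np.log(np.abs(dp[ok])), 1)
    return slope, intercept

rng = np.random.default_rng(4)
vol = rng.lognormal(mean=8.0, sigma=1.0, size=5000)   # synthetic traded volume
xi = rng.normal(size=5000)                            # noise term xi_k ~ N(0, 1)
dp = 0.01 * xi * vol ** 0.5                           # model (15) with eps_k = 0, H = 1/2
h_hat, _ = estimate_exponent(dp, vol)                 # slope estimate of H
```

The slope recovers the exponent used to generate the data, while the fitted intercept also absorbs the mean of $\log|\xi_k|$ and so should not be read as $\log\bar\theta_i$ without a correction.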
Based on Proposition 1, we expect the price-volume relation to be indirect (i.e. to come through
$OFI_k$) and noisy. To confirm this empirically, we compare the following three regressions:
These regressions are estimated for every half-hour subsample with the exponents $\hat H_i$
pre-estimated by (16). The averages of $\hat H_i$ and their standard deviations for each stock are presented
in the left panel of Table 5. The exponent varies considerably across stocks and time, but is
generally below 1/2 in our data. The average results of regressions (17a-17c) for each stock are
presented in the middle and right panels. We observe that $|OFI_k|$ explains the magnitude of
price moves better than $VOL_k^{\hat H_i}$. Although both variables appear to be statistically significant
when taken individually, only $|OFI_k|$ remains significant in the multiple regression. Thus, the
dependence between the magnitude of price moves and the traded volume is mostly due to
correlation between $VOL_k$ and $|OFI_k|$. Interestingly, the number of trades variable (suggested in
[26]) is also statistically significant on a stand-alone basis, but becomes insignificant when added
to (17c) as a third variable.
5 Conclusion
We have introduced order flow imbalance, a variable that cumulates the sizes of order book
events, treating the contributions of market, limit and cancel orders equally, and provided
empirical and theoretical evidence for a linear relation between high-frequency price changes and
order flow imbalance for individual stocks. We have shown that this linear model is robust across
stocks and that the impact coefficient is inversely proportional to market depth. These relations
suggest that prices respond to changes in the supply and demand for shares at the best quotes,
and that the impact coefficient fluctuates with the amount of liquidity provision, or depth, in
the market. Moreover, we have demonstrated that order flow imbalance is a stronger driver
of high-frequency price changes than standard measures of trade imbalance. Trades seem to
carry little to no information about price changes after the simultaneous order flow imbalance is
taken into account. If trades do not help to explain price changes after controlling for the order
flow imbalance, it is quite possible that the relation between price changes and traded volume
simply captures the noisy scaling relation between these variables.
Overall, these findings seem to give an intuitive picture of the price impact of order book
events, which is somewhat simpler than the one conveyed by previous studies.
Table 5. Comparison of traded volume and order flow imbalance.
Columns 2-3: average and standard deviation of Ĥ; columns 4-7: order flow imbalance alone; columns 8-11: traded volume alone; columns 12-17: both covariates.
Ticker  Avg Ĥ  Stdev Ĥ  R²  t(β̂_O)  β_O ≠ 0  F  R²  t(β̂_V)  β_V ≠ 0  F  R²  t(φ̂_O)  t(φ̂_V)  φ_O ≠ 0  φ_V ≠ 0  F
AMD 0.06 0.08 63% 10.3 99% 356 14% 4.5 83% 34 63% 9.4 1.1 99% 35% 182
APOL 0.24 0.08 53% 8.3 90% 258 25% 6.8 99% 63 57% 6.9 2.9 89% 84% 144
AXP 0.16 0.08 55% 10.5 100% 249 20% 6.6 100% 48 57% 9.0 2.8 100% 81% 133
AZO 0.43 0.22 39% 5.5 96% 131 32% 5.3 100% 93 50% 4.3 3.6 94% 96% 98
BAC 0.09 0.08 73% 16.3 100% 560 24% 5.6 83% 61 74% 13.9 1.2 96% 35% 285
BDX 0.26 0.10 55% 8.4 99% 261 27% 6.3 100% 71 58% 6.7 2.9 98% 84% 147
BK 0.11 0.07 68% 13.1 100% 437 19% 6.6 97% 46 68% 11.5 2.0 99% 58% 225
BSX -0.17 2.41 68% 8.4 100% 486 14% 3.3 95% 33 69% 8.0 0.1 97% 12% 246
BTU 0.24 0.07 58% 10.5 99% 283 23% 6.8 99% 57 60% 8.9 2.4 99% 78% 151
CAT 0.22 0.07 56% 10.4 98% 250 19% 6.0 98% 44 57% 8.9 2.1 98% 63% 131
CB 0.19 0.09 56% 10.1 99% 261 23% 6.4 99% 58 58% 8.2 2.6 99% 74% 141
CCL 0.14 0.07 60% 11.3 100% 309 19% 6.6 99% 45 62% 9.9 2.4 99% 74% 162
CINF 0.13 0.12 67% 10.6 99% 505 30% 6.1 98% 85 69% 8.7 2.0 99% 55% 268
CME 0.49 0.24 28% 4.1 94% 78 30% 4.8 99% 83 42% 3.2 3.6 86% 94% 71
COH 0.19 0.07 60% 10.4 99% 299 22% 6.5 99% 52 61% 8.9 2.2 98% 69% 157
COP 0.16 0.07 56% 9.8 100% 277 20% 6.0 96% 49 58% 8.4 2.4 100% 70% 145
CVH 0.18 0.10 62% 10.2 100% 352 27% 5.9 99% 72 64% 8.2 2.2 100% 70% 189
DNR 0.08 0.07 64% 12.0 99% 376 17% 6.3 95% 38 65% 10.7 1.8 99% 55% 193
DVN 0.26 0.07 52% 8.6 93% 236 24% 6.7 100% 59 55% 7.1 2.9 91% 81% 131
EFX 0.20 0.11 52% 8.1 99% 241 26% 5.4 99% 69 56% 6.4 2.7 97% 75% 137
ETN 0.26 0.10 55% 8.2 97% 252 27% 6.4 99% 70 58% 6.8 2.9 96% 83% 142
FISV 0.19 0.11 57% 9.1 100% 284 25% 5.9 100% 65 59% 7.3 2.2 99% 66% 153
HAS 0.20 0.09 61% 10.1 100% 328 26% 6.2 100% 67 63% 8.2 2.3 100% 73% 175
HCP 0.14 0.07 57% 11.1 100% 268 21% 7.0 99% 50 59% 9.3 2.7 100% 79% 143
HOT 0.23 0.08 57% 9.7 98% 263 24% 6.9 100% 60 60% 8.2 3.0 98% 85% 145
KSS 0.24 0.08 60% 10.8 97% 318 25% 6.6 99% 61 62% 9.0 2.4 97% 74% 169
LLL 0.33 0.12 58% 9.4 94% 323 34% 6.9 100% 101 63% 7.1 3.0 91% 86% 188
LMT 0.28 0.09 61% 10.7 99% 327 31% 7.3 100% 85 64% 8.4 2.9 99% 84% 182
M 0.11 0.07 69% 13.9 100% 463 20% 6.3 99% 46 69% 12.2 2.0 100% 60% 238
MAR 0.15 0.07 61% 12.3 100% 324 21% 6.9 99% 50 62% 10.4 2.4 100% 71% 170
MFE 0.16 0.09 60% 10.9 99% 318 24% 7.1 98% 62 62% 8.8 2.5 99% 71% 170
MHP 0.20 0.10 62% 10.2 99% 377 25% 5.9 99% 62 64% 8.5 1.9 99% 55% 199
MHS 0.23 0.08 56% 9.2 99% 258 24% 6.6 100% 58 58% 7.7 2.6 98% 77% 139
MRK 0.10 0.07 62% 11.0 100% 330 17% 5.4 99% 40 63% 9.8 1.8 100% 55% 170
MRO 0.09 0.06 61% 11.8 100% 333 16% 6.3 95% 36 63% 10.6 2.0 100% 54% 172
MWV 0.18 0.10 62% 10.3 100% 330 28% 6.7 100% 75 64% 8.2 2.4 100% 74% 180
NEM 0.20 0.07 56% 9.9 99% 253 20% 6.1 99% 47 58% 8.6 2.5 99% 75% 135
OMC 0.15 0.09 57% 10.1 99% 286 20% 6.4 98% 48 59% 8.6 2.4 98% 73% 151
PCS 0.11 0.18 62% 7.1 96% 411 18% 3.7 97% 54 63% 6.5 0.7 93% 20% 214
PHM 0.07 0.08 64% 10.2 100% 384 15% 5.5 90% 34 65% 9.4 1.2 99% 40% 195
PKI 0.11 0.11 55% 7.8 99% 266 20% 4.8 97% 47 57% 6.7 1.8 98% 53% 141
R 0.27 0.11 56% 8.6 98% 259 28% 6.0 100% 74 59% 6.9 2.9 97% 85% 147
RAI 0.25 0.10 61% 9.2 99% 334 28% 5.7 100% 73 63% 7.6 2.4 99% 71% 182
SLB 0.24 0.07 62% 12.0 99% 330 19% 5.5 98% 46 63% 10.6 1.7 99% 51% 171
TE 0.09 1.69 60% 8.0 98% 371 18% 4.4 85% 48 61% 7.2 1.3 98% 39% 196
TWC 0.25 0.10 55% 9.7 99% 253 27% 6.6 100% 73 58% 7.6 3.0 99% 81% 142
WHR 0.34 0.11 56% 8.2 97% 272 29% 6.3 100% 78 59% 6.6 2.9 95% 86% 156
WIN 0.06 0.26 48% 3.9 79% 340 10% 2.8 50% 34 49% 3.7 0.6 79% 29% 179
WPI 0.22 0.10 61% 9.6 98% 361 28% 5.8 100% 75 64% 7.7 2.2 98% 71% 196
XTO 0.08 0.06 53% 10.9 100% 238 15% 6.5 100% 32 55% 9.6 2.7 100% 78% 125
Grand mean 0.18 0.18 58% 9.8 98% 313 23% 6.0 97% 58 61% 8.3 2.3 97% 67% 168
References
[1] H. Ahn, K. Bae, and K. Chan, Limit orders, depth, and volatility: evidence from the
stock exchange of Hong Kong, Journal of Finance, 56 (2001), pp. 767–788.
[2] R. Almgren and N. Chriss, Optimal execution of portfolio transactions, Journal of Risk,
3 (2000), pp. 5–39.
[3] R. Almgren, C. Thum, E. Hauptmann, and H. Li, Direct estimation of equity market
impact, Journal of Risk, 18 (2005), p. 57.
[4] T. Andersen and T. Bollerslev, Deutsche mark - dollar volatility: intraday activity
patterns, macroeconomic announcements, and longer run dependencies, Journal of Finance,
53 (1998), p. 219.
[5] M. Avellaneda, S. Stoikov, and J. Reed, Forecasting prices from level-I quotes in the
presence of hidden liquidity. Working paper, 2010.
[6] D. Bertsimas and A. Lo, Optimal control of execution costs, Journal of Financial Markets,
1 (1998), pp. 1–50.
[7] J.-P. Bouchaud, Encyclopedia of Quantitative Finance, Wiley, 2010, ch. Price Impact.
[8] J.-P. Bouchaud, D. Farmer, and F. Lillo, Handbook of financial markets: dynamics
and evolution, Elsevier: Academic Press, 2009, ch. How markets slowly digest changes in
supply and demand.
[9] J.-P. Bouchaud, Y. Gefen, M. Potters, and M. Wyart, Fluctuations and response
in financial markets: the subtle nature of ’random’ price changes, Quantitative Finance, 4
(2004), p. 176.
[10] T. Chordia, R. Roll, and A. Subrahmanyam, Liquidity and market efficiency, Journal
of Financial Economics, 87 (2008), p. 249.
[11] P. K. Clark, A subordinated stochastic process model with finite variance for speculative
price, Econometrica, 41 (1973), pp. 135–155.
[12] Z. Eisler, J.-P. Bouchaud, and J. Kockelkoren, The price impact of order
book events: market orders, limit orders and cancellations, Quantitative Finance Papers
0904.0900, arXiv.org, Apr. 2009.
[14] R. Engle, R. Ferstenberg, and J. Russell, Measuring and modeling execution cost
and risk. NYU Working Paper No. FIN-06-044, 2006.
[15] R. Engle and A. Lunde, Trades and quotes: a bivariate point process, Journal of Financial
Econometrics, 1 (2003), pp. 159–188.
[16] M. Evans and R. Lyons, Order flow and exchange rate dynamics, Journal of Political
Economy, 110 (2002), p. 170.
[17] J. D. Farmer, L. Gillemot, F. Lillo, S. Mike, and A. Sen, What really causes large
price changes?, Quantitative Finance, 4 (2004), pp. 383–397.
[18] X. Gabaix, P. Gopikrishnan, V. Plerou, and H. Stanley, A theory of power-law
distributions in financial market fluctuations, Nature, 423 (2003), p. 267.
[20] J. Hasbrouck, Measuring the information content of stock trades, Journal of Finance, 46
(1991), pp. 179–207.
[22] J. Hasbrouck and D. Seppi, Common factors in prices, order flows and liquidity, Journal
of Financial Economics, 59 (2001), p. 383.
[23] N. Hautsch and R. Huang, The market impact of a limit order. SFB 649 Discussion
Papers, 2009.
[24] C. Hopman, Do supply and demand drive stock prices?, Quantitative Finance, 7 (2007),
pp. 37–53.
[26] C. Jones, G. Kaul, and M. Lipson, Transactions, volume, and volatility, Review of
Financial Studies, 7 (1994), pp. 631–651.
[27] J. Karpoff, The relation between price changes and trading volume: A survey, Journal of
Financial and Quantitative Analysis, 22 (1987), p. 109.
[28] D. Keim and A. Madhavan, The upstairs market for large-block transactions: Analysis
and measurement of price effects, Review of Financial Studies, 9 (1996), p. 1.
[29] A. Kempf and O. Korn, Market depth and order size, Journal of Financial Markets, 2
(1999), p. 29.
[30] P. Knez and M. Ready, Estimating the profits from trading strategies, Review of Financial
Studies, 9 (1996), p. 1121.
[31] C. Lee, B. Mucklow, and M. Ready, Spreads, depths, and the impact of earnings
information: an intraday analysis, Review of Financial Studies, 6 (1993), pp. 345–374.
[32] C. Lee and M. Ready, Inferring trade direction from intraday data, Journal of Finance,
46 (1991), pp. 733–746.
[34] T. McInish and R. Wood, An analysis of intraday patterns in bid/ask spreads for NYSE
stocks, Journal of Finance, 47 (1992), pp. 753–764.
[35] A. Obizhaeva and J. Wang, Optimal trading strategy and supply/demand dynamics.
NBER Working Papers, No 11444, 2005.
[37] M. O’Hara, Market Microstructure Theory, Wiley, 1998.
[39] M. Potters and J. Bouchaud, More statistical properties of order books and price
impact, Physica A, 324 (2003), pp. 133–140.
[42] E. Theissen, A test of the accuracy of the Lee/Ready trade classification algorithm, Journal
of International Financial Markets, Institutions and Money, 11 (2001), pp. 147–165.
[43] N. Torre and M. Ferrari, The Market Impact Model, BARRA, 1997.
[44] P. Weber and B. Rosenow, Order book approach to price impact, Quantitative Finance,
5 (2005), pp. 357–364.
[45] P. Weber and B. Rosenow, Large stock price changes: volume or liquidity?, Quantitative
Finance, 6 (2006), p. 7.
[46] I. Zovko and J. D. Farmer, The power of patience: A behavioral regularity in limit order
placement, Quantitative Finance, 2 (2002), pp. 387–392.
A Appendix: TAQ data processing
Quotes data were filtered as follows:
3. Quote mode ∉ {4, 7, 9, 11, 13, 14, 15, 19, 20, 27, 28}
3. Correction indicator ≤ 2.
From the filtered quotes data we construct the National Best Bid and Offer (NBBO) quotes.
This is done by scanning through the filtered quotes data, while maintaining a matrix with the
best quotes for every exchange. When a new entry is read, we check the exchange flag of that
entry and update the corresponding row in the exchange matrix. Using this matrix, the NBBO
prices are computed at each entry as the highest bid and the lowest ask across all exchanges.
The NBBO sizes are simply the sums of all sizes at the NBBO bid and ask across all exchanges.
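The NBBO construction described above can be sketched as follows. This is an illustrative Python version, assuming that each filtered quote record carries an exchange flag, bid and ask prices and sizes; the function name `nbbo_stream`, the tuple layout and the sample quotes are hypothetical, and a dictionary keyed by exchange plays the role of the exchange matrix.

```python
def nbbo_stream(quotes):
    """quotes: time-ordered iterable of (exchange, bid, bid_size, ask, ask_size).
    Yields (nbbo_bid, nbbo_bid_size, nbbo_ask, nbbo_ask_size) after each entry."""
    book = {}  # exchange flag -> latest (bid, bid_size, ask, ask_size) for that exchange
    for exch, bid, bid_size, ask, ask_size in quotes:
        book[exch] = (bid, bid_size, ask, ask_size)      # update this exchange's row
        best_bid = max(q[0] for q in book.values())      # highest bid across exchanges
        best_ask = min(q[2] for q in book.values())      # lowest ask across exchanges
        # NBBO sizes: sum of sizes over all exchanges quoting at the NBBO prices
        nbbo_bid_size = sum(q[1] for q in book.values() if q[0] == best_bid)
        nbbo_ask_size = sum(q[3] for q in book.values() if q[2] == best_ask)
        yield best_bid, nbbo_bid_size, best_ask, nbbo_ask_size

quotes = [("N", 10.00, 300, 10.02, 500),
          ("T", 10.01, 200, 10.02, 100),
          ("N", 10.01, 400, 10.03, 500)]
nbbo = list(nbbo_stream(quotes))
# nbbo[1] == (10.01, 200, 10.02, 600) and nbbo[2] == (10.01, 600, 10.02, 100)
```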
After the NBBO quotes are computed, we apply a simple quote test to the NBBO quotes
and the filtered trades data. This test matches trades with NBBO quotes and computes the
direction of matched trades. A trade is matched with a quote if:
(a) Trade price ≥ NBBO ask: in this case the trade is considered to be a buy trade.
(b) Trade price ≤ NBBO bid: in this case the trade is considered to be a sell trade.
4. If the above conditions allow a trade to be matched with several quotes, it is matched with the
earliest quote.
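The price conditions (a) and (b) and the earliest-quote rule can be sketched as below. This is a partial illustration only: it assumes the caller has already selected the NBBO quotes that are eligible for a given trade, in time order, and the function name `quote_test` and the sample prices are hypothetical.

```python
def quote_test(trade_price, candidate_quotes):
    """candidate_quotes: time-ordered list of (nbbo_bid, nbbo_ask) pairs that are
    eligible for this trade (how eligibility is decided is outside this sketch).
    Returns (index of matched quote, +1 for a buy or -1 for a sell), or None."""
    for i, (bid, ask) in enumerate(candidate_quotes):
        if trade_price >= ask:      # condition (a): trade at or above the ask is a buy
            return i, +1
        if trade_price <= bid:      # condition (b): trade at or below the bid is a sell
            return i, -1
    return None                     # trade strictly inside every spread: unmatched

quotes = [(10.00, 10.03), (10.00, 10.02), (10.01, 10.02)]
buy_match = quote_test(10.02, quotes)               # (1, 1): earliest quote with ask <= 10.02
sell_match = quote_test(10.00, [(10.00, 10.02)])    # (0, -1)
no_match = quote_test(10.01, [(10.00, 10.02)])      # None
```

Iterating in time order and returning at the first hit implements the rule that a trade matching several quotes is assigned to the earliest one.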
There are other routines to estimate trade direction, including the tick test and the Lee-Ready
rule [32]. Although the latter is used quite frequently, there seems to be no compelling
evidence of superiority of either of these heuristics [36, 42]. To test the robustness of our findings
to the choice of a trade direction test, we compared our results on a subsample of data, applying
alternatively the tick test and the quote test; both led to virtually the same results.
Finally, we removed observations with extremely high bid-ask spreads. To apply this filter
coherently across stocks, we computed for each stock the 95th percentile of its bid-ask spread
distribution and removed the 5% of that stock's quotes with the spreads above that percentile.
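The spread filter can be sketched per stock as follows. This is a minimal illustration, assuming NumPy's default (linearly interpolated) percentile and a rule that keeps quotes whose spread is at or below the threshold; the function name `keep_narrow_spreads` and the sample quotes are hypothetical.

```python
import numpy as np

def keep_narrow_spreads(bid, ask, pct=95.0):
    """Boolean mask selecting quotes whose spread does not exceed the
    pct-th percentile of this stock's bid-ask spread distribution."""
    spread = np.asarray(ask, dtype=float) - np.asarray(bid, dtype=float)
    return spread <= np.percentile(spread, pct)

bid = np.full(100, 10.00)
ask = np.full(100, 10.01)      # 99 quotes with a one-cent spread
ask[-1] = 10.50                # one quote with an extremely wide spread
keep = keep_narrow_spreads(bid, ask)
# keep is True for the 99 one-cent quotes and False for the last quote
```

When spreads are concentrated on a single value, as in this example, fewer than 5% of the quotes lie strictly above the percentile, so the share of removed quotes depends on how ties at the threshold are handled.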