CME Iceberg Detection Preprint
2019-08-29
We propose a method for detection and prediction of native and synthetic iceberg orders on
Chicago Mercantile Exchange. Native (managed by the exchange) icebergs are detected using
discrepancies between the resting volume of an order and the actual trade size as indicated by trade
summary messages, as well as by tracking order modifications that follow trade events. Synthetic
(managed by market participants) icebergs are detected by observing limit orders arriving within a
short time frame after a trade. The obtained icebergs are then used to train a model based on the
Kaplan–Meier estimator, accounting for orders that were cancelled after a partial execution. The
model is utilized to predict the total size of newly detected icebergs. Out-of-sample validation is
performed on the full order depth data; performance metrics and quantitative estimates of hidden
volume are presented.
1 Introduction
On financial exchanges, an iceberg order is a limit order where only a fraction of the total order size (display
quantity) is shown in the limit order book (LOB) at any one time (peak), with the remainder of the volume hidden
(Christensen and Woodmansey, 2013). When the peak is executed, the next part of the iceberg’s hidden volume
(tranche or refill) gets displayed in the LOB. This process is repeated until the initial order is fully traded or
cancelled.
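To make the refill mechanics concrete, the following minimal sketch splits a total order size into successive displayed tranches. The function name `tranche_sizes` and the numbers are illustrative assumptions, not part of any exchange specification; the last tranche is assumed to be whatever volume remains.

```python
def tranche_sizes(total: int, peak: int) -> list[int]:
    """Split an iceberg of size `total` into displayed tranches of at most `peak`."""
    if total <= 0 or peak <= 0:
        raise ValueError("total and peak must be positive")
    sizes = []
    remaining = total
    while remaining > 0:
        shown = min(peak, remaining)  # the final tranche may be smaller than the peak
        sizes.append(shown)
        remaining -= shown
    return sizes


# Hypothetical iceberg: 25 units in total, 10 units displayed at a time.
example = tranche_sizes(25, 10)
print(example)
```

Only one element of the returned list is visible in the book at any moment; the rest is the hidden volume that the detection method tries to infer.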
The hidden volume, although not directly observed, is de facto present in the LOB and hence can be
traded against. This makes the detection of hidden liquidity a desirable goal for interested parties, e.g. traders
and market makers.
In this paper we propose a method for detecting and predicting hidden liquidity on Chicago Mercantile
Exchange (CME). The model is fit and assessed out of sample using historical data. We treat it both as a
classification and a regression model and discuss relevant performance metrics.
1.1 Data
We had access to an almost week-long full order depth (FOD) LOB log of a September E-Mini S&P 500 futures
contract, existing at that time under the ticker symbol ESU19, for the period from 2019-06-14, 11:00:00 CDT
to 2019-06-21, 16:00:00 CDT. The chosen interval is especially interesting from a trading activity standpoint:
as the front month contract ESM19 approaches its expiration, a majority of the open interest gets transferred
onto the next one, creating an increased demand for hidden liquidity vehicles. Each order was described by a
sequence of fields presented in table 1.
For more information about CME Market-by-Order book management, see (CME, 2019b).
Preprint ver. 2019-08-29
In addition, trade summary messages (CME, 2019c) were present in the data. Each trade event against a
resting order corresponded to a trade record in the log with the aforementioned fields, “Action” set to “Trade”
and an extra field for the passive order ID.
This way, our algorithm works in the “offline” mode by reading a pre-recorded LOB log. Since it is not
forward-looking, it can be easily modified to work with real-time streaming data.
Native icebergs are managed by the exchange itself. All new tranches are submitted as modifications of the
initial order; this means that the original order ID is preserved throughout the whole lifetime of the
iceberg. Additionally, trades against these orders may sometimes be larger in volume than the current
resting size, as indicated by trade summary messages.
Synthetic icebergs are submitted by independent software vendors (ISVs), whose infrastructure is physically
separated from the exchange. ISVs split the initial iceberg order, submit new tranches and track their
execution. These tranches are indistinguishable from usual limit orders submitted by other participants.
Detecting native icebergs is conceptually easy since a) the order ID does not change until the iceberg is fully
executed or cancelled; and b) trade summary messages include actual trade volumes, which may be larger than
the resting display quantity. Thus an unambiguous and accurate detection is possible. Synthetic icebergs, on
the other hand, being identical to non-iceberg orders in how they are processed by the exchange, can only be
detected heuristically, relying on a set of assumptions that are introduced below.
1. Series of tranches are identified in the data as belonging to larger iceberg orders (the “detection” step).
2. Using the detected icebergs, a statistical model is fit that captures the correspondence between the peak
size and the total iceberg size (the “learning” step).
3. The detection step is repeated, but when a new iceberg order is detected, a prediction of the total size of the
iceberg is made using the model obtained at the previous step (the “prediction” step).
We adapt this detection–learning–prediction scheme for our work, albeit with the following notable differences:
• The authors did not have access to the FOD MBO data at the time of writing. In particular, the
order ID for each action or trade was not available, and its availability drastically changes the logic of the
detection step.
• No distinction between synthetic and native icebergs is made. Namely, it is assumed that trades can
sometimes be larger than the size of the resting order being traded, which is specific to native icebergs;
however, iceberg tranches arrive as new limit orders, which is an attribute of synthetic icebergs.
• During the learning phase, a bivariate Gaussian kernel density estimate of peak and total size is built,
which is then optimised for the global maximum given a peak size. For the purpose of prediction, where
only one value of the total size corresponding to the maximum probability given a peak size is necessary,
this complication is questionable as a simpler model is sufficient¹. A kernel density estimate may be desirable
if the algorithm operates on instruments with a relatively low daily trading volume, which is not the
case with our data. In addition, by omitting this step we do not have to resort to numerical methods when
optimizing for conditional maxima.
• All incomplete icebergs (i.e. those that were cancelled before being fully executed) are excluded
from the learning phase. However, our calculations show that more than half of all synthetic icebergs
are cancelled, so it is highly desirable to include the information about incomplete executions in the
model.
3 Detection
3.1 Native Icebergs
Native iceberg orders enter the book as limit orders which may or may not be traded upon arrival. After the
initial limit order volume is fully traded, the next part of the iceberg order appears in the book. Crucially,
when the iceberg has its displayed quantity refreshed (by means of an update action), the refreshed order will
have the same order ID as the original order. Moreover, any trades involving the iceberg order will indicate
the total volume of trade, including the hidden part of the iceberg. Using these two properties it is then fairly
easy to detect a sequence of new–trade–update–delete actions that forms an iceberg. In particular, we might
be interested in update actions that correspond to new iceberg tranches, as well as in determining the peak size
and in calculating the total iceberg size.
We would like to illustrate the process with an example. Consider the data presented in table 2.
Table 2: Sample native iceberg order log data. Grouped are the orders related to the same tranche.
1. Order #645764830354 enters the book and immediately trades at 2931.75 for the total of 12 units of
volume. The remainder — 6 units — is placed at the same level. At that point we do not know whether
the order has any hidden liquidity or not. Moreover, assuming that it does, the peak size cannot be
precisely determined; but since 12 + 6 = 18, it is one of the divisors of 18 greater than or equal to 6,
i.e. 6, 9 or 18.
¹ It should be noted that the authors consider discrete kernel estimation, but opt to use the Gaussian kernel “on the basis of
simplicity”.
2. The next trade has volume 8 which is larger than the resting volume of 6. This is sufficient to mark order
#645764830354 as an active iceberg.
3. The next tranche volume is 7. Note that 8 − 6 = 2 units of volume were traded against a tranche that
had not entered the book. This means that the peak size can be determined precisely as 7 + (8 − 6) = 9.
The trade could have been large enough to consume several hidden tranches.
4. The next several trades are smaller in volume than the resting order. The trade initiated by order
#645764830365 is equal to the resting volume of 1. Consequently, the next modify action is seen to
refresh the visible volume by the peak size of 9 (which agrees with the previous calculations).
5. Finally, the last update action has volume 5, which, accounting for the hidden trade of 9 − 7 = 2, results
in a peak size of 7. The trade for 5 units completes the sequence as no more refresh messages are seen and
the order is deleted from the book.
Overall, the iceberg has a total volume of 43, 4 tranches with peak sizes 9, 9, 9 and 7, respectively, and the
display quantity equal to 9.
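The arithmetic of steps 1 to 3 can be sketched in a few lines. The helper names `peak_candidates` and `refine` are hypothetical and only reproduce the divisor and remainder reasoning of the example; they are not the authors' implementation.

```python
def peak_candidates(traded: int, resting: int) -> list[int]:
    """Divisors of (traded + resting) that are not smaller than the resting volume."""
    total = traded + resting
    return [d for d in range(1, total + 1) if total % d == 0 and d >= resting]


def refine(candidates, modify_volume, trade_volume, resting_volume):
    """Keep candidates p that satisfy p == V_M + (V_T - V_R) mod p."""
    hidden = trade_volume - resting_volume  # volume traded against unseen tranches
    return [p for p in candidates if p == modify_volume + hidden % p]


# Step 1: 12 units traded on arrival, 6 units left resting.
first = peak_candidates(traded=12, resting=6)
# Steps 2-3: a trade of 8 against the resting 6, followed by an update to 7.
second = refine(first, modify_volume=7, trade_volume=8, resting_volume=6)
print(first, second)
```

With the numbers from the example the candidate set shrinks from three values to the single peak size of 9, which matches the conclusion of step 3.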
The process of parsing the action stream can be conveniently formalised as a finite state machine, see fig. 1.
To recap the detection phase, an iceberg enters the book as a new limit order, possibly following a sequence
of trades. It is then traded, and usually (but not always) each trade corresponds to one trade summary
message, in which case it is followed by an update action specifying the currently resting order volume. If more
than one trade message is seen before the next update action, this has to be accounted for. Moreover,
price adjustments which move the order to the top of the book are not disseminated by the exchange, meaning
that even after the placement the order can again act as an aggressive order and initiate a trade. If at this point
the order is deleted from the book, or traded so that the trade volume is never greater than the resting volume,
it is marked as “ordinary” and removed from consideration. On the other hand, once a trade larger than the
resting volume is detected, or the order is fully traded but then modified to have non-zero volume again,
the order is marked as an iceberg. The trade–modify cycle then continues until the order is completely executed
or cancelled, resulting in its deletion from the book.
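A heavily simplified version of this logic can be sketched as follows. The event encoding (pairs of an action letter and a volume for a single order ID) and the function name `classify_order` are assumptions made for illustration; the sketch ignores repeated aggressive trades after placement and treats any volume still resting at deletion as a cancellation.

```python
def classify_order(events):
    """Classify one order's action stream.

    events: iterable of (action, volume) for a single order ID, where action is
    'L' (new), 'T' (trade), 'M' (update) or 'D' (delete).
    Returns 'ordinary', 'complete', 'cancelled', 'active' or 'undecided'.
    """
    placed = False        # True once the limit order rests in the book
    resting = 0           # currently displayed volume
    is_iceberg = False
    fully_traded = False  # the last trade consumed the whole displayed volume
    for action, volume in events:
        if action == "L":
            placed, resting = True, volume
        elif action == "T" and placed:
            if volume > resting:
                is_iceberg = True  # trade exceeded the displayed volume
            fully_traded = volume >= resting
            resting = max(resting - volume, 0)
        elif action == "M":
            if fully_traded and volume > 0:
                is_iceberg = True  # displayed volume was refreshed after a full fill
            resting, fully_traded = volume, False
        elif action == "D":
            if not is_iceberg:
                return "ordinary"
            return "cancelled" if resting > 0 else "complete"
        # trades seen before placement are aggressive trades and are skipped
    return "active" if is_iceberg else "undecided"


iceberg = [("T", 12), ("L", 6), ("T", 8), ("M", 7), ("T", 7), ("M", 9), ("T", 9), ("D", 0)]
plain = [("L", 5), ("T", 2), ("M", 3), ("T", 3), ("D", 0)]
pulled = [("L", 6), ("T", 8), ("M", 7), ("D", 7)]
print(classify_order(iceberg), classify_order(plain), classify_order(pulled))
```

A production parser would also track the peak size and total volume along each transition; this sketch only reproduces the status labels.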
In addition to tracking the transitions through the state space, we are interested in calculating the following
quantities:
• the total volume $V_{\text{total}}$ is conveniently computed as the sum of all traded volume $V_T$ (which may exceed
the sum of limit and/or update volumes), plus any volume $V_D$ that is explicitly deleted;
• the currently resting volume $V_R$ is simply the last modify action volume $V_M$;
$$V_{\text{peak}} = \frac{V_T + V_L}{k + 1} = \frac{V_T + V_L}{d}, \quad d = 1, \dots, V_T + V_L, \quad V_{\text{peak}} \in \{ n \in \mathbb{N} : n \ge V_L \}.$$
If more than one admissible value of $V_{\text{peak}}$ is found, the following heuristics apply.
· If the first tranche is traded for exactly the resting volume, the following update message
unambiguously identifies the peak size.
· If the trade volume is greater than the resting volume, then
$$V_{\text{peak}} = V_M + (V_T - V_R) \bmod V^{*}_{\text{peak}},$$
where $V^{*}_{\text{peak}}$ is one of the previously computed values. Only the values that satisfy this equation
are kept.
[Figure 1: state diagram with the states initial, trade, next tranche, next tranche partially traded, complete and cancelled.]
Figure 1: The grammar of native icebergs. The nodes correspond to states of the finite state machine, the
edges to order actions: new (L), update (M), trade (T), “affected” trade ($T_A$), delete (D). Trades T
are initiated by the iceberg order, while “affected” trades $T_A$ are initiated by other incoming orders. An
optional condition is specified in square brackets; a side effect is specified after the slash (/). Different
volumes $V_{\cdot}$ refer to the iceberg’s peak volume $V_{\text{peak}}$, total volume $V_{\text{total}}$, current resting volume $V_R$
and the order’s trade ($V_T$), delete ($V_D$), modify ($V_M$) volumes. Node colours represent the status
of the action sequence being tracked: grey for non-iceberg (“ordinary”), blue for active (all statuses
starting with “next tranche...”), green for complete and red for cancelled.
more than one order of the target volume may arrive on the same price level within dt. Our very strong
assumption is that the next tranche arrives faster than any other new limit order, so for each tranche there
is only one child. A more sophisticated model would account for all possible child tranches and
average the volume in some way later on. Another complication is that when several limit orders of the same price and
size get executed and deleted from the book simultaneously, the next tranche can be “linked” to any of those.
Repeated over several trades, this produces a tree of possible tranches. Every path from a leaf to the root
(a chain) is a possible iceberg. See table 3 and the resulting graph in fig. 2 for an illustration.
Table 3: Artificial data to demonstrate synthetic iceberg detection. For this example, dt was set to 0.3 seconds.
Limit order #1 gets traded and removed from the book. The following limit order #2 arriving within
a third of a second becomes the next tranche in the iceberg chain. Note that orders #4 and #5 do
not arrive within dt and, since there were no more trades, start two new chains. After they get traded
simultaneously, order #6 arrives within dt, thus becoming the next tranche. The process continues
until all orders are removed from the book.
Figure 2: An iceberg tranche tree corresponding to table 3. Node labels are order IDs. Edge labels are time in
seconds between subsequent tranches — note that these are different from dt as a tranche can remain
indefinitely long in the book after its placement. The iceberg consists of either 3, 4 (two chains) or 5
tranches.
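The linking rule can be sketched as below. The `Order` record, its field names and the sample orders are hypothetical and do not reproduce table 3; the sketch assumes that a child must match its parent in price and size and must arrive no later than dt after the parent was fully traded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    order_id: int
    price: float
    size: int
    arrival: float            # time the limit order entered the book, seconds
    filled: Optional[float]   # time it was fully traded, or None if never


def link_children(orders, dt):
    """Map each fully traded order to the first matching order arriving within dt."""
    child = {}
    for parent in orders:
        if parent.filled is None:
            continue
        candidates = [
            o for o in orders
            if o.order_id != parent.order_id
            and o.price == parent.price and o.size == parent.size
            and 0 <= o.arrival - parent.filled <= dt
        ]
        if candidates:
            child[parent.order_id] = min(candidates, key=lambda o: o.arrival).order_id
    return child


def chains(orders, dt, min_tranches=2):
    """Enumerate leaf-to-root paths that contain at least `min_tranches` orders."""
    child = link_children(orders, dt)
    has_parent = set(child.values())
    result = []
    for o in orders:
        if o.order_id in has_parent:
            continue  # not a leaf: some earlier tranche links to it
        path = [o.order_id]
        while path[-1] in child and child[path[-1]] not in path:
            path.append(child[path[-1]])
        if len(path) >= min_tranches:
            result.append(path)
    return result


orders = [
    Order(1, 100.0, 5, arrival=0.0, filled=1.0),
    Order(2, 100.0, 5, arrival=1.1, filled=2.0),
    Order(3, 100.0, 5, arrival=5.0, filled=6.0),
    Order(4, 100.0, 5, arrival=5.5, filled=6.0),   # filled together with order 3
    Order(5, 100.0, 5, arrival=6.2, filled=None),
]
found = chains(orders, dt=0.3)
print(found)
```

Orders 3 and 4 are filled simultaneously, so order 5 is linked to both and two chains share the same root, which is exactly the ambiguity described in the text.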
• the total volume $V_{\text{total}}$ could have been calculated as the sum of the tranche volumes if there were only one
tranche chain per iceberg. In general, however, there are several, and the total volume has to be aggregated
in some way. We propose the following options:
◦ the average total volume of all chains, $V^{\text{all}}$;
◦ the average total volume of chains of unique length, $V^{\text{unique}}$;
◦ the total volume of the longest chain, $V^{\text{longest}}$.
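The three aggregation options can be illustrated with a short sketch. The function name `aggregate` is hypothetical, and the reading of "chains of unique length" used here (average the totals within each distinct length, then average across lengths) is an assumption, since the text does not spell it out.

```python
from collections import defaultdict
from statistics import mean


def aggregate(chain_volumes):
    """chain_volumes: one list of tranche volumes per chain of the same iceberg."""
    totals = [sum(chain) for chain in chain_volumes]
    by_length = defaultdict(list)
    for chain in chain_volumes:
        by_length[len(chain)].append(sum(chain))
    v_all = mean(totals)                                  # average over all chains
    v_unique = mean(mean(v) for v in by_length.values())  # one value per distinct length
    v_longest = sum(max(chain_volumes, key=len))          # first longest chain wins ties
    return v_all, v_unique, v_longest


# Hypothetical tree with chains of 3, 4, 4 and 6 tranches of 2 units each.
v_all, v_unique, v_longest = aggregate([[2] * 3, [2] * 4, [2] * 4, [2] * 6])
print(v_all, v_unique, v_longest)
```

Only the longest-chain option is guaranteed to return an integer, a point that becomes relevant when predicted and true volumes are compared for equality.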
4 Learning
4.1 Kaplan–Meier estimation
Having detected sufficiently many iceberg orders, we would like to build a model that yields a prediction of the
total iceberg size. Although it is clearly not the most advanced in terms of predictive power, we elaborate on
the model proposed in (Christensen and Woodmansey, 2013). Namely, for each of P unique detected peak sizes,
a distribution of possible total sizes is estimated; then, given the peak size Vpeak = p of a previously unseen
iceberg, “the best” total volume (in terms of conditional mean, median or mode) is returned as a prediction.
More precisely, from now on let $V_p$ denote a random variable representing the total volume of an iceberg with
peak size $p$. Then for each value of $p$ we are interested in estimating the distribution of $V_p$. While a trivial
empirical distribution might suffice, our experiments show that a significant number of synthetic icebergs are
cancelled before being completely executed (see fig. 5). Hence for some icebergs only a lower bound on their
total volume is known: for the $i$-th iceberg, $v_i \ge c_i$, $c_i \in \mathbb{N}$. From the point of view of survival analysis, these
are censored observations. Usually survival analysis deals with the so-called “time to event data”: the primary
interest is the time until the onset of an event for each member of the analysed group. If only upper (or lower,
or both) bounds on time, but not exact event times, are known, the observations are considered censored.
Instead of discarding these, it is possible to construct estimators that incorporate the uncertainty associated
with censoring. In our case, accumulated iceberg volumes play the role of time to event durations, so the task
is to estimate the distribution of Vp for each p using random right-censored data.
The proportion of cancelled native icebergs is much smaller, and, in fact, could be disregarded for the purpose
of distribution estimation. Nevertheless, we would like to utilise the same approach to simplify the analysis and
to make the direct comparison between native and synthetic iceberg estimates possible.
The standard approach for a non-parametric distribution estimation of censored data is to use the
Kaplan–Meier estimate (Kaplan and Meier, 1958). Let $F_p(v)$ be the cumulative distribution function of $V_p$; then
$S_p(v) = 1 - F_p(v)$ is its survival function. Also define (for the given $p$)
where $C$ is the set of indices of all complete icebergs and $H_i$ is the set of tranche chain indices of the $i$-th iceberg
having total volumes equal to $u_j$; $\tilde{n}_j$ are computed similarly. Of course, when each tranche tree consists of only
one chain, all weights are equal to 1 and we have $\tilde{d}_j = d_j$ and $\tilde{n}_j = n_j$. Since $V_p$ only takes discrete values for
all $p$, we finally obtain the weighted estimate
$$\hat{S}_p(u_j) = \prod_{k=1}^{j} \left( 1 - \frac{\tilde{d}_k}{\tilde{n}_k} \right).$$
From $\hat{S}_p$ an estimate of the probability mass function $f_p(u_j) = P(V_p = u_j)$ can be obtained in a trivial way.
One notable problem with this estimate is that if $d_K = 0$, then $S_p(u_K) \neq 0$ and the probabilities do not sum
up to 1. This is fixed trivially by normalising the probabilities.
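For the unweighted case (one chain per iceberg), the estimate and the normalisation step can be sketched as follows. The function name `kaplan_meier_pmf` and the sample data are hypothetical; the counts follow the standard Kaplan–Meier convention, which is assumed here: $d_j$ is the number of complete icebergs with total volume $u_j$, and $n_j$ is the number of icebergs (complete or cancelled) with observed volume of at least $u_j$.

```python
def kaplan_meier_pmf(volumes, complete):
    """Normalised probability mass function of the total volume.

    volumes[i]  : observed total volume of iceberg i (a lower bound if cancelled)
    complete[i] : True if fully executed, False if cancelled (right-censored)
    """
    event_volumes = sorted({v for v, c in zip(volumes, complete) if c})
    if not event_volumes:
        raise ValueError("at least one complete iceberg is required")
    survival = 1.0
    pmf = {}
    for u in event_volumes:
        at_risk = sum(1 for v in volumes if v >= u)                          # n_j
        events = sum(1 for v, c in zip(volumes, complete) if c and v == u)   # d_j
        new_survival = survival * (1.0 - events / at_risk)
        pmf[u] = survival - new_survival   # mass is the drop of the survival function
        survival = new_survival
    total = sum(pmf.values())              # below 1 if the largest volume is censored
    return {u: p / total for u, p in pmf.items()}


# Hypothetical sample: five icebergs with the same peak size, two of them cancelled.
pmf = kaplan_meier_pmf([10, 20, 20, 30, 40], [True, True, False, True, False])
print(pmf)
```

Volumes at which only cancellations were observed contribute a factor of 1 to the product, so iterating over the volumes of complete icebergs alone gives the same result.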
5 Prediction
The prediction step starts by detecting the first several tranches of an iceberg: for native icebergs, this might be
any moment when the iceberg becomes “active”; for synthetic icebergs, this number is an algorithm parameter
with a default value of 3. If the peak size p is precisely detected, a prediction of the total volume can be made.
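• median prediction as
$$\hat{v}^{\text{median}} = \max \Big\{ u_J : \sum_{j=1}^{J} \hat{f}_p(u_j) \le 0.5,\ u_j \in \mathcal{V}_p \ \forall j = 1, \dots, |\mathcal{V}_p| \Big\};$$
• mode prediction as
$$\hat{v}^{\text{mode}(k)} = u_{(k)},$$
where the order of $u_{(1)}, \dots, u_{(|\mathcal{V}_p|)}$ is given by $\hat{f}_p(u_{(1)}) \ge \dots \ge \hat{f}_p(u_{(|\mathcal{V}_p|)})$. Tied volumes are taken in
ascending order.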
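Both rules are straightforward to express over an estimated probability mass function stored as a dictionary. The function names are hypothetical, and the fallback used when even the smallest volume carries more than half of the mass is an assumption that the text does not cover.

```python
def predict_median(pmf):
    """Largest volume whose cumulative probability does not exceed 0.5."""
    volumes = sorted(pmf)
    cumulative = 0.0
    best = volumes[0]  # assumed fallback if the first mass alone exceeds 0.5
    for u in volumes:
        cumulative += pmf[u]
        if cumulative <= 0.5:
            best = u
        else:
            break
    return best


def predict_modes(pmf, k):
    """The k most probable volumes; ties are taken in ascending order of volume."""
    ranked = sorted(pmf, key=lambda u: (-pmf[u], u))
    return ranked[:k]


# Hypothetical distribution of total volumes for one peak size.
example_pmf = {10: 0.125, 20: 0.375, 30: 0.375, 40: 0.125}
median = predict_median(example_pmf)
modes = predict_modes(example_pmf, k=2)
print(median, modes)
```

Returning several modes at once supports the mode(1), …, mode(k) evaluation, where a prediction counts as correct if any of the k candidates is correct.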
where $\mathcal{V}'_p$ is the constrained optimization space and $K_p$ is the number of unique total iceberg sizes with peak $p$.
For the sake of brevity we do not report other possible estimates, as they do not differ much. The predicted
total volume $\hat{v}_{\ell}^{\text{mode}}$ is aggregated across the chains of the iceberg in question as
6 Evaluation
Given an estimate $\hat{f}_p(v)$ and previously unseen data, the model can be evaluated both as a binary classifier
and as a regression. In the discussion below, we assume that the prediction algorithm was run, producing a set
of complete icebergs.
• For classification, our null hypothesis is “there is no hidden liquidity”. In the context of synthetic icebergs,
it means that the iceberg is complete and no more tranches will follow. In the context of native icebergs, it
means the last seen tranche can only be traded for the volume not exceeding its currently visible volume,
and that no more tranches will follow. Since the full information on a particular iceberg execution is
available after we run the prediction algorithm (each iceberg is eventually complete), the true total volume
is known² and hence the classification results can be summarised in a confusion matrix, from which we
compute the standard classification metrics: accuracy, precision, recall and F1 score.
• Regression performance metrics show the degree to which the prediction is different from the true total
volume.
The details of evaluation are slightly different for native and synthetic icebergs, and are given below. We hope
that the level of detail is sufficient so that there is no ambiguity about how the particular results were obtained.
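For reference, the four classification metrics follow from the confusion matrix counts in the usual way. The function name and the counts below are hypothetical and are not taken from the paper's tables.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 score from confusion matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=30, fp=10, tn=50, fn=10)
print(metrics)
```

Because true negatives enter only the accuracy, a classifier dominated by true negatives can show a high accuracy together with modest precision and recall.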
Classification After a new tranche $r$ arrives, the accumulated volume is $v_{i,r} + p_{i,r}$. Hence if $v_{i,r} + p_{i,r} < v_i$,
then the hypothesis is rejected (the true result is “negative”); consequently if $v_{i,r} + p_{i,r} < \hat{v}^{\cdot}_{i,r}$, then the outcome
is “true negative”, otherwise it is “false positive”; and vice versa. For mode(1), . . . , mode(k) predictions, consider
the prediction true if at least one of them was true.
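The decision rule can be written as a small function. The names are hypothetical, and the reading of "vice versa" used here is an assumption: when the accumulated volume has reached the true total, the outcome is a true positive if it has also reached the prediction, and a false negative otherwise.

```python
def classify_tranche(accumulated: int, true_total: int, predicted_total: float) -> str:
    """Outcome for the null hypothesis 'there is no hidden liquidity'."""
    truth_positive = accumulated >= true_total           # nothing is actually left
    predicted_positive = accumulated >= predicted_total  # model expects nothing more
    if truth_positive:
        return "TP" if predicted_positive else "FN"
    return "FP" if predicted_positive else "TN"


def classify_with_modes(accumulated, true_total, predictions):
    """Count a set of mode(1..k) predictions as correct if any one of them is correct."""
    outcomes = [classify_tranche(accumulated, true_total, p) for p in predictions]
    for outcome in outcomes:
        if outcome in ("TP", "TN"):
            return outcome
    return outcomes[0]


# Hypothetical iceberg with a true total volume of 43.
print(classify_tranche(15, 43, 40), classify_tranche(43, 43, 43))
```

Here `accumulated` stands for $v_{i,r} + p_{i,r}$, `true_total` for $v_i$ and `predicted_total` for the prediction made after tranche $r$.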
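$$\text{MAE} = \frac{1}{|R|} \sum_{i \in C} \sum_{r \in R_i} e^{\cdot}_{i,r}, \tag{2}$$
$$\text{RMSE} = \sqrt{ \frac{1}{|R|} \sum_{i \in C} \sum_{r \in R_i} \left( e^{\cdot}_{i,r} \right)^2 }, \tag{3}$$
where $R = \bigcup_{i \in C} R_i$ and $C$ is the set of complete icebergs.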
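Equations (2) and (3) pool the errors of all tranches of all complete icebergs. A minimal sketch is given below; the function name and the error values are hypothetical, and absolute values are taken explicitly on the assumption that the stored errors may be signed differences between predicted and true volumes.

```python
import math


def mae_rmse(errors_by_iceberg):
    """errors_by_iceberg: {iceberg id: [prediction error after each tranche]}."""
    errors = [e for per_tranche in errors_by_iceberg.values() for e in per_tranche]
    n = len(errors)  # |R|, the total number of tranche-level predictions
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse


# Hypothetical errors for two complete icebergs with two tranches each.
mae, rmse = mae_rmse({1: [3, -4], 2: [0, 5]})
print(mae, rmse)
```

The RMSE is never smaller than the MAE, and a large gap between the two points to a few icebergs with very large errors.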
Classification
• For the last tranche $r_{\max}$, if $\hat{V}^{\cdot}_{i,r_{\max}} = V^{\cdot}_{i,r_{\max}}$, then the case is a true positive, otherwise it is a false positive.
• For all but the last tranche ($r < r_{\max}$), if $\hat{V}^{\cdot}_{i,r} > V^{\cdot}_{i,r}$, then the case is a true negative, otherwise it is a false
negative.
7 Results
We estimated $f_p(v)$ on one day of ESU19 (E-Mini S&P 500 futures contract) FOD LOB log data: from
2019-06-18, roughly 16:45:00 CDT, to 2019-06-19, 16:00:00 CDT; for synthetic icebergs, dt was set to 0.3 seconds. The
choice of parameters and training intervals is empirical and may be optimised further, but this falls outside
the scope of this article. Our evidence suggests that it is reasonable to include at least one trading session in
the learning phase, thus capturing different order flow regimes throughout the day (see e.g. Bouchaud et al.,
2018, chapter 4).
The following figures were produced using the data for the aforementioned period. For synthetic icebergs,
the longest chain volume aggregation is used.
[Figure: panels with the axes Action (TRADE, MODIFY, DELETE, LIMIT (ALL)), Volume and Trade count.]
[Figure 5: two panels showing Iceberg count and % of all icebergs by Completion (all, cancelled, complete).]
Figure 5: Iceberg completion state distribution on ESU19 (E-Mini S&P 500) futures FOD LOB log data
(2019-06-18 16:45:00 CDT – 2019-06-19 16:00:00 CDT).
The proportion of iceberg orders to all orders on one trading day is shown in fig. 6 in terms of both volume
and number of orders. In case of synthetic icebergs, the results depend on the minimum number of tranches per
iceberg — that is, the number of tranches after which their sequence is considered an iceberg. By increasing
this parameter, we decrease the false positive rate at the cost of disregarding all icebergs of shorter lengths.
We divide the total volume of all iceberg orders by the total traded volume of all orders, as e.g. Frey and
Sandås (2017) do, and not by the total daily limit order volume. This ratio makes more sense because only executed
icebergs can be detected, which surely constitute only a fraction of all resting hidden volume. We estimate that
4% of all traded volume is contributed by native icebergs, while the volume contributed by synthetic icebergs
ranges from 3.3% to 14.3%, depending on the minimum number of tranches. This is in agreement with some of
the results reported in the literature, as alluded to earlier in section 2.
Moreover, as Fleming et al. (2018) note, usually there is no hidden depth, but when it is present, it is
substantial. This is especially true for native icebergs, which constitute 0.06% of all orders by number, but 4%
by volume; see fig. 6.
In addition, the following size-related distributions are estimated:
• Trade volume (fig. 7). At least with native icebergs, we confirm the finding of Christensen and
Woodmansey (2013) that order sizes tend to be multiples of 5, like 15, 25, 50 or 100, as can be seen in the
right panel; this might be indicative of a human bias.
[Figure 6: bar charts of the iceberg share (Iceberg / Total traded volume) for the categories Native, Synthetic, S. (all tranches) and S. (as one).]
[Figure 7: two panels over Volume, with Density on the left and % of orders on the right.]
[Figure: two panels showing Iceberg count (logarithmic scale) against Peak volume and against No. tranches per iceberg.]
Figure 10 visualises summary statistics related to the distributions of the number of tranches, the peak size
and the total volume per order. Note that the total volume of both native and synthetic icebergs is significantly
different from the size of all limit orders. Also, the median total volume is, in fact, identical for native
and synthetic icebergs (being equal to 6), but the means are different due to some native icebergs having an
extremely large size.
[Figure 10: three box-plot panels comparing native icebergs (Native i.), synthetic icebergs (Synthetic i.) and all orders (All).]
Figure 10: Summary of the distributions of the number of tranches per iceberg, the peak size and the total
volume per order. The lower and upper hinges correspond to the first and third quartiles. The
whiskers extend from the lower / upper hinge to the minimum / maximum value, respectively. The
middle bar is the median, while the red diamond dot is the mean.
Lastly, fig. 11 shows the distribution of arrival time differences between subsequent tranches. Zero values
are discarded for the purpose of drawing the plot, but they amount to 4.71%³ and 38.94% of all values for
synthetic and native icebergs, respectively. If the initial tranche is not considered, then it can be seen that
the majority of tranches arrive less than one second after the previous tranche (before being traded). This
suggests that the proposed detection algorithm is more suitable as an input to other trading algorithms, rather
than as a signal to a day trader, who would not be able to react sufficiently fast.
³ The fact that we observe zero delays for synthetic icebergs may be attributed to an insufficient accuracy of time records
(millisecond resolution).
[Figure 11: two density panels; x-axis: Arrival time difference (seconds), logarithmic scale; y-axis: Density; series: All tranches, All but 1st.]
Figure 11: Tranche arrival time difference distributions (for values strictly greater than zero). It is instructive
to compare two cases, with the initial tranche included in and excluded from consideration: it
might take longer to execute the first tranche of an iceberg after its initial placement, but the
following tranches get traded more rapidly.
Synthetic icebergs The classification performance is summarised in tables 4 and 5. The $\hat{V}^{\text{all}}$ and $\hat{V}^{\text{unique}}$ volume
aggregation methods showed similar results (about 1% difference in the classification metrics). For the sake of
brevity, only $\hat{V}^{\text{all}}$ and $\hat{V}^{\text{longest}}$ are displayed. The algorithm demonstrates a fair performance as indicated by
the ≈ 70% accuracy, although it is mainly contributed by the true negatives (the prediction that the iceberg is
not complete). This is especially true in the case of the “all chains” aggregation. When checking for the equality
$\hat{V}^{\cdot}_{r_{\max}} = V^{\cdot}_{r_{\max}}$, taking the longest chain gives a better result, because $\hat{V}^{\text{longest}}_{r_{\max}}$ is not averaged and thus is always
an integer. It is instructive to check the magnitude of the prediction error: the equality $\hat{V}^{\cdot}_{r_{\max}} = V^{\cdot}_{r_{\max}}$ might
not hold, but not by a large margin. Indeed, as the Regression section of table 4 demonstrates, the prediction
is off by 1.78 units of volume on average.
Table 4: Evaluation metrics for synthetic icebergs. Percentages for regression are given relative to the mean
total volume.
Native icebergs Of the total number of native icebergs (see fig. 5), 33 icebergs with non-unique peak size
values were filtered out, leaving 98% of the initial amount used to estimate the total volume distribution.
Tables 6 and 7 summarise the predictive performance on native icebergs. Mode (k) columns refer to the
metrics and confusion matrices computed using k best mode predictions. For the sake of brevity, we only
provide median and mode (3) confusion matrices as those averages have demonstrated the best results. As with
synthetic icebergs, high accuracy values are mainly contributed by a large number of true negatives. Note that
regression results are worse compared to the case of synthetic icebergs, which can possibly be explained by the
smaller sample size.
Regression
  MAE    94.60 (97.87%)    89.40 (92.49%)    99.66 (103.14%)   69.66 (72.07%)    61.79 (63.92%)
  RMSE   217.08 (224.58%)  234.45 (242.55%)  239.22 (247.49%)  204.35 (211.41%)  190.43 (197.01%)
Table 6: Evaluation metrics for native icebergs. Percentages for regression are given relative to the mean total
volume.
8.1 Detection
Detection of native icebergs is straightforward as the information disseminated by the exchange is sufficient to
reliably determine the sequence of tranches that constitute an iceberg order.
On the other hand, detecting synthetic icebergs is conceptually more complicated and can only be attempted
by relying on various heuristics. One inherent limitation of the proposed model is that the next tranche is
expected to arrive earlier than any other limit orders for the same price and volume combination after a trade.
In network graph terms, each node cannot have more than one child. This limitation may possibly be overcome
by considering more complex graphs where each tranche is allowed to have more than one child, although at
present it is unclear how to proceed with consistent inference in that case. The value of the dt parameter can be
optimised using cross-validation.
That being said, for an end user interested in predictions, it may not matter at all whether the detected limit
orders are a part of an iceberg order or not. Generally speaking, an order flow pattern gets detected, for which
a satisfactory prediction can be made. This information can in turn be used as an input to trading algorithms.
References
Bouchaud, J.-P., Bonart, J., Donier, J., and Gould, M. 2018. Trades, Quotes and Prices: Financial
Markets Under the Microscope. Cambridge University Press.
Christensen, H. and Woodmansey, R. 2013. Prediction of Hidden Liquidity in the Limit Order Book of
GLOBEX Futures. The Journal of Trading 8:68–95.
Kalbfleisch, J. and Prentice, R. 2002. The Statistical Analysis of Failure Time Data (Wiley Series in
Probability and Statistics). Wiley-Interscience, 2nd edition.
Kaplan, E. L. and Meier, P. 1958. Nonparametric Estimation from Incomplete Observations. Journal of
the American Statistical Association 53:457–481.
Moro, E., Vicente, J., Moyano, L. G., Gerig, A., Farmer, J. D., Vaglica, G., Lillo, F., and
Mantegna, R. N. 2009. Market impact and trading profile of hidden orders in stock markets. Physical
review. E, Statistical, nonlinear, and soft matter physics 80:066102.