This document provides an overview of efficient query processing infrastructures for web search engines. It discusses how search engines use distributed architectures across many servers to efficiently process queries at large scale. It also describes how search engines employ various techniques like index compression, skipping, dynamic pruning, and learning to rank to efficiently evaluate queries while maintaining effectiveness. The goal is to provide concise yet relevant search results to users as fast as possible despite the massive scale of web data.
Efficient Query Processing Infrastructures
1. Efficient Query Processing Infrastructures
Craig Macdonald 1, Nicola Tonellotto 2
1 School of Computer Science
University of Glasgow
2 Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”
National Research Council of Italy
3. How many documents? In how long?
• Reports suggest that Google considers a total of 30
trillion pages in the indexes of its search engine
– And it identified relevant results from these 30 trillion in 0.63 seconds
– Clearly this is a big data problem!
• To answer a user's query, a search engine doesn’t read
through all of those pages: the index data structures help
it to efficiently find pages that effectively match the query
and will help the user
– Effective: users want relevant search results
– Efficient: users aren't prepared to wait a long time for search results
• Today, we provide an insight into search engine
architecture and the modern retrieval techniques for
attaining efficient yet effective search
4. Search as a Distributed Problem
• To achieve efficiency at Big Data scale, search
engines use many servers:
• N & M can be very big:
– Microsoft's Bing search engine has "hundreds of
thousands of query servers"
Shard
Replica
Query
Server
Retrieval
Strategy
Shard
Replica
Query
Server
Retrieval
Strategy
Broker
Query
Scheduler
queries
N M
Results
Merging
8. Data centres use 3 per cent of the global electricity supply and account for about 2 per cent of total greenhouse gas emissions
This is expected to treble in the next decade, putting an enormous strain on
energy supplies and dealing a hefty blow to efforts to contain global warming
Data Centres are energy hungry…
Image Source: http://www.mwhglobal.com/project/wanapum-dam-spillway-project/
Source: The Independent
9. How to keep pace?
Income: more users, more ad revenue
Expenditure: more queries received, more documents to index
Search engines are growing
11. Green search engines are important
Keep your search engine efficient yet effective:
• Efficient: more throughput, fewer servers, less expenditure
• Effective: more users, more income
16. Query Plans: Executing a Query
• Different formulations are more effective & efficient for different queries:
– Homepage: match on Url/Title/Atext
– Informational: match on Body, more query expansion
• Query type classification, query performance predictors and query efficiency predictors all have a role in deciding the most appropriate plan for a query

Q → Q plan
"halloween costumes" → (halloween ∊ U|B|T) ⋀ ({costume,costumes} ∊ A|U|B|T)
"facebook login" → (facebook ∊ U|B|T)

Optimizing Query Evaluations using Reinforcement Learning for Web Search. Rosset et al, Proceedings of SIGIR 2018.
17. Cascades
• Typically, in web-scale search, the ranking process can be conceptually seen as a series of cascades [1]
– Rank some documents
– Pass top-ranked onto next cascade for refined re-ranking
[Cascade diagram: Boolean matching (e.g. AND/OR) asks "do query terms occur?" over 1,000,000,000s of documents; Simple Ranking (e.g. BM25) identifies a set of 1000s of documents most likely to contain relevant documents; Re-Ranking (learning to rank) tries really hard to get the top of the ranking (20 docs) correct, using many signals (features).]
[1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day.
18. Roadmap
[Roadmap, overlaid on the cascade diagram: Part 1: Index layouts; Part 2: Query evaluation & dynamic pruning; Part 3: Learning-to-rank; Part 4: Query Efficiency Prediction & Applications.]
19. Aims of this Tutorial
• ILO 1. learn about the fundamentals of query processing in
information retrieval in terms of the data structures and algorithms
involved (e.g. Document-at-a-Time/Term-at-a-time/Score-at-a-
time);
• ILO 2. learn about the state-of-the-art of dynamic pruning (query
evaluation) techniques (e.g., TAAT optimizations, impacts,
MaxScore, WAND, blockmax indexes, top-hitlists), illustrated
through extensive use of pseudo-code and examples.
• ILO 3. understand how such query processing fits into a multi-tier
search architecture, involving learning-to-rank and distributed
query processing.
• ILO 4. understand about safeness in dynamic pruning techniques,
and about the nature of the tradeoffs between efficiency and
effectiveness in query processing.
• ILO 5. learn about query efficiency prediction and its applications
to efficient yet effective web search and to operational costs
reduction, including a description of Bing’s use of QEP.
20. Outline
• 1330-1340 Introduction (Craig) (10 mins)
– Components, multi-stage architectures
• 1340-1410 Part 1: Data structures, compression & skipping (Craig) (30 mins)
• 1410-1500 Part 2: Query Evaluation (50 mins)
– TAAT/DAAT (Craig)
– dynamic pruning, impacts, SAAT (Nicola)
• 1500-1530 Coffee Break (30 mins)
• 1530-1625 Part 3: Second Stage Query Processing, aka LTR
– Learning to rank (Craig) (40 min)
– Quickscorer (Nicola) (15 mins)
• 1625-1650 Part 4: Query Efficiency Prediction (25 minutes)
• 1650-1700 Conclusions (5-10 minutes)
21. PART 1: INDEX LAYOUT &
COMPRESSION
21SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
22. The Format of an Index
[Diagram: during indexing, each Document passes through a Tokenizer and Term Pipeline to build the Index. The Lexicon holds (term, df, cf, pointer) entries; the Document Index holds (docid, length) entries; the Inverted Index holds lists of (docid, tf) postings, where each entry (posting) represents a document.]
• An index normally contains three sub-structures
• Lexicon: records the list of all unique terms and their statistics
• Document Index: records the list of all documents and their statistics. It could also contain other document information, e.g. PageRank
• Inverted Index: records the mapping between terms and documents. It could also contain other occurrence information, e.g. term positions, fields (title, URL)
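To make the three sub-structures concrete, here is a minimal C++ sketch of how their entries might be laid out (the field names and the uncompressed in-memory layout are illustrative assumptions; real systems store these compressed on disk or in memory):

#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Inverted index: one posting per (term, document) pair.
struct Posting {
    uint32_t docid;
    uint32_t tf;        // term frequency; could also carry positions, fields, ...
};

// Lexicon: per-term statistics plus a pointer to the term's posting list.
struct LexiconEntry {
    uint32_t df;        // document frequency (number of postings)
    uint64_t cf;        // collection frequency (total occurrences)
    uint64_t offset;    // pointer p: where the posting list starts
};

// Document index: per-document statistics.
struct DocumentEntry {
    uint32_t length;    // document length in tokens; could also carry PageRank, ...
};

struct Index {
    std::unordered_map<std::string, LexiconEntry> lexicon;
    std::vector<DocumentEntry> documents;        // indexed by docid
    std::vector<std::vector<Posting>> postings;  // one (compressed) list per term
};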
23. Search Efficiency
• It is important to make retrieval as
fast as possible
• Research indicates that even slightly slower retrieval (0.2s-0.4s) can lead to a dramatic drop in the perceived quality of the results [1]
• So what is the most costly part of a (classical) search system?
• Scoring each document for the user query
[1] Teevan et al. Slow Search: Information Retrieval without Time Constraints. HCIR'13
[Diagram: the Query passes through the Tokenizer and Term Pipeline; the Retrieval Model scores documents using the Index; Re-Ranking produces the Top Results.]
24. Why is Document Scoring
Expensive?
• The largest reason for the expense of document
scoring is that there are lots of documents:
– A Web search index can contain billions of
documents
• Google currently indexes trillions of pages [1]
• More specifically, the cost of a search is dependent on:
– Query length (the number of search terms)
– Posting list length for each query term
• i.e. The number of documents containing each term
[Diagram: two lexicon entries (term1, term2) with their posting lists of (docid, tf) pairs.]
[1] http://www.statisticbrain.com/total-number-of-pages-indexed-by-google/
26. Index Compression
• The compression of the inverted index posting lists is
essential for efficient scoring [1]
– Motivation: it physically takes time to read the term
posting lists, particularly if they are stored on a (slow)
hard disk
– Using lossless compressed layouts for the term posting
lists can save space (on disk or in memory) and reduce
the amount of time spent doing IO
– But decompression can also be expensive, so efficient
decompression is key!
Example: term2 → (21,2) (25,2) (26,5)
1 integer = 32 bits = 4 bytes, so these 6 integers take 24 bytes in total. Do we really need 32 bits per integer?
[1] Witten et al. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann
1999.
27. Delta Gaps
• One component of the information stored in a posting
list is the docids…
– …in ascending order!
• We can make smaller numbers by taking the
differences
• So each docid in the posting lists could be represented
using less bits
– How to represent these numbers?
– 32 bits has a range of -2147483648 .. 2147483647
– Using a fixed number of bits is wasteful

term2 (docid, tf): (21,2) (25,2) (26,5)
term2 (delta-gapped docids, tf): (21,2) (4,2) (1,5)
28. • Unary:
– Use as many 0s as the input value x, followed by a 1
– Eg: 5 is 000001
• Gamma:
– Let N=⌊log2 x⌋, i.e. 2^N is the highest power of 2 that x contains
– Write N out in unary representation, followed by the remainder (x – 2^N) in binary, using N bits
– Eg: 5 is represented as 00101
• Let’s represent docids as gamma, and tf as unary
Elias Unary & Gamma Encoding
term2, delta-gapped: (21,2) (4,2) (1,5)

Posting   Docid gap (Gamma)     tf (Unary)
(21,2)    000010101 (9 bits)    001 (3 bits)
(4,2)     00100 (5 bits)        001 (3 bits)
(1,5)     1 (1 bit)             000001 (6 bits)

Total: 27 bits, i.e. under 4 bytes, down from the 24 uncompressed bytes: about 4.5 bits per integer
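The two codes are easy to sketch in C++ (a minimal sketch: bits are written as characters for readability, whereas a real codec packs them into machine words, and docids are assumed to start at 1 so that every gap is positive):

#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Unary: x zeros followed by a terminating 1 (e.g. 5 -> "000001").
std::string unary(uint32_t x) {
    return std::string(x, '0') + '1';
}

// Elias gamma: N = floor(log2 x) in unary, then (x - 2^N) in N binary bits
// (e.g. 5 -> N=2 -> "001" + "01" = "00101"). Defined for x >= 1.
std::string gamma(uint32_t x) {
    uint32_t n = 0;
    while ((1u << (n + 1)) <= x) ++n;            // n = floor(log2 x)
    std::string out = unary(n);
    for (int b = (int)n - 1; b >= 0; --b)        // remainder x - 2^n in n bits
        out += ((x >> b) & 1u) ? '1' : '0';
    return out;
}

// Encode a posting list: gamma for delta-gapped docids, unary for term frequencies.
std::string encode(const std::vector<std::pair<uint32_t, uint32_t>>& postings) {
    std::string out;
    uint32_t prev = 0;
    for (const auto& p : postings) {
        out += gamma(p.first - prev);            // docid gap
        out += unary(p.second);                  // term frequency
        prev = p.first;
    }
    return out;
}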
29. Other Compression Schemes
• Elias Gamma & Elias Unary are moderately expensive to
decode: lots of bit twiddling!
– Other schemes are byte-aligned, e.g.
• Variable byte [1]
• Simple family [2]
• Documents are often clustered in the inverted index (e.g. by
URL ordering)
– Compression can be more effective in blocks of numbers
– List-adaptive techniques work on blocks of numbers
• Frame of reference (FOR) [3]
• Patched frame of reference (PFOR) [4]
[1] H.E. Williams and J. Zobel. Compressing Integers for Fast File Access. Comput. J. 1999
[2] V. Anh & A. Moffat. Inverted Index Compression using Word-aligned Binary Codes. INRT. 2005
[3] J. Goldstein et al. Compressing relations and indexes. ICDE 1998.
[4] M. Zukowski et al. Super-scalar RAM-CPU cache compression. ICDE 2006.
30. Frame of Reference
Idea: pick the minimum m and the maximum M values of the block of numbers that you are compressing.
• Then, any value x can be represented using b bits, where b = ⌈log2(M-m+1)⌉.
Example: to compress numbers in the range {2000,...,2063}:
– ⌈log2(64)⌉ = 6, so we use 6 bits per value
[Illustration: store the reference value 2000 once, then each value as a 6-bit offset from it, e.g. 2037 is stored as 37.]
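A minimal C++ sketch of Frame of Reference over one block, under the assumptions above: store the reference m and the bit width b once, then pack each value as a b-bit offset (the 32-bit word packing and the block struct are simplifications for illustration):

#include <algorithm>
#include <cstdint>
#include <vector>

struct FORBlock {
    uint32_t reference;             // the minimum value m of the block
    uint32_t bits;                  // b = ceil(log2(M - m + 1))
    std::vector<uint32_t> packed;   // offsets packed b bits at a time
};

// Assumes a non-empty block of values.
FORBlock for_encode(const std::vector<uint32_t>& values) {
    uint32_t m = values[0], M = values[0];
    for (uint32_t v : values) { m = std::min(m, v); M = std::max(M, v); }

    FORBlock block;
    block.reference = m;
    uint32_t range = M - m + 1;
    block.bits = 1;
    while ((1ull << block.bits) < range) ++block.bits;    // b = ceil(log2(range))

    uint64_t buffer = 0; uint32_t used = 0;
    for (uint32_t v : values) {                           // pack (v - m) into b bits
        buffer |= (uint64_t)(v - m) << used;
        used += block.bits;
        if (used >= 32) {
            block.packed.push_back((uint32_t)buffer);
            buffer >>= 32; used -= 32;
        }
    }
    if (used > 0) block.packed.push_back((uint32_t)buffer);
    return block;
}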
31. Patched Frame of Reference (PFOR)
● FOR is sensitive to outliers
– Eg: to compress {2000,...,9000,...,2064} we now need 13 bits per value
● Idea: pick b bits for compression such that it is "good" for most of the values, and treat the others as exceptions (32 bits each)
Zukowski et al, ICDE’06
32. PFOR flavors
● NewPFD
– compresses exceptions using a Simple family
codec
● OptPFD
– tries to optimize b to have the best compression
ratio and decompression speed
● FastPFOR
– subdivides exceptions by # used bits, then
compresses these sets using a FOR codec
NewPFD and OptPFD in Yan et al., WWW'09
FastPFOR in Lemire, Boytsov; Software: Practice and experience (2013)
33. Compression: Some numbers [1]
• ClueWeb09 corpus – 50 million Web documents
– 12,725,738,385 postings => 94.8GB inverted file uncompressed –
NO retrieval numbers: WAY TOO SLOW!
– Terrier’s standard Elias Gamma/Unary compression = 15GB
• Compression is essential for an efficient IR system
– List adaptive compression: slightly larger indices, markedly faster
              docids                tfs
              Time (s)   Size       Time (s)   Size
Gamma/Unary   1.55       -          1.55       -
Variable Byte +0.6%      +5%        +9%        +18.4%
Simple16      -7.1%      -0.2%      -2.6%      +0.7%
FOR           -9.7%      +1.3%      -3.2%      +4.1%
PForDelta     -7.7%      +1.2%      -1.3%      +3.3%

(Percentages are relative to the Gamma/Unary baseline; the docids and tfs column groups vary the compression of docids and term frequencies respectively.)
[1] M Catena, C Macdonald, and I Ounis. On Inverted Index Compression for Search Engine Efficiency. ECIR
2014.
34. Most Recently… Elias Fano
• A data structure from the ’70s (mostly in the succinct
data structures niche) for encoding of monotonically
increasing sequences of integers
• Recently successfully applied to inverted indexes
[Vigna, WSDM13]
– Used by Facebook Graph Search!
• Originally distribution-independent, but recently adapted to take into account document similarities, i.e., common terms: partitioned Elias-Fano [Ottaviano & Venturini, SIGIR14]
• Very fast implementations exist
– Skipping is included…
35. Posting List Iterators
• It is often convenient to see a posting list as an
iterator over its postings.
• APIs:
– p.docid() returns the docid of the current posting
– p.score() returns the score of the current posting
– p.next() moves the iterator sequentially to the next posting
– p.next(d) advances the iterator forward to the next posting with a document identifier greater than or equal to d => skipping
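In C++, the iterator API above might look as follows (a hedged sketch: the method names mirror the slide, while the in-memory postings vector and the END sentinel are assumptions):

#include <cstdint>
#include <limits>
#include <vector>

struct Posting { uint32_t docid; float score; };

// A posting list viewed as a cursor/iterator over its postings.
class PostingListIterator {
public:
    explicit PostingListIterator(std::vector<Posting> postings)
        : postings_(std::move(postings)), pos_(0) {}

    static constexpr uint32_t END = std::numeric_limits<uint32_t>::max();

    uint32_t docid() const {                   // docid of the current posting
        return pos_ < postings_.size() ? postings_[pos_].docid : END;
    }
    float score() const { return postings_[pos_].score; }

    void next() { ++pos_; }                    // move sequentially to the next posting

    void next(uint32_t d) {                    // advance to the first posting with docid >= d
        while (pos_ < postings_.size() && postings_[pos_].docid < d)
            ++pos_;                            // a real index would use skips here
    }

private:
    std::vector<Posting> postings_;
    size_t pos_;
};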
36. Skipping (I)
[Illustration: a posting list d1 d2 … d13, with the cursor at the current posting.]
To move the cursor to a specific docid, we need to read and decompress all postings in the middle.
37. Skipping (II)
[Illustration: the same posting list divided into blocks of compressed postings, with a skip entry recording the largest docid of each block.]
We append an additional structure to the start of the posting list, such that the largest docid in any block of compressed postings is known. Then, to move the cursor to a specific docid, we only need to read and decompress the skips and a few postings.
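A minimal sketch of how the skip structure is used: knowing the largest docid of each block, next(d) can skip whole blocks without decompressing them and scan only the one block that can contain d (the Block layout is an illustrative assumption; postings are left uncompressed here for brevity):

#include <cstdint>
#include <vector>

struct Block {
    uint32_t max_docid;              // skip entry: largest docid in this block
    std::vector<uint32_t> docids;    // block postings (compressed in a real index)
};

// Advance to the first docid >= d, inspecting at most one block's postings.
// Returns the docid, or UINT32_MAX if the list is exhausted.
uint32_t next_geq(const std::vector<Block>& blocks, uint32_t d) {
    for (const Block& b : blocks) {
        if (b.max_docid < d) continue;           // skip the whole block, undecoded
        for (uint32_t docid : b.docids)          // decompress/scan only this block
            if (docid >= d) return docid;
    }
    return UINT32_MAX;
}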
38. Essential Bibliography
• Matteo Catena, Craig Macdonald and Iadh Ounis. On Inverted Index Compression for
Search Engine Efficiency. In Proceedings of ECIR 2014.
• Witten, I.H., Bell, T.C., Moffat, A.: Managing Gigabytes: Compressing and Indexing
Documents and Images. 1st edn. (1994)
• Elias, P.: Universal codeword sets and representations of the integers. Trans. Info. Theory 21(2) (1975)
• Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization.
Software: Practice and Experience (2013)
• Delbru, R., Campinas, S., Samp, K., Tummarello, G.: Adaptive frame of reference for
compressing inverted lists. Technical Report 2010-12-16, DERI (2010)
• Williams, H.E., Zobel, J.: Compressing integers for fast file access. The Computer Journal
42 (1999)
• Scholer, F., Williams, H.E., Yiannis, J., Zobel, J.: Compression of inverted indexes for fast
query evaluation. In: Proc. SIGIR ’02. (2002)
• Anh, V.N., Moffat, A.: Inverted index compression using word-aligned binary codes. Inf.
Retr. 8(1) (2005)
• Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proc.
ICDE ’98. (1998)
• Alistair Moffat and Justin Zobel. 1996. Self-indexing inverted files for fast text
retrieval. ACM Trans. Inf. Syst. 14, 4 (October 1996), 349-379.
• Sebastiano Vigna. 2013. Quasi-succinct indices. In Proceedings of the sixth ACM
international conference on Web search and data mining (WSDM '13)
• Giuseppe Ottaviano, and Rossano Venturini. "Partitioned elias-fano indexes." Proceedings
of the 37th international ACM SIGIR conference on Research & development in information
retrieval. ACM, 2014.
40. Roadmap
[Roadmap, overlaid on the cascade diagram: Part 1: Index layouts; Part 2: Query evaluation & dynamic pruning; Part 3: Learning-to-rank; Part 4: Query Efficiency Prediction & Applications.]
41. Query Evaluation
• Conjunctive (AND) vs. Disjunctive (OR)
• Boolean retrieval
– Suitable for small/medium collections
– Returns all documents matching the query
• Ranked retrieval
– Suitable for Web-scale collections
– Requires a similarity function between queries and documents
– Returns only the top k documents
42. Query Evaluation
• Normal strategies make a pass on the postings lists
for each query term
– This can be done Term-at-a-Time (TAAT) – one query term
at a time
– Or Document-at-a-time (DAAT) – all query terms in parallel
• We will explain these, before showing how we can
improve them
43. Term-at-a-Time (TAAT)
[Illustration: the postings of term1 (docid 21, tf 2) are scored first, creating accumulators; then the postings of term2 (docids 21 and 25) update them, leaving accumulators 21 → 4 and 25 → 2, which are sorted into the final (rank, docid, score) list.]
Advantages:
• Simple
Disadvantages:
• Requires lots of memory to contain partial scores for all documents
• Difficult to do Boolean or phrase queries, as we don't have a document's postings for all query terms at once
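A minimal C++ sketch of plain TAAT as described above (the map-based accumulators and the precomputed per-posting scores are simplifying assumptions):

#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using PostingList = std::vector<std::pair<uint32_t, float>>;  // (docid, term score)

// Term-at-a-Time: process one posting list at a time, keeping accumulators.
std::vector<std::pair<uint32_t, float>> taat(const std::vector<PostingList>& query_terms,
                                             size_t k) {
    std::unordered_map<uint32_t, float> acc;          // partial score per docid
    for (const PostingList& term : query_terms)       // one full pass per term
        for (const auto& [docid, score] : term)
            acc[docid] += score;

    std::vector<std::pair<uint32_t, float>> results(acc.begin(), acc.end());
    std::partial_sort(results.begin(),
                      results.begin() + std::min(k, results.size()), results.end(),
                      [](const auto& a, const auto& b) { return a.second > b.second; });
    results.resize(std::min(k, results.size()));
    return results;
}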
44. Document-at-a-Time (DAAT)
[Illustration: the posting lists of term1 and term2 are traversed in parallel; docid 21 is scored completely (score 4) before moving on to docid 25 (score 2), and scored documents enter the (rank, docid, score) list directly.]
Advantages:
• Reduced memory compared to TAAT (and hence faster)
• Supports Boolean query operators, phrases, etc.
Disadvantages:
• Slightly more complex to implement
Most commercial search engines are reported to use DAAT
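And the matching sketch of plain (exhaustive) DAAT: all query-term cursors advance in parallel, each document is scored completely before moving on, and a min-heap keeps the top k (again, the in-memory posting lists and precomputed scores are simplifying assumptions):

#include <algorithm>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

using PostingList = std::vector<std::pair<uint32_t, float>>;  // (docid, term score)

std::vector<std::pair<uint32_t, float>> daat(const std::vector<PostingList>& lists, size_t k) {
    std::vector<size_t> pos(lists.size(), 0);                 // one cursor per term
    // min-heap of (score, docid) holding the current top k
    std::priority_queue<std::pair<float, uint32_t>,
                        std::vector<std::pair<float, uint32_t>>, std::greater<>> heap;
    while (true) {
        uint32_t current = UINT32_MAX;                        // smallest docid among cursors
        for (size_t i = 0; i < lists.size(); ++i)
            if (pos[i] < lists[i].size())
                current = std::min(current, lists[i][pos[i]].first);
        if (current == UINT32_MAX) break;                     // all lists exhausted

        float score = 0.0f;                                   // score this document fully
        for (size_t i = 0; i < lists.size(); ++i)
            if (pos[i] < lists[i].size() && lists[i][pos[i]].first == current) {
                score += lists[i][pos[i]].second;
                ++pos[i];                                     // advance matching cursors
            }
        heap.emplace(score, current);
        if (heap.size() > k) heap.pop();                      // keep only the best k
    }
    std::vector<std::pair<uint32_t, float>> results;
    for (; !heap.empty(); heap.pop())
        results.push_back({heap.top().second, heap.top().first});
    std::reverse(results.begin(), results.end());             // highest score first
    return results;
}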
45. TAAT vs DAAT
• TAAT and DAAT have been the cornerstones of query evaluation in IR systems since the 1970s.
• The plain implementations of these two strategies are seldom used anymore, since many optimizations have been proposed over the years
• Several systems in production today, from large-scale search engines such as Google and Yahoo!, to open source text indexing packages such as Lucene and Lemur, use some optimized variation of these strategies
46. TAAT vs DAAT
• [Turtle and Flood,1995] were the first to argue that DAAT could beat
TAAT in practical environments.
– For large corpora, DAAT implementations require a smaller run-time memory footprint
• [Fontoura et al., 2011] report experiments on small (200k docs) and large indexes (3M docs), with short (4.26 terms) and long (57.76 terms) queries (times are in microseconds):
[Table omitted. Source: [Fontoura et al., 2011]]
47. Bibliography
[Heaps, 1978] Information Retrieval: Computational and Theoretical Aspects. Academic Press, USA,
1978.
[Buckley and Lewit, 1985] Optimization of inverted vector searches. In Proc. SIGIR, pages 97–110,
1985. ACM.
[Turtle and Flood, 1995] Query evaluation: strategies and optimizations. IPM, 31(6):831–850, 1995.
[Moffat and Zobel, 1996] Self-indexing inverted files for fast text retrieval. ACM TOIS, 14(4):349–379,
1996.
[Kaszkiel and Zobel, 1998] Term-ordered query evaluation versus document-ordered query evaluation for large document databases. In Proc. SIGIR, pages 343–344, 1998. ACM.
[Kaszkiel et al., 1999] Efficient passage ranking for document databases. ACM TOIS, 17(4):406–439,
1999.
[Broder et al., 2003] Efficient query evaluation using a two-level retrieval process. In Proc. CIKM,
pages 426–434, 2003. ACM.
[Zobel and Moffat, 2006] Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006.
[Culpepper and Moffat, 2010] Efficient set intersection for inverted indexing. ACM TOIS, 29(1):1:1–
1:25, 2010.
[Fontoura et al., 2011] Evaluation strategies for top-k queries over memory-resident inverted indexes.
Proc. of VLDB Endowment, 4(12): 1213–1224, 2011.
49. Query Processing
• Breakdown
• Pre-process the query (e.g., tokenisation, stemming)
• Lookup the statistics for each term in the lexicon
• Process the postings for each query term, computing scores for
documents to identify the final retrieved set
• Output the retrieved set with metadata (e.g., URLs)
• What takes time?
• Number of query terms
– Longer queries have more terms with posting lists to process
• Length of posting lists
– More postings take longer to process
• Goal: avoid (unnecessary) scoring of postings
50. Dynamic Pruning during
Query Evaluation
• Dynamic pruning strategies aim to make scoring faster by
only scoring a subset of the documents
– The core assumption of these approaches is that the
user is only interested in the top K results, say K=20
– During query scoring, it is possible to determine if a
document cannot make the top K ranked results
– Hence, the scoring of such documents can be terminated
early, or skipped entirely, without damaging retrieval
effectiveness to rank K
• We call this “safe-to-rank K”
• Dynamic pruning is based upon
– Early termination
– Comparing upper bounds on retrieval scores with thresholds
51. Dynamic Pruning Techniques
• MaxScore [1]
– Early termination: does not compute scores for
documents that won’t be retrieved by comparing upper
bounds with a score threshold
• WAND [2]
– Approximate evaluation: does not consider documents
with approximate scores (sum of upper bounds) lower
than threshold
– Therefore, it focuses on the combinations of terms needed
(wAND)
• BlockMaxWAND
– State-of-the-art variant of WAND that benefits from the block layout of posting lists
[1] H Turtle & J Flood. Query Evaluation : Strategies and Optimisations. IPM: 31(6). 1995.
[2] A Broder et al. Efficient Query Evaluation using a Two-Level Retrieval Process. CIKM 2003.
52. Early Termination
A document evaluation is early terminated if all or some of its postings, relative to the query terms, are not fetched from the inverted index or not scored by the ranking function.
53. Term and document upper
bounds
• Similarity function between queries and documents: s(q,d) = Σ_{t∈q} s_t(q,d)
• For each term t in the vocabulary, we can compute a term upper bound (also known as max score) σ_t(q) such that, for all documents d in the posting list of term t: s_t(q,d) ≤ σ_t(q)
• For a query q and a document d, we can compute a document upper bound σ_d(q) by summing up the query term upper bounds and/or actual scores: s(q,d) ≤ σ_d(q)
54. Heap and Threshold
• During query processing, the top k full or partial scores computed so far, together with the corresponding docids, are organized in a priority queue, or min-heap
• The smallest of these (partial) scores is called the threshold θ
• If there are not yet at least k scores, the threshold value is assumed to be 0
[Illustration: a min-heap of K (docid, score) entries, d1 … dK.]
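The heap/threshold bookkeeping can be sketched as follows (a minimal C++ sketch; the class name and float scores are illustrative assumptions):

#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Maintains the top-k (score, docid) pairs seen so far; the smallest of them is θ.
class TopKHeap {
public:
    explicit TopKHeap(size_t k) : k_(k) {}

    // θ: the k-th best score so far, or 0 if fewer than k documents have been scored.
    float threshold() const { return heap_.size() < k_ ? 0.0f : heap_.top().first; }

    void push(uint32_t docid, float score) {
        if (heap_.size() < k_) heap_.emplace(score, docid);
        else if (score > heap_.top().first) {      // beats the current θ: replace the minimum
            heap_.pop();
            heap_.emplace(score, docid);
        }
    }

private:
    size_t k_;
    std::priority_queue<std::pair<float, uint32_t>,
                        std::vector<std::pair<float, uint32_t>>,
                        std::greater<>> heap_;      // min-heap on score
};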
55. Dynamic Pruning Condition
• Property I: The threshold value is non-decreasing
• Property II: A document with a score smaller than the threshold will never be in the final top K documents
• Pruning Condition: for a query q and a document d, if the document upper bound σ_d(q), computed by using partial scores (if any) and term upper bounds, is less than or equal to the current threshold θ, the document's processing can be early terminated.
56. Example of early termination
[Worked example omitted: as postings are scored, the document upper bound is refined by replacing term upper bounds with actual partial scores, e.g. 11.1 − 2.1 + 0.5.]
57. Effectiveness Guarantees
• safe / unoptimized: all documents, not just the top k, appear in the same order and with the same score as they would in the ranking produced by an unoptimized strategy.
• safe up to k / rank safe: the top k documents are ranked correctly, but the document scores are not guaranteed to coincide with the scores produced by an unoptimized strategy.
• unordered safe up to k / set safe: the documents coincide with the top k documents computed by a full strategy, but their ranking can be different.
• approximate / unsafe: no provable guarantees can be given on the correctness of any portion of the ranking produced by these optimizations.
(Guarantees listed from most to least strict.)
[Turtle and Flood, 1995] [Strohman, 2007]
58. TAAT Optimizations
• All TAAT dynamic pruning optimizations split the query
processing into two distinct phases.
• OR mode
– the normal TAAT algorithm is executed
– New accumulators are created and updated, until a certain pruning
condition is met
• AND mode
– no new accumulators are created,
– a different algorithm is executed, on the remaining terms and/or on the
accumulators created during the first phase.
59. AND Mode
• Quit.
– The processing of postings completely stops at the end of the first phase.
– No new accumulators are created and no postings from the remaining term’s posting lists are processed.
• Continue.
– The creation of new accumulators stops at the end of the first phase.
– The remaining terms' posting lists will be processed, but only to update the scores of the already existing accumulators.
• Decrease.
– The processing of postings in the second phase proceeds as in Continue.
– The number of accumulators is decreased as the remaining terms are processed.
– Those terms will have small score contributions, with few chances to alter the current top k document ranking.
– More importantly, memory occupancy can be reduced as soon as we realize that an existing accumulator can be dropped because it will never enter the final top k documents, with a corresponding benefit on response times.
[Moffat and Zobel, 1996]
[Anh and Moffat, 1998]
61. DAAT Optimizations
• We will cover three families of DAAT Optimizations
1. Top Docs:
– Adds record of "top" documents to the start of each posting list
2. Max Scores:
– Utilises thresholds and upper bounds
– Can skip over unscored postings
– E.g. MaxScore, WAND
3. Block Max:
– Leverages the block layout of the posting lists
– Can skip entire blocks of postings
– E.g. BlockMaxWand
62. Top Docs Optimizations
• [Brown, 1995] proposed to focus processing on documents with high
score contributions.
• For any given term t in the vocabulary, a top candidates list (also known
as a champions list) is stored separately as a new posting list.
• Then, the normal DAAT processing for any query q is applied to the top candidate posting lists only, considerably reducing the number of postings scored, but with potential negative impacts on effectiveness, since this optimization is unsafe.
63. Max Scores Optimizations
• MaxScore strategy [Turtle and Flood, 1995]
– Early termination: does not compute scores for documents that won’t be returned
– By comparing upper bounds with threshold
– Based on essential and non-essential posting lists
– Suitable for TAAT as well
• WAND strategy [Broder et al., 2003]
– Approximate evaluation: does not consider documents with approximate scores
(sum of upper bounds) lower than threshold
• Both use docid-sorted posting lists
• Both exploit skipping
64. MaxScore Example (I)
[Example: posting lists are sorted in increasing order of term upper bound; with threshold 6.0, no document can be returned as a top-k result if it appears only in the non-essential lists, i.e., 1.3 + 3.4 < 6.0.]
65. MaxScore Example (II)
[Example continued: the essential lists are processed in DAAT, and their score contributions are computed.]
66. MaxScore Example (III)
[Example continued: the non-essential lists are then processed by skipping directly to the candidate docid, and the partial score is updated. As soon as a partial score implies early termination, processing moves to the next document.]
68. WAND
WAND uses approximate evaluation: it does not consider documents whose approximate scores (sum of upper bounds) are lower than the threshold.
The Weak AND (or Weighted AND) operator WAND(X_1, w_1, ..., X_n, w_n, θ) is true if and only if Σ_i x_i·w_i ≥ θ, where:
• X_i is a Boolean variable,
• w_i is a weight,
• x_i is 1 if X_i is true and 0 otherwise,
• θ is a threshold
69. WAND Details
• Given a query q = {t1,...,tn} and a document d,
• Xi is true if and only if the term ti appears in
document d
• We take the term upper bound σi(d) as weight wi
• The threshold θ is the smallest score of the top k
documents scored so far during query processing
70SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
is true
iff
70. WAND Example (I)
[Example: the posting lists are always kept sorted by their current docid; the pivot docid is the first docid with a non-zero chance to end up in the top k results.]
71. WAND Example (II)
[Example continued: the preceding posting lists are moved up to the pivot (or right after it) using the p.next(d) operator.]
72. WAND Example (III)
[Example continued: keep the posting lists sorted by increasing current docid; since 2.1 + 1.3 + 4.0 > 6.0, the pivot document is fully evaluated.]
73. WAND Example (IV)
[Example continued: the iterators are advanced by one posting and the lists are kept sorted.]
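Putting the pivoting logic of the example into code, here is a minimal C++ sketch of one WAND step (the Cursor struct is an assumption; exhausted lists and the subsequent next(d)/full-evaluation step are only indicated in comments):

#include <algorithm>
#include <cstdint>
#include <vector>

struct Cursor {
    uint32_t docid;        // current docid of this term's posting list
    float upper_bound;     // term upper bound (max score) sigma_t
    // next(d) would advance the underlying posting list; modelled abstractly here
};

// One WAND pivot-selection step: returns the index of the pivot cursor,
// or -1 if no document can still make the top k (sum of all upper bounds <= theta).
int find_pivot(std::vector<Cursor>& cursors, float theta) {
    // keep cursors sorted by increasing current docid
    std::sort(cursors.begin(), cursors.end(),
              [](const Cursor& a, const Cursor& b) { return a.docid < b.docid; });
    float acc = 0.0f;
    for (size_t i = 0; i < cursors.size(); ++i) {
        acc += cursors[i].upper_bound;          // sum of upper bounds of the prefix
        if (acc > theta)
            return (int)i;                      // cursors[i].docid is the pivot docid
    }
    return -1;                                  // no candidate left
}
// If all preceding cursors already sit on the pivot docid, the pivot document is fully
// evaluated; otherwise the preceding cursors are advanced with next(pivot docid).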
74. Performance
[Table omitted: latency results (in ms) for DAAT, MaxScore and WAND (with speedups), for K = 30. Adapted from [Fontoura et al., 2011].]
[Table omitted: latency results (in ms) for DAAT, MaxScore and WAND (with speedups) for different query lengths, average query times on ClueWeb09, for K = 10. Adapted from [Mallia et al., 2017].]
75. Unsafe WAND
• Reducing K makes (BlockMax)WAND and
MaxScore faster
– But you get a shorter, but safe, ranking
• The dynamic pruning techniques can be also
configured to be unsafe
– This is achieved by multiplying the threshold Θ by a
constant factor F
– Increased speed, but some documents will not be
retrieved
76. Global vs Local
Term Upper Bounds
• A (global) term upper bound is computed over the scores of all
documents in a posting list
• Each posting list is sequentially divided into a block list, where each block contains a given number of consecutive postings, e.g., 128 postings per block.
• For each block, a block (local) term upper bound is computed, storing the maximum contribution of the postings in the block.
77. Block Max Indexes
BlockMaxWAND requires further information in the skip lists, i.e. Block Max Indexes.
[Diagram: the lexicon points both to the posting list blocks and to the corresponding block-max (skip) entries.]
78. Simplified Block Max WAND
pseudocode
while true
    pivot ← find pivot as in WAND (all lists) or break
    perform a shallow move // advance block list iterators to pivot
    compute block upper bound
    if pivot would enter heap
        compute score and update heap
        perform a deep move // advance posting list iterators to next docid
        reorder by docid
    else
        perform a shallow move // advance block list iterators to next block(s)
        perform a deep move // advance posting list iterators to next block(s)
        reorder by docid
return top K results in heap
79. Performance
[Table omitted: latency results (in ms) for DAAT, WAND and BMW (64-posting blocks), with speedups, for different query lengths; average query times (in ms) on Gov2, for K = 10.]
80. Variable Block Max WAND
Block size matters:
• too large → inaccurate estimation
• too small → average skip is small
Are fixed-size blocks the right choice?
82. Frequency-sorted Indexes
• An alternative way to arrange the posting lists is to sort
them such that the highest scoring documents appear
early.
• If an algorithm could find them at the beginning of
posting lists, dynamic pruning conditions can be
enforced very early.
• [Wong and Lee, 1993] proposed to sort posting lists in decreasing order of term-document frequency.
83. Impact-sorted Indexes
• [Anh et al., 2001] introduced the definition of impact of
term t in document d for the quantity wt(d)/Wd (or
document impact) and impact of term t in query q for
wt(q) (or query impact).
• They proposed to facilitate effective query pruning by sorting the posting lists in decreasing order of (document) impact, i.e., to process queries on an impact-sorted index
84. Quantized Impacts
• Transform impact values from real numbers in [L,U] to b-bit integers, i.e. 2^b buckets
• LeftGeom Quantization
• Uniform Quantization
• Reverse Mapping
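As an illustration of the idea, and not necessarily the exact LeftGeom/Uniform formulas of [Anh et al., 2001], here is a minimal C++ sketch of a uniform quantizer over [L, U] with 2^b buckets, plus a reverse mapping to a representative value:

#include <cstdint>

// Uniformly quantize an impact x in [L, U] into one of 2^b buckets (0 .. 2^b - 1).
uint32_t quantize_uniform(double x, double L, double U, unsigned b) {
    const uint32_t buckets = 1u << b;
    double q = (x - L) / (U - L) * buckets;        // position in [0, 2^b)
    uint32_t id = (uint32_t)q;
    return id >= buckets ? buckets - 1 : id;       // clamp x == U into the last bucket
}

// Reverse mapping: return a representative impact for a bucket (its midpoint here).
double dequantize_uniform(uint32_t id, double L, double U, unsigned b) {
    const uint32_t buckets = 1u << b;
    double width = (U - L) / buckets;
    return L + (id + 0.5) * width;
}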
85. Quantization Bits
• [Anh et al., 2001] show that 5 bits are reasonable to
encode the impacts with minimal effectiveness losses for
both quantization schemes
– with 10 bits, no effectiveness losses are reported
– This value should be tuned when using different collections
and/or different similarity functions.
• Crane et al. [2013] confirmed that 5 to 8 bits are enough
for small-medium document collections
– for larger collections, from 8 up to 25 bits are necessary,
depending on the effectiveness measure.
86. Compressing impacts
• [Trotman, 2014b] investigated the performance of integer compression
algorithms for frequency-sorted and impact-sorted indexes.
• The new SIMD compressor (QMX) is more time- and space-efficient than the existing SIMD codecs, as further examined by [Trotman and Lin, 2016].
• [Lin and Trotman, 2017] found that the best query processing performance with SAAT is actually obtained when no compression is used, even if the advantage w.r.t. QMX is small (∼5%).
• Uncompressed impact-sorted indexes can be up to two times larger than their QMX-compressed versions.
89. Dynamic Pruning for SAAT
• The impact blocks are initially processed in OR-mode, then AND-
mode, and finally in REFINE-mode.
• After the OR-mode, a fidelity control knob controls the
percentage of remaining postings to process thereafter (AND &
REFINE).
– 30% of the remaining postings results in good retrieval performance.
• Experiments by [Ahn & Moffat, 2006] showed that this optimization
is able to reduce the memory footprint by 98% w.r.t. SAAT, with
a speedup of 1.75×.
97SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
90. Anytime Ranking
• [Lin and Trotman, 2015] proposed a linear regression model to translate a deadline on the query processing time into the number of postings to process.
• [Mackenzie et al., 2017] proposed to stop after processing a given
percentage of the total postings in the query terms' posting lists, on
a per-query basis.
• According to their experiments, the fixed threshold may result in
reduced effectiveness, as the number of query terms increases,
but conversely it gives a very strict control over the tail latency of
the queries.
91. Essential Bibliography
[Wong and Lee, 1993]. Implementations of partial document ranking using inverted files. IPM, 29 (5):647–669, 1993.
[Persin, 1994]. Document filtering for fast ranking. In Proc. SIGIR, 1994. ACM.
[Turtle and Flood, 1995] Query evaluation: strategies and optimizations. IPM, 31(6):831–850, 1995.
[Anh et al., 2001] Vector-space ranking with effective early termination. In Proc. SIGIR, pages 35–42, 2001. ACM.
[Anh and Moffat, 2002]. Impact transformation: effective and efficient web retrieval. In Proc. SIGIR, pages 3–10, 2002. ACM.
[Broder et al., 2003] Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426–434, 2003. ACM.
[Anh and Moffat, 2006] Pruned query evaluation using pre-computed impacts. In Proc. SIGIR, pages 372–379, 2006. ACM.
[Zobel and Moffat, 2006] Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006.
[Strohman, 2007] Efficient Processing of Complex Features for Information Retrieval. PhD Thesis, 2007.
[Fontoura et al., 2011] Evaluation strategies for top-k queries over memory-resident inverted indexes. Proc. of VLDB Endowment, 4(12): 1213–
1224, 2011.
[Crane et al., 2013] Maintaining discriminatory power in quantized indexes. In Proc. CIKM, pages 1221–1224,2013. ACM.
[Lin and Trotman, 2015] Anytime ranking for impact-ordered indexes. In Proc. ICTIR, pages 301–304, 2015. ACM.
[Trotman and Lin, 2016] In vacuo and in situ evaluation of simd codecs. In Proc. ADCS, pages 1–8, 2016. ACM.
[Mackenzie et al., 2017] Early termination heuristics for score-at-a-time index traversal. In Proc. ADCS, pages 8:1–8:8, 2017. ACM.
[Lin and Trotman, 2017] The role of index compression in score-at-a-time query evaluation. Information Retrieval Journal, pages 1–22, 2017.
[Mallia et al., 2017] Faster blockmax wand with variable-sized blocks. In Proc. SIGIR, pages 625–634, 2017. ACM.
92. PART 3: LEARNING TO RANK
93. Roadmap
[Roadmap, overlaid on the cascade diagram: Part 1: Index layouts; Part 2: Query evaluation & dynamic pruning; Part 3: Learning-to-rank; Part 4: Query Efficiency Prediction & Applications.]
94. Motivations for Learning
• How to choose term weighting models?
• Different term weighting models have different assumptions
about how relevant documents should be retrieved
• Also:
• Field-based models: term occurrences in different fields
matter differently
• Proximity-models: close co-occurrences matter more
• (Deep learned matching models)
• Priors: documents with particular lengths or URL/inlink
values matter more
• Query Features: Long queries, difficult queries, query type
How to combine all these easily and appropriately?
95. Learning to Rank
• Application of tailored machine learning
techniques to automatically (select and) weight
retrieval features
• Based on training data with relevance assessments
• Learning to rank has been popularised by
commercial search engines (e.g. Bing, Baidu,
Yandex)
• They require large training datasets, possibly instantiated
from click-through data
• Click-through data has facilitated the deployment of
learning approaches
T.-Y. Liu. (2009). Learning to rank for information retrieval. Foundation and Trends in Information Retrieval, 3(3), 225–331.
96. Types of Features
• Typically, commercial search engines use hundreds of
features for ranking documents, usually categorised as
follows:
• Query Dependent Features (vary with both the query and the document): weighting models, e.g. BM25, PL2; proximity models, e.g. Markov Random Fields; field-based weighting models, e.g. PL2F; deep learned matched representations
• Query Independent Features (vary with the document only): PageRank, number of inlinks, spamminess
• Query Features (vary with the query only): query length, presence of entities
97. Learning to Rank
1. Sample Identification
• Apply BM25 or similar (e.g. DFR PL2) to rank documents
with respect to the query
• Hope that the sample contains enough relevant documents
2. Compute more features
• Query Dependent
• Query Independent
• Query Features
3A. Learn ranking model
• Based on training data
3B. Apply learned model
• Re-rank sample documents
98. Schematically…
[Schematic: the sample ranking and its features, generated using the same previous two steps plus relevance assessments, are fed to the learning-to-rank technique, which produces the learned model.]
N. Tonellotto, C. Macdonald, I. Ounis. (2013). Efficient and effective retrieval using selective pruning. WSDM'13.
99. Query-Dependent
Feature Extraction
• We might typically deploy a number of query dependent
features
– Such as additional weighting models, e.g. fields/proximity, that are
calculated based on information in the inverted index
• We want the result set passed to the final cascade to have
all query dependent features computed
– Once first cascade retrieval has ended, it’s too late to compute
features without re-traversing the inverted index postings lists
– It would take too long to compute all features for all scored
documents
• Solutions:
– 1. cache the "fat" postings for documents that might make the
sample (Postings contain frequencies, positions, fields, etc.)
– 2. access the document vectors after retrieval
100. d30 4.3 <p1>,<p2>
d19 3.7 <p1>
d42 3.3 <p1,p2>
Caching the Fat
• Solution 1: cache the "fat" postings for documents
that might make the sample
Q
Initial
Sample Retrieval
d30 4.3
d19 3.7
d42 3.3
Calculate Features;
Apply LTR
d19 1.0
d42 0.9
d30 0.5
Sample ResultSet
“Fat” Final
ResultSet
Inv
C Macdonald et al. (2012). About Learning Models with Multiple Query Dependent Features. TOIS
31(3). 108
(With
positions)
Lookup cost is almost free – we had to access the inverted index anyway
✗ We only have access to postings for the original query terms
101. Accessing Document Vectors
• Solution 2: access direct/forward index document vectors
[Illustration: the initial sample retrieval over the inverted index produces the sample result set (d30 4.3; d19 3.7; d42 3.3); the direct (forward) index entries for those documents, which need not contain positions, are then used to calculate features and apply LTR, producing the final result set (d19 1.0; d42 0.9; d30 0.5).]
+ We can calculate features for query terms not in the original query
✗ Additional lookup for K documents
✗ Direct index postings contain information for all terms in a document
Asadi & Lin. Document vector representations for feature extraction in multi-stage document ranking. Inf. Retr. 16(6): 747-768 (2013)
102. Learning to Rank Process
[Schematic: for each training query, a document sample is labelled with relevance assessments (✓/✗) and feature vectors; the learner produces a learned model, which is then used to re-rank the sample for an unseen query.]
Training examples:
     f1   f2   f3   f4
d1   0.2  0.1  0.5  0.8
d2   0.5  0.7  0.0  0.2
d3   0.3  0.2  0.5  0.0
103. The Importance of Sample Size
• What sample size is necessary for effective
learning?
• Small samples: faster learning, faster retrieval
• Large samples: higher recall
[Plot: NDCG@20 for different sample sizes. A small sample size degrades performance; performance increases as the sample size rises; there is no benefit for sample sizes larger than 1000; pairwise techniques struggle with large samples.]
C. Macdonald, R. Santos and I. Ounis. (2012). The Whens and Hows of Learning to Rank. IR Journal. DOI: 10.1007/s10791-012-9209-9
105. Types of Learned Models (1)
Linear Model
• Linear Models (the most intuitive to comprehend)
– Many learning to rank techniques generate a linear combination of feature values:
score(d,Q) = Σ_f w_f · value_f(d)
– Linear models make some assumptions:
• Feature usage: they assume that the same features are needed by all queries
• Model form: the model is only a linear combination of feature values
- Contrast this with genetic algorithms, which can learn functional combinations of features by randomly introducing operators (e.g. try dividing feature a by feature b), but are impractical to learn
• It is difficult to find w_f values that maximise the performance of an IR evaluation metric, as such metrics are non-smooth and non-differentiable
– Typically, techniques such as simulated annealing or stochastic gradient descent are used to empirically obtain the w_f values
106. Types of Learned Models (2): Regression Trees
– A regression tree is a series of decisions, leading to a partial score output
– The outcome of the learner is a "forest" of many such trees (hundreds or thousands of them!), whose partial scores are summed (Σ) for each document in the sample to calculate the final score of a document for a query
– Their ability to customise branches makes them more effective than linear models
– Regression trees are pointwise in nature, but several major search engines have created adapted regression tree techniques that are listwise
– E.g. Microsoft's LambdaMART is at the heart of the Bing search engine
S. Tyree, K. Weinberger, K. Agrawal, J. Paykin. (2011). Parallel Boosted Regression Trees for Web Search Ranking. WWW’11.
107. Learning to Rank Best
Practices (Query Features)
• Tree-based learning to rank techniques have an
inherent advantage
• They can activate different parts of their learned model
depending on feature values
• We can add “query features”, which allow different sub-trees
for queries with different characteristics
– E.g. long query, apply proximity
Class of Query Features                        NDCG@20
Baseline (47 document features), LambdaMART    0.2832
Query Performance Predictors                   0.3033*
Query Topic Classification (entities)          0.3109*
Query Concept Identification (concepts)        0.3049*
Query Log Mining (query frequency)             0.3085*
C. Macdonald, R. Santos and I. Ounis. (2012). On the Usefulness of Query Features for Learning to Rank. CIKM’12.
108. Learning to Rank
Best Practices
• About the sample – it must be:
– Small enough that calculating features doesn’t take too long
– Small enough that BMW/WAND/MaxScore are fast!
– Large enough to have sufficient recall of relevant documents
• The previous slide says 1000 documents are needed for TREC Web corpora
– Simple early cascade: only time for a single weighting model to create the sample
• E.g. PL2 < BM25 with no features; but when features are added, no real difference
• Multiple weighting models as features:
• Using more weighting models does improve the learned model
• Field Models as Features
• Adding field-based models as features always improves
effectiveness
• In general, Query Independent features (e.g. PageRank,
Inlinks, URL length) improve effectiveness
C. Macdonald, R. Santos, I. Ounis, B. He. About Learning Models with Multiple Query Dependent Features. TOIS 31(3), 2013.
C Macdonald & I Ounis (2012). The Whens & Hows of Learning to Rank. INRT. 16(5), 2012.
110. Additive Ensembles of Regression Trees
• Tree forests: Gradient-Boosted Regression Trees, LambdaMART, Random Forests, etc.
– ensemble of weak learners, each contributing a partial score
– at scoring time, all trees can be processed independently
[Diagram: trees T1 … Tn each take (q, d) and produce partial scores s1 … sn, which are summed.]
Assuming:
• 3,000 trees
• an average tree depth of 10 nodes
• 1,000 documents scored per query
• a cluster with 1,000 search shards
Costs
• 3,000 x 10 x 1,000 = 30 M tests per query
and per shard
• Approx. 30 G tests for the entire search
cluster
111. Processing a query-document pair
[Illustration: a query-document pair is represented by a feature vector over F1 … F8 (e.g. 13.3, 0.12, -1.2, 43.9, 11, -0.4, 7.98, 2.55); each internal tree node tests one feature against a threshold (e.g. d[F4] ≤ 50.1), branching true/false until an exit leaf is reached, whose value (e.g. 2.0) is added to the document score: s(d) += 2.0.]
Typical scale:
- number of trees = 1,000 – 20,000
- number of leaves = 4 – 64
- number of docs = 3,000 – 10,000
- number of features = 100 – 1,000
112. Struct+
• the feature id,
• the associated threshold
• the left and right pointers
• Memory is allocated for all the nodes at once
• The nodes are laid out using a breadth-first traversal of the tree
Node* getLeaf(Node* np, float* fv) {
if (!np->left && !np->right) {
return np;
}
if (fv[np->fid] <= np->threshold) {
return getLeaf(np->left, fv);
} else {
return getLeaf(np->right, fv);
}
}
Tree traversal:
- No guarantees about references locality
- Low cache hit ratio
- High branch misprediction rate
- Processor pipeline flushing
113. CodeGen
if (x[4] <= 50.1) {
// recurses on the left subtree
...
} else {
// recurses on the right subtree
if(x[3] <= -3.0) {
result = 0.4;
} else {
result = -1.4;
}
}
Tree traversal:
- Static generation
- Recompilation if model changes
- Compiler "black magic" optimizations
- Expected to be fast
- Compact code, expected to fit in I-cache
- High cache hit ratio
- No data dependencies, a lot of control dependencies
- High branch misprediction rate
114. VPred
• All trees in a single array, need additional data structures to
locate a tree in the array, and to know its height
Tree traversal:
Theory Practice
double depth4(float* x, Node* nodes) {
  int nodeId = 0;
  nodeId = nodes[nodeId].children[
      x[nodes[nodeId].fid] > nodes[nodeId].theta];
  nodeId = nodes[nodeId].children[
      x[nodes[nodeId].fid] > nodes[nodeId].theta];
  nodeId = nodes[nodeId].children[
      x[nodes[nodeId].fid] > nodes[nodeId].theta];
  nodeId = nodes[nodeId].children[
      x[nodes[nodeId].fid] > nodes[nodeId].theta];
  return scores[nodeId];
}
- Static unrolling of traversal functions
- High compilation times
- Compiler "black magic" optimizations
- Expected to be fast
- In-code vectorization
- Processing 16 documents at a time
- Requires tuning on dataset
- No control dependencies, a lot of data dependencies
- Low branch misprediction rate
115. Vectorization of Trees
[Illustration (QuickScorer-style layout): the test nodes of all trees T1 … Tn are regrouped by feature; for each feature (f0, f1, …) a thresholds array stores the thresholds of all test nodes involving that feature, and an offsets array of size |F| marks where each feature's thresholds begin.]
116. Processing Trees in Vectors
while !(x[f0] <= thresholds[i]) do
    // bitmask operations
• Linear access to data (offsets, thresholds)
• Highly predictable conditional branches
• True nodes are not inspected at all
118. Performance
Per-document scoring time (in µs) of QS, VPred, If-Then-Else and Struct+ on the MSN-1 and Y!S1 datasets, for 1,000–20,000 trees and 8–64 leaves per tree (gain factors of QS over each competitor in parentheses).
[Table omitted: QS is always the fastest method; VPred is roughly 2x–6.5x slower, If-Then-Else roughly 2.4x–26x slower, and Struct+ roughly 5.4x–38x slower, with the gaps growing as the number of trees and leaves increases.]
The interleaved traversal strategy of QS processes fewer nodes than a traditional root-to-leaf visit, which largely explains these results; QS and VPred also incur far fewer branches than If-Then-Else and Struct+, and QS's L3 cache misses only start to grow beyond about 9,000 trees on MSN-1 and 8,000 trees on Y!S1, motivating BWQS, a block-wise variant of QS.
119. Essential Bibliography
• Craig Macdonald, Rodrygo Santos and Iadh Ounis. The Whens and Hows of Learning to Rank. Information
Retrieval 16(5):584-628.
• Craig Macdonald, Rodrygo Santos and Iadh Ounis. On the Usefulness of Query Features for Learning to Rank. In
Proceedings of CIKM 2012.
• C Macdonald et al, About Learning Models with Multiple Query Dependent Features. (2012). TOIS 31(3).
• N Fuhr. Optimal Polynomial Retrieval Functions Based on the Probability Ranking Principle. (1989). TOIS. 7(3),
1989.
• S. Tyree, K.Q. Weinberger, K. Agrawal, J. Paykin Parallel boosted regression trees for web search ranking. WWW
2011.
• N. Asadi, J. Lin, and A. P. de Vries. Runtime optimizations for tree-based machine learning models. IEEE
Transactions on Knowledge and Data Engineering, 26(9):2281–2292, 2014.
• C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Quickscorer: A fast algorithm to
rank documents with additive ensembles of regression trees. In Proc. of the 38th ACM SIGIR Conference, pages
73–82, 2015.
• D. Dato, C. Lucchese, F.M. Nardini, S. Orlando, R. Perego, N. Tonellotto, R. Venturini. Fast Ranking with Additive
Ensembles of Oblivious and Non-Oblivious Regression Trees. ACM Transactions on Information Systems, 2016.
• C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Exploiting CPU SIMD
Extensions to Speed-up Document Scoring with Tree Ensembles. In Proc. of the 39th ACM SIGIR Conference,
pages 833-836, 2016.
• F. Lettich, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. GPU-based
Parallelization of QuickScorer to Speed-up Document Ranking with Tree Ensembles. In Proc. of the 7th IIR
Workshop, 2016
121. Roadmap
[Roadmap, overlaid on the cascade diagram: Part 1: Index layouts; Part 2: Query evaluation & dynamic pruning; Part 3: Learning-to-rank; Part 4: Query Efficiency Prediction & Applications.]
122. If we know how long a query will take,
can we reconfigure the search engine's
cascades?
[Cascade diagram as before: Boolean matching ("do query terms occur?") over 1,000,000,000s of documents; Simple Ranking identifies a set of 1000s of documents most likely to contain relevant documents; Re-Ranking tries really hard to get the top of the ranking (10 docs) correct, using many signals (features).]
[1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day.
123. Main Idea:
Query Efficiency Prediction
• Predict how long an unseen query will take to execute, before it has
executed.
• This facilitates at least three ways to make a search engine more efficient:
1. Reconfigure the cascades of the search engine, trading off a little
effectiveness for efficiency
2. Apply more CPU cores to long-running queries
3. Decide how to plan the rewrites of a query, to reduce long-
running queries
In each case: increasing efficiency means increased server
capacity and energy savings
124. What makes a query fast or slow?
[Scatter plots of response times for 2-term and 4-term queries.]
• Length of posting lists?
• Length of query?
• Full / MaxScore / WAND?
• Co-occurrence of query terms?
125. Query Efficiency
Prediction
• Can we predict response time for a query (query efficiency
prediction)?
• Properties:
– Can be calculated online before retrieval commences
– Easy (quick) to calculate
– Based on single term statistics, previously calculated offline
– Includes statistics on posting list’s score distribution
[Schematic: term-level statistics, previously computed offline, are combined by an aggregation function into predictor features.]
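To make this concrete, the following is a minimal C++ sketch of the idea under illustrative assumptions (these are not the actual 42 features): term-level statistics computed offline are aggregated into query-level features, which are fed to a previously trained regression model:

#include <algorithm>
#include <cstddef>
#include <vector>

struct TermStats {                 // precomputed offline, one per query term
    double postings;               // posting list length
    double max_score;              // term upper bound
};

struct QueryFeatures { double sum_postings, max_postings, sum_maxscore, num_terms; };

// Aggregate term-level statistics over the query terms (sum / max / ...).
QueryFeatures extract(const std::vector<TermStats>& terms) {
    QueryFeatures f{0, 0, 0, (double)terms.size()};
    for (const TermStats& t : terms) {
        f.sum_postings += t.postings;
        f.max_postings = std::max(f.max_postings, t.postings);
        f.sum_maxscore += t.max_score;
    }
    return f;
}

// Query efficiency prediction: a (previously trained) linear regression over the features.
double predict_response_time(const QueryFeatures& f, const std::vector<double>& w) {
    const double x[] = {1.0, f.sum_postings, f.max_postings, f.sum_maxscore, f.num_terms};
    double t = 0.0;
    for (size_t i = 0; i < w.size() && i < 5; ++i) t += w[i] * x[i];
    return t;                      // predicted execution time, e.g. in milliseconds
}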
126. Query Efficiency Prediction Framework
• We combine statistics of the query terms in a regression model to accurately predict the execution time of a query
• 42 inexpensive features (baseline: total number of postings)
127. Accuracy of Query
Efficiency Predictors
Findings: our learned predictor, with 42 features, can accurately predict the response time of the search engine for an unseen query
128. Three Applications of Query
Efficiency Prediction
1. Selective retrieval – can we reduce the time taken in the earlier retrieval cascades, e.g. by retrieving fewer documents?
2. Selective parallelisation – apply more CPU cores for
long-running queries
3. Selective query rewriting – can we decide how to
rewrite a query on a per-query basis?
• Recall that, for each application: increasing efficiency
means increased server capacity and energy savings
130. QEP Application (1): Selective Retrieval
• If we only need to return 10 documents to the user, do we need to re-score 1000s?
• i.e. can we configure the earlier cascades to be faster for longer queries, but potentially miss relevant documents?
[Diagram: the standard cascade (1,000,000,000s of documents → 1000s of documents → 20 docs) is selectively tightened, e.g. to 500,000,000 docs → 500s of docs → 10 docs, for queries predicted to be slow.]
Proposed Method [Tonellotto, Macdonald, Ounis, WSDM 2013]:
• Predict the efficiency (response time) of a query
• Adjust <K, F> of WAND to be faster for predicted longer queries
131. Selective Pruning Results
Findings: by using our 42-feature query efficiency predictor to selectively adjust the
retrieval cascades, we could reduce response time by 31%, while maintaining high
effectiveness
The increased capacity by the better efficiency is equivalent to 44% more servers
133. QEP Application (2):
Selective Parallelisation
• Industry web search deployments have to meet service level
agreements
– e.g. 200ms for 99th-percentile response time
– Only 1 query in 100 may take longer than 200ms
• Some queries can take > 200ms
Problem: How to identify and speed-up those slow queries?
134. Parallelisation: Retrieval with multiple cores
• Parallelisation using N threads speeds up all queries, but
decreases server capacity to 1/N
• Idea! Selective Parallelisation – only parallelise those
queries predicted to be too long
[Illustration: the query "world cup" is predicted to be long-running; the posting lists of "world" (6.9B postings) and "cup" (1.3B postings) from the inverted index are therefore processed in parallel on CPU 1 and CPU 2.]
135. As used by Bing!
• Jeon et al. report that this approach reduces the 99th-percentile response time by 50%, from 200ms to 100ms, compared to other state-of-the-art approaches that do not consider predicted execution time
• In doing so, it increases server capacity by more
than 50%. This potentially saves one-third of
production servers, constituting a significant cost
reduction
• Selective parallelisation, using query efficiency
predictors, is now being deployed by industrial
search engine Bing, across hundreds of
thousands of servers
[Predictive parallelization: Taming tail latencies in web search.
M Jeon, S Kim, S Hwang, Y He, S Elnikety, AL Cox, S Rixner.
Proceedings of ACM SIGIR 2014]
137. Query Rewriting Plans
• Remember our notion of a query plan, incl. rewriting the query
– E.g. Context-Sensitive Stemming [1] or MRF Proximity: [2]
• All these rewritings use complex operators, e.g. #syn() #uwN()
– #syn: documents containing any of the words
– #1: documents containing the exact phrase
– #uwN: documents containing the words in a window of size N
• Key Question: how to rewrite a query, to maximise efficiency AND
effectiveness
[1] F. Peng et al. Context Sensitive Stemming for Web Search. SIGIR 2007.
[2] D. Metzler, W.B. Croft. A Markov Random Field Model for Term Dependencies. SIGIR 2005.
138. Different queries can be rewritten in different ways
Different queries (and rewritings) can take varying
durations to execute
We describe a framework that plans how to execute
each query (e.g. rewriting), taking into account:
How long each plan may take to execute
Expected effectiveness of each plan
139. A Selective Rewriting Approach
• The selective mechanism is based around two ideas:
1. We can predict the efficiency of a given query plan (QEP, including for complex queries), and hence eliminate plans that will exceed the target response time
2. We can predict the effectiveness of a given query plan (QPP, or an expectation of plan effectiveness), and where possible execute the most effective one
140. Complex terms
and ephemeral posting lists
• Recall, QEP essentially estimates pruning, based on posting list lengths
• Challenge – obtaining posting list length estimates for queries involving
complex terms, e.g.
car (unigram):              (21,2) (25,3) (26,5)
cars (unigram):             (19,1) (25,2)
#syn(car,cars) (ephemeral): (19,1) (21,2) (25,5) (26,6)
141. Observations on Ephemeral Posting Lists
• Observation: Statistics for ephemeral postings lists
can be bounded…
– e.g. #syn is disjunctive, so the length of its posting list
cannot be greater than the sum of the posting list lengths
of its constituent terms
– e.g. #1 is conjunctive, the length of its postings list cannot
be greater than the minimum of the posting list lengths of
its constituent terms
– e.g. for #1, the maximum number of matches for a phrase
cannot exceed the minimum of the max term frequencies
of the constituent terms
• These allow:
a) upper bound statistics (necessary for doing dynamic
pruning with complex terms)
b) Efficiency predictions for queries with complex terms
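These observations can be sketched directly (a minimal C++ illustration; the TermStats fields and the summed max-tf bound for #syn are assumptions):

#include <algorithm>
#include <cstdint>
#include <vector>

struct TermStats {
    uint64_t list_length;   // number of postings
    uint32_t max_tf;        // maximum term frequency in any document
};

// #syn(t1,...,tn) is disjunctive: its list is no longer than the sum of the lists,
// and its frequency in a document is at most the sum of the constituents' max tfs.
TermStats bound_syn(const std::vector<TermStats>& terms) {
    TermStats b{0, 0};
    for (const TermStats& t : terms) {
        b.list_length += t.list_length;
        b.max_tf += t.max_tf;
    }
    return b;
}

// #1(t1,...,tn) is conjunctive: its list is no longer than the shortest list,
// and a phrase cannot occur more often than the rarest constituent term.
TermStats bound_phrase(const std::vector<TermStats>& terms) {
    TermStats b{UINT64_MAX, UINT32_MAX};
    for (const TermStats& t : terms) {
        b.list_length = std::min(b.list_length, t.list_length);
        b.max_tf = std::min(b.max_tf, t.max_tf);
    }
    return b;
}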
142. Putting it together: Complex QEP & Online Plan Selection
49% decrease in mean
response time, and 62%
decrease in tail response time,
compared to Default, w/o
significant loss in NDCG
143. Looking Forward
• The query plans we showed here demonstrate
only two possible rewritings:
– Proximity and stemming
– They don’t consider the (predicted) effectiveness of a
plan
• We highlight Rosset et al, who have proposed
use of reinforcement learning to identify the most
appropriate plan per-query and per-index block.
Optimizing Query Evaluations using Reinforcement Learning for Web Search.
Rosset et al, Proceedings of SIGIR 2018.
144. QEP Bibliography
• Predictive parallelization: Taming tail latencies in web search.
M Jeon, S Kim, S Hwang, Y He, S Elnikety, AL Cox, S Rixner.
Proceedings of ACM SIGIR 2014
• Load-Sensitive Selective Pruning for Distributed
Search. Nicola Tonellotto, Fabrizio Silvestri, Raffaele Perego,
Daniele Broccolo, Salvatore Orlando, Craig Macdonald and
Iadh Ounis. In Proceedings of CIKM 2013.
• Efficient and effective retrieval using selective pruning. Nicola
Tonellotto, Craig Macdonald and Iadh Ounis. In Proceedings
of WSDM 2013.
• Learning to Predict Response Times for Online Query
Scheduling. Craig Macdonald, Nicola Tonellotto and Iadh
Ounis. In Proceedings of SIGIR 2012.
• Efficient & Effective Selective Query Rewriting with Efficiency
Predictions. Craig Macdonald, Nicola Tonellotto, Iadh Ounis.
In Proceedings of SIGIR 2017.
146. Conclusions
• Information retrieval requires not just effective, but also
efficient techniques
– These two form an inherent tradeoff
– Efficiency can be helped by applying more horsepower, but at
significant cost!
• Hence efficiency is important to ensure the Green
credentials of the IR community
• We have presented many techniques that aim to improve particular methods of retrieval
– They make assumptions about the IR process (e.g. we care about the top K documents)
• As new IR techniques and models are created, there will always be a need for enhancements to their efficiency, to make them easily deployable
147. Emerging Trends
• BitFunnel (SIGIR 2017 best paper) – a "lossy"
signature representation for inverted files
– Now deployed by Bing
• Similarly deep learning data structures – Google
• Deep learning for matching – but efficiency costs
not yet well-researched
• And your future ideas…?
148. Acknowledgements
• Some of the slides presented here have
appeared in presentations which we have co-
authored with:
– Iadh Ounis
– Richard McCreadie
– Matteo Catena
– Raffaele Perego
– Rossano Venturini
– Salvatore Orlando
– Franco Maria Nardini
– Claudio Lucchese