Efficient Query Processing
Infrastructures
Craig Macdonald 1, Nicola Tonellotto 2
1 School of Computer Science
University of Glasgow
2 Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”
National Research Council of Italy
The scale of the web search challenge
How many documents?
In how long?
• Reports suggest that Google considers a total of 30
trillion pages in the indexes of its search engine
– And it identified relevant results from these 30 trillion in 0.63 seconds
– Clearly this is a big data problem!
• To answer a user's query, a search engine doesn’t read
through all of those pages: the index data structures help
it to efficiently find pages that effectively match the query
and will help the user
– Effective: users want relevant search results
– Efficient: users aren't prepared to wait a long time for search results
• Today, we provide an insight into search engine
architecture and the modern retrieval techniques for
attaining efficient yet effective search
Search as a Distributed Problem
• To achieve efficiency at Big Data scale, search
engines use many servers:
• N & M can be very big:
– Microsoft's Bing search engine has "hundreds of
thousands of query servers"
(Architecture diagram: queries arrive at a Broker running a Query Scheduler, which dispatches them
across N shards × M replicas; each Query Server runs a Retrieval Strategy over its shard replica,
and the Broker performs Results Merging.)
Image Source: http://wallpapercave.com/w/NzPVzSl
5
Image Source: http://www.coloandcloud.com/editorial/quincy-wa-big-data-centers-leverage-abundant-inexpensive-renewable-energy/
QUINCY WA, USA
Data centres use 3 per cent of the global electricity supply and account for
about 2 per cent of total greenhouse gas emissions
This is expected to treble in the next decade, putting an enormous strain on
energy supplies and dealing a hefty blow to efforts to contain global warming
Data Centres are energy hungry…
Image Source: http://www.mwhglobal.com/project/wanapum-dam-spillway-project/
Source: The Independent
How to keep pace?
Search engines are growing
• Income: more users, more ad revenue
• Expenditure: more queries received, more documents to index
9
Two ways to keep pace: more query servers (more expenditure) or more efficient querying
Green search engines are important
Keep your search engine efficient yet effective
• Efficient: more throughput, fewer servers, less expenditure
• Effective: more users, more income
11
So where is our focus?
Inter Data Center
So where is our focus?
Intra Data Center
So where is our focus?
Intra Server
So where is our focus?
• Different formulations are more effective & efficient for
different queries:
– Homepage: match on Url/Title/Atext
– Informational: match on Body, more query expansion
• Query type classification, query performance predictors and
query efficiency predictors all have a role in deciding the
most appropriate plan for a query
Query Plans: Executing a Query
16
Q → Q plan
halloween costumes → (halloween ∊ U|B|T) ⋀ ({costume, costumes} ∊ A|U|B|T)
facebook login → (facebook ∊ U|B|T)
Optimizing Query Evaluations using Reinforcement Learning for Web Search. Rosset et al,
Proceedings of SIGIR 2018.
• Typically, in web-scale search, the ranking process can be
conceptually seen as a series of cascades [1]
– Rank some documents
– Pass top-ranked onto next cascade for refined re-ranking
Cascades
17
(Cascade diagram:
1,000,000,000s of documents
→ Boolean, e.g. AND/OR: Do query terms occur?
→ Simple Ranking, e.g. BM25: Identify a set most likely to contain relevant documents
→ 1000s of documents
→ Re-Ranking (LEARNING TO RANK): Try really hard to get the top of the ranking correct, using many signals (features)
→ 20 docs)
[1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day.
Q
Qplan
1,000,000,000s
of documents
1000s of
documents
Roadmap
18
20
docs
LEARNING TO RANK
e.g. BM25
[1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day.
e.g. AND/OR
Part 1: Index layouts
Part 2: Query evaluation
& dynamic pruning
Part 3: Learning-to-rank
Part 4: Query
Efficiency Prediction
& Applications
Aims of this Tutorial
• ILO 1. learn about the fundamentals of query processing in
information retrieval in terms of the data structures and algorithms
involved (e.g. Document-at-a-Time/Term-at-a-time/Score-at-a-
time);
• ILO 2. learn about the state-of-the-art of dynamic pruning (query
evaluation) techniques (e.g., TAAT optimizations, impacts,
MaxScore, WAND, blockmax indexes, top-hitlists), illustrated
through extensive use of pseudo-code and examples.
• ILO 3. understand how such query processing fits into a multi-tier
search architecture, involving learning-to-rank and distributed
query processing.
• ILO 4. understand safeness in dynamic pruning techniques,
and about the nature of the tradeoffs between efficiency and
effectiveness in query processing.
• ILO 5. learn about query efficiency prediction and its applications
to efficient yet effective web search and to operational costs
reduction, including a description of Bing’s use of QEP.
19SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Outline
• 1330-1340 Introduction (Craig) (10 mins)
– Components, multi-stage architectures
• 1340-1410 Part 1: Data structures, compression & skipping (Craig) (30 mins)
• 1410-1500 Part 2: Query Evaluation (50 mins)
– TAAT/DAAT (Craig)
– dynamic pruning, impacts, SAAT (Nicola)
• 1500-1530 Coffee Break (30 mins)
• 1530-1625 Part 3: Second Stage Query Processing, aka LTR
– Learning to rank (Craig) (40 min)
– Quickscorer (Nicola) (15 mins)
• 1625-1650 Part 4: Query Efficiency Prediction (25 minutes)
• 1650-1700 Conclusions (5-10 minutes)
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 20
PART 1: INDEX LAYOUT &
COMPRESSION
21SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
The Format of an Index
(Indexing pipeline: Document → Tokenizer → Term Pipeline → Index.
Lexicon entry: term, df, cf, pointer p.
Document Index entry: id, len.
Inverted Index: lists of (id, tf) postings – each entry (posting) represents a document.)
22
• An index normally contains three sub-
structures
• Lexicon: Records the list of all unique terms
and their statistics
• Document Index: Records the list of all
documents and their statistics
• Inverted Index: Records the mapping
between terms and documents
A posting (id, tf) could also contain other occurrence
information: e.g. term positions,
fields (title, URL)
The document index could also contain other document
information: e.g. PageRank
Search Efficiency
• It is important to make retrieval as
fast as possible
• Research indicates that even
slightly slower retrieval (0.2s-0.4s) can
lead to a dramatic drop in the perceived
quality of the results [1]
• So what is the most costly part of a
(classical) search system?
• Scoring each document for the user query
23[1] Teevan et al. Slow Search: Information Retrieval without Time Constraints. HCIR’13
(Diagram: the Query passes through the Tokenizer and Term Pipeline to the Retrieval Model,
which scores documents using the Index; results are then Re-Ranked to produce the Top Results.)
Why is Document Scoring
Expensive?
• The largest reason for the expense of document
scoring is that there are lots of documents:
– A Web search index can contain billions of
documents
• Google currently indexes trillions of pages [1]
• More specifically, the cost of a search is dependent on:
– Query length (the number of search terms)
– Posting list length for each query term
• i.e. The number of documents containing each term
24
(Lexicon entries term1, term2 with df, cf and pointer p, each pointing to its posting list of (id, tf) pairs)
[1] http://www.statisticbrain.com/total-number-of-pages-indexed-by-google/
INDEX COMPRESSION
25
Index Compression
• The compression of the inverted index posting lists is
essential for efficient scoring [1]
– Motivation: it physically takes time to read the term
posting lists, particularly if they are stored on a (slow)
hard disk
– Using lossless compressed layouts for the term posting
lists can save space (on disk or in memory) and reduce
the amount of time spent doing IO
– But decompression can also be expensive, so efficient
decompression is key!
26
term2 → (21, 2) (25, 2) (26, 5)
1 integer = 32 bits = 4 bytes, so 6 integers = 24 bytes in total
Do we need 32 bits?
[1] Witten et al. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann
1999.
Delta Gaps
• One component of the information stored in a posting
list is the docids…
– …in ascending order!
• We can make smaller numbers by taking the
differences
• So each docid in the posting lists could be represented
using fewer bits
– How to represent these numbers?
– 32 bits has a range -2,147,483,648 .. 2,147,483,647
– Using a fixed number of bits is wasteful
27
term2: (21, 2) (25, 2) (26, 5)   [docids]
term2: (21, 2) (4, 2) (1, 5)     [docid gaps]
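To make the delta-gap idea concrete, a minimal C++ sketch (the Posting struct and function names are illustrative, not from any particular system): encoding replaces each docid by its difference from the previous one, and decoding is a running sum.

#include <cstdint>
#include <vector>

// Illustrative posting representation: sorted docids with term frequencies.
struct Posting { uint32_t docid; uint32_t tf; };

// Replace absolute docids with deltas (gaps) from the previous docid.
std::vector<Posting> to_gaps(const std::vector<Posting>& postings) {
    std::vector<Posting> gaps;
    uint32_t prev = 0;
    for (const Posting& p : postings) {
        gaps.push_back({p.docid - prev, p.tf});  // the first gap equals the first docid
        prev = p.docid;
    }
    return gaps;
}

// Recover absolute docids with a running sum over the gaps.
std::vector<Posting> from_gaps(const std::vector<Posting>& gaps) {
    std::vector<Posting> postings;
    uint32_t docid = 0;
    for (const Posting& g : gaps) {
        docid += g.docid;
        postings.push_back({docid, g.tf});
    }
    return postings;
}

For example, to_gaps({{21,2},{25,2},{26,5}}) yields (21,2) (4,2) (1,5), matching the slide.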
• Unary:
– Use as many 0s as the input value x, followed by a 1
– Eg: 5 is 000001
• Gamma:
– Let N=⌊log2 x⌋ be the highest power of 2 that x contains;
– Write N out in unary representation, followed by the
remainder
(x – 2N) in binary
– Eg: 5 is represented as 00101
• Let’s represent docids as gamma, and tf as unary
Elias Unary & Gamma Encoding
28
term2 (gaps): (21, 2) (4, 2) (1, 5)
Repr.: 000010101 | 001 | 00100 | 001 | 1 | 00001
Bits:  9 | 3 | 5 | 3 | 1 | 5
= 21 bits
< 3 bytes
down from
24!
3.5 bits/int
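A small C++ sketch of the two codes as defined on this slide (bit strings are used instead of a real bit writer, purely for clarity):

#include <cstdint>
#include <string>

// Unary code as defined above: x zeros followed by a terminating 1 (so 5 -> "000001").
std::string unary(uint32_t x) {
    return std::string(x, '0') + "1";
}

// Elias gamma code: N = floor(log2 x) in unary, then the remainder x - 2^N in N binary bits.
// Assumes x >= 1 (gamma cannot encode 0, which is one reason gaps and tfs are stored, as both are at least 1).
std::string gamma(uint32_t x) {
    uint32_t n = 0;
    while ((2u << n) <= x) ++n;           // n = floor(log2 x)
    std::string bits = unary(n);
    for (int i = static_cast<int>(n) - 1; i >= 0; --i)   // remainder in n bits, most significant first
        bits += (((x - (1u << n)) >> i) & 1u) ? '1' : '0';
    return bits;
}

// e.g. gamma(5) == "00101", gamma(21) == "000010101", unary(2) == "001"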
Other Compressions
Schemes
• Elias Gamma & Elias Unary are moderately expensive to
decode: lots of bit twiddling!
– Other schemes are byte-aligned, e.g.
• Variable byte [1]
• Simple family [2]
• Documents are often clustered in the inverted index (e.g. by
URL ordering)
– Compression can be more effective in blocks of numbers
– List-adaptive techniques work on blocks of numbers
• Frame of reference (FOR) [3]
• Patched frame of reference (PFOR) [4]
29
[1] H.E. Williams and J. Zobel. Compressing Integers for Fast File Access. Comput. J. 1999
[2] V. Anh & A. Moffat. Inverted Index Compression using Word-aligned Binary Codes. INRT. 2005
[3] J. Goldstein et al. Compressing relations and indexes. ICDE 1998.
[4] M. Zukowski et al. Super-scalar RAM-CPU cache compression. ICDE 2006.
Frame of Reference
Idea: pick the minimum m and the maximum M
values of block of numbers that you are
compressing.
• Then, any value x can be represented using b bits,
where b = ⌈log2(M-m+1)⌉.
Example: To compress numbers in range
{2000,...,2063}
– ⌈log2(64) ⌉ = 6
– So we use 6 bits per value:
(Figure: the base 2000 is stored once, followed by the values packed as 6-bit offsets, e.g. 2, 37, …)
31
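A minimal C++ sketch of FOR encoding under these definitions (the actual bit packing of the b-bit offsets is omitted; names are illustrative):

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Frame of Reference (FOR): store the block minimum once, then each value
// as a b-bit offset from that minimum, with b = ceil(log2(M - m + 1)).
struct ForBlock {
    uint32_t base;                  // block minimum m
    uint32_t bits;                  // bits per packed offset
    std::vector<uint32_t> offsets;  // values - base (bit packing omitted for clarity)
};

// Assumes a non-empty block of values.
ForBlock for_encode(const std::vector<uint32_t>& values) {
    ForBlock block;
    auto [mn, mx] = std::minmax_element(values.begin(), values.end());
    block.base = *mn;
    uint32_t range = *mx - *mn + 1;
    block.bits = static_cast<uint32_t>(std::ceil(std::log2(range)));
    for (uint32_t v : values) block.offsets.push_back(v - block.base);
    return block;
}

// e.g. values in {2000, ..., 2063}: base = 2000, range = 64, bits = 6.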
Patched frame of reference
(PFOR)
● FOR is sensitive to outliers
– Eg: to compress {2000,...,9000,...2064} we now need
13 bits per value
● Idea: pick b bits for compression such that it is “good” for most
of the values, and treat the others as exceptions (32 bits each)
Zukowski et al, ICDE’06
32
PFOR flavors
● NewPFD
– compresses exceptions using a Simple family
codec
● OptPFD
– tries to optimize b to have the best compression
ratio and decompression speed
● FastPFOR
– subdivides exceptions by # used bits, then
compresses these sets using a FOR codec
NewPFD and OptPFD in Yan et al., WWW'09
FastPFOR in Lemire, Boytsov; Software: Practice and experience (2013)
Compression:
Some numbers [1]
• ClueWeb09 corpus – 50 million Web documents
– 12,725,738,385 postings => 94.8GB inverted file uncompressed –
NO retrieval numbers: WAY TOO SLOW!
– Terrier’s standard Elias Gamma/Unary compression = 15GB
• Compression is essential for an efficient IR system
– List adaptive compression: slightly larger indices, markedly faster
33
                docids                 tfs
                Time (s)    Size       Time (s)    Size
Gamma/Unary     1.55        –          1.55        –
Variable Byte   +0.6%       +5%        +9%         +18.4%
Simple16        -7.1%       -0.2%      -2.6%       +0.7%
FOR             -9.7%       +1.3%      -3.2%       +4.1%
PForDelta       -7.7%       +1.2%      -1.3%       +3.3%
[1] M Catena, C Macdonald, and I Ounis. On Inverted Index Compression for Search Engine Efficiency. ECIR
2014.
Most Recently… Elias Fano
• A data structure from the ’70s (mostly in the succinct
data structures niche) for encoding of monotonically
increasing sequences of integers
• Recently successfully applied to inverted indexes
[Vigna, WSDM13]
– Used by Facebook Graph Search!
• Originally distribution-independent, but recently
adapted to take into account document similarities,
i.e., common terms: partitioned Elias-Fano
[Ottaviano & Venturini, SIGIR 2014]
• Very fast implementations exist
– Skipping is included…
34SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Posting List Iterators
• It is often convenient to see a posting list as an
iterator over its postings.
• APIs:
– p.docid() returns the docid of the current posting
– p.score() returns the score of the current posting
– p.next() moves sequentially the iterator to the next posting
– p.next(d) advances the iterator forward to the next posting
with a document identifier greater than or equal to d. =>
skipping
35SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
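One possible C++ rendering of this iterator API (the operations follow the slide; the class and constant names are illustrative):

#include <cstdint>
#include <limits>

class PostingIterator {
public:
    static constexpr uint32_t END = std::numeric_limits<uint32_t>::max();

    virtual ~PostingIterator() = default;
    virtual uint32_t docid() const = 0;   // docid of the current posting (END if exhausted)
    virtual double score() const = 0;     // score contribution of the current posting
    virtual void next() = 0;              // move sequentially to the next posting
    virtual void next(uint32_t d) = 0;    // advance to the first posting with docid >= d (skipping)
};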
Skipping (I)
d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13
To move the cursor to a specific docid, we need to
read and decompress all postings in the middle
current
posting
next
posting
Skipping (II)
d1 d10d2 d4 d5 d6 d7 d8 d9d3
We append an additional structure to the start of the
posting list, such that the largest docid in any block
of compressed postings is known
Then to move the cursor to a specific docid, we
need to read and decompress only the skips in the middle
and a few postings
current
posting
next
posting
d11 d12
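A sketch of how next(d) might exploit such a skip structure, assuming one skip entry (the block's largest docid) per block and leaving the blocks uncompressed for clarity:

#include <cstdint>
#include <vector>

// Skip-based next(d): each block records its largest docid, so whole blocks can be
// stepped over; only the candidate block is then scanned posting by posting.
// (Assumes every block is non-empty; block decompression is omitted here.)
class SkippingList {
public:
    explicit SkippingList(std::vector<std::vector<uint32_t>> blocks)
        : blocks_(std::move(blocks)) {
        for (const auto& b : blocks_) skips_.push_back(b.back());  // largest docid per block
    }
    // Advance the cursor to the first posting with docid >= d.
    void next_geq(uint32_t d) {
        while (cur_block_ < blocks_.size() && skips_[cur_block_] < d) {  // skip whole blocks
            ++cur_block_;
            pos_ = 0;
        }
        if (cur_block_ == blocks_.size()) return;                        // list exhausted
        const auto& block = blocks_[cur_block_];
        while (pos_ < block.size() && block[pos_] < d) ++pos_;           // scan only a few postings
    }
    bool exhausted() const { return cur_block_ == blocks_.size(); }
    uint32_t docid() const { return blocks_[cur_block_][pos_]; }
private:
    std::vector<std::vector<uint32_t>> blocks_;  // postings grouped into blocks
    std::vector<uint32_t> skips_;                // skip pointers: max docid of each block
    size_t cur_block_ = 0, pos_ = 0;
};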
Essential Bibliography
• Matteo Catena, Craig Macdonald and Iadh Ounis. On Inverted Index Compression for
Search Engine Efficiency. In Proceedings of ECIR 2014.
• Witten, I.H., Bell, T.C., Moffat, A.: Managing Gigabytes: Compressing and Indexing
Documents and Images. 1st edn. (1994)
• Elias, P.: Universal codeword sets and representations of the integers. Trans. Info. Theory
21(2) (1975)
• Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization.
Software: Practice and Experience (2013)
• Delbru, R., Campinas, S., Samp, K., Tummarello, G.: Adaptive frame of reference for
compressing inverted lists. Technical Report 2010-12-16, DERI (2010)
• Williams, H.E., Zobel, J.: Compressing integers for fast file access. The Computer Journal
42 (1999)
• Scholer, F., Williams, H.E., Yiannis, J., Zobel, J.: Compression of inverted indexes for fast
query evaluation. In: Proc. SIGIR ’02. (2002)
• Anh, V.N., Moffat, A.: Inverted index compression using word-aligned binary codes. Inf.
Retr. 8(1) (2005)
• Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proc.
ICDE ’98. (1998)
• Alistair Moffat and Justin Zobel. 1996. Self-indexing inverted files for fast text
retrieval. ACM Trans. Inf. Syst. 14, 4 (October 1996), 349-379.
• Sebastiano Vigna. 2013. Quasi-succinct indices. In Proceedings of the sixth ACM
international conference on Web search and data mining (WSDM '13)
• Giuseppe Ottaviano, and Rossano Venturini. "Partitioned elias-fano indexes." Proceedings
of the 37th international ACM SIGIR conference on Research & development in information
retrieval. ACM, 2014.
38SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
PART 2: QUERY EVALUATION
39
1,000,000,000s
of documents
1000s of
documents
Roadmap
40
20
docs
LEARNING TO RANK
e.g. BM25
[1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day.
e.g. AND/OR
Part 1: Index layouts
Part 2: Query evaluation
& dynamic pruning
Part 3: Learning-to-rank
Part 4: Query
Efficiency Prediction
& Applications
Query Evaluation
• Conjunctive (AND) vs. Disjunctive (OR)
• Boolean retrieval
– Suitable for small/medium collections
– Returns all documents matching the query
• Ranked retrieval
– Suitable for Web-scale collections
– Requires a similarity function between queries and documents
– Returns only the top k documents
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 41
Query Evaluation
• Normal strategies make a pass on the postings lists
for each query term
– This can be done Term-at-a-Time (TAAT) – one query term
at a time
– Or Document-at-a-time (DAAT) – all query terms in parallel
• We will explain these, before showing how we can
improve them
42
Term-at-a-Time (TAAT)
43
(Example: term1 → (21, 2); term2 → (21, 2) (25, 2). Processing one term at a time builds
accumulators: after term1, docid 21 → 2; after term2, docid 21 → 4 and docid 25 → 2.)
Advantages:
• Simple
Disadvantages:
• Requires lots of memory to contain partial scores for all
documents
• Difficult to do Boolean or phrase queries, as we don’t have
a document’s postings for all query terms at once
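A minimal sketch of plain TAAT in C++ (illustrative names; the scoring function stands in for a weighting model such as BM25):

#include <cstdint>
#include <unordered_map>
#include <vector>

struct Posting { uint32_t docid; uint32_t tf; };

// Plain Term-at-a-Time scoring: one posting list at a time, accumulating partial
// scores for every document seen. (score_contribution() stands in for e.g. BM25.)
std::unordered_map<uint32_t, double>
taat(const std::vector<std::vector<Posting>>& query_posting_lists,
     double (*score_contribution)(uint32_t tf)) {
    std::unordered_map<uint32_t, double> accumulators;   // docid -> partial score
    for (const auto& plist : query_posting_lists) {      // one query term at a time
        for (const Posting& p : plist) {
            accumulators[p.docid] += score_contribution(p.tf);
        }
    }
    return accumulators;  // the top K are then selected from all accumulators
}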
Document-at-a-Time (DAAT)
44
(Example: term1 → (21, 2); term2 → (21, 2) (25, 2), traversed in parallel: docid 21 is scored
across both lists → 4, then docid 25 → 2.)
Advantages:
• Reduced memory compared to TAAT (and hence faster)
• Supports Boolean query operators, phrases, etc.
Disadvantages:
• Slightly more complex to implement
Most commercial search engines are reported to use
DAAT
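A corresponding sketch of plain DAAT, moving all cursors in parallel and keeping the top k in a min-heap (illustrative, not an optimised implementation):

#include <algorithm>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Posting { uint32_t docid; uint32_t tf; };
constexpr uint32_t END = std::numeric_limits<uint32_t>::max();

// Plain Document-at-a-Time scoring: all query terms' cursors move in parallel,
// fully scoring one document before moving to the next smallest docid.
std::vector<std::pair<double, uint32_t>>                         // (score, docid), best first
daat(const std::vector<std::vector<Posting>>& lists,
     double (*score_contribution)(uint32_t tf), size_t k) {
    std::vector<size_t> cursor(lists.size(), 0);
    std::priority_queue<std::pair<double, uint32_t>,
                        std::vector<std::pair<double, uint32_t>>,
                        std::greater<>> heap;                    // min-heap holding the top k
    while (true) {
        uint32_t current = END;                                  // smallest docid among cursors
        for (size_t i = 0; i < lists.size(); ++i)
            if (cursor[i] < lists[i].size())
                current = std::min(current, lists[i][cursor[i]].docid);
        if (current == END) break;                               // all lists exhausted
        double score = 0.0;
        for (size_t i = 0; i < lists.size(); ++i)                // score this document fully
            if (cursor[i] < lists[i].size() && lists[i][cursor[i]].docid == current)
                score += score_contribution(lists[i][cursor[i]++].tf);
        heap.emplace(score, current);
        if (heap.size() > k) heap.pop();                         // keep only the best k
    }
    std::vector<std::pair<double, uint32_t>> top;
    for (; !heap.empty(); heap.pop()) top.push_back(heap.top());
    std::sort(top.rbegin(), top.rend());                         // descending by score
    return top;
}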
TAAT vs DAAT
• TAAT and DAAT have been the cornerstones of query
evaluation in IR systems since the 1970s.
• The plain implementations of these two strategies are seldom
used anymore, since many optimizations have been proposed
over the years
• Several known systems in production today, from large scale
search engines as Google and Yahoo!, to open source text
indexing packages as Lucene and Lemur, use some
optimized variation of these strategies
45SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
TAAT vs DAAT
• [Turtle and Flood,1995] were the first to argue that DAAT could beat
TAAT in practical environments.
– For large corpora, DAAT implementations require a smaller run-time memory footprint
46
• [Fontoura et al., 2011] report experiments on
small (200k docs) and large indexes (3M docs),
with short (4.26 terms) and long (57.76 terms)
queries (times are in microseconds):
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Source: [Fontoura et al, 2011]
Bibliography
47
[Heaps, 1978] Information Retrieval: Computational and Theoretical Aspects. Academic Press, USA,
1978.
[Buckley and Lewit, 1985] Optimization of inverted vector searches. In Proc. SIGIR, pages 97–110,
1985. ACM.
[Turtle and Flood, 1995] Query evaluation: strategies and optimizations. IPM, 31(6):831–850, 1995.
[Moffat and Zobel, 1996] Self-indexing inverted files for fast text retrieval. ACM TOIS, 14(4):349–379,
1996.
[Kaszkiel and Zobel, 1998] Term-ordered query evaluation versus document-ordered query evaluation
for large document databases. In Proc. SIGIR, pages 343–344, 1998. ACM.
[Kaszkiel et al., 1999] Efficient passage ranking for document databases. ACM TOIS, 17(4):406–439,
1999.
[Broder et al., 2003] Efficient query evaluation using a two-level retrieval process. In Proc. CIKM,
pages 426–434, 2003. ACM.
[Zobel and Moffat, 2006] Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006.
[Culpepper and Moffat, 2010] Efficient set intersection for inverted indexing. ACM TOIS, 29(1):1:1–
1:25, 2010.
[Fontoura et al., 2011] Evaluation strategies for top-k queries over memory-resident inverted indexes.
Proc. of VLDB Endowment, 4(12): 1213–1224, 2011.
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
FASTER QUERY EVALUATION
Dynamic Pruning
48SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Query Processing
• Breakdown
• Pre-process the query (e.g., tokenisation, stemming)
• Lookup the statistics for each term in the lexicon
• Process the postings for each query term, computing scores for
documents to identify the final retrieved set
• Output the retrieved set with metadata (e.g., URLs)
• What takes time?
• Number of query terms
– Longer queries have more terms with posting lists to process
• Length of posting lists
– More postings take longer to process
• Goal: avoid (unnecessary) scoring of postings
49SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
(Diagram: the Query passes through the Tokenizer and Term Pipeline to the Retrieval Model,
which scores documents using the Index; results are then Re-Ranked to produce the Top Results.)
Dynamic Pruning during
Query Evaluation
• Dynamic pruning strategies aim to make scoring faster by
only scoring a subset of the documents
– The core assumption of these approaches is that the
user is only interested in the top K results, say K=20
– During query scoring, it is possible to determine if a
document cannot make the top K ranked results
– Hence, the scoring of such documents can be terminated
early, or skipped entirely, without damaging retrieval
effectiveness to rank K
• We call this “safe-to-rank K”
• Dynamic pruning is based upon
– Early termination
– Comparing upper bounds on retrieval scores with thresholds
50
Dynamic Pruning Techniques
• MaxScore [1]
– Early termination: does not compute scores for
documents that won’t be retrieved by comparing upper
bounds with a score threshold
• WAND [2]
– Approximate evaluation: does not consider documents
with approximate scores (sum of upper bounds) lower
than threshold
– Therefore, it focuses on the combinations of terms needed
(wAND)
• BlockMaxWand
– SOTA variant of WAND that benefits from the block
layout of posting lists
51
[1] H Turtle & J Flood. Query Evaluation : Strategies and Optimisations. IPM: 31(6). 1995.
[2] A Broder et al. Efficient Query Evaluation using a Two-Level Retrieval Process. CIKM 2003.
Early Termination
The evaluation of a document is terminated early if some or
all of its postings, for the terms of the query,
are not fetched from the inverted index or not scored
by the ranking function.
52SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Term and document upper
bounds
• Similarity function between queries and documents:
s(q, d) = Σ_{t ∈ q} s_t(q, d), i.e., a sum of per-term score contributions
• For each term t in the vocabulary, we can compute a term
upper bound (also known as max score) σt(q) such that, for
all documents d in the posting list of term t: s_t(q, d) ≤ σt(q)
• For a query q and a document d, we can compute a
document upper bound σd(q) by summing up the query
term upper bounds and/or actual scores:
σd(q) = Σ_{t ∈ q} (the partial score s_t(q, d) if already computed, else σt(q)) ≥ s(q, d)
Heap and Threshold
• During query processing, the top k full or
partial scores computed so far, together
with the corresponding docids, are
organized in a priority queue, or min-heap.
• The smallest value of these (partial)
scores is called the threshold θ.
• If there are not yet at least k scores, the
threshold value is assumed to be 0.
54SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Dynamic Pruning Condition
• Property I: The threshold value is not decreasing
• Property II: A document with a score smaller than the
threshold will never be in final top K documents
• Pruning Condition: for a query q and a document d, if the
document upper bound σd(q), computed by using partial
scores, if any, and term upper bounds, is less than or equal
to the current threshold θ, the processing of the document can be
terminated early.
55SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Example of early termination
56SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Effectiveness Guarantees
• safe/unoptimized: all documents, not just the top k, appear in the same
order and with the same score as they would appear in the ranking
produced by an unoptimized strategy.
• safe up to k/rank safe: the top k documents produced are ranked
correctly, but the document scores are not guaranteed to coincide with
the scores produced by an unoptimized strategy.
• unordered safe up to k/set safe: the documents coincide with the top k
documents computed by a full strategy, but their ranking can be different.
• approximate/unsafe: no provable guarantees on the correctness of any
portion of the ranking produced by these optimizations can be given
57SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
[Turtle and Flood, 1995]
[Strohman, 2007]
TAAT Optimizations
• All TAAT dynamic pruning optimizations split the query
processing into two distinct phases.
• OR mode
– the normal TAAT algorithm is executed
– New accumulators are created and updated, until a certain pruning
condition is met
• AND mode
– no new accumulators are created,
– a different algorithm is executed, on the remaining terms and/or on the
accumulators created during the first phase.
58SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
AND Mode
• Quit.
– The processing of postings completely stops at the end of the first phase.
– No new accumulators are created and no postings from the remaining terms’ posting lists are processed.
• Continue.
– The creation of new accumulators stops at the end of the first phase.
– The remaining terms’ posting lists will be processed, but only to update the scores of the already existing
accumulators.
• Decrease.
– The processing of postings in the second phase proceeds as in Continue.
– The number of accumulators is decreased as the remaining terms are processed.
– Those terms will have small score contributions, with few chances to alter the current top k document
ranking.
– More importantly, the memory occupancy could be reduced as soon as we realize that an existing
accumulator can be dropped since it will never enter in the final top k documents, with a corresponding
benefit on response times.
59SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
[Moffat and Zobel, 1996]
[Anh and Moffat, 1998]
Summary
60SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
DAAT Optimizations
• We will cover three families of DAAT Optimizations
1. Top Docs:
– Adds record of "top" documents to the start of each posting list
2. Max Scores:
– Utilises thresholds and upper bounds
– Can skip over unscored postings
– E.g. MaxScore, WAND
3. Block Max:
– Leverages the block layout of the posting lists
– Can skip entire blocks of postings
– E.g. BlockMaxWand
61SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Top Docs Optimizations
• [Brown, 1995] proposed to focus processing on documents with high
score contributions.
• For any given term t in the vocabulary, a top candidates list (also known
as a champions list) is stored separately as a new posting list.
• Then, the normal DAAT processing for any query q is applied to the top
candidate posting lists only, substantially reducing the number of postings
scored, but with potential negative impacts on effectiveness, since this
optimization is unsafe.
62SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Max Scores Optimizations
• MaxScore strategy [Turtle and Flood, 1995]
– Early termination: does not compute scores for documents that won’t be returned
– By comparing upper bounds with threshold
– Based on essential and non-essential posting lists
– Suitable for TAAT as well
• WAND strategy [Broder et al., 2003]
– Approximate evaluation: does not consider documents with approximate scores
(sum of upper bounds) lower than threshold
• Both use docids sorted posting lists
• Both exploit skipping
63SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
MaxScore Example (I)
64SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
no document can be returned as a top k
result if it appears in the non-essential
lists only, i.e., 1.3 + 3.4 < 6.0
Sorted in increasing order of
term upper bound
MaxScore Example (II)
65SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Essential lists are processed in
DAAT, and their score
contributions computed
MaxScore Example (III)
66SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Process the
non-essential
lists by
skipping to the
candidate
docid.
The partial score is
updated.
As soon as a partial score
implies early termination,
move to the next document.
MaxScore Example (IV)
67SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
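A simplified C++ sketch of the MaxScore idea, combining the ingredients above: lists sorted by term upper bound, a growing non-essential prefix, candidates generated from the essential lists only, and early termination of partial scores against θ. Per-posting scores are assumed precomputed; a real implementation would use compressed, skippable iterators.

#include <algorithm>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

struct Posting { uint32_t docid; double score; };   // precomputed per-posting contributions
constexpr uint32_t END = std::numeric_limits<uint32_t>::max();

struct TermList {
    std::vector<Posting> postings;
    double max_score;        // term upper bound sigma_t
    size_t pos = 0;
    uint32_t docid() const { return pos < postings.size() ? postings[pos].docid : END; }
    void next_geq(uint32_t d) { while (docid() < d) ++pos; }    // linear "skip" for brevity
};

// Simplified MaxScore: lists sorted by increasing upper bound; the maximal prefix whose
// upper bounds sum to <= theta is "non-essential" and never generates candidates.
std::vector<std::pair<double, uint32_t>>
maxscore(std::vector<TermList> lists, size_t k) {
    std::sort(lists.begin(), lists.end(),
              [](const TermList& a, const TermList& b) { return a.max_score < b.max_score; });
    std::vector<double> prefix_ub(lists.size() + 1, 0.0);        // prefix sums of upper bounds
    for (size_t i = 0; i < lists.size(); ++i) prefix_ub[i + 1] = prefix_ub[i] + lists[i].max_score;

    std::vector<std::pair<double, uint32_t>> top;                // kept sorted, best first
    double theta = 0.0;                                          // current k-th best score
    size_t first_essential = 0;                                  // lists[first_essential..] are essential
    while (true) {
        while (first_essential < lists.size() && prefix_ub[first_essential + 1] <= theta)
            ++first_essential;                                   // grow the non-essential prefix
        uint32_t candidate = END;                                // next docid from essential lists only
        for (size_t i = first_essential; i < lists.size(); ++i)
            candidate = std::min(candidate, lists[i].docid());
        if (candidate == END) break;

        double score = 0.0;
        for (size_t i = first_essential; i < lists.size(); ++i)  // essential contributions
            if (lists[i].docid() == candidate) { score += lists[i].postings[lists[i].pos].score; ++lists[i].pos; }

        for (size_t i = first_essential; i-- > 0; ) {            // non-essential, highest bound first
            if (score + prefix_ub[i + 1] <= theta) break;        // early termination: cannot make the top k
            lists[i].next_geq(candidate);                        // skip to the candidate docid
            if (lists[i].docid() == candidate) score += lists[i].postings[lists[i].pos].score;
        }
        if (score > theta || top.size() < k) {
            top.emplace_back(score, candidate);
            std::sort(top.rbegin(), top.rend());
            if (top.size() > k) top.pop_back();
            if (top.size() == k) theta = top.back().first;       // threshold never decreases
        }
    }
    return top;
}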
WAND
WAND uses approximate evaluation:
does not consider documents with approximate scores (sum of upper bounds) lower than
threshold
Weak AND or Weighted AND operator:
WAND(X1, w1, ..., Xn, wn, θ) is true if and only if Σi xi · wi ≥ θ
where
• Xi is a boolean variable,
• wi is a weight,
• xi is 0 or 1 depending on Xi
• θ is a threshold
69SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
WAND Details
• Given a query q = {t1,...,tn} and a document d,
• Xi is true if and only if the term ti appears in
document d
• We take the term upper bound σi as the weight wi
• The threshold θ is the smallest score of the top k
documents scored so far during query processing
• Hence the WAND operator is true
iff the sum of the upper bounds of the query terms appearing in d is at least θ
WAND Example (I)
71SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Posting lists ALWAYS sorted by
current docid
Pivot docid is the first docid
with a non-zero chance to end
up in the top k results
WAND Example (II)
72SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Move previous posting lists up to the
pivot (or right after) using p.next(d)
operator
WAND Example (III)
73SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Keep the posting
lists sorted by
increasing docid!
2.1 + 1.3 + 4.0 > 6.0
then full evaluation
WAND Example (IV)
74SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Advance the iterators by one
and keep lists sorted
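A sketch of the WAND pivot-selection step only; the surrounding loop that advances cursors with next(d) and re-sorts by docid is as illustrated in the examples above (names are illustrative):

#include <cstdint>
#include <vector>

// One WAND step: with posting lists kept sorted by their current docid, the pivot is
// the first list at which the running sum of term upper bounds exceeds the threshold
// theta; no document with a smaller docid can enter the top k.
struct Cursor {
    uint32_t current_docid;
    double upper_bound;      // term upper bound sigma_t
};

// Returns the index of the pivot list, or -1 if no remaining document can beat theta
// (in which case processing can stop).
int find_pivot(const std::vector<Cursor>& cursors_sorted_by_docid, double theta) {
    double acc = 0.0;
    for (size_t i = 0; i < cursors_sorted_by_docid.size(); ++i) {
        acc += cursors_sorted_by_docid[i].upper_bound;
        if (acc > theta) return static_cast<int>(i);   // pivot docid = cursors[i].current_docid
    }
    return -1;
}

// After finding the pivot: if the preceding lists already sit on the pivot docid, the
// document is fully evaluated; otherwise those lists are advanced with next(pivot_docid)
// and the lists are re-sorted by current docid.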
Performance
76SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Latency results (in ms) for DAAT, MaxScore and WAND (with speedups), for K = 30. Adapted from [Fontoura
et al., 2011].
Latency results (in ms) for DAAT, MaxScore and WAND (with speedups) for different query lengths, average
query times on ClueWeb09, for K = 10. Adapted from [Mallia et al., 2017].
Unsafe WAND
77SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
• Reducing K makes (BlockMax)WAND and
MaxScore faster
– But you get a shorter, but safe, ranking
• The dynamic pruning techniques can be also
configured to be unsafe
– This is achieved by multiplying the threshold Θ by a
constant factor F
– Increased speed, but some documents will not be
retrieved
Global vs Local
Term Upper Bounds
• A (global) term upper bound is computed over the scores of all
documents in a posting list
• Each posting list is sequentially divided into a list of blocks, where each block
contains a given number of consecutive postings, e.g., 128 postings per
block.
• For each block, a block (local) term upper bound is computed, storing
the maximum contribution of the postings in the block.
78SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Block Max Indexes
79SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
BlockMaxWAND requires further information in the skiplists –
i.e. Block Max Indices
lexicon
Simplified Block Max WAND
pseudocode
80SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
while true
    pivot ← find pivot as in WAND (all lists) or break
    perform a shallow move            // advance block list iterators to the pivot
    compute the block upper bound
    if pivot would enter heap
        compute score and update heap
        perform a deep move           // advance posting list iterators to the next docid
        reorder by docid
    else
        perform a shallow move        // advance block list iterators to the next block(s)
        perform a deep move           // advance posting list iterators to the next block(s)
        reorder by docid
return top K results in heap
Performance
81SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Latency results (in ms) for DAAT, WAND and BMW (64 postings blocks) (with
speedups) for different query lengths, average query times (Avg, in ms) on Gov2, for
K = 10.
Variable Block Max WAND
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Block size matters:
• too large → inaccurate estimation
• too small → average skip is small
Are fixed-sized blocks the right choice?
ALTERNATIVE INDEX
LAYOUTS
89SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Frequency-sorted Indexes
• An alternative way to arrange the posting lists is to sort
them such that the highest scoring documents appear
early.
• If an algorithm could find them at the beginning of
posting lists, dynamic pruning conditions can be
enforced very early.
• [Wong and Lee, 1993] proposed to sort posting lists in
decreasing order of term-document frequency value.
90SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Impact-sorted Indexes
• [Anh et al., 2001] introduced the definition of impact of
term t in document d for the quantity wt(d)/Wd (or
document impact) and impact of term t in query q for
wt(q) (or query impact).
• They proposed to facilitate effective query pruning by
sorting the posting lists in decreasing order of
(document) impact, i.e., to process queries on an impact-
sorted index
91SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Quantized Impacts
• Transform impact values from real numbers in [L,U] to b-
bit integers, i.e. 2b buckets.
• LeftGeom Quantization:
• Uniform Quantization
• Reverse Mapping
92SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
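The quantization formulas on the original slide are not reproduced here. As an illustration, a sketch of a uniform quantizer into 2^b equal-width buckets and its reverse mapping; this is an assumed formulation (see [Anh et al., 2001] for the exact LeftGeom/Uniform definitions):

#include <cmath>
#include <cstdint>

// Uniform quantization sketch: map an impact x in [L, U] to one of 2^b buckets,
// and map a bucket id back to a representative value (reverse mapping).
uint32_t quantize_uniform(double x, double L, double U, unsigned b) {
    const double buckets = std::pow(2.0, b);
    uint32_t q = static_cast<uint32_t>((x - L) / (U - L) * buckets);
    return q >= buckets ? static_cast<uint32_t>(buckets) - 1 : q;   // clamp the case x == U
}

double dequantize_uniform(uint32_t q, double L, double U, unsigned b) {
    const double width = (U - L) / std::pow(2.0, b);
    return L + (q + 0.5) * width;   // bucket midpoint as the reconstructed impact
}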
Quantization Bits
• [Anh et al., 2001] show that 5 bits are reasonable to
encode the impacts with minimal effectiveness losses for
both quantization schemes
– with 10 bits, no effectiveness losses are reported
– This value should be tuned when using different collections
and/or different similarity functions.
• Crane et al. [2013] confirmed that 5 to 8 bits are enough
for small-medium document collections
– for larger collections, from 8 up to 25 bits are necessary,
depending on the effectiveness measure.
93SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Compressing impacts
• [Trotman, 2014b] investigated the performance of integer compression
algorithms for frequency-sorted and impact-sorted indexes.
• The new SIMD compressor (QMX) is more time and space efficient than
the existing SIMD codecs, as further examined by [Trotman and Lin,
2016].
• [Lin and Trotman, 2017] found that the best performance in query
processing with SAAT is indeed obtained when no compression is used,
even if the advantage w.r.t. QMX is small (∼5%).
• Uncompressed impact-sorted indexes can be up to two times larger than
their QMX- compressed versions.
94SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Impact-based Query
Processing
95SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Score-at-a-time
96SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Dynamic Pruning for SAAT
• The impact blocks are initially processed in OR-mode, then AND-
mode, and finally in REFINE-mode.
• After the OR-mode, a fidelity control knob controls the
percentage of remaining postings to process thereafter (AND &
REFINE).
– 30% of the remaining postings results in good retrieval performance.
• Experiments by [Anh and Moffat, 2006] showed that this optimization
is able to reduce the memory footprint by 98% w.r.t. SAAT, with
a speedup of 1.75×.
97SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Anytime Ranking
• [Lin and Trotman, 2015] proposed a linear regression model to
translate a deadline on the query processing time into the
number of postings to process.
• [Mackenzie et al., 2017] proposed to stop after processing a given
percentage of the total postings in the query terms' posting lists, on
a per-query basis.
• According to their experiments, the fixed threshold may result in
reduced effectiveness, as the number of query terms increases,
but conversely it gives a very strict control over the tail latency of
the queries.
98SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Essential Bibliography
99
[Wong and Lee, 1993]. Implementations of partial document ranking using inverted files. IPM, 29 (5):647–669, 1993.
[Persin, 1994]. Document filtering for fast ranking. In Proc. SIGIR, 1994. ACM.
[Turtle and Flood, 1995] Query evaluation: strategies and optimizations. IPM, 31(6):831–850, 1995.
[Anh et al., 2001] Vector-space ranking with effective early termination. In Proc. SIGIR, pages 35–42, 2001. ACM.
[Anh and Moffat, 2002]. Impact transformation: effective and efficient web retrieval. In Proc. SIGIR, pages 3–10, 2002. ACM.
[Broder et al., 2003] Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426–434, 2003. ACM.
[Anh and Moffat, 2006] Pruned query evaluation using pre-computed impacts. In Proc. SIGIR, pages 372–379, 2006. ACM.
[Zobel and Moffat, 2006] Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006.
[Strohman, 2007] Efficient Processing of Complex Features for Information Retrieval. PhD Thesis, 2007.
[Fontoura et al., 2011] Evaluation strategies for top-k queries over memory-resident inverted indexes. Proc. of VLDB Endowment, 4(12): 1213–
1224, 2011.
[Crane et al., 2013] Maintaining discriminatory power in quantized indexes. In Proc. CIKM, pages 1221–1224,2013. ACM.
[Lin and Trotman, 2015] Anytime ranking for impact-ordered indexes. In Proc. ICTIR, pages 301–304, 2015. ACM.
[Trotman and Lin, 2016] In vacuo and in situ evaluation of simd codecs. In Proc. ADCS, pages 1–8, 2016. ACM.
[Mackenzie et al., 2017] Early termination heuristics for score-at-a-time index traversal. In Proc, ADCS, pages 8:1– 8:8, 2017. ACM.
[Lin and Trotman, 2017] The role of index compression in score-at-a-time query evaluation. Information Retrieval Journal, pages 1–22, 2017.
[Mallia et al., 2017] Faster blockmax wand with variable-sized blocks. In Proc. SIGIR, pages 625–634, 2017. ACM.
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
PART 3: LEARNING TO RANK
100SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
1,000,000,000s
of documents
1000s of
documents
Roadmap
101
20
docs
LEARNING TO RANK
e.g. BM25
[1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day.
e.g. AND/OR
Part 1: Index layouts
Part 2: Query evaluation
& dynamic pruning
Part 3: Learning-to-rank
Part 4: Query
Efficiency Prediction
& Applications
Motivations for Learning
• How to choose term weighting models?
• Different term weighting models have different assumptions
about how relevant documents should be retrieved
• Also:
• Field-based models: term occurrences in different fields
matter differently
• Proximity-models: close co-occurrences matter more
• (Deep learned matching models)
• Priors: documents with particular lengths or URL/inlink
values matter more
• Query Features: Long queries, difficult queries, query type
How to combine all these easily and appropriately?
Learning to Rank
• Application of tailored machine learning
techniques to automatically (select and) weight
retrieval features
• Based on training data with relevance assessments
• Learning to rank has been popularised by
commercial search engines (e.g. Bing, Baidu,
Yandex)
• They require large training datasets, possibly instantiated
from click-through data
• Click-through data has facilitated the deployment of
learning approaches
T.-Y. Liu. (2009). Learning to rank for information retrieval. Foundation and Trends in Information Retrieval, 3(3), 225–331.
Types of Features
• Typically, commercial search engines use hundreds of
features for ranking documents, usually categorised as
follows:
Query Dependent Features (vary with query ✔ and document ✔):
  weighting models, e.g. BM25, PL2; proximity models, e.g. Markov Random Fields;
  field-based weighting models, e.g. PL2F; deep learned matched representations
Query Independent Features (query ✗, document ✔):
  PageRank, number of inlinks; spamminess
Query Features (query ✔, document ✗):
  query length; presence of entities
Learning to Rank
1. Sample Identification
• Apply BM25 or similar (e.g. DFR PL2) to rank documents
with respect to the query
• Hope that the sample contains enough relevant documents
2. Compute more features
• Query Dependent
• Query Independent
• Query Features
3A. Learn ranking model
• Based on training data (generated using the same
previous two steps, plus relevance assessments)
3B. Apply learned model
− Re-rank sample documents
Schematically…
N. Tonellotto, C. Macdonald, I. Ounis. (2013). Efficient and effective retrieval using selective pruning. WSDM'13.
(Schematic: Sample Ranking → Learning to Rank technique)
Query-Dependent
Feature Extraction
• We might typically deploy a number of query dependent
features
– Such as additional weighting models, e.g. fields/proximity, that are
calculated based on information in the inverted index
• We want the result set passed to the final cascade to have
all query dependent features computed
– Once first cascade retrieval has ended, it’s too late to compute
features without re-traversing the inverted index postings lists
– It would take too long to compute all features for all scored
documents
• Solutions:
– 1. cache the "fat" postings for documents that might make the
sample (Postings contain frequencies, positions, fields, etc.)
– 2. access the document vectors after retrieval
107
(Example fat postings: d30 4.3 <p1>,<p2>; d19 3.7 <p1>; d42 3.3 <p1,p2>)
Caching the Fat
• Solution 1: cache the "fat" postings for documents
that might make the sample
(Diagram: Q → Initial Sample Retrieval over the inverted index → “Fat” Sample ResultSet, with positions,
e.g. d30 4.3 <p1>,<p2>; d19 3.7 <p1>; d42 3.3 <p1,p2> → Calculate Features; Apply LTR →
Final ResultSet, e.g. d19 1.0; d42 0.9; d30 0.5)
C Macdonald et al. (2012). About Learning Models with Multiple Query Dependent Features. TOIS
31(3). 108
✓ Lookup cost is almost free – we had to access the inverted index anyway
✗ We only have access to postings for the original query terms
Accessing Document Vectors
• Solution 2: access direct/forward index document vectors
(Diagram: Q → Initial Sample Retrieval over the inverted index → Sample ResultSet,
e.g. d30 4.3; d19 3.7; d42 3.3 → lookup of document vectors in the direct/forward index
(need not have positions) → Calculate Features; Apply LTR → Final ResultSet,
e.g. d19 1.0; d42 0.9; d30 0.5)
Asadi & Lin. Document vector representations for feature extraction in multi-stage document
ranking. Inf. Retr. 16(6): 747-768 (2013) 109
✓ We can calculate features for query terms not in the original query
✗ Additional lookup for K documents
✗ Direct Index postings contain information for all terms in a document
Learning to Rank Process
(Diagram: training examples – a document sample with relevance labels (✓/✗) and feature values, e.g.
     f1   f2   f3   f4
d1   0.2  0.1  0.5  0.8
d2   0.5  0.7  0.0  0.2
d3   0.3  0.2  0.5  0.0
– are passed to a Learner, which produces a Learned Model; for an unseen query, the document sample
is re-ranked using the learned model.)
The Importance of Sample Size
• What sample size is necessary for effective
learning?
• Small samples: faster learning, faster retrieval
• Large samples: higher recall
(Figure: NDCG@20 for different sample sizes – small sample sizes give degraded performance;
performance increases as the sample size rises; there is no benefit for sample sizes larger than 1000;
pairwise techniques struggle with large samples.)
C. Macdonald, R. Santos and I. Ounis. (2012). The Whens and Hows of Learning to Rank. IR Journal. DOI: 1007/s10791-012-9209-9
TYPES OF MODELS FOR LTR
Types of Learned Models (1)
Linear Model
• Linear Models (the most intuitive to comprehend)
– Many learning to rank techniques generate a linear combination
of feature values:
– Linear Models make some assumptions:
• Feature Usage: They assume that the same features are needed by all
queries
• Model Form: The model is only a linear combination of feature values.
- Contrast this with genetic algorithms, which can learn functional combinations of features, by
randomly introducing operators (e.g. try dividing feature a by feature b), but are impractical to learn
• It is difficult to find wf values that maximise the
performance of an IR evaluation metric, as such metrics are non-
smooth and non-differentiable
– Typically, techniques such as simulated annealing or stochastic
gradient descent are used to empirically obtain wf values
score(d, Q) = Σf wf · valuef(d)
Type of Learned Models (2)
Regression Trees
– A regression tree is a series of decisions, leading to a partial score
output
– The outcome of the learner is a “forest” of many such trees, used to
calculate the final score of a document for a query
– Their ability to customise branches makes them more effective than
linear models
– Regression trees are pointwise in nature, but several major search
engines have created adapted regression tree techniques that are
listwise
– E.g. Microsoft’s LambdaMART is at the heart of the Bing search engine
(For each document in the sample, the final score is the Σ of the partial scores of
hundreds/thousands of trees!)
S. Tyree, K. Weinberger, K. Agrawal, J. Paykin. (2011). Parallel Boosted Regression Trees for Web Search Ranking. WWW’11.
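A minimal sketch of scoring a document with such an additive forest via the straightforward root-to-leaf traversal (the faster layouts discussed later, e.g. QuickScorer, avoid exactly this traversal; node layout and names are illustrative):

#include <vector>

// A node of a binary regression tree: either an internal test (feature <= threshold)
// or a leaf carrying a partial score.
struct TreeNode {
    int feature = -1;            // -1 marks a leaf
    double threshold = 0.0;
    double leaf_value = 0.0;
    int left = -1, right = -1;   // indices into the tree's node array
};
using Tree = std::vector<TreeNode>;   // node 0 is the root

// Additive ensemble: the document score is the sum of one leaf value per tree.
double score_document(const std::vector<Tree>& forest, const std::vector<double>& features) {
    double score = 0.0;
    for (const Tree& tree : forest) {
        int n = 0;
        while (tree[n].feature >= 0)                 // descend until a leaf is reached
            n = (features[tree[n].feature] <= tree[n].threshold) ? tree[n].left : tree[n].right;
        score += tree[n].leaf_value;                 // each tree contributes a partial score
    }
    return score;
}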
Learning to Rank Best
Practices (Query Features)
• Tree-based learning to rank techniques have an
inherent advantage
• They can activate different parts of their learned model
depending on feature values
• We can add “query features”, which allow different sub-trees
for queries with different characteristics
– E.g. long query, apply proximity
Class of Query Features                        NDCG@20
Baseline (47 document features, LambdaMART)    0.2832
Query Performance Predictors                   0.3033*
Query Topic Classification (entities)          0.3109*
Query Concept Identification (concepts)        0.3049*
Query Log Mining (query frequency)             0.3085*
C. Macdonald, R. Santos and I. Ounis. (2012). On the Usefulness of Query Features for Learning to Rank. CIKM’12.
Learning to Rank
Best Practices
• About the sample – it must be:
– Small enough that calculating features doesn’t take too long
– Small enough that BMW/WAND/MaxScore are fast!
– Large enough to have sufficient recall of relevant documents
• Previous slide says 1000 documents needed for TREC Web
corpora
– Simple early cascade: Only time for a single weighting model to
create the sample
• E.g. PL2 < BM25 with no features; but when features are added, no real
difference
• Multiple weighting models as features:
• Using more weighting models does improve the learned model
• Field Models as Features
• Adding field-based models as features always improves
effectiveness
• In general, Query Independent features (e.g. PageRank,
Inlinks, URL length) improve effectiveness
C. Macdonald, R. Santos, I. Ounis, B. He. About Learning Models with Multiple Query Dependent Features. TOIS 31(3), 2013.
C Macdonald & I Ounis (2012). The Whens & Hows of Learning to Rank. INRT. 16(5), 2012.
EFFICIENTLY APPLYING TREE
MODELS
Additive Ensembles of
Regression Trees
• Tree forests: Gradient-Boosted Regression Trees, LambdaMART, Random Forests, etc.
– ensemble of weak learners, each contributing a partial score
– at scoring time, all trees can be processed independently
126SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
(Diagram: trees T1, T2, …, Tn each independently score the pair (q, d), producing partial scores
s1, s2, …, sn that are summed.)
Assuming:
• 3,000 trees
• an average tree depth of 10 nodes
• 1,000 documents scored per query
• a cluster with 1,000 search shards
Costs
• 3,000 x 10 x 1,000 = 30 M tests per query
and per shard
• Approx. 30 G tests for the entire search
cluster
127
Processing a query-document pair
(Figure: a feature vector for the query-document pair, e.g. F1..F8 = 13.3, 0.12, -1.2, 43.9, 11, -0.4,
7.98, 2.55, is walked down a regression tree; each internal node tests one feature against a threshold,
e.g. d[F4] ≤ 50.1, branching true/false, until an exit leaf is reached whose value is added to the
document score, e.g. s(d) += 2.0.)
Typical sizes:
- number of trees = 1,000 – 20,000
- number of leaves = 4 – 64
- number of docs = 3,000 – 10,000
- number of features = 100 – 1,000
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 127
128
Struct+
• Each tree node is represented by a C object containing:
• the feature id,
• the associated threshold
• the left and right pointers
• Memory is allocated for all the nodes at once
• The nodes are laid out using a breadth-first traversal of the tree
Node* getLeaf(Node* np, float* fv) {
if (!np->left && !np->right) {
return np;
}
if (fv[np->fid] <= np->threshold) {
return getLeaf(np->left, fv);
} else {
return getLeaf(np->right, fv);
}
}
Tree traversal:
- No guarantees about references locality
- Low cache hit ratio
- High branch misprediction rate
- Processor pipeline flushing
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 128
129
CodeGen
• Statically-generated if-then-else blocks
if (x[4] <= 50.1) {
// recurses on the left subtree
...
} else {
// recurses on the right subtree
if(x[3] <= -3.0) {
result = 0.4;
} else {
result = -1.4;
}
}
Tree traversal:
- Static generation
- Recompilation if model changes
- Compiler "black magic" optimizations
- Expected to be fast
- Compact code, expected to fit in I-cache
- High cache hit ratio
- No data dependencies, a lot of control dependencies
- High branch misprediction rate
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 129
130
VPred
• Array representation of nodes, similar to the Struct+ baseline
• All trees in a single array, need additional data structures to
locate a tree in the array, and to know its height
Tree traversal:
Theory Practice
double depth4(float* x, Node* nodes) {
  int nodeId = 0;
  nodeId = nodes[nodeId].children[
      x[nodes[nodeId].fid] > nodes[nodeId].theta];
  nodeId = nodes[nodeId].children[
      x[nodes[nodeId].fid] > nodes[nodeId].theta];
  nodeId = nodes[nodeId].children[
      x[nodes[nodeId].fid] > nodes[nodeId].theta];
  nodeId = nodes[nodeId].children[
      x[nodes[nodeId].fid] > nodes[nodeId].theta];
  return scores[nodeId];   // scores: leaf value associated with each node id
}
- Static unrolling of traversal functions
- High compilation times
- Compiler "black magic" optimizations
- Expected to be fast
- In-code vectorization
- Processing 16 documents at a time
- Requires tuning on dataset
- No control dependencies, a lot of data dependencies
- Low branch misprediction rate
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 130
131
Vectorization of Trees
(Figure: the ensemble of trees T1, T2, …, Tn scoring (q, d) is re-organised by feature – for each of the
|F| features, the thresholds of all test nodes involving that feature, e.g. f0, f1, …, are stored
contiguously in a thresholds array, indexed via an offsets array.)
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 131
132
while !(x[f0] <= thresholds[i]) do
    // bitmask operations
• Linear access to data
• Highly predictable conditional branches
• True nodes are not inspected at all
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 132
Processing Trees in
Vectors
133
Quickscorer algorithm
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 133
Tree traversal
(BitMask
Operations)
Score
computation
134
Table 2: Per-document scoring time in µs of QS, VPred, If-Then-Else and Struct+ on MSN-1 and Y!S1
datasets. Gain factors are reported in parentheses.
                                            Number of trees / dataset
Method (Λ leaves)    1,000 MSN-1   1,000 Y!S1   5,000 MSN-1    5,000 Y!S1     10,000 MSN-1   10,000 Y!S1    20,000 MSN-1   20,000 Y!S1
QS (8)               2.2 (–)       4.3 (–)      10.5 (–)       14.3 (–)       20.0 (–)       25.4 (–)       40.5 (–)       48.1 (–)
VPred (8)            7.9 (3.6x)    8.5 (2.0x)   40.2 (3.8x)    41.6 (2.9x)    80.5 (4.0x)    82.7 (3.3x)    161.4 (4.0x)   164.8 (3.4x)
If-Then-Else (8)     8.2 (3.7x)    10.3 (2.4x)  81.0 (7.7x)    85.8 (6.0x)    185.1 (9.3x)   185.8 (7.3x)   709.0 (17.5x)  772.2 (16.0x)
Struct+ (8)          21.2 (9.6x)   23.1 (5.4x)  107.7 (10.3x)  112.6 (7.9x)   373.7 (18.7x)  390.8 (15.4x)  1150.4 (28.4x) 1141.6 (23.7x)
QS (16)              2.9 (–)       6.1 (–)      16.2 (–)       22.2 (–)       32.4 (–)       41.2 (–)       67.8 (–)       81.0 (–)
VPred (16)           16.0 (5.5x)   16.5 (2.7x)  82.4 (5.0x)    82.8 (3.7x)    165.5 (5.1x)   165.2 (4.0x)   336.4 (4.9x)   336.1 (4.1x)
If-Then-Else (16)    18.0 (6.2x)   21.8 (3.6x)  126.9 (7.8x)   130.0 (5.8x)   617.8 (19.0x)  406.6 (9.9x)   1767.3 (26.0x) 1711.4 (21.1x)
Struct+ (16)         42.6 (14.7x)  41.0 (6.7x)  424.3 (26.2x)  403.9 (18.2x)  1218.6 (37.6x) 1191.3 (28.9x) 2590.8 (38.2x) 2621.2 (32.4x)
QS (32)              5.2 (–)       9.7 (–)      27.1 (–)       34.3 (–)       59.6 (–)       70.3 (–)       155.8 (–)      160.1 (–)
VPred (32)           31.9 (6.1x)   31.6 (3.2x)  165.2 (6.0x)   162.2 (4.7x)   343.4 (5.7x)   336.6 (4.8x)   711.9 (4.5x)   694.8 (4.3x)
If-Then-Else (32)    34.5 (6.6x)   36.2 (3.7x)  300.9 (11.1x)  277.7 (8.0x)   1396.8 (23.4x) 1389.8 (19.8x) 3179.4 (20.4x) 3105.2 (19.4x)
Struct+ (32)         69.1 (13.3x)  67.4 (6.9x)  928.6 (34.2x)  834.6 (24.3x)  1806.7 (30.3x) 1774.3 (25.2x) 4610.8 (29.6x) 4332.3 (27.0x)
QS (64)              9.5 (–)       15.1 (–)     56.3 (–)       66.9 (–)       157.5 (–)      159.4 (–)      425.1 (–)      343.7 (–)
VPred (64)           62.2 (6.5x)   57.6 (3.8x)  355.2 (6.3x)   334.9 (5.0x)   734.4 (4.7x)   706.8 (4.4x)   1309.7 (3.0x)  1420.7 (4.1x)
If-Then-Else (64)    55.9 (5.9x)   55.1 (3.6x)  933.1 (16.6x)  935.3 (14.0x)  2496.5 (15.9x) 2428.6 (15.2x) 4662.0 (11.0x) 4809.6 (14.0x)
Struct+ (64)         109.8 (11.6x) 116.8 (7.7x) 1661.7 (29.5x) 1554.6 (23.2x) 3040.7 (19.3x) 2937.3 (18.4x) 5437.0 (12.8x) 5456.4 (15.9x)
(From the accompanying analysis: the interleaved traversal strategy of QS needs to process fewer nodes
than a traditional root-to-leaf visit, which mostly explains the results achieved by QS; QS and VPred
execute far fewer branches than If-Then-Else and Struct+; the number of L3 cache misses of QS starts
increasing when dealing with 9,000 trees on MSN-1 and 8,000 trees on Y!S1, motivating BWQS, a
block-wise variant of QS that improves its cache efficiency.)
Performance
Per-document scoring time in microsecs and speedups
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 134
Essential Bibliography
135
• Craig Macdonald, Rodrygo Santos and Iadh Ounis. The Whens and Hows of Learning to Rank. Information
Retrieval 16(5):584-628.
• Craig Macdonald, Rodrygo Santos and Iadh Ounis. On the Usefulness of Query Features for Learning to Rank. In
Proceedings of CIKM 2012.
• C Macdonald et al, About Learning Models with Multiple Query Dependent Features. (2012). TOIS 31(3).
• N Fuhr. Optimal Polynomial Retrieval Functions Based on the Probability Ranking Principle. (1989). TOIS. 7(3),
1989.
• S. Tyree, K.Q. Weinberger, K. Agrawal, J. Paykin Parallel boosted regression trees for web search ranking. WWW
2011.
• N. Asadi, J. Lin, and A. P. de Vries. Runtime optimizations for tree-based machine learning models. IEEE
Transactions on Knowledge and Data Engineering, 26(9):2281–2292, 2014.
• C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Quickscorer: A fast algorithm to
rank documents with additive ensembles of regression trees. In Proc. of the 38th ACM SIGIR Conference, pages
73–82, 2015.
• D. Dato, C. Lucchese, F.M. Nardini, S. Orlando, R. Perego, N. Tonellotto, R. Venturini. Fast Ranking with Additive
Ensembles of Oblivious and Non-Oblivious Regression Trees. ACM Transactions on Information Systems, 2016.
• C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Exploiting CPU SIMD
Extensions to Speed-up Document Scoring with Tree Ensembles. In Proc. of the 39th ACM SIGIR Conference,
pages 833-836, 2016.
• F. Lettich, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. GPU-based
Parallelization of QuickScorer to Speed-up Document Ranking with Tree Ensembles. In Proc. of the 7th IIR
Workshop, 2016
SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
QUERY EFFICIENCY
PREDICTION
1,000,000,000s
of documents
1000s of
documents
Roadmap
137
20
docs
LEARNING TO RANK
e.g. BM25
[1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day.
e.g. AND/OR
Part 1: Index layouts
Part 2: Query evaluation
& dynamic pruning
Part 3: Learning-to-rank
Part 4: Query
Efficiency Prediction
& Applications
If we know how long a query will take,
can we reconfigure the search engine's
cascades?
(Cascade diagram: 1,000,000,000s of documents → Boolean: Do query terms occur? →
Simple Ranking: Identify a set most likely to contain relevant documents → 1000s of documents →
Re-Ranking: Try really hard to get the top of the ranking correct, using many signals (features) → 10 docs)
[1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day.
138
Main Idea:
Query Efficiency Prediction
• Predict how long an unseen query will take to execute, before it has
executed.
• This facilitates at least three ways to make a search engine more efficient:
1. Reconfigure the cascades of the search engine, trading off a little
effectiveness for efficiency
2. Apply more CPU cores to long-running queries
3. Decide how to plan the rewrites of a query, to reduce long-
running queries
In each case: increasing efficiency means increased server
capacity and energy savings
What makes a query
fast or slow?
141
(Figure: response time distributions for 2 term queries and 4 term queries)
• Length of posting lists?
• Length of query?
• Full / MaxScore / WAND processing?
• Co-occurrence of query terms?
Query Efficiency
Prediction
• Can we predict response time for a query (query efficiency
prediction)?
• Properties:
– Can be calculated online before retrieval commences
– Easy (quick) to calculate
– Based on single term statistics, previously calculated offline
– Includes statistics on posting list’s score distribution
(Diagram: term-level statistics are combined by an aggregation function into predictor features.)
• We combine statistics of the query terms in a regression to
accurately predict the execution time of a query
Query Efficiency
Prediction Framework
(From the single baseline feature – total postings – up to 42 inexpensive features.)
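As an illustration, a sketch of how such predictor features might be assembled: per-term statistics computed offline are aggregated (e.g. sum/max/min) over the query terms and then fed to a learned regressor. The statistics and aggregations shown are illustrative, not the exact 42-feature set.

#include <algorithm>
#include <numeric>
#include <vector>

// Offline, per-term statistics (illustrative subset).
struct TermStats {
    double postings;        // posting list length
    double max_score;       // term upper bound
    double mean_score;      // mean score in the posting list
};

// Aggregate each term-level statistic over the query terms with a few simple
// functions (sum / max / min), producing query-level predictor features.
// Assumes a non-empty query.
std::vector<double> qep_features(const std::vector<TermStats>& terms) {
    auto aggregate = [&](auto stat) {
        std::vector<double> v;
        for (const TermStats& t : terms) v.push_back(stat(t));
        double sum = std::accumulate(v.begin(), v.end(), 0.0);
        double mx = *std::max_element(v.begin(), v.end());
        double mn = *std::min_element(v.begin(), v.end());
        return std::vector<double>{sum, mx, mn};
    };
    std::vector<double> features;
    for (auto part : {aggregate([](const TermStats& t) { return t.postings; }),
                      aggregate([](const TermStats& t) { return t.max_score; }),
                      aggregate([](const TermStats& t) { return t.mean_score; })})
        features.insert(features.end(), part.begin(), part.end());
    return features;   // these features feed a learned regressor predicting response time
}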
Accuracy of Query
Efficiency Predictors
144
Findings: Our learned predictor, with 42 features can accurately
predict the response time of the search engine for an unseen
query
Three Applications of Query
Efficiency Prediction
1. Selective retrieval – can we reduce the time taken in
the earlier retrieval cascades, e.g. by retrieving fewer
documents?
2. Selective parallelisation – apply more CPU cores for
long-running queries
3. Selective query rewriting – can we decide how to
rewrite a query on a per-query basis?
• Recall that, for each application: increasing efficiency
means increased server capacity and energy savings
QEP APPLICATION (1):
SELECTIVE RETRIEVAL
146
QEP Application (1):
Selective Retrieval
• If we only need to return 10 documents to the user, do we
need to re-score 1000s?
• i.e. can we configure the earlier cascades to be faster for
longer queries, but potentially miss relevant documents?
(Adjusted cascade diagram [Tonellotto, Macdonald, Ounis, WSDM 2013]:
1,000,000,000s of documents
→ Boolean: Do query terms occur? → e.g. 500,000,000 docs
→ Simple Ranking: Identify a set most likely to contain relevant documents → 1000s of documents,
reduced to e.g. 500s of docs
→ Re-Ranking: Try really hard to get the top of the ranking correct, using many signals (features)
→ 20 docs, or 10 docs)
Proposed Method [WSDM 2013]:
• Predict the efficiency (response time) of a query
• Adjust <K, F> of WAND to be faster for predicted longer queries
Selective Pruning Results
148SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
Findings: by using our 42-feature query efficiency predictor to selectively adjust the
retrieval cascades, we could reduce response time by 31%, while maintaining high
effectiveness
The capacity increase from the improved efficiency is equivalent to 44% more servers
QEP APPLICATION (2) :
SELECTIVE
PARALLELISATION
152
QEP Application (2):
Selective Parallelisation
• Industry web search deployments have to meet service level
agreements
– e.g. 200ms for 99th-percentile response time
– Only 1 query in 100 may take longer than 200ms
• Some queries can take > 200ms
Problem: How to identify and speed up those slow queries?
Parallelisation: Retrieval with multiple cores
• Parallelisation using N threads speeds up all queries, but
decreases server capacity to 1/N
• Idea! Selective Parallelisation – only parallelise those
queries predicted to be too long
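A minimal sketch of this decision, with assumed interfaces; intra-query parallelism is simplified here to scoring index partitions concurrently, rather than the finer-grained posting-list parallelism of the actual system.

# Only queries predicted to exceed the latency target are parallelised,
# so short queries do not consume the extra cores.
from concurrent.futures import ThreadPoolExecutor

SLA_MS = 200      # e.g. the 99th-percentile response time target
MAX_THREADS = 4   # illustrative intra-query parallelism degree

def process_query(query, partitions, qep, search_partition, k=10):
    """search_partition(partition, query) is assumed to return a list of
    (docid, score) pairs for one index partition."""
    threads = MAX_THREADS if qep.predict(query) > SLA_MS else 1
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partial = list(pool.map(lambda p: search_partition(p, query), partitions))
    merged = [hit for results in partial for hit in results]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)[:k]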
154
[Figure: the query "world cup" – whose lexicon entries show long posting lists (cup: 1.3B postings, world: 6.9B postings; cf. person: 500M postings) – is predicted to be long-running, so the traversal of its inverted index posting lists is split across CPU 1 and CPU 2]
As used by Bing:
• Jeon et al. report that this approach reduces the 99th-percentile response time by 50%, from 200ms to 100ms, compared to other state-of-the-art approaches that do not consider predicted execution time
• In doing so, it increases server capacity by more
than 50%. This potentially saves one-third of
production servers, constituting a significant cost
reduction
• Selective parallelisation, using query efficiency
predictors, is now being deployed by industrial
search engine Bing, across hundreds of
thousands of servers
Image Source: Wikimedia
[Predictive parallelization: Taming tail latencies in web search.
M Jeon, S Kim, S Hwang, Y He, S Elnikety, AL Cox, S Rixner.
Proceedings of ACM SIGIR 2014]
QEP APPLICATION (3):
SELECTIVE QUERY
REWRITING
156
Query Rewriting Plans
• Remember our notion of a query plan, incl. rewriting the query
– E.g. Context-Sensitive Stemming [1] or MRF Proximity [2]
157
• All these rewritings use complex operators, e.g. #syn(), #1(), #uwN()
– #syn: documents containing any of the words
– #1: documents containing the exact phrase
– #uwN: documents containing the words in a window of size N
• Key Question: how do we rewrite a query to maximise both efficiency AND effectiveness? (an illustrative example follows the references below)
[1] F. Peng et al. Context Sensitive Stemming for Web Search. SIGIR 2007.
[2] D. Metzler, W.B. Croft. A Markov Random Field Model for Term Dependencies. SIGIR 2005.
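As an illustrative example (not taken from the slides), a query such as world cup could be executed under several candidate plans, expressed with the operators above:
– Default: world cup
– Context-sensitive stemming: world #syn(cup,cups)
– MRF proximity (sequential-dependence style): world cup #1(world cup) #uw8(world cup)
Each plan traverses different (and differently sized) posting lists, so both its cost and its effectiveness differ.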
158
Different queries can be rewritten in different ways
Different queries (and rewritings) can take varying
durations to execute
We describe a framework that plans how to execute each query (e.g. its rewriting), taking into account:
– How long each plan may take to execute
– The expected effectiveness of each plan
A Selective Rewriting Approach
• The selective mechanism is based around two
ideas:
1. We can predict the efficiency of a given query
plan… and hence eliminate plans that will
exceed the target response time
2. We can predict the effectiveness of a given
query plan… and where possible execute the
most effective
159
(Idea 1 relies on QEP, including for complex queries; Idea 2 relies on QPP, or an expectation of plan effectiveness)
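A minimal sketch of this selection logic, with the predictor interfaces as assumptions:

# Keep only the plans predicted to finish within the latency target, then
# execute the one with the highest expected effectiveness; if no plan fits
# the budget, fall back to the cheapest plan.
def choose_plan(candidate_plans, qep, expected_effectiveness, target_ms=200):
    affordable = [p for p in candidate_plans if qep.predict(p) <= target_ms]
    if not affordable:
        return min(candidate_plans, key=qep.predict)
    return max(affordable, key=expected_effectiveness)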
Complex terms
and ephemeral posting lists
• Recall that QEP essentially estimates the amount of pruning, based on posting list lengths
• Challenge – obtaining posting list length estimates for queries involving
complex terms, e.g.
160
car             (21,2) (25,3) (26,5)          — unigram
cars            (19,1) (25,2)                 — unigram
#syn(car,cars)  (19,1) (21,2) (25,5) (26,6)   — ephemeral
Observations on
Ephemeral Postings lists
• Observation: Statistics for ephemeral postings lists
can be bounded…
– e.g. #syn is disjunctive, so the length of its posting list
cannot be greater than the sum of the posting list lengths
of its constituent terms
– e.g. #1 is conjunctive, so the length of its postings list cannot be greater than the minimum of the posting list lengths of its constituent terms
– e.g. for #1, the maximum number of matches for a phrase
cannot exceed the minimum of the max term frequencies
of the constituent terms
• These allow:
a) Upper bound statistics (necessary for doing dynamic pruning with complex terms)
b) Efficiency predictions for queries with complex terms (sketched below)
161
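A minimal sketch of these bounds, assuming the per-term posting list lengths and maximum term frequencies are available from the lexicon (the statistic names are assumptions):

# Upper bounds on the statistics of ephemeral posting lists, derived from
# the statistics of their constituent unigram terms.
def syn_bounds(term_stats):
    """#syn is disjunctive: its list cannot be longer than the sum of the
    constituent lists, nor its frequency larger than the sum of their maxima."""
    return {"max_length": sum(s["length"] for s in term_stats),
            "max_tf": sum(s["tf_max"] for s in term_stats)}

def phrase_bounds(term_stats):
    """#1 is conjunctive: its list cannot be longer than the shortest
    constituent list, and a phrase cannot occur in a document more often
    than its least frequent constituent term."""
    return {"max_length": min(s["length"] for s in term_stats),
            "max_tf": min(s["tf_max"] for s in term_stats)}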
Putting it together: Complex QEP & Online Plan Selection
162
49% decrease in mean response time, and 62% decrease in tail response time, compared to the default plan, without significant loss in NDCG
Looking Forward
• The query plans we showed here demonstrate
only two possible rewritings:
– Proximity and stemming
– They don’t consider the (predicted) effectiveness of a
plan
• We highlight Rosset et al., who have proposed the use of reinforcement learning to identify the most appropriate plan per query and per index block.
163
Optimizing Query Evaluations using Reinforcement Learning for Web Search.
Rosset et al, Proceedings of SIGIR 2018.
QEP Bibliography
• Predictive parallelization: Taming tail latencies in web search. M Jeon, S Kim, S Hwang, Y He, S Elnikety, AL Cox and S Rixner. In Proceedings of SIGIR 2014.
• Load-Sensitive Selective Pruning for Distributed Search. Nicola Tonellotto, Fabrizio Silvestri, Raffaele Perego, Daniele Broccolo, Salvatore Orlando, Craig Macdonald and Iadh Ounis. In Proceedings of CIKM 2013.
• Efficient and effective retrieval using selective pruning. Nicola Tonellotto, Craig Macdonald and Iadh Ounis. In Proceedings of WSDM 2013.
• Learning to Predict Response Times for Online Query Scheduling. Craig Macdonald, Nicola Tonellotto and Iadh Ounis. In Proceedings of SIGIR 2012.
• Efficient & Effective Selective Query Rewriting with Efficiency Predictions. Craig Macdonald, Nicola Tonellotto and Iadh Ounis. In Proceedings of SIGIR 2017.
164
CONCLUSIONS
165
Conclusions
• Information retrieval requires not just effective, but also
efficient techniques
– These two form an inherent tradeoff
– Efficiency can be helped by applying more horsepower, but at
significant cost!
• Hence efficiency is important to ensure the Green
credentials of the IR community
• We have presented many techniques that aim to improve particular methods of retrieval
– They make assumptions about the IR process (e.g. we care
about top K documents)
• As new IR techniques and models are created, there will always be a need for enhancements to their efficiency to make them easily deployable
166
Emerging Trends
• BitFunnel (SIGIR 2017 best paper) – a "lossy"
signature representation for inverted files
– Now deployed by Bing
• Similarly, deep-learned data structures – Google
• Deep learning for matching – but efficiency costs
not yet well-researched
• And your future ideas…?
167
Acknowledgements
• Some of the slides presented here have
appeared in presentations which we have co-
authored with:
– Iadh Ounis
– Richard McCreadie
– Matteo Catena
– Raffaele Perego
– Rossano Venturini
– Salvatore Orlando
– Franco Maria Nardini
– Claudio Lucchese
168
• Feedback:
– http://bit.ly/2ucH86b
169

More Related Content

Efficient Query Processing Infrastructures

  • 1. Efficient Query Processing Infrastructures Craig Macdonald 1, Nicola Tonellotto 2 1 School of Computer Science University of Glasgow 2 Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” National Research Council of Italy
  • 2. The scale of the web search challenge
  • 3. How many documents? In how long? • Reports suggest that Google considers a total of 30 trillion pages in the indexes of its search engine – And it identified relevant results from these 30 trillion in 0.63 seconds – Clearly this a big data problem! • To answer a user's query, a search engine doesn’t read through all of those pages: the index data structures help it to efficiently find pages that effectively match the query and will help the user – Effective: users want relevant search results – Efficient: users aren't prepared to wait a long time for search results • Today, we provide an insight into search engine architecture and the modern retrieval techniques for attaining efficient yet effective search
  • 4. Search as a Distributed Problem • To achieve efficiency at Big Data scale, search engines use many servers: • N & M can be very big: – Microsoft's Bing search engine has "hundreds of thousands of query servers" Shard Replica Query Server Retrieval Strategy Shard Replica Query Server Retrieval Strategy Broker Query Scheduler queries N M Results Merging
  • 8. Data centres use 3 per cent of the global electricity supply and accounting for about 2 per cent of total greenhouse gas emissions This is expected to treble in the next decade, putting an enormous strain on energy supplies and dealing a hefty blow to efforts to contain global warming Data Centres are energy hungry… Image Source: http://www.mwhglobal.com/project/wanapum-dam-spillway-project/Source: The Independent
  • 9. How to keep pace? IncomeExpenditure More users, more ad revenue More queries received More documents to index Search engines are growing 9
  • 10. LessEffectiveMoreEffective More Query ServersMore Efficient Querying Expenditure Income Expenditure Income
  • 11. Green search engines are important Keep your search engine Efficient yet effective Efficient: More throughput, less servers, less expenditure Effective: More users, more income Income 11 Expenditure
  • 12. So where is our focus?
  • 13. Inter Data Center So where is our focus?
  • 14. Intra Data Center So where is our focus?
  • 15. Intra Server So where is our focus?
  • 16. • Different formulations are more effective & efficient for different queries: – Homepage: match on Url/Title/Atext – Informational: match on Body, more query expansion • Query type classification, query performance predictors and query efficiency predictors all have a role in deciding the most appropriate plan for a query Query Plans: Executing a Query 16 Q Qplan Q Q plan halloween costumes (halloween ∊ U|B|T) ⋀ ( {costume,costumes} ∊ A|U|B|T) facebook login (facebook ∊ U|B|T) Optimizing Query Evaluations using Reinforcement Learning for Web Search. Rosset et al, Proceedings of SIGIR 2018.
  • 17. • Typically, in web-scale search, the ranking process can be conceptually seen as a series of cascades [1] – Rank some documents – Pass top-ranked onto next cascade for refined re-ranking 1,000,000,000s of documents 1000s of documents Cascades 17 20 docs Boolean: Do query terms occur? Simple Ranking: Identify a set most likely to contain relevant documents Re-Ranking: Try really hard to get the top of the ranking correct, using many signals (features) LEARNING TO RANK e.g. BM25 [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. e.g. AND/OR Q Qplan
  • 18. 1,000,000,000s of documents 1000s of documents Roadmap 18 20 docs LEARNING TO RANK e.g. BM25 [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. e.g. AND/ORPart 1: Index layouts Part 2: Query evaluation & dynamic pruning Part 3: Learning-to-rank Part 4: Query Efficiency Prediction & Applications
  • 19. Aims of this Tutorial • ILO 1. learn about the fundamentals of query processing in information retrieval in terms of the data structures and algorithms involved (e.g. Document-at-a-Time/Term-at-a-time/Score-at-a- time); • ILO 2. learn about the state-of-the-art of dynamic pruning (query evaluation) techniques (e.g., TAAT optimizations, impacts, MaxScore, WAND, blockmax indexes, top-hitlists), illustrated through extensive use of pseudo-code and examples. • ILO 3. understand how such query processing fits into a multi-tier search architecture, involving learning-to-rank and distributed query processing. • ILO 4. understand about safeness in dynamic pruning techniques, and about the nature of the tradeoffs between efficiency and effectiveness in query processing. • ILO 5. learn about query efficiency prediction and its applications to efficient yet effective web search and to operational costs reduction, including a description of Bing’s use of QEP. 19SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 20. Outline • 1330-1340 Introduction (Craig) (10 mins) – Components, multi-stage architectures • 1340-1410 Part 1: Data structures, compression & skipping (Craig) (30 mins) • 1410-1500 Part 2: Query Evaluation (50 mins) – TAAT/DAAT (Craig) – dynamic pruning, impacts, SAAT (Nicola) • 1500-1530 Coffee Break (30 mins) • 1530-1625 Part 3: Second Stage Query Processing, aka LTR – Learning to rank (Craig) (40 min) – Quickscorer (Nicola) (15 mins) • 1625-1650 Part 4: Query Efficiency Prediction (25 minutes) • 1650-1700 Conclusions (5-10 minutes) SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 20
  • 21. PART 1: INDEX LAYOUT & COMPRESSION 21SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 22. The Format of an Index term df cf p Lexicon id len DocumentIndex InvertedIndexid tf id tf each entry (posting) represents a document Index Term Pipeline Indexing Document Tokenizer 22 • An index normally contains three sub- structures • Lexicon: Records the list of all unique terms and their statistics • Document Index: Records the list of all documents and their statistics • Inverted Index: Records the mapping between terms and documents id tf Could also contain other occurrence information: e.g. term positions, fields (title, URL) Could also contain other document information: e.g. PageRank
  • 23. Search Efficiency • It is important to make retrieval as fast as possible • Research by indicates that even slightly slower retrieval (0.2s-0.4s) can lead to a dramatic drop in the perceived quality of the results [1] • So what is the most costly part of a (classical) search system? • Scoring each document for the user query 23[1] Teevan et al. Slow Search: Information Retrieval without Time Constraints. HCIR’13 Term Pipeline Document Retrieval Model Query Tokenizer Re-Ranking Index Top Results
  • 24. Why is Document Scoring Expensive? • The largest reason for the expense of document scoring is that there are lots of documents: – A Web search index can contain billions of documents • Google currently indexes trillions of pages [1] • More specifically, the cost of a search is dependent on: – Query length (the number of search terms) – Posting list length for each query term • i.e. The number of documents containing each term 24 term1 df cf p term2 df cf p id tf id tf id tf id tf id tf [1] http://www.statisticbrain.com/total-number-of-pages-indexed-by-google/
  • 26. Index Compression • The compression of the inverted index posting lists is essential for efficient scoring [1] – Motivation: it physically takes time to read the term posting lists, particularly if they are stored on a (slow) hard disk – Using lossless compressed layouts for the term posting lists can save space (on disk or in memory) and reduce the amount of time spent doing IO – But decompression can also be expensive, so efficient decompression is key! 26 term2 21 2 25 2 1 integer = 32 bits = 4 bytes total = 24 bytes Do we need 32 bits? 26 5 [1] Witten et al. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann 1999.
  • 27. Delta Gaps • One component of the information stored in a posting list is the docids… – …in ascending order! • We can make smaller numbers by taking the differences • So each docid in the posting lists could be represented using less bits – How to represent these numbers? – 32 bits has a range -2147483648 .. 2147483648 – Using a fixed number of bits is wasteful 27 term2 21 2 25 2 26 5 term2 21 2 25 24 26 51
  • 28. • Unary: – Use as many 0s as the input value x, followed by a 1 – Eg: 5 is 000001 • Gamma: – Let N=⌊log2 x⌋ be the highest power of 2 that x contains; – Write N out in unary representation, followed by the remainder (x – 2N) in binary – Eg: 5 is represented as 00101 • Let’s represent docids as gamma, and tf as unary Elias Unary & Gamma Encoding 28 term2 21 2 25 24 26 51 Repr. 0000 1 0101 001 00100 001 1 00001 Bits 9 3 5 3 1 5 = 21 bits < 3 bytes down from 24! 3.5 bits/int
  • 29. Other Compressions Schemes • Elias Gamma & Elias Unary are moderately expensive to decode: lots of bit twiddling! – Other schemes are byte-aligned, e.g. • Variable byte [1] • Simple family [2] • Documents are often clustered in the inverted index (e.g. by URL ordering) – Compression can be more effective in blocks of numbers – List-adaptive techniques work on blocks of numbers • Frame of reference (FOR) [3] • Patched frame of reference (PFOR) [4] 29 [1] H.E. Williams and J. Zobel. Compressing Integers for Fast File Access. Comput. J. 1999 [2] V. Anh & A. Moffat. Inverted Index Compression using Word-aligned Binary Codes. INRT. 2005 [3] J. Goldstein et al. Compressing relations and indexes. ICDE 1998. [4] M. Zukowski et al. Super-scalar RAM-CPU cache compression. ICDE 2006.
  • 30. Frame of Reference Idea: pick the minimum m and the maximum M values of block of numbers that you are compressing. • Then, any value x can be represented using b bits, where b = ⌈log2(M-m+1)⌉. Example: To compress numbers in range {2000,...,2063} – ⌈log2(64) ⌉ = 6 – So we use 6 bits per value: 2000 2 6 2 xxxxxxxxxxxxxxxxxx... 37
  • 31. 31 Patched frame of reference (PFOR) ● FOR is sensitive to outliers – Eg: to compress {2000,...,9000,...2064} we now need 13 bits per value ● Idea: pick b bits for compression such as it is “good” for most of the values, treat the others as exceptions (32 bits each) Zukowski et al, ICDE’06
  • 32. 32 PFOR flavors ● NewPFD – compresses exceptions using a Simple family codec ● OptPFD – tries to optimize b to have the best compression ratio and decompression speed ● FastPFOR – subdivides exceptions by # used bits, then compresses these sets using a FOR codec NewPFD and OptPFD in Yan et al., WWW'09 FastPFOR in Lemire, Boytsov; Software: Practice and experience (2013)
  • 33. Compression: Some numbers [1] • ClueWeb09 corpus – 50 million Web documents – 12,725,738,385 postings => 94.8GB inverted file uncompressed – NO retrieval numbers: WAY TOO SLOW! – Terrier’s standard Elias Gamma/Unary compression = 15GB • Compression is essential for an efficient IR system – List adaptive compression: slightly larger indices, markedly faster 33 Time (s) Size Time (s) Size docids tfs Gamma/Unary 1.55 - 1.55 - Variable Byte +0.6% +5% +9% +18.4% Simple16 -7.1% -0.2% -2.6% +0.7% FOR -9.7% +1.3% -3.2% +4.1% PForDelta -7.7% +1.2% -1.3% +3.3% [1] M Catena, C Macdonald, and I Ounis. On Inverted Index Compression for Search Engine Efficiency. ECIR 2014.
  • 34. Most Recently… Elias Fano • A data structure from the ’70s (mostly in the succinct data structures niche) for encoding of monotonically increasing sequences of integers • Recently successfully applied to inverted indexes [Vigna, WSDM13] – Used by Facebook Graph Search! • Originally distribution-independent, but recently adapted to take into account document similarities, i.e., common terms: partitioned Elias-Fano [Ottaviano & Venturini, SIGIR15] • Very fast implementations exist – Skipping is included… 34SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 35. Posting List Iterators • It is often convenient to see a posting list as an iterator over its postings. • APIs: – p.docid() returns the docid of the current posting – p.score() returns the score of the current posting – p.next() moves sequentially the iterator to the next posting – p.next(d) advances the iterator forward to the next posting with a document identifier greater than or equal to d. => skipping 35SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 36. Skipping (I) d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 d13 To move the cursor to a specific docid, we need to read and decompress all postings in the middle current posting next posting
  • 37. Skipping (II) d1 d10d2 d4 d5 d6 d7 d8 d9d3 We append an additional structure to the start of the posting list, such that the largest docid in any block of compressed postings is known Then to move the cursor to a specific docid, we need to read and decompress skips in the middle and few postings current posting next posting d11 d12
  • 38. Essential Bibliography • Matteo Catena, Craig Macdonald and Iadh Ounis. On Inverted Index Compression for Search Engine Efficiency. In Proceedings of ECIR 2014. • Witten, I.H., Bell, T.C., Moffat, A.: Managing Gigabytes: Compressing and Indexing Documents and Images. 1st edn. (1994) • Elias, P.: Universal codeword sets and representations of the integers. Trans. Info. Theory 21(2) (1975) 2014. • Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. Software: Practice and Experience (2013) • Delbru, R., Campinas, S., Samp, K., Tummarello, G.: Adaptive frame of reference for compressing inverted lists. Technical Report 2010-12-16, DERI (2010) • Williams, H.E., Zobel, J.: Compressing integers for fast file access. The Computer Journal 42 (1999) • Scholer, F., Williams, H.E., Yiannis, J., Zobel, J.: Compression of inverted indexes for fast query evaluation. In: Proc. SIGIR ’02. (2002) • Anh, V.N., Moffat, A.: Inverted index compression using word-aligned binary codes. Inf. Retr. 8(1) (2005) • Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proc. ICDE ’98. (1998) • Alistair Moffat and Justin Zobel. 1996. Self-indexing inverted files for fast text retrieval. ACM Trans. Inf. Syst. 14, 4 (October 1996), 349-379. • Sebastiano Vigna. 2013. Quasi-succinct indices. In Proceedings of the sixth ACM international conference on Web search and data mining (WSDM '13) • Giuseppe Ottaviano, and Rossano Venturini. "Partitioned elias-fano indexes." Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. ACM, 2014. 38SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 39. PART 2: QUERY EVALUATION 39
  • 40. 1,000,000,000s of documents 1000s of documents Roadmap 40 20 docs LEARNING TO RANK e.g. BM25 [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. e.g. AND/ORPart 1: Index layouts Part 2: Query evaluation & dynamic pruning Part 3: Learning-to-rank Part 4: Query Efficiency Prediction & Applications
  • 41. Query Evaluation • Conjunctive (AND) vs. Disjunctive (OR) • Boolean retrieval – Suitable for small/medium collections – Returns all documents matching the query • Ranked retrieval – Suitable for Web-scale collections – Requires a similarity function between queries and documents – Returns only the top k document SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 41
  • 42. Query Evaluation • Normal strategies make a pass on the postings lists for each query term – This can be done Term-at-a-Time (TAAT) – one query term at a time – Or Document-at-a-time (DAAT) – all query terms in parallel • We will explain these, before showing how we can improve them 42
  • 43. Term-at-a-Time (TAAT) 43 rank docid score 1 2 … term1 p term2 p 21 2 21 2 25 2 Advantages: • Simple Disadvantages: • Requires lots of memory to contain partial scores for all documents • Difficult to do Boolean or phrase queries, as we don’t have a document’s postings for all query terms at once 21 221 4 25 2
  • 44. Document-at-a-Time (DAAT) 44 term1 p term2 p 21 2 21 2 25 2 Advantages: • Reduced memory compared to TAAT (and hence faster) • Supports Boolean query operators, phrases, etc. Disadvantages: • Slightly more complex to implement Most commercial search engines are reported to use DAAT rank docid score 1 2 … 21 4 25 2
  • 45. TAAT vs DAAT • TAAT and DAAT have been the cornerstores of query evaluation in IR systems since 1970s. • The plain implementations of these two strategies are seldom used anymore, since many optimizations have been proposed during the years • Several known systems in production today, from large scale search engines as Google and Yahoo!, to open source text indexing packages as Lucene and Lemur, use some optimized variation of these strategies 45SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 46. TAAT vs DAAT • [Turtle and Flood,1995] were the first to argue that DAAT could beat TAAT in practical environments. – For large corpora, DAAT implementations require a smaller run-time memory footprint 46 • [Fontoura et al., 2011] report experiments on small (200k docs) and large indexes (3M docs), with short (4.26 terms) and long (57.76 terms) queries (times are in microseconds): SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Source: [Fontoura et al, 2011]
  • 47. Bibliography 47 [Heaps, 1978] Information Retrieval: Computational and Theoretical Aspects. Academic Press, USA, 1978. [Buckley and Lewit, 1985] Optimization of inverted vector searches. In Proc. SIGIR, pages 97–110, 1985. ACM. [Turtle and Flood, 1995] Query evaluation: strategies and optimizations. IPM, 31(6):831–850, 1995. [Moffat and Zobel, 1996] Self-indexing inverted files for fast text retrieval. ACM TOIS, 14(4):349–379, 1996. [Kaszkiel and Zobel, 1998] Term-ordered query evaluation versus document-ordered query evaluation for large document databases. In Proc. SIGIR,, pages 343– 344, 1998. ACM. [Kaszkiel et al., 1999] Efficient passage ranking for document databases. ACM TOIS, 17(4):406–439, 1999. [Broder et al., 2003] Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426–434, 2003. ACM. [Zobel and Moffat, 2006] Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006. [Culpepper and Moffat, 2010] Efficient set intersection for inverted indexing. ACM TOIS, 29(1):1:1– 1:25, 2010. [Fontoura et al., 2011] Evaluation strategies for top-k queries over memory-resident inverted indexes. Proc. of VLDB Endowment, 4(12): 1213–1224, 2011. SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 48. FASTER QUERY EVALUATION Dynamic Pruning 48SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 49. Query Processing • Breakdown • Pre-process the query (e.g., tokenisation, stemming) • Lookup the statistics for each term in the lexicon • Process the postings for each query term, computing scores for documents to identify the final retrieved set • Output the retrieved set with metadata (e.g., URLs) • What takes time? • Number of query terms – Longer queries have more terms with posting lists to process • Length of posting lists – More postings takes longer times • Goal: avoid (unnecessary) scoring of posting 49SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Term Pipeline Document Retrieval Model Query Tokenizer Re-Ranking Index Top Results
  • 50. Dynamic Pruning during Query Evaluation • Dynamic pruning strategies aim to make scoring faster by only scoring a subset of the documents – The core assumption of these approaches is that the user is only interested in the top K results, say K=20 – During query scoring, it is possible to determine if a document cannot make the top K ranked results – Hence, the scoring of such documents can be terminated early, or skipped entirely, without damaging retrieval effectiveness to rank K • We call this “safe-to-rank K” • Dynamic pruning is based upon – Early termination – Comparing upper bounds on retrieval scores with thresholds 50
  • 51. Dynamic Pruning Techniques • MaxScore [1] – Early termination: does not compute scores for documents that won’t be retrieved by comparing upper bounds with a score threshold • WAND [2] – Approximate evaluation: does not consider documents with approximate scores (sum of upper bounds) lower than threshold – Therefore, it focuses on the combinations of terms needed (wAND) • BlockMaxWand – SOTA variant of WAND that uses benefits from the block- layout of posting lists 51 [1] H Turtle & J Flood. Query Evaluation : Strategies and Optimisations. IPM: 31(6). 1995. [2] A Broder et al. Efficient Query Evaluation using a Two-Level Retrieval Process. CIKM 2003.
  • 52. Early Termination A document evaluation is early terminated if all or some of its postings, relative to the terms of the query, are not fetched from the inverted index or not scored by the ranking function. 52SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 53. Term and document upper bounds • Similarity function between queries and documents: • For each term t in the vocabulary, we can compute a term upper bound (also known as max score) σt(q) such that, for all documents d in the posting list of term t: • For a query q and a document d, we can compute a document upper bound σd(q) by summing up the query term upper bounds and/or actual scores: 53SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 54. Heap and Threshold • During query processing, the top k full or partial scores computed so far, together with the corresponding docids, are organized in a priority queue, or min- heap, • The smallest value of these (partial) scores is called threshold θ. • If there are not at least k scores, the threshold value is assumed to be 0. 54SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA d1 dK d2 dK-1 dK-2
  • 55. Dynamic Pruning Condition • Property I: The threshold value is not decreasing • Property II: A document with a score smaller than the threshold will never be in final top K documents • Pruning Condition: for a query q and a document d, if the document upper bound σd(q), computed by using partial scores, if any, and term upper bounds, is lesser then or equal to the current threshold θ, the document processing can be early terminated. 55SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 56. Example of early termination 56SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA = 11.1 – 2.1 + 0.5
  • 57. Effectiveness Guarantees • safe/unoptimized: all documents, not just the top k, appear in the same order and with the same score as they would appear in the ranking produced by a unoptimized strategy. • safe up to k/rank safe: the top k documents produced are ranked correctly, but the document scores are not guaranteed to coincide with the scores produced by a unoptimized strategy. • unordered safe up to k/set safe: the documents coincides with the top k documents computed by a full strategy, but their ranking can be different. • approximate/unsafe: no provable guarantees on the correctness of any portion of the ranking produced by these optimizations can be given 57SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA [Turtle and Flood, 1995] [Strohman, 2007] LessEffectiveMoreEffective
  • 58. TAAT Optimizations • All TAAT dynamic pruning optimizations split the query processing into two distinct phases. • OR mode – the normal TAAT algorithm is executed – New accumulators are created and updated, until a certain pruning condition is met • AND mode – no new accumulators are created, – a different algorithm is executed, on the remaining terms and/or on the accumulators created during the first phase. 58SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 59. AND Mode • Quit. – The processing of postings completely stops at the end of the first phase. – No new accumulators are created and no postings from the remaining term’s posting lists are processed. • Continue. – The creation of new accumulators stops at the end of the first. – The remaining term’s posting lists will be processed, but just to update the score of the already existing postings. • Decrease. – The processing of postings in the second phase proceeds as in Continue. – The number of accumulators is decreased as the remaining terms are processed. – Those terms will have small score contributions, with few chances to alter the current top k document ranking. – More importantly, the memory occupancy could be reduced as soon as we realize that an existing accumulator can be dropped since it will never enter in the final top k documents, with a corresponding benefit on response times. 59SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA [Moffat and Zobel, 1996] [Anh and Moffat, 1998]
  • 60. Summary 60SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 61. DAAT Optimizations • We will cover three families of DAAT Optimizations 1. Top Docs: – Adds record of "top" documents to the start of each posting list 2. Max Scores: – Utilises thresholds and upper bounds – Can skip over unscored postings – E.g. MaxScore, WAND 3. Block Max: – Leverages the block layout of the posting lists – Can skip entire blocks of postings – E.g. BlockMaxWand 61SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 62. Top Docs Optimizations • [Brown, 1995] proposed to focus processing on documents with high score contributions. • For any given term t in the vocabulary, a top candidates list (also known as a champions list) is stored separately as a new posting list. • Then, the normal DAAT processing for any query q is applied to the top candidate posting lists only, reducing sensibly the number of postings scored, but with potential negative impacts on the effectiveness, since this optimization is unsafe. 62SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 63. Max Scores Optimizations • MaxScore strategy [Turtle and Flood, 1995] – Early termination: does not compute scores for documents that won’t be returned – By comparing upper bounds with threshold – Based on essential and non-essential posting lists – Suitable for TAAT as well • WAND strategy [Broder et al., 2003] – Approximate evaluation: does not consider documents with approximate scores (sum of upper bounds) lower than threshold • Both use docids sorted posting lists • Both exploit skipping 63SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 64. MaxScore Example (I) 64SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA no document can be returned as a top k results if it appears in the non-essential lists only, i.e., 1.3 + 3.4 < 6.0 Sorted in increasing order of term upper bound
  • 65. MaxScore Example (II) 65SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Essential lists are processed in DAAT, and their score contributions computed
  • 66. MaxScore Example (III) 66SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA process the non essential lists by skipping to the candidate docid. Partial score is updated. As soon as a partial score implies early termination, Move to next document.
  • 67. MaxScore Example (IV) 67SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 68. WAND WAND uses approximate evaluation: does not consider documents with approximate scores (sum of upper bounds) lower than threshold Weak AND or Weighted AND operator: is true if and only if where • Xi is a boolean variable, • wi is a weight, • xi is 0 or 1 depending on Xi • θ is a threshold 69SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 69. WAND Details • Given a query q = {t1,...,tn} and a document d, • Xi is true if and only if the term ti appears in document d • We take the term upper bound σi(d) as weight wi • The threshold θ is the smallest score of the top k documents scored so far during query processing 70SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA is true iff
  • 70. WAND Example (I) 71SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Posting lists ALWAYS sorted by current docid Pivot docid is the first docid with a non-zero chance to end up in the top k results
  • 71. WAND Example (II) 72SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Move previous posting lists up to the pivot (or right after) using p.next(d) operator
  • 72. WAND Example (III) 73SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Keep the posting lists sorted by increasing docid! 2.1 + 1.3 + 4.0 > 6.0 then full evaluation
  • 73. WAND Example (IV) 74SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Advance the iterators by one and keep lists sorted
  • 74. Performance 76SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Latency results (in ms) for DAAT, MaxScore and WAND (with speedups), for K = 30. Adapted from [Fontoura et al., 2011]. Latency results (in ms) for DAAT, MaxScore and WAND (with speedups) for different query lengths, average query times on ClueWeb09, for K = 10. Adapted from [Mallia et al., 2017].
  • 75. Unsafe WAND 77SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA • Reducing K makes (BlockMax)WAND and MaxScore faster – But you get a shorter, but safe, ranking • The dynamic pruning techniques can be also configured to be unsafe – This is achieved by multiplying the threshold Θ by a constant factor F – Increased speed, but some documents will not be retrieved
  • 76. Global vs Local Term Upper Bounds • A (global) term upper bound is computed over the scores of all documents in a posting list • Each posting list is sequentially divided in a block list, where each block containts a given number of consecutive postings, e.g., 128 postings per block. • For each block, a block (local) term upper bound is computed, storing the maximum contribution of the postings in the block. 78SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 77. Block Max Indexes 79SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA BlockMaxWAND requires further information in the skiplists – i.e. Block Max Indices lexicon
  • 78. Simplified Block Max WAND pseudocode 80SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA while true pivot ← find pivot as in WAND (all lists) or break perform a shallow move // advance block list iterators to pivot compute block upper bound if pivot would enter heap compute score and update heap perform a deep move // advance posting list iterators to next docid reorder by docid else perform a shallow move // advance block list iterators to next block(s) perform a deep move // advance posting list iterators to next block(s) reorder by docid return top K results in heap
  • 79. Performance 81SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Latency results (in ms) for DAAT, WAND and BMW (64 postings blocks) (with speedups) for different query lengths, average query times (Avg, in ms) on Gov2, for K = 10.
  • 80. Variable Block Max WAND SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA Blocks size matters: • too large → inaccurate estimation • too small → average skip is small Is fixed-sized blocks the right choice?
  • 81. ALTERNATIVE INDEX LAYOUTS 89SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 82. Frequency-sorted Indexes • An alternative way to arrange the posting lists is to sort them such that the highest scoring documents appear early. • If an algorithm could find them at the beginning of posting lists, dynamic pruning conditions can be enforced very early. • [Wong and Lee, 1993] proposed to sort posting list in decreasing order of term-document frequency value. 90SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 83. Impact-sorted Indexes • [Anh et al., 2001] introduced the definition of impact of term t in document d for the quantity wt(d)/Wd (or document impact) and impact of term t in query q for wt(q) (or query impact). • They proposed to facilitate effective query pruning by sorting the posting lists in decreasing order of (document) impact, i.e., to process queries on a impact- sorted index 91SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 84. Quantized Impacts • Transform impact values from real numbers in [L,U] to b- bit integers, i.e. 2b buckets. • LeftGeom Quantization: • Uniform Quantization • Reverse Mapping 92SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 85. Quantization Bits • [Anh et al., 2001] show that 5 bits are reasonable to encode the impacts with minimal effectiveness losses for both quantization schemes – with 10 bits, no effectiveness losses are reported – This value should be tuned when using different collections and/or different similarity functions. • Crane et al. [2013] confirmed that 5 to 8 bits are enough for small-medium document collections – for larger collections, from 8 up to 25 bits are necessary, depending on the effectiveness measure. 93SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 86. Compressing impacts • [Trotman, 2014b] investigated the performance of integer compression algorithms for frequency-sorted and impact-sorted indexes. • The new SIMD compressor (QMX) is more time and space efficient than the existing SIMD codecs, as furtherly examined by [Trotman and Lin, 2016]. • [Lin and Trotman, 2017] found that the best performance in query processing with SAAT is indeed obtained when no compression is used, even if the advantage w.r.t. QMX is small (∼5%). • Uncompressed impact-sorted indexes can be up to two times larger than their QMX- compressed versions. 94SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 87. Impact-based Query Processing 95SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 88. Score-at-a-time 96SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 89. Dynamic Pruning for SAAT • The impact blocks are initially processed in OR-mode, then AND- mode, and finally in REFINE-mode. • After the OR-mode, a fidelity control knob controls the percentage of remaining postings to process thereafter (AND & REFINE). – 30% of the remaining postings results in good retrieval performance. • Experiments by [Ahn & Moffat, 2006] showed that this optimization is able to reduce the memory footprint by 98% w.r.t. SAAT, with a speedup of 1.75×. 97SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 90. Anytime Ranking • [Lin and Trotman, 2015] proposed a linear regression model to translate a deadline time on the query processing time into the number of posting to process. • [Mackenzie et al., 2017] proposed to stop after processing a given percentage of the total postings in the query terms' posting lists, on a per-query basis. • According to their experiments, the fixed threshold may result in reduced effectiveness, as the number of query terms increases, but conversely it gives a very strict control over the tail latency of the queries. 98SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 91. Essential Bibliography 99 [Wong and Lee, 1993]. Implementations of partial document ranking using inverted files. IPM, 29 (5):647–669, 1993. [Persin, 1994]. Document filtering for fast ranking. In Proc. SIGIR, 1994. ACM. [Turtle and Flood, 1995] Query evaluation: strategies and optimizations. IPM, 31(6):831–850, 1995. [Anh et al., 2001] Vector-space ranking with effective early termination. In Proc. SIGIR, pages 35–42, 2001. ACM. [Anh and Moffat, 2002]. Impact transformation: effective and efficient web retrieval. In Proc. SIGIR, pages 3–10, 2002. ACM. [Broder et al., 2003] Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426–434, 2003. ACM. [Anh and Moffat, 2006] Pruned query evaluation using pre-computed impacts. In Proc. SIGIR, pages 372–379, 2006. ACM. [Zobel and Moffat, 2006] Inverted files for text search engines. ACM Computing Surveys, 38(2), 2006. [Strohman, 2007] Efficient Processing of Complex Features for Information Retrieval. PhD Thesis, 2007. [Fontoura et al., 2011] Evaluation strategies for top-k queries over memory-resident inverted indexes. Proc. of VLDB Endowment, 4(12): 1213– 1224, 2011. [Crane et al., 2013] Maintaining discriminatory power in quantized indexes. In Proc. CIKM, pages 1221–1224,2013. ACM. [Lin and Trotman, 2015] Anytime ranking for impact-ordered indexes. In Proc. ICTIR, pages 301–304, 2015. ACM. [Trotman and Lin, 2016] In vacuo and in situ evaluation of simd codecs. In Proc. ADCS, pages 1–8, 2016. ACM. [Mackenzie et al., 2017] Early termination heuristics for score-at-a-time index traversal. In Proc, ADCS, pages 8:1– 8:8, 2017. ACM. [Lin and Trotman, 2017] The role of index compression in score-at-a-time query evaluation. Information Retrieval Journal, pages 1–22, 2017. [Mallia et al., 2017] Faster blockmax wand with variable-sized blocks. In Proc. SIGIR, pages 625–634, 2017. ACM. SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 92. PART 3: LEARNING TO RANK 100SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA
  • 93. 1,000,000,000s of documents 1000s of documents Roadmap 101 20 docs LEARNING TO RANK e.g. BM25 [1] J Pederson. Query understanding at Bing. SIGIR 2010 Industry Day. e.g. AND/ORPart 1: Index layouts Part 2: Query evaluation & dynamic pruning Part 3: Learning-to-rank Part 4: Query Efficiency Prediction & Applications
  • 94. Motivations for Learning • How to choose term weighting models? • Different term weighting models have different assumptions about how relevant documents should be retrieved • Also: • Field-based models: term occurrences in different fields matter differently • Proximity-models: close co-occurrences matter more • (Deep learned matching models) • Priors: documents with particular lengths or URL/inlink values matter more • Query Features: Long queries, difficult queries, query type How to combine all these easily and appropriately?
  • 95. Learning to Rank • Application of tailored machine learning techniques to automatically (select and) weight retrieval features • Based on training data with relevance assessments • Learning to rank has been popularised by commercial search engines (e.g. Bing, Baidu, Yandex) • They require large training datasets, possibly instantiated from click-through data • Click-through data has facilitated the deployment of learning approaches T.-Y. Liu. (2009). Learning to rank for information retrieval. Foundation and Trends in Information Retrieval, 3(3), 225–331.
  • 96. Types of Features • Typically, commercial search engines use hundreds of features for ranking documents, usually categorised as follows: Name Varies depending on… Examples Query Document Query Dependent Features ✔ ✔ Weighting models, e.g. BM25, PL2 Proximity models, e.g. Markov Random Fields Field-based weighting models, e.g. PL2F Deep learned matched representations Query Independent Features ✗ ✔ PageRank, number of inlinks Spamminess Query Features ✔ ✗ Query length Presence of entities
  • 97. Learning to Rank 1. Sample Identification • Apply BM25 or similar (e.g. DFR PL2) to rank documents with respect to the query • Hope that the sample contains enough relevant documents 2. Compute more features • Query Dependent • Query Independent • Query Features 3A. Learn ranking model • Based on training data 3B. Apply learned model − Re-rank sample documents
  • 98. Schematically… Generated using same previous two steps, + relevance assessments N. Tonellotto, C. Macdonald, I. Ounis. (2013). Efficient and effective retrieval using selective pruning. WSDM'13. Sample Ranking Learning to Rank technique
  • 99. Query-Dependent Feature Extraction • We might typically deploy a number of query dependent features – Such as additional weighting models, e.g. fields/proximity, that are calculated based on information in the inverted index • We want the result set passed to the final cascade to have all query dependent features computed – Once first cascade retrieval has ended, it’s too late to compute features without re-traversing the inverted index postings lists – It would take too long to compute all features for all scored documents • Solutions: – 1. cache the "fat" postings for documents that might make the sample (Postings contain frequencies, positions, fields, etc.) – 2. access the document vectors after retrieval 107
  • 100. d30 4.3 <p1>,<p2> d19 3.7 <p1> d42 3.3 <p1,p2> Caching the Fat • Solution 1: cache the "fat" postings for documents that might make the sample Q Initial Sample Retrieval d30 4.3 d19 3.7 d42 3.3 Calculate Features; Apply LTR d19 1.0 d42 0.9 d30 0.5 Sample ResultSet “Fat” Final ResultSet Inv C Macdonald et al. (2012). About Learning Models with Multiple Query Dependent Features. TOIS 31(3). 108 (With positions)  Lookup cost is almost free – we had to access the inverted index anyway ✗ We only have access to postings for the original query terms
  • 101. Accessing Document Vectors • Solution 2: access direct/forward index document vectors Q Initial Sample Retrieval d30 4.3 d19 3.7 d42 3.3 Calculate Features; Apply LTR d19 1.0 d42 0.9 d30 0.5 Sample ResultSet Final ResultSet Inv Asadi & Lin. Document vector representations for feature extraction in multi-stage document ranking. Inf. Retr. 16(6): 747-768 (2013) 109 (Need not have positions)  We can calculate features for query terms not in the original query ✗ Additional lookup for K documents ✗ Direct Index postings contain information for all terms in a document d30 4.3 <p1>,<p2> d19 3.7 <p1> d42 3.3 <p1,p2> Direct
  • 102. Learning to Rank Process document sample ✓ ✗ ✓ ✓ ✗ ✓ ✓ ✗ ✓ ✓ ✗ ✓ Learner Learned Model Re-rank using learned model unseen query Training examples f1 f2 f3 f4 d1 0.2 0.1 0.5 0.8 d2 0.5 0.7 0.0 0.2 d3 0.3 0.2 0.5 0.0
  • 103. The Importance of Sample Size • What sample size is necessary for effective learning? • Small samples: faster learning, faster retrieval • Large samples: higher recall Small sample size, degraded performance No benefit for sample size larger than 1000 Pairwise techniques struggle with large samples NDCG@20 for different sample sizes Increased performance as sample size rises C. Macdonald, R. Santos and I. Ounis. (2012). The Whens and Hows of Learning to Rank. IR Journal. DOI: 1007/s10791-012-9209-9
  • 104. TYPES OF MODELS FOR LTR
  • 105. Types of Learned Models (1) Linear Model • Linear Models (the most intuitive to comprehend) – Many learning to rank techniques generate a linear combination of feature values: – Linear Models make some assumptions: • Feature Usage: They assume that the same features are needed by all queries • Model Form: The model is only a linear combination of feature values. - Contrast this with genetic algorithms, which can learn functional combinations of features, by randomly introducing operators (e.g. try divide feature a by feature b), but are unpractical to learn • It is difficult to find wf values that maximise the performance of an IR evaluation metric, as they are none- smooth and none-differentiable – Typically, techniques such as simulated annealing or stochastic gradient descent are used to empirically obtain wf values score(d,Q) = wf f å ×valuef (d)
  • 106. Type of Learned Models (2) Regression Trees – A regression tree is series of decisions, leading to a partial score output – The outcome of the learner is a “forest” of many such trees, used to calculate the final score of a document for a query – Their ability to customise branches makes them more effective than linear models – Regression trees are pointwise in nature, but several major search engines have created adapted regression tree techniques that are listwise – E.g. Microsoft’s LambdaMART is at the heart of the Bing search engine (hundreds/thousands of them!)Σ For each document in the sample: S. Tyree, K. Weinberger, K. Agrawal, J. Paykin. (2011). Parallel Boosted Regression Trees for Web Search Ranking. WWW’11.
  • 107. Learning to Rank Best Practices (Query Features) • Tree-based learning to rank techniques have an inherent advantage • They can activate different parts of their learned model depending on feature values • We can add “query features”, which allow different sub-trees for queries with different characteristics – E.g. long query, apply proximity Class of Query Features NDCG@20 Baseline (47 document features) LambdaMART 0.2832 Query Performance Predictors 0.3033* Query Topic Classification (entities) 0.3109* Query Concept Identification (concepts) 0.3049* Query Log Mining (query frequency) 0.3085* C. Macdonald, R. Santos and I. Ounis. (2012). On the Usefulness of Query Features for Learning to Rank. CIKM’12.
  • 108. Learning to Rank Best Practices • About the sample – it must be: – Small enough that calculating features doesn’t take too long – Small enough that BMW/WAND/MaxScore are fast! – Large enough to have enough sufficient recall of relevant douments • Previous slide says 1000 documents needed for TREC Web corpora – Simple early cascade: Only time for a single weighting model to create the sample • E.g. PL2 < BM25 with no features; but when features are added, no real difference • Multiple weighting models as features: • Using more weighting models do improve the learned model • Field Models as Features • Adding field-based models as features always improves effectiveness • In general, Query Independent features (e.g. PageRank, Inlinks, URL length) improve effectiveness C. Macdonald, R. Santos, I. Ounis, B. He. About Learning Models with Multiple Query Dependent Features. TOIS 31(3), 2013. C Macdonald & I Ounis (2012). The Whens & Hows of Learning to Rank. INRT. 16(5), 2012.
  • 110. Additive Ensmbles of Regression Trees • Tree forests: Gradient-Boosted Regression Tress, Lambda MART, Random Forest, etc. – ensemble of weak learners, each contributing a partial score – at scoring time, all trees can be processed independently 126SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA T1 (q, d) s1 T2 (q, d) s2 Tn (q, d) sn Assuming: • 3,000 trees • an average tree depth of 10 nodes • 1,000 documents scored per query • a cluster with 1,000 search shards Costs • 3,000 x 10 x 1,000 = 30 M tests per query and per shard • Approx. 30 G tests for the entire search cluster
  • 111. 127 13.3 0.12 -1.2 43.9 11 -0.4 7.98 2.55 Query-Document pair features F1 F2 F3 F4 F5 F6 F7 F8 0.4 -1.4 1.5 3.2 2.0 0.5 -3.1 7.1 50.1:F4 10.1:F1 -3.0:F3 -1.0:F3 3.0:F8 0.1:F6 0.2:F2 2.0 s(d) += 2.0 exit leaf true false - number of trees = 1,000 – 20,000 - number of leaves = 4 – 64 - number of docs = 3,000 – 10,000 - number of features = 100 –1,000 Processing a query-document pair d[F4] ≤ 50.1 SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 127
  • 112. 128 Struct+ • Each tree node is represented by a C object containing: • the feature id, • the associated threshold • the left and right pointers • Memory is allocated for all the nodes at once • The nodes are laid out using a breadth-first traversal of the tree Node* getLeaf(Node* np, float* fv) { if (!np->left && !np->right) { return np; } if (fv[np->fid] <= np->threshold) { return getLeaf(np->left, fv); } else { return getLeaf(np->right, fv); } } Tree traversal: - No guarantees about references locality - Low cache hit ratio - High branch misprediction rate - Processor pipeline flushing SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 128
  • 113. 129 CodeGen • Statically-generated if-then-else blocks if (x[4] <= 50.1) { // recurses on the left subtree ... } else { // recurses on the right subtree if(x[3] <= -3.0) { result = 0.4; } else { result = -1.4; } } Tree traversal: - Static generation - Recompilation if model changes - Compiler "black magic" optimizations - Expected to be fast - Compact code, expected to fit in I-cache - High cache hit ratio - No data dependencies, a lot of control dependencies - High branch misprediction rate SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 129
  • 114. 130 VPred • Array representation of nodes, similar to the Struct+ baseline • All trees in a single array, need additional data structures to locate a tree in the array, and to know its height Tree traversal: Theory Practice double depth4(float* x, Node* nodes) { int nodeId = 0; nodeId = nodes->children[ x[nodes[nodeId].fid] > nodes[nodeId].theta]; nodeId = nodes->children[ x[nodes[nodeId].fid] > nodes[nodeId].theta]; nodeId = nodes->children[ x[nodes[nodeId].fid] > nodes[nodeId].theta]; nodeId = nodes->children[ x[nodes[nodeId].fid] > nodes[nodeId].theta]; return scores[nodeId]; } - Static unrolling of traversal functions - High compilation times - Compiler "black magic" optimizations - Expected to be fast - In-code vectorization - Processing 16 documents at a time - Requires tuning on dataset - No control dependencies, a lot of data dependencies - Low branch misprediction rate SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 130
  • 115. 131 Vectorization of Trees offsets |F| T1 (q d) s1 T2 (q, d) s2 Tn (q, d) sn Thresholds of tests nodes involving feature f0 Thresholds of tests nodes involving feature f1 thresholds SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 131
  • 116. 132 while !(x[f0] <= thresholds[i]) do // bitmask operations offsets thresholds • Linear access to data • Highly predictable conditional branches • True nodes are not inspected at all SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 132 Processing Tress in Vectors
  • 117. 133 Quickscorer algorithm SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 133 Tree traversal (BitMask Operations) Score computation
  • 118. 134 Table 2: Per -docum ent scor ing t im e in µs of QS, VPr ed, If -T hen-El se and St r uct + on MSN-1 and Y!S1 dat aset s. G ain fact or s ar e r epor t ed in par ent heses. M et hod ⇤ Number of t rees/ dat aset 1, 000 5, 000 10, 000 20, 000 MSN-1 Y!S1 MSN-1 Y!S1 MSN-1 Y!S1 MSN-1 Y!S1 QS 8 2.2 (–) 4.3 (–) 10.5 (–) 14.3 (–) 20.0 (–) 25.4 (–) 40.5 (–) 48.1 (–) VPr ed 7.9 (3.6x) 8.5 (2.0x) 40.2 (3.8x) 41.6 (2.9x) 80.5 (4.0x) 82.7 (3.3) 161.4 (4.0x) 164.8 (3.4x) If -T hen-El se 8.2 (3.7x) 10.3 (2.4x) 81.0 (7.7x) 85.8 (6.0x) 185.1 (9.3x) 185.8 (7.3x) 709.0 (17.5x) 772.2 (16.0x) St r uct + 21.2 (9.6x) 23.1 (5.4x) 107.7 (10.3x) 112.6 (7.9x) 373.7 (18.7x) 390.8 (15.4x) 1150.4 (28.4x) 1141.6 (23.7x) QS 16 2.9 (–) 6.1 (–) 16.2 (–) 22.2 (–) 32.4 (–) 41.2 (–) 67.8 (–) 81.0 (–) VPr ed 16.0 (5.5x) 16.5 (2.7x) 82.4 (5.0x) 82.8 (3.7x) 165.5 (5.1x) 165.2 (4.0x) 336.4 (4.9x) 336.1 (4.1x) If -T hen-El se 18.0 (6.2x) 21.8 (3.6x) 126.9 (7.8x) 130.0 (5.8x) 617.8 (19.0x) 406.6 (9.9x) 1767.3 (26.0x) 1711.4 (21.1x) St r uct + 42.6 (14.7x) 41.0 (6.7x) 424.3 (26.2x) 403.9 (18.2x) 1218.6 (37.6x) 1191.3 (28.9x) 2590.8 (38.2x) 2621.2 (32.4x) QS 32 5.2 (–) 9.7 (–) 27.1 (–) 34.3 (–) 59.6 (–) 70.3 (–) 155.8 (–) 160.1 (–) VPr ed 31.9 (6.1x) 31.6 (3.2x) 165.2 (6.0x) 162.2 (4.7x) 343.4 (5.7x) 336.6 (4.8x) 711.9 (4.5x) 694.8 (4.3x) If -T hen-El se 34.5 (6.6x) 36.2 (3.7x) 300.9 (11.1x) 277.7 (8.0x) 1396.8 (23.4x) 1389.8 (19.8x) 3179.4 (20.4x) 3105.2 (19.4x) St r uct + 69.1 (13.3x) 67.4 (6.9x) 928.6 (34.2x) 834.6 (24.3x) 1806.7 (30.3x) 1774.3 (25.2x) 4610.8 (29.6x) 4332.3 (27.0x) QS 64 9.5 (–) 15.1 (–) 56.3 (–) 66.9 (–) 157.5 (–) 159.4 (–) 425.1 (–) 343.7 (–) VPr ed 62.2 (6.5x) 57.6 (3.8x) 355.2 (6.3x) 334.9 (5.0x) 734.4 (4.7x) 706.8 (4.4x) 1309.7 (3.0x) 1420.7 (4.1x) If -T hen-El se 55.9 (5.9x) 55.1 (3.6x) 933.1 (16.6x) 935.3 (14.0x) 2496.5 (15.9x) 2428.6 (15.2x) 4662.0 (11.0x) 4809.6 (14.0x) St r uct + 109.8 (11.6x) 116.8 (7.7x) 1661.7 (29.5x) 1554.6 (23.2x) 3040.7 (19.3x) 2937.3 (18.4x) 5437.0 (12.8x) 5456.4 (15.9x) same trivially holds for St r uct + . This means that the in- terleaved traversal strategy of QS needs to processlessnodes than in a traditional root-to-leaf visit. This mostly explains the results achieved by QS. As far as number of branches is concerned, we note that, not surprisingly, QS and VPr ed are much more efficient than If -T hen-El se and St r uct + with this respect. QS number of L3 cache misses starts increasing when dealing with 9, 000 trees on MSN and 8, 000 trees on Y!S1. BWQS: a block-wise variant of QS The previous experiments suggest that improving the cache efficiency of QS may result in significant benefits. As in Performance Per-document scoring time in microsecs and speedups SIGIR 2018 — 8 July 2018 — Ann Arbor (MI), USA 134
• 119. Essential Bibliography
  • Craig Macdonald, Rodrygo Santos and Iadh Ounis. The Whens and Hows of Learning to Rank. Information Retrieval 16(5):584–628.
  • Craig Macdonald, Rodrygo Santos and Iadh Ounis. On the Usefulness of Query Features for Learning to Rank. In Proceedings of CIKM 2012.
  • C. Macdonald et al. About Learning Models with Multiple Query Dependent Features. ACM Transactions on Information Systems 31(3), 2012.
  • N. Fuhr. Optimal Polynomial Retrieval Functions Based on the Probability Ranking Principle. ACM Transactions on Information Systems 7(3), 1989.
  • S. Tyree, K.Q. Weinberger, K. Agrawal and J. Paykin. Parallel Boosted Regression Trees for Web Search Ranking. In Proceedings of WWW 2011.
  • N. Asadi, J. Lin and A. P. de Vries. Runtime Optimizations for Tree-based Machine Learning Models. IEEE Transactions on Knowledge and Data Engineering 26(9):2281–2292, 2014.
  • C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto and R. Venturini. QuickScorer: A Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees. In Proceedings of SIGIR 2015, pages 73–82.
  • D. Dato, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto and R. Venturini. Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees. ACM Transactions on Information Systems, 2016.
  • C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto and R. Venturini. Exploiting CPU SIMD Extensions to Speed-up Document Scoring with Tree Ensembles. In Proceedings of SIGIR 2016, pages 833–836.
  • F. Lettich, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto and R. Venturini. GPU-based Parallelization of QuickScorer to Speed-up Document Ranking with Tree Ensembles. In Proceedings of the 7th IIR Workshop, 2016.
• 121. Roadmap
  [Figure: the ranking cascade — 1,000,000,000s of documents are filtered by Boolean matching (e.g. AND/OR) and ranked by a simple model (e.g. BM25) down to 1000s of documents, which LEARNING TO RANK reduces to the final 20 docs [1].]
  • Part 1: Index layouts
  • Part 2: Query evaluation & dynamic pruning
  • Part 3: Learning-to-rank
  • Part 4: Query Efficiency Prediction & Applications
  [1] J. Pedersen. Query Understanding at Bing. SIGIR 2010 Industry Day.
• 122. If we know how long a query will take, can we reconfigure the search engine's cascades?
  [Figure: the cascade again — Boolean: do query terms occur? (1,000,000,000s of documents) → Simple Ranking: identify a set most likely to contain relevant documents (1000s of documents) → Re-Ranking: try really hard to get the top of the ranking correct, using many signals (features) (10 docs) [1].]
  [1] J. Pedersen. Query Understanding at Bing. SIGIR 2010 Industry Day.
• 123. Main Idea: Query Efficiency Prediction
  • Predict how long an unseen query will take to execute, before it has executed
  • This facilitates at least three ways to make a search engine more efficient:
    1. Reconfigure the cascades of the search engine, trading off a little effectiveness for efficiency
    2. Apply more CPU cores to long-running queries
    3. Decide how to plan the rewrites of a query, to reduce long-running queries
  • In each case, increasing efficiency means increased server capacity and energy savings
• 124. What makes a query fast or slow?
  [Figure: response time distributions for 2-term and 4-term queries.]
  • Length of the posting lists?
  • Length of the query?
  • Retrieval strategy: Full / MaxScore / WAND?
  • Co-occurrence of the query terms?
• 125. Query Efficiency Prediction
  • Can we predict the response time for a query (query efficiency prediction)?
  • Properties of the predictors:
    – Can be calculated online, before retrieval commences
    – Easy (quick) to calculate
    – Based on single-term statistics, previously calculated offline
    – Include statistics on each posting list's score distribution
  • Each predictor feature is obtained by applying an aggregation function to a term-level statistic over the query terms
• 126. Query Efficiency Prediction Framework
  • We combine statistics of the query terms in a regression model to accurately predict the execution time of a query
  • Baseline predictor: the total number of postings; our framework uses 42 inexpensive features
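As an illustration only (the statistics, the aggregations and the linear weights below are assumptions, not the actual 42-feature model), a query efficiency predictor boils down to aggregating precomputed term-level statistics over the query and feeding them to a learned regression:

    /* Hypothetical per-term statistics, assumed to be precomputed offline. */
    typedef struct {
      double postings;     /* posting list length             */
      double max_score;    /* upper bound on the term's score */
      double mean_score;   /* mean score in the posting list  */
    } TermStats;

    /* Aggregate each statistic over the query terms (sum and max here), then
       apply a learned linear model; a real deployment would use many more
       aggregations and typically a non-linear learned regressor. */
    double predict_response_time(const TermStats *terms, int nterms,
                                 const double *w /* bias + 6 weights */)
    {
      double feat[6] = {0, 0, 0, 0, 0, 0};   /* sums in 0..2, maxima in 3..5 */
      for (int i = 0; i < nterms; i++) {
        feat[0] += terms[i].postings;
        feat[1] += terms[i].max_score;
        feat[2] += terms[i].mean_score;
        if (terms[i].postings   > feat[3]) feat[3] = terms[i].postings;
        if (terms[i].max_score  > feat[4]) feat[4] = terms[i].max_score;
        if (terms[i].mean_score > feat[5]) feat[5] = terms[i].mean_score;
      }
      double predicted = w[0];               /* bias term */
      for (int j = 0; j < 6; j++)
        predicted += w[j + 1] * feat[j];
      return predicted;                      /* e.g. predicted time in ms */
    }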
• 127. Accuracy of Query Efficiency Predictors
  Findings: our learned predictor, with 42 features, can accurately predict the response time of the search engine for an unseen query
• 128. Three Applications of Query Efficiency Prediction
  1. Selective retrieval – can we reduce the time taken in the earlier retrieval cascades, e.g. by retrieving fewer documents?
  2. Selective parallelisation – apply more CPU cores to long-running queries
  3. Selective query rewriting – can we decide how to rewrite a query on a per-query basis?
  • Recall that, for each application, increasing efficiency means increased server capacity and energy savings
• 130. QEP Application (1): Selective Retrieval
  • If we only need to return 10 documents to the user, do we need to re-score 1000s?
  • i.e. can we configure the earlier cascades to be faster for longer queries, but potentially miss relevant documents?
  [Figure: the standard cascade (Boolean: do query terms occur? → Simple Ranking: identify a set most likely to contain relevant documents → Re-Ranking: try really hard to get the top of the ranking correct, using many signals/features), shrinking 1,000,000,000s of documents to 1000s and then to the final 10–20 docs, compared with a selective variant in which the intermediate sets are reduced, e.g. to 500,000,000 and then 500s of documents.]
  Proposed method [Tonellotto, Macdonald, Ounis, WSDM 2013]:
  • Predict the efficiency (response time) of a query
  • Adjust <K, F> of WAND to be faster for queries predicted to be long-running
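A minimal sketch of such a selective configuration follows; the bucketing and the concrete <K, F> values are invented for illustration, whereas the WSDM 2013 approach learns the mapping from predicted response time to configuration.

    /* First-cascade configuration: K = number of documents passed on to
       re-ranking, F = aggressiveness factor of the WAND pruning threshold. */
    typedef struct {
      int    K;
      double F;
    } WandConfig;

    /* Queries predicted to be slow get a smaller K and more aggressive
       pruning, so that their first-cascade cost stays within budget. */
    WandConfig select_wand_config(double predicted_ms)
    {
      WandConfig c;
      if (predicted_ms < 50.0)       { c.K = 1000; c.F = 1.0; }  /* safe       */
      else if (predicted_ms < 150.0) { c.K = 500;  c.F = 1.2; }  /* moderate   */
      else                           { c.K = 100;  c.F = 1.5; }  /* aggressive */
      return c;
    }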
• 131. Selective Pruning Results
  Findings: by using our 42-feature query efficiency predictor to selectively adjust the retrieval cascades, we could reduce response time by 31%, while maintaining high effectiveness
  The increased capacity resulting from the better efficiency is equivalent to 44% more servers
• 132. QEP APPLICATION (2): SELECTIVE PARALLELISATION
• 133. QEP Application (2): Selective Parallelisation
  • Industrial web search deployments have to meet service level agreements
    – e.g. 200ms for the 99th-percentile response time
    – Only 1 query in 100 may take longer than 200ms
  • Some queries can take > 200ms
  • Problem: how to identify and speed up those slow queries?
• 134. Parallelisation: Retrieval with multiple cores
  • Parallelisation using N threads speeds up all queries, but decreases server capacity to 1/N
  • Idea: Selective Parallelisation – only parallelise those queries predicted to be too long
  [Figure: the query "world cup" is predicted to be long; its posting lists, located via the lexicon (cup: 1.3B postings, world: 6.9B postings, person: 500M postings), are processed from the inverted index in parallel on CPU 1 and CPU 2.]
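An illustrative sketch of the selective decision (the near-linear-speedup assumption and the parameterisation are ours; the 200ms SLA figure comes from the previous slide):

    /* Decide how many threads to devote to a query: only queries predicted
       to exceed the latency budget are parallelised, so that overall server
       capacity is largely preserved. */
    int choose_num_threads(double predicted_ms,
                           double sla_ms,      /* e.g. 200.0 ms budget */
                           int    max_threads) /* cores available      */
    {
      if (predicted_ms <= sla_ms)
        return 1;                              /* fast query: run sequentially */
      /* assume near-linear speedup: use just enough threads to meet the SLA */
      int t = (int)(predicted_ms / sla_ms) + 1;
      return t < max_threads ? t : max_threads;
    }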
• 135. As used by Bing!
  • Jeon et al. report that this approach reduces the 99th-percentile response time by 50%, from 200ms to 100ms, compared to other state-of-the-art approaches that do not consider predicted execution time
  • In doing so, it increases server capacity by more than 50%; this potentially saves one-third of production servers, constituting a significant cost reduction
  • Selective parallelisation, using query efficiency predictors, is now deployed by the industrial search engine Bing, across hundreds of thousands of servers
  Image Source: Wikimedia
  [Predictive Parallelization: Taming Tail Latencies in Web Search. M. Jeon, S. Kim, S. Hwang, Y. He, S. Elnikety, A. L. Cox, S. Rixner. Proceedings of ACM SIGIR 2014]
• 136. QEP APPLICATION (3): SELECTIVE QUERY REWRITING
• 137. Query Rewriting Plans
  • Remember our notion of a query plan, including rewriting the query
    – e.g. Context-Sensitive Stemming [1] or MRF Proximity [2]
  • All these rewritings use complex operators, e.g. #syn() #uwN():
    – #syn: documents containing any of the words
    – #1: documents containing the exact phrase
    – #uwN: documents containing the words in a window of size N
  • Key question: how to rewrite a query, to maximise efficiency AND effectiveness
  [1] F. Peng et al. Context Sensitive Stemming for Web Search. SIGIR 2007.
  [2] D. Metzler, W.B. Croft. A Markov Random Field Model for Term Dependencies. SIGIR 2005.
• 138.
  • Different queries can be rewritten in different ways
  • Different queries (and rewritings) can take varying durations to execute
  • We describe a framework that plans how to execute each query (e.g. its rewriting), taking into account:
    – How long each plan may take to execute
    – The expected effectiveness of each plan
• 139. A Selective Rewriting Approach
  • The selective mechanism is based around two ideas:
    1. We can predict the efficiency of a given query plan… and hence eliminate plans that would exceed the target response time (QEP, including for complex queries)
    2. We can predict the effectiveness of a given query plan… and where possible execute the most effective (QPP, or an expectation of plan effectiveness)
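A compact C sketch of this two-step selection (the plan names, effectiveness estimates and time budget are illustrative assumptions, not the paper's exact mechanism):

    typedef struct {
      const char *name;          /* e.g. "terms only", "proximity", "stemming" */
      double predicted_ms;       /* from the query efficiency predictor        */
      double predicted_gain;     /* from a QPP / expected effectiveness        */
    } QueryPlan;

    /* Drop every plan predicted to exceed the response-time budget, then
       execute the most effective of the survivors; fall back to the first
       plan (assumed to be the cheapest) if none fits the budget. */
    int select_plan(const QueryPlan *plans, int nplans, double budget_ms)
    {
      int best = -1;
      for (int i = 0; i < nplans; i++) {
        if (plans[i].predicted_ms > budget_ms)
          continue;
        if (best < 0 || plans[i].predicted_gain > plans[best].predicted_gain)
          best = i;
      }
      return best >= 0 ? best : 0;
    }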
• 140. Complex terms and ephemeral posting lists
  • Recall: QEP essentially estimates pruning, based on posting list lengths
  • Challenge – obtaining posting list length estimates for queries involving complex terms, e.g. (each entry is a ⟨docid, frequency⟩ pair):

      car             (21,2) (25,3) (26,5)           ← unigram
      cars            (19,1) (25,2)                  ← unigram
      #syn(car,cars)  (19,1) (21,2) (25,5) (26,6)    ← ephemeral
• 141. Observations on Ephemeral Posting Lists
  • Observation: statistics for ephemeral posting lists can be bounded…
    – e.g. #syn is disjunctive, so the length of its posting list cannot be greater than the sum of the posting list lengths of its constituent terms
    – e.g. #1 is conjunctive, so the length of its posting list cannot be greater than the minimum of the posting list lengths of its constituent terms
    – e.g. for #1, the maximum number of matches for a phrase cannot exceed the minimum of the max term frequencies of the constituent terms
  • These bounds allow:
    a) upper-bound statistics (necessary for doing dynamic pruning with complex terms)
    b) efficiency predictions for queries with complex terms
  A small sketch of computing these bounds follows below.
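The bounds only need the unigram statistics that are already kept offline; a minimal C sketch (the struct and function names are illustrative):

    /* Offline statistics held for each unigram term. */
    typedef struct {
      long postings;   /* posting list length    */
      int  max_tf;     /* maximum term frequency */
    } TermStat;

    /* #syn(t1,...,tn) is disjunctive: its list is at most as long as the
       sum of the constituents' posting list lengths. */
    long syn_length_bound(const TermStat *t, int n)
    {
      long sum = 0;
      for (int i = 0; i < n; i++) sum += t[i].postings;
      return sum;
    }

    /* #1(t1 ... tn) is conjunctive: its list is at most as long as the
       shortest constituent list... */
    long phrase_length_bound(const TermStat *t, int n)
    {
      long m = t[0].postings;
      for (int i = 1; i < n; i++) if (t[i].postings < m) m = t[i].postings;
      return m;
    }

    /* ...and a phrase can occur in a document at most as often as its
       rarest constituent term does. */
    int phrase_maxtf_bound(const TermStat *t, int n)
    {
      int m = t[0].max_tf;
      for (int i = 1; i < n; i++) if (t[i].max_tf < m) m = t[i].max_tf;
      return m;
    }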
• 142. Putting it together: Complex QEP & Online Plan Selection
  49% decrease in mean response time, and 62% decrease in tail response time, compared to the Default setting, without significant loss in NDCG
• 143. Looking Forward
  • The query plans we showed here demonstrate only two possible rewritings:
    – Proximity and stemming
    – They don't consider the (predicted) effectiveness of a plan
  • We highlight Rosset et al., who have proposed the use of reinforcement learning to identify the most appropriate plan per-query and per-index-block
  Optimizing Query Evaluations using Reinforcement Learning for Web Search. Rosset et al., Proceedings of SIGIR 2018.
• 144. QEP Bibliography
  • Predictive Parallelization: Taming Tail Latencies in Web Search. M. Jeon, S. Kim, S. Hwang, Y. He, S. Elnikety, A. L. Cox and S. Rixner. In Proceedings of ACM SIGIR 2014.
  • Load-Sensitive Selective Pruning for Distributed Search. Nicola Tonellotto, Fabrizio Silvestri, Raffaele Perego, Daniele Broccolo, Salvatore Orlando, Craig Macdonald and Iadh Ounis. In Proceedings of CIKM 2013.
  • Efficient and Effective Retrieval using Selective Pruning. Nicola Tonellotto, Craig Macdonald and Iadh Ounis. In Proceedings of WSDM 2013.
  • Learning to Predict Response Times for Online Query Scheduling. Craig Macdonald, Nicola Tonellotto and Iadh Ounis. In Proceedings of SIGIR 2012.
  • Efficient & Effective Selective Query Rewriting with Efficiency Predictions. Craig Macdonald, Nicola Tonellotto and Iadh Ounis. In Proceedings of SIGIR 2017.
• 146. Conclusions
  • Information retrieval requires not just effective, but also efficient techniques
    – These two form an inherent tradeoff
    – Efficiency can be helped by applying more horsepower, but at significant cost!
  • Hence efficiency is important to ensure the Green credentials of the IR community
  • We have presented many techniques that aim to improve particular methods of retrieval
    – They make assumptions about the IR process (e.g. that we care about the top K documents)
  • As new IR techniques and models are created, there will always be a need for enhancements to their efficiency, to make them easily deployable
• 147. Emerging Trends
  • BitFunnel (SIGIR 2017 best paper) – a "lossy" signature representation for inverted files
    – Now deployed by Bing
  • Similarly, deep-learning-based data structures – Google
  • Deep learning for matching – but the efficiency costs are not yet well-researched
  • And your future ideas…?
• 148. Acknowledgements
  • Some of the slides presented here have appeared in presentations which we have co-authored with:
    – Iadh Ounis
    – Richard McCreadie
    – Matteo Catena
    – Raffaele Perego
    – Rossano Venturini
    – Salvatore Orlando
    – Franco Maria Nardini
    – Claudio Lucchese
• 149. • Feedback: http://bit.ly/2ucH86b