research-article

Finding the Best of Both Worlds: Faster and More Robust Top-k Document Retrieval

Authors: Mohammad Hammoud, Tamer Elsayed

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1031 - 1040

https://doi.org/10.1145/3397271.3401076

Published: 25 July 2020

Abstract

Many top-k document retrieval strategies have been proposed based on the WAND and MaxScore heuristics, yet recent work shows that it is surprisingly difficult to identify the "fastest" strategy. This becomes even more challenging when considering various retrieval criteria, such as different ranking models and values of k. In this paper, we conduct the first extensive comparison between ten effective strategies, many of which, to our knowledge, have never been compared before, examining their efficiency under five representative ranking models. Based on a careful analysis of the comparison, we propose LazyBM, a remarkably simple retrieval strategy that bridges the gap between the best-performing WAND-based and MaxScore-based approaches. Empirically, LazyBM considerably outperforms all of the considered strategies across ranking models, values of k, and index configurations in both mean and tail query latency.

Index Terms

Finding the Best of Both Worlds: Faster and More Robust Top-k Document Retrieval
1. Information systems
  1. Information retrieval

Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2020

2548 pages

ISBN:9781450380164

DOI:10.1145/3397271

General Chairs: Jimmy Huang (York University, Canada), Yi Chang (Jilin University, China), Xueqi Cheng (Chinese Academy of Sciences, China)

Program Chairs: Jaap Kamps (University of Amsterdam, Netherlands), Vanessa Murdock (Amazon, U.S.A.), Ji-Rong Wen (Renmin University of China, China), Yiqun Liu (Tsinghua University, China)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

NPRP

Conference

SIGIR '20

Sponsor:

SIGIR

SIGIR '20: The 43rd International ACM SIGIR conference on research and development in Information Retrieval

July 25 - 30, 2020

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 7
Total Downloads: 658
Downloads (Last 12 months): 54
Downloads (Last 6 weeks): 1

Reflects downloads up to 10 Feb 2025

Cited By

Mackenzie J, Trotman A, Lin J (2023). Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations. ACM Transactions on Information Systems, 41:4 (1-28). Online publication date: 22-Mar-2023.
https://dl.acm.org/doi/10.1145/3576922
Qiao Y, Yang Y, Lin H, Yang T (2023). Optimizing Guided Traversal for Fast Learned Sparse Retrieval. Proceedings of the ACM Web Conference 2023 (3375-3385). Online publication date: 30-Apr-2023.
https://dl.acm.org/doi/10.1145/3543507.3583497
Li Z, Mackenzie J, Chen H, Duh W, Huang H, Kato M, Mothe J, Poblete B (2023). Profiling and Visualizing Dynamic Pruning Algorithms. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (3125-3129). Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3591806
Santhanam K, Khattab O, Potts C, Zaharia M, Al Hasan M, Xiong L (2022). PLAID. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (1747-1756). Online publication date: 17-Oct-2022.
https://dl.acm.org/doi/10.1145/3511808.3557325
Lassance C, Clinchant S, Amigo E, Castells P, Gonzalo J, Carterette B, Culpepper J, Kazai G (2022). An Efficiency Study for SPLADE Models. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2220-2226). Online publication date: 6-Jul-2022.
https://dl.acm.org/doi/10.1145/3477495.3531833
Aboulnaga A, Abouzied A, Echihabi K, Ouzzani M (2021). Database systems research in the Arab world. Communications of the ACM, 64:4 (120-123). Online publication date: 22-Mar-2021.
https://dl.acm.org/doi/10.1145/3447750
Shao J, Qiao Y, Ji S, Yang T, Diaz F, Shah C, Suel T, Castells P, Jones R, Sakai T (2021). Window Navigation with Adaptive Probing for Executing BlockMax WAND. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2323-2327). Online publication date: 11-Jul-2021.
https://dl.acm.org/doi/10.1145/3404835.3463109
