DOI: 10.1145/3397271.3401076
research-article

Finding the Best of Both Worlds: Faster and More Robust Top-k Document Retrieval

Published: 25 July 2020

Abstract

Many top-k document retrieval strategies have been proposed based on the WAND and MaxScore heuristics and yet, from recent work, it is surprisingly difficult to identify the "fastest" strategy. This becomes even more challenging when considering various retrieval criteria, like different ranking models and values of k. In this paper, we conduct the first extensive comparison between ten effective strategies, many of which were never compared before to our knowledge, examining their efficiency under five representative ranking models. Based on a careful analysis of the comparison, we propose LazyBM, a remarkably simple retrieval strategy that bridges the gap between the best performing WAND-based and MaxScore-based approaches. Empirically, LazyBM considerably outperforms all of the considered strategies across ranking models, values of k, and index configurations under both mean and tail query latency.

Published In

SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2020
2548 pages
ISBN:9781450380164
DOI:10.1145/3397271

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dynamic pruning
  2. efficiency
  3. query evaluation
  4. web search

Qualifiers

  • Research-article

Funding Sources

  • NPRP

Conference

SIGIR '20

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 54
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 10 Feb 2025

Citations

Cited By

  • (2023) Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations. ACM Transactions on Information Systems 41:4 (1-28). DOI: 10.1145/3576922. Online publication date: 22-Mar-2023
  • (2023) Optimizing Guided Traversal for Fast Learned Sparse Retrieval. Proceedings of the ACM Web Conference 2023 (3375-3385). DOI: 10.1145/3543507.3583497. Online publication date: 30-Apr-2023
  • (2023) Profiling and Visualizing Dynamic Pruning Algorithms. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (3125-3129). DOI: 10.1145/3539618.3591806. Online publication date: 19-Jul-2023
  • (2022) PLAID. Proceedings of the 31st ACM International Conference on Information & Knowledge Management (1747-1756). DOI: 10.1145/3511808.3557325. Online publication date: 17-Oct-2022
  • (2022) An Efficiency Study for SPLADE Models. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2220-2226). DOI: 10.1145/3477495.3531833. Online publication date: 6-Jul-2022
  • (2021) Database systems research in the Arab world. Communications of the ACM 64:4 (120-123). DOI: 10.1145/3447750. Online publication date: 22-Mar-2021
  • (2021) Window Navigation with Adaptive Probing for Executing BlockMax WAND. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2323-2327). DOI: 10.1145/3404835.3463109. Online publication date: 11-Jul-2021
