DOI: 10.1145/3340531.3412080
short-paper

A Comparison of Top-k Threshold Estimation Techniques for Disjunctive Query Processing

Published: 19 October 2020

Abstract

In the top-k threshold estimation problem, given a query q, the goal is to estimate the score of the result at rank k. A good estimate of this score can result in significant performance improvements for several query processing scenarios, including selective search, index tiering, and widely used disjunctive query processing algorithms such as MaxScore, WAND, and BMW. Several approaches have been proposed, including parametric approaches, methods using random sampling, and a recent approach based on machine learning. However, previous work fails to perform any experimental comparison between these approaches. In this paper, we address this issue by reimplementing four major approaches and comparing them in terms of estimation error, running time, likelihood of an overestimate, and end-to-end performance when applied to common classes of disjunctive top-k query processing algorithms.
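
To make the problem statement concrete, here is a minimal sketch of what a threshold estimate is and how it relates to the exact score at rank k. It is not one of the four approaches compared in the paper. It uses a simple lower-bound idea: for a disjunctive query with non-negative, additive term scores, the k-th highest score within any single query term's posting list cannot exceed the true top-k threshold, so the maximum of these per-term values is a safe underestimate. The toy index, the scores, and all function names are hypothetical and chosen only for illustration.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index: term -> {doc_id: non-negative term score}.
INDEX = {
    "top": {1: 2.0, 2: 0.5, 3: 1.5, 5: 1.0},
    "k": {2: 1.0, 3: 0.5, 4: 3.0},
    "query": {1: 0.5, 4: 0.5, 5: 2.5, 6: 1.0},
}


def exact_threshold(index, query, k):
    """Score of the result at rank k under exhaustive disjunctive (OR) scoring."""
    scores = defaultdict(float)
    for term in query:
        for doc, s in index.get(term, {}).items():
            scores[doc] += s
    top = heapq.nlargest(k, scores.values())
    return top[-1] if len(top) == k else 0.0


def per_term_kth_score(index, k):
    """Precompute, for each term, the k-th highest score in its posting list."""
    table = {}
    for term, postings in index.items():
        top = heapq.nlargest(k, postings.values())
        table[term] = top[-1] if len(top) == k else 0.0
    return table


def estimate_threshold(kth_table, query):
    """Lower-bound estimate: the largest per-term k-th score among the query terms."""
    return max((kth_table.get(t, 0.0) for t in query), default=0.0)


query = ["top", "k", "query"]
exact = exact_threshold(INDEX, query, 3)
estimate = estimate_threshold(per_term_kth_score(INDEX, 3), query)
print(exact, estimate)  # 2.5 1.0
```

Because this estimate never exceeds the exact threshold, an algorithm such as MaxScore or WAND could use it as its initial heap threshold without losing any top-k result. An overestimate, by contrast, could cause correct results to be skipped, which is why the likelihood of an overestimate is one of the evaluation criteria listed above.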

Supplementary Material

MP4 File (3340531.3412080.mp4)
Video file

    Published In

    CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
    October 2020
    3619 pages
    ISBN:9781450368599
    DOI:10.1145/3340531

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. query processing
    2. threshold estimation
    3. top-k document retrieval

    Qualifiers

    • Short-paper

    Funding Sources

    • Amazon
    • NSF

    Conference

    CIKM '20

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (Last 12 months): 32
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 11 Feb 2025

    Cited By

    • (2024) Faster Learned Sparse Retrieval with Block-Max Pruning. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2411-2415. DOI: 10.1145/3626772.3657906. Online publication date: 10-Jul-2024
    • (2024) Beyond Quantile Methods: Improved Top-K Threshold Estimation for Traditional and Learned Sparse Indexes. 2024 IEEE International Conference on Big Data (BigData), pp. 709-716. DOI: 10.1109/BigData62323.2024.10825349. Online publication date: 15-Dec-2024
    • (2023) Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations. ACM Transactions on Information Systems 41:4, pp. 1-28. DOI: 10.1145/3576922. Online publication date: 22-Mar-2023
    • (2023) Faster Dynamic Pruning via Reordering of Documents in Inverted Indexes. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2001-2005. DOI: 10.1145/3539618.3591987. Online publication date: 19-Jul-2023
    • (2023) Profiling and Visualizing Dynamic Pruning Algorithms. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3125-3129. DOI: 10.1145/3539618.3591806. Online publication date: 19-Jul-2023
    • (2022) Using Conjunctions for Faster Disjunctive Top-k Queries. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, pp. 917-927. DOI: 10.1145/3488560.3498489. Online publication date: 11-Feb-2022
    • (2022) Efficient query processing techniques for next-page retrieval. Information Retrieval 25:1, pp. 27-43. DOI: 10.1007/s10791-021-09402-7. Online publication date: 18-Jan-2022
    • (2021) Fast Disjunctive Candidate Generation Using Live Block Filtering. Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 671-679. DOI: 10.1145/3437963.3441813. Online publication date: 8-Mar-2021
    • (2021) Window Navigation with Adaptive Probing for Executing BlockMax WAND. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2323-2327. DOI: 10.1145/3404835.3463109. Online publication date: 11-Jul-2021
    • (2020) Examining the Additivity of Top-k Query Processing Innovations. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1085-1094. DOI: 10.1145/3340531.3412000. Online publication date: 19-Oct-2020
