DOI: 10.1145/2009916.2010045
research-article

Posting list intersection on multicore architectures

Published: 24 July 2011

    Abstract

    In current commercial Web search engines, queries are processed in conjunctive mode, which requires the search engine to intersect the posting lists of all query terms to find the documents that match every term. In practice, this intersection takes a significant fraction of query processing time and, for some queries, dominates total query latency. Efficient posting list intersection is therefore critical for achieving short query latencies. In this work, we improve the performance of posting list intersection by leveraging the compute capabilities of recent multicore systems. To this end, we consider various coarse-grained and fine-grained parallelization models for list intersection. Specifically, we present an algorithm that partitions the work associated with a given query into a number of small, independent tasks that are then processed in parallel. Through a detailed empirical analysis of these alternative models, we demonstrate that exploiting parallelism at the finest level of granularity is critical to achieving the best performance on multicore systems. On an eight-core system, the fine-grained parallelization method reduces average query processing time by more than a factor of five while still exploiting parallelism for high query throughput.
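The idea of splitting one query's intersection into small, independent tasks can be illustrated with a minimal sketch. This is not the paper's algorithm, whose details are not given in the abstract. It assumes each posting list is a strictly increasing Python list of document IDs, and the names `make_tasks`, `run_task`, and `parallel_intersect`, as well as the `num_tasks` and `workers` parameters, are hypothetical. The shortest list is cut into chunks, every other list is narrowed to the matching docID range by binary search, and each resulting group of slices is intersected on its own.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists of document IDs."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def make_tasks(lists, num_tasks):
    """Cut the shortest list into chunks and align the other lists to each
    chunk's docID range, so that every task is independent of the others."""
    lists = sorted(lists, key=len)
    shortest = lists[0]
    if not shortest:
        return []
    step = max(1, -(-len(shortest) // num_tasks))  # ceiling division
    tasks = []
    for start in range(0, len(shortest), step):
        chunk = shortest[start:start + step]
        lo, hi = chunk[0], chunk[-1]
        slices = [chunk]
        for other in lists[1:]:
            # Binary search for the part of `other` that can overlap the chunk.
            slices.append(other[bisect_left(other, lo):bisect_right(other, hi)])
        tasks.append(slices)
    return tasks


def run_task(slices):
    """Intersect all slices of one task, stopping early on an empty result."""
    result = slices[0]
    for s in slices[1:]:
        result = intersect_sorted(result, s)
        if not result:
            break
    return result


def parallel_intersect(lists, num_tasks=8, workers=4):
    """Intersect posting lists by running the per-chunk tasks on a thread pool."""
    if not lists:
        return []
    tasks = make_tasks(lists, num_tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(run_task, tasks)  # map() preserves task order
    return [doc for part in parts for doc in part]


if __name__ == "__main__":
    postings = [
        [1, 3, 5, 7, 9, 11],
        [3, 4, 5, 9, 10, 11, 12],
        [0, 3, 9, 11, 20],
    ]
    print(parallel_intersect(postings, num_tasks=3, workers=2))
```

Because the chunks cover disjoint docID ranges, the partial results can simply be concatenated in task order. In standard CPython builds the global interpreter lock keeps these threads from running pure-Python code simultaneously, so the sketch shows the task decomposition rather than a real multicore speedup.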

    Published In

    SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
    July 2011
    1374 pages
    ISBN: 9781450307574
    DOI: 10.1145/2009916
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. intra-query parallelism
    2. multicore architectures
    3. posting list intersection
    4. query processing
    5. web search engines

    Qualifiers

    • Research-article

    Conference

    SIGIR '11
    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2021) Evaluating List Intersection on SSDs for Parallel I/O Skipping. 2021 IEEE 37th International Conference on Data Engineering (ICDE), pages 1823-1828. DOI: 10.1109/ICDE51399.2021.00161. Online publication date: Apr-2021
    • (2020) Sage. Proceedings of the VLDB Endowment, 13:9, pages 1598-1613. DOI: 10.14778/3397230.3397251. Online publication date: 1-May-2020
    • (2020) Performance Characterization of Simultaneous Multi-Threading and Index Partitioning for an Online Document Search Application. 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 231-240. DOI: 10.1109/ISPASS48437.2020.00043. Online publication date: Aug-2020
    • (2019) Fully dynamic depth-first search in directed graphs. Proceedings of the VLDB Endowment, 13:2, pages 142-154. DOI: 10.14778/3364324.3364329. Online publication date: 1-Oct-2019
    • (2019) Efficient main-memory top-K selection for multicore architectures. Proceedings of the VLDB Endowment, 13:2, pages 114-127. DOI: 10.14778/3364324.3364327. Online publication date: 1-Oct-2019
    • (2019) Enabling data science for the majority. Proceedings of the VLDB Endowment, 12:12, pages 2309-2322. DOI: 10.14778/3352063.3352148. Online publication date: 1-Aug-2019
    • (2019) Customizable and scalable fuzzy join for big data. Proceedings of the VLDB Endowment, 12:12, pages 2106-2117. DOI: 10.14778/3352063.3352128. Online publication date: 1-Aug-2019
    • (2019) Comprehensive Characterization of an Open Source Document Search Engine. ACM Transactions on Architecture and Code Optimization, 16:2, pages 1-21. DOI: 10.1145/3320346. Online publication date: 29-May-2019
    • (2019) Scalable Top-K Query Processing Using Graphics Processing Unit. Languages and Compilers for Parallel Computing, pages 240-261. DOI: 10.1007/978-3-030-35225-7_16. Online publication date: 15-Nov-2019
    • (2018) ColumnML. Proceedings of the VLDB Endowment, 12:4, pages 348-361. DOI: 10.14778/3297753.3297756. Online publication date: 1-Dec-2018
