Article, DOI: 10.1145/1277741.1277774

Efficient document retrieval in main memory

Published: 23 July 2007

    Abstract

    Disk access performance is a major bottleneck in traditional information retrieval systems. Compared to system memory, disk bandwidth is poor, and seek times are worse.
    We circumvent this problem by considering query evaluation strategies in main memory. We show how new accumulator trimming techniques combined with inverted list skipping can produce extremely high performance retrieval systems without resorting to methods that may harm effectiveness.
    We evaluate our techniques using Galago, a new retrieval system designed for efficient query processing. Our system achieves a 69% improvement in query throughput over previous methods.
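
    The abstract names two ideas, inverted list skipping and limiting the set of accumulators, without showing them. The following is a minimal sketch of how the two can interact, not the paper's algorithm and not Galago code. It assumes doc-id-sorted postings stored as (doc_id, weight) pairs, a skip entry every `block` postings, and an accumulator set that is frozen after the shortest list is read (a simple stand-in for trimming). The names `build_skips`, `seek`, and `score` are hypothetical.

```python
from bisect import bisect_right


def build_skips(postings, block=4):
    """Record the first doc id of every block of `block` postings."""
    return [postings[i][0] for i in range(0, len(postings), block)]


def seek(postings, skips, start, target, block=4):
    """Find the first posting at or after index `start` with doc id >= target.

    Returns (index, scanned), where `scanned` counts the postings that were
    stepped over one by one after the skip jump.
    """
    # Jump to the last block whose first doc id is <= target,
    # but never move backwards from `start`.
    b = bisect_right(skips, target) - 1
    i = max(start, b * block) if b >= 0 else start
    scanned = 0
    while i < len(postings) and postings[i][0] < target:
        i += 1
        scanned += 1
    return i, scanned


def score(short_list, long_list, block=4):
    """Term-at-a-time scoring with a frozen accumulator set.

    Assumption: only documents in `short_list` may hold an accumulator, so
    the long list is consulted only at those doc ids and the rest is skipped.
    """
    acc = {doc: weight for doc, weight in short_list}
    skips = build_skips(long_list, block)
    i = scanned = 0
    for doc in sorted(acc):
        i, n = seek(long_list, skips, i, doc, block)
        scanned += n
        if i == len(long_list):
            break  # long list exhausted; remaining accumulators are final
        if long_list[i][0] == doc:
            acc[doc] += long_list[i][1]
    return acc, scanned
```

    With a two-posting list and a forty-posting list, `score` steps over only a handful of the long list's postings, because each lookup starts at the block that can contain the target document. The fewer accumulators survive, the more of each long list can be skipped, which is why the two techniques are usually discussed together.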

      Published In

      SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
      July 2007
      946 pages
      ISBN:9781595935977
      DOI:10.1145/1277741
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. impact-sorted indexes
      2. memory

      Qualifiers

      • Article

      Conference

      SIGIR07: The 30th Annual International SIGIR Conference
      July 23 - 27, 2007
      Amsterdam, The Netherlands

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Cited By

      • (2023) Optimizing Guided Traversal for Fast Learned Sparse Retrieval. Proceedings of the ACM Web Conference 2023, pages 3375-3385. DOI: 10.1145/3543507.3583497. Online publication date: 30-Apr-2023.
      • (2021) Improving Search Engine Performance Through Dynamic Caching. 2021 40th International Conference of the Chilean Computer Science Society (SCCC), pages 1-6. DOI: 10.1109/SCCC54552.2021.9650412. Online publication date: 15-Nov-2021.
      • (2020) Evaluation strategies for top-k queries over memory-resident inverted indexes. Proceedings of the VLDB Endowment, 4:12, pages 1213-1224. DOI: 10.14778/3402755.3402756. Online publication date: 3-Jun-2020.
      • (2020) Index Obfuscation for Oblivious Document Retrieval in a Trusted Execution Environment. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 1345-1354. DOI: 10.1145/3340531.3412035. Online publication date: 19-Oct-2020.
      • (2019) A Note on Using Performance and Data Profiles for Training Algorithms. ACM Transactions on Mathematical Software, 45:2, pages 1-10. DOI: 10.1145/3310362. Online publication date: 18-Apr-2019.
      • (2019) Ultra-Low-Power Gaze Tracking for Virtual Reality. GetMobile: Mobile Computing and Communications, 22:3, pages 27-31. DOI: 10.1145/3308755.3308765. Online publication date: 17-Jan-2019.
      • (2019) Signpost. GetMobile: Mobile Computing and Communications, 22:3, pages 23-26. DOI: 10.1145/3308755.3308763. Online publication date: 17-Jan-2019.
      • (2019) Research on ARM TrustZone. GetMobile: Mobile Computing and Communications, 22:3, pages 17-22. DOI: 10.1145/3308755.3308761. Online publication date: 17-Jan-2019.
      • (2019) Exploiting the Use of Similar Past Search Results Through a Dynamic Cache. 2019 IEEE CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON), pages 1-5. DOI: 10.1109/CHILECON47746.2019.8988020. Online publication date: Nov-2019.
      • (2018) Locality analysis through static parallel sampling. ACM SIGPLAN Notices, 53:4, pages 557-570. DOI: 10.1145/3296979.3192402. Online publication date: 11-Jun-2018.

      View Options

      Get Access

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media