article

Filtered document retrieval with frequency-sorted indexes

Published: 01 September 1996
  Abstract

    No abstract available.




      Reviews

      William T. O'Connell

      The authors describe a filtering technique used in conjunction with frequency-sorted inverted lists in document databases. By filtering out irrelevant documents quickly, they reduce disk, CPU, and memory costs on similarity queries without degrading retrieval effectiveness. Because each inverted list is sorted by decreasing within-document term frequency, the technique can limit the search to the first part of the relevant lists. An entry in a list is considered relevant only if the combination of its term frequency in the target document and the term's importance is large enough to be likely to affect the final ordering of documents. In simulations, the technique used only 2 percent of the memory required by other standard implementations, without degradation of retrieval effectiveness; however, it is not clear which implementations are being referred to.

      While the authors give an overview of inversion techniques, they do not relate this work to signature files or to vector space and clustering models [1]. Recent work on weighted-partitioned signature files has also shown CPU, memory, and disk reductions without sacrificing recall and precision [2]. Recent developments in vector models, such as latent semantic indexing, apply singular value decomposition to document-term matrices with good results [3]; this approach also allows general-purpose multidimensional indexes to be used.

      It would also have been appropriate to discuss the method's disadvantages, such as the additional storage overhead [4] caused by the difficulty of compressing frequency-sorted lists, and the query cost under dynamic transaction workloads. The authors' idea of frequency-sorted indexes is a good extension, but I would have liked to see more analysis of these implications. Overall, the ideas are well organized and presented. I recommend the paper to anyone interested in document retrieval techniques.
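The review's summary of the filtering idea can be made concrete with a small sketch. It assumes a toy in-memory index in which each inverted list holds (document, frequency) pairs sorted by decreasing within-document frequency, a simple per-posting contribution of f_dt × idf² with idf = log(N / f_t), and two hypothetical tuning constants, `c_ins` and `c_add`, that convert the current largest accumulated score into frequency thresholds for each list. None of these names, constants, or formulas is taken from the paper; the sketch only illustrates why frequency ordering lets processing stop partway through a list.

```python
import math
from collections import defaultdict

# Toy index type: term -> list of (doc_id, f_dt) postings, each list sorted
# by DECREASING within-document frequency f_dt (the frequency-sorted order).
Postings = list[tuple[int, int]]


def build_frequency_sorted_index(docs: dict[int, list[str]]) -> dict[str, Postings]:
    """Build a frequency-sorted inverted index from tokenized documents."""
    counts: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for doc_id, tokens in docs.items():
        for tok in tokens:
            counts[tok][doc_id] += 1
    index = {}
    for term, per_doc in counts.items():
        # Highest f_dt first; ties broken by doc id so the order is deterministic.
        index[term] = sorted(per_doc.items(), key=lambda p: (-p[1], p[0]))
    return index


def filtered_query(index, num_docs, query, c_ins=0.07, c_add=0.001, k=10):
    """Rank documents, skipping postings unlikely to change the ranking.

    Assumptions (illustrative, not the paper's formulation):
      score(d) = sum over query terms t of f_dt * idf_t**2, idf_t = log(N / f_t);
      no document-length normalization; c_ins and c_add are hypothetical constants.
    Returns (top-k list of (doc_id, score), number of postings examined).
    """
    terms = [t for t in set(query) if t in index]
    idf = {t: math.log(num_docs / len(index[t])) for t in terms}
    # Process rare (high-weight) terms first so the largest score grows quickly.
    terms.sort(key=lambda t: -idf[t])

    acc: dict[int, float] = {}  # accumulators: doc_id -> partial score
    s_max = 0.0
    postings_read = 0
    for t in terms:
        w = idf[t] ** 2
        if w == 0:
            continue  # term occurs in every document, so it contributes nothing
        # Per-list frequency thresholds derived from the current largest score.
        f_ins = c_ins * s_max / w  # below this, do not create new accumulators
        f_add = c_add * s_max / w  # below this, stop reading the list entirely
        for doc_id, f_dt in index[t]:
            if f_dt < f_add:
                break  # list is frequency-sorted: every later posting is smaller
            postings_read += 1
            if f_dt >= f_ins or doc_id in acc:
                acc[doc_id] = acc.get(doc_id, 0.0) + f_dt * w
                s_max = max(s_max, acc[doc_id])
            # Otherwise the posting is too small to justify a new accumulator.
    ranked = sorted(acc.items(), key=lambda p: (-p[1], p[0]))[:k]
    return ranked, postings_read
```

Because each list is in decreasing frequency order, the first posting whose frequency falls below the addition threshold ends processing of that list, and postings between the two thresholds only update documents that already have an accumulator; this limits how many accumulators are created, which is the kind of memory saving the review describes. Setting both constants to 0 gives exhaustive evaluation, a convenient baseline for checking whether the filtered ranking agrees on the top documents.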


      Information

      Published In

      Journal of the American Society for Information Science, Volume 47, Issue 10
      Oct. 1996
      59 pages
      ISSN:0002-8231

      Publisher

      John Wiley & Sons, Inc.

      United States


      Qualifiers

      • Article



      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 0
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 11 Aug 2024


      Citations

      Cited By

      • (2017) Early Termination Heuristics for Score-at-a-Time Index Traversal. Proceedings of the 22nd Australasian Document Computing Symposium, pp. 1-8. DOI: 10.1145/3166072.3166073. Online publication date: 7-Dec-2017.
      • (2017) Latency Reduction via Decision Tree Based Query Construction. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1399-1407. DOI: 10.1145/3132847.3132865. Online publication date: 6-Nov-2017.
      • (2017) Quantization in Append-Only Collections. Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, pp. 265-268. DOI: 10.1145/3121050.3121092. Online publication date: 1-Oct-2017.
      • (2017) Top-k Query Processing with Conditional Skips. Proceedings of the 26th International Conference on World Wide Web Companion, pp. 653-661. DOI: 10.1145/3041021.3054191. Online publication date: 3-Apr-2017.
      • (2017) Inverted Treaps. ACM Transactions on Information Systems 35:3, pp. 1-45. DOI: 10.1145/3007186. Online publication date: 4-Jan-2017.
      • (2017) Time-Optimal Top-k Document Retrieval. SIAM Journal on Computing 46:1, pp. 80-113. DOI: 10.1137/140998949. Online publication date: 8-Feb-2017.
      • (2016) In Vacuo and In Situ Evaluation of SIMD Codecs. Proceedings of the 21st Australasian Document Computing Symposium, pp. 1-8. DOI: 10.1145/3015022.3015023. Online publication date: 5-Dec-2016.
      • (2016) Rank-at-a-Time Query Processing. Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, pp. 229-232. DOI: 10.1145/2970398.2970434. Online publication date: 12-Sep-2016.
      • (2016) A Two-Phase MapReduce Algorithm for Scalable Preference Queries over High-Dimensional Data. Proceedings of the 20th International Database Engineering & Applications Symposium, pp. 43-52. DOI: 10.1145/2938503.2938525. Online publication date: 11-Jul-2016.
      • (2016) Instant Search. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1211-1214. DOI: 10.1145/2911451.2914806. Online publication date: 7-Jul-2016.
