DOI: 10.1145/2009916.2010046
research-article

Timestamp-based result cache invalidation for web search engines

Published: 24 July 2011

    Abstract

    The result cache is a vital component for the efficiency of large-scale web search engines, and maintaining the freshness of cached query results is a current research challenge. To address this problem, our work proposes a new mechanism to identify queries whose cached results are stale. The basic idea is to record the generation time of each cached query result and compare it with the update times of posting lists and documents to decide whether the result is stale. The proposed technique is evaluated using a Wikipedia document collection with real update information and a real-life query log. We show that our technique has good prediction accuracy relative to a baseline based on the time-to-live mechanism. Moreover, it is easy to implement and incurs less processing overhead on the system than a recently proposed, more sophisticated invalidation mechanism.
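
The timestamp comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact staleness rules, so the class and function names (`TimestampCache`, `record_update`, `ttl_is_stale`), the whitespace tokenization of queries, and the rule "stale if any query term's posting list or any cached document was updated after the result was generated" are all assumptions made for illustration. A time-to-live check is included for comparison with the baseline the abstract mentions.

```python
from dataclasses import dataclass, field


@dataclass
class CachedResult:
    doc_ids: list[int]    # documents in the cached result list
    terms: list[str]      # query terms (hypothetical whitespace tokenization)
    generated_at: float   # time at which the result was computed


@dataclass
class TimestampCache:
    """Hypothetical result cache that tracks update timestamps."""
    term_updated_at: dict[str, float] = field(default_factory=dict)
    doc_updated_at: dict[int, float] = field(default_factory=dict)
    entries: dict[str, CachedResult] = field(default_factory=dict)

    def put(self, query: str, doc_ids: list[int], now: float) -> None:
        # Store the result together with its generation time.
        self.entries[query] = CachedResult(list(doc_ids), query.split(), now)

    def record_update(self, doc_id: int, terms: list[str], now: float) -> None:
        # Assumed rule: a document update touches the document's own
        # timestamp and the posting list of every affected term.
        self.doc_updated_at[doc_id] = now
        for term in terms:
            self.term_updated_at[term] = now

    def is_stale(self, query: str) -> bool:
        entry = self.entries[query]
        never = float("-inf")
        # Stale if a posting list of any query term changed after generation.
        if any(self.term_updated_at.get(t, never) > entry.generated_at
               for t in entry.terms):
            return True
        # Stale if any cached result document changed after generation.
        return any(self.doc_updated_at.get(d, never) > entry.generated_at
                   for d in entry.doc_ids)


def ttl_is_stale(entry: CachedResult, now: float, ttl: float) -> bool:
    # Time-to-live baseline: expire purely by age, ignoring index updates.
    return now - entry.generated_at > ttl
```

In this sketch, a cached entry stays valid for as long as nothing it depends on has changed, whereas the time-to-live check expires it after a fixed interval whether or not the index changed.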

      Published In

      SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
      July 2011
      1374 pages
      ISBN:9781450307574
      DOI:10.1145/2009916
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cache invalidation
      2. freshness
      3. result cache
      4. web search

      Qualifiers

      • Research-article

      Conference

      SIGIR '11

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Article Metrics

      • Downloads (last 12 months): 2
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 11 Aug 2024

      Cited By

      • (2024) Caching in Forschung und Industrie. Schnelles und skalierbares Cloud-Datenmanagement, pages 91-140. DOI: 10.1007/978-3-031-54388-3_5. Online publication date: 3-May-2024
      • (2020) Caching in Research and Industry. Fast and Scalable Cloud Data Management, pages 85-130. DOI: 10.1007/978-3-030-43506-6_5. Online publication date: 15-May-2020
      • (2018) Better Caching in Search Advertising Systems with Rapid Refresh Predictions. Proceedings of the 2018 World Wide Web Conference, pages 1875-1884. DOI: 10.1145/3178876.3186176. Online publication date: 10-Apr-2018
      • (2017) Workload analysis and caching strategies for search advertising systems. Proceedings of the 2017 Symposium on Cloud Computing, pages 170-180. DOI: 10.1145/3127479.3129255. Online publication date: 24-Sep-2017
      • (2017) Caching with Dual Costs. Proceedings of the 26th International Conference on World Wide Web Companion, pages 643-652. DOI: 10.1145/3041021.3054187. Online publication date: 3-Apr-2017
      • (2017) A machine learning approach for result caching in web search engines. Information Processing and Management: an International Journal, 53:4, pages 834-850. DOI: 10.1016/j.ipm.2017.02.006. Online publication date: 1-Jul-2017
      • (2016) Scalability and Efficiency Challenges in Large-Scale Web Search Engines. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pages 1223-1226. DOI: 10.1145/2911451.2914808. Online publication date: 7-Jul-2016
      • (2015) Scalability Challenges in Web Search Engines. Synthesis Lectures on Information Concepts, Retrieval, and Services, 7:6, pages 1-138. DOI: 10.2200/S00662ED1V01Y201508ICR045. Online publication date: 29-Dec-2015
      • (2015) Propagating Expiration Decisions in a Search Engine Result Cache. Proceedings of the 24th International Conference on World Wide Web, pages 107-108. DOI: 10.1145/2740908.2742772. Online publication date: 18-May-2015
      • (2015) Adaptive Caching of Fresh Web Search Results. Advances in Information Retrieval, pages 110-122. DOI: 10.1007/978-3-319-16354-3_13. Online publication date: 2015
