Second Chance: A Hybrid Approach for Dynamic Result Caching and Prefetching in Search Engines

Published: 01 December 2013

Abstract

Web search engines are known to cache the results of previously issued queries. The stored results typically contain the document summaries and some data that is used to construct the final search result page returned to the user. An alternative strategy is to store in the cache only the result document IDs, which take much less space, allowing results of more queries to be cached. These two strategies lead to an interesting trade-off between the hit rate and the average query response latency. In this work, in order to exploit this trade-off, we propose a hybrid result caching strategy where a dynamic result cache is split into two sections: an HTML cache and a docID cache. Moreover, using a realistic cost model, we evaluate the performance of different result prefetching strategies for the proposed hybrid cache and the baseline HTML-only cache. Finally, we propose a machine learning approach to predict singleton queries, which occur only once in the query stream. We show that when the proposed hybrid result caching strategy is coupled with the singleton query predictor, the hit rate is further improved.
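
The two-section design described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the replacement or promotion policy, so the following is an assumption made for illustration only: both sections use LRU, an entry evicted from the HTML cache keeps its document IDs in the docID cache (its "second chance"), and a docID hit promotes the query back to the HTML cache. The names `LRUCache`, `HybridResultCache`, `search`, and `render` are hypothetical, and capacities are counted in entries rather than bytes for simplicity.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache bounded by entry count (a stand-in for a byte budget)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def pop(self, key):
        return self.items.pop(key, None)

    def put(self, key, value):
        """Insert the entry and return the evicted (key, value) pair, or None."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)  # drop least recently used
        return None


class HybridResultCache:
    """Result cache split into an HTML section and a docID section."""

    def __init__(self, html_capacity, docid_capacity, search, render):
        self.html = LRUCache(html_capacity)
        self.docids = LRUCache(docid_capacity)
        self.search = search  # hypothetical: query -> list of docIDs (expensive)
        self.render = render  # hypothetical: (query, docIDs) -> result page (cheaper)

    def lookup(self, query):
        """Return (page, outcome); outcome is 'html_hit', 'docid_hit' or 'miss'."""
        entry = self.html.get(query)
        if entry is not None:
            return entry[1], "html_hit"  # full page served from the cache

        ids = self.docids.pop(query)
        if ids is not None:
            outcome = "docid_hit"  # ranking is skipped, the page is rebuilt
        else:
            ids = self.search(query)  # full query processing
            outcome = "miss"

        page = self.render(query, ids)
        evicted = self.html.put(query, (ids, page))
        if evicted is not None:
            # Assumed policy: the evicted entry keeps only its docIDs.
            old_query, (old_ids, _old_page) = evicted
            self.docids.put(old_query, old_ids)
        return page, outcome
```

In this sketch a docID hit avoids the call to `search` but still pays for `render`, which mirrors the trade-off between hit rate and average response latency that the abstract describes. In a byte-budgeted cache the docID section would hold many more entries than the HTML section for the same space.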

      Published In

      ACM Transactions on the Web  Volume 8, Issue 1
      December 2013
      204 pages
      ISSN:1559-1131
      EISSN:1559-114X
      DOI:10.1145/2560539
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from the publisher.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 01 December 2013
      Accepted: 01 October 2013
      Revised: 01 July 2013
      Received: 01 August 2012
      Published in TWEB Volume 8, Issue 1

      Author Tags

      1. Web search engines
      2. dynamic result caching
      3. result prefetching

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2021) Three-level Compact Caching for Search Engines Based on Solid State Drives. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 16-25. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00030. Online publication date: Dec-2021.
      • (2020) Topical result caching in web search engines. Information Processing and Management: an International Journal 57:3. DOI: 10.1016/j.ipm.2019.102193. Online publication date: 1-May-2020.
      • (2019) Caching Scores for Faster Query Processing with Dynamic Pruning in Search Engines. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2457-2460. DOI: 10.1145/3357384.3358154. Online publication date: 3-Nov-2019.
      • (2019) On the Impact of Storing Query Frequency History for Search Engine Result Caching. Advances in Information Retrieval, 155-162. DOI: 10.1007/978-3-030-15719-7_20. Online publication date: 14-Apr-2019.
      • (2018) Better Caching in Search Advertising Systems with Rapid Refresh Predictions. Proceedings of the 2018 World Wide Web Conference, 1875-1884. DOI: 10.1145/3178876.3186176. Online publication date: 10-Apr-2018.
      • (2017) Workload analysis and caching strategies for search advertising systems. Proceedings of the 2017 Symposium on Cloud Computing, 170-180. DOI: 10.1145/3127479.3129255. Online publication date: 24-Sep-2017.
      • (2017) A machine learning approach for result caching in web search engines. Information Processing and Management: an International Journal 53:4, 834-850. DOI: 10.1016/j.ipm.2017.02.006. Online publication date: 1-Jul-2017.
      • (2017) A New Static Web Caching Mechanism Based on Mutual Dependency Between Result Cache and Posting List Cache. Web Information Systems Engineering – WISE 2017, 148-156. DOI: 10.1007/978-3-319-68786-5_12. Online publication date: 7-Oct-2017.
      • (2015) Scalability Challenges in Web Search Engines. Synthesis Lectures on Information Concepts, Retrieval, and Services 7:6, 1-138. DOI: 10.2200/S00662ED1V01Y201508ICR045. Online publication date: 29-Dec-2015.
