DOI: 10.1145/1277741.1277775

The impact of caching on search engines

Published: 23 July 2007

    Abstract

    In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how changes in the query log affect the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, a memory/disk layer or a broker/remote server layer.
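The static vs. dynamic distinction drawn in the abstract can be illustrated with a small sketch. It is not the algorithm proposed in the paper: it assumes a static cache filled once with the most frequent queries of a training log, and a dynamic cache with LRU eviction. The function names and the toy query logs are hypothetical, chosen only for illustration.

```python
from collections import Counter, OrderedDict

def static_hit_rate(train, test, capacity):
    """Hit rate of a static cache: filled once from the training log, never updated."""
    if not test:
        return 0.0
    # Assumption: the static cache holds the `capacity` most frequent training queries.
    cached = {q for q, _ in Counter(train).most_common(capacity)}
    hits = sum(1 for q in test if q in cached)
    return hits / len(test)

def lru_hit_rate(test, capacity):
    """Hit rate of a dynamic cache that starts empty and evicts the least recently used query."""
    if not test:
        return 0.0
    cache = OrderedDict()
    hits = 0
    for q in test:
        if q in cache:
            hits += 1
            cache.move_to_end(q)           # mark as most recently used
        else:
            cache[q] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(test)

# Toy query logs (hypothetical data, not taken from the paper).
train = ["weather", "news", "weather", "maps", "weather", "news"]
test = ["weather", "maps", "news", "weather", "flights", "news", "weather"]

print(static_hit_rate(train, test, capacity=2))
print(lru_hit_rate(test, capacity=2))
```

On this toy stream the static cache benefits from the stable popularity of a few queries, while the small LRU cache keeps evicting them before they recur. Real query logs behave differently, so the numbers here say nothing about the paper's measurements.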

    Published In

    SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2007
    946 pages
    ISBN:9781595935977
    DOI:10.1145/1277741
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. caching
    2. information retrieval systems
    3. query logs
    4. web search

    Qualifiers

    • Article

    Conference

    SIGIR07: The 30th Annual International SIGIR Conference
    July 23 - 27, 2007
    Amsterdam, The Netherlands

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2024) Postgraduate training opportunities for chiropractors: A description of United States programs. Journal of Chiropractic Education. DOI: 10.7899/JCE-23-23. Online publication date: 23-Jan-2024
    • (2024) Diversity-aware strategies for static index pruning. Information Processing & Management 61:5 (103795). DOI: 10.1016/j.ipm.2024.103795. Online publication date: Sep-2024
    • (2022) An NVM SSD-based High Performance Query Processing Framework for Search Engines. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2022.3160557. Online publication date: 2022
    • (2022) Estimating the Total Volume of Queries to a Search Engine. IEEE Transactions on Knowledge and Data Engineering 34:11 (5351-5363). DOI: 10.1109/TKDE.2021.3054668. Online publication date: 1-Nov-2022
    • (2021) Improving Search Engine Performance Through Dynamic Caching. 2021 40th International Conference of the Chilean Computer Science Society (SCCC) (1-6). DOI: 10.1109/SCCC54552.2021.9650412. Online publication date: 15-Nov-2021
    • (2021) Three-level Compact Caching for Search Engines Based on Solid State Drives. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (16-25). DOI: 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00030. Online publication date: Dec-2021
    • (2020) An NVM SSD-Optimized Query Processing Framework. Proceedings of the 29th ACM International Conference on Information & Knowledge Management (935-944). DOI: 10.1145/3340531.3412010. Online publication date: 19-Oct-2020
    • (2019) Caching Scores for Faster Query Processing with Dynamic Pruning in Search Engines. Proceedings of the 28th ACM International Conference on Information and Knowledge Management (2457-2460). DOI: 10.1145/3357384.3358154. Online publication date: 3-Nov-2019
    • (2019) Less Data Delivers Higher Search Effectiveness for Keyword Queries. Proceedings of the 31st International Conference on Scientific and Statistical Database Management (109-120). DOI: 10.1145/3335783.3335794. Online publication date: 23-Jul-2019
    • (2019) Estimating the Total Volume of Queries to Google. The World Wide Web Conference (1051-1060). DOI: 10.1145/3308558.3313535. Online publication date: 13-May-2019
