research-article

Caching query-biased snippets for efficient retrieval

Authors:

Diego Ceccarelli,

Claudio Lucchese,

Salvatore Orlando,

Raffaele Perego,

Fabrizio Silvestri

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology

Pages 93 - 104

https://doi.org/10.1145/1951365.1951379

Published: 21 March 2011

Abstract

Web search engine result pages contain references to the top-k documents relevant to the query submitted by a user. Each document is represented by a title, a snippet, and a URL. Snippets, i.e., short sentences showing the portions of the document that are relevant to the query, help users select the most interesting results.

The snippet generation process is very expensive, since it may require accessing a number of documents for each issued query. We assert that caching, a popular technique used to enhance performance at various levels of any computing system, can be very effective in this context. We design and experiment with several cache organizations, and we introduce the concept of the supersnippet, that is, the set of sentences in a document that are more likely to answer future queries. We show that supersnippets can be built by exploiting query logs, and that in our experiments a supersnippet cache answers up to 62% of the requests, remarkably outperforming other caching approaches.
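To make the idea of a supersnippet cache concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical `fetch_document` callback that returns a document's sentences, a naive term-overlap score for choosing snippet sentences, a simple "at least one shared term" rule for deciding that a cached supersnippet can answer a query, and LRU replacement over documents. All names, thresholds, and policies here are illustrative assumptions.

```python
from collections import OrderedDict
import re


def terms(text):
    """Lowercased word set of a string (a deliberately naive tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def best_sentences(sentences, query, k=2):
    """Return up to k sentences that share a term with the query,
    ranked by the number of shared terms (ties keep document order)."""
    q = terms(query)
    scored = [(len(q & terms(s)), s) for s in sentences]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [s for _, s in hits[:k]]


class SupersnippetCache:
    """LRU cache mapping doc_id -> sentences that answered past queries."""

    def __init__(self, fetch_document, capacity=1000, max_sentences=10):
        # fetch_document is a hypothetical callback: doc_id -> list of sentences.
        self.fetch_document = fetch_document
        self.capacity = capacity            # maximum number of cached documents
        self.max_sentences = max_sentences  # supersnippet size bound per document
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def snippet(self, doc_id, query):
        cached = self.entries.get(doc_id, [])
        answer = best_sentences(cached, query)
        if answer:
            # Hit: the supersnippet alone answers the query.
            self.hits += 1
            self.entries.move_to_end(doc_id)
            return answer
        # Miss: take the expensive path and read the full document.
        self.misses += 1
        answer = best_sentences(self.fetch_document(doc_id), query)
        # Grow the supersnippet with the newly selected sentences,
        # keeping only the most recently added ones.
        merged = cached + [s for s in answer if s not in cached]
        self.entries[doc_id] = merged[-self.max_sentences:]
        self.entries.move_to_end(doc_id)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used document
        return answer
```

In this sketch, replaying a query log through `snippet` grows each document's supersnippet over time, and the `hits` and `misses` counters give the fraction of requests served without touching the document store. A production design would likely use a stricter quality test than a single shared term before declaring a hit.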

Index Terms

Caching query-biased snippets for efficient retrieval
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
  2. Information systems applications
    1. Data mining

Published In

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology

March 2011

587 pages

ISBN:9781450305280

DOI:10.1145/1951365

Editors:
Anastasia Ailamaki (EPFL, Switzerland),
Sihem Amer-Yahia (Yahoo! Research),
Jignesh Patel (University of Wisconsin-Madison),
Tore Risch (Uppsala University, Sweden),
Pierre Senellart (Télécom ParisTech, France),
Julia Stoyanovich (University of Pennsylvania)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Microsoft Research

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

EDBT/ICDT '11

Sponsor:

Microsoft Research

EDBT/ICDT '11: EDBT/ICDT '11 joint conference

March 21 - 24, 2011

Uppsala, Sweden

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics

Total citations: 19
Total downloads: 204
Downloads (last 12 months): 5
Downloads (last 6 weeks): 0

Reflects downloads up to 30 Aug 2024

Cited By

Zhang R, Sun P, Tong J, Zang R, Qian H, Pan Y, Stones R, Wang G, Liu X, Li Y (2021). Three-level Compact Caching for Search Engines Based on Solid State Drives. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pages 16-25. Online publication date: Dec-2021.
https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00030
Mele I, Tonellotto N, Frieder O, Perego R (2020). Topical result caching in web search engines. Information Processing and Management: an International Journal, 57:3. Online publication date: 1-May-2020.
https://dl.acm.org/doi/10.1016/j.ipm.2019.102193
Zhang Z, Ao N, Wang G, Liu X (2018). An Extensible Search Engine Platform for Efficiency Research. 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pages 676-683. Online publication date: Dec-2018.
https://doi.org/10.1109/BDCloud.2018.00103
Yun T, Whang K, Kwon H, Kim J, Song I (2018). Two-dimensional indexing to provide one-integrated-memory view of distributed memory for a massively-parallel search engine. World Wide Web. Online publication date: 13-Nov-2018.
https://doi.org/10.1007/s11280-018-0647-1
Xu C, Wang Y, Lv P, Xu J (2017). Caching-Aware Techniques for Query Workload Partitioning in Parallel Search Engines. 2017 14th Web Information Systems and Applications Conference (WISA), pages 44-49. Online publication date: Nov-2017.
https://doi.org/10.1109/WISA.2017.33
Dong X, Li R, He H, Gu X, Sarem M, Qiu M, Li K (2017). EDS. Future Generation Computer Systems, 74:C, pages 220-231. Online publication date: 1-Sep-2017.
https://dl.acm.org/doi/10.1016/j.future.2016.02.014
Trinh T, Wu D, Huang J (2017). A New Static Web Caching Mechanism Based on Mutual Dependency Between Result Cache and Posting List Cache. Web Information Systems Engineering – WISE 2017, pages 148-156. Online publication date: 4-Oct-2017.
https://doi.org/10.1007/978-3-319-68786-5_12
Cambazoglu B, Baeza-Yates R (2015). Scalability Challenges in Web Search Engines. Synthesis Lectures on Information Concepts, Retrieval, and Services, 7:6, pages 1-138. Online publication date: 29-Dec-2015.
https://doi.org/10.2200/S00662ED1V01Y201508ICR045
Zhang R, Sun P, Tong J, Stones R, Wang G, Liu X, Baeza-Yates R, Lalmas M, Moffat A, Ribeiro-Neto B (2015). Compact Snippet Caching for Flash-based Search Engines. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1015-1018. Online publication date: 9-Aug-2015.
https://dl.acm.org/doi/10.1145/2766462.2767764
Anagnostopoulos A, Becchetti L, Bordino I, Leonardi S, Mele I, Sankowski P (2015). Stochastic Query Covering for Fast Approximate Document Retrieval. ACM Transactions on Information Systems, 33:3, pages 1-35. Online publication date: 17-Feb-2015.
https://dl.acm.org/doi/10.1145/2699671
