DOI: 10.1145/1951365.1951379
research-article

Caching query-biased snippets for efficient retrieval

Published: 21 March 2011

Abstract

Web search engine result pages contain references to the top-k documents relevant to the query submitted by a user. Each document is represented by a title, a snippet and a URL. Snippets, i.e., short sentences showing the portions of the document that are relevant to the query, help users select the most interesting results.
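To make the notion of a query-biased snippet concrete, here is a minimal Python sketch. It scores each sentence of a document by the number of distinct query terms it contains and returns the best sentences in document order. This is a simple term-overlap heuristic chosen for illustration, not the snippet generator used in the paper; the function names, the regex-based sentence splitter and the `max_sentences` parameter are all assumptions.

```python
import re


def split_sentences(text):
    # Naive sentence splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def terms(text):
    # Lower-cased alphanumeric tokens, returned as a set.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_biased_snippet(document, query, max_sentences=2):
    """Return up to `max_sentences` sentences sharing the most terms with `query`, in document order."""
    q_terms = terms(query)
    sentences = split_sentences(document)
    # Score = number of distinct query terms in the sentence; on ties the earlier sentence wins.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-len(q_terms & terms(sentences[i])), i))
    # Never select a sentence that contains no query term at all.
    chosen = [i for i in ranked[:max_sentences] if q_terms & terms(sentences[i])]
    return [sentences[i] for i in sorted(chosen)]


doc = ("Caching is a popular technique. Snippets help users pick results. "
       "Query logs reveal popular queries. The weather is nice.")
print(query_biased_snippet(doc, "snippets query logs"))
# prints ['Snippets help users pick results.', 'Query logs reveal popular queries.']
```

Production systems typically use richer sentence features; the sketch only shows why the snippet depends on the query and why producing it requires access to the document text.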
The snippet generation process is very expensive, since it may require accessing a number of documents for each query issued. We argue that caching, a popular technique used to enhance performance at various levels of computing systems, can be very effective in this context. We design and experiment with several cache organizations, and we introduce the concept of a supersnippet, i.e., the set of sentences in a document that are most likely to answer future queries. We show that supersnippets can be built by exploiting query logs, and that in our experiments a supersnippet cache answers up to 62% of the requests, considerably outperforming other caching approaches.
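The following Python sketch illustrates, under simplifying assumptions, how a supersnippet cache might operate. It builds a per-document supersnippet by keeping the `budget` sentences that matched logged queries most often, stores supersnippets in an LRU cache keyed by document id, and counts a request as a hit only when every query term occurs in the cached supersnippet. The frequency heuristic, the coverage test for hits, the LRU policy and all names (`build_supersnippet`, `SupersnippetCache`, `budget`, `capacity`) are hypothetical; they are not the construction or the cache organizations evaluated by the authors.

```python
import re
from collections import Counter, OrderedDict


def split_sentences(text):
    # Naive sentence splitter (assumption).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def terms(text):
    # Lower-cased alphanumeric tokens, returned as a set.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_supersnippet(document, past_queries, budget=3):
    """Hypothetical heuristic: keep the `budget` sentences matched most often by logged queries."""
    sentences = split_sentences(document)
    matches = Counter()
    for query in past_queries:
        q_terms = terms(query)
        for i, sentence in enumerate(sentences):
            if q_terms & terms(sentence):
                matches[i] += 1
    # Most frequently matched first; ties go to the earlier sentence.
    top = sorted(matches, key=lambda i: (-matches[i], i))[:budget]
    return [sentences[i] for i in sorted(top)]  # restore document order


class SupersnippetCache:
    """LRU cache mapping a document id to its supersnippet (a list of sentences)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def put(self, doc_id, supersnippet):
        self.entries[doc_id] = supersnippet
        self.entries.move_to_end(doc_id)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used document

    def snippet(self, doc_id, query):
        """Return cached sentences answering `query`, or None on a miss."""
        q_terms = terms(query)
        sentences = self.entries.get(doc_id)
        if sentences is not None and q_terms:
            covered = set().union(*(terms(s) for s in sentences))
            # Assumed hit criterion: every query term occurs somewhere in the supersnippet.
            if q_terms <= covered:
                self.entries.move_to_end(doc_id)
                self.hits += 1
                return [s for s in sentences if q_terms & terms(s)]
        self.misses += 1
        return None  # caller would fetch the full document and build the snippet from it


doc = ("Caching is a popular technique. Snippets help users pick results. "
       "Query logs reveal popular queries. The weather is nice.")
log = ["query logs", "caching snippets", "popular queries"]
cache = SupersnippetCache(capacity=1)
cache.put("d1", build_supersnippet(doc, log, budget=2))
print(cache.snippet("d1", "query logs"))  # hit
print(cache.snippet("d1", "snippets"))    # miss: that sentence is not in the supersnippet
```

On a miss, the caller falls back to fetching the full document and generating the snippet as usual. The design point is that a small, query-log-driven subset of sentences may serve a share of requests without touching the document store.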

Cited By

  • (2021) Three-level Compact Caching for Search Engines Based on Solid State Drives. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), pp. 16-25. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00030. Online publication date: Dec-2021.
  • (2020) Topical result caching in web search engines. Information Processing and Management: an International Journal, 57:3. DOI: 10.1016/j.ipm.2019.102193. Online publication date: 1-May-2020.
  • (2018) An Extensible Search Engine Platform for Efficiency Research. 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), pp. 676-683. DOI: 10.1109/BDCloud.2018.00103. Online publication date: Dec-2018.

Published In

ACM Other conferences
EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
March 2011
587 pages
ISBN:9781450305280
DOI:10.1145/1951365
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • Microsoft Research

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. caching
  2. efficiency
  3. snippet generation
  4. throughput
  5. web search engines

Conference

EDBT/ICDT '11
EDBT/ICDT '11 joint conference
March 21 - 24, 2011
Uppsala, Sweden

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0
Reflects downloads up to 30 Aug 2024

Citations

  • (2018) Two-dimensional indexing to provide one-integrated-memory view of distributed memory for a massively-parallel search engine. World Wide Web. DOI: 10.1007/s11280-018-0647-1. Online publication date: 13-Nov-2018.
  • (2017) Caching-Aware Techniques for Query Workload Partitioning in Parallel Search Engines. 2017 14th Web Information Systems and Applications Conference (WISA), pp. 44-49. DOI: 10.1109/WISA.2017.33. Online publication date: Nov-2017.
  • (2017) EDS. Future Generation Computer Systems, 74:C, pp. 220-231. DOI: 10.1016/j.future.2016.02.014. Online publication date: 1-Sep-2017.
  • (2017) A New Static Web Caching Mechanism Based on Mutual Dependency Between Result Cache and Posting List Cache. Web Information Systems Engineering – WISE 2017, pp. 148-156. DOI: 10.1007/978-3-319-68786-5_12. Online publication date: 4-Oct-2017.
  • (2015) Scalability Challenges in Web Search Engines. Synthesis Lectures on Information Concepts, Retrieval, and Services, 7:6, pp. 1-138. DOI: 10.2200/S00662ED1V01Y201508ICR045. Online publication date: 29-Dec-2015.
  • (2015) Compact Snippet Caching for Flash-based Search Engines. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1015-1018. DOI: 10.1145/2766462.2767764. Online publication date: 9-Aug-2015.
  • (2015) Stochastic Query Covering for Fast Approximate Document Retrieval. ACM Transactions on Information Systems, 33:3, pp. 1-35. DOI: 10.1145/2699671. Online publication date: 17-Feb-2015.
