Efficient index-based snippet generation

H Bast, M Celikik - ACM Transactions on Information Systems (TOIS), 2014 - dl.acm.org

Ranked result lists with query-dependent snippets have become state of the art in text search. They are typically implemented by searching, at query time, for occurrences of the query words in the top-ranked documents. This document-based approach has three inherent problems: (i) when a document is indexed by terms it does not contain literally (e.g., related words or spelling variants), localizing the corresponding snippets becomes problematic; (ii) each query operator (e.g., phrase or proximity search) has to be implemented twice: once on the index side, to compute the correct result set, and once on the snippet-generation side, to generate the appropriate snippets; and (iii) in the worst case, the whole document must be scanned for occurrences of the query words, which can be problematic for very long documents.
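
The document-based approach described above can be sketched roughly as follows. This is an illustrative Python sketch, not code from the paper; the function name `document_based_snippet`, whitespace tokenization, and the fixed word window are all assumptions. It scans the document linearly for a literal query-word match, which exhibits problem (iii), and it finds nothing when a document matches only through a spelling variant, which is problem (i).

```python
import re
from typing import List, Optional


def document_based_snippet(text: str, query_words: List[str], window: int = 5) -> Optional[str]:
    """Hypothetical document-based snippet generator (illustration only).

    Scans the whole document for the first literal occurrence of any query
    word and returns up to `window` words on each side of it. In the worst
    case every token is inspected, i.e. time linear in the document length.
    """
    tokens = text.split()
    targets = {w.lower() for w in query_words}
    for i, tok in enumerate(tokens):
        # Normalize the token: lowercase and strip non-word characters.
        if re.sub(r"\W", "", tok.lower()) in targets:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            return " ".join(tokens[lo:hi])
    # No literal occurrence: the snippet cannot be localized this way.
    return None
```

For example, `document_based_snippet("The colour of the sky.", ["color"])` returns `None`, even if the index had mapped "color" to this document as a spelling variant.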

We present a new index-based method that localizes snippets using only information computed from the index, and that overcomes all three problems. Unlike previous index-based methods, it achieves this at essentially no extra cost in query processing time, by a technique we call operator inversion. We also show how our index-based method allows caching individual segments instead of complete documents, which yields a significantly higher cache hit ratio than the document-based approach. We have fully integrated our implementation with the CompleteSearch engine.
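
As a rough illustration of the index-based idea (not the paper's operator inversion technique, whose details the abstract does not give), the following Python sketch assumes a hypothetical positional index whose postings carry a segment id. The names `build_index`, `query_and_locate`, and `SegmentCache`, the posting layout, and the pre-segmentation of documents are all assumptions. The segments holding query words are read off the postings processed for the query, so no document is scanned, and a small LRU cache then stores individual segments rather than whole documents.

```python
from collections import OrderedDict, defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical posting layout (assumption): (doc_id, segment_id, word_position).
Posting = Tuple[int, int, int]


def build_index(docs: Dict[int, List[str]]) -> Dict[str, List[Posting]]:
    """Build a positional index; documents are assumed to be pre-split into segments."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for doc_id, segments in docs.items():
        pos = 0
        for seg_id, segment in enumerate(segments):
            for word in segment.lower().split():
                index[word].append((doc_id, seg_id, pos))
                pos += 1
    return dict(index)


def query_and_locate(index: Dict[str, List[Posting]], words: List[str]) -> Dict[int, Set[int]]:
    """Conjunctive (AND) query. For each matching document, return the ids of the
    segments containing query words, taken from the postings alone (no document scan)."""
    postings = [index.get(w.lower(), []) for w in words]
    if not postings:
        return {}
    # Result set: documents that appear in every query word's posting list.
    matching = set.intersection(*({p[0] for p in plist} for plist in postings))
    located: Dict[int, Set[int]] = defaultdict(set)
    for plist in postings:
        for doc_id, seg_id, _pos in plist:
            if doc_id in matching:
                located[doc_id].add(seg_id)
    return dict(located)


class SegmentCache:
    """LRU cache keyed by (doc_id, segment_id) instead of by whole document."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Tuple[int, int], str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Tuple[int, int], loader: Callable[[Tuple[int, int]], str]) -> str:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = loader(key)  # e.g. read just this one segment from storage
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used segment
        return value
```

For example, with `docs = {1: ["search engines rank results", "snippets show context"], 2: ["cache hit ratio", "search snippets fast"]}`, `query_and_locate(build_index(docs), ["search", "snippets"])` returns `{1: {0, 1}, 2: {1}}`, so only those three segments need to be fetched or served from the cache. Keying the cache by segment means that a popular passage can stay cached even when the rest of its document is never requested.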
