Content area
Full Text
Inf Retrieval (2008) 11:269286
DOI 10.1007/s10791-008-9048-x
Holger Bast Christian W. Mortensen
Ingmar Weber
Received: 23 January 2008 / Accepted: 23 January 2008 / Published online: 4 March 2008 The Author(s) 2008
Abstract We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the best such hits. The following problem is at the core of this feature: for a xed document collection, given a set D of documents, and an alphabetical range W of words, compute the set of all word-in-document pairs (w, d) from the collection such that w [ W and d [ D. We present a new data structure with the help of which such autocompletion queries can be processed, on the average, in time linear in the input plus output size, independent of the size of the underlying document collection. At the same time, our data structure uses no more space than an inverted index. Actual query processing times on a large test collection correlate almost perfectly with our theoretical bound.
Keywords Autocompletion Index data structure Prex search Output-sensitive
1 Introduction
Autocompletion, in its most basic form, is the following mechanism: the user types the rst few letters of some word, and either by pressing a dedicated key or automatically after each keystroke a procedure is invoked that displays all relevant words that are continuations of the typed sequence. The most prominent example of this feature is the tab-completion mechanism in a Unix shell. In the recently launched Google Suggest service completion is to frequently asked queries. Algorithmically, this basic form of autocompletion is easy: it merely requires two searches in a precompiled sorted list of strings, in order to nd the rst and the last completion from that list.
H. Bast C. W. Mortensen I. Weber (&)
Max-Planck-Institut fr Informatik, Campus E1.4, 66123 Saarbrucken, Germany e-mail: [email protected]
Output-sensitive autocompletion search
123
270 Inf Retrieval (2008) 11:269286
1.1 Problem denition
The problem we consider in this paper is derived from a more sophisticated form of autocompletion, which takes into account the context in which the to-be-completed word has been typed. Here, we would...