Inverted Index
Recent papers in Inverted Index
On-line keyword searching from documents in Chinese tends to use inverted indexing as the main technique, which has its difficulties. Suffix Array is widely used for processing text in Western languages. However, it fails to get widely... more
Modern web search engines are expected to return the top-k results efficiently. Although many dynamic index pruning strategies have been proposed for efficient top-k computation, most of them are prone to ignoring some especially... more
This talk discusses the design and the application of indexing in search engines and big data platforms.
Text Information Retrieval (TIR) is considered the heart of many applications, such as Document Management Systems (DMS). TIR used for DMS requires different data structure techniques than TIR used in search engines. Search... more
More and more (semi) structured information is becoming available on the web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). There are already hundreds of millions of such documents accessible and... more
To make keyword and full-text searches accurate and fast, it is recommended to index the words in the corpus. One way to do this is to use an inverted index to maintain, in a structured form, the word occurrences in a set of... more
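As a rough illustration of the idea in the abstract above (not the paper's own method), a minimal inverted index can be sketched in Python. The function names `build_inverted_index` and `search`, the whitespace tokenizer, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization; real systems also
        # handle punctuation, stemming, and stop words.
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every word in the query (AND semantics)."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical sample corpus, used only for illustration.
docs = {
    1: "inverted index for keyword search",
    2: "suffix array for text processing",
    3: "keyword search with an inverted index",
}
index = build_inverted_index(docs)
```

With this sketch, `search(index, "inverted index")` intersects the posting lists of both words and returns only the documents that contain each of them.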
We examine index representation techniques for document-based inverted files, and present a mechanism for compressing them using word-aligned binary codes. The new approach allows extremely fast decoding of inverted lists during query... more
Previous compact representations of permutations have focused on adding a small index on top of the plain data 〈π(1),π(2),...π(n)〉, in order to efficiently support the application of the inverse or the iterated permutation. In this paper... more
We demonstrate a parallel implementation of a sparse matrix information retrieval engine. We use a shared-nothing PC cluster. We perform our experiments with TREC disk 4 and 5 data, a NIST 2 Gigabytes standard benchmark text collection on 2,... more
We propose XIR, a novel method for processing partial match queries on heterogeneous XML documents using information retrieval (IR) techniques. A partial match query is defined as one having the descendant-or-self axis “//” in its... more
The Web contains a large number of documents and an increasing quantity of structured data in the form of RDF triples. Many of these triples are annotations associated with documents. While structured queries constitute the principal... more
The Turkish National Corpus (TNC), which released its first version in 2012, is the first large-scale (50 million words), web-based, and publicly available free resource of contemporary Turkish. It is designed to be a well-balanced and representative... more
NoSQL systems are increasingly deployed as back-end infrastructure for large-scale distributed online platforms like Google, Amazon or Facebook. Their applicability results from the fact that most services of online platforms access the... more
Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. Spatial databases store large amounts of space-related data, such as maps and preprocessed remote sensing or medical imaging data. Modern mobile... more
We describe a new formulation of appearance-only SLAM suitable for very large scale place recognition. The system navigates in the space of appearance, assigning each new observation to either a new or a previously visited location,... more
During the past few years, the commercial Web search engines have augmented their underlying index structures by significantly enriching the information which describes the appearance of a word within a document (Dean, 2009) [7]. This... more
We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every keystroke display those completions of the last query word that would lead to the best hits, and also display the... more