Text Indexing

Aluru, Srinivas

doi:10.1007/978-0-387-30162-4_422

Keywords and Synonyms

String indexing

Problem Definition

Text or string data arises naturally in many contexts, including document processing, information retrieval, natural and computer language processing, and the description of molecular sequences. In broad terms, the goal of text indexing is to design methods for storing text data so that queries on it can be answered significantly faster. While text indexing has been studied for a long time, it rose to prominence during the last decade owing to the ubiquity of web-based textual data and the search engines used to explore it, the design of digital libraries for archiving human knowledge, and the application of string techniques to further the understanding of modern biology. Text indexing differs from the typical indexing of keys drawn from an underlying total order: text data can have varying lengths, and queries are often more complex, involving substrings, partial matches, or approximate matches.
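As a concrete illustration of indexing text for substring queries, the following minimal sketch builds a suffix array over a text once and then answers each query by binary search. The suffix array is used here only as one well-known example of a text index, not as the method this entry prescribes. The function names `build_suffix_array` and `find_occurrences` are hypothetical, and the construction is deliberately naive, so it is suitable only for short inputs.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction (an assumption made for brevity): sorting compares
    whole suffixes, so this is far from the linear-time bound that
    specialized construction algorithms achieve.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of every occurrence of `pattern` in `text`."""
    m = len(pattern)

    # First binary search: leftmost suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Second binary search: leftmost suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes in sa[start:lo] are exactly those that begin with the pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "ana"))
```

In this sketch, each query performs O(log |T|) comparisons of at most |P| characters each, so the index is built once and every later substring query avoids scanning the whole text.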

Queries on text data are as varied as...

Notes

1. Recently, Cole et al. (2006) showed how to further reduce the search time to $O(|P| + \log |\Sigma|)$ while still keeping the optimal $O(|T|)$ space.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA
Srinivas Aluru

Editor information

Editors and Affiliations

Department of Electrical Engineering and Computer Science, McCormick School of Engineering and Applied Science, Northwestern University, Evanston, IL, 60208, USA
Ming-Yang Kao, Professor of Computer Science

Cite this entry

Aluru, S. (2008). Text Indexing. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30162-4_422

DOI: https://doi.org/10.1007/978-0-387-30162-4_422
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30770-1
Online ISBN: 978-0-387-30162-4
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
