Text Indexing

1993; Manber, Myers

Reference work entry, Encyclopedia of Algorithms

Keywords and Synonyms

String indexing      

Problem Definition

Text or string data arises naturally in many contexts, including document processing, information retrieval, natural and computer language processing, and the description of molecular sequences. In broad terms, the goal of text indexing is to design methodologies for storing text data so that queries on it can be answered significantly faster. While text indexing has been studied for a long time, it shot into prominence during the last decade owing to the ubiquity of web-based textual data and the search engines used to explore it, the design of digital libraries for archiving human knowledge, and the application of string techniques to further the understanding of modern biology. Text indexing differs from the typical indexing of keys drawn from an underlying total order: text data can have varying lengths, and queries are often more complex, involving substrings, partial matches, or approximate matches.
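To make the contrast with key indexing concrete, the following sketch builds a suffix array (the structure introduced by Manber and Myers) for a small text and answers a substring query by binary search. Every occurrence of a pattern P starts a suffix that has P as a prefix, and in lexicographic order all such suffixes form one contiguous block, so a substring query becomes a search in a sorted list. This is a minimal sketch under stated assumptions: the function names `build_suffix_array` and `find_occurrences` are hypothetical, and construction simply sorts the suffixes, a naive substitute for the Manber and Myers construction algorithm rather than a reproduction of it.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction (an assumption of this sketch, not the Manber-Myers
    algorithm): sort positions by the suffix itself. Keys are materialized,
    so this uses quadratic memory and roughly O(n^2 log n) time.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of `pattern` in `text`, found via the suffix array `sa`.

    The length-m prefixes of the sorted suffixes are themselves non-decreasing,
    so two binary searches locate the block of suffixes equal to `pattern` on
    their first m characters. Each step compares at most m characters.
    """
    m = len(pattern)

    # First index whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First index whose length-m prefix is > pattern (search continues from `start`).
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes in sa[start:lo] begin with `pattern`; report positions in text order.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                                  # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))   # [1, 3]
```

Each binary-search step compares at most |P| characters, so a query in this sketch costs O(|P| log |T|) time plus the time to report the occurrences. Manber and Myers reduce the search to O(|P| + log |T|) by precomputing longest-common-prefix information, which this sketch omits.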

Queries on text data are as varied as...

Notes

  1. Recently, Cole et al. (2006) showed how to further reduce the search time to \( O(|P| + \log |\Sigma|) \) while still keeping the optimal \( O(|T|) \) space.

Recommended Reading

  1. Abouelhoda, M., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discret. Algorithms 2, 53–86 (2004)

  2. Aluru, S. (ed.): Handbook of Computational Molecular Biology. Computer and Information Science Series. Chapman and Hall/CRC Press, Boca Raton (2005)

  3. Amir, A., Kopelowitz, T., Lewenstein, M., Lewenstein, N.: Towards real-time suffix tree construction. In: Proc. String Processing and Information Retrieval Symposium (SPIRE), 2005, pp. 67–78

  4. Ciriani, V., Ferragina, P., Luccio, F., Muthukrishnan, S.: A data structure for a sequence of string accesses in external memory. ACM Trans. Algorithms 3 (2007)

  5. Crescenzi, P., Grossi, R., Italiano, G.: Search data structures for skewed strings. In: International Workshop on Experimental and Efficient Algorithms (WEA). Lecture Notes in Computer Science, vol. 2, pp. 81–96. Springer, Berlin (2003)

  6. Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific Publishing Company, Singapore (2002)

  7. Ferragina, P., Grossi, R.: Optimal On-Line Search and Sublinear Time Update in String Matching. SIAM J. Comput. 3, 713–736 (1998)

  8. Franceschini, G., Grossi, R.: A general technique for managing strings in comparison‐driven data structures. In: Annual International Colloquium on Automata, Languages and Programming (ICALP), 2004

  9. Grossi, R., Italiano, G.: Efficient techniques for maintaining multidimensional keys in linked data structures. In: Annual International Colloquium on Automata, Languages and Programming (ICALP), 1999, pp. 372–381

  10. Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York (1997)

  11. Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM 53, 918–936 (2006)

  12. Kasai, T., Lee, G., Arimura, H. et al.: Linear-time longest‐common‐prefix computation in suffix arrays and its applications. In: Proc. 12th Annual Symposium, Combinatorial Pattern Matching (CPM), 2001, pp. 181–192

  13. Ko, P., Aluru, S.: Space efficient linear time construction of suffix arrays. J. Discret. Algorithms 3, 143–156 (2005)

  14. Ko, P., Aluru, S.: Optimal self-adjusting tree for dynamic string data in secondary storage. In: Proc. String Processing and Information Retrieval Symposium (SPIRE). Lecture Notes in Computer Science, vol. 4726, pp. 184–194, Santiago, Chile (2007)

  15. Manber, U., Myers, G.: Suffix arrays: a new method for on-line search. SIAM J. Comput. 22, 935–948 (1993)

  16. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33, 31–88 (2001)

Copyright information

© 2008 Springer-Verlag

About this entry

Cite this entry

Aluru, S. (2008). Text Indexing. In: Kao, MY. (eds) Encyclopedia of Algorithms. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30162-4_422
