Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Compressed Indexes for Approximate String Matching

  • Conference paper
Algorithms – ESA 2006 (ESA 2006)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4168))

Included in the following conference series:

Abstract

We revisit the problem of indexing a string S[1..n] to support searching all substrings in S that match a given pattern P[1..m] with at most k errors. Previous solutions either require an index of size exponential in k or need Ω(m k) time for searching. Motivated by the indexing of DNA sequences, we investigate space efficient indexes that occupy only O(n) space. For k = 1, we give an index to support matching in O(m + occ + logn loglogn) time. The previously best solution achieving this time complexity requires an index of size O(n logn). This new index can be used to improve existing indexes for k ≥2 errors. Among others, it can support matching with k=2 errors in O(m logn loglogn + occ) time.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Indexing and dictionary matching with one error. In: Dehne, F., Gupta, A., Sack, J.-R., Tamassia, R. (eds.) WADS 1999. LNCS, vol. 1663, pp. 181–192. Springer, Heidelberg (1999)

    Chapter  Google Scholar 

  2. Buchsbaum, A.L., Goodrich, M.T., Westbrook, J.R.: Range searching over tree cross products. In: European Symposium on Algorithms, pp. 120–131 (2000)

    Google Scholar 

  3. Chavez, E., Navarro, G.: A metric index for approximate string matching. In: Proceedings of Latin American Theoretical Informatics, pp. 181–195 (2002)

    Google Scholar 

  4. Cobbs, A.: Fast approximate matching using suffix trees. In: Proceedings of Symposium on Combinatorial Pattern Matching, pp. 41–54 (1995)

    Google Scholar 

  5. Cole, R., Gottlieb, L.A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of Symposium on Theory of Computing, pp. 91–100 (2004)

    Google Scholar 

  6. Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A Linear-Size Index for Approximate Pattern Matching. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 49–59. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  7. Ferragina, P., Manzini, G.: Opportunistic Data Structures with Applications. In: Symposium on Foundations of Computer Science, pp. 390–398 (2000)

    Google Scholar 

  8. Grossi, R., Vitter, J.S.: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. In: Proceedings of Symposium on Theory of Computing, pp. 397–406 (2000)

    Google Scholar 

  9. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing 13(2), 338–355 (1984)

    Article  MATH  MathSciNet  Google Scholar 

  10. Huynh, T.N.D., Hon, W.K., Lam, T.W., Sung, W.K.: Approximate string matching using compressed suffix arrays. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 434–444. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  11. Lam, T.W., Sung, W.K., Wong, S.S.: Improved approximate string matching using compressed suffix data structures. In: Proceedings of International Symposium on Algorithms and Computation (2005)

    Google Scholar 

  12. Maaß, M.G., Nowak, J.: Text indexing with errors. Technical Report TUM-10503, Fakultät für Informatik, TU München (March 2005)

    Google Scholar 

  13. Manber, U., Myers, G.: Suffix Arrays: A New Method for On-Line String Searches. SIAM Journal on Computing 22(5), 935–948 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  14. McCreight, E.M.: A Space-economical Suffix Tree Construction Algorithm. Journal of the ACM 23(2), 262–272 (1976)

    Article  MATH  MathSciNet  Google Scholar 

  15. Munro, J.I.: Tables. In: Proceedings of Conference on Foundations of Software Technology and Computer Science, pp. 37–42 (1996)

    Google Scholar 

  16. Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: Proceedings of 13th Symposium on Discrete Algorithms, pp. 657–666 (2002)

    Google Scholar 

  17. Navarro, G., Baeza-Yates, R.: A Hybrid Indexing Method for Approximate String Matching. J. Discrete Algorithms 1(1), 205–209 (2000)

    MathSciNet  Google Scholar 

  18. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proceedings of ACM-SIAM Symposium on Discrete Algorithms, pp. 233–242 (2002)

    Google Scholar 

  19. Sadakane, K.: Succinct representations of lcp information and improvements in the compressed suffix arrays. In: Proceedings of ACM-SIAM Symposium on Discrete Algorithms, pp. 225–232 (2002)

    Google Scholar 

  20. Weiner, P.: Linear Pattern Matching Algorithms. In: Proceedings of Symposium on Switching and Automata Theory, pp. 1–11 (1973)

    Google Scholar 

  21. Willard, D.E.: Log-Logarithmic worst-case range queries are possible in space Θ(n). Information Processing Letters 17(2), 81–84 (1983)

    Article  MATH  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chan, HL., Lam, TW., Sung, WK., Tam, SL., Wong, SS. (2006). Compressed Indexes for Approximate String Matching. In: Azar, Y., Erlebach, T. (eds) Algorithms – ESA 2006. ESA 2006. Lecture Notes in Computer Science, vol 4168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11841036_21

Download citation

  • DOI: https://doi.org/10.1007/11841036_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-38875-3

  • Online ISBN: 978-3-540-38876-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics