Abstract
We revisit the problem of indexing a string S[1..n] to support finding all substrings of S that match a given pattern P[1..m] with at most k errors. Previous solutions either require an index of size exponential in k or need Ω(m^k) time for searching. Motivated by the indexing of DNA sequences, we investigate space-efficient indexes that occupy only O(n) space. For k = 1, we give an index that supports matching in O(m + occ + log n log log n) time, where occ is the number of occurrences reported. The previously best solution achieving this time complexity requires an index of size O(n log n). This new index can be used to improve existing indexes for k ≥ 2 errors. Among others, it can support matching with k = 2 errors in O(m log n log log n + occ) time.
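To make the problem statement concrete, the following is a minimal sketch of an index-free baseline, not the index described in this paper. It assumes that "errors" means edit distance (insertions, deletions, substitutions) and uses the classical dynamic-programming scan over the text, which takes O(nm) time per query. The function name `approx_end_positions` is hypothetical and chosen only for illustration. It reports the 1-based end positions in S at which some substring matches P with at most k errors.

```python
def approx_end_positions(S: str, P: str, k: int) -> list[int]:
    """Return 1-based end positions j such that some substring of S
    ending at j is within edit distance k of P.

    Brute-force O(n*m) scan for illustration only; it builds no index.
    """
    m = len(P)
    # col[i] = min edit distance between P[:i] and any substring of S
    # ending at the current text position (initially: the empty prefix of S).
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(S, start=1):
        prev_diag = col[0]  # value for (i-1, j-1)
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # value for (i, j-1)
            col[i] = min(
                prev_diag + (P[i - 1] != c),  # substitution or match
                old + 1,                      # extra character in S
                col[i - 1] + 1,               # missing character in S
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_end_positions("abcde", "bcd", 0))
    print(approx_end_positions("abcde", "bcd", 1))
```

An index in the sense of the abstract aims to answer the same kind of query without rescanning S, so that the query time depends on m and occ rather than on n.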
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Chan, HL., Lam, TW., Sung, WK., Tam, SL., Wong, SS. (2006). Compressed Indexes for Approximate String Matching. In: Azar, Y., Erlebach, T. (eds) Algorithms – ESA 2006. ESA 2006. Lecture Notes in Computer Science, vol 4168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11841036_21
DOI: https://doi.org/10.1007/11841036_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38875-3
Online ISBN: 978-3-540-38876-0
eBook Packages: Computer Science (R0)