DOI: 10.5555/3310435.3310506
research-article

Efficiently approximating edit distance between pseudorandom strings

Published: 06 January 2019

Abstract

We present an algorithm for approximating the edit distance ed(x, y) between two strings x and y in time parameterized by the degree to which one of the strings, x, satisfies a natural pseudorandomness property. The pseudorandomness model is asymmetric in that no requirements are placed on the second string y, which may be constructed by an adversary with full knowledge of x.
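For reference, the exact quantity being approximated can be computed by the textbook Wagner-Fischer dynamic program in O(|x|·|y|) time. The Python sketch below is that classical quadratic baseline, not the approximation algorithm described in this abstract; the function name `edit_distance` is illustrative.

```python
def edit_distance(x: str, y: str) -> int:
    """Exact edit distance ed(x, y): the minimum number of insertions,
    deletions, and substitutions turning x into y (Wagner-Fischer DP,
    O(|x| * |y|) time, O(|y|) extra space)."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i in range(1, len(x) + 1):
        curr = [i] + [0] * len(y)   # x[:i] vs. the empty prefix of y
        for j in range(1, len(y) + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete x[i-1]
                          curr[j - 1] + 1,     # insert y[j-1]
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[len(y)]


print(edit_distance("kitten", "sitting"))  # 3
```

The quadratic cost of this exact computation is what motivates approximation algorithms whose runtime depends on properties of x.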
We say that x is (p, B)-pseudorandom if all pairs a and b of disjoint B-letter substrings of x satisfy ed(a, b) ≥ pB. Given parameters p and B, our algorithm computes the edit distance between a (p, B)-pseudorandom string x and an arbitrary string y within a factor of O(1/p) in time Õ(nB), with high probability. If x is generated at random, then with high probability it will be (Ω(1), O(log n))-pseudorandom, allowing us to compute ed(x, y) within a constant factor in near-linear time. For strings x of varying degrees of pseudorandomness, our algorithm offers a continuum of runtimes.
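The (p, B)-pseudorandomness condition can be checked directly from its definition on small inputs. The sketch below is a brute-force illustration, assuming that two B-letter substrings x[i:i+B] and x[j:j+B] count as disjoint exactly when their position ranges do not overlap (|i - j| ≥ B). The names `edit_distance` and `is_pseudorandom` are hypothetical, and the O(n²B²) running time is far from the Õ(nB) bound claimed for the approximation algorithm.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Exact edit distance via the Wagner-Fischer DP."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[len(b)]


def is_pseudorandom(x: str, p: float, B: int) -> bool:
    """Brute-force test of (p, B)-pseudorandomness: every pair of disjoint
    length-B substrings a, b of x must satisfy ed(a, b) >= p * B.
    Assumption: x[i:i+B] and x[j:j+B] are disjoint iff |i - j| >= B."""
    starts = range(len(x) - B + 1)
    for i, j in combinations(starts, 2):  # all start pairs with i < j
        if j - i < B:  # the two substrings overlap, so they are not compared
            continue
        if edit_distance(x[i:i + B], x[j:j + B]) < p * B:
            return False
    return True


print(is_pseudorandom("abcdefgh", 1.0, 2))  # True: no letter repeats
print(is_pseudorandom("abcdab", 1.0, 2))    # False: "ab" occurs twice
```

Any repeated B-letter block immediately violates the condition, which is why highly repetitive strings are far from pseudorandom at small block sizes.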
Our algorithm is robust in the sense that it can handle a small portion of x being adversarial (i.e., not satisfying the pseudorandomness property). In this case, the algorithm incurs an additive approximation error proportional to the fraction of x which behaves maliciously.
The asymmetry of our pseudorandomness model has particular appeal for the case where x is a source string, meaning that ed(x, y) will be computed for many strings y. Suppose that one wishes to achieve an O(α)-approximation for each ed(x, y) computation, and that B is the smallest block-size for which the string x is (1/α, B)-pseudorandom. We show that without knowing B beforehand, x may be preprocessed in time [MATH HERE], so that all future computations of the form ed(x, y) may be O(α)-approximated in time Õ(nB). Furthermore, for the special case where only a single ed(x, y) computation will be performed, we show how to achieve an O(α)-approximation in time Õ(n^{4/3} B^{2/3}).
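To make "the smallest block-size B for which x is (1/α, B)-pseudorandom" concrete, the sketch below scans B = 1, 2, 3, and so on, returning the first block size that passes a brute-force check. It only illustrates the definition and is not the preprocessing procedure the abstract refers to. The helper names and the convention of returning `None` when only vacuous block sizes remain (B > n/2, where no two disjoint B-letter substrings exist) are assumptions.

```python
from itertools import combinations
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Exact edit distance via the Wagner-Fischer DP."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[len(b)]


def passes(x: str, alpha: float, B: int) -> bool:
    """Brute-force check that x is (1/alpha, B)-pseudorandom: every pair of
    non-overlapping length-B substrings has edit distance >= B / alpha.
    Compares alpha * ed < B to avoid dividing by alpha."""
    for i, j in combinations(range(len(x) - B + 1), 2):
        if j - i >= B and alpha * edit_distance(x[i:i + B], x[j:j + B]) < B:
            return False
    return True


def smallest_block_size(x: str, alpha: float) -> Optional[int]:
    """Hypothetical helper: smallest B for which x is (1/alpha, B)-pseudorandom.
    Only B <= len(x) // 2 is scanned; larger B hold vacuously because no two
    disjoint length-B substrings exist, so None means no non-vacuous B works."""
    for B in range(1, len(x) // 2 + 1):
        if passes(x, alpha, B):
            return B
    return None


print(smallest_block_size("aabb", 1.0))  # 2: B = 1 fails on the two a's
```

A linear scan is used rather than doubling because this sketch does not rely on pseudorandomness being monotone in B.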

      Published In

      SODA '19: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms
      January 2019
      2993 pages

      Sponsors

      • SIAM Activity Group on Discrete Mathematics

      Publisher

      Society for Industrial and Applied Mathematics

      United States

      Conference

      SODA '19
      SODA '19: Symposium on Discrete Algorithms
      January 6-9, 2019
      San Diego, California

      Acceptance Rates

      Overall Acceptance Rate 411 of 1,322 submissions, 31%

      Bibliometrics

      • Total Citations: 0
      • Total Downloads: 66
      • Downloads (Last 12 months): 4
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 25 Dec 2024