DOI: 10.1145/780542.780590
Article

A sublinear algorithm for weakly approximating edit distance

Published: 09 June 2003

Abstract

We show how to determine, in sublinear time, whether the edit distance between two given strings is small. Specifically, we present a test which, given two $n$-character strings $A$ and $B$, runs in time $o(n)$ and with high probability returns "CLOSE" if their edit distance is $O(n^{\alpha})$ and "FAR" if their edit distance is $\Omega(n)$, where $\alpha$ is a fixed parameter less than 1. Our algorithm for testing the edit distance works by recursively subdividing the strings $A$ and $B$ into smaller substrings and looking for pairs of substrings in $A$, $B$ with small edit distance. To do this, we query both strings at random places using a special technique for economizing on the samples, which does not pick the samples independently and provides better query and overall complexity. As a result, our test runs in time $\tilde{O}(n^{\max(\alpha/2,\, 2\alpha - 1)})$ for any fixed $\alpha < 1$. Our algorithm thus provides a trade-off between accuracy and efficiency that is particularly useful when the input data is very large.

We also show a lower bound of $\Omega(n^{\alpha/2})$ on the query complexity of every algorithm that distinguishes pairs of strings with edit distance at most $n^{\alpha}$ from those with edit distance at least $n/6$.
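A line of arithmetic unpacks the running-time exponent quoted in the abstract. This is our own check, assuming $0 < \alpha < 1$, and is not quoted from the paper. The two terms of the maximum cross at $\alpha = 2/3$, and the exponent stays below 1 for every fixed $\alpha < 1$:

```latex
e(\alpha) = \max\left(\frac{\alpha}{2},\; 2\alpha - 1\right)
          = \begin{cases}
              \alpha/2,    & 0 < \alpha \le 2/3,\\
              2\alpha - 1, & 2/3 \le \alpha < 1,
            \end{cases}
\qquad\text{since } \frac{\alpha}{2} \ge 2\alpha - 1 \iff \alpha \le \frac{2}{3}.

2\alpha - 1 < 1 \iff \alpha < 1,
\quad\text{so } e(\alpha) < 1 \text{ for every fixed } \alpha < 1.

\text{Examples: } e(1/2) = \max(1/4,\, 0) = 1/4,
\qquad e(4/5) = \max(2/5,\, 3/5) = 3/5.
```

The lower bound in the abstract gives a useful comparison. For $\alpha \le 2/3$, the time bound $\tilde{O}(n^{\alpha/2})$ and the query lower bound $\Omega(n^{\alpha/2})$ share the same exponent. The remaining gap there is therefore at most polylogarithmic, provided both bounds are read for the same gap problem.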

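The Python sketch below makes the CLOSE/FAR promise concrete. It pairs the exact quadratic dynamic program for edit distance with a deliberately simplified block-matching test.

This test is a hypothetical illustration and not the algorithm from the paper. The names `edit_distance` and `block_shift_test`, the parameter `reject_frac`, and all parameter values are our own. The test differs from the paper's method in three ways:

- it is non-recursive;
- it compares blocks by exact equality rather than by small edit distance;
- it omits the paper's sample-economizing technique.

It does capture one easy fact behind block-based tests. Suppose the edit distance between $A$ and $B$ is at most $k$. Then every aligned block of $A$ that no edit touches reappears verbatim in $B$ at an offset between $-k$ and $k$, so at most $k$ blocks can fail.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Exact Levenshtein distance (unit-cost insert, delete, substitute).

    Classic O(len(a) * len(b)) dynamic program; it reads every character,
    so it is the baseline that a sublinear test tries to avoid.
    """
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            )
        prev = cur
    return prev[-1]


def block_shift_test(a: str, b: str, k: int, block_len: int,
                     samples: int, reject_frac: float,
                     rng: random.Random) -> str:
    """Hypothetical simplified tester; NOT the algorithm from the paper.

    Samples aligned blocks of `a` and checks whether each one occurs
    verbatim in `b` at some offset s with -k <= s <= k. Reads at most
    samples * (2k + 2) * block_len characters in total.
    """
    num_blocks = len(a) // block_len
    if num_blocks == 0:
        raise ValueError("a is shorter than one block")
    misses = 0
    for _ in range(samples):
        start = rng.randrange(num_blocks) * block_len
        block = a[start:start + block_len]
        if not any(
            0 <= start + s <= len(b) - block_len
            and b[start + s:start + s + block_len] == block
            for s in range(-k, k + 1)
        ):
            misses += 1
    return "FAR" if misses > reject_frac * samples else "CLOSE"


# Usage on synthetic DNA-like strings (all parameters are illustrative).
rng = random.Random(0)
n = 100_000
a = "".join(rng.choice("ACGT") for _ in range(n))
# b = a with one deletion (index 50_000) and one insertion (before 70_000),
# so edit_distance(a, b) <= 2.
b = a[:50_000] + a[50_001:70_000] + "T" + a[70_000:]
far = "".join(rng.choice("ACGT") for _ in range(n))  # unrelated string

params = dict(k=2, block_len=40, samples=30, reject_frac=0.25)
close_verdict = block_shift_test(a, b, rng=random.Random(1), **params)
far_verdict = block_shift_test(a, far, rng=random.Random(1), **params)
print(close_verdict, far_verdict)  # CLOSE FAR
# A window around the deletion confirms the local distance exactly.
print(edit_distance(a[49_900:50_100], b[49_900:50_099]))  # 1
```

With these parameters each call reads at most 30 · 6 · 40 = 7,200 characters of the 200,000 in the two inputs. The sketch makes no claim about how reliably it answers FAR on inputs harder than an unrelated random string.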
    Information

    Published In

    STOC '03: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
    June 2003
    740 pages
    ISBN:1581136749
    DOI:10.1145/780542
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. approximation
    2. string matching
    3. sublinear algorithms

    Qualifiers

    • Article

    Conference

    STOC '03

    Acceptance Rates

    STOC '03 paper acceptance rate: 80 of 270 submissions (30%)
    Overall acceptance rate: 1,469 of 4,586 submissions (32%)

    Article Metrics

    • Downloads (last 12 months): 10
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 10 Oct 2024

    Cited By

    • (2023) Weighted Edit Distance Computation: Strings, Trees, and Dyck. Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pp. 377-390. DOI: 10.1145/3564246.3585178. Online publication date: 2-Jun-2023.
    • (2023) Optimal Algorithms for Bounded Weighted Edit Distance. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pp. 2177-2187. DOI: 10.1109/FOCS57990.2023.00135. Online publication date: 6-Nov-2023.
    • (2023) Approximating Edit Distance in the Fully Dynamic Model. 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1628-1638. DOI: 10.1109/FOCS57990.2023.00098. Online publication date: 6-Nov-2023.
    • (2022) A Linear-Time $n^{0.4}$-Approximation for Longest Common Subsequence. ACM Transactions on Algorithms 19(1), pp. 1-24. DOI: 10.1145/3568398. Online publication date: 26-Oct-2022.
    • (2022) Almost-optimal sublinear-time edit distance in the low distance regime. Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1102-1115. DOI: 10.1145/3519935.3519990. Online publication date: 9-Jun-2022.
    • (2022) Gap Edit Distance via Non-Adaptive Queries: Simple and Optimal. 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 674-685. DOI: 10.1109/FOCS54457.2022.00070. Online publication date: Oct-2022.
    • (2021) On efficient distance approximation for graph properties. Proceedings of the Thirty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1618-1637. DOI: 10.5555/3458064.3458162. Online publication date: 10-Jan-2021.
    • (2020) Approximating Edit Distance Within Constant Factor in Truly Sub-quadratic Time. Journal of the ACM 67(6), pp. 1-22. DOI: 10.1145/3422823. Online publication date: 29-Oct-2020.
    • (2020) Constant factor approximations to edit distance on far input pairs in nearly linear time. Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pp. 699-712. DOI: 10.1145/3357713.3384307. Online publication date: 22-Jun-2020.
    • (2020) Does preprocessing help in fast sequence comparisons? Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pp. 657-670. DOI: 10.1145/3357713.3384300. Online publication date: 22-Jun-2020.
