DOI: 10.1145/3618260.3649783
research-article
Open access

Almost Linear Size Edit Distance Sketch

Published: 11 June 2024

Abstract

We design an almost linear-size sketching scheme for computing edit distance up to a given threshold k. The scheme consists of two algorithms: a sketching algorithm and a recovery algorithm. The sketching algorithm depends on the parameter k; it takes as input a string x and a public random string ρ and computes a sketch sk_ρ(x; k), which is a compressed version of x. The recovery algorithm is given two sketches sk_ρ(x; k) and sk_ρ(y; k) together with the public random string ρ used to create them. With high probability, if the edit distance ED(x, y) between x and y is at most k, it outputs ED(x, y) together with an optimal sequence of edit operations that transforms x into y; if ED(x, y) > k, it outputs "large". The size of the sketch output by the sketching algorithm on input x is k · 2^{O(√(log(n) · log log(n)))}, where n is an upper bound on the length of x. The sketching and recovery algorithms both run in time polynomial in n. The dependence of the sketch size on k is information-theoretically optimal and improves over the quadratic dependence on k in the schemes of Kociumaka, Porat and Starikovskaya (FOCS 2021) and of Bhattacharya and Koucký (STOC 2023).
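
To make the recovery algorithm's output contract concrete, the following is a minimal sketch that computes the same answer directly from the full strings x and y rather than from sketches. It is not the paper's sketching scheme: it is the classical dynamic program restricted to a band of width k around the diagonal, and the function name `bounded_edit_distance` and the edit-tuple layout are assumptions made for illustration only.

```python
from typing import List, Tuple, Union

# Hypothetical edit representation: (operation, position in x, character).
Edit = Tuple[str, int, str]


def bounded_edit_distance(x: str, y: str, k: int) -> Union[str, Tuple[int, List[Edit]]]:
    """Return (ED(x, y), optimal edit script) if ED(x, y) <= k, else "large"."""
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return "large"  # the length difference alone forces more than k edits
    inf = k + 1  # every value above k is treated the same
    # dp[i][j] = edit distance between x[:i] and y[:j], capped at inf.
    # Only cells with |i - j| <= k are stored.
    dp = [dict() for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0:
                best = j
            elif j == 0:
                best = i
            else:
                best = min(
                    dp[i - 1].get(j - 1, inf) + (x[i - 1] != y[j - 1]),
                    dp[i - 1].get(j, inf) + 1,  # delete x[i-1]
                    dp[i].get(j - 1, inf) + 1,  # insert y[j-1]
                )
            dp[i][j] = min(best, inf)
    if dp[n][m] > k:
        return "large"
    # Walk back from (n, m) to (0, 0) to recover one optimal edit script.
    ops: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        cur = dp[i][j]
        if i > 0 and j > 0 and dp[i - 1].get(j - 1, inf) + (x[i - 1] != y[j - 1]) == cur:
            if x[i - 1] != y[j - 1]:
                ops.append(("substitute", i - 1, y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i - 1].get(j, inf) + 1 == cur:
            ops.append(("delete", i - 1, x[i - 1]))
            i -= 1
        else:
            ops.append(("insert", i, y[j - 1]))
            j -= 1
    ops.reverse()
    return dp[n][m], ops


if __name__ == "__main__":
    print(bounded_edit_distance("kitten", "sitting", 3))
    print(bounded_edit_distance("kitten", "sitting", 2))
```

With threshold k = 3 the call on "kitten" and "sitting" returns distance 3 with an edit script, and with k = 2 it returns "large". The point of the paper is that this answer can be recovered from two short sketches instead of the full strings; the code above only illustrates what is being recovered.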

References

[1]
Alexandr Andoni and Negev Shekel Nosatzki. 2020. Edit Distance in Near-Linear Time: it’s a Constant Factor. CoRR, abs/2005.07678 (2020). arXiv:2005.07678
[2]
Arturs Backurs and Piotr Indyk. 2018. Edit Distance Cannot Be Computed in Strongly Subquadratic Time (Unless SETH is False). SIAM J. Comput., 47, 3 (2018), 1087–1097. https://doi.org/10.1137/15M1053128
[3]
Djamal Belazzougui and Qin Zhang. 2016. Edit Distance: Sketching, Streaming, and Document Exchange. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS). 51–60. https://doi.org/10.1109/FOCS.2016.15
[4]
Sudatta Bhattacharya and Michal Koucký. 2023. Locally Consistent Decomposition of Strings with Applications to Edit Distance Sketching. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC 2023, Orlando, FL, USA, June 20-23, 2023, Barna Saha and Rocco A. Servedio (Eds.). ACM, 219–232. https://doi.org/10.1145/3564246.3585239
[5]
Joshua Brakensiek and Aviad Rubinstein. 2020. Constant-factor approximation of near-linear edit distance in near-linear time. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020. ACM, 685–698. https://doi.org/10.1145/3357713.3384282
[6]
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael E. Saks. 2020. Approximating Edit Distance Within Constant Factor in Truly Sub-quadratic Time. J. ACM, 67, 6 (2020), 36:1–36:22. https://doi.org/10.1145/3422823
[7]
Diptarka Chakraborty, Elazar Goldenberg, and Michal Koucký. 2016. Streaming algorithms for embedding and computing edit distance in the low distance regime. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, Daniel Wichs and Yishay Mansour (Eds.). ACM, 712–725. https://doi.org/10.1145/2897518.2897577
[8]
Raphaël Clifford, Klim Efremenko, Ely Porat, and Amir Rothschild. 2009. From coding theory to efficient pattern matching. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2009, New York, NY, USA, January 4-6, 2009. SIAM, 778–784. https://doi.org/10.1137/1.9781611973068.85
[9]
Raphaël Clifford, Tomasz Kociumaka, and Ely Porat. 2019. The streaming k-mismatch problem. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2019, San Diego, California, USA, January 6-9, 2019, Timothy M. Chan (Ed.). SIAM, 1106–1125. https://doi.org/10.1137/1.9781611975482.68
[10]
Joan Feigenbaum, Sampath Kannan, Martin Strauss, and Mahesh Viswanathan. 2002. An Approximate L1-Difference Algorithm for Massive Data Streams. SIAM J. Comput., 32, 1 (2002), 131–151. https://doi.org/10.1137/S0097539799361701
[11]
Amos Fiat, Moni Naor, Jeanette P. Schmidt, and Alan Siegel. 1992. Nonoblivious Hashing. J. ACM, 39, 4 (1992), 764–782. https://doi.org/10.1145/146585.146591
[12]
Arun Ganesh, Tomasz Kociumaka, Andrea Lincoln, and Barna Saha. 2022. How Compression and Approximation Affect Efficiency in String Distance Measures. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, Virtual Conference / Alexandria, VA, USA, January 9 - 12, 2022, Joseph (Seffi) Naor and Niv Buchbinder (Eds.). SIAM, 2867–2919. https://doi.org/10.1137/1.9781611977073.112
[13]
Szymon Grabowski. 2016. New tabulation and sparse dynamic programming based techniques for sequence similarity problems. Discret. Appl. Math., 212 (2016), 96–103. https://doi.org/10.1016/J.DAM.2015.10.040
[14]
Ce Jin, Jelani Nelson, and Kewen Wu. 2021. An Improved Sketching Algorithm for Edit Distance. In 38th International Symposium on Theoretical Aspects of Computer Science, STACS 2021, (LIPIcs, Vol. 187). 45:1–45:16. https://doi.org/10.4230/LIPIcs.STACS.2021.45
[15]
Richard M. Karp and Michael O. Rabin. 1987. Efficient randomized pattern-matching algorithms. IBM J. Res. Dev., 31, 2 (Mar. 1987), 249–260. ISSN 0018-8646. https://doi.org/10.1147/rd.312.0249
[16]
Tomasz Kociumaka, Ely Porat, and Tatiana Starikovskaya. 2021. Small-space and streaming pattern matching with k edits. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS). 885–896. https://doi.org/10.1109/FOCS52979.2021.00090
[17]
Michal Koucký and Michael E. Saks. 2020. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020. ACM, 699–712. https://doi.org/10.1145/3357713.3384307
[18]
Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. 1998. Incremental String Comparison. SIAM J. Comput., 27, 2 (1998), 557–582. https://doi.org/10.1137/S0097539794264810
[19]
William J. Masek and Mike Paterson. 1980. A Faster Algorithm Computing String Edit Distances. J. Comput. Syst. Sci., 20, 1 (1980), 18–31. https://doi.org/10.1016/0022-0000(80)90002-1
[20]
Rafail Ostrovsky and Yuval Rabani. 2007. Low distortion embeddings for edit distance. J. ACM, 54, 5 (2007), 23. https://doi.org/10.1145/1284320.1284322
[21]
Ely Porat and Ohad Lipsky. 2007. Improved Sketching of Hamming Distance with Error Correcting. In Combinatorial Pattern Matching, 18th Annual Symposium, CPM 2007, London, Canada, July 9-11, 2007, Proceedings, Bin Ma and Kaizhong Zhang (Eds.) (Lecture Notes in Computer Science, Vol. 4580). Springer, 173–182. https://doi.org/10.1007/978-3-540-73437-6_19
[22]
Robert A. Wagner and Michael J. Fischer. 1974. The String-to-String Correction Problem. J. ACM, 21, 1 (1974), 168–173. https://doi.org/10.1145/321796.321811


    Published In

    STOC 2024: Proceedings of the 56th Annual ACM Symposium on Theory of Computing
    June 2024
    2049 pages
    ISBN:9798400703836
    DOI:10.1145/3618260
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. edit distance
    2. sketching algorithm

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation
    • European Commission
    • Czech Science Foundation

    Conference

    STOC '24
    STOC '24: 56th Annual ACM Symposium on Theory of Computing
    June 24 - 28, 2024
    Vancouver, BC, Canada

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

    Bibliometrics

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 116
    • Downloads (Last 12 months): 116
    • Downloads (Last 6 weeks): 39
    Reflects downloads up to 01 Sep 2024
