Approximating Edit Distance in Truly Subquadratic Time: Quantum and MapReduce

Published: 13 May 2021

Abstract

The edit distance between two strings is the smallest number of insertions, deletions, and substitutions needed to transform one string into the other. Approximating edit distance in subquadratic time is “one of the biggest unsolved problems in the field of combinatorial pattern matching” [37]. Our main result is a quantum constant-factor approximation algorithm that computes the edit distance in truly subquadratic time. More precisely, we give a quantum algorithm that approximates the edit distance within a factor of 3. We further extend this result to a quantum algorithm that approximates the edit distance within a larger constant factor.
Our solutions are based on a framework for approximating edit distance in parallel settings. The framework requires, as a black box, an algorithm that computes the distances of several smaller strings all at once. For the quantum algorithm, we reduce the black box to metric estimation and provide efficient algorithms for approximating it. We further show that the framework lets us approximate edit distance in distributed settings. To this end, we provide a MapReduce algorithm that approximates edit distance within a factor of […], with sublinearly many machines and sublinear memory. Our algorithm also runs in a logarithmic number of rounds.
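
For orientation, the definition of edit distance given in the abstract can be made concrete with the textbook quadratic-time dynamic program. This is only a baseline sketch of the exact computation that subquadratic approximation algorithms aim to beat; it is not the quantum or MapReduce algorithm of the paper, and the function name `edit_distance` is chosen here purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Exact edit distance via the classic dynamic program.

    Runs in O(len(a) * len(b)) time and O(len(b)) space.
    Illustrative baseline only, not the paper's algorithm.
    """
    # prev[j] = distance between the first i-1 characters of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

The two nested loops are what make this baseline quadratic in the input length; an approximation algorithm trades exactness for a running time below that bound.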

References

[1]
Andris Ambainis. 2002. Quantum lower bounds by quantum arguments. J. Comput. System Sci. 64, 4 (2002), 750–767.
[2]
Alexandr Andoni, Piotr Indyk, and Robert Krauthgamer. 2009. Overcoming the non-embeddability barrier: Algorithms for product metrics. In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’09). SIAM, 865–874.
[3]
Alexandr Andoni and Robert Krauthgamer. 2010. The computational hardness of estimating edit distance. SIAM J. Comput. 39, 6 (April 2010), 2398–2429.
[4]
Alexandr Andoni and Robert Krauthgamer. 2012. The smoothed complexity of edit distance. ACM Trans. Algorithms 8, 4 (Oct. 2012), Article 44, 25 pages.
[5]
Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. 2010. Polylogarithmic approximation for edit distance and the asymmetric query complexity. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS’10). IEEE, 377–386.
[6]
Alexandr Andoni, Aleksandar Nikolov, Krzysztof Onak, and Grigory Yaroslavtsev. 2014. Parallel algorithms for geometric graph problems. In Proceedings of the 46th Annual ACM SIGACT Symposium on Theory of Computing (STOC’14). ACM, 574–583.
[7]
Alexandr Andoni and Negev Shekel Nosatzki. 2020. Edit distance in near-linear time: It’s a constant factor. arxiv:cs.DS/2005.07678
[8]
Alexandr Andoni and Krzysztof Onak. 2012. Approximating edit distance in near-linear time. SIAM J. Comput. 41, 6 (2012), 1635–1648.
[9]
Alberto Apostolico, Mikhail J. Atallah, Lawrence L. Larmore, and Scott McFaddin. 1990. Efficient parallel algorithms for string editing and related problems. SIAM J. Comput. 19, 5 (1990), 968–988.
[10]
Arturs Backurs and Piotr Indyk. 2015. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the 47th Annual ACM SIGACT Symposium on Theory of Computing (STOC’15). ACM, 51–58.
[11]
Ziv Bar-Yossef, T. S. Jayram, Robert Krauthgamer, and Ravi Kumar. 2004. Approximating edit distance efficiently. In Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS’04). IEEE, 550–559.
[12]
Tuğkan Batu, Funda Ergun, and Cenk Sahinalp. 2006. Oblivious string embeddings and edit distance approximations. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’06). SIAM, 792–801.
[13]
Robert Beals. 1997. Quantum computation of Fourier transforms over symmetric groups. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC’97). ACM, 48–53.
[14]
Aleksandrs Belovs. 2012. Learning-graph-based quantum algorithm for k-distinctness. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS’12). IEEE, 207–216.
[15]
Mahdi Boroujeni, Soheil Ehsani, Mohammad Ghodsi, MohammadTaghi HajiAghayi, and Saeed Seddighin. 2018. Approximating edit distance in truly subquadratic time: Quantum and MapReduce. In Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’18). SIAM, 1170–1189.
[16]
Mahdi Boroujeni, Masoud Seddighin, and Saeed Seddighin. 2020. Improved algorithms for edit distance and LCS: Beyond worst case. In Proceedings of the 31st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’20). SIAM, 1601–1620.
[17]
Mahdi Boroujeni and Saeed Seddighin. 2019. Improved MPC algorithms for edit distance and Ulam distance. In The 31st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’19). ACM, 31–40.
[18]
Michel Boyer, Gilles Brassard, Peter Høyer, and Alain Tapp. 1998. Tight bounds on quantum searching. Fortschritte der Physik 46, 4–5 (1998), 493–505.
[19]
Joshua Brakensiek, Moses Charikar, and Aviad Rubinstein. 2020. A simple sublinear algorithm for gap edit distance. arxiv:cs.DS/2007.14368
[20]
Joshua Brakensiek and Aviad Rubinstein. 2020. Constant-factor approximation of near-linear edit distance in near-linear time. In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing (STOC’20). ACM, 685–698.
[21]
Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. 2002. Quantum amplitude amplification and estimation. Contemp. Math. 305 (2002), 53–74.
[22]
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael Saks. 2018. Approximating edit distance within constant factor in truly sub-quadratic time. In Proceedings of the 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS’18). IEEE, 979–990.
[23]
Diptarka Chakraborty, Debarati Das, Elazar Goldenberg, Michal Koucký, and Michael Saks. 2020. Approximating edit distance within constant factor in truly sub-quadratic time. J. ACM 67, 6, Article 36 (Oct. 2020), 22 pages.
[24]
Moses Charikar and Robert Krauthgamer. 2006. Embedding the Ulam metric into $\ell_1$. Theory Comput. 2, 11 (2006), 207–224.
[25]
Christoph Dürr, Mark Heiligman, Peter Høyer, and Mehdi Mhalla. 2006. Quantum query complexity of some graph problems. SIAM J. Comput. 35, 6 (2006), 1310–1328.
[26]
Alireza Farhadi, MohammadTaghi HajiAghayi, Aviad Rubinstein, and Saeed Seddighin. 2020. Asymmetric streaming algorithms for edit distance and LCS. arxiv:cs.DS/2002.11342
[27]
Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser. 1998. A limit on the speed of quantum computation in determining parity. Phys. Rev. Lett. 81 (1998), 5442–5444.
[28]
Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser. 1999. Invariant quantum algorithms for insertion into an ordered list. arxiv:quant-ph/9901059
[29]
Elazar Goldenberg, Robert Krauthgamer, and Barna Saha. 2019. Sublinear algorithms for gap edit distance. In Proceedings of the 60th Annual IEEE Symposium on Foundations of Computer Science (FOCS’19). IEEE, 1101–1120.
[30]
Elazar Goldenberg, Aviad Rubinstein, and Barna Saha. 2020. Does preprocessing help in fast sequence comparisons? In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing (STOC’20). ACM, 657–670.
[31]
Lov K. Grover. 1996. A fast quantum mechanical algorithm for database search. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing (STOC’96). ACM, 212–219.
[32]
Bernhard Haeupler, Aviad Rubinstein, and Amirbehshad Shahrasbi. 2019. Near-linear time insertion-deletion codes and $(1+\varepsilon)$-approximating edit distance via indexing. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing (STOC’19). ACM, 697–708.
[33]
MohammadTaghi HajiAghayi, Silvio Lattanzi, Saeed Seddighin, and Cliff Stein. 2019. MapReduce meets fine-grained complexity: MapReduce algorithms for APSP, matrix multiplication, 3-SUM, and beyond. arxiv:cs.DS/1905.01748
[34]
MohammadTaghi HajiAghayi, Saeed Seddighin, and Xiaorui Sun. 2019. Massively parallel approximation algorithms for edit distance and longest common subsequence. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’19). SIAM, 1654–1672.
[35]
Ramesh Hariharan and V. Vinay. 2003. String matching in quantum time. J. Discrete Algorithms 1, 1 (2003), 103–110.
[36]
Sungjin Im, Benjamin Moseley, and Xiaorui Sun. 2017. Efficient massively parallel methods for dynamic programming. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC’17). ACM, 798–811.
[37]
Piotr Indyk. 2001. Algorithmic applications of low-distortion geometric embeddings. In Proceedings of the 42nd Annual IEEE Symposium on Foundations of Computer Science (FOCS’01). IEEE, 10–33.
[38]
Stacey Jeffery, Robin Kothari, and Frédéric Magniez. 2013. Nested quantum walks with quantum data structures. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’13). SIAM, 1474–1485.
[39]
Shagun Jhaver, Latifur Khan, and Bhavani Thuraisingham. 2014. Calculating edit distance for large sets of string pairs using MapReduce. Paper presented at ASE International Conference on Big Data.
[40]
Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii. 2010. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’10). SIAM, 938–948.
[41]
Tomasz Kociumaka and Barna Saha. 2020. Sublinear-time algorithms for computing & embedding gap edit distance. In Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS’20). IEEE, Virtual Conference.
[42]
Michal Koucký and Michael Saks. 2020. Constant factor approximations to edit distance on far input pairs in nearly linear time. In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing (STOC’20). ACM, 699–712.
[43]
Robert Krauthgamer and Yuval Rabani. 2009. Improved lower bounds for embeddings into $L_1$. SIAM J. Comput. 38, 6 (2009), 2487–2498.
[44]
Hari Krovi and Alexander Russell. 2015. Quantum Fourier transforms and the complexity of link invariants for quantum doubles of finite groups. Comm. Math. Phys. 334, 2 (2015), 743–777.
[45]
Gad M. Landau, Eugene W. Myers, and Jeanette P. Schmidt. 1998. Incremental string comparison. SIAM J. Comput. 27, 2 (1998), 557–582.
[46]
Silvio Lattanzi, Benjamin Moseley, Siddharth Suri, and Sergei Vassilvitskii. 2011. Filtering: A method for solving graph problems in MapReduce. In Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’11). ACM, 85–94.
[47]
François Le Gall. 2014. Improved quantum algorithm for triangle finding via combinatorial arguments. In Proceedings of the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS’14). IEEE, 216–225.
[48]
William J. Masek and Michael S. Paterson. 1980. A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20, 1 (1980), 18–31.
[49]
Michael Mitzenmacher and Saeed Seddighin. 2020. Dynamic algorithms for LIS and distance to monotonicity. In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing (STOC’20). ACM, 671–684.
[50]
Ashley Montanaro, Richard Jozsa, and Graeme Mitchison. 2015. On exact quantum query complexity. Algorithmica 71, 4 (2015), 775–796.
[51]
Rajeev Motwani and Prabhakar Raghavan. 1995. Randomized Algorithms. Cambridge University Press, UK.
[52]
Aran Nayebi and Virginia Vassilevska Williams. 2014. Quantum algorithms for shortest paths problems in structured instances. arxiv:quant-ph/1410.6220
[53]
Rafail Ostrovsky and Yuval Rabani. 2007. Low distortion embeddings for edit distance. J. ACM 54, 5 (Oct. 2007), Article 23, 16 pages.
[54]
Aviad Rubinstein, Saeed Seddighin, Zhao Song, and Xiaorui Sun. 2019. Approximation algorithms for LCS and LIS with truly improved running times. In Proceedings of the 60th Annual IEEE Symposium on Foundations of Computer Science (FOCS’19). IEEE, 1121–1145.
[55]
Aviad Rubinstein and Virginia Vassilevska Williams. 2019. SETH vs approximation. ACM SIGACT News 50, 4 (2019), 57–76.
[56]
Peter W. Shor. 1997. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J. Comput. 26, 5 (1997), 1484–1509.
[57]
Esko Ukkonen. 1985. Algorithms for approximate string matching. Inf. Control 64, 1–3 (1985), 100–118.

Information

Published In

Journal of the ACM  Volume 68, Issue 3
June 2021
244 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/3456663
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2021
Accepted: 01 January 2021
Revised: 01 September 2020
Received: 01 April 2018
Published in JACM Volume 68, Issue 3

Author Tags

  1. Edit distance
  2. approximation algorithm
  3. subquadratic time algorithm
  4. quantum algorithm
  5. parallel algorithm
  6. MapReduce

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • NSF CAREER
  • NSF BIGDATA
  • NSF AF:Medium
  • DARPA GRAPHS/AFOSR
  • DARPA SIMPLEX
