An O(ND) difference algorithm and its variations

Published: 22 March 2023

Abstract

The problems of finding a longest common subsequence of two sequences A and B and a shortest edit script for transforming A into B have long been known to be dual problems. In this paper, they are shown to be equivalent to finding a shortest/longest path in an edit graph. Using this perspective, a simple O(ND) time and space algorithm is developed, where N is the sum of the lengths of A and B and D is the size of the minimum edit script for A and B. The algorithm performs well when differences are small (sequences are similar) and is consequently fast in typical applications. The algorithm is shown to have O(N + D^2) expected-time performance under a basic stochastic model. A refinement of the algorithm requires only O(N) space, and the use of suffix trees leads to an O(N log N + D^2) time variation.
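
As an illustration of the greedy idea behind the O(ND) bound, the following Python sketch computes only D, the size of a shortest edit script (insertions plus deletions), by extending furthest-reaching paths diagonal by diagonal in the edit graph. It is a minimal sketch based on the commonly presented form of the basic algorithm, not the paper's own code. The names `shortest_edit_distance`, `lcs_length`, and the dictionary `v` are illustrative assumptions, and neither the linear-space refinement nor the suffix-tree variation is shown.

```python
def shortest_edit_distance(a, b):
    """Return D: the fewest insertions plus deletions that turn a into b."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] is the furthest x reached so far on diagonal k = x - y.
    # The sentinel v[1] = 0 lets the d = 0 round start from (0, 0).
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # vertical move: insert b[y]
            else:
                x = v[k - 1] + 1    # horizontal move: delete a[x]
            y = x - k
            # Follow the "snake": matching symbols cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d  # not reached: the loop always returns by d = n + m


def lcs_length(a, b):
    """LCS length via the duality N = 2L + D, with N = len(a) + len(b)."""
    return (len(a) + len(b) - shortest_edit_distance(a, b)) // 2


if __name__ == "__main__":
    print(shortest_edit_distance("ABCABBA", "CBABAC"))  # → 5
    print(lcs_length("ABCABBA", "CBABAC"))              # → 4
```

The outer loop runs D + 1 times and each round touches at most D + 1 diagonals, so the work outside the snakes is O(D^2), which is why the method is fast when the two sequences are similar. Because this sketch returns only D rather than the script itself, it keeps just one furthest-reaching x per diagonal.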

Published In

Algorithmica, Volume 1, Issue 1-4
Nov 1986
517 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 22 March 2023
Revision received: 17 January 1986
Received: 11 June 1985

Author Tags

  1. Longest common subsequence
  2. Shortest edit script
  3. Edit graph
  4. File comparison

Qualifiers

  • Research-article

Cited By

  • (2024) Code Shaping: Iterative Code Editing with Free-form Sketching. Adjunct Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. 10.1145/3672539.3686324 (1-3). Online publication date: 13-Oct-2024
  • (2024) An Efficient Approach to Store and Access Wikipedia's Revision History for Large-Scale Analysis. Proceedings of the 35th ACM Conference on Hypertext and Social Media. 10.1145/3648188.3675150 (309-315). Online publication date: 10-Sep-2024
  • (2024) gawd: A Differencing Tool for GitHub Actions Workflows. Proceedings of the 21st International Conference on Mining Software Repositories. 10.1145/3643991.3644873 (682-686). Online publication date: 15-Apr-2024
  • (2024) ESGen: Commit Message Generation Based on Edit Sequence of Code Change. Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension. 10.1145/3643916.3644414 (112-124). Online publication date: 15-Apr-2024
  • (2024) Keep Eyes on the Sentence: An Interactive Sentence Simplification System for English Learners Based on Eye Tracking and Large Language Models. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems. 10.1145/3613905.3650792 (1-7). Online publication date: 11-May-2024
  • (2024) The Effects of Update Interval and Reveal Method on Writer Comfort in Synchronized Shared-Editors. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. 10.1145/3613904.3642330 (1-13). Online publication date: 11-May-2024
  • (2024) Fine-grained, accurate and scalable source differencing. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. 10.1145/3597503.3639148 (1-12). Online publication date: 20-May-2024
  • (2024) Tracking the Evolution of Static Code Warnings: The State-of-the-Art and a Better Approach. IEEE Transactions on Software Engineering. 10.1109/TSE.2024.3358283. 50:3 (534-550). Online publication date: 1-Mar-2024
  • (2024) SecureC2Edit: A Framework for Secure Collaborative and Concurrent Document Editing. IEEE Transactions on Dependable and Secure Computing. 10.1109/TDSC.2023.3302810. 21:4 (2227-2241). Online publication date: 1-Jul-2024
  • (2024) An efficient Burrows–Wheeler transform-based aligner for short read mapping. Computational Biology and Chemistry. 10.1016/j.compbiolchem.2024.108050. 110:C. Online publication date: 1-Jun-2024
