Abstract
Several efficient and very powerful algorithms exist for detecting changes in tree-based textual documents, such as those encoded in XML. An important aspect is still underestimated in their design and implementation: the quality of the output, in terms of readability, clarity and accuracy for human users. This requirement is particularly relevant when diff-ing literary documents, such as books, articles, reviews, acts, and so on. This paper introduces the concept of 'naturalness' in diff-ing tree-based textual documents and discusses a new, extensible set of changes that can and should be detected. A naturalness-based algorithm is presented, together with its application to diff-ing XML-encoded legislative documents. The algorithm, called JNDiff, proved to detect significantly better matchings (since new operations are recognized) and to be very efficient.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Di Iorio, A., Schirinzi, M., Vitali, F., Marchetti, C. (2009). A Natural and Multi-layered Approach to Detect Changes in Tree-Based Textual Documents. In: Filipe, J., Cordeiro, J. (eds) Enterprise Information Systems. ICEIS 2009. Lecture Notes in Business Information Processing, vol 24. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01347-8_8
DOI: https://doi.org/10.1007/978-3-642-01347-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01346-1
Online ISBN: 978-3-642-01347-8
eBook Packages: Computer Science, Computer Science (R0)