Towards XML version control of office documents

S Rönnau, J Scheffczyk, UM Borghoff
Proceedings of the 2005 ACM symposium on Document engineering, 2005
Office applications such as OpenOffice and Microsoft Office are widely used to edit the majority of today's business documents: office documents. Usually, version control systems treat office documents as binary objects, which severely hinders collaborative work. Since XML has become a de facto standard for office applications, we focus on versioning office documents by structured XML version control approaches. This enables state-of-the-art version control for office documents.

A basic prerequisite to XML version control is a diff algorithm, which detects structural changes between XML documents. In this paper, we evaluate state-of-the-art XML diff algorithms with respect to their suitability for OpenOffice XML documents and the future OASIS office document standard. It turns out that, due to the specific XML office format, a careful examination of the diff algorithm characteristics is necessary. Therefore, we identify important features that XML diff approaches need in order to handle office documents. We have implemented a first OpenOffice versioning API that can be used in version control systems as a replacement for the line-based or binary diffs currently in use.
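
To make the idea of a structural diff concrete, the following is a minimal sketch in Python. It is not one of the algorithms evaluated in the paper and not the authors' versioning API: `tree_diff` and `load_content_xml` are hypothetical helper names, and the comparison is a naive positional one that cannot detect moved subtrees. It only assumes that an OpenOffice document is a ZIP archive whose main content is stored in `content.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET


def load_content_xml(path):
    """Parse content.xml from an OpenOffice ZIP archive (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        return ET.fromstring(archive.read("content.xml"))


def tree_diff(old, new, path=None):
    """Naive positional tree comparison; returns a list of (path, change)."""
    path = path or f"/{old.tag}"
    if old.tag != new.tag:
        return [(path, f"element replaced by <{new.tag}>")]
    changes = []
    if old.attrib != new.attrib:
        changes.append((path, "attributes changed"))
    if (old.text or "").strip() != (new.text or "").strip():
        changes.append((path, "text changed"))
    old_children, new_children = list(old), list(new)
    # Compare children pairwise by position (1-based index in the path).
    for i, (a, b) in enumerate(zip(old_children, new_children), start=1):
        changes.extend(tree_diff(a, b, f"{path}/{a.tag}[{i}]"))
    # Children beyond the common prefix count as deletions or insertions.
    for extra in old_children[len(new_children):]:
        changes.append((path, f"child <{extra.tag}> deleted"))
    for extra in new_children[len(old_children):]:
        changes.append((path, f"child <{extra.tag}> inserted"))
    return changes


OLD = "<doc><p>Hello</p><p>World</p></doc>"
NEW = "<doc><p>Hello</p><p>World!</p><p>New paragraph</p></doc>"

for location, change in tree_diff(ET.fromstring(OLD), ET.fromstring(NEW)):
    print(location, "-", change)
# /doc/p[2] - text changed
# /doc - child <p> inserted
```

Unlike a binary or line-based diff, the result names the affected element, which is the kind of information a version control system needs to merge concurrent edits. A positional comparison like this one reports every following sibling as changed when a paragraph is inserted in the middle of a document, which is one reason a careful choice of diff algorithm matters for office formats.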