Simplifying XML schema: effortless handling of nondeterministic regular expressions

GJ Bex, W Gelade, W Martens, F Neven - Proceedings of the 2009 ACM …, 2009 - dl.acm.org
Proceedings of the 2009 ACM SIGMOD International Conference on Management of …, 2009 - dl.acm.org
Whether beloved or despised, XML Schema is currently the only industrially accepted schema language for XML and is unlikely to become obsolete any time soon. Nevertheless, many nontransparent restrictions unnecessarily complicate the design of XSDs. For instance, complex content models in XML Schema are constrained by the infamous unique particle attribution (UPA) constraint. In formal language theoretic terms, this constraint restricts content models to deterministic regular expressions. As determinism is a semantic notion and no simple corresponding syntactic characterization is known, it is very difficult for non-expert users to understand exactly when and why content models do or do not violate UPA. In the present paper, we therefore investigate solutions to relieve users of the burden of UPA by automatically transforming nondeterministic expressions into concise deterministic ones that define the same language or constitute good approximations. The presented techniques facilitate XSD construction by bringing the design task closer to the complexity of the modeling task. In addition, our algorithms can serve as a plug-in for any model management tool that supports export to XML Schema format.
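To make the notion of a deterministic regular expression concrete, the following is a minimal sketch of the standard position-based (Glushkov) test due to Brüggemann-Klein and Wood: an expression is deterministic if no two distinct positions carrying the same symbol compete in its first set or in any follow set. This is not the transformation algorithm of the paper, only a checker for the property that UPA enforces. The tuple encoding of expressions and the names `analyze`, `has_clash`, and `is_deterministic` are assumptions made for this illustration.

```python
from itertools import count

# Hypothetical encoding: ("sym", "a"), ("cat", r1, r2), ("alt", r1, r2),
# ("star", r), ("opt", r).

def analyze(node, counter, symbol_of, follow):
    """Return (nullable, first, last) of node; fill symbol_of and follow."""
    kind = node[0]
    if kind == "sym":
        pos = next(counter)          # each symbol occurrence gets its own position
        symbol_of[pos] = node[1]
        follow[pos] = set()
        return False, {pos}, {pos}
    if kind in ("star", "opt"):
        _, first, last = analyze(node[1], counter, symbol_of, follow)
        if kind == "star":
            for p in last:           # iteration: last positions loop back to first
                follow[p] |= first
        return True, first, last
    n1, f1, l1 = analyze(node[1], counter, symbol_of, follow)
    n2, f2, l2 = analyze(node[2], counter, symbol_of, follow)
    if kind == "alt":
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        for p in l1:                 # concatenation: left ends lead to right starts
            follow[p] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return n1 and n2, first, last
    raise ValueError(f"unknown node kind: {kind!r}")

def has_clash(positions, symbol_of):
    """True if two distinct positions in the set carry the same symbol."""
    symbols = [symbol_of[p] for p in positions]
    return len(symbols) != len(set(symbols))

def is_deterministic(regex):
    symbol_of, follow = {}, {}
    _, first, _ = analyze(regex, count(), symbol_of, follow)
    if has_clash(first, symbol_of):
        return False
    return not any(has_clash(f, symbol_of) for f in follow.values())

def sym(name):
    return ("sym", name)

# (title, author*) | (title, editor): two competing "title" particles.
nondet = ("alt", ("cat", sym("title"), ("star", sym("author"))),
                 ("cat", sym("title"), sym("editor")))
# title, (author* | editor): same language, one "title" particle.
det = ("cat", sym("title"), ("alt", ("star", sym("author")), sym("editor")))

print(is_deterministic(nondet))  # False
print(is_deterministic(det))     # True
```

The two sample content models define the same language, yet only the second would pass a UPA check, which illustrates why determinism is a property of the expression rather than of the language it defines.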