research-article

BonXai: Combining the Simplicity of DTD with the Expressiveness of XML Schema

Published: 24 August 2017

Abstract

While the migration from DTD to XML Schema was driven by a need for increased expressivity and flexibility, the latter was also significantly more complex to use and understand. Whereas DTDs are characterized by their simplicity, XML Schema Documents are notoriously difficult. In this article, we introduce the XML specification language BonXai, which incorporates many features of XML Schema but is arguably almost as easy to use as DTDs. In brief, the latter is achieved by sacrificing the explicit use of types in favor of simple patterns expressing contexts for elements. The goal of BonXai is not to replace XML Schema but rather to provide a simpler alternative for users who want to go beyond the expressiveness and features of DTD but do not need the explicit use of types. Furthermore, XML Schema processing tools can be used as a back-end for BonXai, since BonXai can be automatically converted into XML Schema. A particularly strong point of BonXai is its solid foundation rooted in a decade of theoretical work around pattern-based schemas. We present a formal model for a core fragment of BonXai and the translation algorithms to and from a core fragment of XML Schema. We prove that BonXai and XML Schema can be converted back-and-forth on the level of tree languages and we formally study the size trade-offs between the two languages.
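The abstract's central idea is to replace explicit XSD types with patterns that describe the context (the ancestors) in which an element occurs. The small validator below illustrates that idea. It is a minimal sketch under stated assumptions, not the paper's formal model and not BonXai's concrete syntax. Rules are hypothetical `(ancestor pattern, content model)` pairs written as Python regular expressions. When several patterns match, the last matching rule is assumed to take priority. Only element structure is checked; text and attributes are ignored. The functions `content_model` and `validate` and the rule encoding are illustrative names introduced here.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical rule format (not BonXai's concrete syntax): (ancestor pattern, content model).
# Ancestor patterns are Python regexes matched against the root-to-node path,
# e.g. "/directory/person/name". Content models are regexes over the node's
# element children, each encoded as "tag;", e.g. "first;last;".
Rule = Tuple[str, str]


def content_model(path: str, rules: List[Rule]) -> Optional[str]:
    """Return the content model of the last rule whose pattern matches `path`.

    Letting a later rule override an earlier one is an assumption of this sketch.
    """
    chosen = None
    for pattern, model in rules:
        if re.fullmatch(pattern, path):
            chosen = model
    return chosen


def validate(node: ET.Element, rules: List[Rule], parent_path: str = "") -> bool:
    """Check element structure only; text and attributes are ignored."""
    path = f"{parent_path}/{node.tag}"
    model = content_model(path, rules)
    if model is None:
        return False  # no rule covers this context
    children = "".join(f"{child.tag};" for child in node)
    if re.fullmatch(model, children) is None:
        return False
    return all(validate(child, rules, path) for child in node)


# A context dependency that a DTD cannot state: <name> has <first><last>
# children under <person> but must be empty under <company>.
rules: List[Rule] = [
    (r"/directory", r"(person;|company;)*"),
    (r".*/(person|company)", r"name;"),
    (r".*/person/name", r"first;last;"),
    (r".*/company/name", r""),
    (r".*/(first|last)", r""),
]

ok = ET.fromstring(
    "<directory><person><name><first/><last/></name></person>"
    "<company><name/></company></directory>"
)
bad = ET.fromstring("<directory><company><name><first/></name></company></directory>")

print(validate(ok, rules))   # True
print(validate(bad, rules))  # False
```

A DTD assigns exactly one content model per element name, so it cannot make `name` depend on its parent. An XSD would express this distinction by giving the two `name` elements different types. In the sketch, the ancestor pattern alone selects the content model, and no type names are needed.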

Cited By

  • (2023) A noise-tolerant differentiable learning approach for single occurrence regular expression with interleaving. Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, pp. 4809-4817. DOI: 10.1609/aaai.v37i4.25606. Online publication date: 7-Feb-2023.
  • (2023) Validating Streaming JSON Documents with Learned VPAs. Tools and Algorithms for the Construction and Analysis of Systems, pp. 271-289. DOI: 10.1007/978-3-031-30823-9_14. Online publication date: 22-Apr-2023.
  • (2022) Towards Theory for Real-World Data. Proceedings of the 41st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 261-276. DOI: 10.1145/3517804.3526066. Online publication date: 12-Jun-2022.

Information

Published In

ACM Transactions on Database Systems, Volume 42, Issue 3
Invited Paper from SIGMOD 2015, Invited Paper from PODS 2015, Regular Papers and Technical Correspondence
September 2017
220 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/3129336
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 August 2017
Accepted: 01 June 2017
Revised: 01 May 2017
Received: 01 December 2015
Published in TODS Volume 42, Issue 3

Author Tags

  1. BonXai
  2. XML
  3. XML Schema
  4. schema languages

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Deutsche Forschungsgemeinschaft (Emmy Noether Nachwuchsgruppe)
  • Future and Emerging Technologies (FET)
  • Seventh Framework Programme for Research of the European Commission, under the FET-Open grant agreement FOX

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0
Reflects downloads up to 11 Jan 2025

Citations

  • (2022) Learning Disjunctive Multiplicity Expressions and Disjunctive Generalize Multiplicity Expressions From Both Positive and Negative Examples. The Computer Journal 66:7, pp. 1733-1748. DOI: 10.1093/comjnl/bxac037. Online publication date: 18-Apr-2022.
  • (2022) Membership Algorithm for Single-Occurrence Regular Expressions with Shuffle and Counting. Database Systems for Advanced Applications, pp. 526-542. DOI: 10.1007/978-3-031-00123-9_41. Online publication date: 11-Apr-2022.
  • (2021) PG-Keys: Keys for Property Graphs. Proceedings of the 2021 International Conference on Management of Data, pp. 2423-2436. DOI: 10.1145/3448016.3457561. Online publication date: 9-Jun-2021.
  • (2021) Learning Finite Automata with Shuffle. Advances in Knowledge Discovery and Data Mining, pp. 308-320. DOI: 10.1007/978-3-030-75765-6_25. Online publication date: 11-May-2021.
  • (2020) Model-View-Controller based Context Visualization Method for Multimedia English Teaching System: A Case Study of Multimedia Technology Teaching. International Journal of Academic Research in Progressive Education and Development 9:2. DOI: 10.6007/IJARPED/v9-i2/7183. Online publication date: 30-Apr-2020.
  • (2020) Multimedia Teaching in Teaching of College English Reading. Journal of Testing and Evaluation 49:4 (20200179). DOI: 10.1520/JTE20200179. Online publication date: 18-Dec-2020.
  • (2019) Dichotomies for Evaluating Simple Regular Path Queries. ACM Transactions on Database Systems 44:4, pp. 1-46. DOI: 10.1145/3331446. Online publication date: 15-Oct-2019.