Simple off the shelf abstractions for XML Schema

Published: 01 September 2007

Abstract

Although the advent of XML Schema [25] has rendered DTDs obsolete, research on practical XML optimization is mostly biased towards DTDs and tends to largely ignore XSDs (some notable exceptions notwithstanding). One of the underlying reasons is most probably the perceived simplicity of DTDs versus the alleged impenetrability of XML Schema. Indeed, optimization w.r.t. DTDs has a local flavor and usually reduces to reasoning about the familiar formalism of regular expressions. XSDs, on the other hand, even when sufficiently stripped down, are related to the less accessible class of unranked regular tree automata [6, 19, 20, 21]. Recent results on the structural expressiveness of XSDs [19], however, show that XSDs are in fact much closer to DTDs than to tree automata, which opens the possibility of directly extending techniques for DTD-based XML optimization to the realm of XML Schema. The goal of the present paper is to present the results in [19] in an accessible way. At the same time, we discuss possible applications, related research, and future research directions. Throughout the paper, we try to restrict notation to a minimum. We refer to [19] for further details.
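
To illustrate the "local flavor" of DTDs mentioned in the abstract, the following is a minimal sketch, not taken from the paper. It assumes a toy DTD written as a Python dictionary that maps each element name to a regular expression over the names of its children. The names DTD and conforms, and the store/item/title/price vocabulary, are hypothetical and chosen only for this example. Checking a document then amounts to one regular-expression match per element, with no knowledge of the element's ancestors.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy DTD (an assumption for illustration): each element name
# maps to a regular expression over its children's names, where every child
# name is followed by a single space.
DTD = {
    "store": r"(item )*",        # a store holds any number of items
    "item": r"title (price )?",  # an item has a title and an optional price
    "title": r"",                # leaf element: no element children
    "price": r"",                # leaf element: no element children
}


def conforms(elem):
    """Return True if elem and all of its descendants satisfy the toy DTD."""
    rule = DTD.get(elem.tag)
    if rule is None:
        return False  # element name is not declared
    # The check is local: only the sequence of child names matters.
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(rule, children) is None:
        return False
    return all(conforms(child) for child in elem)


good = ET.fromstring(
    "<store><item><title/><price/></item><item><title/></item></store>"
)
bad = ET.fromstring("<store><item><price/></item></store>")
print(conforms(good), conforms(bad))
```

In this sketch the rule applied to an element depends only on its name. The abstract's point is that XSDs, despite their relation to tree automata, stay close enough to this picture for DTD-based techniques to carry over; see [19] for the precise statement.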

References

[1]
G. J. Bex, F. Neven, T. Schwentick, and K. Tuyls. Inference of concise DTDs from XML data. In VLDB, pages 115--126, 2006.
[2]
G. J. Bex, F. Neven, and J. Van den Bussche. DTDs versus XML Schema: A practical study. In WebDB, pages 79--84, 2004.
[3]
G. J. Bex, F. Neven, and S. Vansummeren. Inferring XML Schema Definitions from XML data. In VLDB, pages 998--1009, 2007.
[4]
A. Brüggemann-Klein. Regular expressions into finite automata. Theoretical Computer Science, 120(2):197--213, 1993.
[5]
A. Brüggemann-Klein and D. Wood. One-unambiguous regular languages. Information and Computation, 140(2):229--253, 1998.
[6]
A. Brüggemann-Klein, M. Murata, and D. Wood. Regular tree and regular hedge languages over unranked alphabets: Version 1, April 3, 2001. Technical Report HKUST-TCSC-2001-0, The Hong Kong University of Science and Technology, 2001.
[7]
J. Clark and M. Murata. RELAX NG Specification. OASIS, December 2001.
[8]
C. Sacerdoti Coen, P. Marinelli, and F. Vitali. Schemapath, a minimal extension to XML Schema for conditional constraints. In WWW, pages 164--174, 2004.
[9]
R. Cover. The Cover pages. http://xml.coverpages.org/, 2005.
[10]
W. Gelade, W. Martens, and F. Neven. Optimizing schema languages for XML: Numerical constraints and interleaving. In ICDT, pages 269--283, 2007.
[11]
W. Gelade and F. Neven. Succinctness of pattern-based schema languages for XML. In DBPL, 2007.
[12]
G. Ghelli, D. Colazzo, and C. Sartiani. Efficient inclusion for a class of XML types with interleaving and counting. In DBPL, 2007.
[13]
G. Kasneci and T. Schwentick. The complexity of reasoning about pattern-based XML schemas. In PODS, pages 155--164, 2007.
[14]
P. Kilpeläinen and R. Tuhkanen. Regular expressions with numerical occurrence indicators - preliminary results. In SPLST, pages 163--173, 2003.
[15]
P. Kilpeläinen and R. Tuhkanen. Towards efficient implementation of XML schema content models. In DOCENG, pages 239--241, 2004.
[16]
P. Kilpeläinen and R. Tuhkanen. One-unambiguity of regular expressions with numeric occurrence indicators. Information and Computation, 205(6):890--916, 2007.
[17]
M. Mani. Keeping chess alive --- Do we need 1-unambiguous content models? In Extreme Markup Languages, Montreal, Canada, 2001.
[18]
W. Martens, F. Neven, and T. Schwentick. Complexity of decision problems for simple regular expressions. In MFCS, pages 889--900, 2004.
[19]
W. Martens, F. Neven, T. Schwentick, and G. J. Bex. Expressiveness and complexity of XML Schema. ACM Transactions on Database Systems, 31(3):770--813, 2006.
[20]
M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy of XML schema languages using formal language theory. ACM Transactions on Internet Technology, 5(4):1--45, 2005.
[21]
F. Neven. Automata theory for XML researchers. SIGMOD Record, 31(3):39--46, 2002.
[22]
L. Segoufin and V. Vianu. Validating streaming XML documents. In PODS, pages 53--64, 2002.
[23]
H. Seidl. Deciding equivalence of finite tree automata. SIAM Journal on Computing, 19(3):424--437, 1990.
[24]
C. M. Sperberg-McQueen. XML Schema 1.0: A language for document grammars. In XML --- Conference Proceedings, 2003.
[25]
C. M. Sperberg-McQueen and H. Thompson. XML Schema. Technical report, World Wide Web Consortium, 2005. http://www.w3.org/XML/Schema.
[26]
E. van der Vlist. XML Schema. O'Reilly, 2002.

Cited By

  • (2023) Exploiting Structure in Regular Expression Queries. Proceedings of the ACM on Management of Data, 1(2):1--28. DOI: 10.1145/3589297. Online publication date: 20-Jun-2023
  • (2019) Faster Carry Bit Computation for Adder Circuits with Prescribed Arrival Times. ACM Transactions on Algorithms, 15(4):1--23. DOI: 10.1145/3340321. Online publication date: 25-Jul-2019
  • (2019) Dynamic Beats Fixed. ACM Transactions on Algorithms, 15(4):1--21. DOI: 10.1145/3340296. Online publication date: 25-Jul-2019

Published In

ACM SIGMOD Record, Volume 36, Issue 3
September 2007
52 pages
ISSN: 0163-5808
DOI: 10.1145/1324185

Publisher

Association for Computing Machinery

New York, NY, United States
