DOI: 10.1145/1559845.1559922
research-article

Simplifying XML schema: effortless handling of nondeterministic regular expressions

Published: 29 June 2009

Abstract

Whether beloved or despised, XML Schema is currently the only industrially accepted schema language for XML and is unlikely to become obsolete any time soon. Nevertheless, many nontransparent restrictions unnecessarily complicate the design of XSDs. For instance, complex content models in XML Schema are constrained by the infamous unique particle attribution (UPA) constraint. In formal language theoretic terms, this constraint restricts content models to deterministic regular expressions. As the latter constitute a semantic notion and no simple corresponding syntactical characterization is known, it is very difficult for non-expert users to understand exactly when and why content models do or do not violate UPA. In the present paper, we therefore investigate solutions to relieve users of the burden of UPA by automatically transforming nondeterministic expressions into concise deterministic ones that define the same language or constitute good approximations of it. The presented techniques facilitate XSD construction by bringing the difficulty of the design task closer to that of the underlying modeling task. In addition, our algorithms can serve as a plug-in for any model management tool which supports export to XML Schema format.
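
To make the notion of a deterministic regular expression concrete, the following is a minimal sketch of the standard position-based (Glushkov-style) determinism test, not the transformation algorithms of the paper. It assumes expressions are built from the hypothetical helper constructors `sym`, `cat`, `alt`, `star`, and `opt`, all of which are illustrative names introduced here. An expression is reported as deterministic when no two distinct occurrences of the same symbol compete at the start of a word or after any given occurrence.

```python
from itertools import count

# Hypothetical constructors for a tiny regular-expression AST (illustrative only).
def sym(a): return ("sym", a)
def cat(l, r): return ("cat", l, r)
def alt(l, r): return ("alt", l, r)
def star(e): return ("star", e)
def opt(e): return ("opt", e)

def is_deterministic(expr):
    """Return True if no two same-labelled positions compete in first/follow sets."""
    ids = count()
    label = {}   # position id -> symbol at that occurrence
    follow = {}  # position id -> positions that may come directly after it

    def walk(e):
        # Returns (nullable, first positions, last positions) and fills `follow`.
        kind = e[0]
        if kind == "sym":
            p = next(ids)
            label[p] = e[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "cat":
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            for p in l1:
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind == "alt":
            n1, f1, l1 = walk(e[1])
            n2, f2, l2 = walk(e[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "star":
            _, f, l = walk(e[1])
            for p in l:
                follow[p] |= f  # a last position can loop back to a first one
            return True, f, l
        if kind == "opt":
            _, f, l = walk(e[1])
            return True, f, l
        raise ValueError(f"unknown node kind: {kind}")

    _, first, _ = walk(expr)

    def clash(positions):
        # True if two distinct positions in the set carry the same symbol.
        symbols = [label[p] for p in positions]
        return len(symbols) != len(set(symbols))

    return not clash(first) and not any(clash(s) for s in follow.values())

# (a|b)*a : after reading an "a", it is unclear which occurrence was matched.
nondet = cat(star(alt(sym("a"), sym("b"))), sym("a"))
# b*a(b*a)* : the same language, written deterministically.
det = cat(cat(star(sym("b")), sym("a")),
          star(cat(star(sym("b")), sym("a"))))
print(is_deterministic(nondet), is_deterministic(det))
```

The two sample expressions define the same language, which illustrates why determinism is a property of the expression rather than something visible in its syntax at a glance. The same test applies when the symbols are element names, for example `alt(cat(sym("title"), opt(sym("author"))), cat(sym("title"), sym("editor")))`.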



Published In

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
June 2009
1168 pages
ISBN:9781605585512
DOI:10.1145/1559845

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deterministic regular expressions
  2. upa
  3. xml schema

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '09: International Conference on Management of Data
June 29 - July 2, 2009
Providence, Rhode Island, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%


Cited By

  • (2019) Context-Free Grammars for Deterministic Regular Expressions with Interleaving. Theoretical Aspects of Computing – ICTAC 2019, pp. 235-252. DOI: 10.1007/978-3-030-32505-3_14. Online publication date: 22-Oct-2019
  • (2018) Inferring Deterministic Regular Expression with Counting. Conceptual Modeling, pp. 184-199. DOI: 10.1007/978-3-030-00847-5_15. Online publication date: 26-Sep-2018
  • (2017) BonXai. ACM Transactions on Database Systems 42:3, pp. 1-42. DOI: 10.1145/3105960. Online publication date: 24-Aug-2017
  • (2015) Deciding determinism of unary languages. Information and Computation 245:C, pp. 181-196. DOI: 10.1016/j.ic.2015.08.005. Online publication date: 1-Dec-2015
  • (2015) Deciding Determinism of Regular Languages. Theory of Computing Systems 57:1, pp. 97-139. DOI: 10.1007/s00224-014-9576-2. Online publication date: 1-Jul-2015
  • (2015) Fast Learning of Restricted Regular Expressions and DTDs. Theory of Computing Systems 57:4, pp. 1114-1158. DOI: 10.1007/s00224-014-9559-3. Online publication date: 1-Nov-2015
  • (2015) Definability by Weakly Deterministic Regular Expressions with Counters is Decidable. Mathematical Foundations of Computer Science 2015, pp. 369-381. DOI: 10.1007/978-3-662-48057-1_29. Online publication date: 11-Aug-2015
  • (2015) Discovering Restricted Regular Expressions with Interleaving. Web Technologies and Applications, pp. 104-115. DOI: 10.1007/978-3-319-25255-1_9. Online publication date: 13-Nov-2015
  • (2014) Discovering XSD Keys from XML Data. ACM Transactions on Database Systems 39:4, pp. 1-49. DOI: 10.1145/2638547. Online publication date: 30-Dec-2014
  • (2013) A Graphical Based Approach to the Conceptual Modeling, Validation and Generation of XML Schema Definitions. International Journal of Information Technology and Web Engineering 8:1, pp. 1-22. DOI: 10.4018/jitwe.2013010101. Online publication date: 1-Jan-2013
