DOI: 10.1145/543613.543625

A normal form for XML documents

Published: 03 June 2002

Abstract

This paper takes a first step towards the design and normalization theory for XML documents. We show that, like relational databases, XML documents may contain redundant information, and may be prone to update anomalies. Furthermore, such problems are caused by certain functional dependencies among paths in the document. Our goal is to find a way of converting an arbitrary DTD into a well-designed one, that avoids these problems. We first introduce the concept of a functional dependency for XML, and define its semantics via a relational representation of XML. We then define an XML normal form, XNF, that avoids update anomalies and redundancies. We study its properties and show that it generalizes BCNF and a normal form for nested relations when those are appropriately coded as XML documents. Finally, we present a lossless algorithm for converting any DTD into one in XNF.
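
As a relational point of reference for the BCNF that XNF is said to generalize, here is a minimal sketch that is not taken from the paper. It computes attribute closures and lists the functional dependencies whose left-hand side is not a superkey, which is the standard BCNF test. The schema, the dependencies, and the function names `closure` and `bcnf_violations` are illustrative assumptions.

```python
def closure(attrs, fds):
    """Return every attribute implied by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole left-hand side is derivable.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List nontrivial dependencies whose left-hand side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(schema)
    ]


# Hypothetical schema: a student's name is repeated for every course taken.
SCHEMA = frozenset({"student", "name", "course", "grade"})
FDS = [
    (frozenset({"student"}), frozenset({"name"})),
    (frozenset({"student", "course"}), frozenset({"grade"})),
]

if __name__ == "__main__":
    for lhs, rhs in bcnf_violations(SCHEMA, FDS):
        print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

In this assumed schema only `student -> name` is reported: `student` alone does not determine the whole row, so the name is stored once per course, which is the kind of redundancy the abstract refers to.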

Reviews

Herman Fischer

Arenas and Libkin define the background and requirements of a normal form for Extensible Markup Language (XML) documents, with the goal of converting an arbitrary XML design (DTD) into a well-designed one. Since my own work is with Extensible Business Reporting Language (XBRL) processor implementation, this background, and their algorithm for normalizing XML documents, are of high interest to me. Recent database implementations for storing normal-form XML data, both those implemented specifically for XML, such as Apache.org's XIndice, and XML features added to popular commercial heavyweight relational databases, make this technology timely and of wide interest.

The authors begin by demonstrating the need to normalize XML, using XML data structures that store the same data redundantly in different areas of the structure. They relate this to relational functional dependencies. Extending from relational to XML functional dependencies, the authors then represent tree-structured XML data as tuples (XBRL also does this). Functional dependencies are defined using these tree tuples. Finally, an XML normal form (XNF) is defined.

Anticipating massive Web databases of poorly organized data, the authors' goal is to provide principles for good design, and algorithms for producing that design: a sort of XML schema reengineering. My take on this process is different: I see the need for this technology in the opposite direction, not for the rescue of Web "stuff," but for the upcoming new generations of databases and software that will all be based on XML technologies. Mainstream business processing now considers XML and XML derivatives for automated interchange between systems, and as core infrastructural technology.

Physically, this paper seems to be about the right length (12 pages). The presentation is consistent with the authors' ideas, and the text was easy to read. The latest reference is from one year before publication.

This paper demonstrates that an XML structure can be converted to a normal form, and that data can be losslessly converted into the resulting structure. In my work with XBRL, more semantics are expressed than just the structure of the data. Source fact values expressed in XML are given context, measure, role restrictions, and semantics, defined in XML linking language (XLink) linkbases. Applications of the authors' theories, if extendable, would be very useful. I would recommend this paper to researchers and developers of XML and XML-derived data structuring and data storing technology.

Online Computing Reviews Service
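
The tuple representation mentioned in the review can be sketched roughly as follows. This is a simplified illustration and not the paper's formal tree-tuple definition: the sample document, its element and attribute names, and the helpers `flatten` and `fd_holds` are hypothetical. Each paper becomes one row, and a dependency is then checked on the rows as it would be on a relation.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every paper repeats the year of its issue.
DOC = """
<db>
  <conf title="PODS">
    <issue>
      <inproceedings key="a1" year="2002"/>
      <inproceedings key="a2" year="2002"/>
    </issue>
    <issue>
      <inproceedings key="b1" year="2001"/>
    </issue>
  </conf>
</db>
"""


def flatten(root):
    """Return one row per paper: the enclosing issue, the paper key, the year."""
    rows = []
    for issue_id, issue in enumerate(root.iter("issue")):
        for paper in issue.findall("inproceedings"):
            rows.append(
                {"issue": issue_id, "key": paper.get("key"), "year": paper.get("year")}
            )
    return rows


def fd_holds(rows, lhs, rhs):
    """Check that rows agreeing on `lhs` also agree on `rhs`."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault records the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True


ROWS = flatten(ET.fromstring(DOC.strip()))

if __name__ == "__main__":
    print("issue -> year holds:", fd_holds(ROWS, ["issue"], ["year"]))
    print("key -> year holds:", fd_holds(ROWS, ["key"], ["year"]))
    # Year values stored beyond one per issue are redundant copies.
    print("redundant year values:", len(ROWS) - len({r["issue"] for r in ROWS}))
```

Under these assumptions the issue determines the year, so storing the year once per issue instead of once per paper would remove the repeated value. Restructurings of this kind are what a normalization algorithm aims to produce.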

Published In

PODS '02: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2002
311 pages
ISBN: 1581135076
DOI: 10.1145/543613

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGMOD/PODS02

Acceptance Rates

PODS '02 Paper Acceptance Rate 24 of 109 submissions, 22%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Cited By

  • (2023) Applications of Information Inequalities to Database Theory Problems. 2023 38th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pp. 1-30. DOI: 10.1109/LICS56636.2023.10175769. Online publication date: 26-Jun-2023
  • (2019) Dependencies for Graphs. ACM Transactions on Database Systems 44:2, pp. 1-40. DOI: 10.1145/3287285. Online publication date: 13-Feb-2019
  • (2018) XML Integrity Constraints. Encyclopedia of Database Systems, pp. 4756-4761. DOI: 10.1007/978-1-4614-8265-9_787. Online publication date: 7-Dec-2018
  • (2017) Dependencies for Graphs. Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 403-416. DOI: 10.1145/3034786.3056114. Online publication date: 9-May-2017
  • (2017) XML Integrity Constraints. Encyclopedia of Database Systems, pp. 1-7. DOI: 10.1007/978-1-4899-7993-3_787-2. Online publication date: 19-Sep-2017
  • (2016) Data type definition and handling for supporting interoperability across organizational borders. Journal of Intelligent Manufacturing 27:1, pp. 167-185. DOI: 10.1007/s10845-014-0884-9. Online publication date: 1-Feb-2016
  • (2012) On evaluating an approach for balancing the trade-off on XML schema design. International Journal of Web Information Systems 8:4, pp. 371-389. DOI: 10.1108/17440081211282874. Online publication date: 16-Nov-2012
  • (2011) Attribute grammar for XML integrity constraint validation. Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I, pp. 94-109. DOI: 10.5555/2035368.2035378. Online publication date: 29-Aug-2011
  • (2011) Weak functional dependencies on trees with restructuring. Acta Cybernetica 20:2, pp. 285-329. DOI: 10.14232/actacyb.20.2.2011.5. Online publication date: 1-Feb-2011
  • (2011) A workload-aware approach for optimizing the XML schema design trade-off. Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services, pp. 12-19. DOI: 10.1145/2095536.2095542. Online publication date: 5-Dec-2011