DOI:10.1145/773153.773155
Article

An information-theoretic approach to normal forms for relational and XML data

Published: 09 June 2003

Abstract

Normalization as a way of producing good database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While in the relational world the criteria for being well-designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.
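
The abstract names BCNF as one of the normal forms being justified but does not define the information measure itself, so the sketch below does not attempt to reproduce it. As a point of reference only, it shows the standard textbook test for BCNF: a schema with a set of functional dependencies is in BCNF when every nontrivial dependency X -> Y has a superkey on its left-hand side. The function names `closure` and `bcnf_violations` and the toy schema over attributes A, B, C are illustrative assumptions, not taken from the paper.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a frozenset of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire X -> Y when X is already derived and Y adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema, fds):
    """Return the given FDs that are nontrivial and whose left side is not a superkey."""
    schema = frozenset(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and (closure(lhs, fds) & schema) != schema
    ]


# Hypothetical schema R(A, B, C) with the single dependency A -> B.
# A is not a key (it does not determine C), so A -> B violates BCNF.
fds_bad = [(frozenset("A"), frozenset("B"))]
print(bcnf_violations("ABC", fds_bad))

# With A -> BC, A is a key and no dependency violates BCNF.
fds_ok = [(frozenset("A"), frozenset("BC"))]
print(bcnf_violations("ABC", fds_ok))
```

Checking only the listed dependencies is enough to decide whether the schema as a whole is in BCNF; testing a decomposed sub-schema would additionally require projecting the dependencies onto it, which this sketch leaves out.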

Published In

PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2003
291 pages
ISBN:1581136706
DOI:10.1145/773153
  • Conference Chair: Frank Neven
  • General Chair: Catriel Beeri
  • Program Chair: Tova Milo
Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS03

Acceptance Rates

PODS '03 Paper Acceptance Rate 27 of 136 submissions, 20%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Cited By

  • (2010) Fast detection of functional dependencies in XML data. Proceedings of the 7th international XML database conference on Database and XML technologies, pages 113-127. DOI:10.5555/1888011.1888025. Online publication date: 17-Sep-2010
  • (2010) Information theory for data management. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pages 1255-1256. DOI:10.1145/1807167.1807337. Online publication date: 6-Jun-2010
  • (2010) Semantic clustering of XML documents. ACM Transactions on Information Systems, 28:1, pages 1-56. DOI:10.1145/1658377.1658380. Online publication date: 29-Jan-2010
  • (2010) Fast Detection of Functional Dependencies in XML Data. Database and XML Technologies, pages 113-127. DOI:10.1007/978-3-642-15684-7_10. Online publication date: 2010
  • (2009) Information theory for data management. Proceedings of the VLDB Endowment, 2:2, pages 1662-1663. DOI:10.14778/1687553.1687624. Online publication date: 1-Aug-2009
  • (2008) Lossless decompositions in complex-valued databases. Proceedings of the 5th international conference on Foundations of information and knowledge systems, pages 329-347. DOI:10.5555/1786094.1786118. Online publication date: 11-Feb-2008
  • (2008) Lossless Decompositions in Complex-Valued Databases. Foundations of Information and Knowledge Systems, pages 329-347. DOI:10.1007/978-3-540-77684-0_22. Online publication date: 2008
  • (2007) Domination normal form. Proceedings of the thirtieth Australasian conference on Computer science - Volume 62, pages 79-85. DOI:10.5555/1273749.1273759. Online publication date: 30-Jan-2007
  • (2007) On the equivalence between FDs in XML and FDs in relations. Acta Informatica, 44:3, pages 207-247. DOI:10.1007/s00236-007-0048-x. Online publication date: 26-Jun-2007
  • (2006) Making Designer Schemas with Colors. Proceedings of the 22nd International Conference on Data Engineering. DOI:10.1109/ICDE.2006.88. Online publication date: 3-Apr-2006