An information-theoretic approach to normal forms for relational and XML data

Published: 01 March 2005

Abstract

Normalization as a way of producing good relational database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While, in the relational world, the criteria for being well designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.
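
For readers less familiar with the normal forms named above, here is a minimal sketch of the classical syntactic test for BCNF, the kind of condition the abstract sets out to justify. It is not the article's information-theoretic measure, which the abstract does not spell out. The function names `closure` and `bcnf_violations` and the toy schema over attributes A, B, C are assumptions made for illustration only.

```python
def closure(attrs, fds):
    """Return the closure of a set of attributes under functional dependencies.

    fds is a list of (lhs, rhs) pairs of frozensets, read as lhs -> rhs.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already derivable.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List the nontrivial dependencies whose left-hand side is not a superkey.

    Assumes every dependency mentions only attributes of the schema.
    An empty list means the schema is in BCNF with respect to fds.
    """
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not schema <= closure(lhs, fds)
    ]


# Hypothetical example: R(A, B, C) with the single dependency A -> B.
fds = [(frozenset({"A"}), frozenset({"B"}))]
violations_before = bcnf_violations(frozenset({"A", "B", "C"}), fds)
violations_after = bcnf_violations(frozenset({"A", "B"}), fds)
```

In this toy case A is not a superkey of R(A, B, C), so `violations_before` reports A -> B, while the smaller schema R(A, B) has no violations.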

Published In

Journal of the ACM, Volume 52, Issue 2
March 2005
189 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/1059513

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Information theory
  2. XML
  3. design
  4. normal forms
  5. normalization algorithms
  6. relational databases


