Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

An information-theoretic analysis of worst-case redundancy in database design

Published: 15 February 2008 Publication History

Abstract

Normal forms that guide the process of database schema design have several key goals such as elimination of redundancies and preservation of integrity constraints, such as functional dependencies. It has long been known that complete elimination of redundancies and complete preservation of constraints cannot be achieved simultaneously. In this article, we use a recently introduced information-theoretic framework, and provide a quantitative analysis of the redundancy/integrity preservation trade-off, and give techniques for comparing different schema designs in terms of the amount of redundancy they carry.
The main notion of the information-theoretic framework is that of an information content of each datum in an instance (which is a number in [0,1]): the closer to 1, the less redundancy it carries. We start by providing a combinatorial criterion that lets us calculate, for a relational schema with functional dependencies, the lowest information content in its instances. This indicates how good the schema design is in terms of allowing redundant information. We then study the normal form 3NF, which tolerates some redundancy to guarantee preservation of functional dependencies. The main result provides a formal justification for normal form 3NF by showing that this normal form pays the smallest possible price, in terms of redundancy, for achieving dependency preservation. We also give techniques for quantitative comparison of different normal forms based on the redundancy they tolerate.

References

[1]
Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley.
[2]
Aho, A. V., Beeri, C., and Ullman, J. D. 1979. The theory of joins in relational databases. ACM Trans. Datab. Syst. 4, 3, 297--314.
[3]
Arenas, M. and Libkin, L. 2004. A normal form for XML documents. ACM Trans. Datab. Syst. 29, 195--232.
[4]
Arenas, M. and Libkin, L. 2005. An information-theoretic approach to normal forms for relational and XML data. J. ACM 52, 2, 246--283.
[5]
Beeri, C., Bernstein, P. A., and Goodman, N. 1978. A sophisticate's introduction to database normalization theory. In Proceedings of the 4th International Conference on Very Large Data Bases. 113--124.
[6]
Beeri, C., Dowd, M., Fagin, R., and Statman, R. 1984. On the structure of Armstrong relations for functional dependencies. J. ACM 31, 1, 30--46.
[7]
Bernstein, P. A. 1976. Synthesizing third normal form relations from functional dependencies. ACM Trans. Datab. Syst. 1, 4, 277--298.
[8]
Bernstein, P. A. and Goodman, N. 1980. What does boyce-codd normal form do? In Proceedings of the 6th International Conference on Very Large Data Bases. IEEE Computer Society, 245--259.
[9]
Biskup, J. 1995. Achievements of relational database schema design theory revisited. In Semantics in Databases. 29--54.
[10]
Biskup, J., Dayal, U., and Bernstein, P. A. 1979. Synthesizing independent database schemas. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 143--151.
[11]
Biskup, J. and Meyer, R. 1987. Design of relational database schemes by deleting attributes in the canonical decomposition. J. Comput. Syst. Sci. 35, 1, 1--22.
[12]
Cavallo, R. and Pittarelli, M. 1987. The theory of probabilistic databases. In Proceedings of the 13th International Conference on Very Large Data Bases. 71--81.
[13]
Cover, T. M. and Thomas, J. A. 1991. Elements of Information Theory. John Wiley and Sons.
[14]
Dalkilic, M. M. and Robertson, E. L. 2000. Information dependencies. In Proceedings of the 19th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 245--253.
[15]
Demetrovics, J. and Thi, V. 1987. Keys, antikeys and prime attributes. Annales Univ. Sci., Sect. Comp., Budapest 8, 35--52.
[16]
Dewson, R. 2006. Beginning SQL Server 2005 for Developers: From Novice to Professional. Apress.
[17]
Fagin, R. 1979. Normal forms and relational database operators. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 153--160.
[18]
Fagin, R. 1981. A normal form for relational databases that is based on domians and keys. ACM Trans. Datab. Syst. 6, 3, 387--415.
[19]
Greenwald, R., Stackowiak, R., and Stern, J. 2007. Oracle Essentials: Oracle Database 11g, 4th Ed. O'Reilly Media.
[20]
Kanellakis, P. C. 1990. Elements of relational database theory. In Handbook of Theoretical Computer Science, Vol. B: Formal Models and Semantics, 1073--1156.
[21]
Kifer, M., Bernstein, A., and Lewis, P. M. 2006. Database Systems: An Application-Oriented Approach. Addison-Wesley.
[22]
Kolahi, S. 2007. Dependency-Preserving normalization of relational and XML data. J. Comput. Syst. Sci. 73, 4, 636--647.
[23]
Kolahi, S. and Libkin, L. 2006. On redundancy vs dependency preservation in normalization: an information-theoretic study of 3NF. In Proceedings of the 25th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 114--123.
[24]
LeDoux, C. H. and Parker, D. S. 1982. Reflections on boyce-codd normal form. In Proceedings of the 8th International Conference on Very Large Data Bases. Morgan Kaufmann, 131--141.
[25]
Lee, T. T. 1987. An information-theoretic analysis of relational databases - Part i: Data dependencies and information metric. IEEE Trans. Softw. Engin. 13, 10, 1049--1061.
[26]
Levene, M., Levene, M., and Loizou, G. 1999. A Guided Tour of Relational Databases and Beyond. Springer.
[27]
Levene, M. and Loizou, G. 2003. Why is the snowflake schema a good data warehouse design? Inf. Syst. 28, 3, 225--240.
[28]
Levene, M. and Vincent, M. W. 2000. Justification for inclusion dependency normal form. IEEE Trans. Knowl. Data Engin. 12, 2, 281--291.
[29]
Ling, T. W., Tompa, F. W., and Kameda, T. 1981. An improved third normal form for relational databases. ACM Trans. Datab. Syst. 6, 2, 329--346.
[30]
Mannila, H. and Räihä, K.-J. 1986. Design by example: An application of Armstrong relations. J. Comput. Syst. Sci. 33, 3, 126--141.
[31]
Stephens, R. K. and Plew, R. R. 2002. Sams Teach Yourself SQL in 21 Days 4th Ed. Sams.
[32]
Valiant, L. G. 1979. The complexity of enumeration and reliability problems. SIAM J. Comput. 8, 3, 410--421.
[33]
Vincent, M. W. 1999. Semantic foundations of 4NF in relational database design. Acta Inf. 36, 3, 173--213.
[34]
Zaniolo, C. 1982. A new normal form for the design of relational database schemata. ACM Trans. Datab. Syst. 7, 3, 489--499.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 35, Issue 1
February 2010
310 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1670243
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Accepted: 01 August 2009
Revised: 01 May 2009
Received: 01 September 2008
Published: 15 February 2008
Published in TODS Volume 35, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Database design
  2. Third Normal Form (3NF)
  3. functional dependency
  4. redundancy

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)24
  • Downloads (Last 6 weeks)5
Reflects downloads up to 19 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Third and Boyce–Codd normal form for property graphsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-025-00902-234:2Online publication date: 1-Mar-2025
  • (2024)Relational Schemas with Multiplicity Bounds, Diversity Bounds and Functional DependenciesFoundations of Information and Knowledge Systems10.1007/978-3-031-56940-1_3(45-63)Online publication date: 29-Mar-2024
  • (2023)Normalizing Property GraphsProceedings of the VLDB Endowment10.14778/3611479.361150616:11(3031-3043)Online publication date: 24-Aug-2023
  • (2022)Uniform probabilistic generation of relation instances satisfying a functional dependencyInformation Systems10.1016/j.is.2021.101848103:COnline publication date: 1-Jan-2022
  • (2021)Embedded Functional Dependencies and Data-completeness Tailored Database DesignACM Transactions on Database Systems10.1145/345051846:2(1-46)Online publication date: 29-May-2021
  • (2021)Logical Schema Design that Quantifies Update Inefficiency and Join EfficiencyProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3459238(1169-1181)Online publication date: 9-Jun-2021
  • (2019)Embedded functional dependencies and data-completeness tailored database designProceedings of the VLDB Endowment10.14778/3342263.334262612:11(1458-1470)Online publication date: 1-Jul-2019
  • (2019)Checking inference-proofness of attribute-disjoint and duplicate-preserving fragmentationsAnnals of Mathematics and Artificial Intelligence10.1007/s10472-019-09655-587:1-2(43-82)Online publication date: 1-Oct-2019
  • (2018)Inferences from Attribute-Disjoint and Duplicate-Preserving Relational FragmentationsFoundations of Information and Knowledge Systems10.1007/978-3-319-90050-6_5(77-96)Online publication date: 18-Apr-2018
  • (2016)Data Inconsistency Evaluation for Cyberphysical SystemInternational Journal of Distributed Sensor Networks10.1177/15501477949687812:8(9496878)Online publication date: 9-Aug-2016
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media