Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2882903.2915239acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

SQL Schema Design: Foundations, Normal Forms, and Normalization

Published: 14 June 2016 Publication History

Abstract

Normalization helps us find a database schema at design time that can process the most frequent updates efficiently at run time. Unfortunately, relational normalization only works for idealized database instances in which duplicates and null markers are not present. On one hand, these features occur frequently in real-world data compliant with the industry standard SQL, and especially in modern application domains. On the other hand, the features impose challenges that have made it impossible so far to extend the existing forty year old normalization framework to SQL. We introduce a new class of functional dependencies and show that they provide the right notion for SQL schema design. Axiomatic and linear-time algorithmic characterizations of the associated implication problem are established. These foundations enable us to propose a Boyce-Codd normal form for SQL. Indeed, we justify the normal form by showing that it permits precisely those SQL instances which are free from data redundancy. Unlike the relational case, there are SQL schemata that cannot be converted into Boyce-Codd normal form. Nevertheless, for an expressive sub-class of our functional dependencies we establish a normalization algorithm that always produces a schema in Value-Redundancy free normal form. This normal form permits precisely those instances which are free from any redundant data value occurrences other than the null marker. Experiments show that our functional dependencies occur frequently in real-world data and that they are effective in eliminating redundant values from these data sets without loss of information.

References

[1]
Z. Abedjan, L. Golab, and F. Naumann. Profiling relational data: a survey. VLDB J., 24(4):557--581, 2015.
[2]
M. Arenas. Normalization theory for XML. SIGMOD Record, 35(4):57--64, 2006.
[3]
W. W. Armstrong. Dependency structures of data base relationships. In IFIP Congress, pages 580--583, 1974.
[4]
P. Atzeni and N. M. Morfuni. Functional dependencies and constraints on null values in database relations. Information and Control, 70(1):1--31, 1986.
[5]
C. Beeri and P. A. Bernstein. Computational problems related to the design of normal form relational schemas. ACM Trans. Database Syst., 4(1):30--59, 1979.
[6]
P. A. Bernstein and N. Goodman. What does boyce-codd normal form do? In VLDB, pages 245--259, 1980.
[7]
J. Biskup, U. Dayal, and P. A. Bernstein. Synthesizing independent database schemas. In SIGMOD, pages 143--151, 1979.
[8]
J. Biskup and T. Polle. Adding inclusion dependencies to an object-oriented data model with uniqueness constraints. Acta Inf., 39(6--7):391--449, 2003.
[9]
E. F. Codd. Recent investigations in relational data base systems. In IFIP Congress, pages 1017--1021, 1974.
[10]
E. F. Codd. The Relational Model for Database Management, Version 2. Addison-Wesley, 1990.
[11]
H. Darwen, C. Date, and R. Fagin. A normal form for preventing redundant tuples in relational databases. In ICDT, pages 114--126, 2012.
[12]
J. Diederich and J. Milton. New methods and fast algorithms for database normalization. ACM Trans. Database Syst., 13(3):339--365, 1988.
[13]
R. Fagin. Multivalued dependencies and a new normal form for relational databases. ACM Trans. Database Syst., 2(3):262--278, 1977.
[14]
R. Fagin. A normal form for relational databases that is based on domains and keys. ACM Trans. Database Syst., 6(3):387--415, 1981.
[15]
F. Ferrarotti, S. Hartmann, H. Köhler, S. Link, and M. W. Vincent. The Boyce-Codd-Heath normal form for SQL. In WoLLIC, pages 110--122, 2011.
[16]
S. Hartmann and S. Link. When data dependencies over SQL tables meet the logics of paradox and S-3. In PODS, pages 317--326, 2010.
[17]
S. Hartmann and S. Link. The implication problem of data dependencies over SQL table definitions. ACM Trans. Database Syst., 37(2):13, 2012.
[18]
I. J. Heath. Unacceptable file operations in a relational data base. In SIGFIDET Workshop, pages 19--33, 1971.
[19]
R. T. Herrera, J. Tekli, R. Chbeir, S. Laborie, I. Dongo, and R. Guzman. Toward RDF normalization. In ER, pages 261--275, 2015.
[20]
H. Köhler and S. Link. SQL Schema Design. Technical Report TR-496, Centre for Discrete Mathematics and Theoretical Computer Science, The University of Auckland, 2016.
[21]
H. Köhler, S. Link, and X. Zhou. Possible and certain SQL key. PVLDB, 8(11):1118--1129, 2015.
[22]
G. Lausen. Relational databases in RDF: Keys and foreign keys. In SWDB-ODBIS, pages 43--56, 2007.
[23]
M. Levene. The Nested Universal Relation Database Model, volume 595 of LNCS. Springer, 1992.
[24]
M. Levene and G. Loizou. Axiomatisation of functional dependencies in incomplete relations. Theor. Comput. Sci., 206(1--2):283--300, 1998.
[25]
M. Levene and G. Loizou. Database design for incomplete relations. ACM Trans. Database Syst., 24(1):80--125, 1999.
[26]
M. Levene and G. Loizou. A guided tour of relational databases and beyond. Springer, 1999.
[27]
M. Levene and M. W. Vincent. Justification for inclusion dependency normal form. IEEE Trans. Knowl. Data Eng., 12(2):281--291, 2000.
[28]
Y. E. Lien. On the equivalence of database models. J. ACM, 29(2):333--362, 1982.
[29]
S. Link. Reasoning about saturated conditional independence under uncertainty. In AAAI, 2013.
[30]
J. A. Makowsky and E. V. Ravve. Dependency preserving refinements and the fundamental problem of database design. Data Knowl. Eng., 24(3):277--312, 1998.
[31]
H. Mannila and K.-J. Raiha. Design of Relational Databases. Addison-Wesley, 1992.
[32]
M. Niepert, M. Gyssens, B. Sayrafi, and D. V. Gucht. On the conditional independence implication problem: A lattice-theoretic approach. Artif. Intell., 202:29--51, 2013.
[33]
T. Papenbrock, J. Ehrlich, J. Marten, T. Neubert, J. Rudolph, M. Schönberg, J. Zwiener, and F. Naumann. Functional dependency discovery. PVLDB, 8(10):1082--1093, 2015.
[34]
J. Paredaens, P. D. Bra, M. Gyssens, and D. V. Gucht. The Structure of the Relational Database Model, volume 17 of EATCS Monographs on Theoretical Computer Science. Springer, 1989.
[35]
J. Rissanen. Independent components of relations. ACM Trans. Database Syst., 2(4):317--325, 1977.
[36]
Z. Tari, J. Stokes, and S. Spaccapietra. Object normal forms and dependency constraints for object-oriented schemata. ACM Trans. Database Syst., 22(4):513--569, 1997.
[37]
B. Thalheim. Entity-relationship modeling - foundations of database technology. Springer, 2000.
[38]
D.-M. Tsou and P. C. Fischer. Decomposition of a relation scheme into Boyce-Codd normal form. SIGACT News, 14(3):23--29, 1982.
[39]
Y. Vassiliou. Functional dependencies and incomplete information. In VLDB, pages 260--269, 1980.
[40]
M. W. Vincent. Semantic foundations of 4NF in relational database design. Acta Inf., 36(3):173--213, 1999.
[41]
M. W. Vincent, J. Liu, and C. Liu. Strong functional dependencies and their application to normal forms in XML. ACM Trans. Database Syst., 29(3):445--462, 2004.
[42]
C. Zaniolo. Database relations with null values. J. Comput. System Sci., 28(1):142--166, 1984.

Cited By

View all
  • (2023)Normalizing Property GraphsProceedings of the VLDB Endowment10.14778/3611479.361150616:11(3031-3043)Online publication date: 24-Aug-2023
  • (2023)Cardinality constraints and functional dependencies in SQL: Taming data redundancy in logical database designInformation Systems10.1016/j.is.2023.102208115(102208)Online publication date: May-2023
  • (2022)Data ProfilingundefinedOnline publication date: 25-Feb-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2016

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Boyce-Codd normal form
  2. SQL
  3. axiomatization
  4. discovery
  5. functional dependency
  6. implication problem
  7. key
  8. normalization
  9. null
  10. redundancy

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS'16
Sponsor:
SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
California, San Francisco, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)64
  • Downloads (Last 6 weeks)5
Reflects downloads up to 25 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Normalizing Property GraphsProceedings of the VLDB Endowment10.14778/3611479.361150616:11(3031-3043)Online publication date: 24-Aug-2023
  • (2023)Cardinality constraints and functional dependencies in SQL: Taming data redundancy in logical database designInformation Systems10.1016/j.is.2023.102208115(102208)Online publication date: May-2023
  • (2022)Data ProfilingundefinedOnline publication date: 25-Feb-2022
  • (2021)Embedded Functional Dependencies and Data-completeness Tailored Database DesignACM Transactions on Database Systems10.1145/345051846:2(1-46)Online publication date: 29-May-2021
  • (2021)Logical Schema Design that Quantifies Update Inefficiency and Join EfficiencyProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3459238(1169-1181)Online publication date: 9-Jun-2021
  • (2021)Possible Keys and Functional DependenciesJournal on Data Semantics10.1007/s13740-021-00135-w10:3-4(327-366)Online publication date: 14-Aug-2021
  • (2021)Data dependencies for query optimization: a surveyThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-021-00676-331:1(1-22)Online publication date: 14-Jun-2021
  • (2019)Discovery and ranking of embedded uniqueness constraintsProceedings of the VLDB Endowment10.14778/3358701.335870312:13(2339-2352)Online publication date: 1-Sep-2019
  • (2019)Embedded functional dependencies and data-completeness tailored database designProceedings of the VLDB Endowment10.14778/3342263.334262612:11(1458-1470)Online publication date: 1-Jul-2019
  • (2018)Data ProfilingSynthesis Lectures on Data Management10.2200/S00878ED1V01Y201810DTM05210:4(1-154)Online publication date: 7-Nov-2018
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media