The document discusses database normalization. Normalization is the process of organizing data to avoid data redundancy and inconsistencies. It discusses the three normal forms - 1st normal form requires each table column contain atomic values, 2nd normal form requires columns depend on the whole primary key, and 3rd normal form removes transitive dependencies. The document also contrasts top-down design, which identifies entity types before attributes, versus bottom-up design, which groups attributes into entities.
2. IN THIS PRESENTATION
Entity and Referential Integrity
Physical Database Design: tables, primary keys, foreign keys
Normalization - 1st, 2nd, 3rd Normal Forms
Top-down versus Bottom-up Design
3. TABLES / RELATIONS
When creating a table (also called a relation):
• Each attribute value must be a single value only.
• All values for a given attribute must be of the same data type.
• Each attribute (column) name must be unique.
• The order of attributes (columns) is insignificant.
• No two tuples (rows) in a relation should be identical.
• The order of the tuples (rows) is insignificant.
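A minimal sketch of how some of these rules show up in practice, using Python's built-in sqlite3 module. The student table and its columns are hypothetical, chosen only for illustration; the primary key is what prevents two identical rows here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical relation: every column has a unique name and a declared type.
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " major TEXT NOT NULL)"
)
conn.execute("INSERT INTO student VALUES (1, 'Ada', 'Mathematics')")

# A second tuple with the same key is rejected, so two identical rows
# cannot be stored in this relation.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Ada', 'Mathematics')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Without a key or uniqueness constraint, most SQL systems would accept the duplicate row, which is one reason keys matter.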
4. ENTITY AND REFERENTIAL INTEGRITY
• An Entity typically corresponds to a relation.
• Thus an entity’s attributes become attributes of the
relation.
• These attributes are represented by columns in a
relation
5. KEYS
• Keys play a very important role in relational databases. They
are used to establish and identify relationships between
tables. They also ensure that each record can be uniquely
identified by a combination of one or more fields in a
table.
6. PRIMARY & FOREIGN KEYS
Foreign Key
A field in a table that matches the primary key column of another table. The
purpose of the foreign key is to ensure referential integrity of the data. In other
words, only values that are supposed to appear in the database are permitted.
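As a rough illustration of referential integrity, the sketch below uses Python's built-in sqlite3 module. The department and employee tables are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other database systems behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"  # foreign key
)

conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 10)")  # department 10 exists

# Department 99 does not exist, so this row would break referential integrity.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Bob', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the employee whose dept_id matches an existing department row is stored.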
7. FUNCTIONAL DEPENDENCIES
• Describes a relationship between attributes within a single table.
• An attribute is functionally dependent on another if we can use
the value of one attribute to determine the value of another.
• Example: Employee_Name is functionally dependent on
Social_Security_Number because Social_Security_Number can be
used to uniquely determine the value of Employee_Name.
The arrow symbol → is used to indicate a functional dependency.
X → Y is read "X functionally determines Y".
8. FUNCTIONAL DEPENDENCIES
Here are a few more examples:
- Student_ID → Student_Major
- Student_ID, Course_Number, Semester → Grade
- Car_Model, Options, TaxRate → Car_Price
• The attributes listed on the left hand side of the → are
called determinants.
• One can read A → B as:
• A Determines B
• Given a value for A, we can determine one value for B.
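The Employee_Name example can be checked against sample data with a small helper. This is a sketch only: the `holds` function and the sample rows are hypothetical, and a data sample can refute a functional dependency but never prove that it holds in general.

```python
def holds(rows, determinant, dependent):
    """Return True if determinant -> dependent is not violated by rows.

    rows is a list of dicts; determinant and dependent are lists of
    attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data: two different employees share the name "Ann".
employees = [
    {"Social_Security_Number": "111", "Employee_Name": "Ann"},
    {"Social_Security_Number": "222", "Employee_Name": "Bob"},
    {"Social_Security_Number": "333", "Employee_Name": "Ann"},
]

ssn_determines_name = holds(employees, ["Social_Security_Number"], ["Employee_Name"])
name_determines_ssn = holds(employees, ["Employee_Name"], ["Social_Security_Number"])
```

In this sample, Social_Security_Number → Employee_Name is not violated, while Employee_Name → Social_Security_Number is, because "Ann" appears with two different numbers.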
9. NORMALIZATION
• Normalization is a process in which we systematically
examine relations for anomalies and, when they are
detected, remove those anomalies by splitting up
the relation into two new, related relations.
In a nutshell, normalization is the process of
efficiently organizing data in a database.
10. NORMALIZATION
• Normalization is a relational database concept. If you have
created a correct entity model, then the tables created
during design will conform to the rules of normalization.
• Normalization can also be thought of as a trade-off between
data redundancy and performance. Normalizing a relation
reduces data redundancy but introduces the need for joins
when all of the data is required by an application such as a
report query.
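To make the trade-off concrete, here is a small sketch with Python's built-in sqlite3 module. The customer and purchase tables are hypothetical. Each customer name is stored once, so a report that needs the name alongside purchase totals has to join the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        amount INTEGER
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO purchase VALUES (100, 1, 25), (101, 1, 40), (102, 2, 15);
""")

# The name lives only in customer; the report query joins to bring it back.
report = conn.execute(
    "SELECT c.name, SUM(p.amount) "
    "FROM purchase AS p "
    "JOIN customer AS c ON c.customer_id = p.customer_id "
    "GROUP BY c.customer_id, c.name "
    "ORDER BY c.customer_id"
).fetchall()
```

Storing the name on every purchase row would avoid the join but would repeat the name for each purchase.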
11. NORMAL FORMS
• There is a series of guidelines for ensuring that
databases are normalized. These are divided into:
• 1NF – First Normal Form
• 2NF – Second Normal Form
• 3NF – Third Normal Form
• 4NF – Fourth Normal Form
• 5NF – Fifth Normal Form
• BCNF – Boyce-Codd Normal Form
“Third normal form is the generally accepted goal for a
database design that eliminates redundancy.”
• 4NF and 5NF are rarely seen and won't be discussed in
this chapter.
12. NORMALIZATION RULES
Normal Form Rule | Description
First Normal Form (1NF) | The table contains no repeating groups, i.e. no columns are repeated.
Second Normal Form (2NF) | The table must be in 1NF. Every attribute must be dependent upon the entity’s entire unique identifier (UID).
Third Normal Form (3NF) | The table must be in 2NF. No non-UID attribute can be dependent on another non-UID attribute.
“Each non-primary key value MUST be dependent on the
key, the whole key, and nothing but the key.”
13. FIRST NORMAL FORM – 1NF
The table must be an unordered, two-dimensional structure.
A table is in first normal form (1NF) if it contains no repeating groups.
• Steps to Remove Repeating Groups
1. Remove the repeating columns from the original table.
2. Create separate tables for each group of related data
3. Identify each row with a unique column or set of columns
(the primary key).
4. Create a foreign key in the new table to link back to the
original table.
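The steps above can be sketched in plain Python. The customer data and the phone1 to phone3 columns are hypothetical; they stand in for any repeating group.

```python
# Hypothetical unnormalized rows: phone1..phone3 form a repeating group.
unnormalized = [
    {"customer_id": 1, "name": "Ann",
     "phone1": "555-0100", "phone2": "555-0101", "phone3": None},
    {"customer_id": 2, "name": "Bob",
     "phone1": "555-0200", "phone2": None, "phone3": None},
]
phone_columns = ["phone1", "phone2", "phone3"]

# Steps 1 and 3: the original table keeps only the non-repeating columns,
# with customer_id as its primary key.
customer = [
    {"customer_id": r["customer_id"], "name": r["name"]} for r in unnormalized
]

# Steps 2 and 4: the repeating group moves to its own table, one row per
# phone number, with customer_id as the foreign key back to customer.
customer_phone = [
    {"customer_id": r["customer_id"], "phone": r[col]}
    for r in unnormalized
    for col in phone_columns
    if r[col] is not None
]
```

In this sketch the pair (customer_id, phone) would serve as the primary key of the new table, and a customer can now have any number of phone numbers without changing the table structure.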
14. 2ND NORMAL FORM
A relation is in second normal form (2NF) if it is in 1NF and all of its non-key
attributes are dependent on the whole key.
• Another way to say this: A relation is in second normal form
if it is free from partial-key dependencies.
• Relations that have a single attribute for a key are
automatically in 2NF.
15. 2ND NORMAL FORM
• Steps to Remove Partial Dependencies
1. Determine which non-key columns are only partially
dependent upon the table’s primary key.
2. Remove those columns from the base table.
3. Create a second table with those non-key columns and
assign an appropriate primary key.
4. Create a foreign key from the original base table to the
new table, linking to the new primary key.
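A small worked example of these steps, in plain Python. The enrollment table is hypothetical: its primary key is assumed to be (student_id, course_id), and course_title depends only on course_id, which is a partial dependency.

```python
# Hypothetical 1NF table with composite key (student_id, course_id).
enrollment = [
    {"student_id": 1, "course_id": "DB101", "course_title": "Databases", "grade": "A"},
    {"student_id": 2, "course_id": "DB101", "course_title": "Databases", "grade": "B"},
    {"student_id": 1, "course_id": "MA201", "course_title": "Statistics", "grade": "B"},
]

# Steps 1-3: course_title depends on only part of the key (course_id),
# so it moves to a new course table whose primary key is course_id.
course = {r["course_id"]: r["course_title"] for r in enrollment}

# Step 4: the base table keeps course_id, which now acts as a foreign key
# to the course table; grade still depends on the whole key.
enrollment_2nf = [
    {k: r[k] for k in ("student_id", "course_id", "grade")} for r in enrollment
]
```

After the split, each course title is stored once instead of once per enrolled student.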
16. 3RD NORMAL FORM
A relation is in third normal form (3NF) if it is in second normal form and
it contains no transitive dependencies.
• Steps to Remove Transitive Dependencies
1. Determine which columns are dependent on another non-key column.
2. Remove those columns from the base table.
3. Create a second table with those columns and the non-key columns that they are dependent upon.
4. Create a foreign key in the original table linking to the
primary key of the new table.
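A small worked example of these steps, in plain Python. The employee table is hypothetical: emp_id is assumed to be the primary key, and dept_name depends on dept_id, a non-key column, so emp_id → dept_id → dept_name is a transitive dependency.

```python
# Hypothetical 2NF table with primary key emp_id.
employee = [
    {"emp_id": 1, "name": "Ann", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 2, "name": "Bob", "dept_id": 10, "dept_name": "Sales"},
    {"emp_id": 3, "name": "Cy", "dept_id": 20, "dept_name": "Support"},
]

# Steps 1-3: dept_name depends on the non-key column dept_id, so both go
# into a new department table whose primary key is dept_id.
department = {r["dept_id"]: r["dept_name"] for r in employee}

# Step 4: dept_id stays in the original table as a foreign key.
employee_3nf = [{k: r[k] for k in ("emp_id", "name", "dept_id")} for r in employee]

# Joining the two tables on dept_id reproduces the original rows, which
# suggests no information was lost by the split.
rejoined = [dict(r, dept_name=department[r["dept_id"]]) for r in employee_3nf]
```

Renaming a department now means updating one row in the department table rather than every matching employee row.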
17. TOP-DOWN VS BOTTOM-UP DATABASE SCHEMA DESIGN
• TOP DOWN
• Identifies the data sets and then defines the data
elements for each of those sets. That is, entity types
are defined first, followed by each entity’s attributes, often
represented by ER modelling.
• BOTTOM UP
• First identifies the data elements and then groups them
together in data sets, i.e. it first defines attributes and
then groups them to form entities.
18. TOP-DOWN DESIGN VS BOTTOM UP DESIGN
[Diagram: a Conceptual Model, its Entities, and each Entity's Attributes, with one arrow labelled "Top Down" and one labelled "Bottom Up".]
19. SUMMARY
1NF - The table must be an unordered, two-dimensional
structure. The table cannot contain repeating groups.
2NF - The table must be in 1NF. Every non-key column must be
dependent on all parts of the primary key.
3NF - The table must be in 2NF. No non-key column may be
functionally dependent on another non-key column.
An entity-relationship model transforms into a
normalized data design.