Functional Dependencies and Normalization for Relational Databases
Relational Database Designs
There are many different relational database
"designs" (schemas) that can be used to store the
relevant mini-world information.
Problems:
• Redundant Data
• Inability to represent certain information
• Loss of information
Design Anomalies
Ship(S#, sname, status, city, P#, pname, colour, weight,
qty, date)
Good designs:
No anomalies
Can reconstruct all original information
EXAMPLE: SHIPMENT OF PARTS
Informal Design Guidelines for Relational
Databases
RELATION DECOMPOSITION
Relation decomposition can be dangerous: we can
lose information.
Loss-less join decomposition
• Eliminates redundancy
Spurious Tuples
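A small sketch of how a careless decomposition produces spurious tuples. The relation R(sid, city, part) and its two rows are made up for illustration and are not taken from the slides.

```python
# Hypothetical relation R(sid, city, part); the rows are illustrative only.
R = {
    ("S1", "London", "P1"),
    ("S2", "London", "P2"),
}

# Decompose R into R1(sid, city) and R2(city, part) by projection.
R1 = {(sid, city) for sid, city, _ in R}
R2 = {(city, part) for _, city, part in R}

# Natural join of R1 and R2 on the shared attribute city.
joined = {
    (sid, city1, part)
    for sid, city1 in R1
    for city2, part in R2
    if city1 == city2
}

# Tuples that the join returns but that were never in R.
spurious = joined - R
print(sorted(spurious))
```

The join returns every original tuple plus extra ones, so we can no longer tell which supplier shipped which part. That is the sense in which information is lost.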
RELATION DECOMPOSITION
Join returns the original relation
Unnecessary: no redundancy is removed, and
now SID is stored twice!
Unnecessary decomposition
Functional Dependencies
How do we tell if a design is bad?
EXAMPLES OF FD CONSTRAINTS
• An FD is a property of the relation schema R. It must be defined
by someone who knows the semantics of the attributes of R.
• FDs hold at all times
• A given state of the database can rule out certain FDs (one
counterexample is enough), but it cannot prove that an FD holds
FUNCTIONAL DEPENDENCIES
Given a relation R and a set of FD’s F
Does another FD follow from F?
Are some of the FD’s in F redundant (i.e., they follow
from the others)?
Example:
Disk_drive (‘1978619’, ‘WesternDigital’, ‘A2235X’, ‘765234’, 500, ‘CompUSA’)
PROOFS
If X -> YZ, then X -> Y and X -> Z
YZ -> Y (reflexive)
X -> YZ (given)
X -> Y (transitivity)
ALGORITHM FOR COMPUTING ATTRIBUTE
CLOSURE
The closure of X (denoted X+) with respect to F is the
set of all attributes functionally determined by X
Algorithm for computing the closure
X+ := X
repeat
  oldX+ := X+
  for each functional dependency Y → Z in F do
    if Y ⊆ X+ then X+ := X+ ∪ Z
until (X+ = oldX+)
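The loop above can be sketched in Python as follows. The function name `closure` and the representation of an FD as a pair of sets are choices made for this sketch. The sample FDs (Ssn, Pno, and so on) are the ones used in Example 1 of these slides.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:                      # the "repeat ... until" loop
        changed = False
        for lhs, rhs in fds:
            # Y -> Z applies once Y is already inside the closure
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [
    ({"Ssn"}, {"Ename"}),
    ({"Pno"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pno"}, {"Hours"}),
]

print(sorted(closure({"Ssn"}, F)))
print(sorted(closure({"Ssn", "Pno"}, F)))
```

The loop stops as soon as a full pass over F adds nothing new, which is the same condition as X+ = oldX+.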
EXAMPLE 1: CLOSURE ALGORITHM
Consider a schema with the following FDs:
F = {Ssn -> Ename,
Pno -> {Pname, Plocation},
{Ssn, Pno} -> Hours }
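Example: F includes
SID → name, email; email → SID; SID, CID → grade
Does CID, email → grade follow from F?
Proof using the inference rules:
email → SID (given in F)
CID, email → CID, SID (augmentation)
SID, CID → grade (given in F)
CID, email → grade (transitivity)
The same answer using the closure algorithm:
{CID, email}+ = ?
X+ = {CID, email}
email → SID adds SID; SID → name, email adds name; SID, CID → grade adds grade
X+ = {CID, email, SID, name, grade}, which contains grade, so CID, email → grade holds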
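The question "does another FD follow from F?" can also be answered mechanically: X → Y follows from F exactly when Y is contained in X+. A minimal sketch, where the helper names `closure` and `follows` are chosen for this illustration:

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def follows(lhs, rhs, fds):
    """True if lhs -> rhs can be inferred from fds."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SID"}, {"name", "email"}),
    ({"email"}, {"SID"}),
    ({"SID", "CID"}, {"grade"}),
]

print(follows({"CID", "email"}, {"grade"}, F))   # the FD asked about above
print(follows({"CID"}, {"grade"}, F))            # CID alone, for contrast
```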
Minimal Sets of FDs
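A set of FDs F is minimal if it satisfies the
following conditions:
1. Every FD in F has a single attribute on its right-hand side.
2. No FD X -> A in F can be replaced by Y -> A, where Y is a proper
subset of X, and still give a set of FDs equivalent to F.
3. No FD can be removed from F and still give a set of FDs
equivalent to F.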
EXAMPLE: MINIMAL COVER
F = { AB -> C, C -> A, BC -> D, ACD -> B, D -> EG,
BE -> C, CG -> BD, CE -> AG}
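Consider ACD -> B to see if any of the three
attributes on the LHS is redundant.
Compute CD+ = {A, B, C, D, E, G}.
The closure contains A (via C -> A), so A is redundant on the LHS.
The closure contains B, which tells us that CD -> B holds,
so ACD -> B can be replaced by CD -> B.
To look for a redundant FD X -> A:
if {F – {X -> A}} is equivalent to F,
then remove X -> A from F.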
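Both checks reduce to a closure computation. The sketch below uses the F of this example. The helper names (`fd`, `is_extraneous`, `is_redundant`) are made up for the illustration, and single-letter attribute names are assumed so that an FD can be written as two strings.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Closure of attrs under fds (same repeat-until loop as the slides)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_extraneous(attr, lhs, rhs, fds):
    """True if attr can be dropped from the left side of lhs -> rhs."""
    return rhs <= closure(lhs - {attr}, fds)


def is_redundant(dep, fds):
    """True if dep follows from the other FDs in fds."""
    others = [other for other in fds if other != dep]
    return dep[1] <= closure(dep[0], others)


fds = [fd("AB", "C"), fd("C", "A"), fd("BC", "D"), fd("ACD", "B"),
       fd("D", "EG"), fd("BE", "C"), fd("CG", "BD"), fd("CE", "AG")]

acd_b = fd("ACD", "B")
print(sorted(closure(set("CD"), fds)))   # attributes reachable from CD
print(is_extraneous("A", *acd_b, fds))   # is A redundant in ACD -> B?
print(is_redundant(acd_b, fds))          # does ACD -> B follow from the rest?
```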
EXAMPLE 2: COMPUTE MINIMAL SETS OF FDS
Now E′ : {B → A, D → A, B → D}.
Step 3: Look for a redundant FD in E′
B → D and D → A,
thus B → A follows by the transitive rule.
Eliminate B → A from E′.
The resulting minimal cover is {B → D, D → A}.
MINIMAL COVER
Given a relation R(A, B, C, D, E, F) and
a set of FDs F = {A → BCE, CD → EF, E → F, B → E, AB → CF}.
Compute the minimal cover for F
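One way to check a hand-computed answer is to run the three steps (split right-hand sides, drop redundant left-hand-side attributes, drop redundant FDs) in code. The sketch below is one possible implementation, not the only one. The names `fd` and `minimal_cover` are chosen for this illustration, and single-letter attribute names are assumed. A set of FDs can have several minimal covers, so a different processing order may give a different but equivalent result.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: one attribute on every right-hand side.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            g.append((frozenset(lhs), frozenset([a])))

    # Step 2: drop left-hand-side attributes that are not needed.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            reduced = lhs - {b}
            if reduced and rhs <= closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))          # remove duplicates, keep order

    # Step 3: drop FDs that follow from the remaining ones.
    for dep in list(g):
        others = [other for other in g if other != dep]
        if dep[1] <= closure(dep[0], others):
            g = others
    return g


# The FD set of this exercise (called fds here because F is also an attribute).
fds = [fd("A", "BCE"), fd("CD", "EF"), fd("E", "F"), fd("B", "E"), fd("AB", "CF")]

for lhs, rhs in minimal_cover(fds):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```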
Equivalence of Sets of FDs
Two sets of FDs F and G are equivalent if:
Every FD in F can be inferred from G, and
Every FD in G can be inferred from F
Hence, F and G are equivalent if F+ =G+
Covers:
F covers G if every FD in G can be inferred from F
(i.e., if G+ ⊆ F+)
EXAMPLE: EQUIVALENCE BETWEEN SETS
OF FUNCTIONAL DEPENDENCIES
F = {A -> B, A ->C}
G = {A -> B, B -> C}
Are F and G equivalent?
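A+ using G = ABC, so G covers F (both A -> B and A -> C follow from G).
A+ using F = ABC.
B+ using F = B.
This indicates that B -> C in G cannot be inferred from the
FDs in F, so F does not cover G and the two sets are not equivalent.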
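The cover test can be written directly from the definition: F covers G if, for every X → Y in G, Y is contained in X+ computed with F. A minimal sketch, where the names `fd`, `covers` and `equivalent` are chosen for this illustration and single-letter attributes are assumed:

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    return covers(f, g) and covers(g, f)


F = [fd("A", "B"), fd("A", "C")]
G = [fd("A", "B"), fd("B", "C")]

print(covers(G, F))      # can G derive everything in F?
print(covers(F, G))      # can F derive everything in G?
print(equivalent(F, G))
```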
NORMAL FORMS
Denormalization:
Opposite of normalization
Join relations to form a base relation—which is in
a lower normal form
SUPERKEYS AND KEYS
Superkey
Key
Candidate key
Primary key
Secondary keys
First Normal Form
It disallows composite attributes, multivalued attributes, and
nested relations
Attributes can have only atomic values for an individual tuple
It is considered to be a part of the definition of a relation
Normalization of nested relations into
1NF
Second Normal Form
Uses the concepts of FDs
Full functional dependency: a FD Y -> Z where removal
of any attribute from Y means the FD does not hold any
more
SECOND NORMAL FORM
Second Normal Form:
A relation that is in 1NF and every non-prime
attribute is fully functionally dependent on the
primary key
Normalizing into 2NF
Third Normal Form
Transitive functional dependency:
a FD X -> Z that can be derived from two FDs X -> Y and
Y -> Z
THIRD NORMAL FORM
Third Normal Form:
A relation that is in 2NF, and in which no non-prime
attribute is transitively dependent on the primary
key.
THIRD NORMAL FORM
NOTE:
In X -> Y and Y -> Z, with X as the primary key,
we consider this a problem only if Y is not a
candidate key.
SUMMARY OF NORMAL FORMS
based on Primary Keys
GENERAL NORMAL FORM DEFINITIONS
(FOR MULTIPLE KEYS)
The above definitions consider the primary key
only
The following more general definitions take into
account relations with multiple candidate keys
A relation schema R is in 2NF if every non-prime
attribute A in R is fully functionally dependent
on every key of R
GENERAL NORMAL FORM DEFINITIONS
A relation R is in 3NF if whenever a FD X -> A
holds in R, then either:
a) X is a superkey of R, or
b) A is a prime attribute of R
Successive Normalization of LOTS into 2NF and
3NF
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form
(BCNF) if whenever an FD X -> A holds in R, then X is a
superkey of R
Each normal form is strictly stronger than the previous one
Every 2NF relation is in 1NF
Every 3NF relation is in 2NF
Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Boyce-Codd Normal Form
A relation that is in 3NF but not in BCNF
{student, course} is a
candidate key
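The general 3NF and BCNF definitions can be tested with closures. The sketch below assumes a relation TEACH(student, course, instructor) with the FDs {student, course} → instructor and instructor → course. The slides only state that {student, course} is a candidate key, so the relation name and both FDs are assumptions made for this illustration. The checks look only at the FDs listed, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search, smallest subsets first, so only minimal keys are kept."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = frozenset(combo)
            if any(key <= subset for key in keys):
                continue                      # contains a key, so not minimal
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys


def bcnf_violations(attrs, fds):
    """Nontrivial FDs X -> A whose left side X is not a superkey."""
    return [(l, r) for l, r in fds
            if not r <= l and closure(l, fds) != set(attrs)]


def third_nf_violations(attrs, fds):
    """FDs X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    return [(l, r) for l, r in fds
            if closure(l, fds) != set(attrs) and not (r - l) <= prime]


attrs = {"student", "course", "instructor"}
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # assumed FD
    (frozenset({"instructor"}), frozenset({"course"})),             # assumed FD
]

print(candidate_keys(attrs, fds))
print(third_nf_violations(attrs, fds))
print(bcnf_violations(attrs, fds))
```

Under these assumed FDs every attribute is prime, so the 3NF test finds nothing, while instructor → course fails the BCNF test because instructor is not a superkey.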
EXAMPLE
What is the key for R? Given
F = {{A, B} -> C,
{B, D} -> {E, F},
{A, D} -> {G, H},
A -> I,
H -> J}
Key: ABD
2NF:
R1(A, B, C), R2(B, D, E, F), R3(A, D, G, H, J), R4(A, I), R(A, B, D)
3NF:
R1(A, B, C), R2(B, D, E, F), R3.1(A, D, G, H), R3.2(H, J), R4(A, I), R(A, B, D)
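The key can be confirmed by brute force: try attribute subsets from smallest to largest and keep those whose closure is all of R. A minimal sketch, assuming R has exactly the attributes A to J that appear in F. The function names are chosen for this illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = frozenset(combo)
            if any(key <= subset for key in keys):
                continue                      # a smaller key is already inside
            if closure(subset, fds) == set(attrs):
                keys.append(subset)
    return keys


R = set("ABCDEFGHIJ")                         # assumed attribute list
F = [
    (frozenset("AB"), frozenset("C")),
    (frozenset("BD"), frozenset("EF")),
    (frozenset("AD"), frozenset("GH")),
    (frozenset("A"), frozenset("I")),
    (frozenset("H"), frozenset("J")),
]

for key in candidate_keys(R, F):
    print("".join(sorted(key)))
```

A, B and D never appear on a right-hand side, so every key must contain them. That is why the search finds a single key.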
EXAMPLE: DESIGNING A STUDENT
DATABASE
1. Students take courses
2. Students typically take more than one course
3. Students can fail courses and can repeat the same course in
different semesters => Students can take the same course more
than once.
4. Students are assigned a grade for each course they take.
Problems:
Redundancy
Insert anomalies
Delete anomalies
Update problems
DESIGNING A STUDENT DATABASE -3
sname, dept, and advisor depend only on sID.
credit depends only on the course; it is independent of the
semester in which the course is offered and of which student is taking it.
course-room, instructor, and instructor-office depend only on
the course and the semester (they are independent of which student is
taking the course).
Only grade depends on all three parts of the original key.
Fields in the original data table will be as follows:
SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd,
ClerkNo, ClerkName, ItemNo, Description, Qty, UnitPrice
NORMALIZATION: FIRST NORMAL FORM
Separate repeating groups into new tables.
Repeating groups: fields that may be repeated several
times for one document/entity.
The primary key of the new table (repeating group) is
always a composite key.
Relations in 1NF:
SalesOrderNo, ItemNo, Description, Qty, UnitPrice
SalesOrderNo, Date, CustomerNo, CustomerName,
CustomerAdd, ClerkNo, ClerkName
NORMALIZATION: SECOND NORMAL FORM
Remove Partial Dependencies.
Note: Do not treat price as dependent on item. Price
may be different for different sales orders (discounts,
special customers, etc.)
Relations in 2NF
ItemNo, Description
SalesOrderNo, ItemNo, Qty, UnitPrice
SalesOrderNo, Date, CustomerNo, CustomerName,
CustomerAdd, ClerkNo, ClerkName
NORMALIZATION: SECOND NORMAL FORM
What if we did not Normalize the Database to
Second Normal Form?
Redundancy
Delete Anomalies
Insert Anomalies
Update Anomalies
NORMALIZATION: THIRD NORMAL FORM
Remove transitive dependencies
Relations in 3NF
Customers: CustomerNo, CustomerName, CustomerAdd
Clerks: ClerkNo, ClerkName
Inventory Items: ItemNo, Description
Sales Orders: SalesOrderNo, Date, CustomerNo, ClerkNo
SalesOrderDetail: SalesOrderNo, ItemNo, Qty, UnitPrice
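As a sketch, the five 3NF relations could be declared with Python's built-in `sqlite3` module as shown below. The column types, the foreign-key clauses and the table name `InventoryItems` (written without a space) are assumptions; the slides give only the attribute lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.executescript("""
CREATE TABLE Customers (
    CustomerNo   INTEGER PRIMARY KEY,
    CustomerName TEXT,
    CustomerAdd  TEXT
);
CREATE TABLE Clerks (
    ClerkNo   INTEGER PRIMARY KEY,
    ClerkName TEXT
);
CREATE TABLE InventoryItems (
    ItemNo      INTEGER PRIMARY KEY,
    Description TEXT
);
CREATE TABLE SalesOrders (
    SalesOrderNo INTEGER PRIMARY KEY,
    "Date"       TEXT,
    CustomerNo   INTEGER REFERENCES Customers(CustomerNo),
    ClerkNo      INTEGER REFERENCES Clerks(ClerkNo)
);
CREATE TABLE SalesOrderDetail (
    SalesOrderNo INTEGER REFERENCES SalesOrders(SalesOrderNo),
    ItemNo       INTEGER REFERENCES InventoryItems(ItemNo),
    Qty          INTEGER,
    UnitPrice    REAL,
    PRIMARY KEY (SalesOrderNo, ItemNo)
);
""")

# List the tables that were created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Customer and clerk details are now stored once each, and SalesOrderDetail keeps the composite key (SalesOrderNo, ItemNo) introduced in 1NF.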
NORMALIZATION: THIRD NORMAL FORM
What if we did not Normalize the Database to
Third Normal Form?
Redundancy – Customer/clerk details would appear on
every sales order.
Delete Anomalies – Deleting a sales order also deletes the
customer/clerk.
Insert Anomalies – To insert a customer/clerk, we must
insert a sales order.
Update Anomalies – To change the name, address, etc., we
must change it on every sales order.
CHAPTER SUMMARY
Informal Design Guidelines for Relational
Databases
Functional Dependencies (FDs)
Definition, Inference Rules, Equivalence of Sets of
FDs, Minimal Sets of FDs
Normal Forms Based on Primary Keys
General Normal Form Definitions (For Multiple
Keys)
BCNF (Boyce-Codd Normal Form)