Normalization
Chapter Objectives
The purpose of normalization
Data redundancy and update anomalies
Functional dependencies
The process of normalization
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Update Anomalies
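Relations that have redundant data may have problems called update anomalies, which are classified as:
Insertion anomalies
Deletion anomalies
Modification anomalies
Examples on the StaffBranch relation:
Insertion: to insert a new member of staff with branchNo B007 into the StaffBranch relation, the branch address must be entered again and must agree with the existing B007 tuple.
Deletion: to delete the tuple that represents the last member of staff located at branch B007 also deletes the details of that branch.
Modification: to change the address of branch B003, every tuple for a member of staff at B003 must be updated.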
StaffBranch
staffNo  sName        position    salary  branchNo  bAddress
SL21     John White   Manager     30000   B005      22 Deer Rd, London
SG37     Ann Beech    Assistant   12000   B003      163 Main St, Glasgow
SG14     David Ford   Supervisor  18000   B003      163 Main St, Glasgow
SA9      Mary Howe    Assistant   9000    B007      16 Argyll St, Aberdeen
SG5      Susan Brand  Manager     24000   B003      163 Main St, Glasgow
SL41     Julie Lee    Assistant   9000    B005      22 Deer Rd, London
Staff
staffNo  sName        position    salary  branchNo
SL21     John White   Manager     30000   B005
SG37     Ann Beech    Assistant   12000   B003
SG14     David Ford   Supervisor  18000   B003
SA9      Mary Howe    Assistant   9000    B007
SG5      Susan Brand  Manager     24000   B003
SL41     Julie Lee    Assistant   9000    B005

Branch
branchNo  bAddress
B005      22 Deer Rd, London
B007      16 Argyll St, Aberdeen
B003      163 Main St, Glasgow
Functional Dependencies
Functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
A → B: B is functionally dependent on A.
Determinant: the attribute or group of attributes on the left-hand side of the arrow of a functional dependency.
A trivial functional dependency is one whose right-hand side is a subset (not necessarily a proper subset) of the left-hand side.
For example, staffNo, sName → sName is trivial, because sName already appears on the left-hand side.
Main characteristics of functional dependencies used in normalization:
they have a one-to-one relationship between the attribute(s) on the left- and right-hand side of a dependency;
they hold for all time;
they are nontrivial.
The FDs in a given relation are determined by the semantics of the relation, not by data instances.

TEACH
TEACHER  COURSE     TEXT
Smith    DS         Bartram
Smith    DBMS       Al-nour
Brown    DS         Augen
Hall     Compilers  Hoffman

This instance of TEACH appears to satisfy TEXT → COURSE, but an instance alone cannot prove that an FD holds.
An instance can be used to disprove an FD (↛ means "does not functionally determine"):
TEACHER ↛ COURSE
COURSE ↛ TEXT
COURSE ↛ TEACHER
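The instance test above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper name `holds` and the dictionary layout of the rows are assumptions made for the example. It only reports whether an instance contradicts an FD; a result of True never proves that the FD holds in general.

```python
def holds(rows, lhs, rhs):
    """Return True if no pair of rows contradicts the FD lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a different
        # value for the same key disproves the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The TEACH instance from the slide.
TEACH = [
    {"TEACHER": "Smith", "COURSE": "DS", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "DBMS", "TEXT": "Al-nour"},
    {"TEACHER": "Brown", "COURSE": "DS", "TEXT": "Augen"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffman"},
]

print(holds(TEACH, ["TEXT"], ["COURSE"]))     # True  (not contradicted)
print(holds(TEACH, ["TEACHER"], ["COURSE"]))  # False (Smith teaches DS and DBMS)
print(holds(TEACH, ["COURSE"], ["TEXT"]))     # False (DS uses two texts)
print(holds(TEACH, ["COURSE"], ["TEACHER"]))  # False (DS has two teachers)
```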
Exercise
EMP_DEPT(ENAME, SSN, BDATE, DNUMBER, DNAME, DMGRSSN, DLOC)
The FDs in this relation are:
1) SSN → ENAME, BDATE, ADDRESS, DNUMBER
2) DNUMBER → DNAME, DMGRSSN, DLOC
Note: each table must represent only one concept.
Examples
Finding a key:
SNAME does not appear on the RHS of any FD, so SNAME must be part of the key. Starting from SNAME we can derive ADDR and ZIP, so SNAME → ADDR, ZIP. But SNAME alone cannot determine anything more.
How can we determine ITEM and PRICE? If we have ITEM, then we can determine PRICE. So SNAME, ITEM → SNAME, ADDR, ZIP, ITEM, PRICE, which satisfies the definition of a key.
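The reasoning above is an attribute-closure computation. The sketch below assumes the FDs are SNAME → ADDR, ZIP and ITEM → PRICE; the slide that listed the relation and its FDs did not survive, so this FD set is a reconstruction from the explanation, and the helper names `closure` and `is_candidate_key` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key determines every attribute and has no smaller subset that does."""
    if closure(attrs, fds) != set(all_attrs):
        return False
    return all(
        closure(subset, fds) != set(all_attrs)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Assumed relation and FDs (reconstructed, not taken verbatim from the slide).
ALL_ATTRS = {"SNAME", "ADDR", "ZIP", "ITEM", "PRICE"}
FDS = [(("SNAME",), ("ADDR", "ZIP")), (("ITEM",), ("PRICE",))]

print(sorted(closure({"SNAME"}, FDS)))                      # ['ADDR', 'SNAME', 'ZIP']
print(is_candidate_key(("SNAME",), ALL_ATTRS, FDS))         # False
print(is_candidate_key(("SNAME", "ITEM"), ALL_ATTRS, FDS))  # True
```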
Lossless Decomposition
Decomposition means dividing a table into multiple tables.
Decomposition is lossless (or nonloss) if it is possible to reconstruct R from the decomposed relations using JOINs.
Condition for a lossless join when R is decomposed into R1, R2, ..., Rn:
R = R1 ⋈ R2 ⋈ R3 ⋈ ... ⋈ Rn, where ⋈ means the JOIN operation.
Lossy decomposition:
R ≠ R1 ⋈ R2 ⋈ R3 ⋈ ... ⋈ Rn
Why do we need it? To keep the database accurate.
What if it does not hold? Queries return wrong answers.
How to check? When the normalization algorithms for 3NF/BCNF are used, it is sufficient that some Ri contains a candidate key of R. That is, if any of the decomposed relations contains a complete candidate key (or primary key) of the original relation, the decomposition is lossless: joining all the decomposed relations reconstructs the original relation.
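The definition can be tried out on a small instance: project the rows onto each decomposed schema, join the projections, and compare the result with the original. The sketch below uses three StaffBranch rows with a reduced set of columns; the function names are hypothetical, and the check speaks only for the given instance, not for the schema in general.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, removing duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Natural join of two projections (sets of (attribute, value) tuples)."""
    joined = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            common = d1.keys() & d2.keys()
            # With no common attribute this degenerates to a Cartesian product.
            if all(d1[a] == d2[a] for a in common):
                joined.add(tuple(sorted({**d1, **d2}.items())))
    return joined


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


STAFF_BRANCH = [
    {"staffNo": "SL21", "branchNo": "B005", "bAddress": "22 Deer Rd, London"},
    {"staffNo": "SG37", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
    {"staffNo": "SG14", "branchNo": "B003", "bAddress": "163 Main St, Glasgow"},
]

# Staff(staffNo, branchNo) joined with Branch(branchNo, bAddress): lossless.
print(is_lossless(STAFF_BRANCH, ["staffNo", "branchNo"], ["branchNo", "bAddress"]))  # True
# Dropping the shared attribute produces spurious tuples: lossy.
print(is_lossless(STAFF_BRANCH, ["staffNo"], ["branchNo", "bAddress"]))              # False
```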
Armstrong's axioms
Theorem: Armstrong's axioms are sound and complete.
Soundness: any FD derived by applying Armstrong's axioms is always correct.
Completeness: Armstrong's axioms can derive all the FDs that are needed for normalization.
We can find all candidate keys by using Armstrong's axioms.
We can compute the minimal cover of a set of FDs by using Armstrong's axioms.
Inference Rules
The set of all functional dependencies that are implied by a given set of functional dependencies X is called the closure of X, written X+. A set of inference rules is needed to compute X+ from X.
Armstrong's axioms (rules 1 to 3) and the rules derived from them (rules 4 to 7):
1. Reflexivity: If B is a subset of A, then A → B
2. Augmentation: If A → B, then A,C → B,C
3. Transitivity: If A → B and B → C, then A → C
4. Self-determination: A → A
5. Decomposition: If A → B,C, then A → B and A → C
6. Union: If A → B and A → C, then A → B,C
7. Composition: If A → B and C → D, then A,C → B,D
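As a worked illustration of how a derived rule follows from the three axioms, the Union rule can be obtained as below. The derivation uses Augmentation in the form "if A → B, then A,C → B,C"; writing AB for the attribute set A,B is a notational shorthand introduced here.

```latex
\begin{align*}
&1.\; A \to B    && \text{given} \\
&2.\; A \to AB   && \text{augmentation of (1) with } A \\
&3.\; A \to C    && \text{given} \\
&4.\; AB \to BC  && \text{augmentation of (3) with } B \\
&5.\; A \to BC   && \text{transitivity of (2) and (4)}
\end{align*}
```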
A set of functional dependencies for the StaffBranch relation that satisfies the three conditions for producing a minimal set:
staffNo → sName
staffNo → position
staffNo → salary
staffNo → branchNo
staffNo → bAddress
branchNo → bAddress
branchNo, position → salary
bAddress, position → salary
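ClientRental (unnormalized table)
clientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw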
Definition of 1NF
A relation is in First Normal Form (1NF) if the intersection of each row and column contains one and only one value.
* A relation R is in 1NF if all attributes have atomic values.
There are two approaches to removing repeating groups from unnormalized tables:
1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.
2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.
ClientRental (1NF, first approach)
clientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
The second approach gives two relations:
(clientNo, cName)
(clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
clientNo  cName
CR76      John Kay
CR56      Aline Stewart

clientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
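Figure 5: 1NF ClientRental relation with the second approach.
With the second approach, we remove the repeating group by placing the repeating data, along with a copy of the key attribute clientNo, in a separate relation.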
Second normal form (2NF): a relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant.
Informal definition: a relation R is in 2NF if
a) R is in 1NF, and
b) for each FD X → A where A is a non-key attribute, X is not a proper subset of any candidate key.
Condition b) means each non-key attribute is fully functionally dependent on the whole key of R. An FD that does not satisfy condition b) is called a partial dependency (PD).
Note: a violation of 2NF can occur only when the key is composite.
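Condition b) can be checked mechanically once the key and the FDs are written down. The sketch below applies it to the 1NF ClientRental relation with primary key (clientNo, propertyNo). The function name is hypothetical, the dependency clientNo → cName is assumed from the (clientNo, cName) relation shown earlier, and the check considers only the primary key rather than every candidate key.

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs whose determinant is a proper subset of the primary key
    and whose right-hand side contains at least one non-key attribute."""
    key = set(primary_key)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if set(lhs) < key and set(rhs) - key
    ]


CLIENT_RENTAL_KEY = ("clientNo", "propertyNo")
CLIENT_RENTAL_FDS = [
    (("clientNo", "propertyNo"), ("rentStart", "rentFinish")),
    (("clientNo",), ("cName",)),  # assumed from the (clientNo, cName) relation
    (("propertyNo",), ("pAddress", "rent", "ownerNo", "oName")),
    (("ownerNo",), ("oName",)),   # transitive, not partial
]

for lhs, rhs in partial_dependencies(CLIENT_RENTAL_KEY, CLIENT_RENTAL_FDS):
    print(", ".join(lhs), "->", ", ".join(rhs))
# clientNo -> cName
# propertyNo -> pAddress, rent, ownerNo, oName
```

Each partial dependency found this way is moved, with a copy of its determinant, into its own relation.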
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental
clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner
propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
PG36        2 Manor Rd, Glasgow     370   CO93     Tony Shaw
Third normal form (3NF): a relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.
Example: WORK(EMP#, ENAME, DEPT#, BUDGET, LOC)
FD                 2NF  3NF
1) EMP# → ENAME    Y    Y
2) EMP# → DEPT#    Y    Y
3) DEPT# → BUDGET  Y    N
4) DEPT# → LOC     Y    N
WORK is in 2NF but not in 3NF because of FDs (3) and (4).
1) Combine the RHS of FDs if they have a common LHS.
EMP# → ENAME, DEPT#
DEPT# → BUDGET, LOC
2) Create a separate table for each FD.
R1(EMP#, ENAME, DEPT#), R2(DEPT#, BUDGET, LOC)
3) Check for redundant tables.
4) Check for lossless join.
The decomposition is lossless since R1 contains the key EMP#.
The original relation WORK is not in 3NF, but R1 and R2 are in 3NF.
Note that the LHS of an FD becomes the PK of each decomposed table.
Example 2:
Key: L# + ACC#
1) Combine the RHS of FDs if they have a common LHS.
L# → AMT, LOAN_DATE
ACC# → BAL, ACC_DATE
2) Create a separate table for each FD.
R1(L#, AMT, LOAN_DATE)
R2(ACC#, BAL, ACC_DATE)
3) Check for redundant tables.
4) Check for lossless join.
The decomposition is lossy since neither R1 nor R2 contains the key L# + ACC#. So add the candidate key as a third relation:
R3(L#, ACC#)
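The four steps used in both examples can be sketched as a small function. This is an illustrative sketch, not a full 3NF synthesis algorithm: it assumes the FDs passed in already form a minimal cover and that a candidate key is supplied by the caller, and the name `synthesize_3nf` is hypothetical.

```python
def synthesize_3nf(fds, key):
    """Decompose by the four steps: combine, create tables, drop redundant, add key."""
    # Step 1: combine the right-hand sides of FDs that share a left-hand side.
    combined = {}
    for lhs, rhs in fds:
        combined.setdefault(frozenset(lhs), set()).update(rhs)

    # Step 2: create one relation per combined FD (LHS plus RHS attributes).
    relations = [set(lhs) | rhs for lhs, rhs in combined.items()]

    # Step 3: drop a relation contained in another one (or duplicating an earlier one).
    relations = [
        r for i, r in enumerate(relations)
        if not any(
            r < other or (r == other and j < i)
            for j, other in enumerate(relations) if j != i
        )
    ]

    # Step 4: if no relation contains the key, add the key as its own relation
    # so that the decomposition has a lossless join.
    if not any(set(key) <= r for r in relations):
        relations.append(set(key))
    return relations


WORK_FDS = [
    (("EMP#",), ("ENAME",)), (("EMP#",), ("DEPT#",)),
    (("DEPT#",), ("BUDGET",)), (("DEPT#",), ("LOC",)),
]
LOAN_FDS = [
    (("L#",), ("AMT",)), (("L#",), ("LOAN_DATE",)),
    (("ACC#",), ("BAL",)), (("ACC#",), ("ACC_DATE",)),
]

for relation in synthesize_3nf(WORK_FDS, key=("EMP#",)):
    print(sorted(relation))
for relation in synthesize_3nf(LOAN_FDS, key=("L#", "ACC#")):
    print(sorted(relation))
```

For WORK the function returns the two relations R1 and R2. For Example 2 it returns R1 and R2 plus the key relation R3(L#, ACC#), because neither R1 nor R2 contains the key.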
Rental
fd1  clientNo, propertyNo → rentStart, rentFinish  (Primary key)
fd5  clientNo, rentStart → propertyNo, rentFinish  (Candidate key)
fd6  propertyNo, rentStart → clientNo, rentFinish  (Candidate key)

PropertyOwner
fd3  propertyNo → pAddress, rent, ownerNo, oName  (Primary key)
fd4  ownerNo → oName  (Transitive dependency)
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental
clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner
propertyNo  pAddress                rent  ownerNo
PG4         6 Lawrence St, Glasgow  350   CO40
PG16        5 Novar Dr, Glasgow     450   CO93
PG36        2 Manor Rd, Glasgow     370   CO93

Owner
ownerNo  oName
CO40     Tina Murphy
CO93     Tony Shaw
Example of BCNF
fd1  clientNo, interviewDate → interviewTime, staffNo, roomNo  (Primary key)
fd2  staffNo, interviewDate, interviewTime → clientNo  (Candidate key)
fd3  roomNo, interviewDate, interviewTime → clientNo, staffNo  (Candidate key)
fd4  staffNo, interviewDate → roomNo  (not a candidate key)
As a consequence, the ClientInterview relation may suffer from update anomalies. For example, two tuples have to be updated if the roomNo needs to be changed for staffNo SG5 on 13-May-02.
ClientInterview
clientNo  interviewDate  interviewTime  staffNo  roomNo
CR76      13-May-02      10.30          SG5      G101
CR76      13-May-02      12.00          SG5      G101
CR74      13-May-02      12.00          SG37     G102
CR56      1-Jul-02       10.30          SG5      G102
Example of BCNF(2)
To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom, as shown below:
Interview(clientNo, interviewDate, interviewTime, staffNo)
StaffRoom(staffNo, interviewDate, roomNo)
Interview
clientNo  interviewDate  interviewTime  staffNo
CR76      13-May-02      10.30          SG5
CR76      13-May-02      12.00          SG5
CR74      13-May-02      12.00          SG37
CR56      1-Jul-02       10.30          SG5

StaffRoom
staffNo  interviewDate  roomNo
SG5      13-May-02      G101
SG37     13-May-02      G102
SG5      1-Jul-02       G102
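The BCNF test behind this example (every determinant must be a superkey) can be sketched with an attribute-closure helper. The function names below are hypothetical and the code is a minimal illustration applied to fd1 to fd4 of ClientInterview; it performs a single decomposition step on the violating FD rather than a complete BCNF algorithm.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


ATTRS = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}
FDS = [
    (("clientNo", "interviewDate"), ("interviewTime", "staffNo", "roomNo")),  # fd1
    (("staffNo", "interviewDate", "interviewTime"), ("clientNo",)),            # fd2
    (("roomNo", "interviewDate", "interviewTime"), ("clientNo", "staffNo")),   # fd3
    (("staffNo", "interviewDate"), ("roomNo",)),                               # fd4
]

violations = bcnf_violations(ATTRS, FDS)
print(violations)  # [(('staffNo', 'interviewDate'), ('roomNo',))]

# One decomposition step on the violating FD X -> Y:
# one relation holds X and Y, the other keeps everything except Y.
lhs, rhs = violations[0]
staff_room = set(lhs) | set(rhs)
interview = ATTRS - set(rhs)
print(sorted(staff_room))  # ['interviewDate', 'roomNo', 'staffNo']
print(sorted(interview))   # ['clientNo', 'interviewDate', 'interviewTime', 'staffNo']
```

Only fd4 is reported, and splitting on it gives the same Interview and StaffRoom schemas as in the example.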