Slides6 Normalization
Part 2: Normalization
Instructor: Vu Tuyet Trinh
Objective
• Upon completion of this lesson, students will be able to:
1. Explain why normalization is needed in a relational DB
Outline
• Introduction
• Normal Forms
• Normalization
1. Introduction
1.1. Motivation
1.2. Full & Partial Dependency
1.3. Transitive Dependency
1.1. Motivation
• Designing a DB is one of the most difficult tasks
• The simplest design approach is to store all data in one big table
• But what is the problem with this?
• Anomalies
• Redundancies
1.1. Motivation: Update anomalies
• An update anomaly is an instance where the same information must be
updated in several different places
• If you update the name of the subject "Databases", you need to
update it in two different places (not efficient, and the copies become
inconsistent if one of them is missed)
student_id full_name dob subject_id name result
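The update anomaly can be sketched in a few lines of Python. The rows below are hypothetical (the original table rows are not reproduced here), the dob column is omitted for brevity, and `rename_subject` is an illustrative helper, not part of any library.

```python
# Hypothetical rows of the single big table; values are illustrative only.
flat_table = [
    {"student_id": 1, "full_name": "Student A",
     "subject_id": "S1", "name": "Databases", "result": "A"},
    {"student_id": 2, "full_name": "Student B",
     "subject_id": "S1", "name": "Databases", "result": "B"},
]


def rename_subject(rows, subject_id, new_name):
    """Rename a subject in the flat table and count the rows touched."""
    touched = 0
    for row in rows:
        if row["subject_id"] == subject_id:
            row["name"] = new_name
            touched += 1
    return touched


# Each enrolled student's row carries its own copy of the subject name,
# so one logical change has to be applied to several rows.
rows_changed = rename_subject(flat_table, "S1", "Database Systems")

# With the subject stored once (assumed separate table), one update suffices.
subjects = {"S1": "Databases"}
subjects["S1"] = "Database Systems"
```

Here `rows_changed` is 2 for the flat table, while the separate `subjects` mapping needs a single assignment.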
1.1. Motivation
• Normalization is the process of removing anomalies and
redundancies from a DB
1.3. Transitive dependency
• If A → B and B → C, then A → C
• Attribute A must be the determinant of C.
• Attribute A transitively determines attribute C, or equivalently
• C is transitively dependent on A (through B)
A → B → C
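One way to check a transitive dependency mechanically is to compute the attribute closure of A. The sketch below is a minimal illustration; the function name `closure` and the representation of FDs as (left-hand side, right-hand side) pairs of sets are assumptions made for this example.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs in `fds`.

    `fds` is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its whole left-hand side is already derived.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
a_closure = closure({"A"}, fds)   # contains C, reached through B
b_closure = closure({"B"}, fds)   # B does not determine A
```

Since C is in the closure of A only because B is derived first, C is transitively dependent on A.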
2. Normal Forms
2.1. Introduction
2.2. 1st Normal Form
2.3. 2nd Normal Form
2.4. 3rd Normal Form
2.1. Introduction
• Each normal form (First NF, Second NF, Third NF) was designed to
eliminate one or more of the anomalies
• Unnormalized Form (UNF)
• A table that contains one or more repeating groups, i.e., a cell may
contain multiple values
• Example: subject_id, name and result hold multiple values (repeating groups)
student_id | full_name | dob | subject_id | name | result
1238 | Theresa May | 08/06/1998 | IT4843, IT4868 | Data integration, Web mining | B, B
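Removing the repeating group gives one atomic value per cell, which is what 1NF requires. The sketch below flattens the row shown above; it assumes the comma-separated values in subject_id, name and result pair up by position, and `flatten` is a hypothetical helper name.

```python
unf_row = {
    "student_id": 1238,
    "full_name": "Theresa May",
    "dob": "08/06/1998",
    "subject_id": "IT4843, IT4868",
    "name": "Data integration, Web mining",
    "result": "B, B",
}

# Columns that form the repeating group in the example row.
REPEATING = ("subject_id", "name", "result")


def flatten(row, repeating=REPEATING):
    """Split the repeating group into one row per (subject, name, result)."""
    columns = [[v.strip() for v in row[col].split(",")] for col in repeating]
    rows = []
    for values in zip(*columns):
        # Copy the non-repeating attributes, then add one value per column.
        flat = {k: v for k, v in row.items() if k not in repeating}
        flat.update(zip(repeating, values))
        rows.append(flat)
    return rows


rows_1nf = flatten(unf_row)
```

The result holds two rows for student 1238, one per subject, and every cell contains a single value.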
2.3. Second Normal Form (2NF)
• Based on the concept of full functional dependency
• A prime attribute is an attribute that is a member of some candidate key
• A relation is in 2NF if
• it is in 1NF and every non-prime attribute is fully functionally dependent on
the primary key
• Partial dependency: student_id → full_name
• Full dependency: (student_id, subject_id) → result
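A 2NF check can be sketched by testing whether any proper subset of the key already determines a non-key attribute. The FDs below are assumptions read off the student/subject table used in these slides, with (student_id, subject_id) taken as the only candidate key; `partial_dependencies` is a hypothetical helper.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each non-key attribute to a proper subset of the key determining it."""
    key = frozenset(key)
    found = {}
    for size in range(1, len(key)):            # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            for attr in closure(subset, fds) - key:
                found.setdefault(attr, set(subset))
    return found


# Assumed FDs for the example table (not stated explicitly in the slides).
fds = [
    ({"student_id"}, {"full_name", "dob"}),
    ({"subject_id"}, {"name"}),
    ({"student_id", "subject_id"}, {"result"}),
]
key = {"student_id", "subject_id"}
partial = partial_dependencies(key, fds)
```

Under these assumptions full_name, dob and name are partially dependent on the key, so the table is not in 2NF; result does not appear in `partial` because it needs the whole key.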
3. Normalization
3.1. Properties of relational decompositions
3.2. An algorithm for decomposing a universal relation into 3NF
3.3. Some examples
• A single universal relation schema R = {A1, A2, ..., An} that includes all
the attributes of the DB
• F is a set of FDs that hold on R
• Using the FDs, the algorithms decompose the universal relation
schema R into a set of relation schemas D = {R1, R2, ..., Rm}; D is
called a decomposition of R
3.1. Properties of relational decompositions
• Attribute preservation
• Each attribute in R will appear in at least one relation
schema Ri in the decomposition so that no attributes are lost
• Dependency preservation
• Each FD X→Y specified in F either appears directly in one of the Ri in
the decomposition D or can be inferred from the dependencies that
appear in some Ri.
• Lossless join
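• r = π_R1(r) ⋈ π_R2(r) ⋈ … ⋈ π_Rm(r) for every legal relation state r of R,
where π_Ri(r) is the projection of r onto the attributes of Ri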
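The lossless join condition can be tried out on a small relation instance by projecting, joining the projections again, and comparing with the original. The sketch below is only an instance-level check (the property itself must hold for every legal state); the tuples are hypothetical and the helper names `project`, `natural_join` and `is_lossless_on` are assumptions for this example.

```python
def project(relation, attrs):
    """Projection onto `attrs`; the set drops duplicate tuples."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in relation}


def natural_join(left, right):
    """Natural join: merge tuples that agree on all shared attributes."""
    joined = set()
    for l in left:
        for r in right:
            l_d, r_d = dict(l), dict(r)
            if all(l_d[a] == r_d[a] for a in l_d.keys() & r_d.keys()):
                joined.add(frozenset({**l_d, **r_d}.items()))
    return joined


def is_lossless_on(relation, schemas):
    """Check whether joining the projections gives back this instance."""
    parts = [project(relation, s) for s in schemas]
    result = parts[0]
    for part in parts[1:]:
        result = natural_join(result, part)
    return result == relation


# Hypothetical instance: each tuple is a frozenset of (attribute, value) pairs.
r = {
    frozenset({"student_id": 1, "subject_id": "S1", "result": "A"}.items()),
    frozenset({"student_id": 2, "subject_id": "S2", "result": "A"}.items()),
}

# Joining on the non-key attribute result creates spurious tuples.
lossy = is_lossless_on(r, [{"student_id", "result"}, {"subject_id", "result"}])
# Joining on student_id recovers exactly the original tuples.
lossless = is_lossless_on(
    r, [{"student_id", "subject_id"}, {"student_id", "result"}]
)
```

In the first decomposition the join produces four tuples instead of two, which is the loss of information the property is meant to rule out.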
3.2. An algorithm for decomposing a universal relation into 3NF
• Input: A universal relation R and a set of FDs F on the attributes of R.
• Find a minimal cover G for F
• For each left-hand side X of an FD that appears in G, create a relation schema
in D with attributes {X ∪ {A1} ∪ {A2} ... ∪ {Ak}}, where X → A1, X → A2, ..., X →
Ak are the only dependencies in G with X as the left-hand side (X is
the key of this relation)
• Place any remaining attributes (that have not been placed in any relation)
in a single relation schema to ensure the attribute preservation property.
• If no relation schema in D contains a key of R, add one more relation schema
whose attributes form a key of R; this ensures the lossless join property.
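The grouping steps can be sketched as follows. This is a minimal illustration that takes the minimal cover and a key of R as inputs rather than computing them; it also adds a relation for the key when no schema already contains one, as Example 1 in Section 3.3 does. The function name `synthesize_3nf` and the FDs (assumed for the student/subject table from the earlier slides) are hypothetical.

```python
def synthesize_3nf(attributes, cover, key):
    """Build relation schemas from a minimal cover given as (lhs, rhs) pairs."""
    # One schema per distinct left-hand side X: X plus every A with X -> A.
    grouped = {}
    for lhs, rhs in cover:
        grouped.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = list(grouped.values())

    # Attribute preservation: collect attributes no FD mentions.
    remaining = set(attributes) - set().union(*schemas)
    if remaining:
        schemas.append(remaining)

    # Add a relation for the key if no schema contains it.
    if not any(set(key) <= schema for schema in schemas):
        schemas.append(set(key))
    return schemas


attributes = {"student_id", "full_name", "dob", "subject_id", "name", "result"}
# Assumed minimal cover for the example table.
cover = [
    ({"student_id"}, {"full_name"}),
    ({"student_id"}, {"dob"}),
    ({"subject_id"}, {"name"}),
    ({"student_id", "subject_id"}, {"result"}),
]
key = {"student_id", "subject_id"}

schemas = synthesize_3nf(attributes, cover, key)
```

Under these assumptions the result is three schemas: (student_id, full_name, dob), (subject_id, name) and (student_id, subject_id, result). The last one already contains the key, so no extra relation is added.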
3.3. Some examples
• Example 1:
• Given R = {A, B, C, D, E, F, G}, F = {A→B; ABCD→E; EF→G;
ACDF→EG}
• A minimal cover of F is G = {A→B, ACD→E, EF→G}
• Find a minimal key: K = ACDF
• Grouping the FDs in G by left-hand side, we have R1(AB), R2(ACDE), R3(EFG)
• Since K is not a subset of any Ri, we add a new relation R4(ACDF)
• In conclusion, we have a decomposition D = {R1, R2, R3, R4}
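The minimal cover and the key of Example 1 can be re-derived with a short script. This is a sketch of one common way to compute them, not necessarily the procedure used in the lecture; `minimal_cover` and `find_key` are hypothetical helper names. Note that in the example F and G name both attributes and FD sets, so the code uses `fds` and `cover` for the sets.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds` (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes.
    cover = [(set(lhs), {a}) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: drop left-hand-side attributes that are not needed.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and rhs <= closure(reduced, cover):
                lhs = reduced
                cover[i] = (lhs, rhs)
    # Step 3: drop FDs that follow from the remaining ones.
    for fd in list(cover):
        rest = [other for other in cover if other is not fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


def find_key(attributes, fds):
    """Shrink the full attribute set until no attribute can be removed."""
    key = set(attributes)
    for attr in sorted(attributes):
        if closure(key - {attr}, fds) >= set(attributes):
            key.discard(attr)
    return key


attributes = set("ABCDEFG")
fds = [("A", "B"), ("ABCD", "E"), ("EF", "G"), ("ACDF", "EG")]

cover = minimal_cover(fds)            # A->B, ACD->E, EF->G
key = find_key(attributes, cover)     # ACDF
schemas = [set("AB"), set("ACDE"), set("EFG")]
key_is_covered = any(key <= s for s in schemas)   # False, so R4(ACDF) is added
```

The script reproduces the minimal cover and the key given above, and confirms that none of R1, R2, R3 contains the key.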
Remark
• Motivation of normalization
• Full & Partial Dependency
• Transitive dependency
• 1NF, 2NF, 3NF
• Properties of relational decompositions
• An algorithm for decomposing a universal relation into 3NF
Summary
1. Introduction
• Normalization is the process of removing anomalies and redundancies
from a DB
• Full & Partial Dependency
• Transitive dependency
2. Normal Forms
• 1NF, 2NF, 3NF
3. Normalization
• Properties of relational decompositions
• An algorithm for decomposing a universal relation into 3NF
• Some examples