Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
15 views

Slides6 Normalization

The document discusses database normalization. It begins by describing problems with storing data in a single large table, such as insertion, update, and deletion anomalies. It then covers normal forms including 1st, 2nd, and 3rd normal form, which are designed to eliminate anomalies by removing redundant and dependent data. The document explains concepts like full and partial functional dependencies, transitive dependencies, and prime attributes. It provides examples to illustrate normalization and normal forms.

Uploaded by

Hữu Hoàng
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views

Slides6 Normalization

The document discusses database normalization. It begins by describing problems with storing data in a single large table, such as insertion, update, and deletion anomalies. It then covers normal forms including 1st, 2nd, and 3rd normal form, which are designed to eliminate anomalies by removing redundant and dependent data. The document explains concepts like full and partial functional dependencies, transitive dependencies, and prime attributes. It provides examples to illustrate normalization and normal forms.

Uploaded by

Hữu Hoàng
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 13

Database design

Part 2: Normalization
Instructor: Vu Tuyet Trinh

Objective
•Upon completion of this lesson, students will be able to:
1. Know why we need normalization in relational DB

2. Identify normal forms such as 1st NF, 2nd NF, 3rd NF

3. Know how to normalize a relational DB into 3NF

1
Outline
• Introduction
• Normal Forms
• Normalization

1. Introduction
1.1. Motivation
1.2. Full & Partial Dependency
1.3. Transitive Dependency

2
1.1. Motivation
• Designing DB: one of the most difficult tasks
• One simplest design approach is to use a big table and store all
data
• But what’s the problem with this?
• Anomalies
• Redundancies

1.1. Motivation: Insertion Anomalies


• PK: (student_id, subject_id)
• We can not insert a new subject if we do not have a student
assigned to it yet
• We can not insert a null value into PK attributes
student_id full_name dob subject_id name result

1234 David Beckham 12/21/1997 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4843 Data integration B

1234 David Beckham 12/21/1997 IT4868 Web mining C

1497 Tony Blair 03/01/1999 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4868 Web mining B

1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C

3
1.1. Motivation: Update anomalies
• An instance where the same information must be updated in
several different places
• If you update the name of subject "Databases", you need to
update in two different places (not efficient)
student_id full_name dob subject_id name result

1234 David Beckham 12/21/1997 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4843 Data integration B

1234 David Beckham 12/21/1997 IT4868 Web mining C

1497 Tony Blair 03/01/1999 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4868 Web mining B

1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C

1.1. Motivation: Deletion Anomalies

• Where deleting one piece of data inadvertently causes other


data to be lost
• If we delete student Margaret Thatcher, then we will lose
information about subject "Introduction to ICT"
student_id full_name dob subject_id name result

1234 David Beckham 12/21/1997 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4843 Data integration B

1234 David Beckham 12/21/1997 IT4868 Web mining C

1497 Tony Blair 03/01/1999 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4868 Web mining B

1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C

4
1.1. Motivation
• Normalization is the process of removing anomalies and
redundancies from DB

1.2. Full & Partial Dependency

student_id full_name dob subject_id name result

1234 David Beckham 12/21/1997 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4843 Data integration B

1234 David Beckham 12/21/1997 IT4868 Web mining C

1497 Tony Blair 03/01/1999 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4868 Web mining B

1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C

Key: {student_id, subject_id}


student_id full_name Full Key Dependency:
{student_id, subject_id} → result
subject_id result Partial Key Dependency:
student_id → full_name
10

10

5
1.3. Transitive dependency

• If A → B and B → C
• Attribute A must be the determinant of C.
• Attribute A transitively determines attribute C or
• C is transitively dependent on A

A B C

11

11

2. Normal Forms
2.1. Introduction
2.2. 1st Normal Form
2.3. 2nd Normal Form
2.4. 3rd Normal Form

12

12

6
2.1. Introduction
• Each form was designed to eliminate one or more of the anomalies: First
NF; Second NF; Third NF
• Unnormalized Form (UNF)
• A table that contains one or more repeating groups. I.e., its cell may
contain multiple values
Multi Value
Or repeating groups
student_id full_name dob subject_id name result

1234 David Beckham 12/21/1997 IT3090, IT4868 Databases, Web mining A, C

1238 Theresa May 08/06/1998 IT4843, IT4868 Data integration, Web mining B, B

1497 Tony Blair 03/01/1999 IT3090 Databases A

1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C

13

13

2.2. First Normal Form (1NF)


• A cell in a relation contains one and only one value.
• Disallows composite attributes, multivalued attributes or nested
relations

student_id full_name dob subject_id name result

1234 David Beckham 12/21/1997 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4843 Data integration B

1234 David Beckham 12/21/1997 IT4868 Web mining C

1497 Tony Blair 03/01/1999 IT3090 Databases A

1238 Theresa May 08/06/1998 IT4868 Web mining B

1542 Margaret Thatcher 05/08/1997 IT2000 Introduction to ICT C

14

14

7
2.3. Second Normal Form (2NF)
• Based on the concept of full functional dependency
• A prime attribute
• It is an attribute that is member of some candidate key
• 2NF relation is
• in 1NF and every non-prime attribute is fully functionally dependent on
the primary key
Partial Dependency
student_id full_name

Full Dependency
subject_id result

15

15

2.4. Third Normal Form (3NF)


• A relation that is in 2NF
• No non-prime attribute is transitively dependent on the primary key.
• I.e, all non-prime attributes are fully & directly dependent on the PK.

16

16

8
3. Normalization
3.1. Properties of relational decompositions
3.2. An algorithm decomposes a universal relation into 3NF
3.3. Some examples

17

17

3.1. Properties of relational decompositions

• A single universal relation schema R = {A1, A2, ..., An} that includes all
the attributes of the DB
• F is a set of FDs holds on R
• Using the FDs, the algorithms decompose the universal relation
schema R into a set of relation schemas D = {R1, R2, ..., Rm}; D is
called a decomposition of R

18

18

9
3.1. Properties of relational decompositions

• Attribute preservation
• Each attribute in R will appear in at least one relation
schema Ri in the decomposition so that no attributes are lost
• Dependency preservation
• Each FD X→Y specified in F either appeared directly in one of the R i in
the decomposition D or could be inferred from the dependencies that
appear in some Ri.
• Lossless join
• r = R1(r) ⋈ R2(r) ⋈ … ⋈  Rm (r)

19

19

3.1. Properties of relational decompositions


• An example
• Suppose we have a relation:
Learn(student_id, full_name, dob, subject_id, name, result)
• We split it into two relations:
Student(student_id, full_name, dob)
Subject(subject_id, name)
• This decomposition does not warrant:
• Attribute preservation: Lost information about "result"
• Dependency preservation condition, for instance, (student_id, subject_id)
→ result is loss.
• Lossless join property, i.e., we can join these two relations

20

20

10
3.2. An algorithm decomposes a universal
relation into 3NF
• Input: A universal relation R and a set of FDs F on the attributes of R.
• Find a minimal cover G for F
• For each left-hand-side X of a FD that appears in G, create a relation schema
in D with attributes {X ∪ {A1} ∪ {A2} ... ∪ {Ak} }, where X → A1, X → A2, ..., X →
Ak are the only dependencies in G with X as the left-hand-side (X is
the key of this relation);
• Place any remaining attributes (that have not been placed in any relation)
in a single relation schema to ensure the attribute preservation property.

21

21

3.2. An algorithm decomposes a universal


relation into 3NF
• If none of the relation schemas in D contains a key of R, then create one
more relation schema in D that contains attributes that form a key of R.
• Eliminate redundant relations from the resulting set of relations in the
relational database schema.
• A relation R is considered redundant if R is a projection of another
relation S in the schema; alternately, R is subsumed by S

22

22

11
3.3. Some examples

• Example 1:
• Given R = {A, B, C, D, E, F, G}, F = {A→B; ABCD→E; EF→G;
ACDF→EG}
• A minimal cover of F is G = {A→B, ACD→E, EF→G}
• Find a minimal key: K = ACDF
• We have R1(AB), R2(ACDE), R3(EFG)
• Since K is not a subset of Ri, we have a new relation R4(ACDF)
• In conclusion, we have a decomposition D = {R1, R2, R3, R4}

23

23

3.3. Some examples


• Example 2:
• Given R(student_id, name, birthday, advisor, department, semester, course, grade)
• F = { student_id → (name, birthday); advisor → department; (student_id, semester,
course) → (grade, advisor, department)}
• We denote like this: student_id (A), name (B), birthday (C), advisor (D), department
(E), semester (F), course (G), grade (H)
• F is rewritten as {A →BC; D →E; AFG →HDE}
• A minimal cover of F is G = {A→B; A →C; D →E; AFG →DH}
• Find a minimal key: K = AFG
• We have R1(ABC), R2(DE), R3(AFGDH)
• Since K is a subset of R3, we have a decomposition D = {R1, R2, R3} or
{R1(student_id, name, birthday), R2 (advisor, department), R3 (student_id, semester,
course, advisor, grade)}

24

24

12
Remark
• Motivation of normalization
• Full & Partial Dependency
• Transitive dependency
• 1NF, 2 NF, 3 NF
• Properties of relational decompositions
• An algorithm decomposes a universal relation into 3NF

25

25

Summary
1. Introduction
• Normalization is the process of removing anomalies and redundancies
from DB
• Full & Partial Dependency
• Transitive dependency
2. Normal Forms
• 1NF, 2NF, 3NF
3. Normalization
• Properties of relational decompositions
• An algorithm decomposes a universal relation into 3NF
• Some examples

26

26

13

You might also like