Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
70 views

Database Normalization

Normalization is the process of organizing data in a database to avoid redundancy and anomalies when data is inserted, updated, or deleted. It involves decomposing tables to eliminate redundant data and dependencies between non-key attributes. This is done through analyzing attribute dependencies and decomposing the table into smaller, normalized tables in at least first normal form with a single, unique key in each table. The normalized tables are then related through joins while ensuring no information is lost from the original table.

Uploaded by

Anabelle
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
70 views

Database Normalization

Normalization is the process of organizing data in a database to avoid redundancy and anomalies when data is inserted, updated, or deleted. It involves decomposing tables to eliminate redundant data and dependencies between non-key attributes. This is done through analyzing attribute dependencies and decomposing the table into smaller, normalized tables in at least first normal form with a single, unique key in each table. The normalized tables are then related through joins while ensuring no information is lost from the original table.

Uploaded by

Anabelle
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

Normalization

COURSE 4: Databases
Normalization
when and why

Databases C5: Normal Forms


Normalization
• Informal:
• Organize data in a relational database in order to avoid redundancy and data
manipulation anomalies.

• Decompose a relation (table) without loosing information.

Databases C5: Normal Forms


Incorrect
Incorrect ER
table structure

INSERT/UPDATE/DELETE
REDUNDANCY
ANOMALY

Databases C5: Normal Forms


Incorrect
Incorrect ER
table structure

decomposition /
synthesis
----
normalization
INSERT/UPDATE/DELETE
REDUNDANCY
ANOMALY

Databases C5: Normal Forms


Normalization
• Avoid redundancy

Databases C5: Normal Forms


Normalization CUSTOMER
CUSTOMER_ID LAST_NAME … ….
1 Smith ... ….
• Avoid redundancy
2 Green … ….
3 Avery ... ….

LOAN

LOAN_ID CUSTOMER_ID AMOUNT DATE


101 1 125000 18/04/21
102 1 25000 14/04/22
103 2 12500 03/05/21
127 2 20000 …
389 3 75000 …

Databases C5: Normal Forms


Normalization
• Avoid redundancy

Databases C5: Normal Forms


Normalization
• Avoid redundancy
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
3 Avery 769 mortgage 45000

Databases C5: Normal Forms


Normalization
• INSERT anomaly
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
4 Stark ??? null null

Databases C5: Normal Forms


Normalization
• UPDATE anomaly
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
3 Avery 769 mortgage 45000

Databases C5: Normal Forms


Normalization
• DELETE anomaly
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
2 Green 127 mortgage 20000
3 Avery 389 mortgage 75000
3 Avery 486 credit card 5000
4 Stark 700 mortgage 45000

Databases C5: Normal Forms


Table with
redundancy/not in a
normal form

Analyze
dependencies

“good”
decomposition

table in normal form table in normal form

Databases C5: Normal Forms


Decomposition

Databases C5: Normal Forms


Decomposition Step 1: Projection
NAME LOAN_ID TYPE AMOUNT
➢ S1 = ς(𝑁𝐴𝑀𝐸,𝐿𝑂𝐴𝑁𝐼𝐷,𝑇𝑌𝑃𝐸,𝐴𝑀𝑂𝑈𝑁𝑇) 𝑅 Smith 101 mortgage 125000
Smith 102 credit card 25000
Green 103 credit card 12500
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
Smith 389 mortgage 75000
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
CUSTOMER_ID NAME
3 Smith 389 mortgage 75000
1 Smith
1 Smith
2 Green
➢ 𝑆2 = ς(𝐶𝑈𝑆𝑇𝑂𝑀𝐸𝑅𝐼𝐷,𝑁𝐴𝑀𝐸) 𝑅 3 Smith

Databases C5: Normal Forms


Decomposition Step 2: Join
CUSTOMER_ID NAME NAME LOAN_ID TYPE AMOUNT
1 Smith Smith 101 mortgage 125000
1 Smith Smith 102 credit card 25000
2 Green Green 103 credit card 12500
3 Smith Smith 389 mortgage 75000

CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT

• Lossy decomposition 1 Smith 101 mortgage 125000


1 Smith 102 credit card 25000
S1 ⋈ S2 ⊇ 𝑅
1 Smith 389 mortgage 75000
3 Smith 101 mortgage 125000
3 Smith 102 credit card 25000
3 Smith 389 mortgage 75000
2 Green 103 credit card 12500

Databases C5: Normal Forms


Decomposition Step 1: Projection
CUSTOMER_ID LOAN_ID TYPE AMOUNT
➢ S1 = ς(𝐶𝑈𝑆𝑇𝑂𝑀𝐸𝑅𝐼𝐷,𝐿𝑂𝐴𝑁𝐼𝐷,𝑇𝑌𝑃𝐸,𝐴𝑀𝑂𝑈𝑁𝑇) 𝑅1 101 mortgage 125000
1 102 credit card 25000
2 103 credit card 12500
CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT
3 389 mortgage 75000
1 Smith 101 mortgage 125000
1 Smith 102 credit card 25000
2 Green 103 credit card 12500
CUSTOMER_ID NAME
3 Smith 389 mortgage 75000
1 Smith
1 Smith
2 Green
➢ 𝑆2 = ς(𝐶𝑈𝑆𝑇𝑂𝑀𝐸𝑅𝐼𝐷,𝑁𝐴𝑀𝐸) 𝑅 3 Smith

Databases C5: Normal Forms


Decomposition Step 2: Join
CUSTOMER_ID NAME CUSTOMER_ID LOAN_ID TYPE AMOUNT
1 Smith 1 101 mortgage 125000
1 Smith 1 102 credit card 25000
2 Green 2 103 credit card 12500
3 Smith 3 389 mortgage 75000

CUSTOMER_ID NAME LOAN_ID TYPE AMOUNT


• Lossless decomposition 1 Smith 101 mortgage 125000
S1 ⋈ S2 = 𝑅 1 Smith 102 credit card 25000
3 Smith 389 mortgage 75000
2 Green 103 credit card 12500

Databases C5: Normal Forms


Decomposition
• lossy decompositions and lossless decompositions.

• Lossy: R → decompose(R): S1, S2 → recompose(S1,S2) ⊇ R

lossy =/= less data, (less is more!)


lossy = lost information

• Lossless R → decompose(R): S1, S2 → recompose(S1,S2) = R

Databases C5: Normal Forms


Decomposition
• Lossy
ς𝑅1 𝑅 ⋈ ς𝑅2 𝑅 ⊇ R

• Lossless
ς𝑅1 𝑅 ⋈ ς𝑅2 𝑅 = R

Databases C5: Normal Forms


Functional dependencies

Databases C5: Normal Forms


Functional dependencies
CUSTOMER_ID NAME X Y Z T
1 Smith X1 Y1 Z1 T1
1 Smith X1 Y2 Z1 T2
2 Green X2 Y2 Z2 T2
3 Smith X2 Y3 Z2 T3
X3 Y3 Z2 T4

• CUSTOMER_ID -> NAME


• X -> Z
• Z --/--> X
• X -> X

Databases C5: Normal Forms


Normal Forms

Databases C5: Normal Forms


NF1 NF2 NF3 BCNF NF4 NF5

Databases C5: Normal Forms


NF1
ATOMIC ATTRIBUTES

Databases C5: Normal Forms


NF1
• Atomic attributes
• No multi-valued attributes

• The domain of each attribute contains only atomic values and each
attribute contains only a value of its domain.

• A relational database is at least in NF1

Databases C5: Normal Forms


NF1
EMP_ID NAME EMAIL
1 Williams williams@gmail.com Optional:
Composed key
williams@yahoo.com Artificial key
2 Davis davis@gmail.com
davis@academy.com
3 Miller miller@gmail.com EMAIL_ID EMP_ID EMAIL
4 Stewart stewart@gmail.com 1 1 williams@gmail.com
office@academy.com 2 1 williams@yahoo.com
3 2 davis@gmail.com
4 2 davis@academy.com
5 3 miller@gmail.com
6 4 stewart@gmail.com
7 4 office@academy.com

Databases C5: Normal Forms


NF2
NO PARTIAL DEPENDENCIES

Databases C5: Normal Forms


NF2
• Tables in NF1
• No non-key attributes (not part of the key) that depend on a subset of
the attributes forming the key.

• There are no partial dependencies.

Databases C5: Normal Forms


Functional dependencies
X Y Z T X Y Z T
X1 Y1 Z1 T1 X1 Y1 Z1 …
X2 Y1 Z1 T2 X2 Y1 Z1 …
X2 Y2 Z2 T3 X2 Y2 Z2 …
X2 Y3 Z2 T3 X2 Y3 Z2 …
X2 Y3 Z2 T3 X2 Y3 Z2 …

• partial (X,Y) → Z
• Y →Z

Databases C5: Normal Forms


Functional dependencies
X Y Z T
X Y Z T … Y1 … T1
X1 Y1 … T1 … Y1 … T2
X2 Y1 … T2 … … … …
X2 Y2 … T3 … … … …
X2 Y3 … T3 … … … …
X2 Y3 … T3

X Y Z T

• total (X,Y) → T … … … …
X2 … … T2
• X -/-> T
X2 … … T3
• Y -/-> T
… … … …
… … … …

Databases C5: Normal Forms


Functional dependencies

AIRPORT_ID AIRPLANE_ID DEPARTURE AIRPLANE_MODEL BOARDING_GATE


1 101 30/03/20 17:00 Boeing 777 42
1 102 02/05/20 09:30 Airbus A320 50
2 201 06/08/20 10:45 Boeing 757 35
2 202 10/10/20 06:20 Airbus A320 10
1 101 06/04/20 16:35 Boeing 777 23

Databases C5: Normal Forms


NF2 dependencies

K1,K2 X, Y
K1 -> X

(K1, K2) -> Y

Databases C5: Normal Forms


NF2 dependencies

K1,K2 X, Y
K1 -> X

(K1, K2) -> Y

K1,K2 Y K1 X

Databases C5: Normal Forms


NF2 dependencies

K1,K2 X, Y
K1 -> X

(K1, K2) -> Y

K1 = AIRPLANE_ID
K2 = AIRPORT_ID, DEPARTURE K1,K2 Y K1 X
Y = BOARDING_GATE
X = AIRPLANE_MODEL

Databases C5: Normal Forms


AIRPORT_ID AIRPLANE_ID DEPARTURE AIRPLANE_MODEL BOARDING_GATE
1 101 30/03/20 17:00 Boeing 777 42
1 102 02/05/20 09:30 Airbus A320 50
2 201 06/08/20 10:45 Boeing 757 35
2 202 10/10/20 06:20 Airbus A320 10
1 101 06/04/20 16:35 Boeing 777 23

AIRPORT_ID AIRPLANE_ID DEPARTURE BOARDING_GATE AIRPLANE_ID AIRPLANE_MODEL


1 101 30/03/20 17:00 42 101 Boeing 777
1 102 02/05/20 09:30 50 102 Airbus A320
2 201 06/08/20 10:45 35 201 Boeing 757
2 202 10/10/20 06:20 10 202 Airbus A320
1 101 06/04/20 16:35 23

Databases C5: Normal Forms


NF3
NO TRANSITIVE DEPENDENCIES

Databases C5: Normal Forms


NF3
• Tables in NF2
• Non-key attributes (not part of the key) depend on the entire key and
only on the key.

• There are no transitive dependencies.

Databases C5: Normal Forms


AIRPORT_ID AIRPLANE_ID DEPARTURE MODEL CAPACITY REVISION_DATE BOARDING_GATE
1 101 30/03/20 17:00 Boeing 777 451 01/01/2021 42
1 102 02/05/20 09:30 Airbus A320 150 01/03/2020 50
2 201 06/08/20 10:45 Boeing 757 295 03/05/2020 35
2 202 10/10/20 06:20 Airbus A320 150 04/06/2021 10
1 101 06/04/20 16:35 Boeing 777 451 08/09/2020 23

AIRPORT_ID AIRPLANE_ID DEPARTURE BOARDING_GATE AIRPLANE_ID MODEL CAPACITY REVISION_DATE


1 101 30/03/20 17:00 42
101 Boeing 777 451 01/01/2021
1 102 02/05/20 09:30 50
102 Airbus A320 150 01/03/2020
2 201 06/08/20 10:45 35
201 Boeing 757 259 03/05/2020
2 202 10/10/20 06:20 10
202 Airbus A320 150 04/06/2021
1 101 06/04/20 16:35 23

Databases C5: Normal Forms


AIRPLANE_ID MODEL CAPACITY REVISION_DATE

101 Boeing 777 451 01/01/2021


102 Airbus A320 150 01/03/2020
201 Boeing 757 259 03/05/2020
202 Airbus A320 150 04/06/2021

Databases C5: Normal Forms


NF3 dependencies

K X, Y, Z
K -> X

X -> Y

Databases C5: Normal Forms


NF3 dependencies

K X, Y, Z
K -> X

X -> Y

K X, Z X Y

Databases C5: Normal Forms


NF3 dependencies

K X, Y, Z
K -> X

X -> Y

K = AIRPLANE_ID
X = AIRPLANE_MODEL K X, Z X Y
Y = CAPACITY
Z= REVISION_DATE

Databases C5: Normal Forms


AIRPLANE_ID MODEL CAPACITY REVISION_DATE

101 Boeing 777 451 01/01/2021


102 Airbus A320 150 01/03/2020
201 Boeing 757 259 03/05/2020
202 Airbus A320 150 04/06/2021

AIRPLANE_ID MODEL REVISION_DATE


MODEL CAPACITY
101 Boeing 777 01/01/2021
Boeing 777 451
102 Airbus A320 01/03/2020
Airbus A320 150
201 Boeing 757 03/05/2020
Boeing 757 259
202 Airbus A320 04/06/2021

Databases C5: Normal Forms

You might also like