
Introduction of Database Normalization

Database normalization is a process to organize attributes and tables to reduce data redundancy. It involves identifying functional dependencies between attributes. Some key points:
- Functional dependencies specify relationships where one attribute determines another. For example, employee ID functionally determines name.
- Functional dependencies are classified into types, such as trivial dependencies, where the determined attribute set is a subset of the determinant.
- Armstrong's axioms are inference rules, such as transitivity, that allow new functional dependencies to be deduced from known ones.
- The attribute closure of an attribute set is the set of all attributes it functionally determines. It helps identify candidate keys and super keys.

Uploaded by

Om Sirsath
Copyright
© All Rights Reserved

Introduction of Database Normalization


Database normalization is the process of organizing the attributes of a database to reduce or eliminate data redundancy (storing the same data in more than one place).
Problems caused by data redundancy
Data redundancy unnecessarily increases the size of the database, because the same data is repeated in many places. It also causes inconsistency problems (anomalies) during insert, delete, and update operations.

Functional Dependency:
A functional dependency is a constraint between two sets of attributes of a relation in a database. A functional dependency is denoted by an arrow (→). If an attribute A functionally determines B, then it is written as A → B.
For example, employee_id → name means that employee_id functionally determines the name of the employee. As another example, in a timetable database, {student_id, time} → {lecture_room} means that the student ID and the time together determine the lecture room where the student should be.
What does functionally dependent mean?
A functional dependency A → B means that all tuples with the same value of A also have the same value of B.
For example, in the table below A → B is true, but B → A is not true, because there are different values of A (1 and 2) for B = 3.

A B
------
1 3
2 3
4 0
1 3
4 0
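The definition above can be checked mechanically against a table instance. Below is a minimal Python sketch. It assumes each row is a dict keyed by attribute name, and the helper name `fd_holds` is hypothetical. Sample rows can only refute an FD. Passing the check on one instance does not prove that the dependency holds for all valid data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in the given rows.

    rows: list of dicts mapping attribute name -> value
    lhs, rhs: tuples of attribute names
    """
    seen = {}  # LHS value combination -> RHS values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != value:
            return False  # same LHS value, different RHS value
        seen.setdefault(key, value)
    return True


# The A/B table from the example above
table = [
    {"A": 1, "B": 3},
    {"A": 2, "B": 3},
    {"A": 4, "B": 0},
    {"A": 1, "B": 3},
    {"A": 4, "B": 0},
]

print(fd_holds(table, ("A",), ("B",)))  # True
print(fd_holds(table, ("B",), ("A",)))  # False: B = 3 occurs with A = 1 and A = 2
```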
Trivial Functional Dependency:
X → Y is trivial when Y is a subset of X.
Examples
ABC → AB
ABC → A
ABC → ABC
Non-Trivial Functional Dependencies:
X → Y is a non-trivial functional dependency when Y is not a subset of X.
X → Y is called completely non-trivial when X ∩ Y is empty.
Example:
Id → Name,
Name → DOB
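The three categories differ only in how X and Y overlap, so a set-based check can classify them. The helper `classify_fd` is hypothetical. It assumes that single-letter attribute names may be passed as a string (e.g. "ABC") and longer names as a list. The AB → BC case is an illustrative example and is not taken from the text.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y as trivial, non-trivial, or completely non-trivial."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                 # Y is a subset of X
    if not (x & y):
        return "completely non-trivial"  # X and Y share no attribute
    return "non-trivial"                 # Y is not a subset of X, but they overlap


print(classify_fd("ABC", "AB"))         # trivial
print(classify_fd("ABC", "ABC"))        # trivial
print(classify_fd("AB", "BC"))          # non-trivial
print(classify_fd(["Id"], ["Name"]))    # completely non-trivial
```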

Types of Functional dependencies in DBMS


A functional dependency is a constraint that specifies the relationship between two sets of attributes, where one set can accurately determine the value of the other set. It is denoted as X → Y, where X is a set of attributes that is capable of determining the value of Y. The attribute set on the left side of the arrow, X, is called the Determinant, while the set on the right side, Y, is called the Dependent.
Functional dependencies are used to mathematically express relations among database entities. They are essential for understanding advanced concepts in relational database systems and for solving problems in competitive exams like GATE.

Example:
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2

From the above table we can conclude some valid functional dependencies:
- roll_no → {name, dept_name, dept_building}: roll_no can determine the values of the fields name, dept_name and dept_building, so this is a valid functional dependency.
- roll_no → dept_name: Since roll_no can determine the whole set {name, dept_name, dept_building}, it can also determine its subset dept_name.
- dept_name → dept_building: dept_name can identify the dept_building accurately, since all rows with the same dept_name have the same dept_building (for example, both CO rows are in A4).
- More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.

Here are some invalid functional dependencies:
- name → dept_name: Students with the same name can have different dept_name values (xyz appears in both CO and IT), so this is not a valid functional dependency.
- dept_building → dept_name: There can be multiple departments in the same building. For example, in the above table, departments ME and EC are both in building B2, so dept_building → dept_name is an invalid functional dependency.
- More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building → roll_no, etc.
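To see why the invalid FDs fail, it helps to find the pair of rows that breaks them. This Python sketch encodes the student table above as tuples. The helper `counterexamples` is hypothetical. It reports the roll_no values of row pairs that agree on the left side but differ on the right side. An empty list only means that the sample rows do not contradict the FD.

```python
from itertools import combinations

# Student table from the example: (roll_no, name, dept_name, dept_building)
rows = [
    (42, "abc", "CO", "A4"),
    (43, "pqr", "IT", "A3"),
    (44, "xyz", "CO", "A4"),
    (45, "xyz", "IT", "A3"),
    (46, "mno", "EC", "B2"),
    (47, "jkl", "ME", "B2"),
]
COLS = ("roll_no", "name", "dept_name", "dept_building")


def counterexamples(lhs, rhs):
    """Return (roll_no, roll_no) pairs that agree on lhs but differ on rhs."""
    idx = {c: i for i, c in enumerate(COLS)}
    found = []
    for r1, r2 in combinations(rows, 2):
        same_lhs = all(r1[idx[a]] == r2[idx[a]] for a in lhs)
        same_rhs = all(r1[idx[a]] == r2[idx[a]] for a in rhs)
        if same_lhs and not same_rhs:
            found.append((r1[0], r2[0]))
    return found


print(counterexamples(["dept_name"], ["dept_building"]))  # [] (no violation)
print(counterexamples(["name"], ["dept_name"]))           # [(44, 45)]
print(counterexamples(["dept_building"], ["dept_name"]))  # [(46, 47)]
```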

Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X → Y holds by the reflexivity rule.
For example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the augmentation rule.
For example, if {roll_no, name} → dept_building is valid, then {roll_no, name, dept_name} → {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X → Z is also valid by the transitivity rule.
For example, if roll_no → dept_name and dept_name → dept_building, then roll_no → dept_building is also valid.
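The three axioms are simple set operations. In this sketch, an FD is modelled as a pair of frozensets. Each function returns the derived FD, or `None` when the rule does not apply. All function names are hypothetical. For simplicity, `transitivity` requires the right side of the first FD to be exactly the left side of the second.

```python
# A functional dependency X -> Y is modelled as a pair of frozensets (X, Y).
def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def reflexivity(x, y):
    """Reflexivity: if Y is a subset of X, then X -> Y (else None)."""
    return fd(x, y) if set(y) <= set(x) else None


def augmentation(dep, z):
    """Augmentation: from X -> Y, derive XZ -> YZ."""
    x, y = dep
    return fd(x | set(z), y | set(z))


def transitivity(dep1, dep2):
    """Transitivity: from X -> Y and Y -> Z, derive X -> Z (else None)."""
    (x, y), (y2, z) = dep1, dep2
    return fd(x, z) if y == y2 else None


def show(dep):
    lhs, rhs = dep
    return "{" + ", ".join(sorted(lhs)) + "} -> {" + ", ".join(sorted(rhs)) + "}"


# The three examples from the text
print(show(reflexivity({"roll_no", "name"}, {"name"})))
print(show(augmentation(fd({"roll_no", "name"}, {"dept_building"}), {"dept_name"})))
print(show(transitivity(fd({"roll_no"}, {"dept_name"}),
                        fd({"dept_name"}, {"dept_building"}))))
```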

Types of Functional dependencies in DBMS:
1. Trivial functional dependency
2. Non-trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
1. Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a subset of the determinant, i.e. if X → Y and Y is a subset of X, then it is called a trivial functional dependency.
For example,

roll_no name age
42 abc 17
43 pqr 18
44 xyz 18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of the determinant set {roll_no, name}.
Similarly, roll_no → roll_no is also an example of a trivial functional dependency.
2. Non-trivial Functional Dependency
In a non-trivial functional dependency, the dependent is not a subset of the determinant, i.e. if X → Y and Y is not a subset of X, then it is called a non-trivial functional dependency.
For example,

roll_no name age
42 abc 17
43 pqr 18
44 xyz 18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of the determinant roll_no.
Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name}.
3. Multivalued Functional Dependency
In a multivalued functional dependency, the attributes of the dependent set are not dependent on each other, i.e. if a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency. (This usage differs from the standard multivalued dependency X ↠ Y, which is the basis of 4NF.)
For example,

roll_no name age
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name and age are not dependent on each other (i.e. neither name → age nor age → name holds).
4. Transitive Functional Dependency
In a transitive functional dependency, the dependent is indirectly dependent on the determinant, i.e. if a → b and b → c, then according to the axiom of transitivity, a → c. This is a transitive functional dependency.
For example,

enrol_no name dept building_no
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect functional dependency, so it is called a transitive functional dependency.

Functional Dependency and Attribute Closure
A functional dependency A->B holds in a relation if any two tuples that have the same value of attribute A also have the same value of attribute B. For example, in the relation STUDENT shown in Table 1, the functional dependencies
STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE
hold, but
STUD_NAME->STUD_STATE
does not hold.

How to find functional dependencies for a relation?
Functional dependencies in a relation depend on the domain of the relation, i.e. on what the data means. Consider the STUDENT relation given in Table 1.
- We know that STUD_NO is unique for each student. So STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY and STUD_NO->STUD_AGE will all be true.
- Similarly, STUD_STATE->STUD_COUNTRY will be true, because if two records have the same STUD_STATE, they will have the same STUD_COUNTRY as well.
- For the relation STUDENT_COURSE, COURSE_NO->COURSE_NAME will be true, as two records with the same COURSE_NO will have the same COURSE_NAME.
Functional Dependency Set: The functional dependency set, or FD set, of a relation is the set of all FDs present in the relation. For example, the FD set for the relation STUDENT shown in Table 1 is:
{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY, STUD_NO->STUD_AGE, STUD_STATE->STUD_COUNTRY }
Attribute Closure: The attribute closure of an attribute set is the set of attributes that can be functionally determined from it.
How to find the attribute closure of an attribute set?
To find the attribute closure of an attribute set:
- Add the elements of the attribute set to the result set.
- Repeatedly add to the result set any attributes that can be functionally determined from the elements already in it, until nothing new can be added.
Using the FD set of Table 1, attribute closures can be determined as:
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_STATE)+ = {STUD_STATE, STUD_COUNTRY}
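The two steps above translate directly into a fixed-point loop. Below is a minimal Python sketch. It assumes FDs are given as `(lhs, rhs)` pairs of attribute-name sets. The names `attribute_closure` and `STUDENT_FDS` are illustrative, and the FD set is the one listed for Table 1.

```python
def attribute_closure(attrs, fds):
    """Compute attrs+ under fds, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)              # step 1: start with the attribute set itself
    changed = True
    while changed:                   # step 2: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)   # lhs is determined, so rhs is determined too
                changed = True
    return result


# FD set of the STUDENT relation (Table 1)
STUDENT_FDS = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_PHONE"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_NO"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(sorted(attribute_closure({"STUD_NO"}, STUDENT_FDS)))
print(sorted(attribute_closure({"STUD_STATE"}, STUDENT_FDS)))
```

Each pass either adds at least one new attribute or ends the loop, so the loop always terminates.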
How to find Candidate Keys and Super Keys using Attribute Closure?
- If the attribute closure of an attribute set contains all attributes of the relation, the attribute set is a super key of the relation.
- If, in addition, no proper subset of this attribute set can functionally determine all attributes of the relation, the set is also a candidate key.
For example, using the FD set of Table 1,
(STUD_NO, STUD_NAME)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_NO)+ = {STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE}
(STUD_NO, STUD_NAME) is a super key but not a candidate key, because the closure of its subset STUD_NO already contains all attributes of the relation. So STUD_NO is a candidate key.
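The two rules can be written as small predicates on top of attribute closure. The helper names in this sketch are hypothetical. The five FDs with STUD_NO on the left are combined into one FD using the union rule, which follows from Armstrong's axioms, so the FD set is equivalent to the one for Table 1.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A set is a super key if its closure contains every attribute."""
    return closure(attrs, fds) >= set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    """A super key is a candidate key if no proper subset is also a super key."""
    attrs = set(attrs)
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return not any(
        is_superkey(set(sub), all_attrs, fds)
        for k in range(len(attrs))
        for sub in combinations(attrs, k)
    )


ALL = {"STUD_NO", "STUD_NAME", "STUD_PHONE", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
FDS = [
    ({"STUD_NO"}, {"STUD_NAME", "STUD_PHONE", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
]

print(is_superkey({"STUD_NO", "STUD_NAME"}, ALL, FDS))       # True
print(is_candidate_key({"STUD_NO", "STUD_NAME"}, ALL, FDS))  # False
print(is_candidate_key({"STUD_NO"}, ALL, FDS))               # True
```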
GATE Question: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the set of functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, K -> {M}, L -> {N}} on R. What is the key for R? (GATE-CS-2014)
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
Answer: Finding the attribute closure of all given options, we get:
{E,F}+ = {EFGIJ}
{E,F,H}+ = {EFHGIJKLMN}
{E,F,H,K,L}+ = {EFHGIJKLMN}
{E}+ = {E}
{EFH}+ and {EFHKL}+ both result in the set of all attributes, but EFH is minimal (no proper subset of it determines all attributes), so it is a candidate key. So the correct option is (B).
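The closures in this answer can be double-checked with a short script. Every attribute here is a single letter, so the sketch lets strings such as "EFH" stand in for attribute sets. The helper name `closure` is hypothetical. The last line confirms that no two-attribute subset of EFH is a key.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = set("EFGHIJKLMN")
FDS = [("EF", "G"), ("F", "IJ"), ("EH", "KL"), ("K", "M"), ("L", "N")]

for option in ["EF", "EFH", "EFHKL", "E"]:
    c = closure(option, FDS)
    print(option, "".join(sorted(c)), "all attributes:", c == R)

# EFH is minimal: none of its two-attribute subsets determines all of R
print(all(closure(sub, FDS) != R for sub in combinations("EFH", 2)))
```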
How to check whether an FD can be derived from a given FD set?
To check whether an FD A->B can be derived from an FD set F:
1. Find (A)+ using FD set F.
2. If B is a subset of (A)+, then A->B can be derived; otherwise it cannot.
GATE Question: In a schema with attributes A, B, C, D and E, the following set of functional dependencies is given:
{A -> B, A -> C, CD -> E, B -> D, E -> A}
Which of the following functional dependencies is NOT implied by the above set? (GATE IT 2005)
A. CD -> AC
B. BD -> CD
C. BC -> CD
D. AC -> BC
Answer: Using the FD set given in the question,
(CD)+ = {CDEAB}, which means CD -> AC also holds true.
(BD)+ = {BD}, which means BD -> CD can’t hold true. So this FD is not implied by the FD set, and (B) is the required option.
Others can be checked in the same way.
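The two-step test above is a subset check against a closure. In this sketch `implies` is a hypothetical helper, and single-letter attribute names are written as strings. It checks all four options at once.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """lhs -> rhs follows from fds exactly when rhs is a subset of lhs+."""
    return set(rhs) <= closure(lhs, fds)


FDS = [("A", "B"), ("A", "C"), ("CD", "E"), ("B", "D"), ("E", "A")]
options = {"A": ("CD", "AC"), "B": ("BD", "CD"), "C": ("BC", "CD"), "D": ("AC", "BC")}

for label, (lhs, rhs) in options.items():
    print(label, f"{lhs} -> {rhs}", "implied" if implies(FDS, lhs, rhs) else "NOT implied")
```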
Prime and non-prime attributes
Attributes that are part of any candidate key of a relation are called prime attributes; the others are non-prime attributes. For example, STUD_NO in the STUDENT relation is a prime attribute, and the others are non-prime attributes.
GATE Question: Consider a relation scheme R = (A, B, C, D, E, H) on which the following functional dependencies hold: {A -> B, BC -> D, E -> C, D -> A}. What are the candidate keys of R? [GATE 2005]
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
Answer: (AE)+ = {ABECD}, which is not the set of all attributes. So AE is not a candidate key, and options (a) and (b) are wrong.
(AEH)+ = {ABCDEH}
(BEH)+ = {BEHCDA}
(BCH)+ = {BCHDA}, which is not the set of all attributes. So BCH is not a candidate key, and option (c) is wrong.
So the correct answer is (d).
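For small schemas, candidate keys can be found by brute force. The search tries attribute sets in order of increasing size and keeps each set whose closure is all of R, unless that set already contains a smaller key. The helper names in this sketch are hypothetical. The search is exponential in the number of attributes, so it is only meant for exam-sized relations like this one. It also lists the prime and non-prime attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Check subsets by increasing size; keep the minimal super keys."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so only a super key
            if closure(s, fds) >= set(all_attrs):
                keys.append(s)
    return keys


R = "ABCDEH"
FDS = [("A", "B"), ("BC", "D"), ("E", "C"), ("D", "A")]
keys = candidate_keys(R, FDS)
prime = set().union(*keys)

print(sorted("".join(sorted(k)) for k in keys))   # ['AEH', 'BEH', 'DEH']
print(sorted(prime), sorted(set(R) - prime))      # ['A', 'B', 'D', 'E', 'H'] ['C']
```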

Normal Forms in DBMS
Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation may cause insertion, deletion, and update anomalies. Normal forms are used to eliminate or reduce redundancy in database tables.

1. First Normal Form –
A relation is in first normal form if it does not contain any composite or multi-valued attribute, i.e. every attribute in the relation is a single-valued attribute. If a relation contains a composite or multi-valued attribute, it violates first normal form.
Example 1 – Relation STUDENT in Table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE. Its decomposition into 1NF is shown in Table 2.

Example 2 –
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3
In the above table, Courses is a multi-valued attribute, so the table is not in 1NF.
The table below is in 1NF, as there is no multi-valued attribute:
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
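The conversion above produces one row per value of the multi-valued attribute. This minimal sketch assumes that the Courses column is stored as a comma-separated string, which is an assumption about representation. The name `to_1nf` is hypothetical.

```python
# Rows of the non-1NF table: Courses holds a comma-separated list.
unnormalized = [
    (1, "A", "c1, c2"),
    (2, "E", "c3"),
    (3, "M", "c2, c3"),
]


def to_1nf(rows):
    """Emit one (ID, Name, Course) row per course, so every value is atomic."""
    flat = []
    for row_id, name, courses in rows:
        for course in courses.split(","):
            flat.append((row_id, name, course.strip()))
    return flat


for row in to_1nf(unnormalized):
    print(*row)
```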
 

2. Second Normal Form –
To be in second normal form, a relation must be in first normal form and must not contain any partial dependency. A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute (an attribute that is not part of any candidate key) is dependent on any proper subset of any candidate key of the table.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is called a partial dependency.

Example 1 – Consider table-3 below.

STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000

{Note that there are many courses having the same course fee.}
Here,
COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO.
Hence, COURSE_FEE is a non-prime attribute, as it does not belong to the only candidate key {STUD_NO, COURSE_NO}.
But COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. A non-prime attribute that depends on a proper subset of the candidate key is a partial dependency, so this relation is not in 2NF.
To convert the above relation to 2NF, we need to split the table into two tables:
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE

Table 1
STUD_NO COURSE_NO
1 C1
2 C2
1 C4
4 C3
4 C1
2 C5

Table 2
COURSE_NO COURSE_FEE
C1 1000
C2 1500
C3 1000
C4 2000
C5 2000
NOTE: 2NF tries to reduce the redundant data being stored. For instance, if there are 100 students taking course C1, we don’t need to store its fee of 1000 in all 100 records. Instead, we can store it once in the second table as the course fee for C1.
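The decomposition can be reproduced by projecting table-3 onto the two schemas. The variable names in this sketch are illustrative. Python sets remove the duplicate (C1, 1000) pair, and joining the two projections on COURSE_NO gives back the original rows. The join is lossless here because COURSE_NO, the shared attribute, is a key of Table 2.

```python
# table-3 as (STUD_NO, COURSE_NO, COURSE_FEE) rows
table3 = [
    (1, "C1", 1000),
    (2, "C2", 1500),
    (1, "C4", 2000),
    (4, "C3", 1000),
    (4, "C1", 1000),
    (2, "C5", 2000),
]

# Project onto the two schemas; sets drop duplicate rows such as (C1, 1000).
enrolment = sorted({(s, c) for s, c, _ in table3})       # Table 1
course_fee = sorted({(c, fee) for _, c, fee in table3})  # Table 2

# Join the projections back on COURSE_NO.
fee_of = dict(course_fee)
rejoined = sorted((s, c, fee_of[c]) for s, c in enrolment)

print(course_fee)
print(rejoined == sorted(table3))  # True
```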

Example 2 – Consider the following functional dependencies in relation R(A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency, i.e., no proper subset of AB determines any non-prime attribute. So the relation is in 2NF.

3. Third Normal Form –
A relation is in third normal form if it is in second normal form and there is no transitive dependency for non-prime attributes.
A relation is in 3NF if at least one of the following conditions holds in every non-trivial functional dependency X –> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
Transitive dependency – If A->B and B->C are two FDs, then A->C is called a transitive dependency.

Example 1 – In the relation STUDENT given in Table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in Table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true, so STUD_COUNTRY is transitively dependent on STUD_NO. This violates third normal form. To convert it into third normal form, we decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE)
as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Example 2 – Consider relation R(A, B, C, D, E) with
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys of the above relation are {A, E, CD, BC}. All attributes on the right sides of all functional dependencies are prime, so the relation is in 3NF.
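The 3NF condition can be tested mechanically. First compute the candidate keys, then check each given non-trivial FD X -> Y. This sketch follows the common textbook formulation: when X is not a super key, every attribute of Y - X must be prime. Checking only the given FDs is sufficient for this test. The helper names are hypothetical. The FD set in Example 1 does not mention STUD_PHONE, so the sketch assumes that the STUDENT relation contains only the five attributes that appear in that FD set.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure is every attribute."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= attrs:
                keys.append(s)
    return keys


def is_3nf(attrs, fds):
    """Each non-trivial X -> Y needs X a super key, or every attribute of Y - X prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        extra = rhs - lhs
        if extra and not (closure(lhs, fds) >= attrs or extra <= prime):
            return False
    return True


# Example 2: R(A, B, C, D, E)
R2 = {"A", "B", "C", "D", "E"}
FDS2 = [({"A"}, {"B", "C"}), ({"C", "D"}, {"E"}), ({"B"}, {"D"}), ({"E"}, {"A"})]

# Example 1 (STUDENT), restricted to the attributes named in its FD set
STUDENT = {"STUD_NO", "STUD_NAME", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"}
STUDENT_FDS = [
    ({"STUD_NO"}, {"STUD_NAME"}),
    ({"STUD_NO"}, {"STUD_STATE"}),
    ({"STUD_STATE"}, {"STUD_COUNTRY"}),
    ({"STUD_NO"}, {"STUD_AGE"}),
]

print(sorted("".join(sorted(k)) for k in candidate_keys(R2, FDS2)))  # ['A', 'BC', 'CD', 'E']
print(is_3nf(R2, FDS2))              # True
print(is_3nf(STUDENT, STUDENT_FDS))  # False
```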
4. Boyce-Codd Normal Form (BCNF) –
A relation R is in BCNF if R is in third normal form and, for every FD, the LHS is a super key. Equivalently, a relation is in BCNF iff in every non-trivial functional dependency X –> Y, X is a super key.
Example 1 – Find the highest normal form of a relation R(A, B, C, D, E) with FD set {BC->D, AC->BE, B->E}.
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but none of its subsets can determine all attributes of the relation, so AC is a candidate key. Neither A nor C can be derived from any other attribute of the relation, so there is only one candidate key, {AC}.

Step 2. Prime attributes are those attributes that are part of a candidate key ({A, C} in this example), and the others are non-prime ({B, D, E} in this example).
Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes.
The relation is in 2nd normal form because BC->D satisfies 2NF (BC is not a proper subset of candidate key AC), AC->BE satisfies 2NF (AC is the candidate key), and B->E satisfies 2NF (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form. In BC->D, BC is not a super key and D is not a prime attribute. In B->E, B is not a super key and E is not a prime attribute. To satisfy 3rd normal form, either the LHS of each FD must be a super key or the RHS must be a prime attribute.
So the highest normal form of the relation is 2nd normal form.
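The step-by-step procedure can be scripted as one function that tries BCNF, then 3NF, then 2NF. This is a sketch. Like the text, it assumes every relation is already in 1NF. The helper names are hypothetical, and candidate keys are found by brute force over single-letter attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def highest_normal_form(attrs, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF'; 1NF itself is taken for granted."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)

    def is_superkey(x):
        return closure(x, fds) >= set(attrs)

    # Keep only the non-trivial part (X, Y - X) of each given FD X -> Y
    nontrivial = [(set(x), set(y) - set(x)) for x, y in fds if set(y) - set(x)]

    if all(is_superkey(x) for x, _ in nontrivial):
        return "BCNF"
    if all(is_superkey(x) or y <= prime for x, y in nontrivial):
        return "3NF"
    # 2NF fails if a proper subset of a candidate key determines a non-prime attribute
    for key in keys:
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                if closure(sub, fds) - set(sub) - prime:
                    return "1NF"
    return "2NF"


print(highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")]))  # 2NF
```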
Example 2 –
Consider the relation R(A, B, C) with
A -> BC,
B -> A
A and B are both super keys, so the above relation is in BCNF.
Key Points –
- BCNF is free from redundancy caused by functional dependencies.
- If a relation is in BCNF, then 3NF is also satisfied.
- If all attributes of a relation are prime attributes, then the relation is always in 3NF.
- A relation in a relational database is always at least in 1NF.
- Every binary relation (a relation with only 2 attributes) is always in BCNF.
- If a relation has only singleton candidate keys (i.e. every candidate key consists of only 1 attribute), then the relation is always in 2NF, because no partial functional dependency is possible.
- Sometimes decomposing into BCNF may not preserve functional dependencies. In that case, go for BCNF only if the lost FD(s) are not required; otherwise normalize only up to 3NF.
- There are more normal forms beyond BCNF, such as 4NF. But in real-world database systems it is generally not required to go beyond BCNF.
Exercise 1: Find the highest normal form of R(A, B, C, D, E) under the following functional dependencies.
ABC --> D
CD --> AE
Important points for solving this type of question:
1) It is always a good idea to start checking from BCNF, then 3NF, and so on.
2) If a functional dependency satisfies a normal form, there is no need to check it for lower normal forms. For example, ABC –> D is in BCNF (note that ABC is a superkey), so there is no need to check this dependency for lower normal forms.
Candidate keys in the given relation are {ABC, BCD}.
BCNF: ABC -> D is in BCNF. Now check CD -> AE: CD is not a super key, so this dependency is not in BCNF. So R is not in BCNF.
3NF: We don’t need to check ABC -> D, as it already satisfies BCNF. Consider CD -> AE. CD is not a super key and E is not a prime attribute, so the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD is a proper subset of the candidate key BCD, and it determines E, which is a non-prime attribute. So the given relation is also not in 2NF, and the highest normal form is 1NF.
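The 2NF step of this exercise can be checked by listing every proper subset of every candidate key that determines a non-prime attribute. The helper `partial_dependencies` is hypothetical. Candidate keys are again found by brute force over single-letter attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def partial_dependencies(attrs, fds):
    """Return (keys, [(subset, non-prime attrs it determines), ...])."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):          # proper, non-empty subsets only
            for sub in combinations(sorted(key), size):
                dependents = closure(sub, fds) - set(sub) - prime
                if dependents:
                    found.append(("".join(sub), "".join(sorted(dependents))))
    return keys, found


keys, partial = partial_dependencies("ABCDE", [("ABC", "D"), ("CD", "AE")])
print(sorted("".join(sorted(k)) for k in keys))  # ['ABC', 'BCD']
print(partial)                                   # [('CD', 'E')]
```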
