Unit 4 - Database Management System - WWW - Rgpvnotes.in
Unit IV
Normalization of Database
Database normalization is a technique for organizing the data in a database. It is a systematic
approach of decomposing tables to eliminate data redundancy and undesirable characteristics such as
insertion, update and deletion anomalies. It is a multi-step process that puts data into tabular form by
removing duplicated data from the relation tables.
Update Anomaly: To update the address of a student who appears in two or more rows of a table, we
have to update the Address column in all of those rows; otherwise the data becomes inconsistent.
Insertion Anomaly: Suppose that for a new admission we have the student id (S_id), name and address of
a student, but the student has not opted for any subjects yet. We then have to insert NULL for the
subject, leading to an insertion anomaly.
Deletion Anomaly: If student (S_id) 401 has only one subject and temporarily drops it, deleting that
row deletes the entire student record along with it.
The theory of data normalization is still being developed. For example, there are discussions
even of a sixth normal form. In most practical applications, however, normalization achieves its best in
third normal form. The evolution of normalization theories is illustrated below.
Functional Dependencies
A functional dependency (FD) is a relationship between two attributes, typically between the primary key
(PK) and other non-key attributes within the table. For any relation R, attribute Y is functionally dependent
on attribute X (usually the PK) if, for every valid instance of X, that value of X uniquely determines the
value of Y.
X → Y
The left-hand side of the FD is called the determinant, and the right-hand side is the dependent.
Examples:
SIN → Name, Address, Birthdate
SIN determines name, address and birthdate. Given SIN, we can determine any of the other attributes
within the table.
SIN, Course → Date-Completed
SIN and Course together determine the date completed. An FD must also work for a composite PK.
ISBN → Title
ISBN determines title.
The various types of functional dependencies are as follows.
A database is a collection of related information in which one piece of information depends on another.
The information is either single-valued or multi-valued. For example, the name of a person or his/her
date of birth is a single-valued fact, but the qualification of a person is a multi-valued fact.
A simple example of a single-valued functional dependency is when A is the primary key of an entity (e.g. SID)
and B is some single-valued attribute of the entity (e.g. Sname). Then A → B must always hold.
If AD → C is a full functional dependency, then we cannot remove A or D from the determinant; that is, C is
fully functionally dependent on AD. If C can still be determined after removing A or D, then the dependency
is not a full functional dependency.
As another example, consider the following Company relational schema:
EMPLOYEE
DEPARTMENT
DEPT_LOCATIONS
PROJECT
{SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency), since SSN → ENAME also holds.
Partial Functional Dependency –
A functional dependency in which one or more non-key attributes are functionally dependent on a part of
the primary key is called a partial functional dependency. Equivalently, it is a dependency
where the determinant consists of key attributes, but not the entire primary key, and the dependent consists
of non-key attributes.
For example, consider a relation R (A, B, C, D, E) having
FD: AB → CDE, where the PK is AB.
Transitive Dependency –
Given a relation R (A, B, C), a pair of dependencies such as A → B, B → C is a transitive dependency, since
A → C is implied.
{TRIVIAL + NONTRIVIAL}
Question 1 :
A B C
1 1 1
1 2 1
2 1 2
2 2 3
Identify the non-trivial functional dependencies that hold in this relation.
Solution:
S.NO Dependencies Non-Trivial FD?
1 A→B ×
2 A→C ×
3 A→BC ×
4 B→A ×
5 B→C ×
6 B→AC ×
7 C→A √
8 C→B ×
9 C→AB ×
10 AB→C √
11 BC→A √
12 AC→B ×
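The check behind the solution table can be sketched in Python. This is only an illustration, not part of the original notes: the helper names `holds` and `satisfied_fds` are hypothetical, and the test is against this one instance of the relation (a dependency that holds on four rows need not hold for the schema in general).

```python
from itertools import combinations

# Rows of the relation instance from Question 1, as (A, B, C) tuples.
ATTRS = ("A", "B", "C")
ROWS = [(1, 1, 1), (1, 2, 1), (2, 1, 2), (2, 2, 3)]


def holds(lhs, rhs, rows=ROWS, attrs=ATTRS):
    """Return True if lhs -> rhs holds in this instance.

    lhs and rhs are iterables of attribute names, e.g. "AB" and "C".
    """
    idx = {name: i for i, name in enumerate(attrs)}
    seen = {}
    for row in rows:
        key = tuple(row[idx[a]] for a in lhs)
        val = tuple(row[idx[a]] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


def satisfied_fds(attrs=ATTRS):
    """List every lhs -> rhs with disjoint, non-empty sides that holds."""
    result = []
    for n in range(1, len(attrs)):
        for lhs in combinations(attrs, n):
            rest = [a for a in attrs if a not in lhs]
            for m in range(1, len(rest) + 1):
                for rhs in combinations(rest, m):
                    if holds(lhs, rhs):
                        result.append(("".join(lhs), "".join(rhs)))
    return result


if __name__ == "__main__":
    for lhs, rhs in satisfied_fds():
        print(f"{lhs} -> {rhs}")
```

Run on the four rows above, the sketch reports the same three dependencies that are ticked in the solution table: C → A, AB → C and BC → A.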
Some examples:
C -> ACD
D -> AD
AB -> ABCD
AC -> ACD
BC -> ABCD
BD -> ABCD
CD -> ACD
ABC -> ABCD
ABD -> ABCD
BCD -> ABCD
Let S be the set of functional dependencies that are specified on relation schema R. Numerous other
dependencies can be inferred or deduced from the functional dependencies in S.
Example:
Let S = {A → B, B → C}. Then A → C can be inferred from S by transitivity.
A multivalued dependency occurs when the presence of one or more rows in a table implies the presence of
one or more other rows in that same table. Put another way, two attributes (or columns) in a table are
independent of one another, but both depend on a third attribute. A table containing a multivalued dependency
of this kind does not satisfy Fourth Normal Form (4NF).
Students
Student_Name Major
Ravi Art History
Beth Chemistry
This functional dependency can be written: Student_Name -> Major. Each Student_Name determines
exactly one Major, and no more.
Now, perhaps we also want to track the sports these students take. We might think the easiest way to do
this is to just add another column, Sport:
Students
Student_Name Major Sport
Ravi Art History Soccer
Ravi Art History Volleyball
Ravi Art History Tennis
Beth Chemistry Tennis
Beth Chemistry Soccer
The problem here is that both Ravi and Beth play multiple sports. We need to add a new row for every
additional sport.
This table has introduced a multivalued dependency because the major and the sport are independent of
one another but both depend on the student.
Note that this is a very simple example and easily identifiable — but this could become a problem in a large,
complex database.
The multivalued dependencies are written Student_Name ->-> Major and Student_Name ->-> Sport, read as
"Student_Name multidetermines Major" and "Student_Name multidetermines Sport."
A multivalued dependency always requires at least three attributes because it consists of at least two
attributes that are dependent on a third.
The table below now has a functional dependency of Student_Name -> Major, and no multivalued dependencies:
Student_Name Major
Ravi Art History
Beth Chemistry
While this table records each student's sports, one row per student and sport:
Student_Name Sport
Ravi Soccer
Ravi Volleyball
Ravi Tennis
Beth Tennis
Beth Soccer
It's clear that normalization is often addressed by simplifying complex tables so that they contain information
related to a single idea or theme, rather than trying to make a single table contain too much disparate
information.
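As a rough illustration of the split described above, the following Python sketch projects the three-column Students table onto two smaller tables and checks that joining them on Student_Name gives back exactly the original rows. The variable and function names are hypothetical, and Python sets stand in for relations.

```python
# Rows of the three-column Students table: (Student_Name, Major, Sport).
students = {
    ("Ravi", "Art History", "Soccer"),
    ("Ravi", "Art History", "Volleyball"),
    ("Ravi", "Art History", "Tennis"),
    ("Beth", "Chemistry", "Tennis"),
    ("Beth", "Chemistry", "Soccer"),
}

# Project onto (Student_Name, Major) and (Student_Name, Sport).
# Building sets drops the duplicate rows a projection would otherwise keep.
student_major = {(name, major) for name, major, _ in students}
student_sport = {(name, sport) for name, _, sport in students}


def natural_join(majors, sports):
    """Join the two projections on Student_Name."""
    return {
        (name, major, sport)
        for name, major in majors
        for other, sport in sports
        if name == other
    }


rejoined = natural_join(student_major, student_sport)
print(sorted(student_major))
print(sorted(student_sport))
print(rejoined == students)  # the decomposition loses no information here
```

Because Major and Sport are independent of each other for a given student, the join reproduces the original five rows, so nothing is lost by storing the two facts separately.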
1. Let R= (A, B, C, D, E, F) be a relation scheme with the following dependencies: C->F, E->A, EC->D, A->B.
Which of the following is a key for R?
(a) CD (b) EC (c) AE (d) AC
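One way to approach a question like this is to compute the attribute closure of each option and see whether it covers all of R. The sketch below assumes single-letter attribute names and a hypothetical helper called `closure`; it is an illustration rather than the method prescribed by the notes.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    attrs is a string of one-letter attributes; fds is a list of
    (lhs, rhs) pairs written the same way.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = set("ABCDEF")
FDS = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]

for option in ("CD", "EC", "AE", "AC"):
    cl = closure(option, FDS)
    print(option, "".join(sorted(cl)), cl == R)
```

Under these dependencies only EC has a closure equal to R. Neither E alone nor C alone reaches all of R, so EC is also minimal, which is what makes it a key rather than just a superkey.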
Normalization
First Normal Form
First Normal Form follows from the definition of a relation (table) itself. The rule states that all the
attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.
Course Content
Programming Java,C++
Web HTML,PHP,ASP
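The Content column above holds several values per row, so it is not atomic. A minimal Python sketch of bringing such rows into First Normal Form, assuming the values are separated by commas and using a hypothetical function name `to_first_normal_form`, could look like this:

```python
# The Course/Content rows, with Content stored as one comma-separated value.
unnormalized = [
    ("Programming", "Java,C++"),
    ("Web", "HTML,PHP,ASP"),
]


def to_first_normal_form(rows):
    """Emit one (course, content) row per individual content value."""
    atomic_rows = []
    for course, contents in rows:
        for item in contents.split(","):
            atomic_rows.append((course, item.strip()))
    return atomic_rows


for row in to_first_normal_form(unnormalized):
    print(row)
```

Each output row now carries a single indivisible Content value, at the cost of repeating the Course value once per content item.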
Second Normal Form
We see here in the Student_Project relation that the prime (key) attributes are Stu_ID and Proj_ID. According
to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be dependent upon both together and not
on either prime attribute individually. But we find that Stu_Name can be identified by Stu_ID alone and
Proj_Name can be identified by Proj_ID alone. This is called partial dependency, which is not allowed in Second
Normal Form.
Student
Project
Proj_ID Proj_Name
We broke the relation into two, as depicted in the above picture, so that no partial dependency remains.
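The partial dependencies in Student_Project can be spotted mechanically. The sketch below is a hedged illustration: it assumes that {Stu_ID, Proj_ID} is the only candidate key, encodes just the two dependencies named in the text, and uses a hypothetical helper `partial_dependencies`.

```python
# Candidate key and dependencies as described for Student_Project.
KEY = frozenset({"Stu_ID", "Proj_ID"})
FDS = [
    (frozenset({"Stu_ID"}), "Stu_Name"),
    (frozenset({"Proj_ID"}), "Proj_Name"),
]


def partial_dependencies(key, fds):
    """Return the FDs whose determinant is a proper subset of the key
    and whose dependent attribute is not itself part of the key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if lhs < key and rhs not in key
    ]


for lhs, rhs in partial_dependencies(KEY, FDS):
    print(f"{sorted(lhs)} -> {rhs} is a partial dependency")
```

Both dependencies are flagged, which is why each determinant ends up heading its own relation after the split.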
Third Normal Form
STUDENT_DETAILS
We find that in the above Student_Details relation, Stu_ID is the key and the only prime attribute. We find
that City can be identified by Stu_ID as well as by Zip itself. Zip is not a superkey, nor is City a prime attribute.
Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows:
Student_Details
Zip Codes
Zip City
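The third normal form condition used above (a non-trivial dependency is acceptable only if its determinant is a superkey or its dependent is a prime attribute) can be sketched as a small check. This is an illustration under stated assumptions: Stu_ID is taken to be the only candidate key, only the dependencies named in the text are encoded, and `violates_3nf` is a hypothetical helper.

```python
# Assumed candidate keys and the dependencies named in the text.
CANDIDATE_KEYS = [frozenset({"Stu_ID"})]
FDS = [
    (frozenset({"Stu_ID"}), "Zip"),
    (frozenset({"Zip"}), "City"),
]


def violates_3nf(lhs, rhs, candidate_keys):
    """Return True if lhs -> rhs breaks the 3NF condition."""
    if rhs in lhs:
        return False  # trivial dependency, always allowed
    # A determinant is a superkey when it contains some candidate key.
    is_superkey = any(key <= lhs for key in candidate_keys)
    # A prime attribute is one that belongs to some candidate key.
    is_prime = any(rhs in key for key in candidate_keys)
    return not (is_superkey or is_prime)


violations = [
    (sorted(lhs), rhs)
    for lhs, rhs in FDS
    if violates_3nf(lhs, rhs, CANDIDATE_KEYS)
]
print(violations)
```

Only Zip → City is reported, which matches the decision to move Zip and City into their own Zip Codes relation.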
Fourth Normal Form (4NF)
Vendor Code Item Code Project No.
V2 I2 P1
V2 I3 P1
V3 I1 P2
V3 I1 P3
The table can be expressed as the two 4NF relations given below. The fact that vendors are capable of
supplying certain items and the fact that they are assigned to supply for some projects are independently
specified in the 4NF relations.
Vendor-Supply
Vendor Code Item Code
V1 I1
V1 I2
V2 I2
V2 I3
V3 I1
Vendor-Project
Vendor Code Project No.
V1 P1
V1 P3
V2 P1
V3 P2
Fifth Normal Form (5NF)
These relations still have a problem. While defining 4NF we mentioned that all the attributes depend
upon each other. In creating the two 4NF tables, although we have preserved the dependencies
between Vendor Code and Item Code in the first table and between Vendor Code and Project No. in the second
table, we have lost the relationship between Item Code and Project No. If there were a primary key then this
loss of dependency would not have occurred. In order to revive this relationship, we must add a new table like
the following. Please note that during the entire process of normalization, this is the only step where a new
table is created by joining two attributes, rather than splitting them into separate tables.