Data Normalization
• DATA INTEGRITY
• NORMALIZATION
• TYPES OF NORMALIZATION
DATA INTEGRITY
Data integrity means having correct and accurate data in
your database. When we store data in the database, we
don't want repeated values, incorrect values, or broken
relationships between tables. There are two main types of
data integrity:
Entity integrity
Referential integrity
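Both kinds of integrity can be enforced by the database itself. The following is a minimal sketch using Python's built-in sqlite3 module; the department and teacher tables, their columns, and the sample rows are hypothetical and are not taken from the slides. A primary key rejects a repeated key value (entity integrity), and a foreign key rejects a row that points at a department that does not exist (referential integrity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables, used only for illustration.
conn.execute(
    "CREATE TABLE department ("
    " dept_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE teacher ("
    " teacher_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (1, 'English')")
conn.execute("INSERT INTO teacher VALUES (10, 'Teacher A', 1)")


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Entity integrity: a second row with primary key 1 is refused.
duplicate_key_rejected = rejected("INSERT INTO department VALUES (1, 'Maths')")
# Referential integrity: department 99 does not exist, so the row is refused.
dangling_reference_rejected = rejected("INSERT INTO teacher VALUES (11, 'Teacher B', 99)")
```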
ENTITY INTEGRITY IS
CONCERNED WITH ENSURING
THAT EACH ROW OF A TABLE
HAS A UNIQUE AND NON-
NULL PRIMARY KEY VALUE;
THIS IS THE SAME AS SAYING
THAT EACH ROW IN A TABLE
REPRESENTS A SINGLE
1. PARTIAL DEPENDENCY
A partial dependency in a DBMS involves a prime attribute
and a non-prime attribute. A partial dependency exists if a
non-prime attribute is dependent on a proper subset of a
candidate key. For example, A→B, where B is a non-prime
attribute and A is a proper subset of a candidate key, say X.
A partial dependency would occur whenever a non-prime
attribute depends functionally on a part of the given
candidate key. The 2NF (Second Normal Form) eliminates
partial dependency. Let us take an example to understand
this.
Example
Employee_Task
Employee_ID   Task_No   Employee_Name   Task_Name
C01           34        Mona            App Development
C02           58        Genine          UX/UI Designing
Here, the prime attributes (the ones that make up the candidate key) are Employee_ID and Task_No, and also:
Employee_ID = A unique ID of the employee
Employee_Name = Name of the employee
Task_No = A unique ID of the task
Task_Name = The name of the task
The Employee_Name can be determined using the Employee_ID alone, which is only part of
the key. This is a partial dependency.
The Task_Name can be determined using the Task_No alone, which is also only part of the
key. This is a second partial dependency.
Thus, the <Employee_Task> relation violates the Second Normal Form and is considered
to be a bad database design.
We decompose the table to remove the partial dependencies and, with them, the violation
of the Second Normal Form:
Employee_info
Employee_ID   Task_No   Employee_Name
C01           34        Mona
C02           58        Genine
Task_info
Task_No   Task_Name
34        App Development
58        UX/UI Designing
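The decomposition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slides' own method: the rows are the two sample rows from the example (with the task names taken from Task_info), and the sketch goes one step further than the tables above by keeping the Employee_ID/Task_No pairs in a separate assignment set, a name introduced here for illustration. Each non-prime attribute is then stored once, keyed only by the attribute that determines it.

```python
# Rows of Employee_Task as (Employee_ID, Task_No, Employee_Name, Task_Name).
employee_task = [
    ("C01", 34, "Mona", "App Development"),
    ("C02", 58, "Genine", "UX/UI Designing"),
]

# Employee_Name depends only on Employee_ID, so it is keyed by Employee_ID.
employee_info = {emp_id: name for emp_id, _, name, _ in employee_task}

# Task_Name depends only on Task_No, so it is keyed by Task_No.
task_info = {task_no: task_name for _, task_no, _, task_name in employee_task}

# Hypothetical third table: which employee works on which task.
assignment = {(emp_id, task_no) for emp_id, task_no, _, _ in employee_task}

# Joining the pieces back together reproduces the original rows,
# so nothing was lost by splitting the table.
rebuilt = sorted(
    (emp_id, task_no, employee_info[emp_id], task_info[task_no])
    for emp_id, task_no in assignment
)
```

If an employee is later given a second task, only a new pair is added to assignment; the employee's name is not repeated.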
2. TRANSITIVE DEPENDENCY
A transitive dependency in a database is an indirect
relationship between values in the same table that causes a
functional dependency. To achieve the normalization
standard of Third Normal Form, you must eliminate any
transitive dependency.
A transitive dependency can occur only in a relation that has
three or more attributes. Finding and removing this type of
dependency is what brings a database into its 3rd Normal
Form (3NF).
Example
SHOW_TELECAST
Show_ID   Telecast_ID   Telecast_Type   CD_Cost ($)
F08       S09           Thriller        50
F03       S05           Romantic        30
F05       S09           Comedy          20
The table above is not in its 3NF because it includes a transitive functional
dependency.
Show_ID -> Telecast_ID
Telecast_ID -> Telecast_Type
Thus, the following transitive functional dependency also holds:
Show_ID -> Telecast_Type
Because of this dependency, the relation <Show_Telecast> violates the 3NF
(3rd Normal Form). To remove this violation, we have to split the table so
that the transitive functional dependency disappears.
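The step from the two stated dependencies to Show_ID -> Telecast_Type can be checked mechanically. The sketch below computes an attribute closure, a standard textbook technique that the slides do not name; the function name closure and the way dependencies are written as pairs of tuples are choices made here for illustration.

```python
def closure(attributes, dependencies):
    """Return every attribute functionally determined by `attributes`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for left, right in dependencies:
            # If we already know the whole left side, we also know the right side.
            if set(left) <= result and not set(right) <= result:
                result |= set(right)
                changed = True
    return result


# The two dependencies stated for the Show_Telecast relation.
dependencies = [
    (("Show_ID",), ("Telecast_ID",)),
    (("Telecast_ID",), ("Telecast_Type",)),
]

determined = closure({"Show_ID"}, dependencies)
# Telecast_Type is reached only through Telecast_ID, which is what
# makes the dependency Show_ID -> Telecast_Type transitive.
is_transitive = "Telecast_Type" in determined
```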
SHOW
1. INSERTION ANOMALY
In the above example, it is not possible to add a new department to the
database without also having to add a member of staff at the same time. The
table expects a teacher’s details and the details of a department to be stored
together as one record.
At the moment, there is no way to add the Maths department without also
having to add a Maths teacher. This problem is known as an insert anomaly.
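An insert anomaly of this kind can be reproduced with a small sketch. The staff table below is hypothetical: its name, its two columns, and the rule that every record must name a teacher are assumptions made for illustration, since the table from the slides is not reproduced here. The sketch uses Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat table: one record holds a teacher together with a department.
conn.execute(
    "CREATE TABLE staff ("
    " teacher TEXT NOT NULL PRIMARY KEY,"
    " department TEXT NOT NULL)"
)

# Try to record the Maths department before any Maths teacher exists.
try:
    conn.execute("INSERT INTO staff (teacher, department) VALUES (NULL, 'Maths')")
    maths_added = True
except sqlite3.IntegrityError:
    # The record is refused because it has no teacher.
    maths_added = False
```

With departments stored in a table of their own, the Maths department could be added first and teachers assigned to it later.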
2. DELETION ANOMALY
A delete anomaly is the opposite of an insert anomaly. When a delete anomaly
occurs, it means that you cannot delete one piece of data from the table
without deleting the entire record, including data you may want to keep.
For example, if we want to remove Betty Flood from the table, we would also
need to remove all data that is stored about the Geography department. This
means we would lose data that we might not want to lose.
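The effect can be sketched with a flat table held as a Python list. The rows are hypothetical apart from Betty Flood and the Geography department, which come from the example; "Teacher A" and "Teacher B" are placeholder names, and the sketch assumes Betty Flood is the only Geography teacher on record.

```python
# Hypothetical flat table: each record stores a teacher with their department.
staff = [
    {"teacher": "Teacher A", "department": "English"},
    {"teacher": "Teacher B", "department": "English"},
    {"teacher": "Betty Flood", "department": "Geography"},
]

# Remove Betty Flood's record.
staff_after = [row for row in staff if row["teacher"] != "Betty Flood"]

# Compare which departments the table knows about before and after.
departments_before = {row["department"] for row in staff}
departments_after = {row["department"] for row in staff_after}
lost_departments = departments_before - departments_after
```

Here lost_departments contains Geography: deleting one teacher also deleted the only record of that department.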
3. MODIFICATION ANOMALY
Take a look at the table shown above again. If the phone number for the English
department changed to 307 instead of 301 it would need to be changed in two
different records.
If the change only happened in one of the two records, then an update anomaly
would have taken place.
In small tables it can be easy to spot update anomalies and make sure that
changes are made everywhere. However, large flat file tables would often
contain thousands of records, meaning that it is difficult to make changes to
every record. Update anomalies lead to inaccuracy and inconsistency in a
database.
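A short sketch of how the inconsistency arises, and how a separate department table avoids it. The teacher names and the dept_phone column name are hypothetical; the phone numbers 301 and 307 and the two English records come from the example.

```python
# Hypothetical flat table: the department phone is repeated in every record.
staff = [
    {"teacher": "Teacher A", "department": "English", "dept_phone": "301"},
    {"teacher": "Teacher B", "department": "English", "dept_phone": "301"},
]

# The change reaches only one of the two English records.
staff[0]["dept_phone"] = "307"

# The table now holds two different numbers for the same department.
english_phones = {
    row["dept_phone"] for row in staff if row["department"] == "English"
}
inconsistent = len(english_phones) > 1

# Normalized alternative: the number is stored once, so one change is enough.
department_phone = {"English": "301"}
department_phone["English"] = "307"
```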
NORMALIZATION
Normalization is the process of organizing data in a
database. This includes creating tables and establishing
relationships between those tables according to rules
designed both to protect the data and to make the database
more flexible by eliminating redundancy and inconsistent
dependency.
GRAPHICAL REPRESENTATION OF NORMALIZATION
UN-NORMALIZED RELATION
Registration:
Student#   Class#
1022       101-07
1022       143-01
1022       159-02
4123       101-07
4123       143-01
4123       179-04
4. Third normal form: Eliminate data not dependent on key
In the last example, Adv-Room (the advisor's office number)
is functionally dependent on the Advisor attribute. The
solution is to move that attribute from the Students table to
the Faculty table, as shown below:
Students:
Student#   Advisor
1022       Jones
4123       Smith
Faculty:
Name    Room   Dept
Jones   412    42
Smith   216    42
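After the split, the advisor's room is still available through a join. The sketch below loads the two tables into an in-memory SQLite database using Python's built-in sqlite3 module. The column name student_no stands in for Student#, and the column types are assumptions made here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_no INTEGER PRIMARY KEY, advisor TEXT)")
conn.execute("CREATE TABLE faculty (name TEXT PRIMARY KEY, room INTEGER, dept INTEGER)")

# Sample rows from the Students and Faculty tables.
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1022, "Jones"), (4123, "Smith")],
)
conn.executemany(
    "INSERT INTO faculty VALUES (?, ?, ?)",
    [("Jones", 412, 42), ("Smith", 216, 42)],
)

# Look up each student's advisor room by joining on the advisor's name.
rows = conn.execute(
    "SELECT s.student_no, s.advisor, f.room"
    " FROM students AS s JOIN faculty AS f ON f.name = s.advisor"
    " ORDER BY s.student_no"
).fetchall()
```

Each room number is stored once, in the faculty table, so moving an advisor to a new office means changing a single row.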
VIDEO EXPLANATION