Data Normalization

Uploaded by

bixmashr
Copyright: © All Rights Reserved
DATA NORMALIZATION

AGENDA
• DATA INTEGRITY
• PARTIAL AND TRANSITIVE DEPENDENCY
• DATA ANOMALIES
• NORMALIZATION
• TYPES OF NORMALIZATION
DATA INTEGRITY
Data integrity means having correct and accurate data in your
database. When we store data in the database, we don't want
repeating values, incorrect values, or broken relationships
between tables. Two types of data integrity matter most for
table design:
• Entity integrity
• Referential integrity
ENTITY INTEGRITY
ENTITY INTEGRITY IS CONCERNED WITH ENSURING THAT EACH ROW OF A
TABLE HAS A UNIQUE AND NON-NULL PRIMARY KEY VALUE; THIS IS THE
SAME AS SAYING THAT EACH ROW IN A TABLE REPRESENTS A SINGLE
INSTANCE OF THE ENTITY TYPE MODELLED BY THE TABLE. THERE ARE FOUR
PRIMARY TYPES OF DATA INTEGRITY:
• ENTITY
• DOMAIN
• REFERENTIAL
• USER-DEFINED
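The rule above (unique, non-null primary key) can be sketched with Python's built-in sqlite3 module. The course table and its values here are illustrative assumptions, not a prescribed schema. NOT NULL is written out because SQLite does not imply it for a TEXT primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative table: the primary key must be unique and must not be NULL.
conn.execute(
    "CREATE TABLE course (course_id TEXT NOT NULL PRIMARY KEY, course_name TEXT)"
)
conn.execute("INSERT INTO course VALUES ('78', 'BIG DATA')")


def try_insert(course_id, name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO course VALUES (?, ?)", (course_id, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert("78", "ALGORITHM")  # key already used by another row
null_ok = try_insert(None, "ALGORITHM")       # key missing
unique_ok = try_insert("56", "ALGORITHM")     # new, non-null key
print(duplicate_ok, null_ok, unique_ok)       # False False True
```

Only the third insert succeeds, so every stored row can still be identified by its key.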
REFERENTIAL INTEGRITY
REFERENTIAL INTEGRITY IS THE LOGICAL DEPENDENCY OF A FOREIGN KEY
ON A PRIMARY KEY. THE INTEGRITY OF A ROW THAT CONTAINS A FOREIGN
KEY DEPENDS ON THE INTEGRITY OF THE ROW THAT IT REFERENCES, THAT
IS, THE ROW THAT CONTAINS THE MATCHING PRIMARY KEY. THERE ARE FOUR
TYPES OF INTEGRITY CONSTRAINTS IN DBMS:
• DOMAIN CONSTRAINT
• ENTITY CONSTRAINT
• REFERENTIAL INTEGRITY CONSTRAINT
• KEY CONSTRAINT
EXAMPLE OF REFERENTIAL INTEGRITY

STUDENT (FIRST TABLE)
ROLL NO  NAME   AGE  COURSE-ID (FOREIGN KEY)
1        ALI    18   78
2        AHMAD  19   16
3        ZAIN   20   56
4        ATIF   21   NULL

The value 16 (AHMAD) is not allowed because it is not defined as a
primary key in the course table.
The value can be NULL (ATIF) as the student may not have taken any
course.

COURSE (SECOND TABLE)
COURSE-ID (PRIMARY KEY)  COURSE-NAME  DURATION (MONTHS)
78                       BIG DATA     4
56                       ALGORITHM    2
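The example can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the lower-case table and column names are assumptions made for the code, and foreign key checking is switched on explicitly because SQLite leaves it off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE course (course_id INTEGER PRIMARY KEY, "
    "course_name TEXT, duration_months INTEGER)"
)
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER, "
    "course_id INTEGER REFERENCES course(course_id))"
)
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?)",
    [(78, "BIG DATA", 4), (56, "ALGORITHM", 2)],
)

students = [
    (1, "ALI", 18, 78),
    (2, "AHMAD", 19, 16),   # 16 is not a course_id in the course table
    (3, "ZAIN", 20, 56),
    (4, "ATIF", 21, None),  # NULL: no course taken yet
]
accepted, rejected = [], []
for row in students:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        accepted.append(row[1])
    except sqlite3.IntegrityError:
        rejected.append(row[1])

print(accepted)  # ['ALI', 'ZAIN', 'ATIF']
print(rejected)  # ['AHMAD']
```

The NULL foreign key is accepted, while the value 16 is rejected because no matching primary key exists.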
PARTIAL AND TRANSITIVE DEPENDENCY

1. Partial dependency
A partial dependency exists when a non-prime attribute is
functionally dependent on a proper subset of a candidate key
rather than on the whole key. For example, A→B is a partial
dependency when B is a non-prime attribute and A is a proper
subset of a candidate key, say X. It can therefore arise only
when a candidate key is made up of more than one attribute.
The 2NF (Second Normal Form) eliminates partial dependency.
Let us take an example to understand this.
Example
Employee_Task
Employee_ID  Task_No  Employee_Name  Task_Name
C01          34       Mona           App Development
C02          58       Genine         UX/UI Designing

Here, the prime attributes (the attributes that make up the candidate key) are
Employee_ID and Task_No, and also:
Employee_ID = A unique ID of the employee
Employee_Name = Name of the employee
Task_No = A unique ID of the task
Task_Name = The name of the task
The Employee_Name can be determined using the Employee_ID alone. Since Employee_ID is
only part of the key, this is a partial dependency.
The Task_Name can be determined using the Task_No alone. This is also a partial
dependency.
Thus, the <Employee_Task> relation violates the Second Normal Form and is considered
to be a bad database design.
To remove the partial dependencies, and with them the violation of the second normal
form, we decompose the table:
Employee_info
Employee_ID  Employee_Name
C01          Mona
C02          Genine

Task_info
Task_No  Task_Name
34       App Development
58       UX/UI Designing

The pairing of employees with tasks stays in Employee_Task, which now needs only its
key attributes (Employee_ID, Task_No).
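A small Python sketch can make the partial dependencies visible. The helper fd_holds and the third row are hypothetical additions for illustration. With only the two sample rows every dependency would hold trivially, so an extra assignment (Mona on task 58) is assumed. The Task_Name values are taken from the Task_info table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same left-hand side must always map to the same right-hand side.
        if seen.setdefault(key, value) != value:
            return False
    return True


employee_task = [
    {"Employee_ID": "C01", "Task_No": 34, "Employee_Name": "Mona",
     "Task_Name": "App Development"},
    {"Employee_ID": "C02", "Task_No": 58, "Employee_Name": "Genine",
     "Task_Name": "UX/UI Designing"},
    # Hypothetical extra row, not in the slide: Mona also works on task 58.
    {"Employee_ID": "C01", "Task_No": 58, "Employee_Name": "Mona",
     "Task_Name": "UX/UI Designing"},
]

# Each non-prime attribute depends on only one part of the key.
name_on_employee = fd_holds(employee_task, ["Employee_ID"], ["Employee_Name"])
task_on_task_no = fd_holds(employee_task, ["Task_No"], ["Task_Name"])
# Cross-check: Task_No does not determine Employee_Name.
name_on_task_no = fd_holds(employee_task, ["Task_No"], ["Employee_Name"])
print(name_on_employee, task_on_task_no, name_on_task_no)  # True True False

# Decomposition: one table per determinant.
employee_info = sorted({(r["Employee_ID"], r["Employee_Name"]) for r in employee_task})
task_info = sorted({(r["Task_No"], r["Task_Name"]) for r in employee_task})
print(employee_info)  # [('C01', 'Mona'), ('C02', 'Genine')]
print(task_info)      # [(34, 'App Development'), (58, 'UX/UI Designing')]
```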
2. TRANSITIVE DEPENDENCY
A transitive dependency is an indirect functional dependency
between attributes of the same table: if A -> B and B -> C hold
(and B does not determine A), then A -> C holds transitively.
To achieve the normalization standard of Third Normal Form, you
must eliminate any transitive dependency of a non-key attribute
on the key.
A transitive dependency can occur only in a relation that has
three or more attributes. Identifying such dependencies is the
step that brings a database into its 3rd Normal Form (3NF).
Example
SHOW_TELECAST
Show_ID  Telecast_ID  Telecast_Type  CD_Cost ($)
F08      S09          Thriller       50
F03      S05          Romantic       30
F05      S09          Comedy         20
The table above is not in its 3NF because it includes a transitive functional
dependency.
Show_ID -> Telecast_ID
Telecast_ID -> Telecast_Type
Thus, the following transitive functional dependency holds:
Show_ID -> Telecast_Type
(For Telecast_ID -> Telecast_Type to hold, each Telecast_ID must correspond to
exactly one Telecast_Type.)
This means the relation <Show_Telecast> violates the 3NF (3rd Normal Form). To
remove the violation, we split the table so that the transitive functional
dependency no longer exists within one table.

SHOW
Show_ID  Telecast_ID  CD_Cost ($)
F08      S09          50
F03      S05          30
F05      S09          20

TELECAST
Telecast_ID  Telecast_Type
S09          Thriller
S05          Romantic
S09          Comedy
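The split can be sketched in Python as follows. One assumption is flagged in the code: the third row uses a hypothetical Telecast_ID S07. In the sample table S09 is listed with two different types, which would contradict Telecast_ID -> Telecast_Type.

```python
show_telecast = [
    ("F08", "S09", "Thriller", 50),
    ("F03", "S05", "Romantic", 30),
    ("F05", "S07", "Comedy", 20),  # S07 is hypothetical; the sample table lists S09 here
]

# SHOW keeps the attributes that depend directly on Show_ID.
show = [(show_id, telecast_id, cost)
        for show_id, telecast_id, _type, cost in show_telecast]

# TELECAST keeps one entry per Telecast_ID.
telecast = {telecast_id: telecast_type
            for _show, telecast_id, telecast_type, _cost in show_telecast}

# Joining the two tables on Telecast_ID rebuilds the original rows,
# so nothing is lost by the decomposition.
rejoined = [(show_id, telecast_id, telecast[telecast_id], cost)
            for show_id, telecast_id, cost in show]
print(rejoined == show_telecast)  # True
```

Each Telecast_Type is now stored once, in TELECAST, and is reached from SHOW through Telecast_ID.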
DATABASE ANOMALIES
Database anomalies are errors and inconsistencies that
arise when data is stored redundantly. They cause problems
during insertion, deletion, and modification operations,
so they must be removed for error-free processing of
relations. Anomalies can be eliminated by decomposing a
relation into two or more relations. There are three
types of anomalies in a database:
• Insertion anomaly
• Deletion anomaly
• Modification anomaly
ANOMALIES
ID  Forename  Surname   Department  Department ID  Phone number
1   Colin     Arthur    ICT         001            300
2   Laura     Brown     ICT         001            300
3   Stephen   MacLeod   ICT         001            300
4   Scott     Sinclair  English     002            301
5   Michelle  Wie       English     002            301
6   Ross      Dyett     PE          003            302
7   Ian       Anderson  PE          003            302
8   Betty     Flood     Geography   004            303

1. INSERTION ANOMALY
In the above example, it is not possible to add a new department to the
database without also having to add a member of staff at the same time. The
table expects a teacher’s details and the details of a department to be stored
together as one record.
At the moment, there is no way to add the Maths department without also
having to add a Maths teacher. This problem is known as an insert anomaly.
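A minimal sqlite3 sketch of the insert anomaly follows. It assumes the flat table requires a forename and surname on every record, which is how "the table expects a teacher's details" is modelled here. The Maths department ID 005 and phone number 304 are made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Flat table: staff and department details share one record.
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, forename TEXT NOT NULL, "
    "surname TEXT NOT NULL, department TEXT, department_id TEXT, phone TEXT)"
)

try:
    # Hypothetical Maths department with no teacher yet.
    conn.execute(
        "INSERT INTO staff (department, department_id, phone) "
        "VALUES ('Maths', '005', '304')"
    )
    flat_insert_ok = True
except sqlite3.IntegrityError:
    flat_insert_ok = False

# With departments in their own table, the same fact can be stored on its own.
conn.execute(
    "CREATE TABLE department (department_id TEXT PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.execute("INSERT INTO department VALUES ('005', 'Maths', '304')")
department_count = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]

print(flat_insert_ok, department_count)  # False 1
```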
2. DELETION ANOMALY
A delete anomaly is the opposite of an insert anomaly. When a delete anomaly
occurs, you cannot delete one piece of data from the table without deleting
the entire record, and with it other data stored in that record.
For example, if we want to remove Betty Flood from the table, we would also
need to remove all data that is stored about the Geography department, because
she is its only member of staff. This means we would lose data that we might
not want to lose.
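The effect can be reproduced with the rows of the table above, held as plain Python tuples. The helper name departments is an assumption for this sketch.

```python
staff = [
    (1, "Colin", "Arthur", "ICT", "001", "300"),
    (2, "Laura", "Brown", "ICT", "001", "300"),
    (3, "Stephen", "MacLeod", "ICT", "001", "300"),
    (4, "Scott", "Sinclair", "English", "002", "301"),
    (5, "Michelle", "Wie", "English", "002", "301"),
    (6, "Ross", "Dyett", "PE", "003", "302"),
    (7, "Ian", "Anderson", "PE", "003", "302"),
    (8, "Betty", "Flood", "Geography", "004", "303"),
]


def departments(rows):
    """Return the set of department names that the table still knows about."""
    return {row[3] for row in rows}


before = departments(staff)
# Remove Betty Flood (ID 8) from the flat table.
after = departments([row for row in staff if row[0] != 8])
lost = before - after
print(lost)  # {'Geography'}
```

Deleting one member of staff removes the only record of the Geography department.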

3. MODIFICATION ANOMALY
Take a look at the table shown above again. If the phone number for the English
department changed to 307 instead of 301 it would need to be changed in two
different records.
If the change only happened in one of the two records, then an update anomaly
would have taken place.
In small tables it can be easy to spot update anomalies and make sure that
changes are made everywhere. However, large flat file tables would often
contain thousands of records, meaning that it is difficult to make changes to
every record. Update anomalies lead to inaccuracy and inconsistency in a
database.
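The partial update can be sketched in Python. Only three of the table's rows are reproduced, and the helper inconsistent_departments is a hypothetical name for this sketch.

```python
staff = [
    {"id": 4, "surname": "Sinclair", "department": "English", "phone": "301"},
    {"id": 5, "surname": "Wie", "department": "English", "phone": "301"},
    {"id": 6, "surname": "Dyett", "department": "PE", "phone": "302"},
]

# The English department's number changes to 307, but only one record is updated.
staff[0]["phone"] = "307"


def inconsistent_departments(rows):
    """Return departments that are stored with more than one phone number."""
    phones = {}
    for row in rows:
        phones.setdefault(row["department"], set()).add(row["phone"])
    return {dept: sorted(nums) for dept, nums in phones.items() if len(nums) > 1}


conflicts = inconsistent_departments(staff)
print(conflicts)  # {'English': ['301', '307']}
```

If the phone number were stored once, in a separate department table, there would be a single value to change and no way for the records to disagree.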
NORMALIZATION
Normalization is the process of organizing data in a
database. This includes creating tables and establishing
relationships between those tables according to rules
designed both to protect the data and to make the database
more flexible by eliminating redundancy and inconsistent
dependency.
GRAPHICAL REPRESENTATION OF NORMALIZATION
UN-NORMALIZED RELATION → 1ST NORMAL FORM → 2ND NORMAL FORM →
3RD NORMAL FORM → BOYCE AND CODD NORMAL FORM (BCNF)
1ST NORMAL FORM
A table is in first normal form if and only if it satisfies
the conditions described below:
• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.
2ND NORMAL FORM
A relation is in second normal form if and only if it is in first normal form
and satisfies the following conditions:
• Create separate tables for sets of values that apply to multiple records.
• Relate these tables with a foreign key.
Records should not depend on anything other than a table's primary key (a
compound key, if necessary). For example, consider a customer's address in an
accounting system. The address is needed by the Customers table, but also by
the Orders, Shipping, Invoices, Accounts Receivable, and Collections tables.
Instead of storing the customer's address as a separate entry in each of these
tables, store it in one place, either in the Customers table or in a separate
Addresses table.
3RD NORMAL FORM
A relation is in third normal form if and only if it is in second normal form
and satisfies the following condition:
• Eliminate fields that do not depend on the key.
Values in a record that are not part of that record's key do not belong in the
table. In general, anytime the contents of a group of fields may apply to more
than a single record in the table, consider placing those fields in a separate
table.
For example, in an Employee Recruitment table, a candidate's university name
and address may be included. But you need a complete list of universities for
group mailings. If university information is stored in the Candidates table, there
is no way to list universities with no current candidates. Create a separate
Universities table and link it to the Candidates table with a university code key.
NORMALIZING AN EXAMPLE TABLE
1. Unnormalized table:
Student#  Advisor  Adv-Room  Class1  Class2  Class3
1022      Jones    412       101-07  143-01  159-02
4123      Smith    216       101-07  143-01  179-04

2. First normal form: No repeating groups
Tables should have only two dimensions. Since one student has several classes,
these classes should be listed in a separate table. Fields Class1, Class2, and
Class3 in the above records are indications of design trouble.
Spreadsheets often use the third dimension, but tables should not. Another way
to look at this problem: in a one-to-many relationship, do not put the one side
and the many side in the same table. Instead, create another table in first normal
form by eliminating the repeating group (Class#), as shown below:
Student# Advisor Adv-Room Class#
1022 Jones 412 101-07
1022 Jones 412 143-01
1022 Jones 412 159-02
4123 Smith 216 101-07
4123 Smith 216 143-01
4123 Smith 216 179-04
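The step from the unnormalized table to first normal form can be sketched in Python. The dictionary layout is an assumption for this sketch. The column names and values come from the tables above.

```python
unnormalized = [
    {"Student#": 1022, "Advisor": "Jones", "Adv-Room": 412,
     "Class1": "101-07", "Class2": "143-01", "Class3": "159-02"},
    {"Student#": 4123, "Advisor": "Smith", "Adv-Room": 216,
     "Class1": "101-07", "Class2": "143-01", "Class3": "179-04"},
]

# Turn the repeating group Class1..Class3 into one row per (student, class).
first_nf = [
    (row["Student#"], row["Advisor"], row["Adv-Room"], row[column])
    for row in unnormalized
    for column in ("Class1", "Class2", "Class3")
]

for record in first_nf:
    print(record)
```

Each output row holds a single Class# value, so a fourth class for a student becomes one more row instead of a new column.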

3. Second normal form: Eliminate redundant data
Note the multiple Class# values for each Student# value in
the above table. Class# is not functionally dependent on
Student# (primary key), so this relationship is not in second
normal form.
The following tables demonstrate second normal form:
Students:
Student# Advisor Adv-Room
1022 Jones 412
4123 Smith 216

Registration:
Student# Class#
1022 101-07
1022 143-01
1022 159-02
4123 101-07
4123 143-01
4123 179-04
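The same split can be sketched in Python, starting from the first-normal-form rows. The tuple layout is an assumption for this sketch.

```python
first_nf = [
    (1022, "Jones", 412, "101-07"),
    (1022, "Jones", 412, "143-01"),
    (1022, "Jones", 412, "159-02"),
    (4123, "Smith", 216, "101-07"),
    (4123, "Smith", 216, "143-01"),
    (4123, "Smith", 216, "179-04"),
]

# Students: attributes that depend on Student# alone, stored once per student.
students = sorted({(student, advisor, room)
                   for student, advisor, room, _class in first_nf})

# Registration: one row per (Student#, Class#) pair.
registration = [(student, class_no)
                for student, _advisor, _room, class_no in first_nf]

print(students)           # [(1022, 'Jones', 412), (4123, 'Smith', 216)]
print(len(registration))  # 6
```

The advisor and room are no longer repeated for every class a student takes.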
4. Third normal form: Eliminate data not dependent on key
In the last example, Adv-Room (the advisor's office number)
is functionally dependent on the Advisor attribute. The
solution is to move that attribute from the Students table to
the Faculty table, as shown below:

Students:
Student# Advisor
1022 Jones
4123 Smith

Faculty:
Name Room Dept
Jones 412 42
Smith 216 42
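A short Python sketch of this last step follows. The Dept column is left out because its values do not appear in the earlier tables. The variable names are assumptions for this sketch.

```python
students_2nf = [
    (1022, "Jones", 412),
    (4123, "Smith", 216),
]

# Students keeps only what depends on Student#.
students = [(student, advisor) for student, advisor, _room in students_2nf]

# Faculty stores each advisor's room once, keyed by the advisor's name.
faculty = sorted({(advisor, room) for _student, advisor, room in students_2nf})

# The room for a student's advisor is now found through the advisor.
advisor_of = dict(students)
room_of = dict(faculty)
print(room_of[advisor_of[1022]])  # 412
```

If Jones moves to another office, only the single Faculty row changes.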