Data Normalization
The table stores the course code, course venue, instructor name, and instructor's
phone number. At first glance, this design seems fine. However, problems appear as
soon as we need to modify information. Suppose Prof. George changes his mobile
number: we now have to make the edit in 2 places.
What if someone edits the mobile number against CS101 but forgets to edit it for
CS154? That leaves stale/wrong information in the database. The problem can be
tackled easily by dividing our table into 2 simpler tables:
Table 1 (Instructor):
Instructor ID
Instructor Name
Instructor mobile number
Table 2 (Course):
Course code
Course venue
Instructor ID
Here, we store the instructors separately, and in the course table we do not store
the instructor's full details. Rather, we store only the ID of the instructor. Now,
anyone who wants the mobile number of an instructor can simply look it up in the
instructor table. Also, if Prof. George's mobile number changes, the update happens
in exactly one place. This avoids the stale/wrong data problem.
Further, notice that the mobile number no longer needs to be stored twice; it lives
in just one place, which also saves storage. This may not seem significant in such a
simple example. However, think of the case where there are hundreds of courses and
instructors, and for each instructor we store not just the mobile number but also
an office address, email address, specialization, availability, and so on. Replicating
that much data would increase the storage requirement unnecessarily.
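The two-table split above can be sketched in SQL. This is a minimal sketch using Python's built-in sqlite3 module; the table and column names (instructor, course, instructor_id, and the sample phone numbers) are illustrative assumptions, not taken from a real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Instructor details live in exactly one table...
cur.execute("CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, mobile TEXT)")
# ...and each course references the instructor only by ID.
cur.execute("""CREATE TABLE course (code TEXT PRIMARY KEY, venue TEXT,
               instructor_id INTEGER REFERENCES instructor(id))""")

cur.execute("INSERT INTO instructor VALUES (1, 'Prof. George', '9999999999')")
cur.executemany("INSERT INTO course VALUES (?, ?, ?)",
                [("CS101", "Hall A", 1), ("CS154", "Hall B", 1)])

# Changing the mobile number is a single-row update in one place...
cur.execute("UPDATE instructor SET mobile = '8888888888' WHERE id = 1")

# ...and every course that joins to the instructor sees the new number.
rows = cur.execute("""
    SELECT course.code, instructor.mobile
    FROM course JOIN instructor ON course.instructor_id = instructor.id
    ORDER BY course.code
""").fetchall()
print(rows)
```

Because the number is stored once, there is no second copy that can go stale: both CS101 and CS154 report the updated number through the join.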
Clearly, the student name column isn't unique: there are 2 entries corresponding to
the name 'Rahul', in row 1 and row 3. Similarly, the course code column is not
unique: course code CS101 appears in both row 2 and row 4.
However, the pair (student name, course code) is unique, since a student cannot
enroll in the same course more than once. So these 2 columns, when combined, form
the primary key for the table.
By the definition of second normal form, our enrollment table above isn't in the
second normal form. To achieve it (1NF to 2NF), we can break the table into 2
tables:
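The 1NF-to-2NF split can be sketched with plain Python sets. The enrollment rows themselves are not reproduced in the text above, so the rows below (including the names Rajat and Sonia and the venue column) are assumed sample data, arranged so that 'Rahul' appears in rows 1 and 3 and CS101 in rows 2 and 4, matching the description:

```python
# Hypothetical enrollment rows: (student name, course code, course venue).
# Venue depends on course code alone -- a partial dependency on the
# composite key (student name, course code), which violates 2NF.
rows = [
    ("Rahul", "MA214", "Hall B"),
    ("Rajat", "CS101", "Hall A"),
    ("Rahul", "CS154", "Hall C"),
    ("Sonia", "CS101", "Hall A"),
]

# 2NF decomposition: move the partially-dependent column into its own table.
enrollment = sorted({(student, code) for student, code, _ in rows})
course = sorted({(code, venue) for _, code, venue in rows})

print(enrollment)  # one row per (student, course) pair
print(course)      # venue stored once per course, not once per enrollment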
Third Normal Form (3NF)
Before we delve into the details of third normal form, let us understand the concept
of a functional dependency on a table.
Here, the department column depends on the professor name column: if we change the
name of the professor in a particular row, we must also change the department value.
As an example, suppose MA214 is now taken by Prof. Ronald, who happens to be from
the mathematics department; the table will then look like this:
Here, when we changed the name of the professor, we also had to change the
department column. This is undesirable: someone updating the database may remember
to change the name of the professor but forget to update the department value,
causing inconsistency in the database.
Third normal form avoids this by breaking the data into separate tables: we store
the details of the professor against his/her ID. This way, whenever we want to
reference the professor somewhere, we don't have to repeat the other details of the
professor in that table. We can simply use the ID.
Therefore, in third normal form, every functional dependency A -> B must satisfy at
least one of the following conditions:
A is a superkey, or
the dependency is trivial (B is a subset of A), or
every attribute in B that is not in A is part of some candidate key.
For the course table above, some superkeys are:
Course code
Course code, professor name
Course code, professor mobile number
A superkey from which no column can be removed while it still remains a superkey
(a minimal superkey) is called a candidate key. For instance, the first superkey
above has just 1 column, while the other two have 2 columns each, and dropping the
extra column still leaves a superkey. So the first superkey (Course code) is a
candidate key.
A is a superkey: this means that only a superkey should have other columns
depending on it. In other words, if a set of columns B can be determined by knowing
some other set of columns A, then A should be a superkey. A superkey is a set of
columns that determines each row uniquely.
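Whether a set of columns is a superkey can be checked mechanically by computing its attribute closure under the functional dependencies. Below is a small sketch of that standard closure algorithm; the attribute names (code, professor, department) and the two assumed dependencies mirror the course/professor example above:

```python
def closure(attrs, fds):
    """Closure of an attribute set under functional dependencies.

    fds is a list of (lhs, rhs) frozenset pairs. A set A is a superkey
    iff closure(A) contains every attribute of the relation.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, absorb the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumed dependencies: course code -> professor, professor -> department.
fds = [
    (frozenset({"code"}), frozenset({"professor"})),
    (frozenset({"professor"}), frozenset({"department"})),
]
all_attrs = {"code", "professor", "department"}

print(closure({"code"}, fds) == all_attrs)       # {code} is a superkey
print(closure({"professor"}, fds) == all_attrs)  # {professor} is not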
Another example would be a company whose employees work in more than one
department. The corresponding table can be decomposed into smaller tables keyed on
employee ID and on the employee-department relationship.
Table ACP
Agent Company Product
A1 PQR/XYZ Nut/Bolt
A2 PQR Nut
A3 XYZ Bolt
Here, Agent ->> Company and Agent ->> Product hold. This is read as "Agent
multi-determines Company" and "Agent multi-determines Product."
Note that a functional dependency is a special case of multivalued dependency. In a
functional dependency X -> Y, every x determines exactly one y, never more than one.
Written out row by row, the ACP table is:
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
A3 XYZ Bolt
To eliminate the redundancy, we can break this information into two tables: one for
agent_company and another for agent_product.
Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
A3 XYZ
Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
A3 Bolt
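That this 4NF decomposition loses nothing can be checked by taking the natural join of R1 and R2 over the shared Agent column. A minimal sketch using plain tuples (no database needed):

```python
# Decomposed relations from the text above.
r1 = [("A1", "PQR"), ("A1", "XYZ"), ("A2", "PQR"), ("A3", "XYZ")]   # agent_company
r2 = [("A1", "Nut"), ("A1", "Bolt"), ("A2", "Nut"), ("A3", "Bolt")]  # agent_product

# Natural join over the shared Agent column.
acp = sorted({(a, c, p) for a, c in r1 for a2, p in r2 if a == a2})
for row in acp:
    print(row)
```

The join produces exactly the six rows of the expanded ACP table, confirming that splitting out the two multivalued dependencies is a lossless decomposition.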
Example – Consider the same schema, now with the additional constraint: "if a
company makes a product and an agent is an agent for that company, then the agent
always sells that product for the company." Under this constraint, the ACP table
looks like this:
Table ACP
Agent Company Product
A1 PQR Nut
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is now decomposed into 3 relations, shown below. The natural join
of all three reproduces ACP:
Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table R3
Company Product
PQR Nut
PQR Bolt
XYZ Nut
XYZ Bolt
Taking the natural join of R1 and R3 over 'Company', and then joining the result
with R2 over 'Agent' and 'Product', reproduces ACP exactly: the decomposition is
lossless. Therefore, the relation is in 5NF, as it does not violate the lossless
join property.
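The two-step join just described can be traced with plain tuples. A minimal sketch, using the R1, R2, and R3 rows from the tables above:

```python
r1 = [("A1", "PQR"), ("A1", "XYZ"), ("A2", "PQR")]               # Agent-Company
r2 = [("A1", "Nut"), ("A1", "Bolt"), ("A2", "Nut")]              # Agent-Product
r3 = [("PQR", "Nut"), ("PQR", "Bolt"),
      ("XYZ", "Nut"), ("XYZ", "Bolt")]                           # Company-Product

# Step 1: natural join of R1 and R3 over Company.
step1 = {(a, c, p) for a, c in r1 for c2, p in r3 if c == c2}
# Step 2: natural join with R2 over (Agent, Product).
acp = sorted({(a, c, p) for a, c, p in step1 if (a, p) in set(r2)})
for row in acp:
    print(row)
```

The intermediate join contains spurious rows such as (A2, PQR, Bolt); the second join with R2 filters them out, leaving exactly the five rows of ACP. That is the lossless three-way join dependency that 5NF captures.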
Conclusion
Multivalued dependencies are removed by 4NF, and join dependencies
are removed by 5NF.
The highest degrees of database normalization, 4NF and 5NF, may not be
required for every application.
Normalizing to 4NF and 5NF can result in more complicated database
structures and slower queries, but it can also improve data accuracy,
reliability, and consistency.