Data Normalization
Data Normalization
Database Normalization
Dr. Hasan TINMAZ
Woosong University
Endicott College of International Studies
Normalization
Good database design must be matched to good table structures.
Normalization is evaluating and correcting table structures to
minimize data redundancies.
Reduces data anomalies,
Assigns attributes to tables based on determination,
Normalization works through a series of stages called normal
forms.
The first three stages are described as first normal form (1NF),
second normal form (2NF), and third normal form (3NF).
2
Normalization
Structural point of view of normal forms
Higher normal forms are better than lower normal forms;
2NF is better than 1NF, and 3NF is better than 2NF.
Properly designed 3NF structures meet the requirement of fourth
normal form (4NF).
Normalization is concerned with relations.
In normalization terminology, any attribute that is at least part of a
key is known as a prime attribute instead of the more common
term key attribute, which was introduced earlier.
Conversely, a nonprime attribute, or a nonkey attribute, is not
part of any candidate key.
3
De-Normalization
Although normalization is a very important ingredient in database
design, you should not assume that the highest level of
normalization is always the most desirable.
Generally, the higher the normal form, the more relational join
operations you need to produce a specified output.
Also, more resources are required by the database system to
respond to end-user queries.
A successful design must also consider end-user demand for fast
performance.
4
De-Normalization
Therefore, you will occasionally need to denormalize some portions
of a database design to meet performance requirements.
Denormalization produces a lower normal form; that is, a 3NF
will be converted to a 2NF through denormalization.
However, the price you pay for increased performance through
denormalization is greater data redundancy.
5
Need for Normalization
Used while designing a new database structure
Analyzes the relationship among the attributes within each
entity
Determines if the structure can be improved
Improves the existing data structure and creates an appropriate
database design.
You start by defining the business rules and data constraints,
identifying the functional dependencies, entities, and attributes
using the techniques you learned.
Then, you apply normalization concepts to validate and further
refine the model.
6
Need for Normalization
7
Normalization Process
The objective of normalization is to ensure that each table
conforms to the concept of well-formed relations - in other
words, tables that have the following characteristics:
Each relation (table) represents a single subject. For example, a
COURSE table will contain only data that directly pertain to
courses. Similarly, a STUDENT table will contain only student
data.
Each row/column intersection contains only one value and not
a group of values.
8
Normalization Process
No data item will be unnecessarily stored in more than one table
(tables have minimum controlled redundancy).
The reason for this requirement is to ensure that the data is
updated in only one place.
All nonprime attributes in a relation (table) are dependent on the
primary key-the entire primary key and nothing but the primary
key.
The reason for this requirement is to ensure that the data is
uniquely identifiable by a primary key value.
Each relation (table) has no insertion, update, or deletion
anomalies, which ensures the integrity and consistency of the data.
9
Normalization Process
Ensures that all tables are in at least 3NF
Higher forms are not likely to be encountered in business environment
Works one relation at a time
Starts by:
Identifying the dependencies of a relation (table)
Progressively breaking the relation into new set of relations
10
Functional Dependence Concepts
11
Types of Functional Dependencies
12
Types of Functional Dependencies
13
Conversion to First Normal Form (1NF)
Step 1: Eliminate the repeating group: Group of
multiple entries of same type can exist for any single key
attribute occurrence
Existence proves the presence of data redundancies
Enable reducing data redundancies
Steps
Eliminate the repeating groups
Identify the primary key
Identify all dependencies
14
Conversion to First Normal Form (1NF)
This will create separate rows for each employee assigned to each project, converting the
15 multivalued attributes into single-valued attributes.
Conversion to First Normal Form (1NF)
Step 2: Identify the Primary Key: The layout in previous figure represents
more than a mere cosmetic change.
Even a casual observer will note that PROJ_NUM is not an adequate primary
key because the project number does not uniquely identify each row.
For example, the PROJ_NUM value 15 can identify any one of five rows
containing employees who work on the Evergreen project.
To maintain a proper primary key that will uniquely identify any attribute value,
the new key must be composed of a combination of PROJ_NUM and
EMP_NUM.
For example, using the data shown in previous figure, if you know that
PROJ_NUM = 15 and EMP_NUM = 103, the entries for the attributes
PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, and HOURS must
be Evergreen, June E. Arbough, Elect. Engineer, $84.50, and 23.8,
respectively.
16
Conversion to First Normal Form (1NF)
Step 3: Identify all dependencies: Depicts all dependencies
found within given table structure
Helps get an overview of all relationships among table’s
attributes
Makes it less likely that an important dependency will be
overlooked
The identification of the PK in Step 2 means that you have already
identified the following dependency:
17
First Normal Form (1NF) Dependency Diagram
The dependencies can also be depicted with the help of the diagram shown in figure.
Because such a diagram depicts all dependencies found within a given table structure, it is known as a dependency
diagram.
Dependency diagrams are very helpful in getting a bird’s-eye view of all the relationships among a table’s attributes, and
their use makes it less likely that you will overlook an important dependency.
18
First Normal Form
(1NF) Dependency
Diagram
19
Conversion to First Normal Form (1NF)
1NF describes tabular format in which:
All key attributes are defined
There are no repeating groups in the table
All attributes are dependent on the primary key
All relational tables satisfy 1NF requirements
Some tables contain partial dependencies
Subject to data redundancies and various anomalies
20
Conversion to Second Normal Form
(2NF)
Conversion to 2NF occurs only when the 1NF has a composite
primary key.
If the 1NF has a single-attribute primary key, then the table is
automatically in 2NF.
The 1NF-to-2NF conversion is simple.
Starting with the 1NF format displayed in previous figure, you
take the following steps:
Step 1: Make New Tables to Eliminate Partial Dependencies
Step 2: Reassign Corresponding Dependent Attributes
21
Conversion to Second Normal Form
(2NF)
Step 1: Make New Tables to Eliminate Partial
Dependencies:
For each component of the primary key that acts as a determinant
in a partial dependency, create a new table with a copy of that
component as the primary key.
While these components are placed in the new tables, it is
important that they also remain in the original table as well.
The determinants must remain in the original table because they
will be the foreign keys for the relationships needed to relate these
new tables to the original table.
22
Conversion to Second Normal Form
(2NF)
Step 1: Make New Tables to Eliminate Partial
Dependencies:
To construct the revised dependency diagram, write each key
component on a separate line and then write the original
(composite) key on the last line.
For example:
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
Each component will become the key in a new table. In other
words, the original table is now divided into three tables
(PROJECT, EMPLOYEE, and ASSIGNMENT).
23
Conversion to Second Normal Form
(2NF)
Step 2: Reassign Corresponding Dependent
Attributes:
The attributes that are dependent in a partial
dependency are removed from the original table and
placed in the new table with the dependency’s
determinant.
Any attributes that are not dependent in a partial
dependency will remain in the original table.
24
Conversion to Second Normal Form
(2NF)
Step 2: Reassign Corresponding Dependent Attributes:
In other words, the three tables that result from the conversion to
2NF are given appropriate names (PROJECT, EMPLOYEE, and
ASSIGNMENT) and are described by the following relational
schemas:
25
Second Normal Form (2NF) Conversion Results
The results of Steps 1 and 2 are displayed in figure. At this point, most of the anomalies discussed earlier
have been eliminated.
For example, if you now want to add, change, or delete a PROJECT record, you need to go only to the
26 PROJECT table and make the change to only one row.
Conversion to Third Normal Form (3NF)
Step 1: Make New Tables to Eliminate Transitive
Dependencies:
For every transitive dependency, write a copy of its determinant as
a primary key for a new table.
A determinant is any attribute whose value determines other
values within a row.
If you have three different transitive dependencies, you will have
three different determinants.
27
Conversion to Third Normal Form (3NF)
Step 1: Make New Tables to Eliminate Transitive
Dependencies:
As with the conversion to 2NF, it is important that the
determinant remain in the original table to serve as a foreign key.
Previous figure shows only one table that contains a transitive
dependency.
Therefore, write the determinant for this transitive dependency
as:
JOB_CLASS
28
Conversion to Third Normal Form (3NF)
Step 2: Reassign Corresponding Dependent Attributes:
Using previous figure, identify the attributes that are dependent
on each determinant identified in Step 1.
Place the dependent attributes in the new tables with their
determinants and remove them from their original tables.
In this example, eliminate CHG_HOUR from the EMPLOYEE
table shown in previous figure to leave the EMPLOYEE table
dependency definition as:
29
Third Normal Form (3NF) Conversion Results
Draw a new dependency diagram to show all the tables you have defined in Steps 1 and 2.
Name the table to reflect its contents and function. In this case, JOB seems appropriate.
Check all the tables to make sure that each table has a determinant and that no table contains
inappropriate dependencies.
30 When you have completed these steps, you will see the results in figure above.
Third Normal Form (3NF) Conversion
Results
31
As a summary,
It is interesting to note the
similarities between resolving
2NF and 3NF problems.
To convert a table from 1NF
to 2NF, it is necessary to
remove the partial
dependencies.
To convert a table from 2NF
to 3NF, it is necessary to
remove the transitive
dependencies.
32
As a summary,
No matter whether the “problem” dependency is a partial
dependency or a transitive dependency, the solution is the same:
create a new table for each problem dependency.
The determinant of the problem dependency remains in the
original table and is placed as the primary key of the new table.
The dependents of the problem dependency are removed from the
original table and placed as nonprime attributes in the new table.
33
Normalization and Database Design
Normalization should be part of the design process.
Proposed entities must meet required the normal form before
table structures are created.
Principles and normalization procedures to be understood to
redesign and modify databases
ERD is created through an iterative process
Normalization focuses on the characteristics of specific entities
34
Initial Contracting Company ERD
35
Modified Contracting Company ERD
36
Incorrect M:N Relationship Representation
37
Final Contracting Company ERD
38
The Implemented Database
39
General Summary on Data Modelling
40
General Summary on Data Modelling
41
General Summary on Data Modelling
42
Reference:
Coronel, C. & Morris, S.
(2019). Database Systems
Design Implementation and
Management (13th ed.)
Cengage Learning.
43