ER and Normalization
• Data Redundancy: The same information is often duplicated in two or more files.
This duplication, also called redundancy, leads to higher storage and access costs.
It also leads to data inconsistency.
• Data Isolation: Related data is not all available in one file. Generally, the data
is scattered across various files, and the files may be in different formats, so
writing new application programs to retrieve the appropriate data is difficult.
• Lack of Flexibility: Traditional systems can retrieve information only for
predetermined requests for data. If management needs unanticipated data, the
information can perhaps be provided if it is in the files of the system. However,
extensive programming is required, which may delay making the information
available. By the time the information is available, it may no longer be required
or useful.
• A Database Management System (DBMS) is one of the oldest techniques; it consists of a
set of programs specially designed for creating and managing data stored in a database.
• Information: Processed data. Data itself consists of known facts that can be recorded
and that have implicit meaning.
• Database: A collection of related data with the following implicit properties: it is a
logically coherent collection of data with some inherent meaning, and it is designed,
built, and populated with data for a particular purpose.
• Database System: A database together with the DBMS software that manages it forms a
database system.
Services provided by a DBMS
• Characteristics
• Traditionally, data was organized in file formats. DBMS was a new
concept then, and research focused on overcoming the deficiencies of the
traditional style of data management.
• A modern DBMS has the following characteristics −
• Real-world entity − A modern DBMS is more realistic and uses real-world
entities, along with their behavior and attributes, to design its architecture.
For example, a school database may use students as an entity and their
age as an attribute.
• Relation-based tables − DBMS allows entities and relations among
them to form tables. A user can understand the architecture of a database
just by looking at the table names.
• Isolation of data and application − A database system is entirely
different from its data. A database is an active entity, whereas data is said
to be passive: it is what the database works on and organizes. DBMS also
stores metadata, which is data about data, to ease its own processes.
• Less redundancy − DBMS follows the rules of normalization, which splits a
relation when any of its attributes has redundant values. Normalization is a
mathematically grounded, systematic process that reduces data redundancy.
• Consistency − Consistency is a state in which every relation in a database
remains consistent. There exist methods and techniques that can detect attempts
to leave the database in an inconsistent state. A DBMS can provide greater
consistency than earlier forms of data-storing applications such as
file-processing systems.
• Query Language − DBMS is equipped with a query language, which makes it
more efficient to retrieve and manipulate data. A user can apply as many
different filtering options as required to retrieve a set of data. This was not
possible with traditional file-processing systems.
• ACID Properties − DBMS follows the concepts of Atomicity, Consistency,
Isolation, and Durability (normally shortened as ACID). These concepts are applied
to transactions, which manipulate data in a database. ACID properties help the
database stay healthy in multi-transactional environments and in case of failure.
• Multiuser and Concurrent Access − DBMS supports a multi-user environment
and allows users to access and manipulate data in parallel. Although there are
restrictions on transactions when users attempt to handle the same data item,
users are always unaware of them.
• Multiple views − DBMS offers multiple views for different users. A user who is
in the Sales department will have a different view of the database than a person
working in the Production department. This feature enables users to have a
focused view of the database according to their requirements.
• Security − Features like multiple views offer security to some extent, since
users are unable to access the data of other users and departments. DBMS offers
methods to impose constraints while entering data into the database and retrieving
it at a later stage. DBMS offers many different levels of security features,
which enable multiple users to have different views with different features. For
example, a user in the Sales department cannot see the data that belongs to the
Purchase department. Additionally, the DBMS can control how much of the Sales
department's data is displayed to the user. Because data is stored and accessed
through the DBMS rather than as plain files, as in traditional file systems, it is
much harder for miscreants to break into it.
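Atomicity can be observed with Python's built-in sqlite3 module. The sketch below is a hypothetical funds transfer (the account table, the transfer helper and the balances are made up for illustration): the debit violates a CHECK constraint, so the whole transaction is rolled back and the credit that already ran is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src inside one transaction; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint; the earlier credit was rolled back too.
        return False


ok = transfer(conn, "A", "B", 500)  # would overdraw account A
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, balances)  # False {'A': 100, 'B': 50}
```

Either both updates take effect or neither does; a transfer of 30 instead of 500 would commit and leave A at 70 and B at 80.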
Database Architecture Levels
In the above figure, the three levels of the DBMS architecture are depicted.
The External view is how the customer, Jack, views it.
The Conceptual view is how the DBA views it.
The Internal view is how the data is actually stored.
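The external level can be approximated in SQL with a view: a stored query that exposes part of the conceptual schema to one group of users while the stored table stays hidden behind it. This is a minimal sketch with Python's sqlite3, assuming a hypothetical orders table shared by the Sales and Production departments. SQLite has no per-user privileges, so actually restricting a Sales user to sales_view would need a server DBMS that supports GRANT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: one table holding every department's orders (hypothetical data).
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, dept TEXT, item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (dept, item, amount) VALUES (?, ?, ?)",
    [("Sales", "Laptop", 900), ("Production", "Steel", 400), ("Sales", "Mouse", 20)],
)

# External level: Sales staff query sales_view and see only their rows and columns.
conn.execute("CREATE VIEW sales_view AS SELECT item, amount FROM orders WHERE dept = 'Sales'")

print(conn.execute("SELECT * FROM sales_view ORDER BY amount").fetchall())
# [('Mouse', 20), ('Laptop', 900)]
```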
• Data Independence
• A database system normally contains a lot of data in addition to users' data. For
example, it stores data about data, known as metadata, to locate and retrieve data
easily. It is rather difficult to modify or update a set of metadata once it is stored
in the database. But as a DBMS expands, it needs to change over time to satisfy
the requirements of the users. If all of this data were interdependent, making such
changes would become a tedious and highly complex job.
• Disadvantages (hierarchical model):
  – Searching for an element or a record is difficult.
  – Insertion, deletion and updating are difficult.
  – Many-to-many relationships cannot be established.
  – Data with non-hierarchical relationships cannot be mapped.
  – The hierarchical relationship is maintained using pointers, which require extra
    storage.
  – Changes in relationships require changes in the entire structure of the database.
  – Processing is sequential among branches of the tree, so access time is high.
• Advantages (network model):
  – Many-to-many relationships among records can be implemented.
  – Useful for representing records that participate in many-to-many
    relationships.
  – Searching for a record is easy, since there are multiple access paths.
  – No consistency problems arise when records are added or deleted.
• Disadvantages (network model):
  – Implementation of records and relationships is complex.
  – Storage space requirement is high because of the many pointers.
• Advantages (relational model):
  1. The tabular structure is simple and easy to understand.
  2. Data manipulation is easy.
  3. Mathematical operations (relational algebra) can be applied to tables.
  4. Built-in query language support, such as SQL.
  5. Very flexible data organization.
• Disadvantages (relational model):
  1. The size of the database becomes large.
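The mathematical operations on tables are those of relational algebra. A minimal Python sketch of two of them, selection and projection, applied to rows taken from the normalization example later in this section; the helper names select and project are illustrative, not a library API.

```python
# A relation modelled as a list of dicts (a simplification: a true relation is a set).
student = [
    {"STUDNO": 1022, "STUDNAME": "MUTHU", "MAJORNAME": "Maths"},
    {"STUDNO": 1023, "STUDNAME": "UMA", "MAJORNAME": "Maths"},
    {"STUDNO": 4123, "STUDNAME": "RITA", "MAJORNAME": "Physics"},
]


def select(rel, pred):
    """Selection (sigma): keep the rows for which pred returns True."""
    return [row for row in rel if pred(row)]


def project(rel, attrs):
    """Projection (pi): keep only the listed attributes and drop duplicate rows."""
    result = []
    for row in rel:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


maths_students = project(select(student, lambda r: r["MAJORNAME"] == "Maths"), ["STUDNAME"])
print(maths_students)                   # [{'STUDNAME': 'MUTHU'}, {'STUDNAME': 'UMA'}]
print(project(student, ["MAJORNAME"]))  # [{'MAJORNAME': 'Maths'}, {'MAJORNAME': 'Physics'}]
```

Because each operation takes relations and returns a relation, they compose freely, which is what lets a query language such as SQL combine filters in one statement.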
A primary key which is a combination of more than one attribute is called a
composite primary key.
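A composite primary key can be declared with a table-level PRIMARY KEY clause. This sketch uses Python's sqlite3 and the STUDNO/CLASSNO pairs from the normalization example; the table name enrolment is an assumption. Neither column alone identifies a row, but the pair does, so inserting the same pair twice is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Neither studno nor classno alone identifies a row; the combination does.
conn.execute("""
    CREATE TABLE enrolment (
        studno  INTEGER,
        classno INTEGER,
        PRIMARY KEY (studno, classno)
    )
""")
conn.executemany("INSERT INTO enrolment VALUES (?, ?)", [(1022, 101), (1022, 102), (1023, 101)])

try:
    conn.execute("INSERT INTO enrolment VALUES (1022, 101)")  # same combination again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```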
Diagrammatic Representation of Entity and Attributes
(Figure: Employee entity showing a foreign key and a derived attribute)
The minimum and maximum values of this connectivity are called the cardinality of the
relationship.
A unary relationship is represented as a diamond that connects one entity to itself as a
loop.
• The relationship above means that some instances of Employee manage other instances of
Employee.
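In a relational schema, a unary relationship such as "Employee manages Employee" is commonly implemented as a foreign key that references the same table. A sketch with Python's sqlite3; the table, the column names and the employee names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# manager_id points back into the same table: a unary (recursive) relationship.
conn.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER REFERENCES employee(emp_id)
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ben", 1), (3, "Chen", 1), (4, "Dev", 2)],
)

# Self-join: pair each employee with the employee who manages them.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employee AS e JOIN employee AS m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()
print(pairs)  # [('Ben', 'Asha'), ('Chen', 'Asha'), ('Dev', 'Ben')]
```

The NULL manager_id marks the one employee who is managed by nobody, so that row drops out of the inner join.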
A binary relationship is a relationship between two entity types.
• Attribute inheritance – a lower-level entity set inherits all the attributes and
relationship participation of the higher-level entity set to which it is linked.
Specialization Example
Specialization
Specialization is the opposite of generalization. In specialization, a group of
entities is divided into sub-groups based on their characteristics. Take the group
‘Person’ for example. A person has a name, date of birth, gender, etc. These
properties are common to all persons. But in a company, persons can be
identified as employee, employer, customer, or vendor, based on the role they
play in the company.
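Attribute inheritance in specialization has a close analogue in class inheritance. This Python sketch is only an analogy, not how a DBMS stores sub-groups: the dataclasses Employee and Customer inherit the common attributes of Person and add their own. The extra attributes (salary, customer_no) and all values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    # Attributes common to every person (the higher-level entity set).
    name: str
    date_of_birth: date
    gender: str


@dataclass
class Employee(Person):
    # Sub-group-specific attribute; name, date_of_birth and gender are inherited.
    salary: int


@dataclass
class Customer(Person):
    customer_no: int


e = Employee("Ram", date(1990, 5, 1), "M", salary=50000)
print(e.name, e.salary, isinstance(e, Person))  # Ram 50000 True
```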
Normalization Required
• To ensure data consistency and stability
• To minimize data redundancy
• To ensure consistent updatability
• To ensure maintainability of the data
Why Normal Forms?
• To understand how complex our database is
• To know which state (normal form) the database is in
• This allows us to understand the present criticality in the database
• To ensure that all operations related to the database perform smoothly
First Normal Form
Before 1NF
UNNORMALISED FORM
STUDNO | STUDNAME | MAJORNO | MAJORNAME | TCHNO | TCHNAME | TCHROOM | CLASSNO | CLASSNO | CLASSNO
(The repeated CLASSNO columns form a repeating group.)
Bringing it to 1NF
Reduce entities to first normal form (1NF) by
removing repeating or multivalued attributes.
• After we put it into 1NF, it changes to
STUDENT (STUDNO, STUDNAME, MAJORNO, MAJORNAME, TCHNO, TCHNAME, TCHROOM, CLASSNO)
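Flattening the repeating CLASSNO group can be sketched in Python with rows modelled as dicts. MUTHU's classes (101, 102, 103) are taken from the Second Normal Form example; to_1nf is a hypothetical helper name.

```python
# Unnormalized row: CLASSNO holds a repeating group (a list of classes).
unf = [
    {"STUDNO": 1022, "STUDNAME": "MUTHU", "MAJORNO": 81, "MAJORNAME": "Maths",
     "TCHNO": 1, "TCHNAME": "Ram", "TCHROOM": 412, "CLASSNO": [101, 102, 103]},
]


def to_1nf(rows, repeating):
    """Flatten: emit one row per value of the repeating attribute."""
    flat = []
    for row in rows:
        for value in row[repeating]:
            new_row = dict(row)         # copy the single-valued attributes
            new_row[repeating] = value  # exactly one atomic value per cell
            flat.append(new_row)
    return flat


for r in to_1nf(unf, "CLASSNO"):
    print(r["STUDNO"], r["STUDNAME"], r["CLASSNO"])
# 1022 MUTHU 101
# 1022 MUTHU 102
# 1022 MUTHU 103
```

Every cell is now atomic, but STUDNAME, MAJORNAME and the teacher details are repeated on each row, which is the redundancy that 2NF and 3NF go on to remove.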
First Normal Form
STUDNO | STUDNAME | MAJORNO | MAJORNAME | TCHNO | TCHNAME | TCHROOM | CLASSNO
1022   | MUTHU    | 81      | Maths     | 1     | Ram     | 412     | 101
Second Normal Form
STUDNO | STUDNAME | MAJORNO | MAJORNAME | TCHNO | TCHNAME | TCHROOM
1022   | MUTHU    | 81      | Maths     | 1     | Ram     | 412
1023   | UMA      | 81      | Maths     | 3     | Venkat  | 500
4123   | RITA     | 84      | Physics   | 2     | Ganesh  | 216

STUDNO | CLASSNO
1022   | 101
1022   | 102
1022   | 103
1023   | 101
1023   | 102
1023   | 103

Please observe that MAJORNAME depends not only on the PK but actually on
MAJORNO (STUDNO → MAJORNO → MAJORNAME).
Third Normal Form
STUDNO | STUDNAME | MAJORNO | TCHNO
1022   | MUTHU    | 81      | 1
1023   | UMA      | 81      | 3
4123   | RITA     | 84      | 2

STUDNO | CLASSNO
1022   | 101
1022   | 102
1022   | 103
1023   | 101
1023   | 102
1023   | 103
4123   | 201
4123   | 202
4123   | 203

MAJORNO | MAJORNAME
81      | Maths
84      | Physics

TCHNO | TCHNAME | TCHROOM
1     | Ram     | 412
2     | Ganesh  | 216
3     | Venkat  | 500
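To check that the 3NF tables lose nothing, they can be loaded into SQLite and joined back together. This sketch uses Python's built-in sqlite3 with the data from the tables above; the lowercase table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE major   (majorno INTEGER PRIMARY KEY, majorname TEXT);
    CREATE TABLE teacher (tchno INTEGER PRIMARY KEY, tchname TEXT, tchroom INTEGER);
    CREATE TABLE student (studno  INTEGER PRIMARY KEY, studname TEXT,
                          majorno INTEGER REFERENCES major(majorno),
                          tchno   INTEGER REFERENCES teacher(tchno));
    CREATE TABLE student_class (studno  INTEGER REFERENCES student(studno),
                                classno INTEGER,
                                PRIMARY KEY (studno, classno));
    INSERT INTO major   VALUES (81, 'Maths'), (84, 'Physics');
    INSERT INTO teacher VALUES (1, 'Ram', 412), (2, 'Ganesh', 216), (3, 'Venkat', 500);
    INSERT INTO student VALUES (1022, 'MUTHU', 81, 1), (1023, 'UMA', 81, 3), (4123, 'RITA', 84, 2);
    INSERT INTO student_class VALUES (1022, 101), (1022, 102), (1022, 103),
                                     (1023, 101), (1023, 102), (1023, 103),
                                     (4123, 201), (4123, 202), (4123, 203);
""")

# Joining the 3NF tables reproduces the wider 2NF student table on demand.
rows = conn.execute("""
    SELECT s.studno, s.studname, m.majorno, m.majorname, t.tchno, t.tchname, t.tchroom
    FROM student s
    JOIN major   m ON m.majorno = s.majorno
    JOIN teacher t ON t.tchno   = s.tchno
    ORDER BY s.studno
""").fetchall()
for row in rows:
    print(row)
# (1022, 'MUTHU', 81, 'Maths', 1, 'Ram', 412)
# (1023, 'UMA', 81, 'Maths', 3, 'Venkat', 500)
# (4123, 'RITA', 84, 'Physics', 2, 'Ganesh', 216)
```

Each major name and each teacher's room is now stored in exactly one row, so renaming a major is a single-row update instead of one update per student.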
PROPERTIES OF A RELATION
Unnormalized Form (UNF)
• A table that contains one or more repeating
groups.
First Normal Form (1NF)
• A relation in which the intersection of each
row and column contains one and only one
value.
UNF to 1NF
• Nominate an attribute or group of attributes
to act as the key for the unnormalized table.
• Remove the repeating group either by
– entering appropriate data into the empty columns of rows containing the
repeating data (‘flattening’ the table), or by
– placing the repeating data, along with a copy of the original key
attribute(s), into a separate relation.
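The second option, moving the repeating group into its own relation along with a copy of the key, might look like this in Python. split_repeating_group is a hypothetical helper; the students and classes come from the normal-form examples.

```python
def split_repeating_group(rows, key, repeating):
    """Move the repeating group to its own relation, keyed by a copy of the original key."""
    base, detail = [], []
    for row in rows:
        # Base relation: every attribute except the repeating group.
        base.append({k: v for k, v in row.items() if k != repeating})
        # Detail relation: one (key, value) row per repeated value.
        for value in row[repeating]:
            detail.append({key: row[key], repeating: value})
    return base, detail


unf = [
    {"STUDNO": 1022, "STUDNAME": "MUTHU", "CLASSNO": [101, 102, 103]},
    {"STUDNO": 1023, "STUDNAME": "UMA", "CLASSNO": [101, 102, 103]},
]
student, student_class = split_repeating_group(unf, "STUDNO", "CLASSNO")
print(student)             # [{'STUDNO': 1022, 'STUDNAME': 'MUTHU'}, {'STUDNO': 1023, 'STUDNAME': 'UMA'}]
print(len(student_class))  # 6
print(student_class[0])    # {'STUDNO': 1022, 'CLASSNO': 101}
```

Unlike flattening, this keeps STUDNAME stored once per student; the new relation's key is the combination (STUDNO, CLASSNO).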
Second Normal Form (2NF)
• Based on the concept of full functional
dependency.
• A relation that is in 1NF and every non-
primary-key attribute is fully functionally
dependent on the primary key.
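Full functional dependency can be checked mechanically with attribute closure. The sketch below is a simplified test under stated assumptions: the FDs are inferred from the example tables (each student has one name, major and teacher; each major one name; each teacher one name and room), and the 1NF STUDENT relation is assumed to have the composite key {STUDNO, CLASSNO}. closure and partial_dependencies are illustrative helpers, not a library API.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs, given FDs as (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds, non_key):
    """Map each non-key attribute that depends on a proper subset of the key to that subset."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for a in non_key & closure(subset, fds):
                found.setdefault(a, set(subset))
    return found


def fs(*attrs):
    return frozenset(attrs)


# Assumed FDs for the 1NF STUDENT relation.
fds = [
    (fs("STUDNO"), fs("STUDNAME", "MAJORNO", "TCHNO")),
    (fs("MAJORNO"), fs("MAJORNAME")),
    (fs("TCHNO"), fs("TCHNAME", "TCHROOM")),
]
key = {"STUDNO", "CLASSNO"}
non_key = {"STUDNAME", "MAJORNO", "MAJORNAME", "TCHNO", "TCHNAME", "TCHROOM"}
partial = partial_dependencies(key, fds, non_key)
print(sorted(partial))
# ['MAJORNAME', 'MAJORNO', 'STUDNAME', 'TCHNAME', 'TCHNO', 'TCHROOM']
```

Every non-key attribute depends on STUDNO alone, which is why the 2NF example moves them into a table keyed by STUDNO and leaves (STUDNO, CLASSNO) in a table of its own.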
1NF to 2NF
• Identify the primary key for the 1NF relation.
Third Normal Form (3NF)
• A relation that is in 1NF and 2NF and in which
no non-primary-key attribute is transitively
dependent on the primary key.
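A transitive dependency shows up as an FD whose determinant is not a superkey. This is a hedged sketch: it examines only the FDs as listed (not every implied FD), and because the relation has a single candidate key, "non-key" is used for "non-prime". A full 3NF test would consider every candidate key. The FDs are the same assumptions inferred from the example tables; the helper names are hypothetical.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs, given FDs as (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(attributes, key, fds):
    """Listed FDs X -> A where X is not a superkey and A is a non-key attribute outside X.

    With primary key K, such an FD gives the chain K -> X -> A, so A is
    transitively dependent on K (a 3NF violation).
    """
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # lhs is a superkey, so this FD is allowed
        for a in sorted(rhs - lhs - key):
            violations.append((sorted(lhs), a))
    return violations


# The wider table from the Second Normal Form example, key STUDNO.
attributes = {"STUDNO", "STUDNAME", "MAJORNO", "MAJORNAME", "TCHNO", "TCHNAME", "TCHROOM"}
fds = [
    (frozenset({"STUDNO"}), frozenset({"STUDNAME", "MAJORNO", "TCHNO"})),
    (frozenset({"MAJORNO"}), frozenset({"MAJORNAME"})),
    (frozenset({"TCHNO"}), frozenset({"TCHNAME", "TCHROOM"})),
]
result = transitive_dependencies(attributes, {"STUDNO"}, fds)
print(result)
# [(['MAJORNO'], 'MAJORNAME'), (['TCHNO'], 'TCHNAME'), (['TCHNO'], 'TCHROOM')]
```

The two determinants found, MAJORNO and TCHNO, are exactly the keys of the MAJOR and TEACHER tables split off in the Third Normal Form example.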
2NF to 3NF
• Identify the primary key in the 2NF relation.
Boyce-Codd Normal Form (BCNF)
• Based on functional dependencies that take
into account all candidate keys in a relation.
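BCNF requires every determinant to be a superkey. A sketch using a classic textbook-style relation (STUDENT, COURSE, INSTRUCTOR), where each instructor is assumed to teach one course; the data model and helper names are hypothetical. The relation is in 3NF, because COURSE belongs to the candidate key {STUDENT, COURSE}, but not in BCNF, because INSTRUCTOR determines COURSE without being a candidate key.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs, given FDs as (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs X -> Y whose determinant X is not a superkey of the relation."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]


attributes = {"STUDENT", "COURSE", "INSTRUCTOR"}
fds = [
    (frozenset({"STUDENT", "COURSE"}), frozenset({"INSTRUCTOR"})),
    (frozenset({"INSTRUCTOR"}), frozenset({"COURSE"})),
]
violations = bcnf_violations(attributes, fds)
print(violations)  # [(['INSTRUCTOR'], ['COURSE'])]
```

For testing a single relation against a given set of FDs, checking the listed FDs is sufficient; the usual fix is to decompose into (INSTRUCTOR, COURSE) and (STUDENT, INSTRUCTOR).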
3NF to BCNF
• Identify all candidate keys in the relation.
UNF to 1NF (Alternative)
FDs for Customer_Rental Relation
Customer_Rental to 2NF Relations
Property_Owner to 3NF Relations
Example 1 - Normalization
Summary of 3NF Relations
3NF to BCNF Relations
Fourth Normal Form (4NF)
• Associated with a dependency called multi-
valued dependency (MVD).
MVD
• Represents a dependency between attributes (for
example, A, B, and C) in a relation, such that for
each value of A there is a set of values for B, and
a set of values for C. However, the sets of values
for B and C are independent of each other.
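Whether an MVD A ↠ B holds in a given relation instance can be tested directly from this definition: for each value of A, the (B, C) combinations must be exactly every pairing of the B-values with the C-values. A sketch with hypothetical course/teacher/text data (the helper mvd_holds and the text names are made up for illustration).

```python
from itertools import product


def mvd_holds(rows, a, b, c):
    """True if a ->> b (and hence a ->> c) holds, where a, b, c partition the columns.

    For each value of a, the (b, c) combinations must equal the cross
    product of the b-values and the c-values seen with that a-value.
    """
    groups = {}
    for row in rows:
        key = tuple(row[x] for x in a)
        bv = tuple(row[x] for x in b)
        cv = tuple(row[x] for x in c)
        groups.setdefault(key, set()).add((bv, cv))
    for pairs in groups.values():
        bs = {bv for bv, _ in pairs}
        cs = {cv for _, cv in pairs}
        if pairs != set(product(bs, cs)):
            return False
    return True


rows = [
    {"COURSE": "DB", "TEACHER": "Ram", "TEXT": "Text-A"},
    {"COURSE": "DB", "TEACHER": "Ram", "TEXT": "Text-B"},
    {"COURSE": "DB", "TEACHER": "Uma", "TEXT": "Text-A"},
    {"COURSE": "DB", "TEACHER": "Uma", "TEXT": "Text-B"},
]
print(mvd_holds(rows, ["COURSE"], ["TEACHER"], ["TEXT"]))      # True
print(mvd_holds(rows[:3], ["COURSE"], ["TEACHER"], ["TEXT"]))  # False: (Uma, Text-B) is missing
```

When the MVD holds and COURSE is not a key, 4NF splits the relation into (COURSE, TEACHER) and (COURSE, TEXT), removing the need to store every teacher-text combination.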
Example 4 - Normalization
BCNF to 4NF Relations
Fifth Normal Form (5NF)
• Lossless-join property: when a relation is decomposed into two relations, the
resulting relations can be rejoined to produce the original relation.
• Lossless-join Dependency
– A property of decomposition which ensures that no spurious rows are generated
when relations are reunited through a natural join operation.
• 5NF
– A relation that has no join dependency other than those implied by its
candidate keys.
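The lossless-join property can be tested on a particular relation instance by projecting, rejoining and comparing. A sketch using the STUDNO, STUDNAME and MAJORNO data from the example tables; the helper names are hypothetical. Passing on one instance does not prove a decomposition is lossless for every instance: with respect to functional dependencies, a binary decomposition is lossless when the common attributes form a superkey of at least one of the two relations.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each tuple is a tuple of (attribute, value) pairs."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(r, s):
    """Combine tuples of r and s that agree on every attribute they share."""
    out = set()
    for t1 in r:
        d1 = dict(t1)
        for t2 in s:
            d2 = dict(t2)
            if all(d1[k] == d2[k] for k in d1.keys() & d2.keys()):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


rows = [
    {"STUDNO": 1022, "STUDNAME": "MUTHU", "MAJORNO": 81},
    {"STUDNO": 1023, "STUDNAME": "UMA", "MAJORNO": 81},
    {"STUDNO": 4123, "STUDNAME": "RITA", "MAJORNO": 84},
]
print(is_lossless(rows, ["STUDNO", "STUDNAME"], ["STUDNO", "MAJORNO"]))   # True: STUDNO is a key
print(is_lossless(rows, ["STUDNO", "MAJORNO"], ["STUDNAME", "MAJORNO"]))  # False
joined = natural_join(project(rows, ["STUDNO", "MAJORNO"]), project(rows, ["STUDNAME", "MAJORNO"]))
print(len(joined) - len(rows))  # 2 spurious rows (1022 with UMA, 1023 with MUTHU)
```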
Example 4 - Normalization
4NF to 5NF Relations
The Process of Normalization up to 5NF