
ER and Normalization


ADBMS

History of Database / Evolution of Database


• In the traditional approach, information is stored in flat files that are maintained by the
file system of the OS.
• Application programs go through the file system to access these flat files.
Disadvantages of the traditional approach
• Data Security: The data as maintained in the flat file(s) is easily accessible and
therefore not secure.

• Data Redundancy: Often the same information is duplicated in two or more files.
This duplication of data, also called redundancy, leads to higher storage and
access costs. It also leads to data inconsistency.

• Data Isolation: Data Isolation means that all the related data is not available in one
file. Generally, the data is scattered in various files, and the files may be in different
formats, therefore writing new application programs to retrieve the appropriate
data is difficult.

• Program/Data Dependence: Under the traditional file approach, application
programs are dependent on the master and transaction file(s) and vice versa.
A change in the physical format of the master file(s), such as the addition of a data field,
must be made in every application program that accesses
the master file.

• Lack of Flexibility: Traditional systems can retrieve information only for
predetermined requests for data. If the management needs unanticipated data,
the information can perhaps be provided if it is in the files of the system, but
extensive programming is required, which may delay making the
information available. By the time the information is made available, it may
no longer be required or useful.

• Concurrent Access Anomalies: Many traditional systems allow multiple users to
access and update the same piece of data simultaneously, but the interaction of
concurrent updates may result in inconsistent data.
DBMS: Definitions

• Database Management System (DBMS): A set of programs specially designed for creating
and managing the data stored in a database.

• Data: A collection of raw facts.

• Information: Processed data; known facts that can be recorded and that have implicit
meaning.

• Database: A logically coherent collection of related data with some inherent meaning. A
database is designed, built, and populated with data for a particular purpose.

• Database System: The database and the DBMS software, integrated to work together, form a
database system.
Services provided by a DBMS

• Data management
• Data definition
• Transaction support
• Concurrency control
• Recovery
• Security and integrity
• Utilities: facilities such as data import and export, user management, backup,
performance analysis, logging and audit, and physical storage control
Characteristics

• Traditionally, data was organized in file formats. DBMS was a new
concept then, and all the research was done to make it overcome the
deficiencies in traditional style of data management.
• A modern DBMS has the following characteristics −
• Real-world entity − A modern DBMS is more realistic and uses real-
world entities to design its architecture. It uses the behavior and attributes
too. For example, a school database may use students as an entity and their
age as an attribute.
• Relation-based tables − DBMS allows entities and relations among
them to form tables. A user can understand the architecture of a database
just by looking at the table names.
• Isolation of data and application − A database system is entirely
different from its data. A database is an active entity, whereas data is said
to be passive, on which the database works and organizes. DBMS also
stores metadata, which is data about data, to ease its own process.
• Less redundancy − DBMS follows the rules of normalization, which split a
relation when any of its attributes holds redundant values. Normalization
is a mathematically rich and scientific process that reduces data redundancy.
• Consistency − Consistency is a state where every relation in a database
remains consistent. There exist methods and techniques that can detect an attempt
to leave the database in an inconsistent state. A DBMS can provide greater consistency
than earlier forms of data storage applications such as file-processing
systems.
• Query Language − DBMS is equipped with a query language, which makes it
more efficient to retrieve and manipulate data. A user can apply as many
different filtering options as required to retrieve a set of data. This was
not possible with traditional file-processing systems.
• ACID Properties − DBMS follows the concepts of Atomicity, Consistency,
Isolation, and Durability (normally shortened as ACID). These concepts are applied
to transactions, which manipulate data in a database. ACID properties help the
database stay healthy in multi-transactional environments and in case of failure.
• Multiuser and Concurrent Access − DBMS supports a multi-user environment
and allows users to access and manipulate data in parallel. There are restrictions on
transactions when users attempt to handle the same data item, but users are always
unaware of them.
• Multiple views − DBMS offers multiple views for different users. A user who is
in the Sales department will have a different view of the database than a person
working in the Production department. This feature enables the users to have a
focused view of the database according to their requirements.
• Security − Features like multiple views offer security to some extent, where
users are unable to access data of other users and departments. DBMS offers
methods to impose constraints while entering data into the database and retrieving
the same at a later stage. DBMS offers many different levels of security features,
which enable multiple users to have different views with different features. For
example, a user in the Sales department cannot see the data that belongs to the
Purchase department. Additionally, it can also be managed how much data of the
Sales department should be displayed to the user. Because a DBMS does not expose
its data as ordinary files the way a traditional file system does, it is harder for
intruders to read or tamper with the data directly.
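As an illustration of the atomicity part of ACID, the following is a minimal sketch using Python's built-in sqlite3 module. The `account` table, the account names, and the amounts are assumptions made up for this example; they do not come from the slides. The second transfer fails halfway through, and the rollback undoes the part that had already been applied.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            # This debit violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("sales", 100), ("purchase", 50)])
conn.commit()

ok = transfer(conn, "sales", "purchase", 30)       # both updates applied
failed = transfer(conn, "sales", "purchase", 500)  # credit is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer, `balances` still reflects only the first transfer: the credit to `purchase` was undone together with the rejected debit.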
Database Architecture Levels
In the above figure, the three levels of DBMS architecture are depicted.
The External view is how the customer, Jack, views it.
The Conceptual view is how the DBA views it.
The Internal view is how the data is actually stored.
• Data Independence
• A database system normally contains a lot of data in addition to users’ data. For
example, it stores data about data, known as metadata, to locate and retrieve data
easily. It is rather difficult to modify or update a set of metadata once it is stored
in the database. But as a DBMS expands, it needs to change over time to satisfy
the requirements of the users. If the entire data is dependent, it would become a
tedious and highly complex job.

• Metadata itself follows a layered architecture, so that when we change data at
one layer, it does not affect the data at another level. The data at each layer is
independent of, but mapped to, the data at the other layers.

• Logical Data Independence
• Logical data is data about the database, that is, it stores information about how data
is managed inside: for example, a table (relation) stored in the database and all the
constraints applied on that relation.
• Logical data independence is a mechanism that decouples the logical data from the
actual data stored on the disk. If we make some changes to the table format, it should
not change the data residing on the disk.
• Physical Data Independence
• All the schemas are logical, and the actual data is stored in bit format on the disk.
Physical data independence is the power to change the physical data without
impacting the schema or logical data.
• For example, in case we want to change or upgrade the storage system itself −
suppose we want to replace hard-disks with SSD − it should not have any impact
on the logical data or schemas.
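A small sketch of both kinds of independence, using Python's sqlite3 module. The `student` table, its `age` values, and the view and index names are assumptions for illustration only. Adding an index stands in for a physical storage change, and a view stands in for an application's fixed picture of the data while the underlying table gains a column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (studno INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1022, "MUTHU", 19), (1023, "UMA", 20), (4123, "RITA", 19)])

query = "SELECT name FROM student WHERE age = 19 ORDER BY name"
before = conn.execute(query).fetchall()

# Physical change: a new index alters how rows are located on disk,
# but the query text and its result stay the same.
conn.execute("CREATE INDEX idx_student_age ON student(age)")
after = conn.execute(query).fetchall()

# Logical change: programs that read through the view keep working
# when the base table gets an extra column.
conn.execute("CREATE VIEW student_names AS SELECT studno, name FROM student")
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
view_rows = conn.execute(
    "SELECT * FROM student_names ORDER BY studno").fetchall()
```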
• The DBA is a key person and takes care of most administrative tasks.
• Database designers design the database elements.
• Application programmers make use of the various database elements and write
programs to retrieve data from them.
• End users use the DBMS.
1. Users and application programs need not know exactly where or how the data is
stored in order to access it.
2. Proper database design can reduce or eliminate data redundancy and confusion.
3. Unforeseen (ad hoc) information requests are better supported, giving better
flexibility.
4. Data can be more effectively shared between users and/or application programs.
5. Data can be stored for long-term analysis (data warehousing).
Database Models/Record Based Logical Models
• Advantages (hierarchical model):
· Simple and easy to use.
· Data with hierarchical relationships can be mapped onto this model.
· Suitable for applications such as employee-department structures.

• Disadvantages (hierarchical model):
· Searching for a record (or an element) is difficult.
· Insertion, deletion, and update are difficult.
· Many-to-many relationships cannot be established.
· Data with non-hierarchical relationships cannot be mapped.
· The hierarchical relationship is maintained using pointers, which require extra
storage.
· Changes in a relationship require changes in the entire structure of the database.
· Processing is sequential among branches of the tree, so access time is high.

• Advantages (network model):
· Many-to-many relationships among records can be implemented.
· Useful for representing records that take part in many-to-many
relationships.
· Searching for a record is easy since there are multiple access paths.
· No consistency problems when records are added or deleted.

• Disadvantages (network model):
· Implementation of records and relationships is complex.
· Storage space requirement is high because of the many pointers.

• Advantages (relational model):
1. The tabular structure is simple and easy to understand.
2. Data manipulation is easy.
3. We can apply mathematical operations on tables.
4. Built-in query language support such as SQL.
5. Very flexible data organization.

• Disadvantages (relational model):
1. The size of the database becomes large.
A primary key which is a combination of more than one attribute is called a
composite primary key.
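A minimal sketch of a composite primary key, using Python's sqlite3 module. The `result` table and its rows are hypothetical. Neither `student_no` nor `course_no` is unique on its own; only the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result ("
    " student_no INTEGER,"
    " course_no INTEGER,"
    " marks INTEGER,"
    " PRIMARY KEY (student_no, course_no))")  # composite primary key

conn.execute("INSERT INTO result VALUES (1022, 101, 78)")
# Same student, different course: the pair is new, so this is accepted.
conn.execute("INSERT INTO result VALUES (1022, 102, 64)")

try:
    # Same (student_no, course_no) pair again: rejected by the key.
    conn.execute("INSERT INTO result VALUES (1022, 101, 90)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
```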
Diagrammatic Representation of Entity and Attributes
[Figure: E-R diagram of the entities Employee and Department, linked by the relationship "belongs", with an area to list attributes under each entity.
Employee: EmpNo {PK}; Address (composite attribute: street, city, postcode); MobileNo[1..3] (multi-valued attribute); NetSalary (derived attribute); DepNo {FK} (foreign key).
Department: DepNo {PK}; DepName; Location.]
The minimum and maximum values of this connectivity are called the cardinality of the
relationship.
A unary relationship is represented as a diamond that connects one entity to itself as a
loop.
• The relationship above means that some instances of Employee manage other instances of
Employee.
A relationship between two entity types

A relationship connecting three entity types


• Not all instances of the entity type Employee participate in the relationship Head-of.
• Not every employee heads a department. So, the Employee entity type is said to partially
participate in the relationship.
• But every department would be headed by some employee.
• So, all instances of the entity type Department participate in this relationship, and we say that it
is total participation from the Department side.
These attributes best describe the relationship prescription rather than any individual
entity Doctor, Patient or Medicine.
Entity Vs. Attribute
E-R modeling Entity Vs. Relationship
Binary Vs. ternary Relationships
The identifying relationship is the one that relates the weak entity (Dependent) to the
strong entity (Employee) on which it depends.
Id is underlined with a dotted line because it is used to form the composite key of the Dependent
entity along with E#.
Derived attributes are ignored.
Composite attributes are represented by their components.
Multi-valued attributes are represented by a separate table.
Here Dependent is a weak entity. A dependent means nothing to the problem
without the information about which employee the person is a dependent of.
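The mapping rules above can be sketched as table definitions with Python's sqlite3 module. All table names, column names, and sample rows here are assumptions chosen for illustration. The composite Address is stored as its components, the multi-valued mobile number gets its own table, the derived NetSalary is not stored, and the weak entity `dependent` takes the owner's key into its own composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE employee (
    emp_no   INTEGER PRIMARY KEY,
    street   TEXT,      -- composite attribute Address, stored as components
    city     TEXT,
    postcode TEXT
);
CREATE TABLE employee_mobile (          -- multi-valued attribute, own table
    emp_no    INTEGER REFERENCES employee(emp_no) ON DELETE CASCADE,
    mobile_no TEXT,
    PRIMARY KEY (emp_no, mobile_no)
);
CREATE TABLE dependent (                -- weak entity
    emp_no INTEGER REFERENCES employee(emp_no) ON DELETE CASCADE,
    id     INTEGER,                     -- partial key
    name   TEXT,
    PRIMARY KEY (emp_no, id)
);
""")

conn.execute("INSERT INTO employee VALUES (1, '12 Main St', 'Chennai', '600001')")
conn.executemany("INSERT INTO employee_mobile VALUES (?, ?)",
                 [(1, "555-0101"), (1, "555-0102")])
conn.execute("INSERT INTO dependent VALUES (1, 1, 'Asha')")

try:
    # A dependent cannot exist without its owning employee.
    conn.execute("INSERT INTO dependent VALUES (99, 1, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

mobiles = conn.execute(
    "SELECT COUNT(*) FROM employee_mobile WHERE emp_no = 1").fetchone()[0]

# Removing the employee removes the dependents that rely on it.
conn.execute("DELETE FROM employee WHERE emp_no = 1")
dependents_left = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
```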
Extended E-R Features: Specialization
• Top-down design process; we designate subgroupings within an entity set that are
distinctive from other entities in the set.

• These subgroupings become lower-level entity sets that have attributes or
participate in relationships that do not apply to the higher-level entity set.

• Depicted by a triangle component labeled ISA (e.g. customer “is a” person).

• Attribute inheritance – a lower-level entity set inherits all the attributes and
relationship participation of the higher-level entity set to which it is linked.
Specialization Example
Specialization
Specialization is the opposite of generalization. In specialization, a group of
entities is divided into sub-groups based on their characteristics. Take a group
‘Person’ for example. A person has name, date of birth, gender, etc. These
properties are common to all persons. But in a company, persons
can be identified as employee, employer, customer, or vendor, based on what
role they play in the company.

Similarly, in a school database, persons can be specialized as teacher, student, or
staff, based on what role they play in the school as entities.
Extended ER Features: Generalization
• A bottom-up design process – combine a number of entity sets that share
the same features into a higher-level entity set.

• Specialization and generalization are simple inversions of each other;
they are represented in an E-R diagram in the same way.

• The terms specialization and generalization are used interchangeably.

• Can have multiple specializations of an entity set based on different
features.
• E.g. permanent_employee vs. temporary_employee, in addition to officer
vs. secretary vs. teller
• Each particular employee would be
– a member of one of permanent_employee or temporary_employee,
– and also a member of one of officer, secretary, or teller
• The ISA relationship is also referred to as the superclass-subclass relationship
Design Constraints on a Specialization/Generalization

• Constraint on which entities can be members of a given lower-level
entity set.
– condition-defined
• Example: all customers over 65 years are members of the
senior-citizen entity set; senior-citizen ISA person.
– user-defined
• Constraint on whether or not entities may belong to more than one
lower-level entity set within a single generalization.
– Disjoint
• an entity can belong to only one lower-level entity set
• Noted in the E-R diagram by writing disjoint next to the ISA
triangle
– Overlapping
• an entity can belong to more than one lower-level entity set
Design Constraints on a Specialization/Generalization
(Cont.)

• Completeness constraint -- specifies whether or not an
entity in the higher-level entity set must belong to at least one
of the lower-level entity sets within a generalization.
– total: an entity must belong to one of the lower-level
entity sets
– partial: an entity need not belong to one of the lower-
level entity sets
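The disjointness and completeness constraints above can be checked mechanically. The following Python sketch is one possible way to do so; the function name `check_specialization` and the sample entities `e1` to `e3` are hypothetical, while the role names come from the officer/secretary/teller example.

```python
def check_specialization(higher, lower_sets, disjoint, total):
    """Return a list of constraint violations for one generalization.

    higher     -- set of entities in the higher-level entity set
    lower_sets -- dict mapping each lower-level set name to its members
    disjoint   -- True if an entity may belong to at most one lower-level set
    total      -- True if every entity must belong to at least one
    """
    problems = []
    for entity in sorted(higher):
        memberships = [name for name, members in lower_sets.items()
                       if entity in members]
        if disjoint and len(memberships) > 1:
            problems.append(f"{entity}: in {memberships}, violates disjoint")
        if total and not memberships:
            problems.append(f"{entity}: in no lower-level set, violates total")
    return problems

employees = {"e1", "e2", "e3"}
roles = {"officer": {"e1"}, "secretary": {"e2"}, "teller": {"e2"}}

# e2 holds two roles and e3 holds none.
violations = check_specialization(employees, roles, disjoint=True, total=True)

# Under an overlapping, partial specialization the same data is acceptable.
relaxed = check_specialization(employees, roles, disjoint=False, total=False)
```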
• Generalization
• As mentioned above, generalization is the process of combining entities into a
generalized entity that contains the properties they have in common. In
generalization, a number of entities are brought
together into one generalized entity based on their similar characteristics.
• For example, pigeon, house sparrow, crow and dove can all be generalized
as Birds.
Aggregation

• Consider the ternary relationship works_on, which we saw earlier.

• Suppose we want to record managers for tasks performed by an
employee at a branch.
Aggregation (Cont.)
• Relationship sets works_on and manages represent overlapping
information
– Every manages relationship corresponds to a works_on relationship
– However, some works_on relationships may not correspond to any
manages relationships
• So we can’t discard the works_on relationship
• Eliminate this redundancy via aggregation
– Treat relationship as an abstract entity
– Allows relationships between relationships
– Abstraction of relationship into new entity
• Without introducing redundancy, the following diagram represents:
– An employee works on a particular job at a particular branch
– An employee, branch, job combination may have an associated
manager
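One way to picture aggregation in tables is to let `manages` refer to a whole `works_on` row rather than repeat its meaning independently. The sketch below uses Python's sqlite3 module; the column types and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE works_on (
    employee TEXT, branch TEXT, job TEXT,
    PRIMARY KEY (employee, branch, job)
);
-- manages points at the works_on relationship treated as an entity.
CREATE TABLE manages (
    employee TEXT, branch TEXT, job TEXT,
    manager  TEXT NOT NULL,
    PRIMARY KEY (employee, branch, job),
    FOREIGN KEY (employee, branch, job)
        REFERENCES works_on (employee, branch, job)
);
""")

conn.execute("INSERT INTO works_on VALUES ('e1', 'b1', 'audit')")
conn.execute("INSERT INTO works_on VALUES ('e2', 'b1', 'teller')")  # no manager
conn.execute("INSERT INTO manages VALUES ('e1', 'b1', 'audit', 'm1')")

try:
    # A manages row must correspond to an existing works_on row.
    conn.execute("INSERT INTO manages VALUES ('e9', 'b1', 'audit', 'm1')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Some works_on rows legitimately have no manager.
unmanaged = conn.execute("""
    SELECT w.employee
    FROM works_on AS w
    LEFT JOIN manages AS m
      ON m.employee = w.employee AND m.branch = w.branch AND m.job = w.job
    WHERE m.manager IS NULL
""").fetchall()
```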
E-R Diagram With Aggregation
Designing the database without any anomalies
Designing the database such that the redundancy is reduced
Normalization
• The process of identifying the correct location of each
attribute and the correct structure of the relations.
• It is the process by which incorrectly constructed
relations are decomposed into multiple correctly
constructed relations.

Normalization Required
• To ensure data consistency and stability
• To minimize data redundancy
• To ensure consistent updatability
• To ensure maintainability of the data

Why Normal Forms?
• To understand how complex our database is
• To know which state the database is in
• To understand how critical the present problems in the database are
• To ensure that all the operations related to the database perform smoothly

• Issues with Redundancy
• It is the root cause of all the anomalies
• It causes INSERT, UPDATE, and DELETE anomalies
• It wastes a large amount of storage
Insert anomalies
Update anomalies
Delete anomalies
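The three anomalies can be reproduced on a small redundant table. The Python sketch below uses hypothetical rows (student 1050 and room 415 are made up) in which each teacher's room is repeated on every student row.

```python
# One wide table keyed by STUDNO; the teacher's room is stored redundantly.
enrolment = [
    {"STUDNO": 1022, "TCHNO": 1, "TCHROOM": 412},
    {"STUDNO": 1050, "TCHNO": 1, "TCHROOM": 412},
    {"STUDNO": 4123, "TCHNO": 2, "TCHROOM": 216},
]

def rooms_of(rows, tchno):
    """All rooms recorded for one teacher; more than one means inconsistency."""
    return {r["TCHROOM"] for r in rows if r["TCHNO"] == tchno}

# Update anomaly: changing the room on only one row leaves two answers.
enrolment[0]["TCHROOM"] = 415
rooms_after_update = rooms_of(enrolment, 1)

# Delete anomaly: removing the last student of teacher 2 loses the room too.
remaining = [r for r in enrolment if r["STUDNO"] != 4123]
rooms_after_delete = rooms_of(remaining, 2)

# Insert anomaly: a teacher without students has no key value to be stored under.
new_teacher = {"STUDNO": None, "TCHNO": 3, "TCHROOM": 500}
insertable = new_teacher["STUDNO"] is not None
```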
In the above example, Marks is fully functionally dependent on STUDENT#
COURSE# and not on a subset of STUDENT# COURSE#. This means Marks
cannot be determined by either STUDENT# or COURSE# alone. It can be
determined only by using STUDENT# and COURSE# together. Hence Marks
is fully functionally dependent on STUDENT# COURSE#.
Another Example
• CourseName is not fully functionally dependent on STUDENT# COURSE#
because a subset of STUDENT# COURSE#, namely COURSE# alone, determines the
CourseName, and STUDENT# plays no role in deciding CourseName.

• In the above relationship, CourseName is partially dependent on the composite
attributes STUDENT# COURSE# because COURSE# alone determines the
CourseName.
Another Example
In the above example, Grade depends on Marks and, in turn, Marks depends on
STUDENT# COURSE#. Hence Grade transitively depends on STUDENT# COURSE#.
Transitive: indirect.
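The full, partial, and transitive dependencies described above can be tested against sample rows. The Python sketch below uses made-up data and hypothetical helper names (`holds`, `fully_dependent`). Note that checking rows can only show that a dependency is consistent with this particular data, not that it holds for every possible state of the relation.

```python
def holds(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and stops holding when any lhs attribute is dropped."""
    if not holds(rows, lhs, rhs):
        return False
    return not any(holds(rows, [a for a in lhs if a != drop], rhs)
                   for drop in lhs)

rows = [
    {"student": 1, "course": "C1", "course_name": "DBMS", "marks": 80, "grade": "A"},
    {"student": 1, "course": "C2", "course_name": "OS",   "marks": 65, "grade": "B"},
    {"student": 2, "course": "C1", "course_name": "DBMS", "marks": 72, "grade": "B"},
    {"student": 2, "course": "C2", "course_name": "OS",   "marks": 65, "grade": "B"},
]
key = ["student", "course"]

marks_full = fully_dependent(rows, key, ["marks"])        # needs the whole key
name_full = fully_dependent(rows, key, ["course_name"])   # course alone suffices
name_partial = holds(rows, ["course"], ["course_name"])

# Transitive: key -> marks and marks -> grade, while marks does not determine the key.
grade_transitive = (holds(rows, key, ["marks"])
                    and holds(rows, ["marks"], ["grade"])
                    and not holds(rows, ["marks"], key))
```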
Another Example
KEY CONCEPTS
Formal Name    Common Name    Also Known As
Relation       Table          Entity
Tuple          Row            Record
Attribute      Column         Field

First Normal Form

• In the relational model, all tables must be in at least
1NF
• Repeating sets of attributes are not allowed.
• In other words, multivalued attributes are
not allowed

Before 1NF
UNNORMALISED FORM
STUDNO  STUDNAME  MAJORNO  MAJORNAME  TCHNO  TCHNAME  TCHROOM  CLASSNO  CLASSNO  CLASSNO
1022    MUTHU     81       Maths      1      Ram      412      101      102      103
1023    UMA       81       Maths      3      Venkat   500      101      102      103
4123    RITA      84       Physics    2      Ganesh   216      201      202      203

Please observe that CLASSNO attribute repeats itself


STUDENT
(
STUDNO, STUDNAME,MAJORNO, MAJORNAME, TCHNO,
TCHNAME, TCHROOM, {CLASSNO}
)

Bringing it to 1NF
• Reduce entities to first normal form (1NF) by
removing repeating or multi-valued attributes
• After we put it into 1NF, it changes to
STUDENT
(
STUDNO, STUDNAME,MAJORNO, MAJORNAME,
TCHNO, TCHNAME, TCHROOM, CLASSNO
)

First Normal Form
STUDNO  STUDNAME  MAJORNO  MAJORNAME  TCHNO  TCHNAME  TCHROOM  CLASSNO
1022    MUTHU     81       Maths      1      Ram      412      101
1022    MUTHU     81       Maths      1      Ram      412      102
1022    MUTHU     81       Maths      1      Ram      412      103
1023    UMA       81       Maths      3      Venkat   500      101
1023    UMA       81       Maths      3      Venkat   500      102
1023    UMA       81       Maths      3      Venkat   500      103
4123    RITA      84       Physics    2      Ganesh   216      201
4123    RITA      84       Physics    2      Ganesh   216      202
4123    RITA      84       Physics    2      Ganesh   216      203

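Please observe that CLASSNO is not functionally dependent on STUDNO,
which was the PK of the unnormalized table. The key of this 1NF relation is therefore
(STUDNO, CLASSNO), while every other attribute depends on STUDNO alone.
And so, we proceed to 2NF by forming a new relation with
STUDNO and CLASSNO.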
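The flattening step from the unnormalized table to 1NF can be sketched in a few lines of Python. The rows are the ones shown in the slides; the function name `to_1nf` and the use of a Python list for the repeating group are assumptions of this sketch.

```python
# Unnormalized rows: CLASSNO is a repeating group, modelled here as a list.
unnormalized = [
    {"STUDNO": 1022, "STUDNAME": "MUTHU", "MAJORNO": 81, "MAJORNAME": "Maths",
     "TCHNO": 1, "TCHNAME": "Ram", "TCHROOM": 412, "CLASSNO": [101, 102, 103]},
    {"STUDNO": 1023, "STUDNAME": "UMA", "MAJORNO": 81, "MAJORNAME": "Maths",
     "TCHNO": 3, "TCHNAME": "Venkat", "TCHROOM": 500, "CLASSNO": [101, 102, 103]},
    {"STUDNO": 4123, "STUDNAME": "RITA", "MAJORNO": 84, "MAJORNAME": "Physics",
     "TCHNO": 2, "TCHNAME": "Ganesh", "TCHROOM": 216, "CLASSNO": [201, 202, 203]},
]

def to_1nf(rows, repeating):
    """Emit one row per value of the repeating attribute ('flattening')."""
    flat = []
    for row in rows:
        for value in row[repeating]:
            flat.append({**row, repeating: value})
    return flat

first_nf = to_1nf(unnormalized, "CLASSNO")

# In the flattened table only the pair (STUDNO, CLASSNO) identifies a row.
keys = {(r["STUDNO"], r["CLASSNO"]) for r in first_nf}
```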
Second Normal Form

• To be in 2NF, a relation must already be in 1NF
• Its non-key attributes must be fully functionally
dependent on the primary key

Second Normal Form
STUDNO  STUDNAME  MAJORNO  MAJORNAME  TCHNO  TCHNAME  TCHROOM
1022    MUTHU     81       Maths      1      Ram      412
1023    UMA       81       Maths      3      Venkat   500
4123    RITA      84       Physics    2      Ganesh   216

STUDNO  CLASSNO
1022    101
1022    102
1022    103
1023    101
1023    102
1023    103
4123    201
4123    202
4123    203

Please observe that MAJORNAME does not depend directly on
the PK but actually depends on MAJORNO.
Similarly, TCHNAME and TCHROOM do not depend directly on
STUDNO but actually depend on TCHNO.
And so, we move towards 3NF as follows.

Third Normal Form

• To be in 3NF, a relation must be in 2NF
• Each non-key attribute must depend only on the
PK and on the entire PK (if it is a composite key).

Third Normal Form
STUDNO  STUDNAME  MAJORNO  TCHNO
1022    MUTHU     81       1
1023    UMA       81       3
4123    RITA      84       2

STUDNO  CLASSNO
1022    101
1022    102
1022    103
1023    101
1023    102
1023    103
4123    201
4123    202
4123    203

MAJORNO  MAJORNAME
81       Maths
84       Physics

TCHNO  TCHNAME  TCHROOM
1      Ram      412
2      Ganesh   216
3      Venkat   500
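As a check that nothing was lost in the decomposition, the four 3NF tables can be joined back into the nine 1NF rows. The sketch below uses Python's sqlite3 module with the data from the slides; the table names `student`, `student_class`, `major`, and `teacher` are labels assumed here, since the slides show the tables unnamed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE major   (majorno INTEGER PRIMARY KEY, majorname TEXT);
CREATE TABLE teacher (tchno INTEGER PRIMARY KEY, tchname TEXT, tchroom INTEGER);
CREATE TABLE student (
    studno   INTEGER PRIMARY KEY,
    studname TEXT,
    majorno  INTEGER REFERENCES major(majorno),
    tchno    INTEGER REFERENCES teacher(tchno)
);
CREATE TABLE student_class (
    studno  INTEGER REFERENCES student(studno),
    classno INTEGER,
    PRIMARY KEY (studno, classno)
);
""")
conn.executemany("INSERT INTO major VALUES (?, ?)",
                 [(81, "Maths"), (84, "Physics")])
conn.executemany("INSERT INTO teacher VALUES (?, ?, ?)",
                 [(1, "Ram", 412), (2, "Ganesh", 216), (3, "Venkat", 500)])
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                 [(1022, "MUTHU", 81, 1), (1023, "UMA", 81, 3),
                  (4123, "RITA", 84, 2)])
classes = {1022: (101, 102, 103), 1023: (101, 102, 103), 4123: (201, 202, 203)}
conn.executemany("INSERT INTO student_class VALUES (?, ?)",
                 [(s, c) for s, cs in classes.items() for c in cs])

# Joining on the keys rebuilds the original 1NF rows.
rejoined = conn.execute("""
    SELECT s.studno, s.studname, m.majorno, m.majorname,
           t.tchno, t.tchname, t.tchroom, sc.classno
    FROM student AS s
    JOIN major AS m          ON m.majorno = s.majorno
    JOIN teacher AS t        ON t.tchno = s.tchno
    JOIN student_class AS sc ON sc.studno = s.studno
    ORDER BY s.studno, sc.classno
""").fetchall()
```

Each major name and teacher room is now stored once, so changing a teacher's room is a single-row update.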

PROPERTIES OF A RELATION

1: Entries in columns are single-valued.


2: Entries in columns are of the same kind.
3: Each row is unique.
4: Sequence of columns is insignificant.
5: Sequence of rows is insignificant.
6: Each column has a unique name.

Unnormalized Form (UNF)
• A table that contains one or more repeating
groups.

• To create an unnormalized table


– Transform the data from the information source
(e.g. form) into table format with columns and
rows.

First Normal Form (1NF)
• A relation in which the intersection of each
row and column contains one and only one
value.

UNF to 1NF
• Nominate an attribute or group of attributes
to act as the key for the unnormalized table.

• Identify the repeating group(s) in the
unnormalized table which repeat for the key
attribute(s).

UNF to 1NF
• Remove the repeating group by
– entering appropriate data into the empty
columns of rows containing the repeating data
(‘flattening’ the table), or
– placing the repeating data, along with a copy of
the original key attribute(s), into a separate
relation.

Second Normal Form (2NF)
• Based on the concept of full functional
dependency.

• Full functional dependency indicates that if
– A and B are attributes of a relation,
– then B is fully dependent on A if B is functionally
dependent on A but not on any proper subset of
A.

Second Normal Form (2NF)
• A relation that is in 1NF and in which every non-
primary-key attribute is fully functionally
dependent on the primary key.

1NF to 2NF
• Identify the primary key for the 1NF relation.

• Identify the functional dependencies in the
relation.

• If partial dependencies exist on the primary
key, remove them by placing them in a new
relation along with a copy of their
determinant.
Third Normal Form (3NF)
• Based on the concept of transitive dependency.

• Transitive Dependency is a condition where
– A, B and C are attributes of a relation such that if A
→ B and B → C,
– then C is transitively dependent on A through B
(provided that A is not functionally dependent on B
or C).

Third Normal Form (3NF)
• A relation that is in 1NF and 2NF and in which
no non-primary-key attribute is transitively
dependent on the primary key.

2NF to 3NF
• Identify the primary key in the 2NF relation.

• Identify functional dependencies in the relation.

• If transitive dependencies exist on the primary
key, remove them by placing them in a new
relation along with a copy of their determinant.

Boyce-Codd Normal Form (BCNF)
• Based on functional dependencies that take
into account all candidate keys in a relation.

• For a relation with only one candidate key,
3NF and BCNF are equivalent.

• A relation is in BCNF if and only if every
determinant is a candidate key.

Boyce-Codd Normal Form (BCNF)

• Violation of BCNF may occur in a relation
that
– contains two (or more) composite keys
– which overlap and share at least one attribute
in common.

3NF to BCNF
• Identify all candidate keys in the relation.

• Identify all functional dependencies in the
relation.

• If functional dependencies exist in the relation
whose determinants are not candidate keys for
the relation, remove those functional dependencies
by placing them in a new relation along with a copy
of their determinant.
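The steps above can be sketched with an attribute-closure routine in Python. The relation (student, course, teacher) and its two dependencies are a common textbook illustration assumed here, not taken from the slides, and the function names are hypothetical. The check tests whether each determinant is a superkey, which for the minimal determinants used here is the same as being a candidate key.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs whose determinant does not determine the whole relation."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, never a violation
        if not set(relation) <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad

relation = ("student", "course", "teacher")
fds = [
    (("student", "course"), ("teacher",)),  # a student has one teacher per course
    (("teacher",), ("course",)),            # each teacher teaches one course
]

# Two overlapping composite candidate keys: {student, course}, {student, teacher}.
key_closure = closure(("student", "teacher"), fds)

# teacher -> course has a determinant that is not a key, so BCNF is violated.
violations = bcnf_violations(relation, fds)
```

Following the last step in the list, the violating dependency would be moved to its own relation (teacher, course), leaving (student, teacher) behind.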
UNF to 1NF
UNF to 1NF (Alternative)
FDs for Customer_Rental Relation
Customer_Rental to 2NF Relations
Property_Owner to 3NF Relations
Example 1 - Normalization
Summary of 3NF Relations
3NF to BCNF Relations
Fourth Normal Form (4NF)
• Associated with a dependency called multi-
valued dependency (MVD).

• MVDs in a relation are due to first normal
form (1NF), which disallows an attribute in
a row from having a set of values.

MVD
• Represents a dependency between attributes (for
example, A, B, and C) in a relation, such that for
each value of A there is a set of values for B, and
a set of values for C. However, the set of values
for B and C are independent of each other.

• An MVD between attributes A, B, and C in a
relation is written using the following notation:
A →→ B
A →→ C
Fourth Normal Form (4NF)
• A relation that is in Boyce-Codd Normal
Form and contains no MVDs.

• BCNF to 4NF involves the removal of the
MVD from the relation by placing the
attribute(s) in a new relation along with a
copy of the determinant(s).
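A minimal Python sketch of removing an MVD. The relation (branch, staff, owner) and all its values are hypothetical; the assumption is that staff and owners of a branch are independent of each other, so every staff member appears with every owner of the same branch.

```python
from itertools import product

# branch ->> staff and branch ->> owner: B003 needs 2 x 2 rows.
rows = {("B003", s, o) for s, o in product(["Ann", "David"], ["Carol", "Tina"])}
rows |= {("B005", "Mary", "Carol")}

def project(relation, positions):
    """Projection onto the given column positions, with duplicates removed."""
    return {tuple(r[i] for i in positions) for r in relation}

# 4NF decomposition: each multi-valued fact goes with a copy of the determinant.
branch_staff = project(rows, (0, 1))
branch_owner = project(rows, (0, 2))

# Joining the two projections on branch gives back the original relation.
rejoined = {(b, s, o)
            for (b, s) in branch_staff
            for (b2, o) in branch_owner
            if b == b2}
```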

Example 4 - Normalization
BCNF to 4NF Relations

Fifth Normal Form (5NF)
• The lossless-join property means that when we
decompose a relation into two relations, we
can rejoin the resulting relations to produce
the original relation.

• However, sometimes there is the requirement
to decompose a relation into more than two
relations. Although rare, these cases are
managed by join dependency and 5NF.
5NF and Lossless-join Dependency

• Lossless-join Dependency
– A property of decomposition, which ensures
that no spurious rows are generated when
relations are reunited through a natural join
operation.

• 5NF
– A relation that has no join dependency other than those implied by its candidate keys.
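The difference between a lossless and a lossy decomposition can be seen on three rows. The Python sketch below takes (STUDNO, MAJORNO, TCHNO) values from the earlier student example; the choice of the two decompositions is an illustration assumed here. Splitting on MAJORNO, which is not a key, produces spurious rows on rejoining, while splitting on the key STUDNO does not.

```python
# (STUDNO, MAJORNO, TCHNO) for the three students in the earlier example.
student = {(1022, 81, 1), (1023, 81, 3), (4123, 84, 2)}

# Lossy: (STUDNO, MAJORNO) and (MAJORNO, TCHNO), joined on MAJORNO.
stud_major = {(s, m) for s, m, t in student}
major_tch = {(m, t) for s, m, t in student}
lossy = {(s, m, t)
         for (s, m) in stud_major
         for (m2, t) in major_tch
         if m == m2}
spurious = lossy - student  # rows that were never in the original

# Lossless: (STUDNO, MAJORNO) and (STUDNO, TCHNO), joined on the key STUDNO.
stud_tch = {(s, t) for s, m, t in student}
lossless = {(s, m, t)
            for (s, m) in stud_major
            for (s2, t) in stud_tch
            if s == s2}
```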

Example 4 - Normalization
4NF to 5NF Relations

The Process of Normalization up to 5NF
