Data Modelling Dbms PDF
Design Concepts
Database Development Life Cycle
• Preliminary Planning
• Feasibility Study
• Requirements Definition
• Conceptual Design
• Implementation
• Evaluation & Maintenance
You are familiar with these steps from
Software Engineering
Data Models
• A model is a representation of real world objects
and their associations.
• A data model in contrast is a representation of
real world data objects and their associations.
• Data models actually define how data items are
connected to each other and how they are
processed inside the system.
• More precisely, a Data Model is a logical structure
of a database. It describes the design of the database
to reflect entities, attributes, relationships among
data, constraints, etc.
• The purpose of a data model is to represent the
data and to make the data understandable. If it
does this then it can be easily used to design a
database.
• There have been many data models proposed in
literature and we can categorize them according
to the level they are used to describe the
database structure.
• High Level or Conceptual Data Models provide
concepts that are close to the way many users
perceive data, whereas Low-Level or Physical
Data Models provide concepts that describe the
details of how data is stored in the memory of the
computer.
• Concepts provided by low-level data models are
generally meant for computer specialists, not for
typical end users.
• Between these two extremes is a class of
Representational (or implementation or
traditional) Data Models, which provide
concepts that may be understood by end users
but are not too far from the way data is
organized within the memory of the computer.
• Representational data models hide some (not all)
details of data storage but can be implemented
on a computer system in a direct way.
• Conceptual Data Models use concepts such as entities,
attributes and relationships. (We shall discuss ER
Model in detail).
• Representational Data Models are the models used
most frequently in traditional commercial DBMSs, and
they include the most widely used relational data
model as well as the so-called legacy data models – the
Network and Hierarchical Data Models.
• Representational Data Models represent data by using
record structures (a characteristic close to physical
data models) and hence are referred to as Record
Based Data Models.
• We can regard Object Data Models as a new family of
high level implementation data models that are closer
to the conceptual data models.
• In practice, Conceptual Data Models are used
to derive the highest level abstraction of the
database, and the model thus obtained is
converted to a representational data model
that can be directly implemented on a
computer (i.e. that can be utilized to create
conceptual and external schemas).
• In the next few portions of this lecture we
shall discuss Conceptual and Representational
Data Models at length.
Data Associations
• Data Associations are the relationships between various data
items (Entities and Attributes). Data Modeling is in fact
used to represent the entities of interest and their
relationships in the database.
• Entities specify distinct real world items in an application.
Basically they are nouns or in other words anything of the
interest of the organization.
• Attributes are nothing but the properties of Entities.
• When a large amount of data is stored in a database, we
have to formalize the storage mechanism that will be used
to obtain the correct information from the database. We
have to establish a means of showing the relationship
among various sets of data represented in the database.
• A Relationship between two sets X and Y is a
correspondence or mapping between members of the sets.
A possible relationship that may exist between any two sets
(X & Y) may be one-to-one, one-to-many or many-to-many.
[Figure: mappings between sets X and Y: One-to-One, One-to-Many, Many-to-Many]
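The three kinds of mapping can be told apart mechanically. The following is a minimal sketch, assuming a relationship is given as a list of (x, y) pairs; the function name classify_mapping and the sample pairs are hypothetical and not part of the lecture.

```python
from collections import defaultdict


def classify_mapping(pairs):
    """Classify a relationship, given as (x, y) pairs, by its cardinality."""
    ys_per_x = defaultdict(set)  # members of Y associated with each x
    xs_per_y = defaultdict(set)  # members of X associated with each y
    for x, y in pairs:
        ys_per_x[x].add(y)
        xs_per_y[y].add(x)
    x_fans_out = any(len(ys) > 1 for ys in ys_per_x.values())
    y_fans_out = any(len(xs) > 1 for xs in xs_per_y.values())
    if x_fans_out and y_fans_out:
        return "many-to-many"
    if x_fans_out:
        return "one-to-many"
    if y_fans_out:
        return "many-to-one"
    return "one-to-one"


# Hypothetical sample: x1 maps to two members of Y, and y1 is reached from two members of X.
print(classify_mapping([("x1", "y1"), ("x1", "y2"), ("x2", "y1")]))
```

Note that a sample of pairs can only show that a mapping is at least one-to-many or many-to-many; whether a relationship is one-to-one by design is a property of the application, not of one data set.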
An Example...
• Consider the Entity “Employee”.
• Attributes of this Entity may be – EMP_ID, SSN,
Name, Salary, Address, Status etc.
• Either EMP_ID or SSN can be
chosen as the Primary Key (an attribute that is
capable of uniquely identifying a given
employee).
• Some possible associations (relationships)
among the attributes of Employee have been
shown in the figure on the next page.
[Figure: attribute associations of Employee]
EMP_ID to SSN: One-to-One (1 : 1)
EMP_ID to NAME: Many-to-One (M : 1)
NAME to SSN: One-to-Many (1 : M)
SALARY to STATUS: Many-to-Many (M : N)
Relationships Among Entities
• As associations (relationships) do exist
between the attributes of a given entity,
similarly two entities may also be related.
• We distinguish between the association
that exists among the attributes of an
entity, called attribute association, and
that which exists between entities, called
a relationship.
An Example: Employees in an organization work on several
projects. Identify Entities and their relationships.
• Possible Entities are:
– MANAGER
– EMPLOYEES
– DEPARTMENT
– PROJECTS
• Relationships Among Entities:
– Relationship between DEPARTMENT and MANAGER is 1 : 1.
– Relationship between MANAGER and EMPLOYEES is 1 : M.
– Relationship between EMPLOYEES and PROJECTS is M : N.
– Relationship between DEPARTMENT and EMPLOYEES is 1 : M.
– Relationship between PROJECTS and MANAGER is M : N.
– Relationship between PROJECTS and DEPARTMENT is M : 1.
[Figure: relationships among the entities]
DEPARTMENT to MANAGER: 1 : 1
MANAGER to EMPLOYEES: 1 : M
EMPLOYEES to PROJECTS: M : N
PROJECTS to MANAGER: M : N
[Figure: ER diagram]
MANAGER (1) Manages (M) EMPLOYEES
EMPLOYEES (M) Work On (N) PROJECTS
DEPARTMENT (1) Consists (M) EMPLOYEES
PROJECTS (M) Managed By (N) MANAGER
PROJECTS (M) Assigned to (1) DEPARTMENT
ER Diagram for Employees – Projects Problem
Problem Statement:
• Several Employees in various
Departments managed by different
individual Managers work on several
projects. Draw an ER diagram for
modeling the data of the organization.
Students – Teachers – Courses Problem
Problem Statement:
• In a University department students are
taught several courses by different
teachers. Identify entities and draw an
ER diagram to model the data of the
department.
Assignment Problems
1. A machine shop produces many parts which it takes on contract.
It has many machinists who can operate any of the machines. A
part needs working on only one machine. A record is kept on
the quality of material needed for producing each part. The
production of each part is tracked by giving a job number, start
time , end time and machinist identification. Obtain an ER
diagram for this problem.
2. A magazine is published monthly and is sent by post to its
subscribers. Two months before the expiry of a subscription, a
reminder is sent to the subscriber. If the subscription is not renewed within a
month, another reminder is sent. If the renewal subscription is not received
up to two weeks before the expiry of the subscription, the
subscriber’s name is removed from the mailing list and the
subscriber is informed. Obtain an ER diagram to model the
situation.
3. A library receives 1300 journals of varying periodicities. The receipt of
journals has to be recorded and displayed. Action has to be taken
when journals are not received in time or are lost in the mail. Unless a
request for replacement is sent quickly, it may not be possible to
get a replacement. Journals have to be ordered at different times
during the year and subscriptions renewed in time. Late payment of
subscriptions may lead to non-availability of earlier issues or paying
high amounts for those issues. Draw an ER diagram for the problem.
4. An advertisement is issued giving essential qualifications for the
course, the last date for receipt of the application form and a fee to be
enclosed with the application. A clerk in the registrar’s office
checks the received applications to see if the marks sheet and fee are
enclosed and sends valid applications to the concerned academic
departments. The department checks each application in detail and
decides which applicants are to be admitted, which are to be put on the waiting
list and which are rejected. Appropriate letters are sent to the
registrar’s office, which informs the applicants. Draw an ER diagram
for the problem.
5. Draw an ER diagram for a banking system.
Representational (Traditional) Data Models
Hierarchical Data Model
• The Hierarchical Data Model (HDM) uses the tree concept to represent
data and the relationships among data.
• The nodes of the tree are the record types (segments) representing the
entities and are connected by links.
• Each hierarchical tree can have only one root record type and this
record type doesn’t have a parent record type.
• The root can have any number of child record types, each of which can
itself be a root of another hierarchical tree.
• Each child record type can have only one parent record type, thus a
many to many relationship can’t be directly expressed between two
record types.
• Data in parent record applies to all its child records. A child record
occurrence must have a parent record occurrence.
• Deleting a parent record occurrence requires deleting all its child record
occurrences.
• A hierarchical tree can have any number of record occurrences for each
record type at each level of hierarchy.
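The parent and child rules above can be sketched in code. This is only an illustration, assuming each record occurrence is held in memory as an object; the class name Segment, its methods and the sample keys D1, E1 and E2 are hypothetical and do not come from any hierarchical DBMS.

```python
class Segment:
    """A record occurrence in a hierarchical tree (hypothetical helper)."""

    def __init__(self, record_type, data):
        self.record_type = record_type
        self.data = data
        self.parent = None
        self.children = []

    def add_child(self, child):
        # A child record occurrence can have only one parent.
        if child.parent is not None:
            raise ValueError("a child record can have only one parent")
        child.parent = self
        self.children.append(child)
        return child

    def delete(self):
        """Delete this occurrence and all its descendants; return how many were removed."""
        removed = 1
        for child in list(self.children):
            removed += child.delete()
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None
        return removed


# Hypothetical occurrences: one department with two employees.
dept = Segment("DEPARTMENT", {"DID": "D1"})
e1 = dept.add_child(Segment("EMPLOYEE", {"EID": "E1"}))
e2 = dept.add_child(Segment("EMPLOYEE", {"EID": "E2"}))
```

Trying to attach e1 to a second department raises an error, which is why a many to many relationship cannot be expressed directly, and deleting dept removes both employee occurrences with it.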
Transforming an ER Model to Hierarchical
Data Structures (Hierarchical Model)
1. Transforming One to One Relationships:
• We follow the following Rule:
– For each entity E in the ER model, create a record type (segment) S
in the hierarchical model. All attributes of E are represented as
fields of S. Any of the segments may be chosen as parent and the
other segment becomes the child.
2. Transforming One to Many Relationships:
• We follow the following two Rules:
– For each entity E in the ER model, create a record type (segment) S
in the hierarchical model. All attributes of E are represented as
fields of S.
– For one to many relationship between two entities, create
corresponding tree structure diagrams, making each entity as
record type and making one to many relationship as a parent child
relationship. The record type (segment) on the many side of the
relationship becomes the child record type and the record type on
the one side of the relationship becomes the parent.
3. Transforming Many to Many Relationships:
• We follow the following two Rules:
– For entities E1 and E2 that have a many to many relationship and
from which segments (record types) S1 and S2 have been derived,
construct two different trees: S1 to S2 and S2 to S1. In one tree S1
would be parent and in the other tree it would be child. Similarly in
one tree S2 would be child and in the other tree it would be parent.
– If a many to many relationship has common attribute data, create a
new intersection segment I which contains that data. Each of the
segment types created from the entities will function as a root of a
distinct tree. Insert the new segment between the two entity types
and establish the corresponding one to many relationships
between parent child segments. If any of those parent child
relationships are exactly one to one, the common attribute data
might be combined into segments created from entities.
Example (1 : 1 Relationships)
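[Figure: ER Model]
DEPARTMENT (DID, DNAME, LOCATION) 1 : Has : 1 MANAGER (MID, MNAME, SEX)
[Figure: hierarchical structure]
Segment 1, Parent (Department): DID DNAME LOCATION
Segment 2, Child (Manager): MID MNAME SEX
[Figure: Record Occurrences; surviving child occurrence: M102 Sandy F]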
Example (1 : M Relationships)
[Figure: ER Model]
DEPARTMENT (DID, DNAME, LOCATION) 1 : Has : M EMPLOYEES (EID, ENAME, GRADE, SALARY, SEX)
The diagram also shows a second 1 : M Has relationship involving an entity with attributes ID, DATE and AMOUNT (entity name not recoverable).
[Figure: hierarchical structure]
Parent (Department) [One Side]: DID DNAME LOCATION
Child (Employee) [Many Side]: EID ENAME GRADE SALARY SEX
[Figure: Record Occurrences]
Example (M : N Relationships)
[Figure: ER Model]
PRODUCTS (PID, PNAME, DESC) M : MadeBy : N Manufacturers (MID, MNAME, LOCATION)
[Figure: Hierarchical Data Model; surviving labels: Parent (Products) with fields PID PNAME DESC, and Tree 2]
[Figure: Record Occurrences; surviving values]
101 Bajaj Mumbai
1 Scooter 25000
2 BikeE 40000
3 Three Wheeler 60000
1 Scooter 25000
[Figure: ER Model for the supplier-parts problem]
Suppliers (S#, SNAME, STATUS, CITY) Supply Parts (P#, PNAME, COLOR, WEIGHT, CITY), with attribute QTY
Tree 1: Parent S# SNAME STATUS CITY; QTY
Tree 2: Parent P# PNAME COLOR WEIGHT CITY; QTY
• If you carefully examine the relations used by
C.J. Date for the supplier-parts problem
throughout his book, they are:
S <S#, SNAME, CITY, STATUS>
P <P#, PNAME, COLOR, WEIGHT, CITY>
SP <S#, P#, QTY>
• This indicates that the QTY attribute must be the
common attribute data of the many to many relationship.
Evaluation of Hierarchical Model
• There are three features that define the hierarchical data structures –
trees, segments and fields.
• While any ER model can be transformed to a hierarchical data structure,
the requirement that all database records be trees may result in
segment duplication.
• Any situation whose natural mapping results in a segment being a child
segment of two distinct parent segments requires that those parent
segments occur in separate trees.
• As an example consider the ER Model and its transformation to
Hierarchical Data Structures, on the next page.
• Such duplication has the following negative points:
– Storage space is used inefficiently as the segment is repeated.
– Possibility of inconsistent data is there, if the data are changed in one
segment copy but not in the other.
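[Figure: ER model with entities E1, E2 and E3 (E1 to E2 is 1 : M; E2 to E3 is N : M) and its transformation to hierarchical data structures]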
Virtual Segments
Examples of Commercial Database Management Systems
Based on Hierarchical Approach
• The hierarchical approach is rarely chosen for new database systems
today, although IBM’s IMS is still in use.
• A few legacy DBMSs based on this approach are:
– IMS: IBM’s Information Management System. It was
once the leading DBMS based on Hierarchical approach.
– TDMS: System Development Corporation’s Time Shared
Data Management System.
– Mark IV: Control Data Corporation’s Multi Access
Retrieval System.
– System 2000: SAS Institute’s Hierarchical DBMS.
Network Data Model
• There are two fundamental data structures in Network data
Model:
– Record Types (same as Segments in Hierarchical Data Model)
– DBTG Sets or simply Sets
• Record Types are defined in a usual way as collections of logically
related data items (fields). Record types are same as segments in
hierarchical model.
• DBTG Set or simply Set in the DBTG model expresses a one to
one or one to many relationship between two record types
(segments) or entities. The record type on the one side of the
one to many relationship of a DBTG set is called the Owner
record type and the record type on the many side of the one to
many relationship of a DBTG set is called Member record type. In
a one to one relationship any record type can be chosen as
Owner and the other becomes the member.
Transforming an ER Model to Network Data
Structures (Network Model)
• The transformation from conceptual (ER) modeling to network data modeling is expressed by
means of Bachman Diagrams.
• The following conventions are used for constructing Bachman
Diagrams:
– The sets are denoted by arrows between the record types,
with the arrow pointing to the member record type.
– Each set consists of an owner record type, a member record type
and a name for the set. The set name is the label given to the
arrow.
• Simple & Complex Networks: A conceptual data structure (ER
Model) in which all relationships are one to one or one to many is
called a Simple Network and a conceptual data structure in which
one or more relationships are many to many is called Complex
Network. Note that the DBTG network model allows only simple
networks in which all relationships are one to many or one to one. A
complex network can’t be directly implemented in DBTG model.
Rules for Transformation
1. Transforming One to One Relationships:
• We follow the following Rule:
– For each entity E in the ER model, create a record type R in
the network model. All attributes of E are represented as
fields of R. Any of the record types may be chosen as owner
and the other record type becomes the member.
2. Transforming One to Many Relationships:
• We follow the following Rules:
– For each entity E in the ER model, create a record type R in
the network model. All attributes of E are represented as
fields of R.
– For one to many relationship the record type on one side of
the relationship becomes the owner and the record type on
the many side of the relationship becomes the member.
Example (1 : 1 Relationships)
[Figure: ER Model]
COUNTRY (CODE, NAME, CURRENCY) 1 : Has : 1 PM (SSN, PMNAME, SEX)
[Figure: network structure, two possible choices]
Owner (Country): CODE NAME CURRENCY, set CON-PM
OR
Owner (PM): SSN PMNAME SEX, set PM-CON
[Figure: Record Occurrences (as per first choice in above diagram)]
Example (1 : M Relationships)
(For simplicity we consider one attribute for each entity and only 1:M relationships.)
[Figure: ER Model]
CUSTOMER (CNAME) 1 : Gives : M ORDERS (O No.)
SALES PERSON (NAME) Executes ORDERS
(For simplicity only the entities are shown. You are advised to use the complete
record types as we have done in the previous example.)
[Figure: network structure with sets CUS-PO and SAP-PO, both with Orders as member]
[Figure: Record Occurrences]
3. Transforming Many to Many Relationships:
• When two entities are connected in many to many relationship we
create an intersection or link record type consisting of at least the
key attributes from both the entities. Other attributes may be
added at the discretion of the designer.
• What is a link record?
– A dummy record type that is created in order to convert a
complex network into an equivalent simple network is called a
link record or link record type.
– With the creation of a link record type, all many to many
relationships are converted into equivalent one to many
relationships, which are required by the DBTG Network Model.
• So, to derive the Network Data Structure, we follow the following
rule:
– For each many to many relationship between entities E1 and E2,
create a link record type L, and make it the member record type
in the two set types, of which the set types owners are the
record types corresponding to E1 and E2.
Example (M : N Relationships)
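[Figure: ER Model]
STUDENTS (ENO, SNAME) M : Attend : N COURSES (CID, CNAME)
[Figure: network structure]
STUDENTS (owner): SID SNAME, set STU-LREC
COURSES (owner): CID CNAME, set COU-LREC
Link record occurrences (members of both sets):
S1 C2, S1 C3, S1 C4, S2 C1, S2 C3, S2 C4, S3 C1, S3 C4
[Figure: Record Occurrences]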
(Notice that on both sides now relationship is 1:M)
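The link record rule can be sketched as follows, using the student and course pairs from the record occurrences of this example. The dictionary names stu_lrec and cou_lrec mirror the set names STU-LREC and COU-LREC; the helper functions are hypothetical and only illustrate how each link record becomes a member of one occurrence of each set.

```python
from collections import defaultdict

# (student, course) pairs taken from the link record occurrences of the example.
attends = [
    ("S1", "C2"), ("S1", "C3"), ("S1", "C4"),
    ("S2", "C1"), ("S2", "C3"), ("S2", "C4"),
    ("S3", "C1"), ("S3", "C4"),
]

# Each pair becomes one link record. It is a member of exactly one
# occurrence of each set, so both sets are one to many.
stu_lrec = defaultdict(list)  # owner: student -> member link records
cou_lrec = defaultdict(list)  # owner: course  -> member link records
for link in attends:
    student, course = link
    stu_lrec[student].append(link)
    cou_lrec[course].append(link)


def courses_of(student):
    """Follow STU-LREC from a student to the courses attended."""
    return [course for _, course in stu_lrec[student]]


def students_of(course):
    """Follow COU-LREC from a course to the students attending it."""
    return [student for student, _ in cou_lrec[course]]


print(courses_of("S1"), students_of("C4"))
```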
Hierarchical Vs. Network Data Model
• Consider the Network Data Structure shown on the
next page for the Customer-Sales Person-Purchase
problem.
• This example indicates the clear difference
between hierarchical and network models.
• In this figure, the PURCHASE ORDER record type is
a member (Child) of two sets (Parents) – CUS-PO
and SAP-PO.
• In the hierarchical data model no record type can
be a member (child) of two different sets (parents).
[Figure: CUSTOMER (Owner) and SALES PERSON (Owner) own the sets CUS-PO and SAP-PO respectively; PURCHASE ORDER is the member of both sets]
• The network approach is rarely chosen for new database systems today.
• A few legacy DBMSs based on this approach are:
– IDMS/R: The most widely used commercial implementation
of the DBTG network data model. It stands for Integrated
Database Management System/Relational.
– DMS 1100: from UNIVAC.
– TOTAL: from Cincom.
– DBOMP: (Data Base Organization and Maintenance Processor) from
IBM.
Relational Data Model
• In 1970 the way many people viewed databases was permanently changed when E.F.
Codd introduced the relational model.
• In this model relation is the only construct required to represent the associations
among the attributes of an entity as well as the relationships among different
entities.
• One of the major reasons for introducing this model was to increase the
productivity of application programmers.
• Users need not know the exact physical structure to use the database. They are,
however, required to know how the data has been partitioned into various relations.
• In relational data model, relation is the only data structure used to represent
entities and relationships among them.
• In addition Codd proposed two data languages which promised more power in
accessing and processing the data. These Languages are:
– Relational Algebra
– Relational Calculus
• Today, these languages provide the basis for the commercial database languages
used in many of the most popular commercial DBMSs.
Terminology of Relational Model
Relation:
Given a collection of sets D1, D2, D3, ..., Dn, R is a relation on
those n sets if it is a set of ordered n-tuples <d1, d2, d3, ..., dn>
such that d1 ∈ D1, d2 ∈ D2, d3 ∈ D3, ..., dn ∈ Dn. Sets D1, D2, D3, ...,
Dn are called the domains of R and the value n is called the
degree of R.
OR
We define R to be a relation on sets D1, D2, D3, ..., Dn if it is a
subset of the Cartesian Product D1 X D2 X D3 X ... X Dn.
The Cartesian product of these n sets, written as “D1 X D2 X D3 X ... X Dn”,
is the set of all possible ordered n-tuples <d1, d2, d3, ..., dn>
such that d1 ∈ D1, d2 ∈ D2, d3 ∈ D3, ..., dn ∈ Dn.
An Example
S# = {S1, S2}
P# = {P1, P2}
S# X P# = {<S1,P1>, <S1,P2>, <S2,P1>, <S2,P2>}
S# P#
S1 P1
S1 P2
S2 P1
S2 P2
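The same example can be reproduced in a few lines. This is a minimal sketch using Python's itertools.product; the helper is_relation_on and the two-tuple relation SP used here are hypothetical and chosen only to show that a relation is any subset of the Cartesian product.

```python
from itertools import product

S = ["S1", "S2"]  # domain of S#
P = ["P1", "P2"]  # domain of P#

# Cartesian product S# X P#: all possible ordered pairs.
cartesian = set(product(S, P))


def is_relation_on(tuples, *domains):
    """True if every tuple is drawn from the Cartesian product of the domains."""
    return set(tuples) <= set(product(*domains))


# A hypothetical relation on S# and P#: any subset of the product qualifies.
SP = {("S1", "P1"), ("S2", "P2")}
degree = len(next(iter(SP)))  # number of domains, here 2

print(sorted(cartesian), is_relation_on(SP, S, P), degree)
```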
• Both WORKER_ID and SUPV_ID in the WORKER relation have different names
but both take their values from the same domain (i.e. the domain of workers
identification numbers).
• The SUPV_ID is a foreign key in the worker relation that references the key of
its own relation. Such foreign keys are called Recursive Foreign Keys.
• Thus a recursive foreign key is nothing but a foreign key that references the key
of its own relation.
Relational Database Schema:
• A listing that gives relation names followed by their attribute names with
key attributes underlined and with foreign keys designated is called a
relational schema.
• Note that the term “Schema” has been used loosely in this definition. What
we are going to create is closer to the relational view, because a
schema is defined using a data sublanguage, as we have practiced in the past.
• An example of relational schema is:
WORKER<WORKER_ID,NAME,HOURLY_RATE,SKILL_TYPE,SUPV_ID>
Foreign Keys: SKILL_TYPE REFERENCES SKILL
: SUPV_ID REFERENCES WORKER
ASSIGNMENT<WORKER_ID,BUILD_ID,START_DATE,NUM_DAYS>
Foreign Keys: WORKER_ID REFERENCES WORKER
: BUILD_ID REFERENCES BUILDING
BUILDING<BUILD_ID,BUILD_ADDRESS,TYPE,QUALITY_LEVEL,STATUS>
SKILL<SKILL_TYPE,BONUS_RATE,HOURS_PER_WEEK>
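The WORKER part of this schema, including its recursive foreign key, can be tried out with SQLite through Python's sqlite3 module. This is a sketch under assumptions: the column types, the trimmed SKILL table and all sample rows are hypothetical, since the schema listing gives only attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# SKILL is trimmed to its key; column types are assumptions.
conn.execute("CREATE TABLE SKILL (SKILL_TYPE TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE WORKER (
        WORKER_ID   INTEGER PRIMARY KEY,
        NAME        TEXT,
        HOURLY_RATE REAL,
        SKILL_TYPE  TEXT REFERENCES SKILL(SKILL_TYPE),
        SUPV_ID     INTEGER REFERENCES WORKER(WORKER_ID)  -- recursive foreign key
    )""")

# Hypothetical sample rows: worker 2 is supervised by worker 1.
conn.execute("INSERT INTO SKILL VALUES ('Electric')")
conn.execute("INSERT INTO WORKER VALUES (1, 'A. Supervisor', 20.0, 'Electric', NULL)")
conn.execute("INSERT INTO WORKER VALUES (2, 'B. Worker', 15.0, 'Electric', 1)")


def insert_with_unknown_supervisor():
    """Try to insert a worker whose SUPV_ID matches no existing WORKER_ID."""
    try:
        conn.execute("INSERT INTO WORKER VALUES (3, 'C. Worker', 15.0, 'Electric', 99)")
    except sqlite3.IntegrityError:
        return False
    return True


print(insert_with_unknown_supervisor())
```

The insert is rejected because SUPV_ID must reference a WORKER_ID that already exists in the same relation.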
Views:
• We know how tables are defined in a relational database schema.
These tables are called base tables, because they contain the basic data
of the database.
• Portions of these base tables as well as information derived from them
can be defined in database views which are also defined as part of the
database schema.
• A view is a virtual table i.e. a window into a portion of the database.
Views are useful for maintaining confidentiality by restricting access to
selected parts of the database and for simplifying frequently used
query types.
• The following example illustrates how a view can be created:
CREATE VIEW B_WORKER AS
SELECT WORKER_ID, SKILL_TYPE
FROM WORKER;
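To see that a view is a window onto the base table rather than a copy, the statement can be run against SQLite from Python. This is a sketch only: the column types and the single sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Base table; column types and the sample row are assumptions.
conn.execute(
    "CREATE TABLE WORKER (WORKER_ID INTEGER PRIMARY KEY, NAME TEXT, "
    "HOURLY_RATE REAL, SKILL_TYPE TEXT, SUPV_ID INTEGER)"
)
conn.execute("INSERT INTO WORKER VALUES (1, 'A. Worker', 20.0, 'Electric', NULL)")

# The view exposes only two columns of the base table.
conn.execute("CREATE VIEW B_WORKER AS SELECT WORKER_ID, SKILL_TYPE FROM WORKER")
view_columns = [d[0] for d in conn.execute("SELECT * FROM B_WORKER").description]

# A change to the base table is visible through the view, because the view stores no data.
conn.execute("UPDATE WORKER SET SKILL_TYPE = 'Plumbing' WHERE WORKER_ID = 1")
rows = conn.execute("SELECT * FROM B_WORKER").fetchall()

print(view_columns, rows)
```

NAME and HOURLY_RATE are not reachable through B_WORKER, which is how a view restricts access to selected parts of the database.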
Transforming E-R Diagrams into Relational
Data Structures (Model)
• Please refer to the class notes.
Normalization
• This topic is concerned with the design and implementation
issues that must be considered in an RDBMS (i.e. when we
design a database using a relational DBMS).
• It continues our earlier discussion on
RDBMS and further refines our design (View) to produce
relations in a form that is least prone to problems like
inconsistency and redundancy.
• In general the goal of a relational database design is to
generate a set of relational schemas that allow us to store
information without unnecessary redundancy, yet allow us
to retrieve information easily.
What is Normalization?
• The entities and their attributes can be organized into a set of tables in
many different ways.
• One method of organization is to design schemas that are in an
appropriate normal form. The theory behind such arrangement of
attributes in tables is known as Normalization.
• The normalization of data helps to ensure that a particular organization
conforms to such standards as:
– Minimization of duplication of data (redundancy).
– Providing flexibility to support different functional requirements.
– Easy and consistent modification of data.
– Enabling the organization to be translated to the actual database design.
• A number of normal forms have been defined for classifying relations.
Each normal form has associated with it a number of constraints.
• A relational schema is said to be in a particular normal form if it
satisfies all the constraints required for that normal form.
• In general, if a given relation is in the (n+1)th Normal form, it is
also in the nth Normal form.
[Figure: Universe of Relations (Un-Normalized & Normalized), showing the normal forms 2 NF, 3 NF, BCNF, 4 NF and PJ/NF (5 NF)]
Why Normalization?
Relations are normalized so that when relations in a database are
altered during the lifetime of the database, we don’t lose vital
information or introduce inconsistencies. The types of alterations
normally needed for relations are:
Insertion:
• Insertion of data items into the database should be possible without
being forced to leave blank fields for some attributes. A design
that forces this is said to have an Insertion Anomaly.
Deletion:
• Deletion of a tuple from a relation should be possible without
losing vital information. A design in which this is not possible
is said to have a Deletion Anomaly.
Updation:
• Updation, or changing the value of an attribute, should be possible
without introducing inconsistencies in the database. A design in which
this is not possible is said to have an Updation Anomaly.
Clearly, Normalization is used to remove these anomalies from the design.
Un-Normalized Relations
• Consider the following two relations:
[Tables only partly recoverable. Surviving fragment of the second relation:]
                       Course_No.  Course_Dept
Comp. Engg.  Smith     353         Comp. Engg
                       221         Mathematics
                       456         Mathematics
                       336         Chemistry
----         -----
• In the above relations the attribute Items_Lines
(Course_Preferences) is not a single attribute but is composed
of three attributes, namely Item_Code, Quantity and
Price/Unit (two attributes, Course_No. and Course_Dept).
• Each row contains multiple sets of values for some of the
columns. These multiple values in a single row are called
non-atomic values.
• This form of data is not suitable for storing as a file on the
computer, as retrieval of data based on a component of a
composite attribute is difficult.
• For example, to find out “how many items for a specified
item_code (or how many teachers are interested to teach a
particular course)” is really difficult.
• Thus relations as shown in the above two tables are not
allowed. Such relations are known as Un-normalized
Relations.
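The difficulty with non-atomic values, and the usual remedy of giving every component its own row, can be sketched as below. The course numbers and departments come from the surviving table fragment; assigning them to a teacher named Smith, the attribute name Teacher and the helper functions are assumptions made only for illustration.

```python
# One un-normalized row: Course_Preferences holds several (Course_No, Course_Dept) values.
unnormalized = [
    {
        "Teacher": "Smith",
        "Course_Preferences": [
            (353, "Comp. Engg"),
            (221, "Mathematics"),
            (456, "Mathematics"),
        ],
    },
]


def flatten(rows):
    """Produce one row per preference so that every value is atomic."""
    flat = []
    for row in rows:
        for course_no, course_dept in row["Course_Preferences"]:
            flat.append(
                {"Teacher": row["Teacher"], "Course_No": course_no, "Course_Dept": course_dept}
            )
    return flat


flat_rows = flatten(unnormalized)


def teachers_interested_in(course_no):
    """With atomic values this query is a simple filter on one column."""
    return sorted({r["Teacher"] for r in flat_rows if r["Course_No"] == course_no})


print(len(flat_rows), teachers_interested_in(221))
```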
Normal Forms
First Normal Form
• The two relations shown above can be rewritten as below:
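[Tables not recoverable in this copy]
[Figure: two mappings from X to Y. The first is an FD (X → Y); the second is not an FD.]
• Graphically an FD is shown as below:
X → Y
[Fragment of relation S <S#, SNAME, CITY, STATUS> and its dependency diagram; surviving rows:]
S4 Clark London 20
S5 Adams Athens 30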
Example 2
• For the following relation (SP), determine FDs and draw the functional
dependency diagram:
S# P# QTY
S1 P1 300
S1 P2 200
S1 P3 400
S2 P1 300
S2 P2 400
S3 P2 200
ANSWER
• The following FD holds on SP relation.
SP.(S#,P#) → SP.QTY
• The dependency diagram is shown in the following figure.
[Figure: dependency diagram in which S# and P# together determine QTY]
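Whether an FD is violated by a given set of tuples can be checked mechanically. The sketch below uses the SP tuples of Example 2; the function name fd_holds is hypothetical. Note that a sample of tuples can only refute an FD: an FD is a statement about all legal states of the relation, so passing the check on one sample does not prove it.

```python
# The SP relation of Example 2.
SP = [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
]
COLUMNS = ("S#", "P#", "QTY")


def fd_holds(rows, columns, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    index = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        x = tuple(row[index[a]] for a in lhs)
        y = tuple(row[index[a]] for a in rhs)
        # setdefault records the first y seen for x and returns it afterwards.
        if seen.setdefault(x, y) != y:
            return False
    return True


print(fd_holds(SP, COLUMNS, ["S#", "P#"], ["QTY"]))  # (S#, P#) -> QTY
print(fd_holds(SP, COLUMNS, ["S#"], ["QTY"]))        # S# alone
```

S# alone does not determine QTY, because S1 appears with three different quantities.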
Full Functional Dependency
• Given a relation R and an FD: R.X → R.Y.
Attribute Y of R is fully functionally
dependent on attribute X of R if there is no
Z, where Z is a proper subset of X, such that
Z determines Y.
• So, full functional dependency is meaningful
when X is a composite attribute; otherwise
functional dependency and full functional
dependency are used interchangeably.
An Example
• Consider the relation S. Determine whether S.CITY is fully
functionally dependent upon S.(S#,STATUS).
• In relation S the FD:
S.(S#,STATUS) → S.CITY
holds, but it is not a full functional dependency.
Explanation:
We have
S.S# → S.CITY
As a proper subset (i.e. S#) of (S#,STATUS) determines CITY,
CITY is not fully functionally dependent on the composite
attribute in question.
• Note that in case of SP relation SP.QTY is fully functionally
dependent on SP.(S#,P#). (Why???)
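The definition suggests a direct test: the FD must hold, and it must stop holding for every proper subset of the determinant. The sketch below applies this to SP (tuples from Example 2) and to the two surviving tuples of S; the function names are hypothetical, and as before a sample of tuples can illustrate the definition but cannot prove a dependency.

```python
from itertools import combinations


def fd_holds(rows, columns, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    index = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        x = tuple(row[index[a]] for a in lhs)
        y = tuple(row[index[a]] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, columns, lhs, rhs):
    """lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not fd_holds(rows, columns, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, columns, subset, rhs):
                return False
    return True


SP_COLS = ("S#", "P#", "QTY")
SP = [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
]

# Only the two tuples of S that survive in these notes.
S_COLS = ("S#", "SNAME", "CITY", "STATUS")
S = [("S4", "Clark", "London", 20), ("S5", "Adams", "Athens", 30)]

print(is_full_fd(SP, SP_COLS, ["S#", "P#"], ["QTY"]))    # neither S# nor P# alone determines QTY
print(is_full_fd(S, S_COLS, ["S#", "STATUS"], ["CITY"]))  # S# alone already determines CITY
```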
Decompositions
• Decomposition is the process of splitting a relation into multiple
relations.
• Decompositions are mandatory requirements
[Table: the 1 NF supplier relation]
S#  STATUS  CITY    P#  QTY
S1  20      LONDON  P1  300
S1  20      LONDON  P2  200
S1  20      LONDON  P3  400
S1  20      LONDON  P4  200
S1  20      LONDON  P5  100
S1  20      LONDON  P6  100
S2  10      PARIS   P1  300
S2  10      PARIS   P2  400
S3  10      PARIS   P2  200
S4  20      LONDON  P2  200
S4  20      LONDON  P4  300
S4  20      LONDON  P5  400
S5  20      ATHENS  P1  300
ORDER
ORDER_NO.  ORDER_DATE  ITEM_CODE  QTY  PRICE/UNIT
1456       26-2-2001   3687       52   50.40
[Figure: dependency diagrams of the two relations, over the attributes S#, P#, QTY, CITY and ORDER_NO, ORDER_DATE, ITEM_CODE, QTY, PRICE/UNIT]
[Tables: decomposition of ORDER]
ORDER_NO  ORDER_DATE
1456      26-2-2001
1886      04-03-2001
1788      04-04-2001
ITEMS
ORDER_NO  ITEM_CODE  QTY
1456      3687       52
1456      4627       38
1456      3214       20
1886      4629       45
1886      4627       30
1788      4630       40
PRICES
ITEM_CODE  PRICE/UNIT
3687       50.40
4627       60.20
3214       17.50
4629       20.25
4630       62.20
• With these decompositions we shall now observe how the insertion,
deletion and updation anomalies of the 1 NF representations
of these relations are solved:
Insertion Anomaly:
• We can now enter the information that S6 is located in Washington, in spite
of the fact that S6 currently doesn’t supply any part.
• Similarly an order without any item description can be placed.
Deletion Anomaly:
• We can now delete the shipments connecting S5 to P2 and S3 to P2 by
deleting the appropriate tuples from the relation SP, yet we don’t lose the
information about the cities in which these suppliers are located.
• Similarly, if an order is cancelled we can delete the appropriate tuple from the
XORDER relation and this time we don’t lose the price of the item(s)
described in that order.
Updation Anomaly:
• In the revised structures of the relations, the CITY of a given supplier
appears once, so if a supplier moves from one city to another, his location
can easily be altered.
• Similarly the PRICE/ UNIT of an item can be changed easily in the PRICES
relation.
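The decomposition of the supplier relation can be checked with plain set operations. The sketch below projects the thirteen tuples of the 1 NF supplier relation listed earlier onto SECOND (S#, STATUS, CITY) and SP (S#, P#, QTY), then joins the projections back on S#; the variable names FIRST, rejoined, SP_after_delete and s5_city are hypothetical.

```python
# The thirteen tuples of the 1 NF supplier relation: (S#, STATUS, CITY, P#, QTY).
FIRST = [
    ("S1", 20, "LONDON", "P1", 300), ("S1", 20, "LONDON", "P2", 200),
    ("S1", 20, "LONDON", "P3", 400), ("S1", 20, "LONDON", "P4", 200),
    ("S1", 20, "LONDON", "P5", 100), ("S1", 20, "LONDON", "P6", 100),
    ("S2", 10, "PARIS", "P1", 300), ("S2", 10, "PARIS", "P2", 400),
    ("S3", 10, "PARIS", "P2", 200), ("S4", 20, "LONDON", "P2", 200),
    ("S4", 20, "LONDON", "P4", 300), ("S4", 20, "LONDON", "P5", 400),
    ("S5", 20, "ATHENS", "P1", 300),
]

# Projections: duplicates disappear because the results are sets.
SECOND = {(s, status, city) for s, status, city, _, _ in FIRST}
SP = {(s, p, qty) for s, _, _, p, qty in FIRST}

# Natural join on S# rebuilds the original tuples, so no information is lost.
rejoined = {
    (s, status, city, p, qty)
    for (s, status, city) in SECOND
    for (s2, p, qty) in SP
    if s == s2
}

# Deleting the only shipment of S5 no longer loses the city of S5.
SP_after_delete = SP - {("S5", "P1", 300)}
s5_city = [city for s, _, city in SECOND if s == "S5"]

print(rejoined == set(FIRST), len(SECOND), s5_city)
```

The city of each supplier is now stored once in SECOND instead of once per shipment, which is what removes the updation anomaly.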
What We Have Actually Done to Solve The Problems??
• We have decomposed the relations in
such a way that in each of the resultant
relations every non key attribute is
fully functionally dependent on the key
attribute.
• Check the dependency diagrams of new
relations below:
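[Figure: dependency diagrams of the decomposed relations. Dependency Diagram for SECOND: S# determines STATUS and CITY. SP: S# and P# together determine QTY. The diagrams for the order relations show ORDER_NO and ITEM_CODE as determinants.]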
• A relation R is in Second Normal Form
(2 NF) if and only if it is in First Normal
Form (1 NF) and every non key
attribute is fully functionally dependent
on the key attribute.
• Further, a relational schema is said to
be in 2 NF if each relation in the
schema is in 2 NF.
Problems with 2 NF
• Anomalies similar to those described with 1 NF can also occur with a
relation that is in second normal form (2 NF).
• To remove them another normalization step is used, that converts
second normal form relation to the third normal form relation.
• For further discussion consider the following relations namely SECOND
and STU_INFO
SECOND <S#, STATUS, CITY>
STU_INFO <ENo., NAME, DEPT, YEAR, HALL>
Transitive Dependence
• Now consider the dependency diagrams of SECOND and
STU_INFO Relations on the next page.
• We notice that STATUS is transitively dependent on S# via CITY
and HALL is transitively dependent on ENo. via YEAR.
• In third normal form our goal is to remove these transitive
dependences.
• The only way to remove them is DECOMPOSITION. So, we once
again decompose the relations SECOND and STU_INFO as below:
Decomposition of SECOND:
SC<S#,CITY>
CS<CITY,STATUS>
Decomposition of STU_INFO:
STU1<ENO,NAME,DEPT,YEAR>
STU2<YEAR,HALL>
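[Figure: dependency diagrams before decomposition. SECOND: S# determines CITY, and CITY determines STATUS. STU_INFO: ENo determines NAME, DEPT and YEAR, and YEAR determines HALL.]
[Figure: dependencies after decomposition. SC: S# determines CITY. CS: CITY determines STATUS. STU1: ENo determines NAME, DEPT and YEAR. STU2: YEAR determines HALL.]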
Problems
• Reduce the following relation into third normal form.
Project number  Project name  Empnumber  Employee name     Rate category  Hourly rate
                              12         Pauline James     B              $50
                              16         Charles Ramoraz   C              $40
                              17         Monique Williams  B              $50
• Consider the following invoice for International Widgets Corp.
Design a relational schema for the computer based system that is
expected to generate the same invoice. Make sure that your
schema is in 3 NF.
Relational Algebra & Relational Calculus
• In 1970-71 E.F. Codd published two papers introducing the relational data model and
relational data manipulation languages – Relational Algebra and Relational Calculus.
• Both of these languages allow the manipulation of data solely on the basis of their
logical characteristics.
• In his original paper Codd introduced the relational data model and relational
algebra.
• Relational Algebra is a procedural language for manipulating relations i.e. relational
algebra uses a step by step procedure to create a relation containing the data that
answer the query.
• In a subsequent paper Codd introduced Relational Calculus. Relational calculus is
non-procedural: a query is solved by defining the solution relation in a
single step.
• Codd showed that relational algebra and relational calculus are logically equivalent.
This means that any query that can be formulated in relational calculus can also be
formulated in relational algebra, and vice versa.
• This provided a means of measuring the logical power of a query language. If a language
is at least as powerful as relational algebra or relational calculus, it is called a
Relationally Complete Language.
Relational Algebra
• Relational Algebra operators manipulate relations i.e. these
operators use one or two existing relations to create a new
relation. This new relation may then be used as an input to a new
operation.
• This powerful concept, i.e. the creation of new relations from old
ones, makes possible an unlimited variety of data manipulations. It
also makes the solution of queries easier, since we can
experiment with partial solutions until we find an approach that
leads us to the final solution.
• The relational algebra operators can be divided into the following
two categories:
– Basic Set Oriented Operations
– Relational Oriented Operations
Basic Set Oriented Operations
• These are traditional set operations namely Union, Difference,
Intersection and Cartesian Product.
• Three of these four basic operations (Union, Intersection and
Difference) require that the operand relations be union
compatible, which means that the attributes of the operand
relations have the same names and that the resultant relation
inherits these names. Mathematically:
• Two relations P and Q are said to be union compatible if both P
and Q are of the same degree n and the domains of the
corresponding n attributes are identical, i.e. if P = {P1, P2, P3, ..., Pn}
and Q = {Q1, Q2, Q3, ..., Qn} then:
Dom(Pi) = Dom(Qi) for all i = 1 to n,
where Dom(Pi) represents the domain of the attribute Pi.
• Next we discuss the basic set oriented operators of relational
algebra.
1. Union (U)
• If we assume that P and Q are two union compatible relations, then
the union of P and Q is the set-theoretic union of P and Q. The resultant
relation R = P U Q has tuples drawn from P and Q such that:
R = { t | t ∈ P ∨ t ∈ Q } and
max(|P|, |Q|) ≤ |R| ≤ |P| + |Q|
Note: |X| means the cardinality of relation X.
• The following example illustrates:
P               Q               R = P U Q
ID   NAME       ID   NAME       ID   NAME
101  Jones      103  Smith      101  Jones
103  Smith      104  Lloyd      103  Smith
104  Lloyd      106  Byron      104  Lloyd
107  Evan       110  Drew       106  Byron
110  Drew                       107  Evan
112  Smith                      110  Drew
                                112  Smith
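The union example can be reproduced with Python sets, since a relation is a set of tuples. This is a minimal sketch rather than part of the original slides; it uses the tuples of P and Q from the example and checks the cardinality bound stated above.

```python
# Relations P and Q as sets of (ID, NAME) tuples, taken from the example.
P = {(101, "Jones"), (103, "Smith"), (104, "Lloyd"),
     (107, "Evan"), (110, "Drew"), (112, "Smith")}
Q = {(103, "Smith"), (104, "Lloyd"), (106, "Byron"), (110, "Drew")}

# Set union keeps one copy of every tuple that appears in P or in Q.
R = P | Q

for row in sorted(R):
    print(row)

# The cardinality bound: max(|P|, |Q|) <= |R| <= |P| + |Q|
print(max(len(P), len(Q)) <= len(R) <= len(P) + len(Q))
```

Here |P| = 6 and |Q| = 4, and three tuples are shared, so |R| = 7.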
2. π(STATUS) (SECOND) =
STATUS
10
20
3. π(STATUS, CITY) (SECOND) =
STATUS  CITY
20      LONDON
10      PARIS
20      ATHENS
2. Selection Operation (σ)
• The selection operation, written as
σ predicate (R), works on a single relation R
and defines a relation that contains only
those tuples of R that satisfy the specified
condition (predicate) for a given attribute
or for a combination of attributes.
• The following examples on relations P and
SECOND illustrate the Selection operation.
1. σ ID < 107 (P):
P                 σ ID < 107 (P)
ID   Names        ID   Names
101  Jones        101  Jones
103  Smith        103  Smith
104  Lloyd        104  Lloyd
106  Byron        106  Byron
107  Evan
2. σ CITY = 'LONDON' (SECOND) =
S#  STATUS  CITY
S1  20      LONDON
S4  20      LONDON
• Write a Relational Algebra expression to answer
the following query on relation SECOND:
“Get S# for all suppliers residing in PARIS”.
• Solution:
1. Select all suppliers residing in Paris:
σ CITY = 'PARIS' (SECOND)
2. Project out the undesired attributes (i.e. take the
projection on S#):
π(S#) (σ CITY = 'PARIS' (SECOND))
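The two-step solution can be sketched in Python. This is an illustrative sketch, not the slides' own notation: the helper names select and project are hypothetical, and the rows for S2, S3 and S5 are assumed sample data, since the slides list only the LONDON suppliers.

```python
# SECOND as a list of tuples represented by dictionaries.
SECOND = [
    {"S#": "S1", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S2", "STATUS": 10, "CITY": "PARIS"},   # assumed row
    {"S#": "S3", "STATUS": 10, "CITY": "PARIS"},   # assumed row
    {"S#": "S4", "STATUS": 20, "CITY": "LONDON"},
    {"S#": "S5", "STATUS": 20, "CITY": "ATHENS"},  # assumed row
]

def select(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Projection: keep the named attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

# Step 1: select the suppliers residing in PARIS.
paris = select(SECOND, lambda t: t["CITY"] == "PARIS")
# Step 2: project on S#.
answer = project(paris, ["S#"])
print(answer)
```

Projection removes duplicates because the result is itself a relation: project(SECOND, ["STATUS"]) has two tuples even though SECOND has five.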
3. Join (⋈)
• The Join operation as its name suggests, allows the
combination of two relations to form a single new relation.
The join operation is used to connect data across relations,
which is the most important function in any database
language.
• There are several versions of join namely:
– Natural Join
– Theta Join
– Equi Join
– Self Join
– Outer Join
• Join is regarded as one of the most important
operations of relational databases.
(i) Natural Join
• Suppose we want to take the Natural Join of two relations P
and Q which have columns C1, C2, C3, ..., Cn in common.
Then P ⋈ Q is obtained as follows:
1. Take the Cartesian Product of P and Q, i.e. find P X Q.
2. Eliminate all tuples from P X Q except those on which the
values of columns C1, C2, C3, ..., Cn in P are equal,
respectively, to the values of those columns in Q.
3. Project out one copy of the columns C1, C2, C3, ..., Cn.
• The degree of the resultant relation is degree(P) + degree(Q) - n,
which is always less than the sum of the degrees of P and Q
because at least one column is common.
• The following example illustrates the Natural Join operation:
Computation of Natural Join for the relations E and S:
E               S
ID   NAME       ID   SALARY CODE
101  Jones      101  67
103  Smith      103  55
104  Evan       104  75
(i) Compute E X S
ID   NAME   ID   SALARY CODE
101  Jones  101  67
101  Jones  103  55
101  Jones  104  75
103  Smith  101  67
103  Smith  103  55
103  Smith  104  75
104  Evan   101  67
104  Evan   103  55
104  Evan   104  75
(ii) Select rows for which values of common attribute are equal
ID   NAME   ID   SALARY CODE
101  Jones  101  67
103  Smith  103  55
104  Evan   104  75
(iii) Project out one copy of the common column
ID   NAME   SALARY CODE
101  Jones  67
103  Smith  55
104  Evan   75
• Note that the natural join of relations E and S
may also be expressed as:
E ⋈ S = π(E.ID, NAME, SALARY CODE) (σ E.ID = S.ID (E X S))
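The three steps of the natural join (Cartesian product, selection on the common columns, projection of one copy) can be sketched in Python. This is a minimal illustration, not part of the original slides. The function name natural_join is hypothetical, and the attribute SALARY CODE is written SALARY_CODE here.

```python
# Relations E and S from the example, one dictionary per tuple.
E = [{"ID": 101, "NAME": "Jones"},
     {"ID": 103, "NAME": "Smith"},
     {"ID": 104, "NAME": "Evan"}]
S = [{"ID": 101, "SALARY_CODE": 67},
     {"ID": 103, "SALARY_CODE": 55},
     {"ID": 104, "SALARY_CODE": 75}]

def natural_join(p, q):
    """Join p and q on every attribute name they share."""
    if not p or not q:
        return []
    common = [a for a in p[0] if a in q[0]]
    joined = []
    for tp in p:                      # the nested loops form P X Q
        for tq in q:
            # keep the pair only if all common attributes agree
            if all(tp[a] == tq[a] for a in common):
                # merging the dictionaries keeps one copy of each common column
                joined.append({**tp, **tq})
    return joined

result = natural_join(E, S)
for row in result:
    print(row)
```

Each result tuple has three attributes, one fewer than the four of E X S, in line with the remark on the degree of the result.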
(ii) Theta Join (⋈θ)
• Theta Join is intended for those occasions when we need to join two
relations on the basis of some condition.
• So, theta join is a join operation that connects relations when the
values from specified columns of the relations have a specified
relationship.
• Let P and Q be two relations; then the theta join of relation P on attribute
X with relation Q on attribute Y is written as:
P ⋈θ Q, where θ is a comparison between X and Y.
Computation of Theta Join for the relations E and S:
E               S
ID   NAME       ID   STATUS
101  Jones      101  Clerk
102  Smith      107  Engineer
103  Evan
(i) Compute E X S
ID   NAME   S.ID  S.STATUS
101  Jones  101   Clerk
101  Jones  107   Engineer
102  Smith  101   Clerk
102  Smith  107   Engineer
103  Evan   101   Clerk
103  Evan   107   Engineer
(ii) Assume that θ = (E.ID < S.ID)
Select all rows for which θ is TRUE
ID   NAME   S.ID  S.STATUS
101  Jones  107   Engineer
102  Smith  107   Engineer
103  Evan   107   Engineer
• We can easily note that the theta join of relations E and S may also be
expressed as:
E ⋈θ S = σ E.ID < S.ID (E X S)
• It would be interesting to note that the attributes on which theta join
operation is being carried out, may have different names in the two
relations. For example the ID attribute of relation S could have the
name CODE or something else. The true spirit of a theta join
operation is that the attributes on which it is being carried out must
take their values from the same domain.
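A theta join is a selection over the Cartesian product, which can be sketched in Python as follows. This is an illustrative sketch, not part of the original slides. The function name theta_join is hypothetical, and θ is passed in as an ordinary function of two tuples.

```python
# Relations E(ID, NAME) and S(ID, STATUS) from the theta join example.
E = [(101, "Jones"), (102, "Smith"), (103, "Evan")]
S = [(101, "Clerk"), (107, "Engineer")]

def theta_join(p, q, theta):
    """Concatenate every pair of tuples from p and q for which theta holds."""
    return [tp + tq for tp in p for tq in q if theta(tp, tq)]

# theta = (E.ID < S.ID)
less_than = theta_join(E, S, lambda e, s: e[0] < s[0])
for row in less_than:
    print(row)

# With equality as the comparison the same function computes an equi join.
equal = theta_join(E, S, lambda e, s: e[0] == s[0])
print(equal)
```

Because the condition refers to attribute positions rather than names, the compared attributes may be named differently in the two relations, as noted above.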
(iii) Equi Join:
• Theta join is called Equi Join when the condition for
comparison (θ) is equality.
• Clearly, Equi Join is a special case of Theta Join.
• Also, Natural Join is a special case of Equi Join where,
among the duplicate copies of columns, one copy is
projected out.
• One major distinction between Equi Join and Natural Join
is that for Natural Join the attributes on which the join
is carried out must have the same names in the two
relations, whereas for Equi Join they may have different
names.
• The following example illustrates the Equi Join:
Computation of Equi Join for the relations E and S:
E               S
ID   NAME       CODE  SALARY CODE
101  Jones      101   67
103  Smith      103   55
104  Evan       104   75
(i) Compute E X S
ID   NAME   CODE  SALARY CODE
101  Jones  101   67
101  Jones  103   55
101  Jones  104   75
103  Smith  101   67
103  Smith  103   55
103  Smith  104   75
104  Evan   101   67
104  Evan   103   55
104  Evan   104   75
(ii) Assume that θ = (E.ID = S.CODE)
Select all rows for which θ is TRUE
ID   NAME   CODE  SALARY CODE
101  Jones  101   67
103  Smith  103   55
104  Evan   104   75
(iv) Outer Join
Computation of Outer Join for the relations E and S:
E               S
ID   NAME       ID   SALARY CODE
101  Jones      101  67
103  Smith      103  55
104  Evan
(i) Compute E X S
ID   NAME   ID   SALARY CODE
101  Jones  101  67
101  Jones  103  55
103  Smith  101  67
103  Smith  103  55
104  Evan   101  67
104  Evan   103  55
(ii) Select rows for which values of common attribute are equal
ID   NAME   ID   SALARY CODE
101  Jones  101  67
103  Smith  103  55
(iii) Project out one copy of common columns
ID   NAME   SALARY CODE
101  Jones  67
103  Smith  55
(iv) Indicate the unmatched records
ID   NAME   SALARY CODE
101  Jones  67
103  Smith  55
104  Evan   null
Please Note…
• Strictly speaking the outer join operation
we have just executed is a Left Outer Join as
it keeps every tuple in the left hand relation
in the result.
• Similarly there is a Right Outer Join, that
keeps every tuple in the right hand relation
in the result.
• There is also a Full Outer Join that keeps all
tuples in both relations, padding tuples
with nulls when no matching tuples are
found.
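A left outer join can be sketched in Python as a natural join that also keeps unmatched left-hand tuples. This is an illustrative sketch, not part of the original slides. It uses E with three employees and S with the two rows (101, 67) and (103, 55), the function name left_outer_join is hypothetical, and Python's None stands in for null.

```python
# E(ID, NAME) has three tuples; S(ID, SALARY CODE) has no row for employee 104.
E = [(101, "Jones"), (103, "Smith"), (104, "Evan")]
S = [(101, 67), (103, 55)]

def left_outer_join(left, right):
    """Join on ID and keep every left tuple, padding with None when unmatched."""
    result = []
    for (left_id, name) in left:
        codes = [code for (right_id, code) in right if right_id == left_id]
        if codes:
            result.extend((left_id, name, code) for code in codes)
        else:
            result.append((left_id, name, None))  # unmatched tuple, padded
    return result

joined = left_outer_join(E, S)
for row in joined:
    print(row)
```

A right outer join is the same operation with the roles of the two relations swapped, and a full outer join keeps the unmatched tuples of both sides.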
(v) Self Join
• As the name indicates, self join means joining a
relation with itself.
• But how can we join a relation with itself when all the
attributes are common and take their values from
identical domains?
• To resolve the problem we copy the relation into
another relation having a different name, and the join
operation is executed on the qualified attributes
indicated by the requirement, i.e. by the query.
• The following example illustrates:
• Consider the relation ASSIGNMENT:
ASSIGNMENT
EMP# PROD# JOB#
107 HEAP1 800
101 HEAP1 600
110 BINS9 800
103 HEAP1 700
101 BINS9 700
110 FM6 800
107 B++1 800
R = (ASSIGNMENT) ⋈ (COASSIGN), with the join condition
ASSIGNMENT.PROD# = COASSIGN.PROD#
3. Take the projection of R on EMP# and COA.EMP#:
π(EMP#, COA.EMP#) (R)
4. The complete expression would be:
π(EMP#, COA.EMP#) ((ASSIGNMENT) ⋈ (COASSIGN)), with the join condition
ASSIGNMENT.PROD# = COASSIGN.PROD#
ASSIGNMENT                COASSIGN
EMP#  PROD#  JOB#         EMP#  PROD#  JOB#
107   HEAP1  800          107   HEAP1  800
101   HEAP1  600          101   HEAP1  600
110   BINS9  800          110   BINS9  800
103   HEAP1  700          103   HEAP1  700
101   BINS9  700          101   BINS9  700
110   FM6    800          110   FM6    800
107   B++1   800          107   B++1   800
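The self join of ASSIGNMENT with its copy COASSIGN on PROD#, followed by the projection on the two EMP# columns, can be sketched in Python. This is an illustrative sketch, not part of the original slides. It assumes the intended result is the set of employee pairs that share a product.

```python
# ASSIGNMENT(EMP#, PROD#, JOB#) from the example.
ASSIGNMENT = [
    (107, "HEAP1", 800),
    (101, "HEAP1", 600),
    (110, "BINS9", 800),
    (103, "HEAP1", 700),
    (101, "BINS9", 700),
    (110, "FM6", 800),
    (107, "B++1", 800),
]

# The copy plays the role of COASSIGN.
COASSIGN = list(ASSIGNMENT)

# Join on ASSIGNMENT.PROD# = COASSIGN.PROD#, then project on the two EMP# columns.
# Building a set removes duplicate pairs, as projection does.
pairs = {
    (a_emp, c_emp)
    for (a_emp, a_prod, a_job) in ASSIGNMENT
    for (c_emp, c_prod, c_job) in COASSIGN
    if a_prod == c_prod
}

for pair in sorted(pairs):
    print(pair)
```

The result also pairs every employee with itself, for example (107, 107). A further selection such as ASSIGNMENT.EMP# < COASSIGN.EMP# would keep each distinct pair once, if that is what the query requires.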
• Anant Kumar and Vinod Rathor are also under
supervision of Vijes Setthi.
• Rakesh Patel and Mukesh Singh are under
supervision of Unnith Nayar.
Now Think of Answering The Following Query:
• Get the list of employees along
with their supervisors.
Solution
• Copy Emp1 to Emp2
• Join Emp1 with Emp2 such that
Emp1.EMP_SUPV = Emp2.EMP_ID
• SQL Command:
SELECT EMP1.EMP_ID, EMP1.EMP_NAME,
EMP2.EMP_ID, EMP2.EMP_NAME
FROM EMP1, EMP2
WHERE EMP1.EMP_SUPV = EMP2.EMP_ID;
Output
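The query can be tried against a small in-memory SQLite database using Python's standard sqlite3 module. This is a sketch under stated assumptions, not part of the original slides. The EMP_ID values and the NULL supervisors of the two supervisors are hypothetical; only the names and the supervision relationships come from the text. An ORDER BY clause is added so that the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP1 (EMP_ID INTEGER, EMP_NAME TEXT, EMP_SUPV INTEGER)"
)
# EMP_ID values are assumed; None (NULL) marks an employee with no supervisor listed.
employees = [
    (1, "Vijes Setthi", None),
    (2, "Unnith Nayar", None),
    (3, "Anant Kumar", 1),
    (4, "Vinod Rathor", 1),
    (5, "Rakesh Patel", 2),
    (6, "Mukesh Singh", 2),
]
conn.executemany("INSERT INTO EMP1 VALUES (?, ?, ?)", employees)

# Copy Emp1 to Emp2, as in the solution steps.
conn.execute("CREATE TABLE EMP2 AS SELECT * FROM EMP1")

rows = conn.execute(
    "SELECT EMP1.EMP_ID, EMP1.EMP_NAME, EMP2.EMP_ID, EMP2.EMP_NAME "
    "FROM EMP1, EMP2 "
    "WHERE EMP1.EMP_SUPV = EMP2.EMP_ID "
    "ORDER BY EMP1.EMP_ID"
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Employees whose EMP_SUPV is NULL do not appear in the result, because a comparison with NULL is never true. In SQL the physical copy can usually be avoided by listing the same table twice under two aliases.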
4. Divide (÷)
• Consider the relation P and the several results when relation P is
divided by different values of relation Q:
P
A   B
A1  B1
A1  B2
A2  B1
A3  B1
A4  B1
A4  B2
A5  B1
A5  B2
Values of Relation Q      Result of P ÷ Q
Case 1:
B                         A
B1                        A1
B2                        A4
                          A5
Case 2:
B                         A
B1                        A1
                          A2
                          A3
                          A4
                          A5
Case 3:
B                         A
B1                        NULL SET (no tuples)
B2
B3
• Simply stated: the division of P by Q is defined
such that the Cartesian Product of the result with
Q is a subset of P. The result is the largest set of A
values for which this holds.
“OR”
• If we assume that a tuple (A1,B1) of
P represents the object A1 with the property B1,
then the resultant relation R is the set of all objects
in P that possess every property listed in Q.
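The three division cases can be checked with a short Python sketch. This is an illustration, not part of the original slides. The function name divide is hypothetical, and Q is given simply as a set of B values.

```python
# Relation P(A, B) from the example.
P = {
    ("A1", "B1"), ("A1", "B2"),
    ("A2", "B1"),
    ("A3", "B1"),
    ("A4", "B1"), ("A4", "B2"),
    ("A5", "B1"), ("A5", "B2"),
}

def divide(p, q):
    """Return the A values that are paired in p with every B value in q."""
    candidates = {a for (a, b) in p}
    return {a for a in candidates if all((a, b) in p for b in q)}

print(sorted(divide(P, {"B1", "B2"})))        # case 1
print(sorted(divide(P, {"B1"})))              # case 2
print(sorted(divide(P, {"B1", "B2", "B3"})))  # case 3
```

In each case the Cartesian product of the result with Q is a subset of P. In the third case no A value is paired with B3, so the result is empty.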
Problems:
• Consider the Relation Schema Given Below taken over the ER diagram
shown therein:
EMPLOYEE<Emp#, Name>
ASSIGNED_TO<Project#,Emp#>
PROJECT<Project#,Project_Name,Chief_Architects>