Dbms Module 2 Chapter 8
Introduction
• A data model must include a set of operations to manipulate the
database, in addition to the concepts for defining the database’s structure
and constraints. The basic set of operations for the formal relational model
is the relational algebra.
• These operations enable a user to specify basic retrieval requests as
relational algebra expressions.
• A sequence of relational algebra operations forms a relational algebra
expression.
• Unary operations operate on a single relation.
• Binary operations operate on two relations, for example JOIN, which
combines related tuples based on join conditions.
8.1 Unary Relational Operations: SELECT and PROJECT
8.1.1 The SELECT Operation
• The SELECT operation is used to choose a subset of the tuples from a relation that satisfies
a selection condition.
• We can consider the SELECT operation to be a filter that keeps only those tuples that satisfy
a qualifying condition.
• The SELECT operation can be visualized as a horizontal partition of the relation into two sets of tuples:
- those tuples that satisfy the condition and are selected
- those tuples that do not satisfy the condition and are filtered out.
• In SQL, the SELECT condition is typically specified in the WHERE clause of a query.
• For example, selecting the EMPLOYEE tuples whose department is 4 is written
σ_{Dno=4}(EMPLOYEE), which corresponds to the SQL query SELECT * FROM EMPLOYEE WHERE Dno = 4;
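A minimal Python sketch of SELECT as a filter, assuming a relation is modelled as a list of dicts (one dict per tuple) and the selection condition as a predicate function. The `select` helper and the EMPLOYEE rows are hypothetical sample data, not part of any DBMS API.

```python
# Hypothetical sample data: each tuple of EMPLOYEE is a dict of attribute -> value.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Salary": 30000, "Dno": 5},
    {"Fname": "Alicia", "Lname": "Zelaya", "Salary": 25000, "Dno": 4},
    {"Fname": "Jennifer", "Lname": "Wallace", "Salary": 43000, "Dno": 4},
]


def select(relation, condition):
    """SELECT: keep only the tuples t for which condition(t) is true."""
    return [t for t in relation if condition(t)]


# σ_{Dno=4}(EMPLOYEE): the selected half of the horizontal partition
dept4 = select(EMPLOYEE, lambda t: t["Dno"] == 4)
# the tuples that fail the condition form the other, filtered-out half
filtered_out = select(EMPLOYEE, lambda t: t["Dno"] != 4)

print([t["Lname"] for t in dept4])         # ['Zelaya', 'Wallace']
print([t["Lname"] for t in filtered_out])  # ['Smith']
```

Modelling the condition as a function keeps the sketch general: any Boolean combination of comparisons can be passed in.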
8.1.2 The PROJECT Operation
• The PROJECT operation selects certain columns from the table and discards the
other columns.
• In SQL, the PROJECT attribute list is specified in the SELECT clause of a query.
• The result of the PROJECT operation can be visualized as a vertical partition of
the relation into two relations:
- one has the needed columns (attributes) and contains the result of the operation
- the other contains the discarded columns.
• For example, to list each employee’s first and last name and salary, we can
use the PROJECT operation as follows:
π_{Lname, Fname, Salary}(EMPLOYEE)
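• The general form of the PROJECT operation is π_{<attribute list>}(R), where π (pi) is the
symbol for PROJECT and <attribute list> is the desired sublist of attributes of relation R.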
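Under the same list-of-dicts assumption, PROJECT can be sketched as below. Because a relation in the formal model is a set of tuples, PROJECT also eliminates duplicate tuples (SQL keeps duplicates unless SELECT DISTINCT is used). The `project` helper and the sample rows are hypothetical.

```python
# Hypothetical sample data.
EMPLOYEE = [
    {"Fname": "John", "Lname": "Smith", "Salary": 30000, "Sex": "M"},
    {"Fname": "Alicia", "Lname": "Zelaya", "Salary": 25000, "Sex": "F"},
    {"Fname": "Ahmad", "Lname": "Jabbar", "Salary": 25000, "Sex": "M"},
]


def project(relation, attributes):
    """PROJECT: keep only the listed attributes and drop duplicate tuples."""
    result, seen = [], set()
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # set semantics: each result tuple appears once
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


# π_{Lname, Fname, Salary}(EMPLOYEE)
print(project(EMPLOYEE, ["Lname", "Fname", "Salary"]))
# π_{Salary}(EMPLOYEE): 25000 occurs twice in EMPLOYEE but only once here
print(project(EMPLOYEE, ["Salary"]))  # [{'Salary': 30000}, {'Salary': 25000}]
```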
• Suppose we want to retrieve the name of the manager of each department. To get the manager’s
name, we need to combine each department tuple with the employee tuple whose Ssn value
matches the Mgr_ssn value in the department tuple. We do this by using the JOIN operation and
then projecting the result over the necessary attributes:
DEPT_MGR ← DEPARTMENT ⋈_{Mgr_ssn=Ssn} EMPLOYEE
RESULT ← π_{Dname, Lname, Fname}(DEPT_MGR)
• Mgr_ssn is a foreign key of the DEPARTMENT relation that references
Ssn, the primary key of the EMPLOYEE relation. This referential integrity
constraint plays a role in having matching tuples in the referenced
relation EMPLOYEE.
• The JOIN operation can be specified as a CARTESIAN PRODUCT operation followed by a
SELECT operation.
• Consider the previous CARTESIAN PRODUCT example.
• A JOIN operation with a general join condition of the form <condition> AND <condition>
AND … AND <condition> is called a THETA JOIN, where each <condition> is of the form
Ai θ Bj, Ai is an attribute of R, Bj is an attribute of S, Ai and Bj have the same domain,
and θ (theta) is one of the comparison operators {=, <, ≤, >, ≥, ≠}.
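The manager query above can be sketched by literally computing DEPARTMENT × EMPLOYEE and then applying SELECT with the condition Mgr_ssn = Ssn, which is how JOIN is defined here. The helper names and sample rows are hypothetical; because the condition is an arbitrary predicate, the same `theta_join` also covers comparisons other than =. Real systems avoid materializing the full product.

```python
from itertools import product

# Hypothetical sample relations (attribute names of the two relations are disjoint).
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555"},
    {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "987654321"},
]
EMPLOYEE = [
    {"Fname": "Franklin", "Lname": "Wong", "Ssn": "333445555"},
    {"Fname": "Jennifer", "Lname": "Wallace", "Ssn": "987654321"},
    {"Fname": "John", "Lname": "Smith", "Ssn": "123456789"},
]


def cartesian_product(r, s):
    """R × S: combine every tuple of R with every tuple of S."""
    return [{**tr, **ts} for tr, ts in product(r, s)]


def theta_join(r, s, condition):
    """R ⋈_condition S, computed as SELECT applied to R × S."""
    return [t for t in cartesian_product(r, s) if condition(t)]


# DEPARTMENT ⋈_{Mgr_ssn=Ssn} EMPLOYEE, then keep only the name attributes
dept_mgr = theta_join(DEPARTMENT, EMPLOYEE, lambda t: t["Mgr_ssn"] == t["Ssn"])
print([(t["Dname"], t["Lname"], t["Fname"]) for t in dept_mgr])
# [('Research', 'Wong', 'Franklin'), ('Administration', 'Wallace', 'Jennifer')]
```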
8.3.2 Variations of JOIN: The EQUIJOIN and NATURAL JOIN
• A JOIN in which the only comparison operator used is = is called an EQUIJOIN.
• In the result of an EQUIJOIN we always have one or more pairs of attributes that
have identical values in every tuple, because the equality join condition
specified on these attributes requires the values to be identical in every
tuple in the result.
• NATURAL JOIN, denoted by *, was created to get rid of the second, superfluous
attribute in an EQUIJOIN condition.
• NATURAL JOIN requires that the two join attributes have the same name
in both relations. If this is not the case, a renaming operation is applied
first.
• Suppose we want to combine each PROJECT tuple with the
DEPARTMENT tuple that controls the project. First we rename the
Dnumber attribute of DEPARTMENT to Dnum, so that it has the same
name as the Dnum attribute in PROJECT, and then we apply NATURAL
JOIN:
PROJ_DEPT ← PROJECT * ρ_{(Dname, Dnum, Mgr_ssn, Mgr_start_date)}(DEPARTMENT)
• The same query can be done in two steps by creating an intermediate
table DEPT as follows:
DEPT ← ρ_{(Dname, Dnum, Mgr_ssn, Mgr_start_date)}(DEPARTMENT)
PROJ_DEPT ← PROJECT * DEPT
• The attribute Dnum is called the join attribute for the NATURAL JOIN
operation, because it is the only attribute with the same name in
both relations.
• In the PROJ_DEPT relation, each tuple combines a PROJECT tuple with
the DEPARTMENT tuple for the department that controls the project,
but only one join attribute value is kept.
• If the attributes on which the natural join is specified already have the
same names in both relations, renaming is unnecessary.
• The join condition for NATURAL JOIN is constructed by equating each pair of
join attributes that have the same name in the two relations and combining
these conditions with AND. There can be a list of join attributes from each
relation, and each corresponding pair must have the same name.
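A sketch of NATURAL JOIN with a preceding rename, under the same list-of-dicts assumption. `rename` plays the role of ρ and `natural_join` equates every pair of same-named attributes, keeping a single copy of each; both helpers and the sample rows are hypothetical.

```python
# Hypothetical sample relations.
PROJECT = [
    {"Pname": "ProductX", "Pnumber": 1, "Dnum": 5},
    {"Pname": "Computerization", "Pnumber": 10, "Dnum": 4},
]
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555"},
    {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "987654321"},
]


def rename(relation, mapping):
    """ρ: rename attributes according to mapping {old_name: new_name}."""
    return [{mapping.get(a, a): v for a, v in t.items()} for t in relation]


def natural_join(r, s):
    """R * S: equate every pair of same-named attributes (AND-ed together)
    and keep one copy of each. Assumes all tuples of a relation share the
    same attribute names."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]


DEPT = rename(DEPARTMENT, {"Dnumber": "Dnum"})  # Dnumber -> Dnum
PROJ_DEPT = natural_join(PROJECT, DEPT)         # Dnum is the join attribute
for t in PROJ_DEPT:
    print(t)
```

Each result tuple carries a single Dnum attribute, which is exactly what distinguishes NATURAL JOIN from EQUIJOIN.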
• The NATURAL JOIN or EQUIJOIN operation can also be specified among
multiple tables, leading to an n-way join. For example, consider the following
three-way join:
((PROJECT ⋈_{Dnum=Dnumber} DEPARTMENT) ⋈_{Mgr_ssn=Ssn} EMPLOYEE)
• This combines each project tuple with its controlling department tuple into a
single tuple, and then combines that tuple with an employee tuple that is the
department manager. The net result is a consolidated relation in which each
tuple contains this project-department-manager combined information.
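The three-way join can be sketched by nesting two EQUIJOINs, assuming the same list-of-dicts representation; the `equijoin` helper and the sample rows are hypothetical. Unlike NATURAL JOIN, this EQUIJOIN keeps both attributes of each join pair (for example Dnum and Dnumber).

```python
# Hypothetical sample relations.
PROJECT = [
    {"Pname": "ProductX", "Dnum": 5},
    {"Pname": "Computerization", "Dnum": 4},
]
DEPARTMENT = [
    {"Dname": "Research", "Dnumber": 5, "Mgr_ssn": "333445555"},
    {"Dname": "Administration", "Dnumber": 4, "Mgr_ssn": "987654321"},
]
EMPLOYEE = [
    {"Lname": "Wong", "Ssn": "333445555"},
    {"Lname": "Wallace", "Ssn": "987654321"},
]


def equijoin(r, s, a, b):
    """R ⋈_{a=b} S: combine tuples whose attribute a (from R) equals attribute b (from S)."""
    return [{**tr, **ts} for tr in r for ts in s if tr[a] == ts[b]]


# project-department pairs first, then attach each department's manager
proj_dept = equijoin(PROJECT, DEPARTMENT, "Dnum", "Dnumber")
proj_dept_mgr = equijoin(proj_dept, EMPLOYEE, "Mgr_ssn", "Ssn")
for t in proj_dept_mgr:
    print(t["Pname"], t["Dname"], t["Lname"])
```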
8.3.3 A Complete Set of Relational Algebra Operations
• The set of relational algebra operations {σ, π, ∪, ρ, –, ×} is a complete set: any
other relational algebra operation can be expressed as a sequence of operations
from this set.
• For example, the INTERSECTION operation can be expressed by using UNION
and MINUS as follows:
R ∩ S ≡ (R ∪ S) – ((R – S) ∪ (S – R))
• The JOIN operation can be specified as a CARTESIAN PRODUCT followed by a
SELECT operation.
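The INTERSECTION identity can be checked on small union-compatible relations encoded as Python sets of tuples, a hypothetical encoding in which `|` acts as UNION and `-` as MINUS:

```python
# Union-compatible relations as sets of 1-tuples (hypothetical values).
R = {("Ramesh",), ("Susan",), ("Johnny",), ("Barbara",)}
S = {("Susan",), ("Ramesh",), ("Francis",)}


def intersection_via_union_minus(r, s):
    """R ∩ S rewritten using only UNION and MINUS: (R ∪ S) – ((R – S) ∪ (S – R))."""
    return (r | s) - ((r - s) | (s - r))


# Compare with Python's built-in set intersection.
assert intersection_via_union_minus(R, S) == R & S
print(sorted(intersection_via_union_minus(R, S)))  # [('Ramesh',), ('Susan',)]
```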
• Next, create a relation that includes a tuple <Pno, Essn> whenever the employee whose Ssn is Essn
works on the project whose number is Pno in the intermediate relation SSN_PNOS:
SSN_PNOS ← π_{Essn, Pno}(WORKS_ON)
• Finally, apply the DIVISION operation to the two relations, which gives the desired employees’
Social Security numbers:
• This means that, for a tuple t to appear in the result T of the
DIVISION, the values in t must appear in R in combination with every
tuple in S.
• In the DIVISION operation, the tuples in the denominator relation S
restrict the numerator relation R by selecting those tuples in the
result that match all values present in the denominator.
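A sketch of DIVISION under the list-of-dicts assumption: T = R ÷ S keeps each value combination over the attributes of R that are not in S, provided it appears in R together with every tuple of S. The `divide` helper and the data are hypothetical; TARGET_PNOS stands in for whichever set of project numbers the query must match.

```python
def divide(r, s):
    """T = R ÷ S: tuples over (attributes of R not in S) that appear in R
    combined with EVERY tuple of S. Assumes r and s are non-empty and that
    all tuples of a relation have the same attributes."""
    s_attrs = list(s[0])
    t_attrs = [a for a in r[0] if a not in s_attrs]
    # split each tuple of R into (its T part, its S part)
    r_pairs = {(tuple(t[a] for a in t_attrs), tuple(t[a] for a in s_attrs)) for t in r}
    s_rows = {tuple(t[a] for a in s_attrs) for t in s}
    candidates = {x for x, _ in r_pairs}
    return sorted(x for x in candidates if all((x, y) in r_pairs for y in s_rows))


# Hypothetical numerator SSN_PNOS(Essn, Pno) and denominator TARGET_PNOS(Pno).
SSN_PNOS = [
    {"Essn": "123456789", "Pno": 1}, {"Essn": "123456789", "Pno": 2},
    {"Essn": "453453453", "Pno": 1}, {"Essn": "453453453", "Pno": 2},
    {"Essn": "333445555", "Pno": 2}, {"Essn": "333445555", "Pno": 3},
]
TARGET_PNOS = [{"Pno": 1}, {"Pno": 2}]

print(divide(SSN_PNOS, TARGET_PNOS))  # [('123456789',), ('453453453',)]
```

Employee 333445555 is excluded because it never appears with Pno 1.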
8.3.5 Notation for Query Trees
• A query tree (also called a query evaluation tree or query execution tree) is a
notation used in an RDBMS to represent queries internally.
• It includes the relational algebra operations being executed and is used as a
possible data structure for the internal representation of the query in an
RDBMS.
• A query tree is a tree data structure that corresponds to a relational algebra
expression. It represents the input relations of the query as leaf nodes of the
tree, and represents the relational algebra operations as internal nodes. An
execution of the query tree consists of executing an internal node operation
whenever its operands (represented by its child nodes) are available, and then
replacing that internal node by the relation that results from executing the
operation. The execution terminates when the root node is executed and
produces the result relation for the query.
Example query: For every project located in ‘Stafford’, list the project number, the
controlling department number, and the department manager’s last
name, address, and birth date.
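The query tree for this example can be sketched as nested Python objects, assuming the expression π_{Pnumber, Dnum, Lname, Address, Bdate}(((σ_{Plocation='Stafford'}(PROJECT)) ⋈_{Dnum=Dnumber} DEPARTMENT) ⋈_{Mgr_ssn=Ssn} EMPLOYEE). The attributes Plocation, Pnumber, Address and Bdate, the `Node` class and the sample rows are assumptions for illustration; the final projection skips duplicate elimination for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """A query-tree node: leaves hold input relations, internal nodes hold an operation."""
    op: Optional[Callable] = None
    children: List["Node"] = field(default_factory=list)
    relation: Optional[list] = None


def execute(node):
    """Execute an internal node once its operands (children) are available,
    then replace the node by the relation it produced."""
    if node.children:
        operands = [execute(child) for child in node.children]
        node.relation = node.op(*operands)
        node.children = []  # the node now stands for its result relation
    return node.relation


# Hypothetical sample relations.
PROJECT = [{"Pnumber": 10, "Plocation": "Stafford", "Dnum": 4},
           {"Pnumber": 1, "Plocation": "Bellaire", "Dnum": 5}]
DEPARTMENT = [{"Dnumber": 4, "Mgr_ssn": "987654321"},
              {"Dnumber": 5, "Mgr_ssn": "333445555"}]
EMPLOYEE = [
    {"Ssn": "987654321", "Lname": "Wallace", "Address": "291 Berry, Bellaire, TX", "Bdate": "1941-06-20"},
    {"Ssn": "333445555", "Lname": "Wong", "Address": "638 Voss, Houston, TX", "Bdate": "1955-12-08"},
]

# Leaves are input relations; internal nodes are σ, ⋈ and π.
stafford = Node(op=lambda p: [t for t in p if t["Plocation"] == "Stafford"],
                children=[Node(relation=PROJECT)])
join_dept = Node(op=lambda p, d: [{**a, **b} for a in p for b in d if a["Dnum"] == b["Dnumber"]],
                 children=[stafford, Node(relation=DEPARTMENT)])
join_mgr = Node(op=lambda pd, e: [{**a, **b} for a in pd for b in e if a["Mgr_ssn"] == b["Ssn"]],
                children=[join_dept, Node(relation=EMPLOYEE)])
root = Node(op=lambda r: [{k: t[k] for k in ("Pnumber", "Dnum", "Lname", "Address", "Bdate")} for t in r],
            children=[join_mgr])

result = execute(root)  # execution ends when the root node has been executed
print(result)
```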
8.4 Additional Relational Operations
• Some common database requests cannot be performed with the original relational
algebra operations.
• Additional operations to express these requests are required. These operations
enhance the expressive power of the original relational algebra.
8.4.1 Generalized Projection
• The generalized projection operation extends the projection operation by allowing
functions of attributes to be included in the projection list. The generalized form
can be expressed as:
π_{F1, F2, …, Fn}(R)
• F1, F2, … , Fn are functions over the attributes in relation R and may involve
arithmetic operations and constant values.
• This operation is helpful when developing reports where computed values have to
be produced in the columns of a query result.
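Generalized projection can be sketched by giving each output column a function of the input tuple. The attributes Deduction and Years_service, the output names Net_salary, Bonus and Tax, and the formulas are hypothetical report columns; for brevity the sketch does not eliminate duplicate result tuples.

```python
# Hypothetical sample data.
EMPLOYEE = [
    {"Ssn": "123456789", "Salary": 30000, "Deduction": 2000, "Years_service": 5},
    {"Ssn": "987654321", "Salary": 43000, "Deduction": 3000, "Years_service": 10},
]


def generalized_project(relation, functions):
    """π_{F1, ..., Fn}(R): each output column is a named function of the input tuple."""
    return [{name: f(t) for name, f in functions.items()} for t in relation]


report = generalized_project(EMPLOYEE, {
    "Ssn": lambda t: t["Ssn"],                              # plain attribute
    "Net_salary": lambda t: t["Salary"] - t["Deduction"],   # arithmetic on attributes
    "Bonus": lambda t: 2000 * t["Years_service"],           # constant times attribute
    "Tax": lambda t: 0.25 * t["Salary"],
})
print(report[0])  # {'Ssn': '123456789', 'Net_salary': 28000, 'Bonus': 10000, 'Tax': 7500.0}
```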
8.4.2 Aggregate Functions and Grouping
• Another type of request that cannot be expressed in the basic relational algebra is to
specify mathematical aggregate functions on collections of values from the database.
• For example, retrieving the average or total salary of all employees, or the total number of
employee tuples.
• Common functions applied to collections of numeric values include SUM, AVERAGE,
MAXIMUM, and MINIMUM. The COUNT function is used for counting tuples or
values.
• Another common type of request involves grouping the tuples in a relation by the
value of some of their attributes and then applying an aggregate function
independently to each group.
• For example, we can group EMPLOYEE tuples by Dno, so that each group includes the tuples for
employees working in the same department. We can then list each Dno value along
with the average salary of employees within the department, or the number of
employees who work in the department.
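• The AGGREGATE FUNCTION operation is written <grouping attributes> ℑ <function list> (R),
where <grouping attributes> is a list of attributes of R and <function list> is a list of
(<function> <attribute>) pairs, each <function> being SUM, AVERAGE, MAXIMUM, MINIMUM, or COUNT.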
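A sketch of grouping followed by aggregation: EMPLOYEE tuples are grouped by Dno, and COUNT and AVERAGE of Salary are computed independently for each group, giving one result tuple per Dno value. The helper, the result attribute names Count and Average, and the rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data.
EMPLOYEE = [
    {"Lname": "Smith", "Dno": 5, "Salary": 30000},
    {"Lname": "Wong", "Dno": 5, "Salary": 40000},
    {"Lname": "Zelaya", "Dno": 4, "Salary": 25000},
    {"Lname": "Wallace", "Dno": 4, "Salary": 43000},
    {"Lname": "Borg", "Dno": 1, "Salary": 55000},
]


def group_count_average(relation, group_attr, agg_attr):
    """Group tuples by group_attr, then apply COUNT and AVERAGE to agg_attr in each group."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[group_attr]].append(t[agg_attr])
    return [{group_attr: g, "Count": len(vals), "Average": sum(vals) / len(vals)}
            for g, vals in sorted(groups.items())]


by_dept = group_count_average(EMPLOYEE, "Dno", "Salary")
for row in by_dept:
    print(row)
```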
Step 3: Mapping of Binary 1:1 Relationship Types.
• For each binary 1:1 relationship type R in the ER schema, identify the
relations S and T that correspond to the entity types participating in R.
• There are three possible approaches:
- foreign key approach
- merged relationship approach
- cross reference or relationship relation approach
1. Foreign key approach:
• Choose one of the relations, say S, and include as a foreign key in S the primary key
of T.
• It is better to choose an entity type with total participation in R in the role of S.
• Include all the simple attributes of the 1:1 relationship type R as attributes of S.
• In our example, we map the 1:1 relationship type MANAGES by choosing the
participating entity type DEPARTMENT to serve in the role of S because its
participation in the MANAGES relationship type is total (every department has
a manager).
• We include the primary key of the EMPLOYEE relation as a foreign key in the
DEPARTMENT relation and rename it Mgr_ssn.
• We also include the simple attribute Start_date of the MANAGES relationship
type in the DEPARTMENT relation and rename it Mgr_start_date.
2. Merged relation approach:
• An alternative mapping of a 1:1 relationship type is to merge the two entity types
and the relationship into a single relation.
• This is possible when both participations are total, as this would indicate that the
two tables will have the exact same number of tuples at all times.
3. Cross-reference or relationship relation approach:
• Set up a third relation R for the purpose of cross-referencing the
primary keys of the two relations S and T representing the entity types.
• It is required for binary M:N relationships.
• The relation R is called a relationship relation, because each tuple in R
represents a relationship instance that relates one tuple from S with
one tuple from T.
• The relation R will include the primary key attributes of S and T as
foreign keys.
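The foreign key approach for MANAGES can be sketched as SQL DDL run through Python's built-in sqlite3 module (SQLite is an assumption here; any SQL DBMS would do). NOT NULL on Mgr_ssn reflects DEPARTMENT's total participation, and UNIQUE keeps the relationship 1:1 so that no employee manages two departments; column types and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (
    Ssn   CHAR(9) PRIMARY KEY,
    Lname TEXT
);
-- DEPARTMENT plays the role of S: its participation in MANAGES is total.
CREATE TABLE DEPARTMENT (
    Dnumber        INTEGER PRIMARY KEY,
    Dname          TEXT,
    Mgr_ssn        CHAR(9) NOT NULL UNIQUE REFERENCES EMPLOYEE(Ssn),
    Mgr_start_date DATE  -- simple attribute Start_date of MANAGES, renamed
);
""")


def insert_ok(sql, params):
    """Return True if the INSERT succeeds, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


add_dept = "INSERT INTO DEPARTMENT VALUES (?, ?, ?, ?)"
print(insert_ok("INSERT INTO EMPLOYEE VALUES (?, ?)", ("333445555", "Wong")))  # True
print(insert_ok(add_dept, (5, "Research", "333445555", "1988-05-22")))        # True
print(insert_ok(add_dept, (6, "Sales", "333445555", None)))  # False: Wong already manages one
print(insert_ok(add_dept, (7, "Audit", "999999999", None)))  # False: no such employee
```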
Step 4: Mapping of Binary 1:N Relationship Types.
• Two possible approaches:
(1) the foreign key approach
(2) the cross-reference or relationship relation approach.
1. The foreign key approach
• For each regular binary 1:N relationship type R, identify the relation S that represents the
participating entity type at the N-side of the relationship type.
• Include as foreign key in S the primary key of the relation T that represents the other
entity type participating in R.
• This works because each entity instance on the N-side is related to at most one entity
instance on the 1-side of the relationship type.
• Include any simple attributes of the 1:N relationship type as attributes of S.
• For example, we map the 1:N relationship types WORKS_FOR, CONTROLS, and SUPERVISION as follows.
• For WORKS_FOR we include the primary key Dnumber of the DEPARTMENT relation as
foreign key in the EMPLOYEE relation and call it Dno.
• For SUPERVISION we include the primary key of the EMPLOYEE relation as a foreign key in
the EMPLOYEE relation itself, because the relationship is recursive, and call it Super_ssn.
• The CONTROLS relationship is mapped to the foreign key attribute Dnum of PROJECT,
which references the primary key Dnumber of the DEPARTMENT relation.
2. The relationship relation approach:
• We create a separate relation R whose attributes are the primary keys
of S and T.
• The relation R is called a relationship relation, because each tuple in R
represents a relationship instance that relates one tuple from S with
one tuple from T.
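A sketch of the foreign key approach for the three 1:N relationships, again using SQLite via Python's sqlite3 as an assumed target. Each foreign key sits on the N-side relation, and Super_ssn references EMPLOYEE itself. Column types and rows are hypothetical, and only the attributes needed here are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (
    Dnumber INTEGER PRIMARY KEY,
    Dname   TEXT
);
CREATE TABLE EMPLOYEE (
    Ssn       CHAR(9) PRIMARY KEY,
    Lname     TEXT,
    Dno       INTEGER REFERENCES DEPARTMENT(Dnumber),  -- WORKS_FOR
    Super_ssn CHAR(9) REFERENCES EMPLOYEE(Ssn)         -- SUPERVISION (recursive)
);
CREATE TABLE PROJECT (
    Pnumber INTEGER PRIMARY KEY,
    Pname   TEXT,
    Dnum    INTEGER REFERENCES DEPARTMENT(Dnumber)     -- CONTROLS
);
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")
conn.execute("INSERT INTO EMPLOYEE VALUES ('888665555', 'Borg', 5, NULL)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('333445555', 'Wong', 5, '888665555')")
conn.execute("INSERT INTO PROJECT VALUES (1, 'ProductX', 5)")
conn.commit()

# Follow the recursive foreign key from each employee to his or her supervisor.
rows = conn.execute("""
    SELECT E.Lname, S.Lname
    FROM EMPLOYEE E JOIN EMPLOYEE S ON E.Super_ssn = S.Ssn
""").fetchall()
print(rows)  # [('Wong', 'Borg')]
```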
Step 5: Mapping of Binary M:N Relationship Types.
• For each binary M:N relationship type R, create a new relation S to
represent R.
• Include as foreign key attributes in S the primary keys of the relations
that represent the participating entity types; their combination will
form the primary key of S.
• Include any simple attributes of the M:N relationship type as attributes
of S.
• We cannot represent an M:N relationship type by a single foreign key
attribute in one of the participating relations because of the M:N
cardinality ratio; we must create a separate relationship relation S.
• We map the M:N relationship type WORKS_ON by creating the relation
WORKS_ON. We include the primary keys of the PROJECT and EMPLOYEE relations
as foreign keys in WORKS_ON and rename them Pno and Essn, respectively.
• We also include an attribute Hours in WORKS_ON to represent the Hours attribute
of the relationship type. The primary key of the WORKS_ON relation is the
combination of the foreign key attributes {Essn, Pno}.
• The propagate (CASCADE) option for the referential triggered action should be
specified on the foreign keys in the relation corresponding to the relationship R,
since each relationship instance has an existence dependency on each of the
entities it relates. This can be used for both ON UPDATE and ON DELETE.
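A sketch of the WORKS_ON relationship relation with the CASCADE option on both foreign keys, using SQLite via Python's sqlite3 (an assumption; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON). The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn CHAR(9) PRIMARY KEY, Lname TEXT);
CREATE TABLE PROJECT  (Pnumber INTEGER PRIMARY KEY, Pname TEXT);
CREATE TABLE WORKS_ON (
    Essn  CHAR(9) REFERENCES EMPLOYEE(Ssn)    ON DELETE CASCADE ON UPDATE CASCADE,
    Pno   INTEGER REFERENCES PROJECT(Pnumber) ON DELETE CASCADE ON UPDATE CASCADE,
    Hours REAL,
    PRIMARY KEY (Essn, Pno)
);
INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith'), ('453453453', 'English');
INSERT INTO PROJECT  VALUES (1, 'ProductX'), (2, 'ProductY');
INSERT INTO WORKS_ON VALUES ('123456789', 1, 32.5), ('123456789', 2, 7.5),
                            ('453453453', 1, 20.0), ('453453453', 2, 20.0);
""")

with conn:  # ON DELETE CASCADE: Smith's WORKS_ON tuples are removed as well
    conn.execute("DELETE FROM EMPLOYEE WHERE Ssn = '123456789'")
with conn:  # ON UPDATE CASCADE: the new project number propagates to WORKS_ON.Pno
    conn.execute("UPDATE PROJECT SET Pnumber = 20 WHERE Pnumber = 2")

remaining = conn.execute("SELECT Essn, Pno FROM WORKS_ON ORDER BY Pno").fetchall()
print(remaining)  # [('453453453', 1), ('453453453', 20)]
```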
Step 6: Mapping of Multivalued Attributes.
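• For each multivalued attribute A, create a new relation R.
• R includes an attribute corresponding to A, plus the primary key attribute K of the
relation that represents the entity type or relationship type that has A as an attribute;
K is included in R as a foreign key.
• The primary key of R is the combination of A and K.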
• If the multivalued attribute is composite, we include its simple components.
• For example, we create a relation DEPT_LOCATIONS.
• The attribute Dlocation represents the multivalued attribute LOCATIONS of
DEPARTMENT, whereas Dnumber, as a foreign key, represents the primary
key of the DEPARTMENT relation.
• The primary key of DEPT_LOCATIONS is the combination of {Dnumber,
Dlocation}.
• A separate tuple will exist in DEPT_LOCATIONS for each location that a
department has.
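A sketch of DEPT_LOCATIONS in SQLite via Python's sqlite3 (assumed target, hypothetical rows): the composite key {Dnumber, Dlocation} allows one tuple per location of a department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT);
-- One tuple per (department, location) pair; the key is the combination.
CREATE TABLE DEPT_LOCATIONS (
    Dnumber   INTEGER REFERENCES DEPARTMENT(Dnumber),
    Dlocation TEXT,
    PRIMARY KEY (Dnumber, Dlocation)
);
INSERT INTO DEPARTMENT VALUES (5, 'Research');
INSERT INTO DEPT_LOCATIONS VALUES (5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston');
""")

locations = [loc for (loc,) in conn.execute(
    "SELECT Dlocation FROM DEPT_LOCATIONS WHERE Dnumber = 5 ORDER BY Dlocation")]
print(locations)  # ['Bellaire', 'Houston', 'Sugarland']
```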
Step 7: Mapping of N-ary Relationship Types
• For each n-ary relationship type R, where n > 2:
- Create a new relation S to represent R.
- Include primary keys of participating entity types as foreign keys.
- Include any simple attributes as attributes.
- The primary key of S is usually a combination of all the foreign keys that
reference the relations representing the participating entity types.
- Consider the ternary relationship type SUPPLY which relates a SUPPLIER
s, PART p, and PROJECT j whenever s is currently supplying p to j; this
can be mapped to the relation SUPPLY whose primary key is the
combination of the three foreign keys {Sname, Part_no, Proj_name}.
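A sketch of the SUPPLY relation in SQLite via Python's sqlite3 (assumed target). The Quantity attribute and the single-column keys of SUPPLIER, PART and PROJECT are hypothetical simplifications; the composite primary key rejects a second tuple for the same supplier, part and project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SUPPLIER (Sname TEXT PRIMARY KEY);
CREATE TABLE PART     (Part_no INTEGER PRIMARY KEY);
CREATE TABLE PROJECT  (Proj_name TEXT PRIMARY KEY);
CREATE TABLE SUPPLY (
    Sname     TEXT    REFERENCES SUPPLIER(Sname),
    Part_no   INTEGER REFERENCES PART(Part_no),
    Proj_name TEXT    REFERENCES PROJECT(Proj_name),
    Quantity  INTEGER,  -- hypothetical simple attribute of SUPPLY
    PRIMARY KEY (Sname, Part_no, Proj_name)
);
INSERT INTO SUPPLIER VALUES ('Acme');
INSERT INTO PART     VALUES (101), (102);
INSERT INTO PROJECT  VALUES ('ProductX');
INSERT INTO SUPPLY   VALUES ('Acme', 101, 'ProductX', 40), ('Acme', 102, 'ProductX', 15);
""")

# A second tuple with the same (supplier, part, project) combination violates the key.
try:
    conn.execute("INSERT INTO SUPPLY VALUES ('Acme', 101, 'ProductX', 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```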