
02 - Relational Algebra


L4: Relational Algebra

Relational Query Languages


- A major strength of the relational model: it supports simple, powerful querying of data.
- Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.
  - The key: precise semantics for relational queries.
  - This allows the optimizer to extensively re-order operations and still ensure that the answer does not change.
- Relational languages:
  - Relational algebra
  - Relational calculus
  - Commercial languages (SQL)
Relational Algebra
DBMS-L4: Relational Algebra -- 2
Relational Query Languages
- Query languages: allow manipulation and retrieval of data from a database.
- The relational model supports simple, powerful QLs:
  - Strong formal foundation based on logic.
  - Allows for much optimization.
- Query languages != programming languages!
  - QLs are not expected to be "Turing complete".
  - QLs are not intended to be used for complex calculations.
  - QLs support easy, efficient access to large data sets.

Formal Relational Query Languages
Two mathematical query languages form the basis for "real" languages (e.g., SQL) and for implementation:
- Relational Algebra: more operational, very useful for representing execution plans.
- Relational Calculus: lets users describe what they want, rather than how to compute it. (Non-operational, declarative.)

Understanding algebra and calculus is key to understanding SQL and query processing!
Preliminaries
- A query is applied to relation instances, and the result of a query is also a relation instance.
  - Schemas of the input relations for a query are fixed (but the query will run regardless of instance!).
  - The schema for the result of a given query is also fixed! It is determined by the definitions of the query language constructs.
- Positional vs. named-field notation:
  - Positional notation is easier for formal definitions; named-field notation is more readable.
  - Both are used in SQL.

Example Instances
- "Sailors", "Reserves", and "Boats" relations for our examples:
  - Sailors (sid, sname, rating, age)
  - Reserves (sid, bid, day)
  - Boats (bid, name, color)
- We'll use positional or named-field notation, and assume that the names of fields in query results are "inherited" from the names of fields in the query's input relations.

R1 (instance of Reserves):
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1 (instance of Sailors):
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

S2 (instance of Sailors):
sid  sname   rating  age
28   yuppy   9       35.0
31   lubber  8       55.5
44   guppy   5       35.0
58   rusty   10      35.0
Relational Algebra
- Expressions consist of operands (relations) and operators, without looping.
- Basic operations:
  - Selection (σ): selects a subset of rows from a relation.
  - Projection (π or Π): deletes unwanted columns from a relation.
  - Cross-product (×): allows us to combine two relations.
  - Set-difference (−): tuples in one relation but not in the other.
  - Union (∪): tuples in either of two relations (or in both).
- Additional operations:
  - Intersection, join, division, renaming: not essential, but (very!) useful.
- Since each operation returns a relation, operations can be composed! (The algebra is "closed".)

Selection
- Selects rows that satisfy the selection condition.
- No duplicates in the result! (Why?)
- Schema of the result is identical to the schema of the (only) input relation.
- The result relation can be the input for another relational algebra operation! (Operator composition.)

$\sigma_{rating>8}(S2)$:
sid  sname  rating  age
28   yuppy  9       35.0
58   rusty  10      35.0

$\pi_{sname,rating}(\sigma_{rating>8}(S2))$:
sname  rating
yuppy  9
rusty  10
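Selection and operator composition can be sketched in Python. This is a minimal sketch under an assumed representation that is not part of the slides: a relation is a `(fields, rows)` pair, where `fields` is a tuple of attribute names and `rows` is a `frozenset` of tuples, so set semantics rule out duplicates automatically. The helper names `select` and `project` are illustrative.

```python
# Assumed representation (illustrative): a relation is (fields, rows), where
# fields is a tuple of attribute names and rows is a frozenset of tuples.
S2 = (("sid", "sname", "rating", "age"),
      frozenset({(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
                 (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}))


def select(pred, rel):
    """Selection: keep rows whose named-field view satisfies pred.

    The output schema is the input schema, unchanged."""
    fields, rows = rel
    return fields, frozenset(t for t in rows if pred(dict(zip(fields, t))))


def project(attrs, rel):
    """Projection: keep only attrs; building a frozenset drops duplicates."""
    fields, rows = rel
    idx = [fields.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows)


high = select(lambda t: t["rating"] > 8, S2)   # sigma_{rating>8}(S2)
names = project(("sname", "rating"), high)      # pi applied to that result
print(sorted(names[1]))  # [('rusty', 10), ('yuppy', 9)]
```

Because `select` returns a relation of the same shape it consumes, its output can be fed straight into `project`, which is the closure property in miniature.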
Projection
- Deletes attributes that are not in the projection list.
- Schema of the result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.
- The projection operator has to eliminate duplicates! (Why??)
- Note: real systems typically don't do duplicate elimination unless the user explicitly asks for it. (Why not?)

$\pi_{sname,rating}(S2)$:
sname   rating
yuppy   9
lubber  8
guppy   5
rusty   10

$\pi_{age}(S2)$:
age
35.0
55.5
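The contrast between algebraic projection and what real systems usually do can be sketched by comparing a set-based projection with a list-based ("bag") one. This is an illustrative sketch, assuming the same hypothetical `(fields, frozenset of rows)` representation; `project_bag` only loosely mirrors SQL's behaviour without `DISTINCT`.

```python
S2 = (("sid", "sname", "rating", "age"),
      frozenset({(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
                 (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}))


def project(attrs, rel):
    """Set-semantics projection, as in the algebra: duplicates are removed."""
    fields, rows = rel
    idx = [fields.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows)


def project_bag(attrs, rel):
    """Bag-semantics projection: one output row per input row, duplicates kept."""
    fields, rows = rel
    idx = [fields.index(a) for a in attrs]
    return tuple(attrs), [tuple(t[i] for i in idx) for t in rows]


ages_set = project(("age",), S2)
ages_bag = project_bag(("age",), S2)
print(sorted(ages_set[1]))  # [(35.0,), (55.5,)]
print(sorted(ages_bag[1]))  # [(35.0,), (35.0,), (35.0,), (55.5,)]
```

The set version has to hash every output row to detect repeats; the bag version skips that work.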
Union, Intersection, Set-Difference
- All of these operations take two input relations, which must be union-compatible:
  - Same number of fields.
  - "Corresponding" fields have the same type.
- You almost always want to do these for input relations whose schema contains a key!
- What is the schema of the result?

$S1 \cup S2$:
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0
44   guppy   5       35.0
28   yuppy   9       35.0

$S1 - S2$:
sid  sname   rating  age
22   dustin  7       45.0

$S1 \cap S2$:
sid  sname   rating  age
31   lubber  8       55.5
58   rusty   10      35.0
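The three set operations map directly onto Python's set operators once union-compatibility is checked. This is a minimal sketch under assumptions: the check only compares the number of fields (field types are not verified), and the result takes its field names from the left operand, which is a choice made here for illustration.

```python
S1 = (("sid", "sname", "rating", "age"),
      frozenset({(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                 (58, "rusty", 10, 35.0)}))
S2 = (("sid", "sname", "rating", "age"),
      frozenset({(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
                 (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)}))


def check_union_compatible(r, s):
    """Simplified check: same number of fields (types are not checked here)."""
    if len(r[0]) != len(s[0]):
        raise ValueError("relations are not union-compatible")


def union(r, s):
    check_union_compatible(r, s)
    return r[0], r[1] | s[1]   # field names taken from the left operand


def intersection(r, s):
    check_union_compatible(r, s)
    return r[0], r[1] & s[1]


def difference(r, s):
    check_union_compatible(r, s)
    return r[0], r[1] - s[1]


print(len(union(S1, S2)[1]))             # 5
print(sorted(difference(S1, S2)[1]))     # [(22, 'dustin', 7, 45.0)]
print(sorted(intersection(S1, S2)[1]))
```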
Cross-Product
- Each row of S1 is paired with each row of R1.
- The result schema has one field per field of S1 and R1, with field names "inherited" if possible.
  - Conflict: both S1 and R1 have a field called sid.

(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  22     101  10/10/96
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  22     101  10/10/96
31     lubber  8       55.5  58     103  11/12/96
58     rusty   10      35.0  22     101  10/10/96
58     rusty   10      35.0  58     103  11/12/96

- Renaming operator: $\rho(C(1 \to sid1, 5 \to sid2), S1 \times R1)$ names the result C and renames fields 1 and 5 (the two sid fields) to sid1 and sid2.
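Cross-product and positional renaming can be sketched as follows, using the S1 and R1 instances above. This assumes the same illustrative `(fields, frozenset of rows)` representation; the helpers `cross` and `rename` are hypothetical names, and the name C is just a Python variable standing in for the renamed result.

```python
S1 = (("sid", "sname", "rating", "age"),
      frozenset({(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                 (58, "rusty", 10, 35.0)}))
R1 = (("sid", "bid", "day"),
      frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))


def cross(r, s):
    """Cross-product: concatenate every row of r with every row of s.

    Field names are concatenated too, so a name may appear twice."""
    (rf, rr), (sf, sr) = r, s
    return rf + sf, frozenset(a + b for a in rr for b in sr)


def rename(positions, rel):
    """Rename fields by 1-based position; the rows are left untouched."""
    fields, rows = rel
    new_fields = tuple(positions.get(i, f) for i, f in enumerate(fields, start=1))
    return new_fields, rows


product = cross(S1, R1)
C = rename({1: "sid1", 5: "sid2"}, product)
print(product[0])  # ('sid', 'sname', 'rating', 'age', 'sid', 'bid', 'day')
print(C[0])        # ('sid1', 'sname', 'rating', 'age', 'sid2', 'bid', 'day')
print(len(C[1]))   # 6
```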


Joins
- Condition join: $R \bowtie_c S = \sigma_c(R \times S)$

$S1 \bowtie_{S1.sid < R1.sid} R1$:
(sid)  sname   rating  age   (sid)  bid  day
22     dustin  7       45.0  58     103  11/12/96
31     lubber  8       55.5  58     103  11/12/96

- Result schema same as that of the cross-product.
- Fewer tuples than the cross-product; it might be possible to compute it more efficiently.
- Sometimes called a theta-join.
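The definition $R \bowtie_c S = \sigma_c(R \times S)$ can be transcribed literally. This sketch assumes the same illustrative representation and uses a positional predicate (hypothetical helper `select_pos`) because the two sid fields share a name; it materializes the full cross-product, which is exactly what a real system would try to avoid.

```python
S1 = (("sid", "sname", "rating", "age"),
      frozenset({(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                 (58, "rusty", 10, 35.0)}))
R1 = (("sid", "bid", "day"),
      frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))


def cross(r, s):
    """Cross-product of two relations."""
    (rf, rr), (sf, sr) = r, s
    return rf + sf, frozenset(a + b for a in rr for b in sr)


def select_pos(pred, rel):
    """Selection with a positional predicate (handy when field names clash)."""
    fields, rows = rel
    return fields, frozenset(t for t in rows if pred(t))


def theta_join(r, s, cond):
    """Condition join, defined literally as sigma_c(R x S)."""
    return select_pos(cond, cross(r, s))


# In S1 x R1, S1.sid is at position 0 and R1.sid at position 4 (0-based).
J = theta_join(S1, R1, lambda t: t[0] < t[4])
for row in sorted(J[1]):
    print(row)
```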
Joins
- Equi-join: a special case of condition join where the condition c contains only equalities.

$S1 \bowtie_{sid} R1$:
sid  sname   rating  age   bid  day
22   dustin  7       45.0  101  10/10/96
58   rusty   10      35.0  103  11/12/96

- Result schema similar to the cross-product, but only one copy of the fields for which equality is specified.
- Natural join: equi-join on all common fields.
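A natural join can be sketched by matching rows on every field name the two schemas share and keeping one copy of each shared field. This assumes the illustrative `(fields, frozenset of rows)` representation; `natural_join` is a hypothetical helper. Since sid is the only common field of S1 and R1, the result equals $S1 \bowtie_{sid} R1$.

```python
S1 = (("sid", "sname", "rating", "age"),
      frozenset({(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                 (58, "rusty", 10, 35.0)}))
R1 = (("sid", "bid", "day"),
      frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))


def natural_join(r, s):
    """Equi-join on all common field names, keeping one copy of each."""
    (rf, rr), (sf, sr) = r, s
    common = [f for f in rf if f in sf]
    keep = [j for j, f in enumerate(sf) if f not in common]  # s-only fields
    rows = frozenset(
        a + tuple(b[j] for j in keep)
        for a in rr
        for b in sr
        if all(a[rf.index(f)] == b[sf.index(f)] for f in common)
    )
    return rf + tuple(sf[j] for j in keep), rows


N = natural_join(S1, R1)
print(N[0])  # ('sid', 'sname', 'rating', 'age', 'bid', 'day')
for row in sorted(N[1]):
    print(row)
```

With no common fields, `all([])` is true for every pair, so this sketch degenerates to the cross-product.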
Division
- Not supported as a primitive operator, but useful for expressing queries like: Find sailors who have reserved all boats.
- Let A have 2 fields, x and y; let B have only field y:
  - $A/B = \{\langle x \rangle \mid \langle x \rangle \in \pi_x(A) \wedge \forall \langle y \rangle \in B : \langle x, y \rangle \in A\}$
  - i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple in A.
  - Or: if the set of y values (boats) associated with an x value (sailor) in A contains all y values in B, the x value is in A/B.
- In general, x and y can be any lists of fields; y is the list of fields in B, and x ∪ y is the list of fields of A.
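The "set of y values contains all of B" reading of division translates almost word for word into Python. This sketch uses the A, B1, B2, B3 instances from the "Examples of Division A/B" slide and simplifies by assuming single-field x and y, with relations stored as plain Python sets of pairs; `divide` is a hypothetical helper.

```python
# A(sno, pno) and divisors B1, B2, B3 from the "Examples of Division A/B" slide.
A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"), ("s3", "p2"), ("s4", "p2"), ("s4", "p4")}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}


def divide(a, b):
    """A/B for A with fields (x, y) and B with field y: x qualifies if the
    set of y values paired with it in A contains every y in B."""
    ys_for = {}
    for x, y in a:
        ys_for.setdefault(x, set()).add(y)
    return {x for x, ys in ys_for.items() if b <= ys}


print(sorted(divide(A, B1)))  # ['s1', 's2', 's3', 's4']
print(sorted(divide(A, B2)))  # ['s1', 's4']
print(sorted(divide(A, B3)))  # ['s1']
```

Only x values that appear in A are candidates, matching the $\langle x \rangle \in \pi_x(A)$ condition.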

Examples of Division A/B
A (sno, pno):
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1 (pno): p2
B2 (pno): p2, p4
B3 (pno): p1, p2, p4

A/B1 (sno): s1, s2, s3, s4
A/B2 (sno): s1, s4
A/B3 (sno): s1
Expressing A/B Using Basic Operators
- Division is not an essential operator; it is just a useful shorthand.
  - (Also true of joins, but joins are so common that systems implement joins specially.)
- Idea: for A/B, compute all x values that are not "disqualified" by some y value in B.
  - An x value is disqualified if, by attaching a y value from B, we obtain an xy tuple that is not in A.

Disqualified x values: $\pi_x((\pi_x(A) \times B) - A)$
A/B: $\pi_x(A)$ minus all disqualified x values, i.e., $A/B = \pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$
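The expression $\pi_x(A) - \pi_x((\pi_x(A) \times B) - A)$ can be checked against the division examples using only projection, cross-product and set-difference. This is a sketch assuming single-field x and y and plain Python sets of pairs; `divide_basic` is a hypothetical name.

```python
A = {("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
     ("s2", "p1"), ("s2", "p2"), ("s3", "p2"), ("s4", "p2"), ("s4", "p4")}
B1 = {"p2"}
B2 = {"p2", "p4"}
B3 = {"p1", "p2", "p4"}


def divide_basic(a, b):
    """A/B = pi_x(A) - pi_x((pi_x(A) x B) - A), built from basic operators."""
    pi_x_a = {x for x, _ in a}                       # pi_x(A)
    all_pairs = {(x, y) for x in pi_x_a for y in b}  # pi_x(A) x B
    disqualified = {x for x, _ in all_pairs - a}     # pi_x((...) - A)
    return pi_x_a - disqualified


for name, b in [("B1", B1), ("B2", B2), ("B3", B3)]:
    print(name, sorted(divide_basic(A, b)))
```

For B2, the pairs (s2, p4) and (s3, p4) are missing from A, so s2 and s3 are disqualified, leaving s1 and s4.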
Find Names of Sailors Who’ve Reserved Boat #103
- Solution 1: $\pi_{sname}((\sigma_{bid=103}\,Reserves) \bowtie Sailors)$
- Solution 2:
  $\rho(Temp1, \sigma_{bid=103}\,Reserves)$
  $\rho(Temp2, Temp1 \bowtie Sailors)$
  $\pi_{sname}(Temp2)$
- Solution 3: $\pi_{sname}(\sigma_{bid=103}(Reserves \bowtie Sailors))$
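The three solutions can be run side by side to see that they produce the same relation. This sketch uses S1 as the Sailors instance and R1 as the Reserves instance from the example slide, with the same illustrative helpers (`select`, `project`, `natural_join`) and `(fields, frozenset of rows)` representation; Python variables play the role of ρ.

```python
Sailors = (("sid", "sname", "rating", "age"),
           frozenset({(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                      (58, "rusty", 10, 35.0)}))
Reserves = (("sid", "bid", "day"),
            frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))


def select(pred, rel):
    fields, rows = rel
    return fields, frozenset(t for t in rows if pred(dict(zip(fields, t))))


def project(attrs, rel):
    fields, rows = rel
    idx = [fields.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows)


def natural_join(r, s):
    (rf, rr), (sf, sr) = r, s
    common = [f for f in rf if f in sf]
    keep = [j for j, f in enumerate(sf) if f not in common]
    rows = frozenset(a + tuple(b[j] for j in keep) for a in rr for b in sr
                     if all(a[rf.index(f)] == b[sf.index(f)] for f in common))
    return rf + tuple(sf[j] for j in keep), rows


def reserved_103(t):
    return t["bid"] == 103


# Solution 1: select on Reserves first, then join.
sol1 = project(("sname",), natural_join(select(reserved_103, Reserves), Sailors))
# Solution 2: same plan, with named intermediate results.
temp1 = select(reserved_103, Reserves)
temp2 = natural_join(temp1, Sailors)
sol2 = project(("sname",), temp2)
# Solution 3: join first, then select.
sol3 = project(("sname",), select(reserved_103, natural_join(Reserves, Sailors)))

print(sol1 == sol2 == sol3, sorted(sol1[1]))  # True [('rusty',)]
```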
Find Names of Sailors Who’ve Reserved a Red Boat
- Information about boat color is only available in Boats, so we need an extra join:
  $\pi_{sname}((\sigma_{color='red'}\,Boats) \bowtie Reserves \bowtie Sailors)$
- A more efficient solution:
  $\pi_{sname}(\pi_{sid}((\pi_{bid}\,\sigma_{color='red'}\,Boats) \bowtie Reserves) \bowtie Sailors)$
- A query optimizer can find this given the first solution!
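Both plans can be evaluated to confirm they agree. The slides give no Boats instance, so this sketch uses a hypothetical one (boat 101 red, boat 103 green, name field omitted) together with S1 and R1; the helpers and representation are the same illustrative ones used above. The second plan projects away unneeded columns before each join, which is the rewrite an optimizer could find.

```python
Sailors = (("sid", "sname", "rating", "age"),
           frozenset({(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                      (58, "rusty", 10, 35.0)}))
Reserves = (("sid", "bid", "day"),
            frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))
# Hypothetical Boats instance (not on the slides); the name field is omitted.
Boats = (("bid", "color"), frozenset({(101, "red"), (103, "green")}))


def select(pred, rel):
    fields, rows = rel
    return fields, frozenset(t for t in rows if pred(dict(zip(fields, t))))


def project(attrs, rel):
    fields, rows = rel
    idx = [fields.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows)


def natural_join(r, s):
    (rf, rr), (sf, sr) = r, s
    common = [f for f in rf if f in sf]
    keep = [j for j, f in enumerate(sf) if f not in common]
    rows = frozenset(a + tuple(b[j] for j in keep) for a in rr for b in sr
                     if all(a[rf.index(f)] == b[sf.index(f)] for f in common))
    return rf + tuple(sf[j] for j in keep), rows


def is_red(t):
    return t["color"] == "red"


# First solution: select, then two full joins.
plan_a = project(("sname",), natural_join(
    natural_join(select(is_red, Boats), Reserves), Sailors))
# More efficient: keep only the columns each later join needs.
red_bids = project(("bid",), select(is_red, Boats))
red_sids = project(("sid",), natural_join(red_bids, Reserves))
plan_b = project(("sname",), natural_join(red_sids, Sailors))

print(plan_a == plan_b, sorted(plan_a[1]))  # True [('dustin',)]
```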

Find Sailors Who’ve Reserved a Red or a Green Boat
- Can identify all red or green boats, then find sailors who’ve reserved one of these boats:
  $\rho(Tempboats, (\sigma_{color='red' \vee color='green'}\,Boats))$
  $\pi_{sname}(Tempboats \bowtie Reserves \bowtie Sailors)$
- Can also define Tempboats using union! (How?)
- What happens if $\vee$ is replaced by $\wedge$ in this query?
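The disjunctive selection, its union-based equivalent, and the conjunctive variant can be compared on a small instance. As before, the Boats instance (101 red, 103 green) is hypothetical, Sailors and Reserves are S1 and R1, and the helpers are illustrative; under this data each boat has exactly one color.

```python
Sailors = (("sid", "sname", "rating", "age"),
           frozenset({(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                      (58, "rusty", 10, 35.0)}))
Reserves = (("sid", "bid", "day"),
            frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))
# Hypothetical Boats instance (not on the slides).
Boats = (("bid", "color"), frozenset({(101, "red"), (103, "green")}))


def select(pred, rel):
    fields, rows = rel
    return fields, frozenset(t for t in rows if pred(dict(zip(fields, t))))


def project(attrs, rel):
    fields, rows = rel
    idx = [fields.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows)


def natural_join(r, s):
    (rf, rr), (sf, sr) = r, s
    common = [f for f in rf if f in sf]
    keep = [j for j, f in enumerate(sf) if f not in common]
    rows = frozenset(a + tuple(b[j] for j in keep) for a in rr for b in sr
                     if all(a[rf.index(f)] == b[sf.index(f)] for f in common))
    return rf + tuple(sf[j] for j in keep), rows


def union(r, s):
    if len(r[0]) != len(s[0]):
        raise ValueError("relations are not union-compatible")
    return r[0], r[1] | s[1]


def sailors_for(boats):
    """pi_sname(boats join Reserves join Sailors)."""
    return project(("sname",), natural_join(natural_join(boats, Reserves), Sailors))


temp_or = select(lambda t: t["color"] == "red" or t["color"] == "green", Boats)
temp_union = union(select(lambda t: t["color"] == "red", Boats),
                   select(lambda t: t["color"] == "green", Boats))
temp_and = select(lambda t: t["color"] == "red" and t["color"] == "green", Boats)

print(sorted(sailors_for(temp_or)[1]))   # [('dustin',), ('rusty',)]
print(temp_or == temp_union)             # True
print(sorted(sailors_for(temp_and)[1]))  # []
```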


Find Sailors Who’ve Reserved a Red and a Green Boat
- The previous approach won't work! We must identify sailors who’ve reserved red boats, sailors who’ve reserved green boats, then find the intersection (note that sid is a key for Sailors):
  $\rho(Tempred, \pi_{sid}((\sigma_{color='red'}\,Boats) \bowtie Reserves))$
  $\rho(Tempgreen, \pi_{sid}((\sigma_{color='green'}\,Boats) \bowtie Reserves))$
  $\pi_{sname}((Tempred \cap Tempgreen) \bowtie Sailors)$
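The intersection-of-sids approach can be sketched on a small instance where some sailor really has reserved both colors. The data here are partly hypothetical: Boats (101 red, 103 green) is invented, and Reserves is R1 plus one invented reservation (sailor 22 also reserves boat 103). The helpers (`select`, `project`, `natural_join`, `intersection`) are the same illustrative ones as above.

```python
Sailors = (("sid", "sname", "rating", "age"),
           frozenset({(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                      (58, "rusty", 10, 35.0)}))
# R1 plus one hypothetical row: sailor 22 also reserves boat 103.
Reserves = (("sid", "bid", "day"),
            frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96"),
                       (22, 103, "11/13/96")}))
# Hypothetical Boats instance (not on the slides).
Boats = (("bid", "color"), frozenset({(101, "red"), (103, "green")}))


def select(pred, rel):
    fields, rows = rel
    return fields, frozenset(t for t in rows if pred(dict(zip(fields, t))))


def project(attrs, rel):
    fields, rows = rel
    idx = [fields.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows)


def natural_join(r, s):
    (rf, rr), (sf, sr) = r, s
    common = [f for f in rf if f in sf]
    keep = [j for j, f in enumerate(sf) if f not in common]
    rows = frozenset(a + tuple(b[j] for j in keep) for a in rr for b in sr
                     if all(a[rf.index(f)] == b[sf.index(f)] for f in common))
    return rf + tuple(sf[j] for j in keep), rows


def intersection(r, s):
    if len(r[0]) != len(s[0]):
        raise ValueError("relations are not union-compatible")
    return r[0], r[1] & s[1]


def sids_for(color):
    """pi_sid((sigma_{color=c} Boats) join Reserves)."""
    boats = select(lambda t: t["color"] == color, Boats)
    return project(("sid",), natural_join(boats, Reserves))


temp_red = sids_for("red")
temp_green = sids_for("green")
both = project(("sname",), natural_join(intersection(temp_red, temp_green), Sailors))

# A single selection with "and" matches no boat, so it finds no sailors.
wrong = project(("sname",), natural_join(natural_join(
    select(lambda t: t["color"] == "red" and t["color"] == "green", Boats),
    Reserves), Sailors))
print(sorted(both[1]))   # [('dustin',)]
print(sorted(wrong[1]))  # []
```

Intersecting on sid rather than sname matters: two different sailors with the same name would otherwise be merged.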

Find the Names of Sailors Who’ve Reserved all Boats
- Uses division; the schemas of the input relations to / must be carefully chosen:
  $\rho(Tempsids, (\pi_{sid,bid}\,Reserves) / (\pi_{bid}\,Boats))$
  $\pi_{sname}(Tempsids \bowtie Sailors)$
- To find sailors who’ve reserved all red boats:
  $\ldots / \pi_{bid}(\sigma_{color='red'}\,Boats)$
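Division with named fields can be sketched by grouping A's rows by the fields that are not in B. This sketch assumes the elided numerator in the "all red boats" variant is $\pi_{sid,bid}(Reserves)$, as in the query above it; it uses S1, R1, a hypothetical Boats instance (101 red, 103 green), and illustrative helper names.

```python
Sailors = (("sid", "sname", "rating", "age"),
           frozenset({(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5),
                      (58, "rusty", 10, 35.0)}))
Reserves = (("sid", "bid", "day"),
            frozenset({(22, 101, "10/10/96"), (58, 103, "11/12/96")}))
# Hypothetical Boats instance (not on the slides).
Boats = (("bid", "color"), frozenset({(101, "red"), (103, "green")}))


def select(pred, rel):
    fields, rows = rel
    return fields, frozenset(t for t in rows if pred(dict(zip(fields, t))))


def project(attrs, rel):
    fields, rows = rel
    idx = [fields.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in rows)


def natural_join(r, s):
    (rf, rr), (sf, sr) = r, s
    common = [f for f in rf if f in sf]
    keep = [j for j, f in enumerate(sf) if f not in common]
    rows = frozenset(a + tuple(b[j] for j in keep) for a in rr for b in sr
                     if all(a[rf.index(f)] == b[sf.index(f)] for f in common))
    return rf + tuple(sf[j] for j in keep), rows


def divide(a, b):
    """A/B: x = fields of A not in B; an x value qualifies if the set of
    y values paired with it in A contains every row of B."""
    x_fields = tuple(f for f in a[0] if f not in b[0])
    x_idx = [a[0].index(f) for f in x_fields]
    y_idx = [a[0].index(f) for f in b[0]]
    ys_for = {}
    for t in a[1]:
        x = tuple(t[i] for i in x_idx)
        ys_for.setdefault(x, set()).add(tuple(t[i] for i in y_idx))
    return x_fields, frozenset(x for x, ys in ys_for.items() if b[1] <= ys)


sid_bid = project(("sid", "bid"), Reserves)
tempsids = divide(sid_bid, project(("bid",), Boats))
all_boats = project(("sname",), natural_join(tempsids, Sailors))

red_bids = project(("bid",), select(lambda t: t["color"] == "red", Boats))
all_red = project(("sname",), natural_join(divide(sid_bid, red_bids), Sailors))
print(sorted(all_boats[1]), sorted(all_red[1]))  # [] [('dustin',)]
```

On this data no sailor has reserved both boats, but sailor 22 has reserved the only red one, so changing the divisor changes the answer.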
Summary – Relational Algebra
- The relational model has rigorously defined query languages that are simple and powerful.
- Relational algebra is more operational; it is useful as an internal representation for query evaluation plans.
- There are several ways of expressing a given query; a query optimizer should choose the most efficient version.

