RELATIONAL ALGEBRA
CS121: Relational Databases
Fall 2018 – Lecture 2

Query Languages

- A query language specifies how to access the data in the database
- Different kinds of query languages:
  - Declarative languages specify what data to retrieve, but not how to retrieve it
  - Procedural languages specify what to retrieve, as well as the process for retrieving it
- Query languages often include updating and deleting data as well
- Also called data manipulation language (DML)

The Relational Algebra

- A procedural query language
- Composed of relational algebra operations
- Relational operations:
  - Take one or two relations as input
  - Produce a relation as output
- Relational operations can be composed together
  - Each operation produces a relation
  - A query is simply a relational algebra expression
- Six “fundamental” relational operations
- Other useful operations can be composed from these fundamental operations

“Why is this useful?”

- SQL is only loosely based on relational algebra
- SQL is much more on the “declarative” end of the spectrum
- Many relational databases use relational algebra operations for representing execution plans
  - Simple, clean, effective abstraction for representing how results will be generated
  - Relatively easy to manipulate for query optimization

Fundamental Relational Algebra Operations

- Six fundamental operations:
    σ   select operation
    Π   project operation
    ∪   set-union operation
    −   set-difference operation
    ×   Cartesian product operation
    ρ   rename operation
- Each operation takes one or two relations as input
- Produces another relation as output
- Important details:
  - What tuples are included in the result relation?
  - Any constraints on input schemas? What is the schema of the result?

Select Operation

- Written as: $\sigma_P(r)$
- P is the predicate for selection
  - P can refer to attributes in r (but no other relation!), as well as literal values
  - Can use comparison operators: =, ≠, <, ≤, >, ≥
  - Can combine multiple predicates using: ∧ (and), ∨ (or), ¬ (not)
- r is the input relation
- Result relation contains all tuples in r for which P is true
- Result schema is identical to schema for r

Select Examples

Using the account relation:

account
acct_id  branch_name  balance
A-301    New York     350
A-307    Seattle      275
A-318    Los Angeles  550
A-319    New York     80
A-322    Los Angeles  275

“Retrieve all tuples for accounts in the Los Angeles branch.”
$\sigma_{branch\_name = \text{“Los Angeles”}}(account)$

acct_id  branch_name  balance
A-318    Los Angeles  550
A-322    Los Angeles  275

“Retrieve all tuples for accounts in the Los Angeles branch, with a balance under $300.”
$\sigma_{branch\_name = \text{“Los Angeles”} \wedge balance < 300}(account)$

acct_id  branch_name  balance
A-322    Los Angeles  275
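As a rough illustration of these semantics, the sketch below models a relation in Python as a schema tuple plus a set of value tuples. This representation and the `select` helper are assumptions made for illustration, not how any particular DBMS stores data. The predicate receives one tuple at a time as a dict keyed by attribute name, so it can only refer to attributes of r and to literal values.

```python
# Hypothetical model: a relation is a schema (tuple of attribute names)
# plus a Python set of value tuples.
account_schema = ("acct_id", "branch_name", "balance")
account = {
    ("A-301", "New York", 350),
    ("A-307", "Seattle", 275),
    ("A-318", "Los Angeles", 550),
    ("A-319", "New York", 80),
    ("A-322", "Los Angeles", 275),
}


def select(schema, rows, predicate):
    """sigma_P(r): keep the tuples of r for which P is true.

    P receives a dict {attribute: value} for one tuple, so it can only
    refer to attributes of r (and literals). The schema is unchanged.
    """
    return schema, {t for t in rows if predicate(dict(zip(schema, t)))}


_, in_la = select(account_schema, account,
                  lambda t: t["branch_name"] == "Los Angeles")
_, in_la_low = select(account_schema, account,
                      lambda t: t["branch_name"] == "Los Angeles"
                      and t["balance"] < 300)
print(sorted(in_la))      # [('A-318', 'Los Angeles', 550), ('A-322', 'Los Angeles', 275)]
print(sorted(in_la_low))  # [('A-322', 'Los Angeles', 275)]
```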

Project Operation

- Written as: $\Pi_{a,b,\ldots}(r)$
- Result relation contains only the specified attributes of r
  - Specified attributes must actually be in the schema of r
  - Result’s schema only contains the specified attributes
  - Domains are the same as the source attributes’ domains
- Important note:
  - Result relation may have fewer rows than input relation!
  - Why?
    - Relations are sets of tuples, not multisets, so tuples that become identical after projection appear only once

Project Example

Using the account relation (same tuples as in the select examples):

“Retrieve all branch names that have at least one account.”
$\Pi_{branch\_name}(account)$

branch_name
New York
Seattle
Los Angeles

- Result only has three tuples, even though input has five
- Result schema is just (branch_name)
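A similar sketch for projection, again modeling a relation as a schema tuple plus a Python set of value tuples (a hypothetical representation; `project` is an illustrative helper name). Because the result is built as a `set`, the repeated `New York` and `Los Angeles` values collapse, which is why the result has three tuples rather than five.

```python
# Hypothetical model: schema tuple + set of value tuples.
account_schema = ("acct_id", "branch_name", "balance")
account = {
    ("A-301", "New York", 350),
    ("A-307", "Seattle", 275),
    ("A-318", "Los Angeles", 550),
    ("A-319", "New York", 80),
    ("A-322", "Los Angeles", 275),
}


def project(schema, rows, attrs):
    """Pi_{attrs}(r): keep only the listed attributes of r.

    The result is a Python set, so tuples that become identical after
    projection collapse into one, as set semantics requires.
    """
    missing = [a for a in attrs if a not in schema]
    if missing:
        raise ValueError(f"attributes not in schema of r: {missing}")
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


result_schema, branches = project(account_schema, account, ["branch_name"])
print(result_schema)                # ('branch_name',)
print(sorted(branches))             # [('Los Angeles',), ('New York',), ('Seattle',)]
print(len(account), len(branches))  # 5 3
```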

Composing Operations

- Input can also be an expression that evaluates to a relation, instead of just a relation
- $\Pi_{acct\_id}(\sigma_{balance \ge 300}(account))$
  - Selects the account IDs of all accounts with a balance of $300 or more
  - Input relation’s schema is: Account_schema = (acct_id, branch_name, balance)
  - Final result relation’s schema?
    - Just one attribute: (acct_id)
- Distinguish between base and derived relations
  - account is a base relation
  - $\sigma_{balance \ge 300}(account)$ is a derived relation
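Composition falls out naturally when every operator returns a relation. In this hypothetical sketch (relations modeled as a schema tuple plus a set of value tuples; `select` and `project` are illustrative helper names), the derived relation produced by the select is fed straight into the project, exactly as a base relation would be.

```python
# Hypothetical model: schema tuple + set of value tuples.
account_schema = ("acct_id", "branch_name", "balance")
account = {
    ("A-301", "New York", 350),
    ("A-307", "Seattle", 275),
    ("A-318", "Los Angeles", 550),
    ("A-319", "New York", 80),
    ("A-322", "Los Angeles", 275),
}


def select(schema, rows, predicate):
    """sigma_P(r): predicate sees one tuple as {attribute: value}."""
    return schema, {t for t in rows if predicate(dict(zip(schema, t)))}


def project(schema, rows, attrs):
    """Pi_{attrs}(r): keep the listed attributes; duplicates collapse."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


# Inner expression: sigma_{balance >= 300}(account), a derived relation
rich_schema, rich = select(account_schema, account, lambda t: t["balance"] >= 300)
# Outer expression consumes the derived relation like any other relation
ids_schema, ids = project(rich_schema, rich, ["acct_id"])
print(ids_schema)   # ('acct_id',)
print(sorted(ids))  # [('A-301',), ('A-318',)]
```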

Set-Union Operation

- Written as: $r \cup s$
- Result contains all tuples from r and s
  - Each tuple appears only once in the result, even if it is in both r and s
- Constraints on schemas for r and s?
- r and s must have compatible schemas:
  - r and s must have the same arity (same number of attributes)
  - For each attribute i in r and s, r[i] must have the same domain as s[i]
  - (Our examples also generally have the same attribute names, but this is not required! Arity and domains are what matter.)

Set-Union Example

- More complicated schema: accounts and loans

account
acct_id  branch_name  balance
A-301    New York     350
A-307    Seattle      275
A-318    Los Angeles  550
A-319    New York     80
A-322    Los Angeles  275

depositor
cust_name  acct_id
Johnson    A-318
Smith      A-322
Reynolds   A-319
Lewis      A-307
Reynolds   A-301

loan
loan_id  branch_name    amount
L-421    San Francisco  7500
L-445    Los Angeles    2000
L-437    Las Vegas      4300
L-419    Seattle        2900

borrower
cust_name  loan_id
Anderson   L-437
Jackson    L-419
Lewis      L-421
Smith      L-445

Set-Union Example (2)

- Find names of all customers that have either a bank account or a loan at the bank
  - (Uses the account, depositor, loan, and borrower relations from the previous example.)

Set-Union Example (3)

- Find names of all customers that have either a bank account or a loan at the bank
  - Easy to find the customers with an account: $\Pi_{cust\_name}(depositor)$

        cust_name
        Johnson
        Smith
        Reynolds
        Lewis

  - Also easy to find customers with a loan: $\Pi_{cust\_name}(borrower)$

        cust_name
        Anderson
        Jackson
        Lewis
        Smith

- Result is set-union of these expressions:
  $\Pi_{cust\_name}(depositor) \cup \Pi_{cust\_name}(borrower)$

        cust_name
        Johnson
        Smith
        Reynolds
        Lewis
        Anderson
        Jackson

  - Note that the inputs have 8 tuples, but the result has 6 tuples.
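The union query can be sketched in Python by modeling each relation as a schema tuple plus a set of value tuples (a hypothetical representation). The `project` and `union` helpers are illustrative names; `union` only checks arity, since this sketch has no notion of attribute domains, so domain compatibility is simply assumed.

```python
# Hypothetical model: schema tuple + set of value tuples.
depositor_schema = ("cust_name", "acct_id")
depositor = {("Johnson", "A-318"), ("Smith", "A-322"), ("Reynolds", "A-319"),
             ("Lewis", "A-307"), ("Reynolds", "A-301")}
borrower_schema = ("cust_name", "loan_id")
borrower = {("Anderson", "L-437"), ("Jackson", "L-419"),
            ("Lewis", "L-421"), ("Smith", "L-445")}


def project(schema, rows, attrs):
    """Pi_{attrs}(r): keep the listed attributes; duplicates collapse."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


def union(r_schema, r_rows, s_schema, s_rows):
    """r ∪ s. Only arity is checked; matching domains are assumed."""
    if len(r_schema) != len(s_schema):
        raise ValueError("set-union requires relations of the same arity")
    return r_schema, r_rows | s_rows


a_schema, with_acct = project(depositor_schema, depositor, ["cust_name"])
l_schema, with_loan = project(borrower_schema, borrower, ["cust_name"])
_, customers = union(a_schema, with_acct, l_schema, with_loan)
print(len(with_acct) + len(with_loan), len(customers))  # 8 6
```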

Set-Difference Operation

- Written as: $r - s$
- Result contains tuples that are only in r, but not in s
  - Tuples in both r and s are excluded
  - Tuples only in s are also excluded
- Constraints on schemas of r and s?
  - Schemas must be compatible
  - (Exactly like set-union.)

Set-Difference Example

Using the account, depositor, loan, and borrower relations from the set-union example:

“Find all customers that have an account but not a loan.”

Set-Difference Example (2)

- Again, each component is easy
  - All customers that have an account: $\Pi_{cust\_name}(depositor)$

        cust_name
        Johnson
        Smith
        Reynolds
        Lewis

  - All customers that have a loan: $\Pi_{cust\_name}(borrower)$

        cust_name
        Anderson
        Jackson
        Lewis
        Smith

- Result is set-difference of these expressions:
  $\Pi_{cust\_name}(depositor) - \Pi_{cust\_name}(borrower)$

        cust_name
        Johnson
        Reynolds
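Set-difference works the same way when relations are modeled (hypothetically) as a schema tuple plus a set of value tuples. The `project` and `difference` helpers are illustrative names; as with union, this sketch checks only arity.

```python
# Hypothetical model: schema tuple + set of value tuples.
depositor_schema = ("cust_name", "acct_id")
depositor = {("Johnson", "A-318"), ("Smith", "A-322"), ("Reynolds", "A-319"),
             ("Lewis", "A-307"), ("Reynolds", "A-301")}
borrower_schema = ("cust_name", "loan_id")
borrower = {("Anderson", "L-437"), ("Jackson", "L-419"),
            ("Lewis", "L-421"), ("Smith", "L-445")}


def project(schema, rows, attrs):
    """Pi_{attrs}(r): keep the listed attributes; duplicates collapse."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


def difference(r_schema, r_rows, s_schema, s_rows):
    """r - s: tuples in r but not in s. Only arity is checked."""
    if len(r_schema) != len(s_schema):
        raise ValueError("set-difference requires relations of the same arity")
    return r_schema, r_rows - s_rows


a_schema, with_acct = project(depositor_schema, depositor, ["cust_name"])
l_schema, with_loan = project(borrower_schema, borrower, ["cust_name"])
_, acct_only = difference(a_schema, with_acct, l_schema, with_loan)
print(sorted(acct_only))  # [('Johnson',), ('Reynolds',)]
```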

Cartesian Product Operation

- Written as: $r \times s$
  - Read as “r cross s”
- No constraints on schemas of r and s
- Schema of result is concatenation of schemas for r and s
- If r and s have overlapping attribute names:
  - All overlapping attributes are included; none are eliminated
  - Distinguish overlapping attribute names by prepending the source relation’s name
- Example:
  - Input relations: r(a, b) and s(b, c)
  - Schema of r × s is (a, r.b, s.b, c)

Cartesian Product Operation (2)

- Result of r × s
  - Contains every tuple in r, combined with every tuple in s
  - If r contains $N_r$ tuples, and s contains $N_s$ tuples, result contains $N_r \times N_s$ tuples
- Allows two relations to be compared and/or combined
  - If we want to correlate tuples in relation r with tuples in relation s…
  - Compute r × s, then select out desired results with an appropriate predicate

Cartesian Product Example

- Compute result of borrower × loan

borrower
cust_name  loan_id
Anderson   L-437
Jackson    L-419
Lewis      L-421
Smith      L-445

loan
loan_id  branch_name    amount
L-421    San Francisco  7500
L-445    Los Angeles    2000
L-437    Las Vegas      4300
L-419    Seattle        2900

- Result will contain 4 × 4 = 16 tuples

Cartesian Product Example (2)

- Schema for borrower is:
  Borrower_schema = (cust_name, loan_id)
- Schema for loan is:
  Loan_schema = (loan_id, branch_name, amount)
- Schema for result of borrower × loan is:
  (cust_name, borrower.loan_id, loan.loan_id, branch_name, amount)
    - Overlapping attribute names are distinguished by including the name of the source relation

Cartesian Product Example (3)

Result:
cust_name  borrower.loan_id  loan.loan_id  branch_name    amount
Anderson   L-437             L-421         San Francisco  7500
Anderson   L-437             L-445         Los Angeles    2000
Anderson   L-437             L-437         Las Vegas      4300
Anderson   L-437             L-419         Seattle        2900
Jackson    L-419             L-421         San Francisco  7500
Jackson    L-419             L-445         Los Angeles    2000
Jackson    L-419             L-437         Las Vegas      4300
Jackson    L-419             L-419         Seattle        2900
Lewis      L-421             L-421         San Francisco  7500
Lewis      L-421             L-445         Los Angeles    2000
Lewis      L-421             L-437         Las Vegas      4300
Lewis      L-421             L-419         Seattle        2900
Smith      L-445             L-421         San Francisco  7500
Smith      L-445             L-445         Los Angeles    2000
Smith      L-445             L-437         Las Vegas      4300
Smith      L-445             L-419         Seattle        2900
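A minimal sketch of the Cartesian product, modeling each relation as a schema tuple plus a set of value tuples, with each input passed along with its relation name so overlapping attribute names can be prefixed. The `product` helper and its argument order are assumptions for illustration. It rejects inputs whose prefixed names would still collide (for example, a relation crossed with itself under the same name), which is the situation the rename operation addresses.

```python
# Hypothetical model: schema tuple + set of value tuples.
borrower_schema = ("cust_name", "loan_id")
borrower = {("Anderson", "L-437"), ("Jackson", "L-419"),
            ("Lewis", "L-421"), ("Smith", "L-445")}
loan_schema = ("loan_id", "branch_name", "amount")
loan = {("L-421", "San Francisco", 7500), ("L-445", "Los Angeles", 2000),
        ("L-437", "Las Vegas", 4300), ("L-419", "Seattle", 2900)}


def product(r_name, r_schema, r_rows, s_name, s_schema, s_rows):
    """r × s: pair every tuple of r with every tuple of s.

    Attributes present in both schemas are all kept and disambiguated
    by prefixing the source relation's name.
    """
    overlap = set(r_schema) & set(s_schema)

    def qualify(name, schema):
        return tuple(f"{name}.{a}" if a in overlap else a for a in schema)

    schema = qualify(r_name, r_schema) + qualify(s_name, s_schema)
    if len(set(schema)) != len(schema):
        raise ValueError("ambiguous attribute names; rename one input first")
    return schema, {tr + ts for tr in r_rows for ts in s_rows}


schema, rows = product("borrower", borrower_schema, borrower,
                       "loan", loan_schema, loan)
print(schema)     # ('cust_name', 'borrower.loan_id', 'loan.loan_id', 'branch_name', 'amount')
print(len(rows))  # 16
```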

Cartesian Product Example (4)

- Can use Cartesian product to associate related rows between two tables
  - …but, a lot of extra rows are included!

cust_name  borrower.loan_id  loan.loan_id  branch_name    amount
…          …                 …             …              …
Jackson    L-419             L-437         Las Vegas      4300
Jackson    L-419             L-419         Seattle        2900
Lewis      L-421             L-421         San Francisco  7500
Lewis      L-421             L-445         Los Angeles    2000
…          …                 …             …              …

- Combine Cartesian product with a select operation:
  $\sigma_{borrower.loan\_id = loan.loan\_id}(borrower \times loan)$

Cartesian Product Example (5)

- “Retrieve the names of all customers with loans at the Seattle branch.”
  - (Uses the borrower and loan relations from the Cartesian product example.)
- Need both borrower and loan relations
- Correlate tuples in the relations using loan_id
- Then, computing result is easy.

Cartesian Product Example (6)

- Associate customer names with loan details, using Cartesian product and a select:
  $\sigma_{borrower.loan\_id = loan.loan\_id}(borrower \times loan)$
- Select out loans at Seattle branch:
  $\sigma_{branch\_name = \text{“Seattle”}}(\sigma_{borrower.loan\_id = loan.loan\_id}(borrower \times loan))$
  - Simplify:
  $\sigma_{borrower.loan\_id = loan.loan\_id \wedge branch\_name = \text{“Seattle”}}(borrower \times loan)$
- Project results down to customer name:
  $\Pi_{cust\_name}(\sigma_{borrower.loan\_id = loan.loan\_id \wedge branch\_name = \text{“Seattle”}}(borrower \times loan))$
- Final result:

      cust_name
      Jackson
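Putting the pieces together, this sketch evaluates the final Seattle query bottom-up: product, then one select with the combined predicate, then a project. Relations are modeled (hypothetically) as a schema tuple plus a set of value tuples; `product`, `select`, and `project` are illustrative helper names, not a real library API.

```python
# Hypothetical model: schema tuple + set of value tuples.
borrower_schema = ("cust_name", "loan_id")
borrower = {("Anderson", "L-437"), ("Jackson", "L-419"),
            ("Lewis", "L-421"), ("Smith", "L-445")}
loan_schema = ("loan_id", "branch_name", "amount")
loan = {("L-421", "San Francisco", 7500), ("L-445", "Los Angeles", 2000),
        ("L-437", "Las Vegas", 4300), ("L-419", "Seattle", 2900)}


def product(r_name, r_schema, r_rows, s_name, s_schema, s_rows):
    """r × s; overlapping attribute names get the source relation's name."""
    overlap = set(r_schema) & set(s_schema)
    schema = (tuple(f"{r_name}.{a}" if a in overlap else a for a in r_schema)
              + tuple(f"{s_name}.{a}" if a in overlap else a for a in s_schema))
    return schema, {tr + ts for tr in r_rows for ts in s_rows}


def select(schema, rows, predicate):
    """sigma_P(r): predicate sees one tuple as {attribute: value}."""
    return schema, {t for t in rows if predicate(dict(zip(schema, t)))}


def project(schema, rows, attrs):
    """Pi_{attrs}(r): keep the listed attributes; duplicates collapse."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


sch, rows = product("borrower", borrower_schema, borrower, "loan", loan_schema, loan)
sch, rows = select(sch, rows, lambda t: t["borrower.loan_id"] == t["loan.loan_id"]
                   and t["branch_name"] == "Seattle")
_, names = project(sch, rows, ["cust_name"])
print(sorted(names))  # [('Jackson',)]
```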

Rename Operation

- Results of relational operations are unnamed
  - Result has a schema, but the relation itself is unnamed
- Can give result a name using the rename operator
- Written as: $\rho_x(E)$ (Greek rho, not lowercase “p”)
  - E is an expression that produces a relation
  - E can also be a named relation or a relation-variable
  - x is the new name of the relation
- More general form is: $\rho_{x(A_1, A_2, \ldots, A_n)}(E)$
  - Allows renaming of the relation’s attributes
  - Requirement: E has arity n

Scope of Renamed Relations

- Rename operation ρ only applies within a specific relational algebra expression
  - This does not create a new relation-variable!
  - The new name is only visible to enclosing relational-algebra expressions
- Rename operator is used for two main purposes:
  - Allow a derived relation and its attributes to be referred to by enclosing relational-algebra operations
  - Allow a base relation to be used multiple ways in one query
    - $r \times \rho_s(r)$
- In other words, rename operation ρ is used to resolve ambiguities within a specific relational algebra expression

Rename Example

“Find the ID of the loan with the largest amount.”

loan
loan_id  branch_name    amount
L-421    San Francisco  7500
L-445    Los Angeles    2000
L-437    Las Vegas      4300
L-419    Seattle        2900

- Hard to find the loan with the largest amount!
  - (At least, with the tools we have so far…)
- Much easier to find all loans that have an amount smaller than some other loan
- Then, use set-difference to find the largest loan

Rename Example (2)

- How to find all loans with an amount smaller than some other loan?
  - Use Cartesian product of loan with itself: loan × loan
  - Compare each loan’s amount to all other loans
- Problem: Can’t distinguish between attributes of left and right loan relations!
- Solution: Use rename operation
  $loan \times \rho_{test}(loan)$
  - Now, right relation is named test

Rename Example (3)

- Find IDs of all loans with an amount smaller than some other loan:
  $\Pi_{loan.loan\_id}(\sigma_{loan.amount < test.amount}(loan \times \rho_{test}(loan)))$
- Finally, we can get our result:
  $\Pi_{loan\_id}(loan) - \Pi_{loan.loan\_id}(\sigma_{loan.amount < test.amount}(loan \times \rho_{test}(loan)))$

      loan_id
      L-421

- What if multiple loans have max value?
  - All loans with max value appear in result.
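A sketch of the whole largest-loan query, assuming relations are modeled as (name, schema, set of tuples) triples so that `rename` has a name to attach. The `rename`, `product`, `select`, and `project` helpers are hypothetical. Crossing `loan` with itself fails because both sides would be prefixed with `loan`; renaming the right-hand side to `test` resolves the ambiguity.

```python
loan_schema = ("loan_id", "branch_name", "amount")
loan_rows = {("L-421", "San Francisco", 7500), ("L-445", "Los Angeles", 2000),
             ("L-437", "Las Vegas", 4300), ("L-419", "Seattle", 2900)}


def rename(x, schema, rows):
    """rho_x(E): attach the name x to E's result; schema and tuples unchanged."""
    return (x, schema, rows)


def product(r, s):
    """r × s for (name, schema, rows) triples; overlapping attributes get a prefix.

    The result is unnamed, so it is returned as a (schema, rows) pair.
    """
    (r_name, r_schema, r_rows), (s_name, s_schema, s_rows) = r, s
    overlap = set(r_schema) & set(s_schema)
    schema = (tuple(f"{r_name}.{a}" if a in overlap else a for a in r_schema)
              + tuple(f"{s_name}.{a}" if a in overlap else a for a in s_schema))
    if len(set(schema)) != len(schema):
        raise ValueError("ambiguous attribute names; use rename")
    return schema, {tr + ts for tr in r_rows for ts in s_rows}


def select(schema, rows, predicate):
    return schema, {t for t in rows if predicate(dict(zip(schema, t)))}


def project(schema, rows, attrs):
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


loan = ("loan", loan_schema, loan_rows)

# loan × loan: both sides are named "loan", so attributes cannot be told apart
try:
    product(loan, loan)
except ValueError as err:
    print(err)  # ambiguous attribute names; use rename

# Pi_{loan.loan_id}(sigma_{loan.amount < test.amount}(loan × rho_test(loan)))
sch, rows = product(loan, rename("test", loan_schema, loan_rows))
sch, rows = select(sch, rows, lambda t: t["loan.amount"] < t["test.amount"])
_, smaller = project(sch, rows, ["loan.loan_id"])

# Pi_{loan_id}(loan) minus the loans that are smaller than some other loan
_, all_ids = project(loan_schema, loan_rows, ["loan_id"])
largest = all_ids - smaller
print(sorted(largest))  # [('L-421',)]
```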

Additional Relational Operations

- The fundamental operations are sufficient to query a relational database…
- …but they can produce some large expressions for common operations!
- Several additional operations, defined in terms of fundamental operations:
    ∩   set-intersection
    ⋈   natural join
    ÷   division
    ⟵   assignment

Set-Intersection Operation

- Written as: $r \cap s$
- $r \cap s = r - (r - s)$
  - $r - s$ = the rows in r, but not in s
  - $r - (r - s)$ = the rows in both r and s
- Relations must have compatible schemas
- Example: find all customers with both a loan and a bank account
  $\Pi_{cust\_name}(borrower) \cap \Pi_{cust\_name}(depositor)$
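The identity $r \cap s = r - (r - s)$ can be checked directly. In this hypothetical sketch, relations are modeled as a schema tuple plus a set of value tuples, `intersection` is built only from set-difference, and the result is compared with Python's built-in set `&`.

```python
# Hypothetical model: schema tuple + set of value tuples.
depositor_schema = ("cust_name", "acct_id")
depositor = {("Johnson", "A-318"), ("Smith", "A-322"), ("Reynolds", "A-319"),
             ("Lewis", "A-307"), ("Reynolds", "A-301")}
borrower_schema = ("cust_name", "loan_id")
borrower = {("Anderson", "L-437"), ("Jackson", "L-419"),
            ("Lewis", "L-421"), ("Smith", "L-445")}


def project(schema, rows, attrs):
    """Pi_{attrs}(r): keep the listed attributes; duplicates collapse."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


def difference(r, s):
    """r - s on sets of tuples (schemas assumed compatible)."""
    return r - s


def intersection(r, s):
    """r ∩ s derived from set-difference alone: r - (r - s)."""
    return difference(r, difference(r, s))


_, with_loan = project(borrower_schema, borrower, ["cust_name"])
_, with_acct = project(depositor_schema, depositor, ["cust_name"])
both = intersection(with_loan, with_acct)
print(sorted(both))                    # [('Lewis',), ('Smith',)]
print(both == (with_loan & with_acct))  # True
```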

Natural Join Operation

- Most common use of Cartesian product is to correlate tuples with the same key-values
  - Called a join operation
- The natural join is a shorthand for this operation
- Written as: $r \bowtie s$
  - r and s normally have common attributes (if they share none, $r \bowtie s$ reduces to $r \times s$)
  - The common attributes are usually a key for r and/or s, but certainly don’t have to be

Natural Join Definition

- For two relations r(R) and s(S)
- Attributes used to perform natural join:
  $R \cap S = \{A_1, A_2, \ldots, A_n\}$
- Formal definition:
  $r \bowtie s = \Pi_{R \cup S}(\sigma_{r.A_1 = s.A_1 \wedge r.A_2 = s.A_2 \wedge \cdots \wedge r.A_n = s.A_n}(r \times s))$
  - r and s are joined using an equality condition based on their common attributes
  - Result is projected so that common attributes only appear once

Natural Join Example

- Simple example:
  “Find the names of all customers with loans.”
- Result:
  $\Pi_{cust\_name}(\sigma_{borrower.loan\_id = loan.loan\_id}(borrower \times loan))$
- Rewritten with natural join:
  $\Pi_{cust\_name}(borrower \bowtie loan)$
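A nested-loop sketch of the natural join that follows the formal definition: consider every pair of tuples (as in r × s), keep pairs that agree on all common attributes, and emit each common attribute once. Relations are modeled (hypothetically) as a schema tuple plus a set of value tuples, and `natural_join` is an illustrative name; real database systems typically use more efficient join algorithms such as hash or sort-merge joins. When the schemas share no attributes, every pair passes and the result is just r × s.

```python
# Hypothetical model: schema tuple + set of value tuples.
borrower_schema = ("cust_name", "loan_id")
borrower = {("Anderson", "L-437"), ("Jackson", "L-419"),
            ("Lewis", "L-421"), ("Smith", "L-445")}
loan_schema = ("loan_id", "branch_name", "amount")
loan = {("L-421", "San Francisco", 7500), ("L-445", "Los Angeles", 2000),
        ("L-437", "Las Vegas", 4300), ("L-419", "Seattle", 2900)}


def natural_join(r_schema, r_rows, s_schema, s_rows):
    """r ⋈ s = Pi_{R ∪ S}(sigma_{r.A1=s.A1 ∧ ...}(r × s)), via nested loops."""
    common = [a for a in r_schema if a in s_schema]       # R ∩ S
    s_extra = [a for a in s_schema if a not in r_schema]  # attributes only in S
    schema = tuple(r_schema) + tuple(s_extra)             # R ∪ S, each once
    rows = set()
    for tr in r_rows:
        t_r = dict(zip(r_schema, tr))
        for ts in s_rows:                                  # every pair, as in r × s
            t_s = dict(zip(s_schema, ts))
            if all(t_r[a] == t_s[a] for a in common):      # equality on R ∩ S
                rows.add(tr + tuple(t_s[a] for a in s_extra))
    return schema, rows


def project(schema, rows, attrs):
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


j_schema, joined = natural_join(borrower_schema, borrower, loan_schema, loan)
print(j_schema)       # ('cust_name', 'loan_id', 'branch_name', 'amount')
_, names = project(j_schema, joined, ["cust_name"])
print(sorted(names))  # [('Anderson',), ('Jackson',), ('Lewis',), ('Smith',)]
```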

Natural Join Characteristics

- Very common to compute joins across multiple tables
- Example: $customer \bowtie borrower \bowtie loan$
- Natural join operation is associative:
  - $(customer \bowtie borrower) \bowtie loan$ is equivalent to $customer \bowtie (borrower \bowtie loan)$
- Note:
  - Even though these expressions are equivalent, order of join operations can dramatically affect query cost!
  - (Keep this in mind for later…)

Division Operation

- Binary operator: r ÷ s
- Implements a “for each” type of query
  - “Find all rows in r that have one row corresponding to each row in s.”
  - Relation r divided by relation s
- Easiest to illustrate with an example:
- Puzzle Database
  - puzzle_list(puzzle_name)
    - Simple list of puzzles by name
  - completed(person_name, puzzle_name)
    - Records which puzzles have been completed by each person

Puzzle Database

“Who has solved every puzzle?”

completed
person_name  puzzle_name
Alex         altekruse
Alex         soma cube
Bob          puzzle box
Carl         altekruse
Bob          soma cube
Carl         puzzle box
Alex         puzzle box
Carl         soma cube

puzzle_list
puzzle_name
altekruse
soma cube
puzzle box

- Need to find every person in completed that has an entry for every puzzle in puzzle_list
- Divide completed by puzzle_list to get answer:
  $completed \div puzzle\_list$ =

      person_name
      Alex
      Carl

  - Only Alex and Carl have completed every puzzle in puzzle_list.

Puzzle Database (2)

“Who has solved every puzzle?” (completed and puzzle_list as above)

$completed \div puzzle\_list$ =

    person_name
    Alex
    Carl

- Very reminiscent of integer division
  - Result relation contains tuples from completed that are evenly divided by puzzle_list
- Several other kinds of relational division operators
  - e.g. some can compute “remainder” of the division operation

Division Operation

For $r(R) \div s(S)$:
- Required: $S \subset R$
  - All attributes in S must also be in R
- Result has schema $R - S$
  - Result has attributes that are in R but not also in S
  - (This is why we don’t allow S = R: the result schema R − S would be empty)
- Every tuple t in result satisfies these conditions:
  $t \in \Pi_{R-S}(r)$
  $\langle \forall t_s \in s : \exists t_r \in r : t_r[S] = t_s[S] \wedge t_r[R-S] = t \rangle$
    - Every tuple in the result has a row in r corresponding to every row in s
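The formal definition translates almost directly into code. In this hypothetical sketch (the `divide` helper and the schema-plus-set-of-tuples model are assumptions), each candidate t in $\Pi_{R-S}(r)$ is kept only if, for every tuple of s, some tuple of r pairs t with it.

```python
# Hypothetical model: schema tuple + set of value tuples.
completed_schema = ("person_name", "puzzle_name")
completed = {("Alex", "altekruse"), ("Alex", "soma cube"), ("Bob", "puzzle box"),
             ("Carl", "altekruse"), ("Bob", "soma cube"), ("Carl", "puzzle box"),
             ("Alex", "puzzle box"), ("Carl", "soma cube")}
puzzle_list_schema = ("puzzle_name",)
puzzle_list = {("altekruse",), ("soma cube",), ("puzzle box",)}


def divide(r_schema, r_rows, s_schema, s_rows):
    """r ÷ s following the formal definition (illustrative helper)."""
    if not set(s_schema) < set(r_schema):
        raise ValueError("r ÷ s requires S to be a proper subset of R")
    keep = [a for a in r_schema if a not in s_schema]          # R - S
    k_idx = [r_schema.index(a) for a in keep]
    s_idx = [r_schema.index(a) for a in s_schema]              # S, in s's order
    candidates = {tuple(tr[i] for i in k_idx) for tr in r_rows}  # Pi_{R-S}(r)
    result = set()
    for t in candidates:
        # t_r[S] for every t_r in r with t_r[R-S] = t
        paired = {tuple(tr[i] for i in s_idx) for tr in r_rows
                  if tuple(tr[i] for i in k_idx) == t}
        # for all t_s in s there must exist such a t_r
        if all(ts in paired for ts in s_rows):
            result.add(t)
    return tuple(keep), result


schema, who = divide(completed_schema, completed, puzzle_list_schema, puzzle_list)
print(schema)       # ('person_name',)
print(sorted(who))  # [('Alex',), ('Carl',)]
```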

Puzzle Database

For $completed \div puzzle\_list$, with completed = r and puzzle_list = s (relations as above):
- Schemas are compatible
- Result has schema (person_name)
  - Attributes in completed schema, but not also in puzzle_list schema
- Every tuple t in result satisfies these conditions:
  $t \in \Pi_{R-S}(r)$
  $\langle \forall t_s \in s : \exists t_r \in r : t_r[S] = t_s[S] \wedge t_r[R-S] = t \rangle$

$completed \div puzzle\_list$ =

    person_name
    Alex
    Carl

Division Operation

- Not provided natively in most SQL databases
  - Rarely needed!
  - Easy enough to implement in SQL, if needed
- Will see it in the homework assignments, and on the midterm… :)
  - Often a very nice shortcut for more involved queries

Relation Variables

- Recall: relation variables refer to a specific relation
  - A specific set of tuples, with a particular schema
- Example: the account relation (acct_id, branch_name, balance) used throughout this lecture
  - account is actually technically a relation variable, as are all our named relations so far

Assignment Operation

- Can assign a relation-value to a relation-variable
- Written as: $relvar \leftarrow E$
  - E is an expression that evaluates to a relation
- Unlike ρ, the name relvar persists in the database
- Often used for temporary relation-variables:
    $temp1 \leftarrow \Pi_{R-S}(r)$
    $temp2 \leftarrow \Pi_{R-S}((temp1 \times s) - \Pi_{R-S,S}(r))$
    $result \leftarrow temp1 - temp2$
  - Query evaluation becomes a sequence of steps
  - (This is an implementation of the ÷ operator)
- Can also use assignment operation to modify data
  - More about updates next time…
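The three assignments can be traced step by step on the puzzle database. In this hypothetical sketch, Python variables play the role of the temporary relation-variables, relations are modeled as a schema tuple plus a set of tuples, and `project` is an illustrative helper. The only pair missing from temp1 × s is (Bob, altekruse), so temp2 is {Bob} and the result is {Alex, Carl}.

```python
# completed(person_name, puzzle_name) plays r; puzzle_list(puzzle_name) plays s
R = ("person_name", "puzzle_name")
r = {("Alex", "altekruse"), ("Alex", "soma cube"), ("Bob", "puzzle box"),
     ("Carl", "altekruse"), ("Bob", "soma cube"), ("Carl", "puzzle box"),
     ("Alex", "puzzle box"), ("Carl", "soma cube")}
S = ("puzzle_name",)
s = {("altekruse",), ("soma cube",), ("puzzle box",)}


def project(schema, rows, attrs):
    """Pi_{attrs}: keep (and reorder to) the listed attributes."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in idx) for t in rows}


R_minus_S = [a for a in R if a not in S]

# temp1 <- Pi_{R-S}(r): every candidate for the result
_, temp1 = project(R, r, R_minus_S)

# temp1 × s: every (candidate, s-tuple) pair that would have to appear in r
pairs = {t1 + ts for t1 in temp1 for ts in s}

# Pi_{R-S,S}(r): r reordered so its layout matches temp1 × s
_, r_reordered = project(R, r, R_minus_S + list(S))

# temp2 <- Pi_{R-S}((temp1 × s) - Pi_{R-S,S}(r)): candidates missing some s-tuple
_, temp2 = project(tuple(R_minus_S) + S, pairs - r_reordered, R_minus_S)

# result <- temp1 - temp2
result = temp1 - temp2
print(sorted(temp2))   # [('Bob',)]
print(sorted(result))  # [('Alex',), ('Carl',)]
```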
