
03 Relational Algebra


RELATIONAL ALGEBRA

Revisit: Data Model


cust_id  cust_name     ssn          acct_id  branch_name  balance
23-652   Joe Smith     330-25-8822  A-301    New York     350
15-202   Ellen Jones   221-30-6551  A-307    Seattle      275
23-521   Dave Johnson  005-81-2568  A-318    Los Angeles  550
...      ...           ...          ...      ...          ...

cust_id  cust_name       ssn
23-652   Joe Smith       330-25-8822
15-202   Ellen Jones     221-30-6551
23-521   Dave Johnson    005-81-2568
15-202   Albert Stevens  450-22-5869
...      ...             ...
The customer relation

cust_id  acct_id
15-202   A-301
23-521   A-307
23-652   A-318
...      ...

www.its.ac.id INSTITUT TEKNOLOGI SEPULUH NOPEMBER, Surabaya - Indonesia


Query Languages
• A query language specifies how to access the data in the database
• Different kinds of query languages:
• Declarative languages specify what data to retrieve, but not how to retrieve it
• Procedural languages specify what to retrieve, as well as the process for retrieving it
• Query languages often include updating and deleting data as well
• Also called data manipulation language (DML)



The Relational Algebra
• A procedural query language
• Composed of relational algebra operations
• Relational operations:
• Take one or two relations as input
• Produce a relation as output
• Relational operations can be composed together
• Each operation produces a relation
• A query is simply a relational algebra expression
• Six “fundamental” relational operations
• Other useful operations can be composed from these fundamental operations



“Why is this useful?”
• SQL is only loosely based on relational algebra
• SQL is much more on the “declarative” end of the spectrum
• Many relational databases use relational algebra operations for representing execution plans
• Simple, clean, effective abstraction for representing how results will be generated
• Relatively easy to manipulate for query optimization



Fundamental Relational Algebra Operations
• Six fundamental operations:
σ select operation
Π project operation
∪ set-union operation
– set-difference operation
× Cartesian product operation
ρ rename operation
• Each operation takes one or two relations as input
• Produces another relation as output
• Important details:
• What tuples are included in the result relation?
• Any constraints on input schemas? What is schema of result?
Select Operation
Written as: σP(r)
• r is the input relation
• P is the predicate for selection
• P can refer to attributes in r (but no other relation!), as well as literal values
• Can use comparison operators: =, ≠, <, ≤, >, ≥
• Can combine multiple predicates using: ∧ (and), ∨ (or), ¬ (not)
• Result relation contains all tuples in r for which P is true
• Result schema is identical to schema for r



Select Examples
Using the account relation:

acct_id  branch_name  balance
A-301    New York     350
A-307    Seattle      275
A-318    Los Angeles  550
A-319    New York     80
A-322    Los Angeles  275
account

“Retrieve all tuples for accounts in the Los Angeles branch.”
σbranch_name=“Los Angeles”(account)

acct_id  branch_name  balance
A-318    Los Angeles  550
A-322    Los Angeles  275

“Retrieve all tuples for accounts in the Los Angeles branch, with a balance under $300.”
σbranch_name=“Los Angeles” ∧ balance<300(account)

acct_id  branch_name  balance
A-322    Los Angeles  275
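A minimal Python sketch of the select operation (not part of the original slides). It assumes a relation is modeled as a pair (schema, rows), where schema is a tuple of attribute names and rows is a Python set of tuples; the helper name select is hypothetical. The predicate P is passed as a function that sees one tuple at a time, keyed by attribute name, and the result keeps r's schema unchanged.

```python
# A relation is modeled as (schema, rows): schema is a tuple of attribute
# names, rows is a set of tuples (a set, so duplicate tuples cannot occur).
account = (
    ("acct_id", "branch_name", "balance"),
    {
        ("A-301", "New York", 350),
        ("A-307", "Seattle", 275),
        ("A-318", "Los Angeles", 550),
        ("A-319", "New York", 80),
        ("A-322", "Los Angeles", 275),
    },
)


def select(pred, r):
    """sigma_P(r): keep the tuples of r for which pred is true; schema is unchanged."""
    schema, rows = r
    # pred receives each tuple as a dict keyed by attribute name
    return schema, {t for t in rows if pred(dict(zip(schema, t)))}


la = select(lambda t: t["branch_name"] == "Los Angeles", account)
la_under_300 = select(
    lambda t: t["branch_name"] == "Los Angeles" and t["balance"] < 300, account
)
print(sorted(la[1]))            # [('A-318', 'Los Angeles', 550), ('A-322', 'Los Angeles', 275)]
print(sorted(la_under_300[1]))  # [('A-322', 'Los Angeles', 275)]
```

The ∧ in the second predicate maps directly onto Python's `and`; ∨ and ¬ would map onto `or` and `not`.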
Project Operation
Written as: Πa,b,…(r)
• Result relation contains only specified attributes of r
• Specified attributes must actually be in schema of r
• Result’s schema only contains the specified attributes
• Domains are same as source attributes’ domains
• Important note:
• Result relation may have fewer rows than input relation!
• Why?
• Relations are sets of tuples, not multisets, so duplicate tuples produced by the projection are eliminated



Project Examples
Using the account relation:

acct_id  branch_name  balance
A-301    New York     350
A-307    Seattle      275
A-318    Los Angeles  550
A-319    New York     80
A-322    Los Angeles  275
account

“Retrieve all branch names that have at least one account.”
Πbranch_name(account)

branch_name
New York
Seattle
Los Angeles

• Result only has three tuples, even though input has five
• Result schema is just (branch_name)
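A minimal Python sketch of projection (not from the original slides), assuming the same hypothetical (schema, rows) model with rows stored in a Python set. Because the result is built as a set, the two "New York" and two "Los Angeles" tuples collapse, which is exactly why five input tuples become three.

```python
account = (
    ("acct_id", "branch_name", "balance"),
    {
        ("A-301", "New York", 350),
        ("A-307", "Seattle", 275),
        ("A-318", "Los Angeles", 550),
        ("A-319", "New York", 80),
        ("A-322", "Los Angeles", 275),
    },
)


def project(attrs, r):
    """Pi_attrs(r): keep only the named attributes; duplicate tuples collapse."""
    schema, rows = r
    # Specified attributes must actually be in the schema of r
    missing = [a for a in attrs if a not in schema]
    if missing:
        raise ValueError(f"attributes not in schema: {missing}")
    positions = [schema.index(a) for a in attrs]
    # Building a set removes duplicates, as in set-based relations
    return tuple(attrs), {tuple(t[i] for i in positions) for t in rows}


branches = project(["branch_name"], account)
print(branches[0])                              # ('branch_name',)
print(len(account[1]), "->", len(branches[1]))  # 5 -> 3
```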



Composing Operations
• Input can also be an expression that evaluates to a relation, instead of just a named relation
• Πacct_id(σbalance≥300(account))
• Selects the account IDs of all accounts with a balance of $300 or more
• Input relation’s schema is:
Account_schema = (acct_id, branch_name, balance)
• Final result relation’s schema?
• Just one attribute: (acct_id)
• Distinguish between base and derived relations
• account is a base relation
• σbalance≥300(account) is a derived relation
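Composition can be sketched in Python by nesting calls, since each operation returns a relation. This assumes the same hypothetical (schema, rows) model and hypothetical select/project helpers; it is an illustration, not a reference implementation.

```python
account = (
    ("acct_id", "branch_name", "balance"),
    {
        ("A-301", "New York", 350),
        ("A-307", "Seattle", 275),
        ("A-318", "Los Angeles", 550),
        ("A-319", "New York", 80),
        ("A-322", "Los Angeles", 275),
    },
)


def select(pred, r):
    schema, rows = r
    return schema, {t for t in rows if pred(dict(zip(schema, t)))}


def project(attrs, r):
    schema, rows = r
    positions = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in positions) for t in rows}


# sigma_{balance>=300}(account) is a derived relation with account's schema
derived = select(lambda t: t["balance"] >= 300, account)
# Pi_acct_id(sigma_{balance>=300}(account)) has the one-attribute schema (acct_id)
result = project(["acct_id"], derived)
print(derived[0])         # ('acct_id', 'branch_name', 'balance')
print(sorted(result[1]))  # [('A-301',), ('A-318',)]
```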
Set-Union Operation

Written as: r ∪ s
• Result contains all tuples from r and s
• Each tuple appears only once, even if it is in both r and s
• Constraints on schemas for r and s?
• r and s must have compatible schemas:
• r and s must have same arity (same number of attributes)
• For each attribute i in r and s, r[i] must have the same domain as s[i]
• (Our examples generally also use the same attribute names, but this is not required; arity and domains are what matter.)



Set-Union Example

acct_id  branch_name  balance
A-301    New York     350
A-307    Seattle      275
A-318    Los Angeles  550
A-319    New York     80
A-322    Los Angeles  275
account

cust_name  acct_id
Johnson    A-318
Smith      A-322
Reynolds   A-319
Lewis      A-307
Reynolds   A-301
depositor

loan_id  branch_name    amount
L-421    San Francisco  7500
L-445    Los Angeles    2000
L-437    Las Vegas      4300
L-419    Seattle        2900
loan

cust_name  loan_id
Anderson   L-437
Jackson    L-419
Lewis      L-421
Smith      L-445
borrower
Set-Union Example
• Find names of all customers that have either a bank account or a loan at the bank
• Easy to find the customers with an account:
Πcust_name(depositor)
• Also easy to find customers with a loan:
Πcust_name(borrower)

cust_name
Johnson
Smith
Reynolds
Lewis
Πcust_name(depositor)

cust_name
Anderson
Jackson
Lewis
Smith
Πcust_name(borrower)

• Result is set-union of these expressions:
Πcust_name(depositor) ∪ Πcust_name(borrower)

cust_name
Johnson
Smith
Reynolds
Lewis
Anderson
Jackson

• Note that the two inputs have 8 tuples in total, but the result has only 6 tuples: Lewis and Smith appear in both inputs, yet only once in the result.
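A minimal Python sketch of set-union with a schema-compatibility check (not from the original slides). It assumes the hypothetical (schema, rows) model; only arity is checked, since Python values carry no declared domains, and giving the result r's attribute names is an assumption of this sketch.

```python
depositor = (
    ("cust_name", "acct_id"),
    {("Johnson", "A-318"), ("Smith", "A-322"), ("Reynolds", "A-319"),
     ("Lewis", "A-307"), ("Reynolds", "A-301")},
)
borrower = (
    ("cust_name", "loan_id"),
    {("Anderson", "L-437"), ("Jackson", "L-419"), ("Lewis", "L-421"), ("Smith", "L-445")},
)


def project(attrs, r):
    schema, rows = r
    positions = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in positions) for t in rows}


def union(r, s):
    """r ∪ s for compatible schemas (same arity; domains are not checked here)."""
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    if len(r_schema) != len(s_schema):
        raise ValueError("r and s must have the same arity")
    # Set union keeps each tuple once; the result takes r's attribute names
    return r_schema, r_rows | s_rows


with_account = project(["cust_name"], depositor)
with_loan = project(["cust_name"], borrower)
either = union(with_account, with_loan)
print(len(with_account[1]) + len(with_loan[1]), "->", len(either[1]))  # 8 -> 6
```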



Set-Difference Operation
Written as: r – s
• Result contains tuples that are in r but not in s
• Tuples in both r and s are excluded
• Tuples only in s are also excluded
• Constraints on schemas of r and s?
• Schemas must be compatible
• (Exactly like set-union)



Set-Difference Example
Find all customers that have an account but not a loan.
• Easy to find the customers with an account:
Πcust_name(depositor)
• Also easy to find customers with a loan:
Πcust_name(borrower)

cust_name
Johnson
Smith
Reynolds
Lewis
Πcust_name(depositor)

cust_name
Anderson
Jackson
Lewis
Smith
Πcust_name(borrower)

• Result is set-difference of these expressions:
Πcust_name(depositor) – Πcust_name(borrower)

cust_name
Johnson
Reynolds
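The same query as a minimal Python sketch (not from the original slides), assuming the hypothetical (schema, rows) model. Set-difference has the same compatibility requirement as set-union, so the sketch repeats the arity check.

```python
depositor = (
    ("cust_name", "acct_id"),
    {("Johnson", "A-318"), ("Smith", "A-322"), ("Reynolds", "A-319"),
     ("Lewis", "A-307"), ("Reynolds", "A-301")},
)
borrower = (
    ("cust_name", "loan_id"),
    {("Anderson", "L-437"), ("Jackson", "L-419"), ("Lewis", "L-421"), ("Smith", "L-445")},
)


def project(attrs, r):
    schema, rows = r
    positions = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in positions) for t in rows}


def difference(r, s):
    """r – s: tuples in r but not in s (compatible schemas required)."""
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    if len(r_schema) != len(s_schema):
        raise ValueError("r and s must have the same arity")
    return r_schema, r_rows - s_rows


account_only = difference(project(["cust_name"], depositor), project(["cust_name"], borrower))
print(sorted(account_only[1]))  # [('Johnson',), ('Reynolds',)]
```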



Cartesian Product Operation
Written as: r × s
• Read as “r cross s”
• No constraints on schemas of r and s
• Schema of result is concatenation of schemas for r and s
• If r and s have overlapping attribute names:
• All overlapping attributes are included; none are eliminated
• Distinguish overlapping attribute names by prepending the source relation’s name
• Example:
• Input relations: r(a, b) and s(b, c)
• Schema of r × s is (a, r.b, s.b, c)
Cartesian Product Operation
• Result of r × s
• Contains every tuple in r, combined with every tuple in s
• If r contains Nr tuples, and s contains Ns tuples, result contains Nr × Ns tuples
• Allows two relations to be compared and/or combined
• If we want to correlate tuples in relation r with tuples in relation s…
• Compute r × s, then select out desired results with an appropriate predicate
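A minimal Python sketch of the Cartesian product and its schema rule (not from the original slides). It assumes the hypothetical (schema, rows) model and passes the relation names in explicitly so that overlapping attributes can be prefixed; the small relations r(a, b) and s(b, c) use made-up values purely for illustration.

```python
def product(r_name, r, s_name, s):
    """r x s: pair every tuple of r with every tuple of s.

    Attribute names that appear in both schemas are prefixed with the
    source relation's name; all other names are kept as they are.
    """
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    shared = set(r_schema) & set(s_schema)
    schema = tuple(f"{r_name}.{a}" if a in shared else a for a in r_schema) + tuple(
        f"{s_name}.{a}" if a in shared else a for a in s_schema
    )
    # Concatenate each r tuple with each s tuple: Nr * Ns result tuples
    return schema, {x + y for x in r_rows for y in s_rows}


# Hypothetical example data for r(a, b) and s(b, c)
r = (("a", "b"), {(1, "x"), (2, "y")})
s = (("b", "c"), {("x", True), ("y", False), ("z", True)})
rs = product("r", r, "s", s)
print(rs[0])       # ('a', 'r.b', 's.b', 'c')
print(len(rs[1]))  # 6
```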



Cartesian Product Example

loan_id  branch_name    amount
L-421    San Francisco  7500
L-445    Los Angeles    2000
L-437    Las Vegas      4300
L-419    Seattle        2900
loan

cust_name  loan_id
Anderson   L-437
Jackson    L-419
Lewis      L-421
Smith      L-445
borrower

• Compute result of borrower × loan
• Result will contain 4 × 4 = 16 tuples



Cartesian Product Example
• Schema for borrower is:
Borrower_schema = (cust_name, loan_id)
• Schema for loan is:
Loan_schema = (loan_id, branch_name, amount)
• Schema for result of borrower × loan is:
(cust_name, borrower.loan_id, loan.loan_id, branch_name, amount)
• Overlapping attribute names are distinguished by including name of source relation



cust_name  borrower.loan_id  loan.loan_id  branch_name    amount
Anderson   L-437             L-421         San Francisco  7500
Anderson   L-437             L-445         Los Angeles    2000
Anderson   L-437             L-437         Las Vegas      4300
Anderson   L-437             L-419         Seattle        2900
Jackson    L-419             L-421         San Francisco  7500
Jackson    L-419             L-445         Los Angeles    2000
Jackson    L-419             L-437         Las Vegas      4300
Jackson    L-419             L-419         Seattle        2900
Lewis      L-421             L-421         San Francisco  7500
Lewis      L-421             L-445         Los Angeles    2000
Lewis      L-421             L-437         Las Vegas      4300
Lewis      L-421             L-419         Seattle        2900
Smith      L-445             L-421         San Francisco  7500
Smith      L-445             L-445         Los Angeles    2000
Smith      L-445             L-437         Las Vegas      4300
Smith      L-445             L-419         Seattle        2900



Cartesian Product Example
• Can use Cartesian product to associate related rows between two tables
• …but a lot of extra rows are included!

cust_name  borrower.loan_id  loan.loan_id  branch_name    amount
…          …                 …             …              …
Jackson    L-419             L-437         Las Vegas      4300
Jackson    L-419             L-419         Seattle        2900
Lewis      L-421             L-421         San Francisco  7500
Lewis      L-421             L-445         Los Angeles    2000
…          …                 …             …              …

• Combine Cartesian product with a select operation
σborrower.loan_id=loan.loan_id(borrower × loan)



Cartesian Product Example
“Retrieve the names of all customers with loans at the Seattle branch.”

loan_id  branch_name    amount
L-421    San Francisco  7500
L-445    Los Angeles    2000
L-437    Las Vegas      4300
L-419    Seattle        2900
loan

cust_name  loan_id
Anderson   L-437
Jackson    L-419
Lewis      L-421
Smith      L-445
borrower

• Need both borrower and loan relations
• Correlate tuples in the relations using loan_id
• Then, computing result is easy.
Cartesian Product Example
• Associate customer names with loan details, using Cartesian product and a select:
σborrower.loan_id=loan.loan_id(borrower × loan)
• Select out loans at Seattle branch:
σbranch_name=“Seattle”(σborrower.loan_id=loan.loan_id(borrower × loan))
Simplify:
σborrower.loan_id=loan.loan_id ∧ branch_name=“Seattle”(borrower × loan)
• Project results down to customer name:
Πcust_name(σborrower.loan_id=loan.loan_id ∧ branch_name=“Seattle”(borrower × loan))
• Final result:
cust_name
Jackson
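The whole Seattle query can be sketched in Python as product, then select, then project (not from the original slides). It assumes the hypothetical (schema, rows) model and hypothetical helper names; the product prefixes the overlapping loan_id attribute exactly as in the result schema shown earlier.

```python
loan = (
    ("loan_id", "branch_name", "amount"),
    {("L-421", "San Francisco", 7500), ("L-445", "Los Angeles", 2000),
     ("L-437", "Las Vegas", 4300), ("L-419", "Seattle", 2900)},
)
borrower = (
    ("cust_name", "loan_id"),
    {("Anderson", "L-437"), ("Jackson", "L-419"), ("Lewis", "L-421"), ("Smith", "L-445")},
)


def product(r_name, r, s_name, s):
    """r x s; overlapping attribute names get the source relation's name as a prefix."""
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    shared = set(r_schema) & set(s_schema)
    schema = tuple(f"{r_name}.{a}" if a in shared else a for a in r_schema) + tuple(
        f"{s_name}.{a}" if a in shared else a for a in s_schema
    )
    return schema, {x + y for x in r_rows for y in s_rows}


def select(pred, r):
    schema, rows = r
    return schema, {t for t in rows if pred(dict(zip(schema, t)))}


def project(attrs, r):
    schema, rows = r
    positions = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(t[i] for i in positions) for t in rows}


pairs = product("borrower", borrower, "loan", loan)
# Pi_cust_name(sigma_{borrower.loan_id=loan.loan_id ∧ branch_name="Seattle"}(borrower x loan))
seattle = project(
    ["cust_name"],
    select(
        lambda t: (t["borrower.loan_id"] == t["loan.loan_id"]
                   and t["branch_name"] == "Seattle"),
        pairs,
    ),
)
print(pairs[0])       # ('cust_name', 'borrower.loan_id', 'loan.loan_id', 'branch_name', 'amount')
print(len(pairs[1]))  # 16
print(seattle[1])     # {('Jackson',)}
```

Evaluating the full 16-tuple product before filtering is the naive strategy; it is used here only because it mirrors the expression directly.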



Rename Operation
• Results of relational operations are unnamed
• Result has a schema, but the relation itself is unnamed
• Can give result a name using the rename operator
• Written as: ρx(E) (Greek rho, not a lowercase “p”)
• E is an expression that produces a relation
• E can also be a named relation or a relation-variable
• x is new name of relation
• More general form is: ρx(A1, A2, …, An) (E)
• Allows renaming of relation’s attributes
• Requirement: E has arity n



Scope of Renamed Relations
• Rename operation ρ only applies within a specific relational algebra expression
• This does not create a new relation-variable!
• The new name is only visible to enclosing relational-algebra expressions
• Rename operator is used for two main purposes:
• Allow a derived relation and its attributes to be referred to by enclosing relational-algebra operations
• Allow a base relation to be used multiple ways in one query
r × ρs(r)
• In other words, rename operation ρ is used to resolve ambiguities within a specific relational algebra expression
Rename Example
“Find the ID of the loan with the largest amount.”
loan_id branch_name amount
L-421 San Francisco 7500
L-445 Los Angeles 2000
L-437 Las Vegas 4300
L-419 Seattle 2900
loan

• Hard to find the loan with the largest amount!


• (At least, with the tools we have so far…)
• Much easier to find all loans that have an amount smaller than some other loan
• Then, use set-difference to find the largest loan
Rename Example
• How to find all loans with an amount smaller than some other loan?
• Use Cartesian Product of loan with itself:
loan × loan
• Compare each loan’s amount to all other loans
• Problem: Can’t distinguish between attributes of left and right loan relations!
• Solution: Use rename operation
loan × ρtest(loan)
• Now, right relation is named test



Rename Example
• Find IDs of all loans with an amount smaller than some other loan:
Πloan.loan_id(σloan.amount<test.amount(loan × ρtest(loan)))
• Finally, we can get our result:
Πloan_id(loan) –
Πloan.loan_id(σloan.amount<test.amount(loan × ρtest(loan)))
• What if multiple loans have max value?
• All loans with max value appear in result.
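A minimal Python sketch of the largest-loan query (not from the original slides). Here a relation additionally carries a name so that rename has something to change; the Relation type and the rename, product, select, project and difference helpers are all hypothetical. The product refuses to combine two relations with the same name when attributes overlap, which models the loan × loan ambiguity.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    name: Optional[str]  # None models an unnamed (derived) relation
    schema: tuple
    rows: frozenset


def rename(x, e, attrs=None):
    """rho_x(E), or rho_x(A1, ..., An)(E) when attrs is given (E must have arity n)."""
    if attrs is not None and len(attrs) != len(e.schema):
        raise ValueError("E must have arity n")
    return Relation(x, tuple(attrs) if attrs is not None else e.schema, e.rows)


def product(r, s):
    """r x s, prefixing overlapping attribute names with each relation's name."""
    shared = set(r.schema) & set(s.schema)
    if shared and (r.name is None or s.name is None or r.name == s.name):
        raise ValueError("cannot distinguish overlapping attributes; rename first")
    schema = tuple(f"{r.name}.{a}" if a in shared else a for a in r.schema) + tuple(
        f"{s.name}.{a}" if a in shared else a for a in s.schema
    )
    return Relation(None, schema, frozenset(x + y for x in r.rows for y in s.rows))


def select(pred, r):
    return Relation(None, r.schema, frozenset(t for t in r.rows if pred(dict(zip(r.schema, t)))))


def project(attrs, r):
    pos = [r.schema.index(a) for a in attrs]
    return Relation(None, tuple(attrs), frozenset(tuple(t[i] for i in pos) for t in r.rows))


def difference(r, s):
    # Compatible schemas assumed: same arity (domains are not checked here)
    return Relation(None, r.schema, r.rows - s.rows)


loan = Relation("loan", ("loan_id", "branch_name", "amount"), frozenset({
    ("L-421", "San Francisco", 7500),
    ("L-445", "Los Angeles", 2000),
    ("L-437", "Las Vegas", 4300),
    ("L-419", "Seattle", 2900),
}))

try:
    product(loan, loan)  # loan x loan: left and right attributes collide
except ValueError as err:
    print("loan x loan:", err)

# Loans with an amount smaller than some other loan
smaller = project(["loan.loan_id"], select(
    lambda t: t["loan.amount"] < t["test.amount"],
    product(loan, rename("test", loan)),
))
largest = difference(project(["loan_id"], loan), smaller)
print(sorted(smaller.rows))  # [('L-419',), ('L-437',), ('L-445',)]
print(largest.rows)          # frozenset({('L-421',)})
```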



Additional Relational Operations
• The fundamental operations are sufficient to query a relational database…
• Can produce some large expressions for common operations!
• Several additional operations, defined in terms of fundamental operations:
∩ set-intersection
⋈ natural join
÷ division
← assignment



Set-Intersection Operation
Written as: r ∩ s
• r ∩ s = r – (r – s)
r – s = the rows in r, but not in s
r – (r – s) = the rows in both r and s
• Relations must have compatible schemas
• Example: find all customers with both a loan and a bank account
Πcust_name(borrower) ∩ Πcust_name(depositor)
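A quick Python check of the identity r ∩ s = r – (r – s) on this example (not from the original slides). It assumes the two projected relations are written out directly as Python sets of 1-tuples, and compares the result against Python's built-in set intersection.

```python
# Pi_cust_name(borrower) and Pi_cust_name(depositor), as sets of 1-tuples
with_loan = {("Anderson",), ("Jackson",), ("Lewis",), ("Smith",)}
with_account = {("Johnson",), ("Smith",), ("Reynolds",), ("Lewis",)}

# r ∩ s expressed with set-difference only: r – (r – s)
both = with_loan - (with_loan - with_account)

assert both == with_loan & with_account  # agrees with built-in intersection
print(sorted(both))  # [('Lewis',), ('Smith',)]
```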



Natural Join Operation
• Most common use of Cartesian product is to correlate tuples with the same key-values
• Called a join operation
• The natural join is a shorthand for this operation
• Written as: r ⋈ s
• r and s normally share one or more attributes (if they share none, r ⋈ s is simply r × s)
• The common attributes are usually a key for r and/or s, but certainly don’t have to be



Natural Join Definition
• For two relations r(R) and s(S)
• Attributes used to perform natural join:
R ∩ S = {A1, A2, …, An}
• Formal definition:
• r ⋈ s = ΠR ∪ S(σr.A1=s.A1 ∧ r.A2=s.A2 ∧ … ∧ r.An=s.An(r × s))
• r and s are joined using an equality condition based on their common attributes
• Result is projected so that common attributes only appear once
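A minimal Python sketch that follows the formal definition step by step (not from the original slides): pair tuples as in r × s, keep pairs that agree on the common attributes R ∩ S, and emit each common attribute once so the schema is R ∪ S. It assumes the hypothetical (schema, rows) model and uses a naive nested loop, which is not how a real database would evaluate a join.

```python
borrower = (
    ("cust_name", "loan_id"),
    {("Anderson", "L-437"), ("Jackson", "L-419"), ("Lewis", "L-421"), ("Smith", "L-445")},
)
loan = (
    ("loan_id", "branch_name", "amount"),
    {("L-421", "San Francisco", 7500), ("L-445", "Los Angeles", 2000),
     ("L-437", "Las Vegas", 4300), ("L-419", "Seattle", 2900)},
)


def natural_join(r, s):
    """r ⋈ s = Pi_{R ∪ S}(sigma_{r.A1=s.A1 ∧ ... ∧ r.An=s.An}(r x s))."""
    (r_schema, r_rows), (s_schema, s_rows) = r, s
    common = [a for a in r_schema if a in s_schema]      # R ∩ S
    s_extra = [a for a in s_schema if a not in r_schema]
    schema = tuple(r_schema) + tuple(s_extra)            # R ∪ S, common attributes once
    rows = set()
    for rt in r_rows:                                    # every pair, as in r x s
        rd = dict(zip(r_schema, rt))
        for st in s_rows:
            sd = dict(zip(s_schema, st))
            # Equality on all common attributes; with none, every pair matches
            if all(rd[a] == sd[a] for a in common):
                rows.add(rt + tuple(sd[a] for a in s_extra))
    return schema, rows


joined = natural_join(borrower, loan)
names = {(t[0],) for t in joined[1]}  # Pi_cust_name(borrower ⋈ loan)
print(joined[0])       # ('cust_name', 'loan_id', 'branch_name', 'amount')
print(len(joined[1]))  # 4
```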



Natural Join Example
• Simple example:
“Find the names of all customers with loans.”
• Result:
Πcust_name(σborrower.loan_id=loan.loan_id(borrower × loan))
• Rewritten with natural join:
Πcust_name(borrower ⋈ loan)



Natural Join Characteristics
• Very common to compute joins across multiple tables
• Example: customer ⋈ borrower ⋈ loan
• Natural join operation is associative:
(customer ⋈ borrower) ⋈ loan is equivalent to customer ⋈ (borrower ⋈ loan)

• Note:
• Even though these expressions are equivalent, order of join operations can dramatically affect query cost!
• (Keep this in mind for later…)



Division Operation
• Binary operator: r ÷ s
• Implements a “for each” type of query
• “Find all rows in r that have one row corresponding to each row in s.”
• Relation r divided by relation s
• Not provided natively in most SQL databases
• Rarely needed!
• Easy enough to implement in SQL, if needed



Division Operation
• Puzzle Database
puzzle_list(puzzle_name)
• Simple list of puzzles by name
completed(person_name, puzzle_name)
• Records which puzzles have been completed by each person

person_name  puzzle_name
Alex         altekruse
Alex         soma cube
Bob          puzzle box
Carl         altekruse
Bob          soma cube
Carl         puzzle box
Alex         puzzle box
Carl         soma cube
completed

puzzle_name
altekruse
soma cube
puzzle box
puzzle_list

“Who has solved every puzzle?”
completed ÷ puzzle_list =

person_name
Alex
Carl
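A minimal Python sketch of the “for each” meaning of division on this example (not from the original slides). It assumes completed and puzzle_list are Python sets of tuples and that r's schema is (person_name, puzzle_name) with s's schema (puzzle_name); the divide helper is hypothetical and specialized to that layout.

```python
completed = {
    ("Alex", "altekruse"), ("Alex", "soma cube"), ("Bob", "puzzle box"),
    ("Carl", "altekruse"), ("Bob", "soma cube"), ("Carl", "puzzle box"),
    ("Alex", "puzzle box"), ("Carl", "soma cube"),
}
puzzle_list = {("altekruse",), ("soma cube",), ("puzzle box",)}


def divide(r, s):
    """r ÷ s for r(person_name, puzzle_name) and s(puzzle_name).

    Keep each person_name x such that (x, p) is in r for every p in s.
    """
    candidates = {(person,) for person, _ in r}
    return {(x,) for (x,) in candidates if all((x, p) in r for (p,) in s)}


print(sorted(divide(completed, puzzle_list)))  # [('Alex',), ('Carl',)]
```

Bob is excluded because ("Bob", "altekruse") is missing from completed.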
Relation Variables
• Recall: relation variables refer to a specific relation
• A specific set of tuples, with a particular schema
• Example: the account relation
acct_id branch_name balance
A-301 New York 350
A-307 Seattle 275
A-318 Los Angeles 550
A-319 New York 80
A-322 Los Angeles 275
account

• Technically, account is a relation variable, as are all of our named relations so far
Assignment Operation
• Can assign a relation-value to a relation-variable
• Written as: relvar ← E
• E is an expression that evaluates to a relation
• Unlike ρ, the name relvar persists in the database
• Often used for temporary relation-variables:
temp1 ← ΠR–S(r)
temp2 ← ΠR–S((temp1 × s) – ΠR–S,S(r))
result ← temp1 – temp2
• Query evaluation becomes a sequence of steps
• (This is an implementation of the ÷ operator)
• Can also use assignment operation to modify data
• More about updates next time…
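The three assignment steps can be traced in Python on the puzzle database (not from the original slides). This sketch assumes r = completed with schema (person_name, puzzle_name), so R – S = (person_name) and S = (puzzle_name); Python variables stand in for the temporary relation-variables, although unlike relvars they do not persist anywhere.

```python
completed = {
    ("Alex", "altekruse"), ("Alex", "soma cube"), ("Bob", "puzzle box"),
    ("Carl", "altekruse"), ("Bob", "soma cube"), ("Carl", "puzzle box"),
    ("Alex", "puzzle box"), ("Carl", "soma cube"),
}
puzzle_list = {("altekruse",), ("soma cube",), ("puzzle box",)}

# temp1 ← Pi_{R–S}(r): every person who completed at least one puzzle
temp1 = {(person,) for person, _ in completed}

# temp1 × s: every (person, puzzle) pair that would be needed
needed = {(x, p) for (x,) in temp1 for (p,) in puzzle_list}

# Pi_{R–S,S}(r): completed already lists attributes in the order (R–S, S)
# temp2 ← Pi_{R–S}((temp1 × s) – Pi_{R–S,S}(r)): people missing some puzzle
temp2 = {(x,) for (x, _) in needed - completed}

# result ← temp1 – temp2
result = temp1 - temp2
print(sorted(temp2))   # [('Bob',)]
print(sorted(result))  # [('Alex',), ('Carl',)]
```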
Guest (GuestID, GuestName, GuestAddress)
Hotel (HotelID, HotelName, HotelAddress)
Room (RoomID, HotelID, Type, Price)
Booking (HotelID, GuestID, RoomID, DateFrom, DateTo)

For each question, write the relational algebra that will fulfill the request. Please describe your answer!
1. List full details of all hotels.
2. List full details of all hotels in Surabaya.
3. List the number of rooms in each hotel in Surabaya.
4. List the names and addresses of all guests in Surabaya.
5. List all double or family rooms with a price below Rp 1,500,000 per night.



Store(StoreID, StoreName, address)
Product(ProductID, ProductName, colour)
Catalog(StoreID, ProductID, price)

For each question, write the relational algebra that will fulfill the request. Please describe your answer!
1. Find the names of all black products.
2. Find all prices for products that are black or white.
3. Find the StoreID of all stores that sell a product that is black or white.
4. Find the StoreID of all stores that sell a product that is black and white.
5. Find the names of all stores that supply a product that is black or white.



Thank You!

