DBMS Module 2.5 Query Processing
[Figure: query-processing flow diagram with the labels Optimizer, Execution Plan, Query Evaluation Engine, Output, Data, and Statistics About Data]
CIS552 Query Processing 2
Database Engineering 4th SEM CSE
Basic Steps in Query Processing
Optimization – finding the cheapest evaluation plan for a query.
• A given relational-algebra expression may have many equivalent
expressions.
E.g. σbalance<2500(Πbalance(account)) is equivalent to
Πbalance(σbalance<2500(account))
• Any relational-algebra expression can be evaluated in
many ways. An annotated expression specifying a detailed
evaluation strategy is called an evaluation plan.
E.g. we can use an index on balance to find accounts with
balance < 2500, or we can perform a complete relation scan and
discard accounts with balance ≥ 2500.
• Amongst all equivalent expressions, try to choose the one
with the cheapest possible evaluation plan. The cost estimate of a
plan is based on statistical information in the DBMS catalog.
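The two ideas above (equivalent expressions, and alternative evaluation plans for one expression) can be sketched in a few lines of Python. This is only an illustration: the account rows, the helper names `select` and `project`, and the sorted list standing in for an index on balance are all assumptions, not part of the original slides.

```python
import bisect

# Toy account relation; the rows are made-up illustration data.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-305", "balance": 2500},
    {"account_number": "A-222", "balance": 3000},
]

def select(rows, pred):
    """Relational selection: keep the rows that satisfy pred."""
    return [row for row in rows if pred(row)]

def project(rows, attrs):
    """Relational projection: keep only attrs and eliminate duplicates."""
    return {tuple(row[a] for a in attrs) for row in rows}

# Expression 1: selection applied to the projection.
balances = [{"balance": b} for (b,) in project(account, ["balance"])]
expr1 = project(select(balances, lambda r: r["balance"] < 2500), ["balance"])

# Expression 2: projection applied to the selection.
expr2 = project(select(account, lambda r: r["balance"] < 2500), ["balance"])

# Two evaluation plans for the selection balance < 2500 on account.
# Plan A: complete relation scan, discarding rows with balance >= 2500.
via_scan = select(account, lambda r: r["balance"] < 2500)

# Plan B: a sorted list of (balance, row position) pairs acts as a
# stand-in for an index on balance; binary search finds the cut-off.
index = sorted((row["balance"], pos) for pos, row in enumerate(account))
keys = [key for key, _ in index]
cut = bisect.bisect_left(keys, 2500)
via_index = [account[pos] for _, pos in index[:cut]]
```

Both expressions give the same set of balances, and both plans return the same accounts; what differs is how much of the relation each plan has to touch, which is what the cost estimate tries to capture.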
Catalog Information for Cost Estimation
• nr: number of tuples in relation r.
• br: number of blocks containing tuples of r.
• sr: size of a tuple of r in bytes.
• fr: blocking factor of r - i.e., the number of tuples of r that fit into one
block.
• V(A, r): number of distinct values that appear in r for attribute A;
same as the size of ΠA(r).
• SC(A, r): selection cardinality of attribute A of relation r; average
number of records that satisfy equality on A.
• If tuples of r are stored together physically in a file, then:
br = ⌈nr / fr⌉
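As a small worked example of these statistics, the sketch below computes the block count from nr and fr, and estimates SC(A, r) as nr / V(A, r). The second formula assumes that values of A are uniformly distributed, which is a common simplification and not something the catalog guarantees. The function names and the sample numbers are illustrative only.

```python
import math

def blocks_needed(n_r, f_r):
    """Blocks occupied by r when its tuples are stored together.

    n_r: number of tuples in r; f_r: blocking factor (tuples per block).
    """
    return math.ceil(n_r / f_r)

def selection_cardinality_uniform(n_r, v_a_r):
    """Estimate SC(A, r) as n_r / V(A, r).

    ASSUMPTION: the values of A are uniformly distributed over the tuples.
    """
    return n_r / v_a_r

# Made-up catalog numbers for an illustrative relation.
n_r, f_r, v_a_r = 10_000, 20, 50

b_r = blocks_needed(n_r, f_r)
sc = selection_cardinality_uniform(n_r, v_a_r)
print(f"blocks: {b_r}, estimated tuples per equality match: {sc}")
```

One extra tuple beyond a full block still costs a whole block, which is why the count is rounded up: `blocks_needed(10_001, 20)` is one more than `blocks_needed(10_000, 20)`.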
[Figure: expression tree in which σbalance<2500 is applied to account and the result is joined with customer]
Evaluation of Expressions (Cont.)
• Pipelining: evaluate several operations simultaneously,
passing the results of one operation on to the next.
• E.g., in the expression on the previous slide, don’t store the result of
σbalance<2500(account); instead, pass tuples directly to
the join. Similarly, don’t store the result of the join; pass tuples
directly to the projection.
• Much cheaper than materialization: no need to store a
temporary relation to disk.
• For pipelining to be effective, use evaluation algorithms
that generate output tuples even as tuples are received
for inputs to the operation.
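Pipelining maps naturally onto Python generators, where each operator pulls tuples from its input one at a time. The sketch below is a minimal illustration under several assumptions: the toy rows are invented, account carries a hypothetical customer_name attribute so that it can be joined with customer directly, the join is a simple hash join that builds its table on the customer input, and the projection does not eliminate duplicates.

```python
def scan(rows, log):
    """Leaf operator: yield stored tuples one at a time, logging each read."""
    for row in rows:
        log.append(("scan", row["account_number"]))
        yield row

def select(rows, pred):
    """Pipelined selection: pass on each qualifying tuple as it arrives."""
    for row in rows:
        if pred(row):
            yield row

def hash_join(left, right_rows, key):
    """Hash join: build a table on the right input, stream the left input."""
    table = {}
    for r in right_rows:
        table.setdefault(r[key], []).append(r)
    for l in left:
        for r in table.get(l[key], []):
            yield {**l, **r}

def project(rows, attrs, log):
    """Pipelined projection (no duplicate elimination), logging each output."""
    for row in rows:
        out = tuple(row[a] for a in attrs)
        log.append(("emit", out))
        yield out

# Made-up data; customer_name on account is an assumption for this sketch.
account = [
    {"account_number": "A-101", "customer_name": "Hayes", "balance": 500},
    {"account_number": "A-305", "customer_name": "Turner", "balance": 3000},
    {"account_number": "A-215", "customer_name": "Jones", "balance": 700},
]
customer = [
    {"customer_name": "Hayes", "customer_city": "Harrison"},
    {"customer_name": "Jones", "customer_city": "Brooklyn"},
    {"customer_name": "Turner", "customer_city": "Stamford"},
]

log = []
plan = project(
    hash_join(
        select(scan(account, log), lambda r: r["balance"] < 2500),
        customer,
        "customer_name",
    ),
    ["customer_name"],
    log,
)
result = list(plan)
```

The log shows the first output tuple being emitted before the second account tuple is even read, so no temporary relation for the selection or the join is ever stored.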
Transformation of Relational Expressions
Equivalence of Expressions
• Relations generated by two equivalent expressions have the same
set of attributes and contain the same set of tuples, although their
attributes may be ordered differently.
[Figure: two equivalent expression trees, each with Πcustomer-name at the root and a σbranch-city = Brooklyn selection; of the leaf relations only branch is legible]
Equivalence Rules (Cont.)
7. The selection operation distributes over the theta join
operation under the following two conditions:
(a) When all the attributes in θ0 involve only the attributes
of one of the expressions (E1) being joined.
σθ0(E1 ⋈θ E2) = (σθ0(E1)) ⋈θ E2
(b) When θ1 involves only the attributes of E1 and θ2
involves only the attributes of E2.
σθ1∧θ2(E1 ⋈θ E2) = (σθ1(E1)) ⋈θ (σθ2(E2))
8. The projection operation distributes over the theta join
operation as follows:
(a) if θ involves only attributes from L1 ∪ L2:
ΠL1∪L2(E1 ⋈θ E2) = (ΠL1(E1)) ⋈θ (ΠL2(E2))
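(b) Consider a join E1 ⋈θ E2. Let L1 and L2 be sets of
attributes from E1 and E2, respectively. Let L3 be
attributes of E1 that are involved in join condition θ
but are not in L1 ∪ L2, and let L4 be attributes of E2
that are involved in θ but are not in L1 ∪ L2. Then:
ΠL1∪L2(E1 ⋈θ E2) = ΠL1∪L2((ΠL1∪L3(E1)) ⋈θ (ΠL2∪L4(E2)))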
9. The set operations union and intersection are commutative (set
difference is not commutative).
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
10. Set union and intersection are associative.
11. The selection operation distributes over ∪, ∩ and −. E.g.:
σp(E1 − E2) = σp(E1) − σp(E2)
For difference and intersection, but not for union, we also have:
σp(E1 − E2) = σp(E1) − E2
σp(E1 ∩ E2) = σp(E1) ∩ E2
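The set-operation rules (9 to 11) are easy to spot-check with Python's built-in sets. The sketch below uses small sets of integers as stand-ins for relations and an arbitrary predicate; the sample values and the helper name `sel` are illustrative assumptions. A check on one example is not a proof, but it helps catch a misremembered rule.

```python
# Single-attribute "relations" modelled as sets of integers (made-up data).
E1 = {1, 2, 3, 4, 5, 6}
E2 = {4, 5, 6, 7, 8}
E3 = {6, 8, 9}

def sel(pred, rel):
    """Selection: the subset of rel whose tuples satisfy pred."""
    return {t for t in rel if pred(t)}

def p(t):
    """Arbitrary example predicate: keep even values."""
    return t % 2 == 0

# Rule 9: union and intersection commute; set difference does not.
union_commutes = (E1 | E2) == (E2 | E1)
intersection_commutes = (E1 & E2) == (E2 & E1)
difference_commutes = (E1 - E2) == (E2 - E1)

# Rule 10: union and intersection are associative.
union_associative = ((E1 | E2) | E3) == (E1 | (E2 | E3))
intersection_associative = ((E1 & E2) & E3) == (E1 & (E2 & E3))

# Rule 11: selection distributes over union, intersection and difference.
distributes_over_union = sel(p, E1 | E2) == (sel(p, E1) | sel(p, E2))
distributes_over_intersection = sel(p, E1 & E2) == (sel(p, E1) & sel(p, E2))
distributes_over_difference = sel(p, E1 - E2) == (sel(p, E1) - sel(p, E2))
```

On this data every flag is true except `difference_commutes`, which matches the remark in rule 9 that set difference is not commutative.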
Selection Operation Example
• Query: Find the names of all customers who have an
account at some branch located in Brooklyn.
Πcustomer-name(σbranch-city = “Brooklyn”
(branch ⋈ (account ⋈ depositor)))
• Transformation using rule 7a.
Πcustomer-name
((σbranch-city = “Brooklyn” (branch)) ⋈ (account ⋈ depositor))
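The effect of rule 7a on this query can be sketched with a naive natural join over lists of dictionaries. Everything here is illustrative: the rows are invented, the attribute names use underscores instead of hyphens, and `natural_join` is a simple nested-loop helper written for this example rather than any particular system's join algorithm.

```python
def natural_join(r, s):
    """Nested-loop natural join on every attribute name the inputs share."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out

def in_brooklyn(row):
    return row["branch_city"] == "Brooklyn"

# Made-up rows for the three relations in the query.
branch = [
    {"branch_name": "Brighton", "branch_city": "Brooklyn"},
    {"branch_name": "Downtown", "branch_city": "Brooklyn"},
    {"branch_name": "Perryridge", "branch_city": "Horseneck"},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Brighton", "balance": 900},
    {"account_number": "A-217", "branch_name": "Perryridge", "balance": 750},
]
depositor = [
    {"customer_name": "Johnson", "account_number": "A-101"},
    {"customer_name": "Hayes", "account_number": "A-102"},
    {"customer_name": "Johnson", "account_number": "A-201"},
    {"customer_name": "Jones", "account_number": "A-217"},
]

acc_dep = natural_join(account, depositor)

# Original form: join all three relations, then select on branch_city.
joined_all = natural_join(branch, acc_dep)
original = {row["customer_name"] for row in joined_all if in_brooklyn(row)}

# After rule 7a: select on branch first, then join the smaller input.
brooklyn_branches = [row for row in branch if in_brooklyn(row)]
joined_pushed = natural_join(brooklyn_branches, acc_dep)
pushed = {row["customer_name"] for row in joined_pushed}

print(len(joined_all), len(joined_pushed))
```

Both forms return the same customer names, but the join in the transformed expression produces fewer tuples because the non-Brooklyn branches were discarded before the join.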
Selection Operation Example (Cont.)
• Query: Find the names of all customers with an account at a
Brooklyn branch whose account balance is over $1000.
Πcustomer-name(σbranch-city = “Brooklyn” ∧ balance > 1000
(branch ⋈ (account ⋈ depositor)))
• Transformation using join associativity (Rule 6a):
Πcustomer-name(σbranch-city = “Brooklyn” ∧ balance > 1000
((branch ⋈ account) ⋈ depositor))
• Second form provides an opportunity to apply the “Perform
selections early” rule, resulting in the subexpression
σbranch-city = “Brooklyn” (branch) ⋈ σbalance > 1000 (account)
• Thus a sequence of transformations can be useful.
Projection Operation Example
Join Ordering Example
• For all relations r1, r2 and r3,
(r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
• If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose
(r1 ⋈ r2) ⋈ r3
so that we compute and store a smaller temporary
relation.
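A toy experiment makes the size argument concrete. In the sketch below the three relations, their attribute names (a, b, c, d) and their contents are invented so that the join of r1 and r2 is small while the join of r2 and r3 is large; `natural_join` is a naive nested-loop helper written only for this example.

```python
def natural_join(r, s):
    """Nested-loop natural join on every attribute name the inputs share."""
    out = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                out.append({**a, **b})
    return out

def canon(rows):
    """Order-independent form of a relation, for comparing results."""
    return {tuple(sorted(row.items())) for row in rows}

# Made-up relations: r1(a, b), r2(b, c), r3(c, d).
r1 = [{"a": 1, "b": 1}]
r2 = [{"b": b, "c": 0} for b in range(1, 6)]   # 5 tuples, all with c = 0
r3 = [{"c": 0, "d": d} for d in range(10)]     # 10 tuples, all with c = 0

# Order 1: (r1 join r2) join r3
temp_small = natural_join(r1, r2)
result_1 = natural_join(temp_small, r3)

# Order 2: r1 join (r2 join r3)
temp_large = natural_join(r2, r3)
result_2 = natural_join(r1, temp_large)

print(len(temp_small), len(temp_large), len(result_1))
```

Both orders give the same final relation, but the first materializes a one-tuple temporary while the second materializes fifty tuples, most of which are thrown away by the last join.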
Heuristic Optimization
• Cost-based optimization is expensive, even with
dynamic programming.
• Systems may use heuristics to reduce the number of
choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query tree by
using a set of rules that typically (but not in all cases)
improve execution performance:
– Perform selection early (reduces the number of tuples)
– Perform projection early (reduces the number of attributes)
– Perform the most restrictive selection and join operations before
other similar operations.
• Some systems use only heuristics, others combine
heuristics with partial cost-based optimization.
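The "perform selection early" heuristic can be sketched as a rewrite on a tiny query tree. This is a deliberately minimal illustration, not how any particular optimizer is built: the node classes `Rel`, `Join` and `Select` are hypothetical, predicates are kept as opaque strings with an explicit set of the attributes they use, and the rewrite only handles a selection whose attributes all come from one side of a join (it does not split conjunctions or reorder joins).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Join:
    left: object
    right: object

@dataclass(frozen=True)
class Select:
    pred: str          # predicate text, treated as opaque
    uses: frozenset    # attributes the predicate mentions
    child: object

def attrs(node):
    """Set of attributes produced by a subtree."""
    if isinstance(node, Rel):
        return node.attrs
    if isinstance(node, Select):
        return attrs(node.child)
    return attrs(node.left) | attrs(node.right)

def push_selections(node):
    """Move each selection below a join when one join input covers it."""
    if isinstance(node, Rel):
        return node
    if isinstance(node, Join):
        return Join(push_selections(node.left), push_selections(node.right))
    child = push_selections(node.child)
    if isinstance(child, Join):
        if node.uses <= attrs(child.left):
            pushed = push_selections(Select(node.pred, node.uses, child.left))
            return Join(pushed, child.right)
        if node.uses <= attrs(child.right):
            pushed = push_selections(Select(node.pred, node.uses, child.right))
            return Join(child.left, pushed)
    return Select(node.pred, node.uses, child)

# Schema for the running example (attribute names are illustrative).
branch = Rel("branch", frozenset({"branch_name", "branch_city"}))
account = Rel("account", frozenset({"account_number", "branch_name", "balance"}))
depositor = Rel("depositor", frozenset({"customer_name", "account_number"}))

city_pred = ("branch_city = 'Brooklyn'", frozenset({"branch_city"}))
bal_pred = ("balance > 1000", frozenset({"balance"}))

tree = Select(*city_pred, Join(branch, Join(account, depositor)))
rewritten = push_selections(tree)

tree2 = Select(*bal_pred, Join(branch, Join(account, depositor)))
rewritten2 = push_selections(tree2)
```

In the first tree the selection ends up directly above branch; in the second it travels through two joins and lands directly above account. A real optimizer would combine rewrites like this with cost estimates, since the heuristic does not help in every case.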