
DBMS Module 2.5 Query Processing

The document discusses the basic steps in query processing which are parsing and translation, optimization, and evaluation. It describes how a query is parsed, translated to relational algebra, optimized to find the most efficient evaluation plan, and then evaluated by executing the optimized plan. The optimization step involves transforming the query using equivalence rules to find a logically equivalent plan with lower estimated cost based on statistics about the data. Some common equivalence rules allow projections, selections, and joins to be reordered and distributed in different ways during optimization.

DBMS module 2.

Query Processing Strategy

Database Engineering 4th SEM CSE


Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
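[Figure: query-processing pipeline. The query goes to the parser and translator, which produces a relational-algebra expression; the optimizer, using statistics about the data, turns it into an execution plan; the evaluation engine runs the plan against the data and produces the query output.]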
CIS552 Query Processing 2
Basic Steps in Query Processing (Cont.)

Parsing and translation
• Translate the query into its internal form, which is then
translated into relational algebra.
• The parser checks syntax and verifies that the relations
named in the query exist.
Evaluation
• The query-execution engine takes a query-evaluation
plan, executes that plan, and returns the answers to the
query.

Basic Steps in Query Processing
Optimization: finding the cheapest evaluation plan for a query.
• A given relational-algebra expression may have many equivalent
expressions.
E.g. σ_{balance<2500}(Π_{balance}(account)) is equivalent to
Π_{balance}(σ_{balance<2500}(account))
• Any relational-algebra expression can be evaluated in
many ways. An annotated expression specifying a detailed
evaluation strategy is called an evaluation plan.
E.g. the system can use an index on balance to find accounts with
balance < 2500, or it can perform a complete relation scan and
discard accounts with balance ≥ 2500.
• Amongst all equivalent expressions, try to choose the one
with the cheapest possible evaluation plan. The cost estimate of a
plan is based on statistical information in the DBMS catalog.
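
The two ideas on this slide can be sketched in a few lines of Python. This is an illustration only: the sample tuples, the helper names (select, project, index_scan) and the sorted-list "index" are assumptions, not part of any DBMS. The sketch checks that the two equivalent expressions give the same relation and that an index lookup and a full scan return the same accounts.

```python
from bisect import bisect_left

# Hypothetical sample data; attribute names follow the slides' bank schema.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-102", "balance": 2500},
    {"account_number": "A-305", "balance": 3500},
]

def select(pred, rel):
    """sigma: keep the tuples that satisfy pred (a full relation scan)."""
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    """Pi: keep only attrs, eliminating duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def low(t):
    return t["balance"] < 2500

# Two equivalent expressions for the same query.
plan_a = select(low, project(["balance"], account))
plan_b = project(["balance"], select(low, account))

# Two evaluation strategies for the selection itself.
# Assumed stand-in for an index on balance: sorted (balance, position) pairs.
balance_index = sorted((t["balance"], i) for i, t in enumerate(account))

def index_scan(limit):
    """Return accounts with balance < limit using the sorted index."""
    cut = bisect_left(balance_index, (limit,))
    return [account[i] for _, i in balance_index[:cut]]

full_scan_result = select(low, account)
index_result = index_scan(2500)
```

Both plans and both access paths produce the same answer; what differs is how much work each one does, which is what the optimizer estimates.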

Catalog Information for Cost Estimation
• n_r: number of tuples in relation r.
• b_r: number of blocks containing tuples of r.
• s_r: size of a tuple of r in bytes.
• f_r: blocking factor of r, i.e., the number of tuples of r that fit into one
block.
• V(A, r): number of distinct values that appear in r for attribute A;
same as the size of Π_A(r).
• SC(A, r): selection cardinality of attribute A of relation r; average
number of records that satisfy equality on A.
• If tuples of r are stored together physically in a file, then:
b_r = ⌈n_r / f_r⌉

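
As a worked example of the catalog statistics, the sketch below computes f_r, b_r and an estimate of SC(A, r). The numbers (10,000 tuples, 200-byte tuples, 4,096-byte blocks, 50 distinct values) are made up for illustration. The estimate SC(A, r) = n_r / V(A, r) assumes values of A are uniformly distributed, and computing f_r as block size divided by tuple size assumes tuples do not span blocks; neither assumption is stated on the slide.

```python
def blocking_factor(block_size, s_r):
    """f_r: whole tuples per block, assuming tuples do not span blocks."""
    return block_size // s_r

def blocks_needed(n_r, f_r):
    """b_r = ceil(n_r / f_r), computed with integer arithmetic."""
    return (n_r + f_r - 1) // f_r

def selection_cardinality(n_r, distinct_values):
    """SC(A, r) estimate, assuming values of A are uniformly distributed."""
    return n_r / distinct_values

# Hypothetical statistics for a relation r.
n_r = 10_000        # tuples
s_r = 200           # bytes per tuple
block_size = 4_096  # bytes per block (assumed)
v_a_r = 50          # V(A, r)

f_r = blocking_factor(block_size, s_r)   # 20 tuples per block
b_r = blocks_needed(n_r, f_r)            # 500 blocks
sc = selection_cardinality(n_r, v_a_r)   # 200.0 tuples per value of A
```

With these numbers a full scan of r reads 500 blocks, while an equality selection on A is expected to return about 200 tuples.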


Evaluation of Expressions
• Materialization: evaluate one operation at a time, starting
at the lowest level. Use intermediate results materialized
into temporary relations to evaluate next-level operations.
• E.g., in the figure below, compute and store σ_{balance<2500}(account);
then compute and store its join with customer, and finally
compute the projection on customer-name.
[Figure: expression tree for Π_{customer-name}(σ_{balance<2500}(account) ⋈ customer)]
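
A minimal sketch of materialized evaluation for this tree, in Python. The sample tuples are hypothetical, and the customer relation is assumed to carry an account_number attribute so that the join has something to match on. Each step builds and stores a complete temporary relation before the next step starts.

```python
# Hypothetical sample relations.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-305", "balance": 3500},
]
customer = [
    {"customer_name": "Hayes", "account_number": "A-101"},
    {"customer_name": "Jones", "account_number": "A-305"},
    {"customer_name": "Smith", "account_number": "A-215"},
]

# Step 1: compute and store the selection (lowest level of the tree).
temp1 = [a for a in account if a["balance"] < 2500]

# Step 2: compute and store the join of temp1 with customer.
temp2 = [
    {**a, **c}
    for a in temp1
    for c in customer
    if a["account_number"] == c["account_number"]
]

# Step 3: compute the projection on customer_name from the stored join.
result = sorted({t["customer_name"] for t in temp2})
```

In a real system temp1 and temp2 would be written to disk, which is the cost that pipelining avoids.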

Evaluation of Expressions (Cont.)
• Pipelining: evaluate several operations simultaneously,
passing the results of one operation on to the next.
• E.g., in the expression on the previous slide, don’t store the result of
σ_{balance<2500}(account); instead, pass tuples directly to
the join. Similarly, don’t store the result of the join; pass tuples
directly to the projection.
• Much cheaper than materialization: no need to store a
temporary relation to disk.
• For pipelining to be effective, use evaluation algorithms
that generate output tuples even as tuples are received
for inputs to the operation.
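
Pipelining can be sketched with Python generators, where each operator pulls tuples from its input on demand. The operator names, the sample tuples and the account_number attribute on customer are assumptions for illustration, and this project does not eliminate duplicates. The reads list records which account tuples have been pulled from the base relation, to show that the first answer is produced before the scan finishes.

```python
# Hypothetical sample relations.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-305", "balance": 3500},
]
customer = [
    {"customer_name": "Hayes", "account_number": "A-101"},
    {"customer_name": "Jones", "account_number": "A-305"},
    {"customer_name": "Smith", "account_number": "A-215"},
]

reads = []  # trace of account tuples pulled from the base relation

def scan(rel):
    for t in rel:
        reads.append(t["account_number"])
        yield t

def select(pred, tuples):
    for t in tuples:
        if pred(t):
            yield t

def join(tuples, inner, attr):
    # Nested-loop join: emits a result as soon as a match is found.
    for t in tuples:
        for u in inner:
            if t[attr] == u[attr]:
                yield {**t, **u}

def project(attr, tuples):
    for t in tuples:
        yield t[attr]

pipeline = project(
    "customer_name",
    join(select(lambda t: t["balance"] < 2500, scan(account)),
         customer, "account_number"),
)

first = next(pipeline)          # pulls only as much input as needed
reads_after_first = list(reads)
rest = list(pipeline)           # drains the remaining tuples
```

No temporary relation is ever built: after the first answer only one account tuple has been read.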

Transformation of Relational Expressions

• Generation of query-evaluation plans for an expression
involves two steps:
1. generating logically equivalent expressions
2. annotating resultant expressions to get alternative
query plans
• Use equivalence rules to transform an expression into
an equivalent one.
• Based on estimated cost, the cheapest plan is selected.
The process is called cost-based optimization.

Equivalence of Expressions
• Relations generated by two equivalent expressions have the same
set of attributes and contain the same set of tuples, although their
attributes may be ordered differently.
[Figure: equivalent expressions.
(a) Initial expression tree: Π_{customer-name}(σ_{branch-city = Brooklyn}(branch ⋈ (account ⋈ depositor)))
(b) Transformed expression tree: Π_{customer-name}((σ_{branch-city = Brooklyn}(branch)) ⋈ (account ⋈ depositor))]
Equivalence Rules
1. Conjunctive selection operations can be deconstructed
into a sequence of individual selections.
σ_{θ1 ∧ θ2}(E) = σ_{θ1}(σ_{θ2}(E))
2. Selection operations are commutative.
σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))
3. Only the last (outermost) in a sequence of projection operations is
needed; the others can be omitted.
Π_{L1}(Π_{L2}(…(Π_{Ln}(E))…)) = Π_{L1}(E)
4. Selections can be combined with Cartesian products
and theta joins.
(a) σ_θ(E1 × E2) = E1 ⋈_θ E2
(b) σ_{θ1}(E1 ⋈_{θ2} E2) = E1 ⋈_{θ1 ∧ θ2} E2
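
Rules 1 to 4(a) can be checked on sample data. The sketch below is not a proof, only a spot check on hypothetical tuples; the helper names and the predicates theta1 and theta2 are assumptions made for the example. Relations are compared as sets of tuples, so tuple order does not matter.

```python
from itertools import product

# Hypothetical sample relations with disjoint attribute names.
branch = [
    {"name": "Brighton", "city": "Brooklyn"},
    {"name": "Downtown", "city": "Brooklyn"},
    {"name": "Round Hill", "city": "Horseneck"},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-215", "branch_name": "Brighton", "balance": 700},
    {"account_number": "A-217", "branch_name": "Brighton", "balance": 750},
    {"account_number": "A-305", "branch_name": "Round Hill", "balance": 350},
]

def select(pred, rel):
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    keys = {tuple(t[a] for a in attrs) for t in rel}  # duplicates removed
    return [dict(zip(attrs, k)) for k in keys]

def cross(r1, r2):
    return [{**a, **b} for a, b in product(r1, r2)]

def theta_join(pred, r1, r2):
    # Nested loop that tests the join condition on each pair.
    out = []
    for a in r1:
        for b in r2:
            merged = {**a, **b}
            if pred(merged):
                out.append(merged)
    return out

def as_set(rel):
    """Order-independent view of a relation, for comparing results."""
    return {frozenset(t.items()) for t in rel}

def theta1(t):
    return t["balance"] > 600

def theta2(t):
    return t["branch_name"] == "Brighton"

def both(t):
    return theta1(t) and theta2(t)

def same_branch(t):
    return t["name"] == t["branch_name"]

rule1 = as_set(select(both, account)) == as_set(select(theta1, select(theta2, account)))
rule2 = as_set(select(theta1, select(theta2, account))) == as_set(select(theta2, select(theta1, account)))
rule3 = as_set(project(["branch_name"], project(["branch_name", "balance"], account))) == as_set(project(["branch_name"], account))
rule4a = as_set(select(same_branch, cross(branch, account))) == as_set(theta_join(same_branch, branch, account))
```

All four flags come out true for this data. The practical difference is in the work done: the left side of rule 4(a) builds all 12 pairs before filtering, while the join keeps only the 4 matching ones.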

Equivalence Rules (Cont.)

5. Theta-join operations (and natural joins) are
commutative.
E1 ⋈_θ E2 = E2 ⋈_θ E1
6. (a) Natural join operations are associative:
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
(b) Theta joins are associative in the following manner:
(E1 ⋈_{θ1} E2) ⋈_{θ2 ∧ θ3} E3 = E1 ⋈_{θ1 ∧ θ3} (E2 ⋈_{θ2} E3)
where θ2 involves attributes from only E2 and E3.

Equivalence Rules (Cont.)
7. The selection operation distributes over the theta join
operation under the following two conditions:
(a) When all the attributes in θ0 involve only the attributes
of one of the expressions (E1) being joined.
σ_{θ0}(E1 ⋈_θ E2) = (σ_{θ0}(E1)) ⋈_θ E2
(b) When θ1 involves only the attributes of E1 and θ2
involves only the attributes of E2.
σ_{θ1 ∧ θ2}(E1 ⋈_θ E2) = (σ_{θ1}(E1)) ⋈_θ (σ_{θ2}(E2))

Equivalence Rules (Cont.)
8. The projection operation distributes over the theta join
operation as follows:
(a) if θ involves only attributes from L1 ∪ L2:
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = (Π_{L1}(E1)) ⋈_θ (Π_{L2}(E2))
(b) Consider a join E1 ⋈_θ E2. Let L1 and L2 be sets of
attributes from E1 and E2, respectively. Let L3 be
attributes of E1 that are involved in join condition θ,
but are not in L1 ∪ L2, and let L4 be attributes of E2 that
are involved in join condition θ, but are not in L1 ∪ L2.
Π_{L1 ∪ L2}(E1 ⋈_θ E2) = Π_{L1 ∪ L2}((Π_{L1 ∪ L3}(E1)) ⋈_θ (Π_{L2 ∪ L4}(E2)))

Equivalence Rules (Cont.)
9. The set operations union and intersection are commutative (set
difference is not commutative).
E1 ∪ E2 = E2 ∪ E1
E1 ∩ E2 = E2 ∩ E1
10. Set union and intersection are associative.
11. The selection operation distributes over ∪, ∩ and −. E.g., for difference:
σ_p(E1 − E2) = σ_p(E1) − σ_p(E2)
For intersection and union we also have:
σ_p(E1 ∩ E2) = σ_p(E1) ∩ σ_p(E2)
σ_p(E1 ∪ E2) = σ_p(E1) ∪ σ_p(E2)
12. The projection operation distributes over the union operation.
Π_L(E1 ∪ E2) = (Π_L(E1)) ∪ (Π_L(E2))
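
Rules 11 and 12 map directly onto Python's set operators, so they are easy to spot-check. The relations below are hypothetical sets of (account_number, balance) tuples, and the helpers select and project are assumptions for the example; this checks one instance, it does not prove the rules.

```python
# Hypothetical relations over the same schema (account_number, balance).
E1 = {("A-101", 500), ("A-215", 700), ("A-102", 400)}
E2 = {("A-215", 700), ("A-305", 350)}

def select(pred, rel):
    return {t for t in rel if pred(t)}

def project(positions, rel):
    return {tuple(t[i] for i in positions) for t in rel}

def p(t):
    return t[1] >= 450  # predicate on balance

# Rule 11: selection distributes over difference, intersection and union.
diff_ok = select(p, E1 - E2) == select(p, E1) - select(p, E2)
inter_ok = select(p, E1 & E2) == select(p, E1) & select(p, E2)
union_ok = select(p, E1 | E2) == select(p, E1) | select(p, E2)

# Rule 12: projection distributes over union (here onto balance only).
proj_ok = project([1], E1 | E2) == project([1], E1) | project([1], E2)
```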

Selection Operation Example
• Query: Find the names of all customers who have an
account at some branch located in Brooklyn.
Π_{customer-name}(σ_{branch-city = “Brooklyn”}
(branch ⋈ (account ⋈ depositor)))
• Transformation using rule 7a:
Π_{customer-name}
((σ_{branch-city = “Brooklyn”}(branch)) ⋈ (account ⋈ depositor))
• Performing the selection as early as possible reduces
the size of the relation to be joined.
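
The effect of rule 7a on this query can be seen by counting intermediate tuples. The sketch below uses hypothetical sample tuples and a simple nested-loop natural_join helper (both assumptions); it evaluates the original and the transformed expression and compares the answers and the sizes of the joins.

```python
# Hypothetical sample relations.
branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn"},
    {"branch_name": "Brighton", "branch_city": "Brooklyn"},
    {"branch_name": "Perryridge", "branch_city": "Horseneck"},
    {"branch_name": "Round Hill", "branch_city": "Horseneck"},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-215", "branch_name": "Perryridge", "balance": 700},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-305", "branch_name": "Round Hill", "balance": 350},
    {"account_number": "A-201", "branch_name": "Brighton", "balance": 900},
]
depositor = [
    {"customer_name": "Johnson", "account_number": "A-101"},
    {"customer_name": "Smith", "account_number": "A-215"},
    {"customer_name": "Hayes", "account_number": "A-102"},
    {"customer_name": "Turner", "account_number": "A-305"},
    {"customer_name": "Jones", "account_number": "A-201"},
]

def natural_join(r1, r2):
    """Join on all attribute names the two relations share."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [{**a, **b} for a in r1 for b in r2
            if all(a[k] == b[k] for k in common)]

def in_brooklyn(t):
    return t["branch_city"] == "Brooklyn"

# Original expression: join everything, then select.
late_join = natural_join(branch, natural_join(account, depositor))
late_names = {t["customer_name"] for t in late_join if in_brooklyn(t)}

# After rule 7a: select on branch first, then join.
brooklyn_branches = [t for t in branch if in_brooklyn(t)]
early_join = natural_join(brooklyn_branches, natural_join(account, depositor))
early_names = {t["customer_name"] for t in early_join}
```

Both forms return the same names, but the join result shrinks from 5 tuples to 2 when the selection is applied first.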

Selection Operation Example(Cont.)
• Query: Find the names of all customers with an account at a
Brooklyn branch whose account balance is over $1000.
Π_{customer-name}(σ_{branch-city = “Brooklyn” ∧ balance > 1000}
(branch ⋈ (account ⋈ depositor)))
• Transformation using join associativity (Rule 6a):
Π_{customer-name}(σ_{branch-city = “Brooklyn” ∧ balance > 1000}
((branch ⋈ account) ⋈ depositor))
• The second form provides an opportunity to apply the “perform
selections early” rule, resulting in the subexpression
σ_{branch-city = “Brooklyn”}(branch) ⋈ σ_{balance > 1000}(account)
• Thus a sequence of transformations can be useful.

Projection Operation Example

Π_{customer-name}((σ_{branch-city = “Brooklyn”}(branch) ⋈ account) ⋈ depositor)
• When we compute
(σ_{branch-city = “Brooklyn”}(branch) ⋈ account)
we obtain a relation whose schema is:
(branch-name, branch-city, assets, account-number, balance)
• Push projections using equivalence rules 8a and 8b; eliminate
unneeded attributes from intermediate results to get:
Π_{customer-name}((Π_{account-number}(
(σ_{branch-city = “Brooklyn”}(branch)) ⋈ account)) ⋈ depositor)

Join Ordering Example
• For all relations r1, r2 and r3,
(r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
• If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose
(r1 ⋈ r2) ⋈ r3
so that we compute and store a smaller temporary
relation.
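
The join-ordering argument can be illustrated with three made-up relations r1(a, b), r2(b, c) and r3(c, d), chosen so that r1 ⋈ r2 is tiny and r2 ⋈ r3 is large. The data and the natural_join helper are assumptions for this sketch; a real optimizer would estimate these sizes from catalog statistics rather than compute the joins.

```python
def natural_join(r1, r2):
    """Join on all attribute names the two relations share."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [{**a, **b} for a in r1 for b in r2
            if all(a[k] == b[k] for k in common)]

def as_set(rel):
    return {frozenset(t.items()) for t in rel}

# Hypothetical data: one r1 tuple, 50 r2 tuples, 20 r3 tuples.
r1 = [{"a": 1, "b": 1}]
r2 = [{"b": i, "c": 0} for i in range(1, 51)]
r3 = [{"c": 0, "d": j} for j in range(20)]

# Order 1: (r1 join r2) join r3
temp_small = natural_join(r1, r2)             # temporary relation
result_left = natural_join(temp_small, r3)

# Order 2: r1 join (r2 join r3)
temp_large = natural_join(r2, r3)             # temporary relation
result_right = natural_join(r1, temp_large)

same_answer = as_set(result_left) == as_set(result_right)
```

Both orders return the same 20 tuples, but the first stores a 1-tuple temporary relation and the second stores one with 1,000 tuples.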

Heuristic Optimization
• Cost-based optimization is expensive, even with
dynamic programming.
• Systems may use heuristics to reduce the number of
choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query tree by
using a set of rules that typically (but not in all cases)
improve execution performance:
– Perform selection early (reduces the number of tuples)
– Perform projection early (reduces the number of attributes)
– Perform most restrictive selection and join operations before
other similar operations.
• Some systems use only heuristics, others combine
heuristics with partial cost-based optimization.
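
The "perform selection early" heuristic can be sketched as a rewrite on a small query tree. Everything here is an assumption for illustration: the Rel, Select and Join node classes, the push_selection function and the schemas are hypothetical, joins are treated as natural joins, and the rewrite makes a single top-down pass rather than being a complete optimizer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Select:
    attrs: frozenset   # attributes the predicate mentions
    label: str         # predicate text, kept only for display
    child: object

@dataclass(frozen=True)
class Join:            # natural join
    left: object
    right: object

def schema(node):
    """Attributes produced by a subtree."""
    if isinstance(node, Rel):
        return node.attrs
    if isinstance(node, Select):
        return schema(node.child)
    return schema(node.left) | schema(node.right)

def push_selection(node):
    """Move each selection below a join when one side has all its attributes."""
    if isinstance(node, Join):
        return Join(push_selection(node.left), push_selection(node.right))
    if isinstance(node, Select):
        child = node.child
        if isinstance(child, Join):
            if node.attrs <= schema(child.left):
                pushed = Select(node.attrs, node.label, child.left)
                return Join(push_selection(pushed), push_selection(child.right))
            if node.attrs <= schema(child.right):
                pushed = Select(node.attrs, node.label, child.right)
                return Join(push_selection(child.left), push_selection(pushed))
        return Select(node.attrs, node.label, push_selection(child))
    return node

branch = Rel("branch", frozenset({"branch_name", "branch_city", "assets"}))
account = Rel("account", frozenset({"account_number", "branch_name", "balance"}))
depositor = Rel("depositor", frozenset({"customer_name", "account_number"}))

label = 'branch_city = "Brooklyn"'
tree = Select(frozenset({"branch_city"}), label,
              Join(branch, Join(account, depositor)))
rewritten = push_selection(tree)
```

For this tree the selection ends up directly above branch, which is the same transformation as rule 7a in the selection example. The rewrite needs no statistics, which is why heuristics are cheap compared with cost-based search.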