Introduction to Query Processing and Query Optimization
Outline
Overview
Measures of Query Cost
Query Optimization
What is Query Processing?
For simplicity we just use the number of block transfers from disk and
the number of seeks as the cost measures
tT – time to transfer one block
tS – time for one seek
Cost for b block transfers plus S seeks
b * tT + S * tS
We ignore CPU costs for simplicity
Real systems do take CPU cost into account
We do not include the cost of writing output to disk
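The cost formula above can be sketched as a small function. The timing values below (0.1 ms per block transfer, 4 ms per seek) and the function name are assumptions chosen only for illustration; they do not come from the slides.

```python
def query_cost(block_transfers: int, seeks: int, t_transfer: float, t_seek: float) -> float:
    """Estimated I/O time b * tT + S * tS (CPU and output-writing costs ignored)."""
    return block_transfers * t_transfer + seeks * t_seek


# Assumed, illustrative device timings in milliseconds.
T_TRANSFER_MS = 0.1  # tT: time to transfer one block
T_SEEK_MS = 4.0      # tS: time for one seek

# Cost of an operation that needs 100 block transfers and 2 seeks.
cost_ms = query_cost(block_transfers=100, seeks=2, t_transfer=T_TRANSFER_MS, t_seek=T_SEEK_MS)
print(f"Estimated cost: {cost_ms:.1f} ms")
```

With these assumed timings, a single seek costs as much as 40 block transfers, which is why seeks are counted separately.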
Select operation
➢ Symbol: σ (sigma)
➢ Notation: σ condition (Relation)
➢ Operation: Selects the tuples from a relation that satisfy a given condition.
➢ Search algorithm
1. Linear search (A1)
2. Binary search (A2)
Linear search (A1)
Scans each file block and tests all records to see whether they satisfy the selection condition.
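A minimal sketch of linear search, assuming the relation is held as a list of blocks with two records per block (the block size and the function name are assumptions for illustration). It uses the Account data from the example tables later in these slides and counts one block transfer per block scanned.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Block = List[Record]


def linear_search(blocks: List[Block], predicate: Callable[[Record], bool]) -> Tuple[List[Record], int]:
    """Scan every block and test every record; return matches and blocks read."""
    matches: List[Record] = []
    transfers = 0
    for block in blocks:
        transfers += 1  # each block is read exactly once
        for record in block:
            if predicate(record):
                matches.append(record)
    return matches, transfers


# Account relation, assumed to be stored two records per block.
account_blocks: List[Block] = [
    [{"Ano": "A1", "Balance": 3000}, {"Ano": "A2", "Balance": 1000}],
    [{"Ano": "A3", "Balance": 2000}, {"Ano": "A4", "Balance": 4000}],
]

rows, transfers = linear_search(account_blocks, lambda r: r["Balance"] > 1500)
print([r["Ano"] for r in rows], "blocks read:", transfers)
```

Because every block is read regardless of the condition, linear search works on any file, whatever its ordering or indexes.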
➢ Method
1. Materialization
2. Pipelining
Materialization
➢ Materialized evaluation: evaluate one operation at a time, starting at the lowest
level (work from the bottom up, performing the innermost operations first).
➢ The intermediate result of each operation is materialized (stored in a temporary
relation) and becomes the input for the next-level operation.
➢ The cost of materialization is the sum of the costs of the individual operations plus
the cost of writing the intermediate results to disk.
Problems:
1. Creates many temporary relations
2. Performs many I/O operations
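Materialized evaluation can be sketched as follows. This is a simplified illustration, assuming in-memory lists stand in for temporary relations on disk; the helper names (`materialize`, `select`, `project`) are hypothetical. The Account rows are taken from the example tables later in these slides.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

account: List[Row] = [
    {"Ano": "A1", "Balance": 3000},
    {"Ano": "A2", "Balance": 1000},
    {"Ano": "A3", "Balance": 2000},
    {"Ano": "A4", "Balance": 4000},
]

# Stands in for the temporary relations a real system would write to disk.
temp_relations: List[List[Row]] = []


def materialize(rows: List[Row]) -> List[Row]:
    """Store a complete intermediate result before the next operator reads it."""
    stored = list(rows)
    temp_relations.append(stored)
    return stored


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    return [r for r in rows if predicate(r)]


def project(rows: List[Row], columns: List[str]) -> List[Row]:
    return [{c: r[c] for c in columns} for r in rows]


# Evaluate project[Ano](select[Balance > 1500](Account)) bottom-up:
# the innermost selection runs first and its whole result is stored.
selected = materialize(select(account, lambda r: r["Balance"] > 1500))
result = project(selected, ["Ano"])
print(result, "temporary relations:", len(temp_relations))
```

Each additional operator in the expression tree would add one more stored intermediate result, which is where the extra I/O comes from.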
Pipelining
Evaluates several operations simultaneously, passing the results of one
operation on to the next.
To reduce the number of intermediate temporary relations, the results of one
operation are passed directly to the next operation in the pipeline.
Combining operations into a pipeline eliminates the cost of reading and writing
temporary relations.
Much cheaper than materialization: no need to store a temporary relation on disk.
Pipelines can be executed in two ways:
Demand driven – the system requests tuples from the operation at the top
of the pipeline
Producer driven – operations do not wait for requests to produce tuples but
generate tuples eagerly
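A demand-driven pipeline maps naturally onto Python generators: each operator produces a tuple only when the operator above it asks for one. The sketch below is an illustration under that assumption, with hypothetical operator names; a producer-driven pipeline would instead push tuples into buffers and is not shown. The Account rows are taken from the example tables later in these slides.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]

account: List[Row] = [
    {"Ano": "A1", "Balance": 3000},
    {"Ano": "A2", "Balance": 1000},
    {"Ano": "A3", "Balance": 2000},
    {"Ano": "A4", "Balance": 4000},
]

scanned: List[str] = []  # records which tuples the scan has produced so far


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    for r in rows:
        scanned.append(r["Ano"])
        yield r


def select(source: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    for r in source:
        if predicate(r):
            yield r


def project(source: Iterable[Row], columns: List[str]) -> Iterator[Row]:
    for r in source:
        yield {c: r[c] for c in columns}


# project[Ano](select[Balance > 1500](Account)) with no temporary relation.
pipeline = project(select(scan(account), lambda r: r["Balance"] > 1500), ["Ano"])

first = next(pipeline)            # the top operator demands one tuple
scanned_after_first = len(scanned)
result = [first] + list(pipeline)  # drain the rest of the pipeline
print(first, "tuples scanned for first result:", scanned_after_first)
```

Only as many input tuples are read as the first output requires, and no intermediate result is ever stored in full.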
Query Optimization
Customer (4 records)
Cid    Ano    C_name
101    A1     Ram
102    A2     Harsh
103    A3     Deepak
104    A4     Gopal

Account (4 records)
Ano    Balance
A1     3000
A2     1000
A3     2000
A4     4000
Transformation of relational Expression
Cascade of selection
A combined (conjunctive) selection can be split into a sequence of individual selections.
Selection operation
Selection operations are commutative
Projection operation
If more than one projection operation is applied in sequence in an expression, only
the outermost projection is required.
Join
Natural join operations are associative
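The four equivalence rules above can be checked on small data. The sketch below is an illustration only: the operators are hypothetical helpers over lists of dictionaries, results are compared as sets of tuples, and the `branch` relation is made up here solely to provide a third relation for the associativity check. Customer and Account use the example data from these slides.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

customer: List[Row] = [
    {"Cid": 101, "Ano": "A1", "C_name": "Ram"},
    {"Cid": 102, "Ano": "A2", "C_name": "Harsh"},
    {"Cid": 103, "Ano": "A3", "C_name": "Deepak"},
    {"Cid": 104, "Ano": "A4", "C_name": "Gopal"},
]
account: List[Row] = [
    {"Ano": "A1", "Balance": 3000},
    {"Ano": "A2", "Balance": 1000},
    {"Ano": "A3", "Balance": 2000},
    {"Ano": "A4", "Balance": 4000},
]
# Hypothetical relation, invented only to exercise join associativity.
branch: List[Row] = [
    {"Ano": "A1", "Branch": "North"},
    {"Ano": "A3", "Branch": "South"},
]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    return [r for r in rows if predicate(r)]


def project(rows: List[Row], columns: List[str]) -> List[Row]:
    return [{c: r[c] for c in columns} for r in rows]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Pair rows that agree on every attribute the two relations share."""
    out: List[Row] = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                out.append({**l, **r})
    return out


def as_set(rows: List[Row]) -> set:
    """Order-insensitive, duplicate-free view of a relation."""
    return {frozenset(r.items()) for r in rows}


def rich(r: Row) -> bool:
    return r["Balance"] > 1500


def late(r: Row) -> bool:
    return r["Cid"] > 101


joined = natural_join(customer, account)

# 1. Cascade of selection: sigma[p1 AND p2](E) = sigma[p1](sigma[p2](E))
cascade_ok = as_set(select(joined, lambda r: rich(r) and late(r))) == as_set(select(select(joined, late), rich))
# 2. Commutativity: sigma[p1](sigma[p2](E)) = sigma[p2](sigma[p1](E))
commute_ok = as_set(select(select(joined, late), rich)) == as_set(select(select(joined, rich), late))
# 3. Only the outermost projection is needed: pi[L1](pi[L2](E)) = pi[L1](E), L1 within L2
projection_ok = as_set(project(project(joined, ["Cid", "C_name"]), ["C_name"])) == as_set(project(joined, ["C_name"]))
# 4. Associativity: (E1 join E2) join E3 = E1 join (E2 join E3)
assoc_ok = as_set(natural_join(joined, branch)) == as_set(natural_join(customer, natural_join(account, branch)))

print(cascade_ok, commute_ok, projection_ok, assoc_ok)
```

Passing on one data set does not prove the rules; the check only illustrates that both sides of each equivalence produce the same relation, which is what lets an optimizer pick the cheaper form.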
Example
Recap
Query processing
Measures of Query Cost
Evaluation of expressions
Query representation