
Introduction To Query Processing

This document provides an overview of query processing and optimization. It discusses the basic steps of query processing: parsing, optimization, and evaluation. It also covers measures of query cost, such as disk accesses and CPU time. Query optimization is the process of selecting the most efficient query evaluation plan by transforming the query expression and evaluating operations in the most cost-effective order. The goal is to minimize the number of disk accesses and reduce the response time for a query.

Uploaded by Atharva Tadge
Copyright © All Rights Reserved

Introduction to Query Processing and Query Optimization
Outline

Overview
Measures of Query Cost
Query Optimization
What is Query Processing?

➢ Query processing: the activities involved in extracting data from a database.
➢ Three basic steps:
1. Parsing and Translation
2. Optimization
3. Evaluation
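The three steps can be illustrated with a toy sketch. It assumes a hypothetical one-relation catalog (built from the Account data used later in these notes) and accepts only queries of the form SELECT * FROM r WHERE a = 'v'; the names `CATALOG`, `parse`, `optimize`, and `evaluate` are illustrative, not part of any real DBMS.

```python
import re
from bisect import bisect_left

# Hypothetical toy catalog (illustration only): columns, rows, and the
# attribute the rows are stored sorted on. Balances are kept as strings so
# they compare directly with the string literal parsed from the query.
CATALOG = {
    "account": {
        "columns": ["ano", "balance"],
        "rows": [("A1", "3000"), ("A2", "1000"), ("A3", "2000"), ("A4", "4000")],
        "sorted_on": "ano",
    }
}


def parse(sql):
    """Step 1, parsing and translation: accept SELECT * FROM r WHERE a = 'v'
    and translate it into the selection sigma_{a = v}(r)."""
    m = re.fullmatch(r"SELECT \* FROM (\w+) WHERE (\w+) = '(\w+)'",
                     sql.strip(), re.IGNORECASE)
    if m is None:
        raise ValueError("unsupported query: " + sql)
    return {"op": "select", "relation": m.group(1).lower(),
            "attr": m.group(2).lower(), "value": m.group(3)}


def optimize(expr):
    """Step 2, optimization: choose an evaluation plan for the selection.
    Binary search is only possible if the rows are sorted on the attribute."""
    rel = CATALOG[expr["relation"]]
    method = "binary_search" if rel["sorted_on"] == expr["attr"] else "linear_scan"
    return {**expr, "method": method}


def evaluate(plan):
    """Step 3, evaluation: run the chosen plan and return the matching rows."""
    rel = CATALOG[plan["relation"]]
    col = rel["columns"].index(plan["attr"])
    rows = rel["rows"]
    if plan["method"] == "binary_search":
        keys = [r[col] for r in rows]          # sorted, since rows are sorted on col
        i = bisect_left(keys, plan["value"])   # first position with key >= value
        result = []
        while i < len(rows) and keys[i] == plan["value"]:
            result.append(rows[i])
            i += 1
        return result
    return [r for r in rows if r[col] == plan["value"]]


plan = optimize(parse("SELECT * FROM account WHERE ano = 'A3'"))
print(plan["method"], evaluate(plan))  # binary_search [('A3', '2000')]
```

A query on `balance` would get a `linear_scan` plan instead, because the toy relation is not sorted on that attribute.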
Steps in Query Processing
Measures of Query Cost

➢ Cost is generally measured as the total elapsed time for answering a query.
➢ Many factors contribute to the time cost:
1. Disk accesses (time to process a data request and retrieve data
from the storage device)
2. CPU (time spent processing the query)
3. Network communication cost
➢ Disk access is the predominant cost, and it is also relatively easy to
estimate.
➢ The cost to write a block is greater than the cost to read a block
• data is read back after being written to ensure that the write
was successful
Measures of Query Cost

For simplicity, we use only the number of block transfers from disk and
the number of seeks as the cost measures:
tT – time to transfer one block
tS – time for one seek
The cost of b block transfers plus S seeks is
b * tT + S * tS
We ignore CPU costs for simplicity, although real systems do take CPU cost into account.
We do not include the cost of writing the final output to disk.
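The cost formula can be evaluated directly. The timings below (tT = 100 µs, tS = 4000 µs) are assumed values for illustration only; they show why the number of seeks matters as much as the number of blocks.

```python
def query_cost(b: int, s: int, t_t: float, t_s: float) -> float:
    """Estimated cost of b block transfers plus s seeks: b * tT + s * tS.
    CPU cost and the cost of writing the final output are ignored."""
    return b * t_t + s * t_s


# Assumed device timings, for illustration only (microseconds).
T_T = 100    # tT: time to transfer one block
T_S = 4000   # tS: time for one seek

# 400 blocks stored contiguously: one seek, then 400 transfers.
print(query_cost(400, 1, T_T, T_S))    # 44000
# The same 400 blocks scattered on disk: one seek per block.
print(query_cost(400, 400, T_T, T_S))  # 1640000
```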
Select operation
➢ Symbol: σ (sigma)
➢ Notation: σ_condition (Relation)
➢ Operation: selects the tuples of a relation that satisfy a given condition.

➢ Search algorithms
1. Linear search (A1)
2. Binary search (A2)
Linear search (A1)
Scan each file block and test all records to see whether they satisfy the selection condition.

Cost estimate = br block transfers
• br denotes the number of blocks containing records from relation r
If the selection is on a key attribute (primary key), the system can
stop on finding the record
• average cost = (br / 2) block transfers
Linear search can be applied regardless of
• the selection condition, or
• the ordering of records in the file, or
• the availability of indices
This algorithm is generally slower than binary search when binary search is applicable.
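Linear search and its block-transfer cost can be sketched as follows. The relation is modelled as a list of blocks, and the blocking factor of 2 records per block (so br = 2) is an assumption made only for this example.

```python
from typing import Callable, List, Tuple

Record = Tuple[str, int]     # (Ano, Balance)
Block = List[Record]


def linear_search(blocks: List[Block],
                  predicate: Callable[[Record], bool],
                  on_key: bool = False) -> Tuple[List[Record], int]:
    """Algorithm A1: read every block and test every record.

    Returns the matching records and the number of block transfers.
    If on_key is True, at most one record can match, so the scan stops at
    the first match (on average about br / 2 transfers instead of br).
    """
    result: List[Record] = []
    transfers = 0
    for block in blocks:
        transfers += 1                 # reading one block = one block transfer
        for rec in block:
            if predicate(rec):
                result.append(rec)
                if on_key:
                    return result, transfers
    return result, transfers


# Account relation, assuming 2 records per block, so br = 2.
account_blocks = [[("A1", 3000), ("A2", 1000)], [("A3", 2000), ("A4", 4000)]]

print(linear_search(account_blocks, lambda r: r[1] > 1500))
# ([('A1', 3000), ('A3', 2000), ('A4', 4000)], 2)
print(linear_search(account_blocks, lambda r: r[0] == "A1", on_key=True))
# ([('A1', 3000)], 1)
```

Because the predicate is an arbitrary function, the same code works for any selection condition and any record order, which is exactly why linear search is always applicable.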
Binary search (A2)
Used when the selection is an equality comparison on the primary key and the
relation is sorted on the primary key attribute.

Cost of binary search = ⌈log2(br)⌉ block transfers

br denotes the number of blocks containing records from relation r

If the selection is on a non-primary attribute (and the file is sorted on that attribute), multiple blocks may
contain the required records; the cost of scanning those additional blocks must then
be added to the cost estimate.

This algorithm is faster than the linear search algorithm.
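Binary search over a sorted file can be sketched at block granularity: each probe reads one block and compares the search key with that block's first and last keys. The layout of 8 blocks holding 2 integer keys each is assumed for illustration. A standard binary search needs at most ⌊log2(br)⌋ + 1 probes, so the measured worst case may exceed the slide's ⌈log2(br)⌉ approximation by one.

```python
import math


def binary_search_blocks(blocks, key, key_of):
    """Algorithm A2: equality search on a key attribute, file sorted on that key.

    blocks: list of blocks, each a non-empty list of records in key order.
    Returns (matching record or None, number of block transfers).
    """
    lo, hi = 0, len(blocks) - 1
    transfers = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        transfers += 1                     # read the middle block
        if key < key_of(block[0]):
            hi = mid - 1                   # key can only be in an earlier block
        elif key > key_of(block[-1]):
            lo = mid + 1                   # key can only be in a later block
        else:
            for rec in block:              # key lies within this block's range
                if key_of(rec) == key:
                    return rec, transfers
            return None, transfers
    return None, transfers


# Assumed layout: 8 blocks (br = 8) with 2 keys each: [0, 1], [2, 3], ..., [14, 15].
blocks = [[2 * i, 2 * i + 1] for i in range(8)]
worst = max(binary_search_blocks(blocks, k, lambda r: r)[1] for k in range(16))
print(worst, math.ceil(math.log2(len(blocks))))  # 4 3
```

For a non-key attribute, matching records may continue into neighbouring blocks; this sketch handles only the key case, so those extra block reads are not modelled.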


Evaluation of expressions

➢ Methods
1. Materialization
2. Pipelining
Materialization
➢ Materialized evaluation: evaluate one operation at a time, starting at the lowest level
(from the bottom, performing the innermost operations first).
➢ The intermediate result of each operation is materialized (stored in a temporary
relation) and becomes the input for subsequent, next-level operations.
➢ The cost of materialization is the sum of the costs of the individual operations plus the cost of
writing the intermediate results to disk.
The problems:
1. Creates many temporary relations
2. Performs many I/O operations
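Materialized evaluation can be sketched in memory. The example uses the Customer and Account data from these notes and a hypothetical query, Π C_name(Customer ⋈ σ Balance>2000(Account)); the list `temp_relations` stands in for temporary relations that a real system would write to disk.

```python
# Data from the Customer/Account example in these notes; the condition
# Balance > 2000 is a hypothetical choice for illustration.
CUSTOMER = [(101, "A1", "Ram"), (102, "A2", "Harsh"),
            (103, "A3", "Deepak"), (104, "A4", "Gopal")]
ACCOUNT = [("A1", 3000), ("A2", 1000), ("A3", 2000), ("A4", 4000)]

temp_relations = []  # stands in for temporary relations written to disk


def materialize(rows):
    """Store an intermediate result in full as a temporary relation."""
    temp = list(rows)
    temp_relations.append(temp)
    return temp


def select_rich(accounts):
    """sigma_{Balance > 2000}(Account), evaluated completely and materialized."""
    return materialize(a for a in accounts if a[1] > 2000)


def join_on_ano(customers, accounts):
    """Customer natural join Account (on Ano), evaluated completely and materialized."""
    return materialize((c[0], c[1], c[2], a[1])
                       for c in customers for a in accounts if c[1] == a[0])


def project_names(rows):
    """Pi_{C_name}: the final result, so it is not counted as a temporary relation."""
    return [r[2] for r in rows]


# Bottom-up: the innermost operation (the selection) runs first.
result = project_names(join_on_ano(CUSTOMER, select_rich(ACCOUNT)))
print(result, len(temp_relations))  # ['Ram', 'Gopal'] 2
```

Each level waits for the complete output of the level below, which is where the extra temporary relations and I/O come from.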
Pipelining
Evaluates several operations simultaneously, passing the results of one
operation on to the next.
To reduce the number of intermediate temporary relations, we pass the results of one
operation to the next operation in the pipeline.
Combining operations into a pipeline eliminates the cost of reading and writing
temporary relations.
Usually much cheaper than materialization: there is no need to store a temporary relation on disk.
Pipelines can be executed in two ways:
Demand driven – the system repeatedly requests tuples from the operation at the top
of the pipeline
Producer driven – operations do not wait for requests to produce tuples but
generate the tuples eagerly.
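Demand-driven pipelining maps naturally onto Python generators: each operator yields one tuple at a time and only runs when the operator above asks for the next tuple. This sketch evaluates the same hypothetical query as a materialization example would (Balance > 2000 is an assumed condition) and uses a `trace` list to show that only one Account tuple has been read when the first result appears. Producer-driven pipelining, with buffers between eagerly running operators, is not modelled here.

```python
CUSTOMER = [(101, "A1", "Ram"), (102, "A2", "Harsh"),
            (103, "A3", "Deepak"), (104, "A4", "Gopal")]
ACCOUNT = [("A1", 3000), ("A2", 1000), ("A3", 2000), ("A4", 4000)]


def scan(relation, trace):
    """Leaf operator: yields stored tuples one at a time, logging each read."""
    for row in relation:
        trace.append(row[0])
        yield row


def select_rich(rows):
    """sigma_{Balance > 2000}: passes qualifying tuples on immediately."""
    for a in rows:
        if a[1] > 2000:
            yield a


def join_on_ano(customers, accounts):
    """Nested-loop join: the pipelined Account tuples are the outer input;
    Customer is a stored relation scanned for each of them."""
    for a in accounts:
        for c in customers:
            if c[1] == a[0]:
                yield (c[0], c[1], c[2], a[1])


def project_names(rows):
    """Pi_{C_name} at the top of the pipeline."""
    for r in rows:
        yield r[2]


trace = []
pipeline = project_names(join_on_ano(CUSTOMER, select_rich(scan(ACCOUNT, trace))))
first = next(pipeline)        # demand driven: the top operator asks for one tuple
print(first, trace)           # Ram ['A1']
print(list(pipeline), trace)  # ['Gopal'] ['A1', 'A2', 'A3', 'A4']
```

No intermediate list is ever built between operators, which is the saving the notes attribute to pipelining.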
Query Optimization

Process of selecting the most efficient query evaluation plan


Query Optimization

Customer
Cid   Ano   C_name
101   A1    Ram
102   A2    Harsh
103   A3    Deepak
104   A4    Gopal

Account
Ano   Balance
A1    3000
A2    1000
A3    2000
A4    4000

[Figure: two evaluation plans compared. Efficient plan: inputs of 2 records and 4 records. Other plan: inputs of 4 records and 4 records.]
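The figure's record counts can be reproduced with a hypothetical selection condition, Balance > 2000, which keeps 2 of the 4 accounts (the actual condition on the original slide is not recoverable). The sketch compares joining first and selecting afterwards with pushing the selection below the join, counting the tuple pairs a simple nested-loop join compares.

```python
CUSTOMER = [(101, "A1", "Ram"), (102, "A2", "Harsh"),
            (103, "A3", "Deepak"), (104, "A4", "Gopal")]
ACCOUNT = [("A1", 3000), ("A2", 1000), ("A3", 2000), ("A4", 4000)]


def nested_loop_join(customers, accounts):
    """Join on Ano; also counts the tuple pairs compared."""
    out, comparisons = [], 0
    for c in customers:
        for a in accounts:
            comparisons += 1
            if c[1] == a[0]:
                out.append(c + (a[1],))
    return out, comparisons


def rich(balance):  # hypothetical selection condition: Balance > 2000
    return balance > 2000


def plan_join_then_select():
    """Join 4 x 4 records first, then apply the selection."""
    joined, n_cmp = nested_loop_join(CUSTOMER, ACCOUNT)
    return [r for r in joined if rich(r[3])], n_cmp, (len(CUSTOMER), len(ACCOUNT))


def plan_select_then_join():
    """Push the selection down: Account shrinks to 2 records before the join."""
    small = [a for a in ACCOUNT if rich(a[1])]
    joined, n_cmp = nested_loop_join(CUSTOMER, small)
    return joined, n_cmp, (len(CUSTOMER), len(small))


r1, cmp1, in1 = plan_join_then_select()
r2, cmp2, in2 = plan_select_then_join()
print(r1 == r2, in1, cmp1, in2, cmp2)  # True (4, 4) 16 (4, 2) 8
```

Both plans return the same answer, but performing the selection first halves the join work in this small example.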
Transformation of Relational Expressions
Cascade of selection
A conjunctive selection can be divided into a sequence of individual selections: σθ1∧θ2(E) = σθ1(σθ2(E))
Selection operation
Selection operations are commutative: σθ1(σθ2(E)) = σθ2(σθ1(E))
Project operation
If more than one projection operation is used in an expression, only the
outermost projection is required: ΠL1(ΠL2(…(ΠLn(E))…)) = ΠL1(E), where L1 ⊆ L2 ⊆ … ⊆ Ln
Join
Natural join operations are associative: (E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
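The four transformation rules can be checked on small instances. This sketch uses the Customer and Account data from these notes, two hypothetical conditions θ1 (Balance > 1500) and θ2 (Ano ≠ A4), and, for the join rule, splits Customer into two hypothetical projections so that three relations are available. Passing on one instance only illustrates the rules; it does not prove them.

```python
from itertools import product

CUSTOMER = [{"Cid": 101, "Ano": "A1", "C_name": "Ram"},
            {"Cid": 102, "Ano": "A2", "C_name": "Harsh"},
            {"Cid": 103, "Ano": "A3", "C_name": "Deepak"},
            {"Cid": 104, "Ano": "A4", "C_name": "Gopal"}]
ACCOUNT = [{"Ano": "A1", "Balance": 3000}, {"Ano": "A2", "Balance": 1000},
           {"Ano": "A3", "Balance": 2000}, {"Ano": "A4", "Balance": 4000}]


def select(pred, rel):
    return [t for t in rel if pred(t)]


def project(attrs, rel):
    """Keep only attrs and remove duplicate tuples (set semantics)."""
    out = []
    for t in rel:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r, s):
    """Combine tuples that agree on all common attributes."""
    return [{**t, **u} for t, u in product(r, s)
            if all(t[a] == u[a] for a in t.keys() & u.keys())]


def same(r, s):
    """Relations are equal as sets of tuples (tuple and attribute order ignored)."""
    return {frozenset(t.items()) for t in r} == {frozenset(t.items()) for t in s}


def theta1(t):  # hypothetical condition
    return t["Balance"] > 1500


def theta2(t):  # hypothetical condition
    return t["Ano"] != "A4"


cascade = same(select(lambda t: theta1(t) and theta2(t), ACCOUNT),
               select(theta1, select(theta2, ACCOUNT)))
commute = same(select(theta1, select(theta2, ACCOUNT)),
               select(theta2, select(theta1, ACCOUNT)))
proj = same(project(["C_name"], project(["Ano", "C_name"], CUSTOMER)),
            project(["C_name"], CUSTOMER))
E1 = project(["Cid", "Ano"], CUSTOMER)
E2 = ACCOUNT
E3 = project(["Cid", "C_name"], CUSTOMER)
assoc = same(natural_join(natural_join(E1, E2), E3),
             natural_join(E1, natural_join(E2, E3)))
print(cascade, commute, proj, assoc)  # True True True True
```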
Example
Recap

Query processing
Measures of Query Cost
Evaluation of expressions
Query representation
