Advanced Database System Chapter Three Query Processing and Optimization
Outlines
What is query processing?
Typical stages in query decomposition are:
• There are two main techniques that are employed during
query optimization.
• The first technique is based on heuristic rules for ordering the
operations in a query execution strategy. A heuristic is a rule that
works well in most cases but is not guaranteed to work well in
every case. The rules typically reorder the operations in a query
tree.
• The second technique involves systematically estimating the
cost of different execution strategies and choosing the execution
plan with the lowest cost estimate. These techniques are usually
combined in a query optimizer.
Example: Consider relations r(A, B) and s(C, D). We
require the Cartesian product r × s.
Method 1 :
a. Load next record of r in RAM.
b. Load all records of s, one at a time and
concatenate with r.
c. All records of r concatenated?
NO: goto a.
YES: exit (the result in RAM or on disk).
Performance: Too many disk accesses, because all of s is
read again for every record of r.
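Method 1 can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the relations are in-memory lists of tuples, and the counter `s_reads` (a name introduced here, not part of the slides) stands in for the number of times a record of s has to be loaded.

```python
def nested_loop_product(r, s):
    """Method 1: for each record of r, run through every record of s."""
    result = []
    s_reads = 0  # hypothetical counter: how often a record of s is loaded
    for r_rec in r:                      # step a: load the next record of r
        for s_rec in s:                  # step b: load the records of s one at a time
            s_reads += 1
            result.append(r_rec + s_rec) # concatenate the r record with the s record
    return result, s_reads               # step c: exit once r is exhausted


r = [(1, "x"), (2, "y"), (3, "z")]       # toy instance of r(A, B)
s = [(10, "p"), (20, "q")]               # toy instance of s(C, D)
product, s_reads = nested_loop_product(r, s)
print(len(product), s_reads)
```

The number of s loads grows as |r| × |s|, which is the "too many accesses" problem that Method 2 addresses.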
Method 2: Improvement
a. Load as many blocks of r as possible leaving
room for one block of s.
b. Run through the s file completely one block
at a time.
Performance: Reduces the number of times s blocks are
loaded by a factor equal to the number of r records that
can fit in main memory.
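The improvement can be made concrete with a small simulation. The sketch below assumes fixed-size blocks and a memory budget measured in blocks; `block_size`, `memory_blocks`, and the load counter are illustrative names, not part of the slides.

```python
def chunks(records, size):
    """Split a list of records into consecutive blocks of at most `size` records."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def block_nested_loop_product(r, s, block_size, memory_blocks):
    """Method 2: hold (memory_blocks - 1) blocks of r, stream s one block at a time."""
    if memory_blocks < 2:
        raise ValueError("need at least one block for r and one for s")
    r_blocks = chunks(r, block_size)
    s_blocks = chunks(s, block_size)
    r_capacity = memory_blocks - 1           # one block is reserved for s
    result = []
    s_block_loads = 0
    for i in range(0, len(r_blocks), r_capacity):
        # step a: load as many blocks of r as fit in memory
        r_in_memory = [rec for blk in r_blocks[i:i + r_capacity] for rec in blk]
        for s_blk in s_blocks:               # step b: one complete pass over s
            s_block_loads += 1
            for r_rec in r_in_memory:
                for s_rec in s_blk:
                    result.append(r_rec + s_rec)
    return result, s_block_loads


r = [(i, f"b{i}") for i in range(8)]         # 8 records, i.e. 4 blocks of 2
s = [(j, f"d{j}") for j in range(6)]         # 6 records, i.e. 3 blocks of 2
product, loads = block_nested_loop_product(r, s, block_size=2, memory_blocks=3)
print(len(product), loads)
```

With these toy sizes, four records of r are in memory at once, so s is scanned twice instead of eight times; the result is the same set of concatenated records.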
Considerations during query optimization:
– Narrow down intermediate result sets
quickly: apply SELECT and PROJECT before
JOIN.
– Use access structures (indexes).
Using Heuristics in Query Optimization
• To measure the performance of a query execution plan, you need tools and
metrics that help you analyze and compare different plans:
– EXPLAIN is a command that shows the query execution plan and the estimated
cost of each operation.
– Profiling is a feature that shows the actual time and resources used by each
operation.
– Monitoring tracks and displays the overall performance and health of the
database system, such as CPU, memory, disk, and network usage.
• Common metrics for measuring query performance include execution time, disk
I/O, memory usage, and network traffic.
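As one concrete illustration of an explain facility, SQLite exposes `EXPLAIN QUERY PLAN`, which can be called from Python's standard `sqlite3` module. The table, index name, and data below are made up for the example, and the exact wording of the plan text varies between SQLite versions, so treat this as a sketch rather than a reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (name, department_id) VALUES (?, ?)",
    [(f"emp{i}", i % 5) for i in range(100)],
)

query = "SELECT name FROM employees WHERE department_id = 3"


def plan_details(sql):
    """Return the human-readable 'detail' column of each row of the plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan_details(query)   # no index yet: the planner has to scan the table
conn.execute("CREATE INDEX idx_emp_dept ON employees (department_id)")
plan_after = plan_details(query)    # the planner can now search through the index

print(plan_before)
print(plan_after)
```

Comparing the two plans shows how adding an access structure changes the strategy the optimizer picks for the same query.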
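1. Selection (σ): Retrieves the rows of a relation that satisfy a specified condition.
2. Projection (π): Selects specific columns from a relation.
3. Union (∪): Combines two relations to create a new relation with all unique rows.
4. Intersection (∩): Returns the rows that appear in both relations.
5. Difference (-): Retrieves rows from one relation that are not present in another.
6. Cartesian Product (×): Combines all possible pairs of rows from two relations.
7. Join (⨝): Combines rows from two relations based on a specified condition.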
Projection (π): Selects specific columns from a relation, creating a new relation
with only those columns.
– Example: π(Name, Salary)(Employees)
Union (∪): Combines two relations to create a new relation containing all unique
rows.
– Example: R ∪ S
Intersection (∩): Returns a relation containing rows that appear in both input
relations.
– Example: R ∩ S
Difference (-): Returns a relation containing rows that appear in the first input
relation but not in the second.
– Example: R - S
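These operators map directly onto Python sets of tuples, which gives a quick way to experiment with them. The relation contents and the `project` helper below are invented for illustration only.

```python
# Relations modelled as sets of tuples; attribute names are kept separately.
employees_attrs = ("Name", "Dept", "Salary")
employees = {("Abebe", "IT", 5000), ("Sara", "HR", 4200), ("Kebede", "IT", 5000)}


def project(relation, attrs, wanted):
    """π: keep only the wanted columns; the set removes duplicate rows."""
    idx = [attrs.index(a) for a in wanted]
    return {tuple(row[i] for i in idx) for row in relation}


name_salary = project(employees, employees_attrs, ("Name", "Salary"))
dept_salary = project(employees, employees_attrs, ("Dept", "Salary"))

R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}
union = R | S          # R ∪ S
intersection = R & S   # R ∩ S
difference = R - S     # R - S

print(sorted(dept_salary))
print(sorted(union), sorted(intersection), sorted(difference))
```

Note that projecting onto (Dept, Salary) yields fewer rows than the input, because two employees share the same department and salary and a relation holds no duplicates.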
Example:
Explanation: If you first project the attributes name, age, and
salary, and then project only name and age, you can directly
project name and age from the start.
Example:
Explanation: If you first project the attributes name and age and
then select employees older than 30, or if you first select
employees older…
Transformation rule for relational
algebra with example….
5. Commutativity of THETA JOIN/Cartesian
Product
Rule: The THETA JOIN (⨝) and Cartesian Product
(×) operations are commutative, meaning the
order of the relations can be swapped without
affecting the result.
Example:
Initial Query:
R×S
Equivalent Query:
S×R
Explanation: Whether you join R with S or S with
R, the result will be the same set of tuples.
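A small check of this rule, assuming tuples are identified by attribute name rather than by column position (with purely positional tuples, R×S and S×R would differ in column order). The relations and the predicate are made up for the example.

```python
def product(left, right):
    """Cartesian product of relations whose rows are dicts with disjoint attribute names."""
    return {frozenset({**l, **r}.items()) for l in left for r in right}


def theta_join(left, right, predicate):
    """Theta join: keep the combined rows that satisfy the predicate."""
    return {
        frozenset({**l, **r}.items())
        for l in left
        for r in right
        if predicate({**l, **r})
    }


R = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
S = [{"C": 10, "D": "p"}, {"C": 20, "D": "q"}, {"C": 30, "D": "r"}]


def condition(t):
    """Hypothetical join condition: A * 10 <= C."""
    return t["A"] * 10 <= t["C"]


same_product = product(R, S) == product(S, R)
same_join = theta_join(R, S, condition) == theta_join(S, R, condition)
print(same_product, same_join)
```

Because the result is the same either way, the optimizer is free to choose whichever operand order is cheaper to execute.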
6. Commutativity of SELECTION with THETA JOIN
Rule: If the SELECTION predicate involves only attributes of one of
the relations being joined, the SELECTION and JOIN operations can
be interchanged.
Case b: Predicate Involves Attributes of Both Relations
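For the case where the selection predicate involves attributes of only one relation, the interchange can be checked on toy data. The relations, attribute names, and the age condition below are assumptions made for this sketch.

```python
def select(relation, predicate):
    """σ: keep the rows that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def theta_join(left, right, predicate):
    """⨝: combine every pair of rows that satisfies the join predicate."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


employees = [
    {"name": "Abebe", "age": 45, "dept_id": 1},
    {"name": "Sara", "age": 28, "dept_id": 2},
    {"name": "Kebede", "age": 33, "dept_id": 1},
    {"name": "Hana", "age": 25, "dept_id": 1},
]
departments = [{"dno": 1, "dname": "IT"}, {"dno": 2, "dname": "HR"}]


def same_department(e, d):
    return e["dept_id"] == d["dno"]


def older_than_30(t):
    return t["age"] > 30   # uses attributes of employees only


# Selection applied after the join.
late = select(theta_join(employees, departments, same_department), older_than_30)
# Selection pushed below the join, onto the relation that owns the attribute.
filtered = select(employees, older_than_30)
early = theta_join(filtered, departments, same_department)

print(late == early, len(employees), len(filtered))
```

Both orders give the same rows, but the second one feeds a smaller relation into the join.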
Query block: The basic unit that can be translated into the
algebraic operators and optimized.
A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause if
these are part of the block.
Nested queries within a query are identified as separate query
blocks.
There are two types of nested queries: uncorrelated and
correlated.
Uncorrelated Nested Queries
SELECT name
FROM employees
WHERE department_id IN (SELECT department_id
                        FROM departments
                        WHERE location = 'New York');
Correlated Nested Queries
SELECT name
FROM employees e
WHERE salary > (SELECT AVG(salary)
                FROM employees
                WHERE department_id = e.department_id);
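Both kinds of nested query can be tried out with Python's built-in `sqlite3` module. The table contents here are invented; only the two queries come from the text. The inner query of the first statement does not refer to the outer row, so it can be evaluated once, whereas the inner query of the second depends on `e.department_id` and is logically evaluated for each outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE employees (name TEXT, salary REAL, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'New York'), (2, 'Chicago');
    INSERT INTO employees VALUES
        ('Ann', 100, 1), ('Bob', 200, 1), ('Cal', 150, 2), ('Dee', 50, 2);
""")

# Uncorrelated: the subquery is independent of the outer query block.
uncorrelated = sorted(row[0] for row in conn.execute("""
    SELECT name
    FROM employees
    WHERE department_id IN (SELECT department_id
                            FROM departments
                            WHERE location = 'New York')
"""))

# Correlated: the subquery references e.department_id from the outer block.
correlated = sorted(row[0] for row in conn.execute("""
    SELECT name
    FROM employees e
    WHERE salary > (SELECT AVG(salary)
                    FROM employees
                    WHERE department_id = e.department_id)
"""))

print(uncorrelated)
print(correlated)
```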
SELECT P.NUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN
AND P.PLOCATION = 'STAFFORD';
Step 1. Perform selection operations as early
as possible: Applying selection at an early
stage reduces the number of unwanted records
that must be transferred from the database to
primary memory. The optimizer uses
transformation rule 1 to break a selection
with conjunctive conditions into a
cascade of selection operations.
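The cascade can be illustrated as follows. The sample PROJECT rows and the second condition (`dnum == 5`) are hypothetical; the point is only that applying the conjuncts one at a time gives the same result as the combined condition, while each step shrinks the intermediate relation.

```python
def select(relation, predicate):
    """σ: keep the rows that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def cascade_select(relation, conjuncts):
    """Apply each conjunct as its own selection and record the intermediate sizes."""
    sizes = [len(relation)]
    for predicate in conjuncts:
        relation = select(relation, predicate)
        sizes.append(len(relation))
    return relation, sizes


projects = [
    {"pnumber": 1, "plocation": "STAFFORD", "dnum": 4},
    {"pnumber": 2, "plocation": "HOUSTON", "dnum": 5},
    {"pnumber": 3, "plocation": "STAFFORD", "dnum": 5},
    {"pnumber": 4, "plocation": "BELLAIRE", "dnum": 5},
]

conjuncts = [
    lambda t: t["plocation"] == "STAFFORD",
    lambda t: t["dnum"] == 5,
]

# One selection with the full conjunctive condition.
combined = select(projects, lambda t: all(p(t) for p in conjuncts))
# The same condition as a cascade of single-condition selections.
cascaded, sizes = cascade_select(projects, conjuncts)

print(combined == cascaded, sizes)
```

Once the condition is split this way, each individual selection can be moved independently to the earliest point in the query tree where its attributes are available.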
1. The main heuristic is to apply first the operations that reduce the
size of intermediate results.
B. Cost Estimation Approach to Query Optimization
• The main idea is to minimize the cost of processing a query. The cost
function is composed of:
• I/O cost + CPU processing cost + communication cost + storage cost
• These components might have different weights in different
processing environments.
• The DBMS uses information stored in the system catalogue for the
purpose of estimating cost.
• The main target of query optimization is to minimize the size of the
intermediate relation. The size will have effect in the cost of:
• Disk Access
• Data Transportation
• Storage space in the Primary Memory
• Writing on Disk
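A weighted cost function of this kind might be sketched as below. All numbers, weights, and plan names are invented for illustration; a real optimizer derives its estimates from catalogue statistics rather than from constants like these.

```python
from dataclasses import dataclass


@dataclass
class PlanCost:
    """Estimated cost components of one execution strategy (arbitrary units)."""
    io: float
    cpu: float
    communication: float
    storage: float


def total_cost(cost, weights):
    """Weighted sum: I/O + CPU + communication + storage."""
    return (
        weights["io"] * cost.io
        + weights["cpu"] * cost.cpu
        + weights["communication"] * cost.communication
        + weights["storage"] * cost.storage
    )


# Hypothetical weights for a centralized, disk-bound environment.
weights = {"io": 1.0, "cpu": 0.5, "communication": 2.0, "storage": 0.5}

# Hypothetical estimates for two strategies answering the same query.
plans = {
    "select_then_join": PlanCost(io=120, cpu=30, communication=0, storage=10),
    "join_then_select": PlanCost(io=900, cpu=200, communication=0, storage=150),
}

estimates = {name: total_cost(cost, weights) for name, cost in plans.items()}
best = min(estimates, key=estimates.get)
print(estimates, best)
```

Changing the weights models a different processing environment; in a distributed system, for example, the communication weight would typically dominate.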
• Cost-based query optimization:
• Estimate and compare the costs of executing a
query using different execution strategies and
choose the strategy with the lowest cost estimate.
(Compare to heuristic query optimization)
• Issues
• Cost function
• Number of execution strategies to be considered
• Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
1. Access Cost of Secondary Storage
• A query needs some part of the data stored in the database, and
that data must be read from secondary storage. The disk
access cost can be analyzed in terms of:
– Searching,
– Reading, and
– Writing the data blocks used to store some portion of a
relation.
• Remark: The disk access cost will vary depending on
– the file organization used and the access method
implemented for that file organization;
– whether the data is stored contiguously or
scattered across the disk.
2. Storage Cost
• A query is usually composed of many database operations, so
processing it can produce one or more intermediate results
before the final output is reached. These
intermediate results should be stored in primary memory for
further processing. The bigger the intermediate relation, the
larger the memory requirement, which puts pressure on
the limited space available. This is counted as the
storage cost.
3. Query Execution Plans