Query Optimization

The document discusses query optimization techniques used by DBMS to process and execute high-level queries, including scanning, parsing, and validating SQL queries. It explains the creation of query trees, the evaluation of execution strategies, and the importance of heuristic optimization to improve performance by reducing intermediate results. Additionally, it covers the conversion of query trees into execution plans, detailing access methods and evaluation approaches such as materialized and pipelined evaluations.

Uploaded by

mwendikimaiga21

Query Optimization

• Query optimization addresses the techniques
used by a DBMS to process, optimize and
execute high-level queries. A query expressed
in a high-level language such as SQL must first
be scanned, parsed and validated.
• The scanner identifies the language tokens, such as
SQL keywords, attribute names and relation
names.
• The parser checks the query syntax to
determine whether it is formulated according
to the syntax rules of the query language.
• The query must also be validated by checking that
all attribute and relation names are valid and
semantically meaningful names in the schema.
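The scanning and validation steps can be sketched roughly as follows. This is only an illustration under assumptions: the `SCHEMA` dictionary, the small keyword set and the token pattern are hypothetical, and the parsing (syntax checking) step is left out for brevity.

```python
import re

# Hypothetical schema: relation name -> set of attribute names.
SCHEMA = {"EMPLOYEE": {"LNAME", "SALARY", "DNO"}}
# A deliberately tiny keyword set, enough for the example query only.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}

# A token is a number, an identifier, or an operator/punctuation symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(<=|>=|<>|[=<>,*()]))")

def scan(query):
    """Scanner: split the query text into (kind, value) tokens."""
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, symbol = m.groups()
        if number is not None:
            tokens.append(("NUMBER", number))
        elif word is not None:
            kind = "KEYWORD" if word.upper() in KEYWORDS else "NAME"
            tokens.append((kind, word.upper()))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = m.end()
    return tokens

def validate(tokens, schema):
    """Validation: return the names that are not relations or attributes in the schema."""
    known = set(schema) | set().union(*schema.values())
    return [value for kind, value in tokens if kind == "NAME" and value not in known]

tokens = scan("SELECT LNAME FROM EMPLOYEE WHERE SALARY > 30000")
unknown = validate(scan("SELECT AGE FROM EMPLOYEE"), SCHEMA)
```

Here `tokens` holds the keyword, name, symbol and number tokens of the first query, and `unknown` lists `AGE`, which the assumed schema does not define.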
• An internal representation of the query is then
created, usually as a tree data structure called a
query tree. It is possible to represent the query
using a graph data structure called a query graph.
• The DBMS must then devise an execution
strategy for retrieving the result of the query
from the database files. A query typically has
many possible execution strategies and the
process of choosing a suitable one for processing
is known as query optimization.
• An RDBMS (and an ODBMS) must systematically
evaluate alternative query execution
strategies and choose a reasonably efficient or
optimal strategy.
• Each DBMS has general database access
algorithms that implement relational
operations such as SELECT or JOIN or
combinations of these operations.
Translating SQL Queries into
Relational Algebra
• An SQL query is first translated into an
equivalent extended relational algebra
expression represented as a query tree data
structure that is then optimized.
• SQL queries are decomposed into query
blocks which form the basic units that can be
translated into the algebraic operators and
optimized
• A query block contains a single SELECT-FROM-WHERE
expression as well as GROUP BY and
HAVING clauses if these are part of the block.
• Nested queries within a query are identified as
separate query blocks.
• Because SQL includes aggregate operators
such as MAX, MIN, SUM and COUNT, these
operators must also be included in the
extended algebra.
[Slide figure: an example SQL query with a nested subquery; not reproduced in this text.]
• This query includes a nested subquery and
hence would be decomposed into two blocks.
• The inner block is: [expression not reproduced in this text]
• The outer block is: [expression not reproduced in this text]
• The query optimizer would then choose an
execution plan for each block.
• In the example above the inner block needs to
be evaluated only once to produce the
maximum salary which is then used as the
constant c.
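To make the decomposition concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The query is a hypothetical one of the same shape as described, not the query from the slides: the `EMPLOYEE` table, its columns `LNAME`, `SALARY` and `DNO`, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, invented for this example.
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INTEGER, DNO INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("Otieno", 40000, 5), ("Wanjiru", 55000, 5),
     ("Mutua", 60000, 4), ("Achieng", 30000, 1)],
)

# Inner block: evaluated once, it yields a single value (the constant c).
(c,) = conn.execute(
    "SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5"
).fetchone()

# Outer block: the nested subquery has been replaced by the constant c.
outer = conn.execute(
    "SELECT LNAME FROM EMPLOYEE WHERE SALARY > ? ORDER BY LNAME", (c,)
).fetchall()

# The original nested form, for comparison; it returns the same rows.
nested = conn.execute(
    "SELECT LNAME FROM EMPLOYEE WHERE SALARY > "
    "(SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5) ORDER BY LNAME"
).fetchall()
```

Evaluating the two blocks separately gives the same rows as the nested query, which is why each block can be planned on its own.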
Basic Algorithms for Executing Query
Operations
• For each operation or combination of
operations one or more algorithms would
typically be available to execute the
operation(s).
• An algorithm may apply only to particular
storage structures and access paths; if so,
it can be used only if the files involved in the
operation include these access paths.
• External sorting is at the heart of many
relational operations that use sort-merge
strategies.
• Access algorithms for implementing SELECT,
JOIN, PROJECT and set operations (UNION,
INTERSECTION, SET DIFFERENCE), and
aggregate operations (MIN, COUNT, AVERAGE,
SUM) are also important in query
optimization.
External Sorting
• Sorting is one of the primary algorithms used
in query processing. For example, whenever an SQL
query specifies an ORDER BY clause, the query
result must be sorted. Sorting is also a key
component in sort-merge algorithms used for
JOIN and other operations such as UNION and
INTERSECTION, and in duplicate elimination
algorithms for the PROJECT operation (when
an SQL query specifies the DISTINCT option in
the SELECT clause).
• External sorting refers to sorting algorithms
that are suitable for large files of records
stored on disk that do not fit entirely in main
memory, such as database files.
• The typical external sorting algorithm uses a
sort-merge strategy, which starts by sorting
small subfiles, called runs, of the main file and
then merges the sorted runs, creating larger
sorted files that are merged in turn.
• The sort-merge algorithm, like other database
algorithms, requires buffer space in main
memory, where the actual sorting and merging
of the runs is performed.
• The basic algorithm consists of two phases:
i. Sorting phase
ii. Merging phase
Sorting phase
• Runs (portions) of the file that can fit in the
available buffer space are read into main
memory, sorted using an internal sorting
algorithm and written back to disk as
temporary sorted subfiles (runs).
Merging phase
• The sorted runs are merged during one or
more passes. The degree of merging is the
number of runs that can be merged together
in each pass. In each pass, one buffer block is
needed to hold one block from each of the runs
being merged, and one more is needed to hold
one block of the merge result.
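The two phases can be sketched as below. This is an illustrative simplification, not a DBMS implementation: records are plain integers, `buffer_size` counts records rather than disk blocks, temporary files stand in for the sorted runs on disk, and the names `write_run`, `read_run` and `external_sort` are hypothetical.

```python
import heapq
import tempfile

def write_run(sorted_records):
    """Write one sorted run to a temporary file, one integer per line."""
    f = tempfile.TemporaryFile(mode="w+")
    f.writelines(f"{r}\n" for r in sorted_records)
    f.seek(0)
    return f

def read_run(f):
    """Stream the records of a run back from disk one at a time."""
    for line in f:
        yield int(line)

def external_sort(records, buffer_size, degree):
    if buffer_size < 1 or degree < 2:
        raise ValueError("need buffer_size >= 1 and degree >= 2")

    # Sorting phase: read buffer_size records at a time, sort them in
    # memory, and write each sorted run back to disk.
    runs, chunk = [], []
    for r in records:
        chunk.append(r)
        if len(chunk) == buffer_size:
            runs.append(write_run(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(write_run(sorted(chunk)))

    # Merging phase: each pass merges groups of up to `degree` runs,
    # until a single sorted run remains.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), degree):
            group = runs[i:i + degree]
            merged.append(write_run(heapq.merge(*(read_run(f) for f in group))))
            for f in group:
                f.close()
        runs = merged

    result = list(read_run(runs[0])) if runs else []
    for f in runs:
        f.close()
    return result

sorted_records = external_sort([5, 2, 9, 1, 7, 3, 8], buffer_size=2, degree=2)
```

With `buffer_size=2` the sorting phase produces four runs, and with `degree=2` the merging phase needs two passes to reduce them to one.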
Combining Operations Using
pipelining
• A query specified in SQL will typically be
translated into a relational algebra expression
that is a sequence of relational operations. For
example, rather than being implemented
separately, a JOIN can be combined with two
SELECT operations on the input files and a
final PROJECT operation on the resulting file,
all implemented by one algorithm with
two input files and a single output file.
• Heuristic relational algebra optimization can
group operations together for execution. This
is called pipelining or stream-based processing.
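A combined operation of this kind might look like the sketch below: one function takes two inputs, applies a SELECT to each, joins them and projects the result, without storing any intermediate relation. The join here is done with an in-memory hash table, which is a choice made for this example and not something the text prescribes. The function name, the attribute names and the sample rows are hypothetical.

```python
def select_join_project(left, right, left_pred, right_pred,
                        left_key, right_key, out_attrs):
    """One algorithm, two inputs, one output: SELECT on each input,
    JOIN on the keys, PROJECT on out_attrs."""
    # Build a hash table on the right input, applying its SELECT on the fly.
    index = {}
    for r in right:
        if right_pred(r):
            index.setdefault(r[right_key], []).append(r)
    # Stream the left input: SELECT, probe for the JOIN, then PROJECT.
    for l in left:
        if left_pred(l):
            for r in index.get(l[left_key], []):
                joined = {**l, **r}
                yield {a: joined[a] for a in out_attrs}

# Hypothetical sample data.
employees = [
    {"LNAME": "Otieno", "DNO": 5, "SALARY": 40000},
    {"LNAME": "Mutua", "DNO": 4, "SALARY": 60000},
    {"LNAME": "Wanjiru", "DNO": 5, "SALARY": 55000},
]
departments = [
    {"DNUMBER": 5, "DNAME": "Research"},
    {"DNUMBER": 4, "DNAME": "Admin"},
]

rows = list(select_join_project(
    employees, departments,
    lambda e: e["SALARY"] > 50000,
    lambda d: d["DNAME"] == "Research",
    "DNO", "DNUMBER", ["LNAME", "DNAME"],
))
```

Because the function is a generator, each output tuple is produced as soon as its inputs have been read, which is the stream-based behaviour the slide refers to.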
Using Heuristics in Query
Optimization
• Application of heuristic rules to modify the internal
representation of a query is used to achieve
improvement in performance. One of the main
heuristic rules is to apply SELECT and PROJECT
operations before applying the JOIN or other binary
operations.
• This is because the size of the file resulting from a
binary operation such as a JOIN is usually a
multiplicative function of the sizes of the input files.
The SELECT and PROJECT operations reduce the size of
a file and hence should be applied before a JOIN or
other binary operation.
Notation for query trees and query
graphs
• A query tree is a tree data structure that
corresponds to a relational algebra expression. It
represents the input relations of the query as leaf
nodes of the tree, and represents the relational
algebra operations as internal nodes.
• An execution of the query tree consists of
executing an internal node operation whenever
its operands are available and then replacing that
internal node by the relation that results from
executing the operation
• The execution terminates when the root node is
executed and produces the result relation for the
query.
• Example: For every project located in ‘Nairobi’,
retrieve the project number, the controlling
department number and the department
manager's last name, address and birthdate. This
query is specified on the relational schema below
and corresponds to the following algebra
expression:
[Slide figures: the relational schema and the relational algebra expression; not reproduced in this text.]
Query tree corresponding to the
relational algebra expression
• The three relations PROJECT, DEPARTMENT and
EMPLOYEE are represented by leaf nodes P, D and E,
while the relational algebra operations of the
expression are represented by internal tree nodes.
• When this query tree is executed, the node marked (1)
must begin execution before node (2) because some
resulting tuples of operation (1) must be available
before we can begin executing operation (2). Similarly,
node (2) must begin executing and producing results
before node (3) can start execution, and so on.
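A query tree of this shape can be sketched as a small data structure that is evaluated bottom-up. The attribute names (`PNUMBER`, `PLOCATION`, `DNUM`, `DNUMBER`, `MGRSSN`, `SSN` and so on) and the sample rows are assumptions chosen for illustration; the slides' schema is not reproduced in this text. For simplicity each node is fully evaluated before its parent, so this sketch does not show operations overlapping in time.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Relation:            # leaf node: an input relation
    rows: list

@dataclass
class Select:              # internal node: selection
    pred: Callable
    child: object

@dataclass
class Join:                # internal node: join on left_attr = right_attr
    left_attr: str
    right_attr: str
    left: object
    right: object

@dataclass
class Project:             # internal node: projection
    attrs: list
    child: object

def execute(node):
    """Evaluate the operands first, then return the relation for this node."""
    if isinstance(node, Relation):
        return node.rows
    if isinstance(node, Select):
        return [t for t in execute(node.child) if node.pred(t)]
    if isinstance(node, Join):
        left, right = execute(node.left), execute(node.right)
        return [{**l, **r} for l in left for r in right
                if l[node.left_attr] == r[node.right_attr]]
    if isinstance(node, Project):
        return [{a: t[a] for a in node.attrs} for t in execute(node.child)]
    raise TypeError(f"unknown node type: {type(node).__name__}")

# Hypothetical sample data for the leaf nodes P, D and E.
P = Relation([{"PNUMBER": 10, "PLOCATION": "Nairobi", "DNUM": 4},
              {"PNUMBER": 20, "PLOCATION": "Mombasa", "DNUM": 5}])
D = Relation([{"DNUMBER": 4, "MGRSSN": "111"},
              {"DNUMBER": 5, "MGRSSN": "222"}])
E = Relation([{"SSN": "111", "LNAME": "Kamau",
               "ADDRESS": "Box 1, Nairobi", "BDATE": "1970-01-15"},
              {"SSN": "222", "LNAME": "Njeri",
               "ADDRESS": "Box 2, Mombasa", "BDATE": "1965-06-30"}])

tree = Project(
    ["PNUMBER", "DNUM", "LNAME", "ADDRESS", "BDATE"],
    Join("MGRSSN", "SSN",                                   # node (3)
         Join("DNUM", "DNUMBER",                            # node (2)
              Select(lambda t: t["PLOCATION"] == "Nairobi", P),  # node (1)
              D),
         E),
)
result = execute(tree)
```

The recursion reaches the SELECT on P first, then the join with D, then the join with E, and finally the projection at the root.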
Initial (canonical) query tree for the query
Query graph for the query
• Relations in the query are represented by relation
nodes, which are displayed as single circles.
Constant values, typically from the query
selection conditions are represented by constant
nodes which are displayed as double circles.
• Selection and join conditions are represented by
the graph edges. The attributes to be retrieved
from each relation are displayed in square
brackets above each relation.
• The query graph representation does not
indicate the order in which operations are
performed. There is only a single graph
corresponding to each query.
• Query trees are preferred because the query
optimizer needs to show the order of
operations for query execution, which is not
possible in query graphs.
Heuristic Optimization Of Query Trees

• In general, many different relational algebra
expressions (and hence many different query
trees) can be equivalent, i.e. they can
correspond to the same query.
• The query parser will typically generate a
standard initial query tree to correspond to an
SQL query, without doing any optimization. In
the above example the canonical form is that
initial tree.
• The CARTESIAN PRODUCT of the relations
specified in the FROM clause is first applied;
then the selection and join conditions of the
WHERE clause are applied, followed by the
projection on the SELECT clause attributes.
• Such a canonical query tree represents a
relational algebra expression that is very
inefficient if executed directly, because of the
CARTESIAN PRODUCT (X) operations
• For example, if the PROJECT, DEPARTMENT and
EMPLOYEE relations had record sizes of 100,
50 and 150 bytes and contained 100, 20 and 5000
tuples respectively, the result of the
CARTESIAN PRODUCT would contain 10
million tuples of record size 300 bytes each.
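The figures in this example can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the total size in bytes is a derived figure that the slide does not state.

```python
from math import prod

# relation -> (number of tuples, record size in bytes), as given above.
relations = {
    "PROJECT": (100, 100),
    "DEPARTMENT": (20, 50),
    "EMPLOYEE": (5000, 150),
}

# A Cartesian product pairs every tuple with every other tuple,
# so the tuple counts multiply and the record sizes add up.
tuples = prod(n for n, _ in relations.values())              # 100 * 20 * 5000
record_size = sum(size for _, size in relations.values())    # 100 + 50 + 150
total_bytes = tuples * record_size
```

This gives 10,000,000 tuples of 300 bytes each, about 3 GB of intermediate data if the product were fully materialised.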
• It is now the job of the heuristic query optimizer
to transform this initial query tree into a final
query tree that is efficient to execute.
• The optimizer must include rules for equivalence
among relational algebra expressions that can be
applied to the initial tree. The heuristic query
optimization rules then utilize these equivalence
expressions to transform the initial tree into the
final optimised query tree.
• Example of transforming a tree:
• Find the last names of employees born after
1957 who work on a project named ‘sensors’.
• This query can be specified in SQL as:
[SQL statement not reproduced in this text.]
Initial (canonical) query tree for SQL
query
Moving SELECT operations down the
query tree
• This is an improved query tree that first
applies the SELECT operation to reduce the
number of tuples that appear in the
CARTESIAN PRODUCT.
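The effect of moving a SELECT below the CARTESIAN PRODUCT can be seen on toy data. In this sketch the two relations, their attribute names (`SSN`, `BYEAR`, `ESSN`, `PNO`) and their contents are invented for illustration; `BYEAR` stands in for a birth-date attribute.

```python
from itertools import product

# Hypothetical toy relations with 100 tuples each.
EMPLOYEE = [{"SSN": i, "BYEAR": 1950 + i % 20} for i in range(100)]
WORKS_ON = [{"ESSN": i, "PNO": i % 5} for i in range(100)]

def born_after_1957(e):
    return e["BYEAR"] > 1957

# Canonical order: CARTESIAN PRODUCT first, then SELECT on the result.
canonical_product = list(product(EMPLOYEE, WORKS_ON))
canonical = [(e, w) for e, w in canonical_product
             if born_after_1957(e) and e["SSN"] == w["ESSN"]]

# Improved order: SELECT on EMPLOYEE first, then the product.
selected = [e for e in EMPLOYEE if born_after_1957(e)]
pushed_product = list(product(selected, WORKS_ON))
pushed = [(e, w) for e, w in pushed_product if e["SSN"] == w["ESSN"]]
```

Both orders return the same 60 result tuples, but the intermediate product shrinks from 10,000 to 6,000 tuples once the SELECT is applied first.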
Applying the more restrictive SELECT
operation first
• A further improvement is achieved by
switching the positions of the EMPLOYEE and
PROJECT relations in the figure above. This uses
the information that PNUMBER is a key
attribute of the PROJECT relation and hence the
SELECT operation on the PROJECT relation will
retrieve a single record only.
Replacing CARTESIAN PRODUCT and
SELECT with JOIN operations
• In the figure above, improvement is achieved by
replacing any CARTESIAN PRODUCT operation
that is followed by a join condition with a JOIN
operation.
Moving PROJECT operations down the
query tree.
• Here, improvement is achieved by keeping
only the attributes needed by subsequent
operations in the intermediate relations, by
including PROJECT operations as early as
possible in the query tree. This reduces the
attributes of the intermediate relations,
whereas the SELECT operations reduce the
number of tuples.
General transformation Rules for
Relational Algebra Operations
• This example demonstrates that a query tree
can be transformed step by step into another
query tree that is more efficient to execute.
However we must be sure that the
transformation steps always lead to an
equivalent query tree. To do this the query
optimizer must know which transformation
rules preserve this equivalence
• Take Home CAT 2
1. Complete the list of rules up to rule number 12
and identify how the rules have been
implemented in the example given earlier in
figures b to e.
2. Discuss the methods for implementing
SELECTION, JOIN, PROJECT, SET and AGGREGATE
operations. Hints: for SELECTION, linear search and
binary search; for JOIN, nested-loop join and
sort-merge join; for SET operations, hashing; etc.
• The main heuristic is to apply first the
operations that reduce the size of
intermediate results. This includes performing
as early as possible SELECT operations to
reduce the number of tuples and PROJECT
operations to reduce the number of
attributes.
• This is done by moving SELECT and PROJECT
operations as far down the tree as possible
• In addition, the SELECT and JOIN operations
that are most restrictive, that is, those that result in
relations with the fewest tuples or with the
smallest absolute size, should be executed
before other similar operations.
• This is done by reordering the leaf nodes of
the tree among themselves while avoiding
CARTESIAN PRODUCTS, and adjusting the rest
of the tree appropriately.
Converting Query Trees into
Execution Plans
An execution plan for a relational algebra
expression represented as a query tree
includes information about the access
methods available for each relation as well as
the algorithms to be used in computing the
relational operators represented in the tree.
• Consider the query tree above: to convert this
into an execution plan, the optimizer might
choose an index search for the SELECT
operation (assuming one exists), a table scan
as access method for EMPLOYEE, a nested-
loop join algorithm for the join, and a scan of
the JOIN result for the PROJECT operator.
• In addition, the approach taken for executing the
query may specify a materialised or a pipelined
evaluation. With a materialised evaluation, the
result of an operation is stored as a temporary
relation (that is the result is physically
materialised).
• For instance the join operation can be computed
and the entire result stored as a temporary
relation, which is then read as input by the
algorithm that computes the PROJECT operation,
which would produce the query result table.
• On the other hand, with a pipelined
evaluation, as the resulting tuples of an
operation are produced, they are forwarded
directly to the next operation in the query
sequence.
• The advantage of pipelining is the cost saving
in not having to write the intermediate results
to disk and not having to read them back for
the next operation.
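The difference between the two evaluation approaches can be sketched with Python generators. This is only an analogy under assumptions: a list stands in for a temporary relation written to disk, and the operator functions and sample rows are hypothetical.

```python
def table_scan(rows):
    for row in rows:
        yield row

def select(pred, rows):
    for row in rows:
        if pred(row):
            yield row

def project(attrs, rows):
    for row in rows:
        yield {a: row[a] for a in attrs}

# Hypothetical sample relation.
EMPLOYEE = [{"LNAME": f"E{i}", "SALARY": 1000 * i} for i in range(1, 11)]

# Materialised evaluation: the full result of the SELECT is stored
# (the list stands in for a temporary relation) before PROJECT starts.
temp = list(select(lambda r: r["SALARY"] > 5000, table_scan(EMPLOYEE)))
materialised = list(project(["LNAME"], temp))

# Pipelined evaluation: each tuple is forwarded to the next operation
# as soon as it is produced; no intermediate relation is stored.
pipeline = project(["LNAME"],
                   select(lambda r: r["SALARY"] > 5000, table_scan(EMPLOYEE)))
first = next(pipeline)      # available before the scan has finished
pipelined = [first] + list(pipeline)
```

Both approaches produce the same result; the pipelined version simply never builds `temp`, which corresponds to the saving in writes and reads described above.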
