Query Optimization

The document discusses query optimization techniques used by DBMS to process and execute high-level queries, including scanning, parsing, and validating SQL queries. It explains the creation of query trees, the evaluation of execution strategies, and the importance of heuristic optimization to improve performance by reducing intermediate results. Additionally, it covers the conversion of query trees into execution plans, detailing access methods and evaluation approaches such as materialized and pipelined evaluations.

Uploaded by

mwendikimaiga21
Copyright
© © All Rights Reserved

Query Optimization

• Query optimization addresses the techniques
used by a DBMS to process, optimize and
execute high-level queries. A query expressed
in a high-level language such as SQL must first
be scanned, parsed and validated.
• The scanner identifies the language tokens, such
as SQL keywords, attribute names and relation
names.
• The parser checks the query syntax to
determine whether it is formulated according
to the syntax rules of the query language.
• The query must also be validated by checking
that all attribute and relation names are valid
and semantically meaningful names in the schema.
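The scanning and validation steps described above can be illustrated with a minimal sketch. It is not how any particular DBMS implements them: the schema, the keyword list and the function names `scan` and `validate` are assumptions made for illustration, and the parsing (syntax-checking) step is left out.

```python
import re

# Hypothetical schema: relation name -> set of attribute names.
SCHEMA = {"EMPLOYEE": {"LNAME", "SALARY", "DNO"}}
# A deliberately tiny keyword list, for illustration only.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}

# One token per match: a number, a word, or an operator/punctuation symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(<=|>=|<>|[=<>,*()]))")

def scan(query):
    """Scanner: split the query string into (kind, text) tokens."""
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, symbol = m.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            kind = "KEYWORD" if word.upper() in KEYWORDS else "NAME"
            tokens.append((kind, word.upper()))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = m.end()
    return tokens

def validate(tokens, schema):
    """Validation: return the NAME tokens that are not in the schema."""
    known = set(schema) | set().union(*schema.values())
    return [text for kind, text in tokens if kind == "NAME" and text not in known]

tokens = scan("SELECT LNAME FROM EMPLOYEE WHERE SALARY > 30000")
unknown = validate(scan("SELECT AGE FROM EMPLOYEE"), SCHEMA)
```

Here `validate` returns an empty list for the first query, while `unknown` lists `AGE`, a name that does not exist in the assumed schema.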
• An internal representation of the query is then
created, usually as a tree data structure called a
query tree. It is also possible to represent the
query using a graph data structure called a query
graph.
• The DBMS must then devise an execution
strategy for retrieving the result of the query
from the database files. A query typically has
many possible execution strategies and the
process of choosing a suitable one for processing
is known as query optimization.
• An RDBMS (and an ODBMS) must systematically
evaluate alternative query execution
strategies and choose a reasonably efficient or
optimal strategy.
• Each DBMS has general database access
algorithms that implement relational
operations such as SELECT or JOIN or
combinations of these operations.
Translating SQL Queries into
Relational Algebra
• An SQL query is first translated into an
equivalent extended relational algebra
expression represented as a query tree data
structure that is then optimized.
• SQL queries are decomposed into query
blocks which form the basic units that can be
translated into the algebraic operators and
optimized
• A query block contains a single SELECT-FROM-
WHERE expression as well as GROUP BY and
HAVING clauses if these are part of the block.
• Nested queries within a query are identified as
separate query blocks.
• Because SQL includes aggregate operators
such as MAX, MIN, SUM and COUNT, these
operators must also be included in the
extended algebra.
[Figure: example SQL query containing a nested
subquery; not recovered]
• This query includes a nested subquery and
hence would be decomposed into two blocks, an
inner block and an outer block.
[Figures: the inner block, the outer block and
their relational algebra translations; not
recovered]
• The query optimizer would then choose an
execution plan for each block.
• In the example above, the inner block needs to
be evaluated only once to produce the
maximum salary, which is then used as the
constant c by the outer block.
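The idea of evaluating the inner block once and passing its result to the outer block as the constant c can be sketched with Python's built-in `sqlite3` module. The table layout, the sample rows and the condition `DNO = 5` are assumptions made for illustration; they are not the query from the original slides.

```python
import sqlite3

# Hypothetical EMPLOYEE table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INTEGER, DNO INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("Kamau", 40000, 5), ("Otieno", 30000, 5),
     ("Wanjiru", 55000, 4), ("Mutua", 25000, 1)],
)

# The query as a user would write it: one statement with a nested subquery.
nested = conn.execute(
    "SELECT LNAME FROM EMPLOYEE "
    "WHERE SALARY > (SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5)"
).fetchall()

# Inner block: evaluated once, producing a single value c.
(c,) = conn.execute(
    "SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5"
).fetchone()

# Outer block: uses c as a constant in its selection condition.
outer = conn.execute(
    "SELECT LNAME FROM EMPLOYEE WHERE SALARY > ?", (c,)
).fetchall()
```

Both forms return the same rows. The two-step version only makes the block decomposition visible; a real optimizer performs it internally.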
Basic Algorithms for Executing Query
Operations
• For each operation or combination of
operations one or more algorithms would
typically be available to execute the
operation(s).
• An algorithm may apply only to particular
storage structures and access paths; if so,
it can be used only if the files involved in the
operation include these access paths.
• External sorting is at the heart of many
relational operations that use sort-merge
strategies.
• Access algorithms for implementing SELECT,
JOIN, PROJECT, set operations (UNION,
INTERSECTION, SET DIFFERENCE) and
aggregate operations (MIN, COUNT, AVERAGE,
SUM) are also important in query
optimization.
External Sorting
• Sorting is one of the primary algorithms used
in query processing. For example, whenever an
SQL query specifies an ORDER BY clause, the
query result must be sorted. Sorting is also a
key component in the sort-merge algorithms used
for JOIN and other operations such as UNION and
INTERSECTION, and in duplicate elimination
algorithms for the PROJECT operation (when
an SQL query specifies the DISTINCT option in
the SELECT clause).
• External sorting refers to sorting algorithms
that are suitable for large files of records
stored on disk that do not fit entirely in main
memory, such as database files.
• The typical external sorting algorithm uses a
sort-merge strategy, which starts by sorting
small subfiles (called runs) of the main file and
then merges the sorted runs, creating larger
sorted files that are merged in turn.
• The sort-merge algorithm, like other database
algorithms, requires buffer space in main
memory, where the actual sorting and merging
of the runs is performed.
• The basic algorithm consists of two phases:
i. Sorting Phase
ii. Merging Phase
Sorting phase
• Runs (portions) of the file that can fit in the
available buffer space are read into main
memory, sorted using an internal sorting
algorithm and written back to disk as
temporary sorted subfiles (runs).
Merging phase
• The sorted runs are merged during one or
more passes. The degree of merging is the
number of runs that can be merged together
in each pass. In each pass, one buffer block is
needed to hold one block from each run being
merged, and one more is needed to hold one
block of the merge result.
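The two phases can be sketched in a few lines of Python. This is a simplified illustration, not a production algorithm: the function name `external_sort` and the `buffer_size` parameter (counted in records rather than disk blocks) are assumptions, the runs are merged in a single pass (which assumes the degree of merging is at least the number of runs), and the final result is collected in a list only so that it can be inspected.

```python
import heapq
import os
import tempfile

def external_sort(records, buffer_size):
    """Sort integers while holding at most buffer_size records per run in memory.

    Returns the sorted records and the number of runs created.
    """
    run_paths = []
    buffer = []

    def flush():
        # Sort the in-memory buffer and write it to disk as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{r}\n" for r in buffer)
        run_paths.append(path)
        buffer.clear()

    # Sorting phase: fill the buffer, sort it, write a run, repeat.
    for r in records:
        buffer.append(r)
        if len(buffer) == buffer_size:
            flush()
    if buffer:
        flush()

    # Merging phase: stream all runs through one k-way merge.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        result = list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
    return result, len(run_paths)

sorted_records, run_count = external_sort([9, 3, 7, 1, 8, 2, 6], buffer_size=3)
```

With seven records and room for three at a time, the sorting phase produces three runs, which the merging phase combines into one sorted sequence.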
Combining Operations Using
pipelining
• A query specified in SQL will typically be
translated into a relational algebra expression
that is a sequence of relational operations. For
example, rather than being implemented
separately, a JOIN can be combined with two
SELECT operations on the input files and a
final PROJECT operation on the resulting file.
• All of this is implemented by one algorithm
with two input files and a single output file.
Heuristic relational algebra optimization can
group operations together for execution. This
is called pipelining or stream-based processing.
Using Heuristics in Query
Optimization
• Heuristic rules are applied to modify the internal
representation of a query in order to improve
performance. One of the main heuristic rules is to
apply SELECT and PROJECT operations before
applying the JOIN or other binary operations.
• This is because the size of the file resulting from a
binary operation such as a JOIN is usually a
multiplicative function of the sizes of the input files.
The SELECT and PROJECT operations reduce the size of
a file and hence should be applied before a JOIN or
other binary operation.
Notation for query trees and query
graphs
• A query tree is a tree data structure that
corresponds to a relational algebra expression. It
represents the input relations of the query as leaf
nodes of the tree, and represents the relational
algebra operations as internal nodes.
• An execution of the query tree consists of
executing an internal node operation whenever
its operands are available and then replacing that
internal node by the relation that results from
executing the operation
• The execution terminates when the root node is
executed and produces the result relation for the
query.
• Example: For every project located in ‘Nairobi’,
retrieve the project number, the controlling
department number and the department
manager's last name, address and birthdate. This
query is specified on the relational schema below
and corresponds to the following algebra
expression.
[Figures: relational schema and the corresponding
relational algebra expression; not recovered]
Query tree corresponding to the
relational algebra expression
• The three relations PROJECT, DEPARTMENT and
EMPLOYEE are represented by leaf nodes P, D and E,
while the relational algebra operations of the
expression are represented by internal tree nodes.
• When this query tree is executed, the node marked (1)
must begin execution before (2) because some
resulting tuples of operation (1) must be available
before we can begin executing operation (2). Similarly,
node (2) must begin executing and producing results
before node (3) can start execution, and so on.
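A query tree and its bottom-up execution can be sketched as a small set of node classes. The class names, the attribute names (`PLOCATION`, `DNUM`, `DNUMBER`, `MGRSSN`, `SSN`) and the sample rows are assumptions for illustration, since the schema figure is not available here. For simplicity each node computes its whole result before handing it to its parent.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Relation:
    """Leaf node: an input relation stored as a list of dict rows."""
    rows: list
    def execute(self):
        return self.rows

@dataclass
class Select:
    """Internal node: keep the rows of the child that satisfy the predicate."""
    predicate: Callable
    child: object
    def execute(self):
        return [r for r in self.child.execute() if self.predicate(r)]

@dataclass
class Join:
    """Internal node: combine left and right rows that satisfy the condition."""
    condition: Callable
    left: object
    right: object
    def execute(self):
        right_rows = self.right.execute()
        return [{**l, **r} for l in self.left.execute()
                for r in right_rows if self.condition(l, r)]

@dataclass
class Project:
    """Internal node: keep only the listed attributes (root of this tree)."""
    attrs: list
    child: object
    def execute(self):
        return [{a: r[a] for a in self.attrs} for r in self.child.execute()]

# Made-up sample data for the three leaf relations P, D and E.
P = [{"PNUMBER": 1, "PLOCATION": "Nairobi", "DNUM": 5},
     {"PNUMBER": 2, "PLOCATION": "Mombasa", "DNUM": 4}]
D = [{"DNUMBER": 5, "MGRSSN": "111"}, {"DNUMBER": 4, "MGRSSN": "222"}]
E = [{"SSN": "111", "LNAME": "Kamau"}, {"SSN": "222", "LNAME": "Otieno"}]

tree = Project(
    ["PNUMBER", "DNUM", "LNAME"],
    Join(lambda t, e: t["MGRSSN"] == e["SSN"],
         Join(lambda p, d: p["DNUM"] == d["DNUMBER"],
              Select(lambda p: p["PLOCATION"] == "Nairobi", Relation(P)),
              Relation(D)),
         Relation(E)),
)
result = tree.execute()
```

Calling `execute()` on the root recursively executes the operands first, so the SELECT on P runs before the first JOIN, which runs before the second JOIN and the final PROJECT.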
Initial (canonical) query tree for the
query
Query graph for the query
• Relations in the query are represented by relation
nodes, which are displayed as single circles.
Constant values, typically from the query
selection conditions, are represented by constant
nodes, which are displayed as double circles.
• Selection and join conditions are represented by
the graph edges. The attributes to be retrieved
from each relation are displayed in square
brackets above each relation.
• The query graph representation does not
indicate the order in which operations are to
be performed. There is only a single graph
corresponding to each query.
• Query trees are preferred because the query
optimizer needs to show the order of
operations for query execution, which is not
possible in query graphs.
Heuristic Optimization Of Query Trees

• In general, many different relational algebra
expressions (and hence many different query
trees) can be equivalent, i.e. they can
correspond to the same query.
• The query parser will typically generate a
standard initial query tree to correspond to an
SQL query, without doing any optimization. In
the example above, the canonical query tree is
this initial tree.
• The CARTESIAN PRODUCT of the relations
specified in the FROM clause is applied first;
then the selection and join conditions of the
WHERE clause are applied, followed by the
projection on the SELECT clause attributes.
• Such a canonical query tree represents a
relational algebra expression that is very
inefficient if executed directly, because of the
CARTESIAN PRODUCT (×) operations.
• For example, if the PROJECT, DEPARTMENT and
EMPLOYEE relations had record sizes of 100,
50 and 150 bytes and contained 100, 20 and 5000
tuples respectively, the result of the
CARTESIAN PRODUCT would contain 10
million tuples of record size 300 bytes each.
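The figures in this example follow from multiplying the tuple counts and adding the record sizes. A short calculation, using only the numbers given above (the variable names are illustrative), makes this explicit:

```python
# (number of tuples, record size in bytes) for each relation, as stated above.
relations = {
    "PROJECT": (100, 100),
    "DEPARTMENT": (20, 50),
    "EMPLOYEE": (5000, 150),
}

product_tuples = 1
product_record_bytes = 0
for tuples, record_bytes in relations.values():
    product_tuples *= tuples              # tuple counts multiply
    product_record_bytes += record_bytes  # record sizes add up

total_bytes = product_tuples * product_record_bytes
```

The product has 100 × 20 × 5000 = 10,000,000 tuples of 100 + 50 + 150 = 300 bytes each, which is 3,000,000,000 bytes (about 3 GB) of intermediate result.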
• It is now the job of the heuristic query optimizer
to transform this initial query tree into a final
query tree that is efficient to execute.
• The optimizer must include rules for equivalence
among relational algebra expressions that can be
applied to the initial tree. The heuristic query
optimization rules then use these equivalence
expressions to transform the initial tree into the
final optimised query tree.
• Example of transforming a tree:
• Find the last names of employees born after
1957 who work on a project named ‘sensors’.
• This query can be specified in SQL as:
[Figure: SQL query; not recovered]
Initial (canonical) query tree for SQL
query
Moving SELECT operations down the
query tree
• This is an improved query tree that first
applies the SELECT operation to reduce the
number of tuples that appear in the
CARTESIAN PRODUCT.
Applying the more restrictive SELECT
operation first
• A further improvement is achieved by
switching the positions of the EMPLOYEE and
PROJECT relations in the figure above. This uses
the information that PNUMBER is a key
attribute of the PROJECT relation, and hence the
SELECT operation on the PROJECT relation will
retrieve a single record only.
Replacing CARTESIAN PRODUCT and
SELECT with JOIN operations
• In the figure above, improvement is achieved
by replacing any CARTESIAN PRODUCT operation
that is followed by a join condition with a JOIN
operation.
Moving PROJECT operations down the
query tree.
• Here improvement is achieved by keeping
only the attributes needed by subsequent
operations in the intermediate relations, by
including PROJECT operations as early as
possible in the query tree. This reduces the
number of attributes in the intermediate
relations, whereas the SELECT operations
reduce the number of tuples.
General transformation Rules for
Relational Algebra Operations
• This example demonstrates that a query tree
can be transformed step by step into another
query tree that is more efficient to execute.
However, we must be sure that the
transformation steps always lead to an
equivalent query tree. To do this, the query
optimizer must know which transformation
rules preserve this equivalence.
[Figures: list of general transformation rules;
not recovered]
• Take Home CAT 2
1. Complete the list of rules up to rule number 12
and identify how the rules have been
applied in the example given earlier in
figures b to e.
2. Discuss the methods for implementing
SELECTION, JOIN, PROJECT, SET and AGGREGATE
operations. (Hint: for SELECTION, linear search
and binary search; for JOIN, nested-loop join and
sort-merge join; for SET operations, hashing; etc.)
• The main heuristic is to apply first the
operations that reduce the size of
intermediate results. This includes performing
as early as possible SELECT operations to
reduce the number of tuples and PROJECT
operations to reduce the number of
attributes.
• This is done by moving SELECT and PROJECT
operations as far down the tree as possible
• In addition, the SELECT and JOIN operations
that are most restrictive (that is, those that
result in relations with the fewest tuples or
with the smallest absolute size) should be
executed before other similar operations.
• This is done by reordering the leaf nodes of
the tree among themselves while avoiding
CARTESIAN PRODUCTS, and adjusting the rest
of the tree appropriately.
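The effect of these heuristics on intermediate result sizes can be seen in a small experiment based on the earlier example query (employees born after 1957 who work on the project named ‘sensors’). The WORKS_ON relation, the attribute names `SSN`, `BYEAR`, `ESSN`, `PNO` and `PNAME`, and all rows are assumptions made up for illustration.

```python
from itertools import product

# Made-up data: 10 employees born 1951..1960, 2 projects, 10 assignments.
EMPLOYEE = [{"SSN": i, "LNAME": f"E{i}", "BYEAR": 1950 + i} for i in range(1, 11)]
PROJECT = [{"PNUMBER": 1, "PNAME": "sensors"}, {"PNUMBER": 2, "PNAME": "pumps"}]
WORKS_ON = [{"ESSN": i, "PNO": 1 + i % 2} for i in range(1, 11)]

def canonical():
    """Cartesian product first, then every selection and join condition."""
    prod = [{**e, **w, **p} for e, w, p in product(EMPLOYEE, WORKS_ON, PROJECT)]
    rows = [t for t in prod
            if t["PNAME"] == "sensors" and t["PNUMBER"] == t["PNO"]
            and t["ESSN"] == t["SSN"] and t["BYEAR"] > 1957]
    return {t["LNAME"] for t in rows}, len(prod)

def optimized():
    """Selections pushed down; Cartesian products replaced by joins."""
    p = [t for t in PROJECT if t["PNAME"] == "sensors"]   # most restrictive SELECT
    pw = [{**a, **w} for a in p for w in WORKS_ON if a["PNUMBER"] == w["PNO"]]
    e = [t for t in EMPLOYEE if t["BYEAR"] > 1957]
    rows = [{**a, **b} for a in pw for b in e if a["ESSN"] == b["SSN"]]
    largest = max(len(p), len(pw), len(e), len(rows))
    return {t["LNAME"] for t in rows}, largest

names_canonical, largest_canonical = canonical()
names_optimized, largest_optimized = optimized()
```

Both strategies return the same names, but the canonical tree builds a 200-tuple Cartesian product while the largest intermediate relation in the reordered tree has 5 tuples.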
Converting Query Trees into
Execution Plans
An execution plan for a relational algebra
expression represented as a query tree
includes information about the access
methods available for each relation as well as
the algorithms to be used in computing the
relational operators represented in the tree.
[Figure: query tree; not recovered]
• Consider the query tree above: to convert this
into an execution plan, the optimizer might
choose an index search for the SELECT
operation (assuming one exists), a table scan
as access method for EMPLOYEE, a nested-
loop join algorithm for the join, and a scan of
the JOIN result for the PROJECT operator.
• In addition, the approach taken for executing the
query may specify a materialised or a pipelined
evaluation. With a materialised evaluation, the
result of an operation is stored as a temporary
relation (that is the result is physically
materialised).
• For instance the join operation can be computed
and the entire result stored as a temporary
relation, which is then read as input by the
algorithm that computes the PROJECT operation,
which would produce the query result table.
• On the other hand, with a pipelined
evaluation, as the resulting tuples of an
operation are produced, they are forwarded
directly to the next operation in the query
sequence.
• The advantage of pipelining is the cost saving
in not having to write the intermediate results
to disk and not having to read them back for
the next operation.
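The difference between the two evaluation approaches can be sketched with Python generators. The operator functions, the sample rows and the `log` list (used only to record the order in which tuples are touched) are assumptions for illustration; an in-memory list stands in for the temporary relation that a real system would write to disk.

```python
log = []

def scan(rows):
    """Read the input relation one tuple at a time."""
    for r in rows:
        log.append(f"scan {r['id']}")
        yield r

def select(rows, pred):
    """Pass on only the tuples that satisfy the predicate."""
    for r in rows:
        if pred(r):
            yield r

def project(rows, attrs):
    """Keep only the listed attributes of each tuple."""
    for r in rows:
        log.append(f"project {r['id']}")
        yield {a: r[a] for a in attrs}

ROWS = [{"id": 1, "x": 10}, {"id": 2, "x": 20}, {"id": 3, "x": 30}]

def materialised():
    log.clear()
    # The SELECT result is stored completely before PROJECT starts.
    temp = list(select(scan(ROWS), lambda r: r["x"] >= 20))
    out = list(project(temp, ["id"]))
    return out, list(log)

def pipelined():
    log.clear()
    # Each selected tuple is forwarded to PROJECT as soon as it is produced.
    out = list(project(select(scan(ROWS), lambda r: r["x"] >= 20), ["id"]))
    return out, list(log)

materialised_out, materialised_log = materialised()
pipelined_out, pipelined_log = pipelined()
```

Both versions produce the same result. In the materialised log all scans come before any projection, while in the pipelined log scans and projections are interleaved and no temporary relation is built.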
