Ad db Chapter 2
Database Systems
Estifanos T. (MSc in Computer Networking) Lecture 2: Query Processing and Optimization 7/1/2018
Introduction to Query Processing
• Query Optimization:
• The process of choosing a suitable execution strategy for
processing a query
• Two Internal Representations of a Query:
• Query Tree
• Query Graph
Translating SQL Queries into
Relational Algebra
• Query Block:
The basic unit that can be translated into the algebraic
operators and optimized
• A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clauses if
these are part of the block
• Nested Queries within a query are identified as separate
query blocks
• Aggregate operators in SQL must be included in the extended
algebra
Basic Algorithms for Executing
Query Operations
• Implementing the SELECT Operation Algorithm:
• Examples:
(OP1): σ_{SSN='123456789'}(EMPLOYEE)
(OP2): σ_{DNUMBER>5}(DEPARTMENT)
(OP3): σ_{DNO=5}(EMPLOYEE)
(OP4): σ_{DNO=5 AND SALARY>30000 AND SEX='F'}(EMPLOYEE)
(OP5): σ_{ESSN='123456789' AND PNO=10}(WORKS_ON)
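As a rough illustration of how selections like these might be evaluated, the sketch below runs OP1 with a binary search and OP4 with a linear scan. The in-memory EMPLOYEE list, its sample rows, and the helper names linear_search and binary_search_eq are assumptions made for this example; a real DBMS works on disk blocks and indexes rather than Python lists.

```python
# Hypothetical sample rows standing in for the EMPLOYEE file, kept sorted on SSN.
EMPLOYEE = [
    {"SSN": "123456789", "DNO": 5, "SALARY": 30000, "SEX": "M"},
    {"SSN": "333445555", "DNO": 5, "SALARY": 40000, "SEX": "M"},
    {"SSN": "453453453", "DNO": 5, "SALARY": 35000, "SEX": "F"},
    {"SSN": "987654321", "DNO": 4, "SALARY": 43000, "SEX": "F"},
]


def linear_search(records, predicate):
    """Brute-force selection: test every record against the condition."""
    return [r for r in records if predicate(r)]


def binary_search_eq(records, attr, value):
    """Equality selection on a file ordered by attr; returns one record or None."""
    lo, hi = 0, len(records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if records[mid][attr] == value:
            return records[mid]
        if records[mid][attr] < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# OP1: equality condition on the ordering key SSN.
op1 = binary_search_eq(EMPLOYEE, "SSN", "123456789")

# OP4: conjunctive condition evaluated by a full scan.
op4 = linear_search(
    EMPLOYEE,
    lambda r: r["DNO"] == 5 and r["SALARY"] > 30000 and r["SEX"] == "F",
)
```

The binary search only applies because the sample file is ordered on SSN; OP4 has no such ordering to exploit here, so it falls back to scanning every record.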
• Implementing the JOIN Operation Algorithm:
• Join (EQUIJOIN, NATURAL JOIN)
• Two-Way Join: a join on two files
• e.g. R ⋈_{A=B} S
• Multi-Way Joins: joins involving more than two files
• e.g. R ⋈_{A=B} S ⋈_{C=D} T
• Examples:
• (OP6): EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT
• (OP7): DEPARTMENT ⋈_{MGRSSN=SSN} EMPLOYEE
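One simple way to evaluate a two-way equijoin is a nested-loop join, sketched below for OP6 and OP7. The sample rows and the function name nested_loop_join are assumptions for illustration only; other algorithms (for example sort-merge or hash joins) may be preferable in practice.

```python
# Hypothetical sample rows; attribute names follow the examples above.
EMPLOYEE = [
    {"SSN": "123456789", "LNAME": "Smith", "DNO": 5},
    {"SSN": "333445555", "LNAME": "Wong", "DNO": 5},
    {"SSN": "987654321", "LNAME": "Wallace", "DNO": 4},
]
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "MGRSSN": "333445555"},
    {"DNUMBER": 4, "DNAME": "Administration", "MGRSSN": "987654321"},
]


def nested_loop_join(outer, inner, outer_attr, inner_attr):
    """For each outer record, scan the inner file and keep matching pairs."""
    result = []
    for o in outer:
        for i in inner:
            if o[outer_attr] == i[inner_attr]:
                result.append({**o, **i})  # concatenate the two records
    return result


# OP6: every employee paired with the department they work for.
op6 = nested_loop_join(EMPLOYEE, DEPARTMENT, "DNO", "DNUMBER")

# OP7: every department paired with its manager.
op7 = nested_loop_join(DEPARTMENT, EMPLOYEE, "MGRSSN", "SSN")
```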
• Factors Affecting JOIN Performance:
• Available Buffer Space
• Join Selection Factor
• Choice of Inner vs Outer Relation
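To see why buffer space and the choice of outer relation matter, the sketch below applies a commonly used textbook estimate for a block-based nested-loop join: with n_buffers buffers, n_buffers - 2 hold a chunk of the outer file, one reads the inner file, and one holds the result. The formula, the function name, and the file sizes (2000 and 10 blocks, 7 buffers) are assumptions chosen for illustration, and the cost of writing the result is ignored.

```python
import math


def nested_loop_block_accesses(b_outer, b_inner, n_buffers):
    """Estimated block reads: the outer file once, the inner file once per outer chunk."""
    chunks = math.ceil(b_outer / (n_buffers - 2))
    return b_outer + chunks * b_inner


# Hypothetical sizes: EMPLOYEE occupies 2000 blocks, DEPARTMENT 10 blocks.
b_employee, b_department, n_buffers = 2000, 10, 7

employee_outer = nested_loop_block_accesses(b_employee, b_department, n_buffers)
department_outer = nested_loop_block_accesses(b_department, b_employee, n_buffers)
```

Under these assumptions, using the smaller file as the outer relation needs 4010 block reads instead of 6000.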
• Algorithm for PROJECT Operations:
π_{<attribute list>}(R)
• Combining Operations Using Pipelining (the alternative to materializing each intermediate result):
• Avoid constructing temporary results as much as possible
• Pipeline the data through multiple operations - pass the
result of a previous operator to the next without waiting to
complete the previous operation
• Also known as stream-based processing
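Python generators give a compact way to picture pipelined (stream-based) processing: each operator pulls one row at a time from the operator below it, so no temporary result is built between steps. The operator names scan, select, and project and the sample rows are assumptions for this sketch, and the projection here does not eliminate duplicates.

```python
def scan(records):
    """Leaf operator: produce the stored records one at a time."""
    for record in records:
        yield record


def select(rows, predicate):
    """Pass along only the rows that satisfy the condition."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attrs):
    """Keep only the listed attributes of each row (no duplicate elimination)."""
    for row in rows:
        yield {a: row[a] for a in attrs}


# Hypothetical sample rows.
EMPLOYEE = [
    {"SSN": "123456789", "LNAME": "Smith", "DNO": 5},
    {"SSN": "333445555", "LNAME": "Wong", "DNO": 5},
    {"SSN": "987654321", "LNAME": "Wallace", "DNO": 4},
]

# Nothing is computed until the pipeline is consumed; rows then flow through
# scan -> select -> project one at a time.
pipeline = project(select(scan(EMPLOYEE), lambda r: r["DNO"] == 5), ["LNAME"])
result = list(pipeline)
```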
Using Heuristics in Query Optimization
• Process for Heuristics Optimization
1. The parser of a high-level query generates an initial
internal representation
2. Apply heuristic rules to optimize the internal
representation
3. A query execution plan is generated to execute groups
of operations based on the access paths available on
the files involved in the query
• The main heuristic is to apply first the operations that reduce
the size of intermediate results
• E.g., Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations
• Query Tree:
• Query Graph:
• Example:
For every project located in ‘Stafford’, retrieve the project
number, the controlling department number and the department
manager’s last name, address and birthdate
• Relational Algebra:
π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}
(((σ_{PLOCATION='STAFFORD'}(PROJECT))
⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN}
(EMPLOYEE))
• Heuristic Optimization of Query Trees:
• The same query could correspond to many different
relational algebra expressions - and hence many
different query trees
• The task of heuristic optimization of query trees is to find a
final query tree that is efficient to execute
• Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND
PNUMBER = PNO AND ESSN = SSN
AND BDATE > '1957-12-31';
• Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations that
reduce the size of intermediate results
2. Perform select operations as early as possible to reduce
the number of tuples and perform project operations as
early as possible to reduce the number of attributes. (This is
done by moving select and project operations as far
down the tree as possible.)
3. The select and join operations that are most restrictive
should be executed before other similar operations. (This is
done by reordering the leaf nodes of the tree among
themselves and adjusting the rest of the tree
appropriately.)
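The effect of moving a select operation down the tree can be seen by counting intermediate tuples for two equivalent plans. The sketch below uses small, made-up PROJECT and WORKS_ON relations and hypothetical helper functions; it only illustrates the heuristic and is not how an optimizer is implemented.

```python
# Hypothetical sample relations.
PROJECT = [
    {"PNUMBER": 1, "PNAME": "AQUARIUS"},
    {"PNUMBER": 2, "PNAME": "PHOENIX"},
    {"PNUMBER": 3, "PNAME": "ORION"},
]
WORKS_ON = [
    {"ESSN": "111", "PNO": 1}, {"ESSN": "222", "PNO": 1},
    {"ESSN": "111", "PNO": 2}, {"ESSN": "333", "PNO": 2},
    {"ESSN": "333", "PNO": 3}, {"ESSN": "444", "PNO": 3},
]


def join(left, right, left_attr, right_attr):
    """Equijoin by comparing every pair of records."""
    return [{**l, **r} for l in left for r in right if l[left_attr] == r[right_attr]]


def select(rows, predicate):
    """Keep the rows that satisfy the condition."""
    return [row for row in rows if predicate(row)]


def is_aquarius(row):
    return row["PNAME"] == "AQUARIUS"


# Plan A: join first, select afterwards.
joined = join(PROJECT, WORKS_ON, "PNUMBER", "PNO")
plan_a = select(joined, is_aquarius)

# Plan B: push the select below the join.
selected = select(PROJECT, is_aquarius)
plan_b = join(selected, WORKS_ON, "PNUMBER", "PNO")

# Size of the intermediate result each plan feeds to its second operator.
intermediate_a, intermediate_b = len(joined), len(selected)
```

Both plans return the same two tuples, but plan A carries 6 intermediate tuples into its second step while plan B carries only 1.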
Using Selectivity & Cost Estimates in
Query Optimization
• Cost-based Query Optimization:
• Estimate and compare the costs of executing a query
using different execution strategies and choose the
strategy with the lowest cost estimate
• (Compare to heuristic query optimization)
• Issues:
• Cost Function
• Number of execution strategies to be considered
• Cost Components for Query Execution:
1. Access Cost to Secondary Storage
2. Storage Cost
3. Computation Cost
4. Memory Usage Cost
5. Communication Cost
• Catalog Information Used in Cost Functions:
• Information about the size of a file:
• Number of records (tuples) (r),
• Record size (R),
• Number of blocks (b)
• Blocking factor (bfr)
• Information about indexes and indexing attributes of a file:
• Number of levels (x) of each multilevel index
• Number of first-level index blocks (bI1)
• Number of distinct values (d) of an attribute
• Selectivity (sl) of an attribute
• Selection cardinality (s) of an attribute. (s = sl * r)
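The catalog values are related to one another, as the small calculation below suggests. The numbers are hypothetical, b is derived assuming unspanned records, and sl = 1/d is an assumption that holds only for an equality condition on an attribute whose values are uniformly distributed.

```python
import math

r = 10000  # number of records (hypothetical)
bfr = 5    # blocking factor: records per block (hypothetical)
d = 125    # distinct values of the attribute, e.g. DNO (hypothetical)

# Number of blocks needed to hold r records at bfr records per block.
b = math.ceil(r / bfr)

# Selectivity of an equality condition, assuming uniformly distributed values.
sl = 1 / d

# Selection cardinality: expected number of records satisfying the condition.
s = sl * r
```

With these figures the file occupies 2000 blocks and an equality selection on the attribute is expected to return about 80 records.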
• Examples of Cost Functions for SELECT:
• S1. Linear Search (Brute Force) Approach
• CS1a = b
• For an equality condition on a key, CS1b = (b/2) on average if the
record is found; otherwise CS1b = b
• S2. Binary Search:
CS2 = log2(b) + ⌈s/bfr⌉ - 1
For an equality condition on a unique (key) attribute, CS2
= log2(b)
• S3. Using a Primary Index (S3a) or Hash Key (S3b) to
retrieve a single record
• CS3a = x + 1; CS3b = 1 for static or linear hashing;
• CS3b = 2 for extendible hashing
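The cost functions can be turned into a small comparison of strategies, in the spirit of cost-based optimization. In the sketch below the function names, the catalog values (b = 2000, bfr = 5, x = 4), and the rounding of log2(b) up to a whole block are assumptions; the extendible-hashing cost is taken as 2 block accesses, the usual textbook figure.

```python
import math


def cost_linear(b):
    """S1a: scan every block of the file."""
    return b


def cost_linear_key_found(b):
    """S1b: equality on a key, record found; half the blocks on average."""
    return b / 2


def cost_binary(b, s, bfr):
    """S2: locate the first match, then read the blocks holding the s matches."""
    return math.ceil(math.log2(b)) + math.ceil(s / bfr) - 1


def cost_primary_index(x):
    """S3a: one access per index level plus one for the data block."""
    return x + 1


def cost_hash(extendible=False):
    """S3b: static or linear hashing, or extendible hashing."""
    return 2 if extendible else 1


# Hypothetical catalog values for a key lookup such as SSN = '123456789'.
b, bfr, x = 2000, 5, 4
strategies = {
    "linear search": cost_linear_key_found(b),
    "binary search": cost_binary(b, s=1, bfr=bfr),
    "primary index": cost_primary_index(x),
}

# Pick the strategy with the lowest estimated cost.
cheapest = min(strategies, key=strategies.get)
```

For these made-up values the estimates are 1000, 11, and 5 block accesses, so the primary index would be chosen if one exists.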
Overview of Query Optimization in Oracle
• Oracle DBMS V8
• Rule-based query optimization: the optimizer chooses
execution plans based on heuristically ranked operations
• (Currently it is being phased out)
• Cost-based query optimization: the optimizer
examines alternative access paths and operator algorithms
and chooses the execution plan with the lowest estimated cost
• The query cost is calculated based on the estimated
usage of resources such as I/O, CPU and memory
needed
• Application developers could specify hints to the
ORACLE query optimizer
• The idea is that an application developer might know
more information about the data
Semantic Query Optimization
• Uses constraints specified on the database schema in order to
modify one query into another query that is more efficient to
execute
• Consider the following SQL query,
SELECT E.LNAME, M.LNAME
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.SUPERSSN = M.SSN AND E.SALARY > M.SALARY
• Explanation:
Suppose that we had a constraint on the database schema that stated
that no employee can earn more than his or her direct supervisor.
If the semantic query optimizer checks for the existence of this
constraint, it need not execute the query at all because it knows that
the result of the query will be empty. Techniques known as
theorem proving can be used for this purpose
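A toy version of this check is sketched below. It is far simpler than real theorem proving: conditions are plain (left, operator, right) triples, and the rule format, the CONTRADICTS table, and the function name is_provably_empty are all assumptions invented for the example.

```python
# Schema constraint (hypothetical encoding): whenever E.SUPERSSN = M.SSN holds,
# E.SALARY <= M.SALARY must hold as well.
SUPERVISOR_SALARY_RULE = {
    "when": ("E.SUPERSSN", "=", "M.SSN"),
    "then": ("E.SALARY", "<=", "M.SALARY"),
}

# Operator pairs that can never both be true for the same two operands.
CONTRADICTS = {("<=", ">"), (">=", "<"), ("=", "<"), ("=", ">"), ("=", "<>")}


def is_provably_empty(where_conditions, rule):
    """Return True if the WHERE conditions contradict the constraint."""
    if rule["when"] not in where_conditions:
        return False  # the constraint does not apply to this query
    left, op, right = rule["then"]
    return any(
        (q_left, q_right) == (left, right) and (op, q_op) in CONTRADICTS
        for q_left, q_op, q_right in where_conditions
    )


# WHERE clause of the query above, written as condition triples.
query_where = [("E.SUPERSSN", "=", "M.SSN"), ("E.SALARY", ">", "M.SALARY")]
skip_execution = is_provably_empty(query_where, SUPERVISOR_SALARY_RULE)
```

When skip_execution is True, the optimizer in this sketch could return an empty result without touching the EMPLOYEE file.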
Summary
• Introduction to Query Processing
• Translating SQL Queries into Relational Algebra
• Algorithms for External Sorting
• Algorithms for SELECT and JOIN Operations
• Algorithms for PROJECT and SET Operations
• Implementing Aggregate Operations and Outer Joins
• Combining Operations using Pipelining
• Using Heuristics in Query Optimization
• Using Selectivity and Cost Estimates in Query Optimization
• Overview of Query Optimization in Oracle
• Semantic Query Optimization
End Of
Chapter Two
???