Ad db Chapter 2

The document discusses query processing and optimization in advanced database systems, covering topics such as translating SQL queries into relational algebra, basic algorithms for executing query operations, and the use of heuristics and cost estimates in query optimization. It outlines various query operations, including SELECT, JOIN, and aggregate functions, and emphasizes the importance of efficient execution strategies. Additionally, it highlights the role of cost-based optimization and catalog information in determining the most efficient query execution plan.
Advanced Database Systems

Estifanos T. (MSc in Computer Networking)


Chapter Two
Query Processing and Optimization
OUTLINE
• Introduction to Query Processing

• Translating SQL Queries into Relational Algebra

• Basic Algorithms for Executing Query Operations

• Using Heuristics in Query Optimization

• Using Selectivity and Cost Estimates in Query Optimization

• Semantic Query Optimization

Estifanos T. (MSc in Computer Networking) Lecture 2: Query Processing and Optimization 7/1/2018
Introduction to Query Processing
• Query Optimization:
• The process of choosing a suitable execution strategy for
processing a query
• Two Internal Representations of a Query:
• Query Tree
• Query Graph

Translating SQL Queries into Relational Algebra
• Query Block:
The basic unit that can be translated into the algebraic
operators and optimized
• A query block contains a single SELECT-FROM-WHERE
expression, as well as GROUP BY and HAVING clause if
these are part of the block
• Nested Queries within a query are identified as separate
query blocks
• Aggregate operators in SQL must be included in the extended
algebra
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > (SELECT MAX (SALARY)
                FROM EMPLOYEE
                WHERE DNO = 5);

Outer query block (C stands for the result returned by the inner block):
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C
π_{LNAME, FNAME}(σ_{SALARY > C}(EMPLOYEE))

Inner query block:
SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5
ℱ_{MAX SALARY}(σ_{DNO = 5}(EMPLOYEE))
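The split into two query blocks can be checked with a minimal sketch that uses Python's built-in sqlite3 module. The sample rows are invented for illustration. The inner block is evaluated once, its result plays the role of the constant C, and the outer block is then run with C substituted in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (LNAME TEXT, FNAME TEXT, SALARY INTEGER, DNO INTEGER)"
)
# Invented sample rows, used only to exercise the queries.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("Smith", "John", 30000, 5),
        ("Wong", "Franklin", 40000, 5),
        ("Wallace", "Jennifer", 43000, 4),
        ("Borg", "James", 55000, 1),
    ],
)

# Nested form: one statement that contains two query blocks.
nested = conn.execute(
    "SELECT LNAME, FNAME FROM EMPLOYEE "
    "WHERE SALARY > (SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5) "
    "ORDER BY LNAME"
).fetchall()

# Inner block evaluated on its own; its single value is the constant C.
(c,) = conn.execute("SELECT MAX(SALARY) FROM EMPLOYEE WHERE DNO = 5").fetchone()

# Outer block evaluated with C substituted for the subquery.
outer = conn.execute(
    "SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > ? ORDER BY LNAME", (c,)
).fetchall()

print(c, nested, outer)
```

Both forms return the same rows here, which is what allows each block to be translated and optimized separately.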


Basic Algorithms for Executing Query Operations
• External Sorting:
Refers to sorting algorithms that are suitable for large files
of records stored on disk that do not fit entirely in main
memory, such as most database files
• Sort-Merge Strategy:
Starts by sorting small subfiles (runs) of the main file and
then merges the sorted runs, creating larger sorted subfiles
that are merged in turn
Sorting Phase: nR = ⌈b/nB⌉
Merging Phase: dM = min(nB − 1, nR); nP = ⌈log_dM(nR)⌉
nR: number of initial runs; b: number of file blocks;
nB: available buffer space (in blocks); dM: degree of merging;
nP: number of merge passes.
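The two phases can be sketched as follows. This is a minimal in-memory simulation, not a disk-based implementation: the function names are hypothetical, and `external_sort` treats one record as one block so that the buffer size doubles as the run size. The run, merge-degree, and pass counts are rounded up to whole numbers.

```python
import heapq


def sort_merge_passes(b, n_b):
    """Return (nR, dM, nP) for a file of b blocks and n_b buffer blocks."""
    if n_b < 3:
        raise ValueError("need at least 3 buffer blocks to merge 2 runs")
    n_r = -(-b // n_b)           # number of initial runs, rounded up
    d_m = min(n_b - 1, n_r)      # degree of merging
    n_p, runs = 0, n_r
    while runs > 1:              # each pass merges groups of d_m runs
        runs = -(-runs // d_m)
        n_p += 1
    return n_r, d_m, n_p


def external_sort(records, n_b):
    """Simulate sort-merge; each 'run' holds at most n_b records."""
    if n_b < 3:
        raise ValueError("need at least 3 buffer blocks to merge 2 runs")
    # Sorting phase: sort pieces that fit in the buffer.
    runs = [sorted(records[i:i + n_b]) for i in range(0, len(records), n_b)]
    d_m = n_b - 1
    # Merging phase: merge d_m runs at a time until one run remains.
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + d_m]))
            for i in range(0, len(runs), d_m)
        ]
    return runs[0] if runs else []


print(sort_merge_passes(1024, 5))
print(external_sort([5, 3, 8, 1, 9, 2, 7], 3))
```

With b = 1024 blocks and nB = 5 buffers, the sorting phase produces 205 runs, which are merged four at a time in four passes.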
• Implementing the SELECT Operation Algorithm:
• Examples:
(OP1): σ_{SSN='123456789'}(EMPLOYEE)
(OP2): σ_{DNUMBER>5}(DEPARTMENT)
(OP3): σ_{DNO=5}(EMPLOYEE)
(OP4): σ_{DNO=5 AND SALARY>30000 AND SEX='F'}(EMPLOYEE)
(OP5): σ_{ESSN='123456789' AND PNO=10}(WORKS_ON)

• Implementing the JOIN Operation Algorithm:
• Join (EQUIJOIN, NATURAL JOIN)
• Two-Way Join: a join on two files
• e.g. R ⋈_{A=B} S
• Multi-Way Joins: joins involving more than two files
• e.g. R ⋈_{A=B} S ⋈_{C=D} T
• Examples:
• (OP6): EMPLOYEE ⋈_{DNO=DNUMBER} DEPARTMENT
• (OP7): DEPARTMENT ⋈_{MGRSSN=SSN} EMPLOYEE
• Factors Affecting JOIN Performance:
• Available Buffer Space
• Join Selection Factor
• Choice of Inner vs Outer Relation
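To see why buffer space and the choice of outer relation matter, the sketch below uses a common textbook estimate for a block nested-loop join. The formula is not stated on the slides and is an assumption here: nB − 2 buffers hold blocks of the outer file, one buffer reads the inner file, one buffer collects the result, and the cost of writing the result is ignored. The block counts are illustrative.

```python
def nested_loop_block_accesses(b_outer, b_inner, n_b):
    """Estimated block reads: b_outer + ceil(b_outer / (n_b - 2)) * b_inner."""
    if n_b < 3:
        raise ValueError("need at least 3 buffer blocks")
    outer_chunks = -(-b_outer // (n_b - 2))  # times the inner file is scanned
    return b_outer + outer_chunks * b_inner


# Illustrative sizes: EMPLOYEE has 2000 blocks, DEPARTMENT has 10, 7 buffers.
b_employee, b_department, n_b = 2000, 10, 7

cost_employee_outer = nested_loop_block_accesses(b_employee, b_department, n_b)
cost_department_outer = nested_loop_block_accesses(b_department, b_employee, n_b)
print(cost_employee_outer, cost_department_outer)
```

Under these assumptions, using the smaller file in the outer loop costs 4010 block reads instead of 6000.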

• Algorithm for PROJECT Operations:
π_{<attribute list>}(R)
1. If <attribute list> includes a key of relation R, extract all tuples
from R with only the values for the attributes in <attribute
list>
2. If <attribute list> does NOT include a key of relation R,
duplicate tuples must be removed from the result
• Methods to Remove Duplicate Tuples
1. Sorting
2. Hashing
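Both cases and both duplicate-removal methods can be sketched as follows. This is a minimal in-memory illustration; the function names and sample records are assumptions.

```python
def project(records, attributes, includes_key):
    """PROJECT with hash-based duplicate removal when no key is kept."""
    rows = [tuple(r[a] for a in attributes) for r in records]
    if includes_key:
        return rows              # a key guarantees the rows are distinct
    seen, result = set(), []
    for row in rows:             # hashing: keep the first copy of each row
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


def project_by_sorting(records, attributes):
    """PROJECT with sort-based duplicate removal."""
    rows = sorted(tuple(r[a] for a in attributes) for r in records)
    # After sorting, duplicates are adjacent, so compare with the previous row.
    return [row for i, row in enumerate(rows) if i == 0 or row != rows[i - 1]]


# Invented sample records; SSN is the key.
employee = [
    {"SSN": "1", "DNO": 5, "SEX": "M"},
    {"SSN": "2", "DNO": 5, "SEX": "M"},
    {"SSN": "3", "DNO": 4, "SEX": "F"},
]

with_key = project(employee, ["SSN", "DNO"], includes_key=True)
hashed = project(employee, ["DNO", "SEX"], includes_key=False)
sorted_rows = project_by_sorting(employee, ["DNO", "SEX"])
print(with_key, hashed, sorted_rows)
```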
• Algorithm for SET Operations:
• Set Operators:
• UNION, INTERSECTION, SET DIFFERENCE and
CARTESIAN PRODUCT
• The CARTESIAN PRODUCT of relations R and S includes all
possible combinations of records from R and S. The attributes
of the result include all attributes of R and S
• Cost Analysis of CARTESIAN PRODUCT
• If R has n records and j attributes and S has m records and k
attributes, the result relation will have n*m records and j+k
attributes
• The CARTESIAN PRODUCT operation is very expensive and
should be avoided if possible
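The size claim is easy to confirm with a small sketch; the two relations below are invented tuples.

```python
from itertools import product


def cartesian_product(r, s):
    """Concatenate every record of r with every record of s."""
    return [r_row + s_row for r_row, s_row in product(r, s)]


r = [(1, "a"), (2, "b"), (3, "c")]         # n = 3 records, j = 2 attributes
s = [("x", 10, True), ("y", 20, False)]    # m = 2 records, k = 3 attributes

result = cartesian_product(r, s)
print(len(result), len(result[0]))
```

The result has 3 * 2 = 6 records with 2 + 3 = 5 attributes each, so the output grows multiplicatively with the input sizes.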
• Implementing Aggregate Operations:
• Aggregate Operators:
MIN, MAX, SUM, COUNT and AVG
• Options to Implement Aggregate Operators:
Table Scan
Index
• Example:
SELECT MAX (SALARY)
FROM EMPLOYEE;
• If an (ascending) index on SALARY exists for the EMPLOYEE
relation, then the optimizer could decide on traversing the
index for the largest value, which would entail following the
rightmost pointer in each index node from the root to a leaf
• Implementing Outer Join:
• Outer Join Operators:
LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN
• The full outer join produces a result which is equivalent to the union
of the results of the left and right outer joins
• Example:
SELECT FNAME, DNAME
FROM (EMPLOYEE LEFT OUTER JOIN
DEPARTMENT ON DNO = DNUMBER);
• Note: The result of this query is a table of employee names and
their associated departments. It is similar to a regular join
result, with the exception that if an employee does not have an
associated department, the employee's name will still appear in
the resulting table, although the department name would be
indicated as NULL
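The behaviour described in the note can be reproduced with Python's built-in sqlite3 module. The sample rows are invented. Because older SQLite versions lack RIGHT and FULL OUTER JOIN, this sketch writes the right outer join as a left outer join with the operands swapped and builds the full outer join as the union of the two results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (FNAME TEXT, DNO INTEGER);
    CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER);
    -- Invented rows: Alicia has no department, Headquarters has no employee.
    INSERT INTO EMPLOYEE VALUES ('John', 5), ('Alicia', NULL);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Headquarters', 1);
""")

# The query from the slide: every employee appears, matched or not.
left_rows = set(conn.execute(
    "SELECT FNAME, DNAME FROM EMPLOYEE "
    "LEFT OUTER JOIN DEPARTMENT ON DNO = DNUMBER"
).fetchall())

# Right outer join expressed by swapping the operands of a left outer join.
right_rows = set(conn.execute(
    "SELECT FNAME, DNAME FROM DEPARTMENT "
    "LEFT OUTER JOIN EMPLOYEE ON DNO = DNUMBER"
).fetchall())

# Full outer join as the union of the left and right outer join results.
full_rows = left_rows | right_rows
print(left_rows, right_rows, full_rows)
```

Python's `None` stands for SQL NULL in the fetched rows.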
Combining Operations Using Pipelining
• Motivation:
• A query is mapped into a sequence of operations
• Each execution of an operation produces a temporary
result
• Generating and saving temporary files on disk is
time-consuming and expensive

• Alternative:
• Avoid constructing temporary results as much as possible
• Pipeline the data through multiple operations - pass the
result of a previous operator to the next without waiting to
complete the previous operation
• Also known as stream-based processing
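Python generators give a compact way to sketch pipelining: each operator pulls one record at a time from the operator below it, so no temporary result is built. The operator names and sample records are assumptions, and the `scanned` list exists only to show how much of the input has been read.

```python
scanned = []  # records pulled from the base table so far


def scan(table):
    """Leaf operator: read the table one record at a time."""
    for record in table:
        scanned.append(record)
        yield record


def select(records, predicate):
    """Pass on only the records that satisfy the predicate."""
    for record in records:
        if predicate(record):
            yield record


def project(records, attributes):
    """Keep only the requested attributes of each record."""
    for record in records:
        yield tuple(record[a] for a in attributes)


# Invented sample records.
employee = [
    {"LNAME": "Smith", "DNO": 5},
    {"LNAME": "Wallace", "DNO": 4},
    {"LNAME": "Wong", "DNO": 5},
]

pipeline = project(select(scan(employee), lambda r: r["DNO"] == 5), ["LNAME"])

first = next(pipeline)                   # first result is available at once
scanned_before_first_result = len(scanned)
rest = list(pipeline)                    # drain the remaining results
print(first, scanned_before_first_result, rest)
```

The first output tuple is produced after reading a single input record, which is the point of passing results along without waiting for the previous operation to finish.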
Using Heuristics in Query Optimization
• Process for Heuristics Optimization
1. The parser of a high-level query generates an initial
internal representation
2. Apply heuristic rules to optimize the internal
representation
3. A query execution plan is generated to execute groups
of operations based on the access paths available on
the files involved in the query
• The main heuristic is to apply first the operations that reduce
the size of intermediate results
• E.g., Apply SELECT and PROJECT operations before
applying the JOIN or other binary operations
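The effect of the main heuristic can be sketched by counting intermediate tuples under two plans that return the same answer. The relations and the `join` helper are invented for this illustration.

```python
def join(left, right, left_key, right_key):
    """Simple equijoin over lists of dictionaries."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[left_key] == r[right_key]
    ]


# Invented sample records.
project_rel = [
    {"PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5},
    {"PNUMBER": 2, "PLOCATION": "Sugarland", "DNUM": 5},
    {"PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNUMBER": 20, "PLOCATION": "Houston", "DNUM": 1},
    {"PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
]
department = [
    {"DNUMBER": 5, "DNAME": "Research"},
    {"DNUMBER": 4, "DNAME": "Administration"},
    {"DNUMBER": 1, "DNAME": "Headquarters"},
]

# Plan A: JOIN first, then SELECT.
joined = join(project_rel, department, "DNUM", "DNUMBER")
plan_a = [t for t in joined if t["PLOCATION"] == "Stafford"]

# Plan B: SELECT first, then JOIN the reduced input.
stafford = [p for p in project_rel if p["PLOCATION"] == "Stafford"]
plan_b = join(stafford, department, "DNUM", "DNUMBER")

print(len(joined), len(stafford), plan_a == plan_b)
```

Plan A builds a five-tuple intermediate result, while Plan B feeds only two tuples into the join.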
• Query Tree:
• A tree data structure that corresponds to a relational
algebra expression. It represents the input relations of
the query as leaf nodes of the tree, and represents the
relational algebra operations as internal nodes
• An execution of the query tree consists of executing an
internal node operation whenever its operands are available and
then replacing that internal node by the relation that
results from executing the operation

• Query Graph:
• A graph data structure that corresponds to a relational
calculus expression. It does not indicate an order on
which operations to perform first. There is only a single
graph corresponding to each query
• Example:
For every project located in 'Stafford', retrieve the project
number, the controlling department number and the department
manager's last name, address and birthdate

• Relational Algebra:
π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}
(((σ_{PLOCATION='STAFFORD'}(PROJECT))
⋈_{DNUM=DNUMBER} (DEPARTMENT)) ⋈_{MGRSSN=SSN}
(EMPLOYEE))

• SQL Query (Q2):
SELECT P.PNUMBER, P.DNUM, E.LNAME,
       E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D,
     EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND
      D.MGRSSN=E.SSN AND
      P.PLOCATION='STAFFORD';
Using Heuristics in Query Optimization
• Heuristic Optimization of Query Trees:
• The same query could correspond to many different
relational algebra expressions - and hence many
different query trees
• The task of heuristic optimization of query trees is to find a
final query tree that is efficient to execute

• Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND
PNUMBER=PNO AND ESSN=SSN
AND BDATE > '1957-12-31';
• Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations that
reduce the size of intermediate results
2. Perform select operations as early as possible to reduce
the number of tuples and perform project operations as
early as possible to reduce the number of attributes. (This is
done by moving select and project operations as far
down the tree as possible.)
3. The select and join operations that are most restrictive
should be executed before other similar operations. (This is
done by reordering the leaf nodes of the tree among
themselves and adjusting the rest of the tree
appropriately.)
Using Selectivity & Cost Estimates in Query Optimization
• Cost-based Query Optimization:
• Estimate and compare the costs of executing a query
using different execution strategies and choose the
strategy with the lowest cost estimate
• (Compare to heuristic query optimization)

• Issues:
• Cost Function
• Number of execution strategies to be considered

• Cost Components for Query Execution:
1. Access Cost to Secondary Storage
2. Storage Cost
3. Computation Cost
4. Memory Usage Cost
5. Communication Cost

• Note: Different database systems may focus on different cost
components

• Catalog Information Used in Cost Functions:
• Information about the size of a file:
• Number of records (tuples) (r),
• Record size (R),
• Number of blocks (b)
• Blocking factor (bfr)
• Information about indexes and indexing attributes of a file:
• Number of levels (x) of each multilevel index
• Number of first-level index blocks (bI1)
• Number of distinct values (d) of an attribute
• Selectivity (sl) of an attribute
• Selection cardinality (s) of an attribute. (s = sl * r)
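The catalog quantities are related by simple arithmetic, sketched below with illustrative numbers. Two assumptions go beyond the slides: records are unspanned, so bfr is the block size divided by the record size and rounded down, and values are uniformly distributed, so the selectivity of an equality condition on an attribute with d distinct values is 1/d. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def file_statistics(r, record_size, block_size):
    """Return (bfr, b) for r unspanned records of record_size bytes."""
    bfr = block_size // record_size   # records per block, rounded down
    b = -(-r // bfr)                  # blocks needed, rounded up
    return bfr, b


def equality_selectivity(d):
    """Selectivity sl of 'attribute = value', assuming a uniform distribution."""
    return Fraction(1, d)


def selection_cardinality(sl, r):
    """s = sl * r, the expected number of records that satisfy the condition."""
    return sl * r


# Illustrative numbers: 10,000 records of 100 bytes in 512-byte blocks,
# and 125 distinct values of the selection attribute.
r = 10_000
bfr, b = file_statistics(r, record_size=100, block_size=512)
sl = equality_selectivity(125)
s = selection_cardinality(sl, r)
print(bfr, b, sl, s)
```

With these numbers the file occupies 2000 blocks at 5 records per block, and an equality condition is expected to select 80 records.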
• Examples of Cost Functions for SELECT:
• S1. Linear Search (Brute Force) Approach
• C_S1a = b
• For an equality condition on a key, C_S1b = (b/2) on average if the
record is found; otherwise C_S1b = b
• S2. Binary Search:
C_S2 = log2(b) + ⌈s/bfr⌉ − 1
For an equality condition on a unique (key) attribute,
C_S2 = log2(b)
• S3. Using a Primary Index (S3a) or Hash Key (S3b) to
retrieve a single record
• C_S3a = x + 1; C_S3b = 1 for static or linear hashing;
• C_S3b = 2 for extendible hashing
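These cost functions can be written out directly so that alternative access paths can be compared in block accesses. This is a minimal sketch: the function names are hypothetical, s/bfr is rounded up to whole blocks, and the hash case covers only static or linear hashing.

```python
import math


def cost_linear(b, key_equality=False, found=True):
    """S1: scan all b blocks, or half on average for a key that is found."""
    if key_equality and found:
        return b / 2
    return b


def cost_binary(b, s, bfr, key_equality=False):
    """S2: binary search on an ordered file."""
    if key_equality:
        return math.log2(b)
    return math.log2(b) + math.ceil(s / bfr) - 1


def cost_primary_index(x):
    """S3a: one block per index level plus one data block."""
    return x + 1


def cost_static_hash():
    """S3b: one block access for static or linear hashing."""
    return 1


# Illustrative catalog values: b blocks, selection cardinality s,
# blocking factor bfr, and x index levels.
b, s, bfr, x = 1024, 80, 5, 2

costs = {
    "linear": cost_linear(b),
    "linear, key found": cost_linear(b, key_equality=True),
    "binary": cost_binary(b, s, bfr),
    "binary, key": cost_binary(b, s, bfr, key_equality=True),
    "primary index": cost_primary_index(x),
    "static hash": cost_static_hash(),
}
print(costs)
```

A cost-based optimizer would evaluate such functions for each applicable access path and pick the cheapest one.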
Overview of Query Optimization in Oracle
• Oracle DBMS V8
• Rule-based query optimization: the optimizer chooses
execution plans based on heuristically ranked operations
• (Currently it is being phased out)
• Cost-based query optimization: the optimizer
examines alternative access paths and operator algorithms
and chooses the execution plan with the lowest estimated cost
• The query cost is calculated based on the estimated
usage of resources such as I/O, CPU and memory
• Application developers could specify hints to the
ORACLE query optimizer
• The idea is that an application developer might know
more about the data than the optimizer does
Semantic Query Optimization
• Uses constraints specified on the database schema in order to
modify one query into another query that is more efficient to
execute
• Consider the following SQL query:
SELECT E.LNAME, M.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS M
WHERE E.SUPERSSN=M.SSN AND E.SALARY>M.SALARY;
• Explanation:
Suppose that we had a constraint on the database schema that stated
that no employee can earn more than his or her direct supervisor.
If the semantic query optimizer checks for the existence of this
constraint, it need not execute the query at all because it knows that
the result of the query will be empty. Techniques known as
theorem proving can be used for this purpose
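The idea can be sketched with a toy check that is far weaker than real theorem proving: a predicate is rejected only when it is the exact negation of a declared constraint. The tuple representation, the `NEGATION` table, and the `execute_on_database` callback are all hypothetical.

```python
# Each comparison is a (left, operator, right) triple.
NEGATION = {"<=": ">", ">": "<=", "<": ">=", ">=": "<", "=": "<>", "<>": "="}


def contradicts(predicate, constraints):
    """True if the predicate is the exact negation of a known constraint."""
    left, op, right = predicate
    return (left, NEGATION[op], right) in constraints


def run_query(predicates, constraints, execute):
    """Skip execution when a predicate can never be satisfied."""
    if any(contradicts(p, constraints) for p in predicates):
        return []
    return execute()


# Constraint: no employee earns more than his or her direct supervisor.
constraints = {("E.SALARY", "<=", "M.SALARY")}

# WHERE clause of the example query.
predicates = [("E.SUPERSSN", "=", "M.SSN"), ("E.SALARY", ">", "M.SALARY")]

executions = []


def execute_on_database():
    """Stand-in for real query execution; records that it was called."""
    executions.append("ran")
    return [("Smith", "Wong")]


result = run_query(predicates, constraints, execute_on_database)
print(result, executions)
```

The query returns an empty result without the database ever being touched, because the salary predicate contradicts the declared constraint.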
Summary
• Introduction to Query Processing
• Translating SQL Queries into Relational Algebra
• Algorithms for External Sorting
• Algorithms for SELECT and JOIN Operations
• Algorithms for PROJECT and SET Operations
• Implementing Aggregate Operations and Outer Joins
• Combining Operations using Pipelining
• Using Heuristics in Query Optimization
• Using Selectivity and Cost Estimates in Query Optimization
• Overview of Query Optimization in Oracle
• Semantic Query Optimization

End Of
Chapter Two
