05 Query Processing and Optimization-TELU
By:
Lecturer Team
Goals of the Meeting
01 Students know the basic process of query processing and understand how to translate SQL queries into Relational Algebra Expressions (RAE).
02 Students know various algorithms for selection and join operations and how to measure query costs.
03 Students know various ways to optimize query processing, generate equivalent expressions, and execute an SQL statement to view query evaluation plans in a DBMS.
OUTLINES
• Query Cost
9/10/2024 Storage Management
STEPS OF QUERY PROCESSING
BASIC STEPS IN QUERY PROCESSING
1. Parsing and translation
2. Optimization
3. Evaluation
BASIC STEPS IN QUERY PROCESSING (CONT.)
• Parsing and translation
– Translate the query into its internal form, which is then translated into relational algebra.
– The parser checks syntax and verifies relations
• Optimization
– Each relational algebra operation can be evaluated using one of several different algorithms
– Annotated expression specifying detailed evaluation strategy is called an evaluation-plan
– Amongst all equivalent evaluation plans choose the one with lowest cost.
• Evaluation
– The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to
the query.
RELATIONAL ALGEBRA EXPRESSION
• PARSING & TRANSLATION: QUERY TO RAE
RELATIONAL ALGEBRA
• A procedural language consisting of a set of operations that take one or two relations as
input and produce a new relation as their result.
• Operators
– select: σ
– project: Π
– cartesian product: ×
– join: ⋈
– union: ∪
– set-intersection: ∩
– set-difference: –
– assignment: ←
– rename: ρ
SELECT OPERATION
• The select operation selects tuples that satisfy a given predicate.
• Notation: σ_p(r)
• p is called the selection predicate
• Example: select those tuples of the instructor relation where the instructor is in the "Physics" department.
– Query
SELECT * FROM instructor WHERE dept_name = 'Physics'
– Relational Algebra (RA)
σ_{dept_name = "Physics"}(instructor)
– Result: the instructor tuples whose dept_name is "Physics"
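The select operation can be sketched in a few lines of Python. This is only an illustration: the relation is modelled as a list of dictionaries, the sample rows are made up, and the helper name select is an assumption, not part of any DBMS API.

```python
# Toy instructor relation; the rows are made-up sample data.
instructor = [
    {"ID": "22222", "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": "33456", "name": "Gold", "dept_name": "Physics", "salary": 87000},
    {"ID": "10101", "name": "Srinivasan", "dept_name": "Comp. Sci.", "salary": 65000},
]


def select(predicate, relation):
    """sigma_p(r): keep the tuples of r for which the predicate p holds."""
    return [t for t in relation if predicate(t)]


# sigma_{dept_name = "Physics"}(instructor)
physics = select(lambda t: t["dept_name"] == "Physics", instructor)
print([t["name"] for t in physics])
```

The predicate is passed as a function, which mirrors how σ takes the predicate p as a parameter and the relation r as its input.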
SELECT OPERATION (CONT.)
• We allow comparisons using
=, ≠, >, ≥, <, ≤
in the selection predicate.
• We can combine several predicates into a larger predicate by using the connectives:
∧ (and), ∨ (or), ¬ (not)
• Example: to find the instructors in Physics with a salary greater than $90,000, we write:
– Query: SELECT * FROM instructor WHERE dept_name = 'Physics' AND salary > 90000
– RA:
σ_{dept_name = "Physics" ∧ salary > 90000}(instructor)
• The result of a relational-algebra operation is itself a relation, so operations can be composed: instead of giving the name of a relation as the argument of the projection operation, we give an expression that evaluates to a relation.
– For example: Π_{name}(σ_{dept_name = "Physics"}(instructor))
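A small Python sketch of a conjunctive predicate followed by a projection over the selection result. The helper names select and project and the sample rows are assumptions made for illustration only.

```python
# Made-up sample rows for the instructor relation.
instructor = [
    {"name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"name": "Gold", "dept_name": "Physics", "salary": 87000},
    {"name": "Wu", "dept_name": "Finance", "salary": 92000},
]


def select(predicate, relation):
    """sigma_p(r): keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(attrs, relation):
    """Pi_attrs(r): keep only the listed attributes and drop duplicate rows."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out


# sigma_{dept_name = "Physics" AND salary > 90000}(instructor)
high_paid_physics = select(
    lambda t: t["dept_name"] == "Physics" and t["salary"] > 90000, instructor
)

# Composition: the argument of project is an expression, not a relation name.
names = project(["name"], high_paid_physics)
print(names)
```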
CARTESIAN-PRODUCT OPERATION
• The Cartesian-product operation (denoted by ×) allows us to combine information from any two relations.
• Example: the Cartesian product of the relations instructor and teaches is written as:
– Query: SELECT * FROM instructor CROSS JOIN teaches
or SELECT * FROM instructor, teaches
– RA:
instructor × teaches
• We construct a tuple of the result out of each possible pair of tuples: one from the instructor relation and one from the teaches relation (see next slide)
• Since the instructor ID appears in both relations, we distinguish between these attributes by attaching to each attribute the name of the relation from which it originally came.
– instructor.ID
– teaches.ID
THE INSTRUCTOR × TEACHES TABLE
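The pairing and attribute-qualification steps can be sketched in Python as follows. The function name cartesian_product and the sample rows are hypothetical; for simplicity this sketch prefixes every attribute with its relation name, not only the shared ID attribute.

```python
from itertools import product

# Made-up sample rows.
instructor = [
    {"ID": "10101", "name": "Srinivasan"},
    {"ID": "22222", "name": "Einstein"},
]
teaches = [
    {"ID": "10101", "course_id": "CS-101"},
    {"ID": "10101", "course_id": "CS-315"},
    {"ID": "22222", "course_id": "PHY-101"},
]


def cartesian_product(r, r_name, s, s_name):
    """r x s: one result tuple for every possible pair (tr, ts)."""
    result = []
    for tr, ts in product(r, s):
        # Qualify attribute names so instructor.ID and teaches.ID stay distinct.
        row = {f"{r_name}.{k}": v for k, v in tr.items()}
        row.update({f"{s_name}.{k}": v for k, v in ts.items()})
        result.append(row)
    return result


pairs = cartesian_product(instructor, "instructor", teaches, "teaches")
print(len(pairs))  # 2 instructor rows * 3 teaches rows
```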
JOIN OPERATION
• The Cartesian product
instructor × teaches
associates every tuple of instructor with every tuple of teaches.
– Most of the resulting rows pair an instructor with a course that the instructor did NOT teach.
• To get only those tuples of "instructor × teaches" that pertain to instructors and the courses that they taught, we write:
– Query: SELECT * FROM instructor, teaches WHERE instructor.id = teaches.id
– RAE:
σ_{instructor.id = teaches.id}(instructor × teaches)
• The result of this expression is shown in the next slide.
JOIN OPERATION (CONT.)
• The table corresponding to:
σ_{instructor.id = teaches.id}(instructor × teaches)
• Let θ be a predicate on attributes in the schema R ∪ S. The join operation r ⋈_θ s is defined as follows:
r ⋈_θ s = σ_θ(r × s)
• Thus
σ_{instructor.id = teaches.id}(instructor × teaches)
can equivalently be written as
instructor ⋈_{instructor.id = teaches.id} teaches
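The definition of the theta join as a selection over the Cartesian product translates almost literally into Python. This is a sketch on made-up rows; theta_join is a hypothetical helper name, and real systems do not materialize the full product.

```python
from itertools import product

# Made-up sample rows; instructor 33456 teaches nothing.
instructor = [
    {"ID": "10101", "name": "Srinivasan"},
    {"ID": "22222", "name": "Einstein"},
    {"ID": "33456", "name": "Gold"},
]
teaches = [
    {"ID": "10101", "course_id": "CS-101"},
    {"ID": "10101", "course_id": "CS-315"},
    {"ID": "22222", "course_id": "PHY-101"},
]


def theta_join(r, s, theta):
    """r join_theta s = sigma_theta(r x s): filter the pairs of the product."""
    return [(tr, ts) for tr, ts in product(r, s) if theta(tr, ts)]


# instructor join_{instructor.ID = teaches.ID} teaches
taught = theta_join(instructor, teaches, lambda i, t: i["ID"] == t["ID"])
for i, t in taught:
    print(i["name"], t["course_id"])
```

Of the 9 pairs in the product, only the pairs with matching IDs survive, which is the point made about rows for courses an instructor did not teach.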
• Natural join :
• Inner join :
• Outer join :
• SELECTION ALGORITHM
BASIC STEPS IN QUERY PROCESSING: OPTIMIZATION
• A relational algebra expression may have many equivalent expressions
– E.g., σ_{salary < 75000}(Π_{salary}(instructor)) is equivalent to
Π_{salary}(σ_{salary < 75000}(instructor))
• Each relational algebra operation can be evaluated using one of several different algorithms
– Correspondingly, a relational-algebra expression can be evaluated in many ways.
• Annotated expression specifying detailed evaluation strategy is called an evaluation-plan. E.g.,:
– Use an index on salary to find instructors with salary < 75000,
– Or perform a complete relation scan and discard instructors with salary ≥ 75000
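The two evaluation plans for salary < 75000 can be imitated in Python to show that different algorithms return the same answer. This is a sketch only: the "index" is modelled as a sorted list searched with bisect, the rows are made up, and the function names are hypothetical.

```python
from bisect import bisect_left

# Made-up (name, salary) rows.
instructor = [
    ("Srinivasan", 65000),
    ("Wu", 90000),
    ("Mozart", 40000),
    ("Einstein", 95000),
    ("El Said", 60000),
]


def plan_full_scan(relation, limit):
    """Plan A: read every tuple and discard those with salary >= limit."""
    return sorted(name for name, salary in relation if salary < limit)


# Stand-in for an index on salary: (salary, name) entries kept in sorted order.
salary_index = sorted((salary, name) for name, salary in instructor)


def plan_index_lookup(index, limit):
    """Plan B: binary-search the sorted index and read only the matching prefix."""
    cut = bisect_left(index, (limit,))  # first entry with salary >= limit
    return sorted(name for _, name in index[:cut])


print(plan_full_scan(instructor, 75000))
print(plan_index_lookup(salary_index, 75000))
```

Both plans answer the same query; they differ only in how much data they touch, which is what the cost estimate of an evaluation plan tries to capture.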
BASIC STEPS: OPTIMIZATION (CONT.)
• Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost.
– Cost is estimated using statistical information from the
database catalog
• e.g., number of tuples in each relation, size of tuples, etc.
• In this chapter we study
– How to measure query costs
– Algorithms for evaluating relational algebra operations
– How to combine algorithms for individual operations in order to evaluate a complete
expression
MEASURES OF QUERY COST
• Many factors contribute to time cost
– disk access, CPU, and network communication
• Cost can be measured based on
– response time, i.e. total elapsed time for answering query, or
– total resource consumption
• We use total resource consumption as cost metric
– Response time harder to estimate, and minimizing resource consumption is a good idea in a shared
database
• We ignore CPU costs for simplicity
– Real systems do take CPU cost into account
– Network costs must be considered for parallel systems
• We describe how to estimate the cost of each operation
– We do not include the cost of writing output to disk
MEASURES OF QUERY COST (CONT.)
• Disk cost can be estimated as:
– Number of seeks * average-seek-cost
– Number of blocks read * average-block-read-cost
– Number of blocks written * average-block-write-cost
• For simplicity we just use the number of block transfers from disk and the number of seeks as the cost measures
– tT – time to transfer one block
• Assuming for simplicity that write cost is the same as read cost
– tS – time for one seek
– Cost for b block transfers plus S seeks:
b * tT + S * tS
• tS and tT depend on where data is stored; with 4 KB blocks:
– High-end magnetic disk: tS = 4 msec and tT = 0.1 msec
– SSD: tS = 20-90 microsec and tT = 2-10 microsec
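As a worked example of the formula b * tT + S * tS, the sketch below estimates the cost of reading 100 blocks with a single seek. The workload (100 blocks, 1 seek) is an assumption chosen for illustration, and the SSD figures use the upper ends of the quoted ranges.

```python
def disk_cost_ms(b, S, t_T_ms, t_S_ms):
    """Estimated cost in milliseconds of b block transfers plus S seeks."""
    return b * t_T_ms + S * t_S_ms


# Hypothetical workload: read 100 contiguous blocks after one seek.
blocks, seeks = 100, 1

# High-end magnetic disk: tS = 4 msec, tT = 0.1 msec.
magnetic_ms = disk_cost_ms(blocks, seeks, t_T_ms=0.1, t_S_ms=4.0)

# SSD, upper ends of the ranges: tS = 90 microsec, tT = 10 microsec.
ssd_ms = disk_cost_ms(blocks, seeks, t_T_ms=0.010, t_S_ms=0.090)

print(f"magnetic disk: {magnetic_ms:.2f} ms")
print(f"SSD:           {ssd_ms:.2f} ms")
```

Under these assumptions the magnetic disk needs about 14 ms (10 ms of transfer plus 4 ms of seek) and the SSD about 1.09 ms.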
MEASURES OF QUERY COST (CONT.)
• Required data may be buffer resident already, avoiding disk I/O
– But hard to take into account for cost estimation
• Several algorithms can reduce disk IO by using extra buffer space
– Amount of real memory available to buffer depends on other concurrent queries and OS
processes, known only during execution
• Worst case estimates assume that no data is initially in buffer and only the minimum amount of
memory needed for the operation is available
– But more optimistic estimates are used in practice
SELECTIONS INVOLVING EQUALITY
COST ESTIMATES FOR SELECTION ALGORITHMS
SELECTIONS INVOLVING EQUALITY (2)
COST ESTIMATES FOR SELECTION ALGORITHMS
SELECTIONS INVOLVING COMPARISONS
COST ESTIMATES FOR SELECTION ALGORITHMS
JOIN OPERATION
• Several different algorithms to implement joins
– Nested-loop join
– Block nested-loop join
• Choice based on cost estimate
• Examples use the following information
– Number of records: student 5,000; takes 10,000
– Number of blocks: student 100; takes 400
NESTED-LOOP JOIN
• To compute the theta join r ⋈_θ s
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see if they satisfy the join condition θ
        if they do, add tr • ts to the result.
    end
end
• r is called the outer relation and s the inner relation of the join.
• Requires no indices and can be used with any kind of join condition.
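The nested-loop pseudocode above maps directly onto two Python loops. This is a minimal in-memory sketch on made-up rows; it ignores blocks and buffering, and nested_loop_join is a hypothetical helper name.

```python
def nested_loop_join(r, s, theta):
    """Theta join of r (outer relation) and s (inner relation)."""
    result = []
    for tr in r:                # outer loop: every tuple of r
        for ts in s:            # inner loop: every tuple of s
            if theta(tr, ts):   # test the pair against the join condition
                # Concatenate tr and ts. Shared attribute names collapse into
                # one key here, which is harmless for this equality join.
                result.append({**tr, **ts})
    return result


# Made-up sample rows.
student = [{"ID": 1, "name": "Zhang"}, {"ID": 2, "name": "Shankar"}]
takes = [
    {"ID": 1, "course_id": "CS-101"},
    {"ID": 1, "course_id": "CS-190"},
    {"ID": 3, "course_id": "BIO-101"},
]

joined = nested_loop_join(student, takes, lambda a, b: a["ID"] == b["ID"])
for row in joined:
    print(row)
```

Every pair of tuples is examined, so with the record counts quoted for student and takes the condition would be tested 5,000 × 10,000 = 50,000,000 times, regardless of how many pairs actually match.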
INTRODUCTION
• Alternative ways of evaluating a given query
– Equivalent expressions
– Different algorithms for each operation
INTRODUCTION (CONT.)
• An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the
operations is coordinated.
INTRODUCTION (CONT.)
• Cost difference between evaluation plans for a query can be enormous
– E.g., seconds vs. days in some cases
• Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules
2. Annotate resultant expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost
• Estimation of plan cost based on:
– Statistical information about relations. Examples:
• number of tuples, number of distinct values for an attribute
– Statistics estimation for intermediate results
• to compute cost of complex expressions
– Cost formulae for algorithms, computed using statistics
TRANSFORMATION OF RELATIONAL EXPRESSIONS
• Two relational algebra expressions are said to be equivalent if the two expressions generate the same set of
tuples on every legal database instance
– Note: order of tuples is irrelevant
– We don't care if they generate different results on databases that violate integrity constraints
• In SQL, inputs and outputs are multisets of tuples
– Two expressions in the multiset version of the relational algebra are said to be equivalent if the two
expressions generate the same multiset of tuples on every legal database instance.
• An equivalence rule says that expressions of two forms are equivalent
– Can replace expression of first form by second, or vice versa
EQUIVALENCE RULES
1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
σ_{θ1 ∧ θ2}(E) ≡ σ_{θ1}(σ_{θ2}(E))
2. Selection operations are commutative.
σ_{θ1}(σ_{θ2}(E)) ≡ σ_{θ2}(σ_{θ1}(E))
3. Only the last in a sequence of projection operations is needed, the others can be omitted.
Π_{L1}(Π_{L2}(…(Π_{Ln}(E))…)) ≡ Π_{L1}(E)
where L1 ⊆ L2 ⊆ … ⊆ Ln
4. Selections can be combined with Cartesian products and theta joins.
a. σ_θ(E1 × E2) ≡ E1 ⋈_θ E2
b. σ_{θ1}(E1 ⋈_{θ2} E2) ≡ E1 ⋈_{θ1 ∧ θ2} E2
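Rules 1 and 2 can be spot-checked on a small instance in Python. Checking one made-up instance only illustrates the rules; it does not prove them. The helper select and the predicates theta1 and theta2 are assumptions for this sketch.

```python
# Made-up relation E with two attributes.
E = [
    {"dept_name": "Physics", "salary": 95000},
    {"dept_name": "Physics", "salary": 87000},
    {"dept_name": "Music", "salary": 40000},
    {"dept_name": "Music", "salary": 92000},
]


def select(predicate, relation):
    """sigma_p(r): keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def theta1(t):
    return t["dept_name"] == "Physics"


def theta2(t):
    return t["salary"] > 90000


# Rule 1: sigma_{theta1 AND theta2}(E) == sigma_theta1(sigma_theta2(E))
conjunctive = select(lambda t: theta1(t) and theta2(t), E)
cascaded = select(theta1, select(theta2, E))

# Rule 2: the two selections can be applied in either order.
swapped = select(theta2, select(theta1, E))

print(conjunctive == cascaded == swapped)
```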
EQUIVALENCE RULES (CONT.)
5. Theta-join operations (and natural joins) are commutative.
E1 ⋈_θ E2 ≡ E2 ⋈_θ E1
• Query: Find the names of all instructors in the Music department, along with the titles of the courses that they teach
– Π_{name, title}(σ_{dept_name = "Music"}(instructor ⋈ (teaches ⋈ Π_{course_id, title}(course))))
• Transformation using rule 7a (a selection distributes over a join when its predicate involves only attributes of one of the joined expressions):
– Π_{name, title}((σ_{dept_name = "Music"}(instructor)) ⋈ (teaches ⋈ Π_{course_id, title}(course)))
• Performing the selection as early as possible reduces the size of the relation to be joined.
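The effect of performing the selection early can be seen on a toy instance. The sketch simplifies the query to two relations (instructor and teaches), uses made-up rows, and joins on the first column; join is a hypothetical helper.

```python
# Made-up rows: instructor(ID, name, dept_name) and teaches(ID, course_id).
instructor = [(1, "Mozart", "Music"), (2, "Einstein", "Physics"), (3, "Wu", "Finance")]
teaches = [(1, "MU-199"), (2, "PHY-101"), (2, "PHY-201"), (3, "FIN-201")]


def join(r, s):
    """Equi-join on the first column (ID) of both relations."""
    return [tr + ts[1:] for tr in r for ts in s if tr[0] == ts[0]]


# Selection applied late: join everything, then filter.
joined_all = join(instructor, teaches)
late = [t for t in joined_all if t[2] == "Music"]

# Selection applied early: filter instructor first, then join.
music_only = [t for t in instructor if t[2] == "Music"]
early = join(music_only, teaches)

print(len(joined_all), "tuples joined before filtering")
print(len(music_only), "instructor tuple(s) fed to the join after filtering")
print(late == early)
```

Both orders give the same answer, but the early selection hands the join one instructor tuple instead of three.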
MULTIPLE TRANSFORMATIONS
JOIN ORDERING EXAMPLE
• For all relations r1, r2, and r3,
(r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
(Join Associativity)
• If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose
(r1 ⋈ r2) ⋈ r3
so that we compute and store a smaller temporary relation.
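A toy instance makes the join-ordering argument concrete: both orders give the same result, but the temporary relation differs in size. The relations r1(a, b), r2(b, c), r3(c, d) and the helper join are made up for this sketch.

```python
# Made-up relations: r1(a, b), r2(b, c), r3(c, d).
r1 = [(1, 10)]
r2 = [(10, 100), (20, 100), (30, 100)]
r3 = [(100, "x"), (100, "y")]


def join(r, s):
    """Join where the last column of r equals the first column of s."""
    return [tr + ts[1:] for tr in r for ts in s if tr[-1] == ts[0]]


# Order 1: (r1 join r2) join r3
temp_left = join(r1, r2)
result_left = join(temp_left, r3)

# Order 2: r1 join (r2 join r3)
temp_right = join(r2, r3)
result_right = join(r1, temp_right)

print(len(temp_left), "temporary tuple(s) for (r1 join r2)")
print(len(temp_right), "temporary tuples for (r2 join r3)")
print(sorted(result_left) == sorted(result_right))
```

Here (r1 ⋈ r2) holds 1 tuple while (r2 ⋈ r3) holds 6, so the first order stores the smaller temporary relation.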
JOIN ORDERING EXAMPLE (CONT.)
• Consider the expression
Π_{name, title}((σ_{dept_name = "Music"}(instructor) ⋈ teaches) ⋈ Π_{course_id, title}(course))
• Could compute teaches ⋈ Π_{course_id, title}(course) first, and join the result with σ_{dept_name = "Music"}(instructor)
• Must consider the interaction of evaluation techniques when choosing evaluation plans: choosing the cheapest algorithm for each operation independently may not yield the best overall algorithm.
VIEWING QUERY EVALUATION PLANS
• Most databases support explain <query>
– Displays the plan chosen by the query optimizer, along with cost estimates
– Some syntax variations between databases
• Oracle: explain plan for <query> followed by select * from table(dbms_xplan.display)
• SQL Server: set showplan_text on
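To try this without installing a database server, the sketch below uses SQLite through Python's standard sqlite3 module. SQLite is not mentioned on the slide; its variation of the syntax is EXPLAIN QUERY PLAN <query>. The table layout and index name are assumptions, and the plan text printed depends on the SQLite version.

```python
import sqlite3

# In-memory database with a hypothetical instructor table and salary index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    "ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)"
)
conn.execute("CREATE INDEX idx_instructor_salary ON instructor(salary)")

# Ask SQLite for the plan it would use, without running the query itself.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM instructor WHERE salary < 75000"
).fetchall()

for row in plan_rows:
    print(row[-1])  # the last column holds the human-readable plan step

conn.close()
```

The output states whether the optimizer chose a full table scan or a search using the index, which corresponds to the two evaluation plans discussed for the salary < 75000 selection.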
Source: https://www.db-book.com/db7/slides-dir/index.html