
05 Query Processing and Optimization-TELU

Uploaded by

pratistozuhri
Copyright
© All Rights Reserved

Database System 05 | Query Processing and Optimization

Odd Semester, Academic Year 2024/2025

By:
Lecturer Team
Goals of the Meeting

01 Students know the basic process of query processing and understand how to translate SQL queries into Relational Algebra Expressions (RAE).
02 Students know various algorithms for selection and join operations and how to measure query costs.
03 Students know various ways to optimize query processing, generate equivalent expressions, and execute an SQL statement to view query evaluation plans in a DBMS.
OUTLINES

• Steps of Query Processing

• Relational Algebra Expression (RAE)

• Query Cost

• Equivalence Rules

• Query Evaluation Plans on DBMS

9/10/2024
STEPS OF QUERY
PROCESSING

BASIC STEPS IN QUERY PROCESSING
1. Parsing and translation

2. Optimization

3. Evaluation
BASIC STEPS IN QUERY PROCESSING (CONT.)
• Parsing and translation
– Translate the query into its internal form, which is then translated into relational algebra.
– The parser checks the syntax and verifies that the referenced relations exist.
• Optimization
– Each relational algebra operation can be evaluated using one of several different algorithms.
– An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
– Among all equivalent evaluation plans, choose the one with the lowest cost.
• Evaluation
– The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to
the query.
RELATIONAL
ALGEBRA
EXPRESSION
• PARSING & TRANSLATION:
QUERY TO RAE

BASIC STEPS IN QUERY PROCESSING
1. Parsing and translation

2. Optimization

3. Evaluation
RELATIONAL ALGEBRA
• A procedural language consisting of a set of operations that take one or two relations as
input and produce a new relation as their result.
• Operators
– select: σ
– project: Π
– cartesian product: ×
– join: ⋈
– union: ∪
– set-intersection: ∩
– set-difference: –
– assignment: ←
– rename: ρ
SELECT OPERATION
• The select operation selects tuples that satisfy a given predicate.
• Notation: $\sigma_p(r)$
• p is called the selection predicate.
• Example: select those tuples of the instructor relation where the instructor is in the “Physics”
department.
– Query
SELECT * FROM instructor WHERE dept_name = 'Physics'
– Relational Algebra (RA)
$\sigma_{\text{dept\_name} = \text{``Physics''}}(\text{instructor})$

– Result
SELECT OPERATION (CONT.)
• We allow comparisons using
=, ≠, >, ≥, <, ≤
in the selection predicate.
• We can combine several predicates into a larger predicate by using the connectives:
∧ (and), ∨ (or), ¬ (not)
• Example: to find the instructors in Physics with a salary greater than $90,000, we write:
– Query: SELECT * FROM instructor WHERE dept_name = 'Physics' AND salary > 90000
– RA:
$\sigma_{\text{dept\_name} = \text{``Physics''} \land \text{salary} > 90000}(\text{instructor})$

• The select predicate may include comparisons between two attributes.
– Example: find all departments whose name is the same as their building name:
– Query: SELECT * FROM department WHERE dept_name = building
– RA:
$\sigma_{\text{dept\_name} = \text{building}}(\text{department})$
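As a minimal sketch of how selection behaves, a relation can be modelled in Python as a list of dicts, one dict per tuple. This is only an illustration, not how a DBMS stores data; the `select` helper and the `instructor` and `department` rows are hypothetical sample data.

```python
from typing import Callable

# Assumption: a relation is a list of dicts (one dict per tuple).
Row = dict
Relation = list

def select(pred: Callable[[Row], bool], r: Relation) -> Relation:
    """sigma_pred(r): keep only the tuples of r that satisfy the predicate."""
    return [t for t in r if pred(t)]

# Hypothetical sample data, for illustration only.
instructor = [
    {"ID": 10101, "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": 10102, "name": "Gold", "dept_name": "Physics", "salary": 87000},
    {"ID": 10103, "name": "Mozart", "dept_name": "Music", "salary": 40000},
]
department = [
    {"dept_name": "Physics", "building": "Watson"},
    {"dept_name": "Music", "building": "Music"},  # hypothetical row: name equals building
]

# sigma_{dept_name="Physics" AND salary>90000}(instructor): connectives become Python and/or/not
physics_high_paid = select(lambda t: t["dept_name"] == "Physics" and t["salary"] > 90000, instructor)
print([t["name"] for t in physics_high_paid])  # ['Einstein']

# sigma_{dept_name=building}(department): the predicate compares two attributes of the same tuple
print(select(lambda t: t["dept_name"] == t["building"], department))
```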
PROJECT OPERATION
• A unary operation that returns its argument relation, with certain attributes left out.
• Notation:

$\Pi_{A_1, A_2, A_3, \ldots, A_k}(r)$

where A1, A2, …, Ak are attribute names and r is a relation name.


• The result is defined as the relation of k columns obtained by erasing the columns that are not
listed.
• Duplicate rows are removed from the result, since relations are sets.
PROJECT OPERATION EXAMPLE
• Example: eliminate the dept_name attribute of instructor

• Query: SELECT id, name, salary FROM instructor

• RA:

$\Pi_{\text{ID}, \text{name}, \text{salary}}(\text{instructor})$


• Result:
COMPOSITION OF RELATIONAL OPERATIONS
• The result of a relational-algebra operation is a relation, and therefore relational-algebra
operations can be composed together into a relational-algebra expression.
• Consider the query: find the names of all instructors in the Physics department.
– Query: SELECT name FROM instructor WHERE dept_name = 'Physics'
– RAE:
$\Pi_{\text{name}}(\sigma_{\text{dept\_name} = \text{``Physics''}}(\text{instructor}))$

• Instead of giving the name of a relation as the argument of the projection operation, we give an
expression that evaluates to a relation.
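Projection with duplicate elimination, and its composition with selection, can be sketched as follows, again assuming relations are lists of dicts with hypothetical sample rows. Note that SQL `SELECT` keeps duplicates unless `DISTINCT` is used, whereas the relational-algebra Π removes them; the sketch follows the set semantics of Π.

```python
# Assumption: relations are lists of dicts; the rows below are hypothetical sample data.
def project(attrs, r):
    """Pi_attrs(r): keep only the listed attributes and drop duplicate rows (set semantics)."""
    seen = set()
    result = []
    for t in r:
        key = tuple(t[a] for a in attrs)
        if key not in seen:  # skip rows that were already emitted
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

def select(pred, r):
    """sigma_pred(r): keep only the tuples that satisfy the predicate."""
    return [t for t in r if pred(t)]

instructor = [
    {"ID": 10101, "name": "Einstein", "dept_name": "Physics", "salary": 95000},
    {"ID": 10102, "name": "Gold", "dept_name": "Physics", "salary": 87000},
    {"ID": 10103, "name": "Mozart", "dept_name": "Music", "salary": 40000},
]

# Duplicate elimination: three tuples, but only two distinct departments.
print(project(["dept_name"], instructor))  # [{'dept_name': 'Physics'}, {'dept_name': 'Music'}]

# Composition: Pi_name(sigma_{dept_name="Physics"}(instructor)); the inner result is itself a relation.
print(project(["name"], select(lambda t: t["dept_name"] == "Physics", instructor)))
```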
CARTESIAN-PRODUCT OPERATION
• The Cartesian-product operation (denoted by ×) allows us to combine information from any two
relations.
• Example: the Cartesian product of the relations instructor and teaches is written as:
– Query: SELECT * FROM instructor CROSS JOIN teaches
or SELECT * FROM instructor, teaches
– RA:
$\text{instructor} \times \text{teaches}$
• We construct a tuple of the result out of each possible pair of tuples: one from the instructor relation
and one from the teaches relation (see next slide).
• Since the attribute ID appears in both relations, we distinguish between these attributes by prefixing
each one with the name of the relation from which it originally came:
– instructor.ID
– teaches.ID
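A minimal sketch of the Cartesian product, assuming relations are lists of dicts and using hypothetical `instructor` and `teaches` rows. The `qualify` helper (a hypothetical name) prefixes attribute names with the relation name, which is exactly how `instructor.ID` and `teaches.ID` stay distinct.

```python
from itertools import product

# Assumption: relations are lists of dicts; sample rows are hypothetical.
def qualify(rel_name, t):
    """Prefix every attribute with its relation name, e.g. ID -> instructor.ID."""
    return {f"{rel_name}.{a}": v for a, v in t.items()}

def cartesian_product(r_name, r, s_name, s):
    """r x s: one result tuple for every pair (tr, ts)."""
    return [{**qualify(r_name, tr), **qualify(s_name, ts)} for tr, ts in product(r, s)]

instructor = [
    {"ID": 10101, "name": "Einstein"},
    {"ID": 10103, "name": "Mozart"},
]
teaches = [
    {"ID": 10101, "course_id": "PHY-101"},
    {"ID": 10101, "course_id": "PHY-102"},
    {"ID": 10103, "course_id": "MU-199"},
]

rows = cartesian_product("instructor", instructor, "teaches", teaches)
print(len(rows))        # 6 = 2 * 3
print(sorted(rows[0]))  # ['instructor.ID', 'instructor.name', 'teaches.ID', 'teaches.course_id']
```

The result size is always |r| · |s|, which is why the product is rarely materialized on its own.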
THE INSTRUCTOR × TEACHES TABLE
JOIN OPERATION
• The Cartesian product
$\text{instructor} \times \text{teaches}$
associates every tuple of instructor with every tuple of teaches.
– Most of the resulting rows have information about instructors who did NOT teach a particular course.
• To get only those tuples of “instructor × teaches” that pertain to instructors and the courses that they
taught, we write:
– Query: SELECT * FROM instructor, teaches WHERE instructor.id = teaches.id
– RAE:
$\sigma_{\text{instructor.id} = \text{teaches.id}}(\text{instructor} \times \text{teaches})$

• The result of this expression is shown on the next slide.
JOIN OPERATION (CONT.)
• The table corresponding to:

$\sigma_{\text{instructor.id} = \text{teaches.id}}(\text{instructor} \times \text{teaches})$


JOIN OPERATION (CONT.)
• The join operation allows us to combine a select operation and a Cartesian-product operation into a single
operation.

• Consider relations r(R) and s(S).

• Let θ be a predicate on attributes in the schema R ∪ S. The join operation $r \bowtie_\theta s$ is defined as follows:
$r \bowtie_\theta s = \sigma_\theta(r \times s)$

Thus

$\sigma_{\text{instructor.id} = \text{teaches.id}}(\text{instructor} \times \text{teaches})$


can equivalently be written as
– SELECT * FROM instructor JOIN teaches ON instructor.id = teaches.id

$\text{instructor} \bowtie_{\text{instructor.id} = \text{teaches.id}} \text{teaches}$
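The definition r ⋈θ s = σθ(r × s) can be transcribed literally in Python. This is a sketch under the same list-of-dicts assumption with hypothetical sample rows; it deliberately builds the full product first, which is what the join algorithms later in this deck avoid.

```python
from itertools import product

# Assumption: relations are lists of dicts; sample rows are hypothetical.
def qualify(rel_name, t):
    """Prefix every attribute with its relation name, e.g. ID -> teaches.ID."""
    return {f"{rel_name}.{a}": v for a, v in t.items()}

def cartesian_product(r_name, r, s_name, s):
    return [{**qualify(r_name, tr), **qualify(s_name, ts)} for tr, ts in product(r, s)]

def theta_join(r_name, r, s_name, s, theta):
    """r join_theta s, computed straight from the definition sigma_theta(r x s)."""
    return [t for t in cartesian_product(r_name, r, s_name, s) if theta(t)]

instructor = [{"ID": 10101, "name": "Einstein"}, {"ID": 10103, "name": "Mozart"}]
teaches = [
    {"ID": 10101, "course_id": "PHY-101"},
    {"ID": 10101, "course_id": "PHY-102"},
    {"ID": 10103, "course_id": "MU-199"},
]

joined = theta_join("instructor", instructor, "teaches", teaches,
                    lambda t: t["instructor.ID"] == t["teaches.ID"])
print(len(joined))  # 3 of the 6 product tuples survive
print([(t["instructor.name"], t["teaches.course_id"]) for t in joined])
```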


JOIN OPERATION (CONT.)

• Natural join

• Inner join

• Outer join: left outer join, right outer join, full outer join
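The difference between a natural join and a left outer join can be sketched as follows. Assumptions: relations are lists of dicts, attributes with the same name are the join attributes, `None` stands in for SQL NULL, and the helper names and sample rows are hypothetical. An inner join keeps only matching pairs, as `natural_join` does here.

```python
def matches(tr, ts):
    """True if tr and ts agree on every attribute name they share."""
    return all(tr[a] == ts[a] for a in tr.keys() & ts.keys())

def natural_join(r, s):
    """r natural-join s: keep matching pairs; shared attributes appear once."""
    return [{**tr, **ts} for tr in r for ts in s if matches(tr, ts)]

def left_outer_join(r, s, s_attrs):
    """Natural join, plus each r tuple with no partner, padded with None (SQL NULL)
    for the attributes of s that r lacks. s_attrs is the schema of s."""
    result = []
    for tr in r:
        partners = [ts for ts in s if matches(tr, ts)]
        if partners:
            result.extend({**tr, **ts} for ts in partners)
        else:
            result.append({**tr, **{a: None for a in s_attrs if a not in tr}})
    return result

# Hypothetical sample data: Mozart teaches nothing in this instance.
instructor = [{"ID": 10101, "name": "Einstein"}, {"ID": 10103, "name": "Mozart"}]
teaches = [{"ID": 10101, "course_id": "PHY-101"}]

print(natural_join(instructor, teaches))
print(left_outer_join(instructor, teaches, ["ID", "course_id"]))
```

A right outer join is the same idea with the roles of the two relations swapped, and a full outer join keeps the unmatched tuples of both sides.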


RENAME OPERATION
• The results of relational-algebra expressions do not have a name that we can use
to refer to them. The rename operator, ρ, is provided for that purpose.
• The expression:
$\rho_x(E)$
returns the result of expression E under the name x.
• Another form of the rename operation:
$\rho_{x(A_1, A_2, \ldots, A_n)}(E)$
returns the result of expression E under the name x, and with the attributes
renamed to A1, A2, …, An.
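Both forms of ρ can be sketched in Python, assuming a relation is a `(name, rows)` pair whose row dicts keep the schema order (Python dicts preserve insertion order since 3.7). The `rename` helper and the `countries` row are hypothetical; the relation and attribute names follow the examples on the next slide.

```python
# Assumption: a relation is a (name, rows) pair; rows are dicts in schema order.
def rename(new_name, relation, new_attrs=None):
    """rho_x(E) or rho_{x(A1,...,An)}(E): the same tuples under a new name,
    optionally with the attributes renamed by position."""
    _, rows = relation
    if new_attrs is None:
        return (new_name, rows)
    renamed = []
    for t in rows:
        if len(t) != len(new_attrs):
            raise ValueError("need exactly one new name per attribute")
        renamed.append(dict(zip(new_attrs, t.values())))
    return (new_name, renamed)

# Hypothetical sample row for the countries relation.
countries = ("countries", [{"country_id": "ID", "country_name": "Indonesia", "region_id": 3}])

name, rows = rename("nation", countries, ["id", "name", "region_id"])
print(name, rows)  # nation [{'id': 'ID', 'name': 'Indonesia', 'region_id': 3}]

# Pi_{id,name}(rho_{nation(id,name,region_id)}(countries))
print([{a: t[a] for a in ("id", "name")} for t in rows])  # [{'id': 'ID', 'name': 'Indonesia'}]
```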
RENAME OPERATION EXAMPLE
• SELECT * FROM countries nation
$\rho_{\text{nation}}(\text{countries})$

• SELECT country_id AS id, country_name AS name, region_id FROM countries nation
$\rho_{\text{nation}(\text{id}, \text{name}, \text{region\_id})}(\text{countries})$

• SELECT country_id AS id, country_name AS name FROM countries nation
$\Pi_{\text{id}, \text{name}}(\rho_{\text{nation}(\text{id}, \text{name}, \text{region\_id})}(\text{countries}))$
QUERY
OPTIMIZATION
• MEASURES OF QUERY COSTS

• SELECTION ALGORITHM

BASIC STEPS IN QUERY PROCESSING
1. Parsing and translation

2. Optimization

3. Evaluation
BASIC STEPS IN QUERY PROCESSING:
OPTIMIZATION
• A relational algebra expression may have many equivalent expressions
– E.g., $\sigma_{\text{salary} < 75000}(\Pi_{\text{salary}}(\text{instructor}))$ is equivalent to
$\Pi_{\text{salary}}(\sigma_{\text{salary} < 75000}(\text{instructor}))$
• Each relational algebra operation can be evaluated using one of several different algorithms
– Correspondingly, a relational-algebra expression can be evaluated in many ways.
• An annotated expression specifying a detailed evaluation strategy is called an evaluation plan. E.g.:
– Use an index on salary to find instructors with salary < 75000,
– Or perform a complete relation scan and discard instructors with salary ≥ 75000
BASIC STEPS: OPTIMIZATION (CONT.)
• Query Optimization: Among all equivalent evaluation plans, choose the one with the lowest cost.
– Cost is estimated using statistical information from the
database catalog
• e.g., number of tuples in each relation, size of tuples, etc.
• In this chapter we study
– How to measure query costs
– Algorithms for evaluating relational algebra operations
– How to combine algorithms for individual operations in order to evaluate a complete
expression
MEASURES OF QUERY COST
• Many factors contribute to time cost
– disk access, CPU, and network communication
• Cost can be measured based on
– response time, i.e. total elapsed time for answering query, or
– total resource consumption
• We use total resource consumption as cost metric
– Response time is harder to estimate, and minimizing resource consumption is a good idea in a shared
database
• We ignore CPU costs for simplicity
– Real systems do take CPU cost into account
– Network costs must be considered for parallel systems
• We describe how to estimate the cost of each operation
– We do not include the cost of writing the output to disk
MEASURES OF QUERY COST
• Disk cost can be estimated as:
– Number of seeks * average-seek-cost
– Number of blocks read * average-block-read-cost
– Number of blocks written * average-block-write-cost
• For simplicity we just use the number of block transfers from disk and the number of seeks as the cost measures
– $t_T$: time to transfer one block
• Assuming for simplicity that write cost is the same as read cost
– $t_S$: time for one seek
– Cost for b block transfers plus S seeks:
$b \cdot t_T + S \cdot t_S$
• $t_S$ and $t_T$ depend on where data is stored; with 4 KB blocks:
– High-end magnetic disk: $t_S$ = 4 msec and $t_T$ = 0.1 msec
– SSD: $t_S$ = 20–90 microsec and $t_T$ = 2–10 microsec
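The formula b · t_T + S · t_S is easy to evaluate directly. The sketch below uses the disk timings from the slide (for the SSD, the upper ends of the given ranges) and a hypothetical workload of 400 blocks read either sequentially (one seek) or randomly (one seek per block); `disk_cost_ms` is a hypothetical helper name.

```python
def disk_cost_ms(blocks: int, seeks: int, t_T: float, t_S: float) -> float:
    """Estimated disk cost b * t_T + S * t_S (all times in milliseconds)."""
    return blocks * t_T + seeks * t_S

HDD = {"t_T": 0.1, "t_S": 4.0}      # high-end magnetic disk, from the slide
SSD = {"t_T": 0.010, "t_S": 0.090}  # upper ends of the SSD ranges on the slide

for device, p in (("HDD", HDD), ("SSD", SSD)):
    sequential = disk_cost_ms(400, 1, **p)   # one seek, then 400 consecutive transfers
    random_io = disk_cost_ms(400, 400, **p)  # a seek before every block
    print(f"{device}: sequential {sequential:.2f} ms, random {random_io:.2f} ms")
```

On the magnetic disk the seeks dominate random access (1,600 of the 1,640 ms), which is why the cost model counts seeks separately from transfers.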
MEASURES OF QUERY COST (CONT.)
• Required data may already be resident in the buffer, avoiding disk I/O
– But this is hard to take into account for cost estimation
• Several algorithms can reduce disk I/O by using extra buffer space
– Amount of real memory available to buffer depends on other concurrent queries and OS
processes, known only during execution
• Worst case estimates assume that no data is initially in buffer and only the minimum amount of
memory needed for the operation is available
– But more optimistic estimates are used in practice
SELECTIONS INVOLVING EQUALITY
COST ESTIMATES FOR SELECTION ALGORITHMS
SELECTIONS INVOLVING EQUALITY (2)
COST ESTIMATES FOR SELECTION ALGORITHMS
SELECTIONS INVOLVING COMPARISONS
COST ESTIMATES FOR SELECTION ALGORITHMS
JOIN OPERATION
• Several different algorithms can implement joins
– Nested-loop join
– Block nested-loop join
• The choice is based on a cost estimate
• Examples use the following information
– Number of records: student = 5,000; takes = 10,000
– Number of blocks: student = 100; takes = 400
NESTED-LOOP JOIN
• To compute the theta join r⨝s
for each tuple tr in r do begin
for each tuple ts in s do begin
test pair (tr,ts) to see if they satisfy the join condition 
if they do, add tr • ts to the result.
end
end
• r is called the outer relation and s the inner relation of the join.
• Requires no indices and can be used with any kind of join condition.

• Expensive since it examines every pair of tuples in the two relations.
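The pseudocode above translates almost line for line into Python. In this sketch tuples are Python tuples, so `tr + ts` plays the role of the concatenation tr • ts; the `student` and `takes` rows are hypothetical sample data.

```python
def nested_loop_join(r, s, theta):
    """Tuple-at-a-time nested-loop join: r is the outer relation, s the inner one.
    Needs no index and accepts any join condition theta(tr, ts)."""
    result = []
    for tr in r:            # outer relation: scanned once
        for ts in s:        # inner relation: scanned once per outer tuple
            if theta(tr, ts):
                result.append(tr + ts)  # tr . ts (tuple concatenation)
    return result

# Hypothetical sample data: student(ID, name) and takes(ID, course_id).
student = [(1, "Zhang"), (2, "Shankar")]
takes = [(1, "CS-101"), (1, "CS-347"), (2, "CS-190")]

# Equi-join on ID.
print(nested_loop_join(student, takes, lambda tr, ts: tr[0] == ts[0]))
# Any condition works, e.g. a non-equality theta join.
print(nested_loop_join(student, takes, lambda tr, ts: tr[0] < ts[0]))  # [(1, 'Zhang', 2, 'CS-190')]
```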


NESTED-LOOP JOIN (CONT.)
• In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is
$n_r \cdot b_s + b_r$ block transfers, plus $n_r + b_r$ seeks
(where $n_r$ is the number of tuples of r, and $b_r$, $b_s$ are the numbers of blocks of r and s)
• If the smaller relation fits entirely in memory, use that as the inner relation.
– Reduces cost to $b_r + b_s$ block transfers and 2 seeks
• Assuming worst case memory availability, the cost estimate is
– with student as the outer relation:
• 5000 × 400 + 100 = 2,000,100 block transfers,
• 5000 + 100 = 5100 seeks
– with takes as the outer relation:
• 10000 × 100 + 400 = 1,000,400 block transfers and 10,400 seeks
• If the smaller relation (student) fits entirely in memory, the cost estimate will be 500 block transfers.
• The block nested-loop join algorithm (next slide) is preferable.
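The figures on this slide follow directly from the two formulas; a few lines of arithmetic reproduce them from the student/takes statistics given earlier. The helper names are hypothetical.

```python
def nested_loop_worst_case(n_outer, b_outer, b_inner):
    """One buffer block per relation: n_r * b_s + b_r transfers, n_r + b_r seeks."""
    return n_outer * b_inner + b_outer, n_outer + b_outer

def nested_loop_inner_fits(b_outer, b_inner):
    """Smaller relation held in memory as the inner relation: b_r + b_s transfers, 2 seeks."""
    return b_outer + b_inner, 2

n = {"student": 5000, "takes": 10000}  # number of records
b = {"student": 100, "takes": 400}     # number of blocks

print(nested_loop_worst_case(n["student"], b["student"], b["takes"]))  # (2000100, 5100)
print(nested_loop_worst_case(n["takes"], b["takes"], b["student"]))    # (1000400, 10400)
print(nested_loop_inner_fits(b["takes"], b["student"]))                # (500, 2)
```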
BLOCK NESTED-LOOP JOIN
• Variant of nested-loop join in which every block of the inner relation is paired with every block of the outer
relation.
    for each block Br of r do begin
        for each block Bs of s do begin
            for each tuple tr in Br do begin
                for each tuple ts in Bs do begin
                    check if (tr, ts) satisfy the join condition
                    if they do, add tr • ts to the result
                end
            end
        end
    end
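The block variant can be sketched in Python as well. Here `to_blocks` and its block size are hypothetical stand-ins for disk pages, and the counter shows that each inner block is fetched once per outer block, giving b_r × b_s inner block reads.

```python
def to_blocks(r, tuples_per_block):
    """Split a relation into fixed-size blocks (a hypothetical page size)."""
    return [r[i:i + tuples_per_block] for i in range(0, len(r), tuples_per_block)]

def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Block nested-loop join; also counts how many inner blocks are read."""
    result = []
    inner_block_reads = 0
    for Br in r_blocks:          # each outer block ...
        for Bs in s_blocks:      # ... is paired with every inner block
            inner_block_reads += 1
            for tr in Br:
                for ts in Bs:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, inner_block_reads

# Hypothetical sample data, two tuples per block.
student = [(1, "Zhang"), (2, "Shankar"), (3, "Brandt")]
takes = [(1, "CS-101"), (1, "CS-347"), (2, "CS-190"), (3, "HIS-351")]

result, reads = block_nested_loop_join(to_blocks(student, 2), to_blocks(takes, 2),
                                       lambda tr, ts: tr[0] == ts[0])
print(len(result), reads)  # 4 4
```

The join result is the same as with the tuple-at-a-time version; only the number of inner scans changes, from one per outer tuple to one per outer block.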
BLOCK NESTED-LOOP JOIN (CONT.)
• Worst case estimate: $b_r \cdot b_s + b_r$ block transfers + $2 \cdot b_r$ seeks
– Each block in the inner relation s is read once for each block in the outer relation
• Best case: $b_r + b_s$ block transfers + 2 seeks.

• Improvements to nested loop and block nested loop algorithms:


– In block nested-loop, use M − 2 disk blocks as the blocking unit for the outer relation, where M = memory size in
blocks; use the remaining two blocks to buffer the inner relation and the output
• Cost = $\lceil b_r / (M-2) \rceil \cdot b_s + b_r$ block transfers +
$2 \lceil b_r / (M-2) \rceil$ seeks
– If the equi-join attribute forms a key on the inner relation, stop the inner loop on the first match
– Scan the inner loop forward and backward alternately, to make use of the blocks remaining in the buffer (with LRU
replacement)
– Use an index on the inner relation if available
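The block nested-loop estimates can be compared numerically. This sketch uses student as the outer relation (b_r = 100) and takes as the inner one (b_s = 400); the buffer sizes M = 52 and M = 102 are hypothetical, and `bnlj_cost` is a hypothetical helper name.

```python
from math import ceil

def bnlj_cost(b_outer, b_inner, M=None):
    """Block nested-loop join estimate (block transfers, seeks).
    M=None: worst case, b_r * b_s + b_r transfers and 2 * b_r seeks.
    M given: the outer relation is read in chunks of M - 2 blocks."""
    chunks = b_outer if M is None else ceil(b_outer / (M - 2))
    return chunks * b_inner + b_outer, 2 * chunks

print(bnlj_cost(100, 400))         # (40100, 200)
print(bnlj_cost(100, 400, M=52))   # (900, 4)
print(bnlj_cost(100, 400, M=102))  # (500, 2)
```

With M = 52 the outer relation is read in ⌈100/50⌉ = 2 chunks, and once M − 2 ≥ b_r the estimate reaches the best case b_r + b_s. Even the worst case (40,100 transfers) is far below the 2,000,100 of the tuple-at-a-time nested-loop join.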
QUERY
EVALUATION
• EQUIVALENCE RULES

BASIC STEPS IN QUERY PROCESSING
1. Parsing and translation

2. Optimization

3. Evaluation
INTRODUCTION
• Alternative ways of evaluating a given query
– Equivalent expressions
– Different algorithms for each operation
INTRODUCTION (CONT.)
• An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the
operations is coordinated.
INTRODUCTION (CONT.)
• Cost difference between evaluation plans for a query can be enormous
– E.g., seconds vs. days in some cases
• Steps in cost-based query optimization
1. Generate logically equivalent expressions using equivalence rules
2. Annotate resultant expressions to get alternative query plans
3. Choose the cheapest plan based on estimated cost
• Estimation of plan cost based on:
– Statistical information about relations. Examples:
• number of tuples, number of distinct values for an attribute
– Statistics estimation for intermediate results
• to compute cost of complex expressions
– Cost formulae for algorithms, computed using statistics
TRANSFORMATION OF RELATIONAL
EXPRESSIONS
• Two relational algebra expressions are said to be equivalent if the two expressions generate the same set of
tuples on every legal database instance
– Note: order of tuples is irrelevant
– we don’t care if they generate different results on databases that violate integrity constraints
• In SQL, inputs and outputs are multisets of tuples
– Two expressions in the multiset version of the relational algebra are said to be equivalent if the two
expressions generate the same multiset of tuples on every legal database instance.
• An equivalence rule says that expressions of two forms are equivalent
– Can replace expression of first form by second, or vice versa
EQUIVALENCE RULES
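1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
$\sigma_{\theta_1 \land \theta_2}(E) \equiv \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2. Selection operations are commutative.
$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) \equiv \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3. Only the last in a sequence of projection operations is needed; the others can be omitted.
$\Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) \equiv \Pi_{L_1}(E)$
where $L_1 \subseteq L_2 \subseteq \ldots \subseteq L_n$
4. Selections can be combined with Cartesian products and theta joins.
a. $\sigma_\theta(E_1 \times E_2) \equiv E_1 \bowtie_\theta E_2$
b. $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) \equiv E_1 \bowtie_{\theta_1 \land \theta_2} E_2$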
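Rules 1, 2 and 4a can be illustrated on a small instance. Checking one instance does not prove an equivalence (the rules must hold on every legal database), but it shows what each side computes. Relations are modelled as Python sets of tuples, so set equality matches the relational notion of the same set of tuples; the data and helper names are hypothetical.

```python
from itertools import product

# Hypothetical sample data: instructor(ID, dept_name, salary), teaches(ID, course_id).
instructor = {(10101, "Physics", 95000), (10102, "Physics", 87000), (10103, "Music", 40000)}
teaches = {(10101, "PHY-101"), (10103, "MU-199")}

def select(pred, r):
    return {t for t in r if pred(t)}

def cross(r, s):
    return {tr + ts for tr, ts in product(r, s)}

def theta_join(r, s, theta):
    """Evaluated pair by pair, without building r x s first."""
    return {tr + ts for tr in r for ts in s if theta(tr + ts)}

theta1 = lambda t: t[1] == "Physics"
theta2 = lambda t: t[2] > 90000
on_id = lambda t: t[0] == t[3]  # instructor.ID = teaches.ID in the concatenated tuple

rule1 = select(lambda t: theta1(t) and theta2(t), instructor) == select(theta1, select(theta2, instructor))
rule2 = select(theta1, select(theta2, instructor)) == select(theta2, select(theta1, instructor))
rule4a = select(on_id, cross(instructor, teaches)) == theta_join(instructor, teaches, on_id)
print(rule1, rule2, rule4a)  # True True True
```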
EQUIVALENCE RULES (CONT.)
5. Theta-join operations (and natural joins) are commutative.

$E_1 \bowtie_\theta E_2 \equiv E_2 \bowtie_\theta E_1$

6. (a) Natural join operations are associative:

$(E_1 \bowtie E_2) \bowtie E_3 \equiv E_1 \bowtie (E_2 \bowtie E_3)$

(b) Theta joins are associative in the following manner:

$(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \land \theta_3} E_3 \equiv E_1 \bowtie_{\theta_1 \land \theta_3} (E_2 \bowtie_{\theta_2} E_3)$

where $\theta_2$ involves attributes from only $E_2$ and $E_3$.


PICTORIAL DEPICTION OF EQUIVALENCE
RULES
EQUIVALENCE RULES (CONT.)
7. The selection operation distributes over the theta join operation under the following two
conditions:
(a) When all the attributes in $\theta_0$ involve only the attributes of one
of the expressions ($E_1$) being joined.

$\sigma_{\theta_0}(E_1 \bowtie_\theta E_2) \equiv (\sigma_{\theta_0}(E_1)) \bowtie_\theta E_2$

(b) When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves
only the attributes of $E_2$.
$\sigma_{\theta_1 \land \theta_2}(E_1 \bowtie_\theta E_2) \equiv (\sigma_{\theta_1}(E_1)) \bowtie_\theta (\sigma_{\theta_2}(E_2))$
TRANSFORMATION EXAMPLE: PUSHING
SELECTIONS

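• Query: Find the names of all instructors in the Music department, along with the titles of the
courses that they teach
– $\Pi_{\text{name}, \text{title}}(\sigma_{\text{dept\_name} = \text{``Music''}}(\text{instructor} \bowtie (\text{teaches} \bowtie \Pi_{\text{course\_id}, \text{title}}(\text{course}))))$
• Transformation using rule 7a:
– $\Pi_{\text{name}, \text{title}}((\sigma_{\text{dept\_name} = \text{``Music''}}(\text{instructor})) \bowtie (\text{teaches} \bowtie \Pi_{\text{course\_id}, \text{title}}(\text{course})))$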
• Performing the selection as early as possible reduces the size of the relation to be joined.
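The effect of pushing the selection down can be measured on a small hypothetical instance: both plans return the same answer, but the join that involves instructor shrinks from 10 tuples to 1. Joins here are natural joins on shared attribute names, and `course` is given only (course_id, title), so the projection on course is a no-op in this sketch.

```python
# Hypothetical sample data; joins are natural joins on attributes with the same name.
def natural_join(r, s):
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in tr.keys() & ts.keys())]

def select(pred, r):
    return [t for t in r if pred(t)]

def project(attrs, r):
    out = []
    for t in r:
        row = {a: t[a] for a in attrs}
        if row not in out:  # set semantics
            out.append(row)
    return out

instructor = [{"ID": i, "name": f"I{i}", "dept_name": "Music" if i == 0 else "Physics"} for i in range(10)]
teaches = [{"ID": i, "course_id": f"C{i}"} for i in range(10)]
course = [{"course_id": f"C{i}", "title": f"Title {i}"} for i in range(10)]

is_music = lambda t: t["dept_name"] == "Music"
tc = natural_join(teaches, course)

# Original expression: select after joining all instructors.
big = natural_join(instructor, tc)
plan_a = project(["name", "title"], select(is_music, big))

# Rule 7a: push the selection down to instructor before the join.
small = natural_join(select(is_music, instructor), tc)
plan_b = project(["name", "title"], small)

print(len(big), len(small))  # 10 1
print(plan_a == plan_b, plan_b)  # True [{'name': 'I0', 'title': 'Title 0'}]
```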
MULTIPLE TRANSFORMATIONS
JOIN ORDERING EXAMPLE
• For all relations r1, r2, and r3,
$(r_1 \bowtie r_2) \bowtie r_3 = r_1 \bowtie (r_2 \bowtie r_3)$
(Join associativity)
• If $r_2 \bowtie r_3$ is quite large and $r_1 \bowtie r_2$ is small, we choose

$(r_1 \bowtie r_2) \bowtie r_3$
so that we compute and store a smaller temporary relation.
JOIN ORDERING EXAMPLE (CONT.)
• Consider the expression
$\Pi_{\text{name}, \text{title}}((\sigma_{\text{dept\_name} = \text{``Music''}}(\text{instructor}) \bowtie \text{teaches}) \bowtie \Pi_{\text{course\_id}, \text{title}}(\text{course}))$

• We could compute $\text{teaches} \bowtie \Pi_{\text{course\_id}, \text{title}}(\text{course})$ first, and join the result with

$\sigma_{\text{dept\_name} = \text{``Music''}}(\text{instructor})$

• but the result of the first join is likely to be a large relation.


• Only a small fraction of the university’s instructors are likely to be from the Music department
– it is better to compute
$\sigma_{\text{dept\_name} = \text{``Music''}}(\text{instructor}) \bowtie \text{teaches}$
first.
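The two join orders can be compared by the size of the temporary relation each one stores. The instance is hypothetical (10 instructors, only one in Music, each teaching 2 courses) and joins are natural joins on shared attribute names; both orders produce the same final result.

```python
def natural_join(r, s):
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in tr.keys() & ts.keys())]

# Hypothetical data: 10 instructors, only one in Music, each teaching 2 courses.
instructor = [{"ID": i, "dept_name": "Music" if i == 0 else "Physics"} for i in range(10)]
teaches = [{"ID": i, "course_id": f"C{i}-{k}"} for i in range(10) for k in range(2)]
course = [{"course_id": f"C{i}-{k}", "title": f"Course {i}.{k}"} for i in range(10) for k in range(2)]
music = [t for t in instructor if t["dept_name"] == "Music"]

# Order A: (music JOIN teaches) JOIN course
temp_a = natural_join(music, teaches)
final_a = natural_join(temp_a, course)

# Order B: music JOIN (teaches JOIN course)
temp_b = natural_join(teaches, course)
final_b = natural_join(music, temp_b)

print(len(temp_a), len(temp_b))  # 2 20
by_course = lambda t: t["course_id"]
print(sorted(final_a, key=by_course) == sorted(final_b, key=by_course))  # True
```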
COST ESTIMATION
• To compute the cost of each operator:

– Need statistics of input relations


• E.g., number of tuples, sizes of tuples

• Inputs can be results of sub-expressions

– Need to estimate statistics of expression results


– To do so, we require additional statistics
• E.g., number of distinct values for an attribute
CHOICE OF EVALUATION PLANS

We must consider the interaction of evaluation techniques when choosing evaluation plans: choosing the
cheapest algorithm for each operation independently may not yield the best overall algorithm.
VIEWING QUERY EVALUATION PLANS
• Most databases support EXPLAIN <query>
– Displays plan chosen by query optimizer, along with cost estimates
– Some syntax variations between databases
• Oracle: explain plan for <query> followed by select * from table (dbms_xplan.display)
• SQL Server: set showplan_text on

• Some databases (e.g. PostgreSQL) support explain analyse <query>


– Shows actual runtime statistics found by running the query, in addition to showing the plan
• Some databases (e.g. PostgreSQL) show cost as f..l
– f is the cost of delivering first tuple and l is cost of delivering all results
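The slide lists EXPLAIN variants for Oracle, SQL Server and PostgreSQL. As a self-contained stand-in that runs without a database server, the sketch below uses SQLite (not mentioned in the slides) through Python's built-in `sqlite3` module; SQLite's equivalent command is `EXPLAIN QUERY PLAN`. The table, the index name `idx_instructor_dept`, and the queries are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)")
# Hypothetical index, so that the optimizer has an alternative access path to choose.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")

def explain(sql):
    """Return the 'detail' column (last field) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()]

indexed = explain("SELECT name FROM instructor WHERE dept_name = 'Music'")
scanned = explain("SELECT name FROM instructor WHERE salary > 90000")
print(indexed)  # typically a SEARCH ... USING INDEX idx_instructor_dept line
print(scanned)  # typically a SCAN line, since there is no index on salary
conn.close()
```

The two queries differ only in the selection predicate, yet the chosen plans differ (index lookup versus full relation scan), mirroring the salary example in the optimization section.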
REFERENCE
Silberschatz, Korth, and Sudarshan. Database System Concepts – 7th Edition. McGraw-Hill. 2019.

Slides adapted from the Database System Concepts slides.

Source: https://www.db-book.com/db7/slides-dir/index.html
