Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
7 views

08 Query Processing Strategies and Optimization

Uploaded by

beshahashenafi32
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views

08 Query Processing Strategies and Optimization

Uploaded by

beshahashenafi32
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 32

Query Processing

Strategies and
Optimization
CPS352: Database Systems

Simon Miner
Gordon College
Last Revised: 10/25/12
Agenda

• Check-in

• Design Project Presentations

• Query Processing

• Programming Project

• Exam 1 (time permitting)

• Homework 4
Check-in
Design Project
Presentations
Query Processing and
Optimization
Different Ways to Execute
Queries
• Database creates a plan to get the results for a query
• Not just one way to do this.

• Example : Find the titles of all books written by


Korth.
• π title σ author = ‘Korth’ Book |X| BookAuthor
• π title Book |X| σ author = ‘Korth’ BookAuthor

• Good DBMS will transform queries to make them


as efficient as possible
• Minimize disk accesses
Selection Strategies
• Linear search – full table scan
• Cost of potentially accessing each disk block containing the
desired data

• Binary search (with B+ tree index)


• Exact matches
• Multiple matches
• Range queries
• Complex queries

• Index often requires disk accesses the index structure as


well as for actual data
• Typically far fewer for data than linear search
• Index root and (perhaps) first level kept in buffer pool
Query Type vs. Index Type

Condition Example Clustering / Secondary Hashed Index


Primary Index Index
Exact match id = 12345 Great! Great! Great!
on candidate
key
Exact match status = N/A Find first Find first
on non-key ‘Active’ match (+ match (+
potential scan) potential scan)
Range query age between 21 Find first Less helpful Not useful
and 65 match +
sequential scan
Complex query color = ‘blue’ Not useful Not useful Not useful
or status = (multiple
‘Inactive’ indexes help)
Join Strategies

• Joins are most expensive part of query processing


• Number of tuples examined can approach the product of the
number of records in tables being joined

• Example
• σ Borrower.lastName = BookAuthor.authorName Borrower X BookAuthor
• Where BookAuthor has 10K tuples and Borrower has 2K tuples
• Cartesian join yields 20 million tuples to process
Nested Loop Join
Nested Block Join
Buffering an Entire Relation
Using Indexes to Speed Up
Joins
• Example: Borrower |X| CheckedOut
• Assume
• 2K Borrower tuples, 1K CheckedOut tuples
• 20 records per block (so 100 and 50 blocks for each table, respectively)
• We cannot buffer either table entirely
• Without indexes – nested block join takes 5050 or 5100 disk accesses,
depending on which table is in the outer loop
• With index on Borrower.borrowerID – exactly one match (PK)
• Scan all 1000 CheckedOut records (50 blocks) – each matches exactly one
Borrower record, which can be looked up in the index
• Requires processing only 2000 tuples
• Not quite as good as it seems
• Each borrower may require a separate disk access (50 + 1000 = 1050 accesses)
• Traversing index might take multiple disk accesses (especially B+ Tree indexes)
Temporary Indexes

• Indexes created and buffered for the purpose of a single


query and then discarded
• Example: neither Borrower nor CheckedOut is indexed
• Borrower |X| CheckedOut might cause a temporary index
to be built on Borrower.borrowerID
• If each (dense) index entry takes ~10 bytes, entire index will
be ~20K
• Index construction requires reading all 2K borrowers = 100
disk accesses
• Join itself costs up to 1050 disk accesses (see previous slide)
• Total of 1150 disk accesses
Merge Join
Order of Joins
• For multiple joins, performance can be greatly impacted by the order
in which the joins are done
• Example
• π last, first, authorName Borrower |X| BookAuthor |X| CheckedOut
• Assume 2K borrowers, 1K CheckedOut records, and 10K authors
• Each book has an average of 2 authors
• 3 ways to do the (binary commutative) join operations
• ( Borrower|X| BookAuthor ) |X| CheckedOut
• ( BookAuthor |X| CheckedOut ) |X| Borrower
• ( Borrower |X| CheckedOut ) \X| BookAuthor
• Final number of tuples is the same, but intermediate joins create
temporary tables which are then joined with the remaining table
• Which way is most efficient in light of this?
Rules of Equivalence
• Two formulations of a query are equivalent if the
produce the same set of results
• Not necessarily in the same order

• Example : Find the titles of all books written by Korth.


• select title
from Book natural join BookAuthor
where authorName = ‘Korth’;
• Equivalent relational algebra queries
• π title σ author = ‘Korth’ Book |X| BookAuthor

• π title Book |X| σ author = ‘Korth’ BookAuthor

• Equivalent, but the same in terms of performance


Equivalence Rules

1. Conjunctive selection operations can be deconstructed into a


sequence of individual selections.
s q Ùq ( E ) = s q (s q ( E ))
1 2 1 2
2. Selection operations are commutative.
s q (s q ( E )) = s q (s q ( E ))
1 2 2 1

3. Only the last in a sequence of projection operations is


needed, the others can be omitted.
 L1 ( L2 (( Ln ( E ))))   L1 ( E )

4. Selections can be combined with Cartesian products and


theta joins.
a. (E1 X E2) = E1  E2
b. 1(E1 2 E2) = E1 1 2 E2

Database System Concepts - 6th Edition 1.18 ©Silberschatz, Korth and Sudarshan
Equivalence Rules (Cont.)
5. Theta-join operations (and natural joins) are commutative.
E1  E2 = E2  E1
6. (a) Natural join operations are associative:
(E1 E2) E3 = E1 (E2 E3)

(b) Theta joins are associative in the following manner:

(E1 1 E2) 2 3 E3 = E1 1 3 (E2 2 E3)

where 2 involves attributes from only E2 and E3.

Database System Concepts - 6th Edition 1.19 ©Silberschatz, Korth and Sudarshan
Equivalence Rules (Cont.)
7. The selection operation distributes over the theta join operation
under the following two conditions:
(a) When all the attributes in 0 involve only the attributes of one
of the expressions (E1) being joined.

0E1  E2) = (0(E1))  E2

(b) When  1 involves only the attributes of E1 and 2 involves


only the attributes of E2.
1 E1  E2) = (1(E1))  ( (E2))

Database System Concepts - 6th Edition 1.20 ©Silberschatz, Korth and Sudarshan
Equivalence Rules (Cont.)
8. The projection operation distributes over the theta join operation
as follows:
(a) if  involves only attributes from L1  L2:
 L1 L2 ( E1  E2 )  ( L1 ( E1 ))  ( L2 ( E2 ))

(b) Consider a join E1  E2.


l Let L1 and L2 be sets of attributes from E1 and E2,
respectively.
l Let L3 be attributes of E1 that are involved in join condition ,
but are not in L1  L2, and
l let L4 be attributes of E2 that are involved in join condition ,
but are not in L1  L2.
 L  L ( E1
1 2  E2 )   L  L (( L  L ( E1 ))
1 2 1 3  ( L  L ( E 2 )))
2 4

Database System Concepts - 6th Edition 1.21 ©Silberschatz, Korth and Sudarshan
Equivalence Rules (Cont.)

9. The set operations union and intersection are commutative


E1  E2 = E2  E1
E1  E2 = E2  E1
n (set difference is not commutative).
10. Set union and intersection are associative.
(E1  E2)  E3 = E1  (E2  E3)
(E1  E2)  E3 = E1  (E2  E3)
11. The selection operation distributes over ,  and –.
 (E1 – E2) =  (E1) – (E2)
and similarly for  and  in place of –
Also:  (E1 – E2) = (E1) – E2
and similarly for  in place of –, but not for 
12. The projection operation distributes over union
L(E1  E2) = (L(E1))  (L(E2))
Database System Concepts - 6th Edition 1.22 ©Silberschatz, Korth and Sudarshan
Push Selections Inward
• Do selections as early as possible
• Reduces (“flattens”) the number of records in the relation(s) being
joined

• Example:
• π title σ author = ‘Korth’ Book |X| BookAuthor
• π title Book |X| σ author = ‘Korth’ BookAuthor

• Sometimes this is not feasible


• σ Borrower.lastName = BookAuthor.authorName Borrower X BookAuthor
• i.e. when there are no shared attributes

• Alter the structure of the selection itself


• Find late checked out books that cost more than $20.00.
• σ purchasePrice > 20 ∧ dateDue < today Book |X| CheckedOut
• σ purchasePrice > 20 Book |X|σ dateDue < today CheckedOut
Push Projections Inward

• Do projections as early as possible


• Reduces (“narrows”) the number of columns in the relation(s)
being joined

• Example:
• π lastName, firstName, title, dateDue Borrower|X| CheckedOut |X| Book
• π lastName, firstName, title, dateDue Borrower|X|
(π borrowerID, title, dateDue CheckedOut |X| Book )
• Reduces the number of columns in the temporary table from the
intermediate join
Statistics and Query
Optimization
• Using statistics about database objects can help speed
up queries

• Maintaining statistics as the data in the database


changes is a manageable process

• Types of statistics
• Table statistics
• Column statistics
Table Statistics

• On a relation r
• nr = number of tuples in the relation
• br = number of blocks used by the relation
• lr = size (in bytes) of a tuple in the relation
• fr = blocking factor, number of tuples per block
• Note that fr = floor( block size / lr ) if tuples do not span
blocks
• Note that br = ceiling( nr / fr ) if tuples in r reside in a single
file and are not clustered with other relations
Column Statistics
• on a column A
• V( A, r ) = number of distinct values in the column
• If A is a superkey, then V( A, r ) = nr
• If A is not a superkey, the number of times each
column value occurs can be estimated by nr / V( A, r )
• If column A is indexed, V( A, r ) s relatively easy to
maintain
• Keep track of the count of entries in the index

• Histogram of the relative frequency of column


values in different ranges
Estimating the Size of a Join
• Cartesian product– r X s
• Number of tuples in join = nr X s = nr * ns
• Size of each tuple in join = lr X s = lr + ls

• Natural join – r |X| s, where r and s have A in common


• The size of the join can be estimated in two ways
• The ns tuples of s will join with nr / V( A, r ) tuples of r
for ns * nr / V( A, r ) total tuples
• The nr tuples of r will join with ns / V( A, s ) tuples of s
for nr * ns / V( A, s ) total tuples
• We want to use the smaller of these estimates
• min(nr * ns / V(A, s) , ns * nr / V(A, r) ) = ns * nr / max( V(A, r), V(A, s) )
• Also note that V(A, r |X| s) = min( V(A, r), V(A, s) )
• Some tuples in the relation with the larger number of column values do not join
with any tuples in the other relation
Example Join Estimation
• π last, first, authorName Borrower |X| BookAuthor |X| CheckedOut

• 3 ways to do the join operations – Which is most efficient?


• ( Book |X| BookAuthor ) |X| CheckedOut
• ( BookAuthor |X| CheckedOut ) |X| Borrower
• ( Borrower |X| CheckedOut |X| BookAuthor

• Statistics
nr V( A, r )
nBorrower = 2000 V( borrowerID, Borrower ) = 2000
nCheckedOut = 1000 V( borrower, CheckedOut ) = 100
nBookAuthor = 10,000 V( callNo, CheckedOut ) = 500
V( callNo, BookAuthor ) = 5000
Programming Project
Part I
Exam 1
Homework 4

You might also like