08 Query Processing Strategies and Optimization
CPS352: Database Systems
Simon Miner
Gordon College
Last Revised: 10/25/12
Agenda
• Check-in
• Query Processing
• Programming Project
• Homework 4
Check-in
Design Project
Presentations
Query Processing and Optimization
Different Ways to Execute Queries
• The database creates a plan for computing the results of a query
• There is usually more than one possible plan
• Example
• σ Borrower.lastName = BookAuthor.authorName (Borrower X BookAuthor)
• Where BookAuthor has 10K tuples and Borrower has 2K tuples
• The Cartesian product yields 20 million tuples to process
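The 20 million figure is simply the product of the two relation sizes. A minimal Python sketch of the naive plan follows, run on a scaled-down pair of made-up relations. The function name and the sample values are assumptions for illustration, not data from the course database. It shows that the selection has to examine every pair the Cartesian product generates.

```python
from itertools import product

def naive_select_over_product(borrowers, book_authors):
    """Evaluate the selection over the full Cartesian product.

    Returns the matching pairs and the number of pairs examined.
    """
    examined = 0
    matches = []
    for last_name, author_name in product(borrowers, book_authors):
        examined += 1
        if last_name == author_name:
            matches.append((last_name, author_name))
    return matches, examined

# Hypothetical, scaled-down stand-ins for Borrower.lastName and
# BookAuthor.authorName (the slide's tables have 2K and 10K tuples).
borrowers = ["Adams", "Korth", "Miner"]
book_authors = ["Korth", "Silberschatz", "Sudarshan", "Korth"]

matches, examined = naive_select_over_product(borrowers, book_authors)
print(examined)        # 3 * 4 = 12 pairs examined
print(len(matches))    # 2 matching pairs
print(2_000 * 10_000)  # 20000000 pairs at the slide's sizes
```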
Nested Loop Join
Nested Block Join
Buffering an Entire Relation
Using Indexes to Speed Up Joins
• Example: Borrower |X| CheckedOut
• Assume
• 2K Borrower tuples, 1K CheckedOut tuples
• 20 records per block (so 100 and 50 blocks for each table, respectively)
• We cannot buffer either table entirely
• Without indexes: nested block join takes 5050 disk accesses with CheckedOut
in the outer loop (50 + 50 × 100), or 5100 with Borrower in the outer loop
(100 + 100 × 50)
• With index on Borrower.borrowerID: exactly one match (PK)
• Scan all 1000 CheckedOut records (50 blocks); each matches exactly one
Borrower record, which can be looked up in the index
• Requires processing only 2000 tuples (1000 CheckedOut plus 1000 matching Borrower)
• Not quite as good as it seems
• Each borrower may require a separate disk access (50 + 1000 = 1050 accesses)
• Traversing index might take multiple disk accesses (especially B+ Tree indexes)
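The disk-access counts above can be reproduced with a small Python sketch. The function names and the `accesses_per_lookup` parameter are assumptions made for illustration. The cost formulas are the simple ones implied by the bullets (read each outer block once and re-read the inner relation per outer block, or do one lookup per outer tuple), not a model of any particular DBMS.

```python
import math

def nested_block_join_cost(outer_blocks, inner_blocks):
    """Disk accesses when neither relation fits in the buffer:
    read each outer block once, and re-read the whole inner
    relation for every outer block."""
    return outer_blocks + outer_blocks * inner_blocks

def indexed_join_cost(outer_blocks, outer_tuples, accesses_per_lookup=1):
    """Scan the outer relation once, then do one index-driven fetch
    per outer tuple. accesses_per_lookup is an assumed knob: 1 means
    only the block holding the matching record is read."""
    return outer_blocks + outer_tuples * accesses_per_lookup

RECORDS_PER_BLOCK = 20
borrower_blocks = math.ceil(2000 / RECORDS_PER_BLOCK)     # 100
checked_out_blocks = math.ceil(1000 / RECORDS_PER_BLOCK)  # 50

# CheckedOut in the outer loop, then Borrower in the outer loop.
print(nested_block_join_cost(checked_out_blocks, borrower_blocks))  # 5050
print(nested_block_join_cost(borrower_blocks, checked_out_blocks))  # 5100

# Index on Borrower.borrowerID, one disk access per lookup.
print(indexed_join_cost(checked_out_blocks, 1000))                  # 1050

# Hypothetical case: each lookup also reads two index nodes from disk.
print(indexed_join_cost(checked_out_blocks, 1000, accesses_per_lookup=3))  # 3050
```

Raising `accesses_per_lookup` models the last bullet: if every lookup also has to read index nodes from disk, the advantage over the nested block join shrinks quickly. The value 3 is arbitrary.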
Temporary Indexes
Database System Concepts - 6th Edition 1.18 ©Silberschatz, Korth and Sudarshan
Equivalence Rules (Cont.)
5. Theta-join operations (and natural joins) are commutative.
E1 |X|θ E2 = E2 |X|θ E1
6. (a) Natural join operations are associative:
(E1 |X| E2) |X| E3 = E1 |X| (E2 |X| E3)
Equivalence Rules (Cont.)
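7. The selection operation distributes over the theta join operation
under the following two conditions:
(a) When all the attributes in θ0 involve only the attributes of one
of the expressions (E1) being joined:
σ θ0 (E1 |X|θ E2) = (σ θ0 (E1)) |X|θ E2
(b) When θ1 involves only the attributes of E1 and θ2 involves only
the attributes of E2:
σ θ1 ∧ θ2 (E1 |X|θ E2) = (σ θ1 (E1)) |X|θ (σ θ2 (E2))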
Equivalence Rules (Cont.)
8. The projection operation distributes over the theta join operation
as follows:
(a) if θ involves only attributes from L1 ∪ L2:
π L1 ∪ L2 (E1 |X|θ E2) = (π L1 (E1)) |X|θ (π L2 (E2))
Equivalence Rules (Cont.)
• Example: pushing a selection below a join
• π title (σ author = ‘Korth’ (Book |X| BookAuthor))
• is equivalent to π title (Book |X| σ author = ‘Korth’ (BookAuthor))
• Example: pushing a projection below a join
• π lastName, firstName, title, dateDue (Borrower |X| CheckedOut |X| Book)
• is equivalent to π lastName, firstName, title, dateDue (Borrower |X|
(π borrowerID, title, dateDue (CheckedOut |X| Book)))
• Reduces the number of columns in the temporary table from the
intermediate join
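The first example above can be checked on a few sample rows. The Python sketch below is only an illustration. The `natural_join` helper, the call numbers, and the assumption that Book and BookAuthor share a `callNo` attribute are all made up for the demonstration. Both plans return the same titles, but the plan that selects first builds a much smaller intermediate result.

```python
def natural_join(left, right, key):
    """Join two lists of dicts on a shared attribute (simple nested loop)."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

# Hypothetical sample rows; the call numbers are invented.
book = [
    {"callNo": "QA76.1", "title": "Database System Concepts"},
    {"callNo": "QA76.2", "title": "Operating System Concepts"},
    {"callNo": "QA76.3", "title": "Compilers"},
]
book_author = [
    {"callNo": "QA76.1", "author": "Korth"},
    {"callNo": "QA76.1", "author": "Silberschatz"},
    {"callNo": "QA76.2", "author": "Silberschatz"},
    {"callNo": "QA76.3", "author": "Aho"},
]

# Plan 1: join first, then select, then project onto title.
joined = natural_join(book, book_author, "callNo")
plan1 = {row["title"] for row in joined if row["author"] == "Korth"}

# Plan 2: select on BookAuthor first, then join, then project.
korth_only = [row for row in book_author if row["author"] == "Korth"]
joined_small = natural_join(book, korth_only, "callNo")
plan2 = {row["title"] for row in joined_small}

print(plan1 == plan2)                  # True: same answer either way
print(len(joined), len(joined_small))  # 4 1: intermediate result sizes
```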
Statistics and Query Optimization
• Using statistics about database objects can help speed up queries
• Types of statistics
• Table statistics
• Column statistics
Table Statistics
• On a relation r
• nr = number of tuples in the relation
• br = number of blocks used by the relation
• lr = size (in bytes) of a tuple in the relation
• fr = blocking factor, number of tuples per block
• Note that fr = floor( block size / lr ) if tuples do not span
blocks
• Note that br = ceiling( nr / fr ) if tuples in r reside in a single
file and are not clustered with other relations
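A minimal Python sketch of the two formulas above. The 4096-byte block size and 200-byte tuple size are assumed values, chosen only because they reproduce the 20 records per block used in the earlier join example. The function names are likewise illustrative.

```python
import math

def blocking_factor(block_size, tuple_size):
    """fr = floor(block size / lr), valid when tuples do not span blocks."""
    return block_size // tuple_size

def blocks_used(n_tuples, f):
    """br = ceiling(nr / fr), valid when r is stored in a single file
    and is not clustered with other relations."""
    return math.ceil(n_tuples / f)

BLOCK_SIZE = 4096  # bytes, assumed
TUPLE_SIZE = 200   # bytes (lr), assumed

f_r = blocking_factor(BLOCK_SIZE, TUPLE_SIZE)
print(f_r)                     # 20 tuples per block
print(blocks_used(2000, f_r))  # 100 blocks for 2000 Borrower tuples
print(blocks_used(1000, f_r))  # 50 blocks for 1000 CheckedOut tuples
print(blocks_used(1001, f_r))  # 51: a partly filled block still counts
```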
Column Statistics
• On a column A
• V( A, r ) = number of distinct values in the column
• If A is a superkey, then V( A, r ) = nr
• If A is not a superkey, the number of times each
column value occurs can be estimated by nr / V( A, r )
• If column A is indexed, V( A, r ) is relatively easy to
maintain
• Keep track of the count of entries in the index
• Statistics
    nr                        V( A, r )
    nBorrower = 2000          V( borrowerID, Borrower ) = 2000
    nCheckedOut = 1000        V( borrower, CheckedOut ) = 100
    nBookAuthor = 10,000      V( callNo, CheckedOut ) = 500
                              V( callNo, BookAuthor ) = 5000
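The estimate nr / V( A, r ) can be applied directly to the statistics listed above. The sketch below is a hedged illustration in Python. The function name and the dictionary layout are assumptions, and the estimate itself assumes values are spread evenly across tuples.

```python
def estimated_occurrences(n_r, distinct_values):
    """Estimate how many tuples share one value of column A,
    assuming values are spread evenly: nr / V(A, r)."""
    return n_r / distinct_values

# Statistics copied from the slide: (nr, V(A, r)) for each column.
stats = {
    ("Borrower", "borrowerID"): (2000, 2000),
    ("CheckedOut", "borrower"): (1000, 100),
    ("CheckedOut", "callNo"): (1000, 500),
    ("BookAuthor", "callNo"): (10_000, 5000),
}

estimates = {
    column: estimated_occurrences(n_r, v)
    for column, (n_r, v) in stats.items()
}
for (relation, attribute), estimate in estimates.items():
    print(f"{relation}.{attribute}: about {estimate:g} tuples per value")
```

For example, each borrower appearing in CheckedOut is estimated to have about 10 books checked out, while borrowerID in Borrower (a key) gives exactly 1 tuple per value.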
Programming Project
Part I
Exam 1
Homework 4