
Query Optimization

The document discusses how queries are processed in SQL. It explains that a query processor takes the SQL query and optimizes it by determining the most efficient execution plan. The key steps are: 1) Scanning, parsing, and validating the query 2) Optimizing the query plan using either heuristic or cost-based methods 3) Generating executable query code 4) Executing the optimized plan against the database.

Uploaded by

ndev84


QUERY PROCESSING and OPTIMIZATION

How are queries processed in SQL?

• SQL is a nonprocedural language: we specify what we need without saying how to get it.

• With higher-level database query languages such as SQL and QUEL, a special component of the DBMS called the Query Processor arranges the underlying access routines to satisfy a given query.
DB ACCESS

[Figure: Users/programmers submit database application programs and queries to the database system. The DBMS software processes them in two layers (query processing software, then data access software) that work against the database and the database definition.]
Agenda
I. Query Processing and Optimization: Why?

II. Steps of Processing

III. Methods of Optimization

• Heuristic (Logical Transformations)
- Transformation Rules
- Heuristic Optimization Guidelines
• Cost Based (Physical Execution Costs)
- Data Storage/Access Refresher
- Catalog & Costs

IV. What All This Means To YOU?

How are queries processed?

A query is processed in four general steps:
1. Scanning and Parsing
2. Query Optimization, or planning the execution strategy
3. Query Code Generation (interpreted or compiled)
4. Execution in the runtime database processor
Relational Query Processing

Query
-> Scanning, Parsing, Validating
-> Intermediate form of query (query tree)
-> Query Optimizer (consults the Catalog)
-> Execution Plan
-> Query Code Generator
-> Executable Code (compiled query code)
-> Execution in Runtime Processor
1. Query Recognition
• Scanning is the process of identifying the tokens in the query.
- The tokenized representation is suitable for processing by the parser.
- Token examples are SQL keywords, attribute names, table names, …
- This representation may be in a tree form.
• Parsing: the parser checks the tokenized representation for correct syntax, according to the rules of the language grammar.
• Validating: checks are made to determine whether the columns and tables identified in the query exist in the database.
• If the query passes the recognition checks, the output (the intermediate form of the query) is called the Canonical Query Tree.
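As an illustration of the scanning step, here is a minimal sketch of a tokenizer for a tiny SQL subset. The token classes, the keyword list and the function name `scan` are assumptions made for this example and do not describe how any particular DBMS scans queries.

```python
import re

# Hypothetical keyword list for a tiny SQL subset (illustrative only).
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND"}

# One named group per token class; VERBOSE mode ignores the layout whitespace.
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<number>\d+)
      | (?P<string>"[^"]*"|'[^']*')
      | (?P<name>[A-Za-z_][A-Za-z_0-9]*(?:\.[A-Za-z_][A-Za-z_0-9]*)?)
      | (?P<symbol>[=,*<>()])
    )""", re.VERBOSE)

def scan(query):
    """Split a query string into (kind, text) tokens."""
    query = query.strip()
    tokens = []
    pos = 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind = match.lastgroup
        text = match.group(kind)
        # Names that are reserved words become keyword tokens.
        if kind == "name" and text.upper() in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
        pos = match.end()
    return tokens

for token in scan('SELECT B, D FROM R, S WHERE R.A = "c"'):
    print(token)
```

A parser would then check this token list against the grammar, and a validator would look up names such as `R.A` in the catalog.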
2. Query Optimization
• The goal of the query optimizer is to find an efficient strategy for executing the query using the access routines.
• Optimization typically takes one of two forms: Heuristic Optimization or Cost Based Optimization.
• For any given query, there may be a number of different ways to execute it.
• Each operation in the query (SELECT, JOIN, etc.) can be implemented using one or more different access routines.
• For example, an access routine that uses an index to retrieve some rows is usually more efficient than one that performs a full table scan.
• The output of the query optimizer is the execution plan.
3. Query Code Generator
• Once the query optimizer has determined the execution plan (the specific ordering of access routines), the code generator writes out the actual access routines to be executed.
• In an interactive session, the query code is interpreted and passed directly to the runtime database processor for execution.
• It is also possible to compile the access routines and store them for later execution.

Access Routines
• Access routines are algorithms that are used to access and aggregate data in a database.
• An RDBMS may have a collection of general access routines that can be combined to implement a query execution plan.
• We are interested in access routines for selection, projection, join and set operations such as union, intersection, set difference, Cartesian product, etc.
4. Execution in the runtime database processor
• At this point, the query has been scanned, parsed, planned and (possibly) compiled.
• The runtime database processor then executes the access routines against the database.
• The results are returned to the application that made the query in the first place.
• Any runtime errors are also returned.
Query Processing & Optimization

What is Query Processing?
• The steps required to transform a high-level SQL query into a correct and “efficient” strategy for execution and retrieval.

What is Query Optimization?
• The activity of choosing a single “efficient” execution strategy (from hundreds), as determined by database catalog statistics.
Example

R(A,B,C)
S(C,D,E)

SELECT B, D
FROM R, S
WHERE R.C = S.C AND
      R.A = "c" AND
      S.E = 2

R:
A  B  C
a  1  10
b  1  20
c  2  10
d  2  35
e  3  45

S:
C   D  E
10  x  2
20  y  2
30  z  2
40  x  1
50  y  3

Answer:
B  D
2  x
But that is the intelligent, human way to find the answer..
• How should the DBMS execute the query?

Basic idea:
- Do the Cartesian product R x S.
- Select tuples.
- Do the projection.

R x S:
R.A  R.B  R.C  S.C  S.D  S.E
a    1    10   10   x    2
a    1    10   20   y    2
...
c    2    10   10   x    2    <- Got one...
...
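The basic idea above can be sketched in a few lines of Python. This is only an illustration on the example relations R and S; the function name `naive_plan` and the tuple layout are assumptions of the sketch.

```python
from itertools import product

# Example relations from the slides: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def naive_plan(r_rows, s_rows):
    """Cartesian product, then selection, then projection on (B, D)."""
    examined = 0
    result = []
    for (a, b, c), (s_c, d, e) in product(r_rows, s_rows):
        examined += 1  # every pair of the product is looked at
        if c == s_c and a == "c" and e == 2:
            result.append((b, d))
    return result, examined

answer, examined = naive_plan(R, S)
print(answer, examined)
```

With 5 tuples in each relation the plan examines 5 x 5 = 25 pairs to produce one result tuple.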
Problem

• A Cartesian product R x S may be LARGE:
- need to create and examine n x m tuples, where n = |R| and m = |S|.
- For example, n = m = 1000 => 10^6 records.

−> need more efficient evaluation methods.

Relational Algebra: used to describe logical plans.

Ex: Original logical query plan

SELECT B,D   −>  Π B,D
WHERE ...    −>  σ R.A=“c” ∧ S.E=2 ∧ R.C=S.C
FROM R,S     −>  R x S

OR: Π B,D [ σ R.A=“c” ∧ S.E=2 ∧ R.C=S.C (R x S) ]
Improved logical query plan (Plan II):

Π B,D [ σ R.A=“c” (R) ⋈ σ S.E=2 (S) ]    (⋈ = natural join)

σ A='c' (R):
A  B  C
c  2  10

σ E=2 (S):
C   D  E
10  x  2
20  y  2
30  z  2

The natural join of these two intermediate results is then projected on B,D.
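As a rough sketch of why Plan II helps, the code below applies both selections first and only then joins the reduced inputs. The function name `improved_plan` and the pair counter are assumptions added for illustration.

```python
# Example relations from the slides: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def improved_plan(r_rows, s_rows):
    """Select on each input first, then join on C and project on (B, D)."""
    r_selected = [t for t in r_rows if t[0] == "c"]  # sigma A='c' (R)
    s_selected = [t for t in s_rows if t[2] == 2]    # sigma E=2 (S)
    pairs = 0
    result = []
    for (_, b, c) in r_selected:
        for (s_c, d, _) in s_selected:
            pairs += 1
            if c == s_c:
                result.append((b, d))
    return result, pairs

answer, pairs = improved_plan(R, S)
print(answer, pairs)
```

On this data only 1 x 3 = 3 pairs are compared, and the answer is the same as for the plan based on the full Cartesian product.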
Physical Query Plan:

Detailed description of how to execute the query:
- algorithms to implement operations; order of execution steps; how relations are accessed.

For example:
(1) Use the R.A index to select tuples of R with R.A = “c”.
(2) For each R.C value found, use the index on S.C to find matching tuples.
(3) Eliminate S tuples with S.E ≠ 2.
(4) Join matching R,S tuples, project on attributes B and D, and place in result.
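Steps (1) to (4) can be mimicked with in-memory dictionaries standing in for the two indexes. This is a sketch under that assumption; `build_index` and `physical_plan` are hypothetical helper names, and a real DBMS index lives on disk.

```python
from collections import defaultdict

# Example relations from the slides: R(A, B, C) and S(C, D, E).
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def build_index(rows, column):
    """Map each value of the given column to the rows that contain it."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index

def physical_plan(index_r_a, index_s_c):
    result = []
    for (_, b, c) in index_r_a.get("c", []):    # (1) index lookup on R.A
        for (_, d, e) in index_s_c.get(c, []):  # (2) index lookup on S.C
            if e == 2:                          # (3) drop S tuples with E != 2
                result.append((b, d))           # (4) join and project on B, D
    return result

print(physical_plan(build_index(R, 0), build_index(S, 0)))
```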
[Figure: index I1 on R.A (A=“c”) locates the R tuple <c,2,10>; index I2 on S.C locates the S tuple <10,x,2>; the check E=2 succeeds and <2,x> is output; the next R tuple is <c,7,15>.]
Physical operators

• Principal methods for executing operations of relational algebra.
• Building blocks of physical query plans.
• Major strategies:
- scanning tables.
- sorting, indexing, hashing.
Questions for Query Optimization
• Which relational algebra expression, equivalent to the given query, will lead to the most efficient solution plan?
• For each algebraic operator, what algorithm (of several available) do we use to compute that operator?
• How do operations pass data (main memory buffer, disk buffer, …)?
• Will this plan minimize resource usage? (CPU/Response Time/Disk)
Overview of Query Execution

[Figure: SQL query -> parse -> parse tree -> convert -> logical query plan -> apply laws -> “improved” l.q.p. -> estimate result sizes (using statistics) -> l.q.p. + sizes -> consider physical plans -> {P1, P2, …} -> estimate costs -> {(P1,C1), …} -> pick best -> Pi -> execute -> answer]
Processing Steps

Three Major Steps of Processing

(1) Query Decomposition
• Analysis
• Derive Relational Algebra Tree
• Normalization

(2) Query Optimization
• Heuristic: Improve and refine the relational algebra tree to create equivalent Logical Query Plans
• Cost Based: Use database statistics to estimate the physical costs of the logical operators in the LQP to create Physical Execution Plans

(3) Query Execution

Query Decomposition
• ANALYSIS
- Lexical: Is it even valid SQL?
- Syntactic: Do the relations/attributes exist and are the operations valid?
- Result is an internal tree representation of the SQL query (Parse Tree)

[Figure: parse tree rooted at <Query>, expanding to SELECT <select_list> FROM <from_list> …, with leaves such as <attribute> and *]
Query Decomposition (cont…)

• RELATIONAL ALGEBRA TREE
- Root: the desired result of the query
- Leaf: base relations of the query
- Non-Leaf: intermediate relation created from a relational algebra operation
• NORMALIZATION
- Convert the WHERE clause into a more easily manipulated form
- Conjunctive Normal Form (CNF): (a ∨ b) ∧ [(c ∨ d) ∧ e] ∧ f (more efficient)
- Disjunctive Normal Form (DNF): a disjunction (∨) of conjunctive terms
Query Processing: Who needs it?

A motivating example:
Identify all managers who work in a London branch

SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
      s.position = ‘Manager’ AND
      b.city = ‘London’;

Results in these equivalent relational algebra statements:

(1) σ (position=‘Manager’)∧(city=‘London’)∧(Staff.branchNo=Branch.branchNo) (Staff x Branch)
(2) σ (position=‘Manager’)∧(city=‘London’) (Staff ⋈ Staff.branchNo=Branch.branchNo Branch)
(3) [σ (position=‘Manager’) (Staff)] ⋈ Staff.branchNo=Branch.branchNo [σ (city=‘London’) (Branch)]
A Motivating Example (cont…)
Assume:
• 1000 tuples in Staff.
- ~ 50 Managers
• 50 tuples in Branch.
- ~ 5 London branches
• No indexes or sort keys
• All temporary results are written back to disk (memory is small)
• Tuples are accessed one at a time (not in blocks)

Motivating Example: Query 1 (Bad)

σ (position=‘Manager’)∧(city=‘London’)∧(Staff.branchNo=Branch.branchNo) (Staff x Branch)

• Requires (1000+50) disk accesses to read the Staff and Branch relations
• Creates a temporary relation for the Cartesian product, with (1000*50) tuples written to disk
• Requires (1000*50) disk accesses to read the temporary relation back in and test the predicate

Total Work = (1000+50) + 2*(1000*50) = 101,050 I/O operations
Motivating Example: Query 2 (Better)

σ (position=‘Manager’)∧(city=‘London’) (Staff ⋈ Staff.branchNo=Branch.branchNo Branch)

• Again requires (1000+50) disk accesses to read the Staff and Branch relations
• The join of Staff and Branch on branchNo has 1000 tuples (1 employee : 1 branch), which are written to disk
• Requires (1000) disk accesses to read the joined relation back in and check the predicate

Total Work = (1000+50) + 2*(1000) = 3050 I/O operations

About 33 times fewer I/O operations than Query 1

Motivating Example: Query 3 (Best)

[σ (position=‘Manager’) (Staff)] ⋈ Staff.branchNo=Branch.branchNo [σ (city=‘London’) (Branch)]

• Read the Staff relation to determine ‘Managers’ (1000 reads)
- Create a 50-tuple relation (50 writes)
• Read the Branch relation to determine ‘London’ branches (50 reads)
- Create a 5-tuple relation (5 writes)
• Join the reduced relations and check the predicate (50 + 5 reads)

Total Work = 1000 + 2*(50) + 5 + (50 + 5) = 1160 I/O operations

About 87 times fewer I/O operations than Query 1

Consider if the Staff and Branch relations were 10x the size? 100x? !!!
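The three totals follow directly from the stated assumptions, so they are easy to recompute for other relation sizes. The sketch below only restates the arithmetic of the example; the function and parameter names are made up for illustration.

```python
def query1_cost(staff, branch):
    """Cartesian product: read both inputs, write and re-read the product."""
    return (staff + branch) + 2 * (staff * branch)

def query2_cost(staff, branch):
    """Join first: the join result has one tuple per staff member."""
    return (staff + branch) + 2 * staff

def query3_cost(staff, branch, managers, london):
    """Select first: write the two reduced relations, then read them to join."""
    return staff + managers + branch + london + (managers + london)

print(query1_cost(1000, 50))         # 101050
print(query2_cost(1000, 50))         # 3050
print(query3_cost(1000, 50, 50, 5))  # 1160

# Relations 10x larger, assuming the manager and London counts scale too.
print(query1_cost(10000, 500), query3_cost(10000, 500, 500, 50))
```

The cost of Query 1 grows with the product of the two sizes, while the costs of Queries 2 and 3 grow roughly linearly.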
Heuristic Optimization
GOAL:
• Use relational algebra equivalence rules to improve the expected performance of a given query tree.

Consider the example given earlier:
• Join followed by Selection (~ 3050 I/O operations)
• Selection followed by Join (~ 1160 I/O operations)
Relational Algebra Transformations

Cascade of Selection
• (1) σ p∧q∧r (R) = σ p (σ q (σ r (R)))

Commutativity of Selection Operations
• (2) σ p (σ q (R)) = σ q (σ p (R))

In a sequence of projections only the last is required
• (3) Π L Π M … Π N (R) = Π L (R)

Selections can be combined with Cartesian Products and Joins
• (4) σ p (R x S) = R ⋈p S
• (5) σ p (R ⋈q S) = R ⋈q∧p S

[Figure: visual of (4): a σ p node above R x S is equivalent to a single ⋈p node over R and S]

Note: The above is an incomplete list! For a complete list see the text.
More Relational Algebra Transformations

Join and Cartesian Product Operations are Commutative and Associative
(6) R x S = S x R
(7) R x (S x T) = (R x S) x T
(8) R ⋈p S = S ⋈p R
(9) (R ⋈p S) ⋈q T = R ⋈p (S ⋈q T)

Selection Distributes over Joins
• If predicate p involves attributes of R only:
(10) σ p (R ⋈q S) = σ p (R) ⋈q S
• If predicate p involves only attributes of R and q involves only attributes of S:
(11) σ p∧q (R ⋈r S) = σ p (R) ⋈r σ q (S)
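Rule (10) can be checked on a small example. The sketch below uses the relations R(A,B,C) and S(C,D,E) from the earlier example, with p as "R.A = c" and q as "R.C = S.C"; the helpers `select` and `join` are illustrative stand-ins for the algebra operators, and one passing example is of course not a proof of the rule.

```python
R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]

def select(rows, pred):
    """sigma: keep the rows that satisfy pred."""
    return [row for row in rows if pred(row)]

def join(r_rows, s_rows, pred):
    """Theta join: keep the (r, s) pairs that satisfy pred."""
    return [(r, s) for r in r_rows for s in s_rows if pred(r, s)]

p = lambda r: r[0] == "c"        # involves attributes of R only
q = lambda r, s: r[2] == s[0]    # join condition R.C = S.C

# Left side of rule (10): select after the join.
lhs = [(r, s) for (r, s) in join(R, S, q) if p(r)]
# Right side of rule (10): push the selection below the join.
rhs = join(select(R, p), S, q)

print(lhs == rhs, rhs)
```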
Optimization Uses The Following
Heuristics
Break apart conjunctive selections into a sequence of simpler
selections (preparatory step for next heuristic).

Move σ down the query tree for the earliest possible execution
(reduce number of tuples processed).

Replace σ-x pairs by ⋈ (avoid large intermediate results).

Break apart and move as far down the tree as possible lists of
projection attributes, create new projections where possible
(reduce tuple widths early).

Perform the joins with the smallest expected result first

Heuristic Optimization Example

“What are the ticket numbers of the pilots flying to France on 01-01-06?”

SELECT p.ticketno
FROM Flight f, Passenger p, Crew c
WHERE f.flightNo = p.flightNo AND
      f.flightNo = c.flightNo AND
      f.date = ’01-01-06’ AND
      f.to = ’FRA’ AND
      p.name = c.name AND
      c.job = ’Pilot’

[Figures: the Canonical Relational Algebra Expression for this query, followed by the query trees for Heuristic Optimization Steps 1 to 6]
Physical Execution Plan
• Identified “optimal” Logical Query Plans
- A heuristic is not always the “best” transform
- Heuristic analysis reduces the search space for cost evaluation but does not necessarily reduce costs
• Annotate Logical Query Plan operators with physical operations (1 : *)
- Binary vs. Linear search for Selection?
- Nested-Loop Join vs. Sort-Merge Join?
- Pipelining vs. Materialization?
• How does the optimizer determine the “cheapest” plan?
Physical Searching
Physical Storage
• Record Placement
• Types of Records:
- Variable Length
- Fixed Length
• Record Separation
- Fixed-length records don’t need it
- If needed, indicate records with a special marker and give record lengths or offsets
Record Separation
• Unspanned
- Records must stay within a block
- Simpler, but wastes space
• Spanned
- Records can span multiple blocks
- Requires a pointer at the end of the block to the next block holding that record
- Essential if record size > block size

Record Separation
• Mixed Record Types – Clustering
- Different record types within the same block
- Why cluster? Records that are frequently accessed together are in the same block
- Has performance downsides if there are many frequently run queries that need different orderings
• Split Records
- Put fixed-length records in one place and variable-length records in another block

Record Separation
• Sequencing
- Order records in sequential blocks based on a key
• Indirection
- Record address is a combination of various physical identifiers or an arbitrary bit string
- Very flexible but can be costly
Accessing Data
• What is an index?
- A data structure that allows the DBMS to quickly locate particular records or tuples that meet specific conditions
• Types of indices:
- Primary Index
- Secondary Index
- Dense Index
- Sparse Index/Clustering Index
- Multilevel Indices

Accessing Data
• Primary Index
- Index on the attribute that determines the sequencing of the table
- Guarantees that the index is unique
• Secondary Index
- An index on any other attribute
- Does not guarantee a unique index

Accessing Data
• Dense Index
- Every value of the indexed attribute appears in the index
- Can tell if a record exists without accessing files
- Better access to overflow records
• Clustering Index
- Each index entry can correspond to many records
Dense Index

[Figure: a dense index with one entry for every key (10, 20, 30, …, 120), each pointing to the record with that key in the sequential data file (10, 20, 30, …, 100).]
Accessing Data
• Sparse Index
- Many values of the indexed attribute don’t appear
- Less index space per record
- Can keep more of the index in memory
- Better for insertions
• Multilevel Indices
- Build an index on an index
- Level 2 Index -> Level 1 Index -> Data File
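Sparse Index

[Figure: a sparse index with entries 10, 30, 50, …, 230, one per data block; each entry points to the block of the sequential data file (10, 20 | 30, 40 | 50, 60 | …) whose first key it holds.]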
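A lookup through a sparse index can be sketched as a binary search over the first key of each block, followed by a scan of just that block. The block contents and the name `lookup` below are assumptions chosen to resemble the figure (two records per block).

```python
from bisect import bisect_right

# Hypothetical sequential data file: sorted keys, two records per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80], [90, 100]]

# Sparse index: one entry per block, holding the block's first key.
sparse_index = [block[0] for block in blocks]

def lookup(key):
    """Return the number of the block containing key, or None if absent."""
    i = bisect_right(sparse_index, key) - 1  # last entry with first key <= key
    if i < 0:
        return None                          # key is smaller than every entry
    return i if key in blocks[i] else None   # must read the block to be sure

print(lookup(40), lookup(45), lookup(5))
```

Unlike with a dense index, the data block has to be read before we can tell whether a key such as 45 exists.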
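B+ Tree
• Use a tree model to hold data or indices
• Maintain a balanced tree and aim for a “bushy”, shallow tree

[Figure: example B+ tree with root key 100; internal nodes with keys 30 and 120, 150, 180; leaves holding the keys 3, 5, 11 | 30, 35 | 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200.]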
B+ Tree
• Rules:
- If the root is not a leaf, it must have at least two children
- For a tree of order n, each non-leaf node (other than the root) must have between ⌈n/2⌉ and n pointers (children)
- For a tree of order n, the number of key values in a leaf node must be between ⌈(n-1)/2⌉ and (n-1)

B+ Tree (cont…)
• Rules:
- The number of key values contained in a non-leaf node is 1 less than the number of pointers
- The tree must always be balanced; that is, every path from the root node to a leaf must have the same length
- Leaf nodes are linked in order of key values
Hashing
• Calculates the address of the page in which the record is to be stored, based on one or more fields
• Each hash value points to a bucket
• The hash function should distribute the records evenly throughout the file
• A good hash function assigns roughly the same number of keys to each bucket
• Keep keys sorted within buckets
Hashing

[Figure: a key is mapped by the hash function h(key) to the bucket that holds its records.]

Hashing
• Types of hashing:
- Extensible Hashing
  Pro:
  - Handles growing files
  - Less wasted space
  - No full reorganizations
  Con:
  - Uses indirection
  - Directory doubles in size

Hashing
• Types of hashing:
- Linear Hashing
  Pro:
  - Handles growing files
  - Less wasted space
  - No full reorganizations
  - No indirection like extensible hashing
  Con:
  - Still has overflow chains
Indexing vs. Hashing
• Hashing is good for:
- Probes given a specific key
  SELECT * FROM R WHERE R.A = 5
• Indexing is good for:
- Range searches
  SELECT * FROM R WHERE R.A > 5
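The difference can be seen with two in-memory stand-ins: a dictionary playing the role of a hash index and a sorted list playing the role of an ordered index. The sample values and the names `equal_to` and `greater_than` are assumptions of this sketch.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical values of R.A; the row id is the position in the list.
values = [7, 3, 5, 9, 5, 1]

# Hash index: a direct probe for one key, but no useful key order.
hash_index = defaultdict(list)
for rid, a in enumerate(values):
    hash_index[a].append(rid)

# Ordered index: (key, row id) pairs kept sorted by key.
sorted_index = sorted((a, rid) for rid, a in enumerate(values))
keys = [a for a, _ in sorted_index]

def equal_to(v):
    """SELECT * FROM R WHERE R.A = v, answered with a single hash probe."""
    return hash_index.get(v, [])

def greater_than(v):
    """SELECT * FROM R WHERE R.A > v, answered by one search plus a scan."""
    start = bisect_right(keys, v)
    return [rid for _, rid in sorted_index[start:]]

print(equal_to(5), greater_than(5))
```

Answering the range query from the hash index alone would require probing every possible key or scanning all buckets.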
Cost Model

Disks and Files
• The DBMS stores information on (“hard”) disks.
• This has major implications for DBMS design!
- READ: transfer data from disk to main memory (RAM).
- WRITE: transfer data from RAM to disk.
- Both are high-cost operations, relative to in-memory operations, so they must be planned carefully!
Parameters for Estimation

• M: # of available main memory buffers (estimate).
• Kept as statistics for each relation R:
- T(R): # of tuples in R.
- B(R): # of blocks to hold all tuples of R.
- V(R, A): # of distinct values for attribute R.A
  = SELECT COUNT (DISTINCT A) FROM R

Cost of Scanning a Relation
• Normally assume relation R to be clustered, that is, stored in blocks exclusively used for representing R.
• For example, consider a clustered-file organization of relations DEPT(Name, …) and EMP(Name, Dname, …):

DEPT: Toy, ...          DEPT: Sales, ...
EMP: Ann, Toy, ...      EMP: John, Sales, ...
EMP: Bob, Toy, ...      EMP: Ken, Sales, ...
…                       …

• Relation EMP might be considered clustered, relation DEPT probably not.
• For a clustered relation R, it is sufficient to read (approx.) B(R) blocks for a full scan.
• If relation R is not clustered, most tuples are probably in different blocks => input cost approx. T(R).
Classification of Physical Operators
• By applicability and cost:
- one-pass methods
  - if at least one argument relation fits in main memory.
- two-pass methods
  - if memory is not sufficient for one-pass.
  - process relations twice, storing intermediate results on disk.
- multi-pass
  - generalization of two-pass for HUGE relations
Implementing Selection
• How to evaluate σ C (R)?
- Sufficient to examine one tuple at a time
  −> Easy to evaluate in one pass:
  - Read each block of R using one input buffer.
  - Output records that satisfy condition C.
• If R is clustered, cost = B(R); else T(R).
• Projection π A (R) is handled in a similar manner.
Index-Based Selection
• Consider selection σ A='c' (R).
• If there is an index on R.A, we can locate tuples t with t.A='c' directly.
• What is the cost?
- How many tuples are selected?
  - estimate: T(R)/V(R,A) on the average.
  - if A is a primary key, V(R,A) = T(R) −> 1 disk I/O.
Index-Based Selection (cont.)
• An index is clustering if tuples with A='c' are stored in consecutive blocks (for any 'c')

[Figure: an index on A whose entries 10 and 20 point to runs of consecutive data blocks holding the tuples with A=10 and A=20.]

Selection using a clustering index
• We estimate that a fraction 1/V(R,A) of all R tuples, i.e. T(R)/V(R,A) tuples, satisfy A='c'. Applying the same estimate to the data blocks accessible through a clustering index
  => ⌈B(R)/V(R,A)⌉ is an estimate for the number of block accesses
• Further simplifications: Ignore, e.g.,
- cost of reading the (few) index blocks
- unfilled room left intentionally in blocks
- …
Selection Example
Consider σ A=0 (R) when T(R)=20,000, B(R)=1000, and there's an index on R.A. Times assume 15 ms per disk I/O.

Access method                                           Cost (disk I/Os)         Time
simple scan of R, R not clustered                       T(R) = 20,000            5 min
simple scan of R, R clustered                           B(R) = 1000              15 sec
V(R,A)=100, index not clustering                        T(R)/V(R,A) = 200        3 sec
V(R,A)=100, index clustering                            B(R)/V(R,A) = 10         0.15 sec
V(R,A)=20,000 (i.e., A is key)                          1                        15 ms
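The figures in this example can be recomputed from T(R), B(R) and V(R,A). The helper names below are hypothetical, and the 15 ms per disk I/O is the value assumed in the example.

```python
import math

def scan_cost(t, b, clustered):
    """Full scan: B(R) block reads if R is clustered, else about T(R)."""
    return b if clustered else t

def index_selection_cost(t, b, v, clustering):
    """Estimated block accesses for an equality selection through an index."""
    return math.ceil(b / v) if clustering else math.ceil(t / v)

def seconds(ios, ms_per_io=15):
    """Elapsed time at a fixed cost per disk I/O."""
    return ios * ms_per_io / 1000

T, B = 20_000, 1_000
for label, cost in [
    ("scan, not clustered", scan_cost(T, B, clustered=False)),
    ("scan, clustered", scan_cost(T, B, clustered=True)),
    ("index, not clustering", index_selection_cost(T, B, 100, False)),
    ("index, clustering", index_selection_cost(T, B, 100, True)),
    ("index on key", index_selection_cost(T, B, 20_000, False)),
]:
    print(f"{label}: {cost} I/Os, {seconds(cost)} s")
```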
Processing of Joins
• Consider the natural join R(X,Y) ⋈ S(Y,Z)
- general joins are handled rather similarly, possibly with additional selections (for complex join conditions)
• Assumptions:
- Y = join attributes common to R and S
- S is the smaller of the relations: B(S) ≤ B(R)
One-Pass Join
• Requirement: B(S) < M, i.e., S fits in memory
• Read the entire S into memory; build a dictionary (balanced tree, hash table) using the join attributes of tuples as the search key
• Read each block of R (using one buffer); for each tuple t, find matching tuples from the dictionary, and output their join
• I/O cost ≤ B(S) + B(R)
What If Memory Insufficient?
• Basic join strategy:
- "nested-loop" join
- "1+n pass" operation:
  - one relation read once, the other repeatedly
- no memory limitations
  - can be used for relations of any size

• Nested-loop join (conceptually)

for each tuple s ∈ S do
    for each tuple r ∈ R do
        if r.Y = s.Y then
            output join of r and s;

• Cost (like for Cartesian product):
  T(S) * (1 + T(R)) = T(S) + T(S)T(R)

• If R and S are clustered, we can apply a block-based nested-loop join:

for each chunk of M-1 blocks of S do
    Read blocks in memory;
    Insert tuples in a dictionary using the join attributes;
    for each block b of R do
        Read b in memory;
        for each tuple r in b do
            Find matching tuples from the dictionary;
            output their join with r;
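The block-based pseudocode above translates almost line by line into Python. In this sketch a relation is modelled as a list of blocks and a block as a list of tuples; that layout, the function name and the block-read counter are assumptions made for illustration.

```python
from collections import defaultdict

def block_nested_loop_join(s_blocks, r_blocks, m):
    """Join S(Y, Z) with R(X, Y) on Y, buffering m - 1 blocks of S at a time."""
    result = []
    block_reads = 0
    chunk_size = m - 1
    for start in range(0, len(s_blocks), chunk_size):
        # Load one chunk of S and build a dictionary on the join attribute.
        table = defaultdict(list)
        for block in s_blocks[start:start + chunk_size]:
            block_reads += 1
            for (y, z) in block:
                table[y].append(z)
        # Scan all of R once for this chunk.
        for block in r_blocks:
            block_reads += 1
            for (x, y) in block:
                for z in table.get(y, []):
                    result.append((x, y, z))
    return result, block_reads

s_blocks = [[("a", 1)], [("b", 2)], [("c", 3)]]
r_blocks = [[(10, "a"), (11, "c")], [(12, "c")]]
print(block_nested_loop_join(s_blocks, r_blocks, m=3))
```

With m = 3 the three blocks of S are processed in two chunks, so R is scanned twice: 3 + 2 x 2 = 7 block reads.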
Cost of Block-Based Nested-Loop Join

• Consider R(X,Y) ⋈ S(Y,Z) when B(R)=1000, B(S)=500, and M = 101
- Use 100 buffers for loading S
  −> 500/100 = 5 chunks
- Total I/O cost = 5 x (100 + 1000) = 5500 blocks
• R as the outer-loop relation −> I/O cost 6000
- in general, using the smaller relation in the outer loop gives an advantage of B(R) - B(S) operations
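The two totals can be reproduced with a small helper. The function name `bnl_cost` is made up for this sketch; it counts one read of the outer relation plus one full scan of the inner relation per chunk.

```python
import math

def bnl_cost(b_outer, b_inner, m):
    """Block reads of a block-based nested-loop join with m buffers."""
    chunks = math.ceil(b_outer / (m - 1))  # outer relation in chunks of m - 1
    return b_outer + chunks * b_inner

print(bnl_cost(500, 1000, 101))   # S as the outer relation: 5500
print(bnl_cost(1000, 500, 101))   # R as the outer relation: 6000
```

The difference of 500 blocks equals B(R) - B(S), as stated above.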
Analysis of Nested-Loop join
• B(S)/(M-1) outer-loop iterations; each reads M-1 + B(R) blocks
  −> total cost = B(S) + B(S)B(R)/(M-1), or approx. B(S)B(R)/M blocks
• Not the best method, but sometimes the only choice
• Next: More efficient join algorithms
Sort-Based Two-Pass Join
• Idea: Joining relations R and S on attribute Y is rather easy if the relations are sorted on Y
- IF not too many tuples join for any value of the join attributes. (E.g. if π Y (R) = π Y (S) = {y}, all tuples match, and we may need to resort to nested-loop join)
• If the relations are not sorted already, they have to be sorted (with two-phase multi-way merge sort, since they do not fit in memory)

Sort-Based Two-Pass Join
1. Sort R with join attributes Y as the sort key;
2. Do the same for relation S;
3. Merge the sorted relations, using 1 buffer for the current input block of each relation:
- skip tuples whose Y-value y is not in both R and S
- read blocks of both R and S for all tuples whose Y value is y
- output all possible joins of the matching tuples r ∈ R and s ∈ S
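The merge in step 3 can be sketched in memory as follows. The sketch assumes both inputs are already sorted on Y and fit in Python lists, so it shows only the merge logic and not the block-at-a-time buffering; the sample tuples are invented.

```python
def merge_join(r_sorted, s_sorted):
    """Join R(X, Y) and S(Y, Z), both already sorted on Y."""
    result = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        y_r, y_s = r_sorted[i][1], s_sorted[j][0]
        if y_r < y_s:
            i += 1                      # Y value occurs only in R: skip it
        elif y_r > y_s:
            j += 1                      # Y value occurs only in S: skip it
        else:
            # Collect the run of equal Y values on both sides.
            i_end = i
            while i_end < len(r_sorted) and r_sorted[i_end][1] == y_r:
                i_end += 1
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][0] == y_r:
                j_end += 1
            # Output all combinations of the matching tuples.
            for x, _ in r_sorted[i:i_end]:
                for _, z in s_sorted[j:j_end]:
                    result.append((x, y_r, z))
            i, j = i_end, j_end
    return result

R = [(1, "a"), (2, "c"), (3, "c"), (4, "d"), (5, "e")]
S = [("b", 1), ("c", 2), ("c", 3), ("c", 4), ("e", 5)]
print(merge_join(R, S))
```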
Example: Join of R and S sorted on Y

[Figure: R(X, Y) and S(Y, Z), both sorted on Y, are read through one main-memory buffer per relation; tuples with equal Y values (for example Y = c) are brought into main memory together and joined.]
Analysis of Sort-Based Two-Phase Join

• Consider R(X,Y) ⋈ S(Y,Z) when B(R)=1000, B(S)=500, and M = 101
- Remember two-phase multiway merge sort:
  - each block is read + written + read + written once
    −> 4 x (B(R) + B(S)) = 6000 disk I/Os
- Merge of sorted relations for the join:
  - B(R) + B(S) = 1500 disk I/Os
- Total I/O cost = 5 x (B(R) + B(S)) = 7500
• Seems big, but for large R and S it is much better than the B(R)B(S)/M of block-based nested-loop join
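To see where the sort-based join starts to pay off, the two cost formulas can be compared for growing inputs. The helper names are hypothetical; the nested-loop figure uses the exact chunk count rather than the B(R)B(S)/M approximation.

```python
import math

def sort_join_cost(b_r, b_s):
    """Two-phase sort of both relations (4 I/Os per block) plus the merge."""
    return 5 * (b_r + b_s)

def bnl_cost(b_outer, b_inner, m):
    """Block-based nested-loop join with the outer relation read in chunks."""
    return b_outer + math.ceil(b_outer / (m - 1)) * b_inner

M = 101
for scale in (1, 10):
    b_r, b_s = 1000 * scale, 500 * scale
    print(scale, sort_join_cost(b_r, b_s), bnl_cost(b_s, b_r, M))
```

For the sizes in the example the nested-loop join is still cheaper (5500 versus 7500), but with relations ten times larger the sort-based join needs 75,000 I/Os against 505,000.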
Analysis of Sort-Based Two-Phase Join

• Limitations? Sorting requires max(B(R), B(S)) ≤ M², i.e. M ≥ √max{B(R), B(S)}
• Variation: Perform only phase I (building of sorted sublists) of the sorting, and merge all of them (can handle at most M) for the join
- I/O cost = 3 x (B(R) + B(S))
- requires the union of R and S to fit in at most M sublists, each of which is at most M blocks long
  −> works if B(R) + B(S) ≤ M², i.e. M ≥ √(B(R) + B(S))
Two-Phase Join with Hashing
• Idea: If the relations do not fit in memory, first hash the tuples of each relation into buckets. Then join the tuples in each pair of buckets.
• For a join on attributes Y, use Y as the hash key
• Hash Phase: For each relation R and S:
- Use 1 input buffer, and M-1 output buffers as hash buckets
- Read each block and hash its tuples; when an output buffer gets full, write it to disk as the next block of that bucket

Two-Phase Join with Hashing
• The hashing phase produces buckets (sequences of blocks) R_1, …, R_(M-1) and S_1, …, S_(M-1)
• Tuples r ∈ R and s ∈ S join iff r.Y = s.Y
  => h(r.Y) = h(s.Y)
  => r occurs in bucket R_i and s occurs in bucket S_i for the same i
Hash-Join: The Join Phase
• For each i = 1, …, M-1, perform a one-pass join between buckets R_i and S_i
  −> the smaller one has to fit in M-1 main memory buffers
• Average size for bucket R_i is approx. B(R)/M, and B(S)/M for bucket S_i
  −> Approximate memory requirement: min(B(R), B(S)) < M², i.e. M > √min{B(R), B(S)}
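Both phases can be sketched in memory as follows. Here the buckets are Python lists rather than sequences of disk blocks, and Python's built-in `hash` stands in for the hash function h; both are simplifying assumptions, as are the sample tuples.

```python
from collections import defaultdict

def partition(rows, key_pos, n_buckets):
    """Hash phase: distribute the rows into buckets on the join attribute."""
    buckets = [[] for _ in range(n_buckets)]
    for row in rows:
        buckets[hash(row[key_pos]) % n_buckets].append(row)
    return buckets

def hash_join(r_rows, s_rows, n_buckets):
    """Join R(X, Y) and S(Y, Z) on Y by joining matching bucket pairs."""
    r_buckets = partition(r_rows, 1, n_buckets)
    s_buckets = partition(s_rows, 0, n_buckets)
    result = []
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        # Join phase: one-pass join of bucket R_i with bucket S_i.
        table = defaultdict(list)
        for (y, z) in s_bucket:
            table[y].append(z)
        for (x, y) in r_bucket:
            for z in table.get(y, []):
                result.append((x, y, z))
    return result

R = [(1, "a"), (2, "c"), (3, "c"), (4, "d"), (5, "e")]
S = [("b", 1), ("c", 2), ("c", 3), ("c", 4), ("e", 5)]
print(sorted(hash_join(R, S, n_buckets=4)))
```

Tuples with equal Y always land in buckets with the same number, so joining only matching bucket pairs cannot miss any result. The output order depends on the hash values, which is why the example sorts it.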
Cost of Hash-Join

• Consider R(X,Y) ⋈ S(Y,Z) when B(R)=1000, B(S)=500, and M = 101
- Hashing −> 100 buckets for both R and S, with avg sizes 1000/100=10 and 500/100=5
- I/O cost 4500 blocks:
  - hashing phase: 2x1000 + 2x500 = 3000 blocks
  - join phase: 1000 + 500 (in total for the 100 one-pass joins)
• In general: cost = 3(B(R) + B(S))
Index-Based Join

• Still consider R(X,Y) ⋈ S(Y,Z)
• Assume there's an index on S.Y
• Can compute the join by
- reading each tuple t of R
- locating matching tuples of S by index lookup for t.Y, and
- outputting their join with tuple t
• Efficiency depends on many factors
Cost of Index-Based Join

• Cost of scanning R:
- B(R), if clustered; T(R), if not
• On the average, T(S)/V(S,Y) matching tuples are found by each index lookup; cost of loading them (total for all tuples of R):
- T(R)T(S)/V(S,Y), if the index is not clustered
- T(R)B(S)/V(S,Y), if the index is clustered
• Cost of loading tuples of S dominates
Example: Cost of Index-Join

• Again R(X,Y) ⋈ S(Y,Z) with B(R)=1000, B(S)=500; T(R) = 10,000, T(S) = 5000, and V(S,Y) = 100
• Assume R is clustered, and the index on S.Y is clustering
  −> I/O cost 1000 + 10,000 x 500/100 = 51,000 blocks
• Often not this bad…
Index-Join useful

… when |R| << |S|, and V(S,Y) is large (i.e., the index on S.Y is selective)
• For example, if Y is the primary key of S:
- each of the T(R) index lookups locates at most one record of relation S
  => at most T(R) input operations to load blocks of S
  => Total cost only
  - B(R) + T(R), if R is clustered, and
  - T(R) + T(R) = 2T(R), if R is not clustered
Joins Using a Sorted Index

• Still consider R(X,Y) ⋈ S(Y,Z)
• Assume there's a sorted index on both R.Y and S.Y
- B-tree or a sorted sequential index
• Scan both indexes in increasing order of Y
- like merge-join, without the need to sort first
- if the index is dense, can skip nonmatching tuples without loading them
- very efficient
• Details to exercises?
Questions?
Thank you for your time.

Questions? Comments?