CAS CS 460/660 Introduction To Database Systems Query Evaluation I

CAS CS 460/660
Introduction to Database Systems
Query Evaluation I
Slides from UC Berkeley
1.1
Introduction
• We’ve covered the basic underlying storage, buffering, and indexing technology.
• Now we can move on to query processing.
• Some database operations are EXPENSIVE
• Can greatly improve performance by being “smart”
  – e.g., can speed up 1,000x over naïve approach
• Main weapons are:
1. clever implementation techniques for operators
2. exploiting “equivalencies” of relational operators
3. using statistics and cost models to choose among these.
[Figure: DBMS layers, top to bottom: SQL Query; Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB]
1.2
Cost-based Query Sub-System
[Figure: Queries (e.g., Select * From Blah B Where B.blah = blah) enter the Query Parser, then the Query Optimizer (Plan Generator and Plan Cost Estimator), then the Query Plan Evaluator. The optimizer consults the Catalog Manager, which holds the Schema and Statistics.]
• Usually there is a heuristics-based rewriting step before the cost-based steps.
1.3
Query Processing Overview
• The query optimizer translates SQL to a special internal “language”
  – Query Plans
• The query executor is an interpreter for query plans
• Think of query plans as “box-and-arrow” dataflow diagrams
  – Each box implements a relational operator
  – Edges represent a flow of tuples (columns as specified)
  – For single-table queries, these diagrams are straight-line graphs
[Figure: the Optimizer turns the query
SELECT DISTINCT name, gpa
FROM Students
into the plan HeapScan → Sort → Distinct (bottom to top); each edge carries (name, gpa) tuples.]
1.4
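The straight-line plan for the SELECT DISTINCT query can be sketched with Python generators, where each function plays the role of one box and each generator hand-off plays the role of an edge. This is only an illustration: the function names, the list-of-dicts table, and the sample rows are assumptions, not part of the slides.

```python
def heap_scan(table, columns):
    """Leaf box: stream every row, keeping only the requested columns."""
    for row in table:
        yield tuple(row[c] for c in columns)


def sort_op(rows):
    """Sort box: must consume all of its input before emitting anything."""
    yield from sorted(rows)


def distinct(sorted_rows):
    """Distinct box: on sorted input, duplicates are adjacent."""
    prev = object()  # sentinel that never equals a real row
    for row in sorted_rows:
        if row != prev:
            yield row
            prev = row


# Hypothetical Students table.
students = [
    {"sid": 1, "name": "Ann", "gpa": 3.9},
    {"sid": 2, "name": "Bob", "gpa": 3.1},
    {"sid": 3, "name": "Ann", "gpa": 3.9},
]

# SELECT DISTINCT name, gpa FROM Students
plan = distinct(sort_op(heap_scan(students, ["name", "gpa"])))
result = list(plan)
```

Because every box consumes and produces the same kind of stream, the boxes can be composed in any order that makes sense for the query.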
Query Optimization
• A deep subject, focuses on multi-table queries
  – We will only need a cookbook version for now.
• Build the dataflow bottom up:
  – Choose an Access Method (HeapScan or IndexScan)
    – Non-trivial, we’ll learn about this later!
  – Next apply any WHERE clause filters
  – Next apply GROUP BY and aggregation
    – Can choose between sorting and hashing!
  – Next apply any HAVING clause filters
  – Next Sort to help with ORDER BY and DISTINCT
    – In absence of ORDER BY, can do DISTINCT via hashing!
[Figure: example plan, bottom to top: HeapScan → Filter → HashAgg → Filter → Sort → Distinct]
1.5
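The cookbook can be read as a small procedure that stacks operators bottom up. The sketch below is one possible reading, not an actual optimizer: the function name, its boolean flags, and the operator label "HashDistinct" are hypothetical, and it always picks hashing for GROUP BY even though sorting is an equally valid choice.

```python
def cookbook_plan(where=False, group_by=False, having=False,
                  order_by=False, distinct=False, use_index=False):
    """Return operator names from the bottom of the plan to the top."""
    plan = ["IndexScan" if use_index else "HeapScan"]  # access method
    if where:
        plan.append("Filter")          # WHERE clause
    if group_by:
        plan.append("HashAgg")         # GROUP BY (sorting would also work)
    if having:
        plan.append("Filter")          # HAVING clause
    if order_by:
        plan.append("Sort")            # Sort serves ORDER BY ...
        if distinct:
            plan.append("Distinct")    # ... and makes duplicates adjacent
    elif distinct:
        plan.append("HashDistinct")    # no ORDER BY: DISTINCT via hashing
    return plan


full = cookbook_plan(where=True, group_by=True, having=True,
                     order_by=True, distinct=True)
```

With every clause present, the result matches the slide's example plan read from the bottom up.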
Iterators
• The relational operators are all subclasses of the class iterator:
class iterator {
  void init();
  tuple next();
  void close();
  iterator inputs[];
  // additional state goes here
}
• Note:
  – Edges in the graph are specified by inputs (max 2, usually 1)
  – Encapsulation: any iterator can be input to any other!
  – When subclassing, different iterators will keep different kinds of state information
1.6
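A minimal Python rendering of the iterator interface may help make the encapsulation point concrete. It is a sketch under assumptions: None stands in for EOF, and the ListSource and Limit operators and the run driver are hypothetical examples added here, not operators from the slides.

```python
from typing import List, Optional, Tuple

Row = Tuple  # a row is a plain tuple of field values (assumption)


class Iterator:
    """Base class mirroring the slide's init/next/close interface."""

    def __init__(self, inputs: Optional[List["Iterator"]] = None):
        self.inputs = inputs or []   # edges of the plan (max 2, usually 1)

    def init(self) -> None:
        for child in self.inputs:
            child.init()

    def next(self) -> Optional[Row]:
        raise NotImplementedError    # subclasses return a row, or None at EOF

    def close(self) -> None:
        for child in self.inputs:
            child.close()


class ListSource(Iterator):
    """Hypothetical leaf operator that streams rows from a Python list."""

    def __init__(self, rows):
        super().__init__()
        self.rows = rows
        self.pos = 0                 # state specific to this subclass

    def init(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Limit(Iterator):
    """Hypothetical operator that passes through at most n rows."""

    def __init__(self, child, n):
        super().__init__([child])
        self.n = n
        self.seen = 0

    def init(self):
        super().init()
        self.seen = 0

    def next(self):
        if self.seen >= self.n:
            return None
        row = self.inputs[0].next()
        if row is not None:
            self.seen += 1
        return row


def run(plan: Iterator) -> list:
    """Drive a plan to completion, as a query executor would."""
    plan.init()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out
```

Limit never inspects what kind of operator its child is, so a Limit can sit on top of a ListSource, another Limit, or any other subclass.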
Example: Scan
class Scan extends iterator {
  void init();
  tuple next();
  void close();
  iterator inputs[1];
  bool_expr filter_expr;
  proj_attr_list proj_list;
}
• init():
  – Set up internal state
  – call init() on child – often a file open
• next():
  – call next() on child until qualifying tuple found or EOF
  – keep only those fields in “proj_list”
  – return tuple (or EOF -- “End of File” -- if no tuples remain)
• close():
  – call close() on child
  – clean up internal state
Note: Scan also applies “selection” filters and “projections” (without duplicate elimination)
1.7
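The Scan steps above can be sketched in Python as follows. The HeapFile child, the dict-based records, and the use of None for EOF are assumptions made for the example; a real Scan would read pages through the buffer manager.

```python
class HeapFile:
    """Hypothetical child operator: a stand-in for a file of records."""

    def __init__(self, records):
        self.records = records
        self.cursor = None

    def init(self):
        self.cursor = iter(self.records)   # plays the role of "file open"

    def next(self):
        return next(self.cursor, None)     # None stands for EOF

    def close(self):
        self.cursor = None


class Scan:
    def __init__(self, child, filter_expr, proj_list):
        self.inputs = [child]
        self.filter_expr = filter_expr     # predicate over one record
        self.proj_list = proj_list         # names of the fields to keep

    def init(self):
        self.inputs[0].init()

    def next(self):
        # Pull from the child until a record qualifies or the input ends.
        while True:
            rec = self.inputs[0].next()
            if rec is None:
                return None
            if self.filter_expr(rec):
                return tuple(rec[a] for a in self.proj_list)

    def close(self):
        self.inputs[0].close()


def drain(op):
    """Run an operator to completion and collect its output."""
    op.init()
    rows = []
    while (row := op.next()) is not None:
        rows.append(row)
    op.close()
    return rows


students = [
    {"name": "Ann", "gpa": 3.9, "age": 20},
    {"name": "Bob", "gpa": 2.8, "age": 22},
    {"name": "Ann", "gpa": 3.9, "age": 21},
]
rows = drain(Scan(HeapFile(students), lambda r: r["gpa"] > 3.0, ["name", "gpa"]))
```

The two ("Ann", 3.9) rows both survive, which illustrates the note that Scan projects without eliminating duplicates.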
Example: Sort
class Sort extends iterator {
  void init();
  tuple next();
  void close();
  iterator inputs[1];
  int numberOfRuns;
  DiskBlock runs[];
  RID nextRID[];
}
• init():
  – generate the sorted runs on disk
  – Allocate runs[] array and fill in with disk pointers.
  – Initialize numberOfRuns
  – Allocate nextRID array and initialize to NULLs
• next():
  – nextRID array tells us where we’re “up to” in each run
  – find the next tuple to return based on nextRID array
  – advance the corresponding nextRID entry
  – return tuple (or EOF -- “End of File” -- if no tuples remain)
• close():
  – deallocate the runs and nextRID arrays
1.8
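A rough Python sketch of the Sort iterator follows. It makes several simplifying assumptions: the input is a Python list rather than a child iterator, the "runs on disk" are in-memory lists, positions start at 0 instead of NULL, and all runs are merged in a single step (which presumes the number of runs fits in the available buffers).

```python
class Sort:
    def __init__(self, child_rows, run_size):
        self.child_rows = child_rows   # hypothetical input: a list of rows
        self.run_size = run_size       # rows per run, standing in for buffer size
        self.runs = []
        self.next_rid = []
        self.number_of_runs = 0

    def init(self):
        rows = self.child_rows
        # Generate the sorted runs (lists stand in for runs on disk).
        self.runs = [sorted(rows[i:i + self.run_size])
                     for i in range(0, len(rows), self.run_size)]
        self.number_of_runs = len(self.runs)
        self.next_rid = [0] * self.number_of_runs

    def next(self):
        # Find the run whose current head is smallest.
        best = None
        for r in range(self.number_of_runs):
            pos = self.next_rid[r]
            if pos >= len(self.runs[r]):
                continue               # this run is exhausted
            if best is None or self.runs[r][pos] < self.runs[best][self.next_rid[best]]:
                best = r
        if best is None:
            return None                # EOF: every run is exhausted
        row = self.runs[best][self.next_rid[best]]
        self.next_rid[best] += 1       # advance that run's position
        return row

    def close(self):
        self.runs = []
        self.next_rid = []
        self.number_of_runs = 0


def drain(op):
    """Run an operator to completion and collect its output."""
    op.init()
    out = []
    while (row := op.next()) is not None:
        out.append(row)
    op.close()
    return out
```

All the expensive work happens in init(); each call to next() only compares the heads of the runs.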
Streaming through RAM
• Simple case: “Map”. (assume many records per disk page)
  – Goal: Compute f(x) for each record, write out the result
  – Challenge: minimize RAM, call read/write rarely
• Approach
  – Read a chunk from INPUT to an Input Buffer
  – Write f(x) for each item to an Output Buffer
  – When Input Buffer is consumed, read another chunk
  – When Output Buffer fills, write it to OUTPUT
• Reads and Writes are not coordinated (i.e., not in lockstep)
  – E.g., if f() is Compress(), you read many chunks per write.
  – E.g., if f() is DeCompress(), you write many chunks per read.
[Figure: INPUT → Input Buffer → f(x) → Output Buffer → OUTPUT, with both buffers in RAM]
1.9
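The streaming approach can be sketched as a Python generator. The names are hypothetical, and two modelling choices are assumptions: each element of input_chunks represents one read, each yielded list represents one write, and f returns a list of zero or more output items so that it can shrink or expand the data.

```python
def stream_map(input_chunks, f, out_capacity):
    """Apply f to every record using one input and one output buffer."""
    out_buf = []
    for in_buf in input_chunks:               # one read per input chunk
        for x in in_buf:
            out_buf.extend(f(x))              # f may emit 0..many items
            while len(out_buf) >= out_capacity:
                yield out_buf[:out_capacity]  # output buffer full: one write
                out_buf = out_buf[out_capacity:]
    if out_buf:
        yield out_buf                         # flush the partial last buffer


# Compress-like f: most inputs produce nothing, so many reads per write.
shrunk = list(stream_map([[1, 2, 3, 4], [5, 6, 7, 8]],
                         lambda x: [x] if x % 4 == 0 else [], 2))

# Decompress-like f: each input expands, so many writes per read.
grown = list(stream_map([[1, 2]], lambda x: [x] * 3, 2))
```

Reads are driven by the input buffer emptying and writes by the output buffer filling, so the two are not in lockstep.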
Rendezvous
• Streaming: one chunk at a time. Easy.
• But some algorithms need certain items to be co-resident in memory
  – not guaranteed to appear in the same input chunk
• Time-space Rendezvous
  – in the same place (RAM) at the same time
• There may be many combos of such items
1.10
Divide and Conquer
• Out-of-core algorithms orchestrate rendezvous.
• Typical RAM Allocation:
  – Assume B pages worth of RAM available
  – Use 1 page of RAM to read into
  – Use 1 page of RAM to write into
  – B-2 pages of RAM as workspace
[Figure: INPUT → IN page → B-2 workspace pages → OUT page → OUTPUT]
1.11
Divide and Conquer
• Phase 1
  – “streamwise” divide into N/(B-2) megachunks
  – output (write) to disk one megachunk at a time
[Figure: INPUT → IN page → B-2 workspace pages → OUT page → OUTPUT]
1.12
Divide and Conquer
• Phase 2
  – Now megachunks will be the input
  – process each megachunk individually.
[Figure: INPUT → IN page → B-2 workspace pages → OUT page → OUTPUT]
1.13
Sorting: 2-Way
• Pass 0:
– read a page, sort it, write it.
– only one buffer page is used
– a repeated “batch job”
[Figure: INPUT → I/O Buffer in RAM (sort) → OUTPUT]
1.14
Sorting: 2-Way (cont.)
• Pass 1, 2, 3, …, etc. (merge):
  – requires 3 buffer pages
    – note: this has nothing to do with double buffering!
  – merge pairs of runs into runs twice as long
  – a streaming algorithm, as in the previous slide!
[Figure: INPUT 1 and INPUT 2 buffer pages feed Merge, which fills one OUTPUT buffer page, all in RAM]
1.15
Two-Way External Merge Sort
• Sort subfiles and Merge
• How many passes?
  – N pages in the file => the number of passes = ⌈log2 N⌉ + 1
• Total I/O cost? (reads + writes)
  – Each pass we read + write each page in file. So total cost is: 2N (⌈log2 N⌉ + 1)
[Figure: sorting a 7-page file with two records per page; "|" separates runs]
Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 (1-page runs):  3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 (2-page runs):  2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2 (4-page runs):  2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3 (8-page runs):  1,2 2,3 3,4 4,5 6,6 7,8 9
1.16
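The two-way algorithm can be traced on the 7-page file from the figure with a short simulation. This is a sketch under assumptions: pages are Python lists, "disk" is just a list of runs, and every pass is charged one read and one write per page even when a run has no partner to merge with. The function name is hypothetical.

```python
import heapq
from math import ceil, log2


def two_way_sort(pages):
    """Sort a non-empty list of pages; return (records, passes, io_cost)."""
    n = len(pages)
    # Pass 0: sort each page on its own, giving n one-page runs.
    runs = [sorted(p) for p in pages]
    passes, io_cost = 1, 2 * n
    # Passes 1, 2, ...: merge pairs of runs into runs twice as long.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2]))
                for i in range(0, len(runs), 2)]
        passes += 1
        io_cost += 2 * n          # every page is read and written once
    return runs[0], passes, io_cost


# The input file from the figure: 7 pages, at most 2 records per page.
pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
records, passes, io_cost = two_way_sort(pages)
expected_passes = ceil(log2(len(pages))) + 1
```

For this file the simulation takes 4 passes, the same count the figure shows (PASS 0 through PASS 3).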
General External Merge Sort
• More than 3 buffer pages. How can we utilize them?
• To sort a file with N pages using B buffer pages:
  – Pass 0: use B buffer pages. Produce ⌈N/B⌉ sorted runs of B pages each.
[Figure: Pass 0 – Create Sorted Runs. B buffer pages in RAM (INPUT 1, INPUT 2, ..., INPUT B) are filled from disk, sorted, and written back to disk as one run]
1.17
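General External Merge Sort
• Pass 1, 2, …, etc.: merge B-1 runs.
  – Creates runs of (B-1) * size of runs from previous pass.
[Figure: Merging Runs. B-1 input buffer pages in RAM (INPUT 1, INPUT 2, ..., INPUT B-1) are filled from disk and merged into one OUTPUT buffer page, which is written to disk]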
1.18
Cost of External Merge Sort
• Number of passes: 1 + ⌈log_(B-1) ⌈N/B⌉⌉
• Cost = 2N * (# of passes)
• E.g., with 5 buffer pages, to sort 108 page file:
  – Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each (last run is only 3 pages)
  – Pass 1: ⌈22/4⌉ = 6 sorted runs of 20 pages each (last run is only 8 pages)
  – Pass 2: 2 sorted runs, 80 pages and 28 pages
  – Pass 3: Sorted file of 108 pages
Formula check: 1 + ⌈log4 22⌉ = 1 + 3 → 4 passes √
1.19
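The 108-page example can be reproduced by tracking only run lengths, without sorting any data. The function name is hypothetical, and the sketch assumes B >= 3 so that at least two runs are merged per step.

```python
def simulate_passes(n_pages, b):
    """Return, for each pass, the list of run lengths in pages."""
    # Pass 0: fill all B buffer pages, sort them, write out one run.
    runs = [min(b, n_pages - i) for i in range(0, n_pages, b)]
    history = [runs]
    # Later passes: merge B-1 runs at a time into one longer run.
    while len(runs) > 1:
        runs = [sum(runs[i:i + b - 1]) for i in range(0, len(runs), b - 1)]
        history.append(runs)
    return history


history = simulate_passes(108, 5)
run_counts = [len(runs) for runs in history]   # number of runs after each pass
io_cost = 2 * 108 * len(history)               # 2N I/Os per pass
```

The final entries of each pass show the short last runs (3 pages, then 8 pages, then 28 pages) mentioned in the example.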
# of Passes of External Sort
(I/O cost is 2N times number of passes)
1.20
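A table of pass counts for different file sizes N and buffer counts B can be generated from the passes formula. The sketch below uses integer arithmetic instead of a floating-point logarithm to avoid rounding trouble at exact powers; the function name and the particular N and B values chosen are assumptions for illustration.

```python
def num_passes(n_pages, b):
    """1 + ceil(log_{B-1}(ceil(N/B))), computed with integers only."""
    runs = -(-n_pages // b)            # ceil(N / B) runs after pass 0
    passes = 1
    while runs > 1:
        runs = -(-runs // (b - 1))     # each merge pass combines B-1 runs
        passes += 1
    return passes


buffer_sizes = [3, 5, 9, 17, 129, 257]
file_sizes = [100, 1_000, 10_000, 100_000, 1_000_000]

# One row per file size, one column per buffer count.
table = {n: [num_passes(n, b) for b in buffer_sizes] for n in file_sizes}

print("N".rjust(10), *(f"B={b}".rjust(6) for b in buffer_sizes))
for n, row in table.items():
    print(str(n).rjust(10), *(str(p).rjust(6) for p in row))
```

Multiplying any entry by 2N gives the I/O cost for that combination.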
Memory Requirement for External Sorting
• How big of a table can we sort in two passes?
  – Each “sorted run” after Phase 0 is of size B
  – Can merge up to B-1 sorted runs in Phase 1
• Answer: B(B-1).
  – Sort N pages of data in about √N pages of space
1.21
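Turning the B(B-1) bound around gives the number of buffer pages needed for a two-pass sort of N pages: the smallest B with B(B-1) >= N, which is just above the square root of N. A small sketch, with a hypothetical function name and ignoring the separate requirement that merging needs at least 3 pages:

```python
from math import isqrt


def min_buffers_two_pass(n_pages):
    """Smallest B such that B * (B - 1) >= N."""
    b = isqrt(n_pages)                 # start at floor(sqrt(N)), which is too small
    while b * (b - 1) < n_pages:
        b += 1
    return b


b_small = min_buffers_two_pass(108)          # the 108-page file used earlier
b_large = min_buffers_two_pass(1_000_000)    # a million-page file
```

For the 108-page file, the result is 11 pages: pass 0 then produces 10 runs, which is exactly the B-1 = 10 runs that one merge pass can handle.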
Alternative: Hashing
• Idea:
  – Many times we don’t require order
  – E.g.: removing duplicates
  – E.g.: forming groups
• Often just need to rendezvous matches
• Hashing does this
  – And may be cheaper than sorting! (Hmmm…!)
  – But how to do it out-of-core??
1.22
Divide
• Streaming Partition (divide): Use a hash f’n hp to stream records to disk partitions
  – All matches rendezvous in the same partition.
  – Streaming alg to create partitions on disk:
    – “Spill” partitions to disk via output buffers
1.23
Divide & Conquer
• Streaming Partition (divide): Use a hash function hp to stream records to disk-based partitions
  – All matches rendezvous in the same partition.
  – Streaming alg to create partitions on disk:
    – “Spill” partitions to disk via output buffers
• ReHash (conquer): Read partitions into RAM-based hash table one at a time, using hash function hr
  – Then go through each bucket of this hash table to achieve rendezvous in RAM
• Note: Two different hash functions
  – hp is coarser-grained than hr
1.24
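Two Phases
• Partition:
[Figure: the Original Relation is read from disk through one INPUT buffer page; hash function hp routes each record to one of B-1 OUTPUT buffer pages, which spill to Partitions 1, 2, ..., B-1 on disk. B main memory buffers in total.]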
1.25
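Two Phases
• Partition:
[Figure: the Original Relation is read from disk through one INPUT buffer page; hash function hp routes each record to one of B-1 OUTPUT buffer pages, which spill to Partitions 1, 2, ..., B-1 on disk. B main memory buffers in total.]
• Rehash:
[Figure: each partition Ri (k <= B pages) is read from disk into an in-memory hash table built with hash fn hr in the B main memory buffers, producing the Result.]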
1.26
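The two phases can be sketched for duplicate elimination as follows. This is an illustration under assumptions: lists stand in for disk partitions, hp and hr are both derived from Python's built-in hash (hr salts its input so the two functions differ), and each partition is assumed to fit in memory. The function name and the n_buckets parameter are hypothetical.

```python
def external_hash_distinct(records, b, n_buckets=8):
    """Remove duplicates by partitioning on h_p, then rehashing on h_r."""
    n_parts = b - 1                    # one output buffer per partition

    def h_p(x):                        # coarse partitioning function
        return hash(x) % n_parts

    def h_r(x):                        # finer rehash function, salted to differ
        return hash((x, "rehash")) % n_buckets

    # Phase 1 (divide): stream every record to its partition.
    partitions = [[] for _ in range(n_parts)]
    for rec in records:
        partitions[h_p(rec)].append(rec)

    # Phase 2 (conquer): build an in-memory hash table per partition.
    result = []
    for part in partitions:
        buckets = [[] for _ in range(n_buckets)]
        for rec in part:
            bucket = buckets[h_r(rec)]
            if rec not in bucket:      # duplicates rendezvous in one bucket
                bucket.append(rec)
        for bucket in buckets:
            result.extend(bucket)
    return result


unique = external_hash_distinct([3, 1, 3, 2, 1, 7, 7, 8], b=4)
```

Equal records always land in the same partition and then in the same bucket, so each duplicate is compared only against the few records that share its bucket. The output order follows the hash values rather than the input order.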
Cost of External Hashing
• cost = 4*N I/Os (each of the two phases reads N pages and writes N pages)
1.27
Memory Requirement
• How big of a table can we hash in two passes?
  – B-1 “partitions” result from Phase 0
  – Each should be no more than B pages in size
  – Answer: B(B-1).
    – We can hash a table of size N pages in about √N pages of space
  – Note: assumes hash function distributes records evenly!
• Have a bigger table? Recursive partitioning!
  – How many times?
  – Until every partition fits in memory!! (<= B)
1.28
How does this compare with
external sorting?
1.29
So which is better??
• Simplest analysis:
  – Same memory requirement for 2 passes
  – Same I/O cost
  – But we can dig a bit deeper…
• Sorting pros:
  – Great if input already sorted (or almost sorted) w/ heapsort
  – Great if need output to be sorted anyway
  – Not sensitive to “data skew” or “bad” hash functions
• Hashing pros:
  – For duplicate elimination, scales with # of values
    – Not # of items! We’ll see this again.
  – Can exploit extra memory to reduce # IOs (stay tuned…)
1.30
Summing Up 1
• Unordered collection model
• Read in chunks to avoid fixed I/O costs
• Patterns for Big Data
  – Streaming
  – Divide & Conquer
  – also Parallelism (but we didn’t cover this here)
1.31
Summary Part 2
• Sort/Hash Duality
  – Sorting is Conquer & Merge
  – Hashing is Divide & Conquer
• Sorting is overkill for rendezvous
  – But sometimes a win anyhow
• Sorting sensitive to internal sort alg
  – Quicksort vs. HeapSort
  – In practice, QuickSort tends to be used
• Don’t forget double buffering (with threads)
1.32
