CSE 326: Data Structures
Part 10
Advanced Data Structures
Henry Kautz
Autumn Quarter 2002
Outline
• Multidimensional search trees
– Range Queries
– k-D Trees
– Quad Trees
• Randomized Data Structures & Algorithms
– Treaps
– Primality testing
– Local search for NP-complete problems
Multi-D Search ADT
• Dictionary operations
– create
– destroy
– find
– insert
– delete
– range queries
(figure: example 2-D search tree over the keys (5,2), (2,5), (8,4), (4,4), (1,9), (8,2), (5,7), (4,2), (3,6), (9,1))
• Each item has k keys for a k-dimensional search tree
• Searches can be performed on one, some, or all the
keys or on ranges of the keys
Applications of Multi-D Search
• Astronomy (simulation of galaxies) - 3 dimensions
• Protein folding in molecular biology - 3 dimensions
• Lossy data compression - 4 to 64 dimensions
• Image processing - 2 dimensions
• Graphics - 2 or 3 dimensions
• Animation - 3 to 4 dimensions
• Geographical databases - 2 or 3 dimensions
• Web searching - 200 or more dimensions
Range Query
A range query is a search in a dictionary in which
the exact key may not be entirely specified.
Range Query Examples:
Two Dimensions
Range Querying in 1-D
Find everything in the rectangle…
Range Querying in 1-D with a BST
Find everything in the rectangle…
1-D Range Querying in 2-D
2-D Range Querying in 2-D
k-D Trees
• Split on the next dimension at each succeeding level
• If building in batch, choose the median along the
current dimension at each level
– guarantees logarithmic height and balanced tree
• In general, add as in a BST
k-D tree node fields: keys, value, dimension (the dimension that this node splits on),
and left and right children
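Below is a minimal C sketch of the node structure and the BST-style insert just described. It is an illustration rather than the course's code: the names (KDNode, kd_insert), the fixed dimension count K, and the depth % K splitting rule are assumptions.

#include <stdlib.h>
#include <string.h>

#define K 2                                /* number of dimensions */

typedef struct KDNode {
    int keys[K];                           /* one key per dimension */
    void *value;                           /* payload stored with the keys */
    int dim;                               /* dimension this node splits on */
    struct KDNode *left, *right;
} KDNode;

/* Add as in a BST, comparing on the splitting dimension at each level. */
KDNode *kd_insert(KDNode *root, const int keys[K], void *value, int depth) {
    if (root == NULL) {
        KDNode *n = malloc(sizeof *n);
        memcpy(n->keys, keys, sizeof n->keys);
        n->value = value;
        n->dim = depth % K;                /* cycle through the dimensions */
        n->left = n->right = NULL;
        return n;
    }
    if (keys[root->dim] < root->keys[root->dim])
        root->left = kd_insert(root->left, keys, value, depth + 1);
    else
        root->right = kd_insert(root->right, keys, value, depth + 1);
    return root;
}

Starting the recursion with depth 0 at the root gives the "split on the next dimension at each succeeding level" behavior; a batch build would instead pick the median point along the current dimension at each level, which is what guarantees logarithmic height.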
Find in a k-D Tree
find(<x1,x2, …, xk>, root) finds the node
which has the given set of keys in it or returns
null if there is no such node
Node find(keyVector keys, Node root) {
  if (root == NULL)
    return NULL;
  int dim = root.dimension;           // the dimension this node splits on
  if (root.keys == keys)
    return root;
  else if (keys[dim] < root.keys[dim])
    return find(keys, root.left);
  else
    return find(keys, root.right);
}
runtime: O(depth)
Find Example
find(<3,6>)
find(<0,10>)
(figure: the example 2-D tree rooted at (5,2) with children (2,5) and (8,4))
Building a 2-D Tree (2/4), (3/4), (4/4)
(figures: the 2-D tree and its spatial decomposition after each successive insertion)
k-D Tree
(figure: an example k-D tree on points a–m, shown both as a spatial decomposition of the
plane and as the corresponding tree)
2-D Range Querying in 2-D Trees
(figure: the query rectangle overlaid on the 2-D tree's decomposition)
runtime: O(N)
Range Query in a k-D Tree
print_range(int low[MAXD], int high[MAXD], Node root) {
  if (root == NULL) return;
  // report this node if it lies inside the query box in every dimension
  inrange = true;
  for (i = 0; i < MAXD; i++) {
    if (root.coord[i] < low[i])  inrange = false;
    if (high[i] < root.coord[i]) inrange = false;
  }
  if (inrange) print(root);
  // recurse only into subtrees that can intersect the query box
  if (low[root.dim] <= root.coord[root.dim])
    print_range(low, high, root.left);
  if (root.coord[root.dim] <= high[root.dim])
    print_range(low, high, root.right);
}
runtime: O(N)
Other Shapes for Range Querying
(figure: a range query over a non-rectangular region)
Quad tree node (from the figure): keys, value, the center point (x, y), and four children,
one per quadrant
runtime: O(depth)
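As a companion to the node description above, here is a small C sketch of one common point-quad-tree variant, where the stored point itself acts as the center that splits the region into four quadrants. The names (QTNode, the NE/NW/SW/SE children, qt_find) and the tie-breaking convention for equal coordinates are assumptions, not the slides' code.

#include <stddef.h>

typedef struct QTNode {
    int x, y;                              /* the 2-D key stored at this node */
    void *value;                           /* payload */
    struct QTNode *NE, *NW, *SW, *SE;      /* one child per quadrant around (x, y) */
} QTNode;

/* find: at each node, step into the single quadrant that could contain
 * the query point, so the work is proportional to the depth, i.e. O(depth). */
QTNode *qt_find(QTNode *root, int qx, int qy) {
    if (root == NULL) return NULL;
    if (qx == root->x && qy == root->y) return root;
    if (qx >= root->x)                     /* east half (ties go east/north here) */
        return (qy >= root->y) ? qt_find(root->NE, qx, qy)
                               : qt_find(root->SE, qx, qy);
    return (qy >= root->y) ? qt_find(root->NW, qx, qy)
                           : qt_find(root->SW, qx, qy);
}

A range query would instead recurse into every quadrant that overlaps the query region, as in the 2-D range query code later in the slides.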
Find Example
find(<10,2>) (i.e., c)
find(<5,6>) (i.e., d)
(figure: a quad tree on points a–g, shown both as a spatial decomposition and as the
corresponding tree)
Building a Quad Tree (1/5)–(5/5)
(figures: the quad tree and its spatial decomposition after each successive insertion)
Quad Tree Example
(figure: the quad tree on points a–g, spatial decomposition and corresponding tree)
Quad Trees Can Suck
(figures: two points a and b that are very close together force many levels of subdivision,
so the tree becomes very deep)
suck factor: depth can be logarithmic in 1/(minimum distance between points)
2-D Range Query in a Quad Tree
print_range(int xlow, xhigh, ylow, yhigh, Node root) {
  if (root == NULL) return;
  if ( xlow <= root.x && root.x <= xhigh &&
       ylow <= root.y && root.y <= yhigh ) {
    print(root);
  }
  // then recurse into each child quadrant that overlaps the query rectangle
}
runtime: O(depth)
Delete Example
delete(<10,2>) (i.e., c)
(figure: the quad tree on points a–g before and after deleting c)
CSE 326: Data Structures
Part 10, continued
Henry Kautz
Autumn Quarter 2002
Pick a Card
What’s the Difference?
• Deterministic with good average time
– If your application happens to always use the “bad” case, you are in big trouble!
• Randomized with good expected time
– No particular input is always bad; the random choices act kind of like an insurance
policy for your algorithm!
Treap Dictionary Data Structure
(figure: an example treap; heap order shown in yellow, search-tree order in blue;
legend: each node shows its priority above its key)
• Treaps have the binary search tree properties
– binary tree property
– search tree property (on the keys)
• Treaps also have the heap-order property (on the priorities)!
– priorities are randomly assigned
Treap Insert
• Choose a random priority
• Insert as in normal BST
• Rotate up until heap order is restored (maintaining BST property while rotating)
(figure: insert(15): the new node starts as a BST leaf and is rotated up until heap order holds)
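The picture compresses several rotations; a compact C sketch of the same steps is below, assuming a min-heap on priorities (smaller priority nearer the root, as in the slides' figures). The names (Treap, treap_insert, rotate_left, rotate_right) are illustrative, not the course's code.

#include <stdlib.h>

/* Treap node: BST on key, min-heap on (randomly chosen) priority. */
typedef struct Treap {
    int key, priority;
    struct Treap *left, *right;
} Treap;

static Treap *rotate_right(Treap *n) {     /* left child moves up */
    Treap *l = n->left;
    n->left = l->right;
    l->right = n;
    return l;
}

static Treap *rotate_left(Treap *n) {      /* right child moves up */
    Treap *r = n->right;
    n->right = r->left;
    r->left = n;
    return r;
}

/* Insert as in a BST, then rotate up while the child's priority is smaller. */
Treap *treap_insert(Treap *root, int key) {
    if (root == NULL) {
        Treap *n = malloc(sizeof *n);
        n->key = key;
        n->priority = rand();              /* choose a random priority */
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key) {
        root->left = treap_insert(root->left, key);
        if (root->left->priority < root->priority)
            root = rotate_right(root);     /* restore heap order */
    } else {
        root->right = treap_insert(root->right, key);
        if (root->right->priority < root->priority)
            root = rotate_left(root);
    }
    return root;
}

Since each priority is drawn at random, the treap's shape matches that of a BST built by inserting the keys in random order, which is where the expected O(log n) costs in the summary below come from.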
Tree + Heap… Why Bother?
Insert data in sorted order into a treap; what shape tree comes out?
(figure: a roughly balanced tree; the shape depends on the random priorities, not on the
insertion order, so sorted input does not produce a degenerate chain)
Treap Delete
• Find the key
• Increase its priority value to ∞
• Rotate it to the fringe (each rotation lifts the child with the smaller priority)
• Snip it off
(figure: delete(9); the node is rotated left and right down toward the leaves)
Treap Delete, cont.
(figure: the rotations continue until the node with key 9 reaches the fringe, where it is
snipped off)
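Continuing the sketch above (it reuses the Treap type and the two rotations), one way to code this delete in C: instead of literally setting the priority to ∞, rotate the doomed node toward the fringe, always lifting the child with the smaller priority, and snip it off once it becomes a leaf. Again a hypothetical sketch, not the slides' code.

/* Delete: find the key, rotate the node down to the fringe
 * (always lifting the child with the smaller priority, as if the
 * node's own priority were infinite), then snip it off. */
Treap *treap_delete(Treap *root, int key) {
    if (root == NULL) return NULL;
    if (key < root->key) {
        root->left = treap_delete(root->left, key);
    } else if (key > root->key) {
        root->right = treap_delete(root->right, key);
    } else if (root->left == NULL && root->right == NULL) {
        free(root);                        /* at the fringe: snip it off */
        return NULL;
    } else if (root->right == NULL ||
               (root->left != NULL &&
                root->left->priority < root->right->priority)) {
        root = rotate_right(root);         /* smaller-priority child moves up */
        root->right = treap_delete(root->right, key);
    } else {
        root = rotate_left(root);
        root->left = treap_delete(root->left, key);
    }
    return root;
}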
Treap Summary
• Implements Dictionary ADT
– insert in expected O(log n) time
– delete in expected O(log n) time
– find in expected O(log n) time
– but worst case O(n)
• Memory use
– O(1) per node
– about the cost of AVL trees
• Very simple to implement, little overhead – less
than AVL trees
Other Randomized Data
Structures & Algorithms
• Randomized skip list
– cross between a linked list and a binary search tree
– O(log n) expected time for finds, and then can simply
follow links to do range queries
• Randomized QuickSort
– just choose the pivot position randomly (a sketch follows below)
– expected O(n log n) time for any input
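As a concrete illustration of the pivot trick (not code from the course), here is a minimal C sketch: swap a uniformly random element into the pivot slot before a standard Lomuto-style partition. The helper names are made up.

#include <stdlib.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Partition around a uniformly random pivot; returns the pivot's final index. */
static int random_partition(int a[], int lo, int hi) {
    int p = lo + rand() % (hi - lo + 1);   /* random pivot position */
    swap(&a[p], &a[hi]);                   /* move the pivot to the end */
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)
        if (a[j] < pivot) swap(&a[i++], &a[j]);
    swap(&a[i], &a[hi]);
    return i;
}

void random_quicksort(int a[], int lo, int hi) {
    if (lo >= hi) return;
    int p = random_partition(a, lo, hi);
    random_quicksort(a, lo, p - 1);
    random_quicksort(a, p + 1, hi);
}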
Randomized Primality Testing
• No known polynomial time algorithm for primality
testing
– but does not appear to be NP-complete either – in
between?
• Best known algorithm (see the sketch after this list):
1. Guess a random number 0 < A < N
2. If (A^(N-1) % N) ≠ 1, then N is not prime
3. Otherwise, 75% chance N is prime
– or N is a “Carmichael number” – a slightly more complex test rules out this case
4. Repeat to increase confidence in the answer
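Below is a C sketch of the test described in steps 1–4, using repeated squaring to compute A^(N-1) mod N. The names (pow_mod, probably_prime) and the trial count are assumptions, this simple Fermat-style version omits the extra check for Carmichael numbers mentioned above, and it assumes n is small enough (under about 2^32) that the multiplications do not overflow.

#include <stdint.h>
#include <stdlib.h>

/* (base^exp) mod m by repeated squaring; assumes m < 2^32 so products fit in 64 bits. */
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Returns 0 if n is certainly composite, 1 if it is probably prime.
 * Each round picks a random A (here in [2, n-2]); failing Fermat's test proves
 * compositeness, and repeating the test increases confidence. */
int probably_prime(uint64_t n, int trials) {
    if (n < 4) return n == 2 || n == 3;
    for (int t = 0; t < trials; t++) {
        uint64_t a = 2 + (uint64_t)rand() % (n - 3);
        if (pow_mod(a, n - 1, n) != 1)
            return 0;                      /* witness found: n is not prime */
        /* otherwise this round is consistent with n being prime */
    }
    return 1;
}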
Randomized Search Algorithms
• Finding a goal node in very, very large graphs
using DFS, BFS, and even A* (using known
heuristic functions) is often too slow
• Alternative: random walk through the graph
N-Queens Problem
• Place N queens on an N by N chessboard so that
no two queens can attack each other
• Graph search formulation (a random-walk sketch follows below):
– Each way of placing from 0 to N queens on the
chessboard is a vertex
– Edge between vertices that differ by adding or removing
one queen
– Start vertex: empty board
– Goal vertex: any one with N non-attacking queens (there
are many such goals)
• Demo
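For concreteness, here is a small C sketch of a random walk on a closely related formulation (an assumption, not the demo's code): one queen per column, a move re-places one random queen in a random row, and moves that increase the number of attacking pairs are undone, which biases the walk toward the goal. A pure random walk would simply accept every move.

#include <stdlib.h>

#define N 8

/* rows[c] = row of the queen in column c (one queen per column). */
static int conflicts(const int rows[N]) {
    int count = 0;
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            if (rows[i] == rows[j] || abs(rows[i] - rows[j]) == j - i)
                count++;                   /* same row or same diagonal */
    return count;
}

/* Returns 1 if a conflict-free placement was reached within max_steps, else 0. */
int random_walk_queens(int rows[N], long max_steps) {
    for (int c = 0; c < N; c++)            /* start vertex: a random placement */
        rows[c] = rand() % N;
    for (long step = 0; step < max_steps; step++) {
        int cur = conflicts(rows);
        if (cur == 0) return 1;            /* goal: no two queens attack each other */
        int c = rand() % N, old = rows[c];
        rows[c] = rand() % N;              /* random move: re-place one queen */
        if (conflicts(rows) > cur)
            rows[c] = old;                 /* undo moves that make things worse */
    }
    return 0;                              /* give up after max_steps */
}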
Random Walk – Complexity?
• Random walk – also known as an “absorbing Markov chain”, “simulated annealing”, or the
“Metropolis algorithm” (Metropolis et al., 1953)
• Can often prove that if you run long enough you will reach a goal state – but it may
take exponential time
• In some cases can prove that with high probability a
goal is reached in polynomial time
– e.g., 2-SAT, Papadimitriou 1997
• Widely used for real-world problems where actual
complexity is unknown – scheduling, optimization
Traveling Salesman
Recall the Traveling Salesperson (TSP) Problem:
Given a fully connected, weighted graph G = (V,E), is there a cycle that visits all
vertices exactly once and has total cost ≤ K?
– NP-complete: reduction from Hamiltonian circuit
• Occurs in many real-world transportation and
design problems
• Randomized simulated annealing algorithm demo
Latin Squares
• Randomization can be combined with depth first
search
• When a branch of the search terminates without
finding a solution, algorithm backs up to the last
choice point: backtracking search
• Instead of making the choice of which branch to follow systematically, make it randomly
– If your random choices are unlucky, give up and start
over again
• Demo
Final Review
Be Sure to Bring
• 1 page of notes
• A hand calculator
• Several #2 pencils
Final Review: What you need to know
• Basic Math
– Logs, exponents, summation of series, e.g.
  Σ_{i=1..N} i = N(N+1)/2   and   Σ_{i=0..N} A^i = (A^(N+1) - 1)/(A - 1)
– Proof by induction
• Asymptotic Analysis
Final Review: What you need to know
• Binary Search Trees
– How to do Find, Insert, Delete
• Bad worst case performance – could take up to O(N) time
– AVL trees
• Balance factor is +1, 0, -1
• Know single and double rotations to keep tree balanced
• All operations are O(log N) worst case time
– Splay trees – good amortized performance
• A single operation may take O(N) time but in a sequence of
operations, average time per operation is O(log N)
• Every Find, Insert, Delete causes accessed node to be moved to the
root
• Know how to zig-zig, zig-zag, etc. to “bubble” node to top
Final Review: What you need to know
• Priority Queues
– Binary Heaps: Insert/DeleteMin, Percolate up/down
• Array implementation
• BuildHeap takes only O(N) time (used in heapsort)
– Binomial Queues: Forest of binomial trees with heap order
• Merge is fast – O(log N) time
• Insert and DeleteMin based on Merge
• Hashing
– Hash functions based on the mod function
– Collision resolution strategies
• Chaining, Linear and Quadratic probing, Double Hashing
– Load factor of a hash table
Final Review: What you need to know
• Sorting Algorithms: Know run times and how they work
– Elementary sorting algorithms and their run time
• Selection sort
– Heapsort – based on binary heaps (max-heaps)
• BuildHeap and repeated DeleteMax’s
– Mergesort – recursive divide-and-conquer, uses extra array
– Quicksort – recursive divide-and-conquer, Partition in-place
• fastest in practice, but O(N²) worst case time
• Pivot selection – median-of-three works best
– Know which of these are stable and in-place
– Lower bound on sorting, bucket sort, and radix sort
Final Review: What you need to know
• Disjoint Sets and Union-Find
– Up-trees and their array-based implementation
– Know how Union-by-size and Path compression work
– No need to know run time analysis – just know the result:
• Sequence of M operations with Union-by-size and P.C. is Θ(M α(M,N)) – just a little
more than Θ(1) amortized time per op
• Graph Algorithms
– Adjacency matrix versus adjacency list representation of
graphs
– Know how to Topological sort in O(|V| + |E|) time using a
queue
– Breadth First Search (BFS) for unweighted shortest path
Final Review: What you need to know
• Graph Algorithms (cont.)
– Dijkstra’s shortest path algorithm
– Depth First Search (DFS) and Iterated DFS
• Use of memory compared to BFS
– A* - relation of g(n) and h(n)
– Minimum Spanning trees – Kruskal’s & Prim’s algorithms
– Connected components using DFS or union/find
• NP-completeness
– Euler versus Hamiltonian circuits
– Definition of P, NP, NP-complete
– How one problem can be “reduced” to another (e.g. input to HC
can be transformed into input for TSP)
Final Review: What you need to know
• Multidimensional Search Trees
– k-d Trees – find and range queries
• Depth logarithmic in number of nodes
– Quad trees – find and range queries
• Depth logarithmic in inverse of minimal distance between
nodes
• But a higher branching factor means shorter depth if points are well spread out
(log base 4 instead of log base 2)
• Randomized Algorithms
– expected time vs. average time vs. amortized time
– Treaps, randomized Quicksort, primality testing