Advanced Data Structures
Chapter 2
• What if the data items contained in a search tree do not fit into the
main memory?
2
Cycles to Access Storage
3
Search trees on disks
• A majority of the tree operations (search, insert, delete, etc.) will
require O(lg n) disk accesses, where n is the number of data items in
the search tree
4
B-Tree
• A B-Tree is a low-depth, self-balancing tree. The height of a B-Tree is
kept low by packing the maximum possible number of keys into each node.
• B-Trees have a high branching factor (also termed the order)
to reduce the depth.
5
Definition
A B-Tree of order m is an m-ary tree with the following properties:
• The data items are stored at leaves.
• The non-leaf nodes store up to m − 1 keys to guide the searching; The
key i represents the smallest key in subtree i + 1.
• The root is either a leaf or has between 2 and m children.
• All non-leaf nodes (except the root) have between ⌈m/2⌉ and m
children.
• All leaves are at the same depth and have between ⌈k/2⌉ and k data
items, for some k.
6
B-Tree
• Order 5
• Depth 3
• m=5
• k=5
7
Searching
• Search 44
8
Inserting
• Insert 56
• Data items are shifted
9
Inserting
• Inserting 62
• Breaking leaf node into pairs
10
Inserting
Inserting 35
• Breaking into pairs and inclusion of a new non-leaf node
11
Insertion Algorithm
1. Insert the key in its leaf in sorted order.
2. If the leaf ends up with L+1 items, “overflow”:
• Split into two nodes: the old with ⌈(L+1)/2⌉ items and the new with
⌊(L+1)/2⌋ items
• Add the new child to the parent
• If the parent ends up with M+1 children, “overflow”
3. If an internal node ends up with M+1 children, “overflow”:
• Split into two nodes: the old with ⌈(M+1)/2⌉ children and the new with
⌊(M+1)/2⌋ children
• Add the new child to the parent
• If the parent ends up with M+1 children, “overflow”
4. Split an overflowed root in two; hand the new nodes under a new root.
5. Propagate keys up the tree.
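The leaf split in step 2 can be sketched in Python as follows (a minimal sketch; `split_leaf` and its arguments are illustrative names, and the items are assumed to be already in sorted order):

```python
import math

def split_leaf(items, L):
    """Split a leaf that has overflowed to L+1 items.

    Returns (old_items, new_items, separator_key): the old node keeps the
    first ceil((L+1)/2) items, the new node takes the rest, and the
    separator key (the smallest key of the new node) is pushed up to the
    parent to guide future searches.
    """
    assert len(items) == L + 1, "split only applies to an overflowed leaf"
    mid = math.ceil((L + 1) / 2)
    old_items, new_items = items[:mid], items[mid:]
    return old_items, new_items, new_items[0]
```

For example, with L = 5, a leaf holding [10, 20, 30, 40, 50, 60] splits into [10, 20, 30] and [40, 50, 60], and the key 40 is added to the parent.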
12
Deleting
• Delete 96
13
Deletion Algorithm
1. Remove the key from its leaf.
2. If the leaf ends up with fewer than L/2 items, “underflow”:
• Adopt data from a neighbour; update the parent
• If adopting won’t work, delete the node and merge with a neighbour
• If the parent ends up with fewer than M/2 children, “underflow”
3. If an internal node ends up with fewer than M/2 children, “underflow”:
• Adopt children from a neighbour; update the parent
• If adopting won’t work, delete the node and merge with a neighbour
• If the parent ends up with fewer than M/2 children, “underflow”
4. If the root ends up with only one child, make the child the new root of
the tree.
5. Propagate keys up through the tree.
14
B+-Trees
A B+-Tree of order m is an m-ary tree with the following properties:
• The data items are stored at leaves.
• The non-leaf nodes store up to m − 1 keys to guide the searching; The key i
represents the smallest key in subtree i + 1.
• All leaves are at the same depth and have up to k data items, for some k.
15
B+-Tree
16
Network Flows
• When we concerned ourselves with shortest paths, we interpreted the
edge weights of a graph as distances; in a flow network, the weight of
an edge instead represents its capacity.
17
Flow Network
• We require that E never contain both (u,v) and (v,u) for any pair of
vertices u,v (so in particular, there are no loops). Also, if u,v ∈ V with
(u,v) ∉ E, then we define c(u,v) to be zero.
18
Max Flow Problem
• Now, to define our problem, we have two constraints. The capacity
constraint: each edge (u, v) in a flow network has some flow f(u, v)
attached such that 0 ≤ f(u, v) ≤ c(u, v).
• The flow conservation constraint: the flow entering a node must be the
same as the flow exiting it (∑v∈V f(u, v) = ∑v∈V f(v, u)).
19
Ford Fulkerson
Ford-Fulkerson(G, S, T):
    initialize flow f(u, v) = 0 for all edges (u, v) in G
    while an augmenting path P from S to T exists in the residual network:
        augment f along P by the bottleneck (minimum residual) capacity of P
    return f
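The augmenting-path loop can be sketched as a small Python implementation (an illustrative sketch, assuming the graph is given as a dict mapping directed edges to positive capacities; the augmenting path is found by depth-first search on the residual network):

```python
def ford_fulkerson(capacity, s, t):
    """Max flow via Ford-Fulkerson; `capacity` maps (u, v) -> c(u, v) > 0."""
    # Residual capacities: forward edges start at c(u, v), reverse edges at 0.
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = c
        residual.setdefault((v, u), 0)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def dfs(u, bottleneck, visited):
        """Push flow along one augmenting path; return the amount pushed."""
        if u == t:
            return bottleneck
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited and residual[(u, v)] > 0:
                pushed = dfs(v, min(bottleneck, residual[(u, v)]), visited)
                if pushed > 0:
                    residual[(u, v)] -= pushed
                    residual[(v, u)] += pushed
                    return pushed
        return 0

    max_flow = 0
    while True:
        pushed = dfs(s, float("inf"), set())
        if pushed == 0:          # no augmenting path remains
            return max_flow
        max_flow += pushed
```

With integer capacities each iteration increases the flow by at least 1, so the loop terminates.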
20
Edmonds-Karp
Edmonds-Karp(G, S, T):
    initialize flow f(u, v) = 0 for all edges (u, v) in G
    while BFS finds a (shortest) augmenting path P from S to T in the residual network:
        augment f along P by the bottleneck capacity of P
    return f
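A matching sketch for Edmonds-Karp differs from Ford-Fulkerson only in how the augmenting path is found: breadth-first search, which always returns a shortest augmenting path (again an illustrative sketch, with the graph as a dict of edge capacities):

```python
from collections import deque

def edmonds_karp(capacity, s, t):
    """Max flow via Edmonds-Karp: Ford-Fulkerson with BFS augmenting paths."""
    residual, adj = {}, {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = c
        residual.setdefault((v, u), 0)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    max_flow = 0
    while True:
        # BFS for the shortest augmenting path in the residual network.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:       # no augmenting path remains
            return max_flow
        # Recover the path, find its bottleneck, then augment.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        pushed = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= pushed
            residual[(v, u)] += pushed
        max_flow += pushed
```

Using shortest paths bounds the number of augmentations by O(VE), giving O(VE²) overall.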
21
Shortest Path
• Given a directed weighted graph G = (V, E, w) and two vertices x and y,
find the least-cost path from x to y in G, where the cost of a path P is
w(P) = ∑u→v∈P w(u → v)
23
Dijkstra’s Algorithm
For each edge (u, v) ∊ E, assume w(u, v) ≥ 0, maintain a set S of vertices
whose final shortest path weights have been determined. Repeatedly
select u ∊ V − S with minimum shortest path estimate, add u to S, relax
all edges out of u.
RELAX(u, v, w)
if d[v] > d[u] + w(u, v)
then d[v] = d[u] + w(u, v)
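The algorithm described above, with RELAX applied to every edge out of the selected vertex, can be sketched with a binary heap (a minimal sketch; `adj` is an assumed adjacency-list representation mapping each vertex to its outgoing (neighbour, weight) pairs):

```python
import heapq

def dijkstra(adj, s):
    """Shortest-path distances from s; all edge weights assumed >= 0."""
    dist = {s: 0}
    done = set()                 # the set S of finalized vertices
    pq = [(0, s)]                # min-heap of (shortest-path estimate, vertex)
    while pq:
        d, u = heapq.heappop(pq)
        if u in done:            # stale entry; u was already finalized
            continue
        done.add(u)
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):   # RELAX(u, v, w)
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist
```

Instead of decreasing keys in place, this sketch pushes duplicates and skips stale heap entries, a common idiom with Python's heapq.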
24
Dijkstra’s Algorithm
25
Bellman-Ford Algorithm
• The Bellman-Ford algorithm is a way to find single source shortest
paths in a graph with negative edge weights
26
Bellman-Ford
The Bellman-Ford algorithm utilizes the
same sort of distance update operation/relaxation that Dijkstra’s uses:
dist(v) = min(dist(v), dist(u) + l(u, v))
• Dijkstra’s algorithm can stop performing this update operation for node v
once v is popped from the priority queue, since a shorter path to v can
never be found later.
• However, with negative edges, Bellman-Ford may need to apply this update
operation many more times to account for future shorter paths.
27
Pseudocode
def bellmanFord(G, l, s):
    for all v in V:
        dist(v) = infinity
    dist(s) = 0
    for i = 1...(|V| - 1):
        for each e = (u, v) in E:
            dist(v) = min{dist(v), dist(u) + l(u, v)}
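The pseudocode above translates directly into runnable Python; this sketch also adds the customary extra round that detects a negative cycle (the function name and graph representation are illustrative):

```python
def bellman_ford(vertices, edges, l, s):
    """dist from s; edges is a list of (u, v) pairs, l maps (u, v) -> length."""
    dist = {v: float("inf") for v in vertices}
    dist[s] = 0
    for _ in range(len(vertices) - 1):          # |V| - 1 rounds
        for (u, v) in edges:
            if dist[u] + l[(u, v)] < dist[v]:
                dist[v] = dist[u] + l[(u, v)]
    # One extra round: any further improvement implies a negative cycle.
    for (u, v) in edges:
        if dist[u] + l[(u, v)] < dist[v]:
            raise ValueError("graph contains a negative cycle reachable from s")
    return dist
```

After i rounds, dist is correct for every vertex whose shortest path uses at most i edges, which is why |V| − 1 rounds suffice.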
28
Example
• Distance value for each round
30
Johnson’s Algorithm
• Johnson’s all-pairs shortest path algorithm computes a cost for each vertex,
so that the new weight of every edge is non-negative, and then computes
shortest paths with respect to the new weights using Dijkstra’s algorithm.
• First, suppose the input graph has a vertex s that can reach all the other
vertices. Johnson’s algorithm computes the shortest paths from s to the
other vertices, using Bellman-Ford (which doesn’t care if the edge weights
are negative), and then reweights the graph using the cost function π(v) =
dist(s, v). The new weight of every edge u → v is w′(u → v) = w(u → v) +
π(u) − π(v), which is non-negative because dist(s, v) ≤ dist(s, u) + w(u → v).
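The reweighting step can be checked numerically; this sketch (with a small hypothetical graph in which s reaches every vertex) computes π by Bellman-Ford and verifies that every reweighted edge is non-negative:

```python
def reweight(vertices, edges, w, s):
    """Johnson's reweighting: make all edge weights non-negative.

    pi(v) = dist(s, v) is computed by Bellman-Ford; the new weight of
    edge u -> v is w(u, v) + pi(u) - pi(v).
    """
    pi = {v: float("inf") for v in vertices}
    pi[s] = 0
    for _ in range(len(vertices) - 1):          # Bellman-Ford rounds
        for (u, v) in edges:
            pi[v] = min(pi[v], pi[u] + w[(u, v)])
    return {(u, v): w[(u, v)] + pi[u] - pi[v] for (u, v) in edges}

# Example graph with a negative edge (illustrative data):
w = {("s", "a"): 2, ("s", "b"): 5, ("a", "b"): -4, ("b", "c"): 1}
new_w = reweight({"s", "a", "b", "c"}, list(w), w, "s")
assert all(c >= 0 for c in new_w.values())
```

Note that every edge on a shortest path gets new weight 0, so shortest paths are preserved under the reweighting.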
31
Geometric data
• Classical dictionary data structures largely concern themselves with one-
dimensional keys
• Queries
• Nearest Neighbour Search: find the closest point to the query point
• Range Searching: given a rectangle or a circle, find the points that lie inside
• Point Location: Determine the region of the subdivision containing this point
32
Multi-Dimensional Data
• There are similarities with 1-D data
• Tree structure
• Balance O(lg n)
• Internal and leaf nodes
• Difference
• No natural total order (no single unique way to sort the data, e.g. by x or y
coordinate)
• Tree rotation may not be meaningful
33
Data Structure
• Partition Trees
• Hierarchical space
partition
• In binary trees, each point naturally splits the real line in two. In two
dimensions if we run a vertical and horizontal line through the point,
it naturally subdivides the plane into four quadrants about this point.
35
Point Quadtree
• Each node has four (possibly null) children,
corresponding to the four quadrants defined by
the 4-way subdivision. We label these according
to the compass directions, as NW, NE, SW, and
SE.
37
Subdivision and corresponding Tree
38
Point kd-tree
• Point quadtrees can be generalized to higher dimensions, but the number
of children grows exponentially with the dimension d, as 2^d.
• For example, if we are working in 20-dimensional space, every node has 2^20,
or roughly a million, children! Clearly, the simple quadtree idea is not scalable
to very high dimensions.
• Like in a quadtree, when a new point is inserted into some leaf node’s
cell, we split the cell by a horizontal or vertical splitting line, which
passes through this point
• Often we alternate between the horizontal and the vertical
39
Point kd-tree
• Height of the tree: O(lg n)
40
Data Structure
41
Insertion
• The function takes three arguments, the point pt being inserted, the
current node p, and the cutting dimension cd of the newly created
node. The initial call is root = insert(pt, root, 0).
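The insertion described above can be sketched with dict-based nodes (an illustrative sketch; the node fields and the dim parameter are assumptions, but the call convention root = insert(pt, root, 0) matches the description):

```python
def insert(pt, p, cd, dim=2):
    """Insert point pt (a tuple) into the kd-tree rooted at p.

    cd is the cutting dimension of the node being created; it cycles
    with depth (cd, cd+1, ... mod dim), alternating between the
    vertical and horizontal splits in 2-D.
    """
    if p is None:                # empty subtree: create a new leaf cell
        return {"point": pt, "cd": cd, "left": None, "right": None}
    if pt[p["cd"]] < p["point"][p["cd"]]:
        p["left"] = insert(pt, p["left"], (p["cd"] + 1) % dim, dim)
    else:
        p["right"] = insert(pt, p["right"], (p["cd"] + 1) % dim, dim)
    return p
```

Ties on the cutting coordinate go to the right subtree here; either convention works as long as it is consistent.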
42
Deletion
• How to find the replacement for the deleted node?
43
Deletion
• Finding replacement for the deletion
44
Rectangle Range Searching
• Report all rectangles intersecting query rectangle Q
Q: Find all restaurants in Sanepa
46
R-trees
• Similar to a B-tree
• Rectangles are stored in the leaves (all on the same level)
• Internal nodes contain the MBR of the rectangles below each child
47
R-trees
• The key idea of the data structure is to group nearby objects and
represent them with their minimum bounding rectangle in the next
higher level of the tree
• Since all objects lie within this bounding rectangle, a query that does
not intersect the bounding rectangle also cannot intersect any of the
contained objects
48
Search
Let q be the search region of a range query. We invoke range-query(root, q),
where root is the root of the tree.
range-query(u, q)
if u is a leaf then
report all points stored at u that are covered by q
else
for each child v of u do
if MBR(v) intersects q then
range-query(v, q)
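The range-query pseudocode can be made executable with dict-based nodes (an illustrative sketch; the node fields leaf, points, children, and mbr, and the (x1, y1, x2, y2) rectangle encoding, are assumptions):

```python
def intersects(r1, r2):
    """Axis-aligned rectangles as (x1, y1, x2, y2) with x1 <= x2, y1 <= y2."""
    return not (r1[2] < r2[0] or r2[2] < r1[0] or
                r1[3] < r2[1] or r2[3] < r1[1])

def covers(r, pt):
    """Does rectangle r contain point pt?"""
    return r[0] <= pt[0] <= r[2] and r[1] <= pt[1] <= r[3]

def range_query(u, q):
    """Report all points in the subtree rooted at u covered by rectangle q."""
    if u["leaf"]:
        return [p for p in u["points"] if covers(q, p)]
    out = []
    for v in u["children"]:
        if intersects(v["mbr"], q):   # prune subtrees whose MBR misses q
            out.extend(range_query(v, q))
    return out
```

The pruning test is exactly the observation above: if q misses a child's MBR, it cannot contain any point stored below that child.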
49
Example
• Nodes u1, u2, u3, u5, u6 are accessed to answer the query with the
shaded search region
50
Tree Construction
Which is better?
51
Common Principle
• In general, the construction algorithm of the R-tree aims at
minimizing the perimeter sum of all the MBRs.
• For example, the left tree has a smaller perimeter sum than the right
one.
52
Insertion
• Let p be the point being inserted and root is the root of the tree
insert(u, p)
if u is a leaf node then
add p to u
if u overflows then
/* namely, u has B + 1 points */
handle-overflow(u)
else
v ← choose-subtree(u, p)
/* which subtree under u should we insert p into? */
insert(v, p)
53
Overflow
handle-overflow(u)
split(u) into u and u’
if u is the root then
create a new root with u and u’ as its child nodes
else
w ← the parent of u
update MBR(u) in w
add u’ as a child of w
if w overflows then
handle-overflow(w)
54
Choosing a subtree
choose-subtree(u, p)
return the child whose MBR requires the
minimum increase in perimeter to cover p.
break ties by favoring the smallest MBR.
55
Example
Assume that we want to insert the white point m. By applying choose-
subtree twice, we reach the leaf node u6 that should accommodate m.
The node overflows after incorporating m (here, B = 3)
56
Example (contd.)
Node u6 splits, generating u9. Adding u9 as a child of u3 causes u3 to
overflow
57
Example (contd.)
Node u3 splits, generating u10. The insertion finishes after adding u10 as
a child of the root
58
Minimum Spanning Trees
• A tree is a connected graph with no
cycles. A spanning tree is a subgraph of
G which has the same set of vertices of
G and is a tree.
59
Prim’s Algorithm
function prims(G)
let E' = {}
pick any starting node, s, in V
let V' = {s}
while |V'| < |V|
find e, the minimum-weight edge with exactly one endpoint in V’
add this edge to E’
add the endpoint of e not in V' to V’
return (V', E')
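The pseudocode above can be sketched with a priority queue holding candidate crossing edges (an illustrative sketch; adj maps each vertex to (weight, neighbour) pairs, and the graph is assumed connected and undirected):

```python
import heapq

def prims(adj, s):
    """MST edges via Prim's algorithm, starting from vertex s."""
    in_tree = {s}                       # the set V'
    mst_edges = []                      # the set E'
    frontier = [(w, s, v) for w, v in adj[s]]
    heapq.heapify(frontier)             # min-heap of edges leaving V'
    while frontier and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(frontier)
        if v in in_tree:
            continue                    # both endpoints already in V'
        in_tree.add(v)                  # add the endpoint of e not in V'
        mst_edges.append((u, v, w))
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (w2, v, x))
    return mst_edges
```

Stale heap entries (edges whose far endpoint has since joined V') are simply skipped when popped.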
60
Kruskal’s Algorithm
function kruskals(G)
// using disjoint sets, find and union
let E' = {}
sort edges in E by weight
make singleton sets of all nodes in V
for each edge, (u,v) in E
if find(u) != find(v)
add edge to E’
union(u, v)
return (V, E')
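The pseudocode above can be sketched with an explicit disjoint-set forest providing find and union (an illustrative sketch; edges are (weight, u, v) triples):

```python
def kruskals(vertices, edges):
    """MST via Kruskal's algorithm; edges is a list of (weight, u, v)."""
    parent = {v: v for v in vertices}   # singleton set for every node

    def find(x):
        """Representative of x's set, with path halving for speed."""
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):       # edges in increasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                    # find(u) != find(v): no cycle formed
            mst.append((u, v, w))
            parent[ru] = rv             # union(u, v)
    return mst
```

With union by rank or path compression the find/union operations are nearly constant time, so sorting dominates at O(E lg E).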
69
struct t_red_black_node {
    enum { red, black } colour;
    /* remaining fields completed as in a typical implementation: */
    void *item;                              /* the stored data item */
    struct t_red_black_node *left, *right, *parent;
};
85
Red-Black Tree
• If we label the “second node” of each 3-node as red and all the other
nodes as black, we obtain a binary tree with both red and black nodes. The
resulting binary tree satisfies the following red-black tree properties:
86