Data Structures Cheat Sheet
Trees

Red-Black Tree
1. Red Rule: A red child must have a black father.
2. Black Rule: All paths to external nodes pass through the same number of black nodes.
3. All the leaves are black, and the sky is grey.
Rotations are terminal cases; they happen only once per fixup.
If we have a series of insert-delete operations for which the insertion point is known, the amortized cost of each action is O(1).
Height: log n ≤ h ≤ 2 log n
Limit of rotations: 2 per insert.
Bound of ratios between two branches L, R: S(R) ≤ (S(L))^2
Completely isomorphic to 2-4 Trees.

Binary Heap
Melding: If the heap is represented by an array, link the two arrays together and Heapify-Up. O(n).

Binomial Heap
Melding: Unify trees by rank, like binary summation. O(log n).

Fibonacci Heap
Maximum degree: D(n) ≤ log_ϕ n; ϕ = (1+√5)/2
Minimum size of degree k: s_k ≥ F_{k+2}
Marking: Every node which lost one child is marked.
Cascading Cut: Cut every marked node while climbing upwards. Keeps amortized O(log n) time for deleteMin; otherwise O(√n).
Proof of the ϕ^k node size bound:
1. All subtrees of a junction j, sorted by order of insertion, have degree D[s_i] ≥ i − 2. (Proof: when x's largest subtree was added, D[x] was i − 1, and so was the subtree's degree. Since then it could lose only one child, so it is at least i − 2.)
2. F_{k+2} = 1 + Σ_{i=0}^{k} F_i; F_{k+2} ≥ ϕ^k
3. If x is a node and k = deg[x], then S_x ≥ F_{k+2} ≥ ϕ^k. (Proof: assume induction past the base cases; then s_k = 2 + Σ_{i=2}^{k} s_{i−2} ≥ 2 + Σ_{i=2}^{k} F_i = 1 + Σ_{i=0}^{k} F_i = F_{k+2}.)

B-Tree
d defines the minimum number of keys on a node.
Height: h ≈ log_d n
1. Every node has at most d children and at least d/2 children (root excluded).
2. The root has at least 2 children if it isn't a leaf.
3. A non-leaf node with k children contains k − 1 keys.
4. In B+ trees, leaves appear at the same level.
5. Nodes at each level form linked lists.
d is optimized for the HDD/cache block size.
Insert: Add at the insertion point. If the node gets too large, split. O(log_d n) ≤ O(log n)
Split: The middle key of the node (lower median) moves up to become a key of the father node. O(d)
Delete: If the key is not in a leaf, swap it with its successor/predecessor. Delete, and deal with the short node v:
1. If v is the root, discard; terminate.
2. If v has a non-short sibling, steal from it; terminate.
3. Fuse v with its sibling; repeat with v ← p[v].

Structures

Median Heap: one min-heap and one max-heap with ∀x ∈ min-heap, y ∈ max-heap : x > y; the median is then at one of the two roots.
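The median-heap idea above can be sketched in Python with the standard heapq module (a minimal sketch; heapq only provides a min-heap, so the max-heap side stores negated keys — the negation trick and the rebalancing policy are implementation choices, not part of the definition):

```python
import heapq

class MedianHeap:
    """Lower half in a max-heap (stored negated), upper half in a min-heap."""

    def __init__(self):
        self.lo = []  # max-heap via negation: every element <= every element of hi
        self.hi = []  # min-heap

    def insert(self, x):
        if self.lo and x > -self.lo[0]:
            heapq.heappush(self.hi, x)
        else:
            heapq.heappush(self.lo, -x)
        # Rebalance so the two halves differ in size by at most one.
        if len(self.lo) > len(self.hi) + 1:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        elif len(self.hi) > len(self.lo) + 1:
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        # The median sits at one of the two roots.
        if len(self.lo) >= len(self.hi):
            return -self.lo[0]
        return self.hi[0]
```

Both insert and median-lookup cost O(log n) and O(1) respectively, since each operation touches only the heap roots.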
Sorting

Comparables

Algorithm       Expected     Worst        Storage
QuickSort       O(n log n)   O(n^2)       In-Place
BubbleSort      O(n^2)       O(n^2)       In-Place
SelectionSort   O(n^2)       O(n^2)       In-Place
HeapSort        O(n log n)   O(n log n)   In-Place
InsertionSort   O(n^2)       O(n^2)       In-Place
MergeSort       O(n log n)   O(n log n)   Aux

QuickSort: Partition recursively at each step.
SelectionSort: Traverse n slots keeping score of the maximum; swap it with A[n]. Repeat for A[n − 1].

Traversals

Traverse(t):
    if t == null then return
    → print(t)          // pre-order
    Traverse(t.left)
    → (OR) print(t)     // in-order
    Traverse(t.right)
    → (OR) print(t)     // post-order
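The SelectionSort routine described above (scan for the maximum, swap it into the last unsorted slot, repeat) can be sketched as follows; function and variable names are illustrative:

```python
def selection_sort(a):
    """In-place selection sort: O(n^2) comparisons, at most n - 1 swaps."""
    for end in range(len(a) - 1, 0, -1):
        # Find the index of the maximum among a[0..end].
        i_max = 0
        for i in range(1, end + 1):
            if a[i] > a[i_max]:
                i_max = i
        # Swap it into the last slot of the unsorted prefix.
        a[i_max], a[end] = a[end], a[i_max]
    return a
```

Note the In-Place entry in the table: only O(1) extra storage is used regardless of input size.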
Selection

Algorithm        Expected   Worst
QuickSelect      O(n)       O(n^2)
5-tuple Select   O(n)       O(n)

Union-Find

MakeSet(x)   Union(x, y)   Find(x)
O(1)         O(1)          O(α(n)) amortized

Union by Rank: The larger tree remains the master tree in every union.
Path Compression: Every find operation first finds the master root, then repeats its walk to change the subroots.
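Union by Rank and Path Compression combine as in the following sketch (the parent/rank array representation is an assumed, conventional one):

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # MakeSet for elements 0..n-1, O(1) each
        self.rank = [0] * n

    def find(self, x):
        # First walk: locate the master root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second walk (path compression): point every subroot at the master root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        # Union by rank: the larger tree remains the master tree.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```

With both optimizations, a sequence of m operations costs O(m α(n)), matching the table above.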
Hashing

Universal Family: a family of mappings H, each h ∈ H with h : U → [m]. H is universal iff ∀k1 ≠ k2 ∈ U : Pr_{h∈H}[h(k1) = h(k2)] ≤ 1/m.
Example: If U = [p] = {0, 1, ..., p − 1} with p prime, then H_{p,m} = {h_{a,b} | 1 ≤ a ≤ p − 1; 0 ≤ b ≤ p − 1} is universal, where h_{a,b}(k) = ((ak + b) mod p) mod m.
Linear Probing: Search in incremental order through the table from h(x) until a vacancy is found.
Open Addressing: Use h1(x) to hash and h2(x) to permute. No pointers.

Recursion

Master Theorem: for T(n) = aT(n/b) + f(n); a ≥ 1, b > 1, ε > 0:
1. If f(n) = O(n^{log_b a − ε}), then T(n) = Θ(n^{log_b a}).
2. If f(n) = Θ(n^{log_b a} · log^k n) for some k ≥ 0, then T(n) = Θ(n^{log_b a} · log^{k+1} n).
3. If f(n) = Ω(n^{log_b a + ε}) and a·f(n/b) ≤ c·f(n) for some c < 1, then T(n) = Θ(f(n)).
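As a sanity check of case 2 (with k = 0), the MergeSort recurrence T(n) = 2T(n/2) + n, T(1) = 1 can be evaluated directly; for powers of two it solves exactly to T(n) = n(log2 n + 1), which is Θ(n log n) as the theorem predicts:

```python
def T(n):
    """MergeSort recurrence T(n) = 2T(n/2) + n, T(1) = 1 (n a power of two)."""
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n

# Here a = 2, b = 2, f(n) = n = n^(log_b a), i.e. case 2 with k = 0,
# so the theorem gives T(n) = Theta(n log n).
```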
Building a recursion tree: build one tree for running times (at T(αn)) and one for f(n).

Orders of Growth
f = O(g): lim sup_{x→∞} f/g < ∞        f = o(g): f/g → 0 as x → ∞
f = Θ(g): lim_{x→∞} f/g ∈ R+
f = Ω(g): lim inf_{x→∞} f/g > 0        f = ω(g): f/g → ∞ as x → ∞

Open Hashing: Colliding keys are kept in a chain (list) per slot.
Perfect Hash: When one function clashes, try another. O(∞).
Load Factor α: The expected length of a possible collision chain. When |U| = n, α = n/m.

Methods
Modular: Multiplicative, Additive, Tabular (byte)-additive
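The universal family H_{p,m} above can be sketched as follows (p = 101 and m = 10 are assumed example parameters; p must be a prime larger than every key):

```python
import random

def make_hash(p, m):
    """Draw a random h_{a,b}(k) = ((a*k + b) mod p) mod m from H_{p,m}."""
    a = random.randint(1, p - 1)   # a in {1, ..., p-1}
    b = random.randint(0, p - 1)   # b in {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m

# One random member of the family, hashing U = {0, ..., 100} into 10 slots.
h = make_hash(101, 10)
```

Picking a and b at random per table is what yields the ≤ 1/m collision bound over the choice of h.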
Amortized Analysis

Potential Method: Set Φ to examine a parameter on the data structure D_i, where i indexes the state of the structure. If c_i is the actual cost of action i, then ĉ_i = c_i + Φ(D_i) − Φ(D_{i−1}). The total amortized cost is then Σ_{i=1}^{n} ĉ_i = Σ_{i=1}^{n} (c_i + Φ(D_i) − Φ(D_{i−1})) = Σ_{i=1}^{n} c_i + Φ(D_n) − Φ(D_0). (Example: for a dynamic array, Φ(D) = 2·size − capacity makes each append amortized O(1) despite the occasional O(n) doubling copy.)
Deterministic algorithm: Always predictable.
Stirling's Approximation: n! ∼ √(2πn) (n/e)^n ⇒ log n! ∼ n log n − n

Performance

Chaining                        E[X]        Worst Case
Successful Search/Delete        (1 + α)/2   n
Failed Search/Verified Insert   1 + α       n

Probing
Linear: h(k, i) = (h'(k) + i) mod m
Quadratic: h(k, i) = (h'(k) + c1·i + c2·i^2) mod m
Double: h(k, i) = (h1(k) + i·h2(k)) mod m

E[X]           Unsuccessful Search      Successful Search
Uni. Probing   1/(1 − α)                (1/α) ln(1/(1 − α))
Lin. Probing   (1/2)(1 + 1/(1 − α)^2)   (1/2)(1 + 1/(1 − α))

So Linear Probing is slightly worse, but better for cache.
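A minimal linear-probing table following h(k, i) = (h'(k) + i) mod m; h'(k) = k mod m is an assumed base hash, and deletion is deliberately left out since it needs tombstones:

```python
class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # open addressing: no pointers

    def _probe(self, k):
        # Probe sequence h(k, i) = (h'(k) + i) mod m with h'(k) = k mod m.
        for i in range(self.m):
            yield (k + i) % self.m

    def insert(self, k):
        for idx in self._probe(k):
            if self.slots[idx] is None or self.slots[idx] == k:
                self.slots[idx] = k
                return idx
        raise OverflowError("table full")

    def search(self, k):
        for idx in self._probe(k):
            if self.slots[idx] is None:
                return None      # vacancy reached: k is absent
            if self.slots[idx] == k:
                return idx
        return None
```

The clustered, sequential scan is exactly why linear probing is cache-friendly despite its worse expected probe counts.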
Collision Expectation: E[|Col|] = n(n − 1)/2m, and by Markov's inequality P[X < 2E[X]] ≥ 1/2. So:
1. If m = n, then P[|Col| < n] ≥ 1/2.
2. If m = n^2, then P[|Col| < 1] ≥ 1/2, and with |Col| < 1 there are no collisions.

Scribbled by Omer Shapira, based on the course "Data Structures" at Tel Aviv University.
Redistribute freely.
Website: http://www.omershapira.com