CS315: Data Structures and Algorithms, Class Notes
Raphael Finkel
May 4, 2021
1 Intro
Class 1, 1/26/2021
• Handout 1 — My names
• TA:
• Plagiarism — read aloud
• Assignments on web. Use C, C++, or Java.
• E-mail list:
• accounts in MultiLab
• text — we will skip around
3 Tools
• Use
• Specification
• Implementation
4 Singly-linked list
• used as a part of several ADTs.
• Can be considered an ADT itself.
• Collection of nodes, each with optional arbitrary data and a pointer
to the next element on the list.
[Diagram: a handle points to a list of nodes holding a, c, x, f; each node has a data field and a pointer to the next node, and the last node has a null pointer.]
operation                             cost
create empty list                     O(1)
insert new node at front of list      O(1)
delete first node, returning data     O(1)
count length                          O(n)
search by data                        O(n)
sort                                  O(n log n) to O(n^2)
typedef struct {       // header for a singly-linked list; node is the list-node type
    node *front;
    node *rear;
    int count;
} nodeHeader;
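• A minimal sketch (mine, not from the notes) of the node type and an O(1) insertion at the front; the field names data and next, and the assumption that an empty list has rear == NULL, are mine:

#include <stdlib.h>

typedef struct node {
    int data;               // the optional arbitrary data
    struct node *next;      // pointer to the next element, or NULL at the end
} node;

// Insert a new value at the front of the list: O(1).
void insertAtFront(nodeHeader *header, int value) {
    node *newNode = (node *) malloc(sizeof(node));
    newNode->data = value;
    newNode->next = header->front;
    header->front = newNode;
    if (header->rear == NULL) header->rear = newNode; // the list was empty
    header->count += 1;
} // insertAtFront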
• Class 2, 1/28/2021
• To make search faster: remove the special case of reaching the end of the list by placing a pseudo-data (sentinel) node at the end. Keep track of the pseudo-data node in the header.
typedef struct {
    node *front;
    node *rear;
    node *pseudo;
    int count;
} nodeHeader;
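• A sketch (not from the notes) of how the pseudo-data node speeds up search: store the sought value in the pseudo node, so the loop never needs a separate end-of-list test. It assumes the node fields data and next, that the chain always ends at the pseudo node, and that front points at the pseudo node when the list is empty:

node *search(nodeHeader *header, int value) {
    header->pseudo->data = value;        // sentinel: the loop must terminate
    node *current = header->front;
    while (current->data != value) {
        current = current->next;         // no end-of-list test needed
    }
    return current == header->pseudo ? NULL : current;  // NULL: value absent
} // search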
9 Stack of integer
• Abstract definition: either empty or the result of pushing an integer
onto the stack.
• operations
• stack makeEmptyStack()
• boolean isEmptyStack(stack S)
• int popStack(stack *S) // modifies S
typedef struct {
    int contents[MAXSTACKSIZE]; // MAXSTACKSIZE is defined earlier in the notes
    int count;                  // index of first free space in contents
} stack;

stack *makeEmptyStack() {
    stack *answer = (stack *) malloc(sizeof(stack));
    answer->count = 0;
    return answer;
} // makeEmptyStack
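• The push and pop routines are not shown here; a sketch that continues the listing above (the overflow and underflow checks are my additions):

#include <assert.h>

void pushStack(stack *S, int value) {
    assert(S->count < MAXSTACKSIZE);   // otherwise: overflow
    S->contents[S->count] = value;
    S->count += 1;
} // pushStack

int popStack(stack *S) {
    assert(S->count > 0);              // otherwise: underflow
    S->count -= 1;
    return S->contents[S->count];
} // popStack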
• The array implementation limits the size. Does the linked-list imple-
mentation also limit the size?
• The array implementation needs one cell per (potential) element, and
one for the count. How much space does the linked-list implemen-
tation need?
• We can position two opposite-sense stacks in one array, one growing up from the bottom and the other growing down from the top, so long as their combined size does not exceed the array size.
12 Queue of integer
• Abstract definition: either empty or the result of inserting an integer
at the rear of a queue or deleting an integer from the front of a queue.
• operations
• queue makeEmptyQueue()
• boolean isEmptyQueue(queue Q)
• void insertInQueue(queue Q, int I) // modifies Q
• int deleteFromQueue(queue Q) // modifies Q
[Diagram: a linked-list queue; the header holds front and rear pointers, and a dummy node sits at the front of the list.]
#include <stdlib.h>

// node and makeNode(data, next) are defined earlier in the notes.

typedef struct {
    node *front;
    node *rear;
} queue;

queue *makeEmptyQueue() {
    queue *answer = (queue *) malloc(sizeof(queue));
    answer->front = answer->rear = makeNode(0, NULL); // the dummy node
    return answer;
} // makeEmptyQueue
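• The insert and delete routines are not shown in these notes; a sketch that continues the listing above (it relies on makeNode() and on the dummy node at the front; the emptiness check is my addition):

#include <assert.h>

void insertInQueue(queue *Q, int value) {
    Q->rear->next = makeNode(value, NULL); // link a new node after the rear
    Q->rear = Q->rear->next;
} // insertInQueue

int deleteFromQueue(queue *Q) {
    assert(Q->front != Q->rear);       // the queue must not be empty
    node *old = Q->front;              // the current dummy node
    Q->front = old->next;              // the deleted element's node becomes the new dummy
    free(old);
    return Q->front->data;             // its data is the value removed
} // deleteFromQueue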
[Diagram: a circular array implementation; front is the index of the first element and rear the index of the first free slot, both in 0 .. MAX. The figure contrasts a full queue with an empty one.]
#define MAXQUEUESIZE 30

typedef struct {
    int contents[MAXQUEUESIZE];
    int front; // index of element at the front
    int rear;  // index of first free space after the queue
} queue;
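• A sketch of the circular-buffer operations. To tell a full queue from an empty one (the two pictures above look alike), this version keeps one slot permanently unused; that convention is my choice, not necessarily the one used in class:

#include <assert.h>
#include <stdbool.h>

bool isEmptyQueue(queue *Q) {
    return Q->front == Q->rear;
} // isEmptyQueue

void insertInQueue(queue *Q, int value) {
    int newRear = (Q->rear + 1) % MAXQUEUESIZE;
    assert(newRear != Q->front);            // otherwise the queue is full
    Q->contents[Q->rear] = value;
    Q->rear = newRear;
} // insertInQueue

int deleteFromQueue(queue *Q) {
    assert(!isEmptyQueue(Q));
    int result = Q->contents[Q->front];
    Q->front = (Q->front + 1) % MAXQUEUESIZE;
    return result;
} // deleteFromQueue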
15 Dequeue of integer
• Abstract definition: either empty or the result of inserting an integer at the front or rear of a dequeue or deleting an integer from the front or rear of a dequeue.
• operations
• dequeue makeEmptyDequeue()
• boolean isEmptyDequeue(dequeue D)
• void insertFrontDequeue(dequeue D, int I) // modifies D
• void insertRearDequeue(dequeue D, int I) // modifies D
• int deleteFrontDequeue(dequeue D) // modifies D
• int deleteRearDequeue(dequeue D) // modifies D
• Exercise: code the insertFrontDequeue() and deleteRearDequeue()
routines using an array.
• All operations for a singly-linked list implementation are O(1) except
for deleteRearDequeue(), which is O(n).
• The best list structure is a doubly-linked list with a single dummy
node.
[Diagram: a doubly-linked list whose nodes have prev, data, and next fields, with a single dummy node.]
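• A sketch (mine) of that structure: the list is circular through a single dummy node, so both ends are reachable in O(1). The type and field names are assumptions:

#include <stdlib.h>

typedef struct dequeNode {
    int data;
    struct dequeNode *prev, *next;
} dequeNode;

typedef struct {
    dequeNode *dummy;  // circular: dummy->next is the front, dummy->prev is the rear
} dequeue;

dequeue *makeEmptyDequeue(void) {
    dequeue *answer = (dequeue *) malloc(sizeof(dequeue));
    answer->dummy = (dequeNode *) malloc(sizeof(dequeNode));
    answer->dummy->next = answer->dummy->prev = answer->dummy;
    return answer;
} // makeEmptyDequeue

void insertFrontDequeue(dequeue *D, int value) {
    dequeNode *newNode = (dequeNode *) malloc(sizeof(dequeNode));
    newNode->data = value;
    newNode->prev = D->dummy;
    newNode->next = D->dummy->next;
    newNode->next->prev = newNode;
    D->dummy->next = newNode;
} // insertFrontDequeue

int deleteRearDequeue(dequeue *D) {   // O(1), unlike the singly-linked version
    dequeNode *old = D->dummy->prev;  // assumes the dequeue is not empty
    int result = old->data;
    old->prev->next = D->dummy;
    D->dummy->prev = old->prev;
    free(old);
    return result;
} // deleteRearDequeue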
16 Searching
• Class 4, 2/4/2021
• Given n data elements (we will use integer data), arrange them in a
data structure D so that these operations are fast:
• Representation 3: Array
• Recursion theorem: if c_n = a·c_{n/b} + Θ(n^k), then

  when       c_n
  a < b^k    Θ(n^k)
  a = b^k    Θ(n^k log n)
  a > b^k    Θ(n^{log_b a})

• In our case, a = 1, b = 2, k = 0, so b^k = 1 and a = b^k, so c_n = Θ(n^k log n) = Θ(log n).
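• For concreteness, a sketch (mine) of binary search over a sorted array, whose cost obeys the recurrence just analyzed:

// Return an index where value occurs in array[0..length-1] (sorted
// in ascending order), or -1 if it is absent.
int binarySearch(const int array[], int length, int value) {
    int low = 0, high = length - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  // avoids overflow of low + high
        if (array[mid] == value) return mid;
        else if (array[mid] < value) low = mid + 1;
        else high = mid - 1;
    }
    return -1;
} // binarySearch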
• Bad news: any comparison-based searching algorithm is Ω(log n),
that is, needs at least on the order of log n steps.
• Notation, slightly more formally defined. All these ignore multi-
plicative constants.
#define NULL 0
#include <stdlib.h>
20 Traversals
• A traversal walks through the tree, visiting every node.
• Symmetric traversal (also called inorder)
void symmetric(treeNode *tree) {
    if (tree == NULL) { // do nothing
    } else {
        symmetric(tree->left);
        visit(tree);
        symmetric(tree->right);
    }
} // symmetric()
• Pre-order traversal
    void preorder(treeNode *tree) { ... }
• Post-order traversal
    void postorder(treeNode *tree) { ... }
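• The bodies are elided above; the standard versions (a sketch) visit the node before, respectively after, its subtrees:

void preorder(treeNode *tree) {
    if (tree == NULL) return;
    visit(tree);               // visit before both subtrees
    preorder(tree->left);
    preorder(tree->right);
} // preorder()

void postorder(treeNode *tree) {
    if (tree == NULL) return;
    postorder(tree->left);
    postorder(tree->right);
    visit(tree);               // visit after both subtrees
} // postorder()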
• insert(data) and search(data) are O(log n), but we can generally treat
them as O(1).
• We will discuss hashing later.
• It appears that for arbitrary j we need O(jn) time, because each iter-
ation needs t tests, where 1 ≤ t ≤ j, followed by modifying j + 1 − t
values, for a total cost of j + 1.
• Class 6, 2/11/2021
• Clever algorithm using an array: QuickSelect (Tony Hoare)
• We can also compute the cost using the recursion theorem (page 17):
23 Partitioning an array
• Nico Lomuto’s method
• Online demonstration.
• The method partitions array[lowIndex .. highIndex] into
three pieces:
The elements of each piece are in order with respect to adjacent pieces.
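• A sketch of the partitioning code (the exact listing is elided in these notes). It uses the first element as the pivot and a helper swap(array, i, j) that exchanges two elements; the variable names d (divider) and c (current) follow the markers in the example that follows:

// Partition array[lowIndex .. highIndex] around the pivot array[lowIndex].
// Returns the pivot's final index; everything to its left is smaller,
// everything to its right is at least as large.
int partition(int array[], int lowIndex, int highIndex) {
    int pivot = array[lowIndex];
    int d = lowIndex;                    // d: last index of the "small" region
    for (int c = lowIndex + 1; c <= highIndex; c += 1) {
        if (array[c] < pivot) {
            d += 1;
            swap(array, d, c);
        }
    }
    swap(array, lowIndex, d);            // put the pivot between the two regions
    return d;
} // partition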
• Example (the markers d and c in the original figure show the positions of the two indexes at each step):

  5 2 1 7 9 0 3 6 4 8   (initial array; pivot 5)
  5 2 1 0 9 7 3 6 4 8
  5 2 1 0 3 7 9 6 4 8
  5 2 1 0 3 4 9 6 7 8
  4 2 1 0 3 5 9 6 7 8   (final: the pivot 5 is in its resting place)
25 Sorting
• Class 7, 2/16/2021
• We usually are interested in sorting an array in place.
• Sorting by comparisons is Ω(n log n).
• Good methods are O(n log n).
• Bad methods are O(n^2).
27 Insertion sort
• Comb method: the array is divided into a sorted prefix and an unsorted suffix; each iteration takes the first unsorted value (the probe) and inserts it into the sorted prefix.
• n iterations.
• Iteration i may need to shift the probe value i places.
• ⇒ O(n^2).
• Experimental results for Insertion Sort:
compares + moves ≈ n^2/2.

    n      compares   moves    n^2/2
    100      2644      2545     5000
    200      9733      9534    20000
    400     41157     40758    80000
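• The insertion sort listing is not reproduced here; a sketch (mine) of the comb method (compare with selectionSort below):

void insertionSort(int array[], int length) {
    for (int combIndex = 1; combIndex < length; combIndex += 1) {
        // array[0 .. combIndex-1] is sorted; insert the probe into it.
        int probe = array[combIndex];
        int probeIndex = combIndex;
        while (probeIndex > 0 && array[probeIndex - 1] > probe) {
            array[probeIndex] = array[probeIndex - 1];  // shift right
            probeIndex -= 1;
        }
        array[probeIndex] = probe;
    } // for combIndex
} // insertionSort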
28 Selection sort
• Comb method: the array is divided into a sorted prefix of small values and an unsorted suffix of larger values; each iteration finds the smallest unsorted value and appends it to the sorted prefix.
• n iterations.
• Iteration i may need to search through n − i places.
• ⇒ O(n^2).
• Experimental results for Selection Sort:
compares + moves ≈ n^2/2.

    n      compares   moves    n^2/2
    100      4950       198     5000
    200     19900       398    20000
    400     79800       798    80000
void selectionSort(int array[], int length) {
    // array goes from 0..length-1
    int combIndex, smallestValue, bestIndex, probeIndex;
    for (combIndex = 0; combIndex < length; combIndex += 1) {
        // array[0 .. combIndex-1] has lowest elements, sorted.
        // Find smallest other element to place at combIndex.
        smallestValue = array[combIndex];
        bestIndex = combIndex;
        for (probeIndex = combIndex+1; probeIndex < length; probeIndex += 1) {
            if (array[probeIndex] < smallestValue) {
                smallestValue = array[probeIndex];
                bestIndex = probeIndex;
            }
        }
        swap(array, combIndex, bestIndex); // swap() exchanges two array elements
    } // for combIndex
} // selectionSort
• Not stable, because the swap moves an arbitrary value into the un-
sorted area.
[Diagram: Quicksort: partition a random array into a small piece and a big piece, then sort each piece.]
• To insert
• Storage
• Applications
• Sorting
• Priority queue
32 Bin sort
• Assumptions: values lie in a small range; there are no duplicates.
• Storage: build an array of bins, one for each possible value. Each is
1 bit long.
• Space: O(r), where r is the size of the range.
• Place each value to sort as a 1 in its bin. Time: O(n).
• Read off bins in order, reporting index if it is 1. Time: O(r).
• Total time: O(n + r).
• Total memory: O(r), which can be expensive.
• Can handle duplicates by storing a count in each bin, at a further
expense of memory.
• This sorting method does not work for arbitrary data having nu-
meric keys; it only sorts the keys, not the data.
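• A sketch (mine) that stores a count in each bin, so duplicates are handled; values must lie in 0 .. range−1:

#include <stdlib.h>

// Sort array[0 .. length-1], whose values all lie in 0 .. range-1.
void binSort(int array[], int length, int range) {
    int *bins = (int *) calloc(range, sizeof(int)); // one count per possible value
    for (int i = 0; i < length; i += 1) {
        bins[array[i]] += 1;                        // O(n)
    }
    int out = 0;
    for (int value = 0; value < range; value += 1) { // O(r)
        while (bins[value] > 0) {
            array[out++] = value;
            bins[value] -= 1;
        }
    }
    free(bins);
} // binSort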
33 Radix sort
• Example: use base 10, with values integers 0 – 9999, with 10 bins,
each holding a list of values, initially empty.
• Pass 1: insert each value in a bin (at rear of its list) based on the last
digit of the value.
• Pass 2: examine values in bin order, and in list order within bins,
placing them in a new copy of bins based on the second-to-last digit.
• Pass 3, 4: similar.
• The number of digits is O(log n), so there are O(log n) passes, each
of which takes O(n) time, so the algorithm is O(n log n).
• This sorting method is stable.
34 Merge sort
• c_n = n + 2·c_{n/2}
• a = 2, b = 2, k = 1 ⇒ O(n log n).
• This time complexity is guaranteed.
• Space needed: 2n, because merge in place is awkward (and expen-
sive).
• The sort is also stable: it preserves the order of identical keys.
• Insertion, radix, and merge sort are stable, but not selection, Quick-
sort or Heapsort.
• place new node n in the tree and color it red. O(log n).
[Diagrams: the red-black insertion cases, involving the new node c, its parent p, uncle u, and grandparent g, with subtrees c1, c2, c3: recoloring, rotating c up (p moves down), rotating p up (g moves down), and continuing to case 3 at the grandparent. A worked example then inserts the keys 1 through 6, showing the recoloring and rotation performed after each insertion.]
37 Ternary trees
• Class 11, 3/4/2021
• By example.
• The depth of a balanced ternary tree is log_3 n, which is only about 63% of the depth of a balanced binary tree.
• The number of comparisons needed to traverse an internal node dur-
ing a search is either 1 or 2; average 5/3.
• So the number of comparisons to reach a leaf is (5/3)·log_3 n instead of (for a binary tree) log_2 n, a ratio of about 1.05, indicating a 5% degradation.
• The situation gets only worse for larger arity. For quaternary trees,
the degradation (in comparison to binary trees) is about 12.5%.
• And, of course, an online construction is not balanced.
• Moral: binary is best; higher arity is not helpful.
41 Stooge Sort
• A terrible method, but fun to analyze.
#include <math.h>
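• The listing is elided here apart from its first line; a sketch (mine) of the standard method, which matches the recurrence below (swap(array, i, j) exchanges two elements, as in selectionSort):

void stoogeSort(int array[], int lowIndex, int highIndex) {
    if (array[lowIndex] > array[highIndex]) {
        swap(array, lowIndex, highIndex);
    }
    int length = highIndex - lowIndex + 1;
    if (length >= 3) {
        int third = length / 3;
        stoogeSort(array, lowIndex, highIndex - third);  // first 2/3
        stoogeSort(array, lowIndex + third, highIndex);  // last 2/3
        stoogeSort(array, lowIndex, highIndex - third);  // first 2/3 again
    }
} // stoogeSort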
• c_n = 1 + 3·c_{2n/3}
• a = 3, b = 3/2, k = 0, so b^k = 1. By the recursion theorem (page 18), since a > b^k, we have complexity Θ(n^{log_b a}) = Θ(n^{log_{3/2} 3}) ≈ Θ(n^2.71), so Stooge Sort is worse than quadratic.
• However, the recursion often encounters already-sorted sub-arrays.
If we add a check for that situation, Stooge Sort becomes roughly
quadratic.
• B+ tree variant: link leaf nodes together for quicker inorder traversal.
This link also allows us to avoid splitting a leaf if its neighbor is not
at capacity.
• A densely filled tree (every node has m subtrees and m − 1 keys) with n keys, height h:
  • Number of nodes a = 1 + m + m^2 + · · · + m^h = (m^{h+1} − 1)/(m − 1).
  • Number of keys n = (m − 1)·a = m^{h+1} − 1 ⇒ log_m(n + 1) = h + 1 ⇒ h is O(log n).
• A minimally filled tree: the root has two subtrees; the others have g = ⌈m/2⌉ subtrees, so:
  • Number of nodes a = 1 + 2(1 + g + g^2 + · · · + g^{h−1}) = 1 + 2(g^h − 1)/(g − 1).
  • The root has 1 key, the others have g − 1 keys, so:
  • Number of keys n = 1 + 2(g^h − 1) = 2g^h − 1 ⇒ h = log_g((n + 1)/2) = O(log n).
• If all neighbors (there are 1 or 2) are already minimal, grab a key from
the parent and also merge with a neighbor.
• In general, deletion is quite difficult.
44 Hashing
• Very popular data structure for searching.
• Cost of insertion and of search is O(log n), but only because n distinct
values must be log n bits long, and we need to look at the entire key.
If we consider looking at a key to be O(1), then hashing is expected
(but not guaranteed) to be O(1).
• Idea: find the value associated with key k at A[h(k)], where h is a hash function mapping keys to array indices.
• Example
• k = student in class.
• h(k) = k’s birthday (a value from 0 .. 365).
• Difficulty: collisions
• Birthday paradox: Prob(no collisions with j people) = 365! / ((365 − j)! · 365^j)
• This probability goes below 1/2 at j = 23.
• At j = 50, the probability is 0.029.
• Moral: One cannot in general avoid collisions. One has to deal with
them.
• Perfect hashing: if you know all n values in advance, you can look
for a non-colliding hash function h. Finding such a function is in
general quite difficult, but compiler writers do sometimes use perfect
hashing to detect keywords in the language (like if and for).
• Linear probing. Probe p is at index h(k) + p (mod s), for p = 0, 1, . . ..
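• A sketch (mine) of insertion and search with linear probing; the table size S, the EMPTY marker, and the particular hash function are all assumptions:

#define S 1024                // the table size s (value assumed)
#define EMPTY (-1)            // marker for an unused slot; never a legitimate key

int table[S];

void initTable(void) {        // call once before any insertion
    for (int i = 0; i < S; i += 1) table[i] = EMPTY;
} // initTable

int h(int k) {                // placeholder hash function
    return ((unsigned) k) % S;
} // h

// Insert k by linear probing: probe p looks at index h(k) + p (mod S).
void insertLinear(int k) {
    int index = h(k);
    while (table[index] != EMPTY) {
        index = (index + 1) % S;        // next probe
    }
    table[index] = k;                   // assumes the table is not full
} // insertLinear

// Search for k; return its index, or -1 if an empty slot shows it is absent.
int searchLinear(int k) {
    int index = h(k);
    for (int p = 0; p < S; p += 1) {
        if (table[index] == k) return index;
        if (table[index] == EMPTY) return -1;
        index = (index + 1) % S;
    }
    return -1;                          // scanned the whole table
} // searchLinear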
47 Midterm exam
Class 15, 3/18/2021
• Add (or multiply) all (or some of) the words of k, discarding overflow, then mod by s. It helps if s = 2^j, because mod is then masking with 2^j − 1.
• XOR the words of k, shifting left by 1 after each, followed by mod s.
• Start with one bucket. If it gets too full (list longer than 10, say),
split it on the last bit of h(k) into two buckets.
• Whenever a bucket based on the last j bits is too full, split it
based on bit j + 1 from the end.
• To find the bucket
• compute v = h(k).
• follow a tree that discriminates on the last bits of v. This
tree is called a trie.
• it takes at most log v steps to find the right bucket.
• Searching within the bucket now is guaranteed to take con-
stant time (ignoring the log n cost of comparing keys)
• Resizing the array is automatic, although one might specify the ex-
pected size in advance to avoid resizing during early growth.
• Perl has a built-in datatype called a hash.
my %foo;
$foo{"this"} = "that";
• Python's dictionaries are similar:
Foo = dict()
Foo['this'] = 'that'
• fast computation
• uninvertible: given h(k), it should be infeasible to compute k.
• it should be infeasible to find collisions k1 and k2 such that h(k1 ) =
h(k2 ).
• examples
• uses
56 Graphs
• Our standard graph:
[Diagram: the standard graph: vertices 1 through 7 joined by edges e1 through e7.]
• Nomenclature
• vertices: V is the name of the set, v is the size of the set. In our
example, V = {1, 2, 3, 4, 5, 6, 7}.
• edges: E is the name of the set, e is the size of the set. In our
example, E = {e1, e2, e3, e4, e5, e6, e7}.
• directed graph: edges have direction (represented by arrows).
• undirected graph: edges have no direction.
• multigraph: more than one edge between two vertices. We gen-
erally do not deal with multigraphs, and the word graph gen-
erally disallows them.
• weighted graph: each edge has a numeric label called its weight.
• [Diagram: a graph on vertices A, B, C, D.] Can you find an Eulerian cycle?
• Family trees. These graphs are bipartite: Family nodes and per-
son nodes. We might want to find the shortest path between
two people.
• Cities and roadways, with weights indicating distance. We might
want a minimal-cost spanning tree.
• an array n × n of Boolean.
• A[i, j] = true ⇒ there is an edge from vertex i to vertex j.
[Table: the 7 × 7 adjacency matrix for the standard graph, with an x marking each pair of adjacent vertices.]
• The array is symmetric if the graph is undirected
• in this case, we can store only one half of it, typically in a
1-dimensional array
• A[i(i − 1)/2 + j] holds information about edge i, j.
• Instead of Boolean, we can use integer values to store edge weights.
• Adjacency list
61 Breadth-first search
• applications
• For our standard graph (page 49), assuming that the adjacency lists
are all sorted by vertex number (or that we use the adjacency matrix),
starting at vertex 1, BFS visits these vertices: 1, 2, 6, 3, 7.
• using adjacency lists, BFS is O(v′ + e′), counting only the vertices and edges actually reached.
• Rule: among all vertices that can extend a shortest path already
found, choose the one that results in a shortest path. If there is a
tie ending at the same vertex, choose either. If there is a tie going to
different vertices, choose both.
• This is an example of a greedy algorithm: at each step, improve the
solution in the way that looks best at the moment.
• Starting position: one path, length 0, from start vertex j to j.
[Diagram and trace: a weighted graph; the search starts at vertex 5. The paths and lengths recorded include 5 → 0; 5-6 → 40; 5-3 → 120; 5-1 → 60; then 5-6-3 → 100 (a better way to add vertex 3); and, among the candidates 5-6-4 (160) and 5-1-2 (140), 5-6-3-4 → 120 (a better way to add vertex 4).]
• Another example:
[Diagram and trace: a second weighted graph; the search starts at vertex 1. The paths and lengths recorded include 1 → 0; 1-2 → 3; 1-4 → 3; 1-2-3 → 4; the candidates 1-4-3 (5), 1-4-5 (6), and 1-2-5 (7); and finally 1-2-3-5 → 5.]
64 Topological sort
• Sample application: course prerequisites place some pairs of courses
in order, leading to a directed, acyclic graph (DAG). We want to find
a total order; there may be many acceptable answers.
• Weiss §9.2
[Diagram: a DAG of course prerequisites (CS 115, 215, 216, 275, 280, 315, 471), drawn on vertices numbered 1 through 10.]
Possible results:
  1. 4 10 1 2 6 5 7 8 3 9
  2. 10 4 7 1 2 5 8 6 3 9
• method: DFS looking for sinks (vertices with fanout 0), which are then placed at the front of the growing result.
list answerList; // global
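• A sketch of the method (mine, not the notes' listing); it assumes an adjacency-matrix representation with V vertices, and it fills the answer array from the rear, which has the same effect as prepending each sink to answerList:

#include <stdbool.h>

#define V 10                       // number of vertices (value assumed)
bool adjacent[V][V];               // adjacent[i][j]: there is an edge from i to j
bool visited[V];
int answer[V];                     // filled from the rear toward the front
int nextSlot = V - 1;

// DFS from vertex i; once all of i's successors are visited, i is
// effectively a sink, so place it at the front of the growing result.
void dfs(int i) {
    visited[i] = true;
    for (int j = 0; j < V; j += 1) {
        if (adjacent[i][j] && !visited[j]) dfs(j);
    }
    answer[nextSlot] = i;          // prepend i to the result
    nextSlot -= 1;
} // dfs

void topologicalSort(void) {
    for (int i = 0; i < V; i += 1) {
        if (!visited[i]) dfs(i);
    }
    // answer[0 .. V-1] now lists the vertices in a topological order.
} // topologicalSort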
65 Spanning trees
• Class 21, 4/8/2021
• Weiss §9.5
• Spanning tree: Given a connected undirected graph, a cycle-free connected subgraph that contains all the vertices.
[Diagram: a weighted graph on six vertices and one of its spanning trees.]
• Minimum-weight spanning tree: Given a connected undirected weighted graph, a spanning tree with least total weight.
• Example: minimum-cost set of roads (edges) connecting a set of cities
(vertices).
66 Prim’s algorithm
• Complexity: O(v · log v + e), because we add each vertex once, removing it from a heap that can have v elements, and we consider each edge twice (once from each end).
67 Kruskal’s algorithm
• operations
69 Numerical algorithms
• We will not look at algorithms for approximation to problems using
real numbers; that is the subject of CS321.
• We will study integer algorithms.
• Example: gcd(12, 60)
    a: 12 60 12
    b: 60 12  0
• Example: gcd(15, 66)
    a: 15 66 15 6 3
    b: 66 15  6 3 0
• Example: gcd(15, 67)
    a: 15 67 15 7 1
    b: 67 15  7 1 0
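• The traces above are produced by Euclid's algorithm; a sketch:

// Greatest common divisor by Euclid's algorithm: replace (a, b)
// by (b, a mod b) until b becomes 0; then a is the answer.
int gcd(int a, int b) {
    while (b != 0) {
        int remainder = a % b;
        a = b;
        b = remainder;
    }
    return a;
} // gcd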
71 Fast exponentiation
• Many cryptographic algorithms require raising large integers (thou-
sands of digits) to very large powers (hundreds of digits), modulo a
large number (about 2K bits).
• to get a^64 we only need six multiplications: (((((a^2)^2)^2)^2)^2)^2
• to get a^5 we need three multiplications: a^4 · a = (a^2)^2 · a.
• General rule to compute a^e: look at the binary representation of e, reading it from left to right. The initial accumulator has value 1; for each bit, square the accumulator, and if the bit is 1, also multiply it by a.
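• A sketch (mine) of that left-to-right square-and-multiply rule, with the modulus folded in because that is the cryptographic use; fixed 64-bit integers stand in for the BigNums real applications need, so it assumes modulus < 2^32 to keep the products from overflowing:

#include <stdint.h>

// Compute (base^exponent) mod modulus by scanning the bits of the
// exponent from the most significant to the least significant.
uint64_t powerMod(uint64_t base, uint64_t exponent, uint64_t modulus) {
    uint64_t accumulator = 1;                  // initial accumulator value
    for (int bit = 63; bit >= 0; bit -= 1) {
        accumulator = (accumulator * accumulator) % modulus;  // always square
        if ((exponent >> bit) & 1) {
            accumulator = (accumulator * base) % modulus;     // multiply on a 1 bit
        }
    }
    return accumulator;
} // powerMod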
72 Integer multiplication
• Class 23, 4/15/2021
• The BigNum representation: linked list of pieces, each with, say, 2
bytes of unsigned integer, with least-significant piece first. (It makes
no difference whether we store those 2 bytes in little-endian or big-
endian.)
• Ordinary multiplication of two n-digit numbers x and y costs n^2.
• Anatoly Karatsuba (1962) showed a divide-and-conquer method that
is better.
• Split each number into two chunks, each with n/2 digits:
• x = a · 10^{n/2} + b
• y = c · 10^{n/2} + d
The base 10 is arbitrary; the same idea works in any base, such
as 2.
• Now we can calculate xy = ac·10^n + (bc + ad)·10^{n/2} + bd. This calculation uses four multiplications, each costing (n/2)^2, so it still costs about n^2 in total.
x = 3962
y = 4481
a = 39
b = 62
c = 44
d = 81
u = a*c
v = b*d
w = (a+b)*(c+d)
x * y = u*10^4 + (w-u-v)*10^2 + v
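• A quick check of the example (mine): only three half-size multiplications (u, v, w) are needed, because bc + ad = w − u − v:

#include <assert.h>
#include <stdio.h>

int main(void) {
    long x = 3962, y = 4481;
    long a = 39, b = 62, c = 44, d = 81;
    long u = a * c;                  // 1716
    long v = b * d;                  // 5022
    long w = (a + b) * (c + d);      // 101 * 125 = 12625
    long result = u * 10000 + (w - u - v) * 100 + v;
    assert(result == x * y);         // both give 17753722
    printf("%ld\n", result);
    return 0;
}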
• We now have reduced the work to 1/128 (for 7-bit ASCII), not
1/2, for the random case, because only that small fraction of
starting positions are worth pursuing.
int location[256];
// location[c] is the last position in p holding char c
• Examples
79 Edit distance
• How much do we need to change s (source) to make it look like d
(destination) ?
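• A sketch (mine) of the usual dynamic-programming solution, in which insert, delete, and replace each cost 1; cost[i][j] is the distance from the first i characters of s to the first j characters of d:

#include <string.h>

#define MAXLENGTH 100            // strings longer than this are not handled

int minimum3(int a, int b, int c) {
    int m = a < b ? a : b;
    return m < c ? m : c;
} // minimum3

// Edit distance between source s and destination d.
int editDistance(const char *s, const char *d) {
    int sLength = (int) strlen(s), dLength = (int) strlen(d);
    int cost[MAXLENGTH + 1][MAXLENGTH + 1];
    for (int i = 0; i <= sLength; i += 1) cost[i][0] = i;  // delete i characters
    for (int j = 0; j <= dLength; j += 1) cost[0][j] = j;  // insert j characters
    for (int i = 1; i <= sLength; i += 1) {
        for (int j = 1; j <= dLength; j += 1) {
            int replace = (s[i-1] == d[j-1]) ? 0 : 1;
            cost[i][j] = minimum3(
                cost[i-1][j] + 1,           // delete s[i-1]
                cost[i][j-1] + 1,           // insert d[j-1]
                cost[i-1][j-1] + replace);  // replace (or match)
        }
    }
    return cost[sLength][dLength];
} // editDistance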
80 Categories of algorithms
• Divide and conquer
• Greedy
• Dynamic programming
• Search
• Mergesort:
void mergeSort(int array[], int lowIndex, int highIndex) {
    // sort array[lowIndex] .. array[highIndex]
    if (highIndex - lowIndex < 1) return; // width 0 or 1
    int mid = (lowIndex+highIndex)/2;
    mergeSort(array, lowIndex, mid);
    mergeSort(array, mid+1, highIndex);
    merge(array, lowIndex, highIndex); // merge() combines the two sorted halves
} // mergeSort
a = 2, b = 2, k = 1 ⇒ O(n log n).
82 Greedy algorithms
General rule: Enlarge the current solution by selecting (usually in a simple
way) the best single-step improvement.
• Computing the coins for change: greedily apply the biggest available
coin first.
character   code
space       0
A           111
O           110
R           101
S           1001
T           1000
The same text now uses 60 · 1 + 22 · 3 + . . . + 4 · 4 = 253 bits.
• To decode: follow a tree:
[Diagram: the decoding tree; branching on 1/0 from the root leads to space, A, O, R, S, and T.]
• Remove the two smallest numbers from the set. (This step is
greedy: take the numbers whose sum can be computed with
the least precision loss.)
• Insert their sum in the set.
• Use a heap to represent the set.
• Greedy method
• Start with an empty knapsack.
• Sort the objects in decreasing order of pi /wi .
• Greedy step: Take all of each object in the list, if it fits. If it
fits partially, take a fraction of the object, then done.
• Stop when the knapsack is full.
• Example.
chapter   pages (weight)   importance (profit)   ratio
1         120              5                     .0417
2         150              5                     .0333
3         200              4                     .0200
4         150              8                     .0533
5         140              3                     .0214
sorted: 4, 1, 2, 5, 3. If capacity C = 600, take all of 4, 1, 2, 5, and
40/200 of 3.
• This greedy algorithm happens to be optimal.
• 0/1 knapsack problem: Same as before, but no fractions are allowed.
The greedy algorithm is still fast, but it is not guaranteed optimal.
83 Dynamic programming
General rule: Solve all smaller problems and use their solutions to com-
pute the solution to the next problem.
• Compute Fibonacci numbers: f_i = f_{i−1} + f_{i−2}.
• Compute binomial coefficients: C(n, i) = C(n − 1, i − 1) + C(n − 1, i).
• Compute minimal edit distance.
• Numerical algorithms
• Miscellaneous
85 Tractability
• Formal definition of O: f(n) = O(g(n)) iff for adequately large n, and some constant c, we have f(n) ≤ c · g(n). That is, f is bounded above by some multiple of g. We can say that f grows no faster than g.
• Formal definition of Θ: f(n) = Θ(g(n)) iff for adequately large n, and some constants c_1, c_2, we have c_1 · g(n) ≤ f(n) ≤ c_2 · g(n). We say that f grows as fast as g.
• Formal definition of Ω is similar; f(n) = Ω(g(n)) means that f grows at least as fast as g.
• We usually say that a problem is tractable if we can solve it in poly-
nomial time (with respect to the problem size n). We also say the
program is efficient.
• constant time: O(1)
• logarithmic time: O(log n)
• linear time: O(n)
• sub-quadratic: O(n log n) (for instance)
• quadratic time: O(n^2)
• cubic time: O(n^3)
• These are all bounded by O(n^k) for some fixed k.
• However, if k is large, even tractable problems can be infeasible
to solve. In practice, algorithms seldom have k > 3.
• There are many algorithms that take more than polynomial time.
• exponential: O(2^n).
• super-exponential: O(n!) (for example)
• O(n^n)
• O(2^{2^n})