
CS315 Class Notes

Raphael Finkel

May 4, 2021

1 Intro

Class 1, 1/26/2021

• Handout 1 — My names
• TA:
• Plagiarism — read aloud
• Assignments on web. Use C, C++, or Java.
• E-mail list:
• accounts in MultiLab
• text — we will skip around

2 Basic building blocks: Linked lists (Chapter 3) and trees (Chapter 4)

Linked lists and trees are examples of data structures:

• a way to represent information
• so it can be manipulated
• packaged with routines that do the manipulations

Leads to an Abstract Data Type (ADT): has an API (specification) and hides its internals.

3 Tools

• Use
• Specification
• Implementation

4 Singly-linked list

• Used as a part of several ADTs.
• Can be considered an ADT itself.
• Collection of nodes, each with optional arbitrary data and a pointer to the next element on the list.

[Figure: a handle pointing to a chain of nodes a, c, x, f; each node holds a data field and a pointer to the next node; the last node holds a null pointer.]

    operation                            cost
    create empty list                    O(1)
    insert new node at front of list     O(1)
    delete first node, returning data    O(1)
    count length                         O(n)
    search by data                       O(n)
    sort                                 O(n log n) to O(n²)

5 Sample code (in C)

    #define NULL 0
    #include <stdlib.h>

    typedef struct node_s {
        int data;
        struct node_s *next;
    } node;

    node *makeNode(int data, node* next) {
        node *answer = (node *) malloc(sizeof(node));
        answer->data = data;
        answer->next = next;
        return(answer);
    } // makeNode

    node *insertAtFront(node* handle, int data) {
        node *answer = makeNode(data, handle->next);
        handle->next = answer;
        return(answer);
    } // insertAtFront

    node *searchDataIterative(node *handle, int data) {
        // iterative method
        node *current = handle->next;
        while (current != NULL) {
            if (current->data == data) break;
            current = current->next;
        }
        return current;
    } // searchDataIterative

    node *searchDataRecursive(node *handle, int data) {
        // recursive method
        node *current = handle->next;
        if (current == NULL) return NULL;
        else if (current->data == data) return current;
        else return searchDataRecursive(current, data);
    } // searchDataRecursive
6 Improving the efficiency of some operations

• To make count() fast: maintain the count in a separate variable. If we need the count more often than we insert and delete, it is worthwhile.
• To make insert at rear fast: maintain two handles, one to the front, the other to the rear of the list.
• Combine these new items in a header node:

    typedef struct {
        node *front;
        node *rear;
        int count;
    } nodeHeader;

• Class 2, 1/28/2021
• To make search faster: remove the special case that we reach the end of the list by placing a pseudo-data node at the end. Keep track of the pseudo-data node in the header.

    typedef struct {
        node *front;
        node *rear;
        node *pseudo;
        int count;
    } nodeHeader;

    node *searchDataIterative(nodeHeader *header, int data) {
        // iterative method
        header->pseudo->data = data;
        node *current = header->front;
        while (current->data != data) {
            current = current->next;
        }
        return (current == header->pseudo ? NULL : current);
    } // searchDataIterative

• Exercise: If we want both pseudo-data and a rear pointer, how does an empty list look?
• Exercise: If we want pseudo-data, how does searchDataRecursive() change?
• Exercise: Is it easy to add a new node after a given node?
• Exercise: Is it easy to add a new node before a given node?

7 Aside: Unix pipes

• Unix programs automatically have three “files” open: standard input, which is by default the keyboard, standard output, which is by default the screen, and standard error, which is by default the screen.
• In C and C++, they are defined in stdio.h by the names stdin, stdout, and stderr.
• The command interpreter (in Unix, it’s called the “shell”) lets you invoke programs redirecting any or all of these three. For instance, ls | wc redirects stdout of the ls program to stdin of the wc program.
• If you run your trains program without redirection, you can type in arbitrary numbers.
• If you run randGen.pl without redirection, it generates an unbounded list of pseudo-random numbers to stdout.
• If you run randGen.pl | trains, the list of numbers from randGen.pl is redirected as input to trains.

8 Stacks, queues, dequeues: built out of either linked lists or arrays

• We’ll see each of these.

9 Stack of integer

• Abstract definition: either empty or the result of pushing an integer onto the stack.
• operations
    • stack makeEmptyStack()
    • boolean isEmptyStack(stack S)
    • int popStack(stack *S) // modifies S
    • void pushStack(stack *S, int I) // modifies S
10 Implementation 1 of Stack: Linked list

• makeEmptyStack implemented by makeEmptyList()
• isEmptyStack implemented by isEmptyList()
• pushStack inserts at the front of the list
• popStack deletes from the front of the list
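To make the mapping concrete, here is a minimal sketch (mine, not the course’s code) built on the node, makeNode(), and insertAtFront() routines of section 5. It assumes the same dummy-handle convention, where the handle’s next field points to the top of the stack.

    #include <stdlib.h>
    #include <stdbool.h>

    typedef node stack; // assumption: the stack is just a handle node

    stack *makeEmptyStack() {
        return makeNode(0, NULL); // dummy handle; empty list below it
    }

    bool isEmptyStack(stack *s) {
        return s->next == NULL;
    }

    void pushStack(stack *s, int data) {
        insertAtFront(s, data); // the new node becomes the top
    }

    int popStack(stack *s) {
        node *top = s->next; // assumes the stack is not empty
        int answer = top->data;
        s->next = top->next;
        free(top);
        return answer;
    }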
11 Implementation 2 of Stack: Array

• Class 3, 2/2/2021
• Warning: it’s easy to make off-by-one errors.

    #define MAXSTACKSIZE 10
    #include <stdlib.h>

    typedef struct {
        int contents[MAXSTACKSIZE];
        int count; // index of first free space in contents
    } stack;

    stack *makeEmptyStack() {
        stack *answer = (stack *) malloc(sizeof(stack));
        answer->count = 0;
        return answer;
    } // makeEmptyStack

    void pushOntoStack(stack *theStack, int data) {
        if (theStack->count == MAXSTACKSIZE) {
            (void) error("stack overflow");
        } else {
            theStack->contents[theStack->count] = data;
            theStack->count += 1;
        }
    } // pushOntoStack

    int popFromStack(stack *theStack) {
        if (theStack->count == 0) {
            return error("stack underflow");
        } else {
            theStack->count -= 1;
            return theStack->contents[theStack->count];
        }
    } // popFromStack

• The array implementation limits the size. Does the linked-list implementation also limit the size?
• The array implementation needs one cell per (potential) element, and one for the count. How much space does the linked-list implementation need?
• We can position two opposite-sense stacks in one array so long as their combined size never exceeds MAXSTACKSIZE.


12 Queue of integer

• Abstract definition: either empty or the result of inserting an integer at the rear of a queue or deleting an integer from the front of a queue.
• operations
    • queue makeEmptyQueue()
    • boolean isEmptyQueue(queue Q)
    • void insertInQueue(queue Q, int I) // modifies Q
    • int deleteFromQueue(queue Q) // modifies Q

13 Implementation 1 of Queue: Linked list

We use a header to represent the front and the rear, and we put a dummy node at the front to make the code work equally well for an empty queue.

[Figure: a header holding front and rear pointers; front points to a dummy node at the head of the list; rear points to the last node.]

    #include <stdlib.h>

    typedef struct node_s {
        int data;
        struct node_s *next;
    } node;

    typedef struct {
        node *front;
        node *rear;
    } queue;

    queue *makeEmptyQueue() {
        queue *answer = (queue *) malloc(sizeof(queue));
        answer->front = answer->rear = makeNode(0, NULL);
        return answer;
    } // makeEmptyQueue

    bool isEmptyQueue(queue *theQueue) {
        return (theQueue->front == theQueue->rear);
    } // isEmptyQueue

    void insertInQueue(queue *theQueue, int data) {
        node *newNode = makeNode(data, NULL);
        theQueue->rear->next = newNode;
        theQueue->rear = newNode;
    } // insertInQueue

    int deleteFromQueue(queue *theQueue) {
        if (isEmptyQueue(theQueue)) return error("queue underflow");
        node *oldNode = theQueue->front->next;
        theQueue->front->next = oldNode->next;
        if (theQueue->front->next == NULL) {
            theQueue->rear = theQueue->front;
        }
        return oldNode->data;
    } // deleteFromQueue

14 Implementation 2 of Queue: Array

Warning: it’s easy to make off-by-one errors.

[Figure: four snapshots of a circular array indexed 0 .. MAX, showing front and rear: a simple case (front before rear), a wrapped case (rear before front), a full queue, and an empty queue (front == rear).]

    #define MAXQUEUESIZE 30

    typedef struct {
        int contents[MAXQUEUESIZE];
        int front; // index of element at the front
        int rear; // index of first free space after the queue
    } queue;

    bool isEmptyQueue(queue *theQueue) {
        return (theQueue->front == theQueue->rear);
    } // isEmptyQueue

    int nextSlot(int index) { // circular
        return (index + 1) % MAXQUEUESIZE;
    } // nextSlot

    void insertInQueue(queue *theQueue, int data) {
        if (nextSlot(theQueue->rear) == theQueue->front)
            error("queue overflow");
        else {
            theQueue->contents[theQueue->rear] = data;
            theQueue->rear = nextSlot(theQueue->rear);
        }
    } // insertInQueue

    int deleteFromQueue(queue *theQueue) {
        if (isEmptyQueue(theQueue)) {
            return error("queue underflow");
        } else {
            int answer = theQueue->contents[theQueue->front];
            theQueue->front = nextSlot(theQueue->front);
            return answer;
        }
    } // deleteFromQueue

15 Dequeue of integer

• Abstract definition: either empty or the result of inserting an integer at the front or rear of a dequeue or deleting an integer from the front or rear of a dequeue.
• operations
    • dequeue makeEmptyDequeue()
    • boolean isEmptyDequeue(dequeue D)
    • void insertFrontDequeue(dequeue D, int I) // modifies D
    • void insertRearDequeue(dequeue D, int I) // modifies D
    • int deleteFrontDequeue(dequeue D) // modifies D
    • int deleteRearDequeue(dequeue D) // modifies D
• Exercise: code the insertFrontDequeue() and deleteRearDequeue() routines using an array.
• All operations for a singly-linked list implementation are O(1) except for deleteRearDequeue(), which is O(n).
• The best list structure is a doubly-linked list with a single dummy node; a sketch appears below.

[Figure: a circular doubly-linked list; each node has prev, data, and next fields; a single dummy node “dum” closes the cycle.]
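To back up the O(1) claims, here is a minimal sketch (my own; the notes give only the picture) of the circular doubly-linked dequeue with a single dummy node. The type and function names are assumptions.

    #include <stdlib.h>
    #include <stdbool.h>

    typedef struct dnode_s {
        int data;
        struct dnode_s *prev, *next;
    } dnode;

    // The dummy node is its own neighbor when the dequeue is empty.
    dnode *makeEmptyDequeue() {
        dnode *dum = (dnode *) malloc(sizeof(dnode));
        dum->prev = dum->next = dum;
        return dum;
    }

    bool isEmptyDequeue(dnode *dum) {
        return dum->next == dum;
    }

    void insertFrontDequeue(dnode *dum, int data) {
        dnode *newNode = (dnode *) malloc(sizeof(dnode));
        newNode->data = data;
        newNode->prev = dum;
        newNode->next = dum->next;
        dum->next->prev = newNode;
        dum->next = newNode;
    }

    int deleteRearDequeue(dnode *dum) {
        dnode *old = dum->prev; // assumes the dequeue is not empty
        int answer = old->data;
        old->prev->next = dum;
        dum->prev = old->prev;
        free(old);
        return answer;
    }

The other two operations are mirror images; every operation touches a constant number of pointers, so all four are O(1).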

16 Searching

• Class 4, 2/4/2021
• Given n data elements (we will use integer data), arrange them in a data structure D so that these operations are fast:
    • void insert(int data, *D)
    • boolean search(int data, D) (can also return entire data record)
• We don’t care about the speed of deletion (for now).
• Much of this material is in Chapter 4 of the book (trees).
• Representation 1: Linked list
    • insert(i) is O(1): Place the new element at the front.
    • search(i) is O(n): We may need to look at the whole list; we use pseudo-data i to make search as fast as possible.
• Representation 2: Sorted linked list
    • insert(i) is O(n): On average, n/2 steps. Use pseudo-data (value ∞) at the end to make insertion as fast as possible.
    • search(i) is O(n): We may need to look at the whole list; on average, we look at n/2 elements if the search succeeds; all n elements if it fails. Use pseudo-data (value ∞) to make search as fast as possible.
• Representation 3: Array
    • insert(i) is O(1): We place the new element at the rear.
    • search(i) is O(n): We may need to look at the whole list; use pseudo-data i at the rear.
• Representation 4: Sorted array
    • insert(i) is O(n): We need to search and then shove cells over.
    • search(i) is O(log n): We use binary search.
• Exercise: Code all the routines.
• Exercise: Is it easy to add a new node after a given node?
• Exercise: Is it easy to add a new node before a given node?
    // warning: it’s easy to make off-by-one errors.
    bool binarySearch(int target, int *array,
            int lowIndex, int highIndex) {
        // look for target in array[lowIndex..highIndex]
        while (lowIndex < highIndex) { // at least 2 elements
            int mid = (lowIndex + highIndex) / 2; // round down
            if (array[mid] < target) lowIndex = mid + 1;
            else highIndex = mid;
        } // while at least 2 elements
        return (array[lowIndex] == target);
    } // binarySearch

17 Quadratic search: set mid based on discrepancy

Also called interpolation search, extrapolation search, dictionary search.

    bool quadraticSearch(int target, int *array,
            int lowIndex, int highIndex) {
        // look for target in array[lowIndex..highIndex]
        while (lowIndex < highIndex) { // at least 2 elements
            if (array[highIndex] == array[lowIndex]) {
                highIndex = lowIndex;
                break;
            }
            float percent = (0.0 + target - array[lowIndex])
                / (array[highIndex] - array[lowIndex]);
            int mid = (int) (percent * (highIndex-lowIndex)) + lowIndex;
            if (mid == highIndex) {
                mid -= 1;
            }
            if (array[mid] < target) {
                lowIndex = mid + 1;
            } else {
                highIndex = mid;
            }
        } // while at least 2 elements
        return(array[lowIndex] == target);
    } // quadraticSearch

Experimental results

• It is hard to program correctly.
• For 10⁶ ≈ 2²⁰ elements, binary search always makes 20 probes.
• This result is consistent with O(log n).
• Quadratic search: 20 tests with uniform data. The range of probes was 3–17; the average was about 9 probes.
• Analysis shows that if the data are uniformly distributed, quadratic search should be O(log log n).

18 Analyzing binary search

• Binary search: c_n = 1 + c_{n/2}, where c_n is the number of steps to search for an element in an array of length n.
• We will use the Recursion Theorem: if c_n = f(n) + a·c_{n/b}, where f(n) = Θ(n^k), then

    when       c_n
    a < b^k    Θ(n^k)
    a = b^k    Θ(n^k log n)
    a > b^k    Θ(n^{log_b a})

• In our case, a = 1, b = 2, k = 0, so b^k = 1, so a = b^k, so c_n = Θ(n^k log n) = Θ(log n).
• Bad news: any comparison-based searching algorithm is Ω(log n), that is, needs at least on the order of log n steps.
• Notation, slightly more formally defined. All these ignore multiplicative constants.
    • O(f(n)): no worse than f(n); at most f(n).
    • Ω(f(n)): no better than f(n); at least f(n).
    • Θ(f(n)): no better or worse than f(n); exactly f(n).

19 Representation 5: Binary tree

• Class 5, 2/9/2021
• Example with elicited values
• Pseudo-data: in the universal “null” node.
• insert(i) and search(i) are both O(log n) if we are lucky or the data are random.
• We will deal with balancing trees later.

    #define NULL 0
    #include <stdlib.h>

    typedef struct treeNode_s {
        int data;
        struct treeNode_s *left, *right;
    } treeNode;

    treeNode *makeNode(int data) {
        treeNode *answer = (treeNode *) malloc(sizeof(treeNode));
        answer->data = data;
        answer->left = answer->right = NULL;
        return answer;
    } // makeNode

    treeNode *searchTree(treeNode *tree, int key) {
        if (tree == NULL) return(NULL);
        else if (tree->data == key) return(tree);
        else if (key <= tree->data)
            return(searchTree(tree->left, key));
        else
            return(searchTree(tree->right, key));
    } // searchTree

    void insertTree(treeNode *tree, int key) {
        // assumes empty tree is a pseudo-node with infinite data
        treeNode *parent = NULL;
        treeNode *newNode = makeNode(key);
        while (tree != NULL) { // dive down tree
            parent = tree;
            tree = (key <= tree->data) ? tree->left : tree->right;
        } // dive down tree
        if (key <= parent->data)
            parent->left = newNode;
        else
            parent->right = newNode;
    } // insertTree

20 Traversals

• A traversal walks through the tree, visiting every node.
• Symmetric traversal (also called inorder)

    void symmetric(treeNode *tree) {
        if (tree == NULL) { // do nothing
        } else {
            symmetric(tree->left);
            visit(tree);
            symmetric(tree->right);
        }
    } // symmetric()

• Pre-order traversal

    void preorder(treeNode *tree) {
        if (tree == NULL) { // do nothing
        } else {
            visit(tree);
            preorder(tree->left);
            preorder(tree->right);
        }
    } // preorder()

• Post-order traversal

    void postorder(treeNode *tree) {
        if (tree == NULL) { // do nothing
        } else {
            postorder(tree->left);
            postorder(tree->right);
            visit(tree);
        }
    } // postorder()

• Does pseudo-data make sense?

21 Representation 6: Hashing (scatter storage)

• Hashing is often the best method for searching (but not for sorting).
• insert(data) and search(data) are O(log n), but we can generally treat them as O(1).
• We will discuss hashing later.

22 Finding the jth largest element in a set

• If j = 1, a single pass works in O(n) time:

    largest = -∞; // priming
    foreach (value in set) {
        if (value > largest) largest = value;
    }
    return(largest);

• If j = 2, a single pass still works in O(n) time, but it is about twice as costly:

    largest = nextLargest = -∞; // priming
    foreach (value in set) {
        if (value > largest) {
            nextLargest = largest;
            largest = value;
        } else if (value > nextLargest) {
            nextLargest = value;
        }
    } // foreach value
    return(nextLargest);

• It appears that for arbitrary j we need O(jn) time, because each iteration needs t tests, where 1 ≤ t ≤ j, followed by modifying j + 1 − t values, for a total cost of j + 1.
• Class 6, 2/11/2021
• Clever algorithm using an array: QuickSelect (Tony Hoare)
    • Partition the array into “small” and “large” elements with a pivot between them (details soon).
    • Recurse in either the small or large subarray, depending where the jth element falls. Stop if the jth element is the pivot.
    • Cost: n + n/2 + n/4 + · · · = 2n = O(n)

• We can also compute the cost using the recursion theorem (page 17):
    • c_n = n + c_{n/2} (if we are lucky)
    • c_n = n + c_{2n/3} (fairly average case)
    • f(n) = n = O(n¹)
    • k = 1, a = 1, b = 2 (or b = 3/2)
    • a < b^k
    • so c_n = Θ(n^k) = Θ(n)

23 Partitioning an array

• Nico Lomuto’s method
• Online demonstration.
• The method partitions array[lowIndex .. highIndex] into three pieces:
    • array[lowIndex .. divideIndex - 1]
    • array[divideIndex]
    • array[divideIndex + 1 .. highIndex]
  The elements of each piece are in order with respect to adjacent pieces.

    int partition(int array[], int lowIndex, int highIndex) {
        // modifies array, returns pivot index.
        int pivotValue = array[lowIndex];
        int divideIndex = lowIndex;
        for (int combIndex = lowIndex+1; combIndex <= highIndex;
                combIndex += 1) {
            // array[lowIndex] is the partitioning (pivot) value.
            // array[lowIndex+1 .. divideIndex] are < pivot
            // array[divideIndex+1 .. combIndex-1] are ≥ pivot
            // array[combIndex .. highIndex] are unseen
            if (array[combIndex] < pivotValue) { // see a small value
                divideIndex += 1;
                swap(array, divideIndex, combIndex);
            }
        } // each combIndex
        // swap pivotValue into its place
        swap(array, divideIndex, lowIndex);
        return(divideIndex);
    } // partition

• Example, tracing partition() on a sample array (pivot 5; d marks divideIndex, c marks combIndex as it sweeps right):

    5 2 1 7 9 0 3 6 4 8
    5 2 1 0 9 7 3 6 4 8
    5 2 1 0 3 7 9 6 4 8
    5 2 1 0 3 4 9 6 7 8
    4 2 1 0 3 5 9 6 7 8   (pivot swapped into its place)

24 Using partitioning to select jth smallest

    int selectJthSmallest (int array[], int size, int targetIndex) {
        // rearrange the values in array[0..size-1] so that
        // array[targetIndex] has the value it would have if the array
        // were sorted.
        int lowIndex = 0;
        int highIndex = size-1;
        while (lowIndex < highIndex) {
            int midIndex = partition(array, lowIndex, highIndex);
            if (midIndex == targetIndex) {
                return array[targetIndex];
            } else if (midIndex < targetIndex) { // look to right
                lowIndex = midIndex + 1;
            } else { // look to left
                highIndex = midIndex - 1;
            }
        } // while lowIndex < highIndex
        return array[targetIndex];
    } // selectJthSmallest

25 Sorting

• Class 7, 2/16/2021
• We usually are interested in sorting an array in place.
• Sorting is Ω(n log n).
• Good methods are O(n log n).
• Bad methods are O(n²).

26 Sorting out sorting

• https://www.youtube.com/watch?v=HnQMDkUFzh4 Original film.
• https://www.youtube.com/watch?v=kPRA0W1kECg 15 methods in 6 minutes.

27 Insertion sort

• Comb method: the array is divided into a sorted prefix and an unsorted suffix; each iteration takes the next unsorted value (the probe) and places it in order within the sorted prefix.

[Figure: array split into “sorted” and “unsorted” regions; the probe value moves left into the sorted region.]

• n iterations.
• Iteration i may need to shift the probe value i places.
• ⇒ O(n²).
• Experimental results for Insertion Sort: compares + moves ≈ n²/2.

    n     compares   moves    n²/2
    100   2644       2545     5000
    200   9733       9534     20000
    400   41157      40758    80000

    void insertionSort(int array[], int length) {
        // array goes from 1..length.
        // location 0 is available for pseudo-data.
        int combIndex, combValue, sortedIndex;
        for (combIndex = 2; combIndex <= length; combIndex += 1) {
            // array[1 .. combIndex-1] is sorted.
            // Place array[combIndex] in order.
            combValue = array[combIndex];
            sortedIndex = combIndex - 1;
            array[0] = combValue - 1; // pseudo-data
            while (combValue < array[sortedIndex]) {
                array[sortedIndex+1] = array[sortedIndex];
                sortedIndex -= 1;
            }
            array[sortedIndex+1] = combValue;
        } // for combIndex
    } // insertionSort

• Stable: multiple copies of the same key stay in order.

28 Selection sort

• Comb method: repeatedly find the smallest value in the unsorted region and swap it to the boundary of the sorted region.

[Figure: array split into “sorted, small” and “unsorted, large” regions; the smallest unsorted value moves to the boundary.]

• n iterations.
• Iteration i may need to search through n − i places.
• ⇒ O(n²).
• Experimental results for Selection Sort: compares + moves ≈ n²/2.

    n     compares   moves   n²/2
    100   4950       198     5000
    200   19900      398     20000
    400   79800      798     80000

    void selectionSort(int array[], int length) {
        // array goes from 0..length-1
        int combIndex, smallestValue, bestIndex, probeIndex;
        for (combIndex = 0; combIndex < length; combIndex += 1) {
            // array[0 .. combIndex-1] has lowest elements, sorted.
            // Find smallest other element to place at combIndex.
            smallestValue = array[combIndex];
            bestIndex = combIndex;
            for (probeIndex = combIndex+1; probeIndex < length;
                    probeIndex += 1) {
                if (array[probeIndex] < smallestValue) {
                    smallestValue = array[probeIndex];
                    bestIndex = probeIndex;
                }
            }
            swap(array, combIndex, bestIndex);
        } // for combIndex
    } // selectionSort

• Not stable, because the swap moves an arbitrary value into the unsorted area.

29 Quicksort (C. A. R. Hoare)

• Class 8, 2/18/2021
• Recursive, based on partitioning:

[Figure: a random array is partitioned around a pivot into “small” and “big” parts; each part is then sorted recursively.]

• about log n depth.
• each depth takes about O(n) work.
• ⇒ O(n log n).
• Can be unlucky: O(n²).
• To prevent worst-case behavior, partition based on the median of 3 or 5.
• Don’t Quicksort small regions; use a final insertionSort pass instead. Experiments show that the optimal break point depends on the implementation, but somewhere between 10 and 100 is usually good.
• Experimental results for QuickSort: compares + moves ≈ 2.4 n log n.

    n      compares   moves   n log n   n²/2
    100    643        824     664       5000
    200    1444       1668    1528      20000
    400    3885       4228    3457      80000
    800    8066       8966    7715      320000
    1600   17583      18958   17030     1280000

• Analysis if lucky: C_n = n + 2C_{n/2}, so k = 1, a = 2, b = 2, so a = b^k, so C_n = Θ(n^k log n) = Θ(n log n).
• Analysis if unlucky: C_n = n + C_{n/3} + C_{2n/3} < n + 2C_{2n/3}, so k = 1, a = 2, b = 3/2, so a > b^k, so C_n < Θ(n^{log_b a}) = Θ(n^{log_{3/2} 2}) ≈ Θ(n^1.70951), which is still better than quadratic.

    void quickSort(int array[], int lowIndex, int highIndex){
        if (highIndex - lowIndex <= 0) return;
        // could stop if <= 6 and finish by using insertion sort.
        int midIndex = partition(array, lowIndex, highIndex);
        quickSort(array, lowIndex, midIndex-1);
        quickSort(array, midIndex+1, highIndex);
    } // quickSort

30 Shell Sort (Donald Shell, 1959)

• Each pass has a span s.

    for (int span in reverse(spanSequence)) {
        for (int offset = 0; offset < span; offset += 1) {
            insertionSort(a[offset], a[offset+span], ... )
        } // each offset
    } // each span

• The last element in spanSequence must be 1.
• Tokuda’s sequence: s₀ = 1; s_k = 2.25·s_{k−1} + 1; span_k = ⌈s_k⌉ = 1, 4, 9, 20, 46, 103, 233, 525, 1182, 2660, ...
• Experimental results for Shell Sort: compares + moves ≈ 2.2 n log n.

    n      compares   moves   n log n   n²/2
    100    355        855     664       5000
    200    932        1932    1528      20000
    400    2266       4666    3457      80000
    800    5216       10816   7715      320000
    1600   11942      24742   17030     1280000
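The loop above is only a sketch (insertionSort as written earlier does not take a span argument), so here is a minimal self-contained Shell Sort in C using the Tokuda-style spans; treat it as an illustration under those assumptions, not the course’s official code.

    void shellSort(int array[], int length) {
        // array goes from 0..length-1.
        // Tokuda-style spans, largest first; the last span must be 1.
        int spans[] = {2660, 1182, 525, 233, 103, 46, 20, 9, 4, 1};
        int numSpans = sizeof(spans) / sizeof(spans[0]);
        for (int s = 0; s < numSpans; s += 1) {
            int span = spans[s];
            if (span >= length) continue; // span too large for this array
            // insertion sort, comparing elements span apart
            for (int combIndex = span; combIndex < length; combIndex += 1) {
                int combValue = array[combIndex];
                int sortedIndex = combIndex - span;
                while (sortedIndex >= 0 && combValue < array[sortedIndex]) {
                    array[sortedIndex+span] = array[sortedIndex];
                    sortedIndex -= span;
                }
                array[sortedIndex+span] = combValue;
            } // for combIndex
        } // each span
    } // shellSort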

31 Heaps: a kind of tree

• Class 9, 2/25/2021
• Heap property: the value at a node is ≤ the value of each child (for a top-light heap) or ≥ the value of each child (for a top-heavy heap).
• The smallest (largest) value is therefore at the root.
• All leaves are at the same level ±1.
• To insert
    • Place the new value at the “end” of the tree.
    • Let the new value sift up to its proper level.
• To delete: always delete the least (root) element
    • Save the value at the root to return it later.
    • Move the last value to the root.
    • Let the new value sift down to its proper level.
• Storage
    • Store the tree in an array [1 ...]
    • leftChild[index] = 2*index
    • rightChild[index] = 2*index+1
    • the last occupied place in the array is at index heapSize.
• Applications
    • Sorting
    • Priority queue

1 // basic algorithms (top-light heap) 1 // intermediate algorithms


2 2

3 void siftUp (int heap[], int subjectIndex) { 3 void insertInHeap (int heap[], int *heapSize, int value) {
4 // the element in subjectIndex needs to be sifted up. 4 *heapSize += 1; // should check for overflow
5 heap[0] = heap[subjectIndex]; // pseudo-data 5 heap[*heapSize] = value;
6 while (1) { // compare with parentValue. 6 siftUp(heap, *heapSize);
7 int parentIndex = subjectIndex / 2; 7 } // insertInHeap
8 if (heap[parentIndex] <= heap[subjectIndex]) return; 8

9 swap(heap, subjectIndex, parentIndex); 9 int deleteFromHeap (int heap[], int *heapSize) {


10 subjectIndex = parentIndex; 10 int answer = heap[1];
11 } 11 heap[1] = heap[*heapSize];
12 } // siftUp 12 *heapSize -= 1;
13 13 siftDown(heap, 1, *heapSize);
14 int betterChild (int heap[], int subjectIndex, int heapSize) { 14 return(answer);
15 int answerIndex = subjectIndex * 2; // assume better child 15 } // deleteFromHeap
16 if (answerIndex+1 <= heapSize &&
17 heap[answerIndex+1] < heap[answerIndex]) { 1 // advanced algorithm
18 answerIndex += 1; 2

19 } 3 void heapSort(int array[], int arraySize){


20 return(answerIndex); 4 // sorts array[1..arraySize] by first making it a
21 } // betterChild 5 // top-heavy heap, then by successive deletion.
22 6 // Deleted elements go to the end.
23 void siftDown (int heap[], int subjectIndex, int heapSize) { 7 int index, size;
24 // the element in subjectIndex needs to be sifted down. 8 array[0] = −∞; // pseudo-data
25 while (2*subjectIndex <= heapSize) { 9 // The second half of array[] satisfies the heap property.
26 int childIndex = betterChild(heap, subjectIndex, heapSize); 10 for (index = (arraySize+1)/2; index > 0; index -= 1) {
27 if (heap[childIndex] >= heap[subjectIndex]) return; 11 siftDown(array, index, arraySize);
28 swap(heap, subjectIndex, childIndex); 12 }
29 subjectIndex = childIndex; 13 for (index = arraySize; index > 0; index -= 1) {
30 } 14 array[index] = deleteFromHeap(array, &arraySize);
31 } // siftUp 15 }
16 } // heapSort

• This method of heapifying is O(n):

• 1/2 the elements require no motion.


• 1/4 the elements may sift down 1 level.
• 1/8 the elements may sift down 2 levels.

• Total motion = (n2) · 1≤j j2j
CS315 Spring 2021 33 CS315 Spring 2021 34

• That formula approaches n as j → ∞, since Σ_{j≥1} j/2^j = 2.
• Total complexity is therefore O(n + n log n) = O(n log n).
• This sorting method is not stable, because sifting does not preserve order.

Experimental results for Heap Sort: compares + moves ≈ 3.1 n log n.

    n      compares   moves   n log n   n²/2
    100    755        1190    664       5000
    200    1799       2756    1528      20000
    400    4180       6196    3457      80000
    800    9621       14050   7715      320000
    1600   21569      31214   17030     1280000

32 Bin sort

• Assumptions: values lie in a small range; there are no duplicates.
• Storage: build an array of bins, one for each possible value. Each is 1 bit long.
• Space: O(r), where r is the size of the range.
• Place each value to sort as a 1 in its bin. Time: O(n).
• Read off bins in order, reporting the index if it is 1. Time: O(r).
• Total time: O(n + r).
• Total memory: O(r), which can be expensive.
• Can handle duplicates by storing a count in each bin, at a further expense of memory.
• This sorting method does not work for arbitrary data having numeric keys; it only sorts the keys, not the data.
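A minimal sketch of this idea in C (my own illustration; the notes give no code). It uses one byte per bin rather than one bit, trading a constant factor of space for simplicity:

    #include <stdlib.h>

    void binSort(int array[], int n, int range) {
        // sort array[0..n-1], whose values lie in 0..range-1; no duplicates.
        char *bins = (char *) calloc(range, 1); // one bin per possible value
        for (int i = 0; i < n; i += 1) bins[array[i]] = 1; // O(n)
        int out = 0;
        for (int v = 0; v < range; v += 1) { // O(r)
            if (bins[v]) { array[out] = v; out += 1; }
        }
        free(bins);
    } // binSort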

33 Radix sort

• Example: use base 10, with values integers 0 – 9999, with 10 bins, each holding a list of values, initially empty.
• Pass 1: insert each value in a bin (at the rear of its list) based on the last digit of the value.
• Pass 2: examine values in bin order, and in list order within bins, placing them in a new copy of the bins based on the second-to-last digit.
• Pass 3, 4: similar.
• The number of digits is O(log n), so there are O(log n) passes, each of which takes O(n) time, so the algorithm is O(n log n).
• This sorting method is stable.
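One way to code the passes in C (my sketch; it counts digit occurrences into array-based bins rather than the linked lists the notes describe, but the passes and the stability argument are the same):

    #include <stdlib.h>

    void radixSort(int array[], int n) {
        // sort array[0..n-1] of values 0..9999: four passes, one per decimal digit.
        int *temp = (int *) malloc(n * sizeof(int));
        for (int divisor = 1; divisor <= 1000; divisor *= 10) {
            int count[10] = {0};
            for (int i = 0; i < n; i += 1) count[(array[i]/divisor) % 10] += 1;
            int start[10]; // first slot belonging to each bin
            start[0] = 0;
            for (int d = 1; d < 10; d += 1) start[d] = start[d-1] + count[d-1];
            for (int i = 0; i < n; i += 1) { // stable: equal digits keep their order
                int d = (array[i]/divisor) % 10;
                temp[start[d]] = array[i];
                start[d] += 1;
            }
            for (int i = 0; i < n; i += 1) array[i] = temp[i];
        } // each pass
        free(temp);
    } // radixSort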
34 Merge sort

    void mergeSort(int array[], int lowIndex, int highIndex){
        // sort array[lowIndex] .. array[highIndex]
        if (highIndex - lowIndex < 1) return; // width 0 or 1
        int mid = (lowIndex+highIndex)/2;
        mergeSort(array, lowIndex, mid);
        mergeSort(array, mid+1, highIndex);
        merge(array, lowIndex, highIndex);
    } // mergeSort

    void merge(int array[], int lowIndex, int highIndex) {
        int mid = (lowIndex+highIndex)/2;
        // copy the relevant parts of array to two temporaries
        int leftSize = mid - lowIndex + 1, rightSize = highIndex - mid;
        int *left = (int *) malloc(leftSize * sizeof(int));
        int *right = (int *) malloc(rightSize * sizeof(int));
        for (int i = 0; i < leftSize; i += 1) left[i] = array[lowIndex+i];
        for (int i = 0; i < rightSize; i += 1) right[i] = array[mid+1+i];
        // walk through the temporaries in tandem,
        // placing the smaller in array; ties honor the left version.
        int l = 0, r = 0;
        for (int out = lowIndex; out <= highIndex; out += 1) {
            if (r >= rightSize || (l < leftSize && left[l] <= right[r])) {
                array[out] = left[l]; l += 1;
            } else {
                array[out] = right[r]; r += 1;
            }
        }
        free(left); free(right);
    } // merge

• c_n = n + 2c_{n/2}
• a = 2, b = 2, k = 1 ⇒ O(n log n).
• This time complexity is guaranteed.
• Space needed: 2n, because merge in place is awkward (and expensive).
• The sort is also stable: it preserves the order of identical keys.
• Insertion, radix, and merge sort are stable, but not selection, Quicksort, or Heapsort.

Experimental results for Merge Sort: compares + moves ≈ 29n log n. • walk up the tree from n , rotating as needed to restore color
n compares moves n log n n2 2 rules. O(log n).
100 546 1344 664 5000 • color the root black.
200 1286 3088 1528 20000
case 1: parent and uncle red
400 2959 6976 3457 80000
Circled: black; otherwise: red
800 6741 15552 7715 320000
1600 15017 34304 17030 1280000 g g*
Star: continue up the tree here
recolor
p u p u
35 Red-black trees (Guibas and Sedgewick 1978)
c* c
• Class 10, 3/2/2021 case 2: parent red, uncle black, c inside
• Red-black trees balance themselves during online insertion.
g g
• Their representation requires pointers both to children and to the rotate c up
parent.
u p p down u c
• Each node is red or black.
continue to case 3
• The pseudo-nodes (or null nodes) at bottom are black. c* c3 c1 p*
• The root node is black.
c1 c2 c2 c3
• Red nodes have only black children. So no path has two red nodes
in a row. case 3: parent red, uncle black, c outside
• All paths from the root to a leaf have the same number of black g g p
nodes. recolor rotate p up

• For a node x, define black-height(x) = number of black nodes on a u p u p g down g n


path down from x, not counting x.
• The algorithm manages to keep height of the tree ≤ 2 log(n + 1). c1 c* c1 c u c1

• To keep the tree acceptable, we sometimes rotate, which reorganizes • try with values 1..6:
the tree locally without changing the symmetric traversal.
y x
right
x c a y
left
a b b c
• To insert

• place new node n in the tree and color it red. O(log n).
CS315 Spring 2021 37 CS315 Spring 2021 38

final • By example.
1 color 1 case 3 1 rotate 2 case 1
2 2 1 3 • The depth of a balanced ternary tree is log3 n, which is only 63% the
color color
depth of a balanced binary tree.
3 3 4
• The number of comparisons needed to traverse an internal node dur-
ing a search is either 1 or 2; average 5/3.
2 2 2 • So the number of comparisons to reach a leaf is 53 log3 n instead of
case 3
1 3 1 3 rotate 1 4 case 1 (for a binary tree) log2 n, a ratio of 1.05, indicating a 5% degradation.
color
4 4 3 5 color • The situation gets only worse for larger arity. For quaternary trees,
5 5
the degradation (in comparison to binary trees) is about 12.5%.
6
• And, of course, an online construction is not balanced.
• Moral: binary is best; higher arity is not helpful.
2
1 4
3 5
38 Quad trees (Finkel 1973)
6 • Extension of sorted binary trees to two dimensions.
• try with these values: 5, 2, 7, 4 (case 1), 3 (case 2), 1 (case 1) • Internal nodes contain a discriminant, which is a two-dimensional
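Since the cases above all rest on rotation, here is a sketch of the left rotation in C (mine, not the notes’ code), using the child and parent pointers the representation requires; the type and field names are assumptions, and rotateRight is symmetric.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct rbNode_s {
        int data;
        bool isRed;
        struct rbNode_s *left, *right, *parent;
    } rbNode;

    // Rotate left about x: x's right child y rises, x becomes y's left child.
    // The symmetric (inorder) traversal is unchanged.
    void rotateLeft(rbNode **root, rbNode *x) {
        rbNode *y = x->right; // assumed non-null
        x->right = y->left;
        if (y->left != NULL) y->left->parent = x;
        y->parent = x->parent;
        if (x->parent == NULL) *root = y;
        else if (x == x->parent->left) x->parent->left = y;
        else x->parent->right = y;
        y->left = x;
        x->parent = y;
    } // rotateLeft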
36 Review of binary trees

• Binary trees have expected O(log n) depth, but they can have O(n) depth.
• insertion
• traversal: preorder, postorder, inorder = symmetric order.
• deletion of node D
    • If D is a leaf, remove it.
    • If D has one child C, move C in place of D.
    • If D has two children, find its successor: S = RL*. Move S in place of D. S has no left child, but if it has a right child C, move C in place of S.

37 Ternary trees

• Class 11, 3/4/2021
• By example.
• The depth of a balanced ternary tree is log₃ n, which is only 63% the depth of a balanced binary tree.
• The number of comparisons needed to traverse an internal node during a search is either 1 or 2; average 5/3.
• So the number of comparisons to reach a leaf is (5/3) log₃ n instead of (for a binary tree) log₂ n, a ratio of 1.05, indicating a 5% degradation.
• The situation gets only worse for larger arity. For quaternary trees, the degradation (in comparison to binary trees) is about 12.5%.
• And, of course, an online construction is not balanced.
• Moral: binary is best; higher arity is not helpful.

38 Quad trees (Finkel 1973)

• Extension of sorted binary trees to two dimensions.
• Internal nodes contain a discriminant, which is a two-dimensional (x,y) value.
• Internal nodes have four children, corresponding to the four quadrants from the discriminant.
• Leaf nodes contain a bucket of b values.
• Insertion
    • Dive down the tree, put the new value in its bucket.
    • If the bucket overflows, pick a good discriminant and subdivide.
    • Good discriminant: one that separates the values as evenly as possible. Suggestion: median (x, y) values.
• Offline algorithm to build a balanced tree
    • Put all elements in a single bucket, then recursively subdivide as above.
• Generalization: for d-dimensional data, let each discriminant have d values. A node can have up to 2^d children. This number becomes cumbersome when d grows above about 3.
• Heavily used in 3-d modeling for graphics, often with the discriminant chosen as the midpoint, not the median.

39 k-d trees (Bentley and Finkel 1973)

• Extension of sorted binary trees to d dimensions.
• Especially good when d is high.
• Internal nodes contain a dimension number (0 .. d − 1) and a discriminant value (real).
• Internal nodes have two children, corresponding to values ≤ and > the discriminant in the given dimension.
• Leaf nodes contain a bucket of b values.
• Offline construction and online insertion are similar to quad trees.
    • To split a bucket of values, pick the dimension number with the largest range across those values.
    • Given the dimension, pick the median of the values in that dimension as the discriminant.
    • That choice of dimension number tends to make the domain of each bucket roughly cubical; that choice of discriminant balances the tree.
• Nearest-neighbor search: Given a d-dimensional probe value p, to find the nearest neighbor to p that is in the tree.
    • Dive into the tree until you find p’s bucket.
    • Find the closest value in the bucket to p. Cost: b distance measures. Result: a ball around p.
    • Walking back up to the root, starting at the bucket:
        • If the domain of the other child of the node overlaps the ball, dive into that child.
        • If the ball is entirely contained within the node’s domain, done.
        • Otherwise walk one step up toward the root and continue.
    • complexity: Initial dive is O(n), but the expected number of buckets examined is O(1).
• Used for cluster analysis, categorizing (as in optical character recognition).

40 2-3 trees (John Hopcroft, 1970)

• Class 12, 3/9/2021
• By example.
• Like a ternary tree, but a different rule of insertion
• Always completely balanced
• A node may hold 1, 2, or 3 (temporarily) values.
• A node may have 0 (only leaves), 2, 3, or 4 (temporarily) children.
• A node that has 3 values splits and promotes its middle value to its parent (recursively up the tree).
• If the root splits, it promotes a new root.
• Complexity: O(n log n) for insertion and search, guaranteed.
• Deletion: unpleasant.

41 Stooge Sort

• A terrible method, but fun to analyze.

1 #include <math.h> • Shorthand: g = dm2e (the half size)


2 • Internal nodes (other than the root) have g  m children.
3 void stoogeSort(int array[], int lowIndex, int highIndex){ • Insertion
4 // highIndex is one past the end
5 int size = highIndex - lowIndex; • Insert in appropriate leaf.
6 if (size <= 1) { // nothing to do • If current node overflows (has m values) split it into two nodes
7 } else if (size == 2) { // direct sort of g values each; hoist the middle value up one level.
8 if (array[lowIndex] > array[lowIndex+1]) { • When a node splits, its parent’s pointer to it becomes two point-
•9 swap(array, lowIndex, lowIndex+1); ers to the new nodes.
10 }
• When a value is hoisted, iterate up the tree checking for over-
11 } else { // general case
flow.
12 float third = ((float) size) / 3.0;
13 stoogeSort(array, lowIndex, ceil(highIndex - third)); • B+ tree variant: link leaf nodes together for quicker inorder traversal.
14 stoogeSort(array, floor(lowIndex + third), highIndex); This link also allows us to avoid splitting a leaf if its neighbor is not
15 stoogeSort(array, lowIndex, ceil(highIndex - third)); at capacity.
16 }
• A densely filled tree with n keys (values), height h:
17 } // stoogeSort

mh+1 −1
• cn = 1 + 3c2n3 • Number of nodes a = 1 + m + m2 + · · · + mh = m−1
.
• Number of keys n = (m − 1)a = mh+1 − 1 ⇒ logm (n + 1) =
• a = 3, b = 32, k = 0, so bk = 1. By the recursion theorem (page 18),
h + 1 ⇒ h is O(log n).
since a > bk , we have complexity Θ(nlogb a ) = Θ(nlog3/2 3 ) ≈ Θ(n271 ), so
Stooge Sort is worse than quadratic. • A sparsely filled tree with n keys (values), height h:
• However, the recursion often encounters already-sorted sub-arrays.
• The root has two subtrees; the others have g = dm2e subtrees,
If we add a check for that situation, Stooge Sort becomes roughly
so:
quadratic. h −1)
• Number of nodes a = 1 + 2(1 + g + g 2 + · · · + g h−1 ) = 1 + 2(gg−1 .
• The root has 1 key, the others have g − 1 keys, so:
42 B trees (Ed McCreight 1972) • Number of keys n = 1+2(g h −1) = 2g h −1 ⇒ h = logg (n+1)2 =
O(log n).
• A generalization of 2-3 trees when McCreight was at Boeing, hence
the name.
• Choose a number m (the bucket size) such that m values plus m 43 Deletion from a B tree
disk indices fit in a single disk block. For instance, if a block is 4KB,
a value takes 4B, and an index takes 4B, then m = 4KB8B = 512. • Deletion from an internal node: replace value with successor (taken
from a leaf), and then proceed to deletion from a leaf.
• m = 3 ⇒ 2-3 tree.
• Deletion from a leaf: the bad case is that it can cause underflow: the
• Class 13, 3/11/2021 leaf now has fewer than g keys.
• Each node has 1  m − 1 values and 0  m children. (We have room • In case of underflow, borrow a value from a neighbor if possible,
for m values; the extra can be used for pseudo-data.) adjusting the appropriate key in the parent.
CS315 Spring 2021 43 CS315 Spring 2021 44

44 Hashing

• Very popular data structure for searching.
• Cost of insertion and of search is O(log n), but only because n distinct values must be log n bits long, and we need to look at the entire key. If we consider looking at a key to be O(1), then hashing is expected (but not guaranteed) to be O(1).
• Idea: find the value associated with key k at A[h(k)], where
    • h() maps keys to integers in 0 .. s − 1, where s is the size of A[ ].
    • h() is “fast”. (It generally needs to look at all of k, though.)
• Example
    • k = student in class.
    • h(k) = k’s birthday (a value from 0 .. 365).
• Difficulty: collisions
• Birthday paradox: Prob(no collisions with j people) = 365! / ((365 − j)! · 365^j)
• This probability goes below 1/2 at j = 23.
• At j = 50, the probability is 0.029.
• Moral: One cannot in general avoid collisions. One has to deal with them.
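The birthday-paradox numbers are easy to check; a small C program (my own sketch, not from the notes) computes Prob(no collision) incrementally and reproduces the j = 23 and j = 50 figures (about 0.49 and 0.03):

    #include <stdio.h>

    int main() {
        // After person j arrives, multiply the no-collision probability
        // by the chance that person j misses the j-1 occupied birthdays.
        double probNoCollision = 1.0;
        for (int j = 1; j <= 50; j += 1) {
            probNoCollision *= (365.0 - (j - 1)) / 365.0;
            if (j == 23 || j == 50) {
                printf("j = %d: %.3f\n", j, probNoCollision);
            }
        }
        return 0;
    }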
45 Hashing: Dealing with collisions: open addressing

• Overview
    • The following methods store all items in A[ ] and use a probe sequence. If the desired position is occupied, use some other position to consider instead.
    • These methods suffer from clustering.
    • Deletion is hard, because removing an element can damage unrelated searches. Deletion by marking is the only reasonable approach.
• Perfect hashing: if you know all n values in advance, you can look for a non-colliding hash function h. Finding such a function is in general quite difficult, but compiler writers do sometimes use perfect hashing to detect keywords in the language (like if and for).
• Linear probing. Probe p is at index h(k) + p (mod s), for p = 0, 1, ....
    • Terrible behavior when A[ ] is almost full, because chains coalesce. This problem is called “primary clustering”.
• Additional hash functions. Use a family of hash functions, h1(), h2(), ....
    • insertion: keep probing with different functions until an empty slot is found.
    • searching: probe with different functions until you find the key (success) or an empty slot (failure).
    • You need a family of independent hash functions.
    • The method is very expensive when A[ ] is almost full.

• 2-3 tree. Preorder result: 3 1 1 (2, 3) 5 (4, 5) (6, 9) 50 Hashing: Dealing with collisions: external chain-
• red-black tree. Preorder result: 3b 1 1b 2b 3 5 4b 5 9b 6 ing
47 Midterm exam

Class 15, 3/18/2021

48 Midterm exam follow-up

Class 16, 3/23/2021

49 Hashing: more open-addressing methods

• Class 17, 3/25/2021
• Quadratic probing. Probe p is at index h(k) + p² (mod s), for p = 0, 1, ....
    • When does this sequence hit all of A[ ]? Certainly it does if s is prime.
    • We still suffer “secondary clustering”: if two keys have the same hash value, then the sequence of probes is the same for both.
• Add-the-hash rehash. Probe p is at index (p + 1) · h(k) (mod s).
    • This method avoids clustering.
    • Warning: h(k) must never be 0.
• Double hashing. Use two hash functions, h1() and h2(). Probe p is at index h1(k) + p · h2(k).
    • This method avoids clustering.
    • Warning: h2(k) must never be 0.

50 Hashing: Dealing with collisions: external chaining

• Each element in A is a pointer, initially null, to a bucket, which is a linked list of nodes that hash to that element; each node contains k and any other associated data.
• insert: place k at the front of A[h(k)].
• search: look through the list at A[h(k)].
    • optimization: When you find, promote the node to the start of its list.
• average list length is n/s. So if we set s ≈ n we expect about 1 element per list, although some may be longer, some empty.
• Instead of lists, we can use something fancier (such as 2-3 trees), but it is generally better to use a larger s.

51 Hashing: What is a good hash function?

• Want it to be
    • Uniform: Equally likely to give any value in 0 .. s − 1.
    • Fast.
    • Spreading: similar inputs → dissimilar outputs, to prevent clustering. (Only important for open addressing, as described above.)
• Several suggestions, assuming that k is a multi-word data structure, such as a string.
    • Add (or multiply) all (or some of) the words of k, discarding overflow, then mod by s. It helps if s = 2^j, because mod is then masking with 2^j − 1.
    • XOR the words of k, shifting left by 1 after each, followed by mod s.
• Wisdom: The hash function doesn’t make much difference. It is not necessary to look at all of k. Just make sure that h(k) is not constant (except for testing collision resolution).
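The XOR-and-shift suggestion, sketched in C (mine; the notes prescribe no particular code), for a string key and a table size s = 2^j:

    unsigned int hash(const char *key, unsigned int s) {
        // XOR the bytes of key, shifting left by 1 after each byte.
        unsigned int h = 0;
        for (const char *p = key; *p != '\0'; p += 1) {
            h = (h << 1) ^ (unsigned char) *p;
        }
        return h & (s - 1); // mod s by masking, since s == 2^j
    }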

52 Hashing: How big should the array be?

• Some open-addressing methods prefer that s, the size of A[ ], be prime.
• Computing h() is faster if s = 2^j for some j.
• Open addressing gets very bad if s < 2n, depending on the method. Linear probing is the worst; I would make sure s ≥ 3n.
• External chaining works fine even when s ≈ n, but it gets steadily worse.

53 Hashing: What should we do if we discover that s is too small?

• We can rebuild with a bigger s, rehashing every element. But that operation causes a temporary “outage”, so it is not acceptable for online work.
• Extendible hashing
    • Start with one bucket. If it gets too full (list longer than 10, say), split it on the last bit of h(k) into two buckets.
    • Whenever a bucket based on the last j bits is too full, split it based on bit j + 1 from the end.
    • To find the bucket
        • compute v = h(k).
        • follow a tree that discriminates on the last bits of v. This tree is called a trie.
        • it takes at most log v steps to find the right bucket.
    • Searching within the bucket now is guaranteed to take constant time (ignoring the log n cost of comparing keys)

54 Hash tables (associative arrays) in scripting languages

• Class 18, 3/30/2021
• Like an array, but the indices are strings.
• Resizing the array is automatic, although one might specify the expected size in advance to avoid resizing during early growth.
• Perl has a built-in datatype called a hash.

    my %foo;
    $foo{"this"} = "that";

• Python has dictionaries.

    foo = dict()
    foo['this'] = 'that'

• JavaScript arrays are all associative.

    const foo = [];
    foo['this'] = 'that';
    foo.this = 'that';

55 Cryptographic hashes: digests

• purpose: uniquely identify text of any length.
• these hashes are not used for searching.
• goals
    • fast computation
    • uninvertible: given h(k), it should be infeasible to compute k.
    • it should be infeasible to find collisions k1 and k2 such that h(k1) = h(k2).
• examples
    • MD5: 128 bits. Practical attack in 2008.
    • SHA-1: 160 bits, but (2005) one can find collisions in 2⁶⁹ hash operations (brute force would use 2⁸⁰)
    • SHA-2: usual variant is SHA-256; also SHA-512.
• uses
    • storing passwords (used as a trap-door function)
    • catching plagiarism
    • for authentication (h(m + s) authenticates m to someone who shares the secret s, for example)
    • tripwire: intrusion detection

56 Graphs

• Our standard graph:

[Figure: the standard graph. Vertices 1–7; seven edges e1 .. e7; from the adjacency structures in section 57, the edges are {1,2} (= e1), {1,6}, {2,3}, {2,7}, {3,7}, {4,5}, and {6,7}.]

• Nomenclature
    • vertices: V is the name of the set, v is the size of the set. In our example, V = {1, 2, 3, 4, 5, 6, 7}.
    • edges: E is the name of the set, e is the size of the set. In our example, E = {e1, e2, e3, e4, e5, e6, e7}.
    • directed graph: edges have direction (represented by arrows).
    • undirected graph: edges have no direction.
    • multigraph: more than one edge between two vertices. We generally do not deal with multigraphs, and the word graph generally disallows them.
    • weighted graph: each edge has a numeric label called its weight.
• Graphs represent situations
    • streets in a city. We might be interested in computing paths.
    • airline routes, where the weight is the price of a flight. We might be interested in minimal-cost cycles.
        • Hamiltonian cycle: no duplicated vertices (cities).
        • Eulerian cycle: no duplicated edges (flights).
    • Islands and bridges, as in the bridges of Königsburg, later called Kaliningrad (Euler 1707-1783). This is a multigraph, not strictly a graph. Can you find an Eulerian cycle?

[Figure: the bridges of Königsburg drawn as a multigraph on four land masses A, B, C, D.]

    • Family trees. These graphs are bipartite: Family nodes and person nodes. We might want to find the shortest path between two people.
    • Cities and roadways, with weights indicating distance. We might want a minimal-cost spanning tree.

57 Data structures representing a graph

• Adjacency matrix
    • an array n × n of Boolean.
    • A[i, j] = true ⇒ there is an edge from vertex i to vertex j.

        1  2  3  4  5  6  7
     1     x              x
     2  x     x           x
     3     x              x
     4           x
     5        x
     6  x                 x
     7     x  x        x

    • The array is symmetric if the graph is undirected
        • in this case, we can store only one half of it, typically in a 1-dimensional array
        • A[i(i − 1)/2 + j] holds information about edge i, j.
    • Instead of Boolean, we can use integer values to store edge weights.

• Adjacency list
    • an array of n singly-linked lists.
    • j is in linked list A[i] if there is an edge from vertex i to vertex j.

    1: 2 → 6
    2: 1 → 3 → 7
    3: 2 → 7
    4: 5
    5: 4
    6: 1 → 7
    7: 2 → 3 → 6

58 Computing the degree of all vertices

• Adjacency matrix: O(v²).

    foreach vertex (0 .. v-1) {
        degree[vertex] = 0;
        foreach neighbor in 0 .. v-1 {
            if (A[vertex, neighbor]) degree[vertex] += 1;
        }
    }

• Adjacency list: O(v + e).

    foreach vertex (0 .. v-1) {
        degree[vertex] = 0;
        for (neighbor = A[vertex]; neighbor != null;
                neighbor = neighbor->next) {
            degree[vertex] += 1;
        }
    }

59 Computing the connected component containing vertex i in an undirected graph

• Class 19, 4/1/2021
• why: to segment an image.
• method: depth-first search (DFS).

    void DFS(vertex here) {
        // assume visited[*] == false at start
        visited[here] = true;
        foreach next (successors(here)) {
            if (! visited[next]) DFS(next);
        }
    } // DFS

• DFS is faster with an adjacency list: O(e′ + v′), where e′, v′ only count the edges and vertices in the connected component.
• DFS is slower with an adjacency matrix: O(v · v′).
• For our standard graph (page 49), assuming that the adjacency lists are all sorted by vertex number (or that we use the adjacency matrix), starting at vertex 1, we invoke DFS on these vertices: 1, 2, 3, 7, 6.
• DFS can be coded iteratively with an explicit stack:

    void DFS(vertex start) {
        // assume visited[*] == false at start
        workStack = makeEmptyStack();
        pushStack(workStack, start);
        while (! isEmptyStack(workStack)) {
            place = popStack(workStack);
            if (visited[place]) continue;
            visited[place] = true;
            foreach neighbor (successors(place)) {
                if (! visited[neighbor]) {
                    pushStack(workStack, neighbor);
                    // could record "place" as parent
                } // "neighbor" is not yet visited
            } // foreach neighbor
        } // while workStack not empty
    }

60 To see if a graph is connected

• See if DFS hits every vertex.

    bool isConnected() {
        foreach vertex (vertices)
            visited[vertex] = false;
        DFS(0); // or any vertex
        foreach vertex (vertices)
            if (! visited[vertex]) return false;
        return true;
    } // isConnected

61 Breadth-first search

• applications
    • find the shortest path in a family tree connecting two people
    • find the shortest route through city streets
    • find the fastest itinerary by plane between two cities
• method: place unfinished vertices in a queue. These are the ones we still need to visit, in order closest to furthest.

    void BFS(vertex start) {
        // assume visited[*] == false at start
        workQueue = makeQueue();
        visited[start] = true;
        insertInQueue(workQueue, start);
        while (! emptyQueue(workQueue)) {
            place = deleteFromQueue(workQueue); // from front
            foreach neighbor (successors(place)) {
                if (! visited[neighbor]) {
                    visited[neighbor] = true;
                    insertInQueue(workQueue, neighbor); // to rear
                    // or: insert (place, neighbor)
                    // to remember path to start
                } // not visited
            } // foreach neighbor
        } // while queue not empty
    } // BFS

• For our standard graph (page 49), assuming that the adjacency lists are all sorted by vertex number (or that we use the adjacency matrix), starting at vertex 1, BFS visits these vertices: 1, 2, 6, 3, 7.
• Using adjacency lists, BFS is O(v′ + e′).

62 Shortest path between vertices i and j

• Compute BFS(i), but stop when you visit j.
    • Actually, you can stop when you place j in the queue.
    • Construct the path by building a back chain when you insert a vertex in the queue. That is, you insert a pair: (place, neighbor).
• If edges are weighted:
    • Use a heap (top-light) instead of a queue. That’s why heaps are sometimes called priority queues.
    • Stop when you visit j, not when you place j in the queue.
• Class 20, 4/6/2021

    void weightedBFS(vertex start, vertex goal) {
        // assume visited[*] == () at start
        workHeap = makeHeap(); // top-light
        insertInHeap(workHeap, (0, start, start));
        // distance, vertex, from where
        while (! emptyHeap(workHeap)) {
            (distance, place, from) = deleteFromHeap(workHeap);
            if (visited[place] != ()) continue; // already seen
            visited[place] = (from, distance);
            if (place == goal) return; // could print path
            foreach (neighbor, weight) in (successors(place)) {
                insertInHeap(workHeap, (distance+weight, neighbor, place));
            } // foreach neighbor
        } // while heap not empty
    } // weightedBFS

63 Dijkstra’s algorithm: Finding all shortest paths from a given vertex in a weighted graph

The weights must be positive. Weiss §9.3.2

• Rule: among all vertices that can extend a shortest path already found, choose the one that results in a shortest path. If there is a tie ending at the same vertex, choose either. If there is a tie going to different vertices, choose both.
• This is an example of a greedy algorithm: at each step, improve the solution in the way that looks best at the moment.
• Starting position: one path, length 0, from start vertex j to j.

[Worked example (figure): a weighted graph on vertices 1–6 with edge weights including 80, 40, 100, 20, 60, and 120. Starting at vertex 5, paths are accepted in this order: 5 (length 0), 5 6 (40), 5 1 (60), 5 6 3 (100, a better way to add vertex 3), 5 6 3 4 (120, a better way to add vertex 4); candidates such as 5 6 4 (160) and 5 1 2 (140) wait in the heap.]

• Another example:

[Worked example (figure): a second weighted graph, starting at vertex 1; paths are accepted in this order: 1 (length 0), 1 2 (3), 1 4 (3), 1 2 3 (4), 1 2 3 5 (5); candidates such as 1 4 3 (5), 1 4 5 (6), and 1 2 5 (7) are superseded.]

64 Topological sort

• Sample application: course prerequisites place some pairs of courses in order, leading to a directed, acyclic graph (DAG). We want to find a total order; there may be many acceptable answers.
• Weiss §9.2

[Figure: a DAG on ten vertices, labeled both 1–10 and with course numbers 115, 215, 216, 275, 280, 315, 335, 405, 470, 471; edges point from each course to its prerequisites.]

Possible results:

    4 10 1 2 6 5 7 8 3 9
    10 4 7 1 2 5 8 6 3 9

• method: DFS looking for sinks (vertices with fanout 0), which are then placed at the front of the growing result.

    list answerList; // global

    void topologicalSort () { // computes answerList
        foreach j (vertices) visited[j] = false;
        answerList = makeEmptyList();
        foreach j (vertices)
            if (! visited[j]) tsRecurse(j);
    } // topologicalSort

    void tsRecurse(vertex here) { // adds to answerList
        visited[here] = true;
        foreach next (successors(here))
            if (! visited[next]) tsRecurse(next);
        insertAtFront(answerList, here);
    } // tsRecurse

65 Spanning trees

• Class 21, 4/8/2021
• Weiss §9.5
• Spanning tree: Given a connected undirected graph, a cycle-free connected subgraph containing all the original vertices.

[Figure: a connected weighted graph on vertices 1–6 (edge weights including 40, 20, 50, 30, 10, 60, 20, 30) and, beside it, one of its spanning trees.]

• Minimum-weight spanning tree: Given a connected undirected weighted graph, a spanning tree with least total weight.
• Example: minimum-cost set of roads (edges) connecting a set of cities (vertices).

66 Prim’s algorithm

    Start with any vertex as the current tree.
    do v − 1 times
        connect the current tree to the closest external vertex

• This is a greedy algorithm: at each step, improve the solution in the way that looks best at the moment.
• Example: start with 5. We add: (5,6), (5,1), (1,3), (3,4), (1,2)
• Implementation
    • Keep a top-light heap of all external vertices based on their distance to the current tree (and store to which tree vertex they connect at that distance).
    • Initially, all distances are ∞ except for the neighbors of the starting vertex.
    • Repeatedly take the closest vertex f and add its edge to the current tree.
    • For all external neighbors b of f, perhaps f is a better way to connect b to the tree; if so, update b’s information in the heap. (Remove b and reinsert it with the better distance.)

67 Kruskal's algorithm

    Start with all vertices, no edges.
    do v − 1 times
        add the lowest-cost missing edge that does not form a cycle

• This is a greedy algorithm: at each step, improve the solution in the
way that looks best at the moment.
• We can stop when we have added v − 1 edges; all the rest will
certainly introduce cycles.
• Data representation: list of edges, sorted by weight.
• Complexity: assuming that keeping track of the component of each
vertex is O(log* v), the complexity is O(e log e + v log* v), because we
must sort the edges and then add v − 1 edges.
• Class 22, 4/13/2021

68 Cycle detection: Union-find

• general idea
  • As edges are added, keep track of which connected component
    every vertex belongs to.
  • Any new edge connecting vertices already in the same component
    would form a cycle; avoid adding such edges.
• operations
  • Each vertex starts as a separate component.
  • union(b,c): assign b and c to the same component (for instance,
    when an edge is introduced between them).
  • find(b): tell which component b is in (if b and c are in the same
    component, don't add an edge connecting them).
• method for union(b,c)
  • Every vertex has at most one parent, initially nil.
  • Find the representative b′ of b by following parent links until
    the end.
  • Find the representative c′ of c.
  • If b′ = c′, they are already in the same component. Done.
  • Otherwise, point either b′ to c′ or c′ to b′ by introducing a parent
    link between them.
  • We want trees to be as shallow as possible. So record the height
    of each tree in its root. Point the shallower one at the deeper
    one.
  • We can compress paths while searching for the representative.
    In this case, the height recorded in the root is just an estimate.
• We use this data structure in Kruskal's algorithm to avoid cycles:

    typedef struct vertex_s {
        int name; // need not be int
        struct vertex_s *representative; // NULL => me
        int depth; // only if I represent my group; 0 initially
    } vertex_t;

• More examples of Union-Find
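As a concrete sketch of find and union on the vertex_t nodes above,
with the path compression and depth balancing just described; findRep
and unionJoin are illustrative names, and the int result signals whether
a Kruskal edge may safely be added:

    // returns the representative of v's component, compressing the path
    vertex_t *findRep(vertex_t *v) {
        if (v->representative == NULL) return v; // v represents its group
        vertex_t *rep = findRep(v->representative);
        v->representative = rep; // path compression
        return rep;
    } // findRep

    // joins the components of b and c; returns 0 if they were already
    // joined, so adding edge (b,c) would form a cycle
    int unionJoin(vertex_t *b, vertex_t *c) {
        vertex_t *bRep = findRep(b);
        vertex_t *cRep = findRep(c);
        if (bRep == cRep) return 0; // same component already
        if (bRep->depth < cRep->depth) {
            bRep->representative = cRep; // point the shallower at the deeper
        } else if (cRep->depth < bRep->depth) {
            cRep->representative = bRep;
        } else {
            cRep->representative = bRep; // equal (estimated) depths:
            bRep->depth += 1; // the merged tree may be one deeper
        }
        return 1;
    } // unionJoin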
69 Numerical algorithms

• We will not look at algorithms for approximation to problems using
real numbers; that is the subject of CS321.
• We will study integer algorithms.

70 Euclidean algorithm: greatest common divisor (GCD)

• Examples: gcd(12,60)=12, gcd(15,66)=3, gcd(15,67)=1.

    int gcd(int a, int b) {
        while (b != 0) {
            int remainder = a % b; // compute before overwriting a
            a = b;
            b = remainder;
        }
        return(a);
    } // gcd

• Example:  a: 12 60 12
            b: 60 12  0
• Example:  a: 15 66 15  6  3
            b: 66 15  6  3  0
• Example:  a: 15 67 15  7  1
            b: 67 15  7  1  0

71 Fast exponentiation

• Many cryptographic algorithms require raising large integers (thousands
of digits) to very large powers (hundreds of digits), modulo a large
number (about 2K bits).
• To get a^64 we only need six multiplications: (((((a^2)^2)^2)^2)^2)^2.
• To get a^5 we need three multiplications: a^4 · a = (a^2)^2 · a.
• General rule to compute a^e: look at the binary representation of e,
reading it from left to right. The initial accumulator has value 1.
  • 0: square the accumulator.
  • 1: square the accumulator and multiply by a.
• Example: a^11. In binary, 11 is expressed as 1011. So we get
((((1^2)·a)^2)^2 · a)^2 · a, a total of 4 squares and 3 multiplications, or 7
operations. The first square is always 1^2 and the first multiplication
is 1 · a; we can avoid those trivial operations.
• In cryptography, we often need to compute a^e (mod p). Calculate
this quantity by performing mod p after each multiplication.
• As we read the binary representation of e from left to right, we could
start with the leading 0's without any harm.
• Example (run with the bc calculator program): 243^745 mod 452;
745 in binary is 1011101001.

    a = 243
    m = 452
    r = 1
    r = r^2*a % m
    r = r^2 % m
    r = r^2*a % m
    r = r^2*a % m
    r = r^2*a % m
    r = r^2 % m
    r = r^2*a % m
    r = r^2 % m
    r = r^2 % m
    r = r^2*a % m
    r
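The general rule translates directly into code. Here is a minimal sketch
on ordinary machine integers rather than the BigNums real cryptography
needs; modPower is an illustrative name, and the modulus must stay
below 2^32 so the intermediate products fit in 64 bits.

    typedef unsigned long long uint64;

    // computes a^e mod m, reading e's bits from left to right;
    // assumes m < 2^32 so that r*r and r*a cannot overflow
    uint64 modPower(uint64 a, uint64 e, uint64 m) {
        uint64 r = 1; // the accumulator
        for (int bit = 63; bit >= 0; bit -= 1) { // leading 0's do no harm
            r = r * r % m; // every bit: square
            if ((e >> bit) & 1) r = r * a % m; // 1 bit: also multiply by a
        }
        return r;
    } // modPower

For instance, modPower(243, 745, 452) reproduces the bc computation
above.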

72 Integer multiplication

• Class 23, 4/15/2021
• The BigNum representation: linked list of pieces, each with, say, 2
bytes of unsigned integer, with least-significant piece first. (It makes
no difference whether we store those 2 bytes in little-endian or
big-endian order.)
• Ordinary multiplication of two n-digit numbers x and y costs n^2.
• Anatoly Karatsuba (1962) showed a divide-and-conquer method that
is better.
  • Split each number into two chunks, each with n/2 digits:
    • x = a · 10^(n/2) + b
    • y = c · 10^(n/2) + d
    The base 10 is arbitrary; the same idea works in any base, such
    as 2.
  • Now we can calculate xy = ac · 10^n + (bc + ad) · 10^(n/2) + bd. This
    calculation uses four multiplications, each costing (n/2)^2, so it
    still costs n^2. All the additions and shifts (multiplying by powers
    of 10) cost just O(n), which we ignore.

  • We can use the Recursion Theorem (page 17): c_n = n + 4·c_{n/2}.
    Then a = 4, b = 2, k = 1, so a > b^k, so c_n = Θ(n^(log_b a)) =
    Θ(n^(log_2 4)) = Θ(n^2).
  • But we can introduce u = ac, v = bd, and w = (a + b)(c + d) at a
    cost of (3/4)n^2.
  • Now xy = u · 10^n + (w − u − v) · 10^(n/2) + v, which costs no
    further multiplications.
  • Example
    • x = 3962, y = 4481
    • a = 39, b = 62, c = 44, d = 81
    • u = ac = 1716, v = bd = 5022, w = (a + b)(c + d) = 12625
    • w − u − v = 5887
    • xy = 17753722.
  • In bc:

        x = 3962
        y = 4481
        a = 39
        b = 62
        c = 44
        d = 81
        u = a*c
        v = b*d
        w = (a+b)*(c+d)
        x * y
        u*10^4 + (w-u-v)*10^2 + v

  • We can apply this construction recursively. c_n = n + 3·c_{n/2}. We
    can again apply the Recursion Theorem (page 17): a = 3, b = 2,
    k = 1, so a > b^k, so c_n = Θ(n^(log_b a)) = Θ(n^(log_2 3)) ≈ Θ(n^1.585).
  • For small n, this improvement is small. But for n = 100, we
    reduce the cost from 10,000 to about 1480. Running bc -l:

        power=l(3)/l(2)
        a=100
        e(power*l(a))

    bigInt bigMult(bigInt x, bigInt y, int n) {
        // n-chunk multiply of x and y
        bigInt a, b, c, d, u, v, w;
        if (n == 1) return(toBigInt(toInt(x)*toInt(y)));
        a = extractPart(x, 0, n/2 - 1); // high part of x
        b = extractPart(x, n/2, n-1); // low part of x
        c = extractPart(y, 0, n/2 - 1); // high part of y
        d = extractPart(y, n/2, n-1); // low part of y
        u = bigMult(a, c, n/2); // recursive
        v = bigMult(b, d, n/2); // recursive
        w = bigMult(bigAdd(a,b), bigAdd(c,d), n/2); // recursive
        return(
            bigAdd(
                bigShift(u, n),
                bigAdd(
                    bigShift(bigSubtract(w, bigAdd(u,v)), n/2),
                    v
                ) // add
            ) // add
        );
    }

73 Strings and pattern matching — Text search problem

• Class 24, 4/20/2021
• The problem: Find a match for pattern p within a text t, where |p| =
m and |t| = n.
• Application: t is a long string of bytes (a "message"), and p is a short
string of bytes (a "word").
• We will look at several algorithms; there are others.
  • Brute force: O(mn). Typical: 11n (operations).
  • Rabin-Karp: O(n). Typical: 7n.
  • Knuth-Morris-Pratt: O(n). Typical: 11n.
  • Boyer-Moore: worst O(mn). Typical: n/m.

• Non-classical versions: approximate match, regular expressions, more
complicated patterns.

74 Text search — brute force algorithm

• Return the smallest index j such that t[j .. j+m−1] = p, or −1 if
there is no match.

    int bruteSearch(char *t, char *p) {
        // returns index in t where p is found, or -1
        const int n = strlen(t);
        const int m = strlen(p);
        int tIndex = 0;
        p[m] = 0xFF; // impossible character; pseudo-data
        while (tIndex+m <= n) { // there is still room to find p
            int pIndex = 0;
            while (t[tIndex+pIndex] == p[pIndex]) // enlarge match
                pIndex += 1;
            if (pIndex == m) return(tIndex); // hit pseudo-data
            tIndex += 1;
        } // there is still room to find p
        return(-1); // failure
    } // bruteSearch

• Example: p = "001", t = "010001".
• Worst case: O((n − m)m) = O(nm).
• If the patterns are fairly random, we observe complexity O(n − m) =
O(n); in practice, complexity is about 11n.

75 Text search — Rabin-Karp

• Michael Rabin, Richard Karp (1987)
• The idea is to do a preliminary hash-based check each time we
increment tIndex and skip this value of tIndex if there is no chance
that this position works.
• Problem: how can we avoid m accesses to compute the hash of the
next piece of t?
• We will start with fingerprinting, a weak version of the final method,
just looking at parity, and assuming the strings are composed of 0
and 1 characters.
• The parity of a string of 0 and 1 characters is 0 if the number of 1
characters is even; otherwise the parity is 1.
• Formula: parity = Σ_j p[j] (mod 2).
• We can compute the parities of windows of m (= 6) bits in t. For
example:

    j        0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
    t        0 0 1 0 1 1 0 1 0 1  0  0  1  0  1  0  0  1  1
    tParity  1 1 0 1 0 1 0 1 0 1  0  0  1  1

• Say that p = 010111, which has pParity = 0. We only need to
consider matches starting at positions 2, 4, 6, 8, 10, and 11.
• We have saved half the work.
• We can calculate tParity quickly as we move p by looking at only 2,
not m, characters of t:
  • Initially, tParity_0 = Σ_{0≤j<m} t[j] (mod 2).
  • Then, tParity_{j+1} = tParity_j + t[j] + t[j+m] (mod 2).

    bit computeParity(bit *string, int length) {
        bit answer = 0;
        for (int index = 0; index < length; index += 1) {
            answer += string[index];
        }
        return (answer & 01);
    } // computeParity

    int fingerprintSearch(bit *t, bit *p) {
        const int n = strlen(t);
        const int m = strlen(p);
        const int pParity = computeParity(p, m);
        int tParity = computeParity(t, m); // initial substring
        int tIndex = 0;
        while (tIndex+m <= n) { // there is still room to find p
            if (tParity == pParity) { // parity check ok
                int pIndex = 0;
                while (t[tIndex+pIndex] == p[pIndex]) { // enlarge match
                    pIndex += 1;
                    if (pIndex >= m) return(tIndex);
                } // enlarge match
            } // parity check ok
            tParity = (tParity + t[tIndex] + t[tIndex+m]) & 01;
            tIndex += 1;
        } // there is still room to find p
        return(-1); // failure
    } // fingerprintSearch

• Instead of bits, we can deal with character arrays.
  • We generalize parity to the exclusive OR of characters, which
    are just 8-bit quantities.
  • The C operator for exclusive OR is ^.
  • The update rule for tParity is
    tParity = tParity ^ t[tIndex] ^ t[tIndex+m];
  • We now have reduced the work to 1/128 (for 7-bit ASCII), not
    1/2, for the random case, because only that small fraction of
    starting positions are worth pursuing.
• The full algorithm extends fingerprinting.
  • Instead of reducing the work to 1/2 or 1/128, we want to reduce
    it to 1/q for some large q.
  • Use this hash function for m bytes t[j] … t[j+m−1]:
    Σ_{0≤i<m} 2^(m−1−i) t[j+i] (mod q). Experience suggests that q
    should be a prime > m.
  • We can still update tParity quickly as we move p by looking at
    only 2, not m, characters of t:
    tParity_{j+1} = (t[j+m] + 2(tParity_j − 2^(m−1) t[j])) (mod q).
  • We can use shifting to compute tParity without multiplication:
    tParity_{j+1} = (t[j+m] + ((tParity_j − (t[j] << (m−1))) << 1)) (mod q).
    We still need to compute mod q, however.
• Class 25, 4/22/2021
• Monte-Carlo substring search
  • Choose q, a prime close to but not exceeding mn^2. For instance,
    if m = 10 and n = 1000, choose a prime near 10^7, such
    as 9,999,991.
  • The probability 1/q that we will make a mistake is very low, so
    just omit the inner loop. We will sometimes have a false positive,
    with probability, it turns out, less than 2.53/n.
  • I don't think we save enough computation to warrant using
    Monte Carlo search. If false positives are very rare, it doesn't
    hurt to employ even a very expensive algorithm to remove them.
    Checking anyway is called the "Las-Vegas version".
• The idea is good, but in practice Rabin-Karp takes about 7n work.
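Assembling those pieces, a sketch of the hash-based search might look
like this. It keeps the verification step (the Las-Vegas version); Q and
rabinKarpSearch are illustrative choices, and the text is assumed to be
7-bit ASCII so the char arithmetic stays nonnegative.

    #include <string.h>
    #define Q 9999991LL // a prime near 10^7, as in the example above

    // returns index in t where p is found, or -1
    int rabinKarpSearch(char *t, char *p) {
        const int n = strlen(t);
        const int m = strlen(p);
        if (m > n) return(-1);
        long long pHash = 0, tHash = 0;
        long long highPower = 1; // will hold 2^(m-1) mod Q
        for (int j = 0; j < m; j += 1) { // initial hashes, by Horner's rule
            pHash = (2*pHash + p[j]) % Q;
            tHash = (2*tHash + t[j]) % Q;
            if (j > 0) highPower = 2*highPower % Q;
        }
        int tIndex = 0;
        while (tIndex+m <= n) { // there is still room to find p
            if (tHash == pHash && strncmp(t+tIndex, p, m) == 0)
                return(tIndex); // verified: no false positive
            // slide the window: drop t[tIndex], bring in t[tIndex+m]
            tHash = (t[tIndex+m]
                     + 2*(tHash - highPower*t[tIndex] % Q + Q)) % Q;
            tIndex += 1;
        } // there is still room to find p
        return(-1); // failure
    } // rabinKarpSearch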
76 Text search — Knuth–Morris–Pratt

• Donald Knuth, James Morris, Vaughan Pratt, 1970-1977.
• Consider t = Tweedledee and Tweedledum, p = Tweedledum.
• After running the inner loop of brute-force search to the u in p, we
have learned much about t, enough to realize that none of the letters
up to that point in t (except the first) are T. So the next place to start
a match in t is not position 1, but position 8.

• Consider t = pappappappar, p = pappar.
• After running the inner loop of brute-force search to the r in p, we
have learned much about t, enough to realize that the first place in t
that can match p starts not at position 1, but rather in position 3 (the
third p). Moving p to that position lets us continue in the middle of
p, never retreating in t at all.
• How much to shift p depends on how much of it matches when we
encounter a mismatch in the inner loop. This shift table describes
the first example.

    p          T  w  e  e  d  l  e  d  u  m
    k      -1  0  1  2  3  4  5  6  7  8  9
    shift   1  1  2  3  4  5  6  7  8  9  10

• If our match fails at p[8], use shift[7]=8 to reposition the pattern.
• Here is the shift table for the second example.

    p          p  a  p  p  a  r
    k      -1  0  1  2  3  4  5
    shift   1  1  2  2  3  3  6

• Try matching that p against t = pappappapparrassanuaragh.

    int KMPSearch(char *t, char *p) {
        const int n = strlen(t);
        const int m = strlen(p);
        int tIndex = 0;
        int pIndex = 0;
        char shiftTable[m]; // conceptually indexed from -1; an
            // implementation would offset the indices by 1
        computeShiftTable(p, shiftTable);
        while (tIndex+m <= n) { // there is still room to find p
            while (t[tIndex+pIndex] == p[pIndex]) { // enlarge match
                pIndex += 1;
                if (pIndex >= m) return(tIndex);
            } // enlarge match
            const int shiftAmount = shiftTable[pIndex - 1];
            tIndex += shiftAmount;
            pIndex = max(0, pIndex-shiftAmount);
        } // there is still room to find p
        return(-1); // failure
    } // KMPSearch

• Unfortunately, computing the shift table, although O(m), is not
straightforward, so we omit it.
• The overall cost is guaranteed O(n + m), but m < n, so O(n). In
practice, it makes about 11n comparisons.

77 Text search — Boyer–Moore simple

• Robert S. Boyer, J. Strother Moore (1977)
• We start by modifying bruteSearch to search from the end of p
backwards.

    int backwardSearch(char *t, char *p) {
        const int n = strlen(t);
        const int m = strlen(p);
        int tIndex = 0;
        while (tIndex+m <= n) { // there is still room to find p
            int pIndex = m-1;
            while (t[tIndex+pIndex] == p[pIndex]) { // enlarge match
                pIndex -= 1;
                if (pIndex < 0) return(tIndex);
            } // enlarge match
            tIndex += 1;
        } // there is still room to find p
        return(-1); // failure
    } // backwardSearch

• Occurrence heuristic: At a mismatch, say at letter α in t, shift p to
align the rightmost occurrence of α in p with that α in the text. But
don't move p to the left. If α does not occur at all in p, move p to one
position after α.
• Method: Initialize a location array for p:

    int location[256];
    // location[c] is the last position in p holding char c

    void initLocation(char *p) {
        const int m = strlen(p);
        for (int charVal = 0; charVal < 256; charVal += 1) {
            location[charVal] = -1;
        }
        for (int pIndex = 0; pIndex < m; pIndex += 1) {
            location[p[pIndex]] = pIndex;
        }
    } // initLocation

• Let α be the failure character, which is found at a particular pIndex
and tIndex.
• Slide p: tIndex += max(1, pIndex - location[α])
• This formula works in all cases.
  • α not in p and pIndex = m-1 ⇒ a full shift: tIndex += m
  • α not in p and pIndex = j ⇒ a partial shift, larger if we haven't
    travelled far along p: tIndex += pIndex + 1
  • α is in p. We shift enough to align the rightmost α of p with the
    one we failed on, or at least shift right by 1.
• Examples
  • p = rum, t = conundrum. We shift p by 3, another 3, and find the
    match.
  • p = drum, t = conundrum. We shift p by 1, by 4, and find the
    match.
  • p = natu, t = conundrum. We shift p by 2, then fail.
  • p = date, t = detective. We would shift p left, so we just shift
    right 1, then 4, then fail.
• Class 26, 4/27/2021
• Match heuristic: Use a shift table (organized for right-to-left search)
as with the Knuth–Morris–Pratt algorithm.
• Use both the occurrence and the match heuristics, and shift by the
larger of the two suggestions.
• Horspool's version (Nigel Horspool, 1980): on a mismatch, look at
β, which is the element in t where we started matching, that is, β =
t[i+m−1]. Shift so that β in t aligns with the rightmost occurrence of β
in p (not counting p[m−1]).
• This method always shifts p to the right.
• We need to precompute for each letter of the alphabet where its
rightmost occurrence in p is, not counting p[m−1]. In particular:
  • shift[β] = if β occurs in p[0..m−2] then m − 1 − max{j | j < m−1, p[j] = β},
    else m.
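A sketch of that precomputation, assuming 8-bit characters; hShift and
initHorspool are illustrative names in the style of initLocation above.

    #include <string.h>

    int hShift[256]; // hShift[beta]: how far to slide p when
                     // beta = t[tIndex+m-1] causes a mismatch

    void initHorspool(char *p) {
        const int m = strlen(p);
        for (int charVal = 0; charVal < 256; charVal += 1) {
            hShift[charVal] = m; // beta does not occur in p[0..m-2]
        }
        for (int pIndex = 0; pIndex < m-1; pIndex += 1) {
            hShift[(unsigned char) p[pIndex]] = m - 1 - pIndex; // rightmost wins
        }
    } // initHorspool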
78 Advanced pattern matching, as in Perl

• Based on regular expressions; can be compiled into finite-state
automata.
• exact: conundrum
• don't-care symbols: con.ndr..
• character classes: c[ou1-5]nundrum
• alternation: c(o|u)nund(rum|ite)
• repetition:
  • c(on)*und
  • c(on)+und
  • c(on){4,5}und
• predefined character classes: c\wnundrum\d\W
• Unicode character classes:
  c\p{ASCII}nundrum\p{digit}\p{Final_Punctuation}
• pseudo-characters: ^conundrum$
• Beyond regular expressions in Perl
  • Reference to "capture groups": con(un|an)dr\1m
  • Zero-width assertions: (?=conundrum)

79 Edit distance

• How much do we need to change s (source) to make it look like d
(destination)?

• Charge 1 for each replacement (R), deletion (D), insertion (I).
• Example: ghost →(D) host →(I) houst →(R) house
• The edit distance(s, d) is the smallest number of operations to
transform s to d.
• We can build an edit-distance table d by this rule:
  d[i,j] = min(d[i−1,j] + 1, d[i,j−1] + 1, d[i−1,j−1] + (if s[i] = d[j] then 0 else 1)).
• Example: peseta → presto (should get distance 3).

            -1   0   1   2   3   4   5
                 p   e   s   e   t   a
    -1       0   1   2   3   4   5   6
     0  p    1   0   1   2   3   4   5
     1  r    2   1   1   2   3   4   5
     2  e    3   2   1   2   2   3   4
     3  s    4   3   2   1   2   3   4
     4  t    5   4   3   2   2   2   3
     5  o    6   5   4   3   3   3   3

• We can trace back from the last cell to see exactly how to navigate to
the start cell: pick any smallest neighbor to left/above.
  • ↓: delete a character from the source (left string)
  • →: insert a character from the destination (top string)
  • ↘: keep the same character (if the number is the same) or replace
    a character in the source (left string) with one from the destination
    (top string).
• complexity: O(nm) to calculate the array; the preprocessing is just to
start up the array, of cost O(n + m).
• Another example: convert banana to antenna. It should take only
4 edits.
• Class 27, 4/29/2021
• This algorithm is in the dynamic programming category. Pascal's
triangle is another, as is finding the rectangle in an array with the
largest sum of values (some negative).
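As a sketch of the table-filling rule above in code, assuming the strings
are short enough for a stack-allocated table; editDistance and min3 are
illustrative names, not from the notes.

    #include <string.h>

    int min3(int a, int b, int c) {
        int answer = a;
        if (b < answer) answer = b;
        if (c < answer) answer = c;
        return answer;
    } // min3

    int editDistance(char *s, char *d) {
        const int n = strlen(s);
        const int m = strlen(d);
        int table[n+1][m+1]; // table[i][j]: distance s[0..i-1] -> d[0..j-1]
        for (int i = 0; i <= n; i += 1) table[i][0] = i; // i deletions
        for (int j = 0; j <= m; j += 1) table[0][j] = j; // j insertions
        for (int i = 1; i <= n; i += 1) {
            for (int j = 1; j <= m; j += 1) {
                table[i][j] = min3(
                    table[i-1][j] + 1, // delete s[i-1]
                    table[i][j-1] + 1, // insert d[j-1]
                    table[i-1][j-1] + (s[i-1] == d[j-1] ? 0 : 1)); // keep/replace
            }
        }
        return table[n][m];
    } // editDistance

Checking it against the examples above, editDistance("peseta", "presto")
yields 3 and editDistance("banana", "antenna") yields 4.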
80 Categories of algorithms

• Divide and conquer
• Greedy
• Dynamic programming
• Search

81 Divide and conquer algorithms

• steps
  • if the problem size n is trivial, do it.
  • divide the problem into a easier problems of size n/b.
  • do the a easier problems.
  • combine the answers.
• We can usually compute the complexity by the Recursion Theorem
(page 17).
  • cost: n^k for splitting and recombining, so C_n = n^k + a·C_{n/b}.
• Select the jth smallest element of an array: a = 1, b = 2, k = 1 ⇒ O(n).
• Quicksort: a = 2, b = 2, k = 1 ⇒ O(n log n).
• Binary search, search in a binary tree: a = 1, b = 2, k = 0 ⇒ O(log n).
• Multiplication (Karatsuba): a = 3, b = 2, k = 1 ⇒ O(n^(log_2 3)).
• Tile an n × n board that is missing a single cell by a trimino: a = 4,
b = 4, k = 0 ⇒ O(n).
• Mergesort:

    void mergeSort(int array[], int lowIndex, int highIndex) {
        // sort array[lowIndex] .. array[highIndex]
        if (highIndex - lowIndex < 1) return; // width 0 or 1
        int mid = (lowIndex+highIndex)/2;
        mergeSort(array, lowIndex, mid);
        mergeSort(array, mid+1, highIndex);
        merge(array, lowIndex, highIndex);
    } // mergeSort

  a = 2, b = 2, k = 1 ⇒ O(n log n).

82 Greedy algorithms

General rule: Enlarge the current solution by selecting (usually in a simple
way) the best single-step improvement.

• Computing the coins for change: greedily apply the biggest available
coin first.
  • not always optimal: consider denominations 1, 6, 10, and we
    wish to construct 12: greedy takes 10, 1, 1, but 6, 6 uses only
    two coins.
  • Denominations 1, 5, 10 guarantee optimality.
  • Power-of-two coins would be very nice: no more than 1 of each
    needed for change. British measures follow this rule: fluid ounce
    : tablespoon : quarter-gill : half-gill : gill : cup : pint : quart :
    half gallon : gallon
  • Similar problem: putting weights on barbells.
• Kruskal's algorithm for computing a minimum-cost spanning tree:
greedily add edges of increasing weight, avoiding cycles.
• Prim's algorithm for computing a minimum-cost spanning tree:
greedily enlarge the current spanning tree with the shortest edge
leading out.
• Dijkstra's algorithm for all shortest paths from a source: greedily
pick the cheapest extension of all paths so far.
• Huffman codes for data compression
  • Start with a table of frequencies, like this one:

        space  60
        A      22
        O      16
        R      13
        S       6
        T       4

    A text containing all these characters in the given frequencies
    would take 60 + 22 + 16 + 13 + 6 + 4 = 121 7-bit units, or 847 bits.
  • Build a table of codes, like this one:

        space  0
        A      111
        O      110
        R      101
        S      1001
        T      1000

    The same text now uses 60·1 + 22·3 + 16·3 + 13·3 + 6·4 + 4·4 = 253 bits.
  • To decode: follow a tree. (Figure: the decoding tree; the 0 edge
    from the root leads to space, and the subtree under the 1 edge
    holds A, O, R, S, and T at its leaves.)
  • To build the tree
    • Each character is a node.
    • Greedily take the two least common nodes, combine them as
      children of a new parent, and label that parent with the
      combined frequency of the two children.
• Adding a million real numbers, all in the range 0 … 1, losing minimal
precision
  • Remove the two smallest numbers from the set. (This step is
    greedy: take the numbers whose sum can be computed with the
    least precision loss.)
  • Insert their sum in the set.
  • Use a heap to represent the set.
• Continuous knapsack problem
  • Given a set of n objects x_i, each with a weight w_i and profit p_i,
    and a total weight capacity C, select objects (to put in a knapsack)
    that together weigh ≤ C and maximize profit. We are allowed to
    take fractions of an object.

• Greedy method • Breadth-first search: use a queue, iterative, avoiding vertices


• Start with an empty knapsack. already visited, perhaps with back-pointers
• Sort the objects in decreasing order of pi wi . • Shortest path between nodes i and j: use a priority queue (heap)
• Greedy step: Take all of each object in the list, if it fits. If it sorted by distance from i.
fits partially, take a fraction of the object, then done. • Topological sort: recursive; build list as the last step.
• Stop when the knapsack is full. • Dijkstra’s algorithm: Finding all shortest paths from given node
• Example. Greedy: extend currently shortest path
chapter pages (weight) importance (profit) ratio • Prim’s algorithm for spanning trees: Greedy, repeatedly add
1 120 5 .0417 shortest outgoing edge
2 150 5 .0333 • Kruskal’s algorithm for spanning trees: Greedy, repeatedly add
3 200 4 .0200 shortest edge that does not build a cycle.
4 150 8 .0533
• Cycle detection: Union-find: All vertices in a component point
5 140 3 .0214
(possibly indirectly) to a representative; union joins representa-
sorted: 4, 1, 2, 5, 3. If capacity C = 600, take all of 4, 1, 2, 5, and
tives.
40/200 of 3.
• This greedy algorithm happens to be optimal. • Numerical algorithms
• 0/1 knapsack problem: Same as before, but no fractions are allowed. • Euclidean algorithm for greatest common divisor (GCD): Re-
The greedy algorithm is still fast, but it is not guaranteed optimal. peatedly take modulus.
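A sketch of the greedy method for the continuous version, assuming the
objects arrive already sorted by decreasing p_i/w_i; the struct and the
function name are illustrative, not from the notes.

    typedef struct {
        double weight, profit;
    } object_t;

    // objects[] sorted by decreasing profit/weight; returns the profit
    double fillKnapsack(object_t objects[], int n, double capacity) {
        double profit = 0.0;
        for (int i = 0; i < n && capacity > 0; i += 1) {
            if (objects[i].weight <= capacity) { // take all of it
                capacity -= objects[i].weight;
                profit += objects[i].profit;
            } else { // take a fraction, then done
                profit += objects[i].profit * (capacity / objects[i].weight);
                capacity = 0;
            }
        }
        return profit;
    } // fillKnapsack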
83 Dynamic programming

General rule: Solve all smaller problems and use their solutions to
compute the solution to the next problem.

• Compute Fibonacci numbers: f_i = f_{i−1} + f_{i−2}.
• Compute binomial coefficients: C(n, i) = C(n−1, i−1) + C(n−1, i).
• Compute minimal edit distance.
84 Summary of algorithms covered

• Class 28, 5/4/2021
• Graphs
  • Computing the degree of all vertices: loop over the representation.
  • Computing the connected component containing node i in an
    undirected graph: depth-first search, recursive, avoiding vertices
    already visited.
  • Breadth-first search: use a queue, iterative, avoiding vertices
    already visited, perhaps with back-pointers.
  • Shortest path between nodes i and j: use a priority queue (heap)
    sorted by distance from i.
  • Topological sort: recursive; build the list as the last step.
  • Dijkstra's algorithm: finding all shortest paths from a given node.
    Greedy: extend the currently shortest path.
  • Prim's algorithm for spanning trees: greedy, repeatedly add the
    shortest outgoing edge.
  • Kruskal's algorithm for spanning trees: greedy, repeatedly add
    the shortest edge that does not build a cycle.
  • Cycle detection: union-find: all vertices in a component point
    (possibly indirectly) to a representative; union joins
    representatives.
• Numerical algorithms
  • Euclidean algorithm for greatest common divisor (GCD):
    repeatedly take the modulus.
  • Fast exponentiation: represent the exponent in binary to guide
    the steps.
  • Integer multiplication (Karatsuba): subdivide a problem of size
    n × n into three problems of size n/2 × n/2.
• Strings and pattern matching — text search
  • Brute-force algorithm: try each position for the pattern p.
  • Rabin-Karp: hash-based pre-check each time p moves over.
  • Knuth–Morris–Pratt: precomputed shift table tells how far to
    move p on a mismatch.
  • Boyer–Moore simple: match starting at the end of p; can jump
    great distances.
  • Edit distance: dynamic programming, finding the edit distance
    of several subproblems to guide the next subproblem.
• Miscellaneous
  • Tiling (divide and conquer)

  • Mergesort (divide and conquer)
  • Computing Fibonacci numbers (dynamic programming)
  • Computing binomial coefficients (dynamic programming)
  • Continuous knapsack problem (greedy)
  • Coin changing (greedy)
  • Huffman codes (greedy)

85 Tractability

• Formal definition of O: f(n) = O(g(n)) iff for adequately large n
and some constant c, we have f(n) ≤ c · g(n). That is, f is bounded
above by some multiple of g. We can say that f grows no faster than
g.
• Formal definition of Θ: f(n) = Θ(g(n)) iff for adequately large n, and
some constants c_1, c_2, we have c_1 · g(n) ≤ f(n) ≤ c_2 · g(n). We say
that f grows as fast as g.
• Formal definition of Ω is similar; f(n) = Ω(g(n)) means that f grows
at least as fast as g.
• We usually say that a problem is tractable if we can solve it in
polynomial time (with respect to the problem size n). We also say the
program is efficient.
  • constant time: O(1)
  • logarithmic time: O(log n)
  • linear time: O(n)
  • sub-quadratic: O(n log n) (for instance)
  • quadratic time: O(n^2)
  • cubic time: O(n^3)
  • These are all bounded by O(n^k) for some fixed k.
  • However, if k is large, even tractable problems can be infeasible
    to solve. In practice, algorithms seldom have k > 3.
• There are many algorithms that take more than polynomial time.
  • exponential: O(2^n)
  • super-exponential: O(n!) (for example)
  • O(n^n)
  • O(2^(2^n))

86 Decision problems, function problems, P, NP

• Decision problems: The answer is just "yes" or "no".
  • Primality: is n prime? (There are very fast probabilistic
    algorithms, and recently a polynomial algorithm.)
  • Is there a path from a to b shorter than 10?
  • Are two graphs G and F isomorphic? (Apparently very hard)
  • Can graph G be colored with 3 colors? (Apparently very hard)
• Function problems: the answer is a number.
  • What is the smallest prime divisor of n?
  • What is the weight of a minimum-weight spanning tree of G?
• We use P to refer to the set of decision problems that can be decided
in polynomial time. That is, for all problems p ∈ P, there must be
an algorithm and a positive number k such that the time of the
algorithm for p on input x is O(|x|^k), where |x| means the size of x.
• We use NP to refer to the set of decision problems that can be decided
in polynomial time if we are allowed to guess a witness to a "yes"
answer and only need to check it.
  • Is there a path from a to b shorter than 10? Guess the path, find
    its length. O(1).
  • Are two graphs G and F isomorphic? Guess the isomorphism,
    then check, requiring O(v + e).
  • Can graph G be colored with 3 colors? Guess the coloring, then
    demonstrate that it is right; O(v + e).
  • Is there a set of Boolean values for variables x_1, …, x_n that
    satisfies a given Boolean formula (using "and", "or", and "not")?
    Guess the values, check in linear time (in the length of the
    formula).
• Properties of P and NP (and EXP, decision problems that can be
solved in O(k^n)):
  • P ⊆ NP ⊆ EXP
  • P ⊂ EXP
  • If a problem in NP has g possible witnesses, then it has an
    algorithm in O(g · n).
  • Some problems can be proved to be "hardest" in NP; they are
    called NP-complete problems. All other problems in NP can be
    reduced to such NP-complete problems.
  • Nobody knows, but people suspect that P ⊂ NP ⊂ EXP.
