CODING INTERVIEW COMPLETE: DATA STRUCTURE & ALGORITHM

Topics covered:

1. Data structures
2. Trees and Graph algorithms
3. Dynamic Programming
4. Recursive algorithms
5. Scheduling algorithms (Greedy)
6. Caching
7. Sorting
8. Files
9. Computability
10. Bitwise operators
11. System design

As a Machine Learning TA, I would think that they might ask me several questions about machine learning. I should also prepare something about that. Operating systems is another area where you'll get asked questions, or where you may answer questions from that angle, which can potentially impress your interviewer.

Google likes to ask follow-up questions that scale up the size of the input. So it may be helpful to prepare some knowledge on Big Data, distributed systems, and databases. The context of any algorithm problem can involve sets, arrays, hashtables, strings, etc. It will be useful to know several well-known algorithms for those contexts, such as the KMP algorithm for substrings. In this document, I will also summarize my past projects, the most difficult bugs, and other things that might get asked. When I fly to Mountain View, this is the only document I will bring with me. I believe that it is powerful enough.

Contents

1 Ground
2 Knowledge Review
  2.1 Data structures
    2.1.1 Array
    2.1.2 Tuple
    2.1.3 Union
    2.1.4 Tagged union
    2.1.5 Dictionary
    2.1.6 Multimap
    2.1.7 Set
    2.1.8 Bag
    2.1.9 Stack
    2.1.10 Queue
    2.1.11 Priority queue
    2.1.12 List
    2.1.13 Heap
    2.1.14 Graph
    2.1.15 Tree
    2.1.16 Union-Find
  2.2 Trees and Graph algorithms
    2.2.1 BFS and DFS
    2.2.2 Topological Sort
    2.2.3 Paths
    2.2.4 Minimum Spanning Tree
    2.2.5 Max-flow Min-cut
  2.3 Dynamic Programming
    2.3.1 One example problem involving a 2-D table
    2.3.2 Well-known problems solved by DP
    2.3.3 Top-down dynamic programming
  2.4 Recursive algorithms
    2.4.1 Divide and Conquer
    2.4.2 Backtracking
  2.5 Greedy algorithms
  2.6 Sorting
    2.6.1 Merge sort
    2.6.2 Quicksort
    2.6.3 Bucket sort
    2.6.4 Radix sort
  2.7 Searching
    2.7.1 Quickselect
  2.8 String
    2.8.1 Regular expressions
    2.8.2 Knuth-Morris-Pratt (KMP) Algorithm
    2.8.3 Suffix/Prefix Tree
    2.8.4 Permutation
  2.9 Caching
    2.9.1 Cache Concepts Review
    2.9.2 LRU Cache
  2.10 Game Theory
    2.10.1 Minimax and Alpha-beta
    2.10.2 Markov Decision Process
    2.10.3 Hidden Markov Models
    2.10.4 Baysian Models
  2.11 Computability
    2.11.1 Countability
    2.11.2 The Halting Problem
    2.11.3 Turing Machine
    2.11.4 P-NP
  2.12 Bitwise operators
    2.12.1 Facts and Tricks
  2.13 Math
    2.13.1 GCDs and Modulo
    2.13.2 Prime numbers
    2.13.3 Palindromes
    2.13.4 Combination and Permutation
    2.13.5 Series
  2.14 Concurrency
    2.14.1 Threads & Processes
    2.14.2 Locks
  2.15 System design
    2.15.1 Specification
    2.15.2 Subtyping and Subclasses
    2.15.3 Design Patterns
    2.15.4 Architecture
    2.15.5 Testing
3 Flagship Problems
  3.1 Arrays
  3.2 Strings
  3.3 Permutation
  3.4 Trees
  3.5 Graphs
  3.6 Divide and Conquer
  3.7 Dynamic Programming
  3.8 Miscellaneous
  3.9 Unsolved
4 Behavioral
  4.1 Standard
    4.1.1 introduce yourself
    4.1.2 talk about your last internship
    4.1.3 talk about your current research
    4.1.4 talk about your projects
    4.1.5 why Google?
  4.2 Favorites
    4.2.1 project?
    4.2.2 class?
    4.2.3 language?
    4.2.4 thing about Google?
    4.2.5 machine learning technique?
  4.3 Most difficult
    4.3.1 bug?
    4.3.2 design decision in your project?
    4.3.3 teamwork issue?
    4.3.4 failure?
    4.3.5 interview problem you prepared?
5 Appendix
  5.1 Java Implementation of Trie
  5.2 Python Implementation of the KMP algorithm
  5.3 Python Implementation of Union-Find
2 Knowledge Review

2.1 Data structures

2.1.1 Array

An array is used to describe a collection of elements, where each element is identified by an index that can be computed at run time by the program. I am familiar with this, so no need for more information.

Bit array: A bit array is an array where each element is either 0 or 1. People use bit arrays to leverage parallelism in hardware. Implementations of bit arrays typically use an array of integers, where all the bits of an integer are used as elements of the bit array. With such an implementation, if we want to retrieve the bit with index k in the array, it is the bit with index k%32 in the int with index k/32.

An interesting use case of bit arrays is the bitmap in the file system. Each bit in the bitmap maps to a block. To retrieve the address of the block, we just do BLKSTART + k/32/BLKSIZE.

Circular buffer: A circular buffer is a single, fixed-size buffer used as if it is connected end-to-end. It is useful as a FIFO buffer, because we do not need to shift every element back when the first-inserted one is consumed. A non-circular buffer is suited as a LIFO buffer.

2.1.2 Tuple

A tuple is a finite ordered list of elements. In mathematics, an n-tuple is an ordered list of n elements, where n is a non-negative integer. A tuple may contain multiple instances of the same element. Two tuples are equal if and only if every element in one tuple equals the element at the corresponding index in the other tuple.

2.1.3 Union

In computer science, a union is a value that may have any of several representations or formats within the same position in memory; or it is a data structure that consists of a variable that may hold such a value. Think about the union data type in C, which essentially allows you to store different data types in the same memory location.

2.1.4 Tagged union

A tagged union, also called a disjoint union, is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time, and a tag field explicitly indicates which one is in use. It can be thought of as a type that has several "cases", each of which should be handled correctly when that type is manipulated. Like ordinary unions, tagged unions can save storage by overlapping the storage areas for each type, since only one is in use at a time.

Mathematically, tagged unions correspond to disjoint or discriminated unions, usually written using +. Given an element of a disjoint union A + B, it is possible to determine whether it came from A or B. If an element lies in both, there will be two effectively distinct copies of the value in A + B, one from A and one from B.

This data structure is not even covered in my computer science education. I don't expect any problems about it. Good to know.
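To make the tag-plus-cases idea concrete, here is a minimal Python sketch (the Shape example and all its names are hypothetical illustrations, not from the original text):

    class Shape:
        # A tagged union with two cases; `tag` records which case is in use.
        CIRCLE, RECT = "circle", "rect"

        def __init__(self, tag, value):
            self.tag = tag      # which case is active
            self.value = value  # radius for CIRCLE, (width, height) for RECT

    def area(s):
        # Every manipulation of the union must dispatch on the tag.
        if s.tag == Shape.CIRCLE:
            return 3.141592653589793 * s.value * s.value
        elif s.tag == Shape.RECT:
            w, h = s.value
            return w * h
        raise ValueError("unknown tag")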
2.1.5 Dictionary

A dictionary, also called a map or associative array, is a collection of (key, value) pairs, such that each possible key appears at most once in the collection. There are numerous ways to implement a dictionary. This is basically a review of CSE 332.

Hash table: At the bottom level, a hash table is a list T of buckets. We want the size of this list, |T|, to be a prime number, to fix sparseness of the list, which can lower the number of collisions. In order to store a key-value pair into T, we need a hash function to map the likely non-integer key to an integer. This hash function should ideally have these properties:

1. Uniform distribution of outputs.
2. Low computational cost.

As more elements are inserted into the hash table, there will likely be collisions. A collision is when two distinct keys map to the same bucket in the list T. Here are several common collision resolution strategies:

1. Separate Chaining: If we hash multiple items to the same bucket, store a LinkedList of those items at that bucket. Worst case insert and delete is O(n). Average is O(1). Separate chaining is easy to understand and implement, but requires a lot more memory allocation.

2. Open Addressing: Choose a different bucket when the natural choice (the one computed by the hash function) is full. Techniques include linear probing, quadratic probing, and double hashing. The optimal open addressing technique allows (1) duplication of the path we took, (2) coverage of all spaces in the table, and (3) avoiding putting lots of keys close together. The reasons to use open addressing could be less memory allocation and easier data representation. I found that open addressing seems to be the preferred way, used by major languages such as Python. So it is worth understanding how those probing techniques work.

Linear probing: This method is a naive one. It finds the very next free bucket relative to the natural choice bucket. Formula: (h(key) + i) % |T|, where h is the hash function. When we delete an element from T, we must use lazy deletion, i.e. mark that element as deleted without actually removing it. Otherwise, we won't be able to retrace the insertion path. Linear probing can cause primary clustering, which happens when different keys collide to form one big group.

Quadratic probing: Similar to linear probing, except that we use a different formula to deal with collisions: (h(key) + i^2) % |T|. Theory shows that if the load factor is less than 1/2, quadratic probing will find an empty slot in at most |T|/2 probes (no failure of insertion). Quadratic probing causes secondary clustering, which happens when different keys hash to the same place and follow the same probing sequence.

Double hashing: When there is a collision, simply apply a second, independent hash function g to the key: (h(key) + i*g(key)) % |T|. With a careful choice of g, we can avoid the infinite loop problem similar to quadratic probing. An example is h(key) = key % p, g(key) = q - (key % q), for primes p, q with q < p.

[...]

2.1.12 List

[figure: a singly linked list]

It is important to understand the trade-offs of linked lists:

1. Indexing: O(n).
2. Insert/delete at both ends: O(1).
3. Insert/delete in the middle: search time + O(1). (No need to shift elements.)

A linked list has wasted space of O(n), because of the extra storage of the references.

The advantage of a singly linked list over others is that some operations, such as merging two lists or enumerating elements in reverse order, have very simple recursive algorithms, compared to iterative ones; see the sketch below. For other lists, these algorithms have to include extra arguments and base cases. Additionally, linear singly linked lists allow tail-sharing, which is using a common terminal sublist for two different lists. This is not okay for a doubly linked list or circular linked list, because a node cannot belong to more than one list in those cases.
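As an illustration of how naturally singly linked lists recurse, here is a hedged Python sketch of merging two sorted lists (the Node class is a minimal assumption, not from the original):

    class Node:
        def __init__(self, value, next=None):
            self.value = value
            self.next = next

    def merge(a, b):
        # Recursively merge two sorted singly linked lists into one.
        if a is None:
            return b
        if b is None:
            return a
        if a.value <= b.value:
            a.next = merge(a.next, b)
            return a
        else:
            b.next = merge(a, b.next)
            return b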
Doubly linked list: A doubly linked list differs from a singly linked list in that each node has two references, one pointing to the next node, and one pointing to the previous node.

[figure: a doubly linked list with nodes 12 and 99]

The convenience of a doubly linked list is that it allows traversal of the list in either direction. In operating systems, doubly linked lists are used to maintain active processes, threads, etc.

There is a classic problem: convert a given binary tree to a doubly linked list (or the other way around). This problem will be discussed later.

Unrolled linked list: An unrolled linked list differs from a linked list in that each node stores multiple elements. It is useful for increasing cache performance while decreasing the memory overhead associated with storing list metadata (e.g. references). It is related to the B-tree. A typical node looks like this:

    record node {
        node next
        int numElements  // number of elements in this node, up to some max limit
        array elements
    }

XOR linked list: An XOR linked list takes advantage of the bitwise XOR operation to decrease the storage requirements of doubly linked lists. An ordinary doubly linked list requires two references in a node, one for the previous node, one for the next node. An XOR linked list uses one address field to compress the two references, by storing the bitwise XOR of the address of the previous node and the address of the next node.

Example: We have XOR linked list nodes A, B, C, D, in order. So node B has a field A^C; node C has a field B^D, etc. When we traverse the list, if we are at C, we can obtain the address of D by XORing the address of B with the reference field of C, i.e. B^(B^D) = (B^B)^D = 0^D = D. For the traversal to work, we can store B's address alone in A's field, and store C's address alone in D's field, and we have to mark A as start and D as end. This is because given an arbitrary middle node of an XOR linked list, one cannot tell the next or previous addresses of that node.

Advantages: Obviously it saves a lot of space. Disadvantages: General-purpose debugging tools cannot follow the XOR chain. Most garbage collection schemes do not work with data structures that do not contain literal pointers. Besides, while traversing the list, you have to remember the address of the previous node in order to figure out the next one. Also, an XOR linked list does not have all the features of a doubly linked list, e.g. the ability to delete a node knowing only its address.
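Python has no raw pointers, but the XOR traversal above can be sketched by simulating memory with a table from integer "addresses" to (value, link) pairs (all names here are illustrative assumptions):

    # memory: address -> (value, link), where link = prev_addr XOR next_addr.
    # List A(1) <-> B(2) <-> C(3) <-> D(4); address 0 plays the role of NULL.
    memory = {
        1: ("A", 0 ^ 2),   # start node: stores the next address alone
        2: ("B", 1 ^ 3),
        3: ("C", 2 ^ 4),
        4: ("D", 3 ^ 0),   # end node: stores the previous address alone
    }

    def traverse(memory, start):
        # Walk forward, remembering the previous address to decode each link.
        prev, cur = 0, start
        while cur != 0:
            value, link = memory[cur]
            print(value)
            prev, cur = cur, link ^ prev   # next = link XOR prev

    traverse(memory, 1)   # prints A B C D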
Self-organizing list: From Wikipedia: A self-organizing list is a list that reorders its elements based on some self-organizing heuristic to improve average access time. The aim of a self-organizing list is to improve the efficiency of linear search by moving more frequently accessed items towards the head of the list. A self-organizing list achieves near constant time for element access in the best case. A self-organizing list uses a reorganizing algorithm to adapt to various query distributions at runtime. Self-organizing lists can be used in compilers (even for code on embedded systems) to maintain symbol tables during compilation of program source code. Some techniques for rearranging nodes:

Move to Front (MTF) Method: Moves the accessed element to the front of the list. Pros: easy to implement, no extra memory. Cons: may prioritize infrequently used nodes.

Count Method: Keep a count of the number of times each node is accessed. Then, nodes are rearranged in order of decreasing count. Pros: realistic in representing the actual access pattern. Cons: extra memory; unable to quickly adapt to rapid changes in access patterns.

Skip list: A skip list is a probabilistic data structure that allows fast search within an ordered sequence of elements. Fast search is made possible by maintaining a linked hierarchy of subsequences, where each subsequence skips over fewer elements than the previous one.

[figure: a skip list with several layers of express lanes]

A skip list is built in layers. The bottom layer is an ordinary linked list. Each higher layer is an "express lane" for the lists below, where an element in layer i appears in layer i+1 with some fixed probability p.

This seems fancy. How are these express lanes used in searching? How are skip lists used? A search for a target starts at the head element of the top layer list, and it proceeds horizontally until the current element is greater than or equal to the target. If equal, the target is found. If greater, the search returns to the previous element, and drops down vertically to the list at the lower layer. The expected run time is O(log n). Skip lists can be used to maintain some, e.g. key-value, structure in databases.

People compare skip lists with balanced trees. Skip lists have the same asymptotic expected time bounds as balanced trees, and they are simpler to implement, and use less space. The average times for search, insert, and delete are all O(log n). Worst case O(n).

2.1.13 Heap

Minimum-heap property: All children are larger.

Binary heap: One more property of a binary heap is that the tree has no gaps. Implementation using an array: parent(n) = (n-1) / 2; leftChild(n) = 2n + 1; rightChild(n) = 2n + 2. Floyd's build-heap algorithm takes O(n). There are several variations of binary heap. Insert, deleteMin, and decreaseKey operations take O(log n) time. Merge takes O(n) time.

Fibonacci heap: This kind of heap is not a single binary-tree-shaped structure like (conventional) binary heaps. Instead, it is a collection of trees satisfying the minimum-heap property. This implies that the minimum key is always at the root of one of the trees. The tree structures can be more flexible - they can have gaps. Insert, decrease-key, and merge all have amortized constant run time. Delete-min is O(log n). The implementation is kind of complex. Visualization: https://www.cs.usfca.edu/~galles/visualization/FibonacciHeap.html
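A minimal Python sketch of the array-backed binary heap indexing described above (insert with percolate-up only; deleteMin is omitted for brevity):

    class MinHeap:
        def __init__(self):
            self.a = []

        def insert(self, x):
            # Append at the first gap-free position, then percolate up
            # while the parent (index (n-1)//2) is larger.
            self.a.append(x)
            n = len(self.a) - 1
            while n > 0 and self.a[(n - 1) // 2] > self.a[n]:
                self.a[(n - 1) // 2], self.a[n] = self.a[n], self.a[(n - 1) // 2]
                n = (n - 1) // 2

        def peek_min(self):
            return self.a[0]   # the minimum-heap property puts the min at the root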
2.1.14 Graph

A graph consists of a set of vertices and a set of pairs of these vertices as edges. If these pairs are unordered, then the graph is an undirected graph. If these pairs are ordered pairs, then the graph is a directed graph.

Paths: A path is called simple if all its vertices are distinct from one another. A cycle is a path {v1, v2, ..., v(k-1), vk} in which, for k > 2, the first k-1 nodes are distinct, and v1 = vk. The distance between two nodes u and v is the minimum number of edges in a u-v path.

Connectivity: In an undirected graph, the graph is connected if, for every pair of nodes u and v, there is a path from u to v. In a directed graph, the graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u.

ADT: The following are the typical operations for a graph abstract data type (ADT). During the interview, if you need to use a graph library, you can expect it to have these functions. In Python, there are graph libraries such as python-graph.

    add_vertex(G, v)
    add_edge(G, u, v)
    neighbors(G, v)
    remove_vertex(G, v)
    remove_edge(G, u, v)
    adjacent(G, v, w)

Common representations of a graph are the adjacency list and the adjacency matrix. Wikipedia has a nice explanation and comparison of different representations of a graph ADT. Check it out below.

Adjacency list: Vertices are stored as records or objects, and every vertex stores a list of adjacent vertices. This data structure allows the storage of additional data on the vertices. Additional data can be stored if edges are also stored as objects, in which case each vertex stores its incident edges and each edge stores its incident vertices.

Adjacency matrix: A two-dimensional matrix, in which the rows represent source vertices and the columns represent destination vertices. Data on edges and vertices must be stored externally. Only the cost for one edge can be stored between each pair of vertices.

                        Adjacency list    Adjacency matrix
    Store graph         O(|V| + |E|)      O(|V|^2)
    Add vertex          O(1)              O(|V|^2)
    Add edge            O(1)              O(1)
    Remove vertex       O(|E|)            O(|V|^2)
    Remove edge         O(|E|)            O(1)
    adjacent(G, v, w)   O(|V|)            O(1)

2.1.15 Tree

An undirected graph is a tree if it is connected and does not contain a cycle (acyclic).

Descendant & ancestor: We say that w is a descendant of v if v lies on the path from the root to w. In this case, v is an ancestor of w. There are so many different kinds of trees. Won't discuss them all here.

Trie: A trie is also called a prefix tree. Each edge in a trie represents a character, and the value in each node represents the current prefix, obtained by collecting all characters on the edges when traversing from the root (an empty string) to that node. All the descendants of a node have the same prefix as that node. See 5.1 for my Java implementation of Trie.

A compressed trie is a trie where non-branching paths are compressed into a single edge. See the figure in 2.8.3 as an example.

B-Tree: In computer science, a B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children (Comer 1979, p. 123). For an interview, I doubt that we need to know implementation details for a B-Tree. Know its motivation though.

Motivation: A self-balanced binary search tree (e.g. an AVL tree) is slow when the height of the tree reaches a certain limit such that manipulating nodes requires disk access. In fact, for a large dictionary, most of the data is stored on disk. So we want a self-balancing tree that is even shallower than an AVL tree, to minimize the number of disk accesses, and to exploit the disk block size.
Binary Indexed Tree: A binary indexed tree, also called a Fenwick tree, is a data structure that can efficiently update elements and calculate prefix sums in a table of numbers. I used it for the 2D Range Sum problem (3.8). See my implementation of a 2D binary indexed tree there.

Motivation: Suppose we have an array arr with length n. We want to (1) find the sum of the first k elements, and (2) update the value of the element arr[i], both in O(log n) time.

How it works (watch Tushar Roy's YouTube video for a decent explanation: https://www.youtube.com/watch?v=CWDQJGaNigY&t=13s): The core idea behind the BIT is that every integer can be written as a sum of powers of 2. Each node in a BIT stores the sum of a range [i, j], and with all nodes combined, the BIT covers the entire range of the array. There needs to be a dummy root node, so the size of the BIT is n+1.

Here is how we build up the BIT for an array. First, we initialize an array BIT with size n+1 and all elements set to 0. Then, we iterate through arr. For element i, we do an update of the BIT array as follows:

1. We look at the node BIT[i+1]. We add arr[i] to it, so BIT[i+1] = BIT[i+1] + arr[i].

2. Now, since we changed the range sum of a node, we have to update the values of some other nodes. We can obtain the index m of the next node to update, with respect to node j, using the formula m = j + (j & -j).

We add the value arr[i] to each of the affected nodes, until the computed index m is out of bounds. The run time here is O(log n).

Here is how we use a BIT to compute the prefix sum of the first k elements. Just like before, we find the BIT node with index k+1. We add the value of that node to our sum. Then, we traverse from the node BIT[k+1] back to the root. Each time we go to the parent of the current node, say BIT[j], we compute the index of that parent node p by p = j - (j & -j). Then we add the value of the parent node to the sum, until we reach the root. Return the sum as the result. This process is also O(log n) time. The space of the BIT is O(n).

2.1.16 Union-Find

A union-find data structure, also called a disjoint-set data structure, is a data structure that maintains a set of disjoint subsets (e.g. components of a graph). It supports two operations:

1. Find(u): Given element u, it will return the name of the set that u is in. This can be used for checking if u and v are in the same set. Optimal run time: O(log n).

2. Union(N1, N2): Given disjoint sets N1, N2, this operation will merge the two components into one set. Optimal run time O(1) if we use pointers; if not, it is O(log n).

First, we will discuss an implementation using an implicit list. Assume that all objects can be labeled by numbers 1, 2, 3, ... Suppose we have three disjoint sets as shown in the upper part of the following image. Notice that this representation is called an explicit list, because it explicitly connects objects within a set together.

[figure: explicit lists for three disjoint sets, and the equivalent implicit list a[1..7]]

As shown in the lower part of the image above, we can use a single implicit list to represent the disjoint sets, which remembers (1) pointers to the canonical element (i.e. name) of each disjoint set, and (2) the size of each disjoint set. See appendix 5.3 for my Python implementation of Union-Find, using an implicit list.

When we union two sets, it is conceptually like joining two trees together, and the root of the tree is the canonical element of the set after the union. Union-by-height is basically the idea that we always join the tree with smaller height into the one with greater height, i.e. the root of the taller tree will be the root of the new tree after the union. In my implementation, I used union-by-size instead of height, which produces the same result in the run time analysis (discussed in CSE 332 slides, by Adam Blank: https://courses.cs.washington.edu/courses/cse332/15au/lectures/union-find/union-find.pdf). The run time of union is determined by the run time of find in this implementation. With path compression, which makes every node visited during a find point directly at the root, analysis shows that the upper bound on the run time of find is the inverse Ackermann function, which is even better than O(log n).

There is another implementation that uses trees and that is also optimal for union. In this case, the union-find data structure is a collection of trees (a forest), where each tree is a subset. The root of the tree is the canonical element (i.e. name) of the disjoint set. It is essentially the same idea as the implicit list.
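A compact Python sketch of the forest formulation with union-by-size and path compression (a reminder only; this is distinct from the implicit-list implementation in appendix 5.3):

    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))  # each element starts as its own root
            self.size = [1] * n

        def find(self, u):
            # Path compression: point u directly at its root.
            if self.parent[u] != u:
                self.parent[u] = self.find(self.parent[u])
            return self.parent[u]

        def union(self, u, v):
            ru, rv = self.find(u), self.find(v)
            if ru == rv:
                return
            if self.size[ru] < self.size[rv]:   # union by size
                ru, rv = rv, ru
            self.parent[rv] = ru
            self.size[ru] += self.size[rv]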
2.2 Trees and Graph algorithms

2.2.1 BFS and DFS

Pseudo-code: BFS and DFS need no introduction. Here is the pseudo-code. The only difference here between BFS and DFS is that for BFS we use a queue as the worklist, and for DFS we use a stack as the worklist.

    BFS/DFS(G=(V,E), s) {
        worklist = [s]
        seen = {s}
        while worklist is not empty:
            node = worklist.remove()
            {visit node}
            for each neighbor u of node:
                if u is not in seen:
                    worklist.add(u)
                    seen.add(u)
    }

There is another way to implement DFS, using recursion (from an MIT 6.006 lecture):

    DFS(V, Adj):
        parent = {}
        for s in V:
            if s is not in parent:
                parent[s] = None
                DFS-Visit(Adj, s, parent)

    DFS-Visit(Adj, s, parent):
        for v in Adj[s]:
            if v is not in parent:
                parent[v] = s
                DFS-Visit(Adj, v, parent)

Obtain BFS/DFS layers: BFS/DFS results in a BFS/DFS tree, which has layers L1, L2, ... Each layer is a set of vertices. BFS layers are really useful for problems such as determining if the graph is two-colorable. DFS layers are useful too (application?).

    BFS/DFS(G=(V,E), s) {
        worklist = [s]
        seen = {s}
        layers = {s: 0}
        while worklist is not empty:
            node = worklist.remove()
            {visit node}
            for each neighbor u of node:
                if u is not in seen:
                    worklist.add(u)
                    seen.add(u)
                    layers.put(u, layers[node] + 1)
        Go through the keys in layers and obtain the set of nodes for each layer.
    }

Now we will look at some BFS/DFS tree theorems.

Theorem 2.1 (BFS). For each j >= 1, layer Lj produced by BFS starting from node s consists of all nodes at distance exactly j from s.

Theorem 2.2 (BFS). There is a path from s to t if and only if t appears in some BFS layer.

Theorem 2.3 (BFS). For a BFS tree T, if there is an edge (x, y) in G such that node x belongs to layer Li and node y belongs to layer Lj, then i and j differ by at most 1.

Theorem 2.4 (DFS). Let T be a DFS tree, and let x, y be nodes in T. Let (x, y) be an edge of G that is NOT an edge of T. Then one of x or y is an ancestor of the other.

Theorem 2.5 (DFS). For a given recursive call DFS(u), all nodes that are marked visited (e.g. put into the parent map) between the invocation and the end of this recursive call are descendants of u in the DFS tree T.

Now, let us look at how BFS layers are used for the two-colorable graph problem. A graph is two-colorable if and only if it is bipartite. A bipartite graph is a graph whose vertices can be divided into disjoint sets U and V (i.e. U and V are independent sets), such that every edge connects a vertex in U to one in V.

Theorem 2.6 (No Odd Cycle). If a graph G is bipartite, then it cannot contain an odd cycle, i.e. a cycle with an odd number of edges (or nodes).

Theorem 2.7 (BFS and Bipartite). Let G be a connected graph. Let L1, L2, ... be the layers of the BFS tree starting at node s. Then,

1. Either there exists an edge that joins two nodes of the same layer, which implies that there exists an odd cycle in G, so G isn't bipartite.

2. Or, there is no edge that joins two nodes of the same layer, so G is bipartite.

Edge classification for DFS tree: We can classify the edges in G after DFS into four categories (see the sketch after this list):

1. tree edge: we visit a new vertex via such an edge in DFS.
2. forward edge: an edge from a node to its descendant in the DFS tree.
3. backward edge: an edge from a node to its ancestor in the DFS tree.
4. cross edge: an edge between two non-ancestor-related subtrees.

Note that a DFS tree for a directed graph can have all four types of edges, but a DFS tree for an undirected graph cannot have forward edges or cross edges.
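A hedged Python sketch of classifying directed edges with discovery/finish times (the dict-of-lists adjacency format is an assumption):

    def classify_edges(adj):
        # adj: dict mapping each node to a list of its out-neighbors.
        time = [0]
        disc, fin = {}, {}
        kinds = {}

        def dfs(u):
            disc[u] = time[0]; time[0] += 1
            for v in adj[u]:
                if v not in disc:
                    kinds[(u, v)] = "tree"
                    dfs(v)
                elif v not in fin:
                    kinds[(u, v)] = "backward"  # v is an ancestor, still open
                elif disc[v] > disc[u]:
                    kinds[(u, v)] = "forward"   # v is a finished descendant
                else:
                    kinds[(u, v)] = "cross"
            fin[u] = time[0]; time[0] += 1

        for s in adj:
            if s not in disc:
                dfs(s)
        return kinds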
Cycle detection:

Theorem 2.8. A graph G has a cycle if and only if the DFS has a backward edge.

Besides using this theorem, Kahn's algorithm (discussed in the Topological Sort section) can be used to detect if there is a cycle.

2.2.2 Topological Sort

For a directed graph G, we say that a topological ordering of G is an ordering of its nodes as v1, v2, ..., vn, such that for every edge (vi, vj), we have i < j.

[...]

2.3 Dynamic Programming

2.3.1 One example problem involving a 2-D table

[...] Then, we increment i by |x|. Then, we repeat the previous step (*), and stop when i + j > l. We check if i + j = l inside each iteration, and if so, we check if Opt[i, j] = True. If yes, we return True. If we don't return True, we return False at the end.

2.3.2 Well-known problems solved by DP

Longest Common Subsequence: Find the longest subsequence common to all sequences in a set S of sequences. Unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. Let us look at the case where there are only two sequences x, y in S. For example, x is 14426, and y is 2134. The longest common subsequence of x and y is then 14. Define Opt[i, j] to be the longest common subsequence of the substrings x[0:i] and y[0:j]. We have the following update formula:

    Opt[i, j] = empty                             if i = 0 or j = 0
              = Opt[i-1, j-1] U {x_i}             if x[i] = y[j]
              = max(Opt[i-1, j], Opt[i, j-1])     if x[i] != y[j]

Similar problems: longest common substring, longest increasing subsequence.

Now, let us discuss the Levenshtein distance problem. The Levenshtein distance measures the difference between two sequences, i.e. the fewest number of operations (edit, delete, add) to change one sequence into another. The definition of the Levenshtein distance is itself a dynamic programming solution for the edit distance, as follows (from Wikipedia):

Definition 2.1. The Levenshtein distance between two strings a, b (of length |a| and |b| respectively) is given by lev_{a,b}(|a|, |b|), where

    lev_{a,b}(i, j) = max(i, j)                       if min(i, j) = 0
                    = min( lev_{a,b}(i-1, j) + 1,
                           lev_{a,b}(i, j-1) + 1,
                           lev_{a,b}(i-1, j-1) + 1_{(a_i != b_j)} )   otherwise

where 1_{(a_i != b_j)} is the indicator function that equals 1 if a_i != b_j, and lev_{a,b}(i, j) is the distance between the first i characters of a and the first j characters of b.

The Knapsack Problem: Given a set of items S, with size n, each item a_i with a weight w_i and a value v_i, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit K and the total value is as large as possible. Define Opt[i, k] to be the optimal subset of items from a_1, ..., a_i such that the total weight does not exceed k. Our final result will then be given by Opt[n, K]. For an item a_i, either it is included in the subset, or not. It is not included when the total weight of the subset with a_i added would exceed k. Therefore we have (see the sketch after this definition):

    Opt[i, k] = 0                                         if i = 0 or k = 0
              = Opt[i-1, k]                               if adding w_i exceeds k
              = max(Opt[i-1, k], Opt[i-1, k-w_i] + v_i)   otherwise

Similar problems: subset sum.
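A hedged Python sketch of the knapsack recurrence, using the common 1-D rolling-array form of the 2-D table above (it stores only the optimal total values rather than the subsets themselves):

    def knapsack(weights, values, K):
        # opt[k] = best total value achievable with total weight <= k,
        # considering the items processed so far.
        opt = [0] * (K + 1)
        for w, v in zip(weights, values):
            # Iterate k downward so each item is used at most once.
            for k in range(K, w - 1, -1):
                opt[k] = max(opt[k], opt[k - w] + v)
        return opt[K]

    # e.g. knapsack([2, 3, 4], [3, 4, 5], 5) == 7 (items with weights 2 and 3)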
Matrix Chain Multiplication: Given a sequence of matrices A1 A2 ... An, the goal is to find the most efficient way to multiply these matrices. The problem is not actually to perform the multiplications, but merely to decide the sequence of the matrix multiplications involved. There are many options because matrix multiplication is associative. In other words, no matter how the product is parenthesized, the result obtained will remain the same. For example, for four matrices A, B, C, and D, we would have:

    ((AB)C)D = (A(BC))D = (AB)(CD) = A((BC)D) = A(B(CD))

However, the order in which the product is parenthesized affects the number of simple arithmetic operations needed to compute the product, i.e. the efficiency. For example, if A is a 10 x 30 matrix, B is a 30 x 5 matrix, and C is a 5 x 60 matrix, then computing (AB)C needs (10 x 30 x 5) + (10 x 5 x 60) = 1500 + 3000 = 4500 operations, while computing A(BC) needs (30 x 5 x 60) + (10 x 30 x 60) = 9000 + 18000 = 27000 operations. The first parenthesization is obviously preferable.

Given n matrices, the total number of ways to parenthesize them is P(n) = Omega(4^n / n^(3/2)), so brute force is impractical (Columbia class notes: http://www.columbia.edu/~cs2035/courses/csor4231.F11/matrix-chain.pdf). We use dynamic programming. First, we characterize the structure of an optimal solution. We claim that one possible structure is the following:

    (A_{1:k})(A_{k+1:n})    (1)

where A_{i:j} means the matrix multiplication A_i A_{i+1} ... A_j. In order for the above to be optimal, the parenthesizations of A_{1:k} and A_{k+1:n} must also be optimal. Therefore, we can recursively break down the problem, until we only have one matrix. A subproblem is of the form A_{i:j}, with 1 <= i, j <= n, which means there are O(n^2) unique subproblems (counting). Let Opt[i, j] be the cost of computing A_{i:j}. If the final multiplication of A_{i:j} is A_{i:j} = A_{i:k} A_{k+1:j}, assuming that A_{i:k} is p_{i-1} x p_k and A_{k+1:j} is p_k x p_j, then for i < j,

    Opt[i, j] = Opt[i, k] + Opt[k+1, j] + p_{i-1} * p_k * p_j

This is because, by the definition of Opt, we need Opt[i, k] to compute A_{i:k}, Opt[k+1, j] to compute A_{k+1:j}, and p_{i-1} * p_k * p_j to compute the multiplication of A_{i:k} and A_{k+1:j}. For i = j, Opt[i, j] = 0. Since we need to check all (i, j) pairs, and all split points k in the parenthesization shown in (1), the run time is O(n^3).

2.3.3 Top-down dynamic programming

So far, we have been able to come up with an equation for the Opt table/array in the example problems above. This is called the bottom-up approach. However, for some problems, it is not easy to determine such an equation. In this case, we can use memoization and the top-down approach, usually involving recursion. The top-down approach basically leads to the same algorithm as bottom-up, but the perspective is different. According to a lecture note of the CMU algorithms class:

    Basic Idea: Suppose you have a recursive algorithm for some problem that gives you a really bad recurrence like T(n) = 2T(n-1) + n. However, suppose that many of the subproblems you reach as you go down the recursion tree are the same. Then you can hope to get a big savings if you store your computations so that you only compute each different subproblem once. You can store these solutions in an array or hash table. This view of Dynamic Programming is often called memoizing.

For example, the longest common subsequence (LCS) problem can be solved with this top-down approach. Here is the pseudo-code, from the CMU lecture note:

    LCS(S,n,T,m)
    {
        if (n==0 || m==0) return 0;
        if (arr[n][m] != unknown) return arr[n][m];  // memoization (use)
        if (S[n] == T[m]) result = 1 + LCS(S,n-1,T,m-1);
        else result = max( LCS(S,n-1,T,m), LCS(S,n,T,m-1) );
        arr[n][m] = result;                          // memoization (store)
        return result;
    }

If we compare the above code with the bottom-up formula for LCS, we realize that they are just using the same algorithm, with the same cases. The idea that both approaches share is that we only care about computing the value for a particular subproblem.

2.4 Recursive algorithms

2.4.1 Divide and Conquer

The matrix chain multiplication problem discussed previously can be solved, using the top-down approach, with recursion, and the idea there is basically divide and conquer - break up the big chain into smaller chains, until i = j (Opt[i, j] = 0). Divide and conquer (D&C) works by recursively breaking down a problem into two or more sub-problems of the same or related type, until these problems are simple enough to be solved directly (from Wikipedia). For some problems, we can use the memoization technique to optimize the run time. Now, let us look at two well-known problems solvable by divide-and-conquer algorithms.

Binary Search: This search algorithm runs in O(log n) time. It works by comparing the target element with the middle element of the array, and narrowing the search to half of the array, until the middle element is exactly the target element, or until the remaining array has only one element. Binary search is naturally a divide-and-conquer algorithm.

    Binary-Search-Recursive(arr, target, lo, hi):  # lo inclusive, hi exclusive
        if hi <= lo:
            return NOT_FOUND
        mid = lo + (hi - lo) / 2
        if arr[mid] == target:
            return mid
        else if arr[mid] < target:
            return Binary-Search-Recursive(arr, target, mid+1, hi)
        else:
            return Binary-Search-Recursive(arr, target, lo, mid)

    Binary-Search-Iterative(arr, target):
        lo = 0
        hi = arr.length
        while lo < hi:
            mid = lo + (hi - lo) / 2
            if arr[mid] == target:
                return mid
            else if arr[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        return NOT_FOUND

Implement the square root function: To implement the square root function programmatically, with an integer return value, we can use binary search. Given a number n, we know that the square root of n must lie between 0 and n/2 (for n >= 2). Then, we can basically treat all integers in [0, n/2] as an array, and do binary search. We terminate when lo equals hi.
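A hedged Python sketch of this binary-search square root (hi starts at n//2 + 1 so that small inputs like n = 1 are also covered):

    def isqrt(n):
        # Search lo..hi (inclusive) for the largest x with x*x <= n.
        lo, hi = 0, n // 2 + 1
        while lo < hi:
            mid = lo + (hi - lo + 1) // 2   # round up so the loop makes progress
            if mid * mid <= n:
                lo = mid
            else:
                hi = mid - 1
        return lo

    # e.g. isqrt(8) == 2, isqrt(9) == 3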
The idea that both approaches share is that, we only care about computing the value for a particular subproblem. 2.4 Recursive algorithms 2.4.1 Divide and Conquer ‘The matrix chain multiplication problem discussed previously can be solved, using top-down approach, with recursion, and the idea there is basically divide and conquer — break up the big chain into smaller chains, until i = j (Opt[i, j]=0). Divide and conquer (D&C) works by recursively breaking down a problem into two or more sub-problems of the same or related type, until these problems are simple enough to be solved directly!. For some problems, we can use memoization technique to optimize the run tin Now, let us look at two well-known problems solvable by divide-and- conquer algorithms. 13From Wikipedia. 30 Scanned with CamScanner Binary Search This search algorithm runs in O(/ogn) time. It works by comparing the target element with the middle element of the array, and narrow the search to half of the array, until the middle element is exactly the target clement, or until the remaining array has only one element. Binary search is naturally a divide-and-conquer algorithm. Binary-Search-Recursive(arr, target, lo, hi): # lo inclusive, hi exclusive. if hi <= lo: return NOT FOUND mid = lo + (hi-1o)/2 if arr[mid] == target: return mid else if arr[mid] > target return Binary-Search-Recursive(arr, target, mid+1, hi) else: return Binary-Search-Recursive(arr, target, lo, mid) Binary-Search-Iterative(arr, target): lo =0 hi = arr.length while lo < hi: mid = lo + (h: if arr[mid] return mid else if arr{mid] > target: lo = mid + 1 else: hi = mid return NOT FOUND 10) /2 target: Implement the square root function: To implement the square root function programmatically, with integer return value, we can use binary search. Given number n, we know that the square root of n must lie between 0 and n/2. Then, we can basically treat all integers in (0, n/2| as an array, and do binary search. We terminate only when 1o equals to hi. Closest Pair of Points Given n points in metric space, e.g. plane, find a pair of points with the smallest distance between them. A divide- and-conquer algorithm is as follows (from Wikipedia): 1. Sort points according to their 2-coordinates. 31 Scanned with CamScanner 2. Split the set of points into two equal-sized subsets by a vertical line x = Lapuit- 3. Solve the problem recursively in the left and right subsets. This yields the left-side and right-side minimum distances dymin and drmin, Tespectively. 4, Find the minimal distance dzmin among the set of pairs of points in which one point lies on the left of the dividing vertical and the other point lies to the right. 5. The final answer is the minimum among dpmin, dRmin; and dr Rmin- ‘The recurrence of this algorithm is T(n) = 2T(n/2) + O(n), where O(n) is the time needed for step 4. This recurrence to O(nlogn). Why can step 4 be completed in linear time? How? Suppose from step 3, we know the current minimum distance is 6. For step 4, we first pick the points with x-coordinates that are within [x.p1ie — 4, ¢sprie + 4], call this the boundary zone. Suppose we have py,--- ,Pm inside the boundary zone. ‘Then, we have the following magical theorem. Theorem 2.11. If dist(p;, pj) <6, then j —i < 15. With this, we can write the pseudo-code for this algorithm’: Closest-Pair(P): if [P| == 2: return dist (P(0). 
P[1)) L, R = SplitPointsByHalf(P) dL = Closest-Pair(L) AR = Closest-Pair(R) ALR = min(dL, dR) S = BoundaryZonePoints(L, R, dLR) fori=1,..., |S): for j= 41, ..., 15: aLR = min(dist(S[i], SIj]), @) return dLR Obviously, there are other classic divide-and-conquer algorithms to solve problems such as the convex hull (two-finger algorithm), and the median of medians algorithm (groups of five). As the writer, I have read those 14Cited from CMU lecture slides, with modification https: //www.cs.cmu.edu/ ~ckingsf /bioinfo-lectures/closepoints..pdf 32 Scanned with CamScanner algorithms and understood them, but I will save my time and not discuss them here. 2.4.2 Backtracking Backtracking is a general algorithm for finding all (or some) solutions to some computational problems, notably constraint satisfaction problems, that incrementally builds candidates to the solutions, and abandons each partial candidate ¢ (“backtracks”) as soon as it determines that c cannot possibly be completed to a valid solution’. Solving Sudoku Sudoku is a puzzle that is played on a grid of 9 by 9 cells, such that when all cells are filled up with numbers, each row and column have an enumeration of 1,2,--- ,9, and so does each "big cell” (subregion). It needs no more introduction than that. Here are the constraints for a Sudoku puzzle: 1. Each cell can contain one number in {1,2,3,4,5,6,7,8, 9} 2. Each row, column, and subregion all have an enumeration of the numbers 1 through 9, with no repeat. Pseudo-code'®. The idea of backtracking is illustrated in the undo & try again step. bool SolveSudoku(Grid &grid) t int row, col; if (!FindUnassignedLocation(grid, row, col)) return true; // all locations successfully assigned! for (int num = 1; num <= 9; num++) { // options are 1-9 if (NoConflicts(grid, row, col, num)) { // if # looks ok grid(row, col) = num; // try assign # if (SolveSudoku(grid)) return true; // recur if succeed stop grid(row, col) = UNASSIGNED; // undo & try again } + return false; // this triggers backtracking from early decisions } 1SDescription of backtracking is from Wikipedia. 16From https: //see.stanford.edu/materials/icspacs106b/Lecture11.pdf 33 Scanned with CamScanner Relation between backtracking and DFS: DFS is a specific form of back- tracking related to searching tree structures. Backtracking is more broad — it can be used for searching in non-tree structures, such as the Sudoku board. 2.5 Greedy algorithms A greedy algorithm is an algorithm that makes locally optimal decisions, with the hope that these decisions would lead to globally optimal solu- tion. It is easy to come up with these greedy rules, but most of them are wrong, and the right ones are typically hard to justify. So nothing is better than showing examples when discussing greedy algorithms. Scheduling Problem 1! There is a computer system with numerous processes, some of which is marked sensitive. Each sensitive process has a designated start time and finish time, and it runs continuously between these times.There is a list of the planned start and finish times of all sensitive processes that will be run that day. You are given a program called status-check that, when invoked records various pieces of logging informe processes running on the system at that moment. You should run status_check as few times as possible during the day, but enough that for each sensitive process P, status_check is invoked at least once dur- ing the execution of process P. 
Give an efficient algorithm that, given the start and finish times of all the sensitive processes, finds as small a set of times as possible at which to invoke status_check, subject to the requirement that status_check is invoked at least once during each sensitive process P.

Algorithm: We start by sorting the processes by descending start time, using a heap (i.e. the process with the highest start time will have the minimum value in the heap). Then, we keep removing the root process from the heap, and use the start time of this process as the time to call status_check. And we mark processes that are running at that time as checked, and remove them from the heap as well. Here is pseudo-code to illustrate this algorithm (see the Python sketch after the analysis below):

    CountCalls(processes):
        Heap h = a heap of processes ordered by descending start time
        count_calls = 0
        While h is not empty:
            p = h.RemoveMin()
            call_time = start time of p
            count_calls += 1
            For each process q in h:
                If q is running at call_time:
                    h.Remove(q)
        Return count_calls

Justification for correctness: The above algorithm will terminate because there is a finite number of sensitive processes, and thus the heap will eventually be empty when all processes are checked. We can use induction to show that the above algorithm produces the correct result. Suppose we have n sensitive processes in total, labeled P0, P1, P2, ..., Pn. Let Wn be defined as the number of calls to status_check when there are n processes.

Proof. Base Case: The base case when n = 0 is trivial. The algorithm will simply return 0 since the heap is empty. This is correct behavior. So, the algorithm works for the base case.

Induction Hypothesis: Assume that for 0 <= j <= k, our algorithm produces the minimum possible value of Wj.

Inductive Step: Now show that for n = k+1, our algorithm still produces the minimum possible value of W(k+1). For P(k+1), there are two cases to consider:

(a) We need to call status_check once more in order to check P(k+1), because P(k+1) is not checked when we handle the other processes P0, ..., Pk.

(b) We do not need to call status_check any more, because P(k+1) is checked when we handle the other processes P0, ..., Pk.

For case (a), since our algorithm only terminates when the heap is empty, when P(k+1) is not checked, it is still in the heap. Therefore, the algorithm will do one extra RemoveMin() and remove P(k+1) from the heap. By the induction hypothesis, the algorithm produces the optimal result for 0 <= j <= k. Thus, the result produced by the algorithm for n = k+1 matches the optimal in case (a), which requires one extra call to the status_check function.

For case (b), since P(k+1) is checked when we handle P0, ..., Pk, our algorithm will have already removed P(k+1) by the time it is done dealing with P0, ..., Pk. By the induction hypothesis, the algorithm produces the optimal result for 0 <= j <= k. Thus, the result produced by the algorithm for n = k+1 matches the optimal in case (b), which is to NOT call status_check any more.

Conclusion: From the above proof of the base case and the inductive step, by strong induction, we have shown that our algorithm works for all integers n >= 0. QED

Indeed, induction is how you formally prove that a greedy rule works correctly.

Justification for run time: The above algorithm is efficient. We first construct a heap of processes, which takes O(n log n) time. Then we loop until we have removed all items from the heap, which is O(n log n) time. Since we do not add any process back into the heap after we remove it, the algorithm will terminate when the heap is empty. Besides, every other operation in the algorithm is O(1). Therefore, combined, our algorithm has an efficient runtime of O(n log n) + O(n log n) = O(n log n).
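A hedged Python sketch of CountCalls using heapq (heapq is a min-heap, so start times are negated to pop the latest start first; rebuilding the list to drop checked processes is simpler, though asymptotically sloppier, than the in-place Remove of the pseudo-code):

    import heapq

    def count_calls(processes):
        # processes: list of (start, finish) pairs for the sensitive processes.
        h = [(-start, finish) for start, finish in processes]
        heapq.heapify(h)
        calls = 0
        while h:
            neg_start, _ = heapq.heappop(h)
            call_time = -neg_start        # invoke status_check at this time
            calls += 1
            # Drop every process still in the heap that is running at
            # call_time (their starts are <= call_time by heap order, so
            # only the finish times need checking).
            h = [(s, f) for s, f in h if f < call_time]
            heapq.heapify(h)
        return calls

    # e.g. count_calls([(1, 4), (2, 5), (6, 8)]) == 2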
2.6 Sorting

2.6.1 Merge sort

Merge sort is a divide-and-conquer, stable sorting algorithm. Worst case O(n log n); worst space O(n). Here is pseudo-code for a non-in-place merge sort. An in-place merge sort is possible.

    Mergesort(arr):
        if arr.length == 1:
            return arr
        mid = arr.length / 2
        l = Mergesort(arr[0:mid])
        r = Mergesort(arr[mid:length])
        return merge(l, r)

2.6.2 Quicksort

Quicksort is a divide-and-conquer, unstable sorting algorithm. Average run time O(n log n); worst case run time O(n^2); worst case auxiliary space O(log n) with a good implementation. (A naive implementation is still O(n) space.) Auxiliary space is the extra or temporary space used by an algorithm (from GeeksForGeeks). Quicksort is fast if all comparisons are done with constant-time memory access (assumption).

People have argued about which sort is the best. Here is an answer from Stackoverflow, by user11318:

    ... However if your data structure is big enough to live on disk, then quicksort gets killed by the fact that your average disk does something like 200 random seeks per second. But that same disk has no trouble reading or writing megabytes per second of data sequentially. Which is exactly what merge sort does. Therefore if data has to be sorted on disk, you really, really want to use some variation on merge sort. (Generally you quicksort sublists, then start merging them together above some size threshold.)

Here is the pseudo-code:

    Quicksort(arr, lo, hi):  # lo inclusive, hi exclusive
        if hi - lo <= 1:
            return
        pivot = ChoosePivot(arr, lo, hi)
        p = Partition(arr, lo, hi, pivot)
        Quicksort(arr, lo, p)
        Quicksort(arr, p, hi)

    Partition(arr, lo, hi, pivot):  # lo inclusive, hi exclusive
        # two-pointer partition; assumes the pivot value sits at arr[lo]
        i = lo - 1
        j = hi
        loop:
            do i = i + 1 while arr[i] < pivot
            do j = j - 1 while arr[j] > pivot
            if i >= j:
                return j + 1
            swap arr[i], arr[j]

[...]

2.7 Searching

2.7.1 Quickselect

    Quickselect(arr, k, lo, hi):  # k-th smallest element; lo inclusive, hi exclusive
        if hi - lo == 1:
            return arr[lo]
        pivot = ChoosePivot(arr, lo, hi)
        p = Partition(arr, lo, hi, pivot)
        if p > k:
            return Quickselect(arr, k, lo, p)
        else:
            return Quickselect(arr, k-p, p, hi)

The (expected) recurrence for the above pseudo-code is T(n) = T(n/2) + O(n). When solved, it gives O(n) run time. The worst case, which is again due to bad pivot selection, has O(n^2) run time.
2.8 String

2.8.1 Regular expressions

Regular expressions need no introduction. In interviews, the interviewer may ask you to implement a regular expression matcher for a subset of regular expression symbols. Similar problems could ask you to implement a program that recognizes a particular string pattern. Here is a regex matcher written by Brian Kernighan and Rob Pike in their book The Practice of Programming (discussed here: http://www.cs.princeton.edu/courses/archive/spr09/cos333/beautiful.html):

    /* match: search for regexp anywhere in text */
    int match(char *regexp, char *text)
    {
        if (regexp[0] == '^')
            return matchhere(regexp+1, text);
        do {  /* must look even if string is empty */
            if (matchhere(regexp, text))
                return 1;
        } while (*text++ != '\0');
        return 0;
    }

    /* matchhere: search for regexp at beginning of text */
    int matchhere(char *regexp, char *text)
    {
        if (regexp[0] == '\0')
            return 1;
        if (regexp[1] == '*')
            return matchstar(regexp[0], regexp+2, text);
        if (regexp[0] == '$' && regexp[1] == '\0')
            return *text == '\0';
        if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
            return matchhere(regexp+1, text+1);
        return 0;
    }

    /* matchstar: search for c*regexp at beginning of text */
    int matchstar(int c, char *regexp, char *text)
    {
        do {  /* a * matches zero or more instances */
            if (matchhere(regexp, text))
                return 1;
        } while (*text != '\0' && (*text++ == c || c == '.'));
        return 0;
    }

2.8.2 Knuth-Morris-Pratt (KMP) Algorithm

The KMP algorithm is used for the string matching problem: find the index at which a pattern P with length m occurs (if ever) in a string W with length n. The naive algorithm for this problem takes O(nm) time, because it does not use any information from a failed match. The key of KMP is that it uses this information, achieving a run time of O(n + m). It is a complicated algorithm, so let me explain it now. See appendix 5.2 for my Python implementation, based on the ideas below.

Building the prefix table (pi table): The first thing that KMP does is preprocess the pattern P and create a pi table. pi[i] is the largest integer j smaller than i such that P0...Pj is a suffix of P0...Pi. Consider the following example:

    i     |  0   1   2   3   4   5   6   7
    P[i]  |  a   b   a   b   c   a   b   a
    pi[i] | -1  -1   0   1  -1   0   1   2

When we are filling pi[i], we focus on the substring P0...Pi, and see if there is a prefix equal to a suffix of that substring. pi[0], pi[1], and pi[4] are -1, meaning that there is no prefix equal to a suffix in the corresponding substring. For example, for pi[4], the substring of concern is ababc, and there is no valid index value for pi[4] to be set. pi[7] = 2, because the substring P0...P7 is ababcaba, and the prefix P0...P2, aba, is a suffix of that substring.

Below is pseudo-code for constructing a pi table. The idea behind the pseudo-code is captured by two observations:

1. If P0...Pj is a suffix of P0...Pi, then P0...P(j-1) is a suffix of P0...P(i-1) as well.

2. If P0...P(pi[i]) is a suffix of P0...Pi, then so is P0...P(pi[pi[i]]), and so on - a recursion of the pi values.

So we can use two pointers i, j, and we are always looking at whether the prefix P0...P(j-1) is a suffix of the substring P0...P(i-1). Pointer i moves faster than pointer j: i moves up by 1 every time we are done with a comparison between Pi and Pj, and j moves up by 1 when Pi = Pj (observation 1). At this time (Pi = Pj), we set pi[i] = j. If Pi != Pj, we move j back to pi[j-1] + 1 and compare again (+1 because pi is -1 when there is no matching prefix). This guarantees that the prefix P0...P(j-1) is the longest suffix of the substring P0...P(i-1). We initialize pi[0] = -1 (and conceptually pi[-1] = -1).

    Construct-Pi-Table(P):
        pi[0] = -1
        i = 1, j = 0
        while i < |P|:
            if P[i] == P[j]:
                pi[i] = j
                j += 1
                i += 1
            else if j > 0:
                j = pi[j-1] + 1   # fall back to the next shorter candidate prefix
            else:
                pi[i] = -1
                i += 1

Pattern Matching: Once we have the pi table, we can skip characters when comparing the pattern P against the string W. Consider P and W to be the following, as an example (P is the same as in the example above):

    W = abccababababcaababcaba,    P = ababcaba

Based on the way we constructed the pi table above, we have the following rules when doing the matching. Assume the matched substring (i.e. the substring of P before the first mismatch), starting at W[k], has length d.

1. If d = |P|, we found a match. Return k.

2. Else, if d > 0 and pi[d-1] = -1, then the next comparison starts at W[k+d].

3. Else, if d > 0 and pi[d-1] != -1, then the next comparison starts at W[k + d - pi[d-1] - 1]. (Note: we don't need the -1 here if the pi table uses 1-based indexing. See the Stanford slides.)

4. Else, if the matched substring, starting at index k, has length 0, then the next match starts at k+1.
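Rules 1-4 amount to the standard KMP scan. A hedged Python sketch using the -1-based pi table above (the function name is mine, not from the appendix):

    def kmp_search(W, P, pi):
        # j = number of pattern characters currently matched; equivalently,
        # P[j] is the next character to compare.
        j = 0
        for i in range(len(W)):
            while j > 0 and W[i] != P[j]:
                j = pi[j-1] + 1          # rules 2-3: reuse the matched prefix
            if W[i] == P[j]:
                j += 1                   # extend the match
            if j == len(P):
                return i - len(P) + 1    # rule 1: full match found
        return -1                        # no occurrence (rule 4 is the j == 0 case)

With the example W and P above, kmp_search returns 14, the index where ababcaba begins in W.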
2.8.3 Suffix/Prefix Tree

See Trie (2.1.15). If we have a text of size T, and a small pattern of size P, and we are interested to know if P occurs in T, then we can achieve O(P) query time and O(T) space by building a suffix tree of the text T. A suffix tree is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations. Construction of such a tree for the string T takes time and space linear in the length of T. Here is an example of a suffix tree for "banana$". (The $ sign marks the end of a string.)

[figure: suffix tree of banana$]

2.8.4 Permutation

String permutation is another topic that interviewers may like to ask about. One generates permutations typically by depth-first search (i.e. a form of backtracking); we can imagine that all possible permutations are the leaves of a tree, and the paths from the root to them represent the characters chosen, as in the sketch below.
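A minimal Python sketch of that DFS over the tree of choices (names are illustrative):

    def permutations(s):
        # Each level of the DFS picks one unused character; a leaf (no
        # characters remaining) is a complete permutation.
        result = []

        def dfs(path, remaining):
            if not remaining:
                result.append("".join(path))
                return
            for i, c in enumerate(remaining):
                path.append(c)
                dfs(path, remaining[:i] + remaining[i+1:])
                path.pop()   # backtrack

        dfs([], s)
        return result

    # e.g. permutations("abc") yields abc, acb, bac, bca, cab, cba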
2.9 Caching

In general, a cache is a location to store a small amount of data for more convenient access. It is everywhere. Here are some common examples, described at a high level.

A CPU cache is used by the CPU to quickly access data in the main memory. Depending on the type of memory data, e.g. regular data, instructions, virtual addresses (the translation lookaside buffer used by the MMU), etc., there may be different types of caches, such as the data cache and the instruction cache.

A cache server (web cache) basically saves web data (e.g. web pages, requests) and serves it when the user requests the same thing again, in order to reduce the amount of data transmitted over the web.

2.9.1 Cache Concepts Review

Definition 2.2. A cache hit is when we request something that is in the cache. A cache miss is when the requested item does not exist in the cache.

Definition 2.3. A cache block, or cache line, is a section of continuous bytes in a cache. It is the lowest I/O level for a cache.

Locality refers to the fact that programs tend to use data and instructions close to (spatial) or the same as (temporal) those recently used.

Memory Cache: A direct mapped cache is a cache where each memory address can be mapped to exactly one block in the cache. Each block is divided into three sections: tag, data, and valid bit. The tag stores the first few bits of an address. The data stores the cached memory data, which can consist of several blocks of typically 32 or 64 bytes. The valid bit indicates whether the cached data can be trusted by the CPU for computation. When the CPU needs to refer to some data with an address, it will figure out the tag of that address (e.g. by dividing by the page size), and check if the mapped block in the cache has the same tag, and if the valid bit is set. If so, then the cached data is usable.

A fully-associative cache is one that allows any memory page to be cached in any cache block, opposite to a direct mapped cache. The advantage is that it avoids the possibly constantly empty entries of a direct mapped cache, so that the cache miss rate is reduced. The drawback is that such a cache requires hardware sophistication to support parallel look-up of tags, because in this case, the only way to identify whether an address is cached is to compare the tag of that address with all tags in the cache (in parallel).

In CSE 351, we adopted the set-associative cache as the reasonable compromise between complicated hardware and the direct mapped cache. Here, we divide the cache into sets, where each set contains several entries. This way, we can reduce the cache miss rate compared to a direct mapped cache, but also check the tags efficiently enough in parallel, because there are only a few entries in a set. We say a cache is n-way if each set contains n cache blocks.

For the sake of interviews, we will discuss the LRU cache and LFU cache (in software). Both of them fall under the topic of cache replacement policies, which address how we evict cached data when the cache is full. There are numerous policies, including FIFO, LIFO, LRU (Least Recently Used), LFU (Least Frequently Used), etc. In hardware, these caches are usually implemented by manipulating some bits in the block (e.g. an LRU counter for each cache block) to keep track of some property such as age. The concepts are similar.

2.9.2 LRU Cache

The LRU cache policy is to evict the least recently used data first. In software, we can use a doubly linked list plus a hash table to implement the LRU cache. Each node in the list corresponds to a key-value pair in the hash table. (In the context of a memory cache, we can think of the key as the address, and the value as the cache block associated with the tag of that address.) When we insert a new key-value pair into the cache, we also add a node to the front of the list. When the cache is full and we still need to add data to it, we basically remove the last element in the list, because it is least recently used. When we actually have a cache hit, we can simply bring the corresponding node to the front of the list, by removing it (constant time for a doubly linked list) and then prepending it to the list.
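A hedged Python sketch: collections.OrderedDict stands in for the hash table plus doubly linked list combination described above, since it supports O(1) move-to-end and pop-oldest.

    from collections import OrderedDict

    class LRUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.data = OrderedDict()   # oldest entry first

        def get(self, key):
            if key not in self.data:
                return None               # cache miss
            self.data.move_to_end(key)    # cache hit: mark most recently used
            return self.data[key]

        def put(self, key, value):
            if key in self.data:
                self.data.move_to_end(key)
            self.data[key] = value
            if len(self.data) > self.capacity:
                self.data.popitem(last=False)   # evict the least recently used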
2.10 Game Theory

Game theory is basically the theory of modeling intelligent, rational decision-making processes in different kinds of situations. It is not just used in computer science, but also in economics and political science. This is not a very hot topic in coding interviews, but knowing these algorithms may help in some cases; they are essentially various sorts of searching algorithms with different models of the world.

Some concepts. In economics, game theory, decision theory, and artificial intelligence, a rational agent is an agent that has clear preferences, models uncertainty via expected values of variables or functions of variables, and always chooses to perform the action with the optimal expected outcome for itself from among all feasible actions (definition from Wikipedia). A rational agent can be anything that makes decisions, typically a person, firm, machine, or software. A measure of preferences is called utility. A game where agents have opposite utilities is a zero-sum game, e.g. chess; the agents go against each other in adversarial, pure competition. A general game is one where agents have independent utilities; cooperation, indifference, competition, etc. are all possible in this kind of game.

2.10.1 Minimax and Alpha-beta

Minimax is an intuitive adversarial search algorithm for deterministic games. There is an agent A and an agent (opponent) Z. Minimax follows the following two equations (modified from the CSE 473 lecture slides, 2016 spring, by Prof. L. Zettlemoyer):

V_A(s_A) = \max_{s_Z \in \mathrm{successors}(s_A)} V_Z(s_Z)
V_Z(s_Z) = \min_{s_A \in \mathrm{successors}(s_Z)} V_A(s_A)

where V_K(s) means the utility function of agent K for a state s. The parameter s_K is a state in which agent K takes control: agent K makes the decision of how to change from s_K to some successor state. So agent A tries to maximize the utility, and Z tries to minimize it.

Alpha-beta is a pruning method for the Minimax tree. The output of Alpha-beta is the same as the output of Minimax. The α value represents the assured maximum score that agent A can get, and the β value is the assured minimum score that agent Z can get. Below is my Python implementation of alpha-beta from the Pacman assignment. Only the agent with index 0 is a maximizing agent.

def _alphabeta(self, gameState, agentIndex, depth, alpha, beta):
    if gameState.isWin() or gameState.isLose() or depth == 0:
        score = self.evaluationFunction(gameState)
        return score
    curAgent = agentIndex % gameState.getNumAgents()
    legalActions = gameState.getLegalActions(curAgent)
    # The maximizer starts at -infinity, the minimizer at +infinity.
    score = -float("inf")
    if agentIndex != 0:
        score = -score
    nextAgent = curAgent + 1
    if nextAgent >= gameState.getNumAgents():
        nextAgent = 0
        depth -= 1   # one ply is finished once every agent has moved
    for action in legalActions:
        successorState = gameState.generateSuccessor(curAgent, action)
        if curAgent == 0:
            score = max(score, self._alphabeta(successorState, nextAgent,
                                               depth, alpha, beta))
            if score > beta:      # prune: Z will never allow this branch
                return score
            alpha = max(alpha, score)
        else:
            score = min(score, self._alphabeta(successorState, nextAgent,
                                               depth, alpha, beta))
            if score < alpha:     # prune: A will never allow this branch
                return score
            beta = min(beta, score)
    return score

2.10.2 Markov Decision Process

A Markov Decision Process (MDP) is defined by:

- A set of states s ∈ S,
- A set of actions a ∈ A,
- A transition function T(s, a, s') for the probability of transitioning from s to s' with action a,
- A reward function R(s, a, s'),
- A start state, and
- (maybe) a terminal state.

The world for an MDP is usually a grid world, where some grids have positive reward and some have negative reward. The goal of solving an MDP is to figure out an optimal policy π*(s), the optimal action for each state s, so that the agent can take actions according to the policy in order to gain the highest amount of reward possible. There are two ways to solve an MDP discussed in the undergraduate-level AI class: value (utility) iteration and policy iteration. I will just put some formulas here. For most of the time, understanding them is straightforward and sufficient.

Definition 2.4. The utility of a state is V*(s), the expected utility starting in s and acting optimally.

Definition 2.5. The utility of a q-state is Q*(s, a), the expected utility starting by taking action a in state s, and acting optimally afterwards. (The naming "q-state", from my understanding, means quasi-state, which is seemingly a state, but not really.)

Using the above definitions, we have the following recursive definition of utilities. The γ value is a discount, from 0 to 1, which can let the model prefer sooner rewards and help the algorithm converge. Note that max is different from argmax.

V^*(s) = \max_a Q^*(s, a)
Q^*(s, a) = \sum_{s'} T(s, a, s') \, [R(s, a, s') + \gamma V^*(s')]
V^*(s) = \max_a \sum_{s'} T(s, a, s') \, [R(s, a, s') + \gamma V^*(s')]

Value Iteration: Start with V_0(s) = 0 for all s ∈ S. Then update V_{k+1} iteratively using the following (almost trivial) update rule, until convergence. Complexity of each iteration: O(S²A).

V_{k+1}(s) \leftarrow \max_a \sum_{s'} T(s, a, s') \, [R(s, a, s') + \gamma V_k(s')]
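To make the update rule concrete, here is a small value-iteration sketch in Python (my own illustration; it assumes finite state and action sets, a transition function T(s, a) returning (next_state, probability) pairs, and a reward function R(s, a, s2)):

def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    V = {s: 0.0 for s in states}   # V_0(s) = 0 for all s
    while True:
        newV = {}
        for s in states:
            # V_{k+1}(s) = max_a sum_{s'} T(s,a,s') [R(s,a,s') + gamma V_k(s')]
            newV[s] = max(sum(p * (R(s, a, s2) + gamma * V[s2])
                              for s2, p in T(s, a))
                          for a in actions)
        if max(abs(newV[s] - V[s]) for s in states) < eps:
            return newV            # (approximately) converged
        V = newV

The optimal policy can then be read off with one more argmax over actions for each state.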
Policy Iteration: Start with an arbitrary policy π_0, then iteratively evaluate and improve the current policy until the policy converges. This is better than value iteration in that the policy usually converges long before the values converge, and the run time of value iteration is not desirable.

V^{\pi_i}_{k+1}(s) \leftarrow \sum_{s'} T(s, \pi_i(s), s') \, [R(s, \pi_i(s), s') + \gamma V^{\pi_i}_k(s')]
\pi_{i+1}(s) = \operatorname{argmax}_a \sum_{s'} T(s, a, s') \, [R(s, a, s') + \gamma V^{\pi_i}(s')]

2.10.3 Hidden Markov Models

A Hidden Markov Model is a chain of hidden state variables X_1 → X_2 → ··· , where each X_i emits an observed evidence variable E_i. A state is a value of a variable X_i. For example, if X_i is a random variable meaning "it rains on day i", then the value of X_i can be True or False. If X_1 = t, then we have a state X_1 which means it rains on day 1.

An HMM is defined by an initial distribution P(X_1), transitions P(X_t | X_{t-1}), and emissions P(E_t | X_t). The value of an emission variable represents an observation, e.g. sensor readings. We are interested to know P(X_t | e_{1:t}), which is the distribution of X_t given all of the observations to date. We can obtain the joint distribution of x_t ∈ X_t and all current observations:

P(x_t, e_{1:t}) = P(e_t \mid x_t) \sum_{x_{t-1}} P(x_t \mid x_{t-1}) \, P(x_{t-1}, e_{1:t-1})

Then we normalize all entries in P(X_t, e_{1:t}) to get the desired current belief B(X_t), by the definition of conditional probability:

B(X_t) = P(X_t \mid e_{1:t}) = P(X_t, e_{1:t}) / P(e_{1:t})

This is called the forward algorithm. From this, we can derive the formula for the belief at the next time frame, given current evidence:

B'(X_{t+1}) = P(X_{t+1} \mid e_{1:t}) = \sum_{x_t} P(X_{t+1} \mid x_t) \, P(x_t \mid e_{1:t}) = \sum_{x_t} P(X_{t+1} \mid x_t) \, B(x_t)

The above equation allows us to perform online belief updates, as in the sketch below.
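One forward-algorithm step translates almost line for line into Python. This is my own sketch for a finite state space; trans[x1][x2] = P(X_t = x2 | X_{t-1} = x1) and emit[x][e] = P(e | x) are assumed to be given as nested dictionaries:

def forward_step(belief, evidence, states, trans, emit):
    # Unnormalized update:
    #   P(x_t, e_t | e_{1:t-1}) = P(e_t | x_t) * sum_{x_{t-1}} P(x_t | x_{t-1}) B(x_{t-1})
    joint = {x2: emit[x2][evidence] *
                 sum(trans[x1][x2] * belief[x1] for x1 in states)
             for x2 in states}
    total = sum(joint.values())    # = P(e_t | e_{1:t-1})
    # Normalize to get the new belief B(X_t) = P(X_t | e_{1:t}).
    return {x: joint[x] / total for x in states}

Repeatedly folding in one evidence value per time step keeps the belief updated online.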
2.10.4 Bayesian Models

A Bayesian network is based on the familiar Bayes' theorem:

P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}

A Bayesian network can be represented by a DAG, where each node is a random variable. As an example, consider a network over variables A, B, C, D, E with edges A → C, B → C, C → D, C → E, and D → E. The edges encode conditional independence; nodes that are not connected represent variables that are independent of each other. For example, in this network, A and B are independent of each other a priori. Recall that if X, Y are conditionally independent given Z, then

P(X, Y \mid Z) = P(X \mid Z) \, P(Y \mid Z)

Besides, a Bayesian network implicitly encodes a joint distribution. For the network above, we have:

P(A, B, C, D, E) = P(E \mid C, D) \, P(C \mid A, B) \, P(D \mid C) \, P(A) \, P(B)

In general,

P(x_1, x_2, \ldots, x_n) = \prod_i P(x_i \mid \mathrm{parents}(X_i))

2.11 Computability

This section is about some fundamental theory of computer science. I am not sure if any interview will ask questions directly related to it, but knowing it definitely helps.

2.11.1 Countability

A set S is countable if and only if the elements of S can be mapped one-to-one into N. The union of countably many countable sets is countable (given the axiom of choice).

2.11.2 The Halting Problem

A set is called decidable if there is an algorithm which terminates after a finite amount of time and correctly decides whether a given element belongs to the set. Here is my proof of a CSE 311 problem related to the Halting Problem. The problem is the following: prove that the set {(P, x, y) : P is a program and P(x) ≠ P(y)} is undecidable.

1. Assume that there exists an algorithm A that decides the above set. Note that the argument to A is an element of the form (CODE(P), x, y). A has a return value of true or false: true when the element is in the set, and false otherwise.

2. Suppose we have the following program D, inside which there is an arbitrary program H. We claim that we can use A to show whether H halts:

D(x):
    if x == 2:
        while (1): pass    # loop forever
    else:
        return H()         # H is an arbitrary function

3. As you can see, the output of this program is either "loop forever", or what H returns.

4. If A(CODE(D), 1, 2) returns true, then D(1) ≠ D(2). So D(1) halts, which means H halts.

5. If A(CODE(D), 1, 2) returns false, then D(1) = D(2). So D(1) does not halt. (Basically, when the two outputs D(1) = D(2) are equal, the only way that can happen is that both calls go into the infinite loop, which does not return a value; abstractly, this "infinite loop" is the returned value.)

6. In other words, using A we can build a program that plays the same role as A but for the halting set, the set defined like this: {(P, x) : program P halts when run on input x}.

7. Because of the Halting Problem, we know that this set is undecidable. So here is a contradiction.

8. Thus, the given set is undecidable.

2.11.3 Turing Machine

A Turing Machine is basically a conceptual machine, imagined using the materials of Alan Turing's time (the 1930s), that can be used to do computation with algorithmic logic. A Turing Machine can only solve problems that are decidable, which means that there exists a single program that always correctly outputs Yes/No.

Limitations: A limitation of Turing machines is that they do not model the strengths of a particular arrangement (of a real computer) well. Another limitation of Turing machines is that they do not model concurrency well (Wikipedia).

2.11.4 P-NP

The following definitions and theorems are provided by Algorithm Design, the book by Kleinberg et al., and the CSE 332 lecture slides on P-NP by Adam Blank.

Definition 2.6. A complexity class is a set of problems limited by some resource constraint (e.g. time, space).

Definition 2.7. A decision problem is a set of strings (L ⊆ Σ*). An algorithm (from Σ* to boolean) solves a decision problem when it outputs True if and only if the input is in the set.

Definition 2.8. P is the set of decision problems with a polynomial time (in terms of the input) algorithm.

Definition 2.9 (NP, version 1). NP is the set of decision problems with a non-deterministic polynomial time algorithm.

Definition 2.10. A certifier for problem X is an algorithm that takes as input: (1) s, which is an instance of X; and (2) a string w which is a "certificate" or "witness" that s ∈ X. It returns False regardless of w if s ∉ X, and returns True otherwise, i.e. there exists some w witnessing s ∈ X.

Definition 2.11 (NP, version 2). NP is the set of decision problems with a polynomial time certifier.

Definition 2.12. Suppose X, Y are two different problems. We say X is at least as hard as Y if, given a "black box" capable of solving X, we can solve Y using polynomially many operations plus a polynomial number of calls to the black box for X. This also means that Y is polynomial-time reducible to X.

int max(int a, int b) {
    int c = a - b;
    int k = (c >> 31) & 0x1;   // k = 1 iff a - b is negative
    int max = a - k * c;
    return max;
}

The purpose of k in the max function is to check whether the difference is negative. Run it manually to see how it works: for a = 3, b = 5, we get c = -2 and k = 1, so max = a - k*c = 3 - (-2) = 5; for a = 5, b = 3, we get c = 2 and k = 0, so max = a - 0 = 5.

Other bit manipulation code snippets. Some of these are from my solutions to CSE 351 lab 1.

/* invert - Return x with the n bits that begin at position p
 *   inverted (i.e., turn 0 into 1 and vice versa) and the rest
 *   left unchanged. Consider the indices of x to begin with the
 *   low-order bit numbered as 0. Can assume that 0 <= n <= 31
 *   and 0 <= p <= 31.
 *   Examples: invert(0x80000000, 0, 1) = 0x80000001,
 *             invert(0x0000008e, 3, 3) = 0x000000b6
 */
int invert(int x, int p, int n) {
    int mask = (1 << n) + ~0;   // have n 1s at the end
    mask <<= p;                 // shift the 1s to start at position p
    return x ^ mask;            // XOR with 1 flips a bit
}

/* sign - return 1 if positive, 0 if zero, and -1 if negative
 *   Examples: sign(130) = 1
 *             sign(-23) = -1
 */
int sign(int x) {
    // If x is negative, x >> 31 gives all 1s (arithmetic shift),
    // which is essentially -1.
    int neg = x >> 31;
    // !!x is 1 exactly when x is nonzero, so this is 1 only for
    // positive x.
    int pos_or_zero = !(x >> 31) & !!x;
    return neg + pos_or_zero;
}
