Cheat Sheet 2
Written by: Krishna Parashar
Published by: The OCM

Huffman Encoding
A means to encode data using the optimal number of bits for each character given a distribution.

func huffman(f):
    Input: An array f[1...n] of frequencies
    Output: An encoding tree with n leaves

    let H be a priority queue of integers, ordered by f
    for i=1 to n: insert(H,i)
    for k = n+1 to 2n-1:
        i=deletemin(H), j=deletemin(H)
        create a node numbered k with children i,j
        f[k] = f[i]+f[j]
        insert(H,k)

Dynamic Programming
Fundamentally, DP is careful brute force: the problem is broken into smaller and smaller nested subproblems, each of which remembers useful information for its bigger (parent) subproblem, so that the original problem can eventually be reconstructed and solved in a reasonable amount of time. This "remembrance" is often done using memoization or parent pointers.
Dynamic Programming has two approaches, which both have the same asymptotic runtime (they differ by a constant):
1. Top Down: The top down approach uses the recursive idea of breaking the problem into trivially (but still helpfully) smaller subproblems and finding a way (through brute force and memoization) to take the maximum or minimum over every permutation of the subproblems. This is unlike Divide and Conquer, which garners its efficiency from reducing its subproblems to massively smaller problems.
2. Bottom Up: The bottom up approach uses the opposite approach of breaking down the problem into its smallest subproblems and iteratively using the smaller problems to solve bigger and bigger problems until it solves the original problem. A table is often used to keep track of these values. BU is more space efficient than TD, unless you use tail recursion for TD.

You can solve most DP problems by the following steps:
1. Define the subproblems. Know the # of subproblems.
2. Guess a solution for what is not the subproblem (max/min of brute force permutations). Know the # of guesses.
3. Relate the subproblems to the parent problem. This is the recursive/iterative step.
4. Do the recursion and memoize, or iteratively build a table to keep track of previously used values. The subproblem dependencies form a graph; ensure it is a DAG (i.e. it has a valid topological order and no subproblem depends on its parent problems).
5. Solve the original problem (θ(#subproblems ∗ time/subproblem)).

Choosing Subproblems (i is current problem):
• Suffix: x[i:] for all i - O(n) subproblems (subproblems are broken from i to end). Topological order is right to left.
• Prefix: x[:i] for all i - O(n) subproblems (subproblems are broken from start to i). Topological order is left to right.
• Substrings: x[i:j] for all i, j - O(n^2) subproblems (subproblems are fragments of the problem, a combination of suffix and prefix). Topological order is increasing substring size.

Fibonacci O(n)
Recursive:
    memo = {}
    def fib(n):
        if n in memo: return memo[n]
        if n <= 2: f = 1  # fib(1) = fib(2) = 1
        else: f = fib(n-1) + fib(n-2)
        memo[n] = f
        return f

Iterative:
    def fib(n):
        table = {}
        for k in range(1, n + 1):
            if k <= 2: f = 1
            else: f = table[k-1] + table[k-2]
            table[k] = f
        return table[n]

Shortest Paths θ(VE)
For DAGs: For the shortest path from s to v, guess the last edge (u, v): take the min, over all edges coming into v from some node u, of the edge weight w(u, v) plus the prefix subproblem, the shortest path from s to u.
For General: S_k(s, v) = weight of the shortest path from s to v that uses ≤ k edges.
    S_k(s, v) = min{ S_{k-1}(s, v), min over (u,v) in E of (S_{k-1}(s, u) + w(u, v)) }
This is Bellman-Ford.

Longest Increasing Subsequence: O(n^2)
Please note that subsequences are any sequences found in another sequence whose elements are not necessarily next to each other (contiguous). Contiguous subsequences are substrings. The following algorithm starts at one side of the list and finds the max length of the sequences terminating at each given node, recursively following backlinks. Then, given all the lengths of paths terminating at each node, choose the max length. Without memoization, this solution would be exponential time.

    L = {}
    for j=1,2,...,n:
        L[j] = 1+max{L[i]:(i,j) in E}
        # The (i,j) represents all the edges that go from a node to j.
        # The max over an empty set is taken to be 0.
    return max(L)

Edit Distance (Spelling Suggestions)
This algorithm works by choosing the min of the options for every given letter. (The 3 options being adding a gap in between letters of one of the strings, adding a gap in the other string, or matching the two letters and moving on.)
ex) Snowy and Sunny have an edit distance of 3 with this configuration:
    S _ N O W Y
    S U N N _ Y

    for i = 0,1,2,...,m:
        E(i,0) = i
    for j = 1,2,...,n:
        E(0,j) = j
    for i = 1,2,...,m:
        for j = 1,2,...,n:
            E(i,j) = min{E(i-1,j)+1, E(i,j-1)+1, E(i-1,j-1)+diff(i,j)}
    return E(m,n)

Knapsack O(nW)
Items have a weight and a value; you want to maximize the value within a given weight (the amount you can carry in your knapsack).
With repetition:
    K(w) = max over items i with w_i ≤ w of {K(w − w_i) + v_i}
Without repetition:
    K(w, j) = max{K(w − w_j, j − 1) + v_j, K(w, j − 1)}

Parenthesization O(n^3)
    C(i, j) = min over i ≤ k < j of {C(i, k) + C(k + 1, j) + m_{i−1} · m_k · m_j}

Floyd-Warshall O(|V|^3)
Used for finding shortest paths in a weighted graph with positive or negative edge weights (but with no negative cycles).

    for i=1 to n:
        for j=1 to n:
            dist(i,j,0) = infinity
    for all (i,j) in E:
        dist(i,j,0) = l(i,j)
    for k = 1 to n:
        for i = 1 to n:
            for j = 1 to n:
                dist(i,j,k) = min{dist(i,k,k-1)+dist(k,j,k-1), dist(i,j,k-1)}

Traveling Salesman Problem (TSP) O(n^2 2^n)
Shortest path for visiting all nodes.

    C({1},1)=0
    for s = 2 to n:
        for all subsets S in {1,2,...,n} of size s and containing 1:
            C(S,1) = infinity
            for all j in S, j != 1:
                C(S,j) = min{C(S-{j},i)+d_ij : i in S, i != j}
    return min over j, C({1,...,n},j)+d_j1

Linear Programming
Feed into an LP solver like Simplex an Objective Function, which states whether you want to maximize or minimize the equation (max(x + 2y)), and Constraints, which are limitations on the variables of the Objective Function (x ≥ 0, y ≤ 600).
1. To turn a maximization problem into a minimization (or vice versa) just multiply the coefficients of the objective function by -1.
2. To turn an inequality constraint like sum_{i=1..n} a_i x_i ≤ b into an equation, introduce a new variable s and use sum_{i=1..n} a_i x_i + s = b, s ≥ 0 (s is known as a slack variable).
3. To change an equality constraint into inequalities, rewrite ax = b as ax ≤ b and ax ≥ b.
4. If a linear program has an unbounded value then its dual must be infeasible.

Max Flow
Construct graph G that is a simple directed graph with source s and sink t. No negative capacities are possible.
Construct a Residual Graph with forward edges for the amount of capacity that is not being used (capacity - current use) and a back edge for what is currently being used.

Ford-Fulkerson
Start with no flow on your graph. Find a path using DFS. Then create a residual graph. We can now use DFS to find a path from s to t in the residual graph. If one exists we are not optimally using our flow. We then find the edge on that path with the LEAST residual capacity - this is our bottleneck - and add that much flow onto all the edges in the path, thereby maximizing the flow of the path. Our new flow is guaranteed to be better. Create a new residual graph and repeat until no path from s to t can be found in the residual graph. An edge drops out of the search when capacity = current use, as we lose our forward edge and only have a back edge.
This algorithm will sometimes decrease flow in one area to increase flow in another. Max flow is the total flow on the incoming edges to the sink. Runtime would be O(E ∗ M), where M is the number of iterations and E is the time to find a path. With integer capacities this can also be stated as O(maxflow ∗ E). However, with irrational capacities this may not terminate. If you use BFS instead of DFS you are using the Edmonds-Karp Algorithm, which will terminate and has complexity O(V ∗ E^2), which is better for large networks.

Max Flow Min Cut Theorem
The size of the maximum flow in a network equals the capacity of the smallest (s,t)-cut, where an (s,t)-cut partitions the vertices into two disjoint groups L and R such that s (start) is in L and t (goal) is in R.

Bipartite Matching
Explained by Example: list of boys, list of girls; if a boy likes a girl, a directed edge exists from boy to girl. Is there a perfect matching? Create source node s and sink node t; s has outgoing edges to all the boys, and t has incoming edges from all the girls. Give every edge a capacity of one. A perfect matching exists if the max flow into t has size equal to the number of couples.

Computational Complexity
We use Computational Complexity to determine classifications for our algorithms to know if they are feasible.

Decision vs. Search Problems
Decision Problem: Computational problem that answers "Yes" or "No". Our input can be any possible string (binary, ASCII), and it will answer either 0 or 1 depending upon whether the solution is correct or not. This type of problem determines our classes.
Search Problem: Computational problem tasked not with deciding if a solution exists, but with finding what one is. Decision problems can be derived from Search problems, which are generally more difficult.

Classifications
• P: The set of all search problems that are solvable in a reasonable amount of time (Polynomial time).
• NP (Nondeterministic Polynomial): The set of all search problems whose solution can be checked in Polynomial time (includes P) - there might exist search problems whose solutions can not be checked in Polynomial time. A solution may not necessarily be found in a reasonable amount of time (problems whose known algorithms take 2^n or n! time can be in NP). Called NP because if you had the power to guess correctly every time it would work in Polynomial time, making it non-deterministic. Should be called "Guessing in P".
• NP-Hard: Any problem in NP can be reduced to this problem in Polynomial time, so it is at least as difficult or "hard" as any NP problem. The Halting Problem is NP-Hard and also impossible to solve in a finite amount of time, so this idea is not always practically useful for reductions. Most NP-Hard problems are NOT in NP, but those that are, are NP-Complete.
• NP-Complete: Problem that is not only as hard as every problem in NP, but is also in NP (NP-Hard and in NP). Any NP problem can be reduced to one of these in Polynomial time. It is often useful for proving the difficulty of problems. These are the hardest problems in NP and the reason why a proof of P = NP would revolutionize things.

Common NP-Complete Problems
Hard problems (NP-complete)    Easy problems (in P)
3SAT                           2SAT, HORN SAT
Traveling Salesman Problem     Minimum Spanning Tree
Longest Path                   Shortest Path
3D Matching                    Bipartite Matching
Knapsack                       Unary Knapsack
Independent Set                Independent Set on trees
Integer Linear Programming     Linear Programming
Rudrata Path                   Euler Path
Balanced Cut                   Minimum Cut

Satisfiability (SAT)
This is the prototypical NP-Complete problem that everything started from. Say you have some Boolean expression written using only AND, OR, NOT, variables, and parentheses (Example: x1 ∧ x2 ∨ x3). The SAT problem is: given any one of these expressions, is there some assignment of TRUE and FALSE values to the variables that will make the entire expression TRUE?

3SAT
This is a stricter version of the SAT problem in which the statement is divided into clauses where each clause has exactly 3 literals. (Example: (x1 ∨ x2 ∨ x3) ∧ (x4 ∨ x5 ∨ x6)). For these you want to find whether there exist values for x1 ... x6 such that the boolean expression evaluates to TRUE.

CircuitSAT
Given a circuit of logic gates with a single output and no loops, find whether there is a setting of the inputs that causes the circuit to output 1.

Integer Linear Programming
Solve a problem using a linear objective function and linear inequalities, WHILE constraining the values of the variables to integers.

Traveling Salesman Problem (TSP)
Find the shortest path in a graph that visits all the vertices exactly once before returning home. This comes from the idea of a traveling salesman wanting to efficiently visit all the cities to sell his wares.

Rudrata/Hamiltonian Cycle
Given a graph, find whether there is a cycle that passes through each vertex exactly once, or report that one doesn't exist.

Rudrata/Hamiltonian Path
Given a graph with vertices s and t, find a path starting at s and ending at t that goes through each vertex exactly once.

Independent Set
Given a graph and a number g, the aim is to find g vertices that are independent, meaning that no two of them share an edge.

Graph 3-Coloring
Given an undirected graph G = (V,E), find a valid 3-coloring C such that no two vertices sharing the same edge have the same color, or report that such a coloring doesn't exist.

Reductions
Reductions are an incredibly useful tool for either turning a search problem we don't know how to solve into one we do, or proving that a search problem can not be solved or is hard to solve by reducing a problem already known to be unsolvable or hard to it.
- We know how to solve B in a reasonable amount of time and we want to use this knowledge to solve A.
- We denote a reduction from A to B as A → B. Difficulty flows in the direction of the arrow.
- If we can reduce our unknown problem A into a known problem B then B must be as hard if not even harder to solve than A. A way to mathematically write this is A ≤p B. A thereby provides a lower bound of hardness for B.
- Reductions can be composed as well: if you can reduce A → B and B → C then A → C.
- Any search problem in NP can be reduced to an NP-Hard Problem in Polynomial Time but this is not always useful (like the Halting Problem).
- Any search problem in NP can also be reduced to an NP-Complete Problem in Polynomial Time.
- Any problem that is NP-Complete is reducible to any other problem that is also NP-Complete, which is very useful.

Reduction Tree
[figure]
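
Sketch: Edit Distance in Python
The edit distance table E(i,j) from the Edit Distance section can be filled bottom up in a few lines. This is only a minimal sketch: the function name edit_distance and the 0-indexed strings are assumptions for illustration, and diff(i,j) is taken to be 0 when the two letters match and 1 otherwise.

```python
def edit_distance(x: str, y: str) -> int:
    """Bottom-up edit distance between strings x and y."""
    m, n = len(x), len(y)
    # E[i][j] = edit distance between the prefixes x[:i] and y[:j]
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        E[i][0] = i  # delete all i letters of x[:i]
    for j in range(1, n + 1):
        E[0][j] = j  # insert all j letters of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            E[i][j] = min(E[i - 1][j] + 1,         # gap in y
                          E[i][j - 1] + 1,         # gap in x
                          E[i - 1][j - 1] + diff)  # match or substitute
    return E[m][n]
```

Calling edit_distance("SNOWY", "SUNNY") reproduces the distance of 3 from the example in that section. The table has (m+1)(n+1) entries with constant work each, so the sketch runs in O(mn).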
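
Sketch: Knapsack in Python
The two Knapsack recurrences K(w) and K(w, j) can be turned into tables as follows. This is a sketch under assumptions: the function names are hypothetical, and items is assumed to be a list of (weight, value) pairs with positive integer weights.

```python
def knapsack_with_repetition(capacity, items):
    """K(w) = max over items that fit of K(w - w_i) + v_i."""
    K = [0] * (capacity + 1)
    for w in range(1, capacity + 1):
        # default=0 covers the case where no item fits in weight w
        K[w] = max((K[w - wi] + vi for wi, vi in items if wi <= w), default=0)
    return K[capacity]


def knapsack_without_repetition(capacity, items):
    """K(w, j) = max(K(w - w_j, j - 1) + v_j, K(w, j - 1))."""
    n = len(items)
    K = [[0] * (n + 1) for _ in range(capacity + 1)]
    for j, (wj, vj) in enumerate(items, start=1):
        for w in range(capacity + 1):
            K[w][j] = K[w][j - 1]  # skip item j
            if wj <= w:            # or take item j once
                K[w][j] = max(K[w][j], K[w - wj][j - 1] + vj)
    return K[capacity][n]
```

With capacity 10 and items [(6, 30), (3, 14), (4, 16), (2, 9)], the first function returns 48 (one item of weight 6 and two of weight 2) and the second returns 46 (the items of weight 6 and 4). Both fill at most n entries per unit of weight, which matches the O(nW) bound.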
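
Sketch: Huffman Encoding in Python
The huffman(f) pseudocode from the Huffman Encoding section maps fairly directly onto Python's heapq module. The sketch below is one possible rendering, not a definitive implementation: nodes are numbered from 0 instead of 1, and the helper code_lengths is a hypothetical addition that reads off the number of bits per character from the tree.

```python
import heapq


def huffman(f):
    """Build a Huffman tree for frequencies f[0..n-1].

    Leaves are numbered 0..n-1 and internal nodes n..2n-2.
    Returns (children, freq): children[k] is the pair of children of
    internal node k, and freq[k] is the total frequency under node k.
    """
    n = len(f)
    freq = list(f)
    children = {}
    heap = [(freq[i], i) for i in range(n)]  # priority queue ordered by f
    heapq.heapify(heap)
    for k in range(n, 2 * n - 1):
        fi, i = heapq.heappop(heap)  # the two smallest frequencies
        fj, j = heapq.heappop(heap)
        children[k] = (i, j)
        freq.append(fi + fj)
        heapq.heappush(heap, (fi + fj, k))
    return children, freq


def code_lengths(children, n):
    """Depth of each leaf in the tree, i.e. bits used per character."""
    root = 2 * n - 2
    depth = {root: 0}
    # A parent always has a larger number than its children,
    # so walking k downward visits parents before children.
    for k in range(root, n - 1, -1):
        for c in children[k]:
            depth[c] = depth[k] + 1
    return [depth[i] for i in range(n)]
```

For frequencies [70, 3, 20, 37] the code lengths come out as [1, 3, 3, 2], so the most frequent character gets the shortest code.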
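
Sketch: Floyd-Warshall in Python
The dist(i,j,k) recurrence from the Floyd-Warshall section can be run with a single 2D table, because round k only reads values from round k-1 that round k does not change. The sketch below makes a few assumptions that are not in the pseudocode: vertices are numbered 0..n-1, edges is a dict mapping (i, j) to the edge length, and dist[i][i] is initialized to 0.

```python
import math


def floyd_warshall(n, edges):
    """All-pairs shortest paths; assumes there are no negative cycles."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0  # assumption: the empty path has length 0
    for (i, j), length in edges.items():
        dist[i][j] = min(dist[i][j], length)
    for k in range(n):          # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist
```

Unreachable pairs keep the value math.inf. The three nested loops give the O(|V|^3) running time, and the single table keeps the space at O(|V|^2).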
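
Sketch: Max Flow with BFS (Edmonds-Karp) in Python
The residual graph procedure from the Ford-Fulkerson section, with BFS used to find each path, can be sketched as below. The input format is an assumption: capacity is a dict of dicts where capacity[u][v] is the capacity of edge (u, v), and the function name max_flow is hypothetical.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Value of a maximum s-t flow, using BFS augmenting paths."""
    # residual[u][v] = capacity of (u, v) that is still unused;
    # back edges start at 0 and grow as flow is pushed forward.
    residual = {s: {}, t: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    flow = 0
    while True:
        # BFS from s over edges with leftover residual capacity
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # no augmenting path is left
        # Find the bottleneck: the least residual capacity on the path
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck  # forward edge loses capacity
            residual[v][u] += bottleneck  # back edge gains capacity
            v = u
        flow += bottleneck
```

The same function answers the Bipartite Matching question: give every edge capacity one, add s and t as described in that section, and compare the returned flow with the number of couples.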