NP-completeness shows that many important optimization problems are likely to be quite hard to solve
exactly. Because people still need solutions to these problems, we cannot simply give up at this point.
Here are some strategies that are used to cope with NP-completeness:
Brute-force search: Even on the fastest parallel computers this approach is viable only for the smallest
instances of these problems.
Heuristics: A heuristic is a strategy for producing a valid solution, but with no guarantee of how close
it is to optimal. This is worthwhile if all else fails.
General search methods: Powerful techniques for solving general combinatorial optimization problems,
such as branch-and-bound, A*-search, simulated annealing, and genetic algorithms.
Approximation algorithm: This is an algorithm that runs in polynomial time (ideally) and produces a
solution that is within a guaranteed factor of the optimal solution.
Given an undirected graph G = (V, E) and an integer k, does G contain a subset V′ of k vertices such
that no two vertices in V′ are adjacent to each other?
The independent set problem arises when there is some sort of selection problem with mutual
restrictions on pairs that cannot both be selected. For example, a company dinner where an
employee and his or her immediate supervisor cannot both be invited.
A literal is a variable x or its negation ¬x. A boolean formula is in 3-Conjunctive Normal Form (3-CNF) if it
is the boolean-and of clauses where each clause is the boolean-or of exactly three literals. For example,
(x1 ∨ ¬x2 ∨ x3) ∧ (¬x1 ∨ x3 ∨ x4) is in 3-CNF. 3SAT is the problem of determining whether a formula in 3-CNF is satisfiable. 3SAT is
NP-complete. We can use this fact to prove that other problems are NP-complete. We will do this with
the independent set problem.
Claim: IS is NP-complete.
The proof involves two parts. First, we need to show that IS ∈ NP. The certificate consists of the k vertices of
V′. We simply verify that for each pair of vertices u, v ∈ V′, there is no edge between them. Clearly,
this can be done in polynomial time, by an inspection of the adjacency matrix.
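The certificate check described above can be sketched as follows. This is a minimal illustration, assuming the graph is given as a 0/1 adjacency matrix; the function name `is_independent_set` and the 4-cycle example graph are hypothetical, not part of the original notes.

```python
from itertools import combinations

def is_independent_set(adj, candidate, k):
    """Verify a certificate: k distinct vertices, no two of them adjacent."""
    vertices = set(candidate)
    if len(candidate) != k or len(vertices) != k:
        return False
    # Inspect every pair once; each adjacency-matrix lookup takes O(1) time,
    # so the whole check takes O(k^2) time.
    return all(not adj[u][v] for u, v in combinations(vertices, 2))

# Hypothetical example: the 4-cycle 0-1-2-3-0.
adj = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
print(is_independent_set(adj, [0, 2], 2))  # True
print(is_independent_set(adj, [0, 1], 2))  # False
```

The check only verifies a proposed set; it does not search for one, which is exactly what membership in NP requires.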
Second, we need to establish that IS is NP-hard. This can be done by showing that some known
NP-complete problem (3SAT) is polynomial-time reducible to IS. That is, 3SAT ≤P IS.
An important aspect of reductions is that we do not attempt to solve the satisfiability problem.
Remember: it is NP-complete, and there is not likely to be any polynomial-time solution. The idea is to
translate the elements of the satisfiability problem into corresponding elements of the independent
set problem.
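The notes do not spell out the translation at this point, so the sketch below uses the standard textbook construction as an assumption: one vertex per literal occurrence, an edge between every two literals of the same clause, an edge between every pair of contradictory literals (x and ¬x), and k set to the number of clauses. Literals are encoded as nonzero integers (3 for x3, -3 for ¬x3), and the helper names are hypothetical. The brute-force checker is exponential and is included only to confirm tiny examples.

```python
from itertools import combinations

def reduce_3sat_to_is(clauses):
    """Translate a 3-CNF formula into (vertices, edges, k) for independent set."""
    # A vertex (i, j) stands for the j-th literal of the i-th clause.
    vertices = [(i, j) for i, clause in enumerate(clauses) for j in range(len(clause))]
    edges = set()
    for a, b in combinations(vertices, 2):
        same_clause = a[0] == b[0]
        contradictory = clauses[a[0]][a[1]] == -clauses[b[0]][b[1]]
        if same_clause or contradictory:
            edges.add((a, b))
    return vertices, edges, len(clauses)

def has_independent_set(vertices, edges, k):
    """Exhaustive check (exponential time), for tiny instances only."""
    for subset in combinations(vertices, k):
        if all((a, b) not in edges for a, b in combinations(subset, 2)):
            return True
    return False

# (x1 or x2 or x3) and (not x1 or not x2 or not x3) is satisfiable.
satisfiable = [[1, 2, 3], [-1, -2, -3]]
print(has_independent_set(*reduce_3sat_to_is(satisfiable)))    # True

# (x1 or x1 or x1) and (not x1 or not x1 or not x1) is not.
unsatisfiable = [[1, 1, 1], [-1, -1, -1]]
print(has_independent_set(*reduce_3sat_to_is(unsatisfiable)))  # False
```

Note that the reduction itself runs in polynomial time and never tries to decide satisfiability; only the checker used for the demonstration is expensive.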
3Col ≤P CCov
Consider the following problem that can be solved with the graph coloring approach. A tropical fish
hobbyist has six different types of fish designated by A, B, C, D, E, and F, respectively. Because of
predator-prey relationships, water conditions and size, not all fish can be kept in the same tank. The
following table shows which fish cannot be together:
These constraints can be displayed as a graph where an edge between two vertices exists if the two
species cannot be together. This is shown in Figure 9.4. For example, A cannot be with B and C; there is
an edge between A and B and between A and C.
Given these constraints, what is the smallest number of tanks needed to keep all the fish? The answer
can be found by coloring the vertices in the graph such that no two adjacent vertices have the same
color. This particular graph is 3-colorable, and therefore 3 fish tanks are enough. This is depicted in
Figure 9.5. The 3 fish tanks will hold fish as follows:
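The coloring idea can be tried out with a short brute-force sketch. Only the conflicts A-B and A-C are stated in the text; the remaining edges of the conflict graph below are assumptions made up for illustration, as is the function name `min_tanks`. The search tries k = 1, 2, 3, ... colors and returns the first k that admits a proper coloring, which is feasible only for very small graphs.

```python
from itertools import product

def min_tanks(fish, conflicts):
    """Return (k, assignment) for the smallest k that admits a proper k-coloring."""
    for k in range(1, len(fish) + 1):
        for colors in product(range(k), repeat=len(fish)):
            tank = dict(zip(fish, colors))
            # A coloring is proper when no conflicting pair shares a tank.
            if all(tank[u] != tank[v] for u, v in conflicts):
                return k, tank
    return 0, {}  # only reached when there are no fish at all

fish = ["A", "B", "C", "D", "E", "F"]
# Hypothetical conflict graph: only A-B and A-C come from the text.
conflicts = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "E"),
             ("C", "D"), ("C", "E"), ("D", "F"), ("E", "F")]

k, tank = min_tanks(fish, conflicts)
print(k)  # 3
```

On this assumed graph the triangle A, B, C rules out two tanks, so three is the minimum, in line with the 3-colorability claim above.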
Complexity Classes important for short question
Before giving all the technical definitions, let us say a bit about what the general classes look like at an
intuitive level.
Class P: This is the set of all decision problems that can be solved in polynomial time. We will generally
refer to these problems as being “easy” or “efficiently solvable”.
Class NP: This is the set of all decision problems that can be verified in polynomial time. This class
contains P as a subset. It also contains a number of problems that are believed to be very “hard” to
solve.
Class NP: The term “NP” does not mean “not polynomial”. Originally, the term meant
“non-deterministic polynomial” but it is a bit more intuitive to explain the concept from the perspective of
verification.
Class NP-hard: In spite of its name, to say that a problem is NP-hard does not mean that it is hard to
solve. Rather, it means that if we could solve this problem in polynomial time, then we could solve all NP
problems in polynomial time. Note that for a problem to be NP-hard, it does not have to be in the class NP.
Single-source shortest-path problem: Find shortest paths from a given (single) source vertex s ∈ V to
every other vertex v ∈ V in the graph G.
Single-destination shortest-paths problem: Find a shortest path to a given destination vertex t from each
vertex v. We can reduce this problem to a single-source problem by reversing the direction of each
edge in the graph.
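The edge-reversal reduction can be sketched as below. As an assumption, the single-source routine used here is Dijkstra's algorithm, which requires nonnegative edge weights; the notes do not name a particular single-source algorithm at this point. The graph representation (a dict mapping each vertex to a list of `(neighbor, weight)` pairs) and the function names are also illustrative choices.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest distances; assumes all weights are nonnegative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse_graph(graph):
    """Reverse the direction of every edge, keeping its weight."""
    reversed_graph = {}
    for u, edges in graph.items():
        for v, w in edges:
            reversed_graph.setdefault(v, []).append((u, w))
    return reversed_graph

def distances_to(graph, target):
    """Single-destination distances via one single-source run on the reversed graph."""
    return dijkstra(reverse_graph(graph), target)

graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(distances_to(graph, "c"))  # {'c': 0, 'a': 3, 'b': 2}
```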
Single-pair shortest-path problem: Find a shortest path from u to v for given vertices u and v. If we solve
the single-source problem with source vertex u, we solve this problem also. No algorithms for this
problem are known to run asymptotically faster than the best single-source algorithms in the worst
case.
In contrast, Prim’s algorithm builds the MST by adding leaves one at a time to the current tree. We start
with a root vertex r; it can be any vertex. At any time, the subset of edges A forms a single tree (in
Kruskal’s, it formed a forest). We look to add a single vertex as a leaf to the tree.
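A minimal sketch of this leaf-at-a-time growth is shown below. It assumes a connected, undirected graph stored as a dict from each vertex to a list of `(neighbor, weight)` pairs, and it uses a binary heap of candidate edges; the notes do not prescribe a data structure here, so both choices are assumptions.

```python
import heapq

def prim_mst(graph, root):
    """Grow a single tree from root, adding the lightest edge that reaches a new vertex."""
    in_tree = {root}
    tree_edges = []
    heap = [(w, root, v) for v, w in graph[root]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # both endpoints already in the tree; the edge would close a cycle
        in_tree.add(v)                    # v joins the tree as a new leaf
        tree_edges.append((u, v, w))
        for x, wx in graph[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return tree_edges

graph = {
    "a": [("b", 1), ("c", 3)],
    "b": [("a", 1), ("c", 2)],
    "c": [("a", 3), ("b", 2), ("d", 4)],
    "d": [("c", 4)],
}
print(prim_mst(graph, "a"))  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)]
```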
Kruskal’s algorithm works by adding edges in increasing order of weight (lightest edge first). If the next
edge does not induce a cycle among the current set of edges, then it is added to A. If it does, we skip it
and consider the next in order. As the algorithm runs, the edges in A induce a forest on the vertices. The
trees of this forest are eventually merged until a single tree forms containing all vertices.
The tricky part of the algorithm is how to detect whether the addition of an edge will create a cycle in A.
Suppose the edge being considered has vertices (u, v). We want a fast test that tells us whether u and v
are in the same tree of A. This can be done using the Union-Find data structure which supports the
following O(log n) operations:
Create-Set(u): create a set containing the single item u
Find(u): find the set that contains u
Union(u,v): merge the set containing u and the set containing v into a common set
In Kruskal’s algorithm, the vertices will be stored in sets. The vertices in each tree of A will be a set. The
edges in A can be stored as a simple list. Figures 8.51 through ?? demonstrate the algorithm applied to a
graph. Here is the algorithm:
A ← {}
for (each u ∈ V)
    do create_set(u)
sort E in increasing order by weight w
for each (u, v) in the sorted edge list
    do if (find(u) ≠ find(v))
        then add (u, v) to A
             union(u, v)
return A
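The pseudocode can be turned into a runnable sketch as follows. The Union-Find part uses union by size with path halving, which is one common way to obtain logarithmic (or better) operation times; that choice, the `(weight, u, v)` edge format, and the function name are assumptions for illustration.

```python
def kruskal_mst(vertices, edges):
    """edges is a list of (weight, u, v); returns the list A of chosen edges."""
    parent = {v: v for v in vertices}  # create_set(v) for every vertex
    size = {v: 1 for v in vertices}

    def find(v):
        # Walk up to the representative, shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(u, v):
        ru, rv = find(u), find(v)
        if ru == rv:
            return
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru  # attach the smaller tree under the larger one
        size[ru] += size[rv]

    A = []
    for w, u, v in sorted(edges):  # lightest edge first
        if find(u) != find(v):     # u and v are in different trees: no cycle
            A.append((u, v, w))
            union(u, v)
    return A

edges = [(3, "a", "c"), (1, "a", "b"), (4, "c", "d"), (2, "b", "c")]
print(kruskal_mst("abcd", edges))  # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 4)]
```

The edge (a, c) of weight 3 is skipped because a and c are already in the same tree when it is considered.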
An edge of E is a light edge crossing a cut if among all edges crossing the cut, it has the minimum weight.
Intuition says that since no edge crossing a respecting cut induces a cycle, the lightest
edge crossing the cut is a natural choice. The main theorem which drives both algorithms is the following:
MST Lemma: Let G = (V, E) be a connected, undirected graph with real-valued weights on the edges. Let
A be a viable subset of E (i.e., a subset of some MST). Let (S, V − S) be any cut that respects A and let (u,
v) be a light edge crossing the cut. Then the edge (u, v) is safe for A. This is illustrated in Figure 8.46.
MST Proof: It would simplify the proof if we assume that all edge weights are distinct. Let T be any MST
for G. If T contains (u, v) then we are done. This is shown in Figure 8.47 where the lightest edge (u, v)
with cost 4 has been chosen.
A common problem in communications networks and circuit design is that of connecting together a set
of nodes by a network of minimum total length. The length is the sum of the lengths of the connecting wires.
Consider, for example, laying cable in a city for cable TV.
The computational problem is called the minimum spanning tree (MST) problem. Formally, we are given
a connected, undirected graph G = (V, E). Each edge (u, v) has a numeric weight or cost. We define the cost
of a spanning tree T to be the sum of the costs of the edges in the spanning tree. A minimum spanning
tree is a spanning tree of minimum cost.
A topological sort of a DAG is a linear ordering of the vertices of the DAG such that for each edge (u, v), u
appears before v in the ordering.
Computing a topological ordering is actually quite easy, given a DFS of the DAG. For every edge (u, v) in
a DAG, the finish time of u is greater than the finish time of v (by the lemma). Thus, it suffices to output
the vertices in the reverse order of finish times.
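A minimal sketch of this procedure: run DFS, record each vertex when it finishes, and output the reverse of that list. It assumes the input is a DAG given as a dict mapping every vertex to its list of successors; the clothing example is hypothetical.

```python
def topological_sort(graph):
    """Return the vertices of a DAG in reverse order of DFS finish time."""
    visited = set()
    finished = []  # vertices in increasing order of finish time

    def dfs(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs(v)
        finished.append(u)  # u finishes after all of its descendants

    for u in graph:
        if u not in visited:
            dfs(u)
    return finished[::-1]

graph = {
    "shirt": ["tie"],
    "tie": ["jacket"],
    "trousers": ["shoes", "jacket"],
    "shoes": [],
    "jacket": [],
}
order = topological_sort(graph)
position = {v: i for i, v in enumerate(order)}
# Every edge (u, v) must have u before v in the ordering.
print(all(position[u] < position[v] for u in graph for v in graph[u]))  # True
```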
The time stamps given by DFS allow us to determine a number of things about a graph or digraph. For
example, we can determine whether the graph contains any cycles. We do this with the help of the
following two lemmas.
Lemma: Given a digraph G = (V, E), consider any DFS forest of G and consider any edge (u, v) ∈ E. If this
edge is a tree, forward or cross edge, then f[u] > f[v]. If this edge is a back edge, then f[u] ≤ f[v].
Proof: For the non-tree forward and back edges the proof follows directly from the parenthesis lemma.
For example, for a forward edge (u, v), v is a descendant of u and so v’s start-finish interval is contained
within u’s, implying that v has an earlier finish time. For a cross edge (u, v) we know that the two time
intervals are disjoint. When we were processing u, v was not white (otherwise (u, v) would be a tree
edge), implying that v was started before u. Because the intervals are disjoint, v must have also finished
before u.
Lemma: Consider a digraph G = (V, E) and any DFS forest for G. G has a cycle if and only if the DFS forest
has a back edge.
Proof: If there is a back edge (u, v) then v is an ancestor of u, and by following tree edges from v to u
and then the back edge (u, v), we get a cycle.
For the other direction, we show the contrapositive: suppose there are no back edges. By the lemma
above, each of the remaining types of edges (tree, forward, and cross) goes from a vertex with
higher finishing time to a vertex with lower finishing time. Thus along any path, finish times decrease
monotonically, implying there can be no cycle.
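The lemma gives a direct cycle test: run DFS and report a cycle as soon as a back edge is seen. The sketch below assumes the usual white/gray/black vertex coloring, where an edge into a gray vertex (one that is started but not finished) is a back edge; the function name and the example graphs are illustrative.

```python
WHITE, GRAY, BLACK = 0, 1, 2

def has_cycle(graph):
    """Return True if the digraph (dict: vertex -> successors) has a back edge."""
    color = {u: WHITE for u in graph}

    def dfs(u):
        color[u] = GRAY  # u is started but not yet finished
        for v in graph[u]:
            if color[v] == GRAY:
                return True  # (u, v) is a back edge: v is an ancestor of u
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK  # u is finished
        return False

    return any(color[u] == WHITE and dfs(u) for u in graph)

print(has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))     # True
print(has_cycle({"a": ["b", "c"], "b": ["c"], "c": []}))   # False
```

In the second graph the edge (a, c) is a forward edge, not a back edge, so no cycle is reported.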
Imp short
Cross edge: (u, v) where u and v are not ancestors or descendants of one another. In fact, the edge may
go between different trees of the forest.
Difference b/w Breadth-first Search and Depth-first Search imp for short and long
Breadth-first Search
Here is a more efficient algorithm called breadth-first search (BFS). Start with s and visit its adjacent
nodes. Label them with distance 1. Now consider the neighbors of neighbors of s. These would be at
distance 2. Now consider the neighbors of neighbors of neighbors of s. These would be at distance 3.
Repeat this until there are no more unvisited neighbors left to visit. The algorithm can be visualized as a
wave front propagating outwards from s, visiting the vertices in bands at ever increasing distances from s.
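The wave-front description can be sketched with a FIFO queue, which is the usual way to process vertices band by band. The adjacency-list dict and the function name are assumptions for illustration.

```python
from collections import deque

def bfs_distances(graph, s):
    """Label every vertex reachable from s with its distance (number of edges) from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:  # v has not been visited yet
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

graph = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
print(bfs_distances(graph, "s"))  # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 3}
```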
Depth-first Search
Breadth-first search is one instance of a general family of graph traversal algorithms. Traversing a graph
means visiting every node in the graph. Another traversal strategy is depth-first search (DFS). The DFS
procedure can be written recursively or non-recursively. Both versions are passed s initially.
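Both styles can be sketched as below, assuming an adjacency-list dict. The non-recursive version keeps an explicit stack and pushes neighbors in reverse so that they are explored in adjacency-list order; the function names are hypothetical.

```python
def dfs_recursive(graph, s):
    """Return the vertices reachable from s in the order DFS first visits them."""
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        order.append(u)
        for v in graph[u]:
            if v not in seen:
                visit(v)

    visit(s)
    return order

def dfs_iterative(graph, s):
    """Same traversal with an explicit stack instead of recursion."""
    order, seen = [], set()
    stack = [s]
    while stack:
        u = stack.pop()
        if u in seen:
            continue  # u was reached earlier through another path
        seen.add(u)
        order.append(u)
        stack.extend(reversed(graph[u]))  # first neighbor ends up on top
    return order

graph = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
print(dfs_recursive(graph, "s"))  # ['s', 'a', 'c', 'b', 'd']
print(dfs_iterative(graph, "s"))  # ['s', 'a', 'c', 'b', 'd']
```

Unlike BFS, which would visit b right after a, DFS goes deep along s, a, c before backing up.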
Let ρi = vi/wi denote the value per unit weight ratio for item i. Sort the items in decreasing order of ρi.
Add items in decreasing order of ρi. If the item fits, we take it all. At some point there is an item that
does not fit in the remaining space. We take as much of this item as possible, thus filling the knapsack.
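The greedy rule above can be sketched as follows, assuming items are given as `(value, weight)` pairs with positive weights; the function name and the sample numbers are illustrative only.

```python
def fractional_knapsack(items, capacity):
    """items: list of (value, weight). Returns (total value, fraction taken of each item)."""
    # Sort item indices in decreasing order of value per unit weight.
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1],
                   reverse=True)
    fractions = [0.0] * len(items)
    total = 0.0
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        value, weight = items[i]
        take = min(weight, remaining)  # whole item if it fits, else what is left
        fractions[i] = take / weight
        total += value * take / weight
        remaining -= take
    return total, fractions

items = [(60, 10), (100, 20), (120, 30)]  # ratios 6, 5 and 4
total, fractions = fractional_knapsack(items, 50)
print(total)  # 240.0
```

Here the first two items are taken whole and two thirds of the third item fills the remaining 20 units of capacity.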
Huffman Encoding Algorithm complete with correctness imp for short and long
Suppose you want to count out a certain amount of money, using the fewest possible bills (notes) and
coins. A greedy algorithm to do this would be: at each step, take the largest possible note or coin that
does not overshoot.
Consider the currency in the U.S.A. There are paper notes for one dollar, five dollars, ten dollars, twenty
dollars, fifty dollars and one hundred dollars. The notes are also called “bills”. The coins are one cent, five
cents (called a “nickel”), ten cents (called a “dime”) and twenty-five cents (a “quarter”). In Pakistan, the
currency notes are five rupees, ten rupees, fifty rupees, hundred rupees, five hundred rupees and
thousand rupees. The coins are one rupee and two rupees. Suppose you are asked to give change of
$6.39 (six dollars and thirty-nine cents). You can choose:
• a $5 note
• a $1 note, to make $6
• a 25 cents coin (quarter), to make $6.25
• a 10 cents coin (dime), to make $6.35
• four 1 cent coins, to make $6.39
Notice how we started with the highest note, $5, before moving to the next lower denomination.
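The greedy procedure for the $6.39 example can be sketched as below. Amounts are expressed in cents to avoid floating-point issues, and the sketch assumes a denomination of 1 is available so that change can always be completed; the function name is hypothetical.

```python
def greedy_change(amount, denominations):
    """Repeatedly take the largest denomination that does not overshoot."""
    chosen = []
    for d in sorted(denominations, reverse=True):
        count, amount = divmod(amount, d)  # how many of d fit, and what is left
        chosen.extend([d] * count)
    return chosen

# U.S. notes and coins from the text, in cents.
us = [10000, 5000, 2000, 1000, 500, 100, 25, 10, 5, 1]
print(greedy_change(639, us))        # [500, 100, 25, 10, 1, 1, 1, 1]

# Greedy is not always optimal: with denominations 1, 3 and 4,
# it pays 6 as 4 + 1 + 1 although 3 + 3 uses fewer coins.
print(greedy_change(6, [1, 3, 4]))   # [4, 1, 1]
```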
Formally, the Coin Change problem is: Given k denominations d1, d2, . . . , dk and given N, find a way of
writing
N = i1·d1 + i2·d2 + · · · + ik·dk
such that
i1 + i2 + · · · + ik is minimized.
The greedy approach gives us an optimal solution when the coins are all powers of a fixed
denomination D:
N = i0·D^0 + i1·D^1 + i2·D^2 + · · · + ik·D^k
Note that this is N represented in base D. U.S.A. coins are multiples of 5: 5 cents, 10 cents and 25 cents.
What is Greedy Algorithms and its two phases imp for short &long
An optimization problem is one in which you want to find, not just a solution, but the best solution.
Search techniques look at many possible solutions, e.g. dynamic programming or backtrack search. A
“greedy algorithm” sometimes works well for optimization problems:
• You take the best you can get right now, without regard for future consequences.
• You hope that by choosing a local optimum at each step, you will end up at a global optimum.
For some problems, the greedy approach always finds an optimum. For others, greedy finds good, but not
always best, solutions. If so, it is called a greedy heuristic, or approximation. For still others, the greedy
approach can do very poorly.
