Chapter One: Analysis of Algorithms
Introduction
An algorithm, named after the ninth-century scholar Abu Ja'far Muhammad ibn Musa Al-Khowarizmi, can be defined, roughly speaking, in any of the following ways:
• An algorithm is a set of rules for carrying out calculation either by hand or on a machine.
• An algorithm is a finite step-by-step procedure to achieve a required result.
• An algorithm is a sequence of computational steps that transform the input into the output.
• An algorithm is a sequence of operations performed on data that have to be organized in data structures.
• An algorithm is an abstraction of a program to be executed on a physical machine (model of computation).
The most famous algorithm in history dates well before the time of the ancient Greeks:
this is Euclid's algorithm for calculating the greatest common divisor of two integers.
Properties of an algorithm
• Finiteness: Algorithm must complete after a finite number of steps.
• Definiteness: Each step must be clearly defined, having one and only one interpretation. At
each point in computation, one should be able to tell exactly what happens next.
• Sequence: Each step must have a unique defined preceding and succeeding step. The first
step (start step) and last step (halt step) must be clearly noted.
• Feasibility: It must be possible to perform each instruction.
• Correctness: It must compute the correct answer for all possible legal inputs.
• Language Independence: It must not depend on any one programming language.
• Completeness: It must solve the problem completely.
• Effectiveness: It must be possible to perform each step exactly and in a finite amount of time.
• Efficiency: It must solve with the least amount of computational resources such as time and
space.
• Generality: The algorithm should be valid for all possible inputs.
• Input/Output: There must be a specified number of input values, and one or more result
values.
Chapter Objectives: At the end of this chapter you should be able to:
1.1 Performance Analysis
By the performance of a program, we mean the amount of computer memory and time needed to run it.
There are two approaches to determining the performance of a program: an analytical approach (performance analysis) and an experimental approach (performance measurement).
What is analysis? The analysis of an algorithm provides background information that gives a general idea of how long the algorithm will take for a given input set. For each algorithm considered, we will come up with an estimate of how long it will take to solve a problem that has a set of N input values. For example:
i. We might determine how many comparisons a sorting algorithm does to put a list of N values into ascending order, or
ii. We might determine how many arithmetic operations it takes to multiply two matrices of size N x N.
There are usually a number of algorithms that will solve a given problem. Studying the analysis of algorithms gives us the tools to choose between them.
Input Classes
Two important measures of an algorithm's performance are its time and space complexity, measured as functions of the size of the input. Input plays an important role in analysing algorithms because it is the input that determines the path of execution through an algorithm. For example, if we are interested in finding the largest value in a list of N numbers, we can use the following algorithm:
largest = list[1]
for i=2 to N do
if(list[i] > largest) then
largest = list[i]
end if
end for
We can see that if the list is in decreasing order, only one assignment is done before the loop starts and none inside it. If the list is in increasing order, however, there will be N assignments (one before the loop starts and N−1 inside the loop).
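As a small illustration (a sketch only, not part of the original pseudocode), the same algorithm can be written in C++ with a counter so that the number of assignments to largest can be observed for a decreasing and an increasing input; the function name findLargest and the assignments parameter are introduced just for this example.

#include <iostream>
#include <vector>

// Returns the largest value in list and reports how many times the
// variable 'largest' is assigned (1 for decreasing input, N for increasing).
int findLargest(const std::vector<int>& list, int& assignments) {
    assignments = 1;            // the assignment before the loop starts
    int largest = list[0];
    for (std::size_t i = 1; i < list.size(); ++i) {
        if (list[i] > largest) {
            largest = list[i];
            ++assignments;      // one more assignment inside the loop
        }
    }
    return largest;
}

int main() {
    int count = 0;
    findLargest({5, 4, 3, 2, 1}, count);
    std::cout << "decreasing input: " << count << " assignments\n";  // 1
    findLargest({1, 2, 3, 4, 5}, count);
    std::cout << "increasing input: " << count << " assignments\n";  // 5
    return 0;
}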
So, our analysis must consider more than one possible input set, because if we look at only one set, it may be the set that is solved fastest (or slowest), which would give a false impression of the algorithm. Instead, we consider all types of input sets.
When we look at the input, we will try to break up all the different input sets into classes based on how the algorithm behaves on each set. This helps to reduce the number of possibilities we need to consider.
The complexity of an algorithm can be expressed in terms of its space and time complexity.
Space Complexity of a program: - is the amount of memory space it needs to run to completion. We are
interested in the space complexity of a program for the following reasons:
• If a program is to run on a multiuser computer system, then we may need to specify the amount of
memory to be allocated for the program.
• For any computer system, we would like to know in advance whether or not sufficient memory is
available to run the program.
• A problem might have several possible solutions with different space requirements.
• We can use the space complexity to estimate the size of the largest problem that a program can solve. For example, we may have a circuit simulation program that requires 10^6 + 100(c + w) bytes of memory to simulate circuits with c components and w wires. If the total amount of memory available is 4 * 10^6 bytes, then we can simulate circuits with c + w ≤ 30,000.
Components of space complexity: - the space needed by a program has the following components:
• Instruction space: - is the amount of space needed to store the compiled version of the
program instructions. The amount of instruction space that is needed depends on factors such
as:
- The compiler used to compile the program into machine code.
- The compiler options in effect at time of compilation.
- The target computer.
• Data space: - is the space needed to store all constant and variable values. Data space has two
components:
1. Space needed by constants (e.g. the numbers 0, 1, 2...) and simple variables.
2. Space needed by dynamically allocated objects such as arrays and class instances.
• Environmental stack space:- is the space used to save information needed to resume
execution of partially completed methods (functions). For example, if method func1 invokes
or calls method func2, then we must at least save a pointer to the instruction of func1 to be
executed when func2 terminates. Each time a function or method is called or invoked, the
following data are stored on the environment stack.
1. The return address
2. The values of all local variables and formal parameters in the method or function
being invoked or called.
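To make the func1/func2 example concrete, here is a small hypothetical C++ sketch; the function names and values are invented for illustration, and the comments point out what must be preserved on the run-time stack while the inner call executes.

#include <iostream>

// While func2 runs, the return address back into func1 and func1's
// local variable x occupy space on the environment (call) stack.
int func2(int n) {
    return n * 2;
}

int func1(int n) {
    int x = n + 1;          // local variable saved across the call below
    int y = func2(x);       // call: return address and locals are stacked
    return x + y;
}

int main() {
    std::cout << func1(3) << std::endl;   // prints 12
    return 0;
}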
Time Complexity of a program: - is the amount of computer time it needs to run to completion. We are interested in the time complexity of a program for the following reasons:
Some computer systems require the user to provide an upper limit on the amount of time the program
will run. Once this upper limit is reached, the program is aborted. An easy way out is to simply specify
a time limit of a few thousand years. However, this solution could result in serious fiscal problems if
the program runs into an infinite loop caused by some discrepancy in the data and you actually get
billed for the computer time used. We would like to provide a time limit that is just slightly above the
expected run time.
The program we are developing might need to provide a satisfactory real-time response. For example,
all interactive programs must provide such a response. A text editor that takes a minute to move the
cursor one page down or one page up will not be acceptable to many users. A spreadsheet program
that takes several minutes to re-evaluate the cells in a sheet will be satisfactory only to patient users. A
database management system that allows its users adequate time to drink two cups of coffee while it is
sorting a relation will not find too much acceptance. Programs designed for interactive use must
provide satisfactory real-time response. From the time complexity of the program or program module,
we can decide whether or not the response time will be acceptable. If not, we need to either redesign
the algorithm or give the user a faster computer.
If we have alternative ways to solve a problem, then the decision on which to use will be based primarily on
the expected performance difference among those solutions. We will use some weighted measures of the space
and time complexities of the alternative solutions.
Components of time complexity: - the time complexity of a program depends on all the factors that the space
complexity depends on.
Since an analytical approach to determining the exact run time of a program is fraught with complications, we attempt only to estimate the run time. Two more manageable approaches to estimating the run time are:
1. Identify one or more key operations and determine the number of times these are performed, and
2. Determine the total number of steps executed by the program.
Deciding what to count involves two steps. The first is choosing the significant operation or operations, and
the second is deciding which of these operations are integral to the algorithm and which are overhead or
bookkeeping. There are two classes of operations that are typically chosen for the significant operations. These
significant classes of operations are comparison operations and arithmetic operations.
- The comparison operators are all considered equivalent and we count them in algorithms such as
searching and sorting. In these algorithms, the important task being done is the comparison of two
values to determine, when searching, if the value is the one we are looking for or, when sorting, if the
values are out of order. Comparison operators include equal, not equal, less than, greater than, less
than or equal, and greater than or equal.
- We will count arithmetic operators in two groups: Additive and Multiplicative. Additive operators
(usually called additions for short) include addition, subtraction, increment, and decrement.
Multiplicative operators (usually called multiplications for short) include multiplication, division, and
modulus.
These two groups are counted separately because multiplications are considered to take longer than additions.
In case of integer multiplication or division by a power of 2, this operation can be reduced to a shift operation,
which is considered as fast as addition.
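As a quick illustration of that last remark (a sketch only, valid for non-negative integers), multiplication, division, and modulus by a power of 2 can be rewritten as shift and mask operations:

#include <iostream>

int main() {
    int n = 13;
    // Multiplication and division by 2^3 rewritten as shifts.
    std::cout << (n * 8) << " == " << (n << 3) << std::endl;   // 104 == 104
    std::cout << (n / 8) << " == " << (n >> 3) << std::endl;   // 1 == 1
    // Modulus by a power of 2 can likewise use a bit mask.
    std::cout << (n % 8) << " == " << (n & 7) << std::endl;    // 5 == 5
    return 0;
}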
Cases to consider
Choosing what input to consider when analysing an algorithm can have a significant impact on how the algorithm appears to perform. If the input is already sorted, some sorting algorithms will perform very well, but other sorting algorithms may perform very poorly. The opposite may be true if the list is randomly arranged instead of sorted. Because of this, we will not consider just one input set when we analyse an algorithm. In fact, we will look for those input sets that allow an algorithm to perform most quickly and most slowly, and we will also consider the overall average performance of the algorithm.
Best Case
As its name indicates, the best case for an algorithm is the input that requires the algorithm to take the shortest
time. This input is the combination of values that cause the algorithm to do the least amount of work.
For example, if we are looking at a searching algorithm, the best case would be if the value we are searching
for (commonly called the target or key) was the value stored in the first location that the searching algorithm
will check. This would then require only one comparison no matter how complex the algorithm is. Notice that
for searching through a list of values, no matter how large, the best case will result in a constant time of 1.
Because the best case for an algorithm will usually be a very small and frequently constant value, we will not
do a best-case analysis very frequently.
Worst Case
Worst case is an important analysis because it gives us an idea of the most an algorithm will ever take. Worst-
case analysis requires that we identify the input values that cause an algorithm to do the most work.
For example, for searching algorithms, the worst case is one where the value is in the last place we check or is
not in the list. This could involve comparing the key to each list value for a total of N comparisons. The worst
case gives us an upper bound on how slowly parts of our programs may work based on our algorithm choices.
Average Case
The average-case analysis is the toughest to do because there are a lot of details involved. The basic process
begins by determining the number of different groups into which all possible input sets can be divided. The
second step is to determine the probability that the input will come from each of these groups. The third step is
to determine how long the algorithm will run for each of these groups. All of the input in each group should
take the same amount of time, and if they do not, the group must be split into two separate groups. When all of
this has been done, the average case time is given by the following formula:
A(n) = ∑_{i=1}^{m} p_i * t_i
where n is the size of the input, m is the number of groups, p_i is the probability that the input will be from group i, and t_i is the time that the algorithm takes for input from group i.
In some cases, we will consider that each of the input groups has equal probabilities. In other words, if there
are five input groups, the chance the input will be in group 1 is the same as the chance for group 2, and so on.
This would mean that for these five groups each probability would be 0.2. We could calculate the average case by the above formula, or we could note that the following simplified formula is equivalent in the case where all groups are equally probable:
A(n) = (1/m) * ∑_{i=1}^{m} t_i
There are some mathematical concepts that will be useful in analysis of algorithms.
Floor of a number: we say the floor of X (written ⌊X⌋) is the largest integer that is less than or equal to X. So, ⌊2.5⌋ would be 2 and ⌊−7.3⌋ would be −8.
Ceiling of a given number: we say that the ceiling of X (written ⌈X⌉) is the smallest integer greater than or equal to X. So, ⌈2.5⌉ would be 3 and ⌈−7.3⌉ would be −7.
Because we would be using just positive numbers, we can think of the floor as truncation and the
ceiling as rounding up. For negative numbers, the effect is reversed.
The floor and ceiling will be used when we need to determine how many times something is done;
the value depends on some fraction of the items it is done to.
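A quick numerical check of these definitions, using the standard library functions std::floor and std::ceil (a sketch only):

#include <cmath>
#include <iostream>

int main() {
    std::cout << std::floor(2.5)  << std::endl;   // 2  : largest integer <= 2.5
    std::cout << std::floor(-7.3) << std::endl;   // -8 : largest integer <= -7.3
    std::cout << std::ceil(2.5)   << std::endl;   // 3  : smallest integer >= 2.5
    std::cout << std::ceil(-7.3)  << std::endl;   // -7 : smallest integer >= -7.3
    return 0;
}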
Logarithms
Because logarithms play an important role in our analysis, there are some properties that must be discussed.
The logarithm base B of a number X is the power to which B must be raised to produce X. Logarithms are strictly increasing functions: given two numbers X and Y, if X < Y then log_B X < log_B Y for every base B > 1. Logarithms are also one-to-one functions: if log_B X = log_B Y, then X = Y. Other properties that are important for you are listed below (a short numerical check follows the list):
log_B 1 = 0
log_B B = 1
log_B (X * Y) = log_B X + log_B Y
log_B X^Y = Y * log_B X
log_A X = log_B X / log_B A
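These identities can be spot-checked numerically; the sketch below uses std::log2 from <cmath> and a small helper log_base (a name introduced only here) that applies the change-of-base rule.

#include <cmath>
#include <iostream>

// log base b of x via the change-of-base identity.
double log_base(double b, double x) {
    return std::log2(x) / std::log2(b);
}

int main() {
    double x = 8.0, y = 4.0, b = 2.0;
    std::cout << log_base(b, 1.0)    << std::endl;   // 0   (log_B 1 = 0)
    std::cout << log_base(b, b)      << std::endl;   // 1   (log_B B = 1)
    std::cout << log_base(b, x * y)  << std::endl;   // 5 = log2 8 + log2 4
    std::cout << y * log_base(b, x)  << std::endl;   // 12 = log2(8^4)
    std::cout << log_base(8.0, 64.0) << std::endl;   // 2 = log2 64 / log2 8
    return 0;
}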
Probabilities
Because we will analyse algorithms relative to their input size, we may at times need to consider the
likelihood of a certain set of input. This means that we will need to work with the probability that the
input will meet some condition. The probability that something will occur is given as a number in the range 0 to 1, where 0 means it will never occur and 1 means it will always occur.
If we know that there are exactly 10 different possible inputs, we can say that the probability of each
of these is between 0 and 1, and the total of all of the individual probabilities is 1, because one of
these must happen. If there is an equal chance that any of these can occur, each will have a
probability of 0.1 (one out of 10 or 1/10).
For most of our analysis, we will first determine how many possible situations there are and then assume that all are equally likely. If we determine that there are N possible situations, this results in a probability of 1/N for each of these situations.
Summations
We will be adding up sets of values as we analyse our algorithms. Let's say we have an algorithm with a loop. We notice that when the loop variable is 5, we do 5 steps, and when it is 20, we do 20 steps. We determine in general that when the loop variable is m, we do m steps. Overall, the loop variable will take on all values from 1 to N, so the total number of steps is the sum of the values from 1 through N. To express this easily, we use the notation ∑_{i=1}^{N} i. The expression below the ∑ gives the initial value of the summation variable and the value above the ∑ gives its ending value.
Once we have expressed some solution in terms of this summation notation, we will want to simplify it so that we can make comparisons with other formulas. The following standard summation formulas give the actual values these summations represent (a short program after the list checks a few of them).
o ∑_{i=1}^{N} c*i = c * ∑_{i=1}^{N} i, with c a constant expression not dependent on i
o ∑_{i=c}^{N} i = ∑_{i=0}^{N−c} (c + i)
o ∑_{i=c}^{N} i = ∑_{i=0}^{N} i − ∑_{i=0}^{c−1} i
o ∑_{i=1}^{N} (A + B) = ∑_{i=1}^{N} A + ∑_{i=1}^{N} B
o ∑_{i=0}^{N} (N − i) = ∑_{i=0}^{N} i
o ∑_{i=1}^{N} 1 = N
o ∑_{i=1}^{N} C = C * N
o ∑_{i=1}^{N} i = N(N+1)/2
o ∑_{i=1}^{N} i² = N(N+1)(2N+1)/6 = (2N³ + 3N² + N)/6
o ∑_{i=0}^{N} 2^i = 2^(N+1) − 1
o ∑_{i=0}^{N} A^i = (A^(N+1) − 1)/(A − 1), for some number A ≠ 1
o ∑_{i=1}^{N} i*2^i = (N − 1)*2^(N+1) + 2
o ∑_{i=1}^{N} 1/i ≈ ln N
o ∑_{i=1}^{N} log₂ i ≈ N log₂ N − 1.5N
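A few of these closed forms are easy to sanity-check with a short program; the sketch below compares brute-force sums against three of the formulas for an arbitrarily chosen N.

#include <iostream>

int main() {
    const long long N = 20;
    long long s1 = 0, s2 = 0, s3 = 0;
    for (long long i = 1; i <= N; ++i) {
        s1 += i;          // sum of i
        s2 += i * i;      // sum of i^2
    }
    for (long long i = 0; i <= N; ++i)
        s3 += 1LL << i;   // sum of 2^i
    std::cout << s1 << " == " << N * (N + 1) / 2 << std::endl;
    std::cout << s2 << " == " << N * (N + 1) * (2 * N + 1) / 6 << std::endl;
    std::cout << s3 << " == " << (1LL << (N + 1)) - 1 << std::endl;
    return 0;
}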
Chapter Two: Complexity Analysis
2.1 Rates of Growth
Complexity analysis involves two distinct phases:
Algorithm Analysis: Analysis of the algorithm or data structure to produce a function T (n) that
describes the algorithm in terms of the operations performed in order to measure the complexity of the
algorithm.
Order of Magnitude Analysis: Analysis of the function T (n) to determine the general complexity
category to which it belongs.
There is no generally accepted set of rules for algorithm analysis. However, an exact count of operations is
commonly used.
Analysis Rules:
1. We assume an arbitrary time unit.
2. Execution of one of the following operations takes time 1:
Assignment Operation
Single Input/output Operation
Single Boolean Operations
Single Arithmetic Operations
Function Return
3. Running time of a selection statement (if, switch) is the time for the condition evaluation plus
the maximum of the running times for the individual clauses in the selection.
4. Loops: Running time for a loop is equal to the running time for the statements inside the loop
multiplied by the number of iterations.
The total running time of a statement inside a group of nested loops is the running time of the
statements multiplied by the product of the sizes of all the loops.
For nested loops, analyze inside out.
Always assume that the loop executes the maximum number of iterations possible.
5. Running time of a function call is 1 for setup plus the time for any parameter calculations plus
the time required for the execution of the function body.
Examples

Example 1:

int count()
{
    int i, n;
    int k = 0;
    cout << "Enter an integer";
    cin >> n;
    for (i = 0; i < n; i++)
        k = k + 1;
    return 0;
}

Time Units to Compute
------------------------------------------------------------
1 for the assignment statement: int k=0
1 for the output statement.
1 for the input statement.
In the for loop:
    1 assignment, n+1 tests, and n increments.
    n loops of 2 units for an assignment and an addition.
1 for the return statement.
------------------------------------------------------------
T(n) = 1+1+1+(1+n+1+n)+2n+1 = 4n+6 = O(n)

Example 2:

int total(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum = sum + 1;
    return sum;
}

Time Units to Compute
------------------------------------------------------------
1 for the assignment statement: int sum=0
In the for loop:
    1 assignment, n+1 tests, and n increments.
    n loops of 2 units for an assignment and an addition.
1 for the return statement.
------------------------------------------------------------
T(n) = 1+(1+n+1+n)+2n+1 = 4n+4 = O(n)
Formal Approach to Analysis
In the above examples we have seen that analysis is a bit complex. However, it can be simplified by using
some formal approach in which case we can ignore initializations, loop control, and book keeping.
In general, a for loop translates to a summation. The index and bounds of the summation are the same
as the index and bounds of the for loop.
for (int i = 1; i <= N; i++) {
    sum = sum + i;
}

∑_{i=1}^{N} 1 = N
Suppose we count the number of additions that are done. There is 1 addition per iteration of the loop,
hence N additions in total.
Nested for loops translate into multiple summations, one for each for loop.
for (int i = 1; i <= N; i++) {
    for (int j = 1; j <= M; j++) {
        sum = sum + i + j;
    }
}

∑_{i=1}^{N} ∑_{j=1}^{M} 2 = ∑_{i=1}^{N} 2M = 2MN
Again, count the number of additions. The outer summation is for the outer for loop.
Conditionals: Formally
• If (test) s1 else s2: Compute the maximum of the running time for s1 and s2.
if (test == 1) {
    for (int i = 1; i <= N; i++) {
        sum = sum + i;
    }
}
else {
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= N; j++) {
            sum = sum + i + j;
        }
    }
}

max(∑_{i=1}^{N} 1, ∑_{i=1}^{N} ∑_{j=1}^{N} 2) = max(N, 2N²) = 2N²
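The loop rules above can also be checked by instrumenting the loops themselves; the sketch below counts the additions actually executed and compares the counts with the formulas N and 2MN (the values of N and M are arbitrary).

#include <iostream>

int main() {
    const int N = 10, M = 7;
    long long sum = 0, count = 0;

    // Single loop: one addition per iteration, N in total.
    for (int i = 1; i <= N; ++i) { sum = sum + i; ++count; }
    std::cout << count << " == " << N << std::endl;

    // Nested loops: two additions per inner iteration, 2*M*N in total.
    count = 0;
    for (int i = 1; i <= N; ++i)
        for (int j = 1; j <= M; ++j) { sum = sum + i + j; count += 2; }
    std::cout << count << " == " << 2 * M * N << std::endl;
    return 0;
}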
Measures of Times
In order to determine the running time of an algorithm it is possible to define three functions Tbest(n), Tavg(n)
and Tworst(n) as the best, the average and the worst case running time of the algorithm respectively.
Average Case (Tavg): The amount of time the algorithm takes on an "average" set of inputs.
Worst Case (Tworst): The amount of time the algorithm takes on the worst possible set of inputs.
Best Case (Tbest): The amount of time the algorithm takes on the best possible set of inputs.
We are interested in the worst-case time, since it provides a bound for all input – this is called the “Big-Oh”
estimate.
Asymptotic Analysis
In the analysis of algorithms, it is not important to know exactly how many operations an algorithm does. Of greater concern is the rate of increase in operations for an algorithm to solve a problem as the size of the problem increases. This is referred to as the rate of growth of the algorithm. What happens with small sets of input data is not as interesting as what happens when the data set gets large.
The rate of growth of an algorithm is dominated by the largest term in an equation; we will discard the terms
that grow more slowly. When we strip all of these things away, we are left with what we call the order of the
function or related algorithm. We then group algorithms based on their order. We group in three categories –
those that grow at least as fast as some function, those that grow at the same rate, and those that grow no
faster.
Asymptotic analysis is concerned with how the running time of an algorithm increases with the size of the
input in the limit, as the size of the input increases without bound.
There are five notations used to describe a running time function: big-oh (O), big-omega (Ω), big-theta (Θ), little-o (o), and little-omega (ω).
Big-Oh Notation
Big-Oh notation is a way of comparing algorithms and is used for computing the complexity of algorithms, i.e., the amount of time that it takes for a computer program to run. It is only concerned with what happens for very large values of n, so only the largest term in the expression (function) is needed. For example, if the number of operations in an algorithm is n² − n, then n is insignificant compared to n² for large values of n, and the n term can be ignored. Of course, for small values of n it may be important; however, Big-Oh is mainly concerned with large values of n.
Formal Definition:
We use O(f), called big-oh, to represent the class of functions that grow no faster than f. This means that for all values of n greater than some threshold n₀, all of the functions in O(f) have values that are no greater than a constant multiple of f. The class O(f) has f as an upper bound, so none of the functions in this class grow faster than f. Formally this means that if g(n) ∈ O(f), then g(n) ≤ c * f(n) for all n ≥ n₀ (where c is a positive constant).
i.e. f(n) = O(g(n)) if there exist constants c, n₀ ∈ R⁺ such that for all n ≥ n₀, f(n) ≤ c * g(n).
Examples: The following facts can be used in Big-Oh problems:
1 ≤ n for all n ≥ 1
n ≤ n² for all n ≥ 1
2^n ≤ n! for all n ≥ 4
Example: show that f(n) = 10n + 5 is O(n). To show this, we must find constants c and n₀ such that f(n) ≤ c * n for all n ≥ n₀. Since 10n + 5 ≤ 10n + 5n = 15n for all n ≥ 1, the constants (c = 15, n₀ = 1) work.
∴ f(n) = 10n + 5 is O(g(n)) with g(n) = n.
Typical Orders
Here is a table of some typical cases. It uses logarithms to base 2, but these are simply proportional to logarithms in any other base.
Demonstrating that a function f(n) is big-O of a function g(n) requires that we find specific constants c and
𝒏𝟎 for which the inequality holds (and show that the inequality does in fact hold).
Big-Oh expresses an upper bound on the growth rate of a function, for sufficiently large values of n.
An upper bound is the best algorithmic solution that has been found for a problem.
In simple words, f(n) = O(g(n)) means that the growth rate of f(n) is less than or equal to that of g(n).
The big-oh notation gives an upper bound for a function to within a constant factor. We write
𝑓(𝑛) = 𝑂(𝑔(𝑛)) if there are positive constants n0 and c such that to the right of n0, the value of f(n)
always lies on or below cg(n).
Big-O Theorems
For all the following theorems, assume that f(n) is a function of n and that k is an arbitrary
constant.
Theorem 1: k is O(1).
Theorem 2: A polynomial is O(the term containing the highest power of n).
The following functions are listed in order of increasing growth rate; each is big-O of the functions that follow it:
k, log_b n, n, n log_b n, n², n to higher powers, 2^n, 3^n, larger constants to the nth power, n!, n^n.
• n^r is O(n^s) if 0 ≤ r ≤ s.
Big-Omega Notation
Just as O-notation provides an asymptotic upper bound on a function, Ω-notation provides an asymptotic lower bound.
Formal Definition: A function f(n) is Ω(g(n)) if there exist constants c and k ∈ R⁺ such that f(n) ≥ c * g(n) for all n ≥ k. f(n) = Ω(g(n)) means that f(n) is greater than or equal to some constant multiple of g(n) for all values of n greater than or equal to some k.
Theta Notation
A function f(n) belongs to the set Θ(g(n)) if there exist positive constants c₁ and c₂ such that it can be sandwiched between c₁*g(n) and c₂*g(n) for sufficiently large values of n.
Formal Definition: A function f(n) is Θ(g(n)) if it is both O(g(n)) and Ω(g(n)). In other words, there exist constants c₁, c₂, and k > 0 such that
c₁*g(n) ≤ f(n) ≤ c₂*g(n) for all n ≥ k.
If f(n) = Θ(g(n)), then g(n) is an asymptotically tight bound for f(n). In simple terms, f(n) = Θ(g(n)) means that f(n) and g(n) have the same rate of growth.
Examples:
1. If f(n) = 2n + 1, then f(n) = Θ(n).
2. If f(n) = 2n², then f(n) = O(n⁴), f(n) = O(n³), and f(n) = O(n²). All of these are technically correct, but the last expression is the best and tightest one. Since 2n² and n² have the same growth rate, we can write f(n) = Θ(n²).
Little-o Notation
Formal Definition: f(n) = o(g(n)) means that for every c > 0 there exists some k > 0 such that f(n) < c * g(n) for all n ≥ k. Informally, f(n) = o(g(n)) means f(n) becomes insignificant relative to g(n) as n approaches infinity.
Example: f(n) = 3n + 4 is o(n²).
In simple terms, f(n) has a strictly smaller growth rate than g(n). For instance, if g(n) = 2n², then g(n) = o(n³) and g(n) = O(n²), but g(n) is not o(n²).
Little-Omega (ω) Notation
By analogy with little-o, f(n) = ω(g(n)) means that for every c > 0 there exists some k > 0 such that f(n) > c * g(n) for all n ≥ k; informally, f(n) grows strictly faster than g(n).
Transitivity
• if f(n) = Θ(g(n)) and g(n) = Θ(h(n)) then f(n) = Θ(h(n)),
• if f(n) = O(g(n)) and g(n) = O(h(n)) then f(n) = O(h(n)),
• if f(n) = Ω(g(n)) and g(n) = Ω(h(n)) then f(n) = Ω(h(n)),
• if f(n) = o(g(n)) and g(n) = o(h(n)) then f(n) = o(h(n)), and
• if f(n) = ω(g(n)) and g(n) = ω(h(n)) then f(n) = ω(h(n)).
Symmetry
• f(n) = Θ(g(n)) if and only if g(n) = Θ(f(n)).
Transpose symmetry
• f(n) = O(g(n)) if and only if g(n) = Ω(f(n)),
• f(n) = o(g(n)) if and only if g(n) = ω(f(n)).
Reflexivity
• f(n) = Θ(f(n)),
• f(n) = O(f(n)),
• f(n) = Ω(f(n)).
Recurrence Relations
Recurrence relations can be directly derived from a recursive algorithm, but they are in the form that
doesn’t allow us to quickly determine how efficient the algorithm is. To do that we need to convert
the set of recursive equations into what is called closed form by removing the recursive nature of the
equations. This is done by a series of repeated substitutions until we can see the pattern that develops.
The easiest way to see this process is by a series of examples.
A recurrence relation can be expressed in two ways. The first is used if there are just a few simple
cases for the formula.
T(n) = 2T(n − 2) − 15
T(2) = 40
T(1) = 40
The second way is used when the direct solution is applied for a larger number of cases:
T(n) = 4T(n/2) − 1    if n > 4
T(n) = 4              if n ≤ 4
These forms are equivalent. We can convert from the second to the first by just using those values for
which we have the direct answer. This means that the second recurrence relation above could also be
given as
𝑇(𝑛) = 4𝑇(𝑛/2) − 1
𝑇(4) = 4
𝑇(3) = 4
𝑇(2) = 4
𝑇(1) = 4
Consider the following relation:
𝑇(𝑛) = 2𝑇(𝑛 − 2) − 15
𝑇(2) = 40
𝑇(1) = 40
We will want to substitute an equivalent value for T(n − 2) back into the first equation. To do so, we
replace every n in the first equation with 𝑛 − 2, giving:
𝑇(𝑛 − 2) = 2𝑇(𝑛 − 2 − 2) − 15
= 2𝑇(𝑛 − 4) − 15
But now we can see when this substitution is done, we will still have 𝑇(𝑛 − 4) to eliminate. If you
think ahead, you will realize that there will be a series of these values that we will need. As a first
step, we create a set of these equations for successively smaller values:
𝑇(𝑛 − 2) = 2𝑇(𝑛 − 4) − 15
𝑇(𝑛 − 4) = 2𝑇(𝑛 − 6) − 15
𝑇(𝑛 − 6) = 2𝑇(𝑛 − 8) − 15
Now we begin to substitute back into the original equation. We will be careful not to simplify the
resulting equation too much because that will make the pattern more difficult to see. Doing the
substitution gives us:
𝑇(𝑛) = 4𝑇(𝑛 − 4) − 2 ∗ 15 − 15
𝑇(𝑛) = 8𝑇(𝑛 − 6) − 4 ∗ 15 − 2 ∗ 15 − 15
𝑇(𝑛) = 16𝑇(𝑛 − 8) − 8 ∗ 15 − 4 ∗ 15 − 2 ∗ 15 − 15
You are probably beginning to see a pattern develop here. First, we notice that each new term at the
end of the equation is -15 multiplied by the next higher power of 2. Second, we notice that the
coefficient of the recursive call to T is going through a series of powers of 2. Third, we notice that the
value that we are calling T with keeps going down by 2 each time.
Now, you might wonder, when does this process end? If we look back at the original equations, you will see that we have fixed values for T(2) and T(1). How many times would we have to substitute back into this equation to get to either of these values? We can see that T(2) = T(n − (n − 2)), so repeated substitution reaches T(2) if n is even. This indicates that we would substitute back into this equation (n − 2)/2 − 1 times, giving n/2 − 1 terms based on −15 in the equation, and the power of the coefficient of T will be n/2 − 1. To see this, consider what we would have if the value of n was 14. In this case the previous sentence indicates that we would have substituted five times, would have six terms based on −15, and would have 2⁶ for the coefficient of T(2). If you continue the pattern of the last equation and substitute 14 in for n, you will see that this is exactly what we have.
What if n is an odd number? Will these formulas still work? Let's consider an n value of 13. The only thing that changes is that the substitution bottoms out at T(1) instead of T(2), and by our equations n/2 − 1 is 5 (not 6) when n is 13, using integer division. For odd n, we will therefore use ⌊n/2⌋ instead of n/2 − 1. We will have two cases in our answer:
T(n) = 2^((n/2)−1) * T(2) − 15 * ∑_{i=0}^{(n/2)−2} 2^i    if n is even
i.e.
T(n) = 2^((n/2)−1) * 40 − 15 * ∑_{i=0}^{(n/2)−2} 2^i    if n is even
and
T(n) = 2^⌊n/2⌋ * T(1) − 15 * ∑_{i=0}^{⌊n/2⌋−1} 2^i    if n is odd
i.e.
T(n) = 2^⌊n/2⌋ * 40 − 15 * ∑_{i=0}^{⌊n/2⌋−1} 2^i    if n is odd

Using the formula ∑_{i=0}^{N} 2^i = 2^(N+1) − 1, these simplify as follows.
For even n:
T(n) = 2^((n/2)−1) * 40 − 15 * (2^((n/2)−1) − 1)
     = 2^((n/2)−1) * (40 − 15) + 15
     = 2^((n/2)−1) * 25 + 15
For odd n:
T(n) = 2^⌊n/2⌋ * 40 − 15 * (2^⌊n/2⌋ − 1)
     = 2^⌊n/2⌋ * (40 − 15) + 15
     = 2^⌊n/2⌋ * 25 + 15
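These closed forms are easy to check against the recurrence itself; the short sketch below (the names T and closedForm are chosen only for this example) computes both for n = 1..20, so any off-by-one in the derivation would show up as a mismatch.

#include <iostream>

// The recurrence itself: T(n) = 2*T(n-2) - 15, with T(2) = T(1) = 40.
long long T(int n) {
    if (n <= 2) return 40;
    return 2 * T(n - 2) - 15;
}

// Closed form derived above: exponent n/2 - 1 for even n, floor(n/2) for odd n.
long long closedForm(int n) {
    if (n <= 2) return 40;
    int e = (n % 2 == 0) ? n / 2 - 1 : n / 2;   // integer division = floor
    return 25LL * (1LL << e) + 15;
}

int main() {
    for (int n = 1; n <= 20; ++n)
        std::cout << "n=" << n << "  recurrence=" << T(n)
                  << "  closed form=" << closedForm(n) << std::endl;
    return 0;
}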
Chapter Three
3.1 Searching and Selection Algorithms
Introduction
The act of searching for a piece of information in a list is one of the fundamental algorithms in
computer science. In discussing searching we assume that there is a list that contains records of
information, which in practice is stored in array in a program. The list locations will be indexed
from 1 to N, which represents the number of records in the list. Each record can be separated into
fields, but we will only be interested in one of these fields, which we will call the key. Lists will
be either sorted or unsorted based on their key value. Records are in random order in an unsorted list and are in order of increasing key value in a sorted list.
When a list is unsorted, we only have one search option and that is to sequentially look through
the list for the item we want. This is the simplest of the searching algorithms.
When the list of elements is sorted, the options for searching are expanded to include binary search. Binary search takes advantage of the ordered nature of the list to eliminate more than one element of the list with each comparison. This results in a more efficient search.
Searching is a process of looking for a specific element in a list of items, or determining that the item is not in the list. There are two simple searching algorithms: sequential (linear) search and binary search.
In search algorithms, we are concerned with the process of looking through a list to find a particular
element, called the target. Although not required, we usually consider the list to be unsorted when
doing a sequential search, because there are other algorithms that perform better on sorted lists.
We are usually interested in more than just whether the target is in the list: the search is usually part of a larger process that needs the data associated with that key. For example, the key value might be an
employee number, a serial number, or other unique identifier. When the proper key is found, the
program might change some of the data stored for that key or might simply output the record.
In any case, the important task of the search algorithm is to identify the location of the key. For this
reason, search algorithms return the index of where the record with the key is located. If the target
value is not found, it is typical for the algorithm to return an index value that is outside the range of
the list of elements.
For our purpose, we will assume that the elements of the list are located in positions 1 to N in the list.
This allows us to return a value of zero if the target is not in the list. For the sake of simplicity, we
will assume that the key values are unique for all the elements in the list.
Sequential search looks at the elements one at a time, from the first in the list, until a match for the target is found. It should be obvious that the further down the list a particular key value is, the longer it will take to find that key value. This is an important fact to remember when we begin to analyze sequential search.
SequentialSearch(list, target, N)
list the elements to be searched
target the value being searched for
N the number of elements in the list
for i = 1 to N do
if (target = list[i])
return i
end if
end for
return 0
There are two cases for the worst case analysis for the sequential search algorithm:
The first is if the target matches the last element in the list.
The second is if the target is not in the list.
For both cases, let us look at how many comparisons are done. We have said that all of the list keys
will be unique, and so if the match is in the last location, that means that all of the other locations are
different from the target. The algorithm will, therefore, compare the target with each of these values
until it finds the match in the last location. This will take N comparisons, where N is the number of
elements in the list.
In the second case, we will have to compare the target to all of the elements in the list to determine
that the target is not there. If we skip any of the elements, we will not know if the target is not present
or is present in one of the locations we skipped. This means that we need to do N comparisons to see
that none of the elements match the target.
In both cases, whether the target is in the last location or not in the list, this algorithm takes N
comparisons. You should see that this is the upper bound for any search algorithm, because to do
more than N comparisons would mean that the algorithm compared at least one element with the
target at least twice, which is unnecessary work, so the algorithm could be improved.
There is a difference between the concept of upper bound and a worst case. The upper bound is a
concept based on the problem to be solved, and the worst case is based on the way a particular
algorithm solves that problem. For this algorithm the worst case is also the upper bound for the
problem. I.e. T (n) is O (n). We will see another algorithm that has a worst case that is less than this
upper bound of N.
Average - Case Analysis
There are two average-case analyses that can be done for a search algorithm. The first assumes that
the search is always successful and the other assumes that the target will sometimes not be found.
If the target is in the list, there are N places where the target can be located. It could be in the first,
second, third, fourth, and so on, locations in the list. We will assume that all of these possibilities are
equally likely, giving a probability of 1/N for each potential location.
Take a moment to answer the following questions before you proceed on:
How many comparisons are done if the match is in the first location?
What about the second?
What about the third?
What about the last or Nth location?
If you looked at the algorithm carefully, you should have determined that the answers to these
questions are 1, 2, 3, and N, respectively. This means that for each of our N cases, the number of
comparisons is the same as the location where the match occurs. This gives the following equation
for this average case (recall that the formula for the average-case analysis is A(N) = ∑_{i=1}^{N} p_i * t_i):
A(N) = (1/N) * ∑_{i=1}^{N} i
A(N) = (1/N) * N(N+1)/2
A(N) = (N+1)/2
If we include the possibility that the target is not in the list, we will find that there are now N+1 possibilities. As we have seen, the case where the target is not in the list takes N comparisons. If we assume that all N+1 possibilities are equally likely, we wind up with the following:
A(N) = (1/(N+1)) * [(∑_{i=1}^{N} i) + N]
A(N) = (1/(N+1)) * ∑_{i=1}^{N} i + (1/(N+1)) * N
A(N) = (1/(N+1)) * N(N+1)/2 + N/(N+1)
A(N) = N/2 + N/(N+1) = N/2 + 1 − 1/(N+1)
A(N) ≈ (N+2)/2    (as N gets very large, 1/(N+1) becomes almost 0)
We see that including the possibility of the target not being in the list only increases the average case
by ½. When we consider this amount relative to the size of the list, which could be very large, this ½
is not significant.
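The (N+1)/2 result for a successful search can also be confirmed empirically; the sketch below runs a sequential search for a target in every possible position of a list of size N and averages the comparison counts (the function name and list contents are chosen only for this example).

#include <iostream>
#include <vector>

// Sequential search that also reports how many key comparisons it made.
int sequentialSearch(const std::vector<int>& list, int target, int& comparisons) {
    comparisons = 0;
    for (std::size_t i = 0; i < list.size(); ++i) {
        ++comparisons;
        if (list[i] == target) return static_cast<int>(i) + 1;  // 1-based index
    }
    return 0;                                                   // not found
}

int main() {
    const int N = 1000;
    std::vector<int> list(N);
    for (int i = 0; i < N; ++i) list[i] = i + 1;   // keys 1..N

    long long total = 0;
    int comparisons = 0;
    for (int target = 1; target <= N; ++target) {  // a target in each position
        sequentialSearch(list, target, comparisons);
        total += comparisons;
    }
    std::cout << "average = " << static_cast<double>(total) / N
              << ", (N+1)/2 = " << (N + 1) / 2.0 << std::endl;   // both 500.5
    return 0;
}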
The binary search algorithm works only on an ordered list. The basic idea is:
If we compare the target with the element that is in the middle of a sorted list, we have three possible
results: the target matches, the target is less than the element, or the target is greater than the element.
In the first and best case, we are done. In the other two cases, we learn that half the list can be
eliminated from consideration.
If the target is less than the middle element, we know that if the target is in this ordered list, it must
be in the list before the middle element. When the target is greater than the middle element, we know
that if the target is in this ordered list, it must be in the list after the middle element. These facts allow
this one comparison to eliminate one-half of the list from consideration. As the process continues, we
will eliminate from consideration one-half of what is left of the list with each comparison.
BinarySearch(list, target, N)
list the elements to be searched
target the value being searched for
N the number of elements in the list
start = 1
end = N
while start <= end do
middle = (start + end)/2
select(Compare(list[middle], target)) from
case -1(T>M): start = middle + 1
case 0(T=M) : return middle
case 1(T<M): end = middle – 1
end select
end while
return 0
Compare(x, y) = −1 if x < y;  0 if x = y;  1 if x > y
In this algorithm, start gets reset to 1 larger than the middle when we know the target is larger than the element at the middle location, and end gets reset to 1 smaller than the middle when we know the target is smaller than the element at the middle location. These are shifted by 1 because the three-way comparison tells us that the middle value is not equal to the target and so can be eliminated from consideration.
Does this always stop? If we find the target, the answer is obviously Yes, because of the return. If we
don’t find a match, each pass through the loop will either increase the value of start or decrease the
value of end. This means that they will continue to get closer to each other. Eventually, they will
become equal to each other, and the loop will be done one more time, with start = end =
middle. After this pass (assuming that this is not the element we are looking for), either start
will be 1 greater than middle and end, or end will be 1 less than middle and start.
In both cases, the while loop’s conditional will become false, and the loop will
stop. Therefore, the loop does always stop.
Does this algorithm return the correct answer? If we find the target, the answer is obviously Yes
because of the return. If the middle element doesn’t match, each pass through the loop eliminates
from consideration one-half of the remaining elements because they are all either too large or too
small.
Because of the halving nature of this algorithm, we will assume for our analysis that N = 2^k − 1 for some value of k. If this is the case, how many elements will be left for the second pass? What about the third pass? In general, you should see that if on some pass of the loop we have 2^j − 1 elements under consideration, the comparison eliminates the middle element and one of the two halves, each of which holds 2^(j−1) − 1 elements. Therefore, the next pass will have 2^(j−1) − 1 elements left (for 1 ≤ j ≤ k). This assumption will make the following analysis easier to do.
Worst-case Analysis
In the above paragraph, we showed that the power of 2 is decreased by one on each pass of the loop. It was also shown that the last pass of the loop occurs when the list has a size of 1, which occurs when j is 1 (2^1 − 1 = 1). This means that there are at most k passes when N = 2^k − 1. Solving this equation tells us that the worst case is k = log₂(N + 1).
The computational time for this algorithm is proportional to log2 n. Therefore the time complexity is
O(log n)
Linear Search (Sequential Search)
Pseudocode
Loop through the array starting at the first element until the value of target matches one of the array
elements.
If a match is not found, return –1.
Time is proportional to the size of input (n) and we call this time complexity O(n).
Example Implementation:
// The original fragment used an undeclared n; here the list size n is
// passed explicitly (the loop assumes n >= 1).
int Linear_Search(int list[], int n, int key)
{
    int index = 0;
    int found = 0;
    do {
        if (key == list[index])
            found = 1;
        else
            index++;
    } while (found == 0 && index < n);
    if (found == 0)
        index = -1;
    return index;
}
Binary Search
Example Implementation:
// The original fragment gave only the loop; it is wrapped in a function here,
// with the list size n passed explicitly (assumes a sorted list and n >= 1).
int Binary_Search(int list[], int n, int key)
{
    int left = 0;
    int right = n - 1;
    int found = 0;
    int mid = 0, index;
    do {
        mid = (left + right) / 2;
        if (key == list[mid])
            found = 1;
        else {
            if (key < list[mid])
                right = mid - 1;    // target can only be in the lower half
            else
                left = mid + 1;     // target can only be in the upper half
        }
    } while (found == 0 && left <= right);
    if (found == 0)
        index = -1;
    else
        index = mid;
    return index;
}
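A small driver, compiled together with the two definitions above (and assuming the repaired signatures that pass the list size n explicitly), shows how the search functions might be called:

#include <iostream>

int Linear_Search(int list[], int n, int key);   // defined above
int Binary_Search(int list[], int n, int key);   // defined above

int main() {
    int list[] = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};  // sorted, for binary search
    int n = 10;
    std::cout << Linear_Search(list, n, 23) << std::endl;  // 5  (index of 23)
    std::cout << Binary_Search(list, n, 23) << std::endl;  // 5
    std::cout << Linear_Search(list, n, 7)  << std::endl;  // -1 (not found)
    std::cout << Binary_Search(list, n, 7)  << std::endl;  // -1
    return 0;
}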
3.2. Sorting Algorithms
Sorting is one of the most important operations performed by computers. Sorting is a process of reordering a list of items in either increasing or decreasing order. The following are simple sorting algorithms used to sort small-sized lists.
• Insertion Sort
• Selection Sort
• Bubble Sort
3.2.1. Insertion Sort
The insertion sort works just like its name suggests - it inserts each item into its proper place in the
final list. The simplest implementation of this requires two list structures - the source list and the list
into which sorted items are inserted. To save memory, most implementations use an in-place sort that
works by moving the current item past the already sorted items and repeatedly swapping it with the
preceding item until it is in place.
It's the most instinctive type of sorting algorithm. The approach is the same approach that you use for
sorting a set of cards in your hand. While playing cards, you pick up a card, start at the beginning of
your hand and find the place to insert the new card, insert it and move all the others up one place.
Basic Idea:
Find the location for an element and move all others up, and insert the element.
1. The left most value can be said to be sorted relative to itself. Thus, we don’t need to do
anything.
2. Check to see if the second value is smaller than the first one. If it is, swap these two values.
The first two values are now relatively sorted.
3. Next, we need to insert the third value in to the relatively sorted portion so that after insertion,
the portion will still be relatively sorted.
4. Remove the third value first. Slide the second value to make room for insertion. Insert the
value in the appropriate position.
5. Now the first three are relatively sorted.
6. Do the same for the remaining items in the list.
Implementation
void insertion_sort(int list[], int n)
{
    int temp;
    for (int i = 1; i < n; i++) {
        // swap list[i] backwards until it reaches its place in the sorted portion
        for (int j = i; j > 0 && list[j] < list[j - 1]; j--) {
            temp = list[j];
            list[j] = list[j - 1];
            list[j - 1] = temp;
        }
    }
}//end of insertion_sort
Analysis
Worst-case comparisons: 1+2+3+…+(n-1) = O(n²)
Worst-case swaps: 1+2+3+…+(n-1) = O(n²)
3.2.2. Selection Sort
Basic Idea:
Loop through the array from i = 0 to n − 1, select the smallest element in the unsorted portion list[i..n−1], and swap it with the element at position i.
Implementation:
void selection_sort(int list[], int n)
{
    int smallest, temp;
    for (int i = 0; i < n; i++) {
        smallest = i;
        // find the index of the smallest remaining element
        for (int j = i + 1; j < n; j++) {
            if (list[j] < list[smallest])
                smallest = j;
        }
        // swap it into position i
        temp = list[smallest];
        list[smallest] = list[i];
        list[i] = temp;
    }
}//end of selection_sort
Analysis
Comparisons: (n-1)+(n-2)+…+1 = O(n²)
Swaps: n = O(n)
3.2.3. Bubble Sort
Bubble sort is the simplest algorithm to implement and the slowest algorithm on very large inputs.
Basic Idea:
Loop through the array from i = 0 to n and swap adjacent elements if they are out of order.
Implementation:
void bubble_sort(int list[], int n)
{
    int i, j, temp;
    for (i = 0; i < n; i++) {
        for (j = n - 1; j > i; j--) {
            // swap adjacent elements that are out of order
            if (list[j] < list[j - 1]) {
                temp = list[j];
                list[j] = list[j - 1];
                list[j - 1] = temp;
            }
        }
    }
}//end of bubble_sort
Analysis of Bubble Sort
Worst-case comparisons: (n-1)+(n-2)+…+1 = O(n²)
Worst-case swaps: (n-1)+(n-2)+…+1 = O(n²)
General Comments
Each of these algorithms requires n-1 passes; each pass places one item in its correct place. The ith pass makes either i or n − i comparisons and moves. So:
∑_{i=1}^{n−1} i = n(n−1)/2
or O(n²). Thus these algorithms are only suitable for small problems, where their simple code makes them faster than the more complex code of the O(n log n) algorithms. As a rule of thumb, expect to find an O(n log n) algorithm faster for n > 10, but the exact value depends very much on individual machines!
Empirically it’s known that Insertion sort is over twice as fast as the bubble sort and is just as easy to
implement as the selection sort. In short, there really isn't any reason to use the selection sort - use the
insertion sort instead.
If you really want to use the selection sort for some reason, try to avoid sorting lists of more than 1000 items with it, or repetitively sorting lists of more than a couple hundred items.
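A small driver (a sketch, assuming the repaired signatures used in the implementations above) can be used to run the three simple sorts on the same data and confirm they produce the same result:

#include <iostream>

void insertion_sort(int list[], int n);   // defined earlier in this section
void selection_sort(int list[], int n);
void bubble_sort(int list[], int n);

void print(const int list[], int n) {
    for (int i = 0; i < n; ++i) std::cout << list[i] << ' ';
    std::cout << std::endl;
}

int main() {
    int a[] = {9, 3, 7, 1, 8, 2};
    int b[] = {9, 3, 7, 1, 8, 2};
    int c[] = {9, 3, 7, 1, 8, 2};
    insertion_sort(a, 6);  print(a, 6);   // 1 2 3 7 8 9
    selection_sort(b, 6);  print(b, 6);   // 1 2 3 7 8 9
    bubble_sort(c, 6);     print(c, 6);   // 1 2 3 7 8 9
    return 0;
}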
QUICKSORT
Quicksort is a sorting algorithm whose worst-case running time is Θ(n²) on an input array of n numbers. In spite of this slow worst-case running time, quicksort is often the best practical choice for sorting because it is remarkably efficient on the average: its expected running time is Θ(n lg n), and the constant factors hidden in the Θ(n lg n) notation are quite small. It also has the advantage of sorting in place, and it works well even in virtual-memory environments.
Description of quicksort
Quicksort, like merge sort, is based on the divide-and-conquer paradigm. Here is the three-step divide, conquer, and combine process for sorting a typical subarray A[p . . r].
Divide: The array A[p . . r] is partitioned (rearranged) into two nonempty subarrays A[p . . q] and A[q
+ 1 . . r] such that each element of A[p . . q] is less than or equal to each element of A[q + 1 . . r]. The
index q is computed as part of this partitioning procedure.
Conquer: The two subarrays A[p . . q] and A[q + 1 . . r] are sorted by recursive calls to quicksort.
Combine: Since the subarrays are sorted in place, no work is needed to combine them: the entire
array A[p . . r] is now sorted.
QUICKSORT(A, p, r)
1  if p < r
2     then q ← PARTITION(A, p, r)
3          QUICKSORT(A, p, q)
4          QUICKSORT(A, q + 1, r)
The key to the algorithm is the PARTITION procedure, which rearranges the subarray A[p . . r] in
place.
PARTITION(A, p, r)
1   x ← A[p]
2   i ← p − 1
3   j ← r + 1
4   while TRUE
5       do repeat j ← j − 1
6              until A[j] ≤ x
7          repeat i ← i + 1
8              until A[i] ≥ x
9          if i < j
10             then exchange A[i] ↔ A[j]
11             else return j
Figure 1 shows how PARTITION works. It first selects an element x = A[p] from A[p . . r] as a
"pivot" element around which to partition A[p . . r]. It then grows two regions A[p . . i] and A[j . . r]
from the top and bottom of A[p . . r], respectively, such that every element in A[p . . i] is less than or
equal to x and every element in A[j . . r] is greater than or equal to x. Initially, i = p - 1 and j = r + 1,
so the two regions are empty.
Within the body of the while loop, the index j is decremented and the index i is incremented, in lines 5-8, until A[i] ≥ x ≥ A[j]. Assuming that these inequalities are strict, A[i] is too large to belong to the bottom region and A[j] is too small to belong to the top region. Thus, by exchanging A[i] and A[j] as is done in line 10, we can extend the two regions. (If the inequalities are not strict, the exchange can be performed anyway.)
The body of the while loop repeats until i ≥ j, at which point the entire array A[p . . r] has been partitioned into two subarrays A[p . . q] and A[q + 1 . . r], where p ≤ q < r, such that no element of A[p . . q] is larger than any element of A[q + 1 . . r]. The value q = j is returned at the end of the procedure.
Conceptually, the partitioning procedure performs a simple function: it puts elements smaller than x
into the bottom region of the array and elements larger than x into the top region. There are
technicalities that make the pseudocode of PARTITION a little tricky, however. For example, the
indices i and j never index the subarray A[p . . r] out of bounds, but this isn't entirely apparent from
the code. As another example, it is important that A[p] be used as the pivot element x. If A[r] is used
instead and it happens that A[r] is also the largest element in the subarray A[p . . r], then PARTITION
returns to QUICKSORT the value q = r, and QUICKSORT loops forever.
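The pseudocode above can be turned into a working C++ sketch; this version keeps the Hoare-style PARTITION with x = A[p] as the pivot, but uses 0-based indices instead of the 1-based indices of the pseudocode (function names and the sample array are chosen only for this example).

#include <iostream>
#include <vector>

// Hoare-style partition: pivot x = A[p]; returns q such that every element
// of A[p..q] is <= every element of A[q+1..r].
int partition(std::vector<int>& A, int p, int r) {
    int x = A[p];
    int i = p - 1;
    int j = r + 1;
    while (true) {
        do { --j; } while (A[j] > x);
        do { ++i; } while (A[i] < x);
        if (i < j)
            std::swap(A[i], A[j]);
        else
            return j;
    }
}

void quicksort(std::vector<int>& A, int p, int r) {
    if (p < r) {
        int q = partition(A, p, r);
        quicksort(A, p, q);        // note: q, not q-1, with this partition
        quicksort(A, q + 1, r);
    }
}

int main() {
    std::vector<int> A = {5, 3, 2, 6, 4, 1, 3, 7};
    quicksort(A, 0, static_cast<int>(A.size()) - 1);
    for (int v : A) std::cout << v << ' ';
    std::cout << std::endl;        // 1 2 3 3 4 5 6 7
    return 0;
}

Note that, exactly as the text warns, choosing A[p] (not A[r]) as the pivot is what guarantees the returned q is always less than r, so the recursion terminates.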
Figure 1 The operation of PARTITION on a sample array. Lightly shaded array elements have been
placed into the correct partitions, and heavily shaded elements are not yet in their partitions. (a) The
input array, with the initial values of i and j just off the left and right ends of the array. We partition
around x = A[p] = 5. (b) The positions of i and j at line 9 of the first iteration of the while loop. (c)
The result of exchanging the elements pointed to by i and j in line 10. (d) The positions of i and j at
line 9 of the second iteration of the while loop. (e) The positions of i and j at line 9 of the third and
last iteration of the while loop. The procedure terminates because i ≥ j, and the value q = j is returned.
Array elements up to and including A[j] are less than or equal to x = 5, and array elements after A[j]
are greater than or equal to x = 5.
Exercises
Using Figure 1 as a model, illustrate the operation of PARTITION on the array A = ⟨13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21⟩.
Performance of quicksort
The running time of quicksort depends on whether the partitioning is balanced or unbalanced, and
this in turn depends on which elements are used for partitioning. If the partitioning is balanced, the
algorithm runs asymptotically as fast as merge sort. If the partitioning is unbalanced, however, it can
run asymptotically as slow as insertion sort. In this section, we shall informally investigate how
quicksort performs under the assumptions of balanced versus unbalanced partitioning.
Worst-case partitioning
The worst-case behavior for quicksort occurs when the partitioning routine produces one region with n − 1 elements and one with only 1 element. Let us assume that this unbalanced partitioning arises at every step of the algorithm. Since partitioning costs Θ(n) time and T(1) = Θ(1), the recurrence for the running time is
T(n) = T(n − 1) + Θ(n).
To evaluate this recurrence, we observe that T(1) = Θ(1) and then iterate:
T(n) = T(n − 1) + Θ(n) = Θ(∑_{k=1}^{n} k) = Θ(n²).
We obtain the last step by observing that ∑_{k=1}^{n} k is the arithmetic series, which evaluates to n(n + 1)/2 = Θ(n²). Figure 2 shows a recursion tree for this worst-case execution of quicksort.
Thus, if the partitioning is maximally unbalanced at every recursive step of the algorithm, the running time is Θ(n²). Therefore the worst-case running time of quicksort is no better than that of insertion sort. Moreover, the Θ(n²) running time occurs when the input array is already completely sorted, a common situation in which insertion sort runs in O(n) time.
Figure 2 A recursion tree for QUICKSORT in which the PARTITION procedure always puts only a single element on one side of the partition (the worst case). The resulting running time is Θ(n²).
Best-case partitioning
If the partitioning procedure produces two regions of size n/2, quicksort runs much faster. The recurrence is then
T(n) = 2T(n/2) + Θ(n),
which by case 2 of the master theorem (Theorem 4.1) has solution T(n) = Θ(n lg n). Thus, this best-case partitioning produces a much faster algorithm. Figure 8.3 shows the recursion tree for this best-case execution of quicksort.
Balanced partitioning
The average-case running time of quicksort is much closer to the best case than to the worst case, as
the analyses in Section 8.4 will show. The key to understanding why this might be true is to
understand how the balance of the partitioning is reflected in the recurrence that describes the
running time.
Suppose, for example, that the partitioning algorithm always produces a 9-to-1 proportional split, which at first blush seems quite unbalanced. We then obtain the recurrence
T(n) = T(9n/10) + T(n/10) + n
on the running time of quicksort, where we have replaced Θ(n) by n for convenience. Figure 8.4 shows the recursion tree for this recurrence. Notice that every level of the tree has cost n, until a boundary condition is reached at depth log₁₀ n = Θ(lg n), and then the levels have cost at most n. The recursion terminates at depth log_{10/9} n = Θ(lg n). The total cost of quicksort is therefore Θ(n lg n).
Thus, with a 9-to-1 proportional split at every level of recursion, which intuitively seems quite unbalanced, quicksort runs in Θ(n lg n) time, asymptotically the same as if the split were right down the middle. In fact, even a 99-to-1 split yields an O(n lg n) running time. The reason is that any split of constant proportionality yields a recursion tree of depth Θ(lg n), where the cost at each level is O(n). The running time is therefore O(n lg n) whenever the split has constant proportionality.
Figure 8.3 A recursion tree for QUICKSORT in which PARTITION always balances the two sides of the partition equally (the best case). The resulting running time is Θ(n lg n).
Chapter Four
Graph Algorithm
Introduction
The objective of this chapter is to provide a basic introduction to graphs and the algorithms commonly used for traversing them: Breadth First Search (BFS) and Depth First Search (DFS). This chapter will help you to get a basic understanding of what graphs are, how they are represented, how graphs are traversed using BFS and DFS, and the time/space complexity of each algorithm.
Graph Algorithms: We are now beginning a major new section of the course. We will be
discussing algorithms for both directed and undirected graphs. Intuitively, a graph is a collection
of vertices or nodes, connected by a collection of edges. Graphs are extremely important because
they are a very flexible mathematical model for many application problems. Basically, any time
you have a set of objects, and there is some “connection” or “relationship” or “interaction”
between pairs of objects, a graph is a good way to model this. Examples of graphs in application
include communication and transportation networks, VLSI and other sorts of logic circuits,
surface meshes used for shape description in computer-aided design and geographic information
systems, and precedence constraints in scheduling systems. The list of applications is almost too long to enumerate.
Most of the problems in computational graph theory that we will consider arise because they are
of importance to one or more of these application areas. Furthermore, many of these problems
form the basic building blocks from which more complex algorithms are then built.
Definition of Graph
A graph G = (V, E) consists of a (finite) set denoted by V, or by V(G) if one wishes to make
clear which graph is under consideration, and a collection E, or E(G), of unordered pairs {u, v}
of distinct elements from V. Each element of V is called a vertex or a point or a node, and each
element of E is called an edge or a line or a link.
Formally, a graph G is an ordered pair of disjoint sets (V, E), where V is called the vertex or node
set, while set E is the edge set of graph G. Typically, it is assumed that self-loops (i.e. edges of
the form (u, u), for some u ∈ V) are not contained in a graph.
Graphs are good at modeling real-world problems, such as representing cities connected by
roads and finding paths between the cities, or modeling an air traffic control system. These kinds
of problems are hard to represent using simple tree structures. The following example shows a
very simple graph:
In the above graph, A, B, C, D, E and F are called nodes, and the connecting lines between these
nodes are called edges. The edges can be directed edges, which are shown by arrows; they can also
be weighted edges, in which case numbers are assigned to them. Hence, a graph can be
directed or undirected, and weighted or un-weighted. In this chapter, we will discuss
undirected and un-weighted graphs.
Every graph has two components, Nodes and Edges.
Page 37
1. Nodes
A graph is a set of nodes and a set of edges that connect the nodes. Graphs are used to model
situations where each of the nodes in a graph must be visited, sometimes in a particular order, and
the goal is to find the most efficient way to “traverse” the graph. The elements of a graph are
called nodes, and the elements that lie below a particular node are called that node’s children.
A node is represented by a circle.
2. Edges
Edges represent the connection between nodes. There are two ways to represent edges.
Adjacency Matrix
It is a two dimensional array with Boolean flags. As an example, we can represent the edges for
the above graph using the following adjacency matrix.
In the given graph, A is connected with the B, C and D nodes, so the adjacency matrix will have 1s
in the ‘A’ row for the ‘B’, ‘C’ and ‘D’ columns.
The advantages of representing the edges using an adjacency matrix are:
• Simplicity of implementation: you only need a two-dimensional array.
• Creating and removing edges is easy: you only need to update the Boolean flags.
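To make this concrete, here is a minimal C# sketch (the class and method names are illustrative, not from the text) that stores the edges named above (A-B, A-C and A-D) in a Boolean adjacency matrix; the remaining edges of the example graph would be added in exactly the same way.

using System;

// Boolean adjacency matrix for the six vertices A..F (indices 0..5).
class AdjacencyMatrixExample
{
    const int N = 6;                                // A, B, C, D, E, F
    static readonly bool[,] adj = new bool[N, N];

    static void AddEdge(int u, int v)               // undirected, un-weighted
    {
        adj[u, v] = true;
        adj[v, u] = true;
    }

    static void Main()
    {
        AddEdge(0, 1);   // A - B
        AddEdge(0, 2);   // A - C
        AddEdge(0, 3);   // A - D

        // Checking (or removing) an edge is a single array access.
        Console.WriteLine(adj[0, 1]);               // True: A and B are connected
    }
}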
Page 38
Adjacency List
It is an array of linked lists: each entry of the array holds the list of vertices adjacent to that
vertex. For the given graph example, the edges would be represented by an adjacency list in
which, for instance, A’s list contains B, C and D.
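A corresponding C# sketch using an adjacency list is shown below; it uses a List<int>[] in place of hand-built linked-list nodes, and again only the edges explicitly named in the text are inserted (the names are illustrative).

using System;
using System.Collections.Generic;

// Adjacency list for the six vertices A..F (indices 0..5).
class AdjacencyListExample
{
    const int N = 6;
    static readonly List<int>[] adj = new List<int>[N];

    static void AddEdge(int u, int v)               // undirected
    {
        adj[u].Add(v);
        adj[v].Add(u);
    }

    static void Main()
    {
        for (int i = 0; i < N; i++) adj[i] = new List<int>();

        AddEdge(0, 1);   // A - B
        AddEdge(0, 2);   // A - C
        AddEdge(0, 3);   // A - D

        // Iterating over A's neighbours touches only A's own list.
        foreach (int w in adj[0]) Console.Write(w + " ");   // prints: 1 2 3 (B, C, D)
        Console.WriteLine();
    }
}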
Graph Traversal
The breadth first search (BFS) and the depth first search (DFS) are the two algorithms used
for traversing and searching a node in a graph. They can also be used to find out whether a
node is reachable from a given node or not.
Page 39
As stated before, in DFS nodes are visited by going down through the depth of the graph from the
starting node. If we do a depth-first traversal of the above graph and print the visited nodes, the
output will be “E F B C D A”. DFS visits the root node and then its child nodes until it reaches
an end node (the E and F nodes), and then moves back up to the parent nodes.
Algorithmic Steps
Step 1: Push the root node in the Stack.
Step 2: Peek at the node on top of the stack.
Step 3: If the node has unvisited child nodes, get an unvisited child node, mark it as traversed
and push it onto the stack.
Step 4: If the node does not have any unvisited child nodes, pop the node from the stack.
Step 5: Repeat from Step 2 until the stack is empty.
Depth-First Search
// Iterative DFS from a start vertex s, using an explicit stack.
// firstUnmarkedAdj(v) is assumed to return an unmarked neighbour of v, or null if there is none.
s.marked = true;
Stack S = new Stack();
S.push(s);
while (!S.isEmpty()) {
    v = S.peek();                 // look at the vertex on top of the stack
    u = firstUnmarkedAdj(v);      // try to go one level deeper
    if (u == null)
        S.pop();                  // no unvisited neighbour left: backtrack
    else {
        u.marked = true;          // visit u and continue the search from it
        S.push(u);
    }
}
Page 40
Breadth First Search (BFS)
This is a very different approach to traversing the graph nodes. The aim of the BFS algorithm is to
traverse the graph level by level, staying as close as possible to the root node. A queue (FIFO) is
used in the implementation of breadth-first search. Let’s see how BFS traversal works with respect
to the following graph:
If we do the breadth-first traversal of the above graph and print the visited nodes as the output, it
will print “A B C D E F”. BFS visits the nodes level by level: it starts with level 0, which is the
root node, then moves to the next level, which contains B, C and D, and then to the last level,
which contains E and F.
Algorithmic Steps
Step 1: Insert the root node into the queue.
Step 2: Remove a node from the queue.
Step 3: If the removed node has unvisited child nodes, mark them as visited and insert them
into the queue.
Step 4: Repeat from Step 2 until the queue is empty.
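These steps translate almost directly into code. The following is a minimal C# sketch of BFS over the adjacency-list representation discussed earlier (the class and method names, and the use of integer vertex labels, are illustrative assumptions).

using System;
using System.Collections.Generic;

// Minimal BFS sketch over an adjacency list; visits the vertices level by level.
class BfsExample
{
    public static void Bfs(List<int>[] adj, int root)
    {
        bool[] visited = new bool[adj.Length];
        Queue<int> queue = new Queue<int>();

        visited[root] = true;
        queue.Enqueue(root);                 // Step 1: put the root into the queue

        while (queue.Count > 0)              // Step 4: loop until the queue is empty
        {
            int v = queue.Dequeue();         // Step 2: remove a node from the queue
            Console.Write(v + " ");

            foreach (int w in adj[v])        // Step 3: enqueue its unvisited children
            {
                if (!visited[w])
                {
                    visited[w] = true;
                    queue.Enqueue(w);
                }
            }
        }
        Console.WriteLine();
    }
}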
Page 41
Complexity of breadth-first search
Assume an adjacency-list representation, and let V be the number of vertices and E the number of edges.
Each vertex is enqueued and dequeued at most once, which takes O(V) time in total.
Scanning all the adjacency lists takes O(E) time, since the sum of the lengths of the adjacency lists is Θ(E).
Together this gives an O(V + E) time complexity.
Spanning Tree
If G is a connected graph, a spanning tree of G is a subgraph of G which includes every vertex
of G and is also a tree.
Page 42
Page 43
Minimum Spanning Trees
Spanning trees
A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree. A graph
may have many spanning trees; for instance the complete graph on four vertices
(Figure: the complete graph on the four vertices A, B, C, D, with every pair of vertices joined by an edge.)
Exercise
This graph has sixteen spanning trees; try enumerating them yourself.
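As an aside, the count can be checked with Cayley's formula for the number of spanning trees of the complete graph on n vertices:

\[
\tau(K_n) = n^{\,n-2}, \qquad \text{so} \qquad \tau(K_4) = 4^{2} = 16 .
\]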
Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum
of the weights of its edges. Naturally, different spanning trees will in general have different total weights.
Dijkstra's algorithm, named after its discoverer, the Dutch computer scientist Edsger Dijkstra, is a
greedy algorithm that solves the single-source shortest path problem for a directed graph with
non-negative edge weights. For example, if the vertices (nodes) of the graph represent cities
and the edge weights represent driving distances between pairs of cities connected by a direct
road, Dijkstra's algorithm can be used to find the shortest route between two cities. The
algorithm can also be used to find the shortest path to a destination in a traffic network.
Using the Code
I will explain this algorithm with an example. We start at node A, and the problem is to reach the
other nodes with minimum cost. L[,] is the array of distances between pairs of nodes (a value of -1
means there is no direct edge).
int[,] L ={
{-1, 5, -1, -1, -1, 3, -1, -1},
{ 5, -1, 2, -1, -1, -1, 3, -1},
{-1, 2, -1, 6, -1, -1, -1, 10},
{-1, -1, 6, -1, 3, -1, -1, -1},
{-1, -1, -1, 3, -1, 8, -1, 5},
{ 3, -1, -1, -1, 8, -1, 7, -1},
{-1, 3, -1, -1, -1, 7, -1, 2},
{-1, -1, 10, -1, 5, -1, 2, -1}
};
D[] is the cost array; the shortest known cost to each node is written into D. C[] holds the candidate nodes (a node's entry in C is set to -1 once that node has been processed).
Pseudocode
function Dijkstra(L[1..n, 1..n]) : array [2..n]
    array D[2..n]
    set C
    C <- {2, 3, 4, 5, 6, …, n}
    for i <- 2 to n
        D[i] <- L[1, i]
    repeat n - 2 times
        v <- the element of C that minimizes D[v]   // extract v from C
        C <- C - {v}
        for each w in C do
            D[w] <- min(D[w], D[v] + L[v, w])
    return D
The run of the algorithm on this example can be traced through successive snapshots of the C[]
and D[] arrays (in C[], -1 marks a node that has already been processed; in D[], -1 marks a node
that has not been reached yet):
C[] -> -1, 1, 2, 3, 4, 5, 6, 7
D[] -> -1, 5, 7, -1, 11, 3, 8, -1
D[] -> -1, 5, 7, 13, 11, 3, 8, 10
D[] -> -1, 5, 7, 13, 11, 3, 8, 8
C[] -> -1, -1, -1, -1, -1, -1, -1, -1
// Fragment of the C# implementation: the enclosing class and the constructor that
// presumably declares L, D, C, rank and trank are not shown; the constructor ends
// by initializing D from the first row of L.
        D[i] = L[0, i];
    }

    public void DijkstraSolving()
    {
        // Pick the unprocessed node with the smallest known distance.
        int minValue = Int32.MaxValue;
        int minNode = 0;
        for (int i = 0; i < rank; i++)
        {
            if (C[i] == -1)                // already processed
                continue;
            if (D[i] > 0 && D[i] < minValue)
            {
                minValue = D[i];
                minNode = i;
            }
        }
        C[minNode] = -1;                   // mark the chosen node as processed

        // Relax the edges leaving minNode.
        for (int i = 0; i < rank; i++)
        {
            if (L[minNode, i] < 0)         // no direct edge
                continue;
            if (D[i] < 0)                  // first path found to node i
            {
                D[i] = minValue + L[minNode, i];
                continue;
            }
            if ((D[minNode] + L[minNode, i]) < D[i])
                D[i] = minValue + L[minNode, i];
        }
    }

    public void Run()
    {
        for (trank = 1; trank < rank; trank++)   // one relaxation round per remaining node
        {
            DijkstraSolving();
            Console.WriteLine("iteration " + trank);
            for (int i = 0; i < rank; i++)
                Console.Write(D[i] + " ");
            Console.WriteLine("");
            for (int i = 0; i < rank; i++)
                Console.Write(C[i] + " ");
            Console.WriteLine("");
        }
    }
}
Page 50
CHAPTER FIVE
Designing algorithms
There are many ways to design algorithms. Insertion sort uses an incremental approach: having
sorted the subarray A[1 . . j - 1], we insert the single element A[j] into its proper place, yielding the
sorted subarray A[1 . . j].
In this section, we will examine alternative design approaches: greedy algorithms, divide-and-
conquer and dynamic programming. One advantage of using these approaches is that the running
times of the resulting algorithms are often easily determined using techniques that will be introduced later.
A greedy algorithm assumes that a globally optimal solution can be arrived at by making a sequence of
locally optimal choices. Unlike dynamic programming, which solves the subproblems bottom-up, a greedy strategy
usually progresses in a top-down fashion, making one greedy choice after another, reducing each
problem to a smaller one. Divide-and-conquer is a top-down technique for designing algorithms that
consists of dividing the problem into smaller subproblems hoping that the solutions of the
subproblems are easier to find and then composing the partial solutions into the solution of the
original problem. In this chapter we will discuss these algorithm designing approaches with
appropriate examples.
Algorithms for optimization problems typically go through a sequence of steps, with a set of
choices at each step. A greedy algorithm always makes the choice that looks best at the moment.
That is, it makes a locally optimal choice in the hope that this choice will lead to a globally
optimal solution. This chapter explores optimization problems that are solvable by greedy
algorithms.
Greedy algorithms do not always yield optimal solutions, but for many problems they do. We shall
first examine a simple but nontrivial problem, the activity-selection problem, for which a greedy
algorithm efficiently computes a solution; it illustrates some of the basic elements of the greedy
approach and presents an important application of greedy techniques.
The greedy method is quite powerful and works well for a wide range of problems. Dijkstra's
algorithm for shortest paths from a single source and minimum spanning trees are classic
examples of the greedy method.
Page 51
Greedy algorithms are simple and straightforward. They are short sighted in their approach in the
sense that they take decisions on the basis of information at hand without worrying about the effect
these decisions may have in the future. They are easy to invent, easy to implement and most of the
time quite efficient. However, many problems cannot be solved correctly by a greedy approach. Greedy
algorithms are used to solve optimization problems. A greedy algorithm works by making the decision that seems most
promising at any moment; it never reconsiders this decision, whatever situation may arise later.
Definitions of feasibility
A feasible set (of candidates) is promising if it can be extended to produce not merely a solution, but
an optimal solution to the problem.
The activity-selection problem: given a set S of n activities, where activity i has start time si and
finish time fi, find a maximum-size set of mutually compatible activities.
Page 52
Compatible Activities
Activities i and j are compatible if the half-open intervals [si, fi) and [sj, fj)
do not overlap; that is, i and j are compatible if si ≥ fj or sj ≥ fi. (If j is the earlier activity and i
the later one, the condition is simply si ≥ fj.)
1. n = length[s]
2. A = {1}
3. j = 1
4. for i = 2 to n
5.     do if si ≥ fj
6.         then A = A ∪ {i}
7.              j = i
8. return set A
Operation of the algorithm
Suppose 11 activities S = {p, q, r, s, t, u, v, w, x, y, z} are given, with (start, finish) times for the proposed
activities of (1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13) and (12, 14).
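The following minimal C# sketch runs the greedy selector above on these eleven activities. It assumes, as the algorithm requires, that the activities are already sorted by increasing finish time (which the list above is); it prints the selected activities 1, 4, 8 and 11 (that is, p, s, w and z). The class and method names are illustrative.

using System;

class ActivitySelection
{
    // Greedy activity selector; activities must be sorted by finish time.
    static void Main()
    {
        int[] s = { 1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12 };      // start times
        int[] f = { 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 };  // finish times

        Console.Write("Selected activities: 1");
        int j = 0;                        // index of the most recently selected activity
        for (int i = 1; i < s.Length; i++)
        {
            if (s[i] >= f[j])             // compatible with the last selection
            {
                Console.Write(", " + (i + 1));
                j = i;
            }
        }
        Console.WriteLine();              // prints: Selected activities: 1, 4, 8, 11
    }
}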
Analysis
Page 53
Greedy-choice property
The "greedy-choice property" and "optimal substructure" are two ingredients in the problem that lend to a
greedy strategy. The first key ingredient is the greedy-choice property: a globally optimal solution can
be arrived at by making a locally optimal (greedy) choice. Here is where greedy algorithms differ
from dynamic programming. In dynamic programming, we make a choice at each step, but the choice
may depend on the solutions to subproblems. In a greedy algorithm, we make whatever choice seems
best at the moment and then solve the subproblems arising after the choice is made. The choice made
by a greedy algorithm may depend on choices so far, but it cannot depend on any future choices or on
the solutions to subproblems. Thus, unlike dynamic programming, which solves the subproblems
bottom up, a greedy strategy usually progresses in a top-down fashion, making one greedy choice
after another, iteratively reducing each given problem instance to a smaller one.
Of course, we must prove that a greedy choice at each step yields a globally optimal solution, and
this is where cleverness may be required. The proof typically examines a globally optimal solution
to some subproblem. It then shows that this solution can be modified so that a greedy choice is
made as the first step, and that this choice reduces the problem to a similar but smaller problem.
Then, induction is applied to show that a greedy choice can be used at every step. Showing that a
greedy choice results in a similar but smaller problem reduces the proof of correctness to
demonstrating that an optimal solution must exhibit optimal substructure.
Page 54
Binary search is an extremely well-known instance of the divide-and-conquer paradigm. Given an
ordered array of n elements, the basic idea of binary search is that, for a given element, we “probe” the
middle element of the array. We then continue in either the lower or the upper segment of the array,
depending on the outcome of the probe, until we reach the required (given) element or establish that it is not present.
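A minimal C# sketch of this probing scheme is shown below (the class and method names are illustrative); it returns the index of the key in an ascending-sorted array, or -1 if the key is absent.

class BinarySearchExample
{
    // Iterative binary search over a sorted array.
    public static int BinarySearch(int[] a, int key)
    {
        int low = 0, high = a.Length - 1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2;   // probe the middle element
            if (a[mid] == key) return mid;
            if (a[mid] < key) low = mid + 1;    // continue in the upper segment
            else high = mid - 1;                // continue in the lower segment
        }
        return -1;                              // the element is not present
    }
}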
The divide-and-conquer paradigm involves three steps at each level of the recursion:
Divide the problem into a number of subproblems.
Conquer the subproblems by solving them recursively. If the subproblem sizes are small enough,
however, just solve the subproblems in a straightforward manner.
Combine the solutions to the subproblems into the solution for the original problem.
The merge sort algorithm closely follows the divide-and-conquer paradigm. Intuitively, it operates as
follows.
Divide: Divide the n-element sequence to be sorted into two subsequences of n/2 elements each.
Conquer: Sort the two subsequences recursively using merge sort.
Combine: Merge the two sorted subsequences to produce the sorted answer.
We note that the recursion "bottoms out" when the sequence to be sorted has length 1, in which case
there is no work to be done, since every sequence of length 1 is already in sorted order.
The key operation of the merge sort algorithm is the merging of two sorted sequences in the
"combine" step. To perform the merging, we use an auxiliary procedure MERGE(A, p, q, r), where A is
an array and p, q, and r are indices numbering elements of the array such that p ≤ q < r. The procedure
assumes that the subarrays A[p..q] and A[q + 1..r] are in sorted order. It merges them to form a
single sorted subarray that replaces the current subarray A[p..r].
Although we leave the pseudocode as an exercise, it is easy to imagine a MERGE procedure that
takes time Θ(n), where n is the number of elements being merged. Returning to our card-playing
motif, suppose we have two piles of cards face up on a table. Each pile is sorted, with the smallest
cards on top. We wish to merge the two piles into a single sorted output pile, which is to be face
down on the table. Our basic step consists of choosing the smaller of the two cards on top of the face-
up piles, removing it from its pile (which exposes a new top card), and placing this card face down
onto the output pile. We repeat this step until one input pile is empty, at which time we just take the
remaining input pile and place it face down onto the output pile. Computationally, each basic step
takes constant time, since we are checking just two top cards. Since we perform at most n basic steps,
merging takes Θ(n) time.
Page 55
We can now use the MERGE procedure as a subroutine in the merge sort algorithm. The procedure
MERGE-SORT(A, p, r) sorts the elements in the subarray A[p..r]. If p ≥ r, the subarray has at most
one element and is therefore already sorted. Otherwise, the divide step simply computes an index q
that partitions A[p..r] into two subarrays: A[p..q], containing ⌈n/2⌉ elements, and A[q + 1..r],
containing ⌊n/2⌋ elements.
The expression ⌈x⌉ denotes the least integer greater than or equal to x, and ⌊x⌋ denotes the greatest
integer less than or equal to x. These notations are defined in Chapter 1.
MERGE-SORT(A, p, r)
1  if p < r
2      then q ← ⌊(p + r)/2⌋
3           MERGE-SORT(A, p, q)
4           MERGE-SORT(A, q + 1, r)
5           MERGE(A, p, q, r)
To sort the entire sequence A = ⟨A[1], A[2], . . . , A[n]⟩, we call MERGE-SORT(A, 1, length[A]),
where once again length[A] = n. If we look at the operation of the procedure bottom-up when n is a
power of two, the algorithm consists of merging pairs of 1-item sequences to form sorted sequences
of length 2, merging pairs of sequences of length 2 to form sorted sequences of length 4, and so on,
until two sequences of length n/2 are merged to form the final sorted sequence of length n. Figure 4.1
illustrates this process.
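For reference, here is a compact C# sketch of MERGE-SORT together with a MERGE procedure. It follows the pseudocode above but uses 0-based indices and an auxiliary array rather than sentinels; the names and details are illustrative, not a prescribed implementation.

using System;

class MergeSortExample
{
    // Sorts a[p..r] (inclusive, 0-based), following MERGE-SORT(A, p, r).
    public static void MergeSort(int[] a, int p, int r)
    {
        if (p < r)
        {
            int q = (p + r) / 2;          // divide
            MergeSort(a, p, q);           // conquer the left half
            MergeSort(a, q + 1, r);       // conquer the right half
            Merge(a, p, q, r);            // combine
        }
    }

    // Merges the sorted subarrays a[p..q] and a[q+1..r] in Θ(n) time.
    static void Merge(int[] a, int p, int q, int r)
    {
        int[] merged = new int[r - p + 1];
        int i = p, j = q + 1, k = 0;
        while (i <= q && j <= r)
            merged[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i <= q) merged[k++] = a[i++];
        while (j <= r) merged[k++] = a[j++];
        Array.Copy(merged, 0, a, p, merged.Length);
    }
}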
Analysing divide-and-conquer algorithms
When an algorithm contains a recursive call to itself, its running time can often be described by a
recurrence equation or recurrence, which describes the overall running time on a problem of size n
in terms of the running time on smaller inputs. We can then use mathematical tools to solve the
recurrence and provide bounds on the performance of the algorithm.
A recurrence for the running time of a divide-and-conquer algorithm is based on the three steps of the
basic paradigm. As before, we let T(n) be the running time on a problem of size n. If the problem size
is small enough, say n ≤ c for some constant c, the straightforward solution takes constant time, which
we write as Θ(1). Suppose we divide the problem into a subproblems, each of which is 1/b the size
of the original. If we take D(n) time to divide the problem into subproblems and C(n) time to
combine the solutions to the subproblems into the solution to the original problem, we get the
recurrence

T(n) = Θ(1)                         if n ≤ c,
T(n) = a T(n/b) + D(n) + C(n)       otherwise.
Page 56
Figure 4.1 The operation of merge sort on the array A = ⟨5, 2, 4, 6, 1, 3, 2, 6⟩. The lengths of the
sorted sequences being merged increase as the algorithm progresses from bottom to top.
Page 57
Exercises
Using Figure 4.1 as a model, illustrate the operation of merge sort on the array A = ⟨3, 41, 52, 26, 38,
57, 9, 49⟩.
Page 58
The Principle of Optimality
Dynamic programming relies on a principle of optimality. This principle states that in an optimal
sequence of decisions or choices, each subsequence must also be optimal. For example, in the matrix-chain
multiplication problem, not only is the value we are interested in optimal, but all the other
entries in the table also represent optimal values.
The principle can be stated as follows: the optimal solution to a problem is a combination of optimal
solutions to some of its subproblems.
The difficulty in turning the principle of optimality into an algorithm is that it is not usually obvious
which subproblems are relevant to the problem under consideration.
Let d(x,y) denote the shortest distance between vertices x and y. We have the following equation:
d(S, T) = min{ d(S, A) + d(A, T),  d(S, B) + d(B, T),  d(S, C) + d(C, T) }
Page 59
The question is: how do we find the shortest route from, say, vertex A to vertex T? Note that we can
use the same principle to find a shortest route from A to T; that is, the problem of finding a shortest
route from A to T has the same form as the problem of finding a shortest route from S to T, except
that the size of the problem is now smaller.
The shortest route finding problem can now be solved systematically as follows:
d(S, T) = min{ d(S, A) + d(A, T),  d(S, B) + d(B, T),  d(S, C) + d(C, T) }
        = min{ 15 + d(A, T),  18 + d(B, T),  3 + d(C, T) }                      (2.1)

d(A, T) = min{ d(A, D) + d(D, T),  d(A, E) + d(E, T) }
        = min{ 11 + d(D, T),  10 + d(E, T) }
        = min{ 11 + 41,  10 + 21 } = 31                                         (2.2)

d(B, T) = min{ d(B, E) + d(E, T),  d(B, F) + d(F, T),  d(B, G) + d(G, T) }
        = min{ 9 + d(E, T),  1 + d(F, T),  2 + d(G, T) }
        = min{ 9 + 21,  1 + 3,  2 + 21 } = 4                                    (2.3)

d(C, T) = min{ d(C, G) + d(G, T),  d(C, H) + d(H, T) }
        = min{ 14 + 21,  16 + 27 } = 35                                         (2.4)
Substituting (2.2), (2.3) and (2.4) into (2.1), we obtain d(S, T) = min{15 + 31, 18 + 4, 3 + 35} = 22, which
implies that the shortest route from S to T is S -> B -> F -> T. As shown above, the basic idea of the
dynamic programming strategy is to decompose a large problem into several sub-problems. Each
sub-problem is identical to the original problem except the size is smaller. Thus the dynamic
programming strategy always solves a problem recursively.
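The same computation can be expressed as a short memoized recursion. The C# sketch below uses the edge weights from the worked example and treats the distances d(D,T), …, d(H,T) as direct edge weights, exactly as the example does (an assumption about the underlying figure); it prints 22.

using System;
using System.Collections.Generic;

class ShortestRouteDP
{
    // Edge weights taken from the worked example above.
    static readonly Dictionary<string, (string to, int w)[]> Edges = new()
    {
        ["S"] = new[] { ("A", 15), ("B", 18), ("C", 3) },
        ["A"] = new[] { ("D", 11), ("E", 10) },
        ["B"] = new[] { ("E", 9), ("F", 1), ("G", 2) },
        ["C"] = new[] { ("G", 14), ("H", 16) },
        ["D"] = new[] { ("T", 41) },
        ["E"] = new[] { ("T", 21) },
        ["F"] = new[] { ("T", 3) },
        ["G"] = new[] { ("T", 21) },
        ["H"] = new[] { ("T", 27) },
        ["T"] = Array.Empty<(string, int)>()
    };

    static readonly Dictionary<string, int> Memo = new();

    // d(x, T): the shortest distance from x to T, computed recursively and memoized.
    static int D(string x)
    {
        if (x == "T") return 0;
        if (Memo.TryGetValue(x, out int cached)) return cached;
        int best = int.MaxValue;
        foreach (var (to, w) in Edges[x])
            best = Math.Min(best, w + D(to));
        Memo[x] = best;
        return best;
    }

    static void Main() => Console.WriteLine(D("S"));   // prints 22
}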
Concept notes
A good algorithm is like a sharp knife--it does exactly what it is supposed to do with a minimum
amount of applied effort. Using the wrong algorithm to solve a problem is like trying to cut a steak
with a screwdriver: you may eventually get a digestible result, but you will expend considerably
more effort than necessary, and the result is unlikely to be aesthetically pleasing.
Algorithms devised to solve the same problem often differ dramatically in their efficiency. These
differences can be much more significant than the difference between a personal computer and a
supercomputer. As an example, let us pit a supercomputer running insertion sort against a small
personal computer running merge sort. They each must sort an array of one million numbers.
Suppose the supercomputer executes 100 million instructions per second, while the personal
computer executes only one million instructions per second. To make the difference even more
dramatic, suppose that the world's craftiest programmer codes insertion sort in machine language for
the supercomputer, and the resulting code requires 2n² supercomputer instructions to sort n numbers.
Merge sort, on the other hand, is programmed for the personal computer by an average programmer
using a high-level language with an inefficient compiler, with the resulting code taking 50n lg n
personal computer instructions. To sort a million numbers, the supercomputer takes
2·(10⁶)² instructions / 10⁸ instructions per second = 20,000 seconds (about 5.6 hours),
while the personal computer takes
50·10⁶·lg 10⁶ instructions / 10⁶ instructions per second ≈ 1,000 seconds (about 17 minutes).
By using an algorithm whose running time has a lower order of growth, even with a poor compiler,
the personal computer runs 20 times faster than the supercomputer!
This example shows that algorithms, like computer hardware, are a technology. Total system
performance depends on choosing efficient algorithms as much as on choosing fast hardware. Just as
rapid advances are being made in other computer technologies, they are being made in algorithms as
well.
5.4 Backtracking
Backtracking is a general algorithm for finding all (or some) solutions to certain computational
problems, notably constraint satisfaction problems. It incrementally builds candidates for the
solutions and abandons each partial candidate c ("backtracks") as soon as it determines that c cannot
possibly be completed to a valid solution.
The classic textbook example of the use of backtracking is the eight queens puzzle, which asks for all
arrangements of eight chess queens on a standard chessboard so that no queen attacks any other. In
the common backtracking approach, the partial candidates are arrangements of k queens in the first k
rows of the board, all in different rows and columns. Any partial solution that contains two mutually
attacking queens can be abandoned.
Backtracking can be applied only for problems which admit the concept of a "partial candidate
solution" and a relatively quick test of whether it can possibly be completed to a valid solution. It is
useless, for example, for locating a given value in an unordered table. When it is applicable, however,
backtracking is often much faster than brute force enumeration of all complete candidates, since it
can eliminate a large number of candidates with a single test.
Backtracking is an important tool for solving constraint satisfaction problems, such as crosswords,
verbal arithmetic, Sudoku, and many other puzzles. It is often the most convenient technique for
parsing, for the knapsack problem and other combinatorial optimization problems. It is also the basis
of the so-called logic programming languages such as Icon, Planner and Prolog.
Backtracking depends on user-given "black box procedures" that define the problem to be solved, the
nature of the partial candidates, and how they are extended into complete candidates. It is therefore a
metaheuristic rather than a specific algorithm, although, unlike many other metaheuristics, it is
guaranteed to find all solutions to a finite problem in a bounded amount of time.
The term "backtrack" was coined by American mathematician D. H. Lehmer in the 1950s. The
pioneer string-processing language SNOBOL (1962) may have been the first to provide a built-in
general backtracking facility.
The backtracking algorithm enumerates a set of partial candidates that, in principle, could be
completed in various ways to give all the possible solutions to the given problem. The completion is
done incrementally, by a sequence of candidate extension steps.
Conceptually, the partial candidates are represented as the nodes of a tree structure, the potential
search tree. Each partial candidate is the parent of the candidates that differ from it by a single
extension step; the leaves of the tree are the partial candidates that cannot be extended any further.
The backtracking algorithm traverses this search tree recursively, from the root down, in depth-first
order. At each node c, the algorithm checks whether c can be completed to a valid solution. If it
cannot, the whole sub-tree rooted at c is skipped (pruned). Otherwise, the algorithm checks whether c
itself is a valid solution, and if so reports it to the user; and recursively enumerates all sub-trees of c.
The two tests and the children of each node are defined by user-given procedures.
Therefore, the actual search tree that is traversed by the algorithm is only a part of the potential tree.
The total cost of the algorithm is the number of nodes of the actual tree times the cost of obtaining
and processing each node. This fact should be considered when choosing the potential search tree and
implementing the pruning test.
Page 63
Pseudo code
In order to apply backtracking to a specific class of problems, one must provide the data P for the
particular instance of the problem that is to be solved, and six procedural parameters: root, reject,
accept, first, next, and output. These procedures should take the instance data P as a parameter and
should do the following:
root(P): return the partial candidate at the root of the search tree.
reject(P,c): return true only if the partial candidate c is not worth completing.
accept(P,c): return true if c is a complete and valid solution of P, and false otherwise.
first(P,c): generate the first extension of candidate c.
next(P,s): generate the next alternative extension of a candidate, after the extension s.
output(P,c): report the solution c of P to the user, as appropriate to the application.
The backtracking algorithm reduces the problem to the call bt(root(P)), where bt is the following
recursive procedure:
1. procedure bt(c)
2.     if reject(P,c) then return
3.     if accept(P,c) then output(P,c)
4.     s ← first(P,c)
5.     while s ≠ Λ do
6.         bt(s)
7.         s ← next(P,s)
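As a concrete illustration of this template, the following C# sketch applies it to the n-queens puzzle mentioned earlier: a partial candidate is a placement of queens in the first k rows (one column index per row), and a candidate is rejected as soon as two queens attack each other. The class and method names are illustrative.

using System;
using System.Collections.Generic;

class NQueens
{
    static int n = 8;
    static int solutions = 0;

    // reject: true if the queen in the last filled row attacks an earlier queen.
    static bool Reject(List<int> cols)
    {
        int k = cols.Count - 1;
        for (int i = 0; i < k; i++)
            if (cols[i] == cols[k] || Math.Abs(cols[i] - cols[k]) == k - i)
                return true;
        return false;
    }

    // bt(c): depth-first enumeration with pruning, as in the pseudocode above.
    static void Bt(List<int> cols)
    {
        if (cols.Count > 0 && Reject(cols)) return;    // prune this subtree
        if (cols.Count == n) { solutions++; return; }  // accept and record a solution
        for (int col = 0; col < n; col++)              // first / next extensions
        {
            cols.Add(col);
            Bt(cols);
            cols.RemoveAt(cols.Count - 1);
        }
    }

    static void Main()
    {
        Bt(new List<int>());
        Console.WriteLine(solutions + " solutions");   // 92 for n = 8
    }
}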
Usage considerations
The reject procedure should be a boolean-valued function that returns true only if it is certain that no
possible extension of c is a valid solution for P. If the procedure cannot reach a definite conclusion, it
should return false. An incorrect true result may cause the bt procedure to miss some valid solutions.
The procedure may assume that reject(P,t) returned false for every ancestor t of c in the search tree.
On the other hand, the efficiency of the backtracking algorithm depends on reject returning true for
candidates that are as close to the root as possible. If reject always returns false, the algorithm will
still find all solutions, but it will be equivalent to a brute-force search.
Page 64
The accept procedure should return true if c is a complete and valid solution for the problem instance
P, and false otherwise. It may assume that the partial candidate c and all its ancestors in the tree have
passed the reject test.
Note that the general pseudo-code above does not assume that the valid solutions are always leaves
of the potential search tree. In other words, it admits the possibility that a valid solution for P can be
further extended to yield other valid solutions.
The first and next procedures are used by the backtracking algorithm to enumerate the children of a
node c of the tree, that is, the candidates that differ from c by a single extension step. The call
first(P,c) should yield the first child of c, in some order; and the call next(P,s) should return the next
sibling of node s, in that order. Both functions should return a distinctive "null" candidate, denoted
here by 'Λ', if the requested child does not exist.
Together, the root, first, and next functions define the set of partial candidates and the potential
search tree. They should be chosen so that every solution of P occurs somewhere in the tree, and no
partial candidate occurs more than once. Moreover, they should admit an efficient and effective reject
predicate.
The pseudo-code above will call output for all candidates that are a solution to the given instance P.
The algorithm is easily modified to stop after finding the first solution, or a specified number of
solutions; or after testing a specified number of partial candidates, or after spending a given amount
of CPU time.
Examples
1. Puzzles such as eight queens puzzle, crosswords, verbal arithmetic, Sudoku, Peg Solitaire.
2. Combinatorial optimization problems such as parsing and the knapsack problem.
3. Logic programming languages such as Icon, Planner and Prolog, which use backtracking
internally to generate answers.
Below is an example for the constraint satisfaction problem:
Page 65
Constraint satisfaction
The general constraint satisfaction problem consists in finding a list of integers x = (x[1],x[2], ...,
x[n]), each in some range {1, 2, ..., m}, that satisfies some arbitrary constraint (boolean function) F.
For this class of problems, the instance data P would be the integers m and n, and the predicate F. In
a typical backtracking solution to this problem, one could define a partial candidate as a list of
integers c = (c[1], c[2], ..., c[k]), for any k between 0 and n, that are to be assigned to the first k
variables x[1], x[2], ..., x[k]. The root candidate would then be the empty list (). The first and next
procedures would then be
function first(P,c)
    k ← length(c)
    if k = n
        then return Λ
        else return (c[1], c[2], ..., c[k], 1)

function next(P,s)
    k ← length(s)
    if s[k] = m
        then return Λ
        else return (s[1], s[2], ..., s[k-1], s[k] + 1)
The call reject(P,c) should return true if the constraint F cannot be satisfied by any list of n integers
that begins with the k elements of c. For backtracking to be effective, there must be a way to detect
this situation, at least for some candidates c, without enumerating all those m^(n−k) n-tuples.
For example, if F is the conjunction of several boolean predicates, F = F[1] ∧ F[2] ∧ … ∧ F[p], and each F[i]
depends only on a small subset of the variables x[1], ..., x[n], then the reject procedure could simply
check the terms F[i] that depend only on variables x[1], ..., x[k], and return true if any of those terms
returns false. In fact, reject needs only check those terms that do depend on x[k], since the terms that
depend only on x[1], ..., x[k-1] will have been tested further up in the search tree.
Assuming that reject is implemented as above, then accept(P,c) needs only check whether c is
complete, that is, whether it has n elements.
It is generally better to order the list of variables so that it begins with the most critical ones (i.e. the
ones with fewest value options, or which have a greater impact on subsequent choices).
One could also allow the next function to choose which variable should be assigned when extending
a partial candidate, based on the values of the variables already assigned by it. Further improvements
can be obtained by the technique of constraint propagation.
In addition to retaining minimal recovery values used in backing up, backtracking implementations
commonly keep a variable trail, to record value change history. An efficient implementation will
avoid creating a variable trail entry between two successive changes when there is no choice point, as
the backtracking will erase all of the changes as a single operation.
An alternative to the variable trail is to keep a timestamp of when the last change was made to the
variable. The timestamp is compared to the timestamp of a choice point. If the choice point has an
associated time later than that of the variable, it is unnecessary to revert the variable when the choice
point is backtracked, as it was changed before the choice point occurred.
Page 67
5.5 Branch and bound
Branch and bound (BB or B&B) is an algorithm design paradigm for discrete and combinatorial
optimization problems, as well as for general real-valued problems. A branch-and-bound algorithm
consists of a systematic enumeration of candidate solutions by means of state space search: the set of
candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm
explores branches of this tree, which represent subsets of the solution set. Before enumerating the
candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on
the optimal solution, and is discarded if it cannot produce a better solution than the best one found so
far by the algorithm.
The algorithm depends on the efficient estimation of the lower and upper bounds of a region/branch
of the search space and approaches exhaustive enumeration as the size (n-dimensional volume) of the
region tends to zero.
The method was first proposed by A. H. Land and A. G. Doig in 1960 for discrete programming, and
it has become the most commonly used tool for solving NP-hard optimization problems. The name
"branch and bound" first occurred in the work of Little et al. on the traveling salesman problem.
The goal of a branch-and-bound algorithm is to find a value x that maximizes or minimizes the value
of a real-valued function f(x), called an objective function, among some set S of admissible, or
candidate solutions. The set S is called the search space, or feasible region. The rest of this section
assumes that minimization of f(x) is desired; this assumption comes without loss of generality, since
one can find the maximum value of f(x) by finding the minimum of g(x) = −f(x). A B&B algorithm
operates according to two principles:
It recursively splits the search space into smaller spaces and then minimizes f(x) on each of these
smaller spaces; this splitting is called branching.
Branching alone would amount to brute-force enumeration of candidate solutions and testing them
all. To improve on the performance of brute-force search, a B&B algorithm keeps track of bounds on
the minimum that it is trying to find, and uses these bounds to "prune" the search space, eliminating
candidate solutions that it can prove will not contain an optimal solution.
Turning these principles into a concrete algorithm for a specific optimization problem requires some
kind of data structure that represents sets of candidate solutions. Such a representation is called an
instance of the problem. Denote the set of candidate solutions of an instance I by S_I. The instance
representation has to come with three operations:
branch(I) produces two or more instances that each represent a subset of S_I. (Typically, the subsets
are disjoint, to prevent the algorithm from visiting the same candidate solution twice, but this is not
required. The only requirement for a correct B&B algorithm is that the optimal solution among S_I is
contained in at least one of the subsets.)
bound(I) computes a lower bound on the value of any candidate solution in the space represented by
I, that is, bound(I) ≤ f(x) for all x in S_I.
solution(I) determines whether I represents a single candidate solution. (Optionally, if it does not, the
operation may choose to return some feasible solution from among S_I.)
Using these operations, a B&B algorithm performs a top-down recursive search through the tree of
instances formed by the branch operation. Upon visiting an instance I, it checks whether bound(I) is
greater than the upper bound for some other instance that it already visited; if so, I may be safely
discarded from the search and the recursion stops. This pruning step is usually implemented by
maintaining a global variable that records the minimum upper bound seen among all instances
examined so far.
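A skeleton of such a search might look like the following C# sketch, with delegate fields named after the branch, bound and solution operations described above. The names, the Value delegate and the stack-based (depth-first) traversal order are illustrative choices, not part of the text.

using System;
using System.Collections.Generic;

// Generic branch-and-bound skeleton for minimization.
class BranchAndBound<TInstance>
{
    public Func<TInstance, IEnumerable<TInstance>> Branch;   // split an instance into sub-instances
    public Func<TInstance, double> Bound;                    // lower bound on f over the instance
    public Func<TInstance, bool> IsSolution;                 // does the instance hold a single candidate?
    public Func<TInstance, double> Value;                    // f(x) for a single candidate

    public double Solve(TInstance root)
    {
        double best = double.PositiveInfinity;               // minimum upper bound seen so far
        var pending = new Stack<TInstance>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            TInstance inst = pending.Pop();
            if (Bound(inst) >= best) continue;                // prune: cannot beat the best found so far
            if (IsSolution(inst))
                best = Math.Min(best, Value(inst));           // update the global upper bound
            else
                foreach (TInstance child in Branch(inst))
                    pending.Push(child);
        }
        return best;
    }
}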
Finally, in this chapter we have discussed general approaches to algorithm design. The intent has
been to investigate basic algorithm design paradigms: dynamic programming, greedy algorithms,
depth-first search, and so on. In some sense, the algorithms you have learned here are rarely immediately
applicable to your later work (unless you go on to be an algorithm designer), because real-world
problems are always messier than these simple abstract problems. However, there are some important
lessons to take away that can be used to design better algorithms.
1. Develop a clean mathematical model: Most real-world problems are messy. An important first
step in solving any problem is to produce a simple and clean mathematical formulation. For example,
this might involve describing the problem as an optimization problem on graphs, sets, or strings. If
you cannot clearly describe what your algorithm is supposed to do, it is very difficult to know when
you have succeeded.
2. Create good rough designs: Before jumping in and starting coding, it is important to begin with a
good rough design. If your rough design is based on a bad paradigm then no amount of additional
tuning and refining will save this bad design.
Page 69
3. Prove your algorithm correct: Many times you come up with an idea that seems promising, only
to find out later (after a lot of coding and testing) that it does not work. Prove that your algorithm is
correct before coding. Writing proofs is not always easy, but it may save you a few weeks of wasted
programming time. If you cannot see why it is correct, chances are that it is not correct at all.
4. Can it be improved?: Once you have a solution, try to come up with a better one. Is there some
reason why a better algorithm does not exist? (That is, can you establish a lower bound?)
5. Prototype to generate better designs: We have attempted to analyze algorithms from an
asymptotic perspective, which hides many of the details of the running time, but I have tried to give you a
general perspective for separating good designs from bad ones. After you have isolated the good
designs, then it is time to start prototyping and doing empirical tests to establish the real constant
factors. A good profiling tool can tell you which subroutines are taking the most time, and those are
the ones you should work on improving.
6. Still too slow?: If your problem has an unacceptably high execution time, you might consider an
approximation algorithm. The world is full of heuristics, both good and bad. You should develop a
good heuristic, and if possible, prove a ratio bound for your algorithm. If you cannot prove a ratio
bound, run many experiments to see how good the actual performance is.
There is still much more to be learned about algorithm design, but we have covered a great deal of
the basic material.
Page 70