Algorithmics and Optimization
Andreas de Vries
These lecture notes are published under the Creative Commons License 4.0
(http://creativecommons.org/licenses/by/4.0/)
Contents
I Foundations of algorithmics 8
1 Elements and control structures of algorithms 9
1.1 Mathematical notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 The basic example: Euclid’s algorithm . . . . . . . . . . . . . . . . . . . . . . 10
1.3 The elements of an algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Control structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 Definition of an algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Algorithmic analysis 16
2.1 Correctness (“effectiveness”) . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Complexity to measure efficiency . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 Recursions 24
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Recursive algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Searching the maximum in an array . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Recursion versus iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Complexity of recursive algorithms . . . . . . . . . . . . . . . . . . . . . . . . 28
3.6 The towers of Hanoi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 Sorting 34
4.1 Simple sorting algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Theoretical minimum complexity of a sorting algorithm . . . . . . . . . . . . . 35
4.3 A recursive construction strategy: Divide and conquer . . . . . . . . . . . . . . 36
4.4 Fast sorting algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5 Comparison of sort algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7 Graphs and shortest paths 63
7.1 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.2 Representation of graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
7.3 Traversing graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.4 Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.5 Shortest paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
8 Dynamic Programming 75
8.1 An optimum-path problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
8.2 The Bellman functional equation . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.3 Production smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.4 The travelling salesman problem . . . . . . . . . . . . . . . . . . . . . . . . . 84
9 Simplex algorithm 87
9.1 Mathematical formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
9.2 The simplex algorithm in detail . . . . . . . . . . . . . . . . . . . . . . . . . . 88
9.3 What did we do? or: Why simplex? . . . . . . . . . . . . . . . . . . . . . . . 91
9.4 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
10 Genetic algorithms 96
10.1 Evolutionary algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.2 Basic notions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.3 The “canonical” genetic algorithm . . . . . . . . . . . . . . . . . . . . . . . . 98
10.4 The 0-1 knapsack problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
10.5 Difficulties of genetic algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 101
10.6 The traveling salesman problem . . . . . . . . . . . . . . . . . . . . . . . . . 103
10.7 Axelrod’s genetic algorithm for the prisoner’s dilemma . . . . . . . . . . . . . 104
10.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Appendix 107
A Mathematics 108
A.1 Exponential and logarithm functions . . . . . . . . . . . . . . . . . . . . . . . 108
A.2 Number theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
A.3 Searching in unsorted data structures . . . . . . . . . . . . . . . . . . . . . . . 110
Bibliography 120
Dear reader,
perhaps you are surprised to find these lecture notes written in English (or rather: in a scientific
idiom which is very similar to English). We have decided to write them in English for the
following reasons. Firstly, any computer scientist or information system technologist has to
read a lot of English documents, web sites, text books — particularly if he or she wants to get to
know innovative issues. So this is the main reason: training for non-natively English speaking
students. Secondly, foreign students and international institutions may benefit from these notes.
Thirdly, the notes offer a convenient way to learn the English terminology corresponding to the
German one given in the lecture.
To help you to learn the English terminology, you will find a small dictionary at the end
of these notes, presenting some of the expressions most widely used in computer sciences and
mathematics, along with their German translations. In addition, some arithmetical terms are
listed in English and in German.
You might also be surprised to find another human language — mathematics. Why
mathematics in a book about algorithmics? Algorithms are, in essence, applied mathematics.
Even if they deal with apparently “unmathematical” subjects such as manipulating strings or
searching objects, mathematics is the basis. To mention just a few examples: the classical algo-
rithmic concept of recursion is very closely related to the principle of mathematical induction;
rigorous proofs are needed for establishing the correctness of given algorithms; running times
have to be computed.
The contents of these lecture notes cover a wide range. On the one hand they try to convey
basic knowledge about algorithmics, so that you will learn answers to the following questions: What
is an algorithm and what are its building blocks? How can an algorithm be analyzed? How do
standard well-known algorithms work? On the other hand, these lecture notes introduce the
wide and important field of optimization. Optimization is a basic principle of human activity
and thinking; it is involved in the sciences and in practice. It mainly deals with the question:
How can a solution to a given problem under certain constraints be achieved with a minimum
cost, be it time, money, or machine capacity? Optimization is a highly economical principle.
However, any solution of an optimization problem is a list of instructions, such as “do this, then
do that, but only under the condition that . . . ,” i.e., an algorithm — the circle is closed.
So we think optimization to be one of the basic subjects for you as a student of business
information systems, for it will be one of your main business activities in the future. Surely, no
lecture can give an answer to all problems with which you will be challenged, but we think that it is
important to understand that any optimization problem has a basic structure — it is the structure
of a given optimization problem that you should understand, because then you may solve it in a
more efficient way (you see, another optimization problem).
Of course, a single semester is much too short to mention all relevant aspects. But our hope
is that you gain an intuitive feeling for the actual problems and obstacles. For this is what you
really must have to face future challenges — understanding.
Introduction
doubles roughly every 24 months. So the applicability and the abilities of computers are grow-
ing more and more. They fly aeroplanes and starships, control power stations and cars, find and
store information, or serve as worldwide communication devices. Over the last three decades,
computers have caused a technological, economic, and social revolution which could hardly be
foreseen.
Parallel to these technological changes, and in part having enabled them, various programming
languages have been developed. From the first “higher” programming languages of the
1950s for scientific and business-oriented computations, like Fortran and COBOL, to internet-
based languages like Java or PHP, every new field of activity has brought forth new pro-
gramming languages specialized for it.
In view of these impressive and enormous developments, the question may be raised: Is there
anything that remains constant during all these changes? Of course, there are such constants,
and they were to a great part stated already before the invention of the first computers in the
1930’s, mainly achieved by the mathematicians Gödel, Turing, Church, Post and Kleene: These
are the fundamental laws underlying any computation and hence any programming language.
These fundamental laws of algorithms are the subject of this book, not a particular programming
language.
However, in this book the study of algorithms is done against the background of, and influenced
by, the structure of Java, one of the most elaborate and widely used programming languages. In
particular, the pseudocode to represent algorithms is strongly influenced by the syntax of Java,
although it should be understandable without knowing Java.
References
The literature on algorithmics and optimization is immense. The following list is only a tiny
and incomplete selection.
• T.H. Cormen et al.: Introduction to Algorithms [5] – classical standard reference, with
considerable breadth and width of subjects. A must for a computer scientist.
• D. Harel & Y. Feldman: Algorithmik. Die Kunst des Rechnens [21] – gives a good
overview of the wide range of algorithms and the underlying paradigms, even men-
tioning quantum computation.
• F. Kaderali & W. Poguntke: Graphen, Algorithmen, Netze [26] – a basic introduction to
the theory of graphs and graph algorithms.
• A. Barth: Algorithmik für Einsteiger [1] – a nice book explaining principles of algorith-
mics.
• W. Press et al.: Numerical Recipes in C++ [36] – for specialists, or special problems.
For lots of standard, but also rather difficult, problems it gives a short theoretical
introduction and describes efficient solutions. Requires some background in mathe-
matics.
Foundations of algorithmics
Chapter 1
Elements and control structures of algorithms
Algorithm 1.2. (Euclid’s algorithm) Given two positive integers m and n, find their greatest
common divisor gcd, that is, the largest positive integer that evenly divides both m and n.
E1. [Exchange m and n.] m ↔ n.
E2. [Reduce n modulo m.] n ← n % m.
E3. [Is n greater than zero?] If n > 0, loop back to step E1; if n ≤ 0, the algorithm terminates,
m is the answer.
• Step E1 exchanges m and n such that m = 4 and n = 6; step E2 yields the values m = 4,
n = 2; because in E3 still n > 0, step E1 is done again.
• Again arriving in E1, m and n are exchanged, yielding the new values m = 2, n = 4; E2
yields the new value n = 0, and still m = 2; E3 tells us that m = 2 is the answer.
gcd (6, 4) = 2.
A first observation is that the verbal description is not a very convenient technique to de-
scribe the effect of an algorithm. Instead, we will create a value table denoting the values
depending on time, so to say the “evolution of values” during the flow of the algorithm;
see Table 1.1.
1.2.1 Pseudocode
A convenient way to express an algorithm is pseudocode. This is an artificial and informal
language which is similar to everyday English, but also resembles higher-level programming
languages such as Java, C, or Pascal. (In fact, one purpose of pseudocode is just to enable the
direct transformation into a programming language; pseudocode is the “mother” of all program-
ming languages). Euclid’s algorithm in pseudocode reads as follows:
euclid (m, n) {
while ( n > 0 ) {
m ↔ n;
n ← n % m;
}
return m;
}
By convention, any assignment is terminated by a semicolon (;). This is in accordance with most
of the common programming languages (especially Java, C, C++, Pascal, PHP). Remarkably,
Euclid’s algorithm is rather short in pseudocode. Obviously, pseudocode is a very effective way
to represent algorithms, and we will use it throughout this script.
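To see how directly pseudocode carries over to a real programming language, here is a transcription of the algorithm euclid into Python (our own sketch; the lecture itself uses Java-flavored pseudocode):

```python
def euclid(m, n):
    """Euclid's algorithm: greatest common divisor of two positive integers."""
    while n > 0:
        m, n = n, m    # m <-> n : exchange m and n
        n = n % m      # n <- n % m : replace n by the remainder of n divided by m
    return m
```

For instance, euclid(6, 4) returns 2, reproducing the hand computation of Section 1.2.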
We use the following conventions in our pseudocode.
1. In the first line the name of the algorithm appears, followed by the required parameters in
parentheses: euclid (m, n)
2. Indentation indicates block structure. For example, the body of the while-loop only consists
of one instruction. Often we will indicate block structure in addition by {...} (as in Java
or C), but it could equally be read as begin ... end (as in Pascal).
3. As control structure key words we use only while, for, and if ... else, as in the common
programming languages (see below for details on control structures).
4. Comments are indicated by the double slash //. It means that the rest of the line is a
comment.
5. We will use the semicolon (;) to indicate the end of an instruction.
With the Euclidean algorithm we will explore what the basic elements are out of which
a general algorithm can be built: the possible operations, the assignment, and three control
structures. With these elements we are able to define what an algorithm actually is.
with the “domain of definition” D ⊂ R^d and the “range” R ⊂ R^r and d, r ∈ N ∪ {±∞}. For
instance, the modulo operation is given by the function
f : N^2 → N, f (m, n) = m % n.
Here d = 2 and r = 1.
Even non-numerical operations such as “assignment of a memory address” or “string ad-
dition” (concatenation) are possible operations, the sets D and R only have to be defined ap-
propriately. (In the end: All strings are natural numbers, [3] p.213.) Also Boolean functions
evaluating logical expressions (such as x < y) are possible.
1.3.2 Instructions
An instruction is an elementary command to do an “action”. We will be dealing only with three
instructions: input, output, and the assignment.
The instructions input and output denote the methods which manage the flow of data into
and out of the system. They turn out to be the entrance and the exit door of an algorithm. Both
are methods which process “letters” of a given “alphabet.” In Euclid’s algorithm, there are two
input “letters”, namely m and n, and the “alphabet” is the set N of positive integers; the output
is one “letter” m (D = N × N, R = N).
Often the instruction input is replaced by the name of the input parameter list of the algo-
rithm, e.g. “abc(m, n; . . .)”. If the algorithm has no input parameters, we can write it with empty
parentheses such as “abc().” This notation makes sense especially if there is no further input
data during the algorithm performance. Analogously, the key word return is frequently used
instead of output.
The assignment is denoted by the arrow ←. The assignment m ← n assigns the “value” of
the right side n to the “variable” m on the left side. The right side may be an operation with a
well-defined result, e.g., a subtraction:
m ← 4 − 2.
Here m is assigned the value 2. A necessary condition is of course that the value of the right
side is well-defined. An assignment m ← n in which n is a variable which has no definite value
is not valid. If on the other hand m has the value 4, say, then the assignment
m ← m−2
makes sense, meaning that m gets the value 2. Note that the order of assignment instructions is
important: If, e.g., n has initial value 4 then the instruction sequence “m ← n; n ← 1;” is quite
different from the sequence “n ← 1; m ← n;”
We will often deal with the exchange instruction “↔”. It can be realized as a sequence of
instructions with a new (“intermediate”) variable t:
“m ↔ n; ” is defined as “t ← m; m ← n; n ← t; ” (1.4)
Sometimes we will use multiple assignments m ← n ← t; it means that both variables m and n
are assigned the value of t.
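As a check, the exchange (1.4) can be written out in executable form. The following Python sketch (the function name is ours) uses the intermediate variable t exactly as in (1.4); Python would also allow the simultaneous assignment m, n = n, m:

```python
def exchange(m, n):
    """Realize m <-> n as the instruction sequence t <- m; m <- n; n <- t of (1.4)."""
    t = m   # t <- m : save the old value of m
    m = n   # m <- n
    n = t   # n <- t : the old value of m
    return m, n
```

For instance, exchange(4, 6) returns (6, 4).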
1.4.1 Sequence
It is the simplest of the control structures. Here each instruction is simply executed one after the
other. In pseudocode, the sequence structure simply reads
instruction 1;
...;
instruction n;
if (condition) {
instruction 1;
...;
instruction m
} else {
instruction 1;
...;
instruction n;
}
Here a condition is a logical proposition being either true or false. It is also called a Boolean
expression. If it is true, the instructions 1, . . . , m of the if-branch are executed; if it is false, the
instructions 1, . . . , n of the else-branch are executed. If n = 0, the else-branch can be omitted
completely. An example is given
in Euclid’s algorithm
if (m < n) {
m↔n
}
while (condition) {
instruction 1;
...;
instruction n;
}
If the loop is performed a definite number of times, we also use the for-statement:
for (i = 1 to m) {
instruction 1;
...;
instruction n;
}
myAlgorithm(m, n) {
k ← m ∗ subroutine(n);
return k;
}
A special subroutine call is the “recursion” which we will consider in more detail below. The
terminology varies; subroutines are also known as routines, procedures, functions (especially
if they return results) or methods.
try {
instruction 1;
...;
instruction n;
} catch ( exception1 A ) {
instruction A;
} catch ( exception2 B ) {
instruction B;
}
Here the try-block contains the instructions of the algorithm. These instructions are monitored
to perform correctly. If an exception occurs during their execution, it is said to be “thrown,”
and according to its nature it is “caught” by one of the following catch-blocks, i.e., the execu-
tion flow is terminated and jumps to the appropriate catch-block. The sequence of catch-blocks
has to be arranged from the special cases to the more general cases. For instance, the first caught
exception may be an arithmetic exception such as division by zero, the next one a more general
runtime exception such as number parsing of a non-numeric input, or an IO exception such as
trying to read a file which is not present, and so on to the catch-block for the most general
exception. In Java, the most general exception is an object of the class Exception.
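The ordering rule — special cases before general ones — can be illustrated in Python, whose try/except construct plays the role of try/catch (the function name and return messages below are made up for the illustration):

```python
def safe_divide(numerator, text):
    """Parse text as an integer and divide; handlers are ordered from special to general."""
    try:
        divisor = int(text)            # may throw a ValueError (non-numeric input)
        return numerator / divisor     # may throw a ZeroDivisionError
    except ZeroDivisionError:          # a special arithmetic exception ...
        return "division by zero"
    except ValueError:                 # ... then a more general parsing error ...
        return "not a number"
    except Exception:                  # ... and finally the most general exception
        return "unexpected error"
```

Swapping the last handler to the front would intercept every exception and make the special handlers unreachable, which is exactly why the special-to-general order matters.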
(Figure: an algorithm as a “black box” transforming the input (x1 , x2 , . . .) into the output y.)
2. (definite) Each step of an algorithm is defined precisely. The actions to be carried out
must be rigorously and unambiguously specified for each case.
3. (elementary) All operations must be sufficiently basic that they can in principle be done
exactly and in a finite length of time by someone using pencil and paper. Operations may
be clustered to more complex operations, but in the end they must be definitely reducible
to elementary mathematical operations.
4. (input) An algorithm has zero or more inputs, i.e. data which are manipulated.
5. (output or return) An algorithm has one or more returns, i.e. information gained from the
data by the algorithm.
tum algorithms, it is known today that only quantum algorithms might be in fact more powerful than Turing
machines [7, §12].
4 See [29, §4.5.3, Corollary L (p. 360)]; more accurately N ≤ log_φ ((3 − φ) · max(m, n)), where φ is the “golden
ratio” φ = (1 + √5)/2.
Chapter 2
Algorithmic analysis
There are two properties which have to be analysed when designing and checking an algorithm.
On the one hand it has to be correct, i.e., it must answer the posed problem “effectively.” Usually,
demonstrating the correctness of an algorithm is a difficult task: it requires a mathematical
proof. On the other hand, an algorithm should find a correct answer efficiently, i.e., as fast as
possible and with minimum memory space.
We have so far tacitly supposed that there always exists a greatest common divisor. To be
rigorous we have to show two things: there exists at least one divisor, and there are finitely
many divisors. But we already know that 1 is always a divisor; on the other hand the set of all
divisors is finite, because by Theorem A.3 (iv) all divisors have an absolute value bounded by
|n|, as long as n ≠ 0. Thus there are at most 2n − 1 divisors of a non-vanishing n. A finite
non-empty set has a greatest element, the unique greatest common divisor of m, n, denoted
gcd(m, n). By our short discussion we can conclude
Proof. The first assertion (i) is obvious. We prove the second assertion. By Theorem A.4, there
is an integer q with
m = q|n| + (m mod |n|).
Therefore, gcd (m, n) divides gcd (|n|, m mod |n|) and vice versa. Since both common divisors
are nonnegative, the assertion follows from Theorem A.3 (v). Q.E.D.
Theorem 2.3. Euclid’s algorithm computes the greatest common divisor of m and n.
Proof. To prove that the algorithm terminates and yields gcd (m, n) we introduce some notation
that will also be used later. We set
m ← r_{k+1} , n ← r_k .
It follows from Theorem 2.2 (ii) that gcd (r_{k+1} , r_k ) = gcd (m, n) is not changed during the algo-
rithm, as long as r_{k+1} > 0. Thus we only need to prove that there is a k such that r_k = 0. But this
follows from the fact that by (2.5) the sequence (r_k)_{k≥1} is strictly decreasing, so the algorithm
terminates surely. But if r_{k+1} = 0, we simply have gcd (r_{k+1} , r_k ) = r_k , and thus n = r_k is
the correct result.
This concludes the proof of the correctness of the Euclidean algorithm, since after a finite
time it yields the gcd (m, n). Q.E.D.
To analyze the running time and the space requirement of an algorithm exactly, we must
know the details about the implementation technology, such as hardware and software. For in-
stance, the running time of a given algorithm depends on the frequency of the CPU, and also on
the underlying computer architecture; the required memory space, on the other hand, depends
on the programming language and its representation of data structures. To determine the com-
plexities of an algorithm thus appears as an impracticable task. Moreover, the running time and
required space calculated in this way are not only properties of the considered algorithm, but
also of the implementation technology. However, we would appreciate some measures which
are independent from the implementation technology. To obtain such asymptotic and “robust”
measures, the O-notation has been introduced.
O is also referred to as a Landau symbol or the “big-O”. Figure 2.1 (a) illustrates the O-symbol.
Although O(g(n)) denotes a set of functions f (n) having the property (2.7), it is common to
write “ f (n) = O(g(n))” instead. We use the big-O notation to give an asymptotic upper bound
on a function f (n), up to a constant factor.
Figure 2.1: Graphic examples of the O, Ω, and Θ notations. In each part, the value of n_0 is shown as the
minimum possible value; of course, any greater value would also work. (a) O-notation gives an upper bound for
a function up to a constant factor. (b) Ω-notation gives a lower bound for a function up to a constant factor. (c)
Θ-notation bounds a function up to constant factors.
Example 2.4. (i) We have 2n^2 + n + 1 = O(n^2), because 2n^2 + n + 1 ≤ 4n^2 for all n ≥ 1. (That
is, c = 4, n_0 = 1 in (2.7); note that we could also have chosen c = 3 and n_0 = 2.)
(ii) More generally, any quadratic polynomial satisfies a_2 n^2 + a_1 n + a_0 = O(n^2). To show
this we set c = |a_2| + |a_1| + |a_0|; then
a_2 n^2 + a_1 n + a_0 ≤ c n^2 ∀ n ≥ n_0 (with n_0 = 1).
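The inequality of Example 2.4 can at least be spot-checked numerically. The following Python sketch (an ad-hoc helper of ours; numerical evidence, of course, not a proof) verifies f(n) ≤ c · g(n) on a finite range:

```python
def bounded_above(f, g, c, n0, n_max=10_000):
    """Check that f(n) <= c * g(n) for all n with n0 <= n <= n_max."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))

# the quadratic polynomial and the bounding function of Example 2.4 (i)
f = lambda n: 2 * n * n + n + 1   # f(n) = 2n^2 + n + 1
g = lambda n: n * n               # g(n) = n^2
```

Here bounded_above(f, g, 4, 1) confirms the choice c = 4, n_0 = 1, and bounded_above(f, g, 3, 2) the alternative c = 3, n_0 = 2, while the false claim with c = 2, n_0 = 1 fails already at n = 1.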
The maximum index m depends on n.1 We write the expansion as digits (a_m a_{m−1} . . . a_1 a_0)_b .
Some examples:
b = 2 : 25 = 1 · 2^4 + 1 · 2^3 + 0 · 2^2 + 0 · 2^1 + 1 · 2^0 = (11001)_2
b = 3 : 25 = 2 · 3^2 + 2 · 3^1 + 1 · 3^0 = (221)_3
b = 4 : 25 = 1 · 4^2 + 2 · 4^1 + 1 · 4^0 = (121)_4
b = 5 : 25 = 1 · 5^2 + 0 · 5^1 + 0 · 5^0 = (100)_5
Now let l_b(n) denote the length of the b-adic expansion of a positive integer n. Then
l_b(n) = ⌊log_b n⌋ + 1 ≤ log_b n + 1 = (ln n)/(ln b) + 1.
If n ≥ 3 (i.e., n_0 = 3), we have ln n > 1, and therefore (ln n)/(ln b) + 1 < (1/(ln b) + 1) · ln n, i.e.,
l_b(n) < c ln n for n ≥ 3 and with c = 1/(ln b) + 1.
Therefore we have
l_b(n) = O(ln n),   (2.9)
no matter what the value of b is. Therefore the number of digits of n in any number system
belongs to the same complexity class O(ln n).
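The expansion and its length can be computed directly; a Python sketch (the function names are ours):

```python
import math

def b_adic(n, b):
    """Return the digits (a_m ... a_1 a_0)_b of the b-adic expansion of n > 0."""
    digits = []
    while n > 0:
        digits.append(n % b)   # extract the least significant digit
        n //= b
    return digits[::-1]        # most significant digit first

def length(n, b):
    """l_b(n) = floor(log_b n) + 1; beware of floating-point rounding near exact powers of b."""
    return math.floor(math.log(n, b)) + 1
```

For instance, b_adic(25, 2) yields [1, 1, 0, 0, 1], i.e. (11001)_2, and its length 5 agrees with length(25, 2).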
The Ω-notation provides an asymptotic lower bound. For two functions f , g : N → R+ we write
Example 2.5. We have (1/2)n^3 − n + 1 = Ω(n^2), because (1/2)n^3 − n + 1 > (1/3)n^2 for all
n ≥ 1. (That is, c = 1/3, n_0 = 1 in (2.10).)
1 This is an important result from elementary number theory. It is proved in any basic mathematical textbook,
e.g. [13, 34].
A function f (n) thus belongs to the set Θ(g(n)) if there are two positive constants c_1 and c_2
such that it can be “sandwiched” between c_1 g(n) and c_2 g(n) for sufficiently large n. Figure 2.1
(c) gives an intuitive picture of the functions f (n) and g(n). For all values of n right of n_0, f (n)
lies at or above c_1 g(n) and at or below c_2 g(n). In other words, for all n ≥ n_0 the function f (n)
is equal to the function g(n) up to a constant factor.
The definition of Θ(g(n)) requires that every member f (n) of Θ(g(n)) is asymptotically
nonnegative, i.e. f (n) ≥ 0 whenever n is sufficiently large. Consequently, the function g(n)
itself must be asymptotically nonnegative (or else Θ(g(n)) is empty).
Example 2.6. (i) Since we have 2n^2 + n + 1 = O(n^2) and 2n^2 + n + 1 = Ω(n^2), we also have
2n^2 + n + 1 = Θ(n^2).
(ii) Let b be an integer with b > 1 and l_b(n) = ⌊log_b n⌋ + 1 the length of the b-adic expansion
of a positive integer n. Then (c − 1) ln n ≤ l_b(n) < c ln n for n ≥ 3 and with c = 1/(ln b) + 1.
Therefore we have
l_b(n) = Θ(ln n).   (2.12)
The complexity classes of polynomials are rather easy to determine. A polynomial f_k(n) of
degree k for some k ∈ N_0 is the sum
f_k(n) = ∑_{i=0}^{k} a_i n^i = a_0 + a_1 n + a_2 n^2 + a_3 n^3 + . . . + a_k n^k ,
with the constant coefficients a_i ∈ R. We can then state the following theorem.
Theorem 2.7. A polynomial of degree k is contained in the complexity class Θ(n^k), i.e., f_k(n) =
Θ(n^k).
Example 2.8. We saw above that the polynomial 2n^2 + n + 1 is in the complexity class Θ(n^2),
according to the theorem. The polynomial, however, is not contained in the following complex-
ity classes:
2n^2 + n + 1 ≠ O(n), 2n^2 + n + 1 ≠ Ω(n^3), 2n^2 + n + 1 ≠ Θ(n^3);
but 2n^2 + n + 1 = O(n^3).
1. Worst-case analysis determines the upper bound of running time for any input. Knowing
it will give us the guarantee that the algorithm will never take any longer.
2. Average-case analysis determines the running time for a typical input, i.e. the expected
running time. It may sometimes turn out that the average time is as bad as the worst-
case running time.
The complexity of an algorithm is measured in the number T (n) of instructions to be done,
where T is a function depending on the size of the input data n. If, e.g., T (n) = 3n + 4, we say
that the algorithm “is of linear time complexity,” because T (n) = 3n + 4 is a linear function.
Time complexity functions that occur frequently are given in the following table, cf. Figure 2.2.
Complexity T (n)                                                 Notation
ln n, log_2 n, log_10 n, . . .   logarithmic time complexity     Θ(log n)
n, n^2, n^3, . . .               polynomial time complexity      Θ(n^k)
2^n, e^n, 3^n, 10^n, . . .       exponential time complexity     Θ(k^n)
Figure 2.2: Qualitative behavior of typical functions of the three complexity classes O(ln n), O(n^k), O(k^n),
k ∈ R_+ .
Definition 2.9. An algorithm is called efficient if T (n) = O(n^k) for a constant k, i.e., if it has
polynomial time complexity or is even logarithmic.
Analyzing even a simple algorithm can be a serious challenge. The mathematical tools
required include discrete combinatorics, probability theory, algebraic skill, and the ability to
identify the most significant terms in a formula.
Example 2.10. It can be proved2 that the Euclidean algorithm has a running time
T_Euclid (m, n) = O(log max(m, n))
if all divisions and iterative steps are considered. (However, it may terminate even for large
numbers m and n after a single iteration step, namely if m | n or n | m.) Therefore, the Euclidean
algorithm is efficient, since it has logarithmic running time in the worst case, depending on the
sizes of its input numbers.
2 Cf. [5, p902]; for the number of iterations we have TEuclid (m, n) = Θ(log max[m, n]), see footnote 4 on p. 15
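The iteration count Θ(log max[m, n]) can be observed experimentally. The Python sketch below (the helper name is ours) counts the division steps, using the update m, n ← n, m mod n, which performs exactly the same steps as the exchange-and-reduce loop of Section 1.2:

```python
def euclid_iterations(m, n):
    """Count the division steps Euclid's algorithm needs for the pair (m, n)."""
    count = 0
    while n > 0:
        m, n = n, m % n   # one exchange-and-reduce step
        count += 1
    return count
```

Consecutive Fibonacci numbers are the worst case, e.g. euclid_iterations(13, 8) = 5, whereas for n | m a single step suffices: euclid_iterations(10**100, 10**99) = 1.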
if (C)
S1
else
S2
consists of the condition C (an operation) and the instructions S_1 and S_2 . If S_1 has running
time O( f (n)) and S_2 has running time O(g(n)), the if-statement has running time
T (n) = O(1) + O( f (n)) + O(g(n)), i.e.
T (n) = O( f (n)) if g(n) = O( f (n)),   and   T (n) = O(g(n)) if f (n) = O(g(n)).   (2.15)
• In a repetition each loop iteration can have a different running time. All these running
times have to be summed up. Let f (n) be the number of iterations to be done, and g(n) be
the running time of one iteration. (Note that f (n) = O(1) if the number of iterations does
not depend on n.) Then the total running time T (n) of the repetition is given by
T (n) = O( f (n)) · O(g(n)), or
T (n) = O( f (n) · g(n)).   (2.16)
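Rule (2.16) can be checked by instrumenting a program: count the elementary steps of a loop whose body itself takes time depending on n. A Python sketch (the step counter is only for illustration):

```python
def total_steps(n):
    """f(n) = n iterations, each of cost g(n) = n elementary steps: T(n) = n * n."""
    steps = 0
    for i in range(n):        # f(n) = n iterations
        for j in range(n):    # g(n) = n steps per iteration
            steps += 1        # one elementary operation
    return steps
```

Indeed total_steps(100) = 10000 = 100^2, i.e. T(n) = O(n · n) = O(n^2), as (2.16) predicts.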
2.3 Summary
• Algorithmic analysis proves the correctness and studies the complexity of an algorithm by
mathematical means. The complexity is measured by counting the number of instructions
that have to be done during the algorithm on a RAM, an idealized mathematical model of
a computer.
• Asymptotic notation erases the “fine structure” of a function and lets only its asymptotic
behavior for large numbers survive. The O, Ω, and Θ-notations provide asymptotic bounds
on a function. We use them to simplify complexity analysis. If the running time of an
algorithm with input size n is T (n) = 5n + 2, we may simply say that it is O(n).
The following essential aspects have to be kept in mind:
– The O-notation eliminates constants: O(n) = O(n/2) = O(17n) = O(6n + 5). For
all these expressions we write O(n). The same holds true for the Ω-notation and the
Θ-notation.
– The O-notation yields upper bounds: O(1) ⊂ O(n) ⊂ O(n^2) ⊂ O(2^n). (Note that you
cannot change the sequence of relations!) So it is not wrong to say 3n^2 = O(n^5).
– The Ω-notation yields lower bounds: Ω(2^n) ⊂ Ω(n^2) ⊂ Ω(n) ⊂ Ω(1). So, 3n^5 =
Ω(n^3).
– The Θ-notation yields tight bounds: Θ(1) ⊄ Θ(n) ⊄ Θ(n^2) ⊄ Θ(2^n). So 3n^2 =
Θ(n^2), but 3n^2 ≠ Θ(n^5).
• The O-notation simplifies the worst-case analysis of algorithms, the Θ-notation is used
if exact complexity classes can be determined. For many algorithms, a tight complexity
bound is not possible! For instance, the termination of the Euclidean algorithm does not
only depend on the size of m and n: even for giant numbers such as m = 10^(10^100) and
n = 10^(10^99) it may terminate after a single step: gcd(m, n) = n.
• There are three essential classes of complexity: the class of logarithmic functions O(log n),
of polynomial functions O(n^k), and of exponential functions O(k^n), for any k ∈ R_+.
Chapter 3
Recursions
3.1 Introduction
Building stacks is closely related to the phenomenon of recursion. Stacks in turn are related
to the construction of relative clauses in human languages. Recursion in human language
probably occurs in its most extreme form in German: the notorious property of the German
language of putting the verb at the end of a relative clause has a classical persiflage due to
Christian Morgenstern at the beginning of his Galgenlieder:
A case of recursion is shown in Figure 3.1. Such a phenomenon is referred to as “feedback” in
engineering. Everyone knows the effect of a microphone held near a loudspeaker amplifying
the input of this microphone . . . the high whistling noise is unforgettable.
n! = n · (n − 1) · . . . · 2 · 1.
For any n ∈ N_0 we obtain n! = fac (n). Why? Now, let us prove it by induction:
• Induction start. For n = 0 we have fac (0) = 1. Hence fac (0) = 0!.
O.k., perhaps you believe this proof, but maybe you do not see why this recursion works.
Consider for example the case n = 3:
fac (3) = 3 · fac (2) = 3 · (2 · fac (1)) = 3 · (2 · (1 · fac (0))) = 3 · (2 · (1 · 1)) = 6
Here fac (3) calls fac (2), which in turn calls fac (1), which calls fac (0); the base case fac (0)
returns 1, then fac (1) returns 1, fac (2) returns 2, and finally fac (3) returns 6.
A recursive algorithm divides its problem into
A recursive algorithm splits a problem into two pieces: one piece that the algorithm knows how to do (base case), and one piece that it
does not know. The latter piece must resemble the original problem, but be a slightly simpler
or smaller version of it. Because this new problem looks like the original
one, the algorithm calls itself to go to work on the smaller problem; this is referred to as a
recursive call or the recursion step.
The recursion step executes while the original call of the algorithm is still open (i.e., it
has not finished executing). The recursion step can result in many more recursive calls, as
the algorithm divides each new subproblem into two conceptual pieces. For the recursion to
eventually terminate, each time the algorithm calls itself with a smaller version of the problem,
the sequence of smaller and smaller problems must converge to the base case in finite time. At
that point the algorithm recognizes the base case, returns a result to the previous algorithm and
a sequence of returns ensues up the line until the first algorithm returns the final result.
Recursion closely resembles the concept of mathematical induction, which we learned above.
In fact, the statement P(n) is proven by reducing it to the smaller statement P(n − 1): P(n) is true if P(n − 1) is
true. This in turn is true if the smaller statement P(n − 2) is true, and so on. Finally, the base
case, called induction start, is reached and proven to be true.
algorithm searchmax (a[], l, r) // find the maximum of a[l], a[l + 1], . . . , a[r]
   if (l = r) // the base case
      return a[l];
   else
      m ← searchmax(a[], l + 1, r); // remains open until base case is reached!
      if ( a[l] > m )
         return a[l];
      else
         return m;
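A direct Java transcription of this pseudocode might read as follows (a sketch; the method signature is an assumption):

```java
public class SearchMax {
    /** Finds the maximum of a[l], a[l+1], ..., a[r]. */
    static int searchmax(int[] a, int l, int r) {
        if (l == r) {                        // the base case
            return a[l];
        } else {
            int m = searchmax(a, l + 1, r);  // remains open until the base case is reached
            if (a[l] > m)
                return a[l];
            else
                return m;
        }
    }

    public static void main(String[] args) {
        int[] a = {3, 7, 5};
        System.out.println(searchmax(a, 0, a.length - 1));  // prints 7
    }
}
```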
The solution can be described by the illustration in figure 3.3. Another way to visualize the
working of searchmax (and a general recursive algorithm as well) is figure 3.4. It shows the
sequence of successive calls and respective returns.
The searchmax algorithm for an array of length n takes exactly 2n operations, namely n calls
and n returns (each with at most one comparison).
For example, for a[] = [3, 7, 5] the calls and returns proceed as follows:
   max(0, 2): a[] = [3 | 7, 5], l = 0, r = 2; tests if 3 < max(1, 2)
   max(1, 2): a[] = [7 | 5], l = 1, r = 2; tests if 7 < max(2, 2)
   max(2, 2): a[] = [5], l = r = 2; the base case, returns 5
Then max(1, 2) returns 7, and finally max(0, 2) returns 7.
Rule 1. Any recursion consisting of a single recursive call in each step (a “primitive recur-
sion”) can be implemented as an iteration, and vice versa.
28 Andreas de Vries
As an example for the fact that any recursive algorithm can be substituted by an iterative
one let us look at the following iterative definition of fac (n) determining the value of n!:
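The iterative version of fac referred to here is not preserved in this extract; a plausible sketch with a simple loop reads:

```java
public class FacIter {
    /** Computes n! iteratively: multiplies up 1 * 2 * ... * n in a loop instead of recursing. */
    static long fac(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fac(3));  // prints 6, the same value as the recursive version
    }
}
```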
x0 = gcd(m, n) = x1 m + x2 n. (3.3)
Note that x1 and x2 may be zero or negative. These coefficients are very useful for the solu-
tion of linear Diophantine equations, particularly for the computation of modular multiplicative
inverses in cryptology. The following algorithm extendedEuclid takes as input an arbitrary pair
(m, n) of positive integers and returns a triple of the form (x0 , x1 , x2 ) that satisfies Equation
(3.3).
int[] extendedEuclid( int m, int n ) {
   if ( n == 0 ) {
      int[] x = { m, 1, 0 };  // base case: gcd(m, 0) = m = 1 · m + 0 · n
      return x;
   } else {
      int[] x = extendedEuclid( n, m % n );
      int tmp = x[1];
      x[1] = x[2];
      x[2] = tmp - (m/n) * x[2];
      return x;
   }
}
Figure 3.5: Recursion tree of calls of the factorial algorithm and the respective running times T (n).
If we want to analyze the complexity, we have to compute the running time on each level.
Let n = 0. Then the running time T (0) is a constant c0 ,
T (0) = c0 .
For a wide class of recursive algorithms, the time complexity can be estimated by the so-called
Master theorem [5, §4.3]. A simple version of it is the following theorem.
Theorem 3.2 (Master Theorem, special case). Let a ≥ 1, b > 1 be constants, and let T : N →
R+ be a function defined by the recurrence

   T (n) = T0                               if n = n0 ,
   T (n) = a T (⌊n/b⌋) + Θ(n^(log_b a))     otherwise,        (3.6)

with some initial value T0 . Then T (n) can be estimated asymptotically as

   T (n) = Θ(n^(log_b a) log n).
Therefore, T (n) ≤ f (n) (2 + log2 n) = O( f (n) · log n). If especially a f (n/a) = f (n), we even
have T (n) = f (n) · (1 + ⌈log2 n⌉) = Θ( f (n) log n).
Finally we state a result which demonstrates the power as well as the danger of recursion: it
is quite easy to generate exponential growth.

Theorem 3.4. Let a ≥ 2 be a constant, let f grow at most polynomially, and let T satisfy the
recurrence

   T (n) = a T (n − 1) + f (n)   for n > 1,        (3.12)

with T (1) = Θ(1). Then T (n) = Θ(a^n).

Proof. Analogously to Fig. 3.6, we see that according to Eq. (3.12) there are n generation levels
in the call tree of T (n), and therefore a^n base cases. As long as f grows at most polynomially,
this means that T (n) = Θ(a^n).
Examples 3.5. (i) The recursion equation T (n) = 2 T (⌊n/2⌋) + n is in the class of Eq. (3.6) with a =
b = 2, hence T (n) = Θ(n log n).
(ii) A function T (n) satisfying T (n) = T (⌈n/2⌉) + 1 is of the class (3.9) with f (n) = 1, i.e., T (n) =
O(log n).
(iii) The algorithm drawing the “Koch snowflake curve”, a special recursive curve, to the
level n has a time complexity T (n) given by T (n) = 4T (n − 1) + c1 with a constant c1 . Since it
therefore obeys (3.12) with a = 4 and f (n) = c1 , we have T (n) = Θ(4n ).
Figure 3.7: The towers of Hanoi for the case of nine disks.
A third peg is available for temporarily holding disks. So schematically the situation looks
as in figure 3.7. According to the legend the world will end when the priests complete their
task. So we will attack the problem, but better won’t tell them the solution. . .
Let us assume that the priests are attempting to move the disks from peg 1 to peg 2. We
wish to develop an algorithm that will output the precise sequence of peg-to-peg disk transfers.
For instance, the output
1→3
means: “Move the topmost disk of peg 1 to the top of peg 3.” For the case of only two disks,
e.g., the output sequence reads
1 → 3, 1 → 2, 3 → 2. (3.14)
1. Move n − 1 disks from peg 1 to peg 3, using peg 2 as a temporary holding area.
2. Move the remaining (largest) disk from peg 1 to peg 2.
3. Move the n − 1 disks from peg 3 to peg 2, using peg 1 as a holding area.
The process ends when the last task involves moving n = 1 disk, i.e. the base case. This is
solved trivially by moving the disk from peg 1 to peg 2, without the need of a temporary
holding area.
The formulation of the algorithm reads as follows. We name it hanoi and call it with four
parameters:
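The algorithm itself has not survived in this extract; a sketch with four parameters (the number n of disks and the three pegs), following the three steps above, could read:

```java
public class Hanoi {
    /** Moves n disks from peg `from` to peg `to`, using peg `temp` as holding area. */
    static void hanoi(int n, int from, int to, int temp) {
        if (n == 1) {                                // base case: move a single disk
            System.out.println(from + " -> " + to);
        } else {
            hanoi(n - 1, from, temp, to);            // step 1: n-1 disks to the holding peg
            System.out.println(from + " -> " + to);  // step 2: move the largest disk
            hanoi(n - 1, temp, to, from);            // step 3: n-1 disks onto the largest disk
        }
    }

    public static void main(String[] args) {
        hanoi(2, 1, 2, 3);  // prints the sequence 1 -> 3, 1 -> 2, 3 -> 2
    }
}
```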
What about the running time of this algorithm? In fact, from the recursion algorithm we can
directly derive the recursion equation
   T (n) = c0 if n = 1, and T (n) = 2 T (n − 1) + c1 otherwise,        (3.15)
with c0 being a constant representing the output effort in the base case, and c1 the constant
effort in the recursion step. Since this equation is of the class of Theorem 3.4 with a = 2 and
f (n) = c1 for n > 1, f (1) = c0 , we have
T (n) = Θ(2n ). (3.16)
If we try to count the moves exactly, we obtain the number f (n) of moves for
the problem with n disks as follows. Regarding the algorithm, we see that f (n) = 2 f (n − 1) + 1
for n ≥ 1, with f (0) = 0. (Why?) It can be proved easily by induction that then
f (n) = 2^n − 1 for n ≥ 0. (3.17)
Summary
• A recursion is a subroutine calling itself during its execution. It consists of a basis case
(or basis cases) which do not contain a recursive call but return certain values, and of one
or several recursive steps which invoke the subroutine with slightly changed parameters.
A recursion terminates if for any allowed arguments the basis case is reached after finitely
many steps.
• A wide and important class of recursions, the “primitive recursions” consisting of a single
recursive call in the recursion step, can be equivalently implemented iteratively, i.e., with
loops.
• The time complexity of a recursive algorithm is determined by a recursion equation which
can be directly derived from the algorithm. The following classes of recursion equations
are usual:

   T (n) = T (n − 1) + c,   T (n) = b T (n − 1) + c,   T (n) = a T (⌊n/b⌋) + Θ(n^(log_b a)),

with constants a ≥ 1, b > 1, c > 0 and some appropriate base cases. These have solutions
with the respective asymptotic behaviors

   T (n) = Θ(n),   T (n) = Θ(b^n),   T (n) = Θ(n^(log_b a) log n).
Chapter 4
Sorting
So far we have come to know the data structures of array, stack, queue, and heap. They all allow
an organization of data such that elements can be added to or deleted from them. In the next few
chapters we survey the computer scientist’s toolbox of frequently used algorithms and discuss
their efficiency.
A big part of overall CPU time is used for sorting. The purpose of sorting is not only to
get the items into the right order but also to bring together what belongs together. To see for
instance all transactions belonging to a specific credit card account, it is convenient to sort the
data records by credit card number and then look only at the interval containing the respective
transactions.
selectionSort (a[], n)
// sorts array a[0], a[1], . . . , a[n − 1] ascendingly
for (i = 0; i ≤ n − 2; i++) { // find minimum of a[i], . . . , a[n − 1]
   min ← i;
   for ( j = i + 1; j ≤ n − 1; j++) {
      if (a[ j] < a[min])
         min ← j;
   }
   a[i] ↔ a[min];
}
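In Java the pseudocode translates directly (a sketch):

```java
import java.util.Arrays;

public class SelectionSort {
    /** Sorts a[0], ..., a[n-1] ascendingly by repeatedly selecting the minimum. */
    static void selectionSort(int[] a) {
        int n = a.length;
        for (int i = 0; i <= n - 2; i++) {
            int min = i;                              // find minimum of a[i], ..., a[n-1]
            for (int j = i + 1; j <= n - 1; j++) {
                if (a[j] < a[min])
                    min = j;
            }
            int tmp = a[i]; a[i] = a[min]; a[min] = tmp;  // a[i] <-> a[min]
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 8, 1, 9};
        selectionSort(a);
        System.out.println(Arrays.toString(a));  // prints [1, 2, 5, 8, 9]
    }
}
```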
Essentially, for the first loop there are n − 1 operations (inner j-loop), for the second one
n − 2, and so on. Hence its running time is given by

   Tsel (n) = (n − 1) + (n − 2) + . . . + 1 = Σ_{k=1}^{n−1} k = n(n − 1)/2 = O(n²).
Insertion sort. Another simple method to sort an array a[] is to insert each element at the right
position. This is done by insertion sort. To simplify this method we define the element a[0] as a sentinel by
initializing a[0] = −∞. (In practice −∞ means an appropriate constant.)
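The pseudocode of insertion sort is not reproduced in this extract; the following Java sketch replaces the −∞ sentinel by an explicit index check:

```java
import java.util.Arrays;

public class InsertionSort {
    /** Sorts a ascendingly by inserting each element into the already sorted prefix. */
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {  // shift greater elements one position up
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;                 // insert key at the right position
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 3, 7, 1};
        insertionSort(a);
        System.out.println(Arrays.toString(a));  // prints [1, 3, 7, 9]
    }
}
```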
In the worst case the inner while-loop runs through down to the beginning of the array (a[1]).
This is the case for an initial sequence sorted in descending order. The effort then is given by

   Tins (n) = 1 + 2 + . . . + n = Σ_{k=1}^{n} k = n(n + 1)/2 = O(n²).
On average we can expect the inner loop to run through half of the lower array positions. The
effort then is

   T̄ins (n) = (1/2) (1 + 2 + . . . + (n − 1)) = (1/2) Σ_{i=1}^{n−1} i = n(n − 1)/4 = O(n²).
BubbleSort. Bubble sort, also known as exchange sort, is a simple sorting algorithm which
works by repeatedly stepping through the list to be sorted, comparing two items at a time and
swapping them if they are in the wrong order. The algorithm gets its name from the way smaller
elements “bubble” to the top (i.e., the beginning) of the list by the swaps.
Its running time again is Tbub (n) = O(n²). The following is a slightly improved version which does
not waste running time if the array is already sorted. Here the pass through the array is repeated
until no swaps are needed:
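The improved version announced here is missing in this extract; a sketch with a `swapped` flag, repeating the pass until no swaps are needed, could read:

```java
import java.util.Arrays;

public class BubbleSort {
    /** Sorts a ascendingly; terminates as soon as a complete pass needs no swap. */
    static void bubbleSort(int[] a) {
        boolean swapped;
        do {
            swapped = false;
            for (int i = 0; i < a.length - 1; i++) {
                if (a[i] > a[i + 1]) {                           // wrong order: swap
                    int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
                    swapped = true;
                }
            }
        } while (swapped);  // repeat the pass until no swaps were needed
    }

    public static void main(String[] args) {
        int[] a = {4, 2, 7, 1};
        bubbleSort(a);
        System.out.println(Arrays.toString(a));  // prints [1, 2, 4, 7]
    }
}
```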
It is clear that they cannot be faster than Ω(n), because each element key has to be considered.
Ω(n) is an absolute lower bound for key comparison algorithms. It can be proved that any key
comparison sort algorithm needs at least

   log2 (n!) = Θ(n log n)

comparisons in the worst case for a data structure of n elements [19, §6.4]. However, in spe-
cial situations there exist sorting algorithms which have a better running time, notably the
pigeonhole sort sorting an array of positive integers. It is a special version of the bucket sort
[23, §2.7].
pigeonholeSort (int[] a)
// determine maximum entry of array a:
max ← −∞;
for (i ← 0; i < a.length; i++) if (max < a[i]) max ← a[i];
b ← new int[max + 1]; // b has max + 1 pigeonholes
for (i ← 0; i < a.length; i++) b[a[i]]++; // counts the entries of pigeonhole a[i]
// copy the pigeonhole entries back to a:
j ← 0;
for (i ← 0; i < b.length; i++)
   for (k ← 0; k < b[i]; k++) {
      a[ j] ← i; j++;
   }
It has time complexity O(n + max a) and space complexity O(max a), where max a denotes the
maximum entry of the array. Hence, if 0 ≤ a[i] ≤ O(n), then both time and space complexity
are O(n).
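In Java the pseudocode can be transcribed as follows (a sketch for arrays of nonnegative integers):

```java
import java.util.Arrays;

public class PigeonholeSort {
    /** Sorts an array of nonnegative integers in time O(n + max a). */
    static void pigeonholeSort(int[] a) {
        int max = -1;                        // determine maximum entry of array a
        for (int x : a)
            if (max < x) max = x;
        int[] b = new int[max + 1];          // b has max+1 pigeonholes
        for (int x : a)
            b[x]++;                          // count the entries of each pigeonhole
        int j = 0;
        for (int i = 0; i < b.length; i++)   // copy the pigeonhole entries back to a
            for (int k = 0; k < b[i]; k++)
                a[j++] = i;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 1, 5};
        pigeonholeSort(a);
        System.out.println(Arrays.toString(a));  // prints [1, 1, 3, 4, 5]
    }
}
```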
If the several parts have approximately equal size, the algorithm is called a balanced divide and
conquer algorithm.
Theorem 4.1. A balanced divide and conquer algorithm has running time
(i) O(n) if the divide and merge steps each only need O(1) running time;
(ii) O(n log n) if the divide and merge steps each have linear running time O(n).
This theorem is a powerful result: it solves the complexity analysis of a wide class of algorithms
at a single stroke. Its proof is based on Theorem 3.2, with a = b = 2.
Remark 4.2. For a balanced divide-and-conquer algorithm, the running time T (n) is given by
a recursion equation:

   T (n) = O(1) if n = 1, and T (n) = O(n) + 2 · T (n/2) + O(1) if n > 1,        (4.2)

where the three summands correspond to the divide, the conquer, and the merge step, respectively.
Hence f (n) = O(1) + O(n) = O(n), i.e. f has linear growth. Therefore, T (n) = O(n log n).
• conquer by sorting the two sequences recursively by calling mergeSort for both se-
quences;
Figure 4.1: Left figure: mergeSort for an array of 10 elements. Right figure: quickSort for an array of 10
elements.
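Since the complete description of mergeSort is not preserved in this extract, the following Java sketch shows all three steps (trivial divide, recursive conquer, and a linear merge into a temporary array):

```java
import java.util.Arrays;

public class MergeSort {
    /** Sorts a[l..r] by dividing, sorting both halves recursively, and merging them. */
    static void mergeSort(int[] a, int l, int r) {
        if (l >= r) return;               // base case: at most one element
        int m = (l + r) / 2;              // divide (trivial): split in the middle
        mergeSort(a, l, m);               // conquer: sort left half
        mergeSort(a, m + 1, r);           // conquer: sort right half
        merge(a, l, m, r);                // merge the two sorted halves in O(n)
    }

    static void merge(int[] a, int l, int m, int r) {
        int[] tmp = new int[r - l + 1];   // the additional temporary array
        int i = l, j = m + 1, k = 0;
        while (i <= m && j <= r) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i <= m) tmp[k++] = a[i++];
        while (j <= r) tmp[k++] = a[j++];
        System.arraycopy(tmp, 0, a, l, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 7, 3};
        mergeSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));  // prints [1, 2, 3, 5, 7, 9]
    }
}
```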
4.4.2 QuickSort
This algorithm has much in common with mergeSort. It is a recursive divide and conquer algo-
rithm as well. But whereas mergeSort uses a trivial divide step leaving the greatest part of the
work to the merge step, quickSort works in the divide step and has a trivial merge step instead.
Although it has a bad worst case behavior, it is probably the most used sorting algorithm. It is
comparatively old, developed by C.A.R. Hoare in 1962.
Let a = a0 . . . an−1 be the sequence to be operated upon. The algorithm quickSort(l, r) works
as follows:
• divide the sequence al . . . ar into two sequences al , . . . , a p−1 and a p+1 . . . ar such
that each element of the first sequence is smaller than any element of the second sequence:
ai ≤ a p with l ≤ i < p and a j ≥ a p with p < j ≤ r. This step we call partition, and the
element a p is called pivot element.1 Usually, the element ar is chosen to be the pivot
element, but if you want you can choose the pivot element arbitrarily.
• conquer by sorting the two sequences recursively by calling quickSort for both sequences;
We see that after the inner “i-loop” the index i points to the first element ai from the left which
is greater than or equal to ar , i.e. ai ≥ ar (if i < j). After the “ j-loop” the index j points to the first element a j
from the right which is smaller than ar (if j > i). Therefore, after the subalgorithm partition the
pivot element a p is placed at its final position (which will not be changed in the sequel). See
Fig. 4.1.
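A compact Java sketch of quickSort is given below; note that the partition shown here is the common single-loop (Lomuto) variant with pivot a[r], not the two-index scan described above:

```java
import java.util.Arrays;

public class QuickSort {
    /** Sorts a[l..r]; the work is done in the divide step, the merge step is trivial. */
    static void quickSort(int[] a, int l, int r) {
        if (l < r) {
            int p = partition(a, l, r);   // a[p] is placed at its final position
            quickSort(a, l, p - 1);       // conquer: sort the left part
            quickSort(a, p + 1, r);       // conquer: sort the right part
        }
    }

    /** Partitions a[l..r] around the pivot a[r] and returns the pivot's final index. */
    static int partition(int[] a, int l, int r) {
        int pivot = a[r];                 // as in the text, a_r is chosen as pivot
        int i = l - 1;
        for (int j = l; j < r; j++) {
            if (a[j] <= pivot) {          // collect elements <= pivot in the left part
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[i + 1]; a[i + 1] = a[r]; a[r] = t;  // put the pivot between the parts
        return i + 1;
    }

    public static void main(String[] args) {
        int[] a = {3, 7, 5, 1, 9, 2};
        quickSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));  // prints [1, 2, 3, 5, 7, 9]
    }
}
```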
The complexity analysis of quickSort is not trivial. The difficulty lies in the fact that finding the
pivot element a p depends on the array. In general, this element is not in the middle of the array,
and thus we do not necessarily have a balanced divide-and-conquer algorithm. :-(
The relevant step is the divide-step consisting of the partition algorithm. The outer loop is
executed exactly once, whereas the two inner loops add up to n − 1 steps. The running time for
an array of length n = 1 is a constant c0 , and for each following step we need time c in addition
to the recursion calls. Hence we obtain the recurrence equation

   T (n) = c0 if n = 1, and T (n) = (n − 1) + c + T (p − 1) + T (n − p) + 0 if n > 1 (1 ≤ p ≤ n),   (4.3)

where (n − 1) + c is the effort of the divide step, T (p − 1) + T (n − p) of the conquer step,
and 0 of the merge step.
Worst case

   Tworst (n) = c0 if n = 1, and Tworst (n) = (n − 1) + Tworst (n − 1) + c if n > 1.   (4.4)
Building up the solution step by step yields

   Tworst (1) = c0
   Tworst (2) = 1 + Tworst (1) + c = 1 + c + c0
   Tworst (3) = 2 + Tworst (2) + c = 2 + 1 + 2c + c0
   ...
   Tworst (n) = Σ_{k=1}^{n−1} k + (n − 1)c + c0 = n(n − 1)/2 + (n − 1)c + c0 = O(n²).
Therefore, quickSort is not better than insertionSort in the worst case. (Unfortunately, the worst
case occurs exactly when the array is already sorted. . . )
It can be shown that the average case is only slightly slower [23, §2.4.3].
4.4.3 HeapSort
Because of the efficient implementability of a heap in an array, it is of great practical interest
to consider heapSort. It is the best of the known sorting algorithms, guaranteeing O(n log n) in
the worst case, just as mergeSort. But it needs in essence no more memory space than the array
needs itself (remember that mergeSort needs an additional temporary array). The basic idea of
heapSort is very simple:
1. The n elements to sort are inserted in a heap; this results in complexity O(n log n).
2. The maximum is repeatedly deleted from the heap and appended to the sorted tail of the array; this again results in complexity O(n log n).
Let a = a0 . . . an−1 denote an array of n objects ai that are to be sorted with respect to their keys
ai .
Recall again the essential properties of heaps given in [8, §4.5.1]. Let h be an array of n ele-
ments, and let hi denote its i-th entry. To insert an element we can take the following algorithm:
(a) (b)
Figure 4.2: (a) The subroutine insert. (b) The subroutine deleteMax.
reheap
Algorithm reheap lets the element al “sink down into” the heap such that the subheap al+1 , . . . , ar
is extended to a subheap al , . . . , ar .
Note by Equation (4.7) in [8] that the left child of node ai in a heap — if it exists — is a2i+1 , and
the right child is a2(i+1) . Figure 4.3 shows how reheap works. Algorithm reheap needs two key
comparisons on each tree level, so at most 2 log n comparisons for the whole tree. Therefore,
the complexity Treheap of reheap is
   Treheap (n) = O(log n).
Now we are ready to define algorithm heapSort for an array a with n elements, a = a0 , a1 , . . . , an−1 .
Initially, a need not be a heap.
algorithm heapSort ()
   for (i = ⌊(n − 1)/2⌋; i ≥ 0; i- -) // phase 1: building the heap
      reheap (i, n − 1);
   for (i = n − 1; i ≥ 1; i- -) { // phase 2: selecting the maximum
      a0 ↔ ai ; reheap (0, i − 1);
   }
How does it work? In phase 1 (building the heap) the subheap a⌊(n−1)/2⌋+1 , . . . , an−1 is ex-
tended to the subheap a⌊(n−1)/2⌋ , . . . , an−1 . The loop is run through about n/2 times, each with
effort O(log n). In phase 2 the sorted sequence is built from the tail part of the array. For this
purpose the maximum a0 is exchanged with ai , and thus the heap area is reduced by one node to
a0 , . . . , ai−1 . Because a1 , . . . , ai−1 still is a subheap, reheaping of a0 makes a0 , . . . , ai−1 a heap
again:
index:  0  . . .  i        |  i + 1  . . .  n − 1
        8 7 5 6 1 2 4 3 2 0 | 9 14 23 31 54 64 72
        heap area           | increasingly ordered sequence of
                            | the n − i − 1 greatest elements
In phase 2 the loop will be run through for (n − 1) times. Therefore, in total heapSort has
complexity O(n log n) in the worst case.
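Both phases can be implemented in Java as follows (a sketch; 0-based indices, children of a[i] are a[2i+1] and a[2i+2]):

```java
import java.util.Arrays;

public class HeapSort {
    /** Lets a[l] sink down into the subheap a[l..r]. */
    static void reheap(int[] a, int l, int r) {
        int i = l;
        while (2 * i + 1 <= r) {
            int child = 2 * i + 1;                         // left child
            if (child + 1 <= r && a[child + 1] > a[child])
                child++;                                   // take the greater child
            if (a[i] >= a[child])
                break;                                     // heap condition restored
            int t = a[i]; a[i] = a[child]; a[child] = t;
            i = child;
        }
    }

    static void heapSort(int[] a) {
        int n = a.length;
        for (int i = (n - 1) / 2; i >= 0; i--)    // phase 1: building the heap
            reheap(a, i, n - 1);
        for (int i = n - 1; i >= 1; i--) {        // phase 2: selecting the maximum
            int t = a[0]; a[0] = a[i]; a[i] = t;  // a[0] <-> a[i]
            reheap(a, 0, i - 1);
        }
    }

    public static void main(String[] args) {
        int[] a = {8, 7, 5, 6, 1, 2, 4, 3, 2, 0};
        heapSort(a);
        System.out.println(Arrays.toString(a));  // prints [0, 1, 2, 2, 3, 4, 5, 6, 7, 8]
    }
}
```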
Complexity      selection/insertion/bubble   quick sort   merge sort   heap sort   pigeonhole sort
worst case      O(n²)                        O(n²)        O(n ln n)    O(n ln n)   O(n)
average case    O(n²)                        O(n ln n)    O(n ln n)    O(n ln n)   O(n)
space           O(1)                         O(ln n)      O(n)         O(1)        O(n)

Table 4.1: Complexity and required additional memory space of several sorting algorithms on data structures
with n entries; pigeonhole sort is assumed to be applied to integer arrays with positive entries ≤ O(n).
Chapter 5
Is it possible to optimize searching in unsorted data structures? In [8, Satz 4.3] we learned the
theoretical result that searching a key in an unsorted data structure is linear in the worst case, i.e.,
Θ(n). In Theorem A.5 (on p. 110) it is shown that a naive “brute force” search, or exhaustion,
costs running time of order Θ(n) also on average. So these are the mathematical lower bounds
which restrict a search and cannot be decreased.
However, there is a subtle backdoor through which at least the average bound can be low-
ered considerably to a constant, i.e., O(1), albeit at the price of additional calculations. This
backdoor is called hashing.
The basic idea of hashing is to calculate the key from the object to store and to minimize
the possible range these keys can attain. The calculation is performed by a hash function.
Sloppily said, the hashing principle consists in storing the object chaotically somewhere, but
remembering the position by storing the reference in the hash table under the calculated key.
To search the original object one then has to calculate the key value, look it up in the hash
table, and get the reference to the object.
The hashing principle is used, for instance, by the Java Collection classes HashSet and HashMap.
The underlying concept of the hash function, however, is used also in totally different areas of
computer science such as cryptology. Two examples of hash functions used in cryptology are
MD5 and SHA-1. You can find a short introduction to hash functions in German in [20].
Definition 5.1. An alphabet is a finite nonempty set Σ = {a1 , . . . , as } with a linear ordering
Example 5.4. (i) A word over the alphabet Σ of example 5.2 (i) is, e.g., NOVEMBER. It has
length 8, i.e.
NOVEMBER ∈ Σ8 .
(ii) A word over the binary alphabet Σ = {0, 1} is 1001001 ∈ {0, 1}7 .
Because alphabets are finite sets, their letters can be identified with natural numbers. If an
alphabet has m letters, its letters can be identified (“coded”) with the numbers 0, 1, . . . , m − 1.
For instance, for the 26-letter alphabet Σ of example 5.2 (i) we can choose the code h·i : Σ → Z26 ,
given by
ai A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
(5.2)
hai i 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
That means, hNi = 13. Another example is the 127-letter alphabet of the ASCII code, where
e.g. hAi = 65, hNi = 78, or hai = 97. A generalization is Unicode which codifies 2^16 = 65 536
letters. For notational convenience these 2^16 numbers are usually written in their hexadecimal
representation with four digits (note: 2^16 = 16^4).
The first 256 letters and their hexadecimal codes are given in figure 5.1.
• to a given word w, it is hard to find a second word w′ such that h(w) = h(w′ ), i.e., a second
word with the same hash value.
If two different words have the same hash value, we have a collision. The set of all possible hash
values is also called the hash table, and the number of all hash values its capacity. Sometimes,
for instance in the Java-API, the hash values are also called buckets.
Let
   h(w) = wn ⊕ . . . ⊕ w1
be the XOR of all bits of an arbitrarily long bit string w = w1 . . . wn . For instance, h(101) = 1 ⊕ 0 ⊕ 1 = 0.
Then h is a (very simple) hash function, and 0 is the hash value of 101. The input length is
arbitrary, but the output is either 0 or 1, i.e., 1 bit. Since h(1001) = 0, the two different words
w(1) = 101 and w(2) = 1001 have the same hash value. Thus we have a collision.
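This XOR hash function can be checked with a few lines of Java (a sketch; the bit string is passed as a string of '0' and '1' characters):

```java
public class XorHash {
    /** XOR of all bits of a bit string, e.g. h("101") = 1 XOR 0 XOR 1 = 0. */
    static int h(String w) {
        int result = 0;
        for (char c : w.toCharArray())
            result ^= c - '0';   // XOR the next bit
        return result;
    }

    public static void main(String[] args) {
        System.out.println(h("101"));   // prints 0
        System.out.println(h("1001"));  // prints 0 as well: a collision with "101"
    }
}
```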
Example 5.7. The last digit of the 13-digit ISBN1 is a hash value computed from the first 12
digits and is called “check digit.” To date, the first three digits are 978 or 979, and may be
different according to the EAN system,
978w4 w5 . . . w12 h.
Let Σ = {0, 1, . . . , 9}. Then the first 12 digits of the ISBN form a word w ∈ Σ12 , and the last
digit is given as h(w) where the hash function h : Σ12 → Σ is defined by

   h(w1 w2 . . . w12 ) = − Σ_{i=1}^{12} gi · wi mod 10,   where gi = 2 + (−1)^i = 1 if i is odd, 3 if i is even.
For example,
   h(978389821656) = −138 mod 10 = 2,
since
   digits    9   7   8   3   8   9   8   2   1   6   5   6
   weights   1   3   1   3   1   3   1   3   1   3   1   3
   products  9  21   8   9   8  27   8   6   1  18   5  18    with sum 138.
Therefore 978-3-89821-656-2 is a valid ISBN.
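The check digit computation can be verified with the following Java sketch:

```java
public class IsbnCheckDigit {
    /** Computes the check digit h(w) = -(sum of g_i * w_i) mod 10 with weights 1,3,1,3,... */
    static int h(String w) {   // w: the first 12 digits of the ISBN
        int sum = 0;
        for (int i = 0; i < 12; i++) {
            int g = (i % 2 == 0) ? 1 : 3;   // g_i = 1 for odd i, 3 for even i (1-based)
            sum += g * (w.charAt(i) - '0');
        }
        return Math.floorMod(-sum, 10);     // e.g. -138 mod 10 = 2
    }

    public static void main(String[] args) {
        System.out.println(h("978389821656"));  // prints 2: the ISBN 978-3-89821-656-2 is valid
    }
}
```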
A hash function cannot be invertible, since it maps a huge set of words onto a relatively
small set of hash values. Thus there must be several words with the same hash value, forming a
collision.
Hash functions are used in quite different areas. They do not only play an important role in
the theory of data bases, but are also essential for digital signatures in cryptology. For instance,
they provide an invaluable technique to support reliable communications. Consider a given
1 also called ISBN-13, valid since 1 January 2007; http://www.isbn-international.org/
message w which shall be transmitted over a channel; think, for instance, of IP data packets sent
through the internet or a bit string transmitted on a data bus in your computer. Most channels
are noisy and may modify or damage the original message. In the worst case the receiver does
not notice that the data are corrupted and relies on wrong information.
A quick way to enable the receiver to check the incoming data is to send along with the
message w its hash value h(w), i.e., to send
(w, h(w)).
If sender and receiver agree upon the hash function, then the receiver can check the data consis-
tency by simply taking the received message w′ , computing its hash value h(w′ ), and comparing it
to the received hash value h(w). If the message has been modified during the transmission,
and the hash function is “good” enough, then the receiver notices a difference and may contact
the sender to resend the message.
This is realized very often in communication channels. In case of IP packets or the data
bus of your computer, the hash function is a simple bitwise parity check; in cryptographic com-
munications it is a much more complex function such as SHA-1. A short survey of important
hash functions used in cryptology is given in Table 5.1. Notably, each of them is based on MD4,
which has been developed by Ron Rivest at the end of the 1980s. RIPEMD-160 is supposed to
be very secure. SHA-1 is the current international standard hash function in cryptography.
Example 5.8. SHA (Secure Hash Algorithm) has been developed by the NIST and the NSA and
is the current standard hash function. It works on the set Σ∗ of arbitrary words over the binary
alphabet Σ = {0, 1} and computes hash values of fixed length m = 160 bit in binary format with
leading zeros, i.e.,
SHA : {0, 1}∗ → {0, 1}160 . (5.4)
For a given binary word w ∈ {0, 1}∗ it performs the following steps.
1. Divide the bit word w into blocks of 512 bits: The word w is padded such that its length is
a multiple of 512 bits. More precisely, the binary word is appended with a 1 and with as many
0’s such that its length is a multiple of 512 bits minus 64 bits; finally a 64-bit representation of
the length of the original word is appended.
2. Form 80 words à 32 bits: Each 512-bit block is divided into 16 blocks M0 , M1 , . . . , M15
à 32 bit which are transformed into 80 words W0 , . . . , W79 according to
   Wt = Mt                                           if 0 ≤ t ≤ 15,
   Wt = (Wt−3 ⊕ Wt−8 ⊕ Wt−14 ⊕ Wt−16 ) <<< 1         otherwise.
Here <<< denotes the bit rotation, or circular left-shifting (e.g., 10100 <<< 1 = 01001).2
2 The original specification of SHA as published by the NSA did not contain the bit rotation. It corrected a
“technical problem” by which the standard was less secure than originally intended [37, p. 506]. To my knowledge,
the NSA has never explained the nature of the problem in any detail.
3. Initialize the variables and constants: In SHA there are used 80 constants K0 , . . . , K79
(with only four different values), given by

   Kt = 0x5A827999 = ⌊√2 · 2^30⌋    if 0 ≤ t ≤ 19,
   Kt = 0x6ED9EBA1 = ⌊√3 · 2^30⌋    if 20 ≤ t ≤ 39,
   Kt = 0x8F1BBCDC = ⌊√5 · 2^30⌋    if 40 ≤ t ≤ 59,
   Kt = 0xCA62C1D6 = ⌊√10 · 2^30⌋   if 60 ≤ t ≤ 79,

as well as five initialization constants
   A = 0x67452301, B = 0xEFCDAB89, C = 0x98BADCFE, D = 0x10325476, E = 0xC3D2E1F0,
and five variables3 a, . . . , e, being initialized as
a = A, b = B, c = C, d = D, e = E.
Figure 5.2: Hashing principle. Here the universe U = Z16 = {0, 1, . . . , 15}, the hash table t with m = 10 entries,
and the hash function h(w) = w mod 10.
5.3 Collisions
We have a principal problem with hash functions. The domain of definition is a huge set of
words of size N, whereas the number of address items m usually is much smaller, m ≪ N. That
means it may happen that various different words obtain the same hash value. As we
defined above, such an event is called a collision. Let us examine the following example.
Example 5.9. We now construct a simple hash table. Let the universe be
U = {22, 29, 33, 47, 53, 59, 67, 72, 84, 91}.
Moreover let h : U → Z11 be the hash function h(w) = w mod 11. Then we calculate the hash
table

   h(w)   w
   0      33, 22
   1      67
   2
   3      91, 47
   4      59
   5
   6      72
   7      29, 84
   8
   9      53
   10
The example demonstrates that relatively many collisions can occur even though m ≥ N, i.e.
even though there are at least as many addresses as words! This at first glance surprising fact is closely
related to the famous “birthday paradox” we explain later on.
How probable are collisions? We assume an “ideal” hash function distributing the n words
with equal probability over the m hash values. Let n ≤ m (because for n > m a collision must occur!).
Denote
p(m, n) = probability for at least one collision in n words and m hash values.
(In the sequel we will often shortly write “p” instead of “p(m, n).”) Then the probability q that
no collision occurs is
q = 1 − p. (5.5)
50 Andreas de Vries
We will first calculate q, and then deduce p from q. So, what is q? If we denote by qi the
probability that the i-th word is mapped to a hash value without a collision under the condition
that all the former words have been placed without collision, then
q = q1 · q2 · . . . · qn .
First we see that q1 = 1, because initially all hash values are vacant and the first word can
be mapped to any value without collision. However, the second word finds one hash value
occupied and m − 1 vacant values. Therefore q2 = (m − 1)/m. Generally we find that

   qi = (m − i + 1)/m,   1 ≤ i ≤ n,

because the i-th word finds (i − 1) values occupied. Thus we have for p
   p = 1 − m(m − 1)(m − 2) · · · (m − n + 1) / m^n .    (5.6)
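Equation (5.6) can be evaluated numerically with a short Java sketch:

```java
public class Collision {
    /** p(m,n): probability of at least one collision among n words on m hash values, Eq. (5.6). */
    static double p(int m, int n) {
        double q = 1.0;                     // probability q that no collision occurs
        for (int i = 1; i <= n; i++)
            q *= (double) (m - i + 1) / m;  // q_i = (m - i + 1) / m
        return 1.0 - q;
    }

    public static void main(String[] args) {
        System.out.println(Math.round(p(365, 23) * 1000) / 1000.0);  // prints 0.507
        System.out.println(Math.round(p(365, 50) * 1000) / 1000.0);  // prints 0.97
    }
}
```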
In table 5.2 numerical examples for m = 365 are given. It shows that only 23 words have to be
present such that a collision occurs with a probability p > 0.5! For 50 words, the probability
is 97%, i.e. a collision occurs almost unavoidably. The Hungarian-American mathematician
Paul Halmos estimated that a collision becomes probable as soon as about 1.18 √m words are
hashed to m possible hash values.

   n    p(365, n)        m                   1.18 √m
   22   0.476            365                 22.49
   23   0.507            1 000 000           1177.41
   50   0.970            2^128 ≈ 3 · 10^38   2.2 · 10^19

Table 5.2: The probability p for collision occurrence for m = 365 (left) and Halmos estimates for some hash
capacities m (right)
Example 5.10. Birthday paradox. Suppose a group of n people is in a room. How probable
is it that at least two people have the same birthday? In fact, this question is equivalent to the
collision problem above. Here the number n of words corresponds to the number of persons,
and the possible birthdays correspond to m = 365 hash values. Thus table 5.2 also gives
an answer to the birthday paradox: For 23 persons in a room the probability that two have the
same birthday is greater than 50%!
Complexity analysis. Assume we want to insert n words into a hash table of size m. For all
three operations insert, delete, and member of a word w, the linked list at the hash entry t[h(w)]
must be run through. In the worst case all words obtain the same hash value. Searching words
in the hash table then has the same time complexity as running through a linked list with n
objects, i.e.,

  Tworst(n) = O(n).

However, we will see that hash tables have a much better average-case complexity. To start the
analysis, we first consider the question: how much time does a search take? The average length
of a linked list is n/m. The running time of computing the hash function is a constant, i.e., O(1).
Adding both running times for an average-case search yields the complexity Tmean(n) = O(1 + n/m).
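The scheme can be illustrated by a minimal sketch of hashing with linked lists; the hash function (a character sum mod m) is a made-up choice, and Python lists play the role of the linked lists:

```python
class ChainedHashTable:
    """Hashing with separate chaining: one list per hash value."""
    def __init__(self, m: int):
        self.m = m
        self.t = [[] for _ in range(m)]   # the m chains

    def h(self, w: str) -> int:
        return sum(map(ord, w)) % self.m  # made-up hash function, O(1)

    def insert(self, w: str) -> None:
        if not self.member(w):
            self.t[self.h(w)].append(w)

    def delete(self, w: str) -> None:
        chain = self.t[self.h(w)]
        if w in chain:
            chain.remove(w)

    def member(self, w: str) -> bool:
        return w in self.t[self.h(w)]     # O(1 + n/m) on average

t = ChainedHashTable(100)
t.insert("algorithm")
print(t.member("algorithm"))  # True
```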
Theorem 5.11. The average complexity of the three dictionary algorithms insert, delete, and
member of a hashing with linked lists (separate chaining) is

  Tmean(n) = O(1 + α),

where α = n/m denotes the load factor of the hash table.
v = 61 and w = 39, since h(v) = h(w) = 6. There are mainly two ways to calculate a new hash
value in case a collision is detected.
1. (Double hashing) Use two hash functions h(w), h′(w) mod m and try the hash values
There is one great problem for hashing with open addressing, concerning the deletion of words.
If a word w is simply deleted, a word v that had been inserted past w because of a collision could
no longer be found! Instead, the cell where w is located has to be marked as deleted, but it cannot
simply be released for a new word.
Therefore, hashing with open addressing is not appropriate for
• very “dynamical” applications where there are lots of inserts and deletes;
• cases in which the number n of words to be inserted is greater than the hash table size m.
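The deletion problem can be illustrated with a sketch of open addressing by linear probing (5.12); the hash function, a simple character sum, is a made-up choice, and deleted cells are marked with a tombstone instead of being released:

```python
DELETED = object()  # tombstone: cell may be reused, but probing continues past it

class OpenAddressingTable:
    """Open addressing with linear probing, h_i(w) = (h(w) + i) mod m."""
    def __init__(self, m: int):
        self.m = m
        self.t = [None] * m

    def h(self, w: str, i: int) -> int:
        return (sum(map(ord, w)) + i) % self.m  # made-up base hash

    def member(self, w: str) -> bool:
        for i in range(self.m):
            j = self.h(w, i)
            if self.t[j] is None:       # truly vacant: w cannot be further on
                return False
            if self.t[j] == w:          # tombstones are skipped, not stopped at
                return True
        return False

    def insert(self, w: str) -> None:
        if self.member(w):              # avoid duplicates
            return
        for i in range(self.m):
            j = self.h(w, i)
            if self.t[j] is None or self.t[j] is DELETED:
                self.t[j] = w
                return
        raise OverflowError("hash table full")

    def delete(self, w: str) -> None:
        for i in range(self.m):
            j = self.h(w, i)
            if self.t[j] is None:
                return
            if self.t[j] == w:
                self.t[j] = DELETED     # mark only; do not set back to None
                return
```

Had delete simply set the cell back to None, a word inserted past it would become unreachable, which is exactly the problem described above.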
Complexity analysis. We assume that the hash function sequence hi is “ideal” in the sense
that the sequence h0(w), h1(w), . . . , hm−1(w) is uniformly distributed over the possible hash
values. In this case we speak of uniform hashing. Then we have the following theoretical result.

Theorem 5.12. Let hi be an ideal hash function sequence for m hash values, of which n values
are already occupied. Then the expected costs (numbers of probes) are approximately

  C′n = 1/(1 − α)   for insert and unsuccessful search,   (5.10)

  Cn = (1/α) · ln(1/(1 − α))   for insert and successful search.   (5.11)

Here again α = n/m is the load factor.
Hash functions for collision resolution. Let the number m of possible hash values be given.
• Linear probing. One common method is linear probing. Given a hash function h(w), define
the sequence of hash functions

  hi(w) = (h(w) + i) mod m,   i = 0, 1, . . . , m − 1.   (5.12)

• Double hashing. Given two hash functions h(w), h′(w), we define a sequence of
hash functions

  hi(w) = (h(w) + h′(w) · i) mod m,   i = 0, 1, . . . , m − 1.   (5.14)
We require that the two hash functions are (stochastically) independent and uniform. This
means that for two different words v ≠ w the events X: h(v) = h(w) and X′: h′(v) = h′(w)
each occur with probability 1/m, and both events together occur with probability 1/m²;
or expressed in formulae:

  P(X) = 1/m,   P(X′) = 1/m,   P(X ∧ X′) = P(X) · P(X′) = 1/m².
This yields a really excellent hash function! Experiments show that this function has running
times that are practically indistinguishable from ideal hashing. However, it is not
easy to find appropriate pairs of hash functions which can be proved to be independent.
Some are given in [29], pp. 528.
Part II
Chapter 6
Optimization problems
In this chapter we present the definition and formal structure of optimization problems and give
some typical examples.
6.1 Examples
Example 6.1. (Regression polynomial) In statistics, one is often interested in finding a regres-
sion polynomial, or regression curve, for a given series of data pairs (t1, y1), . . . , (tN, yN), where¹
ti, yi ∈ R for i = 1, . . . , N. Such data may represent measurement samples with uncertainties due
to the measurement apparatus or signal perturbations by noise. Finding a regression polynomial
of degree n − 1 for these data means that we want to specify the n real coefficients x0, x1, . . . ,
xn−1 of a polynomial p : R → R,

  p(t) = x0 + x1 t + · · · + xn−1 t^{n−1},   (6.1)

such that yi ≈ p(ti) for all i = 1, . . . , N. For instance, if we want to find a linear regression
polynomial, we have n = 2 and look for two parameters x0, x1 such that yi ≈ x0 + x1 ti (Figure
6.1).
Figure 6.1: (a) Scatterplot of the sample (1, .07), (2, .15), (3, .28), (4, .42), (5, .57). (b) Linear
regression of the sample.
A data pair series with ti = i in particular represents a time series, i.e., a series of data values
(y1, y2, . . . , yN), where yt denotes the data value measured at time or period t. For instance,
these data may represent sales figures of a certain article in different periods. Specifying a
regression polynomial offers the possibility to derive from past sales figures a forecast for the
next few periods. The linear regression line, e.g., represents the bias, or “trend line” [40].
Example 6.2. (Traveling salesman problem (TSP)) A traveling salesman must visit n cities
such that the round trip is as short as possible. Here “short” may be meant with respect to time
or with respect to distance, depending on the instance of the problem. The TSP is one of the
most important — and by the way one of the hardest — optimization problems. It has many
1 The problem could easily be generalized to the case ti ∈ Rm and yi ∈ Rk with m, k ∈ N.
56 Andreas de Vries
[Figure: a road network of the eight cities Bochum, Dortmund, Soest, Düsseldorf, Hagen,
Iserlohn, Meschede, and Köln, with distances as edge weights]
Figure 6.2: A TSP for n = 8 cities. What is the shortest round-trip for the traveling salesman visiting each city
exactly once, starting and terminating in Hagen?
applications. For instance, a transport service which has to deliver goods at different places may
be considered as a TSP; another example of a TSP is the problem of programming a robot to drill
thousands of holes into a circuit board as quickly as possible.
There are many generalizations of the TSP; for example, the travel time between two cities
may depend on the time of day, as during rush hours, when it is longer than at night.
for a minimum problem, for example optimizing costs. The domain S is called the search space
of the optimization problem, and the function f is the objective function, or cost function. In
the next paragraphs, we will consider these notions in more detail.
Example 6.1 (Regression, continued). A reasonable objective function for the regression prob-
lem is the error function. It sums the distances p(tk) − yk of each sample point (tk, yk), where
p is the regression polynomial (6.1) and k = 1, . . . , N. Because a solution is better if and only if
the error is smaller, the regression problem is a minimum problem. But what does “distance”
exactly mean? A widely used distance measure is the mean squared error of the regression
polynomial (6.1) and the sample data,

  f(x0, . . . , xn−1) = (1/N) ∑_{k=1}^{N} [ x0 + x1 tk + · · · + xn−1 tk^{n−1} − yk ]²,   (6.7)

where the sum x0 + x1 tk + · · · + xn−1 tk^{n−1} is just p(tk). For instance, the error function
of the linear regression is given by f(x0, x1) = (1/N) ∑_{k=1}^{N} [x0 + x1 tk − yk]², cf. Figure
6.3. It can be solved by calculating the gradient and setting it to zero.
Figure 6.3: (a) The squared-error objective function (6.7) of linear regression for the sample (1, .07), (2, .15),
(3, .28), (4, .42), (5, .57); the search space is S = R², the minimum is attained at x = (x0, x1) ≈ (−.083, .127);
(b) the absolute-distance error objective function (6.8) for the same sample.
We will not go into more detail of the solution of this problem here; this is done in statistics.²
Another important error function is the mean absolute distance

  f(x) = (1/N) ∑_{k=1}^{N} |p(tk) − yk|,   (6.8)

where x = (x0, . . . , xn−1). An objective function based on the mean absolute distance is less
influenced by extreme outliers than an objective function using the mean squared error. This is
the reason why the mean absolute distance is usually preferred in economic applications, for
² To briefly mention at least the simplest case: the linear regression parameters are given by

  x0 = ȳ − x1 t̄,   x1 = cov(T, Y) / σ²(T),

where t̄ = (1/N) ∑_{k=1}^{N} tk and ȳ = (1/N) ∑_{k=1}^{N} yk are the mean values, respectively,
cov(T, Y) = (1/N) ∑_{k=1}^{N} tk yk − t̄ ȳ is the covariance, and σ²(T) = (1/N) ∑_{k=1}^{N} (tk − t̄)²
is the variance [12, §3.1], [42, §2.4].
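The footnote formulas give a direct way to compute a linear regression. The following sketch (with both moments normalized by 1/N) applies them to the sample of Figure 6.1:

```python
def linear_regression(ts, ys):
    """Closed-form linear regression y ≈ x0 + x1*t:
    x1 = cov(T, Y) / σ²(T),  x0 = ȳ − x1·t̄."""
    N = len(ts)
    t_mean = sum(ts) / N
    y_mean = sum(ys) / N
    cov = sum(t * y for t, y in zip(ts, ys)) / N - t_mean * y_mean
    var = sum((t - t_mean) ** 2 for t in ts) / N
    x1 = cov / var
    x0 = y_mean - x1 * t_mean
    return x0, x1

# The sample of Figure 6.1:
x0, x1 = linear_regression([1, 2, 3, 4, 5], [.07, .15, .28, .42, .57])
print(round(x0, 3), round(x1, 3))  # -0.083 0.127
```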
Algorithmics and Optimization 59
there often are extreme outliers, given as peaks in the sales figures (e.g., because of Christmas
trade) or as dips caused by production downtimes.
Example 6.2 (TSP, continued). For the TSP, the objective function is quite obvious: it is the
total length of a round trip. Usually, the distances between two directly connected cities are
given by a matrix G = (gij) where gij denotes the distance from city i to city j; we have gii = 0,
and gij = ∞ if there is no edge between city i and j. The matrix G is often called the weight
matrix. Then the objective function of the TSP is f : S → R₊,

  f(x) = ∑_{k=1}^{n} g_{x_{k−1} x_k},   (6.9)

where x = (x1, . . . , xn) with x0 := xn, and x1 is the index of the “home town” of the salesman.
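The objective function (6.9) is easy to evaluate; the following sketch uses a made-up symmetric weight matrix for four cities:

```python
def tour_length(G, x):
    """Objective function (6.9): total length of the round trip x,
    closing the cycle from the last city back to the home town x[0]."""
    n = len(x)
    return sum(G[x[k - 1]][x[k]] for k in range(1, n)) + G[x[-1]][x[0]]

# Hypothetical weight matrix with g_ii = 0 (made-up distances):
G = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
print(tour_length(G, (0, 1, 3, 2)))  # 2 + 4 + 3 + 9 = 18
```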
Example 6.3 (Production planning, continued). For the production planning problem, the
objective function is naturally given by the profit. If product Pk is produced
with quantity xk and yields a specific profit of ck currency units per quantity unit, then the total
profit is given by f : S → R,

  f(x) = ∑_{k=1}^{n} ck xk.   (6.10)
Example 6.4. (Rastrigin function) The Rastrigin function is an example of a non-linear func-
tion with several local minima and maxima. It was first proposed by Rastrigin as a 2-
dimensional function [41]. It reads

  f(x) = a n + ∑_{i=1}^{n} [ xi² − a cos(ω xi) ]   (6.11)

with the external parameters a, ω ∈ R₊, and x = (x1, x2, . . . , xn). The surface of the function
Figure 6.4: The Rastrigin function (6.11) with the search space S = [0, 2]² ⊂ R² and the external parameter
values a = 2, ω = 2π. The global minimum is at x = (x1, x2) = (0, 0), its global maximum at x = (3/2, 3/2).
is determined by the external variables a and ω, which control the amplitude and frequency
modulation, respectively.
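The Rastrigin function (6.11) translates directly into code; with the parameter values of Figure 6.4 (a = 2, ω = 2π) the global minimum f(0, 0) = 0 can be verified numerically:

```python
from math import cos, pi

def rastrigin(x, a=2.0, omega=2 * pi):
    """Rastrigin function (6.11) with external parameters a and ω."""
    return a * len(x) + sum(xi ** 2 - a * cos(omega * xi) for xi in x)

print(rastrigin((0.0, 0.0)))  # 0.0: the global minimum on S = [0, 2]^2
```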
Multi-criterion optimization
Many everyday optimization problems are to be solved not with respect to a single criterion but
with respect to several criteria. If you want to buy a car, say, you try to optimize some criteria
simultaneously, such as a low price, a low mileage, and a high speed.
• Newton’s method. If the objective function is even twice differentiable, the Newton
method may be applied. It “linearizes” the gradient and leads to the next local optimum. It
is very computation-intensive because it involves matrix inversion (of the “Hessian matrix”,
a generalization of the second derivative).
• Lagrangian multiplier. If the optimization problem is subject to some constraints, and the
objective function is differentiable, the Lagrangian multiplier method can be applied. It
is widely used in physics and engineering [39, §14].
• Simplex algorithm. The simplex algorithm computes the unique optimum of a linear
optimization problem, i.e., both the objective function and all constraints are linear.
Figure 6.5: A greedy solution of a TSP, starting at A. Obviously, the path A–C–B–D–A is shorter.
in each step the next city to be visited as the one nearest to the currently visited city. A
solution then could look like the one in Figure 6.5, i.e., a greedy algorithm does not guarantee
to succeed. Examples of greedy algorithms which are guaranteed to work correctly are
Dijkstra’s algorithm to find a certain class of shortest paths in a network, or the Huffman
coding algorithm.
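The greedy (nearest-neighbour) strategy for the TSP can be sketched as follows; the weight matrix in the usage example is made up:

```python
def greedy_tsp(G, start=0):
    """Greedy heuristic for the TSP: in each step visit the unvisited
    city nearest to the current one. The result is a valid round trip,
    but, as Figure 6.5 illustrates, not necessarily an optimal one."""
    n = len(G)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda c: G[current][c])  # nearest city
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Made-up symmetric weight matrix for 4 cities:
G = [[0, 1, 3, 4],
     [1, 0, 5, 2],
     [3, 5, 0, 1],
     [4, 2, 1, 0]]
print(greedy_tsp(G, 0))  # [0, 1, 3, 2]
```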
with each other. Examples are ant colony optimization for discrete problems, where each
ant walks randomly and leaves slowly evaporating pheromones on its way influencing
other ants, and particle swarm optimization where particles fly through hyperspace having
a memory both of their own best position and of the entire swarm’s best position and
communicating either to neighbor particles or to all particles of the swarm.3
3 http://jswarm-pso.sourceforge.net
Chapter 7
Graphs

  Objects                           Relations
  persons                           A knows B
  players in a tennis championship  A plays against B
  towns                             there exists a highway between A and B
  positions in a game of chess      position A transforms to B in one move

The denotation “graph” stems from the usual graphical representation: objects are represented
by vertices¹ and relations by edges. A relation normally is a directed edge, an arrow. If the
relation is symmetric, it is represented by an undirected edge. Correspondingly we will consider
directed and undirected graphs.
[Figure 7.1: two graphs on the vertices 1, . . . , 5: a directed graph (left) and an undirected graph (right)]
An edge thus is a pair of vertices. The edge e = (v, w) is represented as v → w. In this
way we recognize the first graph in Figure 7.1 as a directed graph. V and E are given by

  V = {1, 2, 3, 4, 5},   E = {(1, 2), (2, 3), (2, 4), (3, 4), (3, 5), (4, 1), (4, 4), (5, 3)}.
• A path, also called a walk, is a sequence of vertices p = (v0, . . . , vk) such that (vi, vi+1) ∈ E
for 0 ≤ i ≤ k − 1. Its length is the number of its edges, i.e., length = k. A path is called
simple if no edges are repeated [30, §3.1].
• The maximum number of edges e = |E| in a general graph without parallel edges, con-
sisting of n = |V| vertices, is attained if all vertices are joined directly to each other.

(a) An undirected graph without self-loops can have at most (n choose 2) pairs, and thus

  |E| ≤ (n choose 2) = n(n − 1)/2.   (7.1)

(b) An undirected graph with self-loops can have at most (n choose 2) pairs plus n self-loops.
Since n + (n choose 2) = n + n(n − 1)/2 = (n + 1)n/2, we have

  |E| ≤ (n + 1 choose 2) = (n + 1)n/2.   (7.2)

(c) A directed graph containing no self-loops: since there are (n choose 2) pairs and each pair
can be joined by two directed edges, we have

  |E| ≤ 2 (n choose 2) = n(n − 1).   (7.3)

(d) A directed graph containing self-loops can have at most n(n − 1) + n = n² edges, i.e.,

  |E| ≤ n².   (7.4)
2. (Adjacency list) In an adjacency list each vertex has a list of all its neighbors. Via an array
v[] of length n = |V|, each list is randomly accessible. For the left graph in Figure 7.1 we
therefore have
There are more possibilities to represent graphs, but adjacency matrices and adjacency lists are
the most important. Both are normally used as static data structures, i.e., they are constructed
at the start and are not changed in the sequel. Updates (insert and delete) play a minor part as
compared to the dynamic data structures we have studied so far.
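For the directed graph of Figure 7.1, both representations can be built directly from the edge set (a Python sketch):

```python
# Vertices and edges of the directed graph of Figure 7.1:
V = [1, 2, 3, 4, 5]
E = [(1, 2), (2, 3), (2, 4), (3, 4), (3, 5), (4, 1), (4, 4), (5, 3)]

n = len(V)

# Adjacency matrix: a[i][j] = 1 iff there is an edge from vertex i+1 to j+1.
a = [[0] * n for _ in range(n)]
for v, w in E:
    a[v - 1][w - 1] = 1

# Adjacency list: adj[i] is the list of all neighbours of vertex i+1.
adj = [[w for v, w in E if v == u] for u in V]
print(adj)  # [[2], [3, 4], [4, 5], [1, 4], [3]]
```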
execution of BFS. Since we successively examine neighbors of visited vertices, a queue is the
appropriate choice as the data structure for storing the neighbors to be visited next.
Let now V = {V[0], V[1], . . . , V[n − 1]} be a set of n vertices. Then each number s ∈
{0, . . . , n − 1} corresponds to a vertex V[s]. A graph G is an object which consists of n ver-
tices. This is illustrated by the following class diagram:

  Graph              1       *   Vertex
  Vertex V[ ]       ——————————   color
  Vertex adj[ ][ ]
  BFS()

The graph contains the set V of vertices and the adjacency list adj, where adj[i][j] means that
vertex V[j] is in the neighborhood of V[i]. Each vertex has a color. The method BFS(s) is the
implementation of the following algorithm. For the vertex s it blackens each neighbor of vertex
V[s] in graph G. (It is called by G.BFS(s).) BFS has as a local variable a queue q[] of integers
containing the indices i of V[i].
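The algorithm can be sketched in Python with a queue of vertex indices; we reduce the colouring to white/black (the full version may also store distances and predecessors):

```python
from collections import deque

WHITE, BLACK = 0, 1  # unvisited / visited

def bfs(adj, s):
    """Breadth-first search from vertex s: blackens every vertex
    reachable from s. adj[i] is the list of neighbours of vertex i;
    the queue holds the indices of vertices to be visited next."""
    color = [WHITE] * len(adj)
    color[s] = BLACK
    q = deque([s])
    while q:
        i = q.popleft()
        for j in adj[i]:
            if color[j] == WHITE:   # neighbour not yet visited
                color[j] = BLACK
                q.append(j)
    return [i for i in range(len(adj)) if color[i] == BLACK]

# 0 -> 1 -> 2, vertex 3 unreachable from 0:
print(bfs([[1], [2], [], []], 0))  # [0, 1, 2]
```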
Complexity analysis. Initializing colors, distances and predecessors costs running time O(|V|).
Each vertex is put into the queue at most once, and all its outgoing edges are examined when it
is dequeued, which in total takes O(|E|) steps. This results in a running time

  TBFS(|V|, |E|) = O(|V| + |E|).

Since all of the nodes of a level must be saved until their child nodes in the next level have been
generated, the space complexity is proportional to the number of nodes at the deepest level, i.e.,
SBFS(|V|) = O(|V|); in fact, in the worst case the graph has a depth of 1 and all vertices must be
stored.
Algorithmics and Optimization 67
Complexity analysis. We observe that DFS is called recursively at most |E| times, and
from the main algorithm at most |V| times. Moreover, it uses space only to store the stack of
vertices, i.e.,

  TDFS(|V|, |E|) = O(|V| + |E|),   SDFS(|V|) = O(|V|).   (7.9)
7.4 Cycles
Definition 7.2. A closed path (v0, . . . , vk, v0), i.e., a path where the final vertex coincides with
the start vertex, is called a cycle. A cycle which visits each vertex exactly once is called a
Hamiltonian cycle. A graph without cycles is called acyclic.
Example 7.3. In the group stage of the final tournament of the FIFA World Cup, soccer teams
compete within eight groups of four teams each. Each group plays a round-robin tournament,
resulting in (4 choose 2) = 6 matches in total. A match can be represented by two vertices standing for
the two teams, and a directed edge between them pointing to the loser of the match or, in case
of a draw, an undirected edge connecting them. For instance, for group E during the World Cup
1994 in the USA, consisting of the teams of Ireland (E), Italy (I), Mexico (M), and Norway (N),
we have the graph given in Figure 7.4. This group is the only group in World Cup history so far
in which all four teams finished on the same points.
  Group E results          |  Team         Goals  Pts
  Ireland – Italy    1–0   |  Mexico (M)   3:3    4
  Norway – Mexico    1–0   |  Ireland (E)  2:2    4
  Italy – Norway     1–0   |  Italy (I)    2:2    4
  Mexico – Ireland   2–1   |  Norway (N)   1:1    4
  Italy – Mexico     1–1   |
  Ireland – Norway   0–0   |
Figure 7.4: A cycle in the graph representing the match results of group E during the World Cup 1994
The Euler cycle problem (EC) then is to determine whether a given graph contains an Euler
cycle or not. By Euler’s theorem [9, §0.8], [26, §1.3.23], a connected graph contains an Euler
cycle if and only if every vertex has an even number of edges incident upon it. Thus EC is
decidable in O(n³) computational steps, counting for each of the n vertices x_j in how many of
the at most (n choose 2) edges (x_j, y) or (y, x_j) ∈ E it is contained:

  TEuler(n) = Θ(n³).   (7.12)
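Assuming the graph is already known to be connected, the even-degree condition of Euler's theorem can be checked directly (a sketch; the graph is given as an edge list):

```python
def has_euler_cycle(n, edges):
    """Euler's theorem for a connected undirected graph on n vertices:
    an Euler cycle exists iff every vertex has even degree.
    (Connectivity is assumed, not checked, in this sketch.)"""
    degree = [0] * n
    for v, w in edges:
        degree[v] += 1
        degree[w] += 1
    return all(d % 2 == 0 for d in degree)

# A triangle has all degrees even, a simple path does not:
print(has_euler_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # True
print(has_euler_cycle(3, [(0, 1), (1, 2)]))          # False
```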
[Figure 7.5: a weighted directed graph with six vertices; e.g., the edge from vertex 2 to vertex 4 has weight 3]
can express . . .
• distances (e.g., “It is 3 km from 2 to 4.”);
• costs (“It costs 3 € to get from 2 to 4.”);
• capacities (“The network bandwidth is 3 MBit per second on the cable from 2 to 4.”);
• traveling durations (“It takes 3 hours from 2 to 4.”).
There are many more applications of weighted graphs. We extend our definition of graphs to
include the weights.
Definition 7.4. A weighted graph Gγ = (V, E, γ) is a graph G = (V, E) with the weight
γ :E →R
which assigns a real number to each edge. We often will simply write G for Gγ .
For an edge (v, w) ∈ E the weight is thus given by γ(v, w).4 The unweighted graphs which we
have seen so far can be considered as special weighted graphs where all weights are constantly
1: γ(v, w) = 1 for all (v, w) ∈ E.
It is often convenient to write γ as a matrix, where γvw = γ(v, w) denotes the weight of the
edge (v, w), and the weight is ∞ if the edge (v, w) does not exist; for convenience, such entries
are often left blank or are marked with a bar “–”. For the weighted graph in Fig. 7.5 we thus
obtain the weight matrix

                 ⎛ –  1  –  4  –  – ⎞   ⎛ ∞  1  ∞  4  ∞  ∞ ⎞
                 ⎜ –  –  5  3  5  – ⎟   ⎜ ∞  ∞  5  3  5  ∞ ⎟
  γ(v, w) = γvw = ⎜ –  –  –  –  4  7 ⎟ = ⎜ ∞  ∞  ∞  ∞  4  7 ⎟   (7.13)
                 ⎜ –  –  7  –  6  – ⎟   ⎜ ∞  ∞  7  ∞  6  ∞ ⎟
                 ⎜ 2  –  –  –  –  4 ⎟   ⎜ 2  ∞  ∞  ∞  ∞  4 ⎟
                 ⎝ –  –  –  –  –  – ⎠   ⎝ ∞  ∞  ∞  ∞  ∞  ∞ ⎠

⁴ Note that we write for short γ(v, w) instead of γ((v, w)).
In this way, the weight matrix is a generalization of the adjacency matrix. With the weight γ we
can define the length of a path in graph Gγ .
Definition 7.5. Let p = (v0, v1, . . . , vn) be a path in a weighted graph Gγ. Then the weighted
length of p is defined as the sum of the weights of its edges:

  γ(p) = ∑_{i=1}^{n} γ(v_{i−1}, v_i).   (7.14)

A shortest path from v to w is a path p of minimum weighted length starting at vertex v and
ending at w. This minimum length is called the distance

  δ(v, w).   (7.15)

If there exists no path between two vertices v, w, we define δ(v, w) = ∞.
Some algorithms can deal with negative weights. This case poses a special problem. Look
at the weighted graph in Fig. 7.6.

[Figure 7.6: a weighted graph in which a cycle of negative total weight lies on the way from v to w]

On the way from v to w we can walk through the cycle arbitrarily often and thus
decrease the distance arbitrarily. Hence either the minimum distance between two points that
are reachable via a negative cycle is −∞, or the problem should be reformulated.
Moreover, for an edge (v, w) ∈ E in a graph Gγ = (V, E, γ) with only non-negative weights, we
simply have

  δ(v, w) ≤ γ(v, w).   (7.16)
Theorem 7.6 (Triangle inequality). For the distances between any three vertices u, v, w of a
weighted graph Gγ the triangle inequality holds:

  δ(v, w) ≤ δ(v, u) + δ(u, w).   (7.17)
Proof. The shortest path from v to w cannot be longer than going via u.
Note that this inequality holds even if there does not exist a path between one of the vertex pairs
(for then δ (·, ·) = ∞), or if there are negative weights (for then δ (·, ·) = −∞, and the shortest
path already goes via v . . . ).
In most shortest path algorithms the principle of relaxation is used. It is based on the triangle
inequality (7.17):
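The relaxation step can be sketched as follows (a minimal sketch; the names dist and next and the parameter order are our choice):

```python
def relax(dist, next_, u, v, w):
    """Relaxation: if going from v to w via u is shorter than the
    currently known distance, update dist[v][w] and remember u as
    the intermediate vertex (triangle inequality (7.17))."""
    if dist[v][w] > dist[v][u] + dist[u][w]:
        dist[v][w] = dist[v][u] + dist[u][w]
        next_[v][w] = u
```

For example, with dist = [[0, 10, 1], [10, 0, 1], [1, 1, 0]], calling relax(dist, next_, 2, 0, 1) shortens the known distance from 0 to 1 from 10 to 2 via vertex 2.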
Here the matrix entry dist[v][w] stores the minimum distance found so far between v and
w, and the matrix entry next[v][w] represents the vertex one must travel through if one intends to
take the shortest path from v to w. From the point of view of data structures, they are attributes
of an object “vertex.” This will be implemented consistently in the Dijkstra algorithm below.
The Floyd-Warshall algorithm implements them as attributes of the graph; therefore they are
given as two-dimensional arrays (matrices).
We now consider the Floyd-Warshall algorithm, which is fascinating in its simplicity. It solves
the all-pairs shortest paths problem. It was developed independently by
R. W. Floyd and S. Warshall in 1962.
Let the vertex be an object as an element of the graph Gγ as given by the following diagram.
  Graph              1       *   Vertex
  Vertex[ ] V       ——————————   int index
  double[ ][ ] γ
  double[ ][ ] dist
  int[ ][ ] next
  floydWarshall()
(Note that the weight γ and the distance dist are given as two-dimensional arrays.) Then the
Floyd-Warshall algorithm is called without a parameter.
algorithm FloydWarshall()
// Determines all-pairs shortest paths. The n vertices are V[i], i = 0, 1, . . . , n − 1.
  for (v ← 0; v < n; v++)                       // initialize
    for (w ← 0; w < n; w++)
      dist[v][w] ← γ[v][w]; next[v][w] ← −1;
  for (u ← 0; u < n; u++)
    for (v ← 0; v < n; v++)
      for (w ← 0; w < n; w++)
        // relax:
        if (dist[v][w] > dist[v][u] + dist[u][w])
          dist[v][w] ← dist[v][u] + dist[u][w]; next[v][w] ← u;
Unfortunately, the simplicity of an algorithm does not guarantee its correctness. For instance,
it can be immediately checked that it does not work for a graph containing a negative cycle.
However, we can prove the correctness by the following theorem.
Theorem 7.7 (Correctness of the Floyd-Warshall algorithm). If the weighted graph Gγ = (V, E, γ)
with V = {V[0], V[1], . . . , V[n − 1]} does not contain negative cycles, the Floyd-Warshall algo-
rithm computes all-pairs shortest paths in Gγ. For each pair of indices v, w ∈ {0, 1, . . . , n − 1} it
yields dist[v][w] = δ(V[v], V[w]).
Proof. Because the relaxation goes through all possible edges starting at V[v], after the first two
loops we simply have dist[v][w] = γ[v][w]. This represents the weights of all paths containing only
two vertices.
In each subsequent iteration of the outer loop, the value of u controls the number of vertices in
the paths to be considered: for fixed u all possible paths p = (e0, e1, . . . , eu) connecting each pair
e0 = V[v] with eu = V[w] are checked. Since there are no negative cycles, we have

  u ≤ n − 1,

because for a shortest path in a graph without negative cycles no vertex will be visited twice.
Therefore, eventually we have dist[v][w] = δ(V[v], V[w]).
This elegant algorithm is derived from a common principle used in the area of dynamic pro-
gramming5 , a subbranch of Operations Research [11]. It is formulated as follows.
Bellman’s Optimality Principle. An optimum decision sequence has the property that — in-
dependently from the initial state and the first decisions already made — the remaining decisions
starting from the achieved (and possibly non-optimum) state yield an optimum subsequence of
decisions to the final state.
An equivalent formulation goes as follows. An optimum policy has the property that
— independently from the initial state and the first decisions already made — the remaining
decisions yield an optimum policy with respect to the achieved (and possibly non-optimum)
state.
Thus if one starts correctly, Bellman’s principle leads to the optimum path.
5 in German: Dynamische Optimierung
Complexity analysis. The Floyd-Warshall algorithm consists of two cascading loop blocks, the
first one running n² = |V|² times, the second one running n³ times [26, §6.1.23]:

  TFW(n) = O(n²) + O(n³) = O(n³).

In a “dense” graph where almost all vertices are connected directly to each other, we achieve
approximately the maximum possible number of edges e = O(n²), cf. (7.3). Here the Floyd-
Warshall algorithm is comparably efficient. However, if the number of edges is considerably
smaller, the three loops are wasting running time.
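The pseudocode above translates into a minimal executable sketch, using float("inf") for missing edges and a made-up 3-vertex graph:

```python
INF = float("inf")

def floyd_warshall(gamma):
    """All-pairs shortest paths by the Floyd-Warshall algorithm.
    gamma is the n×n weight matrix; entry INF means 'no edge'.
    Returns the matrix dist with dist[v][w] = δ(V[v], V[w])."""
    n = len(gamma)
    dist = [row[:] for row in gamma]          # initialize: dist = γ
    for u in range(n):                        # intermediate vertex
        for v in range(n):
            for w in range(n):
                if dist[v][w] > dist[v][u] + dist[u][w]:   # relax
                    dist[v][w] = dist[v][u] + dist[u][w]
    return dist

g = [[0, 3, INF],
     [INF, 0, 1],
     [7, INF, 0]]
print(floyd_warshall(g)[0][2])  # 4: the path 0 -> 1 -> 2
```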
• insert(int vertex, int distance): inserts vertex along with its distance from the source
into the priority queue and reheaps the queue.
• int extractMin(): returns the index of the vertex with the currently minimal distance from
the source and deletes it from the priority queue.
The data structures to implement the Dijkstra algorithm are thus as in the following diagram:

  PriorityQueue                 1   1   Graph            1   *   Vertex
  Vertex[ ] vertex            ———————   double[ ][ ] γ ———————   int index
  int size                              dijkstra(s)              Vertex[ ] adjacency
  insert(index, dist)                                            double distance
  extractMin()                                                   Vertex predecessor
  size()                                                         int queueIndex
  decreaseKey(vertex, newDist)
Here adjacency denotes the adjacency list of the vertex. As usual, a vertex V [i] ∈ V is determined
uniquely by its index i. The algorithm Dijkstra is called with the index s of the source vertex as
parameter. It “knows” the priority queue h (i.e., h is already created as an object.) The vertex
attribute queueIndex will be used by the Dijkstra algorithm to store the current index position of
the vertex in the priority queue. The algorithm is shown in Figure 7.8. There are some Java
applet animations in the Web, e.g.,
http://www-b2.is.tokushima-u.ac.jp/˜ikeda/suuri/dijkstra/Dijkstra.shtml
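The algorithm of Figure 7.8 can be sketched in Python using the standard heapq module as the priority queue; since heapq offers no decreaseKey operation, this sketch uses the common workaround of inserting duplicates and skipping outdated queue entries:

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest paths for non-negative weights.
    adj[v] is a list of (w, weight) pairs for each vertex index v."""
    dist = {s: 0}
    pq = [(0, s)]                       # the priority queue
    while pq:
        d, v = heapq.heappop(pq)        # extractMin
        if d > dist.get(v, float("inf")):
            continue                    # outdated duplicate entry
        for w, weight in adj[v]:
            nd = d + weight             # relax the edge (v, w)
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(pq, (nd, w))
    return dist

adj = [[(1, 2), (2, 5)], [(2, 1)], []]  # made-up 3-vertex graph
print(dijkstra(adj, 0))  # {0: 0, 1: 2, 2: 3}
```

Only vertices reachable from the source appear in the returned dictionary, which matches the convention δ(v, w) = ∞ for unreachable vertices.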
Algorithmic analysis
For the correctness of the Dijkstra algorithm see e.g. [23, Lemma 5.12].
Theorem 7.8. The Dijkstra algorithm based on a priority queue realized by a heap computes
the single-source shortest paths in a weighted directed graph Gγ = (V, E, γ) with non-negative
weights in maximum running time TDijkstra (|V |, |E|) and with space complexity SDijkstra (|V |)
given by
TDijkstra (|V |, |E|) = O(|E| · log |V |), SDijkstra (|V |) = Θ(|V |). (7.19)
Proof. First we analyze the time complexity of the heap operations. We have n = |V| insert
operations, at most n extractMin operations, and at most e = |E| decreaseKey operations.
Initializing the priority queue costs at most O(n log n) running time; initializing the vertices
with their distance and predecessor attributes requires only O(n). Determining the minima
with extractMin costs at most O(n log n), because it is performed at most n times and each
reheap costs O(log n). Each decreaseKey needs O(log n), since there are at most n elements
in the heap; performed at most e times, this gives O(e log n). Since e ≥ n − 1 for a connected
graph, the total running time is O(e log n).
To calculate the space requirement S(n) we only have to notice that the algorithm itself
needs the attributes dist and pred, each of length O(n), as well as the priority queue requiring
two arrays of length n to store the vertices and their intermediate minimum distances from the
source, plus a single integer to store its current length. In total, this gives S(n) = O(n), and even
S(n) = Θ(n) since this is the minimum space requirement. Q.E.D.
We remark that the Dijkstra algorithm can be improved, if we use a so-called Fibonacci
heap. Then we have complexity O(|E| + |V | log |V |) [23, §§5.4 & 5.5].
Chapter 8
Dynamic Programming
Dynamic programming, like the divide-and-conquer method, solves problems by combining the
solutions to subproblems.1 Divide-and-conquer algorithms partition the problem into indepen-
dent subproblems, solve the subproblems recursively, and then combine their solutions to solve
the original problem. In contrast, dynamic programming is applicable when the subproblems
are not independent, that is, when subproblems share subsubproblems. In such a context, a
divide-and-conquer algorithm would do more work than necessary, repeatedly solving the com-
mon subsubproblems. A dynamic programming algorithm solves every subsubproblem just
once and then saves its answer in a table, thereby avoiding the work of recomputing the answer
every time the subsubproblem is encountered.
Dynamic programming is typically applied to optimization problems. In such problems
there can be many possible solutions. Each solution has a value, and we wish to find a solution
with the optimal (minimum or maximum) value.
The development of a dynamic-programming algorithm can be broken into a sequence of
four steps.
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution in a bottom-up fashion.
4. Construct an optimal solution from computed information.
Steps 1–3 form the basis of a dynamic-programming solution to a problem. Step 4 can be
omitted if only the value of an optimal solution is required.
omitted if only the value of an optimal solution is required.
[Figure 8.1: a stage graph with initial state A at stage 0, final state O at stage 6, intermediate
states B to N, edge costs, and the stages 0, 1, . . . , 6 on the horizontal axis]
stage, due to a decision. Being in state E at stage 2, the possible decisions are to take either
state G or state H at stage 3.
In other words, searching the cost-minimal path from A to O means to search a decision
sequence, which, starting from the initial state A, yields a state on each stage such that the
final state O is reached in a cost-minimal way. At each stage there has to be exactly one state
(point) lying on the optimum path. A possible decision sequence is given in table 8.1 and
correspondingly by the emphasized path in figure 8.2.
[Figure 8.2: the stage graph of Figure 8.1 with the path of a possible decision sequence emphasized]
• At each stage there are several states xt , exactly one of which at each stage has to be run
through by a solution of the problem.
• Being in state xt at stage t (t = 0, . . . , n), a decision (or action) at has to be made to achieve
a state xt+1 at stage t + 1.
• The total decision x consists of a decision sequence a = (a0 , a1 , . . . , an−1 ). Here at is the
decision at stage t, or in other words, the solution of subproblem t. a is also called the
decision vector.
  x_{t+1} = f(x_t, a_t),   t = 0, 1, . . . , n − 1.   (8.1)

Here f is the transition function, which changes state x_t at stage t into state x_{t+1}, depending
on the decision a_t. We call equation (8.1) the transition law, or law of motion, cf. [2]. It is
important to notice that state x_{t+1} at stage t + 1 solely depends on stage t, state x_t, and decision
a_t. Other states or actions at stage t have no influence on x_{t+1}.
But what has to be optimized at all? We have to minimize the sum of the costs c_t that are
caused by each decision a_t changing from state x_t to x_{t+1}. Formally we write

  c(x_0, a) = ∑_{t=0}^{n−1} c_t(x_t, a_t) → min over a,   (8.2)

where a = (a_0, a_1, . . . , a_{n−1}) is the decision vector, and x_t with t > 0 results from the decision
a_{t−1} and the state x_{t−1}. The function c is called the (total) cost function. It is “separable,” since
it can be separated into the sum of the individual stage costs, c(x_0, a) = ∑ c_t(x_t, a_t). For a state
sequence x_s – x_{s+1} – . . . – x_t, with 0 ≤ s < t < n, where each state x_k results from a decision
according to the recursive transition law (8.1), we will also denote the cost function by
c(x_s, x_{s+1}, . . . , x_t), and the minimum cost value from state x_s to x_t simply by c(x_s, x_t).
The dynamic programming method now consists of stepwise recursive determinations of
optimal subpaths. All optimal subpaths are computed recursively, i.e. by using previously com-
puted optimal subpaths. The recursion relies on the following fundamental principle.
Bellman’s Optimality Principle. An optimum decision sequence has the property that — in-
dependently from the initial state and the first decisions already made — the remaining decisions
starting from the achieved (and possibly non-optimum) state yield an optimum subsequence of
decisions to the final state.
An equivalent formulation goes as follows. An optimum policy has the property that
— independently from the initial state and the first decisions already made — the remaining
decisions yield an optimum policy with respect to the achieved (and possibly non-optimum)
state.
For our optimum-path problem the Bellman principle thus means: Each subpath to O, e.g.,
from D to O, must be an optimum connection from D to O, no matter whether the actual total
optimum path from A to O runs through D or not.
c5 (M,C) = 1, c5 (N,C) = 2.
These are the optimum costs from M and N, respectively. We write these values directly above
these states, as in fig. 8.3 (left).
Figure 8.3: The optimum subpaths from stage 5 to stage 6 (left), and from stage 4 to stage 6 (right).
Back at stage 4, there are the possible states I, K, and L which can be achieved by the
optimum path. From I there is only one state achievable to reach O, namely M. This yields the
optimum costs c(I, O) = c4 (I, M) + c5 (M, O) = 3 + 1 = 4, written above the letter I.
From K there are two possible decisions, K–M or K–N. Hence the total costs from K to O
are either c(K, M, O) = 2 or c(K, N, O) = 4, i.e. the minimum costs from K to O are

c(K, O) = min[c(K, M, O), c(K, N, O)] = min[2, 4] = 2.

Analogously,
c(L, O) = min[c(L, M, O), c(L, N, O)] = min[7, 5] = 5.
This concludes the computation of the optimum subpaths from stage 4 to stage 6, cf. fig. 8.3
(right).
In a similar manner we achieve recursively the optimum subpaths from stage 3, 2, 1, and 0,
resulting in the optimum path A–B–E–H–K–M–O, with total costs c(A, O) = 8.
Figure 8.4: The optimum subpaths from stage 3 to stage 6 (left), and from stage 2 to stage 6 (right). Note that
by stage 3, L–N–O cannot belong to the total optimum path!
Remark. In the consideration of the transition from stage 3 to stage 4 it is obvious that the
cheapest subpath between two stages (here F–G with cost 1) need not necessarily lie on an
optimum subpath.
Algorithmics and Optimization 79
Figure 8.5: The optimum subpaths from stage 1 to stage 6 (left), and the optimum path from A to O (right).
for each state xt at stage t. Here A(xt) is the decision space, i.e. the set of all possible transitions
(paths) from xt to a state xt+1 . Equation (8.5) is called the Bellman functional equation. The
minimum value gt (xt ) thus yields the cost of the optimum path from xt to the end xn .
This value has to be computed for all possible states xt . Starting at the end, i.e. with t = n−1,
one computes successively the optima for all n stages. The optimum path from stage 0 with
initial state x0 to the final state xn at stage n then is the solution of the problem.
Hence it is natural to treat a sequence of decisions by reversing the order. It is for this reason
that the method is also called backwards induction.
The demands bt of the four periods are given as follows.
b0 = 10 qu, b1 = 20 qu, b2 = 20 qu, b3 = 40 qu. (8.6)
There are the following restrictions.
• Because the firm has only limited production capacities, it can produce maximally 30 qu
per period.
• The firm has fixed production costs of 11 cu (“currency units”) per period.
• The variable production costs c_p^var(x) per period depend on the production quantity x
  as follows:

      x              0    10    20    30
      c_p^var(x)     0     5    11    26
This yields the total production costs c p (x) as the sum of fixed and variable costs as
      x              0    10    20    30
      c_p(x)        11    16    22    37
• At the beginning and at the end of the whole planning period no quantities stored are
allowed.
• Each qu of stored product causes storage costs cs of 0.2 cu per period, cs (lt ) = 0.2 lt cu,
  where lt is the inventory in period t.
• At the end of each period the demand is called at once. For the quantity demanded in this
  period no storage costs cs arise.
We search for the production quantities of each period that minimize the sum of production and
storage costs over the whole planning period,
c(l0 , x) = ∑_{t=0}^{3} ( c_p(x_t) + c_s(l_t) ) −→ min_x .   (8.7)
Figure 8.6: The decision space A = ∪_{lt} A(lt) (shaded region) of the production-smoothing problem. The
cumulative demand B(t) = ∑_{i=0}^{t} b_i bounds A from below, whereas the cumulative production capacity
L(t) = t·lmax bounds it from above.
We now sketch the decision space. It is bounded from below by the requirement that the demand
bt has to be covered (lower limit in fig. 8.6). The production capacity of 30 qu need not be
regarded, since the decision space is already bounded from above by the storage capacity lmax = 20 qu.
This yields the decision space as the shaded region in figure 8.6: it is restricted by
the demands bt and the storage capacity lmax .
Since the storage inventory lt+1 in period t + 1 depends only on the inventory lt and the
quantity xt produced in the previous period t (the demand bt is known and thus a constant
parameter), the following relation can be established, lt+1 = f (xt , lt ) with
f (xt , lt ) = lt + xt − bt . (8.8)
This is the classical storage balance equation of discrete-time production planning: the storage
inventory at the beginning of period t + 1 is given by the storage inventory at the beginning
of the preceding period t, increased by the production quantity xt and decreased by the demand bt .
In the context of the dynamic programming method this means that state lt+1 at stage t + 1
depends only on the state lt and the decision xt of the preceding stage. Figure 8.7 illustrates the
problem.
Figure 8.7: Course of production and storage, and the relationship of periods and stages.
The costs of a period t consist of the production costs c p (xt ) and the storage costs cs (lt ) of
the period,
ct (lt , xt ) = c p (xt ) + cs (lt ) = c p (xt ) + 0.2 lt .
According to equ. (8.7) the total cost minimization problem is stated by
c(l0 , x) = ∑_{t=0}^{3} ct (lt , xt ) −→ min_x .   (8.9)
Let gt (lt ) be the minimum cost necessary to reach the sought final storage inventory state l4 ,
starting with the inventory lt , without violating one of the constraints 0 ≤ xt ≤ 30, 0 ≤ lt ≤ lmax .
With (8.8) this yields the recursion formula
gt (lt ) = min_{xt ∈ At(lt)} [ ct (lt , xt ) + gt+1 (lt + xt − bt ) ]   (8.10)
82 Andreas de Vries
with the decision spaces At (lt ) ⊂ {0, 10, 20, 30}. The equation says that from the storage in-
ventory lt at the beginning of period t, the target l4 is reached at minimum costs by minimizing
the sum of production and storage costs to reach lt+1 and the minimum costs to reach l4 from
lt+1 .
Equation (8.8) is the transition law of our multistage decision problem, i.e. f (xt , lt ) = lt +
xt − bt , and (8.10) is its Bellman functional equation.
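A minimal Python sketch of the backward induction (8.10): the demands, the state space {0, 10, 20}, and the cost tables are those of the example above, while the dictionary-based bookkeeping of the tables gt and of the recorded decisions is an implementation choice.

```python
from math import inf

b = [10, 20, 20, 40]                    # demands b0..b3 (qu)
cp = {0: 11, 10: 16, 20: 22, 30: 37}    # total production costs (cu)
states = [0, 10, 20]                    # admissible storage levels (qu)

g = {l: (0.0 if l == 0 else inf) for l in states}   # g4: storage empty at the end
policy = []
for t in range(3, -1, -1):              # backward induction over stages 3..0
    gt, pt = {}, {}
    for l in states:
        best, best_x = inf, None
        for x in cp:                    # decisions x in {0, 10, 20, 30}
            l_next = l + x - b[t]       # storage balance equation (8.8)
            if l_next in g:
                cost = cp[x] + 0.2 * l + g[l_next]
                if cost < best:
                    best, best_x = cost, x
        gt[l], pt[l] = best, best_x
    g, policy = gt, [pt] + policy

print(g[0])                             # 109.0, the minimum total cost
l, plan = 0, []                         # forward pass: optimum production plan
for t in range(4):
    x = policy[t][l]
    plan.append(x)
    l = l + x - b[t]
print(plan)                             # [20, 20, 20, 30]
```

The forward pass recovers the production plan (20, 20, 20, 30) qu with minimum total cost 109 cu, agreeing with the stagewise tables computed in the following.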
Figure 8.8: (a) The admissible states at each stage, given by the decision space A = ∪_{lt} A(lt) for t = 0, . . . , 4.
(b) The possible decisions.
capacity of lmax = 20 qu; the state l4 = 0 is prescribed by the condition that the storage has to be
empty at the end of the period. This yields Fig. 8.8 (a). The possible decisions then are drawn,
yielding Fig. 8.8 (b).
Somewhat tediously, we then attach to each decision xt its total cost c_p(xt ) + cs (lt ), with
cs (lt ) = 0.2 lt . At stage t = 0, the storage costs cs (l0 ) are zero, since the storage is empty, cs (l0 ) = 0.
There are three possible decisions, x0 = 10, 20, or 30, each causing production costs of c_p(x0 )
= 16, 22, or 37 cu, respectively. At the next stage t = 1, the possible decisions for state l1 = 0
are x1 = 20 or 30, yielding total costs of c_p(x1 ) + cs (0) = 22 or 37 cu. For state l1 = 10 we have
storage costs cs (10) = 2, and thus the total costs of the three decisions are 16 + 2, 22 + 2, or
37 + 2, respectively. For l1 = 20, the storage costs are cs = 4, i.e. the total costs are 15, 20, or
26. Analogous calculations yield Fig. 8.9 (a). The optimal subpaths are achieved by computing
Figure 8.9: (a) The decision graph with the respective costs. (b) The optimal paths from each state to the final
state are marked.
backwards and attaching the minimal cost to each state, see Fig. 8.9 (b).
At stage 3, i.e. the end of period 3, the demand b3 = 40 has to be covered by the decision
x3 . By the storage balance equation (8.8) we have l4 = l3 + x3 − b3 , or l3 + x3 = 40. Together
with the Bellman equation this yields
x3 = 40 − l3 ,   g3 (l3 ) = min_{x3 ∈ {0,10,20,30}} [ c_p(x3 ) + 0.2 l3 + g4 (l3 + x3 − 40) ].
In the following table the production costs c_p(x3 ) for each decision x3 are listed in the right
column, and the storage costs cs (l3 ) = 0.2 l3 for each storage quantity l3 in the lowest row. The
aim of the first iteration is to compute the values g3 (l3 ), needed as g3 (l2 + x2 − 20) at the next
stage, for the admissible combinations of x3 and l3 (i.e., x3 ∈ {0, 10, 20, 30} and 0 ≤ l3 ≤ lmax = 20).
First the values of the square brackets are computed (note that g4 (l4 ) = 0). For this purpose
we determine the term 0.2 l3 by the following table.
x3 \ l3      0     10     20   | c_p(x3 )
0            –      –      –   |   11
10           –      –      –   |   16
20           –      –      0   |   22
30           –      0      –   |   37
0.2 l3       0      2      4
g3 (0) is not defined, because x3 = 40 is not in the decision space; the production capacity is
at most x3 = 30. To obtain the optimum decision x3 depending on the storage quantity
l3 , i.e. x3 (l3 ), as well as the values g3 (l3 ), we have to determine the minimum of each l3 -column
(bordered in the following table).
x3 \ l3      0     10     20
0            –      –      –
10           –      –      –
20           –      –     26
30           –     39      –
x3 (l3 )     –     30     20
g3 (l3 )     –     39     26
Thus the values of g3 (l2 + x2 − 20) for the admissible combinations of x3 and l3 are 39 and 26.
Analogously, for stage 2 we have l3 = l2 + x2 − 20, and therefore
g2 (l2 ) = min_{x2 ∈ {0,10,20,30}} [ c_p(x2 ) + 0.2 l2 + g3 (l2 + x2 − 20) ].
We obtain the following left table for the values of g3 (l2 + x2 − 20) depending on l2 and
x2 , viz. 39 or 26. This yields the right table for the optimum decision x2 (l2 ) and the
corresponding values of g2 (l2 ).
x2 \ l2      0     10     20   | c_p(x2 )        x2 \ l2      0     10     20
0            –      –      –   |   11            0            –      –      –
10           –      –     39   |   16            10           –      –     59
20           –     39     26   |   22            20           –     63     52
30          39     26      –   |   37            30          76     65      –
0.2 l2       0      2      4                     x2 (l2 )    30     20     20
                                                 g2 (l2 )    76     63     52
We obtain the following left table for the values of g2 (l1 + x1 − 20) in dependence of l1 and x1 ,
and the right table for the optimum decision and the corresponding values of g1 (l1 ).
x1 \ l1      0     10     20   | c_p(x1 )        x1 \ l1      0     10     20
0            –      –     76   |   11            0            –      –     91
10           –     76     63   |   16            10           –     94     83
20          76     63     52   |   22            20          98     87     78
30          63     52      –   |   37            30         100     91      –
0.2 l1       0      2      4                     x1 (l1 )    20     20     20
                                                 g1 (l1 )    98     87     78
We obtain the following tables for the values of g1 (l0 + x0 − 10), as well as for the optimum
decision and the corresponding values of g0 (l0 ).
x0 \ l0      0     10     20   | c_p(x0 )        x0 \ l0      0     10     20
0            –      –      –   |   11            0            –      –      –
10          98      –      –   |   16            10         114      –      –
20          87      –      –   |   22            20         109      –      –
30          78      –      –   |   37            30         115      –      –
0.2 l0       0      2      4                     x0 (l0 )    20      –      –
                                                 g0 (l0 )   109      –      –
The optimum production plan thus reads:

stage t     lt     xt     bt
0            0     20     10
1           10     20     20
2           10     20     20
3           10     30     40
4            0
Figure 8.11: A travelling salesman problem and its solution, a minimum-cost tour with cost 7.
from j to i). Moreover, the triangle inequality (c_ik ≤ c_ij + c_jk) need not hold either. Also, the
costs c_ij may be ∞, which means that there is no direct way from city i to city j.
The travelling salesman problem then is to search for a permutation π : {1, . . . , n} → {1, . . . , n}
that minimizes the cost function
c(π) = ∑_{i=1}^{n−1} c_{π(i),π(i+1)} + c_{π(n),π(1)} −→ min .   (8.11)
Since there may be ∞-entries in the cost matrix (ci j ), it is possible that a solution does not exist
at all, i.e. that each permutation π leads to an infinite cost value c(π).
A naïve algorithm would go through all n! permutations π and compute c(π) each time.
(It suffices to consider only permutations with π(1) = 1; there are (n − 1)! of those.)
Therefore, this algorithm has complexity O(n!), even worse than an exponential complexity.
By dynamic programming this naïve ansatz can be improved. However, the resulting algo-
rithm will still have an exponential complexity. It is widely believed that there does not exist an
algorithm for the travelling salesman problem with essentially better complexity at all! This
problem is one of a few which are called “NP-complete”.
Our dynamic programming algorithm needs a table for the g(i, S)-values, where the combina-
tions with 1 ∈ S are not needed. The algorithm works as follows.
for ( i = 2; i ≤ n; i++ )  g(i, ∅) = c_i1 ;
for ( k = 1; k ≤ n − 2; k++ ) {
   for ( all S with |S| = k and 1 ∉ S ) {
      for ( i ∈ {2, . . . , n} \ S ) {
         compute g(i, S);
      }
   }
}
compute g(1, {2, 3, . . . , n});
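A Python sketch of this scheme for a 4-city instance; since the cost matrix of the figure is not reproduced here, its entries are reconstructed so as to be consistent with the g-values computed below, and should be read as assumptions.

```python
from itertools import combinations
from math import inf

# Cost matrix consistent with the example's g-values (entries not fixed by
# the text are hypothetical); cities are numbered 1..4, index 0 is unused.
c = [[inf] * 5 for _ in range(5)]
c[1][2], c[1][3], c[1][4] = 10, 15, 20
c[2][1], c[2][3], c[2][4] = 5, 9, 10
c[3][1], c[3][2], c[3][4] = 6, 13, 12
c[4][1], c[4][2], c[4][3] = 8, 8, 9

n = 4
cities = range(2, n + 1)
g = {}                                  # g[(i, S)]: cheapest path i -> all of S -> 1
for i in cities:
    g[(i, frozenset())] = c[i][1]       # g(i, emptyset) = c_i1
for k in range(1, n - 1):               # subsets S with |S| = k, 1 not in S
    for S in combinations(cities, k):
        S = frozenset(S)
        for i in cities:
            if i not in S:
                g[(i, S)] = min(c[i][j] + g[(j, S - {j})] for j in S)
full = frozenset(cities)
tour_cost = min(c[1][j] + g[(j, full - {j})] for j in full)
print(tour_cost)                        # 35
```

The table g has one entry per pair (i, S), which is the source of the algorithm's exponential, but sub-factorial, complexity O(n² 2ⁿ).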
Figure 8.12: A travelling salesman problem and its solution, a minimum-cost tour 1—2—4—3—1 with cost
35.
g(2, ∅) = c21 = 5,   g(3, ∅) = c31 = 6,   g(4, ∅) = c41 = 8,
as well as
g(2, {3}) = c23 + g(3, ∅) = 15,   g(2, {4}) = c24 + g(4, ∅) = 18,
g(3, {2}) = c32 + g(2, ∅) = 18,   g(3, {4}) = c34 + g(4, ∅) = 20,
g(4, {2}) = c42 + g(2, ∅) = 13,   g(4, {3}) = c43 + g(3, ∅) = 15,
g(2, {3, 4}) = min[c23 + g(3, {4}), c24 + g(4, {3})] = 25,
g(3, {2, 4}) = min[c32 + g(2, {4}), c34 + g(4, {2})] = 25,
g(4, {2, 3}) = min[c42 + g(2, {3}), c43 + g(3, {2})] = 23,
g(1, {2, 3, 4}) = min[c12 +g(2, {3, 4}), c13 +g(3, {2, 4}), c14 +g(4, {2, 3})] = min[35, 40, 43] = 35.
Storing the subpaths that yield the respective minima, we obtain the optimum tour: it is
1—2—4—3—1.
Chapter 9
Simplex algorithm
Let us introduce the simplex algorithm by applying it to a concrete optimization problem. Af-
terwards we will take a look at some of its general properties.
Example 9.1. (Production scheduling) A firm gains a profit of 2 ke1 per unit of product 1, and
a profit of 2.2 ke per unit of product 2. To produce these products there are two machines A and B
available. Machine A can be used only up to 100 hours a week, and machine B up to 80 hours.
(The remaining hours are needed for maintenance.) To produce one unit of product 1, machine A
is needed for 1 hour and machine B for 2 hours; the respective numbers for product 2 are 2 hours
on A and 1 hour on B. Moreover, two raw materials R and S are needed. R is available only up to
960 kg a week, and material S only up to 1200 kg a week. Producing one unit of product 1 requires
16 kg of R and 20 kg of S, whereas one unit of product 2 requires 15 kg of R and 16 kg of S. The
production schedule maximizing the profit is sought.
1. (Determination of the objective function). The weekly profit to be maximized is

   z(x1 , x2 ) = 2x1 + 2.2x2 −→ max,   (9.1)

   where x1 is the number of instances of product 1 and x2 the respective number of product
   2. z is called the objective function2 of the problem. Observe that quite naturally x1 and
   x2 are nonnegative,

   x1 , x2 ≥ 0.   (9.2)
2. (Determination of the constraints). Once the quantities x1 and x2 are defined, the con-
straints are directly derived:
    x1 +  2x2 ≤  100
   2x1 +   x2 ≤   80
  16x1 + 15x2 ≤  960     (9.3)
  20x1 + 16x2 ≤ 1200
1 1 ke = 1000 e
2 “objective function” in German: “Zielfunktion”
We can rewrite equations (9.1), (9.2) and (9.3) in matrix notation to reformulate the optimization
problem as

z = c∗ · x −→ max, under the constraints Ax ≤ b, x ≥ 0,   (9.4)
where

   c∗ = (2, 2.2),   x = (x1 , x2 )∗,

        (  1   2 )        (  100 )
   A =  (  2   1 ),   b = (   80 ).     (9.5)
        ( 16  15 )        (  960 )
        ( 20  16 )        ( 1200 )
(Note that c∗ is a row vector and denotes the transpose of the column vector c.) The crucial
trick of the simplex algorithm now is to introduce some extra slack variables yi to transform the
inequalities (9.3) into equalities:
x1 + 2x2 + y1 = 100
2x1 + x2 + y2 = 80
(9.6)
16x1 + 15x2 + y3 = 960
20x1 + 16x2 + y4 = 1200
It is notationally convenient to record the information content of (9.4) in a so-called simplex
tableau, as follows.
       x1    x2
 z      2   2.2      0
 y1     1     2    100     (9.7)
 y2     2     1     80
 y3    16    15    960
 y4    20    16   1200
1. Determine the pivot column. The “pivot column” is the column of a maximum positive
value in the z-row,
j p ∈ { j : c j > 0 ∧ c j = max[ck ]}. (9.11)
k
If there is no positive value c j , the method is terminated.
2. Determine the pivot row and the pivot element. The “pivot row” is the row i p for which
the quotient bk /ak, j p with ak, j p > 0 is minimal,
i_p ∈ { i : a_{i,j_p} > 0 ∧ b_i /a_{i,j_p} = min_k [ b_k /a_{k,j_p} ] }.   (9.12)
If there is no positive matrix entry ai, j p > 0, the algorithm stops, there does not exist a
solution. The matrix entry ai p , j p is the pivot element.3 Save the value of the pivot element
d ← ai p , j p .
3. Exchange the variables. The non-basic variable x_{j_p} heading the pivot column and the
   basic variable y_{i_p} heading the pivot row exchange their places in the tableau.
4. Change the z-row values. The z-row values c j are changed according to the following
cases:
c_j ←  −c_j /d                        if j = j_p  (c_j is in the pivot column),
       c_j − c_{j_p} a_{i_p,j} /d     otherwise  (“rectangle rule”).       (9.13)
5. Change the matrix entries. The matrix entries ai j are changed according to the following
cases:
a_ij ←  1/d                             if i = i_p and j = j_p  (a_ij is the pivot element),
        a_ij /d                         if i = i_p and j ≠ j_p  (a_ij is in the pivot row),
        −a_ij /d                        if i ≠ i_p and j = j_p  (a_ij is in the pivot column),
        a_ij − a_{i_p,j} a_{i,j_p} /d   otherwise  (“rectangle rule”).       (9.14)
6. Change the b-values. The b-values bi are changed for i = 0, 1, . . . , n according to the
following cases:
b_i ←  b_i /d                       if i = i_p  (b_i is in the pivot row),
       b_i − a_{i,j_p} b_{i_p} /d   otherwise  (“rectangle rule”).       (9.15)
Therefore, as long as they are not in a pivot row or a pivot column, the tableau entries a_ij , b_i
and c_j are determined by the “rectangle rule”

   w ← w − u·v/d .   (9.16)

Here d = a_{i_p,j_p} is the (old) value of the pivot element, u is the entry in the pivot row i_p
sharing its column with w, v is the entry in the pivot column j_p sharing its row with w
(i ≠ i_p , j ≠ j_p ), and alternatively one of each of the following rows holds:

   u = a_{i_p,j},   v = a_{i,j_p},   w = a_{ij}
   u = b_{i_p},     v = a_{i,j_p},   w = b_i        (9.17)
   u = a_{i_p,j},   v = c_{j_p},     w = c_j
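Steps 1–6 can be sketched compactly in Python; this is a minimal illustration without degeneracy or cycling safeguards, applied to the tableau (9.7) of example 9.1.

```python
def simplex_max(A, b, c):
    # Maximize c*x subject to A x <= b, x >= 0, by the tableau pivoting
    # of steps 1-6 above (minimal sketch, no degeneracy handling).
    m, n = len(A), len(c)
    A = [row[:] for row in A]; b = b[:]; c = c[:]
    col = ["x%d" % (j + 1) for j in range(n)]   # non-basic variables
    row = ["y%d" % (i + 1) for i in range(m)]   # basic (slack) variables
    while max(c) > 1e-12:
        jp = max(range(n), key=lambda j: c[j])                    # step 1
        cand = [i for i in range(m) if A[i][jp] > 1e-12]
        if not cand:
            raise ValueError("no solution (unbounded problem)")
        ip = min(cand, key=lambda i: b[i] / A[i][jp])             # step 2
        d = A[ip][jp]
        row[ip], col[jp] = col[jp], row[ip]                       # step 3
        newA = [[A[i][j] - A[ip][j] * A[i][jp] / d for j in range(n)]
                for i in range(m)]                                # step 5
        for j in range(n): newA[ip][j] = A[ip][j] / d
        for i in range(m): newA[i][jp] = -A[i][jp] / d
        newA[ip][jp] = 1 / d
        newc = [c[j] - c[jp] * A[ip][j] / d for j in range(n)]    # step 4
        newc[jp] = -c[jp] / d
        newb = [b[i] - A[i][jp] * b[ip] / d for i in range(m)]    # step 6
        newb[ip] = b[ip] / d
        A, b, c = newA, newb, newc
    sol = {v: 0.0 for v in col}       # non-basic variables vanish,
    sol.update(zip(row, b))           # basic variables take the b-values
    return sol

s = simplex_max([[1, 2], [2, 1], [16, 15], [20, 16]],
                [100, 80, 960, 1200], [2, 2.2])
print(s["x1"], s["x2"])   # 20.0 40.0
```

The algorithm terminates with the production schedule x1 = 20, x2 = 40, i.e. a maximum weekly profit of z = 2·20 + 2.2·40 = 128 ke.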
x2 ≤ 64 − (16/15) x1 ,   esp.: x1 = 0 ⇒ x2 ≤ 64,   x1 = 60 ⇒ x2 ≤ 0,
x2 ≤ 75 − (5/4) x1 ,     esp.: x1 = 0 ⇒ x2 ≤ 75,   x1 = 60 ⇒ x2 ≤ 0.
On the right-hand side the intersections of the straight lines with the x1 - and x2 -axes are
given. Graphically, the situation is shown in figure 9.1. Each constraint is represented by its
Figure 9.1: Left figure: the constraint lines of example 9.1; the respective inequality “≤” is geometrically
represented by the region below the line, the inequalities x1 , x2 ≥ 0 by the positive quadrant above the x1 -axis and
right of the x2 -axis. The shaded region therefore denotes all possible solutions (x1 , x2 ). Right figure: the same
sketch with some possible z-lines added (dashed); the maximum line meeting the shaded region (dashed-dotted
line) yields the solution.
straight line. Altogether, they form the shaded region of possible solutions (x1 , x2 ). (Note that
a solution is a point in the diagram!) You also find various parallel lines for the profit value z
(“contour lines”). On each line the profit is equal. The highest line meeting the shaded region
yields the maximum profit.
...
9.4 Duality
What do we have to do if we want to solve a linear minimum problem? The simplex algorithm
can only be applied to linear maximum problems, z → max. A first idea is to consider the
modified objective function z′ = −z, but this leads to a dead end: usually the relevant
coefficients then are negative and we have no chance to choose a pivot column.
The solution is duality. In general, duality is a fascinating and powerful relation between
two objects being in different classes (or contexts) but having the same or equivalent proper-
ties. For example, in everyday life the mirror reflection is a duality, since the mirror image
of a geometrical object can be uniquely mapped to its original . . . and vice versa, by the same
operation. Another, more subtle example is the geometric duality of points and straight lines in a
plane. The statement “Two points determine a straight line” has as dual statement “Two straight
lines determine a point.”
Once a duality is known, a given problem can be transformed by it to another problem,
the “dual problem,” which sometimes is much easier to solve. It is a remarkable property that
minimum problems are dual to maximum problems. That means that we can
1. transform a given linear minimum problem into its dual maximum problem,
2. solve the dual problem by the simplex algorithm, and
3. transform the solution back by reading the non-vanishing y-values from the corresponding
   z-row entries.
However, what is the duality operation? The transformation rule of forming the dual problem
(the “mirror image”) of a minimum problem is given by the following table.
minimum problem                        dual problem
z∗ (~y) = ~b∗ ·~y → min                z(~x) = ~c∗ ·~x → max
constraints:  A∗ ·~y ≥ ~c              constraints:  A ·~x ≤ ~b        (9.19)
variables:    ~y ≥ 0                   variables:    ~x ≥ 0
Here A∗ denotes the conjugate4 (that is, since we only deal with real matrices, simply the
transpose5 ) of A, i.e. the matrix resulting from interchanging its rows and columns. In particular,
the following correspondences between the variables and the dual slack variables can be shown
[10].
minimum problem                               dual problem
variable y_i                                  slack variable y_i of the i-th constraint    (9.20)
slack variable x_j of the j-th constraint     variable x_j
Example 9.2. We first consider a trivial minimum problem to clarify the principles. Assume
a company has to buy 5 qu of a product whose price is subject to the condition that 3 qu of it
cannot be got for less than 12 e. What is the price y per qu of the product that minimizes the
total costs? (It is immediately clear that the solution is y = 4, but let us see how the solution
strategy works!)
Mathematical formulation: The objective function expressing the total costs is z∗ (y) = 5y, and
the constraint is given by 3y ≥ 12. In matrix notation we thus obtain

   z∗ (y) = by → min,   A∗ y ≥ c,   (9.21)

with the one-dimensional vectors y (“= (y)”), b = 5, c = 12, and the (1 × 1)-matrix A∗ = 3.
Since A = A∗ = 3, we obtain the dual problem

   z(x) = 12x → max,   3x ≤ 5.   (9.22)
It can be solved by the simplex algorithm:

        x                          y
  z    12     0     =⇒     z     −4    −20
  y     3     5            x    1/3    5/3
The maximum problem thus is solved for x = 5/3, and the original minimum problem for y = 4,
yielding total costs of z∗ (4) = 5 · 4 = 20.
is optimized with respect to two different points of view: the minimum problem (9.21) seeks
to minimize the total costs for a given quantity (5 qu) and a lower price limit (3y ≥
12 e); the dual maximum problem (9.22) aims to maximize the quantity x for a given price
and an upper quantity limit (3x ≤ 5 qu).
Therefore, the minimum problem (9.21) is related to the demand-sided viewpoint (the pro-
ducer pays for resources, i.e. has costs), whereas the dual problem (9.22) is viewed by the
supplier (who earns the sales prices). Duality in economic optimization problems often
reflects the different points of view of a demander and a supplier.
Example 9.3. Let the minimum problem be given by z∗ (~y) = ~b∗ ·~y under the constraints
A∗~y ≥ ~c, where

   ~b∗ = (100, 80, 960, 1200),   A∗ = ( 1  2  16  20 ),   ~c = (  2  ).
                                      ( 2  1  15  16 )         ( 2.2 )
Thus we see immediately that the dual problem is exactly example 9.1 (p. 87). Solving it by the
simplex algorithm yields the final tableau (9.18), from which we can read the solution for ~y in
the z-row:
Example 9.4. A firm produces a product from two raw materials, where for each 2 kg of the
first material there is always at least 1 kg of the second material. At least 50 kg of the first
material should be bought, and at least 100 kg of both materials in total. The price of the first
material is 6 e/kg, whereas the price of the second material is 9 e/kg. What quantities of both
materials must be bought such that the costs are minimal?
Solution. First we formulate the problem mathematically. Denote the quantity in kg of the
first material by y1 , and the quantity of material 2 by y2 . The first condition mentioned means
that we always have at most twice as much of material 1 as of material 2, i.e., y1 ≤ 2y2 . This
can be rewritten as a restriction of the form 2y2 − y1 ≥ 0. Together with the two other
restrictions mentioned we thus have the system of inequalities

   −y1 + 2y2 ≥ 0,
         y1 ≥ 50,
    y1 + y2 ≥ 100.
In matrix notation, this is z∗ (~y) = ~b∗~y → min, under the constraints A∗~y ≥ ~c, where

        ( −1  2 )
   A∗ = (  1  0 ),   ~b∗ = (6, 9),   ~c = (0, 50, 100)∗ .
        (  1  1 )
This is a minimum problem. Its dual is z(~x) = ~c∗~x → max, under the constraints A ·~x ≤ ~b. This
gives the following simplex tableaux:

         x1    x2    x3                           x1     y1    x3
  z      50   100     0      0     =⇒     z      −50   −100   100   −600
  y1      1     1    −1      6            x2       1      1    −1      6
  y2      0     1     2      9            y2      −1     −1     3      3

                  x1       y1       y2
  =⇒     z     −50/3   −200/3   −100/3   −700
         x2      2/3      2/3      1/3      7
         x3     −1/3     −1/3      1/3      1
Therefore, 200/3 ≈ 66.7 kg of the first raw material and 100/3 ≈ 33.3 kg of the second raw
material should be bought. This yields total buying costs of 700 e.
Chapter 10
Genetic algorithms
In evolutionary programming the parameters of the program executing the optimization are var-
ied; evolution strategies and genetic algorithms differ only in that the search space of evolution
strategies is a real hyperspace and in that the mutation is adapted during the evolution process.
For details and further historical remarks see [15, 43].
are searching for “a best” solution, we can view this task as an optimization process. For small
search spaces, classical exhaustive (brute-force) methods usually suffice; for larger spaces special
techniques from the area of artificial intelligence must be employed.
Genetic algorithms are among such techniques. They are stochastic algorithms whose
search methods model the natural phenomenon of evolution: genetic inheritance, mutation,
and selection. In evolution, the problem each species faces is one of searching for beneficial
adaptions to a complicated and changing environment. The “knowledge” that each species has
gained is embodied in the makeup of the chromosomes of its members.
Example 10.1. (The rabbit example) [31]. At any given time there is a population of rabbits,
some of which are smarter and faster than the others. The faster and smarter rabbits are less
likely to be eaten by foxes, and therefore more of them survive to do what rabbits do best: make
more rabbits. Of course, some of the slower and dumber rabbits will survive just because they
are lucky.
This surviving population of rabbits starts breeding. The breeding results in a good mixture
of rabbit genetic material: some slow rabbits breed with fast rabbits, some fast with fast, some
smart rabbits with dumb rabbits, and so on. And on the top of that, Nature throws in a “wild
hare” every now and then by mutating some of the rabbit genetic material.
What is the ultimate effect? The resulting baby rabbits will, on average, be faster and
smarter than those in the original population because more faster and smarter parents survived
the foxes.2
A genetic algorithm follows a step-by-step procedure that closely matches the story of the rab-
bits. Genetic algorithms use a vocabulary borrowed from natural genetics:
• Chromosomes are made of units, the genes. They are arranged in a linear succession.
Genes are located at certain places of the chromosome, called loci.
• Any feature of individuals (such as hair color) can manifest itself differently; the gene is
said to be in several states called alleles, i.e., values.
Table 10.1 shortly compares the original biological meaning of different notions and their mean-
ing with respect to genetic algorithms.
An evolution process running on a population of chromosomes corresponds to a search
through a search space S of feasible solutions. Such a search requires balancing two apparently
conflicting objectives: exploiting the best solutions and exploring the solution space sufficiently.
Random search is a typical example of a strategy which explores the solution space ignoring
the exploitations of promising regions of the space. Genetic algorithms are a class of general
purpose (“domain independent”) search methods which strike a remarkable balance between
the conflicting strategies of exploring and exploiting.
2 Of course the foxes undergo a similar process, otherwise the rabbits might become too fast and smart for them.
3 Strictly speaking, in biology there is the hierarchy chromosome → genotype → phenotype = individual. Es-
pecially, there may be several genotypes resulting in the same phenotype, mainly because a phenotype is influenced
by the environment. Usually, these distinctions are not adhered to in evolutionary algorithms.
98 Andreas de Vries
• The genetic representation S = {x}, for a feasible solution x to the problem: x are the
individuals or chromosomes, and the set S is their data structure. Usually, in a genetic
algorithm the chromosome is a binary string of length n,
Thus usually, S ⊆ {0, 1}n ⊂ Zn , i.e., genetic algorithms apply to combinatorial optimiza-
tion problems. Usually, each single bit is a gene.
• Genetic operators which generate and alter the chromosomes of the children (“alter-
ation”). There are two types of genetic operators:
– A crossover operator CX : S × S → S which combines the genes of two parent
  chromosomes to create a new chromosome.
Often, a crossover operator simultaneously creates two children from two parents,
with the roles of the two parents exchanged. There may be several crossover opera-
tors acting in a genetic algorithm.
– A mutation operator M : S → S which changes one or several genes randomly,
• Various parameter values used by the genetic algorithm, such as population size, proba-
bilities of applying the genetic operators, etc.
(Note that an individual here is identified with its chromosome.) Usually, the initial pop-
ulation is given by creating p solutions by random.
• Selection. (“Survival of the fittest”) This subroutine selects the parents from the current
population to reproduce the next generation. There are three popular selection princi-
ples, truncation selection where a fixed percentage of the best individuals are chosen,
roulette-wheel selection where individuals are selected with a probability proportional to
their fitness, and tournament selection where a pool of individuals is chosen at random
and then the better individuals are selected with predefined probabilities. Most selection
schemes are stochastic, such as the latter two, so that a small proportion of less fit solutions
is selected. This helps keep the diversity of the population large, preventing premature
convergence to local but non-global optima.
• Reproduction. This subroutine generates the next generation from the selected parents
of the current population. Besides the application of the genetic operators to the parents,
the routine must define to what extent the parent generation and the children survive. A
genetic algorithm with a reproduction scheme in which the best individual of a generation
is guaranteed to survive is said to obey the elite principle [15, §5.2].
in each iteration step t, called the t-th generation. For each individual x_j (t) of this generation,
its fitness f (x_j (t)) is evaluated. Then a new population, the next generation t + 1, is formed by
selecting the fitter individuals (“selection step”). Some members of the new population undergo
transformations by means of the genetic operators to form new solutions (“reproduction step”).
After some number of generations the program produces better and better individuals; hopefully
the best individual then represents a near-optimum solution. To summarize, the “canonical”
genetic algorithm proceeds generation by generation through selection and reproduction.
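The generation loop just described can be sketched as follows; the 3-bit chromosomes with fitness f(x) = 8x − x² are merely an illustration (they reappear in the coding discussion below), and the population size, mutation probability, and binary tournament selection are arbitrary choices.

```python
import random

def fitness(bits):
    # illustrative fitness f(x) = 8x - x^2 on 3-bit chromosomes
    x = int("".join(map(str, bits)), 2)
    return 8 * x - x * x

def canonical_ga(pop_size=8, n_bits=3, generations=100, p_mut=0.1):
    random.seed(1)                      # seeded only for reproducibility
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def select():                   # selection: binary tournament
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = random.randrange(1, n_bits)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [g ^ 1 if random.random() < p_mut else g for g in child]
            children.append(child)
        elite = max(pop, key=fitness)   # elite principle: best one survives
        pop = children
        pop[0] = elite
    return max(pop, key=fitness)

best = canonical_ga()
x = int("".join(map(str, best)), 2)
print(x, fitness(best))
```

Thanks to the elite principle, the best individual found so far is never lost between generations.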
• Given n objects, the search space S is most naturally given by a binary string x ∈ {0, 1}n of
length n, where the k-th bit indicates whether the k-th object is picked into the knapsack:
xk = 0 means that it is not, xk = 1 means that it is. For instance, Example 10.2 implies a
search space which consists of vectors (x1 , x2 , x3 ) with x1 , x2 , x3 ∈ {0, 1}, and the candi-
date solution x = (0, 1, 0) means that only object B is put into the knapsack. However, we
have the constraint that the maximum load of the knapsack must not be exceeded: this is
most easily expressed by a weight vector w = (w1 , . . . , wn ) where wk denotes the weight
of object k; the weight constraint then reads ∑_{k=1}^{n} wk xk ≤ wmax . Therefore, the search
space is determined by
n n o
S = x ∈ {0, 1}n : ∑ wk xk 5 wmax (10.2)
k=1
• The fitness function is naturally given by the total value of the packed objects, f(x) = ∑_{k=1}^n v_k x_k, where v = (v_1, . . . , v_n) is the “value vector” of the n objects, v_k denoting the value of the k-th object. In Example 10.2 we have v = (8, 10, 3), and the candidate solution x = (0, 1, 0) has a fitness of f(x) = 10. With it, each generation can be evaluated.
• Creating the initial population at random, we have to tackle the problem that a random binary string x ∈ {0, 1}^n is not necessarily a feasible solution, i.e., it may be that x ∉ S because it exceeds the maximum load. There are two strategies for this case: first, to repair the chromosome such that the corresponding solution obeys the constraint, or second, to give it a “penalty fitness,” say a vanishing value.
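The “penalty fitness” strategy can be sketched as follows. The value vector v = (8, 10, 3) is taken from Example 10.2, but the class name and the concrete weights and maximum load used in the test are hypothetical illustrations:

```java
public class Knapsack {
    /** Fitness with the "penalty" strategy: infeasible chromosomes get the
     *  vanishing fitness 0 instead of being repaired.
     *  x[k] is the k-th bit of the chromosome, v the value vector,
     *  w the weight vector, wmax the maximum load of the knapsack. */
    public static int fitness(int[] x, int[] v, int[] w, int wmax) {
        int value = 0, weight = 0;
        for (int k = 0; k < x.length; k++) {
            value  += v[k] * x[k];
            weight += w[k] * x[k];
        }
        return (weight <= wmax) ? value : 0;   // penalty for exceeding the load
    }
}
```

With v = (8, 10, 3), the chromosome x = (0, 1, 0) evaluates to 10, as in the text; an overweight chromosome evaluates to 0.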
10.5.2 Coding
Most genetic algorithms act on search spaces containing binary strings as chromosomes, i.e., S ⊂ {0, 1}^n. Usually, such a string is the binary representation of an integer which expresses parameter values of the optimization. However, the binary code has the property that successive numbers may have a large Hamming distance. The Hamming distance d_H between two binary strings is defined as the number of positions in which they differ. For example, the successive numbers 7 = 0111₂ and 8 = 1000₂ differ in all four positions, so d_H(0111, 1000) = 4.
Since the fitness often depends on the numerical value of a chromosome, the mutation of single bits in a chromosome whose fitness is close to the optimum may have negative effects, cf. Table 10.2. For instance, consider a population of 3-bit strings x and a fitness function f(x) = 8x − x². Then the maximum is achieved for x = 100₂. If we had a population P = {0, 3, 5} (in decimal notation), then by f(3) = f(5) = 15 and f(0) = 0 we would have the curious situation that although the chromosomes x = 011 and x = 101 have the same good fitness, a mutation of a single bit could possibly change 101 to the optimum 100, whereas 011 has to change all three bits. Even the much worse solution x = 000 only needs to change a single bit to become the optimum.
A commonplace solution to this problem is the use of a Gray code. It is a one-to-one
mapping γ from the natural binary code and is defined as g = γ(b), where g = gn−1 . . . g1 g0 is
the Gray code string, b = bn−1 . . . b1 b0 is the binary code string, and
    g_i = b_{i+1} ⊕ b_i    (i = 0, . . . , n − 1)    (10.5)

with b_n = 0. Here ⊕ denotes the bitwise XOR operation (in Java: ^). Thus the Gray code is calculated from the binary code by the following scheme:

        b_{n−1}  b_{n−2}  · · ·  b_1  b_0
    ⊕   0        b_{n−1}  · · ·  b_2  b_1    (10.6)
    =   g_{n−1}  g_{n−2}  · · ·  g_1  g_0
In Java, a method converting a long integer into a Gray code string may therefore be implemented
as follows:
public static String grayCode(long b) {
return Long.toBinaryString( b^(b >> 1) );
}
Accordingly, the inverse method converting a Gray code into a long integer could read as follows.
public static long toLong(String grayCode) {
   long g = Long.parseLong(grayCode, 2), b = 0;
   for (int i = grayCode.length() - 1; i >= 0; i--) {
      b += ((b & (1L << (i + 1))) >> 1) ^ (g & (1L << i));
   }
   return b;
}
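The two conversion methods, repeated here in one self-contained class (the class name is a hypothetical choice), can be checked against each other: a round trip must reproduce the original integer, and by construction successive Gray codes differ in exactly one bit, which was the point of introducing the code:

```java
public class Gray {
    /** Binary-to-Gray conversion, g = b XOR (b >> 1), as in scheme (10.6). */
    public static String grayCode(long b) {
        return Long.toBinaryString(b ^ (b >> 1));
    }

    /** Gray-to-binary conversion: bit i of the result is the XOR of the
     *  already computed bit i+1 with Gray bit i (b_n = 0 initially). */
    public static long toLong(String grayCode) {
        long g = Long.parseLong(grayCode, 2), b = 0;
        for (int i = grayCode.length() - 1; i >= 0; i--) {
            b += ((b & (1L << (i + 1))) >> 1) ^ (g & (1L << i));
        }
        return b;
    }
}
```

For example, `Gray.grayCode(5)` yields `"111"`, and `Gray.toLong("111")` returns 5.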
represent the tour from i_1 to i_2, from i_2 to i_3, . . . , from i_{n−1} to i_n, and from i_n back to i_1,
    i_1 → i_2 → · · · → i_{n−1} → i_n → i_1.
(~v is a so-called “permutation”). We can initialize the population by a random sample of ~v. (We
can alternatively use a heuristic algorithm for the initialization process to get “preprocessed”
outputs.)
The evaluation of a chromosome is straightforward. Given the cost of travel c_{ij} between cities i and j, we can easily calculate the total cost of the entire tour with the fitness function f : Z_n^n → R,

    f(~v) = ∑_{j=1}^{n−1} c_{i_j, i_{j+1}} + c_{i_n, i_1}.    (10.9)
(cf. equation (8.11), p. 85). In the TSP we thus search for the best ordering of cities in a tour. It is relatively easy to come up with (unary) mutation operators. However, there is little hope of finding good orderings this way (not to mention the best ones), because a good ordering need not be located “near” another good one. The strength of genetic algorithms especially arises from the structured information exchange of crossover combinations of highly fit individuals. So, what we need is a crossover operator that exploits important similarities between chromosomes. For that purpose an order crossover (OX) operator is used. Given two parents ~v and ~w, OX builds an offspring ~u by choosing a subsequence of a tour from one parent and preserving the relative order of cities (without those of the subsequence) from the other parent. For example,
~v = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), ~w = (7, 3, 1, 11, 4, 12, 5, 2, 10, 9, 6, 8). (10.10)
If the chosen part from parent ~v is (4, 5, 6, 7), we have to cross out these cities in ~w, i.e., ~w′ = (3, 1, 11, 12, 2, 10, 9, 8), start at position 1, and insert the chosen subsequence at the same positions as in parent ~v. This gives the child
    ~u = (3, 1, 11, 4, 5, 6, 7, 12, 2, 10, 9, 8).
As required from a genetic algorithm, the child bears a structural relationship to both parents.
The roles of the parents ~v and ~w can then be reversed to construct a second child.
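The OX construction just described can be sketched in Java; this is an illustrative implementation of the variant described in the text (hypothetical class name, 0-based positions, half-open segment [lo, hi)), not the only possible OX formulation:

```java
public class OrderCrossover {
    /** OX: copy v[lo..hi) into the child at the same positions, then fill the
     *  remaining positions from left to right with w's cities in their original
     *  order, skipping cities already taken from v. Cities are numbered 1..n. */
    public static int[] ox(int[] v, int[] w, int lo, int hi) {
        int n = v.length;
        int[] child = new int[n];
        boolean[] used = new boolean[n + 1];
        for (int i = lo; i < hi; i++) { child[i] = v[i]; used[v[i]] = true; }
        int j = 0;                         // next unread position in w
        for (int i = 0; i < n; i++) {
            if (i >= lo && i < hi) continue;   // segment already filled from v
            while (used[w[j]]) j++;            // skip cities taken from v
            child[i] = w[j++];
        }
        return child;
    }
}
```

Applied to the parents of (10.10) with segment (4, 5, 6, 7), this reproduces the child ~u above; swapping the roles of ~v and ~w yields the second child.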
A genetic algorithm based on the above operator outperforms random search, but leaves much room for improvement. Typical results of the algorithm (averaged over 20 random runs) applied to 100 randomly generated cities gave, after 20 000 generations, a tour length 9.4 % above the optimum.
table, where the left head column contains player A’s strategies, and the head row contains player B’s strategies. In each table cell, the first entry is player A’s payoff for the corresponding strategy profile, the second entry is player B’s, see Table 10.4. (C means cooperate, D means defect.) The “C–C” strategy in a multi-move prisoner’s dilemma game is a so-called “Nash equilibrium.” That means that each player’s strategy is an optimal response to the other players’ strategies.
Example 10.3. (Multi-move prisoner’s dilemma in economics) Consider two oligopolists A
and B competing on a single market. In each seasonal period, each of them has the choice to
         C              D
    C    (−½; −½)       (−10; 0)
    D    (0; −10)       (−2; −2)

Table 10.4: Strategy table for the prisoner’s dilemma game. The first entry a in each box (a; b) is player A’s payoff for the corresponding strategy profile; the second one b is player B’s payoff.
         C           D
    C    (3; 3)      (−2; 5)
    D    (5; −2)     (−1; −1)

Table 10.5: Strategy table for a single move in the economic prisoner’s dilemma game, where C implies the decision to make a fair price for a product, and D refers to dumping the product.
“cooperate” (C) and make a fair price for a given product, or to “dump” (D) the product and make a dirt-cheap price for it. The expected payoff (in millions of €) for one firm depends on the simultaneous decision of the competitor according to Table 10.5. How should each firm decide to make the price?
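One way to inspect Table 10.5 mechanically is to test a single move for strict dominance; the following sketch (hypothetical class name) encodes player A’s payoffs from that table:

```java
public class Dilemma {
    // Player A's payoffs of Table 10.5; rows = A's move, cols = B's move (0 = C, 1 = D).
    static final int[][] PAYOFF_A = { {3, -2}, {5, -1} };

    /** Returns true iff move d yields player A a strictly higher payoff
     *  than move c against every possible move of player B. */
    public static boolean dominates(int d, int c) {
        for (int b = 0; b < 2; b++)
            if (PAYOFF_A[d][b] <= PAYOFF_A[c][b]) return false;
        return true;
    }
}
```

For the single move, D strictly dominates C (5 > 3 and −1 > −2), which is exactly the dilemma.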
A strategy in game theory is a plan of unique moves to be made after each possible past constellation of moves; such a constellation may depend only on the last move of the rival, but also on a series of past moves. In other words, a strategy is a collection of precise answers to all possible questions.
We will now consider how a genetic algorithm might be used to learn a strategy for the
prisoner’s dilemma. We have to maintain a population of “players”, each of whom has a par-
ticular strategy. Initially, each player’s strategy is chosen at random. Thereafter, at each step,
players play games and their scores are noted. Some of the players are then selected for the
next generation, and some of those are chosen to mate. When two players mate, the new player
created has a strategy constructed from the strategies of its parents (crossover). A mutation, as
usual, introduces some variability into players’ strategies by random changes on representations
of these strategies.
with a_j, b_j ∈ {C, D} for j = −3, . . . , 0; the a’s are this player’s moves (C for cooperate, D for defect) and the b’s are the other player’s moves.
1. Choose an initial population. Each player is assigned a random string of 71 bits, repre-
senting a strategy.
2. Test each player to determine its fitness. Each player uses the strategy defined by its
chromosome. The player’s score is its average over all games it plays.
3. Select players to breed. A player with an average score is given one mating; a player scoring one standard deviation above the average is given two matings; a player scoring one standard deviation below the average is given no matings.
4. Pair the fittest. The successful players are randomly paired off to produce two offspring per mating. Each offspring’s strategy is determined from the strategies of its parents, using two genetic operations: crossover and mutation.
After these four stages, we get a new population. It will show patterns of behavior that are more
like those of the successful individuals of the previous generation.
Experimental results
Running this program, Axelrod obtained quite remarkable results. From a random start, the genetic algorithm evolved populations whose median member was as successful as the best known heuristic algorithm. Some behavioral patterns evolved in the vast majority of the individuals:
10.8 Conclusions
The examples of genetic algorithms in this chapter show their wide applicability. At the same
time we observed first signs of difficulties. The representation issues of the traveling salesman
problem were not obvious, and the new operator (OX crossover) was far from trivial. What kind
of representation difficulties may exist for other problems? On the other hand, how should we
proceed in a case where the fitness function is not well defined?4
4 For example, the famous Boolean Satisfiability Problem (SAT) seems to have a natural string representation (the i-th bit represents the truth value of the i-th Boolean variable); however, the fitness function is far from obvious.
Appendix
Appendix A
Mathematics
    log_a x = log_b x / log_b a.    (a, b ∈ N, x > 0)    (A.4)

In particular, log_a x = (1/ln a) · ln x.
is the set of integers. The rational numbers q = m/n for m, n ∈ Z are denoted by Q. The “holes” that are still left in Q (note that prominent numbers like √2 or π are not in Q!) are only filled by the real numbers, denoted by R. Thus we have the proper inclusion chain N ⊂ Z ⊂ Q ⊂ R. (“Proper inclusion” means that there are always numbers that are in a set but not in its subset. Can you find an example for each subset–set pair?)
A very important topic in mathematics, especially for the growing area of cryptology, is
number theory. We will list here some fundamentals. In this chapter lower case italic letters
(such as m, n, p, q, ...) denote integers.
Definition A.1. We say that m divides n, in symbols m | n, if there is an integer k such that
n = km. We then call m a divisor of n, and n a multiple of m. We also say that n is divisible by
m. If m does not divide n, we write m - n.
The theorem in fact consists of two assertions: (i) “There exist integers q and r such that
. . . ”, and (ii) “q and r are unique.” So we will divide its proof into two parts, proof of existence
and proof of uniqueness.
Proof of Theorem A.4. (i) Existence. The numbers m and n are given. Hence we can construct q = ⌊m/n⌋, and thus also r = m − qn. These are the two equations of (A.6). The last equation is equivalent to m = qn + r, which is the first equation of (A.5). The property of the floor bracket implies

    m/n − 1 < q ≤ m/n       | · (−n), since n > 0
    ⟹  n − m > −qn ≥ −m    | + m
    ⟺  n > m − qn ≥ 0.

This implies that r = m − qn satisfies the inequalities 0 ≤ r < n.
(ii) Uniqueness. We now show that if two integers q and r obey (A.5), they also obey (A.6). Let m = qn + r and 0 ≤ r < n. Then 0 ≤ r/n = m/n − q < 1. This implies

    0 ≤ m/n − q < 1             | − m/n
    ⟹  −m/n ≤ −q < −m/n + 1    | · (−1)
    ⟹  m/n ≥ q > m/n − 1
    ⟹  q = ⌊m/n⌋.
To summarize, the proof is divided into two parts: The first one proves that there exist two numbers q and r satisfying (A.5). The second one shows that any two numbers q and r satisfying (A.5) also satisfy (A.6). Thus q and r as found in the first part of the proof are unique.
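The constructive equations (A.6) translate directly into code; note that Java’s integer division truncates toward zero, so Math.floorDiv is needed to obtain q = ⌊m/n⌋ also for negative m. A minimal sketch (hypothetical class name, n > 0 assumed):

```java
public class Division {
    /** Returns {q, r} with m = q*n + r and 0 <= r < n, following (A.6).
     *  Requires n > 0. */
    public static long[] divMod(long m, long n) {
        long q = Math.floorDiv(m, n);   // q = floor(m/n), also for negative m
        long r = m - q * n;             // r = m - q*n, always in [0, n)
        return new long[]{q, r};
    }
}
```

For example, divMod(−17, 5) yields q = −4 and r = 3, since −17 = (−4) · 5 + 3 and 0 ≤ 3 < 5.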
Figure A.1: Probability tree diagrams for each search strategy on an unsorted database with N entries, m ≥ 0 of which are marked. Each left branch represents the event of finding a marked item; the last right branch leads to the sure finding in the next step if m > 0.
that the first query yields a positive answer with probability m/N, and with probability (N − m)/N
1 It is common in the quantum algorithm literature to implicitly assume the oracle to work efficiently [32]; in
complexity theory, however, an oracle (or more precisely, an “oracle Turing machine” M A for the oracle A) may
be a much more general algorithm “transcending worlds” [35, §14.3]. In this sense, an oracle rather plays the role
of a “proof checker” or a “verification algorithm” [5, §34.2] in the terminology of complexity theory.
there remain N − 1 items to be checked. But with formula (A.8) for N − 1, we then obtain

    E[Q_{N,m}] = m/N + ((N − m)/N) · E[1 + Q_{N−1,m}] = 1 + (N − m)/(m + 1) = (N + 1)/(m + 1)

for 0 < m ≤ N − 1. Thus the case m = N remains to be determined: but it is simply given by E[Q_{N,N}] = 1 = (N + 1)/(N + 1), i.e., (A.8) holds for all 0 ≤ m ≤ N. Q.E.D.
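The expectation (A.8) can also be checked numerically by walking down the probability tree of Figure A.1 and summing k · P(first marked item found at query k); the class name below is a hypothetical choice:

```java
public class UnsortedSearch {
    /** Exact expected number of queries to find a marked item among N items,
     *  m > 0 of which are marked, querying unchecked items uniformly at random.
     *  By (A.8) this equals (N + 1) / (m + 1). */
    public static double expectedQueries(int N, int m) {
        double e = 0, pAllMiss = 1;   // probability that the first k-1 queries all miss
        for (int k = 1; k <= N - m + 1; k++) {
            double pHit = (double) m / (N - k + 1);   // hit at query k, given misses so far
            e += k * pAllMiss * pHit;
            pAllMiss *= 1 - pHit;
        }
        return e;
    }
}
```

For instance, expectedQueries(10, 1) evaluates to 5.5 = 11/2, in agreement with (A.8).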
Remark A.6. Varying the above problem, we want to search for the position of one of m marked items in an unsorted database with N = |X| ≥ 2 entries, where we know the number m satisfying 0 < m < N. In other words, we are guaranteed a previously known number of marked items. Then we find the position of one of the searched items in

    E[Q^pos_{N,m}] = (N − m)(N − m)!/(N)_{N−m} + m ∑_{k=1}^{N−m} k (N − m)_{k−1}/(N)_k
                   = (N − m) m!/(N)_m + (m/(N)_m) ∑_{k=1}^{N−m} k (N − k)_{m−1}    (A.9)

queries on average, where (n)_k := n!/(n − k)! for n, k ∈ N, especially (n)_0 = 1, (n)_n = (n)_{n−1} = n!.
Eq. (A.9) follows directly from Figure A.1. Thus we have E[Q^pos_{N,m}] = Θ(N). Some special cases are the following: For m = 1, we obtain

    E[Q^pos_{N,1}] = (N − 1)/N + (1/N) ∑_{k=1}^{N−1} k = (N² + N − 2)/(2N).    (A.10)
For m = 2, we have

    E[Q^pos_{N,2}] = 2(N − 2)/(N(N − 1)) + (2/(N(N − 1))) ∑_{k=1}^{N−2} k (N − k)
                   = (N − 2)(N² + 2N + 3)/(3N(N − 1)).    (A.11)
For m = 3, remembering ∑_{k=1}^{N−3} k³ = (C(N−2, 2))², we have

    E[Q^pos_{N,3}] = 6(N − 3)/(N)_3 + (3/(N)_3) ∑_{k=1}^{N−3} k (N − k)_2
                   = (N − 3)[(N − 2)(N² + 3N + 8) + 24]/(4 (N)_3)
                   = (N + 2)(N − 3)(N² − N + 4)/(4 (N)_3).    (A.12)
The direct evaluation for higher m is not obvious. Especially, m = N − 1 yields E[Q^pos_{N,N−1}] = 1/N + (N − 1)/N = 1.
You may ask whether Theorem A.5 is important in computer science. After all, all important databases are sorted, so the result seems irrelevant for usual data applications. But far from it! In fact, any database containing datasets with more than one data field is unsorted with respect to at least one field. Take a phone book, containing mainly the name and the corresponding phone number as data fields: any phone book is unsorted with respect to the phone numbers.
Example A.7. (Searching number in a phone book) Let U = {0, 1, . . . , 1010 − 1} be the set
of all 10-digit decimal numbers. Consider a phone book X ⊂ U containing N numbers, and let
S = {1234567890}. Then SEARCH is the decision problem to determine whether the phone
number x₀ = 123–456–7890 is contained in the phone book. The oracle then is given as

    f(x) = 1 if x = x₀,
           0 otherwise.

Since a 10-digit number needs n = ⌈log₂ 10¹⁰⌉ = 34 bits, any number x in X or S, as subsets of Σ* with Σ = {0, 1}, has a length satisfying |x| ≤ n. Thus the oracle can work efficiently with at most n = 34 steps, comparing successively each possible binary digit. Classically, one needs (N + 1)/2 queries on average by Eq. (A.8), whereas Grover’s quantum search algorithm requires only √N queries on average.
Example A.8. (Known-plaintext attack on a cryptosystem by brute force) Assume that you have received a plaintext/ciphertext pair of a given cryptosystem and you want to find the secret key. The cryptosystem might be a symmetric cipher, like AES, or a public key cipher, like RSA [6]. They all have in common that their strength relies on the difficulty of finding the key. A brute force attack tries to break the cryptosystem by searching for the secret key, querying the encryption function successively with all possible keys K until

    E_K(M) = C,

where M is the plaintext and C is the ciphertext. To formulate a brute force attack as a search problem, let X = U = {0, 1}^n denote the set of all keys of length n bits, and S = {K ∈ U : E_K(M) = C}. Then for each K ∈ X, the oracle f : X → {0, 1} is given by

    f(K) = 1 if E_K(M) = C,
           0 otherwise.
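The search loop of such an attack can be sketched as follows. Note that this is only an illustration of the loop, not of a real cipher: the trivial XOR “cipher” below is a hypothetical stand-in for E_K, and the class name is likewise invented:

```java
public class BruteForce {
    /** Toy stand-in for the encryption function E_K: XOR with the key.
     *  This is NOT a real cipher; it only serves to exercise the search loop. */
    static long encrypt(long key, long plaintext) { return plaintext ^ key; }

    /** Searches all n-bit keys K until E_K(M) = C, i.e., until the oracle
     *  answers f(K) = 1; returns -1 if no key matches. */
    public static long findKey(int n, long m, long c) {
        for (long k = 0; k < (1L << n); k++) {
            if (encrypt(k, m) == c) return k;
        }
        return -1;
    }
}
```

With a real cipher the loop is the same; only encrypt would be replaced by the actual E_K, and n would be far too large for this classical search to be feasible.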
Example A.9. (SAT) [35, §4.2] The “satisfiability problem for propositional logic”, denoted SAT, asks whether a given Boolean expression f : {0, 1}^n → {0, 1} in conjunctive normal form is satisfiable, i.e., whether there exists an assignment x = (x_1, . . . , x_n) such that f(x) = 1, where 0 denotes false and 1 denotes true. Here a Boolean expression is a combination of the “literals” x_j and the symbols ¬, ∧, ∨, (, and ). It is in conjunctive normal form (CNF) if f(x) = ⋀_{i=1}^m c_i, where each “clause” c_i is a disjunction of one or more literals x_j or ¬x_j [35, §4.1]. For instance, f(x_1, x_2) = (x_1 ∨ ¬x_2) ∧ ¬x_1 is satisfiable since f(0, 0) = 1, whereas
is not satisfiable because f(x) = 0 for all x ∈ {0, 1}³. Denote by X = U = {0, 1}^n the space of the 2^n possible assignments to the Boolean formula f, and let S ⊂ X be the set of all satisfying assignments of f. If f is not satisfiable, S is empty, i.e., m = |S| = 0. Then a simple algorithm to solve this problem is to perform a “brute force” search through the space X and to query f as the oracle function. Since there are N = 2^n possible assignments, SAT may be considered as a special N item search problem with the oracle f working efficiently with time complexity O(log^k N). Classically, it requires O(2^n) oracle queries on average, whereas Grover’s quantum algorithm needs O(2^{n/2}) oracle queries on average.
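The brute-force search through all 2^n assignments can be sketched as follows; the clause encoding (a signed integer +j for the literal x_j and −j for ¬x_j) and the class name are illustrative assumptions:

```java
public class SatBruteForce {
    /** cnf[i] lists the literals of clause c_i: +j stands for x_j, -j for ¬x_j
     *  (with j >= 1). Tries all 2^n assignments and queries f as the oracle. */
    public static boolean satisfiable(int[][] cnf, int n) {
        for (int x = 0; x < (1 << n); x++) {       // assignment encoded in the bits of x
            boolean all = true;
            for (int[] clause : cnf) {
                boolean sat = false;
                for (int lit : clause) {
                    int j = Math.abs(lit) - 1;
                    boolean val = ((x >> j) & 1) == 1;
                    if (lit > 0 ? val : !val) { sat = true; break; }
                }
                if (!sat) { all = false; break; }  // one unsatisfied clause kills x
            }
            if (all) return true;                  // oracle answers f(x) = 1
        }
        return false;                              // S is empty, m = |S| = 0
    }
}
```

Applied to the formula (x_1 ∨ ¬x_2) ∧ ¬x_1 from the text, encoded as {{1, −2}, {−1}}, the search succeeds at x = (0, 0).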
Example A.10. (Hamilton cycle problem) [32, §6.4] A Hamilton cycle is a cycle in which each vertex of an undirected graph is visited exactly once. The Hamilton cycle problem (HC) is to determine whether a given graph contains a Hamilton cycle or not. Let X = U be the set of all possible cycles beginning in vertex 1, i.e., x = (x_0, x_1, . . . , x_{n−1}, x_n) where x_0 = x_n = 1 and where (x_1, . . . , x_{n−1}) is a permutation of the (n − 1) vertices x_j ≠ 1. In other words, X contains all possible Hamilton cycles which could be formed with the n vertices of the graph. Then a simple algorithm to solve the problem is to perform a “brute force” search through the space X and to query the oracle function

    f(x) = 1 if x ∈ S,
           0 otherwise,    (A.13)

where S is the solution set of all cycles of the graph,

    S = {x ∈ U : (x_{j−1}, x_j) ∈ E ∀ j = 1, . . . , n}.    (A.14)
If the graph does not contain a Hamilton cycle, then S is empty and m = |S| = 0. The oracle only has to check whether each pair (x_{j−1}, x_j) of a specific possible Hamilton cycle is an edge of the graph, which requires time complexity O(n²) since |E| ≤ n²; because there are n pairs to be checked in this way, the oracle works with total time complexity O(n³) per query. (Its space complexity is O(log₂ n) bits, because it uses E and x as input and thus needs to store temporarily only the two vertices of the considered edge, requiring O(log₂ n).)

Since there are at most N = (n − 1)! = O(n^n) = O(2^{n log₂ n}) possible orderings, the Hamilton cycle problem is a special N item search problem. Classically, it requires O(2^{n log₂ n}) oracle queries on average, whereas Grover’s quantum algorithm needs O(2^{(n/2) log₂ n}) oracle queries on average [7, §6.2.1].
According to Dirac’s Theorem, any graph in which each vertex has at least n/2 incident edges has a Hamilton cycle. This and some more such sufficient criteria are listed in [9, §8.1].
A problem being apparently similar to the Hamilton cycle problem is the Euler cycle prob-
lem. Its historical origin is the problem of the “Seven Bridges of Königsberg”, solved by Leon-
hard Euler in 1736.
Example A.11. (Euler cycle problem) [32, §3.2.2] Let Γ = (V, E) be an undirected graph consisting of n numbered vertices V = {1, . . . , n} and edges E ⊆ V² such that (x, x) ∉ E and (x, y) ∈ E implies (y, x) ∈ E for all x, y ∈ V. An Euler cycle is a closed sequence of edges in which each edge of the graph is visited exactly once. If we shortly denote a cycle by (x_0, x_1, . . . , x_m) with x_0 = x_m = 1, then a necessary condition for it to be Eulerian is that m = |E|. The Euler cycle problem (EC) then is to determine whether a given graph contains an Euler cycle or not. By Euler’s theorem [9, §0.8], a connected graph contains an Euler cycle if and only if every vertex has an even number of edges incident upon it. Thus EC is decidable in O(n³) computational steps, counting for each of the n vertices x_j in how many of the at most n² edges (x_j, y) or (y, x_j) ∈ E it is contained. As a search problem, the search space X consists only of the n vertices of the considered graph, and the answer is known after at most n countings of the edges incident on each vertex.
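Euler’s criterion translates into a simple degree count; the following sketch (hypothetical class name) assumes the graph is connected, as the theorem requires:

```java
public class EulerCheck {
    /** Euler's criterion for a connected graph: an Euler cycle exists iff every
     *  vertex has an even number of incident edges. Vertices are 1..n, and
     *  edges[i] = {a, b} is an undirected edge (repeated pairs model multi-edges). */
    public static boolean hasEulerCycle(int n, int[][] edges) {
        int[] degree = new int[n + 1];
        for (int[] e : edges) { degree[e[0]]++; degree[e[1]]++; }
        for (int j = 1; j <= n; j++)
            if (degree[j] % 2 != 0) return false;  // odd vertex: no Euler cycle
        return true;                                // connectivity is assumed
    }
}
```

Applied to the Seven Bridges of Königsberg (four land masses, seven bridges), the check fails, reproducing Euler’s 1736 result; a simple 4-cycle passes.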
For each of these search problems, an oracle function is known which is polynomially computable with respect to n, i.e., which has time complexity O(log^k N) and is thus efficient.
Appendix B
L
lattice math Gitter
lcm kgV (kleinstes gemeinsames Vielfache)
least common multiple das kleinste gemeinsame Vielfache
let be . . . sei . . .
(straight) line Gerade
linked list verkettete Liste
load factor Füllfaktor, Füllgrad (e-r Hashtabelle)
lot size Losgröße
lower bound = lower limit untere Schranke, untere Grenze
lowercase letter Kleinbuchstabe

M
mapping Abbildung
merge verschmelzen, zusammenführen
motherboard Hauptplatine
multicriterion optimization Mehrkriterienoptimierung

N
neural network neuronales Netz
node Knoten
numerator Zähler

O
objective function Zielfunktion
obtain erhalten
obvious offensichtlich, klar
odd number ungerade Zahl

P
perpendicular senkrecht
plane math Ebene
pointer Pointer, Zeiger
polygon Polygon, Vieleck
polyhedron Polyeder, Vielflächner
polynomial Polynom; polynomial
potential set Potenzmenge
preimage Urbild (e-r Abbildung)
prime number Primzahl
proof Beweis
proposition log Aussage; math Satz, Lehrsatz
prove beweisen

Q
queue (Warte-)Schlange

R
reciprocal value Kehrwert
record Record, Datensatz
remainder Rest
rational number rationale Zahl
real number reelle Zahl
(n-th) root (n-te) Wurzel (to extract - ziehen)
row (Matrix-) Zeile

S
sales figures Verkaufszahlen
satisfy the equation die Gleichung erfüllen, der Gleichung genügen
scalene triangle ungleichseitiges Dreieck
scatterplot Punktwolke, Streudiagramm
self-loop Schlinge (math)
in the sequel im folgenden
sequence Folge (math)
series Reihe (math)
set Menge
slack variable Schlupfvariable (beim Simplexalgorithmus)
spot Ort; Fleck; (Spiel-, Würfel)Auge
stack Stack (wörtl. Stapel)
suffice genügen
sufficient condition hinreichende Bedingung
suppose annehmen
subscripted letters indizierte Buchstaben
subset Teilmenge
subtree Teilbaum

T
tetrahedron Tetraeder
therefore daher
thread einfädeln, aufreihen; Faden, comp Thread
thus so, also, deshalb
time series Zeitreihe
toss (hoch)werfen; Wurf
total of the digits of Quersumme von
triangle Dreieck

UVW
up to a constant bis auf eine Konstante
upper bound = upper limit obere Schranke, obere Grenze

XYZ
yield ergeben
Appendix C
Arithmetical operations
Bibliography
[3] C. H. Bennett. ‘Logical Depth and Physical Complexity’. In R. Herken, editor, The Uni-
versal Turing Machine. A Half-Century Survey, pages 207–235. Springer-Verlag, Wien,
1994.
[6] A. de Vries. ‘The ray attack on RSA cryptosystems’. In R. W. Muno, editor, Jahres-
schrift der Bochumer Interdisziplinären Gesellschaft eV 2002, pages 11–38, Stuttgart,
2003. ibidem-Verlag. http://arxiv.org/abs/cs/0307029.
[7] A. de Vries. Quantum Computation. An Introduction for Engineers and Computer Scien-
tists. Books On Demand, Norderstedt, 2012.
[12] M. Falk, R. Becker, and F. Marohn. Angewandte Statistik mit SAS. Eine Einführung.
Springer-Verlag, Berlin Heidelberg, 2nd edition, 1995.
[14] D. Fudenberg and J. Tirole. Game Theory. MIT Press, Cambridge, 1991.
[17] L. K. Grover. ‘Quantum mechanics helps in searching for a needle in a haystack’. Phys.
Rev. Lett., 79(2):325, 1997.
[18] H. P. Gumm and M. Sommer. Einführung in die Informatik. Oldenbourg Verlag, München,
2008.
[21] D. Harel and Y. Feldman. Algorithmik. Die Kunst des Rechnens. Springer-Verlag, Berlin
Heidelberg, 2006.
[25] M. J. Holler and G. Illing. Einführung in die Spieltheorie. Springer-Verlag, Berlin Hei-
delberg New York, 3. edition, 1996.
[26] F. Kaderali and W. Poguntke. Graphen, Algorithmen, Netze. Grundlagen und Anwendun-
gen in der Nachrichtentechnik. Vieweg, Braunschweig Wiesbaden, 1995.
[29] D. E. Knuth. The Art of Computer Programming. Volume 3: Sorting and Searching.
Addison-Wesley, Reading, 3rd edition, 1998.
[32] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cam-
bridge University Press, Cambridge, 2000.
[41] A. Törn and A. Zilinskas. Global Optimization. Lecture Notes in Computer Science, 350,
1989.
[42] L. von Auer. Ökonometrie. Eine Einführung. Springer-Verlag, Berlin Heidelberg, 3. edi-
tion, 2005.
[44] N. Wirth. Algorithmen und Datenstrukturen. B.G. Teubner, Stuttgart Leipzig, 1999.
Links
1. http://www.nist.gov/dads/ – NIST Dictionary of Algorithms and Data Structures
factorial, 25
Fibonacci heap, 74
fitness function, 98
floor-brackets, 9
Floyd-Warshall algorithm, 61, 71
game theory, 105
Gauß-brackets, 9
gene, 97
generation, 99
genetic algorithm, 61, 97
genetic operator, 98
genotype, 97
golden ratio, 15
graph, 63
weighted -, 69
Gray code, 102
greatest common divisor, 10, 17
greedy algorithm, 61
Hamilton cycle, 113
Hamiltonian cycle, 67, 68
Hamiltonian cycle problem, 68
Hamming distance, 101
hash table, 46
hash value, 46
hash-function, 46
hashing, 48
HashMap, 44
HashSet, 44
HC, 68
heuristic, 106
Huffman code, 61
hyperspace, 57
independent, 53
individual, 97
input, 15
insertion sort, 34
instruction, 12
integers, 108
ISBN, 46
key comparison sort algorithm, 35
knowledge, 97, 103
Landau symbol, 18
lattice, 57
law of motion, 77
learn, 106
length, 64, 70
letter, 44
linear optimization problem, 88
linear probing, 53
linear programming, 88
load factor, 51
loci, 97
logarithmic time complexity, 21
loop, 13
Master theorem, 30
maximum problem, 56
MD5, 48
minimum problem, 56, 92
modulo, 9
Monte Carlo technique, 96
multi-criterion optimization, 60
multiple, 108
multistage decision problem, 79
mutation, 98
Nash equilibrium, 104
natural numbers, 108
negative cycle, 70
neighbours, 64
NP-complete, 85
objective function, 87
operation, 11
optimization, 56
optimization problem, 68
optimization problems, 75
optimum path, 75
oracle, 110
oracle function, 68
output, 15
OX operator, 103
Pareto optimization, 60
particle swarm optimization, 62
path, 64
shortest -, 70
permutation, 85, 103
pigeonhole sort, 36
pivot element, 38, 89
polynomial, 20
polynomial time complexity, 21
population, 97
premature convergence, 101
primary constraints, 87
priority queue, 73
programming
dynamic, 75
linear, 88
pseudocode, 10
qu, 79
quantity unit, 79
Rastrigin function, 59
recombination, 98
rectangle rule, 89
recurrence equation, 29, 30
recursion, 25
recursion equation, 30
recursion step, 26
recursion tree, 28
recursive call, 26
regression, 55
relation, 63
relaxation, 71
repetition, 13
return, 12
Rivest, Ron, 47
roulette-wheel selection, 99
running time, 20, 22
tableau, 88
time, 10
time series, 55
tournament selection, 99
transition function, 77
transition law, 77
transpose, 88, 93
traveling salesman problem, 55, 103
travelling salesman problem, 84
triangle inequality, 85
truncation selection, 99
TSP, 55, 84, 103
Turing machine, 15
Unicode, 45
uniform, 53
unit
currency, 80
quantity, 79
universe, 45
variable
slack, 88
vertex, 63