Notes On Optimisation Theory
In this handout, we discuss some points of elementary logic that are apt to cause confusion,
and also introduce ideas of set theory, and establish the basic terminology and notation.
This is not examinable material, but read it carefully, as this forms the basic language of
mathematics.
1. How to read mathematics
Don't just read it, fight it! Mathematics says a lot with a little. The reader must participate.
After reading every sentence, stop, pause and think: do I really understand this sentence?
Don't read too fast. Reading mathematics too quickly results in frustration. A half hour
of concentration while reading a novel perhaps buys you 20 pages with full comprehension.
The same half hour in a math article buys you just a couple of lines. There is no substitute
for work and time.
An easy way to progress in a mathematics course is to read the relevant section of the
course notes or book before the lectures, and then once again on the same day after the
lecture is over. Keep up, as Mathematics is different from other disciplines: you need to know
yesterday's material to understand today's. Don't save it all for one long night of cramming,
which simply won't work with Mathematics.
Before attempting the exercises, make sure you read the corresponding section from the
lecture notes or book. After reading an exercise, stop and think if you know all the terms in
the exercise, and if you understand what is being asked. Then think about what is given and
what is required. You might then see a possible way of proceeding. You can do some rough
work by writing down a few things in order to convince yourself that your strategy indeed
works. Then write down your answer in a manner that a person can understand logically
what your argument is. Justify each step. Writing proofs is an art, and one gets better at it
only by practice. Every step in the proof is a (mathematical) statement, but it is a sentence in
English! So make sure that each step in your argument reads like a simple sentence (so avoid
the use of a chain of dangling symbols such as "⇒" or "⇔", and pay attention to punctuation and grammar!).
2. Definitions, Lemmas, Theorems and all that
2.1. Definitions. A definition in Mathematics is a name given to a mathematical object by
specifying what the mathematical object is. Just like in biology we define that
An animal is called a fish if it is a cold-blooded, water-dwelling vertebrate with gills.
Observe that in defining a fish, we have listed the characterizing properties that specify which
animals are fish and which aren't. In the same way, in Mathematics, a mathematical object
(a set or a function) is given a certain name if it satisfies certain properties.
Any definition of a mathematical term or a phrase will roughly have the form
[term] if [defining property],
where [term] is the term which is being defined, and [defining property] is its defining property. For example:
BACKGROUND
Every even integer greater than 2 can be expressed as a sum of two prime numbers.
Although the above statement has been verified for an impressive range of cases, nobody has
proved it for all cases. Nor has it been disproved, that is, nobody has so far discovered even
a single even integer greater than 2 which cannot be expressed as a sum of two primes. Still,
even today, the statement is either true or false, even though we do not know which way it is.
3.2. Statements about a class. The above remark about quantification of truth is important for statements about a class. The statement
(2) All rich men are happy
is about a class, namely that comprising rich men. Goldbach's conjecture above is a statement
about the class of all even integers greater than 2.
A layman is apt to regard these statements as true or nearly true when they hold in a
large number of cases. Even if there are a few exceptions, he is likely to ignore them, saying
"The exception proves the rule!". In mathematics, this is not so. Even a single exceptional case
(a counter-example, as it is called) renders false a statement about a class. Thus even one
unhappy rich man makes the statement (2) as false as millions of such men would do. In other
words, in mathematics, we interpret the words "all" and "every" quite literally, not allowing
even a single exception. If we want to make a true statement after taking the exceptional
ones into account, we would have to make a different statement such as
All rich men other than Mr. X are happy.
But loose expressions such as "most", "a great many" or "almost all" cannot be used in mathematical statements, unless, of course, they have been precisely defined earlier.
There is another type of statement made about a class. These do not assert that something
holds for all elements of the class, but instead that it holds for at least one element from that
class. Take for example the statement
There exists a man whose height is 5 feet 7 inches
or
There exists a natural number k with 1 < k < 4294967297 that divides 4294967297.
These statements refer respectively to the class of all men and to the class of all natural
numbers between 1 and 4294967297. In each case, the statement says that there is at least
one member of the class having a certain property. It does not say how many such members
there are. Nor does it say which ones they are. Thus the first statement tells us nothing by
way of the name and the address of the person with that height, and the second one does not
say what this divisor is. These statements are, therefore, not as strong as, respectively, the
statements, say,
Mr. X in London is 5 feet 7 inches tall
or
641 divides 4294967297
which are very specific. A statement which merely asserts the existence of something without
naming it or without giving any method for finding it is called an existence statement. In
the bivalued logic setting of mathematics, existence statements are either true or false, even
if they are not specific.
3.3. Negation of a statement. A negation of a statement is a statement which is true
precisely when the original statement is false and vice-versa. The simplest way to negate a
statement is to precede it with the phrase It is not the case that .... Thus the negation of
Mr. X is rich
is
It is not the case that Mr. X is rich.
Note that the negation of
All men are mortal
is not
All men are immortal.
In view of the comments that we have made about the truth of statements about a class, the
statement "All men are mortal" is false as soon as it fails to hold in just one case, that is,
when there is even one man who is not mortal. So the correct logical negation is
There exists a man who is immortal.
Not surprisingly, the negation of an existence statement is a statement asserting that every
member of the class (to which the existence statement refers) fails to have the property
asserted by the existence statement. Thus, the negation of
There exists a rich man
is
No man is rich
that is,
Every man is poor.
If we keep in mind these simple facts, we can almost mechanically write down the negation of
any complicated statement. For example, if a1 , a2 , a3 , . . . is a sequence of real numbers, then
the negation of
∃ L ∈ R such that ∀ ε > 0, ∃ N ∈ N such that ∀ n ∈ N satisfying n > N , |an − L| < ε
is
∀ L ∈ R, ∃ ε > 0 such that ∀ N ∈ N, ∃ n ∈ N such that n > N and |an − L| ≥ ε.
If a statement is denoted by some symbol P , then the negation of P is denoted by ¬P .
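The mechanical rules above (swap "for all" with "there exists" and negate the inner property) can be checked on a finite class. In the Python sketch below the class and the property are illustrative stand-ins of my own choosing:

```python
# A finite "class" and a property P; both are illustrative stand-ins.
members = [2, 4, 6, 7]
P = lambda n: n % 2 == 0      # "n is even"

# Negating "every member satisfies P" gives
# "there exists a member that does not satisfy P":
assert (not all(P(m) for m in members)) == any(not P(m) for m in members)

# Negating "there exists a member satisfying P" gives
# "every member fails P":
assert (not any(P(m) for m in members)) == all(not P(m) for m in members)
```

Here both assertions hold whatever finite class and property are chosen, which is exactly the point: the negation rules are mechanical.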
3.4. Vacuous truth. An interesting point arises while dealing with statements about a class.
A class which contains no elements at all is called a vacuous or empty class. For example,
the class of all six-legged men is empty because there is no man who has six legs. But now
consider the statement
(3) Every six-legged man is happy.
Is this statement true or false? We cannot call it meaningless. It has a definite meaning, just
like the statement
Every rich man is happy.
We may call the statement (3) useless, but that does not debar it from being true or false.
Which way is it then? Here the reasoning goes as follows. Because of bivalued logic, the
statement (3) has to be either true or false, but not both. If it is false, then its negation is
true. But the negation is the statement
There exists a six-legged man who is not happy.
But this statement can never be true because there exists no six-legged man whatsoever (and
so the question of his being happy or unhappy does not arise at all). So the negation has to
be false, and hence the original statement is true!
A layman may hesitate in accepting the above reasoning, and we give some recognition to
his hesitation by calling such statements vacuously true, meaning that they are true
simply because there is no example to render them false.
Note, by the way, that the statement
(4) Every six-legged man is unhappy
is also true (albeit vacuously). There is no contradiction here because the statements (3) and
(4) are not negations of each other.
What is the use of vacuously true statements? Certainly, no mathematician goes on proving
theorems which are known to be vacuously true. But such statements sometimes arise as
special cases of a more general result.
3.5. Logical precision in mathematics. The importance of logic in mathematics cannot
be over-emphasized. Logical reasoning being the soul of mathematics, even a single flaw of
reasoning can thwart an entire piece of research work. We already pointed out that in mathematics every theorem has to be deduced from the axioms in a strictly deductive manner.
Every step has to be justified, and this is the rule for all mathematics. But it deserves to
be emphasized here, since in high-school mathematics, the concern was usually with numbers. Consequently the required justifications were based upon some very basic properties
of numbers and their specific mention was rarely made. For example if we are to solve
3(x + 3)(x − 3) = 30, we mechanically solve it in the following steps:
(x + 3)(x − 3) = 10
x² − 9 = 10
x² = 19
x = √19 or −√19.
Although no justification is given for these steps, they require various properties of real
numbers such as associativity, commutativity and cancellation laws for multiplication and
addition, distributivity of multiplication over addition, and finally, the existence of square
roots of real numbers. So far in high-school, we ignored these. But in mathematics, one
sometimes considers abstract algebraic systems where some of these laws of associativity,
distributivity and so on do not hold. Then the justification for each step will have to be given
carefully, starting from the axioms. Hence a proof is needed even when the statement may
seem obvious.
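As a quick numerical sanity check of the worked example (assuming the equation is 3(x + 3)(x − 3) = 30, as the displayed steps indicate), we can verify that x = ±√19 satisfies it:

```python
import math

# Both roots of the equation 3(x + 3)(x - 3) = 30.
for x in (math.sqrt(19), -math.sqrt(19)):
    lhs = 3 * (x + 3) * (x - 3)   # equals 3(x^2 - 9) = 3 * 10 = 30
    assert abs(lhs - 30) < 1e-9
```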
4. Sets and functions
It is sometimes said that mathematics is the study of sets and functions. This is an
oversimplification of matters, but there is much truth in it.
4.1. Sets. A set is a collection of objects considered together. For instance, the set of all
positive integers, or the set of all rational numbers and so on. The set comprising no elements
is called the empty set, and it is denoted by ∅. The objects belonging to the set are called its
elements. For example, 2 is an element belonging to the set of positive integers.
There are two standard methods of specifying a particular set.
Method 1. Whenever it is feasible to do so, we can list its elements between braces. Thus
{1, 2, 3} is the set comprising the first three positive integers.
This manner of specifying a set, by listing its elements, is unworkable in many circumstances. We then use the second method, which is to use a property that characterizes the
elements of the set.
Method 2. If P denotes a certain property of elements, then {a | P } stands for the set of all
elements a for which the property P is true.¹ The set then contains all those elements (and
¹ The symbol | is read as "such that". Some authors use : instead of |.
(Some authors use the notation A ⊆ B instead of A ⊂ B.) If A ⊂ B, we sometimes say that
A is contained in B or that B contains A. For example, the set of integers is contained in
the set of rational numbers: Z ⊂ Q.
For any set A, A ⊂ A and ∅ ⊂ A. Two sets A and B are said to be equal if they consist of
exactly the same elements, and we then write A = B. A is equal to B iff A ⊂ B and B ⊂ A.
If A ⊂ B and A ≠ B, then we say that A is strictly contained in B, or that B strictly
contains A.
The intersection of sets A and B, denoted by A ∩ B, is the set of all elements that belong
to A and to B:
A ∩ B = {a | a ∈ A and a ∈ B}.
If A ∩ B = ∅, then the sets A and B are said to be disjoint. For example, the intersection of
the set of all integers divisible by 2 and the set of all integers divisible by 3 is the set of all
integers divisible by 6. More generally, if A1 , . . . , An are sets, then their intersection is the
set
{a | for all i ∈ {1, . . . , n}, a ∈ Ai }.
The intersection of the sets A1 , . . . , An is denoted by ⋂_{i=1}^{n} Ai . If we have an infinite family of
sets A1 , A2 , A3 , . . . , then their intersection is the set
{a | for all i ∈ N, a ∈ Ai },
which is denoted by ⋂_{i=1}^{∞} Ai .
The union of sets A and B, denoted by A ∪ B, is the set of all elements that belong to A
or to B:
A ∪ B = {a | a ∈ A or a ∈ B}.
For example, the union of the set of even integers and the set of odd integers is the set of all
integers. More generally, if A1 , . . . , An are sets, then their union is the set
{a | there exists an i ∈ {1, . . . , n} such that a ∈ Ai }.
The union of the sets A1 , . . . , An is denoted by ⋃_{i=1}^{n} Ai .
Given sets A and B, the product of A and B is defined as the set of all ordered pairs (a, b),
such that a is from A and b is from B. The product of the sets A and B is denoted by A × B.
Thus,
A × B = {(a, b) | a ∈ A and b ∈ B}.
We do not define an ordered pair, but remark that unless a = b, (a, b) is not the same as
(b, a). The name product is justified, since if A and B are finite, and have m and n elements,
respectively, then the set A × B has mn elements. Note that, as sets, A × B and B × A are in
general not equal, even though they have the same number of elements. Similarly, given sets
A1 , . . . , An , we define A1 × · · · × An by
A1 × · · · × An = {(a1 , . . . , an ) | ai ∈ Ai for all i ∈ {1, . . . , n}}.
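These set operations correspond directly to Python's built-in set type and itertools.product; in the sketch below the particular sets A and B are just illustrations:

```python
from itertools import product

A = {1, 2, 3}
B = {2, 3, 4}

assert A & B == {2, 3}            # intersection
assert A | B == {1, 2, 3, 4}      # union

# The divisibility example: multiples of 2 and of 3 (up to 30)
# intersect in the multiples of 6.
twos = {n for n in range(1, 31) if n % 2 == 0}
threes = {n for n in range(1, 31) if n % 3 == 0}
assert twos & threes == {n for n in range(1, 31) if n % 6 == 0}

# The product A x B consists of ordered pairs, so A x B and B x A
# differ as sets, although both have len(A) * len(B) elements.
AxB = set(product(A, B))
BxA = set(product(B, A))
assert len(AxB) == len(A) * len(B)
assert AxB != BxA
```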
4.2. Functions or maps. Let A and B be two nonempty sets. A function (or a map) is a
rule which assigns to each element a ∈ A an element f (a) of the set B.
The set A is called the domain, and B is called the codomain of the function f . We write
f : A → B.
The set f (A) = {f (a) | a ∈ A} is called the image of f . Clearly f (A) ⊆ B.
For example, if we take A = B = Z and consider the rule f that assigns to the integer n
the integer n², then we obtain a function f : Z → Z given by
(5)
f (n) = n² ,
n ∈ Z.
We observe that the image of f is the set f (Z) = {0, 1, 4, 9, . . . } of perfect squares, which is
strictly contained in the codomain Z. Thus f (Z) ⊂ Z, but f (Z) ≠ Z.
Note that while talking about a function, one has to keep in mind that a function really
consists of three objects: its domain A, its codomain B and the rule f . Thus for example, if
the function g : Z → Q is given by g(n) = n² , n ∈ Z, then g is a different function from the
function f : Z → Z given by (5) above, since the codomain of f is Z, while that of g is Q.
Functions can be between far more general objects than sets comprising numbers. The
important thing to remember is that the rule of assignment is such that for each element
from the domain there is only one element assigned from the codomain. For example, if we
take the set A to be the set of all human beings in the world, and B to be the set of all
females on the planet, then the rule f : A → B which associates to each person his/her
mother is a function. However, if g is the rule which assigns to each person
a sister he/she has, then clearly this is not a function, since there are people with more
than one sister (and also there are people who do not have any sister).
Properties of functions play an important role in mathematics, and we highlight two very
important types of functions.
A function f : A → B is said to be injective (or one-to-one) if
(6) for all a1 , a2 ∈ A, a1 ≠ a2 implies f (a1 ) ≠ f (a2 ).
This means that for each point b in the image f (A) of a function f : A → B, there is a unique
point a in the domain A such that f (a) = b. The function f : Z → Z given by (5) is not
injective, since for instance for the points −1, 1 in the domain Z, we see that −1 ≠ 1, but
f (−1) = 1 = f (1). However, the function g : Z → Z given by
g(n) = 2n,
n ∈ Z,
is injective.
A function f : A → B is said to be surjective or onto if
(7) for every b ∈ B there exists an a ∈ A such that f (a) = b.
Note that (7) is equivalent to f (A) = B. In other words, a function is surjective if every
element from the codomain is the image of some element. The map f : Z → Z given by (5) is
not surjective. Indeed, −1 is an element from the codomain, but there is no element n from
the domain Z such that (f (n) =) n² = −1. Consider the map g : Z → {0, 1, 4, 9, . . . } given
by
g(n) = n² ,
n ∈ Z.
Then clearly g is surjective.
A function f : A B is said to be bijective if it is injective and surjective. Thus to check
that a map is bijective we have to check two things, injectivity and surjectivity. Consider the
map h from the set of all integers to the set of all even integers, given by
h(n) = 2n,
n ∈ Z.
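Injectivity and surjectivity can be tested mechanically on a finite domain. An infinite domain such as Z cannot be checked exhaustively, so in this sketch a finite range of integers stands in for it, and the helper names are my own:

```python
def is_injective(f, domain):
    # Injective: distinct elements of the domain never share an image.
    images = [f(a) for a in domain]
    return len(images) == len(set(images))

def is_surjective(f, domain, codomain):
    # Surjective: every element of the codomain is the image of something.
    return set(codomain) <= {f(a) for a in domain}

Z = range(-10, 11)                      # finite stand-in for the integers Z

square = lambda n: n * n
double = lambda n: 2 * n

assert not is_injective(square, Z)      # f(-1) = f(1), as in the text
assert is_injective(double, Z)
assert is_surjective(double, Z, [2 * n for n in Z])   # onto "the even integers"
assert not is_surjective(square, Z, Z)  # -1 is never a value of n^2
```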
Optimisation Theory
2007/08
MA 208
Notes 1
Introduction to Continuous Optimisation
Some Mathematical Background
As mentioned in the general information of this course, the first part of this course is based
on the text book "A First Course in Optimization Theory" by R.K. Sundaram ( Cambridge University Press (1996), ISBN 0-521-49770-1 ). ( Notice the difference between the British spelling
"optimisation" and the American "optimization". We will use the British spelling, which
means that the correct title of this course is "Optimisation Theory". ) As far as possible, we
will follow the notation and conventions in that book.
The notes will mainly consist of extra material ( or a different explanation of material covered in the book ), plus a description of the parts of the book relevant for the topic under
consideration.
1.1 The basic problems
Throughout this part of the course we will assume that we are given a function f : D → R,
where D is a certain subset of Rn , for some n ≥ 1. The function f is called the objective
function and D is the constraint set.
And the optimisation problem is : what is the maximum or minimum of f ( x ) when x ∈ D ?
We will write these problems as
maximise f ( x ) subject to x ∈ D
and
minimise f ( x ) subject to x ∈ D ;
or more compactly as
maximise f ( x ) for D
and
minimise f ( x ) for D ;
or
max{ f ( x ) | x ∈ D }
and
min{ f ( x ) | x ∈ D }.
A solution to the maximisation problem is a point x ∈ D such that f ( x ) ≥ f ( y ) for all y ∈ D ;
a solution to the minimisation problem is a point z ∈ D such that f ( z ) ≤ f ( y ) for all y ∈ D .
If such points x or z do exist, then they are called a global maximum of f on D and a global
minimum of f on D , respectively.
Notice that there is a difference between a solution to one of the optimisation problems
above and the solutions ( or the set of solutions ) to the optimisation problem.
Also realise that quite often there will be no solution at all; the set of solutions is the empty
set in that case.
We will use the notation arg max{ f ( x ) | x ∈ D } to denote the set of all solutions to the
problem max{ f ( x ) | x ∈ D } and arg min{ f ( x ) | x ∈ D } for the minimising problem.
f Has a maximum on D1 .
Theorem 2.5 in the book looks kind of scary, but is also very obvious. The composition ϕ ∘ f of
a function ϕ : R → R and a function f : D → R is the function given by
( ϕ ∘ f )( x ) = ϕ( f ( x )),
for all x ∈ D .
Notice that in this course we use no clear distinction in notation between a real number
x ∈ R and a vector or point y ∈ Rn . Also, vectors are in general written with the coordinates
in a row, x = ( x1 , . . . , xn ), but will be considered as column vectors. If we want to regard the
same vector as a row vector, an accent x′ is added to its name.
Be careful with the notation to compare two vectors. So if x = ( x1 , . . . , xn ) and y = ( y1 , . . . , yn )
have the same dimension, then
x = y,
if xi = yi for all i = 1, . . . , n;
x ≥ y,
if xi ≥ yi for all i = 1, . . . , n;
x > y,
if x ≥ y and x ≠ y;
x ≫ y,
if xi > yi for all i = 1, . . . , n.
The notation for the inner product is again somewhat different from the one you may be
used to. In this course we just write x · y for the inner product of two vectors. So we can
write x · y = x1 y1 + · · · + xn yn .
The most important result on the norm of vectors and the distance between two vectors is
the Triangle Inequality. It comes in two flavours :
* For any two vectors x, y ∈ Rn we have ‖ x + y ‖ ≤ ‖ x ‖ + ‖ y ‖.
* For any three vectors x, y, z ∈ Rn we have d( x, z ) ≤ d( x, y ) + d( y, z ).
In a way, the following two are the only definitions of convergence involving epsilons and
deltas you need to know.
Definition
A sequence of real numbers { xk } converges to zero, notation xk → 0, if for all ε > 0 there is
an integer K such that | xk | < ε for all k ≥ K.
Definition
A sequence of real numbers { xk } diverges to +∞, notation xk → ∞ or xk → +∞, if for all
M ∈ R there is an integer K such that xk > M for all k ≥ K.
From the first definition in the previous paragraph, we can obtain all other convergence
concepts we need.
Definition
A sequence of real numbers { xk } converges to x for some x ∈ R, notation xk → x, if
xk − x → 0.
Definition
A sequence of points { xk } in Rn converges to x for some x ∈ Rn , notation xk → x, if
‖ xk − x ‖ → 0.
Those of you who like definitions with epsilons and so on, can use the following equivalent
definition :
Definition
A sequence of points { xk } in Rn converges to x for some x ∈ Rn , notation xk → x, if for all
ε > 0 there is an integer K such that ‖ xk − x ‖ < ε for all k ≥ K.
Instead of "converges to x" we also say "is convergent with limit x".
Note that there is no concept of convergence to infinity if we are dealing with sequences of
points in Rn .
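Exercise 1(a) at the end of these notes asks exactly this for the sequence xk = 1/k. A sketch of how a witness K can be computed from a given ε (the function name witness_K is my own):

```python
import math

# For x_k = 1/k and a given eps > 0, the integer K = floor(1/eps) + 1
# witnesses the definition: K > 1/eps, so k >= K implies
# |x_k| = 1/k <= 1/K < eps.
def witness_K(eps):
    return math.floor(1 / eps) + 1

for eps in (0.5, 0.1, 0.003):
    K = witness_K(eps)
    assert all(abs(1 / k) < eps for k in range(K, K + 1000))
```

Of course this only checks finitely many k; the inequality 1/k ≤ 1/K < ε for all k ≥ K is what the written proof must supply.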
Apart from knowing the definition of convergence, you should also have a fairly good idea
what it means if a sequence is not convergent :
* A sequence of points { xk } in Rn is not convergent to x ∈ Rn if there exists an ε > 0 such
that for all integers K there is a k > K with ‖ xk − x ‖ ≥ ε.
In particular this means that if we have a sequence of real numbers { xk } with xk → +∞,
then { xk } does not converge.
We will use the following definitions for open and closed set :
Definition
A set S ⊆ Rn is open if for all x ∈ S there is an r > 0 such that B( x, r ) ⊆ S.
A set S ⊆ Rn is closed if for all sequences { xk } in S that converge to a limit x, also x ∈ S.
The two definitions are related by the following theorem, which is equivalent to Theorem 1.20 in the book :
Theorem
A set S ⊆ Rn is closed if and only if its complement Sc = { x ∈ Rn | x ∉ S } is open.
The following should be known notation for the union and intersection of arbitrary collections of sets ( Sα )α∈A , where A is some index set :
⋃_{α∈A} Sα = { x | x ∈ Sα for some α ∈ A };
⋂_{α∈A} Sα = { x | x ∈ Sα for all α ∈ A }.
You shouldn't try to remember the following results, but try to get a feeling why they are
true.
* The union of an arbitrary collection of open sets is again open.
* The intersection of an arbitrary collection of open sets is not always open.
* The intersection of a finite collection of open sets is again open.
* The sum of two open sets is again open.
* The union of an arbitrary collection of closed sets is not always closed.
* The union of a finite collection of closed sets is again closed.
* The intersection of an arbitrary collection of closed sets is again closed.
* The sum of two closed sets is not always closed.
1.7 Upper and lower bound; supremum and infimum; maximum and minimum
As said in the first subsection, when looking at a problem of the type max{ f ( x ) | x ∈ D }
we're actually looking at the properties of f (D). Since f (D) is a subset of R, we take a closer
look at subsets of R.
Definition
Let A be a nonempty subset of R.
* An upper bound of A is a point u ∈ R such that u ≥ a for all a ∈ A.
A lower bound of A is a point ℓ ∈ R such that ℓ ≤ a for all a ∈ A.
* If A has at least one upper bound, then the supremum of A, notation sup( A), is the smallest
upper bound of A.
If A has no upper bound, then we set sup( A) = +∞.
If A has at least one lower bound, then the infimum of A, notation inf( A), is the largest
lower bound of A.
If A has no lower bound, then we set inf( A) = −∞.
* The maximum of A, notation max( A), is a point z ∈ A such that z ≥ a for all a ∈ A.
The minimum of A, notation min( A), is a point w ∈ A such that w ≤ a for all a ∈ A.
The most complicated of the definitions above are those for supremum and infimum. From
the definitions it's often not so easy to show whether or not a point is the supremum of a set.
An alternative definition would be the following :
Property
Let A ⊆ R be a nonempty set. If A has an upper bound, then the supremum of A is the
unique point m with the following properties :
For each x > m we have that x ∉ A.
For each x < m we have that there is an a ∈ A such that a ≥ x.
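The two properties of the characterisation can be checked numerically on an example; the set below (a finite sample of { 1 − 1/k : k = 1, 2, . . . }, whose supremum is 1) is my own illustration:

```python
# A finite sample of the set A = { 1 - 1/k : k = 1, 2, ... }, whose
# supremum is m = 1.  Note m itself is not in A, so A has no maximum.
A = [1 - 1 / k for k in range(1, 2000)]
m = 1.0

# First property: every x > m lies outside A (m is an upper bound).
assert all(a <= m for a in A)

# Second property: for each x < m there is an a in A with a >= x,
# so nothing smaller than m can be an upper bound.
for x in (0.0, 0.9, 0.999):
    assert any(a >= x for a in A)

assert m not in A        # the supremum need not be a maximum
```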
1.8 Matrices
Section 1.3 in the book contains much about matrices that I assume you are familiar with.
Read it sometime to refresh your memory. Some items will be discussed in some detail later
in the course, when appropriate.
Remember that vectors are assumed to be column vectors, although written with the coordinates in a row. So if A is an m × n matrix ( hence has m rows and n columns ) and
x = ( x1 , . . . , xn ) is an n-vector, then we will write A x for the product of A and x. If we want
to multiply the matrix from the left by the row vector x, we will write x′ A.
The assumption above means that we could write the inner product of two vectors x, y ∈ Rn
as x · y = x′ y. We will in general only use the second notation when a matrix is involved in
the middle. Hence if we take the inner product of x and A y, then we will usually write this
as x′ A y, which is the same as x · A y.
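The convention x′ A y = x · ( A y ) can be illustrated in pure Python with vectors as lists; the helper names dot, matvec and vecmat are my own:

```python
# Vectors as lists, a matrix A as a list of rows.
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def matvec(A, y):
    # A y : multiply the matrix A by the column vector y.
    return [dot(row, y) for row in A]

def vecmat(x, A):
    # x' A : multiply the row vector x' by the matrix A.
    return [sum(x[i] * A[i][j] for i in range(len(x)))
            for j in range(len(A[0]))]

A = [[1, 2],
     [3, 4]]
x = [1, -1]
y = [2, 5]

# x' A y can be computed either as x . (A y) or as (x' A) . y;
# associativity of matrix multiplication makes them equal.
assert dot(x, matvec(A, y)) == dot(vecmat(x, A), y)
```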
Section 1.1 in the book lists some basic mathematical facts that I assume to be well-known. Proofs from here will not be discussed in the course.
You should have some idea what is happening in subsections 1.2.1 – 1.2.3 of the book.
Compare with section 1.3 in these notes.
Theorem 1.5 and the related Theorem 1.8 are obvious, but important results. Theorems 1.10 and 1.11 will be needed now and then in some of the more formal proofs,
but you yourself don't really need to know them in detail. Proofs in these sections of the
book are of little interest to us.
Everything we need to know about subsection 1.2.4 in the book can be found in section 1.7
of these notes. The main ideas of the proof of Theorem 1.13 will be discussed in the
lectures.
Subsections 1.2.5 and 1.2.6 are of little interest to us.
Everything you need to know about subsection 1.2.7 is in section 1.6 of these notes.
Subsection 1.2.8 will be discussed in the next notes.
We will try to avoid convex sets as much as possible in this course ! There are plenty of
courses at the LSE where you can learn about them. So ignore subsection 1.2.9.
The contents of subsection 1.2.10 will be discussed at some places in these notes. Read
this subsection once to see what is happening.
As said before, Section 1.3 in the book contains much about matrices that I assume you
are familiar with. Read it sometime to refresh your memory.
We will need all kinds of bits and pieces from Sections 1.4 to 1.6 in the book. We will
discuss them when they are needed. So ignore at the moment.
Extra Exercises
1
(a) Use the definition in 1.4 to prove that the sequence { xk } given by xk = 1/k does converge to zero.
(b) Use the definition in 1.4 to prove that the sequence { yk } given by yk = (−1)k does not
converge to any limit.
Determine which of the sets A, B, C, D, E, F, G, H from the previous question are open. Justify your answers !
Let A ⊆ R be a nonempty set. Use the definitions from 1.7 to prove the following statements.
(a) If inf( A) = sup( A), then A has only one element.
(b) If A is an open set, then sup( A) ∉ A.
Optimisation Theory
MA 208
2007/08
Notes 2
Bounded and compact sets
Continuous Functions
Weierstrass Theorem
Again, don't learn the following properties by heart, but try to get a feeling why they are
true.
* The union of an arbitrary collection of bounded sets is not always bounded.
* The union of a finite collection of bounded sets is again bounded.
* The intersection of an arbitrary collection of bounded sets is again bounded.
* The sum of two bounded sets is again bounded.
* The union of an arbitrary collection of compact sets is not always compact.
* The union of a finite collection of compact sets is again compact.
* The intersection of an arbitrary collection of compact sets is again compact.
* The sum of two compact sets is again compact.
Author : Jan van den Heuvel
Most of you will have a notion of what it means for a function from the real numbers to
itself to have a limit. Definitions for these concepts usually include notions such as "x
approaches a" or "x approaches a from above". But if we are dealing with functions f :
Rn → Rm , we must be careful what we mean if we say "x approaches a", since x and a are
points in some higher dimensional space. We will use the following definition :
Definition
Given a function f : S → Rm where S ⊆ Rn . Then we say that f ( x ) → ℓ as x → a,
where ℓ ∈ Rm and a ∈ Rn , if for every sequence { xk } in S such that xk ≠ a but xk → a we
have that f ( xk ) → ℓ.
Instead of "f ( x ) → ℓ if x → a" we sometimes write lim_{x→a} f ( x ) = ℓ.
Those of you who like epsilons and so on, can use the following equivalent definition :
Definition
Given a function f : S → Rm where S ⊆ Rn . Then we say that f ( x ) → ℓ as x → a,
where ℓ ∈ Rm and a ∈ Rn , if for every ε > 0 there is a δ > 0 such that for all x ∈ S with
0 < ‖ x − a ‖ < δ we have ‖ f ( x ) − ℓ ‖ < ε.
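The ε–δ definition can be made concrete for a specific function; in the sketch below the choices f(x) = x², a = 3, ℓ = 9 and the helper name delta_for are my own illustrations:

```python
# For f(x) = x^2, a = 3, l = 9: on |x - 3| < 1 we have |x + 3| < 7,
# so delta = min(1, eps/7) guarantees |f(x) - 9| = |x - 3||x + 3| < eps.
def delta_for(eps):
    return min(1.0, eps / 7.0)

f = lambda x: x * x
for eps in (1.0, 0.1, 0.004):
    d = delta_for(eps)
    # sample points with 0 < |x - 3| < delta
    xs = [3 + d * t for t in (-0.9, -0.5, -1e-6, 1e-6, 0.5, 0.9)]
    assert all(abs(f(x) - 9) < eps for x in xs)
```

The sampling only spot-checks the claim; the inequality in the comment is what a written proof would establish for all such x.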
For functions f : S → R whose range is the real numbers, we have the following additional concept :
Definition
Given a function f : S → R where S ⊆ Rn . Then we say that f ( x ) diverges to +∞ as x → a,
where a ∈ Rn , if for every sequence { xk } in S such that xk ≠ a but xk → a we have that
f ( xk ) → +∞.
Here is an equivalent definition not using sequences :
Definition
Given a function f : S → R where S ⊆ Rn . Then we say that f ( x ) diverges to +∞ as x → a,
where a ∈ Rn , if for all M ∈ R there is a δ > 0 so that for all x ∈ S such that x ≠ a but
‖ x − a ‖ < δ, we have that f ( x ) > M.
We use the notation f ( x ) → +∞ as x → a, or lim_{x→a} f ( x ) = +∞.
And for functions f : R → R whose domain is the real numbers, we have the following
additional concept :
Definition
Given a function f : R → Rm . Then we say that f ( x ) → ℓ as x → ∞, where ℓ ∈ Rm , if
for every sequence { xk } in R such that xk → ∞ we have that f ( xk ) → ℓ.
And also this time we can give a definition with epsilons :
Definition
Given a function f : R → Rm . Then we say that f ( x ) → ℓ as x → ∞, where ℓ ∈ Rm , if
for all ε > 0 there exists an M ∈ R such that ‖ f ( x ) − ℓ ‖ < ε for all x > M.
For this we use the notation lim_{x→∞} f ( x ) = ℓ.
Note that Weierstrass Theorem gives a sufficient condition, not a necessary one. So if the
function f is not continuous or the set D is not compact, you cannot conclude that a maximum of f on D does not exist.
2.5 Examples
In Section 3.2 of the book you can find a selection of worked-out examples in which Weierstrass Theorem can provide some insight into certain optimisation problems. The examples
are also described in Sections 2.3.1, 2.3.4, and 2.3.7 in Chapter 2. Have a look at these sections
in order to understand the examples better.
Here you find some of the typical notation of the book. For instance x ∈ Rn+ means that
x = ( x1 , . . . , xn ) is a vector with xi ≥ 0 for all i. And p ≫ 0 doesn't mean that p is much
larger than 0, but means that p = ( p1 , . . . , pn ) is a vector with pi > 0 for all i.
In order to apply Weierstrass Theorem you need to check that the conditions are satisfied.
This is usually not a trivial exercise, in particular since the definitions for compact set and
for continuous function are kind of nasty. Here are some hints that may be helpful.
The formal definition for a function to be continuous is too cumbersome to use in general.
That is why you can use the following rule :
Every reasonable-looking function, whose definition involves polynomials and quotients of polynomials, trigonometric functions, exponential and logarithmic functions,
and the like, is continuous at every point where it is defined.
A function where the domain is split into different parts, with different function descriptions for the different parts, does not have to be continuous ( but it can be ).
A set S ⊆ Rn is compact if and only if it is closed and bounded.
To show that a set S Rn is bounded, you need to find a real number M such that
k x k M for all x S.
A particular way to show that this is the case is the following :
Suppose that for a set S ⊆ Rn there exist numbers m1 , . . . , mn such that for all x ∈ S,
where x = ( x1 , . . . , xn ), we have | xi | ≤ mi for all i = 1, . . . , n. Then S is a bounded set.
This follows from the previous statement since we have
‖ x ‖ = √( x1² + · · · + xn² ) ≤ √( m1² + · · · + mn² ),
so you could take M = √( m1² + · · · + mn² ).
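As a quick numerical sanity check ( not a proof ), the bound above can be tested in Python; the bounds m = (2, 5, 1) are a made-up example :

```python
# Spot-check of the bound above: if |x_i| <= m_i for every coordinate,
# then ||x|| <= sqrt(m_1^2 + ... + m_n^2).
import math
import random

def norm(x):
    return math.sqrt(sum(t * t for t in x))

m = [2.0, 5.0, 1.0]                   # hypothetical coordinate bounds m_1, m_2, m_3
M = math.sqrt(sum(t * t for t in m))  # the M from the argument above

for _ in range(1000):
    x = [random.uniform(-mi, mi) for mi in m]  # any point with |x_i| <= m_i
    assert norm(x) <= M + 1e-12
```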
Again, the formal definition of a closed set should only be used when asked to do so explicitly. You can assume the following rules :
The whole space Rn is closed ( but not bounded of course ).
Any subset of Rn described by one or more constraints of the type g( x ) = a or g( x ) ≥ a,
where a ∈ R and g : Rn → R is continuous on Rn , is closed.
But be aware : if there is a constraint of the type g( x ) < a, then the set is often not closed.
Further investigations are needed in that case.
‖ ( x, y ) ‖ = √( x² + y² ) = √( 2 (| M| + 1)² ) > √2 | M| ≥ M.
So we can never find an M such that ‖ z ‖ ≤ M for all z ∈ D .
Now consider the problem of finding a maximum of the function f ( x, y) = 1/( x + y)
on the set D given above.
First note that f is defined on D , since for all ( x, y) ∈ D we have x > 0 and y > 0. Following
the rule of looking reasonable, f is continuous. So we would like to use Weierstrass
Theorem to conclude that a maximum exists. But we've seen above that D is not bounded,
hence not compact.
The trick here is to apply the final theorem on page 2 of notes 1. It's easy to see that, for
instance, (3, 3) ∈ D and since f (3, 3) = 1/6 we know that if a maximum exists it must have
function value at least 1/6. Now define
D1 = { ( x, y) ∈ D | x + y ≤ 6 }
and
D2 = { ( x, y) ∈ D | x + y ≥ 6 }.
Note that in the example above we can also conclude that f has a minimum on D1 , since
Weierstrass Theorem guarantees both a maximum and a minimum. But we cannot conclude
that f has a minimum on D .
So, to complete the example above, what can we say about the existence of a minimum of f
on D ? First note that for all ( x, y) ∈ D we have f ( x, y) > 0, but there is no ( a, b) ∈ D such
that f ( a, b) = 0. On the other hand, it is fairly easy to show that f ( x, y) can get arbitrarily
close to 0 for the right ( x, y) ∈ D . For instance, for all x ≥ 1 we know that ( x, x ) ∈ D . But
f ( x, x ) = 1/(2x ) → 0 if x → ∞.
It follows that 0 is the infimum of f (D) but not the minimum.
We must conclude that f has no minimum on D .
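The behaviour used in this argument is easy to see numerically; this little Python check ( with hypothetical sample points ) illustrates that f ( x, x ) = 1/(2x ) stays positive but sinks below any tolerance :

```python
# f(x, y) = 1/(x + y) on the diagonal points (x, x) gives f(x, x) = 1/(2x),
# which gets arbitrarily close to 0 but never reaches it:
# 0 is the infimum of f(D), not a minimum.
def f(x, y):
    return 1.0 / (x + y)

values = [f(x, x) for x in (1, 10, 1000, 10**6)]
assert all(v > 0 for v in values)              # f stays strictly positive
assert values == sorted(values, reverse=True)  # and decreases towards 0
assert f(10**9, 10**9) < 1e-8                  # eventually below any tolerance
```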
Weierstrass Theorem is the topic of Chapter 3 in the book. More or less everything in
that chapter is relevant to us as well.
Extra Exercises
1
For the sets A given below, show that every function f : A → R is continuous.
(a) A set A ⊆ Rn with just one element.
(b) A set A ⊆ Rn with exactly two elements.
(c) A set A ⊆ Rn with a finite number of elements.
(d) The set A = Z, considered as a subset of R.
Determine which of the sets in (a) – (d) from above are compact. ( Use question 2 from
notes 1. )
Let D = {0} ∪ { 1/n | n = 1, 2, . . . }. Determine, justifying your answers, if the following
statements are true :
Optimisation Theory
2007/08
MA 208
Notes 3
Differentiation of functions
Unconstrained Optimisation
First-Order Conditions
as x → x0 .
If f is differentiable on a set S, then the derivative D f can in its turn be seen as a function
D f : S → Rm×n . If this function is continuous, then f is said to be continuously differentiable,
or f is C1 .
In this course we will be mainly interested in functions from Rn into R, i.e., in functions
f : S → R, where S ⊆ Rn . For these functions the definition of differentiable becomes :
* A function f : S → R, where S ⊆ Rn , is differentiable at x0 ∈ S, where x0 must be in the
interior of S, if there exists an n-vector a such that
( f ( x ) − f ( x0 ) − a · ( x − x0 ) ) / ‖ x − x0 ‖ → 0
as x → x0 .
More interesting for practical use are partial derivatives. Let e_j ∈ Rn be the j-th unit vector, i.e.,
e_j has a 1 in the j-th coordinate and a 0 everywhere else. Then the j-th partial derivative of f at
a point x is the number ∂ f ( x )/∂x_j ( or ∂ f/∂x_j ( x ) ) such that
( f ( x + t e_j ) − f ( x ) ) / t → ∂ f ( x )/∂x_j
as t → 0.
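The difference quotient in this definition can be evaluated numerically. The following Python sketch uses the hypothetical example f ( x1 , x2 ) = x1² x2 , whose exact partial derivatives are 2 x1 x2 and x1² :

```python
# The j-th partial derivative as the limit of (f(x + t*e_j) - f(x)) / t
# for small t, illustrated with the made-up example f(x1, x2) = x1^2 * x2.
def f(x):
    return x[0] ** 2 * x[1]

def partial(f, x, j, t=1e-6):
    e = [0.0] * len(x)
    e[j] = 1.0                    # the j-th unit vector e_j
    xt = [xi + t * ei for xi, ei in zip(x, e)]
    return (f(xt) - f(x)) / t     # difference quotient for small t

x = [3.0, 2.0]
assert abs(partial(f, x, 0) - 2 * 3.0 * 2.0) < 1e-4   # df/dx1 = 2*x1*x2 = 12
assert abs(partial(f, x, 1) - 3.0 ** 2) < 1e-4        # df/dx2 = x1^2 = 9
```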
The important result connecting the general derivative and partial derivatives is Theorem 1.54
in the book. We only need the following part of that theorem.
* Theorem
Let f : S → R be a function, where S ⊆ Rn is an open set. The function f is C1 on S
( i.e., the derivative D f ( x ) exists for all x ∈ S and is continuous ) if and only if all partial
derivatives of f exist and are continuous for all x ∈ S.
In that case we also have D f ( x ) = ( ∂ f ( x )/∂x1 , ∂ f ( x )/∂x2 , . . . , ∂ f ( x )/∂xn ) for all x ∈ S.
Note that the statement above is useless if the derivative or some of the partial derivatives
are not continuous. Have a look at Example 1.55 in the book to see that continuity is really
essential, and that there really is something happening in the theorem above.
The way the theorem above is used is usually as follows : given the function f , deduce
all partial derivatives. If these look reasonable, i.e., they exist everywhere on S and are
continuous, then we will assume f to be differentiable.
Now define
R( x ) = ( f ( x ) − f ( x0 ) − D f ( x0 ) · ( x − x0 ) ) / ‖ x − x0 ‖
for x ∈ S \ { x0 }. ( The reason that we exclude x0 is that the quotient is not defined if x = x0 . )
Then we get for free that R( x ) → 0 as x → x0 . Also, by rearranging this definition we
immediately find the main formula
f ( x ) = f ( x0 ) + D f ( x0 ) · ( x − x0 ) + R( x ) ‖ x − x0 ‖.
So the only thing left to do is decide what to do with R( x0 ). But by defining R( x0 ) = 0 ( remember,
R( x ) was until now undefined for x = x0 ), everything fits nicely together.
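That R( x ) → 0 can also be observed numerically. A one-variable sketch in Python, with the hypothetical choice f ( x ) = eˣ and x0 = 0 ( so D f ( x0 ) = 1 ) :

```python
# With f(x) = exp(x) and x0 = 0 we have Df(x0) = 1, so
# R(x) = (f(x) - f(x0) - 1*(x - x0)) / |x - x0|, and R(x) -> 0 as x -> x0.
import math

def R(x, x0=0.0):
    return (math.exp(x) - math.exp(x0) - 1.0 * (x - x0)) / abs(x - x0)

remainders = [abs(R(10.0 ** (-k))) for k in range(1, 7)]
assert remainders == sorted(remainders, reverse=True)  # shrinking towards 0
assert remainders[-1] < 1e-5
```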
Remember that the open ball B( x, r ) with centre x ∈ Rn and radius r is the set B( x, r ) =
{ y ∈ Rn | d( x, y) < r }, where d( x, y) = ‖ x − y ‖ is the distance between x and y.
* For a set S ⊆ Rn , a point x ∈ S is in the interior of S if there exists an r > 0 such that
B( x, r ) ⊆ S.
The set of all interior points of S is denoted by int S.
Note that by this definition we see that a set S is open if and only if every point of S is an
interior point of S.
These are unifying definitions for several important maximum-concepts. Here we assume
that D ⊆ Rn is a non-empty set and f : D → R a real-valued function. Several of these can
also be found in Chapter 4 of the book :
* A point x ∈ D is a global maximum of f on D if for all y ∈ D we have f (y) ≤ f ( x ).
* A point x ∈ D is a local maximum of f on D if there exists an r > 0 such that for all
y ∈ D ∩ B( x, r ) we have f (y) ≤ f ( x ).
* A point x ∈ D is an unconstrained local maximum of f on D if there exists an r > 0 such
that for all y ∈ B( x, r ) we have y ∈ D and f (y) ≤ f ( x ).
Notice the slightly different conditions for the points y we need to consider in the definitions for
local maximum and unconstrained local maximum.
Two proofs of Theorem 4.1 can be found in Section 4.5 in the book. Here is another proof,
using Taylor's Theorem.
Proof  The fact that x* is an unconstrained local maximum of f on D means that there is an
r > 0 such that for all x ∈ B( x*, r ) we have that x ∈ D and f ( x ) ≤ f ( x* ).
From Taylor's Theorem we know that there is a function R1 : D → R such that R1 ( x* ) = 0,
R1 ( x ) → 0 as x → x*, and
f ( x ) = f ( x* ) + D f ( x* ) · ( x − x* ) + R1 ( x ) ‖ x − x* ‖.
Now suppose that D f ( x* ) ≠ 0. For easy notation say D f ( x* ) = a, where a ∈ Rn \ {0}. For
any real number t define x_t = x* + t a. Then we have
f ( x_t ) = f ( x* ) + a · ( x_t − x* ) + R1 ( x_t ) ‖ x_t − x* ‖
        = f ( x* ) + a · ( t a ) + R1 ( x* + t a ) ‖ t a ‖
        = f ( x* ) + t a · a + R1 ( x* + t a ) ‖ t a ‖.
Now a · a = ‖ a ‖² and ‖ t a ‖ = | t | ‖ a ‖. ( Notice that t is a real number and a a vector, so we
must treat them differently. ) So we can write
f ( x_t ) = f ( x* ) + t ‖ a ‖² + R1 ( x* + t a ) | t | ‖ a ‖
        = f ( x* ) + ‖ a ‖ ( t ‖ a ‖ + | t | R1 ( x* + t a ) ).
Now first look at the case where t > 0. Then | t | = t, so the formula becomes
f ( x_t ) = f ( x* ) + t ‖ a ‖ ( ‖ a ‖ + R1 ( x* + t a ) ).
Now use that for t small enough we know that ‖ x* − x_t ‖ < r, so that f ( x_t ) ≤ f ( x* ). But we
also know R1 ( x ) → 0 as x → x*, so again by taking t small enough we can be sure that
R1 ( x* + t a ) > −‖ a ‖, where we use that a ≠ 0, so ‖ a ‖ > 0. So for sufficiently small values
of t > 0 we get that
f ( x_t ) = f ( x* ) + t ‖ a ‖ ( ‖ a ‖ + R1 ( x* + t a ) )
        > f ( x* ) + t ‖ a ‖ ( ‖ a ‖ − ‖ a ‖ )
        = f ( x* ).
But this contradicts the fact that for all sufficiently small t we must have f ( x_t ) ≤ f ( x* ) !
The only place where we assumed something that could cause this contradiction is
the assumption that D f ( x* ) ≠ 0. So this assumption must be false, hence we must have
D f ( x* ) = 0.
We can do the case that x* is a local unconstrained minimum very fast in the same way by
looking at the case t < 0, hence | t | = −t.
Theorem 4.3 in the book describes the well-known Second-Order Conditions for an unconstrained maximum or minimum. We actually won't discuss these. The reason for that is
that the Second-Order Conditions say nothing about the possibility that a critical point is
a global optimum. Parts 3 and 4 of Theorem 4.3 can only be used to guarantee that a
certain critical point is a local maximum or minimum. And parts 1 and 2 can only be used
in the sense that if the Hessian D² f ( x ) is neither negative nor positive semidefinite, then x
cannot be a local maximum or minimum.
The general way to apply the First-Order Conditions for finding an optimum of f ( x ) on D
is as follows :
Determine if a maximum or minimum exists. Weierstrass Theorem can be used for
this. But it may also be possible to show that no maximum exists because you can show
that f ( x ) can get arbitrarily large for x ∈ D .
Determine the derivative D f on the interior of D .
Use the First-Order Conditions to find the critical points of f . These points are candidate
optima.
Since critical points must be interior points of D , you need to treat the non-interior points
of D in a different way. ( At this moment, we don't have a good method to do so, and
hence must rely on ad-hoc methods and common sense. )
If all candidate optima are known ( either because they are critical points, or from observing the non-interior points ), then calculate the function values at these points in order
to find a possible global maximum or minimum.
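Here is the recipe carried out in Python on a hypothetical problem : maximise f ( x ) = x e^{−x} on D = [0, ∞). D is not compact, but f ( x ) → 0 as x → ∞ while f (1) > 0, so a maximum exists; it is either a critical point in the interior of D or the non-interior point x = 0 :

```python
# First-order-condition recipe on the made-up problem: maximise x*exp(-x)
# on D = [0, oo).
import math

def f(x):
    return x * math.exp(-x)

def df(x):                    # derivative: f'(x) = (1 - x) * exp(-x)
    return (1.0 - x) * math.exp(-x)

# First-order condition: f'(x) = 0 forces x = 1 (exp(-x) is never 0).
critical_points = [1.0]
assert abs(df(1.0)) < 1e-15

candidates = critical_points + [0.0]   # add the non-interior point of D
best = max(candidates, key=f)
assert best == 1.0 and abs(f(best) - 1.0 / math.e) < 1e-12
```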
Notice that the Second-Order Conditions play no role in the recipe above. The reason for that
is already given above : the Second-Order Conditions say nothing about the possibility
that a critical point is a global optimum. And in general we are only interested in finding the
global optima.
We could use the Second-Order Conditions to decide that some critical points cannot be local
optima, hence not global optima. In other words, this might remove some critical points as
candidates for being an optimum. But we still have to do most steps above to determine the
global optima.
Unconstrained optimisation is the topic of Chapter 4 of the book. Everything in Sections 4.1, 4.2, and 4.4 is actually fairly good material to read. ( Even though Section 4.4
has the words Second-Order Conditions in its title, nothing is done with those. ) Together with what's in the notes, that should be more than enough. You can ignore the
fairly complicated proof in Section 4.5. But it would be nice if you have some idea what
is happening in the proofs in these notes, in particular how the Taylor approximations
very quickly lead to some good insights on what is happening in the neighbourhood of
specific points.
Finally, Sections 4.3 and 4.6 are completely irrelevant for us.
Optimisation Theory
MA 208
2007/08
Notes 4
Constrained Optimisation with Equality Constraints
Lagrange's Theorem and Method
4.1 Introduction
The kind of optimisation problems we will look at in these and the following notes are problems with a constraint set of the form
D = U ∩ { x ∈ Rn | g( x ) = 0, h( x ) ≥ 0 }.
Here U is an open set ( often the whole Rn ) and g, h are certain multidimensional functions :
g : Rn → Rk , for some k, and h : Rn → Rℓ , for some ℓ.
We will usually explicitly write the k components of g, and the ℓ components of h. So instead
of g( x ) = 0, h( x ) ≥ 0 we will write
g1 ( x ) = 0,
g2 ( x ) = 0,
...,
gk ( x ) = 0;
h1 ( x ) ≥ 0,
h2 ( x ) ≥ 0,
...,
hℓ ( x ) ≥ 0.
You may notice that there are no constraints of the form φi ( x ) > 0, i = 1, . . . , m. The reason
for this is that for reasonable functions φi : Rn → R ( for instance, if φi is continuous ) the set
{ x ∈ Rn | φ1 ( x ) > 0, . . . , φm ( x ) > 0 } will be an open set. That means that constraints of that
form will appear in the definition of the open set U.
The main reason to give the open set U such a separate role is that every point of an open
set is an interior point of that set. And we already know quite a lot about how to handle
optimisation for interior points from Notes 3.
It is often necessary to rewrite the constraint set to the standard form above. For instance, we
are asked to find the minimum of a certain function f : R3 → R when x ∈ D , defined by
g1 ( x, y, z) = x + z − 3; h1 ( x, y, z) = y + z² ; h2 ( x, y, z) = ln( x ) + y + z;
D = U ∩ { x ∈ R3 | g1 ( x ) = 0, h1 ( x ) ≥ 0, h2 ( x ) ≥ 0 }.
Author : Jan van den Heuvel
A formulation of Lagrange's Theorem can be found in Theorem 5.1 in the book. That formulation involves a rather technical condition " Suppose also that ρ( Dg( x* )) = k ", which we
will translate a little here.
For a matrix A, ρ( A) denotes the rank of A. A further definition and some properties can
be found in the book in Section 1.3.3. I advise you to read that section ( ignore the final
Theorem 1.43 ). In particular you should know that the column-rank of A ( the maximum
number of independent columns of A ) is equal to the row-rank of A ( the maximum number
of independent rows of A ); and that this number is called the rank of A.
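Numerically, the rank ρ( A ) can be computed with numpy; the matrix below is a made-up example with one dependent row :

```python
# numpy.linalg.matrix_rank computes the number of independent rows
# (equivalently, columns) of a matrix.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # = 2 * row 1, so dependent
              [0.0, 1.0, 1.0]])

assert np.linalg.matrix_rank(A) == 2       # row-rank ...
assert np.linalg.matrix_rank(A.T) == 2     # ... equals column-rank
```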
Using the observation above, Lagrange's Theorem can be written as follows. Recall that a
function is a C1 function if its derivative exists and the derivative is continuous everywhere
where the function is defined.
* Theorem ( Lagrange's Theorem )
Let f : U → R be a C1 function on a certain open set U ⊆ Rn , and let gi : Rn → R,
i = 1, . . . , k, be C1 functions. Suppose x* is a local maximum or minimum of f on the set
D = U ∩ { x ∈ Rn | gi ( x ) = 0, i = 1, . . . , k }.
Suppose also that the derivatives { Dg1 ( x* ), . . . , Dgk ( x* ) } form an independent set of
vectors.
Then there exist λ1 , . . . , λk ∈ R such that
D f ( x* ) + ∑_{i=1}^{k} λi Dgi ( x* ) = 0.
Note that Lagrange's Theorem only gives a necessary condition for a local minimum or
maximum.
Sketch of Proof  Use the notation from the statement of the theorem on the previous page,
and suppose x* is a local maximum of f on D . Using the first order Taylor Approximation
at x* we get that
f ( x ) = f ( x* ) + D f ( x* ) · ( x − x* ) + remainder,
where a more precise description of the remainder term can be found in Notes 3. Now write
x = x* + a, where ‖ a ‖ is small enough to guarantee that x* + a ∈ U. ( This is possible since U
is an open set. ) Then we get that
f ( x* + a) = f ( x* ) + D f ( x* ) · a + remainder.
Now recall that x* was a local maximum on D . Thus for small enough ‖ a ‖ with x* + a ∈ D
we must have that f ( x* + a) ≤ f ( x* ) and f ( x* − a) ≤ f ( x* ). If we substitute that in the
formula above, and forget about the remainder term, then this must mean that D f ( x* ) · a ≤ 0
and D f ( x* ) · (−a) ≤ 0. We get the following necessary condition :
(1)  D f ( x* ) · a = 0, for all a with ‖ a ‖ small such that x* + a ∈ D and x* − a ∈ D .
In order to get an idea what it means that x* + a ∈ D , we have a look at the constraint
functions g1 , . . . , gk . Their Taylor Approximation around x* , writing x = x* + a, looks as
gi ( x* + a) = gi ( x* ) + Dgi ( x* ) · a + remainder,
for i = 1, . . . , k.
But we know that x* ∈ D , hence gi ( x* ) = 0; and we are only interested in those a such that
x* + a ∈ D , hence such that gi ( x* + a) = 0. Filling this in into the formula above means that
we are only interested in those a such that 0 = 0 + Dgi ( x* ) · a + remainder, for i = 1, . . . , k.
Again ignoring the remainder term, this means that we get the following statement :
(2)  Dgi ( x* ) · a = 0, for i = 1, . . . , k.
Now note that if we have an a with D f ( x* ) · a = 0 or Dgi ( x* ) · a = 0, then the same holds for
every scalar multiple α a. So we can ignore the condition " ‖ a ‖ small ", to get the following
combination of statements (1) and (2) :
(3)  for all a ∈ Rn : if Dgi ( x* ) · a = 0 for all i = 1, . . . , k, then D f ( x* ) · a = 0.
In a lemma below we will show that the only way that statement (3) can be true is if D f ( x* )
is a linear combination of Dg1 ( x* ), . . . , Dgk ( x* ). So there must exist λ1 , . . . , λk ∈ R such that
D f ( x* ) = ∑_{i=1}^{k} λi Dgi ( x* ).
If you go through the sketch of the proof of Lagrange's Theorem in Section 4.3 above, then
you may notice that the Constraint Qualification doesn't seem to play a role there. The
reason is that we did some hand-waving at a couple of places. In particular, we neglected
the remainder terms in the first order Taylor Approximations for the constraint functions gi .
But if, for instance, the constraint function has Dgi ( x* ) = 0, then the remainder term is
actually the most important term in deciding if x* + a ∈ D . So ignoring it in that case makes
the rest of the argument pretty useless.
Something similar, but a bit more subtle, happens when the vectors { Dg1 ( x* ), . . . , Dgk ( x* ) }
are not independent.
Sketch of proof  We give a sketch of the proof of the statement above using again the first
order Taylor Approximations of the functions involved. We assume that we replaced the j-th
constraint g_j ( x ) = 0 by g_j ( x ) + ε = 0, for some small ε. In other words, we use a new j-th
constraint function g_j^(ε) = g_j + ε, with a corresponding new
local optimum x^(ε) .
We first look at the Taylor Approximations at x* for the constraint functions :
gi ( x ) = gi ( x* ) + Dgi ( x* ) · ( x − x* ) + remainder,
for i = 1, . . . , k.
For the constraints that haven't changed, we have that both gi ( x* ) = 0 and gi ( x^(ε) ) = 0. If
we fill in x = x^(ε) into the formula above, using the knowledge from the previous sentence,
and neglecting the remainder term, we get
(4)
Dgi ( x* ) · ( x^(ε) − x* ) ≈ 0,
for i = 1, . . . , k, i ≠ j.
For the new and old j-th constraint we need that g_j ( x* ) = 0 and g_j^(ε) ( x^(ε) ) = 0, which gives
0 = g_j^(ε) ( x^(ε) ) = g_j ( x^(ε) ) + ε = g_j ( x* ) + Dg_j ( x* ) · ( x^(ε) − x* ) + ε + remainder
= 0 + Dg_j ( x* ) · ( x^(ε) − x* ) + ε + remainder.
Again neglecting the remainder term, we find that
(5)
Dg_j ( x* ) · ( x^(ε) − x* ) ≈ −ε.
Now use that we assume that x* satisfies the condition D f ( x* ) + ∑_{i=1}^{k} λi Dgi ( x* ) = 0, hence
D f ( x* ) = −∑_{i=1}^{k} λi Dgi ( x* ). Using the Taylor Approximation of f at x* , ignoring the remainder term, and using the knowledge in (4) and (5), we get
f ( x^(ε) ) ≈ f ( x* ) + D f ( x* ) · ( x^(ε) − x* )
= f ( x* ) − ∑_{i=1}^{k} λi Dgi ( x* ) · ( x^(ε) − x* )
≈ f ( x* ) − ∑_{i=1, i≠j}^{k} λi · 0 − λ_j · (−ε)
= f ( x* ) + λ_j ε.
This proves the statement.
The Cookbook Procedure in Section 5.4.1 is simply trying to find the points x ∈ Rn and
the Lagrangean Multipliers λ1 , . . . , λk such that g1 ( x ) = 0, . . . , gk ( x ) = 0, and D f ( x ) +
∑_{i=1}^{k} λi Dgi ( x ) = 0. The last equation is actually a vector equation, involving vectors with n
coordinates, so the system can also be written as
gi ( x ) = 0,
for all i = 1, . . . , k,
∂ f ( x )/∂x_j + ∑_{i=1}^{k} λi ∂gi ( x )/∂x_j = 0,
for all j = 1, . . . , n.
This system is often described using the Lagrangean function
L( x, λ ) = f ( x ) + ∑_{i=1}^{k} λi gi ( x ),
in terms of which the equations become
∂L( x*, λ* )/∂λi = 0,
for all i = 1, . . . , k,
∂L( x*, λ* )/∂x_j = 0,
for all j = 1, . . . , n.
The Cookbook Procedure involves solving a system of n + k equations, where there are
n + k unknowns x1 , . . . , xn , λ1 , . . . , λk . It is not always easy to find all solutions.
Moreover, all solutions to the Lagrangean equations are only candidates for local or global
optima; you still need to find out the true nature of these points.
And finally, the procedure doesn't work for points where the Constraint Qualification is not
satisfied. These points should be identified separately, and all points for which the Constraint
Qualification is not satisfied should be added to the set of candidates for the optima.
And really finally, don't forget that there may be an open set U used in the definition of D .
If you find a point x that satisfies the Lagrangean equations above and the Constraint Qualification, but lies outside U, then it should not be considered as a candidate optimum.
The problem with the Cookbook Procedure from the book is that it ignores some facts ( see
the previous paragraph ). So here is an improved recipe for solving equality constrained
optimisation problems.
Given : the optimisation problem
max/min-imise f ( x ) subject to x ∈ D = U ∩ { x ∈ Rn | g1 ( x ) = 0, . . . , gk ( x ) = 0 },
where f , gi : Rn → R are C1 functions, and U ⊆ Rn is an open set.
1. If possible, find a good reason why a maximum or minimum must exist. For this, Weierstrass Theorem would be a prime source of knowledge, but sometimes ad-hoc methods
will be required.
2. Determine the derivatives Dg1 ( x ), . . . , Dgk ( x ) of the constraint functions. Try to find all
points in D for which the vectors { Dg1 ( x ), . . . , Dgk ( x ) } are not independent.
3. Determine the derivative D f ( x ) of the objective function and formulate the Lagrangean
equations :
gi ( x ) = 0,
for all i = 1, . . . , k,
∂ f ( x )/∂x_j + ∑_{i=1}^{k} λi ∂gi ( x )/∂x_j = 0,
for all j = 1, . . . , n.
4. Find all values x ∈ U and multipliers λ1 , . . . , λk for which the equations above are satisfied.
At this point you should have a collection of candidates for the optima : the points from 2
for which the Constraint Qualification failed and the points x in 4 satisfying the Lagrangean equations. No other point can be a maximum or minimum of f on D .
5. If you know from step 1 that a maximum or minimum must exist, then calculate the
function values for all candidate points from above. The points x which give the maximal
value f ( x ) must form the global maxima. And similarly for the global minima.
If you haven't been able in step 1 to guarantee the existence of a maximum or minimum,
then you probably have to do some more work. Check the candidate points and see
which could be a global maximum or minimum and why ( or why not ).
If you haven't been able in step 1 to guarantee the existence of a maximum or minimum,
and no candidate points are found in steps 2 or 4, then no maximum or minimum exists.
It may be a good idea to check if you can confirm that using some other reasons. ( For
instance, the function has no upper and lower bound on D . )
If no candidate point is left from steps 2 and 4, but you claimed in step 1 that a maximum
or minimum must exist, then there is something seriously wrong. Check your work and
try to find the mistake(s).
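A sketch of how the recipe might be carried out with a computer algebra system. The problem ( maximise f ( x, y ) = x y subject to x + y − 2 = 0, with U = R² ) is a hypothetical example, not one from the book :

```python
# Lagrangean recipe on the made-up problem: optimise f(x, y) = x*y
# subject to g(x, y) = x + y - 2 = 0 (U = R^2).
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x * y
g = x + y - 2

# Step 2: Dg = (1, 1) is never the zero vector, so the Constraint
# Qualification holds everywhere; no extra candidates from this step.

# Steps 3-4: solve the Lagrangean equations g = 0 and Df + lam*Dg = 0.
eqs = [g,
       sp.diff(f, x) + lam * sp.diff(g, x),
       sp.diff(f, y) + lam * sp.diff(g, y)]
sols = sp.solve(eqs, [x, y, lam], dict=True)

assert sols == [{x: 1, y: 1, lam: -1}]   # the only candidate: (1, 1)
assert f.subs(sols[0]) == 1              # with function value f(1, 1) = 1
```

Whether the candidate really is a maximum still needs a separate argument ( here, on the line x + y = 2 the objective is x (2 − x ), a downward parabola, so (1, 1) is the global maximum ).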
You can ignore Sections 5.3 and 5.7 on the Second-Order Conditions. Also the proof in 5.6
is beyond our reach. Look at the sketch of the proof in Section 4.3. Again, don't learn that
proof by heart, but try to get an understanding of the main ideas, in particular the use
of the Taylor Approximations. A similar remark holds for the role of the Lagrangean
Multipliers in Section 4.5.
Optimisation Theory
MA 208
2007/08
Notes 5
Constrained Optimisation with Inequality Constraints
Kuhn-Tucker's Theorem
5.1 Introduction
D = U ∩ { x ∈ Rn | g_j ( x ) = 0, j = 1, . . . , k; hi ( x ) ≥ 0, i = 1, . . . , ℓ },
called an optimisation problem with mixed constraints.
Here U ⊆ Rn is an open set and the functions g_j , hi are assumed to be C1 , i.e., the functions
have continuous first derivatives.
Section 6.1.1 contains the statement of Kuhn-Tucker's Theorem. We need one extra definition : An inequality constraint hi ( x ) ≥ 0 is said to be effective at a certain point x* if we have
hi ( x* ) = 0, i.e., if the constraint holds with equality at x* .
* Theorem ( Kuhn-Tucker's Theorem for Local Maxima )
Let f : U → R be a C1 function on a certain open set U ⊆ Rn , and let hi : Rn → R,
i = 1, . . . , ℓ, be C1 functions. Suppose x* is a local maximum of f on the set
D = U ∩ { x ∈ Rn | hi ( x ) ≥ 0, i = 1, . . . , ℓ }.
Let E ⊆ {1, . . . , ℓ} denote the set of effective constraints at x* . Suppose that the derivatives { Dhi ( x* ) | i ∈ E } form an independent set of vectors.
Then there exist λ1 , . . . , λℓ ∈ R such that
λi ≥ 0,
for i = 1, . . . , ℓ;
λi hi ( x* ) = 0,
for i = 1, . . . , ℓ;
D f ( x* ) + ∑_{i=1}^{ℓ} λi Dhi ( x* ) = 0.
* Theorem ( Kuhn-Tucker's Theorem for Local Minima )
Let f : U → R be a C1 function on a certain open set U ⊆ Rn , and let hi : Rn → R,
i = 1, . . . , ℓ, be C1 functions. Suppose x* is a local minimum of f on the set
D = U ∩ { x ∈ Rn | hi ( x ) ≥ 0, i = 1, . . . , ℓ }.
Let E ⊆ {1, . . . , ℓ} denote the set of effective constraints at x* . Suppose that the derivatives { Dhi ( x* ) | i ∈ E } form an independent set of vectors.
Then there exist λ1 , . . . , λℓ ∈ R such that
λi ≥ 0,
for i = 1, . . . , ℓ;
λi hi ( x* ) = 0,
for i = 1, . . . , ℓ;
D f ( x* ) − ∑_{i=1}^{ℓ} λi Dhi ( x* ) = 0.
Note that the Kuhn-Tucker Theorems for maxima and for minima are not exactly the same.
So in order to find both the maximum and the minimum for an inequality constrained optimisation problem, you must do certain steps twice.
The conditions that λi hi ( x* ) = 0 for all i are called the complementary slackness conditions.
Since we must have that hi ( x* ) ≥ 0 and λi ≥ 0, we can only have slack in one of the conditions ( i.e., λi > 0 or hi ( x* ) > 0 ) if the other condition is satisfied with equality ( i.e.,
hi ( x* ) = 0 or λi = 0, respectively ).
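These conditions are easy to verify numerically once a candidate point is known. A hypothetical example in Python : maximise f ( x, y ) = x + y subject to h1 ( x, y ) = 1 − x² − y² ≥ 0 and h2 ( x, y ) = x + 5 ≥ 0; the maximum sits at x* = (1/√2, 1/√2 ), where only h1 is effective :

```python
# Checking the Kuhn-Tucker conditions (including complementary slackness)
# at a known candidate point of a made-up problem.
import math

s = 1.0 / math.sqrt(2.0)
xs, ys = s, s                  # the maximum x*

h1 = 1.0 - xs**2 - ys**2       # = 0: effective
h2 = xs + 5.0                  # > 0: not effective
Df = (1.0, 1.0)
Dh1 = (-2.0 * xs, -2.0 * ys)
Dh2 = (1.0, 0.0)

lam1 = 1.0 / (2.0 * xs)        # solves Df + lam1*Dh1 + lam2*Dh2 = 0
lam2 = 0.0                     # slack constraint gets multiplier 0

assert lam1 >= 0 and lam2 >= 0
assert abs(lam1 * h1) < 1e-12 and abs(lam2 * h2) < 1e-12  # compl. slackness
for j in range(2):
    assert abs(Df[j] + lam1 * Dh1[j] + lam2 * Dh2[j]) < 1e-12
```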
Sketch of Proof  We use the notation from the statement of the theorem above. In particular,
x* is a local maximum of f on D and E is the set of effective constraints, i.e., E = { i |
hi ( x* ) = 0 }. This means that hi ( x* ) > 0 for all i ∉ E. Since the hi are continuous functions,
the sets { x ∈ Rn | hi ( x ) > 0 } are open for all i ∉ E. So if we define
U′ = U ∩ { x ∈ Rn | hi ( x ) > 0, i ∉ E },
and
D′ = U′ ∩ { x ∈ Rn | hi ( x ) ≥ 0, i ∈ E },
then U′ is an open set, and x* ∈ D′ ⊆ D . Since x* is a maximum of f on D , it certainly
is a maximum of f on D′ . In the remainder of the proof we forget about the non-effective
constraints, because they are taken care of by U′ .
The only thing we have to do is to say what the values of λi are for i ∉ E. We just take
λi = 0,
for all i ∉ E;
in that way we are sure that the conditions λi ≥ 0 and λi hi ( x* ) = 0 are satisfied for i ∉ E.
Now we look back to the start of the proof of Lagrange's Theorem in Notes 4. The Taylor
Approximation around x* of f , writing x = x* + a, is
f ( x* + a) = f ( x* ) + D f ( x* ) · a + remainder.
Recall that x* is a maximum of f on D′ . Thus for small enough ‖ a ‖ with x* + a ∈ D′ we
must have that f ( x* + a) ≤ f ( x* ). If we substitute that in the formula above, and forget
about the remainder term, then this must mean that D f ( x* ) · a ≤ 0. So we get the following
necessary condition :
(1)  D f ( x* ) · a ≤ 0, for all a with ‖ a ‖ small such that x* + a ∈ D′ .
Next we look at the Taylor Approximations of the effective constraint functions :
hi ( x* + a) = hi ( x* ) + Dhi ( x* ) · a + remainder,
for i ∈ E.
But we know that hi ( x* ) = 0; and we are only interested in those a such that x* + a ∈ D′ ,
hence such that hi ( x* + a) ≥ 0, for i ∈ E. Filling this in into the formula above means that
we are only interested in those a such that 0 + Dhi ( x* ) · a + remainder ≥ 0, for i ∈ E. If we
ignore the remainder term, then this gives the following statement :
(2)  Dhi ( x* ) · a ≥ 0, for i ∈ E.
Note that if we have an a with D f ( x* ) · a ≤ 0 or Dhi ( x* ) · a ≥ 0, then the same holds for
every scalar multiple α a with α ≥ 0. So we can ignore the condition " ‖ a ‖ small ", to get the
following combination of statements (1) and (2) :
(3)  for all a ∈ Rn : if Dhi ( x* ) · a ≥ 0 for all i ∈ E, then D f ( x* ) · a ≤ 0.
In a lemma below we will show that the only way that statement (3) can be true is if
(4)
D f ( x* ) = −∑_{i∈E} λi Dhi ( x* ),
with λi ≥ 0.
So the only thing left to show is that the last equation in Kuhn-Tucker's Theorem is satisfied.
But that is now very easy, using (4) and the fact that λi = 0 for i ∉ E :
D f ( x* ) + ∑_{i=1}^{ℓ} λi Dhi ( x* )
= D f ( x* ) + ∑_{i∈E} λi Dhi ( x* ) + ∑_{i∉E} 0 · Dhi ( x* )
= D f ( x* ) − D f ( x* ) + 0 = 0,
which completes the proof.
for i = 2, . . . , m.
0 ≤ x · v2 = ∑_{i=1}^{m} λi yi · v2 = λ1 y1 · v2 = λ1 ‖ v2 ‖².
D = U ∩ { x ∈ Rn | g1 ( x ) = 0, . . . , gk ( x ) = 0, h1 ( x ) ≥ 0, . . . , hℓ ( x ) ≥ 0 }.
To make the notation not too cumbersome, we define functions ψi : Rn → R for i =
1, . . . , k + ℓ, where
ψi = gi ,
for i = 1, . . . , k,
ψi = h_{i−k} ,
for i = k + 1, . . . , k + ℓ.
We also only formulate the result for maxima. As before, a constraint ψi is effective at a point
x ∈ D if ψi ( x ) = 0. This means that the constraints 1, . . . , k are always effective.
* Theorem
Let f : U → R be a C1 function on a certain open set U ⊆ Rn , and let ψi : Rn → R,
i = 1, . . . , k + ℓ, be C1 functions. Suppose x* is a local maximum of f on the set
D = U ∩ { x ∈ Rn | ψi ( x ) = 0, i = 1, . . . , k; ψj ( x ) ≥ 0, j = k + 1, . . . , k + ℓ }.
Let E ⊆ {1, . . . , k + ℓ} denote the set of effective constraints at x* . Suppose that the
derivatives { Dψi ( x* ) | i ∈ E } form an independent set of vectors.
Then there exist λ1 , . . . , λ_{k+ℓ} ∈ R such that
λj ≥ 0,
for j = k + 1, . . . , k + ℓ;
λj ψj ( x* ) = 0,
for j = k + 1, . . . , k + ℓ;
D f ( x* ) + ∑_{i=1}^{k+ℓ} λi Dψi ( x* ) = 0.
More precisely, we say that the Constraint Qualification fails for a certain point x ∈ D if the
following holds : First, let E ⊆ {1, . . . , ℓ} be the set of effective constraints, so hi ( x ) = 0 if
and only if i ∈ E. And then the Constraint Qualification fails if { Dhi ( x ) | i ∈ E } is not an
independent set.
So how do you check if the Constraint Qualification holds on D , or for which x ∈ D it fails ?
The main problem is that there are many possibilities for the set E of effective constraints.
For instance, consider the set D = { ( x, y) ∈ R² | x + y ≥ 0, x² ≥ 0 }. Then (0, 0) ∈ D
with h1 (0, 0) = 0 and h2 (0, 0) = 0, so E = {1, 2} for x = (0, 0). But also (1, −1) ∈ D with
h1 (1, −1) = 0 but h2 (1, −1) > 0, so E = {1} for x = (1, −1). And finally, (1, 1) ∈ D with
h1 (1, 1) > 0 and h2 (1, 1) > 0, so E = ∅ in this case.
Because of the different possibilities for E, checking for which x ∈ D the Constraint Qualification fails is quite some work. Here is a tedious, but safe, method :
1. Write down each subset of {1, . . . , ℓ}, except the empty set. So if ℓ = 3, then we get
the subsets { h1 }, { h2 }, { h3 }, { h1 , h2 }, { h1 , h3 }, { h2 , h3 }, and { h1 , h2 , h3 }.
2. For each of the subsets E from step 1, see if there are points x ∈ D such that
hi ( x ) = 0,
for all i ∈ E,
and
hi ( x ) > 0,
for all i ∉ E,
and
{ Dhi ( x ) | i ∈ E } is dependent.
In general, there will be many E for which there is no x satisfying this. In particular, for
many E it won't be possible to have a point x satisfying hi ( x ) = 0 for all i ∈ E.
If you are working with a mixed constraint problem, then the procedure above should be
adapted to take into account that every equality constraint gi ( x ) = 0 is always effective, so
you should only consider sets E that include all equality constraints.
Every point x ∈ D found in the procedure above is a potential problem case in Kuhn-Tucker's Theorem. So it has to be considered as a candidate maximum or minimum, until
we have a good reason to remove it from the list of such candidates.
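The enumeration in this procedure can be automated. A Python sketch for the earlier example D = { ( x, y ) | h1 = x + y ≥ 0, h2 = x² ≥ 0 } ( the helper names are mine, not the book's ) :

```python
# Enumerate the non-empty subsets E of constraints and test, at a given
# point, whether the derivatives { Dh_i | i in E } are dependent.
from itertools import combinations
import numpy as np

def Dh(x, y, i):
    return {1: np.array([1.0, 1.0]),          # Dh1 = (1, 1)
            2: np.array([2.0 * x, 0.0])}[i]   # Dh2 = (2x, 0)

def cq_fails_at(x, y, E):
    rows = np.array([Dh(x, y, i) for i in E])
    return np.linalg.matrix_rank(rows) < len(E)   # dependent <=> rank drop

subsets = [E for r in (1, 2) for E in combinations((1, 2), r)]
assert subsets == [(1,), (2,), (1, 2)]

# At (0, 0) both constraints are effective, and Dh2(0, 0) = (0, 0) makes
# the set {Dh1, Dh2} dependent: the Constraint Qualification fails there.
assert cq_fails_at(0.0, 0.0, (2,))     # {Dh2} = {(0, 0)} alone is dependent
assert cq_fails_at(0.0, 0.0, (1, 2))
# At (1, -1) only h1 is effective, and {Dh1} is independent: CQ holds.
assert not cq_fails_at(1.0, -1.0, (1,))
```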
The result above also holds for mixed constraint problems, where you must realise that all
equality constraints are always effective. The main difference between equality and inequality constraints is that for the latter ones we know that λj ≥ 0. In particular that means that
if ε > 0, then the maximum changes from f ( x* ) to approximately f ( x* ) + λj ε, hence will increase, or stay
the same if λj = 0.
For the multiplier λj of an equality constraint we don't know the sign of λj , so the difference
λj ε can be positive or negative or 0.
The Cookbook Procedure in Section 6.2.1 for using Kuhn-Tucker's Theorem to find a maximum can be described as follows :
1. If possible, find a good reason why a maximum must exist. For this, Weierstrass Theorem
would be a prime source of knowledge, but sometimes ad-hoc methods will be required.
2. Determine the derivatives Dh1 ( x ), . . . , Dhℓ ( x ) of the constraint functions.
Determine all points for which the Constraint Qualification fails. Unless there are obvious
shortcuts, this has to be done by looking at every possible combination of constraints
using the procedure from Section 5.5 in these notes.
Any point in which the Constraint Qualification fails must be considered as a candidate
optimum.
3. Determine the derivative D f ( x ) of the objective function and formulate the Kuhn-Tucker
equations :
λi ≥ 0 and λi hi ( x ) = 0,
for i = 1, . . . , ℓ;
D f ( x ) + ∑_{i=1}^{ℓ} λi Dhi ( x ) = 0.
4. Find all values x U and multipliers 1 , . . . , k for which the equations above are satisfied.
At this point you should have a collection of candidates for the maxima : the points from 2
for which the Constraint Qualification failed and the points x in 4 satisfying the Lagrangean equations. No other point can be a maximum of f on D .
5. If you know from step 1 that a maximum must exist, then calculate the function values
for all candidate points from above. The points x which give the maximal value f ( x )
must form a global maximum.
If you haven't been able in step 1 to guarantee the existence of a maximum, then you probably have to do some more work. Check the candidate points and see which could be a global maximum and why ( or why not ).
If you haven't been able in step 1 to guarantee the existence of a maximum, and no candidate points are found in steps 2 or 4, then no maximum and no minimum exists. It may be a good idea to check if you can confirm that using some other reasoning. ( For instance, the function has no upper bound on D. )
If no candidate point is left from steps 2 and 4, but you claimed in step 1 that a maximum must exist, then there is something seriously wrong. Check your work and try to find the mistake(s).
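As an illustration, the steps above can be run through on a small made-up problem ( my own example, not one from these notes ) : maximise f(x) = −(x − 2)² subject to the single constraint h(x) = 1 − x ≥ 0. A minimal sketch in Python :

```python
# Hypothetical example (my own, not from the notes):
# maximise f(x) = -(x - 2)^2  subject to  h(x) = 1 - x >= 0.
# Kuhn-Tucker: lam >= 0, lam * h(x) = 0, f'(x) + lam * h'(x) = 0.

def f(x):
    return -(x - 2) ** 2

# Step 1: a maximum exists, since f is continuous and f(x) -> -infinity.
# Step 2: Dh(x) = -1 is never 0, so the Constraint Qualification never fails.
candidates = []

# Step 4, case lam = 0: stationarity gives -2(x - 2) = 0, so x = 2.
x = 2.0
if 1 - x >= 0:                # infeasible here, so x = 2 is discarded
    candidates.append(x)

# Step 4, case h(x) = 0: x = 1; from -2(x - 2) - lam = 0 we get lam = 2(2 - x).
x = 1.0
lam = 2 * (2 - x)
if lam >= 0:                  # lam = 2 >= 0, so x = 1 is a candidate
    candidates.append(x)

# Step 5: compare the function values of all candidates.
best = max(candidates, key=f)
print(best, f(best))          # 1.0 -1.0
```

The case split on λ = 0 versus h(x) = 0 is exactly the condition λ · h(x) = 0 from the Kuhn-Tucker equations.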
In order to find a minimum, the procedure above needs some small adaptation, although the critical step 2 is identical. In fact, if we want to find both a maximum and a minimum, then step 2 needs to be done only once.
For mixed constraint optimisation problems, some further adaptation is required.
For a problem with inequality constraints g_1(x) ≥ 0, . . . , g_ℓ(x) ≥ 0, the conditions can be expressed using the Lagrangean

L(x, λ) = f(x) + Σ_{i=1}^ℓ λ_i g_i(x),

as

∂L/∂λ_i (x∗, λ∗) ≥ 0,  λ_i ≥ 0,  λ_i · ∂L/∂λ_i (x∗, λ∗) = 0,  for i = 1, . . . , ℓ,
∂L/∂x_j (x∗, λ∗) = 0,  for j = 1, . . . , n.
Note that no Second Order Conditions for inequality constrained optimisation problems are described in the book. Conditions of that type exist in the literature, but they are so cumbersome that they are only of theoretical interest.
You can also note from the description above, and from all examples in the book, that Kuhn-Tucker's Theorem is only used when searching for global optima. Local optima are usually discarded as being too hard to guarantee and of little interest.
Optimisation Theory
2007/08
MA 208
Notes 6
Linear Programming and Duality
6.1 Introduction
This topic doesn't appear in the book, so you'll have to do with these notes and the lectures and classes.
A general linear programming problem ( LP-problem ) is a constrained optimisation problem

maximise or minimise f(x)
subject to g_1(x) ≥ 0, . . . , g_k(x) ≥ 0,
h_1(x) = 0, . . . , h_ℓ(x) = 0,       (1)

where the objective function f and all constraints g_1 , . . . , g_k and h_1 , . . . , h_ℓ are linear functions.
The above gives the general form for an LP-problem. But for the rest of these notes we will always assume that an LP-problem has the following standard form :
maximise c_1 x_1 + c_2 x_2 + · · · + c_n x_n
subject to
a_1^(1) x_1 + a_2^(1) x_2 + · · · + a_n^(1) x_n ≤ b_1 ,
a_1^(2) x_1 + a_2^(2) x_2 + · · · + a_n^(2) x_n ≤ b_2 ,
. . .       (2)
a_1^(m) x_1 + a_2^(m) x_2 + · · · + a_n^(m) x_n ≤ b_m ,
x_1 ≥ 0, x_2 ≥ 0, . . . , x_n ≥ 0.
Of course, not every linear programming problem in general format (1) will look like the standard form in (2). In this paragraph we will show how to transform a linear programming problem in the more general form (1) into the form prescribed in (2).
If we want to minimise f(x), then this is equivalent to maximising − f(x). Similarly, maximising a_1 x_1 + · · · + a_n x_n + b will give the same solutions as maximising a_1 x_1 + · · · + a_n x_n , where only the function value in the maxima will differ by b.
Next it is straightforward that a constraint of the form h(x) ≥ 0, where h is a linear function, can always be written as a_1 x_1 + · · · + a_n x_n ≤ b, for constants a_1 , . . . , a_n and b.
So we are left with the problem what to do with equality constraints g(x) = 0, where g is a linear function, and how to make sure that all constraints of the form x_i ≥ 0 are present.
First assume we have an equality constraint of the form g(x) = 0 in (1), where g is a linear function. In other words, we have a constraint of the form a_1 x_1 + · · · + a_n x_n + b = 0 for some constants a_1 , . . . , a_n and b. We can assume that not all a_i are 0, otherwise we have something that is always true ( if b = 0; and hence would be a constraint we can ignore ), or that is always violated ( if b ≠ 0, which would mean that D = ∅ and the whole problem has no solution ).
So take any such a_i ≠ 0. For simplicity we assume a_n ≠ 0. Then a_1 x_1 + · · · + a_n x_n + b = 0 is the same as saying x_n = (− b − a_1 x_1 − · · · − a_{n−1} x_{n−1})/a_n . Then by just substituting this value for x_n in the objective function and in all constraints, we get a new linear programming problem. This new problem no longer has the equality a_1 x_1 + · · · + a_n x_n + b = 0.
But note that x_n ≥ 0 gets translated to (− b − a_1 x_1 − · · · − a_{n−1} x_{n−1})/a_n ≥ 0, which becomes ( a_1 /a_n ) x_1 + · · · + ( a_{n−1} /a_n ) x_{n−1} ≤ − b/a_n ( if a_n > 0; for a_n < 0 the inequality is reversed ). So we lose the inequality x_n ≥ 0, but must add a new inequality instead.
This way we get a new system which is one dimension lower than the original was, because x_n has disappeared.
It's somewhat trickier to make sure that all constraints of the form x_i ≥ 0 are really present.
First notice that any a ∈ R can be written as a = b − c for some b, c ∈ R with b, c ≥ 0 ( for instance, take b = max{0, a} and c = max{0, − a} ). In fact, there are always many possibilities for b and c, since if a = b − c with b, c ≥ 0, then also a = (b + 1) − (c + 1) with b + 1, c + 1 ≥ 0.
Anyhow, if the constraint x_i ≥ 0 is not one of the constraints in (1), then we write every occurrence of x_i in the objective function and the constraints as x_i = x_i^+ − x_i^− , and we add the two constraints x_i^+ ≥ 0 and x_i^− ≥ 0. This way we get a new linear programming problem with one more variable ( x_i is replaced by x_i^+ and x_i^− ), and two further constraints x_i^+ ≥ 0 and x_i^− ≥ 0.
By repeatedly applying the procedures above, any linear program of the general form (1) can be transformed into an equivalent LP-problem in the standard form (2); possibly with a different number of variables and/or constraints. Once we're done analysing the LP-problem in standard form ( for instance we've found a maximum ), then we can do the inverse of the procedures above to obtain the analysis of the original problem.
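Two of the transformations above are mechanical enough to write down as code. A minimal Python sketch ( the function names are my own invention ) :

```python
# Two transformations from the text (function names are my own):
# (i)  minimising f is equivalent to maximising -f;
# (ii) a variable a without a sign constraint can be written as a = b - c
#      with b, c >= 0, e.g. b = max{0, a} and c = max{0, -a}.

def minimise_to_maximise(c):
    """Coefficients of 'minimise c . x' rewritten as 'maximise (-c) . x'."""
    return [-ci for ci in c]

def split_free_variable(a):
    """Return (b, c) with b, c >= 0 and a = b - c."""
    return max(0.0, a), max(0.0, -a)

print(minimise_to_maximise([2, -1, 0]))   # [-2, 1, 0]
b, c = split_free_variable(-3.5)
print(b, c)                               # 0.0 3.5
```

Applying these, together with the substitution step for equality constraints, is what turns a general problem (1) into the standard form (2).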
In order to find critical points according to the Kuhn-Tucker equations we first rewrite an
LP-problem in standard form to the form we need for Kuhn-Tucker :
maximise c_1 x_1 + c_2 x_2 + · · · + c_n x_n
subject to
b_1 − a_1^(1) x_1 − a_2^(1) x_2 − · · · − a_n^(1) x_n ≥ 0,
b_2 − a_1^(2) x_1 − a_2^(2) x_2 − · · · − a_n^(2) x_n ≥ 0,
. . .
b_m − a_1^(m) x_1 − a_2^(m) x_2 − · · · − a_n^(m) x_n ≥ 0,
x_1 ≥ 0, x_2 ≥ 0, . . . , x_n ≥ 0.
Since there are clearly two different types of constraints, we also will use two names for the Kuhn-Tucker multipliers : λ_j for the constraints b_j − a_1^(j) x_1 − · · · − a_n^(j) x_n ≥ 0 and multipliers μ_i for the constraints x_i ≥ 0.
Using Kuhn-Tucker's Theorem, and ignoring the Constraint Qualifications for the moment, we see that in order for ( x_1 , . . . , x_n ) to be a maximum, we must be able to find ( λ_1 , . . . , λ_m , μ_1 , . . . , μ_n ) such that
λ_j ≥ 0,   for j = 1, . . . , m,   (3a)
b_j − a_1^(j) x_1 − · · · − a_n^(j) x_n ≥ 0,   for j = 1, . . . , m,   (3b)
λ_j ( b_j − a_1^(j) x_1 − · · · − a_n^(j) x_n ) = 0,   for j = 1, . . . , m,   (3c)
μ_i ≥ 0,   for i = 1, . . . , n,   (3d)
x_i ≥ 0,   for i = 1, . . . , n,   (3e)
μ_i x_i = 0,   for i = 1, . . . , n,   (3f)
c_i − Σ_{j=1}^m λ_j a_i^(j) + μ_i = 0,   for i = 1, . . . , n.   (3g)

From (3g) we obtain

μ_i = Σ_{j=1}^m λ_j a_i^(j) − c_i = a_i^(1) λ_1 + · · · + a_i^(m) λ_m − c_i ,   for i = 1, . . . , n.
Substituting this into (3d) and (3f), rearranging (3a) – (3f), and renaming the λ_j to y_j , we get that if ( x_1 , . . . , x_n ) is a maximum of the LP-problem (2) in standard form, then there must exist ( y_1 , . . . , y_m ) such that

y_j ≥ 0,   for j = 1, . . . , m,   (4a)
b_j − a_1^(j) x_1 − · · · − a_n^(j) x_n ≥ 0,   for j = 1, . . . , m,   (4b)
y_j ( b_j − a_1^(j) x_1 − · · · − a_n^(j) x_n ) = 0,   for j = 1, . . . , m,   (4c)
x_i ≥ 0,   for i = 1, . . . , n,   (4d)
a_i^(1) y_1 + · · · + a_i^(m) y_m − c_i ≥ 0,   for i = 1, . . . , n,   (4e)
x_i ( a_i^(1) y_1 + · · · + a_i^(m) y_m − c_i ) = 0,   for i = 1, . . . , n.   (4f)
So to summarise we get :
* If ( x_1 , . . . , x_n ) is a maximum of the LP-problem in (2), then there exist ( y_1 , . . . , y_m ) such that the equations (4a) – (4f) are satisfied.
If you look at the equations (4a) – (4f), you notice a remarkable symmetry between the x_i and the y_j . In fact, we would obtain exactly the same equations if we analysed what it means for an m-dimensional vector ( y_1 , . . . , y_m ) to be the solution to the following problem :
minimise b_1 y_1 + b_2 y_2 + · · · + b_m y_m
subject to
a_1^(1) y_1 + a_1^(2) y_2 + · · · + a_1^(m) y_m ≥ c_1 ,
a_2^(1) y_1 + a_2^(2) y_2 + · · · + a_2^(m) y_m ≥ c_2 ,
. . .       (5)
a_n^(1) y_1 + a_n^(2) y_2 + · · · + a_n^(m) y_m ≥ c_n ,
y_1 ≥ 0, y_2 ≥ 0, . . . , y_m ≥ 0.
So what about the Constraint Qualifications for the linear programming problem discussed above ? If we are dealing with linear constraint functions, then it is possible to prove that even in points where the Constraint Qualifications fail, a maximum must occur as a solution to the Kuhn-Tucker equations.
The reason for this behaviour can be found by looking back at the proof of Kuhn-Tucker's Theorem. There the Constraint Qualifications appeared because in such points the first order Taylor approximation may not give a good description of what is happening with the constraint functions at such a point. But this problem cannot occur if all functions are linear, because then the first order Taylor approximation is exactly the function itself ! So the first order Taylor approximations used in the proof of Kuhn-Tucker's Theorem give an exact description of the behaviour of the constraint functions.
There is one thing you should be aware of when a maximum of a linear programming problem occurs in a point for which the Constraint Qualification fails. Although such a point will appear as a solution to the Kuhn-Tucker equations, it is possible that there is no unique solution, and that many different multipliers are possible. This doesn't influence the question about existence of the multipliers, but it may make them harder to find, and it also will spoil their interpretation as shadow prices.
Given an LP-problem in standard form (2), let A be the m × n matrix with entries a_i^(j) ( so row j = 1, . . . , m contains the coefficients a_1^(j) , . . . , a_n^(j) of the j-th constraint ), and define column vectors c = ( c_1 , . . . , c_n ), b = ( b_1 , . . . , b_m ), x = ( x_1 , . . . , x_n ), and y = ( y_1 , . . . , y_m ). Also, let 0_n and 0_m be the n-dimensional and the m-dimensional null-vector, respectively ( so they have all coordinates equal to 0 ).
Remember that the inner product of two vectors a and b of the same dimension is denoted by a · b. And we use A′ to denote the transpose of A, i.e., the matrix obtained from A by taking the reflection in the diagonal. ( Transposes are usually indicated by A^t or A^T , but we follow the notation in the book. )
With this notation, the LP-problem in standard form becomes

maximise c · x
subject to A x ≤ b,       (6)
x ≥ 0_n ,

where the matrix A, the n-vector c and the m-vector b are given. This problem is sometimes called the primal linear programming problem.
And if we have such an LP-problem in the standard form given in (6), we define the dual linear programming problem ( or DLP-problem ) as the following constrained optimisation problem :

minimise b · y
subject to A′ y ≥ c,       (7)
y ≥ 0_m .
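Forming the dual from the data ( A, b, c ) is purely mechanical : transpose A and swap the roles of b and c. A small Python sketch with nested lists ( the helper names are my own ) :

```python
# Building the data of the dual (7) from the data (A, b, c) of the primal (6).
# A is a list of m rows of length n; each triple is (matrix, rhs, objective).

def transpose(A):
    return [list(col) for col in zip(*A)]

def dual_data(A, b, c):
    """Data of the dual: minimise b . y subject to A' y >= c, y >= 0."""
    return transpose(A), c, b       # dual matrix A', dual rhs c, dual objective b

A = [[1, 2, 0], [0, 1, 3]]          # m = 2 constraints, n = 3 variables
b = [4, 6]
c = [3, 2, 1]

A_dual, rhs_dual, obj_dual = dual_data(A, b, c)
print(A_dual)                                         # [[1, 0], [2, 1], [0, 3]]
assert dual_data(A_dual, rhs_dual, obj_dual) == (A, b, c)   # dual of the dual
```

The final assertion illustrates that taking the dual twice returns the primal data, which is exactly the symmetry noted between (2) and (5).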
Notice that this is exactly the linear programming problem in (5).
In analogy with the feasible set D_P = { x ∈ R^n | A x ≤ b; x ≥ 0_n } of the LP-problem, we define

D_D = { y ∈ R^m | A′ y ≥ c; y ≥ 0_m };

then D_D is the feasible set of the DLP-problem and a point y ∈ D_D is a feasible point of the dual problem.
We can also rewrite the equations (4a) – (4f) in a more compact form, and re-order to get :

A x ≤ b,   (8a)
A′ y ≥ c,   (8b)
x ≥ 0_n ,   (8c)
y ≥ 0_m ,   (8d)
y_j ( b_j − a_1^(j) x_1 − · · · − a_n^(j) x_n ) = 0,   for j = 1, . . . , m,   (8e)
x_i ( a_i^(1) y_1 + · · · + a_i^(m) y_m − c_i ) = 0,   for i = 1, . . . , n.   (8f)
Lemma 6.3
(a) If a, b ∈ R^n so that a ≥ 0_n and b ≥ 0_n , then a · b ≥ 0 ( where this last 0 is the real number zero ).
(b) If a, b, c ∈ R^n so that a ≥ 0_n and b ≥ c, then a · b ≥ a · c.
(c) If M is an n × m matrix with real entries, a ∈ R^n and b ∈ R^m , then a · ( M b ) = ( M′ a ) · b.
Proof Part (a) follows immediately since a ≥ 0_n means a_i ≥ 0, for all i = 1, . . . , n, and hence a · b = Σ_{i=1}^n a_i b_i ≥ 0.
The second part is almost equally trivial since b ≥ c means b − c ≥ 0_n , which according to the first part means a · b − a · c = a · ( b − c ) ≥ 0, and we are done.
And (c) is easily obtained by writing out what a · ( M b ) and ( M′ a ) · b are. If the i, j-entry of M is m_{ij} , then a · ( M b ) = Σ_{i=1}^n Σ_{j=1}^m a_i m_{ij} b_j = Σ_{j=1}^m Σ_{i=1}^n m_{ij} a_i b_j = ( M′ a ) · b.
Given an LP-problem in the form according to (6), and its dual in (7). Then for any feasible point x of the LP-problem and any feasible point y of its dual we have c · x ≤ b · y.
Proof For a feasible point x we know A x ≤ b and x ≥ 0_n ; and for a feasible point y we know A′ y ≥ c and y ≥ 0_m . By repeatedly using the appropriate parts of Lemma 6.3 we can deduce : c · x ≤ ( A′ y ) · x = y · ( A x ) ≤ y · b.
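A quick numerical illustration of this inequality, on a made-up LP ( the numbers are my own example ) :

```python
# Weak duality checked numerically on a made-up example:
# maximise c . x s.t. A x <= b, x >= 0;  dual: minimise b . y s.t. A'y >= c, y >= 0.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[1, 1], [1, 3]]
b = [4, 6]
c = [3, 2]

x = [2.0, 1.0]   # primal feasible: A x = (3, 5) <= (4, 6) and x >= 0
y = [3.0, 0.5]   # dual feasible:   A'y = (3.5, 4.5) >= (3, 2) and y >= 0
assert all(dot(row, x) <= bj for row, bj in zip(A, b))
assert all(dot(col, y) >= ci for col, ci in zip(zip(*A), c))

print(dot(c, x), dot(b, y))   # 8.0 15.0 : c.x <= b.y, as the result promises
```

Any feasible pair will do; the gap between the two values only closes at optimal solutions.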
A′ y ≥ c,   (9a)
y ≥ 0_m ,   (9b)
y_j ( b_j − a_1^(j) x_1 − · · · − a_n^(j) x_n ) = 0,   for j = 1, . . . , m,   (9c)
( a_i^(1) y_1 + · · · + a_i^(m) y_m − c_i ) x_i = 0,   for i = 1, . . . , n.   (9d)

( These are just equations (8b), (8d), (8e) and (8f). ) As a consequence of (9a) and (9b) we see that y ∈ D_D . And as a consequence of (9c) we get

y · ( b − A x ) = Σ_{j=1}^m y_j ( b_j − a_1^(j) x_1 − · · · − a_n^(j) x_n ) = 0.
As a final consequence of the proof of Theorem 6.5 we obtain the following important result.
Theorem 6.6 ( Complementary Slackness for Linear Programming )
Given an LP-problem in the form according to (6), and its dual in (7). If x is a solution to the LP-problem and y is a solution to its dual, then they must satisfy the so-called complementary slackness conditions :

if x_i > 0, then ( A′ y )_i = c_i ;
if y_j > 0, then ( A x )_j = b_j ;
if ( A x )_j < b_j , then y_j = 0;
if ( A′ y )_i > c_i , then x_i = 0.
Proof From the proof of Theorem 6.5 we know that for optimal solutions x and y we must have

y_j ( b_j − a_1^(j) x_1 − · · · − a_n^(j) x_n ) = 0,   for j = 1, . . . , m,
( a_i^(1) y_1 + · · · + a_i^(m) y_m − c_i ) x_i = 0,   for i = 1, . . . , n.

So if x_i > 0, then we must have a_i^(1) y_1 + · · · + a_i^(m) y_m − c_i = 0, i.e., ( A′ y )_i = c_i . The other three implications follow in the same way.
Notice that you can only use complementary slackness if a constraint is slack, i.e., if it is not satisfied with equality. If you have a tight constraint, then you cannot conclude that the corresponding constraint is slack. For example, if x_i = 0, then it is still possible that ( A′ y )_i = c_i as well.
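The conditions can be checked mechanically for a candidate pair. A sketch on a made-up LP where x = (4, 0) and y = (3, 0) happen to be optimal ( all numbers are my own example ) :

```python
# Complementary slackness check for the made-up LP:
# maximise 3x1 + 2x2 s.t. x1 + x2 <= 4, x1 + 3x2 <= 6, x1, x2 >= 0,
# with candidate optimal pair x = (4, 0), y = (3, 0).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[1, 1], [1, 3]]
b = [4, 6]
c = [3, 2]
x = [4.0, 0.0]
y = [3.0, 0.0]

for i, col in enumerate(zip(*A)):        # x_i > 0 forces (A'y)_i = c_i
    if x[i] > 0:
        assert dot(col, y) == c[i]
for j, row in enumerate(A):              # y_j > 0 forces (A x)_j = b_j
    if y[j] > 0:
        assert dot(row, x) == b[j]

print(dot(c, x), dot(b, y))              # 12.0 12.0
```

The two objective values being equal confirms, via weak duality, that this pair is indeed optimal.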
Suppose you are asked to solve a linear programming problem in its most general form (1). Then the following steps can be useful, although they don't give a full Cookbook Procedure :
1. Use the procedures in Section 6.1 to translate the problem to an LP-problem in standard form according to (2).
2. If you are left with a problem with dimension n at most 2, then you should be able to solve the problem graphically : In the x_1 , x_2 -plane, sketch the areas corresponding to the constraints. I.e., find the ( x_1 , x_2 ) satisfying x_1 , x_2 ≥ 0 and a_1^(j) x_1 + a_2^(j) x_2 ≤ b_j , for j = 1, . . . , m. Also sketch some level sets of the objective function; lines with c_1 x_1 + c_2 x_2 = α for some α. This sketch should give you an idea in which point of the feasible set D_P a maximum is attained ( if any ).
3. If you are left with a problem with a small number of constraints of the form a_1^(j) x_1 + · · · + a_n^(j) x_n ≤ b_j , then it may be possible to derive fairly easily that the objective function has no maximum on the feasible set.
4. If you can't solve the primal LP-problem directly, then formulate the dual LP-problem according to (5) or (7).
5. Try to solve the DLP-problem directly, for instance by applying steps 2 and 3 to the dual problem.
6. If you can conclude in step 5 that the DLP-problem has no solution, then you know that the primal LP-problem has no solution as well.
7. If in step 5 you were able to find an optimal solution y of the DLP-problem, then you can use information about y in the Complementary Slackness Conditions in Theorem 6.6 to obtain information about an optimal solution x of the primal LP-problem. Also, the Strong Duality Theorem gives you an equation c · x = b · y about x. This information is often enough to find an optimal solution x.
8. If neither the primal nor the dual LP-problem can be solved directly, or can be shown to have no solutions, then the last resort is to formulate the equations in (4a) – (4f) and try to solve these. This is basically the same as solving the Kuhn-Tucker equations in general, so it will often lead to a detailed case analysis. For large n and m this is not feasible to do by hand, but for smaller dimensions it should be possible.
As a small aside, the procedure above is not really what is done in general for solving big LP-problems using a computer. ( Where big can mean n equal to a couple of thousand and m of the order of millions. ) Other techniques are available, but all of these use duality and complementary slackness to get to a solution as fast as possible.
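For n = 2, step 2 can be mimicked without a drawing by enumerating the intersection points of pairs of constraint lines and keeping the best feasible one; an optimum of an LP, if one exists, is attained at such a corner. A sketch on a made-up problem ( maximise 3x + 2y subject to x + y ≤ 4, x + 3y ≤ 6, x, y ≥ 0; all numbers are my own ) :

```python
from itertools import combinations

# Every constraint as (a1, a2, b), meaning a1*x + a2*y <= b;
# x >= 0 and y >= 0 are written as -x <= 0 and -y <= 0.
cons = [(1, 1, 4), (1, 3, 6), (-1, 0, 0), (0, -1, 0)]
obj = (3, 2)                      # maximise 3x + 2y

def feasible(p, eps=1e-9):
    return all(a1 * p[0] + a2 * p[1] <= b + eps for (a1, a2, b) in cons)

def value(p):
    return obj[0] * p[0] + obj[1] * p[1]

best = None
for (a1, a2, b1), (a3, a4, b2) in combinations(cons, 2):
    det = a1 * a4 - a2 * a3
    if det == 0:
        continue                  # parallel constraint lines: no vertex
    # Cramer's rule for the intersection of the two constraint lines
    p = ((b1 * a4 - b2 * a2) / det, (a1 * b2 - a3 * b1) / det)
    if feasible(p) and (best is None or value(p) > value(best)):
        best = p

print(value(best))                # 12.0, attained at the corner (4, 0)
```

This brute-force corner search is only sensible for tiny problems; for anything larger the duality-based steps above, or the simplex-style methods hinted at in the aside, take over.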
Exercises
1
(a) Transform this problem into the standard format for a Linear Programming problem
according to (2) on page 1.
(b) Formulate the DLP-problem for this problem.
2 Consider the LP-problem

maximise x + y
subject to x + 2 y ≤ 3,
2 x + y ≤ 2,
5 x ≤ 6,
x, y ≥ 0.
(a) Write this problem as a standard LP-problem, sketch its feasible set, and find an optimal
solution.
(b) Give the dual LP-problem. Describe what the Complementary Slackness Conditions
mean for the primal and dual LP-problem.
(c) Find an optimal solution for the dual LP-problem.
3 x_1 + 7 x_2 + 3 x_3 ≤ 30,
2 x_1 + 2 x_2 + 3 x_3 ≤ 12,
x_1 , x_2 , x_3 ≥ 0.
(a) Give the dual LP-problem, sketch the feasible set of the DLP-problem, and find an optimal solution for the DLP-problem.
(b) Describe what the Complementary Slackness Conditions mean for the primal and dual
LP-problem.
(c) Find an optimal solution for the original LP-problem.
maximise c · x
subject to A x ≤ b,
x ≥ 0_n .
It is known that the feasible set D_P of this problem is not empty and that c ≤ 0_n , c ≠ 0_n .
(a) Explain carefully why you can conclude that an optimal solution of the LP-problem
must exist.
Now suppose that, in addition, for this specific LP-problem we have that every point in the
feasible set is an optimal solution.
(b) Is it true that we also must have that every point in the feasible set D_D of the dual LP-problem is an optimal solution of the dual LP-problem ? ( Justify your answer, either by proving the statement, or by giving a counterexample. )
x_1 − x_2 + x_3 ≤ 1,
− x_1 + x_2 − x_3 ≤ 1,
x_1 , x_2 , x_3 ≥ 0.
(a) Give the dual LP-problem and sketch the feasible set of the DLP-problem.
(b) Show that the DLP-problem has more than one optimal solution.
(c) Use the Complementary Slackness Conditions to find an optimal solution for the primal
LP-problem, and show this solution is unique.
(d) Show that if we interpret the LP-problem as an inequality constrained optimisation problem in Kuhn-Tucker's Theorem, then the Constraint Qualification would fail in the optimum found in (c).
Optimisation Theory
MA 208
2007/08
Notes 7
Digraphs and Networks
Shortest Paths
Order of Functions
Algorithms and their Analysis
Shortest Path Algorithms
We now start the combinatorial optimisation part of the course. The Sundaram book doesn't know about this, so instead we switch to the Biggs book : N.L. Biggs, Discrete Mathematics ( 2nd edition ), Oxford University Press (2002), ISBN 0-19-850717-8. The sections from this book that are most relevant for us are : 14.1 – 14.7, 15.1, 15.4, 16.6 and 18.1 – 18.4.
For most of the remainder, we will be looking at graphs as our object of study ( well, directed graphs actually ). If you want to read more on graph theory, then almost any book with the words "introduction", "graph" and "theory" in the title should do. An excellent source is also the book Graph Theory with Applications, by J.A. Bondy and U.S.R. Murty, North Holland (1976). This book is out of print ( and has been out of print for ages ). But the full text is available online for personal use. You can find it via www.ecp6.jussieu.fr/pageperso/bondy/books/gtwa/gtwa.html. ( Bondy and Murty recently published a new book on graph theory; that is a far more advanced book, and not a 2nd edition of the book mentioned above. )
7.1 Graphs, digraphs and networks
The definition of a graph can be found in Section 15.1 of the Biggs book. But we will mostly
be interested in digraphs ( or directed graphs ) defined in Section 18.1 of the book.
For us a digraph D = (V, A) consists of a finite set V, called the vertices ( singular vertex ),
and a collection A of pairs of different elements from V. The elements in A are called arcs
( sometimes also called directed edges ).
Since we don't allow pairs (u, v) with u = v, we don't allow what the book calls loops. We are also not allowed to take the same pair twice. ( Certain authors happily allow that and call it parallel arcs. ) But we do allow that both arcs (u, v) and (v, u) are present in A.
We think of an arc (u, v) as a line connecting u to v, with a direction from u to v. Hence we often talk about the arc "from u to v". We call u the tail and v the head of the arc (u, v).
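In code the definition translates directly : V a finite set, A a set of ordered pairs with no loops. A minimal Python sketch :

```python
# A digraph D = (V, A): V a finite set of vertices, A a set of ordered pairs
# (tail, head) of different vertices; (u, v) and (v, u) may both be present.

V = {1, 2, 3}
A = {(1, 2), (2, 1), (2, 3)}

assert all(u in V and v in V and u != v for (u, v) in A)   # no loops

def out_arcs(u):
    """All arcs with tail u, i.e. the arcs 'from u'."""
    return {(a, b) for (a, b) in A if a == u}

print(sorted(out_arcs(2)))   # [(2, 1), (2, 3)]
```

Using a set for A automatically rules out parallel arcs, matching the convention above.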
Author : Jan van den Heuvel
On the other hand, it's not clear how we would go from digraphs to graphs, without losing some information.
Since we are always assuming we are working in a digraph, we often omit the "directed" in the names above, and just talk about walk, u,v-walk, tour, path, u,v-path, cycle.
Note that for a pair of vertices u, v, there may be infinitely many u, v-walks. On the other
hand, since in a path no vertex can appear more than once, the number of u, v-paths is always
finite.
A digraph D = (V, A) is strongly connected if for every two distinct vertices u, v there is a walk from u to v.
Property 7.1
A digraph D = (V, A) is strongly connected if and only if for every two distinct vertices u, v there is
a path from u to v ( exercise ).
Another phenomenon is that in combinatorial optimisation the existence of the optima is usually guaranteed. Here's an easy argument to prove this.
Let D be a finite, non-empty set, and f : D → R be any function on D. Consider the set { f(a) | a ∈ D }. This is a finite, non-empty set of numbers from R. Hence this set has a maximum and a minimum ( you will be asked to prove this formally in an exercise ), and this maximum/minimum is the maximum/minimum of f on D.
So for most questions in combinatorial optimisation we are not concerned about proving the existence of a maximum or minimum, but about an efficient way of finding them. We usually do this by describing an algorithm to find the maximum or minimum, proving that that algorithm indeed will result in finding the solution we want, and discussing how long the algorithm will take to find the solution.
Just as in Section 14.1 of the Biggs book, we won't define precisely what we mean by an "algorithm". We just stick to the same description as there : "An algorithm is a sequence of instructions. Each instruction must be carried out, in its proper place, by the person or machine for whom the algorithm is intended."
We will usually write an algorithm in some kind of "pseudo computer language", being precise if we can be, and using more general language if that is more convenient.
So for instance, the following could be an algorithm to find the minimum in a finite set S of real numbers :

1. set A = S;
2. choose an element a ∈ A;
3. set min = a and remove a from A;
4. as long as A ≠ ∅ :
5. { choose an element a ∈ A;
6. if a < min : set min = a;
7. remove a from A };
8. declare min to be the minimum of S
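For comparison, a direct Python version of such a minimum-finding procedure, keeping the remove-and-compare structure so that the number of operations stays visible :

```python
def find_minimum(S):
    A = set(S)          # copy S into a working set A
    a = A.pop()         # choose an element and remove it
    m = a               # the current minimum
    while A:            # as long as A is non-empty:
        a = A.pop()     #   choose and remove another element
        if a < m:       #   a smaller element becomes the new minimum
            m = a
    return m            # declare m to be the minimum of S

print(find_minimum({7, 3, 9, 3.5}))   # 3
```

Each pass of the loop removes one element of A, so the loop body runs |S| − 1 times.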
We will see later how we can prove that such an algorithm gives the correct answer.
We will allow ourselves to describe our algorithms in fairly informal language : "remove a from A", "calculate x + y", "take the maximum of x_1 , . . . , x_k", etc. To prevent ourselves from both writing one-line algorithms ( like "solve the problem stated" ), and from writing detailed descriptions involving a large number of small but simple steps, we assume such basic operations are always allowed as building blocks.
The weight of a walk W = v_1 , v_2 , . . . , v_k in a network ( D, w ) is

w(W) = Σ_{i=1}^{k−1} w( v_i , v_{i+1} ).
Let u, v be two vertices in a network ( D, w ). Let Y(u, v) be the set of all u,v-walks in D, and set Z(u, v) = { w(W) | W ∈ Y(u, v) }.
Now we define the distance from u to v, denoted dist(u, v), as follows :
If there is no u,v-walk ( Y(u, v) and Z(u, v) are empty ), then we set dist(u, v) = +∞.
If Z(u, v) has no lower bound, then we set dist(u, v) = −∞.
In all other cases, dist(u, v) is the weight of a shortest u,v-walk, the minimum of Z(u, v).
The fact that in the last case we can take the minimum of Z(u, v), and not the infimum, requires some proof. This will be done in the lectures.
Life is a lot easier if we assume the digraph D is strongly connected and if none of the weights w(a) is negative ( see exercises ).
In the definition of distance above we introduced +∞ and −∞. But we should not start using them just as if they are numbers. They are more meant to indicate certain concepts ( in particular as a shorthand for two different reasons why there is no shortest walk ).
Property 7.2
Let ( D, w ) be a network. There exist two vertices u, v with dist(u, v) = −∞ if and only if there exists a cycle C in D with w(C) < 0.
Property 7.3
Let u, v be two vertices in a network ( D, w ).
If dist(u, v) ≠ ±∞, then there is a u,v-path P so that dist(u, v) = w(P).
If w(a) ≥ 0 for all arcs a and there is a u,v-walk in D, then there is a u,v-path P so that dist(u, v) = w(P).
Because of Property 7.3, for the case that all weights are non-negative, we could also define dist(u, v) as the weight of a shortest path from u to v, if such a path exists.
Property 7.4
If u, v, w are three vertices in a network ( D, w ) so that dist(u, v) ≠ −∞, dist(v, w) ≠ −∞ and dist(u, w) ≠ −∞, then dist(u, w) ≤ dist(u, v) + dist(v, w).
Property 7.5
Let ( D, w ) be a network. Then for all v ∈ V we have either dist(v, v) = 0 or dist(v, v) = −∞.
Most of the properties will be proved in the lectures.
In the lectures we will prove why this algorithm gives the right results.
Actually, the first known description of the algorithm is in a little-known report by Leyzorek, Gray, Johnson, Ladew, Meaker, Petry & Seitz from 1957; Dijkstra's publication is from 1959. You can imagine why the name "Dijkstra's Algorithm" stayed more popular than the name "Leyzorek-Gray-Johnson-Ladew-Meaker-Petry-Seitz Algorithm", even after the historical inaccuracy became known.
Note that in fact Dijkstra's Algorithm finds dist(s, v) for every vertex v ∈ V. This is a feature of most algorithms to find shortest paths. That is why these algorithms are often known as Single Source Shortest Path algorithms ( where the source is the single vertex s that forms the base of all the distances ). And we might as well replace the final line of the algorithm by

13. for all v ∈ V : declare dist(s, v) to be d(v)

Since all the algorithms we look at will have the property that they determine dist(s, v) for one source s ∈ V and all vertices v ∈ V, from now on we expect only the source s to be given ( and not s and t ).
Of course, in the case of Dijkstra's Algorithm, we could stop earlier as soon as dist(s, t) is determined ( that happens when t is coloured white ).
We have a look at how efficient Dijkstra's Algorithm is in the next section.
Dijkstra's Algorithm might not work if there are arcs a with negative weight. An example where this happens is the following small network : the vertices are s, u and t, with an arc from s to u of weight 4, an arc from u to t of weight −4, and an arc from s to t of weight 2.
Dijkstra's Algorithm will give dist(s, t) = 2, while the correct distance is dist(s, t) = 0.
The big-oh notation comes with its own kind of arithmetic. For instance, the following statements are easy to prove :
If f_1(n) = O(g_1(n)) and f_2(n) = O(g_2(n)), then f_1(n) + f_2(n) = O(g_1(n) + g_2(n)).
We often write this as O(g_1(n)) + O(g_2(n)) = O(g_1(n) + g_2(n)).
If f_1(n) = O(g_1(n)) and f_2(n) = O(g_2(n)), then f_1(n) · f_2(n) = O(g_1(n) · g_2(n)).
We often write this as O(g_1(n)) · O(g_2(n)) = O(g_1(n) · g_2(n)).
If f_1(n) = O(g(n)) and f_2(n) = O(g(n)), then f_1(n) + f_2(n) = O(g(n)).
We often write this as O(g(n)) + O(g(n)) = O(g(n)).
If f(n) = O(g(n)) and g(n) = O(h(n)), then f(n) = O(h(n)).
We have another look at Dijkstra's Algorithm, and ask ourselves how long it would take to solve a particular problem ( i.e., if we start with a network ( D, w ) and a source s, how long before the algorithm finishes ). Since it doesn't seem to make much sense to talk about "how long" in time, we assume the question is supposed to be "how many steps will it do ?".
As an example, consider the simple algorithm to find the minimum of a finite set S in Section 7.3. Recall that |S| denotes the number of elements in S.
Line 1, copying the set S, is |S| operations. Line 2 is 1 operation, while line 3 is 2 operations. The check in line 4 is one operation, but we may have to do this check possibly many times. In fact, doing lines 4 – 7 once takes at most some small constant number a of operations. And each time we encounter those lines, the set A becomes one smaller. So we might have to do those steps |S| times before the set A is empty. Hence lines 4 – 7 in total may take a·|S| operations. And then there is the final operation in line 8.
So in total we might have up to |S| + 1 + 2 + a·|S| + 1 = 4 + (a + 1)·|S| operations, where a is some small constant. We express this as saying that the algorithm in Section 7.3 for finding the minimum of a set S takes O(|S|) operations.
We can do a similar analysis of Dijkstra's Algorithm. Here the essential input is the digraph D = (V, A) with |V| vertices and |A| arcs, the |A| weights w(a) on the arcs, and the two vertices s and t.
First observe that we can do lines 1 – 3 in O(|V|) operations.
Now the lines 4 – 11 are repeated as long as we have grey vertices. Since every time we do these lines, we recolour one of the grey vertices to white, and a white vertex never changes colour, the number of times we have to do lines 4 – 11 is at most the number of vertices |V|.
The check in line 4 we've assumed is just one operation.
In line 5 we need to find the minimum of a finite set, and the size of that set is the number of grey vertices at that moment. We don't really know how large that set is, and the best we can say is that we never have more than |V| grey vertices. Using our minimisation procedure
from Section 7.3, this means that one run of line 5 takes O(|V|) operations. And since we might have to do this up to |V| times, the total number of operations is O(|V|²).
Line 6 is one operation, which we have to do O(|V|) times.
In lines 7 – 10, we do something with all arcs that have the chosen vertex u as their tail. Each vertex appears at most once as a chosen vertex u in those lines ( since u becomes white at this point, and white vertices are never recoloured and never chosen in line 5 ). Lines 8 – 10 all take a constant number of operations. So the number of operations required in total in all the times we perform lines 7 – 10 is at most a constant times the number of arcs. That leads to an estimate of O(|A|) operations.
Once we've reached line 12, we only have to worry about the black vertices. We can't predict how many there will be, but it's certainly not more than |V|. So line 12 takes O(|V|) operations.
And, finally, line 13 is one operation.
Putting it all together, and applying the arithmetic of big-oh, we see that we need at most O(|V|) + O(|V|²) + O(|A|) = O(|V|² + |A|) operations. In fact, since |A| ≤ |V|·(|V| − 1) ( exercise ), we can write O(|V|² + |A|) = O(|V|²).
So we can conclude :
Dijkstra's Algorithm requires O(|V|²) operations to find the distance dist(s, t) of two vertices s, t in a network ( D, w ) with |V| vertices.
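A Python sketch of the algorithm in the grey/white colouring style analysed above, with the plain O(|V|) scan in the minimisation step ( so O(|V|²) overall ); the small network used at the end is my own example :

```python
INF = float("inf")

def dijkstra(V, A, w, s):
    """dist(s, v) for all v, assuming every weight w[(u, v)] is >= 0."""
    d = {v: INF for v in V}
    d[s] = 0
    grey = set(V)                             # vertices not yet finished
    while grey:
        u = min(grey, key=lambda v: d[v])     # cheapest grey vertex ...
        grey.remove(u)                        # ... is recoloured white
        for (a, b) in A:                      # relax all arcs with tail u
            if a == u and d[u] + w[(a, b)] < d[b]:
                d[b] = d[u] + w[(a, b)]
    return d

V = {"s", "u", "t"}
A = {("s", "u"), ("u", "t"), ("s", "t")}
w = {("s", "u"): 1, ("u", "t"): 1, ("s", "t"): 3}
d = dijkstra(V, A, w, "s")
print(d["t"])   # 2 : via u, cheaper than the direct arc of weight 3
```

The `min` over the grey vertices is exactly the O(|V|) minimisation step; replacing it by a heap gives the O(|A| + |V| ln |V|) variant mentioned below.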
It is possible to organise the way we deal with grey vertices and the way we manipulate the values d(v) for the grey vertices more carefully; in such a way that the total number of operations required to find all the minima in line 5 is O(|V| ln |V|) instead of O(|V|²). Hence that version of Dijkstra's Algorithm would use O(|A| + |V| ln |V|) operations.
So is Dijkstra's Algorithm any good ? In particular, is it more efficient than just finding all paths ( or walks ) and checking which one has the lowest weight ? Well, in Exercise 10 you will be asked to prove that there exist graphs with (n + 1)^2 vertices that have more than 2^n different paths between two particular vertices. So checking all paths for those graphs would involve at least 2^(√|V| − 1) operations ( and that is ignoring the number of operations involved in finding all paths and calculating the weight of each path ).
Since 2^(√|V| − 1) grows much, much faster than |V|², we can indeed say that Dijkstra's Algorithm is much more efficient than the brute force approach of finding all paths.
As noticed already, Dijkstra's Algorithm is not guaranteed to work if there are negative weights in the network. ( It might give the right answer, but we can't rely on it. ) The algorithm in this section, generally known as the Bellman-Ford Algorithm², can be used whether or not there are negative weights.
² Regarding the names attached to this algorithm, we are in a similar situation as we were with Dijkstra's Algorithm. The first known published version was by Shimbel from 1955; it was rediscovered by Moore, and by Woodbury & Dantzig, both in 1957; and then by Bellman in 1958. Since Bellman in his description used some ideas from a publication of Ford from 1956, the name Ford became attached to the algorithm as well.
Notes 7 Page 9
Here is the algorithm. Again, we assume we have been given a network ( D, w) where D =
(V, A) is a digraph and w : A → Z is a weight function. We are also given a source vertex s
and want to find dist(s, v) for all v ∈ V.
1. set d(s) = 0;
2. for all v ∈ V with (s, v) ∈ A : set d(v) = w(s, v);
3. for all v ∈ V, v ≠ s, with (s, v) ∉ A : set d(v) = +∞;
4. repeat |V| times :
5. { for all arcs (u, v) ∈ A :
6. { if d(u) ≠ +∞ :
7. { if d(v) = +∞ : set d(v) = d(u) + w(u, v);
8. if d(v) ≠ +∞ and d(v) > d(u) + w(u, v) : set d(v) = d(u) + w(u, v)
9. };
10. };
11. };
12. for all arcs (u, v) ∈ A :
13. { if d(u) ≠ +∞ and d(v) > d(u) + w(u, v) :
14. declare "Negative Cycle!" and STOP IMMEDIATELY;
15. };
16. for all v ∈ V : declare dist(s, v) to be d(v)
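As an illustration, lines 1–16 can be transcribed almost directly into Python (a sketch; the representation of the network as a list of (u, v, w) triples is my own). `float('inf')` plays the role of +∞.

```python
def bellman_ford(n, arcs, s):
    """Transcription of lines 1-16: the distances from s,
    or None if a negative cycle is detected.

    n    -- number of vertices, labelled 0..n-1
    arcs -- list of (u, v, w) triples, one per arc
    """
    INF = float('inf')
    d = [INF] * n                # lines 1-3: initial values
    d[s] = 0
    for (u, v, w) in arcs:
        if u == s:
            d[v] = min(d[v], w)
    for _ in range(n):           # lines 4-11: repeat |V| times
        for (u, v, w) in arcs:
            if d[u] != INF and d[u] + w < d[v]:
                d[v] = d[u] + w
    for (u, v, w) in arcs:       # lines 12-15: negative-cycle check
        if d[u] != INF and d[u] + w < d[v]:
            return None          # "Negative Cycle!"
    return d                     # line 16
```

On a network with a negative-weight cycle reachable from s the final loop still finds a relaxable arc after |V| rounds, which is exactly the condition in line 13.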
We will prove the correctness of the Bellman-Ford Algorithm in the lectures. A few further
comments about the algorithm are in order as well.
First notice that this algorithm also doesn't treat t specially. If there is no v ∈ V with
dist(s, v) = −∞, then the algorithm will find dist(s, v) for all v ∈ V. If there are certain
vertices v for which we have dist(s, v) = −∞, then it will report this fact by declaring "Negative
Cycle!". This means that the algorithm discovered that there is a cycle with negative weight
and that there is a walk from s to a vertex on that cycle. It then immediately follows that
dist(s, v) = −∞ for all vertices v on the cycle.
Of course, it might be possible that there are some vertices v ∈ V with dist(s, v) = −∞,
but that t is not one of them. And hence the algorithm might give the outcome "Negative
Cycle!", although it also could have determined dist(s, t). There are ways to overcome this
problem, but we won't go into them here.
Finally, how many operations do we need for the Bellman-Ford Algorithm ? Looking through
the algorithm, it should be obvious that most of the work is done in lines 4–11. In fact,
the whole of lines 5–11 is done |V| times. And then for each time we do those lines, we
need to do the operations in lines 6–10 |A| times. So the work in lines 4–11 would require
O(|V|·|A|) operations.
None of the other steps would require more operations than that, hence we can say :
The Bellman-Ford Algorithm requires O(|V|·|A|) operations to find a negative cycle or the distances
dist(s, v) for all v ∈ V, in a network ( D, w) with |V| vertices and |A| arcs.
For the remainder of the section we assume that there are no negative cycles, but there might
still be arcs with negative weight. The following algorithm, known as the Floyd-Warshall
Algorithm3, will find dist(s, t) for all vertices s, t. It works as long as there are no negative
cycles, but allows negative-weight arcs.
Here is the algorithm. Again, we assume we have been given a network ( D, w) where D =
(V, A) is a digraph and w : A → Z is a weight function. For this algorithm we assume that
the vertices are numbered from 1 to |V|.
1. for all arcs (u, v) ∈ A : set d(u, v; 0) = w(u, v);
2. for all pairs u, v with u ≠ v and (u, v) ∉ A : set d(u, v; 0) = +∞;
3. for all v ∈ V : set d(v, v; 0) = 0;
4. for k = 1, 2, . . . , |V| :
5. { for all pairs of vertices u, v :
6. { if d(u, v; k − 1) ≤ d(u, k; k − 1) + d(k, v; k − 1) :
7. set d(u, v; k) = d(u, v; k − 1);
8. if d(u, v; k − 1) > d(u, k; k − 1) + d(k, v; k − 1) :
9. set d(u, v; k) = d(u, k; k − 1) + d(k, v; k − 1)
10. };
11. };
12. for all pairs u, v : declare dist(u, v) to be d(u, v; |V|)
The ideas behind this algorithm, and the proof of its correctness, will be discussed in the
lecture.
3 It's getting boring, but once again we have an example where the persons after whom it is named published
their work later ( in two different papers from 1962 ) than the oldest known appearance of the algorithm ( by Roy
in 1959 ).
Here we need to give some words of caution about lines 6–8. Some of the d(i, j; k − 1) in the
checking of d(u, v; k − 1) ≤ d(u, k; k − 1) + d(k, v; k − 1) may be equal to +∞. So we again use
the convention that (+∞) + (+∞) = +∞, and for any number a we agree (+∞) + a = +∞
and a < +∞.
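Under the same assumptions, the algorithm can be sketched in Python (my own representation; note that only the current layer of values d(u, v; k) is kept, since d(·, ·; k) depends only on d(·, ·; k − 1)). Conveniently, `float('inf')` already obeys the conventions just described: `inf + inf == inf` and `inf + a == inf`.

```python
def floyd_warshall(n, arcs):
    """All-pairs distances d(u, v) in a digraph without negative cycles.

    n    -- number of vertices, labelled 0..n-1
    arcs -- dict mapping (u, v) -> w(u, v)
    """
    INF = float('inf')               # the +infinity convention above
    d = [[INF] * n for _ in range(n)]
    for v in range(n):
        d[v][v] = 0                  # d(v, v; 0) = 0
    for (u, v), w in arcs.items():
        d[u][v] = min(d[u][v], w)    # d(u, v; 0) = w(u, v)
    for k in range(n):               # allow k as an intermediate vertex
        for u in range(n):
            for v in range(n):
                if d[u][k] + d[k][v] < d[u][v]:
                    d[u][v] = d[u][k] + d[k][v]
    return d
```

No special cases are needed for unreachable pairs, because the infinite values propagate exactly as the convention prescribes.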
Once again, we have a quick look at the number of operations needed in the Floyd-Warshall
Algorithm. It shouldn't be too hard to convince yourself that most of the work is done in
lines 4–11. The number of times we go through those lines is |V| : it takes that long to get
from k = 1 to a value of k larger than |V|, since we increase k by one every time. And then for
every k, we go through lines 6–9 for all vertex pairs u, v. Hence lines 6–9 are encountered
|V|² times for each value of k. Since lines 6–9 require some small constant
number of operations, we see that the total number of operations when performing all of
lines 4–11 is O(|V|·|V|²) = O(|V|³).
All the other steps require significantly fewer operations than that. We summarise :
The Floyd-Warshall Algorithm requires O(|V|³) operations to find the distances dist(s, t) for all
vertex pairs s, t in a network ( D, w) without negative cycles ( where |V| is the number of vertices in
the digraph D ).
In particular we see that the number of operations of the Floyd-Warshall Algorithm is of
the same order as the number of operations of repeating Dijkstra's Algorithm for all starting
vertices s. But Floyd-Warshall can be used if there are negative weights, whereas Dijkstra's
Algorithm can't be trusted in these situations.
Extra Exercises
1
Give a formal proof of the following claim : If A is a non-empty finite set of real numbers, then A
contains a maximum, i.e., there is an a∗ ∈ A so that a ≤ a∗ for all a ∈ A. ( Hint : use induction
on the number of elements of A. )
Let u, v, w be three distinct vertices in a digraph D. Prove that if there is a path from u to v
and a path from v to w, then there is a path from u to w. ( Note : we can't just take a u, v-path
and a v, w-path and put them in a row. ( Why not ? ) )
Prove Property 7.1. I.e., prove that it wouldn't matter if we replaced "walks" by "paths" in the
definition of strongly connected.
The big-oh notation is defined in Section 14.6 of the Biggs book in a slightly different way
than we do in Section 7.6 of these notes. So let's define f (n) = O′( g(n)) the Biggs way : For
two functions f, g : N → R+, we say that f (n) = O′( g(n)) if there exists a constant K > 0
so that f (n) ≤ K·g(n) for all n ∈ N, with possibly a finite number of exceptions.
Prove that the two definitions are equivalent. I.e., prove :
(a) If f (n) = O( g(n)) ( definition from these notes ), then f (n) = O′( g(n)) ( definition from
the Biggs book ).
(b) If f (n) = O′( g(n)), then f (n) = O( g(n)).
Prove that for all real numbers a, b > 1 we have loga (n) = O(logb (n)).
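This follows from the change-of-base identity log_a(n) = log_b(n) / log_b(a), so the ratio of the two logarithms is a constant. A quick numerical sanity check in Python (my own, and of course not a proof):

```python
import math

# change of base: log_a(n) = log_b(n) / log_b(a), so
# log_a(n) <= K * log_b(n) with the constant K = 1 / log_b(a)
a, b = 2.0, 10.0
K = 1 / math.log(a, b)          # math.log(x, base) is log_base(x)
for n in [2, 10, 1000, 10 ** 6]:
    ratio = math.log(n, a) / math.log(n, b)
    assert abs(ratio - K) < 1e-9
```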
10
For m, n ≥ 0, let M(m, n) be the Manhattan digraph. This digraph has as vertices all pairs (i, j)
with 0 ≤ i ≤ m and 0 ≤ j ≤ n. And there is an arc from a pair (i, j) to a pair (i′, j′) if i′ = i
and j′ = j + 1, or if j′ = j and i′ = i + 1. This is a sketch of M(3, 2) :
[sketch of M(3, 2) : a grid of vertices with rightward and upward arcs, A = (0, 0) at the bottom left and B = (3, 2) at the top right]
We are interested in the number of directed paths from A = (0, 0) to B = (m, n). Let's call this
number p(m, n).
(a) Show that p(0, n) = 1 and p(m, 0) = 1 for all m, n ≥ 0.
(b) Show that for all m, n ≥ 1 we have p(m, n) = p(m − 1, n) + p(m, n − 1).
(c) Prove that p(m, n) = (m + n choose n) = (m + n)! / (m!·n!), for all m, n ≥ 0. ( Here
(a choose b) is the binomial number a! / (b!·(a − b)!), and a! is the factorial
a! = a·(a − 1)·(a − 2)···2·1. )
( Hint : use the earlier parts and induction. )
(d) Prove that for all n ≥ 0 we have p(n, n) ≥ 2^n.
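The recurrence from parts (a) and (b) is easy to check numerically against the closed form of part (c); the little script below is my own and is, of course, no substitute for the induction proof asked for.

```python
import math

def p(m, n):
    """Number of directed paths from (0, 0) to (m, n) in M(m, n),
    computed via the recurrence of parts (a) and (b)."""
    if m == 0 or n == 0:
        return 1                        # part (a)
    return p(m - 1, n) + p(m, n - 1)    # part (b)

# part (c): p(m, n) equals the binomial number (m + n choose n)
assert all(p(m, n) == math.comb(m + n, n)
           for m in range(7) for n in range(7))

# part (d): p(n, n) >= 2**n
assert all(p(n, n) >= 2 ** n for n in range(7))
```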
11
Consider the following network, with vertex set V = {s, a, b, c, d, e, f }, where the weights are
given beside the arcs :
[diagram of the network : the seven vertices with weighted arcs]
(a) Describe how Dijkstra's Algorithm would progress on this network when determining
dist(s, v) for all vertices v ∈ V.
(b) Describe how the Bellman-Ford Algorithm would progress on this network when determining dist(s, v) for all vertices v ∈ V.
(c) Describe how the Floyd-Warshall Algorithm would progress on this network when
determining dist(u, v) for all pairs u, v.
12
Consider the following network, with vertex set V = {s, a, b, c, d, e, f }, where the weights are
given beside the arcs :
[diagram of the network : the seven vertices with weighted arcs]
(a) Describe how Dijkstra's Algorithm would progress on this network when determining
dist(s, v) for all vertices v ∈ V. Are all the answers correct ?
(b) Describe how the Bellman-Ford Algorithm would progress on this network when determining dist(s, v) for all vertices v ∈ V.
(c) Describe how the Floyd-Warshall Algorithm would progress on this network when
determining dist(u, v) for all pairs u, v.
13
Suppose we want to calculate dist(s, t) in a network where certain weights are negative. We
use the Bellman-Ford Algorithm, and for the network we're interested in it declares "Negative Cycle!". That's good information to have, but it doesn't really tell us what dist(s, t)
is.
Describe how you can modify the Bellman-Ford Algorithm so that for specific given s, t it
will return the right answer dist(s, t) ( −∞ if there are s, t-walks, but their weights have no
lower bound; +∞ if there are no s, t-walks; and the minimum weight of an s, t-walk if this
minimum exists ).
Optimisation Theory
MA 208
2007/08
Example Questions
1
Let R≥1 = { x ∈ R | x ≥ 1 }.
(a) Prove that if two functions f, g : N → R≥1 satisfy f (n) = O( g(n)), then we have
ln( f (n)) = O(ln( g(n))).
(b) Give an example of two functions f, g : N → R≥1 so that ln( f (n)) = O(ln( g(n))) but
not f (n) = O( g(n)).
Solutions
1
(a) A directed tour is a sequence of vertices v1, v2, . . . , vk where the first and the last vertex
are the same ( v1 = vk ) and every two consecutive vertices form an arc : (vi, vi+1) ∈ A
for all i = 1, 2, . . . , k − 1.
A directed cycle is a directed tour in which all vertices, except the first and the last, are
distinct.
(b) Let T = v1, . . . , vk be a directed tour that contains all vertices of D. And let v be a vertex
of D. Then we must have v = vi for some i.
Proof 1 : If T is a cycle, then we are done. So assume T is not a cycle, hence there
are vp, vq with p < q, (p, q) ≠ (1, k), so that vp = vq. Then the two sequences T1 =
v1, . . . , vp, vq+1, . . . , vk and T2 = vp, . . . , vq are also directed tours, both of them shorter
than the original tour T. Moreover, vi must be in at least one of the two tours. Take the
tour T1 or T2 on which vi lies, and continue shortening it if it is not a cycle, but making
sure vi is still on it. After a finite number of reductions in length we must be done, and
then we have found a cycle containing vi.
Proof 2 : We know that there is at least one tour containing v. Now among all tours that
contain v, let T′ be a shortest one ( one whose sequence v1, . . . , vk′ has the smallest
number of vertices ). Then using the ideas from Proof 1, we cannot have any vertex
appearing more than once ( except v1 = vk′ ). So T′ must be a cycle, and we are done.
(c) No, this is not the case. Consider the following digraph :
[sketch of a digraph on the five vertices v1, . . . , v5]
(a) The fact that f (n) = O( g(n)) means that there exist constants C1, C2 so that f (n) ≤
C1 + C2·g(n) for all n ∈ N. Since g(n) ≥ 1 for all n ∈ N, this certainly means that
f (n) ≤ (C1 + C2)·g(n) for all n ∈ N.
Then we have that ln( f (n)) ≤ ln((C1 + C2)·g(n)) = ln(C1 + C2) + ln( g(n)) for all n ∈
N. So by taking C1′ = ln(C1 + C2) and C2′ = 1, we have shown that ln( f (n)) ≤ C1′ +
C2′·ln( g(n)) for all n ∈ N. This means that ln( f (n)) = O(ln( g(n))) as required.
(b) Let f (n) = n² and g(n) = n for all n ∈ N. Then we have that ln( f (n)) = ln(n²) =
2·ln(n) and ln( g(n)) = ln(n) for all n ∈ N. So by taking C1 = 0 and C2 = 2, we have
shown that ln( f (n)) ≤ C1 + C2·ln( g(n)) for all n ∈ N. This means that ln( f (n)) =
O(ln( g(n))).
But we don't have f (n) = O( g(n)). For that we would need that there are constants
D1, D2 so that n² ≤ D1 + D2·n for all n ∈ N. But for all constants D1, D2, if we take n
large enough, then we will always achieve n² > D1 + D2·n. So it is not the case that
f (n) = O( g(n)).
(b) Every tour containing s can be written as s, v2 , . . . , vk , s, where each two consecutive
vertices are connected by an arc. In particular notice that such a tour must start with an
arc (s, v2 ) and then is a walk from v2 to s. And the weight of such a tour is the weight
w(s, v2 ) of the arc (s, v2 ) plus the weight of the walk from v2 to s.
So to find the shortest tour containing s, we can do the following :
1. set M = +∞;
2. for all v ∈ V do the following :
3. { if (s, v) is not an arc : do nothing;
4. if (s, v) is an arc :
5. { use Dijkstra's Algorithm to find the shortest walk from v to s;
6. call the weight of this shortest walk m;
7. if w(s, v) + m < M : replace M by w(s, v) + m
8. };
9. };
10. declare the minimum weight of a tour containing s to be M
(c) Most of the work for the algorithm in (b) is done in lines 2–9. In those lines we follow
lines 5–8 for every vertex v for which (s, v) is an arc. Lines 6 and 7 cost a small constant
number of operations. But in line 5 we perform Dijkstra's Algorithm for the network,
hence that line requires O(|V|²) operations. So the number of operations for lines 5–8
is O(|V|²).
And the number of times we have to go through lines 5–8 is O(|V|). So the number of
operations for lines 2–9 is O(|V|)·O(|V|²) = O(|V|³).
Lines 1 and 10 are just a few operations.
So we can conclude that the algorithm in (b) will require O(|V|³) operations.
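The ten numbered lines above can be sketched in Python (all names and the representation of the network as a dict of arc weights are my own; the inner routine is a plain O(|V|²) Dijkstra, so the whole sketch matches the O(|V|³) count just derived).

```python
def dijkstra_dist(n, w, source, target):
    """Plain O(|V|^2) Dijkstra; w maps (u, v) to a non-negative weight."""
    INF = float('inf')
    d = [INF] * n
    d[source] = 0
    grey = set(range(n))
    while grey:
        u = min(grey, key=lambda x: d[x])    # find the minimum (line 5
        grey.remove(u)                       # of Dijkstra's Algorithm)
        for v in grey:
            if (u, v) in w:
                d[v] = min(d[v], d[u] + w[(u, v)])
    return d[target]

def shortest_tour_through(n, w, s):
    """Minimum weight of a tour containing s, following lines 1-10."""
    M = float('inf')                         # line 1
    for v in range(n):                       # lines 2-4
        if (s, v) in w:
            m = dijkstra_dist(n, w, v, s)    # line 5
            M = min(M, w[(s, v)] + m)        # lines 6-7
    return M                                 # line 10
```

On the directed triangle 0 → 1 → 2 → 0 with weights 1, 2, 3 the only tour through 0 has weight 6.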
MA 208
Optimisation Theory
(Half Unit)
2007/08 syllabus only ( not for resit candidates )
Instructions to candidates
Time allowed :
2 hours.
This exam contains 5 questions. You may attempt as many questions as you wish,
but only your best 4 questions will count towards the final mark.
All questions carry equal numbers of marks.
Answers should be justified by showing work.
You are supplied with :
Answer booklets.
Remarks :
Page 1 of 5
Maximise x1 + x2 + x3,
subject to x1 + x2 ≤ 2,
x1 + x3 ≤ 3,
(1)
x2 + x3 ≤ 5,
x1, x2, x3 ≥ 0.
An optimal solution of this system is obtained at x∗ = (0, 2, 3). ( You don't have to
prove this. )
(i) Formulate the Dual LP-problem of (1).
(ii) Describe the information that can be obtained from the Strong Duality Property
for the LP-problem in (1) and its Dual.
(iii) Describe the information that the Complementary Slackness Conditions provide
for the Dual LP-problem obtained in (i).
(iv) Find an optimal solution for the Dual LP-problem from (i).
Is this solution unique ?
(a) A monopolist produces one product in a market. He has a fixed cost C > 0, and
when he produces x units of product, then he has a cost c(x) per unit and can ask a
price p(x) per unit so that all units can be sold. So his profit as a function of units
produced is
π(x) = x·p(x) − x·c(x) − C,
and of course this profit should be maximised, for x ≥ 0.
It is known that p : R+ → R+ is a continuous function satisfying p(x) → α as
x → ∞, for some α ≥ 0; and that c : R+ → R+ is a continuous function satisfying
c(x) → β as x → ∞, for some β ≥ 0.
For the following cases, decide if a maximum of π(x) for x ≥ 0 always exists or not.
Justify your answers.
(i) α > β;
(ii) β > α;
(iii) α = 0 and β = 0.
(b) Consider the function h(x, y) = x² + y² on the set D = { (x, y) | y² = x²·(1 − x²) }.
(i) Explain why you can be sure that h has a maximum and a minimum on D.
(ii) Find the maxima and minima of h on D.
Make sure to explain why you are justified in each of the steps you do in your
analysis.
The constraint set is replaced by the set Dε = { (x, y) | y² = x²·(1 − x²) + ε }, where
ε is a ( negative or positive ) real number close to zero. So now we are looking for the
extrema of h on Dε.
(iii) Based on the results obtained in (ii), what can you say approximately about the
extrema of h on Dε ?
(a) Determine, justifying your answer, if the following statements are true :
(i) Let D1 = { x ∈ R | |x| < 1 } and D2 = { x ∈ R | |x| ≤ 1 }. If f : R → R is a
( not necessarily continuous ) function which has a maximum on D1, then f has a
maximum on D2.
(ii) Let D3 = { x ∈ R² | ‖x‖ < 1 } and D4 = { x ∈ R² | ‖x‖ ≤ 1 }. If g : R² → R is a
( not necessarily continuous ) function which has a maximum on D3, then g has a
maximum on D4.
(b) Let N = (D, w) be a network, with D = (V, A) a digraph and w : A → Z a weight
function on the arcs. And let s ∈ V.
(i) Describe the Bellman-Ford Algorithm to find dist(s, v) for all v ∈ V.
The Bellman-Ford Algorithm is run on a particular network N = (D, w). For that
particular instance, the output of the algorithm appears to be a list of values dist(s, v) ≠
±∞ for all v ∈ V. Moreover, for all v ≠ s the output gives dist(s, v) > 0.
Based on the output of the Bellman-Ford Algorithm for this network, which of the
following statements are justified ? ( Make sure you justify your answers, either by
explaining why a statement is true, or by providing a counterexample. )
(ii) We must have w(a) ≥ 0 for all a ∈ A.
(iii) There are no cycles in the digraph with negative weight.
(iv) The digraph D is strongly connected.
END OF EXAM
1 : We have Df = (4x − y, −x + 4y). Solving for λ in the Lagrange equations gives
2·(4x − y) = 4y − x; or 8x − 2y = 4y − x; or 9x = 6y; or y = (3/2)·x.
Substituting into the constraint gives x² = 1/5, so the solutions are
x = 1/√5, y = 3/(2√5); and x = −1/√5, y = −3/(2√5).
We have f(1/√5, 3/(2√5)) = (4/5)·√5 > 0 and f(−1/√5, −3/(2√5)) = −(4/5)·√5 < 0, so the former must be the maximum and the latter the minimum.
(b) Analyse the constraint qualifications for the following sets of constraints.
i. g1(x, y, z) = (x − 1)² + y² − 1 = 0; g2(x, y, z) = (x − 2)² + y² − 4 = 0. (These are cylinder sets.)
We have Dg1 = (2x − 2, 2y, 0) and Dg2 = (2x − 4, 2y, 0).
[the remainder of this analysis, and a Lagrange computation with candidate values f(0, 0) = 0 (but no interior maximum as Df ≠ 0), f(−1, 0) = 1 and f(−1/2, ±√3/2) = 3/4 + 1/2 = 5/4, is garbled in the source]
ii. Dg1 = (2x, 2y), Dg2 = (0, 1), Dg3 = (0, 1), . . . [the remainder of this constraint-qualification analysis is garbled in the source]
[garbled fragments of an LP solution : an LP with constraints involving 2x1 + 3x2, 2x2 + x3, 2x1 − 3x3, x1 + 3x2 and xi ≥ 0, and a complementary-slackness check for a constraint 5x + 6y ≤ 30]
5. Multiplying an m × k matrix by a k × n matrix requires m·k·n multiplications (n for each of the m·k elements of the m × n product matrix).
(a) Given matrices A (2 × 5), B (5 × 1), C (1 × 3), and D (3 × 6) :
((AB)C)D has 2·5·1 + 2·1·3 + 2·3·6 = 52 mults
(A(BC))D has 5·1·3 + 2·5·3 + 2·3·6 = 81 mults
A((BC)D) has 5·1·3 + 5·3·6 + 2·5·6 = 165 mults
(AB)(CD) has 2·5·1 + 1·3·6 + 2·1·6 = 40 mults
A(B(CD)) has 1·3·6 + 5·1·6 + 2·5·6 = 108 mults
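The five multiplication counts (52, 81, 165, 40, 108) can be checked mechanically. The helper below is my own: it takes a nested tuple mirroring a parenthesisation, with each matrix given as a (rows, cols) pair, and returns the total number of multiplications.

```python
def chain_cost(t):
    """Total number of multiplications for one parenthesisation.

    t is a nested tuple mirroring the parenthesisation, e.g.
    (((A, B), C), D); each leaf matrix is a (rows, cols) pair.
    Multiplying an (m, k) matrix by a (k, n) matrix costs m*k*n.
    """
    def walk(node):
        if isinstance(node[0], int):       # a leaf: (rows, cols)
            return node, 0
        l, cost_l = walk(node[0])
        r, cost_r = walk(node[1])
        assert l[1] == r[0]                # inner dimensions must agree
        return (l[0], r[1]), cost_l + cost_r + l[0] * l[1] * r[1]
    return walk(t)[1]

A, B, C, D = (2, 5), (5, 1), (1, 3), (3, 6)
assert chain_cost((((A, B), C), D)) == 52
assert chain_cost(((A, (B, C)), D)) == 81
assert chain_cost((A, ((B, C), D))) == 165
assert chain_cost(((A, B), (C, D))) == 40
assert chain_cost((A, (B, (C, D)))) == 108
```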
Clearly V(3, 4) = d3·d4·d5, and
V(2, 4) = min[ d2·d3·d5 + 0 + d3·d4·d5 , d2·d4·d5 + d2·d3·d4 + 0 ].
(e) For n = 5 and d1 = 2, d2 = 3, d3 = 1, d4 = 3, d5 = 6, d6 = 4 :
[the table of values V(i, j) is garbled in the source]
h1(x1, x2) = . . . ≥ 0, h2(x1, x2) = x2 − x1³ ≥ 0 [the constraint functions are partly garbled in the source], with Dh2 = (−3x1², +1).
Since neither derivative is ever zero we need only check where they are
collinear, which means their first coordinates must be negatives
of each other. So consider
4x1 = 3x1², with solutions x1 = 0 and x1 = 4/3.
But the constraints are both effective only at x1 = 0, where x2 = 0
also. So the Constraint Qualification holds on D except at (0, 0).
[the subsequent Kuhn-Tucker computation, involving λ1·Dh1 + λ2·Dh2 = λ1·(−8, 1) + λ2·(12, 1) with λ1, λ2 ≥ 0, is garbled in the source]
27
1)3
y 2 = 0:
0.5
0
1
1.25
1.5
1.75
2
x
-0.5
-1
Note that original solutions had only top branch, so accepted this
(and other close pictures) from students. Since x 1 and y 2 0;
the objective function is at least 1. Since 12 + 02 = 1; this is
one way to establish minimum. For x or y > 2 objective is at
least 4; greater than f (1; 0) = 1: So by W-Theorem there is a
minimum on [1; 2] [0; 2] ; which is a global minimum. We have
2
Dg = D (x 1)3 y 2 = 3(x2y1) which is 00 only when x = 1
and y = 0: LaGranges equations are
2x
3 (x 1)2 = 0;
2y + 2 y = 0;
(x 1)3 y 2 = 0:
[second-order check : D²f = diag(2, 2, 2) and a second derivative matrix with entries involving 2x, 2y, 2z and λ, evaluated at the two points; garbled in the source]
4. (a) Consider the LP
min 11x1 + 2x2 + 6x3, subject to
2x1 − x2 ≥ 1,
3x1 + x2 + x3 ≥ 5,
x1, x2, x3 ≥ 0.
[part of the question is garbled in the source] . . . This will prove to your friend that the objective function cannot be less than V.
ans : Use the dual variable optimum, y1 = 1 and y2 = 3 :
1·(2x1 − x2) ≥ 1·(1)
3·(3x1 + x2 + x3) ≥ 3·(5)
adding gives 11x1 + 2x2 + 3x3 ≥ 16 = V.
(b) (This part counts much less than part a.) Explain why a pair of dual
linear programs cannot both be unbounded. For full credit, prove
any theorem you use.
Suppose that
max c·x, subject to Ax ≤ b, x ≥ 0
[the remainder of this solution is garbled in the source]
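The certificate argument for the LP min 11x1 + 2x2 + 6x3 above can be verified numerically (the check below is my own, not part of the solution): the multipliers y = (1, 3) combine the two constraints into coefficients (11, 2, 3), which are dominated by the objective coefficients (11, 2, 6), so every feasible point has objective value at least 1·1 + 3·5 = 16.

```python
# The LP: minimise 11*x1 + 2*x2 + 6*x3
# subject to 2*x1 - x2 >= 1, 3*x1 + x2 + x3 >= 5, x >= 0.
y1, y2 = 1, 3                      # the dual certificate

# y1*(first constraint) + y2*(second constraint) gives
# (2*y1 + 3*y2)*x1 + (-y1 + y2)*x2 + y2*x3 >= y1*1 + y2*5
combined = (2 * y1 + 3 * y2, -y1 + y2, y2)
bound = y1 * 1 + y2 * 5
assert combined == (11, 2, 3)      # dominated by (11, 2, 6) since x3 >= 0
assert bound == 16                 # so the objective is always >= 16

# and 16 is attained, e.g. at x = (6/5, 7/5, 0):
x1, x2, x3 = 6 / 5, 7 / 5, 0
assert 2 * x1 - x2 >= 1 - 1e-9 and 3 * x1 + x2 + x3 >= 5 - 1e-9
assert abs(11 * x1 + 2 * x2 + 6 * x3 - 16) < 1e-9
```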
r((i, j), a) = (i + 1, j) if a = 1; (i, j + 1) if a = 2, for (i, j) ≠ (4, 2).
f((i, j), a) = [garbled in the source]
(c) Suppose that p = (.1, .3, .25, .05) and q = (.2, .1). Solve the dynamic program, stating the minimum of E and the optimal sequence of searching the two tunnels. Hint : copy the diagram
below and complete the backwards induction, writing V at nodes
and indicating the optimal action at each node by a thick arrow.
[backwards-induction diagram over stages p1 = .1, p2 = .3, p3 = .25, p4 = .05 and q1 = .2, q2 = .1; the node values shown include 3.05, 2.95, 2.9, 2.75, 2.7, 2.35, 1.8, 1.6]
ans : the minimum value is 3.05, and the optimal path is (I, I, I, II, II, I).
Optimisation Theory
2007/08
MA 208
Solutions 2008 Exam
1
(a) (i)
(ii) The fact that dist(u, v) ≠ −∞ means that there is a walk W from u to v so that dist(u, v) =
w(W). Now let W = x1, x2, . . . , xk, with x1 = u, xk = v, be a walk so that dist(u, v) =
w(W) and so that W is as short as possible ( in terms of the number of vertices k ). We
will prove that W must be a path.
Suppose W is not a path. The only reason why this can be the case is if some vertex
appears more than once on W. Say xi = xj, for some i < j. If i = 1 ( so xj = xi =
x1 = u ), then let W′ be the walk xj, xj+1, . . . , xk, while if i ≠ 1, then let W′ be the walk
x1, x2, . . . , xi−1, xj, xj+1, . . . , xk. And let T be the tour xi, xi+1, . . . , xj. ( This is a tour since
the first and last vertex are the same. )
We have that W′ is a walk from u to v with fewer vertices than W. Hence we must have
w(W′) > w(W). Since w(W) = w(W′) + w(T), this means that w(T) < 0. But then we
can construct walks from u to v with arbitrarily low total weight : first walk from u to xi,
then go round the tour T as many times as you want, and then walk from xj to xk. This
would mean that dist(u, v) = −∞, contradicting the hypothesis.
So we can conclude that W is a path with dist(u, v) = w(W).
(b) (i)
Setting c = (1, 1, 1), b = (2, 3, 5) and
A = ( 1 1 0
      1 0 1
      0 1 1 ),
the Dual LP-problem is
Minimise b·y
subject to Aᵀ·y ≥ c,
y ≥ 0,
which becomes
Minimise 2y1 + 3y2 + 5y3
subject to y1 + y2 ≥ 1,
y1 + y3 ≥ 1,
y2 + y3 ≥ 1,
y1, y2, y3 ≥ 0.
© London School of Economics, 2008
(ii) For this LP-problem we are given that an optimal solution for the primal LP-problem
exists. So we know that an optimal solution y∗ for the Dual LP-problem above exists.
Moreover, we know that this optimal solution must satisfy b·y∗ = c·x∗, which gives
2y1∗ + 3y2∗ + 5y3∗ = x1∗ + x2∗ + x3∗ = 5.
(iii) From the Complementary Slackness Conditions the following information can be obtained about an optimal solution y∗ of the Dual LP-problem :
x1∗ = 0, so we have no information on the tightness of y1∗ + y2∗ ≥ 1;
x2∗ > 0, so y1∗ + y3∗ ≥ 1 must be tight : y1∗ + y3∗ = 1;
x3∗ > 0, so y2∗ + y3∗ ≥ 1 must be tight : y2∗ + y3∗ = 1;
x1∗ + x2∗ = 2, so we have no information on the tightness of y1∗ ≥ 0;
x1∗ + x3∗ = 3, so we have no information on the tightness of y2∗ ≥ 0;
x2∗ + x3∗ = 5, so we have no information on the tightness of y3∗ ≥ 0.
(iv) From (ii) and (iii) it follows that we are looking for (y1, y2, y3) with 2y1 + 3y2 + 5y3 = 5,
y1 + y3 = 1 and y2 + y3 = 1. The last two equations give y1 = 1 − y3 and y2 =
1 − y3. Substituting this into the first equation gives 2 − 2y3 + 3 − 3y3 + 5y3 = 5. This
simplifies to 5 = 5.
So all solutions to these three equations have the form (1 − a, 1 − a, a), for any a. But the
coordinates must also be non-negative, hence we must add 1 − a ≥ 0 and a ≥ 0. Also
y1 + y2 ≥ 1, which means 2·(1 − a) ≥ 1, hence a ≤ 1/2. We can conclude that all points
of the form y∗ = (1 − a, 1 − a, a) with 0 ≤ a ≤ 1/2 are optimal solutions to the Dual
LP-problem.
In particular, the optimal solution is not unique.
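Weak duality makes this conclusion easy to check numerically: x∗ = (0, 2, 3) is primal feasible with value 5, and every member of the family (1 − a, 1 − a, a) with 0 ≤ a ≤ 1/2 is dual feasible with the same value 5, so all of them are optimal. A small check (my own) in Python:

```python
# primal: max x1+x2+x3  s.t. x1+x2 <= 2, x1+x3 <= 3, x2+x3 <= 5, x >= 0
# dual:   min 2y1+3y2+5y3  s.t. y1+y2 >= 1, y1+y3 >= 1, y2+y3 >= 1, y >= 0
def primal_feasible(x):
    x1, x2, x3 = x
    return min(x) >= 0 and x1 + x2 <= 2 and x1 + x3 <= 3 and x2 + x3 <= 5

def dual_feasible(y):
    y1, y2, y3 = y
    return min(y) >= 0 and y1 + y2 >= 1 and y1 + y3 >= 1 and y2 + y3 >= 1

x_star = (0, 2, 3)
assert primal_feasible(x_star) and sum(x_star) == 5

# every member of the family has dual objective value exactly 5
for a in [0, 0.1, 0.25, 0.4, 0.5]:
    y = (1 - a, 1 - a, a)
    assert dual_feasible(y)
    assert abs(2 * y[0] + 3 * y[1] + 5 * y[2] - 5) < 1e-12
```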
(a) (i)
(ii) For the derivative we find Dh(x, y) = ( 2x·cos(y²/2) , 2y − x²·y·sin(y²/2) ).
For the critical points we need to find the solutions in D′ to Dh(x, y) = 0, hence we need
to find the solutions (x, y) to 2x·cos(y²/2) = 0 and 2y − x²·y·sin(y²/2) = 0. The
first equation has solutions if x = 0 or cos(y²/2) = 0.
The possibility x = 0 in the second equation gives 2y = 0, so y = 0.
We know that cos(y²/2) = 0 only if y²/2 = (1/2 + k)·π for k ∈ Z. So we must have
y² = (1 + 2k)·π for some k ∈ Z. Since −1 < y < 2, hence 0 ≤ y² < 4, the only possible
solution is y² = π. The second equation then gives 2y − x²·y = 0, hence x = ±√2.
(iii) For the function values in the critical points we have h(0, 0) = 0 and h(±√2, √π) =
2·cos(π/2) + π = π. We next show that there are function values for points in D′ below
and above these two values, so that neither of them is a global minimum or maximum.
Since a minimum or maximum can only occur in a critical point ( the set D′ is open ), this
means that h has no global minimum or maximum on D′.
Firstly, h(a, 0) = a². We have (2, 0) ∈ D′. Since 2² = 4 > π, we find h(2, 0) = 4 > π = h(±√2, √π).
(b) (i) We say that f (n) = O( g(n)) if there exist constants C1, C2 so that f (n) ≤ C1 + C2·g(n)
for all n ∈ N.
(ii) Suppose f (n) = O(n), so that there exist constants C1, C2 so that f (n) ≤ C1 + C2·n for
all n ∈ N. Then for the function S_f(n) we can derive, for all n ∈ N,
S_f(n) = Σ_{k=1}^n f (k) ≤ Σ_{k=1}^n (C1 + C2·k) = C1·n + C2·Σ_{k=1}^n k ≤ C1·n + C2·n² ≤ C1·n² + C2·n² = (C1 + C2)·n².
So if we set C3 = 0 and C4 = C1 + C2, then for all n ∈ N we have S_f(n) ≤ C3 + C4·n²,
proving that S_f(n) = O(n²).
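The chain of inequalities can be sanity-checked numerically for the extremal case f(n) = C1 + C2·n (my own check, not part of the solution):

```python
# If f(n) <= C1 + C2*n for all n, then
# S_f(n) = sum_{k=1..n} f(k) <= C1*n + C2*n**2 <= (C1 + C2)*n**2.
C1, C2 = 3, 2

def f(n):
    return C1 + C2 * n             # the largest f allowed by the bound

for n in range(1, 100):
    S = sum(f(k) for k in range(1, n + 1))
    assert S <= (C1 + C2) * n ** 2
```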
(iii) Define the functions f, g : N → R+ by setting, for all n ∈ N, f (n) = 1 and g(n) = 1.
Then taking C1 = 0 and C2 = 1 we trivially have f (n) ≤ C1 + C2·g(n), hence f (n) =
O( g(n)). But S_f(n) = n for all n ∈ N.
If it were the case that S_f(n) = O(( g(n))²), then there are constants C3, C4 so that
for all n ∈ N we have S_f(n) ≤ C3 + C4·( g(n))². This is equivalent to n ≤ C3 + C4 for
all n ∈ N. Since no constants C3, C4 can satisfy C3 + C4 ≥ n for all n ∈ N, this
gives a contradiction. And hence we can conclude that it is not the case that S_f(n) =
O(( g(n))²).
(a) (i) The objective function f (x, y, z) is continuous, and the constraint set D is closed ( all
constraints are of the type h(x, y, z) ≥ 0 with continuous constraint functions ). The set D is
also bounded ( h2 bounds x and y, and then h1 bounds z ), so by the Weierstrass Theorem f
has a maximum on D.
(ii) The constraints are h1(x, y, z) = x + y − 2z² ≥ 0 and h2(x, y, z) = 2 − x² − y² ≥ 0. The
derivatives are Dh1(x, y, z) = (1, 1, −4z) and Dh2(x, y, z) = (−2x, −2y, 0).
The possible sets of effective constraints are { h1 }, { h2 } and { h1, h2 }.
It is obvious that { Dh1(x, y, z) } is never a dependent set.
We have that { Dh2(x, y, z) } is dependent if (−2x, −2y, 0) = 0, hence if x = 0 and y = 0.
But for (0, 0, z) we have h2(0, 0, z) = 2 > 0. So the constraint h2(x, y, z) = 0 is never
effective if x = 0 and y = 0.
So we are left to check if there are points (x, y, z) ∈ D with h1(x, y, z) = 0 and
h2(x, y, z) = 0, and where { Dh1(x, y, z), Dh2(x, y, z) } is a dependent set. The set
{ (1, 1, −4z), (−2x, −2y, 0) } is dependent only if −4z = 0 and if −2x = −2y, hence if z = 0
and x = y. Since we must have 2z² = x + y, from z = 0 and x = y we find x = 0 and
y = 0. But for (0, 0, 0) the second constraint h2(x, y, z) = 0 is not effective.
We can conclude that the Constraint Qualification is satisfied everywhere on D.
(iii) We have Df (x, y, z) = (y + 2z, x + 2z, 2x + 2y), Dh1(x, y, z) = (1, 1, −4z), Dh2(x, y, z) = (−2x, −2y, 0).
Using the Kuhn-Tucker Theorem, we get the following equations for (x, y, z, λ1, λ2) :
λ1 ≥ 0, λ2 ≥ 0, x + y − 2z² ≥ 0, 2 − x² − y² ≥ 0,
λ1·(x + y − 2z²) = 0;
(1)
λ2·(2 − x² − y²) = 0;
(2)
y + 2z + λ1 − 2λ2·x = 0;
(3)
x + 2z + λ1 − 2λ2·y = 0;
(4)
2x + 2y − 4λ1·z = 0.
(5)
(iv) From the analysis above, there is only one candidate for a maximum : (x, y, z) = (1, 1, 1).
We also know from (i) and (ii) that a maximum must exist and must appear as a solution
of the Kuhn-Tucker equations. It follows that (1, 1, 1) is the maximum of f (x, y, z) on D.
(b) (i)
(ii) In part (a) we found that the function f has a maximum on D in (x, y, z) = (1, 1, 1).
Hence we know that f (1, 1, 1) ≥ f (x, y, z) for all (x, y, z) ∈ D. We have that D″ ⊆ D
and (1, 1, 1) ∈ D″. Hence we have that f (1, 1, 1) ≥ f (x, y, z) for all (x, y, z) ∈ D″ as
well. So we can conclude immediately that f will have the same maximum on D″ :
(x, y, z) = (1, 1, 1).
(ii) If β > α, then π(x) → −∞ as x → ∞, so there is an x1 such that π(x) < −C for all x > x1. Now
consider the interval I = [0, x1]. The profit π is continuous on the compact set I, so, by
the Weierstrass Theorem, π has a maximum on I. And since π(0) = −C, this maximum is
at least −C and hence is a maximum on R+.
(b) (i)
(ii) Setting j(x, y) = x² − x⁴ − y², the problem becomes finding the maximum and minimum of h(x, y) for points (x, y) satisfying j(x, y) = 0.
We first check if there are points failing the Constraint Qualification. For the derivative
of j we have Dj(x, y) = (2x − 4x³, −2y). The set { Dj(x, y) } is only dependent
if Dj(x, y) = 0, hence if 2x − 4x³ = 0 and −2y = 0. The second equation gives y = 0.
The first equation can be simplified to 2x·(1 − 2x²) = 0. This has solutions x = 0 and
x = ±1/√2.
Using Dh(x, y) = (2x, 2y) and Dj(x, y) = (2x − 4x³, −2y), we are looking for solutions (x, y, λ) for :
x² − x⁴ − y² = 0;
(1)
2x + λ·(2x − 4x³) = 0;
(2)
2y + λ·(−2y) = 0.
(3)
The solutions are (x, y, λ) = (1, 0, 1), (−1, 0, 1), and (0, 0, λ) for any λ.
(a) (i)
0, if ‖(x1, x2)‖ ≠ 1 or x2 ∈ {−1, 1};
x2, if ‖(x1, x2)‖ = 1 and x2 ≠ ±1.
1. set d(s) = 0;
2. for all v ∈ V with (s, v) ∈ A : set d(v) = w(s, v);
3. for all v ∈ V, v ≠ s, with (s, v) ∉ A : set d(v) = +∞;
4. repeat |V| times :
5. { for all arcs (u, v) ∈ A :
6. set d(v) = min{ d(v), d(u) + w(u, v) }
7. };
8. for all arcs (u, v) ∈ A :
9. { if d(u) ≠ +∞ and d(v) > d(u) + w(u, v) :
10. declare "Negative Cycle!" and STOP IMMEDIATELY;
11. };
12. for all v ∈ V : declare dist(s, v) to be d(v)
In step 6 we follow the convention that for any number a we have (+∞) + a = +∞,
min{+∞, a} = a, and that min{+∞, +∞} = +∞.
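In Python the convention of step 6 comes for free with `float('inf')`, so the condensed version above needs no case distinction at all (a sketch; the representation and all names are my own):

```python
def bellman_ford_min(n, arcs, s):
    """Condensed version: the whole relaxation is the single
    assignment d(v) = min{ d(v), d(u) + w(u, v) }.

    n    -- number of vertices, labelled 0..n-1
    arcs -- list of (u, v, w) triples
    Returns the distances from s, or None on a negative cycle.
    """
    INF = float('inf')           # inf + w == inf and min(inf, a) == a,
    d = [INF] * n                # exactly the convention of step 6
    d[s] = 0
    for _ in range(n):           # repeat |V| times
        for (u, v, w) in arcs:
            d[v] = min(d[v], d[u] + w)
    for (u, v, w) in arcs:       # final negative-cycle check
        if d[v] > d[u] + w:
            return None          # "Negative Cycle!"
    return d
```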
(ii) This is false. The following is a counterexample :
[sketch of a network on vertices s, a, b, with an arc (s, a) of weight 4, an arc (s, b) of weight 2, and an arc (a, b) of negative weight]
For this network we have dist(s, a) = 4 and dist(s, b) = 2, even though the arc (a, b) has
negative weight.
(iii) This is true. The fact that dist(s, v) ≠ +∞ for all v ∈ V means that there is a path from s
to v for all v ∈ V. If there were a cycle C with negative weight and v is a vertex
on that cycle, then dist(s, v) = −∞, since we can form a walk from s to v and then walk
around the cycle as many times as we want.
In such a case, the Bellman-Ford Algorithm would not give the distances as outcome,
but would give "Negative Cycle!" instead.
(iv) This is false. The digraph in (ii) is also a counterexample for this : there is no walk
from b to s, hence the digraph is not strongly connected.