
Optimality Conditions

María M. Seron

September 2004

Centre for Complex Dynamic Systems and Control
Outline

1 Unconstrained Optimisation
Local and Global Minima
Descent Direction
Necessary Conditions for a Minimum
Necessary and Sufficient Conditions for a Minimum

2 Constrained Optimisation
Geometric Necessary Optimality Conditions
Problems with Inequality and Equality Constraints
The Fritz John Necessary Conditions
Karush–Kuhn–Tucker Necessary Conditions
Karush–Kuhn–Tucker Sufficient Conditions
Quadratic Programs

Unconstrained Optimisation

An unconstrained optimisation problem is a problem of the form

minimise f (x ), (1)

without any constraint on the vector x .

Definition (Local and Global Minima)


Consider the problem of minimising f (x ) over Rn and let x̄ ∈ Rn .
If f (x̄ ) ≤ f (x ) for all x ∈ Rn , then x̄ is called a global minimum.
If there exists an ε-neighbourhood Nε (x̄ ) around x̄ such that
f (x̄ ) ≤ f (x ) for all x ∈ Nε (x̄ ), then x̄ is called a local minimum.
If f (x̄ ) < f (x ) for all x ∈ Nε (x̄ ), x ≠ x̄, for some ε > 0, then x̄ is
called a strict local minimum.

Local and Global Minima
The figure illustrates local and global minima of a function f over
the reals.

Figure: Local and global minima of f over a set S, indicating a strict local minimum, local minima, and global minima.

Clearly, a global minimum is also a local minimum.


Descent Direction
Given a point x ∈ Rn , we wish to determine, if possible, whether or
not the point is a local or global minimum of a function f .

For differentiable functions, there exist conditions that provide this


characterisation, as we will see below.

We start by characterising descent directions.

Theorem (Descent Direction)


Let f : Rn → R be differentiable at x̄. If there exists a vector d such
that
∇f(x̄)ᵀd < 0,
then there exists a δ > 0 such that

f (x̄ + λd ) < f (x̄ ) for each λ ∈ (0, δ),

so that d is a descent direction of f at x̄.
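
As a quick numerical illustration (a hypothetical quadratic example, not part of the original slides), the sketch below checks that d = −∇f(x̄) satisfies ∇f(x̄)ᵀd < 0 and that f indeed decreases along d for small step sizes λ:

    import numpy as np

    # Hypothetical example: f(x) = x1^2 + 2*x2^2, with gradient (2*x1, 4*x2).
    def f(x):
        return x[0]**2 + 2.0 * x[1]**2

    def grad_f(x):
        return np.array([2.0 * x[0], 4.0 * x[1]])

    x_bar = np.array([1.0, -1.0])
    d = -grad_f(x_bar)                     # candidate descent direction

    # Hypothesis of the theorem: grad_f(x_bar)' d < 0.
    assert grad_f(x_bar) @ d < 0

    # Conclusion: f(x_bar + lam*d) < f(x_bar) for small enough lam > 0.
    for lam in [1e-1, 1e-2, 1e-3]:
        assert f(x_bar + lam * d) < f(x_bar)
    print("d =", d, "is a descent direction at x_bar =", x_bar)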


Descent Direction

Proof.
By the differentiability of f at x̄ , we have

f(x̄ + λd) = f(x̄) + λ∇f(x̄)ᵀd + λ‖d‖α(x̄, λd),

where α(x̄, λd) → 0 as λ → 0.

Rearranging and dividing by λ ≠ 0:

(f(x̄ + λd) − f(x̄))/λ = ∇f(x̄)ᵀd + ‖d‖α(x̄, λd).

Since ∇f(x̄)ᵀd < 0 and α(x̄, λd) → 0 as λ → 0, there exists a δ > 0
such that the right hand side above is negative for all λ ∈ (0, δ). □

Necessary Conditions for a Minimum

We then have a first-order necessary condition for a minimum.

Corollary (First Order Necessary Condition for a Minimum)


Suppose that f : Rn → R is differentiable at x̄. If x̄ is a local
minimum, then
∇f (x̄ ) = 0.

Proof.
Suppose that ∇f(x̄) ≠ 0. Then, letting d = −∇f(x̄), we get

∇f(x̄)ᵀd = −‖∇f(x̄)‖² < 0,

and by Theorem 2.1 (Descent Direction) there is a δ > 0 such that
f(x̄ + λd) < f(x̄) for each λ ∈ (0, δ), contradicting the assumption
that x̄ is a local minimum. Hence, ∇f(x̄) = 0. □

Necessary Conditions for a Minimum

A second-order necessary condition for a minimum can be given in


terms of the Hessian matrix.

Theorem (Second Order Necessary Condition for a Minimum)


Suppose that f : Rn → R is twice-differentiable at x̄. If x̄ is a local
minimum, then
∇f (x̄ ) = 0
and
H (x̄ ) is positive semidefinite.

Necessary Conditions for a Minimum
Proof.
Consider an arbitrary direction d . Then, since by assumption f is
twice-differentiable at x̄ , we have
f(x̄ + λd) = f(x̄) + λ∇f(x̄)ᵀd + (1/2)λ²dᵀH(x̄)d + λ²‖d‖²α(x̄, λd),    (2)

where α(x̄, λd) → 0 as λ → 0. Since x̄ is a local minimum, from
Corollary 2.2 we have ∇f(x̄) = 0. Rearranging the terms in (2) and
dividing by λ² > 0, we obtain

(f(x̄ + λd) − f(x̄))/λ² = (1/2)dᵀH(x̄)d + ‖d‖²α(x̄, λd).    (3)

Since x̄ is a local minimum, f(x̄ + λd) ≥ f(x̄) for sufficiently small λ.
From (3), (1/2)dᵀH(x̄)d + ‖d‖²α(x̄, λd) ≥ 0 for sufficiently small λ. By
taking the limit as λ → 0, it follows that dᵀH(x̄)d ≥ 0; and, hence,
H(x̄) is positive semidefinite. □
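
Both necessary conditions are easy to check numerically. The sketch below (an illustrative example of mine, not from the slides) uses a quadratic f(x) = (1/2)xᵀQx, whose Hessian is Q everywhere, and verifies that the gradient vanishes and the Hessian eigenvalues are nonnegative at x̄ = 0:

    import numpy as np

    # Hypothetical quadratic: f(x) = 0.5*x'Qx with Q positive semidefinite,
    # so x_bar = 0 is a minimum and H(x) = Q for all x.
    Q = np.array([[2.0, 0.0],
                  [0.0, 0.0]])            # PSD but singular: one "flat" direction
    grad = lambda x: Q @ x
    hess = lambda x: Q

    x_bar = np.zeros(2)

    # First-order necessary condition: gradient vanishes at x_bar.
    assert np.allclose(grad(x_bar), 0.0)

    # Second-order necessary condition: all Hessian eigenvalues >= 0.
    eigvals = np.linalg.eigvalsh(hess(x_bar))
    assert np.all(eigvals >= -1e-12)
    print("eigenvalues of H(x_bar):", eigvals)
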
Necessary and Sufficient Conditions for a Minimum

We now give, without proof, a sufficient condition for a local


minimum.
Theorem (Sufficient Condition for a Local Minimum)
Suppose that f : Rn → R is twice-differentiable at x̄. If ∇f (x̄ ) = 0
and H (x̄ ) is positive definite, then x̄ is a strict local minimum.
As is generally the case with optimisation problems, more powerful
results exist under (generalised) convexity conditions.

The next result shows that the necessary condition ∇f (x̄ ) = 0 is


also sufficient for x̄ to be a global minimum if f is
pseudoconvex at x̄ .
Theorem (Nec. and Suff. Condition for Pseudoconvex Functions)
Let f : Rn → R be pseudoconvex at x̄. Then x̄ is a global minimum
if and only if ∇f (x̄ ) = 0.
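
As a small illustration (my own example, not from the slides), the convex, hence pseudoconvex, function f(x) = ‖x − a‖² has ∇f(a) = 0, so by the theorem x̄ = a is a global minimum; the sketch below checks this by sampling:

    import numpy as np

    # Hypothetical pseudoconvex (in fact convex) function: f(x) = ||x - a||^2.
    a = np.array([3.0, -2.0])
    f = lambda x: np.sum((x - a)**2)
    grad_f = lambda x: 2.0 * (x - a)

    x_bar = a.copy()                       # stationary point: grad_f(x_bar) = 0
    assert np.allclose(grad_f(x_bar), 0.0)

    # No sampled point does better than x_bar.
    rng = np.random.default_rng(0)
    samples = x_bar + 10.0 * rng.normal(size=(1000, 2))
    assert all(f(x_bar) <= f(x) for x in samples)
    print("f(x_bar) =", f(x_bar), "is the smallest sampled value")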

Constrained Optimisation

We first derive optimality conditions for a problem of the following


form:

minimise f (x ), (4)
subject to:
x ∈ S.

We will first consider a general constraint set S .

Later, the set S will be more explicitly defined by a set of equality


and inequality constraints.

For constrained optimisation problems we have the following


definitions.

Feasible and Optimal Solutions

Definition (Feasible and Optimal Solutions)


Let f : Rn → R and consider the constrained optimisation
problem (4), where S is a nonempty set in Rn .
A point x ∈ S is called a feasible solution to problem (4).
If x̄ ∈ S and f (x ) ≥ f (x̄ ) for each x ∈ S, then x̄ is called an
optimal solution, a global optimal solution, or simply a
solution to the problem.
The collection of optimal solutions is called the set of
alternative optimal solutions.
If x̄ ∈ S and if there exists an ε-neighbourhood Nε (x̄ ) around x̄
such that f (x ) ≥ f (x̄ ) for each x ∈ S ∩ Nε (x̄ ), then x̄ is called a
local optimal solution.
If x̄ ∈ S and if f (x ) > f (x̄ ) for each x ∈ S ∩ Nε (x̄ ), x ≠ x̄, for
some ε > 0, then x̄ is called a strict local optimal solution.

Local and global minima
The figure illustrates examples of local and global minima.

Figure: Local and global minima of f over a set S (marked points A, B, C, D, E on the graph).

The points in S corresponding to A, B and E are also strict local


minima, whereas those corresponding to the flat segment of the
graph between C and D are local minima that are not strict.
Convex Programs
A convex program is a problem of the form

minimise f (x ), (5)
subject to:
x ∈ S.

in which the function f and set S are, respectively, a convex


function and a convex set.

The following is an important property of convex programs.

Theorem (Local Minima of Convex Programs are Global Minima)


Consider problem (5), where S is a nonempty convex set in Rn ,
and f : S → R is convex on S. If x̄ ∈ S is a local optimal solution to
the problem, then x̄ is a global optimal solution. Furthermore, if
either x̄ is a strict local minimum, or if f is strictly convex, then x̄ is
the unique global optimal solution.
Geometric Necessary Optimality Conditions

In this section we give a necessary optimality condition for problem

minimise f (x ), (6)
subject to:
x∈S

using the cone of feasible directions defined below.

We do not assume problem (6) to be a convex program.

As a consequence of this generality, only necessary conditions for


optimality will be derived.

In a later section we will impose suitable convexity conditions on
the problem in order to obtain sufficient conditions for optimality.

Cones of Feasible Directions and of Improving Directions

Definition (Cones of Feasible and Improving Directions)

Let S be a nonempty set in Rn and let x̄ ∈ cl S. The cone of


feasible directions of S at x̄, denoted by D, is given by

D = {d : d ≠ 0, and x̄ + λd ∈ S for all λ ∈ (0, δ) for some δ > 0}.

Each nonzero vector d ∈ D is called a feasible direction.


Given a function f : Rn → R, the cone of improving
directions at x̄, denoted by F, is given by

F = {d : f (x̄ + λd ) < f (x̄ ) for all λ ∈ (0, δ) for some δ > 0}.

Each direction d ∈ F is called an improving direction, or a


descent direction of f at x̄.
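
As a concrete (hypothetical) illustration of the cone D, take S to be the nonnegative orthant in R2 and the boundary point x̄ = (0, 1). Membership of a direction d in D can be probed by checking that x̄ + λd ∈ S for several small λ > 0:

    import numpy as np

    # Hypothetical set S: the nonnegative orthant in R^2.
    def in_S(x):
        return np.all(x >= 0)

    x_bar = np.array([0.0, 1.0])           # a point on the boundary of S

    def is_feasible_direction(d, lams=(1e-1, 1e-3, 1e-6)):
        """Crude test of d in D: x_bar + lam*d stays in S for several small lam > 0."""
        d = np.asarray(d, dtype=float)
        if np.allclose(d, 0.0):
            return False
        return all(in_S(x_bar + lam * d) for lam in lams)

    print(is_feasible_direction([1.0, 0.0]))     # True: moves into the orthant
    print(is_feasible_direction([1.0, -1.0]))    # True: x2 stays positive for small steps
    print(is_feasible_direction([-1.0, 0.0]))    # False: leaves S immediately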

Illustration: Cone of Feasible Directions



[Figure: three examples of a set S and the cone of feasible directions D at a point x̄.]

Illustration: Cone of Improving Directions

[Figure: three examples of the cone of improving directions F at a point x̄, for different contours of f (f decreases towards its minimum).]

Algebraic Description of the Cone of Improving Directions

We will now consider the function f to be differentiable at the point


x̄ . We can then define the sets

F0 ≜ {d : ∇f(x̄)ᵀd < 0},    (7)

F0′ ≜ {d ≠ 0 : ∇f(x̄)ᵀd ≤ 0}.    (8)

From Theorem 2.1 (Descent Direction), if ∇f(x̄)ᵀd < 0, then d is an


improving direction. It then follows that F0 ⊆ F .

Also, if d ∈ F, we must have ∇f(x̄)ᵀd ≤ 0, or else, analogous to
Theorem 2.1, ∇f(x̄)ᵀd > 0 would imply that d is an ascent direction.

Hence, we have
F0 ⊆ F ⊆ F0′.    (9)
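
The inclusion F0 ⊆ F can be checked numerically. The sketch below (an illustrative quadratic of mine, not from the slides) samples directions and, for every d with ∇f(x̄)ᵀd < 0, verifies that a sufficiently small step along d decreases f:

    import numpy as np

    # Hypothetical objective: f(x) = (x1 - 1)^2 + x2^2.
    f = lambda x: (x[0] - 1.0)**2 + x[1]**2
    grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 2.0 * x[1]])

    x_bar = np.array([0.0, 0.0])
    g = grad_f(x_bar)                       # = (-2, 0)

    rng = np.random.default_rng(1)
    for _ in range(100):
        d = rng.normal(size=2)
        if g @ d < 0:                       # d belongs to F0 ...
            # ... so a small enough step improves f (this choice of lam works
            # for this particular quadratic).
            lam = 0.5 * abs(g @ d) / (d @ d)
            assert f(x_bar + lam * d) < f(x_bar)
    print("every sampled direction in F0 was an improving direction")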

Algebraic Description of the Cone of Improving Directions

F0 ⊆ F ⊆ F0′
where
F0 ≜ {d : ∇f(x̄)ᵀd < 0},    F0′ ≜ {d ≠ 0 : ∇f(x̄)ᵀd ≤ 0}.

[Figure: three examples of the cones of improving directions, illustrating the cases F0 ⊂ F = F0′, F0 = F ⊂ F0′ and F0 ⊂ F ⊂ F0′.]

Geometric Necessary Optimality Conditions

The following theorem states that a necessary condition for local
optimality is that no improving direction in F0 is a feasible
direction.
Theorem (Geometric Necessary Condition for Local Optimality)
Consider the problem to minimise f (x ) subject to x ∈ S, where
f : Rn → R and S is a nonempty set in Rn . Suppose that f is
differentiable at a point x̄ ∈ S. If x̄ is a local optimal solution then

F0 ∩ D = ∅, (10)

where F0 = {d : ∇f(x̄)ᵀd < 0} and D is the cone of feasible


directions of S at x̄, that is

D = {d : d ≠ 0, and x̄ + λd ∈ S for all λ ∈ (0, δ) for some δ > 0}.

Geometric Necessary Optimality Conditions

Proof.
Suppose, by contradiction, that there exists a vector d ∈ F0 ∩ D.
Since d ∈ F0 , then, by Theorem 2.1 (Descent Direction), there
exists a δ1 > 0 such that

f (x̄ + λd ) < f (x̄ ) for each λ ∈ (0, δ1 ). (11)

Also, since d ∈ D , by Definition 3.2, there exists a δ2 > 0 such that

x̄ + λd ∈ S for each λ ∈ (0, δ2 ). (12)

The assumption that x̄ is a local optimal solution is not compatible
with (11) and (12). Thus, F0 ∩ D = ∅. □

Geometric Necessary Optimality Conditions




Figure: Illustration of the necessary condition F0 ∩ D = ∅ (the contours of f, the gradient ∇f(x̄), the cone F0, the set S and the cone D at x̄).

Problems with Inequality and Equality Constraints
We next consider a specific description for the feasible region S as
follows:

S = {x ∈ X : gi (x ) ≤ 0, i = 1, . . . , m, hi (x ) = 0, i = 1, . . . , `},

where gi : Rn → R for i = 1, . . . , m, hi : Rn → R for i = 1, . . . , `, and


X is a nonempty open set in Rn .

This gives the following nonlinear programming problem with


inequality and equality constraints:

minimise f (x ),
subject to:
gi (x ) ≤ 0 for i = 1, . . . , m, (13)
hi (x ) = 0 for i = 1, . . . , `,
x ∈ X.

Algebraic Description of the Cone of Feasible Directions
Suppose that x̄ is a feasible solution of problem (13), and let
I = {i : gi (x̄ ) = 0} be the index set for the binding or active
constraints. Suppose that there are no equality constraints.

Furthermore, suppose that each gi for i ∉ I is continuous at x̄, and that
f and gi for i ∈ I are differentiable at x̄.

Let

G0 ≜ {d : ∇gi(x̄)ᵀd < 0 for i ∈ I},
G0′ ≜ {d ≠ 0 : ∇gi(x̄)ᵀd ≤ 0 for i ∈ I}.

Recall the cone of feasible directions of S at x̄ :

D = {d : d ≠ 0, and x̄ + λd ∈ S for all λ ∈ (0, δ) for some δ > 0}.

Then
G0 ⊆ D ⊆ G0′.    (14)
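
The inclusion G0 ⊆ D can be illustrated numerically. In the sketch below (a made-up instance, with the unit disc as feasible set and X = R2), a direction d ∈ G0 at a boundary point is checked to remain feasible for small steps:

    import numpy as np

    # Hypothetical feasible set: S = {x in R^2 : g1(x) = x1^2 + x2^2 - 1 <= 0}.
    g1      = lambda x: x[0]**2 + x[1]**2 - 1.0
    grad_g1 = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

    x_bar = np.array([1.0, 0.0])
    assert abs(g1(x_bar)) <= 1e-12          # g1 is active at x_bar, so I = {1}

    d = np.array([-1.0, 0.5])
    assert grad_g1(x_bar) @ d < 0           # hence d is in G0

    # G0 is contained in D: small steps along d stay feasible.
    for lam in [1e-1, 1e-3, 1e-6]:
        assert g1(x_bar + lam * d) <= 0.0
    print("d in G0 remains feasible for small steps, as expected")
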
Algebraic Description of the Cone of Feasible Directions

To see the first inclusion, let d ∈ G0 . Since x̄ ∈ X , and X is open,


there exists δ1 > 0 such that

x̄ + λd ∈ X for λ ∈ (0, δ1 ).

Also, since gi , i ∉ I, is continuous at x̄ , there exists δ2 > 0 such that

gi (x̄ + λd ) < 0 for λ ∈ (0, δ2 ) and for i ∉ I.

Furthermore, since d ∈ G0, then ∇gi(x̄)ᵀd < 0 for each i ∈ I. By


Theorem 2.1 (Descent Direction) there exists δ3 > 0 such that

gi (x̄ + λd ) < gi (x̄ ) = 0 for λ ∈ (0, δ3 ) and for i ∈ I.

It is then clear that points of the form x̄ + λd are feasible to S for


each λ ∈ (0, δ), where δ = min{δ1 , δ2 , δ3 }. Thus d ∈ D and hence
G0 ⊆ D .

Algebraic Description of the Cone of Feasible Directions

G0 ⊆ D ⊆ G0′
where
G0 ≜ {d : ∇gi(x̄)ᵀd < 0 for i ∈ I},
G0′ ≜ {d ≠ 0 : ∇gi(x̄)ᵀd ≤ 0 for i ∈ I}.


[Figure: three examples of a set S and the cone D at x̄, illustrating the cases G0 = D ⊂ G0′, G0 ⊂ D = G0′ and G0 ⊂ D ⊂ G0′.]

Problems with Inequality and Equality Constraints

Theorem (Geometric Necessary Condition for Problems with
Inequality and Equality Constraints)
Let X be a nonempty open set in Rn , and let f : Rn → R,
gi : Rn → R for i = 1, . . . , m, hi : Rn → R for i = 1, . . . , `. Consider
the problem defined in (13). Suppose that x̄ is a local optimal
solution, and let I = {i : gi (x̄ ) = 0} be the index set for the binding or
active constraints. Furthermore, suppose that each gi for i ∉ I is
continuous at x̄, that f and gi for i ∈ I are differentiable at x̄, and
that each hi for i = 1, . . . , ` is continuously differentiable at x̄. If
∇hi (x̄ ) for i = 1, . . . , ` are linearly independent, then
F0 ∩ G0 ∩ H0 = ∅, where

F0 = {d : ∇f(x̄)ᵀd < 0},
G0 = {d : ∇gi(x̄)ᵀd < 0 for i ∈ I},    (15)
H0 = {d : ∇hi(x̄)ᵀd = 0 for i = 1, . . . , `}.

Problems with Inequality and Equality Constraints

Proof.
(Only for inequality constraints.) Let x̄ be a local minimum. We
then have the following implications from (10) and (14):

x̄ is a local minimum  =⇒  F0 ∩ D = ∅  =⇒  F0 ∩ G0 = ∅. □

[Figure: the contours of f, the gradient ∇f(x̄), the cones F0 and G0, and the set S at x̄.]

The Fritz John Necessary Conditions
We will now express the geometric condition F0 ∩ G0 ∩ H0 = ∅ in
an algebraic form known as the Fritz John conditions.
Theorem (The Fritz John Necessary Conditions)
Let X be a nonempty open set in Rn , and let f : Rn → R,
gi : Rn → R for i = 1, . . . , m, hi : Rn → R for i = 1, . . . , `. Let x̄ be a
feasible solution of (13), and let I = {i : gi (x̄ ) = 0}. Suppose that gi
for i ∉ I is continuous at x̄, that f and gi for i ∈ I are differentiable at
x̄, and that hi for i = 1, . . . , ` is continuously differentiable at x̄. If x̄
locally solves problem (13), then there exist scalars u0 and ui for
i ∈ I, and vi for i = 1, . . . , `, such that

u0 ∇f(x̄) + ∑_{i∈I} ui ∇gi(x̄) + ∑_{i=1}^{`} vi ∇hi(x̄) = 0,
u0, ui ≥ 0 for i ∈ I,    (16)
{u0, ui, i ∈ I, v1, . . . , v`} not all zero.
The Fritz John Necessary Conditions

Theorem (The FJ Necessary Conditions, continued)


Furthermore, if gi , i ∉ I, are also differentiable at x̄, then the above
conditions can be written as
u0 ∇f(x̄) + ∑_{i=1}^{m} ui ∇gi(x̄) + ∑_{i=1}^{`} vi ∇hi(x̄) = 0,
ui gi(x̄) = 0 for i = 1, . . . , m,    (17)
u0, ui ≥ 0 for i = 1, . . . , m,
(u0, u, v) ≠ (0, 0, 0),

where u and v are vectors whose components are ui , i = 1, . . . , m,


and vi , i = 1, . . . , `, respectively.

The Fritz John Necessary Conditions
Proof:
If the vectors ∇hi(x̄) for i = 1, . . . , ` are linearly dependent, then one
can find scalars v1, . . . , v`, not all zero, such that ∑_{i=1}^{`} vi ∇hi(x̄) = 0.
Setting u0 and ui for i ∈ I equal to zero, conditions (16) hold trivially.

Now suppose that ∇hi (x̄ ) for i = 1, . . . , ` are linearly independent.


Then, from Theorem 3.3 (Geometric Necessary Condition), local
optimality of x̄ implies that the sets defined in (15) satisfy:

F0 ∩ G0 ∩ H0 = ∅. (18)

Let A1 be the matrix whose rows are ∇f(x̄)ᵀ and ∇gi(x̄)ᵀ for i ∈ I, and
let A2 be the matrix whose rows are ∇hi(x̄)ᵀ for i = 1, . . . , `. Then,
(18) is satisfied if and only if the following system is inconsistent:

A1 d < 0,
A2 d = 0.
The Fritz John Necessary Conditions

Proof (continued):
Now consider the following two sets:

S1 = {(z1 , z2 ) : z1 = A1 d , z2 = A2 d , d ∈ Rn },
S2 = {(z1 , z2 ) : z1 < 0, z2 = 0}.

Note that S1 and S2 are nonempty convex sets and, since the
system A1 d < 0, A2 d = 0 has no solution, then S1 ∩ S2 = ∅.

Then, by the theorem of separation of two disjoint convex sets, there
exists a nonzero vector pᵀ = (p1ᵀ, p2ᵀ) such that

p1 A1 d + p2 A2 d ≥ p1 z1 + p2 z2 ,

for each d ∈ Rn and (z1 , z2 ) ∈ cl S2 .

The Fritz John Necessary Conditions

Proof (continued):
Hence
p1 A1 d + p2 A2 d ≥ p1 z1 + p2 z2 ,
for each d ∈ Rn and (z1 , z2 ) ∈ cl S2 = {(z1 , z2 ) : z1 < 0, z2 = 0}.

Noting that z2 = 0 and since each component of z1 can be made


an arbitrarily large negative number, it follows that p1 ≥ 0.

Also, letting (z1 , z2 ) = (0, 0) ∈ cl S2 , we must have (p1ᵀA1 + p2ᵀA2 )d ≥ 0
for each d ∈ Rn .

Letting d = −(A1 p1 + A2 p2 ), it follows that −kA1 p1 + A2 p2 k2 ≥ 0, and


thus A1 p1 + A2 p2 = 0.

The Fritz John Necessary Conditions

Proof (continued):
Summarising, we have found a nonzero vector pᵀ = (p1ᵀ, p2ᵀ) with
p1 ≥ 0 such that A1ᵀp1 + A2ᵀp2 = 0, where A1 is the matrix whose
rows are ∇f(x̄)ᵀ and ∇gi(x̄)ᵀ for i ∈ I, and A2 is the matrix whose
rows are ∇hi(x̄)ᵀ for i = 1, . . . , `.

Denoting the components of p1 by u0 and ui , i ∈ I, and letting


p2 = v , conditions (16) follow.

The equivalent form (17) is readily obtained by letting ui = 0 for
i ∉ I, and the proof is complete. □

The Fritz John Necessary Conditions
The scalars u0 , ui for i = 1, . . . , m, and vi for i = 1, . . . , `, are called
the Lagrange multipliers associated, respectively, with the objective
function, the inequality constraints gi (x ) ≤ 0, i = 1, . . . , m, and the
equality constraints hi (x ) = 0, i = 1, . . . , `.

The condition that x̄ be feasible for the optimisation problem (13) is


called the primal feasibility [PF] condition.
The requirements u0 ∇f(x̄) + ∑_{i=1}^{m} ui ∇gi(x̄) + ∑_{i=1}^{`} vi ∇hi(x̄) = 0, with
u0, ui ≥ 0 for i = 1, . . . , m, and (u0, u, v) ≠ (0, 0, 0), are called the
dual feasibility [DF] conditions.

The condition ui gi (x̄ ) = 0 for i = 1, . . . , m is called the


complementary slackness [CS] condition; it requires that ui = 0 if
the corresponding inequality is nonbinding (that is, gi (x̄ ) < 0), and
allows for ui > 0 only for those constraints that are binding.
The Fritz John Necessary Conditions

The FJ conditions can also be written in vector form as follows:

∇f (x̄ ) u0 + ∇g (x̄ ) u + ∇h (x̄ ) v = 0,


u g (x̄ ) = 0,
(19)
(u0 , u) ≥ (0, 0),
(u0 , u, v ) , (0, 0, 0),

where
∇g (x̄ ) is the m × n Jacobian matrix whose i th row is ∇gi (x̄ ),
∇h (x̄ ) is the ` × n Jacobian matrix whose i th row is ∇hi (x̄ ),
g (x̄ ) is the m vector function whose i th component is gi (x̄ ).

Any point x̄ for which there exist Lagrange multipliers such that the
FJ conditions are satisfied is called an FJ point.
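
As a numerical illustration (a made-up instance, not from the slides), the sketch below verifies the FJ conditions (19) at a candidate point by substituting given multipliers and checking stationarity, the sign and nontriviality requirements, and complementary slackness:

    import numpy as np

    # Made-up problem: minimise f(x) = x1 + x2  subject to  g1(x) = x1^2 + x2^2 - 2 <= 0.
    grad_f = lambda x: np.array([1.0, 1.0])
    g      = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0])
    grad_g = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])   # one row per constraint

    x_bar = np.array([-1.0, -1.0])          # candidate point (here, the minimiser)
    u0, u = 1.0, np.array([0.5])            # candidate FJ multipliers

    assert np.all(g(x_bar) <= 1e-9)                                   # primal feasibility
    stationarity = u0 * grad_f(x_bar) + grad_g(x_bar).T @ u
    assert np.allclose(stationarity, 0.0)                             # DF: equation in (19)
    assert u0 >= 0 and np.all(u >= 0) and (u0 > 0 or np.any(u > 0))   # DF: signs, nontrivial
    assert abs(u @ g(x_bar)) <= 1e-9                                  # complementary slackness
    print("x_bar is an FJ point with u0 =", u0)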

Illustration: FJ conditions

The constraint set S is

S = {x ∈ R2 : g1(x) ≤ 0, g2(x) ≤ 0, g3(x) ≤ 0}.

Consider the feasible point x̄.

[Figure: the set S bounded by the curves g1(x) = 0, g2(x) = 0 and g3(x) = 0, the feasible point x̄ on its boundary, and the gradients ∇g1, ∇g2, ∇g3, ∇f and −∇f.]

Illustration: FJ conditions

Consider the gradients of the active constraints at x̄, ∇g1(x̄) and ∇g2(x̄).

[Figure: the same constraint set S, with the active constraint gradients ∇g1(x̄) and ∇g2(x̄) drawn at x̄.]

Illustration: FJ conditions

For the given contours of the objective function f, we have that
u0 (−∇f(x̄)) is in the cone spanned by ∇g1(x̄) and ∇g2(x̄) with u0 > 0.

[Figure: the constraint set S with contours of f (f decreases towards x̄); −∇f(x̄) lies in the cone spanned by ∇g1(x̄) and ∇g2(x̄).]

Illustration: FJ conditions

The FJ conditions (with no equality constraints) are

∇f(x̄) u0 + ∇g(x̄)ᵀ u = 0,
uᵀg(x̄) = 0,
(u0, u) ≥ (0, 0),
(u0, u) ≠ (0, 0).

Here x̄ is an FJ point with u0 > 0. It is also a local minimum.

[Figure: the same configuration as before; −∇f(x̄) lies in the cone spanned by the active constraint gradients at x̄.]

Illustration: FJ conditions

For the given contours of f, we have that u0 (−∇f(x̄)) is in the cone
spanned by ∇g1(x̄) and ∇g2(x̄) only if u0 = 0.

Here x̄ is an FJ point with u0 = 0. It is also a local minimum.

[Figure: different contours of f at the same point x̄; −∇f(x̄) does not lie in the cone spanned by ∇g1(x̄) and ∇g2(x̄).]

Illustration: FJ conditions

Here x̄ is an FJ point with u0 = 0. It is also a local maximum.

[Figure: the same constraint set with the contours of f reversed; x̄ is again an FJ point with u0 = 0 but is now a local maximum.]

The Fritz John Necessary Conditions

Given an optimisation problem, there might be points that satisfy


the FJ conditions trivially. For example:
if a feasible point x̄ (not necessarily an optimum) satisfies
∇f (x̄ ) = 0, or ∇gi (x̄ ) = 0 for some i ∈ I, or ∇hi (x̄ ) = 0 for some
i = 1, . . . , `, then we can let the corresponding Lagrange
multiplier be any positive number, set all the other multipliers
equal to zero, and satisfy conditions (16).
In fact, given any feasible solution x̄ we can always add a
redundant constraint to the problem to make x̄ an FJ point.
For example, we can add the constraint ‖x − x̄‖² ≥ 0, which
holds true for all x ∈ Rn , is binding at x̄ and has zero
gradient at x̄ .

The Fritz John Necessary Conditions

Moreover, it is also possible that, at some feasible point x̄ , the
FJ conditions (16) are satisfied with the Lagrange multiplier
associated with the objective function, u0 , equal to zero.

In those cases, the objective function gradient does not play a


role in the optimality conditions (16) and the conditions merely
state that the gradients of the binding inequality constraints
and of the equality constraints are linearly dependent.

Thus, if u0 = 0, the FJ conditions are of no practical value in


locating an optimal point.

Constraint Qualification

Under suitable assumptions, referred to as constraint


qualifications, u0 is guaranteed to be positive and the FJ conditions
become the Karush–Kuhn–Tucker [KKT] conditions, which will be
presented next.

There exist various constraint qualifications for problems with


inequality and equality constraints.

Here, we use a typical constraint qualification that requires that the


gradients of the inequality constraints for i ∈ I and the gradients of
the equality constraints at x̄ be linearly independent.

Karush–Kuhn–Tucker Necessary Conditions

Theorem (Karush–Kuhn–Tucker Necessary Conditions)


Let X be a nonempty open set in Rn , and let f : Rn → R,
gi : Rn → R for i = 1, . . . , m, hi : Rn → R for i = 1, . . . , `. Consider
the problem defined in (13). Let x̄ be a feasible solution, and let
I = {i : gi (x̄ ) = 0}. Suppose that f and gi for i ∈ I are differentiable
at x̄, that each gi for i ∉ I is continuous at x̄, and that each hi for
i = 1, . . . , ` is continuously differentiable at x̄. Furthermore,
suppose that ∇gi (x̄ ) for i ∈ I and ∇hi (x̄ ) for i = 1, . . . , ` are linearly
independent. If x̄ is a local optimal solution, then there exist unique
scalars ui for i ∈ I, and vi for i = 1, . . . , `, such that

∇f(x̄) + ∑_{i∈I} ui ∇gi(x̄) + ∑_{i=1}^{`} vi ∇hi(x̄) = 0,    (20)
ui ≥ 0 for i ∈ I.

Karush–Kuhn–Tucker Necessary Conditions

Theorem (KKT Necessary Conditions, continued)


Furthermore, if gi , i ∉ I, are also differentiable at x̄, then the above
conditions can be written as
∇f(x̄) + ∑_{i=1}^{m} ui ∇gi(x̄) + ∑_{i=1}^{`} vi ∇hi(x̄) = 0,
ui gi(x̄) = 0 for i = 1, . . . , m,    (21)
ui ≥ 0 for i = 1, . . . , m.

Karush–Kuhn–Tucker Necessary Conditions
Proof.
We have, from the FJ conditions, that there exist scalars û0 and ûi ,
i ∈ I, and v̂i , i = 1, . . . , `, not all zero, such that

û0 ∇f(x̄) + ∑_{i∈I} ûi ∇gi(x̄) + ∑_{i=1}^{`} v̂i ∇hi(x̄) = 0,    (22)
û0 , ûi ≥ 0 for i ∈ I.

Note that the assumption of linear independence of ∇gi (x̄ ) for i ∈ I


and ∇hi (x̄ ) for i = 1, . . . , `, together with (22) and the fact that at
least one of the multipliers is nonzero, implies that û0 > 0.

Then, letting ui = ûi /û0 for i ∈ I, and vi = v̂i /û0 for i = 1, . . . , ` we


obtain conditions (20).

Furthermore, the linear independence assumption implies the
uniqueness of these Lagrange multipliers. □
Karush–Kuhn–Tucker Necessary Conditions

As in the FJ conditions, the scalars ui and vi are called the


Lagrange multipliers.

The condition that x̄ be feasible for the optimisation problem (13) is


called the primal feasibility [PF] condition.
The requirement that ∇f(x̄) + ∑_{i=1}^{m} ui ∇gi(x̄) + ∑_{i=1}^{`} vi ∇hi(x̄) = 0, with
ui ≥ 0 for i = 1, . . . , m, is called the dual feasibility [DF] condition.

The condition ui gi (x̄ ) = 0 for i = 1, . . . , m is called the


complementary slackness [CS] condition.

Karush–Kuhn–Tucker Necessary Conditions

The KKT conditions can also be written in vector form as follows:

∇f (x̄ ) + ∇g (x̄ ) u + ∇h (x̄ ) v = 0,


u g (x̄ ) = 0, (23)
u ≥ 0,

where
∇g (x̄ ) is the m × n Jacobian matrix whose i th row is ∇gi (x̄ ),
∇h (x̄ ) is the ` × n Jacobian matrix whose i th row is ∇hi (x̄ ),
g (x̄ ) is the m vector function whose i th component is gi (x̄ ).

Any point x̄ for which there exist Lagrange multipliers that satisfy
the KKT conditions (23) is called a KKT point.
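
The sketch below (my own small convex example, not from the slides) checks the KKT conditions (23) at a candidate point: the multipliers of the active constraints are obtained from the stationarity equation by least squares, the remaining multipliers are set to zero, and the sign and complementary slackness requirements are then verified:

    import numpy as np

    # Made-up convex instance:
    #   minimise  f(x) = (x1 - 2)^2 + (x2 - 1)^2
    #   subject to g1(x) = x1 + x2 - 2 <= 0,  g2(x) = -x1 <= 0.
    grad_f = lambda x: np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0)])
    g      = lambda x: np.array([x[0] + x[1] - 2.0, -x[0]])
    grad_g = lambda x: np.array([[1.0, 1.0],
                                 [-1.0, 0.0]])    # i-th row is the gradient of g_i

    x_bar = np.array([1.5, 0.5])                  # candidate KKT point
    active = np.abs(g(x_bar)) <= 1e-9             # active set I = {i : g_i(x_bar) = 0}

    # Solve grad_f(x_bar) + grad_g(x_bar)' u = 0 for the active multipliers.
    u = np.zeros(g(x_bar).size)
    A = grad_g(x_bar)[active].T                   # columns are active constraint gradients
    u[active], *_ = np.linalg.lstsq(A, -grad_f(x_bar), rcond=None)

    print("multipliers u =", u)
    assert np.allclose(grad_f(x_bar) + grad_g(x_bar).T @ u, 0.0)   # stationarity
    assert np.all(u >= -1e-9)                                      # dual feasibility
    assert np.allclose(u * g(x_bar), 0.0)                          # complementary slackness
    print("x_bar is a KKT point")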

Illustration: KKT conditions

[Figure: three examples of a constraint set S, the active constraint gradients and ∇f, −∇f at a point x̄. From left to right: x̄ is a KKT point, x̄ is not a KKT point, and x̄ is a KKT point.]

Constraint Qualifications
The linear independence constraint qualification is a sufficient
condition placed on the behaviour of the constraints to ensure that
an FJ point (and hence any local optimum) be a KKT point.

Thus, the importance of the constraint qualifications is to


guarantee that, by examining only KKT points, we do not lose
out on optimal solutions.

There is an important special case:

When the constraints are linear, the KKT conditions are always
necessary optimality conditions irrespective of the objective
function.

This is because Abadie’s constraint qualification is automatically


satisfied for linear constraints.
Karush–Kuhn–Tucker Sufficient Conditions

However, we are still left with the problem of determining, among


all the points that satisfy the KKT conditions, which ones constitute
local optimal solutions.

The following result shows that, under moderate convexity
assumptions, the KKT conditions are also sufficient for global
optimality.

Karush–Kuhn–Tucker Sufficient Conditions

Theorem (Karush–Kuhn–Tucker Sufficient Conditions)


Let X be a nonempty open set in Rn , and let f : Rn → R,
gi : Rn → R for i = 1, . . . , m, hi : Rn → R for i = 1, . . . , `. Consider
the problem defined in (13). Let x̄ be a feasible solution, and let
I = {i : gi (x̄ ) = 0}. Suppose that the KKT conditions hold at x̄; that
is, there exist scalars ūi ≥ 0 for i ∈ I, and v̄i for i = 1, . . . , `, such
that
∇f(x̄) + ∑_{i∈I} ūi ∇gi(x̄) + ∑_{i=1}^{`} v̄i ∇hi(x̄) = 0.    (24)

Let J = {i : v̄i > 0} and K = {i : v̄i < 0}. Further, suppose that f is
pseudoconvex at x̄, gi is quasiconvex at x̄ for i ∈ I, hi is
quasiconvex at x̄ for i ∈ J, and hi is quasiconcave at x̄ (that is, −hi
is quasiconvex at x̄) for i ∈ K . Then x̄ is a global optimal solution to
problem (13).

Quadratic Programs

Quadratic programs are a special class of nonlinear programs in


which the objective function is quadratic and the constraints are
linear.

Thus, a quadratic programming [QP] problem can be written as

minimise (1/2) xᵀHx + xᵀc,    (25)
subject to:
AI x ≤ bI ,
AE x = bE ,

where H is an n × n matrix, c is an n vector, AI is an mI × n matrix,
bI is an mI vector, AE is an mE × n matrix and bE is an mE vector.

Quadratic Programs
The constraints are linear, hence:
x̄ is a local minimum =⇒ x̄ is a KKT point;
the constraint set S = {x : AI x ≤ bI , AE x = bE } is convex.

Thus,

the QP is convex ⇐⇒ the objective function is convex


⇐⇒ H is symmetric and positive semidefinite

In this case:

x̄ is a local min ⇐⇒ x̄ is a global min ⇐⇒ x̄ is a KKT point

Furthermore, if H > 0, then x̄ is the unique global minimum.
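
Convexity of a QP can therefore be decided directly from H. The sketch below (hypothetical data) checks symmetry and positive (semi)definiteness of H via its eigenvalues:

    import numpy as np

    # Hypothetical QP Hessian.
    H = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

    symmetric = np.allclose(H, H.T)
    eigvals = np.linalg.eigvalsh(H)
    convex_qp       = symmetric and np.all(eigvals >= -1e-12)   # H >= 0: convex QP
    strictly_convex = symmetric and np.all(eigvals > 1e-12)     # H > 0: unique global minimum

    print("eigenvalues of H:", eigvals)
    print("convex QP:", convex_qp, "| strictly convex (unique minimiser):", strictly_convex)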


KKT Conditions for QP

The KKT conditions (23) for the QP problem defined in (25) are:

PF: AI x̄ ≤ bI ,
AE x̄ = bE ,
DF: H x̄ + c + AI u + AE v = 0, (26)
u ≥ 0,
 
CS: u (AI x̄ − bI ) = 0,

where u is an mI vector of Lagrange multipliers corresponding to


the inequality constraints and v is an mE vector of Lagrange
multipliers corresponding to the equality constraints.
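
When there are no inequality constraints, conditions (26) reduce to a square linear system in (x̄, v). The sketch below (made-up data) assembles and solves that system, and then verifies primal and dual feasibility:

    import numpy as np

    # Made-up equality-constrained QP: minimise 0.5*x'Hx + x'c  s.t.  AE x = bE.
    H  = np.array([[2.0, 0.0],
                   [0.0, 2.0]])
    c  = np.array([-2.0, -4.0])
    AE = np.array([[1.0, 1.0]])
    bE = np.array([1.0])

    # KKT conditions (26) with no inequalities:  H x + c + AE' v = 0,  AE x = bE,
    # stacked as one linear system in (x, v).
    n, mE = H.shape[0], AE.shape[0]
    KKT = np.block([[H,  AE.T],
                    [AE, np.zeros((mE, mE))]])
    rhs = np.concatenate([-c, bE])
    sol = np.linalg.solve(KKT, rhs)
    x_bar, v = sol[:n], sol[n:]

    print("x_bar =", x_bar, " v =", v)
    assert np.allclose(H @ x_bar + c + AE.T @ v, 0.0)   # dual feasibility
    assert np.allclose(AE @ x_bar, bE)                  # primal feasibility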

