lec09

MATH 4211/6211 – Optimization

Constrained Optimization

Xiaojing Ye
Department of Mathematics & Statistics
Georgia State University



Constrained optimization problems are formulated as

minimize f (x)
subject to gj (x) ≤ 0, j = 1, . . . , p,
           hi(x) = 0, i = 1, . . . , m,

where gj , hi : Rn → R are inequality and equality constraint functions, respectively.

We can summarize them into vector-valued functions g and h:

g(x) = [g1(x); . . . ; gp(x)]  and  h(x) = [h1(x); . . . ; hm(x)]

so the constraints can be written as g(x) ≤ 0 and h(x) = 0, respectively.

Note that g : Rn → Rp and h : Rn → Rm.



We can write the constrained optimization problem concisely as

minimize f (x)
subject to g(x) ≤ 0
           h(x) = 0

The feasible set is Ω := {x ∈ Rn : g(x) ≤ 0, h(x) = 0}.

Example. LP (standard form) is a constrained optimization problem with f (x) = c>x, g(x) = −x, and h(x) = Ax − b.
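As a quick illustration (not part of the original slides), a standard-form LP can be handed to scipy.optimize.linprog; the data c, A, b below are made up for illustration:

```python
# A minimal sketch (illustrative data): solve min c^T x s.t. Ax = b, x >= 0.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

# bounds=(0, None) encodes the inequality constraint g(x) = -x <= 0
res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
print(res.x)  # expected: all weight on the cheapest coordinate, [1, 0, 0]
```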



We now focus on constrained optimization problems with equality constraints
only, i.e.,

minimize f (x)
subject to h(x) = 0
and the feasible set is Ω = {x ∈ Rn : h(x) = 0}.



Some equality-constrained optimization problems can be converted into unconstrained ones.

Example.

• Consider the constrained optimization problem

  minimize x1² + 2x1x2 + 3x2² + 4x1 + 5x2 + 6x3
  subject to x1 + 2x2 = 3
             4x1 + 5x3 = 6

  The constraints imply that x2 = (3 − x1)/2 and x3 = (6 − 4x1)/5. Substitute x2 and x3 in the objective function to get an unconstrained minimization over x1 only (see the sketch below).
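A minimal sketch of this substitution trick, assuming sympy is available (the code is illustrative, not from the slides):

```python
# Eliminate x2 and x3 via the equality constraints, then minimize in x1 alone.
import sympy as sp

x1 = sp.symbols('x1')
x2 = (3 - x1) / 2          # from x1 + 2 x2 = 3
x3 = (6 - 4 * x1) / 5      # from 4 x1 + 5 x3 = 6

f = x1**2 + 2*x1*x2 + 3*x2**2 + 4*x1 + 5*x2 + 6*x3

# Unconstrained FONC in one variable: f'(x1) = 0
crit = sp.solve(sp.diff(f, x1), x1)
print(crit, [f.subs(x1, c) for c in crit])
```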



Example.

• Consider the constrained optimization problem

  maximize x1x2
  subject to x1² + 4x2² = 1

  It is equivalent to maximizing x1²x2²; then substitute x1² by 1 − 4x2² to get an unconstrained problem in x2.

  Another way to solve this is to use 1 = x1² + (2x2)² ≥ 4x1x2, where the equality holds when x1 = 2x2. So x1 = √2/2 and x2 = √2/4.

However, not all equality-constrained problems can be easily converted into unconstrained ones.



We need a general theory to solve constrained optimization problems with equality constraints:

minimize f (x)
subject to h(x) = 0
and the feasible set is Ω = {x ∈ Rn : h(x) = 0}.

Recall that h : Rn → Rm (m ≤ n) has Jacobian matrix

Dh(x) = [∇h1(x)>; . . . ; ∇hm(x)>] ∈ Rm×n

Definition. We say a point x ∈ Ω is a regular point if rank(Dh(x)) = m, i.e., the Jacobian matrix has full row rank.



Example. Let n = 3 and m = 1. Define h1(x) = x2 − x3² to be the only constraint. Then the Jacobian matrix is

Dh(x) = [∇h1(x)>] = [0, 1, −2x3]

Note that Dh(x) ≠ 0 and hence rank(Dh(x)) = 1 everywhere.

The feasible set Ω is a “surface” in R3 with dimension n − m = 3 − 1 = 2.



Example. Let n = 3 and m = 2. Define h1(x) = x1 and h2(x) = x2 − x3². The Jacobian is

Dh(x) = [1, 0, 0; 0, 1, −2x3]

with rank(Dh(x)) = 2 everywhere, and the feasible set Ω is a line in R3 with dimension n − m = 3 − 2 = 1.



Tangent space and normal space

Definition. We say x : (a, b) → Rn, a curve in Rn, is differentiable if xi′(t) exists for all t ∈ (a, b). The derivative is defined by

x′(t) = [x1′(t); . . . ; xn′(t)]

We say x is twice differentiable if xi″(t) exists for all t ∈ (a, b), and

x″(t) = [x1″(t); . . . ; xn″(t)]



Definition. The tangent space of Ω = {x ∈ Rn : h(x) = 0} at x∗ is the set

T (x∗) = {y ∈ Rn : Dh(x∗)y = 0}.


In other words, T (x∗) = N (Dh(x∗)).

Remark. If x∗ is regular, then rank(Dh(x∗)) = m, and hence dim(T (x∗)) = dim(N (Dh(x∗))) = n − m.

Remark. We sometimes draw the tangent space as a plane tangent to Ω at x∗; that tangent plane is

T P (x∗) := x∗ + T (x∗) = {x∗ + y : y ∈ T (x∗)}



Example. Let

Ω = {x ∈ R3 : h1(x) = x1 = 0, h2(x) = x1 − x2 = 0}

Then we have

Dh(x) = [∇h1(x)>; ∇h2(x)>] = [1, 0, 0; 1, −1, 0]

at any x ∈ Ω, and the tangent space at any point x is

T (x) = N (Dh(x)) = {y ∈ R3 : [1, 0, 0; 1, −1, 0] y = 0} = {[0; 0; α] ∈ R3 : α ∈ R}
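Numerically, the tangent space is just a null-space computation; a minimal scipy sketch (not from the slides):

```python
# Compute T(x) = N(Dh(x)) for the example above.
import numpy as np
from scipy.linalg import null_space

Dh = np.array([[1.0, 0.0, 0.0],
               [1.0, -1.0, 0.0]])

T = null_space(Dh)  # columns form an orthonormal basis of N(Dh)
print(T)            # expected: a single column proportional to [0, 0, 1]
```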



Theorem. Suppose x∗ is regular. Then y ∈ T (x∗) iff there exists a curve x : (−δ, δ) → Ω such that x(0) = x∗ and x′(0) = y.

Proof. (⇐) Let x(t) be such a curve; then h(x(t)) = 0 for t ∈ (−δ, δ) and

Dh(x(0))x′(0) = Dh(x∗)y = 0

which implies y ∈ T (x∗).



Proof (cont.)

(⇒) For any t, let u = u(t) ∈ Rm be determined by t s.t. it solves

h̄(t, u) := h(x∗ + ty + Dh(x∗)>u) = 0

We know u(0) = 0 is a solution at t = 0. Moreover,

Du h̄(0, u) = Dh(x∗)Dh(x∗)> ≻ 0

as Dh(x∗) has full row rank. Hence by the Implicit Function Theorem, there is δ > 0 s.t. a unique solution u(t) to h̄(t, u) = 0 exists for t ∈ (−δ, δ). Then

x(t) = x∗ + ty + Dh(x∗)>u(t)

is the desired curve.



Definition. The normal space of Ω at x∗ is defined by

N (x∗) = {x ∈ Rn : x = Dh(x∗)>z for some z ∈ Rm}

In other words,

N (x∗) = C(Dh(x∗)>) = R(Dh(x∗)) = span{∇h1(x∗), . . . , ∇hm(x∗)}

Note that dim(N (x∗)) = dim(R(Dh(x∗))) = m.



Remark. The tangent space T (x∗) and the normal space N (x∗) form an
orthogonal decomposition of Rn:

Rn = T (x∗) ⊕ N (x∗) = N (Dh(x∗)) ⊕ R(Dh(x∗))


where T (x∗) ⊥ N (x∗).

We can also write this as T (x∗)⊥ = N (x∗) or N (x∗)⊥ = T (x∗).

Hence, for any v ∈ Rn, there exists a unique pair y ∈ T (x∗) and w ∈ N (x∗) such that

v = y + w



Now let us see the first-order necessary condition (FONC) for equality-constrained minimization.

Suppose x∗ is a local minimizer of f (x) over Ω = {x : h(x) = 0}, where f, h ∈ C 1.

Then for any y ∈ T (x∗), there exists a curve x : (a, b) → Ω such that x(t) = x∗ and x′(t) = y for some t ∈ (a, b).

Define φ(s) = f (x(s)) (note that φ : (a, b) → R); then

φ′(s) = ∇f (x(s))>x′(s)



In particular, due to the standard FONC, we have

φ′(t) = ∇f (x(t))>x′(t) = ∇f (x∗)>y = 0

Since y ∈ T (x∗) is arbitrary, we know ∇f (x∗) ⊥ T (x∗), i.e.,

∇f (x∗) ∈ N (x∗)

This means that ∃ λ∗ ∈ Rm such that

∇f (x∗) + Dh(x∗)>λ∗ = 0

This result is summarized below:

Theorem [Lagrange’s Theorem]. If x∗ is a local minimizer (or maximizer) of f : Rn → R subject to h(x) = 0 ∈ Rm where m ≤ n, and x∗ is a regular point (Dh(x∗) has full row rank), then there exists λ∗ ∈ Rm s.t.

∇f (x∗) + Dh(x∗)>λ∗ = 0



Now we know if x∗ is a local minimizer of

minimize f (x)
subject to h(x) = 0
then x∗ must satisfy

∇f (x∗) + Dh(x∗)>λ∗ = 0
h(x∗) = 0
These are called the first-order necessary conditions (FONC), or the Lagrange condition, of the equality-constrained minimization problem. λ∗ is called the Lagrange multiplier.

Remark. The conditions above are necessary but not sufficient to determine
x∗ to be a local minimizer—a point satisfying these conditions could be a local
maximizer or neither.



Example. Consider the problem

minimize f (x)
subject to h(x) = 0

where f (x) = x and

h(x) = { x²        if x < 0
         0         if 0 ≤ x ≤ 1
         (x − 1)²  if x > 1

We can see that Ω = [0, 1] and x∗ = 0 is the only local minimizer. However, f ′(x∗) = 1 and h′(x∗) = 0, so f ′(x∗) + λh′(x∗) = 1 ≠ 0 for every λ. The Lagrange condition fails to hold because x∗ is not a regular point.



We introduce the Lagrange function

l(x, λ) = f (x) + h(x)>λ


Then the Lagrange condition becomes

∇xl(x∗, λ∗) = ∇f (x∗) + Dh(x∗)>λ∗ = 0


∇λl(x∗, λ∗) = h(x∗) = 0
Note that this is a system of n + m equations in [x; λ] ∈ Rn+m (which has n + m unknowns).



Example. Given a fixed area A of cardboard, we wish to construct a closed cardboard box with maximum volume. Let the dimensions of the box be x = [x1; x2; x3]; then the problem can be formulated as

maximize x1x2x3
subject to 2(x1x2 + x2x3 + x3x1) = A

Hence we can set

f (x) = −x1x2x3
h(x) = x1x2 + x2x3 + x3x1 − A/2

Then the Lagrange function is

l(x, λ) = f (x) + h(x)λ = −x1x2x3 + (x1x2 + x2x3 + x3x1 − A/2)λ



So the Lagrange condition is

∇x l(x, λ) = 0
∇λ l(x, λ) = 0

which is

x2x3 − (x2 + x3)λ = 0
x1x3 − (x1 + x3)λ = 0
x1x2 − (x1 + x2)λ = 0
x1x2 + x2x3 + x3x1 − A/2 = 0

Then solving this system yields

x1 = x2 = x3 = √(A/6),  λ = (1/2)√(A/6).
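A quick sympy check of this system (not from the slides), using the concrete area A = 6 so that √(A/6) = 1:

```python
# Solve the Lagrange condition for the box problem symbolically with A = 6.
import sympy as sp

x1, x2, x3, lam = sp.symbols('x1 x2 x3 lam', positive=True)
A = 6

l = -x1*x2*x3 + (x1*x2 + x2*x3 + x3*x1 - sp.Rational(A, 2)) * lam
eqs = [sp.diff(l, v) for v in (x1, x2, x3, lam)]
sol = sp.solve(eqs, (x1, x2, x3, lam), dict=True)
print(sol)  # expected: x1 = x2 = x3 = 1, lam = 1/2
```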



Example. Consider an equality-constrained optimization problem

minimize x1² + x2²
subject to x1² + 2x2² = 1

Solution. Here f (x) = x1² + x2² and h(x) = x1² + 2x2² − 1.

The Lagrange function is

l(x, λ) = (x1² + x2²) + λ(x1² + 2x2² − 1)

Then we obtain

∂x1 l(x, λ) = 2x1 + 2λx1 = 0
∂x2 l(x, λ) = 2x2 + 4λx2 = 0
∂λ l(x, λ) = x1² + 2x2² − 1 = 0



Solution (cont). Solving this system yields

[x1; x2; λ] = [0; 1/√2; −1/2], [0; −1/√2; −1/2], [1; 0; −1], and [−1; 0; −1]

It is easy to check that x = [0; ±1/√2] are local minimizers, and x = [±1; 0] are local maximizers.
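A minimal sympy sketch (not from the slides) that recovers these candidate points and compares objective values:

```python
# Solve the Lagrange system and evaluate f at each candidate point.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
l = (x1**2 + x2**2) + lam * (x1**2 + 2*x2**2 - 1)

sols = sp.solve([sp.diff(l, v) for v in (x1, x2, lam)], (x1, x2, lam), dict=True)
for s in sols:
    print(s, 'f =', (x1**2 + x2**2).subs(s))
# expected: f = 1/2 at x = [0; ±1/sqrt(2)] (minimizers),
#           f = 1   at x = [±1; 0]         (maximizers)
```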



Now we consider second-order conditions. We assume f, h ∈ C 2.

Following the same steps as in the FONC, suppose x∗ is a local minimizer; then for any y ∈ T (x∗), there exists a curve x : (a, b) → Ω such that x(t) = x∗ and x′(t) = y for some t ∈ (a, b).

Again define φ(s) = f (x(s)), and hence φ′(s) = ∇f (x(s))>x′(s). Then the standard second-order necessary condition (SONC) implies that at a local minimizer

φ′(t) = ∇f (x(t))>x′(t) = ∇f (x∗)>y = 0

and

φ″(t) = y>∇2f (x∗)y + ∇f (x∗)>x″(t) ≥ 0



In addition, since ψi(s) := hi(x(s)) = 0 for all s ∈ (a, b), we have ψi″(t) = 0, which yields

y>∇2hi(x∗)y + ∇hi(x∗)>x″(t) = 0

for all i = 1, . . . , m.

According to the Lagrange condition, we know ∃ λ∗ ∈ Rm such that

∇f (x∗) + Dh(x∗)>λ∗ = ∇f (x∗) + ∑_{i=1}^m λ∗i ∇hi(x∗) = 0

Using the results above (multiply the i-th identity by λ∗i, sum over i, and add to the inequality for φ″), we can cancel the term with x″(t) and obtain

y> [∇2f (x∗) + ∑_{i=1}^m λ∗i ∇2hi(x∗)] y ≥ 0

for all y ∈ T (x∗).



We summarize the second-order necessary condition (SONC):

Theorem (SONC). Let x∗ be a local minimizer of f : Rn → R over Ω = {x : h(x) = 0 ∈ Rm} with m ≤ n, where f, h ∈ C 2. Suppose x∗ is regular; then ∃ λ∗ = [λ∗1; . . . ; λ∗m] ∈ Rm such that

1. ∇f (x∗) + Dh(x∗)>λ∗ = 0;

2. For every y ∈ T (x∗), there is

   y> [∇2f (x∗) + ∑_{i=1}^m λ∗i ∇2hi(x∗)] y ≥ 0.

So ∇2f (x∗) + ∑_{i=1}^m λ∗i ∇2hi(x∗) plays the role of the “Hessian”.



We also have the following second-order sufficient condition (SOSC):

Theorem (SOSC). Suppose x∗ ∈ Ω = {x : h(x) = 0} is regular. If ∃ λ∗ = [λ∗1; . . . ; λ∗m] ∈ Rm such that

1. ∇f (x∗) + Dh(x∗)>λ∗ = 0;

2. for every nonzero y ∈ T (x∗), there is

   y> [∇2f (x∗) + ∑_{i=1}^m λ∗i ∇2hi(x∗)] y > 0,

then x∗ is a strict local minimizer of f over Ω.



Example. Solve the following problem

maximize (x>Qx)/(x>P x)

where Q = diag([4, 1]) and P = diag([2, 1]).

Solution. Note the objective function is scale-invariant (replacing x by tx for any t ≠ 0 yields the same value). This can be converted into the constrained minimization problem

minimize −x>Qx
subject to x>P x − 1 = 0

and h(x) = x>P x − 1 ∈ R is the constraint.

Note that ∇h(x) = 2P x = [4x1; 2x2].



Solution (cont). We first write the Lagrange function

l(x, λ) = −x>Qx + λ(x>P x − 1)


Then the Lagrange condition becomes

∇xl(x∗, λ∗) = −2(Q − λ∗P )x∗ = 0


∇λl(x∗, λ∗) = x∗>P x∗ − 1 = 0
The first equation implies P −1Qx∗ = λ∗x∗, and hence λ∗ is an eigenvalue
of P −1Q = diag([2, 1]). Hence λ∗ = 2 or λ∗ = 1.



For λ∗ = 2, we know x∗ is the corresponding eigenvector of P −1Q and satisfies x∗>P x∗ = 1. Hence x∗ = [±1/√2; 0]. The tangent space is

T (x∗) = N (Dh(x∗)) = N ([±2√2, 0]) = {[0; a] : a ∈ R}.

We also have

∇2f (x∗) + λ∗∇2h(x∗) = −2Q + 2λ∗P = [0 0; 0 2]

Therefore y>[∇2f (x∗) + λ∗∇2h(x∗)]y = 2a² > 0 for all y = [0; a] ∈ T (x∗) with a ≠ 0.

Therefore x∗ = [±1/√2; 0] are both strict local minimizers of the constrained optimization problem.

Going back to the original problem, any x∗ = [t; 0] with t ≠ 0 is a strict local maximizer of (x>Qx)/(x>P x).



For λ∗ = 1, we know x∗ is the corresponding eigenvector of P −1Q and satisfies x∗>P x∗ = 1. Hence x∗ = [0; ±1]. The tangent space is

T (x∗) = N (Dh(x∗)) = N ([0, ±2]) = {[a; 0] : a ∈ R}.

We also have

∇2f (x∗) + λ∗∇2h(x∗) = −2Q + 2λ∗P = [−4 0; 0 0]

Therefore y>[∇2f (x∗) + λ∗∇2h(x∗)]y = −4a² < 0 for all y = [a; 0] ∈ T (x∗) with a ≠ 0.

Therefore x∗ = [0; ±1] are both strict local maximizers of the constrained optimization problem.

Going back to the original problem, any x∗ = [0; t] with t ≠ 0 is a strict local minimizer of (x>Qx)/(x>P x).
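The computation above is a generalized eigenvalue problem Qv = λP v; a minimal numerical check with scipy (not from the slides):

```python
# The candidate multipliers λ* are the eigenvalues of P^{-1} Q.
import numpy as np
from scipy.linalg import eigh

Q = np.diag([4.0, 1.0])
P = np.diag([2.0, 1.0])

# Generalized symmetric eigenproblem Q v = λ P v
vals, vecs = eigh(Q, P)
print(vals)  # expected: [1., 2.]
print(vecs)  # columns are P-normalized eigenvectors (v^T P v = 1)
```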



Now we consider a special type of constrained minimization problem with linear equality constraints (again Q ≻ 0 and A has full row rank):

minimize (1/2) x>Qx
subject to Ax = b

We have f (x) = (1/2) x>Qx and h(x) = b − Ax.

The Lagrange function is

l(x, λ) = (1/2) x>Qx + λ>(b − Ax).

Hence the Lagrange condition is

∇x l(x, λ) = Qx − A>λ = 0
∇λ l(x, λ) = b − Ax = 0



Now we solve the following system for [x∗; λ∗]:

∇x l(x∗, λ∗) = Qx∗ − A>λ∗ = 0
∇λ l(x∗, λ∗) = b − Ax∗ = 0

The first equation implies x∗ = Q−1A>λ∗.

Plugging this into the second equation and solving for λ∗ gives

λ∗ = (AQ−1A>)−1b

Hence the solution is

x∗ = Q−1A>λ∗ = Q−1A>(AQ−1A>)−1b
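A minimal numpy sketch of this closed-form solution (not from the slides; the helper name eq_qp and the test data are illustrative), using linear solves instead of explicit inverses:

```python
import numpy as np

def eq_qp(Q, A, b):
    """Minimize (1/2) x^T Q x subject to A x = b (Q > 0, A full row rank)."""
    QinvAT = np.linalg.solve(Q, A.T)       # Q^{-1} A^T
    lam = np.linalg.solve(A @ QinvAT, b)   # λ* = (A Q^{-1} A^T)^{-1} b
    return QinvAT @ lam                    # x* = Q^{-1} A^T λ*

Q = np.diag([1.0, 2.0, 4.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
x = eq_qp(Q, A, b)
print(x, A @ x)  # x should be feasible: A @ x equals b
```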



Example. Consider the problem of finding the solution of minimal norm to the linear system Ax = b. That is,

minimize ‖x‖
subject to Ax = b

Solution. The problem is equivalent to

minimize (1/2)‖x‖² = (1/2) x>x
subject to Ax = b

which is the problem above with Q = I. Hence the solution is

x∗ = A>(AA>)−1b
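A minimal check (not from the slides) that this formula agrees with the pseudoinverse solution:

```python
# The minimal-norm solution A^T (A A^T)^{-1} b equals pinv(A) @ b.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # full row rank with probability 1
b = rng.standard_normal(3)

x1 = A.T @ np.linalg.solve(A @ A.T, b)
x2 = np.linalg.pinv(A) @ b
print(np.allclose(x1, x2))  # expected: True
```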



Example. Consider a discrete dynamical system

xk = a xk−1 + b uk

with given initial x0, where k = 1, . . . , N stands for the time point. Here xk is the “state” and uk is the “control”.

Suppose we want to minimize the state and control at all points; then we can formulate the problem as

minimize (1/2) ∑_{k=1}^N (q xk² + r uk²)
subject to xk = a xk−1 + b uk , k = 1, . . . , N.

This is an example of the linear quadratic regulator (LQR) in optimal control theory.



To solve this, we let z = [x1; . . . ; xN ; u1; . . . ; uN ] ∈ R2N,

Q = [q IN , 0; 0, r IN ] ∈ R(2N)×(2N)

and

A = [L, −b IN ] ∈ RN×(2N),  b = [ax0; 0; . . . ; 0] ∈ RN,

where L ∈ RN×N has ones on its diagonal and −a on its subdiagonal. Then the problem can be written as

minimize (1/2) z>Qz
subject to Az = b

and the solution is

z∗ = [x∗; u∗] = Q−1A>(AQ−1A>)−1b
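A minimal numpy sketch (not from the slides; the helper name lqr_solve and all parameter values are illustrative) assembling A, b, Q and applying the closed-form solution:

```python
import numpy as np

def lqr_solve(a, b_coef, q, r, x0, N):
    # L: ones on the diagonal, -a on the subdiagonal
    L = np.eye(N) - a * np.eye(N, k=-1)
    A = np.hstack([L, -b_coef * np.eye(N)])   # A = [L, -b I_N]
    rhs = np.zeros(N)
    rhs[0] = a * x0                           # b = [a x0; 0; ...; 0]
    Q = np.diag([q] * N + [r] * N)
    QinvAT = np.linalg.solve(Q, A.T)
    z = QinvAT @ np.linalg.solve(A @ QinvAT, rhs)
    return z[:N], z[N:]                       # states x, controls u

x, u = lqr_solve(a=1.0, b_coef=1.0, q=1.0, r=1.0, x0=1.0, N=5)
print(x, u)
```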



Example [Credit card holder’s dilemma]. Suppose we have a credit card debt of $10,000 which has a monthly interest rate of 2%. Now we want to make monthly payments for 10 months to minimize the balance as well as the amount of the monthly payments.

Let xk be the balance and uk be the payment in month k. Then the problem can be written as

minimize (1/2) ∑_{k=1}^{10} (q xk² + r uk²)
subject to xk = 1.02 xk−1 − uk , k = 1, . . . , 10, x0 = 10000.

The more anxious we are to reduce our debt, the larger the value of q relative to r. On the other hand, the more reluctant we are to make payments, the larger the value of r relative to q.



Here are two instances with different choices of q and r:

q = 1, r = 10:

k    Balance xk   Payment uk
1    7326.60      2873.40
2    5374.36      2098.77
3    3951.13      1530.72
4    2916.82      1113.34
5    2169.61      805.54
6    1635.97      577.04
7    1263.35      405.34
8    1015.08      273.53
9    866.73       168.65
10   803.70       80.37

q = 1, r = 300:

k    Balance xk   Payment uk
1    9844.66      355.34
2    9725.36      316.20
3    9641.65      278.22
4    9593.23      241.25
5    9579.92      205.17
6    9601.68      169.84
7    9658.58      135.13
8    9750.83      100.92
9    9878.78      67.08
10   10042.87     33.48
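These numbers can be reproduced with the closed-form solution; a minimal self-contained sketch for the q = 1, r = 10 case (not from the slides):

```python
# Closed-form z* = Q^{-1} A^T (A Q^{-1} A^T)^{-1} b for the credit card LQR.
# Payments reduce the balance, so x_k - 1.02 x_{k-1} + u_k = 0.
import numpy as np

N, a, q, r, x0 = 10, 1.02, 1.0, 10.0, 10000.0
L = np.eye(N) - a * np.eye(N, k=-1)
A = np.hstack([L, np.eye(N)])
rhs = np.zeros(N)
rhs[0] = a * x0
Q = np.diag([q] * N + [r] * N)

QiAT = np.linalg.solve(Q, A.T)
z = QiAT @ np.linalg.solve(A @ QiAT, rhs)
for k in range(N):
    print(k + 1, round(z[k], 2), round(z[N + k], 2))
# expected: month 1 gives balance 7326.60 and payment 2873.40, as in the table
```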



We now focus on constrained optimization problems with both equality and
inequality constraints:

minimize f (x)
subject to g (x) ≤ 0
h(x) = 0
and the feasible set is Ω = {x ∈ Rn : g (x) ≤ 0, h(x) = 0}.

Note that g(x) = [g1(x); . . . ; gp(x)] ∈ Rp and h(x) = [h1(x); . . . ; hm(x)] ∈ Rm.

Definition. We call the inequality constraint gj active at x ∈ Ω if gj (x) = 0 and inactive if gj (x) < 0.



Definition. Denote by J(x) the index set of active constraints at x:

J(x) = {j : gj (x) = 0}.

Also denote by J c(x) := {1, . . . , p} \ J(x) its complement.

Definition. We call x a regular point in Ω if the vectors

∇hi(x), 1 ≤ i ≤ m,  ∇gj (x), j ∈ J(x)

are linearly independent (a total of m + |J(x)| vectors in Rn).



Now we consider the first-order necessary condition (FONC) for the optimization problem with both equality and inequality constraints:

Theorem [Karush-Kuhn-Tucker (KKT)]. Suppose f, g, h ∈ C 1, and x∗ is a regular point and a local minimizer of f over Ω; then ∃ λ∗ ∈ Rm, µ∗ ∈ Rp such that

∇f (x∗)> + λ∗>Dh(x∗) + µ∗>Dg (x∗) = 0>
h(x∗) = 0
g (x∗) ≤ 0
µ∗ ≥ 0
µ∗>g (x∗) = 0



Remarks.

• Define the Lagrange function:

l(x, λ, µ) = f (x) + λ>h(x) + µ>g (x)


then the first KKT condition is just ∇xl(x∗, λ∗, µ∗) = 0.

• The second and third KKT conditions are just the constraints.

• λ is the Lagrange multiplier and µ is the KKT multiplier.

• Since µ∗ ≥ 0 and g (x∗) ≤ 0, the last KKT condition implies µ∗j gj (x∗) = 0 for all j = 1, . . . , p. Namely, gj (x∗) < 0 implies µ∗j = 0. Hence

µ∗j = 0, ∀ j ∉ J(x∗).



Proof (KKT Theorem). We first just set µ∗j = 0 for all j ∈ J c(x∗).

Since gj is not active at x∗ for j ∈ J c(x∗), it is not active in a neighborhood of x∗ either. Hence, x∗ being a regular point and local minimizer in Ω implies that x∗ is a regular point and local minimizer in

Ω0 := {x ∈ Rn : h(x) = 0, gj (x) = 0, j ∈ J(x∗)}

Note that Ω0 only contains equality constraints; hence the Lagrange theorem for equality-constrained problems applies, i.e., ∃ λ∗ and µ∗j for j ∈ J(x∗) such that

∇f (x∗) + Dh(x∗)>λ∗ + Dg (x∗)>µ∗ = 0

where µ∗ = [µ∗1; . . . ; µ∗p]. We only need to show µ∗j ≥ 0 for all j ∈ J(x∗).



Proof (cont.) If µ∗j < 0 for some j ∈ J(x∗), then define

T̂ (x∗) := {y ∈ Rn : Dh(x∗)y = 0, ∇gj′ (x∗)>y = 0, j′ ∈ J(x∗), j′ ≠ j}.

We claim that ∃ y ∈ T̂ (x∗) such that ∇gj (x∗)>y ≠ 0: otherwise ∇gj (x∗) would be spanned by {∇hi(x∗), ∇gj′ (x∗) : 1 ≤ i ≤ m, j′ ∈ J(x∗), j′ ≠ j}, which contradicts the fact that the ∇hi(x∗), ∇gj (x∗) are linearly independent (since x∗ is regular). We choose y (or −y) so that ∇gj (x∗)>y < 0.

Now left-multiply both sides of ∇f (x∗) + Dh(x∗)>λ∗ + Dg (x∗)>µ∗ = 0 by y>; since µ∗j < 0 and ∇gj (x∗)>y < 0, we get

0 = y>∇f (x∗) + µ∗j y>∇gj (x∗) > y>∇f (x∗)

Moreover, by the earlier theorem on tangent vectors, there exists a curve x : (a, b) → S, where S := {x : h(x) = 0, gj′ (x) = 0, j′ ∈ J(x∗), j′ ≠ j}, such that x(t∗) = x∗ and x′(t∗) = y for some t∗ ∈ (a, b).



Proof (cont.) Moreover, define φ(t) := f (x(t)); then

φ′(t∗) = ∇f (x(t∗))>x′(t∗) = ∇f (x∗)>y < 0

Also, define ψ(t) = gj (x(t)); then

ψ′(t∗) = ∇gj (x(t∗))>x′(t∗) = ∇gj (x∗)>y < 0

These mean that ∃ ε > 0 such that on (t∗, t∗ + ε] ⊂ (a, b), f (x(t)) and gj (x(t)) both decrease further, so x(t) ∈ Ω and f (x(t)) < f (x∗) for t ∈ (t∗, t∗ + ε]. This contradicts the fact that x∗ is a local minimizer on Ω. Hence µ∗j ≥ 0 for all j ∈ J(x∗).



Example. Consider the problem

minimize f (x1, x2) = x1² + x2² + x1x2 − 3x1
subject to x1, x2 ≥ 0

The Lagrange function is

l(x, µ) = x1² + x2² + x1x2 − 3x1 − x1µ1 − x2µ2

The KKT conditions are

2x1 + x2 − 3 − µ1 = 0
x1 + 2x2 − µ2 = 0
x1, x2, µ1, µ2 ≥ 0
µ1x1 + µ2x2 = 0

Solving this yields

x∗1 = µ∗2 = 3/2,  x∗2 = µ∗1 = 0.
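A minimal numerical check of this KKT point (not from the slides), using scipy's bound-constrained solver:

```python
# Verify x* = [3/2, 0] numerically.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2 + x[0]*x[1] - 3*x[0]
res = minimize(f, x0=[1.0, 1.0], bounds=[(0, None), (0, None)])
print(res.x)  # expected: approximately [1.5, 0]
```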



Similar to the proof of the FONC, we can show the SONC.

Theorem [Second-order necessary condition (SONC)]. Suppose f, g, h ∈ C 2. If x∗ is a regular point and local minimizer, then ∃ λ∗ ∈ Rm, µ∗ ∈ Rp+ such that

• The KKT condition for (x∗, λ∗, µ∗) holds;

• For all y ∈ T (x∗), there is

  y>∇2x l(x∗, λ∗, µ∗)y ≥ 0

where

T (x∗) = {y ∈ Rn : Dh(x∗)y = 0, ∇gj (x∗)>y = 0, ∀ j ∈ J(x∗)}

Proof. The first part follows from the KKT theorem. The second part is due to the fact that x∗ being a local minimizer of f over Ω implies that it is a local minimizer over Ω0.
Theorem [Second-order sufficient condition (SOSC)]. Suppose f, g, h ∈ C 2. If ∃ λ∗ ∈ Rm, µ∗ ∈ Rp such that

• The KKT condition for (x∗, λ∗, µ∗) holds;

• For all nonzero y ∈ T̃ (x∗, µ∗), there is

  y>∇2x l(x∗, λ∗, µ∗)y > 0

where

T̃ (x∗, µ∗) := {y ∈ Rn : Dh(x∗)y = 0, ∇gj (x∗)>y = 0, j ∈ J̃(x∗, µ∗)}

and

J̃(x∗, µ∗) := {j ∈ J(x∗) : µ∗j > 0},

then x∗ is a strict local minimizer.

Remark. We omit the proof here. Note that T (x∗) ⊂ T̃ (x∗, µ∗).



Example. Consider the following constrained problem:

minimize x1x2²
subject to x1 = x2
           x1 ≥ 0

Solution. Here f (x) = x1x2², h(x) = x1 − x2, and g(x) = −x1. The Lagrange function is

l(x, λ, µ) = x1x2² + λ(x1 − x2) − µx1

Then we obtain the KKT conditions:

∂x1 l(x, λ, µ) = x2² + λ − µ = 0
∂x2 l(x, λ, µ) = 2x1x2 − λ = 0
∂λ l(x, λ, µ) = x1 − x2 = 0
x1 ≥ 0
µ ≥ 0
µx1 = 0



Solution (cont.) If x∗1 = x∗2 = 0, then λ∗ = µ∗ = 0. If x∗1 = x∗2 > 0, then µ∗ = 0 but we cannot find any valid λ∗. So only the point [x∗; λ∗; µ∗] = [0; 0; 0; 0] satisfies the KKT conditions.

Since µ∗ = 0, we have

T̃ (x∗, µ∗) = N (Dh(x∗)) = N ([1, −1]) = {t[1; 1] : t ∈ R}

On the other hand,

∇2x l(x∗, λ∗, µ∗) = [0 0; 0 0]

so y>(∇2x l(x∗, λ∗, µ∗))y = 0 for all y ∈ T̃ (x∗, µ∗), which is not strictly greater than 0. Hence the SOSC does not hold. But in fact x∗ = [0; 0] is a local minimizer (actually also a global one).



Example. Consider the following constrained problem:

minimize x1² + 4x2²
subject to x1² + 2x2² ≥ 4

Solution. Here f (x) = x1² + 4x2² and g(x) = −(x1² + 2x2² − 4). The Lagrange function is

l(x, µ) = x1² + 4x2² − µ(x1² + 2x2² − 4).

Then we obtain the KKT conditions:

∂x1 l(x, µ) = 2x1 − 2µx1 = 0
∂x2 l(x, µ) = 8x2 − 4µx2 = 0
x1² + 2x2² ≥ 4
µ ≥ 0
−µ(x1² + 2x2² − 4) = 0



Solution (cont.)

• If µ∗ = 0, then x∗1 = x∗2 = 0, which violates g(x) ≤ 0.

• If µ∗ = 1, then [x∗1; x∗2] = ±[2; 0].

• If µ∗ = 2, then [x∗1; x∗2] = ±[0; √2].

• If µ∗ > 0 but µ∗ ≠ 1, 2, then x∗1 = x∗2 = 0, which again violates g(x) ≤ 0.

Hence the following 4 points satisfy the KKT conditions:

[x1; x2; µ∗] = [2; 0; 1], [−2; 0; 1], [0; √2; 2], and [0; −√2; 2]
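A minimal numerical sanity check (not from the slides) that the constrained minimum value 4 is attained near [±2; 0]:

```python
# SLSQP with the scipy convention that 'ineq' means fun(x) >= 0,
# which matches g(x) <= 0, i.e., x1^2 + 2 x2^2 - 4 >= 0.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 4*x[1]**2
cons = [{'type': 'ineq', 'fun': lambda x: x[0]**2 + 2*x[1]**2 - 4}]

res = minimize(f, x0=[3.0, 1.0], constraints=cons, method='SLSQP')
print(res.x, res.fun)  # expected: near [2, 0] with objective value 4
```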



Solution (cont.)

For µ∗ = 1, we have

∇2x l([±2, 0, 1]) = [0 0; 0 4],  ∇g([±2, 0]) = [∓4; 0]

which implies

T̃ (x∗, µ∗) = T (x∗) = {t[0, 1] : t ∈ R}

Hence

y>∇2x l([x∗1, x∗2, µ∗])y = 4t² > 0

for all y ∈ T̃ (x∗, µ∗) \ {0}.

So [x∗1, x∗2] = [±2, 0] satisfy the SOSC and are strict local minimizers.



Solution (cont.)

For µ∗ = 2, we have

∇2x l([0, ±√2, 2]) = [−2 0; 0 0],  ∇g([0, ±√2]) = [0; ∓4√2]

which implies

T̃ (x∗, µ∗) = T (x∗) = {t[1, 0] : t ∈ R}

Hence

y>∇2x l([x∗1, x∗2, µ∗])y = −2t² < 0

for all y ∈ T̃ (x∗, µ∗) \ {0}.

So [x∗1, x∗2] = [0, ±√2] do not satisfy the SOSC; they are in fact strict local maximizers.
