lec09

MATH 4211/6211 – Optimization

Constrained Optimization

Xiaojing Ye
Department of Mathematics & Statistics
Georgia State University



Constrained optimization problems are formulated as

minimize f (x)
subject to gj (x) ≤ 0, j = 1, . . . , p,
           hi(x) = 0, i = 1, . . . , m,

where gj , hi : Rn → R are inequality and equality constraint functions, respectively.

We can summarize them into vector-valued functions g and h:

g(x) = [g1(x); . . . ; gp(x)]  and  h(x) = [h1(x); . . . ; hm(x)]

so the constraints can be written as g(x) ≤ 0 and h(x) = 0, respectively.

Note that g : Rn → Rp and h : Rn → Rm.



We can write the constrained optimization problem concisely as

minimize f (x)
subject to g(x) ≤ 0
           h(x) = 0

The feasible set is Ω := {x ∈ Rn : g(x) ≤ 0, h(x) = 0}.

Example. LP (standard form) is a constrained optimization problem with f (x) = c>x, g(x) = −x, and h(x) = Ax − b.
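As a quick illustration (not part of the original slides), a standard-form LP can be handed to scipy.optimize.linprog; the data c, A, b below are made up for illustration:

```python
# A minimal sketch (illustrative data): solve min c^T x s.t. Ax = b, x >= 0.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])

# bounds=(0, None) encodes the inequality constraint g(x) = -x <= 0
res = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
print(res.x)  # expected: all weight on the cheapest coordinate, [1, 0, 0]
```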



We now focus on constrained optimization problems with equality constraints
only, i.e.,

minimize f (x)
subject to h(x) = 0
and the feasible set is Ω = {x ∈ Rn : h(x) = 0}.



Some equality-constrained optimization problems can be converted into unconstrained ones.

Example.

• Consider the constrained optimization problem

  minimize x1² + 2x1x2 + 3x2² + 4x1 + 5x2 + 6x3
  subject to x1 + 2x2 = 3
             4x1 + 5x3 = 6

  The constraints imply that x2 = (3 − x1)/2 and x3 = (6 − 4x1)/5. Substitute x2 and x3 in the objective function to get an unconstrained minimization over x1 only (see the sketch below).
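A minimal sketch of this substitution trick, assuming sympy is available (the code is illustrative, not from the slides):

```python
# Eliminate x2 and x3 via the equality constraints, then minimize in x1 alone.
import sympy as sp

x1 = sp.symbols('x1')
x2 = (3 - x1) / 2          # from x1 + 2 x2 = 3
x3 = (6 - 4 * x1) / 5      # from 4 x1 + 5 x3 = 6

f = x1**2 + 2*x1*x2 + 3*x2**2 + 4*x1 + 5*x2 + 6*x3

# Unconstrained FONC in one variable: f'(x1) = 0
crit = sp.solve(sp.diff(f, x1), x1)
print(crit, [f.subs(x1, c) for c in crit])
```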



Example.

• Consider the constrained optimization problem

  maximize x1x2
  subject to x1² + 4x2² = 1

  It is equivalent to maximizing x1²x2²; then substitute x1² by 1 − 4x2² to get an unconstrained problem in x2.

  Another way to solve this is to use 1 = x1² + (2x2)² ≥ 4x1x2, where the equality holds when x1 = 2x2. So x1 = √2/2 and x2 = √2/4.

However, not all equality-constrained problems can be easily converted into unconstrained ones.



We need a general theory to solve constrained optimization problems with equality constraints:

minimize f (x)
subject to h(x) = 0
and the feasible set is Ω = {x ∈ Rn : h(x) = 0}.

Recall that h : Rn → Rm (m ≤ n) has Jacobian matrix

Dh(x) = [∇h1(x)>; . . . ; ∇hm(x)>] ∈ Rm×n

Definition. We say a point x ∈ Ω is a regular point if rank(Dh(x)) = m, i.e., the Jacobian matrix has full row rank.



Example. Let n = 3 and m = 1. Define h1(x) = x2 − x3² to be the only constraint. Then the Jacobian matrix is

Dh(x) = [∇h1(x)>] = [0, 1, −2x3]

Note that Dh(x) ≠ 0 and hence rank(Dh(x)) = 1 everywhere.

The feasible set Ω is a “surface” in R3 with dimension n − m = 3 − 1 = 2.



Example. Let n = 3 and m = 2. Define h1(x) = x1 and h2(x) = x2 − x3². The Jacobian is

Dh(x) = [1, 0, 0; 0, 1, −2x3]

with rank(Dh(x)) = 2 everywhere, and the feasible set Ω is a line in R3 with dimension n − m = 3 − 2 = 1.



Tangent space and normal space

Definition. We say x : (a, b) → Rn, a curve in Rn, is differentiable if xi′(t) exists for all t ∈ (a, b). The derivative is defined by

x′(t) = [x1′(t); . . . ; xn′(t)]

We say x is twice differentiable if xi″(t) exists for all t ∈ (a, b), and

x″(t) = [x1″(t); . . . ; xn″(t)]



Definition. The tangent space of Ω = {x ∈ Rn : h(x) = 0} at x∗ is the set

T (x∗) = {y ∈ Rn : Dh(x∗)y = 0}.


In other words, T (x∗) = N (Dh(x∗)).

Remark. If x∗ is regular, then rank(Dh(x∗)) = m, and hence dim(T (x∗)) = dim(N (Dh(x∗))) = n − m.

Remark. We sometimes draw the tangent space as a plane tangent to Ω at x∗; that tangent plane is

T P (x∗) := x∗ + T (x∗) = {x∗ + y : y ∈ T (x∗)}



Example. Let

Ω = {x ∈ R3 : h1(x) = x1 = 0, h2(x) = x1 − x2 = 0}

Then we have

Dh(x) = [∇h1(x)>; ∇h2(x)>] = [1, 0, 0; 1, −1, 0]

at any x ∈ Ω, and the tangent space at any point x is

T (x) = N (Dh(x)) = {y ∈ R3 : [1, 0, 0; 1, −1, 0] y = 0} = {[0; 0; α] ∈ R3 : α ∈ R}
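Numerically, the tangent space is just a null-space computation; a minimal scipy sketch (not from the slides):

```python
# Compute T(x) = N(Dh(x)) for the example above.
import numpy as np
from scipy.linalg import null_space

Dh = np.array([[1.0, 0.0, 0.0],
               [1.0, -1.0, 0.0]])

T = null_space(Dh)  # columns form an orthonormal basis of N(Dh)
print(T)            # expected: a single column proportional to [0, 0, 1]
```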



Theorem. Suppose x∗ is regular. Then y ∈ T (x∗) iff there exists a curve x : (−δ, δ) → Ω such that x(0) = x∗ and x′(0) = y.

Proof. (⇐) Let x(t) be such a curve; then h(x(t)) = 0 for t ∈ (−δ, δ) and

Dh(x(0))x′(0) = Dh(x∗)y = 0

which implies y ∈ T (x∗).



Proof (cont.)

(⇒) For any t, let u = u(t) ∈ Rm be determined by t s.t. it solves

h̄(t, u) := h(x∗ + ty + Dh(x∗)>u) = 0

We know u(0) = 0 is a solution at t = 0. Moreover,

Du h̄(0, u) = Dh(x∗)Dh(x∗)> ≻ 0

as Dh(x∗) has full row rank. Hence by the Implicit Function Theorem, there is δ > 0 s.t. a unique solution u(t) to h̄(t, u) = 0 exists for t ∈ (−δ, δ). Then

x(t) = x∗ + ty + Dh(x∗)>u(t)

is the desired curve.



Definition. The normal space of Ω at x∗ is defined by

N (x∗) = {x ∈ Rn : x = Dh(x∗)>z for some z ∈ Rm}

In other words,

N (x∗) = C(Dh(x∗)>) = R(Dh(x∗)) = span{∇h1(x∗), . . . , ∇hm(x∗)}

Note that dim(N (x∗)) = dim(R(Dh(x∗))) = m.



Remark. The tangent space T (x∗) and the normal space N (x∗) form an
orthogonal decomposition of Rn:

Rn = T (x∗) ⊕ N (x∗) = N (Dh(x∗)) ⊕ R(Dh(x∗))


where T (x∗) ⊥ N (x∗).

We can also write this as T (x∗)⊥ = N (x∗) or N (x∗)⊥ = T (x∗).

Hence, for any v ∈ Rn, there exists a unique pair y ∈ T (x∗) and w ∈ N (x∗) such that

v = y + w



Now let us see the first-order necessary condition (FONC) for equality-constrained minimization.

Suppose x∗ is a local minimizer of f (x) over Ω = {x : h(x) = 0}, where f, h ∈ C 1.

Then for any y ∈ T (x∗), there exists a curve x : (a, b) → Ω such that x(t) = x∗ and x′(t) = y for some t ∈ (a, b).

Define φ(s) = f (x(s)) (note that φ : (a, b) → R); then

φ′(s) = ∇f (x(s))>x′(s)



In particular, due to the standard FONC, we have

φ′(t) = ∇f (x(t))>x′(t) = ∇f (x∗)>y = 0

Since y ∈ T (x∗) is arbitrary, we know ∇f (x∗) ⊥ T (x∗), i.e.,

∇f (x∗) ∈ N (x∗)

This means that ∃ λ∗ ∈ Rm such that

∇f (x∗) + Dh(x∗)>λ∗ = 0

This result is summarized below:

Theorem [Lagrange’s Theorem]. If x∗ is a local minimizer (or maximizer) of f : Rn → R subject to h(x) = 0 ∈ Rm where m ≤ n, and x∗ is a regular point (Dh(x∗) has full row rank), then there exists λ∗ ∈ Rm s.t.

∇f (x∗) + Dh(x∗)>λ∗ = 0



Now we know if x∗ is a local minimizer of

minimize f (x)
subject to h(x) = 0
then x∗ must satisfy

∇f (x∗) + Dh(x∗)>λ∗ = 0
h(x∗) = 0
These are called the first-order necessary conditions (FONC), or the Lagrange condition, of the equality-constrained minimization problem. λ∗ is called the Lagrange multiplier.

Remark. The conditions above are necessary but not sufficient to determine
x∗ to be a local minimizer—a point satisfying these conditions could be a local
maximizer or neither.



Example. Consider the problem

minimize f (x)
subject to h(x) = 0

where f (x) = x and

h(x) = { x²        if x < 0
         0         if 0 ≤ x ≤ 1
         (x − 1)²  if x > 1

We can see that Ω = [0, 1] and x∗ = 0 is the only local minimizer. However, f ′(x∗) = 1 and h′(x∗) = 0, so f ′(x∗) + λh′(x∗) = 1 ≠ 0 for every λ. The Lagrange condition fails to hold because x∗ is not a regular point.



We introduce the Lagrange function

l(x, λ) = f (x) + h(x)>λ


Then the Lagrange condition becomes

∇xl(x∗, λ∗) = ∇f (x∗) + Dh(x∗)>λ∗ = 0


∇λl(x∗, λ∗) = h(x∗) = 0
Note that this is a system of n + m equations in [x; λ] ∈ Rn+m (which has n + m unknowns).



Example. Given a fixed area A of cardboard, we wish to construct a closed cardboard box with maximum volume. Let the dimensions of the box be x = [x1; x2; x3]; then the problem can be formulated as

maximize x1x2x3
subject to 2(x1x2 + x2x3 + x3x1) = A

Hence we can set

f (x) = −x1x2x3
h(x) = x1x2 + x2x3 + x3x1 − A/2

Then the Lagrange function is

l(x, λ) = f (x) + h(x)λ = −x1x2x3 + (x1x2 + x2x3 + x3x1 − A/2)λ



So the Lagrange condition is

∇x l(x, λ) = 0
∇λ l(x, λ) = 0

which is

x2x3 − (x2 + x3)λ = 0
x1x3 − (x1 + x3)λ = 0
x1x2 − (x1 + x2)λ = 0
x1x2 + x2x3 + x3x1 − A/2 = 0

Then solving this system yields

x1 = x2 = x3 = √(A/6),  λ = (1/2)√(A/6).
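A quick sympy check of this system (not from the slides), using the concrete area A = 6 so that √(A/6) = 1:

```python
# Solve the Lagrange condition for the box problem symbolically with A = 6.
import sympy as sp

x1, x2, x3, lam = sp.symbols('x1 x2 x3 lam', positive=True)
A = 6

l = -x1*x2*x3 + (x1*x2 + x2*x3 + x3*x1 - sp.Rational(A, 2)) * lam
eqs = [sp.diff(l, v) for v in (x1, x2, x3, lam)]
sol = sp.solve(eqs, (x1, x2, x3, lam), dict=True)
print(sol)  # expected: x1 = x2 = x3 = 1, lam = 1/2
```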



Example. Consider an equality-constrained optimization problem

minimize x1² + x2²
subject to x1² + 2x2² = 1

Solution. Here f (x) = x1² + x2² and h(x) = x1² + 2x2² − 1.

The Lagrange function is

l(x, λ) = (x1² + x2²) + λ(x1² + 2x2² − 1)

Then we obtain

∂x1 l(x, λ) = 2x1 + 2λx1 = 0
∂x2 l(x, λ) = 2x2 + 4λx2 = 0
∂λ l(x, λ) = x1² + 2x2² − 1 = 0



Solution (cont). Solving this system yields

[x1; x2; λ] = [0; 1/√2; −1/2], [0; −1/√2; −1/2], [1; 0; −1], and [−1; 0; −1]

It is easy to check that x = [0; ±1/√2] are local minimizers, and x = [±1; 0] are local maximizers.
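A minimal sympy sketch (not from the slides) that recovers these candidate points and compares objective values:

```python
# Solve the Lagrange system and evaluate f at each candidate point.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
l = (x1**2 + x2**2) + lam * (x1**2 + 2*x2**2 - 1)

sols = sp.solve([sp.diff(l, v) for v in (x1, x2, lam)], (x1, x2, lam), dict=True)
for s in sols:
    print(s, 'f =', (x1**2 + x2**2).subs(s))
# expected: f = 1/2 at x = [0; ±1/sqrt(2)] (minimizers),
#           f = 1   at x = [±1; 0]         (maximizers)
```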



Now we consider second-order conditions. We assume f, h ∈ C 2.

Following the same steps as in the FONC, suppose x∗ is a local minimizer; then for any y ∈ T (x∗), there exists a curve x : (a, b) → Ω such that x(t) = x∗ and x′(t) = y for some t ∈ (a, b).

Again define φ(s) = f (x(s)), and hence φ′(s) = ∇f (x(s))>x′(s). Then the standard second-order necessary condition (SONC) implies that at a local minimizer

φ′(t) = ∇f (x(t))>x′(t) = ∇f (x∗)>y = 0

and

φ″(t) = y>∇2f (x∗)y + ∇f (x∗)>x″(t) ≥ 0



In addition, since ψi(s) := hi(x(s)) = 0 for all s ∈ (a, b), we have ψi″(t) = 0, which yields

y>∇2hi(x∗)y + ∇hi(x∗)>x″(t) = 0

for all i = 1, . . . , m.

According to the Lagrange condition, we know ∃ λ∗ ∈ Rm such that

∇f (x∗) + Dh(x∗)>λ∗ = ∇f (x∗) + ∑_{i=1}^m λ∗i ∇hi(x∗) = 0

Using the results above (multiply the i-th identity by λ∗i, sum over i, and add to the inequality for φ″), we can cancel the term with x″(t) and obtain

y> [∇2f (x∗) + ∑_{i=1}^m λ∗i ∇2hi(x∗)] y ≥ 0

for all y ∈ T (x∗).



We summarize the second-order necessary condition (SONC):

Theorem (SONC). Let x∗ be a local minimizer of f : Rn → R over Ω = {x : h(x) = 0 ∈ Rm} with m ≤ n, where f, h ∈ C 2. Suppose x∗ is regular; then ∃ λ∗ = [λ∗1; . . . ; λ∗m] ∈ Rm such that

1. ∇f (x∗) + Dh(x∗)>λ∗ = 0;

2. For every y ∈ T (x∗), there is

   y> [∇2f (x∗) + ∑_{i=1}^m λ∗i ∇2hi(x∗)] y ≥ 0.

So ∇2f (x∗) + ∑_{i=1}^m λ∗i ∇2hi(x∗) plays the role of the “Hessian”.



We also have the following second-order sufficient condition (SOSC):

Theorem (SOSC). Suppose x∗ ∈ Ω = {x : h(x) = 0} is regular. If ∃ λ∗ = [λ∗1; . . . ; λ∗m] ∈ Rm such that

1. ∇f (x∗) + Dh(x∗)>λ∗ = 0;

2. for every nonzero y ∈ T (x∗), there is

   y> [∇2f (x∗) + ∑_{i=1}^m λ∗i ∇2hi(x∗)] y > 0,

then x∗ is a strict local minimizer of f over Ω.



Example. Solve the following problem

maximize (x>Qx)/(x>P x)

where Q = diag([4, 1]) and P = diag([2, 1]).

Solution. Note the objective function is scale-invariant (replacing x by tx for any t ≠ 0 yields the same value). This can be converted into the constrained minimization problem

minimize −x>Qx
subject to x>P x − 1 = 0

and h(x) = x>P x − 1 ∈ R is the constraint.

Note that ∇h(x) = 2P x = [4x1; 2x2].



Solution (cont). We first write the Lagrange function

l(x, λ) = −x>Qx + λ(x>P x − 1)


Then the Lagrange condition becomes

∇xl(x∗, λ∗) = −2(Q − λ∗P )x∗ = 0


∇λl(x∗, λ∗) = x∗>P x∗ − 1 = 0
The first equation implies P −1Qx∗ = λ∗x∗, and hence λ∗ is an eigenvalue
of P −1Q = diag([2, 1]). Hence λ∗ = 2 or λ∗ = 1.



For λ∗ = 2, we know x∗ is the corresponding eigenvector of P −1Q and satisfies x∗>P x∗ = 1. Hence x∗ = [±1/√2; 0]. The tangent space is

T (x∗) = N (Dh(x∗)) = N ([±2√2, 0]) = {[0; a] : a ∈ R}.

We also have

∇2f (x∗) + λ∗∇2h(x∗) = −2Q + 2λ∗P = [0 0; 0 2]

Therefore y>[∇2f (x∗) + λ∗∇2h(x∗)]y = 2a² > 0 for all y = [0; a] ∈ T (x∗) with a ≠ 0.

Therefore x∗ = [±1/√2; 0] are both strict local minimizers of the constrained optimization problem.

Going back to the original problem, any x∗ = [t; 0] with t ≠ 0 is a strict local maximizer of (x>Qx)/(x>P x).



For λ∗ = 1, we know x∗ is the corresponding eigenvector of P −1Q and satisfies x∗>P x∗ = 1. Hence x∗ = [0; ±1]. The tangent space is

T (x∗) = N (Dh(x∗)) = N ([0, ±2]) = {[a; 0] : a ∈ R}.

We also have

∇2f (x∗) + λ∗∇2h(x∗) = −2Q + 2λ∗P = [−4 0; 0 0]

Therefore y>[∇2f (x∗) + λ∗∇2h(x∗)]y = −4a² < 0 for all y = [a; 0] ∈ T (x∗) with a ≠ 0.

Therefore x∗ = [0; ±1] are both strict local maximizers of the constrained optimization problem.

Going back to the original problem, any x∗ = [0; t] with t ≠ 0 is a strict local minimizer of (x>Qx)/(x>P x).
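The computation above is a generalized eigenvalue problem Qv = λP v; a minimal numerical check with scipy (not from the slides):

```python
# The candidate multipliers λ* are the eigenvalues of P^{-1} Q.
import numpy as np
from scipy.linalg import eigh

Q = np.diag([4.0, 1.0])
P = np.diag([2.0, 1.0])

# Generalized symmetric eigenproblem Q v = λ P v
vals, vecs = eigh(Q, P)
print(vals)  # expected: [1., 2.]
print(vecs)  # columns are P-normalized eigenvectors (v^T P v = 1)
```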



Now we consider a special type of constrained minimization problem with linear equality constraints (again Q ≻ 0 and A has full row rank):

minimize (1/2) x>Qx
subject to Ax = b

We have f (x) = (1/2) x>Qx and h(x) = b − Ax.

The Lagrange function is

l(x, λ) = (1/2) x>Qx + λ>(b − Ax).

Hence the Lagrange condition is

∇x l(x, λ) = Qx − A>λ = 0
∇λ l(x, λ) = b − Ax = 0



Now we solve the following system for [x∗; λ∗]:

∇x l(x∗, λ∗) = Qx∗ − A>λ∗ = 0
∇λ l(x∗, λ∗) = b − Ax∗ = 0

The first equation implies x∗ = Q−1A>λ∗.

Plugging this into the second equation and solving for λ∗ gives

λ∗ = (AQ−1A>)−1b

Hence the solution is

x∗ = Q−1A>λ∗ = Q−1A>(AQ−1A>)−1b
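A minimal numpy sketch of this closed-form solution (not from the slides; the helper name eq_qp and the test data are illustrative), using linear solves instead of explicit inverses:

```python
import numpy as np

def eq_qp(Q, A, b):
    """Minimize (1/2) x^T Q x subject to A x = b (Q > 0, A full row rank)."""
    QinvAT = np.linalg.solve(Q, A.T)       # Q^{-1} A^T
    lam = np.linalg.solve(A @ QinvAT, b)   # λ* = (A Q^{-1} A^T)^{-1} b
    return QinvAT @ lam                    # x* = Q^{-1} A^T λ*

Q = np.diag([1.0, 2.0, 4.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
x = eq_qp(Q, A, b)
print(x, A @ x)  # x should be feasible: A @ x equals b
```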



Example. Consider the problem of finding the solution of minimal norm to the linear system Ax = b. That is,

minimize ‖x‖
subject to Ax = b

Solution. The problem is equivalent to

minimize (1/2)‖x‖² = (1/2) x>x
subject to Ax = b

which is the problem above with Q = I. Hence the solution is

x∗ = A>(AA>)−1b
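A minimal check (not from the slides) that this formula agrees with the pseudoinverse solution:

```python
# The minimal-norm solution A^T (A A^T)^{-1} b equals pinv(A) @ b.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # full row rank with probability 1
b = rng.standard_normal(3)

x1 = A.T @ np.linalg.solve(A @ A.T, b)
x2 = np.linalg.pinv(A) @ b
print(np.allclose(x1, x2))  # expected: True
```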



Example. Consider a discrete dynamical system

xk = a xk−1 + b uk

with given initial x0, where k = 1, . . . , N stands for the time point. Here xk is the “state” and uk is the “control”.

Suppose we want to minimize the state and control at all points; then we can formulate the problem as

minimize (1/2) ∑_{k=1}^N (q xk² + r uk²)
subject to xk = a xk−1 + b uk , k = 1, . . . , N.

This is an example of the linear quadratic regulator (LQR) in optimal control theory.



To solve this, we let z = [x1; . . . ; xN ; u1; . . . ; uN ] ∈ R2N,

Q = [q IN , 0; 0, r IN ] ∈ R(2N)×(2N)

and

A = [L, −b IN ] ∈ RN×(2N),  b = [ax0; 0; . . . ; 0] ∈ RN,

where L ∈ RN×N has ones on its diagonal and −a on its subdiagonal. Then the problem can be written as

minimize (1/2) z>Qz
subject to Az = b

and the solution is

z∗ = [x∗; u∗] = Q−1A>(AQ−1A>)−1b
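A minimal numpy sketch (not from the slides; the helper name lqr_solve and all parameter values are illustrative) assembling A, b, Q and applying the closed-form solution:

```python
import numpy as np

def lqr_solve(a, b_coef, q, r, x0, N):
    # L: ones on the diagonal, -a on the subdiagonal
    L = np.eye(N) - a * np.eye(N, k=-1)
    A = np.hstack([L, -b_coef * np.eye(N)])   # A = [L, -b I_N]
    rhs = np.zeros(N)
    rhs[0] = a * x0                           # b = [a x0; 0; ...; 0]
    Q = np.diag([q] * N + [r] * N)
    QinvAT = np.linalg.solve(Q, A.T)
    z = QinvAT @ np.linalg.solve(A @ QinvAT, rhs)
    return z[:N], z[N:]                       # states x, controls u

x, u = lqr_solve(a=1.0, b_coef=1.0, q=1.0, r=1.0, x0=1.0, N=5)
print(x, u)
```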



Example [Credit card holder’s dilemma]. Suppose we have a credit card debt of $10,000 which has a monthly interest rate of 2%. Now we want to make monthly payments for 10 months to minimize the balance as well as the amount of the monthly payments.

Let xk be the balance and uk be the payment in month k. Then the problem can be written as

minimize (1/2) ∑_{k=1}^{10} (q xk² + r uk²)
subject to xk = 1.02 xk−1 − uk , k = 1, . . . , 10, x0 = 10000.

The more anxious we are to reduce our debt, the larger the value of q relative to r. On the other hand, the more reluctant we are to make payments, the larger the value of r relative to q.



Here are two instances with different choices of q and r:

q = 1, r = 10:

k    Balance xk   Payment uk
1    7326.60      2873.40
2    5374.36      2098.77
3    3951.13      1530.72
4    2916.82      1113.34
5    2169.61      805.54
6    1635.97      577.04
7    1263.35      405.34
8    1015.08      273.53
9    866.73       168.65
10   803.70       80.37

q = 1, r = 300:

k    Balance xk   Payment uk
1    9844.66      355.34
2    9725.36      316.20
3    9641.65      278.22
4    9593.23      241.25
5    9579.92      205.17
6    9601.68      169.84
7    9658.58      135.13
8    9750.83      100.92
9    9878.78      67.08
10   10042.87     33.48
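These numbers can be reproduced with the closed-form solution; a minimal self-contained sketch for the q = 1, r = 10 case (not from the slides):

```python
# Closed-form z* = Q^{-1} A^T (A Q^{-1} A^T)^{-1} b for the credit card LQR.
# Payments reduce the balance, so x_k - 1.02 x_{k-1} + u_k = 0.
import numpy as np

N, a, q, r, x0 = 10, 1.02, 1.0, 10.0, 10000.0
L = np.eye(N) - a * np.eye(N, k=-1)
A = np.hstack([L, np.eye(N)])
rhs = np.zeros(N)
rhs[0] = a * x0
Q = np.diag([q] * N + [r] * N)

QiAT = np.linalg.solve(Q, A.T)
z = QiAT @ np.linalg.solve(A @ QiAT, rhs)
for k in range(N):
    print(k + 1, round(z[k], 2), round(z[N + k], 2))
# expected: month 1 gives balance 7326.60 and payment 2873.40, as in the table
```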



We now focus on constrained optimization problems with both equality and
inequality constraints:

minimize f (x)
subject to g (x) ≤ 0
h(x) = 0
and the feasible set is Ω = {x ∈ Rn : g (x) ≤ 0, h(x) = 0}.

Note that g(x) = [g1(x); . . . ; gp(x)] ∈ Rp and h(x) = [h1(x); . . . ; hm(x)] ∈ Rm.

Definition. We call the inequality constraint gj active at x ∈ Ω if gj (x) = 0 and inactive if gj (x) < 0.



Definition. Denote by J(x) the index set of active constraints at x:

J(x) = {j : gj (x) = 0}.

Also denote by J c(x) := {1, . . . , p} \ J(x) its complement.

Definition. We call x a regular point in Ω if the vectors

∇hi(x), 1 ≤ i ≤ m,  ∇gj (x), j ∈ J(x)

are linearly independent (a total of m + |J(x)| vectors in Rn).



Now we consider the first-order necessary condition (FONC) for the optimization problem with both equality and inequality constraints:

Theorem [Karush-Kuhn-Tucker (KKT)]. Suppose f, g, h ∈ C 1, and x∗ is a regular point and a local minimizer of f over Ω; then ∃ λ∗ ∈ Rm, µ∗ ∈ Rp such that

∇f (x∗)> + λ∗>Dh(x∗) + µ∗>Dg (x∗) = 0>
h(x∗) = 0
g (x∗) ≤ 0
µ∗ ≥ 0
µ∗>g (x∗) = 0



Remarks.

• Define the Lagrange function:

l(x, λ, µ) = f (x) + λ>h(x) + µ>g (x)


then the first KKT condition is just ∇xl(x∗, λ∗, µ∗) = 0.

• The second and third KKT conditions are just the constraints.

• λ is the Lagrange multiplier and µ is the KKT multiplier.

• Since µ∗ ≥ 0 and g (x∗) ≤ 0, the last KKT condition implies µ∗j gj (x∗) = 0 for all j = 1, . . . , p. Namely, gj (x∗) < 0 implies µ∗j = 0. Hence

µ∗j = 0, ∀ j ∉ J(x∗).



Proof (KKT Theorem). We first just set µ∗j = 0 for all j ∈ J c(x∗).

Since gj is not active at x∗ for j ∈ J c(x∗), it is not active in a neighborhood of x∗ either. Hence, x∗ being a regular point and local minimizer in Ω implies that x∗ is a regular point and local minimizer in

Ω0 := {x ∈ Rn : h(x) = 0, gj (x) = 0, j ∈ J(x∗)}

Note that Ω0 only contains equality constraints; hence the Lagrange theorem for equality-constrained problems applies, i.e., ∃ λ∗ and µ∗j for j ∈ J(x∗) such that

∇f (x∗) + Dh(x∗)>λ∗ + Dg (x∗)>µ∗ = 0

where µ∗ = [µ∗1; . . . ; µ∗p]. We only need to show µ∗j ≥ 0 for all j ∈ J(x∗).



Proof (cont.) If µ∗j < 0 for some j ∈ J(x∗), then define

T̂ (x∗) := {y ∈ Rn : Dh(x∗)y = 0, ∇gj′ (x∗)>y = 0, j′ ∈ J(x∗), j′ ≠ j}.

We claim that ∃ y ∈ T̂ (x∗) such that ∇gj (x∗)>y ≠ 0: otherwise ∇gj (x∗) would be spanned by {∇hi(x∗), ∇gj′ (x∗) : 1 ≤ i ≤ m, j′ ∈ J(x∗), j′ ≠ j}, which contradicts the fact that the ∇hi(x∗), ∇gj (x∗) are linearly independent (since x∗ is regular). We choose y (or −y) so that ∇gj (x∗)>y < 0.

Now left-multiply both sides of ∇f (x∗) + Dh(x∗)>λ∗ + Dg (x∗)>µ∗ = 0 by y>; since µ∗j < 0 and ∇gj (x∗)>y < 0, we get

0 = y>∇f (x∗) + µ∗j y>∇gj (x∗) > y>∇f (x∗)

Moreover, by the earlier theorem on tangent vectors, there exists a curve x : (a, b) → S, where S := {x : h(x) = 0, gj′ (x) = 0, j′ ∈ J(x∗), j′ ≠ j}, such that x(t∗) = x∗ and x′(t∗) = y for some t∗ ∈ (a, b).



Proof (cont.) Moreover, define φ(t) := f (x(t)); then

φ′(t∗) = ∇f (x(t∗))>x′(t∗) = ∇f (x∗)>y < 0

Also, define ψ(t) = gj (x(t)); then

ψ′(t∗) = ∇gj (x(t∗))>x′(t∗) = ∇gj (x∗)>y < 0

These mean that ∃ ε > 0 such that on (t∗, t∗ + ε] ⊂ (a, b), f (x(t)) and gj (x(t)) both decrease further, so x(t) ∈ Ω and f (x(t)) < f (x∗) for t ∈ (t∗, t∗ + ε]. This contradicts the fact that x∗ is a local minimizer on Ω. Hence µ∗j ≥ 0 for all j ∈ J(x∗).



Example. Consider the problem

minimize f (x1, x2) = x1² + x2² + x1x2 − 3x1
subject to x1, x2 ≥ 0

The Lagrange function is

l(x, µ) = x1² + x2² + x1x2 − 3x1 − x1µ1 − x2µ2

The KKT conditions are

2x1 + x2 − 3 − µ1 = 0
x1 + 2x2 − µ2 = 0
x1, x2, µ1, µ2 ≥ 0
µ1x1 + µ2x2 = 0

Solving this yields

x∗1 = µ∗2 = 3/2,  x∗2 = µ∗1 = 0.
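A minimal numerical check of this KKT point (not from the slides), using scipy's bound-constrained solver:

```python
# Verify x* = [3/2, 0] numerically.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2 + x[0]*x[1] - 3*x[0]
res = minimize(f, x0=[1.0, 1.0], bounds=[(0, None), (0, None)])
print(res.x)  # expected: approximately [1.5, 0]
```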



Similar to the proof of the FONC, we can show the SONC.

Theorem [Second-order necessary condition (SONC)]. Suppose f, g, h ∈ C 2. If x∗ is a regular point and local minimizer, then ∃ λ∗ ∈ Rm, µ∗ ∈ Rp+ such that

• The KKT condition for (x∗, λ∗, µ∗) holds;

• For all y ∈ T (x∗), there is

  y>∇2x l(x∗, λ∗, µ∗)y ≥ 0

where

T (x∗) = {y ∈ Rn : Dh(x∗)y = 0, ∇gj (x∗)>y = 0, ∀ j ∈ J(x∗)}

Proof. The first part follows from the KKT theorem. The second part is due to the fact that x∗ being a local minimizer of f over Ω implies that it is a local minimizer over Ω0.
Theorem [Second-order sufficient condition (SOSC)]. Suppose f, g, h ∈ C 2. If ∃ λ∗ ∈ Rm, µ∗ ∈ Rp such that

• The KKT condition for (x∗, λ∗, µ∗) holds;

• For all nonzero y ∈ T̃ (x∗, µ∗), there is

  y>∇2x l(x∗, λ∗, µ∗)y > 0

where

T̃ (x∗, µ∗) := {y ∈ Rn : Dh(x∗)y = 0, ∇gj (x∗)>y = 0, j ∈ J̃(x∗, µ∗)}

and

J̃(x∗, µ∗) := {j ∈ J(x∗) : µ∗j > 0},

then x∗ is a strict local minimizer.

Remark. We omit the proof here. Note that T (x∗) ⊂ T̃ (x∗, µ∗).



Example. Consider the following constrained problem:

minimize x1x2²
subject to x1 = x2
           x1 ≥ 0

Solution. Here f (x) = x1x2², h(x) = x1 − x2, and g(x) = −x1. The Lagrange function is

l(x, λ, µ) = x1x2² + λ(x1 − x2) − µx1

Then we obtain the KKT conditions:

∂x1 l(x, λ, µ) = x2² + λ − µ = 0
∂x2 l(x, λ, µ) = 2x1x2 − λ = 0
∂λ l(x, λ, µ) = x1 − x2 = 0
x1 ≥ 0
µ ≥ 0
µx1 = 0



Solution (cont.) If x∗1 = x∗2 = 0, then λ∗ = µ∗ = 0. If x∗1 = x∗2 > 0, then µ∗ = 0 but we cannot find any valid λ∗. So only the point [x∗; λ∗; µ∗] = [0; 0; 0; 0] satisfies the KKT conditions.

Since µ∗ = 0, we have

T̃ (x∗, µ∗) = N (Dh(x∗)) = N ([1, −1]) = {t[1; 1] : t ∈ R}

On the other hand,

∇2x l(x∗, λ∗, µ∗) = [0 0; 0 0]

so y>(∇2x l(x∗, λ∗, µ∗))y = 0 for all y ∈ T̃ (x∗, µ∗), which is not strictly greater than 0. Hence the SOSC does not hold. But in fact x∗ = [0; 0] is a local minimizer (actually also a global one).



Example. Consider the following constrained problem:

minimize x1² + 4x2²
subject to x1² + 2x2² ≥ 4

Solution. Here f (x) = x1² + 4x2² and g(x) = −(x1² + 2x2² − 4). The Lagrange function is

l(x, µ) = x1² + 4x2² − µ(x1² + 2x2² − 4).

Then we obtain the KKT conditions:

∂x1 l(x, µ) = 2x1 − 2µx1 = 0
∂x2 l(x, µ) = 8x2 − 4µx2 = 0
x1² + 2x2² ≥ 4
µ ≥ 0
−µ(x1² + 2x2² − 4) = 0



Solution (cont.)

• If µ∗ = 0, then x∗1 = x∗2 = 0, which violates g(x) ≤ 0.

• If µ∗ = 1, then [x∗1; x∗2] = ±[2; 0].

• If µ∗ = 2, then [x∗1; x∗2] = ±[0; √2].

• If µ∗ > 0 but µ∗ ≠ 1, 2, then x∗1 = x∗2 = 0, which again violates g(x) ≤ 0.

Hence the following 4 points satisfy the KKT conditions:

[x1; x2; µ∗] = [2; 0; 1], [−2; 0; 1], [0; √2; 2], and [0; −√2; 2]
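A minimal numerical sanity check (not from the slides) that the constrained minimum value 4 is attained near [±2; 0]:

```python
# SLSQP with the scipy convention that 'ineq' means fun(x) >= 0,
# which matches g(x) <= 0, i.e., x1^2 + 2 x2^2 - 4 >= 0.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 4*x[1]**2
cons = [{'type': 'ineq', 'fun': lambda x: x[0]**2 + 2*x[1]**2 - 4}]

res = minimize(f, x0=[3.0, 1.0], constraints=cons, method='SLSQP')
print(res.x, res.fun)  # expected: near [2, 0] with objective value 4
```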



Solution (cont.)

For µ∗ = 1, we have

∇2x l([±2, 0, 1]) = [0 0; 0 4],  ∇g([±2, 0]) = [∓4; 0]

which implies

T̃ (x∗, µ∗) = T (x∗) = {t[0, 1] : t ∈ R}

Hence

y>∇2x l([x∗1, x∗2, µ∗])y = 4t² > 0

for all y ∈ T̃ (x∗, µ∗) \ {0}.

So [x∗1, x∗2] = [±2, 0] satisfy the SOSC and are strict local minimizers.



Solution (cont.)

For µ∗ = 2, we have

∇2x l([0, ±√2, 2]) = [−2 0; 0 0],  ∇g([0, ±√2]) = [0; ∓4√2]

which implies

T̃ (x∗, µ∗) = T (x∗) = {t[1, 0] : t ∈ R}

Hence

y>∇2x l([x∗1, x∗2, µ∗])y = −2t² < 0

for all y ∈ T̃ (x∗, µ∗) \ {0}.

So [x∗1, x∗2] = [0, ±√2] do not satisfy the SOSC; they are in fact strict local maximizers.
