lec09
Constrained Optimization
Xiaojing Ye
Department of Mathematics & Statistics
Georgia State University
minimize f (x)
subject to gj (x) ≤ 0, j = 1, . . . , p,
hi(x) = 0, i = 1, . . . , m,
where gj, hi : Rn → R are inequality and equality constraint functions, respectively.
minimize f(x)
subject to g(x) ≤ 0
           h(x) = 0
The feasible set is Ω := {x ∈ Rn : g(x) ≤ 0, h(x) = 0}.
minimize f (x)
subject to h(x) = 0
and the feasible set is Ω = {x ∈ Rn : h(x) = 0}.
Example.
minimize x1² + 2x1x2 + 3x2² + 4x1 + 5x2 + 6x3
subject to x1 + 2x2 = 3
           4x1 + 5x3 = 6
The constraints imply that x2 = (3 − x1)/2 and x3 = (6 − 4x1)/5. Substitute x2 and x3 in the objective function to get an unconstrained minimization over x1 only.
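Remark (not in the original slides). The elimination can be carried out symbolically; a minimal sketch, assuming sympy is available:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 + 2*x1*x2 + 3*x2**2 + 4*x1 + 5*x2 + 6*x3

# solve the equality constraints for x2 and x3, then substitute
sol = sp.solve([x1 + 2*x2 - 3, 4*x1 + 5*x3 - 6], [x2, x3])
phi = sp.expand(f.subs(sol))             # objective in x1 only
print(sp.solve(sp.diff(phi, x1), x1))    # stationary point of the reduced problem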
maximize x1x2
subject to x1² + 4x2² = 1
It is equivalent to maximizing x1²x2²; then substitute x1² by 1 − 4x2² to get an unconstrained problem in x2.
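For completeness (this step is not written out in the slides): with φ(x2) := (1 − 4x2²)x2², stationarity φ′(x2) = 2x2 − 16x2³ = 0 gives x2² = 1/8, hence x1² = 1 − 4x2² = 1/2, and the maximum of x1x2 is √(1/2 · 1/8) = 1/4.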
minimize f (x)
subject to h(x) = 0
and the feasible set is Ω = {x ∈ Rn : h(x) = 0}.
Ω = {x ∈ R3 : h1(x) = x1 = 0, h2(x) = x1 − x2 = 0}
Then we have
Dh(x) = [∇h1(x)ᵀ; ∇h2(x)ᵀ] = [1 0 0; 1 −1 0]
at any x ∈ Ω, and the tangent space at any point x is
" #
1 0 0
T (x) = N (D(h(x)) = y ∈ R3 : y=0
1 −1 0
= {[0; 0; α] ∈ R3 : α ∈ R}
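Remark (not in the original slides). A basis of this tangent space can be computed numerically as the null space of Dh(x); a minimal sketch, assuming numpy/scipy:

import numpy as np
from scipy.linalg import null_space

Dh = np.array([[1.0, 0.0, 0.0],
               [1.0, -1.0, 0.0]])
print(null_space(Dh))   # one orthonormal column, proportional to [0; 0; 1]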
Proof. (⇐) Let x(t) be such a curve. Then h(x(t)) = 0 for t ∈ (−δ, δ), and differentiating at t = 0 gives
Dh(x(0))x′(0) = Dh(x∗)y = 0,
which implies y ∈ T(x∗).
For the converse direction, given y ∈ T(x∗), define h̄(t, u) := h(x∗ + ty + Dh(x∗)ᵀu). Then
Duh̄(0, 0) = Dh(x∗)Dh(x∗)ᵀ ≻ 0
as Dh(x∗) has full row rank. Hence, by the Implicit Function Theorem, there is δ > 0 s.t. a unique solution u(t) to h̄(t, u(t)) = 0 exists for t ∈ (−δ, δ). Then
x(t) = x∗ + ty + Dh(x∗)ᵀu(t)
is the desired curve.
N(x∗) = C(Dh(x∗)ᵀ)
      = R(Dh(x∗))
      = span{∇h1(x∗), . . . , ∇hm(x∗)}
Note that dim(N(x∗)) = dim(R(Dh(x∗))) = m.
Hence, for any v ∈ Rn, there exists a unique pair y ∈ T(x∗) and w ∈ N(x∗) such that
v = y + w
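Remark (not in the original slides). A minimal numerical sketch of this decomposition, with Dh and v as made-up data:

import numpy as np

Dh = np.array([[1.0, 0.0, 0.0],
               [1.0, -1.0, 0.0]])   # Dh(x∗) from the earlier example
v = np.array([3.0, 2.0, 1.0])       # arbitrary test vector

P = Dh.T @ np.linalg.solve(Dh @ Dh.T, Dh)  # orthogonal projector onto N(x∗)
w = P @ v                                  # normal component, w in N(x∗)
y = v - w                                  # tangent component; Dh @ y is (numerically) zero
print(y, w)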
Then for any y ∈ T(x∗), there exists a curve x : (a, b) → Ω such that x(t) = x∗ and x′(t) = y for some t ∈ (a, b).
Define φ(s) := f(x(s)); then
φ′(s) = ∇f(x(s))ᵀx′(s)
Since x∗ is a local minimizer of f over Ω, we have φ′(t) = ∇f(x∗)ᵀy = 0 for every y ∈ T(x∗), and therefore
∇f (x∗) ∈ N (x∗)
This means that ∃ λ∗ ∈ Rm such that
∇f(x∗) + Dh(x∗)ᵀλ∗ = 0
Theorem [Lagrange condition]. If x∗ is a regular point and a local minimizer of
minimize f(x)
subject to h(x) = 0
then x∗ must satisfy
∇f(x∗) + Dh(x∗)ᵀλ∗ = 0
h(x∗) = 0
for some λ∗ ∈ Rm.
These are called the first-order necessary conditions (FONC), or the Lagrange condition, of the equality-constrained minimization problem. λ∗ is called the Lagrange multiplier.
Remark. The conditions above are necessary but not sufficient to determine
x∗ to be a local minimizer—a point satisfying these conditions could be a local
maximizer or neither.
minimize f (x)
subject to h(x) = 0
where f(x) = x and
h(x) = { x²          if x < 0
       { 0           if 0 ≤ x ≤ 1
       { (x − 1)²    if x > 1
We can see that Ω = [0, 1] and x∗ = 0 is the only local minimizer. However, f′(x∗) = 1 and h′(x∗) = 0, so no λ∗ can satisfy f′(x∗) + λ∗h′(x∗) = 0. The Lagrange condition fails to hold because x∗ is not a regular point.
maximize x1x2x3
subject to 2(x1x2 + x2x3 + x3x1) = A
Hence we can set
f(x) = −x1x2x3
h(x) = x1x2 + x2x3 + x3x1 − A/2
Then the Lagrange function is
l(x, λ) = f(x) + λh(x) = −x1x2x3 + λ(x1x2 + x2x3 + x3x1 − A/2)
∇xl(x, λ) = 0
∇λl(x, λ) = 0
which is
−x2x3 + λ(x2 + x3) = 0
−x1x3 + λ(x1 + x3) = 0
−x1x2 + λ(x1 + x2) = 0
x1x2 + x2x3 + x3x1 = A/2
Solving this system with x1, x2, x3 > 0 yields x1 = x2 = x3 = √(A/6), i.e., the optimal box is a cube.
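Remark (not in the original slides). A sketch verifying the cube solution with sympy; A = 6 is chosen so that the answer is x1 = x2 = x3 = 1:

import sympy as sp

x1, x2, x3, lam = sp.symbols('x1 x2 x3 lam', positive=True)
l = -x1*x2*x3 + lam*(x1*x2 + x2*x3 + x3*x1 - 3)   # A/2 = 3 for A = 6

eqs = [sp.diff(l, v) for v in (x1, x2, x3)] + [x1*x2 + x2*x3 + x3*x1 - 3]
print(sp.solve(eqs, [x1, x2, x3, lam], dict=True))
# -> [{x1: 1, x2: 1, x3: 1, lam: 1/2}]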
minimize x1² + x2²
subject to x1² + 2x2² = 1
l(x, λ) = (x1² + x2²) + λ(x1² + 2x2² − 1)
Then we obtain
2x1(1 + λ) = 0
2x2(1 + 2λ) = 0
x1² + 2x2² = 1
whose solutions are x = [±1; 0] with λ = −1, and x = [0; ±1/√2] with λ = −1/2.
Using the results above, we can cancel the term with x″(t) and obtain
yᵀ[∇²f(x∗) + Σ_{i=1}^m λi∗∇²hi(x∗)]y ≥ 0
for all y ∈ T(x∗).
1. ∇f(x∗) + Dh(x∗)ᵀλ∗ = 0;
2. yᵀ[∇²f(x∗) + Σ_{i=1}^m λi∗∇²hi(x∗)]y ≥ 0 for all y ∈ T(x∗).
So ∇²f(x∗) + Σ_{i=1}^m λi∗∇²hi(x∗) is playing the role of the “Hessian”.
1. ∇f(x∗) + Dh(x∗)ᵀλ∗ = 0;
2. yᵀ[∇²f(x∗) + Σ_{i=1}^m λi∗∇²hi(x∗)]y > 0 for all y ∈ T(x∗) with y ≠ 0.
maximize (xᵀQx)/(xᵀPx)
where Q = diag([4, 1]) and P = diag([2, 1]).
minimize −xᵀQx
subject to xᵀPx − 1 = 0
Here f(x) = −xᵀQx is the objective and h(x) = xᵀPx − 1 ∈ R is the constraint.
For λ∗ = 2, we also have
∇²f(x∗) + λ∗∇²h(x∗) = −2Q + 2λ∗P = [0 0; 0 2]
Therefore yᵀ[∇²f(x∗) + λ∗∇²h(x∗)]y = 2a² > 0 for all y = [0; a] ∈ T(x∗) with a ≠ 0.
Therefore x∗ = [±1/√2; 0] are both strict local minimizers of the constrained optimization problem.
Going back to the original problem, any x∗ = [t; 0] with t ≠ 0 is a strict local maximizer of (xᵀQx)/(xᵀPx).
For λ∗ = 1, we also have
∇²f(x∗) + λ∗∇²h(x∗) = −2Q + 2λ∗P = [−4 0; 0 0]
Therefore yᵀ[∇²f(x∗) + λ∗∇²h(x∗)]y = −4a² < 0 for all y = [a; 0] ∈ T(x∗) with a ≠ 0.
Therefore x∗ = [0; ±1] are both strict local maximizers of the constrained optimization problem.
Going back to the original problem, any x∗ = [0; t] with t ≠ 0 is a strict local minimizer of (xᵀQx)/(xᵀPx).
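Remark (not in the original slides). This example is an instance of a generalized Rayleigh quotient: the stationary values of (xᵀQx)/(xᵀPx) are the generalized eigenvalues of the pair (Q, P). A sketch:

import numpy as np
from scipy.linalg import eigh

Q = np.diag([4.0, 1.0])
P = np.diag([2.0, 1.0])
vals, vecs = eigh(Q, P)   # solves Q v = lambda P v
print(vals)               # [1. 2.]: minimum ratio 1 along e2, maximum ratio 2 along e1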
∇xl(x, λ) = Qx − Aᵀλ = 0
∇λl(x, λ) = b − Ax = 0
The first equation gives x = Q⁻¹Aᵀλ. Plugging this into the second equation and solving for λ∗ gives
λ∗ = (AQ⁻¹Aᵀ)⁻¹b
Hence the solution is
x∗ = Q⁻¹Aᵀλ∗ = Q⁻¹Aᵀ(AQ⁻¹Aᵀ)⁻¹b
minimize ‖x‖
subject to Ax = b
x∗ = Aᵀ(AAᵀ)⁻¹b
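Remark (not in the original slides). A numerical check of both closed forms, with made-up A, b, Q, for the quadratic program implied by the FONC above (minimize (1/2)xᵀQx subject to Ax = b):

import numpy as np

A = np.array([[1.0, 2.0, 3.0]])
b = np.array([6.0])
Q = np.diag([1.0, 2.0, 4.0])

# x∗ = Q⁻¹Aᵀ(AQ⁻¹Aᵀ)⁻¹b
QiAt = np.linalg.solve(Q, A.T)
x_star = QiAt @ np.linalg.solve(A @ QiAt, b)
print(A @ x_star)   # recovers b

# least-norm special case (Q = I): x∗ = Aᵀ(AAᵀ)⁻¹b = pinv(A) b
x_ln = A.T @ np.linalg.solve(A @ A.T, b)
print(np.allclose(x_ln, np.linalg.pinv(A) @ b))   # True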
xk = a xk−1 + b uk
with given initial x0, where k = 1, . . . , N indexes the time points. Here xk is the “state” and uk is the “control”.
Suppose we want to minimize the state and control at all points, then we can
formulate the problem as
minimize (1/2) Σ_{k=1}^N (q xk² + r uk²)
subject to xk = a xk−1 + b uk, k = 1, . . . , N.
Let xk be the balance and uk be the payment in month k. Then the problem
can be written as
minimize (1/2) Σ_{k=1}^{10} (q xk² + r uk²)
subject to xk = 1.02 xk−1 − uk, k = 1, . . . , 10, x0 = 10000.
The more anxious we are to reduce our debt, the larger the value of q relative
to r. On the other hand, the more reluctant we are to make payments, the
larger the value of r relative to q.
[Plots of the resulting optimal balances xk and payments uk for q = 1, r = 10 and for q = 1, r = 300.]
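Remark (not in the original slides). A sketch reproducing the data behind these plots via the closed form x∗ = Q⁻¹Aᵀ(AQ⁻¹Aᵀ)⁻¹b derived earlier; the stacking z = (x1, . . . , x10, u1, . . . , u10) is my choice:

import numpy as np

N, q, r, x0 = 10, 1.0, 10.0, 10000.0
Q = np.diag([q] * N + [r] * N)            # objective weights for (x, u)

# equality constraints x_k - 1.02 x_{k-1} + u_k = 0, with x_0 given
A = np.zeros((N, 2 * N))
b = np.zeros(N)
for k in range(N):
    A[k, k] = 1.0            # coefficient of x_k
    A[k, N + k] = 1.0        # coefficient of u_k
    if k > 0:
        A[k, k - 1] = -1.02  # coefficient of x_{k-1}
b[0] = 1.02 * x0

QiAt = np.linalg.solve(Q, A.T)
z = QiAt @ np.linalg.solve(A @ QiAt, b)
x, u = z[:N], z[N:]          # optimal balances and payments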
minimize f(x)
subject to g(x) ≤ 0
           h(x) = 0
and the feasible set is Ω = {x ∈ Rn : g(x) ≤ 0, h(x) = 0}.
• The second and third KKT conditions are just the constraints.
• Since µ∗ ≥ 0 and g(x∗) ≤ 0, the last KKT condition implies µj∗gj(x∗) = 0 for all j = 1, . . . , p. Namely, gj(x∗) < 0 implies µj∗ = 0. Hence µj∗ = 0 for all j ∉ J(x∗), where J(x∗) := {j : gj(x∗) = 0} is the index set of active inequality constraints.
Note Ω0 only contains equality constraints, hence the Lagrange theorem for equality-constrained problems applies, i.e., ∃ λ∗, µj∗ for j ∈ J(x∗) such that
∇f(x∗) + Dh(x∗)ᵀλ∗ + Σ_{j∈J(x∗)} µj∗∇gj(x∗) = 0
We claim that ∃ y ∈ T̂(x∗) such that ∇gj(x∗)ᵀy ≠ 0: otherwise ∇gj(x∗) would lie in the span of {∇hi(x∗), ∇gj′(x∗) : 1 ≤ i ≤ m, j′ ∈ J(x∗), j′ ≠ j}, which contradicts the fact that ∇hi(x∗), ∇gj(x∗) are linearly independent (since x∗ is regular). We choose y (or −y) so that ∇gj(x∗)ᵀy < 0.
Example. Consider minimizing f(x) = x1² + x2² + x1x2 − 3x1 subject to x1 ≥ 0, x2 ≥ 0. The Lagrange function is
l(x, µ) = x1² + x2² + x1x2 − 3x1 − x1µ1 − x2µ2
The KKT conditions are
2x1 + x2 − 3 − µ1 = 0
x1 + 2x2 − µ2 = 0
x1, x2, µ1, µ2 ≥ 0
µ1x1 + µ2x2 = 0
Solving this yields
x1∗ = µ2∗ = 3/2, x2∗ = µ1∗ = 0.
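Remark (not in the original slides). A quick numerical sanity check:

import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2 + x[0]*x[1] - 3*x[0]
res = minimize(f, x0=[1.0, 1.0], bounds=[(0, None), (0, None)])
print(res.x)   # approximately [1.5, 0.0], matching the KKT solution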
Proof. The first part follows from the KKT theorem. The second part is due
to the fact that x∗ being a local minimizer of f over Ω implies that it is a local
minimizer over Ω0.
Theorem [Second order sufficient condition (SOSC)]. Suppose f, g, h ∈ C². If ∃ λ∗ ∈ Rm, µ∗ ∈ Rp such that
1. µ∗ ≥ 0, ∇f(x∗) + Dh(x∗)ᵀλ∗ + Dg(x∗)ᵀµ∗ = 0, and µ∗ᵀg(x∗) = 0;
2. yᵀ∇²xl(x∗, λ∗, µ∗)y > 0 for all y ∈ T̃(x∗, µ∗) with y ≠ 0,
then x∗ is a strict local minimizer of f over Ω. Here
T̃(x∗, µ∗) := {y ∈ Rn : Dh(x∗)y = 0, ∇gj(x∗)ᵀy = 0 for all j ∈ J̃(x∗, µ∗)}
and
J̃(x∗, µ∗) := {j ∈ J(x∗) : µj∗ > 0}
Remark. We omit the proof here. Note that T (x∗) ⊂ T̃ (x∗, µ∗).
minimize x1x2²
subject to x1 = x2
           x1 ≥ 0
Solution. Here f(x) = x1x2², h(x) = x1 − x2, and g(x) = −x1. The Lagrange function is
l(x, λ, µ) = x1x2² + λ(x1 − x2) − µx1
Then we obtain the KKT conditions:
∂x1l(x, λ, µ) = x2² + λ − µ = 0
∂x2l(x, λ, µ) = 2x1x2 − λ = 0
∂λl(x, λ, µ) = x1 − x2 = 0
x1 ≥ 0
µ ≥ 0
µx1 = 0
Since µ∗ = 0, we have λ∗ = −x2² from the first condition and λ∗ = 2x1x2 = 2x2² from the second and third, which forces x2 = 0. Hence x∗ = [0; 0] with λ∗ = µ∗ = 0.
minimize x1² + 4x2²
subject to x1² + 2x2² ≥ 4
Solution. Here f(x) = x1² + 4x2² and g(x) = −(x1² + 2x2² − 4). The Lagrange function is
l(x, µ) = x1² + 4x2² − µ(x1² + 2x2² − 4).
Then we obtain the KKT conditions:
2x1(1 − µ) = 0
4x2(2 − µ) = 0
µ ≥ 0
µ(x1² + 2x2² − 4) = 0
x1² + 2x2² ≥ 4
Solving these yields the candidates x∗ = [±2; 0] with µ∗ = 1, and x∗ = [0; ±√2] with µ∗ = 2.
For µ∗ = 1, we have
∇²xl([±2; 0], 1) = [0 0; 0 4], ∇g([±2; 0]) = [∓4; 0]
which implies
T̃(x∗, µ∗) = T(x∗) = {t[0; 1] : t ∈ R}
Hence
yᵀ∇²xl(x∗, µ∗)y = 4t² > 0
for all y ∈ T̃(x∗, µ∗) \ {0}.
So [x1∗; x2∗] = [±2; 0] satisfy SOSC and are strict local minimizers.
For µ∗ = 2, we have
∇²xl([0; ±√2], 2) = [−2 0; 0 0], ∇g([0; ±√2]) = [0; ∓4√2]
which implies
T̃(x∗, µ∗) = T(x∗) = {t[1; 0] : t ∈ R}
Hence
yᵀ∇²xl(x∗, µ∗)y = −2t² < 0
for all y ∈ T̃(x∗, µ∗) \ {0}.
So [x1∗; x2∗] = [0; ±√2] do not satisfy SOSC but are strict local maximizers.
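Remark (not in the original slides). A last sanity check comparing f along the active boundary x1² + 2x2² = 4:

import numpy as np

t = np.linspace(0.0, 2 * np.pi, 361)
x1, x2 = 2 * np.cos(t), np.sqrt(2) * np.sin(t)   # parametrizes the boundary
f = x1**2 + 4 * x2**2                            # equals 4 + 4 sin(t)²
print(f.min(), f.max())   # 4.0 at [±2; 0] (minima), 8.0 at [0; ±√2] (maxima)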