KKT Conditions and Duality: March 23, 2012

Tutorial Example
We want to solve this constrained optimization problem:
\[ \min_{x \in \mathbb{R}^2} f(x) = \min_{x \in \mathbb{R}^2} 0.4\,(x_1^2 + x_2^2) \]
subject to
\[ g(x) = 2 - x_1 - x_2 \le 0 \]
Tutorial example - Cost function
[Figure: iso-contours of $f(x) = 0.4\,(x_1^2 + x_2^2)$ in the $(x_1, x_2)$ plane.]

Tutorial example - Constraint
[Figure: iso-contours of $f(x)$ in the $(x_1, x_2)$ plane, with the feasible region $g(x) = 2 - x_1 - x_2 \le 0$ marked.]
Solve this problem with Lagrange Multipliers
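We can solve this constrained optimization problem with Lagrange multipliers:
\[ L(x, \lambda) = f(x) + \lambda\, g(x) \]
Solution:
The Lagrangian is
\[ L(x, \lambda) = 0.4\,x_1^2 + 0.4\,x_2^2 + \lambda\,(2 - x_1 - x_2) \]
The KKT conditions say that at an optimum $\lambda^* \ge 0$ and
\[ \frac{\partial L(x^*, \lambda^*)}{\partial x_1} = 0.8\,x_1^* - \lambda^* = 0 \]
\[ \frac{\partial L(x^*, \lambda^*)}{\partial x_2} = 0.8\,x_2^* - \lambda^* = 0 \]
\[ \frac{\partial L(x^*, \lambda^*)}{\partial \lambda} = 2 - x_1^* - x_2^* = 0 \]
The last equation assumes the constraint is active at the optimum, which holds here because the unconstrained minimizer $x = 0$ violates $g(x) \le 0$.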
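Find $(x_1^*, x_2^*, \lambda^*)$ which fulfill these simultaneous equations. The first two equations imply
\[ x_1^* = \tfrac{5}{4}\lambda^*, \qquad x_2^* = \tfrac{5}{4}\lambda^* \]
Substituting these into the last equation and multiplying through by 4 we get
\[ 8 - 5\lambda^* - 5\lambda^* = 0 \implies \lambda^* = \tfrac{4}{5} \quad (\text{greater than } 0) \]
and in turn this means
\[ x_1^* = \tfrac{5}{4}\lambda^* = 1, \qquad x_2^* = \tfrac{5}{4}\lambda^* = 1 \]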
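The three stationarity equations are linear in $(x_1, x_2, \lambda)$, so they can be checked mechanically. The sketch below is only an illustration: the helper name `solve_linear` and the choice of exact rational arithmetic are assumptions made here, not something taken from the slides.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    # Build the augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick a row at or below the diagonal with a non-zero pivot.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot equals 1.
        M[col] = [v / M[col][col] for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Unknowns ordered as (x1, x2, lam). The rows encode
#   0.8*x1 - lam = 0,   0.8*x2 - lam = 0,   x1 + x2 = 2.
A = [[Fraction(4, 5), 0, -1],
     [0, Fraction(4, 5), -1],
     [1, 1, 0]]
b = [0, 0, 2]

x1, x2, lam = solve_linear(A, b)
print(x1, x2, lam)
```

The multiplier comes out non-negative, so the candidate point also satisfies the sign condition of the KKT conditions.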
Solve this particular problem in another way
Alternate solution:
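Construct the Lagrangian dual function
\[ q(\lambda) = \min_x L(x, \lambda) = \min_x \left( f(x) + \lambda\, g(x) \right) \]
Find the optimal value of $x$ with respect to $L(x, \lambda)$ in terms of the Lagrange multiplier:
\[ x_1^* = \tfrac{5}{4}\lambda, \qquad x_2^* = \tfrac{5}{4}\lambda \]
Substitute back into the expression of $L(x, \lambda)$ to get
\[ q(\lambda) = \tfrac{5}{4}\lambda^2 + \lambda \left( 2 - \tfrac{5}{4}\lambda - \tfrac{5}{4}\lambda \right) = 2\lambda - \tfrac{5}{4}\lambda^2 \]
Find $\lambda \ge 0$ which maximizes $q(\lambda)$. Luckily in this case the global optimum of $q(\lambda)$ corresponds to the constrained optimum
\[ \frac{\partial q(\lambda)}{\partial \lambda} = -\tfrac{5}{2}\lambda + 2 = 0 \implies \lambda^* = \tfrac{4}{5} \implies x_1^* = x_2^* = 1 \]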
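The dual route can also be followed numerically. The sketch below evaluates $q(\lambda)$ from the closed-form minimizer $x_i = \tfrac{5}{4}\lambda$ and maximizes it over $\lambda \ge 0$ with a ternary search. The function names, the search interval $[0, 10]$ and the use of ternary search are illustrative assumptions; any one-dimensional maximizer for a concave function would do.

```python
def lagrangian_minimizer(lam):
    """Minimizer of L(x, lam) over x, from 0.8 * x_i - lam = 0."""
    x = 1.25 * lam
    return x, x

def q(lam):
    """Dual function q(lam) = min_x L(x, lam) for the tutorial problem."""
    x1, x2 = lagrangian_minimizer(lam)
    return 0.4 * (x1 ** 2 + x2 ** 2) + lam * (2.0 - x1 - x2)

def maximise_concave(func, lo, hi, iters=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            lo = m1   # the maximizer cannot lie left of m1
        else:
            hi = m2   # the maximizer cannot lie right of m2
    return (lo + hi) / 2

lam_star = maximise_concave(q, 0.0, 10.0)   # search only over lam >= 0
x_star = lagrangian_minimizer(lam_star)
q_star = q(lam_star)
f_star = 0.4 * (x_star[0] ** 2 + x_star[1] ** 2)
print(lam_star, x_star, q_star, f_star)
```

Up to floating-point tolerance, the dual optimum and the primal objective at the recovered point agree, which is the "no duality gap" situation for this problem.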
Solve the same problem in another way
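The Primal Problem
\[ \min_{x \in \mathbb{R}^2} f(x) \quad \text{subject to} \quad g(x) \le 0 \]
The Lagrangian Dual Problem
\[ \max_{\lambda \in \mathbb{R}} q(\lambda) \quad \text{subject to} \quad \lambda \ge 0 \]
where
\[ q(\lambda) = \min_{x \in \mathbb{R}^2} \left( f(x) + \lambda\, g(x) \right) \]
is referred to as the Lagrangian dual function.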

The general statement
In general we will have multiple inequality and equality constraints.

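The statement of the Primal Problem is
\[ \min_{x \in X} f(x) \quad \text{subject to} \quad g(x) \le 0 \ \text{ and } \ h(x) = 0 \]
while the Lagrangian Dual Problem is
\[ \max_{\lambda, \mu} q(\lambda, \mu) \quad \text{subject to} \quad \lambda \ge 0 \]
where
\[ q(\lambda, \mu) = \min_{x \in X} \left( f(x) + \lambda^t g(x) + \mu^t h(x) \right) \]
is the Lagrangian dual function.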
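To make the general definition concrete, here is a minimal sketch that evaluates $q(\lambda, \mu)$ by brute force when $X$ is a finite set. Everything specific in it is an assumption for illustration: the helper `dual_function`, the toy problem (minimize $x_1^2 + x_2^2$ subject to $1 - x_1 \le 0$ and $x_1 - x_2 = 0$), the grid used as $X$, and the multiplier values $(4, 2)$, which were worked out by hand from stationarity at $x = (1, 1)$.

```python
def dual_function(f, g_list, h_list, X, lam, mu):
    """q(lam, mu) = min over x in X of f(x) + sum_i lam_i g_i(x) + sum_i mu_i h_i(x)."""
    def lagrangian(x):
        value = f(x)
        value += sum(l * g(x) for l, g in zip(lam, g_list))
        value += sum(m * h(x) for m, h in zip(mu, h_list))
        return value
    return min(lagrangian(x) for x in X)

# Hypothetical toy problem (not from the slides).
f = lambda x: x[0] ** 2 + x[1] ** 2
g_list = [lambda x: 1.0 - x[0]]      # inequality constraint g(x) <= 0
h_list = [lambda x: x[0] - x[1]]     # equality constraint h(x) = 0

# X is a finite grid on [-3, 3]^2 with spacing 0.1, so the min is a plain scan.
ticks = [i / 10 for i in range(-30, 31)]
X = [(a, b) for a in ticks for b in ticks]

q_zero = dual_function(f, g_list, h_list, X, lam=[0.0], mu=[0.0])
q_best = dual_function(f, g_list, h_list, X, lam=[4.0], mu=[2.0])
print(q_zero, q_best)
```

With all multipliers at zero the dual function reduces to the unconstrained minimum of $f$ over $X$; the hand-picked multipliers give a larger dual value, equal to the constrained optimum $f(1, 1) = 2$ of this toy problem.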

Why ??
This dual approach is not guaranteed to succeed. However,

• It does succeed for a certain class of functions.
• In these cases it often leads to a simpler optimization problem,
particularly when the dimension of x is much larger than the
number of constraints.
• The expression of x∗ in terms of the Lagrange multipliers may
give some insight into the optimal solution, e.g. the optimal
separating hyper-plane found by the SVM.
We will now focus on the geometry of the dual solution...

Geometry of the Dual Problem
Map the original problem
[Figure: the $(x_1, x_2)$ plane is mapped to the set $G$ in the $(y, z)$ plane; each $x$ is sent to the point $(g(x), f(x))$.]
• Map each point $x \in \mathbb{R}^2$ to $(g(x), f(x)) \in \mathbb{R}^2$.
• This map defines the set $G \subset \mathbb{R}^2$, the image of $\mathbb{R}^2$ under the $(g, f)$ map:
\[ G = \{ (y, z) \mid y = g(x),\ z = f(x) \text{ for some } x \in \mathbb{R}^2 \} \]
• Note: $L(x, \lambda) = z + \lambda y$ with $y = g(x)$ and $z = f(x)$.
• In this space only points with $y \le 0$ correspond to feasible points.

The Primal Problem
[Figure: the set $G$ in the $(y, z)$ plane with the point $(y^*, z^*)$ marked.]
• The primal problem consists in finding a point in $G$ with $y \le 0$ that has minimum ordinate $z$.
• This optimal point is marked $(y^*, z^*)$.
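Visualization of the Lagrangian
[Figure: the set $G$ in the $(y, z)$ plane, the point $(y^*, z^*)$, and a line $z + \lambda y = \alpha$.]
• Given a $\lambda \ge 0$, the Lagrangian is given by
\[ L(x, \lambda) = f(x) + \lambda g(x) = z + \lambda y \]
with $(y, z) \in G$.
• Note $z + \lambda y = \alpha$ is the equation of a straight line with slope $-\lambda$ that intercepts the z-axis at $\alpha$.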
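Visualization of the Lagrangian Dual function
[Figure: the set $G$ in the $(y, z)$ plane, the point $(y^*, z^*)$, and the line $z + \lambda y = q(\lambda)$ touching $G$ and intercepting the z-axis at $q(\lambda)$.]
For a given $\lambda \ge 0$ the Lagrangian dual sub-problem is to find
\[ \min_{(y, z) \in G} (z + \lambda y) \]
• Move the line $z + \lambda y = \alpha$ in the direction $(-\lambda, -1)$ while remaining in contact with $G$.
• The last intercept on the z-axis obtained this way is the value of $q(\lambda)$ corresponding to the given $\lambda \ge 0$.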
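The geometric recipe can be imitated numerically for the tutorial problem: sample points of $G$, then for each $\lambda$ take the smallest intercept $\alpha = z + \lambda y$ over the samples. This is only a sketch under stated assumptions: the grid over $[-3, 3]^2$ with spacing 0.05 and the helper names `q_from_G` and `q_closed_form` are choices made here, and a sampled $G$ can only approximate the true minimum in general.

```python
def f(x1, x2):
    return 0.4 * (x1 ** 2 + x2 ** 2)

def g(x1, x2):
    return 2.0 - x1 - x2

# Sample G = {(g(x), f(x))} on a grid over [-3, 3]^2 (spacing 0.05).
ticks = [i / 20 for i in range(-60, 61)]
G = [(g(a, b), f(a, b)) for a in ticks for b in ticks]

def q_from_G(lam):
    """Smallest z-axis intercept alpha among lines z + lam*y = alpha through sampled points of G."""
    return min(z + lam * y for y, z in G)

def q_closed_form(lam):
    """Value obtained by substituting x_i = (5/4) lam into the Lagrangian."""
    return 2.0 * lam - 1.25 * lam ** 2

# For these multipliers the minimizer x_i = 1.25 * lam lies exactly on the grid.
intercepts = {lam: q_from_G(lam) for lam in (0.0, 0.4, 0.8, 1.2)}
for lam, alpha in intercepts.items():
    print(lam, alpha, q_closed_form(lam))
```

For the multipliers chosen here the sampled intercepts agree with the closed-form dual function, and the largest intercept occurs at $\lambda = 0.8$.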
Solving the Dual Problem
[Figure: the set $G$ in the $(y, z)$ plane with the lines $z + \lambda y = q(\lambda)$ and $z + \lambda^* y = q(\lambda^*)$; the latter passes through $(y^*, z^*)$ and intercepts the z-axis at $q(\lambda^*)$.]
Finally we want to find the dual optimum
\[ \max_{\lambda \ge 0} q(\lambda) \]
• That is, the line with slope $-\lambda$ with maximal intercept, $q(\lambda)$, on the z-axis.
• This line has slope $-\lambda^*$ and gives the dual optimal value $q(\lambda^*)$.
• For this problem the optimal dual objective $q(\lambda^*)$ equals the optimal primal objective $z^*$.
• In such cases, there is no duality gap (strong duality).
Properties of the Lagrangian Dual Function
q(λ) is concave
Theorem
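Let $D_q = \{ \lambda \mid q(\lambda) > -\infty \}$. Then $q(\lambda)$ is a concave function on $D_q$.
Proof.
For any $x \in X$, $\lambda_1, \lambda_2 \in D_q$ and $\alpha \in (0, 1)$
\[
\begin{aligned}
L(x, \alpha\lambda_1 + (1 - \alpha)\lambda_2) &= f(x) + (\alpha\lambda_1 + (1 - \alpha)\lambda_2)^t g(x) \\
&= \alpha \left( f(x) + \lambda_1^t g(x) \right) + (1 - \alpha) \left( f(x) + \lambda_2^t g(x) \right) \\
&= \alpha\, L(x, \lambda_1) + (1 - \alpha)\, L(x, \lambda_2).
\end{aligned}
\]
Take the min on both sides. Since the minimum of a sum is at least the sum of the minima,
\[
\begin{aligned}
\min_{x \in X} L(x, \alpha\lambda_1 + (1 - \alpha)\lambda_2) &= \min_{x \in X} \left\{ \alpha L(x, \lambda_1) + (1 - \alpha) L(x, \lambda_2) \right\} \\
&\ge \alpha \min_{x \in X} L(x, \lambda_1) + (1 - \alpha) \min_{x \in X} L(x, \lambda_2)
\end{aligned}
\]
Therefore
\[ q(\alpha\lambda_1 + (1 - \alpha)\lambda_2) \ge \alpha\, q(\lambda_1) + (1 - \alpha)\, q(\lambda_2) \]
This implies that $q$ is concave over $D_q$.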
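The concavity inequality does not rely on any convexity of $f$ or $g$, and this can be checked numerically. The sketch below uses a hypothetical non-convex objective on a finite sample of $X = [-2, 2]$ (both are assumptions made here, not taken from the slides) and records the largest violation of the inequality over a set of multiplier pairs.

```python
import itertools

# Hypothetical non-convex problem on a finite set X (not from the slides).
X = [i / 100 for i in range(-200, 201)]
f = lambda x: (x ** 2 - 1.0) ** 2 + 0.5 * x   # non-convex objective
g = lambda x: -x                               # g(x) <= 0 means x >= 0

def q(lam):
    """Dual function: pointwise minimum over x of functions that are affine in lam."""
    return min(f(x) + lam * g(x) for x in X)

lams = [0.25 * k for k in range(0, 13)]        # multipliers in [0, 3]
alphas = [0.1, 0.25, 0.5, 0.75, 0.9]
q_at = {lam: q(lam) for lam in lams}           # cache the endpoint values

# Largest violation of q(a*l1 + (1-a)*l2) >= a*q(l1) + (1-a)*q(l2).
worst = 0.0
for l1, l2 in itertools.combinations(lams, 2):
    for a in alphas:
        mixed = q(a * l1 + (1 - a) * l2)
        chord = a * q_at[l1] + (1 - a) * q_at[l2]
        worst = max(worst, chord - mixed)
print(worst)
```

Any positive value of `worst` should be at the level of floating-point rounding, consistent with the theorem.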

The set of Lagrange Multipliers is convex
Theorem
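Let $D_q = \{ \lambda \mid q(\lambda) > -\infty \}$. Restricting to $D_q$ ensures that only valid Lagrange multipliers, those with a finite dual value, are considered. Then $D_q$ is a convex set.
Proof.
Let $\lambda_1, \lambda_2 \in D_q$. Therefore $q(\lambda_1) > -\infty$ and $q(\lambda_2) > -\infty$. Let $\alpha \in (0, 1)$, then as $q$ is concave
\[ q(\alpha\lambda_1 + (1 - \alpha)\lambda_2) \ge \alpha\, q(\lambda_1) + (1 - \alpha)\, q(\lambda_2) > -\infty \]
and this implies
\[ \alpha\lambda_1 + (1 - \alpha)\lambda_2 \in D_q \]
Hence $D_q$ is a convex set.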

Significance of these results
• The dual function is always concave, irrespective of the primal problem.
• Therefore finding the optimum of the dual function is a convex optimization problem.
Weak Duality
Theorem (Weak Duality)

Let $x$ be a feasible solution to the primal problem $P$, that is, $x \in X$, $g(x) \le 0$ and $h(x) = 0$. Let $(\lambda, \mu)$ be a feasible solution to the dual problem $D$, that is, $\lambda \ge 0$. Then
\[ f(x) \ge q(\lambda, \mu) \]
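Proof of the Weak Duality Theorem.

Remember
\[ q(\lambda, \mu) = \inf \left\{ f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{i=1}^{l} \mu_i h_i(x) : x \in X \right\} \]
Then we have
\[
\begin{aligned}
q(\lambda, \mu) &= \inf \{ f(\tilde{x}) + \lambda^t g(\tilde{x}) + \mu^t h(\tilde{x}) : \tilde{x} \in X \} \\
&\le f(x) + \lambda^t g(x) + \mu^t h(x) \\
&\le f(x)
\end{aligned}
\]
The first inequality holds because $x \in X$; the second because $\lambda \ge 0$, $g(x) \le 0$ and $h(x) = 0$. The result follows.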
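The two inequalities in the proof can be observed directly on the tutorial problem. In this sketch $X$ is replaced by a coarse finite grid (an assumption made so that every minimum is a plain scan; the names `feasible`, `dual_values` and `primal_values` are also introduced here). Every dual value is then compared with every feasible primal value.

```python
f = lambda x: 0.4 * (x[0] ** 2 + x[1] ** 2)
g = lambda x: 2.0 - x[0] - x[1]

# Finite stand-in for X: a grid on [-3, 3]^2 with spacing 0.25.
ticks = [i / 4 for i in range(-12, 13)]
X = [(a, b) for a in ticks for b in ticks]
feasible = [x for x in X if g(x) <= 0.0]

def q(lam):
    """Dual function over the finite set X."""
    return min(f(x) + lam * g(x) for x in X)

lams = [0.1 * k for k in range(0, 31)]          # dual-feasible multipliers, lam >= 0
dual_values = [q(lam) for lam in lams]
primal_values = [f(x) for x in feasible]

# Weak duality: every dual value is a lower bound on every feasible primal value.
holds = all(d <= p for d in dual_values for p in primal_values)
best_dual = max(dual_values)
best_primal = min(primal_values)
print(holds, best_dual, best_primal)
```

On this grid the best dual value and the best primal value both come from the point $x = (1, 1)$, so the bound is tight here.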

Corollary
Let
\[ f^* = \inf \{ f(x) : x \in X,\ g(x) \le 0,\ h(x) = 0 \} \]
\[ q^* = \sup \{ q(\lambda, \mu) : \lambda \ge 0 \} \]
then
\[ q^* \le f^* \]
• Thus the optimal value of the primal problem ≥ optimal value of the dual problem.
• If the optimal value of the primal problem > optimal value of the dual problem, then there exists a duality gap.
Example with a Duality Gap
Example with a non-convex objective function
[Figure: a non-convex 1D objective $f(x)$, with the feasible region defined by $g(x) \le 0$ marked.]
• Consider the constrained optimization of this 1D non-convex objective function.
• Let's visualize $G = \{ (y, z) \mid \exists x \in \mathbb{R} \text{ s.t. } y = g(x),\ z = f(x) \}$ and its dual solution...
Dual Solution ≤ Primal Solution: Have a Duality Gap
[Figure: the set $G$ in the $(y, z)$ plane, showing the optimal primal objective, the optimal dual objective, and the duality gap between them.]
• Above is the geometric interpretation of the primal and dual problems.
• Note there exists a duality gap due to the nonconvexity of the set $G$.
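A duality gap of this kind can be reproduced with a small numerical experiment. The objective and constraint below are hypothetical stand-ins, not the function drawn on the slide: a tilted double well $f(x) = (x^2 - 1)^2 + 0.5x$ with constraint $g(x) = -x \le 0$, so the global minimum near $x = -1$ is infeasible. The interval $[-2, 2]$, the grid spacings and the range of multipliers are likewise assumptions of this sketch.

```python
# Hypothetical non-convex example (not the function shown on the slide).
X = [i / 100 for i in range(-200, 201)]        # X = [-2, 2] sampled at spacing 0.01
f = lambda x: (x ** 2 - 1.0) ** 2 + 0.5 * x    # double well, lower on the left
g = lambda x: -x                               # feasible region: x >= 0

# Primal optimum: best objective value among the feasible samples.
f_star = min(f(x) for x in X if g(x) <= 0.0)

def q(lam):
    """Dual function over the sampled X."""
    return min(f(x) + lam * g(x) for x in X)

# Dual optimum: best lower bound over a grid of multipliers in [0, 3].
lams = [k / 100 for k in range(0, 301)]
q_star, lam_star = max((q(lam), lam) for lam in lams)

gap = f_star - q_star
print(f_star, q_star, lam_star, gap)
```

In this example the dual optimum is reached at $\lambda = 0.5$, where the Lagrangian $(x^2 - 1)^2$ has two minimizers $x = \pm 1$ and the value 0, while the best feasible primal value is strictly positive, so the gap does not close.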
Strong Duality
When does Dual Solution = Primal Solution?
The Strong Duality Theorem states that, if some suitable convexity conditions are satisfied, then there is no duality gap between the primal and dual optimization problems.
Theorem (Strong Duality)

Let
• $X$ be a non-empty convex set in $\mathbb{R}^n$,
• $f : X \to \mathbb{R}$ and each $g_i : \mathbb{R}^n \to \mathbb{R}$ ($i = 1, \dots, m$) be convex,
• each $h_i : \mathbb{R}^n \to \mathbb{R}$ ($i = 1, \dots, l$) be affine.
If
• there exists $\hat{x} \in X$ such that $g(\hat{x}) < 0$ and
• $0 \in \operatorname{int}(h(X))$ where $h(X) = \{ h(x) : x \in X \}$,
then
\[ \inf \{ f(x) : x \in X,\ g(x) \le 0,\ h(x) = 0 \} = \sup \{ q(\lambda, \mu) : \lambda \ge 0 \} \]
where $q(\lambda, \mu) = \inf \{ f(x) + \lambda^t g(x) + \mu^t h(x) : x \in X \}$.

Theorem (Strong Duality ctd)

Furthermore, if
\[ \inf \{ f(x) : x \in X,\ g(x) \le 0,\ h(x) = 0 \} > -\infty \]
then
\[ \sup \{ q(\lambda, \mu) : \lambda \ge 0 \} \]
is achieved at $(\lambda^*, \mu^*)$ with $\lambda^* \ge 0$. If the inf is achieved at $x^*$ then
\[ (\lambda^*)^t g(x^*) = 0 \]
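The final condition, usually called complementary slackness, says that a multiplier can be positive only when its constraint is active. The sketch below illustrates both cases on a one-parameter variant of the tutorial problem, $g(x) = c - x_1 - x_2 \le 0$. The parameter `c` and the function name `solve_tutorial_variant` are assumptions introduced here; $c = 2$ recovers the tutorial problem, and the closed form follows from $q(\lambda) = c\lambda - \tfrac{5}{4}\lambda^2$.

```python
def solve_tutorial_variant(c):
    """Primal and dual optimum of min 0.4*(x1^2 + x2^2) s.t. g(x) = c - x1 - x2 <= 0."""
    # Dual function: q(lam) = c*lam - 1.25*lam^2, maximised over lam >= 0.
    lam = max(0.0, c / 2.5)
    # Minimizer of the Lagrangian for this multiplier: 0.8 * x_i = lam.
    x1 = x2 = 1.25 * lam
    g_val = c - x1 - x2
    return (x1, x2), lam, g_val

# c = 2: the tutorial problem, where the constraint is active at the optimum.
x_active, lam_active, g_active = solve_tutorial_variant(2.0)
# c = -2: the unconstrained minimizer x = 0 is already feasible, so lam* = 0.
x_slack, lam_slack, g_slack = solve_tutorial_variant(-2.0)

print(lam_active, g_active, lam_active * g_active)
print(lam_slack, g_slack, lam_slack * g_slack)
```

In the first case the multiplier is positive and $g(x^*) = 0$; in the second $g(x^*) < 0$ and the multiplier is zero. Either way the product $\lambda^* g(x^*)$ vanishes.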
