Lagrange 2deriv PDF
This handout presents the second derivative test for a local extremum of a Lagrange multiplier problem.
Section 1 presents a geometric motivation for the criterion involving the second derivatives of both the
function f and the constraint function g. The main result is given in Section 3, with the special case of one
constraint given in Sections 4 and 5 for two and three dimensions, respectively. The result is stated in terms
of the determinant of what is called the bordered Hessian matrix, which is defined in Section 2 using the
Lagrangian function.
The above figure displays, on one plot, the level curve $g(x, y) = 1$ and the level curves $f(x, y) = C$ for
$C = (0.9)^2, 1, (1.1)^2, (1.9)^2, 2^2$, and $(2.1)^2$, where $f(x, y) = x^2 + 4y^2$ and $g(x, y) = x^2 + y^2$.
The level curve $f(x, y) = 1$ intersects $g(x, y) = 1$ at $(\pm 1, 0)$. For nearby values of $f$, the level curve
$f(x, y) = (0.9)^2$ does not intersect $g(x, y) = 1$, while the level curve $f(x, y) = (1.1)^2$ intersects
$g(x, y) = 1$ in four points. Therefore, the points $(\pm 1, 0)$ are local minima for $f$. Notice that the level curve
$f(x, y) = 1$ bends more sharply near $(\pm 1, 0)$ than the level curve for $g$, so the level curve for $f$ lies
inside the level curve for $g$. Since it lies inside the level curve for $g$ and the gradient of $f$ points outward,
these points are local minima for $f$ on the level curve of $g$.
On the other hand, the level curve $f(x, y) = 4$ intersects $g(x, y) = 1$ at $(0, \pm 1)$, the level curve $f(x, y) = (1.9)^2$
intersects it in four points, and $f(x, y) = (2.1)^2$ does not intersect it. Therefore, the points $(0, \pm 1)$ are local
maxima for $f$. Notice that the level curve $f(x, y) = 4$ bends less sharply near $(0, \pm 1)$ than the level curve
for $g$, so the level curve for $f$ lies outside the level curve for $g$. Since it lies outside the level curve for $g$
and the gradient of $f$ points outward, these points are local maxima for $f$ on the level curve of $g$.
2 CONSTRAINED EXTREMA
Thus, the second partial derivatives of f are the same at both (±1, 0) and (0, ±1), but the sharpness
with which the two level curves bend determines which points are local maxima and which are local minima. This
discussion suggests that what matters is a comparison of the second partial derivatives of f and g.
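The comparison can be checked numerically along the circle itself. The sketch below assumes, as the Lagrangian in the two-dimensional example later in the handout indicates, that $f(x, y) = x^2 + 4y^2$ and $g(x, y) = x^2 + y^2$; the parametrization $r(t) = (\cos t, \sin t)$, the step size, and the helper names are illustrative choices. It estimates the second derivative of $f$ along the level curve $g = 1$ by a central finite difference.

```python
import math


def f(x, y):
    # Assumed objective from the two-dimensional example: f(x, y) = x^2 + 4y^2
    return x**2 + 4 * y**2


def f_on_circle(t):
    # r(t) = (cos t, sin t) traces the level curve g(x, y) = x^2 + y^2 = 1
    return f(math.cos(t), math.sin(t))


def second_derivative(h, t, step=1e-4):
    # Central finite-difference approximation of h''(t)
    return (h(t + step) - 2 * h(t) + h(t - step)) / step**2


for label, t in [("(1, 0)", 0.0), ("(0, 1)", math.pi / 2)]:
    print(label, round(second_derivative(f_on_circle, t), 3))
```

Since $f(\cos t, \sin t) = 1 + 3\sin^2 t$, the exact second derivative is $6\cos 2t$, so the printed values should be close to $6$ at $(1, 0)$ (a minimum along the circle) and $-6$ at $(0, 1)$ (a maximum).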
2. Lagrangian Function
One way to get the relevant matrix is to form the Lagrangian function, which is a combination of $f$
and $g$. For the problem of finding the extrema (maxima or minima) of $f(\mathbf{x})$ with $k$ constraints $g_\ell(\mathbf{x}) = C_\ell$
for $1 \le \ell \le k$, the Lagrangian function is defined to be the function
$$L(\lambda, \mathbf{x}) = f(\mathbf{x}) - \sum_{\ell=1}^{k} \lambda_\ell \left[ g_\ell(\mathbf{x}) - C_\ell \right].$$
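The solution of the Lagrange multiplier problem is then a critical point of $L$:
$$\frac{\partial L}{\partial \lambda_\ell}(\lambda^*, \mathbf{x}^*) = -g_\ell(\mathbf{x}^*) + C_\ell = 0 \quad \text{for } 1 \le \ell \le k, \quad \text{and}$$
$$\frac{\partial L}{\partial x_i}(\lambda^*, \mathbf{x}^*) = \frac{\partial f}{\partial x_i}(\mathbf{x}^*) - \sum_{\ell=1}^{k} \lambda_\ell^* \frac{\partial g_\ell}{\partial x_i}(\mathbf{x}^*) = 0 \quad \text{for } 1 \le i \le n.$$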
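As a quick numerical check of these first-order conditions, the sketch below forms $L$ for user-supplied Python callables and estimates its gradient in $(\lambda, \mathbf{x})$ by central differences. The function names, the step size, and the test problem ($f = x^2 + 4y^2$, $g = x^2 + y^2$, $C = 1$, taken from the two-dimensional example later in the handout) are illustrative assumptions.

```python
def make_lagrangian(f, gs, cs):
    """Return L(lam, x) = f(x) - sum_l lam_l * (g_l(x) - C_l)."""
    def L(lam, x):
        return f(x) - sum(l * (g(x) - c) for l, g, c in zip(lam, gs, cs))
    return L


def grad_lagrangian(L, lam, x, h=1e-6):
    """Central-difference gradient of L with respect to (lam_1..lam_k, x_1..x_n)."""
    k = len(lam)
    z = list(lam) + list(x)
    grad = []
    for i in range(len(z)):
        zp, zm = z[:], z[:]
        zp[i] += h
        zm[i] -= h
        grad.append((L(zp[:k], zp[k:]) - L(zm[:k], zm[k:])) / (2 * h))
    return grad


# Hypothetical test problem: f = x^2 + 4y^2 on g = x^2 + y^2 = 1
L = make_lagrangian(lambda p: p[0]**2 + 4 * p[1]**2,
                    [lambda p: p[0]**2 + p[1]**2], [1.0])
print(grad_lagrangian(L, [1.0], [1.0, 0.0]))  # each component should be close to 0
print(grad_lagrangian(L, [4.0], [0.0, 1.0]))  # each component should be close to 0
```

At a point that is not critical, for instance $\lambda = 1$ and $(x, y) = (0, 1)$, the $y$-component is about $6$, so the check distinguishes the two situations.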
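The second derivative test involves the matrix of all second partial derivatives of $L$, including those with
respect to $\lambda$. In dimensions $n$ greater than two, the test also involves submatrices. Notice that
$\frac{\partial^2 L}{\partial \lambda_\ell \, \partial \lambda_m}(\lambda^*, \mathbf{x}^*) = 0$ and
$\frac{\partial^2 L}{\partial \lambda_\ell \, \partial x_i}(\lambda^*, \mathbf{x}^*) = -\frac{\partial g_\ell}{\partial x_i}(\mathbf{x}^*)$.
We could use $\frac{\partial g_\ell}{\partial x_i}(\mathbf{x}^*)$ in the matrix instead of $-\frac{\partial g_\ell}{\partial x_i}(\mathbf{x}^*)$; this does not
change the determinant, because it amounts to multiplying each of the first $k$ rows and each of the first $k$ columns
by $-1$, which multiplies the determinant by $(-1)^{2k} = 1$. The matrix of all second
partial derivatives of $L$ is called the bordered Hessian matrix because the second derivatives of $L$ with
respect to the $x_i$ variables are bordered by the first-order partial derivatives of $g$. The bordered Hessian matrix
is defined to be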
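$$HL(\lambda^*, \mathbf{x}^*) =
\begin{pmatrix}
0 & \cdots & 0 & -(g_1)_{x_1} & \cdots & -(g_1)_{x_n} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & -(g_k)_{x_1} & \cdots & -(g_k)_{x_n} \\
-(g_1)_{x_1} & \cdots & -(g_k)_{x_1} & L_{x_1 x_1} & \cdots & L_{x_1 x_n} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
-(g_1)_{x_n} & \cdots & -(g_k)_{x_n} & L_{x_n x_1} & \cdots & L_{x_n x_n}
\end{pmatrix}
=
\begin{pmatrix} 0 & -Dg \\ -Dg^T & D_x^2 L \end{pmatrix}, \tag{1}$$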
where all the partial derivatives are evaluated at $\mathbf{x} = \mathbf{x}^*$ and $\lambda = \lambda^*$. In the following, we use the notation
$D_x^2 L^* = D_x^2 L(\lambda^*, \mathbf{x}^*) = D^2 f(\mathbf{x}^*) - \sum_{\ell=1}^{k} \lambda_\ell^* D^2(g_\ell)(\mathbf{x}^*)$ for this submatrix that appears in the bordered
Hessian.
In the last equality, we used the definition of $D^2 f$ and the fact that $Df(\mathbf{x}^*) = \sum_{\ell=1}^{k} \lambda_\ell^* D(g_\ell)(\mathbf{x}^*)$.
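We can perform a similar calculation for the constraint equation $g_\ell(\mathbf{r}(t)) = C_\ell$, whose derivatives with respect to $t$ are zero:
$$0 = \frac{d}{dt}\, g_\ell(\mathbf{r}(t)) = \sum_{i=1}^{n} \frac{\partial g_\ell}{\partial x_i}(\mathbf{r}(t))\, r_i'(t),$$
$$0 = \left.\frac{d^2}{dt^2}\, g_\ell(\mathbf{r}(t))\right|_{t=0} = \sum_{i=1}^{n} \left.\frac{d}{dt}\left[\frac{\partial g_\ell}{\partial x_i}(\mathbf{r}(t))\, r_i'(t)\right]\right|_{t=0} = \sum_{i,j=1}^{n} \frac{\partial^2 g_\ell}{\partial x_j \, \partial x_i}(\mathbf{x}^*)\, r_i'(0)\, r_j'(0) + D(g_\ell)(\mathbf{x}^*)\, \mathbf{r}''(0),$$
and therefore, multiplying by $\lambda_\ell^*$,
$$\lambda_\ell^* D(g_\ell)(\mathbf{x}^*)\, \mathbf{r}''(0) = -\lambda_\ell^*\, \mathbf{r}'(0)^T D^2(g_\ell)(\mathbf{x}^*)\, \mathbf{r}'(0).$$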
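Substituting this equality into the expression for the second derivative of $f(\mathbf{r}(t))$ gives, with $\mathbf{v} = \mathbf{r}'(0)$,
$$\left.\frac{d^2}{dt^2} f(\mathbf{r}(t))\right|_{t=0} = \mathbf{v}^T \left[ D^2 f(\mathbf{x}^*) - \sum_{\ell=1}^{k} \lambda_\ell^* D^2 g_\ell(\mathbf{x}^*) \right] \mathbf{v} = \mathbf{v}^T D_x^2 L^* \mathbf{v}.$$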
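The identity can be sanity-checked numerically on the surface $z - xy = 2$ of Example 2 below, at its critical point $(x^*, y^*, z^*) = (1, -1, 1)$ with $\lambda^* = 2$. The sketch assumes the family of curves $x = 1 + at$, $y = -1 + bt$, $z = 2 + xy$ (chosen only so that $\mathbf{r}(t)$ stays on the surface) and compares a finite-difference second derivative of $f(\mathbf{r}(t))$ with $\mathbf{v}^T D_x^2 L^* \mathbf{v}$, where $\mathbf{v} = \mathbf{r}'(0) = (a, b, a y^* + b x^*)$. The names and step size are illustrative.

```python
def f(p):
    x, y, z = p
    return x**2 + y**2 + z**2


def curve(t, a, b, x0=1.0, y0=-1.0):
    # A curve on z - x*y = 2 passing through (1, -1, 1) at t = 0
    x, y = x0 + a * t, y0 + b * t
    return (x, y, 2 + x * y)


def second_derivative_along(a, b, h=1e-4):
    # Central finite difference of t -> f(r(t)) at t = 0
    return (f(curve(h, a, b)) - 2 * f(curve(0.0, a, b)) + f(curve(-h, a, b))) / h**2


def quadratic_form(a, b, lam=2.0, x0=1.0, y0=-1.0):
    # v = r'(0); D_x^2 L* for L = x^2 + y^2 + z^2 - lam*z + lam*x*y + 2*lam
    v = (a, b, a * y0 + b * x0)
    hess = [[2.0, lam, 0.0], [lam, 2.0, 0.0], [0.0, 0.0, 2.0]]
    return sum(v[i] * hess[i][j] * v[j] for i in range(3) for j in range(3))


for a, b in [(1, 0), (0, 1), (1, 1), (2, -1)]:
    print((a, b), round(second_derivative_along(a, b), 4), quadratic_form(a, b))
```

At this point both sides reduce to $4a^2 + 4b^2$, so the two columns should agree up to finite-difference error.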
The next theorem uses the above lemma to derive conditions for local maxima and minima in terms of
the second derivative of the Lagrangian $D_x^2 L^*$ on the set of vectors $\mathrm{Nul}(Dg(\mathbf{x}^*))$.
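Theorem 2. Assume $f, g_\ell : \mathbb{R}^n \to \mathbb{R}$ are $C^2$ for $1 \le \ell \le k$. Assume that $\mathbf{x}^* \in \mathbb{R}^n$ and $\lambda^* = (\lambda_1^*, \dots, \lambda_k^*)$
meet the first-order conditions of the Theorem of Lagrange on $g^{-1}(C)$.
a. If $f$ has a local maximum on $g^{-1}(C)$ at $\mathbf{x}^*$, then $\mathbf{v}^T D_x^2 L^* \mathbf{v} \le 0$ for all $\mathbf{v} \in \mathrm{Nul}(Dg(\mathbf{x}^*))$.
b. If $f$ has a local minimum on $g^{-1}(C)$ at $\mathbf{x}^*$, then $\mathbf{v}^T D_x^2 L^* \mathbf{v} \ge 0$ for all $\mathbf{v} \in \mathrm{Nul}(Dg(\mathbf{x}^*))$.
c. If $\mathbf{v}^T D_x^2 L^* \mathbf{v} < 0$ for all $\mathbf{v} \in \mathrm{Nul}(Dg(\mathbf{x}^*)) \setminus \{\mathbf{0}\}$, then $\mathbf{x}^*$ is a strict local maximum of $f$ on $g^{-1}(C)$.
d. If $\mathbf{v}^T D_x^2 L^* \mathbf{v} > 0$ for all $\mathbf{v} \in \mathrm{Nul}(Dg(\mathbf{x}^*)) \setminus \{\mathbf{0}\}$, then $\mathbf{x}^*$ is a strict local minimum of $f$ on $g^{-1}(C)$.
e. If $\mathbf{v}^T D_x^2 L^* \mathbf{v}$ is positive for some vector $\mathbf{v} \in \mathrm{Nul}(Dg(\mathbf{x}^*))$ and negative for another such vector, then
$\mathbf{x}^*$ is neither a local maximum nor a local minimum of $f$ on $g^{-1}(C)$.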
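In the smallest case, $n = 2$ and $k = 1$, the null space $\mathrm{Nul}(Dg(\mathbf{x}^*))$ is spanned by $\mathbf{v} = (-g_y, g_x)$ whenever $\nabla g(\mathbf{x}^*) \ne \mathbf{0}$, so parts (c) and (d) reduce to the sign of a single number (part (e) cannot occur, because the null space is one-dimensional). The sketch below assumes that setting; the function name and the input format (the gradient of $g$ and the $2 \times 2$ matrix $D_x^2 L^*$ at the point) are illustrative.

```python
def classify_2d_one_constraint(grad_g, hess_L):
    """Sign test of Theorem 2 (c)-(d) when n = 2 and k = 1.

    grad_g: (g_x, g_y) at x*, assumed nonzero.
    hess_L: the 2x2 matrix D_x^2 L* at (lambda*, x*), as nested lists.
    """
    gx, gy = grad_g
    v = (-gy, gx)  # spans Nul(Dg(x*)) because g_x * (-g_y) + g_y * g_x = 0
    q = sum(v[i] * hess_L[i][j] * v[j] for i in range(2) for j in range(2))
    if q > 0:
        return "strict local minimum", q
    if q < 0:
        return "strict local maximum", q
    return "test inconclusive", q


# f = x^2 + 4y^2 on x^2 + y^2 = 1: D_x^2 L* = diag(2 - 2*lam, 8 - 2*lam), grad g = (2x, 2y)
print(classify_2d_one_constraint((2, 0), [[0, 0], [0, 6]]))   # (1, 0), lam = 1
print(classify_2d_one_constraint((0, 2), [[-6, 0], [0, 0]]))  # (0, 1), lam = 4
```

The values $24$ and $-24$ coincide with $-\det(HL)$ in the two-dimensional example: expanding the $3 \times 3$ bordered Hessian gives $-\det(HL) = g_y^2 L_{xx} - 2 g_x g_y L_{xy} + g_x^2 L_{yy} = \mathbf{v}^T D_x^2 L^* \mathbf{v}$.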
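Proof. (b) We consider the case of minima. (The case of maxima just reverses the direction of the inequality.) Lemma 1 shows that
$$\left.\frac{d^2}{dt^2} f(\mathbf{r}(t))\right|_{t=0} = \mathbf{v}^T D_x^2 L^* \mathbf{v}, \qquad \mathbf{v} = \mathbf{r}'(0),$$
for any curve $\mathbf{r}(t)$ in $g^{-1}(C)$ with $\mathbf{r}(0) = \mathbf{x}^*$. Since $t = 0$ is a local minimum of $f(\mathbf{r}(t))$, this second derivative is
nonnegative, so $\mathbf{v}^T D_x^2 L^* \mathbf{v} \ge 0$ for any vector $\mathbf{v}$ tangent to a curve in $g^{-1}(C)$ at $\mathbf{x}^*$. But the implicit function theorem
implies that these tangent vectors are exactly the vectors in the null space $\mathrm{Nul}(Dg(\mathbf{x}^*))$.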
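(d) If $\mathbf{v}^T D_x^2 L^* \mathbf{v} > 0$ for all vectors $\mathbf{v} \ne \mathbf{0}$ in $\mathrm{Nul}(Dg(\mathbf{x}^*))$, then
$$\left.\frac{d^2}{dt^2} f(\mathbf{r}(t))\right|_{t=0} = \mathbf{r}'(0)^T D_x^2 L^* \mathbf{r}'(0) > 0$$
for any curve $\mathbf{r}(t)$ in $g^{-1}(C)$ with $\mathbf{r}(0) = \mathbf{x}^*$ and $\mathbf{r}'(0) \ne \mathbf{0}$. This latter condition implies that $\mathbf{x}^*$ is a strict
local minimum on $g^{-1}(C)$.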
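For part (e), if $\mathbf{v}^T D_x^2 L^* \mathbf{v}$ takes both positive and negative values on $\mathrm{Nul}(Dg(\mathbf{x}^*))$, then there are some curves
through $\mathbf{x}^*$ on which the value of $f$ is greater than at $\mathbf{x}^*$ and others on which the value is less.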
Combining this result with the earlier theorem on the sign of a quadratic form restricted to the null
space of a linear map, we get the following theorem given in the book.
Theorem 3. Assume that $f, g_\ell : \mathbb{R}^n \to \mathbb{R}$ are $C^2$ for $1 \le \ell \le k$ and that $\lambda^* = (\lambda_1^*, \dots, \lambda_k^*)$ and $\mathbf{x}^*$ satisfy
the first-order conditions for an extremum of $f$ on $g^{-1}(C)$.
Assume that the $k \times k$ submatrix of $Dg(\mathbf{x}^*)$ formed by the first $k$ columns has nonzero determinant,
$$\det\left(\frac{\partial g_\ell}{\partial x_j}(\mathbf{x}^*)\right)_{1 \le \ell, j \le k} \ne 0.$$
Let $H_j$ be the upper left $j \times j$ submatrix of $HL(\lambda^*, \mathbf{x}^*)$.
(a) If $(-1)^k \det(H_j) > 0$ for $2k+1 \le j \le n+k$, then the function $f$ has a local minimum at $\mathbf{x}^*$ on the
level set $g^{-1}(C)$. (Notice that the sign given by $(-1)^k$ depends on the rank $k$, the number of constraints, and not on $j$.)
(b) If $(-1)^{j-k} \det(H_j) > 0$ for $2k+1 \le j \le n+k$, then the function $f$ has a local maximum at $\mathbf{x}^*$ on
the level set $g^{-1}(C)$. Notice that the sign given by $(-1)^{j-k}$ depends on $j$ and alternates. The
condition on the signs of the determinants can be expressed as $(-1)^k \det(H_{2k+1}) < 0$, with the rest of
the sequence $(-1)^k \det(H_j)$ alternating in sign with $j$.
(c) If the determinants satisfy $(-1)^k \det(H_j) \ne 0$ for $2k+1 \le j \le n+k$ but fall into a different pattern of
signs than the above two cases, then the critical point is some type of saddle.
Remark 1. Notice that the null space $\mathrm{Nul}(Dg(\mathbf{x}^*))$ has dimension $n - k$, so we need $n - k$ conditions. The
range $2k+1 \le j \le n+k$ in the assumptions of the theorem contains exactly $n - k$ values.
In the negative definite case (a local maximum), the first determinant, for $j = 2k+1$, satisfies $(-1)^k \det(H_{2k+1}) < 0$,
and the terms $(-1)^k \det(H_j)$ alternate in sign.
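The sign test of Theorem 3 can be sketched in exact rational arithmetic. The code below is illustrative: the helper names, the use of Python's `fractions.Fraction`, and Gaussian elimination for the determinants are choices made here, not part of the handout. It assembles $HL$ from $Dg(\mathbf{x}^*)$ and $D_x^2 L^*$ as in equation (1), forms the leading minors $H_j$ for $2k+1 \le j \le n+k$, and compares their signs with cases (a) to (c). It does not verify the assumption on the first $k$ columns of $Dg(\mathbf{x}^*)$.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def bordered_hessian(Dg, D2L):
    """HL = [[0, -Dg], [-Dg^T, D_x^2 L*]] for a k x n Jacobian Dg and n x n D2L."""
    k, n = len(Dg), len(D2L)
    top = [[0] * k + [-Dg[l][i] for i in range(n)] for l in range(k)]
    bottom = [[-Dg[l][i] for l in range(k)] + list(D2L[i]) for i in range(n)]
    return top + bottom


def theorem3(Dg, D2L):
    """Classify x* using the leading minors H_j, 2k + 1 <= j <= n + k."""
    k, n = len(Dg), len(D2L)
    HL = bordered_hessian(Dg, D2L)
    js = range(2 * k + 1, n + k + 1)
    minors = {j: det([row[:j] for row in HL[:j]]) for j in js}
    if any(m == 0 for m in minors.values()):
        return "test inconclusive"
    if all((-1) ** k * minors[j] > 0 for j in js):
        return "local minimum"
    if all((-1) ** (j - k) * minors[j] > 0 for j in js):
        return "local maximum"
    return "saddle"


# Example 2 below at (lambda*, x*, y*, z*) = (2, 1, -1, 1): g = z - xy, Dg = (-y, -x, 1)
print(theorem3([[1, -1, 1]], [[2, 2, 0], [2, 2, 0], [0, 0, 2]]))  # local minimum
```

At the other critical point of Example 2, $(4, 0, 0, 2)$, the first column of $Dg$ is zero, so the assumption of the theorem fails; this sketch then finds $\det(H_3) = 0$ and reports the test as inconclusive, which is the situation that the second case of equation (6) below handles by reordering the variables.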
Let
$$L(\lambda, x, y) = f(x, y) - \lambda \left[ g(x, y) - C \right]$$
be the Lagrangian function for the problem. Form the bordered Hessian matrix
$$HL = \begin{pmatrix} 0 & -g_x & -g_y \\ -g_x & L_{xx} & L_{xy} \\ -g_y & L_{yx} & L_{yy} \end{pmatrix}, \tag{3}$$
Solving these yields (i) x = 0, y = ±1, and λ = 4, and (ii) x = ±1, y = 0, and λ = 1.
The Lagrangian function is
$$L(\lambda, x, y) = x^2 + 4y^2 - \lambda(x^2 + y^2) + \lambda.$$
The bordered Hessian matrix is
$$HL = \begin{pmatrix} 0 & -2x & -2y \\ -2x & 2 - 2\lambda & 0 \\ -2y & 0 & 8 - 2\lambda \end{pmatrix}.$$
(i) At the first pair of points, $x = 0$, $y = \pm 1$, and $\lambda = 4$,
$$HL(4, 0, \pm 1) = \begin{pmatrix} 0 & 0 & \mp 2 \\ 0 & -6 & 0 \\ \mp 2 & 0 & 0 \end{pmatrix}.$$
So, $-\det(HL) = -(-1)(-6)(\pm 2)^2 = -24 < 0$, and these points are local maxima.
(ii) At the second pair of points, $x = \pm 1$, $y = 0$, and $\lambda = 1$,
$$HL(1, \pm 1, 0) = \begin{pmatrix} 0 & \mp 2 & 0 \\ \mp 2 & 0 & 0 \\ 0 & 0 & 6 \end{pmatrix}.$$
So, $-\det(HL) = -(-1)(6)(\pm 2)^2 = 24 > 0$, and these points are local minima.
These results agree with the answers found by taking the values at the points, f (±1, 0) = 1 and
f (0, ±1) = 4.
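The two determinants above can be reproduced with a few lines of Python. The sketch below hard-codes the bordered Hessian of this example as a function of $(\lambda, x, y)$ and uses cofactor expansion along the first row for the $3 \times 3$ determinant; the function names are illustrative.

```python
def bordered_hessian(lam, x, y):
    # HL for L = x^2 + 4y^2 - lam*(x^2 + y^2) + lam, ordered (lam, x, y)
    return [[0, -2 * x, -2 * y],
            [-2 * x, 2 - 2 * lam, 0],
            [-2 * y, 0, 8 - 2 * lam]]


def det3(m):
    # Cofactor expansion along the first row
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def classify(lam, x, y):
    value = -det3(bordered_hessian(lam, x, y))  # the test uses -det(HL)
    if value > 0:
        return value, "local minimum"
    if value < 0:
        return value, "local maximum"
    return value, "test inconclusive"


for lam, x, y in [(4, 0, 1), (4, 0, -1), (1, 1, 0), (1, -1, 0)]:
    print((x, y), classify(lam, x, y))
```

This prints $-24$ (local maximum) at $(0, \pm 1)$ and $24$ (local minimum) at $(\pm 1, 0)$, matching (i) and (ii).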
Let
$$L(\lambda, x, y, z) = f(x, y, z) - \lambda \left[ g(x, y, z) - C \right]$$
be the Lagrangian function for the problem. The corresponding bordered Hessian matrix is
$$H_4 = HL(\lambda^*, x^*, y^*, z^*) = \begin{pmatrix} 0 & -g_x & -g_y & -g_z \\ -g_x & L_{xx} & L_{xy} & L_{xz} \\ -g_y & L_{yx} & L_{yy} & L_{yz} \\ -g_z & L_{zx} & L_{zy} & L_{zz} \end{pmatrix}, \tag{5}$$
where the partial derivatives are evaluated at $(x^*, y^*, z^*)$ and $\lambda^*$. In three dimensions, there are two directions
in which we can move in the level surface, and we need two numbers to determine whether the solution of
the Lagrange multiplier problem is a local maximum or local minimum. Therefore, we need to consider not
only the four-by-four bordered matrix $H_4$, but also a three-by-three submatrix; the submatrix is
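$$H_3 = \begin{cases}
\begin{pmatrix} 0 & -g_x & -g_y \\ -g_x & L_{xx} & L_{xy} \\ -g_y & L_{yx} & L_{yy} \end{pmatrix} & \text{if } g_x(x^*, y^*, z^*) \ne 0 \text{ or } g_y(x^*, y^*, z^*) \ne 0, \\[2ex]
\begin{pmatrix} 0 & -g_y & -g_z \\ -g_y & L_{yy} & L_{yz} \\ -g_z & L_{zy} & L_{zz} \end{pmatrix} & \text{if } g_x(x^*, y^*, z^*) = 0 \text{ and } g_y(x^*, y^*, z^*) = 0,
\end{cases} \tag{6}$$
where the partial derivatives are evaluated at $(x^*, y^*, z^*)$ and $\lambda^*$. Then, we have the following second
derivative test.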
Theorem 5. Let $f$ and $g$ be $C^2$ real-valued functions on $\mathbb{R}^3$. Let $(x^*, y^*, z^*) \in \mathbb{R}^3$ and $\lambda^*$ be a solution of the
Lagrange multiplier problem $\nabla f(x^*, y^*, z^*) = \lambda^* \nabla g(x^*, y^*, z^*)$ and $C = g(x^*, y^*, z^*)$. Assume that $\nabla g(x^*, y^*, z^*) \ne \mathbf{0}$.
Define the bordered Hessian matrices $H_4$ and $H_3$ by equations (5) and (6).
(a) The point $(x^*, y^*, z^*)$ is a local minimum of $f$ on the level set $g(x, y, z) = C$ if $-\det(H_3) > 0$ and
$-\det(H_4) > 0$.
(b) The point $(x^*, y^*, z^*)$ is a local maximum of $f$ on the level set $g(x, y, z) = C$ if $-\det(H_3) < 0$ and
$-\det(H_4) > 0$.
(c) If $-\det(H_4) < 0$, then the point $(x^*, y^*, z^*)$ is a type of saddle and is neither a local minimum nor
a local maximum.
Remark 2. Again, the factor $-1$ in front of the determinants is the factor $(-1)^k$ of Theorem 3 with $k = 1$,
because we are considering one constraint.
Remark 3. The theorem in this case involves the determinant of a four-by-four matrix. When we
evaluate one of these, we expand along a row to find the answer, or use the many zeros in the matrix to write the
answer as a product of the determinants of two submatrices. The general treatment of determinants
is beyond this course and is covered in a course on linear algebra.
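The row expansion of Remark 3 is easy to automate. The sketch below expands recursively along the first row, skipping zero entries (which is why sparse rows save work), and then applies the sign conditions of Theorem 5. The function names are illustrative; the matrices in the usage lines are those of Example 2 below at $(\lambda^*, x^*, y^*, z^*) = (4, 0, 0, 2)$, where $g_x = g_y = 0$, so $H_3$ comes from the second case of (6).

```python
def det_cofactor(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry == 0:
            continue  # zero entries contribute nothing to the expansion
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det_cofactor(minor)
    return total


def theorem5(H4, H3):
    """Apply the sign conditions of Theorem 5 to -det(H4) and -det(H3)."""
    d4, d3 = -det_cofactor(H4), -det_cofactor(H3)
    if d4 < 0:
        return "saddle"
    if d4 > 0 and d3 > 0:
        return "local minimum"
    if d4 > 0 and d3 < 0:
        return "local maximum"
    return "test inconclusive"


H4 = [[0, 0, 0, -1], [0, 2, 4, 0], [0, 4, 2, 0], [-1, 0, 0, 2]]
H3 = [[0, 0, -1], [0, 2, 0], [-1, 0, 2]]
print(-det_cofactor(H4), theorem5(H4, H3))  # -12 saddle
```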
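A way to see that the conditions on $\det(H_4)$ and $\det(H_3)$ are right is to take the special case where
$g(x, y, z) = x$, $L_{xy}(\mathbf{x}^*) = 0$, $L_{xz}(\mathbf{x}^*) = 0$, and $L_{yz}(\mathbf{x}^*) = 0$. In this case,
$$H_4 = \begin{pmatrix} 0 & -1 & 0 & 0 \\ -1 & L_{xx} & 0 & 0 \\ 0 & 0 & L_{yy} & 0 \\ 0 & 0 & 0 & L_{zz} \end{pmatrix} \quad \text{and} \quad H_3 = \begin{pmatrix} 0 & -1 & 0 \\ -1 & L_{xx} & 0 \\ 0 & 0 & L_{yy} \end{pmatrix}.$$
Then, $-\det(H_3) = L_{yy}$, and expanding $\det(H_4)$ along the fourth row, $-\det(H_4) = -L_{zz} \det(H_3) = L_{yy} L_{zz}$.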
At a local minimum L yy > 0 and L zz > 0, so − det(H3 ) > 0 and − det(H4 ) > 0. Similarly, at a local
maximum, L yy < 0 and L zz < 0, so − det(H3 ) < 0 and − det(H4 ) > 0. The general case takes into
consideration the cross partial derivatives of L and allows the constraint function to be nonlinear. However,
an argument using linear algebra reduces the result to this special case.
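The special case can be spot-checked with numbers. The sketch below uses a first-row cofactor expansion and plugs sample values of $L_{xx}$, $L_{yy}$, $L_{zz}$ (arbitrary illustrative choices) into the matrices above; it confirms $-\det(H_3) = L_{yy}$ and $-\det(H_4) = L_{yy} L_{zz}$, independent of $L_{xx}$.

```python
def det_cofactor(m):
    # Determinant by cofactor expansion along the first row
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det_cofactor([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def special_case(Lxx, Lyy, Lzz):
    # g(x, y, z) = x with vanishing cross partials of L
    H4 = [[0, -1, 0, 0], [-1, Lxx, 0, 0], [0, 0, Lyy, 0], [0, 0, 0, Lzz]]
    H3 = [row[:3] for row in H4[:3]]
    return -det_cofactor(H3), -det_cofactor(H4)


print(special_case(5, 2, 3))    # (2, 6): the local-minimum sign pattern
print(special_case(5, -2, -3))  # (-2, 6): the local-maximum sign pattern
```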
Example 2. Consider the problem of finding the extreme points of $f(x, y, z) = x^2 + y^2 + z^2$ on $z - xy = 2$.
The method of Lagrange multipliers finds the points
$$(\lambda^*, x^*, y^*, z^*) = (4, 0, 0, 2), \quad (2, 1, -1, 1), \quad \text{and} \quad (2, -1, 1, 1).$$
The Lagrangian function is
$$L(\lambda, x, y, z) = x^2 + y^2 + z^2 - \lambda z + \lambda xy + 2\lambda$$
with bordered Hessian matrix
$$H_4 = HL = \begin{pmatrix} 0 & y & x & -1 \\ y & 2 & \lambda & 0 \\ x & \lambda & 2 & 0 \\ -1 & 0 & 0 & 2 \end{pmatrix}.$$
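At the point $(\lambda^*, x^*, y^*, z^*) = (4, 0, 0, 2)$, expanding along the first row,
$$-\det(H_4) = -\det\begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 2 & 4 & 0 \\ 0 & 4 & 2 & 0 \\ -1 & 0 & 0 & 2 \end{pmatrix} = -\det\begin{pmatrix} 0 & 2 & 4 \\ 0 & 4 & 2 \\ -1 & 0 & 0 \end{pmatrix} = -12 < 0,$$
so the point is not a local extremum.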
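The calculations at the other two points are similar, so we consider only the point $(\lambda^*, x^*, y^*, z^*) = (2, 1, -1, 1)$.
The partial derivative $g_x(1, -1, 1) = -(-1) = 1 \ne 0$, so
$$H_3 = \begin{pmatrix} 0 & y & x \\ y & 2 & \lambda \\ x & \lambda & 2 \end{pmatrix} = \begin{pmatrix} 0 & -1 & 1 \\ -1 & 2 & 2 \\ 1 & 2 & 2 \end{pmatrix}.$$
Expanding $\det(H_3)$ along the first row,
$$-\det(H_3) = -\det\begin{pmatrix} 0 & -1 & 1 \\ -1 & 2 & 2 \\ 1 & 2 & 2 \end{pmatrix} = -\det\begin{pmatrix} -1 & 2 \\ 1 & 2 \end{pmatrix} - \det\begin{pmatrix} -1 & 2 \\ 1 & 2 \end{pmatrix} = 4 + 4 = 8 > 0.$$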
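Expanding $\det(H_4)$ along the fourth row,
$$-\det(H_4) = -\det\begin{pmatrix} 0 & -1 & 1 & -1 \\ -1 & 2 & 2 & 0 \\ 1 & 2 & 2 & 0 \\ -1 & 0 & 0 & 2 \end{pmatrix} = -\det\begin{pmatrix} -1 & 1 & -1 \\ 2 & 2 & 0 \\ 2 & 2 & 0 \end{pmatrix} - (2)\det\begin{pmatrix} 0 & -1 & 1 \\ -1 & 2 & 2 \\ 1 & 2 & 2 \end{pmatrix} = -(0) - 2(-8) = 16 > 0.$$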
Thus, this point is a local minimum. A similar calculation at $(\lambda^*, x^*, y^*, z^*) = (2, -1, 1, 1)$ shows that it is also
a local minimum. When working the problem originally, we found these two points as the minima.
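The whole of Example 2 can be reproduced in exact integer arithmetic. The sketch below builds $H_4$ from the formula above as a function of $(\lambda, x, y, z)$, chooses $H_3$ by the rule in equation (6) (here $g_x = -y$ and $g_y = -x$), and applies the sign conditions of Theorem 5. The helper names are illustrative.

```python
def det_cofactor(m):
    # Determinant by cofactor expansion along the first row
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det_cofactor([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def example2_matrices(lam, x, y, z):
    # Bordered Hessian (5) for f = x^2 + y^2 + z^2 and g = z - x*y, ordered (lam, x, y, z)
    H4 = [[0, y, x, -1],
          [y, 2, lam, 0],
          [x, lam, 2, 0],
          [-1, 0, 0, 2]]
    gx, gy = -y, -x
    if gx != 0 or gy != 0:
        H3 = [row[:3] for row in H4[:3]]  # first case of (6): rows/columns (lam, x, y)
    else:
        H3 = [[H4[i][j] for j in (0, 2, 3)] for i in (0, 2, 3)]  # second case: (lam, y, z)
    return H4, H3


def classify(lam, x, y, z):
    H4, H3 = example2_matrices(lam, x, y, z)
    d4, d3 = -det_cofactor(H4), -det_cofactor(H3)
    if d4 < 0:
        kind = "saddle"
    elif d4 > 0 and d3 > 0:
        kind = "local minimum"
    elif d4 > 0 and d3 < 0:
        kind = "local maximum"
    else:
        kind = "test inconclusive"
    return d4, d3, kind


for point in [(4, 0, 0, 2), (2, 1, -1, 1), (2, -1, 1, 1)]:
    print(point, classify(*point))
```

The output reports $-\det(H_4) = -12$ (a saddle) at the first point and $-\det(H_4) = 16$, $-\det(H_3) = 8$ (local minima) at the other two, in agreement with the hand calculation.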
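$$\begin{aligned}
f_x &= \lambda g_x + \mu h_x: & 0 &= \lambda + 2\mu x, \\
f_y &= \lambda g_y + \mu h_y: & 0 &= \lambda + 2\mu y, \\
f_z &= \lambda g_z + \mu h_z: & 1 &= \lambda - \mu.
\end{aligned}$$
From the third equation, we get $\lambda = 1 + \mu$, so we can eliminate this variable from the equations. They
become
$$0 = 1 + \mu + 2\mu x, \qquad 0 = 1 + \mu + 2\mu y.$$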
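The step that follows from these two equations, solving for $x$ and $y$ in terms of $\mu$, can be written out as a short derivation:

```latex
\begin{aligned}
(1 + \mu + 2\mu x) - (1 + \mu + 2\mu y) &= 2\mu\,(x - y) = 0, \\
\mu = 0 \text{ would give } 0 &= 1, \text{ so } \mu \ne 0 \text{ and } x = y, \\
x = y &= -\frac{1 + \mu}{2\mu}.
\end{aligned}
```

For instance, $\mu = 1/5$ gives $x = y = -(6/5)/(2/5) = -3$.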
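At the critical point with $x = y = -3$ and $\mu = 1/5$ (so $\lambda = 1 + \mu = 6/5$), the bordered Hessian in the variables
$(\lambda, \mu, x, y, z)$ is reduced by subtracting the fourth row from the third and the fifth row from the fourth,
reversing the order of the rows (an even permutation, so the sign is unchanged), and then eliminating below the diagonal:
$$\det(HL) = \det\begin{pmatrix} 0 & 0 & -1 & -1 & -1 \\ 0 & 0 & 6 & 6 & 1 \\ -1 & 6 & -2/5 & 0 & 0 \\ -1 & 6 & 0 & -2/5 & 0 \\ -1 & 1 & 0 & 0 & 0 \end{pmatrix}
= \det\begin{pmatrix} -1 & 1 & 0 & 0 & 0 \\ 0 & 5 & 0 & -2/5 & 0 \\ 0 & 0 & -2/5 & 2/5 & 0 \\ 0 & 0 & 6 & 6 & 1 \\ 0 & 0 & -1 & -1 & -1 \end{pmatrix}$$
$$= \det\begin{pmatrix} -1 & 1 & 0 & 0 & 0 \\ 0 & 5 & 0 & -2/5 & 0 \\ 0 & 0 & -2/5 & 2/5 & 0 \\ 0 & 0 & 0 & 12 & 1 \\ 0 & 0 & 0 & -2 & -1 \end{pmatrix}
= \det\begin{pmatrix} -1 & 1 & 0 & 0 & 0 \\ 0 & 5 & 0 & -2/5 & 0 \\ 0 & 0 & -2/5 & 2/5 & 0 \\ 0 & 0 & 0 & 12 & 1 \\ 0 & 0 & 0 & 0 & -5/6 \end{pmatrix}
= (-1)(5)\left(-\tfrac{2}{5}\right)(12)\left(-\tfrac{5}{6}\right) = -20 < 0.$$
Here $n = 3$ and $k = 2$, so Theorem 3 involves only $j = 5$, and $(-1)^{5-2}\det(H_5) = 20 > 0$.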
Therefore, this point is a local maximum.
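The elimination above can be double-checked in exact arithmetic. The sketch assumes the same bordered Hessian (ordering $(\lambda, \mu, x, y, z)$, with $x = y = -3$ and $\mu = 1/5$) and uses Python's `fractions.Fraction` with Gaussian elimination, an illustrative choice; it also prints the sign quantity $(-1)^{j-k}\det(H_j)$ of Theorem 3(b) for $j = 5$, $k = 2$.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination with row swaps."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


m = Fraction(-2, 5)  # L_xx = L_yy = -2*mu with mu = 1/5
HL = [[0, 0, -1, -1, -1],
      [0, 0, 6, 6, 1],
      [-1, 6, m, 0, 0],
      [-1, 6, 0, m, 0],
      [-1, 1, 0, 0, 0]]
d = det(HL)
print(d, (-1) ** (5 - 2) * d)  # prints: -20 20
```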
These answers are compatible with the values of $f(x, y, z)$ at the two critical points: the first is a global
minimum on the constraint set, and the second is a global maximum on the constraint set.