Multi Variable Optimization
1 Multivariable Optimization
In this chapter, we introduce 3-dimensional space (often called 3-space for short) and functions of several variables.
We then develop the theory of differentiation for functions of several variables, and discuss applications to optimization and finding local extreme points. The development of these topics is very similar to that of one-variable
calculus, save for some additional complexities due to the presence of extra variables.
• A function of several variables is, as the name indicates, a function that takes in several variables and outputs
a value associated to the inputs.
◦ Example: The function f (x, y) = x + y takes in two values x and y and outputs their sum x + y.
◦ Example: The function $g(x, y, z) = x y z^2$ takes in three values $x$, $y$, and $z$, and outputs the product $x y z^2$.
◦ Example: The function $d(l, w) = \sqrt{l^2 + w^2}$ gives the length of a diagonal of a rectangle as a function of the rectangle's length $l$ and its width $w$.
◦ Example: The function $V(r, h) = \frac{1}{3} \pi r^2 h$ gives the volume of a (right circular) cone as a function of its base radius $r$ and its height $h$.
• As with functions of one variable, a function of several variables has a domain and a range: the domain is the
set of input values and the range is the set of output values.
◦ For a function of two variables f (x, y), the domain is now a subset of the 2-dimensional plane rather
than a subset of the real line. Because of this, domains of functions of more than one variable can be
rather complicated.
◦ In general, unless specified, the domain of a function is the largest possible set of inputs for which the definition of the function makes sense. We generally adopt the conventions that square roots of negative real numbers are not allowed, nor is division by zero.
◦ Example: For $f(x, y) = \sqrt{x} + \sqrt{y}$, the domain is the first quadrant of the $xy$-plane, defined by the inequalities $x \ge 0$ and $y \ge 0$.
◦ Example: For $f(x, y) = \sqrt{x^2 + y^2 - 1}$, the domain is the set of points in the $xy$-plane which satisfy $x^2 + y^2 \ge 1$: this describes all the points of the plane except for those lying strictly inside the unit circle.
◦ Example: For $f(x, y) = \frac{1}{x - y}$, the domain is the set of points in the $xy$-plane which satisfy $x - y \ne 0$: this describes all points in the plane except those on the line $y = x$.
• Points in 3-space are represented by a triplet of numbers (x, y, z). The new coordinate z represents height
above the xy -plane.
◦ For example, in 3-space, we can measure the distance between any two points. By a suitable pair of applications of the Pythagorean Theorem, we can compute that the distance between the points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ is given by $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$.
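The distance formula translates directly into code. The following is a minimal Python sketch; the function name `distance` and the sample points are illustrative choices, not part of the notes.

```python
import math

def distance(p, q):
    """Euclidean distance between two points p and q in 3-space."""
    (x1, y1, z1), (x2, y2, z2) = p, q
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)

# Sample points chosen so the squared differences are 9, 16, and 0.
print(distance((1, 2, 3), (4, 6, 3)))
```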
• There are two primary ways to visualize a function f (x, y) of two variables.
• The first way is to plot the points $(x, y, z)$ in 3-dimensional space satisfying $z = f(x, y)$.
◦ At the point (x, y) in the plane, the graph has the height z = f (x, y); so we see that as (x, y) varies
through the plane, the function z = f (x, y) will trace out a surface, called the graph of f (x, y).
• The second way is to plot the points (x, y) in the plane on the level sets f (x, y) = c (for particular values of
c), as implicit curves.
◦ For a given function f (x, y) and a particular value of c, the points (x, y) satisfying f (x, y) = c are called
a level set of f. For a function of two variables these sets will generally be curves, so they are also
sometimes called level curves.
◦ If we graph many of these level curves together on the same axes, we will obtain a topographical map
of the function f (x, y).
• Some examples of level sets are given below:
◦ Example: The level sets for the function $f(x, y) = x^2 + y^2$ are circles in the plane. More specifically, the level set $x^2 + y^2 = c$ (for $c > 0$) is a circle with radius $\sqrt{c}$ centered at $(0, 0)$. For $c = 0$ the level set is just the single point $(0, 0)$, and for $c < 0$ the level sets do not contain any points at all. The first graph is the level set $x^2 + y^2 = 1$; the second graph contains level sets for $c = 1, 2, 3, \dots, 8, 9$.
[Figure: two plots of level sets of $f(x, y) = x^2 + y^2$, with both axes running from -3 to 3.]
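As a rough illustration of the circle description, the Python sketch below generates points on the level set $x^2 + y^2 = c$ by walking around a circle of radius $\sqrt{c}$ and confirms that each one satisfies $f(x, y) = c$. The helper name `level_circle_points` and the choice of eight sample points are assumptions made for this example.

```python
import math

def f(x, y):
    return x ** 2 + y ** 2

def level_circle_points(c, n=8):
    """Return n points on the level set f(x, y) = c, assuming c > 0."""
    r = math.sqrt(c)  # radius of the level circle
    return [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
            for k in range(n)]

for c in (1, 4, 9):
    points = level_circle_points(c)
    # Every generated point should satisfy f(x, y) = c up to rounding error.
    print(c, all(abs(f(x, y) - c) < 1e-9 for x, y in points))
```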
◦ Example: Here are two collections of level curves for the function $f(x, y) = x^2 - y^2$, along with the 3D graph $z = f(x, y)$. The first set of level curves is $f(x, y) = c$ for $c = -4, -3, -2, -1, 0$, and the second set of level curves is $f(x, y) = c$ for $c = 0, 1, 2, 3, 4$.
[Figure: the two collections of level curves of $f(x, y) = x^2 - y^2$, with both axes running from -3 to 3, and the 3D graph.]
∗ The graph of $z = x^2 - y^2$ is called a hyperbolic paraboloid (or more colloquially, a saddle) since it curves upward along the $x$-direction but downward along the $y$-direction. [The hyperbolic paraboloid is called that because it looks like a hyperbola in one cross-section, and a parabola in two others.]
• Example: Here are some level curves for the function $f(x, y) = (x^2 - y^2)^2 e^{-x^2 - y^2}$, along with a 3D graph of $z = f(x, y)$, both plotted on the region $-3 \le x \le 3$ and $-3 \le y \le 3$:
[Figure: level curves and 3D graph of $f(x, y) = (x^2 - y^2)^2 e^{-x^2 - y^2}$.]
◦ As one can see from comparing the plots, the level curves indicate where the function is changing in value: many level curves grouped closely together indicates that the function is changing quickly so that the function's graph will be steep, while having level curves grouped far apart means that the function's graph will be fairly flat.
• Here are a few more graphs z = f (x, y) for functions f (x, y):
◦ Example: The graph $z = x^3 - 3xy^2$ is sometimes called the monkey saddle, as it has three depressions rather than the two for the regular saddle (one for each leg, and one for the tail).
◦ Example: The graph $z = e^{3 - \sqrt{x^2 + y^2}/12} \cdot \cos\left(\sqrt{x^2 + y^2}\right)$ produces a surface that looks like ripples in a pool of water.
• We can also represent some information graphically for functions f (x, y, z) of three variables. Unfortunately,
we cannot really produce proper graphs of these functions, since plotting a graph w = f (x, y, z) would require
drawing a 4-dimensional picture.
• However, we can still talk about level sets of functions of 3 variables: these are the points $(x, y, z)$ satisfying a relation $f(x, y, z) = c$, and they will (in general) give rise to level surfaces in 3-dimensional space.
◦ Any graph z = g(x, y) is an example of a level surface for the function f (x, y, z) = g(x, y) − z : the points
(x, y, z) with z = g(x, y) are the same as those with f (x, y, z) = 0.
• Example: The level surfaces for the function $f(x, y, z) = x^2 + y^2 + z^2$ are spheres centered at the origin: by the distance formula, the points satisfying $x^2 + y^2 + z^2 = c$ (for $c > 0$) are precisely those which are at a distance of $\sqrt{c}$ from the origin, but this is just another way of describing the sphere of radius $\sqrt{c}$ centered at $(0, 0, 0)$. A graph of the sphere $x^2 + y^2 + z^2 = 1$ is below.
• Example: The level surfaces for the function $f(x, y, z) = x^2 + z^2$ are (right circular) cylinders running along the $y$-axis: again, from the distance formula, points satisfying $x^2 + z^2 = c$ (for $c > 0$) are those which are at a distance of $\sqrt{c}$ from the $y$-axis, but this is just another way of describing a right circular cylinder oriented along the $y$-axis. A graph of the cylinder $x^2 + z^2 = 1$ is above.
• Example: Let us examine the level surfaces for the function $f(x, y, z) = x^2 + y^2 - z^2$.
◦ The set of points satisfying $x^2 + y^2 - z^2 = c$ for $c > 0$ forms a surface called a hyperboloid of one sheet, so named because the shape is a hyperbola in two cross-sections and an ellipse in the third, and the graph is connected (one sheet).
∗ Remark: A hyperboloid of one sheet whose cross-sections are circles is an example of what is called a ruled surface: through each point on the surface pass two lines which are contained in the surface. Such surfaces can therefore be (physically) constructed using materials which are not curved. The hyperboloid of one sheet, in particular, is a common design for cooling towers.
◦ The set of points satisfying $x^2 + y^2 - z^2 = 0$ forms a (right circular) double cone whose axis is the $z$-axis and whose center point is at the origin $(0, 0, 0)$.
◦ The set of points satisfying $x^2 + y^2 - z^2 = c$ for $c < 0$ forms a surface called a hyperboloid of two sheets, so named because the shape is a hyperbola in two cross-sections and an ellipse in the third, and the graph consists of two pieces (two sheets).
• Partial derivatives are simply the usual notion of differentiation applied to functions of more than one variable. However, since we now have more than one variable, we also have more than one natural way to compute a derivative.
◦ As with the definition of the derivative of a one-variable function, the derivatives of a function of several variables are formally defined using limits.
◦ But like in the one-variable case, the formal definition of limit is cumbersome and generally not easy to use, even for simple functions.
◦ Fortunately, we will not need to use limits to compute partial derivatives, since for all of the functions we will discuss, the computation of partial derivatives reduces to a version of the usual one-variable derivative.
• Definition: For a function $f(x, y)$ of two variables, we define the partial derivative of $f$ with respect to $x$ as $\frac{\partial f}{\partial x} = f_x = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}$ and the partial derivative of $f$ with respect to $y$ as $\frac{\partial f}{\partial y} = f_y = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}$.
◦ Notation: In multivariable calculus, we use the symbol ∂ (typically pronounced either like the letter d or
as del) to denote taking a derivative, in contrast to single-variable calculus where we use the symbol d.
◦ We will frequently use both notations $\frac{\partial f}{\partial y}$ and $f_y$ to denote partial derivatives: we generally use the difference quotient notation when we want to emphasize a formal property of a derivative, and the subscript notation when we want to save space.
◦ Geometrically, the partial derivative fx captures how fast the function f is changing in the x-direction,
and fy captures how fast f is changing in the y -direction.
• To evaluate a partial derivative of the function f with respect to x, we need only pretend that all the other
variables (i.e., everything except x) that f depends on are constants, and then just evaluate the derivative of
f with respect to x as a normal one-variable derivative.
◦ All of the derivative rules (the Product Rule, Quotient Rule, Chain Rule, etc.) from one-variable calculus still hold: there will just be extra variables floating around.
• Example: Find $f_x$ and $f_y$ for $f(x, y) = x^3 y^2 + e^x$.
◦ For $f_x$, we treat $y$ as a constant and $x$ as the variable. Thus, we see that $f_x = 3x^2 \cdot y^2 + e^x$.
◦ Similarly, to find $f_y$, we instead treat $x$ as a constant and $y$ as the variable, to get $f_y = x^3 \cdot 2y + 0 = 2x^3 y$. (Note in particular that the derivative of $e^x$ with respect to $y$ is zero.)
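One way to sanity-check a hand-computed partial derivative is to compare it with a finite-difference estimate. The sketch below assumes the function $f(x, y) = x^3 y^2 + e^x$ (consistent with the partials $f_x = 3x^2 y^2 + e^x$ and $f_y = 2x^3 y$ above) and uses a central difference rather than the one-sided quotient in the limit definition, since it is more accurate for a fixed step size. The helper names and the sample point $(1, 2)$ are illustrative.

```python
import math

def partial_x(f, x, y, h=1e-6):
    """Central-difference estimate of f_x: vary x, hold y fixed."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """Central-difference estimate of f_y: vary y, hold x fixed."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def f(x, y):
    return x ** 3 * y ** 2 + math.exp(x)

x0, y0 = 1.0, 2.0
fx_exact = 3 * x0 ** 2 * y0 ** 2 + math.exp(x0)  # 3x^2 y^2 + e^x
fy_exact = 2 * x0 ** 3 * y0                      # 2x^3 y

print(abs(partial_x(f, x0, y0) - fx_exact) < 1e-5)
print(abs(partial_y(f, x0, y0) - fy_exact) < 1e-5)
```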
• Example: Find $f_x$ and $f_y$ for $f(x, y) = \ln(x^2 + y^2)$.
◦ For $f_x$, we treat $y$ as a constant and $x$ as the variable. We can apply the Chain Rule to get $f_x = \frac{2x}{x^2 + y^2}$, since the derivative of the inner function $x^2 + y^2$ with respect to $x$ is $2x$.
◦ Similarly, we can use the Chain Rule to find the partial derivative $f_y = \frac{2y}{x^2 + y^2}$.
• Example: Find $f_x$ and $f_y$ for $f(x, y) = \frac{e^{xy}}{x^2 + x}$.
◦ For $f_x$ we apply the Quotient Rule: $f_x = \frac{\frac{\partial}{\partial x}\left[e^{xy}\right] \cdot (x^2 + x) - e^{xy} \cdot \frac{\partial}{\partial x}\left[x^2 + x\right]}{(x^2 + x)^2}$. Then we can evaluate the derivatives in the numerator to get $f_x = \frac{(y e^{xy}) \cdot (x^2 + x) - e^{xy} \cdot (2x + 1)}{(x^2 + x)^2}$.
◦ For $f_y$, the calculation is easier because the denominator is not a function of $y$. So in this case, we just need to use the Chain Rule to see that $f_y = \frac{1}{x^2 + x} \cdot (x e^{xy})$.
• We can generalize partial derivatives to functions of more than two variables: for each input variable, we get a partial derivative with respect to that variable. The procedure remains the same: treat all variables except the variable of interest as constants, and then differentiate with respect to the variable of interest.
• Example: Find $f_x$, $f_y$, and $f_z$ for $f(x, y, z) = y z e^{2x^2 - y}$.
◦ By the Chain Rule we have $f_x = y z \cdot e^{2x^2 - y} \cdot 4x$. (We don't need the Product Rule for $f_x$ since $y$ and $z$ are constants.)
◦ For $f_y$ we need to use the Product Rule since $f$ is a product of two nonconstant functions of $y$. We get $f_y = z \cdot e^{2x^2 - y} + y z \cdot \frac{\partial}{\partial y}\left[e^{2x^2 - y}\right]$, and then using the Chain Rule gives $f_y = z e^{2x^2 - y} - y z \cdot e^{2x^2 - y}$.
◦ For $f_z$, all of the terms except for $z$ are constants, so we have $f_z = y e^{2x^2 - y}$.
• Like in the one-variable case, we also have higher-order partial derivatives, obtained by taking a partial
derivative of a partial derivative.
◦ For a function of two variables, there are four second-order partial derivatives $f_{xx} = \frac{\partial}{\partial x}[f_x]$, $f_{xy} = \frac{\partial}{\partial y}[f_x]$, $f_{yx} = \frac{\partial}{\partial x}[f_y]$, and $f_{yy} = \frac{\partial}{\partial y}[f_y]$.
◦ Remark: Partial derivatives in subscript notation are applied left-to-right, while partial derivatives in differential operator notation are applied right-to-left. (In practice, the order of the partial derivatives rarely matters, as we will see.)
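• Example: Find the second-order partial derivatives $f_{xx}$, $f_{xy}$, $f_{yx}$, and $f_{yy}$ for $f(x, y) = x^3 y^4 + y e^{2x}$.
◦ First, we have $f_x = 3x^2 y^4 + 2y e^{2x}$ and $f_y = 4x^3 y^3 + e^{2x}$.
◦ Then we have $f_{xx} = \frac{\partial}{\partial x}\left[3x^2 y^4 + 2y e^{2x}\right] = 6x y^4 + 4y e^{2x}$ and $f_{xy} = \frac{\partial}{\partial y}\left[3x^2 y^4 + 2y e^{2x}\right] = 12x^2 y^3 + 2e^{2x}$.
◦ Also we have $f_{yx} = \frac{\partial}{\partial x}\left[4x^3 y^3 + e^{2x}\right] = 12x^2 y^3 + 2e^{2x}$ and $f_{yy} = \frac{\partial}{\partial y}\left[4x^3 y^3 + e^{2x}\right] = 12x^3 y^2$.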
• Notice that fxy = fyx for the function in the example above. This is not an accident:
• Theorem (Clairaut): If both partial derivatives fxy and fyx are continuous, then they are equal.
◦ In other words, these mixed partials are always equal (given mild assumptions about continuity), so
there are really only three second-order partial derivatives.
◦ This theorem can be proven using the limit definition of derivative and the Mean Value Theorem, but the details are unenlightening.
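A numerical spot-check can make the equality of mixed partials concrete. The sketch below takes the first partials $f_x = 3x^2 y^4 + 2y e^{2x}$ and $f_y = 4x^3 y^3 + e^{2x}$ of $f(x, y) = x^3 y^4 + y e^{2x}$, differentiates each with respect to the other variable by a central difference, and compares both estimates with $12x^2 y^3 + 2e^{2x}$. The helper names, step size, and sample point are assumptions made for illustration; this checks one point only and is not a proof.

```python
import math

def f_x(x, y):
    return 3 * x ** 2 * y ** 4 + 2 * y * math.exp(2 * x)

def f_y(x, y):
    return 4 * x ** 3 * y ** 3 + math.exp(2 * x)

def d_dx(g, x, y, h=1e-5):
    """Central-difference estimate of the x-derivative of g."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, x, y, h=1e-5):
    """Central-difference estimate of the y-derivative of g."""
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 0.5, 1.0
f_xy = d_dy(f_x, x0, y0)  # differentiate f_x with respect to y
f_yx = d_dx(f_y, x0, y0)  # differentiate f_y with respect to x
exact = 12 * x0 ** 2 * y0 ** 3 + 2 * math.exp(2 * x0)

print(abs(f_xy - f_yx) < 1e-6, abs(f_xy - exact) < 1e-6)
```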
• We can continue on and take higher-order partial derivatives. For example, a function $f(x, y)$ has eight third-order partial derivatives: $f_{xxx}$, $f_{xxy}$, $f_{xyx}$, $f_{xyy}$, $f_{yxx}$, $f_{yxy}$, $f_{yyx}$, and $f_{yyy}$.
◦ By Clairaut's Theorem, we can reorder the partial derivatives any way we want (if they are continuous, which is almost always the case). Thus, $f_{xxy} = f_{xyx} = f_{yxx}$, and $f_{xyy} = f_{yxy} = f_{yyx}$.
◦ So in fact, $f(x, y)$ only has four different third-order partial derivatives: $f_{xxx}$, $f_{xxy}$, $f_{xyy}$, $f_{yyy}$.
• Example: Find the third-order partial derivatives $f_{xxx}$, $f_{xxy}$, $f_{xyy}$, $f_{yyy}$ for $f(x, y) = x^4 y^2 + x^3 e^y$.
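No worked solution accompanies this example, so here is one possible way to organize the computation, differentiating one variable at a time. This is a sketch supplied for illustration, not the notes' own derivation.

```latex
\begin{aligned}
f_x &= 4x^3y^2 + 3x^2e^y, & f_y &= 2x^4y + x^3e^y, \\
f_{xx} &= 12x^2y^2 + 6xe^y, & f_{xy} &= 8x^3y + 3x^2e^y, \\
f_{yy} &= 2x^4 + x^3e^y, & f_{xxx} &= 24xy^2 + 6e^y, \\
f_{xxy} &= 24x^2y + 6xe^y, & f_{xyy} &= 8x^3 + 3x^2e^y, \\
f_{yyy} &= x^3e^y.
\end{aligned}
```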
• Example: If all 5th-order partial derivatives of $f(x, y, z)$ are continuous and $f_{xyz} = e^{xyz}$, what is $f_{zzyyx}$?
◦ By Clairaut's theorem, we can differentiate in any order, and so $f_{zzyyx} = f_{xyzyz} = (f_{xyz})_{yz}$.
◦ Since $f_{xyz} = e^{xyz}$ we obtain $(f_{xyz})_y = xz e^{xyz}$ and then $(f_{xyz})_{yz} = x e^{xyz} + x^2 yz e^{xyz}$.
• Now that we have developed the basic ideas of derivatives for functions of several variables, we would like to know how to find minima and maxima of functions of several variables.
• We will primarily discuss functions of two variables, because there is a not-too-hard criterion for deciding
whether a critical point is a minimum or a maximum.
◦ Classifying critical points for functions of more than two variables requires some results from linear
algebra, so we will not treat functions of more than two variables except for a special case we discuss in
the next section.
• Definition: A local minimum is a critical point where the nearby values of $f$ are always bigger, a local maximum is a critical point where the nearby values of $f$ are always smaller, and a saddle point is a critical point where the nearby values of $f$ are bigger in some directions and smaller in others.
◦ Here are plots of the three examples:
• We would first like to determine where a function $f$ can have a minimum or maximum value.
◦ If fx (P ) > 0, then by moving slightly in the positive x-direction the value of f will increase, and by
moving slightly in the negative x-direction the value of f will decrease.
◦ Inversely, if fx (P ) < 0, then by moving slightly in the negative x-direction the value of f will increase,
and by moving slightly in the positive x-direction the value of f will decrease.
◦ Thus, f can only have a local minimum or maximum at P if fx (P ) = 0. By the same reasoning with y
in place of x, we must also have fy (P ) = 0 at a local minimum or maximum.
• Definition: A critical point of the function $f(x, y)$ is a point $(x_0, y_0)$ such that $f_x(x_0, y_0) = 0 = f_y(x_0, y_0)$, or either $f_x(x_0, y_0)$ or $f_y(x_0, y_0)$ is undefined.
◦ By the observations above, a local minimum or maximum of a function can only occur at a critical point.
• Example: Find the critical points of $g(x, y) = x^2 + y^2$.
◦ We have $g_x = 2x$ and $g_y = 2y$. Since both partial derivatives are defined everywhere, the only critical points will occur when $g_x = g_y = 0$.
◦ We see $g_x = 0$ precisely when $x = 0$ and $g_y = 0$ precisely when $y = 0$.
◦ Thus, there is a unique critical point $(x, y) = (0, 0)$.
• Example: Find the critical points of $h(x, y) = x^2 - 4x + y^2 + 2y$.
◦ We have $h_x = 2x - 4$ and $h_y = 2y + 2$. Since both partial derivatives are defined everywhere, the only critical points will occur when $h_x = h_y = 0$.
◦ We see $h_x = 0$ precisely when $x = 2$ and $h_y = 0$ precisely when $y = -1$.
◦ Thus, there is a unique critical point $(x, y) = (2, -1)$.
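• Example: Find the critical points of $f(x, y) = x^3 + y^3 - 3xy$.
◦ We have $f_x = 3x^2 - 3y$ and $f_y = 3y^2 - 3x$. Since both partial derivatives are defined everywhere, the only critical points will occur when $f_x = f_y = 0$.
◦ This gives the two equations $3x^2 - 3y = 0$ and $3y^2 - 3x = 0$, or, equivalently, $x^2 = y$ and $y^2 = x$.
◦ Plugging the first equation into the second yields $x^4 = x$: thus, $x^4 - x = 0$, and factoring yields $x(x - 1)(x^2 + x + 1) = 0$.
◦ The only real solutions are $x = 0$ (which then gives $y = x^2 = 0$) and $x = 1$ (which then gives $y = x^2 = 1$).
◦ Therefore, there are two critical points: $(0, 0)$ and $(1, 1)$.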
• Example: Find the critical points of the function $f(x, y) = x e^{y^2 - 2x^2}$.
◦ We have $f_x = e^{y^2 - 2x^2} + x e^{y^2 - 2x^2} \cdot (-4x) = (1 - 4x^2) e^{y^2 - 2x^2}$ and $f_y = x e^{y^2 - 2x^2} \cdot (2y) = 2xy e^{y^2 - 2x^2}$. Since both partial derivatives are defined everywhere, the only critical points will occur when $f_x = f_y = 0$.
◦ Since exponentials are never zero, we see that $f_x = 0$ when $1 - 4x^2 = 0$ so that $x = \pm \frac{1}{2}$, while $f_y = 0$ when $2xy = 0$ so that $x = 0$ or $y = 0$. Since $x$ cannot be zero by the first equation (since $x = \pm \frac{1}{2}$) we must have $y = 0$.
◦ Therefore, there are two critical points: $(\frac{1}{2}, 0)$ and $(-\frac{1}{2}, 0)$.
• Now that we have a list of critical points (namely, the places that a function could potentially have a minimum
or maximum value) we would like to know whether those points actually are minima or maxima of f.
• Definition: The discriminant (also called the Hessian) at a critical point is the value $D = f_{xx} \cdot f_{yy} - (f_{xy})^2$, where each of the second-order partials is evaluated at the critical point.
◦ One way to remember the definition of the discriminant is as the determinant of the matrix of the four second-order partials: $D = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix}$. (We are implicitly using the fact that $f_{xy} = f_{yx}$.)
◦ Example: For $g(x, y) = x^2 + y^2$ we have $g_{xx} = g_{yy} = 2$ and $g_{xy} = 0$ so $D = 4$ at the origin.
◦ Example: For $h(x, y) = x^2 - y^2$ we have $h_{xx} = 2$, $h_{yy} = -2$, and $h_{xy} = 0$ so $D = -4$ at the origin.
◦ Remark: The reason this value is named discriminant can be seen by computing $D$ for the function $p(x, y) = ax^2 + bxy + cy^2$: the result is $D = 4ac - b^2$, which is $-1$ times the quantity $b^2 - 4ac$, the famous discriminant for the quadratic polynomial $ax^2 + bx + c$. (Recall that the discriminant of $ax^2 + bx + c$ determines how many real roots the polynomial has.)
• Theorem (Second Derivatives Test): Suppose $P$ is a critical point of $f(x, y)$, and let $D$ be the value of the discriminant $f_{xx} f_{yy} - f_{xy}^2$ at $P$. If $D > 0$ and $f_{xx} > 0$, then the critical point is a local minimum. If $D > 0$ and $f_{xx} < 0$, then the critical point is a local maximum. If $D < 0$, then the critical point is a saddle point. (If $D = 0$, then the test is inconclusive.)
◦ Proof (outline): Assume for simplicity that $P$ is at the origin. Then one may show that the function $f(x, y) - f(P)$ is closely approximated by the polynomial $ax^2 + bxy + cy^2$, where $a = \frac{1}{2} f_{xx}$, $b = f_{xy}$, and $c = \frac{1}{2} f_{yy}$. If $D \ne 0$, then the behavior of $f(x, y)$ near the critical point $P$ will be the same as that quadratic polynomial. Completing the square and examining whether the resulting quadratic polynomial has any real roots and whether it opens upwards or downwards yields the test.
• We can combine the above results to yield a procedure for finding and classifying the critical points of a function $f(x, y)$:
◦ Step 1: Compute both partial derivatives $f_x$ and $f_y$.
◦ Step 2: Find all points $(x, y)$ where both partial derivatives are zero, or where at least one of them is undefined.
∗ It may require some algebraic manipulation to find the solutions: a basic technique is to solve one equation for one of the variables, and then plug the result into the other equation. Another technique is to try to factor one of the equations and then analyze cases.
◦ Step 3: At each critical point, evaluate $D = f_{xx} \cdot f_{yy} - (f_{xy})^2$ and apply the Second Derivatives Test: If $D > 0$ and $f_{xx} > 0$: local minimum. If $D > 0$ and $f_{xx} < 0$: local maximum. If $D < 0$: saddle point.
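The classification step can be written as a small function. The Python sketch below takes the three second-order partials, already evaluated at a critical point, and returns the verdict of the Second Derivatives Test; the function name `classify` and the returned strings are illustrative choices.

```python
def classify(fxx, fyy, fxy):
    """Apply the Second Derivatives Test to second partials at a critical point."""
    D = fxx * fyy - fxy ** 2  # the discriminant
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"  # D = 0: the test gives no information

# g(x, y) = x^2 + y^2 at the origin: g_xx = g_yy = 2, g_xy = 0.
print(classify(2, 2, 0))
# h(x, y) = x^2 - y^2 at the origin: h_xx = 2, h_yy = -2, h_xy = 0.
print(classify(2, -2, 0))
```

Note that the case $D > 0$ with $f_{xx} = 0$ cannot occur, since then $D = -(f_{xy})^2 \le 0$.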
• Example: Verify that $f(x, y) = x^2 + y^2$ has only one critical point, a minimum at the origin.
◦ First, we have $f_x = 2x$ and $f_y = 2y$. Since they are both defined everywhere, we need only find where they are both zero.
◦ Setting both partial derivatives equal to zero yields $x = 0$ and $y = 0$, so the only critical point is $(0, 0)$.
◦ To classify the critical points, we compute $f_{xx} = 2$, $f_{xy} = 0$, and $f_{yy} = 2$. Then $D = 2 \cdot 2 - 0^2 = 4$.
◦ So, by the classification test, since $D > 0$ and $f_{xx} > 0$ at $(0, 0)$, we see that $(0, 0)$ is a local minimum.
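• Example: For the function $f(x, y) = 3x^2 + 2y^3 - 6xy$, find the critical points and classify them as minima, maxima, or saddle points.
◦ First, we have $f_x = 6x - 6y$ and $f_y = 6y^2 - 6x$. Since they are both defined everywhere, we need only find where they are both zero.
◦ Next, we can see that $f_x$ is zero only when $y = x$. Then the equation $f_y = 0$ becomes $6x^2 - 6x = 0$, which by factoring we can see has solutions $x = 0$ or $x = 1$. Since $y = x$, we conclude that $(0, 0)$ and $(1, 1)$ are the critical points.
◦ To classify the critical points, we compute $f_{xx} = 6$, $f_{xy} = -6$, and $f_{yy} = 12y$. Then $D(0, 0) = 6 \cdot 0 - (-6)^2 = -36 < 0$ and $D(1, 1) = 6 \cdot 12 - (-6)^2 = 36 > 0$.
◦ So, by the classification test, $(0, 0)$ is a saddle point and $(1, 1)$ is a local minimum (since $f_{xx} > 0$ there).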
• Example: For the function $g(x, y) = x^3 y - 3xy^3 + 8y$, find the critical points and classify them as minima, maxima, or saddle points.
◦ First, we have $g_x = 3x^2 y - 3y^3$ and $g_y = x^3 - 9xy^2 + 8$. Since they are both defined everywhere, we need only find where they are both zero.
◦ Next, we set both partial derivatives equal to zero. Since $g_x = 3y(x^2 - y^2) = 3y(x + y)(x - y)$, we see that $g_x = 0$ precisely when $y = 0$ or $y = x$ or $y = -x$.
◦ If $y = 0$, then $g_y = 0$ implies $x^3 + 8 = 0$, so that $x = -2$. This yields the point $(x, y) = (-2, 0)$.
◦ If $y = x$, then $g_y = 0$ implies $-8x^3 + 8 = 0$, so that $x = 1$. This yields the point $(x, y) = (1, 1)$.
◦ If $y = -x$, then $g_y = 0$ implies $-8x^3 + 8 = 0$, so that $x = 1$. This yields the point $(x, y) = (1, -1)$.
◦ To summarize, we see that $(-2, 0)$, $(1, 1)$, and $(1, -1)$ are critical points.
◦ To classify the critical points, we compute $g_{xx} = 6xy$, $g_{xy} = 3x^2 - 9y^2$, and $g_{yy} = -18xy$.
◦ Then $D(-2, 0) = 0 \cdot 0 - (12)^2 < 0$, $D(1, 1) = 6 \cdot (-18) - (-6)^2 < 0$, and $D(1, -1) = (-6) \cdot (18) - (-6)^2 < 0$.
◦ So, by the classification test, $(-2, 0)$, $(1, 1)$, and $(1, -1)$ are all saddle points.
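The arithmetic in this example is easy to confirm mechanically. The sketch below hard-codes the partial derivatives of $g(x, y) = x^3 y - 3xy^3 + 8y$ computed above, checks that both first partials vanish at each of the three points, and evaluates the discriminant there. The function names are illustrative.

```python
def g_x(x, y):
    return 3 * x ** 2 * y - 3 * y ** 3

def g_y(x, y):
    return x ** 3 - 9 * x * y ** 2 + 8

def discriminant(x, y):
    """D = g_xx * g_yy - (g_xy)^2 for g(x, y) = x^3 y - 3 x y^3 + 8 y."""
    g_xx = 6 * x * y
    g_xy = 3 * x ** 2 - 9 * y ** 2
    g_yy = -18 * x * y
    return g_xx * g_yy - g_xy ** 2

for point in [(-2, 0), (1, 1), (1, -1)]:
    # Both first partials should vanish, and D should be negative (saddle).
    print(point, g_x(*point), g_y(*point), discriminant(*point))
```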
• Example: Find the value of the function $h(x, y) = x + 2y^4 - \ln(x^4 y^8)$ at its local minimum, for $x$ and $y$ positive.
◦ To solve this problem, we will search for all critical points of $h(x, y)$ that are minima.
◦ First, we have $h_x = 1 - \frac{4x^3 y^8}{x^4 y^8} = 1 - \frac{4}{x}$ and $h_y = 8y^3 - \frac{8x^4 y^7}{x^4 y^8} = 8y^3 - \frac{8}{y}$. Both partial derivatives are defined everywhere in the given domain.
◦ We see that $h_x = 0$ only when $x = 4$, and also that $h_y = 0$ is equivalent to $\frac{8}{y}(y^4 - 1) = 0$, which holds for $y = \pm 1$. Since we only want $y > 0$, there is a unique critical point: $(4, 1)$.
◦ Next, we compute $h_{xx} = \frac{4}{x^2}$, $h_{xy} = 0$, and $h_{yy} = 24y^2 + \frac{8}{y^2}$. Then $D(4, 1) = \frac{1}{4} \cdot 32 - 0^2 = 8 > 0$, and $h_{xx}(4, 1) = \frac{1}{4} > 0$.
◦ Thus, there is a unique critical point, and it is a minimum. Therefore, we conclude that the function has a local minimum at $(4, 1)$, and the minimum value is $h(4, 1) = 6 - \ln(4^4)$.
• Example: Find the minimum distance between a point on the plane $x + y + z = 1$ and the point $(2, -1, -2)$.
◦ The distance from the point $(x, y, z)$ to $(2, -1, -2)$ is $d = \sqrt{(x - 2)^2 + (y + 1)^2 + (z + 2)^2}$. Since $x + y + z = 1$ on the plane, we have $z + 2 = 3 - x - y$, so we can view this as a function of $x$ and $y$ only: $d(x, y) = \sqrt{(x - 2)^2 + (y + 1)^2 + (3 - x - y)^2}$.
◦ We could minimize $d(x, y)$ by finding its critical points and searching for a minimum, but it will be much easier to find the minimum value of the squared distance $f(x, y) = d(x, y)^2 = (x - 2)^2 + (y + 1)^2 + (3 - x - y)^2$.
◦ We compute $f_x = 2(x - 2) - 2(3 - x - y) = 4x + 2y - 10$ and $f_y = 2(y + 1) - 2(3 - x - y) = 2x + 4y - 4$. Both partial derivatives are defined everywhere, so we need only find where they are both zero.
◦ Setting $f_x = 0$ and solving for $y$ yields $y = 5 - 2x$, and then plugging this into $f_y = 0$ yields $2x + 4(5 - 2x) - 4 = 0$, so that $-6x + 16 = 0$. Thus, $x = 8/3$ and then $y = -1/3$.
◦ Furthermore, we have $f_{xx} = 4$, $f_{xy} = 2$, and $f_{yy} = 4$, so that $D = f_{xx} f_{yy} - f_{xy}^2 = 12 > 0$. Since also $f_{xx} > 0$, the point $(x, y) = (8/3, -1/3)$ is a local minimum.
◦ Thus, there is a unique critical point, and it is a minimum. We conclude that the distance function has its minimum at $(8/3, -1/3)$, so the minimum distance is $d(8/3, -1/3) = \sqrt{(2/3)^2 + (2/3)^2 + (2/3)^2} = 2/\sqrt{3}$.
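As a cross-check on the value $2/\sqrt{3}$, the sketch below evaluates the squared-distance function at $(8/3, -1/3)$ and compares the result with the standard point-to-plane distance formula $|ax_0 + by_0 + cz_0 - d| / \sqrt{a^2 + b^2 + c^2}$ for the plane $ax + by + cz = d$. That formula is not derived in these notes and is used here only as an independent check; the variable names are illustrative.

```python
import math

def squared_distance(x, y):
    """Squared distance from (x, y, 1 - x - y) on the plane to (2, -1, -2)."""
    return (x - 2) ** 2 + (y + 1) ** 2 + (3 - x - y) ** 2

x0, y0 = 8 / 3, -1 / 3
d_critical = math.sqrt(squared_distance(x0, y0))

# Point-to-plane formula for the plane x + y + z = 1 and the point (2, -1, -2).
d_formula = abs(2 + (-1) + (-2) - 1) / math.sqrt(1 ** 2 + 1 ** 2 + 1 ** 2)

print(d_critical, d_formula, 2 / math.sqrt(3))
```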
• We now discuss the problem of finding the minimum and maximum values of a function on a region in the plane, rather than the entire plane itself.
◦ In general, if the region is not closed (i.e., does not contain its boundary, like the region $x^2 + y^2 < 1$ which does not contain the boundary circle $x^2 + y^2 = 1$) or not bounded (i.e., extends infinitely far away from the origin, like the half-plane $x \ge 0$) then a continuous function may not attain its minimum or maximum values anywhere in the region.
◦ In order to ensure that a function does attain its minimum and maximum values at some point inside
the region, the region must be both closed and bounded. If the region is not bounded or not closed, we
must additionally study what happens to the function as we approach the region's boundary, or what
happens as we move far away from the origin.
• A natural first step is to find the critical points of the function. However, if we want to find the absolute minimum or maximum of a function $f(x, y)$ on a closed and bounded region, we must also analyze the function's behavior on the boundary of the region, because the boundary could contain the minimum or maximum.
• Unfortunately, unlike the case of a function of one variable where the boundary of an interval [a, b] is very
simple (namely, the two values x = a and x = b), the boundary of a region in the plane or in higher-dimensional
space can be rather complicated.
◦ Ultimately, one needs to find a parametrization $(x(t), y(t))$ of the boundary of the region, or some other description. (This may require breaking the boundary into several pieces, depending on the shape of the region.)
◦ Then, by plugging the parametrization of the boundary curve into the function, we obtain a function $f(x(t), y(t))$ of the single variable $t$, which we can then analyze to determine the behavior of the function on the boundary.
• To find the absolute minimum and maximum values of a function on a given closed and bounded region $R$, follow these steps:
◦ Step 1: Find all of the critical points of $f$ that lie inside the region $R$.
◦ Step 2: Parametrize the boundary of the region $R$ (separating into several components if necessary) as $x = x(t)$ and $y = y(t)$, then plug in the parametrization to obtain a function of $t$, $f(x(t), y(t))$. Then search for boundary-critical points, where the $t$-derivative $\frac{d}{dt} f(x(t), y(t))$ is zero. Also include endpoints, if the boundary components have them.
∗ A line segment from $(a, b)$ to $(c, d)$ can be parametrized by $x(t) = a + t(c - a)$, $y(t) = b + t(d - b)$, for $0 \le t \le 1$.
∗ A curve of the form $y = g(x)$ can be parametrized by $x(t) = t$, $y(t) = g(t)$.
◦ Step 3: Plug the full list of critical and boundary-critical points into $f$, and find the largest and smallest values.
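The segment parametrization in Step 2 can be sketched in a few lines of Python. The helpers `segment` and `restrict` are hypothetical names introduced here, and the function and segment used to try them out are arbitrary illustrations.

```python
def segment(a, b, c, d):
    """Parametrize the segment from (a, b) to (c, d) for 0 <= t <= 1."""
    return lambda t: (a + t * (c - a), b + t * (d - b))

def restrict(f, curve):
    """Return the one-variable function t -> f(x(t), y(t))."""
    return lambda t: f(*curve(t))

def f(x, y):
    return x ** 2 - x * y + y

# Restrict f to the segment from (2, 0) to (2, 3); algebraically this is 4 - 3t.
edge = segment(2, 0, 2, 3)
f_on_edge = restrict(f, edge)

for t in (0.0, 0.5, 1.0):
    print(t, edge(t), f_on_edge(t))
```

Once the restriction is a function of $t$ alone, it can be analyzed with ordinary one-variable calculus, as Step 2 describes.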
• Example: Find the absolute maximum and minimum of $f(x, y) = x^2 - xy + y$ on the rectangle $0 \le x \le 2$, $0 \le y \le 3$.
◦ First, we find the critical points: since $f_x = 2x - y$ and $f_y = -x + 1$, there is a single critical point $(1, 2)$.
◦ Next, we analyze the boundary of the region. Here, the boundary has 4 components.
∗ Component #1, a line segment from $(0, 0)$ to $(2, 0)$: This component is parametrized by $x = 2t$, $y = 0$ for $0 \le t \le 1$. On this component we have $f(2t, 0) = 4t^2$, which has a critical point at $t = 0$ corresponding to $(x, y) = (0, 0)$. We also have boundary points $(0, 0)$, $(2, 0)$.
∗ Component #2, a line segment from $(2, 0)$ to $(2, 3)$: This component is parametrized by $x = 2$, $y = 3t$ for $0 \le t \le 1$. On this component we have $f(2, 3t) = 4 - 3t$, which has no critical point. We get only the boundary points $(2, 0)$, $(2, 3)$.
∗ Component #3, a line segment from $(0, 3)$ to $(2, 3)$: This component is parametrized by $x = 2t$, $y = 3$ for $0 \le t \le 1$. On this component we have $f(2t, 3) = 4t^2 - 6t + 3$, which has a critical point at $t = 3/4$ corresponding to $(x, y) = (3/2, 3)$. We also have boundary points $(0, 3)$, $(2, 3)$.
∗ Component #4, a line segment from $(0, 0)$ to $(0, 3)$: This component is parametrized by $x = 0$, $y = 3t$ for $0 \le t \le 1$. On this component we have $f(0, 3t) = 3t$, which has no critical point. We get only the boundary points $(0, 0)$, $(0, 3)$.
◦ Our full list of points to analyze is $(1, 2)$, $(0, 0)$, $(2, 0)$, $(2, 3)$, $(3/2, 3)$, $(0, 3)$. We have $f(1, 2) = 1$, $f(0, 0) = 0$, $f(2, 0) = 4$, $f(2, 3) = 1$, $f(3/2, 3) = 3/4$, and $f(0, 3) = 3$. The maximum is 4 and the minimum is 0.
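The final comparison can be double-checked numerically. The sketch below evaluates $f(x, y) = x^2 - xy + y$ at the six candidate points and then, as a brute-force sanity check rather than a proof, confirms that no point of a fine grid on the rectangle beats the candidate maximum or minimum. The grid resolution is an arbitrary choice.

```python
def f(x, y):
    return x ** 2 - x * y + y

# Interior critical point plus boundary-critical points and corners.
candidates = [(1, 2), (0, 0), (2, 0), (2, 3), (1.5, 3), (0, 3)]
values = [f(x, y) for x, y in candidates]
f_max, f_min = max(values), min(values)
print(f_max, f_min)

# Brute-force check on a grid over 0 <= x <= 2, 0 <= y <= 3.
n = 200
grid = [f(2 * i / n, 3 * j / n) for i in range(n + 1) for j in range(n + 1)]
print(max(grid) <= f_max + 1e-12, min(grid) >= f_min - 1e-12)
```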
• Example: Find the absolute minimum and maximum of $f(x, y) = x^3 + 6xy - y^3$ on the triangle with vertices $(0, 0)$, $(4, 0)$, and $(0, -4)$.
◦ First, we find the critical points: we have $f_x = 3x^2 + 6y$ and $f_y = -3y^2 + 6x$. Solving $f_y = 0$ yields $x = y^2/2$ and then plugging into $f_x = 0$ gives $y^4/4 + 2y = 0$ so that $y(y^3 + 8) = 0$: thus, we see that $(0, 0)$ and $(2, -2)$ are critical points.
◦ Next, we analyze the boundary of the region. Here, the boundary has 3 components.
∗ Component #1, joining $(0, 0)$ to $(4, 0)$: This component is parametrized by $x = t$, $y = 0$ for $0 \le t \le 4$. On this component we have $f(t, 0) = t^3$, which has a critical point only at $t = 0$, which corresponds to $(x, y) = (0, 0)$. Also add the boundary point $(4, 0)$.
∗ Component #2, joining $(0, -4)$ to $(4, 0)$: This component is parametrized by $x = t$, $y = t - 4$ for $0 \le t \le 4$. On this component we have $f(t, t - 4) = 18t^2 - 72t + 64$, which has a critical point for $t = 2$, corresponding to $(x, y) = (2, -2)$. Also add the boundary points $(4, 0)$ and $(0, -4)$.
∗ Component #3, joining $(0, 0)$ to $(0, -4)$: This component is parametrized by $x = 0$, $y = -t$ for $0 \le t \le 4$. On this component we have $f(0, -t) = t^3$, which has a critical point for $t = 0$, corresponding to $(x, y) = (0, 0)$. Also add the boundary point $(0, -4)$.
◦ Our full list of points to analyze is $(0, 0)$, $(4, 0)$, $(0, -4)$, and $(2, -2)$. We compute $f(0, 0) = 0$, $f(4, 0) = 64$, $f(0, -4) = 64$, $f(2, -2) = -8$, and so we see that the maximum is 64 and the minimum is $-8$.
• Example: Find the absolute maximum and minimum of $f(x, y) = xy - 3x$ on the region with $x^2 \le y \le 9$.
◦ First, we find the critical points: since $f_x = y - 3$ and $f_y = x$, there is a single critical point $(0, 3)$.
◦ Next, we analyze the boundary of the region, which (as a quick sketch reveals) has 2 components.
∗ Component #1, a line segment from $(-3, 9)$ to $(3, 9)$: This component is parametrized by $x = t$, $y = 9$ for $-3 \le t \le 3$. On this component we have $f(t, 9) = 6t$, which has no critical point. We only have boundary points $(-3, 9)$, $(3, 9)$.
∗ Component #2, the parabolic arc $y = x^2$ from $(-3, 9)$ to $(3, 9)$: This component is parametrized by $x = t$, $y = t^2$ for $-3 \le t \le 3$. On this component we have $f(t, t^2) = t^3 - 3t$, which has critical points at $t = \pm 1$, corresponding to $(x, y) = (-1, 1)$ and $(1, 1)$. The boundary points $(-3, 9)$, $(3, 9)$ are already listed above.
◦ Our full list of points to analyze is $(0, 3)$, $(-3, 9)$, $(3, 9)$, $(-1, 1)$, $(1, 1)$. We have $f(0, 3) = 0$, $f(-3, 9) = -18$, $f(3, 9) = 18$, $f(-1, 1) = 2$, and $f(1, 1) = -2$. The maximum is 18 and the minimum is $-18$.
• A particular special class of optimization problems involves searching for the minimum or maximum value of a linear function subject to various linear constraints (i.e., on a region defined by linear inequalities such as $3x + 2y \le 8$ or $x + z \ge 0$): these are known as linear programming problems.¹
• We can use the same methods for optimization of a function on a region to solve linear programming problems.
However, linear programming problems have some convenient features that make it easier to identify potential
minima and maxima, which we will illustrate with an example:
• Example: Find the minimum and maximum values of the function f (x, y) = 2 + 3x + 5y subject to the
constraints x ≥ 0, y ≥ 0, x + y ≤ 10, x + 2y ≤ 15.
◦ First we search for critical points of f: since fx = 3 and fy = 5, there are no critical points.
◦ Next, we need to determine the structure of the region, which is shown below:
¹ Although there are numerous computational algorithms (such as the simplex method) for solving large-scale linear
programming problems, the word "programming" in "linear programming" does not refer to computer programs. Instead, it comes from
the United States military usage of the word "program" in reference to training and logistics schedules, whose optimization was among
the first applied examples of linear programming.
◦ We can see that the boundary has 4 components, and we can find the intersection points of the various
lines that make up components of the boundary by solving the constraint equations.
◦ Explicitly, the lines x = 0 and y = 0 intersect at the origin (0, 0), the line y = 0 intersects x + y = 10 at
(10, 0), the line x + y = 10 intersects x + 2y = 15 at (5, 5), and the line x + 2y = 15 intersects x = 0 at
(0, 15/2).
◦ Now we can parametrize the four components of the boundary:
∗ Component #1, a line segment from (0, 0) to (10, 0): This component is parametrized by x = 10t,
y = 0 for 0 ≤ t ≤ 1. On this component we have f (10t, 0) = 2 + 30t, which has derivative 30 and
thus has no critical points. We get only the boundary points (0, 0), (10, 0) .
∗ Component #2, a line segment from (5, 5) to (10, 0): This component is parametrized by x = 5 + 5t,
y = 5 − 5t for 0 ≤ t ≤ 1. On this component we have f (5 + 5t, 5 − 5t) = 42 − 10t, which has derivative
−10 and thus has no critical point. We get only the boundary points (5, 5), (10, 0) .
∗ Component #3, a line segment from (0, 15/2) to (5, 5): This component is parametrized by x = 5t,
$y = \frac{15}{2} - \frac{5}{2}t$ for 0 ≤ t ≤ 1. On this component we have $f(5t, \frac{15}{2} - \frac{5}{2}t) = \frac{79}{2} + \frac{5}{2}t$, which has
derivative $\frac{5}{2}$ and thus has no critical point. We get only the boundary points (0, 15/2), (5, 5).
∗ Component #4, a line segment from (0, 0) to (0, 15/2): This component is parametrized by x = 0,
$y = \frac{15}{2}t$ for 0 ≤ t ≤ 1. On this component we have $f(0, \frac{15}{2}t) = 2 + \frac{75}{2}t$, which has derivative $\frac{75}{2}$
and thus has no critical point. We get only the boundary points (0, 0), (0, 15/2).
◦ Our full list of points to analyze is (0, 0), (10, 0), (5, 5), (0, 15/2). We have f (0, 0) = 2, f (10, 0) = 32,
f (5, 5) = 42, f (0, 15/2) = 79/2. We see that the maximum is 42 and the minimum is 2 .
• In the example above, notice that we did not obtain any critical points nor any boundary-critical points: the
only points on our candidate list were the corner points of the region.
◦ In fact, we can easily see that this will be the case for any linear programming problem (where the
function to be optimized is linear, and the region is defined by linear inequalities).
◦ Since the function is linear, its partial derivatives are all constants, and so (unless all the partial derivatives
are zero, in which case the function is constant) there will be no critical points.
◦ Likewise, since the boundary of the region can be parametrized using linear functions, when we evaluate
the function on the boundary the resulting function will also be linear, and therefore its derivative will
be constant. Then (unless the derivative is zero, in which case the function is constant) there will be no
boundary-critical points.
◦ In all cases, we see that the minimum and maximum values will always be attained at one of the
corner points of the boundary. (This observation is sometimes called the fundamental theorem of
linear programming.)
◦ In fact, all of this analysis still holds for linear programming problems in more than 2 variables (although
of course it is typically more difficult to identify all of the corner points by hand, unless the number of
variables and inequalities defining the region is fairly small).
◦ Step 1: Identify the function f to be optimized as well as all of the constraint inequalities.
◦ Step 2: Draw the region (if possible) and identify all of the corner points.
◦ Step 3: Plug the full list of corner points into f, and find the largest and smallest values.
◦ Note that in order for this procedure to apply, the region must be finite. If the region is infinite, it is
also necessary to analyze the behavior of f on the unbounded portion of the region.
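The three steps above can be sketched in Python for problems in two variables. The helper name `corner_points` and the encoding of each constraint as a triple (a, b, c) meaning a·x + b·y ≤ c are assumptions made for this illustration, and the sketch assumes the region is bounded. It is applied here to the earlier example f (x, y) = 2 + 3x + 5y with constraints x ≥ 0, y ≥ 0, x + y ≤ 10, x + 2y ≤ 15.

```python
from itertools import combinations

def corner_points(constraints, tol=1e-9):
    """Corner points of the region {(x, y): a*x + b*y <= c for every (a, b, c)}.

    Hypothetical helper for illustration; assumes a bounded region in 2 variables.
    """
    corners = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundary lines never meet
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the intersection only if it satisfies every constraint.
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            if not any(abs(x - u) < tol and abs(y - v) < tol for u, v in corners):
                corners.append((x, y))
    return corners

# Step 1: the function and the constraints, each rewritten as a*x + b*y <= c.
def f(x, y):
    return 2 + 3 * x + 5 * y

constraints = [(-1, 0, 0), (0, -1, 0), (1, 1, 10), (1, 2, 15)]

# Step 2: find the corner points. Step 3: evaluate f at each of them.
corners = corner_points(constraints)
values = [f(x, y) for x, y in corners]
print(corners)
print(max(values), min(values))
```

Intersecting every pair of boundary lines and discarding infeasible intersections replaces the sketch of the region in Step 2; for many constraints or more variables, a dedicated method such as the simplex method is the usual choice.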
• Example: Find the minimum and maximum values of the function P (x, y) = 4x + 3y + 5 subject to the
conditions x ≥ 0, y ≥ 0, x + y ≥ 10, 2x + 3y ≤ 60, 2x + y ≤ 40.
◦ From the plot of the region, we can identify the corner points as follows:
∗ Intersection of x = 0 with x + y = 10 and 2x + 3y = 60, yielding the points (0, 10) and (0, 20).
∗ Intersection of y = 0 with x + y = 10 and 2x + y = 40, yielding the points (10, 0) and (20, 0).
∗ Intersection of 2x + 3y = 60 and 2x + y = 40. Solving the second equation yields y = 40 − 2x, and
plugging into the first equation gives 2x + 3(40 − 2x) = 60 so that −4x + 120 = 60 so x = 15 and
then y = 10, yielding the point (15, 10).
◦ Our list of candidate points is (0, 10), (0, 20), (10, 0), (20, 0), (15, 10) .
◦ We compute P (0, 10) = 35, P (0, 20) = 65, P (10, 0) = 45, P (20, 0) = 85, and P (15, 10) = 95, so the
maximum is 95 and the minimum is 35.
• Example: A cat-food company makes its food from chicken, which costs 25 cents per ounce, and beef, which
costs 20 cents per ounce. Chicken has 10 grams of protein and 4 grams of fat per ounce, while beef has 5
grams of protein and 8 grams of fat per ounce. Each package of food must weigh between 10 and 16 ounces,
and it must also have at least 95 grams of protein and at least 80 grams of fat. How much chicken and beef
should the company use in each package to minimize the total cost while also satisfying these requirements?
◦ Suppose that the company uses c ounces of chicken and b ounces of beef: then the total cost is f (b, c) =
20b + 25c (which we wish to minimize).
◦ We also must satisfy the constraints b ≥ 0 and c ≥ 0 (since the cans cannot contain a negative amount
of either ingredient), 10 ≤ b + c ≤ 16 (for the weight), 10c + 5b ≥ 95 (for the protein), and 4c + 8b ≥ 80
(for the fat).
◦ From the picture, we can see that the constraints b ≥ 0, c ≥ 0, and b + c ≥ 10 are irrelevant and are not
parts of the boundary of the region. There are three corner points, which we can find as follows:
∗ Intersection of b + c = 16 with 10c + 5b = 95. The first equation yields c = 16 − b, and plugging into
the second equation yields 10(16 − b) + 5b = 95 so that 160 − 5b = 95. Thus b = 13 and then c = 3.
∗ Intersection of b + c = 16 with 4c + 8b = 80. The first equation yields c = 16 − b, and plugging into
the second equation yields 4(16 − b) + 8b = 80 so that 64 + 4b = 80. Thus b = 4 and then c = 12.
∗ Intersection of 10c + 5b = 95 with 4c + 8b = 80. The first equation yields b = 19 − 2c, and plugging
into the second equation yields 4c + 8(19 − 2c) = 80 so that 152 − 12c = 80. Thus c = 6 and then
b = 7.
◦ Thus we obtain three corner points, (b, c) = (13, 3), (4, 12), and (7, 6).
◦ We compute f (13, 3) = 335, f (4, 12) = 380, and f (7, 6) = 290, so the minimum cost of $2.90 occurs with
(b, c) = (7, 6), which is to say, with 7 ounces of beef and 6 ounces of chicken .
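The corner-point answer can be cross-checked by brute force. The sketch below searches all amounts of beef and chicken in steps of a tenth of an ounce (an arbitrary resolution chosen for illustration) and keeps the cheapest combination that satisfies the weight, protein, and fat requirements. Amounts are stored as integer tenths of an ounce so that the comparisons are exact.

```python
# Sketch: brute-force check of the cat-food example on a grid of
# tenth-of-an-ounce amounts (the resolution is an arbitrary assumption).

def feasible(b10, c10):
    """Constraints with beef b10 and chicken c10 in integer tenths of an ounce."""
    return (
        100 <= b10 + c10 <= 160          # 10 <= b + c <= 16 ounces
        and 10 * c10 + 5 * b10 >= 950    # at least 95 grams of protein
        and 4 * c10 + 8 * b10 >= 800     # at least 80 grams of fat
    )

def cost_cents(b10, c10):
    """Total cost 20b + 25c in cents, with amounts in tenths of an ounce."""
    return (20 * b10 + 25 * c10) / 10

# Each entry is (cost in cents, ounces of beef, ounces of chicken).
best = min(
    (cost_cents(b10, c10), b10 / 10, c10 / 10)
    for b10 in range(161)
    for c10 in range(161)
    if feasible(b10, c10)
)
print(best)
```

Because the optimal corner (b, c) = (7, 6) lies on the grid, the search recovers it exactly; in general a grid search would only approximate the corner.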
• Many types of applied optimization problems are not of the form "given a function, maximize it on a region",
but rather of the form "given a function, maximize it subject to some additional constraints".
◦ Example: Maximize the volume $V = \pi r^2 h$ of a cylindrical can given that its surface area $SA = 2\pi r^2 + 2\pi rh$
is $150\pi\ \mathrm{cm}^2$.
• The most natural way to attempt such a problem is to eliminate the constraints by solving for one of the
variables in terms of the others and then reducing the problem to something without a constraint. Then we
are able to perform the usual procedure of evaluating the derivative (or derivatives), setting them equal to
zero, and looking among the resulting critical points for the desired extreme point.
◦ In the example above, we would use the surface area constraint $150\pi\ \mathrm{cm}^2 = 2\pi r^2 + 2\pi rh$ to solve for h
in terms of r, obtaining $h = \frac{150\pi - 2\pi r^2}{2\pi r} = \frac{75 - r^2}{r}$, and then plug in to the volume formula to write it
as a function of r alone: this gives $V(r) = \pi r^2 \cdot \frac{75 - r^2}{r} = 75\pi r - \pi r^3$.
◦ Then $\frac{dV}{dr} = 75\pi - 3\pi r^2$, so setting equal to zero and solving shows that the critical points occur for
r = ±5.
◦ Since we are interested in positive r, we can do a little bit more checking to conclude that the can's
volume is indeed maximized at the critical point, so the radius is r = 5 cm, the height is h = 10 cm, and
the resulting volume is $V = 250\pi\ \mathrm{cm}^3$.
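One possible form of the extra checking mentioned above is the second derivative test together with a look at the ends of the allowed interval. The interval $0 < r \le \sqrt{75}$ used below is an assumption that follows from requiring $h = \frac{75 - r^2}{r} \ge 0$.

```latex
\begin{align*}
V(r)   &= 75\pi r - \pi r^3, \qquad 0 < r \le \sqrt{75}, \\
V'(r)  &= 75\pi - 3\pi r^2, \qquad V'(5) = 0, \\
V''(r) &= -6\pi r, \qquad\qquad\; V''(5) = -30\pi < 0, \\
V(5)   &= 375\pi - 125\pi = 250\pi, \\
\lim_{r \to 0^+} V(r) &= 0, \qquad V(\sqrt{75}) = 75\pi\sqrt{75} - 75\pi\sqrt{75} = 0.
\end{align*}
```

Since r = 5 is the only critical point in the interval, V is concave down there, and V tends to 0 at both ends, the value $250\pi$ is the maximum.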
• Using the technique of Lagrange multipliers, however, we can perform a constrained optimization without
having to solve the constraint equations. This technique is especially useful when the constraints are difficult
or impossible to solve explicitly.
• Method (Lagrange multipliers, 1 constraint): To find the extreme values of f (x, y, z) subject to a constraint
g(x, y, z) = c, define the Lagrange function L(x, y, z, λ) = f (x, y, z) − λ · [g(x, y, z) − c]. Then any extreme
value of f (x, y, z) subject to the constraint g(x, y, z) = c must occur at a critical point of L(x, y, z, λ). In other
words, it is sufficient to solve the system of four equations in the four variables x, y, z, λ given by fx = λgx , fy = λgy , fz = λgz ,
g(x, y, z) = c, and then search among the resulting triples (x, y, z) to find the minimum and maximum.
◦ If we have two variables, we would instead solve the system fx = λgx , fy = λgy , g(x, y) = c.
◦ Remark: The value λ is called a Lagrange multiplier.
∗ Imagine we are walking around the level set g(x, y, z) = c, and consider what the contours of f (x, y, z)
are doing as we move around.
∗ In general the contours of f and g will be different, and they will cross one another.
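∗ But if we are at a point where f is maximized, then if we walk around nearby that maximum, we
will see only contours of f with a smaller value than the maximum.
∗ Thus, at a maximum, the contour of f cannot cross the level set g = c and must instead be tangent to it,
which means the gradients of f and g are parallel there: ∇f = λ∇g for some scalar λ.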
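To see why the critical points of the Lagrange function reproduce the system stated in the method, it may help to write out its partial derivatives explicitly; this is a short derivation sketch using only the definition of L given above.

```latex
\begin{align*}
L(x, y, z, \lambda) &= f(x, y, z) - \lambda\,[\,g(x, y, z) - c\,], \\
\frac{\partial L}{\partial x} &= f_x - \lambda g_x = 0 \iff f_x = \lambda g_x, \\
\frac{\partial L}{\partial y} &= f_y - \lambda g_y = 0 \iff f_y = \lambda g_y, \\
\frac{\partial L}{\partial z} &= f_z - \lambda g_z = 0 \iff f_z = \lambda g_z, \\
\frac{\partial L}{\partial \lambda} &= -[\,g(x, y, z) - c\,] = 0 \iff g(x, y, z) = c.
\end{align*}
```

The first three equations say that the gradients of f and g are parallel, and the last one simply recovers the constraint.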
• For completeness we also mention that there is an analogous procedure for a problem with two constraints:
• Method (Lagrange multipliers, 2 constraints): To find the extreme values of f (x, y, z) subject to a pair of
constraints g(x, y, z) = c and h(x, y, z) = d, define the Lagrange function L(x, y, z, λ, µ) = f (x, y, z) − λ ·
[g(x, y, z) − c] − µ · [h(x, y, z) − d]. Then any extreme value of f (x, y, z) subject to the constraints
g(x, y, z) = c and h(x, y, z) = d must occur at a critical point of L(x, y, z, λ, µ).
◦ The method also works with more than three variables, and has a natural generalization to more than
two constraints. (It is fairly rare to encounter systems with more than two constraints.)
• Example: Find the maximum and minimum values of $f(x, y) = 2x + 3y$ subject to the constraint $x^2 + 4y^2 = 100$.
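One way to carry out the two-variable version of the method for this example is sketched below, taking $g(x, y) = x^2 + 4y^2$ and $c = 100$. The steps follow the system $f_x = \lambda g_x$, $f_y = \lambda g_y$, $g(x, y) = c$ stated in the method.

```latex
\begin{align*}
f_x = \lambda g_x &: \quad 2 = 2\lambda x, \\
f_y = \lambda g_y &: \quad 3 = 8\lambda y, \\
g(x, y) = c &: \quad x^2 + 4y^2 = 100.
\end{align*}
% Solve the first two equations for x and y in terms of lambda:
\[ x = \frac{1}{\lambda}, \qquad y = \frac{3}{8\lambda}. \]
% Substitute into the constraint:
\[ \frac{1}{\lambda^2} + 4 \cdot \frac{9}{64\lambda^2} = \frac{25}{16\lambda^2} = 100
   \quad\Longrightarrow\quad \lambda = \pm\frac{1}{8}. \]
% Candidate points and the values of f there:
\[ (x, y) = (8, 3):\ f = 25, \qquad (x, y) = (-8, -3):\ f = -25. \]
```

Since the constraint curve is an ellipse, which is closed and bounded, both extremes are attained: the maximum is f (8, 3) = 25 and the minimum is f (−8, −3) = −25.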
• Example: Find the maximum and minimum values of $f(x, y, z) = x + 2y + 2z$ subject to the constraint
$x^2 + y^2 + z^2 = 9$.
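◦ We have $g = x^2 + y^2 + z^2$, so the system to solve is $1 = 2\lambda x$, $2 = 2\lambda y$, $2 = 2\lambda z$, $x^2 + y^2 + z^2 = 9$.
◦ Solving the first three equations gives $x = \frac{1}{2\lambda}$, $y = \frac{1}{\lambda}$, $z = \frac{1}{\lambda}$; plugging in to the last equation yields
$\left(\frac{1}{2\lambda}\right)^2 + \left(\frac{1}{\lambda}\right)^2 + \left(\frac{1}{\lambda}\right)^2 = 9$, so $\frac{9}{4\lambda^2} = 9$, so that $\lambda = \pm\frac{1}{2}$.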
◦ This gives the two points (x, y, z) = (1, 2, 2) and (−1, −2, −2).
◦ Since f (1, 2, 2) = 9 and f (−1, −2, −2) = −9, the maximum is f (1, 2, 2) = 9 and the minimum is
f (−1, −2, −2) = −9.
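The result can be checked numerically. The sketch below first verifies that the Lagrange conditions hold at (1, 2, 2) with λ = 1/2, and then samples random points on the sphere of radius 3 to confirm that no sampled value of f falls outside the range from −9 to 9. The sample size and random seed are arbitrary assumptions for the illustration.

```python
import math
import random

def f(x, y, z):
    return x + 2 * y + 2 * z

# Lagrange conditions grad f = lam * grad g at (1, 2, 2) with lam = 1/2,
# where g = x^2 + y^2 + z^2 has gradient (2x, 2y, 2z).
point, lam = (1.0, 2.0, 2.0), 0.5
grad_f = (1.0, 2.0, 2.0)
grad_g = tuple(2 * t for t in point)
conditions_hold = all(abs(df - lam * dg) < 1e-12 for df, dg in zip(grad_f, grad_g))

# Sample random points on the sphere x^2 + y^2 + z^2 = 9 by normalizing
# Gaussian vectors, and record the value of f at each sample.
random.seed(0)
samples = []
for _ in range(20000):
    v = [random.gauss(0, 1) for _ in range(3)]
    norm = math.sqrt(sum(t * t for t in v))
    samples.append(f(*(3 * t / norm for t in v)))

print(conditions_hold, max(samples), min(samples))
```

Random sampling only approaches the extreme values from inside, so the printed maximum and minimum should be slightly inside the exact values 9 and −9.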
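• Example: Maximize the volume $V = \pi r^2 h$ of a cylindrical can given that its surface area $SA = 2\pi r^2 + 2\pi rh$
is $150\pi\ \mathrm{cm}^2$.
◦ We have $f(r, h) = \pi r^2 h$ and $g(r, h) = 2\pi r^2 + 2\pi rh$, so the system to solve is $2\pi rh = \lambda(4\pi r + 2\pi h)$,
$\pi r^2 = \lambda(2\pi r)$, $2\pi r^2 + 2\pi rh = 150\pi$.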
◦ We clearly cannot have r = 0 since that contradicts the third equation, so we can assume $r \neq 0$.
◦ Cancelling r from the second equation and then solving for λ yields $\lambda = \frac{r}{2}$. Plugging into the first
equation (and cancelling the π's) yields $2rh = (4r + 2h) \cdot \frac{r}{2}$, so dividing by r yields 2h = 2r + h, so that
h = 2r.
◦ Finally, plugging in h = 2r to the third equation (after cancelling the π's) yields $2r^2 + 4r^2 = 150$, so that
$r^2 = 25$ and thus r = ±5.
◦ The two candidate points are (r, h) = (5, 10) and (−5, −10); since we only want positive values we are
left only with (5, 10), which by the physical setup of the problem must be the maximum.
◦ Therefore, the maximum volume occurs with r = 5 cm and h = 10 cm, and is $f(5, 10) = 250\pi\ \mathrm{cm}^3$.
• Example: An assembly line involving f full-time workers, p part-time workers, and r robots has a total
production level of $T(f, p, r) = 80 f^{0.7} p^{0.2} r^{0.1}$ gizmos per day. Each full-time worker's compensation totals
$200 per day, each part-time worker's compensation totals $80 per day, and each robot's maintenance costs
total $40 per day. If the daily operating budget is $4000, how many of each type of worker, and how many
robots, should be employed to maximize total daily production?
◦ We wish to maximize the function $T(f, p, r) = 80 f^{0.7} p^{0.2} r^{0.1}$ subject to the constraint 200f + 80p + 40r =
4000, so that g(f, p, r) = 200f + 80p + 40r.
◦ We may maximize the function T directly, but the resulting calculation is somewhat unpleasant. It is
easier to maximize the logarithm of T , namely ln(T ) = ln(80) + 0.7 ln(f ) + 0.2 ln(p) + 0.1 ln(r), instead.
◦ Taking the partial derivatives then yields the system $\frac{0.7}{f} = 200\lambda$, $\frac{0.2}{p} = 80\lambda$, $\frac{0.1}{r} = 40\lambda$, and
$200f + 80p + 40r = 4000$.
◦ The first three equations yield $f = \frac{7}{2000\lambda}$, $p = \frac{1}{400\lambda}$, and $r = \frac{1}{400\lambda}$.
◦ Plugging these expressions into the last equation then yields $200 \cdot \frac{7}{2000\lambda} + 80 \cdot \frac{1}{400\lambda} + 40 \cdot \frac{1}{400\lambda} = 4000$,
which simplifies to $\frac{1}{\lambda} = 4000$ and thus $\lambda = \frac{1}{4000}$. This yields a unique candidate triple (f, p, r) =
(14, 10, 10), which by the setup of the problem must be a maximum.
◦ We conclude that the maximum production occurs with 14 full-time workers, 10 part-time workers, and 10 robots.
◦ Remark: If we tried to maximize T directly, the system of equations would be $56 f^{-0.3} p^{0.2} r^{0.1} = 200\lambda$,
$16 f^{0.7} p^{-0.8} r^{0.1} = 80\lambda$, $8 f^{0.7} p^{0.2} r^{-0.9} = 40\lambda$, $200f + 80p + 40r = 4000$. This system is not nearly as easy
to solve as the one above; one approach is to divide the second and third equations by the first one, then
solve for two of f, p, r in terms of the other one, and finally plug in to the last equation.
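As a check on the Lagrange multiplier answer, the sketch below searches over whole-number staffing levels that spend the budget exactly. Dividing the budget constraint by 40 gives 5f + 2p + r = 100, so r is determined by f and p. Restricting to whole numbers with at least one of each resource is an assumption made only for this illustration.

```python
# Sketch: brute-force search over whole-number values of (f, p, r) that
# exactly exhaust the $4000 budget, i.e. 5f + 2p + r = 100.

def production(f, p, r):
    return 80 * f**0.7 * p**0.2 * r**0.1

# Each entry is (production, f, p, r); r is forced by the budget constraint.
best = max(
    (production(f, p, 100 - 5 * f - 2 * p), f, p, 100 - 5 * f - 2 * p)
    for f in range(1, 20)
    for p in range(1, 50)
    if 100 - 5 * f - 2 * p >= 1
)
print(best)
```

The search agrees with the candidate triple (14, 10, 10) found above, which is consistent with that point being a maximum rather than a minimum.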