MULTIVARIABLE MATHEMATICS
Linear Algebra, Multivariable Calculus, and Manifolds
THEODORE SHIFRIN
University of Georgia
WILEY
John Wiley & Sons, Inc.
Associate Publisher Laurie Rosatone
Editorial Assistant Kelly Boyle
Executive Marketing Manager Julie Lindstrom
Senior Production Editor Sujin Hong
Senior Designer Madelyn Lesure
This book was set in Times Roman by Techsetters, Inc. and printed and bound by Malloy Lithograph.
The cover was printed by Phoenix Color.
Copyright © 2005 John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by
any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under
Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the
Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center,
Inc., 222 Rosewood Drive, Danvers, MA 01923, (978)750-8400, Fax: (978)750-4470. Requests to the Publisher
for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street,
Hoboken, NJ 07030, (201) 748-6011, Fax: (201) 748-6008, E-Mail: PERMREQ@WILEY.COM. To order books
or for customer service please call 1-800-CALL WILEY (225-5945).
ISBN 978-0-471-52638-4
CONTENTS

Preface vii

► CHAPTER 1
VECTORS AND MATRICES 1
1 Vectors in Rⁿ 1
2 Dot Product 8
3 Subspaces of Rⁿ 16
4 Linear Transformations and Matrix Algebra 23
5 Introduction to Determinants and the Cross Product 43

► CHAPTER 2
FUNCTIONS, LIMITS, AND CONTINUITY 53
1 Scalar- and Vector-Valued Functions 53
2 A Bit of Topology in Rⁿ 64
3 Limits and Continuity 72

► CHAPTER 3
THE DERIVATIVE 81
1 Partial Derivatives and Directional Derivatives 81
2 Differentiability 87
3 Differentiation Rules 97
4 The Gradient 104
5 Curves 109
6 Higher-Order Partial Derivatives 120

► CHAPTER 4
IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS 127
1 Gaussian Elimination and the Theory of Linear Systems 127
2 Elementary Matrices and Calculating Inverse Matrices 147
3 Linear Independence, Basis, and Dimension 156
4 The Four Fundamental Subspaces 171
5 The Nonlinear Case: Introduction to Manifolds 186

► CHAPTER 5
EXTREMUM PROBLEMS 196
1 Compactness and the Maximum Value Theorem 196
2 Maximum/Minimum Problems 202
3 Quadratic Forms and the Second Derivative Test 208
4 Lagrange Multipliers 216
5 Projections, Least Squares, and Inner Product Spaces 225

► CHAPTER 6
SOLVING NONLINEAR PROBLEMS 244
1 The Contraction Mapping Principle 244
2 The Inverse and Implicit Function Theorems 251
3 Manifolds Revisited 261

► CHAPTER 7
INTEGRATION 267
1 Multiple Integrals 267
2 Iterated Integrals and Fubini's Theorem 276
3 Polar, Cylindrical, and Spherical Coordinates 288
4 Physical Applications 298
5 Determinants and n-Dimensional Volume 309
6 Change of Variables Theorem 324

► CHAPTER 8
DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS 333
1 Motivation 333
2 Differential Forms 335
3 Line Integrals and Green's Theorem 348
4 Surface Integrals and Flux 367
5 Stokes's Theorem 379
6 Applications to Physics 393
7 Applications to Topology 403
► COMMENTS ON CONTENTS
The linear algebraic material with which we begin the course in Chapter 1 is concrete,
establishes the link with geometry, and is a good self-contained setting for working on
proofs. We introduce vectors, dot products, subspaces, and linear transformations and
matrix computations. At this early stage we emphasize the two interpretations of multiplying
a matrix A by a vector x: the linear equations viewpoint (considering the dot products of the
rows of A with x) and the linear combinations viewpoint (taking the linear combination of
the columns of A weighted by the coordinates of x). We end the chapter with a discussion
of 2 × 2 and 3 × 3 determinants, area, volume, and the cross product.
In Chapter 2 we begin to make the transition to calculus, introducing scalar functions
of a vector variable—their graphs and their level sets—and vector-valued functions. We
introduce the requisite language of open and closed sets, sequences, and limits and continuity, including the proofs of the usual limit theorems. (Generally, however, I give these short shrift in lecture, as I don't have the time to emphasize ε-δ arguments.)
We come to the concepts of differential calculus in Chapter 3. We quickly introduce partial and directional derivatives, as they are immediate to calculate, and then come to the definition of differentiability, the characterization of differentiable functions, and the standard differentiation rules. We give the gradient vector its own brief section, in which we emphasize
its geometric meaning. Then comes a section on curves, in which we mention Kepler’s
laws (the second is proved in the text and the other two are left as an exercise), arclength,
and curvature of a space curve.
In the first four sections of Chapter 4 we give an accelerated treatment of Gaussian
elimination (including a proof of uniqueness of reduced echelon form) and the theory of
linear systems, the standard material on linear independence and dimension (including a
brief mention of abstract vector spaces), and the four fundamental subspaces associated to
a matrix. In the last section, we begin our assault on the nonlinear case, introducing (with
no proofs) the implicit function theorem and the notion of a manifold.
Chapter 5 is a blend of topology, calculus, and linear algebra—quadratic forms and projections. We start with the topological notion of compactness and prove the maximum value
theorem in higher dimensions. We then turn to the calculus of applied maximum/minimum
problems and then to the analysis of the second-derivative test and the Hessian. Then
comes one of the most important topics in applications, Lagrange multipliers (with a rigorous proof). In the last section, we return to linear algebra, to discuss projections (from both
the explicit and the implicit approaches), least-squares solutions of inconsistent systems, the
Gram-Schmidt process, and a brief discussion of abstract inner product spaces (including
a nice proof of Lagrange interpolation).
Chapter 6 is a brief, but sophisticated, introduction to the inverse and implicit function
theorems. We present our favorite proof using the contraction mapping principle (which is
both more elegant and works just fine in the infinite-dimensional setting). In the last section
we prove that all three definitions of a manifold are (locally) equivalent: the implicit
representation, the parametric representation, and the representation as a graph. (In the
year-long course that I teach, I find I have time to treat this chapter only lightly.)
In Chapter 7 we study the multidimensional (Riemann) integral. In the first two sections
we deal predominantly with the theory of the multiple integral and, then, Fubini’s Theorem
and the computation of iterated integrals. Then we introduce (as is customary in a typical
multivariable calculus course) polar, cylindrical, and spherical coordinates and various
physical applications. We conclude the chapter with a careful treatment of determinants
(which will play a crucial role in Chapters 8 and 9) and a proof of the Change of Variables
Theorem.
• Exercises 1.2.22-26 and Exercises 1.5.19 and 1.5.20 on the geometry of triangles,
and Exercise 1.5.17, a nice glimpse of affine geometry
• Exercise 2.1.12, a parametrization of a hyperboloid of one sheet in which the parameter curves are the two families of rulings
• Exercises 2.3.15-17, 3.1.10, and 3.2.18-19, exploring the infamous sorts of discontinuous and nondifferentiable functions
• Example 3.4.3 introducing the reflectivity property of the ellipse via the gradient,
with follow-ups in Exercises 3.4.8, 3.4.9, and 3.4.13, and then Kepler’s first and
third laws in Exercise 3.5.15.
• Exercise 3.5.14, the famous fact (due to Huygens) that the evolute of a cycloid is a
congruent cycloid
• Exercise 4.5.13, in which we discover that the lines passing through three pairwise
skew lines generate a saddle surface
• Exercises 5.1.5, 5.1.7, and 9.4.11, exploring the (operator) norm of a matrix
• Exercise 5.2.15, introducing the Fermat/Steiner point of a triangle
• Exercises 5.3.2 and 5.3.4, pointing out a local minimum along every line need not
be a local minimum (an issue that is mishandled in surprisingly many multivariable
calculus texts) and that a lone critical point that is a local minimum may not be a
global minimum
• Exercises 5.4.32, 5.4.34, and 9.4.21, giving the interpretation of the Lagrange multiplier, introducing the bordered Hessian, and giving a proof that the bordered Hessian gives a sufficient test for constrained critical points
• Exercises 6.1.8 and 6.1.10, giving Kantorovich's Theorem (first in one dimension and then in higher dimensions), a sufficient condition for Newton's method to converge (a beautiful result I learned from Hubbard and Hubbard)
• Exercise 6.2.13, introducing the envelope of a family of curves
• Exercise 7.3.24, my favorite triple integral challenge problem
• Exercises 7.4.27 and 7.4.28
• Exercises 7.5.25-27, some nice applications of the determinant
• Exercises 8.3.23, 8.3.25, and 8.3.26, some interesting applications of line integration
and Green’s Theorem
• Exercise 8.5.22, giving a calibrations proof that the minimal surface equation gives
surfaces of least area
• The discussion in Chapter 8, Section 7, of counting roots (reminiscent of the treatment
of winding numbers and Gauss’s Law in earlier sections) and Exercises 8.7.9 and
9.4.22, in which we prove that the roots of a complex polynomial depend continuously
on its coefficients, and then derive Sylvester’s Law of Inertia as a corollary
• Exercises 9.1.12 and 9.1.13, some interesting applications of the change-of-basis
framework
• Exercises 9.2.19, 9.2.20, 9.2.23, and 9.2.24, some more standard but more challenging linear algebra exercises
> ACKNOWLEDGMENTS
I would like to thank my students of the past years for enduring preliminary versions of
this text and for all their helpful comments and suggestions. I would like to acknowledge
helpful conversations with my colleagues Malcolm Adams and Jason Cantarella. I would
also like to thank the following reviewers, along with several anonymous referees, who
offered many helpful comments:
I am very grateful to my editor, Laurie Rosatone, for her enthusiastic support, encourage
ment, and guidance.
I welcome any comments and suggestions. Please address any e-mail correspondence
to
shifrin@math.uga.edu
or
www.wiley.com/college/shifrin
Theodore Shifrin
CHAPTER 1
VECTORS AND MATRICES
Linear algebra provides a beautiful example of the interplay between two branches of
mathematics, geometry and algebra. Moreover, it provides the foundations for all of our
upcoming work with calculus, which is based on the idea of approximating the general
function locally by a linear one. In this chapter, we introduce the basic language of vectors,
linear functions, and matrices. We emphasize throughout the symbiotic relation between
geometric and algebraic calculations and interpretations. This is true also of the last section,
where we discuss the determinant in two and three dimensions and define the cross product.
► 1 VECTORS IN Rⁿ
A point in $\mathbb{R}^n$ is an ordered $n$-tuple of real numbers, written $(x_1, \ldots, x_n)$. To it we may associate the column vector $x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, which we visualize geometrically as the arrow pointing from the origin to the point. We shall (purposely) use the boldface letter x to denote both the point and the corresponding vector, as illustrated in Figure 1.1. We denote by 0 the vector all of whose coordinates are 0, called the zero vector.
More generally, any two points A and B in space determine the arrow pointing from A
to $B$, as shown in Figure 1.2, again specifying a vector that we denote $\overrightarrow{AB}$. We often refer
to $A$ as the "tail" of the vector $\overrightarrow{AB}$ and $B$ as its "head." If $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$, then $\overrightarrow{AB}$ is equal to the vector
$$v = \begin{bmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{bmatrix},$$
whose tail is at the origin, as indicated in Figure 1.2.
The Pythagorean Theorem tells us that when $n = 2$ the length of the vector $x$ is $\sqrt{x_1^2 + x_2^2}$. A repeated application of the Pythagorean Theorem, as indicated in Figure 1.3, shows that in general the length of $x \in \mathbb{R}^n$ is
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$
Figure 1.3
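As a quick numerical check of the length formula, the sketch below (in Python; the function names are illustrative, not from the text) computes $\|x\|$ both directly from the formula and by folding in one coordinate at a time with the planar Pythagorean Theorem, and confirms the two agree.

```python
import math

def norm_by_formula(x):
    # ||x|| = sqrt(x_1^2 + ... + x_n^2)
    return math.sqrt(sum(t * t for t in x))

def norm_by_repeated_pythagoras(x):
    # Fold in one coordinate at a time: the length of (x_1, ..., x_k) is the
    # hypotenuse of a right triangle with legs ||(x_1, ..., x_{k-1})|| and x_k.
    length = 0.0
    for t in x:
        length = math.hypot(length, t)
    return length

x = (1.0, 2.0, 2.0)
a = norm_by_formula(x)
b = norm_by_repeated_pythagoras(x)
print(a)                   # 3.0
print(abs(a - b) < 1e-12)  # True
```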
There are two crucial algebraic operations one can perform on vectors, both of which
have clear geometric interpretations.
Given a real number $c$ and $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, we define $cx = (cx_1, \ldots, cx_n)$; the vector $cx$ points in the same or the opposite direction as $x$, depending on whether $c > 0$ or $c < 0$, respectively. Thus, multiplication by the real number $c$ simply stretches (or shrinks) the vector by a factor of $|c|$ and reverses
Figure 1.4
its direction when $c$ is negative. Since this is a geometric "change of scale," we refer to the real number $c$ as a scalar and the multiplication $cx$ as scalar multiplication.
Note that whenever $x \neq 0$ we can find a unit vector with the same direction by taking
$$\frac{x}{\|x\|} = \frac{1}{\|x\|}\,x.$$
Definition We say two vectors $x$ and $y$ are parallel if one is a scalar multiple of the other, i.e., if there is a scalar $c$ so that $y = cx$ or $x = cy$. We say $x$ and $y$ are nonparallel if they are not parallel.
Algebraically, we define $x + y = (x_1 + y_1, \ldots, x_n + y_n)$. Geometrically, we translate $y$ so that its tail sits at the head of $x$, and draw the arrow from the origin to its head. This is the so-called parallelogram law for vector addition, for, as we see in Figure 1.5, $x + y$ is the "long" diagonal of the
Figure 1.5
parallelogram spanned by x and y. Notice that the picture makes it clear that vector addition
is commutative; i.e.,
x + y = y + x.
This also follows immediately from the algebraic definition because addition of real numbers
is commutative. (See Exercise 12 for an exhaustive list of the properties of vector addition
and scalar multiplication.)
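The componentwise definitions make these algebraic properties easy to verify numerically as well; the minimal Python sketch below (helper names are illustrative) checks commutativity, associativity, and distributivity for sample vectors in $\mathbb{R}^2$.

```python
def add(x, y):
    # Componentwise addition: (x + y)_i = x_i + y_i.
    return tuple(a + b for a, b in zip(x, y))

def scale(c, x):
    # Scalar multiplication: (c x)_i = c * x_i.
    return tuple(c * a for a in x)

x, y, z = (1, 2), (3, -1), (0, 5)
print(add(x, y) == add(y, x))                                # True (commutativity)
print(add(add(x, y), z) == add(x, add(y, z)))                # True (associativity)
print(scale(2, add(x, y)) == add(scale(2, x), scale(2, y)))  # True (distributivity)
```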
Remark We emphasize here that the notions of vector addition and scalar multiplication make sense geometrically for vectors in the form $\overrightarrow{AB}$ which do not necessarily have their tails at the origin. If we wish to add $\overrightarrow{AB}$ to $\overrightarrow{CD}$, we simply recall that $\overrightarrow{CD}$ is equal to any vector with the same length and direction, so we just translate $\overrightarrow{CD}$ so that $C$ and $B$ coincide; then the arrow from $A$ to the point $D$ in its new position is the sum $\overrightarrow{AB} + \overrightarrow{CD}$.
Subtraction of one vector from another is easy to define algebraically. If x and y are
as above, then we set
$$x - y = \begin{bmatrix} x_1 - y_1 \\ \vdots \\ x_n - y_n \end{bmatrix}.$$
As is the case with real numbers, we have the following interpretation of the difference $x - y$: It is the vector we add to $y$ in order to obtain $x$; i.e.,
$$(x - y) + y = x.$$
Pictorially, we see that $x - y$ is drawn, as shown in Figure 1.6, by putting its tail at $y$ and its head at $x$, thereby resulting in the other diagonal of the parallelogram determined by $x$ and $y$. Note that if $A$ and $B$ are points in space and we set $x = \overrightarrow{OA}$ and $y = \overrightarrow{OB}$, then $y - x = \overrightarrow{AB}$. Moreover, as Figure 1.6 also suggests, we have $x - y = x + (-y)$.
Figure 1.6
► EXAMPLE 1
Let $A$ and $B$ be points in $\mathbb{R}^n$. The midpoint $M$ of the line segment joining them is the point halfway from $A$ to $B$; that is, $\overrightarrow{AM} = \frac{1}{2}\overrightarrow{AB}$. Using the notation as above, we set $x = \overrightarrow{OA}$ and $y = \overrightarrow{OB}$, and we have
$$(*) \qquad \overrightarrow{OM} = x + \overrightarrow{AM} = x + \tfrac{1}{2}(y - x) = \tfrac{1}{2}(x + y).$$
In particular, the vector from the origin to the midpoint of $\overline{AB}$ is the average of the vectors $x$ and $y$.
See Exercise 8 for a generalization to three vectors and Section 4 of Chapter 7 for more.
From this formula follows one of the classic results from high school geometry: The diagonals of a parallelogram bisect one another. Consider the parallelogram $OACB$ pictured in Figure 1.7, so that $\overrightarrow{OC} = x + y$. We've seen that the midpoint $M$ of $\overline{AB}$ is, by virtue of the formula (*), also the midpoint of diagonal $\overline{OC}$. (See Figure 1.7.) ◄
Figure 1.7
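The midpoint formula (*) is also easy to confirm computationally; the Python sketch below (sample points and helper names are illustrative) checks that $x + \frac{1}{2}(y - x)$ agrees with $\frac{1}{2}(x + y)$ in exact rational arithmetic.

```python
from fractions import Fraction

def midpoint(x, y):
    # OM = (1/2)(x + y), as in formula (*).
    return tuple(Fraction(a + b, 2) for a, b in zip(x, y))

def along(x, y, t):
    # The point x + t(y - x), a fraction t of the way from x to y.
    return tuple(a + t * (b - a) for a, b in zip(x, y))

x, y = (2, 5), (8, 1)
print(midpoint(x, y) == along(x, y, Fraction(1, 2)))  # True
```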
It should now be evident that vector methods provide a great tool for translating theo
rems from Euclidean geometry into simple algebraic statements. Here is another example.
Recall that a median of a triangle is a line segment from a vertex to the midpoint of the opposite side.
Proposition 1.1 The medians of a triangle intersect at a point that is two-thirds of the way from each vertex to the opposite side.
Proof We may put one of the vertices of the triangle at the origin, so that the picture is as shown in Figure 1.8(a). Let $x = \overrightarrow{OA}$, $y = \overrightarrow{OB}$, and let $L$, $M$, and $N$ be the midpoints
of OA, AB, and OB, respectively. The battle plan is the following: We let P denote the
point 2/3 of the way from B to L, Q the point 2/3 of the way from O to M, and R the
point 2/3 of the way from A to N. Although we’ve indicated P, Q, and R as distinct points
in Figure 1.8(b), our goal is to prove that P = Q = R; we do this by expressing all the
vectors $\overrightarrow{OP}$, $\overrightarrow{OQ}$, and $\overrightarrow{OR}$ in terms of $x$ and $y$:
$$\overrightarrow{OP} = \overrightarrow{OB} + \overrightarrow{BP} = \overrightarrow{OB} + \tfrac{2}{3}\overrightarrow{BL} = y + \tfrac{2}{3}\left(\tfrac{1}{2}x - y\right) = \tfrac{1}{3}x + \tfrac{1}{3}y;$$
$$\overrightarrow{OQ} = \tfrac{2}{3}\overrightarrow{OM} = \tfrac{2}{3}\left(\tfrac{1}{2}(x + y)\right) = \tfrac{1}{3}(x + y); \text{ and}$$
$$\overrightarrow{OR} = \overrightarrow{OA} + \tfrac{2}{3}\overrightarrow{AN} = x + \tfrac{2}{3}\left(\tfrac{1}{2}y - x\right) = \tfrac{1}{3}x + \tfrac{1}{3}y.$$
Thus $P = Q = R$, as we wished to show. ■
Figure 1.8
The astute reader might notice that we could have been more economical in the last
proof. Suppose we merely check that the points 2/3 of the way down two of the medians
(say P and Q) agree. It would then follow (say, by relabeling the triangle slightly) that the
same is true of a different pair of medians (say P and R). But since any two pairs must
have a point in common, we may now conclude that all three points are equal.
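The computation in the proof can be replayed numerically; the Python sketch below (the sample triangle and helper names are illustrative) checks that the three points two-thirds of the way along the medians coincide at $\frac{1}{3}(x + y)$.

```python
from fractions import Fraction

def combo(c1, v1, c2, v2):
    # The linear combination c1*v1 + c2*v2, componentwise.
    return tuple(c1 * a + c2 * b for a, b in zip(v1, v2))

half, third, two3 = Fraction(1, 2), Fraction(1, 3), Fraction(2, 3)
x, y = (6, 0), (0, 9)        # x = OA, y = OB for a sample triangle OAB

M = combo(half, x, half, y)  # midpoint of AB

# The points two-thirds of the way along each median:
P = combo(two3 * half, x, 1 - two3, y)  # y + (2/3)((1/2)x - y)
Q = combo(two3, M, 0, M)                # (2/3) OM
R = combo(1 - two3, x, two3 * half, y)  # x + (2/3)((1/2)y - x)

print(P == Q == R)                      # True
print(P == combo(third, x, third, y))   # True: the common point is (1/3)(x + y)
```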
► EXERCISES 1.1
5. Let $ABCD$ be an arbitrary quadrilateral. Let $P$, $Q$, $R$, and $S$ be the midpoints of $\overline{AB}$, $\overline{BC}$, $\overline{CD}$,
and DA, respectively. Use vector methods to prove that PQRS is a parallelogram. (Hint: Use
Exercise 4.)
*6. In $\triangle ABC$ pictured in Figure 1.9, $\|\overrightarrow{AD}\| = \frac{1}{2}\|\overrightarrow{AB}\|$ and $\|\overrightarrow{CE}\| = \frac{1}{3}\|\overrightarrow{CB}\|$. Let $Q$ denote the midpoint of $\overline{CD}$; show that $\overrightarrow{AQ} = c\overrightarrow{AE}$ for some scalar $c$, and determine the ratio $c = \|\overrightarrow{AQ}\|/\|\overrightarrow{AE}\|$. In what ratio does $\overline{CD}$ divide $\overline{AE}$?
Figure 1.9
7. Consider parallelogram $ABCD$. Suppose $\overrightarrow{AE} = \frac{1}{3}\overrightarrow{AB}$ and $\overrightarrow{DP} = \frac{3}{4}\overrightarrow{DE}$. Show that $P$ lies on the diagonal $\overline{AC}$. (See Figure 1.10.)
Figure 1.10
8. Let $A$, $B$, and $C$ be vertices of a triangle in $\mathbb{R}^3$. Let $x = \overrightarrow{OA}$, $y = \overrightarrow{OB}$, and $z = \overrightarrow{OC}$. Show that the head of the vector $v = \frac{1}{3}(x + y + z)$ lies on each median of $\triangle ABC$ (and thus is the point of intersection of the three medians). It follows (see Section 4 of Chapter 7) that when we put equal masses at $A$, $B$, and $C$, the center of mass of that system is given by the intersection of the medians of the triangle.
9. (a) Let $u, v \in \mathbb{R}^2$. Describe the vectors $x = su + tv$, where $s + t = 1$. Pay particular attention to the location of $x$ when $s > 0$ and when $t > 0$.
(b) Let $u, v, w \in \mathbb{R}^3$. Describe the vectors $x = ru + sv + tw$, where $r + s + t = 1$. Pay particular attention to the location of $x$ when each of $r$, $s$, and $t$ is positive.
10. Suppose $x, y \in \mathbb{R}^n$ are nonparallel vectors. (Recall the definition on p. 3.)
(a) Prove that if $sx + ty = 0$, then $s = t = 0$. (Hint: Show that neither $s \neq 0$ nor $t \neq 0$ is possible.)
(b) Prove that if $ax + by = cx + dy$, then $a = c$ and $b = d$.
11. "Discover" the fraction 2/3 that appears in Proposition 1.1 by finding the intersection of two medians. (Hint: A point on the line $OM$ can be written in the form $t(x + y)$ for some scalar $t$, and a point on the line $AN$ can be written in the form $x + s\left(\frac{1}{2}y - x\right)$ for some scalar $s$. You will need to use the result of Exercise 10.)
12. Verify both algebraically and geometrically that the following properties of vector arithmetic hold. (Do so for $n = 2$ if the general case is too intimidating.)
(a) For all $x, y \in \mathbb{R}^n$, $x + y = y + x$.
(b) For all $x, y, z \in \mathbb{R}^n$, $(x + y) + z = x + (y + z)$.
(c) $0 + x = x$ for all $x \in \mathbb{R}^n$.
► 2 DOT PRODUCT
Then we observe that when $\angle POQ$ is a right angle, $\triangle OAP$ is similar to $\triangle OBQ$, and so $x_2/x_1 = -y_1/y_2$, whence $x_1y_1 + x_2y_2 = 0$. This leads us to make the following
Definition Given vectors $x, y \in \mathbb{R}^n$, define their dot product
$$x \cdot y = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$
We know that when the vectors $x$ and $y \in \mathbb{R}^2$ are perpendicular, their dot product is 0.
By starting with the algebraic properties of the dot product, we are able to get a great deal
of geometry out of it.
Proposition 2.1 For any vectors $x, y, z \in \mathbb{R}^n$ and any scalar $c$, we have: $x \cdot y = y \cdot x$; $x \cdot x \geq 0$, with equality only when $x = 0$; $(cx) \cdot y = c(x \cdot y)$; and $x \cdot (y + z) = x \cdot y + x \cdot z$.
Proof In order to simplify the notation, we give the proof with $n = 2$. Since multiplication of real numbers is commutative, we have
$$x \cdot y = x_1y_1 + x_2y_2 = y_1x_1 + y_2x_2 = y \cdot x.$$
The square of a real number is nonnegative and the sum of nonnegative numbers is nonnegative, so $x \cdot x = x_1^2 + x_2^2 \geq 0$ and is equal to 0 only when $x_1 = x_2 = 0$. The next property follows from the associative and distributive properties of real numbers:
$$(cx) \cdot y = (cx_1)y_1 + (cx_2)y_2 = c(x_1y_1) + c(x_2y_2) = c(x_1y_1 + x_2y_2) = c(x \cdot y).$$
The last result follows from the commutative, associative, and distributive properties of real numbers:
$$x \cdot (y + z) = x_1(y_1 + z_1) + x_2(y_2 + z_2) = x_1y_1 + x_2y_2 + x_1z_1 + x_2z_2 = x \cdot y + x \cdot z,$$
as desired. ■
The geometric meaning of this result comes from the Pythagorean Theorem: When $x$ and $y$ are perpendicular vectors in $\mathbb{R}^2$, then we have $\|x + y\|^2 = \|x\|^2 + \|y\|^2$, and so, by Corollary 2.2, it must be the case that $x \cdot y = 0$. (And the converse follows, too, from the converse of the Pythagorean Theorem.) That is, two vectors in $\mathbb{R}^2$ are perpendicular if and only if their dot product is 0.
Motivated by this, we use the algebraic definition of dot product of vectors in $\mathbb{R}^n$ to bring in the geometry. In keeping with current use of the terminology and falling prey to the penchant to have several names for the same thing, we make the following
Definition We say vectors $x, y \in \mathbb{R}^n$ are orthogonal (or perpendicular) if $x \cdot y = 0$.
Armed with this definition, we proceed to a construction that will be important in much
of our future work. Starting with two vectors $x, y \in \mathbb{R}^n$, where $y \neq 0$, Figure 2.2 suggests that we should be able to write $x$ as the sum of a vector, $x^{\parallel}$, that is parallel to $y$ and a vector, $x^{\perp}$, that is orthogonal to $y$. Let's suppose we have such an equation:
$$x = x^{\parallel} + x^{\perp}.$$
To say that $x^{\parallel}$ is a scalar multiple of $y$ means that we can write $x^{\parallel} = cy$ for some scalar $c$. Now, assuming such an expression exists, we can determine $c$ by taking the dot product of both sides of the equation with $y$:
$$x \cdot y = (x^{\parallel} + x^{\perp}) \cdot y = (x^{\parallel} \cdot y) + (x^{\perp} \cdot y) = x^{\parallel} \cdot y = (cy) \cdot y = c\|y\|^2.$$
Thus $c = \dfrac{x \cdot y}{\|y\|^2}$, and so we define
$$x^{\parallel} = \frac{x \cdot y}{\|y\|^2}\,y \qquad \text{and} \qquad x^{\perp} = x - \frac{x \cdot y}{\|y\|^2}\,y.$$
Obviously, $x^{\parallel} + x^{\perp} = x$ and $x^{\parallel}$ is a scalar multiple of $y$. All we need to check is that $x^{\perp}$ is in fact orthogonal to $y$. Well,
$$x^{\perp} \cdot y = \left(x - \frac{x \cdot y}{\|y\|^2}\,y\right) \cdot y = x \cdot y - \frac{x \cdot y}{\|y\|^2}\,(y \cdot y) = x \cdot y - \frac{x \cdot y}{\|y\|^2}\,\|y\|^2 = x \cdot y - x \cdot y = 0,$$
as required. Note, moreover, that $x^{\parallel}$ is the unique multiple of $y$ that satisfies the equation $(x - x^{\parallel}) \cdot y = 0$.
► EXAMPLE 1
Let $x = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}$ and $y = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$. Then $x \cdot y = -2 + 3 + 1 = 2$ and $\|y\|^2 = 3$, so
$$x^{\parallel} = \frac{x \cdot y}{\|y\|^2}\,y = \frac{2}{3}\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \qquad \text{and} \qquad x^{\perp} = x - x^{\parallel} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} - \frac{2}{3}\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 8/3 \\ 7/3 \\ 1/3 \end{bmatrix}.$$
To double-check, we compute $x^{\perp} \cdot y = -\frac{8}{3} + \frac{7}{3} + \frac{1}{3} = 0$, as it should be. ◄
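The projection construction is easy to test in exact arithmetic; the Python sketch below (function names are illustrative) reproduces the decomposition for the vectors of Example 1 and confirms that $x^{\perp} \cdot y = 0$.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(x, y):
    # x_par = ((x . y) / ||y||^2) y  and  x_perp = x - x_par.
    c = Fraction(dot(x, y), dot(y, y))
    x_par = tuple(c * b for b in y)
    x_perp = tuple(a - p for a, p in zip(x, x_par))
    return x_par, x_perp

x, y = (2, 3, 1), (-1, 1, 1)
x_par, x_perp = project(x, y)
print(x_par == (Fraction(-2, 3), Fraction(2, 3), Fraction(2, 3)))  # True
print(x_perp == (Fraction(8, 3), Fraction(7, 3), Fraction(1, 3)))  # True
print(dot(x_perp, y))                                              # 0
```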
Suppose $x, y \in \mathbb{R}^2$. We shall see next that the formula for the projection of $x$ onto $y$ enables us to calculate the angle between the vectors $x$ and $y$. Consider the right triangle in Figure 2.3; let $\theta$ denote the angle between the vectors $x$ and $y$. Remembering that the cosine of an angle is the ratio of the signed length of the adjacent side to the length of the hypotenuse, we see that
$$\cos\theta = \frac{\text{signed length of } x^{\parallel}}{\text{length of } x} = \frac{(x \cdot y)/\|y\|}{\|x\|} = \frac{x \cdot y}{\|x\|\|y\|},$$
and so
$$x \cdot y = \|x\|\|y\|\cos\theta.$$
Will this formula still make sense even when $x, y \in \mathbb{R}^n$? Geometrically, we simply restrict our attention to the plane spanned by $x$ and $y$ and measure the angle $\theta$ in that plane, and so we blithely make the
Definition Let $x$ and $y$ be nonzero vectors in $\mathbb{R}^n$. We define the angle between them to be the unique $\theta$ satisfying $0 \leq \theta \leq \pi$ so that
$$\cos\theta = \frac{x \cdot y}{\|x\|\|y\|}.$$
Since our geometric intuition may be misleading in $\mathbb{R}^n$, we should check algebraically that this definition makes sense. Since $|\cos\theta| \leq 1$, the following result gives us what is needed.
Proposition 2.3 (Cauchy-Schwarz Inequality) For any vectors $x, y \in \mathbb{R}^n$, we have
$$|x \cdot y| \leq \|x\|\|y\|.$$
Moreover, equality holds if and only if one of the vectors is a scalar multiple of the other.
Proof If $y = 0$, the result is immediate, so we assume $y \neq 0$ and consider the function
$$g(t) = \|x - ty\|^2 = \|x\|^2 - 2t(x \cdot y) + t^2\|y\|^2,$$
which takes its minimum at $t_0 = \dfrac{x \cdot y}{\|y\|^2}$. The minimum value
$$g(t_0) = \|x\|^2 - \frac{2(x \cdot y)^2}{\|y\|^2} + \frac{(x \cdot y)^2}{\|y\|^2} = \|x\|^2 - \frac{(x \cdot y)^2}{\|y\|^2}$$
is necessarily nonnegative, so
$$(x \cdot y)^2 \leq \|x\|^2\|y\|^2,$$
and taking square roots gives the result. Equality holds precisely when $g(t_0) = \|x - t_0y\|^2 = 0$, i.e., when $x = t_0y$ is a scalar multiple of $y$. ■
One of the most useful applications of this result is the famed triangle inequality, which
tells us that the sum of the lengths of two sides of a triangle cannot be less than the length
of the third.
Corollary 2.4 (Triangle Inequality) For any vectors $x, y \in \mathbb{R}^n$, we have $\|x + y\| \leq \|x\| + \|y\|$.
Proof Using the Cauchy-Schwarz inequality, we have
$$\|x + y\|^2 = \|x\|^2 + 2x \cdot y + \|y\|^2 \leq \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$
Since square root preserves inequality, we conclude that $\|x + y\| \leq \|x\| + \|y\|$, as desired. ■
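Both inequalities can be stress-tested numerically; the Python sketch below (an illustrative check, with a small tolerance for floating-point roundoff) verifies the Cauchy-Schwarz and triangle inequalities on a batch of random vectors.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

random.seed(1)
ok = True
for _ in range(1000):
    n = random.randint(1, 6)
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    # Cauchy-Schwarz: |x . y| <= ||x|| ||y||
    ok &= abs(dot(x, y)) <= norm(x) * norm(y) + 1e-9
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||
    s = [a + b for a, b in zip(x, y)]
    ok &= norm(s) <= norm(x) + norm(y) + 1e-9
print(ok)  # True
```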
Remark The dot product also arises in situations removed from geometry. The
economist introduces the commodity vector $x$, whose entries are the quantities of various commodities that happen to be of interest, and the price vector $p$. For example, we might consider
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \qquad \text{and} \qquad p = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \end{bmatrix} \in \mathbb{R}^5,$$
where $x_1$ represents the number of pounds of flour, $x_2$ the number of dozens of eggs, $x_3$ the number of pounds of chocolate chips, $x_4$ the number of pounds of walnuts, and $x_5$ the number of pounds of butter needed to produce a certain massive quantity of chocolate chip cookies, and $p_i$ is the price (in dollars) of a unit of the $i$th commodity (e.g., $p_2$ is the price of a dozen eggs). Then it is easy to see that
$$x \cdot p = x_1p_1 + x_2p_2 + x_3p_3 + x_4p_4 + x_5p_5$$
is the total cost of producing the massive quantity of cookies. (To be realistic, we might also want to include $x_6$ as the number of hours of labor, with corresponding hourly wage $p_6$.) We will return to this interpretation in Section 4.
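As a toy numerical illustration of the cost interpretation (the quantities and prices below are invented for the example, not taken from the text), the total cost is just the dot product $x \cdot p$:

```python
# Invented sample data for the chocolate-chip-cookie illustration:
# flour (lb), eggs (dozens), chocolate chips (lb), walnuts (lb), butter (lb)
x = (50, 12, 30, 10, 24)            # commodity vector: quantities needed
p = (0.40, 1.20, 2.50, 3.00, 2.00)  # price vector: dollars per unit

# The total cost is exactly the dot product x . p.
total_cost = sum(xi * pi for xi, pi in zip(x, p))
print(round(total_cost, 2))  # 187.4
```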
► EXERCISES 1.2
1. For each of the following pairs of vectors $x$ and $y$, calculate $x \cdot y$ and the angle $\theta$ between them.
*2. For each pair of vectors in Exercise 1, calculate $\mathrm{proj}_y x$ and $\mathrm{proj}_x y$.
*3. Find the angle between the long diagonal of a cube and a face diagonal.
4. Find the angle that the long diagonal of a 3 x 4 x 5 rectangular box makes with the longest edge.
5. Suppose $x, y \in \mathbb{R}^n$, $\|x\| = 2$, $\|y\| = 1$, and the angle $\theta$ between $x$ and $y$ is $\theta = \arccos(1/4)$. Prove that the vectors $x - 3y$ and $x + y$ are orthogonal.
6. Suppose $x, y, z \in \mathbb{R}^2$ are unit vectors satisfying $x + y + z = 0$. What can you say about the angles between each pair?
7. Let $e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ be the so-called standard basis vectors for $\mathbb{R}^3$. Let $x \in \mathbb{R}^3$ be a nonzero vector. For $i = 1, 2, 3$, let $\theta_i$ denote the angle between $x$ and $e_i$. Compute $\cos^2\theta_1 + \cos^2\theta_2 + \cos^2\theta_3$.
*8. Let $x = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$ and $y = \begin{bmatrix} 1 \\ 2 \\ \vdots \\ n \end{bmatrix} \in \mathbb{R}^n$. Let $\theta_n$ be the angle between $x$ and $y$ in $\mathbb{R}^n$. Find $\lim_{n \to \infty} \theta_n$. (Hint: You may need to recall the formulas for $1 + 2 + \cdots + n$ and $1^2 + 2^2 + \cdots + n^2$ from your beginning calculus course.)
9. With regard to the proof of Proposition 2.3, how is $t_0y$ related to $x^{\parallel}$? What does this say about $\mathrm{proj}_y x$?
10. Use vector methods to prove that a parallelogram is a rectangle if and only if its diagonals have
the same length.
11. Use the fundamental properties of the dot product to prove that
$$\|x + y\|^2 + \|x - y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right).$$
Interpret the result geometrically.
*12. Use the dot product to prove the law of cosines: As shown in Figure 2.4,
$$c^2 = a^2 + b^2 - 2ab\cos\theta.$$
13. Use vector methods to prove that the diagonals of a parallelogram are orthogonal if and only if
the parallelogram is a rhombus (i.e., has all sides of equal length).
14. Use vector methods to prove that a triangle inscribed in a circle and having a diameter as one of
its sides must be a right triangle. (Hint: See Figure 2.5.)
Geometric challenge: More generally, given two points $A$ and $B$ in the plane, what is the locus of points $X$ so that $\angle AXB$ has a fixed measure?
15. (a) Let $y \in \mathbb{R}^n$. If $x \cdot y = 0$ for all $x \in \mathbb{R}^n$, then prove that $y = 0$.
(b) Suppose $y, z \in \mathbb{R}^n$ and $x \cdot y = x \cdot z$ for all $x \in \mathbb{R}^n$. What can you conclude?
16. If $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$, set $\rho(x) = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix}$.
(a) Check that $\rho(x)$ is orthogonal to $x$; indeed, $\rho(x)$ is obtained by rotating $x$ an angle $\pi/2$ counterclockwise.
(b) Given $x, y \in \mathbb{R}^2$, prove that $x \cdot \rho(y) = -\rho(x) \cdot y$. Interpret this statement geometrically.
#17. Prove that for any vectors $x, y \in \mathbb{R}^n$, we have $\|x\| - \|y\| \leq \|x - y\|$. Deduce that $\big|\|x\| - \|y\|\big| \leq \|x - y\|$. (Hint: Apply the result of Corollary 2.4 directly.)
18. Use the Cauchy-Schwarz inequality to solve the following max/min problem: If the (long)
diagonal of a rectangular box has length c, what is the greatest the sum of the length, width, and
height of the box can be? For what shape box does the maximum occur?
19. Give an alternative proof of the Cauchy-Schwarz inequality, as follows. Let $a = \|x\|$, $b = \|y\|$, and deduce from $\|bx - ay\|^2 \geq 0$ that $x \cdot y \leq ab$. Now how do you show that $|x \cdot y| \leq ab$? When does equality hold?
*20. (a) Let $x$ and $y$ be vectors with $\|x\| = \|y\|$. Prove that the vector $x + y$ bisects the angle between $x$ and $y$.
(b) More generally, if $x$ and $y$ are arbitrary nonzero vectors, let $a = \|x\|$ and $b = \|y\|$. Prove that the vector $bx + ay$ bisects the angle between $x$ and $y$.
21. Use vector methods to prove that the diagonals of a parallelogram bisect the vertex angles if and
only if the parallelogram is a rhombus.
22. Given $\triangle ABC$ with $D$ on $\overline{BC}$ as shown in Figure 2.6. Prove that if $\overline{AD}$ bisects $\angle BAC$, then $\|\overrightarrow{BD}\|/\|\overrightarrow{CD}\| = \|\overrightarrow{AB}\|/\|\overrightarrow{AC}\|$. (Hint: Use Exercise 20b. Let $x = \overrightarrow{AB}$ and $y = \overrightarrow{AC}$; give two expressions for $\overrightarrow{AD}$ in terms of $x$ and $y$ and use Exercise 1.1.10.)
23. Use vector methods to prove that the angle bisectors of a triangle have a common point. (Hint: Given $\triangle OAB$, let $x = \overrightarrow{OA}$, $y = \overrightarrow{OB}$, $a = \|\overrightarrow{OA}\|$, $b = \|\overrightarrow{OB}\|$, and $c = \|\overrightarrow{AB}\|$. If we define the point $P$ by $\overrightarrow{OP} = \frac{1}{a+b+c}(bx + ay)$, use Exercise 20b to show that $P$ lies on all three angle bisectors.)
24. Use vector methods to prove that the altitudes of a triangle have a common point. Recall that altitudes of a triangle are the lines passing through a vertex and perpendicular to the opposite side. (Hint: See Figure 2.7. Let $C$ be the point of intersection of the altitude from $B$ to $\overline{OA}$ and the altitude from $A$ to $\overline{OB}$. Prove that $\overrightarrow{OC}$ is orthogonal to $\overrightarrow{AB}$.)
25. Use vector methods to prove that the perpendicular bisectors of the sides of a triangle intersect in a point, as follows. Assume the triangle $OAB$ has one vertex at the origin, and let $x = \overrightarrow{OA}$ and $y = \overrightarrow{OB}$.
(a) Let $z$ be the point of intersection of the perpendicular bisectors of $\overline{OA}$ and $\overline{OB}$. Prove that (using the notation of Exercise 16)
$$z = \tfrac{1}{2}x + c\rho(x), \qquad \text{where } c = \frac{\|y\|^2 - x \cdot y}{2\rho(x) \cdot y}.$$
(b) Show that $z$ lies on the perpendicular bisector of $\overline{AB}$. (Hint: What is the dot product of $z - \frac{1}{2}(x + y)$ with $y - x$?)
26. Let $P$ be the intersection of the medians of $\triangle OAB$ (see Proposition 1.1), $Q$ the intersection of its altitudes (see Exercise 24), and $R$ the intersection of the perpendicular bisectors of its sides (see Exercise 25). Show that $P$, $Q$, and $R$ are collinear and that $P$ is two-thirds of the way from $Q$ to $R$. Does the intersection of the angle bisectors (see Exercise 23) lie on this line as well?
► 3 SUBSPACES OF Rⁿ
Definition A subset $V \subset \mathbb{R}^n$ is called a subspace if it satisfies the following three criteria: (1) $0 \in V$; (2) whenever $v \in V$ and $c \in \mathbb{R}$, we have $cv \in V$; and (3) whenever $v, w \in V$, we have $v + w \in V$.
► EXAMPLE 1
a. The trivial subspace consisting of just the zero vector $0 \in \mathbb{R}^n$ is a subspace, since $c0 = 0$ for any scalar $c$ and $0 + 0 = 0$.
b. $\mathbb{R}^n$ itself is likewise a subspace of $\mathbb{R}^n$.
c. Fix nonzero vectors $u, v \in \mathbb{R}^n$, and consider
$$\mathcal{P} = \{su + tv : s, t \in \mathbb{R}\},$$
as shown in Figure 3.1. $\mathcal{P}$ is called a plane through the origin. To see that $\mathcal{P}$ is a subspace, we do the obligatory checks: taking $s = t = 0$ shows $0 \in \mathcal{P}$; if $x = su + tv$ and $c$ is a scalar, then $cx = (cs)u + (ct)v \in \mathcal{P}$; and if $x = su + tv$ and $y = s'u + t'v$, then $x + y = (s + s')u + (t + t')v$, so $x + y \in \mathcal{P}$, as required.
Figure 3.1
e. Fix a nonzero vector $A \in \mathbb{R}^n$, and consider
$$V = \{x \in \mathbb{R}^n : A \cdot x = 0\}.$$
$V$ consists of all vectors orthogonal to the given vector $A$, as pictured in Figure 3.2. We check once again that the three criteria hold:
Figure 3.2
1. $A \cdot 0 = 0$, so $0 \in V$.
2. Suppose $v \in V$ and $c \in \mathbb{R}$. Then $A \cdot (cv) = c(A \cdot v) = c \cdot 0 = 0$, so $cv \in V$.
3. Suppose $v, w \in V$. Then $A \cdot (v + w) = (A \cdot v) + (A \cdot w) = 0 + 0 = 0$, so $v + w \in V$, as required.
Thus, $V$ is a subspace of $\mathbb{R}^n$. We call $V$ a hyperplane in $\mathbb{R}^n$, having normal vector $A$. More generally, given any collection of vectors $A_1, \ldots, A_m \in \mathbb{R}^n$, the set of solutions of the homogeneous system of linear equations
$$A_1 \cdot x = 0, \quad A_2 \cdot x = 0, \quad \ldots, \quad A_m \cdot x = 0$$
is likewise a subspace of $\mathbb{R}^n$.
► EXAMPLE 2
Let's consider next a few subsets of $\mathbb{R}^2$, as pictured in Figure 3.3, that are not subspaces. In each case, the easiest way to verify this is to point out that $0 \notin S$.
Figure 3.3
Given a collection of vectors in R", it is natural to try to “build” a subspace from them.
We begin with some crucial definitions.
Definition Let $v_1, \ldots, v_k \in \mathbb{R}^n$. A vector of the form
$$c_1v_1 + c_2v_2 + \cdots + c_kv_k, \qquad c_1, \ldots, c_k \in \mathbb{R}$$
(as illustrated in Figure 3.4) is called a linear combination of $v_1, \ldots, v_k$. The set of all linear combinations of $v_1, \ldots, v_k$ is called their span, denoted $\mathrm{Span}(v_1, \ldots, v_k)$.
The vectors $e_1, \ldots, e_n$ are often called the standard basis vectors for $\mathbb{R}^n$. Obviously, given the vector $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$, we have $x = x_1e_1 + x_2e_2 + \cdots + x_ne_n$, so every vector in $\mathbb{R}^n$ is a linear combination of the standard basis vectors.
We check that $V = \mathrm{Span}(v_1, \ldots, v_k)$ is always a subspace of $\mathbb{R}^n$:
1. $0 = 0v_1 + \cdots + 0v_k \in V$.
2. Suppose $v = c_1v_1 + \cdots + c_kv_k \in V$ and $a \in \mathbb{R}$. Then $av = (ac_1)v_1 + \cdots + (ac_k)v_k \in V$.
3. Suppose $v, w \in V$. This means there are scalars $c_1, \ldots, c_k$ and $d_1, \ldots, d_k$ so that $v = c_1v_1 + \cdots + c_kv_k$ and $w = d_1v_1 + \cdots + d_kv_k$; adding, we obtain $v + w = (c_1 + d_1)v_1 + \cdots + (c_k + d_k)v_k \in V$.
Remark Let V ⊂ R^n be a subspace and let v1, ..., vk ∈ V. We say that v1, ..., vk span V if Span(v1, ..., vk) = V. (The point here is that every vector in V must be a linear combination of the vectors v1, ..., vk.) As we shall see in Chapter 4, it takes at least n vectors to span R^n; the smallest number of vectors required to span a given subspace will be a measure of its "size" or "dimension."
► EXAMPLE 3
The plane
P2 = { x = (1, 0, 0) + s(1, −1, 2) + t(2, 0, 1) : s, t ∈ R }
is not a subspace. (By contrast, the plane P1 = Span((1, −1, 2), (2, 0, 1)) through the origin is.) This is most easily verified by checking that 0 ∉ P2, for 0 ∈ P2 precisely when we can find values of s and t so that
0 = (1, 0, 0) + s(1, −1, 2) + t(2, 0, 1),
i.e.,
s + 2t = −1
−s = 0
2s + t = 0,
and this system has no solution.
On the other hand, if the shift vector happens to lie in the span of the two direction vectors — say
x = (3, −1, 3) + s(1, −1, 2) + t(2, 0, 1), with (3, −1, 3) = (1, −1, 2) + (2, 0, 1) —
then taking s = t = −1 gives 0, so that, despite the presence of the "shifting" term, the plane may still pass through the origin. ◄
There are really two different ways in which subspaces of R^n arise: as being the span
of a collection of vectors (the “parametric” approach) or as being the set of solutions of a
(homogeneous) system of linear equations (the “implicit” approach). We shall study the
connections between the two in detail in Chapter 4.
► EXAMPLE 4
As the reader can verify, the vector A = (−1, 3, 2) is orthogonal to both of the vectors that span the plane P1 given in Example 3 above. Thus, every vector in P1 is orthogonal to A, and we suspect that
P1 = {x ∈ R^3 : −x1 + 3x2 + 2x3 = 0}.
Strictly speaking, we only know that every vector in P1 is a solution of this equation. But note that if x is a solution, then x1 = 3x2 + 2x3, so x = x2(3, 1, 0) + x3(2, 0, 1); since (3, 1, 0) = −(1, −1, 2) + 2(2, 0, 1) and (2, 0, 1) both lie in P1, we have x ∈ P1 and the two sets are equal.¹ Thus, the discussion of Example 1e gives another justification that P1 is a subspace of R^3.
On the other hand, one can check, analogously, that
P2 = {x ∈ R^3 : −x1 + 3x2 + 2x3 = −1},
and so clearly 0 ∉ P2 and P2 is not a subspace. It is an affine plane parallel to P1. ◄
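The computations of Examples 3 and 4 can be checked numerically. Here is a minimal sketch (the function name `dot` and the sampling of (s, t) pairs are ours): A = (−1, 3, 2) is orthogonal to both spanning vectors, so every point of P1 satisfies A · x = 0, while every point of the shifted plane P2 satisfies A · x = −1.

```python
# Check Examples 3 and 4: A = (-1, 3, 2) is orthogonal to the vectors
# spanning P1, so points of P1 satisfy A . x = 0, while points of the
# shifted plane P2 = (1, 0, 0) + P1 satisfy A . x = -1.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

u, v = (1, -1, 2), (2, 0, 1)
A = (-1, 3, 2)
assert dot(A, u) == 0 and dot(A, v) == 0

for s in range(-3, 4):
    for t in range(-3, 4):
        p1 = tuple(s * ui + t * vi for ui, vi in zip(u, v))   # point of P1
        p2 = tuple(e + q for e, q in zip((1, 0, 0), p1))      # point of P2
        assert dot(A, p1) == 0     # P1 is the hyperplane A . x = 0
        assert dot(A, p2) == -1    # P2 is the parallel affine plane
```

This also illustrates the "parametric versus implicit" dichotomy: the loop produces points parametrically, and the dot products verify the implicit equation.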
Definition Let V and W be subspaces of R^n. We say they are orthogonal subspaces if every element of V is orthogonal to every element of W, i.e., if v · w = 0 for every v ∈ V and every w ∈ W.
¹Ordinarily, the easiest way to establish that two sets are equal is to show that each is a subset of the other.
Figure 3.5
► EXAMPLE 5
Let V = Span((1, 2, 1)). Then V⊥ is the plane W = {x ∈ R^3 : x1 + 2x2 + x3 = 0}. Now what is the orthogonal complement of W? We suspect it is just the line V, but we will have to wait until Chapter 4 to have the appropriate tools.
► EXERCISES 1.3
*1. Which of the following are subspaces? Justify your answer in each case.
(a) {x ∈ R^2 : x1 + x2 = 1}
(b) {x ∈ R^3 : x = (a, b, a + b) for some a, b ∈ R}
(c) {x ∈ R^3 : x1 + 2x2 < 0}
(d) {x ∈ R^3 : x1² + x2² + x3² = 1}
(e) {x ∈ R^3 : x1² + x2² + x3² = 0}
(f) {x ∈ R^3 : x1² + x2² + x3² = −1}
²In fact, both this definition and Proposition 3.2 work just fine for any subset V ⊂ R^n.
4 Linear Transformations and Matrix Algebra 23
*2. Criticize the following argument: By Exercise 1.1.13, for any vector v, we have 0v = 0. So the first criterion for subspaces is, in fact, a consequence of the second criterion and could therefore be omitted.
#3. Suppose x, v1, ..., vk ∈ R^n and x is orthogonal to each of the vectors v1, ..., vk. Prove that x is orthogonal to any linear combination c1v1 + c2v2 + · · · + ckvk.
4. Prove Proposition 3.2.
5. Given vectors v1, ..., vk ∈ R^n, prove that V = Span(v1, ..., vk) is the smallest subspace containing them all. That is, prove that if W ⊂ R^n is a subspace and v1, ..., vk ∈ W, then V ⊂ W.
#6. (a) Let U and V be subspaces of R^n. Define
U ∩ V = {x ∈ R^n : x ∈ U and x ∈ V}.
Prove that U ∩ V is a subspace of R^n. Give two examples.
(b) Is U ∪ V = {x ∈ R^n : x ∈ U or x ∈ V} a subspace of R^n? Give a proof or counterexample.
(c) Let U and V be subspaces of R^n. Define
U + V = {x ∈ R^n : x = u + v for some u ∈ U and v ∈ V}.
Prove that U + V is a subspace of R^n. Give two examples.
7. Let v1, ..., vk ∈ R^n and let v ∈ R^n. Prove that
Span(v1, ..., vk) = Span(v1, ..., vk, v) ⟺ v ∈ Span(v1, ..., vk).
#*8. Let V ⊂ R^n be a subspace. Prove that V ∩ V⊥ = {0}.
#9. Suppose U, V ⊂ R^n are subspaces and U ⊂ V. Prove that V⊥ ⊂ U⊥.
#10. Let V ⊂ R^n be a subspace. Prove that V ⊂ (V⊥)⊥. Do you think more is true?
#11. Suppose V = Span(v1, ..., vk) ⊂ R^n. Show that there are vectors w1, ..., wk ∈ V that are mutually orthogonal (i.e., wi · wj = 0 whenever i ≠ j) and that also span V. (Hint: Let w1 = v1. Using the techniques of Section 2, define w2 so that Span(w1, w2) = Span(v1, v2) and w1 · w2 = 0. Continue.)
12. Suppose U and V are subspaces of R^n. Prove that (U + V)⊥ = U⊥ ∩ V⊥. (See the footnote on p. 21.)
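The hint in Exercise 11 describes, in effect, the Gram–Schmidt process: each new vector has its components along the previously constructed w's subtracted off, using the projection formula of Section 2. A minimal sketch (the function names and the sample vectors are ours):

```python
# The procedure of Exercise 11's hint: orthogonalize v1, ..., vk by
# subtracting from each v its projections onto the w's found so far.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(vs):
    ws = []
    for v in vs:
        w = list(v)
        for u in ws:
            c = dot(v, u) / dot(u, u)          # component of v along u
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ws.append(w)
    return ws

ws = orthogonalize([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
# the resulting vectors are mutually orthogonal:
assert all(abs(dot(ws[i], ws[j])) < 1e-12 for i in range(3) for j in range(i))
```

Each partial list Span(w1, ..., wj) equals Span(v1, ..., vj), exactly as the hint requires.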
If we think visually of T as mapping R^n to R^m, then we have a diagram like Figure 4.1. The main point of the linearity properties is that the values of T on the standard basis vectors e1, ..., en completely determine the function T: For suppose x = x1e1 + · · · + xnen ∈ R^n; then
(*)  T(x) = x1T(e1) + x2T(e2) + · · · + xnT(en).
In particular, let A be the m × n matrix whose columns are the vectors T(e1), T(e2), ..., T(en) ∈ R^m; writing the entries of the j-th column as a1j, a2j, ..., amj, we have
A = [a_ij] = [ T(e1)  T(e2)  · · ·  T(en) ],
which we call the standard matrix for T. (We will often denote this by [T].) To emphasize: The j-th column of A is the vector in R^m obtained by applying T to the j-th standard basis vector, ej.
Figure 4.1
► EXAMPLE 1
The most basic example of a linear map is the following. Fix a ∈ R^n, and define T: R^n → R by T(x) = a · x. By Proposition 2.1, we have T(x + y) = a · (x + y) = a · x + a · y = T(x) + T(y) and T(cx) = a · (cx) = c(a · x) = cT(x), so T is linear. Since T(ej) = aj, if a = (a1, ..., an), then [T] = [ a1 a2 · · · an ].
► EXAMPLE 2
a. Consider the function T: R^2 → R^2 defined by rotating vectors in the plane counterclockwise by 90°. Then it is easy to see from the geometry in Figure 4.2 that T(x1, x2) = (−x2, x1), so
T(x + y) = T(x1 + y1, x2 + y2) = (−(x2 + y2), x1 + y1) = T(x) + T(y),
T(cx) = T(cx1, cx2) = (−cx2, cx1) = cT(x).
Figure 4.2
Better yet, since rotation carries lines through the origin to lines through the origin
and triangles to congruent triangles, it is clear on geometric grounds that T must satisfy
properties (i) and (ii).
b. Consider the function T: R2 -> R2 defined by reflecting vectors across the line xi = x2, as
shown in Figure 4.3. (Visualize this as looking at vectors through a mirror along that line.)
Once again, we see from the geometry that
T(x1, x2) = (x2, x1),
and linearity is obvious algebraically. But it should also be clear on geometric grounds that
stretching a vector and then looking at it in the mirror is the same as stretching its mirror
image, and likewise for addition of vectors. The standard matrix for T is
[T] = [ 0 1 ]
      [ 1 0 ].
Figure 4.3
The effect of T is pictured in Figure 4.4. One might slide a deck of cards in this fashion,
and such a motion is called a shear.
Figure 4.4
d. Consider the function T: R3 —> R3 defined by reflecting across the plane x3 = 0. Then
T(e1) = e1, T(e2) = e2, and T(e3) = −e3, so the standard matrix for T is
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 −1 ].
Consider next the function T: R^2 → R^2 given by rotating the plane counterclockwise through an angle θ. As its first column, the standard matrix for T has
T(e1) = (cos θ, sin θ)
(by the usual definition of cos θ and sin θ, in fact) and as its second
T(e2) = (−sin θ, cos θ)
(since e2 is obtained by rotating e1 through π/2, so T(e2) is obtained by rotating T(e1) through π/2). Thus, the standard matrix for T is
Aθ = [ cos θ  −sin θ ]
     [ sin θ   cos θ ].
Figure 4.6
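The rotation matrix Aθ is easy to experiment with numerically; in particular, composing two rotations adds the angles (this is the content of Exercise 6). A quick sketch, with names of our own choosing:

```python
# The rotation matrix A_theta, and a numerical check that
# A_a A_b = A_(a+b): composing rotations adds the angles.

import math

def rot(t):
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t),  math.cos(t)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a, b = 0.7, 0.4
P, Q = matmul(rot(a), rot(b)), rot(a + b)
assert all(abs(P[i][j] - Q[i][j]) < 1e-12 for i in range(2) for j in range(2))
```

Reading off the entries of the product symbolically yields the addition formulas for cos and sin, exactly as Exercise 6 asks.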
given respectively by projection onto, and reflection across, the line ℓ. Their standard matrices turn out to have fractional entries from which it seems impossible to discern the geometric nature of the linear map represented by such a matrix.³ In these examples, the standard "coordinate system" built into matrices just masks the geometry, and as we shall see, the solution is to change our coordinate system. This we do in Chapter 9.
Let T: R^n → R^m be a linear map, and let A be its standard matrix. We want to define the product of the m × n matrix A with the vector x ∈ R^n in such a way that the vector T(x) ∈ R^m is equal to Ax. (We will occasionally denote the linear map defined in this way by μA.) In accordance with the formula (*) on p. 24, we have
Ax = T(x) = x1T(e1) + · · · + xnT(en) = x1a1 + · · · + xnan,
where a1, ..., an are the column vectors of the matrix A. That is, Ax is the linear combination of the vectors a1, ..., an, weighted according to the coordinates of the vector x.
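Both interpretations of Ax — as a weighted combination of the columns, and (below) as dot products with the rows — compute the same vector. A minimal sketch with a small sample matrix of our own:

```python
# Two equivalent ways to compute Ax: as the linear combination
# x1*a1 + ... + xn*an of the columns of A, and row by row via
# dot products.

def matvec_by_columns(A, x):
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):                 # add x_j times the j-th column
        for i in range(m):
            result[i] += x[j] * A[i][j]
    return result

def matvec_by_rows(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 3], [2, -1], [1, 1]]
x = [2, 5]
assert matvec_by_columns(A, x) == matvec_by_rows(A, x) == [17, -1, 7]
```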
There is, however, an alternative interpretation. Let A1, A2, ..., Am ∈ R^n denote the row vectors of A. Then the i-th coordinate of Ax is the dot product Ai · x.
³For the curious among you, multiplication by C gives a rotation of R^3 through an angle of π/2 about a line through the origin. See Exercise 9.2.21.
As we shall study in great detail in Chapter 4, this allows us to interpret the equation Ax = y
as a system of m linear equations in the variables x1, ..., xn.
Given linear maps S, T: R^n → R^m and a scalar c, we define new linear maps cT and S + T by
(cT)(x) = c(T(x)) and (S + T)(x) = S(x) + T(x).
Correspondingly, we define the scalar multiple cA of a matrix entry by entry: (cA)_ij = c a_ij.
Given two matrices A and B ∈ M_{m×n}, we define their sum entry by entry as well. In symbols, when A = [a_ij] and B = [b_ij], we define (A + B)_ij = a_ij + b_ij.
► EXAMPLE 3
Let c = −2, let A and B be matrices of the same shape, and let C be a matrix of a different shape. Then cA and A + B are computed entry by entry, but neither sum A + C nor B + C makes sense, since C has a different shape from A and B. (One should not expect to be able to add functions with different domains or ranges.)
Denote by O the zero matrix, the m × n matrix all of whose entries are 0. As the reader can easily check, scalar multiplication of matrices and matrix addition satisfy the same properties as scalar multiplication of vectors and vector addition (see Exercise 1.1.12). We list them here for reference.
1. A + B = B + A.
2. (A + B) + C = A + (B + C).
3. O + A = A.
4. There is a matrix −A so that A + (−A) = O.
5. c(dA) = (cd)A.
6. c(A + B) = cA + cB.
7. (c + d)A = cA + dA.
8. 1A = A.
Of all the operations one performs on functions, probably the most powerful is composition. Recall that when g(x) is in the domain of f, we define (f ∘ g)(x) = f(g(x)). So, suppose we have linear maps S: R^p → R^n and T: R^n → R^m. Then we define T ∘ S: R^p → R^m
by (T ∘ S)(x) = T(S(x)). It is well known that composition of functions is not commutative⁴ but is associative, inasmuch as
((f ∘ g) ∘ h)(x) = (f ∘ g)(h(x)) = f(g(h(x))) = f((g ∘ h)(x)) = (f ∘ (g ∘ h))(x).
Let B be the standard matrix of S and A the standard matrix of T, and let C denote the standard matrix of T ∘ S. The j-th column of C is (T ∘ S)(ej) = T(S(ej)); writing bj = (b1j, b2j, ..., bnj) for the j-th column of B, we have
T(S(ej)) = T(bj) = b1j a1 + b2j a2 + · · · + bnj an,
where a1, ..., an are the column vectors of A. That is, the j-th column of C is the product of the matrix A with the vector bj. So we now make the definition:
Definition Given an m × n matrix A and an n × p matrix B, their product AB is the m × p matrix whose ij-entry is
(AB)_ij = Ai · bj = a_i1 b_1j + a_i2 b_2j + · · · + a_in b_nj,
i.e., the dot product of the i-th row vector of A and the j-th column vector of B, both of which are vectors in R^n. We reiterate that in order for the product AB to be defined, the number of columns of A must equal the number of rows of B.
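The defining formula translates directly into a triple loop (here a comprehension), checked against the 3 × 2 times 2 × 4 product worked out in Example 4 below (the matrices as we read them):

```python
# (AB)_ij = sum_k a_ik b_kj, with the shape requirement enforced.

def matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n              # columns of A must match rows of B
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 3], [2, -1], [1, 1]]
B = [[4, 1, 0, -2], [-1, 1, 5, 1]]
assert matmul(A, B) == [[1, 4, 15, 1], [9, 1, -5, -5], [3, 2, 5, -1]]
```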
► EXAMPLE 4
If
A = [ 1  3 ]        B = [  4 1 0 −2 ]
    [ 2 −1 ]  and       [ −1 1 5  1 ],
    [ 1  1 ]
then
AB = [ 1 4 15  1 ]
     [ 9 1 −5 −5 ]
     [ 3 2  5 −1 ].
Notice also that the product BA does not make sense: B is a 2 × 4 matrix and A is 3 × 2, and 4 ≠ 3. ◄
The preceding example brings out an important point about the nature of matrix mul
tiplication: It can happen that the matrix product AB is defined and the product BA is not.
Now if A is an m × n matrix and B is an n × m matrix, then both products AB and BA make sense: AB is m × m and BA is n × n. Notice that these are both square matrices, but
of different sizes. But even if we start with both A and B as n x n matrices, the products
AB and BA need not be equal.
► EXAMPLE 5
Even for two n × n matrices A and B, the products AB and BA can differ: a direct computation with almost any pair of 2 × 2 matrices, each with a single nonzero entry, produces AB ≠ BA.
► EXAMPLE 6
Let
A = (1/5) [ 1 2 ]
          [ 2 4 ].
Then it is easy to check that A² = A, so Aⁿ = A for all positive integers n (why?). What is the geometric explanation? Note that
A(1, 2) = (1, 2)  and  A(2, −1) = (0, 0),
so that for every x ∈ R^2, we see that Ax lies on the line spanned by (1, 2). Indeed, we can tell more:
Ax = ( (x1 + 2x2)/5, 2(x1 + 2x2)/5 ) = ((x1 + 2x2)/5) (1, 2)
is the projection of x onto the line spanned by (1, 2). This explains why A²x = Ax for every x ∈ R^2: A²x = A(Ax), and once we've projected the vector x onto the line, it stays exactly the same. ◄
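Idempotence of the projection matrix is easy to confirm numerically; a short sketch (the helper names are ours):

```python
# Example 6: A = (1/5)[[1, 2], [2, 4]] projects onto the line spanned
# by (1, 2); projecting twice is the same as projecting once, A^2 = A.

def matmul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1/5, 2/5], [2/5, 4/5]]
A2 = matmul(A, A)
assert all(abs(A2[i][j] - A[i][j]) < 1e-12 for i in range(2) for j in range(2))

# Ax always lands on the line x2 = 2*x1:
x = [3.0, -7.0]
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
assert abs(Ax[1] - 2 * Ax[0]) < 1e-12
```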
► EXAMPLE 7
There is an interesting way to interpret matrix powers in terms of directed graphs. Starting with the
matrix
0 2 1
A = 1 1 1
1 0 1
we draw a graph with three nodes (vertices) and directed edges (paths) from node i to node j, as
shown in Figure 4.7. For example, there are two edges from node 1 to node 2 and none from node 3
to node 2.
Figure 4.7
We calculate
A² = [ 3 2 3 ]
     [ 2 3 3 ]
     [ 1 2 2 ],
A³ = [ 5 8 8 ]
     [ 6 7 8 ]
     [ 4 4 5 ],  and
A⁷ = [ 272 338 377 ]
     [ 273 337 377 ]
     [ 169 208 233 ].
With a bit of thought, the reader will convince herself that the ij-entry of A² is the number of "two-step" directed paths from node i to node j. Similarly, the ij-entry of Aⁿ is the number of n-step directed paths from node i to node j.
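The path-counting claim can be tested by brute force: enumerate every possible sequence of intermediate nodes and multiply the edge multiplicities along the way. A sketch (the helper names are ours):

```python
# Example 7: the ij-entry of A^n counts the n-step directed walks from
# node i to node j, where A[i][j] is the number of edges from i to j.

from itertools import product

A = [[0, 2, 1], [1, 1, 1], [1, 0, 1]]

def matpow(M, n):
    def mul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]
    P = [[int(i == j) for j in range(3)] for i in range(3)]
    for _ in range(n):
        P = mul(P, A)
    return P

def count_walks(i, j, n):
    total = 0
    for mids in product(range(3), repeat=n - 1):
        path = (i,) + mids + (j,)
        w = 1
        for a, b in zip(path, path[1:]):
            w *= A[a][b]               # multiply edge multiplicities
        total += w
    return total

A3 = matpow(A, 3)
assert A3 == [[5, 8, 8], [6, 7, 8], [4, 4, 5]]
assert all(A3[i][j] == count_walks(i, j, 3) for i in range(3) for j in range(3))
```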
Proposition 4.2 Let A and A' be m × n matrices; let B and B' be n × p matrices; let C be a p × q matrix, and let c be a scalar. Then
1. (A + A')B = AB + A'B and A(B + B') = AB + AB';
2. (cA)B = c(AB) = A(cB);
3. (AB)C = A(BC).
Proof These are all immediate from the linear map viewpoint. ■
Definition Let A be an n × n matrix. We say A is invertible if there is an n × n matrix B so that
AB = BA = I_n.
We call B the inverse of the matrix A and denote this by B = A⁻¹.
If A is the matrix representing the linear transformation T: R^n → R^n, then A⁻¹ represents the inverse function T⁻¹, which must then also be a linear transformation.
► EXAMPLE 8
Let
► EXAMPLE 9
It will be convenient for our future work to have the inverse of a 2 × 2 matrix
A = [ a b ]
    [ c d ].
Provided ad − bc ≠ 0, if we set
A⁻¹ = (1/(ad − bc)) [  d −b ]
                    [ −c  a ],
then one checks easily that AA⁻¹ = A⁻¹A = I₂.
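The 2 × 2 inverse formula in code — swap the diagonal entries, negate the off-diagonal ones, divide by ad − bc (the function name is ours):

```python
# Example 9's formula for the inverse of a 2 x 2 matrix.

def inverse2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 7], [1, 4]]               # here ad - bc = 1
assert inverse2(A) == [[4.0, -7.0], [-1.0, 2.0]]
```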
► EXAMPLE 10
By Example 9, the rotation matrix Aθ (whose determinant is cos²θ + sin²θ = 1) has inverse
Aθ⁻¹ = [  cos θ  sin θ ]
       [ −sin θ  cos θ ].
Since cos(−θ) = cos θ and sin(−θ) = −sin θ, we see that this is the matrix A₋θ. If we think about the corresponding linear maps, this result becomes obvious: To invert (or "undo") a rotation through angle θ, we must rotate through angle −θ. ◄
► EXAMPLE 11
As an application of Example 9, we can now show that any two nonparallel vectors u, v ∈ R^2 must span R^2. It is easy to check that if u = (u1, u2) and v = (v1, v2) are nonparallel, then u1v2 − u2v1 ≠ 0, so the matrix
A = [ u1 v1 ]
    [ u2 v2 ]
is invertible. Given any b ∈ R^2, the vector x = A⁻¹b then satisfies Ax = x1u + x2v = b, so b ∈ Span(u, v). ◄
Proposition 4.3 Suppose A and B are invertible n × n matrices. Then their product AB is invertible, and
(AB)⁻¹ = B⁻¹A⁻¹.
Remark Some people refer to this result rather endearingly as the “shoe-sock theo
rem,” for to undo (invert) the process of putting on one’s socks and then one’s shoes, one
must first remove the shoes and then remove the socks.
Proof To prove the matrix AB is invertible, we need only check that the candidate for the inverse works. That is, using associativity, we check that
(B⁻¹A⁻¹)(AB) = B⁻¹(A⁻¹A)B = B⁻¹B = I, and likewise (AB)(B⁻¹A⁻¹) = A(BB⁻¹)A⁻¹ = AA⁻¹ = I. ■
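The shoe-sock rule is easy to confirm numerically for a pair of invertible 2 × 2 matrices (the helpers below repeat Example 9's formula; the sample matrices are ours):

```python
# Proposition 4.3 checked numerically: (AB)^-1 = B^-1 A^-1.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A, B = [[2, 1], [1, 1]], [[1, 3], [0, 1]]
lhs = inverse2(matmul(A, B))
rhs = matmul(inverse2(B), inverse2(A))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
```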
► EXAMPLE 12
Suppose
Proposition 4.4 Let A and A' be m × n matrices, let B be an n × p matrix, and let c be a scalar. Then
1. (Aᵀ)ᵀ = A;
2. (cA)ᵀ = cAᵀ;
3. (A + A')ᵀ = Aᵀ + A'ᵀ;
4. (AB)ᵀ = BᵀAᵀ.
Proof The first is obvious, since we swap rows and columns and then swap again, returning to our original matrix. The second and third can be immediately checked. The last result is more interesting, and we will use it to derive a crucial result in a moment. Note, first, that AB is an m × p matrix, so (AB)ᵀ will be a p × m matrix; BᵀAᵀ is the product of a p × n matrix and an n × m matrix and hence will be p × m as well, so the shapes agree. Now, the ji-entry of AB is the dot product of the j-th row vector of A and the i-th column vector of B; i.e., the ij-entry of (AB)ᵀ is
((AB)ᵀ)_ij = (AB)_ji = Aj · b_i.
On the other hand, the ij-entry of BᵀAᵀ is the dot product of the i-th row vector of Bᵀ and the j-th column vector of Aᵀ; but this is, by definition, the dot product of the i-th column vector of B and the j-th row vector of A. That is,
(BᵀAᵀ)_ij = b_i · Aj,
and so (AB)ᵀ = BᵀAᵀ. ■
The transpose matrix will be very important to us because of the interplay between dot
product and transpose. If x and y are vectors in R", then by virtue of our very definition of
matrix multiplication,
x • y = xTy,
provided we agree to think of a 1 × 1 matrix as a scalar. Now we have the highly useful
Proposition 4.5 Let A be an m × n matrix, x ∈ R^n, and y ∈ R^m. Then
Ax · y = x · Aᵀy.
(On the left, we take the dot product of vectors in R^m; on the right, of vectors in R^n.)
Remark You might remember this: To move the matrix “across the dot product,”
you must transpose it.
Proof We just calculate, using the formula for the transpose of a product and, as usual, associativity:
Ax · y = (Ax)ᵀy = (xᵀAᵀ)y = xᵀ(Aᵀy) = x · Aᵀy. ■
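Moving the matrix "across the dot product" is easy to verify numerically; a sketch with a sample 2 × 3 matrix of our own:

```python
# Proposition 4.5: Ax . y = x . (A^T y), for A an m x n matrix,
# x in R^n, y in R^m.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

A = [[1, 2, 0], [3, -1, 4]]        # 2 x 3
x = [1, 2, 3]                      # in R^3
y = [5, -2]                        # in R^2
assert dot(matvec(A, x), y) == dot(x, matvec(transpose(A), y))
```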
► EXAMPLE 13
We return to the economic interpretation of dot product given in the remark on p. 12. Suppose that
m different ingredients are required to manufacture n different products. To manufacture the product
vector x = (x1, ..., xn) requires the ingredient vector y = (y1, ..., ym), and we suppose x and y are related by
the equation y = Ax for some m x n matrix A. If each unit of ingredient j costs a price pj, then the
cost of producing x is
p1y1 + · · · + pmym = y · p = Ax · p = x · Aᵀp = q1x1 + · · · + qnxn,
where q = Aᵀp. Notice then that qi is the amount it costs to produce a unit of the i-th product. Our
fundamental formula, Proposition 4.5, tells us that the total cost of the ingredients should equal the
total worth of the products we manufacture. ◄
► EXERCISES 1.4
1. Let A and B be 2 × 2 matrices, let C be a 2 × 3 matrix, and let D be a 3 × 2 matrix. Calculate each of the following expressions or explain why it is not defined.
(a) A + B (e) AB (i) BD
*(b) 2A − B *(f) BA (j) DB
(c) A − C *(g) AC *(k) CD
(d) C + D *(h) CA *(l) DC
2. (a) If A is an m × n matrix and Ax = 0 for all x ∈ R^n, prove that A = O.
(b) If A and B are m × n matrices and Ax = Bx for all x ∈ R^n, prove that A = B.
#3. Let A be an m × n matrix. Show that V = {x ∈ R^n : Ax = 0} is a subspace of R^n.
#4. Let A be an m x n matrix.
(b) When m = 1, show that V ⊂ R^{n+1} is a hyperplane (see Example 1e in Section 3) by finding a vector b ∈ R^{n+1} so that V = {z ∈ R^{n+1} : b · z = 0}.
5. Give 2 × 2 matrices A so that for any x ∈ R^2 we have, respectively:
(a) Ax is the vector whose components are, respectively, the sum and difference of the components of x;
*(b) Ax is the vector obtained by projecting x onto the line x1 = x2 in R^2;
(c) Ax is the vector obtained by first reflecting x across the line x1 = 0 and then reflecting the resulting vector across the line x2 = 0;
(d) Ax is the vector obtained by projecting x onto the line 2x1 − x2 = 0;
*(e) Ax is the vector obtained by first projecting x onto the line 2x1 − x2 = 0 and then rotating the resulting vector π/2 counterclockwise;
(f) Ax is the vector obtained by first rotating x an angle of π/2 counterclockwise and then projecting the resulting vector onto the line 2x1 − x2 = 0.
6. (a) Calculate AθAφ and AφAθ. (Recall the definition of the rotation matrix on p. 27.)
(b) Use your answer to part a to derive the addition formulas for cos and sin.
7. Let Aθ be the rotation matrix defined on p. 27, 0 < θ < π. Prove that
(a) ||Aθx|| = ||x|| for all x ∈ R^2; (b) the angle between x and Aθx is θ.
These properties should characterize a rotation of the plane through angle θ.
8. Prove or give a counterexample. Assume all relevant matrices are square and of the same size.
(a) If AB = CB and B ≠ O, then A = C. (c) (A + B)(A − B) = A² − B².
(b) If A² = A, then A = O or A = I. (d) If AB = BC and B is invertible, then A = C.
9. Let
R = [ cos 2θ   sin 2θ ]
    [ sin 2θ  −cos 2θ ].
Show that R is the standard matrix of a reflection of R^2 across a line through the origin, and check that RAθ = A₋θR.
11. For each of the following matrices A, find a formula for Aⁿ. (If you know how to give an inductive proof, please do so.)
(a) A a diagonal matrix with diagonal entries d1, d2, ..., dm (all nondiagonal entries are 0)
12. Suppose A and A' are m × m matrices, B and B' are m × n matrices, C and C' are n × m matrices, and D and D' are n × n matrices. Check the following formula for the product of "block" matrices:
[ A B ][ A' B' ]   [ AA' + BC'  AB' + BD' ]
[ C D ][ C' D' ] = [ CA' + DC'  CB' + DD' ]
*13. Let T: R^2 → R^2 be the linear transformation defined by rotating the plane π/3 counterclockwise; let S: R^2 → R^2 be the linear transformation defined by reflecting the plane across the line x1 + x2 = 0.
(a) Give the standard matrices representing S and T.
(b) Give the standard matrix representing T ∘ S.
(c) Give the standard matrix representing S ∘ T.
14. Calculate the standard matrix for each of the following linear transformations T:
(a) T: R^2 → R^2 given by rotating −π/4 about the origin and then reflecting across the line x1 − x2 = 0.
(b) T: R^3 → R^3 given by rotating π/2 about the x1-axis (as viewed from the positive side) and then reflecting across the plane x2 = 0.
(c) T: R^3 → R^3 given by rotating −π/2 about the x1-axis (as viewed from the positive side) and then rotating π/2 about the x3-axis.
15. Consider the cube with vertices (±1, ±1, ±1), pictured in Figure 4.8. (Note that the coordinate axes pass through the centers of the various faces.) Give the standard matrices for each of the following symmetries of the cube.
(a) 90° rotation about the x3-axis (viewed from high above)
(c) 120° rotation about the line joining a pair of opposite vertices (viewed from high above)
16. Consider the tetrahedron pictured in Figure 4.9. Give the standard matrices for each of the following symmetries of the tetrahedron.
(a) 120° rotation counterclockwise (as viewed from high above) about the line joining the origin and the top vertex
(c) reflection across the plane containing one edge and the midpoint of the opposite edge
(Hint: Note where the coordinate axes intersect the tetrahedron.)
Figure 4.9
*17. Suppose A is an n × n matrix and B is an invertible n × n matrix. Calculate the following.
(a) (BAB⁻¹)²
(b) (BAB⁻¹)ⁿ (n a positive integer)
(c) (BAB⁻¹)⁻¹ (what additional assumption is required here?)
18. Find matrices A so that
(a) A ≠ O, but A² = O; (b) A² ≠ O, but A³ = O.
Can you make a conjecture about matrices satisfying Aⁿ⁻¹ ≠ O but Aⁿ = O?
*19. Suppose A is an invertible n × n matrix and x ∈ R^n satisfies Ax = 7x. Calculate A⁻¹x.
20. Suppose A is a square matrix satisfying the equation A³ − 3A + 2I = O. Show that A is invertible. (Hint: Can you give an explicit formula for A⁻¹?)
21. Suppose A is an n x n matrix satisfying A10 = O. Prove that the matrix In — A is invertible.
(Hint: As a warm-up, try assuming A2 = O.)
22. Define the trace of an n × n matrix A (denoted tr A) to be the sum of its diagonal entries:
tr A = a11 + a22 + · · · + ann.
23. Let
2 3
(a) 2 3 1 (b) 0 1 -1 (0 1 3 1
_1 1 -1_ _1 3 1_ _0 2 -2_
(b) Fill in the missing columns in the following matrices to make them orthogonal:
?' ■ 1 2 ~
1 0 ?
4 0 -1 ? 9
5
2
J
7 2
~5
3
“I 2 1
0 0 ?_ L 3 7 5 J
5 Introduction to Determinants and the Cross Product 43
(c) Prove that any 2 × 2 orthogonal matrix A must be of the form
[ cos θ  −sin θ ]      [ cos θ   sin θ ]
[ sin θ   cos θ ]  or  [ sin θ  −cos θ ]
for some real number θ. (Hint: Use part a, rather than the original definition.)
*(d) Prove that if A is an orthogonal 2 × 2 matrix, then μA: R^2 → R^2 is either a rotation or the composition of a rotation and a reflection.
(e) Assume for now that Aᵀ = A⁻¹ when A is orthogonal (this is a consequence of Corollary 2.2 of Chapter 4). Prove that the row vectors A1, ..., An of an orthogonal matrix A are unit vectors that are orthogonal to one another.
35. (Recall the definition of orthogonal matrices from Exercise 34.)
(a) Prove that if A and B are orthogonal n x n matrices, then so is AB.
*(b) Prove that if A is an orthogonal matrix, then so is A-1.
*36. (a) Prove that the only matrix that is both symmetric and skew-symmetric is O.
(b) Given any square matrix A, prove that S = ½(A + Aᵀ) is symmetric and K = ½(A − Aᵀ) is skew-symmetric.
(c) Prove that any square matrix A can be written in the form A = S + K, where S is symmetric and K is skew-symmetric.
(d) Prove that the expression in part c is unique: If A = S + K and A = S' + K' (where S and S' are symmetric and K and K' are skew-symmetric), then S = S' and K = K'. (Hint: Use part a.)
37. Suppose A is an n × n matrix that commutes with all n × n matrices; i.e., AB = BA for all B ∈ M_{n×n}. What can you say about A?
► 5 INTRODUCTION TO DETERMINANTS
AND THE CROSS PRODUCT
Let x and y be vectors in R^2 and consider the parallelogram P they span. The area of P is nonzero as long as x and y are not collinear. We want to express the area of P in terms of the coordinates of x and y. First notice that the area of the parallelogram pictured in Figure 5.1 is the same as the area of the rectangle obtained by moving the shaded triangle from the right side to the left. This rectangle has area A = bh, where b = ||x|| is the base and h = ||y|| sin θ is the height. We could calculate sin θ from the formula
cos θ = (x · y)/(||x|| ||y||),
but instead we note (see Figure 5.2) that h is the component of y in the direction of ρ(x), where ρ(x) = (−x2, x1) is the vector obtained by rotating x an angle π/2 counterclockwise (see Section 4). Thus
area(P) = ρ(x) · y = (−x2, x1) · (y1, y2) = x1y2 − x2y1.
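The signed-area formula is a one-liner to experiment with; the sample vectors (3, 1) and (4, 3) are our reading of the computation in Example 1 below:

```python
# The signed area D(x, y) = x1*y2 - x2*y1: positive when one turns
# counterclockwise from x to y, negative when one turns clockwise.

def D(x, y):
    return x[0] * y[1] - x[1] * y[0]

x, y = (3, 1), (4, 3)
assert D(x, y) == 5       # counterclockwise from x to y
assert D(y, x) == -5      # the sign flips on interchanging the vectors
assert D(x, x) == 0       # a degenerate "parallelogram" has zero area
```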
► EXAMPLE 1
If we interchange the roles of x and y in the preceding computation, then we get x1y2 − x2y1 = 4 · 1 − 3 · 3 = −5. Certainly the parallelogram hasn't changed; nor does it make sense to have negative area. What is the explanation? In deriving our formula for the area above, we assumed 0 < θ < π; but if we must turn clockwise to get from x to y, this means that θ is negative, resulting in a sign discrepancy in the area calculation. ◄
So we should amend our earlier result. We define the signed area of the parallelogram P to be the area of P when one turns counterclockwise from x to y and to be the negative of the area of P when one turns clockwise from x to y, as illustrated in Figure 5.3. Then we have
D(x, y) = x1y2 − x2y1;⁵
this is the function that associates to each ordered pair of vectors x, y ∈ R^2 the signed area of the parallelogram they span.
Figure 5.3
⁵Here, since x and y are themselves vectors, we use the customary notation for functions.
Next, let's explore the properties of the signed area function D on R^2 × R^2.⁶
Property 1 D(y, x) = −D(x, y). Algebraically, we have D(y, x) = y1x2 − y2x1 = −(x1y2 − x2y1) = −D(x, y). Geometrically, this was the point of our introducing the notion of signed area.
Property 2 D(cx, y) = D(x, cy) = cD(x, y). Geometrically, if we stretch one of the edges of the parallelogram by a factor of c > 0, then the area is multiplied by a factor of c. And if c < 0, the area is multiplied by a factor of |c| and the signed area changes sign (why?).
Property 3 D(x + y, z) = D(x, z) + D(y, z). We can check this explicitly in coordinates (but the clever reader should try to use properties of the dot product to give a better algebraic proof): If x = (x1, x2), y = (y1, y2), and z = (z1, z2), then
D(x + y, z) = (x1 + y1)z2 − (x2 + y2)z1 = (x1z2 − x2z1) + (y1z2 − y2z1) = D(x, z) + D(y, z),
as required. (The formula for D(x, y + z) can now be deduced by using Property 1.)
Geometrically, we can deduce the result from Figure 5.4: The area of parallelogram OBCD (D(x + y, z)) is equal to the sum of the areas of parallelograms OAED (D(x, z)) and ABCE (D(y, z)). The proof of this, in turn, follows from the fact that △OAB is congruent to △DEC.
⁶Recall that, given two sets X and Y, their product X × Y consists of all ordered pairs (x, y), where x ∈ X and y ∈ Y.
Figure 5.4
Property 4 For the standard basis vectors e1, e2, we have D(e1, e2) = 1.
As we ask the reader to check in Exercise 4, one can deduce from the four properties above
and the geometry of linear maps the fact that the determinant represents the signed area of
the parallelogram.
We next turn to the case of 3 × 3 determinants. The general case will wait until Chapter 7. Given three vectors
x = (x1, x2, x3), y = (y1, y2, y3), and z = (z1, z2, z3) in R^3,
we define
D(x, y, z) = x1(y2z3 − y3z2) − x2(y1z3 − y3z1) + x3(y1z2 − y2z1),
where each parenthesized quantity is a 2 × 2 determinant formed from the entries of y and z. Multiplying this out, we get three positive terms and three negative terms; a handy mnemonic device for this formula is depicted in Figure 5.5.
This function D of three vectors in R^3 has properties quite analogous to those in the two-dimensional case. In particular, it follows immediately from the latter that D is linear in each slot; e.g., if x, y, z, and w are vectors in R^3 and c is a scalar, then
D(cx, y, z) = cD(x, y, z) and D(x + w, y, z) = D(x, y, z) + D(w, y, z).
Figure 5.5
It is also immediately obvious from the definition that interchanging two of the vectors changes the sign; e.g.,
D(y, x, z) = y1(x2z3 − x3z2) − y2(x1z3 − x3z1) + y3(x1z2 − x2z1)
           = −( x1(y2z3 − y3z2) − x2(y1z3 − y3z1) + x3(y1z2 − y2z1) )
           = −D(x, y, z).
Summarizing, we have: interchanging any two of the three arguments changes the sign of D. Note that, as a consequence, whenever two of x, y, and z are the same, we have D(x, y, z) = 0.
Property 4 For the standard basis vectors e1, e2, e3, we have D(e1, e2, e3) = 1.
If we let y' = y − proj_x y and z' = z − proj_x z − proj_{y'} z, then it follows from the properties of D that D(x, y, z) = D(x, y', z'). Moreover, we shall see when we study determinants in Chapter 7 that the results of Exercise 4 hold in three dimensions as well, so that the latter value is not changed by rotating R^3 to make x = αe1, y' = βe2, and z' = γe3. Since rotation doesn't change signed volume, we deduce that D(x, y, z) equals the signed volume of the parallelepiped spanned by x, y, and z, as suggested in Figure 5.6. For an alternative argument, see Exercise 18.
Figure 5.6
Given two vectors x, y ∈ R^3, define a vector, called their cross product, by
x × y = det [ e1 x1 y1 ]
            [ e2 x2 y2 ]
            [ e3 x3 y3 ] = (x2y3 − x3y2)e1 − (x1y3 − x3y1)e2 + (x1y2 − x2y1)e3,
where the determinant is to be interpreted "formally." The geometric interpretation of the cross product, as indicated in Figure 5.7, is the content of the following result: x × y is orthogonal to both x and y, its length ||x × y|| is the area of the parallelogram P spanned by x and y, and its direction is given by the right-hand rule.
Figure 5.7
Remark More colloquially, if you curl the fingers of your right hand from x toward
y, your thumb points in the direction of x x y.
z · (x × y) = D(z, x, y).
In particular, x · (x × y) = D(x, x, y) = 0.
Now, D(x, y, x x y) is the signed volume of the parallelepiped spanned by x, y, and
x x y. Since x x y is orthogonal to the plane spanned by x and y, that volume is the product
of the area of P and ||x × y||. On the other hand,
D(x, y, x × y) = (x × y) · (x × y) = ||x × y||².
When x and y are nonparallel, we have D(x, y, x × y) = ||x × y||² > 0, so the vectors span a parallelepiped of positive signed volume, as desired. ■
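The cross product and triple product are easy to compute by hand, and a quick numerical check confirms the key identities above (the sample vectors match our reading of Example 2 below):

```python
# The cross product and the triple product z . (x x y) = D(z, x, y):
# x x y is orthogonal to both factors, and D(x, y, x x y) = ||x x y||^2.

def cross(x, y):
    return (x[1]*y[2] - x[2]*y[1],
            x[2]*y[0] - x[0]*y[2],
            x[0]*y[1] - x[1]*y[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def D(x, y, z):
    return dot(x, cross(y, z))       # one of the equivalent expansions

x, y = (1, 1, -1), (3, -2, 1)
n = cross(x, y)
assert n == (-1, -4, -5)             # the normal vector of Example 2
assert dot(n, x) == 0 and dot(n, y) == 0
assert D(x, y, n) == dot(n, n)       # signed volume = ||x x y||^2 > 0
```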
► EXAMPLE 2
We can use the cross product to find the equation of the subspace P spanned by the vectors u = (1, 1, −1) and v = (3, −2, 1). For the normal vector to P is
A = u × v = det [ e1  1  3 ]
                [ e2  1 −2 ]
                [ e3 −1  1 ] = (−1, −4, −5),
and so
P = {x ∈ R^3 : A · x = 0} = {x ∈ R^3 : x1 + 4x2 + 5x3 = 0}.
Moreover, as depicted schematically in Figure 5.8, the affine plane parallel to P and passing through a point x0 is given by {x ∈ R^3 : A · x = A · x0}. ◄
Figure 5.8
► EXERCISES 1.5
1. Give a geometric proof that D(x, y + cx) = D(x, y) for any scalar c.
2. Show that if a function D: R^2 × R^2 → R satisfies Properties 1–4, then D(x, y) = x1y2 − x2y1.
3. Suppose a polygon in the plane has vertices (x1, y1), (x2, y2), ..., (xn, yn). Give a formula for its area.
4. (a) Check that when A and B are 2 × 2 matrices, we have det(AB) = det A det B.
(b) Let A = Aθ be a rotation matrix. Check that det(AθB) = det B for any 2 × 2 matrix B.
(c) Use the result of part b and the properties of determinants to give an alternative proof that D(x, y) is the signed area of the parallelogram spanned by x and y.
5. Calculate the cross product of the given vectors x and y.
*(a) A = 0 ,B = 0 ,C = 2 (c) A — 0 ,B = -2 ,c = 1
_0_ -1 _ 1 _ _0_ 1 _ _ -5 _
~2
"1 ' 1' 2' 8'
1 2
(b) A = -1 ,B = -1 ,c = 1 (d) A = 1 ,B = -1 ,c = 2
1 0 _2 _ 1 _ 2_ _-4_
7. Find the equation of the (affine) plane containing the three points
' o' 1 1 0 1 ' 7"
*(a) A — 0 ,B = 0 ,c = 2 (c) A — 0 ,B = -2 ,c = 1
0 -1 1 0 1 -5
13. Let P be a parallelogram in R^3. Let P1 be its projection on the x2x3-plane, P2 its projection on the x1x3-plane, and P3 its projection on the x1x2-plane. Prove that
(area(P))² = (area(P1))² + (area(P2))² + (area(P3))².
(c) Suppose x is the intersection of the medians of the triangle with vertices u, v, and w. Compare the areas of the three triangles formed by joining x with any pair of the vertices. (Cf. Exercise 1.1.8.)
(d) Let r = D(v, w), s = D(w, u), and t = D(u, v). Show that ru + sv + tw = 0. Give a physical interpretation of this result.
18. In this exercise, we give a self-contained derivation of the geometric interpretation of the 3 × 3 determinant as signed volume.
(a) By direct algebraic calculation, show that ||x x y||2 = ||x||2||y||2 — (x • y)2. Deduce that ||x x y||
is the area of the parallelogram spanned by x and y.
(b) Show that z ■ (x x y) is the signed volume of the parallelepiped spanned by x, y, and z.
(c) Conclude that D(x, y, z) equals the signed volume of that parallelepiped.
19. (Heron's formula) Given △OAB, let OA = x and OB = y, and set ||x|| = a, ||y|| = b, and ||x − y|| = c. Let s = ½(a + b + c) be the semiperimeter of the triangle. Use the formulas
||x × y||² = ||x||²||y||² − (x · y)² (see Exercise 18),
||x − y||² = ||x||² + ||y||² − 2x · y
to prove that the area A of △OAB satisfies
A² = ¼( a²b² − ¼(c² − a² − b²)² ) = s(s − a)(s − b)(s − c).
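The two expressions for the squared area in Exercise 19 can be compared numerically; here is a quick check on the 3-4-5 right triangle (whose area we know is 6):

```python
# Exercise 19's two expressions for the squared area of a triangle,
# evaluated on the 3-4-5 right triangle.

import math

a, b, c = 3.0, 4.0, 5.0
s = (a + b + c) / 2

first  = (a*a*b*b - (c*c - a*a - b*b)**2 / 4) / 4
second = s * (s - a) * (s - b) * (s - c)
assert math.isclose(first, second)
assert math.isclose(math.sqrt(second), 6.0)   # area of the 3-4-5 triangle
```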
20. Let △ABC have sides a, b, and c. Let s = ½(a + b + c) be its semiperimeter. Prove that the inradius of the triangle (i.e., the radius of its inscribed circle) is r = √( (s − a)(s − b)(s − c)/s ).
CHAPTER 2
FUNCTIONS, LIMITS, AND CONTINUITY
In this brief chapter we introduce examples of nonlinear functions, their graphs, and their
level sets. The notion of limit is, as usual, the cornerstone on which calculus is built. To discuss "nearness," we need the concepts of open and closed sets and of convergent
sequences. We then give the usual theorems on limits of functions and several equivalent
ways of thinking about continuity. All of this will be the foundation for our work on
differential calculus, which comes next.
► EXAMPLE 1
The easiest examples, perhaps, are linear. Imagine a particle starting at position x0 and moving with
constant velocity v. Then its position at time t is evidently f (t) = xo + tv and its trajectory is a line
passing through x0 and having direction vector v, as shown in Figure 1.1. We refer to the vector-valued function f as a parametrization of the line. Here t is free to vary over all of ℝ. When we wish
to parametrize the line passing through two points A and B, it is natural to use one of those points,
say A, as xo and the vector AB as the direction vector v, as indicated in Figure 1.2. <
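The recipe translates directly into code. In this sketch the points A and B are arbitrary; f(t) = A + t(B − A) gives f(0) = A and f(1) = B, and t ranging over all of ℝ sweeps out the line:

```python
def line_through(A, B):
    # f(t) = A + t (B - A): passes through A at t = 0 and B at t = 1
    return lambda t: tuple(a + t * (b - a) for a, b in zip(A, B))

A, B = (1.0, 2.0, 0.0), (3.0, -1.0, 4.0)   # sample points in R^3 (illustrative)
f = line_through(A, B)
```

For instance, f(1/2) is the midpoint of the segment AB.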
► EXAMPLE 2
The next curve with which every mathematics student is familiar is the circle. Essentially by the very
definition of the trigonometric functions cos and sin, we obtain a very natural parametrization of a
circle of radius a, as pictured in Figure 1.3(a):
f(t) = (a cos t, a sin t), t ∈ ℝ.
Applying the linear map T(x, y) = (ax, by), we see that the unit circle x² + y² = 1 maps to the ellipse x²/a² + y²/b² = 1. Since T(cos t, sin t) = (a cos t, b sin t), the latter gives a natural parametrization of the ellipse, as shown in Figure 1.3(b). Be warned, however: Here t is not the angle between the position vector and the positive x-axis, as Figure 1.3(c) indicates. ◄
1 Scalar- and Vector-Valued Functions
Figure 1.3
► EXAMPLE 3
Consider the two cubic curves in ℝ², illustrated in Figure 1.4. On the left is the cuspidal cubic y² = x³, and on the right is the nodal cubic y² = x³ + x². These can be parametrized, respectively, by the functions f(t) = (t², t³) and g(t) = (t² − 1, t³ − t),
Figure 1.4
as the reader can verify.¹ Now consider the twisted cubic in ℝ³, illustrated in Figure 1.5, given by
f(t) = (t, t², t³), t ∈ ℝ.
Its projections in the xy-, xz-, and yz-coordinate planes are, respectively, y = x², z = x³, and z² = y³ (the cuspidal cubic). ◄
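The verification left to the reader is mechanical. Assuming the standard parametrizations (t², t³) for the cuspidal cubic and (t² − 1, t³ − t) for the nodal cubic (the latter obtained from the footnote's substitution y = tx), the sketch below checks them, along with the three projections of the twisted cubic (t, t², t³):

```python
def cusp(t):
    return (t ** 2, t ** 3)          # candidate parametrization of y^2 = x^3

def node(t):
    return (t ** 2 - 1, t ** 3 - t)  # from y = tx: x = t^2 - 1, y = tx

def twisted(t):
    return (t, t ** 2, t ** 3)

ok = True
for t in (-2.0, -0.5, 0.0, 1.0, 3.0):
    x, y = cusp(t)
    ok = ok and abs(y ** 2 - x ** 3) < 1e-9
    x, y = node(t)
    ok = ok and abs(y ** 2 - (x ** 3 + x ** 2)) < 1e-9
    x, y, z = twisted(t)
    # projections: y = x^2 (xy-plane), z = x^3 (xz-plane), z^2 = y^3 (yz-plane)
    ok = ok and y == x ** 2 and z == x ** 3 and abs(z ** 2 - y ** 3) < 1e-9
```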
► EXAMPLE 4
Our last example is a classic called the cycloid: It is the trajectory of a dot on a rolling wheel (circle).
Consider the illustration in Figure 1.6. Assuming the wheel rolls without slipping, we see that the
distance it travels along the ground is equal to the length of the circular arc subtended by the angle
through which it has turned. That is, if the radius of the circle is a and it has turned through angle t,
then the point of contact with the x-axis, Q, is at units (the product a·t) to the right. The vector from the origin to the point P can be expressed as the sum of the three vectors OQ, QC, and CP (see Figure 1.7):
f(t) = OQ + QC + CP = (at, 0) + (0, a) + (−a sin t, −a cos t) = (a(t − sin t), a(1 − cos t)).
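Collecting terms gives the familiar cycloid parametrization f(t) = (a(t − sin t), a(1 − cos t)); a quick check with an arbitrarily chosen radius:

```python
import math

def cycloid(a, t):
    # f(t) = OQ + QC + CP = (at, 0) + (0, a) + (-a sin t, -a cos t)
    return (a * (t - math.sin(t)), a * (1 - math.cos(t)))

a = 2.0
start = cycloid(a, 0.0)              # the marked point starts at the origin
top = cycloid(a, math.pi)            # highest point of the arch: (pi*a, 2a)
back_down = cycloid(a, 2 * math.pi)  # returns to the ground 2*pi*a to the right
```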
¹To see where the latter came from, as suggested by Figure 1.4(b), we substitute y = tx in the equation and solve for x.
f((x₁, …, xₙ)ᵀ), with the argument written as a column vector, rather than in abbreviated form.
It would be typographically more pleasant and economical to suppress the vector notation
and write merely f(xi,..., xn), as do most mathematicians. We hope our choice will make
it easier for the reader to keep vectors in columns and not confuse rows and columns of
matrices.
When n = 1 or n = 2, such functions are often best visualized by their graphs
as pictured, for example, in Figure 1.8. There are two ways to try to visualize functions
and their graphs, as we shall see in further detail in Chapter 3. One is to fix all of the
coordinates of x but one, and see how f varies with each of xi,..., xn individually. This
corresponds to taking slices of the graph, as shown in Figure 1.9. The other is to think of a
topographical map, in which we see curves representing points at the same elevation. One
then can lift each of these up to the appropriate height and imagine the surface interpolating
among them, as illustrated in Figure 1.10. These curves are called level curves or contour
curves of the function.
Figure 1.9
Figure 1.10
► EXAMPLE 5
Suppose we see families of concentric circles as the level curves, as shown in Figure 1.11. We see
that in (a) the circles are evenly spaced, whereas in (b) they grow closer together as we move outward.
This tells us that in (a) the value of f grows linearly with the distance from the origin and in (b) it
grows more quickly. Indeed, it is not surprising to see the corresponding graphs in Figure 1.12: The
respective functions are f(x) = ||x|| and f(x) = ||x||². ◄
Figure 1.11
Figure 1.12
f(x) = (f₁(x), …, f_m(x)).
But in other instances, we really want to think of the values as geometrically defined vectors;
fundamental examples are parametrized surfaces and vector fields (both of which we shall
study a good deal in Chapter 8). Note that we will indicate a vector-valued function by
boldface type.
► EXAMPLE 6
Define f(r, θ) = (r cos θ, r sin θ) for r > 0, 0 ≤ θ < 2π, as illustrated in Figure 1.13. This is a one-to-one mapping onto ℝ² − {0}. The coordinates (r, θ) are often called the polar coordinates of the point (r cos θ, r sin θ). ◄
Figure 1.13
► EXAMPLE 7
Define f(u, v) = (u cos v, u sin v, u). When we fix u = u₀ > 0, the image is a circle of radius u₀ at height u₀; when we fix v = v₀, the image is a ray making an angle of π/4 with the z-axis and whose projection into the xy-plane makes an angle of v₀ with the positive x-axis. Thus, the image of f is a cone, as pictured in Figure 1.14.
Figure 1.14
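Reading the example's map as f(u, v) = (u cos v, u sin v, u) (an assumption consistent with the circles and rays just described), every image point satisfies the cone equation x² + y² = z², which a numerical check confirms:

```python
import math

def f(u, v):
    # assumed form of the example's parametrization: (u cos v, u sin v, u)
    return (u * math.cos(v), u * math.sin(v), u)

on_cone = all(
    abs(x * x + y * y - z * z) < 1e-9
    for u in (0.0, 1.0, 2.5)
    for v in (0.0, math.pi / 3, 3.0)
    for (x, y, z) in [f(u, v)]
)
```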
► EXERCISES 2.1
1. Find parametrizations of each of the following lines:
(a) 3x₁ + 4x₂ = 6,
*(b) the line with slope 1/3 that passes through (1, 1),
*(e) the line through (1, 1, 0) parallel to g(t) = (2 + t, 1 − 2t, 3t).
2. (a) Give parametric equations for the circle x² + y² = 1 in terms of the length t pictured in Figure 1.15. (Hint: Use similar triangles and algebra.)
(b) Use your answer to part a to produce infinitely many positive integer solutions² of X² + Y² = Z² with distinct ratios Y/X.
²These are called Pythagorean triples. Fermat asked whether there were any nonzero integer solutions of the corresponding equations Xⁿ + Yⁿ = Zⁿ for n ≥ 3. In 1995, Andrew Wiles proved in a tour de force of algebraic number theory that there can be none.
Figure 1.15
3. A string is unwound from a circular reel of radius a, being pulled taut at each instant. Give parametric equations for the tip of the string P in terms of the angle θ, as pictured in Figure 1.16.
Figure 1.16
4. A wheel of radius a (perhaps belonging to a train) rolls along the x-axis. If a point P (on the
wheel) is located a distance b from the center of the wheel, what are the parametric equations of its
locus as the wheel rolls? (Note that when b = a we obtain a cycloid.) See Figure 1.17.
Figure 1.17
5. *(a) A circle of radius b rolls without slipping outside a circle of radius a > b. Give the parametric equations of a point P on the circumference of the rolling circle (in terms of the angle θ of the line joining the centers of the two circles). (See Figure 1.18(a).)
(b) Now it rolls inside. Do the same as for part a.
These curves are called, respectively, an epicycloid and a hypocycloid.
6. A coin of radius 1" is rolled (without slipping) around the outside of a coin of radius 2". How many
complete revolutions does its “head” make? Now explain the correct answer! (There is a famous
story that the Educational Testing Service screwed this one up and was challenged by a precocious
high school student who knew that he had done the problem correctly.)
*7. A dog buries a bone at (0, 1). He is at the end of a 1-unit-long leash, and his master walks down the positive x-axis, dragging the dog along. Since the dog wants to get back to the bone, he pulls
(b)
Figure 1.18
the leash taut. (It was pointed out to me by some students a few years ago that the realism of this model leaves something to be desired.) The curve the dog travels is called a tractrix (why?). Give parametric equations of the curve in terms of the parameters
(a) θ, (b) t,
as pictured in Figure 1.19. (Hint: The fact that the leash is pulled taut means that the leash is tangent to the curve. Show that θ′(t) = sin θ(t).)
Figure 1.19
8. Prove that the twisted cubic (given in Example 3) has the property that any three distinct points
on it determine a plane; i.e., no three distinct points are collinear.
9. Sketch families of level curves and the graphs of the following functions f:
(a) f(x, y) = 1 − y    (c) f(x, y) = x² − y²
: x² + y² − z² = 1 and
g(s, t) = ((2 + cos t) cos s, (2 + cos t) sin s, sin t),  0 ≤ s, t ≤ 2π.
(a) Show that every point in the image of g lies on the hyperboloid x² + y² − z² = 1.
(b) Show that the curves obtained by fixing s = s₀ or t = t₀ (for s₀ and t₀ constants) are (subsets of) lines. (See Figure 1.20.)
(c) More challenging: What is the image of g?
Figure 1.20
Definition Let a ∈ ℝⁿ and let δ > 0. The ball of radius δ centered at a is
B(a, δ) = {x ∈ ℝⁿ : ||x − a|| < δ}.
Note that if |xᵢ − aᵢ| < δ/√n for all i = 1, …, n, then
||x − a|| = √(Σᵢ (xᵢ − aᵢ)²) < √(n · δ²/n) = δ,
so x ∈ B(a, δ). And if x ∈ B(a, δ), then |xᵢ − aᵢ| ≤ ||x − a|| < δ for all i = 1, …, n. Figure 2.1 illustrates these relationships. If aᵢ < bᵢ for i = 1, …, n, we can consider the rectangle
R = [a₁, b₁] × [a₂, b₂] × ⋯ × [aₙ, bₙ] = {x ∈ ℝⁿ : aᵢ ≤ xᵢ ≤ bᵢ, i = 1, …, n}.
(Strictly speaking, we should call this a rectangular parallelepiped, but that’s too much of a
mouthful.) For reasons that will be obvious in a moment, when we construct the rectangle
from open intervals, viz.,
S = (a₁, b₁) × (a₂, b₂) × ⋯ × (aₙ, bₙ) = {x ∈ ℝⁿ : aᵢ < xᵢ < bᵢ, i = 1, …, n},
Figure 2.1
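The two inclusions relating balls and coordinate boxes can be probed with random points; in this sketch n = 3, a = 0, and δ = 1 are arbitrary choices:

```python
import math, random

random.seed(0)
n, delta = 3, 1.0
a = (0.0, 0.0, 0.0)
bound = delta / math.sqrt(n)
for _ in range(1000):
    # if every |x_i - a_i| < delta/sqrt(n), then x lies in B(a, delta) ...
    x = [random.uniform(-0.999 * bound, 0.999 * bound) for _ in range(n)]
    assert math.dist(x, a) < delta
    # ... and every point of B(a, delta) has each |y_i - a_i| < delta
    y = [random.uniform(-1.0, 1.0) for _ in range(n)]
    if math.dist(y, a) < delta:
        assert all(abs(yi - ai) < delta for yi, ai in zip(y, a))
```

(`math.dist` requires Python 3.8 or later.)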
Definition We say a subset U ⊂ ℝⁿ is open if for every a ∈ U there is some ball centered at a that is completely contained in U; that is, there is δ > 0 so that B(a, δ) ⊂ U.
► EXAMPLE 1
a. First of all, an open interval (a, b) ⊂ ℝ is an open subset. Given any c ∈ (a, b), choose δ < min(c − a, b − c). Then B(c, δ) ⊂ (a, b). However, suppose we view this interval as
Figure 2.2
b. An open rectangle is an open set. As indicated in Figure 2.3, suppose c ∈ S = (a₁, b₁) × (a₂, b₂) × ⋯ × (aₙ, bₙ). Let δᵢ = min(cᵢ − aᵢ, bᵢ − cᵢ), i = 1, …, n, and set δ = min(δ₁, …, δₙ). Then we claim that B(c, δ) ⊂ S. For if x ∈ B(c, δ), then |xᵢ − cᵢ| ≤ ||x − c|| < δ ≤ δᵢ, so aᵢ < xᵢ < bᵢ, as required.
centered at c is wholly contained in the region S. We consider the open rectangle centered at c with base b − a and height 2δ; by construction, this rectangle is contained in S. Since b < a and ab < 1, it is easy to check that the height is smaller than the length, and so the ball of radius δ centered at c is contained in the rectangle, hence in S.
As we shall see in the next section, the concept of open sets is integral to the notion of
continuity of a function.
We turn next to a discussion of sequences. The connections to open sets will become
clear.
Definition A sequence of vectors (or points) in ℝⁿ is a function from the set of natural numbers, ℕ, to ℝⁿ, i.e., an assignment of a vector xₖ ∈ ℝⁿ to each natural number k ∈ ℕ. We refer to xₖ as the kth term of the sequence. We often abuse notation and write {xₖ} for such a sequence, even though we are thinking of the actual function and not the set of its values.
2 A Bit of Topology in ℝⁿ
Figure 2.4
We say the sequence {xₖ} converges to a (denoted xₖ → a or lim_{k→∞} xₖ = a) if for all ε > 0, there is K ∈ ℕ such that
||xₖ − a|| < ε whenever k > K.
(That is, given any neighborhood of a, “eventually”—past some K—all the elements xₖ of the sequence lie inside.) We say the sequence {xₖ} is convergent if it converges to some a.
► EXAMPLE 2
a. Let xₖ = k/(k + 1). We suspect that xₖ → 1. To prove this, note that, given any ε > 0,
|k/(k + 1) − 1| = 1/(k + 1) < ε
whenever k + 1 > 1/ε. If we let K = [1/ε] (the greatest integer less than or equal to 1/ε), then it is easy to see that k > K ⇒ k + 1 > 1/ε, as required.
b. The sequence {xₖ = (1 + 1/k)ᵏ} of real numbers is a famous one (think of compound interest) and converges to e, as the reader can check by taking logs and applying Proposition 3.6.
c. The sequence 1, −1, 1, −1, 1, …, i.e., xₖ = (−1)ᵏ⁺¹, is not convergent. Since its consecutive terms are two units apart, no matter what a ∈ ℝ and K ∈ ℕ we pick, whenever ε < 1, we cannot have |xₖ − a| < ε whenever k > K. For if we did, we would have (by
the triangle inequality) 2 = |xₖ₊₁ − xₖ| ≤ |xₖ₊₁ − a| + |xₖ − a| < 2ε < 2, which is a contradiction.
d. As the reader can easily prove by induction, a closed-form expression for xₖ can be given, and it follows that lim_{k→∞} xₖ = (1, 0). ◄
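The epsilon-K bookkeeping of part a, and the limit in part b, can be watched numerically (the tolerance and exponent below are arbitrary choices):

```python
import math

eps = 1e-3
K = math.floor(1 / eps)            # K = [1/eps], as in part a
# every term past K is within eps of the limit 1
tail_ok = all(abs(k / (k + 1) - 1) < eps for k in range(K + 1, K + 200))

# part b: (1 + 1/k)^k -> e; the error is roughly e/(2k)
gap = abs((1 + 1 / 100000) ** 100000 - math.e)
```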
► EXAMPLE 3
Suppose xₖ, yₖ ∈ ℝⁿ, xₖ → a, and yₖ → b. Then it seems quite plausible that xₖ + yₖ → a + b. Given ε > 0, we are to find K ∈ ℕ so that whenever k > K we have ||(xₖ + yₖ) − (a + b)|| < ε. Rewriting, we observe that (by the triangle inequality)
||(xₖ + yₖ) − (a + b)|| = ||(xₖ − a) + (yₖ − b)|| ≤ ||xₖ − a|| + ||yₖ − b||,
and so we can make ||(xₖ + yₖ) − (a + b)|| < ε by making ||xₖ − a|| < ε/2 and ||yₖ − b|| < ε/2. To this end, we use the definition of convergence of the sequences {xₖ} and {yₖ} as follows: There are K₁, K₂ ∈ ℕ so that
||xₖ − a|| < ε/2 whenever k > K₁, and ||yₖ − b|| < ε/2 whenever k > K₂.
Thus, if we take K = max(K₁, K₂), whenever k > K, we will have k > K₁ and k > K₂, and so
||(xₖ + yₖ) − (a + b)|| ≤ ||xₖ − a|| + ||yₖ − b|| < ε/2 + ε/2 = ε,
as was required. ◄
► EXAMPLE 4
Definition Suppose S ⊂ ℝⁿ. If S has the property that every convergent sequence of points in S converges to a point in S, then we say S is closed. That is, S is closed if the following is true: Whenever a convergent sequence xₖ → a has the property that xₖ ∈ S for all k ∈ ℕ, then a ∈ S as well.
► EXAMPLE 5
This definition seems a bit strange, but it is exactly what we will need for many
applications to come. In the meantime, if we need to decide whether or not a set is closed,
it is easiest to use the following.
Proposition 2.1 The subset S ⊂ ℝⁿ is closed if and only if its complement, ℝⁿ − S = {x ∈ ℝⁿ : x ∉ S}, is open.
Proof Suppose ℝⁿ − S is open and {xₖ} is a convergent sequence with xₖ ∈ S and limit a. Suppose that a ∉ S. Then there is a neighborhood B(a, ε) of a wholly contained in ℝⁿ − S, which means no element of the sequence {xₖ} lies in that neighborhood, contradicting the fact that xₖ → a. Therefore, a ∈ S, as desired.
Suppose S is closed and b ∉ S. We claim that there is a neighborhood of b lying entirely in ℝⁿ − S. Suppose not. Then for every k ∈ ℕ, the ball B(b, 1/k) intersects S; that is, we can find a point xₖ ∈ S with ||xₖ − b|| < 1/k. Then {xₖ} is a sequence of points in S converging to the point b ∉ S, contradicting the hypothesis that S is closed. ■
► EXAMPLE 6
It now follows easily that the closed interval [a, b] = {x ∈ ℝ : a ≤ x ≤ b} is a closed subset of ℝ, inasmuch as its complement is the union of two open intervals. Similarly, the closed ball B(a, r) = {x ∈ ℝⁿ : ||x − a|| ≤ r} is a closed subset of ℝⁿ, as we ask the reader to check in Exercise 5. In summary, our choice of terminology is felicitous indeed. ◄
Note that most sets are neither open nor closed. For example, the interval S = (0, 1] ⊂ ℝ is not open because there is no neighborhood of the point 1 contained in S, and it is not closed because of the reasoning in Example 5. Be careful not to make a common mistake here: Just because a set isn't open, it need not be closed, and vice versa.
For future use, we make the following definition: The closure of S, denoted S̄, is the union of S and the set of all limits of convergent sequences of points of S. We should think of S̄ as containing all the points of S and all points that can be obtained as limits of convergent sequences of points of S. A slightly different formulation of this notion is given in Exercise 8.
► EXERCISES 2.2
*1. Which of the following subsets of ℝⁿ is open? closed? neither? Prove your answer.
(a) {x : 0 < x < 2} ⊂ ℝ
(b) {x : x = 2⁻ᵏ for some k ∈ ℕ or x = 0} ⊂ ℝ
(c) {(x, y) : y > 0} ⊂ ℝ²
(e) {(x, y) : y ≥ x} ⊂ ℝ²
(f) {(x, y) : y = x} ⊂ ℝ²
(h) {x : 0 < ||x|| < 1} ⊂ ℝⁿ
(i) {x : ||x|| > 1} ⊂ ℝⁿ
(j) {x : ||x|| < 1} ⊂ ℝⁿ
(k) the set of rational numbers, ℚ ⊂ ℝ
(l) {x : ||x|| < 1 or ||x − (1, 0)|| < 1} ⊂ ℝ²
(m) ∅ (the empty set)
2. Let {xₖ} be a sequence of points in ℝⁿ. For i = 1, …, n, let xₖ,ᵢ denote the ith coordinate of the vector xₖ. Prove that xₖ → a if and only if xₖ,ᵢ → aᵢ for all i = 1, …, n.
3. Suppose {xₖ} is a sequence of points (vectors) in ℝⁿ converging to a.
(a) Prove that ||xₖ|| → ||a||. (Hint: See Exercise 1.2.17.)
(b) Prove that if b ∈ ℝⁿ is any vector, then b · xₖ → b · a.
4. Prove that a rectangle R = [a₁, b₁] × ⋯ × [aₙ, bₙ] ⊂ ℝⁿ is closed.
*5. Prove that the closed ball B(a, r) = {x ∈ ℝⁿ : ||x − a|| ≤ r} ⊂ ℝⁿ is closed.
6. Given a sequence {xₖ} of points in ℝⁿ, a subsequence is formed by taking x_{k₁}, x_{k₂}, …, x_{kⱼ}, …, where k₁ < k₂ < k₃ < ⋯.
(a) Prove that if the sequence {xₖ} converges to a, then any subsequence {x_{kⱼ}} converges to a as well.
(b) Is the converse valid? Give a proof or counterexample.
7. (a) Suppose U and V are open subsets of ℝⁿ. Prove that U ∪ V and U ∩ V are open as well. (Recall that U ∪ V = {x ∈ ℝⁿ : x ∈ U or x ∈ V} and U ∩ V = {x ∈ ℝⁿ : x ∈ U and x ∈ V}.)
(b) Suppose C and D are closed subsets of ℝⁿ. Prove that C ∪ D and C ∩ D are closed as well.
8. Let S ⊂ ℝⁿ. We say a ∈ S is an interior point of S if some neighborhood of a is contained in S. We say a ∈ ℝⁿ is a frontier point of S if every neighborhood of a contains both points in S and points not in S.
(a) Show that every point of S is either an interior point or a frontier point, but give examples to show that a frontier point of S may or may not belong to S.
(b) Give an example of a set S every point of which is a frontier point.
(c) Prove that the set of frontier points of S is always a closed set.
(d) Let S̄ be the union of S and the set of frontier points of S. Prove that S̄ is closed.
(e) Suppose C is a closed set containing S. Prove that S̄ ⊂ C. Thus, S̄ is the smallest closed set containing S, which we have earlier called the closure of S. (Hint: Show that ℝⁿ − C ⊂ ℝⁿ − S̄.)
9. Continuing Exercise 8:
(a) Is it true that all the interior points of S are points of S? Is this true if S is open? (Give proofs or
counterexamples.)
(b) Let S ⊂ ℝⁿ and let F be the set of the frontier points of S. Is it true that the set of frontier points of F is F itself? (Give a proof or counterexample.)
10. (a) Suppose I₀ = [a, b] is a closed interval, and for each k ∈ ℕ, Iₖ is a closed interval with the property that Iₖ ⊂ Iₖ₋₁. Prove that there is a point x ∈ ℝ so that x ∈ Iₖ for all k ∈ ℕ.
(b) Give an example to show that the result of part a is false if the intervals are not closed.
11. Prove that the only subsets of ℝ that are both open and closed are the empty set and ℝ itself. (Hint: Suppose S is such a nonempty subset that is not equal to ℝ. Then there are some points a ∈ S and b ∉ S. Without loss of generality (how?), assume a < b. Let α = sup{x ∈ ℝ : [a, x] ⊂ S}. Show that neither α ∈ S nor α ∉ S is possible.)
12. A sequence {xₖ} of points in ℝⁿ is called a Cauchy sequence if for all ε > 0 there is K ∈ ℕ so that whenever k, ℓ > K, we have ||xₖ − xₗ|| < ε.
(a) Prove that any convergent sequence is Cauchy.
(b) Prove that if a subsequence of a Cauchy sequence converges, then the sequence itself must converge. (Hint: Suppose ε > 0. If x_{kⱼ} → a, then there is J ∈ ℕ so that whenever j > J, we have ||x_{kⱼ} − a|| < ε/2. There is also K ∈ ℕ so that whenever k, ℓ > K, we have ||xₖ − xₗ|| < ε/2. Choose j > J so that kⱼ > K.)
*13. Prove that if {xₖ} is a Cauchy sequence, then all the points lie in some ball centered at the origin.
14. (a) Suppose {xₖ} is a sequence of points in ℝ satisfying a ≤ xₖ ≤ b for all k ∈ ℕ. Prove that {xₖ} has a convergent subsequence (see Exercise 6). (Hint: If there are only finitely many distinct terms in the sequence, this should be easy. If there are infinitely many distinct terms in the sequence, then there must be infinitely many either in the left half-interval [a, (a + b)/2] or in the right half-interval [(a + b)/2, b]. Let [a₁, b₁] be such a half-interval. Continue the process, and apply Exercise 10.)
(b) Use the results of Exercises 12 and 13 to prove that any Cauchy sequence in ℝ is convergent.
(c) Now prove that any Cauchy sequence in ℝⁿ is convergent. (Hint: Use Exercise 2.)
15. Suppose S ⊂ ℝⁿ is a closed set that is a subset of the rectangle [a₁, b₁] × ⋯ × [aₙ, bₙ]. Prove that any sequence of points in S has a convergent subsequence. (Hint: Use repeatedly the idea of Exercise 14a.)
lim_{x→a} f(x) = ℓ
(f(x) approaches ℓ ∈ ℝᵐ as x approaches a) if for every ε > 0 there is δ > 0 so that
||f(x) − ℓ|| < ε whenever 0 < ||x − a|| < δ.
(Note that even if f(a) is defined, we say nothing whatsoever about its relation to ℓ.)
Proposition 3.1 lim_{x→a} f(x) = ℓ if and only if lim_{x→a} fⱼ(x) = ℓⱼ for all j = 1, …, m.
Proof The proof is based on Figure 2.1. Suppose lim_{x→a} f(x) = ℓ. We must show that for any j = 1, …, m, we have lim_{x→a} fⱼ(x) = ℓⱼ. Given ε > 0, there is δ > 0 so that whenever 0 < ||x − a|| < δ, we have ||f(x) − ℓ|| < ε. But since we have |fⱼ(x) − ℓⱼ| ≤ ||f(x) − ℓ||, we see that whenever 0 < ||x − a|| < δ, we have |fⱼ(x) − ℓⱼ| < ε, as required.
Now, suppose that lim_{x→a} fⱼ(x) = ℓⱼ for j = 1, …, m. Given ε > 0, there are δ₁, …, δₘ > 0 so that
|fⱼ(x) − ℓⱼ| < ε/√m whenever 0 < ||x − a|| < δⱼ.
Let δ = min(δ₁, …, δₘ). Then whenever 0 < ||x − a|| < δ, we have
||f(x) − ℓ|| = √(Σⱼ (fⱼ(x) − ℓⱼ)²) < √(m (ε/√m)²) = ε,
as required. ■
► EXAMPLE 1
Fix a nonzero vector b ∈ ℝⁿ. Let f: ℝⁿ → ℝ be defined by f(x) = b · x. We claim that lim_{x→a} f(x) = b · a because
|f(x) − b · a| = |b · (x − a)| ≤ ||b|| ||x − a||
by the Cauchy-Schwarz Inequality, Proposition 2.3 of Chapter 1. Thus, given ε > 0, if we take δ = ε/||b||, then whenever 0 < ||x − a|| < δ, we have |f(x) − b · a| ≤ ||b|| ||x − a|| < ||b|| · ε/||b|| = ε, as needed.
Note, moreover, that as a consequence of Proposition 3.1, for any linear map T: ℝⁿ → ℝᵐ it is the case that lim_{x→a} T(x) = T(a). ◄
► EXAMPLE 2
Let f: ℝⁿ → ℝ be defined by f(x) = ||x||². Then we claim that lim_{x→a} f(x) = ||a||².
1. Suppose first that a = 0. Since r² ≤ r whenever 0 ≤ r ≤ 1, we know that when 0 < ε ≤ 1, we can choose δ = ε, and then whenever 0 < ||x|| < δ we have |f(x)| = ||x||² ≤ ||x|| < ε, as required. But what if some (admittedly, silly) person hands us an ε > 1? The trick to take care of this is to let δ = min(1, ε). Should ε be bigger than 1, then δ = 1, and so when 0 < ||x|| < δ, we know that ||x|| < 1 and, once again, |f(x)| < 1 < ε, as required.
2. Now suppose a ≠ 0. Given ε > 0, let δ = min(||a||, ε/(3||a||)). Now suppose 0 < ||x − a|| < δ. Then, in particular, we have ||x|| < ||a|| + δ ≤ 2||a||, so that ||x + a|| ≤ ||x|| + ||a|| < 3||a||. Then
| ||x||² − ||a||² | = |(x − a) · (x + a)| ≤ ||x − a|| ||x + a|| < δ · 3||a|| ≤ ε,
as required.
Such sleight of hand (and more) is often required when the function is nonlinear. ◄
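The choice δ = min(||a||, ε/(3||a||)) in part 2 can be stress-tested with random points x satisfying 0 < ||x − a|| < δ; the values a = (1, −2) and ε = 0.3 below are arbitrary:

```python
import math, random

random.seed(1)
a = (1.0, -2.0)
norm_a = math.hypot(*a)
eps = 0.3
delta = min(norm_a, eps / (3 * norm_a))    # the delta from part 2 of the example
worst = 0.0
for _ in range(2000):
    # random x with 0 < ||x - a|| < delta
    theta = random.uniform(0, 2 * math.pi)
    r = random.uniform(1e-9, 0.999 * delta)
    x = (a[0] + r * math.cos(theta), a[1] + r * math.sin(theta))
    worst = max(worst, abs(x[0] ** 2 + x[1] ** 2 - norm_a ** 2))
```

Every sampled value of | ||x||² − ||a||² | stays below ε, as the argument predicts.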
► EXAMPLE 3
Define f: ℝ² − {0} → ℝ by f(x, y) = x²y/(x² + y²). Does lim_{x→0} f(x) exist? Since |x| ≤ √(x² + y²) and |y| ≤ √(x² + y²), we have
|f(x)| ≤ ||x||³/||x||² = ||x||,
and so f(x) → 0 as x → 0. (In particular, taking δ = ε will work.) An alternative approach, which will be useful later, is this:
|f(x)| = |y| · x²/(x² + y²) ≤ |y|,
since 0 ≤ x²/(x² + y²) ≤ 1. Once again, |y| ≤ ||x|| and hence approaches 0 as x → 0. Thus, so does |f(x)|. (See Figure 3.1(a).) ◄
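The squeeze |f(x)| ≤ |y| ≤ ||x|| is easy to confirm at random sample points:

```python
import math, random

random.seed(2)

def f(x, y):
    return x * x * y / (x * x + y * y)

# |f(x, y)| <= |y| (since x^2/(x^2 + y^2) <= 1) and |y| <= ||(x, y)||
bounds_hold = all(
    abs(f(x, y)) <= abs(y) + 1e-12 <= math.hypot(x, y) + 2e-12
    for x, y in ((random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(2000))
    if (x, y) != (0.0, 0.0)
)
```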
► EXAMPLE 4
Let's modify the previous example slightly. Define f: ℝ² − {0} → ℝ by f(x, y) = x²/(x² + y²). We ask again whether lim_{x→0} f(x) exists. Note that
lim_{h→0} f(h, 0) = lim_{h→0} h²/h² = 1, whereas lim_{k→0} f(0, k) = lim_{k→0} 0/k² = 0.
Thus, lim_{x→0} f(x) cannot exist (there is no number t so that both 1 and 0 are less than ε away from t when 0 < ε < 1/2). (See Figure 3.1(b).) Now, what about f(x, y) = xy/(x² + y²)? In this case we have f(h, 0) = f(0, k) = 0 for all h and k, so we might surmise that the limit exists and equals 0. But consider what happens if x approaches 0 along the line y = x:
lim_{h→0} f(h, h) = lim_{h→0} h²/(2h²) = 1/2.
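Sampling near the origin makes the path dependence concrete: x²/(x² + y²) returns 1 along the x-axis and 0 along the y-axis, while xy/(x² + y²) vanishes along both axes yet equals 1/2 along the line y = x:

```python
def g(x, y):
    return x * x / (x * x + y * y)   # limits differ along the two axes

def f(x, y):
    return x * y / (x * x + y * y)   # 0 along both axes, 1/2 along y = x

h = 1e-8
vals = (g(h, 0.0), g(0.0, h), f(h, 0.0), f(h, h))
```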
The fundamental properties of limits with which every calculus student is familiar
generalize in an obvious way to the multivariable setting.
Theorem 3.2 Suppose f and g map a neighborhood of a ∈ ℝⁿ (with the possible exception of the point a itself) to ℝᵐ and k maps the same neighborhood to ℝ. Suppose lim_{x→a} f(x) = ℓ, lim_{x→a} g(x) = m, and lim_{x→a} k(x) = c. Then
lim_{x→a} (f + g)(x) = ℓ + m,  lim_{x→a} (kf)(x) = cℓ,  and  lim_{x→a} (f · g)(x) = ℓ · m.
Proof For the first equality, given ε > 0, there are δ₁, δ₂ > 0 so that
||f(x) − ℓ|| < ε/2 whenever 0 < ||x − a|| < δ₁,
and
||g(x) − m|| < ε/2 whenever 0 < ||x − a|| < δ₂.
Note that when 0 < ||x − a|| < δ₁, we have (by the triangle inequality) ||f(x)|| ≤ ||ℓ|| + 1. Now, let δ = min(δ₁, δ₂). Whenever 0 < ||x − a|| < δ, we have
||(f + g)(x) − (ℓ + m)|| ≤ ||f(x) − ℓ|| + ||g(x) − m|| < ε/2 + ε/2 = ε,
as required. A similar argument, using the bound ||f(x)|| ≤ ||ℓ|| + 1, handles the second equality.
The proof of the last equality is left to the reader in Exercise 4. ■
Once we have the concept of limit, the definition of continuity is quite straightforward.
lim_{x→a} f(x) = f(a).
That is, f is continuous at a if, given any ε > 0, there is δ > 0 so that ||f(x) − f(a)|| < ε whenever ||x − a|| < δ.
It is perhaps a bit more interesting to relate the definition of continuity to our notions of open and closed sets from the previous section. Let's first introduce a bit of standard notation: If f: X → Y is a function and Z ⊂ Y, we write f⁻¹(Z) = {x ∈ X : f(x) ∈ Z}, as illustrated in Figure 3.2. This is called the preimage of Z under the mapping f; be careful to remember that f may not be one-to-one and hence may well have no inverse function.
Figure 3.2
Figure 3.3
Proposition 3.5 Suppose U ⊂ ℝⁿ and W ⊂ ℝᵖ are open, f: U → ℝᵐ, g: W → ℝⁿ, and the composition of functions f∘g is defined (i.e., g(x) ∈ U for all x ∈ W). Then if f and g are continuous, so is f∘g.
► EXAMPLE 5
Define f: ℝ² → ℝ by f(x, y) = x²y/(x⁴ + y²) for (x, y) ≠ 0 and f(0) = 0, whose graph is shown in Figure 3.4. We ask whether f is continuous. Since the denominator vanishes only at the origin, it follows from Corollary 3.3 that f is continuous away from the origin. Now, since f(h, 0) = f(0, k) = 0 for all h and k, we are encouraged. What's more, the restriction of f to the line y = mx is
f(x, mx) = mx³/(x⁴ + m²x²) = mx/(x² + m²),
which tends to 0 as x → 0, for every m. On the other hand, if we consider the restriction of f to the parabola y = x², we find that
f(x, x²) = 1/2 for x ≠ 0, and f(0, 0) = 0,
which is definitely not a continuous function. Thus, f cannot be continuous. (If it were, its restriction to the parabola would be continuous as well.)
Figure 3.4
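A numerical restatement of the example: along every line y = mx the values f(x, mx) = mx/(x² + m²) tend to 0, while along the parabola y = x² the function is identically 1/2 away from the origin:

```python
def f(x, y):
    # x^2 y / (x^4 + y^2), with f(0) = 0
    return 0.0 if (x, y) == (0.0, 0.0) else x * x * y / (x ** 4 + y * y)

h = 1e-6
line_vals = [abs(f(h, m * h)) for m in (0.5, 1.0, 10.0)]   # all tiny near 0
parabola_val = f(h, h * h)                                 # essentially 1/2
```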
Proposition 3.6 Suppose U ⊂ ℝⁿ is open and f: U → ℝᵐ. Then f is continuous at a if and only if for every sequence {xₖ} of points in U converging to a, the sequence {f(xₖ)} converges to f(a).
Corollary 3.7 Suppose f: ℝⁿ → ℝᵐ is continuous. Then for any c ∈ ℝᵐ, the level set f⁻¹({c}) = {x ∈ ℝⁿ : f(x) = c} is a closed set.
Proof Suppose {xₖ} is a convergent sequence of points in f⁻¹({c}), and let a be its limit. By Proposition 3.6, f(xₖ) → f(a). Since f(xₖ) = c for all k, it follows that f(a) = c as well, and so a ∈ f⁻¹({c}), as we needed to show. ■
► EXAMPLE 6
By Example 2, the function f: ℝⁿ → ℝ, f(x) = ||x||², is continuous. The level sets of f are spheres centered at the origin. It follows that these spheres are closed sets. ◄
14. Identify M_{m×n}, the set of m × n matrices, with ℝ^{mn} in the obvious way.
(a) Prove that when n = 2 or 3, the set of n × n matrices with nonzero determinant is an open subset of M_{n×n}.
(b) Prove that the set of n × n matrices A satisfying AᵀA = Iₙ is a closed subset of M_{n×n}.
15. (a) Let
f(x, y) = 0 if |y| ≥ x² or y = 0, and f(x, y) = 1 otherwise.
Show that f is continuous at 0 on every line through the origin but is not continuous at 0.
(b) Give a function that is continuous at 0 along every line and every parabola y = kx² through the origin but is not continuous at 0.
16. Give a function f: ℝ² → ℝ that is
(a) continuous at 0 along every line through the origin but unbounded in every neighborhood of 0;
(b) continuous at 0 along every line through the origin, unbounded in every neighborhood of 0, and discontinuous only at the origin.
17. Generalizing Example 5, determine for what positive values of α, β, γ, and δ the analogous function
f(x, y) = |x|^α |y|^β / (|x|^γ + |y|^δ), (x, y) ≠ 0, f(0) = 0
is continuous at 0.
18. (a) Suppose A is an invertible n × n matrix. Show that the solution of Ax = b varies continuously with b ∈ ℝⁿ.
(b) Show that the solution of Ax = b varies continuously as a function of (A, b), as A varies over all invertible matrices and b over ℝⁿ. (You should be able to get the cases n = 1 and n = 2. What do you need for n > 2?)
CHAPTER 3
THE DERIVATIVE
In this chapter we start in earnest on calculus. The immediate goal is to define the tangent
plane at a point to the graph of a function, which should be the suitable generalization
of the tangent lines in single-variable calculus. The fundamental computational tool is
the partial derivative, a direct application of single-variable calculus tools. But the actual
definition of a differentiable function immediately involves linear algebra. We establish
various differentiation rules and then introduce the gradient, which, as common parlance
has come to suggest, tells us in which direction a scalar function increases the fastest; thus,
it is highly important for physical and mathematical applications. We conclude the chapter
with a discussion of Kepler’s laws, the geometry of curves, and higher-order derivatives.
Definition We define the partial derivatives ∂f/∂x and ∂f/∂y as follows:
∂f/∂x (a, b) = lim_{h→0} [f(a + h, b) − f(a, b)]/h,
∂f/∂y (a, b) = lim_{h→0} [f(a, b + h) − f(a, b)]/h.
Very simply, if we fix b, then ∂f/∂x (a, b) is the derivative at a (or slope) of the function
Figure 1.1
∂f/∂xⱼ (a) = lim_{t→0} [f(a + t eⱼ) − f(a)]/t,  j = 1, …, n
(provided this limit exists). Many authors use the alternative notation Dⱼf(a) to represent the jth partial derivative of f at a.
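The definition translates into a finite-difference estimate; the sketch below uses a central difference quotient with small t, applied to the arbitrary sample function f(x, y) = x²y + y³:

```python
def partial(f, a, j, t=1e-6):
    # (f(a + t e_j) - f(a - t e_j)) / (2t): a symmetric version of the limit above
    ap = list(a); ap[j] += t
    am = list(a); am[j] -= t
    return (f(ap) - f(am)) / (2 * t)

f = lambda p: p[0] ** 2 * p[1] + p[1] ** 3   # sample function (illustrative)
a = [1.0, 2.0]
fx = partial(f, a, 0)    # exact value: 2xy = 4
fy = partial(f, a, 1)    # exact value: x^2 + 3y^2 = 13
```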
► EXAMPLE 1
The partial derivatives of f measure the rate of change of f in the directions of the
coordinate axes, i.e., in the directions of the standard basis vectors ei,..., en. Given any
nonzero vector v, it is natural to consider the rate of change of f in the direction of v.
1 Partial Derivatives and Directional Derivatives
D_v f(a) = lim_{t→0} [f(a + t v) − f(a)]/t.
Note that the jth partial derivative of f at a is just D_{eⱼ} f(a). When n = 2 and m = 1, as we see from Figure 1.2, if ||v|| = 1, the directional derivative D_v f(a) is just the slope at a of the graph we obtain by restricting to the line through a with direction v.
Figure 1.2
the directional derivative depends not only on the direction of v, but also on its magnitude. It is for this reason that many calculus books require that one specify a unit vector v. It makes more sense to think of D_v f(a) as the rate of change of f as experienced by an observer moving with instantaneous velocity v. We shall return to this interpretation in Section 3.
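A difference-quotient estimate also exhibits the dependence on magnitude: replacing v by 2v doubles D_v f(a). The function and the points below are arbitrary choices:

```python
def ddir(f, a, v, t=1e-6):
    # symmetric estimate of D_v f(a) = lim_{t->0} (f(a + t v) - f(a)) / t
    ap = [ai + t * vi for ai, vi in zip(a, v)]
    am = [ai - t * vi for ai, vi in zip(a, v)]
    return (f(ap) - f(am)) / (2 * t)

f = lambda p: p[0] ** 2 + p[0] * p[1]      # sample f(x, y) = x^2 + xy
a, v = [1.0, 2.0], [3.0, 4.0]
d1 = ddir(f, a, v)                         # grad f(a).v = (4, 1).(3, 4) = 16
d2 = ddir(f, a, [6.0, 8.0])                # doubling v doubles the derivative
```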
► EXAMPLE 2
Define f(x, y) = x²y/(x² + y²) for (x, y) ≠ 0, f(0) = 0, whose graph is shown in Figure 1.3. Then the directional derivative of f at 0 in the direction of the unit vector v = (v₁, v₂) is
D_v f(0) = lim_{t→0} [f(t v) − f(0)]/t = lim_{t→0} (t³ v₁² v₂ / t²)/t = v₁² v₂.
Note that both partial derivatives of f at 0 are 0, and yet the remaining directional derivatives are nonzero.
► EXAMPLE 3
Let f: ℝⁿ → ℝ be defined by f(x) = ||x||. Let a ≠ 0 be arbitrary, and let v = a/||a|| be a unit vector pointing radially outward at a. Then
D_v f(a) = lim_{t→0} [f(a + t v) − f(a)]/t = lim_{t→0} [||a + t v|| − ||a||]/t = lim_{t→0} [(||a|| + t) − ||a||]/t = 1.
On the other hand, if v · a = 0, then
D_v f(a) = lim_{t→0} [||a + t v|| − ||a||]/t = 0,
inasmuch as t = 0 is a global minimum of the function g(t) = ||a + t v||. (Why?) ◄
► EXAMPLE 4
Let f(x, y, z) = x²y + e^{3x+y−z}; let a = (1, −1, 2) and v = (2, 3, −1). What is the directional derivative D_v f(a)? Setting
g(t) = f(a + t v) = f(1 + 2t, −1 + 3t, 2 − t) = (1 + 2t)²(−1 + 3t) + e^{3(1+2t)+(−1+3t)−(2−t)} = (1 + 2t)²(3t − 1) + e^{10t},
we find D_v f(a) = g′(0) = −4 + 3 + 10 = 9.
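Taking a = (1, −1, 2) and v = (2, 3, −1) as in the example (as reconstructed here), the exponent collapses to 10t, so g(t) = (1 + 2t)²(3t − 1) + e^{10t} and D_v f(a) = g′(0) = 9; a symmetric difference quotient agrees:

```python
import math

def f(x, y, z):
    return x * x * y + math.exp(3 * x + y - z)

a, v = (1.0, -1.0, 2.0), (2.0, 3.0, -1.0)
t = 1e-6
forward = tuple(ai + t * vi for ai, vi in zip(a, v))
backward = tuple(ai - t * vi for ai, vi in zip(a, v))
estimate = (f(*forward) - f(*backward)) / (2 * t)   # should be close to g'(0) = 9
```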
► EXERCISES 3.1
(e) f(x, y) = (x + y) log x
2. Calculate the directional derivative of the given function f at the given point a in the direction of
the given vector v.
*(a)
*(b)
(c)
(d) f
3. For each of the following functions f and points a, find the unit vector v with the property that D_v f(a) is as large as possible.
*(a) f(x, y) = x² + xy, a = (2, 1)
(b) f(x, y) = y eˣ, a = (0, 1)
(c) f(x, y, z) = xyz, a = (1, −1, 1)
4. Suppose D_v f(a) exists. Prove that D_{−v} f(a) exists and calculate it in terms of the former.
5. (a) Show that there can be no function f: ℝⁿ → ℝ so that for some point a ∈ ℝⁿ we have D_v f(a) > 0 for all nonzero vectors v ∈ ℝⁿ.
(b) Show that there can, however, be a function f: ℝⁿ → ℝ so that for some vector v ∈ ℝⁿ we have D_v f(a) > 0 for all points a ∈ ℝⁿ.
6. Consider the ideal gas law pV = nRT. (Here p is pressure, V is volume, n is the number of
moles of gas present, R is the universal gas constant, and T is temperature.) Assume n is fixed. Solve
for each of p, V, and T as functions of the others, viz.,
8. Suppose f: R → R is differentiable, and let g(x, y) = f(√(x² + y²)) for x ≠ 0. Show that
9. Let f(x, y) = xy/(x² + y²) for (x, y) ≠ 0, f(0) = 0.
Show that the partial derivatives of f exist at 0 and yet f is not continuous at 0. Do other directional
derivatives of f exist at 0?
10. (a) Let /: R2 —> R be the function defined in Example 5 of Chapter 2, Section 3. Calculate
Dv/(0) for any v € R2.
(b) Give an example of a function f: R2 -> R all of whose directional derivatives at 0 are 0 but is,
nevertheless, discontinuous at 0.
*11. Suppose T: Rⁿ → Rᵐ is a linear map. Show that the directional derivative D_v T(a) exists for
all a ∈ Rⁿ and all v ∈ Rⁿ and calculate it.
12. Identify the set M_{n×n} of n × n matrices with R^{n²}.
(a) Define f: M_{n×n} → M_{n×n} by f(A) = Aᵀ. For any A, B ∈ M_{n×n}, prove that D_B f(A) = Bᵀ.
(b) Define f: M_{n×n} → R by f(A) = tr A. For any A, B ∈ M_{n×n}, prove that D_B f(A) = tr B.
(For the definition of trace, see Exercise 1.4.22.)
13. Identify the set M_{n×n} of n × n matrices with R^{n²}.
(a) Define f: M_{n×n} → M_{n×n} by f(A) = A². For any A, B ∈ M_{n×n}, prove that D_B f(A) = AB + BA.
(b) Define f: M_{n×n} → M_{n×n} by f(A) = AᵀA. Calculate D_B f(A).
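The formula in Exercise 13(a) can be sanity-checked numerically. The sketch below (ours, not the text's) forms the difference quotient (f(A + tB) − f(A))/t for small t and compares it with AB + BA; the discrepancy is of order t, coming from the t·B² term.

```python
# Numerical sanity check of Exercise 13(a): for f(A) = A^2,
# D_B f(A) = lim_{t->0} (f(A+tB) - f(A))/t = AB + BA.
import random

random.seed(0)
n, t = 3, 1e-6
A = [[random.random() for _ in range(n)] for _ in range(n)]
B = [[random.random() for _ in range(n)] for _ in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(X, Y, c):
    # entrywise X + c*Y
    return [[X[i][j] + c * Y[i][j] for j in range(n)] for i in range(n)]

AptB = lincomb(A, B, t)
quotient = lincomb(matmul(AptB, AptB), matmul(A, A), -1.0)   # f(A+tB) - f(A)
quotient = [[entry / t for entry in row] for row in quotient]
exact = lincomb(matmul(A, B), matmul(B, A), 1.0)             # AB + BA

err = max(abs(quotient[i][j] - exact[i][j]) for i in range(n) for j in range(n))
print(err)   # of order t: the leftover term is t*B^2
```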
2 Differentiability •< 87
► 2 DIFFERENTIABILITY
For a function f: R → R, one of the fundamental consequences of being differentiable (at
a) is that the function must be continuous (at a). We have already seen that for a function
f: Rⁿ → R, having partial derivatives (or, indeed, all directional derivatives) at a need not
guarantee continuity at a. We now seek the appropriate definition.
Recall that the derivative is defined to be
f′(a) = lim_{h→0} (f(a + h) − f(a))/h.
Equivalently, m = f′(a) is the unique number with the property that
lim_{h→0} (f(a + h) − f(a) − mh)/h = 0.
That is, the tangent line, the line passing through (a, f(a)) with slope m = f′(a), is the
best affine linear approximation to the graph of f at a, in the sense that the error goes to
0 faster than h as h → 0. (See Figure 2.1.) Generalizing the latter notion, we say that
f: Rⁿ → Rᵐ is differentiable at a if there is a linear map Df(a): Rⁿ → Rᵐ so that
lim_{h→0} (f(a + h) − f(a) − Df(a)h)/‖h‖ = 0.
This says that Df(a) is the best linear approximation to the function f − f(a) at a,
in the sense that the difference f(a + h) − f(a) − Df(a)h is small compared to h. See
Figure 2.2 and compare Figure 2.1. Equivalently, writing x = a + h, the function g(x) =
f(a) + Df(a)(x − a) is the best affine linear approximation to f near a. Indeed, the graph of
g is called the tangent plane of the graph at a. The tangent plane is obtained by translating
the graph of Df(a), a subspace of Rⁿ × Rᵐ, so that it passes through (a, f(a)).
Figure 2.2
Remark The derivative Df(a), if it exists, must be unique. If there were two linear
maps T, T′: Rⁿ → Rᵐ satisfying
lim_{h→0} (f(a + h) − f(a) − T(h))/‖h‖ = 0 and lim_{h→0} (f(a + h) − f(a) − T′(h))/‖h‖ = 0,
then we would have
lim_{h→0} (T′(h) − T(h))/‖h‖ = 0.
As we did in the remark above, for any j = 1, …, n, we consider h = t eⱼ and let t → 0.
Then we have
0 = lim_{t→0} (T′(t eⱼ) − T(t eⱼ))/|t| = ±(T′(eⱼ) − T(eⱼ)),
so T′(eⱼ) = T(eⱼ) for every j, and hence T′ = T.
► EXAMPLE 1
If f: R → Rᵐ is differentiable at a, then Df(a) is the m × 1 Jacobian matrix with entries f₁′(a), …, f_m′(a),
and we can think of Df(a) = Df(a)(1) as the velocity vector of the parametrized curve at the point
f(a), which we will usually denote by the (more) familiar f′(a). See Section 5 for further discussion
of this topic.
► EXAMPLE 2
Let f(x, y) = xy. To prove that f is differentiable at a = (a, b), we must exhibit a linear map Df(a)
with the requisite property. By Proposition 2.1, we know the only candidate is Df(a) = [b  a], and
so we must show that
lim_{h→0} (f(a + h, b + k) − ab − (bh + ak))/‖h‖ = lim_{(h,k)→0} hk/√(h² + k²) = 0.
► EXAMPLE 3
► EXAMPLE 4
Let f(x, y) = x/y. First, we claim that f is differentiable at a = (a, b), provided b ≠ 0. The putative
derivative is
Df(a) = [1/b   −a/b²].
We compute
f(a + h, b + k) − f(a, b) − (h/b − (a/b²)k) = (a + h)/(b + k) − a/b − h/b + (a/b²)k
= ((a + h)(−bk) + ak(b + k))/(b²(b + k)) = k(ak − bh)/(b²(b + k)),
and so
|f(a + h) − f(a) − Df(a)h|/‖h‖ = (|k|/√(h² + k²)) · |ak − bh|/(b²|b + k|) → 0,
since |k|/√(h² + k²) ≤ 1 and (ak − bh)/(b²(b + k)) → 0 as h → 0.
Now, as a (not totally facetious) application, consider the problem of calculating one’s gas
mileage, having used y gallons of gas to travel x miles. For example, without having a calculator on
hand, we can use linear approximation afforded us by the derivative to estimate our gas mileage if
we’ve used 10.8 gallons to drive 344 miles. Using a = 350 and b = 10, we have
344/10.8 ≈ 350/10 + (1/10)(344 − 350) − (350/10²)(10.8 − 10) = 35 − 0.6 − 2.8 = 31.6.
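The estimate above is just arithmetic with the derivative Df(350, 10) = [1/10  −350/100]; the short sketch below (ours, not the text's) compares it with the true quotient.

```python
# The gas-mileage estimate: linearize f(x,y) = x/y at (a,b) = (350,10).
a, b = 350.0, 10.0
x, y = 344.0, 10.8
approx = a / b + (1 / b) * (x - a) - (a / b ** 2) * (y - b)
true = x / y
print(approx)   # ≈ 31.6, the estimate found above
print(true)     # ≈ 31.85, so the linear estimate errs by about 0.25 mpg
```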
► EXAMPLE 5
As we said earlier, a function f: R² → R² is differentiable if and only if both its component functions
fᵢ: R² → R are differentiable. It follows from Examples 2 and 4 that the function f: R² → R² given
by
Proof Suppose f is differentiable at a; we must show that lim_{x→a} f(x) = f(a) or, equivalently,
that lim_{h→0} f(a + h) = f(a). We have a linear map Df(a): Rⁿ → Rᵐ so that
lim_{h→0} (f(a + h) − f(a) − Df(a)h)/‖h‖ = 0.
This means that
lim_{h→0} (f(a + h) − f(a) − Df(a)h) = lim_{h→0} ((f(a + h) − f(a) − Df(a)h)/‖h‖) ‖h‖
= lim_{h→0} (f(a + h) − f(a) − Df(a)h)/‖h‖ · lim_{h→0} ‖h‖ = 0.
Since Df(a)h → 0 as h → 0 as well, it follows that lim_{h→0} f(a + h) = f(a), as required. ■
Let’s now study a few examples to see just how subtle the issue of differentiability is.
► EXAMPLE 6
Define f: R² → R by
f(x, y) = xy/(x² + y²) for (x, y) ≠ 0, f(0) = 0.
However, we have already seen in Exercise 3.1.9 that f is discontinuous, so it cannot be differentiable.
For practice, we check directly: If Df(0) existed, by Proposition 2.1 we would have Df(0) = 0.
Now let’s consider
lim_{h→0} (f(h) − f(0) − Df(0)h)/‖h‖ = lim_{h→0} f(h)/‖h‖ = lim_{(h,k)→0} hk/(h² + k²)^{3/2}.
Like many of the limits we considered in Chapter 2, this one obviously does not exist; indeed, as
h → 0 along the line h = k, this fraction becomes
h²/(2h²)^{3/2} = 1/(2^{3/2}|h|),
which is clearly unbounded as h -+ 0. What’s more, as the reader can check, / has directional
derivatives at 0 only in the directions of the axes.
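Example 6 is easy to see numerically as well. The sketch below (ours, not from the text) shows that both partial-derivative quotients at the origin vanish, while the quotient f(h)/‖h‖ blows up along the diagonal h = k:

```python
# Example 6: f(x,y) = xy/(x^2+y^2), f(0) = 0 -- partials vanish at 0,
# but the differentiability quotient is unbounded along h = k.
import math

def f(x, y):
    return 0.0 if x == 0.0 and y == 0.0 else x * y / (x * x + y * y)

t = 1e-4
print(f(t, 0.0) / t, f(0.0, t) / t)        # both 0: the partials at 0 vanish

for h in (1e-1, 1e-2, 1e-3):
    print(f(h, h) / math.sqrt(2 * h * h))  # = 1/(2*sqrt(2)*h), blowing up
```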
► EXAMPLE 7
Define f: R² → R by
f(x, y) = x²y/(x² + y²) for (x, y) ≠ 0, f(0) = 0.
As in Example 6, both partial derivatives of this function at 0 are 0. This function, as we saw in
Example 5 of Chapter 2, Section 3, is continuous, so differentiability is a bit more unclear. But we
just try to calculate:
lim_{h→0} (f(h) − f(0) − Df(0)h)/‖h‖ = lim_{h→0} f(h)/‖h‖ = lim_{(h,k)→0} h²k/(h² + k²)^{3/2}.
When h → 0 along either coordinate axis, the limit is obviously 0; however, when h → 0 along the
line h = k, the limit does not exist, as the expression is equal to 1/(2√2) when h > 0 and −1/(2√2) when
h < 0. Thus, f is not differentiable at 0.
Proposition 2.3 When f is differentiable at a, for any v ∈ Rⁿ, the directional derivative
of f at a in the direction v is given by
D_v f(a) = Df(a)v.
Proof Since f is differentiable at a, we know that its derivative, Df(a), has the
property that
lim_{h→0} (f(a + h) − f(a) − Df(a)h)/‖h‖ = 0.
Since Df(a) is a linear map, Df(a)(tv) = tDf(a)v. Proceeding as in the proof of Proposition 2.1,
letting t approach 0 through positive values, we have
lim_{t→0⁺} (f(a + tv) − f(a) − tDf(a)v)/t = 0, and so
lim_{t→0⁺} (f(a + tv) − f(a))/t = Df(a)v.
Similarly, when t approaches 0 through negative values, we have |t| = −t and
lim_{t→0⁻} (f(a + tv) − f(a) − tDf(a)v)/(−t) = 0, so
lim_{t→0⁻} (f(a + tv) − f(a) − tDf(a)v)/t = 0, and, as before,
lim_{t→0⁻} (f(a + tv) − f(a))/t = Df(a)v.
Thus,
D_v f(a) = lim_{t→0} (f(a + tv) − f(a))/t = Df(a)v,
as required. ■
Proof By Exercise 6, it suffices to treat the case m = 1. For clarity, we give the proof
in the case n = 2, although the general case is not conceptually any harder. As usual, we
write a = (a, b) and h = (h, k).
As usual, if f is to be differentiable, we know that Df(a) must be given by the Jacobian
matrix of f at a. To prove that f is differentiable at a ∈ U, we need to estimate
Now, here is the new twist: As Figure 2.3 indicates, we calculate f(a + h) − f(a) by
taking a two-step route:
f(a + h) − f(a) = (f(a + h, b + k) − f(a, b + k)) + (f(a, b + k) − f(a, b)),
and so, regrouping in a clever fashion and using the Mean Value Theorem twice, we obtain
f(a + h) − f(a) − Df(a)h = (∂f/∂x(ξ, b + k) − ∂f/∂x(a, b)) h + (∂f/∂y(a, η) − ∂f/∂y(a, b)) k
for some ξ between a and a + h and some η between b and b + k.
Now, observe that |h|/‖h‖ ≤ 1 and |k|/‖h‖ ≤ 1; as h → 0, continuity of the partial derivatives
guarantees that
lim_{h→0} (∂f/∂x(ξ, b + k) − ∂f/∂x(a, b)) = 0 and lim_{h→0} (∂f/∂y(a, η) − ∂f/∂y(a, b)) = 0,
► EXAMPLE 8
We know that the function f given in Example 7 is not differentiable. It follows from Proposition
2.4 that f cannot be C1 at 0. Let’s verify this directly.
It is obvious that ∂f/∂x(0) = ∂f/∂y(0) = 0, and for (x, y) ≠ 0, we have
∂f/∂x(x, y) = 2xy³/(x² + y²)² and ∂f/∂y(x, y) = x²(x² − y²)/(x² + y²)².
In particular, along the line y = x we have ∂f/∂x = 1/2, which does not approach ∂f/∂x(0) = 0, so
∂f/∂x is not continuous at 0.
► EXAMPLE 9
To see that the sufficient condition for differentiability given by Proposition 2.4 is not necessary, we
consider the classic example of the function f: R -> R defined by
f(x) = x² sin(1/x), x ≠ 0; f(0) = 0.
Then it is easy to check that f′(0) = 0, and yet f′(x) = 2x sin(1/x) − cos(1/x) has no limit as x → 0.
Thus, f is differentiable on all of R but is not C1.
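This behavior is easy to witness numerically. The sketch below (ours, not the text's) shows the difference quotient at 0 vanishing while f′ takes values near ±1 at points arbitrarily close to 0:

```python
# Example 9: f(x) = x^2 sin(1/x) (f(0)=0) has f'(0) = 0, but
# f'(x) = 2x sin(1/x) - cos(1/x) oscillates between about -1 and 1 near 0.
import math

def f(x):
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def fprime(x):
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

h = 1e-7
print(f(h) / h)                        # ≈ 0: the difference quotient at 0
print(fprime(1 / (200 * math.pi)))     # ≈ -1 (here cos(1/x) = 1)
print(fprime(1 / (201 * math.pi)))     # ≈ +1 (here cos(1/x) = -1)
```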
► EXERCISES 3.2
1. Find the equation of the tangent plane of the graph of f at the indicated point.
*(f) f(x, y, z) = sin(xy)z² + e^{xz+1}, a = (1, 0, −1)
0 1
(b)
jt /4 -1
*3. Give the derivative matrix of each of the following vector-valued functions.
xy xyz
(a) f:R2->R2, f (d) f:"R3-->R2, f y
■2 -U ,,2 _u. ,2
\z
cost xcosy
*4. Use the technique of Example 4 to estimate your gas mileage if you used 6.5 gallons to drive 224
miles.
5. Two sides of a triangle are x = 3 and y = 4, and the included angle is θ = π/3. To a small change
in which of these three variables is the area of the triangle most sensitive? Why?
6. Let U ⊂ Rⁿ be an open set, and let a ∈ U. Suppose m > 1. Prove that the function f: U → Rᵐ
is differentiable at a if and only if each component function fᵢ, i = 1, …, m, is differentiable at a.
(Hint: Review the proof of Proposition 3.1 of Chapter 2.)
7. Show that any linear map is differentiable and is its own derivative (at an arbitrary point).
8. Show that the tangent plane of the cone z2 = x2 + y2 at 0 intersects the cone in a line.
9. Show that the tangent plane of the saddle surface z = xy at any point intersects the surface in a
pair of lines.
10. Find the derivative of the map f(x, y) = (x² − y², 2xy) at the point a. Show that whenever a ≠ 0,
the linear map Df(a) is a scalar multiple of a rotation matrix.
11. Prove from the definition that the following functions are differentiable.
(a) f(x, y) = x² + y² (b) f(x, y) = xy² (c) f: Rⁿ → R, f(x) = ‖x‖²
12. Let
x ≠ 0, f(0) = 0.
Show directly that f fails to be C1 at the origin. (Of course, this follows from Example 5 of Section
3 of Chapter 2 and Propositions 2.2 and 2.4.)
3 Differentiation Rules 97
13. Use the results of Exercise 3.1.13 to show that f(A) = A² and f(A) = AᵀA are differentiable
functions mapping M_{n×n} to M_{n×n}.
14. Let A be an n × n matrix. Define f: Rⁿ → R by f(x) = Ax · x = xᵀAx.
(a) Show that f is differentiable and Df(a)h = Aa · h + Ah · a.
(b) Deduce that when A is symmetric, Df(a)h = 2Aa · h.
15. Let a ∈ Rⁿ, δ > 0, and suppose f: B(a, δ) → R is differentiable at a. Suppose f(a) ≥ f(x)
for all x ∈ B(a, δ). Prove that Df(a) = 0.
16. Let a ∈ R², δ > 0, and suppose f: B(a, δ) → R is differentiable and Df(x) = 0 for all x ∈
B(a, δ). Prove that f(x) = f(a) for all x ∈ B(a, δ). (Hint: Start with the proof of Proposition 2.4.)
17. Let
y, x o
0, x=Q
= x^0) A°) = °-
\y J x4 + y8
► 3 DIFFERENTIATION RULES
In practice, most of the time Proposition 2.4 is sufficient for us to calculate explicit derivatives.
However, it is reassuring to know that the sum, product, and quotient rules from
elementary calculus pertain to the multivariable case. We shall come to the chain rule
shortly.
For the next proofs, we need the notion of the norm of a linear map T: Rⁿ → Rᵐ. We
set
‖T‖ = max{‖T(x)‖ : ‖x‖ = 1}.
(In Section 1 of Chapter 5 we will prove the maximum value theorem, which states that a
continuous function on a closed and bounded subset of R" achieves its maximum value.
Since the unit sphere in Rⁿ is closed and bounded, this maximum exists.) When x ≠ 0, we
then have ‖T(x)‖ = ‖x‖ ‖T(x/‖x‖)‖ ≤ ‖T‖ ‖x‖.
Proposition 3.1 Suppose U ⊂ Rⁿ is open and f: U → Rᵐ, g: U → Rᵐ, and
k: U → R. Suppose a ∈ U and f, g, and k are differentiable at a. Then
1. f + g is differentiable at a, with D(f + g)(a) = Df(a) + Dg(a);
2. kf is differentiable at a, with D(kf)(a)h = (Dk(a)h)f(a) + k(a)Df(a)h;
3. f · g is differentiable at a, with D(f · g)(a)h = (Df(a)h) · g(a) + f(a) · (Dg(a)h).
Proof These are much like the proofs of the corresponding results in single-variable
calculus. Here, however, we insert the candidate for the derivative in the definition and
check that the limit is indeed 0:
1. lim_{h→0} ((f + g)(a + h) − (f + g)(a) − (Df(a) + Dg(a))h)/‖h‖
= lim_{h→0} ((f(a + h) − f(a) − Df(a)h) + (g(a + h) − g(a) − Dg(a)h))/‖h‖
= lim_{h→0} (f(a + h) − f(a) − Df(a)h)/‖h‖ + lim_{h→0} (g(a + h) − g(a) − Dg(a)h)/‖h‖ = 0 + 0 = 0.
2. We proceed much as in the proof of the limit of the product in Theorem 3.2 of
Chapter 2.
((kf)(a + h) − (kf)(a) − ((Dk(a)h)f(a) + k(a)Df(a)h))/‖h‖
= (k(a + h)f(a + h) − k(a)f(a + h) − (Dk(a)h)f(a))/‖h‖ + k(a)(f(a + h) − f(a) − Df(a)h)/‖h‖.
Now, the second term clearly approaches 0. To handle the first term, we have to
use continuity in a rather subtle way, remembering that if f is differentiable at a,
then it is necessarily continuous at a (Proposition 2.2):
The first of these terms can be rewritten as
((k(a + h) − k(a) − Dk(a)h)/‖h‖) f(a + h) + (Dk(a)h)(f(a + h) − f(a))/‖h‖.
Now here the first term clearly approaches 0, but the second term is a bit touchy.
The length of the second term is
|Dk(a)h| ‖f(a + h) − f(a)‖/‖h‖ ≤ ‖Dk(a)‖ ‖f(a + h) − f(a)‖,
which in turn goes to 0 as h → 0 by continuity of f at a. This concludes the proof
of (2).
The proof of (3) is virtually identical to that of (2) and is left to the reader in Exercise 9. ■
Theorem 3.2 (The Chain Rule) Suppose g: Rⁿ → Rᵐ and f: Rᵐ → Rℓ, g is differentiable
at a, and f is differentiable at g(a). Then f∘g is differentiable at a and
D(f∘g)(a) = Df(g(a)) ∘ Dg(a).
Given ε > 0, this means that there are δ₁ > 0 and η > 0 so that
(∗) 0 < ‖h‖ < δ₁ ⟹ ‖g(a + h) − g(a) − Dg(a)h‖ < ε‖h‖ and
(∗∗) ‖k‖ < η ⟹ ‖f(b + k) − f(b) − Df(b)k‖ < ε‖k‖.
Setting k = g(a + h) − g(a) and rewriting (∗), we conclude that whenever 0 < ‖h‖ < δ₁,
we have
‖k − Dg(a)h‖ < ε‖h‖, and so
‖k‖ ≤ ‖Dg(a)h‖ + ε‖h‖ ≤ (‖Dg(a)‖ + ε)‖h‖.
Finally, we start with the numerator of the fraction whose limit we seek.
f (b + k) - f (b) - Df (b)Dg(a)h
= [f (b + k) - f(b) - Df (b)k] + [Df (b)k - Df(b)Dg(a)h]
= [f(b + k) - f(b) - Df(b)k] + Df(b)(k - Dg(a)h).
as required. ■
Remark Those who wish to end with a perfect ε at the end may replace the ε in (∗)
with ε/(2(‖Df(b)‖ + 1)) and that in (∗∗) with ε/(2(‖Dg(a)‖ + ε)).
► EXAMPLE 1
Suppose the temperature in space is given by f(x, y, z) = xyz² + e^{3xy−2z} and the position of a bumblebee
is given as a function of time t by g: R → R³. If at time t = 0 the bumblebee is at a = (1, 2, 3)
and her velocity vector is v = (3, 1, 2), as indicated in Figure 3.1, then we might ask at what rate she
perceives the temperature to be changing at that instant. The temperature she measures at time t is
(f∘g)(t), and so she wants to calculate (f∘g)′(0) = D(f∘g)(0). We have
∂f/∂x = yz² + 3y e^{3xy−2z}, ∂f/∂y = xz² + 3x e^{3xy−2z}, ∂f/∂z = 2xyz − 2e^{3xy−2z},
so Df(a) = [24  12  10]. Then (f∘g)′(0) = Df(a) g′(0) = Df(a)v.
Note that in order to apply the chain rule, we need to know only her position and velocity vector at
that instant, not even what her path near a might be.
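The derivative Df(a) = [24 12 10] is easy to confirm by difference quotients. In the sketch below (ours, not the text's), the velocity vector `v` is an assumed sample value, used only to illustrate the chain-rule formula (f∘g)′(0) = Df(a)v:

```python
# Check of the bumblebee example: difference quotients recover
# Df(a) = [24 12 10] for f(x,y,z) = xyz^2 + e^{3xy-2z} at a = (1,2,3).
import math

def f(x, y, z):
    return x * y * z * z + math.exp(3 * x * y - 2 * z)

a = (1.0, 2.0, 3.0)
t = 1e-6
grad = []
for i in range(3):
    p = list(a); p[i] += t
    m = list(a); m[i] -= t
    grad.append((f(*p) - f(*m)) / (2 * t))
print([round(g, 3) for g in grad])   # ≈ [24, 12, 10]

v = (3.0, 1.0, 2.0)                  # assumed velocity vector, for illustration
print(sum(gi * vi for gi, vi in zip(grad, v)))   # Df(a)·v, the perceived rate
```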
Remark Suppose f: Rⁿ → Rᵐ is differentiable at a and we wish to evaluate D_v f(a)
for some v ∈ Rⁿ. Define g: R → Rⁿ by g(t) = a + tv, and consider φ(t) = (f∘g)(t). By
definition, we have D_v f(a) = φ′(0). Then, by the chain rule, we have
D_v f(a) = φ′(0) = Df(g(0)) g′(0) = Df(a)v.
This is an alternative derivation of the result of Proposition 2.3. (Cf. Example 4 of Section 1.)
Indeed, if g is any differentiable function with g(0) = a and g'(0) = v, we see that
Dvf (a) = (f°g)'(O), so this shows, as we suggested in the remark on p. 83, that we should
think of the directional derivative as the rate of change perceived by an observer at a moving
with instantaneous velocity v.
► EXAMPLE 2
Let f(x, y) = xy and g(u, v) = (u cos v, u sin v).
Since
Df(x, y) = [y  x] and Dg(u, v) = ( cos v  −u sin v ; sin v  u cos v ),
we have
D(f∘g)(u, v) = Df(g(u, v)) Dg(u, v) = [u sin v  u cos v] ( cos v  −u sin v ; sin v  u cos v ) = [u sin 2v   u² cos 2v].
On the other hand, as the reader can verify, (f∘g)(u, v) = u² sin v cos v = ½u² sin 2v, and so we can double-check
the calculation of the derivative directly. ◄
► EXERCISES 3.3
*1. Suppose f: R3 -> R is differentiable and
Find (/og)'(0).
*2. Suppose
2y - sinx
gjc+3y
xy + y3
3. Suppose g(t) = (cos t, 2 sin(t/2)) and f(x, y) = y² + 2x. Use the chain rule to calculate
(f∘g)′(t). What do you conclude?
4. An ant moves along a helical path with trajectory g(t) = (3 cos t, 3 sin t, 5t).
(a) At what rate is his distance from the origin changing at t = 2π?
(b) The temperature in space is given by the function f: R³ → R, f(x, y, z) = xy + z². At what rate
does the ant detect the temperature to be changing at t = 3π/4?
*5. An airplane is flying near a radar tower. At the instant it is exactly 3 miles due west of the tower,
it is 4 miles high and flying with a ground speed of 450 mph and climbing at a rate of 5 mph. If at
that instant it is flying
F(f) = / h(s)ds
Ju(t)
is differentiable and calculate F′. (Hint: Recall that the Fundamental Theorem of Calculus tells you
how to differentiate functions such as H(x) = ∫ₐˣ h(s) ds.)
du dv \3x/ \3y/ ’
*16. Suppose f: R² → R is differentiable and let F(r, θ) = f(r cos θ, r sin θ). Calculate
(∂F/∂r)² + (1/r²)(∂F/∂θ)².
► 4 THE GRADIENT

If f: Rⁿ → R is differentiable at a, it is convenient to assemble its partial derivatives into the gradient vector
∇f(a) = (Df(a))ᵀ = (∂f/∂x₁(a), …, ∂f/∂xₙ(a)).
If we consider the directional derivative in the direction of various unit vectors v, we infer
from the Cauchy–Schwarz inequality, Proposition 2.3 of Chapter 1, that
D_v f(a) = ∇f(a) · v ≤ ‖∇f(a)‖,
with equality holding if and only if v = ∇f(a)/‖∇f(a)‖. That is, the gradient points in the direction
in which f increases most rapidly.
► EXAMPLE 1
Let f: Rⁿ → R be defined by f(x) = ‖x‖. It is simple enough to calculate partial derivatives of
f, but we’d rather use the geometric meaning of the gradient to figure out ∇f(a) for any a ≠ 0.
Clearly, if we are at a, the direction in which distance from the origin increases most rapidly is in the
4 The Gradient < 105
direction of a itself (i.e., to move away from the origin as fast as possible, we should move radially
outward). Moreover, we saw in Example 3 of Section 1 that the directional derivative D_v f(a) = 1
when v = a/‖a‖. Therefore, we infer from Proposition 4.1 that ∇f(a) is a vector pointing radially
outward and having length 1. That is,
∇f(a) = a/‖a‖.
As corroboration, we observe that if we move orthogonal to a, then instantaneously our distance from
the origin is not changing, so D_v f(a) = ∇f(a) · v = 0 when v · a = 0, as it should. ◄
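A numerical check of Example 1 (our own sketch, not part of the text): the partial derivatives of ‖x‖, computed by difference quotients, assemble into a/‖a‖.

```python
# Example 1: grad f(a) = a/||a|| for f(x) = ||x||.
import math

def norm(x):
    return math.sqrt(sum(c * c for c in x))

a = [1.0, 2.0, -2.0]    # ||a|| = 3
t = 1e-6
grad = []
for i in range(3):
    p = list(a); p[i] += t
    m = list(a); m[i] -= t
    grad.append((norm(p) - norm(m)) / (2 * t))

expected = [c / norm(a) for c in a]    # (1/3, 2/3, -2/3)
print(grad)
print(expected)
```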
► EXAMPLE 2
Consider the surface M defined by f(x, y, z) = e^{x+2y} cos z − xz + y = 2. Note that the point
a = (−2, 1, 0) lies on M. We want to find the equation of the tangent plane to M at a. We know that ∇f(a)
gives the normal to the plane, so we calculate
∇f(x, y, z) = (e^{x+2y} cos z − z, 2e^{x+2y} cos z + 1, −e^{x+2y} sin z − x),
so ∇f(a) = (1, 3, 2), and the tangent plane is x + 3y + 2z = 1.
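The arithmetic here can be corroborated numerically. This sketch (ours, not the text's) checks that a lies on the level set, recovers ∇f(a) = (1, 3, 2) by difference quotients, and evaluates the constant of the tangent plane:

```python
# Example 2: a = (-2,1,0) lies on f = 2 for f(x,y,z) = e^{x+2y} cos z - xz + y,
# grad f(a) = (1,3,2), so the tangent plane is x + 3y + 2z = 1.
import math

def f(x, y, z):
    return math.exp(x + 2 * y) * math.cos(z) - x * z + y

a = (-2.0, 1.0, 0.0)
print(f(*a))           # 2.0: the point is on the surface

t = 1e-6
grad = []
for i in range(3):
    p = list(a); p[i] += t
    m = list(a); m[i] -= t
    grad.append((f(*p) - f(*m)) / (2 * t))
print([round(g, 4) for g in grad])            # ≈ [1, 3, 2]
print(sum(g * c for g, c in zip(grad, a)))    # ≈ 1, the plane's constant
```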
► EXAMPLE 3
As a beautiful application of this principle, we use the results of Example 1 to derive a fundamental
physical property of the ellipse. Given two points F₁ and F₂ in the plane, an ellipse (as pictured in
Figure 4.1) is the locus of points P so that
‖F₁P‖ + ‖F₂P‖ = 2a
for some positive constant a. Write fᵢ(x) = ‖Fᵢx‖, i = 1, 2, and set f(x) = f₁(x) + f₂(x). Then,
by the results of Example 1, we have
∇f(P) = v₁ + v₂.
Both v₁ and v₂ are unit vectors pointing radially away from F₁ and F₂, respectively, and therefore
∇f(P) bisects the angle between them (see Exercise 1.2.20). Thus, α = β, and so the tangent line
to the ellipse at P makes equal angles with the lines ℓ₁ and ℓ₂. Thus, a light ray emanating from
one focus reflects off the ellipse back to the other focus.
► EXERCISES 3.4
1. Give the equation of the tangent line of the given level curve at the prescribed point a.
*(a) x3 + y3 = 9, a =
0
(b) 3xy2 4- — sin (ary) = 1, a =
1
2. Give the equation of the tangent plane of the given level surface at the prescribed point a.
(a) x² + y² + z² = 5, a = (1, 0, 2)
*(b) yz² + 2e^{xy}z⁵ = 4, a = (0, 2, 1)
(c) x³ + xz² + y²z + y³ = 0, a = (−1, 1, 0)
(d) e^{2x+z} cos(3y) − xy + z = 3, a = (−1, 0, 2)
3. Given the topographical map in Figure 4.2, sketch on the map an approximate route of steepest
ascent from P to Q, the top of the mountain. What about from J??
(a) Find a vector tangent to the curve of steepest ascent on the hill at
a
(b) Find the angle that a stream makes with the horizontal at if it flows in the e2 direction at
c
that point.
5. As shown in Figure 4.3, at a certain moment, a ladybug is at position x₀ and moving with velocity
vector v. At that moment, the angle ∠ax₀b = π/2, her velocity bisects that angle, and her speed is
5 units/sec. At what rate is the sum of her distances from a and b decreasing at that moment? Give
your reasoning clearly.
Figure 4.3
6. Suppose that, in a neighborhood of the point a, the level curve C = {x ∈ R²: f(x) = c} can be
parametrized by a differentiable function g: (−ε, ε) → R², with g(0) = a. Use the chain rule to
prove that ∇f(a) is orthogonal to the tangent vector to C at a.
7. Check that the definition of an ellipse given in Example 3 gives the usual Cartesian equation of
the form
x²/a² + y²/b² = 1
when the foci are at (±c, 0). (Hint: You should find that a² = b² + c².)
8. By analogy with Example 3, prove that light emanating from the focus of a parabola reflects off the
parabola in the direction of the axis of the parabola. This is why automobile headlights use parabolic
reflectors. (A convenient definition of a parabola is this: It is the locus of points equidistant from a
point (the focus) and a line (the directrix), as pictured in Figure 4.4.)
Figure 4.4
9. Using Figure 4.5 as a guide, complete Dandelin’s proof (dating from 1822) that the appropriate
conic section is an ellipse. Find spheres that are inscribed in the cone and tangent to the plane of
the ellipse. Letting F₁ and F₂ be the points of tangency and P a point of the ellipse, let Q₁ and Q₂
be points where the generator of the cone through P intersects the respective spheres. Show that
‖PQᵢ‖ = ‖PFᵢ‖, i = 1, 2, and deduce that ‖F₁P‖ + ‖F₂P‖ = const. (What happens when we tilt
the plane to obtain a parabola or hyperbola?)
Figure 4.5
10. Suppose f: R² → R is a differentiable function whose gradient is nowhere 0 and that satisfies
∂f/∂x = 2 ∂f/∂y
everywhere.
(a) Find (with proof) the level curves of f.
(b) Show that there is a differentiable function F: R → R so that f(x, y) = F(2x + y).
5 Curves ■< 109
11. Suppose f: R² − {0} → R is a differentiable function whose gradient is nowhere 0 and that
satisfies
−y ∂f/∂x + x ∂f/∂y = 0
everywhere.
everywhere.
(a) Find (with proof) the level curves of f.
(b) Show that there is a differentiable function F defined on the set of positive real numbers so that
f(x) = F(‖x‖).
*12. Find all constants c for which the surfaces
x2 + y2 + z2 = 1 and z = x2 + y2 + c
(a) intersect tangentially at each point and (b) intersect orthogonally at each point
13. Prove the so-called pedal property of the ellipse: If n is the unit normal to the ellipse at P, then
(F₁P · n)(F₂P · n) = const.
14. The height of land in the vicinity of a hill is given in terms of horizontal coordinates x and y
1
1 and follows a path of “steepest
5
descent.” Find the equation of the path of the stream on a map of the region.
15. A drop of water falls onto a football and rolls down, following the path of steepest descent; that
is, it moves in the direction tangent to the football most nearly vertically downward. Find the path
the water drop follows if the surface of the football is ellipsoidal and given by the equation
4x² + y² + 4z² = 9,
and the drop starts at the point (1, 1, 1).
► 5 CURVES
In this section, we return to the study of (parametrized) curves with which we began Chapter
2. Now we bring in the appropriate differential calculus to discuss velocity, acceleration,
some basic principles from physics, and the notion of curvature.
If g: (a, b) → Rⁿ is a twice-differentiable vector-valued function, we can visualize
g(t) as denoting the position of a particle at time t, and hence the image of g represents its
trajectory as time passes. Then g′(t) is the velocity vector of the particle at time t and g″(t)
is its acceleration vector at time t. The length of the velocity vector, ‖g′(t)‖, is called the
speed of the particle. In physics, a particle of mass m is said to have kinetic energy
K.E. = ½ m (speed)²,
and acceleration looms large because of Newton’s second law of motion, which says that a
force acting on an object imparts an acceleration according to the equation
F = ma.
As a quick application of some vector calculus, let’s discuss a few properties of motion
in a central force field. We call a force field F: U → R³ on an open subset U ⊂ R³ central
if F(x) = ψ(x)x for some continuous function ψ: U → R; that is, F is everywhere a scalar
multiple of the position vector.
Newton discovered that the gravitational field of a point mass M is an inverse square
force directed toward the point mass. If we assume the point mass is at the origin, then the
force exerted on a unit test mass at position x is
F(x) = −(GM/‖x‖²) x/‖x‖ = −(GM/‖x‖³) x,
where G is the universal gravitational constant. Newton published his laws of motion
in 1687 in his Philosophiae Naturalis Principia Mathematica. Interestingly, Kepler had
published his empirical observations almost a century earlier, in 1596.1
Kepler’s first law: Planets move in ellipses with the sun at one focus.
Kepler’s second law: The position vector from the sun to the planet sweeps out area at a
constant rate.
Kepler’s third law: The square of the period of a planet is proportional to the cube of the
semimajor axis of its elliptical orbit.
For the first and third laws we refer the reader to Exercise 15, but here we prove a general
ization of the second.
Proposition 5.1 Let F be a central force field on R3. Then the trajectory of any
particle lies in a plane; assuming the trajectory is not a line, the position vector sweeps out
area at a constant rate.
Proof Let the trajectory of the particle be given by g(t), and let its mass be m.
Consider the vector function A(t) = g(t) × g′(t). By Exercise 3.3.10 and by Newton’s
second law of motion, we have
A′(t) = g′(t) × g′(t) + g(t) × g″(t) = g(t) × (1/m)F(g(t)) = (ψ(g(t))/m) g(t) × g(t) = 0,
since the cross product of any vector with a scalar multiple of itself is 0. Thus, A(t) = A₀
is a constant. If A₀ = 0, the particle moves on a line (why?). If A₀ ≠ 0, then note that g
lies on the plane
Ao • x = 0,
Somewhat earlier he had surmised that the positions of the six known planets were linked to the famous five
regular polyhedra.
by the position vectors g(t) and g(t + h) (see Figure 5.1), for h small, this is approximately
the area of the triangle determined by the pair of vectors or, equivalently, by the
vectors g(t) and g(t + h) − g(t). According to Proposition 5.1 of Chapter 1, this area is
½‖g(t) × (g(t + h) − g(t))‖, so that
dA/dt = lim_{h→0⁺} ½ ‖g(t) × (g(t + h) − g(t))‖ / h
= lim_{h→0⁺} ½ ‖g(t) × (g(t + h) − g(t))/h‖
= ½ ‖g(t) × g′(t)‖ = ½ ‖A₀‖.
That is, the position vector sweeps out area at a constant rate. ■
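Proposition 5.1 is easy to witness in a simulation. The sketch below (ours, not the text's) integrates g″ = −g/‖g‖³, an inverse-square central force with unit constants, by a kick–drift–kick scheme and watches A(t) = g × g′ stay fixed; since the kicks change the velocity only in the direction of g, the cross product is preserved exactly, up to roundoff.

```python
# Numerical illustration of Proposition 5.1: for the central force
# g'' = -g/||g||^3, the vector A = g x g' is constant along the motion.
import math

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def accel(x):
    r = math.sqrt(x[0]**2 + x[1]**2 + x[2]**2)
    return [-c / r**3 for c in x]

x = [1.0, 0.0, 0.0]     # initial position
v = [0.0, 1.1, 0.3]     # initial velocity (not radial, so A0 != 0)
dt = 5e-4
A0 = cross(x, v)
for _ in range(20000):  # kick-drift-kick steps
    a = accel(x)
    v = [vi + 0.5 * dt * ai for vi, ai in zip(v, a)]
    x = [xi + dt * vi for xi, vi in zip(x, v)]
    a = accel(x)
    v = [vi + 0.5 * dt * ai for vi, ai in zip(v, a)]
A1 = cross(x, v)
print(A0)
print(A1)   # agrees with A0: the motion stays in the plane A0 . x = 0
```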
One of the most useful (yet intuitively quite apparent) results about curves is the
following.
Proposition 5.2 Suppose g: (a, b) → Rⁿ is a differentiable parametrized curve with
the property that g has constant length (i.e., the curve lies on a sphere centered at the
origin). Then g(t) · g′(t) = 0 for all t; i.e., the velocity vector is everywhere orthogonal to
the position vector.
Proof We differentiate ‖g(t)‖² = g(t) · g(t) = const to obtain
0 = (g · g)′(t) = g′(t) · g(t) + g(t) · g′(t) = 2 g(t) · g′(t),
as required. ■
Physically, one should think of it this way: If the velocity vector had a nonzero projection
on the position vector, that would mean that the particle’s distance from the center of the
sphere would be changing. Analogously, as we ask the reader to show in Exercise 2, if a
particle moves with constant speed, then its acceleration must be orthogonal to its velocity.
Now we leave physics behind for a while and move on to discuss some geometry. We
begin with a generalization of the triangle inequality, Corollary 2.4 of Chapter 1.
Lemma 5.3 Suppose g: [a, b] → Rⁿ is continuous (except perhaps at finitely many
points). Then, defining the integral of g component by component, we have
‖∫ₐᵇ g(t) dt‖ ≤ ∫ₐᵇ ‖g(t)‖ dt.
Proof Let v = ∫ₐᵇ g(t) dt. If v = 0, there is nothing to prove. By the Cauchy–Schwarz
inequality, Proposition 2.3 of Chapter 1, |v · g(t)| ≤ ‖v‖ ‖g(t)‖, so
‖v‖² = v · ∫ₐᵇ g(t) dt = ∫ₐᵇ v · g(t) dt ≤ ∫ₐᵇ ‖v‖ ‖g(t)‖ dt = ‖v‖ ∫ₐᵇ ‖g(t)‖ dt.
Assuming v ≠ 0, we now infer that ‖v‖ ≤ ∫ₐᵇ ‖g(t)‖ dt, as required. ■
That is, ℓ(g, P) is the length of the inscribed polygon with vertices at g(tᵢ), i = 0, …, k,
as indicated in Figure 5.2. We define the arclength of g to be the supremum of ℓ(g, P) over
all partitions P of [a, b].
The following result is not in the least surprising: The distance a particle travels is the
integral of its speed.
Proposition 5.4 Let g: [a, b] → Rⁿ be a piecewise-C¹ parametrized curve. Then
ℓ(g) = ∫ₐᵇ ‖g′(t)‖ dt.
Proof For any partition P of [a, b], Lemma 5.3 gives
ℓ(g, P) = Σᵢ₌₁ᵏ ‖g(tᵢ) − g(tᵢ₋₁)‖ = Σᵢ₌₁ᵏ ‖∫_{tᵢ₋₁}^{tᵢ} g′(t) dt‖ ≤ ∫ₐᵇ ‖g′(t)‖ dt,
so ℓ(g) ≤ ∫ₐᵇ ‖g′(t)‖ dt. The same holds on any interval.
Now, for a < t < b, define s(t) to be the arclength of the curve g on the interval [a, /].
Then for h > 0 we have
since s(t + h) − s(t) is the arclength of the curve g on the interval [t, t + h]. Now
‖g(t + h) − g(t)‖/h ≤ (s(t + h) − s(t))/h ≤ (1/h) ∫ₜ^{t+h} ‖g′(u)‖ du,
and as h → 0⁺ both outer expressions approach ‖g′(t)‖, so
lim_{h→0⁺} (s(t + h) − s(t))/h = ‖g′(t)‖.
A similar argument works for h < 0, and we conclude that s′(t) = ‖g′(t)‖. Therefore,
s(t) = ∫ₐᵗ ‖g′(u)‖ du,
and, in particular, s(b) = ℓ(g) = ∫ₐᵇ ‖g′(t)‖ dt, as desired. ■
► EXAMPLE 1
Consider the helix
g(t) = (a cos t, a sin t, bt), t ∈ R,
as pictured in Figure 5.3. Note that it twists around the cylinder of radius a, heading “uphill” at a
constant pitch. If we take one “coil” of the helix, letting t run from 0 to 2π, then the arclength of that
portion is
ℓ(g) = ∫₀^{2π} ‖g′(t)‖ dt = ∫₀^{2π} ‖(−a sin t, a cos t, b)‖ dt = ∫₀^{2π} √(a² + b²) dt = 2π√(a² + b²). ◄
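The closed form 2π√(a² + b²) can be corroborated directly from the definition of arclength. This sketch (ours, not the text's) sums the lengths of the chords of a fine inscribed polygon:

```python
# Arclength of one coil of the helix g(t) = (a cos t, a sin t, bt),
# approximated by an inscribed polygon and compared with 2*pi*sqrt(a^2+b^2).
import math

a, b = 2.0, 0.5

def g(t):
    return (a * math.cos(t), a * math.sin(t), b * t)

N = 100000
length = 0.0
prev = g(0.0)
for i in range(1, N + 1):
    cur = g(2 * math.pi * i / N)
    length += math.dist(prev, cur)
    prev = cur

exact = 2 * math.pi * math.sqrt(a * a + b * b)
print(length, exact)   # the polygon length converges to the integral of speed
```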
► EXAMPLE 2
Consider the circle of radius a, parametrized by arclength: g(s) = (a cos(s/a), a sin(s/a)). Then
g′(s) = (−sin(s/a), cos(s/a)) and ‖g′(s)‖ = 1.
Then
If g is arclength-parametrized, then the velocity vector g′(s) is the unit tangent vector
at each point, which we denote by T(s). Let’s assume now that g is twice differentiable.
Since ‖T(s)‖ = 1 for all s, it follows from Proposition 5.2 that T(s) · T′(s) = 0. Define
the curvature of the curve to be κ(s) = ‖T′(s)‖; assuming T′(s) ≠ 0, define the principal
normal vector N(s) = T′(s)/‖T′(s)‖. (See Figure 5.4.)
Figure 5.4
► EXAMPLE 3
If g is a line, then T is constant and κ = 0 (and conversely). If we start with a circle of radius a, then
from Example 2 we have
T(s) = (−sin(s/a), cos(s/a)), so
T′(s) = (1/a)(−cos(s/a), −sin(s/a)).
In particular, we see that N(s) is centripetal (pointing toward the center of the circle) and κ(s) = 1/a
for all s. ◄
Figure 5.5
Note that as long as g′(t) never vanishes, the arclength s is a differentiable function
of t with positive derivative everywhere; thus, it has a differentiable inverse function,
which we write t(s). We can “reparametrize by arclength” by considering the composition
h(s) = g(t(s)), and then, of course, g(t) = h(s(t)). Writing² υ(t) = s′(t) = ‖g′(t)‖ for
the speed, we have by the chain rule
► EXAMPLE 4
Consider the curve
g(t) = (cos³ t, sin³ t), 0 < t < π/2.
Then we have
g′(t) = 3 cos t sin t (−cos t, sin t), so υ(t) = 3 cos t sin t and T(s(t)) = (−cos t, sin t).
Differentiating with respect to t, we obtain
(sin t, cos t) = (T∘s)′(t) = T′(s(t)) s′(t) = κ(s(t)) υ(t) N(s(t)),
²For those who might not know, υ is the Greek letter upsilon, not to be confused with ν, the Greek letter nu.
so N(s(t)) = (sin t, cos t) and
κ(s(t)) υ(t) = 1, whence κ(s(t)) = 1/υ(t) = 1/(3 cos t sin t). ◄
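The answer κ = 1/(3 cos t sin t) can be double-checked against the formula κ = ‖g′ × g″‖/υ³ of Exercise 9, which in the plane reads |x′y″ − y′x″|/υ³. A sketch (ours, not the text's):

```python
# Example 4 checked against kappa = |x'y'' - y'x''| / v^3
# for g(t) = (cos^3 t, sin^3 t), 0 < t < pi/2.
import math

def kappa(t):
    c, s = math.cos(t), math.sin(t)
    x1, y1 = -3 * c * c * s, 3 * s * s * c     # g'(t)
    x2 = 6 * c * s * s - 3 * c ** 3            # x''(t)
    y2 = 6 * s * c * c - 3 * s ** 3            # y''(t)
    v = math.hypot(x1, y1)                     # speed, 3 cos t sin t
    return abs(x1 * y2 - y1 * x2) / v ** 3

t = 0.7
print(kappa(t))
print(1 / (3 * math.cos(t) * math.sin(t)))     # the text's answer: they agree
```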
► EXERCISES 3.5
1. Suppose g: (a, b) -> R" is a differentiable parametrized curve with the property that at each t,
the position and velocity vectors are orthogonal. Prove that g lies on a sphere centered at the origin.
2. Suppose g: (a, b) -> R" is a twice-differentiable parametrized curve. Prove that g has constant
speed if and only if the velocity and acceleration vectors are orthogonal at every t.
3. Suppose f, g: (a, b) → Rⁿ are differentiable and f · g = const. Prove that f′ · g = −g′ · f. Interpret
the result geometrically in the event that f and g are always unit vectors.
4. Suppose a particle moves in a central force field in R3 with constant speed. What can you say
about its trajectory? (Proof?)
5. Suppose g: (a, b) → Rⁿ is nowhere zero and g′(t) = λ(t)g(t) for some scalar function λ. Prove
(rigorously) that g/‖g‖ is constant. (Hint: Set h = g/‖g‖, write g = ‖g‖h, and differentiate.)
6. Suppose g: (a, b) -> Rn is a differentiable parametrized curve and that for some point p e R"
we have ||g(r0) - pll < l|g(t) - Pll for all t e (a, b). Prove that g'(to) • (gfro) - p) = 0. Give a
geometric explanation.
7. Find the arclength of the following parametrized curves.
(a) g(t) = (e^t cos t, e^t sin t)
(b) g(t) = (½(e^t + e^−t), ½(e^t − e^−t))
(c) g(t) = …
(d) g(t) = (a(t − sin t), a(1 − cos t)), 0 ≤ t ≤ 2π
8. Calculate the unit tangent vector and curvature of the following curves.
*(a) g(t) = ((1/√3) cos t + sin t, (1/√3) cos t, (1/√3) cos t − sin t)
*(b) g(t) = …
(c) g(t) = (t, t², t³)
9. Prove that for a parametrized curve g: (a, b) → R³, we have κ = ||g′ × g″||/v³.
10. Using the formula (†) for acceleration, explain how engineers might decide at what angle to bank a road that is a circle of radius 1/4 mile and around which cars wish to drive safely at 40 mph.
11. (Frenet Formulas) Let g: [0, L] → R³ be a three-times differentiable arclength-parametrized curve with κ > 0, and let T and N be defined as above. Define the binormal B = T × N.
(a) Show that ||B|| = 1. Assuming the result of Exercise 1.4.34e, show that every vector in R³ can be expressed as a linear combination of T(s), N(s), and B(s). (Hint: See Example 11 of Chapter 1, Section 4.)
(b) Show that B′ · T = B′ · B = 0, and deduce that B′(s) is a scalar multiple of N(s) for every s. (Hint: See Exercise 3.)
(c) Define the torsion τ of the curve by B′ = −τN. Show that g is a planar curve if and only if τ(s) = 0 for all s.
(d) Show that N′ = −κT + τB.
The equations
T′ = κN,  N′ = −κT + τB,  B′ = −τN
are called the Frenet formulas for the arclength-parametrized curve g.
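Curvature and torsion can be previewed numerically. The sketch below (an illustration, not part of the text) evaluates the formula κ = ||g′ × g″||/v³ from Exercise 9, together with the standard torsion formula τ = (g′ × g″) · g‴ / ||g′ × g″||² (an assumption here; the text defines τ via B′ = −τN), on a helix g(t) = (a cos t, a sin t, bt):

```python
import math

a, b = 2.0, 1.0  # the helix g(t) = (a cos t, a sin t, b t)

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(x*y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

t = 0.3
g1 = (-a*math.sin(t), a*math.cos(t), b)     # g'(t)
g2 = (-a*math.cos(t), -a*math.sin(t), 0.0)  # g''(t)
g3 = (a*math.sin(t), -a*math.cos(t), 0.0)   # g'''(t)

c = cross(g1, g2)
kappa = norm(c) / norm(g1)**3  # formula of Exercise 9
tau = dot(c, g3) / dot(c, c)   # standard torsion formula (assumption)

# both should be constant: kappa = a/(a^2 + b^2), tau = b/(a^2 + b^2)
print(kappa, tau)
```

For a = 2, b = 1 this gives κ = 2/5 and τ = 1/5, independent of t.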
*12. (See Exercise 11 for the definition of torsion.) Calculate the curvature and torsion of the helix presented in Example 1. Explain the meaning of the sign of the torsion.
13. (See Exercise 11 for the definition of torsion.) Calculate the curvature and torsion of the curve
g(t) = (e^t cos t, e^t sin t, …).
14. A pendulum is made, as pictured in Figure 5.6, by hanging from the cusp where two arches of a cycloid meet a length of string equal to the length of one of the arches. As it swings, the string wraps around the cycloid and extends tangentially to the bob at the end. Given the equation
f(t) = (t + sin t, 1 − cos t),  0 ≤ t ≤ 2π,
for the cycloid, find the parametric equation of the bob, P, of the pendulum.³
Figure 5.6
15. Assuming that the force field is inverse square, prove Kepler’s first and third laws, as follows. Without loss of generality, we may assume that the planet has mass 1 and moves in the xy-plane. (You will need to use polar coordinates, as introduced in Example 6 of Chapter 2, Section 1.)
(a) Suppose a, b > 0 and a² = b² + c². Show that the polar coordinates equation of the ellipse
(x − c)²/a² + y²/b² = 1  is  r(1 − (c/a) cos θ) = b²/a.
This is an ellipse with semimajor axis a and semiminor axis b, with one focus at the origin. (Hint: Expand the left-hand side in polar coordinates and express the result as a difference of squares.)
³This phenomenon was originally discovered by the Dutch mathematician Huygens in an effort to design a pendulum whose period would not depend on the amplitude of its motion, hence one ideal for an accurate clock.
(b) Let r(t) and θ(t) be the polar coordinates of g(t), and let
e_r(t) = (cos θ(t), sin θ(t))  and  e_θ(t) = (−sin θ(t), cos θ(t)),
as pictured in Figure 5.7. Show that
g′(t) = r′(t)e_r(t) + r(t)θ′(t)e_θ(t),
g″(t) = (r″(t) − r(t)θ′(t)²)e_r(t) + (2r′(t)θ′(t) + r(t)θ″(t))e_θ(t).
(c) Let A₀ be as in the proof of Proposition 5.1. Show that g″(t) × A₀ = GMθ′(t)e_θ(t) = GM e_r′(t), and deduce that g′(t) × A₀ = GM(e_r(t) + c) for some constant vector c.
(d) Dot the previous equation with g(t) and use the fact that g(t) × g′(t) = A₀ to deduce that GMr(t)(1 − ||c|| cos θ(t)) = ||A₀||² if we assume c is a negative scalar multiple of e₁. Deduce that when ||c|| ≥ 1 the path of the planet is unbounded and that when ||c|| < 1 the orbit of the planet is an ellipse with one focus at the origin.
(e) As we shall see in Chapter 7, the area of an ellipse with semimajor axis a and semiminor axis b is πab; show that the period T = 2πab/||A₀||. Now prove that T² = (4π²/GM)a³.
Figure 5.7
16. (Pilfered from Which Way did the Bicycle Go ...and Other Intriguing Mathematical Mysteries,
published by the M.A.A. Copyright The Mathematical Association of America, Washington, DC,
1996. All rights reserved.)
“This track, as you perceive, was made by a rider who was going from the direction
of the school.”
“Or towards it?”
“No, no, my dear Watson.... It was undoubtedly heading away from the school.”
So spoke Sherlock Holmes.⁴ Imagine a 20-foot-wide mud patch through which a bicycle has just passed, with its front and rear tires leaving tracks as illustrated in Figure 5.8. (We have taken the
liberty of helping you in your capacity as sleuth by using dashes for the path of one of the wheels.)
In which direction was the bicyclist traveling? Explain your answer.
∂ᵏf/(∂x_{i_k}∂x_{i_{k−1}} ⋯ ∂x_{i_2}∂x_{i_1}),  1 ≤ i₁, i₂, …, i_k ≤ n,
exist and are continuous (on U). We say f is C^∞ (or smooth) if all its partial derivatives of all orders exist.
► EXAMPLE 1
Let f(x, y, z) = e^{xy} sin z + xy³z⁴. Then, for example,
∂f/∂x = y e^{xy} sin z + y³z⁴,
∂²f/∂z∂x = y e^{xy} cos z + 4y³z³,
∂³f/∂z²∂x = −y e^{xy} sin z + 12y³z², and
∂³f/∂y∂z∂x = e^{xy}(xy + 1) cos z + 12y²z³. ◄
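Equalities of mixed partials can be sanity-checked with nested central differences; the following sketch (illustrative only, with ad hoc step sizes) compares ∂²f/∂x∂y and ∂²f/∂y∂x for f(x, y, z) = e^{xy} sin z + xy³z⁴, the formula reconstructed from the partial derivatives listed in Example 1:

```python
import math

def f(x, y, z):
    # reconstructed from the partial derivatives in Example 1 (assumption)
    return math.exp(x*y) * math.sin(z) + x * y**3 * z**4

h = 1e-4
p = (0.3, 0.5, 0.7)

def partial(fn, i, q, h=h):
    # central difference in the i-th variable at the point q
    q1, q2 = list(q), list(q)
    q1[i] -= h
    q2[i] += h
    return (fn(*q2) - fn(*q1)) / (2*h)

# second mixed partials d2f/dxdy and d2f/dydx via nested central differences
fxy = partial(lambda *q: partial(f, 0, q), 1, p)
fyx = partial(lambda *q: partial(f, 1, q), 0, p)
print(abs(fxy - fyx) < 1e-4)  # Theorem 6.1 in action
```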
As the example suggests, the order in which we calculate the partial derivatives does not matter. This is an intuitively obvious result, but the proof is quite subtle.
Theorem 6.1 Let U ⊂ Rⁿ be open, and suppose f: U → Rᵐ is a C² function. Then for any i and j we have
∂²f/∂x_i∂x_j = ∂²f/∂x_j∂x_i.
Proof It suffices to prove the result when m = 1. For ease of notation, we take n = 2, i = 1, and j = 2. Fix a point (a, b) ∈ U and introduce the function
S(h, k) = f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b),
as indicated schematically in Figure 6.1. Letting q(s) = f(s, b + k) − f(s, b), by the Mean Value Theorem we have
S(h, k) = q(a + h) − q(a) = q′(ξ)h = (∂f/∂x(ξ, b + k) − ∂f/∂x(ξ, b))h = ∂²f/∂y∂x(ξ, η)hk
for some ξ between a and a + h and some η between b and b + k. On the other hand, letting r(t) = f(a + h, t) − f(a, t), we have
S(h, k) = r(b + k) − r(b) = r′(β)k = (∂f/∂y(a + h, β) − ∂f/∂y(a, β))k = ∂²f/∂x∂y(α, β)hk
for some α between a and a + h and some β between b and b + k.
Therefore, we have
(1/hk) S(h, k) = ∂²f/∂y∂x(ξ, η) = ∂²f/∂x∂y(α, β).
122 ► Chapter 3. The Derivative
Figure 6.1
Now ξ, α → a and η, β → b as h, k → 0, and since the functions ∂²f/∂x∂y and ∂²f/∂y∂x are continuous, we have
∂²f/∂x∂y(a, b) = ∂²f/∂y∂x(a, b),
as required. ■
► EXAMPLE 2
(Harmonic Functions) If f is a C² function on (an open subset of) Rⁿ, the expression
∇²f = ∂²f/∂x₁² + ∂²f/∂x₂² + ⋯ + ∂²f/∂xₙ²
is called the Laplacian of f. A solution of the equation ∇²f = 0 is called a harmonic function. As
we shall see in Chapter 8, the Laplacian and harmonic functions play an important role in physical
applications. For example, the gravitational (resp., electrostatic) potential is a harmonic function in
mass-free (resp., charge-free) space.
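Harmonicity is easy to probe numerically; the sketch below (illustrative, not from the text) applies a finite-difference Laplacian to e^x cos y, a classic harmonic function, and to x² + y², which is not harmonic:

```python
import math

def laplacian(f, x, y, h=1e-4):
    # d2f/dx2 + d2f/dy2 by second central differences
    return ((f(x + h, y) - 2*f(x, y) + f(x - h, y))
            + (f(x, y + h) - 2*f(x, y) + f(x, y - h))) / h**2

harmonic = lambda x, y: math.exp(x) * math.cos(y)  # f_xx = -f_yy, so harmonic
quadratic = lambda x, y: x**2 + y**2               # Laplacian is 4, not 0

val_harmonic = laplacian(harmonic, 0.3, 0.8)
val_quadratic = laplacian(quadratic, 0.3, 0.8)
print(abs(val_harmonic) < 1e-4, abs(val_quadratic - 4) < 1e-4)
```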
► EXAMPLE 3
The partial differential equation
(*)  ∂²f/∂t² = c² ∂²f/∂x²
models the displacement of a one-dimensional vibrating string (with “wave velocity” c) from its equilibrium position. By a clever use of the chain rule, we can find an explicit formula for its general
6 Higher-Order Partial Derivatives ◄ 123
solution. Let x = (u + v)/2 and t = (u − v)/(2c) (so that u = x + ct and v = x − ct), and set F(u, v) = f(x, t). Then by the chain rule, we have
∂F/∂v = (∂f/∂x)(∂x/∂v) + (∂f/∂t)(∂t/∂v) = ½ ∂f/∂x − (1/2c) ∂f/∂t.
Now, differentiating with respect to u, we have to apply the chain rule to each of the functions ∂f/∂x and ∂f/∂t:
∂²F/∂u∂v = ½(½ ∂²f/∂x² + (1/2c) ∂²f/∂t∂x) − (1/2c)(½ ∂²f/∂x∂t + (1/2c) ∂²f/∂t²)
= ¼(∂²f/∂x² − (1/c²) ∂²f/∂t²) + (1/4c)(∂²f/∂t∂x − ∂²f/∂x∂t)
= (1/4c)(∂²f/∂t∂x − ∂²f/∂x∂t)   (using (*))
= 0,
where at the last step we use Theorem 6.1. Now what can we say about the general solution of the
equation ∂²F/∂u∂v = 0? On any rectangle in the uv-plane, we can infer that
F(u, v) = Φ(u) + Ψ(v)
for some differentiable functions Φ and Ψ. (For ∂/∂u(∂F/∂v) = 0 tells us that ∂F/∂v is independent of u, hence a function of v only, whose antiderivative we call Ψ(v). But the constant of integration can be an arbitrary function of u. To examine this argument a bit more carefully, we recommend that the reader consider Exercise 11.)
In conclusion, on a suitable domain, the general solution of the wave equation (*) can be written in the form
f(x, t) = φ(x − ct) + ψ(x + ct)
for arbitrary C² functions φ and ψ. The physical interpretation is this: The general solution is the superposition of two traveling waves, one moving to the right along the string with speed c, the other moving to the left with speed c. ◄
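A quick numerical check of this conclusion: picking arbitrary C² functions φ and ψ (the choices below are illustrative), the superposition φ(x − ct) + ψ(x + ct) satisfies (*) up to finite-difference error:

```python
import math

c = 2.0  # wave speed

def f(x, t):
    # a superposition of two traveling waves (sample phi and psi)
    phi = lambda u: u**2          # right-moving wave phi(x - ct)
    psi = lambda v: math.sin(v)   # left-moving wave psi(x + ct)
    return phi(x - c*t) + psi(x + c*t)

h = 1e-4
x, t = 0.4, 0.9
# second central differences for f_tt and f_xx
ftt = (f(x, t + h) - 2*f(x, t) + f(x, t - h)) / h**2
fxx = (f(x + h, t) - 2*f(x, t) + f(x - h, t)) / h**2
print(abs(ftt - c**2 * fxx) < 1e-4)  # the wave equation (*) holds
```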
► EXAMPLE 4
(Minimal Surfaces) When you dip a piece of wire shaped in the form of a closed curve C into soapy water, the resulting soap film is called a minimal surface, so called because in principle surface tension dictates that the surface should have least area among all those surfaces having that curve as boundary. If the surface is the graph of a function f, one proves in a differential geometry course that f must be a solution of the minimal surface equation
(1 + (∂f/∂y)²) ∂²f/∂x² − 2 (∂f/∂x)(∂f/∂y) ∂²f/∂x∂y + (1 + (∂f/∂x)²) ∂²f/∂y² = 0.
Examples of minimal surfaces include:
a. a plane;
b. a helicoid—the spiral surface obtained by joining points of a helix “horizontally” to its vertical axis, as pictured in Figure 6.2(a);
c. a catenoid—the surface of revolution obtained by rotating a catenary y = (1/2c)(e^{cx} + e^{−cx}) (for any c > 0) about the x-axis, as pictured in Figure 6.2(b). ◄
Figure 6.2
► EXERCISES 3.6
1. Define f: R² → R by
f(x, y) = xy(x² − y²)/(x² + y²) for (x, y) ≠ 0,  f(0) = 0.
(b) Deduce that ∂²f/∂x∂y(0) = 1 but ∂²f/∂y∂x(0) = −1.
(c) Conclude that f is not C² at 0.
3. Check that the following functions are solutions of the one-dimensional wave equation given in
Example 3.
6. Suppose f: R² → R and g: R² → R² are C², and let F = f ∘ g. Writing g(u, v) = (g₁(u, v), g₂(u, v)), show that …
7. Suppose f: R² → R is C². Let F(r, θ) = f(r cos θ, r sin θ). Show that
∂²F/∂r² + (1/r) ∂F/∂r + (1/r²) ∂²F/∂θ² = ∂²f/∂x² + ∂²f/∂y²,
where the left-hand side is evaluated at (r, θ) and the right-hand side is evaluated at (r cos θ, r sin θ). (This is the formula for the Laplacian in polar coordinates.)
8. Use the result of Exercise 7 to show that for any integer n, the functions F(r, θ) = rⁿ cos nθ and G(r, θ) = rⁿ sin nθ are harmonic.
*9. Use the result of Exercise 7 to find all radially symmetric harmonic functions on the plane. (This means that F is independent of θ, so we can call it h(r).)
10. Check that the following functions f: R² → R are indeed solutions of the minimal surface equation given in Example 4.
(c) … (For this one, a computer algebra system is recommended.)
In this chapter we will see how to go back and forth between these two approaches. The
central tool is Gaussian elimination, with which we deal in depth in the first two sections.
We then come to the central notion of dimension and some useful applications. In the
last section, we will begin to investigate to what extent we can relate implicit and explicit
descriptions in the nonlinear setting.
128 ► Chapter 4. Implicit and Explicit Solutions of Linear Systems
Geometrically, a solution of the system Ax = b is a vector x having the requisite dot products with the row vectors A₁, …, A_m of the matrix A:
A_i · x = b_i,  i = 1, …, m.
That is, the system of equations describes the intersection of the m hyperplanes with normal vectors A_i and at (signed) distance b_i/||A_i|| from the origin.
To solve a system of linear equations, we want to give an explicit parametric description
of the general solution. Some systems are relatively simple to solve. For example, taking
the system
x1 − x3 = 1
x2 + 2x3 = 2,
we see that these equations allow us to determine x1 and x2 in terms of x3; in particular, we can write x1 = 1 + x3 and x2 = 2 − 2x3, where x3 is free to take on any real value. Thus, any solution of this system is of the form
x = (1 + t, 2 − 2t, t)
for some t ∈ R. (It is easily checked that every vector of this form is in fact a solution, as (1 + t) − t = 1 and (2 − 2t) + 2t = 2 for every t ∈ R.) Thus, we see that the intersection of the two given planes is the line in R³ passing through (1, 2, 0) with direction vector (1, −2, 1).
More complicated systems of equations require some algebraic manipulations before
we can easily read off the general solution in parametric form. There are three basic
operations we can perform on systems of equations that will not affect the solution set.
They are the following elementary operations:
► EXAMPLE 1
Consider the system of equations
3x1 − 2x2 + 9x4 = 4
2x1 + 2x2 − 4x4 = 6.
First we use operation (i), interchanging the two equations; then we use operation (ii), multiplying the (new) first equation by 1/2, to get
x1 + x2 − 2x4 = 3
3x1 − 2x2 + 9x4 = 4;
now we use operation (iii), adding −3 times the first equation to the second:
x1 + x2 − 2x4 = 3
    − 5x2 + 15x4 = −5.
Next we use operation (ii) again, multiplying the second equation by −1/5, to obtain
x1 + x2 − 2x4 = 3
      x2 − 3x4 = 1;
finally, we use operation (iii), adding −1 times the second equation to the first:
x1      + x4 = 2
      x2 − 3x4 = 1.
From this we see that x1 and x2 are determined by x4, whereas x3 and x4 are free to take on any values. Thus, we read off the general solution of the system of equations:
x1 = 2 − x4
x2 = 1 + 3x4
x3 = x3
x4 = x4. ◄
We now describe a systematic technique, using the three allowable elementary oper
ations, for solving systems of m equations in n variables. Before going any further, we
should make the official observation that performing elementary operations on a system of
equations does not change its solutions.
Given the system Ax = b, we form the augmented matrix
[A | b] =
[ a11 … a1n | b1 ]
[  ⋮        ⋮  |  ⋮ ]
[ am1 … amn | bm ].
Notice that the augmented matrix contains all of the information of the original system of
equations since we can recover the latter by filling in the x_i’s, +’s, and =’s as needed.
The elementary operations on a system of equations become operations on the rows of
the augmented matrix; in this setting, we refer to them as elementary row operations of the
corresponding three types:
Since we have established that elementary operations do not affect the solution set of a
system of equations, we can freely perform elementary row operations on the augmented
matrix of a system of equations with the goal of finding an “equivalent” augmented matrix
from which we can easily read off the general solution.
► EXAMPLE 2
Consider again the system of equations
3x1 − 2x2 + 9x4 = 4
2x1 + 2x2 − 4x4 = 6,
whose augmented matrix is
[ 3 -2 0  9 | 4 ]
[ 2  2 0 -4 | 6 ].
We denote the process of performing row operations by the symbol ⇝ and (in this example) we indicate above it the type of operation we are performing:

[ 3 -2 0  9 | 4 ]  (i)   [ 2  2 0 -4 | 6 ]  (ii)   [ 1  1 0 -2 | 3 ]
[ 2  2 0 -4 | 6 ]   ⇝    [ 3 -2 0  9 | 4 ]   ⇝     [ 3 -2 0  9 | 4 ]

 (iii)  [ 1  1 0 -2 |  3 ]  (ii)   [ 1 1 0 -2 | 3 ]  (iii)  [ 1 0 0  1 | 2 ]
   ⇝    [ 0 -5 0 15 | -5 ]   ⇝     [ 0 1 0 -3 | 1 ]    ⇝    [ 0 1 0 -3 | 1 ]
From the final augmented matrix we are able to recover the simpler form of the equations,
x1      + x4 = 2
      x2 − 3x4 = 1,
Definition We call the first nonzero entry of a row (reading left to right) its leading entry. A matrix is in echelon¹ form if
We call the leading entry of a certain row of a matrix a pivot if there is no leading entry
above it in the same column. When a matrix is in echelon form, we refer to the columns in
which a pivot appears as pivot columns and to the corresponding variables (in the original
system of equations) as pivot variables. The remaining variables are called free variables.
1 2 0 -1 1 12 11 3 12 0 3
2J ’
_0 0 1 2 ,0012 2_ 1 0-1 2_
are, respectively, in reduced echelon form, in echelon form, and in neither. The key point is this: When the matrix is in reduced echelon form, we are able to determine the general solution by expressing each of the pivot variables in terms of the free variables.
► EXAMPLE 3
Consider the system of equations with augmented matrix
[ 1 2 0 0  4 | 1 ]
[ 0 0 1 0 -2 | 2 ]
[ 0 0 0 1  1 | 1 ],
which is in reduced echelon form. The corresponding equations are
x1 + 2x2      + 4x5 = 1
          x3 − 2x5 = 2
              x4 + x5 = 1.
¹The word echelon derives from the French échelle, “ladder.” Although we don’t usually draw the rungs of the ladder, they are there.
Condition (2) is actually a consequence of (1), but we state it anyway for clarity.
Notice that the pivot variables, x1, x3, and x4, are completely determined by the free variables x2 and x5. As usual, we can write the general solution in terms of the free variables only:
x = (x1, x2, x3, x4, x5) = (1 − 2x2 − 4x5, x2, 2 + 2x5, 1 − x5, x5)
  = (1, 0, 2, 1, 0) + x2(−2, 1, 0, 0, 0) + x5(−4, 0, 2, −1, 1).
In this last example, we see that the general solution is the sum of a particular solution—
obtained by setting all the free variables equal to 0—and a linear combination of vectors, one
for each free variable—obtained by setting that free variable equal to 1 and the remaining
free variables equal to 0 and ignoring the particular solution. In other words, if Xk is a
free variable, the corresponding vector in the general solution has kth coordinate equal to 1 and jth coordinate equal to 0 for all the other free variables x_j. Concentrate on the circled
entries in the vectors from Example 3:
We refer to this as the standard form of the general solution. The general solution of any
system in reduced echelon form can be presented in this manner.
Our strategy now is to transform the augmented matrix of any system of linear equations
into echelon form by performing a sequence of elementary row operations. The algorithm
goes by the name of Gaussian elimination.
The first step is to identify the first column (starting at the left) that does not consist
only of 0’s; usually this is the first column, but it may not be. Pick a row whose entry in
this column is nonzero—usually the uppermost such row, but you may choose another if it
helps with the arithmetic—and interchange this with the first row; now the first entry of the
first nonzero column is nonzero. This will be our first pivot. Next, we add the appropriate
multiple of the top row to all the remaining rows to make all the entries below the pivot
equal to 0. To consider two examples, if we begin with the matrices
then we begin by switching the first and third rows of A and the first and second rows of B
(to avoid fractions). After clearing out the first pivot column we have
2 (?) 2 * 4~
1 and 0 0 5
0 -4 -4 4 0-1 5
1 Gaussian Elimination and the Theory of Linear Systems 133
We have circled the pivots for emphasis. (If we are headed for the reduced echelon form,
we might replace the first row of A' by [ 1 1 2 1 ].)
The next step is to find the first column (again, starting at the left) in the new matrix
having a nonzero entry below the first row. Pick a row below the first that has a nonzero
entry in this column, and, if necessary, interchange it with the second row. Now the second
entry of this column is nonzero; this is our second pivot. (Once again, if we’re calculating
the reduced echelon form, we multiply by the reciprocal of this entry to make the pivot 1.)
We then add appropriate multiples of the second row to the rows beneath it to make all the
entries beneath the pivot equal to 0. Continuing with our examples, we obtain
At this point, both A″ and B″ are in echelon form; note that the zero row of A″ is at the bottom, and that the pivots move toward the right and down.
The process continues until we can find no more pivots—either because we have a
pivot in each row or because we’re left with nothing but rows of zeroes. At this stage, if
we are interested in finding the reduced echelon form, we clear out the entries in the pivot
columns above the pivots and then make all the pivots equal to 1. (Two words of advice
here: If we start at the right and work our way up and to the left, we in general minimize the
amount of arithmetic that must be done. Also, we always do our best to avoid fractions.)
Continuing with our examples, we find the reduced echelon forms of A and B, respectively:
We must be careful from now on to distinguish between the symbols “=” and “⇝”: when we convert one matrix to another by performing one or more row operations, we do not have equal matrices.
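The algorithm described above can be sketched in code; the version below (an illustration, using exact rational arithmetic to sidestep the fractions the text advises avoiding by hand) computes the reduced echelon form and reproduces the final matrix of Example 2:

```python
from fractions import Fraction

def rref(rows):
    """Reduce a matrix (a list of rows) to reduced echelon form."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):
        # find a row at or below pivot_row with a nonzero entry in this column
        pr = next((r for r in range(pivot_row, m) if A[r][col] != 0), None)
        if pr is None:
            continue
        A[pivot_row], A[pr] = A[pr], A[pivot_row]        # operation (i)
        piv = A[pivot_row][col]
        A[pivot_row] = [x / piv for x in A[pivot_row]]   # operation (ii)
        for r in range(m):
            if r != pivot_row and A[r][col] != 0:        # operation (iii)
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return A

# the augmented matrix of Example 2
R = rref([[3, -2, 0, 9, 4], [2, 2, 0, -4, 6]])
print(R == [[1, 0, 0, 1, 2], [0, 1, 0, -3, 1]])
```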
Here is one last example:
► EXAMPLE 4
Give the general solution of the following system of linear equations:
x1 + x2 + 3x3 − x4        = 0
−x1 + x2 + x3 + x4 + 2x5 = −4
      x2 + 2x3 + 2x4 − x5 = 0
2x1 − x2      + x4 − 6x5 = 9.
We begin with the augmented matrix of coefficients and put it in reduced echelon form:
[  1  1 3 -1  0 |  0 ]     [ 1  1  3 -1  0 |  0 ]
[ -1  1 1  1  2 | -4 ]  ⇝  [ 0  2  4  0  2 | -4 ]
[  0  1 2  2 -1 |  0 ]     [ 0  1  2  2 -1 |  0 ]
[  2 -1 0  1 -6 |  9 ]     [ 0 -3 -6  3 -6 |  9 ]

    [ 1 0 1 0 -2 |  3 ]
 ⇝  [ 0 1 2 0  1 | -2 ]
    [ 0 0 0 1 -1 |  1 ]
    [ 0 0 0 0  0 |  0 ].
From this we read off the equations
x1      + x3      − 2x5 = 3
      x2 + 2x3      + x5 = −2
                x4 − x5 = 1,
and hence the general solution
x1 = 3 − x3 + 2x5
x2 = −2 − 2x3 − x5
x3 = x3
x4 = 1 + x5
x5 = x5. ◄
When we reduce a matrix to echelon form, we must make a number of choices along
the way, and the echelon form may well depend on the choices. But we shall now prove
(using an inductive argument) that any two echelon forms of the same matrix must have
pivots in the same columns, and from this it will follow that the reduced echelon form must
be unique.
Theorem 1.2 Suppose A and B are echelon forms of the same nonzero matrix M.
Then all of their pivots appear in the same positions. As a consequence, if they are in
reduced echelon form, then they are equal.
1.1 Consistency
where a1, …, an ∈ Rᵐ are the column vectors of the matrix A. Thus, a solution c = (c1, …, cn) gives a representation of the vector b as a linear combination, c1a1 + ⋯ + cnan, of the column vectors of A.
► EXAMPLE 5
Suppose we want to express the vector b = (4, 3, 1, 2) as a linear combination of the vectors v1 = (1, 0, 1, 2), v2 = (1, 1, 1, 1), and v3 = (2, 1, 1, 2). Writing out the expression
x1v1 + x2v2 + x3v3 = b
entry by entry leads to the system of equations
x1 + x2 + 2x3 = 4
      x2 + x3 = 3
x1 + x2 + x3 = 1
2x1 + x2 + 2x3 = 2,
whose coefficient matrix is
A =
[ 1 1 2 ]
[ 0 1 1 ]
[ 1 1 1 ]
[ 2 1 2 ].
We reduce the augmented matrix [A | b] to reduced echelon form and find
x = (−2, 0, 3),  so  b = −2v1 + 0v2 + 3v3. ◄
► EXAMPLE 6
Now suppose we wish to express the vector b = (1, 1, 0, 1) as a linear combination of the same vectors v1, v2, and v3. This then leads analogously to the system of equations
x1 + x2 + 2x3 = 1
      x2 + x3 = 1
x1 + x2 + x3 = 0
2x1 + x2 + 2x3 = 1,
with augmented matrix
[ 1 1 2 | 1 ]
[ 0 1 1 | 1 ]
[ 1 1 1 | 0 ]
[ 2 1 2 | 1 ].
This reduces to the echelon form
[ 1 1 2 | 1 ]
[ 0 1 1 | 1 ]
[ 0 0 1 | 1 ]
[ 0 0 0 | 1 ],
A system of equations is consistent precisely when a solution exists. We see that the system
of equations in Example 6 is inconsistent and the system of equations in Example 5 is
consistent. It is easy to recognize an inconsistent system of equations from the echelon
form of its augmented matrix: The system is inconsistent only when there is an equation
that reads
0 = c
for some nonzero scalar c, i.e., when there is a row in the echelon form of the augmented
matrix where all but the rightmost entry are 0.
Turning this around a bit, let [U | c] denote the echelon form of the augmented matrix [A | b]. The system Ax = b is consistent if and only if any zero row in U corresponds to a zero entry in the vector c.
There are two geometric interpretations of consistency. From the standpoint of row
vectors, the system Ax = b is consistent precisely when the intersection of the hyperplanes
A1 · x = b1,  …,  Am · x = bm
is nonempty. From the point of view of column vectors, the system Ax = b is consistent
precisely when the vector b can be written as a linear combination of the column vectors
ai,..., an of A.
In the next example, we characterize those vectors b e R4 that can be expressed as a
linear combination of the three vectors Vi, V2, and V3 from Examples 5 and 6.
► EXAMPLE 7
For which vectors b = (b1, b2, b3, b4) does the system of equations
x1 + x2 + 2x3 = b1
      x2 + x3 = b2
x1 + x2 + x3 = b3
2x1 + x2 + 2x3 = b4
have a solution? We form the augmented matrix [A | b] and determine its echelon form:
[ 1 1 2 | b1 ]     [ 1 1 2 | b1 ]
[ 0 1 1 | b2 ]  ⇝  [ 0 1 1 | b2 ]
[ 1 1 1 | b3 ]     [ 0 0 1 | b1 − b3 ]
[ 2 1 2 | b4 ]     [ 0 0 0 | −b1 + b2 − b3 + b4 ].
We infer from the last row of the latter matrix that the original system of equations will have a solution if and only if
(†)  b1 − b2 + b3 − b4 = 0.
That is, the vector b can be written as a linear combination of v1, v2, and v3 precisely when b satisfies the constraint equation (†). ◄
► EXAMPLE 8
Given
A =
[ 1 -1  1 ]
[ 3  2 -1 ]
[ 1  4 -3 ]
[ 3 -3  3 ],
we wish to find all vectors b e R4 so that Ax = b is consistent, i.e., all vectors b that can be expressed
as a linear combination of the columns of A.
We consider the augmented matrix [A | b] and determine its echelon form [U | c]. In order for
the system to be consistent, every entry of c corresponding to a row of zeroes in U must be 0 as well:
"1 -1 1 bi _1 -1 1 bi
3 2 -1 bz 0 5 -4 b2 — 3&i
[A | b] =
1 4 -3 bi 0 5 -4 bi — bi
.3 3 -3 &4_ -0 0 0 bi — 3Z>i _
"1 -1 1 bi
0 5 -4 b2 - 3bi
0 0 0 bi — b2 + 2bi
.0 0 0 bi — 3bi
Thus, we conclude that Ax = b is consistent if and only if b satisfies the constraint equations
2b1 − b2 + b3 = 0  and  −3b1 + b4 = 0.
These equations describe the intersection of two hyperplanes through the origin in R⁴ with respective normal vectors (2, −1, 1, 0) and (−3, 0, 0, 1). ◄
Notice that here we have reversed the process at the beginning of this section. There
we expressed the general solution of a system of linear equations as a linear combination
of certain vectors. Here, starting with the column vectors of the matrix A, we have found
the constraint equations a vector b must satisfy in order to be a linear combination of them
(that is, to be in the plane they span). This is the process of determining Cartesian equations
for a space defined parametrically.
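The constraint equations of Example 8 can be spot-checked directly: every vector of the form b = Ax must satisfy them exactly. A small sketch (illustrative only):

```python
# the matrix A from Example 8
A = [[1, -1, 1], [3, 2, -1], [1, 4, -3], [3, -3, 3]]

def Ax(x):
    # the matrix-vector product A x
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# every b = Ax must satisfy 2b1 - b2 + b3 = 0 and -3b1 + b4 = 0
checks = []
for x in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -5, 7)]:
    b = Ax(x)
    checks.append((2*b[0] - b[1] + b[2], -3*b[0] + b[3]))
print(all(c == (0, 0) for c in checks))
```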
Definition The rank of a matrix is the number of nonzero rows (i.e., the number of
pivots) in its echelon form. It is usually denoted by r.
Then the number of rows of zeroes in the echelon form is m — r, and b must satisfy m - r
constraint equations. We recall that even though a matrix may have lots of different echelon
forms, it follows from Theorem 1.2 that they all must have the same number of nonzero
rows.
Given a system of m linear equations in n variables, let A denote its coefficient matrix
and r the rank of A. Let’s now summarize the state of our knowledge:
Proposition 1.3 The linear system Ax = b is consistent if and only if the rank of the augmented matrix [A | b] equals the rank of A. In particular, if r = m, then the system Ax = b will be consistent for all vectors b ∈ Rᵐ.
Proof Ax = b is consistent if and only if the rank of the augmented matrix [A | b],
which is the number of nonzero rows in the augmented matrix [U | c], equals the number
of nonzero rows in U, i.e., the rank of A. When r = m, there is no row of zeroes in U,
hence no possibility of inconsistency. ■
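Proposition 1.3 lends itself to a direct computational test; the sketch below (illustrative only; the right-hand sides are those reconstructed for Examples 5 and 6) computes ranks by forward elimination and compares rank [A | b] with rank A:

```python
from fractions import Fraction

def rank(rows):
    # the number of pivots in an echelon form, found by forward elimination
    A = [[Fraction(x) for x in row] for row in rows]
    m, n, r = len(A), len(A[0]), 0
    for col in range(n):
        piv = next((i for i in range(r, m) if A[i][col] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, m):
            factor = A[i][col] / A[r][col]
            A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

A = [[1, 1, 2], [0, 1, 1], [1, 1, 1], [2, 1, 2]]
b_consistent = [4, 3, 1, 2]    # as in Example 5
b_inconsistent = [1, 1, 0, 1]  # as in Example 6

results = []
for b in (b_consistent, b_inconsistent):
    aug = [row + [bi] for row, bi in zip(A, b)]
    results.append(rank(aug) == rank(A))
print(results)  # consistent exactly when the two ranks agree
```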
We now turn our attention to the question of how many solutions a given consistent
system of equations has. Our experience with solving systems of equations suggests that
the solutions of a consistent linear system Ax = b are intimately related to the solutions of
the system Ax = 0.
The solutions of the inhomogeneous system Ax = b and those of the associated homogeneous system Ax = 0 are related by the following observation: if u1 is a particular solution of Ax = b and v is any solution of Ax = 0, then u = u1 + v is again a solution of Ax = b, since
Au = A(u1 + v) = Au1 + Av = b + 0 = b.
Conversely, any two solutions of Ax = b differ by a solution of Ax = 0.
Figure 1.1
matrix, so it follows that whenever n > m (i.e., there are more variables than equations),
the homogeneous system Ax = 0 must have infinitely many solutions.
From Proposition 1.4 we know that if the inhomogeneous system Ax = b is consistent,
then its solutions are obtained by translating the solutions of the associated homogeneous
system Ax = 0 by a particular solution. So we have
We conclude this discussion with an important special case. It is natural to ask when
the inhomogeneous system Ax = b has a unique solution for every b e Rm. From Propo
sition 1.3 we infer that for the system always to be consistent, we must have r = m; from
Proposition 1.5 we infer that for solutions to be unique, we must have r = n. And so we
see that we can only have both conditions when r = m = n.
Proposition 1.6 Let A be an n × n matrix. The following are equivalent:
1. A is nonsingular.
2. Ax = 0 has only the trivial solution.
3. For every b ∈ Rⁿ, the equation Ax = b has a unique solution.
► EXERCISES 4.1
2 1 3
(b)
0 1 -1
3. For each of the following matrices A, determine its reduced echelon form and give the general
solution of Ax = 0 in standard form.
1 0 -1' r i 2 0 -1 -1'
(a) A = -2 3 -1 -1 -3 1 2 3
*(f) A —
3 -3 0_ 1 -1 3 1 1
2 -2 4' 2 -3 7 3 4_
*(b) A = -1 1 -2 1 -1 1 1 0“
3 -3 6_ 1 0 2 1 1
(g) A =
1 2 -1" 0 2 2 2 0
1 3 1 _-l 1 -1 0 -1.
(c) A =
2 4 3 1 1 0 5 0 -1 “
_-l 1 6. 0 1 1 3 -2 0
(h) A =
1 -2 1 0 -1 2 3 4 1 -6
(d) A = 0 4 4 12 -1 -7 _
2 -4 3 --1
"1 1 1 1"
1 2 1 2
*(e) A =
1 3 2 4
_1 2 2 3_
1 1
(b) A—
3 3
*5. Find all the unit vectors x ∈ R³ that make an angle of π/3 with the vectors (1, 0, −1) and (0, 1, 1).
*7. A circle C passes through the points (2, 6), (−1, 7), and (−4, −2). Find the center and radius of C. (Hint: The equation of a circle can be written in the form x² + y² + ax + by + c = 0. Why?)
*8. By solving a system of equations, find the linear combination of the vectors … that gives b = (3, 0, −2).
*9. For each of the following vectors b e R4, decide whether b is a linear combination of
10. Decide whether each of the following collections of vectors spans R3.
"1 1 1 1 3 2
(a) 1 2 (c) 0 9 -1 9 5 3
1 2 1 1 3 2
1 1 1 1 2 0
(b) 1 2 3 (d) 0 9 1 9 1
1 2 3 _ -1 _ 1_ _5 _
11. Find the constraint equations that b must satisfy in order for Ax = b to be consistent.
(a)
12. Find the constraint equations that b must satisfy in order to be an element of
(a)
13. Find a matrix A with the given property or explain why none can exist.
1
*(a) One of the rows of A is 0 and for some b e R2 both the vectors
1
of the equation Ax = b;
"0" r °"
1 , 0
(b) the rows of A are linear combinations of and and for some b e R2 both the vectors
0 1
“1' "4“
2 1
_ 1 _ L1_
and are solutions of the equation Ax = b;
1 0
_2_ _3 _
1
0
(c) the rows of A are orthogonal to and for some nonzero vector b e R2 both the vectors
1
"1" " 1"
0
0 1
and are solutions of the equation Ax = b;
1 1
_0_ _ 1_
"1 " ' 2'
(d) for some vectors bi, ba e R2 both the vectors 0 and 1 are solutions of the equation
" 1" " 1" _1_ _1_
Ax = bi and both the vectors 0 and 1 are solutions of the equation Ax = ba.
0 1
a 3a
(a) For which numbers a will A be singular?
(b) For all numbers a not on your list in part a, we can solve Ax = b for every vector b £ R2. For
each of the numbers a on your list, give the vectors b for which we can solve Ax = b.
15. Let A = a 2 a
a a 1
(a) For which numbers a will A be singular?
(b) For all numbers a not on your list in part a, we can solve Ax = b for every vector b e R3. For
each of the numbers a on your list, give the vectors b for which we can solve Ax = b.
16. Prove or give a counterexample:
(a) If Ax = 0 has only the trivial solution x = 0, then Ax = b always has a unique solution.
(b) If Ax = 0 and Bx = 0 have the same solutions, then the set of vectors b so that Ax = b is consistent is the same as the set of vectors b so that Bx = b is consistent.
17. (a) Suppose A and B are nonsingular n × n matrices. Prove that AB is nonsingular. (Hint: Solve (AB)x = 0.)
(b) Suppose A and B are n × n matrices. Prove that if either A or B is singular, then AB is singular.
18. In each case, give positive integers m and n and an example of an m x n matrix A with the stated
property, or explain why none can exist.
*(a) Ax = b is inconsistent for every b ∈ Rᵐ.
*(b) Ax = b has one solution for every b ∈ Rᵐ.
(c) Ax = b has either zero or one solution for every b ∈ Rᵐ.
22. Let P_i = (x_i, y_i) ∈ R², i = 1, 2, 3. Assume x1, x2, and x3 are distinct.
(a) Show that the matrix
[ 1 x1 x1² ]
[ 1 x2 x2² ]
[ 1 x3 x3² ]
is nonsingular.
(b) Show that the system of equations
always has a unique solution. Deduce that if Pi, P2, and P3 are not collinear, then they lie on a unique
parabola y = ax2 + bx + c.
(a) Prove that the three points P1, P2, and P3 are collinear if and only if the equation Ax = 0 has a nontrivial solution. (Hint: A general line in R² is of the form ax + by + c = 0, where a and b are not both 0.)
(b) Prove that if the three given points are not collinear, then there is a unique circle passing through
them. (Hint: If you set up a system of linear equations as suggested by the hint for Exercise 7, you
should use part a to deduce that the appropriate coefficient matrix is nonsingular.)
2 Elementary Matrices and Calculating Inverse Matrices 147
the ith row of AB is the product of the ith row vector of A with B.
Just as multiplying the matrix A on the right by the column vector x = (x1, x2, …, xn) gives us the linear combination x1a1 + x2a2 + ⋯ + xnan of the columns of A, the reader can easily check that multiplying A on the left by the row vector [x1 x2 ⋯ xm] yields the linear combination x1A1 + x2A2 + ⋯ + xmAm of the rows of A.
It should come as no surprise, then, that we can perform row operations on a matrix A
by multiplying on the left by appropriately chosen matrices. For example, if

A = [ 1 2          E1 = [ 0 1 0        E2 = [ 1 0 0        E3 = [  1 0 0
      3 4                 1 0 0               0 1 0                -2 1 0
      5 6 ],              0 0 1 ],            0 0 4 ],              0 0 1 ],

then

E1A = [ 3 4        E2A = [  1  2        and   E3A = [ 1 2
        1 2                 3  4                      1 0
        5 6 ],             20 24 ],                   5 6 ].
Such matrices that give corresponding elementary row operations are called elementary
matrices. Note that each elementary matrix differs from the identity matrix only in a small
way. (N.B. Here we establish the custom that blank spaces in a matrix represent 0’s.)
ii. To multiply row i by the scalar c ≠ 0, we should multiply by the elementary matrix
that agrees with the identity matrix except that its entry in the ith row and ith
column is c.
iii. To add c times row i to row j, we should multiply by an elementary matrix of the
form that agrees with the identity matrix except that its entry in the jth row and
ith column is c.
Here’s an easy way to remember the form of these matrices: Each elementary matrix is
obtained by performing the corresponding elementary row operation on the identity matrix.
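This recipe is easy to check on a computer. Here is a minimal numpy sketch (the helper names are my own, not the text's): each elementary matrix is built by performing the corresponding row operation on the identity matrix, and multiplying A on the left by it then performs that operation on A.

```python
import numpy as np

def interchange(n, i, j):
    """Elementary matrix interchanging rows i and j (0-indexed)."""
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]          # perform the swap on the identity
    return E

def scale_row(n, i, c):
    """Elementary matrix multiplying row i by the scalar c."""
    E = np.eye(n)
    E[i, i] = c
    return E

def add_multiple(n, i, j, c):
    """Elementary matrix adding c times row i to row j."""
    E = np.eye(n)
    E[j, i] = c
    return E

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
E1A = interchange(3, 0, 1) @ A       # rows 1 and 2 interchanged
E2A = scale_row(3, 2, 4) @ A         # row 3 multiplied by 4
E3A = add_multiple(3, 0, 1, -2) @ A  # -2 times row 1 added to row 2
```

Note that each helper edits the identity matrix in only one small place, mirroring the observation above.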
► EXAMPLE 1

Let

A = [ 2 3
      1 2 ].

We put A in reduced echelon form by the following sequence of row operations: multiply
row 1 by 1/2; add −1 times row 1 to row 2; multiply row 2 by 2; and add −3/2 times
row 2 to row 1. These steps correspond to multiplying, in sequence from right to left, by
the elementary matrices

E1 = [ 1/2 0       E2 = [  1 0       E3 = [ 1 0       E4 = [ 1 −3/2
        0  1 ],          −1 1 ],           0 2 ],            0   1  ].

Then

E4E3E2E1A = [  2 −3   [ 2 3    =   [ 1 0
              −1  2 ]   1 2 ]        0 1 ],

as it should. ◄
► EXAMPLE 2

Let

A = [  1  1  3 −1  0
      −1  1  1  1  2
       0  1  2  2 −1
       2 −1  0  1 −6 ].

To clear out the entries below the first pivot, we must multiply by the product of the two elementary
matrices E1 and E2:

E2E1 = [  1 0 0 0
          1 1 0 0
          0 0 1 0
         −2 0 0 1 ],

where E1 adds the first row to the second, and E2 adds −2 times the first row to the fourth.
We then change the pivot in the third row to 1 and clear out below, multiplying by

E8 = [ 1               and   E9 = [ 1
         1                            1
           1/2                          1
               1 ]                     −3 1 ].

All told, we obtain

E9E8E7E6E5E4E3E2E1 = [  1/4 −3/4  1/2  0
                        1/2  1/2   0   0
                       −1/4 −1/4  1/2  0
                        1/4  9/4 −3/2  1 ].
Recall from Section 1 that if we want to find the constraint equations that a vector b
must satisfy in order for Ax = b to be consistent, we reduce the augmented matrix [A | b]
to echelon form [U | c] and set equal to 0 those entries of c corresponding to the rows
of zeroes in U. That is, when A is an m x n matrix of rank r, the constraint equations
are merely the equations c_{r+1} = · · · = c_m = 0. Letting E be the product of the elementary
matrices corresponding to the elementary row operations required to put A in echelon form,
we have U = EA and so

(†)    c = Eb.

Interestingly, we can use the equation (†) to find a simple way to compute E: When we
reduce the augmented matrix [A | b] to echelon form [U | c], E is the matrix so that Eb = c.
► EXAMPLE 3
Taking the matrix A from Example 2, let’s find the constraint equations for Ax = b to be consistent.
We start with the augmented matrix
1 1 3 -1 0 bi’
-1 1 1 1 2 bi
[A | b] =
0 1 2 2 -1 b3
2 -1 0 1 -6 *4_
"1 1 3 -1 bi
[tZ|c]= ’ 2 4 0 bi 4- &2
0 0 4 —b\ — Z?2 4* 2i>3
.0 0 0 0 bi 4- 9&2 — 6Z>3 4- 4Z>4
The constraint equation for consistency is therefore b1 + 9b2 − 6b3 + 4b4 = 0. Moreover, since

Eb = c = [  b1
            b1 + b2
           −b1 − b2 + 2b3
            b1 + 9b2 − 6b3 + 4b4 ],

then

E = [  1  0  0  0
       1  1  0  0
      −1 −1  2  0
       1  9 −6  4 ].
If instead we reduce [A | b] all the way to reduced echelon form, the corresponding matrix is

E′ = [  1/4 −3/4  1/2  0
        1/2  1/2   0   0
       −1/4 −1/4  1/2  0
        1    9    −6   4 ],
which is very close to—but not the same as—the product of elementary matrices we obtained at the
end of Example 2. Can you explain why the first three rows must agree here, but not the last?
³We will write the "implies" symbol "⇒" vertically so that we can indicate the reasoning in each step.
Ax = b
⇓ multiplying both sides of the equation by A⁻¹ on the left
A⁻¹(Ax) = A⁻¹b
⇓ using the associative property
(A⁻¹A)x = A⁻¹b
⇓ using the definition of A⁻¹
x = I_n x = A⁻¹b.
We aren't done! We've shown that if x is a solution, then it must satisfy x = A⁻¹b. That
is, we've shown that the vector A⁻¹b is a candidate for a solution. But now we check that
it truly is a solution by straightforward calculation:

A(A⁻¹b) = (AA⁻¹)b = I_n b = b,

as required; but note that we have used both pieces of the definition of the inverse matrix
to prove that the system has a unique solution (which we “discovered” along the way).
It is a consequence of this computation that if A is an invertible n × n matrix, then
Ax = c has a unique solution for every c ∈ R^n, and so it follows from Proposition 1.6
that A must be nonsingular. What about the converse? If A is nonsingular, must A be
invertible? Well, if A is nonsingular, we know that every equation Ax = c has a unique
solution. In particular, for j = 1, ..., n, there is a unique vector b_j that solves Ab_j = e_j,
the jth standard basis vector. If we let B be the n × n matrix whose column vectors are
b_1, ..., b_n, then we have
AB = A [ b1 b2 · · · bn ] = [ Ab1 Ab2 · · · Abn ] = [ e1 e2 · · · en ] = I_n.
This suggests that the matrix we’ve constructed should be the inverse matrix of A. But we
need to know that BA = I_n as well. Here is a very elegant way to understand why this is
so. We can find the matrix B by forming the giant augmented matrix [A | I_n] and reducing
it to reduced echelon form [I_n | B].
(Note that the reduced echelon form of A must be I_n because A is nonsingular.) But this
tells us that if E is the product of the elementary matrices required to put A in reduced
echelon form, then we have

E[A | I] = [I | B],

i.e., EA = I and EI_n = B. Thus B = E, and so BA = EA = I_n, as required.
Note that Gaussian elimination will also let us know when A is not invertible: If we
come to a row of zeroes while reducing A to echelon form, then, of course, A is singular
and so it cannot be invertible. The following observation is often very useful.
Corollary 2.2 If A and B are n × n matrices satisfying BA = I_n, then B = A⁻¹ and
A = B⁻¹.
Proof By Exercise 4.1.19a, the equation Ax = 0 has only the trivial solution. Hence,
by Proposition 1.6, A is nonsingular; according to Theorem 2.1, A is therefore invertible.
Since A has an inverse matrix, A⁻¹, we deduce that

BA = I_n
⇓ multiplying both sides of the equation by A⁻¹ on the right
(BA)A⁻¹ = I_n A⁻¹
⇓ using the associative property
B(AA⁻¹) = A⁻¹
⇓ using the definition of A⁻¹
B = A⁻¹,

as required; and then AB = AA⁻¹ = I_n, so A = B⁻¹ as well. ■
► EXAMPLE 4

To find the inverse of the matrix

A = [ 1 −1 1
      2 −1 0
      1 −2 2 ],

we form the augmented matrix [A | I3] and put it in reduced echelon form:

[ 1 −1 0 | −2  1  1         [ 1 0 0 | 2  0 −1
  0  1 0 |  4 −1 −2    ~      0 1 0 | 4 −1 −2
  0  0 1 |  3 −1 −1 ]         0 0 1 | 3 −1 −1 ].

It follows that

A⁻¹ = [ 2  0 −1
        4 −1 −2
        3 −1 −1 ].
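The computation of Example 4 can be scripted directly. A minimal Gauss-Jordan sketch (not the book's recipe verbatim: it adds partial pivoting and assumes A is invertible):

```python
import numpy as np

def inverse_by_gauss_jordan(A):
    """Reduce [A | I] to [I | A^-1] and return the right-hand block."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))   # partial pivoting
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]                           # make the pivot 1
        for r in range(n):
            if r != col:
                M[r] -= M[r, col] * M[col]              # clear the column
    return M[:, n:]

A = np.array([[1, -1, 1],
              [2, -1, 0],
              [1, -2, 2]])
Ainv = inverse_by_gauss_jordan(A)
```

For the matrix of Example 4 this reproduces the inverse computed above, with rows (2, 0, −1), (4, −1, −2), (3, −1, −1).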
► EXAMPLE 5

It is convenient to derive the formula for the inverse of a general 2 × 2 matrix first given in Example
9 of Chapter 1, Section 4. Let

A = [ a b
      c d ].

Assuming a ≠ 0 and ad − bc ≠ 0, we reduce the augmented matrix [A | I2]:

[ a b | 1 0     ~   [ 1 b/a | 1/a 0    ~   [ 1    b/a     | 1/a   0
  c d | 0 1 ]         c  d  |  0  1 ]        0 (ad−bc)/a  | −c/a  1 ]

~   [ 1 b/a | 1/a          0             ~   [ 1 0 |  d/(ad−bc)  −b/(ad−bc)
      0  1  | −c/(ad−bc)   a/(ad−bc) ]         0 1 | −c/(ad−bc)   a/(ad−bc) ],

so

A⁻¹ = 1/(ad − bc) [  d −b
                    −c  a ].

As a check, we have

[ a b   ·  1/(ad − bc) [  d −b    =   I2   =   1/(ad − bc) [  d −b   [ a b
  c d ]                  −c  a ]                             −c  a ]   c d ].
Of course, we have derived this by assuming a ≠ 0, but the reader can check easily that the formula
works fine even when a = 0. We do see, however, from the row reduction that

[ a b
  c d ]   is nonsingular  ⟺  ad − bc ≠ 0.
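The closed-form 2 × 2 inverse is a one-liner to implement; a sketch (function name mine):

```python
import numpy as np

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is singular")
    # A^-1 = 1/(ad - bc) [[d, -b], [-c, a]]
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[0., 1.],
              [2., 3.]])   # a = 0, yet the formula still applies
```

As the text observes, the formula needs no case distinction: inverse_2x2 works for B even though its upper left entry is 0.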
We have shown in the course of proving Theorem 2.1 that when A is square, any B that
satisfies AB = I (a so-called right inverse of A) must also satisfy BA = I (and thus is a left
inverse of A). Likewise, we have established in Corollary 2.2 that when A is square, any
left inverse of A is a bona fide inverse of A. Indeed, it will never happen that a nonsquare
matrix has both a left and a right inverse (see Exercise 9).
Remark Even when A is square, the left and right inverses have rather different
interpretations. As we saw in the proof of Theorem 2.1, the columns of the right inverse
arise as the solutions of Ax = e_j. On the other hand, the left inverse of A is the product
of the elementary matrices by which we reduce A to its reduced echelon form, I. (See
Exercise 8.)
► EXERCISES 4.2
*1. For each of the matrices A in Exercise 4.1.3, find a product of elementary matrices E = E_k · · · E2E1
so that EA is the reduced echelon form of A. Use the matrix E you've found to give constraint
equations for Ax = b to be consistent.
2. Use Gaussian elimination to find A⁻¹ (if it exists) for each of the given matrices A.
3. For each of the given matrices A and vectors b, find A⁻¹ and use it to compute the solution x = A⁻¹b of Ax = b.
4. (a) Find two different right inverses of the matrix A = [ 1 −1 1
                                                             2 −1 0 ].
(b) Give a nonzero matrix that has no right inverse.
(c) Find two left inverses of the matrix A = [ 1  2
                                               0 −1
                                               1  1 ].
(d) Give a nonzero matrix that has no left inverse.
5. Prove that the inverse of every elementary matrix is again an elementary matrix. Indeed, give a
simple prescription for determining the inverse of each type of elementary matrix.
6. Using Theorem 2.1 and Proposition 4.3 of Chapter 1, prove that if AB and B are nonsingular, then
A is nonsingular. (See Exercise 4.1.17.)
7. Suppose A is an invertible m × m matrix and B is an invertible n × n matrix.
(a) Prove that the matrix

[ A O
  O B ]

is invertible and give a formula for its inverse.
(b) Suppose C is an arbitrary m × n matrix. Is the matrix

[ A C
  O B ]

invertible?
(See Exercise 1.4.12 for the notion of block multiplication.)
8. Complete the following alternative argument that the matrix obtained by Gaussian elimination
must be the inverse matrix of A. Suppose A is nonsingular.
(a) Show there are finitely many elementary matrices E1, E2, ..., Ek so that Ek E_{k−1} · · ·
E2E1A = I.
(b) Let B = Ek · · · E2E1. Prove that AB = I. (Hint: Use Proposition 4.3 of Chapter 1.)
9. Let A be an m × n matrix. Recall that the n × m matrix B is a left inverse of A if BA = I_n and a
right inverse if AB = I_m.
(a) Show that A has a right inverse if and only if we can solve Ax = b for every b ∈ R^m, if and only
if rank(A) = m.
(b) Show that A has a left inverse if and only if Ax = 0 has the unique solution x = 0, if and only if
rank(A) = n. (Hint for ⇐: If rank(A) = n, what is the reduced echelon form of A?)
(c) Show that A has both a left inverse and a right inverse if and only if A is invertible, if and only if
m = n = rank(A).
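As a computational companion to Exercise 9: when rank(A) = m, a right inverse can be produced column by column by solving Ax = e_j. A sketch using least squares (one convenient choice among many; right inverses are not unique):

```python
import numpy as np

A = np.array([[1., -1., 1.],
              [2., -1., 0.]])      # rank 2, so a right inverse exists

# Solve A x_j = e_j for each standard basis vector e_j; the solutions
# become the columns of B, so that A @ B = I_2.
cols = [np.linalg.lstsq(A, e, rcond=None)[0] for e in np.eye(2)]
B = np.column_stack(cols)
```

Since A is 2 × 3, B is only a right inverse: B @ A is a 3 × 3 projection matrix, not the identity, in line with Exercise 9.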
3 Linear Independence, Basis, and Dimension 157

► EXAMPLE 1

Let

v1 = [ 1       v2 = [  1       v3 = [ 1            v = [ 1
       1              −1              0                   1
       2 ],            0 ],           1 ];                0 ].

We ask first of all whether v ∈ Span(v1, v2, v3). This is a familiar question when we recast it in matrix
notation: Let

A = [ 1  1 1           b = [ 1
      1 −1 0    and          1
      2  0 1 ]               0 ].

Is the system Ax = b consistent? Immediately we write down the appropriate augmented matrix and
reduce to echelon form:

[ 1  1 1 | 1        [ 1  1  1 |  1
  1 −1 0 | 1    ~     0 −2 −1 |  0
  2  0 1 | 0 ]        0  0  0 | −2 ].

The system is inconsistent, so v ∉ Span(v1, v2, v3). Now consider instead the vector

w = [ 2
      3
      5 ].
As the reader can easily check, w = 3v1 − v3, so w ∈ Span(v1, v2, v3). What's more, w = 2v1 −
v2 + v3, as well. So, obviously, there is no unique expression for w as a linear combination of v1, v2,
and v3. But we can conclude more: Setting the two expressions for w equal, we obtain

v1 + v2 − 2v3 = 0.

That is, there is a nontrivial relation among the vectors v1, v2, and v3, and this is the reason we
have different ways of expressing w as a linear combination of the three of them. Indeed, since
v1 = −v2 + 2v3, we can see easily that any linear combination of v1, v2, and v3 is a linear combination
just of v2 and v3:

c1v1 + c2v2 + c3v3 = c1(−v2 + 2v3) + c2v2 + c3v3 = (c2 − c1)v2 + (c3 + 2c1)v3.
We might surmise that the vector w can now be written uniquely as a linear combination of v2 and
v3, and this is easy to check:
[A′ | w] = [  1 1 | 2        [ 1 1 | 2
             −1 0 | 3    ~     0 1 | 5
              0 1 | 5 ]        0 0 | 0 ],
and from the fact that the matrix A' has rank 2 we infer that the system of equations has a unique
solution.
Proposition 3.1 Let v1, ..., vk ∈ R^n and let V = Span(v1, ..., vk). An arbitrary
vector v ∈ Span(v1, ..., vk) has a unique expression as a linear combination of v1, ..., vk
if and only if the zero vector has a unique expression as a linear combination of v1, ..., vk;
i.e., if and only if

c1v1 + c2v2 + · · · + ckvk = 0   only when   c1 = c2 = · · · = ck = 0.

Proof Suppose first that some v ∈ V has two different expressions

v = a1v1 + a2v2 + · · · + akvk = b1v1 + b2v2 + · · · + bkvk.

Subtracting, we obtain

0 = (a1 − b1)v1 + (a2 − b2)v2 + · · · + (ak − bk)vk,

and so the zero vector has a nontrivial representation as a linear combination of v1, ..., vk
(by which we mean that not all the coefficients are 0).
Conversely, suppose there is a nontrivial linear combination

0 = c1v1 + c2v2 + · · · + ckvk.

Then, given any expression v = a1v1 + · · · + akvk, we also have v = (a1 + c1)v1 + · · · + (ak + ck)vk,
a different expression for v as a linear combination of v1, ..., vk. ■
Definition The (indexed) set of vectors {v1, ..., vk} is called linearly independent
if

c1v1 + c2v2 + · · · + ckvk = 0   only when   c1 = c2 = · · · = ck = 0,

i.e., if the only way of expressing the zero vector as a linear combination of v1, ..., vk is
the trivial linear combination 0v1 + · · · + 0vk.
The set of vectors {v1, ..., vk} is called linearly dependent if it is not linearly independent,
i.e., if there is some expression

c1v1 + c2v2 + · · · + ckvk = 0,   where not all of c1, ..., ck are 0.
Remark Here is a piece of advice: It is virtually always the case that when you are
presented with a set of vectors {v1, ..., vk} that you are to prove linearly independent, you
should write

c1v1 + c2v2 + · · · + ckvk = 0

and aim to show that c1 = c2 = · · · = ck = 0. You then use whatever hypotheses you're
given to arrive at that conclusion.
► EXAMPLE 2

Given vectors v1, v2, v3 (here in R^4), we decide whether {v1, v2, v3} is linearly independent by
solving

c1v1 + c2v2 + c3v3 = 0,

i.e., the homogeneous system whose coefficient matrix has columns v1, v2, v3.
By now we are old hands at solving such systems. We find that the echelon form of the coefficient
matrix is

[ 1 2 1
  0 1 1
  0 0 0
  0 0 0 ],

and so our system of equations in fact has infinitely many solutions. For example, we can take c1 = 1,
c2 = −1, and c3 = 1. The vectors therefore form a linearly dependent set. ◄
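In computations one tests linear independence exactly this way: the columns of a matrix are linearly independent precisely when its rank equals the number of columns. A small numpy illustration (the vectors are my own stand-ins, chosen to satisfy a relation of the same shape, v1 − v2 + v3 = 0):

```python
import numpy as np

v1 = np.array([1., 0., 1., 2.])
v2 = np.array([2., 1., 0., 1.])
v3 = v2 - v1                       # forces the relation v1 - v2 + v3 = 0

A = np.column_stack([v1, v2, v3])
rank = np.linalg.matrix_rank(A)    # 2 < 3, so {v1, v2, v3} is linearly dependent

# the nontrivial combination c1 = 1, c2 = -1, c3 = 1 gives the zero vector
zero = 1 * v1 + (-1) * v2 + 1 * v3
```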
► EXAMPLE 3

Suppose u, v, w ∈ R^n. If {u, v, w} is linearly independent, then we wish to show next that
{u + v, v + w, u + w} is likewise linearly independent. Suppose

c1(u + v) + c2(v + w) + c3(u + w) = 0.

We must show that c1 = c2 = c3 = 0. We use the distributive property to rewrite our equation as

(c1 + c3)u + (c1 + c2)v + (c2 + c3)w = 0.

Since {u, v, w} is linearly independent, it follows that

c1 + c3 = 0
c1 + c2 = 0
c2 + c3 = 0,

and we leave it to the reader to check that the only solution of this system of equations is, in fact,
c1 = c2 = c3 = 0, as desired. ◄
► EXAMPLE 4

Any time one has a list of vectors v1, ..., vk in which one of the vectors is the zero vector, say v1 = 0,
then the set of vectors must be linearly dependent, because the equation

1v1 + 0v2 + · · · + 0vk = 0

is a nontrivial linear combination of the vectors that gives the zero vector. ◄
► EXAMPLE 5

How can two nonzero vectors u and v give rise to a linearly dependent set? By definition, this means
that there is a linear combination

au + bv = 0,

where either a ≠ 0 or b ≠ 0. Suppose a ≠ 0. Then we may write u = −(b/a)v, so u is a scalar multiple
of v. (Similarly, you may show that if b ≠ 0, v must be a scalar multiple of u.) So two linearly
dependent vectors are parallel (and vice versa).

How can a collection of three nonzero vectors be linearly dependent? As before, there must be
a linear combination

au + bv + cw = 0,

where (at least) one of a, b, and c is nonzero. Say a ≠ 0. This means that we can solve:

u = −(1/a)(bv + cw) = (−b/a)v + (−c/a)w,

so u ∈ Span(v, w). In particular, Span(u, v, w) is either a line (if all three vectors u, v, w are parallel)
or a plane. ◄
Proposition 3.2 Suppose v1, ..., vk ∈ R^n form a linearly independent set, and
suppose x ∈ R^n. Then {v1, ..., vk, x} is linearly independent if and only if x ∉
Span(v1, ..., vk).

Proof Although Figure 3.1 suggests the result is quite plausible, we will prove the
contrapositive: {v1, ..., vk, x} is linearly dependent if and only if x ∈ Span(v1, ..., vk).
Figure 3.1
Suppose x ∈ Span(v1, ..., vk). Then x = c1v1 + c2v2 + · · · + ckvk for some scalars c1,
..., ck, so

c1v1 + c2v2 + · · · + ckvk + (−1)x = 0,

from which we conclude that {v1, ..., vk, x} is linearly dependent (since at least one of the
coefficients is nonzero).
Now suppose {v1, ..., vk, x} is linearly dependent. This means that there are scalars
c1, ..., ck, and c, not all 0, so that

c1v1 + c2v2 + · · · + ckvk + cx = 0.

Note that we cannot have c = 0, for if c were 0, we'd have c1v1 + c2v2 + · · · + ckvk = 0,
and linear independence of {v1, ..., vk} implies c1 = · · · = ck = 0, which contradicts our
assumption that {v1, ..., vk, x} is linearly dependent. Therefore, c ≠ 0, and so

x = −(1/c)(c1v1 + c2v2 + · · · + ckvk) = (−c1/c)v1 + (−c2/c)v2 + · · · + (−ck/c)vk,

which tells us that x ∈ Span(v1, ..., vk), as required. ■
Proposition 3.2 has the following consequence: If {v1, ..., vk} is linearly independent,
then

Span(v1) ⊊ Span(v1, v2) ⊊ · · · ⊊ Span(v1, ..., vk).

That is, with each additional vector, the subspace spanned gets larger. We now formalize
the notion of "size" of a subspace. But we now understand that when we have a set of
linearly independent vectors, no proper subset will yield the same span. In other words, we
will have an "efficient" set of spanning vectors (i.e., there is no redundancy in the vectors
we've chosen: No proper subset will do). This motivates the following
Definition Let V ⊂ R^n be a subspace. The set of vectors {v1, ..., vk} is called a
basis for V if
(i) v1, ..., vk span V, i.e., V = Span(v1, ..., vk), and
(ii) {v1, ..., vk} is linearly independent.
► EXAMPLE 6

The vectors e1 = (1, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., en = (0, ..., 0, 1)
are called the standard basis vectors for R^n. To check that they make up a basis, we must establish
that properties (i) and (ii) above hold for V = R^n. The first is obvious: If x ∈ R^n, then x = x1e1 +
x2e2 + · · · + xnen. The second is not much harder. Suppose c1e1 + c2e2 + · · · + cnen = 0. Then this
means that

(c1, c2, ..., cn) = (0, 0, ..., 0),

and so c1 = c2 = · · · = cn = 0. ◄
► EXAMPLE 7

Consider the plane given by V = {x ∈ R^3 : x1 − x2 + 2x3 = 0} ⊂ R^3. Our algorithms of Section 1
tell us that the vectors

v1 = [ 1        v2 = [ −2
       1   and          0
       0 ]              1 ]

span V. Since these vectors are not parallel, we can deduce (see Example 5) that they must be linearly
independent.

For the practice, however, we give a direct argument. Suppose

c1v1 + c2v2 = [ c1 − 2c2        [ 0
                c1         =      0
                c2 ]              0 ],

from which we conclude that c1 = c2 = 0, as required. (For future reference, we note that this
information came from the free variable "slots.") Therefore, {v1, v2} is linearly independent and
gives a basis for V, as required. ◄
Corollary 3.3 Let V ⊂ R^n be a subspace, and let v1, ..., vk ∈ V. Then {v1, ..., vk}
is a basis for V if and only if every vector of V can be written uniquely as a linear combination
of v1, ..., vk.
► EXAMPLE 8

Let

v1 = [ 1        v2 = [ 1        v3 = [ 1
       2               1               0
       1 ],            2 ],            2 ].

Let's take a general vector b ∈ R^3 and ask first of all whether it has a unique expression as a linear
combination of v1, v2, and v3. Forming the augmented matrix and row reducing, we find

[ 1 1 1 | b1          [ 1 0 0 | 2b1 − b3
  2 1 0 | b2     ~      0 1 0 | −4b1 + b2 + 2b3
  1 2 2 | b3 ]          0 0 1 | 3b1 − b2 − b3 ].

It follows from Corollary 3.3 that {v1, v2, v3} is a basis for R^3, for an arbitrary vector b ∈ R^3 can be
written in the form b = c1v1 + c2v2 + c3v3, where

c1 = 2b1 − b3,
c2 = −4b1 + b2 + 2b3,   and
c3 = 3b1 − b2 − b3

give the coordinates of b with respect to the basis {v1, v2, v3}. ◄
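Finding coordinates with respect to a basis is just solving Ax = b, where the basis vectors are the columns of A. A quick numpy check of the coordinate formulas of Example 8 (the vector b is my own sample):

```python
import numpy as np

# Columns are the basis vectors v1, v2, v3 of Example 8.
A = np.array([[1., 1., 1.],
              [2., 1., 0.],
              [1., 2., 2.]])
b = np.array([1., 2., 3.])

c = np.linalg.solve(A, b)          # coordinates of b with respect to {v1, v2, v3}
b1, b2, b3 = b
expected = np.array([2*b1 - b3, -4*b1 + b2 + 2*b3, 3*b1 - b2 - b3])
```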
Proposition 3.4 Let A be an n × n matrix. Then A is nonsingular if and only if its
column vectors form a basis for R^n.

Proof As usual, let's denote the column vectors of A by a1, a2, ..., an. Using Corollary 3.3,
we are to prove that A is nonsingular if and only if every vector in R^n can be written
uniquely as a linear combination of a1, a2, ..., an. But this is exactly what Proposition 1.6
tells us. ■
Given a subspace V ⊂ R^n, how do we know there is some basis for it? This is a
consequence of Proposition 3.2 as well.
Theorem 3.5 Any subspace V ⊂ R^n other than the trivial subspace has a basis.
Once we realize that every subspace V ⊂ R^n has some basis, we are confronted with
the problem that it has many of them. For example, Proposition 3.4 gives us a way of
finding zillions of bases for R^n. As we shall now show, all bases for a given subspace have
one thing in common: They all consist of the same number of elements.
Proposition 3.6 Let V ⊂ R^n be a subspace, let {v1, ..., vk} be a basis for V, and let
w1, ..., wℓ ∈ V. If ℓ > k, then {w1, ..., wℓ} must be linearly dependent.

Proof Since {v1, ..., vk} spans V, we can write each wj as a linear combination

(*)    wj = a_{1j}v1 + a_{2j}v2 + · · · + a_{kj}vk,   j = 1, ..., ℓ.

Let A be the k × ℓ matrix whose entries are the coefficients a_{ij}.
Since ℓ > k, there cannot be a pivot in every column of A, and so there is a nonzero
vector c = (c1, ..., cℓ) satisfying Ac = 0. But then, by (*), c1w1 + · · · + cℓwℓ = 0, and so
{w1, ..., wℓ} is linearly dependent. ■
Remark We can easily avoid equation (*) in its matrix form. Since

wj = Σ_{i=1}^{k} a_{ij} v_i,

we have

(**)   Σ_{j=1}^{ℓ} c_j w_j = Σ_{j=1}^{ℓ} c_j ( Σ_{i=1}^{k} a_{ij} v_i ) = Σ_{i=1}^{k} ( Σ_{j=1}^{ℓ} a_{ij} c_j ) v_i.

As before, since ℓ > k, there is a nonzero vector c so that Ac = 0; this choice of c makes the
right-hand side of (**) the zero vector. Consequently, there is a nontrivial relation among
w1, ..., wℓ.
Theorem 3.7 Let V ⊂ R^n be a subspace, and let {v1, ..., vk} and {w1, ..., wℓ} be
two bases for V. Then we have k = ℓ.

Proof Since {v1, ..., vk} forms a basis for V and {w1, ..., wℓ} is known to be linearly
independent, we use Proposition 3.6 to conclude that ℓ ≤ k. Now here's the trick:
{w1, ..., wℓ} is likewise a basis for V and {v1, ..., vk} is known to be linearly independent,
so we infer from Proposition 3.6 that k ≤ ℓ. The only way both inequalities can hold is for
k and ℓ to be equal, as we wished to show. ■
As we shall see in our applications, dimension is a powerful tool. Here is the first
instance.
Lemma 3.8 Suppose V and W are subspaces of R^n with the property that W ⊂ V.
If dim V = dim W, then V = W.
Proof Let dim W = k and let {v1, ..., vk} be a basis for W. If W ⊊ V, then there
must be a vector v ∈ V with v ∉ W. By virtue of Proposition 3.2, we know that {v1, ...,
vk, v} is linearly independent, so dim V ≥ k + 1. This is a contradiction. Therefore,
V = W. ■
► EXAMPLE 9

Let V = Span(v1, v2, v3, v4) ⊂ R^3, where

v1 = [ 1      v2 = [ 2      v3 = [ 0      v4 = [ 3
       1             2             1             4
       2 ],          4 ],          1 ],          7 ].

We want a subset of {v1, v2, v3, v4} that will give us a basis for V. Of course, this set of four
vectors must be linearly dependent since V ⊂ R^3 and R^3 is only 3-dimensional. But let's examine
the solutions of

c1v1 + c2v2 + c3v3 + c4v4 = 0.

The coefficient matrix

A = [ 1 2 0 3
      1 2 1 4
      2 4 1 7 ]

has reduced echelon form

R = [ 1 2 0 3
      0 0 1 1
      0 0 0 0 ],

and so the vectors v2 and v4 can be expressed as linear combinations of the vectors v1 and v3. On the
other hand, {v1, v3} is linearly independent (why?), so this gives a basis for V. ◄
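The pivot-column recipe of Example 9 is easy to confirm numerically; a sketch (with the fourth vector written out via the relation v4 = 3v1 + v3 that the reduced echelon form encodes):

```python
import numpy as np

v1 = np.array([1., 1., 2.])
v2 = np.array([2., 2., 4.])
v3 = np.array([0., 1., 1.])
v4 = 3 * v1 + v3                   # the relation encoded by R

A = np.column_stack([v1, v2, v3, v4])
# rank 2: only two of the four vectors are needed to span V
rank_all = np.linalg.matrix_rank(A)
# {v1, v3} (the pivot columns) already has rank 2, so it is a basis for V
rank_pivots = np.linalg.matrix_rank(np.column_stack([v1, v3]))
```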
► EXAMPLE 10

Here are a few examples of so-called "abstract" vector spaces. Others appear in the exercises.

a. Let M_{m×n} denote the set of all m × n matrices. As we've seen in Proposition 4.1 of Chapter
1, M_{m×n} is a vector space, using the operations of matrix addition and scalar multiplication
we've already defined. The zero "vector" is the zero matrix O. This space can naturally be
identified with R^{mn} (see Exercise 24).

b. Let F(U) denote the collection of all real-valued functions defined on some subset U ⊂ R^n.
If f ∈ F(U) and c ∈ R, then we can define a new function cf ∈ F(U) by multiplying the
value of f at each point by the scalar c; i.e.,

(cf)(x) = c f(x)   for all x ∈ U.

Similarly, if f, g ∈ F(U), then we can define the new function f + g ∈ F(U) by adding
the values of f and g at each point; i.e.,

(f + g)(x) = f(x) + g(x)   for all x ∈ U.

By these formulas we define scalar multiplication and vector addition in F(U). The zero
"vector" in F(U) is the zero function. The various properties of a vector space follow from
the corresponding properties of the real numbers (as everything is defined in terms of the
values of the function at every point). Since an element of F(U) is a function, F(U) is
often called a function space.

c. Let R^∞ denote the collection of all infinite sequences of real numbers. That is, an element of
R^∞ looks like x = (x1, x2, x3, ...), where x_i ∈ R, i = 1, 2, 3, .... Operations are defined in the
obvious way: If c ∈ R and y = (y1, y2, y3, ...), then we set cx = (cx1, cx2, cx3, ...)
and x + y = (x1 + y1, x2 + y2, x3 + y3, ...). ◄
The vector space of functions on an open subset U ⊂ R^n has various subspaces that
will be of particular interest to us. For any k ≥ 0 we have C^k(U), the space of C^k functions
on U. (That these are all subspaces follows from the standard fact that sums and scalar multiples
of C^k functions are again C^k.) We can also consider the subspaces of polynomial functions.
We denote by P_k the vector space of polynomials of degree ≤ k in one variable.
As we ask the reader to check in Exercise 26, the vector space P_k has dimension
k + 1. In general, we say a vector space is finite-dimensional if it has dimension n for some
n ∈ N and infinite-dimensional if not. The vector space C^∞(R) is infinite-dimensional, as
it contains polynomials of arbitrarily high degree.
7. Suppose v1, ..., vk ∈ R^n form a linearly dependent set. Prove that for some 1 ≤ j ≤ k we have
vj ∈ Span(v1, ..., v_{j−1}, v_{j+1}, ..., vk). That is, one of the vectors v1, ..., vk can be written as a linear
combination of the remaining vectors.
8. Suppose v1, ..., vk ∈ R^n form a linearly dependent set. Prove that either v1 = 0 or v_{i+1} ∈
Span(v1, ..., v_i) for some i = 1, 2, ..., k − 1.
9. Let A be an m × n matrix and b1, ..., bk ∈ R^m. Suppose {b1, ..., bk} is linearly independent.
Suppose that v1, ..., vk ∈ R^n are chosen so that Av1 = b1, ..., Avk = bk. Prove that {v1, ..., vk}
must be linearly independent.
10. Suppose T: R^n → R^n is a linear map. Prove that if [T] is nonsingular and {v1, ..., vk} is linearly
independent, then {T(v1), ..., T(vk)} is likewise linearly independent.
11. Suppose T: R^n → R^m is a linear map and [T] has rank n. Suppose v1, ..., vk ∈ R^n and
{v1, ..., vk} is linearly independent. Prove that {T(v1), ..., T(vk)} ⊂ R^m is likewise linearly
independent. (N.B.: If you did not explicitly make use of the assumption that rank([T]) = n, your
proof cannot be correct. Why?)
*12. Decide whether the following sets of vectors give a basis for the indicated space.
13. Find a basis for each of the given subspaces and determine its dimension.
*(b) v1 = [ 1      v2 = [ 1      v3 = [ 1      b = [ 1
            0             2             3            1
            3 ],          2 ],          2 ];         2 ]
(c) v1 = [ 1      v2 = [ 1      v3 = [ 1      b = [ 3
           0             1             1            0
           1 ],          2 ],          1 ];         1 ]
(e) {f1, f2, f3} ⊂ C^∞(R), where f1(t) = 1, f2(t) = cos t, f3(t) = cos 2t
(f) {f1, f2, f3} ⊂ C^∞(R), where f1(t) = 1, f2(t) = cos 2t, f3(t) = cos² t
24. Recall that M_{m×n} denotes the vector space of m × n matrices.
(a) Give a basis for and determine the dimension of M_{m×n}.
(b) Show that the set of diagonal matrices, the set of upper triangular matrices, and the set of lower
triangular matrices are all subspaces of M_{n×n} and determine their dimensions.
(c) Show that the set of symmetric matrices, S, and the set of skew-symmetric matrices, X, are
subspaces of M_{n×n}. What are their dimensions? Show that S + X = M_{n×n}. (See Exercise 1.4.36.)
25. Let V be a vector space.
(a) Let V* denote the set of all linear transformations from V to R. Show that V* is a vector space.
(b) Suppose {v1, ..., vn} is a basis for V. For i = 1, ..., n, define f_i ∈ V* by

f_i(a1v1 + a2v2 + · · · + anvn) = a_i.

Prove that {f1, ..., fn} gives a basis for V*.
(c) Deduce that whenever V is finite-dimensional, dim V* = dim V.
26. Show that the set P_k of polynomials in one variable of degree ≤ k is a vector space of dimension
k + 1. (Hint: Suppose c0 + c1x + · · · + ck x^k = 0 for all x. Differentiate.)
27. Recall that f: R^n − {0} → R is homogeneous of degree k if f(tx) = t^k f(x) for all t > 0.
(a) Show that the set P_{k,n} of homogeneous polynomials of degree k in n variables is a vector space.
(b) Fix k ∈ N. Show that the monomials x1^{i1} x2^{i2} · · · xn^{in}, where i1 + i2 + · · · + in = k and 0 ≤ ij ≤ k
for j = 1, ..., n, form a basis for P_{k,n}.
(c) Show that dim P_{k,n} = (n−1+k choose k).⁴ (Hint: It may help to remember that (n choose k) = (n choose n−k).)
(d) Using the interpretation in part c, prove that Σ_{i=0}^{k} (n−1+i choose i) = (n+k choose k).
Definition Let A be an m × n matrix with row vectors A1, ..., Am ∈ R^n and column
vectors a1, ..., an ∈ R^m. We define the column space of A to be the subspace of R^m spanned
by a1, ..., an:

C(A) = Span(a1, ..., an).

Our work in Section 1 gives an important alternative interpretation of the column space.

⁴Recall that the binomial coefficient (n choose k) = n!/(k!(n − k)!) gives the number of k-element subsets of a given
n-element set.
Proposition 4.1 Let A be an m × n matrix. Let b ∈ R^m. Then b ∈ C(A) if and only
if b = Ax for some x ∈ R^n. That is,

C(A) = {b ∈ R^m : Ax = b is consistent}.

Remark If we think of A as the standard matrix of a linear map T: R^n → R^m, then
C(A) ⊂ R^m is the set of all the values of T, i.e., its image, denoted image(T).
Perhaps the most natural subspace of all comes from solving a homogeneous system
of linear equations: the nullspace of A is

N(A) = {x ∈ R^n : Ax = 0}.

Recall (see Exercise 1.4.3) that N(A) is in fact a subspace. If we think of A as the standard
matrix of a linear map T: R^n → R^m, then N(A) ⊂ R^n is often called the kernel of T,
denoted ker(T).
We might surmise that our algorithm in Section 1 for finding the general solution of
the homogeneous linear system Ax = 0 produces a basis for N(A).
4 The Four Fundamental Subspaces 173

► EXAMPLE 1

Let

A = [ 1 2 1 −1        whose reduced echelon form is    R = [ 1 0 1  1
      1 0 1  1 ],                                            0 1 0 −1 ].

The general solution of Ax = 0 is given by

x1 = −x3 − x4
x2 =        x4
x3 =  x3
x4 =        x4;

i.e.,

x = x3 [ −1      + x4 [ −1
          0              1
          1              0
          0 ]            1 ],

so the vectors (−1, 0, 1, 0) and (−1, 1, 0, 1)
span N(A). On the other hand, they are clearly linearly independent, for if a linear
combination of them is the zero vector, the free-variable slots (the third and fourth entries)
force x3 = x4 = 0. Thus they give a basis for N(A). ◄
One of the most beautiful and powerful relations among these subspaces is the following:

Proposition 4.2 Let A be an m × n matrix. Then N(A) = R(A)^⊥.

It is also the case that R(A) = N(A)^⊥, but we are not quite yet in a position to establish
this.
Since C(A) = R(A^T), the following is immediate:

Corollary 4.3 Let A be an m × n matrix. Then N(A^T) = C(A)^⊥.

In fact, we really came across this earlier, when we found constraint equations for
Ax = b to be consistent. Just as multiplying A by x takes linear combinations of the
columns of A, so then does multiplying A^T by x take linear combinations of the rows of A
(perhaps it helps to think of A^T x as (x^T A)^T). Corollary 4.3 is the statement that any linear
combination of the rows of A that gives 0 corresponds to a constraint on C(A) and vice
versa. What is far from clear, however, is whether the vectors we obtain as coefficients of
the constraint equations form a linearly independent set.
► EXAMPLE 2

We wish to find a homogeneous system of linear equations describing C(A) for the given 4 × n
matrix A; that is, we seek the equations b ∈ R^4 must satisfy in order for Ax = b to be consistent.
By row reduction, we find the constraint equations

−b1 + b2 + b3 = 0
−b1 + b4 = 0.

Now, if we keep track of the row operations involved in reducing A to echelon form, we find
that

E = [  1 0 0 0
      −1 1 0 0
      −1 1 1 0
      −1 0 0 1 ],

and the last two rows of E record precisely the linear combinations of the rows of A that vanish:

−A1 + A2 + A3 = −A1 + A4 = 0.

Thus the vectors (−1, 1, 1, 0) and (−1, 0, 0, 1)
span N(A^T). On the other hand, in this instance, it is easy to see they are linearly independent and
hence give a basis for N(A^T). ◄
► EXAMPLE 3

Let

A = [ 1 1 0 1 4        with reduced echelon form    R = [ 1 0 −1 0 1
      1 2 1 1 6                                           0 1  1 0 2
      0 1 1 1 3                                           0 0  0 1 1
      2 2 0 1 7 ],                                        0 0  0 0 0 ].

Using the result of Exercise 1, R(A) = R(R), so the nonzero rows of R span R(A); now we need
only check that they form a linearly independent set. We keep an eye on the pivot "slots": Suppose
a linear combination of the three nonzero rows of R is the zero vector; then

( c1, c2, −c1 + c2, c3, c1 + 2c2 + c3 ) = ( 0, 0, 0, 0, 0 ),

and the pivot slots give c1 = c2 = c3 = 0, as promised.

From the reduced echelon form R, we read off the vectors that span N(A): The general solution
of Ax = 0 is

x = [  x3 − x5            [  1          [ −1
      −x3 − 2x5             −1            −2
       x3         =  x3      1    + x5     0
           −x5               0            −1
       x5 ]                  0 ]           1 ],
so the vectors (1, −1, 1, 0, 0) and (−1, −2, 0, −1, 1)
span N(A). On the other hand, these vectors are linearly independent, for if we take a linear
combination

x3 [  1       + x5 [ −1       = [ 0
     −1              −2           0
      1               0           0
      0              −1           0
      0 ]             1 ]         0 ],

we infer (from the free variable slots) that x3 = x5 = 0. Thus, these two vectors form a basis for
N(A).
Obviously, C(A) is spanned by the five column vectors of A. But these vectors cannot be linearly
independent—that’s what vectors in the nullspace of A tell us. From our vectors spanning N(A), we
know that

a1 − a2 + a3 = 0   and   −a1 − 2a2 − a4 + a5 = 0.

These equations tell us that a3 and a5 can be written as linear combinations of a1, a2, and a4, and so
these latter three vectors span C(A). If we can check that they form a linearly independent set, we'll
know they give a basis for C(A). We form a matrix A′ with these columns (easier: cross out the third
and fifth columns of A) and reduce it to echelon form (easier: cross out the third and fifth columns of
R). Well, we have

A′ = [ 1 1 1        with reduced echelon form    [ 1 0 0
       1 2 1                                       0 1 0
       0 1 1                                       0 0 1
       2 2 1 ]                                     0 0 0 ],

and so only the trivial linear combination of the columns of A′ will yield the zero vector. In conclusion,
the vectors a1, a2, and a4 give a basis for C(A).

Finally, the relation among the rows of A,

−A1 − A2 + A3 + A4 = 0,

tells us that the vector

[ −1
  −1
   1
   1 ]

gives a basis for N(A^T). ◄
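The bookkeeping of Example 3 can be verified mechanically; a numpy check using the matrix A of that example:

```python
import numpy as np

A = np.array([[1., 1., 0., 1., 4.],
              [1., 2., 1., 1., 6.],
              [0., 1., 1., 1., 3.],
              [2., 2., 0., 1., 7.]])

r = np.linalg.matrix_rank(A)       # rank 3

# N(A): the two vectors read off from the reduced echelon form
n1 = np.array([1., -1., 1., 0., 0.])
n2 = np.array([-1., -2., 0., -1., 1.])

# C(A): the pivot columns 1, 2, 4 are linearly independent
pivots_rank = np.linalg.matrix_rank(A[:, [0, 1, 3]])

# N(A^T): the row relation -A1 - A2 + A3 + A4 = 0
w = np.array([-1., -1., 1., 1.])
```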
We now state the formal results regarding the four fundamental subspaces.

Proof For simplicity of exposition, let's assume that the reduced echelon form takes
the shape

R = [ I_r  B
      O    O ],

so that the pivots occur in the first r columns.

1. Since row operations are invertible, R(A) = R(U) (see Exercise 1). Clearly the
nonzero rows of U span R(U). Moreover, they are linearly independent because
of the pivots. Let U1, ..., Ur denote the nonzero rows of U; because of our
simplifying assumption on R, we know that the pivots of U occur in the first r
columns as well. Suppose now that

c1U1 + · · · + crUr = 0.

The first entry of the left-hand side is c1u11 (since the first entry of the vectors
U2, ..., Ur is 0 by definition of echelon form). Since u11 ≠ 0 by definition of
pivot, we must have c1 = 0. Continuing in this fashion, we find that c1 = c2 =
· · · = cr = 0. In conclusion, {U1, ..., Ur} forms a basis for R(U), hence for
R(A).
3. The vectors a1, ..., ar corresponding to the pivot columns of the
original matrix A give a basis for C(A). These vectors form a linearly independent
set since the only solution of

c1a1 + · · · + crar = 0

is the trivial one. Moreover, each vector spanning N(A) gives a relation expressing one of the
remaining columns in terms of the pivot columns,
from which we conclude that the vectors ar+1, ..., an are all linear combinations
of a1, ..., ar. It follows that C(A) is spanned by a1, ..., ar, as required.
4. We are interested in the linear relations among the rows of A. The key point here
is that the first r rows of the echelon matrix U form a linearly independent set,
whereas the last m − r rows of U consist just of 0. Thus, N(U^T) is spanned by the
last m − r standard basis vectors for R^m. Using EA = U, we see that

A^T E^T = U^T,

and so

A^T (E^T e_j) = U^T e_j = 0   for j = r + 1, ..., m.

This tells us that the last m − r rows of E span N(A^T). But these vectors are
linearly independent since E is nonsingular. ■
Remark Referring to our earlier discussion of (†) on p. 150 and our discussion in
Sections 1 and 2 of this chapter, we finally know that finding the constraint equations for
C(A) will give a basis for N(AT). It is also worth noting that to find bases for the four
fundamental subspaces of the matrix A, we need only find the echelon form of A to deal
with R(A) and C(A), the reduced echelon form of A to deal with N(A), and the echelon
form of the augmented matrix [A | b] to deal with N(AT).
► EXAMPLE 4
We want bases for R(A), N(A), C(A), and N(Aᵀ), given the matrix

$$A = \begin{bmatrix} 1 & 1 & 2 & 0 & 0 \\ 0 & 1 & 1 & -1 & -1 \\ 1 & 1 & 2 & 1 & 2 \\ 2 & 1 & 3 & -1 & -3 \end{bmatrix}.$$

The reduced echelon form of A is

$$R = \begin{bmatrix} 1 & 0 & 1 & 0 & -1 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},$$

and row reducing the augmented matrix yields

$$[\,EA \mid E\mathbf b\,] = \begin{bmatrix} 1 & 1 & 2 & 0 & 0 & b_1 \\ 0 & 1 & 1 & -1 & -1 & b_2 \\ 0 & 0 & 0 & 1 & 2 & -b_1 + b_3 \\ 0 & 0 & 0 & 0 & 0 & -4b_1 + b_2 + 2b_3 + b_4 \end{bmatrix}.$$

From these we obtain bases for the four subspaces; in particular,

$$\mathrm{C}(A): \left\{ \begin{bmatrix}1\\0\\1\\2\end{bmatrix}, \begin{bmatrix}1\\1\\1\\1\end{bmatrix}, \begin{bmatrix}0\\-1\\1\\-1\end{bmatrix} \right\}; \qquad \mathrm{N}(A^{\mathsf T}): \left\{ \begin{bmatrix}-4\\1\\2\\1\end{bmatrix} \right\}.$$
The reader should check these all carefully. Note that dimR(A) = dim C(A) = 3, dimN(A) = 2,
and dimN(AT) = 1. ◄
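A quick numerical check of this example (an illustration added here, not part of the text; it assumes Python with numpy) confirms the dimensions and the orthogonality relations among the four subspaces:

```python
import numpy as np

# The matrix of Example 4 and the candidate basis vectors found above.
A = np.array([[1, 1, 2, 0, 0],
              [0, 1, 1, -1, -1],
              [1, 1, 2, 1, 2],
              [2, 1, 3, -1, -3]], dtype=float)
m, n = A.shape

N_A = np.array([[-1, -1, 1, 0, 0],     # basis for N(A): x3 = 1, x5 = 0
                [1, -1, 0, -2, 1]]).T  # and x3 = 0, x5 = 1 in the reduced system
N_AT = np.array([[-4, 1, 2, 1]]).T     # basis for N(A^T)

r = np.linalg.matrix_rank(A)
print(r, n - r, m - r)                  # dim R(A) = dim C(A) = 3, dim N(A) = 2, dim N(A^T) = 1
print(np.allclose(A @ N_A, 0))          # True: these vectors lie in N(A)
print(np.allclose(A.T @ N_AT, 0))       # True: this vector lies in N(A^T)
```

The printed counts match the dimension formulas of the following results.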
We now deduce the following results on dimension. Recall that the rank of a matrix is
the number of pivots in its echelon form.
Proof There are r pivots and a pivot in each nonzero row of U, so dimR(A) = r.
Similarly, we have a basis vector for C(A) for every pivot, so dim C(A) = r, as well. We
see that dim N (A) is equal to the number of free variables, and this is the difference between
the total number of variables (n) and the number of pivot variables (r). Last, the number
of zero rows in U is the difference between the total number of rows (m) and the number
of nonzero rows (r), so dimN(AT) = m — r. ■
Proof Choose a basis {v₁, …, v_k} for V, and let these be the rows of a k × n matrix A.
By construction, we have R(A) = V. Notice also that rank(A) = dim R(A) = dim V = k.
By Proposition 4.2, we have V⊥ = N(A), so dim V⊥ = dim N(A) = n − k. ∎
We can finally bring this discussion to a close with the geometric characterization of
the relations among the four fundamental subspaces. Note that this result completes the
story of Theorem 4.5.
Proof These are immediate from Proposition 4.2, Corollary 4.3, and Proposition 4.8. ∎
w₁ · x = 0, …, w_ℓ · x = 0.
► EXAMPLE 5
Let
We wish to write V = Span(vi, v2) as the solution set of a homogeneous system of linear equations.
We introduce the matrix
$$V = \mathrm{R}(A) = \mathrm{N}(A)^{\perp} = \{\mathbf x \in \mathbb{R}^4 : \mathbf w_1 \cdot \mathbf x = 0,\ \mathbf w_2 \cdot \mathbf x = 0\} = \{\mathbf x \in \mathbb{R}^4 : -x_1 + x_2 + x_3 = 0,\ -x_1 + x_4 = 0\}.$$
Earlier, e.g., in Example 2, we determined the constraint equations for the column space.
The column space, as we’ve seen, is the intersection of hyperplanes whose normal vectors
are the basis vectors for N(Aᵀ). This is an application of the result that C(A) = N(Aᵀ)⊥.
As we interchange A and Aᵀ, we turn one method of solving the problem into the other.
To close our discussion now, we introduce in Figure 4.1 a schematic diagram summarizing the geometric relation among our four fundamental subspaces. We know that N(A)
and R(A) are orthogonal complements of one another in ℝⁿ and that, similarly, N(Aᵀ) and
C(A) are orthogonal complements of one another in ℝᵐ. But there is more to be said.
Recall that, given an m × n matrix A, we have linear maps T: ℝⁿ → ℝᵐ and S: ℝᵐ →
ℝⁿ whose standard matrices are A and Aᵀ, respectively. T sends all of N(A) to 0 ∈ ℝᵐ,
and S sends all of N(Aᵀ) to 0 ∈ ℝⁿ. Now, the column space of A consists of all vectors of
Figure 4.1
the form Ax for some x ∈ ℝⁿ; that is, it is the image of the function T. Since dim R(A) =
dim C(A), this suggests that T maps the subspace R(A) one-to-one and onto C(A). (And,
symmetrically, S maps C(A) one-to-one and onto R(A). These are, however, generally not
inverse functions. Why? See Exercise 18.)
Proposition 4.10 For each b ∈ C(A), there is a unique vector x ∈ R(A) so that
Ax = b.
Proof Let {v₁, …, v_r} be a basis for R(A). Then Av₁, …, Av_r are r vectors in
C(A). They are linearly independent (by a modification of the proof of Exercise 4.3.11 that
we leave to the reader). Therefore, by Proposition 3.9, these vectors must span C(A). This
tells us that every vector b ∈ C(A) is of the form b = Ax for some x ∈ R(A) (why?). And
there can be only one such vector x because R(A) ∩ N(A) = {0}. ∎
Remark There is a further geometric interpretation of the vector x ∈ R(A) that arises
in the preceding proposition. Of all the solutions of Ax = b, it is the one of least length.
Why?
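One way to explore this numerically (a sketch added here, not from the text; it assumes numpy) is via the pseudoinverse, which returns exactly the distinguished solution of the proposition — the one lying in R(A), which is also the solution of least length:

```python
import numpy as np

# Sketch: for b in C(A), x = pinv(A) @ b is the minimum-norm solution,
# and it lies in the row space R(A).
A = np.array([[1., 2., 0.],
              [2., 4., 0.]])          # rank 1; R(A) = span{(1, 2, 0)}
b = np.array([5., 10.])               # b is in C(A)

x = np.linalg.pinv(A) @ b             # the unique solution lying in R(A)
print(np.allclose(A @ x, b))          # True: x really solves Ax = b
# x is a multiple of (1, 2, 0), hence lies in the row space:
print(np.allclose(np.cross(x, [1, 2, 0]), 0))   # True
```

Any other solution differs from x by a nonzero element of N(A), which is orthogonal to x, so by the Pythagorean theorem it is longer.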
► EXERCISES 4.4
*1. Show that if B is obtained from A by performing one or more row operations, then R(B) = R(A).
2. Let A = $\begin{bmatrix} 2 & 1 & 1 \\ 0 & 3 & 4 \\ 2 & -2 & -3 \end{bmatrix}$.
(a) Give constraint equations for C(A). (b) Find a basis for N(AT).
3. For each of the following matrices A, give bases for R(A), N(A), C(A), and N(AT). Check
dimensions and orthogonality.
(a) A = $\begin{bmatrix} 2 & 3 \\ 4 & 6 \end{bmatrix}$ (b) A = $\begin{bmatrix} 1 & 3 \\ 3 & 5 \\ 3 & 3 \end{bmatrix}$ (c) A = $\begin{bmatrix} -2 & 1 & 0 \\ -4 & 3 & -1 \end{bmatrix}$

(d) A = $\begin{bmatrix} 1 & -1 & 1 & 1 & 0 \\ 1 & 0 & 2 & 1 & 1 \\ 0 & 2 & 2 & 2 & 0 \\ -1 & 1 & -1 & 0 & -1 \end{bmatrix}$ (e) A = $\begin{bmatrix} 1 & 1 & 0 & 1 & -1 \\ 1 & 1 & 2 & -1 & 1 \\ 2 & 2 & 2 & 0 & 0 \\ -1 & -1 & 2 & -3 & 3 \end{bmatrix}$ *(f) A = $\begin{bmatrix} 1 & 1 & 0 & 5 & 0 & -1 \\ 0 & 1 & 1 & 3 & -2 & 0 \\ -1 & 2 & 3 & 4 & 1 & -6 \\ 0 & 4 & 4 & 12 & -1 & -7 \end{bmatrix}$
4. Given each matrix A, find matrices X and Y so that C(A) = N(X) and N(A) = C(Y).
(a) A = $\begin{bmatrix} 3 & -1 \\ 6 & -2 \\ -9 & 3 \end{bmatrix}$ (b) A = $\begin{bmatrix} 1 & 1 & 0 \\ 2 & 1 & 1 \\ 1 & -1 & 2 \end{bmatrix}$ (c) A = $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 0 \\ 1 & 1 & 1 \\ 1 & 0 & 2 \end{bmatrix}$
5. In each case, construct a matrix with the requisite properties or explain why no such matrix exists.
(a) The column space contains $\begin{bmatrix}1\\1\\1\end{bmatrix}$ and $\begin{bmatrix}0\\1\\1\end{bmatrix}$, and the nullspace contains …
*(b) The column space contains $\begin{bmatrix}0\\1\\1\end{bmatrix}$ and $\begin{bmatrix}1\\1\\1\end{bmatrix}$, and the nullspace contains …
*(c) The column space has basis $\begin{bmatrix}1\\0\\1\end{bmatrix}$, and the nullspace contains $\begin{bmatrix}1\\2\\0\end{bmatrix}$.
(d) The nullspace contains $\begin{bmatrix}1\\0\\1\end{bmatrix}$ and $\begin{bmatrix}-1\\2\\1\end{bmatrix}$, and the row space contains $\begin{bmatrix}1\\1\\-1\end{bmatrix}$.
*(e) The column space has basis $\begin{bmatrix}1\\0\\1\end{bmatrix}$, $\begin{bmatrix}0\\1\\1\end{bmatrix}$, and the row space has basis $\begin{bmatrix}1\\1\\1\end{bmatrix}$, $\begin{bmatrix}2\\0\\1\end{bmatrix}$.
(f) The column space and the nullspace both have basis $\begin{bmatrix}1\\0\end{bmatrix}$.
(g) The column space and the nullspace both have basis $\begin{bmatrix}1\\0\\0\end{bmatrix}$.
6. (a) Construct a 3 × 3 matrix A with C(A) ⊂ N(A).
(b) Construct a 3 × 3 matrix A with N(A) ⊂ C(A).
(c) Can there be a 3 × 3 matrix A with N(A) = C(A)? Why or why not?
(d) Can there be a 4 × 4 matrix A with N(A) = C(A)? Why or why not?
W = Span
(c) Give a matrix B so that the subspace W defined in part b can be written in the form W = N(B).
*10. Let A be an m × n matrix with rank r. Suppose A = BU, where U is in echelon form. Prove
that the first r columns of B give a basis for C(A). (In particular, if EA = U, where U is the echelon
form of A and E is the product of elementary matrices by which we reduce A to U, then the first r
columns of E⁻¹ give a basis for C(A).)
11. According to Proposition 4.10, if A is an m × n matrix, then for each b ∈ C(A), there is a unique
x ∈ R(A) with Ax = b. In each case, give a formula for that x.
of the other. In the case of the circle x² + y² = 1, we can solve for y as a function of x
locally near any point not on the x-axis (viz., y = ±√(1 − x²)), and for x as a function of y
near any point not on the y-axis (analogously).
But it is important to understand that going back and forth between these two approaches
can be far more difficult—if not impossible—in the nonlinear case. For example, with a
bit of luck, we can see that the parametric curve

$$g(t) = \begin{bmatrix} t^2 - 1 \\ t^3 - t \end{bmatrix}, \qquad t \in \mathbb{R},$$

is given by the algebraic equation y² = x²(x + 1) (the curve pictured in Figure 1.4(b) on
p. 55). On the other hand, the cycloid, presented parametrically as the image of the function

$$g(t) = \begin{bmatrix} t - \sin t \\ 1 - \cos t \end{bmatrix}, \qquad t \in \mathbb{R}$$
(see Figure 1.6 on p. 57) is obviously the graph y = f(x) for some function f, but I believe
no one can find f explicitly. Nor is there a function on ℝ² whose zero-set is the cycloid.
Nevertheless, it is easy to see that locally we can write x as a function of y away from
the cusps. On the other hand, given the hypocycloid x^{2/3} + y^{2/3} = 1, we can find the
parametrization

$$g(t) = \begin{bmatrix} \cos^3 t \\ \sin^3 t \end{bmatrix}, \qquad t \in [0, 2\pi],$$
Figure 5.1
lying on the x-axis, we can write y as a function of x (explicitly in this case: y = ±√(x³ − x)),
and near each of those three points we can write x as a function of y (explicitly only if you
know how to solve the cubic equation x³ − x = y² explicitly).
Given the hyperplane a · x = 0 in ℝⁿ, we can solve for xₙ as a function of x₁, …,
x_{n−1}—i.e., we can represent the hyperplane as a graph over the x₁ ⋯ x_{n−1}-plane—if and
only if aₙ ≠ 0 (and, likewise, we can solve for x_k in terms of the remaining variables if and
only if a_k ≠ 0). More generally, given a system of linear equations, we apply Gaussian
elimination and solve for the pivot variables as functions of the free variables. In particular,
as Theorem 4.4 shows, if rank(A) = r, then we solve for the r pivot variables as functions
of the n − r free variables.
Now, since the derivative gives us the best linear approximation of a function, we
expect that if the tangent plane to a surface at a point is a graph, then so locally should be
the surface, as depicted in Figure 5.2. We suggested in Section 4 of Chapter 3 that, given a
level surface f = c of a differentiable function f: ℝⁿ → ℝ, the vector ∇f(a)—provided
it is nonzero—should be the normal vector to the tangent plane at a; equivalently, the
subspace of ℝⁿ parallel to the tangent plane should be the nullspace of the matrix [Df(a)].
To establish these facts we need the Implicit Function Theorem, whose proof we delay to
Chapter 6.
Figure 5.2
Theorem 5.1 (Implicit Function Theorem, Simple Case) Suppose U ⊂ ℝⁿ is open,
a ∈ U, and f: U → ℝ is C¹. Suppose that f(a) = 0 and ∂f/∂xₙ(a) ≠ 0. Then there are
a neighborhood V of (a₁, …, a_{n−1}) and a C¹ function φ: V → ℝ so that, for x near a,
f(x) = 0 if and only if xₙ = φ(x₁, …, x_{n−1}).
That is, near a, the level surface f = 0 can be expressed as a graph over the x₁ ⋯ x_{n−1}-
plane; i.e., near a, the equation f = 0 defines xₙ implicitly as a function of the remaining
variables.
More generally, provided Df(a) ≠ 0, we know that some partial derivative ∂f/∂x_k(a) ≠ 0,
and so locally the equation f = 0 expresses x_k implicitly as a function of x₁, …, x_{k−1}, x_{k+1},
…, xₙ.
► EXAMPLE 1
Consider the curve

$$f\begin{pmatrix}x\\y\end{pmatrix} = y^3 - 3y - x = 0,$$

as shown in Figure 5.3. Although it is globally a graph of x as a function of y, we see that ∂f/∂y =
3(y² − 1) = 0 at the points $\pm\begin{bmatrix}-2\\1\end{bmatrix}$. Away from these points, y is given (implicitly) locally as
a function of x. We recognize these as the three (C¹) local inverse functions φ₁, φ₂, and φ₃ of
g(x) = x³ − 3x, defined, respectively, on the intervals (−2, ∞), (−2, 2), and (−∞, 2). ◄
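As an illustration (hypothetical numerics, added here and not part of the text), one can check that for any x strictly between −2 and 2 the equation y³ − 3y − x = 0 has three real solutions y, one for each local inverse φᵢ:

```python
import numpy as np

x = 0.5                                   # any x strictly between -2 and 2
roots = np.roots([1, 0, -3, -x])          # coefficients of y^3 + 0y^2 - 3y - x
real = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
print(len(real))                          # 3: one solution per branch
print(np.allclose(real**3 - 3*real - x, 0))   # True: each solves f(x, y) = 0
```

For x outside [−2, 2] the same computation yields a single real root, consistent with only one branch surviving there.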
► EXAMPLE 2
Consider the surface

$$f\begin{pmatrix}x\\y\\z\end{pmatrix} = z^2 + xz + y = 0,$$
Figure 5.4
pictured in Figure 5.4. Note first of all that it is globally a graph: y = −(z² + xz). On the other hand,
∂f/∂z = 2z + x = 0 on f = 0 precisely when x = −2z and y = z². That is, away from points of the
form

$$\begin{bmatrix} -2t \\ t^2 \\ t \end{bmatrix} \quad \text{for some } t \in \mathbb{R},$$

we can locally write z = φ(x, y). Of course, it doesn't take a wizard
to do so: We have

$$z = \frac{-x \pm \sqrt{x^2 - 4y}}{2},$$

and away from points of the designated form we can choose either the positive or negative square
root. It is along the curve 4y = x² (in the xy-plane) that the two roots of this quadratic equation
in z coalesce. (Note that this curve is the projection of the locus of points on the surface where
∂f/∂z = 0.) ◄
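A small numerical sketch (added for illustration; the sample values are arbitrary and not from the text) confirms both branches of the quadratic formula and the coalescence along 4y = x²:

```python
import numpy as np

# Both branches z = (-x ± sqrt(x^2 - 4y)) / 2 solve z^2 + x z + y = 0.
x, y = 1.0, -2.0                      # a point with x^2 - 4y > 0
z_plus = (-x + np.sqrt(x**2 - 4*y)) / 2
z_minus = (-x - np.sqrt(x**2 - 4*y)) / 2
for z in (z_plus, z_minus):
    print(np.isclose(z**2 + x*z + y, 0))   # True, True

t = 0.7                                # on the critical curve: x = -2t, y = t^2
x, y = -2*t, t**2
print(np.isclose(x**2 - 4*y, 0))       # True: the discriminant vanishes there
```

At the points of the designated form the square root vanishes, so the two local solutions z merge into one, exactly where no local function φ can be chosen smoothly.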
Differentiating the identity f(x₁, …, x_{n−1}, φ(x₁, …, x_{n−1})) = 0 by the chain rule gives

$$\begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \cdots & \dfrac{\partial f}{\partial x_{n-1}} & \dfrac{\partial f}{\partial x_n} \end{bmatrix}
\begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \\ \dfrac{\partial \varphi}{\partial x_1} & \cdots & \dfrac{\partial \varphi}{\partial x_{n-1}} \end{bmatrix} = \mathbf 0.$$

(Here all the derivatives of φ are evaluated at a, and all the derivatives of f are evaluated
at g(a) = a.) In particular, for any j = 1, …, n − 1, we have

$$\frac{\partial f}{\partial x_j}(a) + \frac{\partial f}{\partial x_n}(a)\,\frac{\partial \varphi}{\partial x_j}(a) = 0,$$
Proposition 5.3 Suppose U ⊂ ℝⁿ is open, a ∈ U, f: U → ℝ is C¹, and Df(a) ≠ 0.
Suppose f(a) = c. Then the tangent hyperplane at a of the level surface M = f⁻¹({c}) is
given by

$$Df(a)(\mathbf x - \mathbf a) = 0.$$

Proof Since Df(a) ≠ 0, we may assume without loss of generality that ∂f/∂xₙ(a) ≠ 0.
Applying Theorem 5.1 to the function f − c, we know that M can be expressed near a as
the graph xₙ = φ(x₁, …, x_{n−1}) for some C¹ function φ. Now, the tangent plane to the graph
at a = (a₁, …, a_{n−1}, φ(a₁, …, a_{n−1})) is the graph of Dφ, translated so that it passes through a:

$$x_n - a_n = D\varphi(a)(\mathbf x - \mathbf a) = \sum_{j=1}^{n-1} \frac{\partial \varphi}{\partial x_j}(a)(x_j - a_j),$$
From Theorem 5.1 we infer that if f: ℝⁿ → ℝ is C¹ and ∇f ≠ 0 on the level surface M = f⁻¹({c}), then at each point a ∈ M, we can locally represent M as a graph
over (at least) one of the n coordinate hyperplanes. We call such a set M a smooth
hypersurface or (n − 1)-dimensional manifold. More generally, a subset M ⊂ ℝⁿ is an
(n − m)-dimensional manifold if each point has a neighborhood that is a C¹ graph over
some (n − m)-dimensional coordinate plane. The general version of the Implicit Function
Theorem, which we shall prove in Chapter 6, tells us that this is true whenever M is the level
set of a C¹ function F: ℝⁿ → ℝᵐ with the property that rank(DF(x)) = m at every point
x ∈ M. Moreover, if we generalize the result of Proposition 5.3, the (n − m)-dimensional
tangent plane of M at a point a is then obtained by translating the (n − m)-dimensional
subspace N([DF(a)]) so that it passes through a.
► EXAMPLE 3
If we define F: ℝ³ → ℝ² by

$$F\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{bmatrix} x^2 + y^2 - a^2 \\ x^2 + z^2 - b^2 \end{bmatrix},$$

then M = F⁻¹({0}). To see that M is a 1-dimensional manifold, we check that rank(DF(x)) = 2 for
every x ∈ M. We have
Figure 5.5
5 The Nonlinear Case: Introduction to Manifolds 193
$$DF\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{bmatrix} 2x & 2y & 0 \\ 2x & 0 & 2z \end{bmatrix} \rightsquigarrow \begin{bmatrix} x & y & 0 \\ x & 0 & z \end{bmatrix}.$$

If x ≠ 0, this matrix will have two pivots, since y and z can't be simultaneously 0. If x = 0, then
both y and z are nonzero, and once again the matrix has two pivots. Thus, as claimed, the rank of
DF(x) is 2 for every x ∈ M, and so M is a smooth curve. ◄
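The rank computation can be spot-checked numerically. In this sketch (added here, not part of the text) the radii a = 2 and b = 3 are assumed values chosen for illustration; the excerpt does not fix them:

```python
import numpy as np

# Check rank DF = 2 at sample points of M = {x^2 + y^2 = a^2, x^2 + z^2 = b^2}.
a, b = 2.0, 3.0                        # assumed radii with b > a
for theta in np.linspace(0, 2*np.pi, 7):
    x = a*np.cos(theta)
    y = a*np.sin(theta)
    z = np.sqrt(b**2 - x**2)           # b > a, so the square root is defined
    DF = np.array([[2*x, 2*y, 0.],
                   [2*x, 0., 2*z]])
    assert np.linalg.matrix_rank(DF) == 2
print("rank DF = 2 at all sample points")
```

With b > a we always have z ≠ 0 on M, which is why the case analysis in the text succeeds at every sample point.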
► EXERCISES 4.5
1. Can one solve for one of the variables in terms of the other to express each of the following as a
graph? What about locally?
(a) xy = 0 (b) 2 sin(xy) = 1
2. Decide whether each of the following is a smooth curve (1-dimensional manifold). If not, what
are the trouble points?
(a) y² − x³ + x = 0 (d) x² + y² + z² − 1 = x² − x + y² = 0
(b) y² − x³ − x² = 0 (e) x² + y² + z² − 1 = z² − xy = 0
(c) z − xy = y − x² = 0
*3. Let

$$f\begin{pmatrix}x\\y\\z\end{pmatrix} = xy^2 + \sin(xz) + e^z \quad\text{and}\quad \mathbf a = \begin{bmatrix}1\\-1\\0\end{bmatrix}.$$
5. Prove that Sⁿ⁻¹ = {x ∈ ℝⁿ : ||x|| = 1} is an (n − 1)-dimensional manifold. (Hint: Note that
||x|| = 1 ⟺ ||x||² = 1.)
*6. Let f: ℝ³ → ℝ be given by

$$f\begin{pmatrix}x\\y\\z\end{pmatrix} = z^2 + 4x^3 z - 6xyz + 4y^3 - 3x^2 y^2.$$

Is M = f⁻¹({0}) a smooth surface (2-dimensional manifold)? If not, at what points does it fail to
be so?
7. Show that the intersection of the surfaces x² + 2y² + 3z² = 9 and x² + y² = z² is a smooth curve.
Find its tangent line at the point a = (1, 1, √2).
where all the partial derivatives on the right-hand side are evaluated at a.
13. Consider the three (pairwise) skew lines

$$\ell_1: \mathbf x = \begin{bmatrix}1\\0\\0\end{bmatrix} + t\,\dots, \qquad \ell_2: \mathbf x = \begin{bmatrix}0\\1\\0\end{bmatrix} + t\begin{bmatrix}1\\0\\1\end{bmatrix}, \qquad \ell_3: \dots$$
Show that through each point of £3 there is a single line that intersects both £1 and £2. Now, find
the equation of the surface formed by all the lines intersecting the three lines £1, £2, and £3. Is it
everywhere smooth? Sketch it.
14. Suppose X ⊂ ℝⁿ is a k-dimensional manifold and Y ⊂ ℝᵖ is an ℓ-dimensional manifold. Prove
that

$$X \times Y = \left\{ \begin{bmatrix}\mathbf x\\\mathbf y\end{bmatrix} \in \mathbb{R}^{n+p} : \mathbf x \in X \text{ and } \mathbf y \in Y \right\}$$

is a (k + ℓ)-dimensional manifold in ℝⁿ⁺ᵖ. (Hint: Recall that X is locally a graph over a k-dimensional
coordinate plane in ℝⁿ and Y is locally a graph over an ℓ-dimensional coordinate plane in ℝᵖ.)
15. (a) Suppose A is an n × (n + 1) matrix of rank n. Show that the 1-dimensional solution space
of Ax = b varies continuously with b ∈ ℝⁿ. (First you must decide what this means!)
(b) Generalize.
CHAPTER 5
EXTREMUM PROBLEMS
In this chapter we turn to one of the standard topics in differential calculus, solving maximum/minimum problems. In single-variable calculus, the strategy is to invoke the Maximum Value Theorem (which guarantees that a continuous function on a closed interval
achieves its maximum and minimum) and then to examine all critical points and the endpoints of the interval. In problems that are posed on open intervals, one must work harder
to understand the global behavior of the function. For example, it is not too hard to prove
that if a differentiable function has precisely one critical point on an interval and that critical
point is a local maximum point, then it must indeed be the global maximum point. As we
shall see, all of these issues are—not surprisingly—rather more subtle in higher dimensions.
But just to stimulate the reader’s geometric intuition, we pose a direct question here.
Query: Suppose f: ℝ² → ℝ is C¹ and there is exactly one point a at which the tangent plane
of the graph of f is horizontal. Suppose a is a local minimum point. Must it be a global
minimum point?
We close the chapter with a discussion of projections and inconsistent linear systems, along
with a brief treatment of inner product spaces.
1 Compactness and the Maximum Value Theorem ◄ 197
infinity, the function may have no maximum value. We now make the “obvious” definition
in higher dimensions:
Definition We say S ⊂ ℝⁿ is bounded if all the points of S lie in some ball centered
at the origin, i.e., if there is a constant M so that ||x|| < M for all x ∈ S. We say S ⊂ ℝⁿ
is compact if it is a bounded, closed subset. That is, all the points of S lie in some ball
centered at the origin, and any convergent sequence of points in S converges to a point
in S.
► EXAMPLE 1
We saw in Example 6 of Chapter 2, Section 2, that a closed interval in R is a closed subset, and it is
obviously bounded, so it is in fact compact. Here are a few more examples.
a. The unit sphere Sⁿ⁻¹ = {x ∈ ℝⁿ : ||x|| = 1} is compact. Indeed, by Corollary 3.7 of Chapter
2, any level set of a continuous function is closed, so provided we have a bounded set, it will
also be compact. (Note that we write Sⁿ⁻¹ because the sphere is an (n − 1)-dimensional
manifold, as Exercise 4.5.5 shows.)
b. Any rectangle [a₁, b₁] × ⋯ × [aₙ, bₙ] ⊂ ℝⁿ is compact. This set is obviously bounded,
and it is closed because of Exercise 2.2.4.
c. The set of 2 × 2 matrices of determinant 1 is a closed subset of ℝ⁴ (because the determinant
is a polynomial expression in the entries of the matrix) but is not compact. The set is
unbounded, as we can take matrices of the form

$$\begin{bmatrix} k & 0 \\ 0 & 1/k \end{bmatrix}$$

for arbitrarily large k. ◄
Theorem 1.1 If A ⊂ ℝⁿ is compact, and {a_k} is a sequence of points in A, then there
is a convergent subsequence {a_{k_j}} (which a fortiori converges to a point in A).
Proof We first prove that any sequence of points in a rectangle [a₁, b₁] × ⋯ ×
[aₙ, bₙ] ⊂ ℝⁿ has a convergent subsequence. (This was the result of Exercise 2.2.15, but
the argument is sufficiently subtle that we include the proof here.) We proceed by induction
on n.
Step (i): Suppose n = 1. Given a sequence {x_k} of real numbers with a ≤ x_k ≤ b
for all k, we claim that there is a convergent subsequence. If there are only finitely many
distinct numbers x_k, this is easy: At least one value must be taken on infinitely often, and
we choose k₁ < k₂ < ⋯ so that x_{k₁} = x_{k₂} = ⋯.
If there are infinitely many distinct numbers among the x_k, then we use the famous
"successive bisection" argument. Let I₀ = [a, b]. There must be infinitely many distinct
elements of our sequence either to the left of the midpoint of I₀ or to the right; let I₁ = [a₁, b₁]
be the half that contains infinitely many (if both do, let's agree to choose the left half).
Choose x_{k₁} ∈ I₁. At the next step, there must be infinitely many distinct elements of our
sequence either to the left or to the right of the midpoint of I₁. Let I₂ = [a₂, b₂] be the half
that contains infinitely many (and choose the left half if both do), and choose x_{k₂} ∈ I₂ with
k₁ < k₂. Continue this process inductively. Suppose we have the interval I_j = [a_j, b_j]
198 ► Chapter 5. Extremum Problems
containing infinitely many distinct elements of our sequence, as well as k₁ < k₂ < ⋯ < k_j
with x_{k_t} ∈ I_t for t = 1, 2, …, j. Then there must be infinitely many distinct elements of
our sequence either to the left or to the right of the midpoint of the interval I_j, and we let
I_{j+1} = [a_{j+1}, b_{j+1}] be the half that contains infinitely many (once again choosing the left
half if both do). We also choose x_{k_{j+1}} ∈ I_{j+1} with k_j < k_{j+1}.
At the end of all this, why does the subsequence {x_{k_j}} converge? Well, in fact, we
know what its limit must be. The set of left endpoints a_j is nonempty and bounded above
by b, hence has a least upper bound, a. First of all, the left endpoints a_j must converge to
a, because (see Figure 1.2)

$$a_1 \le a_2 \le \cdots \le a_j \le \cdots \le a \le \cdots \le b_j \le \cdots \le b_2 \le b_1,$$

and so a − a_j ≤ b_j − a_j = (b − a)/2ʲ → 0 as j → ∞. But since a and x_{k_j} both lie in
the interval [a_j, b_j], it follows that |a − x_{k_j}| ≤ b_j − a_j → 0 as j → ∞.
Figure 1.2
Step (ii): Suppose now n ≥ 2 and we know the result to be true in ℝⁿ⁻¹. Given a sequence
{x_k} of points in the rectangle [a₁, b₁] × ⋯ × [aₙ, bₙ] ⊂ ℝⁿ, let x̄_k denote the vector consisting of the first n − 1 coordinates of x_k, and consider the sequence {x̄_k} of
points in the rectangle [a₁, b₁] × ⋯ × [a_{n−1}, b_{n−1}] ⊂ ℝⁿ⁻¹. By our induction hypothesis,
there is a convergent subsequence {x̄_{k_j}}. Now the sequence of nth coordinates of the
corresponding vectors x_{k_j}, lying in the closed interval [aₙ, bₙ], has in turn a convergent
subsequence, indexed by k_{j₁} < k_{j₂} < ⋯ < k_{j_t} < ⋯. But then, by Exercises 2.2.6 and
2.2.2, it now follows that the subsequence {x_{k_{j_t}}} converges, as required.
Step (iii): Now we turn to the case of our general compact subset A. Since it is
bounded, it is contained in some ball B(0, R) centered at the origin, hence in some cube
[−R, R] × ⋯ × [−R, R]. Thus, given a sequence {x_k} of points in A, it lies in this cube,
and hence by what we've already proved has a convergent subsequence. The limit of that
subsequence is, of course, a point of the cube but must in fact lie in A since A is also closed.
This completes the proof. ∎
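The successive-bisection argument can be rendered as a finite algorithm. The following sketch (illustrative only, and not from the text; a "majority" count stands in for "infinitely many") halves the interval repeatedly and picks one term of the sequence from the chosen half at each stage:

```python
import math

def bisection_subsequence(xs, lo, hi, steps):
    """Repeatedly halve [lo, hi], keep the half holding most remaining
    terms, and pick one term (with increasing index) from it."""
    indices, last = [], -1
    for _ in range(steps):
        mid = (lo + hi) / 2
        tail = [(i, x) for i, x in enumerate(xs) if i > last and lo <= x <= hi]
        left = [p for p in tail if p[1] <= mid]
        if len(left) >= len(tail) - len(left):   # left half holds "most" terms
            hi, pick = mid, left[0]
        else:
            lo, pick = mid, next(p for p in tail if p[1] > mid)
        last = pick[0]
        indices.append(last)
    return lo, hi, indices

xs = [math.sin(k) for k in range(4000)]          # bounded, non-convergent
lo, hi, idx = bisection_subsequence(xs, -1.0, 1.0, 6)
print(hi - lo)                                    # 0.03125 = 2 / 2**6
print(abs(xs[idx[-1]] - xs[idx[-2]]) <= 0.0625)   # True: chosen terms are trapped
```

Just as in the proof, the chosen terms are trapped in intervals whose lengths halve at each stage, which is what forces the subsequence to converge.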
The result that is the cornerstone of our work in this chapter is the following
Theorem 1.2 (Maximum Value Theorem) Let X ⊂ ℝⁿ be compact, and let
f: X → ℝ be a continuous function.¹ Then f takes on its maximum and minimum values;
that is, there are points y and z ∈ X so that

$$f(\mathbf y) \le f(\mathbf x) \le f(\mathbf z) \quad \text{for all } \mathbf x \in X.$$
Proof First we show that f is bounded (by which we mean that the set of its values
is a bounded subset of ℝ). Assume to the contrary that the values of f are arbitrarily large.
Then for each k ∈ ℕ there is a point x_k ∈ X so that f(x_k) > k. By Theorem 1.1, since
X is compact, the sequence {x_k} has a convergent subsequence, say, x_{k_j} → a. Since f is
continuous, by Proposition 3.6 of Chapter 2, f(a) = lim_{j→∞} f(x_{k_j}), but this is impossible
since f(x_{k_j}) → ∞ as j → ∞. An identical argument shows that the values of f are
bounded below as well.
Since the set of values of f is bounded above, it has a least upper bound, M. By the
definition of least upper bound, for each k ∈ ℕ there is x_k ∈ X so that M − f(x_k) < 1/k. As
before, since X is compact, the sequence {x_k} has a convergent subsequence, say, x_{k_j} → z.
Then, by continuity, f(z) = lim_{j→∞} f(x_{k_j}) = M, so f takes on its maximum value at z. An
identical argument shows that f takes on its minimum value as well. ∎
We infer from Theorem 1.2 that, given any linear map T: ℝⁿ → ℝᵐ, the function
f: Sⁿ⁻¹ → ℝ,

$$f(\mathbf x) = \|T(\mathbf x)\|,$$

is continuous (see Exercises 2.3.2 and 2.3.7 and Proposition 3.5 of Chapter 2). Therefore,
f takes on its maximum value, which we denote by ||T||, called the norm of T:

$$\|T\| = \max_{\|\mathbf x\| = 1} \|T(\mathbf x)\|.$$
Proposition 1.3 Let T: ℝⁿ → ℝᵐ be a linear map. Then for any x ∈ ℝⁿ, we have
||T(x)|| ≤ ||T|| ||x||.
Moreover, for any scalar c we have ||cT|| = |c| ||T||; and if S: ℝⁿ → ℝᵐ is another linear
map, we have ||S + T|| ≤ ||S|| + ||T||.
¹ Although we have not heretofore defined continuity of a function defined on an arbitrary subset of ℝⁿ, there is
no serious problem. We say f: X → ℝ is continuous at a ∈ X if, given any ε > 0, there is δ > 0 so that
Proof For any x ≠ 0, since x/||x|| is a unit vector, we have

$$\|T(\mathbf x)\| = \|\mathbf x\| \left\| T\!\left(\frac{\mathbf x}{\|\mathbf x\|}\right) \right\| \le \|T\|\,\|\mathbf x\|,$$

as required.
That max_{||x||=1} ||cT(x)|| = |c| max_{||x||=1} ||T(x)|| = |c| ||T|| is evident. Now, last, since
||(S + T)(x)|| ≤ ||S(x)|| + ||T(x)|| ≤ ||S|| + ||T|| for any unit vector x, we have ||S + T|| ≤ ||S|| + ||T||. ∎
We will compute a few nontrivial examples of the norm of a linear map in the Exercises
of Section 4, but in the meantime we have the following.
► EXAMPLE 2
Let A be an n × n diagonal matrix, with diagonal entries d₁, …, dₙ. Then for any x ∈ Sⁿ⁻¹ we have

$$\|A\mathbf x\|^2 = d_1^2 x_1^2 + \cdots + d_n^2 x_n^2 \le \max(|d_1|, \dots, |d_n|)^2 (x_1^2 + \cdots + x_n^2) = \max(|d_1|, \dots, |d_n|)^2.$$

Note, moreover, that this maximum value is achieved, for if max(|d₁|, …, |dₙ|) = |dᵢ|, then
Aeᵢ = dᵢeᵢ and ||Aeᵢ|| = |dᵢ|. Thus, we conclude that ||A|| = max(|d₁|, …, |dₙ|). ◄
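For comparison (a sketch using numpy, added here and not part of the text): numerically, the operator norm of a matrix is its largest singular value, and for a diagonal matrix this recovers max |dᵢ|:

```python
import numpy as np

# The operator norm ||A|| equals the largest singular value (ord=2 in numpy).
d = np.array([3.0, -5.0, 1.0])
A = np.diag(d)
print(np.isclose(np.linalg.norm(A, 2), np.max(np.abs(d))))   # True: ||A|| = 5

# Proposition 1.3 in action: no unit vector can beat the norm.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
x /= np.linalg.norm(x)
print(np.linalg.norm(A @ x) <= np.linalg.norm(A, 2) + 1e-12)  # True
```

The maximizing unit vector is the standard basis vector eᵢ for the largest |dᵢ|, exactly as in the example.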
For future reference, we include the following important and surprising result.
Proof We argue by contradiction. Suppose that for some ε₀ > 0 there were no such
δ > 0. Then for every m ∈ ℕ, we could find x_m, y_m ∈ X with ||x_m − y_m|| < 1/m and
|f(x_m) − f(y_m)| ≥ ε₀. Since X is compact, we may choose a convergent subsequence
x_{m_k} → a. Now since ||x_m − y_m|| → 0 as m → ∞, it must be the case that y_{m_k} → a as well.
Since f is continuous at a, given ε₀ > 0, there is δ₀ > 0 so that whenever ||x − a|| < δ₀,
we have |f(x) − f(a)| < ε₀/2. By the triangle inequality, whenever k is sufficiently large
that ||x_{m_k} − a|| < δ₀ and ||y_{m_k} − a|| < δ₀, we have

$$|f(\mathbf x_{m_k}) - f(\mathbf y_{m_k})| \le |f(\mathbf x_{m_k}) - f(\mathbf a)| + |f(\mathbf a) - f(\mathbf y_{m_k})| < \varepsilon_0,$$

contradicting |f(x_{m_k}) − f(y_{m_k})| ≥ ε₀. ∎
► EXERCISES 5.1
*1. Which of the following are compact subsets of the given ℝⁿ? Give your reasoning. (Identify the
space of all n × n matrices with ℝⁿ².)

(a) {(x, y) ∈ ℝ² : x² + y² = 1}
(b) {(x, y) ∈ ℝ² : x² + y² < 1}
(c) {(x, y) ∈ ℝ² : x² − y² = 1}
(f) {(cos t, sin t) ∈ ℝ² : t ∈ ℝ}
(g) {(eᵗ cos t, eᵗ sin t) ∈ ℝ² : t < 0}
(h) {(x, y, z) ∈ ℝ³ : x² + y² + z² < 1}
(i) {(x, y, z) ∈ ℝ³ : x³ + y³ + z³ < 1}
(l) {3 × 3 matrices A : AᵀA = I}
#6. Suppose T: ℝⁿ → ℝᵐ and S: ℝᵐ → ℝˡ are linear maps. Show that ||S∘T|| ≤ ||S|| ||T||. (In
particular, when A is an ℓ × m matrix and B is an m × n matrix, we have ||AB|| ≤ ||A|| ||B||.)
7. Let A be an m x n matrix. Show that || A || = || AT||. (Hint: Start by showing that || A || < ||AT|| by
using Proposition 4.5 of Chapter 1.)
8. Suppose S ⊂ ℝⁿ is compact and a ∈ ℝⁿ is fixed. Show that there is a point of S closest to a. (Hint:
Use Exercise 2.3.2.)
*9. Suppose S ⊂ ℝⁿ has the property that any sequence of points in S has a subsequence converging
to a point in S. Prove that S is compact.
10. Suppose f: X → ℝᵐ is continuous and X is compact. Prove that the set f(X) = {y ∈ ℝᵐ :
y = f(x) for some x ∈ X} is compact. (Hint: Use Exercise 9.)
11. Suppose S₁ ⊃ S₂ ⊃ S₃ ⊃ ⋯ are nonempty compact subsets of ℝⁿ. Prove that there is x ∈ ℝⁿ so
that x ∈ S_k for all k ∈ ℕ. (Cf. Exercise 2.2.10.)
#12. Suppose X ⊂ ℝⁿ is a compact set. Suppose U₁, U₂, U₃, … ⊂ ℝⁿ are open sets whose union
contains X. Prove that for some N ∈ ℕ we have X ⊂ U₁ ∪ ⋯ ∪ U_N. (Hint: If not, for each k, choose
x_k ∈ X so that x_k ∉ U₁ ∪ ⋯ ∪ U_k.)
13. Suppose X ⊂ ℝⁿ is compact, and U₁, U₂, U₃, … ⊂ ℝⁿ are open sets whose union contains X.
Prove that there is a number δ > 0 so that for every x ∈ X, there is some j ∈ ℕ so that B(x, δ) ⊂ U_j.
(Hint: If not, for each k ∈ ℕ, what happens with δ = 1/k?)
► EXAMPLE 1
If

$$f(x) = \begin{cases} 1, & x \in \mathbb{Q} \\ 0, & x \notin \mathbb{Q}, \end{cases}$$

then every point a ∈ ℚ is a global maximum point and every point a ∉ ℚ is
a global minimum point. ◄
► EXAMPLE 2
From this we infer that 0 is a global minimum point. Indeed, (x + y)² + 2y² = 0 if and only if
x + y = y = 0, if and only if x = y = 0, so 0 is the only global minimum point of f. But is 0 the
only extremum?
Proof Suppose that a is a local minimum (the case of a local maximum is left to the
reader). Then for any v ∈ ℝⁿ, there is δ > 0 so that we have

$$f(\mathbf a + t\mathbf v) - f(\mathbf a) \ge 0 \quad \text{for all real numbers } t \text{ with } |t| < \delta.$$
Remark Geometrically, if we consider f as a function of xᵢ only, fixing all the other
variables, we get a curve with a local minimum at aᵢ, which must therefore have a flat
tangent line. That is, all partial derivatives of f at a must be 0, and so the tangent plane
must be horizontal.
► EXAMPLE 3
The prototypical example of a saddle point is provided by the function f x2 — y2. The origin
parabolas opening upward in the x-direction and those opening downward in the y-direction (see
Figure 2.3(a)).
A somewhat more interesting example is provided by the so-called monkey saddle, pictured in
Figure 2.3(b), which is the graph of f — 3xy2 — x3. Note that whereas the usual saddle surface
allows room for the legs, in the case of the monkey saddle there is also room for the monkey’s tail. ■*4
Figure 2.3
Now we turn to the standard fare in differential calculus, the typical “applied extremum
problems.” If we are fortunate enough to have a differentiable function on a compact
region X, then the Maximum Value Theorem guarantees both a global maximum and a
global minimum, and we can test for critical points on the interior of X (points having
a neighborhood wholly contained in X). It still remains to examine the function on the
boundary of X, as well.
2 Maximum/Minimum Problems ◄ 205
► EXAMPLE 4
We want to find the hottest and coldest points on the metal plate R = [0, π] × [0, π], whose temperature is given by f(x, y) = sin x + cos 2y. Since f is continuous and R is compact, we know the
maximum and minimum values are achieved. We have

$$Df\begin{pmatrix}x\\y\end{pmatrix} = \begin{bmatrix} \cos x & -2\sin 2y \end{bmatrix},$$

and so the only critical point in the interior of R is (π/2, π/2). The boundary of R consists of four
segments. On the bottom and top edges C₁ and C₃ we have f = sin x + 1,
x ∈ [0, π], which achieves a maximum at π/2 and minima at 0 and π. Similarly, on C₂ and C₄
we have f = cos 2y, y ∈ [0, π], which achieves its maximum at 0 and π and its
minimum at π/2. We now mark the values of f at the nine points we've unearthed. We see that the
hottest points are (π/2, 0) and (π/2, π), and the coldest points are (0, π/2) and (π, π/2). On the other
hand, the critical point at the center of the square is a saddle point (why?). ◄
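A brute-force check of this example (illustrative numerics, added here and not part of the text) evaluates the temperature on a fine grid over R:

```python
import numpy as np

# Grid search over the plate R = [0, pi] x [0, pi] for f = sin x + cos 2y.
f = lambda x, y: np.sin(x) + np.cos(2*y)
xs = np.linspace(0, np.pi, 201)
X, Y = np.meshgrid(xs, xs)
Z = f(X, Y)
print(np.isclose(Z.max(), 2.0))   # True: hottest value, at (pi/2, 0) and (pi/2, pi)
print(np.isclose(Z.min(), -1.0))  # True: coldest value, at (0, pi/2) and (pi, pi/2)
```

The grid extremes sit on the boundary, in agreement with the analysis: the lone interior critical point is only a saddle.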
Somewhat more challenging are extremum problems where the domain is not naturally
compact. Consider the following
► EXAMPLE 5
Of all rectangular boxes with no lid and having a volume of 4 m3, we wish to determine the dimensions
of the one with least total surface area. Let x, y, and z represent the length, width, and height of the
box, respectively, measured in meters (see Figure 2.5). Given that xyz = 4, we wish to minimize the
surface area xy + 2z(x + y). Substituting z = 4/(xy), we then define the surface area as a function of
the independent variables x and y:

$$f\begin{pmatrix}x\\y\end{pmatrix} = xy + \frac{8}{xy}(x + y) = xy + 8\left(\frac{1}{x} + \frac{1}{y}\right).$$
Note that the domain of f is the open first quadrant, i.e., X = {(x, y) : x > 0 and y > 0}, which is
definitely not compact. What guarantees that our function f achieves a minimum value on X? (Note,
for example, that f has no maximum value on X.) The heuristic answer is this: If either x or y gets
either very small or very large, the value of f gets very large. We shall make this precise soon.
Let's first of all find the critical points of f. We have

$$\frac{\partial f}{\partial x} = y - \frac{8}{x^2} = 0, \qquad \frac{\partial f}{\partial y} = x - \frac{8}{y^2} = 0,$$

whence x = y = 2. The sole critical point is a = (2, 2), and f(a) = 12. Now it is not difficult to
establish the fact that a is the global minimum point of f. Let

$$S = \left\{ (x, y) : x \ge \tfrac12,\ y \ge \tfrac12,\ \text{and } xy \le 12 \right\},$$

as in Figure 2.5(b). Then S is compact, so the restriction of f to the set S attains its global minimum
value. Here is the crucial point: Whenever (x, y) is on the boundary of or outside S, we have
f(x, y) > 12. (For if either 0 < x ≤ 1/2 or 0 < y ≤ 1/2, then we have f(x, y) ≥ 8(1/x + 1/y) > 12;
and if xy ≥ 12, then we have f(x, y) > 12.) Since f(a) = 12, it follows that the global minimum of f
on S cannot occur on the boundary of S, hence must occur at an interior point, and therefore at a
critical point of f. It follows that a is the global minimum point of f on S, hence on all of X, since
f(x) > f(a) whenever x ∉ S.
In summary, the box of least surface area has dimensions 2 m × 2 m × 1 m. ◄
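The conclusion can be sanity-checked numerically (an added sketch, not from the text):

```python
import numpy as np

# Grid search for the minimum of f(x, y) = xy + 8/x + 8/y over a region
# covering the compact set S used in the argument.
f = lambda x, y: x*y + 8/x + 8/y
xs = np.linspace(0.5, 24, 2000)
X, Y = np.meshgrid(xs, xs)
Z = f(X, Y)
i, j = np.unravel_index(np.argmin(Z), Z.shape)
print(round(X[i, j], 1), round(Y[i, j], 1))   # 2.0 2.0
print(np.isclose(Z.min(), 12.0, atol=1e-3))   # True: minimum value is 12
```

The grid minimum lands at (2, 2) with value 12, matching the critical-point analysis and the 2 m × 2 m × 1 m box.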
► EXERCISES 5.2
(b) f(x, y) = xy + x − y
*(i) f(x, y, z) = xyz − x² − y² + z²
(c) f(x, y) = sin x + sin y
2. A rectangular box with edges parallel to the coordinate axes has one corner at the origin and the
opposite corner on the plane x + 2y + 3z = 6. What is the maximum possible volume of the box?
*3. A rectangular box is inscribed in a hemisphere of radius r. Find the dimensions of the box of
maximum volume.
*4. The temperature of the circular plate D = {x : ||x|| ≤ √2} ⊂ ℝ² is given by the function f(x, y) =
x² + 2y² − 2x. Find the maximum and minimum values of the temperature on D.
5. Two non-overlapping rectangles with their sides parallel to the coordinate axes are inscribed in
the triangle with vertices at (0, 0), (1, 0), and (0, 1). What configuration will maximize the sum of
their areas?
#6. A post office employee has 12 ft² of cardboard from which to construct a rectangular box with no
lid. Find the dimensions of the box with the largest possible volume.
7. Show that the rectangular box of maximum volume with a given surface area is a cube.
8. The material for the sides of a rectangular box costs twice as much per ft² as that for the top and bottom. Find the relative dimensions of the box with greatest volume that can be constructed for a given cost.
9. Find the equation of the plane through the point $\begin{pmatrix}1\\2\\2\end{pmatrix}$ that cuts off the smallest possible volume in the first octant.
208 ► Chapter 5. Extremum Problems
*10. A long, flat piece of sheet metal, 12" wide, is to be bent to form a long trough whose cross sections are isosceles trapezoids. Find the shape of the trough with maximum cross-sectional area. (Hint: It will help to use an angle as one of your variables.)
11. A pentagon is formed by placing an isosceles triangle atop a rectangle. If the perimeter P of
the pentagon is fixed, find the dimensions of the rectangle and the height of the triangle that give the
pentagon of maximum area.
12. An ellipse is formed by intersecting the cylinder x2 + y2 = 1 and the plane x + 2y + z = 0. Find
the highest and lowest points on the ellipse. (As usual, the z-axis is vertical.)
13. Suppose x, y, and z are positive numbers with xy2z3 = 108. Find (with proof) the minimum
value of their sum.
14. Let $\mathbf a_1, \dots, \mathbf a_k \in \mathbb R^n$ be fixed points. Show that the function
$$f(\mathbf x) = \sum_{j=1}^k \|\mathbf x - \mathbf a_j\|^2$$
has a global minimum and find the global minimum point.
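Exercise 14 can be explored numerically. The following sketch uses made-up points $\mathbf a_j$ and takes as an assumption the standard answer (the centroid of the $\mathbf a_j$, where the gradient $2\sum_j(\mathbf x - \mathbf a_j)$ vanishes):

```python
import numpy as np

# Made-up data: five fixed points a_j in R^3.
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 3))

def f(x):
    # f(x) = sum_j ||x - a_j||^2
    return np.sum(np.linalg.norm(x - a, axis=1) ** 2)

# Candidate minimizer (assumption: the centroid, since the gradient
# 2 * sum_j (x - a_j) vanishes there and f is strictly convex).
centroid = a.mean(axis=0)
for _ in range(100):
    x = centroid + 0.5 * rng.standard_normal(3)
    assert f(x) >= f(centroid)   # no random perturbation does better
```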
15. (Cf. Exercise 14.) Let $\mathbf a_1, \mathbf a_2, \mathbf a_3 \in \mathbb R^2$ be three noncollinear points. Show that the function
$$f(\mathbf x) = \sum_{j=1}^3 \|\mathbf x - \mathbf a_j\|$$
has a global minimum and characterize the global minimum point. (Hint: Your answer will be
geometric in nature. Can you give an explicit geometric construction?)
Proof Define the polynomial $P$ by $P(t) = g(0) + g'(0)t + Ct^2$, where $C = g(1) - g(0) - g'(0)$. This choice of $C$ makes $P(1) = g(1)$, and it is easy to see that $P(0) = g(0)$ and $P'(0) = g'(0)$ as well, as shown in Figure 3.1. Then the function $h = g - P$ satisfies $h(0) = h'(0) = h(1) = 0$. By Rolle's Theorem, since $h(0) = h(1) = 0$, there is $c \in (0, 1)$
3 Quadratic Forms and the Second Derivative Test 209
Figure 3.1
so that $h'(c) = 0$. By Rolle's Theorem applied to $h'$, since $h'(0) = h'(c) = 0$, there is $\xi \in (0, c)$ so that $h''(\xi) = 0$. This means that $g''(\xi) = P''(\xi) = 2C$, and so
$$g(1) = P(1) = g(0) + g'(0) + \tfrac12 g''(\xi),$$
as required. ∎
The derivative in the multivariable setting becomes a linear map (or vector); as we shall
soon see, the second derivative should become a quadratic form, i.e., a quadratic function
of a vector variable.
If $f$ is $C^2$, we form the matrix of second-order partial derivatives
$$\mathrm{Hess}(f)(\mathbf a) = \left[\frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf a)\right] = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2}(\mathbf a) & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n}(\mathbf a) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1}(\mathbf a) & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}(\mathbf a) \end{bmatrix}.$$
$\mathrm{Hess}(f)(\mathbf a)$ is called the Hessian matrix of $f$ at $\mathbf a$. Define the associated quadratic form $\mathcal H_{f,\mathbf a}: \mathbb R^n \to \mathbb R$ by
$$\mathcal H_{f,\mathbf a}(\mathbf h) = \mathbf h^{\mathsf T}\bigl(\mathrm{Hess}(f)(\mathbf a)\bigr)\mathbf h = \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf a)\,h_i h_j.$$
Proposition 3.2 Suppose $f: B(\mathbf a, r) \to \mathbb R$ is $C^2$. Then for all $\mathbf h$ with $\|\mathbf h\| < r$ we have
$$f(\mathbf a + \mathbf h) = f(\mathbf a) + Df(\mathbf a)\mathbf h + \tfrac12 \mathcal H_{f,\mathbf a + \xi\mathbf h}(\mathbf h) \quad\text{for some } \xi \in (0, 1).$$
Consequently,
$$f(\mathbf a + \mathbf h) = f(\mathbf a) + Df(\mathbf a)\mathbf h + \tfrac12 \mathcal H_{f,\mathbf a}(\mathbf h) + E(\mathbf h), \quad\text{where } \lim_{\mathbf h \to \mathbf 0} \frac{E(\mathbf h)}{\|\mathbf h\|^2} = 0.$$
Proof We apply Lemma 3.1 to the function $g(t) = f(\mathbf a + t\mathbf h)$. Using the chain rule twice (and applying Theorem 6.1 of Chapter 3 as well), we have
$$g'(t) = Df(\mathbf a + t\mathbf h)\mathbf h = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\mathbf a + t\mathbf h)\,h_i,$$
$$g''(t) = \sum_{i=1}^n \left(\sum_{j=1}^n \frac{\partial^2 f}{\partial x_j\,\partial x_i}(\mathbf a + t\mathbf h)\,h_j\right)h_i = \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf a + t\mathbf h)\,h_i h_j = \mathcal H_{f,\mathbf a + t\mathbf h}(\mathbf h),$$
and so the first formula follows from Lemma 3.1.
Using the Cauchy-Schwarz inequality, Proposition 2.3 of Chapter 1, and Proposition 1.3, we find that $|\mathbf h^{\mathsf T}A\mathbf h| \le \|A\|\,\|\mathbf h\|^2$ for any matrix $A$. Given $\varepsilon > 0$, since $f$ is $C^2$, there is $\delta > 0$ so that $\|\mathrm{Hess}(f)(\mathbf a + \xi\mathbf h) - \mathrm{Hess}(f)(\mathbf a)\| < 2\varepsilon$ whenever $\|\mathbf h\| < \delta$ and $0 < \xi < 1$. So we have
$$|E(\mathbf h)| = \tfrac12\bigl|\mathcal H_{f,\mathbf a + \xi\mathbf h}(\mathbf h) - \mathcal H_{f,\mathbf a}(\mathbf h)\bigr| \le \tfrac12 \cdot 2\varepsilon\,\|\mathbf h\|^2 = \varepsilon\|\mathbf h\|^2$$
whenever $\|\mathbf h\| < \delta$. Since $\varepsilon > 0$ was arbitrary, this proves the result. ∎
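Proposition 3.2 can be probed numerically. The test function below is our own choice, not from the text; the point is that the ratio $|E(\mathbf h)|/\|\mathbf h\|^2$ shrinks as $\mathbf h \to \mathbf 0$.

```python
import numpy as np

# Sample function (our own choice): f(x, y) = e^x sin y, expanded at the
# origin, where the gradient is (0, 1) and the Hessian is [[0,1],[1,0]].
def f(x, y):
    return np.exp(x) * np.sin(y)

Df = np.array([0.0, 1.0])                 # gradient of f at (0, 0)
H = np.array([[0.0, 1.0], [1.0, 0.0]])    # Hessian of f at (0, 0)

h = np.array([0.3, -0.2])
ratios = []
for t in [1.0, 0.1, 0.01]:
    th = t * h
    E = f(*th) - (f(0.0, 0.0) + Df @ th + 0.5 * th @ H @ th)
    ratios.append(abs(E) / (th @ th))
print(ratios)   # the ratios |E(h)| / ||h||^2 shrink toward 0
```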
► EXAMPLE 1
a. The quadratic form $Q(\mathbf x) = x_1^2 + 4x_1x_2 + 5x_2^2 = \mathbf x^{\mathsf T}\begin{bmatrix}1&2\\2&5\end{bmatrix}\mathbf x$ is positive definite, as we see by completing the square:
$$Q(\mathbf x) = (x_1 + 2x_2)^2 + x_2^2,$$
which, being the sum of two squares (with positive coefficients), is nonnegative and can vanish only if $x_2 = x_1 + 2x_2 = 0$, i.e., only if $\mathbf x = \mathbf 0$.
b. The quadratic form $Q(\mathbf x) = x_1^2 + 2x_1x_2 - x_2^2 = \mathbf x^{\mathsf T}\begin{bmatrix}1&1\\1&-1\end{bmatrix}\mathbf x$ is indefinite, as we can see by completing the square: $Q(\mathbf x) = (x_1 + x_2)^2 - 2x_2^2$, which is positive at $\binom10$ and negative at $\binom{-1}{1}$.
$$f(\mathbf a + \mathbf h) - f(\mathbf a) = \tfrac12\mathcal H_{f,\mathbf a}(\mathbf h) + E(\mathbf h), \quad\text{where } \frac{|E(\mathbf h)|}{\|\mathbf h\|^2} < \varepsilon \text{ whenever } 0 < \|\mathbf h\| < \delta.$$
Suppose now that $\mathcal H_{f,\mathbf a}$ is positive definite. By the Maximum Value Theorem, Theorem 1.2, there is a number $m > 0$ so that $\mathcal H_{f,\mathbf a}(\mathbf x) \ge m$ for all unit vectors $\mathbf x$. This means that $\mathcal H_{f,\mathbf a}(\mathbf h) \ge m\|\mathbf h\|^2$ for all $\mathbf h$. So now, choosing $\varepsilon = m/4$, we have
$$f(\mathbf a + \mathbf h) - f(\mathbf a) \ge \tfrac m2\|\mathbf h\|^2 - \tfrac m4\|\mathbf h\|^2 = \tfrac m4\|\mathbf h\|^2 > 0$$
for all $\mathbf h \ne \mathbf 0$ with $\|\mathbf h\| < \delta$. This means that $\mathbf a$ is a local minimum, as desired. The negative definite case is analogous.
Now suppose $\mathcal H_{f,\mathbf a}$ is indefinite. Then there are unit vectors $\mathbf x$ and $\mathbf y$ so that $\mathcal H_{f,\mathbf a}(\mathbf x) = m_1 > 0$ and $\mathcal H_{f,\mathbf a}(\mathbf y) = m_2 < 0$. Choose $\varepsilon = \tfrac14\min(m_1, -m_2)$. Now, letting $\mathbf h = t\mathbf x$ (resp., $t\mathbf y$) with $|t| < \delta$, we see that
$$f(\mathbf a + t\mathbf x) - f(\mathbf a) \ge \tfrac14 m_1 t^2 > 0 \quad\text{and}\quad f(\mathbf a + t\mathbf y) - f(\mathbf a) \le \tfrac14 m_2 t^2 < 0.$$
Last, note that if $\mathcal H_{f,\mathbf a}$ is positive semidefinite, then $\mathbf a$ may be either a local minimum,
Proof This is just the usual process of completing the square: When $A \ne 0$,
$$Ax^2 + 2Bxy + Cy^2 = A\left(x + \frac BA y\right)^2 + \left(C - \frac{B^2}{A}\right)y^2 = A\left(x + \frac BA y\right)^2 + \left(\frac{AC - B^2}{A}\right)y^2,$$
so the quadratic form is positive definite when $A > 0$ and $AC - B^2 > 0$, negative definite when $A < 0$ and $AC - B^2 > 0$, and indefinite when $AC - B^2 < 0$. When $A = 0$, we have $2Bxy + Cy^2 = y(2Bx + Cy)$, and so the quadratic form is indefinite provided $B \ne 0$, i.e., provided $AC - B^2 < 0$. ∎
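The classification just proved is easy to mechanize. Here is a minimal helper (our own naming, not from the text) applying the $A$ and $AC - B^2$ test, checked against Example 1:

```python
# Classify Q(x, y) = A x^2 + 2 B x y + C y^2 by the sign tests just proved.
def classify(A, B, C):
    det = A * C - B * B
    if det > 0:
        return "positive definite" if A > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "semidefinite (degenerate)"

print(classify(1, 2, 5))    # Example 1a: x1^2 + 4 x1 x2 + 5 x2^2
print(classify(1, 1, -1))   # Example 1b: x1^2 + 2 x1 x2 - x2^2
```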
► EXAMPLE 2
Let's find and classify the critical points of the function $f: \mathbb R^2 \to \mathbb R$, $f\binom xy = x^3 + y^2 - 6xy$. Then
$$Df\binom xy = \begin{bmatrix} 3x^2 - 6y & 2y - 6x \end{bmatrix},$$
and so at a critical point we must have $2y = x^2 = 6x$. Thus, the critical points are $\mathbf a = \binom00$ and $\mathbf b = \binom6{18}$. Now, we calculate the Hessian:
$$\mathrm{Hess}(f)\binom xy = \begin{bmatrix} 6x & -6 \\ -6 & 2 \end{bmatrix},$$
and so
$$\mathrm{Hess}(f)(\mathbf a) = \begin{bmatrix} 0 & -6 \\ -6 & 2 \end{bmatrix} \quad\text{and}\quad \mathrm{Hess}(f)(\mathbf b) = \begin{bmatrix} 36 & -6 \\ -6 & 2 \end{bmatrix}.$$
We see that $\mathcal H_{f,\mathbf a}$ is indefinite, so $\mathbf a$ is a saddle point, and $\mathcal H_{f,\mathbf b}$ is positive definite, so $\mathbf b$ is a local minimum point. ◄
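One way to double-check Example 2 is to compute the eigenvalues of the two Hessians; their signs determine the definiteness of the quadratic form (a fact developed fully in Chapter 9):

```python
import numpy as np

# Hessian of f(x, y) = x^3 + y^2 - 6xy from Example 2.
def hessian(x, y):
    return np.array([[6.0 * x, -6.0], [-6.0, 2.0]])

eig_a = np.linalg.eigvalsh(hessian(0.0, 0.0))    # at a = (0, 0)
eig_b = np.linalg.eigvalsh(hessian(6.0, 18.0))   # at b = (6, 18)
print(eig_a)   # one negative, one positive: indefinite (saddle)
print(eig_b)   # both positive: positive definite (local minimum)
```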
The process of completing the square as we’ve done in Example 1 can be couched in
matrix language; indeed, it is intimately related to the reduction to echelon form, as we
shall now see.
► EXAMPLE 3
Let's follow the row reduction of the symmetric matrix
$$A = \begin{bmatrix} 1 & 3 & 2 \\ 3 & 4 & -4 \\ 2 & -4 & -10 \end{bmatrix},$$
clearing out the first column below the pivot:
$$A' = E_1A = \begin{bmatrix} 1 & 3 & 2 \\ 0 & -5 & -10 \\ 0 & -10 & -14 \end{bmatrix}, \quad\text{where}\quad E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}.$$
There are already two interesting observations to make: The first column of $E_1^{-1}$ is the transpose of the first row of $A$ (hence of $A'$); and if we remove the first row and column from $A'$, what's left is also symmetric. Indeed, we can write
$$A = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}\begin{bmatrix} 1 & 3 & 2 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & -5 & -10 \\ 0 & -10 & -14 \end{bmatrix};$$
since the first term is symmetric (why?), the latter term must be as well. Now we just continue:
$$A' = \begin{bmatrix} 1 & 3 & 2 \\ 0 & -5 & -10 \\ 0 & -10 & -14 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1 & 3 & 2 \\ 0 & -5 & -10 \\ 0 & 0 & 6 \end{bmatrix} = U,$$
and
$$L = E_1^{-1}E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 2 & 2 & 1 \end{bmatrix}$$
is a lower triangular matrix with 1's on the diagonal. Now here comes the amazing thing: If we factor out the diagonal entries of the echelon matrix $U$, we are left with $L^{\mathsf T}$:
$$U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & 6 \end{bmatrix}\begin{bmatrix} 1 & 3 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} = DL^{\mathsf T},$$
and so
$$A = LDL^{\mathsf T}.$$
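The elimination in Example 3 can be packaged as a small routine. This is a sketch of our own, assuming nonzero pivots (no row exchanges), which holds for this $A$:

```python
import numpy as np

# Symmetric LDL^T factorization without pivoting (assumes nonzero pivots).
def ldlt(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, np.diag(d)

A = np.array([[1, 3, 2], [3, 4, -4], [2, -4, -10]], dtype=float)
L, D = ldlt(A)
print(L)           # lower triangular factor: rows (1,0,0), (3,1,0), (2,2,1)
print(np.diag(D))  # pivots 1, -5, 6, as in the example
```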
Remark Of course, not every symmetric matrix can be written in the form $LDL^{\mathsf T}$; e.g., take $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. The problem arises when we have to switch rows to get pivots in the appropriate places. Nevertheless, by doing appropriate row operations together with the companion column operations (to maintain symmetry), one can show that every symmetric matrix can be written in the form $EDE^{\mathsf T}$, where $E$ is the product of elementary matrices with only 1's on the diagonal (i.e., elementary matrices of type (iii)). See Exercise 8b for the example of the matrix $A$ given just above.
where now there is at least one 0 (resp., real numbers of opposite sign) on the diagonal of $D$.
Sketch of proof Suppose $A = LDL^{\mathsf T}$, where $L$ is lower triangular with 1's on the diagonal (or, more generally, $A = EDE^{\mathsf T}$, where $E$ is invertible). Let $d_1, \dots, d_n$ be the diagonal entries of the diagonal matrix $D$. Letting $\mathbf y = L^{\mathsf T}\mathbf x$, we have
$$Q(\mathbf x) = \mathbf x^{\mathsf T}A\mathbf x = \mathbf x^{\mathsf T}(LDL^{\mathsf T})\mathbf x = (L^{\mathsf T}\mathbf x)^{\mathsf T}D(L^{\mathsf T}\mathbf x) = \mathbf y^{\mathsf T}D\mathbf y = \sum_{i=1}^n d_i y_i^2.$$
Realizing that $\mathbf y = \mathbf 0 \iff \mathbf x = \mathbf 0$, the conclusions of the first part of the proposition are now evident.
Suppose $Q$ is positive definite. Then, in particular, $Q(\mathbf e_1) = a_{11} > 0$, so we can write
$$A = \begin{bmatrix} 1 & \mathbf 0^{\mathsf T} \\ \mathbf c & I \end{bmatrix}\begin{bmatrix} a_{11} & \mathbf 0^{\mathsf T} \\ \mathbf 0 & B \end{bmatrix}\begin{bmatrix} 1 & \mathbf c^{\mathsf T} \\ \mathbf 0 & I \end{bmatrix}, \quad\text{where}\quad \mathbf c = \begin{bmatrix} a_{12}/a_{11} \\ \vdots \\ a_{1n}/a_{11} \end{bmatrix},$$
where $B$ is also symmetric and the quadratic form on $\mathbb R^{n-1}$ associated to $B$ is likewise positive definite. We now continue by induction. (For example, if the upper left entry of $B$ were 0, this would mean that $Q(a_{12}\mathbf e_1 - a_{11}\mathbf e_2) = 0$, contradicting the hypothesis that $Q$ is positive definite.)
An analogous argument works when $Q$ is negative definite. If $A = O$, there is nothing to prove. If not, in the semidefinite or indefinite case, if $a_{11} = 0$, we first find an appropriate elementary matrix $E_1$ so that the first entry of the symmetric matrix $B = E_1AE_1^{\mathsf T}$ is nonzero, and then we continue as above. ∎
Remark We will see another way, introduced in the next section and developed fully
in Chapter 9, of analyzing the nature of the quadratic form Q associated to a symmetric
matrix A. The signs of the eigenvalues of A will tell the whole story.
(a) Show that the origin is a critical point of f and that, restricting f to any line through the origin,
the origin is a local minimum point.
(b) Is the origin a local minimum point of $f$?²
²We've seen several textbooks that purportedly prove Theorem 3.3 by showing, for example, that if $\mathcal H_{f,\mathbf a}$ is positive definite, then the restriction of $f$ to any line through $\mathbf a$ has a local minimum at $\mathbf a$, and then concluding that $\mathbf a$ must be a local minimum point of $f$. We hope that this exercise will convince you that such a proof must be flawed.
(a) Show that $f$ has exactly one critical point $\mathbf a$, which is a local minimum point.
(b) Show that $\mathbf a$ is not a global minimum point.
5. Suppose $f: \mathbb R^2 \to \mathbb R$ is $C^2$ and harmonic (see Example 2 on p. 122). Assume $\dfrac{\partial^2 f}{\partial x^2}(\mathbf a) \ne 0$. Prove that $\mathbf a$ cannot be an extremum of $f$.
6. For each of the following symmetric matrices $A$, write $A = LDL^{\mathsf T}$, as in Example 3. Use your answer to determine whether the associated quadratic form $Q$ given by $Q(\mathbf x) = \mathbf x^{\mathsf T}A\mathbf x$ is positive definite, negative definite, indefinite, etc.
(a) $A = \begin{bmatrix} 1 & 3 \\ 3 & 13 \end{bmatrix}$
(b) $A = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}$
*(c) $A = \begin{bmatrix} 2 & 2 & -2 \\ 2 & -1 & 4 \\ -2 & 4 & 1 \end{bmatrix}$
(d) $A = \begin{bmatrix} 1 & -2 & 2 \\ -2 & 6 & -6 \\ 2 & -6 & 9 \end{bmatrix}$
(e) $A = \begin{bmatrix} 1 & 1 & -3 & 1 \\ 1 & 0 & -3 & 0 \\ -3 & -3 & 11 & -1 \\ 1 & 0 & -1 & 2 \end{bmatrix}$
7. Suppose $A = LDU$, where $L$ is lower triangular with 1's on the diagonal, $D$ is diagonal, and $U$ is upper triangular with 1's on the diagonal. Prove that this decomposition is unique; i.e., if $A = LDU = L'D'U'$, where $L'$, $D'$, and $U'$ have the same defining properties as $L$, $D$, and $U$, respectively, then $L = L'$, $D = D'$, and $U = U'$. (Hint: The product of two lower triangular matrices is lower triangular, and likewise for upper.)
8. (a) Let $A = \begin{bmatrix} 0 & 2 \\ 2 & 1 \end{bmatrix}$. After making a row exchange (and corresponding column exchange to preserve symmetry), we get $B = E_1AE_1^{\mathsf T} = \begin{bmatrix} 1 & 2 \\ 2 & 0 \end{bmatrix}$. Now write $B = LDL^{\mathsf T}$ and get a corresponding equation for $A$. How, then, have we expressed the associated quadratic form $Q(\mathbf x) = 4x_1x_2 + x_2^2$ as a sum (or difference) of squares?
(b) Let $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. By considering $B = E_1AE_1^{\mathsf T} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$, where $E_1$ is the elementary matrix corresponding to adding 1/2 of the second row to the first, show that
$$A = EDE^{\mathsf T}, \quad\text{where}\quad E = \begin{bmatrix} 1/2 & -1/2 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$
What is the corresponding expression for the quadratic form $Q(\mathbf x) = 2x_1x_2$ as a sum (or difference) of squares?
► 4 LAGRANGE MULTIPLIERS
Most extremum problems, including those encountered in single-variable calculus, involve
functions of several variables with some constraints. Consider, for example, the box of
prescribed volume, a cylinder inscribed in a sphere of given radius, or the desire to maximize
profit with only a certain amount of working capital. There is an elegant and powerful way
to approach all these problems by using multivariable calculus, the method of Lagrange
multipliers. A generalization to infinite dimensions, which we shall not study here, is central
in the calculus of variations, which is a powerful tool in mechanics, thermodynamics, and
differential geometry.
► EXAMPLE 1
Your boat has sprung a leak in the middle of the lake and you are trying to find the closest point on
the shoreline. As suggested by Figure 4.1, we imagine dropping a rock in the water at the location
of the boat and watching the circular waves radiate outward. The moment the first wave touches the
shoreline, we know that the point a at which it touches must be closest to us. And at that point, the
circle must be tangent to the shoreline.
Let's place the origin at the point at which we drop the rock. Then the circles emanating from this point are level curves of $f(\mathbf x) = \|\mathbf x\|$. Suppose, moreover, that the shoreline is a level curve of a differentiable function $g$. By Proposition 5.3 of Chapter 4, the gradient is normal to level sets, so if
the tangent line of the circle at $\mathbf a$ and the tangent line of the shoreline at $\mathbf a$ are the same, this means that we should have
$$\nabla f(\mathbf a) = \lambda\nabla g(\mathbf a) \quad\text{for some scalar } \lambda.$$
Figure 4.1
We now want to study the calculus of constrained extrema a bit more carefully.
Remark As usual, this is a necessary condition for a constrained extremum but not
a sufficient one. There may be (constrained) saddle points as well.
Proof By the Implicit Function Theorem, we can represent $M = g^{-1}(\{\mathbf 0\})$ locally near $\mathbf a$ as a graph over some coordinate $(n-m)$-plane. For concreteness, let's say that locally
$$M = \left\{\begin{bmatrix} \mathbf x \\ \phi(\mathbf x) \end{bmatrix} : \mathbf x \in V \subset \mathbb R^{n-m}\right\},$$
and define
$$\Phi: V \to \mathbb R^n, \quad \Phi(\mathbf x) = \begin{bmatrix} \mathbf x \\ \phi(\mathbf x) \end{bmatrix},$$
as shown in Figure 4.2, with $\Phi(\mathbf a_0) = \mathbf a$. Now we have two crucial pieces of information:
Figure 4.2
The first equation in (†) tells us that $T$, the $(n-m)$-dimensional image of the linear map $D\Phi(\mathbf a_0)$, satisfies $T \subset \ker Dg(\mathbf a)$ (i.e., $\mathbf C([D\Phi(\mathbf a_0)]) \subset \mathbf N([Dg(\mathbf a)])$). But, by the Nullity-Rank Theorem, Corollary 4.6 of Chapter 4, $\dim \mathbf N([Dg(\mathbf a)]) = n - m$, and so
$$T = \mathbf N([Dg(\mathbf a)]) = \bigl(\mathbf R([Dg(\mathbf a)])\bigr)^{\perp}.$$
The second equation in (†) tells us similarly that
$$T \subset \mathbf N([Df(\mathbf a)]) = \bigl(\mathbf R([Df(\mathbf a)])\bigr)^{\perp}.$$
Thus,
$$\bigl(\mathbf R([Dg(\mathbf a)])\bigr)^{\perp} \subset \bigl(\mathbf R([Df(\mathbf a)])\bigr)^{\perp},$$
so, taking orthogonal complements and using Exercise 1.3.9 and Proposition 4.8 of Chapter 4, we have
$$\mathbf R([Df(\mathbf a)]) \subset \mathbf R([Dg(\mathbf a)]),$$
so $Df(\mathbf a)$ is a linear combination of the linear maps $Dg_1(\mathbf a), \dots, Dg_m(\mathbf a)$ (or, more geometrically, $\nabla f(\mathbf a)$ is a linear combination of the vectors $\nabla g_1(\mathbf a), \dots, \nabla g_m(\mathbf a)$), as we needed to show. ∎
► EXAMPLE 2
The temperature at the point $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ in space is given by $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = xy + z^2$. We wish to find the hottest and coldest points on the sphere $x^2 + y^2 + z^2 = 2z$ (the sphere of radius 1 centered at $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$). That is, we must find the extrema of $f$ subject to the constraint $g\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x^2 + y^2 + z^2 - 2z = 0$. By Theorem 4.1, we must find points $\mathbf x$ satisfying $g(\mathbf x) = 0$ at which $Df(\mathbf x) = \lambda Dg(\mathbf x)$ for some scalar $\lambda$. That is, we seek points $\mathbf x$ so that
$$(*) \qquad y = 2\lambda x, \quad x = 2\lambda y, \quad 2z = \lambda(2z - 2),$$
i.e., points at which
$$\frac yx = \frac xy = \frac{2z}{z - 1}.$$
So either $y = x$, in which case $\frac{2z}{z-1} = 1$ gives $z = -1$, which is impossible on the sphere; or $y = -x$, in which case $\frac{2z}{z-1} = -1$ gives $z = \frac13$, and then the constraint gives $2x^2 = \frac59$, i.e., $x = \pm\sqrt{5/2}/3$. Now, we infer from (*) that if $x = 0$, then $y = 0$ as well (and vice versa), and then $z$ can be arbitrary, so we also find that the north and south poles of the sphere are constrained critical points. On the other hand, we cannot have the denominator $z - 1 = 0$, for, by (*), that would require $z = 0$, and these equations cannot hold simultaneously.
Calculating the values of $f$ at our various constrained critical points, we have
$$f\begin{pmatrix} \sqrt{5/2}/3 \\ -\sqrt{5/2}/3 \\ 1/3 \end{pmatrix} = f\begin{pmatrix} -\sqrt{5/2}/3 \\ \sqrt{5/2}/3 \\ 1/3 \end{pmatrix} = -\frac16, \quad f\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = 0, \quad\text{and}\quad f\begin{pmatrix} 0 \\ 0 \\ 2 \end{pmatrix} = 4,$$
so the hottest point is the north pole and the coldest points are the two points with $f = -\frac16$.
Remark We surmise that the origin is a saddle point. Indeed, representing the sphere locally as a graph near the origin, we have $z = 1 - \sqrt{1 - (x^2 + y^2)}$ and
$$f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = xy + \left(1 - \sqrt{1 - (x^2 + y^2)}\right)^2 = xy + \text{higher-order terms}.$$
(This is easiest to see by using $\sqrt{1 + u} = 1 + u/2 + \text{higher-order terms}$.) Even easier, the origin is a nonconstrained critical point of $f$. Since $f$ is a quadratic polynomial, $\tfrac12\mathcal H_{f,\mathbf 0} = f$, and on the tangent plane of the sphere at $\mathbf 0$ we just get $xy$. (Also see Exercise 34.)
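The constrained critical points of Example 2 can be checked numerically: at each point the two gradients should be parallel and the constraint satisfied.

```python
import numpy as np

# f = xy + z^2 on the sphere g = x^2 + y^2 + z^2 - 2z = 0.
def grad_f(x, y, z):
    return np.array([y, x, 2 * z])

def grad_g(x, y, z):
    return np.array([2 * x, 2 * y, 2 * z - 2])

s = np.sqrt(5.0 / 2.0) / 3.0
points = [(s, -s, 1 / 3), (-s, s, 1 / 3), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
for pt in points:
    x, y, z = pt
    assert abs(x**2 + y**2 + z**2 - 2 * z) < 1e-12           # on the sphere
    assert np.allclose(np.cross(grad_f(*pt), grad_g(*pt)), 0)  # parallel
    print(x * y + z**2)   # the temperatures -1/6, -1/6, 0, 4
```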
► EXAMPLE 3
Find the shortest possible distance from the ellipse $x^2 + 2y^2 = 2$ to the line $x + y = 2$. We need to consider the (square of the) distance between pairs of points, one on the ellipse, the other on the line. This means that we need to work in $\mathbb R^2 \times \mathbb R^2$, with coordinates $\binom xy$ and $\binom uv$, respectively. Let's try to minimize
$$f\begin{pmatrix} x \\ y \\ u \\ v \end{pmatrix} = (x - u)^2 + (y - v)^2$$
subject to the constraint
$$g\begin{pmatrix} x \\ y \\ u \\ v \end{pmatrix} = \begin{bmatrix} x^2 + 2y^2 - 2 \\ u + v - 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
(The rank condition on $g$ is easily checked in this case.) So we need to find points at which, for some scalars $\lambda$ and $\mu$, we have
$$2(x - u) = 2\lambda x, \quad 2(y - v) = 4\lambda y, \quad -2(x - u) = \mu, \quad -2(y - v) = \mu.$$
We see that we must have $x - u = y - v$ and so $x = 2y$, as well. Now substituting into the constraint equations yields two critical points:
$$\begin{pmatrix} x \\ y \\ u \\ v \end{pmatrix} = \begin{pmatrix} 2/\sqrt3 \\ 1/\sqrt3 \\ 1 + 1/(2\sqrt3) \\ 1 - 1/(2\sqrt3) \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -2/\sqrt3 \\ -1/\sqrt3 \\ 1 - 1/(2\sqrt3) \\ 1 + 1/(2\sqrt3) \end{pmatrix}.$$
As a check, note that the vector from $\binom uv$ to $\binom xy$ in each case is normal to both the ellipse and the line, as Figure 4.3 corroborates. Evidently, the first point gives the shortest possible distance, and we
leave it to the reader to establish this rigorously.
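A brute-force check of Example 3, using the parametrization $(\sqrt2\cos t, \sin t)$ of the ellipse and the standard point-to-line distance formula (both are our own choices of method, not from the text):

```python
import numpy as np

# Distance from an ellipse point to the line x + y = 2 is (2 - x - y)/sqrt(2),
# which is positive on the whole ellipse since max(x + y) = sqrt(3) < 2.
t = np.linspace(0, 2 * np.pi, 200001)
x, y = np.sqrt(2) * np.cos(t), np.sin(t)
dist = (2 - x - y) / np.sqrt(2)
i = dist.argmin()
print(x[i], y[i])   # close to (2/sqrt(3), 1/sqrt(3))
print(dist[i])      # close to (2 - sqrt(3))/sqrt(2)
```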
► EXAMPLE 4
Consider $A = \begin{bmatrix} 6 & 2 \\ 2 & 9 \end{bmatrix}$. Proceeding as above, we arrive at the system of equations
$$6x + 2y = \lambda x$$
$$2x + 9y = \lambda y.$$
Eliminating $\lambda$, we obtain
$$\frac{6x + 2y}{x} = \frac{2x + 9y}{y},$$
so either $y = 2x$ or $y = -\frac x2$. Substituting into the constraint equation, we obtain the critical points (eigenvectors) $\begin{pmatrix} 1/\sqrt5 \\ 2/\sqrt5 \end{pmatrix}$ and $\begin{pmatrix} -2/\sqrt5 \\ 1/\sqrt5 \end{pmatrix}$, with respective Lagrange multipliers (eigenvalues) 10 and 5.
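numpy's symmetric eigensolver confirms Example 4: the Lagrange multipliers are the eigenvalues of $A$, and the constrained critical points are unit eigenvectors.

```python
import numpy as np

# Extrema of x^T A x on the unit circle = eigenvalues of A.
A = np.array([[6.0, 2.0], [2.0, 9.0]])
evals, evecs = np.linalg.eigh(A)   # eigenvalues in ascending order
print(evals)                       # eigenvalues 5 and 10
print(evecs)                       # columns are unit eigenvectors
```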
► EXERCISES 5.4
1. (a) Find the minimum value of $f\binom xy = x^2 + y^2$ on the curve $x + y = 2$. Why is there no maximum?
(c) How are the questions (and answers) in parts a and b related?
*2. A wire has the shape of the circle $x^2 + y^2 - 2y = 0$. Its temperature at the point $\binom xy$ is given by $T\binom xy = 2x^2 + 3y$. Find the maximum and minimum temperatures of the wire. (Be sure you've
3. Find the maximum value of $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 2x + 2y - z$ on the sphere of radius 2 centered at the origin.
4. Find the maximum and minimum values of the function $f\binom xy = x^2 + xy + y^2$ on the unit disk $D = \{\mathbf x \in \mathbb R^2 : \|\mathbf x\| \le 1\}$.
5. Find the point(s) on the ellipse $x^2 + 4y^2 = 4$ closest to the point $\binom10$.
6. The temperature at point $\mathbf x$ is given by $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x^2 + 2y + 2z$. Find the hottest and coldest points on the sphere $x^2 + y^2 + z^2 = 3$.
7. Find the volume of the largest rectangular box (with all its edges parallel to the coordinate axes) that can be inscribed in the ellipsoid
$$x^2 + \frac{y^2}{2} + \frac{z^2}{3} = 1.$$
8. A space probe in the shape of the ellipsoid $4x^2 + y^2 + 4z^2 = 16$ enters the earth's atmosphere and its surface begins to heat. After 1 hour, the temperature in °C on its surface is given by $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 2x^2 + yz - 4z + 600$. Find the hottest and coldest points on the probe's surface.
9. The temperature in space is given by $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = 3xy + z^3 - 3z$. Prove that there are hottest and coldest points on the sphere $x^2 + y^2 + z^2 - 2z = 0$, and find them.
10. Let $f\begin{pmatrix} x \\ y \\ z \end{pmatrix} = xy + z^3$ and $S = \left\{\begin{pmatrix} x \\ y \\ z \end{pmatrix} : x^2 + y^2 + z^2 = 1,\ z \ge 0\right\}$. Prove that $f$ attains its global maximum and minimum on $S$ and determine its global maximum and minimum points.
11. Among all triangles inscribed in the unit circle, which have the greatest area? (Hint: Consider
the three small triangles formed by joining the vertices to the center of the circle.)
12. Among all triangles inscribed in the unit circle, which have the greatest perimeter?
*13. Find the ellipse x2/a2 + y2/b2 = 1 that passes through the point and has the least area.
*27. Find the points on the curve of intersection of the two surfaces $x^2 - xy + y^2 - z^2 = 1$ and $x^2 + y^2 = 1$ that are closest to the origin.
28. Show that of all quadrilaterals with fixed side lengths, the one of maximum area can be inscribed
in a circle. (Hint: Use as variables a pair of opposite angles. See also Exercise 1.2.14.)
29. For each of the following symmetric matrices A, find all the extrema of Q(x) = xTAx subject to
the constraint ||x||2 = 1. Also determine the Lagrange multiplier each time.
30. Find the norm of each of the following matrices. Note: A calculator will be helpful.
31. A (frictionless) lasso is thrown around two pegs, as pictured in Figure 4.4, and a large weight
hung from the free end. Treating the mass of the rope as insignificant, and supposing the weight hangs
freely, what is the equilibrium position of the system?
32. (Interpreting the Lagrange Multiplier)
(a) Suppose $\mathbf a = \mathbf a(c)$ is a local extreme point of the function $f$ relative to the constraint $g(\mathbf x) = c$; suppose, moreover, that $\mathbf a$ is a differentiable function of $c$. Show that $\lambda = \dfrac{d}{dc}f(\mathbf a(c))$.
(b) Assume that $f$ and $g$ are $C^2$. Use the Implicit Function Theorem (see Theorem 2.2 of Chapter 6 for the general version) to show that the extreme point $\mathbf a$ is given locally as a differentiable function of $c$ whenever the "bordered Hessian"
is invertible.
5 Projections, Least Squares, and Inner Product Spaces ◄ 225
Figure 4.4
33. (An Application of Exercise 32 to Economics) Let $\mathbf x \in \mathbb R^n$ be the commodity vector, $\mathbf p \in \mathbb R^n$ the price vector, and $f: \mathbb R^n \to \mathbb R$ the production function, so that $f(\mathbf x)$ tells us how many widgets are produced, using $x_i$ units of item $i$, $i = 1, \dots, n$. Prove that to produce the greatest number of widgets with a given budget, we must have
$$\frac{1}{p_1}\frac{\partial f}{\partial x_1} = \cdots = \frac{1}{p_n}\frac{\partial f}{\partial x_n}.$$
What does the result of Exercise 32a tell us in this case?
34. (A Second Derivative Test for Constrained Extrema) Suppose $\mathbf a$ is a critical point of $f$ subject to the constraint $g(\mathbf x) = c$, i.e., $Df(\mathbf a) = \lambda Dg(\mathbf a)$, and $Dg(\mathbf a) \ne \mathbf 0$. Show that $\mathbf a$ is a constrained local maximum (resp., minimum) of $f$ on $M = \{\mathbf x : g(\mathbf x) = c\}$ if the restriction of the Hessian of $f - \lambda g$ to the tangent space $T_{\mathbf a}M$ is negative (resp., positive) definite. (Hint: Parametrize the constraint surface $M$ locally by $\Phi$ with $\Phi(\mathbf a_0) = \mathbf a$ and apply Theorem 3.3 to $f \circ \Phi$.) There is an interpretation in terms of the bordered Hessian (see Exercise 32b), which is indicated in Exercise 9.4.21.
► EXAMPLE 1
Consider the system $A\mathbf x = \mathbf b$, where
$$A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
It is easy to check that $\mathbf b \notin \mathbf C(A)$, and so this system is inconsistent. The best we can do is to solve $A\mathbf x = \mathbf p$, where $\mathbf p$ is the vector in $\mathbf C(A)$ that is closest to $\mathbf b$. Clearly that point is $\mathbf p = \mathbf b - \mathrm{proj}_{\mathbf a}\mathbf b$, where $\mathbf a$ is the normal vector to $\mathbf C(A) \subset \mathbb R^3$, as shown in Figure 5.1. Now we see how to solve our problem. $\mathbf C(A)$ is the plane in $\mathbb R^3$ with normal vector
$$\mathbf a = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix},$$
Figure 5.1
$$\mathrm{proj}_{\mathbf a}\mathbf b = \frac{\mathbf b \cdot \mathbf a}{\|\mathbf a\|^2}\mathbf a = \frac13\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix},$$
and so
$$\mathbf p = \mathbf b - \mathrm{proj}_{\mathbf a}\mathbf b = \begin{bmatrix} 4/3 \\ 2/3 \\ 2/3 \end{bmatrix}, \quad\text{whence}\quad \bar{\mathbf x} = \begin{bmatrix} 0 \\ 2/3 \end{bmatrix}.$$
This is called the least squares solution of the original problem, inasmuch as $A\bar{\mathbf x}$ is the vector in $\mathbf C(A)$ closest to $\mathbf b$. ◄
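Example 1 can be redone numerically via the normal equations $A^{\mathsf T}A\bar{\mathbf x} = A^{\mathsf T}\mathbf b$:

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0])
xbar = np.linalg.solve(A.T @ A, A.T @ b)   # least squares solution
p = A @ xbar                               # projection of b onto C(A)
print(xbar)   # approximately (0, 2/3)
print(p)      # approximately (4/3, 2/3, 2/3)
```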
In general, given $\mathbf b \in \mathbb R^n$ and an $m$-dimensional subspace $V \subset \mathbb R^n$, we can ask for the projection of $\mathbf b$ onto $V$, i.e., the point in $V$ closest to $\mathbf b$, which we denote by $\mathrm{proj}_V\mathbf b$. We first make the official
Definition Let $V \subset \mathbb R^n$ be a subspace, and let $\mathbf b \in \mathbb R^n$. We define the projection of $\mathbf b$ onto $V$ to be the unique vector $\mathbf p \in V$ with the property that $\mathbf b - \mathbf p \in V^{\perp}$. We write $\mathbf p = \mathrm{proj}_V\mathbf b$.
We ask the reader to show in Exercise 10 that projection onto a subspace V gives a linear
map. As we know from Chapter 4, we can be given $V$ either explicitly (say, $V = \mathbf C(A)$ for some $n \times m$ matrix $A$) or implicitly (say, $V = \mathbf N(B)$ for some $(n-m) \times n$ matrix $B$). We will
start by applying the methods of this chapter to obtain a simple solution of the problem (and
then we will indicate that we could have omitted the calculus completely).
Suppose $A$ is an $n \times m$ matrix of rank $m$ (so that the column vectors $\mathbf a_1, \dots, \mathbf a_m$ give a basis for our subspace $V$). Define $f: \mathbb R^m \to \mathbb R$ by $f(\mathbf x) = \|A\mathbf x - \mathbf b\|^2$.
We seek critical points of $f$. Write $h(\mathbf x) = \|\mathbf x\|^2$ and $g(\mathbf x) = A\mathbf x - \mathbf b$, so that $f = h \circ g$. Then $Dh(\mathbf y) = 2\mathbf y^{\mathsf T}$ and $Dg(\mathbf x) = A$, so, differentiating $f$ by the chain rule, we have $Df(\mathbf x) = Dh(g(\mathbf x))Dg(\mathbf x) = 2(A\mathbf x - \mathbf b)^{\mathsf T}A$. Thus, $Df(\mathbf x) = \mathbf 0 \iff (A\mathbf x - \mathbf b)^{\mathsf T}A = \mathbf 0$. Transposing for convenience, we deduce that $\mathbf x$ is a critical point if and only if
$$(*) \qquad A^{\mathsf T}A\mathbf x = A^{\mathsf T}\mathbf b.$$
Since $A$ has rank $m$, the matrix $A^{\mathsf T}A$ is invertible, so there is a unique critical point $\bar{\mathbf x} = (A^{\mathsf T}A)^{-1}A^{\mathsf T}\mathbf b$, and it is the global minimum, since for any $\mathbf x$ we have (using $A^{\mathsf T}(A\bar{\mathbf x} - \mathbf b) = \mathbf 0$)
$$f(\mathbf x) = \|A\mathbf x - \mathbf b\|^2 = \|A(\mathbf x - \bar{\mathbf x}) + (A\bar{\mathbf x} - \mathbf b)\|^2 = \|A(\mathbf x - \bar{\mathbf x})\|^2 + \|A\bar{\mathbf x} - \mathbf b\|^2 \ge f(\bar{\mathbf x}).$$
The vector $\bar{\mathbf x}$ is called the least squares solution of the (inconsistent) linear system $A\mathbf x = \mathbf b$, and (*) gives the associated normal equations.
Figure 5.2
Remark When $A$ has rank less than $m$, the linear system (*) is still consistent (see Exercise 4.4.15) and has infinitely many solutions. We define the least squares solution to be the one of smallest length, i.e., the unique vector $\bar{\mathbf x} \in \mathbf R(A)$ that satisfies the equation. See Proposition 4.10 of Chapter 4. This leads to the pseudoinverse that is important in numerical analysis (cf. Strang).
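The Remark's minimum-length solution can be illustrated with numpy's pseudoinverse; the rank-deficient matrix here is our own example, not from the text.

```python
import numpy as np

# A rank-deficient least squares problem: both columns of A are equal.
A = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])   # rank 1
b = np.array([1.0, 3.0, 5.0])

# np.linalg.pinv gives the minimum-length least squares solution.
x_min = np.linalg.pinv(A) @ b
resid = A.T @ (A @ x_min - b)   # x_min satisfies the normal equations
print(x_min)    # approximately (1, 1)
print(resid)    # approximately (0, 0)
```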
► EXAMPLE 2
We wish to find the least squares solution of the system $A\mathbf x = \mathbf b$. Computing $A^{\mathsf T}A$ and $A^{\mathsf T}\mathbf b$ and using the formula for the inverse of a $2 \times 2$ matrix in Example 5 on p. 154, we obtain
$$\bar{\mathbf x} = (A^{\mathsf T}A)^{-1}A^{\mathsf T}\mathbf b. \qquad ◄$$
This is all it takes to give an explicit formula for projection onto a subspace $V \subset \mathbb R^n$. In particular, denote by
$$P_V: \mathbb R^n \to \mathbb R^n$$
the function that assigns to each vector $\mathbf b \in \mathbb R^n$ the vector $\mathbf p \in V$ closest to $\mathbf b$. Start by choosing a basis $\{\mathbf v_1, \dots, \mathbf v_m\}$ for $V$, and let
$$A = \begin{bmatrix} | & | & & | \\ \mathbf v_1 & \mathbf v_2 & \cdots & \mathbf v_m \\ | & | & & | \end{bmatrix}$$
be the $n \times m$ matrix whose column vectors are these basis vectors. Then, given $\mathbf b \in \mathbb R^n$, we know that if we take $\bar{\mathbf x} = (A^{\mathsf T}A)^{-1}A^{\mathsf T}\mathbf b$, then $A\bar{\mathbf x} = \mathbf p = \mathrm{proj}_V\mathbf b$. That is, $\mathbf p = \mathrm{proj}_V\mathbf b = \bigl(A(A^{\mathsf T}A)^{-1}A^{\mathsf T}\bigr)\mathbf b$, so
$$(\dagger) \qquad P_V = A(A^{\mathsf T}A)^{-1}A^{\mathsf T}$$
is the appropriate projection matrix. In Section 5.2, we'll see a bit more of the geometry underlying the formula for the projection matrix.
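The formula (†) is easy to sanity-check numerically; the basis below is a sample choice of ours. A projection matrix should be idempotent and symmetric, with $\mathbf b - P\mathbf b$ orthogonal to $V$.

```python
import numpy as np

# Sample basis of a 2-plane in R^3 (columns of A).
A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T   # formula (dagger)

assert np.allclose(P @ P, P)   # idempotent: projecting twice changes nothing
assert np.allclose(P, P.T)     # symmetric

b = np.array([1.0, 1.0, 1.0])
assert np.allclose(A.T @ (b - P @ b), 0)   # b - Pb is orthogonal to V
```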
► EXAMPLE 3
If $\mathbf b \in V^{\perp}$, then $A^{\mathsf T}\mathbf b = \mathbf 0$, and so
$$P_V\mathbf b = \bigl(A(A^{\mathsf T}A)^{-1}A^{\mathsf T}\bigr)\mathbf b = A(A^{\mathsf T}A)^{-1}(A^{\mathsf T}\mathbf b) = \mathbf 0,$$
as it should be. ◄
► EXAMPLE 4
Note that when $\dim V = 1$, we recover our formula for projection onto a line from Section 2 of Chapter 1. If $\mathbf a \in \mathbb R^n$ is a nonzero vector, we consider it as an $n \times 1$ matrix and the projection formula becomes
$$P = \mathbf a(\mathbf a^{\mathsf T}\mathbf a)^{-1}\mathbf a^{\mathsf T} = \frac{1}{\|\mathbf a\|^2}\mathbf a\,\mathbf a^{\mathsf T};$$
that is,
$$P\mathbf b = \frac{\mathbf a \cdot \mathbf b}{\|\mathbf a\|^2}\mathbf a = \mathrm{proj}_{\mathbf a}\mathbf b,$$
as before. ◄
► EXAMPLE 5
Let $V \subset \mathbb R^3$ be the plane $x_1 - 2x_2 + x_3 = 0$, and let $\{\mathbf v_1, \mathbf v_2\}$ be a (non-orthogonal) basis for $V$, forming the columns of $A$. Then, computing $A^{\mathsf T}A$ and $(A^{\mathsf T}A)^{-1}$, we obtain the projection matrix
$$P_V = A(A^{\mathsf T}A)^{-1}A^{\mathsf T} = \frac16\begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix}. \qquad ◄$$
Now, what happens if we are given the subspace implicitly? This sounds like the perfect setup for Lagrange multipliers. Suppose the $m$-dimensional subspace $V \subset \mathbb R^n$ is given as the nullspace of an $(n-m) \times n$ matrix $B$ of rank $n - m$. To find the point in $V$ closest to $\mathbf b \in \mathbb R^n$, we want to minimize the function
$$f(\mathbf x) = \|\mathbf x - \mathbf b\|^2 \quad\text{subject to the constraint}\quad B\mathbf x = \mathbf 0.$$
The method of Lagrange multipliers, Theorem 4.1, tells us that we must have (dropping the factor of 2)
$$(\mathbf x - \mathbf b)^{\mathsf T} = \sum_{i=1}^{n-m} \lambda_i B_i \quad\text{for some scalars } \lambda_1, \dots, \lambda_{n-m},$$
where $B_1, \dots, B_{n-m}$ are the rows of $B$; i.e.,
$$\mathbf x - \mathbf b = B^{\mathsf T}\boldsymbol\lambda, \quad\text{where}\quad \boldsymbol\lambda = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_{n-m} \end{bmatrix}.$$
Multiplying by $B$ and using $B\mathbf x = \mathbf 0$, we find
$$(BB^{\mathsf T})\boldsymbol\lambda = -B\mathbf b.$$
By analogy with our treatment of the equation (*), the matrix $BB^{\mathsf T}$ has rank $n - m$, and so we can solve for $\boldsymbol\lambda$, hence for the constrained extremum $\mathbf x_0$:
$$\mathbf x_0 = \mathbf b + B^{\mathsf T}\boldsymbol\lambda = \bigl(I - B^{\mathsf T}(BB^{\mathsf T})^{-1}B\bigr)\mathbf b.$$
Note that, according to our projection formula (†), we can interpret this answer as
$$\mathbf x_0 = \mathbf b - \mathrm{proj}_{\mathbf C(B^{\mathsf T})}\mathbf b,$$
as it should be.
► EXAMPLE 6
Find the least squares line $y = ax + b$ for the data points $\binom{-1}{0}$, $\binom11$, and $\binom23$. (See Figure 5.3.)
We get the system of equations
$$\begin{aligned} -a + b &= 0 \\ a + b &= 1 \\ 2a + b &= 3, \end{aligned}$$
i.e., $A\binom ab = \mathbf y$ with
$$A = \begin{bmatrix} -1 & 1 \\ 1 & 1 \\ 2 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf y = \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}.$$
When we find the least squares line $y = ax + b$ fitting the data points $\binom{x_1}{y_1}, \dots, \binom{x_m}{y_m}$, we are finding the least squares solution of the (inconsistent) system $A\binom ab = \mathbf y$, where
$$A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_m & 1 \end{bmatrix} \quad\text{and}\quad \mathbf y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}.$$
Let's denote by $\hat{\mathbf y} = A\binom{\bar a}{\bar b}$ the projection of $\mathbf y$ onto $\mathbf C(A)$. The least squares solution has the property that $\|\mathbf y - \hat{\mathbf y}\|$ is as small as possible. If we define the error vector $\boldsymbol\epsilon = \mathbf y - \hat{\mathbf y}$,
then we have
$$\boldsymbol\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_m \end{bmatrix} = \begin{bmatrix} y_1 - \hat y_1 \\ y_2 - \hat y_2 \\ \vdots \\ y_m - \hat y_m \end{bmatrix} = \begin{bmatrix} y_1 - (\bar a x_1 + \bar b) \\ y_2 - (\bar a x_2 + \bar b) \\ \vdots \\ y_m - (\bar a x_m + \bar b) \end{bmatrix}.$$
The least squares process chooses $\bar a$ and $\bar b$ so that $\|\boldsymbol\epsilon\|^2 = \epsilon_1^2 + \cdots + \epsilon_m^2$ is as small as possible. But something interesting happens. Recall that
$$\boldsymbol\epsilon = \mathbf y - \hat{\mathbf y} \in \mathbf C(A)^{\perp}.$$
Since the vector $(1, 1, \dots, 1)$ is one of the columns of $A$, hence lies in $\mathbf C(A)$, we have
$$\boldsymbol\epsilon \cdot \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \epsilon_1 + \cdots + \epsilon_m = 0.$$
That is, in the process of minimizing the sum of the squares of the errors $\epsilon_i$, we have in fact made their (algebraic) sum equal to 0.
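Fitting the line through the data points $(-1, 0)$, $(1, 1)$, $(2, 3)$ numerically confirms both the least squares solution and the sum-of-errors observation:

```python
import numpy as np

x = np.array([-1.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 3.0])
A = np.column_stack([x, np.ones(3)])           # columns (x_i) and (1,...,1)
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)  # least squares solution
eps = y - (a * x + b)                           # error vector
print(a, b)        # approximately 13/14 and 5/7
print(eps.sum())   # approximately 0
```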
Figure 5.4
Lemma 5.1 Let $\mathbf v_1, \dots, \mathbf v_k$ be a basis for the subspace $V \subset \mathbb R^n$. Then
$$\sum_{i=1}^k \mathrm{proj}_{\mathbf v_i}\mathbf x = \sum_{i=1}^k \frac{\mathbf x \cdot \mathbf v_i}{\|\mathbf v_i\|^2}\mathbf v_i = \mathbf x \quad\text{for all } \mathbf x \in V$$
if and only if $\{\mathbf v_1, \dots, \mathbf v_k\}$ is an orthogonal basis for $V$.
Proof Suppose $\{\mathbf v_1, \dots, \mathbf v_k\}$ is an orthogonal basis for $V$. Then there are scalars $c_1, \dots, c_k$ so that
$$\mathbf x = c_1\mathbf v_1 + \cdots + c_k\mathbf v_k.$$
Taking advantage of the orthogonality of the $\mathbf v_i$'s, we take the dot product of this equation with $\mathbf v_i$:
$$\mathbf x \cdot \mathbf v_i = c_i\|\mathbf v_i\|^2,$$
and so
$$c_i = \frac{\mathbf x \cdot \mathbf v_i}{\|\mathbf v_i\|^2}.$$
Conversely, suppose the displayed formula holds for all $\mathbf x \in V$. Taking $\mathbf x = \mathbf v_1$, we have
$$\mathbf v_1 = \sum_{i=1}^k \frac{\mathbf v_1 \cdot \mathbf v_i}{\|\mathbf v_i\|^2}\mathbf v_i.$$
Recall from Proposition 3.1 of Chapter 4 that every vector has a unique expansion as a linear combination of basis vectors, so comparing coefficients of $\mathbf v_2, \dots, \mathbf v_k$ on either side of this equation, we conclude that $\mathbf v_1 \cdot \mathbf v_i = 0$ for $i = 2, \dots, k$. A similar argument shows that $\mathbf v_i \cdot \mathbf v_j = 0$ for all $i \ne j$, and the proof is complete. ∎
As we mentioned above, if $\{\mathbf v_1, \dots, \mathbf v_k\}$ is a basis for $V$, then every vector $\mathbf x \in V$ can be written uniquely as a linear combination
$$\mathbf x = c_1\mathbf v_1 + c_2\mathbf v_2 + \cdots + c_k\mathbf v_k.$$
We recall that the coefficients $c_1, c_2, \dots, c_k$ that appear here are called the coordinates of $\mathbf x$ with respect to the basis $\{\mathbf v_1, \dots, \mathbf v_k\}$. It is worth emphasizing that when $\{\mathbf v_1, \dots, \mathbf v_k\}$ forms an orthogonal basis for $V$, it is quite easy to compute the coordinates of $\mathbf x$ by using the dot product; that is, $c_i = \mathbf x \cdot \mathbf v_i / \|\mathbf v_i\|^2$. As we saw in Example 8 of Section 3 of Chapter 4
(see also Section 1 of Chapter 9), when the basis is not orthogonal, it is far more tedious to
compute these coordinates.
Not only do orthogonal bases make it easy to calculate coordinates, they also make
projections quite easy to compute, as we now see.
Proposition 5.2 Let $\mathbf v_1, \dots, \mathbf v_k$ be nonzero vectors in the $k$-dimensional subspace $V \subset \mathbb R^n$. Then, for all $\mathbf b \in \mathbb R^n$,
$$(**) \qquad \mathrm{proj}_V\mathbf b = \sum_{i=1}^k \mathrm{proj}_{\mathbf v_i}\mathbf b = \sum_{i=1}^k \frac{\mathbf b \cdot \mathbf v_i}{\|\mathbf v_i\|^2}\mathbf v_i$$
if and only if $\{\mathbf v_1, \dots, \mathbf v_k\}$ is an orthogonal basis for $V$.
Proof Assume $\{\mathbf v_1, \dots, \mathbf v_k\}$ is an orthogonal basis for $V$ and write $\mathbf b = \mathbf p + (\mathbf b - \mathbf p)$, where $\mathbf p = \mathrm{proj}_V\mathbf b$ (and so $\mathbf b - \mathbf p \in V^{\perp}$). Then, since $\mathbf p \in V$, by Lemma 5.1, we know
$$\mathbf p = \sum_{i=1}^k \frac{\mathbf p \cdot \mathbf v_i}{\|\mathbf v_i\|^2}\mathbf v_i.$$
Moreover, for $i = 1, \dots, k$, we have $\mathbf b \cdot \mathbf v_i = \mathbf p \cdot \mathbf v_i$ since $\mathbf b - \mathbf p \in V^{\perp}$. Thus,
$$\mathrm{proj}_V\mathbf b = \mathbf p = \sum_{i=1}^k \frac{\mathbf p \cdot \mathbf v_i}{\|\mathbf v_i\|^2}\mathbf v_i = \sum_{i=1}^k \frac{\mathbf b \cdot \mathbf v_i}{\|\mathbf v_i\|^2}\mathbf v_i = \sum_{i=1}^k \mathrm{proj}_{\mathbf v_i}\mathbf b.$$
Conversely, suppose $\mathrm{proj}_V\mathbf b = \sum_{i=1}^k \mathrm{proj}_{\mathbf v_i}\mathbf b$ for all $\mathbf b \in \mathbb R^n$. In particular, when $\mathbf b \in V$, we deduce that $\mathbf b = \mathrm{proj}_V\mathbf b$ can be written as a linear combination of $\mathbf v_1, \dots, \mathbf v_k$, so these vectors span $V$; since $V$ is $k$-dimensional, $\{\mathbf v_1, \dots, \mathbf v_k\}$ gives a basis for $V$. By Lemma 5.1, it must be an orthogonal basis. ∎
► EXAMPLE 7
We return to Example 5 on p. 229. The basis $\{\mathbf v_1, \mathbf v_2\}$ we used there was certainly not an orthogonal basis, but it is not hard to find one that is. Instead, we take
$$\mathbf w_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \quad\text{and}\quad \mathbf w_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
(It is immediate that $\mathbf w_1 \cdot \mathbf w_2 = 0$ and that $\mathbf w_1$, $\mathbf w_2$ lie in the plane $x_1 - 2x_2 + x_3 = 0$.) Now, we calculate
$$\mathrm{proj}_V\mathbf b = \mathrm{proj}_{\mathbf w_1}\mathbf b + \mathrm{proj}_{\mathbf w_2}\mathbf b = \frac{\mathbf b \cdot \mathbf w_1}{\|\mathbf w_1\|^2}\mathbf w_1 + \frac{\mathbf b \cdot \mathbf w_2}{\|\mathbf w_2\|^2}\mathbf w_2 = \left(\frac{1}{\|\mathbf w_1\|^2}\mathbf w_1\mathbf w_1^{\mathsf T} + \frac{1}{\|\mathbf w_2\|^2}\mathbf w_2\mathbf w_2^{\mathsf T}\right)\mathbf b = \frac16\begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix}\mathbf b,$$
as we found earlier. ◄
Remark This is exactly what we get from formula (†) on p. 228 when $\{\mathbf v_1, \dots, \mathbf v_k\}$ is an orthogonal set.
Given an arbitrary basis $\{\mathbf v_1, \dots, \mathbf v_k\}$ for a subspace $V$, we now construct an orthogonal basis $\{\mathbf w_1, \dots, \mathbf w_k\}$ for $V$, starting with
$$\mathbf w_1 = \mathbf v_1.$$
Figure 5.5
If $\mathbf v_2$ is orthogonal to $\mathbf w_1$, then we set $\mathbf w_2 = \mathbf v_2$. Of course, in general, it will not be, and we want $\mathbf w_2$ to be the part of $\mathbf v_2$ that is orthogonal to $\mathbf w_1$; i.e., we set
$$\mathbf w_2 = \mathbf v_2 - \mathrm{proj}_{\mathbf w_1}\mathbf v_2 = \mathbf v_2 - \frac{\mathbf v_2 \cdot \mathbf w_1}{\|\mathbf w_1\|^2}\mathbf w_1.$$
Then, by construction, $\mathbf w_1$ and $\mathbf w_2$ are orthogonal and $\mathrm{Span}(\mathbf w_1, \mathbf w_2) \subset \mathrm{Span}(\mathbf v_1, \mathbf v_2)$. Since $\mathbf w_2 \ne \mathbf 0$ (why?), $\{\mathbf w_1, \mathbf w_2\}$ must be linearly independent and therefore give a basis for $\mathrm{Span}(\mathbf v_1, \mathbf v_2)$ by Lemma 3.8. We continue, replacing $\mathbf v_3$ by its part orthogonal to the plane spanned by $\mathbf w_1$ and $\mathbf w_2$:
$$\mathbf w_3 = \mathbf v_3 - \mathrm{proj}_{\mathrm{Span}(\mathbf w_1, \mathbf w_2)}\mathbf v_3 = \mathbf v_3 - \mathrm{proj}_{\mathbf w_1}\mathbf v_3 - \mathrm{proj}_{\mathbf w_2}\mathbf v_3 = \mathbf v_3 - \frac{\mathbf v_3 \cdot \mathbf w_1}{\|\mathbf w_1\|^2}\mathbf w_1 - \frac{\mathbf v_3 \cdot \mathbf w_2}{\|\mathbf w_2\|^2}\mathbf w_2.$$
Note that we are making definite use of Proposition 5.2 here: We must use $\mathbf w_1$ and $\mathbf w_2$ in the formula here, rather than $\mathbf v_1$ and $\mathbf v_2$, because the formula (**) requires an orthogonal basis. Once again, we find that $\mathbf w_3 \ne \mathbf 0$ (why?), and so $\{\mathbf w_1, \mathbf w_2, \mathbf w_3\}$ must be linearly independent and, consequently, an orthogonal basis for $\mathrm{Span}(\mathbf v_1, \mathbf v_2, \mathbf v_3)$. The process continues until we have arrived at $\mathbf v_k$ and replaced it by
$$\mathbf w_k = \mathbf v_k - \mathrm{proj}_{\mathrm{Span}(\mathbf w_1, \dots, \mathbf w_{k-1})}\mathbf v_k = \mathbf v_k - \frac{\mathbf v_k \cdot \mathbf w_1}{\|\mathbf w_1\|^2}\mathbf w_1 - \frac{\mathbf v_k \cdot \mathbf w_2}{\|\mathbf w_2\|^2}\mathbf w_2 - \cdots - \frac{\mathbf v_k \cdot \mathbf w_{k-1}}{\|\mathbf w_{k-1}\|^2}\mathbf w_{k-1}.$$
Summarizing, we have the algorithm that goes by the name of the Gram-Schmidt
Summarizing, we have the algorithm that goes by the name of the Gram-Schmidt
process.
Theorem 5.3 (Gram-Schmidt Process) Given a basis $\{\mathbf v_1, \dots, \mathbf v_k\}$ for a subspace $V \subset \mathbb R^n$, we obtain an orthogonal basis $\{\mathbf w_1, \dots, \mathbf w_k\}$ for $V$ as follows:
$$\begin{aligned} \mathbf w_1 &= \mathbf v_1 \\ \mathbf w_2 &= \mathbf v_2 - \frac{\mathbf v_2 \cdot \mathbf w_1}{\|\mathbf w_1\|^2}\mathbf w_1 \\ &\ \,\vdots \\ \mathbf w_k &= \mathbf v_k - \frac{\mathbf v_k \cdot \mathbf w_1}{\|\mathbf w_1\|^2}\mathbf w_1 - \cdots - \frac{\mathbf v_k \cdot \mathbf w_{k-1}}{\|\mathbf w_{k-1}\|^2}\mathbf w_{k-1}. \end{aligned}$$
If we so desire, we can arrange for an orthogonal basis consisting of unit vectors by dividing each of $\mathbf w_1, \dots, \mathbf w_k$ by its respective length:
$$\mathbf q_1 = \frac{\mathbf w_1}{\|\mathbf w_1\|}, \quad \mathbf q_2 = \frac{\mathbf w_2}{\|\mathbf w_2\|}, \quad \dots, \quad \mathbf q_k = \frac{\mathbf w_k}{\|\mathbf w_k\|}.$$
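The Gram-Schmidt process of Theorem 5.3 translates directly into a short routine (our own implementation sketch, with a sample basis of our choosing):

```python
import numpy as np

# Gram-Schmidt: subtract from each v its projections onto the earlier w's.
def gram_schmidt(vectors):
    W = []
    for v in vectors:
        w = v.astype(float)
        for u in W:
            w = w - (v @ u) / (u @ u) * u   # w_j = v_j - sum of projections
        W.append(w)
    return W

v1, v2 = np.array([1.0, 2.0, 0.0]), np.array([0.0, 1.0, 1.0])  # sample basis
w1, w2 = gram_schmidt([v1, v2])
print(w1 @ w2)   # 0 up to roundoff: the new basis is orthogonal
```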
► EXAMPLE 8
It’s always a good idea to check that the vectors form an orthogonal (or orthonormal) set, and it’s
easy—with these numbers—to do so.
Definition Let $V$ be a real vector space. We say $V$ is an inner product space if for every pair of elements $\mathbf u, \mathbf v \in V$ there is a real number $\langle \mathbf u, \mathbf v \rangle$, called the inner product of $\mathbf u$ and $\mathbf v$, such that
1. $\langle \mathbf u, \mathbf v \rangle = \langle \mathbf v, \mathbf u \rangle$ for all $\mathbf u, \mathbf v \in V$;
2. $\langle c\mathbf u, \mathbf v \rangle = c\langle \mathbf u, \mathbf v \rangle$ for all $\mathbf u, \mathbf v \in V$ and scalars $c$;
3. $\langle \mathbf u + \mathbf v, \mathbf w \rangle = \langle \mathbf u, \mathbf w \rangle + \langle \mathbf v, \mathbf w \rangle$ for all $\mathbf u, \mathbf v, \mathbf w \in V$;
4. $\langle \mathbf u, \mathbf u \rangle \ge 0$ for all $\mathbf u \in V$, and $\langle \mathbf u, \mathbf u \rangle = 0$ if and only if $\mathbf u = \mathbf 0$.
► EXAMPLE 9
a. Fix $k + 1$ distinct real numbers $t_1, t_2, \dots, t_{k+1}$ and define an inner product on $\mathcal P_k$, the vector space of polynomials of degree $\le k$, by the formula
$$\langle p, q \rangle = \sum_{i=1}^{k+1} p(t_i)q(t_i), \qquad p, q \in \mathcal P_k.$$
All the properties of an inner product are obvious except for the very last. If $\langle p, p \rangle = 0$, then
$$\sum_{i=1}^{k+1} p(t_i)^2 = 0,$$
and so we must have $p(t_1) = p(t_2) = \cdots = p(t_{k+1}) = 0$. But if a polynomial of degree $\le k$ has (at least) $k + 1$ roots, then it must be the zero polynomial.
b. Let $C^0([a, b])$ denote the vector space of continuous functions on the interval $[a, b]$. If $f, g \in C^0([a, b])$, define
$$\langle f, g \rangle = \int_a^b f(t)g(t)\,dt.$$
We verify that the defining properties hold.
We verify that the defining properties hold.
1. $\langle f, g \rangle = \int_a^b f(t)g(t)\,dt = \int_a^b g(t)f(t)\,dt = \langle g, f \rangle$.
2. $\langle cf, g \rangle = \int_a^b cf(t)g(t)\,dt = c\int_a^b f(t)g(t)\,dt = c\langle f, g \rangle$.
3. $\langle f + g, h \rangle = \int_a^b \bigl(f(t) + g(t)\bigr)h(t)\,dt = \int_a^b \bigl(f(t)h(t) + g(t)h(t)\bigr)\,dt = \int_a^b f(t)h(t)\,dt + \int_a^b g(t)h(t)\,dt = \langle f, h \rangle + \langle g, h \rangle$.
5 Projections, Least Squares, and Inner Product Spaces 239
4. ⟨f, f⟩ = ∫_a^b f(t)² dt ≥ 0 since f(t)² ≥ 0 for all t. On the other hand, if ⟨f, f⟩ =
∫_a^b f(t)² dt = 0, then since f is continuous and f² ≥ 0, it must be the case that
f = 0. (If not, we would have f(t_0) ≠ 0 for some t_0, and then f(t)² would be positive
on some small interval containing t_0; it would then follow that ∫_a^b f(t)² dt > 0.)
The same inner product can be defined on subspaces of C⁰([a, b]), e.g., P_k.
c. We define an inner product on M_{n×n} in Exercise 18.
If V is an inner product space, we define length, orthogonality, and the angle between
vectors just as we did in R^n. If v ∈ V, we define its length to be ||v|| = √⟨v, v⟩. We
say v and w are orthogonal if ⟨v, w⟩ = 0. Since the Cauchy–Schwarz inequality can be
established in general by following the proof of Proposition 2.3 of Chapter 1 verbatim, we
can define the angle θ between v and w by the equation

cos θ = ⟨v, w⟩ / (||v|| ||w||).
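These definitions can be explored concretely in C⁰([a, b]) by approximating the integral inner product numerically. A sketch (the midpoint-rule quadrature and the sample functions are our choices, not the text's):

```python
import math

def inner(f, g, a, b, n=10000):
    """Approximate <f, g> = integral of f(t)g(t) over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n)) * h

def angle(f, g, a, b):
    """Angle between f and g via cos(theta) = <f, g> / (||f|| ||g||)."""
    return math.acos(inner(f, g, a, b)
                     / math.sqrt(inner(f, f, a, b) * inner(g, g, a, b)))
```

For f(t) = 1 and g(t) = t on [0, 1], one finds ⟨f, g⟩ = 1/2, ||f|| = 1, and ||g|| = 1/√3, so cos θ = √3/2 and θ = π/6.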
Given k + 1 points (t_1, b_1), (t_2, b_2), ..., (t_{k+1}, b_{k+1})
in the plane with t_1, t_2, ..., t_{k+1} distinct, there is exactly one polynomial p ∈ P_k whose
graph passes through the points.
The polynomial q_1(t) = (t − t_2)(t − t_3)···(t − t_{k+1}) has the property that q_1(t_j) = 0 for
j = 2, 3, ..., k + 1, and q_1(t_1) = (t_1 − t_2)(t_1 − t_3)···(t_1 − t_{k+1}) ≠ 0 (why?). So now we
set

p_1(t) = (t − t_2)(t − t_3)···(t − t_{k+1}) / ((t_1 − t_2)(t_1 − t_3)···(t_1 − t_{k+1}));

then, as desired, p_1(t_1) = 1 and p_1(t_j) = 0 for j = 2, 3, ..., k + 1. Similarly, we can
define

p_2(t) = (t − t_1)(t − t_3)···(t − t_{k+1}) / ((t_2 − t_1)(t_2 − t_3)···(t_2 − t_{k+1})),
240 ► Chapter 5. Extremum Problems
and, in general,

p_i(t_j) = { 1, when i = j
           { 0, when i ≠ j.

Like the standard basis vectors in Euclidean space, p_1, p_2, ..., p_{k+1} are unit vectors in P_k
that are orthogonal to one another. It follows from Exercise 4.3.5 that these vectors form a
linearly independent set, hence a basis for P_k (why?). In Figure 5.6 we give the graphs of
the Lagrange basis polynomials p_1, p_2, p_3 for P_2 when t_1 = −1, t_2 = 0, and t_3 = 2.
Now, given b_1, b_2, ..., b_{k+1}, the polynomial

p = b_1 p_1 + b_2 p_2 + ··· + b_{k+1} p_{k+1}

has the desired properties: viz., p(t_j) = b_j for j = 1, 2, ..., k + 1. On the other hand,
two polynomials of degree ≤ k with the same values at k + 1 points must be equal, since
their difference is a polynomial of degree ≤ k with at least k + 1 roots. This establishes
uniqueness. (More elegantly, any polynomial q with q(t_j) = b_j, j = 1, ..., k + 1, must
satisfy ⟨q, p_j⟩ = b_j, j = 1, ..., k + 1.) ■
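The Lagrange basis construction is easy to carry out by machine; a short sketch (the function names are ours, not the text's):

```python
def lagrange_basis(ts, i):
    """Return p_i as a function: p_i(t_j) = 1 if i == j, else 0."""
    def p(t):
        val = 1.0
        for j, tj in enumerate(ts):
            if j != i:
                val *= (t - tj) / (ts[i] - tj)
        return val
    return p

def interpolate(ts, bs):
    """The unique polynomial of degree <= k with p(t_j) = b_j."""
    basis = [lagrange_basis(ts, i) for i in range(len(ts))]
    return lambda t: sum(b * p(t) for b, p in zip(bs, basis))
```

Using the nodes t_1 = −1, t_2 = 0, t_3 = 2 of Figure 5.6, each p_i equals 1 at its own node and 0 at the other two, and interpolate reproduces prescribed values exactly.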
► EXERCISES 5.5
1. Find the projection of the given vector b ∈ R^n onto the given hyperplane V ⊂ R^n.
(a) V = {x_1 + x_2 + x_3 = 0} ⊂ R³, b = (2, 1, 1)
*(b) V = {x_1 + x_2 + x_3 = 0} ⊂ R⁴, b = (0, 1, 2, 3)
2. Check from the formula P = A(AᵀA)⁻¹Aᵀ for the projection matrix that P = Pᵀ and P² = P.
Show that I − P has the same properties; explain.
3. Let V = Span((1, 0, 1), (0, 1, −2)) ⊂ R³. Construct the matrix [proj_V]
(a) by finding [proj_{V⊥}];
(b) by using the projection matrix P given in formula (†) on p. 228;
(c) by finding an orthogonal basis for V.
*4. (a) Find the least squares solution of
x_1 + x_2 = 4
2x_1 + x_2 = −2
x_1 − x_2 = 1.
(b) Find the point on the plane spanned by (1, 2, 1) and (1, 1, −1) that is closest to (4, −2, 1).
5. (a) Find the least squares solution of
x_1 + x_2 = 1
x_1 − 3x_2 = 4
2x_1 + x_2 = 3.
(b) Find the point on the plane spanned by (1, 1, 2) and (1, −3, 1) that is closest to (1, 4, 3).
6. Solve Exercise 5.4.26 anew, using (+) on p. 230.
7. Consider the four data points (−1, 0), (0, 3), (1, 1), (2, 5).
*(a) Find the least squares horizontal line y = a fitting the data points. Check that the sum of the
errors is 0.
(b) Find the least squares line y = ax + b fitting the data points. Check that the sum of the errors
is 0.
*(c) Find the least squares parabola y = ax² + bx + c fitting the data points. (Calculator recommended.)
What is true of the sum of the errors in this case?
8. Consider the four data points (1, 1), (2, 2), (3, 1), (4, 3).
(a) Find the least squares horizontal line y = a fitting the data points. Check that the sum of the
errors is 0.
(b) Find the least squares line y = ax + b fitting the data points. Check that the sum of the errors
is 0.
(c) Find the least squares parabola y = ax² + bx + c fitting the data points. (Calculator recommended.)
What is true of the sum of the errors in this case?
9. Derive the equation (*) on p. 227 by starting with the equation Ax = p and using the result of
Theorem 4.9 of Chapter 4.
10. Let V ⊂ R^n be a subspace. Prove from the definition of proj_V on p. 226 that
(a) proj_V(x + y) = proj_V x + proj_V y for all vectors x and y;
(b) proj_V(cx) = c proj_V x for all vectors x and scalars c;
(c) for any b ∈ R^n we have b = proj_V b + proj_{V⊥} b.
Parts a and b tell us that proj_V is a linear map.
11. Using the definition of projection on p. 226, prove that
(a) if [proj_V] = A, then A = A² and A = Aᵀ. (Hint: For the latter, show that Ax · y = x · Ay for all
x, y. It may be helpful to write x and y as the sum of vectors in V and V⊥.)
(b) if A² = A and A = Aᵀ, then A is a projection matrix. (Hints: First decide onto which subspace
it should be projecting. Then show that for all x, the vector Ax lies in that subspace and x − Ax is
orthogonal to that subspace.)
12. Execute the Gram-Schmidt process in each case to give an orthonormal basis for the subspace
"1 1 1 1
1 1
(b) (d) A = 1 1 3 -5
o 1 -1
2 2 4 -4
16. Let A be an n × n matrix and, as usual, let a_1, ..., a_n denote its column vectors.
(a) Suppose a_1, ..., a_n form an orthonormal set. Prove that A⁻¹ = Aᵀ.
*(b) Suppose a_1, ..., a_n form an orthogonal set and each is nonzero. Find the appropriate formula
for A⁻¹.
17. Let V = C⁰([−a, a]) with the inner product ⟨f, g⟩ = ∫_{−a}^{a} f(t)g(t) dt. Let U⁺ ⊂ V be the subset
of even functions, and let U⁻ ⊂ V be the subset of odd functions. That is, U⁺ = {f ∈ V : f(−t) =
f(t) for all t ∈ [−a, a]} and U⁻ = {f ∈ V : f(−t) = −f(t) for all t ∈ [−a, a]}.
(a) Prove that U⁺ and U⁻ are orthogonal subspaces of V.
(b) Use the fact that every function can be written as the sum of an even and an odd function, viz.,

f(t) = ½(f(t) + f(−t)) + ½(f(t) − f(−t)),

where the first summand is even and the second odd.
SOLVING NONLINEAR
PROBLEMS
In this brief chapter we introduce some important techniques for dealing with nonlinear
problems (and in the infinite-dimensional setting as well, although that is too far off-track
for us here). As we’ve said all along, we expect the derivative of a nonlinear function to
dictate locally how the function behaves. In this chapter we come to the rigorous treatment
of the inverse and implicit function theorems, to which we alluded at the end of Chapter
4, and to a few equivalent descriptions of a k-dimensional manifold, which will play a
prominent role in Chapter 8.
Proposition 1.1 Suppose {a_k} is a sequence of vectors in R^n and the series

Σ_{k=1}^∞ ||a_k||

converges (i.e., the sequence of partial sums s_k = ||a_1|| + ··· + ||a_k|| is a convergent sequence
of real numbers). Then the series

Σ_{k=1}^∞ a_k

converges as well.
Proof We first prove the result in the case n = 1. Given a sequence {a_k} of real
numbers, define b_k = a_k + |a_k|. Note that

b_k = { 2a_k, if a_k ≥ 0
      { 0,    otherwise.
1 The Contraction Mapping Principle ◄ 245
Now, the series Σ b_k converges by comparison with Σ 2|a_k|. (Directly: Since b_k ≥ 0,
the partial sums form a nondecreasing sequence that is bounded above by 2 Σ |a_k|. That
nondecreasing sequence must converge to its least upper bound. See Example 4c of Chapter
2, Section 2.) Since a_k = b_k − |a_k|, the series Σ a_k converges, being the sum of the two
convergent series Σ b_k and −Σ |a_k|.
We use this case to derive the general result. Denote by a_{k,j}, j = 1, ..., n, the jth
component of the vector a_k. Obviously, we have |a_{k,j}| ≤ ||a_k||. By comparison with
the convergent series Σ_k ||a_k||, for any j = 1, ..., n, the series Σ_k |a_{k,j}| converges, and
hence, by what we've just proved, so does the series Σ_k a_{k,j}. Since this is true for each
j = 1, ..., n, the series

Σ a_k = (Σ_k a_{k,1}, ..., Σ_k a_{k,n})

converges as well. ■
Remark The result holds even if we use something other than the Euclidean length in
R^n. For example, we can apply the result by using the norm defined on the vector space of
m × n matrices in Section 1 of Chapter 5, since the triangle inequality ||A + B|| ≤ ||A|| +
||B|| holds (see Proposition 1.3 of Chapter 5) and |a_ij| ≤ ||A|| for any matrix A = [a_ij]
(why?).
The following result is crucial in both pure and applied mathematics, and applies in
infinite-dimensional settings as well.
► EXAMPLE 1
Consider f: [0, π/3] → [0, 1] ⊂ [0, π/3] given by f(x) = cos x. Then by the mean value theorem,
for any x, y ∈ [0, π/3], |f(x) − f(y)| = |sin ξ| |x − y| for some ξ between x and y, so
|f(x) − f(y)| ≤ sin(π/3)|x − y| = (√3/2)|x − y|, and f is a contraction mapping.
x_{k+1} = f(x_k).
Our goal is to show that, inasmuch as f is a contraction mapping, this sequence converges
to some point x ∈ X. Then, by continuity of f (see Exercise 1), we will have f(x) = x.
Note that

x_k = x_0 + (x_1 − x_0) + (x_2 − x_1) + ··· + (x_k − x_{k−1}) = x_0 + Σ_{j=1}^{k} (x_j − x_{j−1}).

Accordingly, we define a_k = x_k − x_{k−1}
and try to determine whether the series Σ a_k converges. To this end, we wish to apply
Proposition 1.1, and so we begin by estimating ||a_k||: By the definition of the sequence {x_k}
and the definition of a contraction mapping, we have

||a_k|| = ||x_k − x_{k−1}|| = ||f(x_{k−1}) − f(x_{k−2})|| ≤ c||x_{k−1} − x_{k−2}|| = c||a_{k−1}||,

and so ||a_k|| ≤ c^{k−1}||a_1||. Therefore,

Σ_{k=1}^{K} ||a_k|| ≤ (Σ_{k=1}^{K} c^{k−1}) ||a_1|| = ((1 − c^K)/(1 − c)) ||a_1|| ≤ (1/(1 − c)) ||a_1||,

and so the series Σ ||a_k|| converges. By Proposition 1.1, we infer that the series Σ a_k
converges to some vector a ∈ R^n. It follows, then, that x_k → x_0 + a = x, as required.
Two issues remain. First, since x_k → x, all the x_k are elements of X, and X is closed,
we know that x ∈ X as well. The uniqueness of the fixed point is left to the reader in
Exercise 1. ■
► EXAMPLE 2
According to Theorem 1.2, the function f introduced in Example 1 must have a unique fixed point
in the interval [0, π/3]. Following the proof with x_0 = 0, we obtain the following values:
k xk k Xk
1 1. 11 0.744237
2 0.540302 12 0.735604
3 0.857553 13 0.741425
4 0.654289 14 0.737506
5 0.793480 15 0.740147
6 0.701368 16 0.738369
7 0.763959 17 0.739567
8 0.722102 18 0.738760
9 0.750417 19 0.739303
10 0.731404 20 0.738937
Indeed, as Figure 1.1 illustrates, the values x_k are converging to the x-coordinate of the intersection
of the graph of f(x) = cos x with the diagonal y = x.
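The iteration in the proof is a one-line loop; a minimal sketch reproducing the behavior of the table above (the tolerance and iteration cap are arbitrary choices of ours):

```python
import math

def fixed_point(f, x0, tol=1e-9, max_iter=1000):
    """Iterate x_{k+1} = f(x_k) until successive values agree to within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

root = fixed_point(math.cos, 0.0)  # converges slowly, as the table shows
```

The limit is the unique solution of cos x = x in [0, π/3], approximately 0.739085.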
Example 2 shows that this is a very slow method to obtain the solution of cos x = x.
Far better is Newton's method, familiar to every student of calculus. Given a differentiable
function g: R → R, we start at x_k, draw the tangent line to the graph of g at x_k, and let
x_{k+1} be the x-intercept of that tangent line, as shown in Figure 1.2. We obtain in this way
a sequence, and one hopes that if x_0 is sufficiently close to a root a, then the sequence will
converge to a. It is easy to see that the recursion formula for this sequence is

x_{k+1} = x_k − g(x_k)/g′(x_k),

so, in fact, we are looking for a fixed point of the mapping f(x) = x − g(x)/g′(x). If we assume
g is twice differentiable, then we find that f′ = gg″/(g′)², so f will be a contraction
mapping whenever |gg″/(g′)²| ≤ c < 1. In particular, if |g″| ≤ M and |g′| ≥ m, then
f will be a contraction mapping wherever |g| ≤ c m²/M.
248 ► Chapter 6. Solving Nonlinear Problems
► EXAMPLE 3
Reconsidering the problem of Example 2, let's use Newton's method to approximate the root of
cos x = x by taking g(x) = x − cos x and iterating the map

f(x) = x − (x − cos x)/(1 + sin x).
k    x_k         k    x_k
0    1.          0    0.523599
1    0.750364    1    0.751883
2    0.739113    2    0.739121
3    0.739085    3    0.739085
4    0.739085    4    0.739085
Here we see that, whether we start at x_0 = 1 or at x_0 = π/6, Newton's method converges to
the root quite rapidly. Indeed, on the interval [π/6, π/3], we have m = 1.5, M = .87, and |g| < .55,
which is far smaller than m²/M ≈ 2.6. ◄
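A sketch of the Newton iteration itself (the function names and tolerances are our choices):

```python
import math

def newton(g, gprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: iterate x_{k+1} = x_k - g(x_k)/g'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / gprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# the root of g(x) = x - cos x, as in Example 3
root = newton(lambda x: x - math.cos(x), lambda x: 1 + math.sin(x), 1.0)
```

Run from either starting point of the table, the iterates settle at 0.739085... after only a handful of steps, in contrast to the twenty slow steps of Example 2.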
By (*), we have ||g′(t)|| ≤ ||Df(a + t(b − a))|| ||b − a||, and so
► EXERCISES 6.1
1. Prove that any contraction mapping is continuous and has at most one fixed point.
2. Let f: R → R be given by f(x) = √(x² + 1). Show that f has no fixed point and that |f′(x)| < 1
for all x ∈ R. Why does this not contradict Theorem 1.2?
*3. For the sequence {x_k} defined in the proof of Theorem 1.2, prove that ||x_k − x|| ≤ (c^k/(1 − c))||x_1 − x_0||.
This gives an a priori estimate on how fast the sequence converges to the fixed point.
4. A sequence {x_k} of points in R^n is called a Cauchy sequence if for all ε > 0 there is K so that
whenever k, ℓ ≥ K, we have ||x_k − x_ℓ|| < ε. It is a fact that any Cauchy sequence in R^n is convergent.
(See Exercise 2.2.14.) Suppose 0 < c < 1 and {x_k} is a sequence of points in R^n so that ||x_{k+1} − x_k|| ≤
c||x_k − x_{k−1}|| for all k ∈ N. Prove that {x_k} is a Cauchy sequence, hence convergent. (Hint: Show
that whenever k, ℓ ≥ K, we have ||x_k − x_ℓ|| ≤ (c^K/(1 − c))||x_1 − x_0||.)
5. Use the result of Exercise 2.2.14 to give a different proof of Proposition 1.1.
6. (a) Show that if H is any square matrix with ||H|| < 1, then I − H is invertible. (Hint: Consider
the geometric series Σ H^k. You will need to use the result of Exercise 5.1.6.)
(b) Suppose, more generally, that A is an invertible n × n matrix. Show that when ||H|| < 1/||A⁻¹||,
the matrix A + H is invertible as well. (Hint: Write A + H = A(I + A⁻¹H).)
(c) Prove that the set of invertible n × n matrices is an open subset of M_{n×n} ≅ R^{n²}. This set
is denoted GL(n), the general linear group. (Hint: By Exercise 5.1.5, if (Σ h_ij²)^{1/2} < δ, then
||H|| < δ.)
7. Continuing Exercise 6:
(a) Show that if ||H|| ≤ ε < 1, then ||(I + H)⁻¹ − I|| ≤ ε/(1 − ε).
(b) More generally, if A is invertible and ||A⁻¹|| ||H|| ≤ ε < 1, then estimate ||(A + H)⁻¹ − A⁻¹||.
(c) Let X ⊂ M_{n×n} be the set of invertible n × n matrices (by Exercise 6, this is an open subset).
Prove that the function f: X → X, f(A) = A⁻¹, is continuous.
²We learned of the n-dimensional version of this result, which we give in Exercise 10, called Kantorovich's
Theorem, in Hubbard and Hubbard's Vector Calculus, Linear Algebra, and Differential Forms.
2 The Inverse and Implicit Function Theorems 251
(b) Let g: R² → R² be defined by g(x_1, x_2) = (x_1² + x_2² − 5, 2x_1x_2 − 1). Do one step of Newton's method to
solve g(x) = 0, starting at x_0 = , and find a ball in R² that is guaranteed to contain a root of g.
(c) Let g: R² → R² be defined by g(x_1, x_2) = (4 sin x_1 + x_2², 2x_1x_2 − 1). Do one step of Newton's method to
solve g(x) = 0, starting at x_0 = , and find a ball in R² that is guaranteed to contain a root of g.
r, - ?1 xr22
xi
(d) Let g: R2 -> R2 be defined by g . Do one step of Newton’s method to
X2 — COSX1
0
solve g(x) = 0, starting at Xq = , and find a ball in R2 that is guaranteed to contain a root of g.
1
12. Prove the following, slightly stronger version of Proposition 1.3. Suppose U ⊂ R^n is open,
f: U → R^m is differentiable, and a and b are points in U so that the line segment between them
is contained in U. Then prove that there is a point ξ on that line segment so that ||f(b) − f(a)|| ≤
||Df(ξ)|| ||b − a||. (Hints: Define g as before, let v = g(1) − g(0), and define φ: [0, 1] → R by
φ(t) = g(t) · v. Apply the usual mean value theorem and the Cauchy–Schwarz inequality, Proposition
2.3 of Chapter 1, to show that ||v||² = φ(1) − φ(0) ≤ ||g′(c)|| ||v|| for some c ∈ (0, 1).)
► EXAMPLE 1
Let f(x) = x/2 + x² sin(1/x) for x ≠ 0, and f(0) = 0. Then, calculating from the definition, we find
f′(0) = 1/2, while for x ≠ 0 we have f′(x) = 1/2 + 2x sin(1/x) − cos(1/x),
so there are points (e.g., x = 1/(2nπ) for any nonzero integer n) arbitrarily close to 0 where f′(x) < 0.
That is, despite the fact that f′(0) > 0, there is no interval around 0 on which f is increasing, as
Figure 2.1 suggests. Thus, f has no inverse on any neighborhood of 0.
All right, so we need a stronger hypothesis. If we assume f is C1, then it will follow
that if f'(a) > 0, then f' > 0 on an interval around a, and so f will be increasing—hence
invertible—on that interval. That is the result that generalizes nicely to higher dimensions.
Dg(y) = (Df(x))⁻¹.
Now, fix y with ||y|| < r/2, and define the function φ by
φ(x) = x − f(x) + y.
Note that ||Dφ(x)|| = ||Df(x) − I||. Whenever ||x|| < r, we have (by Proposition 1.3)
and so φ maps the closed ball B̄(0, r) to itself. Moreover, if x, y ∈ B̄(0, r), by Proposition
1.3 we have
Figure 2.2
so φ is a contraction mapping on B̄(0, r). By Theorem 1.2, φ has a unique fixed point
x_y ∈ B̄(0, r). That is, there is a unique point x_y ∈ B̄(0, r) so that f(x_y) = y. We leave it to
the reader to check in Exercise 10 that in fact x_y ∈ B(0, r).
As pictured in Figure 2.2, take W = B(0, r/2) and V = f⁻¹(W) ∩ B(0, r) (note that V
is open because f is continuous; see also Exercise 2.2.7). Define g: W → V by g(y) = x_y.
We claim first of all that g is continuous. Indeed, define ψ: B̄(0, r) → R^n by ψ(x) =
f(x) − x. Then, by Proposition 1.3 we have
Thus, we have

||(f(u) − f(v)) − (u − v)|| ≤ ½||u − v||,

and so
We consider instead the result of multiplying this quantity by (the fixed matrix) A:
We infer from (*) that ||h|| < 2||k||, so as k -> 0, it follows that h -> 0 as well. Note,
moreover, that h ≠ 0 when k ≠ 0 (why?). Now we analyze the final product above: The
first term approaches 0 by the differentiability of f; the second is bounded above by 2. Thus,
the product approaches 0, as desired.
The last order of business is to see that g is C¹. We have

Dg(y) = (Df(g(y)))⁻¹,
so we see that Dg is the composition of the function y ↦ Df(g(y)) and the function
A ↦ A⁻¹ on the space of invertible matrices. Since g is continuous and f is C¹, the former
is continuous. By Exercise 6.1.7, the latter is continuous (indeed, we will prove much
more in Corollary 5.19 of Chapter 7 when we study determinants in detail). Since the
composition of continuous functions is continuous, the function y ↦ Dg(y) is continuous,
as required. ■
Remark More generally, with a bit more work, one can show that if f is C^k (or
smooth), then the local inverse g is likewise C^k (or smooth).
It is important to remember that this theorem guarantees only a local inverse function.
It may be rather difficult to determine whether f is globally one-to-one. Indeed, as the
following example shows, even if Df is everywhere invertible, the function f may be very
much not one-to-one.
► EXAMPLE 2
Consider f: R² → R² given by f(u, v) = (e^u cos v, e^u sin v). Its derivative

Df(u, v) = [ e^u cos v   −e^u sin v
             e^u sin v    e^u cos v ]

is everywhere nonsingular since its determinant is e^{2u} ≠ 0. Nevertheless, since sine and cosine are
periodic, it is clear that f is not one-to-one: We have f(u, v) = f(u, v + 2πk) for any integer k.
The function

g(x, y) = (½ log(x² + y²), arctan(y/x))
Figure 2.3
certainly satisfies f∘g = id. So, why is g not the inverse function of f? Recall that
arctan: R → (−π/2, π/2). So, as shown in Figure 2.3, if we consider the domain of f to be
{(u, v) : −π/2 < v < π/2} and the domain of g to be the right half-plane {(x, y) : x > 0}, then f and g will be inverse
functions.
Let's calculate the derivative of any local inverse g according to Theorem 2.1. If f(u, v) = (x, y),
then

Dg(x, y) = (Df(u, v))⁻¹.

Note that we get the same formula by differentiating our specific inverse function (†). It is a bit
surprising that the derivative of any other inverse function, with different domain and range, must be
given by the identical formula.
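One can test the formula Dg(f(x)) = (Df(x))⁻¹ for this example numerically by approximating both derivative matrices with central differences and checking that their product is the identity. A sketch (the base point and step size are arbitrary choices of ours; atan2 agrees with arctan(y/x) on the right half-plane):

```python
import math

def f(u, v):
    return (math.exp(u) * math.cos(v), math.exp(u) * math.sin(v))

def g(x, y):
    return (0.5 * math.log(x * x + y * y), math.atan2(y, x))

def jacobian(F, p, h=1e-6):
    """Approximate the 2x2 derivative matrix of F at p by central differences."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        q_plus = list(p); q_plus[j] += h
        q_minus = list(p); q_minus[j] -= h
        Fp, Fm = F(*q_plus), F(*q_minus)
        for i in range(2):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J
```

At a point such as (u, v) = (0.3, 0.4), the product Dg(f(u, v)) · Df(u, v) comes out to be the 2 × 2 identity matrix up to finite-difference error, confirming the chain-rule formula.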
Now we are finally in a position to prove the Implicit Function Theorem, which first
arose in our informal discussion of manifolds in Section 5 of Chapter 4. It is without
question one of the most important theorems in higher mathematics.
Moreover,

Dφ(x) = −(∂F/∂y (x, φ(x)))⁻¹ (∂F/∂x (x, φ(x))).
Figure 2.4
is invertible (see Exercise 4.2.7). This means that—as illustrated in Figure 2.4—there
are neighborhoods V ⊂ R^k of x_0, W ⊂ R^m of y_0, and Z ⊂ R^n of (x_0, 0) and a C¹ function
g: Z → V × W so that g is an inverse of f on V × W. Now define φ: V → W by
Since we know that φ is C¹, we can calculate the derivative by implicit differentiation:
Define h: V → R^m by h(x) = F(x, φ(x)). Then h is C¹, and since h(x) = 0 for all x ∈ V,
we have
O = Dh(x) = ∂F/∂x (x, φ(x)) + ∂F/∂y (x, φ(x)) ∘ Dφ(x).

Since by hypothesis ∂F/∂y (x, φ(x)) is invertible, the desired result is immediate. ■
Remark With not much more work, one can prove analogously that if F is C^k (or
smooth), then y is given locally as a C^k (or smooth) function of x. We may take this for
granted in our later work.
► EXAMPLE 3
Consider the function F: R² → R, F(x, y) = x³e^y + 2x cos(xy). We assert that the equation
F(x, y) = 3 defines y locally as a function of x near the point (x_0, y_0) = (1, 0). By the Implicit
Function Theorem, Theorem 2.2, we need only check that ∂F/∂y (1, 0) ≠ 0. Well,

∂F/∂y = x³e^y − 2x² sin(xy),  and so  ∂F/∂y (1, 0) = 1 ≠ 0.

Moreover, ∂F/∂x = 3x²e^y + 2 cos(xy) − 2xy sin(xy), so ∂F/∂x (1, 0) = 5, and the slope of the
curve at (1, 0) is dy/dx = −(∂F/∂y)⁻¹(∂F/∂x) = −5;
so the line 5x + y = 5 is the desired tangent line of the curve at that point. ◄
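The assertion of Example 3 can be checked numerically: solve F(x, y) = 3 for y at points near x = 1 and difference the results. A sketch (the bisection solver, bracket, and tolerances are our choices, not the text's):

```python
import math

def F(x, y):
    """F(x, y) - 3, so the curve of Example 3 is the zero set."""
    return x ** 3 * math.exp(y) + 2 * x * math.cos(x * y) - 3.0

def solve_y(x, lo=-1.0, hi=1.0, n=60):
    """Bisection for the y with F(x, y) = 0, assuming a sign change on [lo, hi]."""
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if F(x, lo) * F(x, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Differencing y(1 + h) and y(1 − h) for small h recovers the slope −5 found by implicit differentiation.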
Figure 2.5
► EXAMPLE 4
Let

F(x_1, x_2, y_1, y_2, y_3) = ( 2x_1 + x_2 + y_1 + y_3 − 1,
                               x_1x_2 + x_1y_1 + x_2²y_2 − y_2y_3,
                               x_2y_1y_3 + x_1y_1 + y_2y_3 ),

and let a = (0, −1, 1, 1, 1). Does the equation F = 0 define y = (y_1, y_2, y_3) implicitly as a
function of x = (x_1, x_2) near a? Note first of all that F is C¹ and F(a) = 0. We calculate the
derivative of F and evaluate at a, obtaining

DF(a) = ⎡ 2   1   1   0   1 ⎤
        ⎢ 0  −2   0   0  −1 ⎥
        ⎣ 1   1  −1   1   0 ⎦,

and so

∂F/∂y (a) = ⎡  1   0   1 ⎤
            ⎢  0   0  −1 ⎥
            ⎣ −1   1   0 ⎦,

which is easily checked to be nonsingular, and so the hypotheses of the Implicit Function Theorem,
Theorem 2.2, are fulfilled. There is a neighborhood of a in which we have y = φ(x). Moreover, we
have

Dφ(x_0) = −(∂F/∂y (a))⁻¹ (∂F/∂x (a)).

With this information, we can easily give the tangent plane at a of the surface F = 0. ◄
Remark In general, we shall not always be so chivalrous (nor shall life) as to set
up the notation precisely as in the statement of Theorem 2.2. Just as in the case of linear
equations where the first r variables needn’t always be the pivot variables, here the last m
variables needn’t always be (locally) the dependent variables. In general, it is a matter of
finding m pivots in some m columns of the m x n derivative matrix.
EXERCISES 6.2
1. By applying the Inverse Function Theorem, Theorem 2.1, determine at which points x_0 the given
function f has a local C¹ inverse g, and calculate Dg(f(x_0)).
(b) f(x, y) = (x/(x² + y²), y/(x² + y²))
(c) f(x, y) = (x + h(y), y), for any C¹ function h: R → R
(e) f(x, y, z) = (x + y + z, xy + xz + yz, xyz) (cf. also Exercise 2)
2. Let U = {(u, v) : 0 < v < u}, and define f: U → R² by f(u, v) = (u + v, uv).
(a) Show that f has a global inverse function g. Determine the domain of g and an explicit formula
for g.
(b) Calculate Dg both directly and by the formula given in the Inverse Function Theorem. Compare
your answers.
(c) What does this exercise have to do with Example 2 in Chapter 4, Section 5? In particular, give a
concrete interpretation of your answer to part b.
3. Check that in each of the following cases, the equation F = 0 defines y locally as a C¹ function
φ(x) near a = (x_0, y_0), and calculate Dφ(x_0).
*(b) F(x_1, x_2, y) = e^{x_1 y} + y² cos(x_1x_2) − 1, x_0 = (1, 2), y_0 = 0
(c) F(x_1, x_2, y) = e^{x_1 y} + y² arctan x_2 − (1 + π/4), x_0 = (0, 1), y_0 = 1
(d) F(x, y_1, y_2) = (x² − y_1² − y_2² − 2, x − y_1 + y_2 − 2), x_0 = 2, y_0 = (1, 1)
.2 2
X2 ,Xq =
2
(e) F ,yo =
yi 2X1X2 + xj - 2y2 + 3y£ + 8 -1 1
\y2/
*4. Show that the equations x²y + xy² + t² − 1 = 0 and x² + y² − 2yt = 0 define x and y implicitly
as C¹ functions of t near Find the tangent line at this point to the curve so defined.
5. Let F(x, y, z) = x² + 2y² − 2xz − z² = 0. Show that near the point a = (1, 1, 1), z is given implicitly
as a C¹ function of x and y. Find the largest neighborhood of a on which this is true.
*6. Using the law of cosines (see Exercise 1.2.12) and Theorem 2.2, show that the angles of a triangle
are C1 functions of the sides. To a small change in which one of the sides (keeping the other two
fixed) is an angle most sensitive?
7. Define f: M_{n×n} → M_{n×n} by f(A) = A².
(a) By applying the Inverse Function Theorem, Theorem 2.1, show that every matrix B in a neighborhood
of I has (at least) two square roots A (i.e., A² = B), each varying as a C¹ function of B.
(See Exercise 3.1.13.)
(b) Can you decide if there are precisely two or more? (Hint: In the 2 × 2 case, what is
Df([1 0; 0 −1])?)
8. Suppose U ⊂ R³ is an open set, F: U → R is a C¹ function, and on F = 0 we have ∂F/∂p ≠ 0,
∂F/∂V ≠ 0, and ∂F/∂T ≠ 0. (You might use, as an example, the equation F(p, V, T) = pV − RT = 0 for
one mole of ideal gas; here R is the so-called gas constant.) Then it is a consequence of the Implicit
Function Theorem that in some neighborhood of (p_0, V_0, T_0), each of p, V, and T can be written as
a differentiable function of the remaining two variables. Physical chemists denote by (∂p/∂V)_T the
partial derivative of the function p = p(V, T) with respect to V, holding T constant, etc. Prove the
identity

(∂p/∂V)_T (∂V/∂T)_p (∂T/∂p)_V = −1.
9. Using the notation of Exercise 8, physical chemists define the expansion coefficient α and isothermal
compressibility β to be, respectively,

α = (1/V)(∂V/∂T)_p  and  β = −(1/V)(∂V/∂p)_T.
13. (The Envelope of a Family of Curves) Suppose f: R² × (a, b) → R is C² and for each t ∈
(a, b), ∇f(x, t) ≠ 0 on the level curve C_t = {x : f(x, t) = 0}. (Here the gradient denotes differentiation
with respect only to x.) The curve C is called the envelope of the family of curves {C_t : t ∈ (a, b)}
if each member of the family is tangent to C at some point (depending on t).
(a) Suppose the matrix

is nonsingular. Show that for some δ > 0, there is a C¹ curve g: (t_0 − δ, t_0 + δ) → R² so that

f(g(t), t) = 0  and  (∂f/∂t)(g(t), t) = 0.

Conclude that g is a parametrization of the envelope C near x_0.
(b) Find the envelopes of the following families of curves (portions of which are sketched in Figure
2.6).
ii.
iii.
► 3 MANIFOLDS REVISITED
In Chapter 4, we introduced k-dimensional manifolds in R^n informally as being locally the
graph of a C¹ function over an open subset of a k-dimensional coordinate plane. We suggested
that, because of the Implicit Function Theorem, under the appropriate hypotheses, a level
set of a C1 function is a prototypical example. Indeed, as we now wish to make clear, there
are three equivalent formulations, roughly these:
Explicit: Near each point, M is a graph over some k-dimensional coordinate plane.
Implicit: Near each point, M is the level set of some function whose derivative has
maximum rank.
Parametric: Near each point, M is the image of a one-to-one function, defined on an
open subset of R^k, whose derivative has maximum rank.
We’ve seen that the implicit formulation arises in working with Lagrange multipliers, and
the parametric formulation will be crucial for our work with integration in Chapter 8. In
this brief section, we are going to state the three definitions quite precisely and then prove
their equivalence in Theorem 3.1. To make our life easier in Chapter 8, we will replace the
C1 condition with “smooth.”
Figure 3.1
If the curious reader wonders why the last (and obviously technical) condition is in
cluded in the third definition, see Exercises 2 and 3.
3 Manifolds Revisited 263
Theorem 3.1 The three criteria given in this definition are all equivalent.
Proof The Implicit Function Theorem, Theorem 2.2, tells us precisely that (2)⇒(1).
And (1)⇒(3) is obvious, since we can set g(u) = (u, f(u)) (where, for ease of notation, we
assume here that R^k is the x_1 ··· x_k-plane). So it remains only to check that (3)⇒(2).
Suppose, as in the third definition, that we are given a neighborhood W ⊂ R^n of
p ∈ M so that M ∩ W is the image of a smooth function g: U → R^n for some open set
U ⊂ R^k, with the properties that g is one-to-one, rank(Dg(u)) = k for all u ∈ U, and
g⁻¹: M ∩ W → U is continuous. The last condition tells us that if g(u_0) = p, then
points sufficiently close to p in M must map by g⁻¹ close to u_0; that is, all points of M ∩ W
are the image under g of a neighborhood of u_0.
We may assume that g(0) = p and (renumbering coordinates in R^n as necessary)

Dg(0) = [ A
          B ],

where A is an invertible k × k matrix. We define G: V × R^{n−k} → R^n by G(u, v) = g(u) + (0, v). Since

DG(0, 0) = [ A    O
             B    I_{n−k} ]

is invertible (see Exercise 4.2.7), it follows from the Inverse Function Theorem, Theorem
2.1, that G has a smooth local inverse H = (H_1, H_2), with H_2 taking values in R^{n−k}, defined
on a neighborhood W̃ ⊂ W of p; set F = H_2.
Now suppose F(x) = 0. Since x ∈ W̃, x = G(u, v) for a unique vector (u, v) ∈ V × R^{n−k}. Then

F(x) = F(G(u, v)) = H_2(G(u, v)) = v,

so F(x) = 0 if and only if v = 0, which means that x = g(u). This proves that the equation
F = 0 defines that portion of M given by g(u) for u ∈ V. But because W̃ ⊂ W, we
know that such points comprise all of M ∩ W̃. ■
► EXAMPLE 1
Perhaps an explicit example will make this proof a bit more understandable. Suppose g: R -> R3 is
u
given by g(u) = «2 and M is the image of g. We wish to write M (perhaps locally) as the level
,3
264 ► Chapter 6. Solving Nonlinear Problems
u " 0 ' u
u2 + VI = U2 + Vi
u3 _ »2 _ _ u3 4- V2 _
6^ X
G1 y y — x2
The proof tells us to define F = Hi, and, indeed, this works. M is the zero-set of the function
M r _x2~
F: R3 -»R2 given by Fly = y \ .
I z
W L J
We ask the reader to carry this procedure out in Exercise 6 in a situation where it will only work
locally.
There are corresponding notions of the tangent space of the manifold M at p. (Recall
that we shall attempt to refer to the tangent space as a subspace, whereas the tangent plane
is obtained by translating it to pass through the point p.)
Definition If the manifold M is presented in the three respective forms above, then
its tangent space at p, denoted TPM, is defined as follows.
Once again, we need to check that these three recipes all give the same k-dimensional
subspace of R^n. The ideas involved in this check have all emerged already in the preceding
chapters. Since (1) is a special case of (3) (why?), we need only check that N(DF(p)) =
image(Dg(a)). Note that both of these are k-dimensional subspaces because of our rank
conditions on F and g. So it suffices to show that image(Dg(a)) ⊂ N(DF(p)). But this is
easy: The function F∘g: U → R^{n−k} is identically 0, so, by the chain rule, DF(p)∘Dg(a) =
O, which says precisely that any vector in the image of Dg(a) is in the kernel of DF(p).
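For the twisted cubic of Example 1, this chain-rule argument can be verified directly: F∘g ≡ 0, and the product DF(p)∘Dg(u) is the zero matrix, so the image of Dg lies in the kernel of DF. A sketch (the derivative matrices here are computed by hand from the formulas above):

```python
def g(u):
    # parametrization of the twisted cubic M
    return (u, u * u, u ** 3)

def F(x, y, z):
    # M is the zero set of F
    return (y - x * x, z - x ** 3)

def DF(x, y, z):
    # 2 x 3 derivative matrix of F
    return [[-2 * x, 1.0, 0.0],
            [-3 * x * x, 0.0, 1.0]]

def Dg(u):
    # 3 x 1 derivative (tangent vector) of g
    return [1.0, 2 * u, 3 * u * u]
```

At every parameter value u, DF(g(u)) applied to Dg(u) gives the zero vector, exhibiting Dg(u) as a basis vector for the tangent space of this 1-dimensional manifold.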
► EXERCISES 6.3
*1. Show that the set X = {(x, y) : y = |x|} is not a 1-dimensional manifold, even though the
function g(t) = (t³, |t³|) gives a C¹ "parametrization" of it. What's going on?
2. Show that the parametric curve g(t) = (cos 2t cos t, cos 2t sin t), t ∈ (−π/2, π/4), is not a 1-dimensional
manifold. (Hint: Stare at Figure 3.2.)
(b) parametric curve (cos t, 3 sin t)
(c) implicit curve x² + y² = 1, x² + y² + z² = 2x
(d) implicit curve x² + y² = 1, z² + w² = 1, xz + yw = 0
6. Suppose g: R → R³ is given by g(u) = (u + u², u², u³). Let M be the image of g.
(a) Show that g is globally one-to-one.
(b) Following the proof given of Theorem 3.1, find a neighborhood W of 0 e R3 and F: W -> R2
so that M n W = F-1(0).
7. Show the equivalence of the three definitions for each of the following 2-dimensional manifolds:
(a) implicit surface x² + y² = 1 (in R³)
(b) implicit surface x² + y² = z² (in R³ − {0})
*(c) parametric surface (u cos v, u sin v, v), u > 0, v ∈ R
(d) parametric surface (sin u cos v, sin u sin v, cos u)
(e) parametric surface (sin u cos v, sin u sin v, cos u), 0 < u < π, 0 < v < 2π
(f) parametric surface ((3 + 2 cos u) cos v, (3 + 2 cos u) sin v, 2 sin u), 0 < u, v < 2π
8. (a)
is a 2-manifold.
a smooth surface? Proof? Give the equation of its tangent space at such a point.
10. Prove that the equations

x_1² + x_2² + x_3² + x_4² = 4  and  x_1x_2 + x_3x_4 = 0

define a smooth surface in R⁴. Give a basis for its tangent space at
7
INTEGRATION
We turn now to the integral, with which, intuitively, we chop a large problem into small,
understandable bits and add them up, then proceed to a limit in some fashion. We
start with the definition and then proceed to the computation, which is, once again, based
on reducing the problem to several one-variable calculus problems. We then learn how
to exploit symmetry by using different coordinate systems and tackle various standard
physical applications (e.g., center of mass, moment of inertia, and gravitational attraction).
The discussion of determinants, initiated in Chapter 1, culminates here with a complete
treatment of their role in integration and the change of variables theorem.
We begin by trying to compute the volume of the region
in R³ lying under the graph z = f(x, y) and over the rectangle R = [a, b] × [c, d] in the
xy-plane. Once we see how partitions, upper and lower sums, and the integral are defined
for rectangles in R², then it is simple (although notationally discomforting) to generalize to
higher dimensions.
Let M_ij = sup_{x∈R_ij} f(x) and m_ij = inf_{x∈R_ij} f(x), as indicated in Figure 1.1. Define the upper
sum of f with respect to the partition 𝒫,

U(f, 𝒫) = Σ_{i,j} M_ij area(R_ij),

and the analogous lower sum

L(f, 𝒫) = Σ_{i,j} m_ij area(R_ij).
Figure 1.1
(Note that the inequality L(f, P) ≤ U(f, P) is obvious, as m_ij ≤ M_ij for all i and j.)
► EXAMPLE 1
Let f be a constant function, viz., f(x) = a for all x ∈ R. Then for any partition P of R we have L(f, P) = a·area(R) = U(f, P), so f is integrable on R and ∫_R f dA = a·area(R). ◄
We will usually suppress all the subscripts and just refer to the partition as {R_i}. We define the volume of a rectangle R = [a₁, b₁] × [a₂, b₂] × ··· × [aₙ, bₙ] ⊂ Rⁿ to be
vol(R) = (b₁ − a₁)(b₂ − a₂)···(bₙ − aₙ).
Then upper sums, lower sums, and the integral are defined as before, substituting volume (of a rectangle in Rⁿ) for area (of a rectangle in R²). In dimensions n ≥ 3, we denote the integral by ∫_R f dV.
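To make the definitions concrete, here is a small numerical sketch of ours (not from the text) that computes upper and lower sums for f(x, y) = x + y on [0, 1] × [0, 1] with a uniform partition. Since this f increases in each variable, its extrema on every subrectangle occur at corners, so corner sampling is exact here.

```python
def upper_lower_sums(f, a, b, c, d, n):
    """Upper and lower sums of f over [a,b] x [c,d] for the uniform n-by-n
    partition, sampling the four corners of each subrectangle (exact for
    functions monotone in each variable, such as f(x, y) = x + y)."""
    hx, hy = (b - a) / n, (d - c) / n
    upper = lower = 0.0
    for i in range(n):
        for j in range(n):
            corners = [f(a + (i + di) * hx, c + (j + dj) * hy)
                       for di in (0, 1) for dj in (0, 1)]
            upper += max(corners) * hx * hy
            lower += min(corners) * hx * hy
    return lower, upper

L, U = upper_lower_sums(lambda x, y: x + y, 0, 1, 0, 1, 50)
print(L, U)  # L <= 1 <= U, and U - L shrinks as the partition is refined
```

The true integral is 1, and refining the partition squeezes U − L toward 0, exactly as the definition of integrability demands.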
We need some criteria for detecting integrability of functions. Then we will soon find that we can evaluate integrals by reverting to our techniques from one-variable calculus.
Figure 1.2
Lemma 1.1 Let P and P′ be partitions of a given rectangle R, and suppose P is a refinement of P′. Suppose f is a bounded function on R. Then we have
L(f, P′) ≤ L(f, P) ≤ U(f, P) ≤ U(f, P′).
Proof It suffices to check the following: Let Q be a single rectangle, and let Q = {Q₁, ..., Q_r} be a partition of Q. Let m = inf_{x∈Q} f(x), m_i = inf_{x∈Q_i} f(x), M = sup_{x∈Q} f(x), and M_i = sup_{x∈Q_i} f(x). Then we claim that
m area(Q) ≤ Σ_{i=1}^r m_i area(Q_i) ≤ Σ_{i=1}^r M_i area(Q_i) ≤ M area(Q).
This is immediate from the fact that m ≤ m_i ≤ M_i ≤ M for all i = 1, ..., r. ■
Corollary 1.2 If P′ and P″ are two partitions of R, we have L(f, P′) ≤ U(f, P″).
Proof Let P be the partition of R formed by taking the union of the respective partitions in each coordinate, as indicated in Figure 1.3. P is called the common refinement of P′ and P″. Then by Lemma 1.1, we have
L(f, P′) ≤ L(f, P) ≤ U(f, P) ≤ U(f, P″),
as required. ■
Figure 1.3
Proposition 1.3 Let f be a bounded function on the rectangle R. Then f is integrable on R if and only if for every ε > 0 there is a partition P of R with U(f, P) − L(f, P) < ε.
Proof ⇐: Suppose there were two different numbers I₁ and I₂ satisfying L(f, P) ≤ I_j ≤ U(f, P) for all partitions P. Choosing ε = |I₁ − I₂| yields a contradiction.
⇒: Now suppose f is integrable, so that there is a unique number I satisfying L(f, P) ≤ I ≤ U(f, P) for all partitions P. Given ε > 0, we can find partitions P′ and P″ so that
I − L(f, P′) < ε/2 and U(f, P″) − I < ε/2.
(If we could not get as close as desired to I with upper and lower sums, we would violate uniqueness of I.) Let P be the common refinement of P′ and P″. Then
L(f, P′) ≤ L(f, P) ≤ U(f, P) ≤ U(f, P″),
so
U(f, P) − L(f, P) ≤ U(f, P″) − L(f, P′) < ε,
as required. ■
We need to be aware of the basic properties of the integral (which we leave to the reader
as exercises).
Proposition 1.6 Suppose R = R′ ∪ R″ is the union of two subrectangles. Then f is integrable on R if and only if f is integrable on both R′ and R″, in which case we have
∫_R f dV = ∫_{R′} f dV + ∫_{R″} f dV.
Proposition 1.7 Suppose f is continuous on the rectangle R. Then f is integrable on R.
Proof Given ε > 0, we must find a partition P of R so that U(f, P) − L(f, P) < ε. Since f is continuous on the compact set R, it follows from Theorem 1.4 of Chapter 5 that f is uniformly continuous. That means that given any ε > 0, there is δ > 0 so that whenever ‖x − y‖ < δ, x, y ∈ R, we have |f(x) − f(y)| < ε/vol(R). Partition R into subrectangles R_i, i = 1, ..., k, of diameter less than δ (e.g., whose sidelengths are less than δ/√n). Then on any such subrectangle R_i we will have M_i − m_i ≤ ε/vol(R), and so
U(f, P) − L(f, P) = Σ_{i=1}^k (M_i − m_i)vol(R_i) ≤ (ε/vol(R))·vol(R) = ε,
as needed. ■
Definition We say X ⊂ Rⁿ has (n-dimensional) volume zero if for every ε > 0, there are finitely many rectangles R₁, ..., R_s so that X ⊂ R₁ ∪ ··· ∪ R_s and Σ_{i=1}^s vol(R_i) < ε.
Proposition 1.8 Suppose f is bounded on the rectangle R and continuous except on a set X of volume zero. Then f is integrable on R.
Proof Let ε > 0 be given. We must find a partition P of R so that U(f, P) − L(f, P) < ε. Since f is bounded, there is a real number M so that |f| ≤ M. Because X has volume zero, we can find finitely many rectangles R′₁, ..., R′_s, as shown in Figure 1.4, that cover X and satisfy Σ vol(R′_j) < ε/4M. We can also ensure that no point of X is a frontier point of the union of these rectangles (see Exercise 2.2.8). Now create a partition of R in such a way that each of R′_j, j = 1, ..., s, will be a union of subrectangles of this partition, as shown in Figure 1.5. Consider the closure Y of R − ⋃_{j=1}^s R′_j; Y, too, is compact, and f is continuous on Y, hence uniformly continuous. Proceeding as in the proof of Proposition 1.7, we can refine the partition to obtain a partition P = {R₁, ..., R_k} of R with the property that
Σ_{R_i ⊂ Y} (M_i − m_i)vol(R_i) < ε/2.
The remaining subrectangles R_i lie in ⋃ R′_j, so their total volume is less than ε/4M, and on each of them M_i − m_i ≤ 2M; thus they contribute less than 2M·(ε/4M) = ε/2. Altogether, U(f, P) − L(f, P) < ε, as needed. ■
Definition Let Ω ⊂ Rⁿ be a bounded subset and f: Ω → R a bounded function. Choose a rectangle R ⊃ Ω, and define f̄: R → R by
f̄(x) = f(x) if x ∈ Ω, and 0 otherwise.
We say f is integrable on Ω if f̄ is integrable on R, and we define
∫_Ω f dV = ∫_R f̄ dV.
(We leave it to the reader to check in Exercise 8 that this is well defined.)
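The extension-by-zero recipe is easy to mimic numerically. The following sketch (our illustration; the function names are invented) integrates f̄ over a bounding rectangle with a midpoint rule, here computing the area of the unit disk as the integral of the constant 1 over the disk:

```python
def integrate_over_region(f, in_region, a, b, c, d, n):
    """Midpoint-rule approximation of the integral of f over a region
    Omega inside the rectangle [a,b] x [c,d], via the extension of f by
    zero: sample cell centers and keep only those lying in Omega."""
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x, y = a + (i + 0.5) * hx, c + (j + 0.5) * hy
            if in_region(x, y):      # f-bar vanishes off Omega
                total += f(x, y) * hx * hy
    return total

area = integrate_over_region(lambda x, y: 1.0,
                             lambda x, y: x * x + y * y <= 1,
                             -1, 1, -1, 1, 400)
print(area)  # close to pi = 3.14159...
```

The discontinuity of f̄ along the circle is harmless, just as Proposition 1.8 predicts: the frontier has volume zero, so its contribution shrinks as the grid is refined.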
Proof Recall that to integrate f over Ω we must integrate f̄, as defined above, over some rectangle R containing Ω. The function f̄ is continuous on all of R except possibly on the frontier of Ω, which is a set of volume zero, so the result follows from Proposition 1.8. ■
Proposition 1.11 Suppose f and g are integrable functions on the region Ω and f ≤ g. Then
∫_Ω f dV ≤ ∫_Ω g dV.
Proof Let R be a rectangle containing Ω and let f̄ and ḡ be the functions as defined above. Then we have f̄ ≤ ḡ everywhere on R. Applying Propositions 1.4 and 1.5, the function h = ḡ − f̄ is integrable and ∫_R h dV = ∫_Ω g dV − ∫_Ω f dV. On the other hand, since h ≥ 0, for any partition P of R the lower sum L(h, P) ≥ 0, and therefore ∫_R h dV ≥ 0. The desired result now follows immediately. ■
► EXERCISES 7.1
*1. Suppose f(x, y) = 0 for y < 1/2 and f(x, y) = 1 for y ≥ 1/2. Prove that f is integrable on R = [0, 1] × [0, 1] and find ∫_R f dA.
2. Show directly that the function
f(x, y) = 1 if x = y, and 0 otherwise,
is integrable on R = [0, 1] × [0, 1] and find ∫_R f dA. (Hint: Partition R into 1/N by 1/N squares.)
3. Show directly that the function
f(x, y) = 1 if y ≤ x, and 0 otherwise,
is integrable on R = [0, 1] × [0, 1] and find ∫_R f dA.
X ~ 2’ 3’ 4’ 5’
otherwise
*8. Check that ∫_Ω f dV is well defined. That is, if R and R′ are two rectangles containing Ω and f̄ and f̄′ are the corresponding functions, check that f̄ is integrable over R if and only if f̄′ is integrable over R′ and that ∫_R f̄ dV = ∫_{R′} f̄′ dV.
9. (a) Prove Proposition 1.4. (Hint: If P = {R_i} is a partition and m_i^f, m_i^g, m_i^{f+g}, M_i^f, M_i^g, M_i^{f+g} denote the obvious, show that
m_i^f + m_i^g ≤ m_i^{f+g} ≤ M_i^{f+g} ≤ M_i^f + M_i^g.
It will also be helpful to see that ∫_R f dV + ∫_R g dV is the unique number between L(f, P) + L(g, P) and U(f, P) + U(g, P) for all partitions P.)
(b) Prove Proposition 1.5.
(c) Prove Proposition 1.6.
#10. Suppose f is integrable on R. Given ε > 0, prove there is δ > 0 so that whenever all the rectangles of a partition P have diameter less than δ, we have U(f, P) − L(f, P) < ε. (Hint: By Proposition 1.3, there is a partition P′ (as indicated by the darker lines in Figure 1.6) so that U(f, P′) − L(f, P′) < ε/2. Show that covering the dividing hyperplanes (of total area A) of the partition by rectangles of diameter < δ requires at most volume Aδ/√n. If |f| ≤ M, then we can pick δ so that that total volume is at most ε/4M. Show that this δ works.)
Figure 1.6
#11. Let X ⊂ Rⁿ be a set of volume 0.
(a) Show that for every ε > 0, there are finitely many cubes C₁, ..., C_r so that X ⊂ C₁ ∪ ··· ∪ C_r and Σ_{i=1}^r vol(C_i) < ε. (Hint: If R is a rectangle with vol(R) < δ, show that there is a rectangle R′ containing R with vol(R′) < δ and whose sidelengths are rational numbers.)
(b) Let T: Rⁿ → Rⁿ be a linear map. Prove that T(X) has volume 0 as well. (Hint: Show that there is a constant k so that for any cube C, the image T(C) is contained in a cube whose volume is at most k times the volume of C.) Query: What goes wrong with this if T: Rⁿ → Rᵐ and m < n?
#12. Let m < n, let X ⊂ Rᵐ be compact, and let U ⊂ Rᵐ be an open set containing X. Suppose φ: U → Rⁿ is C¹. Prove φ(X) has volume 0 in Rⁿ. (Hints: Take X ⊂ C, where C is a cube. Show that if N is sufficiently large and we divide C into Nᵐ subcubes, then X is covered by such cubes all contained in U, and φ(X) will be contained in at most Nᵐ cubes in Rⁿ. Argue by continuity of Dφ that there is a constant k (not depending on N) so that each of these will have volume less than (k/N)ⁿ.)
13. We've seen in Proposition 1.8 a sufficient condition for f to be integrable. Show that it isn't necessary by considering the famous function f: [0, 1] → R given by
f(x) = 1/q if x = p/q in lowest terms, and 0 otherwise.
(Hint: Why is Q ∩ [0, 1] not a set of length zero?)
14. A subset X ⊂ Rⁿ has measure zero if, given any ε > 0, there is a sequence of rectangles R₁, R₂, R₃, ..., R_k, ..., so that
X ⊂ ⋃_{i=1}^∞ R_i and Σ_{i=1}^∞ vol(R_i) < ε.
(c) Prove that if X is compact and has measure 0, then X has volume 0. (Hint: See Exercise 5.1.12.)
(d) Suppose X₁, X₂, ... is a sequence of sets of measure 0. Prove that ⋃_{i=1}^∞ X_i has measure 0.
15. In this (somewhat challenging) exercise, we discover precisely which bounded functions are integrable. Let f: R → R be a bounded function.
(a) Let a ∈ R and δ > 0. Define
M(f, a, δ) = sup_{x∈B(a,δ)∩R} f(x)
► 2 ITERATED INTEGRALS AND FUBINI'S THEOREM

To compute the volume of the region in R³ lying under the graph z = f(x, y) and over the rectangle R = [a, b] × [c, d], we could slice by planes perpendicular to the x-axis, as shown in Figure 2.1.
Figure 2.1
The area of the slice at x (with x held fixed) is ∫_c^d f(x, y) dy, and integrating the slice areas gives
volume = ∫_a^b (∫_c^d f(x, y) dy) dx = ∫_a^b ∫_c^d f(x, y) dy dx.
This expression is called an iterated integral. Perhaps it would be more suggestive to call it a nested integral. Calculating iterated integrals reverts to one-variable calculus skills (finding antiderivatives and applying the Fundamental Theorem of Calculus), along with a healthy dose of neat bookkeeping.
► EXAMPLE 1
∫₀¹ ∫₀¹ (1 + x² + xy) dy dx = ∫₀¹ [(1 + x²)y + ½xy²]_{y=0}^{y=1} dx = ∫₀¹ (1 + x² + ½x) dx = 1 + 1/3 + 1/4 = 19/12. ◄
► EXAMPLE 2
∫_{−1}^{1} ∫₀² xy e^{x+y²} dx dy = ∫_{−1}^{1} [y e^{y²} (x − 1)e^x]_{x=0}^{x=2} dy   (recalling that ∫ x e^x dx = x e^x − e^x)
= ∫_{−1}^{1} y e^{y²}(e² + 1) dy = ½(e² + 1) e^{y²}]_{−1}^{1} = 0.
Suppose instead we integrate first with respect to y:
∫₀² ∫_{−1}^{1} xy e^{x+y²} dy dx = ∫₀² (x e^x)(∫_{−1}^{1} y e^{y²} dy) dx = 0.
More to the point, we should observe that for fixed x, the function (x e^x)(y e^{y²}) is an odd function of y, and hence the integral as y varies from −1 to 1 must be 0. ◄
We shall prove in a moment that for reasonable functions the iterated integrals in either order are equal, and so it behooves us to think a minute about symmetry (or about the difficulty of finding an antiderivative) and choose the more convenient order of integration.
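The odd-symmetry argument is easy to confirm numerically. In this sketch of ours (not from the text), a midpoint-rule sum over a y-symmetric grid cancels in pairs, so the estimate is zero up to rounding:

```python
import math

# Midpoint-rule check of Example 2: the double integral of
# x*y*exp(x + y^2) over [0,2] x [-1,1] vanishes, since for each fixed x
# the integrand is an odd function of y.
def midpoint_double(f, xa, xb, ya, yb, n):
    hx, hy = (xb - xa) / n, (yb - ya) / n
    return sum(f(xa + (i + 0.5) * hx, ya + (j + 0.5) * hy) * hx * hy
               for i in range(n) for j in range(n))

val = midpoint_double(lambda x, y: x * y * math.exp(x + y * y),
                      0, 2, -1, 1, 200)
print(val)  # ≈ 0 (the symmetric grid cancels exactly, up to rounding)
```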
► EXAMPLE 3
Suppose we wish to find the volume of the region lying over the triangle Ω ⊂ R² with vertices at (0, 0), (1, 0), and (1, 1) and bounded above by z = f(x, y) = xy. Then we wish to find the integral of f over the region Ω. By definition, we consider Ω as a subset of, say, the square R = [0, 1] × [0, 1] and define f̄ on R by
f̄(x, y) = xy if (x, y) ∈ Ω, and 0 otherwise,
whose graph is sketched in Figure 2.2. Note that for x fixed, f̄(x, y) = xy when 0 ≤ y ≤ x and is 0 otherwise. So
∫₀¹ f̄(x, y) dy = ∫₀^x xy dy + ∫_x^1 0 dy = ∫₀^x xy dy = ½x³.
Thus, we have
∫_Ω xy dA = ∫₀¹ ½x³ dx = 1/8. ◄
Figure 2.2
► EXAMPLE 4
Suppose we slice into a cylindrical tree trunk, x² + y² ≤ a², and remove the wedge bounded below by z = 0 and above by z = y, as depicted in Figure 2.3. What is the volume of the chunk we remove?
We see that the plane z = y lies above the plane z = 0 when y > 0, so we let Ω = {(x, y) : x² + y² ≤ a², y ≥ 0}, as indicated in Figure 2.4, and to obtain the volume we calculate:
∫_Ω y dA = ∫_{−a}^{a} ∫₀^{√(a²−x²)} y dy dx = ∫_{−a}^{a} ½(a² − x²) dx = ⅔a³. ◄
Figure 2.3
The fact that we can compute volume by using either a multiple integral or an iterated integral suggests that, at least for "reasonable" functions, we should in general be able to calculate multiple integrals by computing iterated integrals. The crucial theorem that allows us to calculate multiple integrals with relative ease is the following.
Theorem 2.1 (Fubini's Theorem) Suppose f is integrable on a rectangle R = [a, b] × [c, d] ⊂ R². Suppose that for each x ∈ [a, b], the function f(x, ·) is integrable on [c, d]; i.e., F(x) = ∫_c^d f(x, y) dy exists. Suppose next that the function F is integrable on [a, b]. Then
∫_R f dA = ∫_a^b F(x) dx = ∫_a^b (∫_c^d f(x, y) dy) dx.
Proof Let P be an arbitrary partition of R into rectangles R_ij = [x_{i−1}, x_i] × [y_{j−1}, y_j]. For any x ∈ [x_{i−1}, x_i] we have m_ij ≤ f(x, y) ≤ M_ij whenever y ∈ [y_{j−1}, y_j], whence
Σ_j m_ij (y_j − y_{j−1}) ≤ F(x) = ∫_c^d f(x, y) dy ≤ Σ_j M_ij (y_j − y_{j−1})
for every x ∈ [x_{i−1}, x_i]. Multiplying by (x_i − x_{i−1}) and summing over i, we find
L(f, P) = Σ_i Σ_j m_ij (y_j − y_{j−1})(x_i − x_{i−1}) ≤ L(F, P̃) ≤ U(F, P̃) ≤ Σ_i Σ_j M_ij (y_j − y_{j−1})(x_i − x_{i−1}) = U(f, P),
where P̃ is the partition of [a, b] given by the x_i. Thus both ∫_R f dA and ∫_a^b F(x) dx lie between L(f, P) and U(f, P) for every partition P; since f is integrable, they must be equal. ■
Corollary 2.2 Suppose f is integrable on the rectangle R = [a, b] × [c, d] and the iterated integrals
∫_a^b ∫_c^d f(x, y) dy dx and ∫_c^d ∫_a^b f(x, y) dx dy
both exist. (That is, for each x, the integral ∫_c^d f(x, y) dy exists and defines a function of x that is integrable on [a, b]. And, likewise, for each y, the integral ∫_a^b f(x, y) dx exists and defines a function of y that is integrable on [c, d].) Then
∫_a^b ∫_c^d f(x, y) dy dx = ∫_R f dA = ∫_c^d ∫_a^b f(x, y) dx dy.
The analogous result holds in higher dimensions: Suppose f is integrable on R = [a₁, b₁] × ··· × [aₙ, bₙ] and the appropriate iterated integrals all exist. Then the multiple integral and the iterated integral are equal:
∫_R f(x) dV = ∫_{a₁}^{b₁} ··· ∫_{aₙ}^{bₙ} f(x) dxₙ ··· dx₁.
(The same is true for the iterated integral in any order, provided all the intermediate integrals exist.) In particular, whenever f is continuous on R, then the multiple integral equals any of the n! possible iterated integrals.
► EXAMPLE 5
It is easy to find a function f on the rectangle R = [0, 1] × [0, 1] that is integrable but whose iterated integral doesn't exist. Take
f(x, y) = 1 if x = 0 and y ∈ Q, and 0 otherwise.
Since f vanishes except on a set of volume zero, f is integrable and ∫_R f dA = 0; but for x = 0 the inner integral ∫₀¹ f(0, y) dy does not exist. ◄
► EXAMPLE 6
It is somewhat harder to find a function whose iterated integral exists but that is not integrable. Let
f(x, y) = 1 if y ∈ Q, and 2x if y ∉ Q.
Then ∫₀¹ f(x, y) dx = 1 for every y ∈ [0, 1], so the iterated integral ∫₀¹ ∫₀¹ f(x, y) dx dy exists and equals 1. Whether f is integrable on R = [0, 1] × [0, 1] is more subtle. Probably the easiest way to see that it is not is this: If it were, by Proposition 1.6, then it would also be integrable on R′ = [0, ½] × [0, 1]. For any partition P of R′, we have U(f, P) = ½, whereas we can make L(f, P) as close to ∫₀^{1/2} 2x dx = ¼ as we wish.
We ask the reader to decide in Exercise 4 whether the other iterated integral, ∫₀¹ ∫₀¹ f(x, y) dy dx, exists. ◄
► EXAMPLE 7
More subtle yet is a nonintegrable function on R = [0, 1] × [0, 1] both of whose iterated integrals exist. Define
f(x, y) = 1 if x = p/q and y = k/q are both in lowest terms with the same denominator q, and 0 otherwise.
First of all, f is not integrable on R since L(f, P) = 0 and U(f, P) = 1 for every partition P of R (see Exercise 5). Next, we claim that for any x, ∫₀¹ f(x, y) dy exists and equals 0. When x ∉ Q, this is obvious. When x = p/q, only for finitely many y ∈ [0, 1] is f(x, y) not equal to 0, and so the integral exists. Obviously, then, the iterated integral ∫₀¹ ∫₀¹ f(x, y) dy dx exists. The same argument applies in the other order. ◄
► EXAMPLE 8
(Changing the Order of Integration) You are asked to evaluate the iterated integral
∫₀¹ ∫_y^1 (sin x)/x dx dy.
It is a classical fact that ∫ (sin x)/x dx cannot be evaluated in elementary terms, and so (other than resorting to numerical integration) we are stymied. To be careful, we define
f(x, y) = (sin x)/x if x ≠ 0, and 1 if x = 0.
Then f is continuous, and we recognize (applying Theorem 2.1) that the iterated integral is equal to the double integral ∫_Ω f dA, where
Ω = {(x, y) : 0 ≤ y ≤ 1, y ≤ x ≤ 1},
which is the triangle pictured in Figure 2.5. Once we have a picture of Ω, we see that we can equally well represent it in the form
Ω = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ x}.
Changing the order of integration, we obtain
∫_Ω f dA = ∫₀¹ (∫₀^x (sin x)/x dy) dx = ∫₀¹ ((sin x)/x)·x dx = ∫₀¹ sin x dx = 1 − cos 1.
The moral of this story is that, when confronted by an iterated integral that cannot be evaluated in elementary terms, it doesn't hurt to change the order of integration and see what happens. ◄
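The reversed-order computation can be checked numerically. In this sketch of ours (not the text's), the inner y-integral is carried out exactly (it contributes sin x), leaving a one-variable midpoint sum that should approach 1 − cos 1 ≈ 0.4597:

```python
import math

# Check of Example 8: integrating in the reversed order, the inner
# y-integral of sin(x)/x over 0 <= y <= x equals sin(x), so the whole
# integral is the one-variable integral of sin(x) over [0, 1].
def sliced_integral(n):
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h           # midpoint; x > 0, so sin(x)/x is safe
        total += (math.sin(x) / x) * x * h
    return total

val = sliced_integral(1000)
print(val, 1 - math.cos(1))  # both ≈ 0.459698
```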
► EXAMPLE 9
Let Ω ⊂ R³ be the region in the first octant bounded below by the paraboloid z = x² + y² and above by the plane z = 4, shown in Figure 2.6. Evaluate ∫_Ω x dV. It is most natural to integrate first with respect to z; notice that the projection of Ω onto the xy-plane is the quarter of the disk of radius 2 centered at the origin lying in the first quadrant. For each point (x, y) in that quarter-disk, z varies from x² + y² to 4. Thus, we have
∫_Ω x dV = ∫₀² ∫₀^{√(4−x²)} ∫_{x²+y²}^{4} x dz dy dx = ∫₀² ∫₀^{√(4−x²)} x(4 − x² − y²) dy dx
= ∫₀² ⅔ x(4 − x²)^{3/2} dx = 64/15.
We will revisit this example in Section 3.
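A quick numerical sanity check (ours, not the book's): once z is integrated out exactly, the rest is a double integral over the quarter-disk, which a midpoint sum approximates well.

```python
# Check of Example 9: after integrating out z exactly, the triple integral
# reduces to the double integral of x*(4 - x^2 - y^2) over the quarter-disk
# x, y >= 0, x^2 + y^2 <= 4; a midpoint sum should approach 64/15.
def paraboloid_integral(n):
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = (i + 0.5) * h
            y = (j + 0.5) * h
            if x * x + y * y <= 4:
                total += x * (4 - x * x - y * y) * h * h
    return total

val = paraboloid_integral(800)
print(val, 64 / 15)  # both ≈ 4.2667
```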
Figure 2.6
► EXAMPLE 10
Let Ω = {x ∈ Rⁿ : 0 ≤ xₙ ≤ xₙ₋₁ ≤ ··· ≤ x₂ ≤ x₁ ≤ 1}. This region is pictured in the case n = 3 in Figure 2.7. Then
vol(Ω) = ∫₀¹ ∫₀^{x₁} ··· ∫₀^{xₙ₋₁} dxₙ ··· dx₂ dx₁
= ∫₀¹ ∫₀^{x₁} ··· ∫₀^{xₙ₋₂} xₙ₋₁ dxₙ₋₁ ··· dx₂ dx₁
= ∫₀¹ ∫₀^{x₁} ··· ∫₀^{xₙ₋₃} ½xₙ₋₂² dxₙ₋₂ ··· dx₂ dx₁
= ··· = ∫₀¹ x₁ⁿ⁻¹/(n−1)! dx₁ = 1/n!. ◄
Figure 2.7
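The value 1/n! also has a probabilistic reading: a uniformly random point of the unit cube lands in Ω exactly when its coordinates happen to be sorted, which occurs for 1 of the n! orderings. A seeded Monte Carlo sketch (ours, as illustration) confirms this for n = 3:

```python
import random

# Sanity check of Example 10 for n = 3: the region
# 0 <= x3 <= x2 <= x1 <= 1 should fill 1/3! = 1/6 of the unit cube.
random.seed(0)                       # seeded for reproducibility
trials = 200_000
hits = 0
for _ in range(trials):
    x1, x2, x3 = random.random(), random.random(), random.random()
    if x3 <= x2 <= x1:
        hits += 1
frac = hits / trials
print(frac)  # ≈ 1/6 ≈ 0.1667
```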
► EXERCISES 7.2
1. Evaluate the integrals ∫_R f dV for the given function f and rectangle R.
(a) f(x, y) = e^x cos y, R = [0, 1] × [0, π/2]
2. Interpret each of the following iterated integrals as a double integral ∫_Ω f dA for the appropriate region Ω, sketch Ω, and change the order of integration. (You may assume f is continuous.)
(a) ii:f (j (d> x f t)dxdy
3. Evaluate each of the following iterated integrals. In addition, interpret each as a double integral ∫_Ω f dA, sketch the region Ω, change the order of integration, and evaluate the alternative iterated integral.
£ £(x+y>dydx 0» [' (c) /' £177^
’(b)
vu v—a /1—
4. Given the function f in Example 6, does the iterated integral ∫₀¹ ∫₀¹ f(x, y) dy dx exist?
5. Check that for the function f defined in Example 7, for every partition P of R, U(f, P) = 1 and L(f, P) = 0. (Hint: Show that for every δ > 0, if 1/q < δ, then every interval of length δ in [0, 1] contains a point of the form k/q.)
6. Let
f(x, y) = 1/q if x = p/q in lowest terms and y ∈ Q, and 0 otherwise.
Decide whether f is integrable on R = [0, 1] × [0, 1] and whether the iterated integrals exist.
7. Is there an integrable function on a rectangle neither of whose iterated integrals exists?
8. Evaluate the following iterated integrals:
*(a) f f 7-^-5 dxdy
Jo Jjy 1+ JC3
(b) [ [ ey4dydx
Jo J&
(c) ∫₀¹ ∫_{√y}^{1} e^{y/x} dx dy (Be careful: Why does the double integral even exist?)
9. Find the volume of the region in the first octant of R³ bounded below by the xy-plane, on the sides by x = 0 and y = 2x, and above by y² + z² = 16.
10. Find the volume of the region in R³ bounded below by the xy-plane, above by z = y, and on the sides by y = 4 − x².
*11. Find the volume of the region in R³ bounded by the cylinders x² + y² = 1 and x² + z² = 1.
12. Interpret each of the following iterated integrals as a triple integral ∫_Ω f dV for the appropriate region Ω, sketch Ω, and change the order of integration so that the innermost integral is taken with respect to y. (You may assume f is continuous.)
.2
fx+y I
*(a) f y I dzdydx *i I f Iy dzdydx
'o 0 u
•y x 'x+y
(b)
o I 'o
\z z
(c) y dzdydx
____f
'x2+y2
I
W
*13. Suppose a, b, and c are positive. Find the volume of the tetrahedron bounded by the coordinate planes and the plane x/a + y/b + z/c = 1.
14. Find the volume of the region in R³ bounded by z = 1 − x², z = x² − 1, y + z = 1, and y = 0.
*15. Let Ω ⊂ R³ be the portion of the cube 0 ≤ x, y, z ≤ 1 lying above the plane y + z = 1 and below the plane x + y + z = 2. Evaluate ∫_Ω x dV.
16. Let
f(x, y) = (x − y)/(x + y)².
Calculate the iterated integrals ∫₀¹ ∫₀¹ f(x, y) dx dy and ∫₀¹ ∫₀¹ f(x, y) dy dx. Explain your results.
17. Let R = [0,1] x [0, 1]. Define /: R -> R by
fci(fc+i)(-e+D
fc+i < x — k’ i+i < y — <
-k2(k + I)2,
o, otherwise
Decide if both iterated integrals exist and if they are equal. Is f integrable on R? (Hint: To see where this function came from, calculate ∫_R f dA.)
(b) Suppose R is a rectangle that is symmetric about the origin, i.e., x ∈ R ⟹ −x ∈ R, and suppose f is an odd function, so that f(−x) = −f(x). Prove that ∫_R f dV = 0.
(c) Generalize the results of parts a and b to allow regions other than rectangles.
19. Assume f is C². Prove Theorem 6.1 of Chapter 3 by applying Fubini's Theorem. (Hint: Proceed by contradiction: If the mixed partials are not equal at some point, apply Exercise 2.3.5 to show we can find a rectangle on which, say, ∂²f/∂x∂y > ∂²f/∂y∂x. Exercise 7.1.5 may also be useful.)
#20. (Differentiating Under the Integral Sign) Suppose f: [a, b] × [c, d] → R is continuous and ∂f/∂x is continuous. Define F(x) = ∫_c^d f(x, y) dy.
(a) Prove that F is continuous. (Hint: You will need to use uniform continuity of f.)
(b) Prove that F is differentiable and that F′(x) = ∫_c^d (∂f/∂x)(x, y) dy. (Hint: Let φ(t) = ∫_c^d (∂f/∂x)(t, y) dy and let Φ(x) = ∫_a^x φ(t) dt. Show that φ is continuous and that F(x) = Φ(x) + const.)
21. Let F(x) = ∫₀¹ (y^x − 1)/log y dy. Use Exercise 20 to calculate F′(x) and prove that ∫₀¹ (y − 1)/log y dy = F(1) = log 2.
22. Let f(x) = (∫₀^x e^{−t²} dt)² and g(x) = ∫₀¹ e^{−x²(t²+1)}/(t² + 1) dt.
(a) Using Exercise 20 as necessary, prove that f′(x) + g′(x) = 0 for all x.
(b) Prove that f(x) + g(x) = π/4 for all x. Deduce that ∫₀^∞ e^{−t²} dt = lim_{N→∞} ∫₀^N e^{−t²} dt = √π/2.
23. Suppose f: [a, b] × [c, d] → R is continuous and ∂f/∂x is continuous. Suppose g: [a, b] → (c, d) is differentiable. Let h(x) = ∫_c^{g(x)} f(x, y) dy. Use the chain rule and Exercise 20 to show that
h′(x) = ∫_c^{g(x)} (∂f/∂x)(x, y) dy + f(x, g(x)) g′(x).
(Hint: Consider F(x, u) = ∫_c^u f(x, y) dy.)
24. Prove that
∫₀^x ∫₀^{x₁} ··· ∫₀^{xₙ₋₁} f(xₙ) dxₙ ··· dx₂ dx₁ = 1/(n−1)! ∫₀^x (x − t)ⁿ⁻¹ f(t) dt.
(Hint: Start by doing the cases n = 2 and n = 3.)
► 3 POLAR, CYLINDRICAL, AND SPHERICAL COORDINATES

Suppose we wish to integrate a function over the annular region S between two concentric circles, as shown in Figure 3.1. As we quickly realize if we try to write down iterated integrals in xy-coordinates, although it is not impossible to evaluate them, it is far from a pleasant task. It would be much more sensible to work in a coordinate system that is built around the radial symmetry. This is the place of polar coordinates.
Polar coordinates on the xy-plane are defined as follows: As shown in Figure 3.2, let r = √(x² + y²) denote the distance of the point from the origin, and let θ denote the angle from the positive x-axis to the vector from the origin to the point. Ordinarily, we adopt the convention that 0 ≤ θ < 2π.
Figure 3.1
It is better to express x and y in terms of r and θ, and we do this by means of the mapping
g(r, θ) = (r cos θ, r sin θ).
Given a region S in the xy-plane, we look for the region Ω in the rθ-plane that maps to S. We substitute x = r cos θ and y = r sin θ, and then realize that a little rectangle Δr by Δθ in the rθ-plane maps to an "annular chunk" whose area is approximately Δr (r Δθ) in the xy-plane (see Figure 3.3). That is, partitioning the region Ω into little rectangles corresponds to "partitioning" S into such annular pieces. Summing over all the subrectangles of a partition suggests the formula
∫_S f(x, y) dA = ∫∫_Ω f(r cos θ, r sin θ) r dr dθ.
► EXAMPLE 1
Let S be the annular region 1 ≤ x² + y² ≤ 2 pictured in Figure 3.1. We wish to evaluate ∫_S √(x² + y²) dA. In polar coordinates,
∫_S √(x² + y²) dA = ∫₀^{2π} ∫₁^{√2} r · r dr dθ = ∫₀^{2π} ∫₁^{√2} r² dr dθ = 2π(2√2 − 1)/3.
If you are not yet convinced, try doing this in Cartesian coordinates!
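Doing it in Cartesian coordinates by brute force is, at least, something a computer will tolerate. This sketch of ours (not the book's) compares a Cartesian midpoint sum over the annulus with the polar answer:

```python
import math

# Brute-force Cartesian check of Example 1: integrate sqrt(x^2 + y^2)
# over the annulus 1 <= x^2 + y^2 <= 2 by a midpoint sum over the
# bounding square, and compare with the polar answer 2*pi*(2*sqrt(2)-1)/3.
def annulus_sum(n):
    s = math.sqrt(2)
    h = 2 * s / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = -s + (i + 0.5) * h
            y = -s + (j + 0.5) * h
            r2 = x * x + y * y
            if 1 <= r2 <= 2:
                total += math.sqrt(r2) * h * h
    return total

exact = 2 * math.pi * (2 * math.sqrt(2) - 1) / 3
val = annulus_sum(1000)
print(val, exact)  # both ≈ 3.83
```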
► EXAMPLE 2
Let S ⊂ R² be the region inside the circle x² + y² = 9, below the line y = x, above the x-axis, and lying to the right of x = 1, as shown in Figure 3.4. Evaluate ∫_S xy dA. We begin by finding the region Ω in the rθ-plane that maps to S, as shown in Figure 3.5. Clearly θ goes from 0 to π/4, and for each fixed θ, we see that r starts at r = sec θ (as we enter S at the line x = 1) and increases to r = 3 (as we exit S at the circle). (We think naturally of determining r as a function of θ, so naturally we would place θ on the horizontal axis and r on the vertical; for reasons we'll see in Chapter 8, this is not a good idea.)
Therefore, we have
∫_S xy dA = ∫₀^{π/4} ∫_{sec θ}^{3} (r cos θ)(r sin θ) r dr dθ = ∫₀^{π/4} ∫_{sec θ}^{3} r³ cos θ sin θ dr dθ
Figure 3.5
= ¼ ∫₀^{π/4} (81 − sec⁴ θ) cos θ sin θ dθ
= ¼ ∫₀^{π/4} (81 cos θ sin θ − sin θ/cos³ θ) dθ
= ⅛ [81 sin² θ − 1/cos² θ]₀^{π/4} = 79/16. ◄
► EXAMPLE 3
We wish to evaluate the improper integral ∫₀^∞ e^{−x²} dx. This "Gaussian integral" is ubiquitous in probability, statistics, and statistical mechanics. Although one way of doing so was given in Exercise 7.2.22, the approach we take here is more amenable to generalization.
Taking advantage of the property e^{a+b} = e^a e^b, we exploit radial symmetry by calculating instead the double integral
(∫₀^∞ e^{−x²} dx)² = ∫₀^∞ ∫₀^∞ e^{−x²} e^{−y²} dy dx = ∫_{[0,∞)×[0,∞)} e^{−(x²+y²)} dA.
Changing to polar coordinates, we have
∫_{[0,∞)×[0,∞)} e^{−(x²+y²)} dA = ∫₀^{π/2} ∫₀^∞ e^{−r²} r dr dθ = lim_{N→∞} ∫₀^{π/2} ∫₀^N e^{−r²} r dr dθ = π/4.
Therefore ∫₀^∞ e^{−x²} dx = √π/2. ◄
Remark We should probably stop to worry for a moment about convergence of these improper integrals. First of all,
∫₀^∞ e^{−x²} dx = lim_{N→∞} ∫₀^N e^{−x²} dx
exists because, for example, when x ≥ 1, we have 0 < e^{−x²} ≤ e^{−x}, and so the integrals ∫₀^N e^{−x²} dx increase as N → ∞ and are all bounded above by 1 + ∫₁^∞ e^{−x} dx = 1 + e^{−1}. Now it is easy to see, as Figure 3.6 suggests, that the integral of e^{−(x²+y²)} over the square [0, N] × [0, N] lies between the integral over the quarter-disk of radius N and the integral over the quarter-disk of radius N√2, both of which approach π/4.
In general, it is good to use polar coordinates when either the form of the integrand or the shape of the region recommends it.
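The squeeze between the two quarter-disks is easy to see numerically. In this sketch of ours (not the text's), the integral over [0, N] × [0, N] is approximated by a midpoint sum and compared with π/4:

```python
import math

# The integral of exp(-(x^2 + y^2)) over the square [0, N] x [0, N]
# should approach pi/4 as N grows; N = 5 already captures the limit
# to many decimal places (the tail beyond 5 is on the order of e^(-25)).
def square_integral(N, n):
    h = N / n
    return sum(math.exp(-((i + 0.5) * h) ** 2 - ((j + 0.5) * h) ** 2) * h * h
               for i in range(n) for j in range(n))

val = square_integral(5.0, 500)
print(val, math.pi / 4)  # both ≈ 0.785398
```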
Next we come to three dimensions. Cylindrical coordinates r, θ, z are merely polar coordinates (used in the xy-plane) along with the Cartesian coordinate z:
Figure 3.6
g(r, θ, z) = (r cos θ, r sin θ, z).
The intuitive argument we gave earlier for polar coordinates suggests now that a little rectangle Δr by Δθ by Δz in rθz-space corresponds to a "chunk" with approximate volume ΔV ≈ Δr (r Δθ) Δz, as pictured in Figure 3.7. If g maps the region Ω in rθz-space to our region S ⊂ R³, then we expect
∫_S f dV = ∫∫∫_Ω f(r cos θ, r sin θ, z) r dr dθ dz = ∫∫ (∫ f(r cos θ, r sin θ, z) dz) r dr dθ.
Indeed, as suggested by the last integral above, it is almost always preferable to set up an iterated integral with dz innermost, and then the usual r dr dθ outside (integrating over the projection of Ω onto the xy-plane).
► EXAMPLE 4
Revisiting Example 9 of Section 2, we let S ⊂ R³ be the region in the first octant bounded below by the paraboloid z = x² + y² and above by the plane z = 4. To evaluate ∫_S x dV by using cylindrical coordinates, note that S is the image under g of the region
Ω = {(r, θ, z) : 0 ≤ r ≤ 2, 0 ≤ θ ≤ π/2, r² ≤ z ≤ 4}.
Thus, we have
∫_S x dV = ∫₀^{π/2} ∫₀² ∫_{r²}^{4} (r cos θ) r dz dr dθ = ∫₀^{π/2} ∫₀² r²(4 − r²) cos θ dz dr dθ
= ∫₀^{π/2} cos θ dθ · ∫₀² (4r² − r⁴) dr = 1 · 64/15 = 64/15. ◄
► EXAMPLE 5
Let S be the region bounded above by the paraboloid z = 6 − x² − y² and below by the cone z = √(x² + y²), as pictured in Figure 3.8. Find ∫_S z dV. The symmetry of S about the z-axis makes cylindrical coordinates a natural. The surfaces z = 6 − r² and z = r intersect when r = 2, so we see that S is the image under g of the region
Ω = {(r, θ, z) : 0 ≤ r ≤ 2, 0 ≤ θ ≤ 2π, r ≤ z ≤ 6 − r²}.
Figure 3.8
Thus, we have
∫_S z dV = ∫₀^{2π} ∫₀² ∫_r^{6−r²} z r dz dr dθ = ∫₀^{2π} ∫₀² ½((6 − r²)² − r²) r dr dθ
= π ∫₀² (36 − 13r² + r⁴) r dr = 92π/3. ◄
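As a numerical sanity check (ours, not from the text), the inner z-integral can be done exactly and the remaining one-variable r-integral approximated by midpoints:

```python
import math

# Check of Example 5: the inner z-integral from z = r to z = 6 - r^2 is
# done exactly; the remaining r-integral is approximated by midpoints,
# and the theta-integral contributes a factor of 2*pi.
def cone_paraboloid_integral(n):
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        inner = 0.5 * ((6 - r * r) ** 2 - r * r)  # = integral of z dz
        total += inner * r * h
    return 2 * math.pi * total

val = cone_paraboloid_integral(2000)
print(val, 92 * math.pi / 3)  # both ≈ 96.34
```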
Last, we come to spherical coordinates: ρ represents the distance from the origin to the point, φ the angle from the positive z-axis to the vector from the origin to the point, and θ the angle from the positive x-axis to the projection of that vector into the xy-plane. That is, in some sense, φ specifies the latitude of the point and θ specifies its longitude. (As shown in Figure 3.9, when ρ and φ are held constant, we get a circle parallel to the xy-plane; when ρ and θ are held constant, we get a great circle going from the north pole to the south pole.) Notice that we make the convention that 0 ≤ φ ≤ π and 0 ≤ θ < 2π.
As usual, we use basic trigonometry to express x, y, and z in terms of our new coordinates ρ, φ, and θ (see also Figure 3.10):
g(ρ, φ, θ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ).
Figure 3.9
The analogue of the intuitive area argument shows that the volume element is ρ² sin φ dρ dφ dθ, so that
∫_S f dV = ∫∫∫_Ω f(ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ) ρ² sin φ dρ dφ dθ.
► EXAMPLE 6
Let S ⊂ R³ be the "ice-cream cone" bounded above by the sphere x² + y² + z² = a² and below by the cone z = c√(x² + y²), where c is a fixed positive constant, as depicted in Figure 3.12. It is easy to see that the region Ω in ρφθ-space mapping to S is given by
Ω = {(ρ, φ, θ) : 0 ≤ ρ ≤ a, 0 ≤ φ ≤ φ₀, 0 ≤ θ < 2π}, where φ₀ = arctan(1/c), and so
vol(S) = ∫₀^{2π} ∫₀^{φ₀} ∫₀^{a} ρ² sin φ dρ dφ dθ = (2π/3)a³(1 − cos φ₀). ◄
Figure 3.12
► EXAMPLE 7
Let S be the sphere of radius a centered at (0, 0, a). We wish to evaluate ∫_S z² dV. We observe first that, by Exercise 1.2.14, the triangle shown in Figure 3.13 is a right triangle, and so the equation of the sphere is ρ = 2a cos φ, 0 ≤ φ ≤ π/2. So we have
∫_S z² dV = ∫₀^{2π} ∫₀^{π/2} ∫₀^{2a cos φ} (ρ² cos² φ)(ρ² sin φ) dρ dφ dθ
= ∫₀^{2π} ∫₀^{π/2} ∫₀^{2a cos φ} ρ⁴ cos² φ sin φ dρ dφ dθ
= (64/5)πa⁵ ∫₀^{π/2} cos⁷ φ sin φ dφ = (8/5)πa⁵. ◄
Figure 3.13
► EXERCISES 7.3
4. For ε > 0, let S_ε = {x : ε ≤ ‖x‖ ≤ 1} ⊂ R². Evaluate lim_{ε→0⁺} ∫_{S_ε} 1/√(x² + y²) dA. (This is often expressed as the improper integral ∫_{B(0,1)} (x² + y²)^{−1/2} dA.)
*5. Let S be the annular region shown in Figure 3.1. Evaluate ∫_S y² dA
(a) directly; (b) by instead calculating ∫_S (x² + y²) dA.
*6. Calculate ∫_S y(x² + y²)^{−5/2} dA, where S is the planar region lying above the x-axis, bounded on the left by x = 1 and above by x² + y² = 2.
7. Calculate ∫_S (x² + y²)^{−3/2} dA, where S is the planar region bounded below by y = 1 and above by x² + y² = 4.
8. Let/ ==. Let S be the planar region lying inside the circle x2 4- y2 = 2x, above
Figure 3.15
13. Find the volume of the region inside both x² + y² = 1 and x² + y² + z² = 2.
14. Find the volume of the region inside both x² + y² + z² = 4a² and x² + y² = 2ay.
15. Find the volume of the region bounded above by x² + y² + z² = 2 and below by z = x² + y².
16. Find the volume of the region inside the sphere x² + y² + z² = a² by integrating in
(a) cylindrical coordinates; (b) spherical coordinates.
*17. Find the volume of a right circular cone of base radius a and height h by integrating in
(a) cylindrical coordinates; (b) spherical coordinates.
18. Find the volume of the region lying above the cone z = √(x² + y²) and inside the sphere x² + y² + z² = 2 by integrating in
(a) cylindrical coordinates; (b) spherical coordinates.
19. Find the volume of the region lying above the plane z = a and inside the sphere x² + y² + z² = 4a² by integrating in
(a) cylindrical coordinates; (b) spherical coordinates.
*20. Let S ⊂ R³ be the unit ball. Use symmetry principles to compute ∫_S x² dV as easily as possible.
21. (a) Evaluate ∫_{R³} e^{−(x²+y²+z²)} dV. (b) Evaluate ∫_{R³} e^{−(x²+2y²+3z²)} dV.
*22. Find the volume of the region in R³ bounded above by the plane z = 3x + 4y and below by the paraboloid z = x² + y².
23. Evaluate ∫_S z/(x² + y² + z²)^{3/2} dV, where S is the region bounded below by the sphere x² + y² + z² = 2z and above by the sphere x² + y² + z² = 1.
24. Find the volume of the region in R³ bounded by the cylinders x² + y² = 1, y² + z² = 1, and x² + z² = 1. (Hint: Make full use of symmetry.)
► 4 PHYSICAL APPLICATIONS
So far we have focused on area and volume as our interpretation of the multiple integral. Now we discuss average value and mass (which have both physical and probabilistic interpretations), center of mass, moment of inertia, and gravitational attraction.
Recall from one-variable calculus the notion of the average value of an integrable function. Given a real-valued function f on an interval [a, b], we may take the uniform partition of the interval into k equal subintervals, with x_i = a + i(b − a)/k, i = 1, ..., k, and form the average of the sampled values,
f̄(k) = (1/k) Σ_{i=1}^k f(x_i) = (1/(b − a)) Σ_{i=1}^k f(x_i)(b − a)/k.
Now let's suppose that f is bounded. Then, as usual, m_i ≤ f(x_i) ≤ M_i for each i = 1, ..., k, and so
(1/(b − a)) L(f, P_k) ≤ f̄(k) ≤ (1/(b − a)) U(f, P_k)
for every uniform partition P_k of the interval [a, b]. Now assume that f is integrable. Then it follows from Exercise 7.1.10 that L(f, P_k) and U(f, P_k) both approach ∫_a^b f(x) dx as k → ∞, and so
f̄(k) → (1/(b − a)) ∫_a^b f(x) dx as k → ∞.
Definition Let f be an integrable function on the interval [a, b]. We define the average value of f on [a, b] to be
f̄ = (1/(b − a)) ∫_a^b f(x) dx.
In general, if Ω ⊂ Rⁿ is a region and f: Ω → R is integrable, we define the average value of f on Ω to be
f̄ = (1/vol(Ω)) ∫_Ω f dV.
► EXAMPLE 1
A round hotplate S is given by the disk r ≤ π/2, and its temperature at distance r from the center is f = cos r. We compute its average temperature
f̄ = (1/area(S)) ∫_S f dA
by proceeding in polar coordinates:
∫_S f dA = ∫₀^{2π} ∫₀^{π/2} (cos r) r dr dθ = 2π (r sin r + cos r)]₀^{π/2} = 2π(π/2 − 1),
and so
f̄ = 2π(π/2 − 1)/(π(π/2)²) = 4(π − 2)/π² ≈ 0.463. ◄
For example, the average temperature of a solid Ω with temperature function t is
t̄ = (1/vol(Ω)) ∫_Ω t dV,
and the centroid x̄ of Ω is obtained by averaging the position vector:
x̄ = (1/vol(Ω)) ∫_Ω x dV.
► EXAMPLE 2
We want to find the centroid of the plane region Ω bounded below by y = 0, above by y = x², and on the right by x = 1. Its area is given by
area(Ω) = ∫₀¹ ∫₀^{x²} dy dx = ∫₀¹ x² dx = 1/3.
Now, integrating the position vector x over Ω gives
∫_Ω x dA = ∫₀¹ ∫₀^{x²} (x, y) dy dx = ∫₀¹ (x³, ½x⁴) dx = (1/4, 1/10),
so x̄ = 3·(1/4, 1/10) = (3/4, 3/10), which makes physical sense (see Figure 4.1). ◄
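The answer (3/4, 3/10) is easy to corroborate numerically. This sketch of ours (an illustration, not from the text) estimates the centroid with midpoint sums over the unit square, using the region test in place of extension by zero:

```python
# Check of Example 2: centroid of the region 0 <= y <= x^2, 0 <= x <= 1,
# estimated by midpoint sums over the unit square; points outside the
# region are skipped, exactly as extension by zero prescribes.
def centroid_estimate(n):
    h = 1.0 / n
    area = sx = sy = 0.0
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) * h, (j + 0.5) * h
            if y <= x * x:
                area += h * h
                sx += x * h * h
                sy += y * h * h
    return sx / area, sy / area

cx, cy = centroid_estimate(500)
print(cx, cy)  # ≈ (0.75, 0.30)
```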
It is useful to observe that when the region Ω is symmetric about an axis, its centroid will lie on that axis. (See Exercise 7.2.18.)
When a mass distribution Ω is nonuniform, it is important to understand the idea of density. Much like instantaneous velocity (or slope of a curve), which is defined as a limit of average velocities (or slopes of secant lines), we define the density δ(x) to be the limit as r → 0⁺ of the average density (mass/volume) of a cube of sidelength r centered at x.² Then it is quite plausible that, with some reasonable assumptions on the behavior of "mass," it should be recaptured by integrating the density function.
²More precisely, the average density of that portion of the cube lying in Ω.
Remark We should be a little bit careful here. The Fundamental Theorem of Calculus tells us that we can recover f by differentiating its integral $F(x) = \int_a^x f(t)\,dt$ provided f is continuous. If we start with an arbitrary integrable function f, e.g., the function in Exercise 7.1.13, this will, of course, not work. A similar situation occurs if we start with an integrable δ, define the mass by integrating, and then try to recapture δ by "differentiating" (taking the limit of average densities). Since we are concerned here with physical applications, we will tacitly assume δ is continuous (see Exercise 7.1.7). In more sophisticated treatments, we really would like to allow point masses and "generalized functions," called distributions; this will have to wait for a more advanced course.
This is a natural generalization of the weighted average we see with a system of finitely many point masses m₁, …, m_N at positions x₁, …, x_N, respectively, as shown in Figure 4.2. In this case, the weighted average is
$$\overline{\mathbf{x}} = \frac{\displaystyle\sum_{i=1}^N m_i\mathbf{x}_i}{\displaystyle\sum_{i=1}^N m_i},$$
and it has the following physical interpretation. If external forces $\mathbf{F}_i$ act on the point masses $m_i$, they impart accelerations $\mathbf{x}_i''$ according to Newton's second law: $\mathbf{F}_i = m_i\mathbf{x}_i''$. Consider the resultant force $\mathbf{F} = \sum_{i=1}^N \mathbf{F}_i$ acting on the total mass $m = \sum_{i=1}^N m_i$ (any internal forces cancel ultimately by Newton's third law). Then
$$\mathbf{F} = \sum_{i=1}^N \mathbf{F}_i = \sum_{i=1}^N m_i\mathbf{x}_i'' = m\,\overline{\mathbf{x}}''.$$
That is, as the forces act and time passes, the center of mass of the system translates exactly as if we concentrated the total mass m at $\overline{\mathbf{x}}$ and let the resultant force F act there.
Next, let's consider a rigid body³ consisting of point masses m₁, …, m_N rotating about an axis ℓ; a typical such mass is pictured in Figure 4.3. The kinetic energy of the system is
$$\mathrm{K.E.} = \sum_{i=1}^N \tfrac12 m_i v_i^2 = \tfrac12\sum_{i=1}^N m_i (r_i\omega)^2,$$
where ω is the angular speed with which the body is rotating about the axis and $r_i$ is the distance from the axis of rotation to the point mass $m_i$. (Remember that each mass is
3A rigid body does not move relative to itself; imagine the masses connected to one another by inflexible rods.
► EXAMPLE 3
Let's find the moment of inertia of a uniform solid ball Ω of radius a about an axis through its center. We may as well place the ball with its center at the origin and let the axis be the z-axis. Then, using spherical coordinates, we have (since δ is constant)
$$I = \int_\Omega \delta r^2\,dV = \delta\int_0^{2\pi}\!\!\int_0^{\pi}\!\!\int_0^{a} (\rho\sin\phi)^2\,\underbrace{\rho^2\sin\phi\,d\rho\,d\phi\,d\theta}_{dV}
= 2\pi\delta\int_0^{\pi}\!\!\int_0^{a} \rho^4\sin^3\phi\,d\rho\,d\phi$$
$$= 2\pi\delta\cdot\frac{a^5}{5}\cdot\frac{4}{3} = \Bigl(\frac{4}{3}\pi a^3\delta\Bigr)\frac{2}{5}\,a^2 = \frac{2}{5}\,ma^2. \ \blacktriangleleft$$
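The spherical-coordinate integral of Example 3 can be checked numerically. This sketch (an assumption-free midpoint rule, not from the text) computes the ratio I/(ma²), which should come out to 2/5:

```python
import math

# Moment of inertia of a uniform ball of radius a about a diameter:
# I = delta * int (rho sin phi)^2 * rho^2 sin phi  drho dphi dtheta,
# approximated by a midpoint double sum (the theta integral gives 2*pi).
def ball_inertia_ratio(a=1.0, n=200):
    drho = a / n
    dphi = math.pi / n
    I = 0.0
    for i in range(n):
        rho = (i + 0.5) * drho
        for j in range(n):
            phi = (j + 0.5) * dphi
            I += (rho * math.sin(phi))**2 * rho**2 * math.sin(phi) * drho * dphi
    I *= 2 * math.pi
    m = 4 / 3 * math.pi * a**3          # mass with delta = 1
    return I / (m * a * a)              # exact value is 2/5

ratio = ball_inertia_ratio()
```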
► EXAMPLE 4
One of the classic applications of the moment of inertia is to decide which rolling object wins the race
down a ramp. Given a hula hoop, a wooden nickel, a hollow ball, a solid ball, or something more
imaginative like a solid cone, as pictured in Figure 4.4, which one gets to the bottom first?
Figure 4.4
We use the basic result from physics (see the remark on p. 352 and Example 6 of Chapter 8, Section 3) that, if we ignore friction, total energy (potential plus kinetic) is conserved.⁴ We measure potential energy relative to ground level, so a mass m has potential energy mgh at (relatively small) heights h. If the rolling radius is a, its angular speed is ω, and its linear speed is v, then we have aω = v, so when the mass has descended a vertical height h, we have
$$mgh = \tfrac12 mv^2 + \tfrac12 I\omega^2 = \tfrac12\Bigl(m + \frac{I}{a^2}\Bigr)v^2, \qquad\text{i.e.,}\qquad v^2 = \frac{2gh}{1 + I/ma^2}.$$
Thus, the object's speed is greatest when the fraction I/ma² is smallest. We calculated in Example 3 that this fraction is 2/5 for a solid ball. For a hula hoop of radius a or for a hollow cylinder of radius a, it is obviously 1 (why?). So the solid ball beats the hula hoop or hollow cylinder. What about the other shapes? (See Exercises 16, 17, and 19.) And is there an optimal shape?
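The race can be sketched in a few lines. Only the values 2/5 (solid ball) and 1 (hoop/hollow cylinder) are derived in the text; the 1/2 for a solid cylinder and 2/3 for a hollow ball are standard results assumed here for illustration:

```python
import math

# Conservation of energy gives v = sqrt(2gh / (1 + k)) where k = I/(m a^2).
# Smaller k means a larger final speed, hence an earlier finish.
def final_speed(k, g=9.8, h=1.0):
    return math.sqrt(2 * g * h / (1 + k))

shapes = {"solid ball": 2/5, "solid cylinder": 1/2,
          "hollow ball": 2/3, "hoop": 1.0}
order = sorted(shapes, key=lambda s: shapes[s])   # finishing order
```

The solid ball wins and the hoop comes in last, as the text predicts.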
Newton's law of gravitation applies to point masses: The force F exerted by a mass m at position x on a test mass (which we take to have mass 1 unit) at the origin is given by
$$\mathbf{F} = \frac{Gm}{\|\mathbf{x}\|^3}\,\mathbf{x},$$
and, thus, the gravitational force exerted by a continuous mass distribution Ω with density function δ is
$$\mathbf{F} = G\int_\Omega \frac{\delta(\mathbf{x})}{\|\mathbf{x}\|^3}\,\mathbf{x}\,dV.$$
► EXAMPLE 5
Find the gravitational attraction on a unit mass at the origin of the uniform region Ω bounded above by the sphere x² + y² + z² = 2a² and below by the paraboloid az = x² + y², pictured in Figure 4.5. (Take δ = 1.)
Since Ω is symmetric about the z-axis, the net force will lie entirely in the z-direction, so we calculate only the e₃-component of F. Working in cylindrical coordinates, we see that Ω lies over the disk of radius a centered at the origin in the xy-plane, and so
$$F_3 = G\int_\Omega \frac{z}{(x^2+y^2+z^2)^{3/2}}\,dV = G\int_0^{2\pi}\!\!\int_0^{a}\!\!\int_{r^2/a}^{\sqrt{2a^2-r^2}} \frac{z}{(r^2+z^2)^{3/2}}\,r\,dz\,dr\,d\theta.$$
4Of course, for the objects to roll, there must be some friction.
We leave it to the reader to set the problem up in spherical coordinates (see Exercise 24).
► EXAMPLE 6
Newton wanted to understand the gravitational attraction of the earth, which he took to be a uniform ball. Most of us are taught nowadays that the gravitational attraction of the earth on a point mass outside the earth is that of a point mass M concentrated at the center of the earth. But what happens if the point mass is inside the earth? We put the earth (a ball of radius R) with its center at the origin and the point mass at $\begin{bmatrix} 0 \\ 0 \\ b \end{bmatrix}$, b > 0, as shown in Figure 4.6. By symmetry, the net force will lie in the z-direction, so we compute only that component. If the earth has (constant) density δ, then, writing d for the distance from the point mass to the volume element and ψ for the angle shown in Figure 4.6, we have
$$F_3 = \int_\Omega \frac{-G\delta\cos\psi}{d^2}\,dV = -G\delta\int_\Omega \frac{b-\rho\cos\phi}{(b^2+\rho^2-2b\rho\cos\phi)^{3/2}}\,dV$$
$$= -2\pi G\delta\int_0^R\!\!\int_0^\pi \frac{b-\rho\cos\phi}{(b^2+\rho^2-2b\rho\cos\phi)^{3/2}}\,\rho^2\sin\phi\,d\phi\,d\rho.$$
Figure 4.6
Making the substitution u = b² + ρ² − 2bρ cos φ (so du = 2bρ sin φ dφ and b − ρ cos φ = (u + b² − ρ²)/(2b)) in the inner integral, we obtain
$$\int_0^\pi \frac{b-\rho\cos\phi}{(b^2+\rho^2-2b\rho\cos\phi)^{3/2}}\,\rho^2\sin\phi\,d\phi = \frac{\rho}{4b^2}\int_{(b-\rho)^2}^{(b+\rho)^2} \frac{u+b^2-\rho^2}{u^{3/2}}\,du = \begin{cases} \dfrac{2\rho^2}{b^2}, & \rho < b,\\[1ex] 0, & \rho > b.\end{cases}$$
Thus, if b ≥ R, we have
$$F_3 = -2\pi G\delta\int_0^R \frac{2\rho^2}{b^2}\,d\rho = -\frac{4\pi G\delta}{3}\cdot\frac{R^3}{b^2} = -\frac{GM}{b^2},$$
where M = 4πδR³/3 is the total mass of the earth. On the other hand, if b < R, then, since the inner integral vanishes whenever ρ > b, we have
$$F_3 = -2\pi G\delta\int_0^b \frac{2\rho^2}{b^2}\,d\rho = -\frac{4\pi G\delta}{3}\,b,$$
which, interestingly, is linear in b. (When b = R, of course, the two answers agree.) Incidentally,
we will be able to rederive these results in a matter of seconds in Section 6 of Chapter 8.
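The two answers of Example 6 can be sketched as a piecewise function of b (with placeholder values G = δ = R = 1, an assumption for illustration only), which makes the agreement at the surface b = R easy to verify:

```python
import math

# Magnitude of the attraction of a uniform earth of radius R and density delta
# on a unit test mass at distance b from the center, per Example 6.
def gravity(b, R=1.0, delta=1.0, G=1.0):
    M = 4 / 3 * math.pi * delta * R**3
    if b >= R:
        return G * M / b**2                      # outside: inverse-square
    return 4 * math.pi * G * delta * b / 3       # inside: linear in b
```

The inside value is linear in b and the two formulas match at b = R, as the text observes.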
*3. Find the average distance from a point on the boundary of a ball of radius a in R2 to the points
inside the ball.
*4. Find the average distance from a point on the boundary of a ball of radius a in R3 to the points
inside the ball.
5. Find the average distance from one corner of a square of sidelength a to the points inside the square.
6. Consider the region Ω lying inside the circle x² + y² = 2x, above the x-axis, and to the right of x = 1, with density δ(x, y) = 1/√(x² + y²). Find the center of mass of Ω.
*7. Consider the region Ω lying inside the circle x² + y² = 2x and outside the circle x² + y² = 1. If its density function is given by δ(x, y) = (x² + y²)^{−1/2}, find its center of mass.
8. Find the center of mass of a uniform semicircular plate of radius a in R2.
9. Find the center of mass of a uniform solid hemisphere of radius a in R3.
10. Find the center of mass of the uniform region in Exercise 7.3.19.
*11. Find the center of mass of the uniform tetrahedron bounded by the coordinate planes and the plane x/a + y/b + z/c = 1.
*12. Find the mass of a solid cylinder of height h and base radius a if its density at x is equal to the
distance from x to the axis of the cylinder. Next find its moment of inertia about the axis.
13. Find the moment of inertia about the z-axis of a solid ball of radius a centered at the origin, whose density is given by δ(x) = ‖x‖.
14. Let Ω be the region bounded above by x² + y² + z² = 4 and below by z = √(x² + y²). Calculate the moment of inertia of Ω about the z-axis by integrating in both cylindrical and spherical coordinates.
15. Find the moment of inertia about the z-axis of the region of constant density δ = 1 bounded above by the sphere x² + y² + z² = 4 and below by the cone z√3 = √(x² + y²).
*16. Find the moment of inertia about the z-axis of each of the following uniform objects:
(a) a hollow cylindrical can x² + y² = a², 0 ≤ z ≤ h
(b) the solid cylinder x² + y² ≤ a², 0 ≤ z ≤ h
(c) the solid cone of base radius a and height h symmetric about the z-axis
Express each of your answers in the form I = kma² for the appropriate constant k.
17. (a) Let 0 < b < a. Find the moment of inertia $I_{a,b}$ about the z-axis of the uniform region b² ≤ x² + y² + z² ≤ a².
(b) Find $\displaystyle\lim_{b\to a^-}\frac{I_{a,b}}{a^3-b^3}$.
(c) Use your answer to part b to show that the moment of inertia of a uniform hollow spherical shell x² + y² + z² = a² about the z-axis is $\frac23 ma^2$, where m is its total mass.
18. Let Ω ⊂ ℝⁿ be a region. For what value of a ∈ ℝⁿ is the integral
$$\int_\Omega \|\mathbf{x}-\mathbf{a}\|^2\,dV$$
minimized? (Cf. Exercise 5.2.14.)
19. Let Ω be the uniform solid of revolution obtained by rotating the graph of y = |x|ⁿ, |x| ≤ a^{1/n}, about the x-axis, as indicated in Figure 4.7. Let I be the moment of inertia about the x-axis. Show that
$$\frac{I}{ma^2} = \frac{2n+1}{2(4n+1)}.$$
Figure 4.7
20. Let Ω_ε denote the uniform solid region described in spherical coordinates by 0 ≤ ρ ≤ a, 0 ≤ φ ≤ ε.
(a) Find the center of mass of Ω_ε.
(b) Find the limiting position of the center of mass as ε → 0⁺. Explain your answer.
21. (Pappus's Theorem) Suppose R ⊂ ℝ² is a plane region (say, that bounded by the graphs of f and g on the interval [a, b]), and let Ω ⊂ ℝ³ be obtained by revolving R about the x-axis. Prove that the volume of Ω is equal to
$$\mathrm{vol}(\Omega) = 2\pi\,\overline{y}\cdot\mathrm{area}(R),$$
where $\overline{y}$ is the y-coordinate of the centroid of R.
22. Let Ω denote a mass distribution. Denote by I the moment of inertia of Ω about a given axis ℓ, and by I₀ the moment of inertia about the axis ℓ₀ parallel to ℓ and passing through the center of mass of Ω. Then prove the parallel axis theorem:
$$I = I_0 + mh^2,$$
where m is the total mass of Ω and h is the distance between ℓ and ℓ₀.
23. Calculate the gravitational attraction of a solid ball of radius R on a unit mass on its boundary if
its density is equal to distance from the center of the ball.
24. Set up Example 5 in spherical coordinates and verify the calculations.
25. Prove or give a counterexample: The gravitational force on a test mass of a body with total mass
M is equal to that of a point mass M located at the center of mass of the body.
26. Show that Newton's first result in Example 6 still works for a nonuniform earth, as long as the density δ is radially symmetric (i.e., is a function of ρ only). What happens to the second result?
27. Consider the solid region Ω bounded by (x² + y² + z²)^{3/2} = kz (k > 0), with k chosen so that the volume of Ω is equal to the volume of the unit ball.
(a) Find k.
(b) Taking δ = 1, find the gravitational attraction of Ω on a unit test mass at the origin.
Remark Your answer to part b should be somewhat larger than 4πG/3, the gravitational attraction of the unit ball (with δ = 1) on a unit mass on its boundary. In fact, Ω is the region of appropriate mass that maximizes the gravitational attraction on a point mass at the origin. Can you think of any explanation—physical, geometric, or otherwise?
28. A completely uniform forest is in the shape of a plane region Q. The forest service will locate a
helipad somewhere in the forest and, in the event of fire, will dispatch helicopters to fight it. If a fire
is equally likely to start anywhere in the forest, where should the forest service locate the helipad to
minimize fire damage? (Let’s take the simplest model possible: Assume that fire spreads radially at
a constant rate and that the helicopters fly at a constant rate and take off as soon as the fire starts. So
what are we trying to minimize here?)
5 Determinants and n-Dimensional Volume
Theorem 5.1 For each n ≥ 1, there is exactly one function $D\colon \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{n\text{ times}} \to \mathbb{R}$ having the following properties:
1. If any pair of the vectors v₁, …, vₙ are exchanged, D changes sign. That is,
$$D(\mathbf{v}_1,\dots,\mathbf{v}_j,\dots,\mathbf{v}_i,\dots,\mathbf{v}_n) = -D(\mathbf{v}_1,\dots,\mathbf{v}_i,\dots,\mathbf{v}_j,\dots,\mathbf{v}_n).$$
2. $D(\mathbf{v}_1,\dots,c\mathbf{v}_i,\dots,\mathbf{v}_n) = c\,D(\mathbf{v}_1,\dots,\mathbf{v}_i,\dots,\mathbf{v}_n)$ for any scalar c.
3. $D(\mathbf{v}_1,\dots,\mathbf{v}_i+\mathbf{v}_i',\dots,\mathbf{v}_n) = D(\mathbf{v}_1,\dots,\mathbf{v}_i,\dots,\mathbf{v}_n) + D(\mathbf{v}_1,\dots,\mathbf{v}_i',\dots,\mathbf{v}_n)$.
4. $D(\mathbf{e}_1,\dots,\mathbf{e}_n) = 1$.
Properties (2) and (3) indicate that D is linear as a function of each of its variables (whence "*multi*linear"); property (1) indicates that D is "alternating." Property (4) can be interpreted as saying that the unit cube should have volume 1.
Since most of our work with matrices has centered on row operations, it would perhaps
be more convenient to define the determinant in terms of the rows of A. But it really is
inconsequential for two reasons: First, everything we proved using row operations (and,
correspondingly, left multiplication by elementary matrices) works verbatim for column
operations (and, correspondingly, right multiplication by elementary matrices); second, we
will prove shortly that det Aᵀ = det A.
Properties (l)-(3) of D listed in Theorem 5.1 allow us to see the effect of elementary
column operations on the determinant of a matrix. Indeed, Property (1) corresponds to a
column interchange; Property (2) corresponds to multiplying a column by a scalar; and
Property (3) tells us—in combination with Property (1)—that adding a multiple of one
column to another does not change the determinant.
► EXAMPLE 1
We calculate the determinant of
$$A = \begin{bmatrix} 0 & 0 & 0 & 4 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 3 & 0 & 0 & 0 \end{bmatrix}$$
as follows. First we factor out the 3 from the first column to get
$$\det A = 3\begin{vmatrix} 0 & 0 & 0 & 4 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix}$$
by Property (2). Repeating this process with the 4 and the 2, we obtain
$$3\begin{vmatrix} 0 & 0 & 0 & 4 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix} = 2\cdot 4\cdot 3\begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix}.$$
Now interchanging columns 1 and 4 introduces a factor of −1 by Property (1), and we have
$$\det A = 24\begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix} = -24\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = -24.$$
To calculate the effect of the third type of column operation—adding a multiple of one
column to another—we need the following observation.
Lemma 5.2 If two columns of a square matrix A are equal, then det A = 0.
Proof If $\mathbf{a}_i = \mathbf{a}_j$, then the matrix is unchanged when we switch columns i and j. On the other hand, by Property (1), its determinant changes sign when we do so. That is, we have det A = −det A. This can happen only when det A = 0. ∎
Proposition 5.3 Let A be an n × n matrix and let A′ be the matrix obtained by adding a multiple of one column of A to another. Then det A′ = det A.
Proof Suppose A′ is obtained from A by replacing the iᵗʰ column by its sum with c times the jᵗʰ column; i.e., $\mathbf{a}_i' = \mathbf{a}_i + c\mathbf{a}_j$, with i ≠ j. (As a notational convenience, we assume i < j, but that really is inconsequential.) We wish to show that det A′ = det A. By Properties (3) and (2), we have
$$D(\mathbf{a}_1,\dots,\mathbf{a}_i+c\mathbf{a}_j,\dots,\mathbf{a}_j,\dots,\mathbf{a}_n) = D(\mathbf{a}_1,\dots,\mathbf{a}_i,\dots,\mathbf{a}_j,\dots,\mathbf{a}_n) + c\,D(\mathbf{a}_1,\dots,\mathbf{a}_j,\dots,\mathbf{a}_j,\dots,\mathbf{a}_n) = \det A,$$
since $D(\mathbf{a}_1,\dots,\mathbf{a}_{i-1},\mathbf{a}_j,\mathbf{a}_{i+1},\dots,\mathbf{a}_j,\dots,\mathbf{a}_n) = 0$ by the preceding Lemma. ∎
► EXAMPLE 2
We calculate the determinant of
$$A = \begin{bmatrix} 2 & 2 & 1 \\ 4 & 1 & 0 \\ 6 & 0 & 1 \end{bmatrix}.$$
First we exchange columns 1 and 3, and then we proceed to (column) echelon form:
$$\det A = \begin{vmatrix} 2 & 2 & 1 \\ 4 & 1 & 0 \\ 6 & 0 & 1 \end{vmatrix} = -\begin{vmatrix} 1 & 2 & 2 \\ 0 & 1 & 4 \\ 1 & 0 & 6 \end{vmatrix} = -\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 4 \\ 1 & -2 & 4 \end{vmatrix} = -\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & -2 & 12 \end{vmatrix}.$$
But
$$\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & -2 & 12 \end{vmatrix} = 12\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & -2 & 1 \end{vmatrix},$$
and now we can use the pivots to column-reduce to the identity matrix without changing the determinant. Thus,
$$\det A = -12\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & -2 & 1 \end{vmatrix} = -12\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = -12. \ \blacktriangleleft$$
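The three column-operation rules used in Example 2 can be checked mechanically. This sketch (not from the text) applies each operation to the matrix of Example 2 and verifies its effect on the determinant:

```python
# Small cofactor-expansion determinant (first-row expansion).
def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(len(M)))

def swap_cols(M, i, j):
    # exchange columns i and j: determinant should change sign
    return [[r[j] if k == i else r[i] if k == j else r[k]
             for k in range(len(r))] for r in M]

def add_col(M, i, j, c):
    # add c times column j to column i: determinant should be unchanged
    return [[r[k] + c * r[j] if k == i else r[k]
             for k in range(len(r))] for r in M]

A = [[2, 2, 1], [4, 1, 0], [6, 0, 1]]
```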
This is altogether too brain-twisting. We will now go back to the theory and soon show that
it’s perfectly all right to use row operations. First, let’s summarize what we’ve established
so far: We have
1. Let A′ be obtained from A by exchanging two columns. Then det A′ = −det A.
2. Let A′ be obtained from A by multiplying some column by the number c. Then det A′ = c det A.
3. Let A′ be obtained from A by adding a multiple of one column to another. Then det A′ = det A.
Theorem 5.5 Let A be a square matrix. Then A is nonsingular if and only if det A ≠ 0.
Proof Suppose A is nonsingular. Then its reduced (column) echelon form is the
identity matrix. Turning this upside down, we can start with the identity matrix and perform
a sequence of column operations to obtain A. If we keep track of their effects on the
determinant, we see that we’ve started with det I = 1 and multiplied it by a nonzero number
to obtain det A. That is, det A ≠ 0. Conversely, suppose A is singular. Then its (column)
echelon form U has a column of zeroes and therefore (see Exercise 2) det U = 0. It follows
as in the previous case that det A = 0. ■
Corollary 5.6 Let E be an elementary matrix and let A be an arbitrary square matrix. Then
$$\det(AE) = \det A\,\det E.$$
Proposition 5.7 Let A and B be n × n matrices. Then det(AB) = det A det B.
Proof Suppose B is singular, so that there is some nontrivial linear relation among its column vectors:
$$c_1\mathbf{b}_1 + \cdots + c_n\mathbf{b}_n = \mathbf{0}.$$
Multiplying by A gives
$$c_1(A\mathbf{b}_1) + \cdots + c_n(A\mathbf{b}_n) = \mathbf{0},$$
from which we conclude that there is (the same) nontrivial linear relation among the column vectors of AB, and so AB is singular as well. We infer from Theorem 5.5 that both det B = 0 and det AB = 0, and so the result holds in this case.
Now, if B is nonsingular, we know that we can write B as a product of elementary matrices, viz., B = E₁E₂⋯Eₘ. We now apply Corollary 5.6 twice: First, we have
$$\det B = \det(E_1E_2\cdots E_m) = \det E_1\det E_2\cdots\det E_m\det I = \det E_1\det E_2\cdots\det E_m;$$
second,
$$\det(AB) = \det(AE_1E_2\cdots E_m) = \det A\det E_1\det E_2\cdots\det E_m = \det A\det B,$$
as claimed. ∎
A consequence of this proposition is that det (AB) = det(BA), even though matrix
multiplication is not commutative. Thus, we have
Corollary 5.8 Let E be an elementary matrix and let A be an arbitrary square matrix. Then det(EA) = det E det A. Consequently, row operations have the same effect on the determinant as the corresponding column operations:
1. Let A′ be obtained from A by exchanging two rows. Then det A′ = −det A.
2. Let A′ be obtained from A by multiplying some row by the number c. Then det A′ = c det A.
3. Let A′ be obtained from A by adding a multiple of one row to another. Then det A′ = det A.
Corollary 5.9 If A is nonsingular, then det(A⁻¹) = 1/det A.
Proof From the equation AA⁻¹ = I and Proposition 5.7, we deduce that det A det A⁻¹ = 1, so det A⁻¹ = 1/det A. ∎
Since we've seen that row and column operations have the same effect on the determinant, it should not come as a surprise that a matrix and its transpose have the same determinant.
Proposition 5.11 Let A be a square matrix. Then det Aᵀ = det A.
Proof If A is singular, so is Aᵀ, and both determinants are 0. If A is nonsingular, write A = E₁E₂⋯Eₘ as a product of elementary matrices; then Aᵀ = Eₘᵀ⋯E₂ᵀE₁ᵀ, and since det Eᵀ = det E for every elementary matrix E (see Exercise 4), the result follows from Proposition 5.7. ∎
The determinant of a triangular matrix is easy to compute.
Proposition 5.12 Let A be an upper (or lower) triangular n × n matrix. Then det A = a₁₁a₂₂⋯aₙₙ.
Proof If $a_{ii} = 0$ for some i, then A is singular (why?) and so det A = 0, and the desired equality holds in this case. Now assume all the $a_{ii}$ are nonzero. Let $\mathbf{A}_i$ be the iᵗʰ row vector of A, as usual, and write $\mathbf{A}_i = a_{ii}\mathbf{B}_i$, where the iᵗʰ entry of $\mathbf{B}_i$ is 1. Then, using Property (2) repeatedly, we have det A = a₁₁⋯aₙₙ det B. Now B is an upper triangular matrix with 1's on the diagonal, so we can use the pivots to clear out the upper (lower) entries without changing the determinant, and thus det B = det I = 1. So det A = a₁₁a₂₂⋯aₙₙ, as promised. ∎
Remark As we shall prove in Theorem 1.1 of Chapter 9, any two matrices A and A′ representing a linear map T are related by the equation A′ = P⁻¹AP for some invertible matrix P. As a consequence of Proposition 5.7, we have
$$\det A' = \det(P^{-1}AP) = \det P^{-1}\det A\det P = \det A,$$
and so it makes sense to define det T = det A for any matrix representative of T.
We now come to the geometric meaning of det T: It gives the factor by which signed
volume is distorted under the mapping by T. (See Exercise 24 for another approach.)
Proposition 5.13 Let T: ℝⁿ → ℝⁿ be a linear map, and let R be a parallelepiped. Then vol(T(R)) = |det T| vol(R). Indeed, if Ω ⊂ ℝⁿ is a general region, then vol(T(Ω)) = |det T| vol(Ω).
Proof When T has rank < n, det T = 0 and the image of T lies in a subspace of
dimension < n; hence, by Exercise 7.1.12, T(R) has volume zero. When T has rank n,
we can write [T] as a product of elementary matrices. Because of Proposition 5.7, it now
suffices to prove the result when [T] is an elementary matrix itself.
Recall that there are three kinds of elementary matrices (see p. 148). When R is a
rectangle, it is clear that the first type does not change volume, and the second multiplies
the volume by |c|; the third (a shear) does not change the volume, for the following reason.
The transformation is the identity in all directions other than the xtxj-plane, and we’ve
already checked that in two dimensions the determinant gives the signed area. (See also
Exercise 24.)
Suppose Ω is a region. Then we can take a rectangle R containing Ω and consider the function
$$\chi\colon R \to \mathbb{R}, \qquad \chi(\mathbf{x}) = \begin{cases} 1, & \mathbf{x} \in \Omega \\ 0, & \text{otherwise.} \end{cases}$$
Since, by our definition of region, χ is integrable, given ε > 0, we can find a partition 𝒫 of R so that U(χ, 𝒫) − L(χ, 𝒫) < ε. That is, the sum of the volumes of those subrectangles of 𝒫 that intersect the frontier of Ω is less than ε. In particular, this means Ω contains a union, S₁, of subrectangles of 𝒫 and is contained in a union, S₂, of subrectangles of 𝒫, as shown in Figure 5.1, with the property that vol(S₂) − vol(S₁) < ε. And, likewise, T(Ω) contains a union, T(S₁), of parallelepipeds and is contained in a union, T(S₂), of parallelepipeds, with vol(T(Sᵢ)) = |c| vol(Sᵢ) or vol(T(Sᵢ)) = vol(Sᵢ), depending on the nature of the elementary matrix. In either event, we see that
$$\mathrm{vol}(T(S_1)) \le \mathrm{vol}(T(\Omega)) \le \mathrm{vol}(T(S_2)) \quad\text{and}\quad \mathrm{vol}(T(S_2)) - \mathrm{vol}(T(S_1)) = |\det T|\bigl(\mathrm{vol}(S_2)-\mathrm{vol}(S_1)\bigr) < |\det T|\,\varepsilon,$$
and since ε > 0 was arbitrary, we are done. (Note that, by Exercise 7.1.11 and Corollary 1.10, T(Ω) has a well-defined volume.) ∎
Given an n × n matrix A, define the ijᵗʰ cofactor $c_{ij}$ of A to be $(-1)^{i+j}$ times the determinant of the (n−1) × (n−1) matrix obtained by deleting the iᵗʰ row and the jᵗʰ column of A. Then we have the following formula, called the expansion in cofactors along the iᵗʰ row.
Proposition 5.14 Let A be an n × n matrix. Then for any fixed i, we have
$$\det A = \sum_{j=1}^n a_{ij}c_{ij}.$$
Using rows here allows us to check that the expression on the right-hand side of this equation satisfies the properties of a determinant as set forth in Theorem 5.1. However, using the fact that det Aᵀ = det A, we can transpose this result to obtain the expansion in cofactors along the jᵗʰ column.
Proposition 5.15 Let A be an n × n matrix. Then for any fixed j, we have
$$\det A = \sum_{i=1}^n a_{ij}c_{ij}.$$
Note that when we define the determinant of a 1 x 1 matrix by the obvious rule,
det [a] = a,
Proposition 5.15 yields the familiar formula for the determinant of a 2 x 2 matrix and,
again, that of a 3 x 3 matrix.
► EXAMPLE 3
Let
$$A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & -2 & 3 \\ 0 & 2 & 1 \end{bmatrix}.$$
Expanding in cofactors along the second row, we have
$$\det A = (-1)^{2+1}(1)\begin{vmatrix} 1 & 3 \\ 2 & 1 \end{vmatrix} + (-1)^{2+2}(-2)\begin{vmatrix} 2 & 3 \\ 0 & 1 \end{vmatrix} + (-1)^{2+3}(3)\begin{vmatrix} 2 & 1 \\ 0 & 2 \end{vmatrix}$$
$$= -(1)(-5) + (-2)(2) - (3)(4) = -11.$$
Of course, because of the 0 entry in the third row, we'd have been smarter to expand in cofactors along the third row, obtaining
$$\det A = (-1)^{3+1}(0)\begin{vmatrix} 1 & 3 \\ -2 & 3 \end{vmatrix} + (-1)^{3+2}(2)\begin{vmatrix} 2 & 3 \\ 1 & 3 \end{vmatrix} + (-1)^{3+3}(1)\begin{vmatrix} 2 & 1 \\ 1 & -2 \end{vmatrix} = -2(3) + 1(-5) = -11. \ \blacktriangleleft$$
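Proposition 5.14 says the expansion gives the same answer along every row. The sketch below (not from the text) implements the expansion along an arbitrary row i and checks this on the matrix of Example 3:

```python
# Cofactor expansion along the first row, used to evaluate the minors.
def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

# Expansion in cofactors along row i: det A = sum_j a_ij * c_ij,
# where c_ij = (-1)^(i+j) times the determinant of the minor.
def expand_along_row(M, i):
    n = len(M)
    total = 0
    for j in range(n):
        minor = [r[:j] + r[j+1:] for k, r in enumerate(M) if k != i]
        total += (-1) ** (i + j) * M[i][j] * det(minor)
    return total

A = [[2, 1, 3], [1, -2, 3], [0, 2, 1]]
```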
Sketch of proof of Proposition 5.14 As we mentioned earlier, we must check that the expression on the right-hand side has the requisite properties. When we form a new matrix A′ by switching two adjacent columns (say, columns k and k + 1) of A, then whenever j ≠ k and j ≠ k + 1, we have $a'_{ij} = a_{ij}$ and $c'_{ij} = -c_{ij}$; on the other hand, when j = k, we have $a'_{ik} = a_{i,k+1}$ and $c'_{ik} = -c_{i,k+1}$; when j = k + 1, we have $a'_{i,k+1} = a_{ik}$ and $c'_{i,k+1} = -c_{ik}$, so
$$\sum_{j=1}^n a'_{ij}c'_{ij} = -\sum_{j=1}^n a_{ij}c_{ij},$$
as required. Similarly, when we form A′ by multiplying the kᵗʰ column of A by the scalar c, we have $a'_{ik} = ca_{ik}$ and $c'_{ik} = c_{ik}$, while for j ≠ k we have $a'_{ij} = a_{ij}$ and $c'_{ij} = c\,c_{ij}$; in either case,
$$\sum_{j=1}^n a'_{ij}c'_{ij} = c\sum_{j=1}^n a_{ij}c_{ij},$$
as required. Suppose now that we replace the kᵗʰ column by the sum of two column vectors, viz., $\mathbf{a}'_k = \mathbf{a}_k + \mathbf{a}''_k$. Then for j ≠ k, we have $c'_{ij} = c_{ij} + c''_{ij}$ and $a'_{ij} = a_{ij} = a''_{ij}$. When j = k, we likewise have $c'_{ik} = c_{ik} = c''_{ik}$, but $a'_{ik} = a_{ik} + a''_{ik}$. So
$$\sum_{j=1}^n a'_{ij}c'_{ij} = \sum_{j=1}^n a_{ij}c_{ij} + \sum_{j=1}^n a''_{ij}c''_{ij},$$
as required. ∎
of operations required: expanding an n × n determinant in cofactors takes on the order of n! multiplications, whereas Gaussian elimination takes on the order of n³. Thus, we see that once n > 4, it is sheer folly to calculate a determinant by the cofactor method (unless almost all the entries of the matrix happen to be 0).
We conclude this section with a few classic formulas. The first is particularly useful for solving 2 × 2 systems of equations and may be useful even for larger n if you are interested only in a certain component $x_i$ of the solution vector.
Proposition 5.16 (Cramer's Rule) Let A be a nonsingular n × n matrix, and suppose Ax = b. Then
$$x_i = \frac{\det B_i}{\det A},$$
where $B_i$ is the matrix obtained by replacing the iᵗʰ column of A by the vector b.
► EXAMPLE 4
We wish to solve
$$\begin{bmatrix} 2 & 3 \\ 4 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}.$$
We have
$$B_1 = \begin{bmatrix} 3 & 3 \\ -1 & 7 \end{bmatrix} \quad\text{and}\quad B_2 = \begin{bmatrix} 2 & 3 \\ 4 & -1 \end{bmatrix},$$
so, by Cramer's rule,
$$x_1 = \frac{\det B_1}{\det A} = \frac{24}{2} = 12 \quad\text{and}\quad x_2 = \frac{\det B_2}{\det A} = \frac{-14}{2} = -7. \ \blacktriangleleft$$
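Example 4 can be sketched directly in code: form B₁ and B₂ by replacing a column of A with b, and divide determinants.

```python
# Cramer's rule for a 2x2 system Ax = b.
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def cramer2(A, b):
    d = det2(A)
    B1 = [[b[0], A[0][1]], [b[1], A[1][1]]]   # replace column 1 by b
    B2 = [[A[0][0], b[0]], [A[1][0], b[1]]]   # replace column 2 by b
    return det2(B1) / d, det2(B2) / d

x1, x2 = cramer2([[2, 3], [4, 7]], [3, -1])
```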
We now deduce from Cramer's rule an "explicit" formula for the inverse of a nonsingular matrix. Students always seem to want an alternative to Gaussian elimination, but what follows is practical only for the 2 × 2 case (where it gives us our familiar formula from Example 5 on p. 154) and, barely, for the 3 × 3 case.
Proposition 5.17 Let A be a nonsingular matrix, and let C = [c_{ij}] be the matrix of its cofactors. Then
$$A^{-1} = \frac{1}{\det A}\,C^{\mathsf{T}}.$$
Proof We recall from p. 152 that the jᵗʰ column vector of A⁻¹ is the solution of Ax = $\mathbf{e}_j$, where $\mathbf{e}_j$ is the jᵗʰ standard basis vector for ℝⁿ. Now, Cramer's rule tells us that the iᵗʰ coordinate of the jᵗʰ column of A⁻¹ is
$$(A^{-1})_{ij} = \frac{\det A_{ji}}{\det A},$$
where $A_{ji}$ is the matrix obtained by replacing the iᵗʰ column of A by $\mathbf{e}_j$. Now, we calculate det $A_{ji}$ by expanding in cofactors along the iᵗʰ column of the matrix $A_{ji}$. Since the only nonzero entry of that column is the jᵗʰ, and since all its remaining columns are those of the original matrix A, we find that $\det A_{ji} = c_{ji}$, and so $(A^{-1})_{ij} = c_{ji}/\det A$; i.e., $A^{-1} = \frac{1}{\det A}C^{\mathsf{T}}$. ∎
For 3 × 3 matrices, this formula isn't bad when det A would cause troublesome arithmetic in doing Gaussian elimination.
► EXAMPLE 5
If
$$A = \begin{bmatrix} 1 & 2 & 1 \\ -1 & 1 & 2 \\ 2 & 0 & 3 \end{bmatrix},$$
then
$$\det A = (1)\begin{vmatrix} 1 & 2 \\ 0 & 3 \end{vmatrix} - (2)\begin{vmatrix} -1 & 2 \\ 2 & 3 \end{vmatrix} + (1)\begin{vmatrix} -1 & 1 \\ 2 & 0 \end{vmatrix} = 15,$$
and so we suspect the fractions would not be fun if we implemented Gaussian elimination. Undaunted, we calculate the cofactor matrix:
$$C = \begin{bmatrix} 3 & 7 & -2 \\ -6 & 1 & 4 \\ 3 & -3 & 3 \end{bmatrix},$$
and so
$$A^{-1} = \frac{1}{\det A}\,C^{\mathsf{T}} = \frac{1}{15}\begin{bmatrix} 3 & -6 & 3 \\ 7 & 1 & -3 \\ -2 & 4 & 3 \end{bmatrix}. \ \blacktriangleleft$$
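The adjugate formula of Proposition 5.17 is easy to automate. This sketch (not from the text) computes the cofactor matrix by brute force, keeps the entries as exact fractions, and recovers the inverse of the Example 5 matrix:

```python
from fractions import Fraction

# Cofactor expansion along the first row.
def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

# A^{-1} = (1/det A) * C^T, where C is the cofactor matrix.
def inverse_by_cofactors(A):
    n = len(A)
    d = det(A)
    C = [[(-1) ** (i + j) *
          det([r[:j] + r[j+1:] for k, r in enumerate(A) if k != i])
          for j in range(n)] for i in range(n)]
    return [[Fraction(C[j][i], d) for j in range(n)] for i in range(n)]

A = [[1, 2, 1], [-1, 1, 2], [2, 0, 3]]
Ainv = inverse_by_cofactors(A)
```

Multiplying A by the result gives the identity matrix, confirming the formula.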
Proposition 5.18 Let A be an n × n matrix. Then
$$\det A = \sum_{\sigma} \operatorname{sign}(\sigma)\,a_{\sigma(1),1}a_{\sigma(2),2}\cdots a_{\sigma(n),n},$$
where the sum is taken over all permutations σ of {1, 2, …, n}.
Proof The jᵗʰ column of A is the vector $\mathbf{a}_j = \sum_{i=1}^n a_{ij}\mathbf{e}_i$, and so, by Properties (2) and (3), we have
$$D(\mathbf{a}_1,\dots,\mathbf{a}_n) = D\Bigl(\sum_{i_1=1}^n a_{i_1 1}\mathbf{e}_{i_1},\ \sum_{i_2=1}^n a_{i_2 2}\mathbf{e}_{i_2},\ \dots,\ \sum_{i_n=1}^n a_{i_n n}\mathbf{e}_{i_n}\Bigr) = \sum_{i_1,\dots,i_n} a_{i_1 1}a_{i_2 2}\cdots a_{i_n n}\,D(\mathbf{e}_{i_1},\dots,\mathbf{e}_{i_n}).$$
Whenever two of the indices $i_1,\dots,i_n$ agree, the corresponding term vanishes by the Lemma; when they are all distinct, they determine a permutation σ, and $D(\mathbf{e}_{\sigma(1)},\dots,\mathbf{e}_{\sigma(n)}) = \operatorname{sign}(\sigma)$ by Properties (1) and (4). ∎
Recall that GL(n) denotes the set of invertible n × n matrices (which, by Exercise 6.1.6, is an open subset of $\mathcal{M}_{n\times n}$).
Corollary 5.19 The function f: GL(n) —> GL(n), f(A) = A"1, is smooth.
► EXERCISES 7.5
1. Calculate the following determinants:
(a) $\begin{vmatrix} -1 & 6 & -2 \\ 3 & 4 & 5 \\ 5 & 2 & 1 \end{vmatrix}$  *(b) $\begin{vmatrix} 1 & 0 & 2 & 0 \\ -1 & 2 & -2 & 0 \\ 0 & 1 & 2 & 6 \\ 1 & 1 & 3 & 2 \end{vmatrix}$  (c) $\begin{vmatrix} 1 & 4 & 1 & -3 \\ 2 & 1 & 0 & 1 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & -2 & 1 \end{vmatrix}$  *(d) $\begin{vmatrix} 2 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & -1 & 2 \end{vmatrix}$
2. Suppose one column of the matrix A consists only of 0 entries; i.e., $\mathbf{a}_i = \mathbf{0}$ for some i. Prove that det A = 0.
3. Prove Corollary 5.6.
#4. Prove (without using Proposition 5.11) that for any elementary matrix E, we have det Eᵀ = det E. (Hint: Consider each of the three types of elementary matrices.)
5. Let A be an n × n matrix and let c be a scalar. Prove det(cA) = cⁿ det A.
6. Prove that if the entries of a matrix A are integers, then det A is an integer. (Hint: Use Proposition
5.14 and induction or Proposition 5.18.)
7. Given that 1898, 3471, 7215, and 8164 are all divisible by 13, use only the properties of determinants and the result of Exercise 6 to prove that
$$\begin{vmatrix} 1 & 8 & 9 & 8 \\ 3 & 4 & 7 & 1 \\ 7 & 2 & 1 & 5 \\ 8 & 1 & 6 & 4 \end{vmatrix}$$
is divisible by 13.
8. Let $A = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$, $B = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$, and $C = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$ be points in ℝ². Show that the signed area of △ABC is given by
$$\frac12\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ 1 & 1 & 1 \end{vmatrix}.$$
(b) Suppose now that A, B, and D are as in part a, and C ∈ $\mathcal{M}_{k\times k}$. Prove that if A is invertible, then
$$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det A\,\det(D - CA^{-1}B).$$
(d) Give examples to show that the result of part c needn't hold when A is singular or when A and C do not commute.
*11. Suppose A is an orthogonal n × n matrix. (Recall that this means that AᵀA = Iₙ.) Compute det A.
12. Suppose A is a skew-symmetric n × n matrix. (Recall that this means that Aᵀ = −A.) Prove that when n is odd, det A = 0. Give an example to show this needn't be true when n is even. (Hint: Use Exercise 5.)
*13. Let $A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 3 & 0 \\ 1 & 4 & 2 \end{bmatrix}$.
(a) If $A\mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$, use Cramer's rule to find $x_2$.
(b) Find A⁻¹ by using cofactors.
*14. Using cofactors, find the determinant and the inverse of the matrix
$$A = \begin{bmatrix} -1 & 2 & 3 \\ 2 & 1 & 0 \\ 0 & 2 & 3 \end{bmatrix}.$$
#15. (a) Suppose A is an n × n matrix with integer entries and det A = ±1. Prove that A⁻¹ has all integer entries.
(b) Conversely, suppose A and A⁻¹ are both matrices with integer entries. Prove that det A = ±1.
16. Prove that the exchange of any pair of rows (or columns) of a matrix can be accomplished by an
odd number of exchanges of adjacent pairs.
17. Suppose A is an orthogonal n x n matrix. Show that the cofactor matrix C = ±A.
18. Generalizing the result of Proposition 5.17, prove that ACᵀ = (det A)I even if A happens to be singular. In particular, when A is singular, what can you conclude about the columns of Cᵀ?
19. (a) Show that if $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$ are distinct points in ℝ², then the unique line passing through them is given by the equation
$$\begin{vmatrix} 1 & 1 & 1 \\ x & x_1 & x_2 \\ y & y_1 & y_2 \end{vmatrix} = 0.$$
(b) Show that if $\begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$, $\begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$, and $\begin{bmatrix} x_3 \\ y_3 \\ z_3 \end{bmatrix}$ are noncollinear points in ℝ³, then the unique plane passing through them is given by the equation
$$\begin{vmatrix} 1 & 1 & 1 & 1 \\ x & x_1 & x_2 & x_3 \\ y & y_1 & y_2 & y_3 \\ z & z_1 & z_2 & z_3 \end{vmatrix} = 0.$$
20. As we saw in Exercises 4.1.22 and 4.1.23, through any three noncollinear points in ℝ² there pass a unique parabola⁶ y = ax² + bx + c and a unique circle x² + y² + ax + by + c = 0. Given three such points, $\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$, $\begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$, and $\begin{bmatrix} x_3 \\ y_3 \end{bmatrix}$, show that the equations of the parabola and circle are, respectively,
$$\begin{vmatrix} 1 & 1 & 1 & 1 \\ x & x_1 & x_2 & x_3 \\ x^2 & x_1^2 & x_2^2 & x_3^2 \\ y & y_1 & y_2 & y_3 \end{vmatrix} = 0 \quad\text{and}\quad \begin{vmatrix} 1 & 1 & 1 & 1 \\ x & x_1 & x_2 & x_3 \\ y & y_1 & y_2 & y_3 \\ x^2+y^2 & x_1^2+y_1^2 & x_2^2+y_2^2 & x_3^2+y_3^2 \end{vmatrix} = 0.$$
21. Using Corollary 5.6, prove that the determinant function is uniquely determined by the properties listed in Theorem 5.1. (Hint: Mimic the proof of Proposition 5.7. It might be helpful to consider two functions, $\det$ and $\widetilde{\det}$, that have these properties and prove that $\det(A) = \widetilde{\det}(A)$ for every square matrix A.)
6Here we must also assume that no pair of the points lies on a vertical line.
is the square of the (k-dimensional) volume of the k-dimensional parallelepiped spanned by v₁, …, v_k. (Hints: First take care of the case that {v₁, …, v_k} is linearly dependent. Now, supposing they are linearly independent and therefore span a k-dimensional subspace V, choose an orthonormal basis {u_{k+1}, …, u_n} for V^⊥. What is the relation between the k-dimensional volume of the parallelepiped spanned by v₁, …, v_k and the n-dimensional volume of the parallelepiped spanned by v₁, …, v_k, u_{k+1}, …, u_n?)
23. (a) Using Proposition 5.18, prove that D(det)(I)B = tr B = b₁₁ + ⋯ + bₙₙ. (See Exercise 1.4.22.)
(b) More generally, show that for any invertible matrix A, D(det)(A)B = det A tr(A⁻¹B).
24. Give an alternative proof of Proposition 5.13 for general parallelepipeds as follows. Let R ⊂ ℝⁿ be a parallelepiped. Suppose T: ℝⁿ → ℝⁿ is a linear map of either of the forms
Calculate the volume of R and of T(R) by applying Fubini's Theorem, putting the x₁ integral innermost. (This is in essence a proof of Cavalieri's principle.)
25. (From the 1994 Putnam Exam) Find the value of m so that the line y = mx bisects the region
Figure 6.1
We leave it to the reader to check in Exercise 1 that these are indeed norms and, as will be crucial for us, that ‖T(x)‖ ≤ ‖T‖ ‖x‖ for all x ∈ ℝⁿ. Our first result, depicted in Figure 6.1, estimates how much a C¹ map can distort a cube.
Lemma 6.1 Let $C_r$ denote the cube in ℝⁿ of sidelength 2r centered at 0. Suppose U ⊂ ℝⁿ is an open set containing $C_r$ and φ: U → ℝⁿ is a C¹ function with the property that φ(0) = 0 and ‖Dφ(x) − I‖ ≤ ε for all x ∈ $C_r$ and some 0 < ε < 1. Then
$$C_{(1-\varepsilon)r} \subset \phi(C_r) \subset C_{(1+\varepsilon)r}.$$
Proof One can check that Proposition 1.3 of Chapter 6 holds when we use the ‖·‖ norm instead of the usual one (see Exercise 1). Then if x ∈ $C_r$, we have
$$\|\phi(\mathbf{x})\| = \|\phi(\mathbf{x}) - \phi(\mathbf{0})\| \le \Bigl(\sup_{\mathbf{y}\in C_r}\|D\phi(\mathbf{y})\|\Bigr)\|\mathbf{x}\| \le (1+\varepsilon)r,$$
so $\phi(C_r) \subset C_{(1+\varepsilon)r}$. The other inclusion can be proved by applying Exercise 6.2.11 in the ‖·‖ norm. ∎
The crucial ingredient in the proof of the Change of Variables Theorem is the following
result, which says that for sufficiently small cubes C, the image g(C) is well approximated
by the image under the derivative at the center of C.
Proposition 6.2 Suppose U ⊂ ℝⁿ is open, g: U → ℝⁿ is C¹, and Dg(x) is invertible for every x ∈ U. Let C ⊂ U be a cube with center a, and suppose ‖Dg(a)⁻¹∘Dg(x) − I‖ ≤ ε for all x ∈ C and some 0 < ε < 1. Then
$$(1-\varepsilon)^n|\det Dg(\mathbf{a})|\,\mathrm{vol}(C) \le \mathrm{vol}(g(C)) \le (1+\varepsilon)^n|\det Dg(\mathbf{a})|\,\mathrm{vol}(C).$$
Proof Since g is C1 with invertible derivative at each point of U, g maps open sets
to open sets and the frontier of g(C) is the image of the frontier of C, hence a set of zero
volume (see Exercise 7.1.12). Therefore, g(C) is a region.
Suppose the sidelength of the cube C is 2r. We apply Lemma 6.1 to the function φ defined by
$$\phi(\mathbf{x}) = Dg(\mathbf{a})^{-1}\bigl(g(\mathbf{x}+\mathbf{a}) - g(\mathbf{a})\bigr).$$
Then φ(0) = 0, Dφ(0) = I, and Dφ(x) = Dg(a)⁻¹∘Dg(x + a), so, by the hypothesis, ‖Dφ(x) − I‖ ≤ ε for all x ∈ $C_r$. Therefore, we have
$$C_{(1-\varepsilon)r} \subset \phi(C_r) \subset C_{(1+\varepsilon)r},$$
and so
$$Dg(\mathbf{a})\bigl(C_{(1-\varepsilon)r}\bigr) \subset g(C_r+\mathbf{a}) - g(\mathbf{a}) \subset Dg(\mathbf{a})\bigl(C_{(1+\varepsilon)r}\bigr).$$
Applying Proposition 5.13, using the fact that $\mathrm{vol}(C_{\alpha r}) = \alpha^n\,\mathrm{vol}(C_r)$, and remembering that translation preserves volume, we obtain the result. ∎
We begin our onslaught on the Change of Variables Theorem with a very simple case,
whose proof is left to the reader in Exercise 2.
Lemma 6.3 Suppose T: Rⁿ → Rⁿ is a linear map whose standard matrix is diagonal
and nonsingular. Let R ⊂ Rⁿ be a rectangle, and suppose f is integrable on T(R). Then
f∘T is integrable on R and

∫_{T(R)} f dV = |det T| ∫_R (f∘T) dV.
Remark One can strengthen the theorem, in particular by allowing Dg(x) to fail to
be invertible on a set of volume 0. This is important for many applications—e.g., polar,
cylindrical, and spherical coordinates. But we won’t bother justifying it here.
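The polar case of this remark can be sanity-checked numerically. The sketch below (plain Python; the function name and grid size are ours, not the text's) integrates f = 1 over the unit disk via the (r, θ) rectangle [0, 1] × [0, 2π], using the Jacobian factor r, and recovers the disk's area π.

```python
import math

# Numerical sanity check of the Change of Variables Theorem for
# g(r, theta) = (r cos theta, r sin theta), whose Jacobian determinant is r.
# Integrating f = 1 over the unit disk via the (r, theta) rectangle
# [0, 1] x [0, 2*pi] should give the disk's area, pi.

def midpoint_sum(n=400):
    total = 0.0
    dr, dth = 1.0 / n, 2 * math.pi / n
    for i in range(n):
        r = (i + 0.5) * dr          # midpoint in r
        for j in range(n):
            total += 1.0 * r * dr * dth   # f(g(r,theta)) * |det Dg| * dV
    return total

approx = midpoint_sum()
print(abs(approx - math.pi) < 1e-4)  # True
```

The midpoint rule is exact here in the r-direction (the integrand is linear in r), so even a modest grid reproduces π to machine precision.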
and so

∫_R (f∘g)(x) |det Dg(x)| dV_x = |det T| ∫_C ((f∘g)∘T)(u) |det Dg(T(u))| dV_u   (by the lemma)

= ∫_C ((f∘g)∘T)(u) |det D(g∘T)(u)| dV_u   (by the previous comment)

= ∫_C (f∘(g∘T))(u) |det D(g∘T)(u)| dV_u.
Thus, to prove the theorem, we substitute g°T for g and work on the cube C; that is, it
suffices to assume R is a cube.
There are positive constants M and N so that |f| ≤ M (by integrability) and
‖(Dg)⁻¹‖_□ ≤ N (by continuity and compactness). Choose 0 < ε < 1. By uniform continuity,
Theorem 1.4 of Chapter 5, there is δ₁ > 0 so that ‖Dg(x) − Dg(y)‖_□ < ε/N whenever
‖x − y‖ < δ₁, x, y ∈ R. Similarly, there is δ₂ > 0 so that |det Dg(x) − det Dg(y)| < ε/M
whenever ‖x − y‖ < δ₂, x, y ∈ R. And by integrability of (f∘g)|det Dg|, there is δ₃ > 0
so that whenever the diameter of the cubes of a cubical partition P is less than δ₃, we have
U((f∘g)|det Dg|, P) − L((f∘g)|det Dg|, P) < ε (see Exercise 7.1.10).
Suppose P = {R₁, …, R_s} is a partition of R into cubes of diameter less than δ =
min(δ₁, δ₂, δ₃). Let

M_i = sup_{x∈R_i} (f∘g)(x);      M̃_i = sup_{x∈R_i} (f∘g)(x)|det Dg(x)|;
m_i = inf_{x∈R_i} (f∘g)(x);      m̃_i = inf_{x∈R_i} (f∘g)(x)|det Dg(x)|.
We claim that if a_i is the center of the cube R_i, then

m_i |det Dg(a_i)| ≥ m̃_i − ε   and   M_i |det Dg(a_i)| ≤ M̃_i + ε.

We check the latter: Choose a sequence of points x_k ∈ R_i so that (f∘g)(x_k) → M_i (and
we assume M_i ≥ 0 and all (f∘g)(x_k) ≥ 0 for convenience). We have |det Dg(a_i)| ≤
|det Dg(x_k)| + ε/M and so

(f∘g)(x_k)|det Dg(a_i)| ≤ (f∘g)(x_k)|det Dg(x_k)| + (f∘g)(x_k)·(ε/M) ≤ M̃_i + ε,

as required.
On any cube R_i with center a_i, we have

‖Dg(a_i)⁻¹∘Dg(x) − I‖_□ ≤ ‖Dg(a_i)⁻¹‖_□ ‖Dg(x) − Dg(a_i)‖_□ < N·(ε/N) = ε,

so Proposition 6.2 applies on each R_i. Therefore, we have

(1 − ε)ⁿ Σ_{i=1}^s m_i |det Dg(a_i)| vol(R_i) ≤ ∫_{g(R)} f dV ≤ (1 + ε)ⁿ Σ_{i=1}^s M_i |det Dg(a_i)| vol(R_i).
Therefore,

Σ_{i=1}^s m̃_i vol(R_i) − ε(vol(R) + Mn) ≤ ∫_{g(R)} f dV ≤ Σ_{i=1}^s M̃_i vol(R_i) + ε(2ⁿ vol(R) + 2ⁿ⁻¹Mn).
We’ve almost arrived at the end. For convenience, let β = 2ⁿ vol(R) + 2ⁿ⁻¹Mn. Recall
that since (f∘g)|det Dg| is integrable, its integral is the unique number lying between
all its upper and lower sums. Suppose now that ∫_{g(R)} f dV ≠ ∫_R (f∘g)|det Dg| dV. In
particular, suppose ∫_{g(R)} f dV = ∫_R (f∘g)|det Dg| dV + γ for some γ > 0. Let ε > 0 be
chosen small enough so that (β + 1)ε < γ. We have

∫_{g(R)} f dV ≤ U((f∘g)|det Dg|, P) + βε ≤ ∫_R (f∘g)|det Dg| dV + (β + 1)ε

< ∫_R (f∘g)|det Dg| dV + γ = ∫_{g(R)} f dV,

a contradiction. A similar argument using lower sums rules out γ < 0, and the theorem follows. ■
► EXAMPLE 1
First, to be official, we check that the formulas we derived in a heuristic manner in Section 4 are valid.
a. Polar coordinates: g(r, θ) = (r cos θ, r sin θ). Then

det Dg(r, θ) = det | cos θ   −r sin θ |
                   | sin θ    r cos θ |  = r.

b. Cylindrical coordinates: g(r, θ, z) = (r cos θ, r sin θ, z). Then

det Dg(r, θ, z) = det | cos θ   −r sin θ   0 |
                      | sin θ    r cos θ   0 |
                      |   0         0      1 |  = r.

c. Spherical coordinates: Let g(ρ, φ, θ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ). Then

det Dg(ρ, φ, θ) = det | sin φ cos θ   ρ cos φ cos θ   −ρ sin φ sin θ |
                      | sin φ sin θ   ρ cos φ sin θ    ρ sin φ cos θ |
                      |   cos φ         −ρ sin φ             0       |  = ρ² sin φ. ◄
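The three computations above can be checked mechanically. Here is a sketch using sympy (the variable names are ours) that recomputes the three Jacobian determinants.

```python
import sympy as sp

# Symbolic check that the Jacobian determinants of the polar, cylindrical,
# and spherical maps are r, r, and rho**2 * sin(phi), as in Example 1.
r, th, z, rho, phi = sp.symbols('r theta z rho phi', positive=True)

polar = sp.Matrix([r * sp.cos(th), r * sp.sin(th)])
cyl   = sp.Matrix([r * sp.cos(th), r * sp.sin(th), z])
sph   = sp.Matrix([rho * sp.sin(phi) * sp.cos(th),
                   rho * sp.sin(phi) * sp.sin(th),
                   rho * sp.cos(phi)])

det_polar = sp.simplify(polar.jacobian([r, th]).det())
det_cyl   = sp.simplify(cyl.jacobian([r, th, z]).det())
det_sph   = sp.simplify(sph.jacobian([rho, phi, th]).det())

print(det_polar, det_cyl, det_sph)
```

Note that the sign of det Dg depends on the ordering of the variables; with the ordering (ρ, φ, θ) used above, the spherical determinant comes out positive.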
► EXAMPLE 2
Let S ⊂ R² be the parallelogram with vertices as pictured in Figure
6.2. Evaluate ∫_S x dA. Of course, with a bit of patience, we could evaluate this by three different
iterated integrals in Cartesian coordinates, but it makes sense to take a linear transformation g that
maps the unit square, R, to the region S; e.g.,
► EXAMPLE 3
first quadrant):
to S. Now,
Figure 6.3
6 Change of Variables Theorem <4 331
dvdu
U= —. ◄
3
► EXERCISES 7.6
1. Suppose x, y ∈ Rⁿ, S and T are linear maps from Rⁿ to Rᵐ, and c ∈ R.
(a) Prove that ‖x + y‖_□ ≤ ‖x‖_□ + ‖y‖_□ and ‖cx‖_□ = |c|‖x‖_□.
(b) Prove that ‖S + T‖_□ ≤ ‖S‖_□ + ‖T‖_□ and ‖cT‖_□ = |c|‖T‖_□.
(c) Prove that ‖T(x)‖_□ ≤ ‖T‖_□‖x‖_□.
(d) Suppose the standard matrix for T is the m × n matrix A. Prove that ‖T‖_□ = max_{1≤i≤m} Σ_{j=1}^n |a_ij|.
(e) Check that ‖x‖_□ ≤ ‖x‖ ≤ √n ‖x‖_□ and (1/√n)‖T‖_□ ≤ ‖T‖ ≤ √n ‖T‖_□.
(f) Suppose g: [a, b] → Rⁿ is continuous. Prove that

‖∫_a^b g(t) dt‖_□ ≤ ∫_a^b ‖g(t)‖_□ dt.

(This is needed to prove Proposition 1.3 of Chapter 6 with the ‖·‖_□ norm.)
2. Prove Lemma 6.3.
3. Find the area of the ellipse x²/a² + y²/b² ≤ 1 and the volume of the ellipsoid x²/a² + y²/b² + z²/c² ≤ 1. (Cf.
also Exercise 9.4.17.)
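A Monte Carlo sketch can be used to check one's answer to the area question numerically. We assume here the standard result that the map (u, v) ↦ (au, bv), of determinant ab, carries the unit disk (area π) onto the ellipse, so the expected area is πab; the sample size, seed, and values of a and b below are arbitrary choices of ours.

```python
import math
import random

# Monte Carlo estimate of the area of the ellipse x^2/a^2 + y^2/b^2 <= 1
# by sampling uniformly from the bounding rectangle [-a, a] x [-b, b].
random.seed(0)
a, b = 3.0, 2.0
n = 200000
hits = sum((random.uniform(-a, a) / a) ** 2
           + (random.uniform(-b, b) / b) ** 2 <= 1
           for _ in range(n))
area = hits / n * (2 * a) * (2 * b)   # fraction of hits times rectangle area
print(abs(area - math.pi * a * b) < 0.2)  # True (well within sampling error)
```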
4. Let S be the triangle with vertices at (0, 0), (1, 0), and (0, 1). Let f(x, y) = e^{y/(x+y)}. Evaluate
Js 2x + y
6. Rework Example 3 with the substitution u = xy, v = y.
*7. Let S be the plane region bounded by x = 0, y = 0, and x + y = 1. Evaluate ∫_S … dA.
(Remark: The integrand is undefined at the origin. Does this cause a problem?)
8. Find the volume of the region bounded below by the plane z = 0 and above by the elliptical
paraboloid z = 16 — x2 — 4y2.
9. Let S be the plane region in the first quadrant bounded by the curves y = x, y = 2x, and xy = 3.
Evaluate ∫_S x dA.
*10. Let S be the plane region in the first quadrant bounded by the curves y = x, y = 2x, xy = 3,
and xy = 1. Evaluate ∫_S (x/y) dA.
11. Let S be the region in the first quadrant bounded by y = 0, y = x, xy = 1, and x² − y² = 1.
Evaluate ∫_S (x² + y²) dA. (Hint: The obvious change of variables is u = xy, v = x² − y². Here it is
too hard to find (x, y) = g(u, v) explicitly, but how can you find det Dg another way?)
12. Let S be the region bounded by y = −x, y = 1 − x, y = 2x, and y = 2x − 1. Evaluate

∫_S (x + y)/(2x − y + 1)⁴ dA.
*13. Let S be the region with x > 0 bounded by y + x² = 0, x − y = 2, and x² − 2x + 4y = 0.
Evaluate ∫_S (x − y + 1)⁻² dA. (Hint: Consider x = u + v, y = v − u².)
14. Suppose 0 < b < a. Define g: (0, b) × (0, 2π) × (0, 2π) → R³ by

g(r, φ, θ) = ((a + r cos φ) cos θ, (a + r cos φ) sin θ, r sin φ).
A= 1 2 3 1
1 2 3 n
⁷We learned of this calculation from Simmons’s Calculus with Analytic Geometry, First Edition, pp. 751–52.
CHAPTER 8
DIFFERENTIAL FORMS
AND INTEGRATION ON
MANIFOLDS
In this chapter we come to the culmination of our study of multivariable calculus. Just as
in single-variable calculus, we’ve studied two seemingly unrelated topics—the derivative
and the integral. Now the time has come to make the connection between the two, namely,
the multivariable version of the Fundamental Theorem of Calculus. After building up to
the ultimate theorem, we consider some nontrivial applications to physics and topology.
► 1 MOTIVATION
We want to be able to integrate on k-dimensional manifolds, so we begin by introducing
the appropriate integrands, which are called (differential) k-forms. These integrals should
generalize the ideas of work (done by a force field along a directed curve) and flux (of a
vector field outward across a surface). But not only are k-forms invented to be integrated,
they can also be differentiated. There is a natural operator d, called the exterior derivative,
which will turn k-forms into (k + 1)-forms. The classical Fundamental Theorem of Calculus,
we recall, tells us that
∫_a^b f′(t) dt = f(b) − f(a)
whenever f is C1. We should think of this as relating the integral of the derivative over the
interval [a, b] to the “integral” of f over the boundary of the interval, which in this case is
the signed sum of the values f(b) and f(a). Notice that there is a notion of direction or
orientation built into the integral, inasmuch as ∫_b^a f(t) dt = −∫_a^b f(t) dt. In this guise,
we can write the Fundamental Theorem of Calculus in the form

∫_[a,b] df = ∫_∂[a,b] f = f(b) − f(a).

The ultimate theorem to which we have referred, Stokes’s Theorem, takes the form

∫_M dω = ∫_∂M ω
334 ► Chapter 8. Differential Forms and Integration on Manifolds
for any (k − 1)-form ω and compact, oriented k-dimensional manifold M with boundary ∂M.
The original versions of Stokes’s Theorem all arose in the first half of the nineteenth century
in connection with physics, particularly potential theory and electrostatics.
Just as the Fundamental Theorem of Calculus tells us that our displacement is the
integral of our velocity, so can it tell us the area of a plane region by tracing around its
boundary (see Exercises 1.5.3 and 8.3.26). Another instance of the Fundamental Theorem
of Calculus is Gauss’s Law in physics, which tells us that the total flux of the electric field
across a “Gaussian surface” is proportional to the total charge contained inside that surface.
And, as we shall see in Section 7, another application is the Hairy Ball Theorem, which tells
us we can’t comb the hairs on a billiard ball. The elegant modern-day theory of calibrated
geometries, which grew out of understanding minimal surfaces (the surfaces of least area
with a given boundary curve), is based on differential forms and Stokes’s Theorem.
As we’ve seen in Sections 5 and 6 of Chapter 7, determinants play a crucial role in the
understanding of n-dimensional volume, and so it is not surprising that k-forms, the objects
we wish to integrate over k-dimensional surfaces, will be built out of determinants. We
turn to this multilinear algebra in the next section.
1. Why does a (plane) mirror reverse left and right but not up and down?
2. Appropriating from Tom and Ray Magliozzi’s “Car Talk”:
RAY: Picture this. It’s 1936. You’re in your second year of high school. Europe is
on the brink of yet another war.
TOM: Second senior year in high school.
RAY: In a secret location in Germany, German officers are gathered around a table
with the designers and builders of its new personnel carrier. They’re going over every
little detail and leaving no stone unturned. They want everything to be flawless. One of
the officers stands up and says, “I have a question about the fan belt, about the longevity
of the fan belt.” You with me?
TOM: They spoke English there?
RAY: Oh, yeah.
TOM: Just like in all the movies?
RAY: I’m reading the subtitles.
TOM: Just like in all the movies. I often wondered how come they all spoke English?
RAY: Well, it’s so close to German, after all.
TOM: Yeah. You just add an ish or ein to the end of everything.
RAY: Anyway, this fan belt looks just like the belt around your waist. It’s a flat piece
of rubber, and it’s designed to run around the fan and the generator. So, he asks, “How
long do you expect the belt to last?” The engineer says, “30 to 40 thousand kilometers.”
The officer says, “Not good enough.”
TOM: He said, how many miles is that?
RAY: The colonel says ...
TOM: That’s why I never made any money in scriptwriting.
RAY: Yeah. The colonel says, “Not good enough. We need it to last at least 60K.”
The engineer says, “Huh. Not a problem. It’s just a question of taking off the belt and
flipping it over, right?”
TOM: Sure.
RAY: Turning it inside-out.
2 Differential Forms 335
TOM: Yeah.
RAY: The officer says, “That’s unacceptable. Our soldiers will be engaged in battle.
We can’t ask them to change fan belts in the middle of the battlefield.”
TOM: Well, it’s a good point.
RAY: That’s right.
TOM: I mean, come on. You can’t tell the guys to stop shooting, your fan belt’s got
to be replaced.
RAY: Exactly. Hold your fire. So, the engineers huddle together, and they come up
with a clever design change. And I think I mentioned they do not change the material of
the belt in any way, yet they satisfy the new longevity requirement quite easily. What did
they do?
TOM: Whew!
(Source: Tom and Ray Magliozzi from Car Talk on NPR.)
► 2 DIFFERENTIAL FORMS
We have learned how to calculate multiple integrals over regions in Rⁿ. Our next goal is to
be able to integrate over compact manifolds, e.g., curves and surfaces in R³. In some sense,
the most basic question is this: We know that the determinant gives the signed volume of an
n-dimensional parallelepiped in Rⁿ; how do we find the signed volume of a k-dimensional
parallelepiped in Rⁿ, and what does “signed” mean in this instance?
We begin by defining, for each i = 1, …, n, the function dx_i: Rⁿ → R by dx_i(v) = v_i, the i-th coordinate of v.
(The reason for the bizarre notation will soon become clear.) Note that the set of linear maps
from Rⁿ to R is an n-dimensional vector space, often denoted (Rⁿ)*, and {dx₁, …, dx_n}
is a basis for it. (See Exercise 4.3.25.) For if φ: Rⁿ → R is a linear map, then, letting
{e₁, …, e_n} be the standard basis for Rⁿ, we set a_i = φ(e_i), i = 1, …, n. Then
φ = a₁dx₁ + ⋯ + a_n dx_n, so dx₁, …, dx_n span (Rⁿ)*. Why do they form a linearly
independent set? Well, suppose c₁dx₁ + ⋯ + c_n dx_n is the zero linear map. Then,
evaluating at e_i, we get c_i = 0 for all i = 1, …, n, as required.
Now, if I = (i₁, …, i_k) is an ordered k-tuple, define
¹Here we revert to the usual notation for functions, inasmuch as v₁, …, v_k are all vectors.
dx_I(v₁, …, v_k) = det | dx_{i₁}(v₁)  ⋯  dx_{i₁}(v_k) |
                       |      ⋮               ⋮       |
                       | dx_{i_k}(v₁) ⋯  dx_{i_k}(v_k) |.

As is the case with the determinant, dx_I defines an alternating, multilinear function of k
vectors in Rⁿ. If we write v_j = (v_{j,1}, …, v_{j,n}), then

dx_I(v₁, …, v_k) = det | v_{1,i₁}  ⋯  v_{k,i₁} |
                       |    ⋮            ⋮     |
                       | v_{1,i_k} ⋯  v_{k,i_k} |.

When i₁ < i₂ < ⋯ < i_k, this is of course the determinant of the k × k matrix obtained by
taking rows i₁, …, i_k of the matrix [v₁ v₂ ⋯ v_k].
► EXAMPLE 1
► EXAMPLE 2
When i₁ < i₂ < ⋯ < i_k, we say that the ordered k-tuple I = (i₁, …, i_k) is increasing.
If I is a k-tuple with no repeated index, we denote by I< the associated increasing k-tuple.
For example, if I = (2, 4, 5, 1), then I< = (1, 2, 4, 5), and we observe that dx_I = −dx_{I<}.
In general, dx_I = (−1)ˢ dx_{I<}, where s is the number of exchanges required to move from
I to I<. Note that if we switch two of the indices in the ordered k-tuple, this amounts to
switching two rows in the matrix, and the determinant changes sign. Similarly, if two of
the indices are equal, the determinant will always be 0, so dx_I = 0 whenever there is a
repeated index in I.
It follows from Theorem 5.1 or Proposition 5.18 of Chapter 7 that the set of dx_I with
I increasing spans the vector space of alternating multilinear functions from (Rⁿ)ᵏ to R,
denoted Λᵏ(Rⁿ)*. In particular, if T ∈ Λᵏ(Rⁿ)*, then for any increasing k-tuple I, set
a_I = T(e_{i₁}, …, e_{i_k}). Then we leave it to the reader to check that

T = Σ_{I increasing} a_I dx_I

and that the set of dx_I with I increasing forms a linearly independent set (see Exercise
1). Since counting the increasing sequences of k numbers between 1 and n is the same as
counting the number of k-element subsets of an n-element set, we have

dim(Λᵏ(Rⁿ)*) = C(n, k), the binomial coefficient.
Figure 2.1
Geometrically, dx_I(v₁, …, v_k) is the signed k-dimensional volume
of the projection onto the x_{i₁}x_{i₂}⋯x_{i_k}-plane of the parallelepiped spanned by v₁, …, v_k.
See Figure 2.1.
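Numerically, dx_I is just a determinant of selected rows, so the definition is easy to model. The following numpy sketch (0-based indices and our own function name, unlike the text's 1-based notation) illustrates both the definition and the alternating property.

```python
import numpy as np

# A minimal numerical model of dx_I: dx_I(v_1, ..., v_k) is the determinant
# of the k x k matrix formed by rows i_1, ..., i_k of the n x k matrix whose
# columns are v_1, ..., v_k.
def dx(I, *vectors):
    M = np.column_stack(vectors)          # n x k matrix [v_1 ... v_k]
    return np.linalg.det(M[list(I), :])   # select rows i_1, ..., i_k

v = np.array([1.0, 2.0, 3.0])
w = np.array([0.0, 1.0, 4.0])
# With 0-based indices, dx((1, 2), v, w) is the signed area of the projection
# of the parallelogram spanned by v, w onto the x2x3-plane:
# det [[2, 1], [3, 4]] = 5.
print(dx((1, 2), v, w))   # approximately 5.0
print(dx((2, 1), v, w))   # approximately -5.0: swapping indices flips the sign
```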
Generalizing the cross product of vectors in R³ (see Exercise 3), we define the product
of these alternating multilinear functions, as follows. If I and J are ordered k- and ℓ-tuples,
respectively, we define

dx_I ∧ dx_J = dx_{(i₁, …, i_k, j₁, …, j_ℓ)}.
► EXAMPLE 3
► EXAMPLE 4
Suppose ω = a₁dx₁ + a₂dx₂ and η = b₁dx₁ + b₂dx₂ ∈ Λ¹(R²)* = (R²)*. Then let’s compute
ω ∧ η ∈ Λ²(R²)*:

ω ∧ η = (a₁dx₁ + a₂dx₂) ∧ (b₁dx₁ + b₂dx₂) = (a₁b₂ − a₂b₁) dx₁ ∧ dx₂.

Of course, it should not be altogether surprising that the determinant of the coefficient matrix
| a₁  a₂ |
| b₁  b₂ |
has emerged here.
Proof Properties (1) and (3) are obvious from the definition. For (2), we observe
that to change the ordered (k + ℓ)-tuple (i₁, …, i_k, j₁, …, j_ℓ) to the ordered (k + ℓ)-
tuple (j₁, …, j_ℓ, i₁, …, i_k) requires kℓ exchanges: To move j₁ past i₁, …, i_k requires k
exchanges, to move j₂ past i₁, …, i_k requires k more, and so on. ■
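The exchange count in this proof can be illustrated with a small sketch that represents a basis form by its index tuple and counts the swaps needed to sort a concatenation (the helper names below are ours):

```python
# Sketch of the sign rule dx_I ^ dx_J = (-1)^(k*l) dx_J ^ dx_I: a basis
# k-form is represented by its index tuple; sorting the concatenated tuple
# by adjacent exchanges yields the sign (0 if an index repeats).
def sort_sign(indices):
    idx, sign = list(indices), 1
    for i in range(len(idx)):             # bubble sort, counting swaps
        for j in range(len(idx) - 1):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    if len(set(idx)) < len(idx):
        return 0, tuple(idx)              # repeated index: the form is 0
    return sign, tuple(idx)

def wedge(I, J):
    return sort_sign(I + J)

k, l = 2, 3
I, J = (1, 3), (2, 4, 5)
sI, _ = wedge(I, J)
sJ, _ = wedge(J, I)
print(sI == (-1) ** (k * l) * sJ)  # True: anticommutation up to (-1)^(k*l)
```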
Now that we’ve established associativity, we can make the crucial observation that

dx_I = dx_{i₁} ∧ dx_{i₂} ∧ ⋯ ∧ dx_{i_k}.

As has been our custom throughout this text, when we work in R³, it is often more convenient
to write x, y, z for x₁, x₂, x₃.
A (differential) n-form on Rⁿ is an expression

ω = f(x) dx₁ ∧ ⋯ ∧ dx_n

for some smooth function f. As we shall soon see, these (rather than functions) are precisely
what it makes sense to integrate over regions in Rⁿ. More generally, a (differential) k-form on Rⁿ is an
expression

ω = Σ_{I increasing} f_I(x) dx_I

for smooth functions f_I.
Determinants (and hence volume) are already built into the structure of k-forms. As
the name “differential form” suggests, their substantial power comes, however, from our
ability to differentiate them. We begin with the case of a 0-form, i.e., a smooth function
f: U → R. Then for any x ∈ U we want df(x) = Df(x) as a linear map on Rⁿ. In other
words, we have

df = Σ_{j=1}^n (∂f/∂x_j) dx_j.
In particular, note that if we take f to be the i-th coordinate function, then df = dx_i and
dx_i(v) = Dx_i(v) = v_i, so this explains (in part) our original choice of notation. If ω =
Σ_I f_I(x) dx_I is a k-form, then we define

dω = Σ_I df_I ∧ dx_I = Σ_I Σ_{j=1}^n (∂f_I/∂x_j) dx_j ∧ dx_{i₁} ∧ ⋯ ∧ dx_{i_k}.

(Note that for a fixed k-tuple I, only the terms dx_j with j different from i₁, …, i_k will
appear.)
► EXAMPLE 5
e. Let ω = x₁ dx₂ + x₃ dx₄ + x₅ dx₆ ∈ Λ¹(R⁶). Then dω = dx₁ ∧ dx₂ + dx₃ ∧ dx₄ +
dx₅ ∧ dx₆.
f. Let ω = (x² + e^{yz}) dy ∧ dz + (y² + sin(x³z)) dz ∧ dx + (z² + arctan(x² + y²)) dx ∧ dy
∈ Λ²(R³). Then

dω = 2x dx ∧ dy ∧ dz + 2y dy ∧ dz ∧ dx + 2z dz ∧ dx ∧ dy
   = 2(x + y + z) dx ∧ dy ∧ dz. ◄
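Part f can be spot-checked with sympy: for a 2-form P dy∧dz + Q dz∧dx + R dx∧dy on R³, the coefficient of dω is ∂P/∂x + ∂Q/∂y + ∂R/∂z.

```python
import sympy as sp

# Check of Example 5f: the exterior derivative of
# P dy^dz + Q dz^dx + R dx^dy is (P_x + Q_y + R_z) dx^dy^dz.
x, y, z = sp.symbols('x y z')
P = x**2 + sp.exp(y * z)
Q = y**2 + sp.sin(x**3 * z)
R = z**2 + sp.atan(x**2 + y**2)

coeff = sp.diff(P, x) + sp.diff(Q, y) + sp.diff(R, z)
print(sp.simplify(coeff))  # 2*x + 2*y + 2*z
```

Note how the exponential, sine, and arctangent terms drop out: each is differentiated with respect to a variable its wedge factor already contains.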
The operator d, called the exterior derivative, enjoys the following properties.
Proof Properties (1) and (2) are immediate; indeed, (2) is a consequence of (3). To
prove (3), we note that because d commutes with sums, it suffices to consider the case that
ω = f dx_I and η = g dx_J. Then, since the product rule gives d(fg) = g df + f dg, we
have

d(ω ∧ η) = d(fg) ∧ dx_I ∧ dx_J = (g df + f dg) ∧ dx_I ∧ dx_J
         = (df ∧ dx_I) ∧ (g dx_J) + (−1)ᵏ (f dx_I) ∧ (dg ∧ dx_J)
         = dω ∧ η + (−1)ᵏ ω ∧ dη,

the sign (−1)ᵏ arising from moving the 1-form dg past the k-form dx_I. Finally, to prove
that d(dω) = 0, it again suffices to consider ω = f dx_I. Then

dω = Σ_{j=1}^n (∂f/∂x_j) dx_j ∧ dx_I

and

(∗)   d(dω) = Σ_{i=1}^n Σ_{j=1}^n (∂²f/∂x_i∂x_j) dx_i ∧ dx_j ∧ dx_I.

Since dx_i ∧ dx_j = −dx_j ∧ dx_i, we can rewrite the right-hand side of (∗) as

Σ_{i<j} (∂²f/∂x_i∂x_j − ∂²f/∂x_j∂x_i) dx_i ∧ dx_j ∧ dx_I,

which vanishes because the mixed partial derivatives of the smooth function f are equal. ■
2.3 Pullback
All the algebraic and differential structure inherent in differential forms endows them with
a very natural behavior under mappings. The main point is to generalize the procedure of
“integration by substitution,” familiar to all calculus students: When confronted with the
integral ∫_a^b f(g(u)) g′(u) du, we substitute x = g(u), formally write dx = g′(u) du, and
say ∫_a^b f(g(u)) g′(u) du = ∫_{g(a)}^{g(b)} f(x) dx. The proof that this works is, of course, the chain
rule. Now we put this procedure in the proper setting.
For a 0-form f, we define g*f = f∘g.
► EXAMPLE 6
Then g*dx = −sin t dt and g*dy = cos t dt, so g*(−y dx + x dy) = (−sin t)(−sin t dt) +
(cos t)(cos t dt) = dt.
c. Let g: R² → R² be given by

g(u, v) = (u cos v, u sin v).
If ω = x dx + y dy, then

g*ω = (u cos v)(cos v du − u sin v dv) + (u sin v)(sin v du + u cos v dv)
    = u(cos²v + sin²v) du + u²(−cos v sin v + sin v cos v) dv = u du.

Moreover,

g*(dx ∧ dy) = g*dx ∧ g*dy = (cos v du − u sin v dv) ∧ (sin v du + u cos v dv)
            = u(cos²v + sin²v) du ∧ dv = u du ∧ dv,

consistent with the fact that det Dg(u, v) = u.
d. Let g: R² → R³ be given by

g(u, v) = (u cos v, u sin v, v).

Then g*dx = cos v du − u sin v dv, g*dy = sin v du + u cos v dv, and g*dz = dv, and so
g*(dx ∧ dz) = cos v du ∧ dv and g*(dy ∧ dz) = sin v du ∧ dv.
Therefore, if ω = (x² + y²) dx ∧ dy + x dx ∧ dz + y dy ∧ dz, then we have

g*ω = u²(u du ∧ dv) + (u cos v)(cos v du ∧ dv) + (u sin v)(sin v du ∧ dv)
    = u(u² + 1) du ∧ dv. ◄
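These pullback computations are mechanical substitutions, so they can be checked with sympy by treating du and dv as formal symbols. The sketch below (our own setup) redoes two of the computations for the map (u, v) ↦ (u cos v, u sin v).

```python
import sympy as sp

# Pullback by direct substitution: x = u cos v, y = u sin v, with
# dx = x_u du + x_v dv and dy = y_u du + y_v dv, du and dv formal symbols.
u, v, du, dv = sp.symbols('u v du dv')
X, Y = u * sp.cos(v), u * sp.sin(v)
dX = sp.diff(X, u) * du + sp.diff(X, v) * dv
dY = sp.diff(Y, u) * du + sp.diff(Y, v) * dv

# g*(x dx + y dy) should be u du.
omega = sp.simplify(X * dX + Y * dY)
print(sp.simplify(omega - u * du))  # 0

# g*(dx ^ dy): the du^dv coefficient is x_u y_v - x_v y_u = u.
coeff = sp.simplify(sp.diff(X, u) * sp.diff(Y, v)
                    - sp.diff(X, v) * sp.diff(Y, u))
print(coeff)  # u
```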
In general, if g: U → Rⁿ is smooth on an open set U ⊂ Rᵐ and ω = Σ_I f_I dx_I is a k-form on Rⁿ, then

g*ω = Σ_I (f_I∘g) dg_{i₁} ∧ ⋯ ∧ dg_{i_k}
    = Σ_I Σ_J (f_I∘g) (∂g_{i₁}/∂u_{j₁}) ⋯ (∂g_{i_k}/∂u_{j_k}) du_{j₁} ∧ ⋯ ∧ du_{j_k},

the inner sum being over ordered k-tuples J = (j₁, …, j_k).
We need one last technical result before we turn to integrating.
Proposition 2.4 Let U ⊂ Rᵐ be open, and let g: U → Rⁿ be smooth. If ω ∈ Λᵏ(Rⁿ),
then

g*(dω) = d(g*ω).
Proof The statement for k = 0 is the chain rule (Theorem 3.2 of Chapter 3):

d(g*f) = d(f∘g) = Σ_{j=1}^m (∂(f∘g)/∂u_j) du_j = Σ_{i=1}^n ((∂f/∂x_i)∘g) dg_i = g*(df).
Since the pullback of a wedge product is the wedge product of the pullbacks, we infer
that g*(dx_I) = dg_{i₁} ∧ ⋯ ∧ dg_{i_k}. Because d and pullback are linear, it suffices to prove the result for
ω = f dx_I. We have

g*(dω) = g*(df ∧ dx_I) = d(f∘g) ∧ dg_{i₁} ∧ ⋯ ∧ dg_{i_k} = d((f∘g) dg_{i₁} ∧ ⋯ ∧ dg_{i_k}) = d(g*ω).

(Notice that at the penultimate step we use the rule for differentiating the wedge product
and the fact that d(dg_i) = 0.) ■
Proposition 2.5 Let Ω ⊂ Rⁿ be a region, and let g: Ω → Rⁿ be smooth and one-
to-one, with det(Dg) > 0. Then for any n-form ω = f dx₁ ∧ ⋯ ∧ dx_n on S = g(Ω), we
have

∫_S ω = ∫_Ω g*ω.

Now suppose Ω ⊂ Rᵏ is a region and g: Ω → Rⁿ is smooth and one-to-one. (As before,
we may allow the hypotheses to fail on a
set of volume 0, but we won’t bother with this now.) We say that M = g(Ω) ⊂ Rⁿ is a
parametrized k-dimensional manifold. If ω is a k-form on Rⁿ, we define
∫_M ω = ∫_Ω g*ω.
If g₁: Ω₁ → Rⁿ and g₂: Ω₂ → Rⁿ are two parametrizations of the same k-manifold M,
then, provided det D(g₂⁻¹∘g₁) > 0 (which, as we shall soon see, means that g₁ and g₂
parametrize M with the same orientation),

∫_{Ω₁} g₁*ω = ∫_{Ω₂} g₂*ω.

That is, the integral of ω over the (oriented) parametrized manifold M is well defined.
► EXERCISES 8.2
1. Prove that as I ranges over all increasing k-tuples, the dx_I form a linearly independent set in
Λᵏ(Rⁿ)*. Also check that for any T ∈ Λᵏ(Rⁿ)*, T = Σ_{I increasing} a_I dx_I, where a_I = T(e_{i₁}, …, e_{i_k}).
2. (a) Suppose ω ∈ Λᵏ(Rⁿ)* and k is odd. Prove that ω ∧ ω = 0.
(b) Give an example to show that the result of part a need not hold when k is even.
3. Suppose v, w ∈ R³. Show that dx(v × w) = dy ∧ dz(v, w), dy(v × w) = dz ∧ dx(v, w), and
dz(v × w) = dx ∧ dy(v, w).
4. Simplify the following expressions:
*(a) (2dx + 3dy + 4dz) ∧ (dx − dy + 2dz)
(b) (dx + dy − dz) ∧ (dx + 2dy + dz) ∧ (dx − 2dy + dz)
*(c) (2dx ∧ dy + dy ∧ dz) ∧ (3dx − dy + 4dz)
(d) (dx₁ ∧ dx₂ + dx₃ ∧ dx₄) ∧ (dx₁ ∧ dx₃ + dx₃ ∧ dx₄)
(e) (dx₁ ∧ dx₂ + dx₃ ∧ dx₄ + dx₅ ∧ dx₆) ∧ (dx₁ ∧ dx₂ + dx₃ ∧ dx₄ + dx₅ ∧ dx₆) ∧
(dx₁ ∧ dx₂ + dx₃ ∧ dx₄ + dx₅ ∧ dx₆)
*5. Let n ∈ R³ be a unit vector, and let v and w be orthogonal to n. Let

φ = n₁ dy ∧ dz + n₂ dz ∧ dx + n₃ dx ∧ dy.

Prove that φ(v, w) is equal to the signed area of the parallelogram spanned by v and w (the sign being
determined by whether n, v, w form a right-handed system for R³).
*6. Calculate the exterior derivatives of the following differential forms:
(a) ω = e^{xy} dx
(b) ω = z² dx + x² dy + y² dz
(c) ω = x² dy ∧ dz + y² dz ∧ dx + z² dx ∧ dy
(d) ω = x₁x₂ dx₃ ∧ dx₄
*7. Can there be a function f so that df is the given 1-form ω (everywhere ω is defined)? If so, can
you find f?
(a) ω = −y dx + x dy
(b) ω = 2xy dx + x² dy
(c) ω = y dx + z dy + x dz
(d) ω = (x² + yz) dx + (xz + cos y) dy + (z + xy) dz
(e) ω = −y/(x² + y²) dx + x/(x² + y²) dy
(f) ω = x/(x² + y²) dx + y/(x² + y²) dy
8. For each of the following k-forms ω, can there be a (k − 1)-form η (defined wherever ω is) so that
dη = ω?
(a) ω = dx ∧ dy
(b) ω = x dx ∧ dy
(c) ω = z dx ∧ dy
(d) ω = z dx ∧ dy + y dx ∧ dz + z dy ∧ dz
(e) ω = x dy ∧ dz + y dx ∧ dz + z dx ∧ dy
(f) ω = (x² + y² + z²)⁻¹ (x dy ∧ dz + y dz ∧ dx + z dx ∧ dy)
(g) ω = x₅ dx₁ ∧ dx₂ ∧ dx₃ ∧ dx₄ + x₁ dx₂ ∧ dx₄ ∧ dx₃ ∧ dx₅
*9. (The Star Operator)
(a) Define ★: Λ¹(R²) → Λ¹(R²) by ★dx = dy and ★dy = −dx, extending by linearity. If f is a
smooth function, show that

d★(df) = (∂²f/∂x² + ∂²f/∂y²) dx ∧ dy.

(b) Define ★: Λ¹(R³) → Λ²(R³) by ★dx = dy ∧ dz, ★dy = dz ∧ dx, and ★dz = dx ∧ dy, extending
by linearity. If f is a smooth function, show that

d★(df) = (∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z²) dx ∧ dy ∧ dz.

(Note that we can generalize the definition of the star operator by declaring that, in Rⁿ, ★ of a
basis 1-form φ = dx_i is the “complementary” (n − 1)-form, subject to the sign requirement that
φ ∧ ★φ = dx₁ ∧ ⋯ ∧ dx_n.)
10. Suppose ω ∈ Λ¹(Rⁿ) and there is a nowhere-zero function λ so that λω is the exterior derivative
of some function f. Prove that ω ∧ dω = 0. (This problem gives a useful criterion for deciding
whether the differential equation ω = 0 has an integrating factor λ.)
11. In each case, calculate the pullback g*ω and simplify your answer as much as possible.
(a) g: (−π/2, π/2) → R, g(u) = sin u, ω = dx/√(1 − x²)
*(b) g: R → R², g(v) = (3 cos 2v, 3 sin 2v), ω = −y dx + x dy
(c) g: R² → R², g(u, v) = (3u cos 2v, 3u sin 2v), ω = −y dx + x dy
(d) g: R² → R³, g(u, v) = (cos u, sin u, cos v), ω = z dx + x dy + y dz
(f) g: R² → R⁴, g(u, v) = (cos u, sin v, sin u, cos v), ω = x₂ dx₁ + x₃ dx₄
(g) g: R² → R⁴, g(u, v) = (cos u, sin v, sin u, cos v), ω = x₁ dx₂ − x₂ dx₄
(h) g: R² → R⁴, g(u, v) = (cos u, sin v, sin u, cos v), ω = (−x₂ dx₁ + x₁ dx₂) ∧ (−x₄ dx₃ + x₃ dx₄)
12. For each part of Exercise 11, calculate g*(dω) and d(g*ω) and compare your answers.
13. Let g: (0, ∞) × (0, π) × (0, 2π) → R³ be the usual spherical coordinates mapping, given on
p. 294. Compute g*(dx ∧ dy ∧ dz).
*14. We say a k-form ω is closed if dω = 0 and exact if ω = dη for some (k − 1)-form η.
(a) Prove that an exact form is closed. Is every closed form exact? (Hint: Work with Example 5d.)
(b) Prove that if ω and φ are closed, then ω ∧ φ is closed.
(c) Prove that if ω is exact and φ is closed, then ω ∧ φ is exact.
15. Suppose k < n. Let ω₁, …, ω_k ∈ (Rⁿ)* and suppose that Σ_{i=1}^k dx_i ∧ ω_i = 0. Prove that there are
scalars a_ij such that a_ij = a_ji and ω_i = Σ_{j=1}^k a_ij dx_j.
16. Suppose h: Rˡ → Rᵐ and g: Rᵐ → Rⁿ are smooth. Prove that (g∘h)* = h*∘g*. (Hint: It suffices to prove (g∘h)*dx_i =
h*(g*dx_i). Why?)
17. (a) Suppose I = (i₁, …, i_n) is an ordered n-tuple with I< = (1, 2, …, n). Then we can define
a permutation σ of the numbers 1, …, n by σ(j) = i_j, j = 1, …, n. Show that

dx_I = sign(σ) dx₁ ∧ ⋯ ∧ dx_n.

(b) Suppose ω_i = Σ_{j=1}^n a_ij dx_j, i = 1, …, n, are 1-forms on Rⁿ. Use Proposition 5.18 of Chapter 7 to
prove that ω₁ ∧ ⋯ ∧ ω_n = (det A) dx₁ ∧ ⋯ ∧ dx_n.
(c) Suppose g: Rⁿ → Rⁿ is smooth. Show that dg₁ ∧ ⋯ ∧ dg_n = det(Dg) dx₁ ∧ ⋯ ∧ dx_n.
18. Suppose φ₁, …, φ_k ∈ (Rⁿ)* and v₁, …, v_k ∈ Rⁿ. Prove that

φ₁ ∧ ⋯ ∧ φ_k(v₁, …, v_k) = det [φ_i(v_j)].

(Hints: First of all, it suffices to check this holds when the v_j are standard basis vectors. Why? Write
out the φ_i as linear combinations of the dx_j, φ_i = Σ_{j=1}^n a_ij dx_j, and show that both sides of the desired
equality are

det | a_{1j₁}  ⋯  a_{1j_k} |
    |    ⋮           ⋮     |
    | a_{kj₁}  ⋯  a_{kj_k} |

when we take v₁ = e_{j₁}, …, v_k = e_{j_k}.)
19. Suppose U ⊂ Rᵐ is open and g: U → Rⁿ is smooth. Prove that for any ω ∈ Λᵏ(Rⁿ) and
v₁, …, v_k ∈ Rᵐ, we have

g*ω(a)(v₁, …, v_k) = ω(g(a))(Dg(a)v₁, …, Dg(a)v_k).

(Hint: Consider ω = dx_I.)
20. Prove that there is a unique linear operator d mapping Λᵏ(U) → Λᵏ⁺¹(U) for all k that satisfies
the properties in Proposition 2.3 and df = Σ_{j=1}^n (∂f/∂x_j) dx_j. (This tells us that, appearances to the contrary
notwithstanding, the exterior derivative d does not depend on our coordinate system.)
► 3 LINE INTEGRALS AND GREEN’S THEOREM

If ω = Σ_{i=1}^n F_i dx_i is a 1-form on Rⁿ and C is the image of the C¹ function g: [a, b] → Rⁿ, we define

∫_C ω = ∫_[a,b] g*ω = ∫_a^b Σ_{i=1}^n F_i(g(t)) g_i′(t) dt.

That is, letting F = (F₁, …, F_n), we have

∫_C ω = ∫_a^b F(g(t)) · g′(t) dt = ∫_C F · T ds,

where ds is classically called the “element of arclength” on C and T is the unit tangent
vector (see Section 5 of Chapter 3). The most general path over which we’ll be integrating
will be a finite union of C¹ paths, as above. In particular, we say the path C is piecewise-C¹
if C = C₁ ∪ ⋯ ∪ C_s, where C_j is the image of the C¹ function g_j: [a_j, b_j] → Rⁿ.
Remark Let C⁻ be the curve given by the parametrization h: [a, b] → Rⁿ, h(u) =
g(a + b − u). Then

∫_[a,b] h*ω = ∫_a^b F(h(u)) · h′(u) du = ∫_a^b F(g(a + b − u)) · (−g′(a + b − u)) du

= −∫_a^b F(g(t)) · g′(t) dt   (substituting t = a + b − u)

= −∫_[a,b] g*ω.

Note that h(a) = g(b) and h(b) = g(a): When we go backward on C, the integral of ω
changes sign. We can think of obtaining C⁻ by reversing the orientation (or direction)
of C.
In comparing C and C~, the unit tangent vector T reverses direction, so that F • T
changes sign but ds does not. That is, the notation notwithstanding, ds is not a 1-form, as
its value on a tangent vector to C is the length of that tangent vector; this, in turn, is not a
linear function of tangent vectors. It would probably be better to write |ds|.
3 Line Integrals and Green’s Theorem 349
► EXAMPLE 1
Let C be the line segment from (1, −1, 0) to (2, 2, 2), and let ω = xy dz. We wish to calculate ∫_C ω. The
segment is parametrized by g(t) = (1 + t, −1 + 3t, 2t), 0 ≤ t ≤ 1.
Then

∫_C ω = ∫_[0,1] g*ω = ∫₀¹ (1 + t)(−1 + 3t)(2 dt) = 2∫₀¹ (3t² + 2t − 1) dt = 2. ◄
► EXAMPLE 2
Let ω = −y dx + x dy. Consider two parametrized curves C₁ and C₂, as shown in Figure 3.1, starting
at (1, 0) and ending at (0, 1), with

g(t) = (cos t, sin t), 0 ≤ t ≤ π/2,   and   h(t) = (1 − t, t), 0 ≤ t ≤ 1.

Then

∫_{C₁} ω = ∫_[0,π/2] g*ω = ∫₀^{π/2} (−sin t)(−sin t dt) + (cos t)(cos t dt) = ∫₀^{π/2} 1 dt = π/2;

∫_{C₂} ω = ∫_[0,1] h*ω = ∫₀¹ (−t)(−dt) + (1 − t)(dt) = ∫₀¹ 1 dt = 1.

Thus, we see that ∫_A^B ω depends not just on the endpoints of the path but also on the particular path
joining them.
Figure 3.1
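The two values in Example 2 can be reproduced numerically (assuming, as in our reading of the example, that the second path is the straight segment from (1, 0) to (0, 1)). The midpoint-rule sketch below, including the function names, is ours.

```python
import math

# Numerical illustration of path dependence for omega = -y dx + x dy:
# integrating from (1, 0) to (0, 1) gives pi/2 along the quarter circle
# but 1 along the straight segment.
def line_integral(path, n=100000):
    """Integrate -y dx + x dy along path: [0, 1] -> R^2 (midpoint rule)."""
    total, h = 0.0, 1.0 / n
    for i in range(n):
        t = (i + 0.5) * h
        x0, y0 = path(t - h / 2)       # left endpoint of subinterval
        x1, y1 = path(t + h / 2)       # right endpoint of subinterval
        x, y = path(t)                 # midpoint value of the integrand
        total += -y * (x1 - x0) + x * (y1 - y0)
    return total

circle = lambda t: (math.cos(math.pi * t / 2), math.sin(math.pi * t / 2))
segment = lambda t: (1 - t, t)

print(round(line_integral(circle), 4))   # 1.5708  (= pi/2)
print(round(line_integral(segment), 4))  # 1.0
```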
Recall from your integral calculus (or introductory physics) class the definition of work
done by a force in displacing an object. When the force and the displacement are parallel,
the definition is

work = (force)(displacement),

and in general only the component of the force vector F in the direction of the displacement
vector d is considered to do work, so

work = F · d.

When an object moves along a curved path in the presence of a variable force field F, the work done is
correspondingly given by ∫_a^b F(g(t)) · g′(t) dt.
Figure 3.2
► EXAMPLE 3
What is the relation between work and energy? As we saw in Section 4 of Chapter 7, the kinetic
energy of a particle with mass m and velocity v is defined to be K.E. = ½m‖v‖². Suppose a particle
with mass m moves along a curve C, its position at time t being given by g(t), t ∈ [a, b]. Then the
work done by the force field F on the particle is given by

work = ∫_C F · T ds = ∫_a^b F(g(t)) · g′(t) dt = ∫_a^b m g″(t) · g′(t) dt   (by Newton’s second law)

= ∫_a^b (m/2)(d/dt)‖g′(t)‖² dt = ½m‖g′(b)‖² − ½m‖g′(a)‖².

That is, assuming F is the only force acting on the particle, the work done in moving it along a path
is the particle’s change in kinetic energy along that path. ◄
Proposition 3.1 Suppose ω = df for some C¹ function f. Then for any path (i.e.,
piecewise-C¹ manifold) C starting at A and ending at B, we have

∫_C ω = f(B) − f(A).
Equivalently, when F = ∇f, we have ∫_C F · T ds = f(B) − f(A).
Proof It follows from Theorem 3.1 of Chapter 6 that any C¹ segment of C is a finite
union of parametrized curves C_j, j = 1, …, s, where C_j is the image of a C¹ function
g_j: [a_j, b_j] → Rⁿ. Let g_j(a_j) = A_j and g_j(b_j) = B_j. We may arrange that A₁ = A,
B_j = A_{j+1}, j = 1, …, s − 1, and B_s = B. It suffices to prove the result for C_j, for then
we will have

∫_C ω = Σ_{j=1}^s ∫_{C_j} ω = Σ_{j=1}^s (f(B_j) − f(A_j)) = f(B) − f(A),

a telescoping sum.
Now, we have

∫_{C_j} ω = ∫_{[a_j,b_j]} g_j* df   (by definition)

= ∫_{[a_j,b_j]} d(f∘g_j)   (by definition of pullback)

= ∫_{a_j}^{b_j} (f∘g_j)′(t) dt = f(g_j(b_j)) − f(g_j(a_j)) = f(B_j) − f(A_j),

as required. Note that the proof amounts merely to applying the standard Fundamental
Theorem of Calculus, along with the definition of line integration by pullback. The fact
that d commutes with pullback, in this instance, is simply the chain rule. ■
Theorem 3.2 Let ω = Σ F_i dx_i be a 1-form (or let F be the corresponding force
field) on an open subset U ⊂ Rⁿ. The following are equivalent:
1. ∮_C ω = 0 for every closed curve C in U;
2. ∫_A^B ω is path-independent in U;
3. ω = df (or F = ∇f) for some potential function f on U.
Remark In light of Example 3, there is no net work done by F around closed paths,
so that kinetic energy is conserved—which is why such force fields are called conservative.
Physicists refer to −f as the potential energy (P.E.). It then follows from Proposition 3.1
that the total energy, K.E. + P.E., is conserved along all curves, for

Δ(K.E.) = work = ∫_C ω = f(B) − f(A) = −Δ(P.E.).
Proof (1) ⇒ (2): If C₁ and C₂ are two paths from A to B, then C = C₁ ∪ C₂⁻ is a
closed curve, as indicated in Figure 3.3(a). Then

0 = ∮_C ω = ∫_{C₁} ω + ∫_{C₂⁻} ω = ∫_{C₁} ω − ∫_{C₂} ω,

so ∫_{C₁} ω = ∫_{C₂} ω.
Figure 3.3
(2) ⇒ (3): (Here we assume any two points of U can be joined by a path. If not,
one must repeat the argument on each connected “piece” of U.) Fix a ∈ U, and define
f: U → R by

f(x) = ∫_a^x ω, where the integral is computed along any path from a to x.
Then

∂f/∂x_i(x) = lim_{h→0} (1/h)[f(x + h e_i) − f(x)] = lim_{h→0} (1/h) ∫_{[x, x+h e_i]} ω
= lim_{h→0} (1/h) ∫₀ʰ F_i(x + t e_i) dt = F_i(x),

computing the integral from x to x + h e_i along the straight line segment. Thus df = ω.
(3) ⇒ (1): This is immediate from Proposition 3.1, since a closed path starts and ends at the same point. ■

When a 1-form is known to be exact, we can often find a potential
function by choosing a convenient path. We illustrate the general principle with some
examples.
► EXAMPLE 4
Let ω = (eˣ + 2xy) dx + (x² + cos y) dy. We show two different ways to calculate a potential
function f, i.e., a function f with df = ω.
a. Take the line segment C joining 0 = (0, 0) and x₀ = (x₀, y₀), as shown in Figure 3.4(a); we
take the obvious parametrization

g(t) = t x₀ = (t x₀, t y₀), 0 ≤ t ≤ 1.

Then

f(x₀, y₀) = ∫_C ω = ∫₀¹ [(e^{t x₀} + 2t² x₀ y₀) x₀ + (t² x₀² + cos(t y₀)) y₀] dt = e^{x₀} − 1 + x₀² y₀ + sin y₀.
b. Now we take the two-step path, as shown in Figure 3.4(b), first varying x and then varying
y, to get from 0 to x₀. That is, we have the two parametrizations g₁(t) = (t x₀, 0) and
g₂(t) = (x₀, t y₀), 0 ≤ t ≤ 1. Then we have

f(x₀, y₀) = ∫₀¹ e^{t x₀} x₀ dt + ∫₀¹ (x₀² + cos(t y₀)) y₀ dt = e^{x₀} − 1 + x₀² y₀ + sin y₀.

c. Finally, we can work with the equations

(∗)   ∂f/∂x = eˣ + 2xy,   ∂f/∂y = x² + cos y

directly. Integrating the former with respect to x, we obtain

(†)   f(x, y) = eˣ + x²y + h(y)

for some arbitrary function h (this is the “constant of integration”). Differentiating (†) with
respect to y and comparing with the latter equation in (∗), we find

∂f/∂y = x² + h′(y) = x² + cos y,

whence h′(y) = cos y and h(y) = sin y + C. Thus, the general potential function is
f(x, y) = eˣ + x²y + sin y + C.
Note that even though it is computationally more clumsy, the approach in (a) requires only that we be
able to draw a line segment from the “base point” (in this case, the origin) to all the other points of our
region. The approaches in (b) and (c) require some further sort of convexity: We must be able to start
at our base point and reach every other point by a path that is first horizontal and then vertical.
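Whichever route one takes, the answer can be confirmed with sympy: the potential f(x, y) = eˣ + x²y + sin y found above satisfies df = ω.

```python
import sympy as sp

# Check that f(x, y) = e^x + x^2 y + sin y satisfies
# df = (e^x + 2xy) dx + (x^2 + cos y) dy.
x, y = sp.symbols('x y')
f = sp.exp(x) + x**2 * y + sp.sin(y)

print(sp.simplify(sp.diff(f, x) - (sp.exp(x) + 2 * x * y)))  # 0
print(sp.simplify(sp.diff(f, y) - (x**2 + sp.cos(y))))       # 0
```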
We now prove a general result along these lines: Suppose an open subset U ⊂ Rⁿ has
the property that for some point a ∈ U, the line segment from a to each and every point
x ∈ U lies entirely in U. (Such a region is called star-shaped with respect to a, as Figure
3.5 suggests.) Then we have:
Proposition 3.3 Let ω be a closed 1-form on a star-shaped region. Then ω is exact.
Figure 3.5
Proof Write ω = Σ F_i dx_i. For any x ∈ U, we can parametrize the line segment
from a to x by

g(t) = a + t(x − a), 0 ≤ t ≤ 1,

and define

f(x) = ∫₀¹ Σ_{j=1}^n F_j(a + t(x − a))(x_j − a_j) dt.

Then we have

∂f/∂x_i(x) = ∫₀¹ F_i(a + t(x − a)) dt + ∫₀¹ t Σ_{j=1}^n (∂F_j/∂x_i)(a + t(x − a))(x_j − a_j) dt
356 ► Chapter 8. Differential Forms and Integration on Manifolds
(using the fact that $\dfrac{\partial F_j}{\partial x_i} = \dfrac{\partial F_i}{\partial x_j}$ since $d\omega = 0$)
$$= \int_0^1 F_i(\mathbf{a}+t(\mathbf{x}-\mathbf{a}))\,dt + \int_0^1 t\,(F_i\circ\mathbf{g})'(t)\,dt \qquad\text{(by the chain rule)}$$
$$= \int_0^1 F_i(\mathbf{a}+t(\mathbf{x}-\mathbf{a}))\,dt + \Big[t\,(F_i\circ\mathbf{g})(t)\Big]_0^1 - \int_0^1 F_i(\mathbf{a}+t(\mathbf{x}-\mathbf{a}))\,dt \qquad\text{(integrating by parts)}$$
$$= F_i(\mathbf{x}).$$
► EXAMPLE 5
Consider the 1-form $\omega = \left(\dfrac{z}{x} + y\right)dx + (x+z)\,dy + (\log x + y + 2z)\,dz$ and the curve $C$ parametrized by
$$\mathbf{g}(t) = \begin{bmatrix} e^t \\ t^6 + 4t^3 - 1 \\ t^4 + (t - t^2)\sin t \end{bmatrix}, \qquad 0 \le t \le 1.$$
We certainly hope that the 1-form <o is exact (or, equivalently, that the corresponding force field
is conservative), for then we can apply the Fundamental Theorem of Calculus for Line Integrals,
Proposition 3.1.
If $\omega$ is to be equal to $df$ for some function $f$, we need to solve
$$\frac{\partial f}{\partial x} = \frac{z}{x} + y, \qquad \frac{\partial f}{\partial y} = x + z, \qquad \frac{\partial f}{\partial z} = \log x + y + 2z.$$
Integrating the first equation, we obtain
$$f(x, y, z) = \int \left(\frac{z}{x} + y\right)dx = z\log x + xy + g(y, z)$$
for some appropriate "constant of integration" $g(y,z)$. Differentiating with respect to $y$ and comparing with the second equation, we find
$$\frac{\partial f}{\partial y} = x + \frac{\partial g}{\partial y} = x + z,$$
and so $\dfrac{\partial g}{\partial y} = z$. Thus, $g(y,z) = yz + h(z)$ for some appropriate "constant of integration" $h(z)$. So
$$f(x, y, z) = z\log x + xy + yz + h(z).$$
Differentiating with respect to $z$ and comparing with the third equation, we find $h'(z) = 2z$, so $h(z) = z^2 + c$ and
$$f(x, y, z) = z\log x + xy + yz + z^2 + c.$$
Now comes the easy part. The curve goes from
$$A = \mathbf{g}(0) = \begin{bmatrix}1\\-1\\0\end{bmatrix} \qquad\text{to}\qquad B = \mathbf{g}(1) = \begin{bmatrix}e\\4\\1\end{bmatrix},$$
and so
$$\int_C \omega = f(B) - f(A) = (4e + 6) - (-1) = 4e + 7. \;\blacktriangleleft$$
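As an aside, since $\omega$ is exact, Proposition 3.1 says that *any* path from $A$ to $B$ yields the same value $f(B)-f(A)$. The sympy sketch below (outside the text; the test path is our own choice, not the book's curve) checks this numerically:

```python
import sympy as sp

t = sp.symbols('t', real=True)
x, y, z = sp.symbols('x y z')

# Potential from Example 5 (additive constant dropped)
f = z*sp.log(x) + x*y + y*z + z**2

# A hypothetical smooth path from A = (1, -1, 0) to B = (e, 4, 1)
gx, gy, gz = sp.exp(t), 5*t - 1, t**2
sub = {x: gx, y: gy, z: gz}

# Pull back omega = f_x dx + f_y dy + f_z dz and integrate over [0, 1]
integrand = (sp.diff(f, x).subs(sub)*sp.diff(gx, t)
             + sp.diff(f, y).subs(sub)*sp.diff(gy, t)
             + sp.diff(f, z).subs(sub)*sp.diff(gz, t))
line_integral = sp.integrate(sp.expand(integrand), (t, 0, 1))

expected = 4*sp.E + 7   # f(B) - f(A)
assert abs(float(line_integral - expected)) < 1e-9
```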
► EXAMPLE 6
Newton’s law of gravitation states that the gravitational force exerted by a point mass M at the origin
on a unit test mass is radial and inverse-square in magnitude:
$$\mathbf{F} = -GM\frac{\mathbf{x}}{\|\mathbf{x}\|^3}.$$
The corresponding 1-form is $\omega = -GM(x^2+y^2+z^2)^{-3/2}(x\,dx + y\,dy + z\,dz)$. Since
$$d\left(\frac{1}{\|\mathbf{x}\|}\right) = -(x^2+y^2+z^2)^{-3/2}(x\,dx + y\,dy + z\,dz)$$
(see Example 1 of Chapter 3, Section 4), it follows immediately that a potential function for the
gravitational field is $f(\mathbf{x}) = GM/\|\mathbf{x}\|$. (Physicists ordinarily choose the constant so that the potential
goes to 0 as x goes to infinity.)
Let's now consider the case of the gravitational field of the earth; note that the gravitational acceleration at the surface of the earth is given by $g = GM/R^2$, where $R$ is the radius of the earth. By Proposition 3.1, the work done (against gravity) to lift a unit test mass from a point $A$ on the surface of the earth to a point $B$ height $h$ units above the surface of the earth is therefore
$$W = \frac{GM}{R} - \frac{GM}{R+h} = \frac{GMh}{R(R+h)} \approx \frac{GM}{R^2}\,h = gh,$$
provided $h$ is quite small compared to $R$. This checks with the standard formula for the potential energy of a mass $m$ at (small) height $h$ above the surface of the earth: P.E. $= mgh$. ◄
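The approximation $W \approx gh$ can be checked by expanding in $h$ (a sympy sketch, outside the text):

```python
import sympy as sp

G, M, R, h = sp.symbols('G M R h', positive=True)

# Work against gravity lifting a unit mass from radius R to R + h,
# computed from the potential f(x) = GM/||x||
W = G*M/R - G*M/(R + h)

# Exact closed form: GMh/(R(R+h))
assert sp.simplify(W - G*M*h/(R*(R + h))) == 0

# Leading behavior for h << R: W = (GM/R^2) h + O(h^2) = g h + O(h^2)
expansion = sp.series(W, h, 0, 3).removeO()
assert sp.simplify(expansion.coeff(h, 1) - G*M/R**2) == 0
```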
$$\int_{\partial R} \omega = \int_R d\omega.$$
Proof Take R = [a, b] x [c, d], as shown in Figure 3.6, and write co = Pdx + Qdy.
Then
$$d\omega = \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dx \wedge dy.$$
Now we merely calculate, using Fubini’s Theorem appropriately:
as required. ■
Figure 3.6
(It is important to understand that both S and d S inherit an orientation from the parametriza
tion g.) ■
► EXAMPLE 7
Suppose $\omega$ is a smooth 1-form on the unit disk $D$ in $\mathbb{R}^2$. Can we infer that $\int_{\partial D} \omega = \int_D d\omega$? The naive answer is "of course," parametrizing by polar coordinates and applying Corollary 3.5. The difficulty that arises is that we only get a bona fide parametrization on $(0,1] \times (0, 2\pi)$. But we can apply Corollary 3.5 on the rectangle $R_{\delta,\varepsilon} = [\delta, 1] \times [\varepsilon, 2\pi]$ when $\delta, \varepsilon > 0$ are small. Let $D_{\delta,\varepsilon} = \mathbf{g}(R_{\delta,\varepsilon})$, as indicated in Figure 3.7. Because $\omega$ is smooth on all of the unit disk, we have
(We leave it to the reader to justify the first and last equalities.) We shall not belabor such details in
the future.
Figure 3.7
More generally, we observe that Green’s Theorem holds for any region S that can be
decomposed as a finite union of parametrized rectangles overlapping only along their edges.
For, as Figure 3.8 illustrates, if $S = \bigcup_{i=1}^k S_i$, then, because the integrals over interior boundary segments cancel in pairs, we have
$$\int_{\partial S} \omega = \sum_{i=1}^k \int_{\partial S_i} \omega = \sum_{i=1}^k \int_{S_i} d\omega = \int_S d\omega.$$
For example, the map
$$\mathbf{g}(r, \theta) = \begin{bmatrix} \dfrac{r\cos\theta}{\cos\theta + \sin\theta} \\[2mm] \dfrac{r\sin\theta}{\cos\theta + \sin\theta} \end{bmatrix}, \qquad 0 \le r \le 1,\quad 0 \le \theta \le \pi/2,$$
maps a rectangle to the triangle with vertices at $\begin{bmatrix}0\\0\end{bmatrix}$, $\begin{bmatrix}1\\0\end{bmatrix}$, and $\begin{bmatrix}0\\1\end{bmatrix}$.
► EXAMPLE 8
We can use Green's Theorem to calculate the area of a planar region $S$ by line integration. Since
$$dx \wedge dy = d(x\,dy) = d(-y\,dx) = d\big(\tfrac12(-y\,dx + x\,dy)\big),$$
we have
$$\operatorname{area}(S) = \int_{\partial S} x\,dy = \int_{\partial S} -y\,dx = \frac12\int_{\partial S} (-y\,dx + x\,dy).$$
$$\int_{\partial R} \omega = \int_R d\omega = 0,$$
and, as the proof of Theorem 3.2 showed, this is sufficient to construct a potential function $f$. ■
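As a quick symbolic check (outside the text), applying the formula $\operatorname{area}(S) = \frac12\oint_{\partial S}(-y\,dx + x\,dy)$ to the standard parametrization of an ellipse recovers the familiar area $\pi ab$:

```python
import sympy as sp

t, a, b = sp.symbols('t a b', positive=True)

# Boundary of the ellipse x^2/a^2 + y^2/b^2 = 1, counterclockwise
x, y = a*sp.cos(t), b*sp.sin(t)

# area(S) = (1/2) * closed integral of (-y dx + x dy)
area = sp.Rational(1, 2)*sp.integrate(
    -y*sp.diff(x, t) + x*sp.diff(y, t), (t, 0, 2*sp.pi))

assert sp.simplify(area - sp.pi*a*b) == 0
print(area)  # pi*a*b
```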
► EXAMPLE 9
Let $\omega = \dfrac{-y}{x^2+y^2}\,dx + \dfrac{x}{x^2+y^2}\,dy$. Then, as we calculated in Example 5d of Section 2, $d\omega = 0$. And yet, letting $C$ be the unit circle, it is easy to check that $\oint_C \omega = 2\pi$. So $\omega$ cannot be exact. We
shall see further instances of this phenomenon in later sections.
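Pulling $\omega$ back to the unit circle $\mathbf{g}(t) = (\cos t, \sin t)$ gives integrand identically $1$, so the claimed value $\oint_C \omega = 2\pi$ is immediate; a sympy sketch (outside the text):

```python
import sympy as sp

t = sp.symbols('t', real=True)
x, y = sp.cos(t), sp.sin(t)

# omega = -y/(x^2+y^2) dx + x/(x^2+y^2) dy pulled back to the unit circle
integrand = sp.simplify(
    (-y/(x**2 + y**2))*sp.diff(x, t) + (x/(x**2 + y**2))*sp.diff(y, t))
assert integrand == 1   # sin^2 t + cos^2 t

assert sp.integrate(integrand, (t, 0, 2*sp.pi)) == 2*sp.pi
print("winding integral is 2*pi")
```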
► EXAMPLE 10
Suppose $C$ is any simple closed curve in the plane that encircles the origin, and let $\Gamma$ be a circle centered at the origin lying in the interior of $C$, as shown in Figure 3.9. Let $S$ be the region lying between $C$ and $\Gamma$. If we orient $C$ and $\Gamma$ counterclockwise, then we have $\partial S = C + \Gamma^-$. Once again, let $\omega = \dfrac{-y}{x^2+y^2}\,dx + \dfrac{x}{x^2+y^2}\,dy$. Then, as in Example 9, we have $d\omega = 0$. But now $\omega$ is smooth everywhere on $S$, and so
$$0 = \int_S d\omega = \int_{\partial S} \omega = \int_C \omega - \int_\Gamma \omega.$$
Figure 3.9
That is,
$$\int_C \omega = \int_\Gamma \omega = 2\pi,$$
and this is true for any simple closed curve $C$ with the origin in its interior. More generally, consider the curves shown in Figure 3.10. Then $\int_C \omega = 2\pi$, $4\pi$, and $0$, respectively, in parts (a), (b), and (c). For reasons we leave to the reader to surmise, for a closed plane curve not passing through the origin, the integer
$$\frac{1}{2\pi}\int_C \frac{-y\,dx + x\,dy}{x^2 + y^2}$$
is called the winding number of $C$ about the origin.
► EXERCISES 8.3
*1. Let $\omega = y\,dx + x\,dy$. Compare and contrast the integrals $\int_C \omega$ for the following parametrized curves $C$. (Be sure to sketch $C$.)
(a) $\mathbf{g}\colon [0,1] \to \mathbb{R}^2$, $\mathbf{g}(t) = \begin{bmatrix} t \\ t \end{bmatrix}$
(b) $\mathbf{g}\colon [0,1] \to \mathbb{R}^2$, $\mathbf{g}(t) = \begin{bmatrix} t \\ t^2 \end{bmatrix}$
(c) $\mathbf{g}\colon [0,1] \to \mathbb{R}^2$, $\mathbf{g}(t) = \begin{bmatrix} 1-t \\ 1-t \end{bmatrix}$
(d) $\mathbf{g}\colon [0,\pi/2] \to \mathbb{R}^2$, $\mathbf{g}(t) = \begin{bmatrix} \cos 2t \\ 1 - \sin 2t \end{bmatrix}$
(e) $\mathbf{g}\colon [0,\pi/4] \to \mathbb{R}^2$, $\mathbf{g}(t) = \begin{bmatrix} \sin 2t \\ 1 - \cos 2t \end{bmatrix}$
(f) $\mathbf{g}\colon [0,\pi/2] \to \mathbb{R}^2$, $\mathbf{g}(t) = \begin{bmatrix} \cos t \\ 1 - \sin t \end{bmatrix}$
(d) $\int_C y\,dx$, where $C$ is the intersection of the unit sphere and the plane $x + y + z = 0$, oriented counterclockwise as viewed from high above the $xy$-plane. (Hint: Find an orthonormal basis for the plane.)
4. Let $C$ be the curve of intersection of the upper hemisphere $x^2 + y^2 + z^2 = 4$, $z \ge 0$, and the cylinder $x^2 + y^2 = 2x$, oriented counterclockwise as viewed from high above the $xy$-plane. Evaluate $\int_C y\,dx + z\,dy + x\,dz$.
5. Let $\omega = \dfrac{x}{x^2+y^2}\,dx + \dfrac{y}{x^2+y^2}\,dy$. If $C$ is an arbitrary path from $\begin{bmatrix}1\\1\end{bmatrix}$ to $\begin{bmatrix}2\\2\end{bmatrix}$ not passing through the origin, calculate $\int_C \omega$.
Calculate $\int_C (3x + y^2 + 2xz)\,dx + (2xy + ze^{yz} + y)\,dy + (x^2 + ye^{yz} + ze^{z^2})\,dz$. (Hint: This problem should involve very little computation.)
9. Let $C$ be any closed curve in the plane. Show that $\oint_C y\,dx = -\oint_C x\,dy$. What is the geometric interpretation of these integrals?
10. Calculate each of the following line integrals $\int_C \omega$ directly and by applying Green's Theorem. (In all cases, $C$ is traversed counterclockwise.)
*14. Let $0 < b < a$. Find the area beneath one arch of the trochoid (as shown in Figure 3.11)
$$\mathbf{g}(t) = \begin{bmatrix} at - b\sin t \\ a - b\cos t \end{bmatrix}, \qquad 0 \le t \le 2\pi.$$
Figure 3.11
15. Find the area of the plane region bounded by the evolute
$$\mathbf{g}(t) = \begin{bmatrix} a(\cos t + t\sin t) \\ a(\sin t - t\cos t) \end{bmatrix}, \qquad 0 \le t \le 2\pi.$$
Figure 3.12
$$\int_C \mathbf{F}\cdot\mathbf{n}\,ds = \int_C \star\omega.$$
This is called the flux of $\mathbf{F}$ across $C$. (See Exercise 8.2.9.) Conclude that when $C = \partial S$, we have
$$\int_C \mathbf{F}\cdot\mathbf{n}\,ds = \int_S \left(\frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y}\right)dA.$$
19. Prove Green's theorem for the annular region $S = \left\{\begin{bmatrix}x\\y\end{bmatrix} : a \le \sqrt{x^2+y^2} \le b\right\}$ pictured in Figure 3.14.
Figure 3.14
20. Give a direct proof of Green's theorem for
(a) a triangle with vertices at $\begin{bmatrix}0\\0\end{bmatrix}$, $\begin{bmatrix}a\\0\end{bmatrix}$, and $\begin{bmatrix}0\\b\end{bmatrix}$,
(b) the region $\left\{\begin{bmatrix}x\\y\end{bmatrix} : a \le x \le b,\; g(x) \le y \le h(x)\right\}$. (Hint: Exercise 7.2.23 will be helpful.)
21. Suppose $C$ is a piecewise $\mathcal{C}^1$ closed curve in $\mathbb{R}^2$ that intersects itself finitely many times and does not pass through the origin. Show that the line integral
$$\frac{1}{2\pi}\int_C \frac{-y\,dx + x\,dy}{x^2 + y^2}$$
is an integer.
22. Suppose $C$ is a piecewise $\mathcal{C}^1$ closed curve in $\mathbb{R}^2$ that intersects itself finitely many times and does not pass through $\begin{bmatrix}1\\0\end{bmatrix}$ or $\begin{bmatrix}-1\\0\end{bmatrix}$. Show that there are integers $m$ and $n$ so that
23. An ant finds himself in the $xy$-plane in the presence of the force field $\mathbf{F} = \begin{bmatrix} y^3 + x^2y \\ 2x^2 - 6xy \end{bmatrix}$. Around what simple closed curve beginning and ending at the origin should he travel counterclockwise (once) in order to maximize the work done on him by $\mathbf{F}$?
24. Suppose $\Omega \subset \mathbb{R}^2$ is a region with the property that every simple closed curve in $\Omega$ bounds a region contained in $\Omega$ that is a finite union of parametrized rectangles. Prove that if $\omega$ is a 1-form on $\Omega$ with $d\omega = 0$, then $\omega$ is exact; i.e., there is a potential function $f$ with $\omega = df$.
25. (a) Suppose there is a current c in a river. Show that if we row at a constant ground speed v > c
directly downstream a certain distance and then directly back upstream to our beginning point, the
time required (ignoring the time to turn around) is always greater than the time it would take with no
current. (This is just an elementary algebra problem.)
(b) Show that the same is true no matter what closed path C we take in the river. (Assume we still
row with ground velocity v, with ||v|| > c constant.) (Hint: Express the time of the trip as a line
integral over C and do some clever estimates. The diagram in Figure 3.15 may help.)
Figure 3.15
26. According to Webster, a planimeter, pictured in Figure 3.16, is “an instrument for measuring the
area of a regular or irregular plane figure by tracing the perimeter of the figure.” As we show a bit
more schematically in Figure 3.17, an arm of fixed length b has one fixed end; to the other is attached
another arm of length a, which is free to rotate. A wheel (for convenience attached slightly off the
near end) turns as the arm rotates about the pivot point. Use Green’s Theorem to explain how the
amount that the wheel rotates tells us the area of the figure.
► EXAMPLE 1
a. Consider $\mathbf{g}(u, v) = \begin{bmatrix} v\cos u \\ v\sin u \\ v \end{bmatrix}$, whose image is a cone.
Figure 4.1
b. Now consider $\mathbf{g}(u, v) = \begin{bmatrix} (a + b\cos v)\cos u \\ (a + b\cos v)\sin u \\ b\sin v \end{bmatrix}$.
If 0 < b < a, the image of g is most of a torus, as pictured in Figure 4.2, the surface of
revolution obtained by rotating a circle of radius b about an axis a units from its center.
Figure 4.2
c. Let $\mathbf{g}(u, v) = \begin{bmatrix} v\cos u \\ v\sin u \\ u \end{bmatrix}$.
This parametrized surface, pictured in part in Figure 4.3, resembles a spiral ramp, and is
officially called a helicoid. ◄
Figure 4.3
$$\int_S \omega = \int_U \mathbf{g}^*\omega$$
(provided the integral exists).
► EXAMPLE 2
Let $\omega = z\,dx \wedge dy$, and let $S$ be the upper unit hemisphere, oriented with outward-pointing normal.
a. Parametrizing $S$ over the unit disk $D$, we take $\mathbf{g}(r, \theta) = \begin{bmatrix} r\cos\theta \\ r\sin\theta \\ \sqrt{1-r^2} \end{bmatrix}$, so that $\mathbf{g}^*\omega = \sqrt{1-r^2}\,r\,dr \wedge d\theta$, and so
$$\int_S \omega = \int_D \mathbf{g}^*\omega = \int_0^{2\pi}\!\!\int_0^1 \sqrt{1-r^2}\,r\,dr\,d\theta = \frac{2\pi}{3}.$$
b. Now consider g: (0, t t /2) x (0,2n) -> R3 given by
sin^cos#
sin</>sin0
cos</>
4 Surface Integrals and Flux ◄ 369
$$\mathbf{g}^*(z\,dx \wedge dy) = \cos\phi\,(\cos\phi\sin\phi\,d\phi \wedge d\theta) = \cos^2\phi\sin\phi\,d\phi \wedge d\theta,$$
and so
$$\int_S \omega = \int_{(0,\pi/2)\times(0,2\pi)} \mathbf{g}^*\omega = \int_0^{2\pi}\!\!\int_0^{\pi/2} \cos^2\phi\sin\phi\,d\phi\,d\theta = \frac{2\pi}{3}.$$
c. Now let's do the lower hemisphere correspondingly in each of these two ways. Parametrizing by the unit disk, we have
$$\mathbf{h}(r, \theta) = \begin{bmatrix} r\cos\theta \\ r\sin\theta \\ -\sqrt{1-r^2} \end{bmatrix}, \qquad\text{and so}\qquad \int_S \omega = \int_D \mathbf{h}^*\omega = -\frac{2\pi}{3}.$$
On the other hand, in spherical coordinates, we have $\mathbf{k}\colon (\pi/2, \pi) \times (0, 2\pi) \to \mathbb{R}^3$ given by the same formula as $\mathbf{g}$ in part b above, and so
$$\int_S \omega = \int_{(\pi/2,\pi)\times(0,2\pi)} \mathbf{k}^*\omega = \frac{2\pi}{3}.$$
What gives?
The answer to the query is very simple. Imagine you were walking around on the unit
sphere with your feet on the surface (your body pointing radially outward, normal to the
sphere). As you look down, you determine that a basis for the tangent plane to the sphere
will be “correctly oriented” if you see a positive (counterclockwise) rotation from the first
vector (u) to the second (v), as pictured in Figure 4.4. We will say that your body is pointing
Figure 4.4
370 ► Chapter 8. Differential Forms and Integration on Manifolds
in the direction of the outward-pointing normal vector to the surface. Note that then $\mathbf{n}$, $\mathbf{u}$, $\mathbf{v}$ form a positively-oriented basis for $\mathbb{R}^3$; i.e.,
$$\det\begin{bmatrix} \mathbf{n} & \mathbf{u} & \mathbf{v} \end{bmatrix} > 0.$$
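The sign discrepancy in Example 2c can be reproduced in a few lines of sympy (an aside, outside the text): the two parametrizations of the lower hemisphere integrate $z\,dx\wedge dy$ to values of opposite sign, because one is orientation-reversing.

```python
import sympy as sp

r, th, phi = sp.symbols('r theta phi', positive=True)

# Lower unit hemisphere, omega = z dx ^ dy, two parametrizations (Example 2c)

# (i) over the unit disk: pullback is -sqrt(1-r^2) * r dr^dtheta
I1 = sp.integrate(-sp.sqrt(1 - r**2)*r, (r, 0, 1), (th, 0, 2*sp.pi))

# (ii) spherical coords on (pi/2, pi) x (0, 2pi): pullback is cos^2(phi) sin(phi)
I2 = sp.integrate(sp.cos(phi)**2*sp.sin(phi),
                  (phi, sp.pi/2, sp.pi), (th, 0, 2*sp.pi))

# Same surface, same 2-form -- opposite signs: the orientations disagree
assert I2 == 2*sp.pi/3
assert sp.simplify(I1 + I2) == 0
```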
Figure 4.5
► EXAMPLE 3
The standard example of a nonorientable surface is the Möbius strip, pictured in Figure 4.6. Observe that if you slide the positive basis $\{\mathbf{u}, \mathbf{v}\}$ once around the strip, it will return with the opposite orientation. Alternatively, if you start with an outward-pointing normal $\mathbf{n}$ and travel once around the Möbius strip, the normal returns pointing in the opposite direction.
Definition If $S$ is an oriented surface, its (oriented) area 2-form $\sigma$ is the 2-form with the property that $\sigma$ assigns to each pair of tangent vectors at a point the signed area of the
Figure 4.6
parallelogram they span. (By signed area we mean the obvious: The pair of tangent vectors form a positively oriented basis if and only if the signed area is positive.)
We claim that if $\mathbf{n}$ is the unit normal of $S$, then
$$\sigma = n_1\,dy \wedge dz + n_2\,dz \wedge dx + n_3\,dx \wedge dy$$
is its area 2-form. This was the point of Exercise 8.2.5, but we give the argument here. If $\mathbf{u}$ and $\mathbf{v}$ are in the tangent plane to $S$, then
$$\sigma(\mathbf{u}, \mathbf{v}) = \det\begin{bmatrix} \mathbf{n} & \mathbf{u} & \mathbf{v} \end{bmatrix}$$
gives the signed volume of the parallelepiped spanned by $\mathbf{n}$, $\mathbf{u}$, and $\mathbf{v}$. Since $\mathbf{n}$ is a unit vector orthogonal to $\mathbf{u}$ and $\mathbf{v}$, this volume is the area of the parallelogram spanned by $\mathbf{u}$ and $\mathbf{v}$; our definition of orientation dictates that the signs agree.
► EXAMPLE 4
Consider the surface of revolution $S$ defined by $z = f(r)$, $0 \le r \le a$, oriented so that its outward-pointing normal has a positive $\mathbf{e}_3$-component. We can parametrize $S$ by
$$\mathbf{g}(r, \theta) = \begin{bmatrix} r\cos\theta \\ r\sin\theta \\ f(r) \end{bmatrix}, \qquad 0 \le r \le a,\quad 0 \le \theta \le 2\pi.$$
Since the vector $\dfrac{\partial\mathbf{g}}{\partial r} \times \dfrac{\partial\mathbf{g}}{\partial\theta}$ has a positive $\mathbf{e}_3$-component, this is an appropriate parametrization. Now,
the unit normal is
$$\mathbf{n} = \frac{1}{\sqrt{1 + f'(r)^2}}\begin{bmatrix} -f'(r)\cos\theta \\ -f'(r)\sin\theta \\ 1 \end{bmatrix},$$
and so
$$\sigma = \frac{1}{\sqrt{1 + f'(r)^2}}\left(-f'(r)\cos\theta\,dy \wedge dz - f'(r)\sin\theta\,dz \wedge dx + dx \wedge dy\right).$$
Pulling back, we have
$$\mathbf{g}^*\sigma = \sqrt{1 + f'(r)^2}\,r\,dr \wedge d\theta.$$
► EXAMPLE 5
Consider the plane $n_1x + n_2y + n_3z = c$ with unit normal $\mathbf{n}$, $n_3 > 0$, parametrized by
$$\mathbf{g}(x, y) = \begin{bmatrix} x \\ y \\ \frac{1}{n_3}(c - n_1x - n_2y) \end{bmatrix}.$$
Then
$$\mathbf{g}^*\sigma = n_1\left(\frac{n_1}{n_3}\,dx \wedge dy\right) + n_2\left(\frac{n_2}{n_3}\,dx \wedge dy\right) + n_3\,dx \wedge dy = \frac{1}{n_3}\,dx \wedge dy.$$
Recall that if $\mathbf{u}$ and $\mathbf{v}$ are two vectors in the plane, then $\sigma(\mathbf{u}, \mathbf{v})$ gives the signed area of the parallelogram they span, whereas $(dx \wedge dy)(\mathbf{u}, \mathbf{v})$ gives the signed area of its projection into the $xy$-plane. As we see from Figure 4.7, the area of the projection is $|n_3| = |\cos\gamma|$ times the area of the original parallelogram, where $\gamma$ is the angle between the plane and the $xy$-plane, so the general theory is compatible with a more intuitive, geometric approach.
Figure 4.7
Given a vector field $\mathbf{F} = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$ on an open subset of $\mathbb{R}^3$, we saw in Section 3 that integrating the 1-form $\omega = F_1\,dx + F_2\,dy + F_3\,dz$ along an oriented curve computes the work done by $\mathbf{F}$ in moving a test particle along that curve. What is the meaning of integrating the corresponding 2-form $\eta = F_1\,dy \wedge dz + F_2\,dz \wedge dx + F_3\,dx \wedge dy$ over an oriented surface $S$? (The observant reader who's worked Exercise 8.2.9 will recognize that $\eta = \star\omega$. See also Exercise 8.3.18.) Well, if $\mathbf{u}$ and $\mathbf{v}$ are tangent to $S$, then
$$\eta(\mathbf{u}, \mathbf{v}) = \det\begin{bmatrix} \mathbf{F} & \mathbf{u} & \mathbf{v} \end{bmatrix} = (\mathbf{F}\cdot\mathbf{n})\,\sigma(\mathbf{u}, \mathbf{v}).$$
That is, $\int_S \eta$ represents the flux of $\mathbf{F}$ outward across $S$, often written $\int_S \mathbf{F}\cdot\mathbf{n}\,dS$. Here $dS$ represents an element of (nonoriented) surface area, just as $ds$ represented the element of (nonoriented) arclength on a curve; in neither case should these be interpreted as the exterior derivative of something.
A physical interpretation is the following: Imagine a fluid in motion (not depending on
time), and let F (x) represent the velocity of the fluid at x multiplied by the density of the fluid
at x. (Note that F points in the direction of the velocity and has units of mass/(area x time).)
Then the mass of fluid that flows across a small area $\Delta S$ of $S$ in a small amount of time $\Delta t$ is approximately
$$\Delta m \approx (\mathbf{F}\cdot\mathbf{n})\,\Delta S\,\Delta t,$$
so that
$$\frac{\Delta m}{\Delta t} \approx \mathbf{F}\cdot\mathbf{n}\,\Delta S.$$
Taking the limit as $\Delta t \to 0$ and summing over the bits of area $\Delta S$, we infer that $\int_S \eta$ represents the rate at which mass is transferred across $S$ by the fluid flow.
► EXAMPLE 6
We wish to find the flux of the vector field $\mathbf{F} = \begin{bmatrix} xz^2 \\ yx^2 \\ zy^2 \end{bmatrix}$ outward across the sphere $S$ of radius $a$ centered at the origin. That is, we wish to find the integral over $S$ of the 2-form $\eta = xz^2\,dy \wedge dz + yx^2\,dz \wedge dx + zy^2\,dx \wedge dy$. Calculating the pullback under the spherical coordinate parametrization $\mathbf{g}\colon (0, \pi) \times (0, 2\pi) \to \mathbb{R}^3$,
$$\mathbf{g}(\phi, \theta) = a\begin{bmatrix} \sin\phi\cos\theta \\ \sin\phi\sin\theta \\ \cos\phi \end{bmatrix},$$
we have
$$\mathbf{g}^*\eta = a^5\big(\sin\phi\cos\theta\cos^2\phi\,(\sin^2\phi\cos\theta) + \sin^3\phi\sin\theta\cos^2\theta\,(\sin^2\phi\sin\theta) + \cos\phi\sin^2\phi\sin^2\theta\,(\sin\phi\cos\phi)\big)\,d\phi \wedge d\theta$$
$$= a^5\big(\sin^3\phi\cos^2\phi + \sin^5\phi\sin^2\theta\cos^2\theta\big)\,d\phi \wedge d\theta,$$
and so
$$\int_S \eta = a^5\int_0^{2\pi}\!\!\int_0^\pi \big(\sin^3\phi\cos^2\phi + \sin^5\phi\sin^2\theta\cos^2\theta\big)\,d\phi\,d\theta = \frac{4\pi a^5}{5}.$$
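The double integral above is easy but tedious by hand; a sympy sketch (outside the text) confirms the value $4\pi a^5/5$:

```python
import sympy as sp

a, phi, th = sp.symbols('a phi theta', positive=True)

# Pullback of eta computed in Example 6 (sphere of radius a)
pullback = a**5*(sp.sin(phi)**3*sp.cos(phi)**2
                 + sp.sin(phi)**5*sp.sin(th)**2*sp.cos(th)**2)

flux = sp.integrate(pullback, (phi, 0, sp.pi), (th, 0, 2*sp.pi))
assert sp.simplify(flux - sp.Rational(4, 5)*sp.pi*a**5) == 0
print(flux)
```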
Figure 4.8
curve and our treatment of change of variables in Chapter 7, we next give a definition of surface area that will work for any parametrized surface. We need the result of Exercise 7.5.22: If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $\mathbb{R}^n$, the area of the parallelogram they span is given by
$$\sqrt{\det\begin{bmatrix} \mathbf{u}\cdot\mathbf{u} & \mathbf{u}\cdot\mathbf{v} \\ \mathbf{v}\cdot\mathbf{u} & \mathbf{v}\cdot\mathbf{v} \end{bmatrix}}.$$
(Here is the sketch of a proof. We may assume $\{\mathbf{u}, \mathbf{v}\}$ is linearly independent, and let $\{\mathbf{v}_3, \ldots, \mathbf{v}_n\}$ be an orthonormal basis for $\mathrm{Span}(\mathbf{u}, \mathbf{v})^\perp$. Then we know that the volume of the $n$-dimensional parallelepiped spanned by $\mathbf{u}, \mathbf{v}, \mathbf{v}_3, \ldots, \mathbf{v}_n$ is the absolute value of the determinant of the matrix
$$A = \begin{bmatrix} \mathbf{u} & \mathbf{v} & \mathbf{v}_3 & \cdots & \mathbf{v}_n \end{bmatrix}.$$
But by our choice of the vectors $\mathbf{v}_3, \ldots, \mathbf{v}_n$, this volume is evidently the area of the parallelogram spanned by $\mathbf{u}$ and $\mathbf{v}$. But by Propositions 5.11 and 5.7 of Chapter 7, we have
$$(\det A)^2 = \det(A^{\mathsf{T}}A) = \det\begin{bmatrix} \mathbf{u}\cdot\mathbf{u} & \mathbf{u}\cdot\mathbf{v} & 0 & \cdots & 0 \\ \mathbf{v}\cdot\mathbf{u} & \mathbf{v}\cdot\mathbf{v} & 0 & \cdots & 0 \\ 0 & 0 & 1 & & \\ \vdots & \vdots & & \ddots & \\ 0 & 0 & & & 1 \end{bmatrix} = \det\begin{bmatrix} \mathbf{u}\cdot\mathbf{u} & \mathbf{u}\cdot\mathbf{v} \\ \mathbf{v}\cdot\mathbf{u} & \mathbf{v}\cdot\mathbf{v} \end{bmatrix}.\big)$$
linear map $D\mathbf{g}(u, v)$, and that, in turn, is $\Delta u\,\Delta v$ times the area of the parallelogram spanned by $\dfrac{\partial\mathbf{g}}{\partial u}$ and $\dfrac{\partial\mathbf{g}}{\partial v}$.
With this motivation, we now make the following
Definition Let $S \subset \mathbb{R}^n$ be a parametrized surface, given by $\mathbf{g}\colon \Omega \to \mathbb{R}^n$, for some region $\Omega \subset \mathbb{R}^2$. Let
$$E = \left\|\frac{\partial\mathbf{g}}{\partial u}\right\|^2, \qquad F = \frac{\partial\mathbf{g}}{\partial u}\cdot\frac{\partial\mathbf{g}}{\partial v}, \qquad G = \left\|\frac{\partial\mathbf{g}}{\partial v}\right\|^2.$$
Then the surface area of $S$ is defined to be
$$\iint_\Omega \sqrt{EG - F^2}\,du\,dv.$$
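To illustrate the definition (an aside, in sympy), computing $E$, $F$, $G$ for the torus of Example 1b gives $\sqrt{EG - F^2} = b(a + b\cos v)$, hence surface area $4\pi^2ab$ (compare Exercise 7 below):

```python
import sympy as sp

u, v, a, b = sp.symbols('u v a b', positive=True)

# Torus of Example 1b
g = sp.Matrix([(a + b*sp.cos(v))*sp.cos(u),
               (a + b*sp.cos(v))*sp.sin(u),
               b*sp.sin(v)])
gu, gv = g.diff(u), g.diff(v)

E, F, G = gu.dot(gu), gu.dot(gv), gv.dot(gv)
assert sp.simplify(E*G - F**2 - b**2*(a + b*sp.cos(v))**2) == 0

# Assuming 0 < b < a, so a + b cos v > 0 and sqrt(EG - F^2) = b(a + b cos v)
area = sp.integrate(b*(a + b*sp.cos(v)), (u, 0, 2*sp.pi), (v, 0, 2*sp.pi))
assert sp.simplify(area - 4*sp.pi**2*a*b) == 0
```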
► EXERCISES 8.4
1. Let $S$ be that portion of the plane $x + 2y + 2z = 4$ lying in the first octant, oriented with outward normal pointing upward. Find
(a) the area of $S$,
(b) $\int_S (x - y + 3z)\,\sigma$,
(c) $\int_S z\,dx \wedge dy + y\,dz \wedge dx + x\,dy \wedge dz$.
2. Find the area of that portion of the cylinder $x^2 + y^2 = a^2$ lying above the $xy$-plane and below the plane $z = y$.
3. Find the area of that portion of the cone $z = \sqrt{2(x^2 + y^2)}$ lying beneath the plane $y + z = 1$.
*4. Find the area of that portion of the cylinder $x^2 + y^2 = 2y$ lying inside the sphere $x^2 + y^2 + z^2 = 4$.
#5. Let $S$ be the sphere of radius $a$ centered at the origin, oriented with normal pointing outward. Evaluate $\int_S x\,dy \wedge dz + y\,dz \wedge dx + z\,dx \wedge dy$ explicitly. What formula do you deduce for the surface area of $S$?
6. Let $S$ be the surface of the unit sphere, and let its area element be $\sigma$.
(a) Calculate $\int_S x^2\sigma$ directly.
(b) Evaluate the integral in part a without doing any calculations. (Hint: Why is $\int_S x^2\sigma = \int_S y^2\sigma = \int_S z^2\sigma$?)
7. Find the surface area of the torus given parametrically in Example 1b.
*8. Find the surface area of that portion of a sphere of radius a lying between two parallel planes
(both intersecting the sphere) a distance h apart.
9. Let $S$ be that portion of the helicoid given parametrically by $\mathbf{g}(u, v) = \begin{bmatrix} u\cos v \\ u\sin v \\ v \end{bmatrix}$.
(a) With the orientation determined by $\mathbf{g}$, decide whether the outward-pointing normal points upward or downward.
(b) If we orient $S$ with the normal pointing upward, compute $\int_S x\,dz \wedge dx$.
10. We can parametrize the unit sphere (except for the north pole) by stereographic projection from the north pole, as indicated in Figure 4.9: $\begin{bmatrix} u \\ v \\ 0 \end{bmatrix}$ is the point where the line through $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ (on the sphere) intersects the plane $z = 0$. Solve for $u$ and $v$; then solve for $\mathbf{g}\begin{pmatrix} u \\ v \end{pmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$. Explain geometrically why stereographic projection is an orientation-reversing parametrization.
Figure 4.9
11. Let $\omega = x\,dy \wedge dz$. Let $S$ be the unit sphere, oriented with outward-pointing normal. Calculate $\int_S \omega$ by parametrizing $S$
(a) by spherical coordinates,
(b) as a union of graphs,
(c) by stereographic projection (see Exercise 10).
12. Let $S$ be the unit upper hemisphere, oriented with outward-pointing normal. Calculate $\int_S z\sigma$ by showing that $z\sigma = dx \wedge dy$ as 2-forms on $S$.
13. Let $S$ be the cylinder $x^2 + y^2 = a^2$, $0 \le z \le h$, oriented with outward-pointing normal. Calculate $\int_S \omega$ for
(a) $\omega = z\,dx \wedge dy$, (b) $\omega = y\,dx \wedge dz$.
*14. Find the moment of inertia about the z-axis of a uniform spherical shell of radius a centered at
the origin.
*15. Find the flux of the vector field $\mathbf{F}(\mathbf{x}) = \mathbf{x}$ outward across the following surfaces (all oriented with outward-pointing normal pointing away from the origin):
(a) the surface of the sphere of radius $a$ centered at the origin,
(b) the surface of the cylinder $x^2 + y^2 = a^2$, $-h \le z \le h$,
(c) the surface of the cylinder $x^2 + y^2 = a^2$, $-h \le z \le h$, together with the two disks, $x^2 + y^2 \le a^2$, $z = \pm h$,
(d) the surface of the cube with vertices at $\begin{bmatrix} \pm 1 \\ \pm 1 \\ \pm 1 \end{bmatrix}$.
16. Find the flux of the vector field $\mathbf{F} = \begin{bmatrix} x^2 \\ y^2 \\ z^2 \end{bmatrix}$ outward across the given surface $S$ (all oriented with outward-pointing normal pointing away from the origin, unless otherwise specified):
(a) $S$ is the sphere of radius $a$ centered at the origin.
(b) $S$ is the upper hemisphere of radius $a$ centered at the origin.
(c) $S$ is the cone $z = \sqrt{x^2 + y^2}$, $0 \le z \le 1$, with outward-pointing normal having a negative $\mathbf{e}_3$-component.
(d) $S$ is the cylinder $x^2 + y^2 = a^2$, $0 \le z \le h$.
(e) $S$ is the cylinder $x^2 + y^2 = a^2$, $0 \le z \le h$, along with the disks $x^2 + y^2 \le a^2$, $z = 0$ and $z = h$.
*17. Calculate the flux of the vector field $\mathbf{F} = \begin{bmatrix} xz \\ yz \\ x^2 + y^2 \end{bmatrix}$ outward across the surface of the paraboloid $S$ given by $z = 4 - x^2 - y^2$, $z \ge 0$ (with outward-pointing normal having positive $\mathbf{e}_3$-component).
*18. Find the flux of the vector field $\mathbf{F}(\mathbf{x}) = \mathbf{x}/\|\mathbf{x}\|^3$ outward across the given surface (oriented with outward-pointing normal pointing away from the origin):
(a) the surface of the sphere of radius $a$ centered at the origin,
(b) the surface of the cylinder $x^2 + y^2 = a^2$, $-h \le z \le h$,
(c) the surface of the cylinder $x^2 + y^2 = a^2$, $-h \le z \le h$, together with the two disks, $x^2 + y^2 \le a^2$, $z = \pm h$,
(d) the surface of the cube with vertices at $\begin{bmatrix} \pm 1 \\ \pm 1 \\ \pm 1 \end{bmatrix}$.
19. Let $S$ be that portion of the cone $z = \sqrt{x^2 + y^2}$ lying inside the sphere $x^2 + y^2 + z^2 = 2ax$ and oriented with normal pointing downward. Calculate $\int_S \omega$ for
(a) $\omega = dx \wedge dy$,
(b) $\omega = \dfrac{x}{z}\,dy \wedge dz + \dfrac{y}{z}\,dz \wedge dx - dx \wedge dy$.
20. Suppose $\mathbf{g}\colon \Omega \to \mathbb{R}^3$ gives a parametrized, oriented surface with unit outward normal $\mathbf{n}$. Let $\mathbf{N} = \dfrac{\partial\mathbf{g}}{\partial u} \times \dfrac{\partial\mathbf{g}}{\partial v}$, so that $\mathbf{n} = \mathbf{N}/\|\mathbf{N}\|$. Check that
$$\mathbf{g}^*(n_1\,dy \wedge dz + n_2\,dz \wedge dx + n_3\,dx \wedge dy) = \|\mathbf{N}\|\,du \wedge dv = \sqrt{EG - F^2}\,du \wedge dv.$$
21. Sketch the parametrized surface $\mathbf{g}\colon [0, 2\pi] \times [-1, 1] \to \mathbb{R}^3$ given by
$$\mathbf{g}(u, v) = \begin{bmatrix} (2 + v\sin\frac{u}{2})\cos u \\ (2 + v\sin\frac{u}{2})\sin u \\ v\cos\frac{u}{2} \end{bmatrix}.$$
Compare $\mathbf{g}^*(dy \wedge dz)$ at $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 2\pi \\ 0 \end{bmatrix}$. Explain.
22. Let
$$X = \left\{\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \in \mathbb{R}^4 : x_1^2 + x_2^2 = 1,\; x_3^2 + x_4^2 = 1\right\}.$$
Orient $X$ so that $dx_2 \wedge dx_4 > 0$ at the point $\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} \in X$. Calculate $\int_X \omega$ for
(a) $\omega = dx_1 \wedge dx_2 + dx_3 \wedge dx_4$,
(b) $\omega = dx_1 \wedge dx_3 + dx_2 \wedge dx_4$,
(c) $\omega = x_2x_4\,dx_1 \wedge dx_3$.
5 Stokes’s Theorem ◄ 379
23. Consider the cylinder $S$ with equation $x^2 + y^2 = 1$, $-1 \le z \le 1$, oriented with unit normal pointing outward. Calculate
(a) $\int_S x\,dy \wedge dz - z\,dx \wedge dy$, (b) $\int_{C_1} xz\,dy$ and $\int_{C_2} xz\,dy$ (see Figure 4.10).
Compare your answers and explain.
Figure 4.10
24. Let $S$ be the hemisphere $x^2 + y^2 + z^2 = a^2$, $z \ge 0$, oriented with unit normal pointing upward. Let $C$ be the boundary curve, $x^2 + y^2 = a^2$, $z = 0$, oriented counterclockwise. Calculate
(a) $\int_S dx \wedge dy + 2z\,dz \wedge dx$, (b) $\int_C x\,dy + z^2\,dx$.
Compare your answers and explain.
25. Construct two Möbius strips out of paper: For each, cut out a long rectangle, and attach the short edges with opposite orientations.
(a) Cut along the center circle of the first strip. What happens? Explain. What happens if you repeat
the process?
(b) Make parallel cuts in the second strip one-third of the way from either edge. What happens?
Explain.
26. Prove or give a counterexample: If S is an orientable surface, then there are exactly two possible
orientations on S.
► 5 STOKES’S THEOREM
We now come to the generalization of Green’s Theorem to higher dimensions. We first stop
to make the official definition of the integral of a differential form over a compact, oriented
manifold. So far we have dealt only with the integrals of 1- and 2-forms over parametrized curves and surfaces, respectively.
We start with a definition. Suppose $\mathbf{g}\colon U \to \mathbb{R}^n$ is a parametrization⁴ with
i. $\mathbf{g}(U) = V = W \cap M$; and
ii. $U$ an open subset either of $\mathbb{R}^k$ or of $\mathbb{R}^k_+ = \{\mathbf{u} \in \mathbb{R}^k : u_k \ge 0\}$.⁵
See Figure 5.1. We say $\mathbf{p}$ is a boundary point of $M$ (written $\mathbf{p} \in \partial M$) if $\mathbf{p} = \mathbf{g}(\mathbf{u})$ for some $\mathbf{u} \in \partial\mathbb{R}^k_+ = \{\mathbf{u} \in \mathbb{R}^k : u_k = 0\}$.
$\mathbf{g}(U)$ is sometimes called a coordinate chart on $M$. A coordinate ball on $M$ is the image of some ball under some parametrization.
As was the case with surfaces, an orientation on a manifold with boundary $M \subset \mathbb{R}^n$ is a continuously varying notion of what a positively oriented basis for the tangent space at each point should be. $M$ has an orientation if and only if we can cover $M$ by coordinate
⁴Recall from Section 3 of Chapter 6 that this means that $\mathbf{g}$ is a one-to-one smooth map from $U$ to $W \cap M$ so that $D\mathbf{g}(\mathbf{u})$ has rank $k$ for every $\mathbf{u} \in U$ and $\mathbf{g}^{-1}\colon W \cap M \to U$ is continuous.
⁵We say $U \subset \mathbb{R}^k_+$ is an open subset of $\mathbb{R}^k_+$ if it is the intersection of $\mathbb{R}^k_+$ with some open subset of $\mathbb{R}^k$.
charts $\mathbf{g}\colon U \to \mathbb{R}^n$ so that $\left\{\dfrac{\partial\mathbf{g}}{\partial u_1}, \ldots, \dfrac{\partial\mathbf{g}}{\partial u_k}\right\}$ is a positive basis for the tangent space of $M$ at each point. We say $M$ is orientable if there is some orientation on $M$.
We leave it to the reader to prove, using Theorem 5.1, that $M$ is orientable if and only if there is a nowhere-vanishing $k$-form on $M$ (see Exercise 23). Then we can make the relevant definitions.
Now we come to the main technical tool that will enable us to define integration on
manifolds.
Proof
Step 1: Define $h\colon \mathbb{R} \to \mathbb{R}$ by
$$h(x) = \begin{cases} e^{-1/x}, & x > 0 \\ 0, & x \le 0. \end{cases}$$
Then $h$ is smooth (in particular, all its derivatives at $0$ are equal to $0$, as we ask the reader to prove in Exercise 25). Set
$$j(x) = \frac{\displaystyle\int_0^x h(t)h(1-t)\,dt}{\displaystyle\int_0^1 h(t)h(1-t)\,dt}$$
and define $\psi\colon \mathbb{R}^k \to \mathbb{R}$ by $\psi(\mathbf{x}) = j(3 - 2\|\mathbf{x}\|)$. Then $\psi$ is a smooth function with $\psi(\mathbf{x}) = 1$ whenever $\|\mathbf{x}\| \le 1$ and $\psi(\mathbf{x}) = 0$ whenever $\|\mathbf{x}\| \ge 3/2$; $\psi$ is often called a bump function. (See Figure 5.2 for the graph for $k = 1$.)
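A numerical sketch of Step 1 (an aside; we take $h(x) = e^{-1/x}$ for $x > 0$, the standard choice the garbled original presumably intends):

```python
import math

def h(x):
    # h(x) = e^(-1/x) for x > 0, and 0 for x <= 0
    return math.exp(-1.0/x) if x > 0 else 0.0

def j(x, n=2000):
    # j(x) = (integral of h(t)h(1-t) over [0,x]) / (same over [0,1]),
    # approximated by the midpoint rule
    def num(b):
        if b <= 0:
            return 0.0
        b = min(b, 1.0)          # the integrand vanishes for t >= 1
        dt = b/n
        return sum(h((i + 0.5)*dt)*h(1 - (i + 0.5)*dt) for i in range(n))*dt
    return num(x)/num(1.0)

def psi(x):
    # the bump function for k = 1
    return j(3 - 2*abs(x))

assert psi(0.5) == 1.0 and psi(1.0) == 1.0   # psi = 1 on |x| <= 1
assert psi(1.6) == 0.0 and psi(2.0) == 0.0   # psi = 0 on |x| >= 3/2
assert 0.0 < psi(1.2) < 1.0                  # smooth transition in between
```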
Step 2: For each point $\mathbf{p} \in M$, choose a coordinate chart whose domain is a ball of radius 2 in $\mathbb{R}^k$ (why can we do so?).⁶ The images of the balls of radius 1 obviously cover all of $M$; indeed, we can choose a sequence (countable number) of $\mathbf{p}$'s so that this is true. (See Exercise 26.) By Exercise 5.1.12, finitely many of these images of balls of radius 1, say, $V_1, \ldots, V_N$, cover all of $M$. Let $\mathbf{g}_i\colon B(\mathbf{0}, 2) \to V_i$ be the respective coordinate charts,
⁶For those $\mathbf{p}$ in the boundary, this will be a half-ball, i.e., the points in the ball with nonnegative $k$th coordinate.
Figure 5.2 (the graphs of $y = h(x)h(1-x)$ and $y = \psi(x)$)
and define
$$\phi_i(\mathbf{p}) = \begin{cases} \psi(\mathbf{g}_i^{-1}(\mathbf{p})), & \mathbf{p} \in V_i \\ 0, & \text{otherwise,} \end{cases} \qquad\text{and}\qquad p_i = \frac{\phi_i}{\sum_j \phi_j}.$$
Note that for each $\mathbf{p} \in M$, we have $\mathbf{p} = \mathbf{g}_j(\mathbf{u})$ for some $j$ and some $\mathbf{u} \in B(\mathbf{0}, 1)$, and hence $\phi_j(\mathbf{p}) = 1$ for some $j$. Thus, the sum is everywhere positive. These functions $p_i$ fulfill the requirements of the theorem. ■
Now it is easy to define the integral. Let $M \subset \mathbb{R}^n$ be a compact, oriented $k$-dimensional manifold (with piecewise-smooth boundary). Let $\omega$ be a $k$-form on $M$.⁷ Let $\{p_i\}$ be a partition of unity, and let $\mathbf{g}_i$ be the corresponding parametrizations, which we may take to be orientation-preserving (how?). Now we set
$$\int_M \omega = \sum_{i=1}^N \int_M p_i\omega = \sum_{i=1}^N \int_{B(\mathbf{0},2)} \mathbf{g}_i^*(p_i\omega).$$
The point is that the form $p_i\omega$ is nonzero only inside the image of the parametrization $\mathbf{g}_i$.
One last technical point. Let $M$ be a $k$-dimensional manifold with boundary, and let $\mathbf{p}$ be a boundary point. The tangent space of $\partial M$ at $\mathbf{p}$ is a $(k-1)$-dimensional subspace of the tangent space of $M$ at $\mathbf{p}$, and its orthogonal complement is 1-dimensional. That 1-dimensional subspace has two possible basis vectors, called the inward- and outward-pointing normal vectors. By definition, if we follow a curve starting at $\mathbf{p}$ whose tangent vector is the inward-pointing normal, we move into $M$, as shown in Figure 5.3. We endow $\partial M$ with an orientation, called the boundary orientation, by saying that the outward normal, $\mathbf{n}$, followed by a positively oriented basis for the tangent space of $\partial M$ should provide a positively oriented basis for the tangent space of $M$. For examples, see Figure 5.4. We ask
⁷We are being a bit casual about what a smooth function or $k$-form on $M$ ought to mean. We might start with something defined on a neighborhood of $M$ in $\mathbb{R}^n$ or, instead, we might just know the pullbacks under coordinate charts are smooth. Because of Theorem 3.1 of Chapter 6, these notions are equivalent. We leave the technical details to a more advanced course. In practice, except for results such as Theorem 5.1, we will usually start with objects defined on $\mathbb{R}^n$ anyhow.
Figure 5.3
Figure 5.4
the reader to check in Exercise 1 that the boundary orientation on $\partial\mathbb{R}^k_+$ is the usual one on $\mathbb{R}^{k-1}$ precisely when $k$ is even.
Stokes's Theorem Let $M$ be a compact, oriented $k$-dimensional manifold with (piecewise-smooth) boundary, and let $\omega$ be a smooth $(k-1)$-form on $M$. Then
$$\int_{\partial M} \omega = \int_M d\omega.$$
(Here $\partial M$ is endowed with the boundary orientation, as described above.)
Remark Note that the usual Fundamental Theorem of Calculus, the Fundamental
Theorem of Calculus for Line Integrals (Proposition 3.1), and Green’s Theorem (Corollary
3.5) are all special cases of this theorem. When we’re orienting the boundary of an oriented
line segment, we assign a + when the outward-pointing normal agrees with the orientation
on the segment, and a — when it disagrees. This is compatible with the signs in
$$\int_a^b f'(t)\,dt = f(b) - f(a).$$
Proof Since both sides of the desired equation are linear in $\omega$, we can (by using a partition of unity) reduce to the case that $\omega$ is zero outside of a compact subset of a single coordinate chart, $\mathbf{g}\colon U \to \mathbb{R}^n$ (where $U$ is open in either $\mathbb{R}^k$ or $\mathbb{R}^k_+$). Then we have
$$\int_U \mathbf{g}^*(d\omega) = \int_U d(\mathbf{g}^*\omega).$$
Write
$$\mathbf{g}^*\omega = \sum_{i=1}^k f_i(\mathbf{x})\,dx_1 \wedge \cdots \wedge \widehat{dx_i} \wedge \cdots \wedge dx_k.$$
Case 1: Suppose $U$ is open in $\mathbb{R}^k$; this means that $\omega = 0$ on $\partial M$, and so we need only show that $\int_M d\omega = \int_U d(\mathbf{g}^*\omega) = 0$. The crucial point is this: Since $\mathbf{g}^*\omega$ is smooth and $0$ outside of a compact subset of $U$, we may choose a rectangle $R$ containing $U$, as shown in Figure 5.5, and extend the functions $f_i$ to functions on all of $R$ by setting them equal to $0$ outside of $U$. Finally, we integrate over $R = [a_1, b_1] \times \cdots \times [a_k, b_k]$:
$$\int_U d(\mathbf{g}^*\omega) = \int_R \sum_{i=1}^k (-1)^{i-1}\frac{\partial f_i}{\partial x_i}\,dx_1 \wedge \cdots \wedge dx_k = \sum_{i=1}^k (-1)^{i-1}\int_R \frac{\partial f_i}{\partial x_i}\,dx_1\,dx_2 \cdots dx_k = 0,$$
since, integrating first with respect to $x_i$, the Fundamental Theorem of Calculus and the vanishing of $f_i$ on the faces $x_i = a_i$ and $x_i = b_i$ make each summand $0$.
Figure 5.5
Case 2: Now comes the more interesting situation. Suppose $U$ is open in $\mathbb{R}^k_+$, and once again we extend the functions $f_i$ to functions on a rectangle $R \subset \mathbb{R}^k_+$ by letting them be $0$ outside of $U$. In this case, the rectangle is of the form $R = [a_1, b_1] \times \cdots \times [a_{k-1}, b_{k-1}] \times [0, b_k]$, as we see in Figure 5.6. Now we have
Figure 5.6
(since all the other integrals vanish for the same reason as in Case 1)
$$= (-1)^k \int_{R \cap \partial\mathbb{R}^k_+} f_k(x_1, \ldots, x_{k-1}, 0)\,dx_1 \cdots dx_{k-1} = \int_{U \cap \partial\mathbb{R}^k_+} \mathbf{g}^*\omega = \int_{\partial M} \omega,$$
as required. Note the crucial sign in the definition of the boundary orientation (see also
Exercise 1). ■
Remark Although we won't take the time to prove it here, Stokes's Theorem is also valid when the boundary, rather than being a manifold itself, is piecewise smooth, e.g., a union of smooth $(k-1)$-dimensional manifolds with boundary intersecting along $(k-2)$-dimensional manifolds. For example, we may take a cube or a solid cylinder, whose boundary is the union of a cylinder and two disks. The theorem also applies to such non-manifolds as a solid cone.
► EXAMPLE 1
Let $C$ be the intersection of the unit sphere $x^2 + y^2 + z^2 = 1$ and the plane $x + 2y + z = 0$, oriented
counterclockwise as viewed from high above the $xy$-plane. We wish to evaluate $\int_C (z - x)\,dx + (x - y)\,dy + (y - z)\,dz$.
We let $\omega = (z - x)\,dx + (x - y)\,dy + (y - z)\,dz$ and $M$ be that portion of the plane $x + 2y + z = 0$ lying inside the unit sphere, oriented so that the outward-pointing normal has a positive $\mathbf{e}_3$-component, as shown in Figure 5.7. Then $\partial M = C$, and by Stokes's Theorem we have
$$\int_C \omega = \int_M d\omega = \int_M dy \wedge dz + dz \wedge dx + dx \wedge dy. \tag{$*$}$$
Parametrizing the plane by projection on the $xy$-plane, we have $M = \mathbf{g}(D)$, where $D$ is the interior of the ellipse $2x^2 + 4xy + 5y^2 = 1$ (why?), and
$$\mathbf{g}(x, y) = \begin{bmatrix} x \\ y \\ -x - 2y \end{bmatrix}, \qquad\text{so}\qquad \int_M d\omega = \int_D \mathbf{g}^*d\omega = \int_D 4\,dx \wedge dy = 4\,\operatorname{area}(D).$$
Figure 5.7
Now, by Exercise 5.4.15 or by techniques we shall learn in Chapter 9, this ellipse has semimajor axis 1 and semiminor axis $1/\sqrt{6}$, so, using the result of Exercise 8.3.12, its area is $\pi/\sqrt{6}$, and the integral is $4\pi/\sqrt{6}$.
Alternatively, applying our discussion of flux in Section 4, we recognize the surface integral in ($*$) as the flux of the constant vector field $\mathbf{F} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ outward across $M$. Since the unit normal of $M$ is $\mathbf{n} = \dfrac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$, we see that
$$\int_M dy \wedge dz + dz \wedge dx + dx \wedge dy = \int_M (\mathbf{F}\cdot\mathbf{n})\,\sigma = \frac{4}{\sqrt{6}}\,\operatorname{area}(M) = \frac{4\pi}{\sqrt{6}},$$
as before.
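A numerical check of Example 1 (an aside, outside the text): parametrize $C$ directly using an orthonormal basis for the plane (the particular basis below is our own choice) and integrate $\omega$; the result matches $4\pi/\sqrt{6} \approx 5.13$.

```python
import math

s2, s3 = math.sqrt(2), math.sqrt(3)
u = (1/s2, 0.0, -1/s2)        # unit vector in the plane x + 2y + z = 0
w = (-1/s3, 1/s3, -1/s3)      # w = n x u, with n = (1,2,1)/sqrt(6)

def c(t):      # parametrization of C (counterclockwise seen from above)
    return tuple(math.cos(t)*u[i] + math.sin(t)*w[i] for i in range(3))

def cdot(t):   # velocity
    return tuple(-math.sin(t)*u[i] + math.cos(t)*w[i] for i in range(3))

def omega(p, v):   # omega = (z-x)dx + (x-y)dy + (y-z)dz evaluated on (p, v)
    x, y, z = p
    return (z - x)*v[0] + (x - y)*v[1] + (y - z)*v[2]

# Midpoint rule; for smooth periodic integrands this converges very fast
n = 5000
dt = 2*math.pi/n
integral = sum(omega(c((i + 0.5)*dt), cdot((i + 0.5)*dt)) for i in range(n))*dt

assert abs(integral - 4*math.pi/math.sqrt(6)) < 1e-9
```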
► EXAMPLE 2
Let $S$ be the sphere $x^2 + y^2 + (z-1)^2 = 1$, oriented in the customary fashion. We wish to evaluate $\int_S \omega$, where $\omega = xz\,dy \wedge dz + yz\,dz \wedge dx + z^2\,dx \wedge dy$. Let $M$ be the compact 3-manifold whose boundary is $S$; i.e., $M = \{\mathbf{x} \in \mathbb{R}^3 : x^2 + y^2 + (z-1)^2 \le 1\}$, oriented by the standard orientation on $\mathbb{R}^3$. We apply Stokes's Theorem to $M$:
$$\int_S \omega = \int_M d\omega = \int_M 4z\,dV = 4\bar{z}\,\mathrm{vol}(M) = 4 \cdot 1 \cdot \frac{4\pi}{3} = \frac{16\pi}{3}.$$
(Recall that z is the z-component of the center of mass of M.) Of course, we could compute the surface
0
integral directly, parametrizing S by, for example, spherical coordinates centered at 0 .
1
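The volume integral can be confirmed symbolically in spherical coordinates centered at (0, 0, 1), as the text suggests:

```python
import sympy as sp

rho, phi, theta = sp.symbols('rho phi theta', nonnegative=True)
# Spherical coordinates centered at (0, 0, 1): z = 1 + rho*cos(phi).
z = 1 + rho * sp.cos(phi)
jac = rho**2 * sp.sin(phi)     # Jacobian of spherical coordinates
I = sp.integrate(4 * z * jac, (rho, 0, 1), (phi, 0, sp.pi), (theta, 0, 2*sp.pi))
print(I)                        # 16*pi/3
```

The rho·cos(phi) term integrates to zero over phi, leaving 4·vol(M) = 16π/3.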
388 ► Chapter 8. Differential Forms and Integration on Manifolds
► EXAMPLE 3
Suppose we wish to calculate the flux of the vector field F = (xz, yz, x² + y²) outward across the surface
of the paraboloid S given by z = 4 − x² − y², z ≥ 0 (with outward-pointing normal having posi-
tive e₃-component). That is, we want to compute the integral of ω = xz dy∧dz + yz dz∧dx +
(x² + y²) dx∧dy. How might we do this with Stokes’s Theorem? If ω were exact, i.e., if ω = dη for
some 1-form η, then we would have ∫_S ω = ∫_{∂S} η; but since dω = 2z dx∧dy∧dz ≠ 0, we know
that ω cannot be exact. What now?
If we attach the disk D = {x² + y² ≤ 4, z = 0} to S, then we have a (piecewise-smooth) closed
surface, which bounds the region M = {0 ≤ z ≤ 4 − x² − y²} ⊂ R³, as shown in Figure 5.8. Then
we have ∂M = S ∪ D⁻ (where by D⁻ we mean the disk with outward-pointing normal given by
−e₃). Applying Stokes’s Theorem, we find

∫_{∂M} ω = ∫_M dω = ∫_M 2z dx∧dy∧dz = ∫_M 2z dV = ∫₀^{2π} ∫₀² ∫₀^{4−r²} 2rz dz dr dθ = (64/3)π.

But we are interested in the integral of ω only over the surface S. Since

∫_{∂M} ω = ∫_S ω − ∫_D ω

(where by D we mean the disk with its usual upward orientation), then we have

∫_S ω = (64/3)π + ∫_D ω = (64/3)π + ∫₀^{2π} ∫₀² r² · r dr dθ = (64/3)π + 8π = (88/3)π.

We leave it to the reader to check this by a direct calculation (see Exercise 8.4.17).
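The direct calculation the example leaves to the reader can be sketched as follows: parametrize S as a graph over the disk of radius 2, take the upward normal N = (2x, 2y, 1) (which already absorbs the area factor), and integrate F·N in polar coordinates:

```python
import sympy as sp

r, th = sp.symbols('r th', nonnegative=True)
x, y = r*sp.cos(th), r*sp.sin(th)
z = 4 - x**2 - y**2
# For the graph z = 4 - x^2 - y^2 the upward normal (times the area factor)
# is N = (-dz/dx, -dz/dy, 1) = (2x, 2y, 1).
F = sp.Matrix([x*z, y*z, x**2 + y**2])
N = sp.Matrix([2*x, 2*y, 1])
I = sp.integrate(sp.expand(F.dot(N)) * r, (r, 0, 2), (th, 0, 2*sp.pi))
print(sp.simplify(I))   # 88*pi/3
```

This agrees with the value 88π/3 obtained via Stokes’s Theorem.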
Figure 5.8
► EXAMPLE 4
We come now to the 3-dimensional analogue of Example 9 of Section 3. It will play a major role in
physical and topological applications in upcoming sections. Consider the 2-form

ω = (x dy∧dz + y dz∧dx + z dx∧dy)/(x² + y² + z²)^{3/2},

which is defined and smooth on R³ − {0}. The astute reader may recognize that on a sphere of radius
a centered at the origin, ω is 1/a² times the area 2-form.
Pulling back by the spherical coordinates parametrization g given on p. 329, with a bit of work we
see that

g*ω = sin φ dφ∧dθ,

which establishes again the geometric interpretation of ω. It is also clear that d(g*ω) = 0; since
det Dg ≠ 0 whenever ρ ≠ 0 and φ ≠ 0, π, it follows that dω = 0. (Of course, it isn’t too hard to
calculate this directly.)
So here we have a 2-form whose integral over any sphere centered at the origin (with outward-
pointing normal) is 4π, and yet, for any ball B centered at the origin, ∫_B dω = 0. What happened to
Stokes’s Theorem? The problem is that ω is not defined, let alone smooth, on all of B.
But there is more to be learned here. If Ω ⊂ R³ is a compact 3-manifold with boundary with
0 ∉ ∂Ω, then we claim that

∫_{∂Ω} ω = 4π if 0 ∈ Ω, and 0 if 0 ∉ Ω,

rather like what happened with the winding number in Example 10 of Section 3. When 0 ∉ Ω, we
know that ω is a (smooth) 2-form on all of Ω, and hence Stokes’s Theorem applies directly to give 0.
When 0 ∈ Ω, however, we choose ε > 0 small enough so that the closed ball B̄(0, ε) ⊂ Ω, and we
let Ω_ε = Ω − B(0, ε), as pictured in Figure 5.9, recalling that ∂Ω_ε = ∂Ω + S_ε⁻. (Here S_ε denotes the
sphere of radius ε centered at 0, with its usual outward orientation.) Then ω is a smooth form defined
on all of Ω_ε, and we have

0 = ∫_{Ω_ε} dω = ∫_{∂Ω} ω − ∫_{S_ε} ω.

Therefore, we have

∫_{∂Ω} ω = ∫_{S_ε} ω = 4π,

as we learned above.
Figure 5.9
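Both claims about ω can be checked by machine: pulling back by the spherical parametrization of a sphere of radius a, the coefficient of dφ∧dθ is sin φ (independent of a), and its integral is 4π:

```python
import sympy as sp

phi, theta, a = sp.symbols('phi theta a', positive=True)
g = a * sp.Matrix([sp.sin(phi)*sp.cos(theta),
                   sp.sin(phi)*sp.sin(theta),
                   sp.cos(phi)])
gp, gt = g.diff(phi), g.diff(theta)
# Pullback of omega = (x dy^dz + y dz^dx + z dx^dy)/|x|^3 under g is
# [g . (g_phi x g_theta) / |g|^3] dphi^dtheta.
norm2 = sp.simplify(g.dot(g))                        # a**2
coeff = sp.simplify(g.dot(gp.cross(gt)) / sp.sqrt(norm2)**3)
print(coeff)                                         # sin(phi), independent of a
I = sp.integrate(coeff, (phi, 0, sp.pi), (theta, 0, 2*sp.pi))
print(I)                                             # 4*pi
```

This is exactly the statement that g*ω = sin φ dφ∧dθ for every radius a.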
► EXERCISES 8.5
*1. Check that the boundary orientation on the boundary of the upper half-space in Rᵏ is (−1)ᵏ times the usual orientation on R^{k−1}.
2. Let C be the intersection of the cylinder x² + y² = 1 and the plane 2x + 3y − z = 1, oriented
counterclockwise as viewed from high above the xy-plane. Evaluate
*3. Compute ∫_C (y − z)dx + (z − x)dy + (x − y)dz, where C is the intersection of the cylinder
x² + y² = a² and the plane x/a + z/b = 1, oriented clockwise as viewed from high above the xy-plane.
4. Let C be the intersection of the sphere x² + y² + z² = 2 and the plane z = 1, oriented counter-
clockwise as viewed from high above the xy-plane. Evaluate
11. Use the result of Exercise 10 to compute ∫_M dω for the given surface M and 1-form ω.
(a) M is the upper hemisphere x² + y² + z² = a², z ≥ 0, oriented with outward-pointing normal
having positive e₃-component; ω = (x³ + 3x²y − y)dx + (y³z + x + x³)dy + (x² + y² + z)dz.
(b) M is that portion of the paraboloid z = x² + y² lying beneath z = 4, oriented with outward-
pointing normal having negative e₃-component; ω = y dx + z dy + x dz.
Figure 5.10
(c) M is the union of the cylinder x² + y² = 1, 0 ≤ z ≤ 2, and the disk x² + y² ≤ 1, z = 0, oriented
so that the normal to the cylindrical portion points radially outward; ω = −y³z dx + x³z dy + x²y² dz.
12. Let M = {x ∈ R⁴ : x₁² + x₂² + x₃² ≤ x₄ ≤ 1}, with the standard orientation inherited from R⁴.
Evaluate ∫_{∂M} ω:
*(a) ω = (x₂²x₃² + x₄) dx₁∧dx₂∧dx₃,
(b) ω = ‖x‖² dx₁∧dx₂∧dx₃.
13. Redo Exercise 8.4.22c by applying Stokes’s Theorem.
14. Suppose f is a smooth function on a compact 3-manifold with boundary M ⊂ R³. At a point of
∂M, let D_n f denote the directional derivative of f in the direction of the unit outward normal. Show
that

∫_{∂M} D_n f dS = ∫_M ∇²f dV,

where ∇²f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z² is the Laplacian of f. (Hint: ∇²f dx∧dy∧dz = d(*df). See
Exercise 8.2.9.)
15. Let S be that portion of the cylinder x² + y² = a² lying above the xy-plane and below the sphere
x² + (y − a)² + z² = 4a². Let C be the intersection of the cylinder and sphere, oriented clockwise
as viewed from high above the xy-plane.
(a) Evaluate ∫_S z dS.
(b) Use your answer to part a to evaluate ∫_C y(z² − 1)dx + x(1 − z²)dy + z² dz.
16. Let S be that portion of the cylinder x² + y² = a² lying above the xy-plane and below the sphere
(x − a)² + y² + z² = 4a². Let C be the intersection of the cylinder and sphere, oriented clockwise
as viewed from high above the xy-plane.
(a) Evaluate ∫_S z² dS.
(b) Use your answer to part a to evaluate ∫_C y(z³ + 1)dx − x(z³ + 1)dy + z dz.
17. Let M = { … } ⊂ R⁴, oriented so that … > 0 on M. Evaluate ∫_M (y₂ − x₁²) dx₂ ∧ dy₁. (Hint: By applying an
appropriate linear transformation, you should be able to recognize M as a torus.)
*18. Let C be the intersection of the sphere x² + y² + z² = 1 and the plane x + y + z = 0, oriented
counterclockwise as viewed from high above the xy-plane. Evaluate

∫_C z³ dx.

(Hint: Give an orthonormal basis for the plane x + y + z = 0, and use polar coordinates.)
19. Let C be the intersection of the sphere x² + y² + z² = 1 and the plane x + y + z = 0, oriented
counterclockwise as viewed from high above the xy-plane. Evaluate
⁸This is an illustration of the use of calibrations, introduced by Reese Harvey and Blaine Lawson in their seminal
paper, Calibrated Geometries, Acta Math. 148 (1982), pp. 47–157.
24. Let M be a compact, orientable k-dimensional manifold (with no boundary), and let ω be a
(k − 1)-form. Show that dω = 0 at some point of M. (Hint: Using Exercise 23, write dω = fσ,
where σ is the volume form of M. Without loss of generality, you may assume M is connected.
Why?)
25. Let h(x) = e^{−1/x} for x > 0 and h(x) = 0 for x ≤ 0. Because exponential functions grow faster at infinity than any
polynomial, it should be plausible that all the derivatives of h at 0 are 0. But give a rigorous proof as
follows:
(a) Let f(x) = e^{−1/x}, x > 0. Prove by induction that the kth derivative of f is given by
f⁽ᵏ⁾(x) = e^{−1/x} pₖ(1/x) for some polynomial pₖ of degree 2k.
(b) Prove by induction that h⁽ᵏ⁾(0) = 0 for all k ≥ 0.
26. Let X C R". Prove that given any collection {KJ of open subsets of R" whose union contains
X, there is a sequence Vai ,Va2,... of these sets whose union contains X. (Hint: Consider all balls
B(q, 1/k) C R" (for some k € N) centered at points q € R” all of whose coordinates are rational.
This collection is countable, i.e., can be arranged in a sequence. Show that we can choose such balls
B(q,, 1/fcJ, i = l,2,..., covering all of X with the additional property that each is contained in
some Vaj.)
► 6 APPLICATIONS TO PHYSICS
6.1 The Dictionary in R³
We have already seen that a vector field in R³ can plausibly be interpreted as either a 1-form
or a 2-form (the former when we are calculating work, the latter when we are calculating
flux), and that for any function f, the 1-form df corresponds to the vector
field ∇f. We want to give the traditional interpretations of the exterior derivative as it acts
on 1- and 2-forms.
Given a 1-form ω = F₁dx₁ + F₂dx₂ + F₃dx₃ ∈ A¹(R³), we have

dω = (∂F₃/∂x₂ − ∂F₂/∂x₃) dx₂∧dx₃ + (∂F₁/∂x₃ − ∂F₃/∂x₁) dx₃∧dx₁ + (∂F₂/∂x₁ − ∂F₁/∂x₂) dx₁∧dx₂.

(We stick to the subscript notation here to make the symmetries as clear as possible.)
Correspondingly, given the vector field F = (F₁, F₂, F₃), we set

curl F = (∂F₃/∂x₂ − ∂F₂/∂x₃, ∂F₁/∂x₃ − ∂F₃/∂x₁, ∂F₂/∂x₁ − ∂F₁/∂x₂).
Note first of all that d² = 0 tells us that curl(∇f) = 0 for every smooth function f.
In somewhat older books one often sees the notation “rot,” rather than “curl”; both terms
suggest that we think of curl F as having something to do with rotation (curling).
Stokes’s Theorem can now be phrased in the following classical form:
Theorem 6.1 (Classical Stokes’s Theorem) Let S ⊂ R³ be a compact, oriented
surface with boundary. Let F be a smooth vector field defined on all of S. Then we have

∫_{∂S} F·T ds = ∫∫_S curl F · n dS.

If we return to our discussion of flux in Section 4 and visualize F as the velocity field
of a fluid, then the line integral ∫_C F·T ds around a closed curve C may be interpreted as
the circulation of F around C, which we might visualize as a measure of the tendency of a
piece of wire in the shape of C to turn (or circulate) when dropped in the fluid. Applying
the theorem with S = D_r, a 2-dimensional disk of radius r centered at a with normal vector
n, and using continuity (see Exercise 7.1.7), we have

curl F(a)·n = lim_{r→0⁺} (1/πr²) ∫_{∂D_r} F·T ds.

In particular, if, as pictured in Figure 6.1, we stick a very small paddlewheel (of radius r)
in the fluid, it will spin the fastest when the axle points in the direction of curl F (and, at
least in the limit, won’t spin at all when the axle is orthogonal to curl F). Indeed, if the
fluid (and hence the paddlewheel) is spinning about an axis with angular speed ν, then
‖curl F‖ = 2ν (see Exercise 1).
Now, given the 2-form

ω = F₁ dx₂∧dx₃ + F₂ dx₃∧dx₁ + F₃ dx₁∧dx₂ ∈ A²(R³)

(which happens to be obtained by applying the star operator, defined in Exercise 8.2.9, to
our original 1-form), then

dω = (∂F₁/∂x₁ + ∂F₂/∂x₂ + ∂F₃/∂x₃) dx₁∧dx₂∧dx₃ = (div F) dx₁∧dx₂∧dx₃,

where we set div F = ∂F₁/∂x₁ + ∂F₂/∂x₂ + ∂F₃/∂x₃.
Figure 6.1
“div” is short for divergence, a term that is apropos, as we shall soon see. In this case,
d² = 0 can be restated as div(curl F) = 0.
Stokes’s Theorem now takes the following form, sometimes called Gauss’s Theorem:

∫∫_{∂M} F·n dS = ∫_M div F dV.
Once again, we get from this a limiting interpretation of the divergence: Applying
Exercise 7.1.7, we find

(*)   div F(a) = lim_{r→0⁺} (1/vol B(a, r)) ∫∫_{∂B(a,r)} F·n dS.

That is, div F(a) is a measure of the flux (per unit volume) outward across very small
spheres centered at a. If that flux is positive, we can visualize a as a source of the field,
with a net divergence of the fluid flow; if the flux is negative, we can visualize a as a sink,
with a net confluence of the fluid. We shall see a beautiful alternative interpretation of the
divergence in Chapter 9.
Given a vector field F (in the context of work) and the corresponding 1-form ω, applying
the star operator introduced in Exercise 8.2.9 gives the 2-form *ω corresponding to the same
vector field F (in the context of flux), and vice versa. That is, when we have an oriented
surface S, the 2-form *ω gives the normal component of F times the area 2-form σ of S. In
particular, if we start with a function f, then on S, *df = (D_n f)σ, where D_n f = ∇f·n
is the directional derivative of f in the normal direction.
We summarize the relation among forms and vector fields, the d operator and gradient,
curl, and divergence in the following table:

   0-forms   functions (scalar fields)
      | d       | grad
   1-forms   vector fields (work)        d² = 0: curl(grad) = 0
      | d       | curl
   2-forms   vector fields (flux)        d² = 0: div(curl) = 0
      | d       | div
   3-forms   functions (scalar fields)
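The two d² = 0 identities recorded here can be verified symbolically; the following sketch (our own, with arbitrary function names) implements grad, curl, and div directly from their coordinate formulas:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = sp.Function('f')(x1, x2, x3)
F = [sp.Function('F%d' % i)(x1, x2, x3) for i in (1, 2, 3)]
X = (x1, x2, x3)

def grad(h):
    return [sp.diff(h, v) for v in X]

def curl(G):
    return [sp.diff(G[2], x2) - sp.diff(G[1], x3),
            sp.diff(G[0], x3) - sp.diff(G[2], x1),
            sp.diff(G[1], x1) - sp.diff(G[0], x2)]

def div(G):
    return sum(sp.diff(G[i], X[i]) for i in range(3))

print(curl(grad(f)))     # [0, 0, 0]  -- curl(grad) = 0
print(div(curl(F)))      # 0          -- div(curl) = 0
```

Both identities reduce to the equality of mixed partial derivatives, which is exactly d² = 0.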
∫∫_S F·n dS = −4πGM if 0 ∈ Ω, and 0 otherwise.

(We must also stipulate that 0 ∉ S for the integral to make sense.) More generally, if F_a is
the gravitational force field due to a point mass at the point a ∉ S, then

∫∫_S F_a·n dS = −4πGM if a ∈ Ω, and 0 otherwise.

If we have point masses M₁, …, M_k at points a₁, …, a_k, then the flux of the resultant
gravitational force F = Σⱼ F_{a_j} outward across the surface S (on which, once again, none of
the points a_j lies) is −4πG times the total mass enclosed by S.
(When x ∈ D, this integral is improper, yet convergent, as can be verified by using spherical
coordinates centered at the point x.) It should come as no surprise, approximating the mass
distribution by a finite set of point masses, that the flux of the resulting gravitational force
F is given by

∫∫_S F·n dS = −4πG ∫_Ω δ dV = −4πGM,

where M is the mass inside S = ∂Ω. This is Gauss’s law.
Using the limiting formula for divergence given in (*) on p. 395, we see that, even if
F isn’t apparently smooth, it is plausible to define div F = −4πGδ.
Now we can determine, as did Newton (following the lines of Example 6 of Chapter
7, Section 4), the gravitational field F inside the earth, assuming (albeit incorrectly) that
the earth is a ball of uniform density. Take the earth to be a ball of radius R centered at the
origin and to have constant density and total mass M. Fix x with ‖x‖ = b < R. First of
all, we have

∫∫_{∂B(0,b)} F·n dS = −4πG(mass of the earth inside B(0, b)) = −4πGM(b³/R³).

Since F is radial and of constant magnitude on this sphere, the flux is −4πb²‖F(x)‖, and
thus we have ‖F(x)‖ = (GM/R³)‖x‖. Since F is radial, we have

F(x) = −(GM/R³)x.

It is often surprising to find that the gravitational force inside the earth is linear in the
distance from the center. Notice that at the earth’s surface, this analysis is in accord with
the inverse-square nature of the field. (See Exercise 2.)
As an amusing application, we calculate the time required to travel in a perfectly
frictionless tunnel inside the earth from one point on the surface to another. We suppose
that we start the trip with zero speed. When the mass is at position x, the component of the
gravitational force acting in the direction of the tunnel is

−‖F‖ sin θ = −(GM/R³) u,

where u is the displacement of the mass from the center of the tunnel (see Figure 6.2). By
Newton’s second law, we have

u″(t) = −(GM/R³) u(t).

Figure 6.2
If we start with the initial conditions u(0) = u₀ and u′(0) = 0, then we have

u(t) = u₀ cos(√(GM/R³) t),

and we see that the mass reaches the opposite end of the tunnel after time π√(R³/GM).
As was pointed out to me my freshman year of college, this is rather less time than many
of our commutes.
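Plugging in standard values for the earth (the numerical constants below are rough reference figures, not from the text), the traversal time π√(R³/GM) comes out to about 42 minutes for any straight tunnel:

```python
import math

G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2 (approximate)
M = 5.972e24    # mass of the earth, kg (approximate)
R = 6.371e6     # radius of the earth, m (approximate)

omega = math.sqrt(G * M / R**3)   # angular frequency of u(t) = u0*cos(omega*t)
T = math.pi / omega               # half a period: time to traverse the tunnel
print(T / 60)                     # about 42 minutes
```

Note that u₀ drops out entirely: the time is independent of where the two endpoints of the tunnel are.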
∫_{∂S} B·T ds = ∫∫_S J·n dS;

i.e., the circulation of the magnetic field around the wire is the flux of the current density
J across the loop.
Let

ω = (E₁dx + E₂dy + E₃dz)∧dt + (B₁dy∧dz + B₂dz∧dx + B₃dx∧dy).

Then

dω = [(∂E₃/∂y − ∂E₂/∂z + ∂B₁/∂t) dy∧dz + (∂E₁/∂z − ∂E₃/∂x + ∂B₂/∂t) dz∧dx
    + (∂E₂/∂x − ∂E₁/∂y + ∂B₃/∂t) dx∧dy] ∧ dt + (∂B₁/∂x + ∂B₂/∂y + ∂B₃/∂z) dx∧dy∧dz,

so the single equation dω = 0 encodes both Faraday’s law, curl E = −∂B/∂t, and the
absence of magnetic monopoles, div B = 0.
Next, let θ be the 2-form dual to ω. (Using the star operator defined in Exercise 8.2.9, one can check that θ = *ω. The subtlety
is that we’re working in space-time, endowed with a Lorentz metric in which the standard
orthonormal basis {e₁, …, e₄} has the property that e₄·e₄ = −1; this introduces a minus
sign so that *(dx∧dt) = −dy∧dz, etc.) Then an analogous calculation gives the remaining
Maxwell equations. There is a 1-form α
so that dα = ω (see Exercise 8.7.12). Of course, α is far from unique; for any function
f, we will have d(α + df) = ω as well. Let β = α + df, where f is a solution of an
inhomogeneous wave equation chosen so that

d(*dβ) = dθ = (J₁dy∧dz + J₂dz∧dx + J₃dx∧dy)∧dt − ρ dx∧dy∧dz.
Writing β in terms of its components, we can check that solving Maxwell’s equations is equivalent to finding A = (A₁, A₂, A₃) and φ
satisfying the inhomogeneous wave equations.
► EXERCISES 8.6
1. Write down the vector field F corresponding to a rotation counterclockwise about an axis in the
direction of the unit vector a with angular speed ν, and check that curl F = 2νa.
2. Using Gauss’s law, show that the gravitational field of a uniform ball outside the ball is that of a
point mass at its center.
3. (Green’s Formulas) Let f, g: Ω → R be smooth functions on a region Ω ⊂ R³. Recall that
D_n g denotes the directional derivative of g in the normal direction.
(a) Prove that

∫_{∂Ω} (D_n g) dS = ∫_Ω ∇²g dV   and   ∫_{∂Ω} f(D_n g) dS = ∫_Ω (f∇²g + ∇f·∇g) dV.

(b) Prove that

∫_{∂Ω} (f D_n g − g D_n f) dS = ∫_Ω (f∇²g − g∇²f) dV.

(c) Deduce the maximum principle for harmonic functions: If f is harmonic on a region Ω, then f
takes on its maximum value on ∂Ω.
6. Let S ⊂ R³ be a closed, oriented surface. Using the formula (†) for the gravitational field F, show
that
(a) the flux of F outward across S is 0 when no points of D lie on or inside S;
(b) the flux of F outward across S is −4πG ∫_D δ dV when all of D lies inside S.
(Hint: Change the order of integration.)
*7. Try to determine which of the vector fields pictured in Figure 6.3 have zero divergence and which
have zero curl. Justify your answers.
8. Let F be a smooth vector field on an open set U ⊂ Rⁿ. A parametrized curve g is a flow line for a
vector field F if g′(t) = F(g(t)) for all t.
(a) Give a vector field with a closed flow line.
(b) Prove that if F is conservative, then it can have no closed flow line (other than a single point).
(c) Prove that if n = 2 and F has a closed flow line C, then div F must equal 0 at some point inside
C. (Hint: See Exercise 8.3.18.)
Figure 6.3
Show that the validity of this equation for all regions Ω is equivalent to the equation of continuity:

div F + ∂δ/∂t = 0.

(Hint: Use Exercise 7.2.20.)
12. Suppose a body Ω ⊂ R³ has (C²) temperature u(x, t) at position x ∈ Ω at time t. Assume that
the heat flow vector is q = −K∇u, where K is a constant (called the heat conductivity of the body);
the flux of q outward across an oriented surface S represents the rate of heat flow across S.
(a) Show that the rate of heat flow across ∂Ω into Ω is T = ∫_Ω K∇²u dV.
(b) Let c denote the heat capacity of the body; the amount of heat required to raise the temperature
of the volume ΔV by ΔT degrees is approximately (cΔT)ΔV; thus, the rate at which the volume
ΔV absorbs heat is c(∂u/∂t)ΔV. Conclude that the rate of heat flow into Ω is T = ∫_Ω c(∂u/∂t) dV.
(c) Deduce that the heat flow within Ω is governed by the partial differential equation c ∂u/∂t = K∇²u.
13. Suppose Ω ⊂ R³ is a region and u: Ω × [0, ∞) → R is a C² solution of the heat equation
∇²u = ∂u/∂t. Suppose u(x, 0) = 0 for all x ∈ Ω and D_n u = 0 on ∂Ω (this means the region is insulated,
so that no heat dissipates). Show that E(t) = 0 for all t ≥ 0. (Hint: Use Exercise 7.2.20.)
(c) Prove that if u₁ and u₂ are two solutions of the heat equation that agree at t = 0 and agree on ∂Ω
for all time t ≥ 0, then they must agree for all time t ≥ 0.
14. Suppose Ω ⊂ R³ is a region and u: Ω × R → R is a C² solution of the wave equation ∇²u = ∂²u/∂t².
Suppose that u(x, t) = f(x) for all x ∈ ∂Ω and all t (e.g., in two dimensions, the drumhead is clamped
along its edge). Show that the energy E(t)
is constant. Here by ∇u we mean the vector of derivatives with respect only to the space variables.
► 7 APPLICATIONS TO TOPOLOGY
We are going to give a brief introduction to the field of topology by using the techniques
of differential forms and Stokes’s Theorem to prove three rather deep theorems. The basic
ingredient of several of our proofs is the following. Let Sⁿ denote the n-dimensional unit
sphere, Sⁿ = {x ∈ R^{n+1} : ‖x‖ = 1}, and Dⁿ the closed unit ball, Dⁿ = {x ∈ Rⁿ : ‖x‖ ≤ 1}.
(Then ∂D^{n+1} = Sⁿ.)
… is such a form. ∎
Theorem 7.2 There is no smooth function r: D^{n+1} → Sⁿ with the property that
r(x) = x for all x ∈ Sⁿ.
Corollary 7.3 (Brouwer Fixed Point Theorem) Let f: Dⁿ → Dⁿ be smooth. Then
there must be a point x ∈ Dⁿ so that f(x) = x; i.e., f must have a fixed point.
Proof Suppose it does not. Then for all x ∈ Dⁿ, the points x and f(x) are distinct.
Define r: Dⁿ → S^{n−1} by setting r(x) to be the point where the ray starting at f(x) and
passing through x intersects the unit sphere, as shown in Figure 7.1. We leave it to the
reader to check in Exercise 1 that r is in fact smooth. By construction, whenever x ∈ S^{n−1},
we have r(x) = x. By Theorem 7.2, no such function can exist, and hence f must have a
fixed point. ∎
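Although the theorem is purely existential, for many concrete maps the fixed point can be located by simple iteration. The map below is our own illustrative example of a smooth map of the closed unit disk to itself (it is not from the text):

```python
import numpy as np

a = np.array([0.6, -0.2])

def f(x):
    # A smooth map of the closed unit disk into itself (our example):
    # |f(x)| <= (|x| + |a|)/2 <= 1 whenever |x| <= 1 and |a| <= 1.
    return 0.5 * (x + a)

x = np.zeros(2)
for _ in range(60):
    x = f(x)
print(x)   # converges to the fixed point x = a
```

Here the fixed point x = a guaranteed by Brouwer’s theorem is also an attractor, so iteration finds it; in general the theorem gives no such algorithm.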
Topology is in some sense the study of continuous (or, in our case, smooth) deformations
of objects. An old saw is that a topologist is one who cannot tell the difference between
a doughnut and a coffee cup. This occurs because we can continuously deform one to the
other, assuming we have flexible, plastic objects: The “hole” in the doughnut becomes the
“hole” in the handle of the cup. The crucial notion here is the following:
► EXAMPLE 1
The identity function f: Dⁿ → Dⁿ, f(x) = x, is homotopic to the constant map g(x) = 0. We merely
set H(x, t) = (1 − t)x.
► EXAMPLE 2
Are the maps f(t) = (cos t, sin t) and g(t) = (cos 2t, sin 2t)
homotopic? These parametrized curves wrap once and twice, respectively, around the unit circle,
so the winding numbers of these curves about the origin are 1 and 2, respectively. If we surmise
that the winding number should vary continuously as we continuously deform the curve, then we
guess that the curves cannot be homotopic. Let’s make this precise: Suppose there were a homotopy
H: S¹ × [0, 1] → S¹ between f and g. Let ω = −y dx + x dy ∈ A¹(S¹). Then

∫_{S¹×[0,1]} d(H*ω) = ∫_{S¹×[0,1]} H*(dω) = 0,

since any 2-form on S¹ must be 0. On the other hand, as we see from Figure 7.2,

∂(S¹ × [0, 1]) = (S¹ × {1})⁻ ∪ (S¹ × {0}),

Figure 7.2
so

∫_{S¹} f*ω = ∫_{S¹} g*ω;

since 2π ≠ 4π, we infer that f and g cannot be homotopic.
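The two integrals 2π and 4π can be computed mechanically. The helper below (our own) pulls the form ω = −y dx + x dy back along a parametrized curve and integrates over [0, 2π]:

```python
import sympy as sp

t = sp.symbols('t', real=True)

def pullback_integral(cx, cy):
    # Integral over [0, 2*pi] of the pullback of omega = -y dx + x dy
    # (restricted to the unit circle) under t |-> (cx(t), cy(t)).
    integrand = -cy * sp.diff(cx, t) + cx * sp.diff(cy, t)
    return sp.integrate(sp.simplify(integrand), (t, 0, 2*sp.pi))

print(pullback_integral(sp.cos(t), sp.sin(t)))       # 2*pi
print(pullback_integral(sp.cos(2*t), sp.sin(2*t)))   # 4*pi
```

The pullbacks are dt and 2 dt respectively, so the integrals are 2π and 4π, confirming that f and g have winding numbers 1 and 2.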
∫_X f*ω = ∫_X g*ω.

By the way, it is time to give a more precise definition of the term “simply connected.”
A closed curve in X is nothing other than the image of a map S¹ → X.
Recall that a k-form ω is closed if dω = 0 and exact if ω = dη for some (k − 1)-form
η. As a consequence of Proposition 7.4, we have
Corollary 7.5 Suppose X is a simply connected manifold. Then every closed 1-form
ω on X is exact.
Since g*ω = 0, we infer that ∫_{S¹} f*ω = 0. The result now follows from Theorem 3.2. ∎
Note that this is the generalization of the local result we obtained earlier, Proposi-
tion 3.3.
Before moving on to our last topic, we stop to state and prove one of the cornerstones
of classical mathematics. We assume a modest familiarity with the complex numbers.
Proof (We identify C with R² for purposes of the vector calculus.⁹) Since the lower-order
terms of p are negligible compared to zⁿ when |z| = R is sufficiently large, on the circle
∂B(0, R) the map p can be deformed within C − {0} to g(t) = Rⁿ(cos nt, sin nt), 0 ≤ t ≤ 2π.
With ω = (−y dx + x dy)/(x² + y²), we see that

∫_{∂B(0,R)} g*ω = ∫₀^{2π} (−(sin nt)(−n sin nt) + (cos nt)(n cos nt)) dt = ∫₀^{2π} n dt = 2πn,

and hence, by Proposition 7.4, we have ∫_{∂B(0,R)} p*ω = 2πn as well. Now, suppose p had
no root in B(0, R). Then p would actually be a smooth map from all of B(0, R) to C − {0}
and we would have

∫_{∂B(0,R)} p*ω = ∫_{B(0,R)} d(p*ω) = ∫_{B(0,R)} p*(dω) = 0,

a contradiction. ∎
We can actually obtain a stronger, more localized version. We need the following
computational result, a more elegant proof of which is suggested in Exercise 8.

⁹Recall that complex numbers are of the form z = x + iy, x, y ∈ R. We add complex numbers as vectors in R²,
and we multiply by using the distributive property and the rule i² = −1: If z = x + iy and w = u + iv, then
zw = (xu − yv) + i(xv + yu). It is customary to denote the length of the complex number z by |z|, and the
reader can easily check that |zw| = |z||w|. In addition, de Moivre’s formula tells us that if z = r(cos θ + i sin θ),
then zⁿ = rⁿ(cos nθ + i sin nθ).
Lemma 7.7 Let ω = (−y dx + x dy)/(x² + y²) ∈ A¹(C − {0}), and suppose f and
g are smooth maps to C − {0}. Then (fg)*ω = f*ω + g*ω.
as required. ∎
Now we have an intriguing application of winding numbers (see Section 3) that gives
a two-dimensional analogue of Gauss’s law from the preceding section. We make use of
the Fundamental Theorem of Algebra.
Proof As usual, let ω = (−y dx + x dy)/(x² + y²). Using Theorem 7.6, we factor
p(z) = c(z − r₁)(z − r₂)⋯(z − rₙ), where c ≠ 0 and rⱼ ∈ C, j = 1, …, n, are the roots
of p. Let fⱼ(z) = z − rⱼ. Then we claim that

(1/2π) ∫_C fⱼ*ω = 1 if rⱼ ∈ D, and 0 otherwise.

The former is a consequence of Example 10 on p. 361; the latter follows from Corollary
7.5. Applying Lemma 7.7 repeatedly, we see that p*ω = Σⱼ fⱼ*ω, and so (1/2π) ∫_C p*ω
counts the roots of p lying in D. ∎
There are far-reaching generalizations of this result that you may learn about in a
differential topology or differential geometry course. An interesting application is the
study of how roots of a polynomial vary as we change the polynomial; see Exercise 9.
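The root-counting formula is easy to test numerically: (1/2π)∮ p*ω is the total change in arg p(z) as z traverses the circle, which we can track by unwrapping phases. The polynomial below is an arbitrary test case of ours (it also appears in Exercise 4); by Rouché-type estimates it has all four roots in |z| < 2 and none in |z| < 1:

```python
import cmath
import math

def winding_number(p, radius, samples=20000):
    # (1/(2*pi)) * total change of arg(p(z)) as z traverses |z| = radius once.
    total = 0.0
    prev = cmath.phase(p(radius))
    for k in range(1, samples + 1):
        z = radius * cmath.exp(2j * math.pi * k / samples)
        cur = cmath.phase(p(z))
        d = cur - prev
        if d > math.pi:       # unwrap phase jumps across the branch cut
            d -= 2 * math.pi
        elif d < -math.pi:
            d += 2 * math.pi
        total += d
        prev = cur
    return round(total / (2 * math.pi))

p = lambda z: z**4 - 3*z + 9      # arbitrary test polynomial
print(winding_number(p, 2.0))      # 4: all four roots lie in |z| < 2
print(winding_number(p, 1.0))      # 0: no roots in |z| < 1
```

On |z| = 2 we have |−3z + 9| ≤ 15 < 16 = |z⁴|, and on |z| = 1 we have |z⁴ − 3z| ≤ 4 < 9, which is why the two counts are 4 and 0.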
A vector field v on Sⁿ is a smooth function v: Sⁿ → R^{n+1} with the property that
x·v(x) = 0 for all x. (That is, v(x) is tangent to the sphere at x.)
► EXAMPLE 3
There is an obvious nowhere-zero vector field on S¹, the unit circle, which we’ve seen many times in
this chapter: v(x₁, x₂) = (−x₂, x₁). More generally, on any odd-dimensional sphere S^{2m−1} ⊂ R^{2m}
there is the nowhere-zero vector field

v(x₁, x₂, …, x_{2m−1}, x_{2m}) = (−x₂, x₁, …, −x_{2m}, x_{2m−1}).

(If we visualize the vector field in the case of the circle as pushing around the circle, in the higher-
dimensional case, we imagine pushing in each of the orthogonal x₁x₂-, x₃x₄-, …, x_{2m−1}x_{2m}-planes
independently.)
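The two defining properties of this field — tangency and nonvanishing — can be spot-checked numerically on S³ ⊂ R⁴ (the sampling scheme below is our own):

```python
import numpy as np

rng = np.random.default_rng(0)

def v(x):
    # v(x1, x2, ..., x_{2m-1}, x_{2m}) = (-x2, x1, ..., -x_{2m}, x_{2m-1})
    out = np.empty_like(x)
    out[0::2] = -x[1::2]
    out[1::2] = x[0::2]
    return out

for _ in range(100):
    x = rng.normal(size=4)
    x /= np.linalg.norm(x)               # random point on S^3
    assert abs(x @ v(x)) < 1e-12         # tangency: x . v(x) = 0
    assert abs(np.linalg.norm(v(x)) - 1) < 1e-12  # |v(x)| = |x| = 1, never zero
print("v is a nowhere-zero tangent field on S^3")
```

In fact |v(x)| = |x| = 1 identically, since v just rotates each coordinate pair by 90 degrees.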
In contrast with the preceding example, however, it is somewhat surprising that there
is no nowhere-zero vector field on Sⁿ when n is even. The following result is usually
affectionately called the Hairy Ball Theorem, as it says that we cannot “comb the hairs” on
an even-dimensional sphere.
Theorem 7.9 Any vector field on the unit sphere S^{2m} must vanish somewhere.
Clearly, H is a smooth function. Now we apply Proposition 7.4, using the form ω defined
in Lemma 7.1. In particular, we calculate g*ω explicitly, where g(x) = −x:

g*ω = g*( Σ_{i=1}^{2m+1} (−1)^{i−1} xᵢ dx₁ ∧ ⋯ ∧ d̂xᵢ ∧ ⋯ ∧ dx_{2m+1} )
    = Σ_{i=1}^{2m+1} (−1)^{i−1} (−xᵢ)(−dx₁) ∧ ⋯ ∧ (−d̂xᵢ) ∧ ⋯ ∧ (−dx_{2m+1}) = (−1)^{2m+1} ω = −ω.

Thus, we have ∫_{S^{2m}} g*ω = −∫_{S^{2m}} ω ≠ ∫_{S^{2m}} ω, and Proposition 7.4 yields a
contradiction. ∎
► EXERCISES 8.7
1. Check that the mapping r defined in the proof of Corollary 7.3 is in fact smooth.
2. Consider the maps f and g defined in Example 2 as maps from [0, 2π] to R² (rather than to S¹).
Determine whether they are homotopic.
3. Prove Proposition 7.4.
4. Let f: C → C be given by f(z) = z⁴ − 3z + 9, and let Ω = {|z| < 2}. Evaluate ∫_{∂Ω} f*ω, where,
as usual, ω = (−y dx + x dy)/(x² + y²). How many roots does f have in Ω?
5. Show that Corollary 7.3 need not hold on the following spaces:
(a) Sⁿ,
(b) the annulus {x ∈ R² : 1 ≤ ‖x‖ ≤ 2},
(c) a solid torus,
(d) Bⁿ (the open unit ball).
6. Prove the following generalization of Theorem 7.2: Let M be any compact, orientable manifold
with boundary. Then there is no function f: M → ∂M with the property that f(x) = x for all x ∈ ∂M.
7. As pictured in Figure 7.4, let

Z = {x² + y² = 1, z = 0} ∪ {x = y = 0} ∪ {x = z = 0, y ≥ 1} ⊂ R³.
Figure 7.4
8. (a) Let z = x + iy. Show that

dz/z = (x dx + y dy)/(x² + y²) + i (−y dx + x dy)/(x² + y²).

(b) Let U ⊂ C be open, f, g: U → C − {0} be differentiable, and ω = (−y dx + x dy)/(x² + y²).
Prove that (fg)*ω = f*ω + g*ω. (Hint: What is (fg)*(dz/z)?)
9. Let ω = (−y dx + x dy)/(x² + y²).
(a) Suppose U ⊂ C is open and f, g: U → C − {0} are smooth. Let C ⊂ U be a closed curve and
suppose |g − f| < |f| on C. Prove that

∫_C g*ω = ∫_C f*ω.

(Hint: Use a homotopy similar to that appearing in the proof of Theorem 7.6.)
(b) Let a₀, a₁, …, a_{n−1} ∈ C and p(z) = zⁿ + a_{n−1}z^{n−1} + ⋯ + a₁z + a₀. Let D ⊂ C be a region so
that no root of p lies on C = ∂D. Prove that there is δ > 0 so that whenever |bⱼ − aⱼ| < δ for all
j = 0, 1, …, n − 1, the polynomial P(z) = zⁿ + b_{n−1}z^{n−1} + ⋯ + b₁z + b₀ has the same number
of roots in D as p.
(c) Deduce from part b that the roots of a polynomial vary continuously with the coefficients.
(Cf. Example 2 on p. 189 and Exercise 6.2.2. See also Exercise 9.4.22 for an interesting application
to linear algebra.)
10. Let f: S^{2m} → S^{2m} be a smooth map. Prove that there exists x ∈ S^{2m} so that either f(x) = x or
f(x) = −x.
11. Let n ≥ 2 and f: Dⁿ → Rⁿ be smooth. Suppose ‖f(x) − x‖ < 1 for all x ∈ S^{n−1}. Prove that there
is some x ∈ Dⁿ so that f(x) = 0. (Hint: If not, show that the restriction of the map f/‖f‖: Dⁿ → S^{n−1}
to ∂Dⁿ is homotopic to the identity map.)
12. We wish to give a generalization of Proposition 3.3. Suppose U ⊂ Rⁿ is an open subset that is
star-shaped with respect to the origin.
(a) For any k = 1, …, n, given a k-form φ = f dx_{i₁} ∧ ⋯ ∧ dx_{i_k} on U, define the (k − 1)-form

J(φ) = Σ_{j=1}^{k} (−1)^{j−1} (∫₀¹ t^{k−1} f(tx) dt) x_{i_j} dx_{i₁} ∧ ⋯ ∧ d̂x_{i_j} ∧ ⋯ ∧ dx_{i_k}.

Then extend J linearly. Prove that

φ = d(J(φ)) + J(dφ).

(b) Prove that if ω is a closed k-form on U, then ω is exact.
13. Use the result of Exercise 12 to express each of the following closed forms ω on R³ in the form
ω = dη.
(a) ω = (eˣ cos y + z)dx + (2yz² − eˣ sin y)dy + (x + 2y²z + e^z)dz.
(b) ω = (2x + y²)dy∧dz + (3y + z)dx∧dz + (z − xy)dx∧dy.
(c) ω = xyz dx∧dy∧dz.
14. Draw an orientable surface whose boundary is the boundary curve of the Möbius strip, as pictured
in Figure 7.5. (More generally, every simple closed curve in R³ bounds an orientable surface. Can
you see why?)
Figure 7.5
15. Find three everywhere linearly independent vector fields on S¹ × S².
16. Fill in the details in the following alternative proof of Theorem 7.9, following J. Milnor. Given
a (smooth) unit vector field v on Sⁿ, first extend v to be a vector field V on R^{n+1} by setting

V(x) = ‖x‖² v(x/‖x‖) for x ≠ 0, and V(0) = 0.

Then proceed by contradiction: Suppose there were a sequence t_k → 0 and points x_k, y_k ∈ D^{n+1} so that
f_{t_k}(x_k) = f_{t_k}(y_k). Use compactness of D^{n+1} to pass to convergent subsequences x_{k_j} and y_{k_j}. To
establish onto, you will need to use the fact that the only nonempty subset of D^{n+1} that is both open
(in D^{n+1}) and closed is D^{n+1} itself.)
(c) Apply the Change of Variables Theorem to see that the volume of B(0, √(1 + t²)) must be a
polynomial expression in t.
(d) Deduce that you have arrived at a contradiction when n is even.
CHAPTER 9
EIGENVALUES, EIGENVECTORS, AND APPLICATIONS
We have seen the importance of choosing the appropriate coordinates in doing multiple
integration. Now we turn to what is really a much more basic question. Given a linear
transformation T: Rⁿ → Rⁿ, can we choose appropriate (convenient?) coordinates on Rⁿ
so that the matrix for T (in these coordinates) is as simple as possible, say, diagonal? For
this the fundamental tool is eigenvalues and eigenvectors. We then give applications to
difference and differential equations and quadratic forms.
Then we define A to be the matrix for T with respect to B, also denoted [T]_B, whose
jth column is the coordinate vector of T(vⱼ) with respect to B. As
before, we have A[v]_B = [T(v)]_B, where now the column vectors are the coordinates of the vectors with respect to the
basis B.
We might agree that, generally, the easiest matrices to understand are diagonal. If we
think of our examples of projection and reflection in Rⁿ, we obtain some particularly simple
diagonal matrices.
► EXAMPLE 1
Suppose V ⊂ Rⁿ is a subspace. Choose a basis {v₁, …, v_k} for V and a basis {v_{k+1}, …, v_n} for
V⊥. Then B = {v₁, …, v_n} forms a basis for Rⁿ (why?). Let T = proj_V : Rⁿ → Rⁿ be the linear
transformation given by projecting onto V, and let S: Rⁿ → Rⁿ be the linear transformation given
by reflecting across V. Then we have

T(v₁) = v₁, …, T(v_k) = v_k,   S(v₁) = v₁, …, S(v_k) = v_k,
and
T(v_{k+1}) = 0, …, T(v_n) = 0,   S(v_{k+1}) = −v_{k+1}, …, S(v_n) = −v_n.

Then the matrices for T and S with respect to the basis B are, respectively, the block matrices

[T]_B = [ I_k  O ]        [S]_B = [ I_k    O       ]
        [ O    O ]                [ O    −I_{n−k}  ].
► EXAMPLE 2
and v2 =
then T(vj) = 4vi and P(v2) = v2, so that the matrix for T with respect to the ordered basis B =
{▼ i, v2} is the diagonal matrix
4 0
0 1
Now it is rather straightforward to picture the linear transformation: As we see from Figure 1.1, it
stretches the v1-axis by a factor of 4 and leaves the v2-axis unchanged. Since we can “pave” the plane
by parallelograms formed by v1 and v2, we are able to describe the effect of T quite explicitly. We
shall soon see how to find v1 and v2.
For future reference, let’s consider the matrix P with column vectors v1 and v2. Since T(v1) =
4v1 and T(v2) = v2, we observe that

AP = [ 3  1 ] [ 1  -1 ] = [ 4  -1 ] = [ 1  -1 ] [ 4  0 ] = PB.
     [ 2  2 ] [ 1   2 ]   [ 4   2 ]   [ 1   2 ] [ 0  1 ]
This might be rewritten as B = P⁻¹AP, in the form that will occupy our attention for the rest of this
section.
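The relation AP = PB is easy to check numerically. The following sketch (using numpy, which is not part of the text) takes the entries of A recovered from the eigenvector relations T(v1) = 4v1, T(v2) = v2 with v1 = (1, 1) and v2 = (-1, 2):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 2.0]])       # standard matrix of T (reconstructed)
P = np.array([[1.0, -1.0],
              [1.0,  2.0]])      # columns are v1 and v2
B = np.diag([4.0, 1.0])          # matrix of T in the basis {v1, v2}

# AP = PB, equivalently B = P^{-1} A P
assert np.allclose(A @ P, P @ B)
assert np.allclose(np.linalg.inv(P) @ A @ P, B)
```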
1 Linear Transformations and Change of Basis 415
It would have been a more honest exercise here to start with the geometric description of T, i.e.,
its action on the basis vectors v1 and v2, and try to find the standard matrix for T. As the reader can
check, we have

e1 = (2/3)v1 - (1/3)v2
e2 = (1/3)v1 + (1/3)v2,

and so T(e1) = (8/3)v1 - (1/3)v2 = (3, 2) and T(e2) = (4/3)v1 + (1/3)v2 = (1, 2), recovering the
columns of A.
Given a (finite-dimensional) vector space V and an ordered basis B = {v1, ..., vn} for
V, we can define a linear transformation

C_B: V -> R^n,

which assigns to each vector v its vector of coordinates with respect to the basis B. That is,
if v = c1 v1 + ··· + cn vn, then

C_B(v) = [ c1 ]
         [ ⋮  ]
         [ cn ].

Of course, when B is the standard basis E for R^n, this is what you’d expect:

C_E(x) = [ x1 ]
         [ x2 ]
         [ ⋮  ]
         [ xn ] = x.
Suppose T: R^n -> R^n is a linear transformation and T(x) = y; to say that A is the standard
matrix for T is to say that multiplying A by the coordinate vector of x (in the standard basis)
gives the coordinate vector of y (in the standard basis). Likewise, suppose T: V -> V is a
linear transformation, T(v) = w, and B is an ordered basis for V. Then let C_B(v) = x be
the coordinate vector of v with respect to the basis B, and let C_B(w) = y be the coordinate
vector of w with respect to the basis B. To say that A is the matrix for T with respect to the
basis B (see the definition on p. 413) is to say Ax = y. (See Figure 1.2.)
Suppose now that we have a linear transformation T: V -> V and two ordered bases
B = {v1, ..., vn} and B' = {v1', ..., vn'} for V. (Often in our applications, as the notation
suggests, V will be R^n and B will be the standard basis E.) Let A_old = [T]_B be the matrix
for T with respect to the “old” basis B, and let A_new = [T]_{B'} be the matrix for T with
respect to the “new” basis B'. The fundamental issue now is to compute A_new if we know
A_old. Define the change-of-basis matrix P to be the matrix whose column vectors are the
coordinates of the new basis vectors with respect to the old basis; i.e.,

P = [ C_B(v1')  C_B(v2')  ···  C_B(vn') ].

Note that P must be invertible, since we can similarly express each of the old basis vectors
as a linear combination of the new basis vectors. (Cf. Proposition 3.4 of Chapter 4.) Then,
as the diagram in Figure 1.3 summarizes, we have the following
Theorem 1.1 With notation as above,

[T]_{B'} = P⁻¹ [T]_B P.
Remark Two matrices A and B are called similar if B = P⁻¹AP for some invertible
matrix P (see Exercise 9). Theorem 1.1 tells us that any two matrices representing a linear
map T: V -> V are similar.
Proof Given a vector v ∈ V, denote by x and x', respectively, its coordinate vectors
with respect to the bases B and B'. The important relation here is

x = Px'.

We derive this as follows: Using the equations v = Σ_{i=1}^{n} x_i v_i and v_j' = Σ_{i=1}^{n} p_{ij} v_i, we have

v = Σ_{j=1}^{n} x_j' v_j' = Σ_{j=1}^{n} x_j' ( Σ_{i=1}^{n} p_{ij} v_i ) = Σ_{i=1}^{n} ( Σ_{j=1}^{n} p_{ij} x_j' ) v_i,

so that x_i = Σ_{j=1}^{n} p_{ij} x_j', i.e., x = Px'.
(If we think of the old basis as the standard basis for R^n, then this is our familiar fact that
multiplying P by x' takes the appropriate linear combination of the columns of P.)
Likewise, if T(v) = w, let y and y', respectively, denote the coordinate vectors of w
with respect to the bases B and B'. Now compare the equations

y = [T]_B x = ([T]_B P) x'
and
y = Py' = P([T]_{B'} x') = (P [T]_{B'}) x'.

Since x' is arbitrary, we conclude that [T]_B P = P [T]_{B'}, and hence [T]_{B'} = P⁻¹ [T]_B P. ∎
► EXAMPLE 3
Let’s return to Example 2 as a test case for the change-of-basis formula. (Of course, we’ve already
seen there that it works.) Given the matrix

A = [T] = [ 3  1 ]
          [ 2  2 ]

of a linear transformation T: R^2 -> R^2 with respect to the standard basis, let’s calculate its matrix
[T]_{B'} with respect to the new basis B' = {v1, v2}, where

v1 = [ 1 ]   and   v2 = [ -1 ]
     [ 1 ]              [  2 ].

The change-of-basis matrix and its inverse are

P = [ 1  -1 ]   and   P⁻¹ = (1/3) [  2  1 ]
    [ 1   2 ]                     [ -1  1 ],

and so

[T]_{B'} = P⁻¹AP = [ 4  0 ]
                   [ 0  1 ].
► EXAMPLE 4
We wish to calculate the standard matrix for the linear transformation T = proj_V, where V ⊂ R^3 is
the plane x1 - 2x2 + x3 = 0. If we choose a basis B = {v1, v2, v3} for R^3 so that {v1, v2} is a basis
for V and v3 is normal to the plane, then (see Example 1) we’ll have

[T]_B = [ 1  0  0 ]
        [ 0  1  0 ]
        [ 0  0  0 ].

So we take

v1 = [ -1 ],   v2 = [ 1 ],   v3 = [  1 ].
     [  0 ]         [ 1 ]         [ -2 ]
     [  1 ]         [ 1 ]         [  1 ]

We wish to know the standard matrix, which means that B' = {e1, e2, e3} should be the standard basis
for R^3. Then the inverse of the change-of-basis matrix is

P⁻¹ = [ -1  1   1 ]
      [  0  1  -2 ]
      [  1  1   1 ],

and so

P = [ -1/2    0    1/2 ]
    [  1/3   1/3   1/3 ]
    [  1/6  -1/3   1/6 ].

Therefore, the standard matrix is

[T]_{B'} = P⁻¹ [T]_B P = (1/6) [  5  2  -1 ]
                               [  2  2   2 ]
                               [ -1  2   5 ].
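The projection matrix can be double-checked against the independent formula I - nnᵀ/|n|², where n is the normal vector to the plane. A short numpy sketch (not part of the text):

```python
import numpy as np

# Projection onto the plane x1 - 2*x2 + x3 = 0, computed two ways.
n = np.array([[1.0], [-2.0], [1.0]])            # normal to the plane
proj = np.eye(3) - (n @ n.T) / float(n.T @ n)   # I - n n^T / |n|^2

M = np.array([[-1.0, 1.0,  1.0],    # columns are the basis vectors v1, v2, v3
              [ 0.0, 1.0, -2.0],
              [ 1.0, 1.0,  1.0]])
D = np.diag([1.0, 1.0, 0.0])        # [T]_B from Example 1

# change-of-basis formula gives the same standard matrix
assert np.allclose(M @ D @ np.linalg.inv(M), proj)
```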
► EXAMPLE 5
Suppose we consider the linear transformation T: R^3 -> R^3 defined by rotating through an angle 2π/3 about
the line spanned by (1, -1, 1). (The angle is measured counterclockwise from a vantage point on the
“positive side” of this line.) Once again, the key is to choose a convenient new basis adapted to the
geometry of the problem. We choose

v3 = [  1 ]
     [ -1 ]
     [  1 ]

along the axis of rotation and v1, v2 to be an orthonormal basis for the plane orthogonal to that axis:
e.g.,

v1 = (1/√2) [ 1 ]   and   v2 = (1/√6) [ -1 ].
            [ 1 ]                     [  1 ]
            [ 0 ]                     [  2 ]

Then

T(v1) = -(1/2)v1 + (√3/2)v2,
T(v2) = -(√3/2)v1 - (1/2)v2,
T(v3) = v3.

(Now it should be clear why we chose v1, v2 to be orthonormal. We also want v1, v2, v3 to form a
“right-handed system” so that we’re turning in the correct direction, as indicated in Figure 1.4. But
there’s no need to worry about the length of v3.) Thus, we have

[T]_B = [ -1/2   -√3/2   0 ]
        [ √3/2   -1/2    0 ]
        [  0       0     1 ].
Next, we take B' = {e1, e2, e3}, and the inverse of the change-of-basis matrix is

P⁻¹ = [ 1/√2   -1/√6    1 ]
      [ 1/√2    1/√6   -1 ].
      [  0      2/√6    1 ]
Figure 1.4
so that (Exercise 5.5.16 may be helpful in computing P, but, as a last resort, there’s always Gaussian
elimination) we solve for

[T]_{B'} = P⁻¹ [T]_B P = [ 0  -1   0 ]
                         [ 0   0  -1 ],
                         [ 1   0   0 ]

amazingly enough. In hindsight, then, we should be able to see the effect of T on the standard basis
vectors quite plainly. Can you?
Remark Suppose we first rotate π/2 about the x3-axis and then rotate π/2 about the
x1-axis. We leave it to the reader to check that the result is the linear transformation whose
matrix we just calculated. This raises a fascinating question: Is the composition of rotations
always again a rotation? If so, is there a way of predicting the ultimate axis and angle?
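The Remark's claim can be verified numerically: the composition fixes the axis (1, -1, 1), and its trace equals 1 + 2cos(2π/3) = 0, so the angle of rotation is 2π/3. A sketch (numpy is not part of the text):

```python
import numpy as np

R3 = np.array([[0.0, -1.0, 0.0],    # rotation by pi/2 about the x3-axis
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
R1 = np.array([[1.0, 0.0,  0.0],    # rotation by pi/2 about the x1-axis
               [0.0, 0.0, -1.0],
               [0.0, 1.0,  0.0]])
T = R1 @ R3                          # first R3, then R1

axis = np.array([1.0, -1.0, 1.0])
assert np.allclose(T @ axis, axis)                    # the axis is fixed
# trace = 1 + 2 cos(theta), so theta = 2*pi/3
assert np.isclose(np.trace(T), 1 + 2*np.cos(2*np.pi/3))
```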
► EXERCISES 9.1
*1. Let v1 = [ 2 ]   and   v2 = [ 1 ],
             [ 3 ]              [ · ]
and consider the basis B' = {v1, v2} for R^2.
(a) Suppose T: R^2 -> R^2 is a linear transformation whose standard matrix is

[T] = [ 1   5 ]
      [ 2  -2 ].

Find the matrix for T with respect to the basis B'.
(b) If S: R^2 -> R^2 is the linear transformation with

S(v1) = 2v1 + v2
S(v2) = -v1 + 3v2,

then give the standard matrix for S.
2. Derive the result of Exercise 1.4.10a by the change-of-basis formula.
3. Let T: R^3 -> R^3 be the linear transformation given by reflecting across the plane
-x1 + x2 + x3 = 0.
(a) Find an orthogonal basis {v1, v2, v3} for R^3 so that v1, v2 span the plane and v3 is orthogonal
to it.
(b) Give the matrix representing T with respect to your basis in part a.
(c) Use the change-of-basis theorem to give the matrix representing T with respect to the standard
basis.
4. Use the change-of-basis formula to find the standard matrix for projection onto the plane spanned
by

[ 1 ]        [  0 ]
[ 0 ]  and   [  1 ].
[ 1 ]        [ -2 ]
*5. Let T: R^3 -> R^3 be the linear transformation given by reflecting across the plane
x1 - 2x2 + 2x3 = 0. Use the change-of-basis formula to find its standard matrix.
6. Check the result claimed in the remark on p. 420.
7. Let V ⊂ R^3 be the subspace defined by

V = {x ∈ R^3 : x1 - x2 + x3 = 0}.
Find the standard matrix for each of the following linear transformations:
(a) projection on V,
(b) reflection across V,
(c) rotation of V through angle π/6 (as viewed from high above).
*8. Find the standard matrix for the linear transformation giving projection onto the plane in R4
spanned by
(d) Show that for any real numbers a and b, the matrices

[ 1  a ]        [ 1  b ]
[ 0  2 ]  and   [ 0  2 ]

are similar.
10. See Exercise 9 for the relevant definition. Prove or give a counterexample:
(a) If B is similar to A, then BT is similar to AT.
(b) If B2 is similar to A2, then B is similar to A.
(c) If B is similar to A and A is nonsingular, then B is nonsingular.
(d) If B is similar to A and A is symmetric, then B is symmetric.
(e) If B is similar to A, thenN(B) = N(A).
(f) If B is similar to A, then rank(B) = rank(A).
11. See Exercise 9 for the relevant definition. Suppose A and B are n x n matrices.
(a) Show that if either A or B is nonsingular, then AB and BA are similar.
(b) Must AB and BA be similar in general?
12. Let a = [ sin φ cos θ ],   0 < φ < π/2.
            [ sin φ sin θ ]
            [   cos φ    ]
*(a) Prove that the intersection of the circular cylinder x1² + x2² = 1 with the plane a · x = 0 is an ellipse. (Hint: Consider the new basis

v1 = [ -sin θ ]        v2 = [ -cos φ cos θ ]
     [  cos θ ]   and       [ -cos φ sin θ ].)
     [    0   ]             [     sin φ    ]

(b) Describe the projection of the cylindrical region x1² + x2² = 1, -h < x3 < h onto the general
plane a · x = 0. (Hint: Special cases are the planes x3 = 0 and x1 = 0.)
" ±1 ’ " 1 ’
13. A cube with vertices at ±1 is rotated about the long diagonal through ± 1 . Describe
_ ±1 _ _ 1 _
the resulting surface and give equation(s) for it.
14. In this exercise we give the general version of the change-of-basis formula for a linear transformation T: V -> W.
(a) Suppose V and V' are ordered bases for the vector space V, and W and W' are ordered bases for
the vector space W. Let P be the change-of-basis matrix from V to V', and let Q be the change-of-basis
matrix from W to W'. Suppose T: V -> W is a linear transformation whose matrix with respect to
the bases V and W is [T]_{V,W} and whose matrix with respect to the new bases V' and W' is [T]_{V',W'}. Prove
that [T]_{V',W'} = Q⁻¹ [T]_{V,W} P.
(b) Consider the identity transformation T: V -> V. Using the basis V' in the domain and the basis
V in the range, show that the matrix [T]_{V',V} is the change-of-basis matrix P.
15. (See the discussion on p. 183 and Exercise 4.4.18.) Let A be an n x n matrix. Prove that the
functions T: R(A) -> C(A) and S: C(A) -» R(A) are inverse functions if and only if A = QP,
where P is a projection matrix and Q is orthogonal.
2 Eigenvalues, Eigenvectors, and Diagonalizability 423
Since Λᵏ is easy to calculate, we are left with a very computable formula for Aᵏ. We will
see a number of applications of this principle in Section 3. We turn first to the matter of
finding the diagonal matrix Λ if, in fact, A is diagonalizable. Then we will try to develop
some criteria that guarantee diagonalizability.
T(v1) = λ1 v1,
T(v2) = λ2 v2,
  ⋮
T(vn) = λn vn.
At this juncture, the obvious question to ask is how we should find eigenvectors. Let’s
start by observing that, if we include the zero vector, the set of eigenvectors with eigenvalue
λ forms a subspace.

Lemma 2.2 Let T: V -> V be a linear transformation, and let λ be any scalar. Then

E(λ) = {v ∈ V : T(v) = λv}

is a subspace of V; dim E(λ) > 0 if and only if λ is an eigenvalue, in which case we call
E(λ) the λ-eigenspace.

Proof That E(λ) is a subspace follows immediately once we recognize that it is the
kernel (or nullspace) of a linear map. (In the more familiar matrix notation, {x ∈ R^n :
Ax = λx} = N(A - λI).) Now, by definition, λ is an eigenvalue precisely when there is a
nonzero vector in E(λ). ∎
424 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications
Proposition 2.3 The scalar λ is an eigenvalue of A if and only if det(A - λI) = 0.

Proof From Lemma 2.2 we infer that λ is an eigenvalue if and only if the matrix
A - λI is singular. Next we conclude from Theorem 5.5 of Chapter 7 that A - λI is
singular precisely when det(A - λI) = 0. Putting the two statements together, we obtain
the result. ∎

Once we use this criterion to find the eigenvalues λ, it is an easy matter to find the
corresponding eigenvectors merely by finding N(A - λI).
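In practice this two-step recipe (roots of det(A - tI), then nullspaces) is what numerical libraries implement. A sketch with numpy for a sample symmetric matrix (my own example, not the book's):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
# numpy returns the roots of det(A - t I) and a basis of eigenvectors
evals, evecs = np.linalg.eig(A)
for lam, v in zip(evals, evecs.T):
    # each eigenvector lies in N(A - lambda I)
    assert np.allclose((A - lam*np.eye(2)) @ v, 0.0, atol=1e-9)
assert np.allclose(sorted(evals), [1.0, 3.0])
```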
► EXAMPLE 1
1
7
We start by calculating
Vl = is a basis for
Since we observe that {v1, v2} is linearly independent, the matrix A is diagonalizable. Indeed, as
1 1
the reader can check, if we take P = , then
3
► EXAMPLE 2
Let

A = [ 1  2  1 ]
    [ 0  1  0 ].
    [ 1  3  1 ]

We begin by computing

det(A - tI) = det [ 1-t   2    1  ]
                  [  0   1-t   0  ] = (1-t)((1-t)² - 1) = -t(1-t)(2-t).
                  [  1    3   1-t ]
Thus, the eigenvalues of A are 0, 1, and 2. We next find the respective eigenspaces:

E(0) = N(A) = span(v1),       v1 = (-1, 0, 1);
E(1) = N(A - I) = span(v2),   v2 = (3, -1, 2);
E(2) = N(A - 2I) = span(v3),  v3 = (1, 0, 1).
Once again, A is diagonalizable: As the reader can check, {vi, v2, v3} is linearly independent and
therefore gives a basis for R3. Just to be sure, we let
P = [ -1   3  1 ]
    [  0  -1  0 ],
    [  1   2  1 ]
then

P⁻¹AP = [ -1/2  -1/2  1/2 ] [ 1  2  1 ] [ -1   3  1 ]   [ 0  0  0 ]
        [   0    -1    0  ] [ 0  1  0 ] [  0  -1  0 ] = [ 0  1  0 ],
        [  1/2   5/2  1/2 ] [ 1  3  1 ] [  1   2  1 ]   [ 0  0  2 ]
as we expected.
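The arithmetic in this example is easy to confirm numerically. A sketch (numpy, not part of the text), using the eigenvector columns recovered above:

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 3.0, 1.0]])
P = np.array([[-1.0, 3.0, 1.0],    # columns: eigenvectors for 0, 1, 2
              [ 0.0,-1.0, 0.0],
              [ 1.0, 2.0, 1.0]])
D = np.linalg.inv(P) @ A @ P
assert np.allclose(D, np.diag([0.0, 1.0, 2.0]))
```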
► EXAMPLE 3
Let

A = [  0  1 ]
    [ -1  0 ].

As usual, we calculate

det(A - tI) = det [ -t   1 ] = t² + 1.
                  [ -1  -t ]
Since t² + 1 ≥ 1 for all real numbers t, there is no real number λ so that det(A - λI) = 0. Since our
scalars are allowed only to be real numbers, this matrix has no eigenvalue. On the other hand, as one
might see in a more advanced course, it is often convenient to allow complex numbers as scalars.
It is evident that we are going to find the eigenvalues of a matrix A by finding the (real)
roots of the polynomial det(A - tI). This leads us to our next

Definition Let A be a square matrix. Then p(t) = p_A(t) = det(A - tI) is called
the characteristic polynomial of A.¹
We can restate Proposition 2.3 by saying that the eigenvalues of A are the real roots of
the characteristic polynomial p_A(t). It is comforting to observe that similar matrices have
the same characteristic polynomial, and hence it makes sense to refer to the characteristic
polynomial of a linear map T: V -> V.

Lemma 2.4 If B = P⁻¹AP for some invertible matrix P, then p_A(t) = p_B(t).

Proof We have

p_B(t) = det(B - tI) = det(P⁻¹AP - tI) = det(P⁻¹(A - tI)P)
       = det P⁻¹ det(A - tI) det P = det(A - tI) = p_A(t). ∎
¹That the characteristic polynomial of an n x n matrix is in fact a polynomial of degree n seems pretty evident
from examples; but the fastidious reader can establish this by expanding in cofactors.
Remark In order to determine the eigenvalues of a matrix, we must find the roots of
its characteristic polynomial. In real-world applications (where the matrices tend to get quite
large), one might solve this numerically (e.g., using Newton’s method). However, there
are more sophisticated methods for finding the eigenvalues without even calculating the
characteristic polynomial; a powerful such method is based on the Gram-Schmidt process.
The interested reader should consult Strang or Wilkinson for more details.
For the lion’s share of the matrices that we shall encounter here, the eigenvalues will be
integers, and so we take this opportunity to remind you of a trick from high school algebra.
Proposition 2.5 (Rational Roots Test) Let p(t) = a_n tⁿ + a_{n-1} tⁿ⁻¹ + ··· + a_1 t +
a_0 be a polynomial with integer coefficients. If t = r/s is a rational root (in lowest terms)
of p(t), then r must be a factor of a_0 and s must be a factor of a_n.
Proof You can find a proof in most abstract algebra texts, but, for obvious reasons,
we recommend Abstract Algebra: A Geometric Approach, by someone named T. Shifrin,
p. 105. ■
In particular, when the leading coefficient a_n is ±1, as is always the case with the
characteristic polynomial, any rational root must in fact be an integer that divides a_0. So, in
practice, we test the various factors of a_0 (being careful to try both positive and negative).
Once we find one root r, we can divide p(t) by t - r to obtain a polynomial of smaller
degree.
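The integer-root search described above is easily mechanized. A sketch (numpy, my own illustration) applied to the polynomial of the next example, p(t) = -t³ + 6t² - 11t + 6:

```python
import numpy as np

coeffs = [-1, 6, -11, 6]                 # -t^3 + 6t^2 - 11t + 6
p = np.polynomial.Polynomial(coeffs[::-1])

# candidate integer roots: divisors of the constant term, both signs
a0 = coeffs[-1]
candidates = [d for d in range(1, abs(a0) + 1) if a0 % d == 0]
roots = [r for c in candidates for r in (c, -c) if abs(p(r)) < 1e-9]
assert sorted(roots) == [1, 2, 3]
```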
► EXAMPLE 4
4 3
A = 0 1 4
2 -2
is p(t) = -t³ + 6t² - 11t + 6. The factors of 6 are ±1, ±2, ±3, and ±6. Since p(1) = 0, we know
that 1 is a root (so we were lucky). Now,

p(t)/(1 - t) = t² - 5t + 6 = (t - 2)(t - 3),

and we have succeeded in finding all three eigenvalues of A. They are 1, 2, and 3. ◄
Remark It might be nice to have a few shortcuts for calculating the characteristic
polynomial of small matrices. For 2 x 2 matrices, it’s quite easy:

p(t) = det [ a-t    b  ] = (a - t)(d - t) - bc
           [  c    d-t ]
     = t² - (a + d)t + (ad - bc) = t² - (tr A)t + det A.
(Recall that the trace of a matrix is the sum of its diagonal entries. The trace of A is denoted
tr A.) For 3 x 3 matrices, similarly, one can check that

p(t) = -t³ + (tr A)t² - (sum of the 2 x 2 principal minors of A)t + det A.
Note that the constant term is always det A because p(0) = det(A - 0I) = det A.
In the long run, these formulas notwithstanding, it’s sometimes best to calculate the
characteristic polynomial of 3 x 3 matrices by expansion in cofactors. If one is both attentive and fortunate, this may save the trouble of factoring the polynomial.
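The 2 x 2 shortcut p(t) = t² - (tr A)t + det A can be checked against numpy's characteristic-polynomial routine, which returns the monic coefficients of det(tI - A). A sketch (my own example matrix):

```python
import numpy as np

A = np.array([[4.0, 3.0],
              [2.0, -1.0]])
coeffs = np.poly(A)   # coefficients of det(t I - A), highest degree first
# expect [1, -tr A, det A]
assert np.allclose(coeffs, [1.0, -np.trace(A), np.linalg.det(A)])
```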
► EXAMPLE 5
Let

A = [ 2  0  0 ]
    [ 1  2  1 ].
    [ 0  1  2 ]

Expanding in cofactors along the first row, we have

det(A - tI) = det [ 2-t   0    0  ]
                  [  1   2-t   1  ] = (2-t) det [ 2-t   1  ]
                  [  0    1   2-t ]             [  1   2-t ]
            = (2-t)((2-t)² - 1) = (2-t)(t² - 4t + 3)
            = (2-t)(t-3)(t-1).
But that was too easy. Let’s try the characteristic polynomial of

B = [ 2  0  1 ]
    [ 1  3  1 ].
    [ 1  1  2 ]

Expanding in cofactors along the first row, we have

det(B - tI) = det [ 2-t   0    1  ]
                  [  1   3-t   1  ] = (2-t) det [ 3-t   1  ] + det [ 1  3-t ]
                  [  1    1   2-t ]             [  1   2-t ]       [ 1   1  ]
            = (2-t)((3-t)(2-t) - 1) + (1 - (3-t))
            = (2-t)(t² - 5t + 5) - (2-t) = (2-t)(t² - 5t + 4)
            = (2-t)(t-1)(t-4).
OK, perhaps we were a bit lucky there, too.
2.2 Diagonalizability
Judging by the foregoing examples, it seems to be the case that when an n x n matrix (or
linear transformation) has n distinct eigenvalues, the corresponding eigenvectors form a
linearly independent set and will therefore give a “diagonalizing basis.” Let’s begin by
proving a slightly stronger statement.
Theorem 2.6 Let λ1, ..., λk be distinct eigenvalues of a linear transformation T, and
let v1, ..., vk be corresponding eigenvectors. Then {v1, ..., vk} is linearly independent.

Proof Let m be the largest number between 1 and k (inclusive) so that {v1, ..., vm} is
linearly independent. We want to see that m = k. By way of contradiction, suppose m < k.
Then we know that {v1, ..., vm} is linearly independent and {v1, ..., vm, v_{m+1}} is linearly
dependent. It follows from Proposition 3.2 of Chapter 4 that v_{m+1} = c1 v1 + ··· + cm vm
for some scalars c1, ..., cm. Then (using repeatedly the fact that T(vi) = λi vi)

0 = (T - λ_{m+1} I)(v_{m+1}) = (T - λ_{m+1} I)(c1 v1 + ··· + cm vm)
  = c1(λ1 - λ_{m+1})v1 + ··· + cm(λm - λ_{m+1})vm.

Since λi - λ_{m+1} ≠ 0 for i = 1, ..., m, and since {v1, ..., vm} is linearly independent, the
only possibility is that c1 = ··· = cm = 0, contradicting the fact that v_{m+1} ≠ 0 (by the very
definition of eigenvector). Thus, it cannot happen that m < k, and the proof is complete. ∎
We now arrive at our first result that gives a sufficient condition for a linear transformation to be diagonalizable.

Corollary 2.7 If T: V -> V has n = dim V distinct real eigenvalues, then T is
diagonalizable.

Proof The set of the n corresponding eigenvectors will be linearly independent and
will hence give a basis for V. The matrix for T with respect to a basis of eigenvectors is
always diagonal. ∎
Remark Of course, there are many diagonalizable (indeed, diagonal) matrices with
repeated eigenvalues. Certainly the identity matrix and the matrix

[ 2  0  0 ]
[ 0  3  0 ]
[ 0  0  2 ]

are diagonalizable, despite their repeated eigenvalues.
We spend the rest of this section discussing the two ways the hypotheses of Corollary
2.7 can fail: The characteristic polynomial may have complex roots or it may have repeated
roots.
► EXAMPLE 6
The reader may well recall from Chapter 1 that multiplying by

A = (√2/2) [ 1  -1 ]
           [ 1   1 ]

gives a rotation of the plane through an angle of π/4. Now, what are the eigenvalues of A? The characteristic polynomial is p(t) = t² - √2 t + 1, which has no real roots.
After a bit of thought, it should come as no surprise that A has no (real) eigenvector, as there can be
no line through the origin that is unchanged after a rotation.
We have seen that when the characteristic polynomial has distinct (real) roots, we get
a 1-dimensional eigenspace for each. What happens if the characteristic polynomial has
some repeated roots?
► EXAMPLE 7
1
3
-1
N(A - 21) = N
-1
► EXAMPLE 8
and
have the characteristic polynomial p(t) = (t - 2)²(t - 3)² (why?). For A, there are two linearly independent eigenvectors with eigenvalue 2 but only one linearly independent eigenvector with eigenvalue
3. For B, there are two linearly independent eigenvectors with eigenvalue 3 but only one linearly
independent eigenvector with eigenvalue 2. As a result, neither can be diagonalized. ◄
► EXAMPLE 9
For the matrices in Example 8, both the eigenvalues 2 and 3 have algebraic multiplicity 2. For matrix
A, the eigenvalue 2 has geometric multiplicity 2 and the eigenvalue 3 has geometric multiplicity
1; for matrix B, the eigenvalue 2 has geometric multiplicity 1 and the eigenvalue 3 has geometric
multiplicity 2. ◄
From the examples we’ve seen, it seems quite plausible that the geometric multiplicity
of an eigenvalue can be no larger than its algebraic multiplicity, but we stop to give a proof.

Proposition 2.8 Let λ be an eigenvalue of algebraic multiplicity m and geometric
multiplicity d. Then d ≤ m.

Proof Choose a basis {v1, ..., vd} for E(λ) and extend it to a basis for V. With respect
to this basis, the matrix representing T has the block form

A = [ λI_d  B ]
    [  O    C ],

so (t - λ)^d divides the characteristic polynomial.
Since the characteristic polynomial does not depend on the basis and since (t - λ)^m is the
largest power of t - λ dividing the characteristic polynomial, it follows that d ≤ m. ∎
We are now able to give a necessary and sufficient criterion for a linear transformation
to be diagonalizable. Based on our experience with examples, it should come as no great
surprise.
Theorem 2.9 Let T: V -> V be a linear transformation. Let its distinct eigenvalues
be λ1, ..., λk, and assume these are all real numbers. Then T is diagonalizable if and only
if the geometric multiplicity, di, of each λi equals its algebraic multiplicity, mi.
Proof Suppose T is diagonalizable, so there is a basis of eigenvectors. Each basis
vector lies in some eigenspace E(λi), and at most di of them can lie in E(λi); since also
di ≤ mi and the algebraic multiplicities can sum to at most n = deg p(t), we have, therefore,

n ≤ Σ_{i=1}^{k} di ≤ Σ_{i=1}^{k} mi ≤ n.

Thus, we must have equality at every stage here, which implies that di = mi for all i =
1, ..., k.
Conversely, suppose di = mi for i = 1, ..., k. If we choose a basis Bi for each
eigenspace E(λi) and let B = B1 ∪ ··· ∪ Bk, then we assert that B is a basis for V. There
are n vectors in B, so we need only check that the set of vectors is linearly independent.
This is a generalization of the argument of Theorem 2.6, and we leave it to Exercise 25. ∎
► EXAMPLE 10
The matrices
both have characteristic polynomial p(t) = -(t - 1)²(t - 2). That is, the eigenvalue 1 has algebraic
multiplicity 2 and the eigenvalue 2 has algebraic multiplicity 1. To decide whether the matrices are
diagonalizable, we need to know the geometric multiplicity of the eigenvalue 1. Well, the matrix A - I
has rank 1, and so dim E_A(1) = 2. We infer from Theorem 2.9 that A is diagonalizable. Indeed, as
the reader can check, a diagonalizing basis is easily found. On the other hand, B - I
has rank 2, and so dim E_B(1) = 1. Since the eigenvalue 1 has geometric multiplicity 1, it follows
from Theorem 2.9 that B is not diagonalizable. ◄
In the next section we will see the power of diagonalizing matrices in several applications.
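The diagonalizability test of Theorem 2.9 is computable: the geometric multiplicity of λ is dim N(A - λI) = n - rank(A - λI). A sketch (numpy, with my own sample matrices):

```python
import numpy as np

def geometric_multiplicity(A, lam):
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam*np.eye(n))

A = np.eye(3)                      # eigenvalue 1: algebraic and geometric mult. 3
assert geometric_multiplicity(A, 1.0) == 3

B = np.array([[1.0, 1.0],          # eigenvalue 1 has algebraic multiplicity 2
              [0.0, 1.0]])         # but geometric multiplicity 1: not diagonalizable
assert geometric_multiplicity(B, 1.0) == 1
```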
EXERCISES 9.2
1. Find the eigenvalues and the corresponding eigenvectors of the following matrices.
1 5" "2 0 1"
(a)
2 4 *(i) 0 1 2
_0 0 1_
0 1
(b) 1 -2 2
i o
10 -6 (j) -1 0 -1
(C) 0 2 -1
18 -11
3 1 0
1 3
(d) (k) 0 1 2
3 1
0 1 2
1 1
*(e) 1 -6 4
3
0) -2 -4 5
-1 1 2
-2 -6 7
(f) 1 2 1
3 2 —2
2 1 -1
(m) 2 2 -1
1 0 0 2 1 0
(g) -2 1 2
1 0 0 1“
-2 0 3
0 1 1 1
1 -1 2 (n)
0 0 2 0
(h) 0 1 0 .0 0 0 2_
0 -2 3.
2. Prove that 0 is an eigenvalue of A if and only if A is singular.
3. Prove that the eigenvalues of an upper (or lower) triangular matrix are its diagonal entries.
(c) Prove that E(1) + E(-1) = R^n and deduce that A is diagonalizable.
(For an application, see Exercise 15.)
20. Let A be an orthogonal 3 x 3 matrix.
(a) Prove that the characteristic polynomial p_A has a real root.
(b) Prove that ||Ax|| = ||x|| for all x ∈ R^3 and deduce that the only (real) eigenvalues of A can be 1
and -1.
(c) Prove that if det A = 1, then 1 must be an eigenvalue of A.
(d) Prove that if det A = 1 and A ≠ I, then μ_A: R^3 -> R^3 is given by rotation through some angle
θ about some axis. (Hint: First show dim E(1) = 1. Then show that μ_A maps E(1)⊥ to itself and use
Exercise 1.4.34.)
(e) (Cf. the remark on p. 420.) Prove that the composition of rotations in R^3 is again a rotation.
(e) (Cf. the remark on p. 420.) Prove that the composition of rotations in R3 is again a rotation.
21. Consider the linear map T: R3 -► R3 whose standard matrix is the matrix
r i i _i_ 1 _ a/5'i
6 3 ~ 6 6 3
C — 1 — 2 1 i V6
3 6 3 3'6
1 + £ 1-^6 1
1-6 ~ 3 3 6 6 J
given on p. 28. Show that T is indeed a rotation. Find the axis and angle of rotation.
22. Let A be an n x n matrix all of whose eigenvalues are real numbers. Prove that there is a basis
for R^n with respect to which the matrix for A becomes upper triangular. (Hint: Consider a basis
{v1, v2, ..., vn}, where v1 is an eigenvector.)
23. Suppose T: V -> V is a linear transformation. Suppose T is diagonalizable (i.e., there is a basis
for V consisting of eigenvectors of T). Suppose, moreover, that there is a subspace W ⊂ V with the
property that T(W) ⊂ W. Prove that there is a basis for W consisting of eigenvectors of T. (Hint:
Using Exercise 4.3.18, concoct a basis for V by starting with a basis for W. Consider the matrix for
T with respect to this basis; what is its characteristic polynomial?)
24. Suppose A and B are n x n matrices.
(a) Suppose both A and B are diagonalizable and that they have the same eigenvectors. Prove that
AB = BA.
(b) Suppose A has n distinct eigenvalues and AB = BA. Prove that every eigenvector of A is also
an eigenvector of B. Conclude that B is diagonalizable. (Query: Need every eigenvector of B be an
eigenvector of A?)
(c) Suppose A and B are diagonalizable and AB = BA. Prove that A and B are simultaneously
diagonalizable; i.e., there is a nonsingular matrix P so that both P⁻¹AP and P⁻¹BP are diagonal.
(Hint: If E(λ) is the λ-eigenspace for A, show that if v ∈ E(λ), then Bv ∈ E(λ). Now use Exercise 23.)
25. (a) Let λ ≠ μ be eigenvalues of a linear transformation. Suppose {v1, ..., vk} ⊂ E(λ) is linearly
independent and {w1, ..., wl} ⊂ E(μ) is linearly independent. Prove that {v1, ..., vk, w1, ..., wl}
is linearly independent.
(b) More generally, if λ1, ..., λk are distinct and {v1⁽ⁱ⁾, ..., v_{di}⁽ⁱ⁾} ⊂ E(λi) is linearly independent for
i = 1, ..., k, prove that {v_j⁽ⁱ⁾ : i = 1, ..., k, j = 1, ..., di} is linearly independent.
Suppose the n x n matrix A is diagonalizable: There is an invertible matrix P so that

P⁻¹AP = Λ = diag(λ1, λ2, ..., λn),

where the diagonal entries of Λ are the eigenvalues λ1, ..., λn of A. Then it is easy to use
this to calculate the powers of A:

A = PΛP⁻¹
A² = (PΛP⁻¹)² = (PΛP⁻¹)(PΛP⁻¹) = PΛ(P⁻¹P)ΛP⁻¹ = PΛ²P⁻¹
A³ = A²A = (PΛ²P⁻¹)(PΛP⁻¹) = PΛ²(P⁻¹P)ΛP⁻¹ = PΛ³P⁻¹
  ⋮
Aᵏ = PΛᵏP⁻¹.
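The identity Aᵏ = PΛᵏP⁻¹ is straightforward to test numerically. A sketch (numpy, with a sample diagonalizable matrix of my choosing):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 2.0]])
evals, P = np.linalg.eig(A)              # columns of P are eigenvectors
Ak = P @ np.diag(evals**5) @ np.linalg.inv(P)   # A^5 = P Lambda^5 P^{-1}
assert np.allclose(Ak, np.linalg.matrix_power(A, 5))
```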
We now show how linear algebra can be applied to solve some simple difference equations
and systems of differential equations. Both arise very naturally in modeling economic,
physical, and biological problems. For the most basic example, we need only take “exponential growth.” When we model a discrete growth process and stipulate that the population
doubles each year, then a_k, the population after k years, obeys the law a_{k+1} = 2a_k. When
we model a continuous growth process, we stipulate that the rate of change of the population x(t) is proportional to the population at that instant, giving the differential equation
ẋ(t) = kx(t).
► EXAMPLE 1
(A cat/mouse population problem) Suppose the cat population at month k is c_k and the mouse
population at month k is m_k, and let

x_k = [ c_k ]
      [ m_k ]

denote the population vector at month k. Suppose

x_{k+1} = A x_k,   where   A = [  0.7  0.2 ],
                               [ -0.6  1.4 ]
3 Difference Equations and Ordinary Differential Equations ◄ 437
and an initial population vector x₀ is given. Then the population vector x_k can be computed from

x_k = Aᵏ x₀.

The eigenvalues of A are 1 and 1.1, with corresponding eigenvectors

v1 = [ 2 ]   and   v2 = [ 1 ].
     [ 3 ]              [ 2 ]

Then we have

A = PΛP⁻¹,   where   P = [ 2  1 ]   and   Λ = [ 1   0  ],
                         [ 3  2 ]             [ 0  1.1 ]
and so x_k = Aᵏx₀ = PΛᵏP⁻¹x₀. In particular, if x₀ = [ c₀ ]
                                                     [ m₀ ] is the original population vector, we have

[ c_k ]   [ 2  1 ] [ 1     0    ] [  2  -1 ] [ c₀ ]
[ m_k ] = [ 3  2 ] [ 0  (1.1)ᵏ ] [ -3   2 ] [ m₀ ]

        = [ 2  1 ] [      2c₀ - m₀       ]
          [ 3  2 ] [ (1.1)ᵏ(-3c₀ + 2m₀) ]

        = (2c₀ - m₀) [ 2 ] + (1.1)ᵏ(-3c₀ + 2m₀) [ 1 ].
                     [ 3 ]                      [ 2 ]
We can now see what happens as time passes. If 3c₀ = 2m₀, the second term drops out and the population vector stays constant. If 3c₀ < 2m₀, the first term is still constant, and the second term increases
exponentially; but note that the contribution to the mouse population is double the contribution to
the cat population. And if 3c₀ > 2m₀, we see that the population vector decreases exponentially, the
mouse population being the first to disappear (why?).
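The closed form can be checked against direct iteration of x_{k+1} = A x_k. A sketch (numpy, with an initial population of my choosing):

```python
import numpy as np

A = np.array([[0.7, 0.2],
              [-0.6, 1.4]])
c0, m0 = 4.0, 5.0
x = np.array([c0, m0])
for _ in range(20):                      # iterate x_{k+1} = A x_k
    x = A @ x

# closed form: (2c0 - m0)(2,3) + 1.1^k (-3c0 + 2m0)(1,2)
closed = (2*c0 - m0)*np.array([2.0, 3.0]) \
       + 1.1**20 * (-3*c0 + 2*m0)*np.array([1.0, 2.0])
assert np.allclose(x, closed)
```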
The story for a general diagonalizable matrix A is the same. The column vectors of P
are the eigenvectors v1, ..., vn, and the diagonal entries of Λᵏ are λ1ᵏ, ..., λnᵏ, and so, letting

P⁻¹x₀ = [ c1 ]
        [ ⋮  ],
        [ cn ]

we have

(*)   x_k = Aᵏx₀ = PΛᵏP⁻¹x₀ = c1 λ1ᵏ v1 + ··· + cn λnᵏ vn.

This formula will have all the information we need, and we will see physical interpretations
of analogous formulas when we discuss systems of differential equations shortly.
► EXAMPLE 2
The famous Fibonacci sequence

1, 1, 2, 3, 5, 8, 13, 21, 34, ...

is obtained by letting each number (starting with the third) be the sum of the preceding two: If we let
a_k denote the k-th number in the sequence (and set a₀ = 0), then a_{k+2} = a_{k+1} + a_k.
Thus, if we define

x_k = [ a_k     ],   k ≥ 0,
      [ a_{k+1} ]

then we can encode the pattern of the sequence in the matrix equation

x_{k+1} = [ a_{k+1} ] = [ 0  1 ] [ a_k     ] = A x_k.
          [ a_{k+2} ]   [ 1  1 ] [ a_{k+1} ]

Once again, by computing the powers of the matrix A, we can calculate x_k = Aᵏx₀, and hence the k-th
term in the Fibonacci sequence.
The characteristic polynomial of A is p(t) = t² - t - 1, and so the eigenvalues are

λ1 = (1 + √5)/2   and   λ2 = (1 - √5)/2,

with corresponding eigenvectors

v1 = [  1 ]   and   v2 = [  1 ].
     [ λ1 ]              [ λ2 ]

Then

P = [  1   1  ]   and   P⁻¹ = 1/(λ1 - λ2) [ -λ2   1 ],
    [ λ1  λ2 ]                            [  λ1  -1 ]

and, since λ1 - λ2 = √5 and x₀ = (0, 1), we have

x_k = PΛᵏP⁻¹x₀ = (1/√5) [ λ1ᵏ - λ2ᵏ         ].
                        [ λ1ᵏ⁺¹ - λ2ᵏ⁺¹ ]
In particular, reading off the first coordinate of this vector, we find that the k-th number in the Fibonacci
sequence is

a_k = (1/√5) ( ((1 + √5)/2)ᵏ - ((1 - √5)/2)ᵏ ).

It’s far from obvious (at least to the author) that each such number is an integer. We would be remiss
if we didn’t point out one of the classic facts about the Fibonacci sequence: If we take the ratio of
successive terms, we get

a_{k+1}/a_k = (λ1ᵏ⁺¹ - λ2ᵏ⁺¹)/(λ1ᵏ - λ2ᵏ),

and, since |λ2| < 1 < λ1, it follows that

lim_{k→∞} a_{k+1}/a_k = λ1 ≈ 1.618.
Here, and throughout this section, we use a dot to represent differentiation with respect to
t (time). The main problem we address in this section is the following: Given an n x n (constant)
matrix A and a vector x₀ ∈ R^n, we wish to find all differentiable vector-valued functions
x(t) so that

(†)   ẋ(t) = Ax(t),   x(0) = x₀.
► EXAMPLE 3
Suppose n = 1, so that A = [a] for some real number a. Then we have simply the ordinary differential
equation

ẋ(t) = ax(t),   x(0) = x₀.

The trick of separating variables that the reader most likely learned in her integral calculus course
leads to the solution x(t) = x₀e^{at}. As we can easily check, ẋ(t) = ax(t), so we have in fact found a
solution. Do we know there can be no more? Suppose y(t) were any solution of the original problem.
Then the function z(t) = y(t)e^{-at} satisfies the equation

ż(t) = ẏ(t)e^{-at} - ay(t)e^{-at} = ay(t)e^{-at} - ay(t)e^{-at} = 0,

and so z(t) must be a constant function. Since z(0) = y(0) = x₀, we see that y(t) = x₀e^{at}. The
original differential equation (with its initial condition) has a unique solution. ◄
► EXAMPLE 4
Generalizing Example 3, consider the uncoupled system of differential equations

ẋ1(t) = a x1(t)
ẋ2(t) = b x2(t)

with the initial conditions x1(0) = (x1)₀, x2(0) = (x2)₀. In matrix notation, this is the ODE

ẋ(t) = [ a  0 ] x(t),   x(0) = [ (x1)₀ ].
       [ 0  b ]                [ (x2)₀ ]

Since x1(t) and x2(t) appear completely independently in these equations, we infer from Example 3
that the unique solution of this system of equations will be

x(t) = [ x1(t) ] = [ e^{at}    0    ] x₀ = E(t)x₀,
       [ x2(t) ]   [   0    e^{bt} ]

where E(t) is the diagonal 2 x 2 matrix with entries e^{at} and e^{bt}. This result is easily generalized to
the case of a diagonal n x n matrix.
Recall that for any real number x, we have the Taylor series expansion

eˣ = Σ_{k=0}^{∞} xᵏ/k! = 1 + x + (1/2!)x² + (1/3!)x³ + ··· + (1/k!)xᵏ + ···.

Now, given an n x n matrix A, we define a new n x n matrix e^A, called the exponential of
A, by

e^A = Σ_{k=0}^{∞} (1/k!)Aᵏ = I + A + (1/2!)A² + (1/3!)A³ + ···.
That the series converges is immediate from Proposition 1.1 of Chapter 6. In general,
however, trying to evaluate this series directly is extremely difficult because the coefficients
of Aᵏ are not easily expressed in terms of the coefficients of A. However, when A is a
diagonalizable matrix, it is easy to compute e^A: There is an invertible matrix P so that
Λ = P⁻¹AP is diagonal. Thus, A = PΛP⁻¹ and Aᵏ = PΛᵏP⁻¹ for all k ∈ N, and so

e^A = Σ_{k=0}^{∞} (1/k!)PΛᵏP⁻¹ = P ( Σ_{k=0}^{∞} (1/k!)Λᵏ ) P⁻¹ = Pe^ΛP⁻¹.
► EXAMPLE 5
Let

A = [ 2   0 ].
    [ 3  -1 ]

Then A = PΛP⁻¹, where

Λ = [ 2   0 ]   and   P = [ 1  0 ].
    [ 0  -1 ]             [ 1  1 ]

Then we have

e^A = Pe^ΛP⁻¹ = [ 1  0 ] [ e²   0  ] [  1  0 ] = [    e²      0  ].
                [ 1  1 ] [ 0   e⁻¹ ] [ -1  1 ]   [ e² - e⁻¹  e⁻¹ ]
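The value of e^A obtained by diagonalization can be compared with partial sums of the exponential series itself. A sketch (numpy, using the matrix and eigenvectors of this example):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [3.0, -1.0]])
P = np.array([[1.0, 0.0],          # columns: eigenvectors for 2 and -1
              [1.0, 1.0]])
expA = P @ np.diag([np.e**2, np.e**-1]) @ np.linalg.inv(P)

# partial sums of e^A = sum A^k / k!
series = np.zeros((2, 2))
term = np.eye(2)
for k in range(1, 30):
    series += term
    term = term @ A / k
assert np.allclose(series, expA)
```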
The result of Example 4 generalizes to the n x n case. Indeed, whenever we can solve
a problem for diagonal matrices, we can solve it for diagonalizable matrices by making the
appropriate change of basis. So we should not be surprised by the following result.
Proposition 3.1 Suppose A is diagonalizable. Then the unique solution of the initial
value problem ẋ(t) = Ax(t), x(0) = x₀, is x(t) = e^{tA}x₀.

Proof For a diagonal matrix Λ, the derivative of e^{tΛ} = diag(e^{λ1 t}, ..., e^{λn t})
is obviously

diag(λ1 e^{λ1 t}, ..., λn e^{λn t}) = Λe^{tΛ}.

If A = PΛP⁻¹, then we have

(e^{tA})′ = (Pe^{tΛ}P⁻¹)′ = P(e^{tΛ})′P⁻¹
          = P(Λe^{tΛ})P⁻¹
          = (PΛP⁻¹)(Pe^{tΛ}P⁻¹) = Ae^{tA},

as required.
Now suppose that y(t) is a solution of the equation (†), and consider the vector function
z(t) = e^{-tA}y(t). Then by the product rule, we have

ż(t) = (e^{-tA})′y(t) + e^{-tA}ẏ(t)
     = -Ae^{-tA}y(t) + e^{-tA}(Ay(t)) = (-Ae^{-tA} + e^{-tA}A)y(t) = 0,

as Ae^{-tA} = e^{-tA}A. This implies that z(t) must be a constant vector, and so
y(t) = e^{tA}x₀. ∎
► EXAMPLE 6
Continuing Example 5, we see that the general solution of the system ẋ(t) = Ax(t) has the form

x(t) = e^{tA} [ c1 ] = Pe^{tΛ}P⁻¹ [ c1 ],
              [ c2 ]              [ c2 ]

and we obtain the familiar linear combination of the columns of P (which are the eigenvectors of A). If,
in particular, we wish to study the long-term behavior of the solution, we observe that lim_{t→∞} e^{-t} = 0
and lim_{t→∞} e^{2t} = ∞, so that x(t) behaves like a multiple of e^{2t} [ 1 ]
                                                                        [ 1 ] as t → ∞. In general, this type of analysis of
diagonalizable systems is called normal mode analysis, and the vector functions

e^{2t} [ 1 ]   and   e^{-t} [ 0 ],
       [ 1 ]                [ 1 ]

corresponding to the eigenvectors, are called the normal modes of the system. ◄
To emphasize the analogy with the solution of difference equations earlier and the
formula (*) on p. 438, we rephrase Proposition 3.1 to highlight the normal modes.
The solution of $\dot{\mathbf{x}}(t) = A\mathbf{x}(t)$, $\mathbf{x}(0) = \mathbf{x}_0$, is
$$\mathbf{x}(t) = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix}\begin{bmatrix} e^{\lambda_1 t} & & \\ & \ddots & \\ & & e^{\lambda_n t} \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}, \tag{**}$$
where
$$\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = P^{-1}\mathbf{x}_0.$$
Note that the general solution is a linear combination of the normal modes $e^{\lambda_1 t}\mathbf{v}_1, \dots, e^{\lambda_n t}\mathbf{v}_n$.
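The normal-mode decomposition can be carried out directly on a computer. A minimal sketch, assuming NumPy is available; the initial condition is an arbitrary illustrative choice:

```python
import numpy as np

# Normal-mode solution of x'(t) = Ax(t) for the diagonalizable matrix of Example 5.
A = np.array([[2.0, 0.0], [3.0, -1.0]])
eigvals, P = np.linalg.eig(A)        # columns of P are eigenvectors
x0 = np.array([1.0, 2.0])            # an arbitrary initial condition
c = np.linalg.solve(P, x0)           # c = P^{-1} x0

def x(t):
    # linear combination of the normal modes e^{λ_i t} v_i
    return sum(c[i] * np.exp(eigvals[i] * t) * P[:, i] for i in range(len(c)))

# Sanity checks: x(0) = x0, and x satisfies the ODE (finite-difference derivative)
h = 1e-6
deriv = (x(1.0 + h) - x(1.0 - h)) / (2 * h)
print(np.allclose(x(0.0), x0), np.allclose(deriv, A @ x(1.0), rtol=1e-4))
```

The centered difference approximates $\dot{\mathbf{x}}(1)$, which should equal $A\mathbf{x}(1)$ for a genuine solution.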
444 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications
Even when A is not diagonalizable, we may differentiate the exponential series term-
by-term2 to obtain
$$\frac{d}{dt}\,e^{tA} = \frac{d}{dt}\left(I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots + \frac{t^k}{k!}A^k + \frac{t^{k+1}}{(k+1)!}A^{k+1} + \cdots\right)$$
$$= A + tA^2 + \frac{t^2}{2!}A^3 + \cdots + \frac{t^{k-1}}{(k-1)!}A^k + \frac{t^k}{k!}A^{k+1} + \cdots$$
$$= A\left(I + tA + \frac{t^2}{2!}A^2 + \cdots + \frac{t^{k-1}}{(k-1)!}A^{k-1} + \frac{t^k}{k!}A^k + \cdots\right) = Ae^{tA}.$$
Thus, we have
Theorem 3.3 Suppose A is an n × n matrix. Then the unique solution of the initial value problem
$$\dot{\mathbf{x}}(t) = A\mathbf{x}(t), \qquad \mathbf{x}(0) = \mathbf{x}_0,$$
is $\mathbf{x}(t) = e^{tA}\mathbf{x}_0$.
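Theorem 3.3 is easy to test numerically: $e^{tA}\mathbf{x}_0$ should agree with a direct numerical integration of the system. A minimal sketch, assuming NumPy is available; the matrix and initial condition are arbitrary illustrative choices:

```python
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # eigenvalues -1 and -2
x0 = np.array([1.0, 0.0])
t_final, n = 1.0, 2000
h = t_final / n

# classical RK4 integration of x'(t) = Ax(t)
x = x0.copy()
for _ in range(n):
    k1 = A @ x
    k2 = A @ (x + h / 2 * k1)
    k3 = A @ (x + h / 2 * k2)
    k4 = A @ (x + h * k3)
    x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# e^{tA} x0 via diagonalization
w, P = np.linalg.eig(A)
exact = P @ np.diag(np.exp(w * t_final)) @ np.linalg.solve(P, x0)
print(np.allclose(x, exact, atol=1e-8))  # True
```

Any convergent one-step method would do here; RK4 is used only because its error at this step size is far below the tolerance.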
► EXAMPLE 7
We solve the system $\dot{\mathbf{x}}(t) = A\mathbf{x}(t)$ with
$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$
The unsophisticated (but tricky) approach is to write this system out explicitly:
$$\dot{x}_1(t) = -x_2(t)$$
$$\dot{x}_2(t) = x_1(t).$$
That is, our vector function x(t) satisfies the second-order differential equation $\ddot{\mathbf{x}}(t) = -\mathbf{x}(t)$, so $x_1(t) = a_1\cos t + b_1\sin t$ and $x_2(t) = a_2\cos t + b_2\sin t$
for some constants $a_1$, $a_2$, $b_1$, and $b_2$ (although it is far from obvious that these are the only solutions).
Some information was lost in the process; in particular, since $\dot{x}_1 = -x_2$, the constants must satisfy the equations $b_1 = -a_2$ and $a_1 = b_2$.
²See Spivak, Calculus, 3rd ed., Chap. 24, for the proof in the real case; Proposition 1.1 of Chapter 6 applies to show that it works for the matrix case as well.
The sophisticated approach is to apply Theorem 3.3: the solution is $\mathbf{x}(t) = e^{tA}\mathbf{x}_0$, where we compute
$$e^{tA} = I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \frac{t^4}{4!}A^4 + \cdots$$
$$= \begin{bmatrix}1&0\\0&1\end{bmatrix} + t\begin{bmatrix}0&-1\\1&0\end{bmatrix} + \frac{t^2}{2!}\begin{bmatrix}-1&0\\0&-1\end{bmatrix} + \frac{t^3}{3!}\begin{bmatrix}0&1\\-1&0\end{bmatrix} + \frac{t^4}{4!}\begin{bmatrix}1&0\\0&1\end{bmatrix} + \cdots.$$
Since the power series expansions (Taylor series) for sine and cosine are, indeed, $\sin t = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots$ and $\cos t = 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \cdots$, we conclude that
$$e^{tA} = \begin{bmatrix}\cos t & -\sin t\\ \sin t & \cos t\end{bmatrix}.$$
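The identification of $e^{tA}$ with a rotation matrix for $A = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$ can be confirmed numerically. A minimal sketch, assuming NumPy is available; the truncated series stands in for the exponential:

```python
import numpy as np

# For A = [[0,-1],[1,0]], the exponential series gives e^{tA} = rotation by t.
A = np.array([[0.0, -1.0], [1.0, 0.0]])

def expm_series(M, terms=40):
    """Matrix exponential via the truncated defining power series."""
    result, term = np.eye(2), np.eye(2)
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

t = 0.7
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
print(np.allclose(expm_series(t * A), R))  # True
```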
► EXAMPLE 8
The system
$$\dot{x}_1(t) = 2x_1 + x_2$$
$$\dot{x}_2(t) = 2x_2$$
³But we must remind you of the famous formula, usually attributed to Euler: $e^{it} = \cos t + i\sin t$.
is already partially uncoupled, so we know that $x_2(t)$ must take the form $x_2(t) = ce^{2t}$ for some constant
c. Now, in order to find $x_1(t)$, we must solve the inhomogeneous ODE
$$\dot{x}_1(t) = 2x_1(t) + ce^{2t}.$$
In elementary differential equations courses, one is taught to look for a solution of the form $x_1(t) = ae^{2t} + bte^{2t}$; in this case, $\dot{x}_1(t) - 2x_1(t) = be^{2t}$, and so taking $b = c$ gives the desired solution of our equation. That is, the solution of the system is the vector function
$$\mathbf{x}(t) = \begin{bmatrix} ae^{2t} + cte^{2t} \\ ce^{2t} \end{bmatrix}.$$
The explanation of the trick is quite simple. Let's calculate the matrix exponential $e^{tA}$ by writing
$$A = \begin{bmatrix}2&1\\0&2\end{bmatrix} = 2I + B, \quad\text{where}\quad B = \begin{bmatrix}0&1\\0&0\end{bmatrix}.$$
Since $B^2 = O$ and 2I commutes with B, we have $(2I + B)^k = (2I)^k + k(2I)^{k-1}B$, and so
$$e^{tA} = \sum_{k=0}^{\infty}\frac{t^k}{k!}(2I+B)^k = \sum_{k=0}^{\infty}\frac{(2t)^k}{k!}I + t\sum_{k=1}^{\infty}\frac{(2t)^{k-1}}{(k-1)!}B = e^{2t}I + te^{2t}B = \begin{bmatrix}e^{2t} & te^{2t}\\ 0 & e^{2t}\end{bmatrix}.$$
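The nilpotent-splitting formula $e^{tA} = e^{2t}(I + tB)$ can be checked against the series definition. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

# The trick A = 2I + B with B nilpotent: e^{tA} = e^{2t}(I + tB).
B = np.array([[0.0, 1.0], [0.0, 0.0]])
A = 2 * np.eye(2) + B

def expm_series(M, terms=60):
    """Matrix exponential via the truncated defining power series."""
    result, term = np.eye(2), np.eye(2)
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

t = 0.5
closed_form = np.exp(2 * t) * np.array([[1.0, t], [0.0, 1.0]])
print(np.allclose(expm_series(t * A), closed_form))  # True
```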
Let's consider the general nth-order linear ODE with constant coefficients:
$$y^{(n)}(t) + a_{n-1}y^{(n-1)}(t) + \cdots + a_1\dot{y}(t) + a_0y(t) = 0. \tag{★}$$
Here $a_0, a_1, \dots, a_{n-1}$ are scalars, and y(t) is assumed to be $\mathcal{C}^n$; $y^{(k)}$ denotes its kth derivative.
We can use the power of Theorem 3.3 to derive the following general result.
Theorem 3.4 The set of solutions of the differential equation (★) is an n-dimensional vector space.

Proof The trick is to concoct a way to apply Theorem 3.3. We introduce the vector
function x(t) defined by
$$\mathbf{x}(t) = \begin{bmatrix} y(t) \\ \dot{y}(t) \\ \vdots \\ y^{(n-1)}(t) \end{bmatrix};$$
then (★) is equivalent to the first-order system $\dot{\mathbf{x}}(t) = A\mathbf{x}(t)$,
where A is the obvious matrix of coefficients. We infer from Theorem 3.3 that the general
solution is $\mathbf{x}(t) = e^{tA}\mathbf{x}_0$, so
$$\begin{bmatrix} y(t) \\ \dot{y}(t) \\ \vdots \\ y^{(n-1)}(t) \end{bmatrix} = e^{tA}\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{n-1} \end{bmatrix} = c_0\mathbf{v}_1(t) + c_1\mathbf{v}_2(t) + \cdots + c_{n-1}\mathbf{v}_n(t),$$
where $\mathbf{v}_j(t)$ are the columns of $e^{tA}$. In particular, if we let $q_1(t), \dots, q_n(t)$ denote the first
entries of the vector functions $\mathbf{v}_1(t), \dots, \mathbf{v}_n(t)$, respectively, we see that
$$y(t) = c_0q_1(t) + c_1q_2(t) + \cdots + c_{n-1}q_n(t);$$
that is, the functions $q_1, \dots, q_n$ span the vector space of solutions of the differential equation
(★). Note that these functions are $\mathcal{C}^\infty$ since the entries of $e^{tA}$ are. Last, we claim that these
functions are linearly independent. Suppose that for some scalars $c_0, c_1, \dots, c_{n-1}$ we have
$$c_0q_1(t) + c_1q_2(t) + \cdots + c_{n-1}q_n(t) = 0 \quad\text{for all } t;$$
then, differentiating, we have the same linear relation among the kth derivatives of $q_1, \dots, q_n$, $k = 1, \dots, n-1$, and so we have
$$e^{tA}\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_{n-1} \end{bmatrix} = \mathbf{0}.$$
Since $e^{tA}$ is an invertible matrix (see Exercise 17), we infer that $c_0 = c_1 = \cdots = c_{n-1} = 0$,
and so $\{q_1, \dots, q_n\}$ is linearly independent. ■
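The proof's reduction of an nth-order equation to a first-order system is directly computable. A minimal sketch, assuming NumPy is available; the equation $\ddot{y} - 3\dot{y} + 2y = 0$ with $y(0)=1$, $\dot{y}(0)=0$ is an illustrative choice, whose solution is $y(t) = 2e^t - e^{2t}$:

```python
import numpy as np

# Companion (first-order) system for y'' - 3y' + 2y = 0, i.e. y'' = -2y + 3y'.
A = np.array([[0.0, 1.0],    # x = (y, y'), so x' = (y', y'') = Ax
              [-2.0, 3.0]])
x0 = np.array([1.0, 0.0])    # y(0) = 1, y'(0) = 0

def expm_series(M, terms=60):
    """Matrix exponential via the truncated defining power series."""
    result, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

t = 1.0
y = (expm_series(t * A) @ x0)[0]   # first entry of e^{tA} x0 recovers y(t)
print(np.isclose(y, 2 * np.exp(t) - np.exp(2 * t)))  # True
```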
► EXAMPLE 9
Consider the system of second-order differential equations
$$\ddot{\mathbf{x}}(t) = A\mathbf{x}(t), \quad\text{where}\quad A = \begin{bmatrix} -3 & 2 \\ 2 & -3 \end{bmatrix}.$$
The eigenvalues of A are $-1$ and $-5$, with corresponding eigenvectors
$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
(Note, as a check, that because A is symmetric, the eigenvectors are orthogonal. See Exercise 9.2.8.)
As usual, we write $P^{-1}AP = \Lambda$, where
$$P = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \quad\text{and}\quad \Lambda = \begin{bmatrix} -1 & 0 \\ 0 & -5 \end{bmatrix},$$
and set $\mathbf{y} = P^{-1}\mathbf{x}$, i.e.,
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$
Then $\ddot{\mathbf{y}} = \Lambda\mathbf{y}$, i.e.,
$$\ddot{y}_1(t) = -y_1$$
$$\ddot{y}_2(t) = -5y_2,$$
and so $y_1(t) = a_1\cos t + b_1\sin t$ and $y_2(t) = a_2\cos\sqrt{5}\,t + b_2\sin\sqrt{5}\,t$.
The four constants can be determined from the initial conditions $\mathbf{x}_0$ and $\dot{\mathbf{x}}_0$. In particular, if we start
with $\mathbf{x}_0 = \begin{bmatrix}1\\0\end{bmatrix}$ and $\dot{\mathbf{x}}_0 = \mathbf{0}$, then $a_1 = a_2 = \frac{1}{2}$ and $b_1 = b_2 = 0$. Note that the form of our solution looks very much like the
normal mode decomposition of the solution (**) of the first-order system earlier.
A physical system that leads to this differential equation is the following. Hooke's Law says that
a spring with spring constant k exerts a restoring force F = −kx on a mass m that is displaced x units
from its equilibrium position (corresponding to the "natural length" of the spring). Now imagine a
system, as pictured in Figure 3.1, consisting of two masses (m₁ and m₂) connected to each other and
to walls by three springs (with spring constants k₁, k₂, and k₃). Denote by x₁ and x₂ the displacement
of masses m₁ and m₂, respectively, from equilibrium position. Hooke's Law, as stated above, and
Newton's second law of motion ("force = mass × acceleration") give us the following system of
equations:
$$m_1\ddot{x}_1 = -k_1x_1 + k_2(x_2 - x_1)$$
$$m_2\ddot{x}_2 = -k_2(x_2 - x_1) - k_3x_2.$$
Setting m₁ = m₂ = 1, k₁ = k₃ = 1, and k₂ = 2 gives the system of differential equations with which
we began. Here the normal modes correspond to sinusoidal motion with x₁ = x₂ (so we observe the
masses moving "in parallel," the middle spring staying at its natural length) and frequency 1 and
sinusoidal motion with x₁ = −x₂ (so we observe the masses moving "in antiparallel," the middle
spring compressing symmetrically) and frequency √5.
Figure 3.1
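The mode frequencies of the spring system can be read off from the eigenvalues of A: for $\ddot{\mathbf{x}} = A\mathbf{x}$ with A symmetric negative definite, each eigenvalue $\lambda$ contributes oscillation at frequency $\sqrt{-\lambda}$. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

# Two-mass, three-spring system: m1 = m2 = 1, k1 = k3 = 1, k2 = 2.
A = np.array([[-3.0, 2.0], [2.0, -3.0]])
eigvals, Q = np.linalg.eigh(A)      # eigh: symmetric eigensolver, ascending order
freqs = np.sqrt(-eigvals)           # frequencies sqrt(-λ)
print(sorted(freqs))                # 1 and sqrt(5), matching the text
print(Q)                            # orthonormal eigenvectors (the mode shapes)
```

The columns of Q are the "parallel" and "antiparallel" mode shapes, normalized to unit length.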
We will write the solution of this initial value problem as $\phi_t(\mathbf{x}_0)$, indicating its functional
dependence on both time and the initial value. The function $\phi$ is called the flow of the vector
field F. Note that $\phi_0(\mathbf{x}) = \mathbf{x}$ for all $\mathbf{x} \in U$.
► EXAMPLE 10
a. The flow of the vector field F(x) = x on ℝ is $\phi_t(x) = e^tx$.
b. The flow of the vector field $\mathbf{F}(x, y) = \begin{bmatrix}-y\\x\end{bmatrix}$ on ℝ² is
$$\phi_t\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}\cos t & -\sin t\\ \sin t & \cos t\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}.$$
d. The flow of the general linear differential equation $\dot{\mathbf{x}}(t) = A\mathbf{x}(t)$ is given by $\phi_t(\mathbf{x}) = e^{tA}\mathbf{x}$.
Finding an explicit formula for the flow of a nonlinear differential equation may be somewhat
difficult.
It is proved in more advanced courses that if F is a smooth vector field on an open set
$U \subset \mathbb{R}^n$, then for any $\mathbf{x} \in U$, there are a neighborhood V of x and $\varepsilon > 0$ so that for any
$\mathbf{y} \in V$ the flow starting at y, $\phi_t(\mathbf{y})$, is defined for all $|t| < \varepsilon$. Moreover, the function $\phi\colon V \times (-\varepsilon, \varepsilon) \to \mathbb{R}^n$, $\phi(\mathbf{y}, t) = \phi_t(\mathbf{y})$, is smooth. We now want to give another interpretation of the divergence
$$\operatorname{div}\mathbf{F} = \frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2} + \cdots + \frac{\partial F_n}{\partial x_n}.$$
Proposition 3.5 Let F be a smooth vector field on $U \subset \mathbb{R}^n$, let $\phi_t$ denote the flow
of F, and let $\Omega \subset U$ be a compact region with piecewise smooth boundary. Let $V(t) = \operatorname{vol}(\phi_t(\Omega))$. Then $V'(0) = \int_\Omega \operatorname{div}\mathbf{F}\,dV$.
Remark Using (the obvious generalization of) the Divergence Theorem, Theorem
6.2 of Chapter 8, we have the intuitively appealing result that $V'(0) = \int_{\partial\Omega} \mathbf{F}\cdot\mathbf{n}\,dS$. That is,
what causes net increase in the volume of the region is flow across its boundary.
► EXAMPLE 11
a. In Figure 3.2, we see the flow of the unit square under the vector field F.
b. In Figure 3.3 (with thanks to John Polking's MATLAB software pplane5), we see the flow
of certain regions Ω. In (a), the region expands (as div F > 0), whereas in (b) the region
maintains its area (as div F = 0).
Figure 3.3: (a) the system $\dot{x} = x + 2y$, $\dot{y} = 5x + y$ (div F = 2 > 0); (b) the system $\dot{x} = -x + 2y$, $\dot{y} = 5x + y$ (div F = 0).
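For the linear fields F(x) = Ax in Figure 3.3, the volume change is exactly Liouville's formula $\det(e^{tA}) = e^{t\,\operatorname{tr}A}$, where tr A = div F; this quantifies Proposition 3.5. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

def expm_series(M, terms=80):
    """Matrix exponential via the truncated defining power series."""
    result, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

A_expand = np.array([[1.0, 2.0], [5.0, 1.0]])     # div F = tr A = 2 > 0
A_preserve = np.array([[-1.0, 2.0], [5.0, 1.0]])  # div F = tr A = 0
t = 0.3
# The flow e^{tA} scales area by det(e^{tA}) = e^{t tr A}.
print(np.isclose(np.linalg.det(expm_series(t * A_expand)), np.exp(2 * t)))  # True
print(np.isclose(np.linalg.det(expm_series(t * A_preserve)), 1.0))          # True
```

The second determinant equals 1, confirming that the flow in panel (b) is area-preserving.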
Proof We have
$$V(t) = \int_{\phi_t(\Omega)} dx_1 \wedge \cdots \wedge dx_n = \int_{\Omega} d(\phi_t)_1 \wedge \cdots \wedge d(\phi_t)_n.$$
Differentiating under the integral sign at t = 0 and using the fact that $\phi_0$ is the identity map, we obtain $V'(0) = \int_\Omega \operatorname{div}\mathbf{F}\,dV$, as required. ■
► EXERCISES 9.3
1. Let $A = \begin{bmatrix} 2 & 5 \\ 1 & -2 \end{bmatrix}$. Calculate $A^k$ for all $k \geq 1$.
*2. Suppose each of two tubs contains two bottles of beer; two are Budweiser and two are Beck's.
Each minute, Fraternity Freddy picks a bottle of beer at random from each tub and replaces it in the
other tub. After a long time, what portion of the time will there be exactly one bottle of Beck's in the
first tub? At least one bottle of Beck's? (Hint: Let $\mathbf{x}_k$ be the vector whose entries are, respectively,
the probabilities that there are two Beck's, one of each, or two Buds in the first tub.)
*3. Gambling Gus has $200 and plays a game where he must continue playing until he has either lost
all his money or doubled it. In each game, he has a 2/5 chance of winning $100 and a 3/5 chance of
losing $100. What is the probability that he eventually loses all his money? (Warning: Calculator or
computer suggested.)
*4. If $a_0 = 2$, $a_1 = 3$, and $a_{k+1} = 3a_k - 2a_{k-1}$ for all $k \geq 1$, use methods of linear algebra to determine the formula for $a_k$.
5. If $a_0 = a_1 = 1$ and $a_{k+1} = a_k + 6a_{k-1}$ for all $k \geq 1$, use methods of linear algebra to determine
the formula for $a_k$.
6. Suppose $a_0 = 0$, $a_1 = 1$, and $a_{k+1} = 3a_k + 4a_{k-1}$ for all $k \geq 1$. Use methods of linear algebra to
find an explicit formula for $a_k$.
7. If $a_0 = 0$, $a_1 = 1$, and $a_{k+1} = 4a_k - 4a_{k-1}$ for all $k \geq 1$, use methods of linear algebra to determine
the formula for $a_k$. (Hint: The matrix will not be diagonalizable, but you can get close if you stare at
Exercise 9.2.16.)
*8. If $a_0 = 0$, $a_1 = a_2 = 1$, and $a_{k+1} = 2a_k + a_{k-1} - 2a_{k-2}$ for $k \geq 2$, use methods of linear algebra
to determine the formula for $a_k$.
9. Consider the cat/mouse population problem studied in Example 1. Solve the following versions,
including an investigation of the dependence on the original populations.
(a) $c_{k+1} = 0.7c_k + 0.1m_k$, $m_{k+1} = -0.2c_k + m_k$
*(b) $c_{k+1} = 1.3c_k + 0.2m_k$, $m_{k+1} = -0.1c_k + m_k$
(c) $c_{k+1} = 1.1c_k + 0.3m_k$, $m_{k+1} = 0.1c_k + 0.9m_k$
What conclusions do you draw?
10. Check that if A is an n × n matrix and the n × n differentiable matrix function E(t) satisfies
$E'(t) = AE(t)$ and $E(0) = I$, then $E(t) = e^{tA}$ for all $t \in \mathbb{R}$.
11. Calculate $e^{tA}$ and use your answer to solve $\dot{\mathbf{x}}(t) = A\mathbf{x}(t)$, $\mathbf{x}(0) = \mathbf{x}_0$.
'(a) A = *(d) A =
(b) A = *(e) A =
(c) A = (f) A —
2
*(d) A=
1
13. Find the motion of the two-mass, three-spring system in Example 9 when
(a) $m_1 = m_2 = 1$ and $k_1 = k_3 = 1$, $k_2 = 3$,
(b) $m_1 = m_2 = 1$ and $k_1 = 1$, $k_2 = 2$, $k_3 = 4$,
*(c) $m_1 = 1$, $m_2 = 2$, $k_1 = 1$, and $k_2 = k_3 = 2$.
*14. Let
1
2
Calculate etJ.
*15. By mimicking the proof of Theorem 3.4, convert the following second-order differential equations into first-order systems and use matrix exponentials to solve:
(a) $\ddot{y}(t) - \dot{y}(t) - 2y(t) = 0$, $y(0) = -1$, $\dot{y}(0) = 4$,
(b) $\ddot{y}(t) - 2\dot{y}(t) + y(t) = 0$, $y(0) = 1$, $\dot{y}(0) = 2$.
16. Let $a, b \in \mathbb{R}$. Convert the constant coefficient second-order differential equation
$$\ddot{y}(t) + a\dot{y}(t) + by(t) = 0$$
into a first-order system by letting $\mathbf{x}(t) = \begin{bmatrix} y(t) \\ \dot{y}(t) \end{bmatrix}$. Considering separately the cases $a^2 - 4b \neq 0$
and $a^2 - 4b = 0$, use matrix exponentials to find the general solution.
17. (a) Prove that for any square matrix A, $(e^A)^{-1} = e^{-A}$. (Hint: Show $(e^{tA})^{-1} = e^{-tA}$ for all
$t \in \mathbb{R}$.)
(b) Prove that if A is skew-symmetric (i.e., $A^{\mathsf{T}} = -A$), then $e^A$ is an orthogonal matrix.
(c) Prove that when the eigenvalues of A are real, $\det(e^A) = e^{\operatorname{tr}A}$. (Hint: Prove the result when A is
diagonalizable and then use continuity to establish it in general. Alternatively, apply Exercise 9.2.22.)
18. Consider the mapping $\exp\colon \mathcal{M}_{n\times n} \to \mathcal{M}_{n\times n}$ given by $\exp(A) = e^A$. By Exercise 17, $e^A$ is
always invertible.
(a) Use the Inverse Function Theorem to show that for every matrix B sufficiently close to I, there
is a unique A sufficiently close to O so that $e^A = B$.
19. Use Proposition 3.5 to deduce that the derivative with respect to r of the volume of a ball of
radius r (in Rn) is the volume (surface area) of the sphere of radius r.
20. It can be proved using (a generalization of) the Contraction Mapping Principle, Theorem 1.2 of
Chapter 6, that when F is a smooth vector field, given a, there are $\delta, \varepsilon > 0$ so that the differential
equation $\dot{\mathbf{x}}(t) = \mathbf{F}(\mathbf{x}(t))$, $\mathbf{x}(0) = \mathbf{x}_0$, has a unique solution for all $\mathbf{x}_0 \in B(\mathbf{a}, \delta)$ and defined for all
$|t| < \varepsilon$.
(a) Assuming this result, prove that whenever $|s|$, $|t|$, and $|s + t| < \varepsilon$, we have $\phi_{s+t} = \phi_s \circ \phi_t$. (Hint:
Fix $t = t_0$ and vary s.)
(b) Deduce that $\phi_{-t} = (\phi_t)^{-1}$.
(c) By considering the example $F(x) = \sqrt{|x|}$, show that uniqueness may fail when the vector field
isn't smooth. Indeed, show that the initial value problem $\dot{x}(t) = \sqrt{|x(t)|}$, $x(0) = 0$, has infinitely
many solutions.
21. Generalizing Proposition 3.5 somewhat, prove that $V'(t) = \int_{\phi_t(\Omega)} \operatorname{div}\mathbf{F}\,dV$. (Hint: Use Exercise
4 The Spectral Theorem

Consider the symmetric 2 × 2 matrix
$$A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}.$$
Only when A is diagonal are the eigenvalues not distinct. Thus, A is diagonalizable.
Moreover, the corresponding eigenvectors are
$$\mathbf{v}_1 = \begin{bmatrix} b \\ \lambda_1 - a \end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix} \lambda_2 - c \\ b \end{bmatrix};$$
note that
$$\mathbf{v}_1 \cdot \mathbf{v}_2 = b(\lambda_2 - c) + (\lambda_1 - a)b = b\bigl(\lambda_1 + \lambda_2 - (a + c)\bigr) = 0,$$
and so the eigenvectors are orthogonal. Since there is an orthogonal basis for ℝ² consisting
of eigenvectors of A, we of course have an orthonormal basis for ℝ² consisting of eigenvectors of A. That is, by an appropriate rotation of the usual basis, we obtain a diagonalizing
basis for A.
► EXAMPLE 1
The eigenvalues of
$$A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}$$
are the roots of $t^2 + t - 6 = (t - 2)(t + 3)$, namely 2 and −3.
From Proposition 4.5 of Chapter 1 we recall that for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ and n × n matrices
A we have
$$A\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A^{\mathsf{T}}\mathbf{y}.$$
In particular, when A is symmetric, we have
$$A\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A\mathbf{y}.$$
More generally, we say a linear map $T\colon \mathbb{R}^n \to \mathbb{R}^n$ is symmetric if $T(\mathbf{x}) \cdot \mathbf{y} = \mathbf{x} \cdot T(\mathbf{y})$ for
all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. It is easy to see that the matrix for a symmetric linear map with respect to
any orthonormal basis is symmetric.
In general, we have the following important result. Its name comes from the word
spectrum, associated with the physical concept of decomposing light into its components
of different colors.
We use calculus to prove this, but for a purely linear-algebraic proof, see Exercise 16. Consider
the function
$$f(\mathbf{x}) = A\mathbf{x} \cdot \mathbf{x}.$$
By compactness of the unit sphere, f has a maximum subject to the constraint $g(\mathbf{x}) =
\|\mathbf{x}\|^2 = 1$. Applying the method of Lagrange multipliers, we infer that there is a unit vector
v so that $Df(\mathbf{v}) = \lambda Dg(\mathbf{v})$ for some scalar λ. By Exercise 3.2.14, this means
$$A\mathbf{v} = \lambda\mathbf{v},$$
so v is an eigenvector of A with (real) eigenvalue λ.
► EXAMPLE 2
Consider the symmetric matrix
$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}.$$
Its characteristic polynomial is $p(t) = -t^3 + 2t^2 + t - 2 = -(t + 1)(t - 1)(t - 2)$, so the eigenvalues of A are −1, 1, and 2. As the reader can check, the corresponding eigenvectors are
$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad\text{and}\quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
Note that these three vectors form an orthogonal basis for ℝ³, and we can easily obtain an orthonormal
basis by normalizing:
$$\mathbf{q}_1 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad \mathbf{q}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad \mathbf{q}_3 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
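Symmetric eigenproblems like this one can be checked with NumPy's dedicated symmetric eigensolver, which returns orthonormal eigenvectors automatically. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

# Example 2's symmetric matrix
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
eigvals, Q = np.linalg.eigh(A)   # eigh: eigenvalues ascending, orthonormal columns
print(eigvals)                   # approximately [-1., 1., 2.]
print(np.allclose(Q.T @ Q, np.eye(3)))             # columns are orthonormal: True
print(np.allclose(Q @ np.diag(eigvals) @ Q.T, A))  # A = Q Λ Q^T: True
```

The factorization $A = Q\Lambda Q^{\mathsf{T}}$ printed last is exactly the conclusion of the spectral theorem.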
► EXAMPLE 3
Consider the symmetric matrix
$$A = \begin{bmatrix} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{bmatrix}.$$
Its characteristic polynomial is $p(t) = -t^3 + 18t^2 - 81t = -t(t - 9)^2$, so the eigenvalues of A are
0, 9, and 9. It is easy to check that
$$\mathbf{v}_1 = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$$
spans the eigenspace E(0). Moreover,
$$A - 9I = \begin{bmatrix} -4 & -4 & -2 \\ -4 & -4 & -2 \\ -2 & -2 & -1 \end{bmatrix},$$
which has rank 1, and so, as the spectral theorem guarantees, E(9) is 2-dimensional, with basis
$$\mathbf{v}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \mathbf{v}_3 = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}.$$
If we want an orthogonal (or orthonormal) basis, we must use the Gram-Schmidt process, Theorem
5.3 of Chapter 5: We take $\mathbf{w}_2 = \mathbf{v}_2$ and let
$$\mathbf{w}_3 = \mathbf{v}_3 - \operatorname{proj}_{\mathbf{w}_2}\mathbf{v}_3 = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1/2 \\ -1/2 \\ 2 \end{bmatrix}, \quad\text{so we use}\quad \widetilde{\mathbf{w}}_3 = 2\mathbf{w}_3 = \begin{bmatrix} -1 \\ -1 \\ 4 \end{bmatrix}.$$
As a check, note that $\mathbf{v}_1$, $\mathbf{w}_2$, $\widetilde{\mathbf{w}}_3$ do in fact form an orthogonal basis. As before, if we want the
orthogonal diagonalizing matrix Q, we take
$$\mathbf{q}_1 = \frac{1}{3}\begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{q}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad\text{and}\quad \mathbf{q}_3 = \frac{1}{3\sqrt{2}}\begin{bmatrix} -1 \\ -1 \\ 4 \end{bmatrix},$$
whence
$$Q = \begin{bmatrix} \frac{2}{3} & -\frac{1}{\sqrt{2}} & -\frac{1}{3\sqrt{2}} \\ \frac{2}{3} & \frac{1}{\sqrt{2}} & -\frac{1}{3\sqrt{2}} \\ \frac{1}{3} & 0 & \frac{4}{3\sqrt{2}} \end{bmatrix}.$$
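The Gram-Schmidt step inside the repeated eigenspace E(9) is a one-line computation. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

# Gram-Schmidt inside the eigenspace E(9) of Example 3.
A = np.array([[5.0, -4.0, -2.0],
              [-4.0, 5.0, -2.0],
              [-2.0, -2.0, 8.0]])
v2 = np.array([-1.0, 1.0, 0.0])
v3 = np.array([-1.0, 0.0, 2.0])

w2 = v2
w3 = v3 - (v3 @ w2) / (w2 @ w2) * w2       # subtract the projection onto w2
print(np.allclose(2 * w3, [-1.0, -1.0, 4.0]))  # matches the text: True
print(np.isclose(w2 @ w3, 0.0))                # orthogonal: True
print(np.allclose(A @ w3, 9 * w3))             # still an eigenvector for 9: True
```

The last check illustrates why Gram-Schmidt is safe here: any linear combination of eigenvectors for the same eigenvalue is again an eigenvector.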
► EXAMPLE 4
We wish to sketch the conic section $x_1^2 + 4x_1x_2 - 2x_2^2 = 6$. We write
$$x_1^2 + 4x_1x_2 - 2x_2^2 = \mathbf{x}^{\mathsf{T}}A\mathbf{x},$$
where
$$A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix} = Q\Lambda Q^{\mathsf{T}}, \quad\text{with}\quad Q = \frac{1}{\sqrt{5}}\begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix} \quad\text{and}\quad \Lambda = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix};$$
setting $\mathbf{y} = Q^{\mathsf{T}}\mathbf{x}$, we obtain $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = \mathbf{y}^{\mathsf{T}}\Lambda\mathbf{y} = 2y_1^2 - 3y_2^2$.
Note that the conic is much easier to understand in the $y_1y_2$-coordinates. Indeed, we recognize that
the equation $2y_1^2 - 3y_2^2 = 6$ can be written in the form
$$\frac{y_1^2}{3} - \frac{y_2^2}{2} = 1,$$
from which we see that this is a hyperbola with asymptotes $y_2 = \pm\sqrt{2/3}\,y_1$, as pictured in Figure 4.2.
Now recall that the $y_1y_2$-coordinates are the coordinates with respect to the basis formed by the
column vectors of Q. Thus, if we want to sketch the picture in the original $x_1x_2$-coordinates, we first
draw in the basis vectors $\mathbf{q}_1$ and $\mathbf{q}_2$, and these establish the $y_1$- and $y_2$-axes, respectively, as shown in
Figure 4.3.
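The reduction of a quadratic form to a sum of squares via an orthogonal change of coordinates is mechanical. A minimal sketch, assuming NumPy is available; the test point is an arbitrary illustrative choice:

```python
import numpy as np

# Diagonalizing the quadratic form x1^2 + 4 x1 x2 - 2 x2^2 = x^T A x of Example 4.
A = np.array([[1.0, 2.0], [2.0, -2.0]])
eigvals, Q = np.linalg.eigh(A)    # eigenvalues ascending: -3 and 2
print(eigvals)

# In y = Q^T x coordinates, the form is a pure sum of squares: Σ λ_i y_i^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(2)
y = Q.T @ x
print(np.isclose(x @ A @ x, eigvals @ (y * y)))  # True
```

The signs of the eigenvalues (one positive, one negative) classify the conic $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = 6$ as a hyperbola.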
In general, the equation $\frac{x_1^2}{a^2} - \frac{x_2^2}{b^2} = 1$ represents a hyperbola with vertices $\begin{bmatrix} \pm a \\ 0 \end{bmatrix}$ and asymptotes $x_2 = \pm\frac{b}{a}x_1$.
Quadric surfaces include those shown in Figure 4.4: ellipsoids, cylinders, and hyperboloids of 1 and 2 sheets. There are also paraboloids (both elliptic and hyperbolic), but we
come to these a bit later. We turn to another example.
Figure 4.4
► EXAMPLE 5
We wish to sketch the surface $x_1^2 + 2x_1x_2 + 2x_2x_3 + x_3^2 = 2$. We observe that if
$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},$$
then the left-hand side is $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$, and so we use the diagonalization and the substitution $\mathbf{y} = Q^{\mathsf{T}}\mathbf{x}$, as before, to write
$$\mathbf{x}^{\mathsf{T}}A\mathbf{x} = \mathbf{y}^{\mathsf{T}}\Lambda\mathbf{y}, \quad\text{where}\quad \Lambda = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix};$$
that is, in terms of the coordinates $\mathbf{y} = (y_1, y_2, y_3)$, we have $-y_1^2 + y_2^2 + 2y_3^2 = 2$,
and the graph of $-y_1^2 + y_2^2 + 2y_3^2 = 2$ is the hyperboloid of one sheet shown in Figure 4.5. This is
the picture with respect to the "new basis" $\{\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\}$ (given in the solution of Example 2). The
picture with respect to the standard basis, then, is as shown in Figure 4.6. (This figure is obtained by
multiplying by the matrix Q. Why?)
The alert reader may have noticed that we're lacking certain curves and surfaces. If
there are linear terms present along with the quadratic, we must adjust accordingly. For
example, completing the square shows that
$$x_1^2 + 2x_2^2 + 2x_1 - 3x_2 = 1$$
is the equation of a congruent ellipse centered at $\begin{bmatrix} -1 \\ 3/4 \end{bmatrix}$. However, the linear terms become
all-important when the symmetric matrix defining the quadratic terms is singular. For
example,
$$x_1^2 - x_2 = 1$$
defines a parabola.
► EXAMPLE 6
We wish to sketch the surface
$$5x_1^2 - 8x_1x_2 - 4x_1x_3 + 5x_2^2 - 4x_2x_3 + 8x_3^2 + 2x_1 + 2x_2 + x_3 = 9.$$
No, we did not pull this mess out of a hat. The quadratic terms came, as might be predicted, from
Example 3. Thus, we make the change of coordinates given by $\mathbf{y} = Q^{\mathsf{T}}\mathbf{x}$, with Q the orthogonal
matrix found there. The quadratic terms become $9y_2^2 + 9y_3^2$, and the linear terms become
$2x_1 + 2x_2 + x_3 = 3\,\mathbf{q}_1\cdot\mathbf{x} = 3y_1$, so the equation reads
$$3y_1 + 9y_2^2 + 9y_3^2 = 9, \quad\text{i.e.,}\quad y_1 = 3 - 3(y_2^2 + y_3^2),$$
which we recognize as a (circular) paraboloid, shown in Figure 4.7. The sketch of the surface in our
original $x_1x_2x_3$-coordinates is then shown in Figure 4.8.
Figure 4.8
► EXERCISES 9.4
1. Find orthogonal matrices that diagonalize each of the following symmetric matrices:
*(a) $\begin{bmatrix} 6 & 2 \\ 2 & 9 \end{bmatrix}$ (b) $\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix}$ *(c) $\begin{bmatrix} 2 & 2 & -2 \\ 2 & -1 & -1 \\ -2 & -1 & -1 \end{bmatrix}$ *(d) $\begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 4 \end{bmatrix}$ (e) $\begin{bmatrix} 1 & -2 & 2 \\ -2 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix}$ (f) $\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}$
A = 0 5 0 , B = 0 5 0
0 0 5 _2 0 5
—— —
1 2 4 "1 2 4
C= 0 2 2 , D= 0 2 2
_0 0 3_ _0 0 1
7. Suppose A is a diagonalizable matrix whose eigenspaces are orthogonal. Prove that A is symmetric.
8. Suppose A is a symmetric n x n matrix. Using the spectral theorem, prove that if Ax • x = 0 for
every vector x e Rn, then A = O.
9. Apply the spectral theorem to prove that any symmetric matrix A satisfying A2 = A is in fact a
projection matrix.
10. Suppose T is a symmetric linear map satisfying [T]4 = I. Use the spectral theorem to give a
complete description of T: R" -> R". (Hint: For starters, what are the potential eigenvalues of T?)
11. Let A be an m × n matrix. Show that $\|A\| = \sqrt{\lambda}$, where λ is the largest eigenvalue of the
symmetric matrix $A^{\mathsf{T}}A$.
12. We say a symmetric matrix A is positive definite if $A\mathbf{x}\cdot\mathbf{x} > 0$ for all $\mathbf{x} \neq \mathbf{0}$, negative definite
if $A\mathbf{x}\cdot\mathbf{x} < 0$ for all $\mathbf{x} \neq \mathbf{0}$, and positive (resp., negative) semidefinite if $A\mathbf{x}\cdot\mathbf{x} \geq 0$ (resp., $\leq 0$) for
all x.
(a) Prove that if A and B are positive (negative) definite, then so is A + B.
(b) Prove that A is positive (resp., negative) definite if and only if all its eigenvalues are positive
(resp., negative).
(c) Prove that A is positive (resp., negative) semidefinite if and only if all its eigenvalues are nonnegative (resp., nonpositive).
(d) Prove that if C is any m x n matrix of rank n, then A = CTC has positive eigenvalues.
(e) Prove or give a counterexample: If A and B are positive definite, then so is AB. What about
AB + BA?
13. Let A be an n x n matrix. Prove that A is nonsingular if and only if every eigenvalue of ATA is
positive.
14. Prove that if A is a positive semidefinite (symmetric) matrix, then there is a unique positive
semidefinite (symmetric) matrix B with B2 = A.
15. Suppose A and B are symmetric and AB = BA. Prove there is an orthogonal matrix Q so
that both $Q^{-1}AQ$ and $Q^{-1}BQ$ are diagonal. (Hint: Let λ be an eigenvalue of A. Use the Spectral
Theorem to show that there is an orthonormal basis for E(λ) consisting of eigenvectors of B.)
16. Prove, using only methods of linear algebra, that the eigenvalues of a symmetric matrix are real.
(Hints: Let $\lambda = a + bi$ be a putative complex eigenvalue of A, and consider the real matrix
$$B = (A - (a + bi)I)(A - (a - bi)I) = A^2 - 2aA + (a^2 + b^2)I = (A - aI)^2 + b^2I.$$
Show that B is singular, and that if $\mathbf{v} \in N(B)$ is a nonzero vector, then $(A - aI)\mathbf{v} = \mathbf{0}$ and $b = 0$.)
17. If A is a positive definite symmetric n × n matrix, what is the volume of the n-dimensional
ellipsoid $\{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x}\cdot\mathbf{x} \leq 1\}$? (See also Exercise 7.6.3.)
18. Sketch the following conic sections, giving axes of symmetry and asymptotes (if any).
(a) $6x_1x_2 - 8x_2^2 = 9$
*(b) $3x_1^2 - 2x_1x_2 + 3x_2^2 = 4$
*(c) $16x_1^2 + 24x_1x_2 + 9x_2^2 - 3x_1 + 4x_2 = 5$
(d) $10x_1^2 + 6x_1x_2 + 2x_2^2 = 11$
(e) $7x_1^2 + 12x_1x_2 - 2x_2^2 - 2x_1 + 4x_2 = 6$
19. Sketch the following quadric surfaces.
*(a) $3x_1^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3 = 4$
(b) $4x_1^2 - 2x_1x_2 - 2x_1x_3 + 3x_2^2 + 4x_2x_3 + 3x_3^2 = 6$
(c) $2x_1^2 - x_2^2 - x_3^2 - 4x_1x_2 - 10x_1x_3 + 4x_2x_3 = 6$
*(d) $2x_2^2 + 2x_1x_2 + 2x_1x_3 + 2x_2x_3 - x_1 + x_2 + x_3 = 1$
(e) $3x_1^2 + 4x_1x_2 + 8x_1x_3 + 4x_2x_3 + 3x_3^2 = 8$
(f) $3x_1^2 + 2x_1x_3 - x_2 + 3x_3^2 + 2x_2^2 = 0$
20. Let $a, b, c \in \mathbb{R}$, and let $Q(\mathbf{x}) = ax_1^2 + 2bx_1x_2 + cx_2^2$.
(a) The Spectral Theorem tells us that there exists an orthonormal basis for ℝ² with respect to whose
coordinates $y_1, y_2$ we have
$$Q(\mathbf{x}) = \widetilde{Q}(\mathbf{y}) = \lambda_1y_1^2 + \lambda_2y_2^2.$$
In high school analytic geometry, one derives the formula
$$\cot 2\alpha = \frac{a - c}{2b}$$
for the angle α through which we must rotate the $x_1x_2$-axes to get the appropriate $y_1y_2$-axes. Derive
this by using eigenvalues and eigenvectors, and determine the type (ellipse, hyperbola, etc.) of the
conic section $Q(\mathbf{x}) = 1$ from a, b, and c. (Hint: Use the characteristic polynomial to eliminate $\lambda^2$ in
your computation of $\tan 2\alpha$.)
(b) Use the formula for $\widetilde{Q}$ above to find the maximum and minimum of Q on the unit circle $\|\mathbf{x}\| = 1$.
21. In this exercise we consider the nature of the restriction of a quadratic form to a hyperplane. Let
A be a symmetric n × n matrix.
(a) Show that the quadratic form $Q(\mathbf{x}) = \mathbf{x}^{\mathsf{T}}A\mathbf{x}$ on ℝⁿ is positive definite when restricted to the
subspace $x_n = 0$ if and only if all the roots of
are positive.
(b) Use the change-of-basis theorem to prove that the restriction to the subspace $\mathbf{b}\cdot\mathbf{x} = 0$ is positive
definite if and only if all the roots of
$$\det\begin{bmatrix} A - tI & \mathbf{b} \\ \mathbf{b}^{\mathsf{T}} & 0 \end{bmatrix} = 0$$
are positive.
(c) Use this result to give a bordered Hessian test for the point a to be a constrained maximum
(minimum) of the function f subject to the constraint g = c. (See Exercises 5.4.34 and 5.4.32b.)
(d) What is the analogous result for an arbitrary subspace?
22. We saw in Section 3 of Chapter 5 that we can write a symmetric n × n matrix A in the form
$A = LDL^{\mathsf{T}}$ (where L is lower triangular with diagonal entries 1 and D is diagonal); we saw in this
section that we can write $A = Q\Lambda Q^{\mathsf{T}}$ for some orthogonal matrix Q. Although the diagonal entries
of D obviously need not be the eigenvalues of A, the point of this exercise is to see that the signs of
these numbers must agree. That is, the number of positive entries in D equals the number of positive
eigenvalues of A, the number of negative entries in D equals the number of negative eigenvalues of
A, and the number of zero (diagonal) entries in D equals the number of zero eigenvalues.
(a) Assume first that A is nonsingular. Consider the "straight line path" joining I and L (stick a
parameter s in front of the nondiagonal entries of L and let s vary from 0 to 1). We then obtain a
path in $\mathcal{M}_{n\times n}$ joining D and A. Show that all the matrices in this path are nonsingular and, applying
Exercise 8.7.9, show that the number of positive eigenvalues of D equals the number of positive
eigenvalues of A. Deduce the result in this case.
(b) In general, prove that the number of zero diagonal entries in D is equal to $\dim N(A) = \dim E(0)$.
By considering the matrix $A + \varepsilon I$ for $\varepsilon > 0$ sufficiently small, use part a to deduce the result.
Remark Comparing Proposition 3.5 of Chapter 5 with Exercise 12 above, we can easily derive
the result of this exercise when A is either positive or negative definite. But the indefinite case is
more subtle.
GLOSSARY OF NOTATIONS AND RESULTS FROM SINGLE-VARIABLE CALCULUS

► Notations
$\mathbf{x}_{k_j}$  subsequence, 70
$\bar{\mathbf{x}}$  least squares solution, 227
$\mathbf{x}\cdot\mathbf{y}$  dot product of the vectors x and y, 8
$\langle \mathbf{x}, \mathbf{y}\rangle$  inner product of the vectors x and y, 238
$\mathbf{x}^{\parallel}, \mathbf{x}^{\perp}$  components of x parallel to and orthogonal to another vector, 9
$\mathbf{0}$  zero vector, 1
$O$  zero matrix, 30
Function | Derivative
$x^n$ | $nx^{n-1}$
$e^x$ | $e^x$
$\log x$ | $1/x$
$\sin x$ | $\cos x$
$\cos x$ | $-\sin x$
$\tan x$ | $\sec^2 x$
$\sec x$ | $\sec x \tan x$
$\cot x$ | $-\csc^2 x$
$\csc x$ | $-\csc x \cot x$
$\arcsin x$ | $1/\sqrt{1-x^2}$
$\arctan x$ | $1/(1+x^2)$
$\int \sin^2 x\,dx = \tfrac{1}{2}(x - \sin x\cos x)$  $\int \cos^2 x\,dx = \tfrac{1}{2}(x + \sin x\cos x)$
$\int \sin^3 x\,dx = -\cos x + \tfrac{1}{3}\cos^3 x$  $\int \cos^3 x\,dx = \sin x - \tfrac{1}{3}\sin^3 x$
$\int \tan^3 x\,dx = \tfrac{1}{2}\tan^2 x + \log|\cos x|$
$\int \dfrac{dx}{a^2 + x^2} = \dfrac{1}{a}\arctan\dfrac{x}{a}$
$\int \sqrt{x^2 \pm a^2}\,dx = \tfrac{x}{2}\sqrt{x^2 \pm a^2} \pm \tfrac{a^2}{2}\log\left|x + \sqrt{x^2 \pm a^2}\right|$  $\int \dfrac{dx}{\sqrt{x^2 \pm a^2}} = \log\left|x + \sqrt{x^2 \pm a^2}\right|$
$\int \sqrt{a^2 - x^2}\,dx = \tfrac{x}{2}\sqrt{a^2 - x^2} + \tfrac{a^2}{2}\arcsin\dfrac{x}{a}$  $\int \log x\,dx = x\log x - x$
$\int e^{ax}\sin bx\,dx = \dfrac{e^{ax}(a\sin bx - b\cos bx)}{a^2 + b^2}$
ANSWERS TO SELECTED EXERCISES
1.1.2.
1.1.6
1.2.1 c. −25, θ = arccos(−5/13); f. 2, θ = arccos(1/5)
A 7___5 1
1.2.2 c.
13 -4 ’ 13 8
1 fl
1.4.9 b. Either A = for some real number fl or A = a , a any
0 0 -1/fl -1
real number, /I/O.
0 -1 0 -1 1 0 -1 0
1.4.13 a. 5: ;b.
-1 0 1 0 0 -1 0 1
1.4.35 b. If A is orthogonal, then $A^{-1} = A^{\mathsf{T}}$, and so $(A^{-1})^{\mathsf{T}}(A^{-1}) = (A^{\mathsf{T}})^{\mathsf{T}}A^{\mathsf{T}} = AA^{\mathsf{T}} = I$ by Exercise 34e.
1.5.5 a. $\begin{bmatrix} 2 \\ -2 \\ 2 \end{bmatrix}$
1.5.6 a. $\sqrt{3}$
1.5.7 a. $x_1 - x_2 + x_3 = 0$; d. $3x_1 + 4x_2 + 5x_3 = 12$
1.5.8 $x_1 + x_2 - x_3 = -2$
1.5.11 $1/\sqrt{6}$
1.5.12 2
1.5.15 $\begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}$
b. f (r) = *, e. f(t) =
2.2.10 a. If $I_k = [a_k, b_k]$, let $x = \sup\{a_k\}$. The set of left-hand endpoints is bounded above (e.g.,
by $b_1$), and so the least upper bound exists. We have $a_k \leq x$ for all k automatically. Now, if $x > b_j$
for some j, then since $I_k \subset I_j$ for all $k \geq j$, this means that $b_j$ is an upper bound of the set as well,
contradicting the fact that x is the least upper bound. b. Take $I_k = (0, 1/k)$.
2.2.13 Choose $\varepsilon = 1$. Then there is $K \in \mathbb{N}$ so that for all $k \geq K$ we have $\|\mathbf{x}_k - \mathbf{x}_{K+1}\| < 1$, so $\|\mathbf{x}_k\| <
1 + \|\mathbf{x}_{K+1}\|$. Therefore, for all $j \in \mathbb{N}$, we have $\|\mathbf{x}_j\| \leq \max(\|\mathbf{x}_1\|, \|\mathbf{x}_2\|, \dots, \|\mathbf{x}_K\|, \|\mathbf{x}_{K+1}\| + 1)$.
2.3.8 a., b., c., d., e., g., h. yes; f., i., j. no
2.3.10 a. 2; d. $\sqrt{2}$
3.1.11 T(v)
3.2.1 a. $z = e^{-2}(2x - y + 5)$; f. $w = -x + y + z + 3$
3.3.3 a. 0; e. 1
3.3.4 We get the approximate answer 34 taking a = 240 and b = 6 and the approximate answer 34.14
taking a = 210 and b = 7. My calculator gives me the "correct answer" 34.46 to two decimals.
33.1 -1
-1
6e3
6
D(gof)(0) = Dg 1 Df
3.4.4 a. —4 ; b. — arctan 4
25
4.1.2 b., c., d., f., g. are in echelon form; c. and g. are in reduced echelon form.
2 "I f 1
-1
0
center , radius 5
4.1.8 $\mathbf{b} = \mathbf{v}_1 - \mathbf{v}_2 + \mathbf{v}_3$
4.1.9 b. yes; a., c. no
4.1.10 d. yes; a., b., c. no
4.1.11 b. $2b_1 + b_2 - b_3 = 0$
’1’ F1 " "1 "
4.1.13 a. By Proposition 1.4, A 1 = 0, but since 0 1 / 0, this is impossible.
_0_ _1_ 0
4.1.14 a. 0, 3; b. for a = 0, b must satisfy $b_2 = 0$; for a = 3, b must satisfy $b_2 = 3b_1$.
4.1.18 a. none, as $A\mathbf{x} = \mathbf{0}$ is always consistent; b. take r = m = n; e. take r < n
|bi + b3 = 0;
0"| T 1 0 0 0"
0 0 10 0
0 -1010
1J L 0 0 0 1.
0
0
0
1
-1 3 -2*1 |"-1 2 1
4.2.2 c. A"1 = -1 2 -1 ;e. A~!= 5 -8 -6
2-3 2 -3 5 4_
_ — "1 -1 0 O' 2
"-2 0 r 0
0 1 -3 2
4.2.3 b. A1 = 9 -1 -3 ,x — 2 ; d. A-1 =
0 0 4 —3
-6 1 2_ _ -1 _
.0 0 -1 1. 0
4.3.2 a., b., d., e. yes; c., f. no
4.3.12
4.3.13
4.3.14
4.3.21 a. A hint: Use the definition of U + V to show that the vectors span. To establish linear
independence, suppose $c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_s\mathbf{u}_s + d_1\mathbf{v}_1 + \cdots + d_t\mathbf{v}_t = \mathbf{0}$. Then what can you say
about the vector $c_1\mathbf{u}_1 + c_2\mathbf{u}_2 + \cdots + c_s\mathbf{u}_s = -(d_1\mathbf{v}_1 + \cdots + d_t\mathbf{v}_t)$?
4.3.23 a., c., e. yes; b., d., f. no
4.4.1 Let's show that $R(B) \subset R(A)$ if B is obtained by performing any row operation on A. Obviously, a row interchange doesn't affect the span. If $B_i = cA_i$ and all the other rows are the same,
$c_1B_1 + \cdots + c_iB_i + \cdots + c_mB_m = c_1A_1 + \cdots + (c_ic)A_i + \cdots + c_mA_m$, so any vector in R(B) is in
R(A). If $B_i = A_i + cA_j$ and all the other rows are the same, then $c_1B_1 + \cdots + c_iB_i + \cdots + c_mB_m =
c_1A_1 + \cdots + c_i(A_i + cA_j) + \cdots + c_mA_m = c_1A_1 + \cdots + c_iA_i + \cdots + (c_j + cc_i)A_j + \cdots +
c_mA_m$, so once again any vector in R(B) is in R(A).
To see that $R(A) \subset R(B)$, we observe that the matrix A is obtained from B by performing the
(inverse) row operation (this is why we need $c \neq 0$ for the second type of row operation). Since
$R(B) \subset R(A)$ and $R(A) \subset R(B)$, we have $R(A) = R(B)$.
4.4.3 f. R(A):
1
-1
1
N(A):
0
0
0.
’2 -1 o' '1 2'
4.4.4 ,T =
3 0 1 3 6
'1 0 -1 -1' “2 -1 0" "2 0 r
4.4.5 b. 0 1 0 0 ; c. 0 0 0 ; e. 0 2 1
_0 1 0 0_ _2 -1 0_ _2 2 2_
1_ _4_
4.4.10 Since U is a matrix in echelon form, its last m − r rows are 0. When we consider the matrix
product A = BU, we see that every column of A is a linear combination of the first r columns of B;
hence, these r column vectors span C(A). Since dim C(A) = r, these column vectors must give
a basis (see Proposition 3.9).
5^
4.4.11 b. 161 + J&2
_ |hi - 1&2
4.5.6 It is a smooth surface away from the curve g(t). Indeed, M is the collection of all
the tangent lines to this curve; this surface has a cuspidal edge along the curve, as one can perhaps
see from the picture below:
X2 4- X% + xf + x| - 1
4.5.11 a. Let F(x) = Then M = F~1({0}). Now DF(x) =
X1X2 - X3X4
i. saddle point
5.3.3 We see two mountain peaks (global maxima) joined by two ridges (two saddle points) with a
deep valley (global minimum) between them.
53.6 c. A =
5.4.26 a.
5.4.27 ±
I*i
5.5.15 b. + ^b2
. - |Z>2 _
6.1.3 $\mathbf{x}_k - \mathbf{x} = \left(\mathbf{x}_0 + \sum_{j=1}^{k}(\mathbf{x}_j - \mathbf{x}_{j-1})\right) - \left(\mathbf{x}_0 + \sum_{j=1}^{\infty}(\mathbf{x}_j - \mathbf{x}_{j-1})\right) = -\sum_{j=k+1}^{\infty}(\mathbf{x}_j - \mathbf{x}_{j-1})$, so
$\|\mathbf{x}_k - \mathbf{x}\| \leq \sum_{j=k+1}^{\infty}\|\mathbf{x}_j - \mathbf{x}_{j-1}\| \leq \left(\sum_{j=k+1}^{\infty}c^{j-1}\right)\|\mathbf{x}_1 - \mathbf{x}_0\| = \frac{c^k}{1-c}\|\mathbf{x}_1 - \mathbf{x}_0\|$.
6.1.9 a. [1, 2]; $x_0 = 1$, $x_1 = 1.5$, $x_2 = 1.41667$; c. [0.26, 0.79], $x_0 = 0.785398$, $x_1 = 0.523599$,
$x_2 = 0.514961$
1 .984
6.1.11 a. B , 1/4 ]; to three decimals, the root is
1/4 .254
y1
6.3.5 a. zero set of F y and graph of f(y) =
/x^
6.3.7 c. zero set of $F(x, y, z) = y - x\tan z$ and graph of $f(x, z) = x\tan z$ away from $z = (2n+1)\pi/2$,
$n \in \mathbb{Z}$; near such points, use $F = x - y\cot z$ and $f = y\cot z$.
7.1.1 Let $\mathcal{P}_1 = \{0 = x_0 < x_1 = 1\}$ be the trivial partition of [0, 1] and let $\mathcal{P}_2 = \{0 = y_0 < y_1 < y_2 <
y_3 = 1\}$ be a partition of [0, 1] with the properties that $y_1 < \tfrac{1}{2} < y_2$ and $y_2 - y_1 < \varepsilon$; set $\mathcal{P} = \mathcal{P}_1 \times \mathcal{P}_2$.
Then for j = 1, 3, we have $m_{1j} = M_{1j}$, whereas $m_{12} = 0$ and $M_{12} = 1$. Then
$$U(f, \mathcal{P}) - L(f, \mathcal{P}) = (M_{12} - m_{12})(y_2 - y_1) = y_2 - y_1 < \varepsilon,$$
and so, by the Convenient Criterion, Proposition 1.3, we infer that f is integrable. Now, for our
dy dz dx
7.2.13 abc/6
7.2.15 1/8
7.2.24 a. (2 4- t t )/8x 3
7.3.5 $3\pi/4$
7.3.6 $(\sqrt{5} - 1)/12$
7.3.9 $\pi(e - 1)/4$
7.4.11 -
4
7.4.12 I = Ima2
7.4.16 a. k = 1; b. k = 1/2; c. k = 3/10
7.5.1 b. —4; d. 6
7.5.11 detA = ±l.
7.5.13 a. -4/3; b. 3 3 3
5 2 1
3 3-
-1 0 1
7.5.14 -3; 2 1 -2
5
3-
7.6.5 ±(3-log5)
7.6.7 sin 1
7.6.10 j
7.6.13 (nj3-3)/9
7.6.14 This is a solid torus with core radius a and little radius b. Its volume is $2\pi^2ab^2$.
7.6.17 $2\pi^2a^5/5$
u 1 - 1
8.4.10 ,g y 2v
V 1 —z y 1 + u2 + V2
z u2 + V2 - 1
8.4.14 |m«2 = I 3k a4
8.4.15 a. 47ra3; b. 4Ka2h; c. 6zra2/i; d. 24
8.4.17 $88\pi/3$
8.4.18 a., c., d. $4\pi$; b. $4\pi h/\sqrt{a^2 + h^2}$
8.4.22 a., b. 0; c. $\pi^2$
8.5.1 Since the outward-pointing normal to $\partial\mathbb{H}^k$ is $-\mathbf{e}_k$, we must decide whether
$\{-\mathbf{e}_k, \mathbf{e}_1, \dots, \mathbf{e}_{k-1}\}$ is a positively-oriented basis for $\mathbb{R}^k$. We need k − 1 exchanges and
one change of sign to obtain $\{\mathbf{e}_1, \dots, \mathbf{e}_k\}$. This is k sign changes in all, and hence the standard
positive basis for $\mathbb{R}^{k-1}$ gives the correct orientation precisely when $(-1)^k = +1$.
8.5.3 $2\pi a(a + b)$
8.5.6 $\pi a^4$
8.5.12 a. $-8\pi/15$
8.5.18 t t /2a /3
8.6.7 c., d., e., g., h. div = 0; a., f., g., h. curl = 0
[36 −3; −55 −2]
(1/9)[7 4 −4; 4 1 8; −4 8 1]
9.1.8 (1/17)[3 1 5 4; 1 6 −4 7; 5 −4 14 1; 4 7 1 11]
9.1.12 a. With respect to the “new” coordinates y1, y2, y3, the equation of the curve of intersection
is y1² + cos²θ y2² = 1, y3 = 0.
⅓t³ + t² + 2t + 1; d. …
9.3.13 c. normal modes (1, 1) cos t, (1, 1) sin t, (2, −1) cos 2t, and (2, −1) sin 2t
9.4.1 a. (1/√5)[−2 1; 1 2]; c. …; d. (1/3)[−2 1 2; 2 2 1; 1 −2 2]
9.4.2 (2, 2, 4)
9.4.5 There is an orthogonal matrix Q so that Q⁻¹AQ = Λ = λI. But then A = Q(λI)Q⁻¹ = λI.
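The computation in 9.4.5 can be illustrated numerically: for any orthogonal Q, Q(λI)Q⁻¹ = λI. A sketch with a 2 × 2 rotation as Q (the angle and λ are example values of my choosing):

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t = 0.7                                               # any angle
Q = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
Qinv = [[Q[0][0], Q[1][0]], [Q[0][1], Q[1][1]]]       # transpose = inverse for orthogonal Q

lam = 3.0
lamI = [[lam, 0.0], [0.0, lam]]

A = matmul(matmul(Q, lamI), Qinv)
# A comes out as lam * I, as the argument in 9.4.5 predicts
for i in range(2):
    for j in range(2):
        assert abs(A[i][j] - (lam if i == j else 0.0)) < 1e-12
```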
9.4.18 b. ellipse y1² + 2y2² = 2, where y = (1/√2)[1 −1; 1 1] x;
c. parabola y1 = 5y2² − 1, where y = (1/5)[3 …; 4 …] x
9.4.19 a. hyperboloid of one sheet −2y1² + y2² + 4y3² = 4, where y = Q x for an orthogonal Q with
entries 0, ±1/√2, ±1/√3, ±1/√6, ±2/√6
b. …, where y = Q x for an orthogonal Q with entries 0, ±1/√2, ±1/√6, ±2/√6
INDEX
acceleration, 109
Ampere's law, 398
angle, 11, 239
arclength, 112
arclength-parametrized, 114
area, 43
  signed, 44
area form, 370
augmented matrix, 130
average value, 298, 299
  weighted, 301
ball, 65
  closed, 70
basis, 161
  change of, 416
  orthogonal, 233, 236
  orthonormal, 236, 456
  standard, 162
binomial coefficient, 171
binormal, 117
boundary orientation, 382
boundary point, 380
bounded, 197, 199
Brouwer Fixed Point Theorem, 404
bump function, 381
C∞, 120, 167-168
catenoid, 124
Cauchy sequence, 71, 249
Cauchy-Schwarz inequality, 11, 15, 239
Cavalieri's principle, 324
center of mass, 300
centroid, 300
Change of Variables Theorem, 326
Change-of-Basis Formula, 417
change-of-basis matrix, 416
characteristic polynomial, 426-428
checkerboard, 315
circulation, 394
closed, 69, 347
closed ball, 70
closure, 70
cofactor, 315
column space, 171
  basis for, 177
column vector(s), 28
  linear combination of, 136
compact, 197
complement
  orthogonal, 22
conic section, 108, 459-460
connected, 103, 352, 361
  simply, see simply connected
conservative, 352
consistent, 138, 140, 172
constraint equation, 139, 140, 150, 179, 182
continuity, 75
  properties of, 76-78
  uniform, 200, 271, 287
contour curves, 57
contraction mapping, 245
convergent, 67
coordinate chart, 380
coordinates, 163, 233, 413, 415, 416
cos, power series of, 445
Cramer's Rule, 318
critical point, 203
cross product, 48
cubic
  cuspidal, 55
  nodal, 55, 187
  twisted, 56
cubical norm, 324
curl, 393
curvature, 115
cuspidal cubic, 55
cycloid, 56, 187
cylinder, 460
cylindrical coordinates, 291-294, 329
d, 341
de Moivre's formula, 407
determinant, 46, 309, 314
diagonalizable, 423, 429, 432, 455
  simultaneously, 435, 465
difference equation, 436
differentiable, 87
  continuously, 93
differential equations, system of, 439, 441
differential form, 339
  closed, 347
  exact, 347, 386
differentiating under the integral sign, 287
dimension, 165
directional derivative, 83
distributive property, 8, 34
divergence, 395, 450
Divergence Theorem, 395
domain, 24
dot product, 8
echelon form, 131
eigenspace, 423
eigenvalue, 222, 423
eigenvector, 222, 423
elementary matrix, 147-148, 312
elementary operations, 128
  column, 309
  row, 130
ellipse, 105-106, 108, 460
ellipsoid, 460
envelope, 261
epicycloid, 62
Euler, 103, 445
exact, 347
exponential, power series of, 441
exterior derivative, 341
extremum, 202
Faraday's law, 398
Fibonacci Sequence, 438
finite-dimensional, 168
first variation equation, 455