
MULTIVARIABLE

MATHEMATICS
Linear Algebra, Multivariable
Calculus, and Manifolds

THEODORE SHIFRIN
University of Georgia

WILEY
John Wiley & Sons, Inc.
Associate Publisher Laurie Rosatone
Editorial Assistant Kelly Boyle
Executive Marketing Manager Julie Lindstrom
Senior Production Editor Sujin Hong
Senior Designer Madelyn Lesure

This book was set in Times Roman by Techsetters, Inc. and printed and bound by Malloy Lithograph.
The cover was printed by Phoenix Color.

This book is printed on acid-free paper. ∞

Copyright © 2005 John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by
any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under
Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the
Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center,
Inc., 222 Rosewood Drive, Danvers, MA 01923, (978)750-8400, Fax: (978)750-4470. Requests to the Publisher
for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street,
Hoboken, NJ 07030, (201) 748-6011, Fax: (201) 748-6008, E-Mail: PERMREQ@WILEY.COM. To order books
or for customer service please call 1-800-CALL WILEY (225-5945).

ISBN 978-0-471-52638-4
CONTENTS

Preface vii

► CHAPTER 1
VECTORS AND MATRICES 1
1 Vectors in Rⁿ 1
2 Dot Product 8
3 Subspaces of Rⁿ 16
4 Linear Transformations and Matrix Algebra 23
5 Introduction to Determinants and the Cross Product 43

► CHAPTER 2
FUNCTIONS, LIMITS, AND CONTINUITY 53
1 Scalar- and Vector-Valued Functions 53
2 A Bit of Topology in Rⁿ 64
3 Limits and Continuity 72

► CHAPTER 3
THE DERIVATIVE 81
1 Partial Derivatives and Directional Derivatives 81
2 Differentiability 87
3 Differentiation Rules 97
4 The Gradient 104
5 Curves 109
6 Higher-Order Partial Derivatives 120

► CHAPTER 4
IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS 127
1 Gaussian Elimination and the Theory of Linear Systems 127
2 Elementary Matrices and Calculating Inverse Matrices 147
3 Linear Independence, Basis, and Dimension 156
4 The Four Fundamental Subspaces 171
5 The Nonlinear Case: Introduction to Manifolds 186

► CHAPTER 5
EXTREMUM PROBLEMS 196
1 Compactness and the Maximum Value Theorem 196
2 Maximum/Minimum Problems 202
3 Quadratic Forms and the Second Derivative Test 208
4 Lagrange Multipliers 216
5 Projections, Least Squares, and Inner Product Spaces 225

► CHAPTER 6
SOLVING NONLINEAR PROBLEMS 244
1 The Contraction Mapping Principle 244
2 The Inverse and Implicit Function Theorems 251
3 Manifolds Revisited 261

► CHAPTER 7
INTEGRATION 267
1 Multiple Integrals 267
2 Iterated Integrals and Fubini's Theorem 276
3 Polar, Cylindrical, and Spherical Coordinates 288
4 Physical Applications 298
5 Determinants and n-Dimensional Volume 309
6 Change of Variables Theorem 324

► CHAPTER 8
DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS 333
1 Motivation 333
2 Differential Forms 335
3 Line Integrals and Green's Theorem 348
4 Surface Integrals and Flux 367
5 Stokes's Theorem 379
6 Applications to Physics 393
7 Applications to Topology 403

► CHAPTER 9
EIGENVALUES, EIGENVECTORS, AND APPLICATIONS 413
1 Linear Transformations and Change of Basis 413
2 Eigenvalues, Eigenvectors, and Diagonalizability 422
3 Difference Equations and Ordinary Differential Equations 436
4 The Spectral Theorem 455

GLOSSARY OF NOTATIONS AND RESULTS FROM SINGLE-VARIABLE CALCULUS 467
FOR FURTHER READING 473
ANSWERS TO SELECTED EXERCISES 474
INDEX 488
PREFACE
I began writing this text as I taught a brand-new course combining linear algebra and a
rigorous approach to multivariable calculus. Some of the students had already taken a
proof-oriented single-variable calculus course (using Spivak's beautiful book, Calculus),
but many had not: There were sophomores who wanted a more challenging entrée to
higher-level mathematics, as well as freshmen who'd scored a 5 on the Advanced Placement
Calculus BC exam. My goal was to include all the standard computational material found
in the usual linear algebra and multivariable calculus courses and more, interweaving the
material as effectively as possible, and to include complete proofs.
Although there have been a number of books that include both the linear algebra and
the calculus material, they have tended to segregate the material. Advanced calculus books
treat the rigorous multivariable calculus, but presume the students have already mastered
linear algebra. I wanted to integrate the material so as to emphasize the recurring theme of
implicit versus explicit that persists in linear algebra and analysis. In every linear algebra
course we should learn how to go back and forth between a system of equations and a
parametrization of its solution set. But the same problem occurs, in principle, in calculus: To
solve constrained maximum/minimum problems we must either parametrize the constraint
set or use Lagrange multipliers; to integrate over a curve or surface, we need a parametric
representation. Of course, in the linear case one can globally go back and forth; it’s not
so easy in the nonlinear case, but, as we’ll learn, it should at least be possible in principle
locally.
The prerequisites for this book are a solid background in single-variable calculus and, if
not some experience writing proofs, a strong interest in grappling with them. In presenting
the material, I have included plenty of examples, clear proofs, and significant motivation
for the crucial concepts. We all know that to learn (and enjoy?) mathematics one must
work lots of problems, from the routine to the more challenging. To this end, I have
provided numerous exercises of varying levels of difficulty, both computational and more
proof-oriented. Some of the proof exercises require the student "merely" to understand and
modify a proof in the text; others may require a good deal of ingenuity. I also ask students
for lots of examples and counterexamples. Generally speaking, exercises are arranged in
order of increasing difficulty. To offer a bit more guidance, I have marked with an asterisk
(*) those problems to which short answers, hints, or—in some cases—complete solutions
are given at the back of the book. As a guide to the new teacher, I have marked with a sharp
(♯) some important exercises to which reference is made later. An Instructor's Solutions
Manual (ISBN 0-471-64915-5) is available from the publisher.

► COMMENTS ON CONTENTS
The linear algebraic material with which we begin the course in Chapter 1 is concrete,
establishes the link with geometry, and is a good self-contained setting for working on


proofs. We introduce vectors, dot products, subspaces, and linear transformations and
matrix computations. At this early stage we emphasize the two interpretations of multiplying
a matrix A by a vector x: the linear equations viewpoint (considering the dot products of the
rows of A with x) and the linear combinations viewpoint (taking the linear combination of
the columns of A weighted by the coordinates of x). We end the chapter with a discussion
of 2 x 2 and 3x3 determinants, area, volume, and the cross product.
In Chapter 2 we begin to make the transition to calculus, introducing scalar functions
of a vector variable—their graphs and their level sets—and vector-valued functions. We
introduce the requisite language of open and closed sets, sequences, and limits and
continuity, including the proofs of the usual limit theorems. (Generally, however, I give these
short shrift in lecture, as I don't have the time to emphasize ε-δ arguments.)
We come to the concepts of differential calculus in Chapter 3. We quickly introduce
partial and directional derivatives, which are immediate to calculate, and then come to the
definition of differentiability, the characterization of differentiable functions, and the standard
differentiation rules. We give the gradient vector its own brief section, in which we emphasize
its geometric meaning. Then comes a section on curves, in which we mention Kepler's
laws (the second is proved in the text and the other two are left as an exercise), arclength,
and curvature of a space curve.
In the first four sections of Chapter 4 we give an accelerated treatment of Gaussian
elimination (including a proof of uniqueness of reduced echelon form) and the theory of
linear systems, the standard material on linear independence and dimension (including a
brief mention of abstract vector spaces), and the four fundamental subspaces associated to
a matrix. In the last section, we begin our assault on the nonlinear case, introducing (with
no proofs) the implicit function theorem and the notion of a manifold.
Chapter 5 is a blend of topology, calculus, and linear algebra—quadratic forms and
projections. We start with the topological notion of compactness and prove the maximum value
theorem in higher dimensions. We then turn to the calculus of applied maximum/minimum
problems and then to the analysis of the second-derivative test and the Hessian. Then
comes one of the most important topics in applications, Lagrange multipliers (with a
rigorous proof). In the last section, we return to linear algebra, to discuss projections (from both
the explicit and the implicit approaches), least-squares solutions of inconsistent systems, the
Gram-Schmidt process, and a brief discussion of abstract inner product spaces (including
a nice proof of Lagrange interpolation).
Chapter 6 is a brief, but sophisticated, introduction to the inverse and implicit function
theorems. We present our favorite proof, using the contraction mapping principle (an approach
that is more elegant and works just fine in the infinite-dimensional setting). In the last section
we prove that all three definitions of a manifold are (locally) equivalent: the implicit
representation, the parametric representation, and the representation as a graph. (In the
year-long course that I teach, I find I have time to treat this chapter only lightly.)
In Chapter 7 we study the multidimensional (Riemann) integral. In the first two sections
we deal predominantly with the theory of the multiple integral and, then, Fubini’s Theorem
and the computation of iterated integrals. Then we introduce (as is customary in a typical
multivariable calculus course) polar, cylindrical, and spherical coordinates and various
physical applications. We conclude the chapter with a careful treatment of determinants
(which will play a crucial role in Chapters 8 and 9) and a proof of the Change of Variables
Theorem.

In single-variable calculus, one of the truly impressive results is the Fundamental


Theorem of Calculus. In Chapter 8 we start by laying the groundwork for the analogous
multidimensional result, introducing differential forms in a very explicit fashion. We then
parallel a traditional vector calculus course, introducing line integrals and Green’s Theorem,
surface integrals and flux, and, then, finally stating and proving the general Stokes’s Theorem
for compact oriented manifolds. We do not skimp on concrete and nontrivial examples
throughout. In Section 8.6 we introduce the standard terminology of divergence and curl
and give the “classical” versions of Stokes’s and the Divergence Theorems, along with
some applications to physics. In Section 8.7 we begin to illustrate the power of Stokes’s
Theorem by proving the Fundamental Theorem of Algebra, a special case of the argument
principle, and the “hairy ball theorem” from topology.
In Chapter 9 we complete our study of linear algebra, including standard material
on change of basis (with a geometric slant), eigenvalues, eigenvectors, and a discussion of
diagonalizability. The remainder of the chapter is devoted to applications: difference and
differential equations, and a brief discussion of flows and their relation to the Divergence
Theorem of Chapter 8. We close with the Spectral Theorem. (With the exception of
Section 3.3, which relies on Chapter 8, and the proof of the Spectral Theorem, which relies
on Section 4 of Chapter 5, topics in this chapter can be covered at any time after completing
Chapter 4.)
We have included a glossary of notations and a quick compilation of relevant results
from trigonometry and single-variable calculus (including a short table of integrals), along
with a much-requested list of the Greek alphabet.
There are over 800 exercises in the text, many with multiple parts. Here are a few
particularly interesting (and somewhat unusual) exercises included in this text:

• Exercises 1.2.22-26 and Exercises 1.5.19 and 1.5.20 on the geometry of triangles,
and Exercise 1.5.17, a nice glimpse of affine geometry
• Exercise 2.1.12, a parametrization of a hyperboloid of one sheet in which the param­
eter curves are the two families of rulings
• Exercises 2.3.15-17, 3.1.10, and 3.2.18-19, exploring the infamous sorts of discon­
tinuous and nondifferentiable functions
• Example 3.4.3 introducing the reflectivity property of the ellipse via the gradient,
with follow-ups in Exercises 3.4.8, 3.4.9, and 3.4.13, and then Kepler’s first and
third laws in Exercise 3.5.15.
• Exercise 3.5.14, the famous fact (due to Huygens) that the evolute of a cycloid is a
congruent cycloid
• Exercise 4.5.13, in which we discover that the lines passing through three pairwise­
skew lines generate a saddle surface
• Exercises 5.1.5, 5.1.7, 9.4.11, exploring the (operator) norm of a matrix
• Exercise 5.2.15, introducing the Fermat/Steiner point of a triangle
• Exercises 5.3.2 and 5.3.4, pointing out a local minimum along every line need not
be a local minimum (an issue that is mishandled in surprisingly many multivariable
calculus texts) and that a lone critical point that is a local minimum may not be a
global minimum

• Exercises 5.4.32, 5.4.34, and 9.4.21, giving the interpretation of the Lagrange multiplier,
introducing the bordered Hessian, and giving a proof that the bordered Hessian
gives a sufficient test for constrained critical points
• Exercises 6.1.8 and 6.1.10, giving Kantorovich's Theorem (first in one dimension and
then in higher dimensions), a sufficient condition for Newton's method to converge (a beautiful
result I learned from Hubbard and Hubbard)
• Exercise 6.2.13, introducing the envelope of a family of curves
• Exercise 7.3.24, my favorite triple integral challenge problem
• Exercises 7.4.27 and 7.4.28
• Exercises 7.5.25-27, some nice applications of the determinant
• Exercises 8.3.23, 8.3.25, and 8.3.26, some interesting applications of line integration
and Green's Theorem
• Exercise 8.5.22, giving a calibration proof that the minimal surface equation gives
surfaces of least area
• The discussion in Chapter 8, Section 7, of counting roots (reminiscent of the treatment
of winding numbers and Gauss’s Law in earlier sections) and Exercises 8.7.9 and
9.4.22, in which we prove that the roots of a complex polynomial depend continuously
on its coefficients, and then derive Sylvester’s Law of Inertia as a corollary
• Exercises 9.1.12 and 9.1.13, some interesting applications of the change-of-basis
framework
• Exercises 9.2.19, 9.2.20, 9.2.23, and 9.2.24, some standard but more challenging
linear algebra exercises

► POSSIBLE WAYS TO USE THIS BOOK


I have been using the text for a number of years in a course for highly motivated freshmen
and sophomores. Since this is the first "serious" course in mathematics for many of them,
I must, because of time limitations, give somewhat short shrift to many of the complicated
analytic proofs. For example, I only have time to talk about the Inverse and Implicit Function
Theorems and to sketch the proof of the Change of Variables Theorem, and I do not include
all the technical aspects of the proof of Stokes's Theorem. On the other hand, I cover most
of the linear algebra material thoroughly. I do plenty of examples and assign a broad range
of homework problems, from the computational to the more challenging proofs.
It would also be quite appropriate to use the text in courses in advanced calculus or
multivariable analysis. Depending on the students’ background, I might bypass the linear
algebra material or assign some of it as review reading and highlight a few crucial results.
I would spend more time on the analytic material (especially in Chapters 3, 6, and 7) and
treat Stokes’s Theorem from the differential form viewpoint very carefully, including the
applications in Section 8.7. The approach of the text will give the students a very hands-on
understanding of rather abstract material. In such courses, I would spend more time in class
on proofs and assign a greater proportion of theoretical homework problems.

► ACKNOWLEDGMENTS
I would like to thank my students of the past years for enduring preliminary versions of
this text and for all their helpful comments and suggestions. I would like to acknowledge

helpful conversations with my colleagues Malcolm Adams and Jason Cantarella. I would
also like to thank the following reviewers, along with several anonymous referees, who
offered many helpful comments:

Michael T. Anderson SUNY, Stony Brook


Quo-Shin Chi Washington University
Mohamed Elhamdadi University of South Florida
Nathaniel Emerson University of California, Los Angeles
Greg Friedman Yale University
Stephen Sperber University of Minnesota
John Stalker Princeton University
Philip B. Yasskin Texas A&M University

I am very grateful to my editor, Laurie Rosatone, for her enthusiastic support, encourage­
ment, and guidance.
I welcome any comments and suggestions. Please address any e-mail correspondence
to

shifrin@math.uga.edu

and please keep an eye on

www.math.uga.edu/~shifrin/Multivariable.html

or

www.wiley.com/college/shifrin

for the latest in typos and corrections.

Theodore Shifrin
CHAPTER 1

VECTORS AND MATRICES
Linear algebra provides a beautiful example of the interplay between two branches of
mathematics, geometry and algebra. Moreover, it provides the foundations for all of our
upcoming work with calculus, which is based on the idea of approximating the general
function locally by a linear one. In this chapter, we introduce the basic language of vectors,
linear functions, and matrices. We emphasize throughout the symbiotic relation between
geometric and algebraic calculations and interpretations. This is true also of the last section,
where we discuss the determinant in two and three dimensions and define the cross product.

► 1 VECTORS IN Rⁿ
A point in Rⁿ is an ordered n-tuple of real numbers, written (x₁, ..., xₙ). To it we may associate the vector

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$

which we visualize geometrically as the arrow pointing from the origin to the point. We shall (purposely) use the boldface letter x to denote both the point and the corresponding vector, as illustrated in Figure 1.1. We denote by 0 the vector all of whose coordinates are 0, called the zero vector.
More generally, any two points A and B in space determine the arrow pointing from A to B, as shown in Figure 1.2, again specifying a vector that we denote $\overrightarrow{AB}$. We often refer to A as the "tail" of the vector $\overrightarrow{AB}$ and B as its "head." If

$$A = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},$$

then $\overrightarrow{AB}$ is equal to the vector

$$\mathbf{v} = \begin{bmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{bmatrix},$$

whose tail is at the origin, as indicated in Figure 1.2.

The Pythagorean Theorem tells us that when n = 2 the length of the vector x is $\sqrt{x_1^2 + x_2^2}$. A repeated application of the Pythagorean Theorem, as indicated in Figure 1.3, leads to the following

Figure 1.3

Definition We define the length of the vector x to be

$$\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$

We say x is a unit vector if it has length 1, i.e., if ||x|| = 1.
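For instance (a quick numerical check, with a vector of our own choosing):

$$\mathbf{x} = \begin{bmatrix} 3 \\ 4 \\ 12 \end{bmatrix} \implies \|\mathbf{x}\| = \sqrt{3^2 + 4^2 + 12^2} = \sqrt{169} = 13,$$

so x is not a unit vector, but (1/13)x is.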

There are two crucial algebraic operations one can perform on vectors, both of which
have clear geometric interpretations.

Scalar multiplication: If c is a real number and x is a vector, then we define cx to be the vector

$$c\mathbf{x} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{bmatrix}.$$

Note that cx points in either the same direction as x or the opposite direction, depending on whether c > 0 or c < 0, respectively. Thus, multiplication by the real number c simply stretches (or shrinks) the vector by a factor of |c| and reverses

Figure 1.4

its direction when c is negative. Since this is a geometric "change of scale," we refer to the real number c as a scalar and the multiplication cx as scalar multiplication.
Note that whenever x ≠ 0 we can find a unit vector with the same direction by taking

$$\frac{\mathbf{x}}{\|\mathbf{x}\|} = \frac{1}{\|\mathbf{x}\|}\,\mathbf{x},$$

as shown in Figure 1.4.


Given a nonzero vector x, any scalar multiple cx lies on the line passing through the origin and the head of the vector x. For this reason, we make the following

Definition We say two vectors x and y are parallel if one is a scalar multiple of the other, i.e., if there is a scalar c so that y = cx or x = cy. We say x and y are nonparallel if they are not parallel.

Vector addition: If x and y are vectors, we define their sum coordinate by coordinate:

$$\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{bmatrix}.$$

Geometrically, we obtain x + y by translating y so that its tail sits at the head of x, and drawing the arrow from the origin to its head. This is the so-called parallelogram law for vector addition, for, as we see in Figure 1.5, x + y is the "long" diagonal of the

Figure 1.5

parallelogram spanned by x and y. Notice that the picture makes it clear that vector addition
is commutative; i.e.,

x + y = y + x.

This also follows immediately from the algebraic definition because addition of real numbers
is commutative. (See Exercise 12 for an exhaustive list of the properties of vector addition
and scalar multiplication.)

Remark We emphasize here that the notions of vector addition and scalar multiplication make sense geometrically for vectors in the form $\overrightarrow{AB}$ that do not necessarily have their tails at the origin. If we wish to add $\overrightarrow{AB}$ to $\overrightarrow{CD}$, we simply recall that $\overrightarrow{CD}$ is equal to any vector with the same length and direction, so we just translate $\overrightarrow{CD}$ so that C and B coincide; then the arrow from A to the point D in its new position is the sum $\overrightarrow{AB} + \overrightarrow{CD}$.

Subtraction of one vector from another is easy to define algebraically. If x and y are as above, then we set

$$\mathbf{x} - \mathbf{y} = \begin{bmatrix} x_1 - y_1 \\ \vdots \\ x_n - y_n \end{bmatrix}.$$

As is the case with real numbers, we have the following interpretation of the difference x − y: It is the vector we add to y in order to obtain x; i.e.,

(x − y) + y = x.

Pictorially, we see that x − y is drawn, as shown in Figure 1.6, by putting its tail at y and its head at x, thereby resulting in the other diagonal of the parallelogram determined by x and y. Note that if A and B are points in space and we set $\mathbf{x} = \overrightarrow{OA}$ and $\mathbf{y} = \overrightarrow{OB}$, then $\mathbf{y} - \mathbf{x} = \overrightarrow{AB}$. Moreover, as Figure 1.6 also suggests, we have x − y = x + (−y).

Figure 1.6

► EXAMPLE 1

Let A and B be points in Rⁿ. The midpoint M of the line segment joining them is the point halfway from A to B; that is, $\overrightarrow{AM} = \tfrac12\overrightarrow{AB}$. Using the notation as above, we set $\mathbf{x} = \overrightarrow{OA}$ and $\mathbf{y} = \overrightarrow{OB}$, and we have

$$(*)\qquad \overrightarrow{OM} = \mathbf{x} + \overrightarrow{AM} = \mathbf{x} + \tfrac12\overrightarrow{AB} = \mathbf{x} + \tfrac12(\mathbf{y} - \mathbf{x}) = \tfrac12(\mathbf{x} + \mathbf{y}).$$

In particular, the vector from the origin to the midpoint of AB is the average of the vectors x and y. See Exercise 8 for a generalization to three vectors and Section 4 of Chapter 7 for more.
From this formula follows one of the classic results from high school geometry: The diagonals of a parallelogram bisect one another. We've seen that the midpoint M of AB is, by virtue of the formula (*), also the midpoint of diagonal OC. (See Figure 1.7.) ◄

Figure 1.7
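To see the formula (*) with explicit numbers (our own illustrative choice): for A = (1, 2) and B = (5, 4),

$$\overrightarrow{OM} = \tfrac12\left(\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 5 \\ 4 \end{bmatrix}\right) = \begin{bmatrix} 3 \\ 3 \end{bmatrix},$$

and (3, 3) is indeed halfway between A and B coordinate by coordinate.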

It should now be evident that vector methods provide a great tool for translating theorems from Euclidean geometry into simple algebraic statements. Here is another example. Recall that a median of a triangle is a line segment from a vertex to the midpoint of the opposite side.

Proposition 1.1 The medians of a triangle intersect at a point that is two-thirds of the way from each vertex to the opposite side.

Proof We may put one of the vertices of the triangle at the origin, so that the picture is as shown in Figure 1.8(a). Let $\mathbf{x} = \overrightarrow{OA}$, $\mathbf{y} = \overrightarrow{OB}$, and let L, M, and N be the midpoints of OA, AB, and OB, respectively. The battle plan is the following: We let P denote the point 2/3 of the way from B to L, Q the point 2/3 of the way from O to M, and R the point 2/3 of the way from A to N. Although we've indicated P, Q, and R as distinct points in Figure 1.8(b), our goal is to prove that P = Q = R; we do this by expressing all the vectors $\overrightarrow{OP}$, $\overrightarrow{OQ}$, and $\overrightarrow{OR}$ in terms of x and y.

Figure 1.8

$$\overrightarrow{OP} = \overrightarrow{OB} + \overrightarrow{BP} = \overrightarrow{OB} + \tfrac23\overrightarrow{BL} = \mathbf{y} + \tfrac23\left(\tfrac12\mathbf{x} - \mathbf{y}\right) = \tfrac13\mathbf{x} + \tfrac13\mathbf{y};$$

$$\overrightarrow{OQ} = \tfrac23\overrightarrow{OM} = \tfrac23\left(\tfrac12(\mathbf{x} + \mathbf{y})\right) = \tfrac13(\mathbf{x} + \mathbf{y});\quad\text{and}$$

$$\overrightarrow{OR} = \overrightarrow{OA} + \overrightarrow{AR} = \overrightarrow{OA} + \tfrac23\overrightarrow{AN} = \mathbf{x} + \tfrac23\left(\tfrac12\mathbf{y} - \mathbf{x}\right) = \tfrac13\mathbf{x} + \tfrac13\mathbf{y}.$$

We conclude that $\overrightarrow{OP} = \overrightarrow{OQ} = \overrightarrow{OR}$, as desired, and so P = Q = R. That is, if we go 2/3 of the way down any of the medians, we end up at the same point; this is, of course, the point of intersection of the three medians. ■

The astute reader might notice that we could have been more economical in the last
proof. Suppose we merely check that the points 2/3 of the way down two of the medians
(say P and Q) agree. It would then follow (say, by relabeling the triangle slightly) that the
same is true of a different pair of medians (say P and R). But since any two pairs must
have a point in common, we may now conclude that all three points are equal.

► EXERCISES 1.1

1. Given x and y, calculate the following both algebraically and geometrically.
(a) x + y
(b) x − y
(c) x + 2y
(d) ½x + ½y
(e) y − x
(f) 2x − y
(g) ||x||
(h) x/||x||

'1" ”2" ‘3"


*2. Three vertices of a parallelogram are 2 4 , and 1 What are all the possible positions
_1_ _3 _ _5 _
of the fourth vertex? Give your reasoning.
3. The origin is at the center of a regular polygon.
(a) What is the sum of the vectors to each of the vertices of the polygon? Give your reasoning. (Hint:
What are the symmetries of the polygon?)
(b) What is the sum of the vectors from one fixed vertex to each of the remaining vertices? Give
your reasoning.
4. Given ΔABC, let M and N be the midpoints of AB and AC, respectively. Prove that $\overrightarrow{MN} = \tfrac12\overrightarrow{BC}$.

5. Let ABCD be an arbitrary quadrilateral. Let P, Q, R, and S be the midpoints of AB, BC, CD, and DA, respectively. Use vector methods to prove that PQRS is a parallelogram. (Hint: Use Exercise 4.)
*6. In ΔABC pictured in Figure 1.9, $\|\overrightarrow{AD}\| = \tfrac12\|\overrightarrow{AB}\|$ and $\|\overrightarrow{CE}\| = \tfrac13\|\overrightarrow{CB}\|$. Let Q denote the midpoint of CD; show that $\overrightarrow{AQ} = c\,\overrightarrow{AE}$ for some scalar c and determine the ratio $c = \|\overrightarrow{AQ}\|/\|\overrightarrow{AE}\|$. In what ratio does CD divide AE?

Figure 1.9
7. Consider parallelogram ABCD. Suppose $\overrightarrow{AE} = \tfrac23\overrightarrow{AB}$ and $\overrightarrow{DP} = \tfrac35\overrightarrow{DE}$. Show that P lies on the diagonal AC. (See Figure 1.10.)

Figure 1.10

8. Let A, B, and C be vertices of a triangle in R³. Let $\mathbf{x} = \overrightarrow{OA}$, $\mathbf{y} = \overrightarrow{OB}$, and $\mathbf{z} = \overrightarrow{OC}$. Show that the head of the vector $\mathbf{v} = \tfrac13(\mathbf{x} + \mathbf{y} + \mathbf{z})$ lies on each median of ΔABC (and thus is the point of intersection of the three medians). It follows (see Section 4 of Chapter 7) that when we put equal masses at A, B, and C, the center of mass of that system is given by the intersection of the medians of the triangle.
9. (a) Let u, v ∈ R². Describe the vectors x = su + tv, where s + t = 1. Pay particular attention to the location of x when s > 0 and when t > 0.
(b) Let u, v, w ∈ R³. Describe the vectors x = ru + sv + tw, where r + s + t = 1. Pay particular attention to the location of x when each of r, s, and t is positive.
10. Suppose x, y ∈ Rⁿ are nonparallel vectors. (Recall the definition on p. 3.)
(a) Prove that if sx + ty = 0, then s = t = 0. (Hint: Show that neither s ≠ 0 nor t ≠ 0 is possible.)
(b) Prove that if ax + by = cx + dy, then a = c and b = d.
11. "Discover" the fraction 2/3 that appears in Proposition 1.1 by finding the intersection of two medians. (Hint: A point on the line OM can be written in the form t(x + y) for some scalar t, and a point on the line AN can be written in the form x + s(½y − x) for some scalar s. You will need to use the result of Exercise 10.)
12. Verify both algebraically and geometrically that the following properties of vector arithmetic hold. (Do so for n = 2 if the general case is too intimidating.)
(a) For all x, y ∈ Rⁿ, x + y = y + x.
(b) For all x, y, z ∈ Rⁿ, (x + y) + z = x + (y + z).
(c) 0 + x = x for all x ∈ Rⁿ.

(d) For each x ∈ Rⁿ, there is a vector −x so that x + (−x) = 0.
(e) For all c, d ∈ R and x ∈ Rⁿ, c(dx) = (cd)x.
(f) For all c ∈ R and x, y ∈ Rⁿ, c(x + y) = cx + cy.
(g) For all c, d ∈ R and x ∈ Rⁿ, (c + d)x = cx + dx.
(h) For all x ∈ Rⁿ, 1x = x.
♯13. (a) Using only the properties listed in Exercise 12, prove that for any x ∈ Rⁿ, we have 0x = 0. (It often surprises students that this is a consequence of the properties in Exercise 12.)
(b) Using the result of part a, prove that (−1)x = −x. (Be sure that you didn't use this fact in your proof of part a!)

► 2 DOT PRODUCT


We discuss next one of the crucial constructions in linear algebra, the dot product x · y of two vectors x, y ∈ Rⁿ. By way of motivation, let's recall some basic results from plane geometry. Let

$$P = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad\text{and}\quad Q = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$

be points in the plane, as pictured in Figure 2.1. Then we observe that when ∠POQ is a right angle, ΔOAP is similar to ΔOBQ, and so x₂/x₁ = −y₁/y₂, whence x₁y₁ + x₂y₂ = 0. This leads us to make the following

Definition Given vectors x, y ∈ R², define their dot product

$$\mathbf{x}\cdot\mathbf{y} = x_1y_1 + x_2y_2.$$

More generally, given vectors x, y ∈ Rⁿ, define their dot product

$$\mathbf{x}\cdot\mathbf{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$

We know that when the vectors x and y ∈ R² are perpendicular, their dot product is 0. By starting with the algebraic properties of the dot product, we are able to get a great deal of geometry out of it.
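For instance (an illustrative pair of our own):

$$\mathbf{x} = \begin{bmatrix} 1 \\ 3 \end{bmatrix},\quad \mathbf{y} = \begin{bmatrix} -3 \\ 1 \end{bmatrix} \implies \mathbf{x}\cdot\mathbf{y} = (1)(-3) + (3)(1) = 0,$$

so x and y are perpendicular, in accordance with the plane-geometry observation above.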

Proposition 2.1 The dot product has the following properties:

1. x · y = y · x for all x, y ∈ Rⁿ (dot product is commutative);
2. x · x = ||x||² ≥ 0 and x · x = 0 ⟺ x = 0;
3. (cx) · y = c(x · y) for all x, y ∈ Rⁿ and c ∈ R;
4. x · (y + z) = x · y + x · z for all x, y, z ∈ Rⁿ (the distributive property).

Proof In order to simplify the notation, we give the proof with n = 2. Since multiplication of real numbers is commutative, we have

$$\mathbf{x}\cdot\mathbf{y} = x_1y_1 + x_2y_2 = y_1x_1 + y_2x_2 = \mathbf{y}\cdot\mathbf{x}.$$

The square of a real number is nonnegative and the sum of nonnegative numbers is nonnegative, so $\mathbf{x}\cdot\mathbf{x} = x_1^2 + x_2^2 \ge 0$ and is equal to 0 only when x₁ = x₂ = 0. The next property follows from the associative and distributive properties of real numbers:

$$(c\mathbf{x})\cdot\mathbf{y} = (cx_1)y_1 + (cx_2)y_2 = c(x_1y_1) + c(x_2y_2) = c(x_1y_1 + x_2y_2) = c(\mathbf{x}\cdot\mathbf{y}).$$

The last result follows from the commutative, associative, and distributive properties of real numbers:

$$\mathbf{x}\cdot(\mathbf{y} + \mathbf{z}) = x_1(y_1 + z_1) + x_2(y_2 + z_2) = x_1y_1 + x_1z_1 + x_2y_2 + x_2z_2 = (x_1y_1 + x_2y_2) + (x_1z_1 + x_2z_2) = \mathbf{x}\cdot\mathbf{y} + \mathbf{x}\cdot\mathbf{z}. \ \blacksquare$$

Corollary 2.2 $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2\,\mathbf{x}\cdot\mathbf{y} + \|\mathbf{y}\|^2$.

Proof Using the properties repeatedly, we have

$$\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y})\cdot(\mathbf{x} + \mathbf{y}) = \mathbf{x}\cdot\mathbf{x} + \mathbf{x}\cdot\mathbf{y} + \mathbf{y}\cdot\mathbf{x} + \mathbf{y}\cdot\mathbf{y} = \|\mathbf{x}\|^2 + 2\,\mathbf{x}\cdot\mathbf{y} + \|\mathbf{y}\|^2,$$

as desired. ■

The geometric meaning of this result comes from the Pythagorean Theorem: When x and y are perpendicular vectors in R², then we have ||x + y||² = ||x||² + ||y||², and so, by Corollary 2.2, it must be the case that x · y = 0. (And the converse follows, too, from the converse of the Pythagorean Theorem.) That is, two vectors in R² are perpendicular if and only if their dot product is 0.
Motivated by this, we use the algebraic definition of dot product of vectors in Rⁿ to bring in the geometry. In keeping with current use of the terminology and falling prey to the penchant to have several names for the same thing, we make the following

Definition We say vectors x and y are orthogonal if x • y = 0.

Armed with this definition, we proceed to a construction that will be important in much of our future work. Starting with two vectors x, y ∈ Rⁿ, where y ≠ 0, Figure 2.2 suggests that we should be able to write x as the sum of a vector, x∥, that is parallel to y and a vector, x⊥, that is orthogonal to y. Let's suppose we have such an equation:

x = x∥ + x⊥, where

x∥ is a scalar multiple of y and x⊥ is orthogonal to y.

To say that x∥ is a scalar multiple of y means that we can write x∥ = cy for some scalar c. Now, assuming such an expression exists, we can determine c by taking the dot product of both sides of the equation with y:

$$\mathbf{x}\cdot\mathbf{y} = (\mathbf{x}^\parallel + \mathbf{x}^\perp)\cdot\mathbf{y} = (\mathbf{x}^\parallel\cdot\mathbf{y}) + (\mathbf{x}^\perp\cdot\mathbf{y}) = \mathbf{x}^\parallel\cdot\mathbf{y} = (c\mathbf{y})\cdot\mathbf{y} = c\|\mathbf{y}\|^2.$$

This means that

$$c = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}, \quad\text{and so}\quad \mathbf{x}^\parallel = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y}.$$

The vector x∥ is called the projection of x onto y, written proj_y x.


The fastidious reader may be puzzled by the logic here. We have apparently assumed that we can write x = x∥ + x⊥ in order to prove that we can do so. Of course, as it stands, this is not fair. Here's how we fix it. We now define

$$\mathbf{x}^\parallel = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y}, \qquad \mathbf{x}^\perp = \mathbf{x} - \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y}.$$

Obviously, x∥ + x⊥ = x and x∥ is a scalar multiple of y. All we need to check is that x⊥ is in fact orthogonal to y. Well,

$$\mathbf{x}^\perp\cdot\mathbf{y} = \left(\mathbf{x} - \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y}\right)\cdot\mathbf{y} = \mathbf{x}\cdot\mathbf{y} - \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y}\cdot\mathbf{y} = \mathbf{x}\cdot\mathbf{y} - \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}\,\|\mathbf{y}\|^2 = \mathbf{x}\cdot\mathbf{y} - \mathbf{x}\cdot\mathbf{y} = 0,$$

as required. Note, moreover, that x∥ is the unique scalar multiple of y that satisfies the equation (x − x∥) · y = 0.

► EXAMPLE 1

Let $\mathbf{x} = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$. Then x · y = (2)(−1) + (3)(1) + (1)(1) = 2 and ||y||² = 3, so

$$\mathbf{x}^\parallel = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}\,\mathbf{y} = \frac23\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \quad\text{and}\quad \mathbf{x}^\perp = \mathbf{x} - \mathbf{x}^\parallel = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} - \frac23\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 8/3 \\ 7/3 \\ 1/3 \end{bmatrix}.$$

To double-check, we compute

$$\mathbf{x}^\perp\cdot\mathbf{y} = \begin{bmatrix} 8/3 \\ 7/3 \\ 1/3 \end{bmatrix}\cdot\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = -\tfrac83 + \tfrac73 + \tfrac13 = 0,$$

as it should be. ◄

Suppose x, y ∈ R². We shall see next that the formula for the projection of x onto y enables us to calculate the angle between the vectors x and y. Consider the right triangle in Figure 2.3; let θ denote the angle between the vectors x and y. Remembering that the cosine of an angle is the ratio of the signed length of the adjacent side to the length of the hypotenuse, we see that

$$\cos\theta = \frac{\text{signed length of } \mathbf{x}^\parallel}{\text{length of } \mathbf{x}} = \frac{\dfrac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}\,\|\mathbf{y}\|}{\|\mathbf{x}\|} = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}.$$

This, then, is the geometric interpretation of the dot product:

$$\mathbf{x}\cdot\mathbf{y} = \|\mathbf{x}\|\,\|\mathbf{y}\|\cos\theta.$$
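As a quick numerical illustration (our own choice of vectors):

$$\mathbf{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\quad \mathbf{y} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \implies \cos\theta = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|} = \frac{1}{1\cdot\sqrt2} = \frac{1}{\sqrt2},$$

so θ = π/4, exactly the angle the diagonal of a square makes with its side.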

Will this formula still make sense even when x, y ∈ Rⁿ? Geometrically, we simply restrict our attention to the plane spanned by x and y and measure the angle θ in that plane, and so we blithely make the

Definition Let x and y be nonzero vectors in Rⁿ. We define the angle between them to be the unique θ satisfying 0 ≤ θ ≤ π so that

$$\cos\theta = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}.$$

Since our geometric intuition may be misleading in Rⁿ, we should check algebraically that this definition makes sense. Since |cos θ| ≤ 1, the following result gives us what is needed.

Proposition 2.3 (Cauchy-Schwarz Inequality) If x, y ∈ Rⁿ, then

$$|\mathbf{x}\cdot\mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|.$$

Moreover, equality holds if and only if one of the vectors is a scalar multiple of the other.

Proof If y = 0, then there's nothing to prove. If y ≠ 0, then we observe that the quadratic function of t given by

$$g(t) = \|\mathbf{x} + t\mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2t\,\mathbf{x}\cdot\mathbf{y} + t^2\|\mathbf{y}\|^2$$

takes its minimum at $t_0 = -\dfrac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{y}\|^2}$. The minimum value

$$g(t_0) = \|\mathbf{x}\|^2 - \frac{2(\mathbf{x}\cdot\mathbf{y})^2}{\|\mathbf{y}\|^2} + \frac{(\mathbf{x}\cdot\mathbf{y})^2}{\|\mathbf{y}\|^2} = \|\mathbf{x}\|^2 - \frac{(\mathbf{x}\cdot\mathbf{y})^2}{\|\mathbf{y}\|^2}$$

is necessarily nonnegative, so

$$(\mathbf{x}\cdot\mathbf{y})^2 \le \|\mathbf{x}\|^2\|\mathbf{y}\|^2,$$

and, since square root preserves inequality,

$$|\mathbf{x}\cdot\mathbf{y}| \le \|\mathbf{x}\|\,\|\mathbf{y}\|,$$

as desired. Equality holds if and only if x + ty = 0 for some scalar t. (See Exercise 9 for a discussion of how this proof relates to our formula for proj_y x above.) ■

One of the most useful applications of this result is the famed triangle inequality, which
tells us that the sum of the lengths of two sides of a triangle cannot be less than the length
of the third.

Corollary 2.4 (Triangle Inequality) For any vectors x, y ∈ Rⁿ, we have ||x + y|| ≤ ||x|| + ||y||.

Proof By Corollary 2.2 and Proposition 2.3 we have

$$\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + 2\,\mathbf{x}\cdot\mathbf{y} + \|\mathbf{y}\|^2 \le \|\mathbf{x}\|^2 + 2\|\mathbf{x}\|\,\|\mathbf{y}\| + \|\mathbf{y}\|^2 = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2.$$

Since square root preserves inequality, we conclude that ||x + y|| ≤ ||x|| + ||y||, as desired. ■

Remark The dot product also arises in situations removed from geometry. The economist introduces the commodity vector x, whose entries are the quantities of various commodities that happen to be of interest, and the price vector p. For example, we might consider

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} \quad\text{and}\quad \mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \\ p_5 \end{bmatrix} \in \mathbb{R}^5,$$

where x₁ represents the number of pounds of flour, x₂ the number of dozens of eggs, x₃ the number of pounds of chocolate chips, x₄ the number of pounds of walnuts, and x₅ the number of pounds of butter needed to produce a certain massive quantity of chocolate chip cookies, and pᵢ is the price (in dollars) of a unit of the i-th commodity (e.g., p₂ is the price of a dozen eggs). Then it is easy to see that

$$\mathbf{p}\cdot\mathbf{x} = p_1x_1 + p_2x_2 + p_3x_3 + p_4x_4 + p_5x_5$$

is the total cost of producing the massive quantity of cookies. (To be realistic, we might also want to include x₆ as the number of hours of labor, with corresponding hourly wage p₆.) We will return to this interpretation in Section 4.
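To make the remark concrete, here is a small numerical instance (quantities and prices invented purely for illustration):

$$\mathbf{x} = \begin{bmatrix} 50 \\ 10 \\ 20 \\ 5 \\ 15 \end{bmatrix},\quad \mathbf{p} = \begin{bmatrix} 0.40 \\ 2.00 \\ 3.50 \\ 6.00 \\ 4.00 \end{bmatrix} \implies \mathbf{p}\cdot\mathbf{x} = 20 + 20 + 70 + 30 + 60 = 200,$$

a total cost of $200.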

► EXERCISES 1.2
1. For each of the following pairs of vectors x and y, calculate x · y and the angle θ between them.
*2. For each pair of vectors in Exercise 1, calculate proj_y x and proj_x y.
*3. Find the angle between the long diagonal of a cube and a face diagonal.
4. Find the angle that the long diagonal of a 3 × 4 × 5 rectangular box makes with the longest edge.
5. Suppose x, y ∈ Rⁿ, ||x|| = 2, ||y|| = 1, and the angle θ between x and y is θ = arccos(1/4). Prove that the vectors x − 3y and x + y are orthogonal.
6. Suppose x, y, z ∈ R² are unit vectors satisfying x + y + z = 0. What can you say about the angles between each pair?
’ 1 "1 "o~| “0“

7. Letei = 0 .e2 = 1 , and 63 = 0 be the so-called standard basis vectors for R3. Let
_0_ _0_ _ 1 _
x e R3 be a nonzero vector. For i = 1,2,3, let 0(- denote the angle between x and e;. Compute
COS2 01 + COS2 02 + COS2 03.

*8. Let $\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$, $\mathbf{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ \vdots \\ n \end{bmatrix} \in \mathbb{R}^n$. Let θₙ be the angle between x and y in Rⁿ. Find $\lim_{n\to\infty}\theta_n$. (Hint: You may need to recall the formulas for $1 + 2 + \cdots + n$ and $1^2 + 2^2 + \cdots + n^2$ from your beginning calculus course.)

9. With regard to the proof of Proposition 2.3, how is t₀y related to x∥? What does this say about proj_y x?
10. Use vector methods to prove that a parallelogram is a rectangle if and only if its diagonals have
the same length.
11. Use the fundamental properties of the dot product to prove that

$$\|\mathbf{x} + \mathbf{y}\|^2 + \|\mathbf{x} - \mathbf{y}\|^2 = 2\left(\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2\right).$$

Interpret the result geometrically.
*12. Use the dot product to prove the law of cosines: As shown in Figure 2.4,

$$c^2 = a^2 + b^2 - 2ab\cos\theta.$$

13. Use vector methods to prove that the diagonals of a parallelogram are orthogonal if and only if
the parallelogram is a rhombus (i.e., has all sides of equal length).
14. Use vector methods to prove that a triangle inscribed in a circle and having a diameter as one of its sides must be a right triangle. (Hint: See Figure 2.5.)
Geometric challenge: More generally, given two points A and B in the plane, what is the locus of points X so that ∠AXB has a fixed measure?

15. (a) Let y ∈ Rⁿ. If x · y = 0 for all x ∈ Rⁿ, then prove that y = 0.
(b) Suppose y, z ∈ Rⁿ and x · y = x · z for all x ∈ Rⁿ. What can you conclude?

16. If $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$, set $\rho(\mathbf{x}) = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix}$.
(a) Check that ρ(x) is orthogonal to x; indeed, ρ(x) is obtained by rotating x an angle π/2 counterclockwise.
(b) Given x, y ∈ R², prove that x · ρ(y) = −ρ(x) · y. Interpret this statement geometrically.
♯17. Prove that for any vectors x, y ∈ Rⁿ, we have ||x|| − ||y|| ≤ ||x − y||. Deduce that | ||x|| − ||y|| | ≤ ||x − y||. (Hint: Apply the result of Corollary 2.4 directly.)

18. Use the Cauchy-Schwarz inequality to solve the following max/min problem: If the (long) diagonal of a rectangular box has length c, how large can the sum of the length, width, and height of the box be? For what shape box does the maximum occur?
19. Give an alternative proof of the Cauchy-Schwarz inequality, as follows. Let a = ||x||, b = ||y||, and deduce from ||bx − ay||² ≥ 0 that x · y ≤ ab. Now how do you show that |x · y| ≤ ab? When does equality hold?
♯*20. (a) Let x and y be vectors with ||x|| = ||y||. Prove that the vector x + y bisects the angle between x and y.
(b) More generally, if x and y are arbitrary nonzero vectors, let a = ||x|| and b = ||y||. Prove that the vector bx + ay bisects the angle between x and y.
21. Use vector methods to prove that the diagonals of a parallelogram bisect the vertex angles if and
only if the parallelogram is a rhombus.
22. Given ΔABC with D on BC as shown in Figure 2.6, prove that if AD bisects ∠BAC, then $\|\overrightarrow{BD}\|/\|\overrightarrow{CD}\| = \|\overrightarrow{AB}\|/\|\overrightarrow{AC}\|$. (Hint: Use Exercise 20b. Let $\mathbf{x} = \overrightarrow{AB}$ and $\mathbf{y} = \overrightarrow{AC}$; give two expressions for $\overrightarrow{AD}$ in terms of x and y and use Exercise 1.1.10.)

23. Use vector methods to prove that the angle bisectors of a triangle have a common point. (Hint: Given ΔOAB, let $\mathbf{x} = \overrightarrow{OA}$, $\mathbf{y} = \overrightarrow{OB}$, $a = \|\overrightarrow{OA}\|$, $b = \|\overrightarrow{OB}\|$, and $c = \|\overrightarrow{AB}\|$. If we define the point P by $\overrightarrow{OP} = \frac{1}{a+b+c}\,(b\mathbf{x} + a\mathbf{y})$, use Exercise 20b to show that P lies on all three angle bisectors.)

24. Use vector methods to prove that the altitudes of a triangle have a common point. Recall that the altitudes of a triangle are the lines passing through a vertex and perpendicular to the opposite side. (Hint: See Figure 2.7. Let C be the point of intersection of the altitude from B to OA and the altitude from A to OB. Prove that $\overrightarrow{OC}$ is orthogonal to $\overrightarrow{AB}$.)

25. Use vector methods to prove that the perpendicular bisectors of the sides of a triangle intersect in a point, as follows. Assume the triangle OAB has one vertex at the origin, and let $\mathbf{x} = \overrightarrow{OA}$ and $\mathbf{y} = \overrightarrow{OB}$.

(a) Let z be the point of intersection of the perpendicular bisectors of OA and OB. Prove that (using the notation of Exercise 16)

$$\mathbf{z} = \tfrac12\mathbf{x} + c\,\rho(\mathbf{x}), \quad\text{where}\quad c = \frac{\|\mathbf{y}\|^2 - \mathbf{x}\cdot\mathbf{y}}{2\,\rho(\mathbf{x})\cdot\mathbf{y}}.$$

(b) Show that z lies on the perpendicular bisector of AB. (Hint: What is the dot product of $\mathbf{z} - \tfrac12(\mathbf{x} + \mathbf{y})$ with y − x?)
26. Let P be the intersection of the medians of ΔOAB (see Proposition 1.1), Q the intersection of its altitudes (see Exercise 24), and R the intersection of the perpendicular bisectors of its sides (see Exercise 25). Show that P, Q, and R are collinear and that P is two-thirds of the way from Q to R. Does the intersection of the angle bisectors (see Exercise 23) lie on this line as well?

► 3 SUBSPACES OF Rⁿ


As we proceed in our study of "linear objects," it is fundamental to concentrate on subsets of Rⁿ that are generalizations of lines and planes through the origin.

Definition A set V ⊂ Rⁿ (a subset of Rⁿ) is called a subspace of Rⁿ if it satisfies the following properties:

1. 0 ∈ V (the zero vector belongs to V);
2. whenever v ∈ V and c ∈ R, we have cv ∈ V (V is closed under scalar multiplication);
3. whenever v, w ∈ V, we have v + w ∈ V (V is closed under addition).

► EXAMPLE 1

Let's begin with some familiar examples.

a. The trivial subspace consisting of just the zero vector 0 ∈ Rⁿ is a subspace, since c0 = 0 for any scalar c and 0 + 0 = 0.
b. Rⁿ itself is likewise a subspace of Rⁿ.
c. Fix a nonzero vector u ∈ Rⁿ, and consider

ℓ = {x ∈ Rⁿ : x = tu for some t ∈ R}.

We check that the three criteria hold:

1. Setting t = 0, we see that 0 ∈ ℓ.
2. If v ∈ ℓ and c ∈ R, then v = tu for some t ∈ R, and so cv = c(tu) = (ct)u, which is again a scalar multiple of u and hence an element of ℓ.
3. If v, w ∈ ℓ, this means that v = su and w = tu for some scalars s and t. Then v + w = su + tu = (s + t)u, so v + w ∈ ℓ, as needed.

ℓ is called a line through the origin.

d. Fix two nonparallel vectors u and v ∈ Rⁿ. Set

𝒫 = {x ∈ Rⁿ : x = su + tv for some s, t ∈ R},


as shown in Figure 3.1. 𝒫 is called a plane through the origin. To see that 𝒫 is a subspace, we do the obligatory checks:

1. Setting s = t = 0, we see that 0 = 0u + 0v, so 0 ∈ 𝒫.
2. Suppose x ∈ 𝒫 and c ∈ R. Then x = su + tv for some scalars s and t, and cx = c(su + tv) = (cs)u + (ct)v, so cx ∈ 𝒫 as well.
3. Suppose x, y ∈ 𝒫. This means that x = su + tv for some scalars s and t, and y = s′u + t′v for some scalars s′ and t′. Then

x + y = (su + tv) + (s′u + t′v) = (s + s′)u + (t + t′)v,

so x + y ∈ 𝒫, as required.

Figure 3.1

e. Fix a nonzero vector A ∈ Rⁿ, and consider

V = {x ∈ Rⁿ : A · x = 0}.

V consists of all vectors orthogonal to the given vector A, as pictured in Figure 3.2. We check once again that the three criteria hold:

1. Since A · 0 = 0, we know that 0 ∈ V.
2. Suppose v ∈ V and c ∈ R. Then A · (cv) = c(A · v) = 0, so cv ∈ V.
Figure 3.2

3. Suppose v, w ∈ V. Then A · (v + w) = (A · v) + (A · w) = 0 + 0 = 0, so v + w ∈ V, as required.

Thus, V is a subspace of Rⁿ. We call V a hyperplane in Rⁿ, having normal vector A. More generally, given any collection of vectors A₁, ..., Aₘ ∈ Rⁿ, the set of solutions of the homogeneous system of linear equations

A₁ · x = 0, A₂ · x = 0, ..., Aₘ · x = 0

forms a subspace of Rⁿ.
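For instance (an example of our own): taking A = (1, −1, 2) gives the hyperplane

$$V = \{\mathbf{x} \in \mathbb{R}^3 : x_1 - x_2 + 2x_3 = 0\},$$

which contains, e.g., (1, 1, 0) and (2, 0, −1) but not (1, 0, 0).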

► EXAMPLE 2

Let's consider next a few subsets of R², as pictured in Figure 3.3, that are not subspaces.

a. $S = \{\mathbf{x} \in \mathbb{R}^2 : x_2 = 2x_1 + 1\}$ is not a subspace. All three criteria fail, but it suffices to point out 0 ∉ S.
b. $S = \{\mathbf{x} \in \mathbb{R}^2 : x_1x_2 = 0\}$ is not a subspace. Each of the vectors $\mathbf{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{w} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ lies in S, and yet their sum $\mathbf{v} + \mathbf{w} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ does not.
c. $S = \{\mathbf{x} \in \mathbb{R}^2 : x_2 \ge 0\}$ is not a subspace. The vector $\mathbf{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ lies in S, and yet any negative scalar multiple of it, e.g., $(-2)\mathbf{v} = \begin{bmatrix} 0 \\ -2 \end{bmatrix}$, does not.

Figure 3.3

Given a collection of vectors in Rⁿ, it is natural to try to "build" a subspace from them. We begin with some crucial definitions.

Definition Let v₁, ..., vₖ ∈ Rⁿ. If c₁, ..., cₖ ∈ R, the vector

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k$$

(as illustrated in Figure 3.4) is called a linear combination of v₁, ..., vₖ. The set of all linear combinations of v₁, ..., vₖ is called their span, denoted Span(v₁, ..., vₖ).

Every vector in Rⁿ can be written as a linear combination of the vectors

$$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix},\quad \ldots,\quad \mathbf{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$

The vectors e₁, ..., eₙ are often called the standard basis vectors for Rⁿ. Obviously, given the vector

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$

we have x = x₁e₁ + x₂e₂ + ⋯ + xₙeₙ.
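As a tiny concrete case (our own numbers):

$$\begin{bmatrix} 3 \\ -2 \end{bmatrix} = 3\mathbf{e}_1 + (-2)\mathbf{e}_2 = 3\begin{bmatrix} 1 \\ 0 \end{bmatrix} - 2\begin{bmatrix} 0 \\ 1 \end{bmatrix},$$

exhibiting (3, −2) as a linear combination of the standard basis vectors of R².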

Proposition 3.1 Let v₁, ..., vₖ ∈ Rⁿ. Then V = Span(v₁, ..., vₖ) is a subspace of Rⁿ.

Proof We check that all three criteria hold.

1. To see that 0 ∈ V, we merely take c₁ = c₂ = ⋯ = cₖ = 0. Then (by Exercise 1.1.13) c₁v₁ + c₂v₂ + ⋯ + cₖvₖ = 0v₁ + ⋯ + 0vₖ = 0 + ⋯ + 0 = 0.
2. Suppose v ∈ V and c ∈ R. By definition, there are scalars c₁, ..., cₖ so that v = c₁v₁ + c₂v₂ + ⋯ + cₖvₖ. Thus,

cv = c(c₁v₁ + c₂v₂ + ⋯ + cₖvₖ) = (cc₁)v₁ + (cc₂)v₂ + ⋯ + (ccₖ)vₖ,

which is again a linear combination of v₁, ..., vₖ, so cv ∈ V, as desired.

3. Suppose v, w ∈ V. This means there are scalars c₁, ..., cₖ and d₁, ..., dₖ so that

v = c₁v₁ + ⋯ + cₖvₖ and w = d₁v₁ + ⋯ + dₖvₖ;

adding, we obtain

v + w = (c₁v₁ + ⋯ + cₖvₖ) + (d₁v₁ + ⋯ + dₖvₖ) = (c₁ + d₁)v₁ + ⋯ + (cₖ + dₖ)vₖ,

which is again a linear combination of v₁, ..., vₖ, hence an element of V.

This completes the verification that V is a subspace of Rⁿ. ■

Remark Let V ⊂ Rⁿ be a subspace and let v₁, ..., vₖ ∈ V. We say that v₁, ..., vₖ span V if Span(v₁, ..., vₖ) = V. (The point here is that every vector in V must be a linear combination of the vectors v₁, ..., vₖ.) As we shall see in Chapter 4, it takes at least n vectors to span Rⁿ; the smallest number of vectors required to span a given subspace will be a measure of its "size" or "dimension."

► EXAMPLE 3

The plane

$$\mathcal{P}_1 = \left\{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x} = s\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + t\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \text{ for some } s, t \in \mathbb{R}\right\}$$

is the span of the vectors $\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}$ and hence, by Proposition 3.1, is a subspace of R³. On the other hand, the plane

$$\mathcal{P}_2 = \left\{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + t\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} \text{ for some } s, t \in \mathbb{R}\right\}$$

is not a subspace. This is most easily verified by checking that 0 ∉ 𝒫₂, for 0 ∈ 𝒫₂ precisely when we can find values of s and t so that

$$\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + t\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}.$$

This amounts to the system of equations

s + 2t = −1
−s = 0
2s + t = 0,

which we easily see has no solution.

A word of warning here: We might have expressed 𝒫₁ itself in a form that includes a "shifting" term, provided that term lies in the span of the two given vectors; despite the presence of the shifting term, such a plane still passes through the origin. ◄

There are really two different ways in which subspaces of Rⁿ arise: as being the span of a collection of vectors (the "parametric" approach) or as being the set of solutions of a (homogeneous) system of linear equations (the "implicit" approach). We shall study the connections between the two in detail in Chapter 4.

► EXAMPLE 4

As the reader can verify, the vector $\mathbf{A} = \begin{bmatrix} -1 \\ 3 \\ 2 \end{bmatrix}$ is orthogonal to both of the vectors that span the plane 𝒫₁ given in Example 3 above. Thus, every vector in 𝒫₁ is orthogonal to A, and we suspect that

$$\mathcal{P}_1 = \{\mathbf{x} \in \mathbb{R}^3 : \mathbf{A}\cdot\mathbf{x} = 0\} = \{\mathbf{x} \in \mathbb{R}^3 : -x_1 + 3x_2 + 2x_3 = 0\}.$$

Strictly speaking, we only know that every vector in 𝒫₁ is a solution of this equation. But note that if x is a solution, then x₁ = 3x₂ + 2x₃, and

$$\mathbf{x} = \begin{bmatrix} 3x_2 + 2x_3 \\ x_2 \\ x_3 \end{bmatrix} = x_2\begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix} = (-x_2)\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} + (2x_2 + x_3)\begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix},$$

so x ∈ 𝒫₁ and the two sets are equal.¹ Thus, the discussion of Example 1e gives another justification that 𝒫₁ is a subspace of R³.
On the other hand, one can check, analogously, that

$$\mathcal{P}_2 = \{\mathbf{x} \in \mathbb{R}^3 : -x_1 + 3x_2 + 2x_3 = -1\},$$

and so clearly 0 ∉ 𝒫₂ and 𝒫₂ is not a subspace. It is an affine plane parallel to 𝒫₁. ◄

Definition Let V and W be subspaces of Rⁿ. We say they are orthogonal subspaces if every element of V is orthogonal to every element of W, i.e., if

v · w = 0 for every v ∈ V and every w ∈ W.

¹Ordinarily, the easiest way to establish that two sets are equal is to show that each is a subset of the other.

Figure 3.5

As indicated in Figure 3.5, given a subspace V ⊂ Rⁿ, define

$$V^\perp = \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x}\cdot\mathbf{v} = 0 \text{ for every } \mathbf{v} \in V\}.$$

V⊥ (read "V perp") is called the orthogonal complement of V.²

Proposition 3.2 V⊥ is also a subspace of Rⁿ.

Proof We leave this to the reader in Exercise 4. ■

► EXAMPLE 5

Let $V = \mathrm{Span}\left(\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\right)$. Then V⊥ is the plane W = {x ∈ R³ : x₁ + 2x₂ + x₃ = 0}. Now what is the orthogonal complement of W? We suspect it is just the line V, but we will have to wait until Chapter 4 to have the appropriate tools. ◄
If V and W are orthogonal subspaces of Rⁿ, then certainly W ⊂ V⊥ (why?). Of course, W need not be equal to V⊥: Consider, for example, the x₁-axis and the x₂-axis in R³.

► EXERCISES 1.3

*1. Which of the following are subspaces? Justify your answer in each case.
(a) {x ∈ R² : x₁ + x₂ = 1}
(b) $\left\{\mathbf{x} \in \mathbb{R}^3 : \mathbf{x} = \begin{bmatrix} a \\ b \\ a+b \end{bmatrix} \text{ for some } a, b \in \mathbb{R}\right\}$
(c) {x ∈ R³ : x₁ + 2x₂ < 0}
(d) {x ∈ R³ : x₁² + x₂² + x₃² = 1}
(e) {x ∈ R³ : x₁² + x₂² + x₃² = 0}
(f) {x ∈ R³ : x₁² + x₂² + x₃² = −1}
²In fact, both this definition and Proposition 3.2 work just fine for any subset V ⊂ Rⁿ.

*2. Criticize the following argument: By Exercise 1.1.13, for any vector v, we have 0v = 0. So the first criterion for subspaces is, in fact, a consequence of the second criterion and could therefore be omitted.
♯3. Suppose x, v₁, ..., vₖ ∈ Rⁿ and x is orthogonal to each of the vectors v₁, ..., vₖ. Prove that x is orthogonal to any linear combination c₁v₁ + c₂v₂ + ⋯ + cₖvₖ.
4. Prove Proposition 3.2.
5. Given vectors v₁, ..., vₖ ∈ Rⁿ, prove that V = Span(v₁, ..., vₖ) is the smallest subspace containing them all. That is, prove that if W ⊂ Rⁿ is a subspace and v₁, ..., vₖ ∈ W, then V ⊂ W.
♯6. (a) Let U and V be subspaces of Rⁿ. Define
U ∩ V = {x ∈ Rⁿ : x ∈ U and x ∈ V}.
Prove that U ∩ V is a subspace of Rⁿ. Give two examples.
(b) Is U ∪ V = {x ∈ Rⁿ : x ∈ U or x ∈ V} a subspace of Rⁿ? Give a proof or counterexample.
(c) Let U and V be subspaces of Rⁿ. Define
U + V = {x ∈ Rⁿ : x = u + v for some u ∈ U and v ∈ V}.
Prove that U + V is a subspace of Rⁿ. Give two examples.
7. Let v₁, ..., vₖ ∈ Rⁿ and let v ∈ Rⁿ. Prove that
Span(v₁, ..., vₖ) = Span(v₁, ..., vₖ, v) ⟺ v ∈ Span(v₁, ..., vₖ).
♯*8. Let V ⊂ Rⁿ be a subspace. Prove that V ∩ V⊥ = {0}.
♯9. Suppose U, V ⊂ Rⁿ are subspaces and U ⊂ V. Prove that V⊥ ⊂ U⊥.
♯10. Let V ⊂ Rⁿ be a subspace. Prove that V ⊂ (V⊥)⊥. Do you think more is true?
♯11. Suppose V = Span(v₁, ..., vₖ) ⊂ Rⁿ. Show that there are vectors w₁, ..., wₖ ∈ V that are mutually orthogonal (i.e., wᵢ · wⱼ = 0 whenever i ≠ j) that also span V. (Hint: Let w₁ = v₁. Using techniques of Section 2, define w₂ so that Span(w₁, w₂) = Span(v₁, v₂) and w₁ · w₂ = 0. Continue.)
12. Suppose U and V are subspaces of Rⁿ. Prove that (U + V)⊥ = U⊥ ∩ V⊥. (See the footnote on p. 21.)

► 4 LINEAR TRANSFORMATIONS AND MATRIX ALGEBRA


We are heading toward calculus and the study of functions. As we learned in the case of one
variable, differential calculus is based on the idea of the best (affine) linear approximation
of a function. Thus, our first brush with functions is with those that are linear.

First we introduce a bit of notation. If X and Y are sets, a function f: X → Y is a rule that assigns to each element x ∈ X a single element y ∈ Y; we write y = f(x). We call X the domain of f and Y the range. The image of f is the set of all its values, i.e., {y ∈ Y : y = f(x) for some x ∈ X}.

Definition A function T: Rⁿ → Rᵐ is called a linear transformation or linear map if it satisfies

i. T(u + v) = T(u) + T(v) for all u, v ∈ Rⁿ;
ii. T(cv) = cT(v) for all v ∈ Rⁿ and scalars c.

If we think visually of T as mapping Rⁿ to Rᵐ, then we have a diagram like Figure 4.1. The main point of the linearity properties is that the values of T on the standard basis vectors e₁, ..., eₙ completely determine the function T: For suppose x = x₁e₁ + ⋯ + xₙeₙ ∈ Rⁿ; then

$$(*)\qquad T(\mathbf{x}) = T(x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n) = T(x_1\mathbf{e}_1) + \cdots + T(x_n\mathbf{e}_n) = x_1T(\mathbf{e}_1) + \cdots + x_nT(\mathbf{e}_n).$$

In particular, let

$$T(\mathbf{e}_j) = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix} \in \mathbb{R}^m;$$

then to T we can naturally associate the m × n array

$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix},$$

which we call the standard matrix for T. (We will often denote this by [T].) To emphasize: The j-th column of A is the vector in Rᵐ obtained by applying T to the j-th standard basis vector, eⱼ.

Figure 4.1

► EXAMPLE 1

The most basic example of a linear map is the following. Fix a ∈ Rⁿ, and define T: Rⁿ → R by T(x) = a · x. By Proposition 2.1, we have

T(u + v) = a · (u + v) = (a · u) + (a · v) = T(u) + T(v), and
T(cv) = a · (cv) = c(a · v) = cT(v),

as required. Moreover, it is easy to see that if

$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}, \quad\text{then}\quad [T] = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}.$$
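For instance (a small instance of this construction, with our own choice of a):

$$\mathbf{a} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}:\qquad T(\mathbf{x}) = x_1 - 2x_2,\qquad [T] = \begin{bmatrix} 1 & -2 \end{bmatrix},$$

and indeed T(e₁) = 1 and T(e₂) = −2 are the entries of [T].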

► EXAMPLE 2

a. Consider the function T: R² → R² defined by rotating vectors in the plane counterclockwise by 90°. Then it is easy to see from the geometry in Figure 4.2 that

$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix}.$$

Figure 4.2

Now the linearity properties can be checked algebraically: If $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ are vectors, then

$$T(\mathbf{x} + \mathbf{y}) = T\left(\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \end{bmatrix}\right) = \begin{bmatrix} -(x_2 + y_2) \\ x_1 + y_1 \end{bmatrix} = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix} + \begin{bmatrix} -y_2 \\ y_1 \end{bmatrix} = T(\mathbf{x}) + T(\mathbf{y}),$$

and, even easier,

$$T(c\mathbf{x}) = T\left(\begin{bmatrix} cx_1 \\ cx_2 \end{bmatrix}\right) = \begin{bmatrix} -cx_2 \\ cx_1 \end{bmatrix} = cT(\mathbf{x}),$$

as required. The standard matrix for T is

$$[T] = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$
Better yet, since rotation carries lines through the origin to lines through the origin
and triangles to congruent triangles, it is clear on geometric grounds that T must satisfy
properties (i) and (ii).
b. Consider the function T: R² → R² defined by reflecting vectors across the line x₁ = x₂, as shown in Figure 4.3. (Visualize this as looking at vectors through a mirror along that line.) Once again, we see from the geometry that

$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ x_1 \end{bmatrix},$$

and linearity is obvious algebraically. But it should also be clear on geometric grounds that stretching a vector and then looking at it in the mirror is the same as stretching its mirror image, and likewise for addition of vectors. The standard matrix for T is

$$[T] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

Figure 4.3

c. Consider the linear transformation T: R² → R² whose standard matrix is

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$

The effect of T is pictured in Figure 4.4. One might slide a deck of cards in this fashion, and such a motion is called a shear.

Figure 4.4

d. Consider the function T: R³ → R³ defined by reflecting across the plane x₃ = 0. Then T(e₁) = e₁, T(e₂) = e₂, and T(e₃) = −e₃, so the standard matrix for T is

$$[T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$

e. Generalizing part a, we consider rotation of R² through the angle θ (given in radians). By the same geometric argument we suggested earlier (see Figure 4.5), this is a linear transformation of R². Now, as we can see from Figure 4.6, the standard matrix has as its first column

$$T(\mathbf{e}_1) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$

(by the usual definition of cos θ and sin θ, in fact) and as its second

$$T(\mathbf{e}_2) = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}$$

(since e₂ is obtained by rotating e₁ through π/2, T(e₂) is likewise obtained by rotating T(e₁) through π/2). Thus, the standard matrix for T is

$$A_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$
Figure 4.6
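As a quick consistency check (ours, not the text's): setting θ = π/2 recovers the matrix of the 90° rotation from part a:

$$A_{\pi/2} = \begin{bmatrix} \cos\tfrac\pi2 & -\sin\tfrac\pi2 \\ \sin\tfrac\pi2 & \cos\tfrac\pi2 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$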

f. If ℓ ⊂ R² is the line spanned by a given nonzero vector, then we can consider the linear maps S, T: R² → R² given respectively by projection onto, and reflection across, the line ℓ; their standard matrices can be computed from the projection formula of Section 2.
If we consider larger matrices, e.g.,

$$C = \begin{bmatrix} \frac16 & \frac13 + \frac1{\sqrt6} & \frac16 - \frac2{\sqrt6} \\[4pt] \frac13 - \frac1{\sqrt6} & \frac23 & \frac13 + \frac1{\sqrt6} \\[4pt] \frac16 + \frac2{\sqrt6} & \frac13 - \frac1{\sqrt6} & \frac16 \end{bmatrix},$$

then it seems impossible to discern the geometric nature of the linear map represented by such a matrix.³ In these examples, the standard "coordinate system" built into matrices just masks the geometry, and as we shall see, the solution is to change our coordinate system. This we do in Chapter 9. ◄

³For the curious among you, multiplication by C gives a rotation of R³ through an angle of π/2 about the line spanned by $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$. See Exercise 9.2.21.

Let T: R^n → R^m be a linear map, and let A be its standard matrix. We want to define
the product of the m × n matrix A with the vector x ∈ R^n in such a way that the vector
T(x) ∈ R^m is equal to Ax. (We will occasionally denote the linear map defined in this way
by μ_A.) In accordance with the formula (*) on p. 24, we have

    Ax = T(x) = Σ_{i=1}^n x_i T(e_i) = Σ_{i=1}^n x_i a_i,

where a_1, ..., a_n are the column vectors of the matrix A. That is, Ax is the linear combination of the vectors
a_1, ..., a_n, weighted according to the coordinates of the vector x.
There is, however, an alternative interpretation. Let

    A_1 = [a_11; a_12; ...; a_1n],  A_2 = [a_21; a_22; ...; a_2n],  ...,  A_m = [a_m1; a_m2; ...; a_mn] ∈ R^n

be the row vectors of the matrix A. Then

    Ax = [a_11 x_1 + ... + a_1n x_n; a_21 x_1 + ... + a_2n x_n; ...; a_m1 x_1 + ... + a_mn x_n] = [A_1 · x; A_2 · x; ...; A_m · x].

³For the curious among you, multiplication by C gives a rotation of R³ through an angle of π/2 about the line
spanned by (1, 2, 1). See Exercise 9.2.21.

As we shall study in great detail in Chapter 4, this allows us to interpret the equation Ax = y
as a system of m linear equations in the variables x_1, ..., x_n.
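Both interpretations are easy to check numerically. The NumPy sketch below (our own illustration; the matrix and vector are arbitrary choices) computes Ax as a weighted sum of the columns of A and as the vector of dot products with the rows of A.

```python
import numpy as np

A = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])   # a sample 3 x 2 matrix
x = np.array([2.0, -1.0])

# column view: Ax is a linear combination of the columns, weighted by x
column_view = sum(x[i] * A[:, i] for i in range(A.shape[1]))
# row view: the i-th entry of Ax is the dot product of the i-th row with x
row_view = np.array([A[i, :] @ x for i in range(A.shape[0])])

assert np.allclose(A @ x, column_view)
assert np.allclose(A @ x, row_view)
```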

4.1 Algebra of Linear Functions

Denote by M_{m×n} the set of all m × n matrices. In an obvious way this set can be identified
with R^{mn} (how?). Indeed, we begin by observing that we can add m × n matrices and
multiply them by scalars, just as we did with vectors.
For future reference, we call a matrix square if m = n (i.e., it has equal numbers of
rows and columns). We refer to the entries a_ii, i = 1, ..., n, as diagonal entries. We call
the (square) matrix a diagonal matrix if a_ij = 0 whenever i ≠ j, i.e., if every nondiagonal
entry is 0. A square matrix all of whose entries below the diagonal are 0 is called upper
triangular; one all of whose entries above the diagonal are 0 is called lower triangular.
If S, T: R^n → R^m are linear maps and c ∈ R, then we can obviously form the linear
maps cT: R^n → R^m and S + T: R^n → R^m, defined, respectively, by

    (cT)(x) = c(T(x))
    (S + T)(x) = S(x) + T(x).

The corresponding algebraic manipulations with matrices are clear: If A = [a_ij],
i = 1, ..., m, j = 1, ..., n, then cA is the matrix whose entries are ca_ij:

    cA = c [a_11 ... a_1n; ...; a_m1 ... a_mn] = [ca_11 ... ca_1n; ...; ca_m1 ... ca_mn].

Given two matrices A, B ∈ M_{m×n}, we define their sum entry by entry. In symbols,
when

    A = [a_11 ... a_1n; ...; a_m1 ... a_mn]   and   B = [b_11 ... b_1n; ...; b_m1 ... b_mn],

we define

    A + B = [a_11 + b_11 ... a_1n + b_1n; ...; a_m1 + b_m1 ... a_mn + b_mn].

► EXAMPLE 3

Let c = −2,

    A = [1  2  3; 2  1  −2; 4  −1  3],   B = [6  4  −1; −3  1  1; 0  0  0],

and let C be a matrix of a different shape. Then

    cA = [−2  −4  −6; −4  −2  4; −8  2  −6],   A + B = [7  6  2; −1  2  −1; 4  −1  3],

and neither sum A + C nor B + C makes sense since C has a different shape from A and B. (One
should not expect to be able to add functions with different domains or ranges.) ◄

Denote by O the zero matrix, the m × n matrix all of whose entries are 0. As the reader
can easily check, scalar multiplication of matrices and matrix addition satisfy the same
properties as scalar multiplication of vectors and vector addition (see Exercise 1.1.12). We
list them here for reference.

Proposition 4.1 Let A, B, C ∈ M_{m×n} and let c, d ∈ R.

1. A + B = B + A.
2. (A + B) + C = A + (B + C).
3. O + A = A.
4. There is a matrix −A so that A + (−A) = O.
5. c(dA) = (cd)A.
6. c(A + B) = cA + cB.
7. (c + d)A = cA + dA.
8. 1A = A.

Of all the operations one performs on functions, probably the most powerful is composition. Recall that when g(x) is in the domain of f, we define (f∘g)(x) = f(g(x)). So, suppose we have linear maps S: R^p → R^n and T: R^n → R^m. Then we define T∘S: R^p → R^m
by (T∘S)(x) = T(S(x)). It is well known that composition of functions is not commutative⁴
but is associative, inasmuch as

    ((f∘g)∘h)(x) = (f∘g)(h(x)) = f(g(h(x))) = f((g∘h)(x)) = (f∘(g∘h))(x).

We want to define matrix multiplication so that it corresponds to the composition of linear
maps. Let A be the m × n matrix representing T and let B be the n × p matrix representing
S. We expect that the m × p matrix C representing T∘S can be expressed in terms of A
and B. The j-th column of C is the vector (T∘S)(e_j) ∈ R^m. Now,

    T(S(e_j)) = T([b_1j; b_2j; ...; b_nj]) = b_1j a_1 + b_2j a_2 + ... + b_nj a_n,

where a_1, ..., a_n are the column vectors of A. That is, the j-th column of C is the product
of the matrix A with the vector b_j. So we now make the definition:

Definition Let A be an m × n matrix and B an n × p matrix. Their product AB is
the m × p matrix whose j-th column is the product of A with the j-th column of B. That is,
its ij-entry is

    (AB)_ij = a_i1 b_1j + a_i2 b_2j + ... + a_in b_nj,

i.e., the dot product of the i-th row vector of A and the j-th column vector of B, both of which
are vectors in R^n. Graphically, we have

    [ a_11  a_12  ...  a_1n ]  [ b_11  ...  b_1j  ...  b_1p ]
    [   :                :  ]  [ b_21  ...  b_2j  ...  b_2p ]
    [ a_i1  a_i2  ...  a_in ]  [   :          :          :  ]
    [   :                :  ]  [ b_n1  ...  b_nj  ...  b_np ]
    [ a_m1  a_m2  ...  a_mn ]

with the i-th row of A and the j-th column of B combining to give (AB)_ij.

We reiterate that in order for the product AB to be defined, the number of columns of A
must equal the number of rows of B.

⁴E.g., sin(x²) ≠ sin²x.
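The definition translates directly into code. Here is a minimal sketch (our own, not from the text) that computes each entry of AB as a dot product of a row of A with a column of B and checks the result against NumPy's built-in product.

```python
import numpy as np

def matmul(A, B):
    # a direct transcription of the definition of the matrix product
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            C[i, j] = A[i, :] @ B[:, j]   # dot product of row i with column j
    return C

A = np.random.rand(3, 2)
B = np.random.rand(2, 4)
assert np.allclose(matmul(A, B), A @ B)
```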



► EXAMPLE 4

If

    A = [1  3; 2  −1; 1  1]   and   B = [4  1  0  −2; −1  1  5  1],

then

    AB = [1  4  15  1; 9  1  −5  −5; 3  2  5  −1].

Notice also that the product BA does not make sense: B is a 2 × 4 matrix and A is 3 × 2, and
4 ≠ 3. ◄

The preceding example brings out an important point about the nature of matrix multiplication: It can happen that the matrix product AB is defined and the product BA is not.
Now if A is an m × n matrix and B is an n × m matrix, then both products AB and BA
make sense: AB is m × m and BA is n × n. Notice that these are both square matrices, but
of different sizes. But even if we start with both A and B as n × n matrices, the products
AB and BA need not be equal.

► EXAMPLE 5

Let

    A = [0  1; 0  0]   and   B = [0  0; 0  1].

Then

    AB = [0  1; 0  0],   whereas   BA = [0  0; 0  0]. ◄

When—and only when—A is a square matrix (i.e., m = n), we can multiply A by
itself, obtaining A² = AA, A³ = A²A = AA², etc. If we think of Ax as resulting from x
by performing some geometric procedure, then (A²)x should result from performing that
procedure twice, (A³)x thrice, and so on.

► EXAMPLE 6

Let

    A = [1/5  2/5; 2/5  4/5].

Then it is easy to check that A² = A, so Aⁿ = A for all positive integers n (why?). What is the
geometric explanation? Note that

    A[1; 0] = [1/5; 2/5] = (1/5)[1; 2]   and   A[0; 1] = [2/5; 4/5] = (2/5)[1; 2],

so that for every x ∈ R², we see that Ax lies on the line spanned by [1; 2]. Indeed, we can tell more:

    A[x_1; x_2] = [(1/5)(x_1 + 2x_2); (2/5)(x_1 + 2x_2)] = ((x_1 + 2x_2)/5)[1; 2]

is the projection of x onto the line spanned by [1; 2]. This explains why A²x = Ax for every x ∈ R²:
A²x = A(Ax), and once we've projected the vector x onto the line, it stays exactly the same. ◄
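The idempotence of a projection is easy to verify numerically. In the sketch below (our own illustration, using the matrix of Example 6 as reconstructed above), we check both that A² = A and that Ax agrees with the usual projection formula onto the line spanned by (1, 2).

```python
import numpy as np

A = np.array([[1/5, 2/5],
              [2/5, 4/5]])
assert np.allclose(A @ A, A)          # projecting twice changes nothing

x = np.array([3.0, -1.0])
u = np.array([1.0, 2.0])
# the standard projection of x onto the line spanned by u
assert np.allclose(A @ x, (x @ u) / (u @ u) * u)
```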

► EXAMPLE 7

There is an interesting way to interpret matrix powers in terms of directed graphs. Starting with the
matrix

0 2 1
A = 1 1 1
1 0 1

we draw a graph with three nodes (vertices) and directed edges (paths) from node i to node j, as
shown in Figure 4.7. For example, there are two edges from node 1 to node 2 and none from node 3
to node 2.
Figure 4.7

We calculate

    A² = [3  2  3; 2  3  3; 1  2  2],   A³ = [5  8  8; 6  7  8; 4  4  5],   and
    A⁷ = [272  338  377; 273  337  377; 169  208  233].

For example, the 13-entry of A² is

    (A²)_13 = a_11 a_13 + a_12 a_23 + a_13 a_33 = (0)(1) + (2)(1) + (1)(1) = 3.

With a bit of thought, the reader will convince herself that the ij-entry of A² is the number of "two-step" directed paths from node i to node j. Similarly, the ij-entry of Aⁿ is the number of n-step
directed paths from node i to node j. ◄
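The path-counting interpretation is pleasant to confirm by machine. The following NumPy sketch (ours) computes the powers of the adjacency matrix displayed above.

```python
import numpy as np
from numpy.linalg import matrix_power

# adjacency matrix of the directed graph in Figure 4.7
A = np.array([[0, 2, 1],
              [1, 1, 1],
              [1, 0, 1]])

print(matrix_power(A, 2))   # two-step path counts; the 13-entry is 3
print(matrix_power(A, 3))   # matches the A^3 displayed above
print(matrix_power(A, 7))   # seven-step path counts
```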

We have seen that, in general, matrix multiplication is not commutative. However, it
does have the following crucial properties. Let I_n denote the n × n matrix with 1's on the
diagonal and 0's elsewhere.

Proposition 4.2 Let A and A′ be m × n matrices; let B and B′ be n × p matrices;
let C be a p × q matrix, and let c be a scalar. Then

1. AI_n = A = I_m A. For this reason, I_n is called the n × n identity matrix.
2. (A + A′)B = AB + A′B and A(B + B′) = AB + AB′. This is the distributive
property of matrix multiplication over matrix addition.
3. (cA)B = c(AB) = A(cB).
4. (AB)C = A(BC). This is the associative property of matrix multiplication.

Proof These are all immediate from the linear map viewpoint. ■

One of the important concepts is that of the inverse of a function.

Definition Let A be an n × n matrix. We say A is invertible if there is an n × n
matrix B so that

    AB = BA = I_n.

We call B the inverse of the matrix A and denote this by B = A^(−1).

If A is the matrix representing the linear transformation T: R^n → R^n, then A^(−1) represents the inverse function T^(−1), which must then also be a linear transformation.

► EXAMPLE 8

Let

    A = [2  1; 1  1]   and   B = [1  −1; −1  2].

Then AB = I_2 and BA = I_2, so B is the inverse matrix of A. ◄



► EXAMPLE 9

It will be convenient for our future work to have the inverse of a 2 × 2 matrix

    A = [a  b; c  d].

Provided ad − bc ≠ 0, if we set

    A^(−1) = (1/(ad − bc)) [d  −b; −c  a],

then an easy calculation shows that AA^(−1) = A^(−1)A = I_2, as needed. ◄
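The formula is one line of code. Here is a small sketch (our own) implementing it and checking it against the definition of the inverse.

```python
import numpy as np

def inv2(A):
    # the 2 x 2 inverse formula of Example 9
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return (1 / det) * np.array([[d, -b],
                                 [-c, a]])

A = np.array([[2.0, 1.0], [5.0, 3.0]])
assert np.allclose(inv2(A) @ A, np.eye(2))
assert np.allclose(A @ inv2(A), np.eye(2))
```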

► EXAMPLE 10

It follows immediately from Example 9 that for our rotation matrix

    A_θ = [cos θ  −sin θ; sin θ  cos θ]   we have   A_θ^(−1) = [cos θ  sin θ; −sin θ  cos θ].

Since cos(−θ) = cos θ and sin(−θ) = −sin θ, we see that this is the matrix A_(−θ). If we think about
the corresponding linear maps, this result becomes obvious: To invert (or "undo") a rotation through
angle θ, we must rotate through angle −θ. ◄

► EXAMPLE 11

As an application of Example 9, we can now show that any two nonparallel vectors u, v ∈ R² must
span R². It is easy to check that if u = [u_1; u_2] and v = [v_1; v_2] are nonparallel, then u_1 v_2 − u_2 v_1 ≠ 0,
so the matrix

    A = [u_1  v_1; u_2  v_2]

is invertible. Given x ∈ R², define c = [c_1; c_2] by c = A^(−1)x. Then we have

    x = A(A^(−1)x) = Ac = c_1 u + c_2 v,

thereby establishing that an arbitrary x ∈ R² is a linear combination of u and v. Indeed, more is
true: That linear combination is unique since x = Ac if and only if c = A^(−1)x. We shall study the
generalization of this result to higher dimensions in great detail in Chapter 4. ◄

We shall learn in Chapter 4 how to calculate the inverse of a matrix in a straightforward
fashion. We end the present discussion of inverses with a very important observation.

Proposition 4.3 Suppose A and B are invertible n × n matrices. Then their product
AB is invertible, and

    (AB)^(−1) = B^(−1)A^(−1).



Remark Some people refer to this result rather endearingly as the "shoe-sock theorem," for to undo (invert) the process of putting on one's socks and then one's shoes, one
must first remove the shoes and then remove the socks.

Proof To prove the matrix AB is invertible, we need only check that the candidate
for the inverse works. That is, we need to check that

    (AB)(B^(−1)A^(−1)) = I_n   and   (B^(−1)A^(−1))(AB) = I_n.

But these follow immediately from associativity:

    (AB)(B^(−1)A^(−1)) = A(BB^(−1))A^(−1) = AI_n A^(−1) = AA^(−1) = I_n,   and
    (B^(−1)A^(−1))(AB) = B^(−1)(A^(−1)A)B = B^(−1)I_n B = B^(−1)B = I_n. ■
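A numerical spot check of the shoe-sock theorem takes a few lines. The sketch below (ours; the matrices are arbitrary, made invertible by adding a large diagonal) verifies that the inverse of a product reverses the order of the factors.

```python
import numpy as np

rng = np.random.default_rng(0)
# adding 4*I makes these strictly diagonally dominant, hence invertible
A = rng.random((4, 4)) + 4 * np.eye(4)
B = rng.random((4, 4)) + 4 * np.eye(4)

# (AB)^{-1} = B^{-1} A^{-1}; note the order of the factors reverses
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))
```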

4.2 The Transpose

The final matrix operation we will discuss in this chapter is the transpose. When A is an
m × n matrix with entries a_ij, the matrix A^T (read "A transpose") is the n × m matrix whose
ij-entry is a_ji; i.e., the i-th row of A^T is the i-th column of A. We say a square matrix A is
symmetric if A^T = A and skew-symmetric if A^T = −A.

► EXAMPLE 12

Suppose

    A = [1  2; 3  4; 5  6],   B = [1  3  5; 2  4  6],   C = [1; 2; 3],   and   D = [1  2  3].

Then A^T = B, B^T = A, C^T = D, and D^T = C. Note, in particular, that the transpose of a column
vector, i.e., an n × 1 matrix, is a row vector, i.e., a 1 × n matrix. An example of a symmetric matrix
is

    [1  2; 2  3]. ◄
The basic properties of the transpose operation are as follows:

Proposition 4.4 Let A and A′ be m × n matrices, let B be an n × p matrix, and let
c be a scalar. Then
1. (A^T)^T = A;
2. (cA)^T = cA^T;
3. (A + A′)^T = A^T + A′^T;
4. (AB)^T = B^T A^T.

Proof The first is obvious since we swap rows and columns and then swap again,
returning to our original matrix. The second and third can be immediately checked. The
last result is more interesting, and we will use it to derive a crucial result in a moment. Note,
first, that AB is an m × p matrix, so (AB)^T will be a p × m matrix; B^T A^T is the product of
a p × n matrix and an n × m matrix and hence will be p × m as well, so the shapes agree.
Now, the ji-entry of AB is the dot product of the j-th row vector of A and the i-th column
vector of B; i.e., the ij-entry of (AB)^T is

    ((AB)^T)_ij = (AB)_ji = A_j · b_i.

On the other hand, the ij-entry of B^T A^T is the dot product of the i-th row vector of B^T and
the j-th column vector of A^T; but this is, by definition, the dot product of the i-th column
vector of B and the j-th row vector of A. That is,

    (B^T A^T)_ij = b_i · A_j,

and, since dot product is commutative, the two formulas agree. ■

The transpose matrix will be very important to us because of the interplay between dot
product and transpose. If x and y are vectors in R^n, then by virtue of our very definition of
matrix multiplication,

    x · y = x^T y,

provided we agree to think of a 1 × 1 matrix as a scalar. Now we have the highly useful

Proposition 4.5 Let A be an m × n matrix, x ∈ R^n, and y ∈ R^m. Then

    Ax · y = x · A^T y.

(On the left, we take the dot product of vectors in R^m; on the right, of vectors in R^n.)

Remark You might remember this: To move the matrix "across the dot product,"
you must transpose it.

Proof We just calculate, using the formula for the transpose of a product and, as
usual, associativity:

    Ax · y = (Ax)^T y = (x^T A^T)y = x^T(A^T y) = x · A^T y. ■
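Proposition 4.5 is also easy to confirm numerically. The one-line check below (our own sketch with arbitrary random data) moves A across the dot product and transposes it.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((3, 5))
x = rng.random(5)
y = rng.random(3)
# Ax . y = x . (A^T y)
assert np.isclose((A @ x) @ y, x @ (A.T @ y))
```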

► EXAMPLE 13

We return to the economic interpretation of dot product given in the remark on p. 12. Suppose that
m different ingredients are required to manufacture n different products. To manufacture the product
vector x = [x_1; ...; x_n] requires the ingredient vector y = [y_1; ...; y_m], and we suppose x and y are related by
the equation y = Ax for some m × n matrix A. If each unit of ingredient j costs a price p_j, then the
cost of producing x is

    Σ_{j=1}^m p_j y_j = y · p = Ax · p = x · A^T p = Σ_{i=1}^n q_i x_i,

where q = A^T p. Notice then that q_i is the amount it costs to produce a unit of the i-th product. Our
fundamental formula, Proposition 4.5, tells us that the total cost of the ingredients should equal the
total worth of the products we manufacture. ◄
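A toy instance makes the bookkeeping concrete. In the sketch below (our own numbers, not the text's), two ingredients are used to make three products, and the two ways of computing the total cost agree, exactly as Proposition 4.5 predicts.

```python
import numpy as np

A = np.array([[1.0, 0.5, 2.0],    # ingredient 1 needed per unit of each product
              [3.0, 1.0, 0.0]])   # ingredient 2 needed per unit of each product
x = np.array([10.0, 4.0, 6.0])    # production plan
p = np.array([2.0, 5.0])          # price per unit of each ingredient

y = A @ x                         # total ingredients required
q = A.T @ p                       # cost to produce one unit of each product
assert np.isclose(y @ p, q @ x)   # ingredient cost equals product cost
```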

► EXERCISES 1.4

1. Let

    A = [1  2; 3  4],   B = [2  1; 4  3],   C = [1  2  1; 0  1  2],   and   D = [0  1; 1  0; 2  3].

Calculate each of the following expressions or explain why it is not defined.
(a) A + B        (e) AB        (i) BD
*(b) 2A − B      *(f) BA       (j) DB
(c) A − C        *(g) AC       *(k) CD
(d) C + D        *(h) CA       *(l) DC
2. (a) If A is an m × n matrix and Ax = 0 for all x ∈ R^n, prove that A = O.
(b) If A and B are m × n matrices and Ax = Bx for all x ∈ R^n, prove that A = B.
#3. Let A be an m × n matrix. Show that V = {x ∈ R^n : Ax = 0} is a subspace of R^n.
#4. Let A be an m × n matrix.
(a) Show that V = {[x; Ax] : x ∈ R^n} ⊂ R^{m+n} is a subspace of R^{m+n}.
(b) When m = 1, show that V ⊂ R^{n+1} is a hyperplane (see Example 1e in Section 3) by finding a
vector b ∈ R^{n+1} so that V = {z ∈ R^{n+1} : b · z = 0}.
5. Give 2 × 2 matrices A so that for any x ∈ R² we have, respectively,
(a) Ax is the vector whose components are, respectively, the sum and difference of the components
of x;
*(b) Ax is the vector obtained by projecting x onto the line x_1 = x_2 in R²;
(c) Ax is the vector obtained by first reflecting x across the line x_1 = 0 and then reflecting the resulting
vector across the line x_2 = 0;
(d) Ax is the vector obtained by projecting x onto the line 2x_1 − x_2 = 0;
*(e) Ax is the vector obtained by first projecting x onto the line 2x_1 − x_2 = 0 and then rotating the
resulting vector π/2 counterclockwise;
(f) Ax is the vector obtained by first rotating x an angle of π/2 counterclockwise and then projecting
the resulting vector onto the line 2x_1 − x_2 = 0.
6. (a) Calculate A_θ A_ψ and A_ψ A_θ. (Recall the definition of the rotation matrix on p. 27.)
(b) Use your answer to part a to derive the addition formulas for cos and sin.

7. Let A_θ be the rotation matrix defined on p. 27, 0 < θ < π. Prove that
(a) ‖A_θ x‖ = ‖x‖ for all x ∈ R²;  (b) the angle between x and A_θ x is θ.
These properties should characterize a rotation of the plane through angle θ.
8. Prove or give a counterexample. Assume all relevant matrices are square and of the same size.
(a) If AB = CB and B ≠ O, then A = C.   (c) (A + B)(A − B) = A² − B².
(b) If A² = A, then A = O or A = I.     (d) If AB = BC and B is invertible, then A = C.
9. Find all 2 × 2 matrices A = [a  b; c  d] satisfying
(a) A² = I_2;  *(b) A² = O;  (c) A² = −I_2.


10. (a) Show that the matrix giving reflection across the line spanned by [cos θ; sin θ] is

    R = [cos 2θ  sin 2θ; sin 2θ  −cos 2θ].

(b) Letting A_θ be the rotation matrix defined on p. 27, check that R A_θ = A_(−θ) R.

11. For each of the following matrices A, find a formula for Aⁿ. (If you know how to give an inductive
proof, please do so.)

    A = [d_1  0  ...  0; 0  d_2  ...  0; ...; 0  0  ...  d_n]   (all nondiagonal entries are 0)

12. Suppose A and A′ are m × m matrices, B and B′ are m × n matrices, C and C′ are n × m
matrices, and D and D′ are n × n matrices. Check the following formula for the product of "block"
matrices:

    [A  B; C  D][A′  B′; C′  D′] = [AA′ + BC′   AB′ + BD′; CA′ + DC′   CB′ + DD′].
"13. Let T: R2 -> R2 be the linear transformation defined by rotating the plane it fl counterclock­
wise; let S: R2 -► R2 be the linear transformation defined by reflecting the plane across the line
Xi + X2 = 0.
(a) Give the standard matrices representing S and T.
(b) Give the standard matrix representing
(c) Give the standard matrix representing S»T.
14. Calculate the standard matrix for each of the following linear transformations T:
(a) T: R² → R² given by rotating −π/4 about the origin and then reflecting across the line
x_1 − x_2 = 0.
(b) T: R³ → R³ given by rotating π/2 about the x_1-axis (as viewed from the positive side) and then
reflecting across the plane x_2 = 0.
(c) T: R³ → R³ given by rotating −π/2 about the x_1-axis (as viewed from the positive side) and
then rotating π/2 about the x_3-axis.
15. Consider the cube with vertices (±1, ±1, ±1), pictured in Figure 4.8. (Note that the coordinate axes
pass through the centers of the various faces.) Give the standard matrices for each of the following
symmetries of the cube.
(a) 90° rotation about the x_3-axis (viewed from high above)
(b) 180° rotation about the line joining the midpoints of two opposite edges
(c) 120° rotation about the line joining the opposite vertices (−1, −1, −1) and (1, 1, 1) (viewed from high above)
16. Consider the tetrahedron with vertices (1, 1, 1), (1, −1, −1), (−1, 1, −1), and (−1, −1, 1), pictured in Figure
4.9. Give the standard matrices for each of the following symmetries of the tetrahedron.
(a) 120° rotation counterclockwise (as viewed from high above) about the line joining (0, 0, 0) and
the vertex (1, 1, 1)
(b) 180° rotation about the line joining the midpoints of two opposite edges


(c) reflection across the plane containing one edge and the midpoint of the opposite edge
(Hint: Note where the coordinate axes intersect the tetrahedron.)

Figure 4.9
*17. Suppose A is an n × n matrix and B is an invertible n × n matrix. Calculate the following.
(a) (BAB^(−1))²
(b) (BAB^(−1))ⁿ (n a positive integer)
(c) (BAB^(−1))^(−1) (what additional assumption is required here?)
18. Find matrices A so that
(a) A ≠ O, but A² = O;  (b) A² ≠ O, but A³ = O.
Can you make a conjecture about matrices satisfying A^(n−1) ≠ O but Aⁿ = O?
*19. Suppose A is an invertible n × n matrix and x ∈ R^n satisfies Ax = 7x. Calculate A^(−1)x.
20. Suppose A is a square matrix satisfying the equation A³ − 3A + 2I = O. Show that A is invertible. (Hint: Can you give an explicit formula for A^(−1)?)
21. Suppose A is an n × n matrix satisfying A^10 = O. Prove that the matrix I_n − A is invertible.
(Hint: As a warm-up, try assuming A² = O.)
22. Define the trace of an n × n matrix A (denoted tr A) to be the sum of its diagonal entries:

    tr A = Σ_{i=1}^n a_ii.

(a) Prove that tr A = tr(A^T).
(b) Prove that tr(A + B) = tr A + tr B and tr(cA) = c tr A for any scalar c.
(c) Prove that tr(AB) = tr(BA). (Hint: Σ_i Σ_j c_ij = Σ_j Σ_i c_ij.)

23. Let A, B, C, and D be the matrices of Exercise 1. Calculate each of the following expressions or explain why it is not defined.
(a) A^T          (c) C^T           *(e) A^T C
*(b) 2A − B^T    (d) C^T + D       (f) AC^T
*(g) C^T A^T     (i) D^T B         *(k) C^T C
(h) BD^T         *(j) CC^T         (l) C^T D^T
*24. Suppose A and B are symmetric. Prove that AB is symmetric if and only if AB = BA.
25. Let A be an arbitrary m × n matrix. Prove that A^T A is symmetric.
26. Suppose A is invertible. Check that (A^(−1))^T A^T = I and A^T (A^(−1))^T = I, and deduce that A^T is
likewise invertible.
*27. Let A_θ be the rotation matrix defined on p. 27. Explain why A_θ^(−1) = A_θ^T.
28. An n × n matrix is called a permutation matrix if it has a single 1 in each row and column and
all its remaining entries are 0.
(a) Write down all the 2 × 2 permutation matrices. How many are there?
(b) Write down all the 3 × 3 permutation matrices. How many are there?
(c) Prove that the product of two permutation matrices is again a permutation matrix. Do they
commute?
(d) Prove that every permutation matrix P is invertible and P^(−1) = P^T.
(e) If A is an n × n matrix and P is an n × n permutation matrix, describe the matrices PA and AP.
#29. Let A be an m × n matrix and let x, y ∈ R^n. Prove that if Ax = 0 and y = A^T b for some b ∈ R^m,
then x · y = 0.
#30. Suppose A is a symmetric n × n matrix. Let V ⊂ R^n be a subspace with the property that Ax ∈ V
for every x ∈ V. Prove that Ay ∈ V^⊥ for all y ∈ V^⊥.
*31. Given the matrix

    A = [1  2  1; 1  3  1; 0  1  −1]   and its inverse matrix   A^(−1) = [4  −3  1; −1  1  0; −1  1  −1],

find (with no computation) the inverse of

    (a) [1  1  0; 2  3  1; 1  1  −1]   (b) [1  2  1; 0  1  −1; 1  3  1]   (c) [1  2  1; 1  3  1; 0  2  −2].

*32. Suppose A is an m × n matrix and x ∈ R^n satisfies (A^T A)x = 0. Prove that Ax = 0. (Hint:
What is ‖Ax‖?)
33. Suppose A is a symmetric matrix satisfying A² = O. Prove that A = O. Give an example to
show that the hypothesis of symmetry is required.
#34. We say an n × n matrix A is orthogonal if A^T A = I_n.
(a) Prove that the column vectors a_1, ..., a_n of an orthogonal matrix A are unit vectors that are
orthogonal to one another; i.e., a_i · a_j = 1 when i = j and a_i · a_j = 0 when i ≠ j.
(b) Fill in the missing columns in the following matrices to make them orthogonal:

    [1  0  ?; 0  0  ?; 0  1  ?]   and   [1/3  ?  ?; 2/3  ?  ?; 2/3  ?  ?]

(c) Prove that any 2 × 2 orthogonal matrix A must be of the form

    [cos θ  −sin θ; sin θ  cos θ]   or   [cos θ  sin θ; sin θ  −cos θ]

for some real number θ. (Hint: Use part a, rather than the original definition.)
*(d) Prove that if A is an orthogonal 2 × 2 matrix, then μ_A: R² → R² is either a rotation or the
composition of a rotation and a reflection.
(e) Assume for now that A^T = A^(−1) when A is orthogonal (this is a consequence of Corollary 2.2 of
Chapter 4). Prove that the row vectors A_1, ..., A_n of an orthogonal matrix A are unit vectors that are
orthogonal to one another.
35. (Recall the definition of orthogonal matrices from Exercise 34.)
(a) Prove that if A and B are orthogonal n × n matrices, then so is AB.
*(b) Prove that if A is an orthogonal matrix, then so is A^(−1).
*36. (a) Prove that the only matrix that is both symmetric and skew-symmetric is O.
(b) Given any square matrix A, prove that S = ½(A + A^T) is symmetric and K = ½(A − A^T) is
skew-symmetric.
(c) Prove that any square matrix A can be written in the form A = S + K, where S is symmetric
and K is skew-symmetric.
(d) Prove that the expression in part c is unique: If A = S + K and A = S′ + K′ (where S and S′
are symmetric and K and K′ are skew-symmetric), then S = S′ and K = K′. (Hint: Use part a.)
37. Suppose A is an n × n matrix that commutes with all n × n matrices; i.e., AB = BA for all
B ∈ M_{n×n}. What can you say about A?

► 5 INTRODUCTION TO DETERMINANTS
AND THE CROSS PRODUCT

Let x and y be vectors in R² and consider the parallelogram 𝒫 they span. The area of 𝒫 is
nonzero as long as x and y are not collinear. We want to express the area of 𝒫 in terms of
the coordinates of x and y. First notice that the area of the parallelogram pictured in Figure
5.1 is the same as the area of the rectangle obtained by moving the shaded triangle from
the right side to the left. This rectangle has area A = bh, where b = ‖x‖ is the base and
h = ‖y‖ sin θ is the height. We could calculate sin θ from the formula

    cos θ = (x · y)/(‖x‖‖y‖),

but instead we note (see Figure 5.2) that

    ‖x‖‖y‖ sin θ = ‖x‖‖y‖ cos(π/2 − θ) = ρ(x) · y,

where ρ(x) is the vector obtained by rotating x an angle π/2 counterclockwise (see Exercise 1.2.16). If x = [x_1; x_2] and y = [y_1; y_2], then we have

    area(𝒫) = ρ(x) · y = [−x_2; x_1] · [y_1; y_2] = x_1 y_2 − x_2 y_1.

► EXAMPLE 1

If x = [3; 1] and y = [4; 3], then the area of the parallelogram spanned by x and y is x_1 y_2 − x_2 y_1 =
3 · 3 − 1 · 4 = 5. On the other hand, if we interchange the two, letting x = [4; 3] and y = [3; 1],
then we get x_1 y_2 − x_2 y_1 = 4 · 1 − 3 · 3 = −5. Certainly the parallelogram hasn't changed; nor does
it make sense to have negative area. What is the explanation? In deriving our formula for the area
above, we assumed 0 < θ < π; but if we must turn clockwise to get from x to y, this means that θ is
negative, resulting in a sign discrepancy in the area calculation. ◄

So we should amend our earlier result. We define the signed area of the parallelogram
𝒫 to be the area of 𝒫 when one turns counterclockwise from x to y and to be negative the
area of 𝒫 when one turns clockwise from x to y, as illustrated in Figure 5.3. Then we have

    signed area(𝒫) = x_1 y_2 − x_2 y_1.

Because of its geometric significance, we consider the function⁵

(*)    D(x, y) = x_1 y_2 − x_2 y_1;

this is the function that associates to each ordered pair of vectors x, y ∈ R² the signed area
of the parallelogram they span.

Figure 5.3

⁵Here, since x and y are themselves vectors, we use the customary notation for functions.

Next, let's explore the properties of the signed area function D on R² × R².⁶

Property 1 If x, y ∈ R², then D(y, x) = −D(x, y).

Algebraically, we have

    D(y, x) = y_1 x_2 − y_2 x_1 = −(x_1 y_2 − x_2 y_1) = −D(x, y).

Geometrically, this was the point of our introducing the notion of signed area.

Property 2 If x, y ∈ R² and c ∈ R, then

    D(cx, y) = cD(x, y) = D(x, cy).

This follows immediately from the formula (*):

    D(cx, y) = (cx_1)y_2 − (cx_2)y_1 = c(x_1 y_2 − x_2 y_1) = cD(x, y).

Geometrically, if we stretch one of the edges of the parallelogram by a factor of c > 0, then
the area is multiplied by a factor of c. And if c < 0, the area is multiplied by a factor of |c|
and the signed area changes sign (why?).

Property 3 If x, y, z ∈ R², then

    D(x + y, z) = D(x, z) + D(y, z)   and   D(x, y + z) = D(x, y) + D(x, z).

We can check this explicitly in coordinates (but the clever reader should try to use
properties of the dot product to give a better algebraic proof): If x = [x_1; x_2], y = [y_1; y_2],
and z = [z_1; z_2], then

    D(x + y, z) = (x_1 + y_1)z_2 − (x_2 + y_2)z_1
                = (x_1 z_2 − x_2 z_1) + (y_1 z_2 − y_2 z_1) = D(x, z) + D(y, z),

as required. (The formula for D(x, y + z) can now be deduced by using Property 1.)
Geometrically, we can deduce the result from Figure 5.4: The area of parallelogram OBCD
(D(x + y, z)) is equal to the sum of the areas of parallelograms OAED (D(x, z)) and
ABCE (D(y, z)). The proof of this, in turn, follows from the fact that △OAB is congruent
to △DEC.

⁶Recall that, given two sets X and Y, their product X × Y consists of all ordered pairs (x, y), where x ∈ X and
y ∈ Y.

Figure 5.4

Property 4 For the standard basis vectors e_1, e_2, we have D(e_1, e_2) = 1.

The expression D(x, y) is a 2 × 2 determinant, often written |x y|. Indeed, given a 2 × 2
matrix A with column vectors a_1, a_2 ∈ R², we define

    det A = D(a_1, a_2) = |a_1  a_2|.

As we ask the reader to check in Exercise 4, one can deduce from the four properties above
and the geometry of linear maps the fact that the determinant represents the signed area of
the parallelogram.
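A brief computational check ties these threads together. The sketch below (ours, using the vectors of Example 1) evaluates the signed-area function directly, observes the sign flip when the arguments are swapped, and compares against NumPy's determinant.

```python
import numpy as np

def D(x, y):
    # signed area of the parallelogram spanned by x and y
    return x[0] * y[1] - x[1] * y[0]

x, y = np.array([3.0, 1.0]), np.array([4.0, 3.0])
assert D(x, y) == 5.0
assert D(y, x) == -5.0     # Property 1: swapping arguments flips the sign
assert np.isclose(D(x, y), np.linalg.det(np.column_stack([x, y])))
```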
We next turn to the case of 3 × 3 determinants. The general case will wait until Chapter
7. Given three vectors

    x = [x_1; x_2; x_3],   y = [y_1; y_2; y_3],   and   z = [z_1; z_2; z_3] ∈ R³,

we define

    D(x, y, z) = |x  y  z| = x_1 |y_2  z_2; y_3  z_3| − x_2 |y_1  z_1; y_3  z_3| + x_3 |y_1  z_1; y_2  z_2|.

Multiplying this out, we get three positive terms and three negative terms; a handy mnemonic
device for this formula is depicted in Figure 5.5.
Figure 5.5

This function D of three vectors in R³ has properties quite analogous to those in the
two-dimensional case. In particular, it follows immediately from the latter that if x, y, z,
and w are vectors in R³ and c is a scalar, then

    D(x, z, y) = −D(x, y, z),
    D(x, cy, z) = cD(x, y, z) = D(x, y, cz),
    D(x, y + w, z) = D(x, y, z) + D(x, w, z),   and
    D(x, y, z + w) = D(x, y, z) + D(x, y, w).

It is also immediately obvious from the definition that if x, y, z, and w are vectors in R³
and c is a scalar, then

    D(cx, y, z) = cD(x, y, z),
    D(x + w, y, z) = D(x, y, z) + D(w, y, z).

Least elegant is the verification that D(y, x, z) = −D(x, y, z):

    D(y, x, z) = y_1 |x_2  z_2; x_3  z_3| − y_2 |x_1  z_1; x_3  z_3| + y_3 |x_1  z_1; x_2  z_2|
               = y_1(x_2 z_3 − x_3 z_2) + y_2(x_3 z_1 − x_1 z_3) + y_3(x_1 z_2 − x_2 z_1)
               = −x_1(y_2 z_3 − y_3 z_2) + x_2(y_1 z_3 − y_3 z_1) − x_3(y_1 z_2 − y_2 z_1)
               = −x_1 |y_2  z_2; y_3  z_3| + x_2 |y_1  z_1; y_3  z_3| − x_3 |y_1  z_1; y_2  z_2|
               = −D(x, y, z).

Summarizing, we have

Property 1 If x, y, z ∈ R³, then

    D(y, x, z) = D(x, z, y) = D(z, y, x) = −D(x, y, z).

Note that, as a consequence, whenever two of x, y, and z are the same, we have
D(x, y, z) = 0.

Property 2 If x, y, z ∈ R³ and c ∈ R, then

    D(cx, y, z) = D(x, cy, z) = D(x, y, cz) = cD(x, y, z).

Property 3 If x, y, z, w ∈ R³, then

    D(x + w, y, z) = D(x, y, z) + D(w, y, z),
    D(x, y + w, z) = D(x, y, z) + D(x, w, z),   and
    D(x, y, z + w) = D(x, y, z) + D(x, y, w).

Property 4 For the standard basis vectors e_1, e_2, e_3, we have D(e_1, e_2, e_3) = 1.

If we let y′ = y − proj_x y and z′ = z − proj_x z − proj_{y′} z, then it follows from the properties of D that D(x, y, z) = D(x, y′, z′). Moreover, we shall see when we study determinants
in Chapter 7 that the results of Exercise 4 hold in three dimensions as well, so that the latter
value is not changed by rotating R³ to make x = αe_1, y′ = βe_2, and z′ = γe_3. Since rotation doesn't change signed volume, we deduce that D(x, y, z) equals the signed volume
of the parallelepiped spanned by x, y, and z, as suggested in Figure 5.6. For an alternative
argument, see Exercise 18.

Figure 5.6

Given two vectors x, y ∈ R³, define a vector, called their cross product, by

    x × y = (x_2 y_3 − x_3 y_2)e_1 + (x_3 y_1 − x_1 y_3)e_2 + (x_1 y_2 − x_2 y_1)e_3 = |e_1  x_1  y_1; e_2  x_2  y_2; e_3  x_3  y_3|,

where the latter determinant is to be interpreted "formally." The geometric interpretation of the cross
product, as indicated in Figure 5.7, is the content of the following

Proposition 5.1 The cross product x × y of two vectors x, y ∈ R³ is orthogonal to
both x and y, and ‖x × y‖ is the area of the parallelogram 𝒫 spanned by x and y. Moreover,
Figure 5.7

when x and y are nonparallel, the vectors x, y, x × y determine a parallelepiped of positive
signed volume.

Remark More colloquially, if you curl the fingers of your right hand from x toward
y, your thumb points in the direction of x × y.

Proof The orthogonality is an immediate consequence of the properties once we
realize that the formula for the cross product guarantees that

    z · (x × y) = D(z, x, y).

In particular, x · (x × y) = D(x, x, y) = 0.
Now, D(x, y, x × y) is the signed volume of the parallelepiped spanned by x, y, and
x × y. Since x × y is orthogonal to the plane spanned by x and y, that volume is the product
of the area of 𝒫 and ‖x × y‖. On the other hand,

    D(x, y, x × y) = D(x × y, x, y) = (x × y) · (x × y) = ‖x × y‖².

Setting the two expressions equal, we infer that

    ‖x × y‖ = area(𝒫).

When x and y are nonparallel, we have D(x, y, x × y) = ‖x × y‖² > 0, so the vectors span
a parallelepiped of positive signed volume, as desired. ■
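Both conclusions of Proposition 5.1 can be spot-checked numerically. The sketch below (our own numbers) verifies that the cross product is orthogonal to its two factors and that its length equals ‖x‖‖y‖ sin θ, the area of the parallelogram.

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0])
y = np.array([0.0, 1.0, 3.0])
n = np.cross(x, y)

# orthogonality to both factors
assert np.isclose(n @ x, 0.0) and np.isclose(n @ y, 0.0)

# length equals the area of the parallelogram spanned by x and y
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
area = np.linalg.norm(x) * np.linalg.norm(y) * np.sin(theta)
assert np.isclose(np.linalg.norm(n), area)
```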

► EXAMPLE 2

We can use the cross product to find the equation of the subspace 𝒫 spanned by the vectors u = [1; 1; −1]
and v = [1; 2; 1]. For the normal vector to 𝒫 is

    A = u × v = |e_1  1  1; e_2  1  2; e_3  −1  1| = [3; −2; 1],

and so

    𝒫 = {x ∈ R³ : A · x = 0} = {x ∈ R³ : 3x_1 − 2x_2 + x_3 = 0}.

Moreover, as depicted schematically in Figure 5.8, the affine plane parallel to 𝒫 and passing
through the point x_0 = [2; 0; 1] is given by

    𝒫_1 = {x ∈ R³ : A · (x − x_0) = 0} = {x ∈ R³ : A · x = A · x_0}
        = {x ∈ R³ : 3x_1 − 2x_2 + x_3 = 7}. ◄

Figure 5.8

► EXERCISES 1.5

1. Give a geometric proof that D(x, y + cx) = D(x, y) for any scalar c.
2. Show that if a function D: R² × R² → R satisfies Properties 1-4, then D(x, y) = x_1 y_2 − x_2 y_1.
3. Suppose a polygon in the plane has vertices [x_1; y_1], [x_2; y_2], ..., [x_n; y_n]. Give a formula for its
area.
4. (a) Check that when A and B are 2 × 2 matrices, we have det(AB) = det A det B.
(b) Let A = A_θ be a rotation matrix. Check that det(A_θ B) = det B for any 2 × 2 matrix B.
(c) Use the result of part b and the properties of determinants to give an alternative proof that D(x, y)
is the signed area of the parallelogram spanned by x and y.
5. Calculate the cross product of the given vectors x and y.
6. Find the area of the triangle with the given vertices.
*(a) A = [0; 0; 0], B = [1; 0; −1], C = [1; 2; 1]
(b) A = [1; −1; 1], B = [2; −1; 0], C = [−2; 1; 2]
(c) A = [0; 0; 0], B = [1; −2; 1], C = [7; 1; −5]
(d) A = [1; 1; 1], B = [2; −1; 2], C = [8; 2; −4]

7. Find the equation of the (affine) plane containing the three points
*(a) A = [0; 0; 0], B = [1; 0; −1], C = [1; 2; 1]
(b) A = [1; −1; 1], B = [2; −1; 0], C = [−2; 1; 2]
(c) A = [0; 0; 0], B = [1; −2; 1], C = [7; 1; −5]
*(d) A = [1; 1; 1], B = [2; −1; 2], C = [8; 2; −4]
*8. Find the equation of the (affine) plane containing the points [1; −1; 2] and [0; 1; 3] and parallel to the
vector [1; 0; 1].
9. Find the intersection of the two planes x_1 + x_2 − 2x_3 = 0 and 2x_1 + x_2 + x_3 = 0.


10. Given the nonzero vector a ∈ R³, a · x = b ∈ R, and a × x = c ∈ R³, can you determine the
vector x ∈ R³? If so, give a geometric construction for x.
*11. Find the distance between the given skew lines in R³:

    ℓ: [2; 1; 1] + t[0; 1; −1]   and   m: [1; 1; 0] + s[1; 1; 1].

*12. Find the volume of the parallelepiped spanned by

    x = [1; 2; 1],   y = [2; 3; 1],   and   z = [−1; 0; 3].

13. Let 𝒫 be a parallelogram in R³. Let 𝒫_1 be its projection on the x_2x_3-plane, 𝒫_2 be its projection
on the x_1x_3-plane, and 𝒫_3 be its projection on the x_1x_2-plane. Prove that

    (area(𝒫))² = (area(𝒫_1))² + (area(𝒫_2))² + (area(𝒫_3))².

(How's that for a generalization of the Pythagorean Theorem?)


14. Let x, y, z ∈ R³.
(a) Show that x × y = −y × x and x × (y + z) = x × y + x × z.
(b) Show that cross product is not associative; i.e., give specific vectors so that (x × y) × z ≠
x × (y × z).
*15. Given a = [a; b; c] ∈ R³, define T: R³ → R³ by T(x) = a × x. Prove that T is a linear transformation and give its standard matrix. Explain in the context of Proposition 4.5 why [T] is skew-symmetric.
16. Let x, y, z, w ∈ R³. Show that

    (x × y) · (z × w) = |x · z   x · w; y · z   y · w|.

17. Suppose u, v, w ∈ R² are noncollinear points, and let x ∈ R².
(a) Show that we can write x uniquely in the form x = ru + sv + tw, where r + s + t = 1. (Hint:
The vectors v − u and w − u must be nonparallel. Now apply the result of Example 11 of Section 4.)
(b) Show that r is the ratio of the signed area of the triangle with vertices x, v, and w to the signed
area of the triangle with vertices u, v, and w. Give corresponding formulas for s and t.

(c) Suppose x is the intersection of the medians of the triangle with vertices u, v, and w. Compare
the areas of the three triangles formed by joining x with any pair of the vertices. (Cf. Exercise 1.1.8.)
(d) Let r = D(v, w), s = D(w, u), and t = D(u, v). Show that ru + sv + tw = 0. Give a physical
interpretation of this result.
18. In this exercise, we give a self-contained derivation of the geometric interpretation of the 3 × 3
determinant as signed volume.
(a) By direct algebraic calculation, show that ‖x × y‖² = ‖x‖²‖y‖² − (x · y)². Deduce that ‖x × y‖
is the area of the parallelogram spanned by x and y.
(b) Show that z · (x × y) is the signed volume of the parallelepiped spanned by x, y, and z.
(c) Conclude that D(x, y, z) equals the signed volume of that parallelepiped.
19. (Heron's formula) Given △OAB, let OA = x and OB = y, and set ‖x‖ = a, ‖y‖ = b, and
‖x − y‖ = c. Let s = ½(a + b + c) be the semiperimeter of the triangle. Use the formulas

    ‖x × y‖² = ‖x‖²‖y‖² − (x · y)²   (see Exercise 18),
    ‖x − y‖² = ‖x‖² + ‖y‖² − 2x · y

to prove that the area A of △OAB satisfies

    A² = ¼(a²b² − ¼(c² − a² − b²)²) = s(s − a)(s − b)(s − c).

20. Let △ABC have sides a, b, and c. Let s = ½(a + b + c) be its semiperimeter. Prove that the
inradius of the triangle (i.e., the radius of its inscribed circle) is r = √((s − a)(s − b)(s − c)/s).
CHAPTER 2

FUNCTIONS, LIMITS,
AND CONTINUITY
In this brief chapter we introduce examples of nonlinear functions, their graphs, and their
level sets. As usual in calculus, the notion of limit is a cornerstone on which calculus is
built. To discuss “nearness,” we need the concepts of open and closed sets and of convergent
sequences. We then give the usual theorems on limits of functions and several equivalent
ways of thinking about continuity. All of this will be the foundation for our work on
differential calculus, which comes next.

► 1 SCALAR- AND VECTOR-VALUED FUNCTIONS


In first-year calculus we studied real-valued functions defined on intervals in R (or perhaps
on all of R). In Chapter 1 we began our study of linear functions from R^n to R^m. There
are three steps we might imagine to understand more complicated vector-valued functions
of a vector variable.

1.1 Parametrized Curves


First, we might study a vector-valued function of a single variable. If we think of the
independent variable t as time, then we can visualize f: (a, b) → R^n as a parametrized
curve—we can imagine a particle moving in R^n as time varies, and f(t) gives its position
at time t. At this point, we just give an assortment of examples. The careful analysis,
including the associated differential calculus and physical interpretations, will come in the
next chapter.

► EXAMPLE 1

The easiest examples, perhaps, are linear. Imagine a particle starting at position x_0 and moving with
constant velocity v. Then its position at time t is evidently f(t) = x_0 + tv, and its trajectory is a line
passing through x_0 and having direction vector v, as shown in Figure 1.1. We refer to the vector-valued function f as a parametrization of the line. Here t is free to vary over all of R. When we wish
to parametrize the line passing through two points A and B, it is natural to use one of those points,
say A, as x_0 and the vector AB as the direction vector v, as indicated in Figure 1.2. ◄


► EXAMPLE 2

The next curve with which every mathematics student is familiar is the circle. Essentially by the very
definition of the trigonometric functions cos and sin, we obtain a very natural parametrization of a
circle of radius a, as pictured in Figure 1.3(a):

    f(t) = a[cos t; sin t],   0 ≤ t ≤ 2π.

Now, if a, b > 0 and we apply the linear map

    T([x; y]) = [ax; by],

we see that the unit circle x² + y² = 1 maps to the ellipse x²/a² + y²/b² = 1. Since T([cos t; sin t]) = [a cos t; b sin t],
the latter gives a natural parametrization of the ellipse, as shown in Figure 1.3(b). Be warned, however: Here t is not the angle between the position vector and the positive x-axis, as Figure 1.3(c)
indicates. ◄

Figure 1.3

Now we come to some more interesting examples.

► EXAMPLES

Consider the two cubic curves in R2, illustrated in Figure 1.4. On the left is the cuspidal cubic
y2 — x3, and on the right is the nodal cubic y2 = x3 + x2. These can be parametrized, respectively,
by the functions

Figure 1.4
56 ► Chapter 2. Functions, Limits, and Continuity

as the reader can verify.1 Now consider the twisted cubic in R3, illustrated in Figure 1.5, given by

t
f(r) = t eR.

Its projections in the xy-, xz-, and yz-coordinate planes are, respectively, y = x2, z = x3, and z2 = y3
(the cuspidal cubic). *4

► EXAMPLE 4

Our last example is a classic called the cycloid: It is the trajectory of a dot on a rolling wheel (circle).
Consider the illustration in Figure 1.6. Assuming the wheel rolls without slipping, we see that the
distance it travels along the ground is equal to the length of the circular arc subtended by the angle
through which it has turned. That is, if the radius of the circle is a and it has turned through angle t,
then the point of contact with the x-axis, Q, is at units to the right. The vector from the origin to the
point P can be expressed as the sum of the three vectors OQ, QC, and CP (see Figure 1.7):

    OP = OQ + QC + CP = [at; 0] + [0; a] + [−a sin t; −a cos t],

and hence the function

    f(t) = [at − a sin t; a − a cos t] = a[t − sin t; 1 − cos t],   t ∈ R,

gives a parametrization of the cycloid. ◄

¹To see where the latter came from, as suggested by Figure 1.4(b), we substitute y = tx in the equation and solve
for x.

1.2 Scalar Functions of Several Variables

Next, we might study a scalar-valued function of several variables. For example, we might
study elevation of the earth as a function of position on the surface of the earth; temperature
at noon as a function of position in space; or, indeed, temperature as a function of both
position and time. If we have a function of n variables, to avoid cumbersome notation, we
will typically write

    f([x_1; ...; x_n])

rather than f(x_1, ..., x_n). It would be typographically more pleasant and economical to suppress the vector notation
and write merely f(x_1, ..., x_n), as do most mathematicians. We hope our choice will make
it easier for the reader to keep vectors in columns and not confuse rows and columns of
matrices.
When n = 1 or n = 2, such functions are often best visualized by their graphs

    graph(f) = {[x; f(x)] : x ∈ R^n} ⊂ R^{n+1},

as pictured, for example, in Figure 1.8. There are two ways to try to visualize functions
and their graphs, as we shall see in further detail in Chapter 3. One is to fix all of the
coordinates of x but one, and see how f varies with each of x_1, ..., x_n individually. This
corresponds to taking slices of the graph, as shown in Figure 1.9. The other is to think of a
topographical map, in which we see curves representing points at the same elevation. One
then can lift each of these up to the appropriate height and imagine the surface interpolating
among them, as illustrated in Figure 1.10. These curves are called level curves or contour
curves of the function.

Figure 1.9

Figure 1.10

► EXAMPLE 5

Suppose we see families of concentric circles as the level curves, as shown in Figure 1.11. We see
that in (a) the circles are evenly spaced, whereas in (b) they grow closer together as we move outward.
This tells us that in (a) the value of f grows linearly with the distance from the origin and in (b) it
grows more quickly. Indeed, it is not surprising to see the corresponding graphs in Figure 1.12: The
respective functions are f(x) = ‖x‖ and f(x) = ‖x‖². ◄

Figure 1.11

Figure 1.12

1.3 Vector Functions of Several Variables

Last, we think of vector-valued functions of several variables. Of course, the linear versions
arise in the study of linear maps, as we've already seen, and a good deal more in the solution
of systems of linear equations. Sometimes it is easiest to think of a vector-valued function
f: R^n → R^m as merely a collection of m scalar functions of n variables:

    f(x) = [f_1(x); ...; f_m(x)].

But in other instances, we really want to think of the values as geometrically defined vectors;
fundamental examples are parametrized surfaces and vector fields (both of which we shall
study a good deal in Chapter 8). Note that we will indicate a vector-valued function by
boldface type.

► EXAMPLE 6

Consider the mapping

    f: (0, ∞) × [0, 2π) → R² − {0},   f([r; θ]) = [r cos θ; r sin θ],

as illustrated in Figure 1.13. This is a one-to-one mapping onto R² − {0}. The coordinates r and θ are
often called the polar coordinates of the point [r cos θ; r sin θ]. ◄

Figure 1.13

► EXAMPLE 7

Consider the mapping

    f([u; v]) = [u cos v; u sin v; u],   u > 0,  0 ≤ v < 2π.

When we fix u = u_0, the image is a circle of radius u_0 at height u_0; when we fix v = v_0, the image is
a ray making an angle of π/4 with the z-axis and whose projection into the xy-plane makes an angle
of v_0 with the positive x-axis. Thus, the image of f is a cone, as pictured in Figure 1.14. ◄

Figure 1.14

► EXERCISES 2.1

1. Find parametrizations of each of the following lines:
(a) 3x_1 + 4x_2 = 6,
*(b) the line with slope 1/3 that passes through the point …,
(c) the line through A = …,
(d) the line through A = …,
*(e) the line through [1; 1; 0] parallel to g(t) = [2 + t; 1 − 2t; 3t].
2. (a) Give parametric equations for the circle x² + y² = 1 in terms of the length t pictured in Figure
1.15. (Hint: Use similar triangles and algebra.)
(b) Use your answer to part a to produce infinitely many positive integer solutions² of X² + Y² = Z²
with distinct ratios Y/X.

²These are called Pythagorean triples. Fermat asked whether there were any nonzero integer solutions of the
corresponding equations Xⁿ + Yⁿ = Zⁿ for n ≥ 3. In 1995, Andrew Wiles proved in a tour de force of algebraic
number theory that there can be none.

Figure 1.15
3. A string is unwound from a circular reel of radius a, being pulled taut at each instant. Give
parametric equations for the tip of the string P in terms of the angle θ, as pictured in Figure 1.16.

Figure 1.16
4. A wheel of radius a (perhaps belonging to a train) rolls along the x-axis. If a point P (on the
wheel) is located a distance b from the center of the wheel, what are the parametric equations of its
locus as the wheel rolls? (Note that when b = a we obtain a cycloid.) See Figure 1.17.

Figure 1.17
5. *(a) A circle of radius b rolls without slipping outside a circle of radius a > b. Give the parametric
equations of a point P on the circumference of the rolling circle (in terms of the angle θ of the line
joining the centers of the two circles). (See Figure 1.18(a).)
(b) Now it rolls inside. Do the same as for part a.
These curves are called, respectively, an epicycloid and a hypocycloid.
6. A coin of radius 1" is rolled (without slipping) around the outside of a coin of radius 2". How many
complete revolutions does its “head” make? Now explain the correct answer! (There is a famous
story that the Educational Testing Service screwed this one up and was challenged by a precocious
high school student who knew that he had done the problem correctly.)
Figure 1.18

*7. A dog buries a bone at [0; 1]. He is at the end of a 1-unit long leash, and his master walks down
the positive x-axis, dragging the dog along. Since the dog wants to get back to the bone, he pulls
the leash taut. (It was pointed out to me by some students a few years ago that the realism of this
model leaves something to be desired.) The curve the dog travels is called a tractrix (why?). Give
parametric equations of the curve in terms of the parameters
(a) θ,  (b) t,
as pictured in Figure 1.19. (Hint: The fact that the leash is pulled taut means that the leash is tangent
to the curve. Show that θ′(t) = sin θ(t).)

Figure 1.19
8. Prove that the twisted cubic (given in Example 3) has the property that any three distinct points
on it determine a plane; i.e., no three distinct points are collinear.
9. Sketch families of level curves and the graphs of the following functions f:
(a) f([x; y]) = 1 − y    (c) f([x; y]) = x² − y²
10. Consider the surfaces

    X: x² + y² − z² = 1   and   Y: x² + y² − z² = −1.

(a) Sketch the surfaces.
(b) Give a rigorous argument (not merely based on your pictures) that every pair of points of X can
be joined by a curve in X but that the same is not true of Y.
11. Consider the function

    g([s; t]) = [(2 + cos t) cos s; (2 + cos t) sin s; sin t],   0 ≤ s, t < 2π.

(a) Sketch the image, X, of g.
*(b) Find an algebraic equation satisfied by all the points of X.
12. Consider the function (defined wherever st ≠ 1)

    g([s; t]) = ….

(a) Show that every point in the image of g lies on the hyperboloid x² + y² − z² = 1.
(b) Show that the curves t ↦ g([s_0; t]) and s ↦ g([s; t_0]) (for s_0 and t_0 constants) are (subsets of)
lines. (See Figure 1.20.)
(c) More challenging: What is the image of g?

Figure 1.20

► 2 A BIT OF TOPOLOGY IN R^n

Having introduced functions, we must next decide what it means for a function to be
continuous. In one-variable calculus, we study functions defined on intervals and come
to appreciate the difference between open and closed intervals. For example, the notion
of limit is couched in terms of open intervals, whereas the maximum value theorem for
continuous functions depends crucially on closed intervals. Matters are somewhat more
subtle in higher dimensions, and we begin our assault on the analogous notions in R^n.

Definition Let a ∈ R^n and let δ > 0. The ball of radius δ centered at a is

    B(a, δ) = {x ∈ R^n : ‖x − a‖ < δ}.

This is often called a neighborhood of a.

Note that if |x_i − a_i| < δ/√n for all i = 1, ..., n, then

    ‖x − a‖² = (x_1 − a_1)² + ... + (x_n − a_n)² < n(δ/√n)² = δ²,

so x ∈ B(a, δ). And if x ∈ B(a, δ), then |x_i − a_i| ≤ ‖x − a‖ < δ for all i = 1, ..., n.
Figure 2.1 illustrates these relationships. If a_i < b_i for i = 1, ..., n, we can consider the
rectangle

    R = [a_1, b_1] × [a_2, b_2] × ... × [a_n, b_n] = {x ∈ R^n : a_i ≤ x_i ≤ b_i, i = 1, ..., n}.

(Strictly speaking, we should call this a rectangular parallelepiped, but that's too much of a
mouthful.) For reasons that will be obvious in a moment, when we construct the rectangle
from open intervals, viz.,

    S = (a_1, b_1) × (a_2, b_2) × ... × (a_n, b_n) = {x ∈ R^n : a_i < x_i < b_i, i = 1, ..., n},

we call it an open rectangle.

Figure 2.1

Definition We say a subset U ⊂ R^n is open if for every a ∈ U there is some ball
centered at a that is completely contained in U; that is, there is δ > 0 so that B(a, δ) ⊂ U.

► EXAMPLE 1

a. First of all, an open interval (a, b) ⊂ R is an open subset. Given any c ∈ (a, b), choose
δ ≤ min(c − a, b − c). Then B(c, δ) ⊂ (a, b). However, suppose we view this interval as
a subset of R²; namely, S = {[x; 0] : a < x < b}. Then it is no longer an open subset
because no ball in R² centered at [c; 0] is contained in S, as Figure 2.2 plainly indicates.

Figure 2.2

b. An open rectangle is an open set. As indicated in Figure 2.3, suppose c ∈ S = (a_1, b_1) ×
(a_2, b_2) × ... × (a_n, b_n). Let δ_i = min(c_i − a_i, b_i − c_i), i = 1, ..., n, and set δ =
min(δ_1, ..., δ_n). Then we claim that B(c, δ) ⊂ S. For if x ∈ B(c, δ), then |x_i − c_i| ≤
‖x − c‖ < δ ≤ δ_i, so a_i < x_i < b_i, as required.

c. Consider S = {[x; y] : 0 < xy < 1}. We want to show that S is open, so we choose
c = [a; b] ∈ S. Without loss of generality, we may assume that 0 < b ≤ a, as shown in
Figure 2.4. We claim that the ball of radius

    δ = (1 − ab)/(a + 1/b) = b((1 − ab)/(1 + ab))

centered at c is wholly contained in the region S. We consider the open rectangle centered
at c with base 1/b − a and height 2δ; by construction, this rectangle is contained in S. Since
b ≤ a and ab < 1, it is easy to check that the height is smaller than the base, and so the ball
of radius δ centered at c is contained in the rectangle, hence in S. ◄

As we shall see in the next section, the concept of open sets is integral to the notion of
continuity of a function.
We turn next to a discussion of sequences. The connections to open sets will become
clear.

Definition A sequence of vectors (or points) in R^n is a function from the set of natural
numbers, N, to R^n, i.e., an assignment of a vector x_k ∈ R^n to each natural number k ∈ N.
We refer to x_k as the k-th term of the sequence. We often abuse notation and write {x_k} for
such a sequence, even though we are thinking of the actual function and not the set of its
values.

Figure 2.4

We say the sequence {x_k} converges to a (denoted x_k → a or lim_{k→∞} x_k = a) if for all
ε > 0, there is K ∈ N such that

    ‖x_k − a‖ < ε whenever k > K.

(That is, given any neighborhood of a, "eventually"—past some K—all the elements x_k of
the sequence lie inside.) We say the sequence {x_k} is convergent if it converges to some a.

► EXAMPLE 2

Here are a few examples of sequences, both convergent and nonconvergent.

a. Let x_k = k/(k + 1). We suspect that x_k → 1. To prove this, note that, given any ε > 0,

    |k/(k + 1) − 1| = 1/(k + 1) < ε

whenever k + 1 > 1/ε. If we let K = [1/ε] (the greatest integer less than or equal to 1/ε),
then it is easy to see that k > K ⇒ k + 1 > 1/ε, as required.
b. The sequence {x_k = (1 + 1/k)^k} of real numbers is a famous one (think of compound interest)
and converges to e, as the reader can check by taking logs and applying Proposition 3.6.
c. The sequence 1, −1, 1, −1, 1, ..., i.e., x_k = (−1)^(k+1), is not convergent. Since its consecutive terms are two units apart, no matter what a ∈ R and K ∈ N we pick, whenever
ε < 1, we cannot have |x_k − a| < ε whenever k > K. For if we did, we would have (by
the triangle inequality)

    2 = |x_(k+1) − x_k| = |x_(k+1) − a + a − x_k| ≤ |x_(k+1) − a| + |x_k − a| < 2ε < 2

whenever k > K, which is, of course, impossible.
d. Let x_0 ∈ R^n be a fixed vector. Define a sequence (recursively) by x_k = ½x_(k−1), k ≥ 1. This
means, of course, that x_k = (½)^k x_0, and so we suspect that x_k → 0. If x_0 = 0, there is
nothing to prove. Suppose x_0 ≠ 0 and ε > 0. Then we will have

    ‖x_k − 0‖ = ‖x_k‖ = (½)^k ‖x_0‖ < ε

whenever k > log₂(‖x_0‖/ε) = log(‖x_0‖/ε)/log 2. So, if we take K = [log(‖x_0‖/ε)/
log 2] + 1, then it follows that whenever k > K we have ‖x_k‖ < ε, as required.
e. Let A = [2  0; 0  1] and x_0 = [1; 1]. Define a sequence of vectors in R² recursively by
x_k = Ax_(k−1)/‖Ax_(k−1)‖, k ≥ 1. As the reader can easily prove by induction, we have

    x_k = (1/√(2^(2k) + 1)) [2^k; 1],

and it follows that lim x_k = [1; 0]. ◄
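A short experiment makes part e vivid. The sketch below (ours, using the recursion as reconstructed above) repeatedly applies A and normalizes; the iterates settle into the direction the matrix stretches the most.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
x = np.array([1.0, 1.0])
for k in range(30):
    x = A @ x
    x = x / np.linalg.norm(x)   # normalize at each step

print(x)   # approaches (1, 0)
```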

► EXAMPLE 3

Suppose x_k, y_k ∈ R^n, x_k → a, and y_k → b. Then it seems quite plausible that x_k + y_k → a + b.
Given ε > 0, we are to find K ∈ N so that whenever k > K we have ‖(x_k + y_k) − (a + b)‖ < ε.
Rewriting, we observe that (by the triangle inequality)

    ‖(x_k + y_k) − (a + b)‖ = ‖(x_k − a) + (y_k − b)‖ ≤ ‖x_k − a‖ + ‖y_k − b‖,

and so we can make ‖(x_k + y_k) − (a + b)‖ < ε by making ‖x_k − a‖ < ε/2 and ‖y_k − b‖ < ε/2. To
this end, we use the definition of convergence of the sequences {x_k} and {y_k} as follows: There are
K_1, K_2 ∈ N so that

    whenever k > K_1, we have ‖x_k − a‖ < ε/2

and

    whenever k > K_2, we have ‖y_k − b‖ < ε/2.

Thus, if we take K = max(K_1, K_2), whenever k > K, we will have k > K_1 and k > K_2, and so

    ‖(x_k + y_k) − (a + b)‖ ≤ ‖x_k − a‖ + ‖y_k − b‖ < ε/2 + ε/2 = ε,

as was required. ◄

A crucial topological property of R is the least upper bound property.

    A subset S ⊂ R is bounded above if there is some b ∈ R so that a ≤ b for
    all a ∈ S. Such a real number b is called an upper bound of S. Then every
    nonempty set S that is bounded above has a least upper bound, denoted sup S.
    That is, a ≤ sup S for all a ∈ S and sup S ≤ b for every upper bound b of S.

► EXAMPLE 4

a. Let S = [0,1]. Then S is bounded above (e.g., by 2) and sup 5=1.


b. Let 5 = {x e Q : x2 < 2}. Then 5 is bounded above (e.g., by 2), and sup 5 = a /2. (Note
that V2 £ Q. The point is that the irrational numbers fill in all the “holes” among the
rationals.)
c. Suppose {xjt} is a sequence of real numbers that is both bounded above and nondecreasing
(i.e., xk < Xk+i for all k e N). Then the sequence must converge. Since the sequence is
bounded above, there is a least upper bound, a, for the set of its values. Now we claim that
Xk -> a. Given e > 0, there is K e N so that a — xK < e (for otherwise a would not be
the least upper bound). But then the fact that the sequence is nondecreasing tells us that
whenever k > K we have 0 < a — Xk < a — x k < £, as required. *4

Definition Suppose S ⊂ R^n. If S has the property that every convergent sequence
of points in S converges to a point in S, then we say S is closed. That is, S is closed if the
following is true: Whenever a convergent sequence x_k → a has the property that x_k ∈ S
for all k ∈ N, then a ∈ S as well.

► EXAMPLES

5 = R — {0} = {x g R : x 0} C R is not closed, for if we take the sequence xk — 1/fc G 5,clearly


Xk -> 0, and 0 £ 5. *4

This definition seems, a bit strange, but it is exactly what we will need for many
applications to come. In the meantime, if we need to decide whether or not a set is closed,
it is easiest to use the following.

Proposition 2.1 The subset S C R” is closed ifand only if its complement, R" — S =
{x G R” : x £ 5}, is open.

Proof Suppose R” — S' is open and {x^} is a convergent sequence with x* g S and
limit a. Suppose that a 0 S. Then there is a neighborhood B(a, s) of a wholly contained
in R” — S', which means no element of the sequence {x*} lies in that neighborhood, contra-
v dieting the fact that x* -> a. Therefore, a g S, as desired.
Suppose S is closed and b S. We claim that there is a neighborhood of b lying
entirely in R" — S. Suppose not. Then for every k g N, the ball B(b, 1/k) intersects S;
that is, we can find a point x* G S with ||X£ — b|| < 1/k. Then {x*} is a sequence of points
in S converging to the point b $ 5, contradicting the hypothesis that S is closed. ■
70 ► Chapter 2. Functions, Limits, and Continuity

> EXAMPLE 6

It now follows easily that the closed interval [a, b] = {x e R : a < x < b} is a closed subset of R,
inasmuch as its complement is the union of two open intervals. Similarly, the closed ball B(a, r) =
{x e R" : ||x — a|| < r} is a closed subset of R”, as we ask the reader to check in Exercise 5. In
summary, our choice of terminology is felicitous indeed.

Note that most sets are neither open nor closed. For example, the intervals = (0,1] c
R is not open because there is no neighborhood of the point 1 contained in S, and it is not
closed because of the reasoning in Example 5. Be careful not to make a common mistake
here: Just because a set isn’t open, it need not be closed, and vice versa.
For future use, we make the following

Definition Suppose S C R". We define the closure of S to be the smallest closed


set containing S. It is denoted by S.

We should think of S as containing all the points of S and all points that can be obtained
as limits of convergent sequences of points of S. A slightly different formulation of this
notion is given in Exercise 8.

► EXERCISES 2.2

*1. Which of the following subsets of R” is open? closed? neither? Prove your answer.
(a) {x : 0 < x < 2} C R x 1 2
(b) {x : x = 2~k for some fceNor (fi) .y—x| cR
x = 0} C R
(h) {x : 0 < l|x|| < 1} c R”
(c) : y > 01 C R2 (i) {x : ||x|| > 1} c R” \
0) {x : ||x|| < 1} c R"
(k) the set of rational numbers, Q C R
(d) cR2
1
x : ||x|| < 1 or x - < 1 ■ C R2
0
(e) :y >x C R2
(m) 0 (the empty set)

a2. Let lx*} be a sequence of points in R". For i = 1,..., n, let x^i denote the Ith coordinate of the
vector x*. Prove that x^ -> a if and only if x*,, -> at for all i = 1,..., n.
3. Suppose {Xfc} is a sequence of points (vectors) in R" converging to a.
■ (a) Prove that ||x* || —> ||a||. (Hint: See Exercise 1.2.17.)
(b) Prove that if b € R" is any vector, then b • x* -> b • a.
#4. Prove that a rectangle R = [«i, hi] x •• • x [an, bn] c R" is closed.
*5. Prove that the closed ball B(a, r) = {x € Rn : ||x - a|| < r} c R" is closed.
36. Given a sequence {x*} of points in R", a subsequence is formed by taking xfcl, Xfc2,..., xk.,...,
wherek\ < k2 < ^3 < • • ••
2 A Bit of Topology in R" 71

(a) Prove that if the sequence {x*} converges to a, then any subsequence {x^} converges to a as well.
(b) Is the converse valid? Give a proof or counterexample.
”7. (a) Suppose U and V are open subsets of RB. Prove that U U V and U D V are open as well.
(Recall that U U V = {x e R”: x € U or x € V} and UnV = {x€R”:xeU and x e V}.)
(b) Suppose C and D are closed subsets of R". Prove that CUD and C D D are closed as well.
118. Let S C RB. We say a e S is an interior point of S if some neighborhood of a is contained in S.
We say a € RB is a frontier point of S if every neighborhood of a contains both points in S and points
not in S.
(a) Show that every point of S is either an interior point or a frontier point, but give examples to
show that a frontier point of S may or may not belong to S.
(b) Give an example of a set S every point of which is a frontier point.
(c) Prove that the set of frontier points of S is always a closed set.
(d) Let S' be the union of S and the set of frontier points of S. Prove that S' is closed.
(e) Suppose C is a closed set containing S. Prove that S' C C. Thus, S' is the smallest closed set
containing S, which we have earlier called S, the closure of S. (Hint: Show that R” — C c R" — S'.)
9. Continuing Exercise 8:
(a) Is it true that all the interior points of S are points of S? Is this true if S is open? (Give proofs or
counterexamples.)
(b) Let S c R" and let F be the set of the frontier points of S. Is it true that the set of frontier points
of F is F itself? (Give a proof or counterexample.)
s*10. (a) Suppose Io —[a, £>] is a closed interval, and for each k e N, Ik is a closed interval with the
property that Ik C Ik-i • Prove that there is a point x e R so that x € Ik for all k e N.
(b) Give an example to show that the result of part a is false if the intervals are not closed.
11. Prove that the only subsets of R that are both open and closed are the empty set and R itself. (Hint:
Suppose S is such a nonempty subset that is not equal to R. Then there are some points a e S and
b $ S. Without loss of generality (how?), assume a < b. Let a = sup{x e R : [a, x] c S}. Show
that neither a e S nor a £ S is possible.)
12. A sequence {xfc} of points in R" is called a Cauchy sequence if for all e > 0 there is K e N so
that whenever k, t > K, we have IJx* — xg || < e.
(a) Prove that any convergent sequence is Cauchy.
(b) Prove that if a subsequence of a Cauchy sequence converges, then the sequence itself must
converge. (Hint: Suppose £ > 0. If x^ -> a, then there is J e N so that whenever j > J, we have
l|Xfcy — a|| < 6/2. ThereisalsoiT € N so that whenever k, I > K, we have ||xfc — xj| < e/2. Choose
j > J so that kj > K.)
*13. Prove that if {xfc} is a Cauchy sequence, then all the points lie in some ball centered at the origin.
14. (a) Suppose {j q } is a sequence of points in R satisfying a < xk < b for all k e N. Prove that
lx*} has a convergent subsequence (see Exercise 6). (Hint: If there are only finitely many distinct
terms in the sequence, this should be easy. If there are infinitely many distinct terms in the sequence,
then there must be infinitely many either in the left half-interval [a, or in the right half-interval
b]. Let [ui, hi] be such a half-interval. Continue the process, and apply Exercise 10.)
(b) Use the results of Exercises 12 and 13 to prove that any Cauchy sequence in R is convergent.
(c) Now prove that any Cauchy sequence in R" is convergent. (Hint: Use Exercise 2.)
315. Suppose S c R” is a closed set that is a subset of the rectangle [ai, bi] x • • • x [an, bn]. Prove
that any sequence of points in S has a convergent subsequence. (Hint: Use repeatedly the idea of
Exercise 14a.)
72 ► Chapter 2. Functions, Limits, and Continuity

► 3 LIMITS AND CONTINUITY


The concept on which all of calculus is founded is that of the limit. Limits are rather more
subtle when we consider functions of more than one variable. We begin with the obligatory
definition and some standard properties of limits.

Definition Let U C R" be an open subset containing a neighborhood of a g Rn,


except perhaps for the point a itself. Suppose f: U -> Rm. We say that

limf(x) = I
x->a
(ff(x) approaches L g Rm as x approaches a) if for every e > 0 there is 3 > 0 so that

||f(x)—Z||<£ whenever 0 < ||x — a|| < <5.

(Note that even if f (a) is defined, we say nothing whatsoever about its relation to £.)

We begin by observing that for a vector-valued function, calculating a limit may be


done component by component. As is customary by now, we denote the components of f
by

Proposition 3.1 lim f (x) = I if and only if lim /,• (x) = € .• for all j = 1,... ,m.
x—>a x->a

Proof The proof is based on Figure 2.1. Suppose limf(x) = t. We must show
X—>8
that for any j = 1,..., m, we have lim fj(x) — tj. Given s > 0, there is 8 > 0 so that
whenever 0 < ||x — a|| < 3, we have ||f (x) — £|| < e. But since we have

|/;(x) — < ||f(x) —£||,

we see lhat whenever 0 < ||x — a|| < 8, we have \fj (x) — | < £, as required.
Now, suppose that lim /)(x) = tj for j = 1,..., m. Given £ > 0, there are ...,
8m > 0 so that
e
\fj(x) —£j\ < —= whenever 0 < ||x — a|| < 8j.

Let 8 = min(<5i,..., 8m). Then whenever 0 < ||x — a|| < 8, we have

m I / £ \2
||f(x)-Z||= V(/;(x)-<)2 < ml — ) =e,

as required. ■

► EXAMPLE 1

Fix a nonzero vector b € ,1". Let f: R" -» R be defined by /(x) = b • x. We claim that lim /(x) =
x—>a
b•a because

|/(x) — b-a| = |b x-b a| = |b• (x-a)| < ||b||||x-a||,


3 Limits and Continuity 73

by the Cauchy-Schwarz Inequality, Proposition 2.3 of Chapter 1. Thus, given 8 > 0, if we take
g
8 = 8/ ||b||, then whenever 0 < ||x - a|| < <5, we have |/(x) — b ■ a| < ||b|| —• = 8, as needed.
Il»ll
Note, moreover, that as a consequence of Proposition 3.1, for any linear map T: Rn -> Rw it is
the case that lim T(x) = T(a).

> EXAMPLE 2

Let f: R" -> R be defined by /(x) = ||x||2. Then we claim that lim /(x) = ||a||2.
x->a

1. Suppose first that a = 0. Since r2 < r whenever 0 < r < 1, we know that when 0 < £ < 1,
we can choose 3 = 8 and then

0 < ||x|| < 8 = 8 ==> |/(x)| = ||x||2 < 82 < 8,

as required. But what if some (admittedly, silly) person hands us an 8 > 1? The trick to
take care of this is to let 8 = min(l, s). Should e be bigger than 1, then 8 = 1, and so when
0 < ||x|| < 8, we know that ||x|| < 1 and, once again, |/(x)| < 1 < 8, as required.
✓ g \
2. Now suppose a 0. Givens > 0, let 8 = min I ||a||, —— I. Now suppose 0 < ||x — a|| <
\ 3||a||/
8. Then, in particular, we have ||x|| < ||a|| + 8 < 2||a||, so that ||x + a|| < ||x|| + ||a|| <
3||a||. Then

|/(x) - ||a||21 = |x • x - a • a| = |(x + a) • (x - a)|


8
< ||x + a|| ||x - a|| < 3||a|| • —- = s,
3||a||

as required.
Such sleight of hand (and more) is often required when the function is nonlinear. ◄

► EXAMPLES

Define/: R2 - {0}-> R by f Does lim /(x) exist? Since |x| < Jx2 + y2 and
x->0

|y | < y/x2~+ y2, we have (writing x =

llxlP
irooi s = iix«.

and so /(x) -> 0 as x -> 0. (In particular, taking 8 = e will work.) An alternative approach, which
will be useful later, is this:

x2
|/(x)| = |y| ...2 < |y|
x2 + y2

x2
since 0 < —------- < 1. Once again, |y| < ||x|| and hence approaches 0 as x -> 0. Thus, so does
x2 4- y2
|/(x)|. (SeeFigure 3.1(a).)
74 ► Chapter 2. Functions, Limits, and Continuity

► EXAMPLE 4

Let’s modify the previous example slightly. Define f: R2 — {0} -> R by f . We ask
x2 + y2
again whether lim /(x) exists. Note that
x->0
i-

0 = lim — = 1,

lim f [ ° j = lim
k->0 \kl
h2

k~*° k2
= 0.
whereas

Thus, lim f (x) cannot exist (there is no number t so that both 1 and 0 are less than £ away from t
x-»0
xy „ T ,.
when 0 < s < 1/2). (See Figure 3.1(b).) Now, what about f ------- ? In this case we have
4- v2

so we might surmise that the limit exists and equals 0. But consider what happens if x approaches 0
along the line y = x :

r Jh\ r h2 1
lim —2r = -.
lim / \h)I = h-+o2h 2

Once again, the limit does not exist. <<

The fundamental properties of limits with which every calculus student is familiar
generalize in an obvious way to the multivariable setting.

Theorem 3.2 Suppose f and g map. a neighborhood of a s JR” (with the possible
exception of the point a itself) to and k maps the same neighborhood to R. Suppose

limf(x)=£, limg(x)=m, and limfc(x)=c.


x->a x—>a x—>a
3 Limits and Continuity 75

Then

lim f (x) + g(x) = I 4- m,


x-+a
lim f(x) • g(x) = t • m,
x->a
lim fc(x)f (x) = c£.
x—>a

Proof Given £ > 0, there are <5i, <§2 > 0 so that


£
||f(x) — Z|| < - whenever 0 < ||x — a|| <

and
8
||g(x) — m|| < - whenever 0 < ||x - a|| < 32.

Let 3 = min(3i, 32). Whenever 0 < ||x - a|| < 3, we have


£ £
II(f(X) + g(x)) - (Z +m)|| < ||f(x) -t\\ 4- ||g(x) —m|| < - + - = e,
£
as required.
Given e > 0, there are (different) 3i, 82 > 0 so that
£
||f(x) - ZII < min ( . .... -- , 1) whenever 0<|(x-a||<3i
Z(||m|| 4-1)

and
£
llg(x) -m|| < • ■ whenever 0 < ||x - a|| < 32.
4" 4?
Note that when 0 < ||x — a|| < 3i, we have (by the triangle inequality) ||f (x)|| < ||Z|| 4-1.
Now, let 3 = min(3i, 32). Whenever 0 < ||x — a|| < 3, we have

|f (x) • g(x) - £ • m| = |f (x) • (g(x) - m) 4- (f (x) - Z) • m|


< ||f (x)|| ||g(x) — m|| 4- (|f(x) -Z||||m||
< (11*11 + l)llg(x) -mil 4- llf(x) -Z||||m||
£ £ £ £
< (W + 1)2(||€|| +1) + 2(||m|| + 1)W * 2 + 2 =

as required.
The proof of the last equality is left to the reader in Exercise 4. ■

Once we have the concept of limit, the definition of continuity is quite straightforward.

Definition Let U C R” be an open subset containing a neighborhood of a e R", and


let f: U -> Rm. We say f is continuous at a if

limf(x) = f(a).
76 > Chapter 2. Functions, Limits, and Continuity

That is, f is continuous at a if, given any £ > 0, there is 6 > 0 so that

||f(x) — f(a)|| < e whenever ||x —a||<3.

We say f is continuous if it is continuous at every point of its domain.

As an immediate consequence of Theorem 3.2 and this definition we have

Corollary 3.3 Suppose f and g map a neighborhood of a G Rn to Rm and k maps


the same neighborhood to R. If each function is continuous at a, then so are f + g, f g,
and kf.

It is perhaps a bit more interesting to relate the definition of continuity to our notions
of open and closed sets from the previous section. Let’s first introduce a bit of standard
notation: If f: X -> Y is a function and Z c Y, we write /-1(Z) = {x G X : f(x) g Z],
as illustrated in Figure 3.2. This is called the preimage of Z under the mapping /; be careful
to remember that f may not be one-to-one and hence may well have no inverse function.

Figure 3.2

Proposition 3.4 Let U C Rn be an open set. The function f:U-+Rmis continuous


if and only iffor every open subset V C Rm, f-1(V) is an open set (i.e., the preimage of
every open set is open).

Proof <=: Suppose a g U and we wish to prove f is continuous at a. Given


e > 0, we must find 8 > 0 (chosen small enough so that B(a, <5) c U) so that whenever
||x — a|| < 8, we have ||f(x) — f(a)|| < e. Take V = B(f(a), e). Since f-1(V) is open
and a G f-1(V), there is 3 > 0 so that B(a, 3) c f-1(V). We then know that whenever
||x — a|| < 3, we have f(x) g V = B(f (a), s), and so we’re done. (See Figure 3.3.)
==>: Suppose now that f is continuous and V c Rm is open. Let a G f-1(V) be
arbitrary. Since f (a) g V and V is open, there is e > 0 so that (a), s) C V. Since f is
continuous at a, there is 3 > 0 so that whenever ||x — a || < 3, we have ||f (x) — f (a) || < £.
So, whenever x G B(a, 3), we have f (x) g B(f(a), e) c V. This means that B(a, 6) c
f-1 (V), and so f-1 (V) is open. ■
3 Limits and Continuity < 77

Figure 33

Proposition 3.5 Suppose U C R" and W C Rp are open, f: U -> Rm, g: W R",
and the composition offunctions fog is defined (i.e., g(x) e U for allxeW). Then iff and
g are continuous, so is f°g.

Proof Let V c Rm be open. We need to see that (f°g)-1(V) is an open subset of


Rp. By the definition of composition, (f°g)(x) = f(g(x)) € V if and only if g(x) € f_1(V)
if and only if x e g-1(f-1(V)). By the continuity of f, we know that f-1(V) c R" is
open, and then by the continuity of g, we deduce that g-1 (f-1 (V)) c Rp is open. That is,
(f°g)-1(V) C Rp is open, as required. ■

► EXAMPLES

Consider the function


0
/(0) = o,

whose graph is shown in Figure 3.4. We ask whether f is continuous. Since the denominator vanishes
only at the origin, it follows from Corollary 3.3 that f is continuous away from the origin. Now,

since f = 0 for all h and k, we are encouraged. What’s more, the restriction of f to

any line y = mx through the origin is continuous since

x \ mx3 mx _ „
(mx I
= x44 +. "2 i = ■■■■,-
m2 + x2
2 for all x.

On the other hand, if we consider the restriction of f to the parabola y = x2, we find that

= x*0
0, x = O’

which is definitely not a continuous function. Thus, f cannot be continuous. (If it were, according

to Proposition 3.5, letting g(x) = , /og would have to be continuous.)


.2
78 ► Chapter 2. Functions, Limits, and Continuity

Figure 3.4

Next we come to the relation between continuity and convergent sequences.

Proposition 3.6 Suppose U C Rrt is open and f: U —> Rm. Then f is continuous at
a if and only iffor every sequence {x*} ofpoints in U converging to a the sequence {f (x^)}
converges to f (a).

Proof Suppose f is continuous at a. Given e > 0, there is 3 > 0 so that whenever


||x - a|| < 3, we have ||f (x) - f (a) || < e. Suppose x* -» a. There is K e N so that when­
ever k > K, we have ||x* — a|| < 3, and hence ||f (x*) — f (a) || < e. Thus, f(xjt) -> f(a),
as required.
The converse is a bit trickier. We proceed by proving the contrapositive. Suppose f
is not continuous at a. This means that for some £q > 0, it is the case that for every 3 > 0
there is some x with ||x - a|| < 3 and ||f (x) - f (a)|| > e q . So , for each k e N, there is a
point xjt so that ||x* — a|| < 1/fcand ||f (xjt) - f(a)|| > s q . But this means that the sequence
{Xfc} converges to a and yet clearly the sequence {f (x^)) cannot converge to f (a). ■

Corollary 3.7 Suppose f: R" —> Rm is continuous. Then for any c G Rm, the level
set f"1 ({c}) = {x G R” : f (x) = c} is a closed set.

Proof Suppose {x* } is a convergent sequence of points in f “1 ({c}), and let a be its
limit. By Proposition 3.6, f (x&) -> f (a). Since f (xjt) = c for all k, it follows that f (a) = c
as well, and so a e f-1 ({c}), as we needed to show. ■

► EXAMPLE 6

By Example 2, the function f: Rrt -> R, /(x) = ||x||2, is continuous. The level sets of f are spheres
centered at the origin. It follows that these spheres are closed sets. ◄ I

> EXERCISES 2.3


1. Prove that if lim f (x) exists, it must be unique. (Hint: If t and m are two putative limits, choose
x->a
e = ||/-m||/2.)
#2. Prove that f: R" -> R, /(x) = ||x|| is continuous. (Hint: Use Exercise 1.2.17.)
3 Limits and Continuity •< 79

"3. (Squeeze Principle) Suppose f, g, and h are real-valued functions on a neighborhood of a


(perhaps not including the point a itself). Suppose /(x) < g(x) < A(x) for all x and lim /(x) = f =
x->a
lim /z(x). Prove that lim g(x) = £. (Hint: Given e > 0, show that there is 8 > 0 so that whenever
x->a x->a
0 < ||x — a|| < 8, we have —e < /(x) - € < g(x) -1 < h(x) - £ < e .)
4. Suppose lim f (x) = I and lim £(x) = c. Prove that lim k(x)l(x) = cl.
x—>a x->a x->a
B5. Suppose U c R" is open and f: U -> R is continuous. If a e U and /(a) > 0, prove that there
is <5 > 0 so that /(x) > 0 for all x e B(a, 6). (That is, a continuous function that is positive at a point
must be positive on a neighborhood of that point.) Can you state a somewhat stronger result?
6. Let U c R" be open. Suppose g: U -> R is continuous and g(a) 0. Prove that \/g is contin­
uous on some neighborhood of a. (Hint: Apply Proposition 3.5.)
B7. Suppose T: R" -> Rm is a linear map.
(a) Prove that T is continuous. (See Example 1.)
(b) Deduce the result of part a an alternative way by showing that for any m x n matrix A, we have
IIAx|| < (22^?.) ‘ ||x||.
ij
*8. Using Theorem 3.2 whenever possible (and standard facts from one-variable calculus), decide in
each case whether lim /(x) exists. Provide appropriate justification.
x->0
xy x\ x2 + y2 , „ „/o\
(a) (f) 1 =---------- , x 0, f I 1=0
y) x \y)

9. Suppose f: R" -> Rn is continuous and xo is arbitrary. Define a sequence by x* = f(xn),


£=1,2,.... Prove that if x* -> a, then f (a) = a. We say a is a fixed point of f.
10. Use Exercise 9 to find the limit of each of the following sequences of points in R, presuming it
exists. i
*(a) Xo = 1, Xjt = A/2xjt_i (c) Xo = 1, Xfc = 1 H--------
Xk-l
X&—1
(b) x0 = 5,xt = -4-i- 2
+----- *(d) xo = l,x, = l + —1------
2 X/t-1 1 + Xk-!
11. Give an example of a discontinuous function f: R -> R having the property that for every c € R
the level set /-1({c}) is closed.
12. If f: X -> Y is a function and U c X, recall that the image of U is the set f(U) = {yeY:
y = / (x) for some x € U}. Prove or give a counterexample: If f is continuous, then the image of
every open set is open. (Cf. Proposition 3.4.)
13. Prove that if f is continuous, then the preimage of every closed set is closed.
80 ► Chapter 2. Functions, Limits, and Continuity

14. Identify A4mxn, the set of m x n matrices, with Rmn in the obvious way.
(a) Prove that when n — 2 or 3, the set of n x n matrices with nonzero determinant is an open subset
of Mnxn.
(b) Prove that the set of n x n matrices A satisfying ATA = In is a closed subset of A4nxn-
15. (a) Let

|y| > x2 or y = 0
otherwise

Show that f is continuous at 0 on every line through the origin but is not continuous at 0.
(b) Give a function that is continuous at 0 along every line and every parabola y — kx2 through the
origin but is not continuous at 0.
16. Give a function f: R2 -> R that is
(a) continuous at 0 along every line through the origin but unbounded in every neighborhood of 0;
(b) continuous at 0 along every line through the origin, unbounded in every neighborhood of 0, and
discontinuous only at the origin.
17. Generalizing Example 5, determine for what positive values of a, ft, y, and 8 the analogous
function
f (x\ _ |x|g|y|^
f\y/ IxP' + bf’ x £ 0, /(0) = 0

is continuous at 0.
18. (a) Suppose A is an invertible n x n matrix. Show that the solution of Ax = b varies continu­
ously with b e R".
A
(b) Show that the solution of Ax = b varies continuously as a function of , as A varies over
b
all invertible matrices and b over R". (You should be able to get the cases n = 1 and n = 2. What
do you need for n > 2?)
THE DERIVATIVE
In this chapter we start in earnest on calculus. The immediate goal is to define the tangent
plane at a point to the graph of a function, which should be the suitable generalization
of the tangent lines in single-variable calculus. The fundamental computational tool is
the partial derivative, a direct application of single-variable calculus tools. But the actual
definition of a differentiable function immediately involves linear algebra. We establish
various differentiation rules and then introduce the gradient, which, as common parlance
has come to suggest, tells us in which direction a scalar function increases the fastest; thus,
it is highly important for physical and mathematical applications. We conclude the chapter
with a discussion of Kepler’s laws, the geometry of curves, and higher-order derivatives.

► 1 PARTIAL DERIVATIVES AND DIRECTIONAL DERIVATIVES


Whenever possible it is desirable to reduce problems in multivariable calculus to those in
single-variable calculus. It is reasonable to think that we should understand a function by
knowing how it varies with each variable, fixing all the others. (A physical analogy is this:
To find the change in energy of a gas as we change its volume and temperature, we imagine
that we can first fix the volume and change the temperature and then, fixing the temperature,
change the volume.)
We begin by considering a real-valued function f of two variables, x and y.

Af
Definition We define the partial derivatives — and — as follows:

9f /a\ .. J\ b ) \bj
------------- h--------------

Very simply, if we fix b, then is the derivative at a (or slope) of the function

F(x) = f as indicated in Figure 1.1. There is an analogous interpretation of


3y \bj

81
82 ► Chapter 3. The Derivative

Figure 1.1

More generally, if U C R” is open and a € U, we define the Jth partial derivative of


f: U -> Rm at a to be

at f(a + rej)-f(a) . .
—(a) = tan------------J------ , j = l.....n
dXj t—>0 t

(provided this limit exists). Many authors use the alternative notation Dj /(a) to represent
the jth partial derivative of f at a.

► EXAMPLE 1

= x3y5 + sin(2x + 3y). Then

df = 3x2y5 + exy(y sin(2x + 3y) + 2cos(2x + 3y)) and


dx

= 5x3y4 + (x sin(2x + 3y) + 3 cos(2x + 3y)).

The partial derivatives of f measure the rate of change of f in the directions of the
coordinate axes, i.e., in the directions of the standard basis vectors ei,..., en. Given any
nonzero vector v, it is natural to consider the rate of change of f in the direction of v.
1 Partial Derivatives and Directional Derivatives -4 83

Definition Let U C R” be open and a e U. We define the directional derivative of


f: U —> R™ at a in the direction v to be

n a r f(» + ^v)-f(a)
Z)vf(a) = lim------------------------ ,
>0 t

provided this limit exists.

Note that the 7th partial derivative of f at a is just De.f (a). When n = 2 and m = 1,
as we see from Figure 1.2, if || v || — 1, the directional derivative Dv/(a) is just the slope at
a of the graph we obtain by restricting to the line through a with direction v.

Figure 1.2

Remark Our terminology might be a bit misleading. Note that since

D fWI - lim + ,(CT)) “ fW lim + <C')¥) “ fW


Devi (a) = lim-------------------------------- = lim----------------------------------
r->0 t f->0 t
f (a + (ct)v) - f(a) f(a + sv) - f(a)
— c lim —------------- ------------— c lim-------------------------
r->0 ct s
= cDvf(a),

the directional derivative depends not only on the direction of v, but also on its magnitude.
It is for this reason that many calculus books require that one specify a unit vector v. It
makes more sense to think of Dvf (a) as die rate of change of f as experienced by an observer
moving with instantaneous velocity v. We shall return to this interpretation in Section 3.
84 ► Chapter 3. The Derivative

► EXAMPLE 2

Let f: R2 —> R be defined by

x/= 0, /(O) = 0,

whose graph is shown in Figure 1.3. Then the directional derivative of f at 0 in the direction of the

unit vector v =

kviKrvz)
d ,/(o > = ita = _J£l_ =
t—>0 f f—>0 if

Note that both partial derivatives of f at 0 are 0, and yet the remaining directional derivatives are
nonzero.

► EXAMPLE 3

Let /: Rn -> R be defined by f (x) = ||x||. Let a 0 be arbitrary. Let v = a/||a|| be a unit vector
pointing radially outward at a. Then
n . r /(a + tv)-/(a) lla + ^ll - Hall (||a|| +t) - ||a||
Dv f(a) = hm ---------------- ------ = hm---------- —----------- = hm -- --------- ---------- — 1.
f->0 t f->0 t /->o t
On the other hand, if v • a = 0, then
r. . r l|a + rv|| - ||a||
Dv/(a) = hm-------- ------------ = 0,
r->0 t
inasmuch as t = 0 is a global minimum of the function g(t) = ||a + tv||. (Why?) ◄

► EXAMPLE 4
1' 2~
Let f y = x2y + e3x+y~z-, let a - -1 and v = 3 What is the directional derivative
2_ _ -1 _
1 Partial Derivatives and Directional Derivatives 85

We define <p: R -> R by ^(0 = /(a + rv). Note that

?'(0) = Um = lim + = Dv/(a).


r-»0 t t->0 t
So we just calculate and compute its derivative at 0:

1 4- 2r \

(
-1+ 3t I = (1 + 202(~ 1 + 3r) + e(3(l+2r)+(-14-3O-(2-t))

2-f /

= (1 +202(-1 + 3i) + e10r, so


= 4(1 + 20(-l + 30 + 3(1 + 202 + 10?°',

from which we conclude that Dv/(a) = <p'(Q) = 9.


This approach is usually more convenient than calculating the limit directly when we are given
commonplace functions.

► EXERCISES 3.1

1. Calculate the partial derivatives of the following functions.

*(a) f I ) = x3 + 3xy2 - 2y + 7 (d) f e-(x2+y2)

(e) fl ] = (* + /) log*
>2

(f) f y = exyz2 — xy sin(Tryz)


*(c) f

2. Calculate the directional derivative of the given function f at the given point a in the direction of
the given vector v.

*(a)

*(b)

(c)

(d) f

3. For each of the following functions f and points a, find the unit vector v with the property that
Dyf(a) is as large as possible.
2 1
’(a) f I ) = x2 + xy, a = zx J 1 1 1
1 (C) f y a= -1
I I x y z
\z/ 1
0
(b) f ( \=yex,& =
1
86 ► Chapter 3. The Derivative

4. Suppose Dyf(a) exists. Prove that D_v/(a) exists and calculate it in terms of the former.
5. (a) Show that there can be no function f: R" -> R so that for some point a € R" we have
Dv/(a) > 0 for all nonzero vectors v € R".
(b) Show that there can, however, be a function /: R" -> R so that for some vector v € R" we have
Dv/(a) > 0 for all points a e R".
6. Consider the ideal gas law pV = nRT. (Here p is pressure, V is volume, n is the number of
moles of gas present, R is the universal gas constant, and T is temperature.) Assume n is fixed. Solve
for each of p, V, and T as functions of the others, viz.,

Compute the partial derivatives of f, g, and h. What is


3/ dg dh . n dp dV 3T?
av ’ af ' 8p ’ or’more colloquially’ av ’ df ' dp ‘

Suppose f: R -> R is differentiable, and let g I )=/(-). Show that


\yj \y/
dg , dg n
x— + y— — 0.
dx z dy

8. Suppose f: R -> R is differentiable, and let g = f(y/x2 + y2) for x 0. Show that

9. Let f: R2 -> R be defined by

x/=0, /(0) = 0.

Show that the partial derivatives of f exist at 0 and yet f is not continuous at 0. Do other directional
derivatives of f exist atO?
10. (a) Let /: R2 —> R be the function defined in Example 5 of Chapter 2, Section 3. Calculate
Dv/(0) for any v € R2.
(b) Give an example of a function f: R2 -> R all of whose directional derivatives at 0 are 0 but is,
nevertheless, discontinuous at 0.
*11. Suppose T: R” —> is a linear map. Show that the directional derivative DVT (a) exists for
all a e R" and all v e R" and calculate it.
12. Identify the set Afnxn of n x n matrices with R"2.
(a) Define f: Mn%n -> A4nxn by f(A) = AT. For any A, B e A4„xn, prove that DBf(A) = BT.
(b) Define f: A4„xn -> A4nxn by f(A) = trA. For any A, B e Mnxn, prove that DBf(A) = trB.
(For the definition of trace, see Exercise 1.4.22.)
13. Identify the set A4nXn of n x n matrices with R"2.
(a) Define f: Mnxn -> Mnxn by f (A) = A2. For any A, B e Mnxn, prove that DBf(A) = AB +
BA.
(b) Define f: Mnxn -> Mnxn by f(A) = ATA. Calculate DBf(A).
2 Differentiability •< 87

► 2 DIFFERENTIABILITY
For a function f: R -> R, one of the fundamental consequences of being differentiable (at
d) is that the function must be continuous (at d). We have already seen that for a function
f: R" -> R, having partial derivatives (or, indeed, all directional derivatives) at a need not
guarantee continuity at a. We now seek the appropriate definition.
Recall that the derivative is defined to be

A->0 h

alternatively, if it exists, it is the unique number m with the property that

.. f(a + h) - f(d) — mh
lim---------------- ------------------ = 0.
h-+0 h

a
That is, the tangent line—the line passing through with slope m — f'(d)—is the
f(a)
best (affine) linear approximation to the graph of f at a, in the sense that the error goes to
0 faster than h as h -> 0. (See Figure 2.1.) Generalizing the latter notion, we make this

Definition Let U c R" be open, and let a e U. A function f: U -> Rm is differen­


tiable at a if there is a linear map Df (a): R” -> Rm so that

r f(a + h)-f(a)-Df(a)h n
iTo---------------- iihii---------------- = °

This says that Df(a) is the best linear approximation to the function f — f(a) at a,
in the sense that the difference f(a + h) — f (a) — Df(a)h is small compared to h. See
Figure 2.2 and compare Figure 2.1. Equivalently, writing x = a 4- h, the function g(x) =
f(a) + Df(a)(x - a) is the best affine linear approximation to f near a. Indeed, the graph of
g is called the tangent plane of the graph at a. The tangent plane is obtained by translating
a
the graph of Df (a), a subspace of R" x Rm, so that it passes through
88 ► Chapter 3. The Derivative

Figure 2.2

Remark The derivative Df (a), if it exists, must be unique. If there were two linear
maps T, T': R" -> Rm satisfying
r f(a+ h)-f(a) — T(h) , f(a + h)-f(a)-r(h) n
lim-------------- -------------- = 0 and hm---------------- —- --------------- = 0,
h-»o ||h|| h->o ||h||
then we would have

In particular, letting h = ret- for any i = 1,..., n, we see that

lim = (T - D(et) = 0 fort = l,...,n,


<->o+ t
and so T — T'.

It is worth observing that a vector-valued function f is differentiable at a if and only if


each of its coordinate functions fa is differentiable at a. (See Exercise 6 for a proof.)

Proposition 2.1 If f: R" -> Rm is differentiable at a, then the partial derivatives


9fa
-— (a) exist and
3Xj
[Df(a)] = [^-(a) .

The latter matrix is often called the Jacobian matrix of f at a.

Proof Since we assume f is differentiable at a, we know there is a linear map Df (a)


with the property that
f (a + h) - f (a) - Df (a)h
lim-------------------------------------- = 0.
h-o Uhll
2 Differentiability < 89

As we did in the remark above, for any j = 1,..., n, we consider h = ttj and let t -> 0.
Then we have
0 = lim f(a + ?ej)-f(a)-gf(aW = 0
f->0 |r|

Considering separately the cases t > 0 and t < 0, we find that

f(a + re7)-f(a) - Z)f(a)(tep f(a + tey)-f(a)


0 = lim ------------ ------------------------------ — = lim ----------------------------- Df(a)(e, ),
r->0+ t r-*0+ t
f(a + rey) - f(a) - Df(a)(re;) / f(a + re7)-f(a) \
0 = lim ------------------------------------------ — = — ( lim -----------------------------Df(a)(e.)J,
t—>0 \ f—>0 t
a nr/w x r *(» + *«/)”*(») 3f . _
and so Df (a)(e7) = lim------------------------- = -—(a), as required. ■

► EXAMPLE 1

When n = 1, we have parametric equations of a curve in Km. We see that if f is differentiable at a,


then
‘ A'(a) ‘

_ /m(a) _
and we can think of Df(a) = Df (a)(1) as the velocity vector of the parametrized curve at the point
f (a), which we will usually denote by the (more) familiar f'(a). See Section 5 for further discussion
of this topic.

► EXAMPLE 2
a
Let f | I = xy. To prove that f is differentiable at a = , we must exhibit a linear map D/(a)
V/ ubm
with the requisite property. By Proposition 2.1, we know the only candidate is

so now we just prove that the appropriate limit is really 0:

(ab + bh + ak + hk) -ab- (bh + ak)


Jh2 + k2

hk
lim —=====

because < 1 and k -> 0 as


Jh2+k2 "
90 ► Chapter 3. The Derivative

► EXAMPLES

The tangent plane of the graph z = = xy at a = is

= 2+[l = 2 + (x — 2) 4-2(y — 1) = x+2y - 2. ◄I

► EXAMPLE 4

I x\ x a
Let f I I = First, we claim that f is differentiable at a = , provided b / 0. The putative
V/ y b
derivative is

-- L
b2 J ’

and we check that


v f(a+ h) - f(a) - Df(a)h
lim 0.
h->0 Ilhll
WeU,

la + h a\ _ r 1 h _ a+h a h ak
b] L b b^b2
k b+k b
(a + h)(—bk) + ak(b + k) k(ak — bh)
b2(b + k) = b2(b + k) '

and so

a+h a\ r1 h k(ak — bh)


b) L b
b+k b2 k Um ™=0
lim
h hl „ Jh2+k2
k

|fc| „ ak — bh h
since . -= illnd
- W^
y/h2 + k2 ~ k
Now, as a (not totally facetious) application, consider the problem of calculating one’s gas
mileage, having used y gallons of gas to travel x miles. For example, without having a calculator on
hand, we can use linear approximation afforded us by the derivative to estimate our gas mileage if
we’ve used 10.8 gallons to drive 344 miles. Using a = 350 and b = 10, we have

344 344 — a
10.8 10.8 - b

-6
0.1 -3.5 = 35-0.6-2.8 = 31.6.
0.8

(The actual value, to two decimal places, is 31.85.) <4


2 Differentiability ■< 91

► EXAMPLE 5

As we said earlier, a function f: R2 -» R2 is differentiable if and only if both its component functions
f■: R2 -> R are differentiable. It follows from Examples 2 and 4 that the function f: R2 —> R2 given
by

is differentiable, and the Jacobian matrix of f at a is

One indication that we have the correct definition is the following.

Proposition 2.2 Iff: Rn -> Rm is differentiable at a, then f is continuous at a.

Proof Suppose f is differentiable at a; we must show that lim f (x) = f (a) or, equiv­
alently, that lim f (a 4- h) = f (a). We have a linear map Df (a): R” -» Rm so that
h->0

f(a + h)-f(a)-Df(a)h
lim = 0.
h->0 IIM
This means that
/f(a + h)-f(a)-Df(a)h 11. \
hm f(a + h) - f(a) - Df(a)h = hm ----------------------------- — ||h||
h~>0 h->0 \ ||n|| /
f(a 4-h)-f(a) - Df(a)h
= lim lim ||h|| =0.
h^O Hh|| h->0

By Exercise 2.3.7, lim Df(a)h = 0, and so


h—>0
lim f (a + h) - f(a) = lim (f (a + h) — f (a) - Df (a)h) 4- lim Df (a)h = 0,
h->0 h->0 h-»0
as required. ■

Let’s now study a few examples to see just how subtle the issue of differentiability is.

> EXAMPLE 6

Define f: R2 -> R by

-5^-7. X^O, /(0)=0.


x2 4- y2

Since f I ) = 0 for all x and f [ I = 0 for all y, certainly

0.
92 ► Chapter 3. The Derivative

However, we have already seen in Exercise 3.1.9 that f is discontinuous, so it cannot be differentiable.
For practice, we check directly: If D/(0) existed, by Proposition 2.1 we would have D/(0) = 0.
Now let’s consider
/(h) - /(0) - Df(Q)h v /(h) v hk
lim---------------------------------- — lim------ = lim —------- •
h^O ||h|| h >0 llhll rhI (fc2 + fc2)3/2
H
Like many of the limits we considered in Chapter 2, this one obviously does not exist; indeed, as
h —> 0 along die line h = k, this fraction becomes

h2 _ 1
(2h2)3/2 “ 2s/2|h| ’

which is clearly unbounded as h -+ 0. What’s more, as the reader can check, / has directional
derivatives at 0 only in the directions of the axes.

► EXAMPLE?

Define /: R2 -> R by
0
x y
x^O, /(0)=0.
x2 + y2 ’

As in Example 6, both partial derivatives of this function at 0 are 0. This function, as we saw in
Example 5 of Chapter 2, Section 3, is continuous, so differentiability is a bit more unclear. But we
just try to calculate:

r /(h)-/(0)-D/(0)h h2k
nm------------------------------ = lim - lim
h-*o ||h|| h^o ||h|| rAi (h2+fc2)3/2'

When h -> 0 along either coordinate axis, the limit is obviously 0; however, when h -> 0 along the
line h = k, the limit does not exist, as the expression is equal to when h > 0 and —when
h < 0. Thus, / is not differentiable at 0.

Proposition 2.3 When f is differentiable at a, for any v e R”, the directional deriva­
tive off at a in the direction v is given by

Dvf(a) = Df (a)v.

Proof Since f is differentiable at a, we know that its derivative, Df(a), has the
property that

r f(a + h) — f(a) — Df(a)h _


lim---------------- —-----------------= 0.
h—>0 l|h||

Substituting h = tv and letting t -> 0, we have

f (a + tv) - f(a) - Df (a) (tv)


lim-------------------- —-------------------- = 0.
r->0 in
2 Differentiability 93

Since Df (a) is a linear map, Df(a)(fv) = tDf (a)v. Proceeding as in the proof of Propo­
sition 2.1, letting t approach 0 through positive values, we have

f(a + /v)-f(a)-rDf(a)v
lim ---------------------------------------- = 0, and so
f->0+ t
f(a + rv)-f(a) A
lim----------------------- _ pf(a)v.
t—>o+ t
Similarly, when t approaches 0 through negative values, we have |r | = — t and
f(a + rv)-f(a)-t£)f(a)v
hm---------------------------------------- =0, so
t->o- -t
f(a + rv) - f(a) - tDf(a)v
lim ---------------------------------------- = 0, and, as before,
t->o- t
f(a-Hv) — f(a)
lim----------------------- — £>f(a)v.
t
Thus,

r->0 t
as required. ■

Remark Let’s consider the case of a function f: R2 -> R, as we pictured in Figures


1.1 and 1.2. As a consequence of Proposition 2.3, the tangent plane of the graph of f at a
contains the tangent lines at a of the slices by all vertical planes. The function f given in
Example 2 of Section 1 cannot be differentiable at 0, as it is clear from Figure 1.3 that the
tangent lines to the various vertical slices at the origin do not lie in a plane.

Since it is so tedious to determine from the definition whether a function is differen­


tiable, the following proposition is useful indeed.

Proposition 2.4 If f: U -> Rw has continuous partial derivatives, then f is differ­


entiable.

A function with continuous partial derivatives is said to be 61 or continuously differ­


entiable. The reason for this notation will become clear when we study partial derivatives
of higher order.

Proof By Exercise 6, it suffices to treat the case m — 1. For clarity, we give the proof
in the case n = 2, although the general case is not conceptually any harder. As usual, we
a h
write a = andh =
b k
As usual, if f is to be differentiable, we know that Df (a) must be given by the Jacobian
matrix of f at a. To prove that f is differentiable at a € U, we need to estimate

/(a + h) - /(a) - D/(a)h = /(a + h) - /(a) - (^(»)h + 2£(a)*J .


94 ► Chapter 3. The Derivative

Now, here is the new twist: As Figure 2.3 indicates, we calculate /(a + h) — /(a) by
taking a two-step route:

rz
/(a ux rz x
+, h)-/(a) = / -/
\b +k/ \b/

and so, regrouping in a clever fashion and using the Mean Value Theorem twice, we obtain

/(a + h)-/(a)-D/(a)h

df \ 3/ \
k - ~(^)k
3yv 7
dx /

for some | between 0 and h and some r] between 0 and k

V V ( A,h +fVfa + h\ v \
9x \ b ) dx^’J XdyXb + rJ dy J

\h\ \k\
Now, observe that — < 1 and < 1; as h -> 0, continuity of the partial derivatives

guarantees that

(df fa+h df
- /-(a)\) = 0
lim = lim I — I ,
h->0 h^O \ dy \b + r} 3y /

since $ 0 and t? -> 0 as h -> 0. Thus,


|/(a + h)-/(a)-D/(a)h|
llhll
df (a + h\ df K
“ dx \ b ) dxW ||h|| dy yb'+Tj/ dy l|h|| ’

and therefore indeed approaches 0 as h 0. ■


2 Differentiability •« 95

► EXAMPLES

We know that the function f given in Example 7 is not differentiable. It follows from Proposition
2.4 that f cannot be C1 at 0. Let’s verify this directly.
It is obvious that ~ (0) = — (0) = 0, and for x 0, we have
ox dy
df Z*\ _ j 3/ Zx\ x2(x2 — y2)
dx \y/ (x2 + y2)2 311 \y/ (*2 + y2)2

So we see that when x /= 0, i and = 1, neither of which approaches 0 as x -*■ 0.

Thus, f is not C1 at 0. *<

► EXAMPLE 9

To see that the sufficient condition for differentiability given by Proposition 2.4 is not necessary, we
consider the classic example of the function f: R -> R defined by

x2 sin -, x 0
f(x) =
0, x=0

Then it is easy to check that /'(0) = 0, and yet f'(x) = 2x sin----- cos - has no limit as x -> 0.
x x
Thus, f is differentiable on all of R but is not C1.

► EXERCISES 3.2

1. Find the equation of the tangent plane of the graph of f at the indicated point.

1 "
*(f) f y I = sin(xy)z2 4- ejcz+1, a = 0
_ -1 _

2. Calculate the directional derivative of f at a in the given direction v.


0 1
*(a) f I J = ex cosy, a =
?r/4 1

0 1
(b)
jt /4 -1
96 ► Chapter 3. The Derivative

*3. Give the derivative matrix of each of the following vector-valued functions.

xy xyz
(a) f:R2->R2, f (d) f:"R3-->R2, f y
■2 -U ,,2 _u. ,2
\z
cost xcosy

(b) f: R-> R3, f(r) = sint (e) f:R2-> R3, f = x siny


e' V/ y
scosr
(c) f: R2 -> R2, f
s sint

*4. Use the technique of Example 4 to estimate your gas mileage if you used 6.5 gallons to drive 224
miles.
5. Two sides of a triangle are x = 3 and y = 4, and the included angle is 0 = t t /3. To a small change
in which of these three variables is the area of the triangle most sensitive? Why?
6. Let U c R" be an open set, and let a € U. Suppose m > 1. Prove that the function f: 17 -> Rm
is differentiable at a if and only if each component function fiti = 1,..., m, is differentiable at a.
(Hint: Review the proof of Proposition 3.1 of Chapter 2.)
7. Show that any linear map is differentiable and is its own derivative (at an arbitrary point).

8. Show that the tangent plane of the cone z2 = x2 + y2 at 0 intersects the cone in a line.

9. Show that the tangent plane of the saddle surface z = xy at any point intersects the surface in a
pair of lines.
/x •2 _ a,2
10. Find the derivative of the map f ( at the point a. Show that whenever a 0,
2xy
the linear map Dt (a) is a scalar multiple of a rotation matrix.
11. Prove from the definition that the following functions are differentiable.
(a) f(X\ — x2 + y2 (b) f(X\ =xy2 (c) /: R" R,/(x) = ||x||2

12. Let

x0O, /(0) = 0.

Show directly that f fails to be C1 at the origin. (Of course, this follows from Example 5 of Section
3 of Chapter 2 and Propositions 2.2 and 2.4.)
3 Differentiation Rules 97

13. Use the results of Exercise 3.1.13 to show that f (A) = A2 and f (A) — ATA are differentiable
functions mapping Mnxn to Mnxn.
s14. Let A be an n x n matrix. Define f: R" -> R by /(x) = Ax•x = xTAx.
(a) Show that f is differentiable and D/(a)h = Aa • h + Ah • a.
(b) Deduce that when A is symmetric, D/(a)h = 2Aa ■ h.
15. Let a G Rn, 8 > 0, and suppose f: B(a, 8) -> R is differentiable at a. Suppose /(a) > /(x)
for all x e B(a, 6). Prove that Df(a) = 0.
16. Let a G R2, 8 > 0, and suppose f: B(a, 8) -> R is differentiable and D/(x) = 0 for all x g
5(a, <5). Prove that /(x) = /(a) for all x e B(a, 5). (Hint: Start with the proof of Proposition 2.4.)
17. Let

y, x o
0, x=Q

(a) Prove that f is continuous at 0.


(b) Determine whether f is differentiable at 0. Give a careful proof.
18. Let

= x^0) A°) = °-
\y J x4 + y8

(a) Find all the directional derivatives of f at 0.


(b) Is f continuous at 0?
(c) Is f differentiable at 0?
19. (a) Let f: R2 -> R be the function defined in Example 5 of Chapter 2, Section 3. Show that f
has directional derivatives at 0 in every direction but is not differentiable at 0.
(b) Find a function all of whose directional derivatives at 0 are 0 but which, nevertheless, is not
differentiable at 0.
(c) Find a function all of whose directional derivatives at 0 are 0 but which is unbounded in any
neighborhood of 0.
(d) Find a function all of whose directional derivatives at 0 are 0, all of whose directional derivatives
exist at every point, and which is unbounded in any neighborhood of 0.

► 3 DIFFERENTIATION RULES
In practice, most of the time Proposition 2.4 is sufficient for us to calculate explicit deriva­
tives. However, it is reassuring to know that the sum, product, and quotient rules from
elementary calculus pertain to the multivariable case. We shall come to the chain rule
shortly.
For the next proofs, we need the notion of the norm of a linear map T: Rn -> Rm. We
set

IITII = max ||T(x)||.


JJxJJ=l

(In Section 1 of Chapter 5 we will prove the maximum value theorem, which states that a
continuous function on a closed and bounded subset of R" achieves its maximum value.
98 ► Chapter 3. The Derivative

Since the unit sphere in R” is closed and bounded, this maximum exists.) When x 0, we

have and so, by linearity, the following formula follows immediately:

||T(x)|| < ||T||||x||.

Proposition 3.1 Suppose U c R" is open and f: U -> Rm, g: U -> Rm, and
k: U —> R. Suppose a G U and f, g, and k are differentiable at a. Then

1. f 4- g: U —> Rw is differentiable at a and D(f + g)(a) = £>f(a) 4- Dg(a).


2. kf: U -> Rm is differentiable at a and D(kf)(a)v = (D^(a)v)f(a) 4- k(a)Df (a)v
for any v G R".
3. f • g: U -> R is differentiable at a and D(t • g)(a)v = (Df(a)v) • g(a) 4-
f(a) • (Dg(a)v) for any v G Rn.

Proof These are much like the proofs of the corresponding results in single-variable
calculus. Here, however, we insert the candidate for the derivative in the definition and
check that the limit is indeed 0:
. .. (f 4-g)(a4-h) - (f 4-g)(a) - (Df (a) 4-Dg(a))h
1. hm---------------------------------------------------------------------—
h-*0 l!h||
(f(a 4- h) — f(a) — Df(a)h) 4- (g(a + h) - g(a) - Dg(a)h)
= lim
h->0 iihii
f(a4-h)-f(a)-Df(a)h g(a 4-h) - g(a) - Dg(a)h
lim---------------- —-------------------F lim----------------- —- = 04-0 = 0.
-----------------

2. We proceed much as in the proof of the limit of the product in Theorem 3.2 of
Chapter 2.
(K)(a 4- h) - (fcf)(a) - ((D£(a)h)f (a) 4- fc(a)Df (a)h)

k(a 4- h)f(a 4- h) - fc(a)f (a) - ((Dfc(a)h)f(a) 4- £(a)Df(a)hj


lim —

((*(a 4- h) - *(a))f(a + h) 4- *(a)(f(a 4- h) - f (a))) - ((D*(a)h)f(a) 4- *(a)Df (a)h)


= lim
h—>0 iiM
(k(a 4- h) - fc(a))f(a 4- h) - (Dfc(a)h)f (a) fc(a)(f(a 4- h) - f(a)) - fc(a)Df(a)h
&----------------------------- jjhjj-----------------------------+£?.------------------------- w------------------------

(*(a + h)-«a))f(a + h)-(2X(a)h)f(a) f(a + h) - f(a) - Df (a)h


hm - ----------------------- 7 -------------------- F*(a) hm---------------- ---------------------- .
h-»0 ||h|| h->0 ||h||

Now, the second term clearly approaches 0. To handle the first term, we have to
use continuity in a rather subtle way, remembering that if f is differentiable at a,
then it is necessarily continuous at a (Proposition 2.2):
3 Differentiation Rules ◄ 99

(*(a + h) - *(a))f(a 4- h) - (Dfc(a)h)f (a)


IIM
(fc(a + h) - k(a) - (D£(a)h))f(a + h) + (ZU(a)h)(f(a 4- h) - f(a))

Hh||
k(a + h) - k(a) - D*(a)h (Dfc(a)h) (f (a + h) - f (a))
IIM ( )+ llhll
Now here the first term clearly approaches 0, but the second term is a bit touchy.
The length of the second term is

|Dfc(a)h|||f(a4-h)-f(a)||
< ||D*(a)||||f(a + h)-f(a)||,
llhll
which in turn goes to 0 as h 0 by continuity of f at a. This concludes the proof
of (2).

The proof of (3) is virtually identical to that of (2) and is left to the reader in Exer­
cise 9. ■

Theorem 3.2 (The Chain Rule) Suppose g: R" -> Rm and f: R"1 -> Rf, g is dif­
ferentiable at a, and f is differentiable at g(a). Then f °g is differentiable at a and

D(fog)(a) = Df(g(a))o£>g(a).

Proof We must show that


r f (g(a 4- h)) - f (g(a)) - Di(g(a))Dg(a)h
lim-------------------------- ——----------------------------- = v.
h->0 llhll
Letting b = g(a), we know that

g(a 4-h) - g(a) - Dg(a)h


lim---------------- --------------------- = 0 and
h-+o ||h||
f(b 4-k) - f(b) — Df(b)k
Inn---------------- —-- -----------------= 0.
k->0 ||k||

Given e :> 0, this means that there are <5i > 0 and rj > 0 so that

(*) 0 < ||h|| < Si => ||g(a 4- h) - g(a) - Dg(a)h|| < e||h|| and
(**) ||k|| < r? => ||f(b4-k) — f(b) — Df(b)k|| < e||k||.

Setting k = g(a 4- h) - g(a) and rewriting (*), we conclude that whenever 0 < ||h|| < <51,
we have

||k - Dg(a)h|| <e||h||,

and so
||k|| < ||Dg(a)h|| +£||h|| < (||Z>g(a)|| +s)||h||.

Let 82 = r}/(\\Dg(a)|| 4- e) and set 8 — min(<5i, <5i).


100 ► Chapter 3. The Derivative

Finally, we start with the numerator of the fraction whose limit we seek.

f (b + k) - f (b) - Df (b)Dg(a)h
= [f (b + k) - f(b) - Df (b)k] + [Df (b)k - Df(b)Dg(a)h]
= [f(b + k) - f(b) - Df(b)k] + Df(b)(k - Dg(a)h).

Therefore, whenever 0 < ||h|| < 6, we have

||f(b + k) -f(b) - Df(b)Dg(a)h||


< ||f(b + k) - f(b) - Df(b)k|| + ||Df(b)|| ||k - Dg(a)h||
< £||k|| + ||Df(b)||e||h|| < £||h||(||Dg(a)|| + £ + ||Df(b)||).

Thus, whenever 0 < ||h|| < <5, we have

||f(g(a + h))-f(g(a»-Df(g(a))Dg(a)h|| < e(||Dg(a)| +, + ||Df(b)||).

Since £ > 0 is arbitrary, this shows that

r ||f(g(a + h)) - f (g(a)) - Df (g(a))Dg(a)h|| _


h^o ||h||

as required. ■

Remark Those who wish to end with a perfect e at the end may replace the £ in (*)
£ £
with----------------------- and that in (**) with-------------------------.
2(||Df(b)|| 4-1) 2(||Dg(a)||+£)

> EXAMPLE 1

Suppose the temperature in space is given by f y = xyz2 4- e3xy~2z and the position of a bumblebee
W 1
is given as a function of time ? by g: R —> R3. If at time t = 0 the bumblebee is at a = 2 and
" -1 ~ 3
her velocity vector is v = 1 , as indicated in Figure 3.1, then we might ask at what rate she
2
perceives the temperature to be changing at that instant. The temperature she measures at time t is
(/°g)(t), and so she wants to calculate (/<>g)z(0) = D(/°g)(0). We have

— [ yz2 + 3ye3xy 2z xz2 + 22 2xyz — 2e3xy 22 J ,


Df y
3 Differentiation Rules 101

so D/(a) = [ 24 12 10 j. Then

(/og)'(O) = D/(g(0))g'(0) = Df(a)v = [24

Note that in order to apply the chain rule, we need to know only her position and velocity vector at
that instant, not even what her path near a might be.

Remark Suppose f: R" -> Rm is differentiable at a and we wish to evaluate Dvf (a)
for some v € R". Define g: R -> R" by g(t) = a + tv, and consider <o(t) = (f°g)(t). By
definition, we have Dvf (a) = <pz(0). Then, by the chain rule, we have

Dvf (a) = ?'(0) = (fog)'(O) = Df (a)g'(0) = Df (a)v.

This is an alternative derivation of the result of Proposition 2.3. (Cf. Example 4 of Sec­
tion 1.)
Indeed, if g is any differentiable function with g(0) = a and g'(0) = v, we see that
Dvf (a) = (f°g)'(O), so this shows, as we suggested in the remark on p. 83, that we should
think of the directional derivative as the rate of change perceived by an observer at a moving
with instantaneous velocity v.

► EXAMPLE 2

Let
WCOS V
and g
u sin v
102 ► Chapter 3. The Derivative

Since

2x —2y cos v — u sin v


2y 2x sin v u cos v

we have

D(fog)

2u cos v —2u sin v cosv —u sin v ucos2v —u2 sin 2v


=2
2u sin v 2u cos v sin v U COS V u sin2v u2cos 2v

u 2c o s 2v
On the other hand, as the reader can verify, (fog) and so we can double-check
u2 sii\2u
the calculation of the derivative directly. *4

► EXERCISES 3.3
*1. Suppose f: R3 -> R is differentiable and

cos t -1- sin t f


g(0 = t 4-1 and Df 1
(-1/
t2 + 4t - 1

Find (/og)'(0).
*2. Suppose

2y - sinx
gjc+3y

xy + y3

Calculate D(fog)(0) and D(g°f)(0).

cos/
3. Suppose g(/) = sin/ and/ y + 2x. Use the chain rule to calculate
2sin(//2)
(/°g)'(/). What do you conclude?
3 cost
4. An ant moves along a helical path with trajectory g(r) — 3 sin/

5t
(a) At what rate is his distance from the origin changing at r = 2t t ?

(b) The temperature in space is given by the function f: R3 -> R, f I y I = xy + z2. At what rate
\4/
does the ant detect the temperature to be changing at t = 3t t /4?
*5. An airplane is flying near a radar tower. At the instant it is exactly 3 miles due west of the tower,
it is 4 miles high and flying with a ground speed of 450 mph and climbing at a rate of 5 mph. If at
that instant it is flying
3 Differentiation Rules 103

(a) due east, (b) northeast,


at what rate is it approaching the radar tower at that instant?
*6. An ideal gas obeys the law pV = nRT, where p is pressure, V is volume, n is the number of
moles, R is the universal gas constant, and T is temperature. Suppose for a certain quantity of ideal
gas, nR = 1 l-atm/°K. At a given instant, the volume is 101 and is increasing at the rate of 11/min; the
temperature is 300° K and is increasing at the rate of 5°K/min. At what rate is the pressure increasing
at that instant?
7. Ohm’s law tells us that V = IR, where V is the voltage in an electric circuit, I is the current
flow (in amps), and R is the resistance (in ohms). Suppose that as time passes, the voltage decreases
as the battery wears out and the resistance increases as the resistor heats up. Assuming V and R
vary as differentiable functions of t, at what rate is the current flow changing at the instant to if
R(to) = 100 ohm, R'(to) = 0.5 ohm/sec, Z(t0) =0.1 amp, and Vz(r0) = —0.1 volt/sec?
8. Let U c Rn be open. Suppose g: U -> R is differentiable at a e U and g(a) 0. Prove that
1/g is differentiable at a and D(l/g)(a) = -l/(^(a))2Dg(a).
9. Prove (3) in Proposition 3.1. (One approach is to mimic the proof given of (2). Another is to apply
(1) and (2) appropriately.)
810. Suppose U c R" is open and a € U. Letf, g: U -> R3 be differentiable at a. Prove that f x gis
differentiable at a and D(f x g)(a)v = (Df(a)v) x g(a) + f (a) x (Dg(a)v) for any v € R". (Hint:
Follow the proof of part (2) of Proposition 3.1, and use Exercise 1.5.14.)
11. (Euler’s Theorem on Homogeneous Functions) We say f: R" - {0} -> R is homogeneous
of degree k if /(tx) = tkf(x) for all t > 0. Prove that f is homogeneous of degree k if and only if
D/(x)x = kf(x) for all nonzero x e R". (Hint: Fix x and consider h(t) = t~k
812. Suppose 17 C R” is open and convex (i.e., given any points a, b e U, the line segment joining
them lies in U as well). If f: U -> Rw is differentiable and Df (x) = O for all x e U, prove that f is
constant. Can you prove this when 17 is open and connected (i.e., any pair of points can be joined by
a piecewise-C1 path)?

13. Suppose f: R -> R is differentiable and let h

^x1 + y2, show that


SZz dh ,
x^-+y^~ =rf(r).
dx dy
*14. Suppose h: R -> R is continuous and u, v: (a, b) -> R are differentiable. Prove that the func­
tion F: (a, b) -> R given by

F(f) = / h(s)ds
Ju(t)
is differentiable and calculate F'. (Hint: Recall that the Fundamental Theorem of Calculus tells you
how to differentiate functions such as H(x) = f* h(s)ds.)

15. If f: R2 —> R is differentiable and F I j =f i U ), show that


\yl \u~vl

du dv \3x/ \3y/ ’

where the functions on the right-hand side are evaluated at


u—v
104 ► Chapter 3. The Derivative

rcos0\ _ , ,
*16. Suppose f: IR2 -> JR is differentiable and let F I. Calculate
r sih0 I
/9F\2 (dF\Z
\ dr ) r2 \ 39 )

in terms of the partial derivatives of f.


nf A£
17. Suppose f: IR2 -> IR is differentiable and — = c~ for some nonzero constant c. Prove that
dt dx
u X
= h(x 4- ct) for some function h. (Hint: Let
V x + ct

> 4 THE GRADIENT


To develop physical intuition, it is important to recast Proposition 2.3 in more geometric
terms when f is a scalar-valued function.

Definition Let /: R" R be differentiable at a. We define the gradient of f at a


to be the vector

Vf (a) = (D/(a))T =

Now we can interpret the directional derivative of a differentiable function as a dot


product:

(*) Dyf(a) = D/(a)v = V/(a)«v.

If we consider the directional derivative in the direction of various unit vectors v, we infer
from the Cauchy-Schwarz inequality, Proposition 2.3 of Chapter 1, that

£>v/(a) < ||V/(a)||,

with equality holding if and only if V/ (a) is a positive scalar multiple of v.


As a consequence, we have

Proposition 4.1 Suppose f is differentiable at a. Then ∇f(a) points in the direction in which f increases at the greatest rate, and ‖∇f(a)‖ is that greatest possible rate of change; i.e.,

‖∇f(a)‖ = max_{‖v‖=1} D_v f(a).

► EXAMPLE 1

Let f: ℝⁿ → ℝ be defined by f(x) = ‖x‖. It is simple enough to calculate partial derivatives of
f, but we'd rather use the geometric meaning of the gradient to figure out ∇f(a) for any a ≠ 0.
Clearly, if we are at a, the direction in which distance from the origin increases most rapidly is in the
direction of a itself (i.e., to move away from the origin as fast as possible, we should move radially
outward). Moreover, we saw in Example 3 of Section 1 that the directional derivative D_v f(a) = 1
when v = a/‖a‖. Therefore, we infer from Proposition 4.1 that ∇f(a) is a vector pointing radially
outward and having length 1. That is,

∇f(a) = a/‖a‖.

As corroboration, we observe that if we move orthogonal to a, then instantaneously our distance from
the origin is not changing, so D_v f(a) = ∇f(a) · v = 0 when v · a = 0, as it should be. ◄
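One can corroborate this numerically as well. The following sketch in Python (our own check; the sample point is an arbitrary choice) approximates each partial derivative of f(x) = ‖x‖ by a centered difference and compares the result with a/‖a‖:

```python
import numpy as np

def f(x):
    return np.linalg.norm(x)

def numerical_gradient(f, a, h=1e-6):
    # approximate each partial derivative by a centered difference
    grad = np.zeros_like(a)
    for i in range(len(a)):
        e = np.zeros_like(a)
        e[i] = h
        grad[i] = (f(a + e) - f(a - e)) / (2 * h)
    return grad

a = np.array([3.0, 0.0, 4.0])        # sample point, with ||a|| = 5
print(numerical_gradient(f, a))      # approximately [0.6, 0.0, 0.8]
print(a / np.linalg.norm(a))         # exactly [0.6, 0.0, 0.8]
```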

An equally important interpretation, which will emerge in a significant role in Section
5 of Chapter 4 and then in Chapter 5, is this: Suppose f: ℝ² → ℝ is a C¹ function, c is
a constant, and C = {x ∈ ℝ² : f(x) = c} is a level curve of f. We shall prove later that
for any a ∈ C, provided ∇f(a) ≠ 0, C has a tangent line at a and ∇f(a) is orthogonal to
that tangent line. Intuitively, this is quite plausible: If v is tangent to C at a, then since f
does not change as we move along C, it therefore does not change instantaneously as we
move in the direction of v, and so D_v f(a) = 0. Therefore, by (∗), ∇f(a) is orthogonal to
v. (See also Exercise 6.) More generally, if f: ℝⁿ → ℝ is differentiable and ∇f(a) ≠ 0,
then ∇f(a) is orthogonal to the level set {x ∈ ℝⁿ : f(x) = c} of f passing through a.

► EXAMPLE 2

Consider the surface M defined by f(x, y, z) = e^{x+2y} cos z − xz + y = 2. Note that the point
a = (−2, 1, 0) lies on M. We want to find the equation of the tangent plane to M at a. We know that
∇f(a) gives the normal to the plane, so we calculate

∇f(x, y, z) = (e^{x+2y} cos z − z, 1 + 2e^{x+2y} cos z, −e^{x+2y} sin z − x),

so ∇f(a) = (1, 3, 2). Thus, the equation of the tangent plane of M at a is 1(x + 2) + 3(y − 1) + 2(z − 0) = 0, or

x + 3y + 2z = 1. ◄
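Computations like this are easy to double-check with a computer algebra system. Here is a sketch using sympy (the variable names are ours):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.exp(x + 2*y) * sp.cos(z) - x*z + y

grad = [sp.diff(f, v) for v in (x, y, z)]
a = {x: -2, y: 1, z: 0}

print(f.subs(a))                  # 2, so a does lie on the level surface M
print([g.subs(a) for g in grad])  # [1, 3, 2], the normal to the tangent plane
```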

Remark Be sure to distinguish between a level surface of f and the graph of f (which, in this case, would reside in ℝ⁴).

► EXAMPLE 3

As a beautiful application of this principle, we use the results of Example 1 to derive a fundamental
physical property of the ellipse. Given two points F₁ and F₂ in the plane, an ellipse (as pictured in
Figure 4.1) is the locus of points P so that

‖P − F₁‖ + ‖P − F₂‖ = 2a

for some positive constant a. Write fᵢ(x) = ‖x − Fᵢ‖, i = 1, 2, and set f(x) = f₁(x) + f₂(x). Then,
by the results of Example 1, we have

∇f(P) = ∇f₁(P) + ∇f₂(P) = (P − F₁)/‖P − F₁‖ + (P − F₂)/‖P − F₂‖ = v₁ + v₂.

Both v₁ and v₂ are unit vectors pointing radially away from F₁ and F₂, respectively, and therefore
∇f(P) bisects the angle between them (see Exercise 1.2.20). Thus, α = β, and so the tangent line
to the ellipse at P makes equal angles with the lines ℓ₁ and ℓ₂. Thus, a light ray emanating from
one focus reflects off the ellipse back to the other focus.

► EXERCISES 3.4

1. Give the equation of the tangent line of the given level curve at the prescribed point a.
*(a) x³ + y³ = 9, a = …
(b) 3xy² + … sin(xy) = 1, a = (0, 1)

2. Give the equation of the tangent plane of the given level surface at the prescribed point a.
(a) x² + y² + z² = 5, a = (1, 0, 2)
*(b) yz² + 2e^{xy}z⁵ = 4, a = (0, 2, 1)
(c) x³ + xz² + y²z + y³ = 0, a = (−1, 1, 0)
(d) e^{2x+z} cos(3y) − xy + z = 3, a = (−1, 0, 2)

3. Given the topographical map in Figure 4.2, sketch on the map an approximate route of steepest
ascent from P to Q, the top of the mountain. What about from R?

*4. Suppose a hillside is given by z = f(x), x ∈ U ⊂ ℝ². Suppose f(a) = c and Df(a) = [3 −4].
(a) Find a vector tangent to the curve of steepest ascent on the hill at the point (a, c).
(b) Find the angle that a stream makes with the horizontal at (a, c) if it flows in the e₂ direction at
that point.
5. As shown in Figure 4.3, at a certain moment, a ladybug is at position x₀ and moving with velocity
vector v. At that moment, the angle ∠a x₀ b = π/2, her velocity bisects that angle, and her speed is
5 units/sec. At what rate is the sum of her distances from a and b decreasing at that moment? Give
your reasoning clearly.

Figure 4.3

6. Suppose that, in a neighborhood of the point a, the level curve C = {x ∈ ℝ² : f(x) = c} can be
parametrized by a differentiable function g: (−ε, ε) → ℝ², with g(0) = a. Use the chain rule to
prove that ∇f(a) is orthogonal to the tangent vector to C at a.
7. Check that the definition of an ellipse given in Example 3 gives the usual Cartesian equation of
the form

x²/a² + y²/b² = 1

when the foci are at (±c, 0). (Hint: You should find that a² = b² + c².)
8. By analogy with Example 3, prove that light emanating from the focus of a parabola reflects off the
parabola in the direction of the axis of the parabola. This is why automobile headlights use parabolic
reflectors. (A convenient definition of a parabola is this: It is the locus of points equidistant from a
point (the focus) and a line (the directrix), as pictured in Figure 4.4.)

Figure 4.4
9. Using Figure 4.5 as a guide, complete Dandelin's proof (dating from 1822) that the appropriate
conic section is an ellipse. Find spheres that are inscribed in the cone and tangent to the plane of
the ellipse. Letting F₁ and F₂ be the points of tangency and P a point of the ellipse, let Q₁ and Q₂
be points where the generator of the cone through P intersects the respective spheres. Show that
‖PQᵢ‖ = ‖PFᵢ‖, i = 1, 2, and deduce that ‖F₁P‖ + ‖F₂P‖ = const. (What happens when we tilt
the plane to obtain a parabola or hyperbola?)

Figure 4.5
10. Suppose f: ℝ² → ℝ is a differentiable function whose gradient is nowhere 0 and that satisfies

∂f/∂x = 2 ∂f/∂y

everywhere.
(a) Find (with proof) the level curves of f.
(b) Show that there is a differentiable function F: ℝ → ℝ so that f(x, y) = F(2x + y).

11. Suppose f: ℝ² − {0} → ℝ is a differentiable function whose gradient is nowhere 0 and that
satisfies

−y ∂f/∂x + x ∂f/∂y = 0

everywhere.
(a) Find (with proof) the level curves of f.
(b) Show that there is a differentiable function F defined on the set of positive real numbers so that
f(x) = F(‖x‖).
*12. Find all constants c for which the surfaces

x² + y² + z² = 1  and  z = x² + y² + c

(a) intersect tangentially at each point of intersection and (b) intersect orthogonally at each point of intersection.
13. Prove the so-called pedal property of the ellipse: If n is the unit normal to the ellipse at P, then
(F₁P · n)(F₂P · n) = constant.

14. The height of land in the vicinity of a hill is given in terms of horizontal coordinates x and y
by z = …. A stream starts at the point (1, 1, 5) and follows a path of "steepest
descent." Find the equation of the path of the stream on a map of the region.
15. A drop of water falls onto a football and rolls down, following the path of steepest descent; that
is, it moves in the direction tangent to the football most nearly vertically downward. Find the path
the water drop follows if the surface of the football is ellipsoidal and given by the equation

4x² + y² + 4z² = 9,

and the drop starts at the point (1, 1, 1).

► 5 CURVES

In this section, we return to the study of (parametrized) curves with which we began Chapter
2. Now we bring in the appropriate differential calculus to discuss velocity, acceleration,
some basic principles from physics, and the notion of curvature.
If g: (a, b) → ℝⁿ is a twice-differentiable vector-valued function, we can visualize
g(t) as denoting the position of a particle at time t, and hence the image of g represents its
trajectory as time passes. Then g′(t) is the velocity vector of the particle at time t and g″(t)
is its acceleration vector at time t. The length of the velocity vector, ‖g′(t)‖, is called the
speed of the particle. In physics, a particle of mass m is said to have kinetic energy

K.E. = ½m(speed)²,

and acceleration looms large because of Newton's second law of motion, which says that a
force acting on an object imparts an acceleration according to the equation

force = (mass)(acceleration), or, in other words, F = ma.

As a quick application of some vector calculus, let's discuss a few properties of motion
in a central force field. We call a force field F: U → ℝ³ on an open subset U ⊂ ℝ³ central
if F(x) = ψ(x)x for some continuous function ψ: U → ℝ; that is, F is everywhere a scalar
multiple of the position vector.
Newton discovered that the gravitational field of a point mass M is an inverse square
force directed toward the point mass. If we assume the point mass is at the origin, then the
force exerted on a unit test mass at position x is

F(x) = −(GM/‖x‖²)(x/‖x‖) = −(GM/‖x‖³)x,

where G is the universal gravitational constant. Newton published his laws of motion
in 1687 in his Philosophiae Naturalis Principia Mathematica. Interestingly, Kepler had
published his empirical observations almost a century earlier, in 1596.¹

Kepler's first law: Planets move in ellipses with the sun at one focus.

Kepler’s second law: The position vector from the sun to the planet sweeps out area at a
constant rate.

Kepler’s third law: The square of the period of a planet is proportional to the cube of the
semimajor axis of its elliptical orbit.

For the first and third laws we refer the reader to Exercise 15, but here we prove a generalization of the second.

Proposition 5.1 Let F be a central force field on ℝ³. Then the trajectory of any
particle lies in a plane; assuming the trajectory is not a line, the position vector sweeps out
area at a constant rate.

Proof Let the trajectory of the particle be given by g(t), and let its mass be m.
Consider the vector function A(t) = g(t) × g′(t). By Exercise 3.3.10 and by Newton's
second law of motion, we have

A′(t) = g′(t) × g′(t) + g(t) × g″(t) = g(t) × (1/m)ψ(g(t))g(t) = 0,

since the cross product of any vector with a scalar multiple of itself is 0. Thus, A(t) = A₀
is a constant. If A₀ = 0, the particle moves on a line (why?). If A₀ ≠ 0, then note that g
lies on the plane

A₀ · x = 0,

as A₀ · g(t) = A(t) · g(t) = 0 for all t.

Assume now the trajectory is not linear. Let A(t) denote the area swept out by the
position vector g(t) from time t₀ to time t. Since A(t + h) − A(t) equals the area subtended
by the position vectors g(t) and g(t + h) (see Figure 5.1), for h small, this is approximately
the area of the triangle determined by the pair of vectors or, equivalently, by the
vectors g(t) and g(t + h) − g(t). According to Proposition 5.1 of Chapter 1, this area is
½‖g(t) × (g(t + h) − g(t))‖, so that

A′(t) = lim_{h→0⁺} (A(t + h) − A(t))/h
      = lim_{h→0⁺} ½ ‖g(t) × (g(t + h) − g(t))‖/h
      = lim_{h→0⁺} ½ ‖g(t) × (g(t + h) − g(t))/h‖
      = ½‖g(t) × g′(t)‖ = ½‖A₀‖.

That is, the position vector sweeps out area at a constant rate. ■

¹Somewhat earlier he had surmised that the positions of the six known planets were linked to the famous five
regular polyhedra.
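Proposition 5.1 also lends itself to a numerical experiment. The sketch below (our own construction; the initial conditions and step size are arbitrary choices) integrates Newton's second law for an inverse-square central force with a leapfrog scheme and watches ‖g(t) × g′(t)‖ = ‖A₀‖, which is twice the areal rate, hold steady along the orbit:

```python
import numpy as np

GM = 1.0

def accel(x):
    # inverse-square central force on a unit mass: g'' = -GM x / ||x||^3
    return -GM * x / np.linalg.norm(x)**3

x = np.array([1.0, 0.0, 0.0])   # initial position
v = np.array([0.0, 0.9, 0.0])   # initial velocity (gives an elliptical orbit)
dt = 1e-4

for step in range(200_000):
    # one leapfrog (velocity Verlet) step
    v_half = v + 0.5 * dt * accel(x)
    x = x + dt * v_half
    v = v_half + 0.5 * dt * accel(x)
    if step % 50_000 == 0:
        print(np.linalg.norm(np.cross(x, v)))   # stays at 0.9 = ||A_0||
```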

One of the most useful (yet intuitively quite apparent) results about curves is the
following.

Proposition 5.2 Suppose g: (a, b) → ℝⁿ is a differentiable parametrized curve with
the property that g has constant length (i.e., the curve lies on a sphere centered at the
origin). Then g(t) · g′(t) = 0 for all t; i.e., the velocity vector is everywhere orthogonal to
the position vector.

Proof By (3) of Proposition 3.1, we differentiate the equation

g(t) · g(t) = const

to obtain

g′(t) · g(t) + g(t) · g′(t) = 2g(t) · g′(t) = 0,

as required. ■

Physically, one should think of it this way: If the velocity vector had a nonzero projection
on the position vector, that would mean that the particle's distance from the center of the
sphere would be changing. Analogously, as we ask the reader to show in Exercise 2, if a
particle moves with constant speed, then its acceleration must be orthogonal to its velocity.

Now we leave physics behind for a while and move on to discuss some geometry. We
begin with a generalization of the triangle inequality, Corollary 2.4 of Chapter 1.

Lemma 5.3 Suppose g: [a, b] → ℝⁿ is continuous (except perhaps at finitely many
points). Then, defining the integral of g component by component, we have

‖∫ₐᵇ g(t) dt‖ ≤ ∫ₐᵇ ‖g(t)‖ dt.

Proof Let v = ∫ₐᵇ g(t) dt. If v = 0, there is nothing to prove. By the Cauchy-
Schwarz inequality, Proposition 2.3 of Chapter 1, |v · g(t)| ≤ ‖v‖ ‖g(t)‖, so

‖v‖² = v · ∫ₐᵇ g(t) dt = ∫ₐᵇ v · g(t) dt ≤ ∫ₐᵇ ‖v‖ ‖g(t)‖ dt = ‖v‖ ∫ₐᵇ ‖g(t)‖ dt.

Assuming v ≠ 0, we now infer that ‖v‖ ≤ ∫ₐᵇ ‖g(t)‖ dt, as required. ■
Ja

Definition Let g: [a, b] → ℝⁿ be a (continuous) parametrized curve. Given a partition 𝒫 = {a = t₀ < t₁ < ⋯ < t_k = b} of the interval [a, b], let

ℓ(g, 𝒫) = Σ_{i=1}^{k} ‖g(tᵢ) − g(tᵢ₋₁)‖.

That is, ℓ(g, 𝒫) is the length of the inscribed polygon with vertices at g(tᵢ), i = 0, …, k,
as indicated in Figure 5.2. We define the arclength of g to be

ℓ(g) = sup{ℓ(g, 𝒫) : 𝒫 a partition of [a, b]},

provided the set of polygonal lengths is bounded above.

The following result is not in the least surprising: The distance a particle travels is the
integral of its speed.

Proposition 5.4 Let g: [a, b] → ℝⁿ be a piecewise-C¹ parametrized curve. Then

ℓ(g) = ∫ₐᵇ ‖g′(t)‖ dt.

Proof For any partition 𝒫 of [a, b], we have, by Lemma 5.3,

ℓ(g, 𝒫) = Σ_{i=1}^{k} ‖g(tᵢ) − g(tᵢ₋₁)‖ = Σ_{i=1}^{k} ‖∫_{tᵢ₋₁}^{tᵢ} g′(t) dt‖
        ≤ Σ_{i=1}^{k} ∫_{tᵢ₋₁}^{tᵢ} ‖g′(t)‖ dt = ∫ₐᵇ ‖g′(t)‖ dt,

so ℓ(g) ≤ ∫ₐᵇ ‖g′(t)‖ dt. The same holds on any interval.
Now, for a ≤ t ≤ b, define s(t) to be the arclength of the curve g on the interval [a, t].
Then for h > 0 we have

‖g(t + h) − g(t)‖/h ≤ (s(t + h) − s(t))/h ≤ (1/h) ∫_t^{t+h} ‖g′(u)‖ du,

since s(t + h) − s(t) is the arclength of the curve g on the interval [t, t + h]. Now

lim_{h→0⁺} ‖g(t + h) − g(t)‖/h = ‖g′(t)‖  and  lim_{h→0⁺} (1/h) ∫_t^{t+h} ‖g′(u)‖ du = ‖g′(t)‖.

Therefore, by the squeeze principle (see Exercise 2.3.3),

lim_{h→0⁺} (s(t + h) − s(t))/h = ‖g′(t)‖.

A similar argument works for h < 0, and we conclude that s′(t) = ‖g′(t)‖. Therefore,

s(t) = ∫ₐᵗ ‖g′(u)‖ du,

and, in particular, s(b) = ℓ(g) = ∫ₐᵇ ‖g′(t)‖ dt, as desired. ■
► EXAMPLE 1

Consider the helix

g(t) = (a cos t, a sin t, bt),  t ∈ ℝ,

as pictured in Figure 5.3. Note that it twists around the cylinder of radius a, heading "uphill" at a
constant pitch. If we take one "coil" of the helix, letting t run from 0 to 2π, then the arclength of that
portion is

ℓ(g) = ∫₀^{2π} ‖g′(t)‖ dt = ∫₀^{2π} ‖(−a sin t, a cos t, b)‖ dt = ∫₀^{2π} √(a² + b²) dt = 2π√(a² + b²).
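The arclength just computed is also the limiting value of the inscribed polygon lengths ℓ(g, 𝒫) from the definition, which we can watch converge numerically; a quick sketch (with the arbitrary choices a = 2, b = 1):

```python
import numpy as np

a, b = 2.0, 1.0

def g(t):
    return np.array([a * np.cos(t), a * np.sin(t), b * t])

for k in (10, 100, 1000):
    t = np.linspace(0, 2 * np.pi, k + 1)
    pts = np.array([g(ti) for ti in t])
    # length of the inscribed polygon with k segments
    poly = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
    print(k, poly)

print(2 * np.pi * np.sqrt(a**2 + b**2))   # the exact arclength
```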

We say the parametrized curve is arclength-parametrized if ‖g′(t)‖ = 1 for all t, so
s(t) = t + c for some constant c. Typically, when the curve is arclength-parametrized, we
use s as the parameter.

► EXAMPLE 2

The following curves are arclength-parametrized.

a. Consider the following parametrization of a circle of radius a:

g(s) = (a cos(s/a), a sin(s/a)),  0 ≤ s ≤ 2πa.

Then note that

g′(s) = (−sin(s/a), cos(s/a))  and  ‖g′(s)‖ = 1.

b. Consider the curve

g(s) = (⅓(1 + s)^{3/2}, ⅓(1 − s)^{3/2}, s/√2),  −1 < s < 1.

Then

g′(s) = (½(1 + s)^{1/2}, −½(1 − s)^{1/2}, 1/√2)  and  ‖g′(s)‖ = 1. ◄

If g is arclength-parametrized, then the velocity vector g′(s) is the unit tangent vector
at each point, which we denote by T(s). Let's assume now that g is twice differentiable.
Since ‖T(s)‖ = 1 for all s, it follows from Proposition 5.2 that T(s) · T′(s) = 0. Define
the curvature of the curve to be κ(s) = ‖T′(s)‖; assuming T′(s) ≠ 0, define the principal
normal vector N(s) = T′(s)/‖T′(s)‖. (See Figure 5.4.)

Figure 5.4

► EXAMPLE 3

If g is a line, then T is constant and κ = 0 (and conversely). If we start with a circle of radius a, then
from Example 2a we have

T(s) = (−sin(s/a), cos(s/a)),

from which we compute that

T′(s) = (1/a)(−cos(s/a), −sin(s/a)).

In particular, we see that N(s) is centripetal (pointing toward the center of the circle) and κ(s) = 1/a
for all s. ◄

Figure 5.5

Remark If the arclength-parametrized curve g: [0, L] → ℝ³ is closed (meaning that
g(0) = g(L)), then it is interesting to consider its total curvature, ∫₀ᴸ κ(s) ds. For a circle or,
indeed, for any convex plane curve, this integral is 2π. Not surprisingly, at least that much
total curvature is required for the curve to "close up." A famous theorem in differential
geometry, called the Fary-Milnor Theorem, states that total curvature at least 4π is required
to make a knot, a closed curve that cannot be continuously deformed into a circle without
crossing itself. A trefoil knot is pictured in Figure 5.5. See, e.g., do Carmo, Differential
Geometry of Curves and Surfaces, §5.7.

Note that as long as g′(t) never vanishes, the arclength s is a differentiable function
of t with positive derivative everywhere; thus, it has a differentiable inverse function,
which we write t(s). We can "reparametrize by arclength" by considering the composition
h(s) = g(t(s)), and then, of course, g(t) = h(s(t)). Writing² υ(t) = s′(t) = ‖g′(t)‖ for
the speed, we have by the chain rule

g′(t) = h′(s(t))s′(t) = υ(t)T(s(t))  and

(†)  g″(t) = υ′(t)T(s(t)) + υ(t)²T′(s(t)) = υ′(t)T(s(t)) + κ(s(t))υ(t)²N(s(t)).

► EXAMPLE 4

Consider the parametrized curve

g(t) = (cos³t, sin³t),  0 < t < π/2.

Then we have

g′(t) = 3 cos t sin t (−cos t, sin t),  so  υ(t) = 3 cos t sin t  and  T(s(t)) = (−cos t, sin t).

Then, by the chain rule, we have

(sin t, cos t) = (T∘s)′(t) = T′(s(t))s′(t) = κ(s(t))υ(t)N(s(t)),

from which we conclude that

N(s(t)) = (sin t, cos t)  and  κ(s(t)) = 1/υ(t) = 1/(3 cos t sin t). ◄

²For those who might not know, υ is the Greek letter upsilon, not to be confused with ν, the Greek letter nu.

► EXERCISES 3.5

1. Suppose g: (a, b) → ℝⁿ is a differentiable parametrized curve with the property that at each t,
the position and velocity vectors are orthogonal. Prove that g lies on a sphere centered at the origin.
2. Suppose g: (a, b) → ℝⁿ is a twice-differentiable parametrized curve. Prove that g has constant
speed if and only if the velocity and acceleration vectors are orthogonal at every t.
3. Suppose f, g: (a, b) → ℝⁿ are differentiable and f · g = const. Prove that f′ · g = −g′ · f. Interpret
the result geometrically in the event that f and g are always unit vectors.
4. Suppose a particle moves in a central force field in ℝ³ with constant speed. What can you say
about its trajectory? (Proof?)
5. Suppose g: (a, b) → ℝⁿ is nowhere zero and g′(t) = λ(t)g(t) for some scalar function λ. Prove
(rigorously) that g/‖g‖ is constant. (Hint: Set h = g/‖g‖, write g = ‖g‖h, and differentiate.)
6. Suppose g: (a, b) → ℝⁿ is a differentiable parametrized curve and that for some point p ∈ ℝⁿ
we have ‖g(t₀) − p‖ ≤ ‖g(t) − p‖ for all t ∈ (a, b). Prove that g′(t₀) · (g(t₀) − p) = 0. Give a
geometric explanation.
7. Find the arclength of the following parametrized curves.
*(a) g(t) = (eᵗ cos t, eᵗ sin t, eᵗ),  a ≤ t ≤ b
(b) g(t) = (½(eᵗ + e⁻ᵗ), ½(eᵗ − e⁻ᵗ), t)
*(c) g(t) = (t, 3t², 6t³)
(d) g(t) = (a(t − sin t), a(1 − cos t)),  0 ≤ t ≤ 2π
8. Calculate the unit tangent vector and curvature of the following curves.
*(a) g(t) = …
*(b) g(t) = …
(c) g(t) = (t, t², t³)
9. Prove that for a parametrized curve g: (a, b) → ℝ³, we have κ = ‖g′ × g″‖/υ³.
10. Using the formula (†) for acceleration, explain how engineers might decide at what angle to bank
a road that is a circle of radius 1/4 mile and around which cars wish to drive safely at 40 mph.
11. (Frenet Formulas) Let g: [0, L] → ℝ³ be a three-times differentiable arclength-parametrized
curve with κ > 0, and let T and N be defined as above. Define the binormal B = T × N.
(a) Show that ‖B‖ = 1. Assuming the result of Exercise 1.4.34e, show that every vector in ℝ³ can
be expressed as a linear combination of T(s), N(s), and B(s). (Hint: See Example 11 of Chapter 1,
Section 4.)
(b) Show that B′ · T = B′ · B = 0, and deduce that B′(s) is a scalar multiple of N(s) for every s.
(Hint: See Exercise 3.)
(c) Define the torsion τ of the curve by B′ = −τN. Show that g is a planar curve if and only if
τ(s) = 0 for all s.
(d) Show that N′ = −κT + τB.
The equations

T′ = κN,  N′ = −κT + τB,  B′ = −τN

are called the Frenet formulas for the arclength-parametrized curve g.
*12. (See Exercise 11 for the definition of torsion.) Calculate the curvature and torsion of the helix
presented in Example 1. Explain the meaning of the sign of the torsion.
13. (See Exercise 11 for the definition of torsion.) Calculate the curvature and torsion of the curve

g(t) = (eᵗ cos t, eᵗ sin t, …).
14. A pendulum is made, as pictured in Figure 5.6, by hanging from the cusp where two arches of a
cycloid meet a length of string equal to the length of one of the arches. As it swings, the string wraps
around the cycloid and extends tangentially to the bob at the end. Given the equation

f(t) = (t + sin t, 1 − cos t),  0 ≤ t ≤ 2π,

for the cycloid, find the parametric equation of the bob, P, of the pendulum.³

Figure 5.6
15. Assuming that the force field is inverse square, prove Kepler's first and third laws, as follows.
Without loss of generality, we may assume that the planet has mass 1 and moves in the xy-plane.
(You will need to use polar coordinates, as introduced in Example 6 of Chapter 2, Section 1.)
(a) Suppose a, b > 0 and a² = b² + c². Show that the polar coordinates equation of the ellipse

(x − c)²/a² + y²/b² = 1  is  r(1 − (c/a) cos θ) = b²/a.

This is an ellipse with semimajor axis a and semiminor axis b, with one focus at the origin. (Hint:
Expand the left-hand side in polar coordinates and express the result as a difference of squares.)

³This phenomenon was originally discovered by the Dutch mathematician Huygens in an effort to design a
pendulum whose period would not depend on the amplitude of its motion, hence one ideal for an accurate clock.

(b) Let r(t) and θ(t) be the polar coordinates of g(t), and let

e_r(t) = (cos θ(t), sin θ(t))  and  e_θ(t) = (−sin θ(t), cos θ(t)),

as pictured in Figure 5.7. Show that

g′(t) = r′(t)e_r(t) + r(t)θ′(t)e_θ(t),
g″(t) = (r″(t) − r(t)θ′(t)²)e_r(t) + (2r′(t)θ′(t) + r(t)θ″(t))e_θ(t).

(c) Let A₀ be as in the proof of Proposition 5.1. Show that g″(t) × A₀ = GMθ′(t)e_θ(t) = GMe_r′(t),
and deduce that g′(t) × A₀ = GM(e_r(t) + c) for some constant vector c.
(d) Dot the previous equation with g(t) and use the fact that g(t) × g′(t) = A₀ to deduce that
GMr(t)(1 − ‖c‖ cos θ(t)) = ‖A₀‖² if we assume c is a negative scalar multiple of e₁. Deduce
that when ‖c‖ ≥ 1 the path of the planet is unbounded and that when ‖c‖ < 1 the orbit of the planet
is an ellipse with one focus at the origin.
(e) As we shall see in Chapter 7, the area of an ellipse with semimajor axis a and semiminor axis b
is πab; show that the period T = 2πab/‖A₀‖. Now prove that T² = (4π²/GM)a³.

Figure 5.7
16. (Pilfered from Which Way did the Bicycle Go ...and Other Intriguing Mathematical Mysteries,
published by the M.A.A. Copyright The Mathematical Association of America, Washington, DC,
1996. All rights reserved.)

“This track, as you perceive, was made by a rider who was going from the direction
of the school.”
“Or towards it?”
“No, no, my dear Watson.... It was undoubtedly heading away from the school.”

So spoke Sherlock Holmes.⁴ Imagine a 20-foot wide mud patch through which a bicycle has just
passed, with its front and rear tires leaving tracks as illustrated in Figure 5.8. (We have taken the
liberty of helping you in your capacity as sleuth by using dashes for the path of one of the wheels.)
In which direction was the bicyclist traveling? Explain your answer.

⁴"The Adventure of the Priory School," The Return of Sherlock Holmes.



► 6 HIGHER-ORDER PARTIAL DERIVATIVES

Suppose U ⊂ ℝⁿ is open and f: U → ℝᵐ is a vector-valued function on U. Recall that we
said f is C¹ (on U) if the partial derivatives ∂f/∂xᵢ exist and are continuous on U. Suppose
this is the case. Then we can ask whether they in turn have partial derivatives, i.e., whether
the functions

∂²f/∂xⱼ∂xᵢ = ∂/∂xⱼ (∂f/∂xᵢ)

are defined. These functions are, for obvious reasons, called second-order partial derivatives
of f. We say f is C² (on U) if all its first- and second-order partial derivatives exist
and are continuous (on U). More generally, we say f is Cᵏ (on U) if all its first-, second-,
…, and kth-order partial derivatives

∂ᵏf/(∂x_{i_k} ∂x_{i_{k−1}} ⋯ ∂x_{i_2} ∂x_{i_1}),  1 ≤ i₁, i₂, …, i_k ≤ n,

exist and are continuous (on U). We say f is C^∞ (or smooth) if all its partial derivatives of
all orders exist.

► EXAMPLE 1

Let f(x, y, z) = e^{xy} sin z + xy³z⁴. Then f is smooth and

∂f/∂x = y e^{xy} sin z + y³z⁴,
∂²f/∂z∂x = y e^{xy} cos z + 4y³z³,
∂³f/∂z²∂x = −y e^{xy} sin z + 12y³z²,  and
∂³f/∂y∂z∂x = e^{xy}(xy + 1) cos z + 12y²z³. ◄
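Derivatives like these are easy to check with a computer algebra system. The sketch below (ours) also computes the same third-order derivative in two different orders, anticipating Theorem 6.1:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.exp(x*y) * sp.sin(z) + x * y**3 * z**4

d1 = sp.diff(f, x, z, y)   # differentiate in x, then z, then y
d2 = sp.diff(f, y, z, x)   # the same derivatives in the reverse order
print(sp.simplify(d1 - d2))           # 0: the order does not matter
print(sp.factor(d1 - 12*y**2*z**3))   # exp(x*y)*(x*y + 1)*cos(z)
```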

It is a hassle to keep track of the order in which we calculate higher-order partial
derivatives. Luckily, the following result tells us that for smooth functions, the order in
which we calculate the partial derivatives does not matter. This is an intuitively obvious
result, but the proof is quite subtle.

Theorem 6.1 Let U ⊂ ℝⁿ be open, and suppose f: U → ℝᵐ is a C² function. Then
for any i and j we have

∂²f/∂xᵢ∂xⱼ = ∂²f/∂xⱼ∂xᵢ.

Proof It suffices to prove the result when m = 1. For ease of notation, we take n = 2,
i = 1, and j = 2. Introduce the function

Δ = f(a + h, b + k) − f(a + h, b) − f(a, b + k) + f(a, b),

as indicated schematically in Figure 6.1. Letting q(s) = f(s, b + k) − f(s, b), by
the Mean Value Theorem, we have

Δ = q(a + h) − q(a) = hq′(ξ)  for some ξ between a and a + h
  = h(∂f/∂x(ξ, b + k) − ∂f/∂x(ξ, b))
  = hk ∂²f/∂y∂x(ξ, η)  for some η between b and b + k.

On the other hand, letting r(t) = f(a + h, t) − f(a, t), we have

Δ = r(b + k) − r(b) = kr′(τ)  for some τ between b and b + k
  = k(∂f/∂y(a + h, τ) − ∂f/∂y(a, τ))
  = hk ∂²f/∂x∂y(σ, τ)  for some σ between a and a + h.

Therefore, we have

(1/hk)Δ = ∂²f/∂y∂x(ξ, η) = ∂²f/∂x∂y(σ, τ).

Figure 6.1

Now ξ, σ → a and η, τ → b as h, k → 0, and since the functions ∂²f/∂x∂y and ∂²f/∂y∂x are
continuous, we have

∂²f/∂x∂y(a, b) = ∂²f/∂y∂x(a, b),

as required. ■

To see why the C² hypothesis is necessary, see Exercise 1.


Second-order derivatives appear in the study of the local behavior of functions near
critical points and, more importantly, in differential equations and physics—as we’ve seen,
Newton’s second law of motion tells us that forces induce acceleration. At this juncture, we
give a few examples of higher-order partial derivatives and partial differential equations
that arise in the further study of mathematics and physics.

► EXAMPLE 2

(Harmonic Functions) If f is a C² function on (an open subset of) ℝⁿ, the expression

∇²f = ∂²f/∂x₁² + ∂²f/∂x₂² + ⋯ + ∂²f/∂xₙ²

is called the Laplacian of f. A solution of the equation ∇²f = 0 is called a harmonic function. As
we shall see in Chapter 8, the Laplacian and harmonic functions play an important role in physical
applications. For example, the gravitational (resp., electrostatic) potential is a harmonic function in
mass-free (resp., charge-free) space.
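For instance, here is a quick sympy check (ours) that f(x, y) = log(x² + y²), which reappears in Exercise 2, is harmonic away from the origin:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.log(x**2 + y**2)

laplacian = sp.diff(f, x, 2) + sp.diff(f, y, 2)
print(sp.simplify(laplacian))   # 0, so f is harmonic on R^2 - {0}
```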

► EXAMPLE 3

(Wave Equation) The equation

(∗)  ∂²f/∂t² = c² ∂²f/∂x²

models the displacement of a one-dimensional vibrating string (with "wave velocity" c) from its
equilibrium position. By a clever use of the chain rule, we can find an explicit formula for its general
solution, assuming f is C². Let

(x, t) = g(u, v) = ((u + v)/2, (u − v)/(2c))

(so that u = x + ct and v = x − ct), and set F(u, v) = f(g(u, v)). Then by the chain rule, we have

∂F/∂v = ∂f/∂x ∂x/∂v + ∂f/∂t ∂t/∂v = ½ ∂f/∂x(g(u, v)) − (1/2c) ∂f/∂t(g(u, v)).

Now, differentiating with respect to u, we have to apply the chain rule to each of the functions
(∂f/∂x)∘g and (∂f/∂t)∘g:

∂²F/∂u∂v = ½[½ ∂²f/∂x²(g(u, v)) + (1/2c) ∂²f/∂t∂x(g(u, v))]
            − (1/2c)[½ ∂²f/∂x∂t(g(u, v)) + (1/2c) ∂²f/∂t²(g(u, v))]
         = ¼[∂²f/∂x²(g(u, v)) − (1/c²) ∂²f/∂t²(g(u, v))]
            + (1/4c)[∂²f/∂t∂x(g(u, v)) − ∂²f/∂x∂t(g(u, v))]
         = 0,

where at the last step we use the wave equation (∗) and Theorem 6.1. Now what can we say about the
general solution of the equation ∂²F/∂u∂v = 0? On any rectangle in the uv-plane, we can infer that

F(u, v) = φ(u) + ψ(v)

for some differentiable functions φ and ψ. (For ∂/∂u(∂F/∂v) = 0 tells us that ∂F/∂v is independent of
u, hence a function of v only, whose antiderivative we call ψ(v). But the constant of integration can
be an arbitrary function of u. To examine this argument a bit more carefully, we recommend that the
reader consider Exercise 11.)
In conclusion, on a suitable domain, the general solution of the wave equation (∗) can be written
in the form

f(x, t) = φ(x + ct) + ψ(x − ct)

for arbitrary C² functions φ and ψ. The physical interpretation is this: The general solution is the
superposition of two traveling waves, one moving to the right along the string with speed c, the other
moving to the left with speed c.
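One can confirm symbolically that every such superposition satisfies (∗); a short sympy sketch, with φ and ψ left as undetermined functions:

```python
import sympy as sp

x, t, c = sp.symbols('x t c')
phi, psi = sp.Function('phi'), sp.Function('psi')

f = phi(x + c*t) + psi(x - c*t)
wave = sp.diff(f, t, 2) - c**2 * sp.diff(f, x, 2)
print(sp.simplify(wave))   # 0, for arbitrary C^2 functions phi and psi
```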
► EXAMPLE 4

(Minimal Surfaces) When you dip a piece of wire shaped in the form of a closed curve C into soap
film, the resulting surface you see is called a minimal surface, so called because in principle surface
tension dictates that the surface should have least area among all those surfaces having that curve
C as boundary. If the minimal surface is in the form of a graph z = f(x, y), then it is shown in a
differential geometry course that f must be a solution of the minimal surface equation

(1 + (∂f/∂y)²) ∂²f/∂x² − 2 (∂f/∂x)(∂f/∂y) ∂²f/∂x∂y + (1 + (∂f/∂x)²) ∂²f/∂y² = 0.

(See also Exercise 8.5.22.) Examples of minimal surfaces are

a. a plane;
b. a helicoid—the spiral surface obtained by joining points of a helix "horizontally" to its
vertical axis, as pictured in Figure 6.2(a);
c. a catenoid—the surface of revolution obtained by rotating a catenary y = (1/2c)(e^{cx} + e^{−cx})
(for any c > 0) about the x-axis, as pictured in Figure 6.2(b).

(See Exercise 10.)

Figure 6.2

► EXERCISES 3.6

1. Define f: ℝ² → ℝ by

f(x, y) = xy(x² − y²)/(x² + y²),  (x, y) ≠ 0,  f(0) = 0.

(a) Show that ∂f/∂x(0, y) = −y for all y and ∂f/∂y(x, 0) = x for all x.
(b) Deduce that ∂²f/∂x∂y(0) = 1 but ∂²f/∂y∂x(0) = −1.
(c) Conclude that f is not C² at 0.
2. Check that the following are harmonic functions.
(a) f(x, y) = 3x² − 5xy − 3y²
(b) f(x, y) = log(x² + y²)
(c) f(x, y, z) = x² + xy + 2y² − 3z² + xyz
(d) f(x, y, z) = (x² + y² + z²)^{−1/2}

3. Check that the following functions are solutions of the one-dimensional wave equation given in
Example 3.
(a) f(x, t) = cos(x + ct)
(b) f(x, t) = sin 5x cos 5ct

4. Let f(x, t) = t^{−1/2} e^{−x²/4t}. Show that f is a solution of the one-dimensional heat equation
∂f/∂t = ∂²f/∂x².

*5. Suppose we are given a solution f of the one-dimensional wave equation, with initial position
f(x, 0) = h(x) and initial velocity ∂f/∂t(x, 0) = k(x). Express the functions φ and ψ in the solution of
Example 3 in terms of h and k.

6. Suppose f: ℝ² → ℝ and g: ℝ² → ℝ² are C², and let F = f∘g. Writing g(u, v) = (x(u, v), y(u, v)),
show that

∂²F/∂u∂v = ∂f/∂x ∂²x/∂u∂v + ∂f/∂y ∂²y/∂u∂v + ∂²f/∂x² ∂x/∂u ∂x/∂v
            + ∂²f/∂x∂y (∂x/∂u ∂y/∂v + ∂x/∂v ∂y/∂u) + ∂²f/∂y² ∂y/∂u ∂y/∂v,

where the partial derivatives of f are evaluated at g(u, v).

7. Suppose f: ℝ² → ℝ is C². Let F(r, θ) = f(r cos θ, r sin θ). Show that

∂²F/∂r² + (1/r) ∂F/∂r + (1/r²) ∂²F/∂θ² = ∂²f/∂x² + ∂²f/∂y²,

where the left-hand side is evaluated at (r, θ) and the right-hand side is evaluated at (r cos θ, r sin θ).
(This is the formula for the Laplacian in polar coordinates.)

8. Use the result of Exercise 7 to show that for any integer n, the functions F(r, θ) = rⁿ cos nθ and
F(r, θ) = rⁿ sin nθ are harmonic.

*9. Use the result of Exercise 7 to find all radially symmetric harmonic functions on the plane. (This
means that F is independent of θ, so we can call it h(r).)

10. Check that the following functions f: ℝ² → ℝ are indeed solutions of the minimal surface
equation given in Example 4.
(a) …
(b) arctan(y/x)
(c) … (For this one, a computer algebra system is recommended.)

11. Define

F(u, v) = 0 when u < 0 or v < 0,  and  F(u, v) = u³ when u > 0 and v > 0.

Show that F is C² and ∂²F/∂u∂v = 0, and yet F cannot be written in the form prescribed by the
discussion of Example 3. Resolve this paradox.
CHAPTER 4

IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS
We have seen that we can view the unit circle {x ∈ ℝ² : ‖x‖ = 1} either as the set of solutions
of an equation or in terms of a parametric representation {(cos t, sin t) : t ∈ [0, 2π)}. These
are, respectively, the implicit and explicit representations of this subset of ℝ². Similarly,
any subspace V ⊂ ℝⁿ can be represented in two ways:

i. V = Span(v₁, …, v_k) for appropriate vectors v₁, …, v_k ∈ ℝⁿ—this is the explicit
or parametric representation;
ii. V = {x ∈ ℝⁿ : Ax = 0} for an appropriate m × n matrix A—this is the implicit
representation, viewing V as the intersection of the hyperplanes defined by
Aᵢ · x = 0.

In this chapter we will see how to go back and forth between these two approaches. The
central tool is Gaussian elimination, with which we deal in depth in the first two sections.
We then come to the central notion of dimension and some useful applications. In the
last section, we will begin to investigate to what extent we can relate implicit and explicit
descriptions in the nonlinear setting.

► 1 GAUSSIAN ELIMINATION AND THE THEORY OF LINEAR SYSTEMS

In this section we give an explicit algorithm for solving a system of m linear equations in
n variables:

a₁₁x₁ + a₁₂x₂ + ⋯ + a₁ₙxₙ = b₁
a₂₁x₁ + a₂₂x₂ + ⋯ + a₂ₙxₙ = b₂
  ⋮
a_{m1}x₁ + a_{m2}x₂ + ⋯ + a_{mn}xₙ = b_m.

Of course, we can write this in the form Ax = b, where A = [aᵢⱼ] is the m × n coefficient
matrix, x ∈ ℝⁿ, and b ∈ ℝᵐ.
Geometrically, a solution of the system Ax = b is a vector x having the requisite dot products
with the row vectors Aᵢ of the matrix A:

Aᵢ · x = bᵢ  for all i = 1, 2, …, m.

That is, the system of equations describes the intersection of the m hyperplanes with normal
vectors Aᵢ and at (signed) distance bᵢ/‖Aᵢ‖ from the origin.
To solve a system of linear equations, we want to give an explicit parametric description
of the general solution. Some systems are relatively simple to solve. For example, taking
the system

x₁ − x₃ = 1
x₂ + 2x₃ = 2,

we see that these equations allow us to determine x₁ and x₂ in terms of x₃; in particular,
we can write x₁ = 1 + x₃ and x₂ = 2 − 2x₃, where x₃ is free to take on any real value.
Thus, any solution of this system is of the form x = (1 + t, 2 − 2t, t) for some t ∈ ℝ. (It is
easily checked that every vector of this form is in fact a solution, as (1 + t) − t = 1 and
(2 − 2t) + 2t = 2 for every t ∈ ℝ.) Thus, we see that the intersection of the two given
planes is the line in ℝ³ passing through (1, 2, 0) with direction vector (1, −2, 1).
More complicated systems of equations require some algebraic manipulations before
we can easily read off the general solution in parametric form. There are three basic
operations we can perform on systems of equations that will not affect the solution set.
They are the following elementary operations:

i. interchange any pair of equations;


ii. multiply any equation by a nonzero real number;
iii. replace any equation by its sum with a multiple of any other equation.

► EXAMPLE 1

Consider the system of linear equations

3x₁ − 2x₂ + 9x₄ = 4
2x₁ + 2x₂ − 4x₄ = 6.

We can use operation (i) to replace this system with

2x₁ + 2x₂ − 4x₄ = 6
3x₁ − 2x₂ + 9x₄ = 4;

then we use operation (ii), multiplying the first equation by 1/2, to get

x₁ + x₂ − 2x₄ = 3
3x₁ − 2x₂ + 9x₄ = 4;

now we use operation (iii), adding −3 times the first equation to the second:

x₁ + x₂ − 2x₄ = 3
  − 5x₂ + 15x₄ = −5.

Next we use operation (ii) again, multiplying the second equation by −1/5, to obtain

x₁ + x₂ − 2x₄ = 3
x₂ − 3x₄ = 1;

finally, we use operation (iii), adding −1 times the second equation to the first:

x₁ + x₄ = 2
x₂ − 3x₄ = 1.

From this we see that x₁ and x₂ are determined by x₄, whereas x₃ and x₄ are free to take on any values.
Thus, we read off the general solution of the system of equations:

x₁ = 2 − x₄
x₂ = 1 + 3x₄
x₃ = x₃
x₄ = x₄.

We now describe a systematic technique, using the three allowable elementary oper­
ations, for solving systems of m equations in n variables. Before going any further, we
should make the official observation that performing elementary operations on a system of
equations does not change its solutions.

Proposition 1.1 If a system of equations Ax = b is changed into the new system


Cx = d by elementary operations, then the systems have the same set of solutions.

Proof Left to the reader in Exercise 1. ■


We introduce one further piece of shorthand notation, the augmented matrix

[A | b] = [ a₁₁  a₁₂  ⋯  a₁ₙ  | b₁
            a₂₁  a₂₂  ⋯  a₂ₙ  | b₂
             ⋮                  ⋮
            a_{m1} a_{m2} ⋯ a_{mn} | b_m ].

Notice that the augmented matrix contains all of the information of the original system of
equations, since we can recover the latter by filling in the xᵢ's, +'s, and ='s as needed.
The elementary operations on a system of equations become operations on the rows of
the augmented matrix; in this setting, we refer to them as elementary row operations of the
corresponding three types:

i. interchange any pair of rows;
ii. multiply all the entries of any row by a nonzero real number;
iii. replace any row by its sum with a multiple of any other row.

Since we have established that elementary operations do not affect the solution set of a
system of equations, we can freely perform elementary row operations on the augmented
matrix of a system of equations with the goal of finding an "equivalent" augmented matrix
from which we can easily read off the general solution.

► EXAMPLE 2

We revisit Example 1 in the notation of augmented matrices. To solve

3x₁ − 2x₂ + 9x₄ = 4
2x₁ + 2x₂ − 4x₄ = 6,

we begin by forming the appropriate augmented matrix

[ 3 −2 0  9 | 4
  2  2 0 −4 | 6 ].

We denote the process of performing row operations by the symbol ⇝, and (in this example) we
indicate above it the type of operation we are performing:

[ 3 −2 0  9 | 4    (i)  [ 2  2 0 −4 | 6    (ii)  [ 1  1 0 −2 | 3
  2  2 0 −4 | 6 ]   ⇝     3 −2 0  9 | 4 ]   ⇝      3 −2 0  9 | 4 ]

 (iii)  [ 1  1 0 −2 |  3    (ii)  [ 1 1 0 −2 | 3    (iii)  [ 1 0 0  1 | 2
   ⇝      0 −5 0 15 | −5 ]   ⇝      0 1 0 −3 | 1 ]    ⇝      0 1 0 −3 | 1 ].

From the final augmented matrix we are able to recover the simpler form of the equations,

x₁ + x₄ = 2
x₂ − 3x₄ = 1,

and read off the general solution just as before.

Definition We call the first nonzero entry of a row (reading left to right) its leading
entry. A matrix is in echelon¹ form if

1. the leading entries move to the right in successive rows;
2. the entries of the column below each leading entry are all 0;²
3. all rows of zeroes are at the bottom of the matrix.

A matrix is in reduced echelon form if it is in echelon form and, in addition,

4. every leading entry is 1;
5. all the entries of the column above each leading entry are 0 as well.

We call the leading entry of a certain row of a matrix a pivot if there is no leading entry
above it in the same column. When a matrix is in echelon form, we refer to the columns in
which a pivot appears as pivot columns and to the corresponding variables (in the original
system of equations) as pivot variables. The remaining variables are called free variables.

¹The word echelon derives from the French échelle, "ladder." Although we don't usually draw the rungs of
the ladder, they are there.
²Condition (2) is actually a consequence of (1), but we state it anyway for clarity.

The augmented matrices

[ 1 2 0 −1 | 1      [ 1 2 1 1 | 3      [ 1 2  0 | 3
  0 0 1  2 | 2 ],     0 0 1 2 | 2 ],     1 0 −1 | 2 ]

are, respectively, in reduced echelon form, in echelon form, and in neither. The key point
is this: When the matrix is in reduced echelon form, we are able to determine the general
solution by expressing each of the pivot variables in terms of the free variables.

► EXAMPLE 3

The augmented matrix

[ 1 2 0 0  4 | 1
  0 0 1 0 −2 | 2
  0 0 0 1  1 | 1 ]

is in reduced echelon form. The corresponding system of equations is

x₁ + 2x₂ + 4x₅ = 1
x₃ − 2x₅ = 2
x₄ + x₅ = 1.

Notice that the pivot variables, x₁, x₃, and x₄, are completely determined by the free variables x₂ and
x₅. As usual, we can write the general solution in terms of the free variables only:

x = (x₁, x₂, x₃, x₄, x₅) = (1 − 2x₂ − 4x₅, x₂, 2 + 2x₅, 1 − x₅, x₅)
  = (1, 0, 2, 1, 0) + x₂(−2, 1, 0, 0, 0) + x₅(−4, 0, 2, −1, 1). ◄

In this last example, we see that the general solution is the sum of a particular solution—
obtained by setting all the free variables equal to 0—and a linear combination of vectors, one
for each free variable—obtained by setting that free variable equal to 1 and the remaining
free variables equal to 0 and ignoring the particular solution. In other words, if x_k is a
free variable, the corresponding vector in the general solution has kth coordinate equal to 1
and jth coordinate equal to 0 for all the other free variables xⱼ. Concentrate on the circled
entries in the vectors from Example 3.
We refer to this as the standard form of the general solution. The general solution of any
system in reduced echelon form can be presented in this manner.
system in reduced echelon form can be presented in this manner.
Our strategy now is to transform the augmented matrix of any system of linear equations
into echelon form by performing a sequence of elementary row operations. The algorithm
goes by the name of Gaussian elimination.
The first step is to identify the first column (starting at the left) that does not consist
only of 0's; usually this is the first column, but it may not be. Pick a row whose entry in
this column is nonzero—usually the uppermost such row, but you may choose another if it
helps with the arithmetic—and interchange this with the first row; now the first entry of the
first nonzero column is nonzero. This will be our first pivot. Next, we add the appropriate
multiple of the top row to all the remaining rows to make all the entries below the pivot
equal to 0. To consider two examples, if we begin with matrices A and B, we might begin
by switching the first and third rows of A and the first and second rows of B (to avoid
fractions); after clearing out the first pivot column we have new matrices A′ and B′, with
the pivots circled for emphasis. (If we are headed for the reduced echelon form, we might
replace the first row of A′ by [1 1 2 1].)
The next step is to find the first column (again, starting at the left) in the new matrix
having a nonzero entry below the first row. Pick a row below the first that has a nonzero
entry in this column, and, if necessary, interchange it with the second row. Now the second
entry of this column is nonzero; this is our second pivot. (Once again, if we're calculating
the reduced echelon form, we multiply by the reciprocal of this entry to make the pivot 1.)
We then add appropriate multiples of the second row to the rows beneath it to make all the
entries beneath the pivot equal to 0. Continuing with our examples, we obtain matrices A″
and B″, both now in echelon form; note that the zero row of A″ is at the bottom, and that
the pivots move toward the right and down.
The process continues until we can find no more pivots—either because we have a
pivot in each row or because we're left with nothing but rows of zeroes. At this stage, if
we are interested in finding the reduced echelon form, we clear out the entries in the pivot
columns above the pivots and then make all the pivots equal to 1. (Two words of advice
here: If we start at the right and work our way up and to the left, we in general minimize the
amount of arithmetic that must be done. Also, we always do our best to avoid fractions.)
Continuing with our examples, we then find the reduced echelon forms of A and B.
We must be careful from now on to distinguish between the symbols "=" and "⇝":
when we convert one matrix to another by performing one or more row operations, we do
not have equal matrices.
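The algorithm just described is mechanical enough to code directly. The following Python sketch (ours; it ignores the roundoff issues that a serious numerical implementation must confront) carries a matrix to reduced echelon form using exactly the three elementary row operations, and reproduces the computation of Example 2:

```python
import numpy as np

def rref(M, tol=1e-12):
    """Reduce M to reduced echelon form by Gaussian elimination."""
    A = M.astype(float).copy()
    m, n = A.shape
    pivot_row = 0
    for col in range(n):
        # find a row at or below pivot_row with a nonzero entry in this column
        candidates = [r for r in range(pivot_row, m) if abs(A[r, col]) > tol]
        if not candidates:
            continue                                                   # no pivot here
        A[[pivot_row, candidates[0]]] = A[[candidates[0], pivot_row]]  # operation (i)
        A[pivot_row] /= A[pivot_row, col]                              # operation (ii)
        for r in range(m):                                             # operation (iii)
            if r != pivot_row:
                A[r] -= A[r, col] * A[pivot_row]
        pivot_row += 1
        if pivot_row == m:
            break
    return A

M = np.array([[3, -2, 0,  9, 4],      # the augmented matrix of Example 1
              [2,  2, 0, -4, 6]])
print(rref(M))   # [[1. 0. 0.  1. 2.]
                 #  [0. 1. 0. -3. 1.]]
```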
Here is one last example.

► EXAMPLE 4

Give the general solution of the following system of linear equations:

x₁ + x₂ + 3x₃ − x₄ = 0
−x₁ + x₂ + x₃ + x₄ + 2x₅ = −4
x₂ + 2x₃ + 2x₄ − x₅ = 0
2x₁ − x₂ + x₄ − 6x₅ = 9.
We begin with the augmented matrix of coefficients and put it in reduced echelon form:

[ 1  1 3 −1  0 |  0      [ 1  1  3 −1  0 |  0      [ 1  1 3 −1  0 |  0
 −1  1 1  1  2 | −4  ⇝     0  2  4  0  2 | −4  ⇝     0  1 2  0  1 | −2
  0  1 2  2 −1 |  0        0  1  2  2 −1 |  0        0  1 2  2 −1 |  0
  2 −1 0  1 −6 |  9 ]      0 −3 −6  3 −6 |  9 ]      0 −3 −6 3 −6 |  9 ]

   [ 1 1 3 −1  0 |  0      [ 1 0 1 0 −2 |  3
 ⇝   0 1 2  0  1 | −2  ⇝     0 1 2 0  1 | −2
     0 0 0  2 −2 |  2        0 0 0 1 −1 |  1
     0 0 0  3 −3 |  3 ]      0 0 0 0  0 |  0 ].

Thus, the system of equations is given in reduced echelon form by

x₁ + x₃ − 2x₅ = 3
x₂ + 2x₃ + x₅ = −2
x₄ − x₅ = 1,

from which we read off

x₁ = 3 − x₃ + 2x₅
x₂ = −2 − 2x₃ − x₅
x₃ = x₃
x₄ = 1 + x₅
x₅ = x₅,

and so the general solution is

x = (3, −2, 0, 1, 0) + x₃(−1, −2, 1, 0, 0) + x₅(2, −1, 0, 1, 1). ◄
When we reduce a matrix to echelon form, we must make a number of choices along
the way, and the echelon form may well depend on the choices. But we shall now prove
(using an inductive argument) that any two echelon forms of the same matrix must have
pivots in the same columns, and from this it will follow that the reduced echelon form must
be unique.

Theorem 1.2 Suppose A and B are echelon forms of the same nonzero matrix M.
Then all of their pivots appear in the same positions. As a consequence, if they are in
reduced echelon form, then they are equal.
Proof We begin by noting that we can transform M to both A and B by sequences
of elementary row operations. It follows that we can proceed from A to B by a sequence
of elementary row operations: The inverse of an elementary row operation is itself an
elementary row operation, so we can first transform A to M and then transform M to B.
Suppose the ith column of A is its first pivot column; this column vector is a nonzero
multiple of the standard basis vector e₁ ∈ ℝᵐ, and all previous columns are zero. If we
perform any elementary row operation on A, the first i − 1 columns remain zero and the
ith column remains nonzero. Thus, the ith column is the first nonzero column of B; i.e.,
it is B's first pivot column.
Next we prove that all the pivots must be in the same locations. We do this by induction
on m, the number of rows. We've already established that this must be the case for m = 1.
Now assume that the statement is true for m = k and consider (k + 1) × n matrices A and
B satisfying the hypotheses. By what we've already said, A and B have the same first pivot
column; by using an elementary row operation of type (ii) appropriately, we may assume
those respective first pivot entries in the first row are equal. Now, the k × n matrices A′ and
B′ obtained from A and B by deleting their first rows are also in echelon form. Furthermore,
any sequence of elementary row operations that transforms A to B cannot involve the first
row in a nontrivial way (if we add a multiple of the first row to any other row, we must
later subtract it again). Thus, A′ can be transformed to B′ by a sequence of elementary row
operations. By the induction hypothesis we can now conclude that A′ and B′ have pivots
in the same locations and, thus, so do A and B.
Last, we prove that if A and B are in reduced echelon form, then they are equal. Again
we proceed by induction on m. The case m = 1 is trivial. Assume that the statement is true
for m = k and consider the case m = k + 1. If the matrix A has a row of zeroes, then so
must the matrix B; we delete these rows and apply the induction hypothesis to conclude
that A = B. Now, if the last row of A is nonzero, it must contain the last pivot of A (say, in
the jth column). Then we know that the last pivot of B must be in the jth column as well.
Since the matrices are in reduced echelon form, their jth columns must be the last standard
basis vector e_m ∈ ℝᵐ. Because of this, the sequence of elementary row operations that
transforms A to B cannot involve the last row in a nontrivial way. Thus, if we let A′ and
B′ be the matrices obtained from A and B by deleting the last row, we see that A′ can be
transformed to B′ by a sequence of elementary row operations and that A′ and B′ are both
in reduced echelon form. The induction hypothesis applies to A′ and B′, so we conclude
that A′ = B′. Finally, we need to argue that the bottom rows of A and B are identical. But
any elementary row operation that would alter the last row would also have to make some
change in the first j entries. Since the last rows of A and B are known to agree in the first
j entries, we conclude that they must agree everywhere. ■

1.1 Consistency

Recall that the product Ax can be written as a linear combination of the columns of A:

Ax = x₁a₁ + x₂a₂ + ⋯ + xₙaₙ,

where a₁, …, aₙ ∈ ℝᵐ are the column vectors of the matrix A. Thus, a solution c = (c₁, …, cₙ)
of the linear system Ax = b provides scalars c₁, …, cₙ so that

b = c₁a₁ + ⋯ + cₙaₙ;

i.e., a solution gives a representation of the vector b as a linear combination, c₁a₁ + ⋯ +
cₙaₙ, of the column vectors of A.

► EXAMPLE 5

Consider the four vectors

v₁ = (1, 0, 1, 2),  v₂ = (1, 1, 1, 1),  v₃ = (2, 1, 1, 2),  and  b = (4, 3, 1, 2).

Suppose we want to express the vector b as a linear combination of the vectors v₁, v₂, and v₃. Writing
out the expression

x₁v₁ + x₂v₂ + x₃v₃ = b,

we obtain the system of equations

x₁ + x₂ + 2x₃ = 4
x₂ + x₃ = 3
x₁ + x₂ + x₃ = 1
2x₁ + x₂ + 2x₃ = 2.

In matrix notation, we must solve Ax = b, where

A = [ 1 1 2
      0 1 1
      1 1 1
      2 1 2 ].

So we take the augmented matrix to reduced echelon form:

[A | b] = [ 1 1 2 | 4      [ 1 0 0 | −2
            0 1 1 | 3        0 1 0 |  0
            1 1 1 | 1   ⇝    0 0 1 |  3
            2 1 2 | 2 ]      0 0 0 |  0 ].

This tells us that the solution is

x = (−2, 0, 3),  so  b = −2v₁ + 0v₂ + 3v₃,

which, as the reader can check, works. ◄

Now we modify the preceding example slightly.

► EXAMPLE 6

We would like to express the vector

b = (1, 1, 0, 1)

as a linear combination of the same vectors v₁, v₂, and v₃. This then leads analogously to the system
of equations

x₁ + x₂ + 2x₃ = 1
x₂ + x₃ = 1
x₁ + x₂ + x₃ = 0
2x₁ + x₂ + 2x₃ = 1

and to the augmented matrix

[ 1 1 2 | 1
  0 1 1 | 1
  1 1 1 | 0
  2 1 2 | 1 ],

whose echelon form is

[ 1 1 2 | 1
  0 1 1 | 1
  0 0 1 | 1
  0 0 0 | 1 ].

The last row of the augmented matrix corresponds to the equation

0x₁ + 0x₂ + 0x₃ = 1,

which obviously has no solution. Thus, the original system of equations has no solution: The vector
b in this example cannot be written as a linear combination of v₁, v₂, and v₃. ◄

These examples lead us to make the following

Definition If the system of equations Ax = b has no solutions, the system is said to
be inconsistent; if it has at least one solution, then it is said to be consistent.

A system of equations is consistent precisely when a solution exists. We see that the system
of equations in Example 6 is inconsistent and the system of equations in Example 5 is
consistent. It is easy to recognize an inconsistent system of equations from the echelon
form of its augmented matrix: The system is inconsistent only when there is an equation
that reads

0x₁ + 0x₂ + ⋯ + 0xₙ = c

for some nonzero scalar c, i.e., when there is a row in the echelon form of the augmented
matrix where all but the rightmost entry are 0.
Turning this around a bit, let [U | c] denote the echelon form of the augmented matrix
[A | b]. The system Ax = b is consistent if and only if any zero row in U corresponds to
a zero entry in the vector c.
There are two geometric interpretations of consistency. From the standpoint of row
vectors, the system Ax = b is consistent precisely when the intersection of the hyperplanes

A₁ · x = b₁, …, A_m · x = b_m

is nonempty. From the point of view of column vectors, the system Ax = b is consistent
precisely when the vector b can be written as a linear combination of the column vectors
a₁, …, aₙ of A.
In the next example, we characterize those vectors b ∈ ℝ⁴ that can be expressed as a
linear combination of the three vectors v₁, v₂, and v₃ from Examples 5 and 6.

► EXAMPLE 7

For what vectors

b = (b₁, b₂, b₃, b₄)

will the system of equations

x₁ + x₂ + 2x₃ = b₁
x₂ + x₃ = b₂
x₁ + x₂ + x₃ = b₃
2x₁ + x₂ + 2x₃ = b₄

have a solution? We form the augmented matrix [A | b] and determine its echelon form:

[ 1 1 2 | b₁      [ 1  1  2 | b₁            [ 1 1 2 | b₁
  0 1 1 | b₂        0  1  1 | b₂              0 1 1 | b₂
  1 1 1 | b₃   ⇝    0  0 −1 | b₃ − b₁   ⇝     0 0 1 | b₁ − b₃
  2 1 2 | b₄ ]      0 −1 −2 | b₄ − 2b₁ ]      0 0 0 | −b₁ + b₂ − b₃ + b₄ ].

We infer from the last row of the latter matrix that the original system of equations will have a solution
if and only if

(†)  −b₁ + b₂ − b₃ + b₄ = 0.

That is, the vector b can be written as a linear combination of v₁, v₂, and v₃ precisely when b satisfies
the constraint equation (†). ◄

► EXAMPLE 8

Given

A = [ 1 −1  1
      3  2 −1
      1  4 −3
      3 −3  3 ],

we wish to find all vectors b ∈ ℝ⁴ so that Ax = b is consistent, i.e., all vectors b that can be expressed
as a linear combination of the columns of A.
We consider the augmented matrix [A | b] and determine its echelon form [U | c]. In order for
the system to be consistent, every entry of c corresponding to a row of zeroes in U must be 0 as well:

[A | b] = [ 1 −1  1 | b₁      [ 1 −1  1 | b₁            [ 1 −1  1 | b₁
            3  2 −1 | b₂        0  5 −4 | b₂ − 3b₁        0  5 −4 | b₂ − 3b₁
            1  4 −3 | b₃   ⇝    0  5 −4 | b₃ − b₁    ⇝    0  0  0 | 2b₁ − b₂ + b₃
            3 −3  3 | b₄ ]      0  0  0 | b₄ − 3b₁ ]      0  0  0 | b₄ − 3b₁ ].

Thus, we conclude that Ax = b is consistent if and only if b satisfies the constraint equations

2b₁ − b₂ + b₃ = 0  and  −3b₁ + b₄ = 0.

These equations describe the intersection of two hyperplanes through the origin in ℝ⁴ with respective
normal vectors

(2, −1, 1, 0)  and  (−3, 0, 0, 1). ◄

Notice that here we have reversed the process at the beginning of this section. There
we expressed the general solution of a system of linear equations as a linear combination
of certain vectors. Here, starting with the column vectors of the matrix A, we have found
the constraint equations a vector b must satisfy in order to be a linear combination of them
(that is, to be in the plane they span). This is the process of determining Cartesian equations
for a space defined parametrically.
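This passage from a parametric description to constraint equations can be mechanized. In the sketch below (ours), we recover the constraints of Example 8 by finding all vectors y with yᵀA = 0: for any consistent b we then have y · b = yᵀAx = 0, so each such y yields one constraint equation.

```python
import sympy as sp

A = sp.Matrix([[1, -1,  1],
               [3,  2, -1],
               [1,  4, -3],
               [3, -3,  3]])

b = sp.Matrix(sp.symbols('b1 b2 b3 b4'))

# y^T A = 0 is the same as A^T y = 0
for y in A.T.nullspace():
    print(sp.Eq((y.T * b)[0], 0))
# prints Eq(2*b1 - b2 + b3, 0) and Eq(-3*b1 + b4, 0)
```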

1.2 Existence and Uniqueness of Solutions

In general, given an m × n matrix A, we might wonder how many conditions a vector b ∈ ℝᵐ
must satisfy in order to be a linear combination of the columns of A. From the procedure
we've just followed, the answer is quite clear: Each row of zeroes in the echelon form of
A contributes one constraint. This leads us to our next

Definition The rank of a matrix is the number of nonzero rows (i.e., the number of
pivots) in its echelon form. It is usually denoted by r.

Then the number of rows of zeroes in the echelon form is m − r, and b must satisfy m − r
constraint equations. We recall that even though a matrix may have lots of different echelon
forms, it follows from Theorem 1.2 that they all must have the same number of nonzero
rows.
Given a system of m linear equations in n variables, let A denote its coefficient matrix
and r the rank of A. Let's now summarize the state of our knowledge:

Proposition 1.3 The linear system Ax = b is consistent if and only if the rank of the
augmented matrix [A | b] equals the rank of A. In particular, if r = m, then the system
Ax = b will be consistent for all vectors b ∈ ℝᵐ.

Proof Ax = b is consistent if and only if the rank of the augmented matrix [A | b],
which is the number of nonzero rows in the augmented matrix [U | c], equals the number
of nonzero rows in U, i.e., the rank of A. When r = m, there is no row of zeroes in U,
hence no possibility of inconsistency. ■

We now turn our attention to the question of how many solutions a given consistent
system of equations has. Our experience with solving systems of equations suggests that
the solutions of a consistent linear system Ax = b are intimately related to the solutions of
the system Ax = 0.

Definition A system Ax = b of linear equations is called inhomogeneous when
b ≠ 0; the corresponding equation Ax = 0 is called the associated homogeneous system.

The solutions of the inhomogeneous system Ax = b and those of the associated homogeneous
system Ax = 0 are related by the following

Proposition 1.4 Assume the system Ax = b is consistent, and let u₁ be a "particular
solution." Then all the solutions are of the form

u = u₁ + v

for some solution v of the associated homogeneous system Ax = 0.

Proof First we observe that any such vector u is a solution of Ax = b. By linearity,
we have

Au = A(u₁ + v) = Au₁ + Av = b + 0 = b.

Conversely, every solution of Ax = b can be written in this form, for if u is an arbitrary
solution of Ax = b, then, by linearity again,

A(u − u₁) = Au − Au₁ = b − b = 0,

so v = u − u₁ is a solution of the associated homogeneous system; now we just solve for
u, obtaining u = u₁ + v, as required. ■

Remark As Figure 1.1 suggests, when the inhomogeneous system Ax = b is consistent,
its solutions are obtained by translating the set of solutions of the associated homogeneous
system by a particular solution u₁.

Figure 1.1

Of course, a homogeneous system is always consistent, since the trivial solution, x = 0,
is always a solution of Ax = 0. Now, if the rank of A is r, then there will be r pivot variables
and n - r free variables in the general solution of Ax = 0. In particular, if r = n, then x = 0
is the only solution of Ax = 0.

Definition If the system of equations Ax = b has precisely one solution, then we


say that the system has a unique solution.

Thus, a homogeneous system Ax = 0 has a unique solution when r = n and infinitely
many solutions when r < n. Note that it is impossible to have r > n, since there cannot
be more pivots than columns. Similarly, there cannot be more pivots than rows in the
matrix, so it follows that whenever n > m (i.e., there are more variables than equations),
the homogeneous system Ax = 0 must have infinitely many solutions.
From Proposition 1.4 we know that if the inhomogeneous system Ax = b is consistent,
then its solutions are obtained by translating the solutions of the associated homogeneous
system Ax = 0 by a particular solution. So we have

Proposition 1.5 Suppose the system Ax = b is consistent. Then it has a unique
solution if and only if the associated homogeneous system Ax = 0 has only the trivial
solution. This happens exactly when r = n.

We conclude this discussion with an important special case. It is natural to ask when
the inhomogeneous system Ax = b has a unique solution for every b ∈ R^m. From
Proposition 1.3 we infer that for the system always to be consistent, we must have r = m; from
Proposition 1.5 we infer that for solutions to be unique, we must have r = n. And so we
see that we can only have both conditions when r = m = n.

Definition An n × n matrix of rank r = n is called nonsingular. An n × n matrix
of rank r < n is called singular.

It is easy, but important, to observe that an n × n matrix is nonsingular if and only if
there is a pivot in each row, hence in each column, of its echelon form.

Proposition 1.6 Let A be an n × n matrix. The following are equivalent:

1. A is nonsingular.
2. Ax = 0 has only the trivial solution.
3. For every b ∈ R^n, the equation Ax = b has a unique solution.
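The three equivalent conditions of Proposition 1.6 can all be tested mechanically. A sketch in Python/sympy, using the matrix of Example 4 in Section 2 below (an aside, not part of the text):

    from sympy import Matrix, eye

    A = Matrix([[1, -1, 1],
                [2, -1, 0],
                [1, -2, 2]])
    print(A.rank() == 3)          # True: A is nonsingular
    print(A.nullspace() == [])    # True: Ax = 0 has only the trivial solution
    print(A.rref()[0] == eye(3))  # True: a pivot in every row and column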

► EXERCISES 4.1

1. Prove Proposition 1.1.


*2. Decide which of the following matrices are in echelon form, which are in reduced echelon form,
and which are neither. Justify your answers.
0 1
(a)
2

2 1 3
(b)
0 1 -1

3. For each of the following matrices A, determine its reduced echelon form and give the general
solution of Ax = 0 in standard form.
(a) A = [1 0 -1; -2 3 -1; 3 -3 0]
*(b) A = [2 -2 4; -1 1 -2; 3 -3 6]
(c) A = [1 2 -1; 1 3 1; 2 4 3; -1 1 6]
(d) A = [1 -2 1; 0 4 4; 2 -4 3]
*(e) A = [1 1 1 1; 1 2 1 2; 1 3 2 4; 1 2 2 3]
*(f) A = [1 2 0 -1 -1; -1 -3 1 2 3; 1 -1 3 1 1; 2 -3 7 3 4]
(g) A = [1 -1 1 1 0; 1 0 2 1 1; 0 2 2 2 0; -1 1 -1 0 -1]
(h) A = [1 1 0 5 0 -1; 0 1 1 3 -2 0; -1 2 3 4 1 -6; …]
(Here a matrix is written with its rows separated by semicolons.)

4. Give the general solution of the equation Ax = b in standard form.
*(a) A = [2 1; 1 2; -1 1], b = …
(b) A = [1 1; 3 3], b = …
(c) A = [1 1 1 -1 0; 2 0 4 1 -1; 1 2 0 -2 2; 0 1 -1 2 4], b = (-2, 10, -3, 7)

*5. Find all the unit vectors x ∈ R³ that make an angle of π/3 with the vectors (1, 0, -1) and (0, 1, 1).

6. Find the normal vector to the hyperplane in R4 spanned by

*7. A circle C passes through the points (2, 6), (-1, 7), and (-4, -2). Find the center and radius of C.
(Hint: The equation of a circle can be written in the form x² + y² + ax + by + c = 0. Why?)

*8. By solving a system of equations, find the linear combination of the vectors that gives
b = (3, 0, -2).
*9. For each of the following vectors b e R4, decide whether b is a linear combination of

10. Decide whether each of the following collections of vectors spans R3.
"1 1 1 1 3 2
(a) 1 2 (c) 0 9 -1 9 5 3
1 2 1 1 3 2
1 1 1 1 2 0
(b) 1 2 3 (d) 0 9 1 9 1
1 2 3 _ -1 _ 1_ _5 _
11. Find the constraint equations that b must satisfy in order for Ax = b to be consistent.

(a)

12. Find the constraint equations that b must satisfy in order to be an element of

(a)

13. Find a matrix A with the given property or explain why none can exist.
1
*(a) One of the rows of A is 0 and for some b e R2 both the vectors
1
of the equation Ax = b;

"0" r °"
1 , 0
(b) the rows of A are linear combinations of and and for some b e R2 both the vectors
0 1
“1' "4“
2 1
_ 1 _ L1_
and are solutions of the equation Ax = b;
1 0
_2_ _3 _
1
0
(c) the rows of A are orthogonal to and for some nonzero vector b e R2 both the vectors
1
"1" " 1"
0
0 1
and are solutions of the equation Ax = b;
1 1
_0_ _ 1_
"1 " ' 2'
(d) for some vectors bi, ba e R2 both the vectors 0 and 1 are solutions of the equation
" 1" " 1" _1_ _1_
Ax = bi and both the vectors 0 and 1 are solutions of the equation Ax = ba.
0 1

14. Let A be a 2 × 2 matrix whose first row is [a 3a].
(a) For which numbers a will A be singular?
(b) For all numbers a not on your list in part a, we can solve Ax = b for every vector b ∈ R². For
each of the numbers a on your list, give the vectors b for which we can solve Ax = b.

15. Let A = a 2 a
a a 1
(a) For which numbers a will A be singular?
(b) For all numbers a not on your list in part a, we can solve Ax = b for every vector b e R3. For
each of the numbers a on your list, give the vectors b for which we can solve Ax = b.
16. Prove or give a counterexample:
(a) If Ax = 0 has only the trivial solution x = 0, then Ax = b always has a unique solution.
(b) If Ax = 0 and Bx = 0 have the same solutions, then the set of vectors b so that Ax = b is
consistent is the same as the set of vectors b so that Bx = b is consistent.
17. (a) Suppose A and B are nonsingular n × n matrices. Prove that AB is nonsingular. (Hint:
Solve (AB)x = 0.)
(b) Suppose A and B are n × n matrices. Prove that if either A or B is singular, then AB is singular.
18. In each case, give positive integers m and n and an example of an m × n matrix A with the stated
property, or explain why none can exist.
*(a) Ax = b is inconsistent for every b ∈ R^m.
*(b) Ax = b has one solution for every b ∈ R^m.
(c) Ax = b has either zero or one solution for every b ∈ R^m.

(d) Ax = b has infinitely many solutions for every b ∈ R^m.
*(e) Ax = b has infinitely many solutions whenever it is consistent.
(f) There are vectors b₁, b₂, b₃ so that Ax = b₁ has no solution, Ax = b₂ has exactly one solution,
and Ax = b₃ has infinitely many solutions.
19. (a) Suppose A ∈ M_{m×n}, B ∈ M_{n×m}, and BA = Iₙ. Prove that if for some b ∈ R^m the equation
Ax = b has a solution, then that solution is unique.
(b) Suppose A ∈ M_{m×n}, C ∈ M_{n×m}, and AC = I_m. Prove that the system Ax = b is consistent for
every b ∈ R^m.
(c) Suppose A ∈ M_{m×n} and B, C ∈ M_{n×m} are matrices that satisfy BA = Iₙ and AC = I_m. Prove
that B = C.
20. Let A be an m × n matrix with row vectors A₁, …, A_m ∈ R^n.
(a) Suppose A₁ + ⋯ + A_m = 0. Prove that rank(A) < m. (Hint: Why must there be a row of
zeroes in the echelon form of A?)
(b) More generally, suppose there is some nontrivial linear combination c₁A₁ + ⋯ + c_mA_m = 0.
Prove rank(A) < m.
21. Let A be an m × n matrix with column vectors a₁, …, aₙ ∈ R^m.
(a) Suppose a₁ + ⋯ + aₙ = 0. Prove that rank(A) < n. (Hint: Consider solutions of Ax = 0.)
(b) More generally, suppose there is some nontrivial linear combination c₁a₁ + ⋯ + cₙaₙ = 0.
Prove rank(A) < n.

22. Let P_i = (x_i, y_i) ∈ R², i = 1, 2, 3. Assume x₁, x₂, and x₃ are distinct.
(a) Show that the matrix

    [1 x₁ x₁²; 1 x₂ x₂²; 1 x₃ x₃²]

is nonsingular.
(b) Show that the system of equations

    a x_i² + b x_i + c = y_i,    i = 1, 2, 3,

always has a unique solution. Deduce that if P₁, P₂, and P₃ are not collinear, then they lie on a unique
parabola y = ax² + bx + c.

23. Let P_i = (x_i, y_i) ∈ R², i = 1, 2, 3. Let

    A = [x₁ y₁ 1; x₂ y₂ 1; x₃ y₃ 1].

(a) Prove that the three points P₁, P₂, and P₃ are collinear if and only if the equation Ax = 0 has a
nontrivial solution. (Hint: A general line in R² is of the form ax + by + c = 0, where a and b are
not both 0.)
(b) Prove that if the three given points are not collinear, then there is a unique circle passing through
them. (Hint: If you set up a system of linear equations as suggested by the hint for Exercise 7, you
should use part a to deduce that the appropriate coefficient matrix is nonsingular.)

► 2 ELEMENTARY MATRICES AND
CALCULATING INVERSE MATRICES

So far we have focused on the interpretation of matrix multiplication in terms of columns,
namely, the fact that the jth column of AB is the product of A with the jth column vector
of B. But equally à propos is the observation that

the ith row of AB is the product of the ith row vector of A with B.

Just as multiplying the matrix A by a column vector x on the right,

    Ax = [a₁ a₂ ⋯ aₙ][x₁; x₂; ⋮; xₙ],

gives us the linear combination x₁a₁ + x₂a₂ + ⋯ + xₙaₙ of the columns of A, the reader
can easily check that multiplying A on the left by the row vector [x₁ x₂ ⋯ x_m]
yields the linear combination x₁A₁ + x₂A₂ + ⋯ + x_mA_m of the rows of A.
It should come as no surprise, then, that we can perform row operations on a matrix A
by multiplying on the left by appropriately chosen matrices. For example, if

then

“3 4" 1 2” ri 2"
ErA = 1 2 , e 2a = 3 4 , and E3A = 1 0
_5 6_ _20 24. _5 6_

Such matrices that give corresponding elementary row operations are called elementary
matrices. Note that each elementary matrix differs from the identity matrix only in a small
way. (N.B. Here we establish the custom that blank spaces in a matrix represent 0’s.)

i. To interchange rows i and j, we should multiply by the elementary matrix that
agrees with the identity matrix except that its (i, i) and (j, j) entries are 0 and its
(i, j) and (j, i) entries are 1.

ii. To multiply row i by a scalar c, we should multiply by the elementary matrix that
agrees with the identity matrix except that its (i, i) entry is c.

iii. To add c times row i to row j, we should multiply by the elementary matrix that
agrees with the identity matrix except that its (j, i) entry is c.

Here's an easy way to remember the form of these matrices: Each elementary matrix is
obtained by performing the corresponding elementary row operation on the identity matrix.
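The three types translate directly into code. Below is an illustrative sketch in Python/sympy (an aside, not part of the text; the helper names are our own), applied to the 3 × 2 matrix A of the example above:

    from sympy import Matrix, eye

    def E_swap(n, i, j):           # type (i): interchange rows i and j
        E = eye(n); E[i, i] = E[j, j] = 0; E[i, j] = E[j, i] = 1
        return E

    def E_scale(n, i, c):          # type (ii): multiply row i by c
        E = eye(n); E[i, i] = c
        return E

    def E_add(n, i, j, c):         # type (iii): add c times row i to row j
        E = eye(n); E[j, i] = c
        return E

    A = Matrix([[1, 2], [3, 4], [5, 6]])
    print(E_swap(3, 0, 1) * A)     # rows 1 and 2 interchanged
    print(E_scale(3, 2, 4) * A)    # row 3 multiplied by 4
    print(E_add(3, 0, 1, -2) * A)  # -2 times row 1 added to row 2

(Rows are indexed from 0 in Python, so row 1 of the text is index 0 here.)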

► EXAMPLE 1

Let A = [2 3 1; 1 4 3]. We put A in reduced echelon form by the following sequence of row
operations:

    [2 3 1; 1 4 3] → [1 3/2 1/2; 1 4 3] → [1 3/2 1/2; 0 5/2 5/2] → [1 3/2 1/2; 0 1 1] → [1 0 -1; 0 1 1].

These steps correspond to multiplying, in sequence from right to left, by the elementary matrices

    E₁ = [1/2 0; 0 1],  E₂ = [1 0; -1 1],  E₃ = [1 0; 0 2/5],  E₄ = [1 -3/2; 0 1];

now the reader can check that

    E = E₄E₃E₂E₁ = [4/5 -3/5; -1/5 2/5]

and, indeed,

    EA = [4/5 -3/5; -1/5 2/5][2 3 1; 1 4 3] = [1 0 -1; 0 1 1],

as it should. ◄

► EXAMPLE 2

Let's revisit Example 4 on p. 133. Let

    A = [1 1 3 -1 0; -1 1 1 1 2; 0 1 2 2 -1; 2 -1 0 1 -6].

To clear out the entries below the first pivot, we must multiply by the product of the two elementary
matrices E₁ and E₂:

    E₂E₁ = [1 0 0 0; 1 1 0 0; 0 0 1 0; -2 0 0 1],

and then by the product

    E₅E₄E₃ = [1 0 0 0; 0 1/2 0 0; 0 -1/2 1 0; 0 3/2 0 1].

We then change the pivot in the third row to 1 and clear out below, multiplying by

    E₇E₆ = [1 0 0 0; 0 1 0 0; 0 0 1/2 0; 0 0 -3/2 1].

Now we clear out above the pivots by multiplying by

    E₈ = [1 0 1 0; 0 1 0 0; 0 0 1 0; 0 0 0 1]  and  E₉ = [1 -1 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1].

The net result is this: When we multiply the product

    E₉E₈(E₇E₆)(E₅E₄E₃)(E₂E₁) = [1/4 -3/4 1/2 0; 1/2 1/2 0 0; -1/4 -1/4 1/2 0; 1/4 9/4 -3/2 1]

by the original matrix, we do in fact get the reduced echelon form. ◄

Recall from Section 1 that if we want to find the constraint equations that a vector b
must satisfy in order for Ax = b to be consistent, we reduce the augmented matrix [A | b]
to echelon form [U | c] and set equal to 0 those entries of c corresponding to the rows
of zeroes in U. That is, when A is an m × n matrix of rank r, the constraint equations
are merely the equations c_{r+1} = ⋯ = c_m = 0. Letting E be the product of the elementary
matrices corresponding to the elementary row operations required to put A in echelon form,
we have U = EA and so

(†)    [U | c] = [EA | Eb].

That is, the constraint equations are the equations

    E_{r+1} · b = 0,  …,  E_m · b = 0,

where E_{r+1}, …, E_m denote the last m - r row vectors of E.
Interestingly, we can use the equation (†) to find a simple way to compute E: When we
reduce the augmented matrix [A | b] to echelon form [U | c], E is the matrix so that Eb = c.

► EXAMPLE 3

Taking the matrix A from Example 2, let's find the constraint equations for Ax = b to be consistent.
We start with the augmented matrix

    [A | b] = [1 1 3 -1 0 | b₁; -1 1 1 1 2 | b₂; 0 1 2 2 -1 | b₃; 2 -1 0 1 -6 | b₄]

and reduce to echelon form:

    [U | c] = [1 1 3 -1 0 | b₁; 0 2 4 0 2 | b₁ + b₂; 0 0 0 4 -4 | -b₁ - b₂ + 2b₃; 0 0 0 0 0 | b₁ + 9b₂ - 6b₃ + 4b₄].

Now it is easy to see that if

    Eb = [b₁; b₁ + b₂; -b₁ - b₂ + 2b₃; b₁ + 9b₂ - 6b₃ + 4b₄],  then  E = [1 0 0 0; 1 1 0 0; -1 -1 2 0; 1 9 -6 4].

The reader should check that, in fact, EA = U.


We could continue our Gaussian elimination to reach reduced echelon form:

    [R | d] = [1 0 1 0 -2 | (1/4)b₁ - (3/4)b₂ + (1/2)b₃; 0 1 2 0 1 | (1/2)b₁ + (1/2)b₂; 0 0 0 1 -1 | -(1/4)b₁ - (1/4)b₂ + (1/2)b₃; 0 0 0 0 0 | b₁ + 9b₂ - 6b₃ + 4b₄].

From this we see that R = E′A, where

    E′ = [1/4 -3/4 1/2 0; 1/2 1/2 0 0; -1/4 -1/4 1/2 0; 1 9 -6 4],

which is very close to, but not the same as, the product of elementary matrices we obtained at the
end of Example 2. Can you explain why the first three rows must agree here, but not the last?
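One way to carry out this bookkeeping by machine is to reduce [A | I] instead of [A | b]: the right-hand block then records the row operations performed. An illustrative sketch in Python/sympy (an aside, not part of the text):

    from sympy import Matrix, eye

    A = Matrix([[1, 1, 3, -1, 0],
                [-1, 1, 1, 1, 2],
                [0, 1, 2, 2, -1],
                [2, -1, 0, 1, -6]])
    M = A.row_join(eye(4)).rref()[0]   # reduce the augmented matrix [A | I]
    R, E = M[:, :5], M[:, 5:]
    print(E * A == R)                  # True: the right block is a matrix E with EA = R
    print(E.row(3))                    # [1, 9, -6, 4]: the constraint coefficients

Since row operations are not unique, this E need not agree with E′ entry for entry in its first three rows, but the zero-row coefficients [1, 9, -6, 4] are forced.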

We now concentrate on square (n × n) matrices. Recall that the inverse of the n × n
matrix A is the matrix A⁻¹ satisfying AA⁻¹ = A⁻¹A = Iₙ. It is convenient to have an
inverse matrix if we wish to solve the system Ax = b for numerous vectors b. If A is
invertible, we can solve as follows³:

³We will write the "implies" symbol "⟹" vertically so that we can indicate the reasoning in each step.

    Ax = b
    ⇓  multiplying both sides of the equation by A⁻¹ on the left
    A⁻¹(Ax) = A⁻¹b
    ⇓  using the associative property
    (A⁻¹A)x = A⁻¹b
    ⇓  using the definition of A⁻¹
    x = Iₙx = A⁻¹b.

We aren't done! We've shown that if x is a solution, then it must satisfy x = A⁻¹b. That
is, we've shown that the vector A⁻¹b is a candidate for a solution. But now we check that
it truly is a solution by straightforward calculation:

    Ax = A(A⁻¹b) = (AA⁻¹)b = Iₙb = b,

as required; but note that we have used both pieces of the definition of the inverse matrix
to prove that the system has a unique solution (which we "discovered" along the way).
It is a consequence of this computation that if A is an invertible n × n matrix, then
Ax = c has a unique solution for every c ∈ R^n, and so it follows from Proposition 1.6
that A must be nonsingular. What about the converse? If A is nonsingular, must A be
invertible? Well, if A is nonsingular, we know that every equation Ax = c has a unique
solution. In particular, for j = 1, …, n, there is a unique vector bⱼ that solves Abⱼ = eⱼ,
the jth standard basis vector. If we let B be the n × n matrix whose column vectors are
b₁, …, bₙ, then we have

    AB = A[b₁ b₂ ⋯ bₙ] = [Ab₁ Ab₂ ⋯ Abₙ] = [e₁ e₂ ⋯ eₙ] = Iₙ.

This suggests that the matrix we've constructed should be the inverse matrix of A. But we
need to know that BA = Iₙ as well. Here is a very elegant way to understand why this is
so. We can find the matrix B by forming the giant augmented matrix

    [A | Iₙ]

and using Gaussian elimination to obtain the reduced echelon form

    [Iₙ | B].

(Note that the reduced echelon form of A must be Iₙ because A is nonsingular.) But this
tells us that if E is the product of the elementary matrices required to put A in reduced
echelon form, then we have

    E[A | Iₙ] = [Iₙ | B],

and so B = E and BA = Iₙ, which is what we needed to check. In conclusion, we have
proved the following

Theorem 2.1 An n x n matrix is nonsingular if and only if it is invertible.

Note that Gaussian elimination will also let us know when A is not invertible: If we
come to a row of zeroes while reducing A to echelon form, then, of course, A is singular
and so it cannot be invertible. The following observation is often very useful.

Corollary 2.2 If A and B are n × n matrices satisfying BA = Iₙ, then B = A⁻¹ and
A = B⁻¹.

Proof By Exercise 4.1.19a, the equation Ax = 0 has only the trivial solution. Hence,
by Proposition 1.6, A is nonsingular; according to Theorem 2.1, A is therefore invertible.
Since A has an inverse matrix, A⁻¹, we deduce that

    BA = Iₙ
    ⇓  multiplying both sides of the equation by A⁻¹ on the right
    (BA)A⁻¹ = IₙA⁻¹
    ⇓  using the associative property
    B(AA⁻¹) = A⁻¹
    ⇓  using the definition of A⁻¹
    B = A⁻¹,

as desired. Since AB = Iₙ and BA = Iₙ, it now follows that A = B⁻¹, as well. ■

> EXAMPLE 4

We wish to determine the inverse of the matrix

    A = [1 -1 1; 2 -1 0; 1 -2 2]

(if it exists). We apply Gaussian elimination to the augmented matrix:

    [A | I] = [1 -1 1 | 1 0 0; 2 -1 0 | 0 1 0; 1 -2 2 | 0 0 1]
            → [1 -1 1 | 1 0 0; 0 1 -2 | -2 1 0; 0 -1 1 | -1 0 1]
            → [1 -1 1 | 1 0 0; 0 1 -2 | -2 1 0; 0 0 -1 | -3 1 1]
            → [1 -1 0 | -2 1 1; 0 1 0 | 4 -1 -2; 0 0 1 | 3 -1 -1]
            → [1 0 0 | 2 0 -1; 0 1 0 | 4 -1 -2; 0 0 1 | 3 -1 -1].

It follows that

    A⁻¹ = [2 0 -1; 4 -1 -2; 3 -1 -1].

(The reader should check our arithmetic by multiplying AA⁻¹ or A⁻¹A.) ◄
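The same augmented-matrix computation in Python/sympy (a sketch offered as an aside, not part of the text):

    from sympy import Matrix, eye

    A = Matrix([[1, -1, 1],
                [2, -1, 0],
                [1, -2, 2]])
    M = A.row_join(eye(3)).rref()[0]   # reduce [A | I]
    Ainv = M[:, 3:]                    # the right-hand block is A^(-1)
    print(Ainv)                        # Matrix([[2, 0, -1], [4, -1, -2], [3, -1, -1]])
    print(A * Ainv == eye(3), Ainv * A == eye(3))   # True True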

> EXAMPLE 5

It is convenient to derive the formula for the inverse of a general 2 × 2 matrix first given in Example
9 of Chapter 1, Section 4. Let

    A = [a b; c d].

We assume a ≠ 0 to start with. Then

    [a b | 1 0; c d | 0 1] → [1 b/a | 1/a 0; c d | 0 1] → [1 b/a | 1/a 0; 0 (ad-bc)/a | -c/a 1]    (assuming ad - bc ≠ 0)
    → [1 b/a | 1/a 0; 0 1 | -c/(ad-bc) a/(ad-bc)] → [1 0 | d/(ad-bc) -b/(ad-bc); 0 1 | -c/(ad-bc) a/(ad-bc)],

and so we see that, provided ad - bc ≠ 0,

    A⁻¹ = 1/(ad - bc) [d -b; -c a].

As a check, we have

    [a b; c d] · 1/(ad - bc) [d -b; -c a] = I₂ = 1/(ad - bc) [d -b; -c a] · [a b; c d].

Of course, we have derived this by assuming a ≠ 0, but the reader can check easily that the formula
works fine even when a = 0. We do see, however, from the row reduction that

    [a b; c d] is nonsingular ⟺ ad - bc ≠ 0. ◄
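A quick symbolic sanity check of this formula, again an illustrative Python/sympy sketch rather than part of the development:

    from sympy import Matrix, symbols, simplify, eye

    a, b, c, d = symbols('a b c d')
    A = Matrix([[a, b], [c, d]])
    Ainv = Matrix([[d, -b], [-c, a]]) / (a*d - b*c)
    print(simplify(A * Ainv) == eye(2))   # True (valid whenever ad - bc != 0)
    print(simplify(Ainv * A) == eye(2))   # True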

We have shown in the course of proving Theorem 2.1 that when A is square, any B that
satisfies AB = I (a so-called right inverse of A) must also satisfy BA = I (and thus is a left
inverse of A). Likewise, we have established in Corollary 2.2 that when A is square, any
left inverse of A is a bona fide inverse of A. Indeed, it will never happen that a nonsquare
matrix has both a left and a right inverse (see Exercise 9).

Remark Even when A is square, the left and right inverses have rather different
interpretations. As we saw in the proof of Theorem 2.1, the columns of the right inverse
arise as the solutions of Ax = eⱼ. On the other hand, the left inverse of A is the product
of the elementary matrices by which we reduce A to its reduced echelon form, I. (See
Exercise 8.)

► EXERCISES 4.2

*1. For each of the matrices A in Exercise 4.1.3, find a product of elementary matrices E = E_k ⋯ E₂E₁
so that EA is the reduced echelon form of A. Use the matrix E you've found to give constraint
equations for Ax = b to be consistent.
2. Use Gaussian elimination to find A-1 (if it exists):
1 21 r1 2
WA= -> 3 (d)A= 4 5

*(c) A = 0 2 1

3. In each case, given A and b,
i. Find A⁻¹.
ii. Use your answer to (i) to solve Ax = b.
iii. Use your answer to (ii) to express b as a linear combination of the columns of A.
(a) A = [2 3; 3 5], b = (3, 4)
*(b) A = [1 1 1; 0 2 3; 3 2 2], b = (2, 1, 2)
(c) A = [1 1 1; 0 1 1; 1 2 1], b = (3, 0, 1)
*(d) A = [1 1 1 1; 0 1 1 1; 0 0 1 3; 0 0 1 4], b = (2, 0, 1, 1)

4. (a) Find two different right inverses of the matrix A = [1 -1 1; 2 -1 0].
(b) Give a nonzero matrix that has no right inverse.
(c) Find two left inverses of the matrix A = [1 2; 0 -1; 1 1].
(d) Give a nonzero matrix that has no left inverse.
5. Prove that the inverse of every elementary matrix is again an elementary matrix. Indeed, give a
simple prescription for determining the inverse of each type of elementary matrix.

6. Using Theorem 2.1 and Proposition 4.3 of Chapter 1, prove that if AB and B are nonsingular, then
A is nonsingular. (See Exercise 4.1.17.)
7. Suppose A is an invertible m × m matrix and B is an invertible n × n matrix.
(a) Prove that the matrix

    [A O; O B]

is invertible and give a formula for its inverse.
(b) Suppose C is an arbitrary m × n matrix. Is the matrix

    [A C; O B]

invertible?
(See Exercise 1.4.12 for the notion of block multiplication.)
8. Complete the following alternative argument that the matrix obtained by Gaussian elimination
must be the inverse matrix of A. Suppose A is nonsingular.
(a) Show there are finitely many elementary matrices E₁, E₂, …, E_k so that E_kE_{k-1} ⋯ E₂E₁A = I.
(b) Let B = E_k ⋯ E₂E₁. Prove that AB = I. (Hint: Use Proposition 4.3 of Chapter 1.)
9. Let A be an m × n matrix. Recall that the n × m matrix B is a left inverse of A if BA = Iₙ and a
right inverse if AB = I_m.
(a) Show that A has a right inverse if and only if we can solve Ax = b for every b ∈ R^m if and only
if rank(A) = m.
(b) Show that A has a left inverse if and only if Ax = 0 has the unique solution x = 0 if and only if
rank(A) = n. (Hint for ⇐: If rank(A) = n, what is the reduced echelon form of A?)
(c) Show that A has both a left inverse and a right inverse if and only if A is invertible if and only if
m = n = rank(A).

► 3 LINEAR INDEPENDENCE, BASIS, AND DIMENSION

Given vectors v₁, …, v_k ∈ R^n and v ∈ R^n, it is natural to ask whether v ∈ Span(v₁, …, v_k).
That is, do there exist scalars c₁, …, c_k so that v = c₁v₁ + c₂v₂ + ⋯ + c_kv_k? This is in
turn a question of whether a certain (inhomogeneous) system of linear equations has a solution.
As we saw in Section 1, one is often interested in the allied question: Is that solution
unique?

► EXAMPLE 1

Let

    v₁ = (1, 1, 2),  v₂ = (1, -1, 0),  v₃ = (1, 0, 1),  and  v = (1, 1, 0).

We ask first of all whether v ∈ Span(v₁, v₂, v₃). This is a familiar question when we recast it in matrix
notation: Let

    A = [1 1 1; 1 -1 0; 2 0 1]  and  b = (1, 1, 0).

Is the system Ax = b consistent? Immediately we write down the appropriate augmented matrix and
reduce to echelon form:

    [1 1 1 | 1; 1 -1 0 | 1; 2 0 1 | 0] → [1 1 1 | 1; 0 -2 -1 | 0; 0 0 0 | -2].

The last row gives the equation 0 = -2, so the system is inconsistent and v ∉ Span(v₁, v₂, v₃).
Consider next the vector

    w = (2, 3, 5).

As the reader can easily check, w = 3v₁ - v₃, so w ∈ Span(v₁, v₂, v₃). What's more, w = 2v₁ -
v₂ + v₃, as well. So, obviously, there is no unique expression for w as a linear combination of v₁, v₂,
and v₃. But we can conclude more: Setting the two expressions for w equal, we obtain

    3v₁ - v₃ = 2v₁ - v₂ + v₃,  i.e.,  v₁ + v₂ - 2v₃ = 0.

That is, there is a nontrivial relation among the vectors v₁, v₂, and v₃, and this is the reason we
have different ways of expressing w as a linear combination of the three of them. Indeed, since
v₁ = -v₂ + 2v₃, we can see easily that any linear combination of v₁, v₂, and v₃ is a linear combination
just of v₂ and v₃:

    c₁v₁ + c₂v₂ + c₃v₃ = c₁(-v₂ + 2v₃) + c₂v₂ + c₃v₃ = (c₂ - c₁)v₂ + (c₃ + 2c₁)v₃.

The vector v₁ was redundant because

    Span(v₁, v₂, v₃) = Span(v₂, v₃).

We might surmise that the vector w can now be written uniquely as a linear combination of v₂ and
v₃, and this is easy to check:

    [A′ | w] = [1 1 | 2; -1 0 | 3; 0 1 | 5] → [1 1 | 2; 0 1 | 5; 0 0 | 0],

and from the fact that the matrix A′ has rank 2 we infer that the system of equations has a unique
solution. ◄

Remark In the language of functions, if A is the standard matrix of a linear map
T: R^n → R^m, we are interested in the image of T (i.e., the set of w ∈ R^m so that w = T(v)
for some v ∈ R^n) and the issue of whether T is one-to-one (i.e., given w in the image, is
there exactly one v ∈ R^n so that T(v) = w?).

Generalizing the preceding example, we now recast Proposition 1.5:

Proposition 3.1 Let v₁, …, v_k ∈ R^n and let V = Span(v₁, …, v_k). An arbitrary
vector v ∈ Span(v₁, …, v_k) has a unique expression as a linear combination of v₁, …, v_k
if and only if the zero vector has a unique expression as a linear combination of v₁, …, v_k;
i.e.,

    c₁v₁ + c₂v₂ + ⋯ + c_kv_k = 0  ⟹  c₁ = c₂ = ⋯ = c_k = 0.


Proof Suppose for some v ∈ V there are two different expressions

    v = c₁v₁ + c₂v₂ + ⋯ + c_kv_k  and  v = d₁v₁ + d₂v₂ + ⋯ + d_kv_k.

Then, subtracting, we obtain

    0 = (c₁ - d₁)v₁ + ⋯ + (c_k - d_k)v_k,

and so the zero vector has a nontrivial representation as a linear combination of v₁, …, v_k
(by which we mean that not all the coefficients are 0).
Conversely, suppose there is a nontrivial linear combination

    0 = s₁v₁ + ⋯ + s_kv_k.

Then, given any vector v ∈ V, we can express v as a linear combination of v₁, …, v_k in
several ways: for instance, adding

    v = c₁v₁ + c₂v₂ + ⋯ + c_kv_k  and  0 = s₁v₁ + s₂v₂ + ⋯ + s_kv_k,

we obtain another formula for v, namely,

    v = (c₁ + s₁)v₁ + ⋯ + (c_k + s_k)v_k.

This completes the proof. ■

This discussion leads us to make the following

Definition The (indexed) set of vectors {v₁, …, v_k} is called linearly independent
if

    c₁v₁ + c₂v₂ + ⋯ + c_kv_k = 0  ⟹  c₁ = c₂ = ⋯ = c_k = 0,

i.e., if the only way of expressing the zero vector as a linear combination of v₁, …, v_k is
the trivial linear combination 0v₁ + ⋯ + 0v_k.
The set of vectors {v₁, …, v_k} is called linearly dependent if it is not linearly independent,
i.e., if there is some expression

    c₁v₁ + c₂v₂ + ⋯ + c_kv_k = 0,  where not all the cᵢ's are 0.

Remark The language is problematic here. Many mathematicians, often including
the author of this text, tend to say things like "the vectors v₁, …, v_k are linearly independent."
But linear independence (or dependence) is a property of the whole collection of
vectors, not of the individual vectors. What's worse, we really should refer to an ordered
list of vectors rather than to a set of vectors: For example, any list in which some vector,
v, appears twice is obviously giving a linearly dependent collection; but the set {v, v} is
indistinguishable from the set {v}. There seems to be no ideal route out of this morass.
Having said all this, we warn the gentle reader that we may occasionally say "the vectors
v₁, …, v_k are linearly (in)dependent" where it would be too clumsy to be more pedantic.
Just stay alert!

Remark Here is a piece of advice: It is virtually always the case that when you are
presented with a set of vectors {v₁, …, v_k} that you are to prove linearly independent, you
should write

"Suppose c₁v₁ + c₂v₂ + ⋯ + c_kv_k = 0. I must show that c₁ = ⋯ = c_k = 0."

You then use whatever hypotheses you're given to arrive at that conclusion.

► EXAMPLE 2

We wish to decide whether the vectors v₁, v₂, v₃ ∈ R⁴ form a linearly independent set. Suppose
c₁v₁ + c₂v₂ + c₃v₃ = 0. Can we conclude that c₁ = c₂ = c₃ = 0? We recognize this as a homogeneous
system of linear equations:

    [v₁ v₂ v₃][c₁; c₂; c₃] = 0.

By now we are old hands at solving such systems. We find that the echelon form of the coefficient
matrix is

    [1 2 1; 0 1 1; 0 0 0; 0 0 0],

and so our system of equations in fact has infinitely many solutions. For example, we can take c₁ = 1,
c₂ = -1, and c₃ = 1. The vectors therefore form a linearly dependent set. ◄

► EXAMPLE 3

Suppose u, v, w ∈ R^n. If {u, v, w} is linearly independent, then we wish to show next that
{u + v, v + w, u + w} is likewise linearly independent. Suppose

    c₁(u + v) + c₂(v + w) + c₃(u + w) = 0.

We must show that c₁ = c₂ = c₃ = 0. We use the distributive property to rewrite our equation as

    (c₁ + c₃)u + (c₁ + c₂)v + (c₂ + c₃)w = 0.

Since {u, v, w} is linearly independent, we may infer that

    c₁ + c₃ = 0
    c₁ + c₂ = 0
    c₂ + c₃ = 0,

and we leave it to the reader to check that the only solution of this system of equations is, in fact,
c₁ = c₂ = c₃ = 0, as desired. ◄

► EXAMPLE 4

Any time one has a list of vectors v₁, …, v_k in which one of the vectors is the zero vector, say v₁ = 0,
then the set of vectors must be linearly dependent, because the equation

    1v₁ + 0v₂ + ⋯ + 0v_k = 0

is a nontrivial linear combination of the vectors yielding the zero vector. ◄

► EXAMPLE 5

How can two nonzero vectors u and v give rise to a linearly dependent set? By definition, this means
that there is a linear combination

    au + bv = 0,

where either a ≠ 0 or b ≠ 0. Suppose a ≠ 0. Then we may write u = -(b/a)v, so u is a scalar multiple
of v. (Similarly, you may show that if b ≠ 0, v must be a scalar multiple of u.) So two linearly
dependent vectors are parallel (and vice versa).
How can a collection of three nonzero vectors be linearly dependent? As before, there must be
a linear combination

    au + bv + cw = 0,

where (at least) one of a, b, and c is nonzero. Say a ≠ 0. This means that we can solve:

    u = -(1/a)(bv + cw) = (-b/a)v + (-c/a)w,

so u ∈ Span(v, w). In particular, Span(u, v, w) is either a line (if all three vectors u, v, w are parallel)
or a plane. ◄

The appropriate generalization of the last example is the following useful

Proposition 3.2 Suppose v₁, …, v_k ∈ R^n form a linearly independent set, and
suppose x ∈ R^n. Then {v₁, …, v_k, x} is linearly independent if and only if x ∉
Span(v₁, …, v_k).

Proof Although Figure 3.1 suggests the result is quite plausible, we will prove the
contrapositive:

    {v₁, …, v_k, x} is linearly dependent if and only if x ∈ Span(v₁, …, v_k).


Figure 3.1

Suppose x ∈ Span(v₁, …, v_k). Then x = c₁v₁ + c₂v₂ + ⋯ + c_kv_k for some scalars c₁,
…, c_k, so

    c₁v₁ + c₂v₂ + ⋯ + c_kv_k + (-1)x = 0,

from which we conclude that {v₁, …, v_k, x} is linearly dependent (since at least one of the
coefficients is nonzero).
Now suppose {v₁, …, v_k, x} is linearly dependent. This means that there are scalars
c₁, …, c_k, and c, not all 0, so that

    c₁v₁ + c₂v₂ + ⋯ + c_kv_k + cx = 0.

Note that we cannot have c = 0, for if c were 0, we'd have c₁v₁ + c₂v₂ + ⋯ + c_kv_k = 0,
and linear independence of {v₁, …, v_k} implies c₁ = ⋯ = c_k = 0, which contradicts our
assumption that {v₁, …, v_k, x} is linearly dependent. Therefore, c ≠ 0, and so

    x = -(1/c)(c₁v₁ + c₂v₂ + ⋯ + c_kv_k) = (-c₁/c)v₁ + (-c₂/c)v₂ + ⋯ + (-c_k/c)v_k,

which tells us that x ∈ Span(v₁, …, v_k), as required. ■

Proposition 3.2 has the following consequence: If {v₁, …, v_k} is linearly independent,
then

    Span(v₁) ⊊ Span(v₁, v₂) ⊊ ⋯ ⊊ Span(v₁, …, v_k).

That is, with each additional vector, the subspace spanned gets larger. We now formalize
the notion of "size" of a subspace. But we now understand that when we have a set of
linearly independent vectors, no proper subset will yield the same span. In other words, we
will have an "efficient" set of spanning vectors (i.e., there is no redundancy in the vectors
we've chosen: No proper subset will do). This motivates the following

Definition Let V ⊂ R^n be a subspace. The set of vectors {v₁, …, v_k} is called a
basis for V if

i. v₁, …, v_k span V; i.e., V = Span(v₁, …, v_k), and
ii. {v₁, …, v_k} is linearly independent.

We comment that the plural of basis is bases.



► EXAMPLE 6

Recall that the vectors e₁, …, eₙ, where eⱼ has a 1 in the jth entry and 0's elsewhere,
are called the standard basis vectors for R^n. To check that they make up a basis, we must establish
that properties (i) and (ii) above hold for V = R^n. The first is obvious: If x ∈ R^n, then x = x₁e₁ +
x₂e₂ + ⋯ + xₙeₙ. The second is not much harder. Suppose c₁e₁ + c₂e₂ + ⋯ + cₙeₙ = 0. Then this
means that

    (c₁, c₂, …, cₙ) = 0,

and so c₁ = c₂ = ⋯ = cₙ = 0. ◄

► EXAMPLE 7

Consider the plane given by V = {x e R3: Xi - x2 4- 2x3 = 0} c R3. Our algorithms of Section 1
tell us that the vectors

span V. Since these vectors are not parallel, we can deduce (see Example 5) that they must be linearly
independent.
For the practice, however, we give a direct argument. Suppose

” 1' " —2 "


CiVi 4- c2v2 = ci 1 4-c 2 0 = 0.
0 1

Writing out the entries explicitly, we obtain

from which we conclude that ci = c2 = 0, as required. (For future reference, we note that this
information came from the free variable “slots.”) Therefore, {vi, v2J is linearly independent and
gives a basis for V, as required.
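Since V is exactly the nullspace of the 1 × 3 matrix [1 -1 2], the spanning vectors can also be produced by machine. An illustrative Python/sympy sketch (an aside, not part of the text):

    from sympy import Matrix

    A = Matrix([[1, -1, 2]])   # the plane x1 - x2 + 2*x3 = 0 is N(A)
    print(A.nullspace())       # [Matrix([[1], [1], [0]]), Matrix([[-2], [0], [1]])]

These are precisely the vectors v₁ and v₂ found above.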

The following observation may prove useful.

Corollary 3.3 Let V ⊂ R^n be a subspace, and let v₁, …, v_k ∈ V. Then {v₁, …, v_k}
is a basis for V if and only if every vector of V can be written uniquely as a linear
combination of v₁, …, v_k.

Proof This is immediate from Proposition 3.1. ■

Definition When we write v = c₁v₁ + c₂v₂ + ⋯ + c_kv_k, we refer to c₁, …, c_k as
the coordinates of v with respect to the (ordered) basis {v₁, …, v_k}.

► EXAMPLE 8

Consider the three vectors

    v₁ = (1, 2, 1),  v₂ = (1, 1, 2),  and  v₃ = (1, 0, 2).

Let's take a general vector b ∈ R³ and ask first of all whether it has a unique expression as a linear
combination of v₁, v₂, and v₃. Forming the augmented matrix and row reducing, we find

    [1 1 1 | b₁; 2 1 0 | b₂; 1 2 2 | b₃] → [1 0 0 | 2b₁ - b₃; 0 1 0 | -4b₁ + b₂ + 2b₃; 0 0 1 | 3b₁ - b₂ - b₃].

It follows from Corollary 3.3 that {v₁, v₂, v₃} is a basis for R³, for an arbitrary vector b ∈ R³ can be
written in the form

    b = (2b₁ - b₃)v₁ + (-4b₁ + b₂ + 2b₃)v₂ + (3b₁ - b₂ - b₃)v₃.

And, what's more,

    c₁ = 2b₁ - b₃,
    c₂ = -4b₁ + b₂ + 2b₃,  and
    c₃ = 3b₁ - b₂ - b₃

give the coordinates of b with respect to the basis {v₁, v₂, v₃}. ◄
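The coordinate formulas amount to computing A⁻¹b, where the columns of A are the basis vectors. An illustrative Python/sympy sketch with symbolic b (an aside, not part of the text):

    from sympy import Matrix, symbols

    b1, b2, b3 = symbols('b1 b2 b3')
    A = Matrix([[1, 1, 1],
                [2, 1, 0],
                [1, 2, 2]])       # columns are v1, v2, v3
    b = Matrix([b1, b2, b3])
    print(A.inv() * b)            # [2*b1 - b3, -4*b1 + b2 + 2*b3, 3*b1 - b2 - b3]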

Another example, which will be quite important to us in the future, is

Proposition 3.4 Let A be an n × n matrix. Then A is nonsingular if and only if its
column vectors form a basis for R^n.

Proof As usual, let's denote the column vectors of A by a₁, a₂, …, aₙ. Using Corollary
3.3, we are to prove that A is nonsingular if and only if every vector in R^n can be written
uniquely as a linear combination of a₁, a₂, …, aₙ. But this is exactly what Proposition 1.6
tells us. ■

Given a subspace V ⊂ R^n, how do we know there is some basis for it? This is a
consequence of Proposition 3.2 as well.

Theorem 3.5 Any subspace V ⊂ R^n other than the trivial subspace has a basis.

Proof Since V ≠ {0}, we choose a nonzero vector v₁ ∈ V. If v₁ spans V, then we
know {v₁} will constitute a basis for V. If not, choose v₂ ∉ Span(v₁). From Proposition
3.2 we infer that {v₁, v₂} is linearly independent. If v₁, v₂ span V, then {v₁, v₂} will be a
basis for V. If not, choose v₃ ∉ Span(v₁, v₂). Once again, we know that {v₁, v₂, v₃} will
be linearly independent and hence will form a basis for V if the three vectors span V. We
continue in this fashion, and we are guaranteed that the process will terminate in at most n
steps because, according to Exercise 6, once we have n + 1 vectors in R^n, they must form
a linearly dependent set. ■

Once we realize that every subspace V ⊂ R^n has some basis, we are confronted with
the problem that it has many of them. For example, Proposition 3.4 gives us a way of
finding zillions of bases for R^n. As we shall now show, all bases for a given subspace have
one thing in common: They all consist of the same number of elements.

Proposition 3.6 Let V ⊂ R^n be a subspace, let {v₁, …, v_k} be a basis for V, and let
w₁, …, w_ℓ ∈ V. If ℓ > k, then {w₁, …, w_ℓ} must be linearly dependent.

Proof Each vector in V can be written uniquely as a linear combination of v₁, …, v_k.
So let's write each vector w₁, …, w_ℓ as such:

    w₁ = a₁₁v₁ + a₂₁v₂ + ⋯ + a_{k1}v_k
    w₂ = a₁₂v₁ + a₂₂v₂ + ⋯ + a_{k2}v_k
        ⋮
    w_ℓ = a_{1ℓ}v₁ + a_{2ℓ}v₂ + ⋯ + a_{kℓ}v_k.

We now form the k × ℓ matrix A = [a_{ij}]. This gives the matrix equation

(*)    [w₁ w₂ ⋯ w_ℓ] = [v₁ v₂ ⋯ v_k] A.

Since ℓ > k, there cannot be a pivot in every column of A, and so there is a nonzero
vector c = (c₁, …, c_ℓ) satisfying

    Ac = 0.

Using (*) and associativity, we have

    [w₁ w₂ ⋯ w_ℓ] c = ([v₁ v₂ ⋯ v_k] A) c = [v₁ v₂ ⋯ v_k] (Ac) = 0.

That is, we have found a nontrivial linear combination

    c₁w₁ + ⋯ + c_ℓw_ℓ = 0,

which means that {w₁, …, w_ℓ} is linearly dependent, as was claimed. ■

Remark We can easily avoid equation (*) in its matrix form. Since

    w_j = Σ_{i=1}^{k} a_{ij} v_i,

we have

(**)    Σ_{j=1}^{ℓ} c_j w_j = Σ_{j=1}^{ℓ} c_j ( Σ_{i=1}^{k} a_{ij} v_i ) = Σ_{i=1}^{k} ( Σ_{j=1}^{ℓ} a_{ij} c_j ) v_i.

As before, since ℓ > k, there is a nonzero vector c so that Ac = 0; this choice of c makes the
right-hand side of (**) the zero vector. Consequently, there is a nontrivial relation among
w₁, …, w_ℓ.

This proposition leads directly to our main result.

Theorem 3.7 Let V ⊂ R^n be a subspace, and let {v₁, …, v_k} and {w₁, …, w_ℓ} be
two bases for V. Then we have k = ℓ.

Proof Since {v₁, …, v_k} forms a basis for V and {w₁, …, w_ℓ} is known to be linearly
independent, we use Proposition 3.6 to conclude that ℓ ≤ k. Now here's the trick:
{w₁, …, w_ℓ} is likewise a basis for V and {v₁, …, v_k} is known to be linearly independent,
so we infer from Proposition 3.6 that k ≤ ℓ. The only way both inequalities can hold is for
k and ℓ to be equal, as we wished to show. ■

We now make the official

Definition The dimension of a subspace V ⊂ R^n is the number of vectors in any
basis for V. We denote the dimension of V by dim V. By convention, dim{0} = 0.

As we shall see in our applications, dimension is a powerful tool. Here is the first
instance.

Lemma 3.8 Suppose V and W are subspaces of R^n with the property that W ⊂ V.
If dim V = dim W, then V = W.

Proof Let dim W = k and let {v₁, …, v_k} be a basis for W. If W ⊊ V, then there
must be a vector v ∈ V with v ∉ W. By virtue of Proposition 3.2, we know that {v₁, …,
v_k, v} is linearly independent, so dim V ≥ k + 1. This is a contradiction. Therefore,
V = W. ■

The next result is quite useful.

Proposition 3.9 Let V ⊂ R^n be a k-dimensional subspace. Then any k vectors that
span V must be linearly independent, and any k linearly independent vectors in V must
span V.

Proof Left to the reader in Exercise 17. ■

► EXAMPLE 9

Let V = Span(v₁, v₂, v₃, v₄) ⊂ R³, where

    v₁ = (1, 1, 2),  v₂ = (2, 2, 4),  v₃ = (0, 1, 1),  and  v₄ = (3, 4, 7).

We want a subset of {v₁, v₂, v₃, v₄} that will give us a basis for V. Of course, this set of four
vectors must be linearly dependent, since V ⊂ R³ and R³ is only 3-dimensional. But let's examine
the solutions of

    c₁v₁ + c₂v₂ + c₃v₃ + c₄v₄ = 0,

or, in matrix form,

    [1 2 0 3; 1 2 1 4; 2 4 1 7][c₁; c₂; c₃; c₄] = 0.

As usual, we proceed to reduced echelon form:

    R = [1 2 0 3; 0 0 1 1; 0 0 0 0],

from which we find that the vectors

    (-2, 1, 0, 0)  and  (-3, 0, -1, 1)

span the space of solutions. In particular, this tells us that

    -2v₁ + v₂ = 0  and  -3v₁ - v₃ + v₄ = 0,

and so the vectors v₂ and v₄ can be expressed as linear combinations of the vectors v₁ and v₃. On the
other hand, {v₁, v₃} is linearly independent (why?), so this gives a basis for V. ◄
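Extracting a basis from a spanning set in this way is exactly what the pivot columns record. An illustrative Python/sympy sketch (an aside, not part of the text):

    from sympy import Matrix

    v1, v2, v3, v4 = Matrix([1,1,2]), Matrix([2,2,4]), Matrix([0,1,1]), Matrix([3,4,7])
    A = Matrix.hstack(v1, v2, v3, v4)
    R, pivots = A.rref()
    print(pivots)            # (0, 2): the pivot columns are v1 and v3
    print(A.columnspace())   # [v1, v3], a basis for V = Span(v1, v2, v3, v4)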

3.1 Abstract Vector Spaces


We have not yet dealt with vector spaces other than Euclidean spaces. In general, a vector
space is a set endowed with the operations of addition and scalar multiplication, subject to
the properties listed in Exercise 1.1.12. Notions of linear independence and basis proceed
analogously; the Remark on p. 165 shows that dimension is well defined in the general
setting.

► EXAMPLE 10

Here are a few examples of so-called "abstract" vector spaces. Others appear in the exercises.
a. Let M_{m×n} denote the set of all m × n matrices. As we've seen in Proposition 4.1 of Chapter
1, M_{m×n} is a vector space, using the operations of matrix addition and scalar multiplication
we've already defined. The zero "vector" is the zero matrix O. This space can naturally be
identified with R^{mn} (see Exercise 24).
b. Let F(U) denote the collection of all real-valued functions defined on some subset U ⊂ R^n.
If f ∈ F(U) and c ∈ R, then we can define a new function cf ∈ F(U) by multiplying the
value of f at each point by the scalar c; i.e.,

    (cf)(t) = cf(t)  for each t ∈ U.

Similarly, if f, g ∈ F(U), then we can define the new function f + g ∈ F(U) by adding
the values of f and g at each point; i.e.,

    (f + g)(t) = f(t) + g(t)  for each t ∈ U.

By these formulas we define scalar multiplication and vector addition in F(U). The zero
"vector" in F(U) is the zero function. The various properties of a vector space follow from
the corresponding properties of the real numbers (as everything is defined in terms of the
values of the function at every point t). Since an element of F(U) is a function, F(U) is
often called a function space.
c. Let R^∞ denote the collection of all infinite sequences of real numbers. That is, an element of
R^∞ looks like x = (x₁, x₂, x₃, …), where xᵢ ∈ R, i = 1, 2, 3, …. Operations are defined in the
obvious way: If c ∈ R and y = (y₁, y₂, y₃, …), then we set cx = (cx₁, cx₂, cx₃, …) and
x + y = (x₁ + y₁, x₂ + y₂, x₃ + y₃, …). ◄

The vector space of functions on an open subset U ⊂ R^n has various subspaces that
will be of particular interest to us. For any k ≥ 0 we have C^k(U), the space of C^k functions
on U; indeed, we have the hierarchy

    C^∞(U) ⊂ ⋯ ⊂ C^{k+1}(U) ⊂ C^k(U) ⊂ ⋯ ⊂ C²(U) ⊂ C¹(U) ⊂ C⁰(U).

(That these are all subspaces follows from the standard fact that sums and scalar multiples
of C^k functions are again C^k.) We can also consider the subspaces of polynomial functions.
We denote by P_k the vector space of polynomials of degree ≤ k in one variable.
As we ask the reader to check in Exercise 26, the vector space P_k has dimension
k + 1. In general, we say a vector space is finite-dimensional if it has dimension n for some
n ∈ N and infinite-dimensional if not. The vector space C^∞(R) is infinite-dimensional, as
it contains polynomials of arbitrarily high degree.

> EXERCISES 4.3


" 1' ' 2~ "2'

1. Let Vi = 2 . v2 = 4 , and v3 = 4 G R3. Is each of the following statements correct


_3_ _5_ _6_
or incorrect? Explain.
(a) The set {vi, v2, v3) is linearly dependent.
(b) Each of the vectors v2, and v3 can be written as a linear combination of the others.
*2. Decide whether each of the following sets of vectors is linearly independent.

4. Suppose {u, v, w} ⊂ R³ is linearly independent.
(a) Prove that u · (v × w) ≠ 0.
(b) Prove that {u × v, v × w, w × u} is linearly independent as well.
5. Suppose v₁, …, v_k are nonzero vectors with the property that vᵢ · vⱼ = 0 whenever i ≠ j. Prove
that {v₁, …, v_k} is linearly independent. (Hint: Suppose c₁v₁ + c₂v₂ + ⋯ + c_kv_k = 0. Start by
showing c₁ = 0.)
6. Suppose k > n. Prove that any k vectors in R^n must form a linearly dependent set. (So what can
you conclude if you have k linearly independent vectors in R^n?)

7. Suppose v₁, …, v_k ∈ R^n form a linearly dependent set. Prove that for some 1 ≤ j ≤ k we have
vⱼ ∈ Span(v₁, …, v_{j-1}, v_{j+1}, …, v_k). That is, one of the vectors v₁, …, v_k can be written as a linear
combination of the remaining vectors.
8. Suppose v₁, …, v_k ∈ R^n form a linearly dependent set. Prove that either v₁ = 0 or v_{i+1} ∈
Span(v₁, …, vᵢ) for some i = 1, 2, …, k - 1.
9. Let A be an m × n matrix and b₁, …, b_k ∈ R^m. Suppose {b₁, …, b_k} is linearly independent.
Suppose that v₁, …, v_k ∈ R^n are chosen so that Av₁ = b₁, …, Av_k = b_k. Prove that {v₁, …, v_k}
must be linearly independent.
10. Suppose T: R^n → R^n is a linear map. Prove that if [T] is nonsingular and {v₁, …, v_k} is linearly
independent, then {T(v₁), …, T(v_k)} is likewise linearly independent.
11. Suppose T: R^n → R^m is a linear map and [T] has rank n. Suppose v₁, …, v_k ∈ R^n and
{v₁, …, v_k} is linearly independent. Prove that {T(v₁), …, T(v_k)} ⊂ R^m is likewise linearly
independent. (N.B.: If you did not explicitly make use of the assumption that rank([T]) = n, your
proof cannot be correct. Why?)
*12. Decide whether the following sets of vectors give a basis for the indicated space.

13. Find a basis for each of the given subspaces and determine its dimension.
(d) V = {x ∈ R⁵ : x₁ = x₂, x₃ = x₄} ⊂ R⁵


14. In each case, check that {v₁, …, vₙ} is a basis for R^n and give the coordinates of the given vector
b ∈ R^n with respect to that basis.
*(b) v₁ = (1, 0, 3), v₂ = (1, 2, 2), v₃ = (1, 3, 2); b = (1, 1, 2)
(c) v₁ = (1, 0, 1), v₂ = (1, 1, 2), v₃ = (1, 1, 1); b = (3, 0, 1)

15. Find a basis for the intersection of the subspaces

V = Span and W = Span

16. Suppose v₁, …, vₙ are nonzero, mutually orthogonal vectors in R^n.
(a) Prove that they form a basis for R^n.
(b) Given any x ∈ R^n, give an explicit formula for the coordinates of x with respect to the basis
{v₁, …, vₙ}.
(c) Deduce from your answer to part b that x = Σ_{i=1}^{n} proj_{vᵢ} x.
17. Prove Proposition 3.9. (Hint: Exercise 7 and Lemma 3.8 may be of help.)
18. Let V ⊂ R^n be a subspace, and suppose you are given a linearly independent set of vectors
{v₁, …, v_k} ⊂ V. Prove that there are vectors v_{k+1}, …, v_ℓ ∈ V so that {v₁, …, v_ℓ} forms a basis
for V.
19. Suppose V and W are subspaces of R^n and W ⊂ V. Prove that dim W ≤ dim V. (Hint: Start
with a basis for W and apply Exercise 18.)
20. Suppose A is an n × n matrix, and let v₁, …, vₙ ∈ R^n. Suppose {Av₁, …, Avₙ} is linearly
independent. Prove that A is nonsingular.
21. *(a) Suppose U and V are subspaces of R^n with U ∩ V = {0}. If {u₁, …, u_k} is a basis for U
and {v₁, …, v_ℓ} is a basis for V, prove that {u₁, …, u_k, v₁, …, v_ℓ} is a basis for U + V.
(b) Let U and V be subspaces of R^n. Prove that if U ∩ V = {0}, then dim(U + V) = dim U +
dim V.
(c) Let U and V be subspaces of R^n. Prove that dim(U + V) = dim U + dim V - dim(U ∩ V).
(Hint: Start with a basis for U ∩ V, and use Exercise 18.)
22. Let T: R^n → R^m be a linear map. Define

    ker(T) = {x ∈ R^n : T(x) = 0}  and
    image(T) = {y ∈ R^m : y = T(x) for some x ∈ R^n}.

(a) Check that ker(T) and image(T) are subspaces of R^n and R^m, respectively.
(b) Let {v₁, …, v_k} be a basis for ker(T) and, using Exercise 18, extend to a basis {v₁, …, v_k,
v_{k+1}, …, vₙ} for R^n. Prove that {T(v_{k+1}), …, T(vₙ)} gives a basis for image(T).
(c) Deduce that dim ker(T) + dim image(T) = n.
*23. Decide whether the following sets of vectors are linearly independent.
(a) {[1 0; 0 1], [0 1; 1 0], [1 1; 1 -1]} ⊂ M_{2×2}
(b) {f₁, f₂, f₃} ⊂ P₁, where f₁(t) = t, f₂(t) = t + 1, f₃(t) = t + 2
(c) {f₁, f₂, f₃} ⊂ C^∞(R), where f₁(t) = 1, f₂(t) = cos t, f₃(t) = sin t
(d) {f₁, f₂, f₃} ⊂ C⁰(R), where f₁(t) = 1, f₂(t) = sin² t, f₃(t) = cos² t

(e) {f₁, f₂, f₃} ⊂ C^∞(R), where f₁(t) = 1, f₂(t) = cos t, f₃(t) = cos 2t
(f) {f₁, f₂, f₃} ⊂ C^∞(R), where f₁(t) = 1, f₂(t) = cos 2t, f₃(t) = cos² t
24. Recall that M_{m×n} denotes the vector space of m × n matrices.
(a) Give a basis for and determine the dimension of M_{m×n}.
(b) Show that the set of diagonal matrices, the set of upper triangular matrices, and the set of lower
triangular matrices are all subspaces of M_{n×n} and determine their dimensions.
(c) Show that the set of symmetric matrices, S, and the set of skew-symmetric matrices, X, are
subspaces of M_{n×n}. What are their dimensions? Show that S + X = M_{n×n}. (See Exercise 1.4.36.)
25. Let V be a vector space.
(a) Let V* denote the set of all linear transformations from V to R. Show that V* is a vector space.
(b) Suppose {v₁, …, vₙ} is a basis for V. For i = 1, …, n, define fᵢ ∈ V* by

    fᵢ(a₁v₁ + a₂v₂ + ⋯ + aₙvₙ) = aᵢ.

Prove that {f₁, …, fₙ} gives a basis for V*.
(c) Deduce that whenever V is finite-dimensional, dim V* = dim V.
26. Show that the set P_k of polynomials in one variable of degree ≤ k is a vector space of dimension
k + 1. (Hint: Suppose c₀ + c₁x + ⋯ + c_kx^k = 0 for all x. Differentiate.)
27. Recall that f: R^n - {0} → R is homogeneous of degree k if f(tx) = t^k f(x) for all t > 0.
(a) Show that the set P_{k,n} of homogeneous polynomials of degree k in n variables is a vector space.
(b) Fix k ∈ N. Show that the monomials x₁^{i₁} x₂^{i₂} ⋯ xₙ^{iₙ}, where i₁ + i₂ + ⋯ + iₙ = k and 0 ≤ iⱼ ≤ k
for j = 1, …, n, form a basis for P_{k,n}.
(c) Show that dim P_{k,n} = (n-1+k choose k).⁴ (Hint: It may help to remember that (n choose k) = (n choose n-k).)
(d) Using the interpretation in part c, prove that Σ_{i=0}^{k} (n-1+i choose i) = (n+k choose k).

► 4 THE FOUR FUNDAMENTAL SUBSPACES

Given an m × n matrix A (or, more conceptually, a linear map T: R^n → R^m), there are
four natural subspaces to consider. It is one of our goals to understand the relations among
them. We begin with the column space and row space.

Definition Let A be an m × n matrix with row vectors A₁, …, A_m ∈ R^n and column
vectors a₁, …, aₙ ∈ R^m. We define the column space of A to be the subspace of R^m spanned
by a₁, …, aₙ:

    C(A) = Span(a₁, …, aₙ) ⊂ R^m.

We define the row space of A to be the subspace of R^n spanned by A₁, …, A_m:

    R(A) = Span(A₁, …, A_m) ⊂ R^n.

Our work in Section 1 gives an important alternative interpretation of the column space.

⁴Recall that the binomial coefficient (n choose k) = n!/(k!(n-k)!) gives the number of k-element subsets of a given
n-element set.

Proposition 4.1 Let A be an m × n matrix. Let b ∈ R^m. Then b ∈ C(A) if and only
if b = Ax for some x ∈ R^n. That is,

    C(A) = {b ∈ R^m : Ax = b is consistent}.

Proof By definition, C(A) = Span(a₁, …, aₙ), and so b ∈ C(A) if and only if b is a
linear combination of the vectors a₁, …, aₙ; i.e., b = x₁a₁ + ⋯ + xₙaₙ for some scalars
x₁, …, xₙ. Recalling our crucial observation (*) on p. 135, we conclude that b ∈ C(A) if
and only if b = Ax for some x ∈ R^n. The final reformulation is straightforward as long as
we remember that the system Ax = b is consistent if it has a solution. ■

Remark If we think of A as the standard matrix of a linear map T: R^n → R^m, then
C(A) ⊂ R^m is the set of all the values of T, i.e., its image, denoted image(T).

Perhaps the most natural subspace of all comes from solving a homogeneous system
of linear equations.

Definition Let A be an m × n matrix. The nullspace of A is the set of solutions of
the system Ax = 0:

    N(A) = {x ∈ R^n : Ax = 0}.

Recall (see Exercise 1.4.3) that N(A) is in fact a subspace. If we think of A as the standard
matrix of a linear map T: R^n → R^m, then N(A) ⊂ R^n is often called the kernel of T,
denoted ker(T).
We might surmise that our algorithm in Section 1 for finding the general solution of
the homogeneous linear system Ax = 0 produces a basis for N(A).

► EXAMPLE 1

Let's find a basis for the nullspace of the matrix

    A = [1 2 1 -1; 1 0 1 1].

Of course, we bring A to its reduced echelon form

    R = [1 0 1 1; 0 1 0 -1]

and read off the general solution

    x₁ = -x₃ - x₄
    x₂ =        x₄
    x₃ = x₃
    x₄ =        x₄;

i.e.,

    x = x₃(-1, 0, 1, 0) + x₄(-1, 1, 0, 1).

From this we see that the vectors

    (-1, 0, 1, 0)  and  (-1, 1, 0, 1)

span N(A). On the other hand, they are clearly linearly independent, for if a linear combination
of them is 0, then looking at the third and fourth entries (the free variable "slots") gives
x₃ = x₄ = 0. Thus, they form a basis for N(A). ◄
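The same basis can be produced by machine. An illustrative Python/sympy sketch (an aside, not part of the text):

    from sympy import Matrix

    A = Matrix([[1, 2, 1, -1],
                [1, 0, 1, 1]])
    print(A.rref()[0])    # Matrix([[1, 0, 1, 1], [0, 1, 0, -1]])
    print(A.nullspace())  # [Matrix([[-1],[0],[1],[0]]), Matrix([[-1],[1],[0],[1]])]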
One of the most beautiful and powerful relations among these subspaces is the following:

Proposition 4.2 Let A be an m × n matrix. Then N(A) = R(A)^⊥.

Proof If x ∈ N(A), then, by definition, Aᵢ · x = 0 for all i = 1, 2, …, m. (Remember
that A₁, …, A_m denote the row vectors of the matrix A.) So it follows (see Exercise
1.3.3) that x is orthogonal to any linear combination of A₁, …, A_m, hence to any vector
in R(A). That is, x ∈ R(A)^⊥, so N(A) ⊂ R(A)^⊥. Now we need only show that
R(A)^⊥ ⊂ N(A). If x ∈ R(A)^⊥, this means that x is orthogonal to every vector in R(A),
so, in particular, x is orthogonal to each of the row vectors A₁, …, A_m. But this means that
Ax = 0, so x ∈ N(A), as required. ■

It is also the case that R(A) = N(A)^⊥, but we are not quite yet in a position to establish
this.
Since C(A) = R(Aᵀ), the following is immediate:

Corollary 4.3 Let A be an m × n matrix. Then N(Aᵀ) = C(A)^⊥.

In fact, we really came across this earlier, when we found constraint equations for
Ax = b to be consistent. Just as multiplying A by x takes linear combinations of the
columns of A, so then does multiplying Aᵀ by x take linear combinations of the rows of A
(perhaps it helps to think of Aᵀx as (xᵀA)ᵀ). Corollary 4.3 is the statement that any linear
combination of the rows of A that gives 0 corresponds to a constraint on C(A) and vice
versa. What is far from clear, however, is whether the vectors we obtain as coefficients of
the constraint equations form a linearly independent set.

► EXAMPLE 2

Let

    A = [1 2; 1 1; 0 1; 1 2].

We wish to find a homogeneous system of linear equations describing C(A). That is, we seek the
equations b ∈ R⁴ must satisfy in order for Ax = b to be consistent. By row reduction, we find:

    [1 2 | b₁; 1 1 | b₂; 0 1 | b₃; 1 2 | b₄] → [1 2 | b₁; 0 -1 | b₂ - b₁; 0 1 | b₃; 0 0 | b₄ - b₁]
                                            → [1 2 | b₁; 0 1 | b₁ - b₂; 0 0 | -b₁ + b₂ + b₃; 0 0 | -b₁ + b₄],

and so the constraint equations are

    -b₁ + b₂ + b₃ = 0
    -b₁ + b₄ = 0.

Now, if we keep track of the row operations involved in reducing A to echelon form, we find
that

    E = [1 0 0 0; 1 -1 0 0; -1 1 1 0; -1 0 0 1],

from which we see that

    -A₁ + A₂ + A₃ = -A₁ + A₄ = 0.

Thus, we infer that the vectors

    (-1, 1, 1, 0)  and  (-1, 0, 0, 1)

span N(Aᵀ). On the other hand, in this instance, it is easy to see they are linearly independent and
hence give a basis for N(Aᵀ). ◄

► EXAMPLE 3

Let

    A = [1 1 0 1 4; 1 2 1 1 6; 0 1 1 1 3; 2 2 0 1 7],

whose reduced echelon form is

    R = [1 0 -1 0 1; 0 1 1 0 2; 0 0 0 1 1; 0 0 0 0 0].

Using the result of Exercise 1, R(A) = R(R), so the nonzero rows of R span R(A); now we need
only check that they form a linearly independent set. We keep an eye on the pivot "slots": Suppose

    c₁(1, 0, -1, 0, 1) + c₂(0, 1, 1, 0, 2) + c₃(0, 0, 0, 1, 1) = 0.

This means that

    (c₁, c₂, -c₁ + c₂, c₃, c₁ + 2c₂ + c₃) = (0, 0, 0, 0, 0),

and so c₁ = c₂ = c₃ = 0, as promised.
From the reduced echelon form R, we read off the vectors that span N(A): The general solution
of Ax = 0 is

    x = (x₃ - x₅, -x₃ - 2x₅, x₃, -x₅, x₅) = x₃(1, -1, 1, 0, 0) + x₅(-1, -2, 0, -1, 1),

so the vectors

    (1, -1, 1, 0, 0)  and  (-1, -2, 0, -1, 1)

span N(A). On the other hand, these vectors are linearly independent, for if we take a linear combination

    x₃(1, -1, 1, 0, 0) + x₅(-1, -2, 0, -1, 1) = 0,

we infer (from the free variable slots) that x₃ = x₅ = 0. Thus, these two vectors form a basis for
N(A).
Obviously, C(A) is spanned by the five column vectors of A. But these vectors cannot be linearly
independent; that's what vectors in the nullspace of A tell us. From our vectors spanning N(A), we
know that

(*)    a₁ - a₂ + a₃ = 0  and  -a₁ - 2a₂ - a₄ + a₅ = 0.

These equations tell us that a₃ and a₅ can be written as linear combinations of a₁, a₂, and a₄, and so
these latter three vectors span C(A). If we can check that they form a linearly independent set, we'll
know they give a basis for C(A). We form a matrix A′ with these columns (easier: cross out the third
and fifth columns of A) and reduce it to echelon form (easier: cross out the third and fifth columns of
R). Well, we have

    A′ = [1 1 1; 1 2 1; 0 1 1; 2 2 1]  with echelon form  [1 0 0; 0 1 0; 0 0 1; 0 0 0],

and so only the trivial linear combination of the columns of A′ will yield the zero vector. In conclusion,
the vectors

    a₁ = (1, 1, 0, 2),  a₂ = (1, 2, 1, 2),  and  a₄ = (1, 1, 1, 1)

are linearly independent and span C(A).

What about N(Aᵀ)? The only row of zeroes in R arises as the linear combination

    -A₁ - A₂ + A₃ + A₄ = 0

of the rows of A, so we expect the vector

    v = (-1, -1, 1, 1)

to give a basis for N(Aᵀ). ◄

We now state the formal results regarding the four fundamental subspaces.

Theorem 4.4 Let A be an m × n matrix. Let U and R, respectively, denote the
echelon and reduced echelon form of A, and write EA = U (so E is the
product of the elementary matrices by which we reduce A to echelon form).

1. The nonzero rows of U (or of R) give a basis for R(A).
2. The vectors obtained by setting each free variable equal to 1 and the remaining
free variables equal to 0 in the general solution of Ax = 0 (which we read off from
Rx = 0) give a basis for N(A).
3. The pivot columns of A (i.e., the columns of the original matrix A corresponding
to the pivots in U) give a basis for C(A).
4. The (transposes of the) rows of E that correspond to the zero rows of U give a
basis for N(Aᵀ). (The same works with E′ if we write E′A = R.)

Proof For simplicity of exposition, let's assume that the reduced echelon form takes
the shape

    R = [I_r B; O O]

for some r × (n - r) matrix B; that is, the pivots occur in the first r columns.

1. Since row operations are invertible, R(A) = R(U) (see Exercise 1). Clearly the
nonzero rows of U span R(U). Moreover, they are linearly independent because
of the pivots. Let U₁, …, U_r denote the nonzero rows of U; because of our
simplifying assumption on R, we know that the pivots of U occur in the first r
columns as well. Suppose now that

    c₁U₁ + ⋯ + c_rU_r = 0.

The first entry of the left-hand side is c₁u₁₁ (since the first entry of the vectors
U₂, …, U_r is 0 by definition of echelon form). Since u₁₁ ≠ 0 by definition of
pivot, we must have c₁ = 0. Continuing in this fashion, we find that c₁ = c₂ =
⋯ = c_r = 0. In conclusion, {U₁, …, U_r} forms a basis for R(U), hence for
R(A).

2. Ax = 0 if and only if Rx = 0, which means that

    x₁ + b_{1,r+1}x_{r+1} + b_{1,r+2}x_{r+2} + ⋯ + b_{1n}xₙ = 0
    x₂ + b_{2,r+1}x_{r+1} + b_{2,r+2}x_{r+2} + ⋯ + b_{2n}xₙ = 0
        ⋮
    x_r + b_{r,r+1}x_{r+1} + b_{r,r+2}x_{r+2} + ⋯ + b_{rn}xₙ = 0.

Thus, an arbitrary element of N(A) can be written in the form

    x = x_{r+1}(-b_{1,r+1}, …, -b_{r,r+1}, 1, 0, …, 0) + x_{r+2}(-b_{1,r+2}, …, -b_{r,r+2}, 0, 1, …, 0)
        + ⋯ + xₙ(-b_{1n}, …, -b_{rn}, 0, 0, …, 1)

(so the given vectors span N(A), since every element of N(A) can be
expressed as a linear combination of them). We need to check linear independence:
The key is the pattern of 1's and 0's in the free variable "slots." Suppose such a
linear combination equals the zero vector. Looking at the entries in slots r + 1
through n, we get x_{r+1} = x_{r+2} = ⋯ = xₙ = 0, as required.


3. Let's continue with the notational simplification that the pivots occur in the first r
columns. Then we need to establish the fact that the first r column vectors of the
original matrix A give a basis for C(A). These vectors form a linearly independent
set since the only solution of

c₁a₁ + ⋯ + c_ra_r = 0

is c₁ = c₂ = ⋯ = c_r = 0 (look only at the first r columns of A and the first r
columns of R). It is more interesting to understand why a₁, ..., a_r span C(A).
Consider each of the basis vectors for N(A) given above: Each one gives us a
linear combination of the column vectors of A that results in the zero vector. In
particular, we find that

−b_{1,r+1}a₁ − ⋯ − b_{r,r+1}a_r + a_{r+1} = 0
−b_{1,r+2}a₁ − ⋯ − b_{r,r+2}a_r + a_{r+2} = 0
    ⋮
−b_{1n}a₁ − ⋯ − b_{rn}a_r + a_n = 0,

from which we conclude that the vectors a_{r+1}, ..., a_n are all linear combinations
of a₁, ..., a_r. It follows that C(A) is spanned by a₁, ..., a_r, as required.

4. We are interested in the linear relations among the rows of A. The key point here
is that the first r rows of the echelon matrix U form a linearly independent set,
whereas the last m − r rows of U consist just of 0. Thus, N(Uᵀ) is spanned by the
last m − r standard basis vectors for Rᵐ. Using EA = U, we see that

Aᵀ = (E⁻¹U)ᵀ = Uᵀ(E⁻¹)ᵀ = Uᵀ(Eᵀ)⁻¹,

and so

x ∈ N(Aᵀ) ⟺ x ∈ N(Uᵀ(Eᵀ)⁻¹) ⟺ (Eᵀ)⁻¹x ∈ N(Uᵀ)
          ⟺ x = Eᵀy for some y ∈ N(Uᵀ).

This tells us that the (transposes of the) last m − r rows of E span N(Aᵀ). But these vectors are
linearly independent since E is nonsingular. ■

Remark Referring to our earlier discussion of (†) on p. 150 and our discussion in
Sections 1 and 2 of this chapter, we finally know that finding the constraint equations for
C(A) will give a basis for N(Aᵀ). It is also worth noting that to find bases for the four
fundamental subspaces of the matrix A, we need only find the echelon form of A to deal
with R(A) and C(A), the reduced echelon form of A to deal with N(A), and the echelon
form of the augmented matrix [A | b] to deal with N(Aᵀ).

► EXAMPLE 4

We want bases for R(A), N(A), C(A), and N(Aᵀ), given the matrix

    A = [ 1  1  2   0   0 ]
        [ 0  1  1  −1  −1 ]
        [ 1  1  2   1   2 ]
        [ 2  1  3  −1  −3 ] .

We leave it to the reader to check that the reduced echelon form of A is

    R = [ 1  0  1  0  −1 ]
        [ 0  1  1  0   1 ]
        [ 0  0  0  1   2 ]
        [ 0  0  0  0   0 ]

and that EA = U, where

    U = [ 1  1  2   0   0 ]          E = [  1  0  0  0 ]
        [ 0  1  1  −1  −1 ]              [  0  1  0  0 ]
        [ 0  0  0   1   2 ]              [ −1  0  1  0 ]
        [ 0  0  0   0   0 ] ,            [ −4  1  2  1 ] .

Alternatively, the echelon form of the augmented matrix [A | b] is

    [EA | Eb] = [ 1  1  2   0   0 |  b₁                    ]
                [ 0  1  1  −1  −1 |  b₂                    ]
                [ 0  0  0   1   2 |  −b₁ + b₃              ]
                [ 0  0  0   0   0 |  −4b₁ + b₂ + 2b₃ + b₄  ] .

Then we have the following bases for the respective subspaces:

R(A): {(1, 1, 2, 0, 0), (0, 1, 1, −1, −1), (0, 0, 0, 1, 2)};
N(A): {(−1, −1, 1, 0, 0)ᵀ, (1, −1, 0, −2, 1)ᵀ};
C(A): {(1, 0, 1, 2)ᵀ, (1, 1, 1, 1)ᵀ, (0, −1, 1, −1)ᵀ};
N(Aᵀ): {(−4, 1, 2, 1)ᵀ}.

The reader should check these all carefully. Note that dim R(A) = dim C(A) = 3, dim N(A) = 2,
and dim N(Aᵀ) = 1. ◄
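Readers who like to verify such computations by machine can do so with a few lines of Python
(a sketch of ours using the sympy library, not part of the text; it carries out exactly the recipe of
Theorem 4.4 on the matrix of this example):

    from sympy import Matrix

    # The matrix of Example 4.
    A = Matrix([[1, 1, 2,  0,  0],
                [0, 1, 1, -1, -1],
                [1, 1, 2,  1,  2],
                [2, 1, 3, -1, -3]])

    R, pivots = A.rref()                 # reduced echelon form and pivot columns
    print(R)                             # nonzero rows: a basis for R(A)
    print(A.nullspace())                 # free-variable vectors: a basis for N(A)
    print([A.col(j) for j in pivots])    # pivot columns of A: a basis for C(A)
    print(A.T.nullspace())               # a basis for N(A^T)

The output reproduces the four bases listed above, and the dimensions it reports agree with
Theorem 4.5.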

We now deduce the following results on dimension. Recall that the rank of a matrix is
the number of pivots in its echelon form.

Theorem 4.5 Let A be an m × n matrix of rank r. Then

1. dim R(A) = dim C(A) = r.

2. dim N(A) = n − r.

3. dim N(Aᵀ) = m − r.

Proof There are r pivots and a pivot in each nonzero row of U, so dim R(A) = r.
Similarly, we have a basis vector for C(A) for every pivot, so dim C(A) = r, as well. We
see that dim N(A) is equal to the number of free variables, and this is the difference between
the total number of variables (n) and the number of pivot variables (r). Last, the number
of zero rows in U is the difference between the total number of rows (m) and the number
of nonzero rows (r), so dim N(Aᵀ) = m − r. ■

An immediate corollary of Theorem 4.5 is the following. The dimension of the
nullspace of A is often called the nullity of A, denoted null(A). (Cf. also Exercise 4.3.22.)

Corollary 4.6 (Nullity-Rank Theorem) Let A be an m × n matrix. Then

null(A) + rank(A) = n.

Now we are in a position to complete our discussion of orthogonal complements.

Proposition 4.7 Let V ⊂ Rⁿ be a k-dimensional subspace. Then dim V⊥ = n − k.

Proof Choose a basis {v₁, ..., v_k} for V, and let these be the rows of a k × n matrix A.
By construction, we have R(A) = V. Notice also that rank(A) = dim R(A) = dim V = k.
By Proposition 4.2, we have V⊥ = N(A), so dim V⊥ = dim N(A) = n − k. ■

Now we arrive at our desired conclusion:

Proposition 4.8 Let V ⊂ Rⁿ be a subspace. Then (V⊥)⊥ = V.

Proof By Exercise 1.3.10, we have V ⊂ (V⊥)⊥. Now we calculate dimensions: If
dim V = k, then dim V⊥ = n − k, and dim(V⊥)⊥ = n − (n − k) = k. Applying Lemma
3.8, we deduce that V = (V⊥)⊥. ■

We can finally bring this discussion to a close with the geometric characterization of
the relations among the four fundamental subspaces. Note that this result completes the
story of Theorem 4.5.

Theorem 4.9 Let A be an m × n matrix. Then

1. R(A)⊥ = N(A);
2. N(A)⊥ = R(A);
3. C(A)⊥ = N(Aᵀ);
4. N(Aᵀ)⊥ = C(A).

Proof These are immediate from Proposition 4.2, Corollary 4.3, and Proposition 4.8. ■

Now, using Theorem 4.9, we have an alternative way of expressing a subspace V
spanned by a given set of vectors v₁, ..., v_k as the solution set of a homogeneous system
of linear equations. We use the vectors as rows of a matrix A; let {w₁, ..., w_ℓ} give a basis
for N(A). Since V = R(A) = N(A)⊥, we see that V is defined by the equations

w₁ · x = 0, ..., w_ℓ · x = 0.

► EXAMPLE 5

Let v₁, v₂ ∈ R⁴ be two given linearly independent vectors. We wish to write V = Span(v₁, v₂) as
the solution set of a homogeneous system of linear equations. We introduce the matrix A with rows
v₁ and v₂ and find that vectors w₁ and w₂ give a basis for N(A). By our earlier comments,

V = R(A) = N(A)⊥
  = {x ∈ R⁴ : w₁ · x = 0, w₂ · x = 0}
  = {x ∈ R⁴ : −x₁ + x₂ + x₃ = 0, −x₁ + x₄ = 0}. ◄
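A quick machine check of this recipe (a sketch in Python/sympy; the two vectors below are
illustrative stand-ins of ours, chosen to be consistent with the constraint equations just found):

    from sympy import Matrix

    # Stand-in spanning vectors for V, used as the rows of A.
    A = Matrix([[1, 1,  0, 1],
                [0, 1, -1, 0]])

    # A basis w1, w2 for N(A) supplies the constraint equations for V = N(A)^perp.
    for w in A.nullspace():
        print(w.T)       # [-1, 1, 1, 0] and [-1, 0, 0, 1]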

Earlier, e.g., in Example 2, we determined the constraint equations for the column space.
The column space, as we've seen, is the intersection of hyperplanes whose normal vectors
are the basis vectors for N(Aᵀ). This is an application of the result that C(A) = N(Aᵀ)⊥.
As we interchange A and Aᵀ, we turn one method of solving the problem into the other.
To close our discussion now, we introduce in Figure 4.1 a schematic diagram summarizing
the geometric relation among our four fundamental subspaces. We know that N(A)
and R(A) are orthogonal complements of one another in Rⁿ and that, similarly, N(Aᵀ) and
C(A) are orthogonal complements of one another in Rᵐ. But there is more to be said.
Recall that, given an m × n matrix A, we have linear maps T: Rⁿ → Rᵐ and S: Rᵐ →
Rⁿ whose standard matrices are A and Aᵀ, respectively. T sends all of N(A) to 0 ∈ Rᵐ,
and S sends all of N(Aᵀ) to 0 ∈ Rⁿ. Now, the column space of A consists of all vectors of

Figure 4.1

the form Ax for some x ∈ Rⁿ; that is, it is the image of the function T. Since dim R(A) =
dim C(A), this suggests that T maps the subspace R(A) one-to-one and onto C(A). (And,
symmetrically, S maps C(A) one-to-one and onto R(A). These are, however, generally not
inverse functions. Why? See Exercise 18.)

Proposition 4.10 For each b ∈ C(A), there is a unique vector x ∈ R(A) so that
Ax = b.

(See Figure 4.2.)

Proof Let {v₁, ..., v_r} be a basis for R(A). Then Av₁, ..., Av_r are r vectors in
C(A). They are linearly independent (by a modification of the proof of Exercise 4.3.11 that
we leave to the reader). Therefore, by Proposition 3.9, these vectors must span C(A). This
tells us that every vector b ∈ C(A) is of the form b = Ax for some x ∈ R(A) (why?). And
there can be only one such vector x because R(A) ∩ N(A) = {0}. ■

Remark There is a further geometric interpretation of the vector x ∈ R(A) that arises
in the preceding proposition. Of all the solutions of Ax = b, it is the one of least length.
Why?
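One way to see this: if z is any solution of Ax = b, then z − x ∈ N(A), which is orthogonal to
x ∈ R(A), so ||z||² = ||x||² + ||z − x||² ≥ ||x||². Numerically, this least-length solution is the one
the pseudoinverse produces; here is a sketch (Python/numpy, with an illustrative rank-1 matrix of
our own choosing):

    import numpy as np

    A = np.array([[1., 1., 2.],
                  [2., 2., 4.]])     # rank 1; R(A) is spanned by (1, 1, 2)
    b = np.array([3., 6.])           # b lies in C(A)

    x = np.linalg.pinv(A) @ b        # the least-length solution of Ax = b
    print(x)                         # [0.5 0.5 1. ], a multiple of (1, 1, 2)
    print(A @ x)                     # recovers b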

► EXERCISES 4.4
*1. Show that if B is obtained from A by performing one or more row operations, then R(B) = R(A).

2. Let A = [ 2   1   1 ]
           [ 0   3   4 ]
           [ 2  −2  −3 ] .
(a) Give constraint equations for C(A). (b) Find a basis for N(Aᵀ).
3. For each of the following matrices A, give bases for R(A), N(A), C(A), and N(Aᵀ). Check
dimensions and orthogonality.

(a) A = [ 2  3 ]
        [ 4  6 ]

(b) A = [ 1  3 ]
        [ 3  5 ]
        [ 3  3 ]

(c) A = [ −2  1   0 ]
        [ −4  3  −1 ]

(d) A = [  1  −1   1  1   0 ]
        [  1   0   2  1   1 ]
        [  0   2   2  2   0 ]
        [ −1   1  −1  0  −1 ]

(e) A = [  1   1  0   1  −1 ]
        [  1   1  2  −1   1 ]
        [  2   2  2   0   0 ]
        [ −1  −1  2  −3   3 ]

*(f) A = [  1  1  0   5   0  −1 ]
         [  0  1  1   3  −2   0 ]
         [ −1  2  3   4   1  −6 ]
         [  0  4  4  12  −1  −7 ]
4. Given each matrix A, find matrices X and Y so that C(A) = N(X) and N(A) = C(Y).

(a) A = [  3  −1 ]
        [  6  −2 ]
        [ −9   3 ]

(b) A = [ 1   1  0 ]
        [ 2   1  1 ]
        [ 1  −1  2 ]

(c) A = [ 1  1  1 ]
        [ 1  2  0 ]
        [ 1  1  1 ]
        [ 1  0  2 ]
5. In each case, construct a matrix with the requisite properties or explain why no such matrix exists.

(a) The column space contains (1, 1, 1)ᵀ and (0, 1, 1)ᵀ, and the nullspace contains … .

*(b) The column space contains (0, 1, 1)ᵀ and (1, 1, 1)ᵀ, and the nullspace contains … .

*(c) The column space has basis (1, 0, 1)ᵀ, and the nullspace contains (1, 2, 0)ᵀ.

(d) The nullspace contains (1, 0, 1)ᵀ and (−1, 2, 1)ᵀ, and the row space contains (1, 1, −1).

*(e) The column space has basis (1, 0, 1)ᵀ, (0, 1, 1)ᵀ, and the row space has basis (1, 1, 1), (2, 0, 1).

(f) The column space and the nullspace both have basis (1, 0)ᵀ.

(g) The column space and the nullspace both have basis (1, 0, 0)ᵀ.

6. (a) Construct a 3 × 3 matrix A with C(A) ⊂ N(A).
(b) Construct a 3 × 3 matrix A with N(A) ⊂ C(A).
(c) Can there be a 3 × 3 matrix A with N(A) = C(A)? Why or why not?
(d) Can there be a 4 × 4 matrix A with N(A) = C(A)? Why or why not?

7. Let V ⊂ R⁵ be spanned by … . Give a homogeneous system of equations having
V as its solution set.

*8. Give a basis for the orthogonal complement of each of the following subspaces of R⁴:
(a) …
(b) W = {x ∈ R⁴ : x₁ + 3x₃ + 4x₄ = 0, x₂ + 2x₃ − 5x₄ = 0}.

9. (a) Give a basis for the orthogonal complement of the subspace V ⊂ R⁴ given by
V = {x ∈ R⁴ : x₁ + x₂ − 2x₄ = 0, x₁ − x₂ − x₃ + 6x₄ = 0, x₂ + x₃ − 4x₄ = 0}.
(b) Give a basis for the orthogonal complement of the subspace W = Span(…).
(c) Give a matrix B so that the subspace W defined in part b can be written in the form W = N(B).

*10. Let A be an m × n matrix with rank r. Suppose A = BU, where U is in echelon form. Prove
that the first r columns of B give a basis for C(A). (In particular, if EA = U, where U is the echelon
form of A and E is the product of elementary matrices by which we reduce A to U, then the first r
columns of E⁻¹ give a basis for C(A).)

11. According to Proposition 4.10, if A is an m × n matrix, then for each b ∈ C(A), there is a unique
x ∈ R(A) with Ax = b. In each case, give a formula for that x.

♯12. Let A be an m × n matrix and B be an n × p matrix. Prove that
(a) N(B) ⊂ N(AB).
(b) C(AB) ⊂ C(A). (Hint: Use Proposition 4.1.)
(c) N(B) = N(AB) when A is n × n and nonsingular.
(d) C(AB) = C(A) when B is n × n and nonsingular.

13. Continuing Exercise 12: Let A be an m × n matrix and B be an n × p matrix.
(a) Prove that rank(AB) ≤ rank(A). (Hint: See part b of Exercise 12.)
(b) Prove that if n = p and B is nonsingular, then rank(AB) = rank(A).
(c) Prove that rank(AB) ≤ rank(B). (Hint: Use part a of Exercise 12 and Theorem 4.5.)
(d) Prove that if m = n and A is nonsingular, then rank(AB) = rank(B).
(e) Prove that if rank(AB) = n, then rank(A) = rank(B) = n.

♯14. Let A be an m × n matrix. Prove that N(AᵀA) = N(A). (Hint: Use Exercise 12 and Exercise
1.4.32.)

15. Let A be an m × n matrix.
(a) Use Theorem 4.9 to prove that N(AᵀA) = N(A). (Hint: You've already proved ⊃ in Exercise
12. Now, if x ∈ N(AᵀA), then Ax ∈ C(A) ∩ N(Aᵀ).)
(b) Prove that rank(A) = rank(AᵀA).
(c) Prove that C(AᵀA) = C(Aᵀ). (Hint: You've already proved ⊂ in Exercise 12. Use part b to see
that the two spaces have the same dimension.)

16. Suppose A is an n × n matrix with the property that A² = A.
(a) Prove that C(A) = {x ∈ Rⁿ : x = Ax}.
(b) Prove that N(A) = {x ∈ Rⁿ : x = u − Au for some u ∈ Rⁿ}.
(c) Prove that C(A) ∩ N(A) = {0}.
(d) Prove that C(A) + N(A) = Rⁿ.

17. Suppose U and V are subspaces of Rⁿ. Prove that (U ∩ V)⊥ = U⊥ + V⊥. (Hint: Use Exercise
1.3.12 and Proposition 4.8.)

18. (a) Show that if the m × n matrix A has rank 1, then there are nonzero vectors u ∈ Rᵐ and
v ∈ Rⁿ so that A = uvᵀ. Describe the geometry of the four fundamental subspaces in terms of u
and v.
Pursuing the discussion on p. 183,
(b) Suppose A is an m × n matrix of rank n. Show that AᵀA = Iₙ if and only if the column vectors
a₁, ..., aₙ ∈ Rᵐ are mutually orthogonal unit vectors.
(c) Suppose A is an m × n matrix of rank 1. Using the notation of part a, show that (S∘T)(x) = x
for each x ∈ R(A) if and only if ||u|| ||v|| = 1. Interpret T geometrically.
(d) Can you generalize? (See Exercise 9.1.15.)

► 5 THE NONLINEAR CASE: INTRODUCTION TO MANIFOLDS


We have seen that given a linear subspace V of Rⁿ, we can represent it either explicitly
(parametrically) as the span of its basis vectors or implicitly as the solution set of a
homogeneous system of linear equations (i.e., the nullspace of an appropriate matrix A).
Proposition 4.2 gives a geometric interpretation of that matrix: Its row vectors must span
the orthogonal complement of V.
In the nonlinear case, sometimes we are just as fortunate. Given the hyperbola with
equation xy = 1, it is easy to solve (everywhere) explicitly for either x or y as a function

of the other. In the case of the circle x² + y² = 1, we can solve for y as a function of x
locally near any point not on the x-axis (viz., y = ±√(1 − x²)), and for x as a function of y
near any point not on the y-axis (analogously).
But it is important to understand that going back and forth between these two approaches
can be far more difficult—if not impossible—in the nonlinear case. For example, with a
bit of luck, we can see that the parametric curve

g(t) = (t² − 1, t(t² − 1)),  t ∈ R,

is given by the algebraic equation y² = x²(x + 1) (the curve pictured in Figure 1.4(b) on
p. 55). On the other hand, the cycloid, presented parametrically as the image of the function

g(t) = (t − sin t, 1 − cos t),  t ∈ R,

(see Figure 1.6 on p. 57) is obviously the graph y = f(x) for some function f, but I believe
no one can find f explicitly. Nor is there a function on R² whose zero-set is the cycloid.
Nevertheless, it is easy to see that locally we can write x as a function of y away from
the cusps. On the other hand, given the hypocycloid x^{2/3} + y^{2/3} = 1, we can find the
parametrization

g(t) = (cos³ t, sin³ t),  t ∈ [0, 2π],

but giving an explicit (global) parametrization of the curve y² − x³ + x = 0 in terms of
elementary functions is impossible. However, as Figure 5.1 suggests, away from the points
lying on the x-axis, we can write y as a function of x (explicitly in this case: y = ±√(x³ − x)),
and near each of those three points we can write x as a function of y (explicitly only if you
know how to solve the cubic equation x³ − x = y² explicitly).

Figure 5.1

Given the hyperplane a · x = 0 in Rⁿ, we can solve for xₙ as a function of x₁, ...,
x_{n−1}—i.e., we can represent the hyperplane as a graph over the x₁ ⋯ x_{n−1}-plane—if and
only if aₙ ≠ 0 (and, likewise, we can solve for x_k in terms of the remaining variables if and
only if a_k ≠ 0). More generally, given a system of linear equations, we apply Gaussian
elimination and solve for the pivot variables as functions of the free variables. In particular,
as Theorem 4.4 shows, if rank(A) = r, then we solve for the r pivot variables as functions
of the n − r free variables.
Now, since the derivative gives us the best linear approximation of a function, we
expect that if the tangent plane to a surface at a point is a graph, then so locally should be
the surface, as depicted in Figure 5.2. We suggested in Section 4 of Chapter 3 that, given a
level surface f = c of a differentiable function f: Rⁿ → R, the vector ∇f(a)—provided
it is nonzero—should be the normal vector to the tangent plane at a; equivalently, the
subspace of Rⁿ parallel to the tangent plane should be the nullspace of the matrix [Df(a)].
To establish these facts we need the Implicit Function Theorem, whose proof we delay to
Chapter 6.

Figure 5.2

Theorem 5.1 (Implicit Function Theorem, Simple Case) Suppose U ⊂ Rⁿ is open,
a ∈ U, and f: U → R is C¹. Suppose that f(a) = 0 and ∂f/∂xₙ(a) ≠ 0. Then there are
neighborhoods V of (a₁, ..., a_{n−1}) and W of aₙ and a C¹ function φ: V → W so that

f(x₁, ..., x_{n−1}, φ(x₁, ..., x_{n−1})) = 0  for all (x₁, ..., x_{n−1}) ∈ V.

That is, near a, the level surface f = 0 can be expressed as a graph over the x₁ ⋯ x_{n−1}-
plane; i.e., near a, the equation f = 0 defines xₙ implicitly as a function of the remaining
variables.

More generally, provided Df(a) ≠ 0, we know that some partial derivative ∂f/∂x_k(a) ≠ 0,
and so locally the equation f = 0 expresses x_k implicitly as a function of x₁, ..., x_{k−1}, x_{k+1},
..., xₙ.

► EXAMPLE 1

Consider the curve

f(x, y) = y³ − 3y − x = 0,

as shown in Figure 5.3. Although it is globally a graph of x as a function of y, we see that ∂f/∂y =
3(y² − 1) = 0 at the points ±(−2, 1). Away from these points, y is given (implicitly) locally as
a function of x. We recognize these as the three (C¹) local inverse functions φ₁, φ₂, and φ₃ of
g(x) = x³ − 3x, defined, respectively, on the intervals (−2, ∞), (−2, 2), and (−∞, 2). ◄

► EXAMPLE 2

Consider the surface

f(x, y, z) = z² + xz + y = 0,

Figure 5.4

pictured in Figure 5.4. Note first of all that it is globally a graph: y = −(z² + xz). On the other hand,
∂f/∂z = 2z + x = 0 on f = 0 precisely when x = −2z and y = z². That is, away from points of the
form (−2t, t², t) for some t ∈ R, we can locally write z = φ(x, y). Of course, it doesn't take a wizard
to do so: We have

z = (−x ± √(x² − 4y))/2,

and away from points of the designated form we can choose either the positive or negative square
root. It is along the curve 4y = x² (in the xy-plane) that the two roots of this quadratic equation
in z coalesce. (Note that this curve is the projection of the locus of points on the surface where
∂f/∂z = 0.) ◄
dz

Now we can legitimize (finally) the process of implicit differentiation introduced
in beginning calculus classes. Suppose U ⊂ Rⁿ is open, a ∈ U, f: U → R is C¹, and
∂f/∂xₙ(a) ≠ 0. For convenience here, let's write

x̄ = (x₁, ..., x_{n−1}),  so that x = (x̄, xₙ) and ā = (a₁, ..., a_{n−1}).

Then, by Theorem 5.1, f = 0 defines xₙ implicitly as a C¹ function φ(x̄) near a. Then we
have

Lemma 5.2 For j = 1, ..., n − 1, we have

∂φ/∂x_j(ā) = − (∂f/∂x_j(a)) / (∂f/∂xₙ(a)).

Proof Define g: V → Rⁿ by g(x̄) = (x̄, φ(x̄)). Then (f∘g)(x̄) = 0 for all x̄ ∈ V.
Thus, by the Chain Rule,

D(f∘g)(ā) = Df(g(ā))Dg(ā) = 0,

where Dg(ā) is the n × (n − 1) matrix consisting of the identity matrix I_{n−1} atop the
row vector Dφ(ā). (Here all the derivatives of φ are evaluated at ā, and all the derivatives
of f are evaluated at g(ā) = a.) In particular, for any j = 1, ..., n − 1, we have

∂f/∂x_j(a) + ∂f/∂xₙ(a) · ∂φ/∂x_j(ā) = 0,

from which the result is immediate. ■
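As a quick check of Lemma 5.2 on the curve of Example 1, here is a sketch in Python/sympy
(the idiff routine carries out exactly the computation −(∂f/∂x)/(∂f/∂y)):

    from sympy import symbols, idiff

    x, y = symbols('x y')
    f = y**3 - 3*y - x            # the curve of Example 1

    print(idiff(f, y, x))         # dy/dx = 1/(3*y**2 - 3) along f = 0

Note that the derivative blows up exactly at y = ±1, the two points where the curve fails to be a
local graph of y over x.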

Now we can officially prove our assertion from Chapter 3.

Proposition 5.3 Suppose U ⊂ Rⁿ is open, a ∈ U, f: U → R is C¹, and Df(a) ≠ 0.
Suppose f(a) = c. Then the tangent hyperplane at a of the level surface M = f⁻¹({c}) is
given by

TₐM = {x ∈ Rⁿ : Df(a)(x − a) = 0};

that is, ∇f(a) is normal to the tangent hyperplane.

Proof Since Df(a) ≠ 0, we may assume without loss of generality that ∂f/∂xₙ(a) ≠ 0.
Applying Theorem 5.1 to the function f − c, we know that M can be expressed near a as
the graph xₙ = φ(x̄) for some C¹ function φ. Now, the tangent plane to the graph xₙ = φ(x̄)
at a = (ā, φ(ā)) is the graph of Dφ(ā), translated so that it passes through a:

xₙ − aₙ = Dφ(ā)(x̄ − ā) = Σ_{j=1}^{n−1} ∂φ/∂x_j(ā)(x_j − a_j)

        = Σ_{j=1}^{n−1} ( −(∂f/∂x_j(a)) / (∂f/∂xₙ(a)) ) (x_j − a_j)   (by Lemma 5.2),

and so, by simple algebra, we obtain

Σ_{j=1}^{n−1} ∂f/∂x_j(a)(x_j − a_j) + ∂f/∂xₙ(a)(xₙ − aₙ) = Df(a)(x − a) = 0,

as required. ■

From Theorem 5.1 we infer that if f: Rⁿ → R is C¹ and ∇f ≠ 0 on the level sur-
face M = f⁻¹({c}), then at each point a ∈ M, we can locally represent M as a graph
over (at least) one of the n coordinate hyperplanes. We call such a set M a smooth
hypersurface or (n − 1)-dimensional manifold. More generally, a subset M ⊂ Rⁿ is an
(n − m)-dimensional manifold if each point has a neighborhood that is a C¹ graph over
some (n − m)-dimensional coordinate plane. The general version of the Implicit Function
Theorem, which we shall prove in Chapter 6, tells us that this is true whenever M is the level
set of a C¹ function F: Rⁿ → Rᵐ with the property that rank(DF(x)) = m at every point
x ∈ M. Moreover, if we generalize the result of Proposition 5.3, the (n − m)-dimensional
tangent plane of M at a point a is then obtained by translating the (n − m)-dimensional
subspace N([DF(a)]) so that it passes through a.

► EXAMPLE 3

Suppose a, b > 0. Consider the intersection M of the cylinders x² + y² = a² and x² + z² = b². We
claim that as long as a ≠ b, M is a smooth curve (1-dimensional manifold), as pictured in Figure 5.5.
If we define F: R³ → R² by

F(x, y, z) = (x² + y² − a², x² + z² − b²),

then M = F⁻¹({0}). To see that M is a 1-dimensional manifold, we check that rank(DF(x)) = 2 for
every x ∈ M. We have

Figure 5.5

DF(x) = [ 2x  2y   0 ] ,   which has the same rank as   [ x  y  0 ]
        [ 2x   0  2z ]                                   [ x  0  z ] .

If x ≠ 0, this matrix will have two pivots, since y and z can't be simultaneously 0. If x = 0, then
both y and z are nonzero, and once again the matrix has two pivots. Thus, as claimed, the rank of
DF(x) is 2 for every x ∈ M, and so M is a smooth curve. ◄
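A sketch of this rank check in Python/sympy (the radii a = 1, b = 2 and the sample point are our
own illustrative choices):

    from sympy import symbols, Matrix

    x, y, z = symbols('x y z')
    a, b = 1, 2
    F = Matrix([x**2 + y**2 - a**2,
                x**2 + z**2 - b**2])
    DF = F.jacobian([x, y, z])
    print(DF)                                   # Matrix([[2x, 2y, 0], [2x, 0, 2z]])
    print(DF.subs({x: 0, y: 1, z: 2}).rank())   # 2 at the point (0, 1, 2) of M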

► EXERCISES 4.5

1. Can one solve for one of the variables in terms of the other to express each of the following as a
graph? What about locally?
(a) xy = 0   (b) 2 sin(xy) = 1

2. Decide whether each of the following is a smooth curve (1-dimensional manifold). If not, what
are the trouble points?
(a) y² − x³ + x = 0          (d) x² + y² + z² − 1 = x² − x + y² = 0
(b) y² − x³ − x² = 0         (e) x² + y² + z² − 1 = z² − xy = 0
(c) z − xy = y − x² = 0
*3. Let

f(x, y, z) = xy² + sin(xz) + e^z   and   a = (1, −1, 0).

(a) Show that the equation f = 2 defines z as a C¹ function z = φ(x, y) near a.
(b) Find ∂φ/∂x(1, −1) and ∂φ/∂y(1, −1).
(c) Find the equation of the tangent plane of the surface f⁻¹({2}) at a in two ways.

4. Suppose h: R² → R is C¹ and ∂h/∂x₂ ≠ 0. Show that the equation h(x/z, y/z) = 0 defines z (locally)
implicitly as a C¹ function z = φ(x, y), and show that

x ∂φ/∂x + y ∂φ/∂y = φ(x, y).

5. Prove that S^{n−1} = {x ∈ Rⁿ : ||x|| = 1} is an (n − 1)-dimensional manifold. (Hint: Note that
||x|| = 1 ⟺ ||x||² = 1.)

*6. Let f: R³ → R be given by

f(x, y, z) = z² + 4x³z − 6xyz + 4y³ − 3x²y².

Is M = f⁻¹({0}) a smooth surface (2-dimensional manifold)? If not, at what points does it fail to
be so?

7. Show that the intersection of the surfaces x² + 2y² + 3z² = 9 and x² + y² = z² is a smooth curve.
Find its tangent line at the point a = (1, 1, √2).

8. Investigate what happens in Example 3 when a = b.


9. Show that the set of nonzero singular 2 × 2 matrices is a 3-dimensional manifold in M₂ₓ₂ ≅ R⁴.

10. Consider the curve f(x, y) = 4y³ − 3y − x = 0.
(a) Sketch the curve.
(b) Check that y is given (locally) by the following C¹ functions of x on the given intervals:
φ₁(x) = ½((x + √(x² − 1))^{1/3} + (x + √(x² − 1))^{−1/3}),  x ∈ (1, ∞);
φ₂(x) = cos(⅓ arccos x),  x ∈ (−1, 1).
Give the remaining functions (two defined on (−1, 1), one on (−∞, −1)).
(c) Show that the function φ: (−1, ∞) → R defined by

φ(x) = { φ₂(x),  x ∈ (−1, 1)
       { 1,      x = 1
       { φ₁(x),  x ∈ (1, ∞)

is C¹ and that the value of φ′(1) agrees with that given by Lemma 5.2.
*11. Let M = {x ∈ R⁴ : x₁² + x₂² + x₃² + x₄² = 1, x₁x₂ = x₃x₄}.
(a) Show that M is a smooth surface (2-dimensional manifold).
(b) Find the tangent plane of M at a = (1, 0, 0, 0) and at a = (1/2, −1/2, −1/2, 1/2).

12. Suppose f: R³ → R is C² and ∂f/∂z(a) ≠ 0, so that f = 0 defines z implicitly as a C² function φ
of x and y near a. Show that

∂²φ/∂x² = − ( (∂²f/∂x²)(∂f/∂z)² − 2(∂²f/∂x∂z)(∂f/∂x)(∂f/∂z) + (∂²f/∂z²)(∂f/∂x)² ) / (∂f/∂z)³,

where all the partial derivatives on the right-hand side are evaluated at a.
13. Consider the three (pairwise) skew lines

ℓ₁: x = (1, 0, 0) + t(…);   ℓ₂: x = (0, 1, 0) + t(1, 0, 1);   ℓ₃: x = … .

Show that through each point of ℓ₃ there is a single line that intersects both ℓ₁ and ℓ₂. Now, find
the equation of the surface formed by all the lines intersecting the three lines ℓ₁, ℓ₂, and ℓ₃. Is it
everywhere smooth? Sketch it.
14. Suppose X ⊂ Rⁿ is a k-dimensional manifold and Y ⊂ Rᵖ is an ℓ-dimensional manifold. Prove
that

X × Y = {(x, y) ∈ Rⁿ⁺ᵖ : x ∈ X and y ∈ Y}

is a (k + ℓ)-dimensional manifold in Rⁿ⁺ᵖ. (Hint: Recall that X is locally a graph over a k-dimensional
coordinate plane in Rⁿ and Y is locally a graph over an ℓ-dimensional coordinate plane in Rᵖ.)

15. (a) Suppose A is an n × (n + 1) matrix of rank n. Show that the 1-dimensional solution space
of Ax = b varies continuously with b ∈ Rⁿ. (First you must decide what this means!)
(b) Generalize.
CHAPTER 5

EXTREMUM PROBLEMS
In this chapter we turn to one of the standard topics in differential calculus, solving
maximum/minimum problems. In single-variable calculus, the strategy is to invoke the
Maximum Value Theorem (which guarantees that a continuous function on a closed interval
achieves its maximum and minimum) and then to examine all critical points and the
endpoints of the interval. In problems that are posed on open intervals, one must work harder
to understand the global behavior of the function. For example, it is not too hard to prove
that if a differentiable function has precisely one critical point on an interval and that critical
point is a local maximum point, then it must indeed be the global maximum point. As we
shall see, all of these issues are—not surprisingly—rather more subtle in higher dimensions.
But just to stimulate the reader's geometric intuition, we pose a direct question here.

Query: Suppose f: R² → R is C¹ and there is exactly one point a at which the tangent plane
of the graph of f is horizontal. Suppose a is a local minimum point. Must it be a global
minimum point?

We close the chapter with a discussion of projections and inconsistent linear systems, along
with a brief treatment of inner product spaces.

> 1 COMPACTNESS AND THE MAXIMUM VALUE THEOREM


In Section 2 of Chapter 2 we introduced the basic topological notions of open and closed
sets and sequences. Here we return to a few more questions of the topology of R" in order
to frame the higher-dimensional version of the Maximum Value Theorem. Let’s begin by
reminding ourselves why a closed interval is needed in the case of a continuous function of
one variable: As Figure 1.1 illustrates, when an endpoint is missing or the interval extends to

196
1 Compactness and the Maximum Value Theorem ◄ 197

infinity, the function may have no maximum value. We now make the “obvious” definition
in higher dimensions:

Definition We say S ⊂ Rⁿ is bounded if all the points of S lie in some ball centered
at the origin, i.e., if there is a constant M so that ||x|| ≤ M for all x ∈ S. We say S ⊂ Rⁿ
is compact if it is a bounded, closed subset. That is, all the points of S lie in some ball
centered at the origin, and any convergent sequence of points in S converges to a point
in S.

► EXAMPLE 1

We saw in Example 6 of Chapter 2, Section 2, that a closed interval in R is a closed subset, and it is
obviously bounded, so it is in fact compact. Here are a few more examples.
a. The unit sphere S^{n−1} = {x ∈ Rⁿ : ||x|| = 1} is compact. Indeed, by Corollary 3.7 of Chapter
2, any level set of a continuous function is closed, so provided we have a bounded set, it will
also be compact. (Note that we write S^{n−1} because the sphere is an (n − 1)-dimensional
manifold, as Exercise 4.5.5 shows.)
b. Any rectangle [a₁, b₁] × ⋯ × [aₙ, bₙ] ⊂ Rⁿ is compact. This set is obviously bounded,
and it is closed because of Exercise 2.2.4.
c. The set of 2 × 2 matrices of determinant 1 is a closed subset of R⁴ (because the determinant
is a polynomial expression in the entries of the matrix) but is not compact. The set is
unbounded, as we can take matrices of the form

[ k    0  ]
[ 0   1/k ]

for arbitrarily large k. ◄

One of the most important features of a compact set is the following:

Theorem 1.1 If A ⊂ Rⁿ is compact, and {a_k} is a sequence of points in A, then there
is a convergent subsequence {a_{k_j}} (which a fortiori converges to a point in A).

Proof We first prove that any sequence of points in a rectangle [a₁, b₁] × ⋯ ×
[aₙ, bₙ] ⊂ Rⁿ has a convergent subsequence. (This was the result of Exercise 2.2.15, but
the argument is sufficiently subtle that we include the proof here.) We proceed by induction
on n.
Step (i): Suppose n = 1. Given a sequence {x_k} of real numbers with a ≤ x_k ≤ b
for all k, we claim that there is a convergent subsequence. If there are only finitely many
distinct numbers x_k, this is easy: At least one value must be taken on infinitely often, and
we choose k₁ < k₂ < ⋯ so that x_{k₁} = x_{k₂} = ⋯.
If there are infinitely many distinct numbers among the x_k, then we use the famous
"successive bisection" argument. Let I₀ = [a, b]. There must be infinitely many distinct
elements of our sequence either to the left of the midpoint of I₀ or to the right; let I₁ = [a₁, b₁]
be the half that contains infinitely many (if both do, let's agree to choose the left half).
Choose x_{k₁} ∈ I₁. At the next step, there must be infinitely many distinct elements of our
sequence either to the left or to the right of the midpoint of I₁. Let I₂ = [a₂, b₂] be the half
that contains infinitely many (and choose the left half if both do), and choose x_{k₂} ∈ I₂ with
k₁ < k₂. Continue this process inductively. Suppose we have the interval I_j = [a_j, b_j]

containing infinitely many distinct elements of our sequence, as well as k₁ < k₂ < ⋯ < k_j
with x_{k_t} ∈ I_t for t = 1, 2, ..., j. Then there must be infinitely many distinct elements of
our sequence either to the left or to the right of the midpoint of the interval I_j, and we let
I_{j+1} = [a_{j+1}, b_{j+1}] be the half that contains infinitely many (once again choosing the left
half if both do). We also choose x_{k_{j+1}} ∈ I_{j+1} with k_j < k_{j+1}.
At the end of all this, why does the subsequence {x_{k_j}} converge? Well, in fact, we
know what its limit must be. The set of left endpoints a_j is nonempty and bounded above
by b, hence has a least upper bound, a. First of all, the left endpoints a_j must converge to
a, because (see Figure 1.2)

a₁ ≤ a₂ ≤ ⋯ ≤ a_j ≤ ⋯ ≤ a ≤ ⋯ ≤ b_j ≤ ⋯ ≤ b₂ ≤ b₁,

and so a − a_j ≤ b_j − a_j = (b − a)/2^j → 0 as j → ∞. But since a and x_{k_j} both lie in
the interval [a_j, b_j], it follows that |a − x_{k_j}| ≤ b_j − a_j → 0 as j → ∞.

Figure 1.2

Step (ii): Suppose now n ≥ 2 and we know the result to be true in Rⁿ⁻¹. We introduce
some notation: Given x = (x₁, ..., xₙ) ∈ Rⁿ, we write x̄ = (x₁, ..., x_{n−1}) ∈ Rⁿ⁻¹. Given a sequence
{x_k} of points in the rectangle [a₁, b₁] × ⋯ × [aₙ, bₙ] ⊂ Rⁿ, consider the sequence {x̄_k} of
points in the rectangle [a₁, b₁] × ⋯ × [a_{n−1}, b_{n−1}] ⊂ Rⁿ⁻¹. By our induction hypothesis,
there is a convergent subsequence {x̄_{k_j}}. Now the sequence of nth coordinates of the
corresponding vectors x_{k_j}, lying in the closed interval [aₙ, bₙ], has in turn a convergent
subsequence, indexed by k_{j₁} < k_{j₂} < ⋯ < k_{j_t} < ⋯. But then, by Exercises 2.2.6 and
2.2.2, it now follows that the subsequence {x_{k_{j_t}}} converges, as required.
Step (iii): Now we turn to the case of our general compact subset A. Since it is
bounded, it is contained in some ball B(0, R) centered at the origin, hence in some cube
[−R, R] × ⋯ × [−R, R]. Thus, given a sequence {x_k} of points in A, it lies in this cube,
and hence by what we've already proved has a convergent subsequence. The limit of that
subsequence is, of course, a point of the cube but must in fact lie in A since A is also closed.
This completes the proof. ■

The result that is the cornerstone of our work in this chapter is the following:

Theorem 1.2 (Maximum Value Theorem) Let X ⊂ Rⁿ be compact, and let
f: X → R be a continuous function.¹ Then f takes on its maximum and minimum values;
that is, there are points y and z ∈ X so that

f(y) ≤ f(x) ≤ f(z)  for all x ∈ X.

Proof First we show that f is bounded (by which we mean that the set of its values
is a bounded subset of R). Assume to the contrary that the values of f are arbitrarily large.
Then for each k ∈ N there is a point x_k ∈ X so that f(x_k) > k. By Theorem 1.1, since
X is compact, the sequence {x_k} has a convergent subsequence, say, x_{k_j} → a. Since f is
continuous, by Proposition 3.6 of Chapter 2, f(a) = lim_{j→∞} f(x_{k_j}), but this is impossible
since f(x_{k_j}) → ∞ as j → ∞. An identical argument shows that the values of f are
bounded below as well.
Since the set of values of f is bounded above, it has a least upper bound, M. By the
definition of least upper bound, for each k ∈ N there is x_k ∈ X so that M − f(x_k) < 1/k. As
before, since X is compact, the sequence {x_k} has a convergent subsequence, say, x_{k_j} → z.
Then, by continuity, f(z) = lim_{j→∞} f(x_{k_j}) = M, so f takes on its maximum value at z. An
identical argument shows that f takes on its minimum value as well. ■

We infer from Theorem 1.2 that, given any linear map T: Rⁿ → Rᵐ, the function

f: S^{n−1} → R,   f(x) = ||T(x)||,

is continuous (see Exercises 2.3.2 and 2.3.7 and Proposition 3.5 of Chapter 2). Therefore,
f takes on its maximum value, which we denote by ||T||, called the norm of T:

||T|| = max_{||x||=1} ||T(x)||.

Since T is linear, the following formula follows immediately:

Proposition 1.3 Let T: Rⁿ → Rᵐ be a linear map. Then for any x ∈ Rⁿ, we have

||T(x)|| ≤ ||T|| ||x||.

Moreover, for any scalar c we have ||cT|| = |c| ||T||; and if S: Rⁿ → Rᵐ is another linear
map, we have ||S + T|| ≤ ||S|| + ||T||.

¹ Although we have not heretofore defined continuity of a function defined on an arbitrary subset of Rⁿ, there is
no serious problem. We say f: X → R is continuous at a ∈ X if, given any ε > 0, there is δ > 0 so that

|f(x) − f(a)| < ε  whenever ||x − a|| < δ and x ∈ X.



Proof There is nothing to prove when x = 0. When x ≠ 0, we have

||T(x/||x||)|| ≤ ||T||,

by definition of the norm, and so, using the linearity of T, we have

||T(x)|| = ||x|| ||T(x/||x||)|| ≤ ||T|| ||x||,

as required.
That max_{||x||=1} ||cT(x)|| = |c| max_{||x||=1} ||T(x)|| = |c| ||T|| is evident. Now, last, since

||(S + T)(x)|| ≤ ||S(x)|| + ||T(x)||,

we have

max_{||x||=1} ||(S + T)(x)|| ≤ max_{||x||=1} (||S(x)|| + ||T(x)||)
                            ≤ max_{||x||=1} ||S(x)|| + max_{||x||=1} ||T(x)|| = ||S|| + ||T||. ■

We will compute a few nontrivial examples of the norm of a linear map in the Exercises
of Section 4, but in the meantime we have the following.

► EXAMPLE 2

Let A be an n × n diagonal matrix, with diagonal entries d₁, ..., dₙ. Then for any x ∈ S^{n−1} we have

||Ax||² = (d₁x₁)² + (d₂x₂)² + ⋯ + (dₙxₙ)²
        ≤ max(d₁², ..., dₙ²)(x₁² + ⋯ + xₙ²) = max(d₁², d₂², ..., dₙ²).

Note, moreover, that this maximum value is achieved, for if max(|d₁|, ..., |dₙ|) = |d_i|, then
Ae_i = d_ie_i and ||Ae_i|| = |d_i|. Thus, we conclude that

||A|| = max(|d₁|, |d₂|, ..., |dₙ|). ◄
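This is easy to corroborate numerically, since np.linalg.norm(A, 2) computes exactly
max_{||x||=1} ||Ax|| (a sketch with diagonal entries of our own choosing):

    import numpy as np

    A = np.diag([3., -5., 2.])
    print(np.linalg.norm(A, 2))     # 5.0 = max(|3|, |-5|, |2|)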

For future reference, we include the following important and surprising result.

Theorem 1.4 (Uniform Continuity Theorem) Let X ⊂ Rⁿ be compact and let
f: X → R be continuous. Then f is uniformly continuous; i.e., given ε > 0, there is
δ > 0 so that whenever ||x − y|| < δ, x, y ∈ X, we have |f(x) − f(y)| < ε.

Proof We argue by contradiction. Suppose that for some ε₀ > 0 there were no such
δ > 0. Then for every m ∈ N, we could find x_m, y_m ∈ X with ||x_m − y_m|| < 1/m and
|f(x_m) − f(y_m)| ≥ ε₀. Since X is compact, we may choose a convergent subsequence
x_{m_k} → a. Now since ||x_m − y_m|| → 0 as m → ∞, it must be the case that y_{m_k} → a as well.
Since f is continuous at a, there is δ₀ > 0 so that whenever ||x − a|| < δ₀,
we have |f(x) − f(a)| < ε₀/2. By the triangle inequality, whenever k is sufficiently large
that ||x_{m_k} − a|| < δ₀ and ||y_{m_k} − a|| < δ₀, we have

|f(x_{m_k}) − f(y_{m_k})| ≤ |f(x_{m_k}) − f(a)| + |f(y_{m_k}) − f(a)| < ε₀,

contradicting our hypothesis that |f(x_m) − f(y_m)| ≥ ε₀ for all m. ■

► EXERCISES 5.1

*1. Which of the following are compact subsets of the given Rⁿ? Give your reasoning. (Identify the
space of all n × n matrices with Rⁿ².)

(a) {(x, y) ∈ R² : x² + y² = 1}
(b) {(x, y) ∈ R² : x² + y² < 1}
(c) {(x, y) ∈ R² : x² − y² = 1}
(d) …
(e) {(x, y) ∈ R² : y = sin(1/x) for some 0 < x ≤ 1}
(f) {(cos t, sin t) ∈ R² : t ∈ R}
(g) {(eᵗ cos t, eᵗ sin t) ∈ R² : t < 0}
(h) {(x, y, z) ∈ R³ : x² + y² + z² ≤ 1}
(i) {(x, y, z) ∈ R³ : x³ + y³ + z³ ≤ 1}
(j) {3 × 3 matrices A : det A = 1}
(k) {2 × 2 matrices A : AᵀA = I}
(l) {3 × 3 matrices A : AᵀA = I}
2. If X ⊂ Rⁿ is not compact, then show that there is an unbounded continuous function f: X → R.

3. Let T: Rⁿ → R be a linear map. Prove that there is a vector a ∈ Rⁿ so that T(x) = a · x, and
deduce that ||T|| = ||a||.

4. Find ||A|| if

(a) A = [ 1  1 ] ;   (b) A = [ 3  4 ]
        [ 1  1 ]             [ 3  4 ]

♯5. Suppose A is an m × n matrix. Prove that ||A|| ≤ √(Σ_{i,j} a_{ij}²).

♯6. Suppose T: Rⁿ → Rᵐ and S: Rᵐ → Rˡ are linear maps. Show that ||S∘T|| ≤ ||S|| ||T||. (In
particular, when A is an ℓ × m matrix and B is an m × n matrix, we have ||AB|| ≤ ||A|| ||B||.)

7. Let A be an m × n matrix. Show that ||A|| = ||Aᵀ||. (Hint: Start by showing that ||A|| ≤ ||Aᵀ|| by
using Proposition 4.5 of Chapter 1.)

8. Suppose S ⊂ Rⁿ is compact and a ∈ Rⁿ is fixed. Show that there is a point of S closest to a. (Hint:
Use Exercise 2.3.2.)

*9. Suppose S ⊂ Rⁿ has the property that any sequence of points in S has a subsequence converging
to a point in S. Prove that S is compact.

10. Suppose f: X → Rᵐ is continuous and X is compact. Prove that the set f(X) = {y ∈ Rᵐ :
y = f(x) for some x ∈ X} is compact. (Hint: Use Exercise 9.)

11. Suppose S₁ ⊃ S₂ ⊃ S₃ ⊃ ⋯ are nonempty compact subsets of Rⁿ. Prove that there is x ∈ Rⁿ so
that x ∈ S_k for all k ∈ N. (Cf. Exercise 2.2.10.)

♯12. Suppose X ⊂ Rⁿ is a compact set. Suppose U₁, U₂, U₃, ... ⊂ Rⁿ are open sets whose union
contains X. Prove that for some N ∈ N we have X ⊂ U₁ ∪ ⋯ ∪ U_N. (Hint: If not, for each k, choose
x_k ∈ X so that x_k ∉ U₁ ∪ ⋯ ∪ U_k.)

13. Suppose X ⊂ Rⁿ is compact, and U₁, U₂, U₃, ... ⊂ Rⁿ are open sets whose union contains X.
Prove that there is a number δ > 0 so that for every x ∈ X, there is some j ∈ N so that B(x, δ) ⊂ U_j.
(Hint: If not, for each k ∈ N, what happens with δ = 1/k?)

> 2 MAXIMUM/MINIMUM PROBLEMS


Definition Let X ⊂ Rⁿ, and let a ∈ X. The function f: X → R has a global max-
imum at a if f(x) ≤ f(a) for all x ∈ X; the function f has a local maximum at a if, for
some δ > 0, we have f(x) ≤ f(a) for all x ∈ B(a, δ) ∩ X. We say a is a (local or global)
maximum point of f.
Analogously, f has a global minimum at a if f(x) ≥ f(a) for all x ∈ X; the function f
has a local minimum at a if, for some δ > 0, we have f(x) ≥ f(a) for all x ∈ B(a, δ) ∩ X.
We say a is a (local or global) minimum point of f.
If a is either a local maximum or local minimum point, we say it is an extremum.

We begin with a somewhat silly example:

► EXAMPLE 1

If f(x) = { 1, x ∈ Q; 0, x ∉ Q }, then every point a ∈ Q is a global maximum point and every
point a ∉ Q is a global minimum point.

Now for something a bit more substantial.

► EXAMPLE 2

Let f: R² → R be defined by f(x, y) = x² + 2xy + 3y². Then

f(x, y) = (x + y)² + 2y² ≥ 0.

From this we infer that 0 is a global minimum point. Indeed, (x + y)² + 2y² = 0 if and only if
x + y = y = 0, if and only if x = y = 0, so 0 is the only global minimum point of f. But is 0 the
only extremum?

Lemma 2.1 Suppose f is defined on some neighborhood of the extremum a and f is
differentiable at a. Then Df(a) = 0 (or, equivalently, ∇f(a) = 0).

Proof Suppose that a is a local minimum (the case of a local maximum is left to the
reader). Then for any v ∈ Rⁿ, there is δ > 0 so that we have

f(a + tv) − f(a) ≥ 0  for all real numbers t with |t| < δ.

This means that

lim_{t→0⁺} (f(a + tv) − f(a))/t ≥ 0   and   lim_{t→0⁻} (f(a + tv) − f(a))/t ≤ 0.

Since f is differentiable at a, the directional derivative D_v f(a) exists, and so we must have

Df(a)v = D_v f(a) = lim_{t→0} (f(a + tv) − f(a))/t = 0.

Since v is arbitrary, we infer that Df(a) = 0. ■

Remark Geometrically, if we consider f as a function of x_i only, fixing all the other
variables, we get a curve with a local minimum at a_i, which must therefore have a flat
tangent line. That is, all partial derivatives of f at a must be 0, and so the tangent plane
must be horizontal.

Definition Suppose f is differentiable at a. We say a is a critical point if Df(a) = 0.
A critical point a with the property that f(x) < f(a) for some x near a and f(x) > f(a)
for other x near a is called a saddle point.

In Section 3 we will devise a second-derivative test to attempt to distinguish among
local maxima, local minima, and saddle points, typical ones of which are shown in Figure
2.1. In the sketch in Figure 2.2(a), we cannot tell whether we are at a local maximum or a
local minimum; however, in (b) and (c) we strongly suspect a saddle point.

► EXAMPLE 3

The prototypical example of a saddle point is provided by the function f(x, y) = x² − y². The origin
is a critical point, and clearly f(x, 0) > 0 for x ≠ 0 and f(0, y) < 0 for y ≠ 0. In the graph we see
parabolas opening upward in the x-direction and those opening downward in the y-direction (see
Figure 2.3(a)).
A somewhat more interesting example is provided by the so-called monkey saddle, pictured in
Figure 2.3(b), which is the graph of f(x, y) = 3xy² − x³. Note that whereas the usual saddle surface
allows room for the legs, in the case of the monkey saddle there is also room for the monkey's tail. ◄

Figure 2.3

Now we turn to the standard fare in differential calculus, the typical “applied extremum
problems.” If we are fortunate enough to have a differentiable function on a compact
region X, then the Maximum Value Theorem guarantees both a global maximum and a
global minimum, and we can test for critical points on the interior of X (points having
a neighborhood wholly contained in X). It still remains to examine the function on the
boundary of X, as well.

► EXAMPLE 4

We want to find the hottest and coldest points on the metal plate R = [0, π] × [0, π], whose temper-
ature is given by f(x, y) = sin x + cos 2y. Since f is continuous and R is compact, we know the
global maximum and minimum exist. We find that

Df(x, y) = [ cos x   −2 sin 2y ],

and so the only critical point in the interior of R is (π/2, π/2). The boundary of R consists of four
line segments, as indicated in Figure 2.4. On C₁ and C₃ we have f(x, 0) = f(x, π) = sin x + 1,
x ∈ [0, π], which achieves a maximum at π/2 and minima at 0 and π. Similarly, on C₂ and C₄
we have f(0, y) = f(π, y) = cos 2y, y ∈ [0, π], which achieves its maximum at 0 and π and its
minimum at π/2. We now mark the values of f at the nine points we've unearthed. We see that the
hottest points are (π/2, 0) and (π/2, π), and the coldest points are (0, π/2) and (π, π/2). On the other
hand, the critical point at the center of the square is a saddle point (why?). ◄
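A brute-force numerical corroboration (a Python/numpy sketch of ours), evaluating f on a fine
grid over R:

    import numpy as np

    t = np.linspace(0, np.pi, 201)
    X, Y = np.meshgrid(t, t)
    F = np.sin(X) + np.cos(2 * Y)

    hot = np.unravel_index(F.argmax(), F.shape)
    cold = np.unravel_index(F.argmin(), F.shape)
    print(F.max(), X[hot], Y[hot])     # 2.0 at x = pi/2, y = 0 (also y = pi)
    print(F.min(), X[cold], Y[cold])   # -1.0 at x = 0 (also x = pi), y = pi/2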

Somewhat more challenging are extremum problems where the domain is not naturally
compact. Consider the following

► EXAMPLE 5

Of all rectangular boxes with no lid and having a volume of 4 m³, we wish to determine the dimensions
of the one with least total surface area. Let x, y, and z represent the length, width, and height of the
box, respectively, measured in meters (see Figure 2.5). Given that xyz = 4, we wish to minimize the
surface area xy + 2z(x + y). Substituting z = 4/xy, we then define the surface area as a function of
the independent variables x and y:

f(x, y) = xy + (8/xy)(x + y) = xy + 8(1/x + 1/y).

Note that the domain of f is the open first quadrant, i.e., X = {(x, y) : x > 0 and y > 0}, which is
definitely not compact. What guarantees that our function f achieves a minimum value on X? (Note,
for example, that f has no maximum value on X.) The heuristic answer is this: If either x or y gets
either very small or very large, the value of f gets very large. We shall make this precise soon.
Let's first of all find the critical points of f. We have

Df(x, y) = [ y − 8/x²   x − 8/y² ],

so at a critical point we must have

y − 8/x² = x − 8/y² = 0,

whence x = y = 2. The sole critical point is a = (2, 2), and f(a) = 12. Now it is not difficult to
establish the fact that a is the global minimum point of f. Let

S = {(x, y) : x ≥ 1/3, y ≥ 1/3, xy ≤ 12},

as in Figure 2.5(b). Then S is compact, so the restriction of f to the set S attains its global minimum
value. Here is the crucial point: Whenever (x, y) is on the boundary of or outside S, we have
f(x, y) > 12. (For if either 0 < x < 1/3 or 0 < y < 1/3, then we have f(x, y) > 8(1/x + 1/y) > 24;
and if xy ≥ 12, then we have f(x, y) > 12.) Since f(a) = 12, it follows that the global minimum of f
on S cannot occur on the boundary of S, hence must occur at an interior point, and therefore at a
critical point of f. It follows that a is the global minimum point of f on S, hence on all of X, since
f(x) > f(a) whenever x ∉ S.
In summary, the box of the least surface area has the dimensions 2 m × 2 m × 1 m. ◄
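For corroboration, here is a numerical sketch (Python/scipy; the starting point and bounds are our
own choices):

    import numpy as np
    from scipy.optimize import minimize

    def surface_area(v):
        x, y = v
        return x * y + 8.0 * (1.0 / x + 1.0 / y)   # xy + 2z(x + y) with z = 4/(xy)

    res = minimize(surface_area, x0=[1.0, 1.0],
                   method='L-BFGS-B', bounds=[(1e-3, None)] * 2)
    print(res.x, res.fun)                          # approximately [2. 2.] and 12.0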

In general, when confronted with a maximum/minimum problem on a noncompact
region, we must usually be somewhat resourceful—either with such estimates or with
algebraic or geometric arguments.

► EXERCISES 5.2

1. Find all the critical points of the following scalar functions:

*(a) f(x, y) = x² + 3x − 2y² + 4y
(b) f(x, y) = xy + x − y
(c) f(x, y) = sin x + sin y
(d) f(x, y) = x² − 3x²y + y³
(e) f(x, y) = x²y + x³ − x² + y²
(f) f(x, y) = (x² + y²)e^{−y}
(g) f(x, y) = (x − y)e^{−(x²+y²)/4}
(h) f(x, y) = x²y − 4xy − y²
*(i) f(x, y, z) = xyz − x² − y² + z²
(j) f(x, y, z) = x³ + xz² − 3x² + y² + 2z²
(k) f(x, y, z) = e^{−(x²+y²+z²)/6}(x − y + z)
(l) f(x, y, z) = xyz − x² − y² − z²

2. A rectangular box with edges parallel to the coordinate axes has one corner at the origin and the
opposite corner on the plane x + 2y + 3z = 6. What is the maximum possible volume of the box?

*3. A rectangular box is inscribed in a hemisphere of radius r. Find the dimensions of the box of
maximum volume.

*4. The temperature of the circular plate D = {x : ||x|| ≤ √2} ⊂ R² is given by the function f(x, y) =
x² + 2y² − 2x. Find the maximum and minimum values of the temperature on D.

5. Two non-overlapping rectangles with their sides parallel to the coordinate axes are inscribed in
the triangle with vertices at (0, 0), (1, 0), and (0, 1). What configuration will maximize the sum of
their areas?

*6. A post office employee has 12 ft² of cardboard from which to construct a rectangular box with no
lid. Find the dimensions of the box with the largest possible volume.

7. Show that the rectangular box of maximum volume with a given surface area is a cube.

8. The material for the sides of a rectangular box costs twice as much per ft² as that for the top and
bottom. Find the relative dimensions of the box with greatest volume that can be constructed for a
given cost.

9. Find the equation of the plane through the point (1, 2, 2) that cuts off the smallest possible volume
in the first octant.

*10. A long, flat piece of sheet metal, 12″ wide, is to be bent to form a long trough with cross sections
an isosceles trapezoid. Find the shape of the trough with maximum cross-sectional area. (Hint: It
will help to use an angle as one of your variables.)

11. A pentagon is formed by placing an isosceles triangle atop a rectangle. If the perimeter P of
the pentagon is fixed, find the dimensions of the rectangle and the height of the triangle that give the
pentagon of maximum area.

12. An ellipse is formed by intersecting the cylinder x² + y² = 1 and the plane x + 2y + z = 0. Find
the highest and lowest points on the ellipse. (As usual, the z-axis is vertical.)

13. Suppose x, y, and z are positive numbers with xy²z³ = 108. Find (with proof) the minimum
value of their sum.

14. Let a₁, ..., a_k ∈ Rⁿ be fixed points. Show that the function

f(x) = Σ_{j=1}^{k} ||x − a_j||²

has a global minimum and find the global minimum point.

15. (Cf. Exercise 14.) Let a₁, a₂, a₃ ∈ R² be three noncollinear points. Show that the function

f(x) = Σ_{j=1}^{3} ||x − a_j||

has a global minimum and characterize the global minimum point. (Hint: Your answer will be
geometric in nature. Can you give an explicit geometric construction?)

> 3 QUADRATIC FORMS AND THE SECOND DERIVATIVE TEST


Just as the second derivative test in single-variable calculus often allows us to differen­
tiate between local minima and local maxima, there is something quite analogous in the
multivariable case, to which we now turn. Of course, even with just one variable, if
f'(a) = f'(a) — 0, we do not have enough information, and we need higher derivatives to
infer the local behavior of f at a; lying behind this is the theory of the Taylor polynomial,
which works analogously in the multivariable case. In the interest of time, however, we
shall content ourselves here with just the second derivative.
First, we need a one-variable generalization of the Mean Value Theorem. (In truth, it
is Taylor’s Theorem with Remainder for the first-degree Taylor polynomial. See Chapter
20 of Spivak.)

Lemma 3.1 Suppose g: [0, 1] → R is twice differentiable. Then

g(1) = g(0) + g′(0) + ½g″(ξ)  for some 0 < ξ < 1.

Proof Define the polynomial P by P(t) = g(0) + g′(0)t + Ct², where C = g(1) −
g(0) − g′(0). This choice of C makes P(1) = g(1), and it is easy to see that P(0) = g(0)
and P′(0) = g′(0) as well, as shown in Figure 3.1. Then the function h = g − P satisfies
h(0) = h′(0) = h(1) = 0. By Rolle's Theorem, since h(0) = h(1) = 0, there is c ∈ (0, 1)
so that h′(c) = 0. By Rolle's Theorem applied to h′, since h′(0) = h′(c) = 0, there is
ξ ∈ (0, c) so that h″(ξ) = 0. This means that g″(ξ) = P″(ξ) = 2C, and so

g(1) = P(1) = g(0) + g′(0) + ½g″(ξ),

as required. ■

Figure 3.1

The derivative in the multivariable setting becomes a linear map (or vector); as we shall
soon see, the second derivative should become a quadratic form, i.e., a quadratic function
of a vector variable.

Definition Assume f is a C² function in a neighborhood of a. Define the symmetric
matrix

Hess(f)(a) = [ ∂²f/∂x_i∂x_j (a) ] = [ ∂²f/∂x₁²(a)     ⋯  ∂²f/∂x₁∂xₙ(a) ]
                                    [      ⋮                  ⋮         ]
                                    [ ∂²f/∂xₙ∂x₁(a)   ⋯  ∂²f/∂xₙ²(a)    ] .

Hess(f)(a) is called the Hessian matrix of f at a. Define the associated quadratic form
H_{f,a}: Rⁿ → R by

H_{f,a}(h) = hᵀ(Hess(f)(a))h = Σ_{i,j=1}^{n} ∂²f/∂x_i∂x_j(a) h_i h_j.

Now we are in a position to state the generalization of Lemma 3.1 to functions of
several variables. This will enable us to deduce the appropriate second derivative test for
extrema.

Proposition 3.2 Suppose f: B(a, r) → R is C². Then for all h with ||h|| < r we
have

f(a + h) = f(a) + Df(a)h + ½H_{f,a+ξh}(h)  for some 0 < ξ < 1.

Consequently,

f(a + h) = f(a) + Df(a)h + ½H_{f,a}(h) + ε(h),  where ε(h)/||h||² → 0 as h → 0.

Remark Just as the derivative gives the best linear approximation to f at a, so
adding the quadratic term ½H_{f,a}(h) gives the best possible quadratic approximation to
f at a. This is the second-degree Taylor polynomial of f at a. For further reading on
multivariable Taylor polynomials, consult, e.g., Edwards's Advanced Calculus of Several
Variables or Hubbard and Hubbard's Vector Calculus, Linear Algebra, and Differential
Forms: A Unified Approach.

Proof We apply Lemma 3.1 to the function g(t) = f(a + th). Using the chain rule
twice (and applying Theorem 6.1 of Chapter 3 as well), we have

g′(t) = Df(a + th)h = Σ_{i=1}^{n} ∂f/∂x_i(a + th)h_i,

g″(t) = Σ_{i=1}^{n} ( Σ_{j=1}^{n} ∂²f/∂x_j∂x_i(a + th)h_j ) h_i = Σ_{i,j=1}^{n} ∂²f/∂x_i∂x_j(a + th)h_ih_j
      = H_{f,a+th}(h).

Now substitution yields the first result.
Since f is C², given any ε > 0, there is δ > 0 so that whenever ||v|| < δ we have

||Hess(f)(a + v) − Hess(f)(a)|| < ε.

Using the Cauchy-Schwarz inequality, Proposition 2.3 of Chapter 1, and Proposition 1.3,
we find that |hᵀAh| ≤ ||A|| ||h||². So whenever ||h|| < δ, we have, for any 0 < ξ < 1,

|H_{f,a+ξh}(h) − H_{f,a}(h)| ≤ ε||h||².

By definition, ε(h) = ½(H_{f,a+ξh}(h) − H_{f,a}(h)), so

|ε(h)|/||h||² = |H_{f,a+ξh}(h) − H_{f,a}(h)|/(2||h||²) ≤ ε/2

whenever ||h|| < δ. Since ε > 0 was arbitrary, this proves the result. ■

Definition Given a symmetric n × n matrix A, we say the associated quadratic form
Q: Rⁿ → R, Q(x) = xᵀAx, is

positive definite if Q(x) > 0 for all x ≠ 0,
negative definite if Q(x) < 0 for all x ≠ 0,
positive semidefinite if Q(x) ≥ 0 for all x and = 0 for some x ≠ 0,
negative semidefinite if Q(x) ≤ 0 for all x and = 0 for some x ≠ 0, and
indefinite if Q(x) > 0 for some x and Q(x) < 0 for other x.

► EXAMPLE 1

a. The quadratic form Q(x) = x₁² + 4x₁x₂ + 5x₂² = xᵀ [ 1  2 ] x is positive definite, as we see
                                                      [ 2  5 ]
by completing the square:

x₁² + 4x₁x₂ + 5x₂² = (x₁ + 2x₂)² + x₂²,

which, being the sum of two squares (with positive coefficients), is nonnegative and can vanish
only if x₂ = x₁ + 2x₂ = 0, i.e., only if x = 0.

b. The quadratic form Q(x) = x₁² + 2x₁x₂ − x₂² = xᵀ [ 1   1 ] x is indefinite, as we can see
                                                     [ 1  −1 ]
either by completing the square or merely by observing that Q((t, 0)) = t² > 0 and Q((0, t)) =
−t² < 0 for t ≠ 0.

c. The quadratic form Q(x) = x₁² + 2x₁x₂ + 2x₂² + 2x₁x₃ + 2x₃² = xᵀ [ 1  1  1 ] x is, how-
                                                                    [ 1  2  0 ]
                                                                    [ 1  0  2 ]
ever, positive semidefinite, for

x₁² + 2x₁x₂ + 2x₂² + 2x₁x₃ + 2x₃² = (x₁ + x₂ + x₃)² + x₂² − 2x₂x₃ + x₃²
                                  = (x₁ + x₂ + x₃)² + (x₂ − x₃)² ≥ 0,

but Q((−2, 1, 1)) = 0. ◄

Theorem 3.3 Suppose f: B(a, r) → R is C² and a is a critical point. If H_{f,a} is
positive (resp., negative) definite, then a is a local minimum (resp., maximum) point; if H_{f,a}
is indefinite, then a is a saddle point. If H_{f,a} is semidefinite, we can draw no conclusions.

Proof By Proposition 3.2, given ε > 0, there is δ > 0 so that

f(a + h) − f(a) = ½H_{f,a}(h) + ε(h),  where |ε(h)|/||h||² < ε whenever ||h|| < δ.

Suppose now that H_{f,a} is positive definite. By the Maximum Value Theorem, Theorem
1.2, there is a number m > 0 so that H_{f,a}(x) ≥ m for all unit vectors x. This means that
H_{f,a}(h) ≥ m||h||² for all h. So now, choosing ε = m/4, we have

f(a + h) − f(a) = ½H_{f,a}(h) + ε(h) ≥ ¼m||h||² > 0

for all h ≠ 0 with ||h|| < δ. This means that a is a local minimum, as desired. The negative
definite case is analogous.
Now suppose H_{f,a} is indefinite. Then there are unit vectors x and y so that H_{f,a}(x) =
m₁ > 0 and H_{f,a}(y) = m₂ < 0. Choose ε = ¼ min(m₁, −m₂). Now, letting h = tx (resp.,
ty) with |t| < δ, we see that

f(a + tx) − f(a) ≥ ¼m₁t² > 0   and   f(a + ty) − f(a) ≤ ¼m₂t² < 0.

This means that a is a saddle point of f.
Last, note that if H_{f,a} is positive semidefinite, then a may be either a local minimum,
a local maximum (!), or a saddle. Consider, respectively, the functions f = x² + y⁴,
f = −x⁴ − y⁴, and f = x² + y³, all at the origin. ■

Corollary 3.4 When n = 2, assume f is C² near the critical point a and

Hess(f)(a) = [ A  B ]
             [ B  C ] .

Then

AC − B² > 0 and A > 0  ⟹  a is a local minimum;
AC − B² > 0 and A < 0  ⟹  a is a local maximum;
AC − B² < 0            ⟹  a is a saddle point;
AC − B² = 0            ⟹  the test is inconclusive.

Proof This is just the usual process of completing the square: When A ≠ 0,

Ax² + 2Bxy + Cy² = A(x + (B/A)y)² + (C − B²/A)y² = A(x + (B/A)y)² + ((AC − B²)/A)y²,

so the quadratic form is positive definite when A > 0 and AC − B² > 0, negative definite
when A < 0 and AC − B² > 0, and indefinite when AC − B² < 0. When A = 0, we have
2Bxy + Cy² = y(2Bx + Cy), and so the quadratic form is indefinite provided B ≠ 0, i.e.,
provided AC − B² < 0. ■

► EXAMPLE 2

Let's find and classify the critical points of the function f: R² → R, f(x, y) = x³ + y² − 6xy. Then

Df(x, y) = [ 3x² − 6y   2y − 6x ],

and so at a critical point we must have x² = 2y = 6x. Thus, the critical points are a = (0, 0) and
b = (6, 18).
Now, we calculate the Hessian:

Hess(f)(x, y) = [ 6x  −6 ]
                [ −6   2 ] ,

and so

Hess(f)(a) = [  0  −6 ]        Hess(f)(b) = [ 36  −6 ]
             [ −6   2 ]   and               [ −6   2 ] .

We see that H_{f,a} is indefinite, so a is a saddle point, and H_{f,b} is positive definite, so b is a local
minimum point. ◄
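The same computation, mechanized (a Python/sympy sketch of ours, applying the test of
Corollary 3.4 at each critical point):

    from sympy import symbols, hessian, solve

    x, y = symbols('x y')
    f = x**3 + y**2 - 6*x*y

    H = hessian(f, (x, y))
    for pt in solve([f.diff(x), f.diff(y)], [x, y], dict=True):
        Hp = H.subs(pt)
        A, det = Hp[0, 0], Hp.det()
        kind = ('saddle' if det < 0 else 'inconclusive' if det == 0
                else 'local min' if A > 0 else 'local max')
        print(pt, kind)    # {x: 0, y: 0} saddle;  {x: 6, y: 18} local min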

The process of completing the square as we’ve done in Example 1 can be couched in
matrix language; indeed, it is intimately related to the reduction to echelon form, as we
shall now see.

► EXAMPLES

Suppose we begin to reduce the symmetric matrix

'1 3 2~
A= 3 4 -4
,2 -4 -10 _

to echelon form. The first step is

"1 3 2" "1 3 2'


A= 3 4 -4 0 -5 -10 = A',
_2 -4 -10 _ _0 -10 -14 _

where

There are already two interesting observations to make: The first column of Ef1 is the transpose of
the first row of A (hence of A'); and if we remove the first row and column from A', what’s left is
also symmetric. Indeed, we can write

0 0 0
-5 -10
-10 —14

since the first term is symmetric (why?), the latter term must be as well. Now we just continue:

A′ = [ 1   3    2 ]      [ 1   3    2 ]
     [ 0  −5  −10 ]  ⇝  [ 0  −5  −10 ]  = U,
     [ 0 −10  −14 ]      [ 0   0    6 ]

and so, as before,

E₂⁻¹ = [ 1  0  0 ]
       [ 0  1  0 ]
       [ 0  2  1 ].

Summarizing, we have A = LU, where

L = E₁⁻¹E₂⁻¹ = [ 1  0  0 ]
               [ 3  1  0 ]
               [ 2  2  1 ]

is a lower triangular matrix with 1's on the diagonal. Now here comes the amazing thing: If we factor out the diagonal entries of the echelon matrix U, we are left with Lᵀ:

U = [ 1   0  0 ] [ 1  3  2 ]
    [ 0  −5  0 ] [ 0  1  2 ]  = DLᵀ.
    [ 0   0  6 ] [ 0  0  1 ]

Because A is symmetric, we arrive at the formula

A = LDLᵀ,

corresponding to the formula we get by completing the square:

x₁² + 6x₁x₂ + 4x₂² + 4x₁x₃ − 8x₂x₃ − 10x₃² = (x₁ + 3x₂ + 2x₃)² − 5x₂² − 20x₂x₃ − 14x₃²
= (x₁ + 3x₂ + 2x₃)² − 5(x₂² + 4x₂x₃) − 14x₃²
(*)  = (x₁ + 3x₂ + 2x₃)² − 5(x₂ + 2x₃)² + 6x₃².

To complete the circle of ideas, note that

Q(x) = xᵀAx = xᵀ(LDLᵀ)x = (Lᵀx)ᵀD(Lᵀx)

recaptures the form of (*).
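The elimination procedure of this example is easy to automate. Below is a short sketch (ours; it assumes Python with NumPy, and that no row exchanges are needed, i.e., all pivots are nonzero) that recovers L and D for the matrix A above.

    import numpy as np

    def ldlt(A):
        # Symmetric A = L D L^T by elimination without pivoting.
        A = A.astype(float)
        n = A.shape[0]
        L = np.eye(n)
        U = A.copy()
        for j in range(n):
            for i in range(j + 1, n):
                L[i, j] = U[i, j] / U[j, j]   # the multiplier goes into L
                U[i, :] -= L[i, j] * U[j, :]  # clear the entry below the pivot
        return L, np.diag(np.diag(U))         # D holds the pivots

    A = np.array([[1, 3, 2], [3, 4, -4], [2, -4, -10]])
    L, D = ldlt(A)
    print(L)                              # [[1,0,0],[3,1,0],[2,2,1]]
    print(np.diag(D))                     # pivots 1, -5, 6
    print(np.allclose(L @ D @ L.T, A))    # True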

Remark Of course, not every symmetric matrix can be written in the form LDLᵀ; e.g., take A = [0 1; 1 0]. The problem arises when we have to switch rows to get pivots in the appropriate places. Nevertheless, by doing appropriate row operations together with the companion column operations (to maintain symmetry), one can show that every symmetric matrix can be written in the form EDEᵀ, where E is the product of elementary matrices with only 1's on the diagonal (i.e., elementary matrices of type (iii)). See Exercise 8b for the example of the matrix A given just above.

Proposition 3.5 Suppose A is a symmetric matrix with associated quadratic form Q. Suppose A = LDLᵀ, where L is lower triangular with 1's on the diagonal and D is diagonal. If all the entries of D are positive (resp., negative), then Q is positive (resp., negative) definite; if all the entries of D are nonnegative (resp., nonpositive) and at least one is 0, then Q is positive (resp., negative) semidefinite; and if entries of D have opposite sign, then Q is indefinite.

Conversely, if Q is positive (resp., negative) definite, then there are a lower triangular matrix L with 1's on the diagonal and a diagonal matrix D with positive (resp., negative) entries so that A = LDLᵀ. If Q is semidefinite (resp., indefinite), the matrix EAEᵀ (where E is a suitable product of elementary matrices of type (iii)) can be written in the form LDLᵀ, where now there is at least one 0 (resp., real numbers of opposite sign) on the diagonal of D.

Sketch of proof Suppose A = LDLᵀ, where L is lower triangular with 1's on the diagonal (or, more generally, A = EDEᵀ, where E is invertible). Let d₁, ..., dₙ be the diagonal entries of the diagonal matrix D. Letting y = Lᵀx, we have

Q(x) = xᵀAx = xᵀ(LDLᵀ)x = (Lᵀx)ᵀD(Lᵀx) = yᵀDy = Σᵢ₌₁ⁿ dᵢyᵢ².

Realizing that y = 0 ⟺ x = 0, the conclusions of the first part of the proposition are now evident.
Suppose Q is positive definite. Then, in particular, Q(e₁) = a₁₁ > 0, so we can write

A = a₁₁vvᵀ + [ 0  0 ]
             [ 0  B ],   with v = (1, a₁₂/a₁₁, ..., a₁ₙ/a₁₁)ᵀ,

where B is also symmetric and the quadratic form on R^{n−1} associated to B is likewise positive definite. We now continue by induction. (For example, if the upper left entry of B were 0, this would mean that Q(a₁₂e₁ − a₁₁e₂) = 0, contradicting the hypothesis that Q is positive definite.)
An analogous argument works when Q is negative definite. If A = O, there is nothing
to prove. If not, in the semidefinite or indefinite case, if a₁₁ = 0, we first find an appropriate elementary matrix E₁ so that the first entry of the symmetric matrix B = E₁AE₁ᵀ is nonzero, and then we continue as above. ∎

Remark We will see another way, introduced in the next section and developed fully
in Chapter 9, of analyzing the nature of the quadratic form Q associated to a symmetric
matrix A. The signs of the eigenvalues of A will tell the whole story.

> EXERCISES 5.3


*1. Classify the critical points of the functions in Exercise 5.2.1.

2. Consider the function f = 2x⁴ − 3x²y + y².

(a) Show that the origin is a critical point of f and that, restricting f to any line through the origin, the origin is a local minimum point.
(b) Is the origin a local minimum point of f?²

²We've seen several textbooks that purportedly prove Theorem 3.3 by showing, for example, that if H_{f,a} is positive definite, then the restriction of f to any line through a has a local minimum at a, and then concluding that a must be a local minimum point of f. We hope that this exercise will convince you that such a proof must be flawed.

*3. Describe the graph of f(x, y) = (2x² + y²)e^{−(x²+y²)}.

4. Let f(x, y) = x³ + e^{3y} − 3xe^y.

(a) Show that f has exactly one critical point a, which is a local minimum point.
(b) Show that a is not a global minimum point.

5. Suppose f: R² → R is C² and harmonic (see Example 2 on p. 122). Assume ∂²f/∂x²(a) ≠ 0. Prove that a cannot be an extremum of f.
6. For each of the following symmetric matrices A, write A = LDLᵀ, as in Example 3. Use your answer to determine whether the associated quadratic form Q given by Q(x) = xᵀAx is positive definite, negative definite, indefinite, etc.

(a) A = [ 1   3 ]
        [ 3  13 ]

(b) A = [ 2  3 ]
        [ 3  4 ]

*(c) A = [  2   2  −2 ]
         [  2  −1   4 ]
         [ −2   4   1 ]

(d) A = [  1  −2   2 ]
        [ −2   6  −6 ]
        [  2  −6   9 ]

(e) A = [  1   1  −3   1 ]
        [  1   0  −3   0 ]
        [ −3  −3  11  −1 ]
        [  1   0  −1   2 ]

7. Suppose A = LDU, where L is lower triangular with 1's on the diagonal, D is diagonal, and U is upper triangular with 1's on the diagonal. Prove that this decomposition is unique; i.e., if A = LDU = L′D′U′, where L′, D′, and U′ have the same defining properties as L, D, and U, respectively, then L = L′, D = D′, and U = U′. (Hint: The product of two lower triangular matrices is lower triangular, and likewise for upper.)
8. (a) Let A = [0 2; 2 1]. After making a row exchange (and corresponding column exchange to preserve symmetry), we get B = E₁AE₁ᵀ = [1 2; 2 0]. Now write B = LDLᵀ and get a corresponding equation for A. How, then, have we expressed the associated quadratic form Q(x) = 4x₁x₂ + x₂² as a sum (or difference) of squares?
(b) Let A = [0 1; 1 0]. By considering B = E₁AE₁ᵀ = [1 1; 1 0], where E₁ is the elementary matrix corresponding to adding 1/2 of the second row to the first, show that

A = EDEᵀ,   where E = [ 1/2  −1/2 ]   and   D = [ 1   0 ]
                      [  1     1  ]             [ 0  −1 ].

What is the corresponding expression for the quadratic form Q(x) = 2x₁x₂ as a sum (or difference) of squares?

► 4 LAGRANGE MULTIPLIERS
Most extremum problems, including those encountered in single-variable calculus, involve
functions of several variables with some constraints. Consider, for example, the box of
prescribed volume, a cylinder inscribed in a sphere of given radius, or the desire to maximize

profit with only a certain amount of working capital. There is an elegant and powerful way
to approach all these problems by using multivariable calculus, the method of Lagrange
multipliers. A generalization to infinite dimensions, which we shall not study here, is central
in the calculus of variations, which is a powerful tool in mechanics, thermodynamics, and
differential geometry.

► EXAMPLE 1

Your boat has sprung a leak in the middle of the lake and you are trying to find the closest point on
the shoreline. As suggested by Figure 4.1, we imagine dropping a rock in the water at the location
of the boat and watching the circular waves radiate outward. The moment the first wave touches the
shoreline, we know that the point a at which it touches must be closest to us. And at that point, the
circle must be tangent to the shoreline.
Let’s place the origin at the point at which we drop the rock. Then the circles emanating from
this point are level curves of f(x) = ||x||. Suppose, moreover, that the shoreline is a level curve of a
differentiable function g. By Proposition 5.3 of Chapter 4, the gradient is normal to level sets, so if
the tangent line of the circle at a and the tangent line of the shoreline at a are the same, this means
that we should have

∇f(a) = λ∇g(a)   for some scalar λ. ◄

Figure 4.1

We now want to study the calculus of constrained extrema a bit more carefully.

Definition Suppose U ⊂ Rⁿ is open and f: U → R and g: U → Rᵐ are differentiable. Suppose g(a) = 0. We say a is a local maximum (resp., minimum) point of f subject to the constraint g(x) = 0 if for some δ > 0, f(x) ≤ f(a) (resp., f(x) ≥ f(a)) for all x ∈ B(a, δ) satisfying g(x) = 0. More succinctly, letting M = g⁻¹({0}), a is an extremum of the restriction of f to the set M.

Theorem 4.1 Suppose U ⊂ Rⁿ is open, f: U → R is differentiable, and g: U → Rᵐ is C¹. Suppose g(a) = 0 and rank(Dg(a)) = m. If a is a local extremum of f subject to the constraint g = 0, then there are scalars λ₁, ..., λₘ so that

Df(a) = λ₁Dg₁(a) + ⋯ + λₘDgₘ(a).

The scalars λ₁, ..., λₘ are called Lagrange multipliers.

Remark As usual, this is a necessary condition for a constrained extremum but not
a sufficient one. There may be (constrained) saddle points as well.

Proof By the Implicit Function Theorem, we can represent M = g⁻¹({0}) locally near a as a graph over some coordinate (n − m)-plane. For concreteness, let's say that locally

M = { (x, φ(x)) : x ∈ V ⊂ R^{n−m} },

where φ: V → Rᵐ is C¹. Thus, we can define a local parametrization of M by

Φ: V → Rⁿ,   Φ(x) = (x, φ(x)),

as shown in Figure 4.2, with Φ(ā) = a (writing a = (ā, φ(ā))). Now we have two crucial pieces of information:

g∘Φ = 0   and   f∘Φ has a local extremum at ā.

Differentiating by the chain rule, and applying Lemma 2.1, we have

(†)   Dg(a)∘DΦ(ā) = O   and   Df(a)∘DΦ(ā) = 0.
Figure 4.2

The first equation in (†) tells us that T, the (n − m)-dimensional image of the linear map DΦ(ā), satisfies T ⊂ ker Dg(a) (i.e., C([DΦ(ā)]) ⊂ N([Dg(a)])). But, by the Nullity-Rank Theorem, Corollary 4.6 of Chapter 4, we have

dim N([Dg(a)]) = n − rank([Dg(a)]) = n − m,

by hypothesis. Since dim T = n − m = dim N([Dg(a)]), we must have T = N([Dg(a)]). Moreover,

T = N([Dg(a)]) = (R([Dg(a)]))⊥.

On the other hand, the latter equation in (†) tells us that

T ⊂ N([Df(a)]) = (R([Df(a)]))⊥.

Thus,

(R([Dg(a)]))⊥ ⊂ (R([Df(a)]))⊥,

so, taking orthogonal complements and using Exercise 1.3.9 and Proposition 4.8 of Chapter 4, we have

R([Df(a)]) ⊂ R([Dg(a)]),

so Df(a) is a linear combination of the linear maps Dg₁(a), ..., Dgₘ(a)—or, more geometrically, ∇f(a) is a linear combination of the vectors ∇g₁(a), ..., ∇gₘ(a)—as we needed to show. ∎

Remark The subspace T = image(DΦ(ā)) = (R([Dg(a)]))⊥ is called the tangent space of M at a. We shall return to such matters in Chapter 6.

► EXAMPLE 2

The temperature at the point (x, y, z) in space is given by f(x, y, z) = xy + z². We wish to find the hottest and coldest points on the sphere x² + y² + z² = 2z (the sphere of radius 1 centered at (0, 0, 1)ᵀ). That is, we must find the extrema of f subject to the constraint g(x, y, z) = x² + y² + z² − 2z = 0. By Theorem 4.1, we must find points x satisfying g(x) = 0 at which Df(x) = λDg(x) for some scalar λ. That is, we seek points x so that

(*)   [y  x  2z] = λ[x  y  z − 1]   for some scalar λ.

(Notice that we removed the factor of 2 from Dg.)


Eliminating λ, we see that, provided none of our denominators is 0,

y/x = x/y = 2z/(z − 1).

So either

y = x and 2z = z − 1   or   y = −x and 2z = 1 − z;

the former leads to z = −1, which is impossible, and the latter leads to z = 1/3; then the constraint gives x² + y² = 5/9, and since y = −x, we obtain the two points (±√(5/2)/3, ∓√(5/2)/3, 1/3). Now, we infer from (*) that if x = 0, then y = 0 as well (and vice versa), and then z can be arbitrary as far as (*) is concerned; the constraint then forces z = 0 or z = 2, so we also find that the north and south poles of the sphere are constrained critical points. On the other hand, we cannot have the denominator z − 1 = 0, for, by (*), that would require z = 0, and these equations cannot hold simultaneously.
Calculating the values of f at our various constrained critical points, we have

f(√(5/2)/3, −√(5/2)/3, 1/3) = f(−√(5/2)/3, √(5/2)/3, 1/3) = −5/18 + 1/9 = −1/6,
f(0, 0, 0) = 0,   and   f(0, 0, 2) = 4.

Thus, the topmost point (0, 0, 2) is the hottest, and the two points (√(5/2)/3, −√(5/2)/3, 1/3) and (−√(5/2)/3, √(5/2)/3, 1/3) are the coldest.

Remark We surmise that the origin is a saddle point. Indeed, representing the sphere locally as a graph near the origin, we have z = 1 − √(1 − (x² + y²)) and

f(x, y, z) = xy + (1 − √(1 − (x² + y²)))² = xy + higher-order terms.

(This is easiest to see by using √(1 + h) = 1 + h/2 + higher-order terms.) Even easier, the origin is a nonconstrained critical point of f. Since f is a quadratic polynomial, H_{f,0} = 2f, and on the tangent plane of the sphere at 0 we just get xy. (Also see Exercise 34.)

► EXAMPLE 3

Find the shortest possible distance from the ellipse x² + 2y² = 2 to the line x + y = 2. We need to consider the (square of the) distance between pairs of points, one on the ellipse, the other on the line. This means that we need to work in R² × R², with coordinates (x, y) and (u, v), respectively. Let's try to minimize

f(x, y, u, v) = (x − u)² + (y − v)²

subject to the constraints

g(x, y, u, v) = (x² + 2y² − 2, u + v − 2) = (0, 0).

(The rank condition on g is easily checked in this case.) So we need to find points at which, for some scalars λ and μ, we have

Df = λDg₁ + μDg₂;   i.e.,

[x − u   y − v   −(x − u)   −(y − v)] = λ[x  2y  0  0] + μ[0  0  1  1].

(Again we have removed the factor of 2 from Dg₁.) We see that we must have x − u = y − v, and so x = 2y as well. Now substituting into the constraint equations yields two critical points:

(x, y, u, v) = (2/√3, 1/√3, 1 + 1/(2√3), 1 − 1/(2√3))   and   (−2/√3, −1/√3, 1 − 1/(2√3), 1 + 1/(2√3)).

As a check, note that the vector from (u, v) to (x, y) in each case is normal to both the ellipse and the line, as Figure 4.3 corroborates. Evidently, the first point gives the shortest possible distance, and we leave it to the reader to establish this rigorously.

We close this section with an application of the method of Lagrange multipliers to


linear algebra. Suppose A is a symmetric n × n matrix. Let's find the extrema of the quadratic form Q(x) = xᵀAx subject to the constraint g(x) = ||x||² = 1. By Theorem 4.1, we seek x ∈ Rⁿ so that for some scalar λ we have DQ(x) = λDg(x). Applying the result of Exercise 3.2.14 (and canceling a pair of 2's), this means that at any constrained extremum we must have

Ax = λx   for some scalar λ.



Such a vector x is called an eigenvector of A, and the Lagrange multiplier λ is called an eigenvalue. Note that by compactness of the unit sphere, Q must have at least a global minimum and a global maximum; hence A must have at least two eigenvalues and corresponding eigenvectors.

► EXAMPLE 4

Consider A = [ 6  2 ]
             [ 2  9 ].

Proceeding as above, we arrive at the system of equations

6x + 2y = λx
2x + 9y = λy.

Eliminating λ, we obtain

(6x + 2y)/x = (2x + 9y)/y,

from which we find the equation 2x² + 3xy − 2y² = 0, i.e., (2x − y)(x + 2y) = 0; so either y = 2x or y = −x/2. Substituting into the constraint equation, we obtain the critical points (eigenvectors)

(1/√5, 2/√5)ᵀ   and   (−2/√5, 1/√5)ᵀ,

with respective Lagrange multipliers (eigenvalues) 10 and 5. ◄
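These hand computations are exactly what a numerical eigensolver automates. A quick check (ours; assuming Python with NumPy):

    import numpy as np

    A = np.array([[6.0, 2.0], [2.0, 9.0]])
    vals, vecs = np.linalg.eigh(A)      # eigensolver for symmetric matrices
    print(vals)                         # [ 5. 10.]

    # Each unit eigenvector x is a constrained critical point of
    # Q(x) = x^T A x on ||x|| = 1, and Q there equals the eigenvalue.
    for lam, x in zip(vals, vecs.T):
        print(lam, x @ A @ x)           # the two numbers agree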

► EXERCISES 5.4

1. (a) Find the minimum value of f(x, y) = x² + y² on the curve x + y = 2. Why is there no maximum?

(b) Find the maximum value of g(x, y) = x + y on the curve x² + y² = 2. Is there a minimum?

(c) How are the questions (and answers) in parts a and b related?
*2. A wire has the shape of the circle x² + y² − 2y = 0. Its temperature at the point (x, y) is given by T(x, y) = 2x² + 3y. Find the maximum and minimum temperatures of the wire. (Be sure you've found all potential critical points.)

3. Find the maximum value of f(x, y, z) = 2x + 2y − z on the sphere of radius 2 centered at the origin.

4. Find the maximum and minimum values of the function f(x, y) = x² + xy + y² on the unit disk D = {x ∈ R² : ||x|| ≤ 1}.

5. Find the point(s) on the ellipse x² + 4y² = 4 closest to the point (1, 0).

6. The temperature at point x is given by f(x, y, z) = x² + 2y + 2z. Find the hottest and coldest points on the sphere x² + y² + z² = 3.
7. Find the volume of the largest rectangular box (with all its edges parallel to the coordinate axes) that can be inscribed in the ellipsoid

x² + y²/2 + z²/3 = 1.
8. A space probe in the shape of the ellipsoid 4x² + y² + 4z² = 16 enters the earth's atmosphere and its surface begins to heat. After 1 hour, the temperature in °C on its surface is given by f(x, y, z) = 2x² + yz − 4z + 600. Find the hottest and coldest points on the probe's surface.

9. The temperature in space is given by f(x, y, z) = 3xy + z³ − 3z. Prove that there are hottest and coldest points on the sphere x² + y² + z² − 2z = 0, and find them.
10. Let f(x, y, z) = xy + z³ and S = {(x, y, z) : x² + y² + z² = 1, z ≥ 0}. Prove that f attains its global maximum and minimum on S and determine its global maximum and minimum points.

11. Among all triangles inscribed in the unit circle, which have the greatest area? (Hint: Consider the three small triangles formed by joining the vertices to the center of the circle.)

12. Among all triangles inscribed in the unit circle, which have the greatest perimeter?

*13. Find the ellipse x²/a² + y²/b² = 1 that passes through the given point and has the least area. (Recall that the area of the ellipse is πab.)

14. If α, β, and γ are the angles of a triangle, show that

sin(α/2) sin(β/2) sin(γ/2) ≤ 1/8.

For what triangles is the maximum attained?
*15. Find the points closest to and farthest from the origin on the ellipse 2x² + 4xy + 5y² = 1.

16. Solve Exercise 5.2.8 anew, using Lagrange multipliers.

17. Solve Exercise 5.2.9 anew, using Lagrange multipliers.

18. Find the maximum and minimum values of the function f(x) = x₁ + ⋯ + xₙ subject to the constraint ||x|| = 1.

*19. Find the maximum volume of an n-dimensional rectangular parallelepiped of diameter 3.

20. Suppose x₁, ..., xₙ are positive numbers. Prove that

(x₁x₂⋯xₙ)^{1/n} ≤ (x₁ + ⋯ + xₙ)/n.
21. Suppose p, q > 0 and 1/p + 1/q = 1. Suppose x, y ≥ 0. Use Lagrange multipliers to prove that

xᵖ/p + y^q/q ≥ xy.

(Hint: Minimize the left-hand side subject to the constraint xy = constant.)

22. Solve Exercise 5.2.11 anew, using Lagrange multipliers.

23. A silo is built by putting a right circular cone atop a right circular cylinder (both having the same radius). What dimensions will give the silo of maximum volume for a given surface area?

24. Solve Exercise 5.2.12 anew, using Lagrange multipliers.

*25. Use Lagrange multipliers to find the point closest to the origin on the intersection of the planes x + 2y + z = 5 and 2x + y − z = 1.
26. In each case, find the point in the given subspace V closest to b.
*(a) V = {x ∈ R³ : x₁ − x₂ + 3x₃ = 2x₁ + x₂ = 0}, b = (3, 7, 1)ᵀ
(b) V = {x ∈ R⁴ : x₁ + x₂ + x₃ + x₄ = x₁ + 2x₃ + x₄ = 0}, with b as given.
*27. Find the points on the curve of intersection of the two surfaces x² − xy + y² − z² = 1 and x² + y² = 1 that are closest to the origin.

28. Show that of all quadrilaterals with fixed side lengths, the one of maximum area can be inscribed in a circle. (Hint: Use as variables a pair of opposite angles. See also Exercise 1.2.14.)

29. For each of the following symmetric matrices A, find all the extrema of Q(x) = xᵀAx subject to the constraint ||x||² = 1. Also determine the Lagrange multiplier each time.

30. Find the norm of each of the following matrices. Note: A calculator will be helpful.

31. A (frictionless) lasso is thrown around two pegs, as pictured in Figure 4.4, and a large weight is hung from the free end. Treating the mass of the rope as insignificant, and supposing the weight hangs freely, what is the equilibrium position of the system?

32. (Interpreting the Lagrange Multiplier)
(a) Suppose a = a(c) is a local extreme point of the function f relative to the constraint g(x) = c; suppose, moreover, that a is a differentiable function of c. Show that λ = (f∘a)′(c).
(b) Assume that f and g are C². Use the Implicit Function Theorem (see Theorem 2.2 of Chapter 6 for the general version) to show that the extreme point a is given locally as a differentiable function of c whenever the "bordered Hessian"

[ Hess(f)(a) − λHess(g)(a)   ∇g(a) ]
[         Dg(a)                0   ]

is invertible.


Figure 4.4
33. (An Application of Exercise 32 to Economics) Let x ∈ Rⁿ be the commodity vector, p ∈ Rⁿ the price vector, and f: Rⁿ → R the production function, so that f(x) tells us how many widgets are produced, using xᵢ units of item i, i = 1, ..., n. Prove that to produce the greatest number of widgets with a given budget, we must have

(1/p₁) ∂f/∂x₁ = ⋯ = (1/pₙ) ∂f/∂xₙ.

What does the result of Exercise 32a tell us in this case?
34. (A Second Derivative Test for Constrained Extrema) Suppose a is a critical point of f subject to the constraint g(x) = c, Df(a) = λDg(a), and Dg(a) ≠ 0. Show that a is a constrained local maximum (resp., minimum) of f on M = {x : g(x) = c} if the restriction of the Hessian of f − λg to the tangent space TₐM is negative (resp., positive) definite. (Hint: Parametrize the constraint surface M locally by Φ with Φ(ā) = a and apply Theorem 3.3 to f∘Φ.) There is an interpretation in terms of the bordered Hessian (see Exercise 32b), which is indicated in Exercise 9.4.21.

► 5 PROJECTIONS, LEAST SQUARES, AND INNER PRODUCT SPACES

► EXAMPLE 1

Suppose we're given the system Ax = b to solve, where

A = [ 1  2 ]              [ 1 ]
    [ 0  1 ]    and   b = [ 1 ].
    [ 1  1 ]              [ 1 ]

It is easy to check that b ∉ C(A), and so this system is inconsistent. The best we can do is to solve Ax = p, where p is the vector in C(A) that is closest to b. Clearly that point is p = b − proj_a b, where a is the normal vector to C(A) ⊂ R³, as shown in Figure 5.1. Now we see how to solve our problem. C(A) is the plane in R³ with normal vector

a = (−1, 1, 1)ᵀ,

Figure 5.1

and if we compute proj_a b, then we will have

p = b − proj_a b ∈ Span(a)⊥ = C(A)   and   b − p = proj_a b ∈ C(A)⊥.

In our case, we have

proj_a b = (b·a/||a||²) a = (1/3)(−1, 1, 1)ᵀ,

and so

p = (4/3, 2/3, 2/3)ᵀ.

Now it is an easy matter to solve Ax = p; indeed, the solution is

x̄ = (0, 2/3)ᵀ.

This is called the least squares solution of the original problem, inasmuch as Ax̄ is the vector in C(A) closest to b. ◄
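Here is the same computation done by machine (our sketch; it assumes Python with NumPy): solving (AᵀA)x = Aᵀb — the normal equations derived below — reproduces the least squares solution found geometrically above.

    import numpy as np

    A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]])
    b = np.array([1.0, 1.0, 1.0])

    xbar = np.linalg.solve(A.T @ A, A.T @ b)     # normal equations
    print(xbar)                # [0. 0.6667], i.e., (0, 2/3)
    print(A @ xbar)            # p = (4/3, 2/3, 2/3), the projection of b
    print(np.linalg.lstsq(A, b, rcond=None)[0])  # the built-in routine agrees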

In general, given b ∈ Rⁿ and an m-dimensional subspace V ⊂ Rⁿ, we can ask for the projection of b onto V, i.e., the point in V closest to b, which we denote by proj_V b. We first make the official

Definition Let V ⊂ Rⁿ be a subspace, and let b ∈ Rⁿ. We define the projection of b onto V to be the unique vector p ∈ V with the property that b − p ∈ V⊥. We write p = proj_V b.

We ask the reader to show in Exercise 10 that projection onto a subspace V gives a linear
map. As we know from Chapter 4, we can be given V either explicitly (say, V = C(A) for
some n x m matrix A) or implicitly (say, V = N(B) for some m x n matrix B). We will
start by applying the methods of this chapter to obtain a simple solution of the problem (and
then we will indicate that we could have omitted the calculus completely).

Suppose A is an n × m matrix of rank m (so that the column vectors a₁, ..., aₘ give a basis for our subspace V). Define

f: Rᵐ → R   by   f(x) = ||Ax − b||².

We seek critical points of f. Write h(x) = ||x||² and g(x) = Ax − b, so that f = h∘g. Then Dh(y) = 2yᵀ and Dg(x) = A, so, differentiating f by the chain rule, we have Df(x) = Dh(g(x))Dg(x) = 2(Ax − b)ᵀA. Thus, Df(x) = 0 ⟺ (Ax − b)ᵀA = 0. Transposing for convenience, we deduce that x is a critical point if and only if

(*)   (AᵀA)x = Aᵀb.

Now, AᵀA is an m × m matrix, and by Exercise 4.4.14 this matrix is nonsingular, so the equation (*) has a unique solution x̄. We claim that x̄ is the global minimum point. This is just the Pythagorean Theorem again: Since Aᵀ(Ax̄ − b) = 0, we have Ax̄ − b ∈ N(Aᵀ) = C(A)⊥; so, for any x ∈ Rᵐ with x ≠ x̄, as Figure 5.2 shows,

f(x) = ||Ax − b||² = ||A(x − x̄) + (Ax̄ − b)||² = ||A(x − x̄)||² + ||Ax̄ − b||² > f(x̄).

The vector x̄ is called the least squares solution of the (inconsistent) linear system Ax = b, and (*) gives the associated normal equations.

Figure 5.2

Remark When A has rank less than m, the linear system (*) is still consistent (see Exercise 4.4.15) and has infinitely many solutions. We define the least squares solution to be the one of smallest length, i.e., the unique vector x̄ ∈ R(A) that satisfies the equation. See Proposition 4.10 of Chapter 4. This leads to the pseudoinverse that is important in numerical analysis (cf. Strang).

► EXAMPLE 2

We wish to find the least squares solution of the system Ax = b. We need only solve the normal equations AᵀAx̄ = Aᵀb: we compute AᵀA and Aᵀb, and then, using the formula for the inverse of a 2 × 2 matrix in Example 5 on p. 154,

x̄ = (AᵀA)⁻¹Aᵀb

is the least squares solution. ◄

This is all it takes to give an explicit formula for projection onto a subspace V ⊂ Rⁿ. In particular, denote by

proj_V: Rⁿ → Rⁿ

the function that assigns to each vector b ∈ Rⁿ the vector p ∈ V closest to b. Start by choosing a basis {v₁, ..., vₘ} for V, and let

A = [ v₁  v₂  ⋯  vₘ ]

be the n × m matrix whose column vectors are these basis vectors. Then, given b ∈ Rⁿ, we know that if we take x̄ = (AᵀA)⁻¹Aᵀb, then Ax̄ = p = proj_V b. That is,

p = proj_V b = (A(AᵀA)⁻¹Aᵀ)b,

from which we deduce that the matrix

(†)   P = A(AᵀA)⁻¹Aᵀ

is the appropriate projection matrix; i.e.,

[proj_V] = A(AᵀA)⁻¹Aᵀ.

In Section 5.2, we'll see a bit more of the geometry underlying the formula for the projection matrix.
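Formula (†) translates directly into code. The sketch below (ours; assuming Python with NumPy) builds P from one choice of basis for the plane x₁ − 2x₂ + x₃ = 0 treated in Example 5 below, and checks the characteristic properties P² = P and P = Pᵀ (cf. Exercise 2).

    import numpy as np

    def projection_matrix(A):
        # P = A (A^T A)^{-1} A^T, where the columns of A are a basis of V.
        return A @ np.linalg.inv(A.T @ A) @ A.T

    A = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]])   # a basis of the plane
    P = projection_matrix(A)
    print(np.round(6 * P))     # 6P = [[5,2,-1],[2,2,2],[-1,2,5]]
    print(np.allclose(P @ P, P), np.allclose(P, P.T))    # True True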

► EXAMPLE 3

If b ∈ C(A) to begin with, then b = Ax for some x ∈ Rᵐ, and

Pb = (A(AᵀA)⁻¹Aᵀ)b = A(AᵀA)⁻¹(AᵀA)x = Ax = b,

as it should be. And if b ∈ C(A)⊥, then b ∈ N(Aᵀ), so

Pb = (A(AᵀA)⁻¹Aᵀ)b = A(AᵀA)⁻¹(Aᵀb) = 0,

as it should be. ◄

► EXAMPLE 4

Note that when dim V = 1, we recover our formula for projection onto a line from Section 2 of Chapter 1. If a ∈ Rⁿ is a nonzero vector, we consider it as an n × 1 matrix and the projection formula becomes

P = a(aᵀa)⁻¹aᵀ = (1/||a||²) aaᵀ;

that is,

Pb = (1/||a||²)(aaᵀ)b = (1/||a||²) a(aᵀb) = ((a·b)/||a||²) a,

as before. ◄

► EXAMPLE 5

Let V ⊂ R³ be the plane defined by the equation x₁ − 2x₂ + x₃ = 0. Then, for instance,

v₁ = (2, 1, 0)ᵀ   and   v₂ = (0, 1, 2)ᵀ

form a basis for V, and we take

A = [ 2  0 ]
    [ 1  1 ]
    [ 0  2 ].

Then, since

AᵀA = [ 5  1 ]                          (AᵀA)⁻¹ = (1/24)[  5  −1 ]
      [ 1  5 ],   we have                              [ −1   5 ],

and so

[proj_V] = A(AᵀA)⁻¹Aᵀ = (1/6)[  5  2  −1 ]
                             [  2  2   2 ]
                             [ −1  2   5 ]. ◄
Now, what happens if we are given the subspace implicitly? This sounds like the perfect setup for Lagrange multipliers. Suppose the m-dimensional subspace V ⊂ Rⁿ is given as the nullspace of an (n − m) × n matrix B of rank n − m. To find the point in V closest to b ∈ Rⁿ, we want to minimize the function

f: Rⁿ → R,   f(x) = ||x − b||²,   subject to the constraint g(x) = Bx = 0.


The method of Lagrange multipliers, Theorem 4.1, tells us that we must have (dropping the factor of 2)

(x − b)ᵀ = Σᵢ₌₁^{n−m} λᵢBᵢ   for some scalars λ₁, ..., λ_{n−m},

where, as usual, the Bᵢ are the rows of B. Transposing this equation, we have

x − b = Bᵀλ,   where λ = (λ₁, ..., λ_{n−m})ᵀ.

Multiplying this equation by B and using the constraint equation, we get

(BBᵀ)λ = −Bb.

By analogy with our treatment of the equation (*), the matrix BBᵀ has rank n − m, and so we can solve for λ, hence for the constrained extremum x₀:

(‡)   x₀ = b + Bᵀ(−(BBᵀ)⁻¹Bb) = b − Bᵀ(BBᵀ)⁻¹Bb.

Note that, according to our projection formula (†), we can interpret this answer as

x₀ = b − proj_{C(Bᵀ)} b = b − proj_{R(B)} b = proj_{R(B)⊥} b = proj_{N(B)} b,

as it should be.
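Formula (‡) is equally direct. For the same plane, now described implicitly by its normal equation, the sketch below (ours; assuming Python with NumPy) projects a sample point onto N(B) and confirms the constraint.

    import numpy as np

    B = np.array([[1.0, -2.0, 1.0]])   # V = N(B): the plane x1 - 2x2 + x3 = 0
    b = np.array([1.0, 0.0, 0.0])      # a sample point to project

    x0 = b - B.T @ np.linalg.solve(B @ B.T, B @ b)
    print(x0)          # (5/6, 1/3, -1/6), the point of N(B) closest to b
    print(B @ x0)      # ~0: x0 satisfies the constraint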

5.1 Data Fitting


Perhaps the most natural setting in which inconsistent systems of equations arise is that of fitting data to a curve when they won't quite fit. For example, in our laboratory work many of us have tried to find the right constants a and k so that the data points (x₁, y₁), ..., (xₘ, yₘ) lie on the curve y = axᵏ. Taking natural logarithms, we see that this is equivalent to fitting the points (uᵢ, vᵢ) = (log xᵢ, log yᵢ), i = 1, ..., m, to a line v = ku + log a—whence the convenience of log-log paper. The least squares solution of such problems is called the least squares line fitting the points (or the line of regression in statistics).

► EXAMPLE 6

Find the least squares line y = ax + b for the data points (−1, 0), (1, 1), and (2, 3). (See Figure 5.3.) We get the system of equations

−a + b = 0
 a + b = 1
2a + b = 3,

which in matrix form becomes

[ −1  1 ]           [ 0 ]
[  1  1 ] [ a ]  =  [ 1 ].
[  2  1 ] [ b ]     [ 3 ]

The least squares solution is

[ ā ]              [ 0 ]
[ b̄ ] = (AᵀA)⁻¹Aᵀ [ 1 ] = (1/14)[  3  −2 ] [ 7 ] = (1/14)[ 13 ].
                   [ 3 ]         [ −2   6 ] [ 4 ]         [ 10 ]

That is, the least squares line is

y = (13/14)x + 5/7. ◄

When we find the least squares line y = ax + b fitting the data points (x₁, y₁), ..., (xₘ, yₘ), we are finding the least squares solution of the (inconsistent) system A[a b]ᵀ = y, where

A = [ x₁  1 ]             [ y₁ ]
    [ x₂  1 ]   and   y = [ y₂ ]
    [  ⋮  ⋮ ]             [  ⋮ ]
    [ xₘ  1 ]             [ yₘ ].

Let's denote by ŷ = A[ā b̄]ᵀ the projection of y onto C(A). The least squares solution [ā b̄]ᵀ has the property that ||y − ŷ|| is as small as possible. If we define the error vector ε = y − ŷ,

then we have

ε = [ ε₁ ]   [ y₁ − ŷ₁ ]   [ y₁ − (āx₁ + b̄) ]
    [ ε₂ ] = [ y₂ − ŷ₂ ] = [ y₂ − (āx₂ + b̄) ]
    [  ⋮ ]   [    ⋮    ]   [        ⋮       ]
    [ εₘ ]   [ yₘ − ŷₘ ]   [ yₘ − (āxₘ + b̄) ].

The least squares process chooses ā and b̄ so that ||ε||² = ε₁² + ⋯ + εₘ² is as small as possible. But something interesting happens. Recall that

ε = y − ŷ ∈ C(A)⊥.

Thus, ε is orthogonal to each of the column vectors of A, and so, in particular,

ε · (1, 1, ..., 1)ᵀ = ε₁ + ⋯ + εₘ = 0.

That is, in the process of minimizing the sum of the squares of the errors εᵢ, we have in fact made their (algebraic) sum equal to 0.
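The sketch below (ours; assuming Python with NumPy) redoes Example 6 and verifies the observation just made: the errors of the least squares line sum to zero.

    import numpy as np

    x = np.array([-1.0, 1.0, 2.0])
    y = np.array([0.0, 1.0, 3.0])

    A = np.column_stack([x, np.ones_like(x)])   # rows (x_i, 1)
    a, b = np.linalg.solve(A.T @ A, A.T @ y)    # normal equations
    print(a, b)              # 13/14 = 0.9286... and 5/7 = 0.7143...

    eps = y - (a * x + b)    # the error vector
    print(eps.sum())         # ~0, as just proved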

5.2 Orthogonal Bases


We have seen how to find the projection of a vector onto a subspace V ⊂ Rⁿ by using the so-called normal equations. But the inner workings of the formula (†) on p. 228 escape us. Since we have known since Chapter 1 how to project a vector x onto a line, it might seem more natural to start with a basis {v₁, ..., v_k} for V and sum up the projections of x onto the vⱼ's. However, as we see in Figure 5.4(a), when we start with x ∈ V and add the projections of x onto the vectors of an arbitrary basis for V, the resulting vector needn't have much to do with x. Nevertheless, the diagram on the right suggests that when we start with a basis consisting of mutually orthogonal vectors, the process may work. We begin by proving this as a lemma.

Figure 5.4

Definition Let v₁, ..., v_k ∈ Rⁿ. We say {v₁, ..., v_k} is an orthogonal set of vectors provided vᵢ · vⱼ = 0 whenever i ≠ j. We say {v₁, ..., v_k} is an orthogonal basis for a subspace V if {v₁, ..., v_k} is both a basis for V and an orthogonal set.

Lemma 5.1 Suppose {v₁, ..., v_k} is a basis for V and x ∈ V. Then

x = Σᵢ₌₁ᵏ proj_{vᵢ} x = Σᵢ₌₁ᵏ (x · vᵢ/||vᵢ||²) vᵢ

if and only if {v₁, ..., v_k} is an orthogonal basis for V.

Proof Suppose {v₁, ..., v_k} is an orthogonal basis for V. Then there are scalars c₁, ..., c_k so that

x = c₁v₁ + ⋯ + cᵢvᵢ + ⋯ + c_kv_k.

Taking advantage of the orthogonality of the vⱼ's, we take the dot product of this equation with vᵢ:

x · vᵢ = c₁(v₁ · vᵢ) + ⋯ + cᵢ(vᵢ · vᵢ) + ⋯ + c_k(v_k · vᵢ) = cᵢ||vᵢ||²,

and so

cᵢ = x · vᵢ / ||vᵢ||².

(Note that vᵢ ≠ 0 since {v₁, ..., v_k} forms a basis for V.)

Conversely, suppose that every vector x ∈ V is the sum of its projections on v₁, ..., v_k. Let's just examine what this means when x = v₁. We are given that

v₁ = Σᵢ₌₁ᵏ proj_{vᵢ} v₁ = Σᵢ₌₁ᵏ (v₁ · vᵢ/||vᵢ||²) vᵢ.

Recall from Proposition 3.1 of Chapter 4 that every vector has a unique expansion as a linear combination of basis vectors, so comparing coefficients of v₂, ..., v_k on either side of this equation, we conclude that

v₁ · vᵢ = 0   for all i = 2, ..., k.

A similar argument shows that vᵢ · vⱼ = 0 for all i ≠ j, and the proof is complete. ∎

As we mentioned above, if {v₁, ..., v_k} is a basis for V, then every vector x ∈ V can be written uniquely as a linear combination

x = c₁v₁ + c₂v₂ + ⋯ + c_kv_k.

We recall that the coefficients c₁, c₂, ..., c_k that appear here are called the coordinates of x with respect to the basis {v₁, ..., v_k}. It is worth emphasizing that when {v₁, ..., v_k} forms an orthogonal basis for V, it is quite easy to compute the coordinates of x by using the dot product; that is, cᵢ = x · vᵢ/||vᵢ||². As we saw in Example 8 of Section 3 of Chapter 4 (see also Section 1 of Chapter 9), when the basis is not orthogonal, it is far more tedious to compute these coordinates.

Not only do orthogonal bases make it easy to calculate coordinates, they also make projections quite easy to compute, as we now see.

Proposition 5.2 Let V ⊂ Rⁿ be a k-dimensional subspace. For any vector b ∈ Rⁿ,

(**)   proj_V b = Σᵢ₌₁ᵏ proj_{vᵢ} b = Σᵢ₌₁ᵏ (b · vᵢ/||vᵢ||²) vᵢ

if and only if {v₁, ..., v_k} is an orthogonal basis for V.

Proof Assume {v₁, ..., v_k} is an orthogonal basis for V and write b = p + (b − p), where p = proj_V b (and so b − p ∈ V⊥). Then, since p ∈ V, by Lemma 5.1, we know that

p = Σᵢ₌₁ᵏ (p · vᵢ/||vᵢ||²) vᵢ.

Moreover, for i = 1, ..., k, we have b · vᵢ = p · vᵢ since b − p ∈ V⊥. Thus,

proj_V b = p = Σᵢ₌₁ᵏ (p · vᵢ/||vᵢ||²) vᵢ = Σᵢ₌₁ᵏ (b · vᵢ/||vᵢ||²) vᵢ = Σᵢ₌₁ᵏ proj_{vᵢ} b.

Conversely, suppose proj_V b = Σᵢ₌₁ᵏ proj_{vᵢ} b for all b ∈ Rⁿ. In particular, when b ∈ V, we deduce that b = proj_V b can be written as a linear combination of v₁, ..., v_k, so these vectors span V; since V is k-dimensional, {v₁, ..., v_k} gives a basis for V. By Lemma 5.1, it must be an orthogonal basis. ∎

We now have another way to calculate the projection of a vector on a subspace V,


provided we can come up with an orthogonal basis for V.

► EXAMPLE 7

We return to Example 5 on p. 229. The basis {v₁, v₂} we used there was certainly not an orthogonal basis, but it is not hard to find one that is. Instead, we take

w₁ = (1, 0, −1)ᵀ   and   w₂ = (1, 1, 1)ᵀ.

(It is immediate that w₁ · w₂ = 0 and that w₁, w₂ lie in the plane x₁ − 2x₂ + x₃ = 0.) Now, we calculate

proj_V b = proj_{w₁} b + proj_{w₂} b = (b·w₁/||w₁||²) w₁ + (b·w₂/||w₂||²) w₂
         = ((1/||w₁||²) w₁w₁ᵀ + (1/||w₂||²) w₂w₂ᵀ) b
         = (1/6)[  5  2  −1 ]
                [  2  2   2 ] b,
                [ −1  2   5 ]

as we found earlier. ◄

Remark This is exactly what we get from formula (†) on p. 228 when {v₁, ..., v_k} is an orthogonal set. In particular, AᵀA is then the diagonal matrix with diagonal entries ||v₁||², ..., ||v_k||², and so A(AᵀA)⁻¹Aᵀ = Σᵢ₌₁ᵏ (1/||vᵢ||²) vᵢvᵢᵀ.
Now it is time to develop an algorithm for transforming a given (ordered) basis {v₁, ..., v_k} for a subspace into an orthogonal basis {w₁, ..., w_k}, as shown in Figure 5.5. The idea is quite simple. We set

w₁ = v₁.

Figure 5.5

If v₂ is orthogonal to w₁, then we set w₂ = v₂. Of course, in general, it will not be, and we want w₂ to be the part of v₂ that is orthogonal to w₁; i.e., we set

w₂ = v₂ − proj_{w₁} v₂ = v₂ − (v₂·w₁/||w₁||²) w₁.

Then, by construction, w₁ and w₂ are orthogonal and Span(w₁, w₂) ⊂ Span(v₁, v₂). Since w₂ ≠ 0 (why?), {w₁, w₂} must be linearly independent and therefore give a basis for Span(v₁, v₂) by Lemma 3.8. We continue, replacing v₃ by its part orthogonal to the plane spanned by w₁ and w₂:

w₃ = v₃ − proj_{Span(w₁,w₂)} v₃ = v₃ − proj_{w₁} v₃ − proj_{w₂} v₃ = v₃ − (v₃·w₁/||w₁||²) w₁ − (v₃·w₂/||w₂||²) w₂.

Note that we are making definite use of Proposition 5.2 here: We must use w₁ and w₂ in the formula here, rather than v₁ and v₂, because the formula (**) requires an orthogonal basis. Once again, we find that w₃ ≠ 0 (why?), and so {w₁, w₂, w₃} must be linearly independent and, consequently, an orthogonal basis for Span(v₁, v₂, v₃). The process continues until we have arrived at v_k and replaced it by

w_k = v_k − proj_{Span(w₁,...,w_{k−1})} v_k = v_k − (v_k·w₁/||w₁||²) w₁ − (v_k·w₂/||w₂||²) w₂ − ⋯ − (v_k·w_{k−1}/||w_{k−1}||²) w_{k−1}.

Summarizing, we have the algorithm that goes by the name of the Gram-Schmidt process.

Theorem 5.3 (Gram-Schmidt Process) Given a basis {v₁, ..., v_k} for a subspace V ⊂ Rⁿ, we obtain an orthogonal basis {w₁, ..., w_k} for V as follows:

w₁ = v₁

w₂ = v₂ − (v₂·w₁/||w₁||²) w₁,

and, assuming w₁, ..., wⱼ have been defined,

w_{j+1} = v_{j+1} − (v_{j+1}·w₁/||w₁||²) w₁ − (v_{j+1}·w₂/||w₂||²) w₂ − ⋯ − (v_{j+1}·wⱼ/||wⱼ||²) wⱼ,
⋮
w_k = v_k − (v_k·w₁/||w₁||²) w₁ − (v_k·w₂/||w₂||²) w₂ − ⋯ − (v_k·w_{k−1}/||w_{k−1}||²) w_{k−1}.

If we so desire, we can arrange for an orthogonal basis consisting of unit vectors by dividing each of w₁, ..., w_k by its respective length:

q₁ = w₁/||w₁||,   q₂ = w₂/||w₂||,   ...,   q_k = w_k/||w_k||.

The set {q₁, ..., q_k} is called an orthonormal basis for V.



► EXAMPLE 8

Let v₁, v₂, v₃ be given vectors spanning a subspace V = Span(v₁, v₂, v₃) ⊂ R⁴. We use the Gram-Schmidt process of Theorem 5.3 to produce an orthogonal basis {w₁, w₂, w₃} for V; and if we desire an orthonormal basis, we then divide each wᵢ by its length.



It’s always a good idea to check that the vectors form an orthogonal (or orthonormal) set, and it’s
easy—with these numbers—to do so.
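To make Theorem 5.3 concrete in code, here is a sketch (ours; assuming Python with NumPy) that orthogonalizes a sample basis of a three-dimensional subspace of R⁴; the vectors v₁, v₂, v₃ below are hypothetical stand-ins, not the particular vectors of Example 8.

    import numpy as np

    def gram_schmidt(vs):
        # Theorem 5.3: subtract from each v its projections on the
        # previously constructed w's.
        ws = []
        for v in vs:
            w = v.astype(float).copy()
            for u in ws:
                w -= (v @ u) / (u @ u) * u
            ws.append(w)
        return ws

    v1 = np.array([1.0, 1.0, 0.0, 0.0])
    v2 = np.array([1.0, 0.0, 1.0, 0.0])
    v3 = np.array([0.0, 1.0, 1.0, 1.0])

    ws = gram_schmidt([v1, v2, v3])
    qs = [w / np.linalg.norm(w) for w in ws]    # orthonormal version
    print(np.round([[wi @ wj for wj in ws] for wi in ws], 10))  # diagonal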

5.3 Inner Product Spaces


In certain abstract vector spaces we may define a notion of dot product.

Definition Let V be a real vector space. We say V is an inner product space if for every pair of elements u, v ∈ V there is a real number ⟨u, v⟩, called the inner product of u and v, such that

1. ⟨u, v⟩ = ⟨v, u⟩ for all u, v ∈ V;
2. ⟨cu, v⟩ = c⟨u, v⟩ for all u, v ∈ V and scalars c;
3. ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ for all u, v, w ∈ V;
4. ⟨v, v⟩ ≥ 0 for all v ∈ V and ⟨v, v⟩ = 0 only if v = 0.

► EXAMPLE 9

a. Fix k + 1 distinct real numbers t₁, t₂, ..., t_{k+1} and define an inner product on P_k, the vector space of polynomials of degree ≤ k, by the formula

⟨p, q⟩ = Σᵢ₌₁^{k+1} p(tᵢ)q(tᵢ),   p, q ∈ P_k.

All the properties of an inner product are obvious except for the very last. If ⟨p, p⟩ = 0, then Σᵢ₌₁^{k+1} p(tᵢ)² = 0, and so we must have p(t₁) = p(t₂) = ⋯ = p(t_{k+1}) = 0. But if a polynomial of degree ≤ k has (at least) k + 1 roots, then it must be the zero polynomial.

b. Let C⁰([a, b]) denote the vector space of continuous functions on the interval [a, b]. If f, g ∈ C⁰([a, b]), define

⟨f, g⟩ = ∫ₐᵇ f(t)g(t) dt.

We verify that the defining properties hold.

1. ⟨f, g⟩ = ∫ₐᵇ f(t)g(t) dt = ∫ₐᵇ g(t)f(t) dt = ⟨g, f⟩.
2. ⟨cf, g⟩ = ∫ₐᵇ (cf)(t)g(t) dt = ∫ₐᵇ c f(t)g(t) dt = c ∫ₐᵇ f(t)g(t) dt = c⟨f, g⟩.
3. ⟨f + g, h⟩ = ∫ₐᵇ (f + g)(t)h(t) dt = ∫ₐᵇ (f(t) + g(t))h(t) dt = ∫ₐᵇ (f(t)h(t) + g(t)h(t)) dt = ∫ₐᵇ f(t)h(t) dt + ∫ₐᵇ g(t)h(t) dt = ⟨f, h⟩ + ⟨g, h⟩.
4. ⟨f, f⟩ = ∫ₐᵇ f(t)² dt ≥ 0 since f(t)² ≥ 0 for all t. On the other hand, if ⟨f, f⟩ = ∫ₐᵇ (f(t))² dt = 0, then since f is continuous and f² ≥ 0, it must be the case that f = 0. (If not, we would have f(t₀) ≠ 0 for some t₀, and then f(t)² would be positive on some small interval containing t₀; it would then follow that ∫ₐᵇ f(t)² dt > 0.)

The same inner product can be defined on subspaces of C⁰([a, b]), e.g., P_k.

c. We define an inner product on M_{n×n} in Exercise 18.

If V is an inner product space, we define length, orthogonality, and the angle between vectors just as we did in Rⁿ. If v ∈ V, we define its length to be ||v|| = √⟨v, v⟩. We say v and w are orthogonal if ⟨v, w⟩ = 0. Since the Cauchy-Schwarz inequality can be established in general by following the proof of Proposition 2.3 of Chapter 1 verbatim, we can define the angle θ between v and w by the equation

cos θ = ⟨v, w⟩ / (||v|| ||w||).

We can define orthogonal subspaces, orthogonal complements, and the Gram-Schmidt process analogously.
We can use the inner product defined in Example 9a to prove the following important
result about curve fitting.

Theorem 5.4 (Lagrange Interpolation Formula) Given k + 1 points

(t₁, b₁), (t₂, b₂), ..., (t_{k+1}, b_{k+1})

in the plane with t₁, t₂, ..., t_{k+1} distinct, there is exactly one polynomial p ∈ P_k whose graph passes through the points.

Proof We begin by explicitly constructing a basis for P_k consisting of mutually orthogonal vectors of length 1 with respect to the inner product defined in Example 9a. That is, to start, we seek a polynomial p₁ ∈ P_k so that

p₁(t₁) = 1,   p₁(t₂) = 0,   ...,   p₁(t_{k+1}) = 0.

The polynomial q₁(t) = (t − t₂)(t − t₃)⋯(t − t_{k+1}) has the property that q₁(tⱼ) = 0 for j = 2, 3, ..., k + 1, and q₁(t₁) = (t₁ − t₂)(t₁ − t₃)⋯(t₁ − t_{k+1}) ≠ 0 (why?). So now we set

p₁(t) = (t − t₂)(t − t₃)⋯(t − t_{k+1}) / ((t₁ − t₂)(t₁ − t₃)⋯(t₁ − t_{k+1}));

then, as desired, p₁(t₁) = 1 and p₁(tⱼ) = 0 for j = 2, 3, ..., k + 1. Similarly, we can define

p₂(t) = (t − t₁)(t − t₃)⋯(t − t_{k+1}) / ((t₂ − t₁)(t₂ − t₃)⋯(t₂ − t_{k+1}))

and polynomials p₃, ..., p_{k+1} so that

pᵢ(tⱼ) = 1 when i = j,   and   pᵢ(tⱼ) = 0 when i ≠ j.

Like the standard basis vectors in Euclidean space, p₁, p₂, ..., p_{k+1} are unit vectors in P_k that are orthogonal to one another. It follows from Exercise 4.3.5 that these vectors form a linearly independent set, hence a basis for P_k (why?). In Figure 5.6 we give the graphs of the Lagrange basis polynomials p₁, p₂, p₃ for P₂ when t₁ = −1, t₂ = 0, and t₃ = 2.

Now it is easy to see that the appropriate linear combination

p = b₁p₁ + b₂p₂ + ⋯ + b_{k+1}p_{k+1}

has the desired properties: viz., p(tⱼ) = bⱼ for j = 1, 2, ..., k + 1. On the other hand, two polynomials of degree ≤ k with the same values at k + 1 points must be equal, since their difference is a polynomial of degree ≤ k with at least k + 1 roots. This establishes uniqueness. (More elegantly, any polynomial q with q(tⱼ) = bⱼ, j = 1, ..., k + 1, must satisfy ⟨q, pⱼ⟩ = bⱼ, j = 1, ..., k + 1.) ∎
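The construction in the proof is entirely explicit and computes easily. Here is a sketch (ours; Python with NumPy) using the nodes t₁ = −1, t₂ = 0, t₃ = 2 of Figure 5.6 and some hypothetical values bⱼ.

    import numpy as np

    def lagrange_basis(ts):
        # Return callables p_i with p_i(t_j) = 1 if i = j, 0 otherwise.
        def make(i):
            def p(t):
                num = np.prod([t - tj for j, tj in enumerate(ts) if j != i])
                den = np.prod([ts[i] - tj for j, tj in enumerate(ts) if j != i])
                return num / den
            return p
        return [make(i) for i in range(len(ts))]

    ts = [-1.0, 0.0, 2.0]
    bs = [1.0, 3.0, 2.0]                      # hypothetical data values
    ps = lagrange_basis(ts)

    def interp(t):                            # p = b1 p1 + b2 p2 + b3 p3
        return sum(b * p(t) for b, p in zip(bs, ps))

    print([interp(t) for t in ts])            # recovers [1.0, 3.0, 2.0]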

► EXERCISES 5.5

1. Find the projection of the given vector b ∈ Rⁿ onto the given hyperplane V ⊂ Rⁿ.
(a) V = {x₁ + x₂ + x₃ = 0} ⊂ R³, b = (2, 1, 1)ᵀ
*(b) V = {x₁ + x₂ + x₃ = 0} ⊂ R⁴, b = (0, 1, 2, 3)ᵀ
(c) V = {x₁ − x₂ + x₃ + 2x₄ = 0} ⊂ R⁴, with b as given.

2. Check from the formula P = A(AᵀA)⁻¹Aᵀ for the projection matrix that P = Pᵀ and P² = P. Show that I − P has the same properties; explain.

3. Let V = Span((1, 0, 1)ᵀ, (0, 1, −2)ᵀ) ⊂ R³. Construct the matrix [proj_V]
(a) by finding [proj_{V⊥}];
(b) by using the projection matrix P given in formula (†) on p. 228;
(c) by finding an orthogonal basis for V.
*4. (a) Find the least squares solution of

x₁ + x₂ = 4
2x₁ + x₂ = −2
x₁ − x₂ = 1.

(b) Find the point on the plane spanned by (1, 2, 1)ᵀ and (1, 1, −1)ᵀ that is closest to (4, −2, 1)ᵀ.

5. (a) Find the least squares solution of

x₁ + x₂ = 1
x₁ − 3x₂ = 4
2x₁ + x₂ = 3.

(b) Find the point on the plane spanned by (1, 1, 2)ᵀ and (1, −3, 1)ᵀ that is closest to (1, 4, 3)ᵀ.
6. Solve Exercise 5.4.26 anew, using (‡) on p. 230.
7. Consider the four data points (−1, 0), (0, 3), (1, 1), (2, 5).
*(a) Find the least squares horizontal line y = a fitting the data points. Check that the sum of the errors is 0.
(b) Find the least squares line y = ax + b fitting the data points. Check that the sum of the errors is 0.
*(c) Find the least squares parabola y = ax² + bx + c fitting the data points. (Calculator recommended.) What is true of the sum of the errors in this case?

8. Consider the four data points (1, 1), (2, 2), (3, 1), (4, 3).
(a) Find the least squares horizontal line y = a fitting the data points. Check that the sum of the errors is 0.
(b) Find the least squares line y = ax + b fitting the data points. Check that the sum of the errors is 0.
(c) Find the least squares parabola y = ax² + bx + c fitting the data points. (Calculator recommended.) What is true of the sum of the errors in this case?
9. Derive the equation (*) on p. 227 by starting with the equation Ax̄ = p and using the result of Theorem 4.9 of Chapter 4.

10. Let V ⊂ Rⁿ be a subspace. Prove from the definition of proj_V on p. 226 that
(a) proj_V(x + y) = proj_V x + proj_V y for all vectors x and y;
(b) proj_V(cx) = c proj_V x for all vectors x and scalars c;
(c) for any b ∈ Rⁿ we have b = proj_V b + proj_{V⊥} b.
Parts a and b tell us that proj_V is a linear map.

11. Using the definition of projection on p. 226, prove that
(a) if [proj_V] = A, then A = A² and A = Aᵀ. (Hint: For the latter, show that Ax · y = x · Ay for all x, y. It may be helpful to write x and y as the sum of vectors in V and V⊥.)
(b) if A² = A and A = Aᵀ, then A is a projection matrix. (Hints: First decide onto which subspace it should be projecting. Then show that for all x, the vector Ax lies in that subspace and x − Ax is orthogonal to that subspace.)
12. Execute the Gram-Schmidt process in each case to give an orthonormal basis for the subspace spanned by the given vectors.

13. Let V and b be as given.
(a) Find an orthogonal basis for V.
(b) Use your answer to part a to find p = proj_V b.
(c) Letting A be the matrix whose columns are the given spanning vectors, use your answer to part b to give the least squares solution of Ax = b.

14. Let V ⊂ R⁴ be the span of the given vectors.
(a) Give an orthogonal basis for V.
(b) Give an orthogonal basis for V⊥.
(c) Given a general vector x ∈ R⁴, find v ∈ V and w ∈ V⊥ so that x = v + w.

15. According to Proposition 4.10 of Chapter 4, if A is an m × n matrix, then for each b ∈ C(A), there is a unique x ∈ R(A) with Ax = b. In each case, give a formula for that x.

(a) A = [ 1  2  3 ]

(b) A = [ 1  1   0 ]
        [ 0  1  −1 ]

(c) A = [ 1  1  3  −5 ]

(d) A = [ 1  1  1   1 ]
        [ 1  1  3  −5 ]
        [ 2  2  4  −4 ]

16. Let A be an n × n matrix and, as usual, let a₁, ..., aₙ denote its column vectors.
(a) Suppose a₁, ..., aₙ form an orthonormal set. Prove that A⁻¹ = Aᵀ.
*(b) Suppose a₁, ..., aₙ form an orthogonal set and each is nonzero. Find the appropriate formula for A⁻¹.
17. Let V = C⁰([−a, a]) with the inner product ⟨f, g⟩ = ∫_{−a}^{a} f(t)g(t) dt. Let U⁺ ⊂ V be the subset of even functions, and let U⁻ ⊂ V be the subset of odd functions. That is, U⁺ = {f ∈ V : f(−t) = f(t) for all t ∈ [−a, a]} and U⁻ = {f ∈ V : f(−t) = −f(t) for all t ∈ [−a, a]}.
(a) Prove that U⁺ and U⁻ are orthogonal subspaces of V.
(b) Use the fact that every function can be written as the sum of an even and an odd function, viz.,

f(t) = ½(f(t) + f(−t)) + ½(f(t) − f(−t)),
           (even)            (odd)

to prove that U⁻ = (U⁺)⊥ and U⁺ = (U⁻)⊥.


18. (See Exercise 1.4.22 for the definition and basic properties of trace.)
(a) If A, B ∈ M_{n×n}, define ⟨A, B⟩ = tr(AᵀB). Check that this is an inner product on M_{n×n}.
(b) Check that if A is symmetric and B is skew-symmetric, then ⟨A, B⟩ = 0. (Hint: Show that ⟨A, B⟩ = −⟨B, A⟩.)
(c) Deduce that the subspaces of symmetric and skew-symmetric matrices (cf. Exercise 4.3.24) are orthogonal complements in M_{n×n}.

19. Let g₁(t) = 1 and g₂(t) = t. Using the inner product defined in Example 9b, find the orthogonal complement of Span(g₁, g₂) in
(a) P₂ ⊂ C⁰([−1, 1]); *(b) P₂ ⊂ C⁰([0, 1]); (c) P₃ ⊂ C⁰([−1, 1]).

*20. Show that for any positive integer n, the functions 1, cos t, sin t, cos 2t, sin 2t, ..., cos nt, sin nt are orthogonal in C^∞([−π, π]) ⊂ C⁰([−π, π]) (using the inner product defined in Example 9b).
CHAPTER 6

SOLVING NONLINEAR PROBLEMS
In this brief chapter we introduce some important techniques for dealing with nonlinear problems (and in the infinite-dimensional setting as well, although that is too far off-track for us here). As we've said all along, we expect the derivative of a nonlinear function to dictate locally how the function behaves. In this chapter we come to the rigorous treatment of the inverse and implicit function theorems, to which we alluded at the end of Chapter 4, and to a few equivalent descriptions of a k-dimensional manifold, which will play a prominent role in Chapter 8.

► 1 THE CONTRACTION MAPPING PRINCIPLE


We begin with a useful result about summing series of vectors. It will be important not just
in our immediate work but also in our treatment of matrix exponentials in Chapter 9.

Proposition 1.1 Suppose {a_k} is a sequence of vectors in Rⁿ and the series

Σ_{k=1}^∞ ||a_k||

converges (i.e., the sequence of partial sums s_k = ||a₁|| + ⋯ + ||a_k|| is a convergent sequence of real numbers). Then the series

Σ_{k=1}^∞ a_k

of vectors converges (i.e., the sequence of partial sums S_k = a₁ + ⋯ + a_k is a convergent sequence of vectors in Rⁿ).

Proof We first prove the result in the case n = 1. Given a sequence {a_k} of real numbers, define b_k = a_k + |a_k|. Note that

b_k = 2a_k if a_k ≥ 0,   and   b_k = 0 otherwise.

Now, the series Σ b_k converges by comparison with Σ 2|a_k|. (Directly: Since b_k ≥ 0, the partial sums form a nondecreasing sequence that is bounded above by 2Σ|a_k|. That nondecreasing sequence must converge to its least upper bound. See Example 4c of Chapter 2, Section 2.) Since a_k = b_k − |a_k|, the series Σ a_k converges, being the sum of the two convergent series Σ b_k and −Σ |a_k|.

We use this case to derive the general result. Denote by a_{k,j}, j = 1, ..., n, the jth component of the vector a_k. Obviously, we have |a_{k,j}| ≤ ||a_k||. By comparison with the convergent series Σ_k ||a_k||, for any j = 1, ..., n, the series Σ_k |a_{k,j}| converges, and hence, by what we've just proved, so does the series Σ_k a_{k,j}. Since this is true for each j = 1, ..., n, the series

Σ_k a_k = (Σ_k a_{k,1}, ..., Σ_k a_{k,n})ᵀ

converges as well, as we wished to establish. ∎

Remark The result holds even if we use something other than the Euclidean length in Rⁿ. For example, we can apply the result by using the norm defined on the vector space of m × n matrices in Section 1 of Chapter 5, since the triangle inequality ||A + B|| ≤ ||A|| + ||B|| holds (see Proposition 1.3 of Chapter 5) and |aᵢⱼ| ≤ ||A|| for any matrix A = [aᵢⱼ] (why?).

The following result is crucial in both pure and applied mathematics, and applies in
infinite-dimensional settings as well.

Definition Let X be a subset of Rⁿ. A function f: X → X is called a contraction mapping if there is a constant c, 0 < c < 1, so that

||f(x) − f(y)|| ≤ c||x − y||   for all x, y ∈ X.

(It is crucial that c be strictly less than 1, as Exercise 2 illustrates.)

► EXAMPLE 1

Consider f: [0, π/3] → [0, 1] ⊂ [0, π/3] given by f(x) = cos x. Then by the mean value theorem, for any x, y ∈ [0, π/3],

|f(x) − f(y)| = |sin z| |x − y|   for some z between x and y
             ≤ (√3/2) |x − y|    since 0 ≤ z ≤ π/3.

Since √3/2 < 1, f is a contraction mapping. ◄

Theorem 1.2 (Contraction Mapping Principle) Let X ⊂ Rⁿ be closed. Let f: X → X be a contraction mapping. Then there is a unique point x ∈ X such that f(x) = x. (Not surprisingly, x is called a fixed point of f.)

Proof Let x₀ ∈ X be arbitrary, and define a sequence recursively by

x_{k+1} = f(x_k).

Our goal is to show that, inasmuch as f is a contraction mapping, this sequence converges to some point x ∈ X. Then, by continuity of f (see Exercise 1), we will have

f(x) = lim_{k→∞} f(x_k) = lim_{k→∞} x_{k+1} = x.

Consider the equation

x_k = x₀ + (x₁ − x₀) + (x₂ − x₁) + ⋯ + (x_k − x_{k−1}) = x₀ + Σ_{j=1}^k (x_j − x_{j−1}).

This suggests that we set

a_k = x_k − x_{k−1}

and try to determine whether the series Σ a_k converges. To this end, we wish to apply Proposition 1.1, and so we begin by estimating ||a_k||: By the definition of the sequence {x_k} and the definition of a contraction mapping, we have

||a_k|| = ||x_k − x_{k−1}|| = ||f(x_{k−1}) − f(x_{k−2})|| ≤ c||x_{k−1} − x_{k−2}|| = c||a_{k−1}||

for the constant 0 < c < 1, so that

||a_k|| ≤ c||a_{k−1}|| ≤ c²||a_{k−2}|| ≤ ⋯ ≤ c^{k−1}||a₁||.

Therefore,

Σ_{k=1}^K ||a_k|| ≤ (Σ_{k=1}^K c^{k−1}) ||a₁|| ≤ (1/(1 − c)) ||a₁||.

Since 0 < c < 1, the partial sums are bounded, and so the series Σ ||a_k|| converges. By Proposition 1.1, we infer that the series Σ a_k converges to some vector a ∈ Rⁿ. It follows, then, that x_k → x₀ + a = x, as required.

Two issues remain. Since x_k → x, all the x_k's are elements of X, and X is closed, we know that x ∈ X as well. The uniqueness of the fixed point is left to the reader in Exercise 1. ∎

► EXAMPLE 2

According to Theorem 1.2, the function f introduced in Example 1 must have a unique fixed point in the interval [0, π/3]. Following the proof with x₀ = 0, we obtain the following values:

k    x_k          k    x_k
1    1.           11   0.744237
2    0.540302     12   0.735604
3    0.857553     13   0.741425
4    0.654289     14   0.737506
5    0.793480     15   0.740147
6    0.701368     16   0.738369
7    0.763959     17   0.739567
8    0.722102     18   0.738760
9    0.750417     19   0.739303
10   0.731404     20   0.738937

Indeed, as Figure 1.1 illustrates, the values x_k are converging to the x-coordinate of the intersection of the graph of f(x) = cos x with the diagonal y = x. ◄
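The table is produced by the following two-line iteration (our sketch; plain Python).

    import math

    x = 0.0                     # x_0 = 0, as in the text
    for k in range(1, 21):
        x = math.cos(x)         # x_{k+1} = f(x_k)
        print(k, x)
    # the iterates approach the fixed point 0.739085... only slowly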

Example 2 shows that this is a very slow method to obtain the solution of cos x = x. Far better is Newton's method, familiar to every student of calculus. Given a differentiable function g: R → R, we start at x_k, draw the tangent line to the graph of g at x_k, and let x_{k+1} be the x-intercept of that tangent line, as shown in Figure 1.2. We obtain in this way a sequence, and one hopes that if x₀ is sufficiently close to a root a, then the sequence will converge to a. It is easy to see that the recursion formula for this sequence is

x_{k+1} = x_k − g(x_k)/g′(x_k),

so, in fact, we are looking for a fixed point of the mapping f(x) = x − g(x)/g′(x). If we assume g is twice differentiable, then we find that f′ = gg″/(g′)², so f will be a contraction mapping whenever |gg″/(g′)²| ≤ c < 1. In particular, if |g″| ≤ M and |g′| ≥ m, then

iterating f will converge to a root a of g if we start in any closed interval containing a on which |g| < m²/M (provided f maps that interval back to itself). For the strongest result, see Exercise 8.

► EXAMPLE 3

Reconsidering the problem of Example 2, let's use Newton's method to approximate the root of cos x = x by taking g(x) = x − cos x and iterating the map f(x) = x − (x − cos x)/(1 + sin x).

k    x_k          k    x_k
0    1.           0    0.523599
1    0.750364     1    0.751883
2    0.739113     2    0.739121
3    0.739085     3    0.739085
4    0.739085     4    0.739085

Here we see that, whether we start at x₀ = 1 or at x₀ = π/6, Newton's method converges to the root quite rapidly. Indeed, on the interval [π/6, π/3], we have m = 1.5, M = 0.87, and |g| < 0.55, which is far smaller than m²/M ≈ 2.6. ◄
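For comparison, the Newton iteration of this example in code (our sketch; plain Python):

    import math

    def newton(x, steps):
        for k in range(1, steps + 1):
            x = x - (x - math.cos(x)) / (1 + math.sin(x))   # one Newton step
            print(k, x)
        return x

    newton(1.0, 4)           # reaches 0.739085... in three steps
    newton(math.pi / 6, 4)   # the same root from x_0 = pi/6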

To move to higher dimensions, we need a multivariable Mean Value Theorem. The Mean Value Theorem, although often misinterpreted in beginning calculus courses, tells us that if we have bounds on the size of the derivative of a differentiable function, then we have bounds on how much the function itself can change from one point to another. A crucial tool here will be the norm of a linear map, introduced in Chapters 3 and 5.

Proposition 1.3 (The Mean Value Inequality) Suppose U ⊂ Rⁿ is open, f: U → Rᵐ is C¹, and a and b are points in U so that the line segment between them is contained in U.¹ Then

||f(b) − f(a)|| ≤ (max_{x∈[a,b]} ||Df(x)||) ||b − a||.

¹More generally, all we need is a C¹ path in U joining a and b.



Proof Define g: [0, 1] → Rᵐ by g(t) = f(a + t(b − a)). Note that

f(b) − f(a) = g(1) − g(0).

By the chain rule, g is differentiable and

(*)   g′(t) = Df(a + t(b − a))(b − a).

Applying Lemma 5.3 of Chapter 3, we have

||f(b) − f(a)|| = ||∫₀¹ g′(t) dt|| ≤ ∫₀¹ ||g′(t)|| dt ≤ max_{t∈[0,1]} ||g′(t)||.

By (*), we have ||g′(t)|| ≤ ||Df(a + t(b − a))|| ||b − a||, and so

max_{t∈[0,1]} ||g′(t)|| ≤ (max_{x∈[a,b]} ||Df(x)||) ||b − a||.

This completes the proof. ∎

► EXERCISES 6.1

1. Prove that any contraction mapping is continuous and has at most one fixed point.

2. Let f: R → R be given by f(x) = √(x² + 1). Show that f has no fixed point and that |f′(x)| < 1 for all x ∈ R. Why does this not contradict Theorem 1.2?

*3. For the sequence {x_k} defined in the proof of Theorem 1.2, prove that ||x_k − x|| ≤ (cᵏ/(1 − c)) ||x₁ − x₀||. This gives an a priori estimate on how fast the sequence converges to the fixed point.

4. A sequence {x_k} of points in Rⁿ is called a Cauchy sequence if for all ε > 0 there is K so that whenever k, ℓ ≥ K, we have ||x_k − x_ℓ|| < ε. It is a fact that any Cauchy sequence in Rⁿ is convergent. (See Exercise 2.2.14.) Suppose 0 < c < 1 and {x_k} is a sequence of points in Rⁿ so that ||x_{k+1} − x_k|| ≤ c||x_k − x_{k−1}|| for all k ∈ N. Prove that {x_k} is a Cauchy sequence, hence convergent. (Hint: Show that whenever k, ℓ ≥ K, we have ||x_k − x_ℓ|| ≤ (c^K/(1 − c)) ||x₁ − x₀||.)

5. Use the result of Exercise 2.2.14 to give a different proof of Proposition 1.1.

6. (a) Show that if H is any square matrix with ||H|| < 1, then I − H is invertible. (Hint: Consider the geometric series Σ Hᵏ. You will need to use the result of Exercise 5.1.6.)
(b) Suppose, more generally, that A is an invertible n × n matrix. Show that when ||H|| < 1/||A⁻¹||, the matrix A + H is invertible as well. (Hint: Write A + H = A(I + A⁻¹H).)
(c) Prove that the set of invertible n × n matrices is an open subset of M_{n×n} ≅ R^{n²}. This set is denoted GL(n), the general linear group. (Hint: By Exercise 5.1.5, if (Σ hᵢⱼ²)^{1/2} < δ, then ||H|| < δ.)

7. Continuing Exercise 6:
(a) Show that if ||H|| ≤ ε < 1, then ||(I + H)⁻¹ − I|| ≤ ε/(1 − ε).
(b) More generally, if A is invertible and ||A⁻¹|| ||H|| ≤ ε < 1, then estimate ||(A + H)⁻¹ − A⁻¹||.
(c) Let X ⊂ M_{n×n} be the set of invertible n × n matrices (by Exercise 6, this is an open subset). Prove that the function f: X → X, f(A) = A⁻¹, is continuous.

8. Suppose x₀ ∈ U ⊂ R, g: U → R is C², and g′(x₀) ≠ 0. Set h₀ = −g(x₀)/g′(x₀) and x₁ = x₀ + h₀. Prove that if |g″| ≤ M on the interval [x₁ − |h₀|, x₁ + |h₀|] and |g(x₀)|M ≤ ½(g′(x₀))², then Newton's method converges, starting at x₀, to the unique root of g in that interval,² as follows.
(a) According to Proposition 3.2 of Chapter 5, we have g(x₁) = g(x₀) + g′(x₀)h₀ + ½g″(ξ)h₀² for some ξ between x₀ and x₁. Prove that |g(x₁)| ≤ ¼|g(x₀)|.
(b) Using the fact that g′(x₁) = g′(x₀) + g″(c)h₀ for some c between x₀ and x₁, show that |g′(x₁)| ≥ ½|g′(x₀)|. Now deduce that

|g(x₁)|/g′(x₁)² ≤ |g(x₀)|/g′(x₀)²

and hence that |g(x₁)|M ≤ ½(g′(x₁))².
(c) Deduce that |g(x₁)/g′(x₁)| ≤ |h₀|/2.
(d) Prove analogously that if, when we apply Newton's method, we set x_{k+1} = x_k + h_k, then |h_k| ≤ |h₀|/2ᵏ. Deduce that iterating Newton's method converges to a point in the given interval.
9. Using the result of Exercise 8:
*(a) Let g(x) = x² − 2. Carry out two steps of Newton's method starting at x₀ = 1. Give an interval that is guaranteed to contain a nearby root of g.
(b) Let g(x) = x³ − 2. Carry out two steps of Newton's method starting at x₀ = 5/4. Give an interval that is guaranteed to contain a nearby root of g.
*(c) Let g(x) = x − cos 2x. Carry out two steps of Newton's method starting at x₀ = π/4. Give an interval that is guaranteed to contain a nearby root of g.
10. Suppose Xq € U C R", g: U -> R" is C2, and Dg(xo) is invertible. Newton’s method in n
dimensions is given by iterating the map
f(x) = x- Dg(x)-1g(x),
startingatxo. Setho = -DgCxol^gCxp)andxi = Xo 4- ho- LetB = B(xi, ||ho||); suppose ||Hess(gj)||
< Mi on B, and set M = ^52"=! Af2- Suppose, moreover, that

l|i>g(xo)’1||2)|g(xo))|M < 1/2.


Prove that Newton’s method converges, starting at xo, to a point of B, as follows.
(a) Using Proposition 3.2 of Chapter 5, prove that ||g(xi) || < ||ho||2 < |||g(xo)||.
(b) Show that ||Dg(xi) - Dg(xo)|| < Af||ho||. Using Exercise 7, deduce that ||Dg(xi)-11| <
2||Dg(x0)-11|. (Hint: Let H = Dg(x0)-1(Dg(x1) - Dg(xo)); show that ||H|| < 1/2.)
(c) Now show that ||Dg(xi)-1||2||g(x1)|| < ||Dg(xo)-1||2||g(xo)||, and conclude that
l|£>g(x1)-1||2||g(x1)||M<l/2.
(d) Prove that ||Dg(xi)-1g(xi)|| < ||hol|/2.
(e) Letting bfe = — Dg(xt)-1g(x,t), prove analogously that ||hfc || < ||ho||/2fc. Deduce that iterating
Newton’s method converges to a point in the given ball.
11. Using the result of Exercise 10:
4xi 4- x| - 4
*(a) Let g: R2 -> R2 be defined by g Do one step of Newton’s method to
4X1X2 — 1
-
solve g(x) = 0, starting at Xo = , and find a ball in R2 that is guaranteed to contain a root of g.

2We learned of the n-dimensional version of this result, which we give in Exercise 10, called Kantarovich’s
Theorem, in Hubbard and Hubbard’s Vector Calculus, Linear Algebra, and Differential Forms.
2 The Inverse and Implicit Function Theorems 251

(b) Let g: R2 —> R2 be defined by g '1 +*2 ~5 Do one step of Newton’s method to
2*1X2 - 1

solve g(x) = 0, starting at Xo = and find a ball in R2 that is guaranteed to contain a root of g.

4sinxi + x%
(c) Let g: R2 -> R2 be defined by g . Do one step of Newton’s method to
2*1*2 — 1

solve g(x) = 0,. starting at Xq = , and find a ball in R2 that is guaranteed to contain a root of g.
0

r, - ?1 xr22
xi
(d) Let g: R2 -> R2 be defined by g . Do one step of Newton’s method to
X2 — COSX1
0
solve g(x) = 0, starting at Xq = , and find a ball in R2 that is guaranteed to contain a root of g.
1

12. Prove the following, slightly stronger version of Proposition 1.3. Suppose 17 c R" is open,
f: U -> Rm is differentiable, and a and b are points in U so that the line segment between them
is contained in U. Then prove that there is a point £ on that line segment so that ||f (b) — f (a) || <
||Df ($)|| ||b — a||. (Hints: Define g as before, let v = g(l) — g(0), and define 0: [0,1] -* R by
0(t) = g(r) • v. Apply the usual mean value theorem and the Cauchy-Schwarz inequality, Proposition
2.3 of Chapter 1, to show that ||v||2 = 0(1) - 0(0) < Ug7 (c)|| ||v|| for some c e (0,1).)

► 2 THE INVERSE AND IMPLICIT FUNCTION THEOREMS


When we study functions f: R -> R in single-variable calculus, it is usually quite simple
to decide when a function has an inverse function. Any increasing (or decreasing) function
certainly has an inverse, even if we are unable to give it explicitly (e.g., what is the inverse
of the function /(x) = x5 + x 4-1?). Sometimes we make up names for inverse functions,
such as log, the inverse of exp, and arcsin, the inverse of sin (restricted to the interval
[-7F/2, 7F/2J).
Since a differentiable function on an interval in R with nowhere zero derivative has a
differentiable inverse, it is tempting to think that if the derivative f'(a) 0, then / should
have a local inverse at a.

► EXAMPLE 1

| + x2 sin J, X jL 0
Let f(x) = . Then, calculating from the definition, we find
0, x=0

f7(0) = | 4-Umhsin | = 5 > 0.

On the other hand, if x 0,

f'(x) = | + 2x sin | - cos

so there are points (e.g., x = 1 jinn for any nonzero integer n) arbitrarily close to 0 where /7(x) < 0.
That is, despite the fact that /7(0) > 0, there is no interval around 0 on which f is increasing, as
Figure 2.1 suggests. Thus, f has no inverse on any neighborhood of 0.
252 ► Chapter 6. Solving Nonlinear Problems

All right, so we need a stronger hypothesis. If we assume f is C1, then it will follow
that if f'(a) > 0, then f' > 0 on an interval around a, and so f will be increasing—hence
invertible—on that interval. That is the result that generalizes nicely to higher dimensions.

Theorem 2.1 (Inverse Function Theorem) Suppose U c R." is open, xo g 17,


f: U -> R" is C1, and Df(x0) is invertible. Then there is a neighborhood V CU of
«o on which f has a C1 inverse function. That is, there are neighborhoods V of Xq and W
of f(xo) = yo and a C1 junction g: W —> V so that

f(g(y))=y for ally eW and g(f(x))=x forallxeV.

Moreover, iff(x) = y, we have

Dg(y) = (Df(x))-1.

Proof Without loss of generality, we assume that xo = yo = 0 and that Df (0) = I.


(We make appropriate translations and then replace f(x) by Df(0)-1f(x).) Since f is C1,
there is r > 0 so that

||Df(x) — 11| < | whenever ||x|| < r.

Now, fix y with ||y || < r/2, and define the function by

0(x) =x-f(x)+y.

Note that ||Zty(x)|| = ||Df(x) - Z||. Whenever ||x|| < r, we have (by Proposition 1.3)

H(X) II < ||X - f(X) II + ||y|| < r- + r- = r,

and so 0 maps the closed ball 2?(0, r) to itself. Moreover, if x, y G B(0, r), by Proposition
1.3 we have

||0(x) — 0(y)|| < |||x - y||,


2 The Inverse and Implicit Function Theorems ◄ 253

Figure 2.2

so 0 is a contraction mapping on B(0, r). By Theorem 1.2, 0 has a unique fixed point
xy € B(Q, r). That is, there is a unique point xy e 2?(0, r) so that f (xy) = y. We leave it to
the reader to check in Exercise 10 that in fact xy e B(0, r).
As pictured in Figure 2.2, take W = B(0, r/2) and V = f-1 (W) Cl B(0, r) (note that V
is open because f is continuous; see also Exercise 2.2.7). Define g: W -> V by g(y) = xy.
We claim first of all that g is continuous. Indeed, define 0: B(0, r) -> by 0(x) =
f (x) — x. Then, by Proposition 1.3 we have

||(f(u)-u) - (f(v) - v)|| = ||f(u) - lKv)|| < l||u-v||.

Thus, we have

||(f(u)-f(v))-(u-v)|| <i||u-v||.

It follows from the triangle inequality (see Exercise 1.2.17) that

l|u — v|| - ||f(u) — f(v)|| < |||u — v||,

and so

||h -t || < ||f(u) -f(v)||.

Writing f (u) = y and f(v) = z, we have

(*) llg(y) -g(z)ll < 2||y-z||.

It follows that g is continuous (e.g., given e > 0, take S = e/2).


Next, we check that g is differentiable. Fix y € W and write g(y) = x; and we wish
to prove that Dg(y) = (Df(x))-1. Choose k sufficiently small that y + k g W. Set
g(y +• k) — x + h, so that h = g(y + k) — g(y). For ease of notation, write A = Df (x).
We are to prove that
254 > Chapter 6. Solving Nonlinear Problems

We consider instead the result of multiplying this quantity by (the fixed matrix) A:

A(g(y + k)-g(y))-k _ Ah-k _ f(x + h) -f(x) - Dt(x)h ||h||


l|k|| ||k|| ~ ||h|| ' ilk||'

We infer from (*) that ||h|| < 2||k||, so as k -> 0, it follows that h -> 0 as well. Note,
moreover, that h /= 0 when k / 0 (why?). Now we analyze the final product above: The
first term approaches 0 by the differentiability of f; the second is bounded above by 2. Thus,
the product approaches 0, as desired.
The last order of business is to see that g is^C1. We have

og(y) = (or(g(y)))-1,

so we see that Dg is the composition of the function y Df (g(y)) and the function
A A-1 on the space of invertible matrices. Since g is continuous and f is C1, the former
is continuous. By Exercise 6.1.7, the latter is continuous (indeed, we will prove much
more in Corollary 5.19 of Chapter 7 when we study determinants in detail). Since the
composition of continuous functions is continuous, the function y Dg(y) is continuous,
as required. ■

Remark More generally, with a bit more work, one can show that if f is Qk (or
smooth), then the local inverse g is likewise Gk (or smooth).
It is important to remember that this theorem guarantees only a local inverse function.
It may be rather difficult to determine whether f is globally one-to-one. Indeed, as the
following example shows, even if Df is everywhere invertible, the function f may be very
much not one-to-one.

> EXAMPLE 2

Definef: R2 -> R2by

Thenf is C1, and

—eu sin v
eu cos v

is everywhere nonsingular since its determinant is e2u / 0. Nevertheless, since sine and cosine are
/„\ Z u \
periodic, it is clear that f is not one-to-one: We have fl I = f I I for any integer k.
\v / \v + 2jtkl

On the other hand, if f , then we apparently can solve for u and v:

g ZA _ ’ 2 log(*2 + y2)
\y/ _ arctan(y/x)
2 The Inverse and Implicit Function Theorems ◄ 255

Figure 2.3

certainly satisfies fog So, why is g not the inverse function of f ? Recall that

arctan: R —*• (—nr/2, t t /2). So, as shown in Figure 2.3, if we consider the domain of f to be

: —Txfl < v < njl and the domain of g to be , then f and g will be inverse

functions.

Let’s calculate the derivative of any local inverse g according to Theorem 2.1. Iff

then

Note that we get the same formula by differentiating our specific inverse function (t). It is a bit
surprising that the derivative of any other inverse function, with different domain and range, must be
given by the identical formula.

Now we are finally in a position to prove the Implicit Function Theorem, which first
arose in our informal discussion of manifolds in Section 5 of Chapter 4. It is without
question one of the most important theorems in higher mathematics.

Theorem 2.2 (Implicit Function Theorem) Suppose U c Rn is open and


x
F: U -> Writing avector inW1 as , withx e R" m andy g suppose that
_y _
= 0 and the m xm matrix is invertible. Then there are neighborhoods

V ofxo and W of yo and a 61 function (/): V W so that

F = 0, x G V, and y g W «=> y = $(x).

Moreover,

( x \\ 9F / x \
X ~ \3y \0(x)// dx \0(x)/ ‘
256 ► Chapter 6. Solving Nonlinear Problems

Figure 2.4

Proof Define f: U -» ln = Rn“m xP by

Note that the linear map

is invertible (see Exercise 4.2.7). This means that—as illustrated in Figure 2.4—there
are neighborhoods V c of x q , W G of y0, and Z C of 0 and a C1 function
g: Z V x W so that g is an inverse of f on V x W. Now define $: V -> W by

$ is obviously C1 since g is. And

so F ( X ) = 0, as desired. On the other hand, if F (X I = 0, x G V, and y g W, then


W(x)/ \y/
X = g ^X^, so y must be equal to ^(x).

Since we know that 0 is C1, we can calculate the derivative by implicit differentiation:
Define h: V -> by h(x) = F ( X ). Then h is C1, and since h(x) = 0 for all x g V,

we have
2 The Inverse and Implicit Function Theorems ◄ 257

O = Dh(x) = —

9F / x \ .
Since by hypothesis — I J z I is invertible, the desired result is immediate. ■
dy
Remark With not much more work, one can prove analogously that if F is Qk (or
smooth), then y is given locally as a Qk (or smooth) function of x. We may take this for
granted in our later work.

► EXAMPLES

Consider the function F: R2 -> R, F = x3ey + 2x cos(xy). We assert that the equation

xo 1
= 3 defines y locally as a function of x near the point By the Implicit
yo 0
dF Z1 \
Function Theorem, Theorem 2.2, we need only check that — I I $4 0. Well,
3y \0/

8F , -
— = x ey — 2x sin(xy) and so
3y

so we know that in a neighborhood of = 1 there is a C1 function 0 with <£(1) = 0 whose graph is


(the appropriate piece of) the level curve F = 3, as shown in Figure 2.5. Of course, farther away, the
curve apparently gets quite crazy.
1
If we’re interested in the best linear approximation to the curve at the point , then we also
0
know from Theorem 2.2 that

so the line 5x + y = 5 is the desired tangent fine of the curve at that point. ◄ !

Figure 2.5
258 ► Chapter 6. Solving Nonlinear Problems

► EXAMPLE 4

Consider F: R5 -> R3 given by


^Xl\ o’

*2 2xi + *2 + 71 + 73 -1 1
71. = *1*2 4-xiyi + *j72 "7273 , and let a= -1
yz *27173 + *171+7273 1

W 1

*1
Does the equation F = 0 define y = as implicitly as a function of x = near a? Note
*2
L* J
first of all that F is C1. We begin by calculating the derivative of F:

^*?
*2 2 1 1 0 1
DF y\ = x% + yi 3xjxl + 2x272 *1 2*272 ~ 73 “72

72 7i 7173 *273 +2*171 yj *271 +27273 _

and so

( °"l

1 "2 1 1 0 1“

DF -1 = 0 2 0 1 -1
1 _i -1 1 1 1_
I V

In particular, we see that

1 0 1
8F
y(a) = 0 1 -1
8y
1 1 1

which is easily checked to be nonsingular, and so the hypotheses of the Implicit Function Theorem,
Theorem 2.2, are fulfilled. There is a neighborhood of a in which we have y = 0(x). Moreover, we
have

With this information, we can easily give the tangent plane at a of the surface F = 0. ◄

Remark In general, we shall not always be so chivalrous (nor shall life) as to set
up the notation precisely as in the statement of Theorem 2.2. Just as in the case of linear
equations where the first r variables needn’t always be the pivot variables, here the last m
variables needn’t always be (locally) the dependent variables. In general, it is a matter of
finding m pivots in some m columns of the m x n derivative matrix.
2 The Inverse and Implicit Function Theorems 259

EXERCISES 6.2

1. By applying the Inverse Function Theorem, Theorem 2.1, determine at which points xo the given
function f has a local C1 inverse g, and calculate Dg(f (xo)).

lx x+y+z
x/(x2 + y2)
(b) (e) f y xy + xz + yz (cf. also Exercise 2)
y/(x2 + y2)
\z xyz
x + h(y)
(c) for any 61 function h: R -> R
y

u u 4- v
2. Let U = : 0 < v < u j, and define f: U -> R2 by f
v uv
(a) Show that f has a global inverse function g. Determine the domain of and an explicit formula
forg.
(b) Calculate Dg both directly and by the formula given in the Inverse Function Theorem. Compare
your answers.
(c) What does this exercise have to do with Example 2 in Chapter 4, Section 5? In particular, give a
concrete interpretation of your answer to part b.
3. Check that in each of the following cases, the equation F = 0 defines y locally as a C1 function
xo
0(x) near a = , and calculate Z)$(x0).
yo

(a) Fi I = y1 -x3 -2 sin (n(x -y)),x0 = 1, y0 = -1

I X1 I i
*(b) F I x2 I = e*iy 4- y2 cosxix2 - 1, Xo = ,yo =o
2
\y /

l X1 I
0
(c) F I x2 I = e*iy 4- y2 arctan x2 - (1 4- t t /4), Xq = , yo = 1
1

x2 - y2 - y2 - 2 1
(d) F I yi , x0 = 2, y0 =
X - yi 4- yz - 2 i
\y2/

.2 2
X2 ,Xq =
2
(e) F ,yo =
yi 2X1X2 + xj - 2y2 + 3y£ + 8 -1 1
\y2/
*4. Show that the equations x2y 4- xy2 +12 — 1 = 0 and x2 4- y2 — 2yt = 0 define x and y implicitly

as C1 functions of t near Find the tangent line at this point to the curve so defined.
260 ► Chapter 6. Solving Nonlinear Problems

1
5. LetF I y I = x2 + 2y2 — 2xz - z2 = 0. Show that near the point a == 1 , z is given implicitly
\z/ 1
as a C1 function of x and y. Find the largest neighborhood of a on which this is true.
*6. Using the law of cosines (see Exercise 1.2.12) and Theorem 2.2, show that the angles of a triangle
are C1 functions of the sides. To a small change in which one of the sides (keeping the other two
fixed) is an angle most sensitive?
7. Define f: Mnxn -> Mnxn byf(A) = A2.
(a) By applying the Inverse Function Theorem, Theorem 2.1, show that every matrix B in a neigh­
borhood of I has (at least) two square roots A (i.e., A2 = 5), each varying as a C1 function of B.
(See Exercise 3.1.13.)
(b) Can you decide if there are precisely two or more? (Hint: In the 2 x 2 case, what is
1 o'1'
Df ?)
0 -1
dF
8. Suppose U C R3 is an open set, F: U -> R is a C1 function, and on F = 0 we have —— 0,
fp\ p
dF dF
—- 5^ 0, and — / 0. (You might use, as an example, the equation F I v I = pV - RT = 0 for
dV dT II
\r/
one mole of ideal gas; here R is the so-called gas constant) Then it is a consequence of the Implicit
Po
Function Theorem that in some neighborhood of Vo , each of p, V, and T can be written as
_ r° _ i dp \
a differentiable function of the remaining two variables. Physical chemists denote by I —— I
x /T

partial derivative of the function p = p I j with respect to V, holding T constant, etc. Prove the

thennodynamicist’s magic formula

T- =-L
T P \dpJv

9. Using the notation of Exercise 8, physical chemists define the expansion coefficient a and isother­
mal compressibility B to be, respectively,
1 1
and V \ dp )T '

*(a) Calculate a and B for an ideal gas.


dp \ a
( —— I = —.
dT Jv p
10. Check that, under the hypotheses in place in the proof of Theorem 2.1, if ||x || = r, then ||f(x)|| >
r/2. (Hint: Use Exercise 1.2.17.)
d 11. Let B = 5(0, r) c R”. Suppose U C R" is an open subset containing the closed ball B, f: U ->
is C1, f (0) = 0, and ||Df (x) — Z|| < J < 1 for all x e B. Prove that if UyH < r(l — 5), then there
is x € B such that f (x) = y.
12. Suppose U c R" is open and f: U -> Rm is C1 with f (a) = 0 and rank(Df (a)) = m. Prove that
for every c sufficiently close to 0 e R"1 the equation f (x) = c has a solution near a.
3 Manifolds Revisited •< 261

13. (The Envelope of a Family of Curves) Suppose f: R2 x (a, b) —> R is C2 and for each t e
(a, b), V/ I 0 on the level curve Ct = f = 01. (Here the gradient denotes differentiation

with respect only to x.) The curve C is called the envelope of the family of curves [Cr : t e (a, b)}
if each member of the family is tangent to C at some point (depending on t).
(a) Suppose the matrix

a2/ M d2f /«o\


9x3t } dydt yZo)

is nonsingular. Show that for some 8 > 0, there is a C1 curve g: (t0 - 8, tQ + 8) -» R2 so that

f
J \t J
A_ dt \ * J
0
Conclude that g is a parametrization of the envelope C near xo.
(b) Find the envelopes of the following families of curves (portions of which are sketched in Figure
2.6).

ii.

iii.

► 3 MANIFOLDS REVISITED
In Chapter 4, we introduced ^-dimensional manifolds in R" informally as being locally the
graph of 61 function over an open subset of a k-dimensional coordinate plane. We suggested
that, because of the Implicit Function Theorem, under the appropriate hypotheses, a level
262 ► Chapter 6. Solving Nonlinear Problems

set of a C1 function is a prototypical example. Indeed, as we now wish to make clear, there
are three equivalent formulations, roughly these:

Explicit: Near each point, M is a graph over some ^-dimensional coordinate plane.

Implicit: Near each point, M is the level set of some function whose derivative has
maximum rank.

Parametric: Near each point, M is parametrized by some one-to-one function whose


derivative has maximum rank (e.g., a parametrized curve with nonzero velocity).

We’ve seen that the implicit formulation arises in working with Lagrange multipliers, and
the parametric formulation will be crucial for our work with integration in Chapter 8. In
this brief section, we are going to make the three definitions quite precisely and then prove
their equivalence in Theorem 3.1. To make our life easier in Chapter 8, we will replace the
C1 condition with “smooth.”

Definition We say M c R” is a k-dimensional manifold if any one of the following


three criteria holds:

1. For any p e M, there is a neighborhood W C R" of p so that M A W is the graph


of a smooth function f: V -> R"“*» where V c R* is an open set. Here we are
allowed to choose any k integers 1 < ii < • • • < 4 < n; then R* is the xix • • •
plane, and Rn“* is the plane of the complementary coordinates.
2. For any p e M, there are a neighborhood W C R" of p and a smooth function
F: W -> R"_fc so that F-1(0) = M A W and rank(£>F(x)) = n — k for every x g
MAW.
3. For any p g M, there is a neighborhood W C R" of p so that M A W is the image
of a smooth function g: t7 -> Rn for some open set U c R\ with the properties
that g is one-to-one, rank(Dg(u)) = k for all u g U, and g-1: M A W -» U is
continuous. (See Figure 3.1.)

Figure 3.1

If the curious reader wonders why the last (and obviously technical) condition is in­
cluded in the third definition, see Exercises 2 and 3.
3 Manifolds Revisited 263

Theorem 3.1 The three criteria given in this definition are all equivalent.

Proof The Implicit Function Theorem, Theorem 2.2, tells us precisely that (2)=>(1).
u
And (1)=>(3) is obvious since we can set g(u) (where, for ease of notation, we
f(u)
assume here that Rfc is the xi • • • -plane). So it remains only to check that (3) ===>(2).
Suppose, as in the third definition, that we are given a neighborhood W c R" of
p € M so that M A W is the image of a smooth function g: U -> R" for some open set
U C R\ with the properties that g is one-to-one, rank(Z)g(u)) = k for all u g U, and
g-1: M fl W -> U is continuous. The last condition tells us that that if g(iio) = p, then
points sufficiently close to p in M must map by g-1 close to Uo; that is, all points of M n W
are the image under g of a neighborhood of u q .
We may assume that g(0) = p and (renumbering coordinates in R" as necessary)
T A "1
£>g(0) = — , where A is an invertible k x k matrix. We define G: V x Rn~k R”
LBJ
’01 n.
by G g(u) + . Smce

“0- A___ O_
B In-k

is invertible (see Exercise 4.2.7), it follows from the Inverse Function Theorem, Theorem

2.1, that there are neighborhoods V = Vi x V2 C R* x RB“* of


0
and W c Rn of p
0
and a local (smooth) inverse H: W V of G. (Shrinking W if necessary, we assume

W C W.) Writing H(x) = x g R* xRw~*, we define F: W -> R"~fc by F = H2.

u
Now suppose F(x) = 0. Since x e W,x - G I I for a unique vector € V. Then

f w =f (g C))=H2(g ("))=t ’
so F(x) = 0 if and only if v = 0, which means that x = g(u). This proves that the equation
F = 0 defines that portion of M given by g(u) for all u G V). But because W c W, we
know that such points comprise all of M A W. ■

► EXAMPLE 1

Perhaps an explicit example will make this proof a bit more understandable. Suppose g: R -> R3 is
u
given by g(u) = «2 and M is the image of g. We wish to write M (perhaps locally) as the level
,3
264 ► Chapter 6. Solving Nonlinear Problems

set of a function near p = 0. As in the proof, we define

u " 0 ' u
u2 + VI = U2 + Vi
u3 _ »2 _ _ u3 4- V2 _

We can explicitly construct the inverse function

6^ X
G1 y y — x2

The proof tells us to define F = Hi, and, indeed, this works. M is the zero-set of the function
M r _x2~
F: R3 -»R2 given by Fly = y \ .
I z
W L J

We ask the reader to carry this procedure out in Exercise 6 in a situation where it will only work
locally.

There are corresponding notions of the tangent space of the manifold M at p. (Recall
that we shall attempt to refer to the tangent space as a subspace, whereas the tangent plane
is obtained by translating it to pass through the point p.)

Definition If the manifold M is presented in the three respective forms above, then
its tangent space at p, denoted TPM, is defined as follows.

1. Assuming M is locally the graph of f with p = , then TPM is the graph of


L J
Df(a).
2. Assuming M is locally a level set of F, then TPM = N([Z>F(p)]).
3. Assuming M is locally parametrized by g with p = g(a), then TPM is the image
of the linear map Z)g(a): R* -> R".

Once again, we need to check that these three recipes all give the same ^-dimensional
subspace of Rn. The ideas involved in this check have all emerged already in the preceding
chapters. Since (1) is a special case of (3) (why?), we need only check that N([DF(p)]) =
image(Dg(a)). Note that both of these are ^-dimensional subspaces because of our rank
conditions on F and g. So it suffices to show that image(Dg(a)) c N([£>F(p)]). But this is
easy: The function F°g: U -> R""* is identically 0, so, by the chain rule, DF(p)°Z>g(a) =
O, which says precisely that any vector in the image of Dg(a) is in the kernel of DF(p).

► EXERCISES 6.3

*1. Show that the set X — is not a 1-dimensional manifold, even though the func­

t3
tion g(t) = gives a C1 “parametrization” of it. What’s going on?
l'3l
3 Manifolds Revisited 265

cos 2t cos t
2. Show that the parametric curve g(t) = , t € (—t t /2, t t /4), is not a 1-dimensional
cos 2t sin t
manifold. (Hint: Stare at Figure 3.2.)

3. Consider the following union of parallel lines:


x
X= y = q for some q e Q ■ C R2.
LyJ
Is X a 1-dimensional manifold? (Here Q denotes the set of rational numbers.)
4. Is the union of the hyperbola xy = 1 and its asymptote y = 0 a 1-dimensional manifold? Give
your reasoning.
5. Show the equivalence of the three definitions for each of the following 1-dimensional manifolds:
a
*(a) parametric curve
t*

cosf
(b) parametric curve
3sint
(c) implicit curve x2 + y2 — 1, x2 + y2 + z2 — 2x
(d) implicit curve x2 + y2 = 1, z2 -I- w2 = 1, xz + y w = 0
+ u2
6. Suppose g: R -> R3 is given by g(») = u2 . Let M be the image of g.
u3
(a) Show that g is globally one-to-one.
(b) Following the proof given of Theorem 3.1, find a neighborhood W of 0 e R3 and F: W -> R2
so that M n W = F-1(0).
7. Show the equivalence of the three definitions for each of the following 2-dimensional manifolds:
(a) implicit surface x2 + y2 = 1 (in R3)
(b) implicit surface x2 + y2 = z2 (in R3 - {0})
wcosv
*(c) parametric surface usinv , u > 0, v e R
V
sin u cos v
(d) parametric surface sin u sin v
cosu
266 ► Chapter 6. Solving Nonlinear Problems

sin u cos v
(e) parametric surface sin u sin v , 0 < U < 7T, 0 < v < 2t t
2c o sh

(3 + 2 coS m ) cosv
(f) parametric surface (3 + 2 c o s h ) sin v , 0 < u, v < 2j t
2sinu

8. (a)

: (x2 + y2 + z2)2 — 10(x2 + y2) + 6z2 + 9 = 0

is a 2-manifold.

(b) Check that y e X 4=> (y/x2 + y2 — 2)2 + z2 = 1. Use this to sketch X.


z
9. At what points is

(x2 + y2 + z2)2 - 4(x2 + y2) = 0

a smooth surface? Proof? Give the equation of its tangent space at such a point.
10. Prove that the equations
x2 + x2 + x2 + x2 =4 and xix2 + X3X4 = 0

define a smooth surface in R4. Give a basis for its tangent space at

11. Prove (1)==>(2) in Theorem 3.1 directly.


x
12. Writing 6 R3 x R3, show that the equations
y
Wl2 = llyll2 = 1 and xy = 0
define a 3-dimensional manifold in R6. Give a geometric interpretation of this manifold.
13. Recall from Exercise 1.4.34 that an n x n matrix A is called orthogonal if ATA = I.
(a) Prove that the set O(n) of n x n orthogonal matrices forms a -^-dimensional manifold in
Mnxn = R”2 ■ (Hint: Consider F: Mnxn -* {symmetric n x 'n matrices} = R"(n+i)/2 defined by
F(A) = ATA - I. Use Exercise 3.1.13.)
(b) Show that the tangent space of O(n) at I is the set of skew-symmetric n x n matrices.
FA "I ,
14. Prove (3)==>(1) in Theorem 3.1 directly. (Hint: Suppose g(0) = p and Dg(0) = — , where
B
gi U —> R* x R"~* and observe that gi has a local
A is an invertible k x k matrix Write g =
g2
inverse. What about the general case?)
CH ER

7
INTEGRATION
We turn now to the integral, with which, intuitively, we chop a large problem into small,
understandable bits and add them up, then proceeding to a limit in some fashion. We
start with the definition and then proceed to the computation, which is, once again, based
on reducing the problem to several one-variable calculus problems. We then learn how
to exploit symmetry by using different coordinate systems and tackle various standard
physical applications (e.g., center of mass, moment of inertia, and gravitational attraction).
The discussion of determinants, initiated in Chapter 1, culminates here with a complete
treatment and their role in integration and the change of variables theorem.

> 1 MULTIPLE INTEGRALS


In single-variable calculus the integral is motivated by the problem of finding the area under
a curve y = f(x) over an interval [a, b}. Now we want to find the volume of the region

in R3 lying under the graph z = f and over the rectangle R = [a, b] x [c, d] in the

xy-plane. Once we see how partitions, upper and lower sums, and the integral are defined
for rectangles in R2, then it is simple (although notationally discomforting) to generalize to
higher dimensions.

Definition Let/? = [a,b] x [c, d]bea.rectanglem7R.2. Let/: R —> R be a bounded


function. Given partitions = {a — x q < < ♦ • • < xk — b} and = {c = yo < yi <
■ ■■< yf = d} of [a, b\ and [c, d], respectively, denote by T — CPi x CP2 the partition of the
rectangle R into subrectangles

Ry = [x{-i, x^ x [yj-i, yj, 1 < i < k, 1 < j <£.

Let Mu = sup /(x) and mu = inf /(x), as indicated in Figure 1.1. Define the upper
xeRij
sum of / with respect to the partition ?,

U(/,5’) = ^Myarea(Xu),
ij
and the analogous lower sum

L(f,7) = Y, mij&e&iRij).
i>j

267
268 Chapter 7. Integration

Figure 1.1

We say f is integrable on R if there is a unique number I satisfying

L(f, P) < I < U(f, P) for all partitions O’.

In that event, we denote I = / fdA, called the integral of f over R.


Jr

(Note that the inequality L(f, V) <U(f, IP) is obvious, as < Mij for all i and j.)

► EXAMPLE 1

Let f be a constant function, viz., /(x) = a for all x € R. Then for any partition P of R we have
L(f, P) = aarea(R) = U (f, P), so f is integrable on R and / fdA — aarea(R).
Jr

In higher dimensions, we proceed analogously, but the notation is horrendous. Let


R = [ai, bj x [a2, b2]x • • ■ x [an, bn] C Rn be a rectangle in Rn. We obtain a partition
of R by dividing each of the intervals into subintervals,

= xi.o < *1,1 < • • • < *1,^ = bi,


«2 = *2,0 < *2,1 < • ’ * < *2,Jt2 = ^2,

an = *n,0 < *n,l < ‘ < *M„ = bn,

and in such a way forming a “paving” of R by subrectangles

Rjiji—jn = [*l,Ji—1’*l,Ji] x t*2,j2-l’ *2,72! X ••• X [*n,jB—1, *n,jnl

for some 1 < js <ks,s = 1,..., n.

We will usually suppress all the subscripts and just refer to the partition as {R,}. We define
the volume of a rectangle R = [ai, hi] x [a2, b2] x • • • x [a„, bn] C R" to be

vol(R) = (bi - ai)(h2 - a2) • • • (bn - an).


1 Multiple Integrals 269

Then upper sums, lower sums, and the integral are defined as before, substituting volume
(of a rectangle in R") for area (of a rectangle in R2). In dimensions n > 3, we denote the
integral by / fdV.
Jr
We need some criteria to detect integrability of functions. Then we will find soon that
we can evaluate integrals by reverting to our techniques from one-variable calculus.

Definition Let 3s and IP' be partitions of a given rectangle R. We say 3^ is a refinement


of 3’ if for every rectangle Q e IP' there is a rectangle Q g 7 so that Q' c Q. (See Fig­
ure 1.2.)

partition IP of the rectangle R

Figure 1.2

Lemma 1.1 Let 3* and 7' be partitions of a given rectangle R, and suppose (P is a
refinement of 7'. Suppose f is a boundedfunction on R. Then we have

L(J, 3y) < L(/, 3>) < U(f, 3>) < U(f, 3^).

Proof It suffices to check the following: Let Q be a single rectangle, and let Q =
{Qi, • • •» Qr} be a partition of Q. Let m = infx€e /(x), - infxefii /(x), M =
supxeG /(x), and Mi = supxeG. f (x). Then we claim that
r r
m area (2) < y^ffliarea(<2i) < Aftarea(Q,) < Afarea(<2).
i=l i=l

This is immediate from the fact that m < m, < Mi< < M for all i = 1,..., r. ■

Corollary 1.2 If 7' and 7" are two partitions of R, we have L(f, 3y) < U (f, 3)Z/).

Proof Let ? be the partition of R formed by taking the union of the respective
partitions in each coordinate, as indicated in Figure 1.3. 3> is called the common refinement
of 3y and 3’". Then by Lemma 1.1, we have

L(/, JP') < L(/, 3>) < U(f, 3>) < U(f, 3>"),

as required. ■
270 ► Chapter 7. Integration

■■ ! ■■■:■

partition IP' of the rectangle R partition IP" of the rectangle R

common refinement of partitions IP' and IP"

Figure 1.3

Proposition 1.3 (Convenient Criterion) Given a bounded function on a rectangle


R, f is integrable on R if and only if, for any e > 0, there is a partition CP of R so that
U(f,?)-L(f,S>)<£-

Proof 4=: Suppose there were two different numbers Ii and h satisfying L (J, IP) <
Ij < U(/, IP) for all partitions IP. Choosing £ = IZj — Al yields a contradiction.
=>: Now suppose f is integrable, so that there is a unique number I satisfying
L(f, IP) < I <U(f, IP) for all partitions IP. Given £ > 0, we can find partitions IP7 and IP"
so that

I - L(f, IP') < 8/2 and U(J, IP") - I < s/2.

(If we could not get as close as desired to I with upper and lower sums, we would violate
uniqueness of 7.) Let IP be the common refinement of IP' and IP". Then

L{f, IP') < L(/, IP) < U(f, IP) < U(f, IP"),

so

U{f, IP) - £(/, IP) < U(f, IP") - L(/, IP7) < e,

as required. ■

We need to be aware of the basic properties of the integral (which we leave to the reader
as exercises).
1 Multiple Integrals 271

Proposition 1.4 Suppose f and g are integrable Junctions on R. Then f + g is


integrable on R and we have

f (f + 8)dV — [ fdV+ [ gdV.


R Jr Jr

Proof See Exercise 9. ■

Proposition 1.5 Suppose f is an integrable Junction on R and a is a scalar. Then


af is integrable on R and we have
f (af)dV =a [ fdV.
r Jr

Proof See Exercise 9. ■

Proposition 1.6 Suppose R = R'U R" is the union of two subrectangles. Then f is
integrable on R if and only iff is integrable on both R' and R", in which case we have
( fdV= f fdV+ f fdV.
R Jr 1 . Jr "

Proof See Exercise 9. ■

Proposition 1.7 Let R c R" be a rectangle. Suppose f: R -> R is continuous.


Then f is integrable.

Proof Given e > 0, we must find a partition ? of R so that U(f, ‘P) — L(f, T) < e.
Since f is continuous on the compact set R, it follows from Theorem 1.4 of Chapter 5 that f
is uniformly continuous. That means that given any e > 0, there is 8 > 0 so that whenever
g
||x — y || < 3, x, y e R, we have |/(x) - /(y)| < ——-. Partition R into subrectangles
vol(/c)
Ri,i = 1,... ,k, of diameter less than 5 (e.g., whose sidelengths are less than 8/y/n). Then
g
on any such subrectangle Rif we will have — mt <---------- , and so
vol(2?)
k
U(f, ?) - £(/, ?) = V(H - m.OvolW) < —i—vol(fi) = e,
vol(R)

as needed. ■

Definition We say X C R” has (n-dimensional) volume zero if for every e > 0, there
s
are finitely many rectangles ,..., Rs so that X C Ri U ■ • • U Rs and £ vol(l?i) < s.
j=i

Proposition 1.8 Suppose f: R -> R is a boundedfunction and the set X = {x. e R :


f is not continuous at x} has volume zero. Then f is integrable on R.
272 ► Chapter 7. Integration

Proof Let e > 0 be given. We must find a partition ? of R so that U (f, !P) —
L(f, IP) < e. Since f is bounded, there is a real number M so that |/| < M. Because
X has volume zero, we can find finitely many rectangles RJ, ..., Rfs, as shown in Figure

1.4, that cover X and satisfy vol(2f<) < e/4M. We can also ensure that no point of

X is a frontier point of the union of these rectangles (see Exercise 2.2.8). Now create a
partition of R in such a way that each of R'j, j = 1,..., s, will be a union of subrectangles

of this partition, as shown in Figure 1.5. Consider the closure Y of R — U it* too,
;=i
is compact, and f is continuous on Y, hence uniformly continuous. Proceeding as in the
proof of Proposition 1.7, we can refine the partition to obtain a partition IP = {7?i,...,
of R with the property that
s
L
RiCY

But we already know that


__ g £
V (Mi - ml)vol(/?/) < (2Af)— =
2
«tcU*f

Therefore, U(f, ?) - L(f, ?) < s, as required. ■

If we want to integrate over a nonrectangular bounded set Q, we pick a rectangle R


with Q C X. Given a bounded function f: Q -> R, define

f.R-+ R

10, otherwise

We then define f to be integrable when f is, and set

f fdV= [ fdV.
Jq Jr
(We leave it to the reader to check in Exercise 8 that this is well defined.)
1 Multiple Integrals 273

Definition We say a subset £2 c Rn is a region if it is the closure of a bounded open


subset of R" and its frontier, i.e., the set of its frontier points, has volume 0.

Remark Note, first of all, that any region is compact.


As we ask the reader to check in Exercise 12, if m < n, X c Rw is compact, and
0: X -> R" is 61, then $(X) has volume 0 in R". So, any time the frontier of £2 is a finite
union of such sets, it has volume 0 and £2 is a region.

Corollary 1.9 If £2 C R” is a region and f: £2 -> R is continuous, then f is inte­


grable on £2.

Proof Recall that to integrate f over £2 we must integrate /, as defined above, over
some rectangle R containing £2. The function f is continuous on all of R except for the
frontier of £2, which is a set of volume zero. ■

Corollary 1.10 If £2 c Rrt is a region, then vol(£2) = / id V is well defined.


Ja

Proof The constant function 1 is continuous on £2. ■

The following result is often quite useful.

Proposition 1.11 Suppose f and g are integrable junctions on the region £2 and
f <g. Then

f fdV< [ gdV.
Jn Jn

Proof Let R be a rectangle containing £2 and let / and g be the functions as defined
above. Then we have f < g everywhere on R. Then, applying Propositions 1.4 and 1.5,
the function h = g — f is integrable and / hdV — I gdV — I fdV. On the other
Jr Jq Jsi
hand, since h > 0, for any partition O’ of R, the lower sum L(h, O’) > 0, and therefore
/ h d V > 0. The desired result now follows immediately. ■
Jr

► EXERCISES 7.1
Ix\ Io 0 ** v < ~
*1. Suppose / I | = l’ ~ ” 2 • Prove that f is integrable on 2? = [0,1] x [0,1] and find
f fdA. V/ I1'
Jr
274 ► Chapter 7. Integration

2. Show directly that the function

1, x=y
f
0, otherwise

is integrable on R = [0,1] x [0,1] and find / fdA. (Hint: Partition R into 1/V by 1/N squares.)
Jr
3. Show directly that the function

1, y <x
f
0, otherwise

is integrable on R = [0,1] x [0,1] and find / fdA.


Jr
4. Show directly that the function

X ~ 2’ 3’ 4’ 5’

otherwise

is integrable on R = [0,1] x [0,1] and find / fdA.


Jr
tt5. (a) Suppose f: R -> R is nonnegative, continuous, and positive at some point of R. Prove that
f fdV > 0.
Jr
(b) Give an example to show the result of part a is false if we remove the hypothesis of continuity.
6. Let Q C R" be a region and suppose f: Q -> R is continuous.
(a) If m and M are, respectively, the minimum and maximum values of f, prove that
mvol(Q) < [ fdV < Afvol(S2).
Jq
(Hint: Use Proposition 1.11.)
(b) (Mean Value Theorem for Integrals) Suppose Q is connected (this means that any pair of
points in Q can be joined by a path in £2). Prove that there is a point c e fl so that
f fdV = /(c) vol (£2).
Ja
(Hint: Apply the Intermediate Value Theorem.)
“7. Suppose f is continuous at a and integrable on a neighborhood of a. Prove that
lim ——---- - [ fdV = f(a).
volB(a, e) JB(ai8) 7

*8. Check that / fdV is well defined. That is, if R and R' are two rectangles containing Q and f
Jo
and f are the corresponding functions, check that f is integrable over R if and only if f' is integrable
over R' and that f fdV = I f'dV.
Jr Jr '
9. (a) Prove Proposition 1.4. (Hint: If ? = {2?(} is a partition and m{, mf, m{+8, M{, Mf, M?+8
denote the obvious, show that
m{ + m8 < m{+8 < M?+g < M{ + Mf.
1 Multiple Integrals 275

It will also be helpful to see that / fdV + / gdV is the unique number between L(f, O’) + L(g, O’)
Jr Jr
and U (f, O’) + U (g, O’) for all partitions O’.)
(b) Prove Proposition 1.5.
(c) Prove Proposition 1.6.
810. Suppose/is integrable on??. Givens > 0, prove there is 8 > Oso that whenever all the rectangles
of a partition O’ have diameter less than 8, we have U (f, O’) — L{f,T) < e. (Hint: By Proposition 1.3,
there is a partition O’' (as indicated by the darker lines in Figure 1.6) so that U (f, 0^) — L(f, O’7) < e/2.
Show that covering the dividing hyperplanes (of total area A) of the partition by rectangles of diameter
< 6 requires at most volume A8/^/n. If |/| < M, then we can pick 8 so that that total volume is at
most s/4Af. Show that this 3 works.)

Figure 1.6
811. Let X c R" be a set of volume 0.
(a) Show that for every e > 0, there are finitely many cubes Ci,.... Cr so that X c Ci U • • • U Cr
and 52 vol(Ci) < £. (Hint: If R is a rectangle with vol(7?) < 8, show that there is a rectangle R'
i=l
containing R with vol(7?') < 8 and whose sidelengths are rational numbers.)
(b) Let T: R” -> R" be a linear map. Prove that T (X) has volume 0 as well. (Hint: Show that there
is a constant k so that for any cube C, the image T (C) is contained in a cube whose volume is at most
k times the volume of C.) Query: What goes wrong with this if T: R” -> Rm and men?
812. Let m < n, let X C Rm be compact, and let U C Rm be an open set containing X. Suppose
0: U -> Rn is C1. Prove 0(X) has volume 0 in R". (Hints: Take X c C, where C is a cube. Show
that if N is sufficiently large and we divide C into Nm subcubes, then X is covered by such cubes
all contained in U,1 and 0(X) will be contained in at most Nm cubes in R”. Argue by continuity of
D<j) that there is a constant k (not depending on N) so that each of these will have volume less than
(k/N)n.)
13. We’ve seen in Proposition 1.8 a sufficient condition for / to be integrable. Show that it isn’t
necessary by considering the famous function /: [0,1] -> R given by
„, K 7, x = 7 in lowest terms
/(x) = 9 9
0, otherwise
(Hint: Why is Q D [0,1] not a set of length zero?)
14. A subset X C Rn has measure zero if, given any e > 0, there is a sequence of rectangles Rit R%,
R3,..., Rk,..., so that
oo oo
x a U Ri and 22vol(/?i) < s.
i=l !=1

(a) Prove that any set of volume 0 has measure 0.


(b) Give an example of a set of measure 0 that does not have volume 0.

1This follows from Exercise 5.1.13.


276 ► Chapter 7. Integration

(c) Prove that if X is compact and has measure 0, then X has volume 0. (Hint: See Exercise 5.1.12.)
(d) Suppose Xi, X2,... is a sequence of sets of measure 0. Prove that Q X( has measure 0.
i=l
15. In this (somewhat challenging) exercise, we discover precisely which bounded functions are
integrable. Let f: R -> R be a bounded function.
(a) Let a e R and 8 > 0. Define
M(f, a, 3) = sup /(x)
x€B(a,3)nB

m(f, a, 3) = inf /(x)

o(f, a) = lim M(f, a, 3) - m(f, a, 3).


«->o+
Prove that o(f, a) makes sense (i.e., the limit exists) and is nonnegative; it is called the oscillation of
f at a. Prove that f is continuous at a if and only if o(f, a) = 0.
(b) For any e > 0, set De = {x € R : o(/, x) > e}, and let D = {x e R : / is discontinuous at a}.
Show that D = Di U D1/2 U D1/3 U • • • and that Ds is a closed set.
(c) Suppose that / is integrable on R. Prove that for any k e N, D\/k has volume 0. Deduce that if
/ is integrable on R, then D has measure 0. (Hint: Use Exercise 14.)
(d) Conversely, prove that if D has measure 0, then / is integrable. (Hints: Choose s > 0 and apply
the convenient criterion. If D has measure 0, then so has Ds, and so it has volume 0 (why?). Create
a partition consisting of rectangles disjoint from De and of rectangles of small total volume that
cover De.)

► 2 ITERATED INTEGRALS AND FUBINI’S THEOREM


In one-variable integral calculus, we learned that we could compute the volume of a solid
region by slicing it by parallel planes and integrating the cross-sectional area. In particular,
given a rectangle R = [a, x [c, d], if we are interested in finding the volume over R and

under the graph z = f , we could slice by planes perpendicular to the x-axis, as shown

in Figure 2.1, obtaining

Figure 2.1
2 Iterated Integrals and Fubini’s Theorem 277

Z (cross-sectional area at x)dx

=L ti fG)dy)dx=LbfcifG)dydx-

x fixed

This expression is called an iterated integral. Perhaps it would be more suggestive to call
it a nested integral. Calculating iterated integrals reverts to one-variable calculus skills
(finding antiderivatives and applying the Fundamental Theorem of Calculus) along with a
healthy dose of neat bookkeeping.

► EXAMPLE 1

■2 r1 n2
/ (l+x2)y + |xy2
'o *--------------- v ....... —-Jy:
x fixed

3A
'0
1 3 25
+ 3 + 4 = 12

► EXAMPLE 2

Let’s investigate an obvious question.

a. We wish to evaluate j f xye*+yl dxdy.

<2
xyex+yldx^dy = j - l)e*J ^jdy (recalling that /xexdx = xex -e*)

i -ii
yey2(e2 + l)dy = |(e2 + l)^2] = 0.

b. Now let’s consider


Z 1 Jy=-1

dydx.

f / xyex+y2dydx = f (xex)(yey2)dy^dx

= 0.
2
More to the point, we should observe that for fixed x, the function (xex)(y«J' ) is an odd
function of y, and hence the integral as y varies from — 1 to 1 must be 0.

We shall prove in a moment that for reasonable functions the iterated integrals in either order are
equal, and so it behooves us to think a minute about symmetry (or about the difficulty of finding an
antiderivative) and choose the more convenient order of integration.
278 ► Chapter 7. Integration

► EXAMPLE 3
Suppose we wish to find the volume of the region lying over the triangle Q c R2 with vertices at
o"| 1 1
,and and bounded above by z = f I I = xy. Then we wish to find the integral of
° ’ 0 1
f over the region Q. By definition, we consider Q as a subset of, say, the square R = [0,1] x [0,1]
and define R by

xy, eQ

0, otherwise

whose graph is sketched in Figure 2.2. Note that for x fixed, f I I = xy when 0 < y < x and is 0

otherwise. So
rx rx
I xydy + / Ody = / xydy.
'O Jx JO
Thus, we have

Figure 2.2

► EXAMPLE 4
Suppose we slice into a cylindrical tree trunk, x2 + y2 < a2, and remove the wedge bounded below
by z = 0 and above by z = y, as depicted in Figure 2.3. What is the volume of the chunk we remove?
2 Iterated Integrals and Fubini’s Theorem 279

We see that the plane z = y lies above the plane z = 0 when y > 0, so we let Q =

: x2 + y2 < a2, y > 0 , as indicated in Figure 2.4, and to obtain the volume we calculate:

[ (a2 — x2)dx = [ (a2 — x2)dx =


2 J-a Jo 3

Figure 2.3

The fact that we can compute volume by using either a multiple integral or an iterated
integral suggests that, at least for “reasonable” functions, we should in general be able to
calculate multiple integrals by computing iterated integrals. The crucial theorem that allows
us to calculate multiple integrals with relative ease is the following

Theorem 2.1 (Fubini’s Theorem, 2-Dimensional Case) Suppose f is integrable on

a rectangle R = [a, b] x [c, d] c R2. Suppose thatfor each x e [a, b], thefunction f

fd (x\
is integrable on [c, d}', i.e., F(x) = J f I J dy exists. Suppose next that the function F

is integrable on [a, b]; i.e.,

F(x)dx dx

exists. Then we have

LfdA=LV' fQdy)dx-
280 ► Chapter 7. Integration

Proof Let? be an arbitrary partition of R into rectangles R^ = [xt_i, x,] x [y7-_ i, yj,

i = l,...,k, 7 = 1,. , t. When G Rij, we have


_y.

™ij < < Mij, and so

- yj-ti dy < Af0(y; - y7-i).

So now when x e [x,_i, xj, we have


whence
7=i

- yj-S)(xi -xi-i)

j =i
£

7=1

Finally, summing over i, we have

Z2 ( K (yj “ ^-1)) (Xl‘ ~Xi-^~/a (Jc ? (y) dy) dX


7 kt

i=i 7=1

But this can be rewritten as


k t ply / pd s \ \ k t
5 / ( / f( I iiy]dx < VVA/yareaCRy)
mj z ; j° w > m j =i
i.e„ ([ ffy‘‘y)dx^U<~f’'n-

Since f is integrable on R, if a number I satisfies

L(f,1?) < I < U(f, ?) for all partitions ? of [a, 3],

then I = I fdA. This completes the proof. ■


Jr

Corollary 2.2 Suppose f is integrable on the rectangle R = [a, b] x [c, d] and the
iterated integrals

and
2 Iterated Integrals and Fubini’s Theorem ◄ 281

both exist. (That is, for each x, the integral f f dy exists and defines a function of

x that is integrable on [a, b]. And, likewise, for each y, the integral f f dx exists

and defines a function of y that is integrable on [c, d].) Then

la ll f C) =I = ll ll f 0

In general, in n dimensions, we have

Theorem 2.3 (Fubini’s Theorem, General Case) Let R c R” be a rectangle, say,

R - [at, bi] x • •• x [an,bnJ.

Suppose f: R —> R is integrable and that, moreover, the integrals


fbn fbn-1 / rbn \
/ f(x)dxn, I (/ f(x)dxn ]dxn-i, ...,
an
rbi / i^bn-1 / pbn \ \
/ ■ • ( / ( / f&>dxn) dx*-l) • ■ 'dxi
J at \Jan-l \JOn / /

all exist. Then the multiple integral and the iterated integral are equal:
r /*bi pbn
/ /(x)dv = I ... f(x)dxn -- dxi.
Jr J ax Jan
(The same is truefor the iterated integral in any order, provided all the intermediate integrals
exist.) In particular, whenever f is continuous on R, then the multiple integral equals any
of the n\ possible iterated integrals.

► EXAMPLES

It is easy to find a function f on the rectangle A = [0,1] x [0,1] that is integrable but whose iterated
integral doesn’t exist. Take

1, x = 0, y e Q
0, otherwise

dy does not exist, but it is easy to see that f is integrable and / f d A = 0. ◄


Jr

► EXAMPLE 6

It is somewhat harder to find a function whose iterated integral exists but that is not integrable. Let

1, y €Q
f
2x, y £Q
282 ► Chapter 7. Integration

Then dx = 1 for every y e [0,1], so the iterated integral dxdy exists and

equals 1. Whether f is integrable on R = [0,1] x [0,1] is more subtle. Probably the easiest way
to see that it is not is this: If it were, by Proposition 1.6, then it would also be integrable on R' =
[0, 5] x [0, 1]. For any partition J* of R\ we have U(f, T) = 5, whereas we can make L(f, O’) as
nl/2 j
Zxdxdy — - as we wish.
We ask the reader to decide in Exercise 4 whether the other iterated integral,

dydx, exists. ◄

► EXAMPLE?

More subtle yet is a nonintegrable function on R = [0,1] x [0,1] both of whose iterated integrals
exist. Define

1, x= and y = * for some m, n, q G N with q prime


0, otherwise

First of all, f is not integrable on R since L(f, T) = 0 and U(f, T) = 1 for every partition O’ of R

(see Exercise 5). Next, we claim that for any x, dy exists and equals 0. When x £ Q,

this is obvious. When x = —, only for finitely many y e [0,1] is f I X ) not equal to 0, and so the
? \y

integral exists. Obviously, then, the iterated integral dydx exists. The same argument

applies when we reverse the order.

► EXAMPLE 8

(Changing the Order of Integration) You are asked to evaluate the iterated integral

sinx , ,
------ dxdy.

It is a classical fact that / ^^dx cannot be evaluated in elementary terms, and so (other than
J x
resorting to numerical integration) we are stymied. To be careful, we define

sinx
------ , x /0
x
1, x=0

Then f is continuous and we recognize (applying Theorem 2.1) that the iterated integral is equal to
the double integral / fdA, where
Jq

Q=
y
2 Iterated Integrals and Fubini’s Theorem -4 283

which is the triangle pictured in Figure 2.5. Once we have a picture of Q, we see that we can equally
well represent it in the form

x
Q=
y

and so, writing the iterated integral in the other order,

sinx , \ ,
------dy Idx
x /
x fixed

dx

r /sinx \ f1 .
I I------ xldx= I sinxax = 1 — cos 1.
0 \ X / Jo

The moral of this story is that, when confronted by an iterated integral that cannot be evaluated in
elementary terms, it doesn’t hurt to change the order of integration and see what happens.

► EXAMPLE 9

Let Q c R3 be the region in the first octant bounded below by the paraboloid z = x2 + y2 and above
by the plane z = 4, shown in Figure 2.6. Evaluate / xdV. It is most natural to integrate first with
Jn
respect to z; notice that the projection of Q onto the xy-plane is the quarter of the disk of radius 2

centered at the origin lying in the first quadrant. For each point in that quarter-disk, z varies
y
from x2 + y2 to 4. Thus, we have

xdzdydx
n

lx

2_1(4_x2)3/2^x

2_ 64
15'
We will revisit this example in Section 3.
284 ► Chapter 7. Integration

Figure 2.6

► EXAMPLE 10

Let Q = {x e Rn : 0 < xn < x„_i < ■ • • < x2 < x^ < 1}. This region is pictured in the case n = 3 in
Figure 2.7. Then
*1
... I dxn ■ • • dx2dx\
Jo
xi fXn-2
... / x„_idxn_i • • -dx2dxi
Jo
*1 fXn-3
I ^_2dxn-2 • • • dx2dxi
'o
1 jxi = -4
'o (n-D! n!

*3

Figure 2.7

► EXERCISES 7.2

1. Evaluate the integrals / fdV for the given function f and rectangle R.
Jr
(a)
/ ] = e* cos y, 7? = [0,1] x [0, y]
2 Iterated Integrals and Fubini’s Theorem ◄ 285

(b) /H = j,« = [l,3]x[2,4)

•(c) f n = . R = [0,1] X [1,3]


\y/ x2 + y

(d) f I xJ1 = (x+y)z, R = [-1,1] x [1,2] x [2, 3]

\z/

2. Interpret each of the following iterated integrals as a double integral / fdA for the appropriate
Jo
region Q, sketch Q, and change the order of integration. (You may assume f is continuous.)
(a) ii:f (j (d> x f t)dxdy

3. Evaluate each of the following iterated integrals. In addition, interpret each as a double integral
/ fdA, sketch the region □ , change the order of integration, and evaluate the alternative iterated
integral.
£ £(x+y>dydx 0» [' (c) /' £177^
’(b)
vu v—a /1—
4. Given the function f in Example 6, does the iterated integral ££ f dydx exist?

5. Check that for the function f defined in Example 7, for every partition IP of R,
I7(/, IP) = 1 and L(/, ?) = 0. (Hint: Show that for every 8 > 0, if l/q < 5, then every interval
of length 8 in [0,1] contains a point of the form k/q.)
6. Let

i x = £ in lowest terms, y e Q
0, otherwise

Decide whether f is integrable on R = [0,1] x [0,1] and whether the iterated integrals

exist.
7. Is there an integrable function on a rectangle neither of whose iterated integrals exists?
8. Evaluate the following iterated integrals:
*(a) f f 7-^-5 dxdy
Jo Jjy 1+ JC3
(b) [ [ ey4dydx
Jo J&
286 ► Chapter 7. Integration

(c) / / eyixdxdy (Be careful: Why does the double integral even exist?)
Jo J^/y
9. Find the volume of the region in the first octant of R3 bounded below by file xy-plane, on the sides
by x = 0 and y = 2x, and above by y2 + z2 = 16.
10. Find the volume of the region in the R3 bounded below by the xy-plane, above by z = y, and on
the sides by y = 4 — x2.
*11. Find the volume of the region in R3 bounded by the cylinders x2 + y2 = 1 and x2 + z2 = 1.
12. Interpret each of the following iterated integrals as a triple integral / fdV for the appropriate
Jn
region Q, sketch Q, and change the order of integration so that the innermost integral is taken with
respect to y. (You may assume f is continuous.)
.2
fx+y I
*(a) f y I dzdydx *i I f Iy dzdydx
'o 0 u

•y x 'x+y
(b)
o I 'o
\z z

(c) y dzdydx
____f
'x2+y2
I
W
*13. Suppose a, b, and c are positive. Find the volume of the tetrahedron bounded by the coordinate
planes and the plane x/a + y/b 4- z/c = 1.
14. Find the volume of the region in R3 bounded by z = 1 — x2, z = x2 ~ l,y + z = 1, and y = 0.
*15. Let □ C R3 be the portion of the cube 0 < x, y, z < 1 lying above the plane y + z = 1 and below
the plane x + y + z = 2. Evaluate / xdV.
Jsi
16. Let

„ (x\ x —y
f\y)~{x + y)2'

Calculate the iterated integrals / / f | I dxdy and / / f | 1 dydx. Explain your results.
Jo Jo J Jo Jo \yJ
17. Let R = [0,1] x [0, 1]. Define /: R -> R by

fci(fc+i)(-e+D
fc+i < x — k’ i+i < y — <
-k2(k + I)2,
o, otherwise

Decide if both iterated integrals exist and if they are equal. Is f integrable on R2 (Hint: To see where
this function came from, calculate / fdA.)

18. (Exploiting Symmetry) Let R c R". Suppose f: R -> R is integrable.


(a) Suppose R is a rectangle that is symmetric about the hyperplane xi = 0; i.e.,
2 Iterated Integrals and Fubini’s Theorem 287

Prove that / fdV = 0.


Jr

(b) Suppose R is a rectangle that is symmetric about the origin; i.e., x e R —x e R, and
suppose f is an odd function, so that /(—x) = -/(x). Prove that / fdV = 0.
Jr
(c) Generalize the results of parts a and b to allow regions other than rectangles.
19. Assume / is C2. Prove Theorem 6.1 of Chapter 3 by applying Fubini’s Theorem. (Hint: Proceed
by contradiction: If the mixed partials are not equal at some point, apply Exercise 2.3.5 to show we
g2 z g2 -C
can find a rectangle on which, say, —— > Exercise 7.1.5 may also be useful.)
dxdy dydx
#20. (Differentiating Under the Integral Sign) Suppose /: [a, b] x [c,d] -> R is continuous and
9/ . fd /x\
— is continuous. Define F(x) = / f I I dy.
°x Jc \yj
(a) Prove that F is continuous. (Hint: You will need to use uniform continuity of f.)
(b) Prove that F is differentiable and that F'(x) = [ ^-\]dy. (Hint: Let 0(r) =
Jc dx \yj
fd 9/ M fx
I T- I and let $(*) = / <Kt)dt. Show that 0 is continuous and that F(x) = 4>(x) 4-
Jc 9x \y) Ja
const.)
f1 yx — 1 f1 y — 1
21. Let F(x) = I - -------dy. Use Exercise 20 to calculate Fz(x) and prove that / - ------ dy =
Jo logy Jo logy
F(l) = log2.
0 \2 f1 g-x2(t2+l)
' e"* dt] andg(x) = / - — —dt.
0 ' Jo t2 +1
(a) Using Exercise 20 as necessary, prove that /z(x) 4- gz(x) = 0 for all x.
f00 2 fN
(b) Prove that f(x) 4-g(x) = jr/4forallx. Deduce that / e~‘ dt — lim / e~*2 dt = Vt t /2.
) Jo N->oo Jq
dfi.
23. Suppose f: [a, x [c, d] -> R is continuous and — is continuous. Suppose g: [a, £>] ->
/x\
(c, d) is differentiable. Let h(x) = / f I I dy. Use the chain rule and Exercise 20 to show that
Jc \yj

fc-(x)=f”^epy + /( x L'(x).
Jc 9x \yl U(x)/

(Hint: Consider F
288 ► Chapter 7. Integration

24. Evaluate / -- ----- -. Use Exercise 23 to evaluate


Jo x2 + y2
fx dy fx dy
'(a) / (b) / 7 2 / 2\3
Jo (x2 + y2)2 Jo (x2 + y2y
fX
25. Suppose f is continuous. Let h(x) = I sin(x — y)f(y)dy. Show that A (0) = A'(0) = 0 and
Jo
h"(x) + h(x) = /(x).
26. Suppose f is continuous. Prove that
*l fXn-l i r*

n
I 'I f(xn)dxn ■ • • dx3dx2dx^ = -------— / (x - t)n” f(t)dt.
Jo Jo (n -1)! Jo
(Hint: Start by doing the cases n = 2 and n = 3.)

> 3 POLAR, CYLINDRICAL, AND SPHERICAL COORDINATES


In this section we introduce three extremely useful alternative coordinate systems in two
and three dimensions. We treat the question of changes of variables in multiple integrals
intuitively here, leaving the official proofs for Section 6.
Suppose one wished to calculate j f dA, where S is the annular region between

two concentric circles, as shown in Figure 3.1. As we quickly realize if we try to write
down iterated integrals in xy-coordinates, although it is not impossible to evaluate them, it
is far from a pleasant task. It would be much more sensible to work in a coordinate system
that is built around the radial symmetry. This is the place of polar coordinates.
Polar coordinates on the xy-plane are defined as follows: As shown in Figure 3.2, let

r = y/x2 4- y2 denote the distance of the point from the origin, and let 6 denote the

angle from the positive x-axis to the vector from the origin to the point. Ordinarily, we
adopt the convention that

r > 0 and 0 < 0 < 2t t or — it <0 <it.

Figure 3.1
3 Polar, Cylindrical, and Spherical Coordinates ◄ 289

It is better to express x and y in terms of r and 0, and we do this by means of the mapping

g: [0, oo) x [0,2?r) -> R2

r cos#
g
r sin#

To evaluate a double integral j f Qj dA in polar coordinates, we first determine the

region Q in the rfi’-plane that maps to S. We substitute x = r cos 0 and y = r sin 0, and
then realize that a little rectangle Ar by A0 in the r0-plane maps to an “annular chunk”
whose area is approximately Ar (r A0) in the xy-plane (see Figure 3.3). That is, partitioning
the region Q into little rectangles corresponds to “partitioning” S into such annular pieces.
Summing over all the subrectangles of a partition suggests a formula like

rcos#
rdrdO.
r sin#

A rigorous justification will come in Section 6.

► EXAMPLE 1

Let S be the annular region 1 < x2 + y2 < 2 pictured in Figure 3.1. We wish to evaluate

s 'o

r2drdd

2t t (2a /2-1)
3

If you are not yet convinced, try doing this in Cartesian coordinates!
290 ► Chapter 7. Integration

► EXAMPLE!

Let S' C R2 be the region inside the circle x2 + y2 = 9, below the line y = x, above the x-axis, and
lying to the right of x = 1, as shown in Figure 3.4. Evaluate / xydA. We begin by finding the region
Js
Q in the r0-plane that maps to S, as shown in Figure 3.5. Clearly 0 goes from 0 to t t /4, and for each
fixed 0, we see that r starts at r = sec 0 (as we enter S at the line x = 1) and increases to r = 3 (as we
exit S at the circle). (We think naturally of determining r as a function of 0, so naturally we would
place 0 on the horizontal axis and r on the vertical; for reasons we’ll see in Chapter 8, this is not a
good idea.)
Therefore, we have

S JO Jsec 0 ' v
x y dA

= / / r3 cos6 sin0 dr d0
JO J sec 9
1
= - / (81 — sec4 0) cos 0 sin 0d0
4 Jo

Figure 3.5
3 Polar, Cylindrical, and Spherical Coordinates 291

_ 1 f 1 ( . sm0 \
~ 4 Jo 81 cos 0 sm 0------- — I d\
\ cos3 0 /
= | (81 sin2 0------ I’f/4 79
8 \ cos2 0 Jo = 16

► EXAMPLES
/*OO
We wish to evaluate the improper integral / e~x2dx. This “Gaussian integral” is ubiquitous in
, . .. •'o
probability, statistics, and statistical mechanics. Although one way of doing so was given in Exercise
7.2.22, the approach we take here is more amenable to generalization.
Taking advantage of the property ea+b = eaeb, we exploit radial symmetry by calculating instead
the double integral
00 \ 2 -00 fOO r>
e~x dxj = I / e~x2e~y2dydx = / e~{x2+y2)dA.
J JO Jo J[0,oo)x[0,oo)

Converting to polar coordinates, we have

/ e^x+y2dA = / / e~r2rdrde
J[0,oo)x[0,oo) Jo Jo

= lim / / e~r2rdrd0
Jo Jo

IT
4’

and so our original integral is equal to . *1

Remark We should probably stop to worry for a moment about convergence of these
improper integrals. First of all,

i/•oo 2 /t>N 2
/ e~x dx = lim / e~x dx
Jo N^°° Jo
exists because, for example, when x > 1, we have 0 < e~x2 < e~x, and so the integrals
/•V 2 y>00
/ e~x dx increase as N -+ oo and are all bounded above by 1 + / e~xdx = 1 + e-1.
Jo
Now it is easy to see, as Figure 3.6 suggests, that the integral of e_(* +y2) over the square
[0, N] x [0, N] lies between the integral over the quarter-disk of radius N and the integral
over the quarter-disk of radius N^/l, both of which approach t t /4.

In general, it is good to use polar coordinates when either the form of the integrand or
the shape of the region recommends it.
Next we come to three dimensions. Cylindrical coordinates r, 0, z are merely polar
coordinates (used in the xy-plane) along with the cartesian coordinate z:

g: [0, oo) x [0, 2t t ) x R -> R3


292 ► Chapter 7. Integration

Figure 3.6

r cos#
r sin#
z

The intuitive argument we gave earlier for polar coordinates suggests now that a little
rectangle Ar by A# by Az in r#z-space corresponds to a “chunk” with approximate volume
A V « Ar(r A#) Az, as pictured in Figure 3.7. If g maps the region Q in r#z-space to our
region S C K3, then we expect

rcos#\ rr / r /rcos#\ .
rsin# I rdrdOdz — 11 ( / f I rsin# I dz IrdrdG.
. z J \ z J '

Indeed, as suggested by the last integral above, it is almost always preferable to set up an
iterated integral with dz innermost, and then the usual rdrdO outside (integrating over the
projection of Q onto the xy-plane).

► EXAMPLE 4

Revisiting Example 9 of Section 2, we let S C R3 be the region in the first octant bounded below by
the paraboloid z = x2 + y2 and above by the plane z = 4. To evaluate / xdV by using cylindrical
Js
3 Polar, Cylindrical, and Spherical Coordinates 293

coordinates, we realize that S is the image under g of the region

r
e

CM
Q=

CM
c
VI
VI

VI

VI

VI

VI
k

n
.

>
Thus, we have

r cos 0 rdzdrdO

r2 CQsOdzdrdO

r2 cos 0(4 — r2)dr dO

71/2 64 „ 64
-coswe =

which, reassuringly, is the same answer we obtained earlier.

► EXAMPLES

Let S be the region bounded above by the paraboloid z = 6 — x2 - y2 and below by the cone z =
+ y2f as pictured in Figure 3.8. Find The symmetry of S about the z-axis makes
cylindrical coordinates a natural. The surfaces z = 6 — r2 and z = r intersect when r = 2, so we see
that S is the image under g of the region

: 0 < r < 2, 0 < 0 < 2t t , r < z < 6 — r2

Figure 3.8
294 ► Chapter 7. Integration

Thus, we have
/> f2it z*2 p6—r^
I zdV = I II z rdzdrdO
S Jo Jo Jr '------- *------- '
dV
p2it r2
I I ~((6 —r2)2 — rr)rdrd0
o Jo
r2 92
= 7t I (36 - 13r2 + r4)rdr — —n. ◄
Jo 3

Last, we come to spherical coordinates: p represents the distance from the origin to
the point, 0 the angle from the positive z-axis to the vector from the origin to the point,
and 0 the angle from the positive x-axis to the projection of that vector into the xy-plane.
That is, in some sense, 0 specifies the latitude of the point and 0 specifies its longitude.
(As shown in Figure 3.9, when p and 0 are held constant, we get a circle parallel to the
xy-plane; when p and 0 are held constant, we get a great circle going from the north pole
to the south pole.) Notice that we make the convention that

p > 0, O<0<7r, and O<0<2rr.

As usual, we use basic trigonometry to express x, y, and z in terms of our new coordi­
nates p, 0, and 0 (see also Figure 3.10):

g: [0, oo) x [0, tt] x [0, 2t t ) -> R3

p sin 0 cos 0
g p sin 0 sin 0
peas#

As suggested by Figure 3.11, a rectangle Ap by A0 by A0 in p00-space maps to a spherical


chunk of volume approximately

AV « (Ap)(pA0)(p sin0A0) = p2sin0ApA0A0.

Figure 3.9
3 Polar, Cylindrical, and Spherical Coordinates 295

So, if g maps the region Q to S, we expect that

p sin 0 cos B
p sin 0 sin# p2 smcpdpdcfrdQ.
pcos0

► EXAMPLES

Let S c K3 be the “ice-cream cone” bounded above by the sphere x2 + y2 + z2 = a2 and below by
the cone z = c^/x1 4- y2, where c is a fixed positive constant, as depicted in Figure 3.12. It is easy to
see that the region Q in -space mapping to S is given by

: 0 < p <a, O<^><0o» 0 < 0 < 2jt

where 0O = arctan (1/c).


The volume of S is calculated by using spherical coordinates as follows:

2?T 3/1 X X
—cT(l -cos^o)

Figure 3.12
296 ► Chapter 7. Integration

We can calculate zdV as well:

III (pcos0) p2 sxn^dpdfydQ


s 0 JO J0 ’—*—' '-------------- *-------------- '
z dV
f(pQ fCL
= I I I p3 sin cos fidpdfidO
Jo Jo Jo
A. • 2 j . 4 ( 1 A
= -a4 sin2 = —a4 I ——7 I .
4 4 \ 1 + c2 /

► EXAMPLE?
0
Let S be the sphere of radius a centered at 0 We wish to evaluate z2dV. We observe first

that, by Exercise 1.2.14, the triangle shown in Figure 3.13 is a right triangle, and so the equation of
the sphere is p = la cos 0,0 < 0 < t t /2. So we have
<• z»2/r pzt/2 r>2acos<!>
I z2dV = //I (p2 cos2 0) p2 siiLtfedpd^dd
s Jo Jo Jo '------ «------' '----------- <----------- '
z2 dV
f2ir [‘it/2 a 2a cos
= I I I p4 cos2 (j) sin^dpdt^dG
Jo Jo Jo
64 r/2 8
=—na5 I cos7 0 sin 0d0 = -na5.
5 Jo 5

Figure 3.13
3 Polar, Cylindrical, and Spherical Coordinates ◄ 297

► EXERCISES 7.3

1. Sketch the curves:


(a) r = 4 cos 9 (c) r = 1 - sin 9
(b) r = 3 sec 9 (d) r = l/(cos0 + sin#)
2. Find the area of the region bounded on the left by x = 1 and on the right by x2 4- y2 = 4. Check
your answer with simple geometry.
3. Find the area of the cardioid r = 1 + cos 9, pictured in Figure 3.14.

4. For e > 0, let S£ = {x : e < ||x|| < 1} C R2. Evaluate lim [ —=======:dA. (This is often
e-*o+ Jse ^x1 4- y2
expressed as the improper integral I (x2 4- y2)~1/2dA.)
Jb {0,\)

*5. Let S'be the annular region shown in Figure 3.1. Evaluate / y2dA
Js
(a) directly; (b) by instead calculating f (x2 4- y2)dA.
Js

*6. Calculate / y(x2 4- y2) 5/2 d A, where S is the planar region lying above the x-axis, bounded on
Js
the left by x — 1 and above by x2 4- y2 — 2.

7. Calculate / (x2 4- y2)~3/2dA, where S is the planar region bounded below by y = 1 and above
Js
by x2 4- y2 = 4.

8. Let/ ==. Let S be the planar region lying inside the circle x2 4- y2 = 2x, above

the x-axis, and to the right of x = 1. Evaluate / fdA.


JS
. f1 f1 xex
9. Evaluate / / —-------dxdy.
Jo Jy x2 + y2
10. Find the volume of the region bounded above by z= 2y and below by z = x2 + y2.
11. Find the volume of the “doughnut with no hole,” p= sin <j>,pictured in Figure 3.15.
*12. Sketch and find the volume of the “pillow” p = sin0,O<0<7r.
298 ► Chapter 7. Integration

Figure 3.15
13. Find the volume of the region inside both x2 4- y2 = 1 and x2 4- y2 + z2 = 2.
14. Find the volume of the region inside both x2 4- y2 4- z2 = 4a2 and x2 4- y2 = lay.
15. Find the volume of the region bounded above by x2 4- y2 4- z2 = 2 and below by z = x2 4- y2.
16. Find the volume of the region inside the sphere x2 4- y2 4- z2 = a2 by integrating in
(a) cylindrical coordinates; (b) spherical coordinates.
*17. Find the volume of a right circular cone of base radius a and height h by integrating in
(a) cylindrical coordinates; (b) spherical coordinates.
18. Find the volume of the region lying above the cone z = y/x2 4- y2 and inside the sphere
x2 4- y2 4- z2 = 2 by integrating in
(a) cylindrical coordinates; (b) spherical coordinates.
19. Find the volume of the region lying above the plane z = a and inside the sphere
x2 4- y2 4- z2 = 4a2 by integrating in
(a) cylindrical coordinates; (b) spherical coordinates.

*20. Let 5 c K3 be the unit ball. Use symmetry principles to compute I x2d V as easily as possible.
Js
21. (a) Evaluate [ e~^+y2+z2) dV. (b) Evaluate [ e~(x2+2y2+3z2)dV.
J]R3

*22. Find the volume of the region in R3 bounded above by the plane z = 3x + 4y and below by the
paraboloid z = x2 4- y2.
f z
23. Evaluate / —z----- ------ -r-^dV, where Sis the region bounded below by the sphere x2 4- y2 4-
Js (x 4- y2 4- z2y'2
z2 = 2z and above by the sphere x2 4- y2 4- z2 = 1.
24. Find the volume of the region in R3 bounded by the cylinders x*2 4- y2 = 1, y2 4- z2 = 1, and
x2 4- z2 — 1. (Hint: Make fall use of symmetry.)

► 4 PHYSICAL APPLICATIONS
So far we have focused on area and volume as our interpretation of the multiple inte­
gral. Now we discuss average value and mass (which have both physical and probabilistic
interpretations), center of mass, moment of inertia, and gravitational attraction.
Recall from one-variable calculus the notion of the average value of an integrable
function. Given a real-valued function f on an interval [a, b], we may take the uniform
partition of th® interval into k equal subintervals, j q = a 4- i i = and
4 Physical Applications 299

calculate the average of the values /(xi),..., f(xk):

1 k
y® = *£,/(*)■

Multiplying and dividing by b — a gives

b-a k
i=i
Now let’s suppose that f is bounded. Then, as usual, m,- < f (xt) < Mi for each i —
1,... ,k, and so

7^—Uf, 3>t)
b—a b—a
for every uniform partition 3** of the interval [a, b]. Now assume that f is integrable. Then
it follows from Exercise 7.1.10 that £(/, and U(f, TJ both approach / f(x)dx as
J Cl
k -> oo, and so

1 fb
y(*) _> -------- I f(x)dx a.sk-+ oo.
b-a Ja

This motivates the following

Definition Let f be an integrable function on the interval [a, b]. We define the
average value of f on [a, b] to be

1
I f(x)dx.
b—a a
In general, if £2 c R" is a region and f: Q -> R is integrable, we define the average value
of f on Q to be

1 r
f=
vol(Q) JQ

> EXAMPLE 1

Around hotplate S' is given by the disk r <n/2. Its temperature is given by f

We want to determine the average temperature of the plate. We calculate

7=^)l/dA
by proceeding in polar coordinates:
z»?r/2
/ / (cos r)rdrd0 = 2t t (r sin r + cos r)
s ’o Jo -*°
300 ► Chapter 7. Integration

and so
_ 2?r(f - 1) 4Qr—2)
Z 7r(f)2 n2 « 0.463. ◄

It is useful to define the integral of a vector-valued function f: Q -> component


by component (generalizing what we did in Lemma 5.3 of Chapter 3): Assuming each of
the component fimctions fa,..., fm is integrable on Q, we set

Then we can define the average value of f on Q in the obvious way:

f = —1— f tdV.
vol(Q) Jn

In particular, we define the centroid or center of mass of the region □ to be

x = —[ xdV.
vol(fl) Jq

► EXAMPLE 2
We want to find the centroid of the plane region Q bounded below by y = 0, above by y = x2, and
on the right by x = 1. Its area is given by

area(Q) = f f dydx =
Jo Jo j
Now, integrating the position vector x over Q gives
f ,A f1 [*2 r x i r1 r *3 "| [i/4

Jq Jo Jo [y J Jo [J* J L1/10_

3/4
SO X = , which makes physical sense (see Figure 4.1). ◄
3/10

It is useful to observe that when the region Q is symmetric about an axis, its centroid will
lie on that axis. (See Exercise 7.2.18.)
When a mass distribution Q is nonuniform, it is important to understand the idea of
density. Much like instantaneous velocity (or slope of a curve), which is defined as a limit
of average velocities (or slopes of secant lines), we define the density 5(x) to be the limit
as r -> 0+ of the average density (mass/volume) of a cube of sidelength r centered at x.2
Then it is quite plausible that, with some reasonable assumptions on the behavior of “mass,”
it should be recaptured by integrating the density function.

2More precisely, the average density of that portion of the cube lying in Q.
4 Physical Applications -•< 301

Lemma 4.1 Let Q c R” be a region. Assume the density junction 8: Q —> R is


integrable. Then mass(fi) = / 8dV.
Ja

Proof As usual, it suffices to assume ft is a rectangle R. For any partition 7 —


{Ri} of R, let mi = infxe/?. <5(x) and Mi = supxe7?j 8(x). Then m/vol^) < mass (7?,) <
MiVol(Ri). (Suppose, for example, that
m mass<fr ) _ r*
vol(/?{)
Then, in particular, for all x e Ri, we have 8 (x) < 8*, and so, by the definition of 8, for each
x there is a cube centered at x whose average density is less than 8*. By compactness, we
can cover Ri by finitely many such cubes, and we see that the average density of Rt itself is
less than 8*, which is a contradiction.) It now follows that L(8, T) < mass(7?) < U(8, T)
for any partition T of R, and so, since we’ve assumed 8 is integrable, we must have
mass(7?) = f 8dV. ■
Jr

Remark We should be a little bit careful here. The Fundamental Theorem of Calculus
tells us that we can recover f by differentiating its integral F(x) = J* f(t)dt provided f is
continuous. If we start with an arbitrary integrable function f, e.g., the function in Exercise
7.1.13, this will, of course, not work. Asimilar situation occurs if we start with an integrable
8, define the mass by integrating, and then try to recapture 8 by “differentiating” (taking the
limit of average densities). Since we are concerned here with physical applications, we will
tacitly assume 8 is continuous (see Exercise 7.1.7). In more sophisticated treatments, we
really would like to allow point masses and “generalized functions,” called distributions-,
this will have to wait for a more advanced course.

Now, generalizing our earlier definition of center of mass, if Q is a mass distribution


with density function 8, then we define the center of mass to be the weighted average
x==----- "7m / 5<x)Xfifv-
mass(Q) J&
302 ► Chapter 7. Integration

This is a natural generalization of the weighted average we see with a system of finitely
many point masses mi,..., at positions Xi,..., respectively, as shown in Figure
4.2. In this case, the weighted average is
N
E w«xi

Em<
i=l
and it has the following physical interpretation. If external forces F, act on the point masses
m,-, they impart accelerations x" according to Newton’s second law: F< = m/X-'. Consider
N N
the resultant force F = E acting on the total mass m — E mi (any internal forces cancel
i=l i=l
ultimately by Newton’s third law). Then
N N
f =22 = 52 ~ m*"-
i=l i=l

That is, as the forces act and time passes, the center of mass of the system translates exactly
as if we concentrated the total mass m at x and let the resultant force F act there.
Next, let’s consider a rigid body3 consisting of point masses m i,..., m N rotating about
an axis £; a typical such mass is pictured in Figure 4.3. The kinetic energy of the system is

V 1 j N
k .e .=22 = - 22m’^ft>)2’
Z=1 Z =l

where a> is the angular speed with which the body is rotating about the axis and r, is
the distance from the axis of rotation to the point mass m,. (Remember that each mass is

3A rigid body does not move relative to itself; imagine the masses connected to one another by inflexible rods.
4 Physical Applications -4 303

moving in a circle whose center lies on £.) Regrouping, we get


N
K.E. = 11 a)2, where I = ^mir2.
i=l

I is called the moment of inertia of the rigid body about t.


In the case of a mass distribution O forming a rigid body, we define by analogy (par­
titioning it and approximating it by a finite number of masses) its moment of inertia about
an axis £ to be
I = f 8r2dV,
Jq
where r is the distance from £.

► EXAMPLE 3

Let’s find the moment of inertia of a uniform solid ball Q of radius a about an axis through its center.
We may as well place the ball with its center at the origin and let the axis be the z-axis. Then, using
spherical coordinates, we have (since 8 is constant)
f pa
1= I 8r2dV = 8 III (psin0)2 p2 sin cfrdpdfidd
Jq Jq Jo Jo "---- - ........ * 1 '
dV
pit pa
— 2n8 I I p4 sin3 (f>dpd<f>
Jo Jo
a5 4 /4 3 \ 2 2 2 2
= 2t c 8 • —- • - = I -Tta 8 I -a = -ma ,
5 3 \3 J 5 5

where m = is the total mass of Q.

► EXAMPLE 4

One of the classic applications of the moment of inertia is to decide which rolling object wins the race
down a ramp. Given a hula hoop, a wooden nickel, a hollow ball, a solid ball, or something more
imaginative like a solid cone, as pictured in Figure 4.4, which one gets to the bottom first?

Figure 4.4
304 ► Chapter 7. Integration

We use the basic result from physics (see the remark on p. 352 and Example 6 of Chapter 8,
Section 3) that, if we ignore friction, total energy—potential plus kinetic—is conserved.4 We measure
potential energy relative to ground level, so a mass m has potential energy mgh at (relatively small)
heights h. If the rolling radius is a, its angular speed is co, and its linear speed is v, then we have
aco = v, so when the mass has descended a vertical height h, we have

original (potential) energy = final (kinetic) energy


, 1 2 1 , /V\2 1 /. I \ ,
mgh = -mv 4- -I ( - I = -m I 1 4------ r I v .
2^_^ 2 , 2 V m°2'
translational K.E. rotational K..E.

Thus, the object’s speed is greatest when the fraction I/ma2 is smallest. We calculated in Example 3
that this fraction is 2/5 for a solid ball. For a hula-hoop of radius a or for a hollow cylinder of radius
a, it is obviously 1 (why?). So the solid ball beats the hula-hoop or hollow cylinder. What about the
other shapes? (See Exercises 16,17, and 19.) And is there an optimal shape?

Newton’s law of gravitation applies to point masses: The force F exerted by a mass m
at position x on a test mass (which we take to have mass 1 unit) at the origin is given by

Thus, the gravitational force exerted by a collection of masses i — 1,..., N, at positions


X,- on the test mass is given by
N N

and, thus, the gravitational force exerted by a continuous mass distribution £2 with density
function 8 is

► EXAMPLES

Find the gravitational attraction on a unit mass at the origin of the uniform region £2 bounded above
by the sphere x2 + y2 + z2 = 2a2 and below by the paraboloid az = x2 4- y2, pictured in Figure 4.5.
(Take 3 = 1.)
Since £2 is symmetric about the z-axis, the net force will lie entirely in the z-direction, so we
calculate only the e3-component of F. Working in cylindrical coordinates, we see that £2 lies over the
disk of radius a centered at the origin in the xy-plane, and so

f z
F3 = G I - ■ T—- 2 T~2<3/2
J a (x + y2 4- z2)3/2
/•2ir z»a -
=G / / -rtZzfifr^
J0 Jo Jr*/a (r2+Z2y!2

4Of course, for the objects to roll, there must be some friction.
4 Physical Applications ◄ 305

= 2nG f0 (\V«2a+ r2 ~~z ) dr = 2nGa (log(l + V2) —


Jo av2/ \

We leave it to the reader to set the problem up in spherical coordinates (see Exercise 24).

► EXAMPLE 6

Newton wanted to understand the gravitational attraction of the earth, which he took to be a uniform
ball. Most of us are taught nowadays that the gravitational attraction of the earth on a point mass
outside the earth is that of a point mass M concentrated at the center of the earth. But what happens
if the point mass is inside the earth? We put the earth (a ball of radius R) with its center at the origin
"o'!
and the point mass at o , b > 0, as shown in Figure 4.6. By symmetry, the net force will lie in
b
the z-direction, so we compute only that component. If the earth has (constant) density 8, we have

F,= /C —G8cos
—— a dV = ~G8f / b-pcos#
Jo d Jq (b2 + p2 — 2bp cos 0)3/2
fR b — pcosd> „
= -2nG8 / --.. -——------- —-^p2 sin fidfidp.
Jo Jo (b2 + p2 — 2bp cos </>)3/2

(Note that we are going to do the 0 integral first.)


306 !► Chapter 7. Integration

Figure 4.6

Fixing p, let u = b2 4- p2 - 2bp cos 0, du = 2bp sin <pd<j>, so

f b - p^4> 2 . b- p
JQ (b2 + P2 - 2bp COS 0)3/2p Sm 0 J(b_p}2 W3/2 2bdU

n f^)2

2b2lib —Pl 7-^—) + (b + p) - \b - p|l


b + pJ J
_ (2p2/b2, p<b
|0, p>b

Now we do the p integral. In the event that b > R, we have

f 2p2
F3 = -2nG8 / ~dp =
4j t G8 R3 GM
Jo b2 b2 3 b2 ’

where M = 4t c 8R3/3 is the total mass of the earth. On the other hand, if b < R, then, since the
integrand vanishes whenever p > b, we have

2p2 , 4nG8, ,
F3 = —2nG8

which, interestingly, is linear in b. (When b = R, of course, the two answers agree.) Incidentally,
we will be able to rederive these results in a matter of seconds in Section 6 of Chapter 8.

> EXERCISES 7.4


*1. Find the average distance from the origin to the points in the ball B(0, a) C 1R2.
2. Find the average distance from the origin to the points in the ball B(0, a) C X3.
4 Physical Applications *4 307

*3. Find the average distance from a point on the boundary of a ball of radius a in R2 to the points
inside the ball.
*4. Find the average distance from a point on the boundary of a ball of radius a in R3 to the points
inside the ball.
5. Find the average distance from one comer of a square of sidelength a to the points inside the
square.
6. Consider the region Q lying inside the circle x2 4- y2 = 2x, above the x-axis, and to the right of
x = 1, with density <51 | = . Find the center of mass of Q.
\yj x2 + y2
*7. Consider the region Q lying inside the circle x2 + y2 = 2x and outside the circle x2 4- y2 = 1. If

its density function is given by 8 | I = (x2 4- y2)~1/2, find its center of mass.
V7
8. Find the center of mass of a uniform semicircular plate of radius a in R2.
9. Find the center of mass of a uniform solid hemisphere of radius a in R3.
10. Find the center of mass of the uniform region in Exercise 7.3.19.
*11. Find the center of mass of the uniform tetrahedron bounded by the coordinate planes and the
plane x/a 4- y/b 4- z/c = 1.
*12. Find the mass of a solid cylinder of height h and base radius a if its density at x is equal to the
distance from x to the axis of the cylinder. Next find its moment of inertia about the axis.
13. Find the moment of inertia about the z-axis of a solid ball of radius a centered at the origin,
whose density is given by <5(x) = ||x||.
14. Let Q be the region bounded above by x2 4- y2 4- z2 — 4 and below by z = ^x2 +y2. Calculate
the moment of inertia of Q about the z-axis by integrating in both cylindrical and spherical coordinates.
15. Find the moment of inertia about the z-axis of the region of constant density <5=1 bounded
above by the sphere x2 4- y2 4- z2 = 4 and below by the cone z-x/3 = y/x2 4- y2.
*16. Find the moment of inertia about the z-axis of each of the following uniform objects:
(a) a hollow cylindrical can x2 4- y2 = a2, 0 < z < h
(b) the solid cylinder x2 4- y2 < a2, 0 < z <h
(c)the solid cone of base radius a and height h symmetric about the z-axis
Express each of your answers in the form I — kma2 for the appropriate constant k.
17. (a) Let 0 < b < a. Find the moment of inertia Ia,b about the z-axis of the uniform region
b2 < x2 4- y2 4- z2 < a2.
(b) Find lim /a-- .
b-^a~ a3 — b3
(c) Use your answer to part b to show that the moment of inertia of a uniform hollow spherical shell
x2 4- y2 4- z2 = a2 about the z-axis is jma2, where m is its total mass.
18. Let Q c IF be a region. For what value of a e R" is the integral

f h-a||2dV
Jci
minimized? (Cf. Exercise 5.2.14.)
19. Let Q be the uniform solid of revolution obtained by rotating the graph of y = |x|", |x| < ai/n,
about the x-axis, as indicated in Figure 4.7. Let I be the moment of inertia about the x-axis. Show
h _L - 2w + 1
* ma2 2(4n 4- 1)'
308 ► Chapter 7. Integration

Figure 4.7
20. Let denote the uniform solid region described in spherical coordinates by 0 < p < a,
0 < 0 < £.
(a) Find the center of mass of J2e.
(b) Find the limiting position of the center of mass as e -> 0+. Explain your answer.
21. (Pappus’s Theorem) Suppose R c R2 is a plane region (say, that bounded by the graphs of f
and g on the interval [a, b]), and let Q C R3 be obtained by revolving 1? about the x-axis. Prove that
the volume of Q is equal to
vol(Q) = 2ny • area(R).
22. Let Q denote a mass distribution. Denote by I the moment of inertia of JI about a given axis t,
and by Zo the moment of inertia about the axis £0 parallel to t and passing through the center of mass
of Q. Then prove the parallel axis theorem:
I = I0 + mh2,
where m is the total mass of Q and h is the distance between t and fo-
23. Calculate the gravitational attraction of a solid ball of radius R on a unit mass on its boundary if
its density is equal to distance from the center of the ball.
24. Set up Example 5 in spherical coordinates and verify the calculations.
25. Prove or give a counterexample: The gravitational force on a test mass of a body with total mass
M is equal to that of a point mass M located at the center of mass of the body.
26. Show that Newton’s first result in Example 6 still works for a nonuniform earth, as long as the
density 8 is radially symmetric (i.e., is a function of p only). What happens to the second result?
27. Consider the solid region Q bounded by (x2 4- y2 + z2)3/2 = kz (k > 0), with k chosen so that
the volume of Q is equal to the volume of the unit ball.
(a) Find A:.
(b) Taking 3 = 1, find the gravitational attraction of Q on a unit test mass at the origin.

Remark Your answer to part b should be somewhat larger than 4?rG/3, the gravitational
attraction of the unit ball (with 8 = 1) on a unit mass on its boundary. In fact, J2 is the region of
appropriate mass that maximizes the gravitational attraction on a point mass at the origin. Can you
think of any explanation—physical, geometric, or otherwise?

28. A completely uniform forest is in the shape of a plane region Q. The forest service will locate a
helipad somewhere in the forest and, in the event of fire, will dispatch helicopters to fight it. If a fire
is equally likely to start anywhere in the forest, where should the forest service locate the helipad to
minimize fire damage? (Let’s take the simplest model possible: Assume that fire spreads radially at
a constant rate and that the helicopters fly at a constant rate and take off as soon as the fire starts. So
what are we trying to minimize here?)
5 Determinants and n-Dimensional Volume 309

► 5 DETERMINANTS AND n -DIMENSIONAL VOLUME


We now want to complete the discussion of determinants initiated in Section 5 of Chapter
1. We will see soon the relation between such “multilinear” functions and n-dimensional
volume. Indeed, determinants will play a central role in all our remaining work.

Theorem 5.1 For each n > 1, there is exactly one function D: R” x • • • x R" -> R
having the following properties: Ttimes

1. If any pair of the vectors Vi,..., vn are exchanged, D changes sign. That is,

D(V1, = -®(¥i,

for any 1 < i < j <n.

2. For allvi,..., v„ G R” and c G R, we have

T)(cyX,V2, ...,¥„) = D(¥i , CV2, ...,¥„) = •••


= D(¥i , . . . , ¥„-i, C¥„) = CD(¥1, .... ¥„).

3. For any vectors Vi,..., vn and N], we have

Wyi.......... ¥j—i, ¥j + vj, ¥l+b ...,¥„) =


O(¥1, ... , ¥i-i, ¥i, ¥f+i, ...,¥„) + £(¥i , . . . , Vi-1, ¥<, ¥i+i, . .., ¥n).

4. 7f{ei,..., en} is the standard basis for R”, then we have

D(ei, ...,e„) = 1.

Properties (2) and (3) indicate that D is linear as a function of each of its variables
(whence “mw/rilinear”); property (1) indicates that D is “alternating.” Property (4) can be
interpreted as saying that the unit cube should have volume 1.

Definition Given an n x n matrix A with column vectors ai,..., a„ g R", set

det A = D(ai,..., a„).

This is called the determinant of A.

Since most of our work with matrices has centered on row operations, it would perhaps
be more convenient to define the determinant in terms of the rows of A. But it really is
inconsequential for two reasons: First, everything we proved using row operations (and,
correspondingly, left multiplication by elementary matrices) works verbatim for column
operations (and, correspondingly, right multiplication by elementary matrices); second, we
will prove shortly that det AT = det A.
Properties (l)-(3) of D listed in Theorem 5.1 allow us to see the effect of elementary
column operations on the determinant of a matrix. Indeed, Property (1) corresponds to a
column interchange; Property (2) corresponds to multiplying a column by a scalar; and
310 ► Chapter 7. Integration

Property (3) tells us—in combination with Property (1)—that adding a multiple of one
column to another does not change the determinant.

► EXAMPLE 1

We can calculate the determinant of the matrix

"0 0 0 4"
t 0 2 0 0
A—
0 0 10
3 0 0 0

as follows. First we factor out the 3 from the first column to get

0 0 0 4
0 2 0 0
0 0 1 0
3 0 0 0

by Property (2). Repeating this process with the 4 and the 2, we obtain

0 0 0 4 0 0 0 1
3° 2 ° ° =2-4.3° 1 0 0
0 0 10 0 0 1 0
1 0 0 0 1 0 0 0

Now interchanging columns 1 and 4 introduces a factor of —1 by Property (1), and we have

0 0 0 1 1 0 0 0
0 1 0 0 0 1 0 0
det A = 24 = -24 = -24
0 0 1 0 0 0 1 0
1 0 0 0 0 0 0 1

since Property (4) tells us that det L = 1.

To calculate the effect of the third type of column operation—adding a multiple of one
column to another—we need the following observation.

Lemma 5.2 If two columns of a matrix A are equal, then det A = 0.

Proof If a{ = a;, then the matrix is unchanged when we switch columns i and j. On
the other hand, by Property (1), its determinant changes sign when we do so. That is, we
have det A = — det A. This can happen only when det A = 0. ■

Now we can easily prove the

Proposition 5.3 Let A be an n xn matrix and let A' be the matrix obtained by adding
a multiple of one column of A to another. Then det A' = det A.
5 Determinants and n-Dimensional Volume 311

Proof Suppose A1 is obtained from A by replacing the i* column by its sum with
c times the column; i.e., a' = a, 4- cay, with i j. (As a notational convenience, we
assume i < j, but that really is inconsequential.) We wish to show that

det A' = D(ai,..., af_i, a, -I- cay, af+i,..., ay,..., an)


- 2)(ai,..., a?_i, a,, al+i,..., ay,..., an) = det A.

By Property (3), we have

det A' = D(ai,..., ai-i, a, 4- cay, ai+i,..., ay,..., an)


= ®(ai,..., a,_i, ai( af+i,... ,ay......... an)
2)(ai,..., a,—!, cay, ai-|-i,..., ay,..., an)
= D(ai, ...,af_i,ai,a/+i.......... ay,...,a„)
4- cD(ai,..., a^i,ay, ai+i,..., ay,..., an)
- D(ai,..., a,_i, Ai, al+i,..., ay,..., an)

since D(ai,..., a,_i, ay, al+i......... ay,..., aM) = 0 by the preceding Lemma. ■

► EXAMPLE 2

We now use column operations to calculate the determinant of the matrix

2 2 1
A = 4 1 0
6 0 1

First we exchange columns 1 and 3, and then we proceed to (column) echelon form:

2 2 1 1 2 2 1 0 0 1 0 0
det A = 4 1 0 =: — 0 1 4 = — 0 1 4 = — 0 1 0
6 0 1 1 0 6 1 -2 4 1 —2 12

But

10 0 10 0
0 1 0 = 12 0 1 0
1 -2 12 1 -2 1

and now we can use the pivots to column-reduce to the identity matrix without changing the deter­
minant. Thus,

1 0 0 1 0 0
det A = -12 0 1 0 = -12 0 1 0 = -12. ◄
1 -2 1 0 0 1

This is altogether too brain-twisting. We will now go back to the theory and soon show that
it’s perfectly all right to use row operations. First, let’s summarize what we’ve established
so far: We have
312 ► Chapter 7. Integration

Proposition 5.4 Let A beann xn matrix.

1. Let A' he obtained from A by exchanging two columns. Then det A' = — det A.
2. Let A' be obtained from A by multiplying some column by the number c. Then
det A' — c det A.
3. Let A' be obtained from A by adding a multiple of one column to another. Then
det A' = det A.

Generalizing our discovery in Example 5 of Section 2 of Chapter 4, we have the


following characterization of nonsingular matrices that will be critical both here and in
Chapter 9.

Theorem 5.5 Let Abe a square matrix. Then A is nonsingular if and only if det A
0.

Proof Suppose A is nonsingular. Then its reduced (column) echelon form is the
identity matrix. Turning this upside down, we can start with the identity matrix and perform
a sequence of column operations to obtain A. If we keep track of their effects on the
determinant, we see that we’ve started with det I = 1 and multiplied it by a nonzero number
to obtain det A. That is, det A / 0. Conversely, suppose A is singular. Then its (column)
echelon form U has a column of zeroes and therefore (see Exercise 2) det U = 0. It follows
as in the previous case that det A = 0. ■

Reinterpreting Proposition 5.4, we have

Corollary 5.6 Let E be an elementary matrix and let A be an arbitrary square matrix.
Then

det(AE) = det E det A.

Proof Left to the reader in Exercise 3. ■

Of especial interest is the “product rule” for determinants.

Proposition 5.7 Let A and B ben xn matrices. Then

de,t(A.B) = det A det B.

Proof Suppose B is singular, so that there is some nontrivial linear relation among
its column vectors:

cibj + • • • + cnbn = 0.

Then, multiplying by A on the left, we find that

ci(Abi) 4--------1- cn(Abn) = 0,


5 Determinants and n-Dimensional Volume -*< 313

from which we conclude that there is (the same) nontrivial linear relation among the column
vectorsof AB, and so AB is singular as well. We inferfrom Theorem5.5 that both det B =0
and det AB = 0, and so the result holds in this case.
Now, if B is nonsingular, we know that we can write B as a product of elementary
matrices, viz., B = E\Ei • • ■ Em. We now apply Corollary 5.6 twice: First, we have

det B — det(Z£i £2 • • • Em) = det E\ det E2 • • • det Em det I — det E\ det £2 • • • det Em;

but then we have

det AB = det(A£i£2 • • • Em) = det £1 det £2 • • • det Em det A


= det A (det £1 det £2 • • • det £m) = det A det B,

as claimed. ■

A consequence of this proposition is that det (AB) = det(BA), even though matrix
multiplication is not commutative. Thus, we have

Corollary 5.8 Let E be an elementary matrix and let A be an arbitrary square matrix.
Then

det(£A) = det £ det A.

Now we infer that, analogous to Proposition 5.4, we have

Proposition 5.9 Let Abe an n xn matrix.

1. Let A' be obtainedfrom A by exchanging two rows. Then det A' = — det A.
2. Let A' be obtainedfrom A by multiplying some row by the number c. Then det A' =
c det A.
3. Let A' be obtained from A by adding a multiple of one row to another. Then
det A' — det A.

Another useful observation is the following

Corollary 5.10 If A is nonsingular, then det(A-1) =

Proof From the equation AA-1 = I and Proposition 5.7, we deduce that
det A det A"1 = 1, so det A-1 — 1/detA. ■

Since we’ve seen that row and column operations have the same effect on determinant,
it should not come as a surprise that a matrix and its transpose have the same determinant.

Proposition 5.11 Let A be a square matrix. Then


det At = det A.
314 ► Chapter 7. Integration

Proof Suppose A is singular. Then so is AT (why?). Thus, det AT = 0 = det A, and


so the result holds in this case. Suppose now that A is nonsingular. As in the preceding
proof, we write A = E\E2 • • • Em. Now we have AT = (E\E2 • • • Em)T = £„■■■ E^E],
and so, using the product rule and Exercise 4, we obtain

det At = det(£^) • • • det(Ej) det(£[) = det Ei det E2 • • • det Em = det A. ■

The following result can be useful:

Proposition 5.12 If A is an upper (lower) triangular n xn matrix, then det A =


011^22 • • -^nn-

Proof If an = 0 for some i, then A is singular (why?) and so det A = 0, and the
desired equality holds in this case. Now assume all the an are nonzero. Let A, be the Ith
row vector of A, as usual, and write A/ = anBi, where the Ith entry of B, is 1. Then, using
Property (2) repeatedly, we have det A = an • • • ann det B. Now B is an upper triangular
matrix with l’s on the diagonal, so we can use the pivots to clear out the upper (lower) entries
without changing the determinant, and thus det B = det I = 1. So det A — #11^22 ‘ ‘ ‘
as promised. ■

Remark As we shall prove in Theorem 1.1 of Chapter 9, any two matrices A and A'
representing a linear map T are related by the equation A' = P~l AP for some invertible
matrix P. As a consequence of Proposition 5.7, we have

det A' = det(P-1AP) = det(APP-1) = det Adet(PP-1) = det A,

and so it makes sense to define det T = det A for any matrix representative of T.

We now come to the geometric meaning of det T: It gives the factor by which signed
volume is distorted under the mapping by T. (See Exercise 24 for another approach.)

Proposition 5.13 Let T: R" —> R" be a linear map, and let R be a parallelepiped.
Then vol(T(R)) = | det T|vol(£). Indeed, if Q c R" is a general region, then vol(T(Q))
= IdetTIvoKQ).

Proof When T has rank < n, det T = 0 and the image of T lies in a subspace of
dimension < n; hence, by Exercise 7.1.12, T(R) has volume zero. When T has rank n,
we can write [T] as a product of elementary matrices. Because of Proposition 5.7, it now
suffices to prove the result when [T] is an elementary matrix itself.
Recall that there are three kinds of elementary matrices (see p. 148). When R is a
rectangle, it is clear that the first type does not change volume, and the second multiplies
the volume by |c|; the third (a shear) does not change the volume, for the following reason.
The transformation is the identity in all directions other than the xtxj-plane, and we’ve
already checked that in two dimensions the determinant gives the signed area. (See also
Exercise 24.)
5 Determinants and n-Dimensional Volume 315

Suppose Q is a region. Then we can take a rectangle R containing Q and consider the
function

x g Q
X- R^K X(x) =
otherwise

Since by our definition of region, % is integrable, given e > 0, we can find a partition T of
R so that Z7(%, T) — £(/, ?) < e. That is, the sum of the volumes of those subrectangles
of T that intersect the frontier of Q is less than e. In particular, this means Q contains
a union, Si, of subrectangles of T and is contained in a union, S2, of subrectangles of
T, as shown in Figure 5.1, with the property that vol(S2) - vol(Si) < e. And, likewise,
T(Q) contains a union, T(Si), of parallelepipeds and is contained in a union, T(S2), of
parallelepipeds, with vol(T(Sj)) = |c|vol(S/) or vol(7(5,)) = vol(Sj), depending on the
nature of the elementary matrix. In either event, we see that

|detT|vol(Si) < vol(T(fi)) < | det I|vol(S2),

and since £ > 0 was arbitrary, we are done. (Note that, by Exercise 7.1.11 and Corollary
1.10, T (Q) has a well-defined volume.) ■

5.1 Formulas for the Determinant


In Chapter 1 we had explicit formulas for the determinant of 2 x 2 and 3x3 matrices. It
is sometimes more useful to have a recursive way of calculating the determinant. Given
an n x n matrix A with n > 2, denote by Aq the (n — 1) x (n — 1) matrix obtained by
deleting the zth row and the j* column from A. Define the ijth cofactor of the matrix to be

Cij = (-1)'+; det Ay.

Note that we include the coefficient of ±1 according to the “checkerboard” pattern as


indicated below:
316 ► Chapter 7. Integration

Then we have the following formula, called the expansion in cofactors along the i* row.

Proposition 5.14 Let Abeannxn matrix. Then for any fixed i, we have
n
det A = 52a0-cy.

Using rows here allows us to check that the expression on the right-hand side of this
equation satisfies the properties of a determinant as set forth in Theorem 5.1. However,
using the fact that det AT = det A, we can transpose this result to obtain the expansion in
cofactors along the column.

Proposition 5.15 Let Abeannxn matrix. Then for any fixed j, we have
n
det A =
1=1

Note that when we define the determinant of a 1 x 1 matrix by the obvious rule,

det [a] = a,

Proposition 5.15 yields the familiar formula for the determinant of a 2 x 2 matrix and,
again, that of a 3 x 3 matrix.

► EXAMPLE 3

Let
2 1 3
A= 1 -2 3
0 2 1

We calculate det A by expanding in cofactors along the second row:

1 3 2 3 2 1
detA = (-l)(2+n(l) + (—1)(2+2)(—2) + (-1)(2+3)(3)
2 1 0 I 0 2
= —(!)(—5) + (—2)(2) - (3)(4) = -11.
Of course, because of the 0 entry in the third row, we’d have been smarter to expand in cofactors
along the third row, obtaining

3 2 7 1
det A = (-D^fO) + (—1)(3+2)(2) + (-l)(3+3)(l) J
3 1
= -2(3) + 1(-5) = -11.

Sketch of proof of Proposition 5.15 As we mentioned earlier, we must check that the
expression on the right-hand side has the requisite properties. When we form a new matrix
A' by switching two adjacent columns (say, columns k and k + 1) of A, then whenever j k
5 Determinants and n-Dimensional Volume ◄ 317

and j k + 1, we have a-j = ay and = —cfJ-; on the other hand, when j = ky we have
aik ~ ai,k+i and c’ik = when j — k 4-1, we have jt+1 = aik and cf>k+1 = -cik,
so

n n
^2aijCij = ~~ aijcij >
J=1 >1

as required. We can exchange an arbitrary pair of columns by exchanging an odd number


of adjacent pairs in succession (see Exercise 16), so the general result follows.
The remaining properties are easier to check. If we multiply the £th column by c, then
for j / k, we have a-j = ay and c-7- = ccy, whereas for j = k, we have c'ik = Cjk and
a'ik — caik. Thus,

n n
Y.^1)
>='
= c>='
Y,a‘>c»'
as required. Suppose now that we replace the fc* column by the sum of two column
vectors, viz., a* — ak + aj'. Then for j / k, we have c-j = cy + c-'- and a'j — ay = a"j.
When j = k, we likewise have c'ik = cik = c''k, but a'ik = aik + a”k. So

n n n

j=i ;=i j=i

as required. ■

Proof of Theorem 5.1 Proposition 5.15 establishes the existence of a function D


satisfying the properties listed in the statement of the theorem. On the other hand, as
we saw, calculating determinants by just using the properties, there can only be one such
function because, by reducing the matrix to echelon form by column or row operations, we
are able to compute the determinant. (See also Exercise 21.) ■

Remark It is worth remarking that expansion in cofactors is an important theoretical


tool but a computational nightmare. Even with calculators and computers, to compute an
n x n determinant by expanding in cofactors requires more than n! multiplications5 (and
lots of additions). On the other hand, to compute an n x n determinant by row reducing
the matrix to upper triangular form requires slightly fewer than |n3 multiplications (and
additions). Now, Stirling’s formula tells us that n! grows faster than (n/e)", which gets
large much faster than does n3. Indeed, consider the following table, displaying the number

5In fact, as n gets larger, the number of multiplications is essentially (e — l)n!.


318 ► Chapter 7. Integration

of operations required:

n cofactors row or column operations


2 2 2
3 9 8
4 40 20
5 205 40
6 1236 70
7 8659 112
8 69280 168
9 623529 240
10 6235300 330

Thus, we see that once n > 4, it is sheer folly to calculate a determinant by the cofactor
method (unless almost all the entries of the matrix happen to be 0).

We conclude this section with a few classic formulas. The first is particularly useful for
solving 2x2 systems of equations and may be useful even for larger n if you are interested
only in a certain component x, of the solution vector.

Proposition 5.16 (Cramer’s Rule) Let A be a nonsingular n x n matrix, and let


b G R". Then the Ith coordinate of the vector x solving Ax = b is

det Bi
xi = TTT’
det A
where Bi is the matrix obtained by replacing the Ith column of A by the vector b.

Proof This is amazingly simple. We calculate the determinant of the matrix Bi


obtained by replacing the Ith column of A by b = Ax = xiai 4-------- 1- xnan:

detBj = ai a2 • • Xi8i + • ’ 4* Xn8n ■•• 8„


1 1
1 1 1
= ai a2 • Xj8/ • • an = Xi det A
1
since the multiples of columns other than the i* do not contribute to the determinant. ■

► EXAMPLE 4

We wish to solve

2 3 xi 3
4 7 X2 -1
5 Determinants and n -Dimensional Volume 319

We have

3 3 2 3
Bi = and B2 =
-1 7 4 -1

so det Bi = 24, det B2 = —14, and det A = 2. Therefore, xi = 12 and X2 = —1.

We now deduce from Cramer’s rule an “explicit” formula for the inverse of a non­
singular matrix. Students always seem to want an alternative to Gaussian elimination, but
what follows is practical only for the 2 x 2 case (where it gives us our familiar formula
from Example 5 on p. 154) and—barely—for the 3 x 3 case.

Proposition 5.17 Let Abe a nonsingular matrix, and let C = [cjj be the matrix of
its cofactors. Then

A~l = t -t c T-
det A

Proof We recall from p. 152 that the 7th column vector of A-1 is the solution of
Ax = ej, where e;- is the j* standard basis vector for Rn. Now, Cramer’s rule tells us that
the 1th coordinate of the 7 th column of A"1 is

(A-1),; = detAj/,
det A
where An is the matrix obtained by replacing the 1th column of A by ey. Now, we calculate
det Ajt by expanding in cofactors along the 1th column of the matrix Ajt. Since the only
nonzero entry of that column is the 7 th, and since all its remaining columns are those of the
original matrix A, we find that

det Ajt = (-l)l+j det Ajj = cji,

and this proves the result. ■

For 3x3 matrices, this formula isn’t bad when det A would cause troublesome arith­
metic in doing Gaussian elimination.

► EXAMPLES

Consider the matrix

1 2 1
A= -1 1 2
2 0 3

then

1 2 -1 2 -1 1
det A = (1) -(2) + (D = 15,
0 3 2 3 2 0
320 ► Chapter 7. Integration

and so we suspect the fractions would not be fun if we implemented Gaussian elimination. Undaunted,
we calculate the cofactor matrix:

and so

3 -6 3
A"' = -i-CT = i
7 1 -3 .
det A 15
-2 4 3

Tn general, the determinant of an n x n matrix can be written as the sum of n! terms,


each (±) the product of n entries of the matrix, one from each row and column. This
can be deduced either from the recursive formula of Proposition 5.15 or directly from the
properties of Theorem 5.1.

Definition A permutation a of the numbers 1,..., n is a one-to-one function cr


mapping {1,..., n} to itself. The sign of the permutation or, denoted sign(o-), is 4-1 if an
even number of exchanges is required to change the ordered set {1,..., n} to the ordered
set {<r (1),..., or(n)} and -1 if an odd number of exchanges is required.

Remark It is a consequence of the well-definedness of the determinant, which we’ve


already proved, that the sign of a permutation is itself well defined. One can then define
sign(cr) to be the determinant of the permutation matrix whose ij-entry is 1 when j = cr(i)
and 0 otherwise.

Proposition 5.18 Let Abe ann xn matrix. Then

det A = sign(o’)aCT(i)iacr(2)2 • • • aa(n)n


permutations a of

= 22 sign(<7)aiCT(l)<22ff (2) • • ’ ^na(n) •


permutations a of

n
Proof The 7 th column of A is the vector a, = 52 aueii, and so, by Properties (2) and
i=l
(3), we have

n n n \
(22 ah ie«i * 12ai^ - • • • > 22 ^=1 yI
n
— ^iil^r‘22 ' ‘ > ®iB)>
5 Determinants and n-Dimensional Volume •«! 321

which, by Property (1),'

= ^2 sign(cr)aff(DiOa(2)2 • • • aff(n)n(D(ei,..., e„)


permutations a of

= / Slgn(cr)da(i)iaa(2)2 • • • «CT(n)n >


pennutations er of

by Property (4). To obtain the second equality, we apply Proposition 5.11. I

Recall that GL(n) denotes the set of invertible n x n matrices (which, by Exercise
6.1.6, is an open subset of A4nxn)-

Corollary 5.19 The function f: GL(n) —> GL(n), f(A) = A"1, is smooth.

Proof Proposition 5.18 shows that the determinant of an n xn matrix is a polynomial


expression (of degree ri) in its n23*567entries. Thus, we infer from Proposition 5.17 that each
entry of A-1 is a rational function (quotient of polynomials) of the entries of A. ■

► EXERCISES 7.5
1. Calculate the following determinants:
1 4 1 -3
-1 6 -2
2 10 0 1
(a) 3 4 5 (c)
0 0 2 2
5 2 1
0 0 -2 1

2 -1 0 0 0
1 0 2 0
-1 2 -1 0 0
-1 2 -2 0
*(b) *(d) 0 -1 2 —1 0
0 1 2 6
0 0 -1 2 -1
1 1 3 2
0 0 0 1 2
2. Suppose one column of the matrix A consists only of 0 entries; i.e., a( = 0 for some i. Prove that
det A = 0.
3. Prove Corollary 5.6.
#4. Prove (without using Proposition 5.11) that for any elementary matrix E, we have det ET = det E.
(Hint: Consider each of the three types of elementary matrices.)
5. Let A be an n x n matrix and let c be a scalar. Prove det(cA) = c" det A.
6. Prove that if the entries of a matrix A are integers, then det A is an integer. (Hint: Use Proposition
5.14 and induction or Proposition 5.18.)
7. Given that 1898,3471,7215, and 8164 are all divisible by 13, use only the properties of determi­
nants and the result of Exercise 6 to prove that
18 9 8
3 4 7 1
7 2 15
8 16 4

is divisible by 13.
322 ► Chapter 7. Integration

ai bi ci
8. Let A = ,B — , andC = be points in R2. Show that the signed area of A AB C
ai bi .C2 _
is given by

1
ai bi ci
ai bi ci
2
1 1 1

9. Let A be an n x n matrix. Prove that


_ 1 0 0 "I
0
det = det A.
A
0
What’s the interpretation in terms of (signed) volume?
10. Generalizing Exercise 9, we have the following:
(a) Suppose A e Mk*k> B € Mkxi, and D € Mtxt. Prove that
A B_
det = det A det D.
O D

(b) Suppose now that A, B, and D are as in part a, and C e A4^xjt. Prove that if A is invertible, then
A B
det = det Adet(D — CA-1B).
~D

(c) If we assume, moreover, that k = t and AC — CA, then deduce that


A_ B_
det = det (AD - CB).
C D

(d) Give examples to show that the result of part c needn’t hold when A is singular or when A and
C do not commute.
*11. Suppose A is an orthogonal n xn matrix. (Recall that this means that ATA = In.) Compute
det A.
12. Suppose A is a skew-symmetric n x n matrix. (Recall that this means that AT = —A.) Prove
that when n is odd, det A = 0. Give an example to show this needn’t be true when n is even. (Hint:
Use Exercise 5.)
1 2 1
*13. Let A = 2 3 0
1 4 2
1
(a) If Ax = 2 , use Cramer’s rule to find x2.
-1
(b) Find A-1 by using cofactors.
5 Determinants and n-Dimensional Volume 323

*14. Using cofactors, find the determinant and the inverse of the matrix
-1 2 3
A= 2 1 0
0 2 3

a 15. (a) Suppose A is an n x n matrix with integer entries and det A = ±1. Prove that A-1 has all
integer entries.
(b) Conversely, suppose A and A-1 are both matrices with integer entries. Prove that det A = ±1.
16. Prove that the exchange of any pair of rows (or columns) of a matrix can be accomplished by an
odd number of exchanges of adjacent pairs.
17. Suppose A is an orthogonal n x n matrix. Show that the cofactor matrix C = ±A.
18. Generalizing the result of Proposition 5.17, prove that ACT = (det A)I even if A happens to be
singular. In particular, when A is singular, what can you conclude about the columns of CT?
X\ X2
19. (a) Show that if and are distinct points in R2, then the unique line passing through
J2.
them is given by the equation
1
X2 = 0.
y2

noncollinear points in R3, then the unique plane

1
*3
= 0.

Z3

20. As we saw in Exercises 4.1.22 and 4.1.23, through any three noncollinear points in R2 there
pass a unique parabola6 y = ax2 + bx + c and a unique circle x2 + y2 + ax + by + c = 0. Given
r XI n r X2 n *3
three such points, 5 , and , show that the equation of the parabola and circle are,
_>’l - _3’2 .
-
respectively,
1 1 1 1 1 1 1 1
X xi X2 X3 X *1 X2 X3
=0 and = 0.
X2 X2 x22
X X2 >1
X1 x3 y 3'2 3,3
x2 + y2 *1+3'1 *2+3,2 x 32 + v 32
y yi y2 3>3

21. Using Corollary 5.6, prove that the determinant function is uniquely determined by the properties
listed in Theorem 5A. (Hint: Mimic the proof of Proposition 5.7. It might be^helpful to consider two
functions, det and det, that have these properties and prove that det(A) = det(A) for every square
matrix A.)

6Here we must also assume that no pair of the points lies on a vertical line.
324 ► Chapter 7. Integration

22. Let Vi,..., vfe g R". Show that


V1 • V1 Vi • V2 ■•• Vl • Vjt
V2-V1 V2-V2 ••• V2-V*

▼ *•▼ 1 V* • V2 ... Vfc-Vjt

is the square of the (^-dimensional) volume ofthe ^-dimensional parallelepiped spanned by Vj,..., v*.
(Hints: First take care of the case that {vi,..., v*} is linearly dependent. Now, supposing they
are linearly independent and therefore span a fc-dimensional subspace V, choose an orthonormal
basis {Ujt+i,..., u„) for What is the relation between the fc-dimensional volume of the par­
allelepiped spanned by vb ..., Vjt and the n-dimensional volume of the parallelepiped spanned by
Vl, . . . , Vfc, Ut+1, . . . , Un?)
23. (a) Using Proposition 5.18, prove that D(det)(Z)B = trB = bu -I------- 1- bnn. (See Exercise
1.4.22.)
(b) More generally, show that for any invertible matrix A, D(det)(A)B = det A tr(A-1B).
24. Give an alternative proof of Proposition 5.13 for general parallelepipeds as follows. Let R c R"
be a parallelepiped. Suppose T: R" -> R" is a linear map of either of the forms

Calculate the volume of R and of T(R) by applying Fubini’s Theorem, putting the xi integral inner­
most. (This is in essence a proof of Cavalieri’s principle.)
25. (From the 1994 Putnam Exam) Find the value of m so that the line y = mx bisects the region

g R2: — +y2<l, x > 0, y>0 •.


_y J 4
26. Given any ellipse, show that there are infinitely many inscribed triangles of maximal area.
27. (From the 1994 Putnam Exam) Let A and B be 2 x 2 matrices with integer entries such that A,
A + B, A + 2B,A + 3B, and A + 42? are all invertible matrices whose inverses have integer entries.
Prove that A + 5B is invertible and that its inverse has integer entries. (Hint: Use Exercise 15.)

► 6 CHANGE OF VARIABLES THEOREM


We end this chapter with a general theorem justifying our formulas for integration in polar,
cylindrical, and spherical coordinates. Since we know that the determinant tells us the
factor by which linear maps distort signed volume, and since the derivative gives the best
linear approximation, we expect a change of variables formula to involve the determinant
of the derivative matrix. Giving a rigorous proof is, however, another matter.
Since integration is based upon rectangles rather than balls, it is most convenient to
choose (for this section only) a different norm to measure vectors and linear maps, which,
for obvious reasons, we dub the cubical norm.

Definition If x € Rn, set ||x|jD = max(|xi|, Ixil, • , |xn|). If T: Rn -> Rm is a


linear map, set ||T||n = max ||T(x)||n.
6 Change of Variables Theorem 325

(1 -e)r

(1 + e)r
Figure 6.1

We leave it to the reader to check in Exercise 1 that these are indeed norms and, as will be
crucial for us, that || T (x) ||D < || T ||n ||x j|n for all x g R". Our first result, depicted in Figure
6.1, estimates how much a 61 map can distort a cube.

Lemma 6.1 Let Cr denote the cube in R" of sidelength 2r centered at 0. Suppose
U C R” is an open set containing Cr and $: U —> R” is a C1 function with the property
that 0(0) = 0 and ||D0(x) — Z ||0 < e for all ieCr and some 0 < £ < 1. Then

C(l-e)r C <KCr) C C(l+e)r•

Proof One can check that Proposition 1.3 of Chapter 6 holds when we use the || • ||
norm instead of the usual one (see Exercise If). Then if x e Cr, we have

||0(x)|| < max ||Z>0(y) || ||x|| <(14- £)r,


u ye[0,x] a a

so 0(Cr) C C(i+e)r. The other inclusion can be proved by applying Exercise 6.2.11 in the
|| • ||Q norm. ■

The crucial ingredient in the proof of the Change of Variables Theorem is the following
result, which says that for sufficiently small cubes C, the image g(C) is well approximated
by the image under the derivative at the center of C.

Proposition 6.2 Suppose U C R" is open, g: U -> R” is C1, and Z>g(x) is invertible
for every x eU. Let C CU be a cube with center a, and suppose

||Dg(a)_1oDg(x) - I||D < e < 1 forallxe C.

Then g(C) is a region (and hence has volume) and

(1 - £)n| detDg(a)|vol(C) < vol(g(C)) <(14- e)n| detDg(a)|vol(C).

Proof Since g is C1 with invertible derivative at each point of U, g maps open sets
to open sets and the frontier of g(C) is the image of the frontier of C, hence a set of zero
volume (see Exercise 7.1.12). Therefore, g(C) is a region.
326 ► Chapter 7. Integration

Suppose the sidelength of the cube C is 2r. We apply Lemma 6.1 to the function 0
defined by

<A(x) = Dg(a)-1 (g(x 4- a) - g(a)), x g Cr.

Then $(0) = 0, £>0(0) = I, and D0(x) — Dg(a)-1°Dg(x + a), so, by the hypothesis,
||Zty(x) — 11| < 8 for all x e Cr. Therefore, we have

C(i—e)r C <l>(Cr) C C(i4.e)r,

and so

g(a) 4- Dg(a) C g(C) c g(a) + Dg(a)(C(i+fi)r).

Applying Proposition 5.5.13, using the fact that vol(Car) = anvol(Cr), and remembering
that translation preserves volume, we obtain the result. ■

We begin our onslaught on the Change of Variables Theorem with a very simple case,
whose proof is left to the reader in Exercise 2.

Lemma 6.3 Suppose T: R" -> R" is a linear map whose standard matrix is diagonal
and nonsingular. Let R C R” be a rectangle, and suppose f is integrable on T(R). Then
f°T is integrable on R and

[ fWVy = | det T| [ (foT)(x)dVx.


JTfjt) Jr

Theorem 6.4 (Change of Variables Theorem) Let Q c R” be a region and let U be


an open set containing Q so that g: U -> R" is one-to-one and C1 with invertible derivative
at each point. Suppose f: g(S2) —> R and (/°g) | det Dg|: Q -> R are both integrable.
Then
f f(j)dVy= f (/og)(x)|detDg(x)|JVx.
Jg(f2) Jq

Remark One can strengthen the theorem, in particular by allowing Dg(x) to fail to
be invertible on a set of volume 0. This is important for many applications—e.g., polar,
cylindrical, and spherical coordinates. But we won’t bother justifying it here.

Proof First, we may assume Q is a rectangle R (as usual, by choosing a rectangle


R with Q c R and working with the function f). Next, by applying Lemma 6.3, we may
assume R is a cube. That is, choose a cube C and a linear map T: Rn -> R” so that
T(C) = R. Then, by the chain rule (Theorem 3.2 of Chapter 3) and the product rule for
determinants (Proposition 5.7) and recalling that a linear map T is its own derivative, we
have

det D(goT)(u) = det (Dg(T(u))DT (u)) = det (£>g(T(u))) det T,


6 Change of Variables Theorem «l 327

and so
/ Og)(x)|detDg(x)|dVx = |detT| [ ((/=.g)»T)(u)|detDg(T(u))|dVu
Jr Jc
by the lemma

= / ((/°g)oT)(u)|detD(gor)(u)|dVu
Jc
by the previous comrpent

= / (/«(g»T))(u)|detD(g»T)(u))|dVo.
Jc
Thus, to prove the theorem, we substitute g°T for g and work on the cube C; that is, it
suffices to assume R is a cube.
There are positive constants M and N so that |/| < M (by integrability) and
|] (Dg)-1!^ < N (by continuity and compactness). ChooseO < e < 1. By uniform continu­
ity, Theorem 1.4 of Chapter 5, there is 5i > 0 so that ||Dg(x) — Dg(y)||a < s/N whenever
||x — yII < 5i,x, y e R. Similarly, there is 82 > 0 so that | det Dg(x) — det £>g(y)| < e/M
whenever ||x — y|| < 52, x, y € R. And by integrability of (/og) | det Z)g|, there is 53 > 0
so that whenever the diameter of the cubes of a cubical partition ? is less than 83, we have
U((/°g) I det Dg|, CP) — L((/°g)| det Dg|, T) < s (see Exercise 7.1.10).
Suppose T = {2?i,..., is a partition of R into cubes of diameter less than 5 =
min(5i, 52,53). Let

Mi = sup(/°g)(x);
XGJ?/
mi = inf (/og)(x);

Mi = sup (/og) (x) I det Dg(x) |;


xg/?i

mi = inf (/og)(x)|detDg(x)|.
xe2?,
We claim that if a, is the center of the cube Ri, then

(*) nti - e < mi | det Dg(a,) | and Mi | det Dg(a,) | < Mi + e.

We check the latter: Choose a sequence of points x& e Ri so that (/°g)(x/j -> Mi (and
we assume Mi > 0 and all (/°g)(Xjt) > 0 for convenience). We have | detDg(a;)| <
| det Dg(Xfc)| + e/M and so
(/°g)(x*)|detDg(al-)| < (/°g)(xO|detDg(x Jt)| + (/°g)(xO-^

< (/og)(Xfc)|detDg(x*)| +£< Mi+ 8.

Taking the limit as k -> 00, we conclude that

Mi | det Dg(sii) | < Mt + e,

as required.
On any cube Ri with center a,, we have

||Pg(a,)-1«Dg(x) - Z||D < ||Dg(ai)"1||D||Dg(x) - Z>g(a)||D < N- = e


328 ► Chapter 7. Integration

for all x G Ri. By Proposition 6.2, we have

(1 - £)”| detDg(ai)|vol(2?j) < vol(g(/?()) <(14- £)n| det £>g(a,)|vol(/?,).

Now, f fdV = f fdV, and


4(«) Zi A(«.)

m/volCgC/?,)) < f fdV < MfvoKg(i?i)) for i = 1,..., s.


•W)

Therefore, we have
s . s
(1 - £)" Vmd det PgCaOlvolCfl,) < / fdV < (1 +e)n V Mjdet^aJIvoK^).
Zf J*(R) i=i

Substituting the inequalities (*), we find


S - J
(1 - £)” V(mf - £)vol(^) < / fdV < (1 + £)” V(Mi + £)vol(/?f).
Z? i=i

Now, since 0 < £ < 1, we have


(1 4- £)" < 2”, (1 + £)" — 1 < 2”-1n£ (by the mean value theorem);
(1 — £)” < 1, and 1 — (1 — £)” < ne (by the mean value theorem).

Therefore,
5 p
ymiVol(jRi) - £(vol(Z?) + Afn) < / fdV
Z?
s
< y Mfvol(J?f) 4- £(2nvol(2?) 4- 2”-1 Mn).
1=1

We’ve almost arrived at the end. For convenience, let j8 = 2" vol (7?) 4- 2n~xMn. Recall
that since (/°g)| detDg] is integrable, its integral is the unique number lying between
all its upper and lower sums. Suppose now that / fdV I (/°g)| detDgjtZV. In
J$(R) Jr
particular, suppose / fdV = / (/°g)| det Dg|dV 4- y for some y > 0. Let £ > 0 be
Jg(R) Jr
chosen small enough so that (ft 4-1)£ < y. We have
[ fdV < U((f°g)\det Dg|, ?) 4- fie < f (/°g)| det Dg|dV 4- (fi + 1)£
Jg(R) Jr
< f (/«>g)|detDg|dV + y= [ fdV,
Jr Jg(R)

which is a contradiction. Similarly, supposing that y < 0 leads to a contradiction. Thus,


[ fdV= f (/og)|det£>g|dV,
Jg(R) Jr
as desired. ■
6 Change of Variables Theorem ◄ 329

► EXAMPLE 1

First, to be official, we check that the formulas we derived in a heuristic manner in Section 4 are valid.

r cos 0
Polar coordinates: . Then
r sin 0

COS#
detDg
sin#

Ir r cos#
b. Cylindrical coordinates: gI# rsin# . Then
\z
—r sin# r\
rcos#

0
0
1 ( # I = r.
zl

c. Spherical coordinates: (
Let g 1 <f> =
psin</>cos#
psin^sin# . Then
w pcos0

P sin^cos# pcos^cos# —psin^sin#

(#
sin^sin#

cos</>
pcos</>sin#

—psin0
psin^cos#

and, expanding in cofactors along the third row, we find that

det Dg I </> — cos0(p2sin</>cos0) + psin0(p sin20) = p2sin0. ◄


w

► EXAMPLE!
°1 f3
Let S' c R2 be the parallelogram with vertices as pictured in Figure
0 ’ 1

6.2. Evaluate xdA. Of course, with a bit of patience, we could evaluate this by three different
iterated integrals in cartesian coordinates, but it makes sense to take a linear transformation g that
maps the unit square, R, to the region S; e.g.,

Then, applying the Change of Variables Theorem, we have

I xdAxy = I (3u + v) 5 dAuv


s R ---*--- '
x |detDg|

= 5 f f (3w + v)dvdu = 5 [ (3m + |)Jm = 5 • 2 = 10. ◄


Jo Jo Jo
330 ► Chapter 7. Integration

► EXAMPLE 3

Let S c R2 be the region bounded by the curves y = x, y = 3, xy = 1, and xy — 4. We wish to


evaluate / yd A. The equations of the boundary curves suggest a substitution u = xy, v = y/x. To
Js
determine the function g so that g , we need the inverse function (note that S lies in the

first quadrant):

If we look at Figure 6.3, it is easy to check that g maps the region Q =

to S. Now,

Figure 6.3
6 Change of Variables Theorem <4 331

Then, by the Change of Variables Theorem, we have

dvdu

U= —. ◄
3

► EXERCISES 7.6

1. Suppose x, y g R”, S and T are linear maps from R" to Rm, and c g R.
(a) Prove that ||x + y||Q < ||x||Q + ||y||D and ||cx||Q = |c|||x||D-
(b) Prove that ||S + T ||D < ||S||Q + ||r ||a and RT||a = |c| l|TllD.
<c) Prove that ||r(x)||D < ||T||DIMD.
(d) Suppose the standard matrix for T is the m x n matrix A. Prove that
||T|| max t |al7).
u l<«<m
(e) Check that ||x||a < ||x|| < 7n||x||n and ;^||T||n < ||T|| < 7n||T||D.
'b
(f) Suppose g: [a, b] -> R" is continuous. Prove that IS

needed to prove Proposition 1.3 of Chapter 6 with the || • [| norm.)
2. Prove Lemma 6.3.
x2 y2 x2 y2 zf
3. Find the area of the ellipse — + — < 1 and the volume of the ellipsoid — 4- + t < 1. (Cf.
a2 o2 a2 b2 c2
also Exercise 9.4.17.)
0 1 0
4. Let S be the triangle with vertices at , and . Let fl I = ri/(*+y). Evaluate
0 0 1

the integral / fdA


Js
(a) by changing to polar coordinates,
(b) by making the change of variables w =x - y, v = x + y.
*5. LetS be the plane region bounded by y = 0,2x + y = l,2x 4- y = 5,and-x + 3y = 1. Evaluate

Js 2x + y
6. Rework Example 3 with the substitution u = xy, v = y.
*7. LetS be the plane region bounded by x = 0,y = 0, andx + y = 1. Evaluate dA.
(Remark: The integrand is undefined at the origin. Does this cause a problem?)
8. Find the volume of the region bounded below by the plane z = 0 and above by the elliptical
paraboloid z = 16 — x2 — 4y2.
9. Let S be the plane region in the first quadrant bounded by the curves y = x, y = 2x, and xy = 3.
Evaluate / xdA.
Js
332 ► Chapter 7. Integration

*10. Let S be the plane region in the first quadrant bounded by the curves y — x, y = lx, xy = 3,
fX
andxy = 1. Evaluate I -dA.

11. Let S' be the region in the first quadrant bounded by y = 0, y = x, xy = 1, and x2 — y2 = 1.
Evaluate / (x2 + y2)dA. (Hint: The obvious change of variables is u = xy, v = x2 — y2. Here it is
Js

too hard to find = g I ) explicitly, but how can you find det Dg another way?)
_yj \v/
12. Let S be the region bounded by y = — x, y = j, y = lx, and y = lx — 1. Evaluate
f *+? j*
Js(lx — y + l)4
‘13. Let S be the region with x > 0 bounded by y + x2 = 0, x — y = 2, and x2 — lx + 4y = 0.
Evaluate / (x - y + l)~2dA. (Hint: Consider x = u + v, y = v — u2.)
Js
14. Suppose 0 < b < a. Define g: (0, b~) x (0,2/r) x (0, In) —> R3 by
(a + r cos <£) cos#
g 0 (a + rcos0)sin#
r sin<£

Describe and sketch the image of g, and find its volume.


15. Let
1 1 1 1
1 2 1 1

A= 1 2 3 1

1 2 3 n

Given that / fdV = 1, evaluate / f(A 'x)dV.


Jw1 JR"
16. Let S = {x e R" : x{ > 0 for all i, xj + 2x2 + 3x3 H------- 1- nxn < n}. Find vol(S).
*17. Define spherical coordinates in R4 and calculate / ||x||d V.
J B(0.a)
Let R = [0,1] x [0,1], and consider the integral I = [ -—dA.
18.
Jr 1 - xy
OO
(a) By expanding the integrand in a geometric series, show that ? = (To be completely
k=l
rigorous, you will need to write Z as the limit of integrals over [0,1] x [0,1 — 5] as 8 —> 0+. Why?)
(b) Evaluate I by rotating the plane through n/4. A reasonable amount of cleverness will be re­
quired.7
19. Let an denote the n-dimensional volume of the n-dimensional unit ball 5(0,1) c R". Prove that
nm/ml, n — lm
an =
^l^ml/dm + 1)1, n = lm + l '
(Hint: Proceed by induction with gaps of 2.)

7We learned of this calculation from Simmons’s Calculus with Analytic Geometry, First Edition, pp. 751-52.

c h h Bt e r

8
DIFFERENTIAL FORMS
AND INTEGRATION ON
MANIFOLDS
In this chapter we come to the culmination of our study of multivariable calculus. Just as
in single-variable calculus, we’ve studied two seemingly unrelated topics—the derivative
and the integral. Now the time has come to make the connection between the two, namely,
the multivariable version of the Fundamental Theorem of Calculus. After building up to
the ultimate theorem, we consider some nontrivial applications to physics and topology.

► 1 MOTIVATION
We want to be able to integrate on ^-dimensional manifolds, so we begin by introducing
the appropriate integrands, which are called (differential) Worms. These integrals should
generalize the ideas of work (done by a force field along a directed curve) and flux (of a
vector field outward across a surface). But not only are Worms invented to be integrated,
they can also be differentiated. There is a natural operator d, called the exterior derivative,
which will turn Worms into k + 1-forms. The classical Fundamental Theorem of Calculus,
we recall, tells us that
/•b
/ f'(j)dt = f(b) - f(a)
Ja

whenever f is C1. We should think of this as relating the integral of the derivative over the
interval [a, b] to the “integral” of f over the boundary of the interval, which in this case is
the signed sum of the values f(b) and f(a). Notice that there is a notion of direction or
/•a fb
orientation built into the integral, inasmuch as / f(t)dt = — / f(f)dt. In this guise,
Jb Ja
we can write the Fundamental Theorem of Calculus in the form
f df=[ f = f(b) - f(a).
J[a,b] Jd[a,b]

More generally, we will prove Stokes’s Theorem, which says that

I dco = / co
Jm JdM

333
334 ► Chapter 8. Differential Forms and Integration on Manifolds

for any Zr-form <y and compact, oriented ^-dimensional manifold M with boundary dM.
The original versions of Stokes’s Theorem all arose in the first half of the nineteenth century
in connection with physics, particularly potential theory and electrostatics.
Just as the Fundamental Theorem of Calculus tells us that our displacement is the
integral of our velocity, so can it tell us the area of a plane region by tracing around its
boundary (see Exercises 1.5.3 and 8.3.26). Another instance of the Fundamental Theorem
of Calculus is Gauss’s Law in physics, which tells us that the total flux of the electric field
across a “Gaussian surface” is proportional to the total charge contained inside that surface.
And, as we shall see in Section 7, another application is the Hairy Ball Theorem, which tells
us we can’t comb the hairs on a billiard ball. The elegant modern-day theory of calibrated
geometries, which grew out of understanding minimal surfaces (the surfaces of least area
with a given boundary curve), is based on differential forms and Stokes’s Theorem.
As we’ve seen in Sections 5 and 6 of Chapter 7, determinants play a crucial role in the
understanding of n-dimensional volume, and so it is not surprising that k-forms, the objects
we wish to integrate over ^-dimensional surfaces, will be built out of determinants. We
turn to this multilinear algebra in the next section.

> EXERCISES 8.1

1. Why does a (plane) mirror reverse left and right but not up and down?
2. Appropriating from Tom and Ray Magliozzi’s “Car Talk”:
RAY: Picture this. It’s 1936. You’re in your second year of high school. Europe is
on the brink of yet another war.
TOM: Second senior year in high school.
RAY: In a secret location in Germany, German officers are gathered around a table
with the designers and builders of its new personnel carrier. They’re going over every
little detail and leaving no stone unturned. They want everything to be flawless. One of
the officers stands up and says, “I have a question about the fan belt, about the longevity
of the fan belt.” You with me?
TOM: They spoke English there?
RAY: Oh, yeah.
TOM: Just like in all the movies?
RAY: I’m reading the subtitles.
TOM: Just like in all the movies. I often wondered how come they all spoke English?
RAY: Well, it’s so close to German, after all.
TOM: Yeah. You just add an ish or ein to the end of everything.
RAY: Anyway, this fan belt looks just like the belt around your waist. It’s a flat piece
of rubber, and it’s designed to run around the fan and the generator. So, he asks, “How
long do you expect the belt to last?” The engineer says, “30 to 40 thousand kilometers.”
The officer says, “Not good enough.”
TOM: He said, how many miles is that?
RAY: The colonel says ...
TOM: That’s why I never made any money in scriptwriting.
RAY: Yeah. The colonel says, “Not good enough. We need it to last at least 60K.”
The engineer says, “Huh. Not a problem. It’s just a question of taking off the belt and
flipping it over, right?”
TOM: Sure.
RAY: Turning it inside-out.
2 Differential Forms 335

TOM: Yeah.
RAY: The officer says, “That’s unacceptable. Our soldiers will be engaged in battle.
We can’t ask them to change fan belts in the middle of the battlefield.”
TOM: Well, it’s a good point.
RAY: That’s right.
TOM: I mean, come on. You can’t tell the guys to stop shooting, your fan belt’s got
to be replaced.
RAY: Exactly. Hold your fire. So, the engineers huddle together, and they come up
with a clever design change. And I think I mentioned they do not change the material of
the belt in any way, yet they satisfy the new longevity requirement quite easily. What did
they do?
TOM: Whew!
(Source: Tom and Ray Magliozzi from Car Talk on NPR.)

► 2 DIFFERENTIAL FORMS
We have learned how to calculate multiple integrals over regions in R". Our next goal is to
be able to integrate over compact manifolds, e.g., curves and surfaces inR3. In some sense,
the most basic question is this: We know that the determinant gives the signed volume of an
n-dimensional parallelepiped in R"; how do we find the signed volume of a ^-dimensional
parallelepiped in R”, and what does “signed” mean in this instance?

2.1 The Multilinear Setup


We begin by using the determinant to define various multilinear functions of (ordered) sets
of k vectors in R”. First, we define n different linear maps dxi;: R" -> R, i = 1,..., n, as
follows: If

then set dxi (v) = v,.

L J

(The reason for the bizarre notation will soon become clear.) Note that the set of linear maps
from R” to R is an n-dimensional vector space, often denoted (R")*, and {dxi,..., dxn}
is a basis for it. (See Exercise 4.3.25.) For if 0: R" -> R is a linear map, then, let­
ting {ei,..., en} be the standard basis for R", we set ai — </>(eJ, i = 1......... n. Then
0 = a^dxi -I-------- 1- andxn, so dx\,..., dxn span (R")*. Why do they form a linearly in­
dependent set? Well, suppose 0 — cidxi -I-------- F cndxn is the zero linear map. Then, in
particular, 0(e,) = q = 0 for all i = 1,..., n, as required.
Now, if I = (ii,..., ik) is an ordered fc-tuple, define

rfx,: R”x » - -xR” R by1


k times

'Here we revert to the usual notation for functions, inasmuch as Vi,.... v* are all vectors.
336 ► Chapter 8. Differential Forms and Integration on Manifolds

dxj(vi,...,v*) =
JxIit(vi) ••• dxik(yk)

As is the case with the determinant, dxj defines an alternating, multilinear function of k
vectors in Rn. If we write

then

■ ■ ■

••• Vk,ik

When z‘i < Z2 < • • • < 4, this is of course the determinant of the k x k matrix obtained by
taking rows i i,... ,ik of the matrix

Vi v2 •■■ v*

► EXAMPLE 1

► EXAMPLE 2

Letn =4, Z = (3,1,4),


2 Differential Forms 337

Whenz'i < i2 < • • • < z^, we say that the ordered A;-tuple / = (ii,..., ik) is increasing.
If Z is a k-tuple with no repeated index, we denote by Z< the associated increasing k-tuple.
For example, if I — (2,4,5,1), then = (1,2,4,5), and we observe that

^X(2,4,5,l) = —dX(2,4,l,5) = +rfX(2,1,4,5) = ~^x(l,2,4,5) •

In general, Jx/ = (—l)sdxi<, where s is the number of exchanges required to move from
I to I<. Note that if we switch two of the indices in the ordered £-tuple, this amounts to
switching two rows in the matrix, and the determinant changes sign. Similarly, if two of
the indices are equal, the determinant will always be 0, so dxj = 0 whenever there is a
repeated index in I.
It follows from Theorem 5.1 or Proposition 5.18 of Chapter 7 that the set of dxj with
Z increasing spans the vector space of alternating multilinear functions from (RB)* to R,
denoted A^(R”)*. In particular, if T e A*(R")*, then for any increasing fc-tuple Z, set
a} = T (e,!,..., eit). Then we leave it to the reader to check that

T = aidXl
I increasing

and that the set of dxi with I increasing forms a linearly independent set (see Exercise
1). Since counting the increasing sequences of k numbers between 1 and n is the same as
counting the number of ^-element subsets of an n-element set, we have

dim(A*(R")*) = Q^.

Remark Suppose Z is an increasing £-tuple. We have the following geometric inter­


pretation: Given vectors Vi,..., v* 6 Rn, the number dxj (vi,..., v^) is the signed volume
338 ► Chapter 8. Differential Forms and Integration on Manifolds

The signed area (as viewed from the positive


x2-axis) of this parallelogram is d (v , w )

Figure 2.1

of the projection onto the xixxt2... -plane of the parallelepiped spanned by vi,..., v*.
See Figure 2.1.

Generalizing the cross product of vectors in R3 (see Exercise 3), we define the product
of these alternating multilinear functions, as follows. If I and J are ordered k- and ^-tuples,
respectively, we define

dxj /\dxj =

where by (I, J) we mean the ordered (k 4- £)-tuple obtained by concatenating I and J.

► EXAMPLE 3

^X(l,2) A dx^ — dX(\t2,3)


<ZX(i,5) A fi?X(4,2) — dX([t5'4,2) = ~<^x(l,2,4,5)
^*(1,3,2) A dX@,4) = ^X(i,3,2,3,4) = 0

We extend by linearity: If co — ^aidxi and rj = ^bjdxj, then we set A z; =


A dxj = ^(aibj)dX(ij). This is called the wedge product of c d and r).

► EXAMPLE 4

Suppose a> = axdx\ + a2dx2 and t? = b}dx\ 4- b2dx2 € A’(R2)* = (R2)*. Then let’s compute
a> a r] e A2(R2)*:

a) A T) = (a\dx[ 4- a2dx2) A (b\dx\ + b2dx2)


= aAb\dx\ A dxi + a2bxdx2 A dx} 4- a&dxi A dx2 4- a2b2dx2 A dx2
— a\bidx(iti) 4-a2b]dX(2,i) 4- fljh2^x(i,2) 4-
= (aii>2 — ^2hi)dX(i,2)-

Of course, it should not be altogether surprising that the determinant of the coefficient matrix a1 a2
b\ b2
has emerged here.
2 Differential Forms 339

Proposition 2.1 The wedge product enjoys the following properties.

1. It is bilinear: (cy + 0)A7j = cDA^ + 0A^ and (cco) A = c(a> A f).


2. It is skew-commutative: co A rj = (— \)kir) A co, when co G A*(Rn)* and r] G
A£(R")*.
3. It is associative: (<y A rj) A 0 = co A (rj A 0).

Proof Properties (1) and (3) are obvious from the definition. For (2), we observe
that to change the ordered (k + €)-tuple (it......... ik, f,..., jf) to the ordered (k + £)-
tuple (ji,..., je, fi,..., ik) requires kl exchanges: To move ji past r’i, • • •, ik requires k
exchanges, to move j2 past ii,..., ik requires k more, and so on. ■

Now that we’ve established associativity, we can make the crucial observation that

dxi /\dxj = dxyj) and, moreover,


dx^ A dxi2 A ■ ■ • A dxik =

As has been our custom throughout this text, when we work in R23, it is often more convenient
to write x, y, z for xi, x2, X3.

2.2 Differential Forms on R” and the Exterior Derivative


A (differential) 0-form on R" is a smooth function. An n-form on R” is an expression of
the form2

CO = f(x)dx\ A • • • A dxn

for some smooth function f. As we shall soon see, these (rather than functions) are precisely
what it makes sense to integrate over regions in R”. A (differential) k-form on Rn is an
expression

cd = £ /z(x)cfxz = A ' •' A dxik


increasing fc-tuples Z i i <—< i*
for some smooth functions fa. (Remember that the Jx/ with I increasing gives a basis for
Afc(Rn)*.) As usual, if k > n, the only £-form is 0.
We can perform the obvious algebraic manipulations with forms: We can add two k-
forms; we can multiply a fl-form by a function; we can form the wedge product of a £-form
and an £-form. The set of fc-forms on R" is naturally a vector space, which we denote by
^(Rn).3 For reference we list the relevant algebraic properties:

Proposition 2.2 Let U c R” be an open set. Let co g Ak(U), rj G X€(t7), and 0 G


Am(U).

2Sorry about that. You think of a better word!


3Por those of you who may see such words in the future, it is in fact a module over the ring of smooth functions.
Indeed, because we can multiply by using the wedge product, if we put all the k-forms together, k = 0,1,..., n,
we get what is called a graded algebra.
340 ► Chapter 8. Differential Forms and Integration on Manifolds

1. When k = t = m, c o + 1) = t j + c o and (co 4- tf) 4- = co 4- + 0).


2. co A Tj = (—l)wz? A co.
3. (co A rj) A $ = co A (r) A <f>).
4. When k = f, (co 4- z?) A <f> — (co A 0) 4- (z? A 0).

Determinants (and hence volume) are already built into the structure of k-forms. As
the name “differential form” suggests, their substantial power comes, however, from our
ability to differentiate them. We begin with the case of a 0-form, i.e., a smooth function
f: U -> R. Then for any x e U we want df(x} = D/(x) as a linear map onR". In other
words, we have

In particular, note that if we take f to be the z* coordinate function, then df = dx, and
Jx£ (v) = Dxi(y) = Vi, so this explains (in part) our original choice of notation. If co =
52 fi (x)dx.i is a k-form, then we define
i
n qf
dco = ^2/dfi A dxi = -—dXj A dx^ A • • • A dxik.
i i j=i °Xj

(Note that for a fixed k-tuple I, only the terms dxj where j is different from i\,..., ik will
appear.)

► EXAMPLE 5

a. Suppose f: R -> R is smooth. Then we have df — f'(x)dx.


b. Let co = ydx 4- xdy e ^(R2). Then dco = dy A dx 4- dx A dy =0.
c. Let co = —ydx 4- xdy € A^R2). Then dco = — dy Adx +dx Ady = 2dx A dy.
d. Let co = d (arctan e >t’(R2 — {0}). Then
\ x/ x2 4- y2
( y \ ( x \
dco — d I —------ r I A dx 4- d I —------r ) A dy
\ x2 + y2J \x +y J
3 / y \ , . 8 ( x \ , .
= “T" 2"7 2 I dy Adx 4- — I - \dx Ady
dy\x2 + y2J 3x\x24-y2/
(x2 4- y2) ~2y2 (x2 + y2) - 2x2
= ----- C 2 ■ 2\2 dx ^dy+ --- z ~2~. 2\2----dX ^dy = °‘
(x2 4- y2)2 (x2 4- y2)2

e. Let co = x^dx2 4-X3dx4 +x5dxe € X'(R6). Then dco = dx\ A dx2 4- dx3 A dx^ 4-
dx3 A dXf>.
f. Let co = (x2 4- eyz)dy a dz 4- (y2 4- sin(x3z))dz A dx 4- (z2 4- arctan(x2 4- y2))dx A dy
€ .42(R3). Then
dco — 2xdx A dy A dz 4- 2ydy Adz Adx + 2zdz Adx Ady
= 2(x+ y + z)dx Ady Adz. **4
2 Differential Forms 341

The operator d, called the exterior derivative, enjoys the following properties.

Proposition 2.3 Let a) e Ak(U) and r} € >4£(tZ). Let f be a smooth function.

1. When k = £,we have d(a) + r}) = da) + dr}.


2. d(fa)) = df A <i) + fda).
3. d(a) A i}) = da) A r) + (— /\drj.
4. d(da)) = 0.

Proof Properties (1) and (2) are immediate; indeed, (2) is a consequence of (3). To
prove (3), we note that because d commutes with sums, it suffices to consider the case that
a) = fdxi and rj = gdxj. Then, since the product rule gives d(fg) = gdf 4- fdg, we
have

d(a) A/}) = d(fgdxj /\dxj) = d(fg) A dxi A dxj


= (gdf + fdg) A dxi A dxj = gdf /\ dxi /\dxj + fdg A dxi /\dxj
= (df A dxj) A (gdxj) + (-l)k(fdxr) A (dg A dxj)

(since we must switch dg e Ar(U) and dxj e Ak(U))

= da) /\r} + (— l)ka) A dr}.

To prove (4), suppose a) = fdxj. Then

A 3/
da) = y -—dxj /\dxs
“ dxj
j=i j

and

(*)
92y
d(da)) = 2y2_, —---- dx; /\ dXi /\dXj.
dXidxj J
i=i j=i

Since dxi /\dxj — —dxj A dx,, we can rewrite the right-hand side of (*) as

V-a / d2f d2f \


> I------------- -- —-— ) dxi A dxi A dX[.
£f.\dXidXj dXjdXi) J

But by Theorem 6.1 of Chapter 3, we have

a2/ . a2f
dxjdxj dXjdXi ’

and so this sum is 0, as required. ■


342 ► Chapter 8. Differential Forms and Integration on Manifolds

2.3 Pullback
All the algebraic and differential structure inherent in differential forms endows them with
a very natural behavior under mappings. The main point is to generalize the procedure of
“integration by substitution,” familiar to all calculus students: When confronted with the
fb
integral / f(g(.u))g'(u)du, we substitute x = g(u), formally write dx = g'(u)du, and
Ja
fb fg(b)
say/ f(g(p))gf(u)du = I f(x)dx. The proof that this works is, of course, the chain
Ja Jg(a)
rule. Now we put this procedure in the proper setting.

Definition Let U C be open, and let g: U -> R" be smooth. If c d e .4fe(R"),


then we define g*<z> G Ak(U) (the pullback of c d by g) as follows. To pull back a function
(0-form) f, we just compose functions:

g*/ = Ag.

To pull back the basis 1-forms, if g(u) = x, then set


m „
g*dxi =dgi =Y^-^-du J.
— duJ
j =i J
Note that the coefficients of g*dxf, written as a linear combination of duy......... dum, are
the entries of the I th row of the derivative matrix of g. Now just let the pullback of a wedge
product be the wedge product of the pullbacks:

g*(dxix A • • • A dxik) = dgix A • • ■ a dgik, which we can abbreviate as dgi.

Last, we take the pullback of a sum to be the sum of the pullbacks:

g*( ZL Jx') = Z2(^og) Jg/ = E<»og>d8i> A ‘ ‘' A d^ •


z I i

► EXAMPLE 6

a. If g: R -> R, then g*(f(x)dx) - f(g(u))g'(u)du.


b. Let g: R -> R2 be given by

Then g*dx = - sin tdt and g*dy = cos/dz, so g*(—ydx + xdy) = (- sinz)(- sin zdz) +
(cos z)(cos tdt) = dt.
c. Let g: R2 -> R2 be given by

U COS U
g
u sin v
2 Differential Forms 343

If a> = xdx + ydy, then

g*o> — (u cos v)(cos vdu — u sin vdv) + (u sin v)(sin vdu + w cos vdv)
= m (c o s 2 v + sin2 v)du + w2(— cos v sin u + cos v sin v)dv = udu.

Moreover,

g*(dx A dy) = g*dx A g*dy = (cos vdu — u sin vdv) A (sin vdu + u cos vdv)
= m (c o s 2 v + sin2 v)du Adv = u du A dv,

so g*(^_(x2+:y2)dx A dy) = we““2du A dv.


d. Let g: R2 -> R3 be given by

WCOS V
usinv

Then

g*dx = cos vdu — u sin vdv


g*dy = sin v du + u cos vdv
g*dz — dv

and so

g*(dx A dy) = udu A dv


g*(dx A dz) = cos vdu A dv
g*(dy A dz) = sin vdu A dv.

Therefore, if w = (x2 + y2)dx Ady + xdx Adz + ydy A dz, then we have

g*<v = u\udu A dv) 4- (u cos v)(cos vdu A dv) + (w sin v)(sin vdu A dv)
= u(u2 + l)du A dv. ◄

It is impossible to miss the appearance of determinants of the derivative matrix in the


calculations we just performed. Indeed, if I is an ordered £-tuple,

g*Jxz = det duj\ i.e.,


.9u/_
increasing ^-tuples J

9“/i

duhA---Adujk.
M ... ^gik
3uJk
We need one last technical result before we turn to integrating.
344 ► Chapter 8. Differential Forms and Integration on Manifolds

Proposition 2.4 LetU c Rm be open, and let g \ U -> R" be smooth. Ifco G «4*(R"),
then

g*(da>)=d(g*a)).

Proof The statement for k = 0 is the chain rule (Theorem 3.2 of Chapter 3):

=r(E^)=^>-

Since the pullback of a wedge product is the wedge product of the pullbacks, we infer
that g*(dxj) = dgj. Because d and pullback are linear, it suffices to prove the result for
<o= fdXi. Weh,

g*(d(/dx,)) = g’(d/ A dX;) = g*(d/) A g*(dx;) = g*(d/) A d&


= d(g*/) A dg, = d((g*/)dg;) = d(g*(/dx,)).

(Notice that at the penultimate step we use the rule for differentiating the wedge product
and the fact that d(dgi) = 0.) ■

Now we come to integration. Given an n-form <o = f(x)dxi a • • • a dxn on a region


Q C R”, we define
[ <o= [ fdV.
Jci J ci
Note that since f is smooth, it is continuous and hence integrable on any region Q. It is
very important to emphasize here that the n-form <o must be written as a functional multiple
of the standard n-form dxi a • • • A dxn.
In some sense, the whole point of differential forms is the following restatement of the
Change of Variables Theorem:

Proposition 2.5 Let Q cRn be a region, and let g : Q —> R” be smooth and one-
to-one, with det(Dg) > 0. Then for any n-form <a ~ fdx\ A • • • A dxn on S = g(S2), we
have

Let Q C R* be a region, and let g: Q -> Rn be a smooth, one-to-one map whose


derivative has rank k at every point. (Actually, it is allowed to have lesser rank on a
2 Differential Forms ■< 345

set of volume 0, but we won’t bother with this now.) We say that M = g(2) c Rn is a
parametrized k-dimensional manifold. If co is a fc-form on R", we define
f CO — f g*m.
Jm Jq
If gi: -> R” and g2: Q2 -> are two parametrizations of the same ^-manifold M,
then, provided detDfe^ogi) > 0 (which, as we shall soon see, means that gi and g2
parametrize M with the same orientation),

I 82^=/ (82 l°gi)*(82") (by Proposition 2.5)


JSi? J Sil

= / (g2°(g2 logi))**> (see Exercise 16)


JS2i
= / ((g20g71)°gi)*<w by associativity
J S2i

That is, the integral of co over the (oriented) parametrized manifold M is well defined.

► EXERCISES 8.2

1. Prove that as I ranges over all increasing k-tuples, the dxy form a linearly independent set in
AA(R")*. Also check that for any T e Afc(R")*, T = increasing where = T(eheik).
2. (a) Suppose co e A*(R")* and k is odd. Prove that co a c o = 0.
(b) Give an example to show that the result of part a need not hold when k is even.
3. Suppose v, w 6 R3. Show that dx(v x w) = dy a dz(y, w), dy(y x w) =dz /\ dx(y, w), and
dz(y x w) ~dx /\ dyly, w).
4. Simplify the following expressions:
*(a) (2dx + 3dy + 4dz) * (dx -dy + 2dz)
(b) (dx +dy — dz) A (dx + 2dy + dz) A (dx — 2dy + dz)
*(c) (2dx Ady + dy A dz) A (3dx — dy + 4dz)
(d) (dx\ A dx2 + dx3 A dx4) A (dx\ A dx3 + dx3 A dx^)
(e) (dx\ A dx2 + dx3 A dx^ + dx$ A dx6) A (dx\ A dx2 + dx3 A dx$ + dx5 A dxf>) A
(dxi A dx2 + dx3 A dx4 + dx5 A dx^)
s5. Let n e R3 be a unit vector, and let v and w be orthogonal to n. Let
0 = n^dy /\dz + n2dz A dx + n3dx A dy.
Prove that (v, w) is equal to the signed area of the parallelogram spanned by v and w (the sign being
determined by whether n, v, w form a right-handed system for R3).
*6. Calculate the exterior derivatives of the following differential forms:
(a) co = exydx
(b) co = z2dx + x2dy + y2dz
(c) co = x2dy /\dz + y2dz A dx + z2dx A dy
(d) co = X\X2dx3 A dx4
346 ► Chapter 8. Differential Forms and Integration on Manifolds

*7. Can there be a function f so that df is the given 1-form c d (everywhere c d is defined)? If so, can
you find /?
(a) a> = —ydx 4- xdy (d) to = (x2 4- yz)dx 4- (xz + cos y)dy 4- (z + xy)dz
(b) c d = 2xydx 4- x2dy (e) " = l^dx + Wdy
(c) co = ydx 4- zdy 4- xdz (f) c d = — x-Jt-^dx
v ’ 2+y2
4- x~r~idy
2+y2 '

8. For each of the following fc-forms co, can there be a (k — l)-form i] (defined wherever c d is) so that
dr) = c d !
(a) c d = dx A dy (e) c d = xdy Adz + ydx Adz + zdx A dy
(b) c d = xdx A dy (f) c d = (x2 4- y2 + z2)""1 (xdy Adz + ydz A dx 4- zdx A dy)
(c) c d = zdx A dy (g) CD = Xsdxi A dxz A dx3 A dx$ 4- Xidx2 A dx4 A dx3 A dx5
(d) c d = zdx Ady + ydx Adz + zdy A dz
"9. (The Star Operator)
(a) Define ★ : >V(R2) -> >V(R2) by *dx = dy and *dy = -dx, extending by linearity. If f is a
smooth function, show that
d*(df) = (+ yr) dx A d>’-
\ dx2 dy2 /

(b) Define ★ : A1 (R3) X2(R3) by *dx =dy A dz, *dy = dz a dx, and *dz = dx a dy, extend­
ing by linearity. If f is a smooth function, show that
d*(df) = (yr + TT + yr) dx A dy A dz.
\ ox2 dy2 dz2, /
(Note that we can generalize the definition of the star operator by declaring that, in R”, ★ of a
basis 1-form </> = dxt is the “complementary” (n - l)-form, subject to the sign requirement that
0 A *0 = dx\ A • • • A dxn.)
10. Suppose c d e X1 (Rn) and there is a nowhere-zero function A. so that ka> is the exterior derivative
of some function f. Prove that c d Ada) = 0. (This problem gives a useful criterion for deciding
whether the differential equation c d = 0 has an integrating factor X.)
11. In each case, calculate the pullback g*tu and simplify your answer as much as possible.
(a) g: (—t t /2, n/2) -> R, g(u) = sinu, c d = dx/Vl — x2
3cos2v
*(b) g: R -> R2, g(v) = , c d = —ydx 4- xdy
3sin2v

3ucos 2v
(c) g:R2->R2,g , a) = —ydx +xdy
3m sin 2v
cosu
(d) g:R2->R3,g sinu , c d = zdx 4- xdy 4- ydz

COSM

*(e) g:R2->R3,g sinu , c d = zdx Ady + ydz A dx

cosu
sin v
(f) g:R2^R4,g , CD = X2dXi 4- XsdX4
sinu
2 Differential Forms < 347

cosu
sin v
(g) g:R2->R4, g , co = Xidxj — *2^*4
sinu

cos v

cosu

sin v
(h) g:R2^R4,g , CO = (—XjdXi + XidXi) A (—X^dXi, + X4C/X2)
sinu

cos v

12. For each part of Exercise 11, calculate g*(dcu) and J(g*n>) and compare your answers.
13. Let g: (0, oo) x (0, nr) x (0, 2t t ) -> R3 be the usual spherical coordinates mapping, given on
p. 294. Compute g*(c/x /\dy A dz).
"14. We say a Ar-form cy is closed i£dco~Q and exact if co = dr) for some (k — l)-fonn ty
(a) Prove that an exact form is closed. Is every closed form exact? (Hint: Work with Example 5d.)
(b) Prove that if co and 0 are closed, then co A 0 is closed.
(c) Prove that if co is exact and 0 is closed, then co a 0 is exact.
k
15. Suppose k < n. Let coi,..., co* e (R”)* and suppose that 52 A co< = 0. Prove that there are
i=l
k
scalars ay such that = a;, and co, = 52 atjdxj.

16. Suppose R? A Rm A Rn. Prove that (g°h)* = h*og*. (Hint: It suffices to prove (goh)*c7xf =
b*(g*c/x{). Why?)
17. (a) Suppose I = (i'i,..., in) is an ordered n-tuple and Z< — (1,2,..., n). Then we can define
a permutation a of the numbers 1,..., n by a(j) = ij, j = 1,..., n. Show that
d*i = sign(a)c£xi A • • • A dxn.
n
(b) Suppose co, = 52 aijdxj, i = 1,..., n, are 1-forms on Rn. Use Proposition 5.18 of Chapter 7 to
j =i
prove that coi A • • • A co„ = (det A)dxt A • • • A dxn.
(c) Suppose g: Rn -> Rn is smooth. Show that dg} /\ • ■ ■ a dgn = det(Dg)Jxi A • • • A dxn.
18. Suppose 0i,..., 0* € (R”)* and Vi,..., v* € R". Prove that
01 A•••A 0jt(Vl, ..., vt) = det [0z(vy)].
(Hints: First of all, it suffices to check this holds when the vy are standard basis vectors. Why? Write
n
out the 0i as linear combinations of the dxj, fa = 52 aijdXj, and show that both sides of the desired
equality are

°Ui ■ ‘’ aVk

akji ■ ‘ ’ akjk
when we take Vj =e71,...,vjt =e;t.)
19. Suppose U C Rm is open and g: U -> Rn is smooth. Prove that for any co € v4k(Rrl) and
Vy,..., Vjt e Rm, wc have
g*cy(a)(vi,..., vfc) = w(g(a))(Dg(a)vi....... Dg(a)vfc).
(Hint: Consider co — dni.)
348 ► Chapter 8. Differential Forms and Integration on Manifolds

20. Prove that there is a unique linear operator d mapping Ak (U) -> Ak+i (U) for all k that satisfies
the properties in Proposition 2.3 and df = -A-dxj. (This tells us that, appearances to the contrary

notwithstanding, the exterior derivative d does not depend on our coordinate system.)

► 3 LINE INTEGRALS AND GREEN’S THEOREM


We begin with a 1-form c d = $2 011 and a parametrized curve, C, given by a C1
function g: [a, b] -> R” (ordinarily with g/ / 0). Then we define

re fb n
YFi^g'^dt.
JC J[a,b] Ja i=1

Now we define a vector field (vector-valued function) F: R" —> Rn by

Fi

We then recognize that


y yb yb

C
<•> =
Ja
F(g(»)) ■ g'W* = / F(g(O) •
Ja ^^d>=Lv-Tds’
where ds is classically called the “element of arclength” on C and T is the unit tangent
vector (see Section 5 of Chapter 3). The most general path over which we’ll be integrating
will be a finite union of C1 paths, as above. In particular, we say the path C is piecewise-^
ifC — C}V---V)Cs, where Cj is the image of the C1 function g7 : [a,, bj] Rn.

Remark Let C~ be the curve given by the parametrization h: [a, b] -> R", h(u) =
g(a + b — u). Then
y yb yb
I h*co = I F(h(u)) • h'(u)du = I F(g(a + b - u)) • (-g7(a + b - u))du
J[a,b] Ja Ja
cb
= — I F(g(t)) • g'(t)dt (substituting t = a + b — u)
Ja

~ - I g**>-
/[a.b]
Note that h(a) = g(b) and h(b) — g(a): When we go backward on C, the integral of co
changes sign. We can think of obtaining C~ by reversing the orientation (or direction)
ofC.
In comparing C and C~, the unit tangent vector T reverses direction, so that F • T
changes sign but ds does not. That is, the notation notwithstanding, ds is not a 1-form, as
its value on a tangent vector to C is the length of that tangent vector; this, in turn, is not a
linear function of tangent vectors. It would probably be better to write ]ds |.
3 Line Integrals and Green’s Theorem 349

► EXAMPLE 1

Let C be the line segment from , and let co = xydz. We wish to calculate co. The

first step is to parametrize C:

0<r < 1.

Then
r r r1
I co = I g*ty = / (1 + t)(-l + 3t)(2dt)
C J [0,1] Jo

= 2 /r1 (3r2 + 2t - l)Jt = 2(t3 +12 - t)]J = 2. ◄


Jo

► EXAMPLE 2

Letty = - ydx + xdy. Consider two parametrized curves Ci and C2, as shown in Figure 3.1, starting

at A — and ending at B = , and parametrized, respectively, by

COSl
g(O = and h(O = 0<t < 1;
sinr

f f /”r/2 /”T/2 7T
I = / g*co= / (— sin t)(-sin tdt) + (cost)(costJr) = / Idt = —;
Jci J[0,n/21 Jo Jo 2

/ ft>= / h*w= / (-t)(-df) + (1 - t)(dt) = / ldt = l.


Jc2 J[0,l] Jo Jo

Thus, we see that / co depends not just on the endpoints of the path but also on the particular path
JA
joining them.

Figure 3.1
350 ► Chapter 8. Differential Forms and Integration on Manifolds

Recall from your integral calculus (or introductory physics) class the definition of work
done by a force in displacing an object. When the force and the displacement are parallel,
the definition is

work = force x displacement,

and in general only the component of the force vector F in the direction of the displacement
vector d is considered to do work, so

work = F ■ d.

If a vector field F moves a particle along a parametrized curve C, then it is reasonable to


suggest that the total work should be / F • Ids: Instantaneously, the particle moves in the
Jc
direction of T, and only the component of F in that direction should contribute. Without
providing complete rigor, we see from Figure 3.2 that the amount of work done by the force
in moving the particle along C during a very small time interval [t, t + h] is approximately
F(g(t)) • (g(t + h) - g(t)) « F(g(t)) • which suggests that the total work should be

given by / F(g(t)) •
Ja

Figure 3.2

^EXAMPLES

What is the relation between work and energy? As we saw in Section 4 of Chapter 7, the kinetic
energy of a particle with mass m and velocity v is defined to be K.E. = \m ||v||2. Suppose a particle
with mass m moves along a curve C, its position at time t being given by g(t), t g [a, &]• Then the
work done by the force field F on the particle is given by
f fb
work = / F • Tds = / F(g(t)) • ^(t)dt
JC Ja

= I mg"(r) • g/(t)dt by Newton’s second law of motion


Ja
= m [ l(||g'll2)'(»)A
Ja
— (l|g*(b)ll2 ~ lg'(«)ll2) by the Fundamental Theorem of Calculus
= A(lm||v|l2) = A(K.E.).
3 Line Integrals and Green’s Theorem 351

That is, assuming F is the only force acting on the particle, the work done in moving it along a path
is the particle’s change in kinetic energy along that path. *1

3.1 The Fundamental Theorem of Calculus for Line Integrals

Proposition 3.1 Suppose co — df for some C1 function f. Then for any path (i.e.,
piecewise-Q1 manifold) C starting at A and ending at B, we have

[ co = f(B)-f(A).
Jc
Equivalently, when F = V/, we have

[ E-Tds = f(B) — f(A).


Jc

Proof It follows from Theorem 3.1 of Chapter 6 that any C1 segment of C is a finite
union of parametrized curves Cj, j — 1,..., s, where Q is the image of a C1 function
g;: [aj, bj] -» RB. Let g7(a7) = Aj and g;(Z?7) = Bj. We may arrange that Ai = A,
Bj = Aj+i, j = 1,..., j — 1, and Bs — B. It suffices to prove the result for Cj, for then
we will have

®=E (/w - w) = fw - fw.


J=l

Now, we have

by definition

since d commutes with pullback

by definition of pullback

= <f°gj)(bj) - (fogjXaf) by the Fundamental Theorem of Calculus


= f(Bj) — f(Aj),

as required. Note that the proof amounts merely to applying the standard Fundamental
Theorem of Calculus, along with the definition of line integration by pullback. The fact
that d commutes with pullback, in this instance, is simply the chain rule. ■

Theorem 3.2 Let co = $2 Fidxi be a 1-form (or let F be the corresponding force
field) on an open subset U C R”. The following are equivalent:
352 ► Chapter 8. Differential Forms and Integration on Manifolds

1. J) co = Ofor every closed curve C C U;

2. I a) is path-independent in U;
JA
3. co = df (or F = Vf)for some potential function f on U.

Remark In light of Example 3, there is no net work done by F around closed paths,
so that kinetic energy is conserved—which is why such force fields are called conservative.
Physicists refer to — f as the potential energy (P.E.). It then follows from Proposition 3.1
that the total energy, K.E. + P.E., is conserved along all curves, for

A(K.E.) = work = f(B) — f(A) = —A(P.E.), and so A(K.E. + P.E.) = 0.

Proof (1) => (2): If Ci and C2 are two paths from A to B, then C = Ci U C2 is a
closed curve, as indicated in Figure 3.3(a). Then

Figure 3.3

(2) => (3): (Here we assume any two points of U can be joined by a path. If not,
one must repeat the argument on each connected “piece” of IZ.) Fix a e U, and define
f: U -+ Rby

/•X
f (x) = / co, where the integral is computed along any path from a to x.
Ja

By path-independence, f is well defined. Now, to show that df — co, we must evidently


establish that — (x) = F,(x). As Figure 3.3(b) suggests,
dXi
3 Line Integrals and Green’s Theorem ◄ 353

X1

df r 1
7T-(x) = lim - f Xj +h -f x(-
dxi h->o h

I
j fx+hfy
k Xn ) W
j I'h
/
= lim - / c d = lim - / F,(x 4- tei)dt
h->0 h J* h—>0 n Jq
= Fi(x)

by the usual Fundamental Theorem of Calculus.


(3) ==> (1): This is immediate from Proposition 3.1. ■

Remark Given a 1-form c d , by (4) of Proposition 2.3, a necessary condition for


c d = df for some function f (c d exact) is that dcD = 0 (c d closed). As we saw in Example 5d
of Section 2, the condition is definitely not sufficient. We shall soon see that the topology
of the region on which c d is defined is relevant.

3.2 Finding a Potential Function


If we know that j c d is path-independent on a region, then we can construct a potential

function by choosing a convenient path. We illustrate the general principle with some
examples.

► EXAMPLE 4
Let c d = (ex 4- 2xy)dx 4- (x2 4- cos y)dy. We show two different ways to calculate a potential func­
tion f, i.e., a function f with df = c d .

0 Xo
a. Take the line segment C joining 0 = andxo = as shown in Figure 3.4(a); we
0 .y° _
take the obvious parametrization*.

txo
g(t) = tXo = Q<t <1.
tyo

Then

= / ((ew° + 2r2x0y0)x0 4- (t2*o + cos(ty0))y0)dt


Jo
= (etx° + |t3x^y0 + |t3*o?o + sin(fyo))]o

= (e*0 4- XoJo 4- sin y0) - 1,

and so we set f = ex + x2y 4- sin y — 1, and it is easy to check that df = c d .


354 ► Chapter 8. Differential Forms and Integration on Manifolds

b. Now we take the two-step path, as shown in Figure 3.4(b), first varying x and then varying
y, to get from 0 to Xq . That is, we have the two parametrizations:

Ci : gi (t) = 0 < f < x0, C2 : gz(O = 0 < t < y0.

Then we have

Once again, we have f I X I = e* — 1 + x2y + sin


V/

c. As a variation on the approach of part (b), we proceed purely by antidifferentiating. If we


seek a function f with df = co, then this means that
/ 9/ x 2
(*) — = e + 2xy and —- = x + cos y.

Integrating the first equation, holding y fixed, we obtain

(t) f = I (e* + 2xy)dx = e* + x2y + h(y)

for some arbitrary function h (this is the “constant of integration”). Differentiating (t) with
respect to y and comparing with the latter equation in (*), we find

= x2 4- h'(y) = x2 + cosy,

whence h'(y) — cosy and h(y) = siny + C. Thus, the general potential function is

— ex + x2y + sin y + C for any constant C.

Note that even though it is computationally more clumsy, the approach in (a) requires only that we be
able to draw a line segment from the “base point” (in this case, the origin) to all the other points of our
region. The approaches in (b) and (c) require some further sort of convexity: We must be able to start
at our base point and reach every other point by a path that is first horizontal and then vertical.
3 Line Integrals and Green’s Theorem 355

We now prove a general result along these lines: Suppose an open subset U C R" has
the property that for some point a € U, the line segment from a to each and every point
x g U lies entirely in U. (Such a region is called star-shaped with respect to a, as Figure
3.5 suggests.) Then we have:

Proposition 3.3 Let cube a closed 1-form on a star-shaped region. Then co is exact.

Figure 3.5

Proof Write = £ Fidx^ For any x e U, we can parametrize the line segment
from a to x by

g(r) = a + t(x-a), 0 < t < 1.

Then we have

a + r(x — a))(x;- - a7-

Using Exercise 7.2.20 to calculate the derivative of /, we have

9/ f1 I*1 3F-
I Fj(a + t(x — a))Jt + / t—-(a + t(x — a))(xj — aj)dt
dXj Jo Jo OXi

/* 1 /* 1 / w QP- \
I F;(a + t(x-a))dr + / t( J’'—l-(a + t(x-a))(xj - aj))dt
0 Jo °xj '
356 ► Chapter 8. Differential Forms and Integration on Manifolds

z . . r. . dFj . ,
(using the fact that —— = -— since da> = 0)

f1 r1
= / Fi(a + t(x-&))dt + I t(Fiog)f(t)dt (by the chain rule)
Jo Jo
= [ Fi(a + t(x-a))dt + t(fi°g)(t)l - I F}(a + t(x- a))dt
Jo -*0 Jo

(integrating by parts)

= Ff(x).

That is, df = a>, as required. ■

► EXAMPLE 5

Let C be the parametric curve

e‘
g(0 = t6 + 4t3 - 1
_ r4 + (t - t2ksinf

and let to = + y) dx + (x + z)dy + (logx + y 4- 2z)dz. We wish to calculate j a>.

We certainly hope that the 1-form <o is exact (or, equivalently, that the corresponding force field
is conservative), for then we can apply the Fundamental Theorem of Calculus for Line Integrals,
Proposition 3.1.
If a) is to be equal to df for some function f, we need to solve
^- = - + y, |^=x + z, -logx + y + 2z.
9x x dy oz
Integrating the first equation, we obtain:

dx — zlogx + xy 4-g
z

where g I z I is the “constant of integration.” Differentiating with respect to y, we have

9g
9y 9y

and so we find that — = z. Thus, g [ | —yz + h(z) for some appropriate “constant of integration’
9y \z
h(z). So

f y = zlogx+xy+ yz + A(z).
3 Line Integrals and Green’s Theorem 357

Now, differentiating with respect to z, we have

= logx + y + h (z) = logo: + y + 2z,


OZ

and so—finally—h(z) = z2 + c, whence

y = zlogx + xy + yz + Z2 + c.

V>
Now comes the easy part. The curve goes from

A = g(0) = to B = g(l) =

and so

at = f(B) - f(A) - (l+4e+4 + l)-(-l) = 4e + 7. ◄

► EXAMPLE 6

Newton’s law of gravitation states that the gravitational force exerted by a point mass M at the origin
on a unit test mass is radial and inverse-square in magnitude:

Y
V = —GM—
hll3
The corresponding 1-fonn is w = —GM(x2 + y2 + z2)~3,2(xdx + ydy 4- zdz). Since

d(l|x||) = WW + ydy + zdz)

(see Example 1 of Chapter 3, Section 4), it follows immediately that a potential function for the
gravitational field is /(x) = GAf/||x||. (Physicists ordinarily choose the constant so that the potential
goes to 0 as x goes to infinity.)
Let’s now consider the case of the gravitational field of the earth; note that the gravitational
acceleration at the surface of the earth is given by g = GM/R1, where R is the radius of die earth. By
Proposition 3.1, the work done (against gravity) to lift a unit test mass from a point A on the surface
of the earth to a point B height h units above the surface of the earth is therefore

-(/(B) - /(A)) = GM (1 - « gk.

provided h is quite small compared to R. This checks with the standard formula for the potential
energy of a mass m at (small) height h above the surface of the earth: P.E. = mgh. <1
358 ► Chapter 8. Differential Forms and Integration on Manifolds

3.3 Green’s Theorem


We have seen that whenever co = df for some function f, it is the case that a» == 0 for
all closed curves C. So certainly we expect that the size of dco on a region will affect the
integral of co around the boundary of that region. The precise statement is the following

Theorem 3.4 (Green’s Theorem for a Rectangle) Let R c R2 be a rectangle, and


let co be a 1-form on R. Then

I co = I dco.
JdR JR

(Here the boundary 32? is traversed counterclockwise.)

Proof Take R = [a, b] x [c, d], as shown in Figure 3.6, and write co = Pdx + Qdy.
Then
J /9£ 92>\ .
dco = I —-------- — I dx A dy.
\ dx dy /
Now we merely calculate, using Fubini’s Theorem appropriately:

as required. ■

H--------------------------------- F
a b

Figure 3.6
3 Line Integrals and Green’s Theorem ◄ 359

For most applications, the following observation is adequate:

Corollary 3.5 If S C R2 is parametrized by a rectangle, and c d is a \ -form on S, then

Proof Let g: R -> S c R2 be a parametrization. Then, applying Proposition 2.4,


we have

(It is important to understand that both S and d S inherit an orientation from the parametriza­
tion g.) ■

► EXAMPLE?

Suppose c d is a smooth 1-form on the unit disk D in R2. Can we infer that / o d = I dcDl The naive
JdD JD
answer is “of course,” parametrizing by polar coordinates and applying Corollary 3.5. The difficulty
that arises is that we only get a bona fide parametrization on (0,1] x (0, 2/r). But we can apply
Corollary 3.5 on the rectangle RS e = [3,1] x [«, 2zrj when 8, s > 0 are small. Let Ds,e = g(/?a,e),
as indicated in Figure 3.7. Because c d is smooth on all of the unit disk, we have

I dcD— lim / doD = lim / g*do) = lim / g*a> = lim / cd = I cd.


Jd 6,e->0+JDl.e 5,e->0+ JRie 5,e->0+ JdRie $,e->0+ JdDs,s JdD

(We leave it to the reader to justify the first and last equalities.) We shall not belabor such details in
the future.

Figure 3.7

More generally, we observe that Green’s Theorem holds for any region S that can be
decomposed as a finite union of parametrized rectangles overlapping only along their edges.
360 ► Chapter 8. Differential Forms and Integration on Manifolds

For, as Figure 3.8 illustrates, if S = U*=i *$» because the integrals over interior boundary
segments cancel in pairs, we have
k
I dot.
s

Remark We do not usually stop to express every “reasonable*’ region explicitly as a


union of parametrized rectangles. (For most purposes, our work in Section 5 will obviate
all such worries.) Tn Example 7 we already dealt with the case of a disk. To set our minds
further at ease, we can easily check that

gl: [0,1] x [0, >r], gi(0=r[“S®

maps a rectangle to a half-disk, and that

r rcosB
O r1- -----
cos0 • a
n + sm0 sin0

1 0
maps a rectangle to the triangle with vertices at and
0 1

► EXAMPLES

We can use Green’s Theorem to calculate the area of a planar region S by line integration. Since
dx t\dy — d(xdy) = d(-ydx) — d(^(-ydx +xdyj),

we have

area(S) = / xdy = I —ydx — - —ydx + xdy. ◄


Jds Jas *
3 Line Integrals and Green’s Theorem ◄ 361

Definition A subset X C Rn is called simply connected if it is connected and every


simple closed curve in X can be continuously shrunk to a point in X.

Corollary 3.6 Let Q C R2 be a simply connected region. If co is a smooth 1-form on


Q with dco = 0, then co is exact; i.e., there is a function f so that co = df.

Proof By Green’s Theorem, for any rectangle R C □ , we have

/ co — I dco = 0,
JdR Jr
and, as the proof of Theorem 3.2 showed, this is sufficient to construct a potential func­
tion f. ■

To emphasize the importance of all the hypotheses, we give an important example.

► EXAMPLE 9
y x
Let co = —------ -dx + —----- -dy. Then, as we calculated in Example 5d of Section 2, dco = 0.
x2 + y2 x2 + y2
And yet, letting C be the unit circle, it is easy to check that ® co = 2t c . So c o cannot be exact. We
Jc
shall see further instances of this phenomenon in later sections.

Nevertheless, we can use Green’s Theorem to draw a very interesting conclusion.

► EXAMPLE 10

Suppose C is any simple closed curve in the plane that encircles the origin, and let T be a circle
centered at the origin lying in the interior of C, as shown in Figure 3.9. Let S be the region lying
between C and T. If we orient C and T counterclockwise, then we have dS = C + r~. Once again,
y x
let co = ———-dx + —----- - dy. Then, as in Example 9, we have dco = 0. But now co is smooth
x2 + y2 x2 + y2
everywhere on S, and so

Figure 3.9
362 ► Chapter 8. Differential Forms and Integration on Manifolds

That is,

I CD = / CD = 211,
Jc Jr
and this is true for any simple closed curve C with the origin in its interior. More generally, consider
the curves shown in Figure 3.10. Then / c d — 2n, 4t t , and 0, respectively, in parts (a), (b), and (c).
Jc
For reasons we leave to the reader to surmise, for a closed plane curve not passing through the origin,
the integer
If y x
2tr Jc xI2 + y2 X + x2 + y2 y

is called the winding number of C around the origin.

► EXERCISES 8.3

*1. Let c d = ydx + xdy. Compare and contrast the integrals / c d for the following parametrized
Jc
curves C. (Be sure to sketch C.)
t cos21
(a) g: [0,1]-> R2, g(t) = (d) g: [0, t t /2] ->R2, g(r) =
t 1 — sin21

t sin2t
(b) g: [0,1] —> R2, g(r) = (e) g: [0, t t /4] R2, g(r) ==
t2 1 -cos2t

1-t cost
(c) g: [0,1]-> R2, g(t) = (f) g: [0, t t /2]-> R2, g(t) =
1 —t 1 - sint

*2. Repeat Exercise 1 with c d = y2dx + xdy.


3. Calculate the following line integrals:
(a) / xy3dx, where C is the unit circle x2 + y2 = 1, oriented counterclockwise.
"o' 1 “
I zdx + xdy + ydz, where C is the line segment from 1 to -1
c 3
2
3 Line Integrals and Green’s Theorem 363

(0 y2dx + zdy - 3xydz, where C is the line segment from

(d) / ydx, where C is the intersection of the unit sphere and the plane x + y + z = 0, oriented
Jc
counterclockwise as viewed from high above the xy-plane. (Hint: Find an orthonormal basis for the
plane.)
4. Let C be the curve of intersection of the upper hemisphere x2 + y2 + z2 = 4, z > 0 and the
cylinder x2 + y2 = 2x, oriented counterclockwise as viewed from high above the xy-plane. Evaluate
/ ydx 4- zdy + xdz.
Jc
r n r -i
x y 1 2
5. Let to = —------- dx 4- —------ =•dy. If C is an arbitrary path from to not passing
x2 + y2 x2 + y2 1 2

through the origin, calculate I a>.


Jc
6. Determine which of the following 1-forms ω are exact (or, in other words, which of the corresponding vector fields F are conservative). For those that are, construct (following one of the algorithms in the text) a potential function f. For those that are not, give a closed curve C for which ∮_C ω ≠ 0.
(a) ω = (x + y) dx + (x + y) dy
(b) ω = y² dx + x² dy
(c) ω = (eˣ + 2xy) dx + (x² + y²) dy
(d) ω = (x² + y + z) dx + (x + y² + z) dy + (x + y + z²) dz
(e) ω = y²z dx + (2xyz + sin z) dy + (xy² + y cos z) dz

7. Let f: ℝ → ℝ and ω = f(‖x‖) Σ_{i=1}^{n} x_i dx_i ∈ A¹(ℝⁿ).
(a) Assuming f is differentiable, prove that dω = 0 on ℝⁿ − {0}.
(b) Assuming f is continuous, prove that ω is exact.
8. Let C be the parametric curve

    g(t) = (cos(2πt²), t⁷ + 4t³ − 1, t⁴ + (t − t²)e^{sin t}),  0 ≤ t ≤ 1.

Calculate ∫_C (3x + y² + 2xz) dx + (2xy + z e^{yz} + y) dy + (x² + y e^{yz} + z e^{z}) dz. (Hint: This problem should involve very little computation.)

9. Let C be any closed curve in the plane. Show that ∮_C y dx = −∮_C x dy. What is the geometric interpretation of these integrals?

10. Calculate each of the following line integrals ∫_C ω directly and by applying Green's Theorem. (In all cases, C is traversed counterclockwise.)
(a) ω = (x² − y²) dx + 2xy dy, C is the square with vertices … .
(b) ω = −y³ dx + x³ dy, C as in part a.
*(c) ω = −x²y dx + xy² dy, C is the circle of radius a centered at the origin.
*(d) ω = √(x² + y²) (−y dx + x dy), C is the circle x² + y² = 2x.
(e) ω = −y² dx + x² dy, C is the boundary of the sector of the circle r ≤ a, 0 ≤ θ ≤ π/4.
11. Let C be the circle x² + y² = 2x, oriented counterclockwise. Evaluate ∫_C ω, where ω = (−y² + e^{x²}) dx + (x + sin(y³)) dy.

*12. Use Green's Theorem to find the area of the ellipse x²/a² + y²/b² ≤ 1.
13. Find the area inside the hypocycloid x^{2/3} + y^{2/3} = 1. (Hint: Parametrize by g(t) = (cos³ t, sin³ t).)

*14. Let 0 < b < a. Find the area beneath one arch of the trochoid (as shown in Figure 3.11)

    g(t) = (at − b sin t, a − b cos t),  0 ≤ t ≤ 2π.

Figure 3.11

15. Find the area of the plane region bounded by the evolute

    g(t) = (a(cos t + t sin t), a(sin t − t cos t)),  0 ≤ t ≤ 2π,

and the line segment AB, as pictured in Figure 3.12.

Figure 3.12

16. Use symmetry considerations to find the following.
(a) Let C be the polygonal curve shown in Figure 3.13(a). Compute ∮_C (e^{x²} − 2xy) dx + (2xy − x²) dy.
(b) Let C be the curve pictured in Figure 3.13(b); you might visualize it as a racetrack with two semicircular ends. Compute ∮_C (4x³y − 3y²) dx + (x⁴ + e^{sin y}) dy.
Jc
17. Let C be an oriented curve in ℝ², and let n be the unit outward-pointing normal (this means that {n, T} gives a right-handed basis for ℝ²). Define the 1-form σ on C by σ = −n₂ dx + n₁ dy.
(a) Show that σ(T) = 1.
(b) Show that ∫_C σ gives the arclength of C.
(c) Can you explain how your answers to parts a and b might be related?
18. Let C be an oriented curve in ℝ², and let n be the unit outward-pointing normal (this means that {n, T} gives a right-handed basis for ℝ²). Let F = (F₁, F₂) be a vector field on the plane, and let ω = F₁ dx + F₂ dy be the corresponding 1-form. Show that

    ∫_C F·n ds = ∫_C ⋆ω.

This is called the flux of F across C. (See Exercise 8.2.9.) Conclude that when C = ∂S, we have

    ∫_C F·n ds = ∫_S (∂F₁/∂x + ∂F₂/∂y) dA.

19. Prove Green's Theorem for the annular region Ω = {(x, y) : a ≤ √(x² + y²) ≤ b} pictured in Figure 3.14.

Figure 3.14
20. Give a direct proof of Green's Theorem for
(a) a triangle with vertices at (0, 0), (a, 0), and (0, b);
(b) the region {(x, y) : a ≤ x ≤ b, g(x) ≤ y ≤ h(x)}. (Hint: Exercise 7.2.23 will be helpful.)
21. Suppose C is a piecewise C¹ closed curve in ℝ² that intersects itself finitely many times and does not pass through the origin. Show that the line integral

    (1/2π) ∫_C (−y dx + x dy)/(x² + y²)

is always an integer. (See the discussion of Example 10.)


22. Suppose C is a piecewise C¹ closed curve in ℝ² that intersects itself finitely many times and does not pass through (1, 0) or (−1, 0). Show that there are integers m and n so that

    (1/2π) ∫_C ( (−y dx + (x − 1) dy)/((x − 1)² + y²) + (−y dx + (x + 1) dy)/((x + 1)² + y²) ) = m + n.

23. An ant finds himself in the xy-plane in the presence of the force field F = (y³ + x²y, 2x² − 6xy). Around what simple closed curve beginning and ending at the origin should he travel counterclockwise (once) in order to maximize the work done on him by F?
24. Suppose Ω ⊂ ℝ² is a region with the property that every simple closed curve in Ω bounds a region contained in Ω that is a finite union of parametrized rectangles. Prove that if ω is a 1-form on Ω with dω = 0, then ω is exact; i.e., there is a potential function f with ω = df.
25. (a) Suppose there is a current c in a river. Show that if we row at a constant ground speed v > c directly downstream a certain distance and then directly back upstream to our beginning point, the time required (ignoring the time to turn around) is always greater than the time it would take with no current. (This is just an elementary algebra problem.)
(b) Show that the same is true no matter what closed path C we take in the river. (Assume we still row with ground velocity v, with ‖v‖ > c constant.) (Hint: Express the time of the trip as a line integral over C and do some clever estimates. The diagram in Figure 3.15 may help.)

Figure 3.15
26. According to Webster, a planimeter, pictured in Figure 3.16, is “an instrument for measuring the
area of a regular or irregular plane figure by tracing the perimeter of the figure.” As we show a bit
more schematically in Figure 3.17, an arm of fixed length b has one fixed end; to the other is attached
another arm of length a, which is free to rotate. A wheel (for convenience attached slightly off the
near end) turns as the arm rotates about the pivot point. Use Green’s Theorem to explain how the
amount that the wheel rotates tells us the area of the figure.

Figure 3.16 Figure 3.17



► 4 SURFACE INTEGRALS AND FLUX

Suppose U ⊂ ℝ² is a bounded open set and g: U → ℝⁿ is a one-to-one smooth map with the property that Dg(a) has rank 2 for all a ∈ U. Then we call S = g(U) a parametrized surface.

► EXAMPLE 1

a. Consider g: (0, 2π) × (0, a) → ℝ³ given by

    g(u, v) = (v cos u, v sin u, v).

This is a parametrization of that portion of the cone z = √(x² + y²) between z = 0 and z = a, less one ruling, as shown in Figure 4.1.

Figure 4.1

b. Consider g: (0, 2π) × (0, 2π) → ℝ³ given by

    g(u, v) = ((a + b cos v) cos u, (a + b cos v) sin u, b sin v).

If 0 < b < a, the image of g is most of a torus, as pictured in Figure 4.2, the surface of revolution obtained by rotating a circle of radius b about an axis a units from its center.

Figure 4.2

c. Consider g: ℝ × (0, ∞) → ℝ³ given by

    g(u, v) = (v cos u, v sin u, u).

This parametrized surface, pictured in part in Figure 4.3, resembles a spiral ramp, and is officially called a helicoid. ◄

Figure 4.3
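The rank-2 condition in these examples is easy to check symbolically. Here is a small sketch (ours, not from the text; Python with sympy is assumed) for the torus of part b: the cross product of the two partial derivatives has square length b²(a + b cos v)², which never vanishes when 0 < b < a, so Dg has rank 2 everywhere.

    import sympy as sp

    u, v, a, b = sp.symbols('u v a b', positive=True)
    g = sp.Matrix([(a + b*sp.cos(v))*sp.cos(u),
                   (a + b*sp.cos(v))*sp.sin(u),
                   b*sp.sin(v)])
    N = g.diff(u).cross(g.diff(v))     # normal to the parametrized torus
    print(sp.simplify(N.dot(N)))       # b**2*(a + b*cos(v))**2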

As we expect by now, to define the integral of a 2-form over a parametrized surface S, we pull back and integrate: When ω ∈ A²(ℝⁿ) and S = g(U), we set

    ∫_S ω = ∫_U g*ω

(provided the integral exists).

► EXAMPLE 2

For these examples, let's fix ω = z dx ∧ dy ∈ A²(ℝ³).

a. Let D ⊂ ℝ² be the open unit disk, and let g: D → ℝ³ be given in polar coordinates by

    g(r, θ) = (r cos θ, r sin θ, √(1 − r²)).

Then we recognize that g is a parametrization of the upper unit hemisphere, S. We then have

    g*(z dx ∧ dy) = √(1 − r²) r dr ∧ dθ,

and so

    ∫_S ω = ∫_D g*ω = ∫₀^{2π} ∫₀¹ √(1 − r²) r dr dθ = 2π/3.

b. Now consider g: (0, π/2) × (0, 2π) → ℝ³ given by

    g(φ, θ) = (sin φ cos θ, sin φ sin θ, cos φ).

This is an alternative parametrization of the upper hemisphere, S. Then

    g*(z dx ∧ dy) = cos φ (cos φ sin φ dφ ∧ dθ) = cos² φ sin φ dφ ∧ dθ,

and so

    ∫_S ω = ∫_{(0,π/2)×(0,2π)} g*ω = ∫₀^{2π} ∫₀^{π/2} cos² φ sin φ dφ dθ = 2π/3.

c. Now let's do the lower hemisphere correspondingly in each of these two ways. Parametrizing by the unit disk, we have

    h(r, θ) = (r cos θ, r sin θ, −√(1 − r²)).

We then have h*(z dx ∧ dy) = −√(1 − r²) r dr ∧ dθ, and so

    ∫_S ω = ∫_D h*ω = −2π/3.

On the other hand, in spherical coordinates, we have k: (π/2, π) × (0, 2π) → ℝ³ given by the same formula as g in part b above, and so

    ∫_S ω = ∫_{(π/2,π)×(0,2π)} k*ω = 2π/3.

What gives?

The answer to the query is very simple. Imagine you were walking around on the unit sphere with your feet on the surface (your body pointing radially outward, normal to the sphere). As you look down, you determine that a basis for the tangent plane to the sphere will be "correctly oriented" if you see a positive (counterclockwise) rotation from the first vector (u) to the second (v), as pictured in Figure 4.4. We will say that your body is pointing

Figure 4.4

in the direction of the outward-pointing normal vector to the surface. Note that then n, u, v form a positively-oriented basis for ℝ³; i.e.,

    det [n u v] > 0.

More generally, an orientation on a surface S ⊂ ℝⁿ is a continuously varying notion of what a positively oriented basis for the tangent plane at each point should be. In particular, S has an orientation if and only if we can choose various parametrizations g: U → ℝⁿ of (subsets of) S (the union of whose images covers all of S) so that ∂g/∂u₁ and ∂g/∂u₂ give a positively oriented basis of the tangent plane of S at every point of g(U). We say a surface is orientable if there is an orientation on S. (See Exercise 26.)

An alternative characterization of an orientation on a surface S is the following. Recall that dim A²(ℝ²)* = 1; i.e., any nonzero element of this vector space is either a positive or a negative multiple of dx₁ ∧ dx₂. Given a nonzero element θ ∈ A²(ℝ²)*, it defines an orientation on ℝ² in an obvious way: The basis vectors v₁, v₂ are said to define a positive orientation on ℝ² if and only if θ(v₁, v₂) > 0. Now, by analogy, a nowhere-zero 2-form ω on the surface S defines an orientation on S: For each point a ∈ S, the tangent vectors u and v at a will form a positively oriented basis for the tangent plane if and only if ω(a)(u, v) > 0. (We will abuse notation as follows: Given an orientation on S, i.e., a compatible choice of positively oriented basis for each tangent plane of S, we will say ω > 0 on S if the value of ω on that basis is positive and ω < 0 if the value is negative.) Orienting the sphere as pictured in Figure 4.4, we now see from Figure 4.5 that dx ∧ dy > 0 on the upper hemisphere, whereas dx ∧ dy < 0 on the lower. This explains the sign disparity in the two calculations in part c of the preceding example.

Figure 4.5
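The two computations in part c can be replayed symbolically. The following sketch (ours, not from the text; sympy is assumed) shows the spherical parametrization giving +2π/3 on the lower hemisphere while the disk parametrization gives −2π/3, exactly the orientation issue just described.

    import sympy as sp

    phi, theta, r = sp.symbols('phi theta r', positive=True)

    # Spherical parametrization: pull back z dx ^ dy.
    x, y, z = sp.sin(phi)*sp.cos(theta), sp.sin(phi)*sp.sin(theta), sp.cos(phi)
    jac = sp.Matrix([[sp.diff(x, phi), sp.diff(x, theta)],
                     [sp.diff(y, phi), sp.diff(y, theta)]]).det()
    pullback = sp.simplify(z*jac)          # sin(phi)*cos(phi)**2
    print(sp.integrate(pullback, (phi, sp.pi/2, sp.pi), (theta, 0, 2*sp.pi)))
    # 2*pi/3

    # Disk parametrization of the lower hemisphere: z = -sqrt(1 - r^2).
    print(sp.integrate(-sp.sqrt(1 - r**2)*r, (r, 0, 1), (theta, 0, 2*sp.pi)))
    # -2*pi/3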

► EXAMPLE 3

The standard example of a nonorientable surface is the Möbius strip, pictured in Figure 4.6. Observe that if you slide the positive basis {u, v} once around the strip, it will return with the opposite orientation. Alternatively, if you start with an outward-pointing normal n and travel once around the Möbius strip, the normal returns pointing in the opposite direction. ◄

Figure 4.6

Definition If S is an oriented surface, its (oriented) area 2-form σ is the 2-form with the property that σ(a) assigns to each pair of tangent vectors at a the signed area of the parallelogram they span. (By signed area we mean the obvious: The pair of tangent vectors form a positively oriented basis if and only if the signed area is positive.)

4.1 Oriented Surfaces in ℝ³ and Flux

Let S ⊂ ℝ³ be an oriented surface with outward-pointing unit normal n = (n₁, n₂, n₃). Then we claim that

    σ = n₁ dy ∧ dz + n₂ dz ∧ dx + n₃ dx ∧ dy

is its area 2-form. This was the point of Exercise 8.2.5, but we give the argument here. If u and v are in the tangent plane to S, then

    σ(u, v) = det [n u v]

gives the signed volume of the parallelepiped spanned by n, u, and v. Since n is a unit vector orthogonal to u and v, this volume is the area of the parallelogram spanned by u and v; our definition of orientation dictates that the signs agree.

► EXAMPLE 4

Consider the surface of revolution S defined by z = f(r), 0 < r < a, oriented so that its outward-pointing normal has a positive e₃-component. We can parametrize S by

    g: (0, a) × (0, 2π) → ℝ³,  g(r, θ) = (r cos θ, r sin θ, f(r)).

Since the vector ∂g/∂r × ∂g/∂θ has a positive e₃-component, this is an appropriate parametrization. Now, the unit normal is

    n = (1/√(1 + f′(r)²)) (−f′(r) cos θ, −f′(r) sin θ, 1),

and so

    σ = (1/√(1 + f′(r)²)) (−f′(r) cos θ dy ∧ dz − f′(r) sin θ dz ∧ dx + dx ∧ dy).

Pulling back, we have

    g*σ = r √(1 + f′(r)²) dr ∧ dθ,

and so the surface area of S is given by

    ∫_{(0,a)×(0,2π)} g*σ = ∫₀^{2π} ∫₀^a r √(1 + f′(r)²) dr dθ,

which agrees with the formula usually derived in single variable integral calculus.
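As a sanity check (ours, not from the text; sympy is assumed), applying the displayed formula to the cone z = r of Example 1a recovers the familiar lateral area √2 πa².

    import sympy as sp

    r, theta, a = sp.symbols('r theta a', positive=True)
    f = r                                    # the cone z = r
    integrand = r*sp.sqrt(1 + sp.diff(f, r)**2)
    print(sp.integrate(integrand, (r, 0, a), (theta, 0, 2*sp.pi)))
    # sqrt(2)*pi*a**2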

► EXAMPLE 5

Given a plane n · x = c, with ‖n‖ = 1, then, assuming n₃ ≠ 0, we can give a parametrization by thinking of the plane as a graph over the xy-plane:

    g(x, y) = (x, y, (1/n₃)(c − n₁x − n₂y)).

Then

    g*σ = n₁ ((n₁/n₃) dx ∧ dy) + n₂ ((n₂/n₃) dx ∧ dy) + n₃ dx ∧ dy = (1/n₃) dx ∧ dy.

Recall that if u and v are two vectors in the plane, then σ(u, v) gives the signed area of the parallelogram they span, whereas (dx ∧ dy)(u, v) gives the signed area of its projection into the xy-plane. As we see from Figure 4.7, the area of the projection is |n₃| = |cos γ| times the area of the original parallelogram, where γ is the angle between the plane and the xy-plane, so the general theory is compatible with a more intuitive, geometric approach.

Figure 4.7
Given a vector field F = (F₁, F₂, F₃) on an open subset of ℝ³, we saw in Section 3 that integrating the 1-form ω = F₁ dx + F₂ dy + F₃ dz along an oriented curve computes the work done by F in moving a test particle along that curve. What is the meaning of integrating the corresponding 2-form η = F₁ dy ∧ dz + F₂ dz ∧ dx + F₃ dx ∧ dy over an oriented surface S? (The observant reader who's worked Exercise 8.2.9 will recognize that η = ⋆ω. See also Exercise 8.3.18.) Well, if u and v are tangent to S, then

    η(u, v) = det [F u v] = (F · n) × (signed area of the parallelogram spanned by u and v).

That is, ∫_S η represents the flux of F outward across S, often written ∫_S F·n dS. Here dS represents an element of (nonoriented) surface area, just as ds represented the element of (nonoriented) arclength on a curve; in neither case should these be interpreted as the exterior derivative of something.

A physical interpretation is the following: Imagine a fluid in motion (not depending on time), and let F(x) represent the velocity of the fluid at x multiplied by the density of the fluid at x. (Note that F points in the direction of the velocity and has units of mass/(area × time).) Then the mass of fluid that flows across a small area ΔS of S in a small amount of time Δt is approximately

    Δm ≈ δ ΔV ≈ δ (v Δt · n)(ΔS) ≈ (F · n) ΔS Δt,

so that

    Δm/Δt ≈ F · n ΔS.

Taking the limit as Δt → 0 and summing over the bits of area ΔS, we infer that ∫_S η represents the rate at which mass is transferred across S by the fluid flow.

► EXAMPLE 6

We wish to find the flux of the vector field F = (xz², yx², zy²) outward across the sphere S of radius a centered at the origin. That is, we wish to find the integral over S of the 2-form η = xz² dy ∧ dz + yx² dz ∧ dx + zy² dx ∧ dy. Calculating the pullback under the spherical coordinate parametrization g: (0, π) × (0, 2π) → ℝ³,

    g(φ, θ) = (a sin φ cos θ, a sin φ sin θ, a cos φ),

we have

    g*η = a⁵ ( sin φ cos θ cos² φ (sin² φ cos θ) + sin³ φ sin θ cos² θ (sin² φ sin θ)
               + cos φ sin² φ sin² θ (sin φ cos φ) ) dφ ∧ dθ
        = a⁵ ( sin³ φ cos² φ + sin⁵ φ cos² θ sin² θ ) dφ ∧ dθ,

and so

    ∫_S η = ∫_{(0,π)×(0,2π)} a⁵ (sin³ φ cos² φ + sin⁵ φ cos² θ sin² θ) dφ ∧ dθ
          = a⁵ ∫₀^{2π} ∫₀^π (sin³ φ cos² φ + sin⁵ φ cos² θ sin² θ) dφ dθ
          = 2πa⁵ ∫₀^π ( sin³ φ cos² φ + ⅛ sin⁵ φ ) dφ
          = 2πa⁵ ∫₀^π ( ⅛ sin φ + ¾ cos² φ sin φ − ⅞ cos⁴ φ sin φ ) dφ = (4π/5) a⁵. ◄
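This computation is a good candidate for symbolic verification. The sketch below (ours, not from the text; sympy is assumed) pulls η back by computing det[F g_φ g_θ] and integrates, reproducing 4πa⁵/5.

    import sympy as sp

    phi, theta, a = sp.symbols('phi theta a', positive=True)

    g = sp.Matrix([a*sp.sin(phi)*sp.cos(theta),
                   a*sp.sin(phi)*sp.sin(theta),
                   a*sp.cos(phi)])
    x, y, z = g
    F = sp.Matrix([x*z**2, y*x**2, z*y**2])

    # g*eta = det[F g_phi g_theta] dphi ^ dtheta
    M = sp.Matrix.hstack(F, g.diff(phi), g.diff(theta))
    flux = sp.integrate(sp.simplify(M.det()),
                        (phi, 0, sp.pi), (theta, 0, 2*sp.pi))
    print(sp.simplify(flux))   # 4*pi*a**5/5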

4.2 Surface Area

We have pilfered Figure 4.8 from someone who, in turn, plagiarized from the book Математический анализ на многообразиях by Михаил Спивак (that is, the Russian edition of Michael Spivak's Calculus on Manifolds). As this example, discovered by Hermann Schwarz, illustrates, one must be far more careful to define surface area by a limiting process than to define arclength of curves. It seems natural to approximate a surface by inscribed triangles. But, even as the triangles get smaller and smaller, the sum of their areas may go to infinity, even in the case of a surface as simplistic as a cylinder. In particular, by moving the planes of the hexagons closer together, the triangles become more and more orthogonal to the cylinder. The area of the individual triangles approaches a positive constant, and the number of triangles grows without bound.

For an oriented surface S ⊂ ℝ³, we can (and did) explicitly write down the 2-form σ that gives the oriented area-form on S. In analogy with our development of arclength of a
Figure 4.8

curve and our treatment of change of variables in Chapter 7, we next give a definition of surface area that will work for any parametrized surface. We need the result of Exercise 7.5.22: If u and v are vectors in ℝⁿ, the area of the parallelogram they span is given by

    √( det [ u·u  u·v
             v·u  v·v ] ).

(Here is the sketch of a proof. We may assume {u, v} is linearly independent, and let {v₃, …, vₙ} be an orthonormal basis for Span(u, v)^⊥. Then we know that the volume of the n-dimensional parallelepiped spanned by u, v, v₃, …, vₙ is the absolute value of the determinant of the matrix

    A = [ u  v  v₃  ⋯  vₙ ].

But by our choice of the vectors v₃, …, vₙ, this volume is evidently the area of the parallelogram spanned by u and v. And by Propositions 5.11 and 5.7 of Chapter 7, we have

    (det A)² = det(AᵀA) = det [ u·u  u·v  0  ⋯  0
                                v·u  v·v  0  ⋯  0
                                0    0    1  ⋯  0
                                ⋮              ⋮
                                0    0    0  ⋯  1 ] = det [ u·u  u·v
                                                            v·u  v·v ],

as required.) If g is a parametrization of a smooth surface, then for sufficiently small Δu and Δv, we expect that the area of the image g([u, u + Δu] × [v, v + Δv]) should be approximately the area of the parallelogram that is the image of this rectangle under the linear map Dg, and that, in turn, is Δu Δv times the area of the parallelogram spanned by ∂g/∂u and ∂g/∂v.
With this motivation, we now make the following

Definition Let S ⊂ ℝⁿ be a parametrized surface, given by g: Ω → ℝⁿ, for some region Ω ⊂ ℝ². Let

    E = ‖∂g/∂u‖²,  F = ∂g/∂u · ∂g/∂v,  G = ‖∂g/∂v‖².

We define the surface area of S to be

    area(S) = ∫_Ω √(EG − F²) dA_{uv}.

We leave it to the reader to check in Exercise 20 that for a parametrized, oriented surface in ℝ³ this gives the same result as integrating the area 2-form σ over the surface.
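For instance, for the sphere of radius a in spherical coordinates one finds E = a², F = 0, G = a² sin²φ, so that √(EG − F²) = a² sin φ and the definition returns the familiar 4πa². A small sketch (ours, not from the text; sympy is assumed):

    import sympy as sp

    phi, theta, a = sp.symbols('phi theta a', positive=True)

    g = sp.Matrix([a*sp.sin(phi)*sp.cos(theta),
                   a*sp.sin(phi)*sp.sin(theta),
                   a*sp.cos(phi)])
    gu, gv = g.diff(phi), g.diff(theta)
    E = sp.simplify(gu.dot(gu))        # a**2
    F = sp.simplify(gu.dot(gv))        # 0
    G = sp.simplify(gv.dot(gv))        # a**2*sin(phi)**2

    # sqrt(EG - F^2) = a^2 sin(phi), since sin(phi) >= 0 on (0, pi)
    integrand = a**2 * sp.sin(phi)
    print(sp.integrate(integrand, (phi, 0, sp.pi), (theta, 0, 2*sp.pi)))
    # 4*pi*a**2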

► EXERCISES 8.4

1. Let S be that portion of the plane x + 2y + 2z = 4 lying in the first octant, oriented with outward normal pointing upward. Find
(a) the area of S,
(b) ∫_S (x − y + 3z) σ,
(c) ∫_S z dx ∧ dy + y dz ∧ dx + x dy ∧ dz.
2. Find the area of that portion of the cylinder x² + y² = a² lying above the xy-plane and below the plane z = y.
3. Find the area of that portion of the cone z = √(2(x² + y²)) lying beneath the plane y + z = 1.
*4. Find the area of that portion of the cylinder x² + y² = 2y lying inside the sphere x² + y² + z² = 4.
#5. Let S be the sphere of radius a centered at the origin, oriented with normal pointing outward. Evaluate ∫_S x dy ∧ dz + y dz ∧ dx + z dx ∧ dy explicitly. What formula do you deduce for the surface area of S?
6. Let S be the surface of the unit sphere, and let its area element be σ.
(a) Calculate ∫_S x² σ directly.
(b) Evaluate the integral in part a without doing any calculations. (Hint: Why is ∫_S x² σ = ∫_S y² σ = ∫_S z² σ?)

7. Find the surface area of the torus given parametrically in Example 1b.
*8. Find the surface area of that portion of a sphere of radius a lying between two parallel planes (both intersecting the sphere) a distance h apart.
9. Let S be that portion of the helicoid given parametrically by

    g(u, v) = (u cos v, u sin v, v),  0 < u < 1, 0 < v < 2π.

(a) With the orientation determined by g, decide whether the outward-pointing normal points upward or downward.
(b) If we orient S with the normal pointing upward, compute ∫_S x dz ∧ dx.

10. We can parametrize the unit sphere (except for the north pole) by stereographic projection from the north pole, as indicated in Figure 4.9: (u, v, 0) is the point where the line through (0, 0, 1) and (x, y, z) (on the sphere) intersects the plane z = 0. Solve for u and v; then solve for g(u, v) = (x, y, z). Explain geometrically why stereographic projection is an orientation-reversing parametrization.

Figure 4.9

11. Let ω = x dy ∧ dz. Let S be the unit sphere, oriented with outward-pointing normal. Calculate ∫_S ω by parametrizing S
(a) by spherical coordinates,
(b) as a union of graphs,
(c) by stereographic projection (see Exercise 10).

12. Let S be the unit upper hemisphere, oriented with outward-pointing normal. Calculate ∫_S z σ by showing that z σ = dx ∧ dy as 2-forms on S.
13. Let S be the cylinder x² + y² = a², 0 ≤ z ≤ h, oriented with outward-pointing normal. Calculate ∫_S ω for
(a) ω = z dx ∧ dy,  (b) ω = y dx ∧ dz.
*14. Find the moment of inertia about the z-axis of a uniform spherical shell of radius a centered at the origin.
*15. Find the flux of the vector field F(x) = x outward across the following surfaces (all oriented with outward-pointing normal pointing away from the origin):
(a) the surface of the sphere of radius a centered at the origin,
(b) the surface of the cylinder x² + y² = a², −h ≤ z ≤ h,
(c) the surface of the cylinder x² + y² = a², −h ≤ z ≤ h, together with the two disks x² + y² ≤ a², z = ±h,
(d) the surface of the cube with vertices at (±1, ±1, ±1).

16. Find the flux of the vector field F = (x², y², z²) outward across the given surface S (all oriented with outward-pointing normal pointing away from the origin, unless otherwise specified):
(a) S is the sphere of radius a centered at the origin.
(b) S is the upper hemisphere of radius a centered at the origin.
(c) S is the cone z = √(x² + y²), 0 ≤ z ≤ 1, with outward-pointing normal having a negative e₃-component.
(d) S is the cylinder x² + y² = a², 0 ≤ z ≤ h.

(e) S is the cylinder x² + y² = a², 0 ≤ z ≤ h, along with the disks x² + y² ≤ a², z = 0 and z = h.

*17. Calculate the flux of the vector field F = (xz, yz, x² + y²) outward across the surface of the paraboloid S given by z = 4 − x² − y², z ≥ 0 (with outward-pointing normal having positive e₃-component).
*18. Find the flux of the vector field F(x) = x/‖x‖³ outward across the given surface (oriented with outward-pointing normal pointing away from the origin):
(a) the surface of the sphere of radius a centered at the origin,
(b) the surface of the cylinder x² + y² = a², −h ≤ z ≤ h,
(c) the surface of the cylinder x² + y² = a², −h ≤ z ≤ h, together with the two disks x² + y² ≤ a², z = ±h,
(d) the surface of the cube with vertices at (±1, ±1, ±1).

19. Let S be that portion of the cone z = √(x² + y²) lying inside the sphere x² + y² + z² = 2ax and oriented with normal pointing downward. Calculate ∫_S ω for
(a) ω = dx ∧ dy,
(b) ω = (x/z) dy ∧ dz + (y/z) dz ∧ dx − dx ∧ dy.
20. Suppose g: Ω → ℝ³ gives a parametrized, oriented surface with unit outward normal n. Let N = ∂g/∂u × ∂g/∂v, so that n = N/‖N‖. Check that

    g*(n₁ dy ∧ dz + n₂ dz ∧ dx + n₃ dx ∧ dy) = ‖N‖ du ∧ dv = √(EG − F²) du ∧ dv.
21. Sketch the parametrized surface g: [0, 2π] × [−1, 1] → ℝ³ given by

    g(u, v) = ((2 + v sin(u/2)) cos u, (2 + v sin(u/2)) sin u, v cos(u/2)).

Compare g*(dy ∧ dz) at (u, v) = (0, 0) and at (u, v) = (2π, 0). Explain.

*22. Consider the "flat torus"

    X = {(x₁, x₂, x₃, x₄) ∈ ℝ⁴ : x₁² + x₂² = 1, x₃² + x₄² = 1}.

Orient X so that dx₂ ∧ dx₄ > 0 at the point (1, 0, 1, 0) ∈ X. Calculate ∫_X ω for
(a) ω = dx₁ ∧ dx₂ + dx₃ ∧ dx₄,
(b) ω = dx₁ ∧ dx₃ + dx₂ ∧ dx₄,
(c) ω = x₂x₄ dx₁ ∧ dx₃.

23. Consider the cylinder S with equation x² + y² = 1, −1 ≤ z ≤ 1, oriented with unit normal pointing outward. Calculate
(a) ∫_S x dy ∧ dz − z dx ∧ dy,
(b) ∫_{C₁} xz dy and ∫_{C₂} xz dy (see Figure 4.10).
Compare your answers and explain.

Figure 4.10

24. Let S be the hemisphere x² + y² + z² = a², z ≥ 0, oriented with unit normal pointing upward. Let C be the boundary curve, x² + y² = a², z = 0, oriented counterclockwise. Calculate
(a) ∫_S dx ∧ dy + 2z dz ∧ dx,
(b) ∫_C x dy + z² dx.
Compare your answers and explain.
25. Construct two Möbius strips out of paper: For each, cut out a long rectangle, and attach the short edges with opposite orientations.
(a) Cut along the center circle of the first strip. What happens? Explain. What happens if you repeat the process?
(b) Make parallel cuts in the second strip one-third of the way from either edge. What happens? Explain.
26. Prove or give a counterexample: If S is an orientable surface, then there are exactly two possible orientations on S.

► 5 STOKES'S THEOREM

We now come to the generalization of Green's Theorem to higher dimensions. We first stop to make the official definition of the integral of a differential form over a compact, oriented manifold. So far we have dealt only with the integrals of 1- and 2-forms over parametrized curves and surfaces, respectively.

5.1 Integrating over a General Compact, Oriented k-Dimensional Manifold

We know how to integrate a k-form over a parametrized k-dimensional manifold by pulling back. In general, a manifold will be a union of parametrized pieces that overlap, and so summing the integrals will give a meaningless result. To solve this problem, we introduce one of the powerful tools in the study of manifolds, one that allows us to chop a global problem into local ones and then add up the answers.

We start with a

Definition A subset M ⊂ ℝⁿ is called a k-dimensional manifold with boundary if for each point p ∈ M there is an open set W ⊂ ℝⁿ containing p and a parametrization⁴ g: U → ℝⁿ so that
i. g(U) = V = W ∩ M; and
ii. U is an open subset either of ℝᵏ or of ℝᵏ₊ = {u ∈ ℝᵏ : u_k ≥ 0}.⁵

See Figure 5.1. We say p is a boundary point of M (written p ∈ ∂M) if p = g(u) for some u ∈ ∂ℝᵏ₊ = {u ∈ ℝᵏ : u_k = 0}.
g(U) is sometimes called a coordinate chart on M. A coordinate ball on M is the image of some ball under some parametrization.

As was the case with surfaces, an orientation on a manifold with boundary M ⊂ ℝⁿ is a continuously varying notion of what a positively oriented basis for the tangent space at each point should be. M has an orientation if and only if we can cover M by coordinate

⁴Recall from Section 3 of Chapter 6 that this means that g is a one-to-one smooth map from U to W ∩ M so that Dg(u) has rank k for every u ∈ U and g⁻¹: W ∩ M → U is continuous.
⁵We say U ⊂ ℝᵏ₊ is an open subset of ℝᵏ₊ if it is the intersection of ℝᵏ₊ with some open subset of ℝᵏ.

charts g: U → ℝⁿ so that {∂g/∂u₁, …, ∂g/∂u_k} is a positive basis for the tangent space of M at each point. We say M is orientable if there is some orientation on M.
We leave it to the reader to prove, using Theorem 5.1, that M is orientable if and only if there is a nowhere-vanishing k-form on M (see Exercise 23). Then we can make the

Definition Let M be an oriented k-dimensional manifold with boundary. Its (oriented) volume form is the k-form σ with the property that σ(a) assigns to each k-tuple of tangent vectors at a the signed volume of the parallelepiped they span.

Now we come to the main technical tool that will enable us to define integration on manifolds.

Theorem 5.1 Let M ⊂ ℝⁿ be a compact k-dimensional manifold with boundary. Then there are smooth real-valued functions ρ₁, …, ρ_N on M so that

i. 0 ≤ ρᵢ ≤ 1 for all i;
ii. each ρᵢ is zero outside some coordinate ball;
iii. Σ_{i=1}^{N} ρᵢ = 1.

{ρᵢ} is called a partition of unity on M.

Proof

Step 1: Define h: ℝ → ℝ by

    h(x) = e^{−1/x} for x > 0,  h(x) = 0 for x ≤ 0.

Then h is smooth (in particular, all its derivatives at 0 are equal to 0, as we ask the reader to prove in Exercise 25). Set

    j(x) = ( ∫₀ˣ h(t)h(1 − t) dt ) / ( ∫₀¹ h(t)h(1 − t) dt ),

and define ψ: ℝᵏ → ℝ by ψ(x) = j(3 − 2‖x‖). Then ψ is a smooth function with ψ(x) = 1 whenever ‖x‖ ≤ 1 and ψ(x) = 0 whenever ‖x‖ ≥ 3/2; ψ is often called a bump function. (See Figure 5.2 for the graph of ψ for k = 1.)

Step 2: For each point p ∈ M, choose a coordinate chart whose domain is a ball of radius 2 in ℝᵏ (why can we do so?).⁶ The images of the balls of radius 1 obviously cover all of M; indeed, we can choose a sequence (countable number) of p's so that this is true. (See Exercise 26.) By Exercise 5.1.12, finitely many of these images of balls of radius 1, say, V₁, …, V_N, cover all of M. Let gᵢ: B(0, 2) → Vᵢ be the respective coordinate charts,

⁶For those p in the boundary, this will be a half-ball, i.e., the points in the ball with nonnegative kth coordinate.
Figure 5.2

and define φᵢ = ψ ∘ gᵢ⁻¹, interpreting φᵢ to be defined on all of M by letting it be 0 outside of Vᵢ (note that the fact that ψ is 0 outside the ball of radius 3/2 means that φᵢ will be smooth). Set

    ρᵢ = φᵢ / (φ₁ + ⋯ + φ_N).

Note that for each p ∈ M, we have p = gⱼ(u) for some j and some u ∈ B(0, 1), and hence φⱼ(p) = 1 for some j. Thus, the sum is everywhere positive. These functions ρᵢ fulfill the requirements of the theorem. ∎
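The functions of Step 1 are perfectly computable. Here is a small numerical sketch (ours, not from the text; Python with scipy is assumed) of h, j, and ψ in one variable:

    import numpy as np
    from scipy.integrate import quad

    def h(x):
        return np.exp(-1.0/x) if x > 0 else 0.0

    # j(x) = (int_0^x h(t)h(1-t) dt) / (int_0^1 h(t)h(1-t) dt)
    denom = quad(lambda t: h(t)*h(1-t), 0, 1)[0]

    def j(x):
        if x <= 0: return 0.0
        if x >= 1: return 1.0
        return quad(lambda t: h(t)*h(1-t), 0, x)[0] / denom

    def psi(x):
        # The bump function: 1 for |x| <= 1, 0 for |x| >= 3/2.
        return j(3 - 2*abs(x))

    for x in (0.0, 0.9, 1.2, 1.6):
        print(x, psi(x))   # 1, 1, something strictly between 0 and 1, 0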

Now it is easy to define the integral. Let M ⊂ ℝⁿ be a compact, oriented k-dimensional manifold (with piecewise-smooth boundary). Let ω be a k-form on M.⁷ Let {ρᵢ} be a partition of unity, and let gᵢ be the corresponding parametrizations, which we may take to be orientation-preserving (how?). Now we set

    ∫_M ω = Σ_{i=1}^{N} ∫_M ρᵢω = Σ_{i=1}^{N} ∫_{B(0,2)} gᵢ*(ρᵢω).

The point is that the form ρᵢω is nonzero only inside the image of the parametrization gᵢ.

One last technical point. Let M be a k-dimensional manifold with boundary, and let p be a boundary point. The tangent space of ∂M at p is a (k − 1)-dimensional subspace of the tangent space of M at p, and its orthogonal complement is 1-dimensional. That 1-dimensional subspace has two possible basis vectors, called the inward- and outward-pointing normal vectors. By definition, if we follow a curve starting at p whose tangent vector is the inward-pointing normal, we move into M, as shown in Figure 5.3. We endow ∂M with an orientation, called the boundary orientation, by saying that the outward normal, n, followed by a positively oriented basis for the tangent space of ∂M should provide a positively oriented basis for the tangent space of M. For examples, see Figure 5.4. We ask

⁷We are being a bit casual about what a smooth function or k-form on M ought to mean. We might start with something defined on a neighborhood of M in ℝⁿ or, instead, we might just know the pullbacks under coordinate charts are smooth. Because of Theorem 3.1 of Chapter 6, these notions are equivalent. We leave the technical details to a more advanced course. In practice, except for results such as Theorem 5.1, we will usually start with objects defined on ℝⁿ anyhow.
Figure 5.3

Figure 5.4

the reader to check in Exercise 1 that the boundary orientation on ∂ℝᵏ₊ is the usual one on ℝᵏ⁻¹ precisely when k is even.

5.2 Stokes's Theorem

Now we come to the crowning result. We will give various physical interpretations and applications in the next section, as well as some applications to topology in the last section of the chapter. Here we will give the theorem and some concrete examples.

Theorem 5.2 (Stokes's Theorem) Let M be a compact, oriented k-dimensional manifold with boundary, and let ω be a smooth (k − 1)-form on M. Then

    ∫_{∂M} ω = ∫_M dω.

(Here ∂M is endowed with the boundary orientation, as described above.)

Remark Note that the usual Fundamental Theorem of Calculus, the Fundamental Theorem of Calculus for Line Integrals (Proposition 3.1), and Green's Theorem (Corollary 3.5) are all special cases of this theorem. When we're orienting the boundary of an oriented line segment, we assign a + when the outward-pointing normal agrees with the orientation on the segment, and a − when it disagrees. This is compatible with the signs in

    ∫_a^b f′(t) dt = f(b) − f(a).

Proof Since both sides of the desired equation are linear in ω, we can (by using a partition of unity) reduce to the case that ω is zero outside of a compact subset of a single coordinate chart, g: U → ℝⁿ (where U is open in either ℝᵏ or ℝᵏ₊). Then we have

    ∫_U g*(dω) = ∫_U d(g*ω).

g*ω, being a (k − 1)-form on U ⊂ ℝᵏ, can be written as follows:

    g*ω = Σ_{i=1}^{k} f_i(x) dx₁ ∧ ⋯ ∧ d̂x_i ∧ ⋯ ∧ dx_k,

where d̂x_i indicates that the dx_i term is omitted. So we have

    d(g*ω) = Σ_{i=1}^{k} (∂f_i/∂x_i) dx_i ∧ dx₁ ∧ ⋯ ∧ d̂x_i ∧ ⋯ ∧ dx_k
           = Σ_{i=1}^{k} (−1)^{i−1} (∂f_i/∂x_i) dx₁ ∧ ⋯ ∧ dx_k.

Case 1: Suppose U is open in ℝᵏ; this means that ω = 0 on ∂M, and so we need only show that ∫_M dω = ∫_U d(g*ω) = 0. The crucial point is this: Since g*ω is smooth and 0 outside of a compact subset of U, we may choose a rectangle R containing U, as shown in Figure 5.5, and extend the functions f_i to functions on all of R by setting them equal to 0 outside of U. Finally, we integrate over R = [a₁, b₁] × ⋯ × [a_k, b_k]:

    ∫_U d(g*ω) = Σ_{i=1}^{k} (−1)^{i−1} ∫_R (∂f_i/∂x_i) dx₁ dx₂ ⋯ dx_k = 0,

since f_i = 0 everywhere on the boundary of R. (Note the applications of Fubini's Theorem and the traditional Fundamental Theorem of Calculus.)

Figure 5.5

Case 2: Now comes the more interesting situation. Suppose U is open in ℝᵏ₊, and once again we extend the functions f_i to functions on a rectangle R ⊂ ℝᵏ₊ by letting them be 0 outside of U. In this case, the rectangle is of the form R = [a₁, b₁] × ⋯ × [a_{k−1}, b_{k−1}] × [0, b_k], as we see in Figure 5.6. Now we have

Figure 5.6

    ∫_U d(g*ω) = (−1)^{k−1} ∫_R (∂f_k/∂x_k) dx₁ dx₂ ⋯ dx_k

(since all the other integrals vanish for the same reason as in Case 1)

    = (−1)^{k} ∫_{U∩∂ℝᵏ₊} f_k(x₁, …, x_{k−1}, 0) dx₁ ⋯ dx_{k−1}
    = ∫_{U∩∂ℝᵏ₊} g*ω = ∫_{∂M} ω,

as required. Note the crucial sign in the definition of the boundary orientation (see also Exercise 1). ∎

Remark Although we won't take the time to prove it here, Stokes's Theorem is also valid when the boundary, rather than being a manifold itself, is piecewise smooth, e.g., a union of smooth (k − 1)-dimensional manifolds with boundary intersecting along (k − 2)-dimensional manifolds. For example, we may take a cube or a solid cylinder, whose boundary is the union of a cylinder and two disks. The theorem also applies to such non-manifolds as a solid cone.

Corollary 5.3 Let M be a compact, oriented k-dimensional manifold without boundary. Let ω be an exact k-form; i.e., ω = dη for some (k − 1)-form η. Then ∫_M ω = 0.

Proof This is immediate from Case 1 of the proof of Theorem 5.2. ∎

► EXAMPLE 1

Let C be the intersection of the unit sphere x² + y² + z² = 1 and the plane x + 2y + z = 0, oriented counterclockwise as viewed from high above the xy-plane. We wish to evaluate ∫_C (z − x) dx + (x − y) dy + (y − z) dz.
We let ω = (z − x) dx + (x − y) dy + (y − z) dz and M be that portion of the plane x + 2y + z = 0 lying inside the unit sphere, oriented so that the outward-pointing normal has a positive e₃-component, as shown in Figure 5.7. Then ∂M = C, and by Stokes's Theorem we have

(∗)    ∫_C ω = ∫_M dω = ∫_M (dy ∧ dz + dz ∧ dx + dx ∧ dy).

Parametrizing the plane by projection on the xy-plane, we have M = g(D), where D is the interior of the ellipse 2x² + 4xy + 5y² = 1 (why?), and

    g(x, y) = (x, y, −x − 2y).

(The reader should check that g is an orientation-preserving parametrization.) Therefore,

    ∫_M dω = ∫_D g*(dω) = ∫_D 4 dx ∧ dy = 4 area(D).

Figure 5.7

Now, by Exercise 5.4.15 or by techniques we shall learn in Chapter 9, this ellipse has semimajor axis 1 and semiminor axis 1/√6, so, using the result of Exercise 8.3.12, its area is π/√6, and the integral is 4π/√6.
Alternatively, applying our discussion of flux in Section 4, we recognize the surface integral in (∗) as the flux of the constant vector field F = (1, 1, 1) outward across M. Since the unit normal of M is n = (1/√6)(1, 2, 1), we see that

    ∫_M (dy ∧ dz + dz ∧ dx + dx ∧ dy) = ∫_M F·n dS = (4/√6) area(M) = 4π/√6,

since M is, after all, a disk of radius 1.
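Claims like this are easily sanity-checked numerically. The sketch below (ours, not from the text; numpy is assumed, and the particular orthonormal basis v₁, v₂ for the plane is just one convenient choice) parametrizes C directly and integrates ω, reproducing 4π/√6 ≈ 5.13.

    import numpy as np

    # Right-handed orthonormal frame adapted to the plane x + 2y + z = 0.
    n  = np.array([1.0, 2.0, 1.0]) / np.sqrt(6)
    v1 = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)
    v2 = np.cross(n, v1)

    t = np.linspace(0, 2*np.pi, 200001)
    gamma  = np.outer(np.cos(t), v1) + np.outer(np.sin(t), v2)
    dgamma = np.outer(-np.sin(t), v1) + np.outer(np.cos(t), v2)

    x, y, z = gamma.T
    F = np.stack([z - x, x - y, y - z], axis=1)   # omega as a vector field
    integrand = np.sum(F * dgamma, axis=1)
    print(np.trapz(integrand, t), 4*np.pi/np.sqrt(6))   # both about 5.1302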

► EXAMPLE 2

Let S be the sphere x² + y² + (z − 1)² = 1, oriented in the customary fashion. We wish to evaluate ∫_S ω, where ω = xz dy ∧ dz + yz dz ∧ dx + z² dx ∧ dy. Let M be the compact 3-manifold whose boundary is S; i.e., M = {x ∈ ℝ³ : x² + y² + (z − 1)² ≤ 1}, oriented by the standard orientation on ℝ³. We apply Stokes's Theorem to M:

    ∫_S ω = ∫_M dω = ∫_M 4z dx ∧ dy ∧ dz = ∫_M 4z dV = 4z̄ vol(M) = 4 · 1 · (4π/3) = 16π/3.

(Recall that z̄ is the z-component of the center of mass of M.) Of course, we could compute the surface integral directly, parametrizing S by, for example, spherical coordinates centered at (0, 0, 1).

► EXAMPLE 3

Suppose we wish to calculate the flux of the vector field F = (xz, yz, x² + y²) outward across the surface of the paraboloid S given by z = 4 − x² − y², z ≥ 0 (with outward-pointing normal having positive e₃-component). That is, we want to compute the integral of ω = xz dy ∧ dz + yz dz ∧ dx + (x² + y²) dx ∧ dy. How might we do this with Stokes's Theorem? If ω were exact, i.e., if ω = dη for some 1-form η, then we would have ∫_S ω = ∫_{∂S} η; but since dω = 2z dx ∧ dy ∧ dz ≠ 0, we know that ω cannot be exact. What now?
If we attach the disk D = {x² + y² ≤ 4, z = 0} to S, then we have a (piecewise-smooth) closed surface, which bounds the region M = {0 ≤ z ≤ 4 − x² − y²} ⊂ ℝ³, as shown in Figure 5.8. Then we have ∂M = S ∪ D⁻ (where by D⁻ we mean the disk with outward-pointing normal given by −e₃). Applying Stokes's Theorem, we find

    ∫_{∂M} ω = ∫_M dω = ∫_M 2z dx ∧ dy ∧ dz = ∫_M 2z dV = ∫₀^{2π} ∫₀² ∫₀^{4−r²} 2rz dz dr dθ = (64/3)π.

But we are interested in the integral of ω only over the surface S. Since

    ∫_{∂M} ω = ∫_S ω − ∫_D ω

(where by D we mean the disk with its usual upward orientation), then we have

    ∫_S ω = (64/3)π + ∫_D (x² + y²) dx ∧ dy = (64/3)π + ∫₀^{2π} ∫₀² r³ dr dθ = (64/3)π + 8π = (88/3)π.

We leave it to the reader to check this by a direct calculation (see Exercise 8.4.17).

Figure 5.8
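The direct calculation suggested above can be carried out symbolically. A sketch (ours, not from the text; sympy is assumed), parametrizing the paraboloid in polar coordinates:

    import sympy as sp

    r, th = sp.symbols('r theta', positive=True)
    g = sp.Matrix([r*sp.cos(th), r*sp.sin(th), 4 - r**2])
    x, y, z = g
    F = sp.Matrix([x*z, y*z, x**2 + y**2])

    N = g.diff(r).cross(g.diff(th))   # normal; its e3-component is r > 0
    flux = sp.integrate(F.dot(N), (r, 0, 2), (th, 0, 2*sp.pi))
    print(flux)                        # 88*pi/3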

► EXAMPLE 4

We come now to the 3-dimensional analogue of Example 9 of Section 3. It will play a major role in physical and topological applications in upcoming sections. Consider the 2-form

    ω = (x dy ∧ dz + y dz ∧ dx + z dx ∧ dy) / (x² + y² + z²)^{3/2},

which is defined and smooth on ℝ³ − {0}. The astute reader may recognize that on a sphere of radius a centered at the origin, ω is 1/a² times the area 2-form.
Pulling back by the spherical coordinates parametrization given on p. 329, with a bit of work we see that

    g*ω = sin φ dφ ∧ dθ,

which establishes again the geometric interpretation of ω. It is also clear that d(g*ω) = 0; since det Dg ≠ 0 whenever ρ ≠ 0 and φ ≠ 0, π, it follows that dω = 0. (Of course, it isn't too hard to calculate this directly.)
So here we have a 2-form whose integral over any sphere centered at the origin (with outward-pointing normal) is 4π, and yet, for any ball B centered at the origin, ∫_B dω = 0. What happened to Stokes's Theorem? The problem is that ω is not defined, let alone smooth, on all of B.
But there is more to be learned here. If Ω ⊂ ℝ³ is a compact 3-manifold with boundary with 0 ∉ ∂Ω, then we claim that

    ∫_{∂Ω} ω = 4π if 0 ∈ Ω, and 0 if 0 ∉ Ω,

rather like what happened with the winding number in Example 10 of Section 3. When 0 ∉ Ω, we know that ω is a (smooth) 2-form on all of Ω, and hence Stokes's Theorem applies directly to give

    ∫_{∂Ω} ω = ∫_Ω dω = 0.

When 0 ∈ Ω, however, we choose ε > 0 small enough so that the closed ball B(0, ε) ⊂ Ω, and we let Ω_ε = Ω − B(0, ε), as pictured in Figure 5.9, recalling that ∂Ω_ε = ∂Ω + S_ε⁻. (Here S_ε denotes the sphere of radius ε centered at 0, with its usual outward orientation.) Then ω is a smooth form defined on all of Ω_ε and we have

    0 = ∫_{Ω_ε} dω = ∫_{∂Ω_ε} ω = ∫_{∂Ω} ω − ∫_{S_ε} ω.

Therefore, we have

    ∫_{∂Ω} ω = ∫_{S_ε} ω = 4π,

as we learned above.

Figure 5.9
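One can test the claim on a region that is not a ball. The following numerical sketch (ours, not from the text; scipy is assumed) integrates ω over the boundary of the cube [−1, 1]³; by symmetry it suffices to compute the flux through one face and multiply by 6.

    import numpy as np
    from scipy.integrate import dblquad

    # On the face z = 1 of the cube [-1,1]^3, omega restricts to
    # z/(x^2 + y^2 + z^2)^(3/2) dx ^ dy; all six faces contribute equally.
    face, _ = dblquad(lambda y, x: 1.0/(x*x + y*y + 1.0)**1.5,
                      -1, 1, -1, 1)
    print(6*face, 4*np.pi)   # both about 12.566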

► EXERCISES 8.5

*1. Check that the boundary orientation on ∂ℝᵏ₊ is (−1)ᵏ times the usual orientation on ℝᵏ⁻¹.
2. Let C be the intersection of the cylinder x² + y² = 1 and the plane 2x + 3y − z = 1, oriented counterclockwise as viewed from high above the xy-plane. Evaluate

    ∫_C y dx − 2z dy + x dz

directly and by applying Stokes's Theorem.

*3. Compute ∫_C (y − z) dx + (z − x) dy + (x − y) dz, where C is the intersection of the cylinder x² + y² = a² and the plane x/a + z/b = 1, oriented clockwise as viewed from high above the xy-plane.

4. Let C be the intersection of the sphere x² + y² + z² = 2 and the plane z = 1, oriented counterclockwise as viewed from high above the xy-plane. Evaluate

    ∫_C (−y³ + z) dx + (x³ + 2y) dy + (y − x) dz.

5. Let C be the intersection of the sphere x² + y² + z² = a² and the plane x + y + z = 0, oriented counterclockwise as viewed from high above the xy-plane. Evaluate

    ∫_C 2z dx + 3x dy − dz.

*6. Let Ω ⊂ ℝ³ be the region bounded above by the sphere x² + y² + z² = a² and below by the plane z = 0. Compute

    ∫_{∂Ω} xz dy ∧ dz + yz dz ∧ dx + (x² + y² + z²) dx ∧ dy

directly and by applying Stokes's Theorem.
7. Let ω = y² dy ∧ dz + x² dz ∧ dx + z² dx ∧ dy, and let M be the solid paraboloid 0 ≤ z ≤ 1 − x² − y². Evaluate ∫_{∂M} ω directly and by applying Stokes's Theorem.
8. Let M be the surface of the paraboloid z = 1 − x² − y², z ≥ 0, oriented so that the outward-pointing normal has positive e₃-component. Let F = (x²z, y²z, x² + y²). Evaluate ∫_M F·n dS directly and by applying Stokes's Theorem. Be careful!
9. Let M be the surface pictured in Figure 5.10, with boundary curve x² + y² = 4, z = 0. Calculate ∫_M yz dy ∧ dz + x³ dz ∧ dx + y² dx ∧ dy.
10. Suppose M and M′ are two compact, oriented k-dimensional manifolds with boundary, and suppose ∂M = ∂M′ (as oriented (k − 1)-dimensional manifolds). Prove that for any (k − 1)-form ω, ∫_M dω = ∫_{M′} dω.

11. Use the result of Exercise 10 to compute ∫_M dω for the given surface M and 1-form ω.
(a) M is the upper hemisphere x² + y² + z² = a², z ≥ 0, oriented with outward-pointing normal having positive e₃-component; ω = (x³ + 3x²y − y) dx + (y³z + x + x³) dy + (x² + y² + z) dz.
(b) M is that portion of the paraboloid z = x² + y² lying beneath z = 4, oriented with outward-pointing normal having negative e₃-component; ω = y dx + z dy + x dz.

Figure 5.10

(c) M is the union of the cylinder x² + y² = 1, 0 ≤ z ≤ 2, and the disk x² + y² ≤ 1, z = 0, oriented so that the normal to the cylindrical portion points radially outward; ω = −y³z dx + x³z dy + x²y² dz.
12. Let M = {x ∈ ℝ⁴ : x₁² + x₂² + x₃² ≤ x₄ ≤ 1}, with the standard orientation inherited from ℝ⁴. Evaluate ∫_{∂M} ω:
*(a) ω = (x₂²x₃² + x₄) dx₁ ∧ dx₂ ∧ dx₃,
(b) ω = ‖x‖² dx₁ ∧ dx₂ ∧ dx₃.
13. Redo Exercise 8.4.22c by applying Stokes's Theorem.
14. Suppose f is a smooth function on a compact 3-manifold with boundary M ⊂ ℝ³. At a point of ∂M, let D_n f denote the directional derivative of f in the direction of the unit outward normal. Show that

    ∫_{∂M} D_n f dS = ∫_M ∇²f dV,

where ∇²f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z² is the Laplacian of f. (Hint: ∇²f dx ∧ dy ∧ dz = d⋆df. See Exercise 8.2.9.)
15. Let S be that portion of the cylinder x² + y² = a² lying above the xy-plane and below the sphere x² + (y − a)² + z² = 4a². Let C be the intersection of the cylinder and sphere, oriented clockwise as viewed from high above the xy-plane.
(a) Evaluate ∫_S z dS.
(b) Use your answer to part a to evaluate ∫_C y(z² − 1) dx + x(1 − z²) dy + z² dz.
16. Let S be that portion of the cylinder x² + y² = a² lying above the xy-plane and below the sphere (x − a)² + y² + z² = 4a². Let C be the intersection of the cylinder and sphere, oriented clockwise as viewed from high above the xy-plane.
(a) Evaluate ∫_S z² dS.
(b) Use your answer to part a to evaluate ∫_C y(z³ + 1) dx − x(z³ + 1) dy + z dz.

17. Let M = … ⊂ ℝ⁴, oriented so that … > 0 on M. Evaluate ∫_M (y₂ − x₁²) dx₂ ∧ dy₁. (Hint: By applying an appropriate linear transformation, you should be able to recognize M as a torus.)
*18. Let C be the intersection of the sphere x² + y² + z² = 1 and the plane x + y + z = 0, oriented counterclockwise as viewed from high above the xy-plane. Evaluate

    ∫_C z³ dx.

(Hint: Give an orthonormal basis for the plane x + y + z = 0, and use polar coordinates.)
19. Let C be the intersection of the sphere x² + y² + z² = 1 and the plane x + y + z = 0, oriented counterclockwise as viewed from high above the xy-plane. Evaluate

    ∫_C xy² dx + yz² dy + zx² dz.

(Hint: See Exercise 18.)
20. Suppose ω ∈ A^{k−2}(ℝᵏ). Complete the following proof that d(dω) = 0. Write d(dω) = f(x) dx₁ ∧ ⋯ ∧ dx_k, and suppose f(a) > 0. By considering the integral of d(dω) over a small ball centered at a and applying Corollary 5.3, arrive at a contradiction.
21. We saw in Example 8 of Section 3 that there are 1-forms ω on ℝ² with the property that for every region S ⊂ ℝ² we have area(S) = ∫_{∂S} ω. Can there be such a 1-form on
(a) the unit sphere?
(b) the torus?
(c) the punctured sphere (i.e., the sphere less the north pole)?
22. In this exercise we sketch a proof that the graph of a function f satisfying the minimal surface equation (see p. 124) on a region Ω ⊂ ℝ² has less area than any other surface with the same boundary curve.⁸
(a) Consider the area 2-form σ of the graph of f:

    σ = (1/√(1 + ‖∇f‖²)) ( −(∂f/∂x) dy ∧ dz − (∂f/∂y) dz ∧ dx + dx ∧ dy ).

Show that dσ = 0 if and only if f satisfies the minimal surface equation.
(b) Show that for any compact oriented surface N ⊂ ℝ³, ∫_N σ ≤ area(N), and equality holds if and only if N is parallel to the graph of f. (Hint: Interpret ∫_N σ as a flux integral.)
(c) Let M be the graph of f over Ω, and let N be a different oriented surface with ∂N = ∂M. Deduce that area(M) ≤ area(N).
23. (a) Prove that M is an orientable k-dimensional manifold with boundary if and only if there is a nowhere-zero k-form on M. (Hint: For ⟹, use definition (1) of a manifold on p. 262 and a partition of unity to glue together compatibly chosen forms on coordinate charts. Although we've only proved Theorem 5.1 for a compact manifold M, the proof can easily be adapted to show that for any manifold M and any covering {Vᵢ} by coordinate charts, we have a sequence of such functions ρᵢ, each of which is zero outside some Vⱼ.)
(b) Conclude that M is orientable if and only if there is a volume form globally defined on M.

⁸This is an illustration of the use of calibrations, introduced by Reese Harvey and Blaine Lawson in their seminal paper, Calibrated Geometries, Acta Math. 148 (1982), pp. 47–157.

24. Let M be a compact, orientable k-dimensional manifold (with no boundary), and let ω be a (k − 1)-form. Show that dω = 0 at some point of M. (Hint: Using Exercise 23, write dω = fσ, where σ is the volume form of M. Without loss of generality, you may assume M is connected. Why?)
25. Let h(x) = e^{−1/x} for x > 0 and h(x) = 0 for x ≤ 0. Because exponential functions grow faster at infinity than any polynomial, it should be plausible that all the derivatives of h at 0 are 0. But give a rigorous proof as follows:
(a) Let f(x) = e^{−1/x}, x > 0. Prove by induction that the kth derivative of f is given by f^{(k)}(x) = e^{−1/x} p_k(1/x) for some polynomial p_k of degree 2k.
(b) Prove by induction that h^{(k)}(0) = 0 for all k ≥ 0.
26. Let X ⊂ ℝⁿ. Prove that given any collection {V_α} of open subsets of ℝⁿ whose union contains X, there is a sequence V_{α₁}, V_{α₂}, … of these sets whose union contains X. (Hint: Consider all balls B(q, 1/k) ⊂ ℝⁿ (for some k ∈ ℕ) centered at points q ∈ ℝⁿ all of whose coordinates are rational. This collection is countable, i.e., can be arranged in a sequence. Show that we can choose such balls B(qᵢ, 1/kᵢ), i = 1, 2, …, covering all of X with the additional property that each is contained in some V_{αⱼ}.)

► 6 APPLICATIONS TO PHYSICS

6.1 The Dictionary in ℝ³

We have already seen that a vector field in ℝ³ can plausibly be interpreted as either a 1-form or a 2-form, the former when we are calculating work, the latter when we are calculating flux. We have already seen that for any function f, the 1-form df corresponds to the vector field ∇f. We want to give the traditional interpretations of the exterior derivative as it acts on 1- and 2-forms.

Given a 1-form ω = F₁ dx₁ + F₂ dx₂ + F₃ dx₃ ∈ A¹(ℝ³), we have

    dω = (∂F₃/∂x₂ − ∂F₂/∂x₃) dx₂ ∧ dx₃ + (∂F₁/∂x₃ − ∂F₃/∂x₁) dx₃ ∧ dx₁
         + (∂F₂/∂x₁ − ∂F₁/∂x₂) dx₁ ∧ dx₂.

(We stick to the subscript notation here to make the symmetries as clear as possible.) Correspondingly, given the vector field F = (F₁, F₂, F₃), we set

    curl F = ( ∂F₃/∂x₂ − ∂F₂/∂x₃, ∂F₁/∂x₃ − ∂F₃/∂x₁, ∂F₂/∂x₁ − ∂F₁/∂x₂ ).

Note first of all that d² = 0 tells us that

    curl(∇f) = 0 for all C² functions f.

In somewhat older books one often sees the notation "rot," rather than "curl"; both terms suggest that we think of curl F as having something to do with rotation (curling).
Stokes's Theorem can now be phrased in the following classical form:

Theorem 6.1 (Classical Stokes's Theorem) Let S ⊂ ℝ³ be a compact, oriented surface with boundary. Let F be a smooth vector field defined on all of S. Then we have

    ∫_{∂S} F·T ds = ∫_S curl F · n dS

(the left- and right-hand sides being ∫_{∂S} ω and ∫_S dω, respectively, for the corresponding 1-form ω).

If we return to our discussion of flux in Section 4 and visualize F as the velocity field of a fluid, then the line integral ∫_C F·T ds around a closed curve C may be interpreted as the circulation of F around C, which we might visualize as a measure of the tendency of a piece of wire in the shape of C to turn (or circulate) when dropped in the fluid. Applying the theorem with S = D_r, a 2-dimensional disk of radius r centered at a with normal vector n, and using continuity (see Exercise 7.1.7), we have

    curl F(a) · n = lim_{r→0⁺} (1/πr²) ∫_{∂D_r} F·T ds.

In particular, if, as pictured in Figure 6.1, we stick a very small paddlewheel (of radius r) in the fluid, it will spin the fastest when the axle points in the direction of curl F (and, at least in the limit, won't spin at all when the axle is orthogonal to curl F). Indeed, if the fluid, and hence the paddlewheel, is spinning about an axis with angular speed ν, then ‖curl F‖ = 2ν (see Exercise 1).
Now, given the 2-form

    ω = F₁ dx₂ ∧ dx₃ + F₂ dx₃ ∧ dx₁ + F₃ dx₁ ∧ dx₂ ∈ A²(ℝ³)

(which happens to be obtained by applying the star operator, defined in Exercise 8.2.9, to our original 1-form), then

    dω = (∂F₁/∂x₁ + ∂F₂/∂x₂ + ∂F₃/∂x₃) dx₁ ∧ dx₂ ∧ dx₃.

Figure 6.1

Correspondingly, given the vector field F = (F₁, F₂, F₃), we set

    div F = ∂F₁/∂x₁ + ∂F₂/∂x₂ + ∂F₃/∂x₃;

"div" is short for divergence, a term that is à propos, as we shall soon see. In this case, d² = 0 can be restated as

    div(curl F) = 0 for all C² vector fields F.
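Both identities are mechanical to verify. Here is a sketch (ours, not from the text; sympy is assumed) with an arbitrary symbolic function f and vector field F:

    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3')
    f = sp.Function('f')(x1, x2, x3)
    F = [sp.Function('F%d' % i)(x1, x2, x3) for i in (1, 2, 3)]

    def grad(f):
        return [sp.diff(f, v) for v in (x1, x2, x3)]

    def curl(F):
        return [sp.diff(F[2], x2) - sp.diff(F[1], x3),
                sp.diff(F[0], x3) - sp.diff(F[2], x1),
                sp.diff(F[1], x1) - sp.diff(F[0], x2)]

    def div(F):
        return sp.diff(F[0], x1) + sp.diff(F[1], x2) + sp.diff(F[2], x3)

    print([sp.simplify(c) for c in curl(grad(f))])   # [0, 0, 0]
    print(sp.simplify(div(curl(F))))                 # 0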

Stokes's Theorem now takes the following form, sometimes called Gauss's Theorem:

Theorem 6.2 (Classical Divergence Theorem) Suppose F is a smooth vector field on a compact 3-manifold with boundary, Ω ⊂ ℝ³. Then

    ∫_{∂Ω} F·n dS = ∫_Ω div F dV.

Once again, we get from this a limiting interpretation of the divergence: Applying Exercise 7.1.7, we find

(∗)    div F(a) = lim_{r→0⁺} (3/(4πr³)) ∫_{∂B(a,r)} F·n dS.

That is, div F(a) is a measure of the flux (per unit volume) outward across very small spheres centered at a. If that flux is positive, we can visualize a as a source of the field, with a net divergence of the fluid flow; if the flux is negative, we can visualize a as a sink, with a net confluence of the fluid. We shall see a beautiful alternative interpretation of the divergence in Chapter 9.

Given a vector field F (in the context of work) and the corresponding 1-form ω, applying the star operator introduced in Exercise 8.2.9 gives the 2-form ⋆ω corresponding to the same vector field F (in the context of flux), and vice versa. That is, when we have an oriented surface S, the 2-form ⋆ω gives the normal component of F times the area 2-form σ of S. In particular, if we start with a function f, then on S, ⋆df = (D_n f)σ, where D_n f = ∇f · n is the directional derivative of f in the normal direction.
We summarize the relation among forms and vector fields, the d operator and gradient, curl, and divergence in the following table:

    Differential Forms                      Fields

    0-forms    functions (scalar fields)
      | d        | grad
    1-forms    vector fields (work)         d² = 0:  curl(grad) = 0
      | d        | curl
    2-forms    vector fields (flux)         d² = 0:  div(curl) = 0
      | d        | div
    3-forms    functions (scalar fields)

6.2 Gauss's Law

In this passage we concentrate on inverse square forces, either gravitation (according to Newton's law of gravitation) or electrostatic attraction (according to Coulomb's law). We will stick with the notation of Newton's law of gravitation, as we discussed in Section 4 of Chapter 7: The gravitational attraction of a mass M at the origin on a unit test mass at position x is given by

    F(x) = −GM x/‖x‖³.

(Here G is the universal gravitation constant.) As we saw in Example 4 of Section 5, div F = 0 (except at the origin) and for any compact surface S ⊂ ℝ³ bounding a region Ω, we have

    ∫_S F·n dS = −4πGM if 0 ∈ Ω, and 0 otherwise.

(We must also stipulate that 0 ∉ S for the integral to make sense.) More generally, if F_a is the gravitational force field due to a point mass M at point a ∉ S, then

    ∫_S F_a·n dS = −4πGM if a ∈ Ω, and 0 otherwise.

If we have point masses M₁, …, M_k at points a₁, …, a_k, then the flux of the resultant gravitational force F = Σ_{j=1}^{k} F_{aⱼ} outward across the surface S (on which, once again, none of the point masses lies) is given by

    ∫_S F·n dS = −4πG Σ_{aⱼ∈Ω} Mⱼ.

Indeed, given a mass distribution with (integrable) density function δ on a region D, we can, in fact, write an explicit formula for the gravitational field (see Section 4 of Chapter 7):

(†)    F(x) = −G ∫_D δ(y) (x − y)/‖x − y‖³ dV_y.

(When x ∈ D, this integral is improper, yet convergent, as can be verified by using spherical coordinates centered at the point x.) It should come as no surprise, approximating the mass distribution by a finite set of point masses, that the flux of the resulting gravitational force F is given by

    ∫_S F·n dS = −4πG ∫_Ω δ dV = −4πGM,

where M is the mass inside S = ∂Ω. This is Gauss's law.
Using the limiting formula for divergence given in (∗) on p. 395, we see that, even if F isn't apparently smooth, it is plausible to define

    div F(x) = −4πG δ(x)

when δ is continuous on D (and div F(x) = 0 when x ∉ D).


Now we can determine, as did Newton (following the lines of Example 6 of Chapter 7, Section 4), the gravitational field F inside the earth, assuming, albeit incorrectly, that the earth is a ball of uniform density. Take the earth to be a ball of radius R centered at the origin and to have constant density and total mass M. Fix x with ‖x‖ = b < R. First of all, we have

    ∫_{∂B(0,b)} F·n dS = −4πG (mass of the earth inside B(0, b)) = −4πG (b³/R³) M.

Now, by symmetry, F points radially inward, and so

    ∫_{∂B(0,b)} F·n dS = −‖F‖ area(∂B(0, b)) = −‖F‖ (4πb²).

Thus, we have ‖F(x)‖ = (GM/R³)‖x‖. Since F is radial, we have

    F(x) = −(GM/R³) x.

It is often surprising to find that the gravitational force inside the earth is linear in the distance from the center. Notice that at the earth's surface, this analysis is in accord with the inverse-square nature of the field. (See Exercise 2.)
As an amusing application, we calculate the time required to travel in a perfectly frictionless tunnel inside the earth from one point on the surface to another. We suppose that we start the trip with zero speed. When the mass is at position x, the component of the gravitational force acting in the direction of the tunnel is

    −‖F‖ sin θ = −(GM/R³) u,

where u is the displacement of the mass from the center of the tunnel (see Figure 6.2). By Newton's second law, we have

    u″(t) = −(GM/R³) u(t).
Figure 6.2

The general solution is

    u(t) = a cos(√(GM/R³) t) + b sin(√(GM/R³) t).

If we start with the initial conditions u(0) = u₀ and u′(0) = 0, then we have

    u(t) = u₀ cos(√(GM/R³) t),

and we see that the mass reaches the opposite end of the tunnel after time

    t = π √(R³/GM).

As was pointed out to me my freshman year of college, this is rather less time than many of our commutes.
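With the measured values of G, M, and R for the earth (the numbers below are the standard ones, inserted by us as a check, not taken from the text), π√(R³/GM) comes to about 42 minutes:

    import math

    G = 6.674e-11        # m^3 kg^-1 s^-2
    M = 5.972e24         # kg
    R = 6.371e6          # m
    t = math.pi * math.sqrt(R**3 / (G * M))
    print(t / 60)        # approximately 42.2 (minutes)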

6.3 Maxwell's Equations

Let E denote the electric field, B the magnetic field, ρ the charge density, and J the current density. All of these are functions on (some region in) ℝ³ × ℝ (space-time), on which we use coordinates x, y, z, and t. The classical presentation of Maxwell's equations is the following system of four partial differential equations (ignoring various constants such as 4π and c, the speed of light):

    Gauss's law:             div E = ρ
    No magnetic monopoles:   div B = 0
    Faraday's law:           curl E = −∂B/∂t
    Ampère's law:            curl B = ∂E/∂t + J

These are all "differential" versions of equivalent "integral" statements obtained by applying Stokes's Theorem, as we already encountered Gauss's Law in the previous subsection. Briefly, suppose S is an oriented surface (perhaps imagined) and ∂S represents a wire. Then Faraday's law states that

    ∫_{∂S} E·T ds = −∫_S (∂B/∂t)·n dS = −(d/dt) ∫_S B·n dS

(using the result of Exercise 7.2.20 to differentiate under the integral sign); i.e., the voltage around the loop ∂S equals the negative of the rate of change of magnetic flux across the loop. (More colloquially, a moving magnetic field induces an electric field that in turn does work, namely, creates a voltage drop across the loop.) On the other hand, Ampère's law states that (in steady state, with no time variation)

    ∫_{∂S} B·T ds = ∫_S J·n dS;

i.e., the circulation of the magnetic field around the wire is the flux of the current density across the loop.
6 Applications to Physics 399

Let

co — (Eidx 4- E2dy 4- E3dz) Adt + (B]dy Adz + B2dz Adx 4- B3dx Ady).

Then
J f/dE3 dE2 , 3Bi\J J /dEi dE3 dB2\ ,
dco = I I —---------------- F ——)dy Adz+ ( —--------- ------ 1- —— )dz A dx
\\ dy dz dt / X dz dx dt /
/dE2 dEi dB3\ , , \ /3Bi dB2 dB3\
+ (— -I —— Jdx A dy j Adt 4- I —----- 1—------- 1—-— ) dx A dy A dz,
X dx dy dt f / \ dx dy dz /

and so we see that

dco = 0 4==> div B = 0 and curl E 4------ = 0.

Next, let

0 — —(E\dy A dz 4- E2dz A dx 4- E3dx A dy) 4- (Bidx 4- B2dy 4- B3dz) A dt.

(Using the star operator defined in Exercise 8.2.9, one can check that 0 — *co. The subtlety
is that we’re working in space-time, endowed with a Lorentz metric in which the standard
orthonormal basis {ei......... e<} has the property that 64 • 64 = — 1; this introduces a minus
sign so that *(dx a dt) = — dy A dz, etc.) Then an analogous calculation shows that

d0 = 0 <=> div E = 0 and curl B — = 0.


dt
This would hold, for example, in a vacuum, where p = 0 and J = 0. But, in general, the
first and last of Maxwell’s equations are equivalent to the equation

d0 — (J^dy A dz 4- J2dz Adx 4- Jjdx A dy) Adt — pdx Ady A dz.

Since dco = 0 on R4, there is a 1-form

a = a\dx 4- a2dy 4- a3dz — cpdt

so that da = co (see Exercise 8.7.12). Of course, a is far from unique; for any function
f, we will have d(a 4- df) = co as well. Let = a 4- df, where f is a solution of the
inhomogeneous wave equation

v2 * 92f (9ai . 9a2 , 9«3


\ dx dy dz dt)

This means that d*d/ = — d*a, and so d*/J = 0. If we write

ft = A\dx 4- A2dy 4- A3dz — <f)dt,

the condition that d*/S = 0 is equivalent to


(Mi dA2 dA3 d</>
(*) dx + dy dz + dt

Since df} = co, ^dfi = *co = 0, we calculate

d^djS = d0 = {J\dy Adz A- hdz Adx + J3dx A dy) Adt - pdx Ady A dz.
400 ► Chapter 8. Differential Forms and Integration on Manifolds

Using (*) to substitute

32Ai d2A2 d2A3 _ 920


dxdt dydt dzdt dt2 ’

"At"
we can check that solving Maxwell’s equations is equivalent to finding A = A2 and0
_ A3 _
satisfying the inhomogeneous wave equations

V2A - = -J and V20 - = -p.


ot£ dt£
Solving such equations is a standard topic in an upper-division course in partial differential
equations.

► EXERCISES 8.6
1. Write down the vector field F corresponding to a rotation counterclockwise about an axis in the
direction of the unit vector a with angular speed v, and check that curl F = 2va.
2. Using Gauss’s law, show that the gravitational field of a uniform ball outside the ball is that of a
point mass at its center.
3. (Green’s Formulas) Let f, g: Q -> R be smooth functions on a region Q C R3. Recall that
Dng denotes the directional derivative of g in the normal direction.
(a) Prove that
f (Dng)dS = f y2gdV
Jan J si
f f(Dng)dS = [ (/V2g + Vjf Vg)dV
Jan J si
f (fD.g-gD.f)dS = f (fV2g-gV2 f)dV.
JdSi J SI

(Hint: V2gdx A dy A dz = d*dg.)


(b) We say f is harmonic on Q if V2/ = 0 on Q. Prove that if f and g are harmonic on then
f (DDf)dS = 0
hsi
[ f(D.f)dS = f ||V/||2<iV
Jan Jn
[ f(Dag)dS = [ gUhDdS.
Jan Jan
4. (See Exercise 3.) Prove that if f and g are harmonic on a region Q and f = g on 3Q, then
f = g everywhere on Q. (Hint: Consider f — g.)
5. (a) Prove that g: R3 - {0} -> R, g(x) = 1/||x||, is harmonic. (See Exercise 3.)
(b) Prove that if f is harmonic on 5(0, 5) c R3, then f has the mean value property; f(0) is
the average of the values of f on the sphere of any radius r < R centered at 0. (Hint: Apply the
appropriate results of Exercise 3 with = {x : £ < ||x|| < r} and g as in part a; then let £ -> 0+.)
6 Applications to Physics 401

(c) Deduce the maximum principle for harmonic functions*. If f is harmonic on a region Q, then f
takes on its maximum value on dQ.
6. Let S C K3 be a closed, oriented surface. Using the formula (t) for the gravitational field F, show
that
(a) the flux of F outward across 5 is 0 when no points of D lie on or inside 5;
(b) the flux of F outward across S is —4k G f 8dV when all of D lies inside S.
Jd
(Hint: Change the order of integration.)
*7. Try to determine which of the vector fields pictured in Figure 6.3 have zero divergence and which
have zero curl. Justify your answers.
8. Let F be a smooth vector field on an open set U c R". A parametrized curve g is a flow line for a
vector field F if g'(r) = F(g(r)) for all t.
(a) Give a vector field with a closed flow line.
(b) Prove that if F is conservative, then it can have no closed flow line (other than a single point).
(c) Prove that if n = 2 and F has a closed flow line C, then div F must equal 0 at some point inside
C. (Hint: See Exercise 8.3.18.)

Figure 6.3
402 ► Chapter 8. Differential Forms and Integration on Manifolds

(e)

(g) (h)

Figure 6.3 (continued)

9. Let fl c R3 be a compact 3-manifold with boundary.


(a) Prove that / fndS = I NfdV. (Hint: Apply Stokes’s Theorem to each component.)
Jan Jo
(b) Deduce that / ndS = 0. Give a (geometric) plausibility argument for this result.
Jan
10. (Archimedes’s Law of Buoyancy) Prove that when a floating body in a uniform liquid is at
equilibrium, it displaces its own weight, as follows. Let fl denote the portion of the body that is
submerged.
(a) The force exerted by the pressure of the liquid on a planar piece of surface is directed inward
normal to the surface, and pressure is force per unit area. Deduce that the buoyancy force is given by
B = / -pndS, where p is the pressure.
JdSl
(b) Assuming that Vp = <5g, where 8 is the (constant) density of the liquid and g is the acceleration
of gravity, deduce that B = —Mg, where M is the mass of the displaced liquid. (Hint: Apply Exer­
cise 9.)
(c) Deduce the result.
11. Let v be the velocity field of a fluid flow, and let 8 be the density of the fluid. (These are both 61
functions of position and time.) Let F = 5v. The law of conservation of mass states that
[ 8dV = - [ F ■ ndS.
dt Jq Jan
7 Applications to Topology •< 403

Show that the validity of this equation for all regions £2 is equivalent to the equation of continuity:
98
div F + — = 0.
9t
(Hint: Use Exercise 7.2.20.)

12. Suppose a body £2 c R3 has (C2) temperature w I | at position x € £2 at time t. Assume that
the heat flow vector q = — KVu, where K is a constant (called the heat conductivity of the body);
the flux of q outward across an oriented surface 5 represents the rate of heat flow across S.
(a) Show that the rate of heat flow across 9£2 into £2 is T = / KV2ud V.
Jq
(b) Let c denote the heat capacity of the body; the amount of heat required to raise the temperature
of the volume AV by AT degrees is approximately (cAT)AV; thus, the rate at which the volume
A V absorbs heat is c~ A V. Conclude that the rate of heat flow into £2is7r = ( c—dV.
dt Jq 9t
(c) Deduce that the heat flow within £2 is governed by the partial differential equation c— = KV2u.
dt
13. Suppose £2 C R3 is a region and u: £2 x [0, oo) —> R is a C2 solution of the heat equation
/x\
V2m = —. Suppose u I | = 0 for all x e £2 and Dnu = 0 on 9 £2 (this means the region is insulated

along the boundary).


(a) Consider the “energy” E(t) = [ u2dV. Note that E(O) = 0. Prove that E'(t) < 0 (this means

that heat dissipates) and show that E(t) = 0 for all t > 0. (Hint: Use Exercise 7.2.20.)

(b) Prove that u I I = 0 for all x e £2 and all t > 0.

(c) Prove that if »i and u2 are two solutions of the heat equation that agree at t = 0 and agree on 3 £2
for all time t > 0, then they must agree for all time t > 0.
92u
14. Supposed C R3 is a region and w: R is a C2 solution of the wave equation V2u = —y.
/x\
Suppose that u I 1 = /(x) for all x e 9£2 and all t (e.g., in two dimensions, the drumhead is clamped

along the boundary of £2). Prove that die total energy


EW = 5/n((l)2 + l|v“l|2)dy

is constant. Here by Vh we mean the vector of derivatives with respect only to the space variables.

► 7 APPLICATIONS TO TOPOLOGY
We are going to give a brief introduction to the field of topology by using the techniques
of differential forms and Stokes’s Theorem to prove three rather deep theorems. The basic
ingredient of several of our proofs is the following. Let Sn denote the n-dimensional unit
sphere, Sn = {x e R”+1 : ||x|| = 1}, and Dn the closed unit ball, Dn = {x e R” : ||x|| < 1}.
(Then 3Dn+1 = Sn.)

Lemma 7.1 There is an n-form co on Sn whose integral is nonzero.


404 ► Chapter 8. Differential Forms and Integration on Manifolds

Proof It is easy to check directly that the volume form


n+1
(D — A • • • A dXi A ■ • • A c/Xn+1
j=l

is such a form. ■

Theorem 7.2 There is no smooth function r: Dn+1 —> Sn with the property that
r(x) ~ xfor all x e Sn.

Proof Suppose there were such an r. Letting co be an n-form on S", as in Lemma


7.1, we have
[ co= [ r*co= [ d(r*a>) = f r*(d<v) = 0,
Jsn Jsn JDn+i JD"+1
inasmuch as the only (n + l)-form on an n-dimensional manifold is 0 (and hence da> = 0).
But this is a contradiction since we chose a> with a nonzero integral. ■

Corollary 7.3 (Brouwer Fixed Point Theorem) Let f: Dn —> Dn be smooth. Then
there must be a point x G Dn so that f (x) = x; i.e., f must have a fixed point.

Proof Suppose it does not. Then for all x G D\ the points x and f (x) are distinct.
Define r: Dn -> Sn~l by setting r(x) to be the point where the ray starting at f(x) and
passing through x intersects the unit sphere, as shown in Figure 7.1. We leave it to the
reader to check in Exercise 1 that r is in fact smooth. By construction, whenever x g S"-1,
we have r(x) = x. By Theorem 7.2, no such function can exist, and hence f must have a
fixed point. ■

Topology is in some sense the study of continuous (or, in our case, smooth) deformations
of objects. An old saw is that a topologist is one who cannot tell the difference between
a doughnut and a coffee cup. This occurs because we can continuously deform one to the
other, assuming we have flexible, plastic objects: The “hole” in the doughnut becomes the
“hole” in the handle of the cup. The crucial notion here is the following:
7 Applications to Topology -4 405

Definition Suppose X C R" and Y c Rm. Let f: X -> Y and g: X -> Y be


(smooth) functions. We say they are (smoothly) homotopic if there is a (smooth) map
H: X x [0,1] -► Y so that H PJ = f (x) and H = g(x) for all x € X.

► EXAMPLE 1

The identity function f: Dn -> Dn, f (x) — x, is homotopic to the constant map g(x) = 0. We merely
set

The homotopy shrinks the unit ball gradually to its center.

► EXAMPLE!

Are the maps f, g: S1 -> S1 given by

cost cos2t
and
sinf sin 2?

homotopic? These parametrized curves wrap once and twice, respectively, around the unit circle,
so the winding numbers of these curves about the origin are 1 and 2, respectively. If we surmise
that the winding number should vary continuously as we continuously deform the curve, then we
guess that the curves cannot be homotopic. Let’s make this precise: Suppose there were a homotopy
H: S1 x [0,1] -> S1 between f and g. Let a) — —ydx + xdy € ^’(S1). Then

f *a> = and I g*ft) = 47T.


s1
We observe that,

/ H*<y = / <Z(H*ft>) = / W(da>) = 0


JaCS'xfO,!]) Js’xfO,!] Js'xto.l]

since any 2-form on S1 must be 0. On the other hand, as we see from Figure 7.2,
diS1 x [0,1]) = (S1 x {1})- U (S1 x {0}),

Figure 7.2
406 ► Chapter 8. Differential Forms and Integration on Manifolds

so

[ H*<o= [ Fa>- [ g*«.


Jd(slx[o,w Js1 Js*
In conclusion, if f and g are homotopic, then we must have

[ Fa) = [ g*co;
Js1 Js'
since 2n / 4t t , we infer that f and g cannot be homotopic.

In general, we have the following important result:

Proposition 7.4 Suppose X is a compact, oriented k-dimensional manifold and


f, g: X -> Y are homotopic maps. Then for any closed k-form co on Y, we have

I Fa)= I g*o>.
x Jx

Proof We leave this to the reader in Exercise 3. ■

By the way, it is time to give a more precise definition of the term “simply connected.”
A closed curve in R" is nothing other than the image of a map S1 -> X.

Definition We say X C R” is simply connected if every pair of points in X can be


joined by a path and every map f: S1 -> X is homotopic to a constant map.

Recall that a £-form a) is closed if da> = 0 and exact if co = drj for some (k — l)-form
y. As a consequence of Proposition 7.4, we have

Corollary 7.5 Suppose X is a simply connected manifold. Then every closed 1-form
co on X is exact.

Proof Let f: S1 -> X be a closed curve; f is homotopic to a constant map g. Since

g*w = 0, we infer that / f *w = 0. The result now follows from Theorem 3.2. ■
Js1

Note that this is the generalization of the local result we obtained earlier, Proposi­
tion 3.3.
Before moving on to our last topic, we stop to state and prove one of the cornerstones
of classical mathematics. We assume a modest familiarity with the complex numbers.

Theorem 7.6 (Fundamental Theorem of Algebra) Let n > 1 and ao.ai,...,


an-i e C; consider a polynomial p(z) = zn + an_\zn~x -I-------- F a\z + o q . Then p has
n roots in C (counting multiplicities).
7 Applications to Topology 4 407

Proof (We identify C with R2 for purposes of the vector calculus.9) Since

v dn-iz 1 4-... 4- aiz + ao


lim-------------------------------------- = 0,
z—>oo

there is R > 0 so that whenever |z| > R we have

^n—lZn 1 4" ■ • • + CL\Z 4" Oq


Zn
On38(0, R) we have a homotopy H: 3.8(0, R) x [0,1] -> C — {0} between p and g(z) =
zn given by

H( ) = zn 4- (1 - t)(an-izn 1 4- • • • 4- aiz 4- ao) = tg(z) 4- (1 - t)p(z).

The crucial issue is that, by the triangle inequality,

|z" 4- (1 - r)(an_izn-1 4- ■ •• 4-aiz4-ao)l > |znl ~ (1 -0|an_izn-1 4----- 4-aiz4-a0|


-|Z”I(1"D = T’

\ 4W J

so the function H indeed takes values in C — {0}.


. —ydx 4- xdy
Recall that the 1-form co = —' 2 ~ a c^ose^ f°nn 011 C — {0} = R2 — {0}.

R cos cos nt
( R ' t/ — &
sin nt
, 0 < t < 2t t , we see that

/> /•2tt
I g*a> = I (—(sinnt)(—nsinnt) + (cosnt)(ncosnt))dt== I ndt = 2nn,
JdB(0,R~) JO Jo

and hence, by Proposition 7.4, we have / _ p*a> = 2ztn as well. Now, suppose p had

no root in 8(0, R). Then p would actually be a smooth map from all of 8(0, R) to C — {0}
and we would have

2trn = / p*a> = I d(p*a>) = I p*(dcd) = 0,


JdB(0,R) Jb (0,R) Jb (0,R)
which is a contradiction. Therefore, p has at least one root in 8(0, R). The stronger
statement of the theorem follows easily by induction on n. ■

We can actually obtain a stronger, more localized version. We need the following
computational result, a more elegant proof of which is suggested in Exercise 8.

9Recall that complex numbers are of the form z = x 4- iy, x, y e R. We add complex numbers as vectors in R2,
and we multiply by using the distributive property and the rule i2 = — 1: If z = x 4- iy and w = u 4- iv, then
zw = (xu — yv) + i(xv + yu). It is customary to denote the length of the complex number z by |z|, and the
reader can easily check that \zw | = |z||w|. In addition, deMoivre’s formula tells us that ifz = r(cos 0 + i sin 0),
then zn = rn (cos n0 4- i sin n0).
408 ► Chapter 8. Differential Forms and Integration on Manifolds

Lemma 7.7 Let a> ~ {—ydx + xdy)/(x2 4- y2) € .A*1 (C — {0}), and suppose f and
g are smooth maps toC~ {0}. Then (fg)*w = f*a> 4- g*(o.

Proof Write f = u + iv and# = U + iV. Then fg = (uU — vV) + i(uV + vU),


and so, using the product rule and a bit of high school algebra, we obtain

-(uV + vU)d(uU - vV) + (uU - vV)d(uV 4- vU)


(uU - vV)2 + (uV 4- vU)2
—(uV 4- vU)d(uU - vV) + (uU - vV)d(uV + vU)
(u2 4- v2)(U2 4- V2)
—(uV + vU)(Udu - Vdv 4- udU - vdV) + (uU - vV)(Vdu 4- Udv + udV 4- vdU)
(u2 + v2)(U2 + V2)
(U2 4- V2)(—vdu 4- udv) 4- (w2 + v2)(—VdU + UdV)
(u2 4- v2)(U2 4- V2)

as required. ■

Now we have an intriguing application of winding numbers (see Section 3) that gives
a two-dimensional analogue of Gauss’s law from the preceding section. We make use of
the Fundamental Theorem of Algebra.

Proposition 7.8 Let p be a polynomial with complex coefficients. Let D C C be a


region so that no root of p lies on C = 3D. Then
1 f */—ydx+xdy\
2rr JcP \ x2 4- y2 )

is equal to the number of roots of p in D.

Proof As usual, let cu = (—ydx 4- xdy)/(x2 4- y2). Using Theorem 7.6, we factor
p(z.) — c(z - ri)(z - rf) • • • (z - rn), where c / 0 and r7- e C, j = 1,..., n, are the roots
of p. Let fj (z) = z - rj. Then we claim that

1, rj e D
2t t Jc 0, otherwise

The former is a consequence of Example 10 on p. 361; the latter follows from Corollary
7.5. Applying Lemma 7.7 repeatedly, we see that p*a) — f*a>, and so

is equal to the number of roots of p in D. ■


7 Applications to Topology -4 409

There are far-reaching generalizations of this result that you may learn about in a
differential topology or differential geometry course. An interesting application is the
study of how roots of a polynomial vary as we change the polynomial; see Exercise 9.
A vector field v on Sn is a smooth function v: S” -> Rn+1 with the property that
x • v(x) — 0 for all x. (That is, v(x) is tangent to the sphere at x.)

► EXAMPLES

There is an obvious nowhere-zero vector field on Sl, the unit circle, which we’ve seen many times in
this chapter:

~X2

Xi

Indeed, an analogous formula works on S2m 1 c R2m:

-x2
Xl

x2m—l x2m
\ x2m J
x2m—1

(If we visualize the vector field in the case of the circle as pushing around the circle, in the higher­
dimensional case, we imagine pushing in each of the orthogonal xix2-, x3x4-, x^-ix^-planes
independently.)

In contrast with the preceding example, however, it is somewhat surprising that there
is no nowhere-zero vector field on Sn when n is even. The following result is usually
affectionately called the Hairy Ball Theorem, as it says that we cannot “comb the hairs” on
an even-dimensional sphere.

Theorem 7.9 Any vector field on the unit sphere S2m must vanish somewhere.

Proof We proceed by contradiction. Suppose v were a nowhere-zero vector field on


S2m; we may assume (by normalizing) that ||v(x) || = 1 for all x e S2m. We now use the
vector field to define a homotopy between the identity map f: S2"1 -> S2"* and the antipodal
mapg: S2w -> S2m,g(x) = —x. Namely, we follow along the semicircle fromx to-x in the
direction of v(x), as pictured in Figure 7.3. To be specific, define H: S2"1 x [0,1] -> S2m
by

Clearly, H is a smooth function. Now we apply Proposition 7.4, using the form a) defined
in Lemma 7.1. In particular, we calculate g*w explicitly:
410 ► Chapter 8. Differential Forms and Integration on Manifolds

2m+l
g*<2) — g*( (— ly^XidXi A • • • /\dXi A • • • A
i=l
2/n+l
= A • • • A (,-dXi) A • • • A (-dX2m+l) = = -0).
1=1

Thus, we have

I a> = I f*a) = I g*n> = — I <w;


$2m+l J g2m+l J $2m+l J $2m+l

since / a> 0, we have arrived at a contradiction. ■


Jg2m+1

► EXERCISES 8.7
1. Check that the mapping r defined in the proof of Corollary 7.3 is in fact smooth.
2. Consider the maps f and g defined in Example 2 as maps from [0, 2t t ] to R2 (rather than to S1).
Determine whether they are homotopic.
3. Prove Proposition 7.4.
4. Let f: C -> C be given by f (z) = z4 — 3z + 9, and let □ — {|z | < 2}. Evaluate / f*(o, where,
Jan
as usual, <u = (—ydx + xdy)/(x2 + y2). How many roots does f have in Q?
5. Show that Corollary 7.3 need not hold on the following spaces:
(a) Sn, (c) a solid torus,
(b) the annulus {x e R2 : 1 < ||x|| < 2}, (d) Bn (the open unit ball).
6. Prove the following generalization of Theorem 7.2: Let M be any compact, orientable manifold
with boundary. Then there is no function f: M -> dM with the property that f(x) = x for all x e dM.
7. As pictured in Figure 7.4, let
Z = {x2 + y2 = 1, z = 0} U {x = y = 0} U {x = z = 0, y > 1} C R3.
7 Applications to Topology 411

Suppose co is a continuously differentiable 1-form on R3 — Z satisfying da> — 0. Suppose, moreover,


that I co = 3 and I co = —7. Calculate I co, I a>, and / co. Give your reasoning.
JCi JCz JCi JC4 JC$

Figure 7.4
8. (a) Let z = x + iy. Show that
dz _ xdx + ydy . —ydx + xdy
z x2 + y2 1 x2 + y2

(b) Let 17 C C be open, f, g: U -> C — {0} be differentiable, and co = (—ydx + xdy)/(x2 -I- y2).
Prove that (/g)*co = /*co + g*co. (Hint: What is (Jg)*(dz/z)?)
9. Let co = (—ydx + xdy)/(x2 + y2).
(a) Suppose U C C is open and f, g: U -> C — {0} are smooth. Let C C U be a closed curve and
suppose |g - f\ < |/| on C. Prove that
f g*a>= [ f*a>.
Jc Jc
(Hint: Use a homotopy similar to that appearing in the proof of Theorem 7.6.)
(b) Leto q , ai........ cz„_i € C andp(z) = zn + a„_izn-1 H-------- 1- a^z + a0. Let D c Cbe aregion so
that no root of p lies on C = dD. Prove that there is 8 > 0 so that whenever \b j — a7-1 <8 for all
j = 0,1,..., n — 1, the polynomial P(z) — zn + bn_\zn~l -I------- h b^z + bo has the same number
of roots in D as p.
(c) Deduce from part b that the roots of a polynomial vary continuously with the coefficients.
(Cf. Example 2 on p. 189 and Exercise 6.2.2. See also Exercise 9.4.22 for an interesting application
to linear algebra.)
10. Let f: 52m -> S'2”1 be a smooth map. Prove that there exists x e S2m so that either f (x) = x or
f(x) = -x.
11. Letn > 2andf: Dn -> Rn be smooth. Suppose ||f(x) — x|| < 1 for all x e Sn 3. Prove that there
f
is some x € D" so that f (x) = 0. (Hint: If not, show that the restriction of the map —: Dn -> S"-1
to 3D" is homotopic to the identity map.)
12. We wish to give a generalization of Proposition 3.3. Suppose U C R" is an open subset that is
star-shaped with respect to the origin.
412 ► Chapter 8. Differential Forms and Integration on Manifolds

(a) For any k = 1,..., n, given a k-form 0 = fidiCj on U, define the (k - l)-form J(0) =
(/ l A ’ •1 A dxi. A • ■ ■ A dxik. Then make J linear. Prove that

0 = d(J(0)) + J(d0).
(b) Prove that if w is a closed k-form on U, then co is exact
13. Use the result of Exercise 12 to express each of the following closed forms co on R3 in the form
co = dr).
(a) co = (e* cos y + z)dx + (2yz2 — ex sin y)dy + (x + 2y2z + ez)dz.
(b) co = (2x + y2)dy Ndz + (3y + z)dx /\dz + (z — xy)dx A dy.
(c) co = xyzdx /\dy A dz.
14. Draw an orientable surface whose boundary is the boundary curve of the Mobius strip, as pictured
in Figure 7.5. (More generally, every simple closed curve in R3 bounds an orientable surface. Can
you see why?)

Figure 7.5
15. Find three everywhere linearly independent vector fields on S1 x S2.
16. Fill in the details in the following alternative proof of Theorem 7.9, following J. Milnor. Given
a (smooth) unit vector field v on Sn, first extend v to be a vector field V on Rn+1 by setting
V(x)={W|2v(^)’ X*0
U 0, x=0

(a) Check that V is C1.


(b) Define f,: Dn+1 -* R"+1 by f?(x) = x 4- tV(x). Apply the inverse function theorem to prove
that for t sufficiently small, fr maps the closed unit ball one-to-one and onto the closed ball of
radius Vl +t2- (Hints: To establish one-to-one, first use the inverse function theorem to show that
fr(x)
the function F: Dn+1 x R -> R"+1 x R given by F is locally one-to-one. Now

proceed by contradiction: Suppose there were a sequence tk —> 0 and points x*, y* € Drt+1 so that
ftk (Xfc) = (y*). Use compactness of Dn+1 to pass to convergent subsequences and ykj. To
establish onto, you will need to use the fact that the only nonempty subset of Dn+1 that is both open
(in Dn+1) and closed is Dn+1 itself.)
(c) Apply the Change of Variables Theorem to see that the volume of B(0, Vl +t2) must be a
polynomial expression in t.
(d) Deduce that you have arrived at a contradiction when n is even.
CHWER

EIGENVALUES,
EIGENVECTORS, AND
APPLICATIONS
We have seen the importance of choosing the appropriate coordinates in doing multiple
integration. Now we turn to what is really a much more basic question. Given a linear
transformation T: R" -> R”, can we choose appropriate (convenient?) coordinates on Rn
so that the matrix for T (in these coordinates) is as simple as possible, say, diagonal? For
this the fundamental tool is eigenvalues and eigenvectors. We then give applications to
difference and differential equations and quadratic forms.

► 1 LINEAR TRANSFORMATIONS AND CHANGE OF BASIS


In all our previous work, we have referred to the “standard matrix” of a linear transformation.
Now we wish to broaden our scope.

Definition Let V be a finite-dimensional vector space and let T: V -> V be a linear


transformation. Let B = {vi,..., vn} be an ordered basis for V. Define numbers ay,
i = 1,..., n, j = 1,..., n, by

T(fj) =avvi +a2/V2H------+ anjvn, j = l.......... n.

Then we define A = to be the matrix for T with respect to B, also denoted [T]#. As
before, we have

A= T(vt) T(v 2) T(vn) ,

where now the column vectors are the coordinates of the vectors with respect to the
basis B.

We might agree that, generally, the easiest matrices to understand are diagonal. If we
think of our examples of projection and reflection in R", we obtain some particularly simple
diagonal matrices.

413
414 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

> EXAMPLE 1

Suppose V cR" is a subspace. Choose a basis {vi,..., ¥&} for V and a basis {vjt+i,..., vn} for
V1. Then B = {vi,..., v„J forms a basis for R" (why?). Let T = projv: R" -> Rn be the linear
transformation given by projecting onto V, and let 5: R" -> R" be the linear transformation given
by reflecting across V. Then we have

P(V1) = Vi S(vi) =

P(Vfc) = V* 5(v*) = yk
and
T(v*+i) = 0 S(Vjt+i) = -Vfc+i

T(v„) = 0 S(v„) = -v„

Then the matrices for T and S with respect to the basis B are, respectively,

B= h_2. and C=
0 0

► EXAMPLE 2

Let T: R2 -> R2 be the linear transformation defined by multiplying by

It is rather difficult to understand this function until we discover that if we take

and v2 =

then T(vj) = 4vi and P(v2) = v2, so that the matrix for T with respect to the ordered basis B =
{▼ i, v2} is the diagonal matrix

4 0
0 1

Now it is rather straightforward to picture the linear transformation: As we see from Figure 1.1, it
stretches the Vi -axis by a factor of 4 and leaves the v2-axis unchanged. Since we can “pave” the plane
by parallelograms formed by vi and v2, we are able to describe the effects of T quite explicitly. We
shall soon see how to find vi and v2.
For future reference, let’s consider the matrix P with column vectors Vi and v2. Since T(vi) =
4vi and T(v2) = v2, we observe that
-11 I"4
3
AP = = PB.
2 0

This might be rewritten as B = P-1 AP, in the form that will occupy our attention for the rest of this
section.
1 Linear Transformations and Change of Basis 415

It would have been a more honest exercise here to start with the geometric description of T, i.e.,
its action on the basis vectors Vi and v2, and try to find the standard matrix for T. As the reader can
check, we have

1 1
ei = fvi “ 5V2
e2 = |vi + |v2,

and so we compute that

T(e,) = |T(v,) - |T(v 2) = |v, - |v2


= 3 1, and
L2J

Tfe) = |T(Vi) + 1T(v 2) = |v, + |v2

_ 1
~ 2

What a relief! **!

Given a (finite-dimensional) vector space V and an ordered basis B = {Vi,..., v„} for
V, we can define a linear transformation

CB:

which assigns to each vector v its vector of coordinates with respect to the basis B. That is,

Cl

Cb (Ci V! + C2V2 4--------- 1- cn\n} =


416 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

Of course, when B is the standard basis 8 for R", this is what you’d expect:

x\

X2
Q(x) =

Suppose T: R" -» R" is a linear transformation and T(x) = y; to say that A is the standard
matrix for T is to say that multiplying A by the coordinate vector of x (in the standard basis)
gives the coordinate vector of y (in the standard basis). Likewise, suppose T: V —> V is a
linear transformation, T(v) = w, and B is an ordered basis for V. Then let Cg(v) = x be
the coordinate vector of v with respect to the basis B, and let Cg(w) = y be the coordinate
vector of w with respect to the basis B. To say that A is the matrix for T with respect to the
basis B (see the definition on p. 413) is to say Ax = y. (See Figure 1.2.)
Suppose now that we have a linear transformation T: V -> V and two ordered bases
13 = {vi,..., vn} and B' = {vj,..., v„} for V. (Often in our applications, as the notation
suggests, V will be R" and B will be the standard basis 8.) Let Aoia = [T]b be the matrix
for T with respect to the “old” basis B, and let Anew = [T]^ be the matrix for T with
respect to the “new” basis B'. The fundamental issue now is to compute Anew if we know
Aoia- Define the change-of-basis matrix P to be the matrix whose column vectors are the
coordinates of the new basis vectors with respect to the old basis: i.e.,

V; = PljVi + p2;V2 + • • • + PnjVn.

When B is the standard basis, we have our usual schematic picture

Note that P must be invertible since we can similarly express each of the old basis vectors
as a linear combination of the new basis vectors. (Cf. Proposition 3.4 of Chapter 4.) Then,
as the diagram in Figure 1.3 summarizes, we have the following

> v

C<j

A
■> K"

Figure 1.2 Figure 13


1 Linear Transformations and Change of Basis ◄ 417

Theorem 1.1 (Change-of-Basis Formula) Let T: V —> V be a linear transforma­


tion, and let B = {vi,..., vn} and B' = {vj,..., v^} be ordered bases for V. If[T]g and
[P]# are the matrices for T with respect to the respective bases and P is the change-of-
basis matrix (whose columns are the coordinates of the new basis vectors with respect the
old basis), then we have

[T]f = P-'OTb P.

Remark Two matrices A and B are called similar if B = P[AP for some invertible
matrix P (see Exercise 9). Theorem 1.1 tells us that any two matrices representing a linear
map T: V -> V are similar.

Proof Given a vector v € V, denote by x and x', respectively, its coordinate vectors
with respect to the bases B and B'. The important relation here is

x = Px'.
n
We derive this as follows: Using the equations v = £ and
n n n i=l n n
v “ E*M - E<(E>ov<) = E (Epy*')v,>
7=i j=i i=i i=i ;=i

we deduce from Corollary 4.3.3 that


n
Xi = ^Pyxfj'
7=1

(If we think of the old basis as the standard basis for Rn, then this is our familiar fact that
multiplying P by x' takes the appropriate linear combination of the columns of P.)
Likewise, if T(v) = w, let y and y', respectively, denote the coordinate vectors of w
with respect to bases B and B'. Now compare the equations

y' = [PW and y = [P]sx,

using

y = Pyz and x = Px':

On one hand, we have

y = P/ = P([T]Fx') = (P[P]B/)x',

and on the other hand,

y = [Pfex = [Pfe(Px') = ([P]sP)x',

from which we conclude that

[T]8P = P[T]ff; Le„ [T]b - = P-'OTb P- ■


418 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

► EXAMPLES

Let’s return to Example 2 as a test case for the change-of-basis formula. (Of course, we’ve already
seen there that it works.) Given the matrix

1
A = [T] =
2

of a linear transformation T: R2 -> R2 with respect to the standard basis, let’s calculate its matrix
\T]& with respect to the new basis B' = {Vi, v2}, where

The change-of-basis matrix is

1 -1 . 1 2 1
p= , and P"1 = -
1 2 3 -1 1

from which the reader should calculate that, indeed,

[t j b - = p -'a p = j J

► EXAMPLE 4

We wish to calculate the standard matrix for the linear transformation T = projy, where V c R3 is
the plane - 2x2 + x3 = 0. If we choose a basis B = {vi, v2, v3} for R3 so that {vi, v2} is a basis
for V and v3 is normal to the plane, then (see Example 1) we’ll have

1 0 0
[Th= 0 1 0
0 0 0

So we take

We wish to know the standard matrix, which means that B' = {ei, e2, e3} should be the standard basis
for R3. Then the inverse of the change-of-basis matrix is
" -1 1 1 "
P'1 = 0 1 -2
1 1 1_

and so
1 0 1
“2 2
P = 1 1 1
5 3 5
1 _1 1
L 6 3 5 J
1 Linear Transformations and Change of Basis ◄ 419

Now we use the change-of-basis formula:

"-1 1 1' "1 0 o' ”_1


0 1 "1
2 2
[T] = [nSz = p-i[T]Bp = 0 1 -2 0 1 0 1 1 1
3 3 3
1 1_ 1 1 1
1 _0 0 0_ L 5 ~3 5 J
r 5 1 1 “1
? 3
i 1 1 . "41
5 3 3
1 1 5
L“5 3 5 J

► EXAMPLE 5

Suppose we consider the linear transformation T: R3 —> R3 defined by rotating an angle 2rr/3 about
1“
the line spanned by -1 . (The angle is measured counterclockwise from a vantage point on the
1
“positive side” of this line.) Once again, the key is to choose a convenient new basis adapted to the
geometry of the problem. We choose

along the axis of rotation and Vi, v2 to be an orthonormal basis for the plane orthogonal to that axis:
e.g.,

and

Now let’s compute:


T(V1)= -^ + ^¥2,

T(v2) = -^V1- |v 2,
T(v 3) = v3.

(Now it should be clear why we chose Vi, v2 to be orthonormal. We also want vb v2, v3 to form a
“right-handed system” so that we’re turning in the correct direction, as indicated in Figure 1.4. But
there’s no need to worry about the length of v3.) Thus, we have
r i -4 o’
[Ds = 4 -I 0
0 0 1

Next, we take JS' = {e2, e2, e3}, and the inverse of the change-of-basis matrix is
i

i
76
2
0
7E
420 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

Figure 1.4

so that

(Exercise 5.5.16 may be helpful here, but, as a last resort, there’s always Gaussian elimination.) Once
again, we solve for

[T] = [T]s, = F-i[r]eP =

amazingly enough. Tn hindsight, then, we should be able to see the effect of T on the standard basis
vectors quite plainly. Can you?

Remark Suppose we first rotate njl about the X3 -axis and then rotate n/2 about the
xi-axis. We leave it to the reader to check that the result is the linear transformation whose
matrix we just calculated. This raises a fascinating question: Is the composition of rotations
always again a rotation? If so, is there a way of predicting the ultimate axis and angle?

► EXERCISES 9.1
2 1
*1. Letvi = and v2 = , and consider the basis B' = {v!, v2} for R2.
_3.
1 5
(a) Suppose T: R2 —> R2 is a linear transformation whose standard matrix is [T] = . Find
2 -2
the matrix for T with respect to the basis B'.
1 Linear Transformations and Change of Basis -41 421

(b) If S: R2 -> R2 is a linear transformation defined by

S(vi) = 2vj + v2
S(v2) = -Vi + 3v 2,
then give the standard matrix for 5.
2. Derive the result of Exercise 1.4.10a by the change-of-basis formula.
3. Let T: R3 -> R3 be the linear transformation given by reflecting across the plane
-xi + x2 + x3 = 0.
(a) Find an orthogonal basis {vi, v2, v3} for R3 so that vb v2 span the plane and v3 is orthogonal
to it.
(b) Give the matrix representing T with respect to your basis in part a.
(c) Use the change-of-basis theorem to give the matrix representing T with respect to the standard
basis.
4. Use the change-of-basis formula to find the standard matrix for projection onto the plane spanned
' 1" o'
0 and 1
_1_ _ —2 _
*5. Let T: R3 -> R3 be the linear transformation given by reflecting across the plane
xi — 2x2 + 2x 3 = 0. Use the change-of-basis formula to find its standard matrix.
6. Check the result claimed in the remark on p. 420.
7. Let V c R3 be the subspace defined by
V — {x € R3 '• x\ — x2 -F x3 = 0}.
Find the standard matrix for each of the following linear transformations:
(a) projection on V,
(b) reflection across V,
(c) rotation of V through angle jt /6 (as viewed from high above).
*8. Find the standard matrix for the linear transformation giving projection onto the plane in R4

spanned by

3 9. Let A and B be n x n matrices. We say B is similar to A if there is an invertible matrix P so that


B = P}AP. (Hint: B = P^AP <=> PB - AP.)
(a) If c is any scalar, show that cl is similar only to itself.
(b) Show that if B is similar to A, then A is similar to B.

(c) Show that

1 a
(d) Show that for any real numbers a and b, the matrices are similar.
0 2

(e) Show that is not similar to


0 2 0 2
2 1
(f) Show that is not diagonalizable, i.e., is not similar to any diagonal matrix.
422 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

10. See Exercise 9 for the relevant definition. Prove or give a counterexample:
(a) If B is similar to A, then BT is similar to AT.
(b) If B2 is similar to A2, then B is similar to A.
(c) If B is similar to A and A is nonsingular, then B is nonsingular.
(d) If B is similar to A and A is symmetric, then B is symmetric.
(e) If B is similar to A, thenN(B) = N(A).
(f) If B is similar to A, then rank(B) = rank(A).
11. See Exercise 9 for the relevant definition. Suppose A and B are n x n matrices.
(a) Show that if either A or B is nonsingular, then AB and BA are similar.
(b) Must AB and BA be similar in general?
sin 0 cos#
*(a)
12. Let a = sin0sin# , 0 < 0 < 7t/2. Prove that the intersection of the circular cylin-
cos 0
der xf + x2 = 1 with the plane a • x = 0 is an ellipse. (Hint: Consider the new basis
— sin# — COS 0 cos#

cos# , v2 = — cos <p sin #

0 sin</>

(b) Describe the projection of the cylindrical region xj + x% = 1, —h < x3 < h onto the general
plane a • x = 0. (Hint: Special cases are the planes x3 = 0 and xi = 0.)
" ±1 ’ " 1 ’

13. A cube with vertices at ±1 is rotated about the long diagonal through ± 1 . Describe
_ ±1 _ _ 1 _
the resulting surface and give equation(s) for it.
14. In this exercise we give the general version of the change-of-basis formula for a linear transfor­
mation T: V W.
(a) Suppose V and V' are ordered bases for the vector space V and W and W' are ordered bases for
the vector space W. Let P be the change of basis matrix from V to V and let Q be the change of basis
matrix from W to W'. Suppose T: V -> W is a linear transformation whose matrix with respect to
the bases V and W is [T]^7 and whose matrix with respect to the new bases V' and W' is [T]^f. Prove
that[T]^ ^Q-’trjWp.
(b) Consider the identity transformation T: V -> V. Using the basis V in the domain and the basis
V in the range, show that the matrix for [T]y is the change of basis matrix P.
15. (See the discussion on p. 183 and Exercise 4.4.18.) Let A be an n x n matrix. Prove that the
functions T: R(A) -> C(A) and S: C(A) -» R(A) are inverse functions if and only if A = QP,
where P is a projection matrix and Q is orthogonal.

► 2 EIGENVALUES, EIGENVECTORS, AND DIAGONALIZABILITY


As we shall soon see, it is often necessary in applications to compute (high) powers of a
given square matrix. When A is diagonalizable, i.e., there is an invertible matrix P so that
P-1 AP — A is diagonal, we have
A = PAP-1, and so
A* = (PAP^XPAP-1) • • • (PAP-1) = PA*P-1.

k times
2 Eigenvalues, Eigenvectors, and Diagonalizability 423

Since Afe is easy to calculate, we are left with a very computable formula for Ak. We will
see a number of applications of this principle in Section 3. We turn first to the matter of
finding the diagonal matrix A if, in fact, A is diagonalizable. Then we will try to develop
some criteria that guarantee diagonalizability.

2.1 The Characteristic Polynomial


Recall that a linear transformation T: V -> V is diagonalizable if there is an (ordered)
basis B = {vi,..., vn} for V so that the matrix for T with respect to that basis is diagonal.
This means precisely that, for some scalars A1?..., A„, we have

T(vi) = Aivi,
T(v 2) — A2V2,

T (fn) — •

Likewise, an n x n matrix A is diagonalizable if there is a basis {vi,..., vB} for R” with


the property that Av, = A(Vj for all i = 1,..., n.
This observation leads us to the following

Definition Let T: V -> V be a linear transformation. A nonzero vector v g V is


called an eigenvector of T if there is a scalar A so that T(v) — Av. The scalar A is called
the associated eigenvalue of T.

In other words, an eigenvector of a linear transformation T is a (nonzero) vector that is


merely stretched (perhaps in the negative direction) by T. The line spanned by the vector
is identical to the line spanned by its image under T.
This definition, in turn, leads to a convenient reformulation of diagonalizability:

Proposition 2.1 The linear transformation T : V —> V is diagonalizable if and only


if there is a basis for V consisting of eigenvectors ofT.

At this juncture, the obvious question to ask is how we should find eigenvectors. Let’s
start by observing that, if we include the zero vector, the set of eigenvectors with eigenvalue
A forms a subspace.

Lemma 2.2 LetT: V —> V be a linear transformation, and let k be any scalar. Then

E(A) = {V G V : T’(v) = Av} = ker(T - AZ)

is a subspace ofV; dimE(A) > 0 if and only if A is an eigenvalue, in which case we call
E(A) the A-eigenspace.

Proof That E(A) is a subspace follows immediately once we recognize that it is the
kernel (or nullspace) of a linear map. (In the more familiar matrix notation, {x e R" :
Ax = Ax} = N(A — AZ).) Now, by definition, A is an eigenvalue precisely when there is a
nonzero vector in E(A). ■
424 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

We now come to the main computational tool for finding eigenvalues.

Proposition 2.3 Let Abe an n x n matrix. Then L is an eigenvalue of A if and only


i/det(A - LI) = 0.

Proof From Lemma 2.2 we infer that A is an eigenvalue if and only if the matrix
A — AZ is singular. Next we conclude from Theorem 5.5 of Chapter 7 that A — LI is
singular precisely when det(A — LI) — 0. Putting the two statements together, we obtain
the result. ■

Once we use this criterion to find the eigenvalues L, it is an easy matter to find the
corresponding eigenvectors merely by finding N(A — LI).

► EXAMPLE 1

Let’s find the eigenvalues and eigenvectors of the matrix

1
7

We start by calculating

det(A - tl) = = (3 - r)(7 -1) - (1)(—3) = t2 - lOt + 24.

Since t2 — lOt + 24 = (r — 4)(t — 6) = 0 when t = 4 or t = 6, these are our two eigenvalues. We


now proceed to find the corresponding eigenspaces:

E(4): We see that

Vl = is a basis for

E(6): We see that

v2 = is a basis for N(A — 6Z) = N

Since we observe that the {vi, v2} is linearly independent, die matrix A is diagonalizable. Indeed, as
1 1
the reader can check, if we take P = , then
3

as should be the case. <4


2 Eigenvalues, Eigenvectors, and Diagonalizability <4 425

► EXAMPLE 2

Let’s find the eigenvalues and eigenvectors of the matrix

1 2 1
A= 0 1 0
1 3 1

We begin by computing

1-t 2 1
det(A - if) = 0 l-t 0
1 3 1 -t

(expanding in cofactors along the second row)

= (1 - 0(0 - 0(1 -1) - 1) = (1 - t)(t2 - 2t) = -t(t - l)(t - 2).

Thus, the eigenvalues of A are 0,1, and 2. We next find the respective eigenspaces:

E(0): We see that


" -1 ' '1 2 r ) =N ( ”1 0 1“
Vi = 0 is a basis for N(A — 01) = N 0 1 0 0 1 0
1 _ _1 3 1_ J) 0 0_

E(l): We see that


3 ‘ "o 2 r "1 0 31
“2
-1 is a basis for N(A — 11) = N 0 0 0 0 1
V2 = =N 1 5
2_ _1 3 0_ _0 0 0_

E(2): We see that


" 1 ’ ’-1 2 r ) =N( ’1 0 -1"

v3 = 0 is a basis for N(A - 21) = N 0 -1 0 0 1 0


_ 1 _ 1 3 -1_ _0 0 0_

Once again, A is diagonalizable: As the reader can check, {vi, v2, v3} is linearly independent and
therefore gives a basis for R3. Just to be sure, we let

-1 3 1
P= 0 -1 0
1 2 1

then
r i 1 1 " ’1 2 1 " T-l 3 1" "o 0 o'
~2 "2 2
p-'AP = 0 -1 0 0 1 0 0 -1 0 = 0 1 0
1 5 1 1_ 1 2 1_ _0 0 2_
_ 2 2 2 J _1 3

as we expected.
426 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

Remark There is a built-in check here for the eigenvalues. If A is truly to be an


eigenvalue of A, we must find a nonzero vector in N(A — AZ). If we do not, then A cannot
be an eigenvalue.

► EXAMPLE 3

Let’s find the eigenvalues and eigenvectors of the matrix

1
0

As usual, we calculate

-t 1
det(A — tl) = = t2 + l.
-1 -t

Since t2 + 1 > 1 for all real numbers t, there is no real number A so that det(A — AZ) = 0. Since our
scalars are allowed only to be real numbers, this matrix has no eigenvalue. On the other hand, as one
might see in a more advanced course, it is often convenient to allow complex numbers as scalars.

It is evident that we are going to find the eigenvalues of a matrix A by finding the (real)
roots of the polynomial det(A — tl). This leads us to our next

Definition Let A be a square matrix. Then p(t) = pA(t) = det(A — tl) is called
the characteristic polynomial of A.1

We can restate Proposition 2.3 by saying that the eigenvalues of A are the real roots of
the characteristic polynomial p a (I). It is comforting to observe that similar matrices have
the same characteristic polynomial, and hence it makes sense to refer to the characteristic
polynomial of a linear map T: V -> V.

Lemma 2.4 If B = P"1 AP for some invertible matrix P, then Pa (I) = Pb (1)-

Proof We have

p5(r) = det(5 - tl) = det(P-1AP - tl)


= det (P~\A - tI)P) = det(A - tl) = pA(t),

by virtue of the product rule for determinants, Proposition 5.7 of Chapter 7. ■

As a consequence, if V is a finite-dimensional vector space and T: V -> V is a linear


transformation, then we can define the characteristic polynomial of T to be that of the matrix
A for T with respect to any basis for V. By Lemma 2.4 we’ll get the same answer no matter
what basis we choose.

’That the characteristic polynomial of an n x n matrix is in fact a polynomial of degree n seems pretty evident
from examples; but the fastidious reader can establish this by expanding in cofactors.
2 Eigenvalues, Eigenvectors, and Diagonalizability 427

Remark In order to determine the eigenvalues of a matrix, we must find the roots of
its characteristic polynomial. In real-world applications (where the matrices tend to get quite
large), one might solve this numerically (e.g., using Newton’s method). However, there
are more sophisticated methods for finding the eigenvalues without even calculating the
characteristic polynomial; a powerful such method is based on the Gram-Schmidt process.
The interested reader should consult Strang or Wilkinson for more details.
For the lion’s share of the matrices that we shall encounter here, the eigenvalues will be
integers, and so we take this opportunity to remind you of a trick from high school algebra.

Proposition 2.5 (Rational Roots Test) Let p(t) — antn + an-itn~x -I-------- 1- ait +
ao be a polynomial with integer coefficients. Ift — r/sisa rational root (in lowest terms)
of p(t), then r must be a factor ofao and s must be a factor ofan.

Proof You can find a proof in most abstract algebra texts, but, for obvious reasons,
we recommend Abstract Algebra: A Geometric Approach, by someone named T. Shifrin,
p. 105. ■

In particular, when the leading coefficient an is ±1, as is always the case with the
characteristic polynomial, any rational root must in fact be an integer that divides o q . So, in
practice, we test the various factors of o q (being careful to try both positive and negative).
Once we find one root r, we can divide p(t) by t — r to obtain a polynomial of smaller
degree.

► EXAMPLE 4

The characteristic polynomial of the matrix

4 3
A = 0 1 4
2 -2

is p(t) = -t3 + 6t2 - lit 4- 6. The factors of 6 are ±1, ±2, ±3, and ±6. Since p(l) = 0, we know
that 1 is a root (so we were lucky). Now,

= tz - 5t + 6 = (t - 2)(t - 3),

and we have succeeded in finding all three eigenvalues of A. They are 1,2, and 3. ◄

Remark It might be nice to have a few shortcuts for calculating the characteristic
polynomial of small matrices. For 2x2 matrices, it’s quite easy:

a—t b
= (a — f)(d — t) — be
c d—t
= t2 — (a + d) t 4- (ad — be) —t2 — trA t 4- det A
428 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

(Recall that the trace of a matrix is the sum of its diagonal entries. The trace of A is denoted
trA.) For 3x3 matrices, similarly,

an— t flu O13

021 022 ~ t 023


— —t3 4- trA t2 — (Cn 4~ C22 4- C33) t det A ,
O31 032 033 — t

where Ca is the 0 th cofactor, the determinant of the 2 x 2 submatrix formed by deleting


the 1th row and column from A.
In general, the characteristic polynomial p(t) of an n x n matrix A is always of the
form
pit} = (-l)nt" 4- (-1)”"1! trA | f1-1 4- • • • +1 det A .

Note that the constant term is always det A because p(0) = det(A — Of) = det A.

In the long run, these formulas notwithstanding, if s sometimes best to calculate the
characteristic polynomial of 3 x 3 matrices by expansion in cofactors. If one is both atten­
tive and fortunate, this may save the trouble of factoring the polynomial.

► EXAMPLES

Let’s find the characteristic polynomial of

2 0 0
A= 1 2 1
0 1 2

We calculate the determinant by expanding in cofactors along the first row:

2-t 0 0
1
1 2-t 1 (2-0
2-t
0 1 2-t
= (2 -1)((2 -1)2 - 1) = (2 - /)(? -4/4-3)
= (2 - 0(t — 3)(t - 1).

But that was too easy. Let’s try the characteristic polynomial of

2 0 1
B= 1 3 1
1 1 2

Again, we expand in cofactors along the first row:

2-t 0 1
3-t 1 1 3-t
1 3-t 1 = (2-0
1 2-t 1 1
1 1 2-t
= (2 —r)((3 — r)(2 — r) — 1) + (1 — (3 —/))
2 Eigenvalues, Eigenvectors, and Diagonalizability *4 429

= (2 - t)(t2 — 5t + 5) — (2 — t) = (2 — t)(t2 — 5t + 4)
= (2 — t)(t - l)(t — 4).
OK, perhaps we were a bit lucky there, too.

2.2 Diagonalizability
Judging by the foregoing examples, it seems to be the case that when an n x n matrix (or
linear transformation) has n distinct eigenvalues, the corresponding eigenvectors form a
linearly independent set and will therefore give a “diagonalizing basis.” Let’s begin by
proving a slightly stronger statement.

Theorem 2.6 Let T: V -» V be a linear transformation. Let ki,..., kt be k distinct


scalars. Suppose Vi,..., V* are eigenvectors of T with respective eigenvalues k\,..., A*.
Then {vi,..., v*} is a linearly independent set of vectors.

Vrooi Let m be the largest number between 1 and k (inclusive) so that {vi,..., } is
linearly independent. We want to see that m — k. By way of contradiction, suppose m < k.
Then we know that {vi,..., vOT} is linearly independent and {vj,..., vm, vm+]} is linearly
dependent. It follows from Proposition 3.2 of Chapter 4 that vm+i = ciVi 4--------1- cmvm
for some scalars q , ..., cm. Then (using repeatedly the fact that T(vz ) = A/v,)

0 — (T — Am+iZ)vm+i = (T — Am+iZ)(ciVi + • • • + cmvm)


= Ci (Ai Am+i)vi + • • • + cm(km km+i)vm.

Since A, — Aw+i 0 for i = 1,..., m, and since {vi,..., vm} is linearly independent, the
only possibility is that ci = • • • = cm = 0, contradicting the fact that vm+i 0 (by the very
definition of eigenvector). Thus, it cannot happen that m < k, and the proof is complete. ■

We now arrive at our first result that gives a sufficient condition for a linear transfor­
mation to be diagonalizable.

Corollary 2.7 Suppose V is an n-dimensional vector space and T : V -+ V has n


distinct (real) eigenvalues. ThenT is diagonalizable.

Proof The set of the n corresponding eigenvectors will be linearly independent and
will hence give a basis for V. The matrix for T with respect to a basis of eigenvectors is
always diagonal. ■

Remark Of course, there are many diagonalizable (indeed, diagonal) matrices with
repeated eigenvalues. Certainly the identity matrix and the matrix

"2 0 0“
0 3 0
0 0 2

are diagonal, and yet they fail to have distinct eigenvalues.


430 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

We spend the rest of this section discussing the two ways the hypotheses of Corollary
2.7 can fail: The characteristic polynomial may have complex roots or it may have repeated
roots.

► EXAMPLE 6

Consider the matrix

The reader may well recall from Chapter 1 that multiplying by A gives a rotation of the plane through
an angle of t t /4. Now, what are the eigenvalues of A? The characteristic polynomial is

p(t) = t2 — (trA)r + det A = t2 — V2t + 1,

whose roots (by the quadratic formula) are

72±x /=2 _ 1±2


2 ^2 '

After a bit of thought, it should come as no surprise that A has no (real) eigenvector, as there can be
no line through the origin that is unchanged after a rotation.

We have seen that when the characteristic polynomial has distinct (real) roots, we get
a 1-dimensional eigenspace for each. What happens if the characteristic polynomial has
some repeated roots?

► EXAMPLE?

Consider the matrix

1
3

Its characteristic polynomial is p(t) = t2 - 4r + 4 = (r — 2)2, so 2 is a repeated eigenvalue. Now


let’s find the corresponding eigenvectors:

-1
N(A - 21) = N
-1

is 1-dimensional, with basis

It follows that A cannot be diagonalized. (See also Exercise 16.)


2 Eigenvalues, Eigenvectors, and Diagonalizability < 431

► EXAMPLE 8

Both the matrices

and

have the characteristic polynomial p(f) = (t - 2)2(t — 3)2 (why?). For A, there are two linearly inde­
pendent eigenvectors with eigenvalue 2 but only one linearly independent eigenvector with eigenvalue
3. For B, there are two linearly independent eigenvectors with eigenvalue 3 but only one linearly
independent eigenvector with eigenvalue 2. As a result, neither can be diagonalized. ◄

It would be convenient to have a bit of terminology here.

Definition Let A. be an eigenvalue of a linear transformation. The algebraic multi­


plicity of A is its multiplicity as a root of the characteristic polynomial p(t), i.e., the highest
power of t — A dividing p(t). The geometric multiplicity of A is the dimension of the
A-eigenspace E(A).

> EXAMPLE 9

For the matrices in Example 8, both the eigenvalues 2 and 3 have algebraic multiplicity 2. For matrix
A, the eigenvalue 2 has geometric multiplicity 2 and the eigenvalue 3 has geometric multiplicity
1; for matrix B, the eigenvalue 2 has geometric multiplicity 1 and the eigenvalue 3 has geometric
multiplicity 2. "4

From the examples we’ve seen, it seems quite plausible that the geometric multiplicity
of an eigenvalue can be no larger than its algebraic multiplicity, but we stop to give a proof.

Proposition 2.8 Let A be an eigenvalue of algebraic multiplicity m and geometric


multiplicity d. Then 1 <d <m.

Proof Suppose A is an eigenvalue of the linear transformation T. Then d ~


dimE(A) > 1 by definition. Now, choose a basis {vi,..., vrf} for E(A) and extend it to a
basis B — {vi,..., v„} for V. Then the matrix for T with respect to the basis B is of the
form

AZrf B
A =
O C

and so, by Exercise 7.5.10, the characteristic polynomial


pA(f) = det(A — tl) — det ((A — t)lf) det(C — tl) = (A — t)d det(C — tl).

Since the characteristic polynomial does not depend on the basis and since (t — A)OT is the
largest power of t — k dividing the characteristic polynomial, it follows that d < m. ■
432 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

We are now able to give a necessary and sufficient criterion for a linear transformation
to be diagonalizable. Based on our experience with examples, it should come as no great
surprise.

Theorem 2.9 Let T: V —> V be a linear transformation. Let its distinct eigenvalues
be ... and assume these are all real numbers. Then T is diagonalizable if and only
if the geometric multiplicity, di, of each A, equals its algebraic multiplicity, mi.

Proof Let V be an n-dimensional vector space. Then the characteristic polynomial


of T has degree n, and we have

p(t) = ±(t - - A2)W2 • • • (t - A*)m‘;

therefore,

k
n =
i=l

Now, suppose T is diagonalizable. Then there is a basis B consisting of eigenvectors.


k
At most, di of these basis vectors lie in E(Af), and so n < di. On the other hand, by
i=l
Proposition 2.8, di < ntj for i = 1,..., k. Putting these together, we have
k k
n < ^^di < y^mt — n.
i=l i=l

Thus, we must have equality at every stage here, which implies that di = m( for all i =
l,...,k.
Conversely, suppose di = mi for i = 1,..., k. If we choose a basis Bi for each
eigenspace E(A() and let B — B\ U • • • U Bk, then we assert that B is a basis for V. There
are n vectors in B, so we need only check that the set of vectors is linearly independent.
This is a generalization of the argument of Theorem 2.6, and we leave it to Exercise 25. ■

► EXAMPLE 10

The matrices

both have characteristic polynomial p(t) = ~(t — l)2(t — 2). That is, the eigenvalue 1 has algebraic
multiplicity 2 and the eigenvalue 2 has algebraic multiplicity 1. To decide whether the matrices are
diagonalizable, we need to know the geometric multiplicity of the eigenvalue 1. Well,

-12 1 0 0 0
2 Eigenvalues, Eigenvectors, and Diagonalizability ◄ 433

has rank 1 and so dimExfl) = 2. We infer from Theorem 2.9 that A is diagonalizable. Indeed, as
the reader can check, a diagonalizing basis is

On the other hand,

has rank 2 and so dimEfl(l) = 1. Since the eigenvalue 1 has geometric multiplicity 1, it follows
from Theorem 2.9 that B is not diagonalizable. *41

In the next section we will see the power of diagonalizing matrices in several applica­
tions.

EXERCISES 9.2
1. Find the ei, andei, matrices.
1 5" "2 0 1"
(a)
2 4 *(i) 0 1 2
_0 0 1_
0 1
(b) 1 -2 2
i o

10 -6 (j) -1 0 -1
(C) 0 2 -1
18 -11
3 1 0
1 3
(d) (k) 0 1 2
3 1
0 1 2
1 1
*(e) 1 -6 4
3
0) -2 -4 5
-1 1 2
-2 -6 7
(f) 1 2 1
3 2 —2
2 1 -1
(m) 2 2 -1
1 0 0 2 1 0
(g) -2 1 2
1 0 0 1“
-2 0 3
0 1 1 1
1 -1 2 (n)
0 0 2 0
(h) 0 1 0 .0 0 0 2_
0 -2 3.
2. Prove that 0 is an eigenvalue of A if and only if A is singular.
3. Prove that the eigenvalues of an upper (or lower) triangular matrix are its diagonal entries.
434 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

4. What are the eigenvalues and eigenvectors of a projection? A reflection?


5. Suppose A is nonsingular. Prove that the eigenvalues of A-1 are the reciprocals of the eigenvalues
of A.
6. Suppose x is an eigenvector of A with corresponding eigenvalue A.
(a) Prove that for any n e N, x is an eigenvector of A" with corresponding eigenvalue A".
(b) Prove or give a counterexample: x is an eigenvector of A +1.
(c) If x is an eigenvector of B with corresponding eigenvalue p,, prove or give a counterexample: x
is an eigenvector of A + B with corresponding eigenvalue A + g.
(d) Prove or give a counterexample: If A is an eigenvalue of A and /z is an eigenvalue of B, then
A 4- g is an eigenvalue of A + B.
7. Prove or give a counterexample: If A and B have the same characteristic polynomial, then there
is an invertible matrix P so that B = P-1 AP.
“8. Suppose A is a square matrix. Suppose x is an eigenvector of A with corresponding eigenvalue A
and y is an eigenvector of AT with corresponding eigenvalue /z. Prove that if A /z , then x • y = 0.
9. Prove or give a counterexample:
(a) A and AT have the same eigenvalues.
(b) A and AT have the same eigenvectors.
10. Prove that the product of the roots of the characteristic polynomial of A is equal to det A. (Hint:
If Ai,..., An are the roots, show that p(t) = ±(r - AO(r - A2)... (t — A„).)
11. Let A and B be n x n matrices.
(a) Suppose A (or B) is nonsingular. Prove that the characteristic polynomials of AB and BA are
equal.
(b) (More challenging) Prove the result of part a when both A and B are singular.
*12. Decide whether each of the matrices in Exercise 1 is diagonalizable. Give your reasoning.
13. Prove or give a counterexample.
(a) If A is an n x n matrix with n distinct (real) eigenvalues, then A is diagonalizable.
(b) If A is diagonalizable and AB = BA, then B is diagonalizable.
(c) If there is an invertible matrix P so that A — P"1 BP, then A and B have the same eigenvalues.
(d) If A and B have the same eigenvalues, then there is an invertible matrix P so that A = P ] BP.
(e) There is no real 2x2 matrix A satisfying A2 = —I.
(f) If A and B are diagonalizable and have the same eigenvalues (with the same algebraic multiplic­
ities), then there is an invertible matrix P so that A = PABP.
14. Suppose A is a 2 x 2 matrix whose eigenvalues are integers. If det A = 120, explain why A
must be diagonalizable.
15. Is the linear transformation T: Mnxn -> A4nXn defined by T(X) = XT diagonalizable? (Hint:
Consider the equation XT = AX. What are the corresponding eigenspaces? Exercise 1.4.36 may also
be relevant.)
1
*16. Let A = We saw in Example 7 that A has repeated eigenvalue 2 and vi =
3
spans E(2).
(a) Calculate (A — 2Z)2.
(b) Solve (A — 27)v2 = Vj for v2. Explain how we know a priori that this equation has a solution.
(c) Give the matrix for A with respect to the basis {vi, v2}.
This is the closest to diagonal one can get and is called the Jordan canonical form of A.
2 Eigenvalues, Eigenvectors, and Diagonalizability 435

17. Prove that if A is an eigenvalue of A with geometric multiplicity d, then A is an eigenvalue of AT


with geometric multiplicity d. (Hint: Use Theorem 4.5 of Chapter 4.)
18. Suppose A is an n x n matrix with the property that A2 = A.
(a) Show that if A is an eigenvalue of A, then A = 0 or A = 1.
(b) Prove that A is diagonalizable. (Hint: See Exercise 4.4.16.)
19. Suppose A is an n x n matrix with the property that A2 = Z.
(a) Show that if A is an eigenvalue of A, then A = lorA = —1.
(b) Prove that
E(l) = {x € Rw : x = |(u + Au) for some u € R") and
E(-l) = {x g R" : x = |(u - Au) for some u g RB}.

(c) Prove that E(l) + E(—1) = R" and deduce that A is diagonalizable.
(For an application, see Exercise 15.)
20. Let A be an orthogonal 3x3 matrix.
(a) Prove that the characteristic polynomial pA has a real root.
(b) Prove that || Ax|| = ||x|| for all x G R3 and deduce that the only (real) eigenvalues of A can be 1
and —1.
(c) Prove that if det A = 1, then 1 must be an eigenvalue of A.
(d) Prove that if det A = 1 and A I, then pA: R3 —*• R3 is given by rotation through some angle
0 about some axis. (Hint: First show dimE(l) = 1. Then show that p,A maps E(l)1 to itself and use
Exercise 1.4.34.)
(e) (Cf. the remark on p. 420.) Prove that the composition of rotations in R3 is again a rotation.
21. Consider the linear map T: R3 -► R3 whose standard matrix is the matrix
r i i _i_ 1 _ a/5'i
6 3 ~ 6 6 3

C — 1 — 2 1 i V6
3 6 3 3'6
1 + £ 1-^6 1
1-6 ~ 3 3 6 6 J

given on p. 28. Show that T is indeed a rotation. Find the axis and angle of rotation.
22. Let A be an n x n matrix all of whose eigenvalues are real numbers. Prove that there is a basis
for Rn with respect to which the matrix for A becomes upper triangular. (Hint: Consider a basis
{vi, Vj,..., v3, where Vi is an eigenvector.)
8 23. Suppose T: V -> V is a linear transformation. Suppose T is diagonalizable (i.e., there is a basis
for V consisting of eigenvectors of T). Suppose, moreover, that there is a subspace W c V with the
property that T (W) c W. Prove that there is a basis for VV consisting of eigenvectors of T. (Hint:
Using Exercise 4.3.18, concoct a basis for V by starting with a basis for W. Consider the matrix for
T with respect to this basis; what is its characteristic polynomial?)
24. Suppose A and B are n x n matrices.
(a) Suppose both A and B are diagonalizable and that they have the same eigenvectors. Prove that
AB = BA.
(b) Suppose A has n distinct eigenvalues and AB = BA. Prove that every eigenvector of A is also
an eigenvector of B. Conclude that B is diagonalizable. (Query: Need every eigenvector of B be an
eigenvector of A?)
(c) Suppose A and B are diagonalizable and AB = BA. Prove that A and B are simultaneously
diagonalizable; i.e., there is a nonsingular matrix P so that both P-1 AP and PXBP are diagonal.
436 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

(Hint: If E(A) is the A.-eigenspace for A, show that if v € E(A), then Bv € E(A). Now use Exer­
cise 23.)
25. (a) Let A / ft be eigenvalues of a linear transformation. Suppose {vi,..., v*} c E(A) is linearly
independent and {wi,..., w^} C EQi) is linearly independent. Prove that {vi,..., vk, wt,..., wj
is linearly independent.
(b) More generally, if Ai,..., A* are distinct and {v^,..., v^} C E(A;) is linearly independent for
i = 1,..., k, prove that {vj0 : i = 1,..., k, j = 1........ dt} is linearly independent.

► 3 DIFFERENCE EQUATIONS AND ORDINARY


DIFFERENTIAL EQUATIONS
Suppose A is a diagonalizable matrix. Then there is a nonsingular matrix P so that

Ai
^2
P-^AP = A =

where the diagonal entries of A are the eigenvalues Ai,..., A„ of A. Then it is easy to use
this to calculate the powers of A:
A = PAP"1
A2 = (PAP"1)2 = (PAP^XPAP"1) = PA(P~1P)AP"1 = PA2?"1
A3 = A2 A = (PA2P"1)(PAP“1) = PA2(P"1P)AP"1 = PA3?"1

Ak = PA*?"1.

We now show how linear algebra can be applied to solve some simple difference equations
and systems of differential equations. Both arise very naturally in modeling economic,
physical, and biological problems. For the most basic example, we need only take “expo­
nential growth.” When we model a discrete growth process and stipulate that population
doubles each year, then ak, the population after k years, obeys the law: ak+\ = 2ak. When
we model a continuous growth process, we stipulate that the rate of change of the popu­
lation x(t) is proportional to the population at that instant, giving the differential equation
x(t) = kx(t).

3.1 Difference Equations

► EXAMPLE 1

(A cat/mouse population problem) Suppose the cat population at month k is ck and the mouse
Q
population at month k is mk, and let x* = denote the population vector at month k. Suppose
mjt

0.7 0.2
Xit+i = Axjt, where A~
-0.6 1.4
3 Difference Equations and Ordinary Differential Equations ◄ 437

and an initial population vector Xq is given. Then the population vector xk can be computed from

Hk = AkXo,

so we want to compute Ak by diagonalizing the matrix A.


Since the characteristic polynomial of A is p(t) = t2 - 2.1r + 1.1 = (t - 1)0 - 1.1), we see
that the eigenvalues of A are 1 and 1.1. The corresponding eigenvectors are

and so we form the change-of-basis matrix

1
2

Then we have

0
A = PAP~
1.1

and so

co
In particular, if Xq = is the original population vector, we have
mo

Ck 2 1 1 0 2 -1 co
=
mk 3 2 0 (1.1/ -3 2 mo

2 1 1 0 2c q -mo
3 2 0 (1-1/ _ —3c q + 2mo
1'
2 2c q — mo
3 2 _(l.l)*(-3co + 2mo)_

= (2c0 - mo) + (-3c0 + 2m0)(l.l)*

We can now see what happens as time passes. If 3co = 2nto, the second term drops out and the popu­
lation vector stays constant. If3c0 < 2m0, the first term is still constant, and the second term increases
exponentially; but note that the contribution to the mouse population is double the contribution to
the cat population. And if 3c0 > 2mQ, we see that the population vector decreases exponentially, the
mouse population being the first to disappear (why?).
438 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

The story for a general diagonalizable matrix A is the same. The column vectors of P
are the eigenvectors Vi,..., v„, and the entries of A* are A*,..., AJ, and so, letting

we have

This formula will have all the information we need, and we will see physical interpretations
of analogous formulas when we discuss systems of differential equations shortly.

► EXAMPLE 2

(The Fibonacci Sequence) The renowned Fibonacci sequence,


1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...,

is obtained by letting each number (starting with the third) be the sum of the preceding two: If we let
ak denote the fc* number in the sequence, then

flfc+1 = fl* + fl*-i, flo = fli = 1•

ak
Thus, if we define xfc = , k > 0, then we can encode the pattern of the sequence in the matrix
a*+i
equation

Ofc+l

In other words, setting

we have

Xt+i = Axk for all k > 0, with Xq =

Once again, by computing the powers of the matrix A, we can calculate x* = A*Xb, and hence the A01
term in the Fibonacci sequence.
The characteristic polynomial of A is p(t) = t2 —t — 1, and so the eigenvalues are
3 Difference Equations and Ordinary Differential Equations •*< 439

The corresponding eigenvectors are

1
A.2

Then

1
and

so we have

1 — A-2 1 Xi
Ai - 1 _ ~ _ -X2

Now we use the formula (*) above to calculate

xfc = A*xo = ciAjvi + C2A2V2

In particular, reading off the first coordinate of this vector, we find that the number in the Fibonacci
sequence is

It’s far from obvious (at least to the author) that each such number is an integer. We would be remiss
if we didn’t point out one of the classic facts about the Fibonacci sequence: If we take the ratio of
successive terms, we get
1 /jk4-2
Ok+1 _ V5 VA1 “ A2 )
a„

Now, |1 « .618, so lim = 0 and we have


i-»oo

lim — = Ai « 1.618.
k-»oo Cl/,

This is the famed golden ratio. ◄

3.2 Systems of Differential Equations


Another powerful application of linear algebra comes from the study of systems of ordinary
differential equations (ODE’s). For example, we have the constant-coefficient system of
linear ordinary differential equations:

xi(t) - anxi(t) + ai2X2(t),


x2(t) = «2i^i (?) + 022X2(1).

Here, and throughout this section, we use a dot to represent differentiation with respect to
t (time).
440 Chapter 9. Eigenvalues, Eigenvectors, and Applications

The main problem we address in this section is the following: Given an n x n (constant)
matrix A and a vector xo e R", we wish to find all differentiable vector-valued functions
x(t) so that

x(t) = Ax(t), x (0) = Xq .

(The vector xo is called the initial value of the solution x(r)-)

► EXAMPLES

Supposes = l,sothatA = [a] for some real number a. Then we have simply the ordinary differential
equation

x(t) = ax(t), x(0) = %o-

The trick of separating variables that the reader most likely learned in her integral calculus course
leads to the solution x(r) = xQeat. As we can easily check, x(t) = ax(t), so we have in fact found a
solution. Do we know there can be no more? Suppose y (/) were any solution of the original problem.
Then the function z(t) = y(t)e~at satisfies the equation

z(r) = y(t)e~at + y(t) (~ae~at) = (ay(f)) e~at 4- y(r) (-ae~at) = 0,

and so z(r) must be a constant function. Since z(0) — y(0) = xo, we see that y(t) = xoeat. The
original differential equation (with its initial condition) has a unique solution. ◄

► EXAMPLE 4

Consider perhaps the simplest possible 2x2 example:

xi(t) =ax!(r)
x2(t) = bx2(t)

with the initial conditions X] (0) = (xi)o, x2(O) = (x2)o. In matrix notation, this is the ODE

x(t) = Ax(r), x(0) = xo, where

(xi)o
x(t) = _(x2)oj

Since xi (f) and x2(r) appear completely independently in these equations, we infer from Example 3
that the unique solution of this system of equations will be

xi (t) = (xi )oeat, x2 (r) = (x2)oebt.

In vector notation, we have

xi (t) eat
xo = £(t)xo,
x2(0 _ _ 0

where E(t) is the diagonal 2x2 matrix with entries eat and ebt. This result is easily generalized to
the case of a diagonal n x n matrix.
3 Difference Equations and Ordinary Differential Equations ◄ 441

Recall that for any real number x, we have the Taylor series expansion

00 Yk ii 1
(t) c'=^- = 1 + x + -x2 + -x3 + ... + -x‘ + ....

Now, given an n x n matrix A, we define a new n x n matrix eA, called the exponential of
A, by

That the series converges is immediate from Proposition 1.1 of Chapter 6. In general,
however, trying to evaluate this series directly is extremely difficult because the coefficients
of Ak are not easily expressed in terms of the coefficients of A. However, when A is a
diagonalizable matrix, it is easy to compute eA: There is an invertible matrix P so that
A = P-1 AP is diagonal. Thus, A = PAP"1 and Ak = PA^P-1 for all k e N, and so

► EXAMPLES

2 0
Let A = . Then A — PAP \ where
3 -1

1
A= and P —
1

Then we have

and etA = Pe^P'1 =

The result of Example 4 generalizes to the n x n case. Indeed, whenever we can solve
a problem for diagonal matrices, we can solve it for diagonalizable matrices by making the
appropriate change of basis. So we should not be surprised by the following result.

Proposition 3.1 Let Abe a diagonalizable n x n matrix. The general solution of


initial value problem

(t) x(r) = Ax(t), x(O) = Xo

is given by x(t) = etAXo.


442 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

Proof As above, since A is diagonalizable, there are an invertible matrix P and a


diagonal matrix A so that A = PAP-1 and etA = Pe'AP-1. Since the derivative of the
diagonal matrix
[" etkl
etk2
etA =

etkn _

is obviously
r

knetkn _
then we have
(eM)* = (P^P"1)’ = P(^A)’P-1
= P (AefA) P"1
= (PAP-1)(PerAP*"1) = AetA.

We begin by checking that x(t) = etA\o is indeed a solution:

x(t) = (eMxo)‘ = (AeM)xo = A(eMXo) = Ax(r),

as required.
Now suppose that y (t) is a solution of the equation (t), and consider the vector function
z(t) = e~tAy(t). Then by the product rule, we have
i(t) = (e”M)’y(O + Ay(t)
= -Ae~tAy(t) + e~fA(Ay(t)) = (-Ae~tA + e~MA)y(t) = 0,

as Ae~tA = e~tAA. This implies that z(t) must be a constant vector, and so

z(r) = z(0) = y(0) = x0,


whence y(t) = etAz(t) = etAxo for all t, as required. ■

Remark A more sophisticated interpretation of this result is the following: If we


view the system (t) of ODE’s in a coordinate system derived from the eigenvectors of the
matrix A, then the system is uncoupled.

► EXAMPLE 6

Continuing Example 5, we see that the general solution of the system x(r) = Ax(t) has die form

for appropriate constants ci and C2


3 Difference Equations and Ordinary Differential Equations 443

Of course, this is the expression we get when we write

Cl Cl
x(t) = etA = P I etAP~x
C2 C2

and obtain the familiar linear combination of the columns of P (which are the eigenvectors of A). If,
in particular, we wish to study the long-term behavior of the solution, we observe that lim e~' = 0

1
and lim e2' = oo, so that x(t) behaves like c^2' as t -> oo. In general, this type of analysis of
r->oo 1
diagonalizable systems is called normal mode analysis, and the vector functions

1 o
e‘ and e f
1 1

corresponding to the eigenvectors are called the normal modes of the system,

To emphasize the analogy with the solution of difference equations earlier and the
formula (*) on p. 438, we rephrase Proposition 3.1 to highlight the normal modes.

Corollary 3.2 Suppose A is diagonalizable, with eigenvalues Xi,..., A„ and corre­


sponding eigenvectors Vi,..., vn, and write A = PAP"1, as usual. Then the solution of
the initial value problem

x(t) — Ax(t), x(0) = xo

is

x(r) = etAXo = PetA(P 1xo)

“1 e^11 Cl
ek2t C2
• (tt) ▼i v2
gAn,

= cieMfVi + C26X2tv2 H-------- F cnekntvn,

where

Cl

C2
P-Ixo =

c,

Note that the general solution is a linear combination of the normal modes eklt Vi,..., ekntvn.
444 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

Even when A is not diagonalizable, we may differentiate the exponential series term-
by-term2 to obtain
f2 A2 + +3 fk \*

(
/ + M+2i 3!A3 + - + ^ + CTA‘+1 + -)

*2 fk—1 fk
= A+tA2 + <_A^ + ... + _l_A^L.AM + ...

fl fk’—X. fk
= A(Z + tA + -A2 + A‘-‘ + —Ak + ••■)= At?*.
\lv X) • Iv I

Thus, we have

Theorem 3.3 Suppose A is an n xn matrix. Then the unique solution of the initial
value problem

x(t) — Ax(t), x(O) = xo

is x(t) = etAXo.

► EXAMPLE?

Consider the differential equation x(r) = Ax(r) when

A= ° -1 .
1 0

The unsophisticated (but tricky) approach is to write this system out explicitly:

xi(t) = -x2(t)
x2(t)= Xi(t)

and differentiate again, obtaining

x‘i(r) = —x2(f) = -xi(t)


(#♦)
X2(t)= xy(t) =-x2(t).

That is, our vector function x(r) satisfies the second-order differential equation

x(r) = -x(r).

Now, the equations (♦♦) have the “obvious” solutions

xi (t) = ai cos t + bi sin t and x2(t) = a2 cos t + ba sin t

for some constants ay, a2, by, and b2 (although it is far from obvious that these are the only solutions).
Some information was lost in the process; in particular, since Xi = x2, the constants must satisfy the
equations

a2 = —by and b2 — ay.

2See Spivak, Calculus, 3 ed., chap. 24, for the proof in the real case; Proposition 1.1 of Chapter 6 applies to show
that it works for the matrix case as well.
3 Difference Equations and Ordinary Differential Equations •< 445

That is, the vector function

*l(t) a cos t - b sin t cost — sint a


x(t) = — —
a sin t + b cos t sint cost b
L X2(t) J

gives a solution of the original differential equation.


On the other hand, Theorem 3.3 tells us that the general solution should be of the form

x(t) = eMXo,

and so we suspect that

sin t cos t

should hold. Well,

t2
^ = z + fA+_A2+_A3 + _.A4 + ...
t3 t*

1 0 0 -1 t2 -1 0 t3 0 1 t4 1 0
— +r + 2! + 3! + 4!
0 1 1 0 0 -1 -1 0 0 1

Since the power series expansions (Taylor series) for sine and cosine are, indeed,

cos' =1 - it'2 + ii'4+■ ■ ■+ (-1)t (2B'“ + ■ ■ ■ ■


the formulas agree. (Another approach to computing e‘A is to diagonalize A over the complex
numbers, but we don’t stop to do this here.3) 4

► EXAMPLES

Let’s now the consider the case of a nondiagonalizable matrix, e.g.,

The system

i1(t) = 2xi+ x2
x2(t) 2x2

3But we must remind you of the famous formula, usually attributed to Euler: e‘l = cos t + i sin t.
446 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

is already partially uncoupled, so we know that x2 (f) must take the form x2 (t) = ce2t for some constant
c. Now, in order to find Xi (t), we must solve the inhomogeneous ODE

xi(t) = 2xi(t) + ce2*.

In elementary differential equations courses, one is taught to look for a solution of the form

Xi(r) =ae2r + bte2t;

in this case,

xi(t) = (2a + h)e2r 4- (2b)te2( = 2xi(t) 4- be21,

and so taking b = c gives the desired solution of our equation. That is, the solution of the system is
the vector function

ae2t + cte2t _ e2*


x(t) =
ce21 0

The explanation of the trick is quite simple. Let’s calculate the matrix exponential etA by writing

2 0 0 1 0 1
A= — 214- B, where B—
0 2 0 0 0 0

The powers of A are easy to compute because B2 = 0: By the binomial theorem,

(21+ B)k = 2kI+k2k~'B,

and so
°° fk °° fk
fc=0 fc=0
00 (Qf\k 00 fk
*=0 K‘ k=0 K'
= e*l + t± W b =^/+i ^b
fc=l
(t-l)l s *1

0 e21

A similar phenomenon occurs more generally (see Exercise 14).

Let’s consider the general n* order linear ODE with constant coefficients:

(★ ) y(n)(t) 4- a„_iy(n-1)(t) 4- •• • 4- a2y(t) 4- a^y(t) 4- aoy(t) — 0.

Here o q , ai,..., an-i are scalars, and y (t) is assumed to be Qn; y denotes its kP derivative.
We can use the power of Theorem 3.3 to derive the following general result.
3 Difference Equations and Ordinary Differential Equations 447

Theorem 3.4 Let n e N. The set of solutions of the n* order ODE (★ ) is an n-


dimensional subspace ofC°°(R), the vector space^ of smooth functions. In particular, the
initial value problem

y(n)(t) + y(n-1)(r) H-------- b a2y(t) + ary(t) + ooy(r) = 0

y(O) = eo, y(0) = ci, y(0) = c2, ..., y(n-1)(0) = cn^

has a unique solution.

Proof The trick is to concoct a way to apply Theorem 3.3. We introduce the vector
function x(f) defined by

y(t)
y(t)
x(f) = y(0

and observe that it satisfies the first-order system of ODE’s

" y(t) “ 0 1 0 • 0 y(t)


y(0 0 0 1 • 0 y(t)
x(r) = y(t) — 0 0 0 '• 0 y(0

_yW(t)_ —cio -dl -a2 •• _y(n'1)(t)_

= Ax(t),

where A is the obvious matrix of coefficients. We infer from Theorem 3.3 that the general
solution is x(r) = eMxo, so

y(f) co

y(t) Cl
y(t) = elA C2 = covi(r) + civ2(?) + • • • + c„_ivn(r),

_ Cn'1 _

where v7 (r) are the columns of etA. In particular, if we let qi(t),..., qn(t) denote the first
entries of the vector functions vi(f),..., vn(f), respectively, we see that

y(r) = co<h(O + ci«2(0 4----- + Cn-iqn(t);

4See Section 3.1 of Chapter 4.


448 > Chapter 9. Eigenvalues, Eigenvectors, and Applications

that is, the functions q i,..., qn span the vector space of solutions of the differential equation
(★ ). Note that these functions are C00 since the entries of etA are. Last, we claim that these
functions are linearly independent. Suppose that for some scalars c q , c i ,..., c„-i we have

y(t) = Co^i(t) +Cltf2(t) H-------- F <?n-lQn(t) — 0;

then, differentiating, we have the same linear relation among the kP derivatives of q\,...,
qn, k = 1,..., n — 1, and so we have

y(t) co
y(t) Cl
y(t) = etA C2

Cn-1 _

Since etA is an invertible matrix (see Exercise 17), we infer that co ~ ci = • • • — cn-i = 0,
and so {qi,..., qn} is linearly independent. ■

> EXAMPLE 9

Let

-3
2

and consider the second -order system of ODE’s

x(0 = Ax(0, x(0) = Xo, x(0) = Xq .


The experience we gained in Example 7 suggests that if we can uncouple this system (by finding
eigenvalues and eigenvectors), we should expect to find normal modes that are sinusoidal in nature.
The characteristic polynomial of A is p(f) = t2 + 6if + 5, and so its eigenvalues are X] = — 1
and X2 = —5, with corresponding eigenvectors

vi = and v2 =

(Note, as a check, that because A is symmetric, the eigenvectors are orthogonal. See Exercise 9.2.8.)
As usual, we write P~l AP = A, where

and

Let’s make the “uncoupling” change of coordinates y = P i.e.,

yi _ 1 1 1 *1
y2 . 2 _1 -1 X2

Then the system of differential equations becomes

y(r) = p-1x(/) = P-1Ax — AP-1x = Ay;


3 Difference Equations and Ordinary Differential Equations 449

i.e.,

j’i(r) = -yi

y2(r)= -5y2,

whose general solution is

yi (t) = ai cos t + b} sin t


y2(t) = a2 cos V5t 4- b2 sin V5t.

This means that in the original coordinates, we have x = Py; i.e.,

Xi 1 1 ai cost +Z>i sin/


x2 1 -1 a2 cos >/5t 4- b2 sin -s/5/

= (a\ cost + hi sin/) 4- (a2 cos V5t 4- b2 sin V5t) 1

The four constants can be determined from the initial conditions Xo and Xo- In particular, if we start
with

then Ui = = 5 and bi = b2 = 0. Note that the form of our solution looks very much like the
normal mode decomposition of the solution (ft) of the first-order system earlier.
A physical system that leads to this differential equation is the following. Hooke’s Law says that
a spring with spring constant k exerts a restoring force F = —kx on a mass m that is displaced x units
from its equilibrium position (corresponding to the “natural length” of the spring). Now imagine a
system, as pictured in Figure 3.1, consisting of two masses (mi and m2) connected to each other and
to walls by three springs (with spring constants ki, k2, and fc3). Denote by jq and x2 the displacement
of masses m\ and m2, respectively, from equilibrium position. Hooke’s Law, as stated above, and
Newton’s second law of motion (“force = mass x acceleration”) give us the following system of
equations:

mix’i(r) = -kiXi + k2(x2 -Xi)= - (kx 4- fc2)xi + k2x2


m2x2(t) = k2(xi - x2) - k?x2 = k2x\ — (k2 + k2)x2.

Setting mi =m2 = 1, ki = k3 = 1, and k2 =2 gives the system of differential equations with which
we began. Here the normal modes correspond to sinusoidal motion with Xi = x2 (so we observe the
masses moving “in parallel,” the middle spring staying at its natural length) and frequency 1 and
sinusoidal motion with Xi = — x2 (so we observe the masses moving “in antiparallel,” the middle
spring compressing symmetrically) and frequency a /5.

Figure 3.1
450 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

3.3 Flows and the Divergence Theorem


Let U C R" be an open subset. Let F: U -> R" be a vector field on U. So far we have
dealt with vector fields of the form F(x) = Ax, where A is an n x n matrix. But, more
generally, we can try to solve the system of differential equations

x(t) = F(x(t)), x(O)=:Xo.

We will write the solution of this initial value problem as 0r(xo), indicating its functional
dependence on both time and the initial value. The function 0 is called the flow of the vector
field F. Note that 0o(x) = x for all x € 17.

> EXAMPLE 10
a. The flow of the vector field F(x) = x on R is <pt (x) = e‘x.
-y
b. The flow of the vector field F on R2 is
X

— sinr
cost

i.e., the flow lines are circles centered at the origin.

c. Let A = . The flow of the vector field F(x) = Ax on R2 is


1

where

-1

d. The flow of the general linear differential equation x(f) = Ax(t) is given by 0r(x) = ^Mx.
Finding an explicit formula for the flow of a nonlinear differential equation may be somewhat
difficult.

It is proved in more advanced courses that if F is a smooth vector field on an open set
U c Rn, then for any x e U, there are a neighborhood V of x and e > 0 so that for any
y e V the flow starting at y, <f>t (y), is defined for all |/1 < e. Moreover, the function 0: V x

(—e, s) -> R", 0 = 0f(y), is smooth. We now want to give another interpretation of

divergence of the vector field F, first discussed in Section 6 of Chapter 8. It is a natural


generalization of the elementary observation that the derivative of the area of a circle with
respect to its radius is the circumference.
3 Difference Equations and Ordinary Differential Equations 451

First we need to extend the definition of divergence to n dimensions: If F =


a smooth vector field on R", we set

3F2 3J1L
div F = —----- F —----- F • • • +
oxi 3x 2 dxn '

Proposition 3.5 Let F be a smooth vector field on U C R”, let denote the flow
ofF, and let Q CU be a compact region with piecewise smooth boundary. Let V(t) =
vol(0f(Q)). ThenV(O)= [ divFdV.
Jq

Remark Using (the obvious generalization of) the Divergence Theorem, Theorem
6.2 of Chapter 8, we have the intuitively appealing result that V(0) = / F • ndS. That is,
JdQ
what causes net increase in the volume of the region is flow across its boundary.

► EXAMPLE 11

a. In Figure 3.2, we see the flow of the unit square under the vector field F

Note that area is preserved under the flow, as div F = 0.

b. In Figure 3.3 (with thanks to John Polking’s MATLAB software ppi ane 5), we see the flow
of certain regions fl. In (a), the region expands (as div F > 0), whereas in (b) the region
maintains its area (as div F = 0).
452 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

x = x + 2y x = -x + 2y
y = 5x + y

2 2-
1.5 1.5 -
1 1 -
0.5 0.5 -
* 0 0-
-0.5 -0.5 -
-1 -1 -
-1.5 -1.5 -
-2 -2-
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2
x x
(a) (b)

Figure 3.3

Proof We have
V(f) = [ A ■ • • A dxn) = f d^c)i A • • • A d(^t)n.

By Exercise 7.2.20, we have


V(°)= f <M)1 A ••■A</(«,)„.
J a dt I t=o
d 2d»t a2#. d
Now the fact that mixed partials are equal tells us that -
ut OXi dxidt dt '

= d(j>t). Moreover, $0(x) = F(x) (since <A/(x) = F(^(x))), and 0o(x) = x, so


the latter integral can be rewritten

V(0) = / (dFi A dX2 A • • • A dxn + dxy A dF2 A dx-} A • • • A dxn


Ja
+ • • • + dx\ A • • • A dxn-i A dFn)

div Fdxi A • • • A dxn,

as required. ■

► EXERCISES 9.3
2 5
1. Let A = . Calculate Ak for all k > 1.
1 -2

*2. Suppose each of two tubs contains two bottles of beer; two are Budweiser and two are Beck’s.
Each minute, Fraternity Freddy picks a bottle of beer at random from each tub and replaces it in the
other tub. After a long time, what portion of the time w ill there be exactly one bottle of Beck’s in the
3 Difference Equations and Ordinary Differential Equations 453

first tub? At least one bottle of Beck’s? (Hint: Let x* be the vector whose entries are, respectively,
the probabilities that there are two Beck’s, one of each, or two Buds in the first tub.)
*3. Gambling Gus has $200 and plays a game where he must continue playing until he has either lost
all his money or doubled it. In each game, he has a 2/5 chance of winning $100 and a 3/5 chance of
losing $100. What is the probability that he eventually loses all his money? (Warning: Calculator or
computer suggested.)
*4. If <2o — 2, = 3, and a*+i = 3ak — 2ak^, for all k > 1, use methods of linear algebra to deter­
mine the formula for ak.
5. If «o — «i = 1 and a*+i = ak + 6aki for all k > 1, use methods of linear algebra to determine
the formula for ak.
6. Suppose oo = 0, = 1, and ak+i = 3ak + 4ak~i for all k > 1. Use methods of linear algebra to
find an explicit formula for ak.
7. Ifa0 = 0,ai = Landa^ = 4a* — 4a*_i for all k > 1, use methods of linear algebra to determine
the formula for ak. (Hint: The matrix will not be diagonalizable, but you can get close if you stare at
Exercise 9.2.16.)
*8. If o q = 0, ai = a2 = 1, and ak+i = 2ak + ak_i — 2a*_2 for k > 2, use methods of linear algebra
to determine the formula for ak.
9. Consider the cat/mouse population problem studied in Example 1. Solve the following versions,
including an investigation of the dependence on the original populations.
(a) ck+1 = 0.7c* + 0.1m* (c) ck+i = 1.1c* + 0.3m*
mk+l = -0.2c* + mk mk+i = 0.1c* + 0.9m*
*(b) c*+i = 1.3c* + 0.2m*
m*+i = -0.1c* 4- m*
What conclusions do you draw?
10. Check that if A is an n x n matrix and the n x n differentiable matrix function E(t) satisfies
E(t) = AE(t) and E(0) = 1, then E(r) = etK for all t e R.
11. Calculate etA and use your answer to solve x(r) = Ax(t), x(0) = Xq .

'(a) A = *(d) A =

(b) A = *(e) A =

(c) A = (f) A —
454 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

2
*(d) A=
1

13. Find the motion of the two-mass, three-spring system in Example 9 when
(a) mi — m2 = 1 and k\ = k3 = 1, k^ = 3,
(b) mi = m2 = 1 and k\ = 1, = 2, k3 — 4,
*(c) mi — 1, m2 = 2, k\ = 1, and = k3 = 2.
*14. Let

1
2
Calculate etJ.
*15. By mimicking the proof of Theorem 3.4, convert the following second-order differential equa­
tions into first-order systems and use matrix exponentials to solve:
(a) y(r) - y(t) - 2y (t) = 0, y(0) = -1, y(0) = 4,
(b) y(t) ~ 2y(t) + y(t) = 0, y(0) = 1, y(0) = 2.
16. Let a, b e R. Convert the constant coefficient second-order differential equation
y(t)+ay(t) + by(t) =0

y(0
into a first-order system by letting x(t) = . Considering separately the cases a2 - 4b 0
y(0
and a2 — 4b = 0, use matrix exponentials to find the general solution.
17. (a) Prove that for any square matrix A, (eA)-1 = e~A. (Hint: Show (efA) 1 = elA for all
t e R.)
(b) Prove that if A is skew-symmetric (i.e., AT = —A), then eA is an orthogonal matrix.
(c) Prove that when the eigenvalues of A are real, det(eA) = e*A. (Hint: Prove the result when A is
diagonalizable and then use continuity to establish it in general. Alternatively, apply Exercise 9.2.22.)
18. Consider the mapping exp: A4nxn A4nxn given by exp(A) = eA. By Exercise 17, eA is
always invertible.
(a) Use the Inverse Function Theorem to show that for every matrix B sufficiently close to I, there
is a unique A sufficiently close to O so that eA = B.

(b) Can the matrices be written in the form eA for some A?

19. Use Proposition 3.5 to deduce that the derivative with respect to r of the volume of a ball of
radius r (in Rn) is the volume (surface area) of the sphere of radius r.
20. It can be proved using (a generalization of) the Conn-action Mapping Principle, Theorem 1.2 of
Chapter 6, that when F is a smooth vector field, given a, there are 8, s > 0 so that the differential
equation x(t) = F(x(r)), x(0) = Xo, has a unique solution for all xo e B(a, 8) and defined for all
|r| < e.
(a) Assuming this result, prove that whenever |j |, |r|, and |j +1| < £, we have j>s+t = (Hint:
Fix t = t0 and vary 5.)
(b) Deduce that =(4:)“’.
(c) By considering the example F(x) = a /[x |, show that uniqueness may fail when the vector field
isn’t smooth. Indeed, show that the initial value problem i(t) = V|x(r)|, x(0) = 0, has infinitely
many solutions.
4 The Spectral Theorem A 455

21. Generalizing Proposition 3.5 somewhat, prove that V(t) = I div Fd V. (Hint: Use Exercise

20 and the proposition as stated.)


22. (a) Show that the space-derivative of the flow <j>t satisfies the^m variation equation
(D+W = Wr(xW). ^o(«) = I-
(b) For fixed x, let J(t) = det(D0r(x)). Using Exercise 7.5.23, show that
j(t) = div F(^,(x))J(r).
Deduce that J(t) — gio ***<♦,<«))*.

► 4 THE SPECTRAL THEOREM


We now turn to the study of a large class of diagonalizable matrices, the symmetric matrices.
Recall that a square matrix A is symmetric when A — AT. To begin our exploration, let’s
start with a general symmetric 2x2 matrix

a b~
A=
c

whose characteristic polynomial is p(t) = t2 — (a 4- c)t 4- (ac — b2). By the quadratic


formula, its eigenvalues are

(a 4- c) ± ,/(a + c)2 — 4(ac — b2) (a 4- c) ± y/(a — c)2 4- 4b2


- = _

Only when A is diagonal are the eigenvalues not distinct. Thus, A is diagonalizable.
Moreover, the corresponding eigenvectors are

b A.2 — C
v2 =
Ai — a b

note that

Vi • v2 = h(A2 — c) + (Ai — a)b = b(ki 4- A2 — a — c) = 0,

and so the eigenvectors are orthogonal. Since there is an orthogonal basis for R2 consisting
of eigenvectors of A, we of course have an orthonormal basis for R2 consisting of eigenvec­
tors of A. That is, by an appropriate rotation of the usual basis, we obtain a diagonalizing
basis for A.

► EXAMPLE 1

The eigenvalues of

1 2
A=
2 -2
456 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

are Ai =2 and A2 = -3, with corresponding eigenvectors

By normalizing the vectors, we obtain an orthonormal basis

See Figure 4.1.

From Proposition 4.5 of Chapter 1 we recall that for all x, y G R" and n x n matrices
A we have

Ax • y = x • ATy.

In particular, when A is symmetric,

Ax • y = x • Ay.

More generally, we say a linear map T: R" -> R” is symmetric if T (x) • y = x • T (y) for
all x, y g R”. It is easy to see that the matrix for a symmetric linear map with respect to
any orthonormal basis is symmetric.
In general, we have die following important result. Its name comes from the word
spectrum, associated with the physical concept of decomposing light into its components
of different colors.

Theorem 4.1 (Spectral Theorem) Let T: Rn -> Rn be a symmetric linear map.


Then

1. The eigenvalues of T are real.


2. There is an orthonormal basis for R" consisting of eigenvectors of T. That is,
if A is the standard matrix for T, then there is an orthogonal matrix Q so that
Q~XAQ = A is diagonal.

Proof We proceed by induction on n. The case n = 1 is automatic. Now assume


that the result is true for all symmetric linear maps T': Rn-1 -> R"-1. Given a symmetric
linear map T: R" -> Rn, we begin by proving that it has a real eigenvalue. We choose to
4 The Spectral Theorem 457

use calculus to prove this, but for a purely linear-algebraic proof, see Exercise 16. Consider
the function

By compactness of the unit sphere, f has a maximum subject to the constraint g(x) ==
||x ||2 = 1. Applying the method of Lagrange multipliers, we infer that there is a unit vector
v so that Df(y) = AZ)g(v) for some scalar A. By Exercise 3.2.14, this means

Av = Av,

and so we’ve found an eigenvector of A; the Lagrange multiplier is the corresponding


eigenvalue. (Incidentally, this was derived at the end of Section 4 of Chapter 5.)
By what we’ve just established, T has a real eigenvalue Ai and a corresponding eigen­
vector Vi of length 1. Note that if w • Vi = 0, then T(w) • Vi = w • T(vi) = AiW • vi = 0,
so that T(w) e W whenever w e W. Let W = (SpanCvi))1" c R", and let T' = T|w be
the restriction of T to W. Since dim W = n — 1, it follows from our induction hypoth­
esis that there is an orthonormal basis {v2,..., vn} for W consisting of eigenvectors of
T'. Then {vi, v2,..., vn} is the requisite orthonormal basis for Rn since T(vi) = AiVi and
T(Vf) = T'(Vz) = A/V/ for i > 2. ■

► EXAMPLE 2

Consider the symmetric matrix

1 1
A= 1 0
0 1

Its characteristic polynomial is p(t) = — t3 + 2t2 + t —2 = —(t -I- l)(r — l)(r — 2), sotheeigenval-
ues of A are — 1,1, and 2. As the reader can check, the corresponding eigenvectors are

1
Vi = v2 = -1 and v3 = 1
1 1

Note that these three vectors form an orthogonal basis for R3, and we can easily obtain an orthonormal
basis by normalizing:

The orthogonal diagonalizing matrix Q is therefore

i ◄I
75
i
76 72 75 J
458 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

► EXAMPLES

Consider the symmetric matrix

5 -4 -2
A= -4 5 -2
-2 -2 8

Its characteristic polynomial is p(t) = -t3 + 18? - 81r = -t(t - 9)2, so the eigenvalues of A are
0,9, and 9. It is easy to check that

2
Vi = 2
1

gives a basis for E(0) = N(A). As for E(9), we find

-4 -4 -2
A-91 = -4 -4 —2
-2 —2 -1

which has rank 1, and so, as the spectral theorem guarantees, E(9) is 2-dimensional, with basis

-1 -1
v2 = 1 and v3 = 0
0 2

If we want an orthogonal (or orthonormal) basis, we must use the Gram-Schmidt process, Theorem
5.3 of Chapter 5: We take w2 = v2 and let

w3 = v3 - projW2v3 =

It is convenient to eschew fractions, and so we let

-1
w^ = 2w 3 = -1
4

As a check, note that V], w2, do in fact form an orthogonal basis. As before, if we want the
orthogonal diagonalizing matrix Q, we take
"2" ‘ -1 " ~ -1 ’
1 1 - 1
41 = 3 2 , Q2 = ~7= 1 , and q3 = —— -1
72 3^2
_ 1 _ 0_ 4_

whence
2
3

Q=
2
3 V2
1
0 4
3

We reiterate that repeated eigenvalues cause no problem with symmetric matrices.


4 The Spectral Theorem •< 459

We conclude this discussion with a comparison to our study of projections in Chapter


5. Note that if we write out A = 0A0-1 = QA0T, we see that

i=l

This is the so-called spectral decomposition of A: Multiplying by a symmetric matrix A is


the same as taking a weighted sum (weighted by the eigenvalues) of projections onto the
respective eigenspaces. This is, indeed, a beautiful result with many applications in higher
mathematics and physics.

4.1 Conics and Quadric Surfaces


We now use the Spectral Theorem to analyze the equations of conic sections and quadric
surfaces.

► EXAMPLE 4

Suppose we are given the quadratic equation


X2 + 4X1X2 — 2x2 = &

to graph. Then we notice that we can write the quadratic expression

x2 + 4x ix 2 - 2x2 = j xj

where

is the symmetric matrix we analyzed in Example 1 above. Thus, we know that

where Q = -4= 2 -1 2 0
A = QAQ\ and A =
J5 1 2 0 -3

So, if we make the substitution y = Qrx, then we have


xTAx = x t (QAQt )x = (fiTx)TA(QTx) = yTAy = 2y2 - 3y%.

Note that the conic is much easier to understand in the >'i ^-coordinates. Indeed, we recognize that
the equation 2y2 — 3yj = 6 can be written in the form

Zi_Zi = i
3 2
from which we see that this is a hyperbola with asymptotes yi = i^fyi, as pictured in Figure 4.2.
Now recall that the yi ^-coordinates are the coordinates with respect to the basis formed by the
460 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

column vectors of Q. Thus, if we want to sketch the picture in the original XiXz-coordinates, we first
draw in the basis vectors q; and qj, and these establish the yi- and y2~ax®s, respectively, as shown in
Figure 4.3.

It’s worth recalling that the equation

represents an ellipse (with semiaxes a and h), whereas the equation


Y2 Y2
•*1 _ *2 _ i
a2 b2

±a
represents a hyperbola with vertices and asymptotes X2 =
0
Quadric surfaces include those shown in Figure 4.4: ellipsoids, cylinders, and hyper­
boloids of 1 and 2 sheets. There are also paraboloids (both elliptic and hyperbolic), but we
come to these a bit later. We turn to another example.

ellipsoid cylinder hyperboloid hyperboloid


of one sheet of two sheets

Figure 4.4
4 The Spectral Theorem <4 461

► EXAMPLES

Consider the surface defined by the equation

2X1*2 + 2*1*3 4~ *2 4" *3 = 2.

We observe that if

1 1
1 0
0 1

is the symmetric matrix from Example 2, then

xTAx = 2*1*2 4- 2*1*3 + *2 + *3 ,

and so we use the diagonalization and the substitution y = <2Tx, as before, to write

’-1 0 0“
xTAx = yTAy, where A= 0 1 0
0 0 2

yi
that is, in terms of the coordinates y — y2 , we have

2*1*2 4" 2*1*3 4* *2 4“ x3 = ~yf 4- y2 4- 2yj,

and the graph of — 4- y2 4- 2yf = 2 is the hyperboloid of one sheet shown in Figure 4.5. This is
the picture with respect to the “new basis” {qi, q2, qj} (given in the solution of Example 2). The
picture with respect to the standard basis, then, is as shown in Figure 4.6. (This figure is obtained by
multiplying by the matrix Q. Why?) *41

Figure 4.5 Figure 4.6


462 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

The alert reader may have noticed that we’re lacking certain curves and surfaces. If
there are linear terms present along with the quadratic, we must adjust accordingly. For
example, we recognize that
xf + 2x 2 = 1

is the equation of an ellipse centered at the origin. Correspondingly, by completing the


square, we see that

Xj 4- 2xi 4- 2x2 — 3x2 — ■y

-1
is the equation of a congruent ellipse centered at However, the linear terms become
3/4
all important when the symmetric matrix defining the quadratic terms is singular. For
example,
xf — Xi = 1

defines a pair of lines, whereas


xf - X2 = 1

defines a parabola.

► EXAMPLE 6
We wish to sketch the surface
5xi ~ 8*1*2 — 4x i X3 4- 5xf — 4x2X3 4- 8x3 4- 2xi 4- 2x2 4- X3 = 9.

No, we did not pull this mess out of a hat. The quadratic terms came, as might be predicted, from
Example 3. Thus, we make the change of coordinates given by y = CTx, with

Since x = Qy, we have


~ 2 1 _ 1
5 3J2 yi"
1] 2 1 1
2xi 4- 2x2 4- *3 = 2 2 3 ^2 V2
1 4
_ 3 0 _J3 -

and so our given equation becomes, in the y 1 y2V3 -coordinates,


9y224-9y32 4-3yi=9.

Rewriting this a bit, we have


yi = 3(1 - yj - yj),

which we recognize as a (circular) paraboloid, shown in Figure 4.7. The sketch of the surface in our
original xix2x3-coordinates is then shown in Figure 4.8. ”4
4 The Spectral Theorem 463

*3

Figure 4.8

► EXERCISES 9.4

1. Find orthogonal matrices that diagonalize each of the following symmetric matrices:

6 2 2 2 -2 1 -2 2~
*(a) *(c) 2 -1 -1 (e) -2 1 2
2 9
_ —2 -1 -1 2 2 L

~2 o' "1 0 1 o“
0 '3 2 2"
0 1 0 i
(b) 0 1 -1 *(d) 2 2 0 (f)
1 0 1 0
_0 -1 1_ _2 0 4_
.0 1 0 1_
464 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

2. Suppose A is a symmetric matrix with eigenvalues 2 and 5. If the vectors


~ 1 *

span the 5-eigenspace, what is A 1 ? Give your reasoning.


2 1
3. A symmetric matrix A has eigenvalues 1 and 2. Find A if 1 spans E(2).
1
1 2
4. Suppose A is symmetric, A , and det A = 6. Give the matrix A. Explain your
1 2
reasoning clearly. (Hint: What are the eigenvalues of A?)
*5. Prove that if A, is the only eigenvalue of a symmetric matrix A, then A = kl.
6. Decide (as efficiently as possible) which of the following matrices are diagonalizable. Give your
reasoning. _ —
5 0 2 "5 0 2

A = 0 5 0 , B = 0 5 0
0 0 5 _2 0 5
—— —
1 2 4 "1 2 4

C= 0 2 2 , D= 0 2 2
_0 0 3_ _0 0 1
7. Suppose A is a diagonalizable matrix whose eigenspaces are orthogonal. Prove that A is symmetric.
8. Suppose A is a symmetric n x n matrix. Using the spectral theorem, prove that if Ax • x = 0 for
every vector x e Rn, then A = O.
9. Apply the spectral theorem to prove that any symmetric matrix A satisfying A2 = A is in fact a
projection matrix.
10. Suppose T is a symmetric linear map satisfying [T]4 = I. Use the spectral theorem to give a
complete description of T: R" -> R". (Hint: For starters, what are the potential eigenvalues of T?)
11. Let A be an m x n matrix. Show that ||A|| — Vk, where k is the largest eigenvalue of the
symmetric matrix ATA.
12. We say a symmetric matrix A is positive definite if Ax • x > 0 for all x 0, negative definite
if Ax • x < 0 for all x 0, and positive (resp., negative) semidefinite if Ax • x > 0 (resp., < 0) for
allx.
(a) Prove that if A and B are positive (negative) definite, then so is A + B.
(b) Prove that A is positive (resp., negative) definite if and only if all its eigenvalues are positive
(resp., negative).
(c) Prove that A is positive (resp., negative) semidefinite if and only if all its eigenvalues are non­
negative (resp., nonpositive).
(d) Prove that if C is any m x n matrix of rank n, then A = CTC has positive eigenvalues.
(e) Prove or give a counterexample: If A and B are positive definite, then so is AB. What about
AB + BA?
13. Let A be an n x n matrix. Prove that A is nonsingular if and only if every eigenvalue of ATA is
positive.
14. Prove that if A is a positive semidefinite (symmetric) matrix, then there is a unique positive
semidefinite (symmetric) matrix B with B2 = A.
4 The Spectral Theorem •*< 465

15. Suppose A and B are symmetric and AB = BA. Prove there is an orthogonal matrix Q so
that both Q~lAQ and 316 diagonal. (Hint: Let A be an eigenvalue of A. Use the Spectral
Theorem to show that there is an orthonormal basis for E(A) consisting of eigenvectors of B.)
16. Prove, using only methods of linear algebra, that the eigenvalues of a symmetric matrix are real.
(Hints: Let A = a 4- bi be a putative complex eigenvalue of A, and consider the real matrix
B = (A - (a 4- bi)l)(A - (a - bi)l) = A2 - 2a A + (a2 + b2)I = (A — al)2 + b2I.
Show that B is singular, and that if v e N(B) is a nonzero vector, then (A — al)v = 0 and b = 0.)
17. If A is a positive definite symmetric n x n matrix, what is the volume of the n-dimensional
ellipsoid {x € R" : Ax • x < 1}? (See also Exercise 7.6.3.)
18. Sketch the following conic sections, giving axes of symmetry and asymptotes (if any).
(a) 6x1X2 — 8*2 — 9 (d) 10*? 4- 6*1*2 4- 2x2 = 11
*(b) 3*j — 2*1*2 4- 3*2 = 4 (e) lx2 4- 12*i*a — 2*j — 2*i 4- 4*2 = 6
*(c) 16*f 24*1*2 + 9xf — 3*1 4- 4x2 = 5
19. Sketch the following quadric surfaces.
*(a) 3*? 4- 2x ix 2 4- 2*1*3 + 4*2*3 = 4
(b) 4*J — 2*1*2 — 2*1*3 4- 3*2 4- 4*2*3 4- 3*2 = 6
(c) —*2 4- 2xf — *3 — 4*1*2 — 10*1X3 4- 4x2*3 = 6
*(d) 2*2 4" 2*1*2 4“ 2*1X3 4* 2*2X3 — *1 4~ *2 4* X3 = 1
(e) 3*2 4- 4*1*2 4- 8*1*3 4- 4*2*3 4- 3*2 = 8
(f) 3*2 4- 2*1*3 — x2 4- 3xf 4- 2*2 = 0
20. Let a, b, c e R, and let Q(x) = ax2 4- 2b*i*2 4- ex2.
(a) The Spectral Theorem tells us that there exists an orthonormal basis for R2 with respect to whose
coordinates yi, y2 we have
Q(x) = Q(y) = Ay? 4-
In high school analytic geometry, one derives the formula
a—c
cot 2a = ———
2b
for the angle a through which we must rotate the *i*2-axes to get the appropriate yiy2-axes. Derive
this by using eigenvalues and eigenvectors, and determine the type (ellipse, hyperbola, etc.) of the
conic section Q(x) = 1 from a, b, and c. (Hint: Use the characteristic polynomial to eliminate A2 in
your computation of tan 2a.)
(b) Use the formula for Q. above to find the maximum and minimum of Q on the unit circle 11x || = 1.
21. In this exercise we consider the nature of the restriction of a quadratic form to a hyperplane. Let
A be a symmetric n x n matrix.
(a) Show that the quadratic form Q(x) = xTAx on R" is positive definite when restricted to the
subspace xn = 0 if and only if all the roots of

are positive.
466 ► Chapter 9. Eigenvalues, Eigenvectors, and Applications

(b) Use the change-of-basis theorem to prove that the restriction to the subspace b • x = 0 is positive
definite if and only if all the roots of
I
A-t! b =0
-----------------------------l_
---- bT ------ 0
are positive.
(c) Use this result to give a bordered Hessian test for the point a to be a constrained maximum
(minimum) of the function f subject to the constraint g = c. (See Exercises 5.4.34 and 5.4.32b.)
(d) What is the analogous result for an arbitrary subspace?
22. We saw in Section 3 of Chapter 5 that we can write a symmetric n x n matrix A in the form
A = LDlJ (where L is lower triangular with diagonal entries 1 and D is diagonal); we saw in this
section that we can write A = QAQ* for some orthogonal matrix Q. Although the diagonal entries
of D obviously need not be the eigenvalues of A, the point of this exercise is to see that the signs of
these numbers must agree. That is, the number of positive entries in D equals the number of positive
eigenvalues of A, the number of negative entries in D equals the number of negative eigenvalues of
A, and the number of zero (diagonal) entries in D equals the number of zero eigenvalues.
(a) Assume first that A is nonsingular. Consider the “straight line path” joining I and L (stick a
parameter s in front of the nondiagonal entries of L and let s vary from 0 to 1). We then obtain a
path in Mnxn joining D and A. Show that all the matrices in this path are nonsingular and, applying
Exercise 8.7.9, show that the number of positive eigenvalues of D equals the number of positive
eigenvalues of A. Deduce the result in this case.
(b) In general, prove that the number of zero diagonal entries in D is equal to dim N(A) = dim E(0).
By considering the matrix A 4- e I for s > 0 sufficiently small, use part a to deduce the result.

Remark Comparing Proposition 3.5 of Chapter 5 with Exercise 12 above, we can easily derive
the result of this exercise when A is either positive or negative definite. But the indefinite case is
more subtle.

GLOSSARY OF NOTATIONS
AND RESULTS FROM
SINGLE-VARIABLE
CALCULUS
► Notations

Notation Definition Discussion/Page reference


e is an element of x e X means that x belongs to the set X
c subset X c Y means that every element x of X be­
longs to Y as well; two sets X and Y are equal
ifX C PandK cX
proper subset X C Y means that X c Y and X Y
implies P => Q means that whenever P is true, Q
must be as well
if and only if P <=> Q means P => Q and
Q=^P
gives by row operations Seep. 130
© binomial coefficient 171

partial derivative of f with respect to Xj 82


dxj
d2f
second-order partial derivative 120
dxjdxt
V2f Laplacian of f 122
Vf gradient of f 104
f fdA, [ fdV
integral of f over R 268
JR JR
A wedge product 338
dR boundary of R 358
Az ilb row vector of the matrix A 28
7th column vector of the matrix A 28
A-1 inverse of the matrix A 34
AT transpose of the matrix A 36
A& matrix giving rotation through angle 0 27

467
468 ► Glossary

Notation Definition Discussion/Page reference


Ai vector corresponding to the directed line seg­ 1
ment from A to B
AB product of the matrices A and B 31
Ax product of the matrix A and the vector x 28
Aij (n — 1) x (n — 1) matrix obtained by delet­ 315
ing the Ith row and the j,th column from the
n x n matrix A
B(a, 8) bail 65
B(a, 8) closed ball 70
B basis 413
e1, e*, e°° continuously differentiable, smooth functions 93,120,120,167
C(A) column space of the matrix A 171
Cb coordinates with respect to a basis B 415
cij ijA cofactor 315
curlF curl of vector field F 393
d (exterior) derivative 340
Df(a) derivative of f at a 87
Dvf(a) directional derivative of f at a in direction v 83
det A determinant of the square matrix A 309
divF divergence of vector field F 395
{ei,... , e„} standard basis for R" 19,162
E(X) x-eigenspace 423
eA exponential of the square matrix A 441

f ; function of a vector variable 60


<Xn)
f extension of /byO 272
f average value of f 300
g*<y pullback of a) by g 342
graph(/) graph of the function f 57
Hess(/) Hessian matrix of f 209
^Cf,a quadratic form associated to Hessian of f 209
I identity matrix, moment of inertia 303
In n x n identity matrix 34
image (7) image of a linear transformation T 172
k (s ) curvature 115
ker(T) kernel of a linear transformation T 172
limf(x) limit of f (x) as x approaches a 72
x-*a
*(g) arclength of g 112
K/,?) lower sum of f with respect to partition ? 267
Glossary •< 469

Notation Definition Discussion/Page reference


A*(R")* vector space of alternating multilinear func­ 337
tions from (R")* to R
A4mXn vector space of m x n matrices 167
P*A linear transformation defined by multiplica­ 28
tion by A
n outward-pointing unit normal 364
N(A) nullspace of the matrix A 172
N(j ) principal normal vector 115
(0 differential form 339
*a> star operator 346
ft region 273
CP plane, parallelogram, or partition 16,43,267
vector space of polynomials of degree < k 168
Pa (O characteristic polynomial of the matrix A 426
projyx projection of x onto y 10
projvb projection of b onto the subspace V 226
Q quadratic form 210
r,e polar coordinates 288
r,9,z cylindrical coordinates 292
P, <l>,0 spherical coordinates 294
R" (real) n-dimensional space 1
(Rn)* vector space of linear maps from R" to R 335
i? rectangle 65
R(A) row space of the matrix A 171
p(x) rotation of x € R2 through angle ?r/2 14
5n-i unit sphere 403
Span(vb ..., vt) span of Vi,..., v* 19
supS least upper bound (supremum) of S 69
S closure of the subset S 70
T linear map (or transformation) 24
[T] standard matrix of linear map T 24
lirii norm of the linear map T 97
miD cubical norm of the linear map T 324
T(j ) unit tangent vector 115
trA trace of the matrix A 41
£/(/,?) upper sum of f with respect to partition CP 267
t/ +V sum of the subspaces U and V 23
unv intersection of the subspaces U and V 23
orthogonal complement of subspace V 22
xxy cross product of the vectors x, y e R3 48
Hx|| length of the vector x 2
x* sequence 66
470 ► Glossary

Notation Definition Discussion/Page reference

Xty subsequence 70
X least squares solution 227
x y dot product of the vectors x and y 8
(x,y) inner product of the vectors x and y 238
X11, X1 components of x parallel to and orthogonal to 9
another vector
0 zero vector 1
0 zero matrix 30

► RESULTS FROM SINGLE-VARIABLE CALCULUS


Intermediate Value Theorem: Let f: [a, b] -+ R be continuous. Then for any y between f(a) and
f(b), there is x e [a, b] with f(x) = y.
Rolle's Theorem: Suppose/: [a,b] -> discontinuous,/is differentiable on (a, b), and/(a) =
/(b). Then there is c € (a, b) so that /'(c) = 0. Proof: By the maximum value theorem, Theorem 1.2
of Chapter 5, / takes on its maximum and minimum values on [a, b]. If / is constant on [a, b], then
/'(c) = 0 for all c € (a, b). If not, say /(x) > f(a) for some x e (a, b), in which case / takes on
a global maximum at some c e (a, b). Then f(c) = 0 (by Lemma 2.1 of Chapter 5). Alternatively,
f(x) < /(«) some x 6 (a, b), in which case / takes on a global minimum at some c € (a, b).
Then in this case, as well, /'(c) = 0.
Mean Value Theorem: Suppose /: [a, b] -> R is continuous and / is differentiable on (a, b).
Then there is c e (a, b) so that /(b) - f(a) = /'(c)(b — a).
Fundamental Theorem of Calculus, Part I: Suppose / is continuous on [a, b] and we set F(x) =
I f(t)dt. Then F'(x) = /(x) for all x € (a, b).
Ja
Fundamental Theorem of Calculus, Part II: Suppose / is integrable on [a, b] and / = F'. Then
[ f(x)dx = F(b) — F(a).
Ja

Basic Differentiation Formulas:


product rule: (/g)' = f'g + fg'
quotient rule: (f/gf = (f'g ~ fg')/g2
chain rule: (f°gY = (f'°g)g'

Note: We use log to denote the natural logarithm (In).


Glossary ◄ 471

Function Derivative
x" nx”-1
ex ex
logx \/x
sinx cosx
cosx — sinx
tanx sec2x
secx secx tanx
cotx — CSC2 X
esex — CSCx cotx
arcsin x 1/V1 -X2
arctan x 1/(1+x2)

Basic Trigonometric Formulas:


sin2 9 + cos2 0 = 1 tan2 0 + 1 = sec2 0 cot2 0 + 1 = esc2 0

cos 20 = cos2 0 - sin2 0 = 2 cos2 0 — 1 = 1 - 2 sin2 0

sin20 = 2sin0 cos0

law of cosines: c2 — a2 +b2 — 2ab cos y


, „ , sin a sin 5 siny
law of sines: ----- =------ = —-
a b c

Basic Integration Formulas:


integration by parts: j f'(x)g(x)dx = f(x)g(x) - f f(x)g'(x)dx

integration by substitution: f(g(x))g>(x)dx — F(g(x)), where F(w) = f f(u)du

Miscellaneous Integration Formulas:


∫ xⁿ dx = xⁿ⁺¹/(n + 1)  (n ≠ −1)
∫ eˣ dx = eˣ
∫ dx/x = log|x|
∫ sin x dx = −cos x
∫ cos x dx = sin x
∫ tan x dx = −log|cos x|
∫ sin²x dx = ½(x − sin x cos x)
∫ cos²x dx = ½(x + sin x cos x)
∫ tan²x dx = tan x − x
∫ sec x dx = log|sec x + tan x|
∫ sec²x dx = tan x
∫ sec³x dx = ½(sec x tan x + log|sec x + tan x|)
∫ sin³x dx = −cos x + ⅓cos³x
∫ cos³x dx = sin x − ⅓sin³x
∫ tan³x dx = ½tan²x + log|cos x|
∫ dx/(a² + x²) = (1/a) arctan(x/a)
∫ √(x² ± a²) dx = (x/2)√(x² ± a²) ± (a²/2) log|x + √(x² ± a²)|
∫ dx/√(x² ± a²) = log|x + √(x² ± a²)|
∫ √(a² − x²) dx = (x/2)√(a² − x²) + (a²/2) arcsin(x/a)
∫ log x dx = x log x − x
∫ e^{ax} sin bx dx = (e^{ax}/(a² + b²))(a sin bx − b cos bx)
∫ e^{ax} cos bx dx = (e^{ax}/(a² + b²))(a cos bx + b sin bx)

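As a check on the last two formulas, differentiating (e^{ax}/(a² + b²))(a sin bx − b cos bx) by the product rule gives (e^{ax}/(a² + b²))(a² sin bx − ab cos bx + ab cos bx + b² sin bx) = e^{ax} sin bx, as required; the companion formula is verified the same way.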
► GREEK ALPHABET

alpha α           iota ι            rho ρ
beta β            kappa κ           sigma σ, Σ
gamma γ, Γ        lambda λ, Λ       tau τ
delta δ, Δ        mu μ              upsilon υ, Υ
epsilon ε (ϵ)     nu ν              phi φ, Φ
zeta ζ            xi ξ, Ξ           chi χ
eta η             omicron ο         psi ψ, Ψ
theta θ, Θ        pi π, Π           omega ω, Ω

FOR FURTHER READING
Adams, Malcolm, and Theodore Shifrin. Linear Algebra: A Geometric Approach. New York: Freeman, 2002. Includes a few advanced topics in linear algebra that we did not have time to discuss in this text, e.g., complex eigenvalues, Jordan canonical form, and computer graphics.

Apostol, Tom M. Calculus (2 vols.), 2nd ed. Waltham, Mass.: Blaisdell, 1967. Although the first volume is needed for rudimentary vector algebra, the second volume includes linear algebra, multivariable calculus (although only treating the "classic" versions of Stokes's Theorem), and an introduction to probability theory and numerical analysis.

Bamberg, Paul, and Shlomo Sternberg. A Course in Mathematics for Students of Physics (2 vols.). Cambridge: Cambridge University Press, 1988. This book includes much of the mathematics of our course, as well as a volume's worth of interesting physics (using differential forms).

Edwards, Jr., C. H. Advanced Calculus of Several Variables. New York: Dover, 1994 (originally published by Academic Press, 1973). This very well-written book parallels ours for students who have already had standard courses in linear algebra and multivariable calculus. Of particular note is the last chapter, on the calculus of variations.

Friedberg, Stephen H., Arnold J. Insel, and Lawrence E. Spence. Linear Algebra, 3rd ed. Upper Saddle River, N.J.: Prentice Hall, 1997. A well-written, somewhat more advanced book, concentrating on the theoretical aspects of linear algebra.

Hubbard, John H., and Barbara Burke Hubbard. Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach, 2nd ed. Upper Saddle River, N.J.: Prentice Hall, 2002. Very similar in spirit to our text, this book is wonderfully idiosyncratic and includes Lebesgue integration, Kantorovich's Theorem, and the exterior derivative from a nonstandard definition. It also treats the Taylor polynomial in several variables.

Spivak, Michael. Calculus, 3rd ed. Houston, Tex.: Publish or Perish, 1994. The beautiful, ultimate source for single-variable calculus "done right."

Spivak, Michael. Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Boulder, Col.: Westview, 1965. A very terse and sophisticated version of this text, intended to introduce students who've seen linear algebra and multivariable calculus to the rigorous approach and to a more formal treatment of Stokes's Theorem.

Strang, Gilbert. Linear Algebra and Its Applications, 3rd ed. Philadelphia: Saunders, 1988. A classic text, with far more depth on applications.

More Advanced Reading

Do Carmo, Manfredo P. Differential Geometry of Curves and Surfaces. Englewood Cliffs, N.J.: Prentice Hall, 1976. A sophisticated approach using much of the material of this text.

Flanders, Harley. Differential Forms with Applications to the Physical Sciences. New York: Dover, 1989 (originally published by Academic Press, 1963). A short, sophisticated treatment of differential forms with applications to physics, topology, differential geometry, and partial differential equations.

Guillemin, Victor, and Alan Pollack. Differential Topology. Englewood Cliffs, N.J.: Prentice Hall, 1974. The perfect follow-up to our introduction to manifolds and the material of Chapter 8, Section 7.

Munkres, James. Topology, 2nd ed. Upper Saddle River, N.J.: Prentice Hall, 2000. A classic, extremely well-written text on point-set topology, to follow up on our discussion of open and closed sets, compactness, maximum value theorem, etc.

Shifrin, Theodore. Abstract Algebra: A Geometric Approach. Upper Saddle River, N.J.: Prentice Hall, 1996. A first course in abstract algebra that will be accessible to anyone who's enjoyed this course.

Wilkinson, J. H. The Algebraic Eigenvalue Problem. New York: Oxford University Press, 1965. An advanced book that includes a proof of the algorithm based on the Gram-Schmidt process to calculate eigenvalues and eigenvectors numerically.

ANSWERS TO SELECTED EXERCISES

1.1.2.

1.1.6
1.2.1 c. −25, θ = arccos(−5/13); f. 2, θ = arccos(1/5)

A 7___5 1
1.2.2 c.
13 -4 ’ 13 8

1.2.3 arccos √(2/3) ≈ .62 radians ≈ 35.3°


1.2.8 π/6
1.2.12 Let x be the vector from C to A and y the vector from C to B. Then y − x is the vector from A to B, and ‖y − x‖² = ‖y‖² − 2y·x + ‖x‖² = a² − 2ab cos θ + b².
1.3.1 b., e., g., h. yes; a., c., d., f., i. no
1.3.2 The argument is valid only if there is some vector in the subspace. The first criterion is equivalent to the subspace's being nonempty.
1.3.8 If v ∈ V ∩ V^⊥, then v · v = 0, so v = 0.
1. 2
1.4.1
0 3 5 81 fl
20j*8’ [a
4 5
h. not defined; k. 2 1
5 13 10 11
7 8

1 fl
1.4.9 b. Either A = for some real number fl or A = a , a any
0 0 -1/fl -1
real number, /I/O.
0 -1 0 -1 1 0 -1 0
1.4.13 a. 5: ;b.
-1 0 1 0 0 -1 0 1

1.4.17 a. BA²B⁻¹; b. BAⁿB⁻¹; c. assuming A invertible, BA⁻¹B⁻¹


1.4.19 Since Ax = 7x, we have x = (A⁻¹A)x = A⁻¹(Ax) = A⁻¹(7x) = 7(A⁻¹x), and so A⁻¹x = (1/7)x.


1.4.24 (AB)ᵀ = BᵀAᵀ = BA; thus, (AB)ᵀ = AB if and only if BA = AB.


1.4.27 A hint: It suffices to see why A_θ x · y = x · A_θ⁻¹ y. Since rotation doesn't change the length of vectors, we only need to see that the angle between A_θ x and y is the same as the angle between x and A_θ⁻¹ y.
1.4.31 a. (A⁻¹)ᵀ; b. switch the second and third columns of A⁻¹; c. multiply the third column of A⁻¹ by 1/2.
1.4.32 Since (AᵀA)x = 0, we have (AᵀA)x · x = 0. By Proposition 4.5, Ax · Ax = 0, and so ‖Ax‖ = 0. This means that Ax = 0.

1.4.34 d. By part c, every orthogonal matrix can be written either as Ae or as As

1.4.35 b. If A is orthogonal, then A⁻¹ = Aᵀ, and so (A⁻¹)ᵀ(A⁻¹) = (Aᵀ)ᵀAᵀ = AAᵀ = I by Exercise 34e.
1.5.5 a. (2, −2, 2)
1.5.6 a. √3
1.5.7 a. x₁ − x₂ + x₃ = 0; d. 3x₁ + 4x₂ + 5x₃ = 12
1.5.8 x₁ + x₂ − x₃ = −2
1.5.11 1/√6
1.5.12 2
" 0 -c b
1.5.15 c 0 —a
_—b a 0

2.1.5 a. f(θ) = ((a + b) cos θ − b cos((a + b)θ/b), (a + b) sin θ − b sin((a + b)θ/b)); b. f(t) = ; e. f(t) =
2.1.7 a. x = cos θ − log(csc θ + cot θ), y = sin θ;
b. x = t − (e^{2t} − 1)/(e^{2t} + 1), y = 2e^t/(e^{2t} + 1)
2.1.11 b. (x² + y² + z²)² − 10(x² + y²) + 6z² + 9 = 0
2.2.1 a., k. neither; c., e., f., h., i., l. open; b., d., g., j. closed; m. both
2.2.5 If y ∉ B̄(a, r), then ‖y − a‖ = s > r. Let δ = s − r. By the triangle inequality, for every point z ∈ B(y, δ) we have ‖y − a‖ ≤ ‖y − z‖ + ‖z − a‖, so ‖z − a‖ ≥ ‖y − a‖ − ‖y − z‖ > s − δ = r. Therefore, B(y, δ) ⊂ Rⁿ − B̄(a, r), and so, by Proposition 2.1, B̄(a, r) is closed.

2.2.10 a. If I_k = [a_k, b_k], let x = sup{a_k}. The set of left-hand endpoints is bounded above (e.g., by b₁), and so the least upper bound exists. We have a_k ≤ x for all k automatically. Now, if x > b_j for some j, then since I_k ⊂ I_j for all k ≥ j, this means that b_j is an upper bound of the set as well, contradicting the fact that x is the least upper bound. b. Take I_k = (0, 1/k).
2.2.13 Choose ε = 1. Then there is K ∈ N so that for all k > K we have ‖x_k − x_{K+1}‖ < 1, so ‖x_k‖ < 1 + ‖x_{K+1}‖. Therefore, for all j ∈ N, we have ‖x_j‖ ≤ max(‖x₁‖, ‖x₂‖, …, ‖x_K‖, ‖x_{K+1}‖ + 1).
2.3.8 a., b., c., d., e., g., h. yes; f., i., j. no
2.3.10 a. 2; d. a/2

3.1.1 a. ∂f/∂x = 3x² + 3y², ∂f/∂y = 6xy − 2; c. ∂f/∂x = −y/(x² + y²), ∂f/∂y = x/(x² + y²)


3.1.2 a. 3; b. 3/√5

3.1.11 T(v)
3.2.1 a. z = e⁻²(2x − y + 5); f. w = −x + y + z + 3
3.2.3 a. 0; e. 1

3.2.4 We get the approximate answer 34 taking a = 240 and b = 6 and the approximate answer 34.14 taking a = 210 and b = 7. My calculator gives me the "correct answer" 34.46 to two decimals.
3.3.1 −1
-1
6e3
6

D(gof)(0) = Dg 1 Df

3.3.5 a. 266 mph; b. approx. 187 mph


3.3.6 −2.5 atm/min
3.3.14 F′(t) = h(v(t))v′(t) − h(u(t))u′(t)
3.3.16 (∂f/∂x)² + (∂f/∂y)², evaluated at (r cos θ, r sin θ)
3.4.1 a. x + 4y = 9
3.4.2 b. 4x + y + 14z = 16

3.4.4 a. —4 ; b. — arctan 4
25

3.5.7 a. √5(e^b − e^a); c. 7


3.5.8 a. κ = 1; b. κ = √2/(e^t + e^{−t})²
3.5.12 κ = a/(a² + b²), τ = b/(a² + b²)
3.6.5 φ(x) = ½(h(x) + ∫₀ˣ k(u) du), ψ(x) = ½(h(x) − ∫₀ˣ k(u) du)

3.6.9 f = c log r + k for some constants c and k.

4.1.2 b., c., d., f., g. are in echelon form; c. and g. are in reduced echelon form.

2 "I f 1
-1
0

center , radius 5

4.1.8 b = v₁ − v₂ + v₃
4.1.9 b. yes; a., c. no
4.1.10 d. yes; a., b., c. no
4.1.11 b. 2b₁ + b₂ − b₃ = 0
’1’ F1 " "1 "
4.1.13 a. By Proposition 1.4, A 1 = 0, but since 0 1 / 0, this is impossible.
_0_ _1_ 0
4.1.14 a. 0, 3; b. for a = 0, b must satisfy b₂ = 0; for a = 3, b must satisfy b₂ = 3b₁.
4.1.18 a. none, as Ax = 0 is always consistent; b. take r = m = n; e. take r < n

|bi + b3 = 0;

0"| T 1 0 0 0"
0 0 10 0
0 -1010
1J L 0 0 0 1.

0
0
0
1

4.2.2 c. A⁻¹ = [−1 3 −2; −1 2 −1; 2 −3 2]; e. A⁻¹ = [−1 2 1; 5 −8 −6; −3 5 4]
_ — "1 -1 0 O' 2
"-2 0 r 0
0 1 -3 2
4.2.3 b. A1 = 9 -1 -3 ,x — 2 ; d. A-1 =
0 0 4 —3
-6 1 2_ _ -1 _
.0 0 -1 1. 0
4.3.2 a., b., d., e. yes; c., f. no
4.3.12

4.3.13

4.3.14

4.3.21 a. A hint: Use the definition of U + V to show that the vectors span. To establish linear independence, suppose c₁u₁ + c₂u₂ + ⋯ + c_k u_k + d₁v₁ + ⋯ + d_ℓ v_ℓ = 0. Then what can you say about the vector c₁u₁ + c₂u₂ + ⋯ + c_k u_k = −(d₁v₁ + ⋯ + d_ℓ v_ℓ)?
4.3.23 a., c., e. yes; b., d., f. no
4.4.1 Let's show that R(B) ⊂ R(A) if B is obtained by performing any row operation on A. Obviously, a row interchange doesn't affect the span. If B_i = cA_i and all the other rows are the same, c₁B₁ + ⋯ + c_iB_i + ⋯ + c_mB_m = c₁A₁ + ⋯ + (c_i c)A_i + ⋯ + c_mA_m, so any vector in R(B) is in R(A). If B_i = A_i + cA_j and all the other rows are the same, then c₁B₁ + ⋯ + c_iB_i + ⋯ + c_mB_m = c₁A₁ + ⋯ + c_i(A_i + cA_j) + ⋯ + c_mA_m = c₁A₁ + ⋯ + c_iA_i + ⋯ + (c_j + cc_i)A_j + ⋯ + c_mA_m, so once again any vector in R(B) is in R(A).
To see that R(A) ⊂ R(B), we observe that the matrix A is obtained from B by performing the (inverse) row operation (this is why we need c ≠ 0 for the second type of row operation). Since R(B) ⊂ R(A) and R(A) ⊂ R(B), we have R(A) = R(B).

4.4.3 f. R(A):

1
-1
1
N(A):
0
0
0.
’2 -1 o' '1 2'
4.4.4 ,T =
3 0 1 3 6
'1 0 -1 -1' “2 -1 0" "2 0 r
4.4.5 b. 0 1 0 0 ; c. 0 0 0 ; e. 0 2 1
_0 1 0 0_ _2 -1 0_ _2 2 2_

■ -3" --4‘ "1" ' O'


-2 5 0 1
4.4.8 ;b.
1 0 3 2
_ -5.
0

1_ _4_
4.4.10 Since U is a matrix in echelon form, its last m − r rows are 0. When we consider the matrix product A = BU, we see that every column of A is a linear combination of the first r columns of B; hence, these r column vectors span C(A). Since dim C(A) = r, these column vectors must give a basis (see Proposition 3.9).
5^
4.4.11 b. 161 + J&2

_ |hi - 1&2

dz dx 1-1/ . .2 idy \-l/ -~

4.5.6 It is a smooth surface away from the curve g(t). Indeed, M is the collection of all the tangent lines to this curve; this surface has a cuspidal edge along the curve.

4.5.11 a. Let F(x) = (x₁² + x₂² + x₃² + x₄² − 1, x₁x₂ − x₃x₄). Then M = F⁻¹({0}). Now DF(x) = [2x₁ 2x₂ 2x₃ 2x₄; x₂ x₁ −x₄ −x₃] has rank < 2 only at the origin.
b. x₁ = 1, x₂ = 0; x₁ − x₂ = x₄ − x₃ = 1.

5.1.1 a., b., h., k., l. are compact


5.1.4 a. 2
5.1.9 First, S is closed: Any convergent sequence of points in S has a subsequence converging to a point of S and hence must converge itself to that point of S (Exercise 2.2.6). Next, S is bounded: If not, we could take x_k ∈ S with ‖x_k‖ > k; then {x_k} would have no convergent subsequence.

5.2.1 a. ~3^2 ; g. J , "J ; i. 0

5.2.3 length and width 2r/√3, height r/√3


5.2.4 max 5, min −1
5.2.6 2′ × 2′ × 1′
5.2.10 Bend up 4″ on either side at an angle of π/3.

5.3.1 a. saddle point; g. is a local maximum point, is a local minimum point;

i. saddle point
5.3.3 We see two mountain peaks (global maxima) joined by two ridges (two saddle points) with a
deep valley (global minimum) between them.

5.3.6 c. A =

5.3.8 b. 2x₁x₂ = (½x₁ + x₂)² − (−½x₁ + x₂)²


5.4.2 0, 49/8
5.4.13 x²/18 + y²/2 = 1
5.4.15 semimajor axis 1, semiminor axis 1/√6
5.4.19 (5/√n)ⁿ
1
5A25 1 5
4

5.4.26 a.

5.4.27 ±

5.4.29 a. x = ±(2/√5, 1/√5), λ = 2; x = ±(1/√5, −2/√5), λ = −3

5.4.30 a. (1 + √5)/2 ≈ 1.62


5.5.1 b. (−1, 0, 1, 3)

I*i
5.5.15 b. + ^b2
. - |Z>2 _

5.5.16 b. The ith row of A⁻¹ is aᵢ/‖aᵢ‖².


5.5.19 b. f(t) = t² − t + 1/6 gives a basis.
5.5.20 Hint: Use the addition formulas for sin and cos to derive the formulas
sin kt sin ℓt = ½(cos(k − ℓ)t − cos(k + ℓ)t)
sin kt cos ℓt = ½(sin(k + ℓ)t + sin(k − ℓ)t).

6.1.3 x_k − x = (x₀ + Σⱼ₌₁ᵏ (xⱼ − xⱼ₋₁)) − (x₀ + Σⱼ₌₁^∞ (xⱼ − xⱼ₋₁)) = −Σⱼ₌ₖ₊₁^∞ (xⱼ − xⱼ₋₁), so ‖x_k − x‖ ≤ Σⱼ₌ₖ₊₁^∞ ‖xⱼ − xⱼ₋₁‖ ≤ (Σⱼ₌ₖ₊₁^∞ c^{j−1}) ‖x₁ − x₀‖ = (c^k/(1 − c)) ‖x₁ − x₀‖.
6.1.9 a. [1, 2]; x₀ = 1, x₁ = 1.5, x₂ = 1.41667; c. [0.26, 0.79]; x₀ = 0.785398, x₁ = 0.523599, x₂ = 0.514961
1 .984
6.1.11 a. B , 1/4 ]; to three decimals, the root is
1/4 .254

6.2.1 a. any x₀ ≠ 0, Dg(f(x₀)) =

6.2.9 a. a = l/T,0 = l/p


6.3.1 If X were a 1-dimensional manifold, in a neighborhood of 0 it would have to be a graph of the form y = f(x) or x = f(y) for some smooth function f. It is clearly not a graph over the y-axis, and f(x) = |x| is far from differentiable. The so-called parametrization has 0 derivative at t = 0.

y1
6.3.5 a. zero set of F y and graph of f(y) =

/x^
6.3.7 c. zero set of F I y = y x tan z and graph off I ] = x tan z away from z = (2n 4-l)xr/2,
W
n e Z; near such points, use F = x — y cot z and / I j = y cot z-

7.1.1 Let 𝒫₁ = {0 = x₀ < x₁ = 1} be the trivial partition of [0, 1] and let 𝒫₂ = {0 = y₀ < y₁ < y₂ < y₃ = 1} be a partition of [0, 1] with the properties that y₁ < ½ < y₂ and y₂ − y₁ < ε; set 𝒫 = 𝒫₁ × 𝒫₂. Then for j = 1, 3, we have m₁ⱼ = M₁ⱼ, whereas m₁₂ = 0 and M₁₂ = 1. Then
U(f, 𝒫) − L(f, 𝒫) = (M₁₂ − m₁₂)(y₂ − y₁) = y₂ − y₁ < ε,
and so, by the Convenient Criterion, Proposition 1.3, we infer that f is integrable. Now, for our particular partition 𝒫, we have
L(f, 𝒫) = 1 − y₂ < ½ < 1 − y₁ = U(f, 𝒫);
thus, 1/2 is the only number that can lie between all lower and upper sums, and therefore ∫_R f dA = 1/2.
7.1.8 Hint: Let R″ be the intersection of R and R′. Then show that ∫_R f dV = ∫_{R″} f″ dV = ∫_{R′} f′ dV.
7.2.1 a. e − 1; c. log(8/(3√3))
7.2.2 b. ∫₀² ∫_{y/2}^1 f(x, y) dx dy; f. ∫₀¹ ∫_{−√y}^{√y} f(x, y) dx dy + ∫₁⁴ ∫_{y−2}^{√y} f(x, y) dx dy
7.2.3 c. ½(½ log 2 − 1 + π/4)
7.2.8 a. ½ log 3
7.2.11 16/3

dy dz dx

7.2.13 abc/6
7.2.15 1/8
7.2.24 a. (2 4- t t )/8x 3
7.3.5 3π/4
7.3.6 (√5 − 1)/12
7.3.9 π(e − 1)/4

7.3.17 a. ∫₀^{2π} ∫₀^a ∫_{hr/a}^h r dz dr dθ; b. ∫₀^{2π} ∫₀^{arctan(a/h)} ∫₀^{h sec φ} ρ² sin φ dρ dφ dθ

7.3.20 ⅓ ∫_S ρ² dV = 4π/15
7.3.22 625π/32
7.4.1 ja
7.4.3
7.4.4 6a/5
7.4.7 mass = 2(V3 - f), x = .> , y = 0 by symmetry.

7.4.11 -
4

7.4.12 I = Ima2
7.4.16 a. k = 1; b. k = 1/2; c. k = 3/10
7.5.1 b. −4; d. 6
7.5.11 det A = ±1.

7.5.13 a. -4/3; b. 3 3 3
5 2 1
3 3-

-1 0 1
7.5.14 -3; 2 1 -2
5
3-

7.6.5 ½(3 − log 5)
7.6.7 sin 1
7.6.10 j
7.6.13 (π√3 − 3)/9
7.6.14 This is a solid torus with core radius a and little radius b. Its volume is 2π²ab².
7.6.17 2π²a⁵/5

8.2.4 a. −5 dx ∧ dy + 10 dy ∧ dz; c. 11 dx ∧ dy ∧ dz


8.2.6 a. −xe^{xy} dx ∧ dy; b. 2(x dx ∧ dy + z dz ∧ dx + y dy ∧ dz); c. 2(x + y + z) dx ∧ dy ∧ dz; d. x₂ dx₁ ∧ dx₃ ∧ dx₄ + x₁ dx₂ ∧ dx₃ ∧ dx₄

d. f(x, y, z) = ⅓x³ + xyz + sin y + ½z²; e. f(x, y) = log √(x² + y²)
8.2.11 b. 18 du ∧ dv; e. sin²u du ∧ dv
8.3.1 a., b., e. 1; c., d., f. −1
8.3.2 a. 5/6; b. 13/15; c. −5/6; d. −5/6; e. 5/3 − π/4; f. −5/3 + π/4
8.3.10 c. πa²/2; d. 32/3
8.3.12 πab
8.3.14 π(2a² + b²)
8.4.4 16
8.4.8 2πah

u 1 - 1
8.4.10 ,g y 2v
V 1 —z y 1 + u2 + V2
z u2 + V2 - 1
8.4.14 |m«2 = I 3k a4
8.4.15 a. 4πa³; b. 4πa²h; c. 6πa²h; d. 24
8.4.17 88π/3
8.4.18 a., c., d. 4π; b. 4πh/√(a² + h²)
8.4.22 a., b. 0; c. π²
8.5.1 Since the outward-pointing normal to ∂H^k is −e_k, we must decide whether {−e_k, e₁, …, e_{k−1}} is a positively-oriented basis for R^k. We need k − 1 exchanges and one change of sign to obtain {e₁, …, e_k}. This is k sign changes in all, and hence the standard positive basis for R^{k−1} gives the correct orientation precisely when (−1)^k = +1.
8.5.3 2πa(a + b)
8.5.6 πa⁴
8.5.12 a. −8π/15
8.5.18 t t /2a /3
8.6.7 c., d., e., g., h. div = 0; a., f., g., h. curl = 0

36 -3
-55 -2

7 4 -4
1
4 1 8
-4 8 1

“3 1 5 4"
^•1-8 i 1 6 -4 7
17 5 -4 14 1
.4 7 1 11.
9.1.12 a. With respect to the "new" coordinates y₁, y₂, y₃, the equation of the curve of intersection is y₁² + cos²θ · y₂² = 1, y₃ = 0.

9.2.1 a. eigenvalues -1, 6; eigenvectors

e. eigenvalues 2,2; only eigenvector

f. eigenvalues —3,0, 3; eigenvectors



i. eigenvalues 1,1,2; eigenvectors

l. eigenvalues −1, 2, 3; eigenvectors

9.2.12 a., f., l. diagonalizable; e., i. not diagonalizable


2 2 1
9.2.16 a. O; b. ; C(A - 21) = N(A - 21) because of part a; c.
3 0 2
9.3.2 2/3; 5/6
9.3.3 9/13
9.3.4 a_k = 2k + 1
9.3.8 a_k = ⅓(2^k + (−1)^{k+1})
Ck -1 2
9.3.9 b. xfc = + (co +»io)(l-2)fc
= (co + 2mo)(l.l/ , so—no matter what
umk u 1j u u
-1
the original cat/mouse populations—the cats proliferate and the mice die out.
°1\
-3
1 /’

If3 +12 + 2t + 1
;d,

9.3.13 c. normal modes cos t (1, 1), sin t (1, 1), cos 2t (2, −1), and sin 2t (2, −1)

9.3.14 [e^{2t}  te^{2t}  ½t²e^{2t}; 0  e^{2t}  te^{2t}; 0  0  e^{2t}]

9.3.15 a. y(t) = e^{2t} − 2e^{−t}; b. y(t) = e^t + te^t

1 ’-2 1" A 0 i n
75
i 1
’-2 1 2~
9.4.1 a. ~t = ; c. 7S
; d. 2 2 1
V5 1 2 1 1 1 3
L”76 75 J 1 -2 2_
75
2
9.4.2 2
4
9.4.5 There is an orthogonal matrix Q so that Q⁻¹AQ = Λ = λI. But then A = Q(λI)Q⁻¹ = λI.
9.4.18 b. ellipse y₁² + 2y₂² = 2, where y = (1/√2)[1 −1; 1 1] x;

c. parabola y₁ = 5y₂² − 1, where y = (1/5)[3 −4; 4 3] x

0 2 “I
J3 Te
9.4.19 a. hyperboloid of 1 sheet —2yj + y? + 4yj = 4, where y = ?2
1
75
i
7S
1 i i
TeJ

d. hyperbolic paraboloid (saddle surface) -yj 4- 3yj + v/3y2 = 1. where

*
2 1T
0 T6
1
y= T2 X.
1
V2 A TS-I
INDEX
acceleration, 109 closure, 70 determinant, 46, 309,314
Ampere’s law, 398 cofactor, 315 diagonalizable, 423, 429,432,455
angle, 11,239 column space, 171 simultaneously, 435,465
arclength, 112 basis for, 177 difference equation, 436
arclength-parametrized, 114 column vector(s), 28 differentiable, 87
area, 43 linear combination of, 136 continuously, 93
signed, 44 compact, 197 differential equations, system of,
area form, 370 complement 439,441
augmented matrix, 130 orthogonal, 22 differential form, 339
average value, 298, 299 conic section, 108,459-460 closed, 347
weighted, 301 connected, 103,352,361 exact, 347,386
simply, see simply connected differentiating under the integral
ball, 65 conservative, 352 sign, 287
closed, 70 consistent, 138,140,172 dimension, 165
basis, 161 constraint equation, 139,140,150, directional derivative, 83
change of, 416 179,182 distributive property, 8, 34
orthogonal, 233,236 continuity, 75 divergence, 395,450
orthonormal, 236,456 properties of, 76-78 Divergence Theorem, 395
standard, 162 uniform, 200, 271,287 domain, 24
binomial coefficient, 171 contour curves, 57 dot product, 8
binormal, 117 contraction mapping, 245
boundary orientation, 382 convergent, 67 echelon form, 131
boundary point, 380 coordinate chart, 380 eigenspace, 423
bounded, 197,199 coordinates, 163, 233,413,415,416 eigenvalue, 222,423
Brouwer Fixed Point Theorem, 404 cos, power series of, 445 eigenvector, 222,423
bump function, 381 Cramer’s Rule, 318 elementary matrix, 147-148, 312
critical point, 203 elementary operations, 128
C∞, 120, 167-168 cross product, 48 column, 309
catenoid, 124 cubic row, 130
Cauchy sequence, 71,249 cuspidal, 55 ellipse, 105-106,108,460
Cauchy-Schwarz inequality, 11,15, nodal, 55,187 ellipsoid, 460
239 twisted, 56 envelope, 261
Cavalieri’s principle, 324 cubical norm, 324 epicycloid, 62
center of mass, 300 curl, 393 Euler, 103,445
centroid, 300 curvature, 115 exact, 347
Change of Variables Theorem, 326 cuspidal cubic, 55 exponential, power series of, 441
Change-of-Basis Formula, 417 cycloid, 56,187 exterior derivative, 341
change-of-basis matrix, 416 cylinder, 460 extremum, 202
characteristic polynomial, 426-428 cylindrical coordinates, 291-294,
checkerboard, 315 329 Faraday’s law, 398
circulation, 394 Fibonacci Sequence, 438
closed, 69, 347 d, 341 finite-dimensional, 168
closed ball, 70 de Moivre's formula, 407 first variation equation, 455


fixed point, 79, 245 identity matrix, 34 linearly dependent, 158


flow, 450 image, 24,79,170,172,183 linearly independent, 158
flow line, 401 implicit differentiation, 190 local maximum (minimum), 202
flux, 373, 395 improper integral, 291,297 lower sum, 267
force inconsistent system, 138,227, 230
central, 110 infinite-dimensional, 168 manifold, 192,262
conservative, 352 inhomogeneous system, 140 with boundary, 380
free variable, 131 initial value, 440 matrix
Frenet formulas, 118 inner product space, 238 addition of, 29
frontier, 71, 273 integrable, 268
change-of-basis, 416
Fubini’s Theorem, 279,281 integral, 268 diagonal, 29
function, 24 Gaussian, 287,291 exponential, 441
function space, 167 interior, 71 identity, 34
Fundamental Theorem of Algebra, intersection, 23 nonsingular, 142,152,163,312,
406 inverse matrix, 34 464
formula for, 154,319 orthogonal, 42,266,322,456
GL(n), 249, 321 right (left), 154,156 permutation, 42, 320
Gauss’s law, 396, 398 invertible, 34 positive (negative) definite, 464
Gauss’s Theorem, 395 iterated integral, 277-284 powers, 32
Gaussian product, 31,147
elimination, 132,152 Jacobian matrix, 88
not commutative, 32
integral, 287, 291 Jordan canonical form, 434
singular, 142,433
general solution, 132 skew-symmetric, 36
k-form, 339
standard form of, 132 square, 29
k-tuple, increasing, 337
global maximum (minimum), 202 symmetric, 36,455
kernel, 170,172
golden ratio, 439 upper (lower) triangular, 29, 314
kinetic energy, 109,302,350, 352
gradient, 104,192, 395 zero, 30
knot, 116
Gram-Schmidt Process, 236,427 matrix multiplication
gravitation, 304, 396-398 Lagrange interpolation, 239 associative property of, 34,36,
Green’s Formulas, 400 Lagrange multiplier, 218,457 37,152,165
Green’s Theorem, 358-362 Laplacian, 122,125,346,391 block, 39,156
leading entry, 131 distributive property of, 34
harmonic function, 122,400 least squares line, 230 maximum, 202
maximum principle for, 401 least squares solution, 226,227 maximum principle, 401
mean value property of, 400 least upper bound property, 68-69 Maximum Value Theorem, 199
heat equation, 125,403 length, 2,239 Maxwell’s equations, 398-400
helicoid, 124, 368 level curves, 57 Mean Value Inequality, 248
helix, 114 level set, 78,192 Mean Value Theorem, 94,121,208,
Heron’s formula, 52 limit, 72 328,470
Hessian, 209 properties of, 74-75 for integrals, 274
bordered, 224,466 line, 16 measure zero, 275
homogeneous function, 103,171 of regression, 230 median, 5
homogeneous system, 140 linear combination., 19 minimal surface, 124, 392
homotopic, 405 trivial, 158 minimum, 202
Hooke’s Law, 449 linear map, 24 moment of inertia, 303, 308
hyperbola, 459 matrix for, 413,416 monkey saddle, 204
hyperboloid, 460 standard matrix for, 24 multiplicity
hyperplane, 18 symmetric, 456 algebraic, 431
hypocycloid, 62,187 linear transformation, see linear map geometric, 431

negative definite, 210,464 pivot, 131 skew-commutative, 339


neighborhood, 65 column, 131 smooth, 120
Newton variable, 131 span, 19,20
law of gravitation, 304, 396 plane, 17 spectral decomposition, 459
second law of motion, 109,449 affine, 21,49 Spectral Theorem, 456
Newton’s method, 247,250 planimeter, 366 speed, 109
n-dimensional, 250 polar coordinates, 60,288-291,329 sphere, 78,197,409
nodal cubic, 55 positive definite, 210,464 spherical coordinates, 294-296, 329
nonsingular, 142,152,163, 312,464 potential energy, 352 squeeze principle, 79
norm, 97,199,248, 324 potential function, 352-357 standard basis, 19,162
normal equations, 227 preimage, 76 standard matrix, 24
normal mode, 443,449 principal normal vector, 115 star operator, 346, 395
normal vector, 18,49,105 product, 45 star-shaped, 355,411
outward-pointing, 370 wedge, 338 stereographic projection, 376
nullity, 181 product rule, 312 Stokes’s Theorem, 383-389
Nullity-Rank Theorem, 170,181 projection, 10,226,234,459 classical, 394
nullspace, 172 projection matrix, 228,414 subsequence, 70
basis for, 177 pullback, 342 convergent, 197
Pythagorean Theorem, 2,9,51,227 subspace, 16
open, 65 Pythagorean triple, 61 fundamental, 171-183
open rectangle, 65 trivial, 16,164
orientable, 370, 381 quadratic form, 209-210,214-215, subspaces, sum of, 23
orientation, 370, 380 465-466 surface area, 375
orthogonal, 9 quadric surface, 460-462 symmetric, 36,455
basis, 233,236
range, 24 tangent plane, 87,105,188,191
complement, 22,181-182,226
rank, 140,180-181 tangent space, 219, 264
matrix, see matrix, orthogonal
Rational Roots Test, 427 Taylor polynomial, 210
set, 233
rectangle, 65,267,268 torsion, 118
subspaces, 21
reduced echelon form, 131 torus, 367
orthonormal basis, 236,456
refinement, 269 trace, 41,428
oscillation, 276
common, 269 tractrix, 63
reflection matrix, 39,414 transpose, 36
Pappus’s Theorem, 308 region, 273 determinant of, 313
parabola, 108,462 regression, line of, 230 triangle inequality, 12
paraboloid, 462 rotation matrix, 27,39,419-420 trivial solution, 141,142
parallel, 3,160 row space, 171 trivial subspace, 16,164
parallel axis theorem, 308 basis for, 177 twisted cubic, 56
parametrization, 53 row vector, 28-29
parametrized ^-dimensional
uniformly continuous, 200,271,287
manifold, 345 saddle point, 203 unique solution, 141, 142, 152
parametrized curve, 53 scalar multiplication, 2 unit tangent vector, 115
partial derivative, 81 sequence, 66 unit vector, 2
partial differential equation, 122 Cauchy, 71,249 upper sum, 267
particular solution, 132 convergent, 67
partition, 267 shear, 26, 314 variable
partition of unity, 381 sign, 320 free, 131
path-independent, 352 similar, 417,421 pivot, 131
permutation, 320 simply connected, 361,406 vector
permutation matrix, 42,320 sin, power series of, 445 addition, 3
piecewise-C1, 348 singular, 142,433 column, 28
row, 28-29 velocity, 109 wave equation, 122,403
subtraction, 4 volume, 268, 314
zero, 1 wedge product, 338
signed, 314 weighted average, 301
vector field, 348, 393-395,409 volume form, 381 winding number, 362
vector space, 167 volume zero, 271 work, 350
