
Third Edition

RICHARD E. WILLIAMSON

RICHARD H. CROWELL
HALE F. TROTTER
Calculus of Vector Functions
Third Edition
RICHARD E. WILLIAMSON
Department of Mathematics
Dartmouth College

RICHARD H. CROWELL
Department of Mathematics
Dartmouth College

HALE F. TROTTER
Department of Mathematics
Princeton University

Prentice-Hall, Inc., Englewood Cliffs, New Jersey


© 1972, 1968, 1962 by Prentice-Hall, Inc.
Englewood Cliffs, N.J.

All rights reserved. No part of this book may be


reproduced in any form or by any means without
permission in writing from the publisher.

10 9 8 7

ISBN: 0-13-112367-X

Library of Congress Catalog Card Number 75-167788

Printed in the United States of America

PRENTICE-HALL INTERNATIONAL, INC., London


PRENTICE-HALL OF AUSTRALIA, PTY. LTD., Sydney
PRENTICE-HALL OF CANADA, LTD., Toronto
PRENTICE-HALL OF INDIA PRIVATE LIMITED, New Delhi
PRENTICE-HALL OF JAPAN, INC., Tokyo
Preface

This book is an introduction to the calculus of functions of several variables


and vector calculus, following the unifying idea that calculus deals with
linear approximations to functions. A working knowledge of one-variable
calculus is the only prerequisite. The necessary linear algebra is presented
in the first two chapters.
The emphasis in this third edition is on learning and understanding
mathematical techniques, together with their applications to specific
problems. The framework of linear algebra is used both to clarify concepts
and to emphasize algebraic techniques as useful tools. We have given
precise statements of all definitions and theorems, and have included
complete proofs of practically everything in the book. The proofs are,
however, meant to be studied only insofar as they help understanding.
With careful attention to the examples, a person can go through the text
intelligently without studying any proofs at all. The book is designed to
be flexible, thereby allowing an instructor to select a level of theoretical
emphasis appropriate to the interests and abilities of his class. It is unlikely
that anyone would follow either extreme course: covering all proofs or
including none.
In this edition, the material on linear algebra has been expanded and
reorganized as two chapters. The first covers vectors and linear functions
in R^n and contains nearly all the basic algebraic ideas used in the calculus.
Chapter 2 includes a complete discussion of the solution of linear systems
with a section on applications, presents the fundamental ideas of abstract
vector spaces and dimension, and contains a brief but complete treatment
of linear differential equations with constant coefficients; a section on
complex vector spaces shows their usefulness in this connection. Other
topics such as orthonormal bases and eigenvectors are also treated briefly.
Material involving numerical calculation has been added in several
places. There are sections on Newton's method for functions of several
variables, on numerical estimation of implicitly defined functions, and on
numerical approximation of definite integrals. This material goes particu-
larly well with the use of an automatic computer. We have included
some exercises in these sections which cannot reasonably be done by
hand, but which students with access to a computer can do by writing
fairly simple programs. Of course we have also included enough numerical

exercises that do not require a computer.


We want to thank the people at many universities who have made
suggestions for improving the book. Their helpful interest has been most
welcome. Thanks are also due to Mrs. Dorothy Krieger and to Mr.
Arthur Wester of Prentice-Hall, whose efforts have helped to make the
book more attractive and useful.

R. E. W.

R. H. C.

H. F. T.
Possible Courses

of Study

We have tried to organize the text so that each section leads naturally
to the next, but it is by no means necessary to include all sections or to
take them up in strictly consecutive order. In particular, it is not necessary
to complete the linear algebra before starting on the calculus. (Students
who have already taken linear algebra can, of course, start with Chapter 3
and use the first two chapters for reference and review.) Everything
essential for Chapters 3 through 7 is contained in the first seven sections
of Chapter 1. (An exception is the use of facts about dimension in the
last section of Chapter 4.) The study of determinants can be postponed

until the section on change of variable in multiple integrals and the last
three sections on vector analysis, where they are really needed. On the
other hand, determinants can be used in practice for inverting small
(2-by-2 or 3-by-3) matrices or solving small systems of linear equations,
so that some may prefer to take them up earlier and postpone (or omit)
the row-reduction method given in the first section of Chapter 2. At the
cost of avoiding a few higher dimensional exercises later on, one may
even restrict discussion of matrix inversion to the trivial 2-by-2 case.
Other changes of order, such as taking up multiple integration early
in the course, should also cause no difficulty.
The sequences of section numbers listed below are minimal in the sense
that they contain the prerequisite material for later entries, but do not
contain everything that might be desirable in a typical course. Additional
sections can be added to make up a full one-year course suitable for par-
ticular needs. Experience has shown that for ordinary class use two
meetings should be used to cover an average section. An occasional short
section merits only one meeting, and some longer ones, in order to be
covered entirely, require three.

Contents

Introduction, 1

Chapter 1 Vectors and Linearity, 4

1 Vectors, 4

2 Geometric interpretations, 7

3 Matrices, 15

4 Linear functions, 25

5 Dot products, 37

6 Euclidean geometry, 46

7 Determinants, 51

8 Determinant expansions, 67

Chapter 2 Linear Algebra, 79

1 Linear equations, inverse matrices, 79

2 Some applications, 88

3 Theory of linear equations, 97

4 Vector spaces, subspaces, dimension, 108



5 Linear functions, 120

6 Differential operators, 130

7 Complex vector spaces, 147

8 Orthonormal bases, 160

9 Eigenvectors, 167

10 Coordinates, 179

Chapter 3 Derivatives, 190

1 Vector functions, 190

2 Functions of one variable, 204

3 Arc length, 212

4 Line integrals, 220

5 Partial derivatives, 227

6 Vector partial derivatives, 234

7 Limits and continuity, 240

8 Differentials, 250

9 Newton's method, 263

Chapter 4 Vector Calculus, 272

1 Directional derivatives, 272

2 The gradient, 277

3 The chain rule, 287

4 Implicitly defined functions, 301

5 Curvilinear coordinates, 310

6 Inverse and implicit function theorems, 325

7 Surfaces and tangents, 337

Chapter 5 Real-Valued Functions, 350

1 Extreme values, 350

2 Quadratic polynomials, 361



3 Taylor expansions, 380

4 Taylor expansions and extreme values, 390

5 Fourier series, 397

6 Modified Fourier expansions, 405

7 Heat and wave equations, 412

8 Uniform convergence, 422

9 Orthogonal functions, 429

Chapter 6 Multiple Integration, 440

1 Iterated integrals, 440

2 Multiple integrals, 449

3 Change of variable, 467

4 Improper integrals, 482

5 Estimates of integrals, 492

6 Numerical integration, 498

Chapter 7 Vector Field Theory, 510

1 Green's theorem, 510

2 Conservative vector fields, 522

3 Surface integrals, 530

4 Stokes's theorem, 541

5 Gauss's theorem, 551

6 The operators ∇, ∇×, and ∇·, 557

7 Differential forms, 565

8 The exterior derivative, 574

Appendix, 579

1 Introduction, 579

2 The chain rule, 583



3 Arc length formula, 585

4 Convergence of Fourier series, 587

5 Proof of the inverse and implicit function theorems, 589

6 Proof of Lagrange's theorem, 595

7 Proof of Taylor's theorem, 598

8 Existence of the Riemann integral, 601

9 The change-of-variable formula for integrals, 605

Index, 611
Introduction

A first course in calculus deals with real-valued functions of one


variable, that is, functions defined on all or part of the real number line R,
and having values in R. For example, a formula such as

y = x^2 + 3

yields a real number y for any real number x and so defines a function f
from R to R with (for instance) f(0) = 3, f(-2) = 7, f(√3) = 6, etc.
In this book we are concerned with functions of several variables whose
values may be real numbers or, more generally, may be n-tuples of real
numbers. For example, a pair of formulas

y_1 = √(x_1^2 + x_2^2 + x_3^2)
y_2 = x_1 x_2 + 5x_3

yields a pair of numbers (y_1, y_2) for any triple of numbers (x_1, x_2, x_3) and
so defines a function g from "3-dimensional space" to "2-dimensional
space." In particular, the formulas above give

g(0, 0, 0) = (0, 0),
g(1, 2, 3) = (√14, 17),
g(3, 2, 1) = (√14, 11).

We shall use 51" to stand for the set of all rt-tuples of real numbers.
1
(Jl is thus the same as '.R .) The domain of a function is on which it
the set
is defined, and the range or image is the set of values assumed by the
Introduction

function. We speak of functions "from ft" to ft


m " and write

/
31" ft"

to indicate that/is a function whose domain is a subset of ft" and whose


range is a subset of ft
m . ft" is then called the domain space of the function
and ft
m is called its range space. The terms "transformation" or "mapping"
are sometimes used instead of "function."
While functions of one variable are basic and very useful, there are
many situations whose mathematical formulation requires the more
general functions we consider here. For example, just as certain curves
in the plane can be represented as graphs of functions from R^1 to R^1, so
certain surfaces in 3-space can be represented as graphs of functions
from R^2 to R^1. The picture below illustrates the graph of a function
f: R^2 → R^1 defined by a formula

f(x, y) = z.

Other examples showing how curves and surfaces can be described
by functions are given at the beginning of Chapter 3.
Most of the problems and examples in this book involve func-
tions from ft" to ft m with values of m and n not more than 2 or 3,
since higher-dimensional problems are difficult to visualize and
often require inordinate amounts of computation. In the theoreti-
cal development we nevertheless provide formulations valid for
arbitrary dimensions. This is not an empty generality. An econo-
mist may wish to consider a mathematical model in which the
prices of a number of commodities are determined by a number of
other variables such as costs of production and demands. A civil
engineer may want to study the displacements produced in the many joints
of a complex structure by various combinations of loads applied at many
points. Now that automatic computers have made the arithmetic calcu-
lations feasible, problems involving dozens or even hundreds of variables
are being attacked and solved. Thus it is important to have techniques and
theorems that apply to functions from R^n to R^m for all values of m and n.
Most of the ideas studied in one-variable calculus reappear in a more
general form in multivariable calculus. For example, the problem of
finding a tangent line to the graph y = f(x) of a function of one variable
becomes the problem of finding a tangent plane to the graph z = g(x, y)
of a function of two variables. The tangent line has an equation of the
form y = mx + b, and the coefficient m is found by computing the
derivative of f. In the higher-dimensional case, the tangent plane has an
equation of the form z = px + qy + c, and, as we shall see, the coeffi-
cients p and q are given by the higher-dimensional derivative of g.
Differential calculus can be described as the technique of studying
general functions by approximating them with functions of the simple
form exemplified by y = mx + b or z = px + qy + c. In one dimension,


these "linear" functions are very simple indeed, but in higher dimensions
they are more complicated. The first two chapters of the book are
concerned with linear functions and some closely related topics that have
applications in many fields other than calculus.
The main purpose of the book, to which the later chapters are largely
devoted, is to study generalizations of derivative and integral to higher
dimensions. We shall be concerned with the relations between these ideas
and with how they can be used to solve a variety of problems. The formal
techniques of differentiation and integration from one-variable calculus
continue to be directly applicable in multivariable calculus as methods of
calculation. However the interpretations, and often the underlying ideas,
may be quite different. To get geometric insight into these interpretations
it is worth learning to visualize three-dimensional pictures. Two-dimen-
sional ones come fairly readily because they are easy to draw on paper.
Learning to make perspective drawings is a big help in understanding
three-dimensional problems and relating them to the physical world.
The boldface letters x, a, etc., that are used to distinguish vectors from
numbers can be written longhand in several ways. Some possibilities are

x, x, or x; capital letters or ordinary small letters can also be used in a

context in which they will not be confused with the usual notations for
matrices and numbers. The printing of a word in boldface type indicates
that the word is being defined at that point in the text.
1

Vectors and Linearity

SECTION 1
VECTORS

We denote by R^n the set of all n-tuples (x_1, x_2, ..., x_n) of real numbers.


Boldface letters x, y, z, etc., will stand for n-tuples, while ordinary light-
face letters will stand for single real numbers. In particular we may write
x = (x, y) or x = (x, y, z) for general pairs and triples in order to save
writing subscripts.
For any two elements x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n)
in R^n, we define the sum x + y to be the n-tuple

(x_1 + y_1, x_2 + y_2, ..., x_n + y_n).

For any real number r and n-tuple x = (x_1, x_2, ..., x_n), we define the
numerical multiple rx to be the n-tuple

(rx_1, rx_2, ..., rx_n).

For example, if x = (2, -1, 0, 3) and y = (0, 7, -2, 3), then

x + y = (2, 6, -2, 6)
and
3x = (6, -3, 0, 9).

We write -x for the numerical multiple (-1)x, and x - y as an abbrevia-
tion for x + (-y). We use 0 to denote an n-tuple consisting entirely of
zeros. The zero notation is ambiguous since, for example, 0 may stand
for (0, 0) in one formula and for (0, 0, 0) in another. The ambiguity
seldom causes any confusion since in most contexts only one interpretation
makes sense. For instance, if z = (-2, 0, 3), then in the formula z + 0,
the 0 must stand for (0, 0, 0) since addition is defined only between n-
tuples with the same number of entries.
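The preface suggests trying some computations on a machine; as our own
minimal sketch (not from the text), the two operations just defined can be
written in Python with plain tuples, with no external library assumed:

    # Sketch: n-tuple addition and numerical multiplication as defined above.
    def vec_add(x, y):
        # addition is defined only between n-tuples of the same length
        assert len(x) == len(y)
        return tuple(a + b for a, b in zip(x, y))

    def vec_scale(r, x):
        return tuple(r * a for a in x)

    x = (2, -1, 0, 3)
    y = (0, 7, -2, 3)
    print(vec_add(x, y))    # (2, 6, -2, 6)
    print(vec_scale(3, x))  # (6, -3, 0, 9)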
The following formulas hold for arbitrary x, y, and z in R^n and
arbitrary numbers r, s. They express laws for our new operations of
addition and numerical multiplication very closely analogous to the
familiar distributive, commutative, and associative laws for ordinary
addition and multiplication of numbers.

1. rx + sx = (r + s)x.
2. rx + ry = r(x + y).
3. r(sx) = (rs)x.
4. x + y = y + x.
5. (x + y) + z = x + (y + z).
6. x + 0 = x.
7. x + (-x) = 0.

These laws are all quite obvious consequences of the definitions of


our new operations and the laws of arithmetic. For illustration, we give a
formal proof of law 2.

Let x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n), and let r be a
real number. Then

rx + ry = (rx_1, ..., rx_n) + (ry_1, ..., ry_n)   [definition of numerical multiplication]
        = (rx_1 + ry_1, ..., rx_n + ry_n)         [definition of addition]
        = (r(x_1 + y_1), ..., r(x_n + y_n))       [distributive law for numbers]
        = r(x + y).                               [definitions of addition and numerical multiplication]

The general notion of a vector space is taken up
in Chapter 2, but for the present "vector" may be taken to mean "element
of :K"" for some n. Numbers arc sometimes called scalars when emphasis
on the distinction between numbers and vectors is wanted. In physics, for
example, mass and energy may be referred to as scalar quantities in
distinction to vector quantities such as velocity or momentum. The term
scalar multiple is synonymous with what we have called numerical
multiple.
The vectors

e_1 = (1, 0, ..., 0)
e_2 = (0, 1, 0, ..., 0)
...
e_n = (0, ..., 0, 1)

have the property that, if x = (x_1, ..., x_n), then

x = x_1 e_1 + ... + x_n e_n.

Because every element of R^n can be so simply represented in this way, the
set of vectors {e_1, ..., e_n} is called the natural basis for R^n. For example,
in R^3 we have (1, 2, -7) = e_1 + 2e_2 - 7e_3. The entries in

x = (x_1, ..., x_n)

are then called the coordinates of x relative to the natural basis.

A sum of numerical multiples x_1 e_1 + ... + x_n e_n is called a linear
combination of the vectors e_1, ..., e_n. More generally, a sum of multiples
a_1 x_1 + ... + a_n x_n is called a linear combination of the vectors x_1, ..., x_n.
Thus, for example, the equation

(2, 3, 4) = 4(1, 1, 1) - 1(1, 1, 0) - 1(1, 0, 0)

shows the vector (2, 3, 4) represented as a linear combination of the
vectors (1, 1, 1), (1, 1, 0), and (1, 0, 0).
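Coefficients of a linear combination like the one above can also be found
by solving a small linear system (the method is developed in Chapter 2).
The following is our own sketch, assuming numpy is available; the names
basis and target are ours:

    # Sketch: coefficients expressing (2, 3, 4) in terms of
    # (1, 1, 1), (1, 1, 0), (1, 0, 0).
    import numpy as np

    basis = np.array([[1, 1, 1],
                      [1, 1, 0],
                      [1, 0, 0]]).T   # columns are the given vectors
    target = np.array([2, 3, 4])
    coeffs = np.linalg.solve(basis, target)
    print(coeffs)   # [ 4. -1. -1.], matching 4(1,1,1) - 1(1,1,0) - 1(1,0,0)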

EXERCISES
1. Given x = (3, -1, 0), y = (0, 1, 5), and z = (2, 5, -1), compute 3x, y + z,
and 4x - 2y + 3z. [Ans. (18, 9, -13).]

2. Find numbers a and b such that ax + by = (9, -1, 10), where x and y are
as in Problem 1. Is there more than one solution?

3. Show that no choice of numbers a and b can make ax + by = (3, 0, 0),
where x and y are as in Problem 1. For what value(s) of c (if any) can the
equation ax + by = (3, 0, c) be satisfied?

4. Write out proofs for (a) law 3 and (b) law 4 on page 5, giving precise justifica-
tion for each step.

5. Verify that the set C[a, b] of all continuous real-valued functions defined on
the interval a ≤ x ≤ b is a vector space, with addition and numerical
multiplication defined by (f + g)(x) = f(x) + g(x) and (rf)(x) = rf(x).

6. Prove that the representation of a vector x in R^n in terms of the natural basis
is unique. That is, show that if

x_1 e_1 + ... + x_n e_n = y_1 e_1 + ... + y_n e_n,

then x_k = y_k for k = 1, ..., n.

7. Represent the first vector below as a linear combination of the remaining
vectors, either by inspection or by solving an appropriate system of equations.
(a) (2, 3, 4); (1, 1, 1), (1, 2, 1), (-1, 1, 2).
(b) (2, -7); (1, 1), (1, -1).
(c) (-2, 3); e_1, e_2.

SECTION 2
GEOMETRIC INTERPRETATIONS

Geometric representations of R^1 as a line, of R^2 as a plane, and of R^3
as 3-dimensional space may be obtained by using coordinates. To represent
R^1 as a line, one must first specify a point on the line to be called the origin,
a unit of distance, and a direction on the line to be called positive. (The
opposite direction is then called negative.) Then a positive number x
corresponds to the point which is a distance x in the positive direction
from the origin. A negative number x corresponds to the point which is a
distance |x| from the origin in the negative direction. The number zero of
course corresponds to the origin. The number line is most often thought
of as horizontal with the positive direction to the right. With this standard
convention, we obtain the familiar Fig. 1, in which the arrow indicates
the positive direction.

Figure 1

In the plane, one takes an origin, a unit of distance, a pair of per-
pendicular lines (called the axes) through the origin, and a positive
direction on each axis. Given a vector in R^2, that is, a pair of numbers
(x_1, x_2), the procedure described in the preceding paragraph determines a
point p_1 on the first axis corresponding to the number x_1 and a point p_2
on the second axis corresponding to the number x_2. Then the vector
(x_1, x_2) corresponds to the point p in the plane whose projection on the
first axis is p_1 and whose projection on the second axis is p_2. The (per-
pendicular) projection of a point p on a line L is defined as the foot of the
perpendicular from p to L if p is not on L. If p is on L, then the projection
of p on L is p itself.

The conventional choice is to take the first axis horizontal with the
positive direction to the right, and the second axis vertical with the
positive direction upwards. This leads to the usual picture shown in Fig. 2.

Representing a vector by an arrow from the origin to the corresponding
point, as we have done in Fig. 2, often makes a better picture than simply
marking the point.
An obvious extension of the procedure works in three dimensions.
One takes an origin, three perpendicular axes through it, and a positive
direction on each. A vector in R^3 is a triple of numbers (x_1, x_2, x_3) and
gives points p_1, p_2, and p_3 on the three axes. Then the point p, correspond-
ing to (x_1, x_2, x_3), is the one whose projections on the three axes are p_1, p_2,
and p_3.

There is no universally accepted convention for labeling the axes in


3-dimensional figures. Figure 3 illustrates the convention followed in this

book, but several other schemes are also in common use.


Figure 2 Figure 3

We have described how to set up a correspondence between vectors
(n-tuples of numbers) and points. We now consider the geometrical
interpretation of the vector operations of addition and numerical multipli-
cation. This time we represent a vector as an arrow from the origin to the
point with given coordinates. Figure 4 shows two vectors u = (u_1, u_2)
and v = (v_1, v_2). The sum in R^2 is u + v = (u_1 + v_1, u_2 + v_2), so the
arrow representing u + v must be drawn as shown.


As Fig. 4 suggests, u + v is represented by an arrow from the origin
to the opposite corner of the parallelogram whose sides are the arrows
representing u and v. This geometric rule for adding vectors is often
referred to as the parallelogram law of addition. (To prove that this
law is a consequence of our definitions, one of course has to use some
theorems of geometry. For an outline of a proof, see Exercise 6(a).)

Figure 4


Another rule for adding vectors geometrically is illustrated in Fig. 5.
Starting at the end of the arrow representing u, draw an arrow equal in
length and parallel to the arrow representing v. (In other words, translate
the arrow representing v from the origin to the end of u.) Then u + v is

represented by an arrow from the origin to the end of the translation of v.

This rule can be applied to any pair of vectors, whereas the parallelogram
law does not (strictly speaking) apply if the arrows representing u and v
lie in the same straight line.
If we write w for u + v, then v = w - u. Figure 6 (which is simply
Fig. 5 relabeled) illustrates the useful fact that the difference of two vectors
w and u is represented by the arrow from the end of u to the end of w,
appropriately translated.


Figure 5 Figure 6

Figure 7 illustrates numerical multiplication, both by a positive number
a and a negative number b. For a positive number a, the arrow represent-
ing au has the same direction as the arrow representing u and is a times as
long. For a negative number b, bu points in exactly the opposite direction
to u and is |b| times as long. (See Exercise 6(b).)
The question of whether to represent a vector geometrically as a

Example 1. To find a representation for the line in R^3 parallel to the
vector (1, 1, 1) and passing through the point (-1, 3, 6) we form all
multiples t(1, 1, 1) to get a line through the origin. Then the set of all
points t(1, 1, 1) + (-1, 3, 6) is a line containing the point (-1, 3, 6),
as we see by setting t = 0.
To determine a plane in R^3 it is natural to start with two noncollinear
vectors x_1 and x_2 (that is, such that neither is a multiple of the other) and
consider all points ux_1 + vx_2, where u and v are numbers. The geometric
interpretation of numerical multiplication and addition of vectors shows
that the points ux_1 + vx_2 constitute what we would like to call a plane
P_0 through the origin. We then define a plane P parallel to P_0 to be the set
of all points ux_1 + vx_2 + x_0, where x_0 is some fixed vector. P and P_0 are
related as shown in Fig. 9.

Figure 9

Example 2. We represent a plane in R^3 parallel to the two vectors
(1, 1, 1) and (2, -1, 3), and passing through the point (0, 1, -1). The
set of all linear combinations u(1, 1, 1) + v(2, -1, 3) is a plane P_0
through the origin, and the set of all points u(1, 1, 1) + v(2, -1, 3) +
(0, 1, -1) is a parallel plane P. That P contains the point (0, 1, -1)
becomes evident on setting u = v = 0.
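Parametric representations like those of Examples 1 and 2 are easy to
evaluate on a computer. The following is our own sketch (numpy assumed;
the helper names line_point and plane_point are ours, not the text's):

    # Sketch: points on the line of Example 1 and the plane of Example 2.
    import numpy as np

    def line_point(t):
        return t * np.array([1, 1, 1]) + np.array([-1, 3, 6])

    def plane_point(u, v):
        return (u * np.array([1, 1, 1]) + v * np.array([2, -1, 3])
                + np.array([0, 1, -1]))

    print(line_point(0))      # (-1, 3, 6): the given point lies on the line
    print(plane_point(0, 0))  # (0, 1, -1): the given point lies on the plane
    print(line_point(2))      # (1, 5, 8): another point of the same line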

The set of all linear combinations of a set S of vectors is called the
span of S, and we speak of the set of linear combinations as being spanned
by S. For example, if S consists of one vector x, then the line L consisting of
all numerical multiples tx is the span of S. If S = {x_1, x_2}, the set spanned
by S is the set P of all linear combinations ux_1 + vx_2. For P to be a
plane, we require that x_1 and x_2 not lie on the same line. Another way to
state this condition is that no multiple of x_1 should equal a multiple of
x_2 except for the zero multiples. More generally, we say that a set of
vectors

x_1, x_2, ..., x_n

is linearly independent if, whenever

c_1 x_1 + ... + c_n x_n = 0

for some numbers c_1, ..., c_n, then all the c's must be zero. If on the
other hand it is possible to have a linear combination of vectors equal to
the zero vector, but with not all the coefficients equal to zero, then those
vectors are said to be linearly dependent.

Example 3. The set of three vectors

(1, 2), (-1, 1), (1, -3)

in R^2 is linearly dependent. For, the equation

c_1(1, 2) + c_2(-1, 1) + c_3(1, -3) = 0

is equivalent to the two equations

c_1 - c_2 + c_3 = 0
2c_1 + c_2 - 3c_3 = 0.

If we now let c_3 have any nonzero value, we can then solve for c_1 and c_2.
For example, if we set c_3 = 1, then

c_1 - c_2 = -1
2c_1 + c_2 = 3,

and we solve the two equations. Adding them gives 3c_1 = 2, or c_1 = 2/3,
while subtracting the second from two times the first gives -3c_2 = -5,
or c_2 = 5/3. Thus

(2/3)(1, 2) + (5/3)(-1, 1) + (1)(1, -3) = 0,

so the vectors are linearly dependent. Linear independence can be
checked similarly by solving a vector equation and showing that it has
only zero solutions.
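The same test can be carried out numerically. As our own sketch (numpy
assumed), one can check whether the matrix whose columns are the given
vectors has fewer independent columns than vectors:

    # Sketch: testing the vectors of Example 3 for linear dependence.
    import numpy as np

    vectors = [np.array([1, 2]), np.array([-1, 1]), np.array([1, -3])]
    A = np.column_stack(vectors)
    print(np.linalg.matrix_rank(A))   # 2, fewer than 3 vectors,
                                      # so the set is linearly dependent

    # the particular dependence found in the text:
    c = np.array([2/3, 5/3, 1])
    print(A @ c)                      # essentially (0, 0), up to rounding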

A linearly independent set of vectors that spans a subset S of a vector
space V is called a basis for S. A basis for S is useful because, if x is any
vector in S, then x can be represented as a linear combination of basis
elements with uniquely determined coefficients. Thus if x_1, ..., x_n is a
basis for S and x is in S, we have, by the spanning property,

x = c_1 x_1 + ... + c_n x_n,

for some constants c_1, ..., c_n. But the c's are completely determined.
For if we had also

x = d_1 x_1 + ... + d_n x_n,

then subtraction of one equation from the other gives

0 = (c_1 - d_1)x_1 + ... + (c_n - d_n)x_n.

It follows from the linear independence of the x's that c_k - d_k = 0 for
each k, so that c_k = d_k.
Example 4. The natural basis for R^3 is the set of vectors

e_1 = (1, 0, 0),  e_2 = (0, 1, 0),  e_3 = (0, 0, 1).

It is easy to check that these vectors are linearly independent and that they
span R^3. For if

c_1 e_1 + c_2 e_2 + c_3 e_3 = 0,

then (c_1, c_2, c_3) = (0, 0, 0). On the other hand, any vector (x, y, z)
in R^3 can be written

(x, y, z) = x e_1 + y e_2 + z e_3.
EXERCISES
1. For each pair u, v of vectors in R^2 given below, draw the arrows representing
u, v, u + v, u - v, and u + 2v.
(a) u = (1, 0), v = (0, 1).
(b) u = (-2, 1), v = (1, 2).
(c) u = (-1, 1), v = (1, 1).

2. Let x = (1, 1) and y = (0, 1). Draw the arrows representing tx + y for the
following values of t: -1, 1/2, 1, 2.

3. Let u = (2, 1) and v = (-1, 2). Draw the arrows representing tu +
(1 - t)v for the following values of t: -1, 0, 1/2, 1, 2.

4. (a) Let u_1 = (0, 1), v_1 = (-1, 1), u_2 = (-3, 2), and v_2 = (2, 1). Sketch
the lines u_1 + t v_1 and u_2 + s v_2. Find the vector w at the point of inter-
section of the lines by finding values of t and s for which w = u_1 + t v_1 =
u_2 + s v_2. [Ans. w = (-5/3, 8/3).]
(b) Let u_1, v_1, and u_2 be as in part (a), but take v_2 = (2, -2). (Note that v_2
is then a numerical multiple of v_1.) Sketch the lines u_1 + t v_1 and
u_2 + s v_2. Show algebraically that the lines do not intersect.

5. Show that in R^n, the lines represented by s u_1 + u_0 and t v_1 + v_0 are the same
if and only if both v_1 and v_0 - u_0 are numerical multiples of u_1.

6. (a) ... OVB and UPA are congruent. From this deduce that OV and UP are
parallel and of equal length. Then OVPU is a parallelogram, and the
parallelogram law follows.
(b) In Figs. 7(a) and 7(b) the points U, V, and W are constructed to have
coordinates as shown. Prove that the triangles OUP, OVQ, and OWR
are similar. Show that the angles UOP, VOQ, and WOR are therefore
equal and that the lengths OV and OW are proportional to the length
of OU as stated in the text.

7. It is not essential to use perpendicular axes in setting up coordinates in


the plane. The same procedure can be used if the projection of a point on
an axis is defined as the point of intersection of that axis and the line
through the given point and parallel to the other axis. (If the axes are
perpendicular, this is equivalent to our previous definition. Why?) It is
also possible to choose different units of distance along the two axes.
Show that the geometric interpretation of the vector operations in two
dimensions remains the same in this more general setting. How would
you extend the definition of projection to make the same generalization in
three dimensions?

8. (a) Show that if u and v are two distinct vectors, then the vectors ru +
(1 - r)v form a line through the points corresponding to u and v.
(b) If x_1 and x_2 are two vectors in R^n, then the set of all vectors t x_1 +
(1 - t)x_2, where 0 ≤ t ≤ 1, is the line segment joining x_1 and x_2. A set
S in R^n is convex if, whenever S contains two points, it also contains the
line segment joining them. Prove that the intersection of any collection
of convex sets is convex.

9. Represent the following lines in the form t x_1 + x_0, where t runs over all real
numbers. Sketch each line.
(a) The line in R^3 parallel to (1, 2, 0) and passing through the point (1, 1, 1).
(b) The line in R^2 joining the points (1, 0) and (0, 1).
(c) The line in R^3 joining the points (1, 0, 0) and (0, 0, 1).

10. Represent the following planes in R^3 in the form u x_1 + v x_2 + x_0, where
u and v run over all real numbers. Sketch each plane.
(a) The plane parallel to the vectors (1, 1, 0) and (0, 1, 1) and passing
through the origin.
(b) The plane parallel to the vectors e_1 and e_2 and passing through the point
(0, 0, 1).
(c) The plane passing through the three points (1, 0, 0), (0, 1, 0), and
(0, 0, 1).

11. Determine whether each of the following sets of vectors is linearly dependent
or linearly independent.

(a) (1,2), (2,1).


(b) (1,2), (-3, -6).
(c) (1, -1), (0, 1), (2, 5).
(d) (1,2,0), ( -1,1, 1), (-1,0, 0).

12. Determine whether the first vector below is in the set spanned by the set S.
In each case give a geometric interpretation: the first point does (or does
not) lie on the plane (or line) spanned by S.
(a) (1, 1/2); S = {(-2, -2)}.
(b) (-1, 1, 3); S = {(1, 1, 1), (-1, 0, 1)}.
(c) (-1, -2, 1); S = {(3, 6, -3), (1, 2, -1)}.

13. Which of the following sets form bases for R^2 or R^3?
(a) (1, 1), (-1, 1).                    (c) (1, 1, 1), (1, 1, 0), (1, 0, 0).
(b) (1, 2, 1), (-1, 2, 1), (0, 4, 2).   (d) (1, 0, 0), (0, 2, 0), (0, 0, 3).

14. (a) Prove that two nonzero vectors x and y are linearly dependent if and
only if they lie on the same line through the origin.

(b) Prove that three nonzero vectors x, y, and z are linearly dependent if

and only if they all lie on the same plane through the origin.

SECTION 3
MATRICES

A set of equations such as

y_1 = 2x_1 + 3x_2 - 4x_3
y_2 = x_1 - x_2 + 2x_3,

in which each y_i is given as a sum of constant multiples of the x_j, defines a
function (in this example from R^3 to R^2) of a particularly simple kind. It
is an example of what we shall call a linear function. (The precise definition
of "linear" appears in the next section.) An understanding of these
functions is basic to the study of more general functions. Linear functions
from R^1 to R^1 are so simple that they can be taken for granted in studying
the calculus for functions of one variable. In higher dimensions, however,
linear functions can be more complicated, and the main business of this
first chapter is to develop their basic properties and to present notation
and methods of calculation for dealing with them.

In the foregoing example, the x's and y's can be thought of as place-
holders. The function is described completely by the array of coefficients

(2   3  -4)
(1  -1   2).

Any such rectangular array of numbers is called a matrix. [Five further
example matrices were displayed at this point.] The five examples
above have shapes 3-by-2, 2-by-3, 2-by-2, 1-by-3, and 4-by-1.
Note that the number of rows always comes before the number of columns.
The 1-by-n matrices are called n-dimensional row vectors, and n-by-1
matrices are called n-dimensional column vectors. A matrix is square if it
has the same number of rows as it has columns.

The number in the ith row and jth column of a matrix is called the ijth
entry of that matrix. Once more the row index is always put before the
column index. Two matrices are equal if and only if they have the same
shape and the entries in corresponding positions in the two matrices are
equal.
We use capital letters to denote matrices, and often use the corre-
sponding small letters with appropriate subscripts to denote their entries.
Thus we write

A = (a_11  a_12  a_13  a_14)
    (a_21  a_22  a_23  a_24)        or        A = (a_ij),
    (a_31  a_32  a_33  a_34)

where i = 1, 2, 3, j = 1, 2, 3, 4. We usually write simply x_1, x_2, ..., x_n
for the entries of an n-dimensional column vector x rather than x_11,
x_21, ..., x_n1.
The operations of addition and numerical multiplication which were
defined in the last section for vectors in R^n can be extended to matrices.
If A and B have the same shape, then their sum A + B is defined as the
matrix with the same shape and ijth entry equal to a_ij + b_ij. For example

(2  3)   (1 -1)   (3  2)
(0  4) + (0  0) = (0  4).

Addition is not defined between matrices of different shapes. Recall that
we did not define addition between elements of R^n and R^m with m ≠ n.

For any matrix A and number r, the numerical multiple rA is defined
as the matrix with the same shape as A and ijth entry equal to r a_ij.
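As a small numerical illustration of our own (numpy assumed; the values
here are not the text's), both operations act entry by entry:

    # Sketch: matrix addition and numerical multiplication.
    import numpy as np

    A = np.array([[2, 3],
                  [0, 4]])
    B = np.array([[1, -1],
                  [0,  0]])
    print(A + B)      # [[3 2]
                      #  [0 4]]
    print(3 * A)      # [[ 6  9]
                      #  [ 0 12]]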

As with vectors in :ft", we write — A for (—\)A and A — B for A +


(—1)2?. For every shape there is a zero matrix which has all its entries
equal to zero. We use to denote any zero matrix; the shape intended
will always be clear from the context.
It is if the matrices X, Y, and Z all have the same shape,
easy to see that
and / and any numbers, then the formulas 1-7 in Section all hold.
5 are 1

(The proofs are just the same as when X, Y, and Z are all in .'ft".) In other
words, according to the definition in Section 2, for any fixed m and n, the
set of in -by- n matrices forms a vector space with the operations of addition

and numerical multiplication that we have just defined.


Another operation between matrices is suggested by the way they may
be used to describe functions from R^m to R^n. For example, suppose we
have formulas

z_1 = 3y_1 - y_2
z_2 = 5y_1 + 2y_2,

and

y_1 = 2x_1 + 3x_2 + 4x_3
y_2 = x_1 - x_2 + 2x_3,

defining functions from R^2 to R^2 and from R^3 to R^2, respectively. These
functions are described by the matrices A and B where

A = (3 -1)        and        B = (2  3  4)
    (5  2)                       (1 -1  2).

If we express the z's directly in terms of the x's we obtain

z_1 = 3(2x_1 + 3x_2 + 4x_3) - (x_1 - x_2 + 2x_3)
z_2 = 5(2x_1 + 3x_2 + 4x_3) + 2(x_1 - x_2 + 2x_3).

Rearranging terms gives

z_1 = (3·2 - 1·1)x_1 + (3·3 - 1·(-1))x_2 + (3·4 - 1·2)x_3
z_2 = (5·2 + 2·1)x_1 + (5·3 + 2·(-1))x_2 + (5·4 + 2·2)x_3

and we see that the resulting function is described by a matrix C with

C = (3·2 - 1·1   3·3 - 1·(-1)   3·4 - 1·2)   =   ( 5  10  10)
    (5·2 + 2·1   5·3 + 2·(-1)   5·4 + 2·2)       (12  13  24).

We say that C is obtained from A and B by matrix multiplication and
write C = AB.

To see how the general definition should be made, note that the ijth
entry of C is the sum of products of entries in the ith row of A and the jth
column of B. Thus c_21 = a_21 b_11 + a_22 b_21 = 5·2 + 2·1 = 12.

We give the definition of matrix multiplication in two stages. First
suppose

A = (a_1, a_2, ..., a_k)        and        B = (b_1)
                                               (b_2)
                                               (...)
                                               (b_k)

are row and column vectors of the same dimension. Then the product AB
is defined to be the number a_1 b_1 + a_2 b_2 + ... + a_k b_k. Now let A be an
m-by-k matrix and B be a k-by-n matrix. (It is important that the number
of columns of A be equal to the number of rows of B.) Then each row of
A is a k-dimensional row vector and each column of B is a k-dimensional
column vector. We define the matrix product AB as the m-by-n matrix
whose ijth entry is the product (in the sense just defined) of the ith row of
A and the jth column of B. The product AB always has the same number
of rows as A and the same number of columns as B. For instance, in our
example

(3 -1) (2  3  4)   =   ( 5  10  10)
(5  2) (1 -1  2)       (12  13  24)

the entry in the second row and third column of the result is obtained by the
calculation

(5  2) (4)   =   5·4 + 2·2 = 24.
       (2)

You should check that the other entries in the product can be obtained by
the rule stated above. Schematically, the ijth entry of a matrix product is
found by combining the ith row of the left factor with the jth column of the
right factor; the process is sometimes called row-by-column multiplication
of matrices.
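The product worked out above is easy to check by machine. The following
is our own sketch, assuming numpy is available:

    # Sketch: the row-by-column product computed in the text.
    import numpy as np

    A = np.array([[3, -1],
                  [5,  2]])
    B = np.array([[2,  3, 4],
                  [1, -1, 2]])
    C = A @ B
    print(C)         # [[ 5 10 10]
                     #  [12 13 24]]
    print(C[1, 2])   # 24 = 5*4 + 2*2, the entry worked out above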

The following remark is an obvious consequence of the way matrix


multiplication is defined. We state it formally for emphasis and because
we shall refer to it later.
3.1 Theorem

The ith row of a matrix product AB is equal to the ith row of A
times B. The jth column of AB is equal to A times the jth column
of B.

There are several important laws relating matrix multiplication and
the operations of matrix addition and numerical multiplication. They
hold for any number t and matrices A, B, C for which the indicated
operations are defined. (Addition is defined only between matrices of the
same shape. Multiplication is defined only if the left factor has exactly
as many columns as the right factor has rows.)

1. (A + B)C = AC + BC.
2. A(B + C) = AB + AC.
3. (tA)B = t(AB) = A(tB).
4. A(BC) = (AB)C.

According to the last law, it makes sense to talk of the product of three
matrices and simply write ABC, since the result is independent of how the
factors are grouped. In fact this 3-term associative law implies that the
result of multiplying together any finite sequence of matrices is inde-
pendent of how they are grouped. Not all the laws that hold for multi-
plication of numbers hold for multiplication of matrices. In particular,
the value of a matrix product depends on the order of the factors, and
AB is usually different from BA. It is also possible for the product of two
matrices to be a zero matrix, without either of the factors being zero.
Exercise 5 at the end of this section illustrates these points.
The laws stated above are easily proved by writing out what they mean,
using the definitions of the operations, and then applying the associative,
distributive, and commutative laws of arithmetic. Number 4 is the most
complicated to prove, and we give its proof in full below. The other
proofs are left as exercises.
To prove that A(BC) = (AB)C, let A, B, and C have respective shapes
p-by-q, q-by-r, and r-by-s. Let U = BC and V = AB. (Then U has
shape q-by-s, and V has shape p-by-r.) We have to show that AU = VC.
The ijth element of AU is (by definition of matrix multiplication) equal
to

Σ_{k=1}^{q} a_ik u_kj.

The kjth element of U is

u_kj = Σ_{l=1}^{r} b_kl c_lj.

Thus the ijth element of AU equals

Σ_{k=1}^{q} a_ik ( Σ_{l=1}^{r} b_kl c_lj ).

Similarly, the ijth element of VC is equal to

Σ_{l=1}^{r} v_il c_lj  =  Σ_{l=1}^{r} ( Σ_{k=1}^{q} a_ik b_kl ) c_lj.

Both these expressions are equal to the sum

Σ_{1≤k≤q, 1≤l≤r} a_ik b_kl c_lj,

and hence are equal to each other. Thus corresponding entries of AU
and VC are equal and the matrices are the same. This completes the
proof.
proof.
A square matrix of the form

(1  0  ...  0)
(0  1  ...  0)
(   ...      )
(0  0  ...  1)

that has 1's on its main diagonal and zeros elsewhere is called an identity
matrix. It has the property that

IA = A,        BI = B

for any matrices A, B such that the products are defined. Thus it is an
identity element for matrix multiplication just as the number 1 is an identity
for multiplication of numbers. There is an n-by-n identity matrix for every
value of n, but, as with the zero matrices, it is almost always clear from the
context what the dimension of an identity matrix must be.
If A is a square matrix and there is a matrix B (of the same size) such
that

AB = BA = I,

then we say that A is invertible and that B is an inverse of A. As we show
below in Theorem 3.3, there is at most one matrix B satisfying these
conditions, so we are justified in speaking of the inverse of a matrix. If
A is an invertible matrix, we write A^{-1} for its inverse. For example, it is
easy to check that

(1  2) ( 7 -2)   =   (1  0)   =   ( 7 -2) (1  2),
(3  7) (-3  1)       (0  1)       (-3  1) (3  7)

and according to the definition this shows that

(1  2)    is invertible and that    (1  2)^{-1}   =   ( 7 -2).
(3  7)                              (3  7)            (-3  1)

Many matrices, on the other hand, are not invertible. No zero matrix can
have an inverse, and several less obvious examples are given in the
exercises.
It is usually not easy to tell whether a large matrix is invertible, and it

can take a lot of work to compute its inverse if it has one. Determinants
can be used to give a formula for the inverse of a matrix (Theorem 8.3),
and a more effective way to compute inverses is given in Section 1 of
Chapter 2. Two-by-two matrices and a few other easy cases are discussed
in Exercises 9 to 13 of this section. The rule for finding the inverse of a
2-by-2 matrix is as follows.

3.2

(a  b)^{-1}   =   1/(ad - bc) ( d  -b),        if ad - bc ≠ 0.
(c  d)                        (-c   a)

Thus, for example

( 1  3)^{-1}   =   1/5 (2  -3)   =   (2/5  -3/5).
(-1  2)                (1   1)       (1/5   1/5)

It is easy to verify that

( 1  3) (2/5  -3/5)   =   (2/5  -3/5) ( 1  3)   =   (1  0),
(-1  2) (1/5   1/5)       (1/5   1/5) (-1  2)       (0  1)

so that we have indeed found the inverse.
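Formula 3.2 is simple enough to program. The following is our own sketch
(numpy assumed; the function name inverse_2x2 is ours, not the text's):

    # Sketch of Formula 3.2 for inverting a 2-by-2 matrix.
    import numpy as np

    def inverse_2x2(M):
        (a, b), (c, d) = M
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0: the matrix is not invertible")
        return np.array([[ d, -b],
                         [-c,  a]]) / det

    M = np.array([[ 1, 3],
                  [-1, 2]])
    M_inv = inverse_2x2(M)
    print(M_inv)       # [[ 0.4 -0.6]
                       #  [ 0.2  0.2]], i.e. (2/5 -3/5; 1/5 1/5)
    print(M @ M_inv)   # the identity matrix, up to rounding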
Two important properties of invertible matrices are easily proved
directly from the definition. The first one ensures that, no matter how an
inverse to a matrix A is computed, the resulting matrix A^{-1} is always the
same.

3.3 Theorem

A matrix A has at most one inverse.

Proof. Suppose there are two matrices, B and C, such that both

AB = BA = I        and        AC = CA = I.

Then AB = AC, because both products equal I. Multiplying by B
on the left gives

BAB = BAC.

But since BA = I, substitution gives IB = IC, from which it follows
that B = C, as was to be proved.

The next theorem shows how to compute the inverse of a product of


matrices each of which is invertible.

3.4 Theorem

If A = A_1 A_2 ... A_n and all of A_1, ..., A_n are invertible, then A is
invertible and A^{-1} = A_n^{-1} A_{n-1}^{-1} ... A_2^{-1} A_1^{-1}.

Proof. In the product (A_n^{-1} ... A_1^{-1})(A_1 ... A_n), the terms
A_1^{-1} A_1 combine to give I, which may then be dropped. Then A_2^{-1}
and A_2 cancel, and so on, until I is obtained as the final result.
Similarly (A_1 ... A_n)(A_n^{-1} ... A_1^{-1}) reduces to I, and this shows that
the two products are inverses of each other.
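The n = 2 case of Theorem 3.4 can be checked numerically. As our own
sketch (numpy assumed; the matrices are chosen only to be invertible):

    # Sketch: the inverse of AB is B^{-1} A^{-1}.
    import numpy as np

    A = np.array([[1., 2.], [3., 7.]])
    B = np.array([[2., 1.], [1., 1.]])
    lhs = np.linalg.inv(A @ B)
    rhs = np.linalg.inv(B) @ np.linalg.inv(A)
    print(np.allclose(lhs, rhs))   # True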

An identity matrix is a special case of what is called a diagonal matrix.
A square matrix A is diagonal if its entries off the "main diagonal" are
all zero, that is, if a_ij = 0 whenever i ≠ j. The notation diag(t_1, t_2, ..., t_n)
is convenient for the n-by-n diagonal matrix which has entries t_1, t_2, ..., t_n
on the diagonal. For example, diag(2, 0, -1, 3) is a notation for

(2  0   0  0)
(0  0   0  0)
(0  0  -1  0)
(0  0   0  3).

Problems 7 and 8 show that matrix operations with diagonal matrices are
particularly simple.
EXERCISES

1. Given matrices A, B, C, D, and G, determine which of the following
expressions are defined, and compute those that are.
(a) 2B - 3G.        (f) CD + 3DB.
(b) AB.             (g) 2AB - 5G.
(c) BA.             (h) 2GC - 4AB.
(d) BD.             (i) CDC.
(e) DB.             (j) BCD.

[Ans. (b) ( 3   7   1).]
          (-2  14  -4)

2. Show that for any matrix A and zero matrices of appropriate shapes,

AO =0 and OA = 0.

If A is m-by-n, for what possible shapes of zero matrices is A0 defined?


For what shapes is OA defined? What are the shapes of the products?

3. With A, B, C, D as in Problem 1, determine what shapes X and Y would


have to have for each of the following equations to be possible. (In some
cases there may be no possible shape; in some cases there may be more than
one.)

(a) AX = B + Y. (d) CX + DY = 0.

(b) (D + 2X)YC = 0.        (e) AX = YC.
(c) AX = YD.               (f) AX = CY.
[Ans. (d) X is 3-by-n, Y is 2-by-n.]
4. Prove the distributive law

A(B + C) = AB + AC
for matrix multiplication.

5. Let U = (1 -2)    and    V = (2  6).    Compute UV and VU. Are they the
           (2 -4)               (1  3)
same? Is it possible for the product of two matrices to be zero without
either factor being zero?

6. Let X, P, and Q be the matrices shown, and let D = diag(1, 2, 3).
Compute DX, DP, and DQ.

7. ... How may the product SD be described? (Computing the product QD using
the matrices of Exercise 6 should suggest the general rule.)

8. Using the matrices B, C, and G of Exercise 1, compute Be_1, Ce_1, Ge_1,
Be_3, Ce_3, Ge_3. Prove the general rule that for any matrix M and column
vector e_j with appropriate dimension, the product Me_j is the jth column
of M.

9. What is the product diag(a_1, ..., a_n) diag(b_1, ..., b_n)? When is the
result the identity matrix? Show that diag(a_1, ..., a_n) has an inverse
provided none of the numbers a_i is zero.

10. (a) Let A = (a  b) be a 2-by-2 matrix with ad ≠ bc, and let A^{-1} be given
                (c  d)
by Formula 3.2. Show that AA^{-1} = A^{-1}A = I. (This proves that A is
invertible if ad - bc, called the determinant of A, is not zero.)
(b) Try to find the inverses of the following 2-by-2 matrices using Formula 3.2:

(1  1)    (0  1)    (2  6)
(2  1),   (1  0),   (1  3).

What is wrong with the last one?

11. Show that if there is any matrix X ≠ 0 such that AX = 0, then A cannot be
invertible. [Hint: Suppose B = A^{-1}, and consider (BA)X = B(AX).] Use
this result to show that diag(a_1, ..., a_n) is not invertible if any of the a_i is
zero.

12. Show that if ad = bc, then (a  b) is not invertible.
                               (c  d)

13. If A is a square matrix, it can be multiplied by itself, and we can define
A^2 = AA, A^3 = AAA = A^2 A, ..., A^n = AA...A (n factors). These powers
of A all have the same shape as A. Find A^2 and A^3 if

(a) A = ...        (b) A = ...

[Ans. (a) ...]

(Note that 0 is the only number whose cube is 0. Part (b) of this problem
thus illustrates another difference between the arithmetic of numbers and of
matrices.)

14. The numerical equation a^2 = 1 has a = 1 and a = -1 as its only solutions.
(a) Show that if A = I or -I, then A^2 = I, where I is an identity matrix of
any dimension.
(b) Show that

(a   b)^2   =   (1  0)
(c  -a)         (0  1)

if a^2 + bc = 1; so the equation A^2 = I has infinitely many different
solutions in the set of 2-by-2 matrices.
(c) Show that every 2-by-2 matrix A for which A^2 = I is either I, -I, or
one of the matrices described in (b).

SECTION 4
LINEAR FUNCTIONS

The product of an m-by-n matrix and an n-dimensional column vector
(n-by-1 matrix) is an m-dimensional column vector. An n-tuple in R^n
obviously corresponds to a unique n-dimensional column vector, and
vice versa. From here on we shall often simply consider elements of R^n
to be column vectors. With this convention, multiplication by any given
m-by-n matrix defines a function from R^n to R^m. Indeed a matrix equa-
tion such as

(y_1)   =   (2  3 -4) (x_1)
(y_2)       (1 -1  2) (x_2)
                      (x_3)

is equivalent to a set of numerical equations

y_1 = 2x_1 + 3x_2 - 4x_3
y_2 = x_1 - x_2 + 2x_3,

as may be seen by simply writing out the result of the matrix multiplication.
Thus the vector function described by a matrix amounts to multiplication
of a domain vector by the matrix.

Functions given by matrix multiplication can be characterized by some
very simple properties. Note that the definitions given below apply to
functions between any two vector spaces, although we are at present
concerned only with the spaces R^n.

A function f with domain a vector space V and range a subset of a
vector space W is a linear function if the equations

f(x + y) = f(x) + f(y)
f(rx) = r f(x)

hold for all vectors x, y in V, and all numbers r.

4.1 Theorem

A function from R^n to R^m is linear if and only if it coincides with
multiplication by some m-by-n matrix.

Proof. We show first that if f is defined by f(x) = Ax for a fixed
matrix A, then f is linear. We must show that f(x + y) = f(x) +
f(y) and f(rx) = rf(x) for any x, y, and r. But by the definition of f
these equations amount to A(x + y) = Ax + Ay and A(rx) =
rAx, which hold by properties 2 and 3 for matrix multiplication.

The proof of the converse is a little more complicated. Suppose
we are given a linear function g from R^n to R^m. We have to find an
m-by-n matrix A such that g(x) = Ax for every column vector x in
R^n. Let e_j be the column vector in R^n that has 1 for its jth entry and
0 for all other entries. Now let A be the matrix whose jth column is
the m-dimensional vector g(e_j) for j = 1, 2, ..., n. (Thus A has m
rows and n columns, as required.) Any column vector x with
entries x_1, x_2, ..., x_n can be written as x_1 e_1 + ... + x_n e_n. Since
g is linear,

g(x) = g(x_1 e_1 + ... + x_n e_n)
     = x_1 g(e_1) + ... + x_n g(e_n).

By the definition of A we have

g(x) = (a_11 x_1 + ... + a_1n x_n)
       (a_21 x_1 + ... + a_2n x_n)
       (        ...              )
       (a_m1 x_1 + ... + a_mn x_n).

Finally, by the definition of matrix multiplication,

g(x) = Ax.
The proof given above actually shows how to construct the matrix
corresponding to a linear function. This construction is important and
we summarize it as a theorem for emphasis.

4.2 Theorem

Let f: R^n → R^m be a linear function, and let A be the matrix whose
jth column is f(e_j). Then f(x) = Ax for every x in R^n.

The result is easy to remember if we consider, for example, the case of
a function f from R^2 to R^2. The matrix A, for which f(x) = Ax, is then
2-by-2, and its columns are the result of applying A successively to

e_1 = (1)        and        e_2 = (0):
      (0)                         (1)

(a  b) (1)   =   (a),        (a  b) (0)   =   (b).
(c  d) (0)       (c)         (c  d) (1)       (d)

Example 1. Consider the function f: R^2 → R^2 described in terms of the
geometric representation of R^2 as a counterclockwise rotation of 30°. Any
rotation leaves the origin fixed, preserves distances, and carries figures
such as parallelograms onto congruent figures. From these properties,
and the way in which addition and numerical multiplication of vectors
can be done by geometrical constructions (Section 2), it can be shown that
a rotation is a linear function. Figure 10(a) shows the vectors e_1, e_2, f(e_1),
and f(e_2). By trigonometry we see that

f(e_1) = (√3/2),        f(e_2) = (-1/2 ).
         (1/2 )                  ( √3/2)

By Theorem 4.2,

f(x)   =   (√3/2  -1/2) (x_1)
           (1/2   √3/2) (x_2)

for any vector x = (x_1, x_2) in R^2.
Example 2. Define 'Ji


2 — 3l 2 to be the linear function which multiplies
horizontal distances by 3 and vertical distances by 2. Then fie}) = 3e x
and/(e 2 ) = 2e 2 and /has the diagonal matrix
,

2
Figure 10

The geometrical effect of f is illustrated in Fig. 11, in which C is the unit
circle x_1^2 + x_2^2 = 1, and f(C) is the image of C under f. If (u_1, u_2) is the
image of (x_1, x_2) under f, then x_1 = u_1/3 and x_2 = u_2/2. Hence, if (u_1, u_2)
is in f(C), then u_1^2/9 + u_2^2/4 = 1; this can be recognized as the equation of an
ellipse with semimajor axis 3 and semiminor axis 2.

Figure 11

Theorem 4.2 says that, if f is a linear function from R^n to R^m, then the
jth column of the matrix of f is f(e_j). Because we can write any vector x in
R^n as a linear combination

x = c_1 e_1 + ... + c_n e_n,
we can use the linearity of f to write

f(x) = c_1 f(e_1) + ... + c_n f(e_n).

Since x is an arbitrary vector in the domain of f, the last equation expresses
an arbitrary vector in the range of f as a linear combination of the vectors
f(e_1), ..., f(e_n). But these vectors are just the columns of the matrix of f,
so we have the following.

4.3 Theorem

Let f: R^n → R^m be a linear function with matrix A. Then the
columns of A, looked at as vectors of R^m, span the range of f.

Example 3. If P is a plane through the origin in R^3, then P consists of
all linear combinations of two vectors y_1 and y_2 in R^3. The function
f: R^2 → R^3 defined by

f(u)   =   u y_1 + v y_2
 (v)

is linear (why?), and Theorem 4.2 says that the matrix of f has as columns
the two vectors

f(1)  =  y_1        and        f(0)  =  y_2.
 (0)                            (1)

For example, if

y_1 = ...        and        y_2 = ...,

then the matrix of f is the 3-by-2 matrix with columns y_1 and y_2, and indeed
the range of f is spanned by the columns of the 3-by-2 matrix.

Example 4. If L is a line consisting of all points of the form t u_1 + u_0,
with u_1 ≠ 0, and f is a linear function, then

f(t u_1 + u_0) = t f(u_1) + f(u_0)
             = t v_1 + v_0.

Thus, unless f(u_1) = 0, the image f(L) is also a line. If f(u_1) = 0, then
of course f(L) is the single point f(u_0). The equation f(u_1) = 0 cannot
hold for a nonzero vector u_1 if the linear function f is one-to-one, as
defined below.

A function f is said to be one-to-one if each point in the range of f
corresponds to exactly one point in the domain of f. In other words, f
is one-to-one if the equation f(x_1) = f(x_2) always implies that x_1 = x_2.
For example, of the two real-valued functions f(x) = x and g(x) = x^2,
it is obvious that f is one-to-one. But g is not one-to-one because g(x) =
g(-x) for every real number x. For linear functions we have the following
criterion.

4.4 Theorem

A linear function f is one-to-one if and only if f(x) = 0 implies
x = 0.

Proof. For suppose that f(x) = 0 always implies x = 0. If f(x_1) =
f(x_2), the linearity of f shows that f(x_1 - x_2) = 0. But then by
assumption x_1 - x_2 = 0; so x_1 = x_2, and therefore f must be one-to-
one. Conversely, suppose that f(x_1) = f(x_2) always implies x_1 = x_2.
Then, because f(0) = 0 for a linear function, the equation f(x) = 0
can be written f(x) = f(0). But then our assumption implies x = 0.
Example 5. The counterclockwise rotation through 30° in Example 1
is certainly one-to-one. For only one vector is carried into any other
vector by the rotation. Alternatively, the zero vector is the only vector
carried into the zero vector. In algebraic terms, the rotation is proved in
Example 1 to be representable by

u = (√3/2)x - (1/2)y
v = (1/2)x + (√3/2)y.

Then Theorem 4.4 shows that the one-to-one property of f is equivalent
to the equations

(√3/2)x - (1/2)y = 0
(1/2)x + (√3/2)y = 0

having only the solution (x, y) = (0, 0). This of course can be verified
directly by solving the equations.

If f and g are any two functions (not necessarily linear) such that the
range space of f is the same as the domain space of g, we define the com-
position g ∘ f to be the function obtained by applying first f and then g.
More explicitly, if x is in the domain of f and f(x) is in the domain of g,
then g ∘ f(x) is defined as g(f(x)). If f(x) or g(f(x)) is not defined, then
g ∘ f(x) is not defined.

Composition of linear functions lies behind matrix multiplication. In
introducing the concept of matrix multiplication, we considered a function
from R^2 to R^2 given by

z_1 = 3y_1 - y_2
z_2 = 5y_1 + 2y_2

and another from R^3 to R^2 given by

y_1 = 2x_1 + 3x_2 + 4x_3
y_2 = x_1 - x_2 + 2x_3.

We then computed that the composition of the two functions was given by
the formulas

z_1 = 5x_1 + 10x_2 + 10x_3
z_2 = 12x_1 + 13x_2 + 24x_3.

The definition of matrix multiplication was set up to give the matrix

( 5  10  10)
(12  13  24)

of the composite function as the product of the matrices

(3 -1) (2  3  4)
(5  2) (1 -1  2)

for the original functions. The following theorem states the important
fact that the composition of linear functions is always given by matrix
multiplication.

4.5 Theorem

Let f and g be linear functions with f: R^n → R^m and g: R^m → R^p given
by matrices A, B. Thus f(x) = Ax and g(y) = By for all x in R^n
and y in R^m. Then the composition g ∘ f is given by the matrix BA,
so that g ∘ f(x) = BAx for all x in R^n.

Proof. Note that A is an m-by-n matrix and B a p-by-m matrix, so
the matrix product BA is defined and has the appropriate shape,
namely, p-by-n. By the definition of g ∘ f, g ∘ f(x) = g(f(x)) =
B(Ax) for all x in R^n. By the associative law of matrix multiplication
this is equal to (BA)x, as was to be proved.

Example 6. Consider the function g: R^2 → R^2 defined as f ∘ f ∘ f, where
f is the function of Example 1. By Theorem 4.5, g(x) = Bx, where B is the
cube of the matrix of f:

    B = ( 0  −1 )
        ( 1   0 )

Thus g(e_1) = Be_1 = (0, 1) = e_2, and g(e_2) = Be_2 = (−1, 0) = −e_1. As
Fig. 12 shows, g amounts geometrically to a counterclockwise rotation of
90°, which of course is what the result of three successive 30° rotations
ought to be.

Figure 12

Example 7. The identity function h: R^n → R^n, such that h(x) = x, is a
linear function. Its matrix is the n-by-n identity matrix I. Two functions
f: R^n → R^n and g: R^n → R^n are said to be inverses of each other if f ∘ g
and g ∘ f are both equal to the identity function. For example, if n = 2,
suppose that f is the counterclockwise rotation through 30° discussed in
Example 1, and suppose that g is a clockwise rotation through 30°. The
functions f and g are obviously inverses of each other. The matrices of
f and g are found from Fig. 12 to be

    ( √3/2  −1/2 )            (  √3/2   1/2 )
    (  1/2  √3/2 )    and     ( −1/2   √3/2 )

The following theorem is an immediate consequence of the definitions
and of Theorem 4.1.

4.6 Theorem

A function f from R^n to R^m is affine if and only if f(x) = Ax + b
for some fixed m-by-n matrix A and m-dimensional column vector b.

Finding the matrix A and vector b needed to describe an affine trans-
formation is easy. If f(x) = Ax + b, then b = f(0). Then the function
g(x) = f(x) − f(0) is linear, and the matrix A is found by using Theorem
4.2.

Example 10. Let f: R^2 → R^2 be defined in terms of the standard geomet-
ric representation as a counterclockwise rotation of 90° about the center
x_0 = e_1. We see that f(e_1) = e_1 (since the center of a rotation stays
fixed), and f(0) = (1, −1). Introducing the linear function g(x) = f(x) − f(0),
we have

    g(e_1) = f(e_1) − f(0) = (1, 0) − (1, −1) = (0, 1)

and

    g(e_2) = f(e_2) − f(0) = (0, −1) − (1, −1) = (−1, 0).

Therefore

    g(x) = ( 0  −1 ) x   for all x,   and   f(x) = ( 0  −1 ) x + (  1 ).
           ( 1   0 )                               ( 1   0 )     ( −1 )

As a check, let us compute

    f(2, 0) = ( 0  −1 )( 2 ) + (  1 ) = ( 0 ) + (  1 ) = ( 1 ),
              ( 1   0 )( 0 )   ( −1 )   ( 2 )   ( −1 )   ( 1 )

which agrees with the geometric description of f. See Fig. 13.

Linear and affine functions are often called linear and affine transforma-
tions, though we more often use the term function in this book.
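The recipe b = f(0), A = matrix of f(x) − f(0) is easy to carry out numerically. The
following is a minimal sketch in Python with NumPy (an editorial illustration, not part
of the original text), built for the rotation about the center e_1 of Example 10:

    import numpy as np

    R = np.array([[0, -1],
                  [1,  0]])          # linear part: 90-degree rotation about the origin
    x0 = np.array([1, 0])            # the center of the rotation
    b = x0 - R @ x0                  # f(x) = R(x - x0) + x0 = Rx + (x0 - R x0)

    def f(x):
        return R @ x + b

    print(b)                         # [ 1 -1], which is f(0)
    print(f(np.array([2, 0])))       # [1 1], as computed in Example 10
    print(f(x0))                     # [1 0]: the center stays fixed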
EXERCISES

5. (a) What matrix corresponds to reflection in the line through the origin
45° counterclockwise from the horizontal?

(b) What matrix corresponds to reflection in the line through the origin
135° counterclockwise from the horizontal?
(c) Compute the product of the matrices in (a) and (b) and interpret the
result geometrically.

    Ans. (b)  (  0  −1 )
              ( −1   0 )

6. (a) Find the 2-by-2 matrix M_α corresponding to reflection in the line
through the origin at an angle α from the horizontal. Check your
result against Exercise 5 for α = 45°, α = 135°. What is M_α^2?
(b) Let β be another angle and compute the product M_α M_β. Show that this
represents a rotation, and identify the angle of rotation. When does
M_α M_β = M_β M_α?

7. (a) Show that

    U = ( 1  0   0 )        V = (  0  0  1 )
        ( 0  0  −1 )   and      (  0  1  0 )
        ( 0  1   0 )            ( −1  0  0 )

represent 90° rotations of R^3 about the x_1-axis and x_2-axis, respectively.
Find the matrix W which represents a 90° rotation about the x_3-axis.
Also find U^(-1) and V^(-1) (which represent rotations in the opposite
direction).

(b) Compute UVU^(-1) and VUV^(-1) and interpret the results geometrically.
(You may find it helpful to manipulate an actual 3-dimensional model.)

8. Show that a function f from one vector space to another is affine if and only if

    f(rx + (1 − r)y) = rf(x) + (1 − r)f(y)

for all numbers r and all x, y in the domain of f. [Hint. Consider the function
g(x) = f(x) − f(0).]

9. Let f be a linear function from a vector space V to a vector space W. Show
that if x_1, ..., x_n are linearly independent vectors in V and f is one-to-one,
then the vectors f(x_1), ..., f(x_n) are linearly independent in W. [Hint. If
f is one-to-one, and f(x) = 0, then x = 0.]

10. (a) Prove that if f is a linear function from R^3 to R^3 and f is one-to-one,
then the image f(L) of a line L by f is also a line.
(b) Show by example that, if the linear function of part (a) fails to be one-to-
one, then the image of a line by f may reduce to a point.
(c) Show that if L_1 and L_2 are parallel lines and f is a linear function, then
f(L_1) and f(L_2) are parallel lines, provided that f is one-to-one.

11. Show that the composition of two affine functions from a space into itself
is affine. Suppose f(x) = Ax + b, and g(x) = Cx + d. Suppose (f ∘ g)(x) =
Px + q. Express P and q in terms of A, b, C, and d. When is f ∘ g the same
function as g ∘ f?

12. (a) Let a and b be fixed vectors in R^2, and let g: R^2 → R^2 be a counter-
clockwise rotation of 90° with center at the origin. Find the matrix
corresponding to the linear function g, and compute the affine function
f = t_a ∘ g ∘ t_b, where t_a and t_b are the translations induced by a and b.
(b) Give a geometric interpretation of the composition f = t_a ∘ g ∘ t_b, where
a is any vector in R^2, b = −a, and g is any rotation about the origin.
[Hint. What happens to the point corresponding to a under the
function f?]

13. Show that an affine function A is one-to-one if and only if A(x) = A(0)
always implies x = 0.

14. Which of the following linear functions are one-to-one?
(a) f(x_1, x_2) = (x_1 + 2x_2, 2x_1, x_2).
(b) g(x_1, x_2, x_3) = (x_2 − x_3, x_3 − x_1, x_1 − x_2).
, * 2).
SECTION 5

DOT PRODUCTS

To allow the full application of vector ideas to Euclidean geometry, we
must have a means of introducing concepts such as length, angle, and
perpendicularity. We will show in this section that all these concepts can
be defined if we introduce a new operation on vectors. If x = (x_1, ..., x_n)
and y = (y_1, ..., y_n) are vectors in R^n, we define the dot product or
inner product of x and y to be the number

    x · y = x_1 y_1 + ... + x_n y_n.

It is easy to verify (see Exercise 4) that the dot product of vectors in
R^n has the following properties.

5.1 Positivity: x · x > 0 except that 0 · 0 = 0.

Symmetry: x · y = y · x.

Additivity: (x + y) · z = x · z + y · z.

Homogeneity: (rx) · y = r(x · y).

Because of the symmetry of the dot product, it follows immediately that
additivity and homogeneity hold for the second vector also, that is,

    x · (y + z) = x · y + x · z
and
    x · (ry) = r(x · y).

Let us first of all consider the length of a given vector x in R^3. If
x = (x_1, x_2, x_3), we think of the length of the vector as the distance from
the origin to the point with coordinates (x_1, x_2, x_3). In R^3 the Pythagorean
theorem gives us a simple formula for the distance (see Fig. 14). Letting
|x| stand for the length of the vector x, we have

    |x|^2 = x_1^2 + x_2^2 + x_3^2.     (1)

Thus we see that the length of the vector x can be expressed in terms of the
dot product as |x| = √(x · x). Note that we use the same symbol for the
length of a vector as for the absolute value of a number. Indeed, if we
think of a number as a one-coordinate vector, its length is its absolute
value. In R^n we define the length of a vector by the same formula that
works in R^3: |x| = √(x · x).

Next we would like to express the angle between two nonzero vectors
in R^n. The usual convention is to take this angle θ to be in the interval
0 ≤ θ ≤ π (see Exercise 1). The solution to the problem is provided by
the following theorem.

Figure 14

5.2 Theorem

If θ is the angle between x and y, then

    cos θ = (x · y) / (|x| |y|).

Proof. Let us apply the law of cosines to the triangle shown in Fig.
15. It states that

    |x − y|^2 = |x|^2 + |y|^2 − 2 |x| |y| cos θ,

which we can rewrite, using |x|^2 = x · x, as

    (x − y) · (x − y) = x · x + y · y − 2 |x| |y| cos θ.

Expanding the left-hand member, we obtain

    x · x − x · y − y · x + y · y = x · x + y · y − 2 |x| |y| cos θ.

Hence,

    2 x · y = 2 |x| |y| cos θ,

and the theorem follows by dividing by 2 |x| |y|.

We see from this theorem that x · y in absolute value is at most
|x| |y|, i.e., |x · y| ≤ |x| |y|. This is known as the Cauchy-Schwarz
inequality, proved more generally in Theorem 5.4.

Example 1. What is the angle between x = (1, 3) and y = (−1, 1)?
We easily compute that

    x · x = 1^2 + 3^2 = 10,
    y · y = (−1)^2 + 1^2 = 2,
and
    x · y = −1 + 3 = 2.

Hence

    |x| = √10,   |y| = √2,   and   cos θ = 2/√20 = 1/√5.

By consulting a trigonometric table we find that θ = 1.1 radians approxi-
mately (or about 63°).
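The same computation is easy to automate. Here is a minimal sketch in Python (an
editorial illustration, not part of the original text), using only the standard math
module:

    import math

    def angle(x, y):
        """Angle between nonzero vectors x and y, from cos(theta) = x.y / (|x||y|)."""
        dot = sum(a*b for a, b in zip(x, y))
        nx = math.sqrt(sum(a*a for a in x))
        ny = math.sqrt(sum(b*b for b in y))
        return math.acos(dot / (nx * ny))

    theta = angle((1, 3), (-1, 1))
    print(theta, math.degrees(theta))   # about 1.107 radians, about 63.4 degrees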

The theorem also provides a simple test for perpendicularity of two
vectors. They are perpendicular if and only if θ = π/2, and hence cos θ =
0. Thus the condition for perpendicularity is simply

    x · y = 0.     (2)

Example 2. Let us find a vector x of length 2 perpendicular to (1, 2, 3)
and to (1, 0, −1). From geometric considerations we see that there will
be two solutions since, if x is a solution, so is −x. (See Fig. 16.) We
have to write down three conditions, two for the perpendicularity require-
ments, using (2), and a third condition to assure length 2:

    x_1 + 2x_2 + 3x_3 = 0,
    x_1 − x_3 = 0,
    x_1^2 + x_2^2 + x_3^2 = 4.

These equations have the pair of solutions x = ±(√(2/3), −2√(2/3), √(2/3)).

If n is a unit vector, that is, a vector of length 1, then the dot product
n · x is called the coordinate of x in the direction of n. The geometric
interpretation of n · x is shown in Fig. 17. For since cos θ = (n · x)/|x|,
it follows that n · x is either the length of the perpendicular projection on
the line containing n, or else its negative. The vector (n · x)n is called the
component of x in the direction of n and is sometimes denoted x_n.

Figure 17

Example 3. Suppose that the vector (1, −1, 2) represents a force F
acting at a point in the sense that the direction of the force is in the
direction of the vector and the magnitude of the force is equal to the length
of the vector, namely, √6. In such an interpretation it is customary to
picture the vector not as an arrow from the origin to the point with co-
ordinates (1, −1, 2), but as that arrow translated parallel to itself and
with the tail moved to the point of application of the force. Figure 18
illustrates one possibility.

Figure 18

The component of the force F in the direction
of a unit vector n is then interpreted as the force resulting from the
application of F at a point that can move only along a line parallel to n.
Thus to find the component of F = (1, −1, 2) in the direction of (1, 1, 1)
we first find the unit vector in that direction. Since |(1, 1, 1)| = √3, the
desired unit vector is n = (1/√3, 1/√3, 1/√3). The component of F in
the direction of n is then

    (F · n)n = (2/√3) n = (2/3, 2/3, 2/3).
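A short computational sketch of this projection, in Python (ours, not part of the
original text); the function accepts a direction vector that need not be a unit vector
and normalizes internally:

    def component(x, d):
        """Component of x in the direction of d, i.e. (x . n)n with n = d/|d|."""
        dot = sum(a*b for a, b in zip(x, d))
        dd = sum(a*a for a in d)
        c = dot / dd
        return tuple(c*a for a in d)

    F = (1, -1, 2)
    print(component(F, (1, 1, 1)))   # approximately (2/3, 2/3, 2/3), as in Example 3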

Any vector space on which there is defined a product with the properties
5.1 is called an inner product space. Thus R^n is an inner product space,
and some other examples are given in Problems 8 and 9. Inner products
in spaces other than R^n are used in this book only in Chapter 2, Section 4,
and Chapter 5, Section 5.

In terms of the inner product we can always define the length or
norm of a vector by

    |x| = √(x · x).

Then length has the following properties.

5.3 Positivity: |x| > 0 except that |0| = 0.

Homogeneity: |rx| = |r| |x|.

Triangle inequality: |x + y| ≤ |x| + |y|.

The proofs of the first two are easy and are left for the reader to check.
The proof of the third is harder and will be taken up later, though we
remark here on its geometric significance, illustrated by Fig. 19.

The fact that length has been defined in terms of an inner product
leads to some properties of length that are not derivable from those
already listed in 5.3. First we prove the following.

Figure 19

5.4 Cauchy-Schwarz inequality

    |x · y| ≤ |x| |y|.

Proof. We assume first that x and y are unit vectors, that is, that
|x| = |y| = 1. Then,

    0 ≤ |x − y|^2 = (x − y) · (x − y) = |x|^2 − 2 x · y + |y|^2 = 2 − 2 x · y,

or

    x · y ≤ 1.

Assuming that neither x nor y is zero (for the inequality obviously
holds if one of them is zero), we can replace x and y by the unit
vectors x/|x| and y/|y|, getting

    x · y ≤ |x| |y|.

Now replace x by −x to get

    −x · y ≤ |x| |y|.

The last two inequalities imply the Cauchy-Schwarz inequality.

Notice that the Cauchy-Schwarz inequality may be written as

    |x · y| / (|x| |y|) ≤ 1,

so there will always be an angle θ such that

    cos θ = (x · y) / (|x| |y|).

Defining the cosine of the angle θ between x and y in this way is sometimes
more satisfactory than defining θ itself, though it does not show us how to
tell whether the angle should be counted as positive or negative.

Using the Cauchy-Schwarz inequality, it is easy to give the deferred
proof of the triangle inequality, for from

    |x · y| ≤ |x| |y|,

we get

    |x + y|^2 = (x + y) · (x + y) = |x|^2 + 2 x · y + |y|^2
              ≤ |x|^2 + 2 |x| |y| + |y|^2 = (|x| + |y|)^2,

from which follows

    |x + y| ≤ |x| + |y|.

Two vectors x and y that are perpendicular with respect to an inner
product are sometimes called orthogonal. Furthermore, a set S of vectors
that are mutually orthogonal and have length 1 is called an orthonormal
set. The idea has a useful application to finding the inverses of certain
matrices. Any square matrix whose rows, or whose columns, looked at as
vectors in R^n, form an orthonormal set with respect to the Euclidean dot
product is called an orthogonal matrix. It is very easy to find the inverse
of such a matrix. Suppose the columns of

    A = ( a_11  a_12  ... )
        ( a_21  a_22  ... )
        (  .     .        )

form an orthonormal set. Thus, for example, a_11^2 + a_21^2 + ... = 1,
a_12^2 + a_22^2 + ... = 1, and a_11 a_12 + a_21 a_22 + ... = 0. We form the matrix
A^t, called the transpose of A, by reflecting A across its main diagonal. Thus

    A^t = ( a_11  a_21  ... )
          ( a_12  a_22  ... )
          (  .     .        )

Then the fact that the columns of A are orthonormal shows that A^t A = I.
We shall see later (Theorem 8.3) that this implies also that A A^t = I. A
similar argument replacing orthonormal columns by orthonormal rows
would show that A A^t = I and then (by Theorem 8.3) that A^t A = I. Thus
we have proved the following.

5.5 Theorem

Every orthogonal matrix A is invertible and A^(-1) = A^t.

Example 4. It is easy to verify that the matrix

    A = (  1/2  −√3/2 )
        ( √3/2    1/2 )

has columns (and rows) orthonormal. The transposed matrix is

    A^t = (   1/2  √3/2 )
          ( −√3/2   1/2 )

and so A^(-1) = A^t.

The second of the above two square matrices is obtained from the first
by interchanging rows and columns. When two matrices are so related
we still say that one is the transpose of the other, even if they are not
square. Thus if

    A = ( 1  2  3 )
        ( 4  5  6 )

then

    A^t = ( 1  4 )
          ( 2  5 )
          ( 3  6 )

Obviously, (A^t)^t = A for any matrix A.
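Theorem 5.5 is easy to confirm numerically. The following is a minimal sketch in
Python with NumPy (ours, not part of the original text), using a rotation matrix as the
orthogonal matrix:

    import numpy as np

    theta = np.pi / 3
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # a rotation, hence orthogonal

    print(np.allclose(A.T @ A, np.eye(2)))            # True: the columns are orthonormal
    print(np.allclose(np.linalg.inv(A), A.T))         # True: A^{-1} = A^t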

EXERCISES

1. Show that the natural basis vectors satisfy e_i · e_i = 1, and e_i · e_j = 0 if
i ≠ j.

2. (a) Find the angle between the vectors (1, 1, 1) and (1, 0, 1).
(b) Find a vector of length 1 perpendicular to both vectors in part (a).

3. Find the distance between (1, 2) and (0, 5).

4. Prove that the dot product has the properties listed in 5.1.

5. Prove the positivity and homogeneity properties of length listed in 5.3.

6. Find the coordinate of the vector x = (1, −1, 2): (a) in the direction of
n = (1/√3, 1/√3, 1/√3) and (b) in the direction of the nonunit vector
(1, 1, 3). (c) What is the component of x in the direction of n?

7. (a) Prove that if x is any vector in R^n and n is a unit vector, then x can be
written as x = y + z, where y is a multiple of n and z is perpendicular to
n. [Hint. Take y to be the component of x in the direction of n.]
(b) Show that the vectors y and z of part (a) are uniquely determined. The
vector z so determined is called the component of x perpendicular to n.

8. (a) Consider the vector space C[0, 1] consisting of all continuous real-
valued functions defined on the interval 0 ≤ x ≤ 1. [The sum of f and
g is defined by (f + g)(x) = f(x) + g(x), and the numerical multiple
rf by (rf)(x) = rf(x).] Show that the product <f, g> defined by

    <f, g> = ∫ from 0 to 1 of f(x)g(x) dx

is an inner product on C[0, 1].
(b) Define the norm of f by |f| = <f, f>^(1/2). What is the norm of f(x) = x?

9. (a) Show that the product of x = (x_1, x_2) and y = (y_1, y_2) defined by

    x * y = x_1 y_1 + 2 x_2 y_2

is an inner product.
(b) With length defined by |x|_* = (x * x)^(1/2), sketch the set of points
satisfying |x|_* = 1.

10. Show that if |x| = |y| in an inner product space, then x + y is perpendicular
to x − y.

11. Show that the matrix

    ( cos θ  −sin θ )
    ( sin θ   cos θ )

is orthogonal, and find its inverse.

12. Derive the inequality

    |x| − |y| ≤ |x − y|

from the triangle inequality, and then show that

    ||x| − |y|| ≤ |x − y|.

13. (a) Show that if A is a 2-by-2 orthogonal matrix, then |Ax| = |x| for all
vectors x in R^2.
(b) Prove the result of part (a) for n-by-n matrices.
(c) Prove that, if A is n-by-n and |Ax| = |x| for all x, then Ax · Ay =
x · y for all x and y in R^n.

14. A real-valued linear function f: R^n → R is sometimes called a linear
functional. Show that, if f is a linear functional defined on R^n, then there is
a fixed vector y in R^n such that f(x) = y · x for all x in R^n. [Hint. What is
the matrix of f?]

15. Let A and B be two matrices of your own choosing, one 2-by-2 and one
2-by-3. Compute AB, BA, A^t B^t, and B^t A^t, whenever these products are
defined.

16. (a) For any two matrices A and B, show that if AB is defined, so is B^t A^t,
and that B^t A^t = (AB)^t.
(b) Show that if A is invertible, then so is A^t, and that (A^t)^(-1) = (A^(-1))^t.

SECTION 6

EUCLIDEAN GEOMETRY

In this section we develop some facts and formulas about lines and planes
in R^2 and R^3. Some of the ideas will reappear in Section 1 of Chapter 3.
Recall that if x_1 is a nonzero vector in R^2 or R^3, then the set of all
numerical multiples tx_1 is a line L_0 passing through the origin, and any
line parallel to L_0 consists of the set of all points tx_1 + x_0, where x_0 is some
fixed vector. An alternative way to say the same thing is: a line is the
range of a function of the form f(t) = tx_1 + x_0, where x_1 ≠ 0.

Example 1. To describe a line L passing through two distinct points
a_1 and a_2, take x_1 = a_2 − a_1. Then all multiples t(a_2 − a_1) make up a line
parallel to L, and the points t(a_2 − a_1) + a_1 make up L itself. For example,
when t = 0 we get a_1, and when t = 1 we get a_2. See Fig. 20. Alternatively,
we can write the points on L in the form ta_2 + (1 − t)a_1.

Figure 20

To determine a plane in R^3, we take two noncollinear vectors x_1 and x_2
(that is, so that neither is a multiple of the other) and consider all points
ux_1 + vx_2, where u and v are numbers. These points form a plane P_0
through the origin. A parallel plane will then consist of all points ux_1 +
vx_2 + x_0, where x_0 is some fixed vector. We can restate the definition by
saying that a plane is the range of a function g: R^2 → R^3, where g(u, v) =
ux_1 + vx_2 + x_0, and the vectors x_1, x_2 do not lie on a line.

Example 2. For three points a_1, a_2, and a_3 to determine a unique plane
passing through them, they must not lie on a line. (Then the vectors a_3 − a_1
and a_2 − a_1 are not collinear. For otherwise a_3 − a_1 = t(a_2 − a_1), so
a_3 = ta_2 + (1 − t)a_1 and a_3 would lie on the line determined by a_2 and
a_1.) Now take x_1 = a_3 − a_1 and x_2 = a_2 − a_1. The desired plane consists
of all points

    w = u(a_3 − a_1) + v(a_2 − a_1) + a_1.
Figure 21

To get a_1, take u = v = 0. To get a_2, take u = 0, v = 1. To get a_3, take
u = 1, v = 0. See Fig. 21.

Using the dot product gives an alternative way to describe a plane P.
Suppose x_0 is an arbitrary point on P and that p is a nonzero vector
perpendicular to both x_1 and x_2 (so p · x_1 = p · x_2 = 0), where x_1 and x_2 are
vectors parallel to P. Then the equation

    p · (x − x_0) = 0     (1)

is satisfied by all points x = ux_1 + vx_2 + x_0 on P because p · (ux_1 +
vx_2) = u p · x_1 + v p · x_2 = 0. It is also easy to check that Equation (1)
can be solved for x to get x = uy_1 + vy_2 + x_0, and we leave the
solution as an exercise. Figure 22 shows the relationship between
the vectors. To find a vector p perpendicular to two vectors x_1
and x_2, it is convenient to use the cross-product of x_1 and x_2,
defined for x_1 = (u_1, u_2, u_3) and x_2 = (v_1, v_2, v_3) by

    p = (u_2 v_3 − u_3 v_2, u_3 v_1 − u_1 v_3, u_1 v_2 − u_2 v_1).     (2)

It is routine to check that p · x_1 = 0 and p · x_2 = 0. For example,

    p · x_1 = u_1 u_2 v_3 − u_1 u_3 v_2 + u_2 u_3 v_1 − u_2 u_1 v_3 + u_3 u_1 v_2 − u_3 u_2 v_1 = 0.

The cross-product is taken up in more detail in the next section.
In the meantime we will simply use it to find a vector perpendic-
ular to two given vectors. See Problem 13.

Example 3. Suppose we are given a plane parallel to the two vectors
x_1 = (1, 2, −3) and x_2 = (2, 0, 1), and containing the point x_0 =
(1, 1, 1). A vector p perpendicular to x_1 and x_2 is given by their cross-
product:

    p = ((2)(1) − (0)(−3), (2)(−3) − (1)(1), (1)(0) − (2)(2))
      = (2, −7, −4).

Now writing x = (x, y, z), we require that p · (x − x_0) = 0, that is,
(2, −7, −4) · (x − 1, y − 1, z − 1) = 0. According to the definition of
the dot product, this last equation is

    2(x − 1) − 7(y − 1) − 4(z − 1) = 0
or
    2x − 7y − 4z + 9 = 0.     (3)

In other words, the given plane consists of all points (x, y, z) with co-
ordinates satisfying that equation.

The simplest way to sketch the plane satisfying an equation like (3)
is to pick three points with simple coordinates that satisfy it, say (0, 0, 9/4),
(0, 9/7, 0), and (−9/2, 0, 0). Then locate the points relative to coordinate
axes and sketch the plane using the three points as references. We have
purposely chosen in our example points that lie on the coordinate axes.
See Fig. 23.

Figure 23
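The cross-product computation of Example 3 is easy to mechanize. A minimal sketch in
Python (an editorial illustration, not part of the original text) that produces the
coefficients of the plane equation p · (x − x_0) = 0:

    def cross(u, v):
        return (u[1]*v[2] - u[2]*v[1],
                u[2]*v[0] - u[0]*v[2],
                u[0]*v[1] - u[1]*v[0])

    x1, x2, x0 = (1, 2, -3), (2, 0, 1), (1, 1, 1)
    p = cross(x1, x2)                             # (2, -7, -4)
    d = -sum(pi*xi for pi, xi in zip(p, x0))      # constant term of the equation
    print(p, d)                                   # coefficients of 2x - 7y - 4z + 9 = 0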

If p is a nonzero vector in R^2 and x_0 is a point in R^2, we can determine
the line perpendicular to p and containing x_0 by the equation p · (x − x_0) =
0. If p = (p_1, p_2) and x_0 = (x_0, y_0), the equation becomes

    (p_1, p_2) · (x − x_0, y − y_0) = 0
or
    p_1(x − x_0) + p_2(y − y_0) = 0.

This is one of the several forms for the equation of a line in the xy-plane.
The slope of the line is evidently −p_1/p_2.

The representation of a plane by an equation p · (x − x_0) = 0 is not
unique because any nonzero multiple of p can replace it, leaving the set of
vectors x satisfying the equation unchanged. However, it is sometimes
useful to normalize the equation of a plane by requiring that p be a unit
vector. The normalized equation then becomes n · (x − x_0) = 0, where
n = p/|p|. Alternatively we can write n · x = c, where c = n · x_0.

6.1 Theorem

If n · x = c is the normalized equation of a plane P and y is any point,
then the distance from y to P is the absolute value of c − n · y.

Proof. By definition the distance is to be measured along a line from
y perpendicular to P. This line can be represented by tn + y, and the
intersection with P will occur for some t = t_0. The desired distance is
then |(t_0 n + y) − y| = |t_0 n|, which is simply the absolute value of t_0,
since n has length 1. Since t_0 n + y lies in P, then n · (t_0 n + y) = c.
But n · n = 1, so we obtain t_0 + n · y = c, from which the theorem
follows.

Example 4. We shall find the distance from (1, 1, 1) to the plane
(3, 0, −4) · x = −3. The normalized equation of the plane is given by
(3/5, 0, −4/5) · x = −3/5. Then

    c − n · y = −3/5 − (3/5, 0, −4/5) · (1, 1, 1) = −3/5 + 1/5 = −2/5.

Hence the distance is 2/5. Notice that the equation of the plane could also
be written 3x − 4z = −3 and, in normalized form, (3/5)x − (4/5)z = −3/5.
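Theorem 6.1 translates directly into code. The following is a minimal sketch in Python
(ours, not part of the original text); it normalizes the given equation internally, so
the plane may be supplied in un-normalized form:

    import math

    def distance_to_plane(p, c, y):
        """Distance from the point y to the plane p.x = c."""
        norm = math.sqrt(sum(a*a for a in p))
        n = [a / norm for a in p]                        # normalized equation n.x = c/norm
        return abs(c / norm - sum(a*b for a, b in zip(n, y)))

    print(distance_to_plane((3, 0, -4), -3, (1, 1, 1)))  # 0.4 = 2/5, as in Example 4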

EXERCISES

1. Write a formula in the form f(t) = tx_1 + x_0 for a line containing x_0 and
parallel to x_1. In each case determine whether the point a lies on the line,
that is, whether a is in the range of f.
(a) x_0 = (1, 1), x_1 = (−2, 3); a = (−3, 7).
(b) x_0 = (0, 0, 1), x_1 = (1, 1, 1); a = (5, 5, 4).
(c) x_0 = (1, 0, 2), x_1 = (2, 0, 3); a = (3, 0, 5).

2. Write a formula in the form g(u, v) = ux_1 + vx_2 + x_0 for a plane containing
x_0 and parallel to x_1 and x_2. In each case determine whether the point a
lies on the plane, that is, whether a is in the range of g.
(a) x_0 = (0, 0, 0), x_1 = (1, 0, 1), x_2 = (2, 3, 0); a = (4, 3, 1).
(b) x_0 = (1, −1, 1), x_1 = (1, 1, 0), x_2 = (−1, 2, 0); a = (1, 1, 2).
(c) x_0 = (0, 0, 1), x_1 = (1, 2, 1), x_2 = (2, 1, 2); a = (3, 0, 4).

3. Sketch the lines determined in R^2 or R^3 by
(a) t(1, 2) + (−1, −1).
(b) t(1, 0, 1) + (1, 1, 0).
(c) (1, 2) · x = 2.

4. Sketch the plane determined in R^3 by
(a) u(1, 2, 0) + v(2, 0, 1) + (1, 0, 0).
(b) (−1, −1, 1) · x = 1.
(c) 2x + y + z = 0.

5. Find an equation for the line in R^2 that is perpendicular to the line tx_1 + x_0
and passes through the point y_0.

6. Find the cosine of the angle between the planes x_1 · x = c_1 and y_1 · x = c_2.

7. Prove that if x_1 · x = c_1 and y_1 · x = c_2 are normalized equations of two
planes, then the cosine of the angle between them is x_1 · y_1.

8. For each of the points and planes or lines listed below, find the distance
from the point to the plane or line.
(a) (1, 0, −1); (1, 1, 1) · x = 1.
(b) (1, 0, −1); x + 2y + 3z = 1.
(c) (1, 2); (3, 4) · x = 0.

9. (a) Verify that the cross-product of x_1 and x_2 [formula (2) in the text] is
perpendicular to x_2.
(b) Find a representation for the line perpendicular to the plane consisting
of all points u(1, 2, 1) + v(−1, 0, 1) + (1, 1, 1) and passing through the
origin.

10. (a) Find a representation for x = (x, y, z) satisfying

    2x + 3y + z = 1

by finding vectors x_1, x_2, and x_0 such that x = ux_1 + vx_2 + x_0. [Hint.
Let u = y, v = z.]
(b) Do the same as in part (a) for the general equation p · (x − x_0) = 0,
where p ≠ 0. [Hint. If p ≠ 0, then p has a nonzero coordinate.]

11. (a) If x is any vector in R^3, show that

    x = (x · e_1)e_1 + (x · e_2)e_2 + (x · e_3)e_3.

(b) If the vector x in part (a) is a unit vector, that is, a vector u of length 1,
show that u · e_i = cos α_i, where α_i is the angle between u and e_i. The
coordinates cos α_i are called the direction cosines of u relative to the
natural basis vectors e_i. If x is any nonzero vector, the direction
cosines of x are defined to be the direction cosines of the unit vector
x/|x|.
(c) Find the direction cosines of (1, 2, 1).
(d) Show that parts (a) and (b) generalize to R^n.

12. Let u and v be points in R^n. Show that the point (1/2)u + (1/2)v is the midpoint of
the line segment joining u and v.

13. Let u and v be noncollinear vectors in R^3. To find a vector x perpendicular
to both u and v, we solve the equations u · x = 0 and v · x = 0, that is,
solve

    u_1 x + u_2 y + u_3 z = 0
    v_1 x + v_2 y + v_3 z = 0,     (*)

where u = (u_1, u_2, u_3), v = (v_1, v_2, v_3), and x = (x, y, z).
(a) Show that

    (u_2 v_1 − u_1 v_2)y = (u_1 v_3 − u_3 v_1)z
    (u_1 v_2 − u_2 v_1)x = (u_2 v_3 − u_3 v_2)z.

(b) Show that if u_1 v_2 − u_2 v_1 ≠ 0, then the equations (*) have a solution

    (x, y, z) = (u_2 v_3 − u_3 v_2, u_3 v_1 − u_1 v_3, u_1 v_2 − u_2 v_1).

(c) Show how to solve (*) if u_1 v_2 − u_2 v_1 = 0.

SECTION 7

DETERMINANTS

In this section we define and study a certain numerical-valued function
defined on the set of all square matrices. The value of this function for a
square matrix M is called the determinant of M and is written det M.
Another common way to denote the determinant of a matrix in displayed
form is to replace the parentheses enclosing the array of entries by vertical
bars, so that an array between vertical bars means the determinant of the
matrix displayed.

If A is a square matrix, we write A_ij for the minor of A corresponding
to the entry a_ij, that is, the matrix obtained from A by deleting the ith
row and the jth column.

We can now make the definition of determinant:

For a 1-by-1 matrix A = (a), we define

    det A = a.

For an n-by-n matrix A = (a_ij), i, j = 1, ..., n, we define

7.1    det A = Σ_{j=1}^{n} (−1)^{j+1} a_1j det A_1j
             = a_11 det A_11 − a_12 det A_12 + ... + (−1)^{n+1} a_1n det A_1n.

In words, the formula says that det A is the sum, with alternating signs, of
the elements of the first row of A, each multiplied by the determinant of
its corresponding minor. For this reason the numbers

    det A_11, −det A_12, ..., (−1)^{n+1} det A_1n

are called the cofactors of the corresponding elements of the first row of A.
In general, the cofactor of the entry a_ij in A is defined to be (−1)^{i+j} det A_ij.
Thus in Example 1 the entry a_21 = 8 in the matrix A has cofactor

    (−1)^{2+1} det A_21 = 40.

The factor (−1)^{i+j} associates plus and minus signs with det A_ij according
to the pattern

    ( +  −  +  −  ... )
    ( −  +  −  +  ... )
    ( +  −  +  −  ... )
    ( .  .  .  .      )
Example 2.

(a) det ( 1  2 ) = 1(4) − 2(3) = 4 − 6 = −2.
        ( 3  4 )

(b) det ( a  b ) = ad − bc.
        ( c  d )

The result of the last calculation is worth remembering as a rule: the
determinant of a 2-by-2 matrix is the product of the entries on the main
diagonal minus the product of the other two entries. Thus 2-by-2
determinants can usually be computed mentally, and 3-by-3 determinants
in one or two lines. In principle, any determinant can be calculated from
the definition, but this involves formidable amounts of arithmetic if the
dimension is at all large. Some of the theorems we prove will justify other
methods of calculation, which involve less arithmetic than that required in
working directly from the definition for n > 3.
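Definition 7.1 is recursive, and a direct translation into code makes the point about
arithmetic cost vividly: the running time grows like n!. The following is a minimal
sketch in Python (an editorial illustration, not part of the original text):

    def det(A):
        """Determinant by expansion along the first row (Definition 7.1).
        Clear, but far too slow for large matrices, as the text warns."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 1 and column j+1
            total += (-1) ** j * A[0][j] * det(minor)
        return total

    print(det([[1, 2], [3, 4]]))                    # -2
    print(det([[1, 2, 3], [-1, 2, 4], [0, 1, 2]]))  # 1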

Determinants were originally invented (in the middle of the eighteenth
century) as a means of expressing the solutions of systems of linear
equations. To see how this works for two equations in two unknowns,
consider the general system

    a_11 x_1 + a_12 x_2 = r_1
    a_21 x_1 + a_22 x_2 = r_2.

The variable x_2 can be eliminated by multiplying the first equation by a_22
and the second by a_12, and then taking the difference. The result is the
equation

    (a_22 a_11 − a_12 a_21) x_1 = a_22 r_1 − a_12 r_2.

This equation may be written as

    x_1 det A = det B^(1),
where

    A = ( a_11  a_12 )
        ( a_21  a_22 )

is the matrix of coefficients and

    B^(1) = ( r_1  a_12 )
            ( r_2  a_22 )

is the result of replacing the first column of A by (r_1, r_2). The reader can
easily derive the equation x_2 det A = det B^(2), where

    B^(2) = ( a_11  r_1 )
            ( a_21  r_2 )

is the result of substituting (r_1, r_2) for the second column of A. As we shall
see in Theorem 8.4, a similar result holds for systems of n linear equations
in n unknowns, for all values of n.

Since our definition of determinants by 7.1 is inductive, most of the
proofs have the same character. That is, to prove a theorem about
determinants of all square matrices, we verify it for 1-by-1 (or in some
cases, 2-by-2) matrices, and also show that if it is true for (n − 1)-by-
(n − 1) matrices, then it holds for n-by-n matrices. In the proofs, we
give only the argument for going from step n − 1 to step n. The reader
should verify the propositions directly for 1-by-1 and 2-by-2 matrices;
the verification is in all cases quite trivial. A, B, and C will always denote
n-by-n matrices. We write a_j for the jth column of a matrix A. If A has n
rows, then a_j is a vector in R^n.

7.2 Theorem

If B is obtained from A by multiplying some column by a number r,
then det B = r det A.

Proof. Suppose b_j = r a_j, while b_k = a_k for k ≠ j. Then in particular
b_1j = r a_1j. For k ≠ j, B_1k is obtained from A_1k by multiplying a
column by r; since B_1k and A_1k are (n − 1)-by-(n − 1), we have
det B_1k = r det A_1k by the inductive assumption. On the other hand,
B_1j = A_1j, and b_1k = a_1k for k ≠ j. Thus, whether k = j or not,
b_1k det B_1k = r a_1k det A_1k. Therefore

    det B = Σ_{k=1}^{n} (−1)^{k+1} b_1k det B_1k
          = Σ_{k=1}^{n} (−1)^{k+1} r a_1k det A_1k = r det A.

7.3 Corollary

If a matrix has a zero column, then its determinant is zero.

Proof. If a_j = 0, then a_j = 0 a_j. Then by Theorem 7.2, det A =
0 · det A = 0.
Example 3. Let

    A = (  1  2  3 )        B = (  1  6  3 )
        ( −1  2  4 )            ( −1  6  4 )
        (  0  1  2 )            (  0  3  2 )

B is obtained from A by multiplying the second column by 3.

    det A = (1)(4 − 4) − 2(−2 − 0) + 3(−1 + 0) = 0 + 4 − 3 = 1
    det B = (1)(12 − 12) − 6(−2 − 0) + 3(−3 + 0) = 0 + 12 − 9 = 3 = 3 det A.

7.4 Theorem

Let A, B, and C be identical except in the jth column, and suppose
that the jth column of C is the sum of the jth columns of A and B.
Then det C = det A + det B.

Proof. We have c_1j = a_1j + b_1j, and also C_1j = A_1j = B_1j. For
k ≠ j, c_1k = a_1k = b_1k, and C_1k is identical with A_1k and B_1k except
for one column, which is the sum of the corresponding columns of
A_1k and B_1k. Thus for k ≠ j, det C_1k = det A_1k + det B_1k, by the
inductive assumption. For k ≠ j we have

    c_1k det C_1k = c_1k det A_1k + c_1k det B_1k
                  = a_1k det A_1k + b_1k det B_1k,

while

    c_1j det C_1j = a_1j det C_1j + b_1j det C_1j
                  = a_1j det A_1j + b_1j det B_1j.

Hence

    det C = Σ_{k=1}^{n} (−1)^{k+1} c_1k det C_1k
          = Σ_{k=1}^{n} (−1)^{k+1} a_1k det A_1k + Σ_{k=1}^{n} (−1)^{k+1} b_1k det B_1k
          = det A + det B.
7.5 Theorem

The determinant det (a_1, ..., a_n) of an n-by-n matrix is a linear
function of each column a_j when the remaining columns are held fixed.

Example 4. Let b = (b_1, b_2, b_3) and c = (c_1, c_2, c_3) be fixed vectors in R^3.
To exhibit det (x, b, c) as a linear function of x, we need merely calculate

    det ( x_1  b_1  c_1 )
        ( x_2  b_2  c_2 )
        ( x_3  b_3  c_3 )

    = x_1 det ( b_2  c_2 ) − b_1 det ( x_2  c_2 ) + c_1 det ( x_2  b_2 )
              ( b_3  c_3 )           ( x_3  c_3 )           ( x_3  b_3 )

    = x_1(b_2 c_3 − b_3 c_2) − b_1(x_2 c_3 − x_3 c_2) + c_1(x_2 b_3 − x_3 b_2)

    = x_1(b_2 c_3 − b_3 c_2) − x_2(b_1 c_3 − b_3 c_1) + x_3(b_1 c_2 − b_2 c_1),

which is a linear function of x.

Another important property of determinants is that if any two columns
(or rows) of a matrix are interchanged, then its determinant changes sign.
We first prove the result for adjacent columns.

7.6 Lemma

If B is obtained from A by exchanging two adjacent columns, then
det B = −det A.

Proof. Suppose A and B are the same, except that a_j = b_{j+1} and
a_{j+1} = b_j. For k ≠ j or j + 1, we have b_1k = a_1k and det B_1k =
−det A_1k by the inductive hypothesis, so (−1)^{k+1} b_1k det B_1k =
−(−1)^{k+1} a_1k det A_1k. On the other hand, b_1j = a_{1,j+1} and B_1j =
A_{1,j+1}, so (−1)^{j+1} b_1j det B_1j = (−1)^{j+1} a_{1,j+1} det A_{1,j+1} = −(−1)^{j+2}
a_{1,j+1} det A_{1,j+1}. Similarly (−1)^{j+2} b_{1,j+1} det B_{1,j+1} = −(−1)^{j+1} a_1j det A_1j.
Thus each term in the expansion of det B by 7.1 is matched by
a term equal to its negative in the expansion of det A, and it follows
that det B = −det A.

Example 5. Let

    A = ( 1   3  −2 )        B = ( 1  −2   3 )
        ( 2  −4   1 )            ( 2   1  −4 )
        ( 3   5  −2 )            ( 3  −2   5 )

so that B is obtained from A by exchanging the second and third columns.
Then

    det A = (1)(8 − 5) − (3)(−4 − 3) + (−2)(10 − (−12))
          = 3 + 21 − 44 = −20
    det B = (1)(5 − 8) − (−2)(10 − (−12)) + (3)(−4 − 3)
          = −3 + 44 − 21 = 20
          = −det A.
7.7 Theorem

If B is obtained from A by exchanging any two columns, then
det B = −det A.

Proof. Suppose there are k columns between the two columns in
question (so k = 0 if they are adjacent). The first column can be
brought next to the second by k exchanges of adjacent columns.
Then the two columns can be exchanged, and with another k
exchanges of adjacent columns the second column can be put back
in the original place of the first. There are 2k + 1 steps in all, and
by Lemma 7.6 each step changes the sign of the determinant. Since
2k + 1 is an odd number, det B = −det A.

7.8 Theorem

If any two columns of A are identical, then det A = 0.

Proof. Exchanging the two columns gives A again. Therefore
det A = −det A, and so det A = 0.

Multiplication by an n-by-n matrix gives a linear function from R^n
to R^n, and it is natural to ask whether the determinant of the matrix is
related to some geometric property of the corresponding linear function.
It turns out that the determinant describes how the function affects
volumes in R^n. In R^2, of course, "volume" is area.

Example 6. Multiplication by ( 3  0 ) gives a function f: R^2 → R^2 which
                              ( 0  2 )
multiplies lengths in the x-direction by 3 and in the y-direction by 2.
Areas are magnified by a factor of 6, as illustrated in Fig. 24(a), which
shows a unit square S and its image f(S). Note that det ( 3  0 ) = 6.
                                                         ( 0  2 )

For another example, consider the function g given by the matrix ( 1  2 ),
                                                                  ( 0  1 )
which has determinant 1. Its effect is illustrated in Fig. 24(b). The unit
square is mapped into a parallelogram with the same base and altitude,
so the area remains unchanged. The composition g ∘ f multiplies areas
by 6 [since f(S) has 6 times the area of S and g(f(S)) has the same area
as f(S)]. The matrix of g ∘ f is given by the matrix product
    ( 1  2 )( 3  0 )  =  ( 3  4 ),
    ( 0  1 )( 0  2 )     ( 0  2 )

which has determinant 6, the product of the two determinants.

Figure 24

7.9 Theorem

Let f be the linear function from R^n to R^n defined by an n-by-n
matrix A, that is, f(x) = Ax. Then f multiplies volumes by the
factor |det A|.
7.10 Product Rule

If A and B are any two square matrices of the same size, then
det (AB) = (det A)(det B).

Proof. Let

    L(x_1, ..., x_n) = det A det (x_1, ..., x_n) − det (Ax_1, ..., Ax_n),

where x_1, ..., x_n are vectors in R^n. Clearly L is linear as a func-
tion of each vector x_j. Furthermore, L(e_{i_1}, ..., e_{i_n}) = 0 for any
set {e_{i_1}, ..., e_{i_n}} of natural basis vectors. The reason is that if
any of e_{i_1}, ..., e_{i_n} are the same, then both det (e_{i_1}, ..., e_{i_n}) and
det (Ae_{i_1}, ..., Ae_{i_n}) are zero by Theorem 7.8. Otherwise e_{i_1}, ..., e_{i_n}
are just e_1, ..., e_n in some order, and by Theorem 7.7

    det A det (e_{i_1}, ..., e_{i_n}) = ±det A det (e_1, ..., e_n) = ±det A det I
                                      = ±det A = ±det (Ae_1, ..., Ae_n)
                                      = det (Ae_{i_1}, ..., Ae_{i_n}).

But then L(b_1, ..., b_n) = 0, where

    b_j = Σ_{i=1}^{n} b_ij e_i

is the jth column of B. For, using the linearity of L,

    L(b_1, ..., b_n) = L(Σ_{i_1} b_{i_1 1} e_{i_1}, ..., Σ_{i_n} b_{i_n n} e_{i_n})
                     = Σ_{i_1} ... Σ_{i_n} b_{i_1 1} ... b_{i_n n} L(e_{i_1}, ..., e_{i_n}) = 0.

Hence,

    det A det B − det AB = det A det (b_1, ..., b_n) − det (Ab_1, ..., Ab_n)
                         = L(b_1, ..., b_n) = 0.

Note that the proof just given uses only Theorems 7.5, 7.7, and 7.8
and does not use Theorem 7.9. (The point is important because the
product rule is used in the proof in Section 7 of the Appendix, on which
the proof of Theorem 7.9 depends.)
The natural unit of area in R^2 is given by the unit square with edges
(1, 0) and (0, 1), and the natural unit of volume in R^3 is given by the unit
cube with edges (1, 0, 0), (0, 1, 0), (0, 0, 1). In general we take the unit of
volume in R^n to be that of the cube whose edges are the natural basis
vectors e_1, ..., e_n that form the columns of the n-by-n identity matrix.
Moreover, we can take any n vectors x_1, ..., x_n in R^n and form all linear
combinations t_1 x_1 + ... + t_n x_n, where each of the real numbers t_1, ...,
t_n satisfies the condition 0 ≤ t_i ≤ 1. The resulting set of points is called
the parallelepiped determined by its edges x_1, ..., x_n. If we choose only
two vectors x_1, x_2, then we speak of the parallelogram determined by
x_1 and x_2. Figure 25 shows a parallelepiped.

Figure 25

7.11 Theorem

Let a_1, a_2, ..., a_n be n vectors in R^n. Then the volume of the
parallelepiped with edges a_1, ..., a_n is |det (a_1, ..., a_n)|.

Proof. The linear function f whose matrix has columns a_1, ..., a_n
carries e_j into a_j, by Theorem 4.2. Hence, f transforms the unit cube
into the parallelepiped with edges a_j. Since it multiplies volumes by
the factor |det (a_1, ..., a_n)|, and the cube has unit volume, the
volume of the parallelepiped is |det (a_1, ..., a_n)|.

Example 7. Let

    a_1 = ( r_1 cos θ_1 ),    a_2 = ( r_2 cos θ_2 ).
          ( r_1 sin θ_1 )           ( r_2 sin θ_2 )

The vectors have lengths r_1 and r_2, and make angles θ_1 and θ_2 with the
x-axis as shown in Fig. 26. Then det (a_1, a_2) is

Figure 26

    det ( r_1 cos θ_1   r_2 cos θ_2 ) = r_1 r_2 (cos θ_1 sin θ_2 − sin θ_1 cos θ_2)
        ( r_1 sin θ_1   r_2 sin θ_2 )
                                      = r_1 r_2 sin (θ_2 − θ_1).

This number may be interpreted as the product of the base r_1 by the
perpendicular height r_2 sin (θ_2 − θ_1).
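Theorem 7.11 makes volume computations routine. A minimal sketch in Python with NumPy
(ours, not part of the original text), using the edges that also appear in Exercise 10
below:

    import numpy as np

    a1, a2, a3 = (1, 1, 0), (0, 1, 2), (-3, 5, -1)
    M = np.column_stack([a1, a2, a3])      # edges as the columns of a matrix
    volume = abs(np.linalg.det(M))
    print(volume)                          # 17.0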

We have seen that the absolute value of det (a_1, ..., a_n) can be
interpreted as a volume. The sign of this determinant also has a geometric
interpretation. We say that an ordered set of vectors (a_1, ..., a_n) in R^n
has positive orientation (or is positively oriented) if det (a_1, ..., a_n) > 0,
and has negative orientation if det (a_1, ..., a_n) < 0. If the determinant
is equal to zero, the orientation is not defined.

Example 7 shows that in R^2, the sign of det (a_1, a_2) is the same as the
sign of sin θ, where θ = θ_2 − θ_1 is the angle from a_1 to a_2. Thus the
orientation of (a_1, a_2) is positive if some counterclockwise rotation of less
than 180° will turn a_1 to the direction of a_2. The orientation is negative if a
clockwise rotation is required; it is not defined if a_1 and a_2 lie in the same
line. Thus in R^2, orientation corresponds to a direction of rotation.
(Note that the orientation of (a_1, a_2) is opposite to that of (a_2, a_1).)
Property 7.7 of determinants, of course, implies that the orientation of a
set of vectors is always reversed if two vectors of the set are exchanged.

The interpretation of orientation in R^3 is less obvious. The sets of
vectors (x, y, z) and (−x, y, z) shown in Fig. 27(a) and 27(b) have
opposite orientations since, by Theorem 7.2, det (−x, y, z) = −det
(x, y, z). The ordered set of vectors (x, y, z) is said to form a right-
handed system because, when the thumb and index finger of the right
hand are made to point in the x- and y-directions, the middle finger will
point in the z-direction. Similarly, (−x, y, z) form a left-handed system.

Figure 27   (a) right-handed   (b) left-handed

In this book we have chosen to draw pictures in 3-space so that the vectors
e_1, e_2, e_3 form a right-handed system. Since det (e_1, e_2, e_3) = det I = 1,
this implies that our right-handed system has positive orientation, and a
left-handed system would have negative orientation.
Let u = (u_1, u_2, u_3) and v = (v_1, v_2, v_3) be vectors in R^3. The vector
with coordinates

    (u_2 v_3 − u_3 v_2, u_3 v_1 − u_1 v_3, u_1 v_2 − u_2 v_1)

is the cross-product u × v of u and v, introduced in Section 6.

A convenient way to remember the formula for the cross-product is to
think of the formal "determinant"

    det ( e_1  e_2  e_3 )
        ( u_1  u_2  u_3 )
        ( v_1  v_2  v_3 )

and to expand it by its first row; the cofactors of e_1, e_2, e_3 are exactly
the coordinates of u × v.

Figure 28

It is sometimes appropriate to combine the ideas of volume and
orientation in a single concept. We define the oriented volume determined
by an n-tuple of vectors to be the ordinary volume if the orientation of the
n-tuple is positive and to be the negative of the volume if the orientation
is negative. Then the oriented volume of the ordered set (a_1, ..., a_n) is
equal to det (a_1, ..., a_n). The relation between oriented volume and
ordinary volume is very much like the relation between directed distance
on a line and ordinary distance. Indeed, oriented volume may be con-
sidered a generalization of directed distance, and we use the idea in
Chapter 7, Section 7.

EXERCISES

1. Find AB, BA, and the determinants of A, B, AB, and BA when

(a) A = ( 1  −2 ),   B = ( 2  3 ).        [Ans. det AB = −14.]
        ( 3   1 )        ( 4  5 )

2. Find the coefficients needed to express each of the following as a linear
function of the x's.

3. Show that if D is the diagonal matrix diag (r_1, ..., r_n), then det D =
r_1 r_2 ... r_n.

4. What is the relation between
(a) det A and det (rA)?
(b) det (a_1, a_2, ..., a_{n−1}, a_n) and det (a_n, a_{n−1}, ..., a_2, a_1)?

5. Verify the product rule for the pairs of matrices (a) and (b) in Problem 1.

6. Apply the product rule to show that, if A is invertible, then det A ≠ 0 and
det (A^(-1)) = (det A)^(-1).

7. Let A be an m-by-m matrix and B an n-by-n matrix. Consider the (m + n)-
by-(m + n) matrix

    ( A  0 )
    ( 0  B )

which has A in the upper left corner, B in the lower right corner, and zeros
elsewhere. Show that its determinant is equal to (det A)(det B). [Suggestion.
First consider the case where one of A or B is an identity matrix, and derive
the general result from the product rule.]

8. It is geometrically clear that a rotation in R^2 preserves orientations, that a
reflection reverses them, and that both leave areas unchanged. Verify this
by finding the determinants of the associated matrices. (See Problems 4 and
6 in Section 4.)

9. By interpreting the determinant as a volume, show that |det (x_1, x_2, x_3)| ≤
|x_1| |x_2| |x_3| for any three vectors in R^3, and that equality holds if and
only if the vectors are mutually orthogonal.

10. Find the volume, and the area of each side, of the parallelepiped with edges
(1, 1, 0), (0, 1, 2), (−3, 5, −1).
[Ans. volume = 17; areas = √166, √66, 3.]

11. If u = (2, 1, 3), v = (0, 2, −1), and w = (1, 1, 1), compute u × v,
det (u, v, w), (u × v) · w, (u × v) × w, and u × (v × w).

12. Prove that u × v = −v × u, that u × (v + w) = (u × v) + (u × w), and
that (au) × v = u × (av) = a(u × v), for a real.

13. Find a representation for a line perpendicular to (2, 1, 3) and (0, 2, −1),
and passing through (1, 1, 1).

14. Let P be a parallelogram determined by two vectors in R^3. Let P_x, P_y, and
P_z be the projections of P on the yz-plane, the zx-plane, and the xy-plane,
respectively. If A(P) is the area of P, show that A^2(P) = A^2(P_x) + A^2(P_y) +
A^2(P_z).

15. (a) Verify by direct coordinate computation that |u × v|^2 = |u|^2 |v|^2 −
(u · v)^2.
(b) Use the result of part (a) to show that |u × v| = |u| |v| sin θ, where θ is
the angle between u and v such that 0 ≤ θ ≤ π.
(c) Show that |u| |v| sin θ is the area of the parallelogram with edges u and v.

16. The complex numbers can be extended to the quaternion algebra H, which
is a four-dimensional vector space with natural basis {1, i, j, k}. Thus a
typical quaternion is written q = a_1 + a_2 i + a_3 j + a_4 k, where the a's are
real numbers. A product is defined in H by requiring i^2 = j^2 = k^2 = −1
and ij = −ji = k, jk = −kj = i, ki = −ik = j. The product of two
quaternions is got by multiplying out and using the above rules for products
of basis vectors. R^3 can be looked at as the vector subspace S of H consisting
of quaternions with "real part" equal to zero and thus with natural basis
{i, j, k}.
(a) Show that the quaternion product of two elements of S is not necessarily
in S.
(b) Define a product on S by first forming the quaternion product and then
replacing its real part by zero. Show that the resulting product is the
same as the cross-product in R^3.

17. Prove the identity a × (b × c) = (a · c)b − (a · b)c for vectors a, b, and c in
R^3. [Hint. Choose an orthonormal set of vectors (u_1, u_2, u_3) for R^3 so that
(u_1, u_2, u_3) is positively oriented and

    a = a_1 u_1 + a_2 u_2 + a_3 u_3
    b = b_1 u_1 + b_2 u_2
    c = c_1 u_1.]
SECTION 8

DETERMINANT EXPANSIONS

In the previous section we defined the determinant of an n-by-n matrix
A = (a_ij) by

    det A = Σ_{j=1}^{n} (−1)^{j+1} a_1j det A_1j,

where, in general, A_ij denotes the (n − 1)-by-(n − 1) minor corresponding
to a_ij. In the present section we prove more general formulas of the same
kind. These formulas, which apply to any n-by-n matrix A, are

8.1     det A = Σ_{i=1}^{n} (−1)^{i+j} a_ij det A_ij

8.1R    det A = Σ_{j=1}^{n} (−1)^{i+j} a_ij det A_ij.

Formula 8.1 holds for each integer j between 1 and n, while Formula
8.1R holds for each integer i between 1 and n. (For i = 1, Formula 8.1R
coincides with Formula 7.1 used earlier in defining determinants.) For a
given j, the matrix elements a_ij on the right side of 8.1 are in the jth column
of A, so 8.1 is called the expansion of det A by the jth column. Similarly,
the right side of 8.1R is called the expansion of det A by the ith row. We
postpone the proof of the formulas to the end of the section, first showing
some of their consequences.

Example 1. Let

    A = ( 2  3  4 )
        ( 5  6  7 )
        ( 8  9  0 )

The expansion of det A by the second row is

    −5 det ( 3  4 ) + 6 det ( 2  4 ) − 7 det ( 2  3 )
           ( 9  0 )         ( 8  0 )         ( 8  9 )

    = (−5)(−36) + (6)(−32) − (7)(−6)
    = 180 − 192 + 42 = 30.

The expansion by the third column is

    4 det ( 5  6 ) − 7 det ( 2  3 ) + 0 det ( 2  3 )
          ( 8  9 )         ( 8  9 )         ( 5  6 )

    = (4)(−3) − (7)(−6) + 0
    = −12 + 42 = 30.

Formulas 8.1 and 8.1R can be useful in evaluating a determinant, but
some of their theoretical consequences are more important. Let the
matrix A be given and consider the expression

    Σ_{i=1}^{n} (−1)^{i+j} x_i det A_ij,

where x_1, ..., x_n may be any set of n numbers. From 8.1 we see that this
is equal to a certain determinant; in fact it is the expansion by the jth
column of the matrix obtained from A by replacing the jth column with
(x_1, ..., x_n). Now consider what happens if we take x_1, ..., x_n equal
to the elements a_1k, a_2k, ..., a_nk of the kth column of A. If k = j, of
course we simply have the expansion of det A by the jth column. If
k ≠ j, we have the determinant of a matrix with two columns (the jth
and kth) identical, and by Theorem 7.8 the result is 0. We have proved:

8.2 Theorem

For any n-by-n matrix A,

    Σ_{i=1}^{n} (−1)^{i+j} a_ik det A_ij = det A if k = j,
                                         = 0     if k ≠ j.

An exactly similar argument, using 8.1R instead of 8.1, gives the "row"
form:

8.2R For any n-by-n matrix A,

    Σ_{j=1}^{n} (−1)^{i+j} a_kj det A_ij = det A if k = i,
                                         = 0     if k ≠ i.

The number (−1)^{i+j} det A_ij is called the ijth cofactor of the matrix A.
We shall abbreviate it as ā_ij and write Ā for the matrix with entries ā_ij.
Theorem 8.2 can be formulated as a statement about the matrix product
Ā^t A. The jkth entry in the product is the product of the jth row of Ā^t (i.e.,
the jth column of Ā) and the kth column of A. That is, it is the sum
Σ_{i=1}^{n} ā_ij a_ik. By Theorem 8.2 this is det A if j = k, and 0 otherwise. Hence
Ā^t A is equal to (det A)I, a numerical multiple of the identity matrix. A
similar calculation using 8.2R shows that A Ā^t is also equal to (det A)I.
If det A ≠ 0, we may divide Ā^t by it and obtain a matrix B such that AB =
BA = I. Thus we have proved

8.3 Theorem

If det A ≠ 0, then A is invertible, and A^(-1) = (det A)^(-1) Ā^t, where Ā
is the matrix of cofactors of A.

It is an important consequence of Theorem 8.3 that, if a matrix A is
known only to have a "right inverse" (AB = I) or a "left inverse" (BA =
I), then A is invertible and A^(-1) = B. This is true because the product rule
for determinants applied to AB = I, or to BA = I, gives

    (det A)(det B) = det I = 1.

But then det A ≠ 0, so A is invertible. That A^(-1) = B follows immediately.
(Why?)

Example 2. We shall compute the inverse of the matrix

    A = ( 2  3  4 )
        ( 5  6  7 )
        ( 8  9  0 )

used in Example 1. Write b_ij as an abbreviation for det A_ij; the matrix
B with entries b_ij is then easily calculated to be

    B = ( −63  −56  −3 )
        ( −36  −32  −6 )
        (  −3   −6  −3 )

To obtain the matrix of cofactors, insert the factors (−1)^{i+j}, changing the
sign of every second entry and giving

    Ā = ( −63   56  −3 )
        (  36  −32   6 )
        (  −3    6  −3 )

By Theorem 8.3, since det A = 30,

    A^(-1) = (1/30) Ā^t = (1/30) ( −63   36  −3 )
                                 (  56  −32   6 )
                                 (  −3    6  −3 )

Consider now a system of n linear equations in n unknowns, written in
matrix form as Ax = b, and suppose that det A ≠ 0. Then A^(-1) exists,
and any solution x must satisfy x = A^(-1)(Ax) = A^(-1)b. On
the other hand, A(A^(-1)b) = (AA^(-1))b = b. In other words, the equations
have a unique solution, and it is A^(-1)b. The jth entry in the column vector
A^(-1)b is the matrix product of the jth row of A^(-1) and the vector b. If
det A ≠ 0, we may express the elements of A^(-1) in terms of cofactors of A
and obtain

    (det A)^(-1) Σ_{i=1}^{n} (−1)^{i+j} (det A_ij) b_i

for this product. From 8.1 this may be recognized as (det A)^(-1) det B^(j),
where B^(j) is the result of replacing the jth column of A by b. We have
proved:

8.4 Cramer's Rule

If the determinant of the matrix of coefficients of a system of n linear
equations in n unknowns x_1, ..., x_n is different from zero, then
there is a unique solution and it is given by

    x_j = det B^(j) / det A,

where A is the matrix of coefficients and B^(j) is the result of replacing
the jth column of A by the column of numbers that make up the right
side of the equations.

Example 3. Solve the system

    x_1 − 2x_2 + 4x_3 =  1
    2x_1 + 3x_2 − x_3 =  3
    x_1 −  x_2 +  x_3 = −2.

We have

    A     = ( 1  −2   4 )        B^(1) = (  1  −2   4 )
            ( 2   3  −1 )                (  3   3  −1 )
            ( 1  −1   1 )                ( −2  −1   1 )

    B^(2) = ( 1   1   4 )        B^(3) = ( 1  −2   1 )
            ( 2   3  −1 )                ( 2   3   3 )
            ( 1  −2   1 )                ( 1  −1  −2 )

Expanding the determinants by their first rows gives

    det A     = (1)(2) − (−2)(3) + (4)(−5)   = 2 + 6 − 20  = −12
    det B^(1) = (1)(2) − (−2)(1) + (4)(3)    = 2 + 2 + 12  = 16
    det B^(2) = (1)(1) − (1)(3) + (4)(−7)    = 1 − 3 − 28  = −30
    det B^(3) = (1)(−3) − (−2)(−7) + (1)(−5) = −3 − 14 − 5 = −22.

Then x_1 = 16/(−12) = −4/3, x_2 = (−30)/(−12) = 5/2, and x_3 = (−22)/(−12) = 11/6.
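Cramer's rule is simple to implement when a determinant routine is available. The
following is a minimal sketch in Python with NumPy (an editorial illustration, not part
of the original text); the matrix is the one from Examples 1 and 2, and the right-hand
side is our own sample:

    import numpy as np

    def cramer(A, b):
        """Solve Ax = b by Cramer's rule (Theorem 8.4); assumes det A != 0."""
        A, b = np.asarray(A, float), np.asarray(b, float)
        d = np.linalg.det(A)
        x = []
        for j in range(len(b)):
            Bj = A.copy()
            Bj[:, j] = b                      # replace the jth column by b
            x.append(np.linalg.det(Bj) / d)
        return x

    A = [[2, 3, 4], [5, 6, 7], [8, 9, 0]]
    b = [1, 0, 0]
    print(cramer(A, b))    # approximately [-2.1, 1.867, -0.1], the first column of A^{-1}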

We have not made any assertions in this section about what happens
if the determinant of the coefficient matrix of a system of equations is zero.
It is an easy consequence of the product rule that a matrix with zero
determinant cannot be invertible (see Problem 6 in Section 7), and it will
be shown in Chapter 2, where we discuss the solution of linear systems
in detail, that in this case the system of equations has either no solution or
infinitely many. While Cramer's rule and the formula for A^(-1) in terms of
cofactors are important as theoretical results and are quite useful for
solving systems of two or three linear equations, they are less efficient for
larger systems than the methods of Section 1, Chapter 2.

Since 8.1 with j = 1 is exactly like 7.1 except that it refers to the first
column instead of the first row of a matrix, it is clear that transposing a
matrix (which just exchanges the roles of rows and columns) should not
affect the value of the determinant. The formal statement and proof
follow.

8.5 Theorem

For any square matrix A, det A^t = det A.

Proof. Let B = A^t. By definition of the transpose, b_ij = a_ji, and
it is easy to see that B_1j = (A_j1)^t. Thus by definition

    det B = b_11 det B_11 − b_12 det B_12 + ... + (−1)^{n+1} b_1n det B_1n
          = a_11 det (A_11)^t − a_21 det (A_21)^t + ... + (−1)^{n+1} a_n1 det (A_n1)^t.

The A_j1 are (n − 1)-by-(n − 1) matrices, and by the inductive
hypothesis we may replace det (A_j1)^t by det A_j1 in the formula above.
The result is equal to det A by 8.1 (with j = 1), and we have proved
det A^t = det B = det A.

We may now take any theorem about determinants that involves
columns of matrices and immediately derive a corresponding theorem
involving rows instead, by applying the given theorem to the transposes
of the matrices. In particular, 8.1R is a consequence of 8.1 and 8.5. We
shall not always bother to write out these corresponding theorems, but
shall refer to the "row" version of a numbered statement by using the same
number with an R after it. (We have already numbered 8.1R and 8.2R to
conform to this convention.) For example, Theorem 7.2R would read: If
B is obtained from A by multiplying some row by the number r, then
det B = r det A.

The particularly important fact that a determinant is a linear function of
each of its columns was proved in the previous section. It now follows
that a determinant is also a linear function of each of its rows.

The following theorem (and the corresponding theorem for rows)
leads to an efficient procedure for computing the determinants of large
matrices.

8.6 Theorem

If C and A are n-by-n matrices and C is obtained from A by adding a
numerical multiple of one column to another, then det C = det A.

Proof. Suppose C is the same as A except that c_j = a_j + r a_k. Let B
be the result of replacing a_j in A with r a_k. By Theorem 7.4, det C =
det A + det B. By Theorem 7.2, det B is r times the determinant of
a matrix with two identical columns. Therefore det B = 0 and
det C = det A.

Example 4. (a) Let

    A = ( 1   3  −2 )        C = ( 1   3   0 )
        ( 2  −4   1 )            ( 2  −4   5 )
        ( 3   5  −2 )            ( 3   5   4 )

The third column of C is equal to the third column of A plus 2 times the
first column. As in Example 5, Section 7, det A = −20. Then det C =
−20. As a check,

    det C = (1)(−16 − 25) − (3)(8 − 15) + (0)(10 − (−12))
          = −41 + 21 + 0 = −20.

(b) Let

A =
Sec. 8 Determinant Expansions 75

8.7 Lemma

If the first column of the matrix A is e_i, then

    det A = (−1)^{i+1} det A_i1.

Proof. By Definition 7.1,

    det A = Σ_{j=1}^{n} (−1)^{j+1} a_1j det A_1j.

If the first column of A is e_1, then a_11 = 1, while for j > 1 the first
column of A_1j is 0, and so det A_1j = 0 by Theorem 7.3. Thus the
expression for det A reduces to one term, det A_11, and Lemma 8.7
holds for the case i = 1. For i > 1 we need to use the inductive
hypothesis that Lemma 8.7 holds for (n − 1)-by-(n − 1) matrices.
In this case a_11 = 0 and

    det A = Σ_{j=2}^{n} (−1)^{j+1} a_1j det A_1j.

Each minor A_1j has e_{i−1} for its first column. (Removing the top entry
from the vector e_i in R^n gives e_{i−1} in R^{n−1}.) By the inductive
hypothesis,

    det A_1j = (−1)^i det A_{1j,i1},

where we write A_{1j,i1} for the matrix obtained from A by deleting
the first row, the jth column, the ith row, and the first column. Let
B = A_i1. Then (since the first column of B is formed from the
second column of A, etc.) A_{1j,i1} = B_{1,j−1} and a_1j = b_{1,j−1}. Com-
bining the equations we have derived so far gives

    det A = Σ_{j=2}^{n} (−1)^{j+1} b_{1,j−1} (−1)^i det B_{1,j−1}.

By Formula 7.1 the right side of this equation is (−1)^{i+1} det B,
which is (−1)^{i+1} det A_i1, as was to be proved.

Proof of 8.1. We first prove 8.1 for the special case j = 1. The first
column of A can be written as a_1 = a_11 e_1 + a_21 e_2 + ... + a_n1 e_n,
so by linearity of det and Lemma 8.7,

    det A = det (a_1, a_2, ..., a_n)
          = a_11 det (e_1, a_2, ..., a_n) + a_21 det (e_2, a_2, ..., a_n)
            + ... + a_n1 det (e_n, a_2, ..., a_n)
          = Σ_{i=1}^{n} (−1)^{i+1} a_i1 det A_i1.

To prove the general case, consider the matrix B obtained from A
by moving the jth column into the first position. This move requires
a series of j − 1 exchanges of adjacent columns; so by Theorem 7.6,
det A = (−1)^{j−1} det B. For all i, we have b_i1 = a_ij and B_i1 = A_ij.
Thus we obtain

    det A = (−1)^{j−1} det B
          = (−1)^{j−1} Σ_{i=1}^{n} (−1)^{i+1} b_i1 det B_i1
          = Σ_{i=1}^{n} (−1)^{i+j} a_ij det A_ij,

as was to be proved.

EXERCISES

1. Using appropriate cases of 8.1 or 8.1R, express each of the following
determinants as a linear function of the x's.

3. Compute the inverse of each of the following matrices by the cofactor
formula of Theorem 8.3. Check your answers by multiplication.

4. Solve the systems

(a) Ix + 6y = 5
'

6x + 5y = -3.
(b) 2x +y =
3y + z = \
Az + x = 2. M«5. x = --£ 2 S -I

(c) x x + x 2 + x3 + x4 = — 1
x — x2
1 + 2x4 =
->x 1 — X3 == *

x2 — xt = 0.

5. Use the method of Example 4(b) of the text to evaluate

/"'
78 Vectors and Linearity Chap. 1

which is zero except for the entries on or adjacent to the diagonal, is

called a tridiagonal matrix. Let dk be the determinant of the k-by-k minor


formed from the first k rows and columns of A, so, for example, d1 = a x
and d2 = a x a % — b 1 c 1 Show that, for k > 3, dk = ak dk _ l — b k _ 1 c k_ l dk_ i
. .

(b) Consider tridiagonal matrices in which the entries on the diagonal all
have the value 2 and the entries next to the diagonal all have the value 1.
Let dn be the determinant of an n-by-n matrix of this type. Find a formula
for dn [Suggestion. Start out by seeing what happens for n = 2, 3, 4.]
.
2

Linear Algebra

SECTION 1

A system of linear equations is a finite set of equations LINEAR EQUATIONS,


INVERSE MATRICES

a u Xi + . . . + a ln x n = bx

1.1 ...
a mlX l + • • • + a mn X n = t>m

where the a's and fs are given and the ;c's are to be determined. The whole
system can be written in matrix form as Ax = b, where A is the m-by-n
coefficient matrix with entries o i3 b is a column vector in %
,
m and x is a
,

column vector in 31". Doing the matrix multiplication in the equation

shows at once that it is equivalent to the system 1.1.

It frequently happens in applications that there are as many equations


as there are unknowns, so that m= n. Then, if the determinant of the

79
80 Linear Algebra Chap. 2

coefficient matrix not zero, the solution can be obtained by Cramer's


is

rule of Section 8,Chapter 1. The methods of the present section do not


use determinants, and we do not assume m = n; even for systems that
can be solved by Cramer's rule, these methods are more efficient when n
is greater than 3.

Any Ac = b is a solution of the system. As


vector c in 31" such that
we shall show, some systems have no solution, some have exactly one, and
some have infinitely many solutions.
We say that two systems are equivalent if they have exactly the same
set of solutions. Our procedure will be to take a given system and alter it

in a sequence of steps to obtain an equivalent system for which the solutions


are obvious. We illustrate the process with an example before giving a
general description.

Example 1.

3x + \2y + 9z = 3

2x + 5y + 4z = 4

-x + 3y + 2z = -5.

Multiply the first equation by \, which makes the coefficient of* equal to 1

and gives
x+ Ay + 3z = 1

2x+ 5y + 4z = 4
-x + 3y + 2z - -5.

Add (—2) times the first equation to the second, and replace the second
equation by the result. This makes the coefficient of x in the second
equation equal to and gives

x+ 4y + 3z = 1

- ly - 2z = 2

—x + 3y + 2z = —5.

Add the first equation to the third, and replace the third equation by the
result, to get
x + Ay + 3z = 1

- 3y - 2z = 2

ly + 5z = -4.

Multiply the second equation by —\, to get

x + Ay + 3z = 1

y+&= —§
ly + 5z = -A.
Sec. 1 Linear Equations, Inverse Matrices 81

Add (—4) times the second equation to the first, and (—7) times the
second equation to the third, to get

x ~r ~zZ = ~3—

y + §z = -I
17
3^

— 3- 2

Multiply the third equation by 3 to get

x ~r 3^ = "3"
y + |z= -f
z= 2.
Add (—5) times the third equation to the first and (— f) times the third
equation to the second to get

x= 3

y = -2
z= 2.

Clearly, this sytem has just one solution, namely, the column vector

It is easy to verify by substitution in a system of equations that we

have found a solution for them. This verification of course does not rule
out the theoretical possibility that the original equations might have other
solutions as well. In fact the final system is equivalent to the original
system and has the same set of solutions. The same is true for any pair of
systems where one is obtained from the other by steps such as were used
in this example. Before we can prove this, we must first state exactly

what operations are allowed and then investigate their properties.


The operations used were "multiplying an equation by a number,"
and "adding a multiple of one equation to another." We prefer to give the
formal definitions in terms of matrices and, accordingly, consider the
general matrix equation Ax — b.

We define three types of elementary operations which can be applied to


any matrix M\

An elementary multiplication replaces a row of M


by a numerical
multiple of the row, where the multiplier is different from 0.
82 Linear Algebra Chap. 2

An elementary modification replaces a row of M by the sum of that row


and a numerical multiple of some other row.

An elementary transposition interchanges two rows of M. We did not


use any transpositions in Example 1, but they are sometimes useful.

It is important to understand that, if an elementary operation is

applied to both sides of an equation Ax = b, the result is a new matrix


equation that is satisfied by every vector x that satisfied the original
equation. Equally important is the fact that each elementary operation
has an inverse elementary operation by which the original operation can
be undone or reversed. For example, multiplication of a row by a number
r ^ is reversed by multiplying the row by I jr. Similarly, an elementary
modification is reversed by subtracting the same numerical multiple of
the same row instead of adding it. Finally, a transposition is reversed by
interchanging the two rows again.
Written as a matrix equation, the original system of equations in
Example 1 becomes
3 12 9\ /3\

-1

If we apply the elementary operation of multiplying the first row by £


to the matrices on both sides, we obtain

which is form of the system of equations at the second step in


the matrix
Example 1.same way, each operation on the system of equations
In the
amounts to applying an elementary operation to the matrices on both
sides of the equivalent matrix equation. At the final stage, the matrix
equation becomes
0\ / 3>

1 |x = I -2
v
1/ \ 2,

which, since the matrix on the left is the identity matrix, simply amounts
to saying that x is equal to the vector on the right.
The theorem that justifies our method of solving linear equations is as
follows.
Sec. 1 Linear Equations, Inverse Matrices 83

1.2 Theorem

If the system A x x = bx is converted to a system A 2\ = b 2 by applying


a sequence of elementary operations to A x to get A 2 and the same
sequence of elementary operations to b x to get b 2 , then the two
systems are equivalent in the sense that they have the same solutions.

Proof. If x is a solution of Axx = b l5 then applying an elementary


operation to both sides produces a new matrix equation that is

still satisfied by x. So x also satisfies the equation A 2x = b 2 that


results from a finite sequence of such operations. Conversely,
having transformed A{x. = b x into A 2x = b 2 by a sequence of
elementary operations, we can find a sequence of inverse elementary
operations which transforms the new equation back to the original
one. But then the same argument that applied in the first part of the
proof allows us to conclude that every solution of A 2x = b2 is also
a solution of A±x — b x .

Example 2. We now exhibit a system of equations with infinitely many


solutions. Consider the matrix equation

-3

Add (— V) times the row to the second row, and then add 3 times the
first

first row to the third row to produce zeros in the second and third entries
of the first column, and obtain

Multiply the second row by (—1) to obtain

'1 -2 -3\ / 2^

1 5|x= j
-6
,0 -1 -5/ \ 6;

Add 2 times the second row to the first and then add 1 times the second
84 Linear Algebra Chap. 2

row to the third to obtain

1 7\ /-ION
1 5 )x = I -6
,0 0/ \ Oy

At the corresponding stage in Example 1 , we performed an elementary


multiplication to make the third entry in the third row equal to 1. We
were then able to obtain the identity matrix by further elementary opera-
tions. Obviously the row of zeros prevents us from following this pro-

into a system of linear equations. The result is

x +7z=-10
y + 5z = -6
Ox + Oy + Oz = 0.

The third equation is satisfied for any values of x, y, z. The first two
equations may be rewritten as x — — 10 — Iz and y = —6 — 5z. Thus,
for any value of z,

is a solution, and every solution has this form for some value of z. We
have now described the set of solutions of

and by Theorem 1.2 we know that this is the same as the set of solutions
of the matrix equation we started with.

Example 3. Consider the matrix equation

1 -2 -3\ /2\

\ -2 —^ x= 7

-3 5 4/ \2,
Sec. 1

The matrix on the left is the same as the one in Example



Linear Equations, Inverse Matrices

2. Carrying out
85

the same sequence of elementary operations yields

ION

2y

/, 7^
Whatever x is, the third row in the product 10 1 5 |x will be zero

— 0—
\0 0/
because the third row of the left factor is Thus no value of x can give
zero.
a column vector with 2 in the third row, and we conclude that the equation
has no solution.

equations, we obtain
x + 7z=-10
y + 5z = -6
Ox + Oy + Oz = 2.
The last equation obviously cannot be satisfied for any values of x, y, z.)

In these examples we used elementary operations to transform the


original systems of equations into equivalent systems for which the
solutions were easy to find. The property of the final set of equations
that made was that each equation involved a
the solutions obvious
variable that did not appear in any of the other equations. Thus in
Example 2 we found x = — 10 — Iz, y — —6 — 5z, and because the
first equation involved x but not y, and the second involved y but not x,

we could find the values of (x, y, z) satisfying both equations by consider-


ing the equations separately. In practice, there is no difficulty (beyond the
labor of doing the arithmetic) in reducing any system of equations to a
system that has this "noninterference" property. For such a reduced
system, it is either obvious that no solution exists, as in Example 3, or
else the solution or set of solutions can be described explicitly as in Ex-
ample 1 or Example 2. We leave for Section 3 the proof that this procedure
can always be carried out.
Suppose we have a square matrix A and are able to convert it to the
identity matrix / by some sequence of elementary operations, as in
Example 1. Theorem 1.2 then asserts that, if we carry out the same se-

quence of operations on a vector b and get c as a result, then Ax — b


86 Linear Algebra Chap. 2

ifand only if /x = c. In other words c is the (unique) solution of Ax b. =


Carrying out these same operations on the identity matrix gives a matrix C,
whose y'th column c t is the solution of Ac = e J5 where e, is theyth column
}

of /. Hence AC = I. This suggests that C = A' 1 and that we have


arrived at a method for computing matrix inverses. This is so, and we
state the result formally.

1.3 Theorem

If an n-by-n matrix A can be converted to the n-by-n identity matrix


/ by a sequence of elementary operations, then A is invertible and
A l
is equal to the result of applying the same sequence of operations
to/.

Proof. The preceding discussion showed that if C is the result of


applying to / some elementary operations that convert A to /, then
AC = I. To show that and C = A -1 we must show
A is invertible ,

that CA = I. The argument depends on the fact that elementary


operations are reversible. Now since C was constructed by applying
elementary operations to /, it follows that C can be converted back
to / by elementary operations. Applying to C the argument that we
previously applied to A shows that there is some matrix D such that
CD = I. We now have
A = AI = A{CD) = {AC)D = ID = D;
so D= A. Thus CA = I, as was to be proved.

Example 4. We shall find the inverse of

'2 4

A =

We start with
4

-3 -1,
Add —2 times the second row to the first and —1 times the second row
to the third to get

'0 4 8\ /l -2 0\

1
Sec. 1 Linear Equations, Inverse Matrices 87

Multiply the first row by \ and then add 3 times the first row to the third
to get
'0 1 2\ n -\ ON

o I, I 1

Multiply the third row by — 1 and then add —2 times the third row to the
first to get
'0 1 0\

1 I.

1,
v

Transpose the first and second rows to get

1 0\ /

oio, 1

.0 1/ \-|
The last matrix on the right is A~ x , as may be verified by multiplying by A.

EXERCISES
1. Solve the following systems of equations:

'

/
88 Linear Algebra Chap. 2

3. Solve the matrix equations:

1 2
(a)
5 6

4. For each of the following matrices A, find A 1 if A is invertible, and find a


nonzero solution of Ax = if A is not invertible.

Ans.

SECTION 2

SOME In this section we shall look at some applications of vectors and linear
APPLICATIONS equations. The selection of examples is made so as to avoid technical
complications from the fields of application. Several of our examples
involve the notion of a network, which we define to be a finite collection
of points, or nodes, some of which may be joined to some others by line
segments. It is theoretically unimportant whether a network is visualized
as lying in 2-dimensional space or 3-dimensional space; we choose which-
ever is pictorially more convenient. Some networks are illustrated in
Fig. 1.
Sec. 2 Some Applications 89

Figure 1

Example 1. Some electrical circuits can be considered as networks in


which each line represents a connecting wire with a given electrical re-
sistance. When such a circuit is connected to a battery or similar power
source, currents flow in the connections, and a value of electrical potential
is established at each node. For many types of connecting wires, the
current flow in a wire is proportional to the difference in the potential at
its two ends. In quantitative terms,

(»< - v t)
(1)

where c u is the current flowing from node to nodey, r a is the resistance /'

of the connection between nodes i andy, and vt and vt are the values of the
electrical potential at nodes
andy. (The appropriate units of measure-
/'

ment are amperes ohms


for resistance, and volts for potential.)
for current,
A negative value for the current from i toy is to be interpreted as a current
fro my to /'.

Figure 1(a) shows a circuit with four nodes and five segments, with
the resistance of each segment indicated beside it. Suppose an external
power source is connected at nodes 1 to 4 to maintain values v 1 = 12
and v4 = 0. Since node 2 has no external connection, the current flowing
in must balance the current flowing out, so that if signs are taken into
account, the sum of the currents out of node 2 must be zero. Using (1),
we get the equation

K»i - v i) + ip 2 - v3) + £(y 2 - v 4) = 0.

When rewritten in the form

(i + i + i)»2 to, + v3 + lv*

we see that v 2 is a weighted average of vlt v 3 and y 4 with coefficients , ,

which are the reciprocals of the resistances in the lines joining node 2 to
the others. A similar equation will hold at any node that does not have an
external connection. Thus at node 3 we get

(i + 1 + i)» 8 to. + v2 + *y4


90 Linear Algebra Chap. 2

Ifwe put in the assumed values vx — 12 and vt — 0, and rearrange terms,


we get
kv
3 29 t/ — y, = 6

+ fy8
-v 2 2.

Solving this system gives v2 and y 3 = -9- as 6.22. Once the


= £ « 7.33
2

potentials are known, other quantities can be calculated directly. For


example, the current from node to node 2 is calculated as {t\ — v 2 )jr 12 =
1

i(12 — 7.33) = 2.33. Similarly, the current from node to node 3 is 1

^(12 — 6.22) = 0.96. The total current flowing from node 1 into the rest
of the network is then 2.33 + 0.96 = 3.29, which must of course be equal
to the current flowing into node 1 from the external source.

Example 2. We can consider vectors in JR,2 or 'Ji 3 as representing forces


acting atsome point which, for convenience, we take to be the origin. The
direction of the arrow is the direction in which the force acts, and the
length of the arrow is the magnitude of the force. Our fundamental
physical assumption here is that, if more than one force acts at a point,
then the resultant force acting at the point is sum of the
represented by the
separate force vectors acting there. In Fig. 2 we have shown two different
pictures. The resultant arrow r is shown only in Fig. 2(a). For example,
suppose that the force vectors in Fig. 2(a) lie in a plane, which we take to
be :K
2
with the origin at the point of action. If we have

ff, = (-l,3), f2 = (4,3), (-2, -4), (2)

-»-f,

(a) (b)

Figure 2
,

Sec. 2 Some Applications 91

then by definition
r =f + 1 f2 +f = 3 (l,2).

But suppose we are given the directions of the three force vectors
and are asked to find constants of proportionality that will produce a
given resultant, say, r = (— 1, — 1). In other words, suppose we want to
find nonnegative numbers clt c 2 c 3 such that
,

Cjfa + c 2 f2 + c 3 f3 = (— 1, — 1).
(A negative value for some one of the c's would reverse the direction of
the corresponding force.) This vector equation is equivalent to the system
of equations we get by substituting the given vectors (1) that determine
the force directions. We find

Ci

IMaMiH-i) (3>

—c +x 4c 2 — 2c 3 = —1,
3c +x 3c 2 — 4c 3 = — 1.

Since we have two equations and three unknowns, we would expect, in


general, to be able to specify one of the cs and then solve for the others.
However, recall that the c's are to be nonnegative. (In particular, a glance
at Fig. 2(a) shows that we could not get a resultant equal to (—1,-1)
unless c 3 is actually positive.) Hence, we try c 3 = 1. This choice leads to
the pair of equations
-<?! + 4c 2 = 1

C\+ c2 = 1

which has the solution c x = f c 2 = f Then the triple {c x


, . , c 2 , c3 ) =
(f f 1) is one possible solution, and the three force vectors are
, ,

cA = (-f, f), c 2 i2 = (f , f), c 3 f3 = (-2, -4),

with magnitudes

IcAl = fVlO, |c 2 f2 |
= 2, |c 3 f3 |
= 2^5.

We could equally well have asked for an assignment of force magnitudes


that would put the system in equilibrium, that is, so that the resultant is

the zero vector. We would then have replaced the vector (— 1 , — 1) on the
right side of Equation (2) by (0, 0), and solved the new system in a similar
way.

Example 3. In this example we consider the idea of a random walk


in which an object always moves among finitely many positions along
92 Linear Algebra Chap. 2

specified paths, each path having a certain probability or likelihood of


being used. To be specific, we assume that the probability of leaving a
certain position along some
is the same for all paths leading
particular path
away from that position. Some sample configurations are shown in Fig. 3,
and it is clear that they form networks.

O l«5

(a) (b)

Figure 3

It is is always a number between


understood that a probability and 1

inclusive,and that the probability of a particular event is equal to the sum


of the probabilities of the various distinct ways in which that event can
occur. Thus in Fig. 3(a), since we assume that all paths away from a 5
are equally likely, it will happen that the probability of leaving a b along
each of the two possible paths is \. Similarly, each of the three paths
from a 4 has probability g. We further assume that transition from one
position to another is such that the probability of two successive events is
equal to the product of their respective probabilities. Thus going from
o 2 to a x to fl
4 has the probability (£)($) \. —
We can now ask a question such as the following: What is the proba-
bility p k of starting at a k and arriving at the specified position a 5 without
,

going to a4 l We see that, starting at a x , we can go to a b directly with


probability g, or we can go to a 2 with probability J-
and then go to a 5
with probability p 2 Thus
.

Pi = £ + (£)/>2

Similarly, because going to a 4 does not occur in the events we are watching,

Pi = (¥)Pi + i\)Pz

Pa = (*)/>•

We can rewrite these equations as

Pi - (i)/>2 = a

-(£)/>! +/>2-(i)/>3 =

— (h)P2 + Pz =
Sec. 2 Some Applications 93

and solve them by routine methods. We get

Pi = f» Pz — t> />3
= t-
It appears that, the nearer we the more likely we are to get to a 5
start to cr
5 ,

without going to a 4 . However,


important to understand that, in
it is

general, the values of the probabilities depend on the entire configuration.


An analysis like the one just given would be useful in distinguishing
completely random behavior of a creature in a maze from purposeful or
conditioned behavior.

Example 4. A system of interconnected water pipes can be thought of


as a network in which the nodes are joints. It is usual in such a network to
assign each pipe a natural direction of flow, indicated by arrows in Fig. 4.

(a)

Figure 4

Then a positive number r will represent a rate of flow in the assigned


direction, while a negative number — r will represent a flow of equal rate
in the opposite direction. We shall separate the flow rates into internal
and external rates denoted by tk (Specifying an external rate at
rates r k .

any joint has the effect of closing off the external pipe at that joint.) We
assume that the inflow at any joint must equal the outflow. Thus at the
upper left corner in Fig. 4(a) we find t1 r1 + r 2 while at the lower left = ,

we find tz + r2 — r3 For the network of Fig. 4(a), the complete


. set of
equations relating the r's and the r's can be put in the form

riH
94 Linear Algebra Chap. 2

In vector form we can write At = t, where A is the 5-by-6 matrix

1 1 o\

-1 0-1 1

0-1 1

0-1 1

The vector equation shows that there is a linear relation between t and r,

namely, that t is a linear function of r. This implies that, the flows rk


if

are all multiplied by a number c, then so are the flows t


k . Similarly, if two
sets of flows rk and r'
k are superimposed, then the resulting external flows
are given by t = A(r + r'). This phenomenon is called the superposition
principle for linear systems. Of course, specifying the r's will completely
determine what the t's must be.

Turning the problem around, we can ask to what extent specifying


the external flows at the joints will specify the flows rk in the pipes. In
particular, we can try specifying that the exterior flow tk at each joint
should be zero. This leads to the system of five equations in six unknowns

/i + r2 =0
— »i — r« + r5 =0
—r 2 + r3 =0
r6 + r< = 0.

We can let r 6 = a be any number; then we get r 5 = —a from the last

equation. Now let r 4 = b be any number. The remaining four equations


are easily solved in terms of a and b, and we find the solution

r = {—a — b, a + b, a + b, b, —a, a).

A simple check shows that these assignments for r x through r 6 will satisfy the
above system. It follows that there are infinitely many different pipe flows
that will produce zero external flow at each joint. Similarly, if we have
a solution r of the original system (4) for some given vector t on the right
side, then by the superposition principle any of the solutions r can be
added to r to give a new solution r + r; we have

A(t + r) = Ar + At
= +t= t.

Example 5. The derivation of Simpson's rule for approximate in-

tegration is based on the requirement that it should give exact results


when applied to quadratic polynomials. The rule gives an approximation
/

Sec. 2 Some Applications 95

to the integral of a function over an interval (a — //, a + h) in terms of the


value of the function at the end-points and mid-point of the interval, in the
form

I
"
'f(x) dx ^ uf(a - h) -:-
if (a) + wf(a + h)
Ja-h
where it, v, and w are constants. If the formula is to be correct for all
polynomials of degree less than or equal to 2, it must in particular be
correct for the polynomials f (x) = U /i(-v ) = fix) = x 2 Each
x, ar>d .

of these gives an equation for u, v, and w. For instance, for/ (.v) = we 1

have
l+
f (x) dx = 2h and fQ (a - h) =f (a) =/„(a + h)
I -A
so u +v+ u' = 2//. Using /^.v) = .r and 2 (.\)
= .y
2
similarly gives the
equations
(a — h)u + av + (a + h)w = 2ah
and
(a 2 - h2 - 2ah)u + a2 v + (a 2 + h2 + 2a/2)tv = 2a 2 /? + f/?
3
.

These equations are easily solved (see Exercise 13) and give the result
u = vv = \h, v = ^h.
We have obtained a rule which is correct for the particular polynomials
f ,fi, and/ Its correctness for any quadratic polynomial follows readily.
2 .

Let us write E(f) for the error committed when the rule is applied to a
general function/, so
a+k
E(f) - \
Ja—h
fW ^ - \hf(a - h) - if (a) - \f{a + h).

It is easy to see that £ is a linear function from the vector space of poly-
nomials to JR.
1
, and of course E(f )
E{f^) E(f2 ) = = = by construction.
If/is any quadratic polynomial, so f(x) px 2 qx = + + r, then/= pf2 +
qf x
-^
f and by linearity

£(/) = P E(f + qE(f) +


2) /-£(/„) - 0.

The same method can be used to derive a wide variety of formulas


useful in numerical analysis. (See Exercise 14.)

EXERCISES
Figure 1(b) shows an electrical network with the resistance (in ohms) of each
edge marked on Suppose an external power supply maintains node A at
it.

a potential of 10 voltsand node B at a potential of 4 volts. Following the


procedure of Example 1 in the text, set up equations for the potential at
the other nodes and solve them. From the results, calculate the current
flowing into the network at A.
96 Linear Algebra Chap. 2

2. The edges and vertices of a 3-dimensional cube form a network with 8 nodes
and 12 edges. Suppose each edge is a wire of resistance ohm, and that two 1

of the vertices have external connections that maintain one of them at a


potential of and the other at a potential ofO. Find the value of the potential
1

at the other vertices, and the current flowing in the external connections if
(a) the two vertices with external connections are at opposite corners of the
cube
(b) they are at the two ends of an edge
(c) they are at opposite corners of a face of the cube.

3. (a) Suppose that three forces acting at the origin in ft 3 have the same
directions as (1, 0, 0), (1,1, 0), and (1,1, 1). Find magnitudes for the
forces acting in these directions so that the resultant force vector will be
(-1,2,4).
(b) Can any force vector be the resultant of forces acting in directions
specified in part (a)?

4. Show that, if three linearly independent vectors 3


in :R are used to specify
the directions of three forces, then magnitudes can be assigned to these forces
to make the resultant force equal to any given vector.

5. Ifforcesactin Jl 2 in the directions of (2, 1), (2, 2), and (-3, -1), show that
magnitudes can be assigned so that the system is in equilibrium.

6. Suppose that a random walk traverses the paths shown in Fig. 3(a). What is

the probability p k k = 1, 2, 3, that a walk starting


, at a k goes to o4
without passing through o 5 ?

7. (a) Suppose that a particle traces a random walk on the paths shown in
Fig. 3(b). Letting pk be the probability of going from b k to b 6 without
going through b b fmd ,
pk for k = 1,2, 3, 4.
(b) How is the result of part (a) modified if b t and the path leading to it are
eliminated altogether?
(c) How is the result of part (a) modified if a new path is introduced between
6 4 and A 6 ?

8. Suppose that various mixtures can be made of substances S lt S2 S3 54 , ,

having densities 2, 17, 3, and 1 respectively, measured in grams per cubic


centimeter. Suppose also that the price of each substance in cents per gram
is 4, 51, 3 and 1 respectively. Is it possible to make a mixture weighing 10

grams, with a volume of 20 cubic centimeters and costing dollar? 1

is such that/(ej) = (1,2, —1),


3 3
9. Suppose that a linear function/from 3l to Jl
/(e 2 )= (0, 1, 1), and/(e 3 ) = (1,2, 1). Find a vector xlt x 2 x 3 such that ,

f(\ k ) = e k , for k = 1,2, 3.

10. Let /be the linear function from :R


3
to R3 with matrix

\0 1 3/

Show that the points x in X3 such that/(x) = all lie on a line.


Sec. 3 Theory of Linear Equations 97

11. (a) Suppose the vector t = (fj, t 2 , t 3 , f4 , t 5 ) in Fig. 4(a) is specified to be


t = ( — 1, 0, 1, 2, 1). Find a vector r that determines consistent internal
flow rates,
(b) Verify that the vector r = (-a - b, a + b, a + b, b, —a, 0), for any
a and b, is consistent with external flow t = in Fig. 4(a).

12. (a) Let the external flow vector in Fig. 4(b) be given by t = (1, 1, 2, 4).
Show that there is more than one consistent internal flow vector r, and
find two of them,
(b) Let the external flow vector in Fig. 4(b) be given by t = (1,0, 1, 1).

Show that there is no consistent internal flow vector.

13. Carry out the solution of the equations for u, v, w given in Example 5
of the text. [Suggestion: begin by subtracting a times the first equation
from the second and a 2 times the first equation from the third.]

14. By using the method of Example 5, find constants /, u, v, w such that


the formula

a+Sh
f(x) dx = tf(a) + uf{a + h) + vf(a + 2A) + wf(a + 3/0
I
Ja

is exact whenever / is a polynomial of degree less than or equal to 3.

SECTION 3

In this section we are concerned with formalizing the method of solving THEORY OF LINEAR
linear equations that has been illustrated in the two preceding sections, EQUATIONS
and in proving that it always works. We begin by giving a precise definition
for the "noninterference" property discussed informally after Example 3

of Section 1. At that point we expressed the idea by saying that each


equation of a system contained a variable that did not occur in any other
equation. The following definitions express this "noninterference" prop-
erty in terms of the coefficient matrix.
An entry in a matrix is called a leading entry if it is the first nonzero
entry in its row. Thus in

3
Linear Algebra Chap. 2

3.1 (a) Every column containing a leading entry is zero except for the
leading entry.
(b) Every leading entry is 1.

In discussing reduced matrices it is frequently necessary to distinguish


the columns that contain leading entries from those that do not. (Of
course a row contains a leading entry if and only if it is nonzero.) We
shall say that the columns that do contain leading entries are pivotal. In
a reduced matrix, each leading entry belongs to one nonzero row and one
pivotal column; we shall say that the row and column are associated.
This gives a one-to-one correspondence between nonzero rows and
pivotal columns, and it establishes the following fact which will be
referred to later.

3.2 In a reduced matrix, the number of pivotal columns equals the


number of nonzero rows.

Example 1. Consider the matrices

1
Sec. 3 Theory of Linear Equations 99

3.3 Theorem

Suppose A is a reduced matrix. If A has a row of zeros for which


the corresponding entry in b is not zero, then the equation Ax = b
has no solution. Otherwise it has at least one solution.

Proof. If any row of A is zero, then the corresponding entry in Ax


will be zero, whatever x may be. Thus if the corresponding entry in
b is not zero, there can be no solution.
the other half of the theorem, we assume that every zero
To prove
row of A (if there is any) is matched by a zero entry in b, and we
show how to write down a vector x that is a solution. The vector x
must of course have entries xx xn corresponding to the n
columns of A, while b has entries b u b m corresponding to the . . . ,

m rows of A. If they'th column of A is nonpivotal set x, = 0. If


they'th column of A is pivotal, let i be the number of the associated
row, and set x, = b We claim that the vector x constructed in this
t
.

n
way satisfies Ax = b. The /th entry in the product is 2 a ikxk- We
a:=i

must show that it equals b It is zero if the rth row of A is zero,


t
.

and by assumption b is then also zero. If the /th row is not zero,
t

then a ti = 1, where y is the associated pivotal column. By construc-


tion, Xj — b so a u Xj = b
t ,
The terms a ikxk with k =£j are all zero
t
.

because a ik — if the &th column is pivotal (because A is reduced)

and xk = if the &th column is not pivotal (by construction).

Example 2. Consider

W 1 5 0\

= .12
3 0.
A , bx = ^ ,
b2 =

^0 1,

The third row of A is zero and the third entry of \i x is not; so according to
Theorem 3.3 the equation Ax = b x has no solution. On the other hand,
the third entry of b 2 is zero. The pivotal columns of A are the first, third,
and fifth, with associated rows the second, first, and fourth. The proof
of the theorem shows that, if we construct
100 Linear Algebra Chap. 2

by making the first, third, and fifth entries equal to the second, first, and
fourth entries of b 2 and making the other rows zero, then Ax will be equal
,

to b 2 This is easily verified by doing the matrix multiplication.


.

The next problem is how to tell whether an equation that has a solution
has more than one. Theorem 3.4 and its corollary show that the question
can be reduced to a special case, and theorem 3.5 deals with this special
case.

3.4 Theorem

Suppose x is a solution of Ax = b. Then Xj is also a solution if

and only if Xj — x is a solution of Ax = 0.

Proof. We have A{x x — x ) = Ax — Ax =x b — b = 0. Con-


versely, if A{x x — x ) = 0, then Ax = Ax = b.
x

The equation Ax = is often called the homogeneous equation associ-


ated with Ax = Observe that the homogeneous equation always has
b.

at least one solution, namely, x = 0. From 3.4 we immediately obtain


the following.

Corollary

Suppose Ax = b has at least one solution. Then it has exactly one


solution if and only if x = is the only solution of the associated

homogeneous equation.

3.5 Theorem

Suppose A is reduced. If every column of A is pivotal, then x =


is the only solution of the homogeneous equation Ax = 0. Other-
wise the equation has solutions with x^O.

Proof. Suppose every column of A is pivotal. Let / be the number of


the row associated with column j. Then a u = 1 and a ik = for
k ^ j (because A is reduced and all columns are pivotal). Thus the
n
/th entry in Ax is ^ fl,*** = xt \ so if Ax = 0, xs = for ally.

Conversely, if r is the number of a nonpivotal column, we can


construct a nonzero solution of Ax — as follows. Take xr = 1
Sec. 3 Theory of Linear Equations 101

(which guarantees x # 0) and take x, =


is the number of any ifj
other nonpivotal column of A. column is pivotal, take
If the y'th

Xj = —a ir where i is the number of the row associated with column


/. As in the proof of Theorem 3.4, we look at the product Ax a row
at a time. Zero rows of A, of course, give zero entries in the product.
n
If the /th row of A is nonzero, we get the sum ^ a ,k xk- If k is tne

number of any pivotal column except the yth (where column j is


associated with row /"), we have a, k =
0. If it is the number of any

nonpivotal column except the rth, we have xk = 0. Thus the sum


reduces to a^Xj + x ir x T = 1 • (— a iT) + a ir •
1 = 0, as required.

Example Consider the homogeneous equation Ax = 0, where A is


3.

the matrix of Example 2. The second and fourth columns of A are


nonpivotal. The construction given in the proof of Theorem 3.5 can thus
be applied to give a solution with x 2 = 1, x 4 — 0, and one with x4 = 1,
x 9 = 0. The vectors obtained are

and

Any combination ry + sz is also a solution. (Why?) Using Theorem 3.4


to combine this information with the result of Example 2, we see that all
vectors of the form
'
-2

+ s

are solutions of

We have answered most of the questions about solutions of linear


systems that have a reduced coefficient matrix. If we can convert any
Linear Algebra Chap. 2

given system into a reduced one by elementary operations, then we have a


general method of finding solutions of linear systems. The examples of
Section 1 illustrate a reduction process that can in fact be applied success-
fully to any matrix.
Suppose a matrix is not reduced. Then there must be some column
containing a leading entry such that either 3.1(a) or 3.1(b) (or both) is

violated. If the column contains the leading entry r for the /th row,
_1
multiplying the /th row by r will make the leading entry 1. (Since r was

a leading entry, it could not be zero. Of course it might be 1 to begin with,


and the multiplication would be unnecessary.) If any other entries in the
column are nonzero, they can be made zero by adding suitable multiples
of the /th row to the rows they are in. Any column that was "correct"
before these operations must have a zero for its /th entry, and therefore
is unaltered by them. We have just described a process that can be applied
to any unreduced matrix and that increases the number of columns that
satisfy the Conditions 3.1. If the resulting matrix is not reduced, the
process can be repeated. A reduced matrix will be obtained within at
most n steps, where n is the number of columns in the matrix. We have
proved

3.6 Theorem

Given any matrix A, a sequence of elementary operations can be


found which converts A to a reduced matrix.

Example 4. We shall apply the method given in the proof of Theorem


3.6 to reduce the matrix

-2

Column 1 does not satisfy 3.1(a), but has a leading entry of in the first 1

row; so no elementary multiplication is necessary. Subtracting 2 times


row 1 from row 2, and adding row to row 3, clears the other two entries
1

to zero and gives

1 3 -2 0\

Column 2 does not contain a leading entry. The 4 in column 3 can be


Sec. 3 Theory of Linear Equations 103

converted to a 1 by multiplying the second row by \, which gives

3-2 0\
(1
1

Adding 2 times row 2 to row 1 and (—2) times row 2 to row 3 clears out
the other entries in column 3 to give

Multiplying row 3 by § and next adding h times row 3 to row 1 and \


times row 3 to row 2 gives the reduced matrix

as the final result.


In working this example we used the standard procedure given in the
proof of 3.6. This of course is not the only sequence of steps that will
give a reduced matrix, and a different sequence may require less arithmetic.

For instance, consider the matrix A 2 Adding row 3 to row


. 1 and subtract-
ing 2 times row 3 from row 2 gives

1
104 Linear Algebra Chap. 2

We now turn to the special case of systems of/? equations in n unknowns,


that is, systems with square coefficient matrices. The following theorem is

the one most important in applications.

3.7 Theorem

If A is a square matrix and the only solution of the homogeneous


equation Ax = is x = 0, then the equation Ax = b has exactly
one solution for every vector b.

Proof. By Theorem 3.6 we can convert A to a reduced matrix C and,


using the same elementary operations, convert b to a vector d so that
Cx = has the same solution set as Ax = and Cx = d has the
same solution set as Ax = b. The hypothesis therefore implies that
Cx = has no solution except x = 0. Since C is reduced, it follows
from Theorem 3.5 that every column of C is pivotal. The number of
nonzero rows of C is equal to the number of pivotal columns.
Since C is square, there can be no zero rows, therefore, by Theorem
3.3 Cx = d (and hence Ax = b) has at least one solution. That
there is exactly one solution follows from the corollary of Theorem
3.4.

We can now also prove the converse of Theorem 1.3, namely:

3.8 Theorem

If A is invertible, then it can be converted to / by elementary


operations.

Proof. If there is an x ^ with Ax = 0, A cannot have an inverse,


since then Ax = would imply A~~ x Ax = A~*0, i.e., x = 0, which
would be a contradiction. Otherwise A can be converted to a reduced
matrix C, all of whose columns are pivotal. Every column of C then
contains one and has all other entries 0, and the l's in different
1

columns belong to different rows. Appropriately changing the order


of the rows (which can be done by elementary transpositions) will
give the identity matrix. Thus A can be converted to / by elementary
operations.

We thus see that it is sensible to apply the method used for computing

inverses given by Theorem 1.3 to any square matrix A. If the method


succeeds, it shows that the matrix is invertible and computes A' 1 If the .
Sec. 3 Theory of Linear Equations 105

method fails, then one obtains a reduced matrix with at least one non-
pivotal column and, hence, by Theorem 3.5, a nonzero x such that Ax = 0,
which demonstrates that A is not invertible.
We conclude by observing that every elementary operation, when
applied to a column vector in %
n
is a linear function, called an elementary
,

transformation. Indeed, it is obvious that (a) multiplication of a co-


ordinate by r 7^ 0, (b) addition of a multiple of one coordinate to another,
and (c) transposition of two coordinates all have the properties of linearity.
It follows that each of these operations can be performed by multiplying

on the left by some n-by-n elementary matrix. (The precise forms of such
matrices are described in Exercise 7 at the end of this section.) This fact
enables us to prove the next theorem.

3.9 Theorem

An invertible matrix A can be written as a product of elementary


matrices. It follows that the linear function defined by

f(x) = Ax
can be expressed as a composition of elementary transformations.

Proof. By Theorem 3.8, the matrix A can be reduced to / by apply-


ing a product
Q = A1 ...Ak
of elementary matrices to A. Thus QA = I. But since each ele-
1
mentary operation is reversible, each matrix A t
has an inverse Aj .

Then Q has an inverse


0- 1 = A? . . . A?,
and so A = Q~H = A^ 1
. . . A^ 1
. This expresses A as the desired
product.

EXERCISES
1. Determine which of the following matrices are reduced. For those that are
not, state exactly how they violate the conditions. For those that are
reduced, list the pivotal columns and their associated rows.

1
Linear Algebra Chap. 2

2. For each matrix of Exercise 1 which is not already reduced, find an


elementary operation which changes it to a reduced matrix.

3. Let

For each of the equations Ax = r, Ax = s, Ax t, where A is the matrix of

Problem 1, determine whether the equation has no solutions, exactly one


solution, or more than one solution. If there is one solution, give it; if there
are more than one, give two different solutions.

4. Repeat Problem 3 using the matrices B, C, and D instead of A from


Problem 1. (Remember to get the coefficient matrix in reduced form.)

5. Show that if a square «-by-« matrix is reduced and has no all-zero row, then
every row and column contains n — 1 zeros and one 1. Hence show that it

can be converted to an identity matrix by elementary transpositions.

6. Determine the solution sets of the following systems of equations.

[Am. r(l, -2, 1) + (0, -9, ¥)•]

(a) Denote by D t
(r) a matrix which is the same as the identity matrix
except for having r in place of 1 in the /th diagonal entry. Show that the
Sec. 3 Theory of Linear Equations 107

matrix product
D (r)M
f

is the elementary modification of M that results from multiplying the


/th row of M by r.

(b) Denote a matrix with 1 for its //th entry and 0's elsewhere by Eu . For
example, for 3-by-3 matrices,

£,,= 10001 and E,

Show that, if M is an m-by-n matrix and Eti


and / are both m-by-m,
then
(/ + rEu)M
is the elementary modification of M which results from adding r times
they'th row to the /th row.
(c) Denote by Tti
the matrix obtained from / by exchanging the /th andy'th
rows. Show that exchanging the /'th andy'th rows of M gives
TijM.

(d) Any matrix of the form (/ + r2s#), Dfy), or T^ is called an elementary


matrix. Show that each of the three types of elementary matrices is

invertible by verifying that, for /'


y^ j,

(I + rEvT1 = (/ - rE ),
i} D&T X
= />,(;] ,
and Tj = Tu .

8. Prove that if there are more unknowns than there are equations in a linear
system, then the system has either no solutions or infinitely many.

9. Suppose A is a square matrix. Prove that A is invertible if and only if every


matrix equation Ax = b has at least one solution.

10. Let p(x) = a + a xx + + a n x n be a polynomial of degree < n. It is a


. . .

well-known theorem of algebra that if there are more than n values of x


which make p(x) = 0, then all of its coefficients are zero. Use this theorem
to show that if x x n are any n + 1 different numbers, and b
, . . . , bn , . . . ,

are any n + 1 numbers, then there is exactly one polynomial of degree < n
such that/?O ) = b ,p(x n ) = b n [Hint. Show that the problem leads
, . . . .

to a system of linear equations with a a n as unknowns.] , . . . ,

11. A reduced matrix in which the first pivotal column (starting from the left)
is associated with the first row, the second pivotal column is associated with

the second row, etc., is said to be in echelon form. Show that:

(a) Any reduced matrix can be put in echelon form by a sequence of


elementary transpositions.
(b) If a matrix is in echelon form, then the zero rows (if there are any)
come last.
108 Linear Algebra Chap. 2

(c) A square matrix in echelon form is either an identity matrix or has at


least one zero row.

SECTION 4

VECTOR SPACES, In the earlier parts of the book we have restricted ourselves to vectors in
SUBSPACES, n
'Ji . In this section we consider more general vector spaces, though some
DIMENSION
of the ideas have already been introduced with ii\
2
and Jl 3 as the main
examples.
Recall that a vector x is a linear combination of the vectors xXl . . . , xn
if there are numbers rlt . . . ,r„ such that

x = r1 x 1 + ..'.+ rn \ n .

Example 1. Let

l
= (1,0,0), x2 = (0,1,0), x3 = (0, 0, 1), k = 0,1,1).
Then y = (2, 2, 0) is a linear combination of x 1 and x 2 because it is

equal to 2x x+ 2x is a linear combination of x and x


2 ; because
it is 3 4 it

equal to 2x — 2x On the other hand, is not a linear combination of x


4 3. it 2

and x because rx + sx has a first entry of whatever the values of r and


3 2 3

s; so y = rx + sx is impossible. 2 3

Since = Oxj + . . . + 0x„, the zero vector is a linear combination of


any set of vectors. The linear combinations of a single vector x x are just
the numerical multiples rx r .

If a set of vectors lies in a plane through the origin in 3-space, then every
linear combination of them lies in the same plane —
x and y recall that if
are in a plane through the origin, the parallelogram rule makes x + y
a vector in the same plane. Any numerical multiple of x lies in the same
plane because it lies in the line containing x. Any linear combination of
x 1} . . . up by multiplications and additions, and if the vectors
, x„ is built
Xj, x„ lie in a plane, so do all linear combinations of them.
. . . ,

Similarly, if x x x n all lie in one line through the origin (so they
are all multiples of some one vector), any linear combination of them
lies in the same line.

These remarks suggest the following generalization which includes lines


and planes as special cases: A subset S of a vector space IT is called a
linear subspace (or, frequently, simply a subspace) if every linear com-
bination of elements of Sis also in S. We assume S is non-empty.

Example 2. We list some examples of subspaces.

(a) The set of all vectors in %n with first entry equal to is a subspace,
since any linear combination of such vectors will also have a for its first

entry.
.

Sec. 4 Vector Spaces, Subspaces, Dimension 109

(b)
C
For any vector space U,
C
U itself is a subspace. The term proper
subspace is often used to refer to those subspaces that are not the whole
space. In any vector space the zero vector forms a subspace all by itself.

This subspace is called the trivial subspace (because it is).

(c) For any linear function 17 — > ID, the set JC of vectors x in U
with/(x) = is a subspace of T) called the null space off. If x l5 . . . , x fc

k k
are in JV and x = 2 r t x i> tnen /( x ) = 2 r if( x i) = 0> because / is

linear and all the/(x,) are 0. Hence x is also in JV, and so JV is a subspace
C
of \J. In particular, the set of vectors (x, y, z) in 'A 3 such that

x + 2y + 3z = 0,
or equivalently,

(1

3
is a subspace because it is the null space of the linear function from 'Ji to
Jl defined by the preceding l-by-3 matrix.

(d) The range of a linear function / defined on a vector space is a


subspace of the range space off. The reason is that for yu , ys to be
in . .

the range off means that there are vectors \ lf x k in the domain of / . . . ,

such that/(Xj) = y for


;
/ = \, . . . , k. But then, by the linearity off, an
arbitrary linear combination of the y, has the form

riYi + • • • + rkyk = r,/(Xi) + • + rkf{x k


• •
)

=f(r1x 1 + . . + rk x k
. ).

Because the domain of/is a vector space, r1x1 + + rk x k . . . is in it, and so


''iVi + • • • + fkJk s n tne ran g e off.
' ' In particular, the set of all vectors in
3
'Jl of the form
4\ /1\ /4\

5 =x
)0
is a subspace because it is the range of the linear function just defined by
the above 3-by-2 matrix.

We define the span of a set S of vectors to be the set consisting of all

linear combinations of elements of S. It is easy to show that the span of


k

any set is a subspace. Suppose x = ^ rtx i ' s a linear combination of


i=l
.

110 Linear Algebra Chap. 2

n
vectors x x x which are in the span of S, that x = £ s a u 3 f° r
, . . . , fc ,

kin \
is,
n
t
3=1
some vectors u_, in S. Then x = ^ rA ^= s^uA = £ ^, where /
;
=
k = i l \ 3 1 / = 3 1

2 fYfy- This shows that x is itself in the span of S.


i=i
We recall from Section 1 of Chapter 1 the general definition of a vector
space as a set 17 of elements with operations of addition and numerical
multiplication such that for any elements x, y, z of 1J and real numbers
r, s:

1 rx+ sx = (r + s)x.
2. rx + ry = r(x -f y).

3. r(sx) = (rs)x.

4. x + y = y + x.

5. X + y) + z = x + (y +
( z).

6. There exists an element in U such that x + = x.

7. For any x in U, x + (— l)x = 0.

Now if S is any subspace of a vector space D, then the operations of


addition and numerical multiplication, as given for 1), always yield a
result in S when they are applied to elements of S, because x + y =
lx + ly and rx = rx + Oy are linear combinations of x and y. The laws
1 through 5 certainly hold for x, y, z in S, since they hold for all elements
of IT. The zero vector belongs to S, since Ox for any x in S; also, =
if x is in S, then so is (— l)x = — x. Thus laws 6 and 7 hold as well. In

other words, we have proved the next theorem.

4.1 Theorem

Any subspace of a vector space is a vector space, with the operations


inherited from the original space.

While subspaces of :<{" provide important examples of vector spaces,


there are many others. We list some below.

c
Example 3. (a) Let l) consist of all continuous real-valued functions
of a Define/ + g and rf in the obvious way as the functions
real variable.
whose values for any number x are/(x) + g(x) and rf(x), respectively.
(Of course, we are using the theorems that/+ g and //are continuous if
Sec. 4 Vector Spaces, Subspaces, Dimension 111

/ and g are.) It is easy to verify that the laws for a vector space are
satisfied.

C
(b) Let P be the subspace of U consisting of all polynomials, i.e., all

functions /that can be expressed by a formula/(x) = a + ax x + . . . +


ak x k for some constants a , . . . , ak . (What needs to be checked to verify
that this is a subspace?)

(c) Let Pn be the subspace of polynomials of degree less than or equal


to n, i.e., those that require no power of x higher than the «th for their
expression. For k < n, Pk is a subspace of P n and all P n are subspaces ,

of P.

C a) be the vector space of real-valued functions f(x) whose


(d) Let
first k derivatives are continuous for all real x. Then C (fc4_1) is a subspace of
(co)
C If we denote by C
(fc)
. the vector space of functions having derivatives
,

of all orders, then C (00)


is a subspace of C for every k. (fc)

The description of lines, planes, and ordinary space as 1-, 2-, and 3-

dimensional is familiar. It is possible to define the dimension of any vector


space. The examples of lines and planes suggest that the span of A: vectors
should have dimension k. This is not quite right, since, for example, the
span of two vectors that happen to lie in the same line will be only a line
instead of a plane. To handle the question properly requires the concept of
linear independence introduced in Chapter 1. We recall the following
definition.
A set of vectors {x l5 . . . , xk} is (linearly) independent if the only set of
numbers r1} . . . , rk such that r^ + . . . + rk x k = is the set r t = r2 =
. . . = rk = 0. A set of vectors is (linearly) dependent if it is not inde-
pendent. For {x l5 . . . , xk } to be dependent therefore means that there are
numbers rx , . . . ,rk not all zero, such that r 1 x 1 + . . . + rk x k = 0.

Example 4. (a) The four vectors x = (2, 0, 0), x = (0, —2, 0), 1 2

x3 = (0, 0, 3), x = (2, —2, 3) are linearly dependent since x + x +


4 x 2

x — x = 0. The set of three vectors x


3 4 x x independent since x , 2, 3 is

rx + sx + rx =
x only if r — s = = 0.
2 3 t

(b) A set of two vectors x, y independent only if neither is a numer- is

ical multiple of the other. For example, if y = 3x, then 3x — y = 0, and

the vectors are dependent.

A set of vectors which is linearly independent and spans a space TJ


C
is called a basis for U.

Example 5. (a) The natural basis vectors e x , . . . , e„, where e, has 1

for its z'th entry and for all other entries, form a basis for 3\". Verification
that the e, are in fact linearly independent and span Jl" is left to the reader.
112 Linear Algebra Chap. 2

(b) In the space P„ of Example 3(c), the polynomials x —l,x1 =


x, . . . , xn = x" form a basis. Obviously, if f(x) = a + axx -f- + . . .

a n x" is a polynomial of degree less than or equal to n, it is the linear


combination o x + . . . + a,fx n of the x's. If a x + . . . + a nx n is the
zero function, then (since a polynomial of degree less than or equal to n
cannot have more than n roots unless its coefficients are all zero) a =
a1 = ... = an = 0.
While it is true that every vector space has a basis, we shall usually
consider only those which have a basis with a finite number of elements.
The next theorem implies that if a space is spanned by a finite set (which is

perhaps not linearly independent), then it has a finite basis.

4.2 Theorem

Let °l) be the span of the vectors x x , . . . , xn . Either 17 consists of


the zero vector alone, or some subset of the x's is a basis for 17.

Proof. If the set x x , . . . , x„ is independent, then it is itself a basis


for 1). Otherwise some relation rx x x + . . . + rnx n = holds,
with at least one r, say r k , different from 0. Then we can divide by
rk and obtain x fc
= — (r x //j.)xi
— ... — (r n /rk )x n where x k does not ,

appear on the right side. By substituting the right side for x A in any .

linear combination of all the x's, a linear combination is obtained


that does not involve x k . In other words, x fc
can be dropped and the
span of the remaining vectors will still be all of °0. If the resulting
subset is not independent, the process can be repeated. It must end
in a finite number of steps either because a basis has been obtained
or because all the vectors have been discarded. The latter is possible
only if the space contains only the zero vector.

The following theorem is the most important step in developing the


theory of dimension.

4.3 Theorem

If a vector space is spanned by n vectors xlf . . .


, x„, and y x , . . .
, yk
is a linearly independent set of k vectors in the space, then k < n.

Proof. The statement is easy to prove if n = 1. In this case, the space spanned by x_1 consists of all numerical multiples of x_1. Then for any two vectors y_1 and y_2 in the space, one must be a multiple of the other, and so y_1 and y_2 cannot be independent. We proceed by induction and assume that, in any space spanned by n - 1 vectors, no independent subset can have more than n - 1 vectors in it.

Suppose we now are given a space spanned by n vectors x_1, ..., x_n and containing k vectors y_1, ..., y_k with k > n. If we can show that the y's are dependent, then we will have shown that the statement of the theorem holds for a spanning set of n elements, and the inductive proof will be complete. Each of the vectors y_1, ..., y_{n+1} can be written as a linear combination of x_1, ..., x_n, which means that there are numbers a_{ij} such that

y_i = a_{i1} x_1 + ... + a_{in} x_n,   for i = 1, ..., n + 1.

If the n + 1 numbers a_{i1} are all zero, then the y's all lie in the space spanned by the n - 1 vectors x_2, ..., x_n; thus by the inductive assumption the y's would then be dependent, as we want to show. Otherwise (by renumbering, if necessary) we may suppose that a_{11} is not zero. Then define n vectors z_2, ..., z_{n+1} by setting z_i = y_i - (a_{i1}/a_{11}) y_1. By using the equations giving y_i in terms of the x's, it is easy to see that

z_i = (a_{i2} - (a_{i1}/a_{11}) a_{12}) x_2 + ... + (a_{in} - (a_{i1}/a_{11}) a_{1n}) x_n,

so that the z's are linear combinations of the n - 1 vectors x_2, ..., x_n. By the inductive assumption, there are numbers r_2, ..., r_{n+1}, not all zero, such that r_2 z_2 + ... + r_{n+1} z_{n+1} = 0. Using the definition of the z's in terms of the y's, this last relation becomes

-(1/a_{11})(r_2 a_{21} + ... + r_{n+1} a_{n+1,1}) y_1 + r_2 y_2 + ... + r_{n+1} y_{n+1} = 0.

But since not all the r's are zero, this implies that y_1, ..., y_{n+1} are dependent, as we wanted to show.

4.4 Theorem

Let V be a vector space with a basis of n elements. Then every basis for V has n elements.

Proof. Let {x_1, ..., x_n} and {y_1, ..., y_k} be two bases for V. Since both sets are independent, and both are spanning sets, Theorem 4.3 implies that k ≤ n and n ≤ k; hence k = n.

The dimension of a vector space that has a finite spanning set is the number of elements in any basis for the space. (The dimension of the space consisting of the zero vector alone is defined to be 0.) We write dim(V) for the dimension of the vector space V. Note that Theorem 4.2 guarantees the existence of a basis, and that Theorem 4.4 guarantees that the dimension does not depend on which basis is taken.

Example 6. (a) By Example 5(a), dim(R^n) = n.

(b) By Example 5(b), dim(P_n) = n + 1.

(c) The space P of all polynomials (Example 4(b)) does not have any finite spanning set. If it did have one with k elements, then the fact that 1, x, x^2, ..., x^k are k + 1 linearly independent elements of P would contradict Theorem 4.3.

A vector space with a finite basis is said to be finite-dimensional. As


we have just seen in Example 6(c), there are spaces which are not finite-

dimensional.
Theorem 4.2 asserts that we can get a basis from a finite spanning set by
deleting some of its members. The next theorem shows that, in a finite-
dimensional space, we can get a basis from a linearly independent set by
adding vectors to it.

4.5 Theorem

Let S = {x_1, ..., x_k} be a linearly independent set in a vector space V. If S is not already a basis, either it can be extended to a finite basis for V, or else it can be extended to an infinite sequence of independent vectors, and V is not finite-dimensional.

Proof. Suppose x_1, ..., x_k are linearly independent but do not span all of V. Then there is some vector y that is not a linear combination of x_1, ..., x_k. Take x_{k+1} = y. We claim that the set x_1, ..., x_k, x_{k+1} is linearly independent. Suppose r_1 x_1 + ... + r_{k+1} x_{k+1} = 0. We must show that all the r's are 0. If r_{k+1} were not 0, we could write x_{k+1} = -(r_1/r_{k+1})x_1 - ... - (r_k/r_{k+1})x_k, which is impossible because x_{k+1} is not a linear combination of the other x's. Therefore we have r_{k+1} = 0 and r_1 x_1 + ... + r_k x_k = 0. Since x_1, ..., x_k are independent, the last equation implies r_1 = ... = r_k = 0. In other words, if a linearly independent set does not span a space, then a vector can be added to it so that the resulting set is also independent. This process of adding vectors can be repeated unless a spanning set is reached. If a spanning set is reached, then it is a basis and V is finite-dimensional. Otherwise an arbitrarily large independent set can be found, and V cannot be finite-dimensional. This completes the proof of the theorem.

4.6 Theorem

If S is a subspace of a finite-dimensional space V with dim(V) = n, then S is finite-dimensional and dim(S) ≤ n. Any basis for S can be extended to a basis for V, and if dim(S) = n, then S = V.

Proof. If S consists of 0 alone it has dimension 0. Otherwise start with a nonzero vector x_1 in S and apply Theorem 4.5. Since no independent subset of V can contain more than n elements, the construction must end with a finite basis for S. This basis for S is a linearly independent subset of V and can if necessary be extended (Theorem 4.5 again) to a basis for V. Thus we have a basis for S that is a subset of a basis for V. If dim(S) = n, the subset must be the whole set; so the same set is a basis for both S and V, and S = V.

The following theorem states an important property of linear functions.

4.7 Theorem

Let f be a linear function defined on a finite-dimensional vector space. Then

dim(null space of f) + dim(range of f) = dim(domain of f).

Proof. As with any theorem about dimension, the trick to proving this is to find suitable bases for the spaces involved. Let us write V for the domain of f, N for its null space, and W for its range. Let v_1, ..., v_k be a basis for N, and extend it to a basis v_1, ..., v_k, u_1, ..., u_r for V. (Theorem 4.6 guarantees the possibility of this construction.) Then dim(N) = k and dim(V) = k + r. Let w_1 = f(u_1), ..., w_r = f(u_r). We claim that w_1, ..., w_r is a basis for W, which implies that dim(W) = r and proves the theorem. It is obvious that the vectors w_1, ..., w_r span W, for if y = f(x) is any vector in the range of f we may write

x = a_1 v_1 + ... + a_k v_k + b_1 u_1 + ... + b_r u_r,

and then

y = a_1 f(v_1) + ... + a_k f(v_k) + b_1 f(u_1) + ... + b_r f(u_r) = 0 + b_1 w_1 + ... + b_r w_r,

which shows that y is a linear combination of the w's. It remains to be shown that w_1, ..., w_r are linearly independent. Suppose b_1 w_1 + ... + b_r w_r = 0. This means that the vector b_1 u_1 + ... + b_r u_r is in the null space of f and is therefore equal to some linear combination a_1 v_1 + ... + a_k v_k; so

a_1 v_1 + ... + a_k v_k - b_1 u_1 - ... - b_r u_r = 0.

Since v_1, ..., v_k, u_1, ..., u_r are linearly independent, this is possible only if all the b's (and all the a's) are zero, which shows that the w's are linearly independent.

Example 7. Suppose m < n, and define f from R^n to R^m by letting f(x) be the m-dimensional column vector consisting of the first m entries of x. The range of f is all of R^m, and the null space, of dimension n - m, consists of the vectors in R^n whose first m components are all 0.
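Theorem 4.7 is easy to illustrate numerically for a function of the form f(x) = Ax. A brief sketch, assuming the SymPy library and a sample matrix chosen only for illustration:

```python
# A sketch: check dim(null space) + dim(range) = dim(domain) for f(x) = Ax.
from sympy import Matrix

A = Matrix([[1, 0, 2, -1],    # sample 3-by-4 matrix (an assumption,
            [0, 1, 1,  1],    # not one of the book's examples)
            [1, 1, 3,  0]])   # third row = first row + second row

rank = A.rank()               # dimension of the range of f
nullity = len(A.nullspace())  # dimension of the null space of f

print(rank, nullity, A.cols)  # 2 + 2 = 4, the dimension of the domain R^4
```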

The definitions of line and plane can be unified in terms of subspaces of R^n. Because a subspace always contains the zero vector, we define a line through the origin to be a 1-dimensional subspace and a plane through the origin to be a 2-dimensional subspace. Examples in R^3 are shown in Fig. 5(a). A line or a plane that does not pass through the origin can be obtained from one that does by a translation, that is, by adding a fixed vector to every vector in the subspace. We say that a subset S of a vector space is an affine subspace if it can be expressed in the form

V + b,

where V is a linear subspace and b is a single vector. The dimension of an affine subspace is of course defined to be the dimension of V, and a 1-dimensional affine subspace is usually called a line, while a 2-dimensional affine subspace is called a plane. Examples in R^3 are shown in Fig. 5(b). Two affine subspaces are parallel if they are translates of one another.

Two of the most important ways of describing subspaces, namely, as range or null space of a linear function, provide standard ways of describing lines and planes. The parametric representation of planes discussed in Section 2 of Chapter 1 is obtained by starting with a linear function R^2 → R^3 having a 2-dimensional range with basis x_1, x_2. We write

L(u, v) = u x_1 + v x_2

and then form an affine function

A(u, v) = u x_1 + v x_2 + x_0

by adding a fixed vector x_0. The representation of a line takes a similar form,

A(t) = t x_1 + x_0,

in which the vectors t x_1 form a 1-dimensional subspace as the range of a linear function R^1 → R^3.

To represent a line or plane through the origin as the null space of a linear function is to consider all solution vectors x of a homogeneous equation

L(x) = 0,

where L is linear. An affine subspace is then obtained by adding a fixed vector x_0 to the result or, alternatively, by solving instead

L(x - x_0) = 0.

Because L is linear, this last equation can be written as a nonhomogeneous equation

L(x) = L(x_0).   (1)

Methods for solving such equations are discussed in the earlier sections of this chapter. If x = (x, y) is 2-dimensional, then Equation (1) would take the form

ax + by = c,

which is a familiar way to represent a line. If x = (x, y, z) is 3-dimensional, a single equation

a_1 x + b_1 y + c_1 z = d_1

in general has as solution a plane in R^3, while a system of two equations

a_1 x + b_1 y + c_1 z = d_1
a_2 x + b_2 y + c_2 z = d_2

has as solution the intersection of two planes, which is either a line or a plane.

Example 8. If we try to solve the equations

x - y + 2z = 1
x + y + 3z = 0

by row operations, we add the second equation to the first to get

2x + 5z = 1
x + y + 3z = 0.

Multiplying the first equation by 1/2 and subtracting it from the second gives

x + (5/2)z = 1/2
y + (1/2)z = -1/2.

We can represent the set of all solutions of this pair of equations parametrically as an affine subspace by setting z = t. We find

x = 1/2 - (5/2)t
y = -1/2 - (1/2)t
z = t.

Thus we have found a line of solutions; it can be described by starting with a line through the origin having the direction of x_1 = (-5/2, -1/2, 1), and then translating by adding the vector x_0 = (1/2, -1/2, 0). The result is shown in Fig. 6.

Figure 6

If we have a system of three equations, its solution set is the intersection of the solution sets of the three individual equations. The usual situation is that the three planes intersect in one point. Changing the right side of the equations shifts the planes parallel to themselves, and in this case they will always intersect in just one point. Another possibility is for all three planes to be parallel. The associated homogeneous system will have a two-dimensional solution set; the inhomogeneous system will also have a 2-dimensional solution set if the three planes happen to coincide. Otherwise there will be no solution. Another possibility (which has no analogue in two dimensions) is that no two of the planes are parallel, but that the planes representing the solutions of the homogeneous equations all pass through one line. Then (unless they too all pass through one line) each pair of the planes representing the solutions of the inhomogeneous equations will intersect in a line parallel to the third plane, and there will be no common solution for all three equations.
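The same parametric description of the solution set in Example 8 can be obtained with a computer algebra system; a sketch assuming the SymPy library:

```python
# A sketch: solve the pair of equations of Example 8 and read off
# the affine subspace of solutions.
from sympy import symbols, linsolve

x, y, z = symbols('x y z')
solution = linsolve([x - y + 2*z - 1, x + y + 3*z], x, y, z)
print(solution)
# {(1/2 - 5*z/2, -1/2 - z/2, z)}: a line through (1/2, -1/2, 0)
# with direction (-5/2, -1/2, 1), as found above.
```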



EXERCISES
1. Which of the following subsets of R^3 are subspaces? In each case either show that the subset is a subspace or find some linear combination of elements of the subset that is not in the subset.

(a) All vectors (x_1, x_2, x_3) with x_1 + x_2 = 0.
(b) All vectors with x_3 = 0.
(c) All vectors satisfying (a) and (b).
(d) All vectors satisfying (a) or (b).
(e) All vectors with x_1 = (x_2)^3.
(f) All vectors satisfying (a) and (e).

2. Let x_1 = (1, 2, 3), x_2 = (-1, 2, 1), x_3 = (1, 1, 1), and x_4 = (1, 1, 0).

(a) Show that x_1, x_2, x_3, x_4 is a linearly dependent set by solving an appropriate system of equations.
(b) Express x_1 as a linear combination of x_2, x_3, x_4 by a method similar to that used in part (a). [Ans. x_1 = (1/3)x_2 + (8/3)x_3 - (4/3)x_4.]

3. Let C[a, b] be the vector space of continuous real-valued functions defined on the interval [a, b]. Let C_0[a, b] be the set of functions f in C[a, b] such that f(a) = f(b) = 0.

(a) Show that C_0[a, b] is a proper subspace of C[a, b].
(b) What if the condition f(a) = f(b) = 0 is replaced by f(a) = 1, f(b) = 0?
4. Show that the intersection of two subspaces is always a subspace.

5. Part (d) of Exercise 1 shows that the union of two subspaces is not always a subspace. Show that the union of two subspaces is a subspace if and only if one of them is contained in the other.

6. Show that the range of a linear function may be a proper subspace of its range space.

7. Let f: R^n → R^m be the linear function defined by multiplication by the m-by-n matrix A. Show that its range is the span of the columns of A (considered as elements of R^m).

8. For any two subsets A and B of a vector space, let A + B be the set of all vectors that can be expressed as a sum a + b with a in A and b in B. Show that if A and B are subspaces then so is A + B.

9. Show that if 0 is the only element in the intersection of two subspaces S, T, then

dim(S + T) = dim(S) + dim(T).

[Hint. Show that a basis for S together with a basis for T gives a basis for S + T.]

10. Show that for any two subspaces S, T,

dim(S + T) = dim(S) + dim(T) - dim(S ∩ T).

[Hint. Start with a basis for S ∩ T and extend it (Theorem 4.6) to a basis for S and a basis for T.]

11. Let f: R^3 → R^2 be the linear function defined by the matrix

1 -3 2

Find a basis for the null space of f, and one for the range of f. Verify that Theorem 4.7 holds.

12. Describe the solution set of each of the equations or systems of equations below as an affine subspace, that is, as a translate by a specified vector of a linear subspace with a specified basis.

(a) x + y = 1.
(b) 2x + y = 1
    2x - 3y + z = 2.
(c) x + 2y + 3z = 10
    4x + 5y + 6z = 11
    7x + 8y + 9z = 12.   [Ans. t(1, -2, 1) + (0, -9, 28/3).]
(d) x - y + z = 2.
13. For each of the following sets of three equations in three unknowns,
determine which of the geometric possibilities discussed at the end of this
section hold.

(a) x + 2y

(b)

(c)

SECTION 5

LINEAR FUNCTIONS

In Chapter 1 we saw that matrices obey some of the same rules for addition and multiplication that ordinary numbers do. Furthermore, we have seen that given a linear function f from R^n to R^m, there is an m-by-n matrix A such that f(x) = Ax for all x in R^n. Using these facts we could prove that linear functions from R^n to R^m obey algebraic rules just as their matrices do. However, it turns out that these same rules apply to a wider class of linear functions and not just those representable by matrices. For this reason we shall prove the rules in general form and then apply them in Section 6 to a systematic analysis of some linear differential equations.

We begin by describing the operations of addition, scalar multiplication, and composition of functions. Let f and g be functions with the same domain and having the same vector space as range space. Then the function f + g is the sum of f and g defined by

(f + g)(x) = f(x) + g(x)

for all x in the domain of both f and g. Similarly, if r is a number, then rf is the numerical multiple of f by r and is defined by

(rf)(x) = r(f(x)).

We have already defined the composition of functions in Chapter 1, but we repeat the definition here. We require now that the range of f be contained in the domain of g. Then g∘f is the composition of f and g and is defined by

(g∘f)(x) = g(f(x)).

It is an easy exercise to prove that sums, numerical multiples, and compositions of linear functions are linear.

Example 1. Suppose f: R^2 → R^2 and g: R^2 → R^2 are given by

f(x, y) = (x + y, x - y),   with matrix   [1  1]
                                          [1 -1],
and

g(x, y) = (2x + y, x + 3y),   with matrix   [2 1]
                                            [1 3].

Then

(f + g)(x, y) = (3x + 2y, 2x + 2y),   with matrix   [3 2]
                                                    [2 2].

Also, 3f is given by

(3f)(x, y) = (3x + 3y, 3x - 3y),   with matrix   [3  3]
                                                 [3 -3].

Finally, g∘f is given, according to Theorem 4.4 of Chapter 1, by matrix multiplication as

[2 1] [1  1]   [3  1]
[1 3] [1 -1] = [4 -2],

so (g∘f)(x, y) = (3x + y, 4x - 2y).
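The matrix arithmetic in Example 1 can be checked numerically; a brief sketch assuming the NumPy library, with an arbitrary test vector:

```python
# A sketch: matrices add and compose the same way the functions do.
import numpy as np

A = np.array([[1, 1], [1, -1]])   # matrix of f (from Example 1)
B = np.array([[2, 1], [1, 3]])    # matrix of g

v = np.array([5.0, -2.0])         # arbitrary test vector

# f + g corresponds to A + B, and g o f corresponds to B A.
assert np.allclose((A + B) @ v, A @ v + B @ v)
assert np.allclose((B @ A) @ v, B @ (A @ v))
print(A + B)    # [[3 2] [2 2]], the matrix of f + g
print(B @ A)    # [[3 1] [4 -2]], the matrix of g o f
```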

Notice that in the above example the matrix of f + g is equal to the matrix of f plus the matrix of g, and that the matrix of 3f is 3 times the matrix of f. Of course we have already proved in Chapter 1, Theorem 4.5, that the matrix of the composition g∘f is the product of the matrices of g and f in the same order. We can state the general result as follows.

5.1 Theorem

Let f and g be linear functions from R^n to R^m with matrices A and B respectively. Then:

1. The matrix of f + g is A + B.
2. For any number r, the matrix of rf is rA.
3. If g∘f is defined, its matrix is BA.

Proof. To prove statement 1 we write

(f + g)(x) = f(x) + g(x) = Ax + Bx = (A + B)x.

The first equality holds by the definition of f + g, the second by the relation of matrix to function, and the last by the general rule AC + BC = (A + B)C for matrix multiplication and addition.

To prove statement 2 we write, similarly,

(rf)(x) = r(Ax) = (rA)x,

where the last equality follows from the rule B(AC) = (BA)C for matrix multiplication.

Statement 3 is just the result of Theorem 4.5, Chapter 1.

Example 2. Suppose f and g are linear functions such that

f(1, 0) = (2, 3),   f(0, 1) = (-1, 0),   g(1, 0) = (1, 0),   g(0, 1) = (0, -1).

Then the matrices of f and g are

[2 -1]        [1  0]
[3  0]  and   [0 -1]

respectively, by Theorem 4.2 of Chapter 1. By Theorem 5.1, the matrix of f + g is the sum of the matrices of f and g:

[2 -1]   [1  0]   [3 -1]
[3  0] + [0 -1] = [3 -1]

Thus f + g = h is a function such that h(1, 0) = (3, 3) and h(0, 1) = (-1, -1).

Similarly, the matrix of f - 2g is

[2 -1]     [1  0]   [0 -1]
[3  0] - 2 [0 -1] = [3  2]

Finally, the matrix of f∘g is the product

[2 -1] [1  0]   [2 1]
[3  0] [0 -1] = [3 0]
The close connection between combinations of matrices and combinations of functions from R^n to R^m suggests that the rules for matrix algebra should hold for operations with functions also. In fact, these rules hold for any linear functions for which the operations are defined.

5.2 Theorem

Let r be a number and let f, g, and h be linear functions for which both sides of the following equations are defined. Then

(1) (f + g)∘h = f∘h + g∘h
(2) f∘(g + h) = f∘g + f∘h
(3) (rf)∘g = r(f∘g) = f∘(rg)
(4) h∘(g∘f) = (h∘g)∘f.

Proof. These formulas are all proved by applying the function on either side of the equality to an arbitrary domain vector x, and then writing out the meaning of each side. We shall prove only Equation (4), leaving the others as exercises. To prove Equation (4) we observe that, by definition,

h∘(g∘f)(x) = h(g∘f(x)) = h(g(f(x))).

Similarly,

(h∘g)∘f(x) = h∘g(f(x)) = h(g(f(x))).

The results of the two computations are the same, so the formulas we started with must be the same. Since x is arbitrary, h∘(g∘f) = (h∘g)∘f.

Example 3. Let C^(∞)(R) be the vector space of infinitely often differentiable real-valued functions y of the real variable x. If we let D stand for differentiation with respect to x, then

Dy = y',

and D is a function with domain C^(∞)(R). The familiar rules about differentiation stating that (f + g)' = f' + g' and (cf)' = cf' imply that D is linear. Because the meaning will always be clear, we can omit the little circle in writing compositions, so that DD will stand for D∘D.

The compositions D^2 = DD, D^3 = DDD, etc., are all meaningful, and we have for example

D^2 y = y'',   D^3 y = y'''.

We can even define

D^0 y = y

for occasional convenience. Combining powers of D by addition and numerical multiplication leads to formulas like

D^2 - D - 2,   (D + 1)(D - 2),

and applying the rules (1) and (2) of Theorem 5.2 we get

(D + 1)(D - 2) = D^2 - D - 2.

If we apply either side of this last equation to a function y, we of course get the same thing:

y'' - y' - 2y.

Differential operators such as these will be studied in more detail in the following section.
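The operator identity (D + 1)(D - 2) = D^2 - D - 2 can also be checked symbolically by applying both sides to a generic function; a sketch assuming the SymPy library:

```python
# A sketch: apply (D + 1)(D - 2) to a generic y(x) and compare with
# (D^2 - D - 2)y.
from sympy import symbols, Function, simplify

x = symbols('x')
y = Function('y')(x)

def D(expr):                              # D = d/dx
    return expr.diff(x)

left = D(D(y) - 2*y) + (D(y) - 2*y)       # (D + 1)(D - 2)y
right = D(D(y)) - D(y) - 2*y              # (D^2 - D - 2)y

print(simplify(left - right))             # 0
```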
We have seen that addition and multiplication of matrices are related to operations on linear functions. Similarly related to the idea of an inverse matrix is that of an inverse function. A function f has an inverse function, denoted f^(-1), if

f^(-1)∘f(x) = x

for every x in the domain of f. Applying f to both sides of this equation gives f∘f^(-1)(f(x)) = f(x); therefore, we also have

f∘f^(-1)(y) = y,

for every y = f(x) in the image of f. We leave as an exercise the proof that:

5.3 If f is linear, then so is f^(-1).

Because we have derived the techniques of computing matrix inverses in several stages, we summarize here what we have proved so far. At the end of Chapter 1, we showed that

5.4 If A is an orthogonal matrix, then A^(-1) = A^t, where A^t is the transpose of A across its main diagonal.

In Theorem 8.3 of Chapter 1, it is proved that:

5.5 If det A ≠ 0, then A is invertible, and then A^(-1) = (1/det A) Ã, where Ã is the matrix of cofactors of elements of A.

Finally, the row reduction method of the first section of this chapter is an efficient way to find the inverse of an invertible matrix, that is, to find A^(-1) such that

A^(-1) A = A A^(-1) = I,

where I is the identity matrix.


The relationship between inverse linear function and inverse matrix is not quite so straightforward as in the case of the other operations on functions and matrices. The following example shows why.

Example 4. Let A be an n-by-n invertible matrix. Then the linear function f from R^n to R^n defined by

f(x) = Ax

always has an inverse function f^(-1) given by

f^(-1)(x) = A^(-1)x.

Indeed,

f^(-1)∘f(x) = A^(-1)Ax = x,

for all x in R^n. However, consider the linear function g from R^1 to R^2 defined by

g(x) = (x, x).

It is easy to check that g is linear and that g has an inverse function g^(-1) defined on the subspace C of R^2 consisting of vectors having both coordinates equal. We define

g^(-1)(x, x) = x,   for (x, x) in C.

But the matrix of g is

[1]
[1]

which is not invertible because it isn't even a square matrix. The difficulty is that g^(-1) is not defined on all of R^2, but only on the subspace C.

In the above example the crucial characteristic of the function g was the dimension of its range or image. The idea is important enough to have a name: the rank of a linear function f is the dimension of the range of f. Thus the linear function

f(x, y) = (2x, 3y),   with matrix   [2 0]
                                    [0 3],

has its rank equal to 2 because its range is all of R^2. The function

(x, y) → (x, 0),   with matrix   [1 0]
                                 [0 0],

has its rank equal to 1 because its range is just the x-axis in R^2. For a linear function f from R^n to R^m with matrix A, we have the following useful criterion.

5.6 Theorem

Let f be a linear function with matrix A. Then the rank of f is equal to the number of linearly independent columns in A.

Proof. The columns of A span the range of f because

f(x) = x_1 f(e_1) + ... + x_n f(e_n),

where x = (x_1, ..., x_n) and the column vectors f(e_1), ..., f(e_n) are just the columns of A; hence, these columns span the range of f. The number of independent columns is then the dimension of the range of f.

Because of Theorem 5.6, we can speak of the rank of a matrix A and


define it to be either the number of linearly independent columns or else
the rank of the function f(x) = Ax.

Example 5. To find the rank of the matrix

1 7^

we apply elementary row operations to reduce the matrix to echelon form.


At each stage the new matrix is the matrix of a different linear function,
but because the row operations are all invertible, the dimension of the
range remains the same. We find that adding the second row to the third
row gives
T 7\

,2 14/

Then subtracting 2 times the first row from the third row gives

1 7\

15.
v
0/

No further reduction is possible; so clearly the matrix has just two linearly independent columns. Thus the given matrix has rank 2, and the associated linear function from R^3 to R^3 has a 2-dimensional range.
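The same kind of rank computation can be done numerically; a sketch with the NumPy library, using a made-up 3-by-3 matrix of rank 2 rather than the matrix of Example 5:

```python
# A sketch: the rank equals the number of independent columns, and
# NumPy computes it directly (sample matrix chosen for illustration).
import numpy as np

M = np.array([[1, 0,  7],
              [0, 1,  5],
              [2, 1, 19]])          # third row = 2*(row 1) + row 2

print(np.linalg.matrix_rank(M))     # 2: only two independent columns
```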
Finally, we complete the discussion of the relationship between inverse function and inverse matrix in R^n. If f is a function, then f is said to be one-to-one if each element of the range of f corresponds to exactly one element of the domain of f. Clearly, f is one-to-one if and only if f^(-1) exists. For linear functions we have the following simple fact.

5.7 Theorem

A linear function f is one-to-one if and only if x = 0 whenever f(x) = 0.

Proof. First assume that f(x) = 0 implies x = 0. If f(x_1) = f(x_2), then, by the linearity of f, f(x_1 - x_2) = 0. But then by assumption x_1 - x_2 = 0, so x_1 = x_2. Thus f must be one-to-one. Conversely, suppose f(x_1) = f(x_2) always implies x_1 = x_2. Then, because f(0) = 0 for a linear function, the equation f(x) = 0 can be written f(x) = f(0). But then x = 0, by assumption.

It follows from Theorem 5.7 that, if f is a one-to-one linear function from R^n to R^m, then the rank of f equals n. The reason is that the null space of f has dimension 0 (that is, the dimension of the origin) and so, by Theorem 4.7, rank(f) + 0 = n. This idea is used in the proof of the following theorem. For linear functions from R^n to R^n, that is, for functions representable by square matrices, we have:

5.8 Theorem

Let f: R^n → R^n be linear, with square matrix A. Then f has an inverse if and only if A is invertible.

Proof. We have seen at the beginning of Example 4 that if A^(-1) exists, then so does f^(-1). Conversely, if f has an inverse f^(-1), then f is one-to-one. By Theorem 5.7 the null space of f has dimension 0; so the range of f is all of R^n. But this means that the domain of f^(-1) is all of R^n. It follows that f^(-1) has a square matrix B, such that f^(-1)(x) = Bx for all x in R^n. Then the equations

BAx = f^(-1)∘f(x) = x
ABx = f∘f^(-1)(x) = x

both hold. It follows that AB = BA = I; therefore A is invertible and A^(-1) = B.

EXERCISES
1. Suppose that f and g are linear functions such that

1\ /1\ /0\ /l

'o-i- A.

y-n -CHI
Find the matrices of the following functions:

(a) f.
(b) g.
(c) f + g.
(d) 2f - g.
(e) g∘f.
(f) f∘(f + g).
7. (a) Show that (D + a)(D + b) = (D + b)(D + a), where a and b are constants.
(b) Define D + f(x) by

(D + f(x))y = y' + f(x)y.

Show that

(D + 1)(D + x) ≠ (D + x)(D + 1)

by applying both sides to a twice-differentiable function y(x).

8. Find the rank of f, where f has matrix

(a) [0 1]        (b) [1 2]
    [1 0]            [2 4]

<o o : 2

(c) | 1
I
2

,1 Oy v4

9. Let V and W be finite-dimensional vector spaces and let f be a linear function with domain V and with range equal to W. (Thus dim(W) = rank(f).)

(a) Show that V contains a largest subspace S such that f, when restricted to S, becomes one-to-one from S to W. [Hint. Let N be the null space of f with basis n_1, ..., n_k. Extend this basis to a basis for V by adding vectors s_1, ..., s_l.]
(b) Show that dim(S) = rank(f), where S is the subspace of part (a).

10. If f is a function and y is an element of the image of f, then the subset S of the domain of f consisting of those x for which f(x) = y is called the inverse image or pre-image of y. Show that, if f is linear, then the inverse image of a fixed vector is either a subspace of the domain of f or else is a subspace translated by a fixed vector, that is, an affine subspace.

11. What is the simplest way to find the rank of a diagonal matrix diag(a_1, ..., a_n)?

12. Show that if A is an m-by-n matrix, then the rank of A is equal to n - k, where k is the dimension of the solution set of the equation Ax = 0.

SECTION 6

DIFFERENTIAL OPERATORS

In this section we look at some vector space ideas that arise in studying differential equations. The equations we shall treat will be like the following:

y' - 2y = 0    (1)

y' - 3y = e^x    (2)

y'' + 3y' + y = sin x,    (3)

where the primes denote differentiation with respect to x. Equations of this kind are important because they express a relation between the value of y, its first derivative (velocity), and its second derivative (acceleration). The problem posed by each equation is to find all functions y(x) such that replacing y by y(x) satisfies the equation identically. Before beginning a systematic treatment of the problem, we shall consider some examples whose results will be useful later.

Example 1. The differential equation y' - ry = 0, where r is a constant, can be written

y' = ry    (4)

and so specifies that the rate of change of y is proportional to the value of y for every value of the variable x. The growth of a population is sometimes assumed to obey such a law. To find solutions we use the fact that if y = e^(rx), then y' = re^(rx). It follows that Equation (4) is satisfied if we take y = e^(rx). More generally, if c is an arbitrary constant, then Equation (4) is satisfied if we take

y = ce^(rx),    (5)

because the c will cancel on both sides. In fact, Equation (5) gives the most general solution to (4), for observe that we can write (5) in the form

e^(-rx) y = c.

Differentiating with respect to x gives

(e^(-rx) y)' = 0

or, using the product rule for derivatives,

e^(-rx) y' - re^(-rx) y = e^(-rx)(y' - ry) = 0.

Dividing by e^(-rx) leaves y' - ry = 0, which is the given Equation (4) rewritten. But now we can reverse these steps, supposing that y is some solution. We start with

y' - ry = 0

and then multiply by e^(-rx) to get

e^(-rx) y' - re^(-rx) y = 0.

By the product rule, this last equation is

(e^(-rx) y)' = 0.

Integrating both sides with respect to x gives

e^(-rx) y = c,

where c is a constant of integration. Multiplying both sides by e^(rx) shows that y must be of the form

y = ce^(rx).

Thus we have shown that ce^(rx) is the most general solution of y' = ry in the sense that any particular solution can be obtained by specifying the value of c.
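A symbolic check of Example 1, assuming the SymPy library and its differential equation solver:

```python
# A sketch: dsolve recovers y = C*exp(r*x) as the general solution of y' = r*y.
from sympy import symbols, Function, Eq, dsolve

x, r = symbols('x r')
y = Function('y')

print(dsolve(Eq(y(x).diff(x), r*y(x)), y(x)))   # Eq(y(x), C1*exp(r*x))
```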

The method used in the above example consists of multiplying the expression y' + ay by e^(ax) and then recognizing the result as the derivative (e^(ax) y)' = e^(ax) y' + a e^(ax) y. We shall use this exponential multiplier e^(ax) repeatedly in what follows.

Example 2. To solve the differential equation

y' - 3y = e^x,

we multiply by e^(-3x) and get

e^(-3x) y' - 3e^(-3x) y = e^(-2x).

This is the same as

(e^(-3x) y)' = e^(-2x).

Now we integrate both sides with respect to x, getting

e^(-3x) y = -(1/2) e^(-2x) + c,

where c is some constant of integration. Then multiplying by e^(3x) we obtain

y = -(1/2) e^x + c e^(3x)

for the most general solution. It is easy to verify directly, of course, that we have indeed found some solutions, one for each value of c. What we have shown additionally is that any solution must be of the form -(1/2)e^x + ce^(3x).

Before considering more complicated examples, it will be useful to describe some notation that is often used in solving differential equations. We let D stand for differentiation with respect to some agreed-on variable, say x, and interpret D + 2, D^2 - 1, and similar expressions as linear functions acting on suitably differentiable functions y. For example:

(D + 2)y = Dy + 2y = y' + 2y
(D^2 - 1)y = D^2 y - y = D(Dy) - y = y'' - y.

An important observation is that D acts as a linear function on y; the term linear operator is sometimes used to avoid possible confusion over the fact that y itself is a function of x (though not necessarily a linear one). To see that D acts linearly, all we have to do is recall the familiar properties of differentiation:

D(y_1 + y_2) = Dy_1 + Dy_2
D(ky) = kDy,   k constant.

These two equations express the linearity of D. From the fact that compositions of linear functions are linear it follows that the operators D^2, D^3, and in general D^n are also linear. Because numerical multiplication is a linear operation and because the sum of linear operations is linear, the operator (D + a) is linear for all constants a. Putting these facts together allows us to conclude that expressions such as

D^2 + a,   D^2 + aD + b,   (D + s)(D + t)

are all linear operators, with the respective interpretations

(D^2 + a)y = y'' + ay
(D^2 + aD + b)y = y'' + ay' + by
(D + s)(D + t)y = (D + s)(y' + ty)
               = D(y' + ty) + s(y' + ty)
               = y'' + ty' + sy' + sty
               = y'' + (t + s)y' + sty
               = (D^2 + (s + t)D + st)y.

The last example shows that, for constants s and t,

(D + s)(D + t) = D^2 + (s + t)D + st,

and also that

(D + t)(D + s) = D^2 + (s + t)D + st.

Thus operators of the form D + a can be multiplied like polynomials in D if a is constant. It is sometimes also important to be able to factor an operator, for example D^2 - 1. We see immediately that for this example

D^2 - 1 = (D - 1)(D + 1) = (D + 1)(D - 1).

Returning to differential equations, suppose we are given one of the form

y'' + ay' + by = 0;

Equation (3) at the beginning of this section is similar to this, with a = 3, b = 1. Writing the equation using differential operators gives

(D^2 + aD + b)y = 0.

Our method of solution will be to try to factor the operator into factors of the form (D + s) and (D + t), and then apply the exponential multiplier method of Examples 1 and 2 repeatedly.

Example 3. Suppose we want to find all functions y = y(x) that satisfy

y'' + 5y' + 6y = 0.

We write the equation in operator form as

(D^2 + 5D + 6)y = 0.

Next we try to factor the operator. We see that

(D^2 + 5D + 6) = (D + 3)(D + 2);

thus we need to solve

(D + 3)(D + 2)y = 0.

To find all solutions, we suppose that y is some solution. Letting

(D + 2)y = u

for the moment, we substitute u into the previous equation and arrive at

(D + 3)u = 0.

But this equation can be solved for u if we multiply through by e^(3x). We get

e^(3x) Du + 3e^(3x) u = 0
or
D(e^(3x) u) = 0.

Therefore

e^(3x) u = c_1,

for some constant c_1, and so

u = c_1 e^(-3x).

Recall now that we have temporarily set (D + 2)y = u. We then have

(D + 2)y = c_1 e^(-3x).

Multiply this last equation by e^(2x) to get

e^(2x) Dy + 2e^(2x) y = c_1 e^(-x)
or
D(e^(2x) y) = c_1 e^(-x).

Integrating with respect to x gives

e^(2x) y = -c_1 e^(-x) + c_2
or
y = -c_1 e^(-3x) + c_2 e^(-2x).

Since the constants c_1 and c_2 are arbitrary anyway, we can change the sign on the first one to get

y = c_1 e^(-3x) + c_2 e^(-2x)

for the form of the most general solution.
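The result of Example 3 can be confirmed with a computer algebra system; a sketch assuming the SymPy library:

```python
# A sketch: the general solution of y'' + 5y' + 6y = 0 is a combination
# of exp(-3x) and exp(-2x), as found by factoring (D + 3)(D + 2).
from sympy import symbols, Function, Eq, dsolve

x = symbols('x')
y = Function('y')

eq = Eq(y(x).diff(x, 2) + 5*y(x).diff(x) + 6*y(x), 0)
print(dsolve(eq, y(x)))   # a combination of exp(-3*x) and exp(-2*x)
```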

The exponential multiplier used in the previous examples is found as follows: (D + a)y is multiplied by e^(ax) to produce D(e^(ax) y), that is,

e^(ax)(D + a)y = D(e^(ax) y).

Repeated application of the rule leads to the following general fact.

6.1 Theorem

The differential equation

(D - r_1)(D - r_2) ... (D - r_n)y = 0,

with r_1, r_2, ..., r_n all different, has its most general solution of the form

y = c_1 e^(r_1 x) + c_2 e^(r_2 x) + ... + c_n e^(r_n x).

If some r's are equal, say r_1 = r_2 = r_3 = ... = r_k, we replace e^(r_2 x), e^(r_3 x), ..., e^(r_k x) by x e^(r_1 x), x^2 e^(r_1 x), ..., x^(k-1) e^(r_1 x), respectively, to get the general solution.

Proof. For simplicity we shall start with the case n = 2. Given

(D - r_1)(D - r_2)y = 0,

we set

(D - r_2)y = u.

Then the equation

(D - r_1)u = 0

is solved by multiplying through with e^(-r_1 x). We get

e^(-r_1 x) Du - r_1 e^(-r_1 x) u = 0
or
D(e^(-r_1 x) u) = 0.

Then integration with respect to x gives

e^(-r_1 x) u = c_1.

Now

(D - r_2)y = c_1 e^(r_1 x).

We multiply by e^(-r_2 x) to get

e^(-r_2 x) Dy - r_2 e^(-r_2 x) y = c_1 e^((r_1 - r_2)x).

Integrating both sides of

D(e^(-r_2 x) y) = c_1 e^((r_1 - r_2)x)    (6)

gives

e^(-r_2 x) y = (c_1/(r_1 - r_2)) e^((r_1 - r_2)x) + c_2
or
y = (c_1/(r_1 - r_2)) e^(r_1 x) + c_2 e^(r_2 x).

We have assumed above that r_1 ≠ r_2; so the last integration is correct as given. For neatness we replace the arbitrary constant c_1 by (r_1 - r_2)c_1, which is just as arbitrary.

In case r_1 = r_2, the integration just performed is not correct. We would have r_1 - r_2 = 0; so Equation (6) becomes

D(e^(-r_1 x) y) = c_1.

Now integration gives

e^(-r_1 x) y = c_1 x + c_2
or
y = c_1 x e^(r_1 x) + c_2 e^(r_2 x),

as stated in the theorem.

More generally, to solve

(D - r_1)(D - r_2) ... (D - r_n)y = 0,

we set

(D - r_2) ... (D - r_n)y = u_1,    (7)

so that the previous equation becomes

(D - r_1)u_1 = 0.

Substitution of the general solution for u_1 into (7) gives a new equation which we split up by setting

(D - r_3) ... (D - r_n)y = u_2.

Then Equation (7) becomes

(D - r_2)u_2 = u_1,

to be solved for u_2. Continuing in this way, we finally have to solve an equation

(D - r_n)y = u_{n-1}

for y. The solution of each equation reduces to the solution of a sum of equations of the form

(D - r)y = c x^k e^(sx),   s ≠ r,
or
(D - r)y = c x^k e^(rx).

In the first case we multiply by e^(-rx). Then integration by parts leads to a linear combination of powers of x times e^(sx). In the second case, after multiplication by e^(-rx), we only have x^k to integrate on the right. In either case we get solutions of the form stated in the theorem.

The key to the application of the theorem is the factorization of a linear differential operator into factors of the form (D - r). Looked at as a polynomial in D, the expression

D^n + a_{n-1} D^(n-1) + ... + a_1 D + a_0    (8)

is called the characteristic polynomial of the associated differential equation

y^(n) + a_{n-1} y^(n-1) + ... + a_1 y' + a_0 y = 0.

Notice that, to get the polynomial from the equation, we replace y^(k) by D^k and that the term a_0 y corresponds to a_0 D^0 = a_0. Finding a factorization for the polynomial depends on knowing its roots, for if the polynomial (8) has roots r_1, ..., r_n, then it has the factored form

(D - r_1)(D - r_2) ... (D - r_n).

Finding the roots of a polynomial exactly is impossible in general. However, when n = 2 (or 3 or 4 for that matter), there are well-known formulas for the roots in terms of the coefficients. For simplicity we have assumed that the leading coefficient of the characteristic polynomial is 1; otherwise we can divide through so that the leading coefficient becomes 1.

Example 4. Suppose we want to solve

y''' - 4y'' + 4y' = 0.

Writing the characteristic polynomial

D^3 - 4D^2 + 4D,

we observe that it can be written

D(D^2 - 4D + 4) = D(D - 2)^2.

The roots are 0 and 2, where 2 is a repeated root. Thus the general solution to the equation is a linear combination of e^(0x), e^(2x), and x e^(2x), and so

y = c_1 + c_2 e^(2x) + c_3 x e^(2x)

is the most general solution.
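Since the hard step is finding the roots of the characteristic polynomial, a numerical root finder can be used when exact factoring is awkward; a sketch with the NumPy library, applied to the polynomial of Example 4:

```python
# A sketch: roots of the characteristic polynomial D^3 - 4D^2 + 4D
# (coefficients listed from the highest power down to the constant).
import numpy as np

roots = np.roots([1, -4, 4, 0])
print(roots)    # approximately [2., 2., 0.]: 2 is a repeated root, so the
                # basic solutions are e^(0x) = 1, e^(2x), and x e^(2x).
```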

From here on the theory will be described in terms of vector spaces. We have referred to the linearity of the operators (D - r) without being specific about the vector space on which they operate. There are in fact many possibilities, depending on the requirements of a particular problem. One choice is to consider C^(n), the vector space of real-valued functions of x having continuous nth derivatives. If L is a linear differential operator of order n, that is, an operator containing differentiation of order at most n, then L acts linearly from C^(n) to C^(0), the space of continuous functions. Another possibility is to consider L acting on C^(∞), the vector space of functions having continuous derivatives of all orders. For our purposes the former choice will be more natural. Observe that the functions e^(rx) and x^k e^(rx) of Theorem 6.1 are nevertheless members of C^(∞), and so they are automatically also in C^(n). Furthermore, the set N of solutions of

(D - r_1) ... (D - r_n)y = 0

is a vector subspace of C^(n), because N is the null space of the linear operator

L = (D - r_1) ... (D - r_n)

acting on C^(n). From Theorem 6.1 we can immediately conclude that N has dimension at most n, the order of L, because N is spanned by n distinct functions of the form e^(rx) or x^j e^(rx). (The possibility that r may be a complex number is not ruled out here but will be explained in the next section.) In fact we can prove the following theorem. The proof consists simply of showing that the basic solutions are linearly independent, but it requires some work.

6.2 Theorem

Let N be the subspace of C^(n) consisting of all solutions of the linear differential equation

(D - r_1) ... (D - r_n)y = 0.

Then N has dimension exactly n.

Proof. We have already observed that N has dimension at most n because it is spanned by a certain set of n elements. We can show that the dimension is exactly n by reviewing the way in which the exponential multiplier method works. We solve a succession of differential equations

(D - r_k)y = u_{k-1},

where u_{k-1} has the general form

u_{k-1}(x) = c_1 x^(l_1) e^(r_1 x) + ... + c_{k-1} x^(l_{k-1}) e^(r_{k-1} x).

We apply the factor e^(-r_k x) to get as usual

D(e^(-r_k x) y) = e^(-r_k x) u_{k-1}.    (9)

We now proceed inductively to show that the set of functions of the form u_k has dimension k, assuming that the set of functions of the form u_{k-1} has dimension k - 1. If the set of possible functions u_{k-1} has dimension k - 1, then the same is true of the set of functions e^(-r_k x) u_{k-1}. (Why?) Thus we consider the linear operator D as acting from a domain of functions e^(-r_k x) y to a range vector space of dimension k - 1. But the null space of D alone has dimension 1. The reason is that the solutions of

D(e^(-r_k x) y) = 0

are of the form

e^(-r_k x) y = c,

clearly a 1-dimensional subspace of the domain of D. Because

dim(domain) = dim(null space) + dim(range),

we have dim(domain) = 1 + (k - 1) = k. Thus the set of solutions e^(-r_k x) y of Equation (9) has dimension k. It follows that the corresponding set of functions

y = e^(r_k x)(e^(-r_k x) y)

also has dimension k. Since we can prove this for k = 1, 2, ..., n, we have shown that the set N of all solutions to the given differential equation has dimension n.

We conclude the section by considering differential equations of the form

L(y) = f,

where f is a given continuous function of x, and L has the form

L = (D - r_1)(D - r_2) ... (D - r_n).

We have already treated the case in which f is identically zero.

Example 5. Given

y'' + 2y' + y = e^(3x),

we write the characteristic polynomial D^2 + 2D + 1 and factor it, putting the equation in the form

(D + 1)^2 y = e^(3x).

Letting (D + 1)y = u, we try to solve

(D + 1)u = e^(3x).

Multiplication by e^x gives

e^x Du + e^x u = e^(4x)
or
D(e^x u) = e^(4x).

Then, integration gives

e^x u = (1/4)e^(4x) + c_1
or
u = (1/4)e^(3x) + c_1 e^(-x).

Since (D + 1)y = u, we have

(D + 1)y = (1/4)e^(3x) + c_1 e^(-x).

Again multiplying by e^x, we get

e^x Dy + e^x y = (1/4)e^(4x) + c_1
or
D(e^x y) = (1/4)e^(4x) + c_1.

Then

e^x y = (1/16)e^(4x) + c_1 x + c_2

or

y = (1/16)e^(3x) + c_1 x e^(-x) + c_2 e^(-x).

In the above example the solution breaks naturally into a sum of two parts, y_h and y_p:

y_h = c_1 x e^(-x) + c_2 e^(-x),   y_p = (1/16)e^(3x).

The function y_h is called the homogeneous part of the solution because it is a solution of the so-called homogeneous equation

L(y) = 0

associated with L(y) = f. The function y_p is called a particular solution of

L(y) = f

because it is just that: a particular solution, though not the most general one. In fact, we get y_p by setting c_1 = c_2 = 0 in the general solution. This breakup of the solution into two parts is an example of a general fact about linear functions, a fact that is used in solving systems of linear algebraic equations. The principle is important enough, and at the same time simple enough, that we state it here also.

6.3 Theorem

Let L be a linear function. Let f be an element of the range of L, and let y_p be any element of the domain of L such that L(y_p) = f. Then every solution y of

L(y) = f

can be written as a sum

y = y_h + y_p,

where y_h is an element of the null space N of L.

Proof. Suppose that L(y) = f and that also L(y_p) = f. Then, since L is linear,

L(y - y_p) = L(y) - L(y_p) = 0.

It follows that y - y_p is in the null space N of L, that is, y - y_p = y_h for some y_h in N. But then y = y_h + y_p, as was to be shown.

The method of Example 5 can always be used to find the most general solution to the equation L(y) = f of the form

(D - r_1) ... (D - r_n)y = f.

However, in some examples the computations can be shortened by means of Theorem 6.1, which shows that y_h, the homogeneous part of the solution, can be written down as soon as we know the roots of the characteristic polynomial. Theorem 6.3 then says that if we find the general homogeneous part of the solution y_h (using Theorem 6.1) and can somehow find a particular solution y_p, then the general solution of the given equation is y_h + y_p. In finding y_p it may be convenient to take advantage of the linearity of L in case the right-hand side f is a sum of two or more terms. If we want to solve

L(y) = a_1 f_1 + a_2 f_2    (10)

and we can find solutions y_1 and y_2 such that

L(y_1) = f_1,   L(y_2) = f_2,

then, because L is linear, the function

y = a_1 y_1 + a_2 y_2

is a solution of Equation (10). In this context, the property of linearity is sometimes called the superposition principle because the desired solution is found by superposition (i.e., addition) of solutions of more than one equation.

Example 6. In Example 5 we found that the differential equation

(D + 1)^2 y = e^(3x)

had the general solution

(1/16)e^(3x) + c_1 x e^(-x) + c_2 e^(-x).

When c_1 = c_2 = 0 we get the particular solution y_1 = (1/16)e^(3x). If we now wanted to solve

(D + 1)^2 y = e^(3x) + 1,    (11)

we would not have to start all over again, but would only have to find a particular solution for

(D + 1)^2 y = 1.

This could be solved, of course, by using exponential multipliers, but in this case the differential equation is so simple that we can guess a solution, namely, y_2 = 1. Then a particular solution of Equation (11) is y_p = (1/16)e^(3x) + 1 and the general solution is

y = (1/16)e^(3x) + 1 + c_1 x e^(-x) + c_2 e^(-x).

On the other hand, solving

(D + 1)^2 y = e^(3x) + e^(-x)

requires us to find a particular solution to

(D + 1)^2 y = e^(-x).

To do this we would return to the exponential multiplier method.

Other methods of solution, using "undetermined coefficients" and "variation of parameters," are given in Exercises 10 and 11, which follow. These methods are sometimes more efficient for finding particular solutions.
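A check of Example 6 with a computer algebra system, assuming the SymPy library:

```python
# A sketch: the particular solutions for e^(3x) and for 1 superpose.
from sympy import symbols, Function, Eq, dsolve, exp

x = symbols('x')
y = Function('y')

eq = Eq(y(x).diff(x, 2) + 2*y(x).diff(x) + y(x), exp(3*x) + 1)
print(dsolve(eq, y(x)))
# Eq(y(x), (C1 + C2*x)*exp(-x) + exp(3*x)/16 + 1), matching
# y = e^(3x)/16 + 1 + c1*x*e^(-x) + c2*e^(-x) above.
```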

EXERCISES
1. For each of the following differential equations, find an appropriate exponential multiplier and then solve by integrating both sides of the modified equation.

(a) y' + 2y = 1.
(b) 2y' + y = x.
(c) y' - y = e^x.
(d) y' = sin x.

2. Show that if the expression y' + p(x)y is multiplied by

e^(∫p(x) dx),

the resulting product can be written in the form

d/dx (e^(∫p(x) dx) y).

[We assume that p(x) is a continuous function of x, and that ∫p(x) dx is some function with derivative p(x).]

3. Use the result of Problem 2 to find an exponential multiplier for each of the following differential equations. Then solve the equation.

(a) y' + (1/x)y = 1.
(b) dy/dx + xy = x.
(c) y' = xy.
(d) y' + y = 0.
4. Write each of the following differential equations in operator form, e.g., (D^2 + 2D + 1)y = 0, and then factor the operator into factors of order 1. Then find the general solution of the equation.

(a) y'' + 2y' + y = 0.
(b) 2y'' - y = 0.
(c) y''' + 3y'' + y' = 0.
(d) D(D + 3)y = 0.
(e) (D^2 - 1)y = 1.
(f) y'' - y = e^x.
(g) y'' - y = e^x + 1.

5. Sketch the graph of each function of x given below. Then find a differential equation of which each one is a solution, the equation being of the form y'' + ay' + by = 0.

(a) x e^(-x).
(b) e^x + e^(-x).
(c) 1 + x.
(d) 2e^(2x) - 3e^(3x).

6. Define the differential operator D + f(x) to act on a function y by

(D + f(x))y = y' + f(x)y.

(a) Show that in general (D + f(x))(D + g(x)) ≠ (D + g(x))(D + f(x)).
(b) Show that if f and g are constant, then equality holds in part (a).

7. For x ≠ 0, the differential equation

x^2 y'' + (x^3 + x)y' + (x^2 - 1)y = 0

can be written in operator form as

(x^2 D^2 + (x^3 + x)D + (x^2 - 1))y = 0.

(a) Show that the above equation can also be written as

(D + x)(D + 1/x)y = 0.

(b) Solve the equation in part (a) by successive application of exponential multipliers of the form given in Exercise 2.
(c) Show that

(D + x)(D + 1/x) ≠ (D + 1/x)(D + x).

(d) Solve the differential equation

(D + 1/x)(D + x)y = 0.

8. (a) Find the general solution of the differential equation y'' - y = 0.
(b) Determine the constants in the general solution y(x) in part (a) so that y(0) = 0 and y'(0) = 1. Sketch the graph of the resulting particular solution.
(c) Determine the constants in the general solution y(x) in part (a) so that y(0) = 1 and y(1) = 0. Sketch the graph of the resulting particular solution.

9. (a) Show that the characteristic equation of the differential equation y'' + y = 0 has complex roots r_1 = i and r_2 = -i.
(b) Using the definition

e^(±ix) = cos x ± i sin x,

show that the formal solution

y(x) = c_1 e^(ix) + c_2 e^(-ix)

to the differential equation y'' + y = 0 can be put in the form

y(x) = d_1 cos x + d_2 sin x.

(c) Verify directly that cos x and sin x are solutions of y'' + y = 0.

10. (Method of undetermined coefficients.) Given a nonhomogeneous differential equation

(D^2 + aD + b)y = f(x),    (1)

suppose that the function f(x) can itself be recognized as a particular solution of an equation of a similar but homogeneous form, say,

(D - r_1) ... (D - r_n)f = 0.

By applying the operators (D - r_i) to both sides of the given Equation (1), it follows that y must also be a solution of the higher-order homogeneous equation

(D - r_1) ... (D - r_n)(D^2 + aD + b)y = 0.    (2)

Since we have a routine for writing down the most general solution y_g of this last equation, we can find a particular solution y_p of Equation (1) by substituting the general solution of (2) into it and seeing what conditions result for the arbitrary constants, or "undetermined coefficients," in y_g. To save duplication, we can first eliminate from y_g all terms that already occur in the homogeneous part, y_h, of the solution of (1). Linear independence of the basic solutions of the homogeneous equations guarantees that when y_g is substituted into (1), then coefficients of like terms must be equal on either side. It is these equalities that are used to determine the coefficients for y_p. The general solution of (1) is then y = y_h + y_p. Find the general solutions of the following differential equations.

(a) y'' - y = e^(2x).
(b) y'' - y = e^x.
(c) y'' + 2y' + y = e^x.
(d) y' - y = x.
(e) y'' - y = e^x + x.

11. (Variation of parameters.) This method is useful for finding particular solutions of linear differential equations with nonconstant coefficients of the form

y'' + ay' + by = f(x).    (3)

Suppose that we can find the homogeneous solution

y_h = c_1 u_1(x) + c_2 u_2(x).

The constants c_1 and c_2 can now be thought of as auxiliary variables or "parameters" in which we allow "variation" as functions of x. We then try to determine c_1(x) and c_2(x) so that

y(x) = c_1(x)u_1(x) + c_2(x)u_2(x)    (4)

will be a solution of the nonhomogeneous Equation (3). We compute by the product rule

y' = c_1 u_1' + c_2 u_2' + c_1' u_1 + c_2' u_2

and then require for simplicity that

12. (a) Verify that substitution of Equations (4), (6), and (7) of Exercise 11 into

y'' + ay' + by = f

produces Equation (8).
(b) Verify that Equations (5) and (8) of Exercise 11 have the solution given by Equation (9).

13. A pellet fired horizontally through a viscous medium has a displacement from its initial position of the form x = x(t), where t is time. Denoting derivatives with respect to time by dots, the velocity of the pellet at time t is ẋ(t) and its acceleration is ẍ(t). From physical principles it can be shown that, theoretically, x satisfies the differential equation

m ẍ + k ẋ = 0,

where m is the mass of the pellet and k is a positive constant depending on the viscosity of the medium.

(a) Show that if the initial displacement is x(0) = 0 and the initial velocity is ẋ(0) = v_0, then, for t > 0,

x(t) = (v_0 m / k)(1 - e^(-kt/m)).

(b) Show that the displacement x(t) has an upper bound equal to v_0 m/k. What is the effect of increasing the viscosity constant or of increasing the mass of the pellet?
(c) Show that the velocity of the pellet decreases to zero as t increases.
(d) Show that the acceleration of the pellet is negative for t > 0. What is the effect of an increase in k or an increase in m on the acceleration?
(e) Sketch the graph of x(t) if v_0 m/k = 1.

14. Let D = d/dt. The first-order system of differential equations

(aD + α)y + (bD + β)z = f(t)
(cD + γ)y + (dD + δ)z = g(t)

contains the purely algebraic system

αy + βz = f(t)
γy + δz = g(t)

as a special case when a, b, c, and d are all zero. The method of elimination can be used to solve the differential system as well as the algebraic. To find y(t) and z(t), operate on both sides of the first equation by (dD + δ), and on the second equation by (bD + β). Then subtract one equation from the other. The resulting equation can be solved for y(t) since it does not contain z. Next substitute the general solution y(t) into one of the equations and solve that for z(t); to determine possible relations between the constants of integration, it may be necessary to substitute y(t) and z(t) into the other given equation. Sometimes simplifications in this procedure can be made.

(a) Use the method just described to find the general solution of the system

(D + 1)y + z = 0
3y + (D - 1)z = 0.

(b) Determine the constants in the solution of part (a) so that the initial conditions y(0) = 0 and z(0) = 1 are satisfied.
(c) Find the most general solution of the system

(D + 1)y + Dz = 0
Dy - (D - 1)z = t.

(d) Determine the constants in the solution of part (c) so that the initial conditions y(0) = 0 and z(0) = 0 are satisfied.

15. Suppose that two 100-gallon tanks of salt solution have concentrations (in pounds per gallon) of salt y(t) and z(t), respectively. Suppose that solution is flowing from the y-tank to the z-tank at a rate of 1 gallon per minute, and from the z-tank to the y-tank at a rate of 4 gallons per minute, and that the overflow from the y-tank goes down the drain, while the z-tank is kept full by the addition of fresh water. We assume that each tank is kept thoroughly mixed at all times.

(a) Show that y and z satisfy a system of differential equations of the type discussed in Exercise 14. [Hint. Express Dy and Dz each as a linear combination of y and z.]
(b) Find the general solution of the system found in part (a), and then determine the constants in it to be consistent with initial concentrations y(0) = 1/2 and z(0) = 1/4.
(c) Draw the graphs of the particular solutions y(t) and z(t) found in part (b) and interpret the results physically.

SECTION 7

COMPLEX VECTOR SPACES

In the earlier parts of this chapter we have always understood the vector space operation of numerical multiplication to mean multiplication by a real number. However, we can replace real numbers by complex numbers, and let the definition of vector space and linear function remain otherwise the same. Then all the theorems we have proved for real vector spaces are still true relative to the complex numbers. To prove this, all we have to do is observe that the only properties of real numbers that we used in proving theorems about an abstract vector space are properties that are shared by complex numbers. Theorems involving inner products are another matter, which we shall discuss at the end of the section. As motivation for considering complex vector spaces, we shall explain how the extension is the key to further development of the study of differential operators begun in Section 6. First we shall review the relevant facts about complex numbers and complex-valued functions.

The complex number z = x + iy, with real part Re z = x and imaginary part Im z = y, can be identified with the vector (x, y) in R^2 for the purpose of drawing pictures. Furthermore, complex addition is defined so that it corresponds precisely to addition of elements of R^2:

(x + iy) + (x' + iy') = (x + x') + i(y + y').

The relation is illustrated in Fig. 7.

Figure 7

It is also true that numerical multiplication of a complex number by a real number r corresponds to numerical multiplication in R^2:

r(x + iy) = rx + iry.

To complete the identification of the complex numbers with the vectors of R^2 it is necessary to introduce an operation of vector multiplication in R^2 to correspond to the multiplication of complex numbers. Complex multiplication is done by defining i^2 = -1 and computing as follows:

(x + iy)(x' + iy') = (xx' - yy') + i(xy' + yx').


The complex conjugate of z = x + iy, written z̄, is x - iy. Taking the conjugate z̄ of a complex number z corresponds to reflecting it in the horizontal axis. (See Fig. 8.) Division by a nonzero complex number x + iy is most easily done by multiplying both numerator and denominator by the conjugate of the denominator:

(x' + iy')/(x + iy) = ((x' + iy')(x - iy))/((x + iy)(x - iy)) = ((xx' + yy') + i(xy' - x'y))/(x^2 + y^2).

Finally, the absolute value of x + iy, written |x + iy|, is defined to be

|x + iy| = sqrt(x^2 + y^2)

and corresponds to the length of a vector in R^2. (See Fig. 8.)

Figure 8

Notice that absolute value and conjugate of a complex number z are related by

z z̄ = |z|^2.

The geometric significance of complex multiplication is seen best by representing nonzero complex numbers in polar form:

    x + iy = |x + iy| ( x/|x + iy| + i y/|x + iy| ).

Because |x + iy| = √(x^2 + y^2), the numbers x/√(x^2 + y^2) and y/√(x^2 + y^2) can be written as the cosine and sine, respectively, of an angle θ, called a polar angle of z = x + iy, and illustrated in Fig. 9. [Figure 9: z = |z|(cos θ + i sin θ)] Of course, a complex number has infinitely many polar angles, each pair differing by an integer multiple of 2π.

Now if z and z' are complex numbers with polar angles θ and θ', we can write their product in polar form as follows:

    (|z|(cos θ + i sin θ))(|z'|(cos θ' + i sin θ'))
        = |z| |z'| ((cos θ cos θ' - sin θ sin θ') + i(cos θ sin θ' + sin θ cos θ'))
        = |z| |z'| (cos(θ + θ') + i sin(θ + θ')).

In the last step we have used the addition formulas for cosine and sine. The result of the computation is a number in polar form having absolute value |z| |z'| and polar angle θ + θ'. We conclude that the absolute value of a product of complex numbers z and z' is the product of their absolute values,

    |zz'| = |z| |z'|,

and that if z and z' have polar angles θ(z) and θ(z'), then zz' has a polar angle θ(z) + θ(z'). These facts are illustrated in Fig. 10. [Figure 10]
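For readers who want to experiment, here is a minimal numerical check of these two facts, written in Python and assuming only the standard cmath module; the particular numbers z and w are arbitrary choices, not taken from the text.

    import cmath

    # Two arbitrary nonzero complex numbers.
    z = 2 + 1j
    w = -1 + 3j

    r_z, theta_z = cmath.polar(z)   # |z| and a polar angle of z
    r_w, theta_w = cmath.polar(w)
    r_p, theta_p = cmath.polar(z * w)

    # |zw| = |z| |w|
    print(abs(r_p - r_z * r_w) < 1e-12)            # True

    # theta(z) + theta(w) is a polar angle of zw, up to a multiple of 2*pi.
    diff = (theta_z + theta_w - theta_p) % (2 * cmath.pi)
    print(min(diff, 2 * cmath.pi - diff) < 1e-12)  # True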
What we have just established can be expressed most conveniently in terms of the complex exponential function. We define, for real numbers θ,

    e^{iθ} = cos θ + i sin θ.

Thus e^{iθ} is a complex number with |e^{iθ}| = √(cos^2 θ + sin^2 θ) = 1, and with polar angle θ. Since polar angles are added when complex numbers are multiplied, we have

    e^{iθ} e^{iθ'} = e^{i(θ + θ')}.

In particular, when θ' = -θ, we get e^{iθ} e^{-iθ} = e^{i·0} = 1; therefore

    1/e^{iθ} = e^{-iθ}.

These last equations are justifications for using the exponential notation; the function behaves like the real exponential, for which e^{θ} e^{θ'} = e^{θ + θ'} and 1/e^{θ} = e^{-θ}.

In terms of the exponential, the polar form of a complex number z with polar angle θ(z) becomes

    z = |z| e^{iθ(z)},

and its conjugate is

    z̄ = |z| e^{-iθ(z)}.

The fact that the conjugate of zw is the product of the conjugates of z and w can be established directly, or by writing

    \overline{zw} = |zw| e^{-i(θ(z) + θ(w))} = (|z| e^{-iθ(z)})(|w| e^{-iθ(w)}) = z̄ w̄.

In addition to its algebraic simplicity, another reason for using the complex exponential notation comes from the formulas for its derivative and integral. To differentiate or integrate a complex-valued function u(x) + iv(x) with respect to the real variable x, we simply differentiate or integrate the real and imaginary parts. By definition,

    d/dx (u(x) + iv(x)) = du/dx + i dv/dx

and

    ∫ (u(x) + iv(x)) dx = ∫ u(x) dx + i ∫ v(x) dx.

Then the derivative of e^{ix} with respect to x is given by

    d/dx e^{ix} = d/dx (cos x + i sin x)
                = -sin x + i cos x
                = i(cos x + i sin x) = i e^{ix}.

In short, we have

    d/dx e^{ix} = i e^{ix}.

Similarly

    ∫ e^{ix} dx = (1/i) e^{ix} + c,

where c may be a real or complex constant. These are analogous to the formulas for the derivative and integral of e^{ax} when a is real. More generally, we can define

    e^{(α + iβ)x} = e^{αx}(cos βx + i sin βx),

and compute

7.1    d/dx e^{(α + iβ)x} = (α + iβ) e^{(α + iβ)x}

and

7.2    ∫ e^{(α + iβ)x} dx = 1/(α + iβ) e^{(α + iβ)x} + c,    α + iβ ≠ 0.

These computations are left as exercises.
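Formula 7.1 can also be checked numerically. The following sketch, which assumes Python with the standard cmath module and uses arbitrary values of α, β, and x, compares a central difference quotient with the right side of 7.1.

    import cmath

    def f(x, alpha, beta):
        # The complex exponential e^{(alpha + i beta) x} of a real variable x.
        return cmath.exp((alpha + 1j * beta) * x)

    alpha, beta, x = 0.5, 2.0, 1.3   # arbitrary choices
    h = 1e-6

    approx = (f(x + h, alpha, beta) - f(x - h, alpha, beta)) / (2 * h)
    exact = (alpha + 1j * beta) * f(x, alpha, beta)   # Formula 7.1

    print(abs(approx - exact))   # small, of the order of h**2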


We are now in a position to discuss the differential equation

    (D^2 + aD + b)y = 0

when the factored operator

    (D - r_1)(D - r_2) = D^2 - (r_1 + r_2)D + r_1 r_2

contains complex numbers r_1 and r_2. We shall see that the usual techniques, as discussed for example in Section 6, still apply. In fact, the exponential multiplier method goes over formally unchanged because of Equation 7.1; we have

    D(e^{rx} y) = e^{rx}(D + r)y,

whether r is real or complex.

Example 1. Consider the differential equation y'' + y = 0. We write the equation in operator form,

    (D^2 + 1)y = 0,

and factor D^2 + 1 to get

    (D - i)(D + i)y = 0.

Then set

    (D + i)y = u,    (1)

and try to solve

    (D - i)u = 0

for u. As in the real case, we multiply by a factor designed to make the left side the derivative of a product. The same multiplier rule suggests that the correct factor is e^{-ix}, so we write

    e^{-ix}(D - i)u = 0

or, since D(e^{-ix} u) = e^{-ix}(D - i)u,

    D(e^{-ix} u) = 0.

We integrate both sides with respect to x to get

    e^{-ix} u = c_1    or    u = c_1 e^{ix}.

Substituting this result for u into Equation (1) gives

    (D + i)y = c_1 e^{ix},

which must now be solved for y. We do it by multiplying through by e^{ix} to get

    e^{ix}(D + i)y = c_1 e^{2ix}

or, since D(e^{ix} y) = e^{ix}(D + i)y,

    D(e^{ix} y) = c_1 e^{2ix}.

Integrating gives

    e^{ix} y = (1/2i) c_1 e^{2ix} + c_2

or

    y = (1/2i) c_1 e^{ix} + c_2 e^{-ix}.

On replacing the arbitrary constant c_1 by 2i c_1, we have

    y = c_1 e^{ix} + c_2 e^{-ix}
      = c_1(cos x + i sin x) + c_2(cos x - i sin x)
      = (c_1 + c_2) cos x + i(c_1 - c_2) sin x.

To make the solution simpler looking, we can set d_1 = (c_1 + c_2) and d_2 = i(c_1 - c_2). This involves no change in generality in the constants because, whenever d_1 and d_2 are to be real numbers, then c_1 and c_2 in general must be complex. In fact we have, solving for c_1 and c_2,

    c_1 = (d_1 - i d_2)/2    and    c_2 = (d_1 + i d_2)/2.
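As an optional check of Example 1, the following short sketch (assuming Python with the SymPy library) verifies symbolically that both the complex-exponential form and the trigonometric form of the solution satisfy y'' + y = 0.

    import sympy as sp

    x, c1, c2, d1, d2 = sp.symbols('x c1 c2 d1 d2')

    # Complex-exponential form of the general solution found above.
    y_complex = c1 * sp.exp(sp.I * x) + c2 * sp.exp(-sp.I * x)
    # Equivalent trigonometric form, with d1 = c1 + c2 and d2 = i(c1 - c2).
    y_trig = d1 * sp.cos(x) + d2 * sp.sin(x)

    print(sp.simplify(sp.diff(y_complex, x, 2) + y_complex))  # prints 0
    print(sp.simplify(sp.diff(y_trig, x, 2) + y_trig))        # prints 0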

Example 1 shows that it is important to be able to form linear combinations of elements of a vector space with complex coefficients. A complex vector space has the same definition as a real vector space except that numerical multiples are formed using the complex numbers C. The definitions of linear independence and linear function are also formally the same, with complex numbers replacing real numbers. It is worth remarking that every complex vector space can be converted in a natural way into a real vector space; to obtain it we simply restrict ourselves to numerical multiplication by real numbers. As a consequence, linear independence of a set in a complex vector space automatically implies linear independence of the same set relative to the restricted real vector space. The reason is that, if

    c_1 x_1 + … + c_n x_n = 0

implies that all the c's are zero whenever they are chosen from the complex numbers, then the same implication certainly holds if the c's are only chosen from the real numbers. However, the converse statement is not true, as the following example shows.

Example 2. The set C of complex numbers is itself a complex vector space. As a real vector space, we have seen that C can be identified with R^2; hence it has dimension 2 relative to the real numbers. However, relative to the complex numbers, C has dimension 1, because a linear combination of more than one complex number, with complex coefficients, can always be made to be zero without all coefficients being zero. For example, in c_1 z + c_2 w, choose c_1 = 1/z and c_2 = -1/w.

Turning to differential equations again, we observe that Theorems 6.1, 6.2, and 6.3 of the previous section are all true relative to the complex numbers. The reason is that, in the proofs of these theorems, no assumption is made about the numerical multipliers that occur other than the ordinary rules of arithmetic which apply to both real and complex numbers. In addition, the properties of the exponential function that are used hold for both real and complex exponentials. Therefore we can conclude that all solutions of the differential equation L(y) = 0, where

    L = (D - r_1) … (D - r_n),    (2)

are linear combinations of n functions, even when the numbers r_k are complex. These functions are e^{r_k x} or, in the case of multiple roots, x^l e^{r_k x}. This implies, just as for real r_k:

7.3 Theorem

The set of solutions of L(y) = 0, with L given by Equation (2), is a vector space of dimension n relative to the complex numbers. It has a basis consisting of functions of the form e^{r_k x} or x^l e^{r_k x}, where r_k may be complex.

Theorem 7.3 is not usually applied directly in the above form because operators such as

    P(D) = D^2 + aD + b,

which occur in practice, most often have real numbers for the coefficients a and b. This implies that, in the factorization

    D^2 + aD + b = (D - r_1)(D - r_2),

the complex roots r_k always occur in complex conjugate pairs. (If r is a root of P(x) with real coefficients, then so is r̄. Why?) As a result, solutions of the differential equation also occur in pairs of the form

    e^{rx},    e^{r̄x},

perhaps multiplied by some power of x. It then becomes natural to write r = α + iβ and r̄ = α - iβ. We get

    c_1 e^{rx} + c_2 e^{r̄x} = c_1 e^{(α + iβ)x} + c_2 e^{(α - iβ)x}
        = e^{αx}(c_1(cos βx + i sin βx) + c_2(cos βx - i sin βx))
        = d_1 e^{αx} cos βx + d_2 e^{αx} sin βx,

where d_1 = c_1 + c_2 and d_2 = i(c_1 - c_2). The functions e^{αx} cos βx and e^{αx} sin βx are easier to interpret geometrically than are the complex exponentials that gave rise to them. Hence the solutions are often written using the trigonometric form

    y = d_1 e^{αx} cos βx + d_2 e^{αx} sin βx.

Example 3. The differential equation

    y'' + 2y' + 2y = 0

has characteristic polynomial

    D^2 + 2D + 2.

The roots of this polynomial are -1 ± i; so in factored form, the operator equation can be written

    (D - (-1 - i))(D - (-1 + i))y = 0.

The complex exponential solutions are

    e^{(-1 - i)x},    e^{(-1 + i)x}.

Translated into trigonometric form, these give

    e^{-x} cos x,    e^{-x} sin x;

so the general solution can be written either

    c_1 e^{(-1 - i)x} + c_2 e^{(-1 + i)x}

or

    d_1 e^{-x} cos x + d_2 e^{-x} sin x.
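A quick symbolic check of Example 3 can be made as follows; the sketch assumes Python with SymPy, and the constants d_1, d_2 are left arbitrary.

    import sympy as sp

    x, d1, d2 = sp.symbols('x d1 d2')

    # General real-form solution from Example 3.
    y = sp.exp(-x) * (d1 * sp.cos(x) + d2 * sp.sin(x))

    residual = sp.diff(y, x, 2) + 2 * sp.diff(y, x) + 2 * y
    print(sp.simplify(residual))   # prints 0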

The natural counterpart of Theorem 7.3 for differential equations with real coefficients is the following theorem, which guarantees a basis for the space of real solutions of L(y) = 0.

7.4 Theorem

The set of real-valued solutions of

    (D^n + a_{n-1} D^{n-1} + … + a_1 D + a_0)y = 0,

where the a_k are real, is a vector space of dimension n relative to the real numbers. This space has a basis consisting of functions of the form

    x^l e^{αx} cos βx,    x^l e^{αx} sin βx,

where α and β are real.

Proof. We know from Theorem 7.3 that the complex solutions of the above differential equation are linear combinations of functions of the form x^l e^{rx}, where either r is real or else there is a companion solution x^l e^{r̄x}. We have seen that any solution, and in particular any real solution, can then be written as a linear combination of n functions of the form

    x^l e^{αx} cos βx,    x^l e^{αx} sin βx.    (3)

Since the space of solutions has complex dimension n, any n functions that span it must be linearly independent over the complex numbers. But then these same n functions will automatically be linearly independent over the real numbers and so form a basis for the space of real-valued solutions. Hence n functions of the form (3) are a basis for that space.

We conclude the section with some additional remarks about complex


vector spaces.

Example 4. Denote by C^n the set of all n-tuples of complex numbers. Then C^n is easily seen to be a vector space with addition and numerical multiplication defined coordinate-wise. Unless something is stated to the contrary, it is always understood that numerical multiplication in C^n is relative to the complex numbers. Then C^n has e_1 = (1, 0, …, 0), …, e_n = (0, …, 0, 1) as a basis, and so it has complex dimension n.

Example 5. Let P_n be the vector space of polynomials in powers of x from 1 up to x^n, with complex coefficients. Then P_n has complex dimension n + 1 because the spanning set 1, x, …, x^n is linearly independent. To see this, observe that

    c_0 + c_1 x + … + c_n x^n = 0    for all x

implies that the polynomial has more than n roots. This is possible only if all the coefficients are zero.

It is possible to define length and inner product (the analog of dot-product in R^n) in a complex vector space. To see how this should be done, suppose that the inner product of two vectors x and y is to be a complex-valued function ⟨x, y⟩. Suppose further that ⟨x, x⟩ is to be nonnegative so that we can define the length of x by |x| = √⟨x, x⟩. If now we simply assume that ⟨x, y⟩ is complex linear in both x and y, then we would have for any complex number c,

    |cx|^2 = ⟨cx, cx⟩ = c^2 ⟨x, x⟩ = c^2 |x|^2.

But |cx|^2 and |x|^2 are both real numbers, while in general c^2, the square of a complex number, is not real. To get around this difficulty we require that ⟨x, y⟩ be conjugate symmetric,

    ⟨x, y⟩ = \overline{⟨y, x⟩},

instead of symmetric. Thus we define a complex inner product ⟨x, y⟩ of elements x, y in a complex vector space so that the following properties hold.

7.5    Positivity: ⟨x, x⟩ > 0, except that ⟨0, 0⟩ = 0.

       Conjugate Symmetry: ⟨x, y⟩ = \overline{⟨y, x⟩}.

       Additivity: ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.

       Homogeneity: ⟨cx, y⟩ = c⟨x, y⟩.

The conjugate symmetry implies additivity in the second vector also, so that

    ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩.
However, we have conjugate homogeneity in the second variable:

    ⟨x, cy⟩ = \overline{⟨cy, x⟩} = \overline{c⟨y, x⟩} = c̄ \overline{⟨y, x⟩} = c̄ ⟨x, y⟩.

Example 6. In the complex vector space C^n, let z = (z_1, …, z_n) and w = (w_1, …, w_n). Define

    ⟨z, w⟩ = z_1 w̄_1 + … + z_n w̄_n.

It is easy to check that this defines a complex inner product. If we define the length of z by

    |z| = ⟨z, z⟩^{1/2},

then |z| turns out to have the three properties of length listed in 5.3 of Section 5, Chapter 1.
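The inner product of Example 6 is easy to compute directly. The following minimal sketch assumes Python with NumPy and two arbitrarily chosen vectors in C^2; note that the conjugate is taken on the second factor, matching the convention used in the example.

    import numpy as np

    def inner(z, w):
        # Complex inner product of Example 6: sum of z_k * conjugate(w_k).
        return np.sum(z * np.conj(w))

    z = np.array([1 + 2j, 3 - 1j])
    w = np.array([2j, 1 + 1j])

    # Conjugate symmetry: <z, w> = conjugate(<w, z>).
    print(np.isclose(inner(z, w), np.conj(inner(w, z))))   # True

    # <z, z> is real and nonnegative, so |z| = <z, z>**(1/2) makes sense.
    print(inner(z, z).real, np.linalg.norm(z)**2)           # equal values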

Example 7. Let E_n be the vector space of exponential polynomials

    p(x) = Σ_{k=-n}^{n} c_k e^{ikx},

defined for -π ≤ x ≤ π, with complex coefficients c_k. We can define a complex inner product on E_n by

    ⟨p, q⟩ = ∫_{-π}^{π} p(x) \overline{q(x)} dx.

To integrate the complex function p q̄ we simply integrate its real and imaginary parts. We define length by

    |p| = ⟨p, p⟩^{1/2} = ( ∫_{-π}^{π} |p(x)|^2 dx )^{1/2}.
In Examples 6 and 7, the length of a complex vector z was defined by |z| = ⟨z, z⟩^{1/2}, using a conjugate symmetric inner product. The usual properties of length are:

7.6    Positivity: |z| > 0, except that |0| = 0.

       Homogeneity: |cz| = |c| |z|.

       Triangle Inequality: |z + w| ≤ |z| + |w|.

The first is obviously satisfied because of the corresponding property of the inner product. The second follows from

    |cz| = ⟨cz, cz⟩^{1/2} = (c c̄ ⟨z, z⟩)^{1/2} = |c| |z|.

As in the case of a real inner product, and with the same proof, the triangle inequality follows from the Cauchy-Schwarz inequality: |⟨z, w⟩| ≤ |z| |w|. The proof of Cauchy-Schwarz is a simple modification of the real case and is left as Exercise 11 at the end of this section. The difference between the real and complex proofs here illustrates the fact that, because a complex inner product has somewhat different properties from a real inner product, we cannot expect theorems involving inner products to extend without change from real to complex vector spaces. Complex inner products are used in this chapter in the exercises following Sections 8 and 9.

EXERCISES

1. For each of the following complex numbers z, find z̄ and |z|. Then write z in polar form and find 1/z.

   (a) 1 + i.        (c) 2i.
   (b) -1 + 2i.      (d) (2 + i)/i.

2. Verify that, for complex numbers z_1, z_2, and z_3, the distributive law

       z_1(z_2 + z_3) = z_1 z_2 + z_1 z_3,

   the associative laws

       z_1(z_2 z_3) = (z_1 z_2)z_3    and    z_1 + (z_2 + z_3) = (z_1 + z_2) + z_3,

   and the commutative laws

       z_1 z_2 = z_2 z_1    and    z_1 + z_2 = z_2 + z_1

   all hold.

3. (a) Verify Equation 7.1.
   (b) Verify Equation 7.2.

4. Prove directly from the definitions of conjugate and absolute value that \overline{z_1 z_2} = z̄_1 z̄_2 and |z_1 z_2| = |z_1| |z_2|.

5. Separate the real and imaginary terms in the infinite series

       Σ_{k=0}^{∞} (iθ)^k / k!

   into two infinite series, and use the result to justify defining e^{iθ} by

       e^{iθ} = cos θ + i sin θ.

6. Show that if r is complex, then (D + r) is linear as an operator on the


complex vector space consisting of continuously differentiable functions
u(x) + iv(x), where x is a real variable.
7. Find the general solutions of the following differential equations.

   (a) (D^2 + 1)y = 1.          (d) y' + y = 1.
   (b) (D^2 + D + 1)y = 0.      (e) y'' + y = sin x + 1.
   (c) y'' + y = sin x.         (f) y'' + y = tan x.

8. A spring vibrating in a viscous medium has a displacement from its initial position denoted by x(t), where t is time. Denoting differentiation with respect to t by a dot, the differential equation

       m ẍ + k ẋ + h x = 0

   can be derived from physical considerations. Here m is the mass of the spring, k is a positive constant depending on the viscosity of the medium, and h is a positive constant depending on the stiffness of the spring.

   (a) By considering the roots of the characteristic polynomial, show that x(t) is oscillatory or not, depending on whether 4mh > k^2.
   (b) Assuming m, h, and k all equal to 1, find the solution of the differential equation satisfying x(0) = 0 and ẋ(0) = 3.
   (c) If in part (b) we change to k = 2, but leave the other conditions the same, find the corresponding solution to the differential equation.
   (d) What is the maximum displacement from the initial position under the conditions of part (b)? Show that the oscillation tends to zero as t increases.
   (e) Sketch the graph of the displacement function under the conditions of part (c). What is the maximum displacement and at what time does it occur?

9. (a) Show that (a cos cx + b sin cx), where a, b, and c are real numbers, can be written in the form r cos(cx - θ), where r = √(a^2 + b^2) and θ is an angle such that cos θ = a/r and sin θ = b/r.
   (b) The result of part (a) is useful because it shows that a linear combination of cos cx and sin cx has a graph which is the graph of cos cx shifted by a suitable phase angle, θ, and multiplied by an amplitude, r. Sketch the graph of

       (1/2) cos 2x + (1/√3) sin 2x

   by first finding r and an appropriate θ.

10. Show directly that e^{αx} cos βx and e^{αx} sin βx are linearly independent relative to the real numbers by using the formulas

        cos βx = (e^{iβx} + e^{-iβx})/2,    sin βx = (e^{iβx} - e^{-iβx})/(2i),

    together with the fact that e^{rx} and e^{r̄x} are linearly independent relative to the complex numbers when r ≠ r̄.

11. Prove the Cauchy-Schwarz inequality

        |⟨z, w⟩| ≤ |z| |w|

    for complex inner products. [Start, as in the real case, by assuming |z| = |w| = 1. Express in polar form

        ⟨z, w⟩ = |⟨z, w⟩| e^{iθ},

    so that

        ⟨e^{-iθ} z, w⟩ = |⟨z, w⟩|.

    Then expand |e^{-iθ} z - w|^2 in terms of the complex inner product.]

12. Show that C^n, the set of n-tuples of complex numbers, has dimension n relative to the complex numbers and dimension 2n relative to the real numbers.

13. Show that if a set of real-valued functions is linearly independent relative


to the real numbers, then it is linearly independent relative to the complex
numbers. (The somewhat simpler converse statement is true quite generally
and has been treated in the text.)

SECTION 8

ORTHONORMAL BASES

If 𝒱 is a vector space with an inner product, and 𝒱 has a basis

    u_1, u_2, …, u_n,

then it is often desirable that the basis be an orthonormal set. This means that distinct vectors u_i and u_j are perpendicular:

    u_i · u_j = 0,    i ≠ j,    (1)

and that each has length 1:

    |u_i| = √(u_i · u_i) = 1,    i = 1, …, n.    (2)

These conditions are useful because, if a vector x in 𝒱 is expressed as a linear combination of basis vectors by

    x = c_1 u_1 + … + c_i u_i + … + c_n u_n,

then the coefficients c_i can be computed easily. In fact, we have

    x · u_i = c_1 u_1 · u_i + … + c_i u_i · u_i + … + c_n u_n · u_i
            = 0 + … + c_i + … + 0,

so that

    c_i = x · u_i.    (3)

Example 1. The vectors (1, 0) and (0, 1) in R^2 form an orthonormal basis. So does the pair of vectors (1/√2, 1/√2) and (-1/√2, 1/√2). In terms of the latter basis we can write

    (x, y) = c_1 (1/√2, 1/√2) + c_2 (-1/√2, 1/√2).

To determine c_1 and c_2 we compute, using Equation (3),

    c_1 = (x, y) · (1/√2, 1/√2) = (x + y)/√2,    c_2 = (x, y) · (-1/√2, 1/√2) = (y - x)/√2.

In particular, if (x, y) = (1, 2), we have c_1 = 3/√2 and c_2 = 1/√2; therefore

    (1, 2) = (3/√2)(1/√2, 1/√2) + (1/√2)(-1/√2, 1/√2).
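The computation in Example 1 is easy to reproduce numerically. Here is a minimal sketch, assuming Python with NumPy, using the same orthonormal basis and the same vector (1, 2).

    import numpy as np

    # The orthonormal basis of Example 1.
    u1 = np.array([1, 1]) / np.sqrt(2)
    u2 = np.array([-1, 1]) / np.sqrt(2)

    x = np.array([1.0, 2.0])

    # Equation (3): the coefficients are just dot products.
    c1, c2 = x @ u1, x @ u2
    print(c1, c2)                              # 3/sqrt(2) and 1/sqrt(2)
    print(np.allclose(c1 * u1 + c2 * u2, x))   # True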
Example 2. The functions defined for -π ≤ x ≤ π by

    cos x, sin x, …, cos nx, sin nx

span the vector space TS_n of trigonometric sums of the form

    T(x) = Σ_{k=1}^{n} (a_k cos kx + b_k sin kx).

A natural inner product for TS_n is

    ⟨f, g⟩ = (1/π) ∫_{-π}^{π} f(x)g(x) dx,

and with respect to this inner product, the above set turns out to be orthonormal. In fact, computation shows that

    (1/π) ∫_{-π}^{π} cos kx cos lx dx = (1/π) ∫_{-π}^{π} sin kx sin lx dx = 1 if k = l and 0 if k ≠ l,

while (1/π) ∫_{-π}^{π} cos kx sin lx dx = 0 for all k and l.

The fact that orthonormal bases are simpler to work with suggests
the following question: Given a basis for a vector space with an inner
product, is there some way to find an orthonormal basis? The answer is

yes, and we shall describe an effective procedure. First we make the


following observation.

8.1 Theorem

If u_1, …, u_n is an orthonormal set in a vector space 𝒱, then the vectors u_i are linearly independent. The coefficients in the linear combination

    x = c_1 u_1 + … + c_n u_n

are computed by c_j = x · u_j.

Proof. Suppose there are constants c_1, …, c_n such that

    c_1 u_1 + … + c_j u_j + … + c_n u_n = 0.

If for some j we form the dot-product of both sides of this equation with u_j, then the result is

    c_1 u_1 · u_j + … + c_j u_j · u_j + … + c_n u_n · u_j = 0.

But u_k · u_j = 0 if k ≠ j, and u_j · u_j = 1; so we get c_j = 0. Since j is arbitrary, the u's are linearly independent.

Now suppose that x_1, …, x_n is a linearly independent set in a vector space 𝒱; we shall describe a process for constructing an orthonormal set u_1, …, u_n. First we pick any of the x's, say x_1, and set

    u_1 = x_1/|x_1|.

Then |u_1| = |x_1|/|x_1| = 1. Now pick another of the x's, say x_2, and form its projection on u_1, that is, form the vector (x_2 · u_1)u_1. If the vectors were ordinary geometric vectors, the relationship between x_1, x_2, and u_1 would be somewhat as shown in Fig. 11. [Figure 11]

The vector y_2 is defined by

    y_2 = x_2 - (x_2 · u_1)u_1,

and we can check that y_2 is perpendicular to u_1, for

    y_2 · u_1 = x_2 · u_1 - (x_2 · u_1)(u_1 · u_1) = x_2 · u_1 - x_2 · u_1 = 0.
To get a unit vector perpendicular to u_1, we let

    u_2 = y_2/|y_2|.

The vector y_2 cannot be zero because, by its definition, that would imply that x_2 and u_1 were linearly dependent.

Having found u_1 and u_2, we choose x_3 and form its projection on the subspace of 𝒱 spanned by u_1 and u_2. By definition, this is the vector

    p = (x_3 · u_1)u_1 + (x_3 · u_2)u_2.

We define y_3 by subtracting this projection from x_3:

    y_3 = x_3 - (x_3 · u_1)u_1 - (x_3 · u_2)u_2.

As before we can check that y_3 is perpendicular to u_1 and also to u_2. Since u_1 · u_1 = 1 and u_1 · u_2 = 0, we have

    y_3 · u_1 = (x_3 · u_1) - (x_3 · u_1) = 0.

Similarly

    y_3 · u_2 = (x_3 · u_2) - (x_3 · u_2) = 0.

Figure 12 shows these vectors as geometric arrows. [Figure 12] We define u_3 by setting

    u_3 = y_3/|y_3|.

Once again y_3 = 0 would imply linear dependence of x_3, u_1, and u_2. But because the subspace spanned by u_1 and u_2 is the same as the one spanned by x_1 and x_2, this would imply linear dependence of the x's.

We proceed in this way, successively computing u_1, u_2, …, u_j. To find u_{j+1}, we set

8.2    y_{j+1} = x_{j+1} - (x_{j+1} · u_1)u_1 - … - (x_{j+1} · u_j)u_j,

       u_{j+1} = y_{j+1}/|y_{j+1}|.

As before we can verify that u_{j+1} is perpendicular to u_1, u_2, …, u_j.

Equations 8.2 summarize what is known as the Gram-Schmidt process for finding an orthonormal set from an independent set. It can be continued until we run out of x's in the independent set.

The vector

    (x · u_1)u_1 + … + (x · u_j)u_j

is called the projection of x on the subspace spanned by u_1, …, u_j. (The projection is also called the Fourier expansion of x relative to u_1, …, u_j. In Theorem 7.1 of Chapter 5 it is proved that the Fourier expansion of x is the linear combination of the u's which is nearest to x.)
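Equations 8.2 translate directly into a short program. The following sketch, assuming Python with NumPy and the ordinary dot product on R^n, applies the process to the two vectors used in Example 3 below; it is an illustration only and does not guard against dependent input.

    import numpy as np

    def gram_schmidt(vectors):
        # Orthonormalize a linearly independent list of vectors (Equations 8.2).
        basis = []
        for x in vectors:
            # Subtract the projection of x on the span of the u's found so far.
            y = x - sum((x @ u) * u for u in basis)
            basis.append(y / np.linalg.norm(y))
        return basis

    u1, u2 = gram_schmidt([np.array([1., -1., 2.]), np.array([1., 0., -1.])])
    print(u1)                            # (1, -1, 2)/sqrt(6)
    print(u2)                            # (7, -1, -4)/sqrt(66)
    print(u1 @ u2, u1 @ u1, u2 @ u2)     # approximately 0, 1, 1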

Example 3. The vectors x_1 = (1, -1, 2) and x_2 = (1, 0, -1) span a plane P in R^3 because they are linearly independent. To find an orthonormal basis for P, we apply the Gram-Schmidt process to x_1 and x_2. We set

    u_1 = x_1/|x_1| = (1, -1, 2)/√6 = (1/√6, -1/√6, 2/√6),

and then compute

    y_2 = x_2 - (x_2 · u_1)u_1
        = (1, 0, -1) - (-1/√6)(1/√6, -1/√6, 2/√6)
        = (1, 0, -1) + (1/6, -1/6, 1/3) = (7/6, -1/6, -2/3).

Then

    u_2 = y_2/|y_2| = y_2/√(66/36) = (1/√66)(7, -1, -4).

Thus the plane P can be represented as all linear combinations

    u x_1 + v x_2

or all linear combinations

    s u_1 + t u_2.

The relationship between the two pairs of vectors is shown in Fig. 13. [Figure 13]
Example 4. Let P_n[-1, 1] be the vector space of polynomials f(x) = a_0 + a_1 x + … + a_n x^n defined for -1 ≤ x ≤ 1. We define an inner product by

    ⟨f, g⟩ = ∫_{-1}^{1} f(x)g(x) dx.

We have seen in Example 5(b) of Section 4 that the functions 1, x, …, x^n form a basis for P_n. To find an orthonormal basis, we observe that ⟨1, 1⟩ = 2 and therefore set u_1(x) = 1/√2. Then

    y_2(x) = x - ⟨x, 1/√2⟩(1/√2) = x,

because ⟨x, 1/√2⟩ = ∫_{-1}^{1} (x/√2) dx = 0. We then compute

    u_2(x) = x/⟨x, x⟩^{1/2} = x / ( ∫_{-1}^{1} x^2 dx )^{1/2} = √(3/2) x.

Next set

    y_3(x) = x^2 - ⟨x^2, u_1(x)⟩u_1(x) - ⟨x^2, u_2(x)⟩u_2(x) = x^2 - 1/3.

Then

    u_3(x) = (x^2 - 1/3) / ( ∫_{-1}^{1} (x^2 - 1/3)^2 dx )^{1/2} = √(45/8)(x^2 - 1/3) = √(5/8)(3x^2 - 1).

The process can be continued indefinitely. (The resulting polynomials are called the normalized Legendre polynomials, and another method for computing them is given in Section 7 of Chapter 5.)
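The same Gram-Schmidt computation can be carried out exactly with a computer algebra system. The sketch below assumes Python with SymPy; the printed expressions should agree with 1/√2, √(3/2)x, and √(45/8)(x^2 - 1/3), though SymPy may display them in an algebraically equivalent form.

    import sympy as sp

    x = sp.symbols('x')

    def inner(f, g):
        # The inner product of Example 4: integral of f*g over [-1, 1].
        return sp.integrate(f * g, (x, -1, 1))

    basis = []
    for p in [sp.Integer(1), x, x**2]:
        y = p - sum(inner(p, u) * u for u in basis)
        basis.append(sp.simplify(y / sp.sqrt(inner(y, y))))

    print(basis)   # the first three normalized Legendre polynomials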

Of course, if we start with two bases for the same vector space 𝒱 and apply the Gram-Schmidt process to them, we will in general get different orthonormal bases. In particular, if one of the two given bases is already orthonormal and we apply the Gram-Schmidt process to the other one, we may get two different bases. The following theorem gives a condition under which the two resulting orthonormal bases are the same except perhaps for orientation. It is necessary to assume that 𝒱 is a vector space with scalar multiplication by real numbers.

8.3 Theorem

Let {u_1, …, u_n} and {v_1, …, v_n} be two orthonormal sets in a real vector space 𝒱. If the subspaces spanned by {u_1, …, u_k} and {v_1, …, v_k} are the same for k = 1, 2, …, n, then u_k = ±v_k for k = 1, 2, …, n.

Proof. Let v_k be any one of the v's. Since by assumption v_k is in the subspace spanned by {u_1, …, u_k}, we can write

    v_k = r_1 u_1 + … + r_k u_k

for some real numbers r_i. However, v_k is orthogonal to each of {u_1, …, u_{k-1}} because, by assumption, it is orthogonal to {v_1, …, v_{k-1}}, and these two sets span the same subspace. If we form the dot product of both sides of the above equation with u_j for 1 ≤ j ≤ k, we then get

    0 = v_k · u_j = r_j,    j = 1, …, k - 1,
    v_k · u_k = r_k.

Thus v_k = (v_k · u_k)u_k. Since both the u's and the v's have length 1, we must have

    |v_k · u_k| = 1.

It follows that v_k · u_k = ±1; so v_k = ±u_k.

Example 5. Consider the natural basis e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1) together with the basis x_1 = (3, 0, 0), x_2 = (1, -1, 0), x_3 = (1, 1, 1) for R^3. Clearly, e_1 and x_1 span the same subspace of R^3. Similarly, the pairs {e_1, e_2} and {x_1, x_2} span the xy-plane, and the complete bases both span all of R^3. It follows from Theorem 8.3 that applying the Gram-Schmidt process to {x_1, x_2, x_3} will result in an orthonormal basis of the form {±e_1, ±e_2, ±e_3}. As a matter of fact, we find that we get {e_1, -e_2, e_3}. We leave the verification as an exercise. The vectors are shown in Fig. 14. [Figure 14]
EXERCISES

1. (a) Find a vector (x, y, z) such that the triple of vectors (1, 1, 1), (-1, 1/2, 1/2), and (x, y, z) forms an orthogonal basis for R^3.
   (b) Normalize the basis found in part (a).

2. The vectors (1, 1, 1) and (1, 2, 1) span a plane P in R^3. Find an orthonormal basis for P by using the Gram-Schmidt process.

3. The three vectors (1, 1, 2, 1), (1, -1, 0, 0), and (0, 1, 0, 2) form a basis for a subspace S of R^4. Use the Gram-Schmidt process to find an orthonormal basis for S.

4. Let P_2 be the 3-dimensional vector space of quadratic polynomials f(x) defined for 0 ≤ x ≤ 1. If P_2 has the inner product

       ⟨f, g⟩ = ∫_0^1 f(x)g(x) dx,

   find an orthonormal basis for P_2. [Hint. One basis for P_2 consists of {1, x, x^2}.]

5. Show that the application of the Gram-Schmidt process to the three vectors x_1, x_2, x_3 in Example 5 of the text gives the triple e_1, -e_2, e_3.

6. Prove that applying the Gram-Schmidt process to an orthonormal set gives the same orthonormal set back again.

7. Let C[-π, π] be the vector space of complex-valued functions f(x) defined for -π ≤ x ≤ π. Let ⟨f, g⟩ be defined for f and g in C[-π, π] by

       ⟨f, g⟩ = ∫_{-π}^{π} f(x) \overline{g(x)} dx.

   (a) Show that ⟨f, g⟩ is a complex inner product.
   (b) Show that

       ⟨e^{imx}, e^{inx}⟩ = 2π if m = n, and 0 if m ≠ n.

   (c) Show that the vector subspaces of C[-π, π] spanned by the following two sets are the same:

       cos kx, sin kx,    k = 0, 1, 2, …, n.    (1)

       e^{ikx},    k = 0, ±1, ±2, …, ±n.    (2)

8. The conclusion of Theorem 8.3 is that u_k = ±v_k. What conclusion is possible if 𝒱 is a complex vector space? [See Exercise 11, Section 7.]

SECTION 9

EIGENVECTORS

In this section we shall find a natural way to associate a basis for a vector space 𝒱 with a given linear function f from 𝒱 to 𝒱. Suppose that there is a nonzero vector x in 𝒱 and a number λ such that

    f(x) = λx.

Then x is called an eigenvector of the linear function f, and λ is called its associated eigenvalue. (The terms characteristic vector and characteristic root are sometimes used.) Since f is linear, the foregoing equation will always have the trivial solution x = 0 for any λ; so we rule that out as being uninteresting. Of course, also by the linearity of f, if x is an eigenvector corresponding to λ, then so is cx for any number c ≠ 0.

Example 1. Let f be the linear function from R^2 to R^2 defined by the matrix

    [1  1]
    [4  1].

Thus

    f(x, y) = (x + y, 4x + y).

It is easy to verify that

    f(1, 2) = (3, 6) = 3(1, 2)    and    f(1, -2) = (-1, 2) = -(1, -2).

That is, the vector (1, 2) in R^2 is an eigenvector corresponding to the eigenvalue 3, and the vector (1, -2) is an eigenvector corresponding to the eigenvalue -1. Of course, nonzero multiples of these two vectors will be eigenvectors with the same two eigenvalues.

Before discussing how to find eigenvectors, we shall see why they are useful. Suppose that 𝒱 is a vector space, f is a linear function from 𝒱 to 𝒱, and suppose that 𝒱 has a basis {x_1, x_2, …, x_n} consisting of eigenvectors of f, that is,

    f(x_k) = λ_k x_k,    k = 1, 2, …, n,

for some numbers λ_k. It follows that representing an arbitrary element of 𝒱 using this basis leads to a particularly simple form for f. Suppose

    x = c_1 x_1 + … + c_n x_n.

Then, using the linearity of f and the fact that the x's are eigenvectors, we have

    f(x) = c_1 f(x_1) + … + c_n f(x_n)
         = c_1 λ_1 x_1 + … + c_n λ_n x_n.

Thus, relative to the basis of eigenvectors, the action of f is simply to multiply each coefficient by the corresponding eigenvalue. In short,

9.1    f( Σ_{k=1}^{n} c_k x_k ) = Σ_{k=1}^{n} c_k λ_k x_k,

where

9.2    f(x_k) = λ_k x_k,    k = 1, 2, …, n.

In geometric terms, using the eigenvectors for a basis allows the action of the function f to be interpreted as a succession of numerical multiplications, one on each of the basis vectors.

Example 2. Returning to the linear function of Example 1, we express an arbitrary vector in R^2 as a linear combination of the two eigenvectors x_1 = (1, 2) and x_2 = (1, -2):

    (x, y) = c_1(1, 2) + c_2(1, -2).

The corresponding eigenvalues are 3 and -1; so from

    f(x, y) = c_1 f(1, 2) + c_2 f(1, -2)

we get

    f(x, y) = 3c_1(1, 2) - c_2(1, -2).

Figure 15 shows the effect of f on each of the two eigenvectors x_1 and x_2. [Figure 15] It follows that the image f(x) of any vector x can be constructed by geometric methods. Furthermore, we can express the effect of f in words by saying that f is a combination of two transformations:

1. Stretching away from the line through x_2 along the lines parallel to x_1.

2. Reflection in the line through x_1 and along the lines parallel to x_2.

For this analysis to work, it was essential that the eigenvectors of f span all of R^2.

Example 3. To find the eigenvectors and corresponding eigenvalues for the function f of Examples 1 and 2 (in case they had not been given), we would proceed as follows. We need to find vectors x ≠ 0 and numbers λ such that

    f(x) - λx = 0.

In matrix form, this equation is

    [1  1][x]     [x]   [0]
    [4  1][y] - λ [y] = [0]

or

    [1-λ    1 ][x]   [0]
    [ 4    1-λ][y] = [0].    (1)

It is clear that if the foregoing 2-by-2 matrix has an inverse, then the only solutions are x = 0 and y = 0. Hence we must try to find values of λ for which the matrix fails to have an inverse. This will occur precisely when

    det [1-λ    1 ]
        [ 4    1-λ] = 0,

that is, when λ satisfies (1 - λ)^2 - 4 = 0. This quadratic equation in λ has roots λ = 3 and λ = -1, as we can see by inspection or otherwise. To find eigenvectors corresponding to λ = 3 and λ = -1, we must find x and y, not both zero, satisfying Equation (1). Thus we consider

    λ = 3:  [-2   1][x]   [0]        λ = -1:  [2  1][x]   [0]
            [ 4  -2][y] = [0]                 [4  2][y] = [0].

These equations reduce to

    λ = 3:   -2x + y = 0,
    λ = -1:   2x + y = 0.

It follows that there are many solutions; but all we need is one for each eigenvalue. We choose for simplicity

    λ = 3:   (x, y) = (1, 2),
    λ = -1:  (x, y) = (1, -2),

though any nonzero numerical multiple of either vector would do.

The method of Example 3 can be summarized as follows. To find eigenvalues of a linear function f from R^n to R^n, solve the characteristic equation

9.3    det(A - λI) = 0,

where A is the matrix of f. Then, for each eigenvalue λ_1, …, λ_n, try to find nonzero vectors x_1, …, x_n that satisfy the matrix equation

9.4    (A - λ_k I)x_k = 0.

Because Equations 9.3 and 9.4 are expressed in terms of the matrix A of f, we sometimes refer to eigenvectors and eigenvalues of the matrix rather than the function f. Of course Equation 9.4 is just Equation 9.2 in matrix form; but Equation 9.2 also applies to linear functions that are not representable by matrices.
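For numerical work, Equations 9.3 and 9.4 are usually solved by a library routine rather than by hand. Here is a minimal sketch, assuming Python with NumPy, applied to the matrix of Examples 1-3.

    import numpy as np

    A = np.array([[1., 1.],
                  [4., 1.]])

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)    # 3 and -1, in some order
    print(eigenvectors)   # columns are eigenvectors, scaled to length 1

    # Check A x = lambda x for each eigenpair.
    for lam, x in zip(eigenvalues, eigenvectors.T):
        print(np.allclose(A @ x, lam * x))   # True, True

The eigenvectors returned are multiples of (1, 2) and (1, -2), which is all the theory promises, since eigenvectors are only determined up to a nonzero factor.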

Example 4. This example will show one way in which a linear function can fail to have eigenvectors that provide a suitable basis for the vector space on which f acts. Consider the linear function f from R^2 to R^2 defined, for fixed θ, 0 ≤ θ < 2π, by

    f [x]   [cos θ  -sin θ][x]
      [y] = [sin θ   cos θ][y].

This function carries

    [1]        [cos θ]            [0]        [-sin θ]
    [0]  into  [sin θ]    and     [1]  into  [ cos θ],

and so represents a rotation counterclockwise about the origin through an angle θ. Equation 9.3 becomes

    det [cos θ - λ    -sin θ  ]
        [  sin θ     cos θ - λ] = 0

or

    λ^2 - 2λ cos θ + 1 = 0.

This quadratic equation for λ has only complex conjugate roots of the form

    cos θ ± i sin θ = e^{±iθ}.

Since we are looking for nonzero vectors x in R^2 satisfying

    f(x) = e^{±iθ} x,

we conclude that f has no eigenvectors in R^2, unless θ = 0 or θ = π. In the first case we get the identity transformation, which clearly has every nonzero vector as an eigenvector with eigenvalue 1. In the second case we get f(x) = -x, which is a reflection in the origin and has every nonzero vector as an eigenvector with eigenvalue -1. A moment's thought shows why, in general, a rotation cannot have eigenvectors in R^2. See Exercise 8 for an example of a linear function which fails in a different way to provide a basis of eigenvectors.
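A numerical eigenvalue routine makes the same point: for a rotation matrix it returns the complex conjugate pair e^{±iθ} rather than real eigenvalues. The sketch below assumes Python with NumPy; the angle 0.7 is an arbitrary choice other than 0 or π.

    import numpy as np

    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    eigenvalues, _ = np.linalg.eig(R)
    print(eigenvalues)                              # complex pair
    print(np.exp(1j * theta), np.exp(-1j * theta))  # e^{i theta}, e^{-i theta}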
We conclude by considering the possibility that the eigenvectors of a linear function f from 𝒱 to 𝒱 can be chosen so that they form an orthonormal basis. Since eigenvectors are only determined to within a numerical multiple, we can always normalize them to have length 1. We can achieve the orthogonality if the function f is symmetric with respect to an inner product on 𝒱, that is,

    f(x) · y = x · f(y)

for all x and y in 𝒱. In particular, if f is a linear function from R^n to R^n, it has a matrix A = (a_{ij}), and we can write the symmetry equation as

    Ax · y = x · Ay.

Example 6. If f is a linear function from R^2 to R^2 with a matrix

    A = [a  b]
        [b  c]

that is symmetric about its main diagonal, then f is a symmetric transformation. We can compute

    Ax · y = (a x_1 + b x_2, b x_1 + c x_2) · (y_1, y_2)
           = a x_1 y_1 + b(x_2 y_1 + x_1 y_2) + c x_2 y_2.

Similarly, we compute

    x · Ay = (x_1, x_2) · (a y_1 + b y_2, b y_1 + c y_2)
           = a x_1 y_1 + b(x_1 y_2 + x_2 y_1) + c x_2 y_2.

Since the two dot-products are equal, we conclude that f is symmetric.

For a symmetric transformation we have the following theorem.

9.5 Theorem

Let f be a symmetric linear function from 𝒱 to 𝒱, where 𝒱 is a vector space with an inner product. If x_1 and x_2 are eigenvectors of f corresponding to distinct eigenvalues λ_1 and λ_2, then x_1 and x_2 are orthogonal.

Proof. We need to show that x_1 · x_2 = 0, assuming that f(x_1) = λ_1 x_1 and f(x_2) = λ_2 x_2. We have

    f(x_1) · x_2 = (λ_1 x_1) · x_2 = λ_1(x_1 · x_2)

and

    x_1 · f(x_2) = x_1 · (λ_2 x_2) = λ_2(x_1 · x_2).

Because f is symmetric, f(x_1) · x_2 = x_1 · f(x_2); so

    λ_1(x_1 · x_2) = λ_2(x_1 · x_2).

Hence (λ_1 - λ_2)(x_1 · x_2) = 0. Since λ_1 and λ_2 are assumed not equal, we must have x_1 · x_2 = 0.

A more general version of Theorem 9.5 is given in Chapter 5, Section 7, where it is applied to a different class of examples.

Example 7. The function g from R^2 to R^2 with matrix

    [1  2]
    [2  1]

has eigenvalues computed from Equation 9.3 by

    det [1-λ    2 ]
        [ 2    1-λ] = 0,

that is, (1 - λ)^2 - 4 = 0. By inspection the eigenvalues are λ = 3 and λ = -1. (Note that the transformation g is not the same as the transformation f of Example 1, even though they have the same eigenvalues.) Equation 9.4 for the eigenvectors has two interpretations, depending on which eigenvalue is used. We have

    λ = 3:  [-2   2][x]   [0]        λ = -1:  [2  2][x]   [0]
            [ 2  -2][y] = [0]                 [2  2][y] = [0].

These equations reduce to

    λ = 3:   -2x + 2y = 0,
    λ = -1:   2x + 2y = 0.

We find solutions

    λ = 3:   (x, y) = (1, 1),
    λ = -1:  (x, y) = (1, -1).

The vectors x_1 = (1, 1), x_2 = (1, -1) are clearly orthogonal, and they can be normalized to give u_1 = (1/√2, 1/√2), u_2 = (1/√2, -1/√2). Of course the normalized vectors u_1 and u_2 are eigenvectors also.

We can now take advantage of the fact that u_1 and u_2 are eigenvectors and also that they form an orthonormal set. The latter fact enables us to express any vector in R^2 as a linear combination of u_1 and u_2 very simply. From the previous section we have

    x = (x · u_1)u_1 + (x · u_2)u_2

for any x in R^2. Now using the fact that u_1 and u_2 are eigenvectors of the linear function g, we can write

    g(x) = (x · u_1)g(u_1) + (x · u_2)g(u_2).

But g(u_1) = 3u_1 and g(u_2) = -u_2, so

    g(x) = 3(x · u_1)u_1 - (x · u_2)u_2.

Thus the action of g has a geometric description like that given for f in Example 2 of this section. The only difference is that the stretching and reflection takes place along different lines: perpendicular lines in the case of g, nonperpendicular lines in the case of f.
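For symmetric matrices there is a library routine designed to exploit Theorem 9.5; it returns real eigenvalues together with an orthonormal set of eigenvectors. The following sketch assumes Python with NumPy and repeats Example 7, with an arbitrary test vector x.

    import numpy as np

    G = np.array([[1., 2.],
                  [2., 1.]])

    # eigh is intended for symmetric matrices: real eigenvalues,
    # orthonormal eigenvectors as the columns of U.
    eigenvalues, U = np.linalg.eigh(G)
    print(eigenvalues)     # -1 and 3
    print(U.T @ U)         # identity matrix: the eigenvectors are orthonormal

    # g(x) computed through the eigenvector basis, as in the text.
    x = np.array([2., 5.])
    coeffs = U.T @ x                                        # x . u_i for each i
    print(np.allclose(G @ x, U @ (eigenvalues * coeffs)))   # True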

EXERCISES
1. The linear function /from Jl
2
to R2 with matrix

'1 12\

has eigenvalues 7 and 5. Which of the following vectors is an eigenvector


of/? For those that are, what is the corresponding eigenvalue?

2. Find all
;) n< (::)•

the eigenvalues of each of the linear functions defined by the


D c

following matrices.

4\ /l 0\
(b)
10/ (d) 2 1

\0 1 2/

3. For each matrix in Exercise 2, find an eigenvector corresponding to each


eigenvalue.

4. Show that 0 is an eigenvalue of a linear function f if and only if f is not one-to-one.
5. Show that if f is a one-to-one linear function having λ for an eigenvalue, then f^{-1}, the inverse of f, has 1/λ for an eigenvalue.

6. Let f be a linear function having λ for an eigenvalue.
   (a) Show that λ^2 is an eigenvalue for f ∘ f.
   (b) Show that λ^n is an eigenvalue for the function got by composing f with itself n times.
7. Let C^{(∞)}(R) be the vector space of infinitely often differentiable functions f(x) for x in R. Then the differential operator D^2 acts linearly from C^{(∞)}(R) to C^{(∞)}(R).
   (a) Show that for any real number λ the functions cos λx and sin λx are eigenvectors of D^2 corresponding to the eigenvalue -λ^2.
   (b) Let C^{(∞)}[0, π] be the subspace of C^{(∞)}(R) consisting of functions f such that f(0) = f(π) = 0. Show that if D^2 is restricted to acting on C^{(∞)}[0, π], then its only eigenfunctions are of the form sin kx, corresponding to λ = -k^2, where k is an integer.

8. Let h be the linear function from R^2 to R^2 with matrix

       [1  1]
       [0  1].

   (a) Show that the eigenvectors of h span only a 1-dimensional subspace of R^2. Thus the geometric action of h cannot be analyzed by looking only at what it does to eigenvectors.
   (b) The linear function h is called a shear transformation. Give a geometric description of the action of h on R^2.

9. (a) Find the eigenvalues and a corresponding pair of eigenvectors for the
/from R 2 to 31 2 having matrix
linear function

(b) Show that the eigenvectors of the function /in part (a) form an orthog-
2
onal basis for 3l , and use this fact to give a geometric description of the
2
action of/ on 31 .

(c) Generalize the results you found for (a) and (b) to any linear function/
from 31" to 31" having a diagonal matrix diag (a u a 2 , . . . , c„).

10. Find the eigenvalues of the function g on R^2 with matrix

        [1  2]
        [1  1].

    Show that the corresponding eigenvectors span R^2, and describe the action of g.
11. Let C^2 be the complex vector space of pairs z = (z_1, z_2) of complex numbers, and suppose that C^2 has the complex inner product

        z · w = z_1 w̄_1 + z_2 w̄_2.

    (a) Show that the linear function f defined on C^2 by the matrix

        [cos θ  -sin θ]
        [sin θ   cos θ]

    has eigenvectors and corresponding eigenvalues

        (1, -i),  λ = e^{iθ};    (1, i),  λ = e^{-iθ}.

    (b) Show that if sin θ = 0, then every nonzero vector in C^2 is an eigenvector of the function f in part (a).
    (c) Show that the eigenvectors in part (a) form a basis for C^2.
    (d) Show that the eigenvectors in part (a) are orthogonal with respect to the complex inner product in C^2.

12. Explain in geometric terms why a rotation about the origin in R^2 through an angle θ, 0 ≤ θ < 2π, has no eigenvectors in R^2 unless θ = 0 or θ = π.

13. Suppose that 𝒲 is a complex vector space with a complex inner product, so that ⟨z, w⟩ = \overline{⟨w, z⟩}. Then a linear function f from 𝒲 to 𝒲 is called Hermitian symmetric if ⟨f(z), w⟩ = ⟨z, f(w)⟩, for z, w in 𝒲.
    (a) Show that if f is Hermitian symmetric and has λ for a complex eigenvalue, then λ must actually be real.
    (b) Show that if 𝒲 has complex dimension 2 and f is given by a 2-by-2 complex matrix

        [a  β]
        [γ  d],

    with a and d real, then f is Hermitian symmetric if and only if β = γ̄.


14. If
(x(t)
x(/) =
\yit)

defines a function of the real variable t taking values in ft 2 , then we define


the derivative of x by
(x'(f)\
x'(0 =
\y'(t))
    (a) Let

        A = [a  c]
            [b  d].

    Show that the vector equation

        x' = Ax

    is the same as the system of first-order differential equations

        x' = ax + cy,
        y' = bx + dy.

    (b) Show that x(t) = e^{λt} c, where c is a constant vector, is a solution of the vector equation x' = Ax if c is an eigenvector of the matrix A with corresponding eigenvalue λ.
    (c) If the matrix A is

        [1  2]
        [2  1],

    find solutions of the vector differential equation x' = Ax in the form

        x = e^{λ_1 t} c_1 + e^{λ_2 t} c_2.

    [Note. The eigenvectors c_1 and c_2 are determined only to within a nonzero numerical multiple.]
    (d) Show that the second-order differential equation

        (D - r_1)(D - r_2)y = 0

    is equivalent to the system of first-order equations

        x' = r_1 x,
        y' = x + r_2 y,

    where x = (D - r_2)y. Then find the matrix A that can be used to express this system in the form of part (a).
    (e) Show that the characteristic equation of the matrix A found in part (d) is the same as the characteristic equation of the operator (D - r_1)(D - r_2) as defined in Section 6.

15. (a) Show that the functions cos kx and sin kx are eigenvectors of the differential operator D^2 acting on C^{(∞)}. What are the corresponding eigenvalues?
    (b) If we restrict the operator D^2 to real-valued functions defined for -π ≤ x ≤ π, we can define an inner product by

        ⟨f, g⟩ = ∫_{-π}^{π} f(x)g(x) dx.

    Show that the eigenvectors cos kx and sin lx are orthogonal with respect to this inner product for k, l = 1, 2, 3, …. Then use Theorem 8.1 to conclude that the eigenvectors are linearly independent.
    (c) Show that the functions e^{ikx} and e^{-ikx} are eigenvectors of the differential operator D^2 acting on complex-valued functions. What are the corresponding eigenvalues?
    (d) Using the complex inner product

        ⟨f, g⟩ = ∫_{-π}^{π} f(x) \overline{g(x)} dx,

    show that the functions e^{ikx} are orthogonal, for k = 0, ±1, ±2, …. Can you conclude from Theorem 8.1 that the complex exponentials are linearly independent?

SECTION 10

COORDINATES

In this section we show how any problem about finite-dimensional vector spaces and linear functions on them can be reduced to an equivalent problem about R^n and matrices. This is done by introducing coordinates in the vector spaces. The familiar coordinates of 2- and 3-dimensional geometry may be considered a special case of what we are about to describe. The following theorem is fundamental.

10.1 Theorem

Let V = (v_1, …, v_n) be a basis for a vector space 𝒱, so dim(𝒱) = n. For any vector x in 𝒱 there is one and only one n-tuple of numbers r_1, …, r_n such that x = r_1 v_1 + … + r_n v_n.

Proof. Since the v's span 𝒱, x is a linear combination of them; so there is at least one n-tuple (r_1, …, r_n) satisfying the condition. If (s_1, …, s_n) is any other n-tuple such that

    s_1 v_1 + … + s_n v_n = x = r_1 v_1 + … + r_n v_n,

then

    (r_1 - s_1)v_1 + … + (r_n - s_n)v_n = 0.

Since the v's are independent, the numbers r_i - s_i must all be zero, and so (s_1, …, s_n) is the same as (r_1, …, r_n).

Let V = (v_1, …, v_n) be a basis for a vector space 𝒱, and let x be a vector in 𝒱. The unique n-tuple (r_1, …, r_n) such that x = r_1 v_1 + … + r_n v_n is the n-tuple of coordinates of x with respect to the basis V. The vector in R^n with entries r_1, …, r_n is the coordinate vector of x with respect to the basis.

Example 1. (a) The vectors v_1 = (-1, 1, 0) and v_2 = (0, -1, 1) form a basis for the subspace 𝒱 of R^3 consisting of all vectors the sum of whose entries is 0. The vector x = (1, -3, 2) is in 𝒱. An easy calculation shows that x = -v_1 + 2v_2, so the coordinates of x with respect to the basis v_1, v_2 are (-1, 2).

(b) In the plane, let v_1 be the vector of unit length along the horizontal axis, and v_2 the vector of unit length at an angle 60° counterclockwise from v_1. Let x be the vector of unit length 60° beyond v_2, shown in Fig. 16. [Figure 16]

By geometry, CB is parallel to OA and OC is parallel to AB (why?); so OABC is a parallelogram and v_2 = x + v_1. Hence x = -v_1 + v_2 and the coordinates of x with respect to this basis are (-1, 1).
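When the basis is not orthonormal, coordinates are most easily found by solving a system of linear equations. The sketch below, assuming Python with NumPy, recomputes the coordinates in Example 1(a); because the basis spans only a subspace of R^3, the system is rectangular and a least-squares routine is a convenient way to solve it (the solution is exact here since x lies in the subspace).

    import numpy as np

    # Columns of B are the basis vectors v1 and v2 of the subspace.
    B = np.array([[-1.,  0.],
                  [ 1., -1.],
                  [ 0.,  1.]])
    x = np.array([1., -3., 2.])

    r, residual, rank, _ = np.linalg.lstsq(B, x, rcond=None)
    print(r)                        # [-1.  2.]
    print(np.allclose(B @ r, x))    # True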

Coordinates provide a way of representing vectors in a concrete form suitable for calculations. They also provide a concrete way of representing linear functions. The key point is that if the value of a linear function is known for each vector of a basis, then the function is known completely. This follows from the fact that any vector x can be expressed as a linear combination r_1 v_1 + … + r_n v_n of the vectors in a basis. Then f(x) is the combination r_1 f(v_1) + … + r_n f(v_n) of the vectors f(v_j) with the same coefficients, because f is linear. Suppose 𝒱 —f→ 𝒲 is a given linear function and that V = (v_1, …, v_n) and W = (w_1, …, w_m) are bases in 𝒱 and 𝒲, respectively. The possibility that 𝒱 and 𝒲 are the same space is not ruled out. The matrix of f with respect to the bases V and W is defined to be the m-by-n matrix whose jth column is the coordinate vector of f(v_j) with respect to the basis W. Thus if we denote by A = (a_{ij}) the matrix of f relative to V and W, then the entries in the jth column of A appear as the coefficients a_{ij} in the equation

    f(v_j) = a_{1j} w_1 + … + a_{mj} w_m.

The matrix A contains all the information needed to evaluate f(x) for any x. For if x is represented as

    x = r_1 v_1 + … + r_n v_n,

with coordinate vector r = (r_1, …, r_n), then

    f(x) = Σ_{j=1}^{n} r_j f(v_j) = Σ_{j=1}^{n} r_j ( Σ_{i=1}^{m} a_{ij} w_i ) = Σ_{i=1}^{m} ( Σ_{j=1}^{n} a_{ij} r_j ) w_i.

We can then recognize the coefficients in the representation of f(x) relative to W as the entries in the product Ar of the matrix A and the coordinate vector r. We remark that when 𝒱 and 𝒲 are the same space it is usually appropriate, though not logically necessary, to take V and W to be the same bases. We do this in all the examples for which 𝒱 is the same as 𝒲.

Example 2. (a) If f is the linear function from R^n to R^m given by multiplication by a matrix A, then the matrix of f with respect to the natural bases in R^n and R^m is A itself. This is just a rephrasing of Theorem 4.2 of Chapter 1.

(b) The function f from R^3 to R^3 defined by

    f(x_1, x_2, x_3) = (x_2, x_3, x_1)

takes the subspace 𝒱 spanned by

    v_1 = (1, -1, 0)    and    v_2 = (0, -1, 1)

into itself, and it may be considered as a linear function from 𝒱 to 𝒱. We have

    f(v_1) = (-1, 0, 1) = -v_1 + v_2,

and f(v_2) = (-1, 1, 0) = -1·v_1 + 0·v_2, so the matrix of f with respect to the basis V is

    [-1  -1]
    [ 1   0].

(c) Let v_1, v_2 be the basis in R^2 taken in Example 1(b), and let f be a rotation of 60° counterclockwise. Then f(v_1) = v_2, and f(v_2), the vector x in Fig. 16, is equal to -v_1 + v_2. The matrix of f with respect to the basis V is therefore

    [0  -1]
    [1   1].
(d) Let P_2 be the vector space of quadratic polynomials, and let 𝒲 be R^4. Define f(p), for p a polynomial in P_2, to be the vector in R^4 with entries p(1), p(-1), p(2), and p(3). Take the polynomials 1, x, x^2 as a basis for P_2 and the natural basis as basis for R^4. Then f takes the polynomial 1 into the 4-tuple (1, 1, 1, 1); it takes the polynomial x into the 4-tuple (1, -1, 2, 3); and it takes x^2 into (1, 1, 4, 9). Its matrix with respect to the given bases is therefore

    [1   1  1]
    [1  -1  1]
    [1   2  4]
    [1   3  9].

Given a basis V = (v_1, …, v_n) for 𝒱, we may define a function c_V from 𝒱 to R^n by setting c_V(x) equal to the coordinate vector of x with respect to V for every x in 𝒱. The function c_V is called the coordinate map for the basis V. The range of c_V is obviously all of R^n because, for any n-tuple r_1, …, r_n, we can take x = r_1 v_1 + … + r_n v_n; then c_V(x) is the given n-tuple. By Theorem 10.1, c_V is one-to-one. Therefore there is an inverse function c_V^{-1} from R^n to 𝒱. It is easily seen to be given by the formula

    c_V^{-1}(r_1, …, r_n) = r_1 v_1 + … + r_n v_n.

An obvious but important fact is:

10.2 Theorem

Coordinate maps and their inverses are linear functions.

Proof. To show that c_V is linear, we must verify that c_V(rx + sy) = r c_V(x) + s c_V(y). This follows at once from the fact that if x = Σ_{i=1}^{n} a_i v_i and y = Σ_{i=1}^{n} b_i v_i, then rx + sy = Σ_{i=1}^{n} (r a_i + s b_i)v_i. The linearity of c_V^{-1} is an immediate consequence of the same equation.

Note that if 𝒱 = R^n, the coordinate map c for the natural basis is the identity function c(x) = x.

A linear function 𝒱 —f→ 𝒲 which is one-to-one and whose range is all of 𝒲 is called an isomorphism, and two spaces such that there exists an isomorphism between them are said to be isomorphic. We have shown that coordinate maps are isomorphisms, which proves:

10.3 Theorem

Every n-dimensional real vector space is isomorphic to R^n.

The significance of these concepts lies in the fact that isomorphic spaces are alike in all respects, when considered as abstract vector spaces.
Any statement true for one space (provided it can be formulated entirely in terms of addition and numerical multiplication) can be carried over by an isomorphism to give a true corresponding statement for an isomorphic space. For example, if 𝒱 —f→ 𝒲 is an isomorphism, then an equation

    y = Σ_{i=1}^{n} a_i x_i

is true in 𝒱 if and only if

    f(y) = Σ_{i=1}^{n} a_i f(x_i)

is true in 𝒲.

By using coordinate maps we can give an alternative description of the matrix of a linear function. Given 𝒱 —f→ 𝒲, and bases V = (v_1, …, v_n) in 𝒱 and W = (w_1, …, w_m) in 𝒲, the composition c_W ∘ f ∘ c_V^{-1} is a function from R^n to R^m. It is therefore given by multiplication by some m-by-n matrix A; we claim that A is precisely the same as the matrix of f with respect to the bases V and W. To prove the assertion, note that the jth column of A is (c_W ∘ f ∘ c_V^{-1})(e_j) = c_W(f(v_j)), since c_V^{-1}(e_j) = v_j. Now c_W(f(v_j)) is (by definition of c_W) the coordinate vector of f(v_j) with respect to the basis W, and this is just how the matrix of f with respect to bases V and W was defined. This description of the matrix of a function makes the proof of the following generalization of Theorem 4.4 of Chapter 1 very easy. It simply says that, in general, matrix multiplication corresponds to composition of functions, provided one uses bases consistently.

10.4 Theorem

Let f and g be linear functions with 𝒰 —f→ 𝒱 and 𝒱 —g→ 𝒲. Suppose bases U, V, and W are given in 𝒰, 𝒱, and 𝒲, that A is the matrix of f with respect to the bases U and V, and that B is the matrix of g with respect to the bases V and W. Then BA is the matrix of g ∘ f with respect to the bases U and W.

Proof. By Theorem 4.3 of Chapter 1 and the characterization of matrices of functions by coordinate maps, this is simply the statement that

    (c_W ∘ g ∘ c_V^{-1}) ∘ (c_V ∘ f ∘ c_U^{-1}) = c_W ∘ (g ∘ f) ∘ c_U^{-1},

which is clear because composition of functions is associative.

The special case in which 𝒰, 𝒱, and 𝒲 are all the same space with the same basis is particularly important. If 𝒱 —f→ 𝒱 has matrix A, then f ∘ f has matrix A^2, f ∘ f ∘ f has matrix A^3, etc.

Example 3. (a) For the function f of Example 2(b), it is clear that f ∘ f ∘ f is the identity, and it is easy to verify that

    [-1  -1]^3   [1  0]
    [ 1   0]   = [0  1].

(b) Differentiation is a linear function from the space of polynomials to itself. If we define D(p) to be p' for elements of the space of quadratic polynomials and use the basis (1, x, x^2) as in Example 2(d), then the matrix of D is

        [0  1  0]
    A = [0  0  2]
        [0  0  0].

It is easy to compute that

          [0  0  2]
    A^2 = [0  0  0]
          [0  0  0]

and A^3 = 0, which corresponds to the fact that the third derivative of any quadratic polynomial is 0.
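The correspondence between composition and matrix multiplication in Example 3(b) is easy to check by machine. This sketch assumes Python with NumPy.

    import numpy as np

    # Matrix of D on quadratic polynomials, with respect to the basis (1, x, x^2).
    A = np.array([[0, 1, 0],
                  [0, 0, 2],
                  [0, 0, 0]])

    print(np.linalg.matrix_power(A, 2))   # matrix of the second derivative
    print(np.linalg.matrix_power(A, 3))   # zero matrix: D^3 kills every quadratic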

The matrix of a linear function with respect to a pair of bases, of


course, depends on the bases. Since the matrix with respect to any pair
completely determines the function, it should be possible to compute the
matrix of a function with respect to one pair of bases from its matrix with
respect to any other.
Let us look at an example. Suppose f has the matrix

    A = [1  2  0]
        [1  1  3]

with respect to the natural bases in R^3 and R^2. Consider a basis V = (v_1, v_2, v_3) in R^3 and the basis W = (w_1, w_2) in R^2, where

    w_1 = (1, 2),    w_2 = (2, 3).

To find the matrix of f with respect to (v_1, v_2, v_3) and (w_1, w_2), we compute

    f(v_1) = 5w_1,    f(v_2) = 8w_1 - 3w_2,    f(v_3) = -4w_1 + 3w_2.

Hence the required matrix for f is

    [5   8  -4]
    [0  -3   3].
Sec. 10 Coordinates 185

A basis V — (v l5 . . . , v„) can be described in terms of another basis


X= (x l5 . . . , x„) by using the matrix whose y'th column is the coordinate
vector of v; with respect to X. This matrix gives a function from %n to 31",
1
which can be recognized as cx ° cy by observing its effect on the natural
basis vectors e} . Multiplication by this matrix converts the K-coordinates
r
of any vector w into the A -coordinates of w, since

c.v( w) = (c x ° cv
1
) ° c v (v/)

The inverse matrix gives the x's in terms of the v's and corresponds to
x
cv o cy . In the example in the preceding paragraph, the matrices giving
V and W in terms of the natural bases in 'Si
2
and Jl 3 are

1 2
and
2 3

Since e 2 = — 3w + x 2w 2 and e2 = 2w x —w 3, the matrix giving the e's in


terms of the w's is

-3

which is easily checked to be

1 2

2 3

In the general situation we have a space 𝒱 with bases X = (x_1, …, x_n) and V = (v_1, …, v_n), and a space 𝒲 with bases Y = (y_1, …, y_m) and W = (w_1, …, w_m). Let P and Q be the matrices giving V in terms of X, and W in terms of Y, and let f be a linear function from 𝒱 to 𝒲 whose matrix with respect to X and Y is A. Then the matrix of f with respect to V and W is Q^{-1}AP. This is most easily seen by working with the coordinate maps. The matrix for f with respect to V and W corresponds to

    c_W ∘ f ∘ c_V^{-1} = (c_W ∘ c_Y^{-1}) ∘ (c_Y ∘ f ∘ c_X^{-1}) ∘ (c_X ∘ c_V^{-1}),

and the three factors on the right correspond to Q^{-1}, A, and P, respectively. For the previous example, with A, P, and Q as above,

    Q^{-1}AP = [5   8  -4]
               [0  -3   3],

which of course is the same result as before.

If 𝒱 and 𝒲 are the same space, and X is the same basis as Y, and V is the same basis as W, then P and Q are the same, and the new matrix for f is P^{-1}AP. Two matrices A and B are said to be similar if there exists an invertible matrix P such that B = PAP^{-1}. A few properties of similarity are presented in the exercises.

Example 4. The derivative function D has the matrix

    A = [0  1  0]
        [0  0  2]
        [0  0  0]

with respect to the basis (1, x, x^2) for the space of quadratic polynomials (see Example 3(b)). The polynomials 1 + x, x + x^2, and 1 + x^2 are also a basis, given in terms of 1, x, x^2 by the matrix

    P = [1  0  1]
        [1  1  0]
        [0  1  1].

The inverse matrix is

    P^{-1} = (1/2) [ 1   1  -1]
                   [-1   1   1]
                   [ 1  -1   1].

With respect to the new basis, D has the matrix

    P^{-1}AP = (1/2) [ 1   3   2]
                     [-1   1   2]
                     [ 1  -1  -2].

1. Let v x Verify that V = (v 1 ,v 2) v 3 )

'0\

is a basis for Jl 3 and find the coordinate vectors of x1 = \

> | , and x, = | I with respect to V.

2
2. Find the matrix of a rotation of 45° in Jl with respect to:

V2
(a) The basis x1 = Arts.
1/V2 V2
(b) The basis of unit vectors in the directions of x x and x 2 .

3. Show that the elementary matrices E_{ij} of shape m-by-n form a basis for the vector space of m-by-n matrices. What is the dimension of the space?
'1 2\
4. Let A Show that each of the following functions from the
1

space of 2-by-2 matrices to itself is linear, and find the matrix of each with
respect to the basis of Exercise 3.

(a) f(X) = AX. (b) g{X) = AXA~\


5. Find bases for the null spaces of the linear functions defined on the space of
2-by-2 matrices by:

(a) f(X) = AX- XA, (b) g(X) - AX - X A\ l

where A is the same as in Exercise 4.

6. Let f: U → W be an isomorphism. Show that v1, . . . , vn form a basis
   for U if and only if f(v1), . . . , f(vn) form a basis for W. This proves that
   isomorphic spaces have the same dimension.

7. For any two vector spaces U, W, let L(U, W) consist of all linear functions
   from U to W. For f and g in L(U, W) and r a number, define functions

       f + g    by    (f + g)(v) = f(v) + g(v)
   and
       rf       by    (rf)(v) = r·f(v).

   Show that f + g and rf are linear and so are in L(U, W), and show that
   with these operations, L(U, W) is a vector space.

8. (a) Show that C(3l", &m ) is isomorphic to the space of all m-by-n matrices,

(b) Show that if dim (1)) = n and dim (ID) = m, and bases are chosen in
'U and 'ID, then assigning to each/in U('U, 'ID) its matrix with respect to
the given bases gives an isomorphism between £(1), ID) and the space
of m-by-n matrices.

9. Let Pn be the space of polynomials of degree less than or equal to n. Show
   that the function t defined by t(p(x)) = p(x + 1) (so that, for example,
   t(2x + 1) = 2x + 3) is a linear function from Pn to itself.

   (a) Write the matrix of t with respect to the basis 1, x, x^2 for the space P2.
   (b) Verify by matrix calculation that on P2

           t = 1 + D + D^2/2,

       where D is the derivative function of Example 3(b). (Here "1" is to be
       interpreted as the identity function, and D^2 as D ∘ D.)
   (c) Show that on Pn

           t = 1 + D + D^2/2 + D^3/6 + . . . + D^n/n!.

       [Hint. Use the definitions of the functions directly, without bringing in
       coordinates.]

10. Show that if A and B are similar, then
    (a) det A = det B.
    (b) A^{-1} is similar to B^{-1} (if the inverses exist).
    (c) A^n is similar to B^n for any positive integer n.

11. The trace of a square matrix A is defined to be the sum of its diagonal
    elements and is written tr (A). For example,

        tr ( 1  2 ) = 1 + 4 = 5,
           ( 3  4 )

    and if I is the n-by-n identity matrix, tr (I) = n.
    (a) Show that if A and B are square and have the same size, then
        tr (AB) = tr (BA).
    (b) Show that if A and B are similar, then tr (A) = tr (B). [Hint. This can
        be proved in one line by applying part (a) to the right pair of matrices.]

12. Suppose f: R^3 → R^3 is a rotation and v1 is a unit vector in the direction of
    the axis of the rotation. If v2 and v3 are chosen to make (v1, v2, v3) an
    orthonormal basis, then f(v1) = v1 and f(v2), f(v3) will be perpendicular
    to v1.
    (a) Show that the matrix of f with respect to (v1, v2, v3) is

            ( 1      0         0    )
            ( 0   cos a    -sin a   )
            ( 0   sin a     cos a   ),

        where a is the angle of the rotation. (Compare Exercise 4 of Section 4,
        Chapter 1.)
    (b) Use the result of Exercise 11 to show that if A is the matrix of a rotation
        of angle a with respect to any basis, then cos a = (1/2)(tr (A) - 1).

13. The linear function on R^3 defined by

        f(e1) = e2,    f(e2) = e3,    f(e3) = e1

    is clearly a rotation about the line x1 = x2 = x3. Find its angle by geo-
    metrical reasoning (what is f ∘ f ∘ f ?) and check the result of Exercise 12.
    [Ans. 120°.]
14. Show that the matrix

3
represents a rotation of Jl and find its axis and angle. (If x is a unit vector
in the direction of the axis, then Ax = x and so (A — I)\ = 0. To show that
A represents a rotation, find its matrix with respect to an orthonormal basis
that includes x.) [Ans. Axis is (3, 3, —1).]
3

Derivatives

SECTION 1

VECTOR FUNCTIONS The purpose of this section is to make the reader familiar with some
specificexamples of vector functions. These examples will be given by
formulas that can be analyzed in terms of the elementary functions of
one-variable calculus. Wherever possible we will give a pictorial descrip-
tion of the function, something that can often be done in more than one
way. In later parts of the book we shall often return to examples like the
ones here, and so develop more familiarity with them, and with the ways
in which they can be applied.

Example 1. The function f: R^3 → R^2 defined by

    f(x, y, z) = ( x^2 + y^2 + z^2 )
                 (    x + y + z    )

has as its domain all of R^3; but, because x^2 + y^2 + z^2 ≥ 0, its image
contains only those vectors in R^2 for which the first coordinate is
nonnegative. A different function g: R^3 → R^2 is defined by

    g(x, y, z) = ( 3x + 4y )
                 (    z    ),

and g is linear because it is given by matrix multiplication acting on the
vectors of R^3. The function h: R^2 → R^2 defined by

    h(x, y) = ( √(1 - x^2 - y^2) )
              (      x + y       )

for vectors (x, y) such that x^2 + y^2 ≤ 1, has as its domain a circular disk D
in R^2. The notation f: D → R^m is often used to denote a function
f: R^n → R^m with domain D contained in R^n.

Example 2. The functions defined by the following formulas have both
domain space and range space equal to the set R of real numbers:

    f(x) = x^3,
    g(x) = √x,        x ≥ 0,
    h(x) = sin x.

The domain and range of f are both equal to R. According to the usual
interpretation of the square-root symbol, the range of g is the set of non-
negative real numbers, and so also is the domain of g. The domain of h is
R, while the range of h is the interval -1 ≤ y ≤ 1. To emphasize what
its domain is, the function g might be denoted

    g: [0, ∞) → R,

where [0, ∞) stands for the interval 0 ≤ x < ∞.

A vector x in R^n whose coordinates are the real numbers x1, . . . , xn
will be written either as a horizontal n-tuple or as a column. Thus, we shall
write both

    ( x1 )
    ( .. )        and        (x1, . . . , xn)
    ( xn )

for the vector x. The practice of writing columns instead of horizontal
tuples arises, of course, from the definition of matrix multiplication. If a
function is determined by a matrix

    ( a  b )
    ( c  d ),

we usually write

    ( a  b ) ( x )
    ( c  d ) ( y )

for the value of the function at

    ( x )        or        (x, y).
    ( y )

A function whose range is a subset of the space R of all real numbers is
called real-valued. Every function f: R^n → R^m defines a set of real-valued
functions f1, . . . , fm, called the coordinate functions of f; we set fi(x)
equal to the ith coordinate of f(x). Thus

    f(x) = (f1(x), . . . , fm(x))

for every x in the domain of f, and fi is called the ith coordinate function
of f.

Example 3. Consider the vector function

    f(x, y, z) = (   x + y + z   )
                 ( xy + yz + zx  )
                 (      xyz      ).

The coordinate functions of f are the three real-valued functions

    f1(x, y, z) = x + y + z,
    f2(x, y, z) = xy + yz + zx,
    f3(x, y, z) = xyz.

The graph of a function f is defined to be the set of ordered pairs
(x, f(x)) where x is in the domain of f. In studying real-valued functions of
one real variable, graphs are a considerable aid to understanding. For
example, the graph of the function defined by f(x) = x^2 - 2 is the set of
all ordered pairs (x, y) with y = x^2 - 2, that is, it is the subset of the
xy-plane consisting of the parabola shown in Fig. 1. As another example, the

Figure 1

graph of a function f: R^2 → R is the subset of R^3 consisting of all points

    (x, y, z) = (x, y, f(x, y)).

As a means of increasing understanding by visualization, a graph is
useful only for functions f: R^n → R^m for which m + n ≤ 3.

Example 4. To sketch the graph of the function f defined by

    f(x, y) = x^2 + y^2,

recall that it consists of all points in R^3 of the form (x, y, z) =
(x, y, x^2 + y^2). One thing we can do is plot several of these, for example,
(1, 1, 2), (0, 1, 1), (1, 0, 1), etc. Another approach is to draw some of the
curves that lie on the graph. In particular, we can draw cross sections of
the graph obtained by holding either x or y fixed and letting the other
one vary. The result is a curve in a plane parallel either to the yz- or xz-
coordinate plane. Two of these are shown in Fig. 2.

Figure 2

We have chosen a fixed value x = a for one of them. The result is the set
of points (x, y, z) = (a, y, a^2 + y^2). To sketch them we can first think of
the points (y, z) = (y, a^2 + y^2) in the yz-plane. Then move the resulting
curve a units in the direction of the positive x-axis. Similarly, choosing
y = 1/2, we sketch the curve of points (x, y, z) = (x, 1/2, x^2 + 1/4). Doing
this for several values of x and y gives a better picture than simply plotting
individual points.
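
Pictures of this kind can also be produced by machine. The fragment below is
a sketch only, assuming Python with the NumPy and Matplotlib libraries; it
plots the graph z = x^2 + y^2 over a square together with two of the
cross-section curves just described.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(-2, 2, 40)
    y = np.linspace(-2, 2, 40)
    X, Y = np.meshgrid(x, y)
    Z = X**2 + Y**2                               # the graph z = f(x, y)

    ax = plt.figure().add_subplot(projection='3d')
    ax.plot_surface(X, Y, Z, alpha=0.5)

    a, b = 1.0, 0.5                               # sample values for the fixed coordinate
    ax.plot(np.full_like(y, a), y, a**2 + y**2)   # cross section x = a
    ax.plot(x, np.full_like(x, b), x**2 + b**2)   # cross section y = b
    plt.show()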

Example 5. Consider the function f: R → R^2 defined by

    f(t) = ( cos t )        for every t in R.
           ( sin t )

Since the length |x| of a vector x = (x, y) is given by |x| = √(x^2 + y^2), we
have |f(t)| = √(cos^2 t + sin^2 t) = 1. Thus the range of f is a subset of the
unit circle |x| = 1 in R^2. The number t is interpreted geometrically as the
angle in radians between the vector f(t) and the positive x-axis. As t runs
through R, the unit circle is covered infinitely often. It follows that the
range of f is the whole unit circle. The circle is not, however, the graph of
f. The latter is a subset of R^3 and is a spiral whose axis is the t-axis. See
Fig. 3(b). What we have done is sketched the points of the form
(t, x, y) = (t, cos t, sin t) that make up the graph of the function.

Figure 3

Example 6. Let the vector function f be defined by

    f(t) = (t, t^2, t^3),        -∞ < t < ∞.

The graph of f is a subset of R^4, so we shall not attempt to draw it. Instead
we shall sketch the range. By setting z = 0, we obtain the equations x = t,
y = t^2, which are equivalent to y = x^2. Thus the projected image, on the
xy-plane, of the range of f is the graph of y = x^2. Similarly, in the yz-plane
we obtain y = z^{2/3}, and in the xz-plane we get z = x^3. From this informa-
tion, we have drawn in Fig. 4 that part of the range of f that lies above the
first quadrant of the xy-plane.

Figure 4

In drawing Fig. 4 we have labeled the axes in the manner usually
associated with a right-hand orientation. Of course, if we were to inter-
change x and y in this labeling, the picture would look different. A
similar change in Fig. 5(b) would result in a ramp that spirals down,
turning always to the left instead of to the right. To make it easy to see the
relationship between the pictures, we have always chosen the right-hand
orientation for the axes in the 3-dimensional ones. For a discussion of
orientation see Section 7 of Chapter 1.

Example 7. Consider the function

    f(u, v) = ( u cos v )        0 ≤ u ≤ 4,
              ( u sin v )        0 ≤ v ≤ 2π.
              (    v    )

The domain of f is the shaded rectangle in Fig. 5(a). To sketch the
range, we proceed as follows. Choose a number a in the interval
0 ≤ u ≤ 4, and set u = a. Then,

    (x, y, z) = (a cos v, a sin v, v),        0 ≤ v ≤ 2π,

and x^2 + y^2 = a^2. We interpret v both as distance along the z-axis and as
the angle between (x, y, 0) and the x-axis. It follows that the image under
f of the line segment u = a, for 0 ≤ v ≤ 2π (see Fig. 5(a)), is the spiral
whose projection on the xy-plane is the circle of radius a and whose axis
is the z-axis (see Fig. 5(b)). Next, choose a number b in the interval

explicitly defines the upper half of the circle of radius 4, and the same curve
is defined parametrically by the pair of functions

    x(t) = 4 cos t,
    y(t) = 4 sin t,        0 ≤ t ≤ π.

Parametric representations of lines in 3-dimensional space have been
studied in Chapter 1. Let x1 and x2 be any two vectors in R^3. If x1 ≠ 0,
the range of the function f: R → R^3 defined by

    f(t) = t x1 + x2,        -∞ < t < ∞,

is a parametrically defined line.

A curve or a surface can also be defined implicitly as a level set of a
function. A set S is a level set of f if for some point k in the range of f, S
consists of all x in the domain of f such that f(x) = k. The most easily
visualized examples occur for functions f from R^2 to R. Then the graph of
f can be pictured as are the surfaces in Fig. 7. The level sets corresponding

Figure 7

to range values k, k1, and k2 are the curves in the xy-plane implicitly
defined by the equations f(x, y) = k or k1 or k2. More concretely, the
level set of the function f(x, y) = x^2 + y^2 + 1 determined by f(x, y) = 2
is a circle, x^2 + y^2 = 1. Corresponding to f(x, y) = 1 we get x^2 + y^2 = 0,
which determines a single point (x, y) = (0, 0). See Fig. 7(b). Level sets
are customarily used on topographical maps to show changes in terrain
elevation at regular intervals.


Example 8. Consider the function f defined by f(x, y, z) =
x^2 + y^2 + z^2. The subset S of R^3 consisting of all points (x, y, z)
that satisfy an equation

    x^2 + y^2 + z^2 = k                                    (1)

is implicitly defined by Equation (1). If k > 0, we get a sphere of
radius √k centered at (0, 0, 0), because x^2 + y^2 + z^2 is the square
of the distance from (x, y, z) to (0, 0, 0). If k = 0, we get the
single point (0, 0, 0). Finally, if k < 0, the corresponding level
set S is empty. In this example the graph of f is a subset of R^4 and
cannot be pictured. Hence it is desirable at least to draw the level
sets and get an idea of which points in the domain are sent by f into certain
fixed points in the range. Some of these are shown in Fig. 8 as a point and
two concentric spheres.

Figure 8

Example 9. Let f: R^3 → R^2 be defined by

    f(x, y, z) = ( x^2 + y^2 + z^2 )
                 (    x + y + z    ).

We shall describe the level set γ implicitly defined in R^3 by the equation
f(x, y, z) = (2, 1). In other words, γ consists of all (x, y, z) such that

    x^2 + y^2 + z^2 = 2
        x + y + z   = 1.                                   (2)

We have seen in Example 8 that x^2 + y^2 + z^2 = 2 implicitly defines a
sphere S of radius √2. The equation x + y + z = 1 defines a plane P,
because it is satisfied by the set of (x, y, z) such that

    (1, 1, 1) · (x, y, z - 1) = 0.

In fact P is evidently the plane containing the three points (1, 0, 0), (0, 1, 0),
and (0, 0, 1) and shown in Fig. 9. The level set determined by f(x, y, z) =
(2, 1) is then the circle C consisting of all points satisfying both Equations
(2), and so is the intersection of S and P.

Figure 9
We summarize the definitions of explicit, parametric, and implicit
representations. A set S is defined:

1. explicitly if S is the graph in R^{n+m} of a function f: R^n → R^m.

2. parametrically if S is the range in R^m of a function f: R^n → R^m.

3. implicitly if, for some function f: R^n → R^m, S is a level set of f,
   that is, for some point k in R^m, S is the set of all x in the domain
   of f such that f(x) = k.

A set S defined in some one of the above three ways will be called a
curve or a surface provided that f satisfies certain smoothness conditions to
be described in the last section of Chapter 4. In the meantime we shall use
the terms curve and surface informally.


When the domain and range spaces of a vector function are the same,
it is often helpful to picture the domain vectors x as points and the image

vectors/(x) as arrows. We picture/(x) as an arrow with its tail at x. One

/
S
VU>

^> \ Sfc

(a) (b)

Figure 10

would do this, for example, in representing a 2-dimensional fluid flow in


which the image vector at each point is the velocity and direction of the
flow. See Fig. 10(a). Another example is an electric field, where the value
of the function at a point is the vector giving the force exerted by the field
on a unit charge. See Fig. 10(b). Vector functions looked at in this way
are sometimes called vector fields. We try to visualize a vector field by
thinking of the appropriate arrow emanating from each point of the
domain of the field. Hence the domain and range spaces must have the
same dimension.

Example 10. A simple example of a vector field is given by the function

    f(x, y) = (x, y/2).

To sketch it we locate some points (x, y) and some corresponding points
(x, y/2). Then we translate the arrow directed from (0, 0) to (x, y/2)
parallel to itself until its tail rests at (x, y). Some of these arrows are
shown in Fig. 11.

Figure 11
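
A figure like Fig. 11 can be drawn mechanically as well. The sketch below
(Python with NumPy and Matplotlib, offered only as an illustration) attaches
the arrow f(x, y) = (x, y/2) to each point of a small grid.

    import numpy as np
    import matplotlib.pyplot as plt

    X, Y = np.meshgrid(np.linspace(-2, 2, 9), np.linspace(-2, 2, 9))
    U, V = X, Y / 2            # components of f(x, y) = (x, y/2)

    plt.quiver(X, Y, U, V)     # draws the arrow (U, V) with its tail at (X, Y)
    plt.gca().set_aspect('equal')
    plt.show()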

EXERCISES
1. Suppose that the temperature at a point (x, y, z) in space is given by
T(x, y, z) — x 2 + y % + z2 . A particle moves so that at time / its location
is given by {x,y, z) =
2 3
(/, t Find the temperature at the point occupied
, t ).

by the particle at t = \. What is the rate of change of the temperature at


the particle when t = \1
2. Let the density per unit of volume in a cubical box of side length 2 vary
directly as the distance from the center and inversely as 1 + t 2 where t is ,

time. If the density at a corner of the box is 1 when / = 0, find a formula


for the density at any point and at any time. What is the rate of change of
the density at a point \ unit from the center of the box at time t = ? 1

3. Consider the function f(x, y) = √(4 - x^2 - y^2).
   (a) Sketch the domain of f (take it as large as possible).
   (b) Sketch the graph of f.
   (c) Sketch the range of f.

4. The function Jl >- 3l 2 is defined by

2 cos /

gif) < < 2n.


t

3 sin t

(a) Draw the range of g.


(b) Draw the graph of g.

5. A transformation from the xv-plane to the wf-plane is defined by

yi\ + x2)

What are the images of horizontal lines in the xy-plane? What are the
coordinate functions of/?

6. For each of the following linear functions: (i) What is the domain? (ii)

Describe and sketch the range, (iii) Describe and sketch the set implicitly
defined by the equation L(x) = 0. What is this set usually called?

(d) L \y

7. Sketch the surfaces defined explicitly by the following functions:

f\ )

y
\\ ifW <\y\,
(c) g(x, y) = sin x. (f) g(x, y)
10 ifW >\y\.

8. Sketch the curves defined parametrically by the following functions:

1\ /l

00 < t < 00.

(e) fit) =
\t
, ,


9. Draw the surfaces defined parametrically by the following functions:

00 < u < 00,


(a)/
CO < v < 00.

^ cos u sin v\

(b)
g\ )
= I y |
= I
sin u sin t;
|

,0 <v< tt/2.

(cos u cosh p>


<u < 2n,
sin « cosh v |
— oo < v < oo.
sinh v

10. Draw the following implicitly defined level sets:

(sl) f{x, y) =x + y = 1.

2 2
x y

(c) /(*, j) = (x 2 + / + l)
2
- 4jc
2 = 0.

(d) f(x, y, z)=x+y + z = \.

(e) xyz = 0.

(f) g(x,y,z) =x -/=2. 2

(x - y\ /0)
(g) ,

\y + z] \0

(2x +y + z = 2,
(h)
-z = 3.

xyz \ /0
(i) '

\x+y] \\

11. Suppose that the density per unit of area of a thin film, referred to plane
rectangular coordinates, is given by the formula d(x, y) = x 2 + 2y 2 —
x + 1, for -1 <x
< and -1 < J < 1. Sketch the set of points at
1

which the film has density \.



12. Sketch the indicated vector fields.

(a)/j
)=| for -1 <x <2,y = 0, y = 1.

for x2 + f < 4.

=
( )
for x2 + y2 <

(d)/
y-^r?(!) f° r * 2+/s4 -

13. Let a transformation from the Euclidean xy-plane to itself be given by

        f(x, y) = (  x + y )
                  ( -x + y ).

    Show that f accomplishes an expansion out from the origin by a factor √2
    and a rotation through an angle π/4.

14. The vector function f is defined by

        f(x, y) = ( x^2 - y^2 )
                  (    2xy    ).

    What are the coordinate functions of f? Consider the domain space to be
    the xy-plane and the range space to be the uv-plane.
    (a) Find the image of the segment of the line y = x between (0, 0) and (1, 1).
    (b) Find the image of the region defined by 0 ≤ x, 0 ≤ y, and x^2 + y^2 ≤ 1.
    (c) Find the angle between the images of the lines y = 0 and y = (1/√3)x.
        [Ans. π/3.]

15. A vector function /from the ;cy-plane to the wu-plane is defined by

What are the coordinate functions of/? Find the image of the region
bounded by the lines x = y, y = x — S, x = —y, y = 8 — x.

SECTION 2

FUNCTIONS OF ONE VARIABLE    If a point moves in a vector space so as to
occupy various positions at different times, then its position at time t can
be described by a vector-valued function f with values f(t). For example, a
point moving on a line in R^3 might be at the point

    f(t) = t x1 + x0

at time t, where x1 and x0 are points in R^3. More generally, a function f
taking values in R^n might be given in the form

    f(t) = (f1(t), . . . , fn(t)),

where the coordinate functions f1, . . . , fn describe the real-valued coordi-
nates of a point in R^n at different times t.

Example 1. If x1 = (x1, y1, z1) and x0 = (x0, y0, z0) are points in R^3,
then the function f: R → R^3 defined by

    f(t) = t(x1, y1, z1) + (x0, y0, z0)
         = (t x1 + x0, t y1 + y0, t z1 + z0)

gives a parametric representation of a point on a line. The function g
for which

    g(t) = (t, t^2)

similarly describes a curve in R^2. In fact, because the coordinates x = t
and y = t^2 satisfy the relation y = x^2, the point (t, t^2) always lies on the
parabola with equation y = x^2, shown in Fig. 12.
We can define the limit of a vector-valued function f with
values in R^n by using limits of the real-valued coordinate functions
fk of f. Thus if

    f(t) = (f1(t), . . . , fn(t))

is defined for an interval a < t < b containing t0, we write

2.1        lim f(t) = ( lim f1(t), . . . , lim fn(t) )        (all limits as t → t0).

Similarly, a function with values in R^n is said to be continuous
if its real-valued coordinate functions are all continuous on their
interval of definition.

Figure 12

Example 2. The function defined by g(t) = (t, t^2) has limit vector
(2, 4) at t = 2 because

    lim (t, t^2) = ( lim t, lim t^2 ) = (2, 4)        (limits as t → 2).

The function g is continuous for all real t because the coordinate functions
t and t^2 are continuous.

The intuitive idea behind continuity of a vector-valued function is
similar to that for a real-valued function: the values of the function
should not change abruptly. These ideas are treated more fully in Section
7. At present we shall consider only continuous functions g: R → R^n
with g(t) defined on an interval a < t < b. We shall first define the
derivative of g and show how it leads to a definition of tangent line to the
curve γ defined as the range of g.

The function g has a derivative g'(t) at a point t in the interval (a, b) if

    g'(t) = lim ( g(t + h) - g(t) ) / h        (limit as h → 0).

If the limit exists for each t in (a, b), then g'(t) determines a new function
g': R → R^n, just as in the case n = 1. The derivative is often written dg/dt.

Example 3. Let g(t) = (t^2, t^3). Then

    g'(t) = lim ( g(t + h) - g(t) ) / h
            h→0

          = lim (1/h) ( (t + h)^2 - t^2, (t + h)^3 - t^3 )
            h→0

          = (2t, 3t^2).

Example 4. If

    g(t) = ( cos t ),        then        g'(t) = ( -sin t ).
           ( sin t )                              (  cos t )

If

    h(t) = (  t  ),          then        h'(t) = (  1 ).
           ( t^2 )                                ( 2t )

It is clear from Fig. 13 that the vector g(t + h) - g(t) has a direction
which, as h tends to 0, should tend to what we would like to call the tangent
direction to the curve γ at g(t). However, since g is assumed continuous,

    lim ( g(t + h) - g(t) ) = 0,
    h→0

and the zero vector that we get as a limit has no direction. This difficulty
is overcome in most examples by dividing by h before letting h tend to zero.
Observe that division by h will not change the direction of g(t + h) - g(t)
if h is positive; it will reverse it if h is negative. A glance at Fig. 13 shows
that this reversal is desirable for our purposes. (What would be wrong with
dividing by |h|?) The derivative g'(t0), if it exists and is not zero, is called the
tangent vector to γ at g(t0). Of course, any nonzero multiple of g'(t0) is then
called a tangent vector, and the line with direction vector g'(t0) and passing
through g(t0) is called the tangent line to γ at g(t0). Thus, if g(t0) is a
particular point on a curve, the tangent line at g(t0) will have a parametric
representation of the form

    t g'(t0) + g(t0).

The tangent vector g'(t0) is usually pictured with its tail at g(t0), as in Fig. 13.

Figure 13

Example 5. The circle defined parametrically by g(t) = (cos t, sin t) has
a tangent vector at g(t0) given by g'(t0) = (-sin t0, cos t0). In particular,
the tangent vector at g(π/4) = (1/√2, 1/√2) is g'(π/4) = (-1/√2, 1/√2).
Hence the tangent line to the circle at (1/√2, 1/√2) has a parametric
representation (x, y) = t(-1/√2, 1/√2) + (1/√2, 1/√2). The line is shown
in Fig. 14, together with some tangent vectors, each of which has length
|g'(t0)| = 1.

Figure 14

For the spiral curve given by f(t) = (cos t, sin t, t), we have
f'(0) = (0, 1, 1). The tangent line to the spiral at (1, 0, 0) can be
represented by

    t(0, 1, 1) + (1, 0, 0),

and in this case the tangent vector f'(0) has length √2.
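
The limit that defines g'(t) can be watched numerically. In the sketch below
(plain Python with NumPy, purely for illustration) the difference quotients
(g(π/4 + h) - g(π/4))/h for the circle approach the tangent vector
(-1/√2, 1/√2) found above.

    import numpy as np

    def g(t):
        return np.array([np.cos(t), np.sin(t)])

    t0 = np.pi / 4
    for h in [0.1, 0.01, 0.001]:
        print(h, (g(t0 + h) - g(t0)) / h)
    # the quotients approach (-0.7071..., 0.7071...), that is, (-1/√2, 1/√2)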
One reason for singling out g'(t) for special attention as the tangent
vector, rather than some multiple of it, is that we often want to consider the
parameter t as a time variable, and g(t) as representing the path of a point
moving in R^n. Under this interpretation, the Euclidean length |g'(t)| is the
natural definition for the speed of motion along the path γ described by
g(t) as t varies. To justify the use of the term "speed," we observe that, for
small h, the number |g(t + h) - g(t)| / |h| is close to the average rate of
traversal of γ over the interval from t to t + h. In addition, if g'(t) exists,
it is easy to show that

    lim |g(t + h) - g(t)| / |h| = |g'(t)|.
    h→0

In fact, by the reversed triangle inequality (Exercise 12, Section 5,
Chapter 1),

    |  |g(t + h) - g(t)| / |h|  -  |g'(t)|  |   ≤   |  ( g(t + h) - g(t) ) / h  -  g'(t)  |,

which tends to zero as h tends to zero. Thus |g'(t)| is a limit of average
rates over arbitrarily small time intervals. For this reason the real-valued
function v defined by v(t) = |g'(t)| is called the speed of g, and the vector
v(t) = g'(t) is called its velocity vector at the point g(t). The velocity vector
is, of course, the same as what we have called the tangent vector to γ at
g(t), provided it is not 0.

Example 6. If a point moves in the plane so that at time t its position
is g(t) = (t^2, t^3), then the velocity vector is v(t) = (2t, 3t^2), and the speed
is v(t) = √(4t^2 + 9t^4). In particular, v(0) = 0. The path traced by g is
shown in Fig. 15 for -1 ≤ t ≤ 1, and in drawing the picture it is helpful to
observe that the coordinates of a point on the path satisfy the equation
y^2 = x^3.

Figure 15

The fact that the tangent vector shrinks to zero in this example
as g(t) approaches the origin is a reflection of the fact that, if the
velocity vector varies continuously, the speed must be zero at an
abrupt change in the direction of motion. In this way the para-
metrization describes the physical situation well. However, for the
purpose of assigning a tangent line to the path at the origin, the
given parametrization is not useful.

Example 7. Let f(t) = (cos t, sin t, t), as in the second part of
Example 5. Then v(t) = (-sin t, cos t, 1). It follows that the
velocity vector is always perpendicular to the vector (cos t, sin t, 0)
that points from the axis of the spiral to f(t). The speed at any
time t is v(t) = |v(t)| = √2.

We list here some useful formulas that hold if f and g each has a vector
derivative on an interval (a, b) and φ is real-valued and differentiable there.

2.2        (f + g)' = f' + g',        (cf)' = cf',    c constant.

2.3        (φf)' = φf' + φ'f,        φ real-valued.

2.4        (f · g)' = f · g' + f' · g.

2.5        (f(u))' = u' f'(u),

where u is a real-valued, differentiable function of one variable,
with its range in (a, b).

These can all be proved by writing f and g in terms of their coordinate
functions and then applying the corresponding differentiation formulas
for real-valued functions, together with Formula 2.1. For example, the
proof of 2.5, a version of the chain rule for differentiation, goes like this:

    (f(u))' = ( [f1(u)]', . . . , [fn(u)]' )
            = ( u' f1'(u), . . . , u' fn'(u) )
            = u' f'(u).
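
Formulas 2.2 through 2.5 can be confirmed symbolically in particular cases.
The following sketch checks 2.4 for the sample functions f(t) = (cos t, sin t)
and g(t) = (t, t^2); it assumes Python with the SymPy library and is meant
only as an illustration.

    import sympy as sp

    t = sp.symbols('t', real=True)
    f = sp.Matrix([sp.cos(t), sp.sin(t)])
    g = sp.Matrix([t, t**2])

    lhs = sp.diff(f.dot(g), t)                   # (f . g)'
    rhs = f.dot(g.diff(t)) + f.diff(t).dot(g)    # f . g' + f' . g
    print(sp.simplify(lhs - rhs))                # 0, as Formula 2.4 asserts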

If g: R → R^n has a derivative g': R → R^n, then we can ask for the
derivative of g', which we denote by g''. Thus we have g'': R → R^n, though
g'' may be defined at fewer points than g or even g'. We also write d^2 g/dt^2
for g'', and so on for higher-order derivatives.

Example 8. Let g: R → R^3 describe a path in R^3 with velocity vector
v(t) and speed v(t) at each point g(t). Then t(t) = (1/v(t)) v(t) is a tangent
vector of length 1, provided v(t) ≠ 0. In any case, we can write g'(t) =
v(t) t(t). If we assume that g' has a derivative, we define the acceleration
vector at g(t) by a(t) = g''(t). The physical significance of a(t) is that if g(t)
describes the motion of a particle of constant mass m, then m a(t) is the
force vector F(t) acting on the particle. If we denote by a(t) the length of
a(t), then a(t) is the magnitude of the acceleration, and m a(t) is the
magnitude of the force acting on the particle.

If t(t) is a unit tangent vector at g(t), the equation g'(t) = v(t) t(t)
implies that a = (v t)'. Applying Formula 2.3, we get a = v' t + v t'. Thus
if t'(t) = 0, the acceleration vector, and hence the force vector at g(t),
has either the same or else the opposite direction to the motion. On the
other hand, if t'(t) ≠ 0, we can define the unit vector n by n(t) = t'(t)/|t'(t)|,
and so the acceleration vector can be written

    a = v' t + v |t'| n.

This equation expresses the acceleration a(t) at each point g(t) in terms of
an orthonormal pair of vectors t(t) and n(t). We have |t| = |n| = 1 by the
definition of these vectors, and application of Formula 2.4 to the equation
t · t = 1 gives t · t' = 0. But by the definition of n, this implies t · n = 0.

The pair t(t), n(t) should be pictured at g(t) as in Fig. 16. The third
unit vector b(t) shown there is defined by b = t × n, and is called the
binormal vector to the path, while n is called the principal normal. Thus any
vector naturally associated with the point g(t) on the path can be written
as a linear combination of the triple (t(t), n(t), b(t)), which changes as we
go from point to point along the path.

Figure 16
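
The relations derived in Example 8 can be verified symbolically for a particular
path, say the spiral f(t) = (cos t, sin t, t) of Example 7. The sketch below
(SymPy again, illustrative only) checks that t has length 1, that t · t' = 0,
and that a · t = 0 for this path because its speed is constant.

    import sympy as sp

    t = sp.symbols('t', real=True)
    g = sp.Matrix([sp.cos(t), sp.sin(t), t])

    v = g.diff(t)                           # velocity vector
    speed = sp.sqrt(v.dot(v))               # speed v(t); simplifies to sqrt(2)
    T = v / speed                           # unit tangent t(t)
    a = g.diff(t, 2)                        # acceleration vector

    print(sp.simplify(T.dot(T)))            # 1
    print(sp.simplify(T.dot(T.diff(t))))    # 0, so t is perpendicular to t'
    print(sp.simplify(a.dot(T)))            # 0, since v is constant here (v' = 0)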

EXERCISES
1. If g(t) = (e^t, t) for all real t, sketch in R^2 the curve described by g together
   with the tangent vectors g'(0) and g'(1).

2. Let f(t) = (t, t^2, t^3) for 0 ≤ t ≤ 1.
   (a) Sketch the curve described by f in R^3 and the tangent line at (1/2, 1/4, 1/8).
   (b) Find |f'(t)|.
   (c) If f(t) = (t, t^2, t^3) for all real t, find all points of the curve described
       by f at which the tangent vector is parallel to the vector (4, 4, 3). Are
       there any points at which the tangent is perpendicular to (4, 4, 3)?

3. Sketch the curve represented by (x,y) = (r


3
,
5
r ), and show that the para-
metrization fails to assign a tangent vector at the origin. Find a para-
metrization of the curve that does assign a tangent at the origin.

4. Show that the curve describedby g(t) = (sin 2t, 2 sin 2 f, 2 cos t) lies on a
3
sphere centered at the origin in Si Find the length of the velocity vector .

v(f) and show that the projection of this vector into the xj-plane has a
constant length.

' (a) Show that if 31 > A3 is continuous for < /, then there is a unique
path g through a given point ^(0) in :R 3 , having v(f) as its velocity

vector at g(t).
(b) Show that a continuous function Si -^—>- R3 defined for < t, deter-
mines a unique path in H3 , having a(r) as its acceleration vector, pro-
vided the initial point ^(0) and initial velocity v(0) are specified.

, Suppose a target moves with constant speed v > on a circular path of


radius r, and that a missile, also having constant speed v, pursues the
target by starting from the center of the circle, always remaining between
the center and the target. When does the missile hit the target?

7. Prove (a) 2.2, (b) 2.3, and (c) 2.4 of Section 2.

8. Show that if /is vector- valued, differentiable, and never zero for a < t < b,
then

(b) |
/| is constant if and only if//' = 0.

9. Consider the vector differential equation

       x'' + ax' + bx = 0,

   to be solved for a function x(t) taking values in R^n and defined on some
   interval. We assume that a and b are constants. Show that if the real
   equation r^2 + ar + b = 0 has distinct roots r1 and r2, then the differential
   equation has a solution of the form

       x(t) = c1 e^{r1 t} + c2 e^{r2 t},

   where c1 and c2 are constant vectors in R^n. What happens if r1 = r2?

10. Prove that (f × g)' = (f × g') + (f' × g), where f and g take values in
    R^3 and are differentiable on an interval.

11. Show that if g: R → R^n has a derivative and g'(t) = 0 for a < t < b, then
    g(t) is a constant vector on that interval. [Hint. Apply the mean-value
    theorem to each coordinate function.]
12. Let a differentiable function g(t) represent the position in 3l at time / of a
particle of possibly varying mass m(t). The vector function P(t) = m(t)\(t)
is called the linear momentum of the particle. The force vector is F(t) =
(m(t)v(t))'. The angular momentum about the origin is L{t) = g{t) x P(t),
and the torque about the origin is N(t) = g(0 x F(t).

(a) Show that if F is identically zero, then P is constant. This is called the
law of conservation of linear momentum.
(b) Show that L'(t) = N(t), and hence that if is identically zero, then L N
is constant. This is called the law of conservation of angular momentum.

13. Show that if a particle has an acceleration vector a(t) at time t and v(t) ≠ 0,
    then v' = t · a, where t is the unit vector (1/v)v.

14. Let f: R → R^n be a function defined for a ≤ t ≤ b. If the coordinate
    functions f1, . . . , fn of f are integrable, we can define the integral of f over
    the interval [a, b] by

        ∫_a^b f(t) dt = ( ∫_a^b f1(t) dt, . . . , ∫_a^b fn(t) dt ).

    (a) If f(t) = (cos t, sin t) for 0 ≤ t ≤ π/2, compute ∫_0^{π/2} f(t) dt.
    (b) If g(t) = (t, t^2, t^3) for 0 ≤ t ≤ 1, compute ∫_0^1 g(t) dt.

15. If f: R → R^n and g: R → R^n are both integrable over [a, b], show by using
    the corresponding properties of integrals of real-valued functions that:

        ∫_a^b k f(t) dt = k ∫_a^b f(t) dt,    k any real number,

        ∫_a^b ( f(t) + g(t) ) dt = ∫_a^b f(t) dt + ∫_a^b g(t) dt,

    where the integrals are as defined in the previous exercise.

16. If f: R → R^n is defined for a ≤ t ≤ b, and f' is continuous there, prove the
    following extension of the fundamental theorem of calculus:

        ∫_a^b f'(t) dt = f(b) - f(a).

17. If f: R → R^n is continuous over [a, b],
    (a) Show that

            ∫_a^b k · f(t) dt = k · ∫_a^b f(t) dt,

        where k is a constant vector.
    (b) Show that

            | ∫_a^b f(t) dt |  ≤  ∫_a^b |f(t)| dt.

        [Hint. By the Cauchy-Schwarz inequality

            ∫_a^b f(u) · f(t) dt ≤ |f(u)| ∫_a^b |f(t)| dt    for each u.

        Integrate with respect to u, and apply the result of part (a).]

SECTION 3

ARC LENGTH    The definition of length for vectors can be used to define the length of
a parametrized curve γ. We assume that γ is described by a continuous
function g: R → R^n, where the domain of g is a closed interval a ≤ t ≤ b.
Thus γ is the image of [a, b] under g. Corresponding to any finite set P of
numbers a = t0 < t1 < · · · < tK = b, there are points g(tk), k = 0, . . . , K,
on γ. We join these points in order by a polygonal path as shown in
Fig. 17. The length of the kth segment of the polygonal approximation to
γ is |g(tk) - g(tk-1)|, and the total length of the polygon is

    l(P) = Σ_{k=1}^{K} |g(tk) - g(tk-1)|.

Let l(γ) denote the least upper bound of the numbers l(P). This will, of
course, be infinite if the set of numbers l(P) is unbounded. If l(γ) is finite,
then γ is said to be rectifiable, and l(γ) is called its length. It is clear from
the definition that l(γ) depends on the function g that describes γ and not
just on γ itself. This is reasonable if we want to take into account the fact
that some part of γ may be traced more than once by g. In practice this is
very often what is wanted. If it should happen that g is not one-to-one,
then we may write l(g) instead of l(γ) to emphasize the dependence on g.

The length of a path is usually awkward to compute directly from the
definition. However, if γ is parametrized by a function g such that the
tangent vector g'(t) varies continuously with t, then l(γ) is finite and equal

to ∫_a^b |g'(t)| dt. In fact, it is enough to assume that g is piecewise smooth,
that is, g' is continuously extendable to the endpoints of finitely many
intervals which placed end to end form the interval a ≤ t ≤ b. This
allows us to find the length of some curves for which the tangent has an
abrupt change, as in Fig. 17.

Figure 17

3.1 Theorem

Let a curve γ be parametrized by a piecewise smooth function
g: R → R^n, defined for a ≤ t ≤ b. Then l(γ) is finite and

    l(γ) = ∫_a^b |g'(t)| dt.

The proof of this theorem is given in Section 3 of the Appendix, and
we shall give here only an argument that makes the integral formula
plausible. Since, by definition, the existence of g'(tk-1) means

    lim ( g(tk) - g(tk-1) ) / ( tk - tk-1 ) = g'(tk-1)        (limit as tk → tk-1),

we have

    g(tk) - g(tk-1) - (tk - tk-1) g'(tk-1) = (tk - tk-1) Z(tk - tk-1),

where Z(tk - tk-1) satisfies

    lim Z(tk - tk-1) = 0        (limit as tk → tk-1).

Thus (t k — t^Jg'U^x) becomes a better approximation to g(t k ) — git^x)


as (t k — t^i) is made small. We are led to approximate

KP)=I\g(tk )-g(t k . 1 )\

by

I\g'(tk -i)\(t k -t k ^).


(Some of the tangent vectors (t k — f fc _i)g'(f fc _i) are shown in Fig. 17.)
But if g' is continuous, so is \g'\, and, letting m(P) = max (t k — t
k_ v ) we
have, to conclude the argument, i<k<K

lim I\gV k -i)\(t k -t k _ ) = 1 ( \g'(t)\dt.


m(P)-»0Jt=l Ja

A curve in R^3 will usually be described by coordinate functions. Thus,
if γ is defined for a ≤ t ≤ b by g(t) = (g1(t), g2(t), g3(t)), then
|g'(t)| = √( (g1'(t))^2 + (g2'(t))^2 + (g3'(t))^2 ), and so

    l(g) = ∫_a^b √( (g1'(t))^2 + (g2'(t))^2 + (g3'(t))^2 ) dt,

with a similar formula holding in R^n. For example, the spiral curve in R^3
defined by g(t) = (cos t, sin t, t) for 0 ≤ t ≤ 1 has length

    l(g) = ∫_0^1 √( (-sin t)^2 + (cos t)^2 + 1 ) dt = √2.
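
When an arc-length integral has no convenient closed form, it can still be
evaluated numerically. The sketch below (Python with NumPy and the quad
routine from SciPy, given only as an illustration) recomputes the length of
the piece of spiral above and agrees with √2.

    import numpy as np
    from scipy.integrate import quad

    def speed(t):                        # |g'(t)| for g(t) = (cos t, sin t, t)
        return np.sqrt(np.sin(t)**2 + np.cos(t)**2 + 1.0)

    length, err = quad(speed, 0.0, 1.0)
    print(length, np.sqrt(2.0))          # both approximately 1.41421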

Example 1. The plane curve defined by g(t) = (t|t|, |t|) for -1 ≤ t ≤ 1
is shown in Fig. 18. Since g is piecewise smooth, l(g) is finite. We have

    g'(t) = (-2t, -1),    -1 < t < 0,
            (2t, 1),       0 < t < 1,

so |g'(t)| = √(4t^2 + 1) for -1 ≤ t ≤ 1, t ≠ 0. Then

    l(g) = ∫_{-1}^{1} √(4t^2 + 1) dt = √5 + (1/2) log (2 + √5).

Example 2. The graph γ of a real-valued continuous function
f, defined for a ≤ x ≤ b, is a curve in R^2 that can be described
parametrically by the one-to-one function

    g(x) = (  x   ),        a ≤ x ≤ b.
           ( f(x) )

Figure 18

If f' is continuous, then so is |g'(x)| = √(1 + (f'(x))^2), and the formula for
finding l(γ) becomes in this case

    l(γ) = ∫_a^b √(1 + (f'(x))^2) dx.

If γ is a piecewise smooth curve described by a one-to-one function g(t),
a ≤ t ≤ b, we can think of l(γ) as representing the total mass of γ,
assuming γ has a uniform density equal to 1 at each point. More generally,
if p is a real-valued function defined on γ, we can form the integral

    ∫_a^b p(g(t)) |g'(t)| dt,

and, if it exists, call it the integral of p over γ. In particular, if p is a non-
negative function that can be interpreted as the density per unit of length of
a mass distribution over γ, the integral becomes the definition of the total
mass m of the distribution.

Example 3. Consider a full turn of the spiral curve described by g(t) =
(cos t, sin t, t) for 0 ≤ t ≤ 2π. Suppose that the density of the curve at
a point is equal to the square of the distance of that point from the mid-
point q of the axis of the spiral. Relative to our description of the curve,
this midpoint has coordinates (0, 0, π). Thus the density at g(t) is equal to

    |g(t) - q|^2 = cos^2 t + sin^2 t + (t - π)^2
                 = 1 + (t - π)^2.

Also, we have seen that |g'(t)| = √2. Hence, the total mass of the distri-
bution is given by

    ∫_0^{2π} [1 + (t - π)^2] √2 dt = √2 ∫_{-π}^{π} (1 + τ^2) dτ = (2/3) √2 (3π + π^3).

If g(t), with a ≤ t ≤ b, defines a smooth curve γ having a nonzero
tangent at every point, then the function s(t) defined by

3.2        s(t) = ∫_a^t |g'(u)| du

is the length of the part of γ corresponding to the interval from a to t.
Since |g'(t)| is positive, the function s(t) is strictly increasing. Thus s(t)
is a one-to-one function from the interval a ≤ t ≤ b to the interval
0 ≤ s ≤ l(γ), and so has an inverse function. We denote this inverse
simply by r(s). We now form the vector function h(s) = g(r(s)), which
describes the same curve γ that g(t) does, but with a new parametrization
in which the variable s, with 0 ≤ s ≤ l(γ), represents the length of the
path along γ from h(0) to h(s). The curve γ is then said to be para-
metrized by arc length.

Example 4. Let g(t) = (t, t^2) for 0 ≤ t ≤ 1. Then

    s(t) = ∫_0^t √(1 + 4u^2) du
         = (1/2) t √(1 + 4t^2) + (1/4) log (2t + √(1 + 4t^2)).

Since s'(t) = √(1 + 4t^2) > 0, s(t) is strictly increasing and so has an inverse.

Example 4 shows that r(s), the inverse of s(t), may be awkward to
compute explicitly. However, its use has several theoretical advantages.
For example, if a curve γ is parametrized by arc length, then the integral
of a real-valued function p over γ takes a simpler form. Suppose γ is
given originally by a function g(t), with a ≤ t ≤ b. Then the new paramet-
rization is given by a function h(s) defined for 0 ≤ s ≤ l(γ), where
h(s) = g(r(s)). Changing variable in the integral of p over γ we get, because
s'(t) = |g'(t)|,

    ∫_a^b p(g(t)) |g'(t)| dt = ∫_a^b p(g(t)) s'(t) dt = ∫_0^{l(γ)} p(h(s)) ds.

The expression |g'(t)| dt, or its simpler counterpart ds, is sometimes
called the element of arc length on the curve γ. From the Equation 3.2 that
relates s and t, we can derive another in terms of derivatives:

3.3        ds/dt (t) = |g'(t)| = v(t).

To prove this, simply differentiate both sides of 3.2 with respect to t,
and use the fact that the speed v(t) has been defined to be |g'(t)|. Equation
3.3 gives further justification for the definition of speed: the derivative of
arc length with respect to time turns out to equal speed.

Example 5. Recall that the equation

    a(t) = v'(t) t(t) + v(t) |t'(t)| n(t),                        (1)

derived in Example 8 of the previous section, expresses the acceleration
vector of a curve at each point of the curve as a linear combination of two
unit vectors, one tangent and one perpendicular to the curve. The co-
efficient of n(t) can be written more meaningfully. For, denoting by r(s)
the inverse of s(t), we have by Equation 2.5 that

    d/ds t(r(s)) = t'(r(s)) dr/ds (s).                            (2)

For inverse functions r and s we have, in general, r(s(t)) = t; therefore,
by the chain rule,

    dr/ds (s(t)) ds/dt (t) = 1.

It then follows from Equation 3.3 that

    dr/ds (s) = 1/v(t).                                           (3)

Then Equations (2) and (3) give

    t'(t) = v(t) (d/ds) t(t).

Since v(t) > 0, when we take the length of the vectors on both sides, we
can take v(t) outside, getting

    |t'(t)| = v(t) | (d/ds) t(t) |.

The factor

    κ(t) = | (d/ds) t(t) | = |t'(t)| / v(t)

is called the curvature of the curve at the point corresponding to t. Then
Equation (1) becomes, in terms of curvature,

3.4        a(t) = v'(t) t(t) + v^2(t) κ(t) n(t).

The terms in the sum are called the tangential and normal (or centripetal)
components of the acceleration, respectively, and the numerical factors are
sometimes denoted a_t = v' and a_n = v^2 κ.

From Equation 3.4 we can immediately conclude several things about
a, the acceleration vector. If the speed v is a constant v0, then v' = 0, and
a(t) = v0^2 κ(t) n(t). This means that a is perpendicular to the curve and that
its length, v0^2 κ(t), varies only with the curvature κ. At the other extreme, if
the path of motion is a straight line, then the unit tangent vector t is a
constant vector, so t' = 0. Thus κ = 0, and we have a(t) = v'(t) t(t), which
shows that the acceleration vector has either the same direction as the
tangent or the opposite direction, depending on the sign of v'. Exercise 10
shows that κ, the curvature of the path, is a measure of how rapidly the
tangent vector t is turning.
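
The curvature is equally easy to compute symbolically from the definition
κ = |t'|/v. As an illustration (SymPy; the parabola chosen here is simply a
convenient sample curve), take g(t) = (t, t^2):

    import sympy as sp

    t = sp.symbols('t', real=True)
    g = sp.Matrix([t, t**2])                 # the parabola y = x^2

    v = g.diff(t)
    speed = sp.sqrt(v.dot(v))                # v(t) = sqrt(1 + 4t^2)
    T = v / speed                            # unit tangent
    kappa = sp.simplify(sp.sqrt(T.diff(t).dot(T.diff(t))) / speed)
    print(kappa)                             # equivalent to 2/(1 + 4*t**2)**(3/2)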

EXERCISES
1. Find the length of the following curves.

- (a) (x, v) = (t, log cos f), < / < 1. [Arts, log (sec 1 + tan 1.]

(b) (x, y) = (A %t 3 - \t), < / < 2. [Ans. -1/-.]


(c) y = x
3 '
2
, < x < 5.

(d) g(f) = (6/


2
,
4V2 t
3
, 3f
4
), -1 < < 2.
t [Ans. 81.]

2. (a) Set up the integral for the arc length of the ellipse

(x, y) = (a cos /, b sin t),0 < < 2n.


t

(b) Show that the computation of the integral in part (a) can be reduced to
the computation of a standard elliptic integral of the form

v 1 - k2 sin
2
6 dd.

(c) By using a table of elliptic integrals or by direct numerical calculation,


find an approximate value for the arc length of an ellipse with a = 1 and
b =2. [Ans. 9.689.]

3. Suppose a curve y is parametrically defined by two continuously differentiable


functions
/(/), a<t <b,
g(u), *<u<p.
These functions are called equivalent parametrizations of 7 if there is a
continuously differentiable function <p such that

a = 95(a) and b = <p(P),

/(?(«)) =<?(«), «<«</?,


<p'(w) > 0, a. < u < (1.

(a) Show that equivalent parametrizations of 7 assign the same length to y.


(b) Show that

(x, y) = (cos /, sin t), < <- t


/


and

(pc t y)=[- -, —— r I, 0<«<1


are equivalent parametrizations of a quarter circle,
(c) Find a pair of nonequivalent parametrizations of some curve.

4. Show that the curve

(x, y) = (cos s, sin s), <s< 2tt

is parametrized by arc length, and sketch the velocity and acceleration


vectors, together with the curve, at s = tt/2.

5. Let 7 be a continuously differentiable curve with endpoints Pi and p 2 . Let


A be the line segment p x L f(p
2
- Pj), < < / 1 . Prove that /(A) < /(y).

6. Consider the spiral curve

<r.

(a) Find explicitly the arc length parametrization of the curve.


(b) Find the unit tangent and principal normal vectors at an arbitrary point.
(c) Find the curvature at an arbitrary point.

7. (a) Show that for a line given by g(t) = tx x + x , the curvature is identically
zero,
(b) Show that if a curve y, parametrized by arc length and given by a
function f(s), has a tangent at every point and has curvature identically
zero, then y is a straight line.

8. Find the total mass of the spiral given by g(t) = (a cos /, a sin /, bt),

< < 2-n,


t if its density per unit of length at (x, y, z) is equal to
x2 + y% + z2 .

9. Show that if y is the graph of a function Jl — Jl 2 , defined for a < x < b.


then

/(y)
(" V\ + (/; Qt)) 2 + (f2 (x)f dx,

where /, and 2
are the coordinate functions of y, assumed continuously
differentiable.

10. If a curve is parametrized by arc length, its curvature is κ(s) = |(d/ds) t(s)|.
    Show that if θ(s, h) is the angle between t(s) and t(s + h), which tends to
    zero as h tends to zero, then

        κ(s) = lim θ(s, h) / |h|        (limit as h → 0).

    [Hint. Show that |t(s + h) - t(s)| = √(2 - 2 cos θ(s, h)).]


11. Show that if a curve is given parametrically by a function g(t), then in terms
    of derivatives with respect to t, the curvature at g(t) is

        κ = √( |g'(t)|^2 |g''(t)|^2 - (g'(t) · g''(t))^2 ) / |g'(t)|^3.

    [Hint. Express |t'| in terms of g' and g''.]

12. (a) Find the curvature function k(/) of the parabola x(/) = 2
(/, t ) for
— oo < t < oo.
(b) Show that, for a circle of radius r, the curvature is given by <<(f) = 1/r.

13. A piece of wire is coiled in a uniform spiral 3 inches in diameter and 2 feet
long. Find the length of the wire if it contains 6 complete turns.

14. Let y be a continuously differentiate curve having a mass distribution of


density p(x) at each point x of y. Let m be the total mass of y. If y is given

by X. > #", a < < b, the center of mass of the distribution


/ is the vector

= g(t)p(g(t))\g\t)\dt.
^|J a
(See Exercise 14 of the previous section for the definition of the vector
integral.)

(a) Find the center of mass of the spiral g(t) = (a cos t, a sin t, bt),
< <
t 2tt, with density at (x, y, z) equal to x 2 + y 2 + z 2 .

(b) Show that if y has uniform density 1 and is parametrized by arc length,
then
1
/M<y>
g(s) ds.

(c) Use the results of Problem 17(b) of the previous section to show that
the center of mass then satisfies

1 fKy)
z < \g{s)\ ds.
=J. ''

SECTION 4

LINE INTEGRALS    The integral ∫_a^b f(x) dx of a real-valued function of a real variable can be
generalized in several ways. One generalization that has applications in
physics is the line integral, which we describe here. Let F: R^3 → R^3 be a
continuous vector field defined in a region D of R^3. Let γ be a curve lying
in D and parametrized by a function g: R → R^3 with g(t) defined and
continuously differentiable for a ≤ t ≤ b. To say that γ lies in D means
simply that the range of g lies in D. A typical situation is shown in Fig. 19.
At each point g(t) of γ we picture the tangent vector g'(t) as an arrow with
its initial point at g(t). Also at g(t) we locate the arrow describing the vector
field F at g(t). The dot-product F(g(t)) · g'(t) is then a continuous real-
valued function of t for a ≤ t ≤ b, and the integral

    ∫_a^b F(g(t)) · g'(t) dt                                      (1)

is called the line integral of F over γ.

Figure 19

Example 1. If a vector field is given in R^3 by F(x, y, z) = (x^2, y^2, z^2),
and γ is given by g(t) = (t, t^2, t^3) for 0 ≤ t ≤ 1, then the integral of F
over γ is

    ∫_0^1 (t^2, t^4, t^6) · (1, 2t, 3t^2) dt = ∫_0^1 (t^2 + 2t^5 + 3t^8) dt = 1.
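
Once g and F have been coded, a line integral is easy to approximate
numerically. The sketch below (NumPy together with SciPy's quad routine,
purely for illustration) recomputes the integral of Example 1.

    import numpy as np
    from scipy.integrate import quad

    def F(x, y, z):
        return np.array([x**2, y**2, z**2])

    def g(t):
        return np.array([t, t**2, t**3])

    def gprime(t):
        return np.array([1.0, 2*t, 3*t**2])

    def integrand(t):
        return float(np.dot(F(*g(t)), gprime(t)))   # F(g(t)) . g'(t)

    value, err = quad(integrand, 0.0, 1.0)
    print(value)                                    # approximately 1.0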

The line integral can be interpreted in qualitative terms as follows. The
dot-product

    F(g(t)) · g'(t) / |g'(t)|

is the coordinate of F(g(t)) in the direction of the unit tangent vector to
γ at g(t). Then F(g(t)) · g'(t), the integrand in Formula (1), is the tan-
gential coordinate of F(g(t)) times |g'(t)|, the speed of traversal of γ at
g(t). In particular, if F(g(t)) is always perpendicular to γ at g(t), the
integrand, and hence the integral, will be zero. At the other extreme, for a
given field F, if the speed |g'(t)| is prescribed at each point of the curve, then
the integrand will be maximized by choosing a curve γ that at each point
has the same direction as the field there. Thus the integrand in the line
integral can be thought of as a measure of the effect of the vector field
along γ.

Formula (1) can be generalized to any number of dimensions. Thus if
F: D → R^n, with D contained in R^n, and g: R → R^n describes for a ≤ t ≤ b
a smooth curve γ lying in D, the line integral of F over γ is still defined by
Formula (1), in which the dot-product is now formed in R^n.

Example 2. Let F(x, y) = (x, y) define a vector field in R^2. The curve
given by g(t) = (cos t, sin t) for 0 ≤ t ≤ π/2 is a quarter circle, shown in
Fig. 20 together with some tangent vectors and some vectors of the field.
Because the field is perpendicular to the curve at each point, we expect the
integral to be zero, and in fact we have

    ∫_0^{π/2} F(g(t)) · g'(t) dt = ∫_0^{π/2} (cos t, sin t) · (-sin t, cos t) dt

                                 = ∫_0^{π/2} (-cos t sin t + sin t cos t) dt = 0.

Example 3. An important physical interpretation of the line integral
arises as follows. Suppose that the function F: R^3 → R^3 determines a
continuous force field in a region D in R^3. To define W, the work done in
moving a particle along a curve γ in D, we use the definition that, for
linear motion in a constant field, work is force times distance. That is,

    W = (F_t)(s),

where s is the distance traversed and F_t is the coordinate of the force in
the direction of motion. In Fig. 21 a particle moves along a line having
direction vector t with |t| = 1, and it is subject at each point x to the same
force vector F(x). The coordinate of F in the direction of motion is
F_t = F · t. Then W = (F · t) s.

Figure 20        Figure 21

For motion along a continuously differentiable curve, we begin by
approximating the curve by tangent vectors, as is done in defining arc
length. If the curve γ is given parametrically by g: R → R^3 with g(t)
defined for a ≤ t ≤ b, then the arrows representing the tangent vectors
g'(tk)(tk+1 - tk), a = t0 < t1 < · · · < tK = b, will approximate γ as shown
in Fig. 22. Let us fix a point xk = g(tk) on γ, and near xk approximate F by
the constant field F(xk). That is, near xk we approximate F(x) by the vector
field that assigns the constant vector F(xk) to every point. The tangential
coordinate of F(xk) is F(xk) · t(tk), where t(t) = g'(t)/|g'(t)|.

Figure 22

Thus the work done in moving a particle along γ from xk to xk+1 is
approximately

    Wk = ( F(xk) · t(tk) ) |g'(tk)| (tk+1 - tk)
       = F(g(tk)) · g'(tk) (tk+1 - tk).

Letting m(P) = max (tk+1 - tk), we get

    lim     Σ_{k=0}^{K-1} Wk = ∫_a^b F(g(t)) · g'(t) dt,
    m(P)→0

which we define to be the work done by the field F in moving the particle
through the domain of F along γ.

The assumptions made in Example 3 that F be continuous and that
g' be continuous assured that the integrand F(g(t)) · g'(t) would be con-
tinuous and hence that the line integral would exist. However these
conditions are stronger than necessary. It is enough to assume that the
path of integration is piecewise smooth and then that the vector field F is
sufficiently regular so that the integral in Formula (1) exists. Thus the
derivative g' may be discontinuous at finitely many points, so that γ has
sharp corners as shown in Fig. 23.

Figure 23

Example 4. Let a vector field be defined in R^3 by F(x, y, z) = (x, y, z).
Let g(t) = (cos t, sin t, |t - π/2|) describe a curve γ in R^3 for 0 ≤ t ≤ π.
Then γ has a corner at (0, 1, 0), where t = π/2. Indeed g is not differenti-
able there, and in fact lim_{t→π/2-} g'(t) = (-1, 0, -1) and lim_{t→π/2+} g'(t) =
(-1, 0, 1), showing that the direction of the tangent jumps abruptly
at t = π/2. Nevertheless, the integral of F over γ exists. To compute it, the
interval of integration would ordinarily be broken at t = π/2. However,
in this particular case F(g(t)) · g'(t) = t - π/2 unless t = π/2, and
the line integral is easily seen to have the value zero over the interval
0 ≤ t ≤ π.
A convenient notation for line integrals arises if we denote the para-
metrization of γ by g(t) = (x(t), y(t), z(t)) for a ≤ t ≤ b. Denoting the
coordinate functions of F by F1, F2, and F3, and suppressing the variable t
in the integrand, we get

    ∫_a^b F(g(t)) · g'(t) dt
        = ∫_a^b ( F1(x, y, z) dx/dt + F2(x, y, z) dy/dt + F3(x, y, z) dz/dt ) dt.

The last integral can be abbreviated

    ∫_γ F1 dx + F2 dy + F3 dz

and can be still further shortened by writing dx = (dx, dy, dz). Then the
formula becomes

    ∫_γ F · dx.

The meaning of this notation is developed further in Chapter 7, Section 7.
In the meantime we shall simply use it as a shorthand.

Example 5. Let F: R^4 → R^4 be given by

    F(x, y, z, w) = (x - y, y - z, z - w, w - x).

The curve γ, given by g(t) = (t, -t, t^2, -t^2) for 0 ≤ t ≤ 1, passes through
the field. The integral of F over γ is

    ∫_γ F · dx = ∫_0^1 [ (2t)(1) + (-t - t^2)(-1) + (2t^2)(2t) + (-t^2 - t)(-2t) ] dt

               = 4.

Let γ be a differentiable curve in R^n given by g: R → R^n for a ≤ t ≤ b.
The integral over γ of a real-valued function p, with its domain containing
γ, is defined to be

    ∫_a^b p(g(t)) |g'(t)| dt.                                      (2)

The function p can be thought of as the density per unit of length of a mass
distribution over γ, in which case the integral represents the total mass
distributed along γ. If we write the line integral of a vector field over γ in
the form

    ∫_a^b F(g(t)) · ( g'(t) / |g'(t)| ) |g'(t)| dt,                (3)

the relationship between Formulas (2) and (3) becomes clear. The integral
of the vector field over γ depends on the direction of the tangent to γ at
each point and not just the length of the tangent as in the case of the
integral of a real-valued function over γ. If the curve γ is parametrized by
arc length, then the two integrals take the respective forms

    ∫_0^{l(γ)} p(g(s)) ds        and        ∫_0^{l(γ)} F(g(s)) · t(s) ds,

where t(s) = (dg/ds)(s) is a tangent vector to γ of length 1.

It is clear from the definition of the line integral that, in general, the
value depends on the parametrization of the curve γ. The extent to which
the value is independent of parametrization is taken up in the exercises.

EXERCISES
^\
I. Compute the following line integrals.

_(a) § L xdx + x 2 dy + ydz, where L is given by g(t) = (t,t,i), for


< < t 1.

(b) J7- (x + y) dx + dy, where P is given by g(t) = 2


(t, t ), < < t I.

(c) L x dy and J y2 x dy, where y 1 is given by g(t) = (cos t, sin /) for


< t < 2-n, and where y 2 s given by h{t) = (cos /, sin t) for
' < < 4tt. t

(d) r
y
(dx + dy), where y 1 is given by (x, y) = (cos t, sin t),0 < < 2-n.
t

. [ dx
— + dy
- , .

(e) , where y 1 is the curve in part (d).

x
(f)
Jy (e dx + z dy + sin z dz), where y is given by (x, y, z) = (t, t
2
,
3
t ),

<t < 1.

(g) $ y
F • dx, where F(x, y, z, w) = (x, x, y, xw) and y is given by
(x, y, z, w) = (/, 1, /, /), < t <, 2.

2. Let y x be given by (x, y) = (cos /, sin /), < t < tt/2, and y 2 be given by
)( x> y) =- u,u),0<u <\. Compute j^ (fdx + g dy) and y2 (fdx +
(1 J"

g dy) for the choices of/ and g given below.


(a) f(x,y) = x,g(x, y) = x + 1.
(b)/(x,j) = x+j, <? (x .y) = 1. )

Mc) f(x, y) = , , ,
^(x, j)

3. Find the work done in moving a particle along the curve {x, y, z) = (t, 2
t, t ),

^0 < t < 2, under the influence of the field F(x, y, z) = {x + y, y, y).

4. Prove that if y is a curve given parametrically by a function 31 — Jl" with

x =/(/), 0<r<l,
and if —y is the curve described by

x =/(l -|), 0<f < 1,

then

F-dx = - F dx.

5. Let F(x, y) = (y, x) describe a vector field in Jl


2
. Find a curve y x passing
through the field and starting at the point (2,1) such that y 1 has length 1 and
the integral of F over yx is zero.

6. Prove that if y is given by & > Jl n for a — g


< < / b and y is then repara-
metrized by arc length s so that / = t(s), then

Hg^))'g'{t)dt =\ F(h(s))-t(s)ds,
Ja JO
where h(s) = g(t(s)) and t(s) = (dh/ds)(s). [Hint. Use the change of
variable theorem for integrals.]

7. Let 31" —F ** Jl n be a vector field and y a curve such that the line integral

jyF'dx exists. Prove that if |F(x)| < M, a constant, on y, then |j"


y
F«t/x| <
Ml(y).

8. (a) If ^(r) and h(u) are equivalent parametrizations of y as defined in


Exercise 3 of the previous section on arc length, show that § y F dx has •

the same value when computed with either parametrization.


(b) Find nonequivalent parametrizations of the circle x 2 + y 2 = 1 in Jl2
and a vector field F such that the integrals of F with respect to the two
parametrizations are different.
9. Find the total mass of the wire given by

       (x, y, z) = (6t, 4√2 t^{3/2}, 3t²),   0 ≤ t ≤ 4,

   (a) if the density at the point corresponding to t is t²;
   (b) if the density s units from the origin measured along the curve is (s + 1);
   (c) if the density at a point is equal to its distance from the origin measured
       in R³.

10. Show that if ∫_γ F · dx and ∫_γ G · dx exist, then ∫_γ (aF + bG) · dx =
    a ∫_γ F · dx + b ∫_γ G · dx, where a and b are any constants.

11. Show that if γ and η are smooth curves described by functions R --g--> Rⁿ
    defined on [a, b], and R --h--> Rⁿ defined on [b, c], with g(b) = h(b), then

        ∫_δ F · dx = ∫_γ F · dx + ∫_η F · dx,

    where δ is the curve given by

        f(t) = g(t),   a ≤ t ≤ b,
        f(t) = h(t),   b ≤ t ≤ c,

    and F is a continuous vector field on γ ∪ η.

12. Let a function g(t) represent the position of a particle of varying mass m(t)
    in R³ at time t. Then the velocity vector of the particle is v(t) = g′(t), and
    the force vector acting on the particle at g(t) is F(g(t)) = [m(t)v(t)]′.

    (a) Show that F(g(t)) · g′(t) = m′(t)v²(t) + m(t)v(t)v′(t), where v is the
        speed of the particle.
    (b) Show that if m(t) is constant, then the work done in moving the particle
        over its path between times t = a and t = b is W = (m/2)(v²(b) − v²(a)).
        (The function (1/2)mv²(t) is the kinetic energy of the particle.)

SECTION 5

PARTIAL DERIVATIVES

To extend the techniques of calculus to functions defined in Rⁿ we need
partial derivatives. Let f be a real-valued function with domain space Rⁿ.
For each i = 1, . . . , n, we define a new real-valued function called the
partial derivative of f with respect to the ith variable and denoted by
∂f/∂xᵢ. For each x = (x₁, . . . , xₙ) in the domain of f, the number
(∂f/∂xᵢ)(x) is by definition

5.1    (∂f/∂xᵢ)(x) = lim_{t→0} [f(x₁, . . . , xᵢ + t, . . . , xₙ) − f(x₁, . . . , xᵢ, . . . , xₙ)] / t.

The domain space of ∂f/∂xᵢ is Rⁿ, and the domain of ∂f/∂xᵢ is the subset
of the domain of f consisting of all x for which the above limit exists. Thus
the domain of ∂f/∂xᵢ could conceivably be the empty set. The number
(∂f/∂xᵢ)(x) is simply the derivative at xᵢ of the function of one variable
obtained by holding x₁, . . . , x_{i−1}, x_{i+1}, . . . , xₙ fixed and by considering f
to be a function of the ith variable only. As a result, the differentiation
formulas of one-variable calculus apply directly.

Example 1. Let f(x, y, z) = x²y + y²z + z²x. Then

    (∂f/∂x)(x, y, z) = 2xy + z²,
    (∂f/∂y)(x, y, z) = x² + 2yz,
    (∂f/∂z)(x, y, z) = y² + 2zx.

The partial derivatives at x = (1, 2, 3) are

    (∂f/∂x)(1, 2, 3) = 4 + 9 = 13,
    (∂f/∂y)(1, 2, 3) = 1 + 12 = 13,
    (∂f/∂z)(1, 2, 3) = 4 + 6 = 10.
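The limit in 5.1 can be checked numerically by a difference quotient taken in the ith
variable alone. The short Python sketch below (our own illustration; the helper name
partial is not from the text) recovers the values 13, 13, and 10 of Example 1.

    def f(x, y, z):
        # the function of Example 1
        return x**2 * y + y**2 * z + z**2 * x

    def partial(f, i, point, h=1e-6):
        # central difference quotient approximating (df/dx_i)(point)
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        return (f(*plus) - f(*minus)) / (2.0 * h)

    point = (1.0, 2.0, 3.0)
    print([round(partial(f, i, point), 4) for i in range(3)])
    # prints values close to [13.0, 13.0, 10.0]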

Example 2. Let f(u, v) = sin u cos v. Then

    ∂f/∂u = ∂(sin u cos v)/∂u = cos u cos v,

    ∂f/∂v = ∂(sin u cos v)/∂v = −sin u sin v,

    (∂f/∂v)(π/2, π/2) = −sin(π/2) sin(π/2) = −1.

We can repeat the operation of taking partial derivatives. The partial
derivative of ∂f/∂xᵢ with respect to the jth variable is ∂/∂xⱼ (∂f/∂xᵢ) and is
denoted by ∂²f/∂xⱼ∂xᵢ. This can be repeated indefinitely, provided the
derivatives exist. An alternative notation for higher-order partial deriva-
tives is illustrated below, in which the variable of differentiation is denoted
by a subscript.

    f_{xᵢ} = ∂f/∂xᵢ,

    f_{xᵢxⱼ} = ∂/∂xⱼ (∂f/∂xᵢ) = ∂²f/∂xⱼ∂xᵢ,

    f_{xᵢxᵢ} = ∂²f/∂xᵢ²,

    f_{xᵢxⱼxₖ} = ∂/∂xₖ (∂²f/∂xⱼ∂xᵢ) = ∂³f/∂xₖ∂xⱼ∂xᵢ.

Example 3. Consider f(x, y) = xy − x².

    f_x = ∂f/∂x = y − 2x,

    f_{xy} = ∂²f/∂y∂x = 1,

    f_{xx} = ∂²f/∂x² = −2,

    f_{yx} = ∂²f/∂x∂y = 1.

To interpret partial derivatives geometrically we can use the fact that,


for a real-valued function of a single real variable, the value of the
derivative at a point is the slope of the tangent line to the graph of the
function at that point. For illustrative purposes it will be enough to
consider the graph of a function R² --f--> R, namely, the set of points
(x, y, f(x, y)) in R³ where (x, y) is in the domain of f. Such a graph is
shown in Fig. 24 as a surface lying over a rectangle in the xy-plane. The
intersection of the surface with the vertical plane determined by the
condition y = b is a curve satisfying the conditions

z=f(x,y), y = b.

Consider as a subset of 2-dimensional space the curve defined by the

Figure 24

function g(x) = f(x, b). Its slope at x = a is

    g′(a) = (∂f/∂x)(a, b).

Similarly, at y = b the curve defined by h(y) = f(a, y) has slope equal to

    h′(b) = (∂f/∂y)(a, b).

The angles α and β shown in Fig. 24 therefore satisfy

    tan α = (∂f/∂x)(a, b),    tan β = (∂f/∂y)(a, b).

The numbers tan α and tan β are slopes of tangent lines to two curves
contained in the graph of the function f. For this reason it is natural to
try to define a tangent plane to the graph of f to be the plane containing
these two lines. (If f satisfies the condition of differentiability defined in
Section 8, this is what is done.) We see easily that the set of points (x, y, z)
satisfying

    z = f(a, b) + (x − a)(∂f/∂x)(a, b) + (y − b)(∂f/∂y)(a, b)    (1)

is a plane containing the tangent lines found above. In fact, specifying
y = b and x = a determines the two lines.

Example 4. The part of the graph of

    f(x, y) = 1 − 2x² − y²

corresponding to x ≥ 0, y ≥ 0 is shown in Fig. 25. The function f has
partial derivatives at (1/2, 1/2) given by

    (∂f/∂x)(1/2, 1/2) = −2,    (∂f/∂y)(1/2, 1/2) = −1.

Using Equation (1) to define the tangent plane to the graph of f at (1/2, 1/2),
we find also f(1/2, 1/2) = 1/4. Thus the equation of the tangent plane is

    z = 1/4 − 2(x − 1/2) − (y − 1/2)
      = −2x − y + 7/4.

We can sketch the tangent plane by drawing the two tangent lines in it
determined by x = 1/2 and y = 1/2. It is somewhat easier to locate three
points on the plane, for simplicity the axis intercepts

    (7/8, 0, 0),   (0, 7/4, 0),   (0, 0, 7/4).

The point of tangency is (1/2, 1/2, 1/4).

Figure 25
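Equation (1) is easy to evaluate mechanically once f and its partial derivatives at (a, b)
are known. The Python sketch below (offered only as an illustration; the name
tangent_plane is ours) reproduces the tangent plane of Example 4.

    def f(x, y):
        return 1.0 - 2.0 * x**2 - y**2

    def fx(x, y):
        return -4.0 * x                      # df/dx

    def fy(x, y):
        return -2.0 * y                      # df/dy

    def tangent_plane(a, b):
        # the affine function z(x, y) of Equation (1)
        return lambda x, y: f(a, b) + (x - a) * fx(a, b) + (y - b) * fy(a, b)

    z = tangent_plane(0.5, 0.5)
    print(z(0.5, 0.5))     # 0.25, the height at the point of tangency
    print(z(0.875, 0.0))   # 0.0, so (7/8, 0, 0) lies on the plane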


Continuity for functions of more than one variable is discussed
extensively in Section 7. At this point we shall consider briefly the case
R² --f--> R. For convenience we assume, for each x = (x, y) in the domain
of f, that f(z) is defined for all vectors z = (z, w) satisfying |x − z| < δ,
where δ is some positive number. We then say that f is continuous if, for
every point x in the domain of f,

    lim_{|x−z|→0} f(z) = f(x).

The limit relation means that f(z) can be made arbitrarily close to f(x)
if |x − z|, the distance from x to z, is made small enough. As usual,
the intuitive idea of continuity is that the values of the function f should
not change abruptly, resulting, for example, in breaks in the graph of f.
The graphs shown in Figs. 24 and 25 are those of continuous functions,
while Fig. 26 shows a simple example of the graph of a discontinuous
function. The condition that the domain D of f contain all points z
sufficiently near every x in D is expressed by saying that D is an open set.

It is a consequence of certain continuity conditions on f that the
higher-order partial derivatives of R² --f--> R are independent of the order
of differentiation. The precise statement follows, though we remark that
a slightly stronger theorem can be proved. (See Exercise 8 of Chapter 6,
Section 2.)

Figure 26

5.2 Theorem

Let R² --f--> R be a continuous function such that f_x, f_y, f_{xy}, and f_{yx}
are also continuous on the same domain as f. Then f_{xy} = f_{yx}.

Proof. Choose x, y, and δ > 0 so that the function

    F(h, k) = [f(x + h, y + k) − f(x + h, y)] − [f(x, y + k) − f(x, y)]

is defined whenever √(h² + k²) < δ. We now assume that neither
h nor k is zero, and apply the mean-value theorem to the function

    G(u) = f(u, y + k) − f(u, y)

on the interval with endpoints x and x + h. We find

    G(x + h) − G(x) = hG′(x₁),

where x₁ is between x and x + h. In terms of F and f, this last
equation is

    F(h, k) = h[f_x(x₁, y + k) − f_x(x₁, y)].

Now apply the mean-value theorem again, this time to the function
H(v) = f_x(x₁, v) on the interval with endpoints y and y + k. We find

    F(h, k) = hk f_{xy}(x₁, y₁),

where y₁ is between y and y + k.

Rewriting F in the form

    F(h, k) = [f(x + h, y + k) − f(x, y + k)] − [f(x + h, y) − f(x, y)]

allows us to follow the same general procedure, this time differ-
entiating with respect to y, then x. We find

    F(h, k) = hk f_{yx}(x₂, y₂),

where x₂ and y₂ lie between x and x + h, and y and y + k, respectively.

Equating the two expressions found for F(h, k), and canceling the
factor hk, gives

    f_{xy}(x₁, y₁) = f_{yx}(x₂, y₂).

Now let both h and k tend to zero. It follows that the distances

    √((x₁ − x)² + (y₁ − y)²)   and   √((x₂ − x)² + (y₂ − y)²)

both tend to zero; therefore, by the continuity of f_{xy} and f_{yx}, we get
f_{xy}(x, y) = f_{yx}(x, y). But the point (x, y) was chosen arbitrarily, so
f_{xy} = f_{yx} on the domain of f.

Theorem 5.2 can be applied successively to still higher-order partial
derivatives, provided the analogous differentiability and continuity require-
ments are satisfied. Moreover, by considering only two variables at a time,
we can apply it to functions Rⁿ --f--> R where n > 2. Thus, for the
commonly encountered functions, which have partial derivatives of
arbitrarily high order, we have typically

    ∂²f/∂x∂y = ∂²f/∂y∂x,

    ∂³g/∂x∂y∂x = ∂³g/∂x²∂y,

    ∂³h/∂z∂x∂y = ∂³h/∂x∂y∂z,    etc.

The last two formulas follow from repeated application of the two-variable
formula by interchanging two differentiations at a time.
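The conclusion of Theorem 5.2 is also easy to observe numerically: second-order
difference quotients taken in the two possible orders agree up to discretization error.
A small Python sketch, meant only as an illustration with an arbitrarily chosen smooth
function:

    import math

    def f(x, y):
        return math.sin(x * y) + x**3 * y**2

    def fxy(x, y, h=1e-4):
        # difference quotient for d/dy (df/dx)
        fx = lambda a, b: (f(a + h, b) - f(a - h, b)) / (2 * h)
        return (fx(x, y + h) - fx(x, y - h)) / (2 * h)

    def fyx(x, y, h=1e-4):
        # difference quotient for d/dx (df/dy)
        fy = lambda a, b: (f(a, b + h) - f(a, b - h)) / (2 * h)
        return (fy(x + h, y) - fy(x - h, y)) / (2 * h)

    print(fxy(0.7, 1.3), fyx(0.7, 1.3))   # the two values agree closely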

EXERCISES

1. Find ∂f/∂x and ∂f/∂y, where f(x, y) is:

   (a) x² + x sin (x + y).          (d) arctan (y/x).
   (b) sin x cos (x + y).           (e) x^y.
   (c) e^{x+y+1}.                   (f) log_x y.
                                    Ans. (f) ∂f/∂x = −(ln y)/(x(ln x)²).

2. Find ∂²f/∂y∂x and ∂²f/∂x∂y, where f is

   (a) xy + x²y³.     (b) sin (x² + y²).     (c) r.

3. Find the first-order partial derivatives of the following functions:

   (a) f(x, y, z) = x²e^{x+y+z} cos y.

   (b) f(x, y, z, w) = (x² − y²)/(z² + w²).

   (c) f(x, y, z) = x^{yz}.

4. Find ∂³f/∂x²∂y (x, y) if f(x, y) = log (x + y).

5. Show that ∂²f/∂x² + ∂²f/∂y² = 0 is satisfied by

   (a) log (x² + y²).     (b) x³ − 3xy².

6. If f(x, y, z) = 1/(x² + y² + z²)^{1/2}, show that

       f_{xx} + f_{yy} + f_{zz} = 0.

7. If f(x₁, x₂, . . . , xₙ) = 1/(x₁² + x₂² + · · · + xₙ²)^{(n−2)/2}, show that

       f_{x₁x₁} + f_{x₂x₂} + · · · + f_{xₙxₙ} = 0.

8. Prove directly that if f(x, y) is a polynomial, then

       ∂²f/∂x∂y = ∂²f/∂y∂x.
9..f

f(*.y>
-

SECTION 6

VECTOR PARTIAL DERIVATIVES

The geometric significance of the vector partial derivative is as follows.
If all the coordinates but one, say xᵢ, are held fixed, and xᵢ alone is
allowed to vary, then f(x) traces a curve in Rᵐ. The vector (∂f/∂xᵢ)(x) is a
tangent vector to the curve as defined in Section 2.

Example 1. Let

    f(u, v) = (u cos v, u sin v, v).

Then

    (∂f/∂u)(u₀, v₀) = (cos v₀, sin v₀, 0),
    (∂f/∂v)(u₀, v₀) = (−u₀ sin v₀, u₀ cos v₀, 1).
In Example 1 we can restrict the vector variable (u, v) so that v = v₀
is held fixed. Then f(u, v₀) describes a curve as u varies. Similarly
f(u₀, v) describes another curve as v varies. Such curves are called param-
eter curves in the range of f. We shall assume that the coordinate functions
of f are continuous functions. According to the definition of Section 2,
the vectors

    (∂f/∂u)(u₀, v₀)   and   (∂f/∂v)(u₀, v₀)

are tangent vectors to the two parameter curves obtained by varying u and
v, respectively, at the point where they intersect, namely, f(u₀, v₀). An
example is shown in Fig. 27, where the two curves are singled out by the
relations v = v₀ and u = u₀ that determine them; the varying parameter
in case v = v₀ is u, and in case u = u₀ it is v.

Figure 27

Under certain circumstances, and in particular in the most commonly
met examples, the parameter curves of a function R² --f--> R³ fit together
to form a surface in the range space R³. The technical conditions are
discussed in the last section of Chapter 4; at this point we shall simply
consider some examples.

Example 2. Returning to the function f of Example 1, the curves
determined by conditions of the form u = u₀ are spirals around the
vertical axis in R³ at a distance u₀ from the axis. Taken together, these
curves form a surface with the shape of a spiral ramp, as discussed in
Example 7 of Section 1. The ramp can also be thought of as formed by the
straight lines determined by the conditions v = v₀ and radiating out,
perpendicular to the vertical axis. The point (u, v) = (1, π/2) in the domain
of f corresponds to

    f(1, π/2) = (0, 1, π/2)

on the surface. At this point on the surface, the tangent vectors

    (∂f/∂u)(1, π/2) = (0, 1, 0),    (∂f/∂v)(1, π/2) = (−1, 0, 1)

determine a plane which, when translated parallel to itself so that it passes
through f(1, π/2), it is natural to call a tangent plane. A parametric
representation for this tangent plane would be given by

    u (∂f/∂u)(1, π/2) + v (∂f/∂v)(1, π/2) + f(1, π/2) = (−v, u + 1, v + π/2).

In general we shall say that if the range of a function R² --f--> R³ is a
surface with a tangent plane at f(u₀, v₀), then the tangent plane is repre-
sented parametrically by

    u (∂f/∂u)(u₀, v₀) + v (∂f/∂v)(u₀, v₀) + f(u₀, v₀),

with u and v as parameters.
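The parametric representation just given is easy to compute with: the two vector
partial derivatives supply the direction vectors of the plane, and f(u₀, v₀) supplies the
point it passes through. The Python sketch below (our illustration, using the spiral
ramp of Example 1) produces points of the tangent plane at (u₀, v₀) = (1, π/2).

    import numpy as np

    def f(u, v):
        # spiral ramp of Example 1
        return np.array([u * np.cos(v), u * np.sin(v), v])

    def fu(u, v):
        return np.array([np.cos(v), np.sin(v), 0.0])           # df/du

    def fv(u, v):
        return np.array([-u * np.sin(v), u * np.cos(v), 1.0])  # df/dv

    u0, v0 = 1.0, np.pi / 2

    def tangent_plane(u, v):
        # parametric representation of the tangent plane at f(u0, v0)
        return u * fu(u0, v0) + v * fv(u0, v0) + f(u0, v0)

    print(f(u0, v0))                 # approximately (0, 1, pi/2), the point of tangency
    print(tangent_plane(1.0, 0.0))   # a point of the plane, here (0, 2, pi/2)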

A function R³ --f--> R² can sometimes be interpreted as a 2-dimensional
flow as follows. A point (x, y, t) in R³ is carried by f into a point (x′, y′) =
f(x, y, t) in R², where (x, y) is to be thought of as the position of a par-
ticle in R² at time t = 0, and f(x, y, t) is to be the position of that same
particle after time t has elapsed. Figure 28 shows a picture of a 2-
dimensional flow. In the picture, the path of a particle that happens to be
at position (x, y) at time t = 0 is shown as a curve containing also the
position f(x, y, t) of the same particle t time units later. Thus, by holding
(x, y) fixed and allowing t to vary, we get a curve called a trajectory of
the flow. The family of all trajectories determined by f gives the geo-
metric picture of the flow. The tangent vectors f_t(x, y, t), or

    (∂f/∂t)(x, y, t),

for each fixed t form a vector field with the tail of the arrow f_t(x, y, t)
now located at the point (x′, y′) = f(x, y, t), rather than at (x, y). This
vector field is called the velocity field of the flow at time t. If the velocity
field of a flow is independent of t, then the flow is called steady. As the
next example shows, a steady flow is not necessarily one for which the
function f_t(x, y, t) is itself independent of t. On the other hand, it
should not be assumed that for a nonsteady flow every particle on a given
trajectory follows the same path as the particle that initiates the trajectory.

Figure 28

Example 3. The function

    f(x, y, t) = (x cos t − y sin t, x sin t + y cos t)

happens to be linear as a function of (x, y) and is in fact a rotation
about the origin in R², counterclockwise through an angle t, for
each fixed t. As a result, the trajectory of a point (x, y) is a circle
of radius √(x² + y²) centered at the origin. Some of the trajectories
are shown in Fig. 29. The velocity vector (∂f/∂t)(x, y, t) is denoted
f_t(x, y, t) in Fig. 29. Specifically, we have

    f_t(x, y, t) = (−x sin t − y cos t, x cos t − y sin t).

To find f_t(x, y, t) in terms of the position coordinates (x′, y′) =
f(x, y, t), we can solve the vector equation

    x′ = x cos t − y sin t,    y′ = x sin t + y cos t

for (x, y) and find

    x = x′ cos t + y′ sin t,    y = −x′ sin t + y′ cos t.

It follows that f_t(x, y, t) is given by

    f_t = ( 0  −1 ) (x′)
          ( 1   0 ) (y′).

Thus the velocity field v is given by

    v(x′, y′) = (−y′, x′).

Since v is independent of time, the flow is steady.

Figure 29
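The algebra of Example 3 can be checked numerically: compute ∂f/∂t at several times,
express the result at the current position (x′, y′), and observe that it always equals
(−y′, x′). A small Python sketch, given only as an illustration:

    import numpy as np

    def f(x, y, t):
        # the rotation flow of Example 3
        return np.array([x * np.cos(t) - y * np.sin(t),
                         x * np.sin(t) + y * np.cos(t)])

    def f_t(x, y, t):
        # velocity vector df/dt for the particle that started at (x, y)
        return np.array([-x * np.sin(t) - y * np.cos(t),
                          x * np.cos(t) - y * np.sin(t)])

    x, y = 2.0, 1.0
    for t in [0.0, 0.5, 1.0, 2.0]:
        xp, yp = f(x, y, t)                        # current position (x', y')
        print(f_t(x, y, t), np.array([-yp, xp]))   # the two printed vectors agree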

Example 3 illustrates a basic assumption that is always made about
a flow: two particles cannot occupy the same position at the same time;
in other words, if

    f(x₀, t) = f(x₁, t),

then x₀ = x₁. The assumption does not imply that trajectories cannot
cross one another (or themselves), but only that, for t fixed, the equation

    f(x, t) = y

has at most one solution x for any given vector y. Of course a function
R⁴ --f--> R³ of the form f(x, t) = f(x, y, z, t) would be used to describe
a 3-dimensional flow. Flows of higher dimension are also important in
theoretical mechanics.

EXERCISES
1. Find formulas for the vector partial derivatives ∂f/∂x(x, y) and ∂f/∂y(x, y) if

   (a) f(x, y) = (x + y, x − y, x² + y²).

   (b) f(x, y) = (eˣ cos y, eˣ sin y).

2. For each of the following functions, find the vector partial derivatives at the
   indicated point. Sketch the coordinate curves for which these vectors are
   tangents at the given point, and sketch the tangent plane to the range of f at
   that point.

   (a) f(u, v) = (u, v, u² + v²) at (1, 1, 2).

   (b) f(u, v) = (cos u sin v, sin u sin v, cos v) for (u, v) = (π/4, π/4).

   [Hint. The range of f in part (b) lies on a sphere.]

3. Find a parametric representation for the tangent plane of Exercise 2(a) and
also a representation for the line perpendicular to the tangent plane and
passing through the point of tangency.

4. Consider the 2-dimensional flow

       f(x, y, t) = (x + t, y + t²),   for t ≥ 0.

   (a) Sketch the trajectories of the flow that start at (x, y) = (0, 0), (0, 1), and
       (1, 1).
   (b) For t = 1 sketch the velocity vectors at the points f(x, y, 1), with (x, y)
       chosen as in part (a).
   (c) Show that the flow determined by f is not steady. [Hint. It is not enough
       to show that ∂f/∂t depends on t. Consider f(0, 0, 1) and ∂f/∂t(0, 0, 1) as
       compared with f(1, 1, 0) and ∂f/∂t(1, 1, 0).]

5. Consider the 2-dimensional flow

       f(x, y, t) = (x eᵗ, y eᵗ),   for t ≥ 0.

   (a) Sketch the trajectories of the flow that start at (x, y) = (0, 1) and (1, 1).
   (b) For t = 1 sketch the velocity vectors at the points f(x, y, 1), with (x, y)
       chosen as in part (a).
   (c) Solve the equation (x′, y′) = f(x, y, t) for (x, y) in terms of (x′, y′), and
       substitute the result into ∂f/∂t(x, y, t) to show that the flow determined
       by f is a steady flow.

6. The flow

       g(x, y, z, t) = (x cos t − y sin t, x sin t + y cos t, z)

   is a 3-dimensional extension of the flow considered in Example 3 of the
   text. Find ∂g/∂t and show that the flow determined by g is a steady flow.

SECTION 7

LIMITS AND CONTINUITY

Limits and continuity have been introduced for functions of one variable
in Section 2 and for real-valued functions in Section 5. Here we shall unify
and extend the definitions and show how to construct continuous vector
functions from continuous real functions. To begin, the definition of
limit is based on the idea of nearness. To say for example that

    lim_{x→0} (sin x)/x = 1

is to say that (sin x)/x is arbitrarily close to 1 provided x is sufficiently close
to 0. Nearness on the real-number line can be expressed by inequalities.
For example, |x − 3| < 0.4 says that the distance between the number x
and the number 3 is less than 0.4 or, equivalently, that x lies in the interior
of the interval with center 3 and half-length 0.4.

The statement "(sin x)/x is arbitrarily close to 1 provided x is suffi-
ciently close to 0" is translated in terms of inequalities as: For any positive
number ε, there exists a positive number δ such that if

    0 < |x − 0| = |x| < δ,

then

    |(sin x)/x − 1| < ε.

We shall extend these ideas to Rⁿ.

In Rⁿ a definition of limit also requires the means of asserting that one
point is close to another. Distance will be defined with respect to Euclidean
length. For any ε > 0 and point x₀ in Rⁿ, the set of all vectors x in Rⁿ
that satisfy the inequality

    |x − x₀| < ε

is a spherical ball with radius ε and center x₀. For example, if x₀ = (1, 2, 1),
the set of all x in R³ such that

    √((x − 1)² + (y − 2)² + (z − 1)²) < 0.5

is the ball shown in Fig. 30.

Let S be a subset of Rⁿ and x a point in Rⁿ. Then x is a limit point of S
if, for any ε > 0, there exists a point y in S such that 0 < |x − y| < ε.
Translated into English, the definition says that x is a limit point of S if


Figure 30

there are points in S other than x that are contained in a ball of arbitrarily
small radius with center at x. If, for example, S is the disk defined by
x² + y² < 1, together with the single point (2, 0), then the set of limit
points of S consists of S together with the circle x² + y² = 1. See Fig. 31.
However, the point (2, 0) is not a limit point of S even though it is a point
of S.
We come now to the definition of limit for a function Rⁿ --f--> Rᵐ. Let
y₀ be a point in Rᵐ and x₀ a limit point of the domain of f. Typical points
of the domain of f will be denoted x. Then y₀ is the limit of f at x₀ if, for
any ε > 0, there is a δ > 0 such that |f(x) − y₀| < ε whenever x satisfies
0 < |x − x₀| < δ. The relation is written

    lim_{x→x₀} f(x) = y₀.

To put it a little less formally, the definition says that f(x) is arbitrarily
close to y₀ provided x is sufficiently close to x₀ and x ≠ x₀. Geometrically
the idea is this: Given any ε-ball B_ε centered at y₀, there exists a δ-ball B_δ
centered at x₀ whose intersection with the domain of f, except possibly for
x₀ itself, is sent by f into B_ε. A 2-dimensional example is pictured in Fig. 32.

Figure 31

Figure 32

The statement

    lim_{x→x₀} f(x) = y₀

is also commonly read "The limit of f(x), as x approaches x₀, is y₀."

Example 1. Consider the function defined by

    f(t) = (cos t, sin t).

The domain of f is all of R, and at an arbitrary point t₀ of R, f has limit
f(t₀). To see this we use known facts about cos t and sin t and consider

    |f(t) − f(t₀)| = √((cos t − cos t₀)² + (sin t − sin t₀)²)    (1)
                   ≤ |cos t − cos t₀| + |sin t − sin t₀|.

This holds because √(a² + b²) ≤ |a| + |b|. (See Problem 15.) Since, using
the fact that sin t and cos t are continuous,

    lim_{t→t₀} cos t = cos t₀   and   lim_{t→t₀} sin t = sin t₀,

we can choose a δ > 0 such that, for any preassigned ε > 0,

    |cos t − cos t₀| < ε/2   and   |sin t − sin t₀| < ε/2,

whenever |t − t₀| < δ. Hence, Equation (1) shows that |f(t) − f(t₀)| < ε
whenever |t − t₀| < δ.
Example 2. Consider the real-valued function defined in all of R²
except (x, y) = (0, 0) by

    f(x, y) = 1/(x² + y²).

There is no limit as (x, y) → (0, 0); along the line y = x, for example, the
values grow without bound. We can write

    lim_{x→0} f(x) = ∞

to describe what happens.

Example 3. The range space and the domain are the same as in the
preceding example, and

    f(x, y) = (x² − y²)/(x² + y²).

There is no limit as (x, y) → (0, 0). If (x, y) approaches (0, 0) along the
line y = αx, we obtain

    lim_{x→0} (x² − y²)/(x² + y²) = lim_{x→0} x²(1 − α²)/(x²(1 + α²)) = (1 − α²)/(1 + α²).

The limit is obviously not independent of α; it equals 0 if α = 1, and 1 if
α = 0.

The functions in the last two examples are both real-valued. The
following theorem shows that the problem of the existence and evaluation
of a limit for any function Rⁿ --f--> Rᵐ reduces to the same problem for the
coordinate functions. The latter are, of course, real-valued.

7.1 Theorem

Given Rⁿ --f--> Rᵐ, with coordinate functions f₁, . . . , fₘ, and a
point y₀ = (y₁, . . . , yₘ) in Rᵐ, then

    lim_{x→x₀} f(x) = y₀    (2)

if and only if

    lim_{x→x₀} fᵢ(x) = yᵢ,   i = 1, . . . , m.    (3)

Proof. To say that Equations (2) and (3) are equivalent is to say
that the distance

    |f(x) − y₀| = √((f₁(x) − y₁)² + · · · + (fₘ(x) − yₘ)²)

can be made arbitrarily small if and only if all the absolute values

    |f₁(x) − y₁|, . . . , |fₘ(x) − yₘ|

can be made arbitrarily small. But the equivalence of these last two
statements follows at once from the elementary inequalities

    |f(x) − y₀| ≥ |fᵢ(x) − yᵢ|,   i = 1, . . . , m,

and

    |f(x) − y₀| ≤ √m max_{1≤i≤m} |fᵢ(x) − yᵢ|.

We leave these as exercises.

Example 4. Vector functions f₁ and f₂ are defined by

    f₁(t) = (t, sin t),    f₂(t) = (t, sin (1/t)).

Then

    lim_{t→0} f₁(t) = (0, 0),

but lim_{t→0} f₂(t) does not exist because the coordinate function sin (1/t) has
no limit at t = 0.

The concept of continuity is fundamental to calculus. Roughly speak-
ing, a continuous function f is one whose values do not change abruptly.
That is, if x is close to x₀, then f(x) must be close to f(x₀). This idea is
related to the notion of limit, and the definition of continuity is as follows:
A function f is continuous at x₀ if x₀ is in the domain of f and lim_{x→x₀} f(x) =
f(x₀). At a nonlimit or isolated point of the domain of f, we cannot ask for
a limit; instead we simply define f to be automatically continuous at such a
point. It is an immediate corollary of Theorem 7.1 that:

7.2 Theorem

A vector function is continuous at a point if and only if its coordinate


functions are continuous there.
is continuous at every value of t. On the other hand, the function

    sin (1/t)

is continuous except at t = 0.

7.3 Theorem

Every linear function Rⁿ --L--> Rᵐ is continuous, and for such an L
there is a number k such that

    |L(x)| ≤ k |x|,   for every x in Rⁿ.

Proof. We prove the inequality first. Let e₁, . . . , eₙ be the natural
basis for Rⁿ. If x = (x₁, . . . , xₙ), then x = x₁e₁ + · · · + xₙeₙ.
Since L is linear,

    L(x) = x₁L(e₁) + · · · + xₙL(eₙ).

By the homogeneity and triangle properties of the norm, we have

    |L(x)| ≤ |x₁| |L(e₁)| + · · · + |xₙ| |L(eₙ)|.

Setting k = |L(e₁)| + · · · + |L(eₙ)|, and using the fact that |xᵢ| ≤
|x| for i = 1, . . . , n, gives the desired inequality.

We use the inequality to show that L is continuous. If x and x₀
are vectors in Rⁿ, then

    |L(x) − L(x₀)| = |L(x − x₀)| ≤ k |x − x₀|.

This shows that, as x tends to x₀, L(x) tends to L(x₀).

A function is simply called continuous if it is continuous at every point
of its domain. From Theorem 7.2 we conclude that a continuous vector-
valued function of a single variable, R --f--> Rⁿ, is precisely one for which
the coordinate functions f₁, . . . , fₙ are continuous real-valued functions of
a real variable. The latter of course include most of the functions of ordi-
nary calculus, such as x², sin x, and, for x > 0, log x. These same functions
can be used to construct examples of the continuous coordinate functions
that go to make up a vector-valued function of a vector variable,
Rⁿ --f--> Rᵐ. For example, the coordinate functions of

    f(x, y) = (sin xy, cos xy)

are continuous. The continuity of these and other examples can be
deduced from repeated application of the following three theorems, to-
gether with Theorem 7.2.

7.4 The functions Rⁿ --Pₖ--> R, where Pₖ(x₁, . . . , xₙ) = xₖ, are con-
tinuous for k = 1, 2, . . . , n.

7.5 The functions R² --S--> R and R² --M--> R, defined by S(x, y) = x + y
and M(x, y) = xy, are continuous.

7.6 If Rⁿ --f--> Rᵐ and Rᵐ --g--> Rᵖ are continuous, then the composition
g ∘ f given by (g ∘ f)(x) = g(f(x)) is continuous wherever it is defined.

Proving the continuity of Pₖ, S, and M is left as an exercise.

Proof of 7.6. We assume that x₀ is a limit point of the domain of
g ∘ f and show that lim_{x→x₀} (g ∘ f)(x) = (g ∘ f)(x₀). If ε > 0, we can, by the
continuity of g, find a δ > 0 such that |g(y) − g(f(x₀))| < ε
whenever |y − f(x₀)| < δ and y is in the domain of g. But since f
is also continuous, we can find a δ′ > 0 such that |f(x) − f(x₀)| < δ
if |x − x₀| < δ′ and x is in the domain of f. It follows that

    |(g ∘ f)(x) − (g ∘ f)(x₀)| = |g(f(x)) − g(f(x₀))| < ε

for these same vectors x, and hence that g ∘ f is continuous at x₀.
Since x₀ was an arbitrary limit point of the domain of g ∘ f, the
proof is complete.

Example 6. The function f(x, y) = √(1 − x² − y²), defined for
|(x, y)| ≤ 1, is continuous, because it can be written

    f(x, y) = √(1 − (P₁(x, y))² − (P₂(x, y))²),

and so is a composition of continuous functions. Similarly, g(x, y) =
log (x + y), defined for x + y > 0, is continuous. The product of f and
g, given by

    h(x, y) = √(1 − x² − y²) log (x + y),

is defined on the half-disk which is the intersection of the domains
of f and g, and which is shown in Figure 33. The product is a con-
tinuous function because it is the composition of the continuous
vector function

    F(x, y) = (f(x, y), g(x, y))

with the function M of 7.5.

Figure 33

A vector x is an interior point of a subset of a vector space if
all points sufficiently close to x are also in the subset. Consider,
for example, the subset S of R² consisting of all points (x, y)
such that 0 ≤ x ≤ 2 and −1 ≤ y ≤ 1 (cf. Fig. 34(a)). The
points (1, 0), (1, 1/2), (1, −1), (2, 0) all belong to S. The first two are
interior points and the last two are not. More generally, the interior points
of S are precisely those (x, y) that satisfy the inequalities 0 < x < 2 and
−1 < y < 1. The formal definition is as follows: x₀ is an interior point
of a subset S of Rⁿ if there exists a positive real number δ such that x
belongs to S whenever |x − x₀| < δ.

A subset of Rⁿ, all of whose points are interior, is called open. Notice
that according to this definition the whole space Rⁿ is an open set. So also
is the empty subset ∅ of Rⁿ. Since ∅ contains no points, the condition
for openness is vacuously satisfied. An open set containing a particular
point is often called a neighborhood of that point.

Example 7. For any e > and any x in 31", we can show that the set
B€ , of all vectors x such that |x — x |
< e is open. In 3-dimensional space,
for example, B(
is the e-ball pictured in Fig. 30 for e = 0.5. Let x x be an
arbitrary point in B e . Then^ — x |
< e. We must show that every vector
sufficiently close to xx is in B e . Set, as in Fig. 34(b),

Then 6 is positive. Suppose x is any vector such that |x — xt \


< 6. By

i)

0!
© .(1,0) (2,0)

(1,-1)

(a) (b)

Figure 34
248 Derivatives Chap. 3

the triangle inequality,

|X - XqI = |(x - Xx) + (Xj - x )|

< |x - Xj| + |x, - x |

<6+ (e - <5) = e.

Hence, x is in B and
( , the proof is complete.

Example 8. Let / be a finite set of points in .'Jl". Then the set consisting
of all points in Jl" that are not in / is open. Thus a vector function that is

defined at all points of Jl" except for some finite set has for its domain an
open subset of Jl".

A set S is closed if it contains all its limit points, and the closure of S
is the set S together with its limit points. The boundary of S is the closure
of S with the interior of Thus an interval a < x < b, denoted
S deleted.
[a, b], is and is in fact a closed set in the
called a closed interval
above sense. An open interval a < x < b, denoted (a, b), is on
the other hand an open set. The diagram on the left shows an
open set S, then the closure of S, and then the boundary of S.

EXERCISES
In Exercises 1 and 2 take the domains of the functions to be as large as
possible.

I. At which points do the following functions fail to have limits?

ly + tan x\

jj \ln (x + y)

y
fx\ / x2 + 1

(b) f\
'

y
\y
2
- !

(c )/(^v)=-^-+j.

y, if x ^ 0.
= smx
[

if x = 0.

(e) /(/)

2. At which points do the following functions fail to be continuous?

. 1 1

fu
(b)/

(c)f(x,y)

(d,/
C
lu
(e)/

(f) /(x) =-

3. When x = (1, 2), draw the set of all vectors x in Si 2 such that

(a) |x - x |
< 3.

(b) |x - x |
= 3.

(c) |x - x |
< 3.

4. Identify as open, closed, both, or neither, the subset of &2 consisting of all

vectors x = (x, y) such that

(a) |x -(1,2)| <0.5.


(b) |x- (1,2)| <0.5.
(c) |x -(1,2)| < -0.5.
(d) < x < 3 and < y < 2.

(e) 2 < x < 3 and <y < 2.

(f) x²/a² + y²/b² < 1.

(g) x ≠ (0, 2) or (1, 2).


(h) x 2 + y 2 > 0.
(i) x > 0.
(j) x > y.
5. Let the set S consist of the points (x, y) in Jl 2 satisfying < x2 + y2 < 1,
together with the interval 1 <x < 2 of the x-axis.

(a) Describe the boundary of S.


(b) What are the interior points of S?
(c) Describe the closure of S.

6. (a) Prove that the union of an arbitrary collection of open subsets of 31" is

open.
(b) Prove that the intersection of a. finite collection of open subsets of 31" is

open.
(c) Give an example to show that a nonempty intersection of infinitely many
open subsets of &n may fail to be open.

7. Prove that x is a boundary point of a set S if and only if every neighborhood


of x contains a point of S and a point of the complement of S.

8. A vector function/is said to have a removable discontinuity at x if (a)/is


not continuous at x , and (b) there is a vector y such that lim/(x) = y .

x-^x
Give an example of a function/and a point x such that/is not continuous
atx and (l)/has a removable discontinuity at x (2) /does not have a ,

removable discontinuity at x .

9. Prove that every translation is a continuous vector function. A vector

function 31" — t
> 31" is a translation if there exists a vector y in 31" such
that t(x) =x + y for all x in 31".

10. Prove 7.4 and 7.5 of the text.

11. If/ and g are vector functions with the same domain and same range space,
prove
lim (f(x) + g(x)) = lim /(x) + lim g(x),
X—*X X—*-x x—>x

provided that lim /(x) and lim^(x) exist.


x—*-x x—*-x

12. Let L be a line and P a plane in 3l


3
. Is either P or L an open subset?

13. Let S be a closed subset of 31". Prove that the complement of 5 in 31" is open.

14. Converse of Exercise 1 3 : If S is an open subset of 31" , show that the comple-
ment of S in 31" is closed.

15. Show that √(a² + b²) ≤ |a| + |b|.

16. Prove the inequalities

        |(z₁, . . . , zₘ)| ≥ |zᵢ|,   i = 1, . . . , m,

    and

        |(z₁, . . . , zₘ)| ≤ √m max_{1≤i≤m} {|zᵢ|}.

17. Show that Theorem 7.3 can also be proved by using Theorems 7.4 and 7.5.

SECTION 8

DIFFERENTIALS

Many of the techniques of calculus have as their foundation the idea of
approximating a vector function by a linear function or by an affine
function. Recall that a function Rⁿ --A--> Rᵐ is affine if there exists a linear
function Rⁿ --L--> Rᵐ and a vector y₀ in Rᵐ such that

    A(x) = L(x) + y₀,   for every x in Rⁿ.


We shall see that affine functions form the basis of the differential calculus
of vector functions.

Example 1. In terms of coordinate variables an affine function is defined
by linear equations. For example, consider the point y₀ = (1, 2) and the
linear function R³ --L--> R² defined by the matrix

    ( 2  3  0 )
    ( 1  0  5 )

The affine function A(x) = L(x) + y₀, where x = (x, y, z) and A(x) = (u, v),
is described by the equations

    u = 2x + 3y + 1
    v = x + 5z + 2.

Since any system of linear equations can be systematically solved, any
affine function can be similarly analyzed.

We shall now study the possibility of approximating an arbitrary vector
function f near a point x₀ of its domain by an affine function A. The
general idea is the possibility of replacing near x₀ what may be a very
complicated function by a simple one. Before trying to decide whether or
not an approximation exists, we first have to say what we shall mean by an
approximation. We begin by requiring that f(x₀) = A(x₀). Since A(x) =
L(x) + y₀, where L is linear, we obtain f(x₀) = L(x₀) + y₀, and so

    A(x) = L(x − x₀) + f(x₀).    (1)

An apparently natural requirement is that

    lim_{x→x₀} (f(x) − A(x)) = 0.    (2)

However, Equation (2) may appear to say more than it really does. To
see what it really says, we observe that from (1) we get

    f(x) − A(x) = f(x) − f(x₀) − L(x − x₀).

Now every linear function is continuous, so lim_{x→x₀} L(x − x₀) = L(0) = 0.
Hence,

    lim_{x→x₀} (f(x) − A(x)) = lim_{x→x₀} (f(x) − f(x₀)).

It follows that Equation (2) is precisely the statement that the vector func-
tion f is continuous at x₀. This is significant, but it says nothing about L.
Thus, in order for our notion of approximation to distinguish one affine
function from another or to measure in any way how well A approximates
f, some additional requirement is necessary. A natural condition, and the
one we shall require, is that f(x) − A(x) approach 0 faster than x
approaches x₀. That is, we demand that

    lim_{x→x₀} [f(x) − f(x₀) − L(x − x₀)] / |x − x₀| = 0.

Equivalently, we can ask that f be representable in the form

    f(x) = f(x₀) + L(x − x₀) + |x − x₀| Z(x − x₀),

where Z(y) is some function that tends to zero as y tends to zero.

A function Rⁿ --f--> Rᵐ will be called differentiable at x₀ if

1. x₀ is an interior point of the domain of f.

2. There is an affine function that approximates f near x₀. That is,
   there exists a linear function Rⁿ --L--> Rᵐ such that

       lim_{x→x₀} [f(x) − f(x₀) − L(x − x₀)] / |x − x₀| = 0.

The linear function L is called the differential of f at x₀. The function f is
said simply to be differentiable if it is differentiable at every point of its
domain.

According to the definition, the domain of a differentiable function is
an open set. It is, however, convenient to extend the definition sufficiently
to speak of a differentiable function f defined on an arbitrary subset S of
the domain space. By such an f we shall mean the restriction to S of a
differentiable function whose domain is open.

Example 2. The function f defined by f(x, y) = √(1 − x² − y²) has
for its domain the disk x² + y² ≤ 1. Its graph is shown in Fig. 35. The
interior points of the domain are those (x, y) such that x² + y² < 1. We
shall see that this function is differentiable at all these points.

Figure 35

In dimension 1, an affine function has the form ax + b. Hence a real-
valued function f(x) of a real variable x that is differentiable at x₀ can be
approximated near x₀ by a function A(x) = ax + b. Since f(x₀) = A(x₀) =
ax₀ + b, we obtain

    A(x) = ax + b = a(x − x₀) + f(x₀).

The linear part of A (denoted earlier by L) is in this case just multiplication
by the real number a. The Euclidean norm of a real number is its absolute
value, so condition 2 of the definition of differentiability becomes

    lim_{x→x₀} [f(x) − f(x₀) − a(x − x₀)] / |x − x₀| = 0.

This is equivalent to

    lim_{x→x₀} [f(x) − f(x₀)] / (x − x₀) = a.

The number a is commonly denoted by f′(x₀) and is called the derivative of
f at x₀. The affine function A is therefore given by

    A(x) = f(x₀) + f′(x₀)(x − x₀).

Its graph is the tangent line to the graph of f at x₀, and a typical example is
drawn in Fig. 36. Thus we have seen that the general definition of differ-
entiability for vector functions reduces in dimension 1 to the definition
usually encountered in a one-variable calculus course.

A linear function 31" Si —


m must be representable by an m-by-n
matrix. We below that the matrix of any L satisfying conditions
shall see
1 and 2 of the definition of differentiability can be computed in terms of
partial derivatives of/. It follows that L is uniquely determined by/at each
interior point of the domain of/. Thus we can speak of the differential of/
and x and denote it by dx f.
To find the matrix of d if .'R
n
r f
'Jl
x o-'

m we consider the natural basis
,

(e u ... , e„) for the domain space %n . If x is an interior point of the


domain of/, the vectors

xj = x + teit j = I, ... ,w,


254 Derivatives Chap. 3

y
,

Sec. 8 Differentials 255

and the entire matrix of dx /"has the form

I&-W <7X X
&<w
C7X 2
... ft
0X„
w N

jK-W |t
dXi dx 2
W ...
^(
dx„
Xo ;

|^(Xo) ^(x ) ... |^(x„


,
ox! ox 2 ax n
This matrix is called the Jacobian matrix or derivative of/ at x , and is

denoted /'(x ). We can summarize what we have just proved as follows.

8.1 Theorem

If the function Rⁿ --f--> Rᵐ is differentiable at x₀, then the differential
d_{x₀}f is uniquely determined, and its matrix is the Jacobian matrix of
f. That is, for all vectors y in Rⁿ,

    d_{x₀}f(y) = f′(x₀)y.

While the linear transformation d_{x₀}f and its matrix f′(x₀) are logically
distinct, the last equation shows that they can be identified in practice,
provided it is understood that the matrix of d_{x₀}f is taken with respect to
the natural bases in Rⁿ and Rᵐ.

Example 3. The function

    f(x, y, z) = (x² + e^y, x + y sin z)

has coordinate functions f₁(x, y, z) = x² + e^y and f₂(x, y, z) = x +
y sin z. The Jacobian matrix at (x, y, z) is by definition

    f′(x, y, z) = ( ∂f₁/∂x(x, y, z)  ∂f₁/∂y(x, y, z)  ∂f₁/∂z(x, y, z) )
                  ( ∂f₂/∂x(x, y, z)  ∂f₂/∂y(x, y, z)  ∂f₂/∂z(x, y, z) )

so for this example

    f′(x, y, z) = ( 2x   e^y    0       )
                  ( 1    sin z  y cos z )

Thus the differential of f at (1, 1, π) is the linear function whose matrix is

    ( 2  e   0 )
    ( 1  0  −1 )
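Since each entry of the Jacobian matrix is an ordinary partial derivative, f′(x₀) can be
approximated numerically by difference quotients, one column at a time. The Python
sketch below (an illustration only; the helper name jacobian is ours) does this for the
function of Example 3 at (1, 1, π) and reproduces the matrix found above.

    import numpy as np

    def f(x):
        x1, y, z = x
        return np.array([x1**2 + np.exp(y), x1 + y * np.sin(z)])

    def jacobian(f, x0, h=1e-6):
        x0 = np.asarray(x0, dtype=float)
        m = len(f(x0))
        J = np.zeros((m, len(x0)))
        for j in range(len(x0)):
            e = np.zeros_like(x0)
            e[j] = h
            J[:, j] = (f(x0 + e) - f(x0 - e)) / (2 * h)   # jth column
        return J

    print(jacobian(f, [1.0, 1.0, np.pi]))
    # approximately [[2, e, 0], [1, 0, -1]]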

Example 4. The function f defined by

    f(x, y) = (x² + 2xy + y², xy² + x²y)

has differential d_x f at x = (x, y) represented by the Jacobian matrix

    ( 2x + 2y    2x + 2y  )
    ( y² + 2xy   2xy + x² )

How can one tell whether or not a vector function is differentiable?
Theorem 8.1 says only that if f is differentiable, then the differential is
represented by the Jacobian matrix. It does not go the other way. Thus
Examples 3 and 4 are inconclusive to the extent that we have simply
assumed that the functions appearing in them are differentiable. Just as
the derivative of a real-valued function of one variable may fail to exist,
so in general a vector function need not be differentiable at every point.
The next theorem is a convenient criterion for differentiability.

8.2 Theorem

If the domain of Rⁿ --f--> Rᵐ is an open set D on which all partial
derivatives ∂fᵢ/∂xⱼ of the coordinate functions of f are continuous,
then f is differentiable at every point of D.

Proof. Let L be the linear function defined by the Jacobian matrix of
f. The theorem will have been proved if it can be shown that L
satisfies

    lim_{x→x₀} [f(x) − f(x₀) − L(x − x₀)] / |x − x₀| = 0.    (3)

Since by Theorem 7.1 a vector function approaches a limit if and only
if the coordinate functions approach the coordinates of the limit, it is
enough to prove the theorem for the coordinate functions of f, or,
what is notationally simpler, to prove it under the assumption that f
is real-valued. If

    x = (x₁, . . . , xₙ)   and   x₀ = (a₁, . . . , aₙ),

set

    yₖ = (x₁, . . . , xₖ, aₖ₊₁, . . . , aₙ),   k = 0, 1, . . . , n,

so that y₀ = x₀ and yₙ = x. These vectors are illustrated
for three dimensions in Fig. 37. Then, because of cancel-
lation between successive terms in the sum below, only the
first and last terms survive, and we have

    f(x) − f(x₀) = Σ_{k=1}^{n} (f(yₖ) − f(yₖ₋₁)).

Because yₖ and yₖ₋₁ differ only in their kth coordinates, we
can apply the mean-value theorem for real functions of a
real variable to get

    f(yₖ) − f(yₖ₋₁) = (xₖ − aₖ) (∂f/∂xₖ)(zₖ),

where zₖ is a point on the segment joining yₖ and yₖ₋₁. Then

    f(x) − f(x₀) = Σ_{k=1}^{n} (xₖ − aₖ) (∂f/∂xₖ)(zₖ).

We also have, by the definition of L,

    L(x − x₀) = Σ_{k=1}^{n} (xₖ − aₖ) (∂f/∂xₖ)(x₀).

Hence

    |f(x) − f(x₀) − L(x − x₀)| = | Σ_{k=1}^{n} [(∂f/∂xₖ)(zₖ) − (∂f/∂xₖ)(x₀)](xₖ − aₖ) |
                               ≤ Σ_{k=1}^{n} |(∂f/∂xₖ)(zₖ) − (∂f/∂xₖ)(x₀)| |x − x₀|,

where we have used the triangle inequality and the fact that

    |xₖ − aₖ| ≤ |x − x₀|   for k = 1, 2, . . . , n.

Since the partial derivatives are assumed continuous at x₀, and the zₖ
tend to x₀ as x does, the limit Equation (3) follows immediately.

Figure 37

The entries in the Jacobian matrices that appear in Examples 3 and 4


are continuous functions. As a result of the theorem just proved, we
conclude that the two functions in those examples are differentiable.

Example 5. The function

    f(x, y) = √(1 − x² − y²),

defined for all (x, y) such that x² + y² < 1, is the same as in Example 2
except that we have removed the boundary of the disk so that the domain
is an open set. The Jacobian matrix is

    ( f_x   f_y ) = ( −x/√(1 − x² − y²)    −y/√(1 − x² − y²) ).

The entries are continuous on the open disk, and we conclude, by Theorem
8.2, that f is differentiable there.

Example 6. Consider the function

    f(t) = (cos t, sin t),   −∞ < t < ∞.

The derivative f′(t₀) is the 2-by-1 matrix

    ( −sin t₀ )
    (  cos t₀ )

It is instructive to consider the matrix as a vector in the range space of f
and to draw it with its tail at the image point f(t₀). For t₀ = 0, π/4, π/3,
π/2, and π, the respective matrices of the differential d_{t₀}f are

    (0, 1),  (−√2/2, √2/2),  (−√3/2, 1/2),  (−1, 0),  (0, −1).

These vectors, drawn with their tails at their corresponding image points
under f, are shown in Fig. 38. Evidently, for curves at least, the differential
is related to the notion of a tangent vector. The affine function that best
approximates f(t) in some neighborhood of t₀ is the vector function of t
given by

    f′(t₀)(t − t₀) + f(t₀),

which in terms of matrices becomes

    ( −sin t₀ ) t + ( cos t₀ + t₀ sin t₀ )
    (  cos t₀ )     ( sin t₀ − t₀ cos t₀ )

This is the equation of the line tangent to the range of f at f(t₀).

Figure 38

A good geometric picture of the differential of a real-valued function
f(x, y) at x₀ = (x₀, y₀) is obtained by looking at the surface defined
explicitly by f, together with the tangent plane to the surface at the point
(x₀, y₀, f(x₀)). An example is shown in Fig. 39. The tangent plane is the
graph of the affine function defined by

    A(x, y) = f(x₀, y₀) + f′(x₀) ( x − x₀ ),   all (x, y) in R².
                                 ( y − y₀ )

The difference A(x, y) − f(x₀, y₀) is a good approximation to the incre-
ment f(x, y) − f(x₀, y₀), provided (x, y) is close to (x₀, y₀). Figure 39 is
the analog of a similar picture, included in many one-variable calculus
texts, that exhibits the differential for real-valued functions of one variable.

Figure 39

The example given in Problem 19 below shows that continuity of the
derivative f′ is not necessary for differentiability of f. However, the
hypotheses of Theorem 8.2 are important enough to be given a special
name. A vector function f is continuously differentiable on an open set D
if the entries in the Jacobian matrix of f are continuous on D. Thus each
of the functions of Examples 2-6 is not only differentiable but even
continuously differentiable in the interior of its domain.

We finally remark that if a function f is differentiable, then f is necessarily
continuous (even though the entries in f′ may not be continuous). The
proof is very simple and we leave it as an exercise. (See Problem 11.)

EXERCISES
1. A linear function Hn —L
Rm is defined by a matrix (o„), and

is a vector in S\.
m Construct
. a specific example by choosing m and n, a
matrix (a^), and a vector

so that the affine function

A(x) = £(x) + y , all x in #n ,

(a) Explicitly defines a line in R 2


.

3
(b) Explicitly defines a line in Jl .

3
(c) Explicitly defines a plane in Jl .

3
(d) Parametrically defines a line in Si .

(e) Parametrically defines a plane in Si 3 .

What condition must the matrix (a„) satisfy in order to give an example
for (e)?

2. If f is the vector function defined by

       f(x, y) = (x² − y², 2xy),

   find the derivative of f at the following points:

1/V21
(a) (b) (c) (d)
1/V2J
2
Ans. (c)
2

3. Find the derivative of each of the following functions at the indicated points.

(a)/ x2 + y2 at
\yi \yi \h
(b) g(x, y, z) = xyz at (x, y, z) = (1 , 0, 0).

/sin ^
(c) /(/) = at t = -7 Ans.
cos t
A

(d) /(/) = / at / = 1.

>^.J)= I I

\x 2 + y2

(f) ^

(g) T u sin f I at Ans.

(h) /(*, j, z) = (x +j + z, xy + jz + zx, xyz) at (x, y, z).

4. Let P be a function from 3-dimensional to 2-dimensional Euclidean space


defined by P(x, y, z) = (x, y).

(a) What is the geometric interpretation of this transformation?


(b) Show that P is differentiable at all points and find the matrix of the

/l
differential of P at (1, 1, 1) Ans.
1

5. (a) Draw the curve in 9? defined parametrically by the function


g{t) = (t - 1 , t
2 - 3t + 2), - 00 < / < 00.

(b) Find the affine function that approximates £


(1) near t = 0. (2) near t = 2. [Ans. A(t) = (t - 1, t - 2).]
(c) Draw the curve defined parametrically by the affine function.

6. Let /be the function given in Exercise 2, and let

1\ /0.1\ /0\ /0.1

(a) Compute/(x + y t ) for i = 1,2, 3.

(b) Find the affine function /I that approximates /near x .

(c) Use A to find approximations to the vectors

/(x + y<), i = l,2,3.

7. (a) Sketch the surface in ft 3 defined explicitly by the function

f(x, y) = 4 — x2 — y2 .

(b) Find the affine function that approximates/


(1) near (0, 0). (2) near (2, 0). [Ans. A(x,y) = 8 - Ax.}
(c) Draw the graphs of the affine functions in (b).

8. What is the derivative of the affine function

bx b2 b3

\c 1 c2 c3

9. Prove that every linear function is its own differential.

10. Prove that if the vector function /is differentiable at x , then

/'(x )x = lim
/(x + fx ) -/(x o)
<-<-o t

11. Prove that if a vector function is differentiable at a point, then it is con-


tinuous there. [Hint. Multiply the quotient in the definition of differential
by |x - x |.]

12. At which points do the following functions fail to be differentiable? Why?

x v ^rr7
i \ _ (
\y! \ x +y
(c) f(u, v) = |u + v|.

(d) h(x, y) = (x sin (1/x), x² + y²)   if x ≠ 0,
    h(x, y) = (0, x² + y²)             if x = 0.

13. Prove that every translation is differentiable. What is the differential?



14. Consider the function Rⁿ --f--> R defined by f(x) = |x|² = x · x. Prove that
    f′(x)y = 2x · y, for any x and y in Rⁿ.

15. Is the function Rⁿ --g--> R defined by g(x) = |x| differentiable at every point
    of its domain?

16. Consider the function

        N(x) = max{|x₁|, . . . , |xₙ|},

    for x = (x₁, . . . , xₙ) in Rⁿ. For what points does the function fail to be
    differentiable? Answer for (a) n = 1, (b) n = 2, and (c) arbitrary n.

17. Show that if f and g are differentiable at x₀ and a is a real number, then
    f + g and af are differentiable at x₀ and

    (a) d_{x₀}(f + g) = d_{x₀}f + d_{x₀}g.

    (b) d_{x₀}(af) = a(d_{x₀}f).

    The domain of f + g is the intersection of the domains of f and g. [Hint.
    Use the uniqueness of the differential.]

18. Verify that the function

        f(x, y) = xy/(x² − y²),   x ≠ ±y,
        f(x, y) = 0,              x = ±y,

    has a Jacobian matrix at (0, 0), but that it is not differentiable there.

19. Show that the function defined by

        f(x) = x² sin (1/x),   x ≠ 0,
        f(x) = 0,              x = 0,

    is differentiable for all x but is not continuously differentiable at x = 0.

SECTION 9

NEWTON'S METHOD
In this section we treat Newton's method for approximating a solution of
an equation f(x) = 0, where Rⁿ --f--> Rⁿ is a nonlinear function. If f is
linear or affine, the discussion in Chapter 2 holds. We begin by looking
at the idea of approximating a vector in Rⁿ by the entries in a sequence of
vectors in Rⁿ. We are used to thinking of a real number like √2 as being
approximated by a sequence of rational numbers, say, 1, 1.4, 1.41,
1.414, . . . . The idea extends immediately to vectors.

Example 1. Consider the vector (√2, π) in R². Suppose that √2 is
approximated by the sequence 1, 1.4, 1.41, 1.414, . . . and that π is approxi-
mated by 3, 3.1, 3.14, 3.141, . . . . Then we can form the sequence of
vectors (1, 3), (1.4, 3.1), (1.41, 3.14), (1.414, 3.141), . . . to approximate
the vector (√2, π). It is natural to pair the numbers as we have, and not
in some other order, because it so happens that the entries in each pair are
accurate approximations to √2 and π to as many decimal places as are
given.

To make the ideas precise we define the limit of a sequence in Rⁿ. Let
x₁, x₂, x₃, . . . be an infinite sequence of vectors in Rⁿ, just one vector
corresponding to each positive integer. Suppose there is a vector x in Rⁿ
such that, for any ε > 0, there is an integer N for which

    |xₖ − x| < ε

whenever k > N. Then we say that the given sequence converges to the
limit x, and we write

    lim_{k→∞} xₖ = x.

We can summarize by saying that the sequence x₁, x₂, x₃, . . . converges
to x if |xₖ − x| is arbitrarily small for all sufficiently large k. Figure 40
shows a sequence with entries lying within ε of x whenever k > 6. We
leave as an exercise showing that if x₁, x₂, x₃, . . . and y₁, y₂, y₃, . . . are
the sequences of Example 1, approximating √2 and π respectively, then
the fact that lim xₖ = √2 and lim yₖ = π implies that lim (xₖ, yₖ) =
(√2, π).

Figure 40

We turn now to Newton's method for approximating a solution of an
equation f(x) = 0, in the case where f is real-valued and x is a real variable.
We assume that f is continuously differentiable. If the graph of f should
happen to be convex as shown in Fig. 41, then it is geometrically clear that
the tangent line to the graph at (x₀, f(x₀)) crosses the x-axis at a point x₁

Figure 41

which is a better approximation to the solution x than x₀ is. Having chosen
x₀ somewhat arbitrarily, and having found x₁, we can repeat the process.
This time we use the tangent line at (x₁, f(x₁)) and call its intersection
with the x-axis x₂. Thus we can generate a sequence of numbers x₁,
x₂, . . . approximating x.

In practice, we need a formula for computing the sequence x₁, x₂, . . . .
We observe first that the tangent line at (x₀, f(x₀)) has the equation

    y = f′(x₀)(x − x₀) + f(x₀).

Since the approximation x₁ is found by intersecting the tangent with the
x-axis, we set y = 0 in the above equation and solve for x₁. The result is

    0 = f′(x₀)(x₁ − x₀) + f(x₀),

whence

    f′(x₀)(x₁ − x₀) = −f(x₀).

If f′(x₀) ≠ 0,

    x₁ − x₀ = −f(x₀)/f′(x₀)

or

    x₁ = x₀ − f(x₀)/f′(x₀).

Having found x₁, to find x₂ we need only replace x₀ by x₁ in the last
formula to get

    x₂ = x₁ − f(x₁)/f′(x₁).

In general, we compute x_{k+1} by

    x_{k+1} = xₖ − f(xₖ)/f′(xₖ).    (1)

Example 2. The equation x² − 3 = 0 has two solutions, √3 and
−√3. To approximate √3 we choose x₀ = 2 and compute x_{k+1} from xₖ
by Formula (1), which in this case is

    x_{k+1} = xₖ − (xₖ² − 3)/(2xₖ) = (xₖ² + 3)/(2xₖ).

Thus we get x₁ = 7/4 = 1.75. Substituting this value in the above formula for
k = 1 gives x₂ = 97/56 ≈ 1.7321. This approximation to √3 happens to be
correct to three decimal places, though we have not proved that.
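Formula (1) translates directly into a few lines of code. The Python sketch below (ours,
offered only as an illustration) carries out the iteration of Example 2 starting from x₀ = 2.

    def newton(f, fprime, x0, steps=5):
        x = x0
        for _ in range(steps):
            x = x - f(x) / fprime(x)      # Formula (1)
        return x

    f = lambda x: x**2 - 3
    fprime = lambda x: 2 * x
    print(newton(f, fprime, 2.0))         # approximately 1.7320508, i.e. sqrt(3)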

We can follow a similar procedure to the one just described if f is a
function from Rⁿ to Rⁿ. The difference is that, in this case, x₁, x₀, and
f(x₀) are vectors in Rⁿ and f′(x₀) is an n-by-n Jacobian matrix. To
approximate a solution of a vector equation f(x) = 0, we consider the
equation that defines the value of the best affine approximation to f near
x₀, that is,

    y = f′(x₀)(x − x₀) + f(x₀),    (2)

where x₀ is chosen as an initial approximation to the desired solution x.
In Fig. 41, Equation (2) can be interpreted as the equation of the tangent
to the graph of f at (x₀, f(x₀)). As before, we set y = 0 in Equation (2) to
get

    0 = f′(x₀)(x₁ − x₀) + f(x₀),

whence

    f′(x₀)(x₁ − x₀) = −f(x₀).

Now if f′(x₀) has an inverse matrix [f′(x₀)]⁻¹, we can apply the inverse to
both sides, getting

    x₁ − x₀ = −[f′(x₀)]⁻¹ f(x₀).

Finally

    x₁ = x₀ − [f′(x₀)]⁻¹ f(x₀).

In this equation [f′(x₀)]⁻¹ f(x₀) is the vector obtained by applying the
inverse of the matrix f′(x₀) to the vector f(x₀). The vector x₁ is the first
improvement on the initial approximation x₀ to the solution x.

What we have just done can be repeated, replacing x₀ by x₁ to get

    x₂ = x₁ − [f′(x₁)]⁻¹ f(x₁).

After k + 1 steps we have

9.1    x_{k+1} = xₖ − [f′(xₖ)]⁻¹ f(xₖ).

Example 3. Consider the pair of equations

    x² + y² = 2
    x² − y² = 1.

To find approximate solutions by Newton's method we define

    f(x, y) = (x² + y² − 2, x² − y² − 1)

and solve the equation f(x, y) = 0. Clearly f is a function from R² to R².
Since we require both x² + y² − 2 = 0 and x² − y² − 1 = 0, it will be
helpful to sketch the curves defined by these two equations. The exact
solutions are represented by the four points of intersection of the circle
x² + y² − 2 = 0 and the hyperbola x² − y² − 1 = 0 shown in Fig. 42.
The choice of an initial approximation depends on which solution we want
to approximate. Looking for the solution in the first quadrant, we try
x₀ = (1, 1). Since

    f(x, y) = ( x² + y² − 2 )
              ( x² − y² − 1 )

we have

    f′(x, y) = ( 2x   2y )
               ( 2x  −2y )

and

    [f′(x, y)]⁻¹ = ( 1/(4x)   1/(4x) )
                   ( 1/(4y)  −1/(4y) )

Therefore,

    (x) − [f′(x, y)]⁻¹ f(x, y) = ( (2x² + 3)/(4x) )    (3)
    (y)                          ( (2y² + 1)/(4y) )

This vector is the analog of the expression (x² + 3)/(2x) in the previous
example and is the formula by which the sequence of approximations is
actually computed. Setting x₀ = (x₀, y₀) = (1, 1), we get

    x₁ = (1.25, 0.75).

Figure 42
Substituting x₁ into (3) gives

    x₂ = ( (2(1.25)² + 3)/(4(1.25)) )  =  ( 1.225   )
         ( (2(0.75)² + 1)/(4(0.75)) )     ( 0.70833 )

Substituting our approximate value for x₂ gives

    x₃ = ( (2(1.225)² + 3)/(4(1.225)) )      =  ( 1.22474  )
         ( (2(0.70833)² + 1)/(4(0.70833)) )     ( 0.707108 )

Similarly we get

    x₄ = (1.22474, 0.707107)

and

    x₅ = (1.22474, 0.707107).

As in the previous example, further iteration using only five places after
the decimal point can't produce any change.

In this example, the two simultaneous equations can actually be solved
by elimination to yield x = (√1.5, √0.5). The approximation x₄ =
(1.22474, 0.707107) happens to be correct to that many decimal places.
The other three vector solutions can be obtained by symmetry. Referring
to Fig. 42, we get them by changing one or both signs of the coordinates to
minus. The numerical procedure could have been applied by taking as
initial estimate x₀ one of the vectors (−1, −1), (−1, 1), or (1, −1).
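Formula 9.1 is equally easy to program once the Jacobian matrix is available; in a
numerical sketch it is customary to solve the linear system f′(xₖ)(x_{k+1} − xₖ) = −f(xₖ)
rather than form the inverse explicitly. The following Python illustration (not part of
the text) reproduces the first-quadrant solution of Example 3.

    import numpy as np

    def f(v):
        x, y = v
        return np.array([x**2 + y**2 - 2, x**2 - y**2 - 1])

    def fprime(v):
        x, y = v
        return np.array([[2 * x,  2 * y],
                         [2 * x, -2 * y]])

    x = np.array([1.0, 1.0])              # initial approximation
    for _ in range(5):
        # Newton step 9.1: x <- x - [f'(x)]^(-1) f(x)
        x = x - np.linalg.solve(fprime(x), f(x))
    print(x)                               # approximately [1.224745, 0.707107]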

In choosing an initial approximation x_0, some care must be used in getting a sufficiently close approximation. For instance, if in Example 3 we wanted the solution in the first quadrant, then it is clear that too gross an error in choosing x_0 could lead to approximating the wrong solution. In many examples a sketch or similar geometric analysis of the function f will show how x_0 should be chosen.

In using Newton's method in large dimensions, it may be very time-consuming to invert the matrix f'(x_k) at each step of the iteration. In such cases we can use the modified Newton formula

9.2    x_{k+1} = x_k − [f'(x_0)]^{-1} f(x_k)

to derive a sequence of approximations to a solution of f(x) = 0. For k = 0, the formula defining x_1 is the same as the Newton formula. For k ≥ 1, x_{k+1} as defined by Equation 9.2 will in general be different from the corresponding value determined by the Newton formula, because the matrix [f'(x_0)]^{-1} remains the same at each step in Equation 9.2.

Example 4. Returning to the equation of Example 3, namely,

    ( x^2 + y^2 − 2 )   ( 0 )
    ( x^2 − y^2 − 1 ) = ( 0 ),

we apply Equation 9.2 with x_0 = (1, 1). Then

    [f'(x_0)]^{-1} = ( 1/4   1/4 )
                     ( 1/4  −1/4 ),
so
    ( x )                            ( (−2x^2 + 4x + 3)/4 )
    ( y ) − [f'(x_0)]^{-1} f(x, y) = ( (−2y^2 + 4y + 1)/4 ).

Then x_1, defined by x_1 = x_0 − [f'(x_0)]^{-1} f(x_0), is

    x_1 = ( (−2(1)^2 + 4(1) + 3)/4, (−2(1)^2 + 4(1) + 1)/4 ) = (1.25, 0.75).

In the next step

    x_2 = ( (−2(1.25)^2 + 4(1.25) + 3)/4, (−2(0.75)^2 + 4(0.75) + 1)/4 ) = (1.21875, 0.71875).

Continuing, we arrive at

    (1.22474, 0.707107),

which agrees with the result obtained in Example 3, though it takes more steps.
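A sketch of the modified iteration 9.2 in Python, with the Jacobian evaluated only once at x_0 and reused at every step (the function names repeat those of the earlier sketches and are our own; f and fprime are as in Example 3):

    import numpy as np

    def modified_newton(f, fprime, x0, steps=20):
        # x_{k+1} = x_k - [f'(x_0)]^{-1} f(x_k): the Jacobian is evaluated
        # only at the initial point and reused at every step.
        x = np.asarray(x0, dtype=float)
        J0 = fprime(x)                      # computed once
        for _ in range(steps):
            x = x - np.linalg.solve(J0, f(x))
        return x

Run with x_0 = (1, 1), roughly ten steps of 9.2 are needed to fix six figures of (1.22474, 0.707107), versus three or four steps for the full Newton formula 9.1.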

In deciding whether to use the Newton formula 9.1 or its modification 9.2, two facts should be remembered. In general, Formula 9.1 produces faster convergence than 9.2; that is, it achieves a smaller error in a given number of steps. On the other hand, Formula 9.2 has the advantage that it requires calculation of the inverse of the derivative matrix f' at only one point. Thus if computing the inverse matrices [f'(x_k)]^{-1} is going to be particularly time-consuming, it may be worth taking the extra iteration steps that 9.2 may require to achieve the desired accuracy.

In Section 5 of the Appendix we prove Theorem 5.1, which guarantees convergence of the sequence in 9.2 under certain conditions. Theorem 5.1 is also used to prove the inverse-function theorem, taken up in Section 6 of Chapter 4.

EXERCISES
1. (a) Sketch the graph of/(x) = $x - x for -2 < x < 2.
(b) Sketch the tangent lines to the graph of/at x = J, x = J, and x = —J.
(c) For each of the three choices for x in part (b), what solution of/(*) =
can the Newton iteration be expected to converge to ?
(d) Discuss the choice x = ^3/9 for an initial approximation to a solution of
fix) = 0.

2. Show that Newton's method will not produce convergence to the unique solution of ∛x = 0 unless the initial choice x_0 = 0 is made. [Hint. Show that x_k = (−2)^k x_0.]
3. (a) To approximate the solution of cos x − x = 0 by Newton's method, show that when x_0 is chosen, then for k ≥ 0,

    x_{k+1} = (x_k sin x_k + cos x_k) / (1 + sin x_k).

(b) Use a computer to find x_10.

4. Find approximate solutions to the pair of equations

    x^2 + y − 1 = 0
    x + y^2 − 2 = 0

by following the steps below:

(a) Sketch the curves satisfying each of the two equations.

(b) Defining f by

    f(x, y) = (x^2 + y − 1, x + y^2 − 2),

find f'(x, y), [f'(x, y)]^{-1}, and (x, y) − [f'(x, y)]^{-1} f(x, y).

(c) Using the sketch in part (a), choose an initial approximation x_0 = (x_0, y_0) to the solution of f(x, y) = 0 that lies in the fourth quadrant of the xy-plane.

(d) Compute x_1 = (x_1, y_1) by Formula 9.1.

(e) Use a computer to find x_5 = (x_5, y_5).

5. Let

    f(x, y, z) = ( x + y + z, x^2 + y^2 + z^2, x^3 + y^3 + z^3 )

and find f'(1, 2, −1). Taking x_0 = (1, 2, −1), apply the modified Newton Formula 9.2 to approximate a solution to f(x, y, z) = (2, 6, 8).

6. Let

    g(u, v) = ( u^2 + uv^2, u + v^6 ).

Noting that g(1, 1) = (2, 2), use Newton's method 9.1 or its modification 9.2 to approximate a solution to g(u, v) = (1.9, 2.1).
4

Vector Calculus

SECTION 1

DIRECTIONAL DERIVATIVES

A partial derivative of a real-valued function measures the rate of change of the function in a particular coordinate direction. To measure the rate of change in an arbitrary direction we use the directional derivative. Recall that a unit vector u is a vector of length 1, that is, |u| = 1. Let f be a real-valued function from R^n to R, and let u be a unit vector in the domain space R^n. The directional derivative of f with respect to u, denoted by ∂f/∂u, is the real-valued function defined by

    (∂f/∂u)(x) = lim_{t→0} [ f(x + tu) − f(x) ] / t.

The domain of ∂f/∂u is the subset of the domain of f for which the above limit exists.
The connection between the derivative with respect to a unit vector
and the differential is provided in the following theorem.

1.1 Theorem

If f is differentiable at x, then

    (∂f/∂u)(x) = f'(x)u

for every unit vector u in R^n.


Proof. The existence of the derivative f'(x) implies that

    lim_{t→0} [ f(x + tu) − f(x) − f'(x)(tu) ] / |tu| = 0,

which is equivalent to

    lim_{t→0} (1/|u|) [ ( f(x + tu) − f(x) ) / t − f'(x)u ] = 0.

This in turn is equivalent to

    lim_{t→0} [ f(x + tu) − f(x) ] / t = f'(x)u,

and the proof is finished.

The equation

    (∂f/∂u)(x) = f'(x)u

shows that, for a differentiable function, the directional derivative involves nothing very new. However, it does give an important interpretation of the derivative f'(x) applied to a unit vector u.

For each vector u in R^n of length |u| = 1, we have defined the directional derivative of f in the direction of u to be the function ∂f/∂u. The reason for the name "directional derivative" is that in a Euclidean space there is a natural way to associate a vector to each direction, namely, take the unit vector in that direction. The number (∂f/∂u)(x) is then regarded as a standard measure of the rate of change of the values of f in the direction of u.

Example 1. The domain space of the function

    f(x, y, z) = xyz + e^{2x+y}

is assumed to be Euclidean 3-dimensional space. We find the directional derivative of f in the direction of u = (1/2, 1/2, 1/√2). Setting x = (x, y, z) and using Theorem 1.1, we obtain

    (∂f/∂u)(x) = f'(x)u = ( yz + 2e^{2x+y}, xz + e^{2x+y}, xy ) · (1/2, 1/2, 1/√2)
               = (1/2)( yz + xz + √2 xy + 3e^{2x+y} ).

It follows that the directional derivative of f in the direction of u has at the origin the value (∂f/∂u)(0, 0, 0) = 3/2.
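The value 3/2 can be checked numerically from the definition, since (∂f/∂u)(0) is the one-variable derivative of t ↦ f(0 + tu) at t = 0. A Python sketch assuming NumPy; the step size h is an arbitrary choice of ours:

    import numpy as np

    f = lambda x, y, z: x*y*z + np.exp(2*x + y)
    u = np.array([0.5, 0.5, 1/np.sqrt(2)])      # a unit vector: |u| = 1

    h = 1e-6
    x0 = np.zeros(3)
    approx = (f(*(x0 + h*u)) - f(*(x0 - h*u))) / (2*h)   # central difference in t
    print(approx)                                        # about 1.5 = 3/2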

Let f be a function from R^2 to R whose graph is a surface in 3-dimensional Euclidean space, and let u be a unit vector in R^2, i.e., |u| = 1. An example is shown in Fig. 1.

Figure 1

The value of the directional derivative ∂f/∂u at x = (x, y) is by definition

    (∂f/∂u)(x) = lim_{t→0} [ f(x + tu) − f(x) ] / t.

The distance between the points x + tu and x is given by

    |(x + tu) − x| = |tu| = |t|.

Hence, the ratio

    [ f(x + tu) − f(x) ] / t

is the slope of the line through the points (x + tu, f(x + tu)) and (x, f(x)). It follows that the limit, (∂f/∂u)(x), of the ratio is the slope of the tangent line at (x, f(x)) to the curve formed by the intersection of the graph of f with the plane that contains x and x + u and is parallel to the z-axis. This curve is drawn with a dashed line in the figure. The angle γ shown in Fig. 1


therefore satisfies the equation

    tan γ = (∂f/∂u)(x).

The situation here is a generalization of that illustrated in Fig. 24 of Chapter 3, Section 5. If we choose u = (1, 0), the angle γ becomes the angle α in the earlier figure and

    ∂f/∂u = ∂f/∂x.

On the other hand, if we take v = (0, 1), then γ is the angle β in Figure 24, and

    ∂f/∂v = ∂f/∂y.
The mean-value theorem for real-valued functions of a real variable can be extended to real-valued functions of a vector variable as follows.

1.2 Theorem

Let f be a function from R^n to R, differentiable on an open set containing the segment S joining two vectors x and y in R^n. Then there is a point x_0 on S such that

    f(y) − f(x) = f'(x_0)(y − x).

Proof. Consider the function g(t) = f(t(y − x) + x), defined for 0 ≤ t ≤ 1. Let us set m(t) = t(y − x) + x. Then if h is a real number,

    g(t + h) − g(t) = f((t + h)(y − x) + x) − f(t(y − x) + x)
                    = f'(m(t)) h(y − x) + |h(y − x)| Z(h(y − x)),

where lim_{h→0} Z(h(y − x)) = 0. This last relation simply says that f is differentiable at m(t). Dividing by h, we get

    [ g(t + h) − g(t) ] / h = f'(m(t))(y − x) ± |y − x| Z(h(y − x)),

whence

    g'(t) = f'(m(t))(y − x).                                        (1)

But by the mean-value theorem for functions of one variable,

    g(1) − g(0) = g'(t_0)

for some t_0 satisfying 0 < t_0 < 1. Setting t = t_0 and m(t_0) = x_0 in Equation (1) gives the required formula.

One of the most important conclusions to be drawn from the mean-value theorem for functions of one variable is that a function with zero derivative on an interval is constant. For a function f of a vector variable, we shall replace the domain interval by an open set D in R^n that we assume to be polygonally connected. A polygonally connected set S is one such that any two points in it can be joined by a polygon in S, that is, by a finite sequence of line segments lying in S.

1.3 Theorem

If f from R^n to R^m is differentiable on a polygonally connected open set D and f'(x) = 0 for every x in D, then f is constant.

Proof. We need only prove that each coordinate function of f is constant, and so we can assume that f is real-valued. If x_1 and x_2 are points of D joined by a single line segment, then Theorem 1.2 and the assumption that f'(x) = 0 in D together imply that f(x_1) = f(x_2). Obviously the same conclusion holds for two points x_1 and x_2 joined by a finite sequence of segments. So f is constant.

EXERCISES
1. For each of the following functions defined on 3-dimensional Euclidean
space, find the directional derivative in the direction of the unit vector u at
the point x.

(a) f(x,y,z) = x 2 + y 2 + z 2 u = (1/Vj, 1/Vj, l/Vj), x = (1,0,1).


,

(b) h(x, v, z) = xyz, u = (cos a sin sin a sin P, cos /3), x = (1 0, 0).
/S, ,

(c) f(x,y), x = (x, y), and y = (cos a, sin a). (Assume that/is real-valued
and differentiable.)

2. For each of the following real-valued functions defined on Euclidean space,


find the directional derivative at x in the direction indicated.

(a) f{x,y) = x2 — y2 at x = (1, 1) and in the direction

(11)
W5' V5/'
Arts.
V5 'J

x
(b) f(x,y) = e sin v at x = (1, 0) and in the direction (cos a, sin a).

x+y
(c) f{x,y) = e at x = (1, 1) in the direction of the curve defined by
g{t) = (r
2 3
t ) at g(l) for / increasing.
,

3. Find the absolute value of the directional derivative at (1, 1,0) of the
function /(x, j, z) = x + ye in the direction of the tangent line at^(0) to
2 z

the curve in 3-dimensional Euclidean space defined parametrically by^(r) =


2 2
(3r + t + 1,2/,? ).

4. Find the directional derivative at (1,0,0) of the function f(x,y,z) =


x 2 + ye z in the direction of increasing t along the curve in Euclidean ft 3

defined by g(t) = {t
2
- t + 2, /, t + 2) at^(0). [Am. -I/V3.]

5. Find the absolute value of the directional derivative at (1, 0, 1) of the function f(x, y, z) = 4x^2 y + y^2 z in the direction of the perpendicular at (1, 1, 1) to the surface in Euclidean 3-space defined implicitly by x^2 + 2y^2 + z^2 = 4. [Ans. 8/√6.]

6. (a) Show that the vector (yfa — y-iz 1


,z 1 x 2 — z 2x t , xxy 2 — x 2)'i) is per-
pendicular to (x 1 ,y 1 Zj) and (x 2 , ,
y2 , z 2 ).
(b) Find the absolute value of the directional derivative at (1, 2, 1) of the
function f(x, y, z) = x 3 + y + z in the direction of the perpendicular
2

at (1, 2, 1) to the surface defined parametrically by

(x, y, z) = (u v, u
2
+ v, u). [Ans. 2j V 3.]
7. If the temperature at a point (x, y, z) of a solid ball of radius 3 centered at
is given by T{x, y, z) = yz + zx + xy, find the direction in which T
-^ (0, 0, 0)

is increasing most rapidly at (1, 1,2).

8. Show that the mean-value formula of Theorem 1.2 can be written in the form

    [ f(y) − f(x) ] / |y − x| = (∂f/∂u)(x_0),

where u = (y − x)/|y − x|.

9. Show that the function f defined by

    f(x, y) = x|y| / √(x^2 + y^2)   for (x, y) ≠ (0, 0),
            = 0                     for (x, y) = (0, 0),

has a directional derivative in every direction at (0, 0), but that f is not differentiable at (0, 0).

SECTION 2

THE GRADIENT

In the first section of Chapter 3 we looked at some examples of vector fields. Many of the most important ones arise as gradient fields, which are described below.

If f is a differentiable real-valued function from R^n to R, then the function ∇f defined by

    ∇f(x) = ( (∂f/∂x_1)(x), ..., (∂f/∂x_n)(x) )

is called the gradient of f. The gradient is evidently a function from R^n to R^n, and it is most often pictured as a vector field.

Example 1. The function f(x, y) = (1/3)(x^3 + y^3) is differentiable in all of R^2, and so ∇f(x, y) = (x^2, y^2) is also defined in R^2. The field is shown in Fig. 2 at several points.

Figure 2

The function g(x, y, z) = x^2 + y^2 + z^2 has gradient ∇g(x, y, z) = (2x, 2y, 2z), and the direction of the field is directly away from the origin at each point.

The gradient of a function is important for several reasons. To begin, we remark that the directional derivative of a real-valued function f with respect to a unit vector u can be written in terms of the gradient of f as

2.1    (∂f/∂u)(x) = ∇f(x) · u.

The reason is that (∂f/∂u)(x) = f'(x)u, and the application of the matrix f'(x) to u is the same as the dot product ∇f(x) · u. Using Equation 2.1, we can easily prove the following theorem. (This is the origin of the use of the term gradient, usually meaning incline or slope.)

2.2 Theorem

Let f be a real-valued function from R^n to R, differentiable in an open set D in R^n. Then at each point x in D for which ∇f(x) ≠ 0, the vector ∇f(x) points in the direction of maximum increase for f. The number |∇f(x)| is the maximum rate of increase.

Proof. Given a unit vector u, we have, by Equation 2.1 and the Cauchy-Schwarz inequality,

    (∂f/∂u)(x) = ∇f(x) · u ≤ |∇f(x)| |u| = |∇f(x)|.

But when u = ∇f(x)/|∇f(x)|, and only then, we have

    (∂f/∂u)(x) = |∇f(x)|.

Thus the rate of increase (∂f/∂u)(x) is never greater than |∇f(x)| and is equal to it in the direction of the gradient.

Example 2. Let f(x, y) = e^{xy}. Then ∇f(x, y) = (ye^{xy}, xe^{xy}); thus at (1, 2) the function f increases most rapidly in the direction of ∇f(1, 2) = (2e^2, e^2), which has the same direction as the unit vector (2/√5, 1/√5). The rate of increase in that direction is |∇f(1, 2)| = √5 e^2. Similarly, ∇f(−1, 2) = (2e^{−2}, −e^{−2}) has direction (2/√5, −1/√5), with maximum rate of increase at (−1, 2) equal to √5 e^{−2}. The maximum rate of decrease occurs in the opposite direction, namely, (−2/√5, 1/√5).
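Theorem 2.2 can be illustrated numerically for the function of Example 2: sampling the directional derivative at (1, 2) over many unit vectors, the largest value occurs in the direction of ∇f(1, 2) and equals |∇f(1, 2)| = √5 e^2 ≈ 16.52. A sketch assuming NumPy; the grid of directions is an arbitrary choice of ours.

    import numpy as np

    grad = lambda x, y: np.array([y*np.exp(x*y), x*np.exp(x*y)])   # gradient of e^{xy}
    g = grad(1.0, 2.0)

    angles = np.linspace(0, 2*np.pi, 360, endpoint=False)
    rates = [g @ np.array([np.cos(a), np.sin(a)]) for a in angles]  # df/du = grad f . u
    best = angles[int(np.argmax(rates))]

    print(np.cos(best), np.sin(best))        # close to (2, 1)/sqrt(5)
    print(max(rates), np.linalg.norm(g))     # both about sqrt(5) e^2 = 16.52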

Next we shall prove a chain rule for differentiating the composition g(f(t)) of a function f from R to R^n and a function g from R^n to R. Thus if f from R to R^3 is given by f(t) = (t, t^2, t) and g from R^3 to R by g(x, y, z) = x cos (y + z), then g(f(t)) = t cos (t^2 + t). This defines a new function from R to R. For example, if g is defined in a region D of R^3 and f describes the motion of a point along a path lying in D, we may be interested in finding the rate of change of the composite function with respect to t. The theorem gives a formula for doing this in terms of the gradient of g.

2.3 Theorem

Let g be real-valued and continuously differentiable on an open set D in R^n, and let f(t) be defined and differentiable for a ≤ t ≤ b, taking its values in D. Then the composite function F(t) = g(f(t)) is differentiable for a ≤ t ≤ b and

    F'(t) = ∇g(f(t)) · f'(t).

Proof. By definition,

    F'(t) = lim_{h→0} [ F(t + h) − F(t) ] / h = lim_{h→0} [ g(f(t + h)) − g(f(t)) ] / h,

if the limit exists. Since f is differentiable, it is continuous. Then we can choose δ > 0 such that, whenever |h| < δ, f(t + h) is always inside an open ball centered at f(t) and contained in D. We now apply the mean-value theorem of the previous section to g, getting

    g(y) − g(x) = g'(x_0)(y − x) = ∇g(x_0) · (y − x),

where x_0 is some point on the segment joining y and x. Letting x = f(t) and y = f(t + h), with |h| < δ, we have

    [ F(t + h) − F(t) ] / h = ∇g(x_0) · [ f(t + h) − f(t) ] / h.

The vector x_0 is now some point on the segment joining f(t) and f(t + h). (Why is x_0 in the domain of g?) Since g was assumed continuously differentiable, ∇g(x) is continuous, and so ∇g(x_0) tends to ∇g(f(t)) as h tends to zero. The dot product is continuous, so F'(t) exists, with

    F'(t) = lim_{h→0} ∇g(x_0) · [ f(t + h) − f(t) ] / h = ∇g(f(t)) · f'(t).

Example 3. Let g(x, y) = x^2 y + xy^3 for (x, y) in R^2. Let f(t) be differentiable in some neighborhood of t = t_0 and take its values in R^2. If it is known only that f(t_0) = (−1, 1) and f'(t_0) = (2, 3), then the composition F(t) = g(f(t)) is known only at t = t_0, and F'(t_0) cannot be computed by direct differentiation. However, by the previous theorem we have

    F'(t_0) = ∇g(f(t_0)) · f'(t_0).

We find that ∇g(x, y) = (2xy + y^3, x^2 + 3xy^2), so ∇g(f(t_0)) = (−1, −2). Then F'(t_0) = (−1, −2) · (2, 3) = −8.
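To check the value −8 numerically, one can take any differentiable f with the given data at t_0; the particular choice below (t_0 = 0 and a linear f(t) = (−1 + 2t, 1 + 3t)) is ours, not the text's, and serves only to illustrate the formula.

    import numpy as np

    g = lambda x, y: x*x*y + x*y**3
    f = lambda t: (-1.0 + 2.0*t, 1.0 + 3.0*t)   # f(0) = (-1, 1), f'(0) = (2, 3)

    h = 1e-6
    F = lambda t: g(*f(t))
    print((F(h) - F(-h)) / (2*h))    # about -8, as computed from the chain rule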

An extension of Theorem 2.3 to the case in which f is vector-valued is proved in the next section.



The gradient is particularly useful in analyzing the level sets of a real-valued function. Recall that a level set S of a function f is a set of points x satisfying f(x) = k for some constant k. For f from R^2 to R we are usually interested in S when it is a curve, and for f from R^3 to R, the sets S most often considered are surfaces. Examples are shown in Fig. 3.

Figure 3

To say that a point x_0 is on the level set S corresponding to level k is to say that f(x_0) = k. Now suppose that there is a curve γ lying in S and parametrized by a continuously differentiable function g from R to R^n. Suppose also that g(t_0) = x_0 and g'(t_0) = v ≠ 0, so that v is a tangent vector to γ at x_0, as shown in Fig. 3. Applying the chain rule to the function h(t) = f(g(t)) at t_0 gives

    h'(t_0) = ∇f(g(t_0)) · g'(t_0) = ∇f(x_0) · v.

But since γ lies on S, we have h(t) = f(g(t)) = k; that is, h is constant. Thus h'(t_0) = 0, and

2.4    ∇f(x_0) · v = 0.

This says that ∇f(x_0), if it is not zero, is perpendicular to every tangent vector to an arbitrary smooth curve lying on S and passing through x_0. For this reason it is natural to say that ∇f(x_0) is perpendicular or normal to S at x_0 and to take as the tangent plane (or line) to S at x_0 the set of

all points x satisfying

    ∇f(x_0) · (x − x_0) = 0,   if ∇f(x_0) ≠ 0.

Example 4. The function f(x, y, z) = x^2 + y^2 − z^2 has for one of its level surfaces a cone C consisting of all points satisfying x^2 + y^2 − z^2 = 0. The point x_0 = (1, 1, √2) lies on C, and to find the tangent plane to C at x_0 we compute ∇f(x_0) = (2, 2, −2√2). Then

    ∇f(x_0) · (x − x_0) = (2, 2, −2√2) · (x − 1, y − 1, z − √2),

and the tangent plane is given by (x − 1) + (y − 1) − √2(z − √2) = 0, or x + y − √2 z = 0. This plane is shown in Fig. 4.

Figure 4

Notice that both C and its tangent contain a common line with direction (1, 1, √2), and that the normal vector to the tangent is perpendicular to that line.

Putting together Theorem 2.2 with 2.4, we see that the direction of
maximum increase of a real-valued differentiable function at a point is

perpendicular to the level set of the function through that point.

Example 5. The function f(x, y) = xy has level curves xy = k. If k ≠ 0, these curves are all hyperbolas, and each one of them is intersected perpendicularly by every member of the family of hyperbolas g(x, y) = x^2 − y^2 = k. (These are shown in Fig. 5.) To see this, observe that ∇f(x, y) = (y, x) and ∇g(x, y) = (2x, −2y). Hence, for each (x, y) we have ∇f(x, y) · ∇g(x, y) = 0. Thus the normal vectors, and hence the tangents, are perpendicular. It also follows that a tangent to a curve from one family points in a direction of maximum increase for the defining function of the other family. The argument fails at (0, 0). See Exercise 6 following this section.

Figure 5

As another example of the use of the gradient, we shall prove the following theorem, which is an extension of the familiar formula

    ∫_a^b f'(t) dt = f(b) − f(a),                                   (1)

the fundamental theorem of calculus.

2.5 Theorem

Let f be a continuously differentiable real-valued function defined in an open set D of R^n. (Thus ∇f is a continuous vector field in D.) If γ is a piecewise smooth curve in D with initial and terminal points a and b, then

    ∫_γ ∇f · dx = f(b) − f(a).

In particular, the value of the line integral of a gradient field over a curve depends only on the endpoints of the curve; thus, in this case, the notation

    ∫_a^b ∇f · dx = f(b) − f(a)

is justified.

Proof. Suppose γ is parametrized by G(t) with a ≤ t ≤ b, and G(a) = a, G(b) = b. Using first the definition of the line integral, and then Theorem 2.3, we have

    ∫_γ ∇f · dx = ∫_a^b ∇f(G(t)) · G'(t) dt = ∫_a^b (d/dt) f(G(t)) dt.

But by Equation (1), the fundamental theorem, the last integral is equal to f(G(b)) − f(G(a)) = f(b) − f(a).

Example 6. Consider the field ∇f(x, y) in R^2, where f(x, y) = (1/2)(x^2 + y^2). Then ∇f(x, y) = (x, y). If γ is any continuously differentiable curve with initial and final endpoints (x_1, y_1) and (x_2, y_2), then

    ∫_γ x dx + y dy = f(x_2, y_2) − f(x_1, y_1) = (1/2)(x_2^2 − x_1^2) + (1/2)(y_2^2 − y_1^2).

This is what we would expect formally from the fundamental theorem. If, on the other hand, we let g(x, y) = xy, we have ∇g(x, y) = (y, x), and for any curve of the kind previously considered we have

    ∫_γ y dx + x dy = x_2 y_2 − x_1 y_1.
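Theorem 2.5 is easy to test numerically. The Python sketch below (assuming NumPy) integrates the field (y, x) = ∇(xy) along a curve from (0, 0) to (2, 3) by a trapezoid sum and compares the result with x_2 y_2 − x_1 y_1 = 6; the particular parametrization of the curve is an arbitrary choice of ours.

    import numpy as np

    # A curve from (0,0) to (2,3): G(t) = (2t, 3t^2), 0 <= t <= 1  (our choice)
    G  = lambda t: np.array([2*t, 3*t*t])
    Gp = lambda t: np.array([2.0, 6*t])
    F  = lambda p: np.array([p[1], p[0]])        # F = (y, x) = grad(xy)

    t = np.linspace(0.0, 1.0, 100001)
    vals = np.array([F(G(s)) @ Gp(s) for s in t])
    print(np.sum(0.5*(vals[1:] + vals[:-1])*np.diff(t)))   # about 6 = (2)(3) - (0)(0)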

EXERCISES
1. Find the gradient, V/", of each of the following functions at the indicated
points.

(a) f(x, y, z) = (x -y)e>*;{\,2, -1).


(b) f(x,y) = x 2 — y 2 — sin y, for arbitrary (x, y) in &2 .

(c) f(x,y) =x+y;(2,3).


(d) f(\) = |x|
2
, for arbitrary x in 31".
(e) /(x) = |x|
a
, for arbitrary nonzero x in &n .

(f) f(x,y) = identically in Jl


2
; (1, 1).

2. For the functions in the previous problem find the direction and rate of
maximum increase at the indicated point.

3. (a) The notation grad/is often used for the gradient V/ Show that


df
(x) = grad/(x) •
y.

(b) The notation V y /is often used for the derivative df/dy. Show that

V y /(x) = V/(x) •
y.

4. Find, if possible, a normal vector and the tangent plane to each of the
following level curves or surfaces at the indicated points.

(a) x2 + y 2 - z2 = 2 at (x,y, z) = (1, 1, 0) and at (x, y, z) = (0, 0, 0).


(b) x sin y = at (x,y) = (0, rrjl) and at (x, y) = (0, 0).
= at x = e x the first natural basis vector in n
(c) |x| 1 , ft. .

(d) x y +yz + w = 3 at (x,y, z, w) = (1, 1, 1, 1).


2

(e) xyz = at (x,y, z) = (1, 1, 1).


1

(f) xyz = at {x,y, z) = (1, 2, 0).

5. If ft 2 — >> ft is continuously differentiable, its graph can be defined


implicitly in ft 3 as the level surface S of the function F(x, y, z) = z — f(x, y)
given by F{x, y, z) = 0.

(a) Show that VF = ( — Sf/dx, — df/dy, 1), which is never the zero vector.
(b) Find a normal vector and the tangent plane to the graph off(x, y) =
xy + yex at (x,y) = (1, 1).

6. (a) The example/(x, y) = x2 + y 2 has V/(0, 0) = 0, which fails to indicate


that there is a direction of maximum increase for /at (x, y) = (0, 0).
Is this reasonable? What happens at (0, 0)?
(b) In Example 5 of Section 2, the point (x,y) = (0, 0) has been avoided.
What are the directions of maximum increase for f(x, y) = xy and
g(x, y) = x 2 - y 2 at (x,y) = (0, 0)?

7. If g(x, y) = ex+v and /'(0) = (1,2), find F'(0), where F(t) =g{f(t)) and
/(0) = (1, -1).

8. Let y be a curve in ft? being traversed at time t = 1 with speed 2 and in the
direction of (1, —1,2). If / = 1 corresponds to the point (1, 1, 1) on y
find the rate of change of the function x + y + xy along y at t = 1.

9. If f(x, y, z) = sin x and F(t) = (cos /, sin r, /), find g'i-n), where g(t) =

10. Let ft —> ft" be differentiable. Let ft." > ft be continuously differen-
tiable and such that the composition g(t) =f(F(tj) exists. If F'(/ ) is

tangent to a level surface of/at F(t ), show that^'('o) = 0-

11. Given a vector field F from R^n to R^n, the problem of finding a function f from R^n to R such that ∇f = F is equivalent to solving the following system of equations for f, where F_1, ..., F_n are the coordinate functions of F:

    ∂f/∂x_1 = F_1, ..., ∂f/∂x_n = F_n.

(a) For the case n = 2, show that if the system

    (∂f/∂x)(x, y) = F_1(x, y),   (∂f/∂y)(x, y) = F_2(x, y)

has a solution f, then f must have both of the forms

    f(x, y) = ∫ F_1(x, y) dx + C_1(y)
and
    f(x, y) = ∫ F_2(x, y) dy + C_2(x),

where each indefinite integration is performed with the other variable held fixed.

(b) Find f, if ∇f(x, y) = (y^2 + 2xy, 2xy + x^2), by using part (a).
(c) Find f, if ∇f(x, y, z) = (y + z, z + x, x + y), by using an appropriate extension of part (a).
(d) Find f, if ∇f(x, y, z) = (yz + z, xz, xy + x).

12. (a) Find the function / of Problem 11(b) by direct computation of a line
integral of V/from (0, 0) to (x, y).
(b) Find the function /of Problem 11(c) by direct computation of a line
integral from (0, 0, 0) to (x, y, z).

13. (a) Show that if F_1, ..., F_n are continuously differentiable, then a necessary condition for the system of equations

    ∂f/∂x_1 = F_1, ..., ∂f/∂x_n = F_n

to have a solution f is that

    ∂F_i/∂x_j = ∂F_j/∂x_i.

(b) Show that the functions F from R^2 to R^2 and G from R^3 to R^3 given by F(x, y) = (xy, x^2) and G(x, y, z) = (y, x, −zx) are not the gradients of real-valued functions.

14. (a) Compute the line integral J y y dx + x dy along an arbitrary con-


tinuously differentiable curve from (0, 0) to (2, 3). [Hint. Guess a
function Jl
2 — > R such that V/(x, y) = (y, x).]
(b) Compute the line integral |.,.vc/.v -
y dy -f z dz along an arbitrary
continuously differentiable curve from (0,0,0) to (1,2,3) and from
(0, 0, 0) to (x, y, z).

15. Let F(x, y) = ( −y/(x^2 + y^2), x/(x^2 + y^2) ) for (x, y) ≠ (0, 0).

(a) If F_1 and F_2 are the coordinate functions of F, show that

    ∂F_1/∂y = ∂F_2/∂x.

(b) Show that in any region not meeting the y-axis, F is the gradient of f(x, y) = arctan (y/x), the principal branch.

(c) Show that F is not the gradient of any function f differentiable in a region completely surrounding the origin, for example, the region defined by x^2 + y^2 > 0.

16. Show that if a force field in a region D of Jl 3 has the form V/ for some
continuously differentiable function :R 3 — >• 31, then the work done in
moving a particle through the field depends only on the level surfaces of/on
which the particle starts and finishes.

17. The level surfaces of a function f from R^3 to R are called the equipotential surfaces of the vector field ∇f, and f is called the potential function of the field.

(a) Show that the equipotential surfaces are orthogonal to the field.
(b) Find the equipotential surfaces of the field ∇f(x, y, z) = (x, y, z).
(c) Find the field of which f(x, y, z) = (x^2 + y^2 + z^2)^{−1/2} (the Newtonian potential) is the potential function.
(d) Find the field of which f(x, y) = −(1/2) log (x^2 + y^2) (the logarithmic potential) is the potential function.
(e) Show that if f(x) = |x|^{2−n} (the generalized Newtonian potential in R^n, n ≥ 3), then ∇f(x) = (2 − n)|x|^{−n} x.

18. Let 31 — F
> 3l 3 define a smooth curve y for a < < / /?, with y lying in a

region D of ft 3 in which Jl 3 — Jl is continuously differentiable. Show that

rF(t.
V/- </x = V/(F(r)) •
F\t).
dt JF(a)

19. Suppose that j* F(\) dx is independent of the piecewise smooth curve y


n Show that
joining x to x for all x in some open set D in 3i y/ = F, where .

f(x) = J^ F dx. [Hint. Look at the integral over a curve y approaching


'

x in an arbitrary direction, and apply the fundamental theorem of calculus.]

20. If T(x, y, z) represents the temperature at a point (x, y, z) of a region R in


Jl
3
, the vector field VT is called the temperature gradient. Under certain
physical assumptions VT(x, y, z) is proportional to the vector that represents
the direction and rate per unit of area of heat flow at {x, y, z). The sets on
which Tis constant are called isotherms. If the isotherms of a temperature
function are concentric spheres, prove that the temperature gradient points
either toward or away from the center of the spheres.

SECTION 3

THE CHAIN RULE

One of the most useful formulas of one-variable calculus is the chain rule, used to compute the derivative of the composition of one function with another:

    (g ∘ f)'(t) = g'(f(t)) f'(t).

The generalization to several variables is just as valuable and, properly formulated, is just as easy to state.

If two functions f and g are so related that the range space of f is the same as the domain space of g, we may form the composite function g ∘ f by first applying f and then g. Thus,

    g ∘ f(x) = g(f(x))

for every vector x such that x is in the domain of f and f(x) is in the domain of g. The domain of g ∘ f consists of those vectors x that are carried by f into the domain of g. An abstract picture of the composition of two functions is shown in Fig. 6.

Figure 6

Example 1. Suppose that we are given a 2-dimensional region in which the points move about according to some specified law. It may be known that, for a given initial position with coordinates (u, v), a point is always to be found at some definite later time in a position (x, y). Then (x, y) and (u, v) may be related by equations of the form

    x = g_1(u, v)
    y = g_2(u, v).

In vector notation these equations might be written

    x = g(u),

where x = (x, y), u = (u, v), and g has coordinate functions g_1, g_2. Now suppose that the initial position u = (u, v) of a point is itself determined as a function of other variables (s, t) by equations

    u = f_1(s, t)
    v = f_2(s, t).
v =f 2 (s, t).

These may be written in vector form as

    u = f(s),

where s = (s, t) and f has coordinate functions f_1, f_2. Then (x, y) and (s, t) are related by

    x = g_1(f_1(s, t), f_2(s, t))
    y = g_2(f_1(s, t), f_2(s, t)),

or

    x = g(f(s)).

Using the notation g ∘ f for the composition of g and f, we can also write

    x = g ∘ f(s).

To determine the derivative of g ∘ f in terms of the derivatives of g and f, suppose that f from R^n to R^m is differentiable at x_0 and that g from R^m to R^p is differentiable at y_0 = f(x_0). Then g'(y_0) is a p-by-m matrix and f'(x_0) is an m-by-n matrix. It follows that the product g'(y_0) f'(x_0) is defined and is a p-by-n matrix. The chain rule says that this product matrix is the derivative of g ∘ f at x_0. Because the matrix product corresponds to composition of linear functions, the result can be stated in terms of differentials: the differential of a composition is a composition of differentials.

Example 2. Consider the special case in which f is a function of a single real variable and g is real-valued. Then g ∘ f is a real function of a real variable. Theorem 2.3 shows that if f and g are continuously differentiable, then

    (g ∘ f)'(t) = ∇g(f(t)) · f'(t).

That is, in terms of coordinate functions,

    (g ∘ f)'(t) = ( (∂g/∂y_1)(f(t)), ..., (∂g/∂y_m)(f(t)) ) · ( f_1'(t), ..., f_m'(t) ).

The right side of this last equation can be written as a matrix product in terms of the derivative matrices

    g'(f(t)) = ( (∂g/∂y_1)(f(t))  ...  (∂g/∂y_m)(f(t)) )
and
    f'(t) = ( f_1'(t) )
            (   ...   )
            ( f_m'(t) )

as (g ∘ f)'(t) = g'(f(t)) f'(t). The product of g'(f(t)) and f'(t) is defined by matrix multiplication, in this case 1-by-m times m-by-1, and is equivalent to the dot product of the two matrices looked at as vectors in R^m. Thus, for the case in which the domain of f and the range of g are both 1-dimensional, the formulas

    ∇g(f(t)) · f'(t)   and   g'(f(t)) f'(t)

are practically the same.

The following theorem gives an extension to any dimension for the domain and range of g and f.

3.1 The Chain Rule

Let f from R^n to R^m be continuously differentiable at x, and let g from R^m to R^p be continuously differentiable at f(x). If g ∘ f is defined on an open set containing x, then g ∘ f is continuously differentiable at x, and

    (g ∘ f)'(x) = g'(f(x)) f'(x).

Proof. We need only show that the derivative matrix of g ∘ f at x has continuous entries given by the entries in the product of g'(f(x)) and f'(x). These matrices have the form

    g'(f(x)) = ( (∂g_i/∂y_k)(f(x)) ),   i = 1, ..., p,  k = 1, ..., m,
and
    f'(x) = ( (∂f_k/∂x_j)(x) ),   k = 1, ..., m,  j = 1, ..., n.

The product of the matrices has for its ij-th entry the sum of products

    Σ_{k=1}^{m} (∂g_i/∂y_k)(f(x)) (∂f_k/∂x_j)(x).                  (1)

But this expression is just the dot product of the two vectors ∇g_i(f(x)) and (∂f/∂x_j)(x). It follows from Theorem 2.3 that

    ∇g_i(f(x)) · (∂f/∂x_j)(x) = (∂(g_i ∘ f)/∂x_j)(x),              (2)

because we are differentiating with respect to the single variable x_j. But this establishes the matrix relation, because the entries in (g ∘ f)'(x) are by definition given by the right side of Equation (2). Since g and f are continuously differentiable, Formula (1) represents a continuous function of x for each i and j. Hence g ∘ f is continuously differentiable.

In Section 2 of the Appendix we give a version of the chain rule in which we assume only differentiability of f and g, and we conclude only differentiability of g ∘ f rather than continuous differentiability. Otherwise the theorem is the same as 3.1.

Example 3. Let f(x, y) = (x^2 + y^2, x^2 − y^2) and let g(u, v) = (uv, u + v). We find

    g'(u, v) = ( v  u )       f'(x, y) = ( 2x   2y )
               ( 1  1 ),                 ( 2x  −2y ).

To find (g ∘ f)'(2, 1), we note that f(2, 1) = (5, 3) and compute

    g'(5, 3) = ( 3  5 )       f'(2, 1) = ( 4   2 )
               ( 1  1 ),                 ( 4  −2 ).

Then the product of these last two matrices gives

    (g ∘ f)'(2, 1) = ( 32  −4 )
                     (  8   0 ).
It is common practice in calculus to denote a function by the same


symbol as a typical element of its range. Thus the derivative of a function

'J\
— > 'A is often denoted, in conjunction with the equation y =/(x), by
dy/dx. Similarly, the partial derivatives of a function 3l 3 — > 51 are
commonly written as


dw
dx
,
dw

By
, and
,

dw
dz

along with the explanatory equation w =f(x,y, z). For example, if

vv = xy 2 e x+3z then
,

?W
= 2 x + Zz
y e + xy 2 e x+ Zz^
dx

^ = 2xye^\
By

^ = 3x/e^.
y
dz
.


This notation has the disadvantage that it does not contain specific reference to the function being differentiated. On the other hand, it is notationally convenient and is, moreover, the traditional language of calculus. To illustrate its convenience, suppose that the functions g and f are given by

    w = g(x, y, z),   x = f_1(s, t),   y = f_2(s, t),   z = f_3(s, t).

Then, by the chain rule, (g ∘ f)'(s, t) = g'(f(s, t)) f'(s, t). Matrix multiplication yields

    ∂w/∂s = (∂g/∂x)(∂x/∂s) + (∂g/∂y)(∂y/∂s) + (∂g/∂z)(∂z/∂s)
                                                                    (3)
    ∂w/∂t = (∂g/∂x)(∂x/∂t) + (∂g/∂y)(∂y/∂t) + (∂g/∂z)(∂z/∂t).

A slightly different-looking application of the chain rule is obtained if the domain space of f is 1-dimensional, that is, if f is a function of one variable. Consider, for example,

    w = g(u, v),   ( u ) = f(t) = ( f_1(t) )
                   ( v )          ( f_2(t) ).

The composition g ∘ f is in this case a real-valued function of one variable. Its differential is defined by the 1-by-1 matrix whose entry is the derivative

    d(g ∘ f)/dt = dw/dt.

The derivatives of g and f are defined, respectively, by the Jacobian matrices

    ( ∂w/∂u   ∂w/∂v )   and   ( du/dt )
                              ( dv/dt ).
idt,

Hence, the chain rule implies that

    dw/dt = ( ∂w/∂u  ∂w/∂v ) ( du/dt ) = (∂w/∂u)(du/dt) + (∂w/∂v)(dv/dt).    (4)
                             ( dv/dt )

Finally, let us suppose that both f and g are real-valued functions of one variable. This is the situation encountered in one-variable calculus. The derivatives of f at t, of g at s = f(t), and of g ∘ f at t are represented by the three 1-by-1 Jacobian matrices f'(t), g'(s), and (g ∘ f)'(t), respectively. The chain rule implies that

3.2    (g ∘ f)'(t) = g'(f(t)) f'(t).

If the functions are presented in the form

    x = g(s),   s = f(t),

the more explicit Formula 3.2 can be written as the famous equation

    dx/dt = (dx/ds)(ds/dt).                                         (5)

Example 4. Given that

    x = u^2 + v^3        u = t + 1
    y = e^{uv}     and   v = e^t,

find dx/dt and dy/dt at t = 0. Let f from R to R^2 and g from R^2 to R^2 be the functions defined by

    f(t) = ( t + 1 ),   −∞ < t < ∞,
           (  e^t  )

    g( u ) = ( u^2 + v^3 ) = ( x ),   −∞ < u < ∞,  −∞ < v < ∞.
     ( v )   (  e^{uv}   )   ( y )

The differential of f at t is defined by the 2-by-1 Jacobian matrix

    f'(t) = (  1  )
            ( e^t ).

The matrix of the differential of g at (u, v) is

    g'(u, v) = ( ∂x/∂u  ∂x/∂v ) = (   2u        3v^2   )
               ( ∂y/∂u  ∂y/∂v )   ( v e^{uv}  u e^{uv} ).

The dependence of x and y on t is given by

    ( x ) = (g ∘ f)(t),   −∞ < t < ∞.
    ( y )

Hence, the two derivatives dx/dt and dy/dt are the entries in the Jacobian matrix that defines the differential of the composite function g ∘ f. The chain rule therefore implies that (g ∘ f)'(t) = g'(f(t)) f'(t). That is,

    dx/dt = (∂x/∂u)(du/dt) + (∂x/∂v)(dv/dt) = 2u + 3v^2 e^t
                                                                    (6)
    dy/dt = (∂y/∂u)(du/dt) + (∂y/∂v)(dv/dt) = v e^{uv} + u e^{uv+t}.

If t = 0, then (u, v) = f(0) = (1, 1), and we get u = v = 1. It follows that

    (dx/dt)(0) = 2 + 3 = 5,
    (dy/dt)(0) = e + e = 2e.
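The two derivatives found in Example 4 can be verified directly, since x and y become explicit functions of t once u = t + 1 and v = e^t are substituted. The sketch below (assuming NumPy) differentiates numerically; the step size is an arbitrary choice of ours.

    import numpy as np

    x = lambda t: (t + 1)**2 + np.exp(t)**3       # x = u^2 + v^3 with u = t+1, v = e^t
    y = lambda t: np.exp((t + 1) * np.exp(t))     # y = e^{uv}

    h = 1e-6
    print((x(h) - x(-h)) / (2*h))   # about 5
    print((y(h) - y(-h)) / (2*h))   # about 2e = 5.436...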

The definition of matrix multiplication gives the derivative formulas that result from applications of the chain rule a formal pattern that is easy to memorize. The pattern is particularly in evidence when the coordinate functions are denoted by real variables as in Formulas (3), (4), (5), and (6). All formulas of the general form

    dz/dt = (∂z/∂x)(dx/dt) + (∂z/∂y)(dy/dt) + · · ·

have the disadvantage, however, of not containing explicit reference to the points at which the various derivatives are evaluated. It is, of course, essential to know this information. It can be found by going to the formula

    (g ∘ f)'(x) = g'(f(x)) f'(x).

It follows that derivatives appearing in the matrix f'(x) are evaluated at x, and those in the matrix g'(f(x)) are evaluated at f(x). This is the reason for setting t = 0 and u = v = 1 in Formula (6) to obtain the final answers in Example 4.

Example 5. Let

    z = xy   and   x = f(u, v),   y = g(u, v).

Suppose that when u = 1 and v = 2, we have

    (∂x/∂u)(1, 2) = −1   and   (∂y/∂u)(1, 2) = 5.

Suppose also that f(1, 2) = 2 and g(1, 2) = −2. What is (∂z/∂u)(1, 2)? The chain rule implies that

    ∂z/∂u = (∂z/∂x)(∂x/∂u) + (∂z/∂y)(∂y/∂u).                        (7)

When u = 1 and v = 2, we are given that x = f(1, 2) = 2 and y = g(1, 2) = −2. Hence

    (∂z/∂x)(2, −2) = y = −2,
    (∂z/∂y)(2, −2) = x = 2.

To obtain ∂z/∂u at (u, v) = (1, 2), it is necessary to know at what points to evaluate the partial derivatives that appear in Equation (7). In greater detail, the chain rule implies that

    (∂z/∂u)(1, 2) = (∂z/∂x)(2, −2)(∂x/∂u)(1, 2) + (∂z/∂y)(2, −2)(∂y/∂u)(1, 2).

Hence

    (∂z/∂u)(1, 2) = (−2)(−1) + (2)(5) = 12.

Example 6. If w = f(ax^2 + bxy + cy^2) and y = x^2 + x + 1, find (dw/dx)(−1). The solution relies on formulas that follow from the chain rule [like (3), (4), (5), and (7)]. Set

    z = ax^2 + bxy + cy^2.

Then w = f(z) and

    dz/dx = ∂z/∂x + (∂z/∂y)(dy/dx).

Hence,

    dw/dx = (df/dz)(dz/dx) = (df/dz)( ∂z/∂x + (∂z/∂y)(dy/dx) )
          = f'(z)( 2ax + by + (bx + 2cy)(2x + 1) ).

If x = −1, then y = 1, and so z = a − b + c. Thus,

    (dw/dx)(−1) = f'(a − b + c)(−2a + 2b − 2c).
The Jacobian matrix, or derivative, of a function f from R^n to R^n is a square matrix and so has a determinant. This determinant, det f'(x), is a real-valued function of x called the Jacobian determinant of f; it plays a particularly important role in the change-of-variable theorem for integrals taken up in Chapter 6. At this point we remark on a simple corollary of the chain rule and the product rule for determinants:

3.3 Theorem

If f from R^n to R^n is differentiable at x_0 and g from R^n to R^n is differentiable at y_0 = f(x_0), then the Jacobian determinant of g ∘ f at x_0 is the product of the Jacobian determinant of f at x_0 and that of g at y_0.

If f is defined by

    ( y_1 )   ( f_1(x_1, ..., x_n) )
    ( ... ) = (        ...         )
    ( y_n )   ( f_n(x_1, ..., x_n) ),

then the Jacobian determinant det f' is often denoted by

    ∂(y_1, ..., y_n) / ∂(x_1, ..., x_n),

or equivalently

    ∂(f_1, ..., f_n) / ∂(x_1, ..., x_n).

Example 7. Let

    ( x ) = ( r cos θ )        ( w ) = ( x^2 − y^2 )
    ( y )   ( r sin θ )  and   ( z )   (    2xy    ).

Then

    ∂(x, y)/∂(r, θ) = det ( cos θ   −r sin θ ) = r(cos^2 θ + sin^2 θ) = r,
                          ( sin θ    r cos θ )
and

    ∂(w, z)/∂(x, y) = det ( 2x  −2y ) = 4(x^2 + y^2).
                          ( 2y   2x )

The Jacobian determinant of the composite function g ∘ f is denoted in this case by ∂(w, z)/∂(r, θ). If

    ( x_0 ) = ( r_0 cos θ_0 )
    ( y_0 )   ( r_0 sin θ_0 ),

Theorem 3.3 implies that

    ∂(w, z)/∂(r, θ) (r_0, θ_0) = ∂(w, z)/∂(x, y) (x_0, y_0) · ∂(x, y)/∂(r, θ) (r_0, θ_0)
                               = 4(x_0^2 + y_0^2) r_0 = 4 r_0^3.
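Theorem 3.3 can be checked numerically for Example 7 by computing both determinants at a sample point; the point (r, θ) = (2, π/6) below is an arbitrary choice of ours (a sketch assuming NumPy).

    import numpy as np

    r, th = 2.0, np.pi/6
    x, y = r*np.cos(th), r*np.sin(th)

    d_xy_d_rth = np.array([[np.cos(th), -r*np.sin(th)],
                           [np.sin(th),  r*np.cos(th)]])
    d_wz_d_xy  = np.array([[2*x, -2*y],
                           [2*y,  2*x]])

    print(np.linalg.det(d_wz_d_xy) * np.linalg.det(d_xy_d_rth))   # 32.0
    print(4 * r**3)                                               # 4 r^3 = 32.0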

EXERCISES
1. Given that

xz + xy + 1

2h
2
>' + 2

find the matrix of the differential of the composite function^o/at

'3
1

2. Let

and

x + 2y + r
s\y
x2 —y

3 + 4a3 \
(a) Find the Jacobian matrix of go fat t = a. Ans.
\2a - 1

(b) Find dujdt in terms of the derivatives of x, y, z, and the partial derivatives
of u.

3. Consider the curve defined parametrically by

f(t) = J t
2 -

Let g be a real-valued differentiable function with domain Jl 3 . If

-i:
and

(x ) (x ) = 2,
ex Sy dz

find d(g o f)jdt at t = 2. [Ans. 14.]

4. Consider the functions

and
F(x, y, z) = x2 + y2 + zz = w.

(a) Find the matrix that defines the differential of F of at

(b) Find 9H>/9«and dwjdv.

5. Let u =f(x,y). Make the change of variables x = rcosO, y = r sin 6.

Given that
df df
-f-
= x2 + 2xy
J
- Jy 2 and -^ = x2 - 2xy
J + 2,
dx dy

find dfjdd, when r = 2 and 6 = tt/2. [Ans. 8.]


'


6. If w = Vx + y +
2 2
z2 and

find dwjdr and 9>v/30 using the chain rule. Check the result by direct
substitution.

7. Vector functions/ and g are defined by

(u\ /hcos v\ (0 < u < co,

/I
^17 \« sin v] \-Trj2 < v < tt/2,

arctan

(a) Find the Jacobian matrix of^- °/at

(b) Find the Jacobian matrix of f°g at

(c) Are the following statements true or false?


1. domain of/ = domain of^- °
f.
2. domain of^- = domain of fog.

8. A function / is called an identity function if /(x) = x for all x in the domain


of/.

(a) Show that if differentiable vector functions/and^- are so related that the
composite function^ o/is an identity function, then the transformation
(df(x) g) ° (dx f) is also an identity function for x in the domain of g of.
(b) How does this exercise apply to the preceding one?

9. Let x x be a tangent vector at x to a curve defined parametrically by a


differentiable vector function / If x is in the domain of a differentiable
vector function F, prove that F'(x )x 1 , if not zero, is a tangent vector at
F(x ) to the curve defined parametrically by F°f.
10. The convention of denoting coordinate functions by real variables has its pitfalls. Resolve the following paradox: Let w = f(x, y, z) and z = g(x, y). By the chain rule

    ∂w/∂x = (∂w/∂x)(∂x/∂x) + (∂w/∂y)(∂y/∂x) + (∂w/∂z)(∂z/∂x).

The quantities x and y are unrelated, so that ∂y/∂x = 0. Clearly ∂x/∂x = 1. Hence,

    ∂w/∂x = ∂w/∂x + (∂w/∂z)(∂z/∂x),

and so

    0 = (∂w/∂z)(∂z/∂x).

In particular, take w = 2x + y + 3z and z = 5x + 18. Then

    ∂w/∂z = 3   and   ∂z/∂x = 5.

It follows that 0 = 15, which of course is false.

11. If y = f(x − at) + g(x + at), where a is constant and f and g are twice differentiable, show that

    a^2 (∂^2 y/∂x^2) = ∂^2 y/∂t^2.   (Wave equation)
(x\ fr cos d\
)
= j J
, show that
y) \r sin 0/

(dzV /dz\2 /dz\2 1 /dz\2

13. If f(tx, ty) = t


n
f(x,y) for some integer /;, and for all x, y, and /, show that

df df
x
Tx + yyy- nf^y^
14. Show that for a differentiable real-valued function g{x,y),

dg(x, x) dg dg
+
-lx-=Tx (X '
X)
Ty^ x) -

Using the function f(x) = (x, x) show that this result is equivalent to the
statement (g <>/)'(*) = g'(f(x))f'(x).
Apply the equation to the function g(x,y) = xv .

15. (a) If
w =f(x,y, z, t), x=g(u,z,t), and z = h{u, t),

write a formula for dwjdt, where by this symbol is meant the rate of
change of w with respect to /, and where all the interrelations of w, x, z, t
are taken into account,
(b) If

f(x, y, z, i) = 2xy + 3z + t
2
,

g(u, z, t) = ut sin z,

h{u, t) =2u + t,

evaluate the above dwjdt at the point u = 1, t = 2, y = 3, by using the


formula you derived in part (a) and also by substituting in the functions

for x and z and then differentiating.

16. Consider a real-valued function f(x, y) such that

    f_x(2, 1) = 3,   f_y(2, 1) = −2,   f_xx(2, 1) = 0,
    f_xy(2, 1) = f_yx(2, 1) = 1,   f_yy(2, 1) = 2.

Let g from R^2 to R^2 be defined by

    g(u, v) = (u + v, uv).

Find ∂^2(f ∘ g)/∂v∂u at (1, 1). [Ans. 2.]

17. Calculate the Jacobian determinants of the following functions at the points
indicated.

//A /w 2 + 2uv + 3v\ (x\ /0\

(c) A I 1=1 II I , at an arbitrary


\j/ \c dj \y] \y)

(d) An affine transformation Jl n — > 31", A(x) = L(\) + y , at an


arbitrary x .

fr cos sin <f>\

r sin sin ^> |


, at | <f>

rcos <£

18. Using the functions/and^- in Exercises 17(a) and (b), compute the Jacobian
(0\
determinant of the composite function^ °/at
(

[Ans. -1120.]

19. In terms of functions & 2 —1


> 3l 2
and 5t
2 —s
> Jl
2
, what do the following
equations say and how do they follow from Theorem 3.3?

d(x,y) d(u, v) _ d(x, y) d(x, y) d(u, v)


(a) ~ ( }
_
~
d(u, v) d(r, 6) d(r, 6)' d(u, v) 9(x, y) '

SECTION 4

IMPLICITLY DEFINED FUNCTIONS

It may happen that two vectors are related by a formula that doesn't express either one directly as a function of the other. For example, the formula

    p = kt/v

may express the relationship between pressure p on the one hand, and volume and temperature (v, t) on the other hand, of the gas in some container. Or the equations

    x^2 + y^2 + z^2 = 1
    x + y + z = 0

may be interpreted as a relation between the three coordinates of a point on a certain sphere of radius 1 centered at (0, 0, 0) in R^3. In neither example do the equations give an explicit formula for any of the coordinates in terms of others. In this section we study the application of calculus to such relations.
For any two functions F from R^2 to R and f from R to R, the equation

    F(x, y) = 0                                                     (1)

defines f implicitly if F(x, f(x)) = 0 for every x in the domain of f. The zero on the right side of Equation (1) can be replaced by any constant c. But since F(x, y) = c is equivalent to G(x, y) = F(x, y) − c = 0, it is customary to absorb such a constant into the function F.

Example 1. Let F(x, y) = x^2 + y^2 − 1. Then the condition that F(x, f(x)) = x^2 + (f(x))^2 − 1 = 0, for every x in the domain of f, is satisfied by each of the following choices for f:

    f_1(x) = √(1 − x^2),    −1 ≤ x ≤ 1,

    f_2(x) = −√(1 − x^2),   −1 ≤ x ≤ 1,

    f_3(x) = √(1 − x^2),    −1 ≤ x < 0,
           = −√(1 − x^2),    0 ≤ x ≤ 1.

Their graphs are shown in Fig. 7. It follows from the definition of an implicitly defined function that all three functions f_1, f_2, f_3 are defined implicitly by the equation x^2 + y^2 − 1 = 0.

Figure 7

Consider a function Jl" +m — > Jl


m . An arbitrary element in 3l n+m can
be written as (xt , . . . , x n ,y x , . . .
,y m ) or as a pair (x, y), where x =
(*!, . . . , x n ) and y = (y\, . . .
,y m ). In this way F can be thought of
either as a function of the two vector variables, x in 31" and y in m or
.'il ,

else as a function of the single vector variable (x, y) in 'Ji


n+m The function
.

31" — > 3i m is defined implicitly by the equation

F(x, y) =
if F(x,f(x)) = for every x in the domain of/.

Example 2. The equations

    x + y + z − 1 = 0
                                                                    (2)
    2x + z + 2 = 0

determine y and z as functions of x. We get

    y = x + 3,   z = −2x − 2.

In terms of a function F from R^3 to R^2, Equations (2) can be written

    F( x, ( y ) ) = ( x + y + z − 1 ) = ( 0 )
          ( z )     (  2x + z + 2  )   ( 0 ).

The implicitly defined function f from R to R^2 is

    f(x) = (  x + 3  )
           ( −2x − 2 ).
not be continuous, we shall be primarily concerned in this section with
functions that are not only continuous but also differentiable. The implicit
function theorem appearing at the end of Section 6 gives conditions for
/ defined by an equation F(x,/(x)) 0.
the existence of a differentiable =
Before discussing this theorem, however, we consider the problem of
finding the differential of/when it is known to exist. Suppose the functions
3l 2 —F > 31 and %— f
> % are differentiable and that

F(x,f(x)) =
for every x in the domain of/. Then the chain rule applied to F(x,f(x))
yields
Fx (x,f(x)) + Fv (x,f(x))f'(x) = 0.
.,

304 Vector Calculus Chap. 4

Hence,

= ^,/W)^o.
4.1 /'(*>
-FT^v
Fy(x,f(x))
lf

For vector-valued functions a similar computation is possible.

Example 3. Given the equations

x2 +/ + z2 -5= 0, xyz + 2 = 0, (3)

suppose that x and j are differentiable functions of z, that is, the function
defined implicitly by Equations (3) is of the form (x, y) — f{z). To
compute dxjdz and dyjdz, we apply the chain rule to the given equations
to get

2x^ + 2^ + 2z = 0,
dz dz
dx dy
yz— + xz— + xy = 0.
dz dz

These new equations can be solved for dxjdz and dyjdz. The solution is

which is the matrix/'(z). Notice that the corresponding values for x and y
have to be known to make the formula completely explicit. That is, from
the information given so far, there is no possible way of evaluating
{dxjdz)(\). On the other hand, given the point (x, y, z) = (1, —2, 1), we
have {dxjdz){\) = — 1. The reason is that, just as in Example 1, there is
more than one function/defined implicitly by Equations (3). By specifying
a particular point on its graph, we determine /uniquely in the vicinity of
the point.

Example 4. Consider

    xu + yv + zw = 1,
    x + y + z + u + v + w = 0,
    xy + zuv + w = −1.

Suppose that each of x, y, and z is a function of u, v, and w. To find the partial derivatives of x, y, and z with respect to w, we differentiate the three equations using the chain rule:

    u (∂x/∂w) + v (∂y/∂w) + w (∂z/∂w) + z = 0,

    ∂x/∂w + ∂y/∂w + ∂z/∂w + 1 = 0,

    y (∂x/∂w) + x (∂y/∂w) + uv (∂z/∂w) + 1 = 0.

Then, solving for ∂x/∂w gives

    ∂x/∂w = (uv^2 + xz + w − zuv − xw − v) / (u^2 v + vy + wx − wy − ux − uv^2).

Similarly, we could solve for ∂y/∂w and ∂z/∂w. To find partials with respect to u, differentiate the original equations with respect to u and solve for ∂x/∂u, ∂y/∂u, and ∂z/∂u. Partials with respect to v are found by the same method.

The computation indicated in Example 4 leads to the nine entries in the matrix of the differential of an implicitly defined vector function. In order for the computation to work it is necessary to have the number of given equations equal the number of implicitly defined coordinate functions. To get some insight into the reason for this requirement, suppose we are given a differentiable vector function

    F(u, v, x, y) = ( F_1(u, v, x, y) )
                    ( F_2(u, v, x, y) )

and that the equations

    F_1(u, v, x, y) = 0,   F_2(u, v, x, y) = 0                      (4)

implicitly define a differentiable function (x, y) = f(u, v). Differentiating Equations (4) with respect to u and v by means of the chain rule, we get

    ∂F_1/∂u + (∂F_1/∂x)(∂x/∂u) + (∂F_1/∂y)(∂y/∂u) = 0,   ∂F_1/∂v + (∂F_1/∂x)(∂x/∂v) + (∂F_1/∂y)(∂y/∂v) = 0,

    ∂F_2/∂u + (∂F_2/∂x)(∂x/∂u) + (∂F_2/∂y)(∂y/∂u) = 0,   ∂F_2/∂v + (∂F_2/∂x)(∂x/∂v) + (∂F_2/∂y)(∂y/∂v) = 0.

These equations can be written in matrix form as follows:

    ( ∂F_1/∂u  ∂F_1/∂v )       ( ∂F_1/∂x  ∂F_1/∂y ) ( ∂x/∂u  ∂x/∂v )
    ( ∂F_2/∂u  ∂F_2/∂v )  = −  ( ∂F_2/∂x  ∂F_2/∂y ) ( ∂y/∂u  ∂y/∂v ).         (5)

The last matrix on the right is the matrix of the differential of f at (u, v). Solving for it, we get

    ( ∂x/∂u  ∂x/∂v )       ( ∂F_1/∂x  ∂F_1/∂y )^{-1} ( ∂F_1/∂u  ∂F_1/∂v )
    ( ∂y/∂u  ∂y/∂v )  = −  ( ∂F_2/∂x  ∂F_2/∂y )      ( ∂F_2/∂u  ∂F_2/∂v ).    (6)

To be able to solve Equation (5) uniquely for the matrix f'(u, v), it is essential that the inverse matrix that appears in Equation (6) exist. This implies, in particular, that the number of equations originally given equals the number of variables implicitly determined or, equivalently, that the range spaces of F and f must have the same dimension.

The analog of Equation (6) holds for an arbitrary number of coordinate functions F_i and is proved in exactly the same way. We can summarize the result in the following generalization of 4.1.

4.2 Theorem

If F from R^{n+m} to R^m and f from R^n to R^m are differentiable, and if y = f(x) satisfies F(x, y) = 0, then

    f'(x) = −F_y^{-1}(x, f(x)) F_x(x, f(x)),

provided F_y has an inverse. The derivative F_y is computed with x held fixed, and F_x is computed with y held fixed.

The notation used above is illustrated in the next example.

Example 5. Suppose that

    F(x, y, z) = ( x^2 y + xz )
                 (  xz + yz   )

and that we choose x = x, y = (y, z). Then

    F_x(x, y, z) = ( 2xy + z )
                   (    z    )
and
    F_y(x, y, z) = ( x^2     x   )
                   (  z    x + y ).
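The matrices of Example 5 feed directly into Theorem 4.2. The sketch below (assuming NumPy) evaluates f'(x) = −F_y^{-1} F_x at the point (x, y, z) = (1, −1, 1), which satisfies F(x, y, z) = 0 and at which F_y is invertible; the choice of point is ours, made only to illustrate the mechanics.

    import numpy as np

    x, y, z = 1.0, -1.0, 1.0          # a point satisfying F(x, y, z) = (0, 0)

    F_x = np.array([2*x*y + z, z])                    # derivative holding (y, z) fixed
    F_y = np.array([[x*x, x], [z, x + y]])            # derivative holding x fixed

    print(-np.linalg.solve(F_y, F_x))   # f'(x) = (dy/dx, dz/dx) = (-1, 2)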
Newton's method provides, in many cases, an effective way of computing an approximate value for an implicitly defined function. Suppose that f(x) satisfies the equation F(x, f(x)) = 0 in some neighborhood of x = x_0. To compute f(x_0) we need to solve the equation F(x_0, y) = 0 for y. We apply Newton's method to the function g(y) = F(x_0, y). The iteration equation y_{k+1} = y_k − [g'(y_k)]^{-1} g(y_k) becomes

4.3    y_{k+1} = y_k − [F_y(x_0, y_k)]^{-1} F(x_0, y_k).

Of course, some choice for y_0 has to be made using more detailed information about F.

Example 6. If F(x, y) = x^2 y + xy^2, then F_y(x, y) = x^2 + 2xy, and Equation 4.3 becomes

    y_{k+1} = y_k − (x_0^2 y_k + x_0 y_k^2)/(x_0^2 + 2x_0 y_k) = y_k^2/(x_0 + 2y_k).    (7)

Suppose that x_0 = 1 and that we choose y_0 = 1 as an initial approximation to the number f(x_0) satisfying F(x_0, f(x_0)) = 0. Then

    y_1 = 1/3,
    y_2 = (1/3)^2 / (1 + 2(1/3)) = 1/15,
    y_3 = (1/15)^2 / (1 + 2(1/15)) = 1/255.

These values for y_k suggest that y = 0 may be a solution, and indeed we can check that F(1, 0) = 0.

On the other hand, if we try y_0 = −2 together with x_0 = 1, we get

    y_1 = (−2)^2 / (1 + 2(−2)) = −4/3,
    y_2 = (−4/3)^2 / (1 + 2(−4/3)) = −16/15,
    y_3 = (−16/15)^2 / (1 + 2(−16/15)) = −256/255.

These values of y_k seem to be converging to y = −1, and indeed F(1, −1) = 0. To solve F(1.1, y) = 0 approximately, we would set x_0 = 1.1 in Equation (7) and try either y_0 = 0 or y_0 = −1.
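The two runs of Example 6 take only a few lines of Python to reproduce (a sketch; the helper name and the number of steps are arbitrary choices of ours):

    def newton_implicit(y, x=1.0, steps=6):
        # y_{k+1} = y_k - F(x0, y_k)/F_y(x0, y_k) for F(x, y) = x^2 y + x y^2
        for _ in range(steps):
            y = y - (x*x*y + x*y*y) / (x*x + 2*x*y)
            print(y)
        return y

    newton_implicit(1.0)     # 1/3, 1/15, 1/255, ...  tending to 0
    newton_implicit(-2.0)    # -4/3, -16/15, -256/255, ...  tending to -1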

Example 7. Suppose we are given

    F(x, y, z) = ( x^2 y + z )
                 ( x + y^2 z )

and are asked to find approximate values for y and z satisfying F(1, y, z) = 0. The derivative matrix of F with respect to (y, z) is

    F_{(y,z)}(x, y, z) = ( x^2    1  )
                         ( 2yz   y^2 ),
so
    F_{(y,z)}(1, y, z) = (  1     1  )
                         ( 2yz   y^2 ).

Computing the inverse matrix by the formula

    ( a  b )^{-1}                (  d  −b )
    ( c  d )      = 1/(ad − bc) ( −c   a )

gives

    [F_{(y,z)}(1, y, z)]^{-1} = 1/(y^2 − 2yz) (  y^2   −1 )
                                              ( −2yz    1 ).

Then the iteration Formula 4.3 is

    ( y_{k+1} )   ( y_k )                          (  y_k^2     −1 ) (  y_k + z_k     )
    ( z_{k+1} ) = ( z_k ) − 1/(y_k^2 − 2y_k z_k)   ( −2y_k z_k    1 ) ( 1 + y_k^2 z_k ).

This formula shows that, no matter how we choose y_0 and z_0 so that y_0^2 − 2y_0 z_0 ≠ 0, we must always have y_{k+1} = −z_{k+1} for k ≥ 0. Thus we might try y_0 = 1, z_0 = −1. Substitution into the iteration formula shows that y_{k+1} = 1, z_{k+1} = −1 for every k ≥ 0, and in fact F(1, 1, −1) = 0. Thus no further numerical computation is necessary in this example, because we made a lucky choice for y_0 and z_0. In the event that we had not been so lucky, we would compute successive approximations (y_k, z_k) to the solution.


EXERCISES
©if
x 2y + yz = 0, xyz +1=0,
find dxjdz and dyjdz at (x, y, z) = (1, 1, — 1).
dx 1 dy _3
Ans. =
Jz 2 '
rfz 2

2. If Exercise 1 is expressed in the general vector notation of Theorem 4.2, what


are F, x, y, Fy , and Fx l
3. If
x+j— u — i> = 0,

x— j + 2k + t; = 0,

find 3jc/ 3m and 3j/ 3w by

(a) Solving for x and y in terms of u and i>.

(b) Implicit differentiation.

4. If Exercise 3 is expressed in the vector notation of Theorem 4.2, what is the


matrix /'(x)?

5. If x^2 + yu + xv + w = 0, x + y + uvw + 1 = 0, then, regarding x and y as functions of u, v, and w, find

    ∂x/∂u   and   ∂y/∂u

at (x, y, u, v, w) = (1, −1, 1, 1, −1).

    [Ans. ∂x/∂u = 0, ∂y/∂u = 1.]

6. (a) The equations 2x zy + yx 2 + /


2
= 0, x+j + r — 1=0 implicitly
define a curve

fit)

which satisfies

/"(!) =

-1
Find the tangent line to /at f = 1. ,4/75. /

(b) Apply Newton's method to approximate /"(l.l).

7. Let the equation x^2/4 + y^2 + z^2/9 − 1 = 0 define z implicitly as a function z = f(x, y) near the point x = 1, y = √11/6, z = 2. The graph of the function f is a surface. Find its tangent plane at (1, √11/6, 2).

8. Suppose the equation F(x, y, z) = implicitly defines z =f(x,y) and that


z = f(xQ yQ). Suppose further that the surface that is the graph of z =
,

f(x,y) has a tangent plane at (x ,


y ). Show that

dF dF
(x -x ) — (x ,
y , z ) + (y yo) iz
(x °' y°> Zo)

dF
+ (z - z ) — (x ,
y , z ) =
is the equation for this tangent plane.

9. The equations

    2x + y + 2z + u − v − 1 = 0
    xy + z − u + 2v − 1 = 0
    yz + xz + u^2 + v = 0

near (x, y, z, u, v) = (1, 1, −1, 1, 1) define x, y, and z as functions of u and v.

(a) Find the matrix of the differential of the implicitly defined function

    ( x(u, v) )
    ( y(u, v) ) = f(u, v)   at (u, v) = (1, 1).
    ( z(u, v) )

(b) The function f parametrically defines a surface in the (x, y, z) space. Find the tangent plane to it at the point (1, 1, −1).

(c) Apply Newton's method to approximate f(1.1, 1.1).

SECTION 5

CURVILINEAR COORDINATES

Formulas that occur in mathematics and its applications can often be simplified by good descriptions of the quantities to be singled out for special attention. Since in practice these quantities are usually represented by vectors whose entries are real-number coordinates, the problem can be viewed as one of choosing the most useful system of coordinates. Thus we consider introducing coordinates in R^n different from the natural coordinates x_i that appear in the designation of a typical point (x_1, ..., x_n). Specifically, to each point (x_1, ..., x_n) there will be assigned a new n-tuple (u_1, ..., u_n). Clearly, if we are to be able to switch back and forth from one set of coordinates to the other, the assignment described above must be one-to-one; that is, for each (x_1, ..., x_n) there should be just one n-tuple (u_1, ..., u_n) and vice versa. In practice, the new coordinate assignment is often made for some specific subregion of R^n rather than for the whole space. The vector space of new coordinates (u_1, ..., u_n) will be denoted by U^n to avoid confusion with R^n, whose points (x_1, ..., x_n) are being assigned the new coordinates.

Example 1. (Polar coordinates in the plane.) Consider two copies of 2-dimensional space: the xy-plane, denoted by R^2, and the rθ-plane, denoted by U^2. The function T from U^2 to R^2 defined by

    T( r ) = ( r cos θ )
     ( θ )   ( r sin θ )
312 Vector Calculus Chap. 4

From a slightly different point of view, the preceding paragraph says


that T is not one-to-one, but that it becomes so if its domain is restricted
to be a subset of a rectangular half-strip in the /-0-plane defined by
inequalities
<r< oo, O < < O + 2n.

So restricted, Thas an inverse function. Solving the equations x = r cos 0,


y = r sin for r and 6, we obtain, for x ^ 0,
2
V* + y
2
, = arctan - + k-n.

We have used the common convention that any inverse trigonometric


function is the principal branch of the corresponding multiple-valued
function. Hence, the range of the function arctan is the interval —77-/2 <
6 < 77/2. It follows that the function defined by

x > 0,

is the inverse of the restriction of Tto the region <r< 00, — tt/2 < 6 <
77-/2. Similarly, the function defined by

A/7T7\
arccott -
x ], y>0,
I
J
is the inverse of the restriction ofrtoO<r< oo,O<0<7r.
We have not defined polar coordinates for the origin of the xy-plane
simply because
^0 cos 0\ /0\
= for all 6,
sin 0/ \0/

and so the one-to-one requirement fails at the origin. This fact causes no
real difficulty. For example, the equation in rectangular coordinates of
the lemniscate,
0c 2 +/) = 2
2(x 2 -j 2
), (2)

becomes, upon introduction of polar coordinates,

r
2
= 2 cos 20, r > 0. (3)

The image under T of the set of pairs (r, 0) that satisfy Equation (3) is
precisely the set of pairs (x,y) that satisfy Equation (2), except for the
origin. We may simply fill in this one point. See Fig. 9.
Sec. 5 Curvilinear Coordinates 313

Figure 9

Example 2. {Spherical coordinates in ^-dimensional space.) Consider


-£-> Jl 3 , defined by

'r sin (f>


cos 6\ (0 < r < oo
r sin <f>
sin I , I < <</> 77 (4)

r cos </> / [0 < < 2t7.


Here for simplicity we have restricted the domain of Tfrom the outset so
3
that it is one-to-one. Its range is all of Jl with the exception of the z-axis.
Hence, it assigns spherical coordinates (r, </>, 0) to every point of 3i 3 except
those on the z-axis. As with polar coordinates in the plane, the spherical
coordinates (r, cf>, 6) of a point x = (x,y, z) have a simple geometric
interpretation (see Fig. 10): The number r is the distance from x to the

-axis
-
2tt

1 1 1

<i> -axis
(r,<f>,e)
314 Vector Calculus Chap. 4

origin. The coordinate <f>


is the angle in radians between the vector x and
the positive z-axis. Finally, d is the angle in radians from the positive
.x-axis to the projected image (x, y, 0) of x on the xy-plane. The symbols

(f>
and are sometimes interchanged, particularly in physical applications.
We can compute an explicit expression for the inverse function, which
we denote by T~l by , solving the equations

x = r sin (f>
cos 6,

y = r sin <f>
sin 6,

z = r cos (f),

for r, 6, and <j>. We get, for y > 0,

Vx 2 + /+ z
2

z
arccos x
2
+ y
2
> 0.
Vx 2 + r+

Vx 2 + /
Since the range of arccos (the principal branch) is the interval < d < 77,

this function is actually the inverse of the function obtained by restricting


the domain of T by the further condition < d < 77. To get values of 6
in the interval 77 < 6 < corresponding to y
2tt, < 0, we add to the77-

third coordinate in the formula above.


Three surfaces in Jl 3 implicitly defined by spherical coordinate equa-
tions r = 1, </> = 77/4, and d = 77/3, respectively, are shown in Fig. 11.

Figure 11
Sec. 5 Curvilinear Coordinates 315

The corresponding rectangular coordinate equations derived from the


above expressions for T~ l are

x
2
+ / + z2 = 1, x
2
+ v
2
>

z =—
V2
Vx + f + 2
z *, z >

V3x, x> 0,
respectively.

The name "curvilinear" is applied to coordinates for the reason that,


if all but one of the nonrectangular coordinates are held fixed, and the
remaining one is varied, the coordinate transformation defines a curve in
31". Thus in plane polar coordinates the coordinate curves are circles and
straight lines as shown in Fig. 12. For spherical coordinates, typical

6= -r(r varies)
4

r = 2 (6 varies)

6= — jg-(r vanes)

Figure 12

coordinate curves are the circle, semicircle, and half-line obtained as


intersections of the pairs of surfaces shown in Fig. 11. The curves and
surfaces got by varying one or more curvilinear coordinate variables play
the same role that the natural coordinate lines and planes of 31" do. For
example, to say that a point in 3l
3
has rectangular coordinates (x, y, z) =
(1, 2, 1) is to say that it lies at the intersection of the coordinate planes
x = 1 , y = 2, and z = 1 . Similarly, to say that a point in 3l 3 has spherical
coordinates (r, (/>, 6) = (1, tt/4, tt/3) is to say that it lies at the intersection
of the surfaces shown in Fig. 11.
Generalizing from the preceding examples, we see that a system of

curvilinear coordinates in 31" is determined by a function 11" —


T
> 31". It is

assumed that for some open subset N in the domain of T, the restriction of
T to N is one-to-one and therefore has an inverse T' 1 . The curvilinear
316 Vector Calculus Chap. 4

coordinates, determined by T and N, of a point x lying in the image set


T( N) are

= t-A

It is convenient to impose fairly stringent regularity conditions on a


coordinate transformation. Specifically, we shall assume that, at every
point u of N. the function T is continuously differentiable and that
r'(u) is invertible.
The polar and spherical coordinate changes represented by

x l I r cos 6

17 \ r sin

and
f
x\ / r sin </> cos Q\

y \ = I r sin </> sin 6 \

K
zJ \ rcos<j> /

have Jacobian matrices

cos 6 —r sin 6\
(5)
sin r cos B)

and

'sin <j> cos r cos (f>


cos 6 — r sin <f>
sin 6\

sin <£ sin r cos (/> sin /- sin <f>


cos j , (6)

cos (f>
—rs'mcf) /

respectively. The matrices (5) and (6), and more generally the Jacobian
matrices of differentiable coordinate transformations, have a simple
geometric interpretation. Each column of the Jacobian is obtained by
differentiation of the coordinate functions with respect to a single variable,
while holding the other variables fixed. This means that they'th column of
the matrix represents a tangent vector to the curvilinear coordinate curve
for which they'th coordinate is allowed to vary. That is, let the coordinate
t
transformation be given by lL" —
column of the matrix
:ii". Then they'th

of the derivative T'{u ) is a tangent vector, which we shall denote by


Cj, at x = r(u ), to the curvilinear coordinate curve formed by allowing

only they'th coordinate of u to vary. Tangent vectors are shown (with their
Sec. 5 Curvilinear Coordinates 317

'2 Vl)

y
318 Vector Calculus Chap. 4

Figure 14

3
Example 3. {Cylindrical coordinates in 'J\ .) The coordinate transfor-
mation is defined by

f
x\ Ir cos 6^

The Jacobian matrix is

'cos 6 —r sin 6 N

sin 6 r cos d

Curvilinear coordinate surfaces and tangent vectors to curvilinear co-


ordinate curves are shown in Fig. 1 5. Notice that the Jacobian determinant

Figure 15
. .

Sec. 5 Curvilinear Coordinates 319

d(x, y, z)

d(r, 6, z)

Computations involving curvilinear coordinates are much simpler if


the coordinate curves, and the vectors c k are orthogonal. (This is so in the ,

examples we have considered.) In this case it is customary to replace the


c k by unit vectors having the same direction. Thus, letting h x = |cj, . . .
,

h n = c J> we nave a local orthonormal basis (l/^c^


l
l//? n c n ). The . . . ,

result is that the matrix H, whose columns are the rectangular coordinates
of 1/AiCi, l//z„c n is an orthogonal matrix whose inverse is equal to
. . . , ,

its transpose.

Example 4. For spherical coordinates in 'Ji


3
we have h x = 1, h2 = r,

h3 = r sin (f>,
so that the matrix H is given by

'sin <f>
cos 6 cos (f>
cos 6 — sin 6 s

H= | sin <j> sin 6 cos <j> sin 6 cos |

cos (j) —sin (/>

We have assumed < </> < n, so that sin <f>


> 0. Then, because the matrix
H is orthogonal, H* 1 = W and
sin <f>
cos d sin ^ sin d cos </>

_1
// = | cos (f)
cos 6 cos </> sin —sin </> |

-sin cos

We have seen that the vectors c x , . . . , c„, tangent to coordinate curves,


describe approximately the nature of a curvilinear coordinate system. The
inner products of these vectors among themselves occur sufficiently often
that there is a special notation for them:

c i- c i = gn> i,j=l,...,n.

Since c t
• c; — c,- • cu we have g (j = gH . If the vectors c l5 . . . , c n are
orthogonal, as is often the case in practice, then only the inner products
Cj • c t
= gu will be different from zero. A number of important formulas
can be expressed in curvilinear coordinates entirely in terms of the
functions g i} and without explicit reference to the particular curvilinear
coordinate functions being used.
In the .vy-plane suppose curvilinear coordinates are given by

x\ (x(u, v)

yl \y(u, v)
320 Vector Calculus Chap. 4

The Jacobian of the coordinate transformation is

whence

gu +
[dul \du)'

dx dx dy dy
gl2 = g 21 =
du dv du dv

Now suppose that


g22
fh (IF
u(t)

v(t)

is a differentiable function from an interval [a, b] to 'UA Then the equation

(x(u(t), v(t))\
f(t) = a < < t b,
y(u(t), v(t))

is the parametric representation of a curve in Jl 2 . The tangent vector to


the curve can be computed by the chain rule to be

!
'dx} dx du dx djv
dt du dt dv dt

dt dy dy du dy dv
J \
\dt I \du dt dv dtl

The length of the tangent vector dfjdt is then

dx dx dy dy} du dv
+ 2
,

du dv du dv_ dt dt

+
[(IHDW
——
du dv
(Mi 2gi2
dt dt ffi
Sec. 5 Curvilinear Coordinates 321

A similar computation in %n leads to the formula

/ » du, du A1/2
,\
1 2'

(7)
\i,i=i dt dt 1

where/(r) = {xjp&\ ... , «„(/)), • • • , x„(wi(0> • • , "„(0)>


Once g i} have been computed for the particular coordinate system,
the
Equation (7) can be used for any differentiate curve by substituting in the
components of the vector
'du^

dt

To be more specific, suppose we are given plane polar coordinates

Ir cos 6^

\yl \rsin
The Jacobian matrix is

''cos B —rsind^
Vsin 6 r cos dl

Hence
= 1, fa = gn = 0,
lr{B) cos 0^
A curve f{6) = )
has a tangent vector dfjdB of length
\r(8) sin 6 t

((IM
Example 5. If (r, <f>,
3
6) are spherical coordinates in Jl , the length of
a curve y defined by an equation

r\ /r(/)\

= I <f>(t)
I
, a <t <b,
Wv
can be computed from Equation (7). The derivative of the coordinate
transformation
'x\ Ir sin <f>
cos 6^

y I = I r sin <f>
sin

\ rcos(f>
322 Vector Calculus Chap. 4

is the matrix

'sin </> cos 6 r cos </> cos — r sin <f>


sin 0^

sin <f>
sin d r cos (/> sin r sin </S cos 6

cos </> — r sin <f>

The function g i3
- is the inner product of the z'th and the/th column of the
preceding matrix. Hence

(
gll git #13

L #22 £23

Vg31 <?32 #33

And so Equation (7) yields

In particular, the curve A defined by

< r <-
2

has length
\0/ W
i. r /i
/(A)= Vl+sin 2 frff.

This integral cannot be computed by means of an elementary indefinite


integral. It can, however, be approximated by simple methods. Or it can
be reduced to a standard elliptic integral as follows. Replacing / by tt/2 —
6, and using cos 2 0=1— sin 2
d, we get

C"'~ - C" n
VI
1

+ sin
2
tdt = yj2\
1

VI - \ sin
2
d dO.
Jo Jo

The latter integral has been tabulated,! and we estimate

l(X) & y/2 (1.35) & 1.91.

EXERCISES
1. Sketch the three curves given below in polar coordinates.

(a) r = d, < < tt/2.

(b) r(sin - cos 0) = tt/2, 77/2 < < 77.

(c) r = tt/2 cos 0, 77 < < 3tt/2.

f See, for example, R. S. Burington, Handbook of Mathematical Tables, Handbook


Publishers, Inc., Sandusky, Ohio, 1965.
Sec. 5 Curvilinear Coordinates 323

3
2. In Si , sketch the curves and surfaces given below in spherical coordinates.

(a) r = 2, <6< tt/4, tt/4 < <f>


< tt/2.

(b) 1 <r< 2, 6 = tt/2, = tt/4.

(c) O<r<l,O<0< tt/2, ^ = tt/4.

(d) <r< 1 , 6 = tt/4, < <£ < tt/4.

3. Use cylindrical coordinates in Jl 3 to describe the region defined in rec-


tangular coordinates by < x, x2 + y2 < 1.

4. Let (r, 6) be polar coordinates in Jl 2 . The equation

^sin A
< <-,
2' t
t
J

describes a curve in U> 2 2


C
. Sketch this curve, and sketch its image in Jl under
the polar coordinate transformation.

3
5. Let (r, <f>, 6) be spherical coordinates in Si. . The equation

determines a curve in 3l 3 (as well as in the r<f>0-space "U 3 ). Sketch the curve
in H3 . [Suggestion. The curve lies on a sphere.]

6. Prove that the Jacobian matrices (5) and (6) of the polar and spherical
coordinate transformations given in Examples 2 and 3 have inverses.

7. The equations
x = ar sin <£ cos 0\

y = br sin <j> sin 8 \ , a, b, c > 0,

z = cr cos <j>

define ellipsoidal coordinates in SI 3 For a = 1, b = c = 2, sketch a typical .

example of each of the three kinds of coordinate surface.

8. Compute the cartesian components of the tangent vectors to the coordinate


curves for the general ellipsoidal coordinates given in Exercise 7, when
a = b = 1, c = 2, and r = \, <f>
= 6 = -n-/2.

Let r, and 6 be spherical coordinates in Si 3 . The equation


J). (f>,

2
t

3
determines a curve in Si. . Compute the cartesian components of the tangent
vector to the curve.
\ ,

324 Vector Calculus Chap. 4

10. Prove that in the 3-dimensional spherical coordinates of Example 2 in the


text, the sphere x\ - x\ - x\ = 1 has the equation r = 1.

11. Let "elliptic" coordinates in the plane be determined by

(x\ I ar cos 6
a > 0, b > 0.
br sin

(a) Compute the coefficients g tj


for this coordinate system.
(b) For what choices of a and b will it always be true that^ = for i ?fc /?

12. Verify the assertion made in the text that the y'th column of the Jacobian
matrix at u of a coordinate transformation 1L" —T > 31" is a tangent vector
at x = r(u ) to the curvilinear coordinate curve obtained by letting they'th
coordinate of u vary.

13., Show that if/(x, y) = /(/, 6), where (x, y) = (r cos 6, r sin 0), then

a2 / a2 / a2 / i a2 / 1 a/
a* 2 ~ "a^ 2 ~ a/2 ~ 72 ae 2
_r"
7 Tr '

14. (a) Find the formula for the arc length of a curve determined in plane polar
^ coordinates by an equation of the form

jr(t))
a < <b. t

(b) Compute the length of the curve


( a J )
, < < t 2.

[Ans. 2\ 5 - log (2 + v'5).]

(c) Sketch the curve.

15. An equation
fu\ iu(t)\
a<t<b,
3
determines a curve y on the conical surface in 3l

1
u cos r sin a\
\ (0 <M < 00,
u sin v sin a I

I [O < t; < 2tt,


u cos a /

where a is fixed, < a < 77/2. Find the general formula for the arc length
of y.

16. Let T(u lt ...,«„) = (Xj, . . . , x n ) define a curvilinear coordinate system in


a region D of Jl n .

" dx k dx k
(a) Show that a?,, = > -r- -=— for , /,
/'
= 1, . . . , n.
fc=l 9«£ 9«;
Sec. 6 Inverse and Implicit Function Theorems 325

ij
(b) If (g ) is the matrix inverse to (go), show that

k=i Sx k dx k

(c) Show that if /(« 1} ...,«„) represents a function differentiable in a


region D of 31", then
9
/
V/=I lift
(d) Show that if r defines an orthogonal coordinate system in D, then

SECTION 6

If a function/is thought of as sending vectors x into vectors y in the range INVERSE AND
of/, then it is natural to start with y and ask what vector or vectors x IMPLICIT FUNCTION
are sent by /into y. More particularly, we may ask if there is a function THEOREMS
-1
that reverses the action of/ If there is a function/ with the property

/
_1
(y) = x if and only if /(x) = y,

-1
then/ is called the inverse function of/. It follows that the domain of
/
-1
is the range of/ and that the range of/ -1 is the domain of/ Some
familiar examples of functions and their inverses are

/(*) = *\
326 Vector Calculus Chap. 4

whenever y x and y 2 are in the range of L. If the dimension of .11" is less

than that of %m , the range of L is a proper subspace of 3i


m In . this case,
L _1 is not defined on all of 3l m . On the other hand, if .11" and :Ji
m have the

same dimension, the domain of L~ x is all of 3l m Thus the inverse function



.

of every one-to-one linear function % n


31" is a linear function
31" ^> 31".

Example 1. Consider the affine function 31

'x\ /4 5\ Ix

y\ = 1 -6 j

,z/ \3 4/\z
It is obvious that any affine function A(x) = L(\ — x ) + y is one-to-one
if and only if the linear function L is one-to-one also. In this example,

and L(x) =

The inverse matrix of

can be computed to be

It follows that L, and therefore A, has an inverse. In fact, if /l(x) = y,


then
^(x) = L(x - x ) + y
and
A~Hy) = L-Hy - y ) + x .
(1)

That this is the correct expression for A' 1 may be checked by substituting
^(x) for y. We get

A-^Aix)) = L-^LCx - x )) + x = x.

Hence
— —
Sec. 6 Inverse and Implicit Function Theorems 327

Obviously this method will enable us to find the inverse of any affine
transformation 31" -^-> %n if the inverse exists.

We have the following criteria for deciding whether a linear function


%n — 31
m has an inverse. If M is the matrix of L, then by Theorem 4.2 of
Chapter 1 the columns of M are the vectors Z.(e ; ) and so span the range
of L. Hence L is one-to-one, and has an inverse, if and only if the columns

of M are linearly independent. Alternatively, // M is a square matrix,


then L has an inverse and only if the inverse matrix A/ -1
if exists. We
recall that M _1 exists if and only if det ^ 0. M
The principal purpose of this section is the study of inverses of non-

linear vector functions. Given a function 31" > % n one may ask: (1)
Does it have an inverse? and (2) If it does, what are its properties? In
general it is not easy to answer these questions just by looking at the
function. On the other hand, we do know how to tell whether or not an
affine transformation has an inverse and, what is more, how to compute it
explicitly when it does exist. Furthermore, iff is differentiable at a point
x it can be approximated near that point by an affine transformation A.
,

For this reason, one might conjecture that if the domain of/is restricted
to points close to x then/will have an inverse if A does. In addition, one
,

might guess that A~* is the approximating affine transformation to f" 1


near/(x ). Except for details, these statements are correct and are the
content of the inverse function theorem.

6.1 The Inverse Function Theorem

Let 51" — * 31 n be a continuously differentiable function such that


f'(x ) has an inverse. Then there is an open set containing x N
such that/, when restricted to N, has a continuously differentiable
inverse/ -1 . The image set f(N) is open. In addition,

[f-'Yiyo) = [/'(xo)]-
1
,

where y =/(x ). That is, the differential of the inverse function at


y is the inverse of the differential of/ at x .

The existence of/ -1 is proved in the Appendix. Once the existence has
been established, we can write where 51" >% n is thef~ °/=
x
/,

identity transformation on the neighborhood N. Then by the chain rule


we have, since the identity transformation is its own differential,

[/-TWtaO = l or If-'Yiyo) = [/'(xo)]-


1
-

For real-valued functions of one variable, the existence of an inverse


function is not hard to prove. Let 31 — 51 satisfy the differentiability
328 Vector Calculus Chap. 4

condition of the theorem, and suppose that/'(x ) has a matrix inverse.


Since the inverse matrix exists whenever/' (x ) ^ 0, the geometric meaning
of the condition that/'(,Y ) have an inverse is that the graph of/should not
have a horizontal tangent. To be specific, suppose that/'(x ) > 0. Since
/' is continuous, we havef'(x) > for every x in some interval a < x < b
that contains x , as shown in Fig. 16. We contend that/ restricted to this

Figure 16

interval is one-one. For suppose x x and x 2 are any two points in the
interval such that xx < x 2 By . the mean-value theorem it follows that

/(*»)-/(*!)
/'(-Xo),

for some x in the interval xx < x < x2 . Since/'(x ) > 0, and x 2 — xx >
0, we obtain
/(*,) -f(Xl ) > 0.

Thus, /is strictly increasing in the interval a < x < b, and our contention
is proved. It follows that /restricted to this interval has an inverse. The
other conclusions of the inverse function theorem can also be obtained in
a straightforward way for this special case.

Example 2. Consider the function /defined by

x
3
— 2xy 00 < X < 00,

x +y oo < y < oo.


At the point

-1
the differential dx /is defined by the Jacobian matrix

'3x 2 -2y 2
-4xy\ (\ 4

1 1
Sec. 6 Inverse and Implicit Function Theorems 329

The inverse of this matrix is

Since /is obviously differentiable, we conclude from the inverse function


theorem that in some open set containing x the function /has an inverse
_1
/ Moreover, if
.

y =/(Xo) =

the matrix of the differential dy f~ x is

Although it would be difficult to evaluate/ -1 explicitly, it is easy to write


down the affine transformation that approximates/ -1 in the vicinity of the
point y . It is the inverse A~ x of the affine transformation A that approxi-
mates/near x We have, . either by the inverse function theorem or by
Formula (1) of Example 1,

^(x)=/(x )+/'(x )(x-x )

= y +/'(x )(x - x ).

Ar\i) =/-1 (y ) + [/- 1 ]'(y )(y - y )

= x + [/'(x )]- 1 (y - y ).

lu\
Hence, if we set y = I

\v

u\
+
i

u
+

Example 3. The equations

u — x 4_y + x, v = x +y z

define a transformation from Jl 2 to :il


2
. The matrix of the differential of
the transformation at (x, y) = (1, 1) is

(4x3y + 1 x4 \ 15 1

2
1 3^ /( X>y)=(lil) \1 3
330 Vector Calculus Chap. 4

Since the columns of this matrix are independent, the differential has an
inverse, and according to the inverse function theorem the transformation
has an inverse also, in an open neighborhood of (x,y) = (1, 1). The
inverse transformation would be given by equations of the form

x = F(u, v), y — G(u, v).

The actual computation of F and G is difficult, but we can easily compute


the partial derivatives of F and G with respect to it and v at the point
(w, v) = (2, 2) that corresponds to (x, y) = (1, 1). These partial deriv-
atives occur in the Jacobian matrix of F and G or, equivalently, in the
inverse matrix of the differential of the given functions. We have

(2,2) jEftJV
'f
au ov 1/5 1

f
\ou
(2,2) ?W
ov

Suppose %n — 3i
n
is a function for which the hypotheses of the
inverse function theorem are satisfied at some point x . It is important to
realize that the theorem does not settle the question of the existence of an
inverse for the whole function/, but only for/ restricted to some open set
containing x For example, the transformation
.

x\ III cos v\
= . , 0<u,
y) \ u sin vj

has Jacobian matrix

with inverse matrix


(cos v sin v \

— 1.1 - sin v - cos v I

u u I

The inverse matrix exists for all (it, v) satisfying it > 0. However, the
otherwise unrestricted transformation clearly has no inverse, for the same
image point is obtained whenever v increases by 277. Two corresponding
regions are shown in Fig. 17. If the transformation is restricted so that,
for instance, < v <becomes one-to-one and has an inverse.
2tt, then it

In Section 4 we considered
problem of finding derivatives of an
the
implicitly defined function / under the assumption that / satisfied an
equation F(x,/(x)) = 0, with both / and F differentiable. We saw that
to solve for/'(x ) by matrix methods it was necessary for Fy (x ,/(x )) to
have an inverse. It is natural that the same condition occur in the next
Sec. 6 Inverse and Implicit Function Theorems 331

9_7T

lit

Figure 17

theorem, which treats the question whether there exists a differentiable f


The proof can be made to depend on the inverse
defined implicitly by F.
function theorem, and we give both proofs together in Section 5 of the
Appendix.

6.2 Implicit Function Theorem

Let 'Si n+m


F —
> % m be a continuously differentiable function. Suppose
that for some x in 3i n and y in %
m

1. F(x o ,y ) = 0.

2. Fy (x , y ) has an inverse.

Then there exists a continuously differentiable function %n — > %m


defined on some neighborhood N of x such that /(x ) = y and
F(x,/(x)) = 0, for all x in N.

As we showed in Theorem 4.2, the derivative of/ is then given by

f'(x) = - Fv\x, f(x))F x (x, f(x)).


Example 4. The equation x zy +y 3
x —2= defines y =f(x)
implicitly in a neighborhood of x = 1 if/(l) =1. As a function of y,
x 3y +yx— 3
2 has Jacobian (1 + 3y 2 ) at x = 1 , and the latter is invertible
at y = 1, that is,

1 + 3/U = 4^0.
332 Vector Calculus Chap. 4

The solution can be computed by standard methods to be

3/1 1 / x
iU
+ 27 3/1 _ 1 x™ + 27 |
x xV 27 Vx xV 27 '

Example 5. The equations


z 3x + w 2y 3 + 2xy = 0, xyzw —1=0 (2)

can be written in the form F(x, y) = 0, where x = I ) , y = j


1 , and
\yj \wj
(z 3 x + w 2 y 3 + 2xy\

xyzw — 1

Let x = I and y = I I •
Then

l-z 3 - w2 + 2
F<*o, y) =
\ zw - 1

and the matrix Fy {\, 1) is

-3z
2
2w\ (-3 -2
w z /Q=(l)
" \ 1 1

The inverse exists and is the matrix

1 -2
1 3

It is then a consequence of the implicit function theorem that Equations


(2) implicitly define a function/in an open set about x such that/(x ) = y .

That is, we have

and so each of z and w is a function of x and y near


'-r

EXERCISES
1. Can a function have two different inverses?

2. Show that (/"V 1 =/


f See R. S. Burington, Mathematical Tables and Formulas, Handbook Publishers,
Inc., Sandusky, Ohio, 1965.
Sec. 6 Inverse and Implicit Function Theorems 333

3. Which of the following functions have inverses?

(a) y = cosh x, — oo <x < oo.

(b) y = cosh x, <x < oo.

(c) /(*) = tan *,

(d) /(x) = tan x, < x < tt/4.


(e) j = x 2 - 2x + 1, 1 <x < oo.

(f) y = x 2 -3x + 2,0 <x < co.

-1
4. Compute /I for the following affine functions:
"
(a) A(x) = Ix + 2.

1 3 H - 1

(b) ^
2 4 v -2
jc - 3
*
/l«.y. /4

7 -4
Usjngthe inverse function theorem, show that the following functions have
*
x
inverses wnen"TestriclelTTo~5rjtfrie'lJ"pen set containing .

(a) f(x) = tan x, x = 77/6.


ib)y 3x + 2, xQ = 4.

(c) y 7x + 6, x = 4.

\d) /(jc) = f e-« dt, x = 0.


J — 00

jr — j'
6. Let/
2xy

(a) Show that, for every point x except

the restriction of/ to some open set containing x„ has an inverse.


(b) Show that, with domain unrestricted, /has no inverse.

-1
(c) If/ is the inverse of/ in a neighborhood of the point /
J,
compute
x
the affine transformation that approximates close to \2)
f~

-3

Ans.
"

334 Vector Calculus Chap. 4

7. Find the affine function that best approximates the inverse of the function

lx\ /x3 + 2xy + y2


f
\yj \ x +y
1,

l
( \
near the point / Notice that to find the precise inverse would be
W
I.

difficult.

I 3 3 \ /

Ans.

8. (a) Let T be defined by

(x\ lr\ tr cos 0\ [0 < r,

\yj \d) \r sin /' |o < d < 2i

Find T'(u) and its inverse for those points

for which they exist,

(b) Let S be defined by

f r sin <f>
cos 8

r sin <f>
sin 6

r cos <f>

Find S'(ii) and its inverse for those points

for which they exist,

(c) Compute an explicit representation for S~1 .

9. Suppose that the function T defined by

T
vl \yl \g( x y)
>

has a differentiable inverse function 5 defined by

(h(u, v)
--
S
y] \v] \k(u, v)

If/(l,2) =3,^(1,2) =4, and T\\, 2) equals

3 5\ dh
find —
dv
(3,4). [Ans. -5.]
4 7/

Sec. 6 Inverse and Implicit Function Theorems 335

10. If

compute dv/dy at the image of («, v, w) = (1, 2, —1), namely, (x, y, z) =


(2, 6, 8). [Arts. 0.]

11. Let
/h\ /h 2 + h 2 i; + Kb
2
\v) \ U + V

-1
(a) Show that /has an inverse/ in the vicinity of the point

/11.8
-1
(b) Find an approximate value of/ I

13. Show that the differentiable function

/ f{x,y,z)

F{x,y,z) = I g{x,y,z)

\f(x, y, z) + g{x, y, z)j

can never have a differentiable inverse.

14. Although the condition that the differential dx f have an inverse is needed
for the proof of the inverse function theorem, it is perfectly possible for
this condition to fail even though an inverse exists. Verify this fact with the
example /(jc) = x3 .

15. The theorem is the correct modification of the simple but


inverse function
an inverse, then/ has an inverse. The converse
false assertion that if </x /has
namely, if / has an inverse, then clx f has an inverse is also false (see —
Exercise 14). It too, however, is almost true. Using the chain rule, prove
the corrected form: Iff is differentiable and has a differentiable inverse, then
dx f is one-to-one.

16. Consider the function 31 > Jl defined by

(X l
- + x 2o sin -, if x + 0.
2
/(*)=
lo, if x = 0.
336 Vector Calculus Chap. 4

Show that rf /is one-to-one but that/ has no inverse in the vicinity of x = 0.

What does the example show about the inverse function theorem?

17. What is the inverse function of the linear function

if

18. The inverse function theorem can be generalized as follows:

Let Jl" — > Jl m , where n < m, be continuously different table. If dx f is


one-to-one, there is an open set N containing x such that f, restricted to N,
has an inverse f" 1 .

An this theorem and the inverse function


important difference between
theorem as we have stated it is that here the image /(TV) is not an open
-1
subset of &'". One consequence of this is that/ is not differentiable at

/(So).

(a) If

for what points (// , v ) does / have an inverse in a neighborhood of


/(«o> y o) ?
(b) Prove the generalized inverse function theorem. [Hint. Let the vectors

yls . . .
, yn be a basis for the range of dx f.
Extend to a basis
v i> • • •
, y«, y n+ \, • • , y m for all of &'", and define R m -£->• Jl" by

Show
GH
that {g o /)and dXo (g ° /) = (df(X(j) g)
= («!, . . . , a n ).

° (dx
J) satisfy the condition
of the inverse function theorem.]

19J Consider the equation (x — Tfy + xe v_1 = 0.

(a) Is y defined implicitly as a function of x in a neighborhood of (x, y) =


(1,D?
(b) In a neighborhood of (0, 0)?
(c) In a neighborhood of (2, 1)?

20. The point (x, y, t) = (0, 1, —1) satisfies the equations

xyt + sin xyt =0, x +y + t = 0.

Are x and y defined implicitly as functions of t in a neighborhood of


(0, 1, -1)?

21. Requirement 2 in the implicit function theorem that Fy (\ , y ) have an


inverse is not a necessary condition for the equation F(x, y) = to define a
a

Sec. 7 Surfaces and Tangents 337

unique differentiable function / such that /(x ) = y . Show this by taking


F{x,y) = x 9 - and (x ,y ) = (0, 0).
f
22. Show that if N' is an open subset of &n+m containing the point (x , y ), then
the subset N of all x in 31" such that (x, y ) is in N' is an open subset of 51".

23. Prove that under the assumptions of Theorem 6.2 there is only one function
/defined by F(x,f(xj) =
neighborhood of x and satisfying/(x ) = y
in a .

[Hint. Use the function H(x, y) = (x, F(x y)).]

SECTION 7

While explicit, implicit, and parametric representations of surfaces have SURFACES AND
so far been used for illustrative purposes, a precise definition of the term TANGENTS
"surface" has not been given. In this section we shall define a smooth
surface in terms of each of the three representations, give a unified
definition of tangent for each mode of representation, and show how the
three are related. In particular, we shall see that for each of the three
ways of representing a surface —as an image, as a graph, or as a level set —
representation for a tangent is obtained by taking the image, graph, or
level set of the affine approximation to the given function.
An /7-dimensional plane in 3l m is either an ^-dimensional linear sub-
space (that is, the set spanned by n linearly independent vectors) or else an
affine subspace (that is, the translation of a linear subspace by a fixed
vector y ). When n = 1 or n = 2 we get a line or an ordinary plane. A
parametric representation of an /?-dimensional plane in 3i m is obtained by
looking at the range of an affine function %n — >'Ji
m
, where A(x) =
L(x) +
y and L is a linear function defined on 31". To ensure that the
,

range of A is ^-dimensional, we can require that the matrix of L have n


linearly independent columns, since the columns L{z ) span the range of L.
}

Parametrically Defined Surfaces

Let a set 5 be defined parametrically by a function Jl" — > 3i m .

According to the definition of Section 1 of Chapter 3 this means


that 5 is the image under/ of the domain of/. Next we restrict/ to
a neighborhood of some point x on which /is one-to-one. If/ is
differentiable at x then the affine function A that approximates/
,

near x is given by A(x) =/(x ) +/'(x )(x — x ), for all x in %n .

Then, if A defines parametrically an n-dimensional plane, this plane


is called the tangent plane to S at/(x ). Notice that n, the dimension
of the plane, required to be the same as the dimension of the
is

domain space of/. If, in addition, /is continuously differentiable


on its domain, then the set S is called a smooth surface (or smooth
curve if n = 1) at every point at which there is a tangent.
w

338 Vector Calculus Chap. 4

Example 1. Consider the surface S in 3-dimensional space 51 3 defined


parametrically by
'u cos pN
,'»\ / . \
(0<k<4,
10 < C <2,

This function is discussed in Example 7, Section 1 Chapter 3, and its

range, which is the surface S, is pictured in Fig. 6. At (w , v ) = (2, 77/2)


the matrix of the differential is

'cos v — sin v Q

sin v u cos v

The affine function A(\) =f(x ) +/'(x )(x — x ) that approximates f


near x is therefore given by

are linearly independent, we conclude that the range of A is a plane.


Hence, the surface S has a tangent plane at (0, 2, tt/2). Eliminating u and
v from the equations
x =— 2v + 77

y= u
Z — V,

we obtain
x = — 2z + 77

as the equation that implicitly defines the tangent plane to S at (0, 2, 77/2).

See Fig. 18.


Sec. 7 Surfaces and Tangents 339

Figure 18

Example 2. The function of / defined by

ft'
fx\
7(0= \y I
= I
r \.

and discussed in Example 6, Section 1 of Chapter 3, parametrically defines


the curve shown in Fig. 19. The differential of/ at t is defined by the
Jacobian matrix
1

and the affine function

^4(0 =/('„) +/'('„)(' -?o)


that approximates /near t is given by

/ 1

3| +(t-tM2t
2
\3t /

l
1°'
= t\ 2t \ I — I t\ |

\3d/ \2tl
340 Vector Calculus Chap. 4

Figure 19 Figure 20

Since (1, 2r , 3/q) ^ 0, it follows that the range of A is the tangent line to
the curve at /"(/„). Figure 19 shows the curve and the tangent line to it at
/(I) = (1,1,1)-

The condition that the affine approximation to 3t n — > 51 '" defines an


^-dimensional plane is important both because of the restriction it places on
the tangent and because of the smoothness that it requires of the surface
if/ is continuously differentiable. As far as the tangent goes, it is clear
that its dimension at/(x ) is the same as the dimension of the range of the
differential of/ at x
But because the columns of the m-by-n Jacobian
.

matrix/'(x ) differential, it is enough to require this


span the range of the
matrix to have n linearly independent columns. That is,

7.1 Theorem

Let 3i n — > %m be differentiable. Then the tangent to the range


of/at/(x ) exists (and has dimension n) if and only if the matrix
/'(x ) has n linearly independent columns.

The requirement that / be continuously differentiable signifies for a


smooth surface 5 that the tangent varies continuously from point to point
on S. To see the effect of the dimension requirement for a smooth surface,
we consider the following example.
Sec. 7 Surfaces and Tangents 341

Example 3. The function 'J\


2 — > Jl 3 defined by

'u cos y>

f(u, v) — I
u sin u |
, for (w, v) in Jl 2 ,

is continuously differentiable because the Jacobian matrix

-
ftp. »)

has continuous entries. The range of/ is a cone shown in Fig. 20. Points
of the cone not at the vertex correspond to values of u ^ 0, and it is easy
to check that for u 7^ the columns of/'(w, v) are linearly independent.
Thus the tangent plane at such a point has the expected dimension,
namely 2. However, at the vertex, u = 0. Therefore, /'(0, v) has only one
nonzero column. Thus any attempt to use the affine approximation to/
to define a tangent at the vertex leads to a 1-dimensional tangent. Indeed,
it seems natural to say that the cone has no tangent at its vertex. However,
the cone satisfies the definition of smooth surface at every other point.
The lack of smoothness at the vertex is not associated with a lack of
differentiability in/, but rather with the failure of the tangent to exist.

Explicitly Defined Surfaces

Suppose a set S is defined explicitly by a function Jl"


n+m
% m This — .

means that S is the graph of/ in % consisting of the points of ,

the form (x,/(x)), for all x in the domain of/ If/ is differentiable
at x , the affine function

^(x) =/(x ) +/'(x )(x - x )

explicitly defines an ^-dimensional plane, and this plane is called the


tangent to S at x . If in addition/is continuously differentiable, then
S is a smooth surface.

The complication in the parametric theory that requires checking that


the differential has ^-dimensional range does not occur here. If/'(x )
exists, then A always has as its graph an ^-dimensional plane.
342 Vector Calculus Chap. 4

7.2 Theorem

The graph of every affine function ,'R" %m is an /z-dimensional


plane.

Proof. By the definition of an affine function, there exists a linear

function ft" — > %m and a vector y in ft


m such that

A(x) = L(x) + r

} o> for all x in ft".

The graph of A is the set of all points

(x, A(x)) = (x, I(x)) + (0, y ), x in ft".

It is therefore the image under translation by (0, y ) of the graph of


L. Hence, the graph of A is an /2-dimensional plane if and only if the
graph of L, which is a subspace of ft" +m , has dimension n. But if
(e l5 . . . , e„) is the natural basis for %n , then any point (x, L(x)) on
the graph of L can be written

(x, L(x)) = x1 (e 1, L(ej)) + . . . + x n (e n , L(e„)).

Clearly, the n vectors (e ; , L(ef)) are linearly independent, and since


they span the graph of L, that graph has dimension /?.

Example 4. The hemisphere shown in Fig. 21 is defined explicitly by

(x\
y/9 - X2 -r

(2,1,2)

Figure 21
Sec. 7 Surfaces and Tangents 343

The differential of g at x = is defined by the Jacobian matrix

f
\ x/ 9
-*
_ x 2_ };
2
-y
)
^9 _ x a _ y^,^
= U
\
-iv
2/

The tangent plane to the hemisphere at (2, 1 , 2) is the graph of the approxi-
mating affine function

A(x) = g(x ) + g'(x )(x - x )

(- -9C:3
9
= x — - y.1

2 2

The plane is implicitly defined by the equation z = ^4(x), that is, by

2x + + _y 2z = 9.

The graph of any function 31" — 'Ji


m can always be represented para-

metrically by a function Jl" —^->- 3l n+m of the form g(x) = (x,/(x)). This
raises the question of whether the sets 5 that can be represented in both
ways have the same tangents and of whether the notion of smooth surface
is the same in both representations. It is clear that g is continuously

differentiable if and only if/is, because g'(x ) has the form

\/ ( x o)

where / is the n-by-n identity matrix. Thus

g(*o) + g'(Xo)(x - x„) = (x /(x )) + (x - x ,/'(x )(x -


,
x ))

= (x,/(x )+/'(x )(x-x )),

and this shows that the graph of the affine approximation to/is the same as
the range of the affine approximation to g. Hence the two definitions of
tangent and of smooth surface are the same where they overlap.
We recall that the null space of a linear function 3i n+m > 3i m is a —
subspace of 3i n+m Then, for a fixed vector x the set of all x such that
. ,

x —
x is in the null space of L is a plane in 3i n+m Clearly, the plane passes .

through x . For nonlinear functions we have the following.


344 Vector Calculus Chap. 4

Implicitly Defined Surfaces

Consider a function Ji
n+m — > %m and a fixed vector z in %m . Let
5 be the level set defined by the equation F(x) = z . If F is differ-

entiable at a point x in % n+m and the affine approximation A(x) —


F(x ) + F'(x )(x — x ) determines an /7-dimensional plane implicitly
by A(x) = z then this plane is called the tangent to S at x„. Since
,

F(x ) = z the defining equation of the plane reduces to


,

F'(x )(x - x ) = 0. (1)

If in addition Fis continuously differentiable on its domain, then S


is representable as a smooth surface near every point at which there is
a tangent.

Example 5. The equation x 2 +y + 2


z2 = 9 implicitly defines a sphere
of radius 3 with center at the origin in 3-dimensional Euclidean space.
An equation of the tangent plane to the sphere at

x = (x ,y ,z ) = (2, 1,2)

is determined as follows. If F(x, y, z) = x2 -f y


2
+ z2 , the Jacobian
matrix F'(x ) is

(2x 2v 2z ) = (4 2 4).

Equation (1), which implicitly defines the tangent plane, is therefore

(4 2 4)1 y- 1 =0.

This is equivalent to 4x + 2y + 4z = 18 and thence to 2x + y + 2z = 9.


Notice that we have found the same equation as that obtained for the
tangent plane in Example 4.

If the plane determined by the equation F'(x )(x = — x is not n-


dimensional, then, according to the definition, the function %n+m
)

—F > 3i m
does not assign a tangent to the level set F (x) = z at x = x . This is

similar to the complication that occurs in the parametric theory and that
gave rise to Theorem 7.1. In the present case we need to know that n is

the dimension of the null space of the linear transformation with matrix
F'(x ). Since the dimension of the null space is equal to the dimension of
thedomain minus the dimension of the range, we want the dimension of the
range to be m. (Theorem 4.7, Chapter 2.) Hence
Sec. 7 Surfaces and Tangents 345

7.3 Theorem

Let Jl"+" !
—^'J{'" be differentiable. Then the tangent to the level
set F(\) = z at x exists (and has dimension n) if and only if F'(x )

has m linearly independent columns.

Example 3 shows that in the parametric case a surface may fail to be


smoothly represented because the dimension of the tangent is too small.
The following example is fairly typical of the way in which the
standard method of assigning tangents in the implicit case may fail.

Example 6. Figure 22 shows the folium of Descartes, which is

the level set determined by x3 +y — 3


3xy — 0. The function
F(x, y) = x3 +y — 3
3xy is continuously differentiable with deriv-
ative

F'(x, y) = (3x
2
- 3y, 3/ - 3.x).

The criterion of Theorem 7.3 requires that one or another of the


two entries be different from zero, in which case the tangent at a
point (x ,
y ) satisfies
Figure 22
Oo - yo)(x - x Q) + Oo - x )(y - >- ) = 0.

The one point at which the differential fails to assign a tangent is (x, y) —
(0, 0). There the null space of the differential has dimension 2.

The function 31 — > Si


2
given by

/ 3f _3r_\
3t
fit) = 1 < < t 00
ll + J
^'l
f 1 + ry'

is a parametrization of the part of the curve that lies in the first and second
quadrants, and it assigns the curve a horizontal tangent at the origin.
Interchanging the coordinate functions off gives a parametrization of the
part of the curve in thefirst and fourth quadrants.

If 'Ji
n+ —
m F >- % m is continuously differentiable and its level set F(\) =
has an implicitly determined tangent at a point x , then according to
Theorem 7.3 the m-by-(n +
m) matrix -F'(x ) has m linearly independent
columns. Denote the variables corresponding to these columns by the
vector y and observe that the implicit-function theorem applies. The
conclusion is that, writing x = (v, y), there is a continuously differentiable
function 31™ — > tfl
m satisfying F(v,/(v)) = in some neighborhood of x .

The significance of this result is that, restricted to a neighborhood of x ,

the level set F(x) = can be represented as the graph of the function/. It is

routine to show that we get the same tangent by using the explicit or the
implicit representation, and we leave the computation as Exercise 12.
346 Vector Calculus Chap. 4

EXERCISES
1. Find a parametric representation tx 1 + x 2 for the tangent line to each of the
curves defined parametrically by the following functions at the points
indicated. Sketch the curve and the tangent line.

(a) /(/) = I , at/(0). Ans. t

(b) g(t) = y

2. Find the tangent plane to each of the surfaces defined parametrically by the
following functions at the points indicated. Sketch the surface and the
tangent plane in (a) and (b).

< u <2
(a)/
< v <2 ^/

Ans.

< u < 2tt


a 8 \ttJ4}
<v < rrjl]

3. Find the tangent plane or tangent line to each of the following explicitly
defined curves and surfaces at the points indicated.

(a) f{x) =(x- 1)(* - 2)(x - 3), at (0, -6).

(b) f(x,y) = ~^—


x" + 2
c
, at (x , >v/(Wo)) = (0, 2, £).
y

'1

(c) git) = Ans. L(t) = t

(d) g{x,y) = cosh (x 2 + y 2 ), at (x ,


y g(x y
, , )) =(1,2, cosh 5).
Sec. 7 Surfaces and Tangents 347

4. Find the tangent line or tangent plane to each of the following implicitly
defined curves and surfaces at the points indicated. Sketch the curve or
surface and the tangent in (b), (d), and (e).

(a) xy + yz + zx = 1, at x = (2, — 1, 3).

T +^+z
2
(b) l,atx = (l>0>-y)-

Ans. - + V3 z = 2.
2

(c) 5x + 5y + 2z = 8 at (1, 1, -1).

(
'2

d ) -5 +
/ = l.atUo.J'o) (4--).
15

x* + f
, atx
x +y

Ans.

5. In each of Exercises 1(a), 2(a), 2(b), and 4(b), find a normal to the given
curve or surface at the point indicated.
Ans. 1(a)

6. Each of the following curves and surfaces fails, according to our definitions,
to have a tangent line or plane at the indicated point. Why?

(a) /(/)

(b) g{t)

(c)/
348 Vector Calculus Chap. 4

(d) f{x, y) = Vi - jc* - /, at \-j- , — ,


0)

7. Find all points at which the surface defined parametrically by the function

,2„2

«t + 1

fails to have a tangent plane.

8. Different vector functions can define the same curve or surface. Show that
the functions

fx (t) = (cos t, sin t), < t < 2tt,

Is
1 - 1 2s \

parametrically define the same curve in 2-dimensional Euclidean space.

9. Consider the vector functions

f{t) = I
*" I, -oo < t < 00,

— 00 < u < 00,

— 00 < v < oo.

(a) What curve and what surface are parametrically defined by / and g,
respectively?
(b) Show that according to our definition the curve does not have a tangent
line and the surface does not have a tangent plane at (0, 0, 0).

10. Let

fit) = I .),
-oo < r < co.

(a) Show that the curve in Jl 3 defined explicitly by /has a tangent line at
every point.
(b) Show that the curve in &'- defined parametrically by / fails to have a
tangent line at one point.
Sec. 7 Surfaces and Tangents 349

(c) Interpret (a) and (b) geometrically. What is the relation between the
tangent line in (a) and in (b)?

11. Let y = lie in the range of a function &n —F Rm . The surface S defined
implicitly by the equation F(x) = is assumed to have a tangent "B at x .

n+ m
(a) Check that the surface S' in R defined explicitly by F has a tangent
T?' at (x y ).
,

(b) Let P be the plane in & n+ '"


defined by the equation y = 0, and show that
S = S' n P.
(c) Prove that T3 = 15' n P.
(d) Using the equation F(x, y) = x 2 + y 2 — 2 = and the point x =
(1, 1), draw a picture illustrating S, "B, S' TS' and P. , ,

(e) Using the equation F(x,y) = 4x 2 — 4xy + y 2 = and the point


x = (1, 2), draw a picture illustrating S, S', TS', and P. What happens
to "6?

12. Show that if Jl n+m —


> Jl m determines a smooth surface S, then the tangent
to S at x determined by F'(x )(x — x ) = is the same as the tangent to the

graph of the function X. n —


v 3l m which satisfies F(v,/(v)) = 0, and whose
existence is guaranteed by the implicit function theorem.

13. Verify that the two parametrically defined tangents at the origin that are
described in Example 6 of the text are horizontal and vertical, respectively.

14. Show that if P is an ^-dimensional plane through the origin in Jt


m+n , then P
is precisely the null space of some linear function taking values in Rm .
5

Real- Valued Functions

SECTIOxN 1

EXTREME VALUES The problem of finding the maximum and minimum values of a real-
valued function of several variables is important in many branches of
applied mathematics, as well as in pure mathematics. Familiar examples
are extremes of temperature, speed, or economic profit, each of which
may be a function of more than one variable in a practical problem.
A real-valued function / has an absolute maximum value at x if, for
all x in the domain of/,
f(x) </(x ),

and an absolute minimum value if instead

/(x ) </(x).

The number/(x ) is called a local maximum value or a local minimum value


if there is a neighborhood N of x such that, respectively,

f(x) </(x ) or /(x ) </(x),

for all x in iV. A maximum or minimum value of/ is called an extreme


value. A point x at which an extreme value occurs is called an extreme
point.

Example 1. Consider the function f(x, y) = x2 -j- 2


y whose domain is
the set of points (x, y) that lie inside or on the ellipse x 2 + 2y 2 = 1. The
graph of/ is shown in Fig. 1. Suppose that /has an extreme value (i.e.,

maximum or minimum) y at a point (x , ) in the interior of the ellipse.

Then obviously, both functions/ and/ defined by

A(x)=f(x,y ), fz (y)=f(x ,y)

350
Sec. 1 Extreme Values 351

must also have extreme values at x and y , respectively. Applying the


familiar criterion for differentiable functions of one variable, we have

fi(xo)=tt(yo) = 0.
Since

/iOo) = — 0„, Jo)


ox
and
r)f

/2O0) = — (*o, y ),
dy

a necessary condition for/to have an extreme value at (x , v ) is

— (*o> Jo) = — (*o> >'o) = 0.


ox oy
In this example,
d
-f{x,y)
ox
= 2x and ^
oy
(x, y) = 2y,

and so the only extreme value of/ in the interior of the ellipse occurs at
( x o>yo)
= (0, 0). It is obvious from the graph of/, shown in Fig. 1, that

< °>
' V2 '

(1, 0, 0)

Figure 1

this value is a minimum. We next consider the values of/on the boundary
curve itself. The ellipse can be defined parametrically by the function

g(0 (x, y) = 1 cos t,


—= sin t) , < < t 2i
\ J2 )

Thus, the values of/on the ellipse are given as the values of the composition
f° g. Any extreme values of/ on the ellipse will be extreme for/° g. The
.

352 Real- Valued Functions Chap. 5

latter is a real-valued function of one variable, and we treat it in the usual


way, that is, by setting its derivative equal to zero. By the chain rule, we
obtain

/ —sin / \

-(fog) = (2 cos 1 4= sin t )l 1


cos
dt \ v 2 /\-;
7
= — 2 cos sin + t t sin t cos t

= — I sin 2/.
Extreme values therefore may occur at / = 0, 77/2, tt, and 37r/2. The cor-

responding values of (a\ v) are (1,0), (0, I/V2), (-1, 0), and (0, -1/V2),
and those off are and \, respectively. We see that the absolute
1. \, 1,

minimum of/ is and that the absolute maximum of/ occurs at


at (0, 0)
the two points (1,0) and (—1,0). Notice that the two extreme values of

f°g that occur at t = tt\2 and 377/2 are not extreme for/ as can be seen
by looking at Fig. 1

The methods used in the preceding example are valid in any number of
dimensions. The next theorem is the principal criterion used in this

extension, and while it can be proved by reducing it to the one-variable


method, we give a proof that contains the one-variable situation as a
special case.

1.1 Theorem

If a differentiate function 31" — > :H has a local extreme value at a


point x interior to its domain, then/'(x ) = 0.

Proof. Suppose/ has a local minimum at x . For any unit vector u


in :({", there is an e > such that if —e < < t e, then /(x ) <
/(x + /u). Hence, for < < t e,

^ /(x + t u) - /(x )

f(x - tu) -/(x )


<

It follows by Theorem 1.1 of Chapter 4 that


df
(x ) =/(x )u.
du
Sec. 1 Extreme Values 353

Therefore,
- /(Xo)
0<lim /(Xo +
fU)
= r(x )u,
?->o+ t

n ^ hm /(Xp -
< i-
tu) - f(x )
=/ (xo)(— u) = —/ (x„)u.
,

<-o+ t

We conclude that/'(x )u = 0. Because u is arbitrary, /'(x ) = 0.

The argument for a maximum value is analogous.

This result is what we should expect. Recall that

(x ) =/'(x )u,
du

and that the derivative v/ith respect to u measures the rate of change of/
in the direction of u. At an extreme point in the interior of the domain of/,
this rate should be zero in every direction. The importance of the theorem
is that of all the interior points x of the domain of/ we need to look for

extreme points only among those for which/'(x) = 0. Points x for which
/'(x) = are called critical points of/.
In practice we are often given a function /that is differentiable on an
open set and want to find the extreme points of/ when it is restricted to
some subset S of the domain of/ example the following two
In the next
remarks are x such that /'(x) =
illustrated: (1) a point is not necessarily

an extreme point for f; (2) / may have an extreme point x when restricted
to a set S without having /''(x) — 0.

Example 2. z) = xyz in the region defined by |xj < 1,


Let f(x,y,
\y\ < l,|z| <
Thus the domain of/is the cube with each edge of length
1.

2 illustrated in Fig. 2. The condition for critical points, /'(x) = 0, is


equivalent to (yz, xz, xy) = (0, 0, 0). The solutions of this equation are
the points satisfying x = y = 0, or x — z — 0, or y = z = 0; in other
words, the coordinate axes. Since /has the value zero at any one of its
critical points, and since /has both positive and negative values

in the neighborhood of any one of these points, no critical point


can be an extreme point. Furthermore, a little thought shows that
/has maximum value and minimum value — 1. These values
1

occur at the corners of the cube, none of which is a critical point.

The problem of finding the extreme values of a function/on the


boundary of a subregion R of Jl" is one in which /has been re-
stricted to a set S of lower dimension than that of R. Then, as
we have seen in Example 2, it is not sufficient just to examine the
critical points of/ as a function on R. More generally, we may Figure 2
354 Real- Valued Functions Chap. 5

be interested in/when it is restricted to a lower-dimensional set S that is

not necessarily the boundary of any region at all.

Example 3. The function f{x,y, z) —y — 2


z — x has a differential
defined by the matrix
(-1 ly -i),

so/has no critical points as a function defined on .'It


15
. Suppose, however,
that /is restricted to the curve y defined parametrically by

oo < < / oo.

On y, /takes the values F(t) —f(t, t


2
, t
3
) = t
4
— t
z
— t while t varies over
(— co, oo). We have

F'(t) - 4/ 3 - 3/
2 - 1 = (t - 1)(4/
2
+ + / 1).

Then F'(t) is zero only at / = 1. Furthermore, since F"(t) — 12/ 2 — 6/, we


have F"(\) > 0. It follows that the point (1, 1, 1) is a relative minimum
for /restricted to the curve y. The minimum value of/on y is — 1, and
there are no other extreme values.

Example 4. Suppose the function /(.v, v, z) = x -+-


y + z is restricted
to the intersection of the two surfaces

a- r- = 1 , z = 2

shown in Fig. 3. The curve C of intersection can be parametrized by

f
x\ /cosr

y\ = \
sin / 1 , < < t 2tt.

The function / on C takes the value F(t) = cos t -\- sin / + 2. We have
F'(t) = -sin / + cos t, so F'(t) = at / = it/4 and t = 5tt/4. Since
F"(tt/4) < Oand F"(5rrjA) > 0,

/(x/2/2, V2/2, 2) = x/2 + 2

is the maximum and


/(-x/2/2, -x/2/2, 2) = -x/2 + 2

is the minimum value for/on C.

The solution of the previous problem depended on our being able to

find a concrete parametric representation for the curve of intersection of the


Sec. 1 Extreme Values 355

Figure 3

cylinder x2 +y — 2
1 =0 and the plane z — 2 = 0. When a specific
parametrization is not readily available, we can still sometimes apply the
method of Lagrange multipliers to be described below. The method
consists in verifying the pure existence of a parametric representation and
then deriving necessary conditions for there to be an extreme point for a
function /when restricted to the parametrized curve or surface.

1.2 Theorem. Lagrange Multiplier Method

Let the function %n — > 'Ji


m n
, > m, be continuously differentiable
and have coordinate functions G x , G2 , . . . , Gm . Suppose the
equations

G 1 (x1 ...,xn) ,
=
G2 (x 1 , ... ,xn) =

Gm (xx ...,xn ) ,
=
n
implicitly define a surface S in Jl , and that at a point x of S the
matrix G"(x has some m columns linearly independent.
If x is
)

an extreme point of a differentiable function 31" > %, —


when restricted to S, then x is a critical point of the function

A,„G,

for some constants lx . . . , Xm .


356 Real- Valued Functions Chap. 5

The complete proof of the theorem is given in the Appendix. We give
here a geometric argument that makes the result plausible. Recall that
the gradient of a differentiable, real-valued function $f$ defined in $\mathbb{R}^n$ is the
vector-valued function $\nabla f$ defined by
$$\nabla f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}(\mathbf{x}), \ldots, \frac{\partial f}{\partial x_n}(\mathbf{x}) \right).$$
In terms of the gradient, the Lagrange condition
$$f'(\mathbf{x}_0) + \lambda_1 G_1'(\mathbf{x}_0) + \ldots + \lambda_m G_m'(\mathbf{x}_0) = 0$$
can be expressed as
$$\nabla f(\mathbf{x}_0) + \lambda_1 \nabla G_1(\mathbf{x}_0) + \ldots + \lambda_m \nabla G_m(\mathbf{x}_0) = 0. \tag{1}$$
Equation 2.4 of Chapter 4 says that the vector $\nabla G_i(\mathbf{x}_0)$ is perpendicular
at $\mathbf{x}_0$ to the surface $S_i$ defined implicitly by $G_i(\mathbf{x}) = 0$. But then each
vector $\nabla G_i(\mathbf{x}_0)$ is also perpendicular, at the same point, to the intersection
$S$ of all the surfaces $S_i$. Since by Equation (1), $\nabla f(\mathbf{x}_0)$ is a linear
combination of the vectors $\nabla G_i(\mathbf{x}_0)$, the gradient of $f$ itself is perpendicular
to $S$ at $\mathbf{x}_0$. Now recall that by Theorem 2.2 of Chapter 4, the gradient of $f$
points in the direction of greatest increase for $f$. That this direction should
be perpendicular to $S$ at an extreme point $\mathbf{x}_0$ for $f$ restricted to $S$ is reasonable,
because otherwise we would expect to find a larger or smaller value
for $f$ by moving along $S$ in the direction of $\nabla f(\mathbf{x}_0)$ projected onto $S$.

In applying the theorem it is important to verify that some $m$ columns
of $G'(\mathbf{x})$ are independent for $\mathbf{x}$ in $S$. Points for which this condition fails
must be examined separately in looking for extreme points. All extreme
points $\mathbf{x}_0$ for which the condition is satisfied are such that there are
constants $\lambda_1, \ldots, \lambda_m$ for which
$$f + \lambda_1 G_1 + \ldots + \lambda_m G_m$$
has $\mathbf{x}_0$ as a critical point, or in other words,
$$f'(\mathbf{x}_0) + \lambda_1 G_1'(\mathbf{x}_0) + \ldots + \lambda_m G_m'(\mathbf{x}_0) = 0. \tag{2}$$

Example 5. The problem of Example 4 is that of finding the extreme
points of $f(x, y, z) = x + y + z$ subject to the conditions
$$x^2 + y^2 - 1 = 0, \qquad z - 2 = 0. \tag{3}$$
We write down
$$(x + y + z) + \lambda_1 (x^2 + y^2 - 1) + \lambda_2 (z - 2).$$
The critical points of this function occur when
$$1 + 2\lambda_1 x = 0, \qquad 1 + 2\lambda_1 y = 0, \qquad 1 + \lambda_2 = 0.$$
In addition, we must satisfy Equations (3). Solving for $\lambda_1$ and $\lambda_2$, as well as
$x$, $y$, and $z$, we get
$$\lambda_1 = \mp \frac{\sqrt{2}}{2}, \qquad \lambda_2 = -1, \qquad x = y = \pm \frac{\sqrt{2}}{2}, \qquad z = 2.$$
That is, the critical points are
$$\left( \frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}, 2 \right) \qquad \text{and} \qquad \left( -\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}, 2 \right).$$
As in Example 4, we easily see that $f$ has its maximum value, $\sqrt{2} + 2$, at
the first of these points and its minimum value, $-\sqrt{2} + 2$, at the other.
Notice that, while the $\lambda$'s are not needed in the final answer, it is necessary
to consider all values of the $\lambda$'s for which the equations can be satisfied.

Example 6. Find the maximum value of $f(x, y, z) = x - y + z$,
subject to the condition $x^2 + y^2 + z^2 = 1$. The function
$$x - y + z + \lambda (x^2 + y^2 + z^2 - 1)$$
has critical points satisfying
$$1 + 2\lambda x = 0, \qquad -1 + 2\lambda y = 0, \qquad 1 + 2\lambda z = 0,$$
and
$$x^2 + y^2 + z^2 = 1.$$
The solutions of these equations are
$$\lambda = \pm \frac{\sqrt{3}}{2}, \qquad x = z = \mp \frac{1}{\sqrt{3}}, \qquad y = \pm \frac{1}{\sqrt{3}}.$$
The maximum of $f$ occurs at $(1/\sqrt{3}, -1/\sqrt{3}, 1/\sqrt{3})$. The maximum value
is $\sqrt{3}$.
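Systems of Lagrange conditions like the one in Example 6 can also be solved by machine. The following is a minimal sketch using the sympy library (an assumption about the reader's environment; any computer algebra system would do); it sets up the Lagrange function for Example 6, solves the critical-point equations together with the constraint, and prints the value of $f$ at each solution.

```python
import sympy as sp

x, y, z, lam = sp.symbols('x y z lam', real=True)

f = x - y + z                       # function to be maximized
g = x**2 + y**2 + z**2 - 1          # constraint g = 0

L = f + lam * g                     # Lagrange function
eqs = [sp.diff(L, v) for v in (x, y, z)] + [g]

solutions = sp.solve(eqs, [x, y, z, lam], dict=True)
for s in solutions:
    # the larger printed value of f is the maximum, sqrt(3)
    print(s, '  f =', sp.simplify(f.subs(s)))
```

The two printed solutions are the critical points found above, with $f = \pm\sqrt{3}$.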

Example 7. Let $g(x_1, x_2, \ldots, x_n) = 0$ implicitly define a surface $S$ in
$\mathbb{R}^n$, and let $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ be a fixed point. Minimizing the distance
from $S$ to $\mathbf{a}$ is the same thing as minimizing the square of the distance.
Thus $\mathbf{p}$, the nearest point to $\mathbf{a}$ on $S$, must be among the critical points of
$$\sum_{k=1}^{n} (x_k - a_k)^2 + \lambda g(x_1, \ldots, x_n)$$
for some $\lambda$. The critical points satisfy, in addition to $g(x_1, \ldots, x_n) = 0$,
the equations
$$2(x_1 - a_1) + \lambda \frac{\partial g}{\partial x_1}(x_1, \ldots, x_n) = 0$$
$$\vdots$$
$$2(x_n - a_n) + \lambda \frac{\partial g}{\partial x_n}(x_1, \ldots, x_n) = 0.$$
In vector form these equations reduce at the critical point $\mathbf{p}$ to
$$\begin{pmatrix} p_1 - a_1 \\ \vdots \\ p_n - a_n \end{pmatrix}
  = -\frac{\lambda}{2} \begin{pmatrix} \dfrac{\partial g}{\partial x_1}(\mathbf{p}) \\ \vdots \\ \dfrac{\partial g}{\partial x_n}(\mathbf{p}) \end{pmatrix},$$
where $\mathbf{p} = (p_1, \ldots, p_n)$. The vector $\mathbf{p} - \mathbf{a}$ on the left is then parallel to the
normal vector to $S$ at $\mathbf{p}$, which appears on the right side of the equation.
In other words, $\mathbf{p} - \mathbf{a}$ is perpendicular to $S$. A 2-dimensional example is
illustrated in Fig. 4.

Figure 4
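As a numerical illustration of Example 7, the constrained minimization can be carried out directly with an off-the-shelf optimizer. The sketch below (assuming NumPy and SciPy are available; the surface $g$ and the point $\mathbf{a}$ are chosen only for illustration) finds the point of the unit circle $x^2 + y^2 = 1$ nearest to $\mathbf{a} = (2, 1)$ and checks that $\mathbf{p} - \mathbf{a}$ is parallel to $\nabla g(\mathbf{p})$.

```python
import numpy as np
from scipy.optimize import minimize

a = np.array([2.0, 1.0])                      # the fixed point a
g = lambda x: x[0]**2 + x[1]**2 - 1.0         # the surface g(x) = 0 (unit circle)

# minimize the squared distance |x - a|^2 subject to g(x) = 0
res = minimize(lambda x: np.sum((x - a)**2),
               x0=np.array([1.0, 0.0]),
               constraints=[{'type': 'eq', 'fun': g}])

p = res.x
grad_g = 2 * p                                # gradient of g at p
d = p - a
# p - a is parallel to grad g(p) exactly when this 2-by-2 determinant vanishes
print('nearest point p =', p)
print('det =', d[0] * grad_g[1] - d[1] * grad_g[0])   # approximately 0
```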

Example 8. Suppose that a cylindrical can is to contain a fixed volume
$V$ and that its surface area, with top and bottom, is to be as small as
possible. If the radius of the can is $x$, and its height is $y$, then $V = \pi x^2 y$.
We want to minimize the total area $2\pi x^2 + 2\pi xy$ of the top, bottom, and
sides. We write
$$f(x, y) = 2\pi x^2 + 2\pi xy + \lambda (\pi x^2 y - V)$$
and look for critical points of $f$. We find that $f_x = 0$, $f_y = 0$ reduce to
$$2x + y + \lambda xy = 0, \qquad 2x + \lambda x^2 = 0.$$
The second equation is satisfied if $x = 0$ or if $\lambda x = -2$. But $x = 0$
would require $V = 0$. So we substitute $\lambda x = -2$ into the first equation
to get $2x = y$. Thus the height $y$ must equal the diameter $2x$. The value of $x$
for a given volume $V$ can then be determined from the equation $2\pi x^3 = \pi x^2 y = V$.

Example 9. The planes
$$x + y + z - 1 = 0 \qquad \text{and} \qquad x + y - z = 0$$
intersect in a line $S$ as shown in Fig. 5. Let $f(x, y) = xy$, and restrict $f$
to the line $S$. Using the Lagrange method, we consider
$$xy + \lambda (x + y + z - 1) + \mu (x + y - z).$$
Its critical points occur when
$$y + \lambda + \mu = 0, \qquad x + \lambda + \mu = 0, \qquad \lambda - \mu = 0.$$
The only point that satisfies these conditions, together with the condition
that it lie on $S$, is $\mathbf{x}_0 = (\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{2})$. We have $\nabla f(\mathbf{x}_0) = (\tfrac{1}{4}, \tfrac{1}{4}, 0)$, which is
perpendicular to $S$. The unit vector $\mathbf{u}$ in the direction of $\nabla f(\mathbf{x}_0)$ is shown
in Fig. 5 with its initial point moved to $\mathbf{x}_0$.

[Figure 5]


EXERCISES
1. Find the critical points of $x^2 + 4xy - y^2 - 8x - 6y$. [Ans. $(2, 1)$.]

2. Find the points at which the largest and smallest values are attained by the
following functions.

(a) $x + y$ in the square with corners $(\pm 1, \pm 1)$.
    [Ans. Max. $(1, 1)$, min. $(-1, -1)$.]

(b) $x + y + z$ in the region $x^2 + y^2 + z^2 \le 1$.
    [Ans. Max. $(1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3})$, min. $(-1/\sqrt{3}, -1/\sqrt{3}, -1/\sqrt{3})$.]

(c) $x^2 + 24xy + 8y^2$ in the region $x^2 + y^2 \le 25$.
    [Ans. Max. $\pm(3, 4)$, min. $\pm(4, -3)$.]

(d) $1/(x^2 + y^2)$ in the region $(x - 2)^2 + y^2 \le 1$.
    [Ans. Max. $(1, 0)$, min. $(3, 0)$.]

(e) $x^2 + y^2 + (2\sqrt{2}/3)xy$ in the ellipse $x^2 + 2y^2 \le 1$.
    [Ans. Max. $(\pm 2/\sqrt{5}, \pm 1/\sqrt{10})$, min. $(0, 0)$.]

3. Find the point on the curve

that is farthest from the origin. [Ans. $(-1, 0, 1)$.]

4. Find the critical points of the following functions.

(a) $x + y \sin x$.  (b) $xy + xz$.  (c) $x^2 + y^2 + z^2 - 1$.

5. Find the maximum value of the function $x(y + z)$, given that $x^2 + y^2 = 1$
   and $xz = 1$. [Ans. $\tfrac{3}{2}$.]

6. Find the minimum value of $x + y^2$, subject to the condition $2x^2 + y^2 = 1$.
   [Ans. $-1/\sqrt{2}$.]

7. Let $f(x, y)$ and $g(x, y)$ be continuously differentiable, and suppose that,
   subject to the condition $g(x, y) = 0$, $f(x, y)$ attains its maximum value $M$
   at $(x_0, y_0)$. Show that the level curve $f(x, y) = M$ is tangent to the curve
   $g(x, y) = 0$ at $(x_0, y_0)$.

8. A rectangular box with no top is to have surface area 32 square units. Find
   the dimensions that will give it the maximum volume.

9. Find the minimum distance in $\mathbb{R}^2$ from the ellipse $x^2 + 2y^2 = 1$ to the line
   $x + y = 4$. [Hint. Treat the square of the distance as a function of four
   variables.]

10. (a) Find the maximum value of $x^2 + xy + y^2 + yz + z^2$, subject to the
        condition $x^2 + y^2 + z^2 = 1$. [Ans. $1 + 1/\sqrt{2}$.]

    (b) Find the maximum value of the same function subject to the conditions
        $x^2 + y^2 + z^2 = 1$ and $ax + by + cz = 0$, where $(a, b, c)$ is a point at
        which the maximum is attained in (a). [Ans. 1.]

11. Consider a differentiable function $f \colon \mathbb{R}^n \to \mathbb{R}$ and a continuously
    differentiable function $G \colon \mathbb{R}^n \to \mathbb{R}^m$, $m < n$. Suppose the surface $S$ defined by
    $G(\mathbf{x}) = 0$ has a tangent $\mathcal{T}$ of dimension $n - m$ at $\mathbf{x}_0$, and that the function $f$
    restricted to $S$ has an extreme value at $\mathbf{x}_0$. Show that $\mathcal{T}$ is parallel to the
    tangent to the surface defined explicitly by $f$ at the point $(\mathbf{x}_0, f(\mathbf{x}_0))$.

12. (a) Find the points $\mathbf{x}_0$ at which $f(x, y) = x^2 - y - y^2$ attains its maximum
        on the circle $x^2 + y^2 = 1$. [Ans. $(\pm\sqrt{15}/4, -\tfrac{1}{4})$.]

    (b) Find the directions in which $f$ increases most rapidly at $\mathbf{x}_0$.
        [Ans. $(\pm\sqrt{15}/4, -\tfrac{1}{4})$.]

13. The planes $x + y - z - 2w = 1$ and $x - y + z + w = 2$ intersect in a
    set $S$ in $\mathbb{R}^4$. Find the point on $S$ that is nearest to the origin.
    [Ans. $(\tfrac{27}{19}, -\tfrac{7}{19}, \tfrac{7}{19}, -\tfrac{3}{19})$.]

14. Let $\mathbf{x}_1, \ldots, \mathbf{x}_N$ be points in $\mathbb{R}^n$, and let
    $$f(\mathbf{x}) = \sum_{i=1}^{N} |\mathbf{x} - \mathbf{x}_i|^2.$$
    Find the point at which $f$ attains its minimum, and find the minimum value.

15. Prove by solving an appropriate minimum problem that if $a_k > 0$,
    $k = 1, \ldots, n$, then
    $$(a_1 a_2 \cdots a_n)^{1/n} \le \frac{a_1 + \cdots + a_n}{n}.$$

SECTION 2

QUADRATIC POLYNOMIALS

Let $F(\mathbf{x}, \mathbf{y}) = \mathbf{x} \cdot \mathbf{y}$ be the Euclidean dot-product of two vectors $\mathbf{x}$ and $\mathbf{y}$
in $\mathbb{R}^n$. In addition to having the property $F(\mathbf{x}, \mathbf{x}) \ge 0$, the function $F$
satisfies
$$F(\mathbf{x}, \mathbf{y}) = F(\mathbf{y}, \mathbf{x}) \tag{1}$$
$$F(\mathbf{x} + \mathbf{x}', \mathbf{y}) = F(\mathbf{x}, \mathbf{y}) + F(\mathbf{x}', \mathbf{y}) \tag{2}$$
$$F(a\mathbf{x}, \mathbf{y}) = a F(\mathbf{x}, \mathbf{y}), \tag{3}$$
where $a$ is any real number. As a consequence of the definition of $F$, or of
the above three properties, $F$ is linear in the second variable also. Because
of the symmetry property (1) and the linearity in both variables, a real-valued
function $F$ satisfying (1)-(3) for all pairs of vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$ is
called a symmetric bilinear function. Such a function can be written in
terms of coordinates as follows. Let $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$.
Then
$$\mathbf{x} = \sum_{i=1}^{n} x_i \mathbf{e}_i, \qquad \mathbf{y} = \sum_{j=1}^{n} y_j \mathbf{e}_j,$$
where $\mathbf{e}_k$, $k = 1, 2, \ldots, n$, are the natural basis vectors
$$\mathbf{e}_1 = (1, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, \ldots, 0), \quad \ldots, \quad \mathbf{e}_n = (0, 0, \ldots, 1)$$
of $\mathbb{R}^n$. We have from (2) and (3)
$$F(\mathbf{x}, \mathbf{y}) = F\Bigl( \sum_{i=1}^{n} x_i \mathbf{e}_i, \sum_{j=1}^{n} y_j \mathbf{e}_j \Bigr)
  = \sum_{i=1}^{n} \sum_{j=1}^{n} F(\mathbf{e}_i, \mathbf{e}_j)\, x_i y_j,$$
where, by (1), $F(\mathbf{e}_i, \mathbf{e}_j) = F(\mathbf{e}_j, \mathbf{e}_i)$. Conversely, an arbitrary choice of the
numbers $F(\mathbf{e}_i, \mathbf{e}_j) = a_{ij}$, consistent with $a_{ij} = a_{ji}$, determines the most
general symmetric bilinear function. In summary, symmetric bilinear
functions are just those which, in terms of coordinates, have the form
$$F(\mathbf{x}, \mathbf{y}) = \sum_{i,j} a_{ij} x_i y_j, \qquad a_{ij} = a_{ji}. \tag{4}$$
In particular, if $a_{ii} = 1$ and $a_{ij} = 0$ for $i \ne j$, we get our original example
$$\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i.$$

If $F$ is a symmetric bilinear function on $\mathbb{R}^n$, the real-valued function of
a single vector defined by
$$Q(\mathbf{x}) = F(\mathbf{x}, \mathbf{x}) \qquad \text{for all } \mathbf{x} \text{ in } \mathbb{R}^n$$
is called a homogeneous quadratic polynomial or sometimes a quadratic form.
Thus, by definition, every $Q$ is associated with some bilinear $F$, and vice
versa. From (4) it follows that, in coordinate form,
$$Q(\mathbf{x}) = \sum_{i,j} a_{ij} x_i x_j, \qquad a_{ij} = a_{ji}. \tag{5}$$
For example, if $F$ is the Euclidean dot-product, we have associated with it
the quadratic polynomial
$$\mathbf{x} \cdot \mathbf{x} = \sum_{i=1}^{n} x_i^2.$$

The word homogeneous applied to a polynomial means that all terms
have the same degree in the coordinate variables $x_i$. Throughout this
section the phrase "quadratic polynomial" will be understood to mean
"homogeneous quadratic polynomial."

Equation (4) can be written as a matrix product as follows:
$$F(\mathbf{x}, \mathbf{y}) = (x_1 \; x_2 \; \ldots \; x_n)
  \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ \vdots & & & \vdots \\ a_{n1} & \ldots & & a_{nn} \end{pmatrix}
  \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
or
$$F(\mathbf{x}, \mathbf{y}) = \mathbf{x}^t A \mathbf{y} = \mathbf{x} \cdot A\mathbf{y}.$$
This follows immediately from the definition of matrix multiplication. The
condition $a_{ij} = a_{ji}$ means that the matrix $A = (a_{ij})$ is symmetric about its
principal diagonal.

In the matrix notation, (5) becomes
$$Q(\mathbf{x}) = \mathbf{x}^t A \mathbf{x},$$
and we have as a familiar special case
$$Q(\mathbf{x}) = \mathbf{x}^t I \mathbf{x} = \mathbf{x} \cdot \mathbf{x}.$$
In case $a_{ij} = 0$ for $i \ne j$, $A$ is a diagonal matrix and $Q$ is said to be
represented in diagonal form.

Example 1. We give some examples of quadratic polynomials.
$$(x \;\; y) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = x^2 + 4xy + y^2,$$
$$(x_1 \; x_2 \; x_3 \; x_4)
  \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}
  \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
  = x_1^2 + 2x_2^2 + 3x_3^2 + 4x_4^2.$$

A quadratic polynomial $Q$ is called positive definite if $Q(\mathbf{x}) > 0$ except
for $\mathbf{x} = 0$. We remark that if $Q$ is positive definite and $F$ is its associated
bilinear function, then $F$ is an inner product in $\mathbb{R}^n$. This is so because
(1)-(3), together with the positive definiteness condition, are the characteristic
properties of an inner product.

Example 2. The graphs of two quadratic polynomials are shown in
Fig. 6. The one on the left is positive definite. The other one is not; its
graph is called a hyperbolic paraboloid.

[Figure 6]

Example 3. The quadratic polynomial
$$Q_1(x, y, z) = (x - y - z)^2 = x^2 + y^2 + z^2 - 2xy - 2xz + 2yz$$
is nonnegative. However, it is not positive definite because it is zero on the
plane $x - y - z = 0$.

The polynomial
$$Q_2(x, y, z) = (x + y + z)^2 - (x - y - z)^2 = 4xy + 4xz$$
changes sign. In fact, $Q_2$ is negative on the plane $x + y + z = 0$ and
positive on the plane $x - y - z = 0$, except along the line of intersection
of the two planes, where $Q_2$ is zero.

The polynomial
$$Q_3(x, y, z) = x^2 + y^2$$
is nonnegative, but not positive definite, because $Q_3(0, 0, z) = 0$ for
arbitrary $z$.

In the examples just given, we have seen illustrations of the fact that if a
quadratic polynomial can be written, say, in the diagonal form
$$Q(x, y, z) = a_1 x^2 + a_2 y^2 + a_3 z^2,$$

then $Q$ is positive definite if and only if all the coefficients $a_i$ are positive.
Furthermore, if some coefficients are negative or zero, it is possible to
determine regions for which $Q$ is positive or negative. In the following
examples we consider one way in which a polynomial $Q(x, y)$ can be
written in diagonal form.

Example 4. In $\mathbb{R}^2$ we get the most general symmetric bilinear function
by choosing $a$, $b$, and $c$ arbitrarily in
$$F((x, y), (x', y')) = (x \;\; y) \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x' \\ y' \end{pmatrix}.$$
The general quadratic polynomial is then
$$Q(x, y) = ax^2 + 2bxy + cy^2.$$
To determine conditions under which $Q$ is positive definite, notice first
that we could not have both $a = 0$ and $c = 0$. For then $Q(x, y) = 2bxy$,
and, if $b \ne 0$, this polynomial assumes both positive and negative values.
Suppose then that $a \ne 0$. Completing the square, we have
$$Q(x, y) = \frac{1}{a}\left[ a^2 \left( x + \frac{b}{a} y \right)^2 + (ac - b^2) y^2 \right]. \tag{6}$$
Similarly, if $c \ne 0$,
$$Q(x, y) = \frac{1}{c}\left[ c^2 \left( y + \frac{b}{c} x \right)^2 + (ac - b^2) x^2 \right]. \tag{7}$$
We see directly that $Q$ is positive definite if and only if $ac - b^2 > 0$ and
either $a > 0$ or $c > 0$.
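The criterion just derived is easy to turn into a computational check. Below is a minimal sketch (plain Python, no external libraries) of a function that classifies the form $ax^2 + 2bxy + cy^2$ from the signs of $a$, $c$, and $ac - b^2$; the function name and sample coefficients are only illustrative.

```python
def classify_form(a, b, c):
    """Classify Q(x, y) = a*x**2 + 2*b*x*y + c*y**2 using ac - b**2."""
    disc = a * c - b * b
    if disc > 0:
        # a and c necessarily have the same sign when ac - b^2 > 0
        return 'positive definite' if a > 0 else 'negative definite'
    if disc < 0:
        return 'indefinite (changes sign)'
    return 'semidefinite (degenerate)'

# Q(x, y) = x^2 + 2xy + 3y^2 from Example 5 below: a = 1, b = 1, c = 3
print(classify_form(1, 1, 3))   # positive definite
print(classify_form(0, 1, 0))   # indefinite: Q = 2xy changes sign
```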

Example 5. Having written $Q$ in one of the two forms (6) or (7), an
obvious change of variable can be used to simplify the polynomial. To be
specific, suppose $a \ne 0$ and that (6) holds. Letting
$$u = x + \frac{b}{a} y, \qquad v = 0 \cdot x + y,$$
we can write $Q$ in the form
$$a u^2 + \frac{1}{a}(ac - b^2) v^2.$$
This transformation of coordinates corresponds to a change of basis in
which the natural basis of $\mathbb{R}^2$ is replaced by the basis
$$\mathbf{x}_1 = (1, 0), \qquad \mathbf{x}_2 = \left( -\frac{b}{a}, 1 \right).$$
The coordinate relations between $x, y$ and $u, v$ can be written in matrix
form as
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 & b/a \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$
To see concretely the geometric significance of the choice of new basis,
we consider a numerical example. Let
$$Q(\mathbf{x}) = Q(x, y) = x^2 + 2xy + 3y^2.$$
Then $a = 1$, $b = 1$, and $c = 3$. The new basis consists of the vectors
$\mathbf{x}_1 = (1, 0)$ and $\mathbf{x}_2 = (-1, 1)$. With respect to the new coordinates we have
$$Q(\mathbf{x}) = u^2 + 2v^2,$$
where
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix},
  \qquad
  \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}.$$
Clearly $Q$ is positive definite. The vectors $\mathbf{x}_1$ and $\mathbf{x}_2$, together with the
level curve $Q(\mathbf{x}) = 1$, are shown in Fig. 7.

[Figure 7]

We have seen in Example 5 that a quadratic polynomial can be reduced
to diagonal form if it is written in terms of the coordinates of an appropriately
chosen basis. However, this was done with basis vectors which
were not necessarily perpendicular. The next theorem shows that we can
always find a diagonalizing basis consisting of perpendicular vectors of
length 1.

2.1 Theorem

Let $Q$ be a quadratic polynomial on $\mathbb{R}^n$. There exists an orthonormal
basis $\mathbf{x}_1, \ldots, \mathbf{x}_n$ such that if $y_1, \ldots, y_n$ are the coordinates of a
vector $\mathbf{x}$ with respect to this basis, then
$$Q(\mathbf{x}) = \lambda_1 y_1^2 + \ldots + \lambda_n y_n^2.$$
As a result,
$$Q(\mathbf{x}_k) = \lambda_k, \qquad k = 1, \ldots, n.$$

Proof. The basis vectors xx , . . . , x„ will be chosen successively as


follows. Let S"" 1 be the set of unit vectors in 'Ji", that is, the set of all
x such that |x| = l.f Let Xj be a maximum point on 5" _1 for the
function Q restricted to S'!_1 . (See the introduction to the Appendix.)
By its choice, x x is a unit vector. Let 'LF^i be the (n — l)-dimensional-
subspace of % n consisting of the vectors x of 51 n that are perpendic-
2 c
ular to xl5 and
let S"~ be the unit sphere in lTn_ 1 Restrict x to .

S n ~ 2 and , x 2 be a vector on S"~ 2 such that Q(x) assumes its


let

maximum, for x on S"~ 2 at x 2 By its choice, x 2 is a unit vector , .

perpendicular to x x Assuming that x lf xk , k < n, have been


. . . . ,

chosen in this way, let cUn_k be the subspace of 'J\ H consisting of all
vectors perpendicular to xl5 x k Let S"-*-1 be the unit sphere in . . . , .

Q maximum on S "^ -1 at the point x k+1


1

^n-k, and let assume its .

Continue the process until n unit vectors have been chosen in this
way, each perpendicular to those already chosen.
The vectors x l5 x n clearly form an orthonormal basis for
. . . ,

31". We now show that this basis is a diagonalizing basis for Q.


Since Q has a maximum at x x when restricted to the unit sphere
|x|
2
— 1 = 0, by Lagrange's theorem, Theorem 1.2, the function
/defined by
fix) = Q(x) - A(|x| 2 - 1)

must have a critical point at x x for some ?.. That is /'(x x ) = 0.


Direct computation shows that every quadratic polynomial Q on %n
and its associated bilinear function F satisfy the equation

Q'(x)y = 2F(x, y)

for any x and y in Jl". (See Exercise 10.) Hence, at the critical
point x ls
0=/'(x 1 )y = 2F(x 1 ,y)-2;.x 1 .y,

and we conclude that


F(x 1; y) = Ax, •
y

for any y in 'Ji". It follows that

F(xl9 x,) = 0, k = 2,..., n, F(xx x x ) , =A= Q{x,).

By restricting Q to
c
Un _ 1 the subspace of % n perpendicular to x x , we
t The set of all unit vectors in .ft" is an (n — l)-dimensional surface implicitly
defined by the equation |x| = 1. For this reason, we write the index n-lon 5n_1 .

can repeat the same argument and obtain

F(x 2 X*)
, = 0, k = 3,. . . , n.

Continuing in this way, we obtain finally

F(Xi, xk) = 0, if i # k.
If an arbitrary vector x is written in terms of the basis x l5 . . . , x„
as x =yx + l 1 . . .
+ y„x„, we obtain

Q(x) = I Fix,, xk)yjyk

= I,F(xk ,xM=lQ(x*)yl-
This completes the proof.

A further consequence of the proof just given can be stated as follows.

2.2 Theorem

The basis vectors xl5 . . . . x n with respect to which a quadratic


polynomial Q has the form

1=1

can be chosen by requiring that Q(x k ) be the maximum value of Q


restricted to the unit sphere of the subspace of "J{" perpendicular to
Xi, x2 , • • • , x k_ x .

The maximum value property of the basis vectors xk can be used to


compute them, as in the next example.

Example 6. Suppose the quadratic polynomial
$$Q(x, y) = 3x^2 + 2xy + 3y^2$$
is expressed using the coordinates of the natural basis for $\mathbb{R}^2$. We restrict
$Q$ to the unit circle
$$x^2 + y^2 - 1 = 0.$$
By Lagrange's theorem, Theorem 1.2, $Q$ will have its maximum at the
critical points of
$$3x^2 + 2xy + 3y^2 - \lambda(x^2 + y^2 - 1),$$
for some $\lambda$. That is, for some $\lambda$, the vector $(x, y)$ must satisfy
$$(3 - \lambda)x + y = 0$$
$$x + (3 - \lambda)y = 0$$
in addition to $x^2 + y^2 = 1$. ($\lambda$ has been replaced by $-\lambda$.) Nonzero
solutions to these equations will exist only if the columns of the matrix
$$\begin{pmatrix} 3 - \lambda & 1 \\ 1 & 3 - \lambda \end{pmatrix}$$
are dependent. Since dependence is equivalent to
$$\begin{vmatrix} 3 - \lambda & 1 \\ 1 & 3 - \lambda \end{vmatrix} = 0,$$
we must have $(3 - \lambda)^2 - 1 = 0$, or $\lambda = 2, 4$. The corresponding solutions
for $(x, y)$ are
$$\lambda = 2: \quad (x, y) = \pm\left( \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}} \right),$$
$$\lambda = 4: \quad (x, y) = \pm\left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right).$$
The maximum of $Q$ occurs at $\pm(1/\sqrt{2}, 1/\sqrt{2})$, so we can choose
$\mathbf{x}_1 = (1/\sqrt{2}, 1/\sqrt{2})$. For $\mathbf{x}_2$ we can take either the vector $(-1/\sqrt{2}, 1/\sqrt{2})$ or
its negative.

Let $\mathbf{x}_2 = (-1/\sqrt{2}, 1/\sqrt{2})$. The change of coordinate equation is then
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}
  \begin{pmatrix} u \\ v \end{pmatrix}.$$
In terms of the new variables we have
$$Q(\mathbf{x}) = 4u^2 + 2v^2.$$
In Fig. 8, level curves of $Q$ are shown in their relation to the new and to the
original basis vectors.

For some purposes it is unnecessary to compute the orthonormal basis
vectors $\mathbf{x}_k$ of Theorem 2.1, provided that the numbers $\lambda_k$ can be found. For
example, it is clear just from knowing the $\lambda_k$ whether $Q$ is positive definite
or not. The following theorem enables us to compute, or estimate, the $\lambda_k$.

2.3 Theorem

Let $Q$ be a quadratic polynomial in $\mathbb{R}^n$ given by
$$Q(\mathbf{x}) = \mathbf{x}^t A \mathbf{x},$$
where $A$ is a symmetric matrix. Suppose that, with respect to the
coordinates of the orthonormal basis $\mathbf{x}_1, \ldots, \mathbf{x}_n$, $Q$ has the form
$$Q(\mathbf{x}) = \sum_{k=1}^{n} \lambda_k y_k^2, \tag{8}$$
where $Q(\mathbf{x}_k) = \lambda_k$. Then the numbers $\lambda_k$ are the roots of the equation
$$\det(A - \lambda I) = 0. \tag{9}$$

Although the existence of the basis vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$ was proved in
Theorem 2.1, it is not necessary to know what they are in order to find
the $\lambda_k$. The $\lambda_k$ can be computed by solving Equation (9). Equation (9)
is called the characteristic equation of $Q$, and the roots $\lambda_k$ are called
characteristic roots or eigenvalues. The next theorem provides another method
for computing the basis vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$.

2.4 Theorem

Let $\mathbf{z}_1, \ldots, \mathbf{z}_n$ be any orthonormal basis such that, for each
$k = 1, \ldots, n$, the vector $\mathbf{z}_k$ satisfies the matrix equation
$$(A - \lambda_k I)\mathbf{z}_k = 0. \tag{10}$$
Then, with respect to this basis, $Q$ has the diagonal form (8).

Vectors $\mathbf{z}_k$ that satisfy Equation (10) are called characteristic vectors or
eigenvectors corresponding to $\lambda_k$.

Proof (of Theorem 2.3). Suppose that the orthonormal basis vectors
xx , x„ that diagonalize Q are
. . . ,

»
Yn
A =:

bm
t

Let B be the n by « matrix with columns x 1; . . . , x„. According to


Chapter 2, Section 10, coordinates of the same point are related by
the equation x = By, where

x = I and y =

and where y 1 ,y n are the coordinates with respect


, . . . to x l5 . . . , x„.
Then substituting By for x gives

g(x) = (ByyA(By)

= y (B AB)y
t t

= y Ay.
f

By the choice of the columns of B, the matrix A = B'AB is a diagonal


matrix with diagonal entries Al5 . . . , An . Furthermore, since the
columns of B are the coordinates of perpendicular unit vectors with
respect to an orthonormal basis, we have directly, by matrix multipli-
cation, B B = I. In other words, B = B~ x Then A = B~ X AB.
( {
.

Subtracting XI from both sides of this equation and factoring the


right-hand member, we get

A- XI = B-'AB - XI
- B'\A - XI)B.

But A— XI is a diagonal matrix with diagonal entries X k — X, so

(A, - X)(X 2 - X) . . . (X n - X) = det (A - XI)


= det 5" det {A - 1
XI) det B
= det (/I - A/).
This shows that the roots of det (A — XI) = are Ax , . . . , AB .

Proof (of Theorem 2.4). Let zx , . . . , z„ be an orthonormal basis


satisfying ^z fc
= A z fc fc
, for k = 1,2,... , «. Let C be the matrix with
columns z l5 z 2 z„. The equation

x = Cz

gives the relation between the coordinates of the basis z x zn in


%n and the natural coordinates. Then

Q(x) = xMx = (Cz)M(Cz) = z'(C ylC)z.


(

All we have to do is verify that the matrix OAC is diagonal with


diagonal entries A ls . . . , Xn . Schematically, we write

C AC l
= I I A(z 1 ... z„)

(i4Zi . . . Az„)

(AjZj . . . X n z n ).

Using the fact that

l, if i=;\
ZjZj = Zj • Xj =
0, if i#j,

we get
'
A,

A,

aac

This completes the proof.

Example 7. Let $Q(x, y, z) = xy + yz + zx$. The matrix of $Q$ is
$$A = \begin{pmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} & 0 \end{pmatrix}$$
and the characteristic equation is
$$\det(A - \lambda I) = 0, \qquad \text{or} \qquad -\lambda^3 + \tfrac{3}{4}\lambda + \tfrac{1}{4} = 0.$$
The characteristic roots are $\lambda = 1, -\tfrac{1}{2}, -\tfrac{1}{2}$. So there is an orthonormal
system of coordinates $(u, v, w)$ with respect to which $Q$ has the form
$$u^2 - \tfrac{1}{2}v^2 - \tfrac{1}{2}w^2.$$
To find the related basis vectors we look for the unit vector solutions of
the equations
$$\begin{pmatrix} -\lambda & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\lambda & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} & -\lambda \end{pmatrix}
  \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 0$$
with $\lambda = 1$ and $\lambda = -\tfrac{1}{2}$. With $\lambda = 1$ we get $x = y = z$ for a solution, so
we can choose $\mathbf{x}_1 = (1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3})$. When $\lambda = -\tfrac{1}{2}$, the matrix
equation simply requires that the two remaining basis vectors lie in the
plane $x + y + z = 0$, perpendicular to $\mathbf{x}_1$. Then $\mathbf{x}_2$ and $\mathbf{x}_3$ can be chosen
to be arbitrary perpendicular unit vectors in that plane, for example,
$$\frac{1}{\sqrt{2}}(1, -1, 0) \qquad \text{and} \qquad \frac{1}{\sqrt{6}}(1, 1, -2).$$
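The eigenvalues and an orthonormal diagonalizing basis of Theorems 2.3 and 2.4 can also be computed numerically. The sketch below (assuming NumPy is available) applies numpy.linalg.eigh to the symmetric matrix of Example 7 and checks that $B^t A B$ is diagonal.

```python
import numpy as np

# symmetric matrix of Q(x, y, z) = xy + yz + zx
A = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

# eigh returns the eigenvalues in ascending order and orthonormal
# eigenvectors as the columns of B, so Q is diagonalized by x = B y.
eigenvalues, B = np.linalg.eigh(A)
print(eigenvalues)                             # approximately [-0.5, -0.5, 1.0]

D = B.T @ A @ B                                # should be diag(eigenvalues)
print(np.allclose(D, np.diag(eigenvalues)))    # True
```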

[Figure 9: quadratic surfaces, including a two-sheeted hyperboloid]

A level surface of a quadratic polynomial is called a quadratic surface.
Since every quadratic polynomial can be diagonalized with respect to some
orthonormal basis, it is sufficient to be able to picture the level surfaces of
some standard quadratic polynomials in order to be able to picture a more
general quadratic surface. Figure 9 shows some illustrations of quadratic
surfaces in $\mathbb{R}^3$.

Example 8. The quadratic polynomial $xy + yz$ can be represented by
a symmetric matrix as
$$(x \;\; y \;\; z) \begin{pmatrix} 0 & \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} & 0 \end{pmatrix}
  \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$
The characteristic equation is $-\lambda^3 + \tfrac{1}{2}\lambda = 0$, and this equation has roots
$\lambda = 1/\sqrt{2},\ 0,\ -1/\sqrt{2}$. The corresponding characteristic vector equations,
together with their unit vector solutions, are as follows:
$$\lambda = \tfrac{1}{\sqrt{2}}: \ \bigl(A - \tfrac{1}{\sqrt{2}} I\bigr)\mathbf{z} = 0, \quad \mathbf{z} = \left( \tfrac{1}{2}, \tfrac{1}{\sqrt{2}}, \tfrac{1}{2} \right);$$
$$\lambda = 0: \ A\mathbf{z} = 0, \quad \mathbf{z} = \left( \tfrac{1}{\sqrt{2}}, 0, -\tfrac{1}{\sqrt{2}} \right);$$
$$\lambda = -\tfrac{1}{\sqrt{2}}: \ \bigl(A + \tfrac{1}{\sqrt{2}} I\bigr)\mathbf{z} = 0, \quad \mathbf{z} = \left( \tfrac{1}{2}, -\tfrac{1}{\sqrt{2}}, \tfrac{1}{2} \right).$$

[Figure 10]

Then
P(x) = F(x, x, . . . , x)

will be a polynomial of degree N. Alternatively, we can consider functions


of x = (xi, x2 , . . . , x n ) having the form
n
P(x) = 2 ^...feXfi • • XiS>
i$=l «1

where the coefficients a, ,


. are symmetric in the subscripts. In both cases
we get the same class of functions. The symmetry condition on the coeffi-
cients of F is a convenience, and
were not assumed, it could be if it

obtained simply by averaging the nonsymmetric coefficients. The details


are left as an exercise (Exercise 12).

EXERCISES
1. By changing coordinates, write each of the following quadratic poly-
nomials as a sum of squares. In each problem exhibit an orthonormal basis
that does the job, and write the coordinate transformation.

(a) 3x 2
+ 2V2xy + Ay 2 .

Ans. 0(x)

(b) 3x 2 + 2V3xy + 5y
2
.

5
4/w. <2(x) = 2/r + 6r 2 ;x! =
(4 4H^)-]
(c) (x J)
:;)0
(d) 2x 2 - 5xy + 2/ - 2xz + 4z 2 - 2yz.

^HJ. Q(x)
2 2

(11 M
V3V2'3V2'3V2/' *3 ~
P
\2 '
2
3 '
_1\
V

2. (a) For each polynomial Q in Exercise 1 , find the maximum of Q when Q


is restricted to the unit sphere, |x| = 1 , of the Euclidean space on which
Q is defined. [Hint. See Theorem 2.2.]

(b) Find the maximum of the polynomial in 1(a), restricted to the circle
x2 +f = 3. [Ans. 15.]

3. Classify the following quadratic polynomials as positive definite, negative


definite, or neither. Give reasons. (Q is negative definite if Q < except for
G(0) = 0.)

(a) 2x 2 — Ixy + 5y 2 . [Ans. Neither.]


(b) 2x 2 - 3xy + 5y 2 . [Ans. Positive definite.]
(c) — x 2 + 2xy — 6y 2 . [Ans. Negative definite.]
(d) 3x 2 + xy + 3/ + 5z 2 . [Ans. Positive definite.]

[Ans. Neither.]

4. Prove that x 2 + y2 + z2 — xy — xz — yz is not positive definite, but


becomes so when restricted to the plane x + y + z = 0.

5. Sketch the level curves Q(x) = 1 and Q(x) = for each of the following
polynomials in 5t2 .

(a) xy. (b) x 2 + xy + y 2 . (c) x 2 + xy - 2y


2
.

6. Sketch the level surfaces Q{\) = 1 and Q(x) = for the following poly-
3
nomials in Jl .

(a) x2 — xy -f
2
y + z2 . (b) x2 + xy. (c) x 2 — 2xy + y 2 — z2 .

7. Show that every quadratic polynomial Q satisfies

Q(ax) = 2
a Q(x),
for every real number a.

8. Let Q be an arbitrary quadratic polynomial on Jl", and let Fbe its associated
symmetric bilinear function. Prove that

F(x, y) = h[Q(x r y) - Q(x) - Q(y)],

for all vectors x and y in 3l n . This equation proves that F is uniquely


determined by Q.

9. Prove that every quadratic polynomial Q on Jl" is a continuous function.

10. Let Q be an arbitrary quadratic polynomial on Jl", and let /"be its associated
symmetric bilinear function, that is, Q(x) = F(x, x). Prove that Q is a
differentiable vector function or, more explicitly, that

0'(x)y = 2F(x, y).

11. Prove that every quadratic polynomial Q is a continuously differentiable


function.

12. What follows illustrates the fact that the condition of symmetry on a bilinear
function can be obtained by averaging out the nonsymmetry. Let G be a
real-valued function defined for all pairs of vectors x and y and linear in each
variable (we do not assume symmetry). Show that the function F defined by

F(x, y) = l(G(x, y) + C(y, x))

is a symmetric bilinear function. Show that G(x, x) = F(x, x) and, hence,


that G and F define the same quadratic polynomial.

13. Let Q be a quadratic polynomial on R". Prove that there exists a basis
(x l7 . . . , x„) for Jl" such that, for any vector x = y1x 1 + . . . + y„x n ,

n
Q(x) = ^ hy\, witn ?-i =0,1, at -1.

14. Prove that if Q is a positive-definite quadratic polynomial on Jl", there

exists a positive real number m such that

Q(x) >m |x|


2
, for all x in .R".

[Suggestion. Diagonalize Q.\ A corollary is that the values of Q on the


unit sphere |x| = 1 are bounded away from zero.

15. Let Q be a positive-definite quadratic polynomial in Jl 2 and let ). x and / 2 ,

characteristic roots. Show that /~


1/2
be its and A~1/2 are the lengths of the
principal axes of the ellipse Q(x) = 1.

16. Verify that if a =£ and ab - /2 # 0, then

Conclude that the above polynomial is positive definite if and only if the
three determinants are positive:

a \f
f b

definite, then there is an orthonormal basis for 31" with respect to which Q Y

and Q 2 both have diagonal form.

21. Prove the converse of Theorem 2.4, namely, that if x l5 . . . , x„ is an ortho-


normal diagonalizing basis for Q(x) = x'Ax, then (A — h,.I)x k = 0, where
h = 0(x*).

SECTION 3

TAYLOR EXPANSIONS

We begin by reviewing the definition and the simplest properties of the
Taylor expansion for functions of one variable. If $f(x)$ has an $N$th
derivative at $x_0$, its Taylor expansion of degree $N$ about $x_0$ is the polynomial
$$f(x_0) + \frac{1}{1!} f'(x_0)(x - x_0) + \frac{1}{2!} f''(x_0)(x - x_0)^2 + \ldots + \frac{1}{N!} f^{(N)}(x_0)(x - x_0)^N.$$
The relation between $f$ and its Taylor expansion can be expressed
conveniently by the following integral remainder formula.

3.1 Theorem

If $f$ has a continuous $N$th derivative in a neighborhood of $x_0$, then
in that neighborhood
$$f(x) = f(x_0) + \frac{1}{1!} f'(x_0)(x - x_0) + \ldots + \frac{1}{N!} f^{(N)}(x_0)(x - x_0)^N + R_N, \tag{1}$$
where
$$R_N = \frac{1}{(N-1)!} \int_{x_0}^{x} (x - t)^{N-1} \bigl[ f^{(N)}(t) - f^{(N)}(x_0) \bigr] \, dt.$$

Proof. The remainder can be written as the difference
$$R_N = \frac{1}{(N-1)!} \int_{x_0}^{x} (x - t)^{N-1} f^{(N)}(t) \, dt
      - \frac{f^{(N)}(x_0)}{(N-1)!} \int_{x_0}^{x} (x - t)^{N-1} \, dt.$$
The second of these integrals is directly computed to be
$$\frac{1}{N!} f^{(N)}(x_0)(x - x_0)^N,$$
which is just the last term of the Taylor expansion. The first integral
can be integrated by parts to give
$$\frac{1}{(N-2)!} \int_{x_0}^{x} (x - t)^{N-2} \bigl[ f^{(N-1)}(t) - f^{(N-1)}(x_0) \bigr] \, dt = R_{N-1}.$$
We therefore obtain
$$R_N = -\frac{1}{N!} f^{(N)}(x_0)(x - x_0)^N + R_{N-1}.$$
If we substitute the preceding equation into (1), we get (1) back again
with $N$ replaced by $N - 1$. The induction is completed by noticing
that, for $N = 1$, Equation (1) is just
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \int_{x_0}^{x} \bigl[ f'(t) - f'(x_0) \bigr] \, dt,$$
and that this is a valid equation.

It follows from the remainder formula that a polynomial of degree $N$
is equal to its Taylor expansion of degree $N$. For if $f$ is of degree $N$, $f^{(N)}$ is
a constant function and so the remainder is identically zero. We list some
common examples. It is only in the first one that we have equality. The
expansions are all about $x_0 = 0$.
$$(1 + x)^N = \sum_{k=0}^{N} \binom{N}{k} x^k,$$
$$\frac{1}{1 - x}: \quad 1 + x + x^2 + \ldots + x^N,$$
$$e^x: \quad 1 + \frac{x}{1!} + \frac{x^2}{2!} + \ldots + \frac{x^N}{N!},$$
$$\log(1 - x): \quad -x - \frac{x^2}{2} - \frac{x^3}{3} - \ldots - \frac{x^N}{N}, \tag{2}$$
$$\cos x: \quad 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots + (-1)^k \frac{x^{2k}}{(2k)!},$$
$$\sin x: \quad x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots + (-1)^k \frac{x^{2k+1}}{(2k+1)!}.$$

Figure 11 shows the graphical relationship between the functions $e^x$
and $\cos x$, and their second-degree Taylor expansions.

[Figure 11]
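A quick numerical experiment makes the quality of these approximations concrete. The sketch below (plain Python, standard library only) evaluates the degree-$N$ Taylor polynomial of $e^x$ about 0 and compares it with math.exp; the sample points and degrees are arbitrary choices for illustration.

```python
import math

def taylor_exp(x, N):
    """Degree-N Taylor polynomial of e**x about 0: sum of x**k / k!."""
    return sum(x**k / math.factorial(k) for k in range(N + 1))

for N in (2, 4, 8):
    for x in (0.5, 1.0, 2.0):
        err = abs(math.exp(x) - taylor_exp(x, N))
        print(f"N={N}, x={x}: error = {err:.2e}")
```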
To see how to generalize Taylor expansions to functions of more than
one variable, suppose first we are given a function $f(x, y)$ having partial
derivatives of order $N$ at a point $(x_0, y_0)$. For the moment, consider a
function $F$ of one variable defined by
$$F(t) = f(tu + x_0, \, tv + y_0),$$
where $u$ and $v$ are held fixed. We attempt to find a Taylor expansion for $F$
about $t = 0$ and, therefore, compute the successive derivatives of $F$ at $0$.
We find, by the chain rule,
$$F'(t) = u \frac{\partial f}{\partial x}(tu + x_0, tv + y_0) + v \frac{\partial f}{\partial y}(tu + x_0, tv + y_0),$$
and hence that
$$F'(0) = u \frac{\partial f}{\partial x}(x_0, y_0) + v \frac{\partial f}{\partial y}(x_0, y_0).$$
Similarly, we compute $F''(t)$, set $t = 0$, and find
$$F''(0) = u^2 \frac{\partial^2 f}{\partial x^2}(x_0, y_0) + 2uv \frac{\partial^2 f}{\partial x \, \partial y}(x_0, y_0)
         + v^2 \frac{\partial^2 f}{\partial y^2}(x_0, y_0).$$
Further calculation shows that, in general,
$$F^{(N)}(0) = \sum_{k=0}^{N} \binom{N}{k} u^k v^{N-k} \frac{\partial^N f}{\partial x^k \, \partial y^{N-k}}(x_0, y_0),$$
where $\binom{N}{k}$ is the binomial coefficient $N!/k!(N - k)!$. Now replace $u$ by
$(x - x_0)$ and $v$ by $(y - y_0)$. The Taylor expansion
$$F(t) \approx F(0) + \frac{1}{1!} F'(0) t + \frac{1}{2!} F''(0) t^2 + \ldots$$
becomes, for $t = 1$,
$$f(x, y) \approx f(x_0, y_0)
  + \frac{1}{1!}\left( \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0) \right)$$
$$\quad + \frac{1}{2!}\left( \frac{\partial^2 f}{\partial x^2}(x_0, y_0)(x - x_0)^2
      + 2 \frac{\partial^2 f}{\partial x \, \partial y}(x_0, y_0)(x - x_0)(y - y_0)
      + \frac{\partial^2 f}{\partial y^2}(x_0, y_0)(y - y_0)^2 \right) + \ldots$$

Thus we arrive at the following definition. For a function $f(x, y)$ of two
variables having continuous $N$th-order partial derivatives in a neighborhood
of $(x_0, y_0)$, the Taylor expansion of degree $N$ about $(x_0, y_0)$ is the
polynomial
$$f(x_0, y_0) + \frac{1}{1!}\bigl( (x - x_0) f_x(x_0, y_0) + (y - y_0) f_y(x_0, y_0) \bigr)$$
$$\quad + \frac{1}{2!}\bigl( (x - x_0)^2 f_{xx}(x_0, y_0) + 2(x - x_0)(y - y_0) f_{xy}(x_0, y_0)
      + (y - y_0)^2 f_{yy}(x_0, y_0) \bigr) \tag{3}$$
$$\quad + \ldots + \frac{1}{N!} \sum_{k=0}^{N} \binom{N}{k} (x - x_0)^k (y - y_0)^{N-k}
      \frac{\partial^N f}{\partial x^k \, \partial y^{N-k}}(x_0, y_0).$$

Example 1. Let $f(x, y) = \sqrt{1 + x^2 + y^2}$. To expand about $(0, 0)$
through the second degree, we compute
$$f_x(0, 0) = f_y(0, 0) = 0,$$
$$f_{xx}(0, 0) = f_{yy}(0, 0) = 1 \qquad \text{and} \qquad f_{xy}(0, 0) = 0.$$
Then Formula (3) reduces to the second-degree polynomial $1 + \tfrac{1}{2}(x^2 + y^2)$.
The graphs of $f$ and its second-degree Taylor expansion are shown in Fig. 12.

[Figure 12]
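Expansions like the one in Example 1 can be checked with a computer algebra system. The following minimal sketch (assuming the sympy library) expands $\sqrt{1 + x^2 + y^2}$ about $(0, 0)$ through second degree by substituting $x \to tx$, $y \to ty$ and expanding in $t$, which is one convenient way to organize a multivariable Taylor expansion on a computer.

```python
import sympy as sp

x, y, t = sp.symbols('x y t')
f = sp.sqrt(1 + x**2 + y**2)

# Expand f(t*x, t*y) in powers of t through degree 2, then set t = 1.
g = f.subs({x: t*x, y: t*y})
taylor2 = g.series(t, 0, 3).removeO().subs(t, 1)
print(sp.expand(taylor2))        # x**2/2 + y**2/2 + 1, as in Example 1
```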

To simplify the writing of the terms of the Taylor expansion, we can
use the following notation. The differential operator
$$\left( x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y} \right),$$
applied to $f$ and evaluated at $\mathbf{x}_0 = (x_0, y_0)$, is by definition the first-degree
polynomial
$$d_{\mathbf{x}_0} f(x, y) = \left( x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y} \right) f
   = x \frac{\partial f}{\partial x}(\mathbf{x}_0) + y \frac{\partial f}{\partial y}(\mathbf{x}_0).$$
Differentials of order $k$, $k > 1$, can also be defined. They are homogeneous
polynomials of degree $k$. If $f$ has the required derivatives, the
definition is
$$d^k_{\mathbf{x}_0} f(x, y) = \left( x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y} \right)^k f
   = \sum_{j=0}^{k} \binom{k}{j} x^j y^{k-j} \frac{\partial^k f}{\partial x^j \, \partial y^{k-j}}(\mathbf{x}_0).$$
Here the operator
$$\left( x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y} \right)^k$$
has been multiplied out according to the binomial expansion. The
operator is applied to $f$, and the partial derivatives are then evaluated at
$\mathbf{x}_0 = (x_0, y_0)$. Notice that $x$ and $y$ are the only variables that appear in
the preceding equation, since
$$\binom{k}{j} \frac{\partial^k f}{\partial x^j \, \partial y^{k-j}}(\mathbf{x}_0)$$
is a constant for a fixed $\mathbf{x}_0$. The $N$th-degree Taylor expansion of $f$ at $\mathbf{x}_0$ can
now be written
$$f(x_0, y_0) + \frac{1}{1!} d_{\mathbf{x}_0} f(x - x_0, y - y_0) + \ldots + \frac{1}{N!} d^N_{\mathbf{x}_0} f(x - x_0, y - y_0).$$

Example 2. If $f(x, y)$ has the required derivatives, then the differential
of order 3 at $(1, 1)$ evaluated at $(x - 1, y - 1)$ is the polynomial
$$d^3_{(1,1)} f(x - 1, y - 1) = \left( (x - 1) \frac{\partial}{\partial x} + (y - 1) \frac{\partial}{\partial y} \right)^3 f$$
$$= (x - 1)^3 \frac{\partial^3 f}{\partial x^3}(1, 1) + 3(x - 1)^2 (y - 1) \frac{\partial^3 f}{\partial x^2 \, \partial y}(1, 1)$$
$$\quad + 3(x - 1)(y - 1)^2 \frac{\partial^3 f}{\partial x \, \partial y^2}(1, 1) + (y - 1)^3 \frac{\partial^3 f}{\partial y^3}(1, 1).$$
Example 3. When the polynomial $(x_1 + x_2 + \ldots + x_n)^N$ is multiplied
out, each term will consist of a constant times a factor of the form
$x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}$, where the nonnegative integers $k_i$ satisfy
$k_1 + \ldots + k_n = N$. The multinomial expansion has the form
$$(x_1 + \ldots + x_n)^N = \sum_{k_1 + \ldots + k_n = N} \binom{N}{k_1 \, \ldots \, k_n} x_1^{k_1} \cdots x_n^{k_n}.$$
The multinomial coefficients can be computed to be
$$\binom{N}{k_1 \, \ldots \, k_n} = \frac{N!}{k_1! \cdots k_n!}.$$
This computation will be done later using Taylor's theorem (Theorem 3.2).
The coefficients can also be computed by counting. See Kemeny, Snell,
Thompson, Finite Mathematics, Prentice-Hall, Inc., 1966, p. 109.
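For completeness, here is a small computational check of the multinomial coefficient formula (plain Python, standard library only); the helper name is illustrative.

```python
import math

def multinomial(ks):
    """N! / (k1! * k2! * ... * kn!) with N = k1 + ... + kn."""
    N = sum(ks)
    coeff = math.factorial(N)
    for k in ks:
        coeff //= math.factorial(k)
    return coeff

# In (x1 + x2 + x3)^3 the coefficient of x1*x2*x3 is 3!/(1!1!1!) = 6,
# and the coefficient of x1^2 * x3 is 3!/(2!0!1!) = 3.
print(multinomial((1, 1, 1)))   # 6
print(multinomial((2, 0, 1)))   # 3
```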

For a function of $n$ variables, the $k$th-order differential at $\mathbf{x}_0 =
(a_1, \ldots, a_n)$ is defined to be the following polynomial in $\mathbf{x} = (x_1, \ldots, x_n)$:
$$d^k_{\mathbf{x}_0} f(\mathbf{x}) = \left( x_1 \frac{\partial}{\partial x_1} + \ldots + x_n \frac{\partial}{\partial x_n} \right)^k f
   = \sum_{k_1 + \ldots + k_n = k} \binom{k}{k_1 \, \ldots \, k_n} x_1^{k_1} \cdots x_n^{k_n}
     \frac{\partial^k f}{\partial x_1^{k_1} \cdots \partial x_n^{k_n}}(a_1, \ldots, a_n).$$
In terms of differentials the Taylor expansion about $\mathbf{x}_0$ is defined to be
$$f(\mathbf{x}_0) + \frac{1}{1!} d_{\mathbf{x}_0} f(\mathbf{x} - \mathbf{x}_0) + \frac{1}{2!} d^2_{\mathbf{x}_0} f(\mathbf{x} - \mathbf{x}_0) + \cdots
  + \frac{1}{N!} d^N_{\mathbf{x}_0} f(\mathbf{x} - \mathbf{x}_0).$$
The function $d_{\mathbf{x}_0} f$ is exactly the same as the differential defined in Section 8
of Chapter 3. For completeness we can also define the 0th differential by
$$d^0_{\mathbf{x}_0} f(\mathbf{x}) = f(\mathbf{x}_0).$$

Example 4. The second-degree Taylor expansion of $e^{x_1 + \ldots + x_n}$ about
$\mathbf{x}_0 = 0$ is
$$1 + \frac{1}{1!}\left( x_1 \frac{\partial}{\partial x_1} + \ldots + x_n \frac{\partial}{\partial x_n} \right) f \bigg|_0
   + \frac{1}{2!}\left( x_1 \frac{\partial}{\partial x_1} + \ldots + x_n \frac{\partial}{\partial x_n} \right)^2 f \bigg|_0$$
$$= 1 + \frac{1}{1!}(x_1 + \ldots + x_n) + \frac{1}{2!}(x_1^2 + 2x_1 x_2 + \ldots + x_n^2).$$

According to the preceding paragraphs, the Taylor expansion of a


function /is defined in such a way that the coefficients of the polynomial
can be computed in a routine manner from the derivatives of/ The Taylor
expansion is important because it provides a polynomial approximation to
/near x that exhibits in a simple way many of the characteristics of/ near

x . Furthermore, as higher-degree terms are included in the expansion,


the approximation gets better. Consider first the one-variable case. The
expansion

/Oo) + - /' Oo)0 - *o) = /C*o) + d x J{x - x„)

is the affine approximation of/ provided by dx f. In other words, as x


approaches x ,

/(x)-/(x )-/'(x )(x-x )

tends to zero faster than x — x .

Having found a first-degree approximation, we now ask for one of the


second-degree. Indeed, the next theorem shows that the desired approxi-
mation is the second-degree Taylor expansion and that as x approaches x ,

fix) -/(*o) - ^/'(*o)(* - x ) - ^/"(.v )(x - x )


2

tends to zero faster than (x — x )


2
. For a function of two variables, the
first-degree Taylor expansion can be written

/Oo, >'o)
- — (— (x , y )(x - x ) + — (x , y )(y - y )j
dy

X Xn
f(x ,y ) + (d f)\
_
Ky ) \y yQ,

and so is just the affine approximation to/provided by the differential. We


shall see that to find a second-degree approximation of a similar kind we
need to take the second-degree Taylor expansion. The complete statement
follows, and the proof is given in the Appendix.

3.2 Taylor's Theorem

Let $f \colon \mathbb{R}^n \to \mathbb{R}$ have all derivatives of order $N$ continuous in a
neighborhood of $\mathbf{x}_0$. Let $T_N(\mathbf{x} - \mathbf{x}_0)$ be the $N$th-degree Taylor
expansion of $f$ about $\mathbf{x}_0$. That is,
$$T_N(\mathbf{x} - \mathbf{x}_0) = f(\mathbf{x}_0) + d_{\mathbf{x}_0} f(\mathbf{x} - \mathbf{x}_0) + \ldots + \frac{1}{N!} d^N_{\mathbf{x}_0} f(\mathbf{x} - \mathbf{x}_0).$$
Then
$$\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{f(\mathbf{x}) - T_N(\mathbf{x} - \mathbf{x}_0)}{|\mathbf{x} - \mathbf{x}_0|^N} = 0,$$
and $T_N$ is the only $N$th-degree polynomial having this property.

The uniqueness statement at the end of Taylor's theorem shows that a
polynomial of degree $N$ is equal to its $N$th-degree Taylor expansion about
an arbitrary point $\mathbf{x}_0$.

Example 5. The polynomial x zy + x 3 + y 3 can be written as a poly-


nomial in (x — 1) and (y + 1) by computing its Taylor expansion about

(1, —1). The result is

x y
2
+ x
3
+ / = -1 + ^((x - 1) + My + 1))

+ i (4(x - l)
2
+ 4(x - l)(y -f- 1) - 6(y + l)
2
)

+ ± (6(x - l)
3
+ 6(x - lf(y + 1) + 6(y + l)
3
)

= -l + (x - i) + My + i) + 2(x - i) 2

+ 2(x - l)(y + 1) - 3(>- + l) + (x - 2


l)
3

+ (x-l) + l) + (y+ l) 2
(>'
3
.

Example 6. The infinite series expansion e' = 1 + + f (1/2!)/


2
+ . . .

is valid for all /. Letting / — x + j we get

e
x+ v = +1 j
( x +^+_ ( x2 _|_ 2x> , _j_
ys)
+ _ > _
f
for a n x and y

It follows that

l+^(x + >') + ^(x + 2


2x>' + >-
2
)

is the 2nd-degree Taylor expansion of e x+y . The remainder, of the form


(x +
j) /3
3
! + ..., tends to zero when
divided by (v x 2 2 2
J ) as (x, v)
it is + ,

tends to (0, 0). According to Taylor's theorem, there is only one poly-
nomial of degree two having this property.

Example 7. Let/(x, y) = e
xu
sin (x + y)- Since

e" =1+ xy + - xV + Rx

and

sin (x + y) = (x + y) - j| (x + >0
3
+ K2 ,

we can multiply the expansions together, putting into the remainder all

terms of degree greater than three. The result is

f(x, y) = e
xv
sin (x + y) = (x + y) + x 2y + xy 2 - ^ (x + y)
3
+ R,

where R/\(x, y)\ 3 tends to zero as (x, y) tends to (0, 0). In other words, we
have found the third-degree Taylor expansion of e xy sin (x + y) about (0, 0).
In standard form the expansion looks like

fix, y) = e*» sin (x + y)

=
jj
(* + y) + jj (--x + 3
3x y
2
+ 3x/ - /) + R.

We can conclude that


3 3
r) f r) f
|^(0,0)
3
= |^ (0,0)=-1,
3
By dx
3
df
2
.(0,0) = —— d Jf
2
3

(0,0) = 1.
dx dy dx dy

Example 8. The functions ex and cos x have second-degree expansions


about x =

e* =1+ +—+ jc R(x), cos x = 1 -—+ i?'(x).

Then
2
e
cosx
=1+ (l - j+ R'(*)) + ^(l - + ^'Wj
|
+ k(i-| + *'(*))
Since /?(l — x 2 /2 + R'(xf) does not even tend to zero as x tends to zero,
we must proceed differently to find a Taylor expansion of eeoiX . We have

+ ^(-| + i?'(x))
= *(l-f) +*(*),

where R"(x)jx 2 tends to zero as x tends to zero. The coefficients can also be
found by direct computation of the derivatives of e cosx .

Example 9. Since 1/(1 + x2) = 1 -x + 2


R(x),

n
k=l 1 +
1

Xk
no
k=X
-**'+*(**»

= \-{x\ + x\ + ...+ xl) + R'(xx ..., x n ),


,

2
where R'/Kxx, . . . , x n )\ tends to zero as (x u . . . , x n ) tends to zero.

= N about
Example 10. The Taylor expansion of/(x) {x± + . . . + xn )

xn = is

(xx + . .
+ xj* = h(*7T + • • • + ^rT'
.

N!\ ox l
ox Jo

N dNf

N
V I
»i+...+*»=iv\fe 1 . . . kh

Only the Mh-order differential is different from zero at x = 0. To compute


the multinomial coefficient, differentiate both sides by d j{dx\ 1
N . . .
k
dx nn ).
Then, setting x = 0, we get

from which the formula of Example 3 follows.

EXERCISES
3
1. Find the third-degree Taylor expansion of (h + v)

(a) about the point (u , v ) = (0, 0).


(b) about the point (w , v ) = (1, 2).

2. Find the best second-degree approximation to the function f(x, y) = xe y


near the point (x ,
y ) = (2, 0).
[/1/w. 2 + (* - 2) + 2y + y2 + (x - 2)y.]

3. Find the best second-degree approximation to the function f(x, y) = x v+1


near the point (x ,
y Q)
= (2, 0).

4. Find the quadratic terms of the Taylor expansion of Xe x+V about (0, 0)

(a) by computing derivatives.


(b) by substitution. -

{x+v)
5. Find the quadratic terms of the Taylor expansion of esin about (0, 0).
.


xi+yi
6. If/(;t, y) = (x 2 +y 2,
)e , use a Taylor expansion of/ to compute
ay

M«5. 0.]

7. Compute the second-degree Taylor expansion of VI + x2 + y2 about the


point (x ,_y ) = (-1,1).

8. Compute the second-degree Taylor expansion of

(a) f{x,y, z) = (x 2 + 2xy + y 2 )e\ about (x ,


y , z ) = (1,2, 0).

(b) g(x,y, z) = xy2z z about (x j , , , z ) = (1,2, -1).

9. Write the polynomial xy 2 z 3 as a polynomial in (x — 1), j, and z + 1.

10. Compute the second-degree Taylor expansion of log 2 x at x = 1. Sketch


the graph of the expansion near * = 1

11. Compute the second-degree Taylor expansion of log cos (x + y)at(x ,y )


=
(0,0)

(a) by computing derivatives


(b) by substitution.

12. Compute the second-degree Taylor expansion of exp (— x\ — x\ — ... — x 2^


about (xlt x 2 , . . . , x„) = (0, ... 0).

13. Prove that the Taylor expansion of degree of a polynomial of degree K, N


K> N, consists of the terms of the polynomial that are of degree less than
or equal to TV.

14. Compute the differentials d£ f(y) for arbitrary y.

(a) k = 2, Xo = (1, 2),f(x,y) = x3y + 3x 2 - 2xy*.


[Ans. 18x 2 + 54xy + 24/.]
(b) k = 1, x = (a, b, c),f(x, y, z) = l/(x +j + z + 1).

(c) k = 2, x = (0, 0, 0),/(x,_y, z) = 1/(jc + v -r z + 1).

[Ans. 2(x +y + z) 2 .]
(d) A: = 4, x = (0, 0, 0),/(x, >', z) = x3 - 3xv 2 ;
4x/ + 6x 2y3 + ly 5 .

[Ans. 96xy 3 .]

15. Find the second differential of sin (x x + x2 + . . . + xn ) at

(jc lf -v 2 , . . . , xn ) = (0, 0, . . . 0). [Ans. 0.]

SECTION 4

TAYLOR EXPANSIONS AND EXTREME VALUES

The tangent $\mathcal{T}$ to the graph of a function $f \colon \mathbb{R}^n \to \mathbb{R}$ at a point $(\mathbf{x}_0, f(\mathbf{x}_0))$
is found by computing the first-degree Taylor expansion of $f$ about $\mathbf{x}_0$. We
now consider the question of whether or not the graph of $f$ crosses $\mathcal{T}$ at
$(\mathbf{x}_0, f(\mathbf{x}_0))$. The possibilities for a function of one variable are shown in
Fig. 13.

Figure 13

Consider first a very much simplified situation in which/is equal to its

second-degree Taylor expansion. That is, assume that

fix) =/(xo) + d x J{x - x ) + \ dlj(x - x„).

The two terms of the expansion constitute the best


first affine approxima-
tion to and the graph of/(x ) + dx J{x — x
/near x , ) is the tangent 75
to the graph of/ at p = (x„,/(x )). It is clear that if dXo f(x — x ) is

positive for all x in 31" except x = x then ,

f(x)>f(x ) + dx J(x- x ), for x^x ,

which implies that the graph of/ lies above the tangent 75. Similarly, if
dx f(x — x ) is negative except for x = x the graph of/ lies below 75. ,

On the other hand, if dx f(x — x ) changes sign at x then the graph of ,

/will cross 75 at p .

To say that the quadratic polynomial Q(x — x ) = dx J{x — x )


changes sign at x means that there are points x x and x 2 arbitrarily close to
x such that
Q(x, - x ) > and £(x 2 - x ) < 0. (1)

The phrase omitted, because every


"arbitrarily close" can really be omittei homo-
geneous quadratic polynomial Q has the property

Q{tx) = t*Q(x). (2)

It follows that if (1) holds for two vectors x x and x 2 not necessarily close to
,

x , then (1) also holds for t(x 1 —


x ) + x and /(x 2 — x ) + x for any ,

/ ^ 0. By choosing t small enough, we can bring the latter vectors as close


to x as we like.

Example 1. The function

f{x, y) = 2x 2 — xy — 3j
2 — 3x + ly

equals its second-degree Taylor expansion. It has one critical point at


x = (1, 1), and the tangent plane 75 at (1, 1, 2) is therefore horizontal.


The second differential is given by

dljy = 4(x - l)
2
- 2(x - 1)0' - 1) - 6(y - l)
2
.

_ J

Trying x x
— (2, 1) and x 2 = (1, 2), we obtain

</(x 1 -x = 4 ) o.

</(*» - x ) = -6 < 0.

We conclude that the graph of/ crosses the tangent plane 73 at (1, 1,2)
and, consequently, that/has neither a local maximum nor minimum at x .

The assumption that/equals its second-degree Taylor expansion is too


strong to be of much practical value. However, the next theorem shows
that under more general hypothesis, the sign of the second differential still

determines whether or not/crosses its tangent. The quadratic polynomial


d\ f(x — x ) is always zero at x = x . If it is positive except at that one
point, it is said to be positive definite, and if it is negative except at that one
point then it is said to be negative definite.

4.1 Theorem

Let :H" > 'M have all its second partial derivatives continuous

in a neighborhood of x„, and denote the tangent to the graph of/ at


Po = (x ,/(x )) by 73.

(a) If d* f(x — x ) is positive definite, then/lies above 73 in some


neighborhood of x .

(b) If d* /(x — x ) is negative definite, then/lies below 73 in some


neighborhood of x .

(c) If dlf(x — x ) assumes both positive and negative values,


then /crosses the tangent 73 at p„.

Notice that not all possible cases are covered by parts (a), (b), and (c).

It may happen, for example, that the second differential is zero somewhere
other than at x , but that it still does not change sign.

Proof {o^ 4.1). By Taylor's theorem we have

f(x) - /(x - ) d* f(x - xo) = UttJ(x - x„) + R, (3)



where
R
0.
x — *o |x — X
Under assumption (a), we must show that

\dlj(x - x ) +R>
in some neighborhood of x , excluding x itself. The homogeneity of
the quadratic polynomial d* /(see Equation (2)) implies that

dx f(x — x )

Ix — x„|
2
\|x - x |/

Since d% /is positive definite, its values for unit vectors are bounded
away from zero by a constant in > 0. (See Exercise 14, Section 2).
Now choose d > so that, for < |x — x < 6, |

\R\ m
<
~ "

|x - x |
2
4
It follows that

\dlj(x - x ) +R >^ |x - x |
2
- j |x - x |
2
>
which, according to Equation what we wanted to show.
(3), is
The proof of part (b) is same as the proof just
practically the
given. To prove part (c), suppose that d^o f(x 1 — x ) > and
^/(x, - x ) < 0. Set

*j(0 = t(*i — x ) + x 0' ' = 1> 2, — co < < / oo.

Using the homogeneity property of the polynomial d$ f, and also of


thenorm, we obtain, for any / 0, ^
/(x«(0) -/Oo) - d*of(*i(t) - x )

= R
V \dlj(x t
—x + ) |x t -

(4)
|
Xi (0-x |
2
J
Since

R
o,
f-o \x (t) t
- x
itfollows that, for any nonzero / sufficiently small, the left side of
Equation (4) is positive if = and negative if i = 2. In other
/'
1

words, the graph off lies both above and below the tangent "G for
some values of x arbitrarily close to x This completes the proof. .

Example 2. The function

f{x,y) = (x 2 + />* -*
2 2

lias its critical points at (0,0). (0. 1). and (0. —1). This implies that the
tangents at these points are horizontal planes. The second-degree Taylor
expansions at the three points are. respectively,

/(0.0) -

/(0

Example 4. If/Xx) = x sin 3 x, the first four terms of the Taylor expan-
sion at x — are identically zero, while

4fx =
4
24x .

The criteria of Theorem 4.1 do not cover this example. However, a


similar proof would show that /behaves like its Taylor expansion in the
matter of crossing its tangent. The conclusion is that x sin 3 x has a local
minimum at x = 0.

For distinguishing among the critical points of a function those that
are maximum points or minimum points, a detailed examination of the
polynomial $d^2_{\mathbf{x}_0} f(\mathbf{x} - \mathbf{x}_0)$ is not necessary. It is enough to know that this
quadratic approximation is positive or negative definite. In addition to
the criteria of Section 2, we list here for reference a simple test for quadratic
polynomials in two or three variables. (See Section 2, Example 4 and
Exercise 16.)

The polynomial
$$ax^2 + 2bxy + cy^2 = (x \;\; y) \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
is positive definite if and only if
$$a > 0 \qquad \text{and} \qquad \begin{vmatrix} a & b \\ b & c \end{vmatrix} > 0,$$
and is negative definite if and only if
$$a < 0 \qquad \text{and} \qquad \begin{vmatrix} a & b \\ b & c \end{vmatrix} > 0.$$
The character of the polynomial
$$ax^2 + by^2 + cz^2 + 2dyz + 2exz + 2fxy$$
depends on the sign of the three determinants
$$a, \qquad \begin{vmatrix} a & f \\ f & b \end{vmatrix}, \qquad
  \begin{vmatrix} a & f & e \\ f & b & d \\ e & d & c \end{vmatrix}.$$
Example 5. The function $g(x, y, z) = x^2 + y^2 + z^2 + xy$ has critical
points only when
$$g_x = 2x + y = 0,$$
$$g_y = 2y + x = 0,$$
$$g_z = 2z = 0;$$
so the only critical point occurs at $(0, 0, 0)$. Since $d^2_0 g = 2g$, we can test $g$
itself for positive definiteness. We have
$$a = 1, \quad b = 1, \quad c = 1, \quad d = 0, \quad e = 0, \quad f = \tfrac{1}{2},$$
and the three determinants above are
$$1, \qquad \begin{vmatrix} 1 & \tfrac{1}{2} \\ \tfrac{1}{2} & 1 \end{vmatrix} = \tfrac{3}{4}, \qquad
  \begin{vmatrix} 1 & \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = \tfrac{3}{4},$$
all positive. Thus $g$ is positive definite, and, as a result, $g$ has its minimum value at
$(0, 0, 0)$.
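In practice the same test is often run numerically: the matrix of the second differential (the Hessian) is formed at a critical point and the signs of its eigenvalues are inspected. The sketch below (assuming NumPy) classifies the critical point of Example 5 this way; the function name is only illustrative.

```python
import numpy as np

# Hessian of g(x, y, z) = x^2 + y^2 + z^2 + xy at the critical point (0, 0, 0)
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])

def classify_critical_point(H):
    """Classify via the signs of the Hessian's eigenvalues."""
    w = np.linalg.eigvalsh(H)
    if np.all(w > 0):
        return 'local minimum'
    if np.all(w < 0):
        return 'local maximum'
    if np.any(w > 0) and np.any(w < 0):
        return 'saddle point'
    return 'test inconclusive (some eigenvalue is zero)'

print(classify_critical_point(H))   # local minimum, as found in Example 5
```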

give any information, consider the next highest term of the Taylor expansion
that does give information.

(a)
,


for all $x$ and all integers $k$. For this reason it is possible to restrict attention
to those $x$'s lying in some fixed interval of length $2\pi$, say, $-\pi \le x < \pi$.
We shall compute some examples of Fourier approximations to get
some idea of how they work.

Example 1. Let $f(x) = |x|$ for $-\pi \le x < \pi$. Then
$$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} |x| \cos kx \, dx, \qquad
  b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} |x| \sin kx \, dx.$$

Clearly, $|x| \sin kx$ has integral zero over $[-\pi, \pi]$, because the integrals
over $[-\pi, 0]$ and $[0, \pi]$ are negatives of one another. Hence $b_k = 0$ for
$k = 1, 2, \ldots$. On the other hand, the graph of $|x| \cos kx$ is symmetric
about the $y$-axis. For $k \ne 0$ we integrate by parts, getting
$$a_k = \frac{2}{\pi} \int_0^{\pi} x \cos kx \, dx
     = \frac{2}{\pi} \left[ \frac{x \sin kx}{k} \right]_0^{\pi} - \frac{2}{\pi k} \int_0^{\pi} \sin kx \, dx$$
$$\quad = \frac{2}{\pi k^2} \bigl[ \cos kx \bigr]_0^{\pi} = \frac{2}{\pi k^2} (\cos k\pi - 1)
     = \begin{cases} 0, & k = 2, 4, 6, \ldots \\ -\dfrac{4}{\pi k^2}, & k = 1, 3, 5, \ldots \end{cases}$$
When $k = 0$, we have
$$a_0 = \frac{2}{\pi} \int_0^{\pi} x \, dx = \pi.$$
To summarize,
$$a_k = \begin{cases} \pi, & k = 0 \\ 0, & k = 2, 4, 6, \ldots \\ -\dfrac{4}{\pi k^2}, & k = 1, 3, 5, \ldots \end{cases}
  \qquad b_k = 0, \quad k = 1, 2, 3, \ldots.$$
Hence, the $N$th Fourier approximation is given for $N = 1, 3, 5, \ldots$ by
the trigonometric polynomial
$$s_N(x) = \frac{\pi}{2} - \frac{4}{\pi} \cos x - \frac{4}{\pi} \frac{\cos 3x}{3^2} - \ldots - \frac{4}{\pi} \frac{\cos Nx}{N^2}.$$
If $N$ is even, we have $s_N(x) = s_{N-1}(x)$. Figure 15 shows how the graphs of
$s_1$, $s_3$, and $s_5$ approximate that of $|x|$ on $[-\pi, \pi]$.


Figure 15
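The coefficient formulas above are easy to confirm numerically. The sketch below (assuming NumPy; the integrals are approximated by a midpoint sum, a choice made purely for simplicity) computes a few Fourier coefficients of $f(x) = |x|$ and compares them with the closed forms just derived.

```python
import numpy as np

n = 200000
x = np.linspace(-np.pi, np.pi, n, endpoint=False) + np.pi / n   # midpoints
dx = 2 * np.pi / n
f = np.abs(x)

def a(k):
    return np.sum(f * np.cos(k * x)) * dx / np.pi

def b(k):
    return np.sum(f * np.sin(k * x)) * dx / np.pi

print(a(0), 'vs', np.pi)                    # a0 = pi
print(a(1), 'vs', -4 / np.pi)               # a1 = -4/pi
print(a(2), 'vs', 0.0)                      # even k give 0
print(a(3), 'vs', -4 / (np.pi * 3**2))      # a3 = -4/(9 pi)
print(b(1), 'vs', 0.0)                      # all bk vanish
```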

Example 2. Let
$$g(x) = \begin{cases} 1, & 0 \le x < \pi \\ -1, & -\pi \le x < 0. \end{cases}$$
Then the Fourier coefficients of $g$ are given by integration as
$$a_k = 0, \quad k = 0, 1, 2, \ldots, \qquad
  b_k = \begin{cases} 0, & k = 2, 4, 6, \ldots \\ \dfrac{4}{\pi k}, & k = 1, 3, 5, \ldots \end{cases}$$
Hence, for $N$ odd, the $N$th Fourier approximation to $g$ is given by
$$s_N(x) = \frac{4}{\pi} \sin x + \frac{4}{\pi} \frac{\sin 3x}{3} + \ldots + \frac{4}{\pi} \frac{\sin Nx}{N}.$$
The graphs of $s_1$, $s_3$, and $s_5$ are shown in Fig. 16, together with that of $g(x)$.

An important question is whether, for specific values of $x$, a Fourier
approximation $s_N(x)$ converges as $N \to \infty$ to $f(x)$, where $f$ is the function
from which the Fourier coefficients are computed. We define the Fourier
series of $f$ to be the infinite series
$$\frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx), \tag{2}$$
where $a_k$ and $b_k$ are given by Formula 5.1. Theorem 5.2 below gives some
conditions on $f$ under which the Fourier series can be used to represent $f$.

[Figure 16]

Indeed, suppose that the graph of $f$ is piecewise smooth. This means that
the interval $[-\pi, \pi]$ can be broken into finitely many subintervals, with
endpoints $-\pi < x_1 < x_2 < \ldots < x_k < \pi$, such that $f$ can be extended
continuously from each open interval $(x_k, x_{k+1})$ to the closed interval
$[x_k, x_{k+1}]$, so that $f'$ is continuous on $[x_k, x_{k+1}]$. Then we can show that
the Fourier series of $f$ converges to $f(x)$ wherever $f$ is continuous, and, at a
possible discontinuity at $x_k$, will converge to the average value
$$\tfrac{1}{2}\bigl[ f(x_k-) + f(x_k+) \bigr].$$
Here $f(x-)$ stands for the left-hand limit of $f$ at $x$, and $f(x+)$ represents
the right-hand limit. The graph of a typical piecewise smooth function is
shown in Fig. 17, with the average value indicated by a dot at each jump.

[Figure 17]

5.2 Theorem

Let $f$ be piecewise smooth on $(-\pi, \pi)$. Then the Fourier series of $f$
converges at every point $x$ of the interval to $\tfrac{1}{2}[f(x-) + f(x+)]$.
In particular, if $f$ is continuous at $x$, then the series converges to $f(x)$.
At $x = \pm\pi$ the series converges to $\tfrac{1}{2}[f(\pi-) + f(-\pi+)]$.
The proof is given in the Appendix.

Examples 1 and 2 gave an indication of the way in which the partial


sums of a Fourier series converge. In each of those two examples, the
function satisfies the condition of piecewise smoothness; hence the series
converges to the appropriate value of the function for —tt < x < tt.

Example 3. The function $g$ defined in Example 2 is rather arbitrarily
defined to have the value 1 at $x = 0$. In spite of this arbitrariness, Theorem
5.2 allows us to conclude that the Fourier series of $g$ converges as follows:
$$\sum_{k=0}^{\infty} \frac{4}{\pi} \frac{\sin(2k + 1)x}{2k + 1} =
  \begin{cases} 1, & 0 < x < \pi \\ 0, & x = 0, \pm\pi \\ -1, & -\pi < x < 0. \end{cases}$$
To be very specific, we can set $x = \pi/2$ and arrive at the alternating series
expansion
$$\sum_{k=0}^{\infty} \frac{(-1)^k}{2k + 1} = \frac{\pi}{4}.$$
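The alternating series just obtained converges quite slowly, which is easy to see numerically. Here is a minimal sketch (plain Python) that sums its first terms and compares them with $\pi/4$.

```python
import math

def partial_sum(n):
    """Sum of the first n terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1)**k / (2*k + 1) for k in range(n))

for n in (10, 100, 1000):
    print(n, partial_sum(n), 'error =', abs(math.pi/4 - partial_sum(n)))
```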

Theorem 5.2 shows to some extent the reason for choosing the coefficients
in a trigonometric polynomial according to Formula 5.1. The
reason is that, under favorable circumstances, the resulting sequence of
trigonometric polynomials converges, as the sequence of partial sums of a
Fourier series, to the function $f$. We shall now explain another reason.
Consider the vector space $\mathcal{T}_N$ of trigonometric polynomials of degree $N$,
restricted to the interval $-\pi \le x \le \pi$. Clearly, the $2N + 1$ functions
$$\frac{1}{\sqrt{2}}, \ \cos x, \ \sin x, \ \ldots, \ \cos Nx, \ \sin Nx \tag{3}$$
can be formed into linear combinations so as to span the vector space $\mathcal{T}_N$.
Thus $\mathcal{T}_N$ has dimension at most $2N + 1$. If we now introduce the inner
product
$$\langle f, g \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) g(x) \, dx \tag{4}$$
in $\mathcal{T}_N$, we can show that the set of functions (3) is an orthonormal set and,
therefore, is linearly independent according to Theorem 8.1 of Chapter 2.
In fact we have the next theorem.

5.3 Theorem

The set of functions (3) satisfies the orthogonality relations
$$\langle \cos mx, \cos nx \rangle = \begin{cases} 1, & m = n \ne 0, \\ 0, & m \ne n, \end{cases}$$
$$\langle \sin mx, \sin nx \rangle = \begin{cases} 1, & m = n \ne 0, \\ 0, & m \ne n, \end{cases}$$
$$\langle \cos mx, \sin nx \rangle = 0, \quad \text{all } m, n,$$
$$\left\langle \frac{1}{\sqrt{2}}, \cos mx \right\rangle = \left\langle \frac{1}{\sqrt{2}}, \sin mx \right\rangle = 0,
  \quad \text{all } m \ge 1, \qquad
  \left\langle \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right\rangle = 1.$$

The proof of these relations is a routine calculation of some integrals
and is left as an exercise. (For example, to say that
$$\langle \cos mx, \sin nx \rangle = 0$$
is to say that
$$\frac{1}{\pi} \int_{-\pi}^{\pi} \cos mx \sin nx \, dx = 0.)$$
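These integrals can also be checked numerically. The sketch below (assuming SciPy for quadrature, purely as a convenience) verifies a few of the orthogonality relations for the inner product (4).

```python
import numpy as np
from scipy.integrate import quad

def inner(f, g):
    """Inner product <f, g> = (1/pi) * integral of f*g over [-pi, pi]."""
    val, _ = quad(lambda x: f(x) * g(x), -np.pi, np.pi)
    return val / np.pi

print(inner(lambda x: np.cos(2*x), np.cos))        # 0: different frequencies
print(inner(np.cos, np.sin))                       # 0: cosine against sine
print(inner(np.sin, np.sin))                       # 1: <sin x, sin x>
print(inner(lambda x: 1/np.sqrt(2),
            lambda x: 1/np.sqrt(2)))               # 1: the constant function
```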

From Theorem 5.3 it follows that the trigonometric functions (3) form an
orthonormal set in $\mathcal{T}_N$ and, hence, by Theorem 8.1 of Chapter 2, an
orthonormal basis. It follows further, from Theorem 8.1 of Chapter 2,
that the coefficients in a basis expansion
$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{N} (a_k \cos kx + b_k \sin kx)$$
can be computed by the formulas
$$\frac{a_0}{\sqrt{2}} = \left\langle \frac{1}{\sqrt{2}}, f(x) \right\rangle,$$
$$a_k = \langle \cos kx, f(x) \rangle, \quad k > 0,$$
$$b_k = \langle \sin kx, f(x) \rangle, \quad k > 0.$$
Because of the definition of the inner product, these formulas are the same
as the formulas in 5.1 except that in 5.1 the formula for $a_0$ has the factor
$1/\sqrt{2}$ taken out and included with the trigonometric polynomial. Thus
the Fourier coefficient formulas are the correct formulas for computing
coefficients relative to the orthonormal basis (3) and inner product (4).
This fact explains why it is not necessary to recompute the previously
found coefficients if the space $\mathcal{T}_N$ is extended to $\mathcal{T}_{N+1}$ by including
$\cos(N + 1)x$ and $\sin(N + 1)x$.
We conclude this section by showing how the use of complex
exponential functions can simplify some Fourier series formulas. Recall
from Section 7 of Chapter 2 that, by definition,
$$e^{\pm ikx} = \cos kx \pm i \sin kx.$$
The identities
$$\cos kx = \frac{e^{ikx} + e^{-ikx}}{2}, \qquad \sin kx = \frac{e^{ikx} - e^{-ikx}}{2i} \tag{5}$$
follow immediately by substitution. As a result, the trigonometric polynomial
(1) can be written in the form
$$s_N(x) = \sum_{k=-N}^{N} c_k e^{ikx}, \tag{6}$$
where
$$c_k = \begin{cases} \tfrac{1}{2}(a_k - i b_k), & k > 0, \\ \tfrac{1}{2} a_0, & k = 0, \\ \tfrac{1}{2}(a_{-k} + i b_{-k}), & k < 0. \end{cases} \tag{7}$$
It follows, again by direct substitution, that for all integers $k$,
$$c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} s_N(x) e^{-ikx} \, dx.$$

2. (a) With respect to the inner product (f,g) of two continuous functions
defined by

</.#> =" I
fMg(x)dx,

prove the orthogonality relations

[1, m = n =£

(cos nx, cos /«x) = I

|p, m # n

(\, m =h^0
(sin nx, sin hijc) = I

(0, m # «

(sin /7x, cos owe) = 0, all n, m,

where n and m are integers,


(b) Use the result of part (a) to show that if

c N
fix) = — - 2 ( c fc cos Arx + r/fc sin /be),

then the Fourier coefficients of/ are given by

(f(x), cos kx) = ck , k = 0, 1, 2, . . . , N


and
</(*), sin /<*> = </fc , A: = 1, 2, . . . , N,

and that (/(*), cos kx) = (f{x), sin /bt> = if k > N.

3. Using the identities in Equation (5) of the text, express the following trigono-
metric polynomials as polynomials (i.e., linear combinations of powers) in
kx kx
e' and e~' .

(a) 1 — cos x.
(Jo) cos x — 1 sin 2x.

4. The result of Exercise 2(b) shows that if a function f can be represented as a trigonometric polynomial, then that polynomial is the Fourier expansion of f. For example, the identity

    cos² x = ½ + ½ cos 2x

expresses cos² x as a trigonometric polynomial and so provides the Fourier expansion of cos² x. Find the Fourier expansion of each of the following functions.

(a) sin² x.
(b) cos³ x.
(c) sin 2x cos x.

5. (a) Prove the identity

    ½ + Σ_{k=1}^{N} cos ku = sin((N + ½)u) / (2 sin(u/2)).

[Hint. Sum the identity 2 sin(u/2) cos ku = sin((k + ½)u) − sin((k − ½)u) for k from 1 to N.]

(b) Is the sum on the left the Fourier expansion of the function on the right?

6. Let f be a complex-valued continuous function defined on [−π, π], and define

    c_k = (1/2π) ∫_{-π}^{π} f(x) e^{-ikx} dx,   k = 0, ±1, ±2, . . . .

Then

    Σ_{k=-N}^{N} c_k e^{ikx}

is called the Nth complex Fourier approximation to f.

(a) Show that if

    f(x) = Σ_{k=-N}^{N} d_k e^{ikx},

then the constants d_k are the complex Fourier coefficients of f.

(b) Show that if f is real-valued, then

    2c_0 = a_0,   2c_k = a_k − i b_k,   and   2c_{-k} = a_k + i b_k   for k = 1, 2, 3, . . . .

7. Prove that if s_N and t_N are the Nth Fourier approximations to f and g respectively, then a s_N + b t_N is the Nth Fourier approximation to af + bg, where a and b are constants.
SECTION 6

MODIFIED FOURIER EXPANSIONS

The direct application of Fourier methods to practical problems usually
requires some modification of the standard formulation presented in the
previous section. In the present section we describe some of these modifications and calculate some examples.

While the interval [−π, π] is a natural one for Fourier expansions
because it is a period interval for the trigonometric functions, it may be that
a function encountered in an application needs to be approximated on some
other interval.
If the function f to be approximated is defined not on the interval
[−π, π] but on [−p, p], a suitable change in the computation of the
approximation can be made as follows. With f defined on [−p, p], we
define

    f_p(x) = f(px/π),   −π ≤ x ≤ π.

Then we can compute the Fourier coefficients of f_p by Formula 5.1. The
resulting trigonometric polynomials s_N will approximate f_p on [−π, π].
To approximate f on [−p, p], we consider

    s_N(πx/p) = a_0/2 + Σ_{k=1}^{N} (a_k cos(kπx/p) + b_k sin(kπx/p)),   −p ≤ x ≤ p.


The coefficients a_k and b_k can be computed directly in terms of f by making
a change of variable. We have

    a_k = (1/π) ∫_{-π}^{π} f_p(x) cos kx dx = (1/π) ∫_{-π}^{π} f(px/π) cos kx dx.

A similar computation holds for b_k, and we have

6.1    a_k = (1/p) ∫_{-p}^{p} f(x) cos(kπx/p) dx,   b_k = (1/p) ∫_{-p}^{p} f(x) sin(kπx/p) dx

for the coefficients in the Fourier approximation

    a_0/2 + Σ_{k=1}^{N} (a_k cos(kπx/p) + b_k sin(kπx/p))

to the function f defined on [−p, p].

Example 1. If

    h(x) = 1 for 0 < x ≤ p,   h(x) = −1 for −p ≤ x ≤ 0,

then

    a_k = 0,   k = 0, 1, 2, . . . ,

    b_k = (2/p) ∫_0^p sin(kπx/p) dx = (2/kπ)[1 − cos kπ]
        = 0 for k = 2, 4, 6, . . . ,   4/(πk) for k = 1, 3, 5, . . . .

Hence, the Nth Fourier approximation to h is given, for odd N, by

    s_N(x) = (4/π) [ sin(πx/p) + (1/3) sin(3πx/p) + · · · + (1/N) sin(Nπx/p) ].
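The coefficients just found are easy to confirm numerically. The sketch below is only an illustration; the grid size and the choice p = 1 are our own. It computes b_k for the square wave by a Riemann sum and compares with 4/(πk) for odd k.

import numpy as np

p = 1.0
h = lambda x: np.where(x > 0, 1.0, -1.0)          # the square wave of Example 1

x = np.linspace(-p, p, 20000, endpoint=False)
dx = 2 * p / len(x)
for k in range(1, 6):
    b_k = (1 / p) * np.sum(h(x) * np.sin(k * np.pi * x / p)) * dx   # Formula 6.1
    expected = 0.0 if k % 2 == 0 else 4 / (np.pi * k)
    print(k, round(b_k, 4), round(expected, 4))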


Figure 18

the integration can be performed over any interval of length 2p = b − a,
in particular, over [a, b]. Thus Formula 6.1 can be rewritten:

6.2    a_k = (2/(b − a)) ∫_a^b f(x) cos(2kπx/(b − a)) dx,

       b_k = (2/(b − a)) ∫_a^b f(x) sin(2kπx/(b − a)) dx.

The associated trigonometric polynomials are

    s_N(x) = a_0/2 + Σ_{k=1}^{N} ( a_k cos(2kπx/(b − a)) + b_k sin(2kπx/(b − a)) ).

Example 2. Let f(x) = x, for 0 ≤ x ≤ 1. We find, integrating by parts,

    a_k = 2 ∫_0^1 x cos 2kπx dx
        = 2 [ (x sin 2kπx)/(2kπ) ]_0^1 − (2/(2kπ)) ∫_0^1 sin 2kπx dx = 0,   k ≥ 1,

    b_k = 2 ∫_0^1 x sin 2kπx dx
        = 2 [ −(x cos 2kπx)/(2kπ) ]_0^1 + (2/(2kπ)) ∫_0^1 cos 2kπx dx
        = −(cos 2kπ)/(kπ) = −1/(kπ),

and a_0 = 2 ∫_0^1 x dx = 1. Then the Fourier series is

    1/2 − (1/π) ( sin 2πx + (sin 4πx)/2 + (sin 6πx)/3 + · · · ).
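A few partial sums make the convergence visible. The following sketch (the evaluation points and truncation levels are our own choices) evaluates s_N for the sawtooth of Example 2; away from the jump at the integers the values approach f(x) = x.

import numpy as np

def s_N(x, N):
    # partial sum 1/2 - (1/pi) * sum_{k=1}^{N} sin(2*pi*k*x)/k from Example 2
    k = np.arange(1, N + 1)
    return 0.5 - (1 / np.pi) * np.sum(np.sin(2 * np.pi * np.outer(x, k)) / k, axis=1)

x = np.array([0.1, 0.25, 0.4, 0.75])
for N in (5, 50, 500):
    print(N, np.round(s_N(x, N), 4))      # approaches [0.1, 0.25, 0.4, 0.75]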

It sometimes happens that an expansion in terms only of cosines or
only of sines is more convenient to use than a general Fourier expansion.
We begin with the observation that cosine is an even function, i.e.,
cos(−x) = cos x, and sine is an odd function, i.e., sin(−x) = −sin x.
In general, a function f is said to be even if f(−x) = f(x) for all x in the
domain of f and, alternatively, to be odd if f(−x) = −f(x) always holds.
The graphs of some even and odd functions are shown in Fig. 19.

(even)    (odd)

Figure 19

Geometrically, a function is even if its graph is symmetric with respect to
the y-axis and odd if its graph is symmetric with respect to the origin. It
follows that

    ∫_{-p}^{p} f(x) dx = 2 ∫_0^p f(x) dx,   for even f,        (1)

and

    ∫_{-p}^{p} f(x) dx = 0,   for odd f.        (2)

Thus, if f is an even periodic function, the product

    f(x) sin(kπx/p)

is odd. (Why?) Therefore, for the Fourier sine coefficient b_k we have, by 6.1,

    b_k = (1/p) ∫_{-p}^{p} f(x) sin(kπx/p) dx = 0.        (3)

It follows that an even function has only cosine terms in its Fourier expansion.
Similarly, if f is an odd periodic function, the product

    f(x) cos(kπx/p)

is also odd; so for the Fourier cosine coefficient we have

    a_k = (1/p) ∫_{-p}^{p} f(x) cos(kπx/p) dx = 0.        (4)

Thus an odd function has only sine terms in its Fourier expansion.

The facts in the preceding paragraph are the key to solving the following
problem: given a function f(x) defined just on the interval 0 ≤ x ≤ p,
find a trigonometric series expansion for f consisting only of cosine terms
or, alternatively, only of sine terms. The trick is to extend the definition
of f from the interval 0 ≤ x ≤ p to all real x in such a way that the
extension is periodic of period 2p and either is even or is odd. We then
compute the Fourier series of the extension. If f_e is an even periodic
extension of f, then f_e will have only cosine terms in its Fourier series but
will still agree with f on 0 ≤ x ≤ p. Similarly, if f_o is an odd periodic
extension of f, then f_o has only sine terms in its expansion but will agree
with f for 0 ≤ x ≤ p. We illustrate the procedure with two examples.

Example 3. We shall compute the cosine expansion for the function
defined by f(x) = 1 − x for 0 ≤ x ≤ 2. We consider the even periodic
extension shown in Fig. 20. To find the extension we define f_e by
f_e(x) = f(−x) for −2 ≤ x ≤ 0, and then extend periodically, with period 4, to
the whole x-axis.

Figure 20

We can use Formula 6.1 to compute the Fourier coefficients of f_e. Since f_e is even,
Equation (3) shows that b_k = 0 for all k. Also, Equation (1) allows us to write

    a_k = (1/2) ∫_{-2}^{2} f_e(x) cos(kπx/2) dx = ∫_0^2 f_e(x) cos(kπx/2) dx.

Since, for 0 ≤ x ≤ 2, the function f_e is the same as the given function
f(x) = 1 − x, we have, for k > 0,

    a_k = ∫_0^2 (1 − x) cos(kπx/2) dx
        = [ (2/kπ)(1 − x) sin(kπx/2) ]_0^2 + (2/kπ) ∫_0^2 sin(kπx/2) dx
        = (4/(k²π²)) [1 − cos kπ]
        = 0 for k even,   8/(k²π²) for k odd.

Finally,

    a_0 = ∫_0^2 (1 − x) dx = 0.

Thus the cosine expansion of f on 0 ≤ x ≤ 2 has for its general nonzero term

    (8/(π²k²)) cos(kπx/2),   k odd.

Written out, the expansion looks like

    (8/π²) ( cos(πx/2) + (1/9) cos(3πx/2) + (1/25) cos(5πx/2) + · · · ).
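The cosine expansion converges quite rapidly because its coefficients decay like 1/k². The sketch below (the truncation level and sample points are our choices) sums the first few nonzero terms and compares with 1 − x on [0, 2].

import numpy as np

def cosine_partial_sum(x, terms=10):
    # sum of (8/(pi^2 k^2)) cos(k*pi*x/2) over the first `terms` odd values of k
    total = np.zeros_like(x)
    for k in range(1, 2 * terms, 2):
        total += 8 / (np.pi ** 2 * k ** 2) * np.cos(k * np.pi * x / 2)
    return total

x = np.linspace(0, 2, 5)
print(np.round(cosine_partial_sum(x), 4))   # close to 1 - x = [1, 0.5, 0, -0.5, -1]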
Example 4. Starting with the same function as in Example 3, f(x) =
1 − x for 0 ≤ x ≤ 2, we compute a sine expansion by considering the odd
periodic extension shown in Fig. 21. We first define f_o(x) = −f(−x) for
−2 ≤ x ≤ 0, and then extend periodically with period 4. Using Formula
6.1 and Equation (4) we find, as we intended, that a_k = 0 for all k.

Figure 21

Also, by Equation (1),

    b_k = (1/2) ∫_{-2}^{2} f_o(x) sin(kπx/2) dx = ∫_0^2 f_o(x) sin(kπx/2) dx,

and, for 0 ≤ x ≤ 2, f_o(x) = 1 − x.

(c) Compare the results of (a) and (b) with the complete Fourier expansion of

    g(x) = x,   −π < x < π.

4. Show that every real-valued function f defined on a symmetric interval [−a, a] can be written as the sum of an even function f_e and an odd function f_o.

[Hint. Let f_e(x) = (f(x) + f(−x))/2.]

5. Show that, if the appropriate combinations are defined,

(a) a product of even functions is even.


(b) a product of odd functions is even.
(c) the product of an even function and an odd function is odd.
(d) a linear combination of even functions is even, and that of odd functions
is odd.

6. Using elementary properties of integrals, prove Equations (1) and (2) of the
text.

7. Let f be an odd function on [−π, π] (i.e., f(−x) = −f(x)), and let g be an even function (i.e., g(−x) = g(x)). Let a_k, b_k and a′_k, b′_k be the Fourier coefficients of f and g, respectively. Show that

    a_k = 0,   b_k = (2/π) ∫_0^π f(x) sin kx dx,

    a′_k = (2/π) ∫_0^π g(x) cos kx dx,   b′_k = 0.
SECTION 7

HEAT AND WAVE EQUATIONS

In this section we show how a Fourier series can be used to solve some
problems in heat conduction and wave motion. We first find a differential
equation that is satisfied by the physical quantity being studied and then
apply Fourier series to solve the equation.

Suppose we are given a thin wire of uniform density and length p. Let
u(x, t) be the temperature, at time t, at a point x units from one end. Thus
0 ≤ x ≤ p, and we assume t ≥ 0. We shall assume that the only heat
transfer is along the direction of the wire and that the temperature at the
two ends is held fixed. For this reason we can, without loss of generality,
represent the wire as a straight segment along an x-axis and picture the
temperature as the graph of a function u = u(x, t), an example of which is
shown in Fig. 22.
The basic physical principle of heat conduction is that heat flow is
proportional to, and in the direction opposite to, the temperature gradient
∇u. Recall that ∇u points in the direction in which the temperature is increasing
most rapidly, so it is reasonable that heat should flow in the opposite
direction, from hotter to colder. Since the medium is 1-dimensional and is
represented by a segment of the x-axis, the gradient is represented by ∂u/∂x.

Figure 22

Thus the rate of change of total heat in a segment [x₁, x₂] is proportional to

    −(∂u/∂x)(x₁, t) + (∂u/∂x)(x₂, t).        (1)

But the rate of change of heat in the segment is also proportional to the
total change in temperature,

    ∫_{x₁}^{x₂} (∂u/∂t)(x, t) dx,        (2)

the total being taken over the interval. By the fundamental theorem of
calculus, the expression (1) can be written as

    ∫_{x₁}^{x₂} (∂²u/∂x²)(x, t) dx.        (3)

Hence the two proportional expressions (2) and (3) for rate of change of
total heat can be combined to give

    a² ∫_{x₁}^{x₂} (∂²u/∂x²)(x, t) dx = ∫_{x₁}^{x₂} (∂u/∂t)(x, t) dx,

where a² is a positive proportionality constant. Allowing x₂ to vary, we
can differentiate both sides of this last equation with respect to x₂, getting

7.1    a² (∂²u/∂x²)(x, t) = (∂u/∂t)(x, t).

This is the 1-dimensional heat or diffusion equation.


Equation 7.1 is linear in the sense that if u₁ and u₂ are solutions, then
so are linear combinations b₁u₁ + b₂u₂. To single out particular solutions,

we impose linear boundary conditions of the form

    u(0, t) = 0 and u(p, t) = 0,        (4)

together with an initial condition of the form

    u(x, 0) = h(x).

The standard method of solution is by separation of variables, in which we
start by trying to find product solutions of the form

    u(x, t) = G(x)H(t)

with boundary condition G(0) = G(p) = 0. If such exist, substitution into
a²u_xx = u_t gives

    a² G″(x)H(t) = G(x)H′(t),

for 0 < x < p, 0 < t. Dividing through by G(x)H(t) gives

    a² G″(x)/G(x) = H′(t)/H(t).        (5)

For this equation to be satisfied for varying x and t, both sides must be
equal to a constant, which we denote by −λ². This procedure is the
origin of the term separation of variables.

Setting both sides of (5) equal to −λ² gives two equations:

    a² G″ + λ² G = 0,        (6)

    H′ + λ² H = 0.        (7)

Equation (6) has solutions

    G(x) = c₁ cos (λ/a)x + c₂ sin (λ/a)x.

But G(0) = 0 requires c₁ = 0, and G(p) = 0 then requires c₂ sin (λ/a)p = 0.
This condition can be achieved, without making c₂ = 0, only by
choosing λ so that (λ/a)p = kπ, where k is an integer. That is, we must
take λ = kaπ/p, with the result that G has the form

    G(x) = c₂ sin (kπ/p)x,   k = 1, 2, . . . .

Equation (7) has solution

    H(t) = e^{−λ²t},

which, because λ² = (k²a²π²)/p², becomes

    H(t) = e^{−k²(a²π²/p²)t}.

The product solution u(x, t) is thus given, except for a constant factor, by

    u_k(x, t) = e^{−k²(a²π²/p²)t} sin (kπ/p)x,   k = 1, 2, . . . .

Thus a limit of linear combinations Σ b_k u_k(x, t) looks like

    u(x, t) = Σ_{k=1}^{∞} b_k e^{−k²(a²π²/p²)t} sin (kπ/p)x.        (8)

To satisfy the initial condition u(x, 0) = h(x), we require

    Σ_{k=1}^{N} b_k sin (kπ/p)x = h(x),        (9)

for some N. If h(x) can be expressed in this form, we can expect that a
solution to the problem is given by Equation (8). The boundary conditions,
u(0, t) = u(p, t) = 0, require that the temperature remain zero
at the ends, and the initial condition u(x, 0) = h(x) specifies the initial
temperature at each point x of the wire between 0 and p. Thus we are
naturally led to the problem of finding a Fourier sine series representation
for h(x). The matter of conditions under which the infinite series (8)
actually represents a solution of Equation 7.1 is taken up in Section 8.

Example 1. To be more specific about solving the heat equation, we
assume for simplicity that p = π and recall that, to solve a²u_xx = u_t with
boundary condition u(0, t) = u(π, t) = 0, and initial condition u(x, 0) = h(x),
we want in general to be able to represent h by an infinite series of the form

    h(x) = Σ_{k=1}^{∞} b_k sin kx.        (10)

Suppose, for example, that h is given in [0, π] by

    h(x) = x for 0 ≤ x ≤ π/2,   h(x) = π − x for π/2 ≤ x ≤ π.

The graph of h is shown in Fig. 23.

Figure 23


To make Equation (10) represent the Fourier expansion of h on [0, π], we
extend h to the interval [−π, π] in such a way that the cosine terms in the
expansion of h will all be zero, leaving only the sine terms to be computed.
We do this by extending the graph of h symmetrically about the origin, as
shown in Fig. 23. Then

    a_k = (1/π) ∫_{-π}^{π} h(x) cos kx dx = 0,

because h(−x) cos k(−x) = −h(x) cos kx; therefore, the integrals over
[−π, 0] and [0, π] are negatives of one another. To compute b_k we use
the fact that h(−x) sin k(−x) = h(x) sin kx, so that the graph of this
function is symmetric about the y-axis. Then

    b_k = (1/π) ∫_{-π}^{π} h(x) sin kx dx
        = (2/π) ∫_0^π h(x) sin kx dx
        = (2/π) ∫_0^{π/2} x sin kx dx + (2/π) ∫_{π/2}^{π} (π − x) sin kx dx
        = (4/(πk²)) sin (kπ/2).

Hence,

    b_k = 0 for k even,   4/(πk²) for k = 1, 5, 9, . . . ,   −4/(πk²) for k = 3, 7, 11, . . . .

Theorem 5.2 then implies that

    h(x) = (4/π) ( sin x − (sin 3x)/3² + (sin 5x)/5² − (sin 7x)/7² + · · · )

for each x in [0, π]. Finally, from Equation (8) we expect the solution to the
equation a²u_xx = u_t to be given by

    u(x, t) = (4/π) ( e^{−a²t} sin x − (1/3²) e^{−3²a²t} sin 3x + (1/5²) e^{−5²a²t} sin 5x − (1/7²) e^{−7²a²t} sin 7x + · · · ).

Verification that u(0, t) = u(π, t) = 0 follows immediately from setting
x = 0 and x = π. By setting t = 0, we get the representation of h by its

Fourier series, which is guaranteed by Theorem 5.2. That u(x, t) satisfies
the equation a²u_xx = u_t depends on term-by-term differentiation of the
series for u. This will be justified in the next section, though the formal
verification is left as Exercise 7 at the end of this section.
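For readers who want to see the decay of the higher modes numerically, here is a small Python sketch; the value a = 1 and the truncation level are our own choices. It evaluates the partial sums of the series solution of Example 1 at a few times t.

import numpy as np

def u(x, t, a=1.0, terms=50):
    # partial sum of u(x,t) = (4/pi) * sum over odd k of
    #   sin(k*pi/2)/k^2 * exp(-k^2 a^2 t) * sin(k x)
    total = np.zeros_like(x, dtype=float)
    for k in range(1, 2 * terms, 2):
        total += (4 / np.pi) * np.sin(k * np.pi / 2) / k**2 * np.exp(-k**2 * a**2 * t) * np.sin(k * x)
    return total

x = np.linspace(0, np.pi, 5)
for t in (0.0, 0.1, 1.0):
    print(t, np.round(u(x, t), 4))    # the temperature profile flattens toward zero as t grows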
Next we take up the 1-dimensional wave equation. Consider for
physical motivation a stretched elastic string of length p and uniform
density ρ placed along the x-axis in ℛ³. Suppose that the ends of
the string are held fixed at x = 0 and x = p by opposite forces
of magnitude F. If the string is somehow made to vibrate, our
problem is to determine in ℛ³ the position x(s, t) at time t of a
point on the string a distance s from the end fixed at x = 0.
Figure 24 shows a possible configuration. We imagine the string
subdivided into short pieces of length Δs and then derive two
different expressions for the total force acting on a typical segment
of the subdivision.

Figure 24

If t(s) is the unit tangent vector to the string at x(s),
then the opposing forces at x(s₀) and x(s₀ + Δs) are

    F t(s₀ + Δs) and −F t(s₀).

Hence the total force is

    F [ t(s₀ + Δs) − t(s₀) ].

On the other hand, by Newton's law, the force equals mass, ρ Δs, times
acceleration a(s₀). Hence

    ρ a(s₀) = F [ t(s₀ + Δs) − t(s₀) ] / Δs.

But

    a = (∂²x/∂t²)(s, t) and t(s) = (∂x/∂s)(s, t);

so, letting Δs → 0, we get

    ρ (∂²x/∂t²)(s, t) = F (∂²x/∂s²)(s, t).        (11)

This vector differential equation is of course equivalent to a system of
three scalar equations

    ∂²x/∂t² = a² ∂²x/∂s²,
    ∂²y/∂t² = a² ∂²y/∂s²,
    ∂²z/∂t² = a² ∂²z/∂s²,

where we have set a² = F/ρ > 0. In a formal sense, the problem of
finding solution functions x(s, t), y(s, t), and z(s, t) is the same for all

three equations. In practice, however, the equation for x(s, t) is usually
set aside when the longitudinal motion (along the x-axis) is slight; we make
this assumption. Between the other two equations there is little difference
in physical significance unless some other special assumption is made. We
suppose to be specific that the string has been plucked in such a way that
its motion takes place entirely in the xy-plane, with z(s, t) = 0. Finally we
assume that the displacements are small enough that we can replace ∂²y/∂s²
by ∂²y/∂x². This last assumption is one that requires experimental justification
in actual practice. Thus, with some loss of generality, we consider
instead of the vector Equation (11) a single scalar equation for y(x, t):

7.2    ∂²y/∂t² = a² ∂²y/∂x².

This differential equation does not completely specify the vibration of a
string unless we impose some initial conditions:

    y(x, 0) = f(x),        (12)

    (∂y/∂t)(x, 0) = g(x).        (13)

The first condition specifies the initial (t = 0) displacement in the y-direction
as a function of x, rather than s, and the second equation specifies
the initial velocity in the y-direction also as a function of x. In addition,
we also impose boundary conditions

    y(0, t) = 0 and y(p, t) = 0.        (14)

As in the case of the heat equation, we use separation of variables and
rely on the linearity of Equation 7.2 and the conditions (14) for constructing
solutions that satisfy the conditions (12) and (13) also. We try

    y(x, t) = G(x)H(t),

upon which 7.2 becomes

    G(x)H″(t) = a² G″(x)H(t)

or

    H″(t)/(a² H(t)) = G″(x)/G(x).

Since the left side is independent of x, both sides are constant; so we write

    G″(x)/G(x) = λ and H″(t)/(a² H(t)) = λ.

The first equation, G″(x) = λG(x), has solutions

    G(x) = c₁ e^{√λ x} + c₂ e^{−√λ x}.

The boundary conditions (14) require

    G(0) = 0 and G(p) = 0;

so

    c₁ + c₂ = 0 and c₁ e^{√λ p} + c₂ e^{−√λ p} = 0.

Solving for c₁ and c₂, we find that, for nonzero solutions to exist, we must
have c₁ = −c₂ and e^{−√λ p} = e^{√λ p}, or e^{2√λ p} = 1. Thus, allowing complex
exponents, we conclude that 2√λ p = 2πki for some integer k, so that

    λ = −π²k²/p².

Solutions G(x) are then of the form

    G(x) = c₁ ( e^{i(πk/p)x} − e^{−i(πk/p)x} ) = 2 i c₁ sin (πk/p)x.

We write

    G_k(x) = b_k sin (πk/p)x.

The differential equation for H, namely, H″(t) = a²λH(t), now takes the form

    H″(t) = −(π²k²a²/p²) H(t),

since we have determined that λ = −π²k²/p². Solutions of this equation are
of the form

    H_k(t) = C_k cos (πka/p)t + D_k sin (πka/p)t;

therefore, product solutions y_k = G_k H_k take the form

    y_k(x, t) = ( A_k cos (πka/p)t + B_k sin (πka/p)t ) sin (πk/p)x.

To satisfy the initial conditions (12) and (13), we form finite or infinite
sums of the type

    y(x, t) = Σ_{k=1}^{∞} ( A_k cos (πka/p)t + B_k sin (πka/p)t ) sin (πk/p)x.

Thus the initial conditions become formally

    y(x, 0) = Σ_{k=1}^{∞} A_k sin (πk/p)x = f(x)        (15)

and

    y_t(x, 0) = Σ_{k=1}^{∞} (πka/p) B_k sin (πk/p)x = g(x).        (16)

The coefficients A_k and (πka/p)B_k are then determined so that they are the
Fourier sine coefficients of f and g, respectively.

Example 2. For a simple example we consider a string which is initially
stationary; hence, y_t(x, 0) = 0. We assume the initial displacement is
given by

    y(x, 0) = b sin (πx/p),

and y(0, t) = y(p, t) = 0. We choose B_k = 0 in Equation (16) and A₁ = b
in Equation (15), with A_k = 0 for k ≠ 1. (It happens that f and g are
so simple in this case that we can find their Fourier sine coefficients by
inspection.) The solution to 7.2 then takes the form

    y(x, t) = b cos (πa/p)t sin (π/p)x.

Recall that for Equation 7.2 to be physically realistic the vibrations of the
string should be fairly small; in other words, the coefficient b should be
small. In Fig. 25, the graph of our solution is shown with a = 1, p = π,

Figure 25

and b chosen unrealistically large in order to bring out some qualitative
features of the picture.
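A short numerical check is possible as well. The sketch below (the choices a = 1, p = π, b = 0.2 are ours) evaluates y(x, t) = b cos(πat/p) sin(πx/p), verifies the boundary conditions, and checks the wave equation 7.2 approximately by finite differences.

import numpy as np

a, p, b = 1.0, np.pi, 0.2
y = lambda x, t: b * np.cos(np.pi * a * t / p) * np.sin(np.pi * x / p)

print(y(0.0, 0.3), y(p, 0.3))          # both essentially 0: the boundary conditions hold

# finite-difference check of y_tt = a^2 y_xx at an interior point
x0, t0, h = 1.0, 0.5, 1e-4
y_tt = (y(x0, t0 + h) - 2 * y(x0, t0) + y(x0, t0 - h)) / h**2
y_xx = (y(x0 + h, t0) - 2 * y(x0, t0) + y(x0 - h, t0)) / h**2
print(y_tt, a**2 * y_xx)               # nearly equal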

EXERCISES

1. Use the method of separation of variables to solve the 1-dimensional heat equation a²u_xx = u_t, subject to each of the following boundary and initial conditions.

(a) u(0, t) = u(p, t) = 0, u(x, 0) = sin (πx/p).
(b) u(0, t) = u(π, t) = 0, u(x, 0) = x(π − x).
(c) u_x(0, t) = u_x(π, t) = 0, u(x, 0) = sin x.


2". Use the method of separation of variables to solve the 1 -dimensional wave
equation a 2 uxx = u tt , subject to each of the following boundary and initial

conditions.

(a) w(0, ;) = m(tt, /) = 0, u(x, 0) = sin x, u (x, 0)


t
= 0.

=
(x, < x < -nil }

(b) «(0, r) = «(*, r) = 0, «(*, 0) ,_^


'
\,u (x,0)=0.
[rr — X, irjZ < X < wj
t

(c) w(0, = m(tt, /) = 0, u{x, 0) = 0, for < x < tt,

(0, < x < 77/2

^ 1 , Tj2 <X< TT.

3. The 2-dimensional heat equation is a 2 (uxx + u yy ) = u t ,


and if wis independent
of t, ut =
This results in the Laplace equation uxx + u yy =
0. for the
temperature in a 2-dimensional steady-state heat flow problem. Using
the method of separation of variables, solve the equation uxx + u vy — in the
rectangle < x < n, < y < tt, subject to the boundary conditions
w(0, y) = u(rr, y) = 0, u(x, 0) = 0, u(n, x) = sin x.

4. Let L be defined as an operator by

(a) L(u) = u_t − a²u_xx.
(b) L(y) = y_tt − a²y_xx.

Show that L is a linear operator, and conclude that linear combinations of solutions of (a) the heat equation and (b) the wave equation are also solutions.

5. (a) Show that a boundary condition of the form u(a, t) = u(b, t) = 0 is "linear" in the sense that, if two functions satisfy it, then so does any linear combination of the two functions.

(b) Show that a boundary condition of the form u(a, t) = 1 is not linear in the sense of part (a).

(c) Show that the initial condition u(x, 0) = f(x) is not linear in the sense of part (a) unless f is identically zero.

6. The 2-dimensional Laplace equation in polar coordinates is, by Chapter 4, Section 5, Problem 13,

    r² (∂²u/∂r²) + r (∂u/∂r) + ∂²u/∂θ² = 0.

(a) By letting u(r, θ) = G(r)H(θ), show that the method of separation of variables leads to the two differential equations

    H″ − λH = 0,

    r²G″ + rG′ + λG = 0.

(b) Show that H″ − λH = 0 has solutions satisfying H(0) = H(2π) if and only if λ = −k², where k is an integer, and that the solutions can then be written cos kθ and sin kθ.



(c) Show that r²G″ + rG′ − k²G = 0 has solutions r^k and r^{−k} for k = 0, 1, 2, . . . , but that negative exponents are ruled out if u(r, θ) = G(r)H(θ) is to be finite for r = 0.

(d) Show that if

    f(θ) = a_0/2 + Σ_{k=1}^{∞} (a_k cos kθ + b_k sin kθ),

then

    u(r, θ) = a_0/2 + Σ_{k=1}^{∞} (a_k r^k cos kθ + b_k r^k sin kθ)

satisfies the Laplace equation in polar coordinates together with the boundary condition u(1, θ) = f(θ).

7. Verify that the series expansion for u(x, t) given in Equation (8) of the text is a formal solution of a²u_xx = u_t. Use term-by-term differentiation of the series. (The method is justified by Theorem 8.5 of the next section.)

SECTION 8

UNIFORM CONVERGENCE

Let f_k(x), k = 1, 2, 3, . . . , be a sequence of real-valued functions defined
for all x in some set S. Then for each x, we consider the series Σ_{k=1}^{∞} f_k(x). If
it converges for each x in S, we say that the series converges pointwise on S.
Calling the limit f(x) for each x in S, we write

    f(x) = lim_{N→∞} Σ_{k=1}^{N} f_k(x).

Recall that this means that, for each x in S, there is a number f(x) such
that, given ε > 0, there is an integer K sufficiently large that

    | Σ_{k=1}^{N} f_k(x) − f(x) | < ε,

whenever N > K.

Example 1. The series Σ_{k=0}^{∞} x^k has as (N + 1)th partial sum the finite sum

    Σ_{k=0}^{N} x^k = (1 − x^{N+1})/(1 − x) for x ≠ 1,   N + 1 for x = 1.

Then

    lim_{N→∞} Σ_{k=0}^{N} x^k = 1/(1 − x),   for −1 < x < 1.


For real values of x outside the interval (−1, 1), the series fails to converge.

The trigonometric series Σ_{k=1}^{∞} (sin kx)/k² converges pointwise for all real
x. The reason is that its terms can be compared with those of the convergent
series Σ_{k=1}^{∞} 1/k², by observing that

    | (sin kx)/k² | ≤ 1/k²,   k = 1, 2, . . . .

The result is that the given series even converges absolutely.

An infinite series Σ_{k=1}^{∞} f_k(x) that converges for each x in a set S to a
number f(x) defines a function f on S. However, in general, very little can be
concluded about the properties of f from pointwise convergence alone.
For this reason it is sometimes helpful to consider a stronger form of
convergence on S. We say that Σ_{k=1}^{∞} f_k converges uniformly to a function f on
a set S, if, given ε > 0, there is an integer K such that for all x in S and
for all N > K,

    | Σ_{k=1}^{N} f_k(x) − f(x) | < ε.

The definition just given should be compared carefully with that of pointwise
convergence. Notice that uniform convergence implies pointwise
convergence, but not conversely. Roughly speaking, uniform convergence
of a series of functions defined on a set S means that the series converges
with at least a certain minimum rate for all points in S. A pointwise
convergent series may have points at which the convergence is arbitrarily slow.
Figure 26 is a picture of uniform and nonuniform convergence to the
same function f; s_N(x) and t_N(x) are Nth partial sums of two series.

Figure 26
To determine that a series converges uniformly we have the following.

8.1 Weierstrass Test

Let Σ_{k=1}^{∞} f_k be a series of real-valued functions defined on a set S. If
there is a constant series Σ_{k=1}^{∞} p_k such that

1. |f_k(x)| ≤ p_k for all x in S and for k = 1, 2, . . . ,

2. Σ_{k=1}^{∞} p_k converges,

then Σ_{k=1}^{∞} f_k converges uniformly to a function f defined on S.

Proof. The comparison test for series shows that Σ_{k=1}^{∞} f_k(x) converges
(even absolutely) for each x in S to a number that we shall write f(x).
Hence we can write

    f(x) − Σ_{k=1}^{N} f_k(x) = Σ_{k=1}^{∞} f_k(x) − Σ_{k=1}^{N} f_k(x) = Σ_{k=N+1}^{∞} f_k(x).

It follows that

    | f(x) − Σ_{k=1}^{N} f_k(x) | ≤ Σ_{k=N+1}^{∞} |f_k(x)| ≤ Σ_{k=N+1}^{∞} p_k.

Since Σ_{k=1}^{∞} p_k converges, we can, given ε > 0, find a K such that
Σ_{k=N+1}^{∞} p_k < ε if N > K. This completes the proof, because the number
K depends only on ε and not on x.

Example 2. The trigonometric series Σ_{k=1}^{∞} (sin kx)/k² converges uniformly
for all real x, because

    | (sin kx)/k² | ≤ 1/k²,

and Σ_{k=1}^{∞} 1/k² converges. However, the power series Σ_{k=0}^{∞} x^k, while it converges
pointwise for −1 < x < 1, fails to converge uniformly on (−1, 1). See



Problem 6. The Weierstrass test can be applied on any closed subinterval
[−r, r] by observing that |x^k| ≤ r^k for x in [−r, r] and that Σ_{k=0}^{∞} r^k converges
if 0 < r < 1. Hence, the power series converges pointwise on (−1, 1)
and uniformly on [−r, r] for any r < 1.
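The distinction is easy to see numerically: on [−r, r] with r < 1 the worst-case error of the partial sums goes to zero, while on (−1, 1) it does not. The sketch below (the grid sizes and the value r = 0.9 are our choices) estimates the supremum of |Σ_{k=0}^{N} x^k − 1/(1 − x)| on the two sets.

import numpy as np

def sup_error(N, xs):
    partial = sum(xs**k for k in range(N + 1))
    return np.max(np.abs(partial - 1 / (1 - xs)))

closed = np.linspace(-0.9, 0.9, 1001)          # points of the closed interval [-0.9, 0.9]
open_int = np.linspace(-0.999, 0.999, 1001)    # points filling out (-1, 1)
for N in (10, 50, 200):
    print(N, sup_error(N, closed), sup_error(N, open_int))
# the first column of errors shrinks toward 0; the second stays large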

The next four theorems are about uniformly convergent series


of functions. They all assert that certain limit operations can be inter-
changed with the summing of a series, provided that some series converges
uniformly. If uniform convergence is replaced by pointwise convergence,
then the resulting statements are false. See Problem 9.

8.2 Theorem

Let f₁, f₂, f₃, . . . be a sequence of functions defined on a set S in ℛⁿ.
Suppose x₀ is a limit point of S, and suppose that the limit

    lim_{x→x₀} f_k(x)

exists for k = 1, 2, . . . . Then

    lim_{x→x₀} Σ_{k=1}^{∞} f_k(x) = Σ_{k=1}^{∞} lim_{x→x₀} f_k(x),

provided the series of numbers on the right converges, and the series
on the left converges uniformly on S.

Proof. Let lim_{x→x₀} f_k(x) = a_k. Then, adding and subtracting Σ_{k=1}^{N} f_k(x)
and Σ_{k=1}^{N} a_k, we get

    | Σ_{k=1}^{∞} f_k(x) − Σ_{k=1}^{∞} a_k | ≤ | Σ_{k=1}^{∞} f_k(x) − Σ_{k=1}^{N} f_k(x) |
        + | Σ_{k=1}^{N} f_k(x) − Σ_{k=1}^{N} a_k | + | Σ_{k=1}^{N} a_k − Σ_{k=1}^{∞} a_k |.        (1)

Now let ε > 0. Since Σ f_k converges uniformly we can choose K
such that N > K implies

    | Σ_{k=1}^{∞} f_k(x) − Σ_{k=1}^{N} f_k(x) | < ε/3,   for all x in S.

Then choose an N > K such that

    | Σ_{k=1}^{N} a_k − Σ_{k=1}^{∞} a_k | < ε/3.

Finally, pick δ > 0 so that |x − x₀| < δ implies, by the relation

    lim_{x→x₀} Σ_{k=1}^{N} f_k(x) = Σ_{k=1}^{N} a_k,

that

    | Σ_{k=1}^{N} f_k(x) − Σ_{k=1}^{N} a_k | < ε/3.

Then for x satisfying |x − x₀| < δ, the left side of (1) is less than ε.

8.3 Corollary

If Σ_{k=1}^{∞} f_k is a uniformly convergent series of continuous functions f_k
defined on a set S in ℛⁿ, then the function f defined by f(x) = Σ_{k=1}^{∞} f_k(x)
is continuous on S.

In the next two theorems we restrict ourselves to functions of one
variable, though by treating one variable at a time, we can apply them to
functions of several variables.

8.4 Theorem

If the series Σ_{k=1}^{∞} f_k converges uniformly on the interval [a, b], and the
functions f_k are continuous on [a, b], then

    ∫_a^b Σ_{k=1}^{∞} f_k(x) dx = Σ_{k=1}^{∞} ∫_a^b f_k(x) dx.

Proof. By Theorem 8.3 the function Σ_{k=1}^{∞} f_k(x) is continuous on [a, b]
and so is integrable there. We have

    ∫_a^b Σ_{k=1}^{∞} f_k(x) dx − Σ_{k=1}^{N} ∫_a^b f_k(x) dx = ∫_a^b Σ_{k=N+1}^{∞} f_k(x) dx.        (2)

Let ε > 0, and choose K so large that if N > K, then
| Σ_{k=N+1}^{∞} f_k(x) | < ε(b − a)^{-1}, for all x in [a, b]. Then using the fact that, for
continuous g,

    | ∫_a^b g(x) dx | ≤ (b − a) max_{a≤x≤b} |g(x)|,

we have

    | ∫_a^b Σ_{k=N+1}^{∞} f_k(x) dx | ≤ (b − a) · ε · (b − a)^{-1} = ε   for N > K.

Thus the left side of Equation (2) is less than ε in absolute value for
N > K, which was to be shown.

The interchange of differentiation with the summing of a series requires
somewhat more in the way of hypotheses than did the previous theorem
on integration.

8.5 Theorem

Let f₁, f₂, f₃, . . . be a sequence of continuously differentiable
functions defined on an interval [a, b]. If Σ_{k=1}^{∞} f_k(x) = f(x) for all x in
[a, b] (pointwise convergence), and if Σ_{k=1}^{∞} df_k/dx converges uniformly
on [a, b], then f is continuously differentiable, and

    df/dx = Σ_{k=1}^{∞} df_k/dx.

Proof. By the fundamental theorem of calculus,

    Σ_{k=1}^{N} [ f_k(x) − f_k(a) ] = Σ_{k=1}^{N} ∫_a^x f_k′(t) dt = ∫_a^x Σ_{k=1}^{N} f_k′(t) dt.        (3)

Letting N tend to infinity, we get

    f(x) − f(a) = ∫_a^x Σ_{k=1}^{∞} f_k′(t) dt,

where we have used pointwise convergence on the left side of Equation
(3) and, on the right, have used uniform convergence, together with
Theorem 8.4. Differentiation of both sides of the last equation gives

    f′(x) = Σ_{k=1}^{∞} f_k′(x),

which is the conclusion of the theorem.



Example 3. Consider the trigonometric series

    Σ_{k=1}^{∞} (sin kx)/k⁴.

Clearly the series converges for all real x. Furthermore the series of derivatives
of the terms of the given series is

    Σ_{k=1}^{∞} (cos kx)/k³.

This series converges uniformly for all x by the Weierstrass test, because

    | (cos kx)/k³ | ≤ 1/k³,

and Σ_{k=1}^{∞} 1/k³ converges. Hence, by Theorem 8.5,

    (d/dx) Σ_{k=1}^{∞} (sin kx)/k⁴ = Σ_{k=1}^{∞} (cos kx)/k³.

The same argument can be applied to give

    (d²/dx²) Σ_{k=1}^{∞} (sin kx)/k⁴ = −Σ_{k=1}^{∞} (sin kx)/k².

EXERCISES

1. Show that the series Σ_{k=0}^{∞} x^k converges uniformly for −d ≤ x ≤ d if 0 < d < 1.

2. (a) Show that the trigonometric series Σ_{k=1}^{∞} (cos kx)/k² converges uniformly for all real x.

(b) Prove that the series of part (a) defines a continuous function for all real x.

3. (a) Show that if a trigonometric series

    a_0/2 + Σ_{k=1}^{∞} (a_k cos kx + b_k sin kx)

converges uniformly on [−π, π], then it converges uniformly for all real x.

(b) Prove that the uniformly convergent series of part (a) is necessarily the Fourier series of the function it represents. [Hint. Use Theorem 8.4.]

4. (a) Show that if |c_k| ≤ B for some fixed number B, then the series

    u(x, t) = Σ_{k=1}^{∞} c_k e^{−k²t} sin kx

is a solution of the differential equation

    u_xx = u_t   for t > 0 and x in [0, π],

satisfying u(0, t) = u(π, t) = 0. [Hint. Use Theorem 8.5.]

(b) Show that if u(x, t) in part (a) is defined for t = 0 by a uniformly convergent series, then u(x, t) is continuous on the set S in ℛ² defined by 0 ≤ t, 0 ≤ x ≤ π.

(c) Show that the function u(x, t) is infinitely often differentiable with respect to both x and t, for t > 0.

5. Show that if a trigonometric series of the form shown in Problem 3(a) satisfies |a_n| ≤ A/n², |b_n| ≤ B/n² for n = 1, 2, 3, . . . , and some constants A and B, then the trigonometric series is a Fourier series.

6. By considering the partial sums of the power series Σ_{k=0}^{∞} x^k for −1 < x < 1, show that the series fails to converge uniformly on (−1, 1).

7. Show that Σ_{k=1}^{∞} (−1)^k (1 − x)x^k converges uniformly on [0, 1], but that Σ_{k=1}^{∞} (1 − x)x^k only converges pointwise on [0, 1].

8. (a) Assume that the series Σ_{k=1}^{∞} k² a_k and Σ_{k=1}^{∞} k² b_k both converge absolutely. Show that

    w(x, t) = Σ_{k=1}^{∞} sin kx ( a_k cos kat + b_k sin kat )

is a solution of the 1-dimensional wave equation a² w_xx = w_tt. [Hint. Use the Weierstrass test and Theorem 8.5.]

(b) Show that the solution w(x, t) of part (a) satisfies the boundary conditions w(0, t) = w(π, t) = 0 for t ≥ 0 and an initial condition w(x, 0) = h(x), where h is twice continuously differentiable.

9. Show that, with uniform convergence replaced by pointwise convergence, the statements of Theorems (a) 8.3, (b) 8.4, and (c) 8.5 become false.

10. Can Theorems 8.4 and 8.5 be proved under the more general assumption that the functions involved are vector-valued, with values in ℛⁿ? [Hint. See Problems 16 and 17 of Chapter 3, Section 2.]

SECTION 9

ORTHOGONAL FUNCTIONS

Some of the properties of Fourier expansions are shared by a large class
of similar expansions in which the functions sin kx and cos kx are replaced
by some sequence {φ_k}, k = 1, 2, . . . , of functions, all of which are mutually
orthogonal. The orthogonality is measured in terms of an inner product
on a vector space of functions. For example, if f and g are elements in the
space C[−π, π] of continuous functions on the interval [−π, π], we can

define an inner product of f and g by

    ⟨f, g⟩ = ∫_{-π}^{π} f(x)g(x) dx.        (1)

It follows by direct computation (compare Theorem 5.3) that if we set

    φ₁(x) = 1/√(2π),   φ_{2n}(x) = (cos nx)/√π,   φ_{2n+1}(x) = (sin nx)/√π,

then

    ⟨φ_k, φ_l⟩ = 1 if k = l,   0 if k ≠ l.

The sequence {φ_k}, k = 1, 2, . . . , is orthonormal with respect to the inner product
given by Equation (1). The term "normal" comes from the fact that the
functions have been normalized by requiring ‖φ_k‖ = ⟨φ_k, φ_k⟩^{1/2} = 1. In
the trigonometric case, the normalization is achieved by dividing the sines
and cosines by √π.

To see the importance of orthonormal sequences in general, we consider
the following problem: Let ⟨f, g⟩ be an inner product on a vector
space, and let {φ_k}, k = 1, 2, . . . , be a sequence of elements, orthonormal with
respect to the inner product. Using the norm defined by ‖f‖ = ⟨f, f⟩^{1/2},
we try to determine coefficients c_k, k = 1, 2, . . . , N, such that

    ‖ g − Σ_{k=1}^{N} c_k φ_k ‖

is minimized for given g and N. The fact that the sequence φ_k is orthonormal
makes the solution very simple; by adding and subtracting
Σ_{k=1}^{N} ⟨g, φ_k⟩², we get

    ‖ g − Σ_{k=1}^{N} c_k φ_k ‖² = ⟨ g − Σ_{k=1}^{N} c_k φ_k , g − Σ_{k=1}^{N} c_k φ_k ⟩

        = ‖g‖² − 2 Σ_{k=1}^{N} c_k ⟨g, φ_k⟩ + Σ_{k=1}^{N} c_k²

        = ‖g‖² − Σ_{k=1}^{N} ⟨g, φ_k⟩² + Σ_{k=1}^{N} [ ⟨g, φ_k⟩² − 2c_k ⟨g, φ_k⟩ + c_k² ]

        = ‖g‖² − Σ_{k=1}^{N} ⟨g, φ_k⟩² + Σ_{k=1}^{N} [ ⟨g, φ_k⟩ − c_k ]².        (2)

But the first two terms in the last expression are independent of the choice
of the c_k's, and the last sum is then minimized by taking c_k = ⟨g, φ_k⟩. The
numbers ⟨g, φ_k⟩ are called the Fourier coefficients of g with respect to the

orthonormal sequence {φ_k}. Notice that the simplicity of the answer to
the problem depends very much on the orthogonality of the φ_k's.

We can summarize what has just been proved as follows.

9.1 Theorem

Let {φ_k}, k = 1, 2, . . . , be an orthonormal sequence in a vector space with
an inner product. Then, given an element g of the space, the distance

    ‖ g − Σ_{k=1}^{N} c_k φ_k ‖

is minimized for N = 1, 2, . . . by taking c_k to be the Fourier coefficient ⟨g, φ_k⟩.

The important thing about the conclusion of Theorem 9.1 is that the
c_k's are uniquely determined, independently of N. In other words, if we
wanted to improve the closeness of the approximation to g by increasing
N, then Theorem 9.1 says that the c_k's already computed are to be left
unchanged, and it is only necessary to compute additional coefficients

    c_{N+1} = ⟨g, φ_{N+1}⟩, etc.

As a by-product of the proof of Theorem 9.1, we have the following.

9.2 Bessel's Inequality

    ‖g‖² ≥ Σ_{k=1}^{∞} ⟨g, φ_k⟩².

Proof. The inequality

    0 ≤ ‖g‖² − Σ_{k=1}^{N} ⟨g, φ_k⟩² + Σ_{k=1}^{N} [ ⟨g, φ_k⟩ − c_k ]²

was established in the last step of Equation (2). On taking c_k = ⟨g, φ_k⟩, the inequality becomes

    0 ≤ ‖g‖² − Σ_{k=1}^{N} ⟨g, φ_k⟩².

Bessel's inequality follows by letting N tend to infinity.

Example 1. The approximation of g by a sum Σ_{k=1}^{N} c_k φ_k has been measured
by a norm in Theorem 9.1. To see what this means for approximation by
trigonometric polynomials, we use the inner product given by Equation

(1) on the space C[−π, π] of continuous functions on [−π, π]. Given the
orthonormal sequence φ₁(x) = 1/√(2π), φ_{2n}(x) = (cos nx)/√π, φ_{2n+1}(x) = (sin nx)/√π,
we try to minimize, for given g in C[−π, π], the norm

    ‖ g − Σ_{k=1}^{2N+1} c_k φ_k ‖.

We have seen that this is done by taking c_k = ⟨g, φ_k⟩. But, by the definition
of the inner product,

    ⟨g, φ_k⟩ = ∫_{-π}^{π} g(t) (1/√(2π)) dt,   k = 1,

    ⟨g, φ_k⟩ = ∫_{-π}^{π} g(t) (cos nt)/√π dt,   k = 2n,

    ⟨g, φ_k⟩ = ∫_{-π}^{π} g(t) (sin nt)/√π dt,   k = 2n + 1.

Hence, the terms c_k φ_k become

    ⟨g, φ₁⟩ φ₁(x) = (1/2π) ∫_{-π}^{π} g(t) dt,

    ⟨g, φ_{2n}⟩ φ_{2n}(x) = ( (1/π) ∫_{-π}^{π} g(t) cos nt dt ) cos nx,

    ⟨g, φ_{2n+1}⟩ φ_{2n+1}(x) = ( (1/π) ∫_{-π}^{π} g(t) sin nt dt ) sin nx.

Then

    Σ_{k=1}^{2N+1} c_k φ_k(x) = a_0/2 + Σ_{k=1}^{N} (a_k cos kx + b_k sin kx),

where a_k and b_k are the trigonometric Fourier coefficients as defined in
Section 5. The square of the norm to be minimized takes the form

    ∫_{-π}^{π} [ g(x) − a_0/2 − Σ_{k=1}^{N} (a_k cos kx + b_k sin kx) ]² dx,

and Theorem 9.1 says that the minimum will be attained for any fixed N by
taking a_k and b_k to be the Fourier coefficients of g.

The minimization of an integral of the form

    ∫_a^b [ g(x) − Σ_{k=1}^{N} c_k φ_k(x) ]² dx

is called a best mean-square approximation to g. In this sense we can say
that the Fourier approximation provides the best mean-square approximation
by a trigonometric polynomial.
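The mean-square optimality is easy to observe numerically: moving any coefficient away from its Fourier value can only increase the integral above. The sketch below (the test function, grid, and perturbation size are our own choices) compares the mean-square error of the Fourier partial sum of g(x) = |x| with that of a perturbed sum.

import numpy as np

x = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
dx = 2 * np.pi / len(x)
g = np.abs(x)

def coeffs(N):
    a = [(1 / np.pi) * np.sum(g * np.cos(k * x)) * dx for k in range(N + 1)]
    b = [(1 / np.pi) * np.sum(g * np.sin(k * x)) * dx for k in range(1, N + 1)]
    return a, b

def mean_square_error(a, b):
    s = a[0] / 2 + sum(a[k] * np.cos(k * x) + b[k - 1] * np.sin(k * x) for k in range(1, len(a)))
    return np.sum((g - s) ** 2) * dx

a, b = coeffs(5)
a_perturbed = a.copy(); a_perturbed[3] += 0.1        # move one coefficient off its Fourier value
print(mean_square_error(a, b), mean_square_error(a_perturbed, b))   # the second is larger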

In Section 5 we observed that the functions cos kx and sin kx are
solutions of the differential equation y″ = −k²y. The result can be stated
by saying that cos kx and sin kx are eigenfunctions of the differential
operator d²/dx², corresponding to the eigenvalue −k². More generally,
let L be a linear transformation defined on some vector space and having
its range in the same space. We say that a nonzero vector f is an eigenvector
of L, corresponding to the eigenvalue λ, if Lf = λf for some number λ.

To see the connection between eigenvectors and orthogonal sets of
functions, we need one more definition. Let 𝓕 be a vector space with an
inner product ⟨f, g⟩. Let L be a linear transformation from 𝓕 into 𝓕.
Then L is symmetric with respect to the inner product if ⟨Lf, g⟩ = ⟨f, Lg⟩
for all vectors f and g for which Lf and Lg are defined as elements of 𝓕.
A linear transformation with the same domain space and range space is
sometimes called a linear operator.

Example 2. Let ⟨f, g⟩ be the inner product defined by Equation (1)
on C[−π, π]. If we let Lf = d²f/dx², then it is clear that Lf is in C[−π, π]
only for those f in C[−π, π] that happen to have continuous second
derivatives. Thus, for L to be symmetric, we must have ⟨Lf, g⟩ = ⟨f, Lg⟩
for all twice continuously differentiable f and g on [−π, π]. Equivalently,
we must have, because of the definition of the inner product,

    ∫_{-π}^{π} f″(x)g(x) dx = ∫_{-π}^{π} f(x)g″(x) dx.

Integration by parts twice shows that

    ∫_{-π}^{π} f″(x)g(x) dx = f′(π)g(π) − f′(−π)g(−π) − f(π)g′(π)
        + f(−π)g′(−π) + ∫_{-π}^{π} f(x)g″(x) dx.

Hence to make L symmetric we restrict its domain to some subspace of
C[−π, π] for which the nonintegrated terms will always add up to zero.
This can be done in several ways. For example, we may restrict L to
the subspace consisting of those functions h in C[−π, π] for which
h(π) = h(−π) = 0, or to the subspace for which h′(π) = h′(−π) = 0.
With either of these restrictions L becomes symmetric. Notice that a
restriction of the required type is a boundary condition, in that it specifies
the values of f at the endpoints π and −π of its domain of definition.

The connection between orthogonal functions and symmetric operators
is as follows.

9.3 Theorem

Let L be a symmetric linear operator defined on a vector space 𝓕
with an inner product. If f₁ and f₂ are eigenvectors of L corresponding
to distinct eigenvalues λ₁ and λ₂, then f₁ and f₂ are orthogonal.

Proof. We assume that

    Lf₁ = λ₁f₁,   Lf₂ = λ₂f₂,

and prove that ⟨f₁, f₂⟩ = 0. We have

    ⟨Lf₁, f₂⟩ = ⟨λ₁f₁, f₂⟩ = λ₁⟨f₁, f₂⟩

and

    ⟨f₁, Lf₂⟩ = ⟨f₁, λ₂f₂⟩ = λ₂⟨f₁, f₂⟩.

Because L is symmetric, ⟨Lf₁, f₂⟩ = ⟨f₁, Lf₂⟩, so λ₁⟨f₁, f₂⟩ = λ₂⟨f₁, f₂⟩,
or (λ₁ − λ₂)⟨f₁, f₂⟩ = 0. Since λ₁ and λ₂ are not equal, we must have
⟨f₁, f₂⟩ = 0.

We consider the Sturm-Liouville differential operator of the form

    Lf = (pf′)′ + qf,        (3)

where p is a continuously differentiable function and q is assumed only
continuous. (The operator d²/dx² is a special case if we set p(x) = 1,
q(x) = 0.) We want to see what boundary conditions should be imposed
on the domain of L in order to make L symmetric with respect to the
inner product

    ⟨f, g⟩ = ∫_a^b f(x)g(x) dx.

The following formula simplifies the problem.

9.4 Lagrange Formula

If L is given by Lf = (pf′)′ + qf on an interval [a, b], then

    ⟨f₁, Lf₂⟩ − ⟨Lf₁, f₂⟩ = [ p(f₁f₂′ − f₁′f₂) ]_a^b.

Proof. Starting with the definition of L, we rearrange f₁(Lf₂) − (Lf₁)f₂ as follows:

    f₁(Lf₂) − (Lf₁)f₂ = f₁[ (pf₂′)′ + qf₂ ] − f₂[ (pf₁′)′ + qf₁ ]
        = f₁(pf₂′)′ − f₂(pf₁′)′
        = f₁[ pf₂″ + p′f₂′ ] − f₂[ pf₁″ + p′f₁′ ]
        = p′[ f₁f₂′ − f₁′f₂ ] + p[ f₁f₂″ − f₁″f₂ ]
        = [ p(f₁f₂′ − f₁′f₂) ]′.

Integration of both sides from a to b gives the Lagrange formula.


Formula 9.4 shows that any condition on the coefficient p, or on the
space containing f₁ and f₂, which makes [p(f₁f₂′ − f₁′f₂)]_a^b = 0, will also
make L symmetric.

Example 3. The operator L defined by Lf(x) = (1 − x²)f″(x) − 2xf′(x)
has the form shown in Equation (3) if we set p(x) = (1 − x²) and
q(x) = 0. We shall consider L to be operating on twice continuously differentiable
functions defined on [−1, 1]. To make L symmetric, we need to
ensure that the right side of the Lagrange formula is always zero for
a = −1, b = 1. But p(x) = (1 − x²), so p(−1) = p(1) = 0. Hence, L is
symmetric on the space C[−1, 1] without further restriction, and its
domain consists of all twice continuously differentiable functions in
C[−1, 1].

The symmetric operator L defined in the previous example is usually
associated with the differential equation

    (1 − x²)y″ − 2xy′ + n(n + 1)y = 0.        (4)

This is called the Legendre equation of index n, and it is satisfied by the nth
Legendre polynomial defined by

9.5    P_n(x) = (1/(2ⁿ n!)) (dⁿ/dxⁿ)(x² − 1)ⁿ,   n = 0, 1, 2, . . . .

That P_n satisfies Equation (4) can be verified by repeated differentiation.
See Problem 8. The significance of the fact that P_n satisfies the Legendre
equation comes from writing the equation in the form Ly = −n(n + 1)y,
where L is the symmetric operator Ly = (1 − x²)y″ − 2xy′ on C[−1, 1].
Then P_n can be looked at as an eigenfunction of L, corresponding to the
eigenvalue −n(n + 1). Hence, by Theorem 9.3, the Legendre polynomials
are orthogonal, that is,

    ∫_{-1}^{1} P_n(x)P_m(x) dx = 0,   n ≠ m.

Furthermore, a fairly complicated calculation (see Problem 10) shows that

    ∫_{-1}^{1} P_n²(x) dx = 2/(2n + 1),   n = 0, 1, 2, . . . .

Therefore, the normalized sequence {√((2n + 1)/2) P_n(x)}, n = 0, 1, 2, . . . ,
is an orthonormal sequence in C[−1, 1].

The Nth Fourier-Legendre approximation to a function g in C[−1, 1]
is the finite sum

    Σ_{k=0}^{N} c_k P_k(x),

where

    c_k = ((2k + 1)/2) ∫_{-1}^{1} g(x)P_k(x) dx.        (5)

Theorem 9.1 then implies that the best mean-square approximation to g
by a linear combination of Legendre polynomials is given by the Fourier-Legendre
approximation. In other words,

    ∫_{-1}^{1} [ g(x) − Σ_{k=0}^{N} c_k P_k(x) ]² dx

is minimized by computing c_k by Equation (5).
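As a concrete check, the sketch below uses numpy's Legendre-series utilities; the test function g(x) = e^x and the degree N = 4 are our own choices. It computes the Fourier-Legendre coefficients by Equation (5) with a simple quadrature and compares the resulting approximation with g at a few points.

import numpy as np
from numpy.polynomial import legendre as L

x = np.linspace(-1, 1, 4001)
g = np.exp(x)

def P(k, x):
    # the kth Legendre polynomial, evaluated via numpy's Legendre series tools
    c = np.zeros(k + 1); c[k] = 1.0
    return L.legval(x, c)

N = 4
c = [(2 * k + 1) / 2 * np.trapz(g * P(k, x), x) for k in range(N + 1)]   # Equation (5)
approx = sum(c[k] * P(k, x) for k in range(N + 1))
for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
    i = np.argmin(np.abs(x - t))
    print(t, round(g[i], 4), round(approx[i], 4))    # the two columns nearly agree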

EXERCISES

1. (a) Verify that

    ⟨f, g⟩ = ∫_a^b f(x)g(x) dx

defines an inner product on the space C[a, b] of continuous functions defined on [a, b].

(b) What condition is required of a continuous function w defined on [a, b] in order that

    ∫_a^b f(x)g(x)w(x) dx

define an inner product?

2. Let {φ_k} be an orthonormal sequence of functions in C[a, b], and let g be in C[a, b]. Show that if the real-valued function

    Δ(c₁, . . . , c_N) = ∫_a^b [ g(x) − Σ_{k=1}^{N} c_k φ_k(x) ]² dx

has a local minimum as a function of (c₁, . . . , c_N), then

    c_k = ∫_a^b g(x)φ_k(x) dx.

[Hint. Differentiate the formula for Δ under the integral sign. This is justified under the assumptions made here. (See Problem 7 of Chapter 6, Section 2.)]

3. Let {φ_k} be an orthonormal sequence of functions in C[a, b] and let g be in C[a, b]. Use Bessel's inequality to show the following:

(a) Σ_{k=1}^{∞} [ ∫_a^b g(x)φ_k(x) dx ]² converges.

(b) If c_k is the kth Fourier coefficient of g with respect to {φ_k}, then

    lim_{k→∞} c_k = 0.

(c) Find an example of a function whose trigonometric Fourier series has coefficients a_k and b_k such that Σ_{k=1}^{∞} (a_k² + b_k²) converges, but such that Σ_{k=1}^{∞} a_k or Σ_{k=1}^{∞} b_k does not converge.

4. Let L be a linear transformation from a vector space 𝓕₁ to an otherwise unrelated vector space 𝓕₂. Can L have eigenvectors?

5. Find all eigenfunctions and corresponding eigenvalues of the differential operator d²/dx² satisfying each of the following sets of boundary conditions. That is, solve y″ = λy, subject to the boundary conditions

(a) y(0) = y(π) = 0.
(b) y(0) = y(π) + y′(π) = 0.
(c) y′(0) = y′(π) = 0.

6. Let C[a, b] be the continuous real-valued functions defined on [a, b], C′[a, b] the continuously differentiable functions on that interval, and C″[a, b] the twice continuously differentiable functions there.

(a) Show that C[a, b], C′[a, b], and C″[a, b] are vector spaces, each contained in the preceding one.

(b) Let B_{x₀}[a, b] be the set of functions f contained in C′[a, b] and satisfying a condition of the form

    c f(x₀) + d f′(x₀) = 0,

where c and d are constants, not both zero. Show that B_{x₀}[a, b] is a vector subspace of C′[a, b].

(c) Show that C″[a, b] ∩ B_a[a, b] ∩ B_b[a, b] is a vector subspace of C[a, b].

7. Show that the differential operator d²/dx² is symmetric with respect to the inner product

    ⟨f, g⟩ = ∫_{-1}^{1} f(x)g(x) dx

and with each of the following sets of boundary conditions.

(a) f(−1) = f(1) = 0.
(b) f′(1) = f′(−1) = 0.
(c) c₁f(−1) + d₁f′(−1) = c₂f(1) + d₂f′(1) = 0, where c_i² + d_i² > 0.

8. Verify that the Legendre polynomial P_n defined by Formula 9.5 satisfies the Legendre equation: (1 − x²)y″ − 2xy′ + n(n + 1)y = 0. [Hint. Let u = (x² − 1)ⁿ. Then (x² − 1)u′ = 2nxu. Differentiate both sides (n + 1) times with respect to x.]

9. Compute the Legendre polynomials P₀, P₁, and P₂.



10. (a) By using Formula 9.5 and repeated integration by parts, show that

    ∫_{-1}^{1} P_n²(x) dx = ((2n)!/(2ⁿ n!)²) ∫_{-1}^{1} (1 − x²)ⁿ dx.

(b) Show that

    ∫_{-1}^{1} (1 − x²)ⁿ dx = 2 ∫_0^{π/2} sin^{2n+1} θ dθ.

(c) Show that

    ∫_0^{π/2} sin^{2n+1} θ dθ = (2 · 4 · 6 · · · (2n)) / (1 · 3 · 5 · · · (2n + 1)).

(d) Show that

    ∫_{-1}^{1} P_n²(x) dx = 2/(2n + 1).

11. Prove that P_n, the nth Legendre polynomial, has n distinct roots in the interval [−1, 1]. [Hint. Use Formula 9.5 and Rolle's theorem.]

12. The 3-dimensional Laplace equation in spherical coordinates (r, φ, θ) has the form

    (∂/∂r)(r² ∂u/∂r) + (1/sin φ)(∂/∂φ)(sin φ ∂u/∂φ) + (1/sin² φ)(∂²u/∂θ²) = 0.

(a) Show that, for solutions u(r, φ, θ) = v(r, φ) that are independent of θ, the equation has the form

    (∂/∂r)(r² ∂v/∂r) + (1/sin φ)(∂/∂φ)(sin φ ∂v/∂φ) = 0.

(b) Show that the method of separation of variables applied to the equation of part (a) leads to the two ordinary differential equations

    r²G″ + 2rG′ = λG,

    (1/sin φ)(d/dφ)(sin φ dH/dφ) = −λH.

(c) Show that the equation for G(r) has solutions

    r^{−(1/2)+√(λ+(1/4))} and r^{−(1/2)−√(λ+(1/4))}.   [Hint. Let r = e^t.]

(d) Show that the equation for H can be put in the form of the Legendre equation

    (1 − x²)H″ − 2xH′ + λH = 0.

[Hint. Let x = cos φ.]

(e) By setting λ = n(n + 1), find a sequence of solutions to the partial differential equation of part (a).

13. Show that in the case of the orthonormal sequence derived from the Legendre polynomials, the general expansion Σ_{k=0}^{N} ⟨g, φ_k⟩φ_k reduces to Σ_{k=0}^{N} c_k P_k(x), where c_k is given by Equation (5) of the text.

14. Show by using Theorem 8.3 of Chapter 2 that the normalized Legendre polynomials

    √((2n + 1)/2) P_n(x)

are the same as the sequence of polynomials obtained by applying the Gram-Schmidt process to the functions {1, x, x², . . . }. Use the inner product

    ⟨f, g⟩ = ∫_{-1}^{1} f(x)g(x) dx

defined on C[−1, 1].


6

Multiple Integration

SECTION 1

ITERATED INTEGRALS

This chapter is devoted to the study of multiple integrals of functions with
domains in ℛⁿ. Such integrals occur in many branches of pure and
applied mathematics, with interpretations such as volume, mass, probability,
and flux. In this section we start with iterated integrals because
they are computationally useful and because they provide a natural
transition to the multiple integral from the ordinary definite integral,

    ∫_a^b f(x) dx,

of a real-valued function of one real variable. We begin with n = 2,
that is, with the iterated integral of functions ℛ² → ℛ.

Suppose f(x, y) is a function defined on a rectangle a ≤ x ≤ b,
c ≤ y ≤ d. By

    ∫_c^d f(x, y) dy

is meant simply the definite integral of the function of one variable obtained
by holding x fixed; for example,

    ∫_0^2 x³y² dy = [ x³y³/3 ]_{y=0}^{y=2} = (8/3)x³.

As this example shows, if the integral exists, it depends on x. Thus, we
may set

    F(x) = ∫_c^d f(x, y) dy


Sec. 1 Iterated Integrals 441

and form the iterated integral

F(x) dx =
F(x)dx /(*, y) dy dx.
\
a Ja \_Jc

A common notational convention, which we shall adopt, is to omit the


brackets and write the iterated integral as

H /(*, y) dy.

This notation has the advantage of emphasizing which variable goes with
which integral sign, namely, x with j^ and y with Jf.

Example 1. Consider f(x, y) = x2 + y, defined on the rectangular


region <x < < y < 2.
1 , 1

(x
2
+ y) dy x*y + dx
Jo Jl f
Jo

[(2x
2
+ 2) - (x + 2
dx
r
Jo
J)]

(x
2
+ f)rfx = i + f =i i.
r
Jo
6

To interpret this example geometrically, look at the surface defined by


z = x2 + y shown in Fig. 1 For each x in the
. interval between and 1

Figure 1

the integral

    ∫_1^2 (x² + y) dy = x² + 3/2

is the area of the shaded cross section. It is customary to interpret the
definite integral of an area-valued function as volume. Thus we can regard
the iterated integral

    ∫_0^1 dx ∫_1^2 (x² + y) dy = 11/6

as the volume of the 3-dimensional region lying below the surface and
above the rectangle 0 ≤ x ≤ 1, 1 ≤ y ≤ 2.
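Iterated integrals of this kind are also easy to check numerically. The sketch below (the grid size is our own choice) approximates the inner integral by the trapezoidal rule for each fixed x and then integrates the result in x, recovering 11/6.

import numpy as np

x = np.linspace(0, 1, 401)
y = np.linspace(1, 2, 401)
# inner integral F(x) = integral over y of (x^2 + y), then integrate F over x
F = np.array([np.trapz(xi**2 + y, y) for xi in x])
print(np.trapz(F, x), 11 / 6)          # both approximately 1.8333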

Example 2. We can perform the integration in Example 1 in the
opposite order.

    ∫_1^2 dy ∫_0^1 (x² + y) dx = ∫_1^2 [ x³/3 + yx ]_{x=0}^{x=1} dy
        = ∫_1^2 (1/3 + y) dy = [ y/3 + y²/2 ]_1^2
        = (2/3 + 2) − (1/3 + 1/2) = 11/6.

This time

    ∫_0^1 (x² + y) dx = 1/3 + y

is the area of a cross section parallel to the xz-plane. See Fig. 2. The
second integral again gives the volume of the 3-dimensional region lying
below the surface z = x² + y and above the rectangle 0 ≤ x ≤ 1,

Figure 2

1 ≤ y ≤ 2. It is not surprising, therefore, that the two iterated integrals
of Examples 1 and 2 are equal.

It is important to be able to integrate over subsets of the plane that are
more general than rectangles. In such problems the limits in the first
integration will depend on the remaining variable.

Example 3. Consider the iterated integral

    ∫_0^1 dx ∫_0^{1-x²} (x + y) dy = ∫_0^1 [ xy + y²/2 ]_{y=0}^{y=1-x²} dx
        = ∫_0^1 [ x(1 − x²) + (1 − x²)²/2 ] dx
        = ∫_0^1 [ x − x³ + (1 − 2x² + x⁴)/2 ] dx
        = 1/2 − 1/4 + 1/2 − 1/3 + 1/10 = 31/60.

For each x between 0 and 1, the number y lies between the values
y = 0 and y = 1 − x². In other words, the point (x, y) runs along the
line segment joining (x, 0) and (x, 1 − x²). As x varies between 0 and
1, this line segment sweeps out the shaded region B as shown in Fig. 3.

Figure 3

The integrand f(x, y) = x + y has a graph (see Fig. 4), and the iterated
integral is the volume under the graph and above the region B.

Suppose we are given an iterated integral over a plane region B in
which the integrand is the constant function f defined by f(x, y) = 1, for
all (x, y) in B. The integral may then be interpreted either as the volume
of the slab of unit thickness and with base B or simply as the area of B.
For example,

    ∫_0^1 dx ∫_0^{1-x²} dy = 2/3

is the area of the region B shown in Fig. 3.


444 Multiple Integration Chap. 6

Figure 4

Example 4. Let f be defined by f(x, y) = x²y + xy² over the region
bounded by y = |x|, y = 0, x = −1, and x = 1. See Fig. 5. The two
iterated integrals over the region are

    ∫_{-1}^{1} dx ∫_0^{|x|} (x²y + xy²) dy

and

    ∫_0^1 dy [ ∫_{-1}^{-y} (x²y + xy²) dx + ∫_y^1 (x²y + xy²) dx ].

The second integral breaks into two pieces because, for fixed y between 0
and 1, the integration with respect to x is carried out over two separate
intervals. Computation of the integral is straightforward. We get

    ∫_0^1 [ x³y/3 + x²y²/2 ]_{x=-1}^{x=-y} dy + ∫_0^1 [ x³y/3 + x²y²/2 ]_{x=y}^{x=1} dy
        = (2/3) ∫_0^1 (y − y⁴) dy = 1/5.

Figure 5

The iterated integral in the other order is

    ∫_{-1}^{1} [ x²y²/2 + xy³/3 ]_{y=0}^{y=|x|} dx = ∫_{-1}^{1} ( x⁴/2 + x|x|³/3 ) dx.

The functions x⁴/2 and x|x|³/3 are even and odd, respectively. It follows that

    ∫_{-1}^{1} ( x⁴/2 + x|x|³/3 ) dx = 2 ∫_0^1 (x⁴/2) dx = 1/5.

The theorem which states that, under quite general hypotheses, the
value of an iterated integral is independent of the order of integration will
be proved in the next section. This will prove that different orders of
integration in computing volume must lead to the same result.

Iterated integrals for functions defined on subsets of dimension greater
than 2 can also be computed by repeated 1-dimensional integration.

Example 5.

    ∫_0^1 dx ∫_{x²}^{x} dy ∫_x^{2x+y} (x + y + 2z) dz = ∫_0^1 dx ∫_{x²}^{x} (4x² + 6xy + 2y²) dy.

It is not possible to give a complete interpretation of this integral by
drawing a picture. However, the region of integration B can be drawn
and is shown in Fig. 6. It is bounded on the top by the surface z = 2x + y
and on the bottom by z = x. On the sides it is bounded by the surfaces
obtained by projecting the curves y = x² and y = x parallel to the z-axis.
With the same limits of integration, the integral

    ∫_0^1 dx ∫_{x²}^{x} dy ∫_x^{2x+y} dz

is the volume of B. For fixed x and y the first integral,

    ∫_x^{2x+y} dz,

Figure 6

is the length of the vertical segment joining the point (x, y, x) to the point
(x, y, 2x + y). For fixed x, the integral

    ∫_{x²}^{x} dy ∫_x^{2x+y} dz

is the area of a cross section parallel to the yz-plane. Finally, the triply
iterated integral is the volume.

Example 6. The n-fold iterated integral

    ∫_0^1 dx₁ ∫_0^{x₁} dx₂ · · · ∫_0^{x_{n-1}} dx_n

can be thought of as the volume of the region in n-dimensional Euclidean
space defined by the inequalities

    0 ≤ x_n ≤ x_{n-1} ≤ · · · ≤ x₂ ≤ x₁ ≤ 1.

To get some idea of what this region is like, consider the cases n = 1,
n = 2, and n = 3. For n = 1, the integral

    ∫_0^1 dx₁ = 1

is simply the length of the unit interval 0 ≤ x₁ ≤ 1. If n = 2, we have
0 ≤ x₂ ≤ x₁ ≤ 1. The region of integration is the intersection of the
regions 0 ≤ x₂, x₂ ≤ x₁, and x₁ ≤ 1 shown in Fig. 7. For n = 3, we have
simultaneously 0 ≤ x₃, x₃ ≤ x₂, x₂ ≤ x₁, and x₁ ≤ 1. See Fig. 8. If we

Figure 7    Figure 8

denote the n-fold integral by I_n, then I₁ = 1, I₂ = ½, and I₃ = 1/6. These
numbers can be obtained either by direct computation or by observing
that they are the length, area, and volume, respectively, of the regions of
integration. Direct evaluation of I_n is straightforward:

    I_n = ∫_0^1 dx₁ ∫_0^{x₁} dx₂ · · · ∫_0^{x_{n-1}} dx_n
        = ∫_0^1 dx₁ ∫_0^{x₁} dx₂ · · · ∫_0^{x_{n-2}} x_{n-1} dx_{n-1}
        = ∫_0^1 dx₁ ∫_0^{x₁} dx₂ · · · ∫_0^{x_{n-3}} (x_{n-2}²/2) dx_{n-2}
        = ∫_0^1 dx₁ ∫_0^{x₁} dx₂ · · · ∫_0^{x_{n-4}} (x_{n-3}³/3!) dx_{n-3}
        = · · · = ∫_0^1 ( x₁^{n-1}/(n − 1)! ) dx₁ = 1/n!.
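The value 1/n! is easy to confirm by a Monte Carlo estimate of the volume of the region 0 ≤ x_n ≤ · · · ≤ x₁ ≤ 1; the sketch below (the sample size and seed are our choices) counts the fraction of random points of the unit cube whose coordinates happen to come out in decreasing order.

import math
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 3, 4):
    pts = rng.random((200000, n))
    ordered = np.all(np.diff(pts, axis=1) <= 0, axis=1)   # x1 >= x2 >= ... >= xn
    print(n, ordered.mean(), 1 / math.factorial(n))       # the estimate approaches 1/n!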

EXERCISES
Evaluate the following iterated integrals and sketch the region of integration
for each.

•H (x 2y 2 + xy 3) dy.

M »0 p
x —

(x +
2\ sin

y*) dy.
j dx. [Ans. 1 — cos 2.]

(£)
J*
dyj sin xdx.
^ [Ans. 0.]

/*1 pv 2

dy. [Ans. 1.]

r 1 r^i-xi
73 ^ </y.
Jo Jo

f
—1 /»2a:

/*jt/2 rcosy
9, rfy x sin _y c/;t.

10. rfx xrfy. [Ans. ^.]

"• .['4' c/y d*.

12. </x dy\ ydz. [Ans. f.]


Jo Ji J2

14. </x t/y (x +j + z) fcfe. [/*/?*. f.]


J-i Jo Jo
pit /*1 /»2

15. sin x </x dy (x +j + 2) dz.


Jo Jo Jo
._^ p\ px px+y px
I
l<n Evaluate the integral dx\ dy\ dz\ dw.

17. Sketch the subset B defined by 0 ≤ x ≤ 1, 0 ≤ y ≤ x, and write down the integral over B in each of the two possible orders of f(x, y) = x sin y. Evaluate both integrals.

18. Sketch the region defined by x ≥ 0, x² + y² ≤ 2, and x² + y² ≥ 1. Write down the integral over the region in each of the two possible orders of f(x, y) = x². Evaluate both integrals. [Ans. 3π/8.]

19. Consider two real-valued functions c(x) and d(x) of a real variable x. Suppose that, for all x in the interval a ≤ x ≤ b, we have c(x) ≤ d(x).
(a) Make a sketch of two such functions and of the subset B of the xy-plane consisting of all (x, y) such that a ≤ x ≤ b and c(x) ≤ y ≤ d(x).
(b) Express the area of B as an iterated integral.
(c) Set up the iterated integral of f(x, y) over B.


20. Sketch the subset B of ℝ³ defined by 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 + x, and
0 ≤ z ≤ 2. Write down the iterated integral, with order of integration z,
then y, and then x, of the function f(x, y, z) = x² + z over the subset B.
Compute the integral. [Ans. 25/6.]

21. Sketch the region defined by < x < l,x 2


<y < Vx, and <z<x + y,
and evaluate the iterated integral, in some order, of f(x,y,z) =
x + y + z over the region.

22. Let /be defined by f(x, y, z) = 1 on the hemisphere bounded by the plane
z = and the surface z = Vl — x2 — j2 . Evaluate an iterated integral of/
in some order over the region. [Ans. 27t/3.]

Let / be defined by /(x 1; . . . , xn ) = x^ x n on the cube <x < l 1

<x < 2 1 , . . . , < x„ < 1 . Evaluate

[ dxS dx 9
, J.v„
Jo Jo

24. Show that if in the integral

$$\int_{a_{1}}^{b_{1}}dx_{1}\int_{a_{2}}^{b_{2}}dx_{2}\cdots\int_{a_{n}}^{b_{n}}f(x_{1},\ldots,x_{n})\,dx_{n}$$

the order of the limits of integration is interchanged on an even number of


integral signs, then the value of the integral is unchanged. If the limits are
interchanged on an odd number of integral signs, then the whole iterated
integral changes sign.

25. Evaluate
r 1
r 1
r 1
r xi
dxA dx 2 . . . dx n _ x (x x + x 2 ) dx n .

Jo Jo Jo Jo
26. Prove that

$$\int_{0}^{x}dx_{1}\int_{0}^{x_{1}}dx_{2}\cdots\int_{0}^{x_{n-1}}f(x_{n})\,dx_{n}
= \frac{1}{(n-1)!}\int_{0}^{x}(x-t)^{n-1}f(t)\,dt.$$

SECTION 2

MULTIPLE INTEGRALS

Multiple integrals are closely related to the iterated integrals of the preceding section. Suppose we are given a real-valued function f defined on a
set B in ℝⁿ. Our problem is to formulate a definition of the integral of f
over B analogous to the definition of a one-variable integral, using sums
rather than iterated integrals.

We first consider some simple sets in ℝⁿ. A closed coordinate rectangle
is a subset of ℝⁿ consisting of all points x = (x₁, . . . , xₙ) that satisfy a set
of inequalities

$$a_{i} \le x_{i} \le b_{i}, \qquad i = 1,\ldots,n. \tag{1}$$

If in Formula (1) some of the symbols "≤" are replaced by "<", the
resulting set is still called a coordinate rectangle. In particular, if all the
inequalities are of the form a_i < x_i < b_i, the set is open and is called an

open coordinate rectangle. A coordinate rectangle has its edges parallel to
the coordinate axes. Throughout this section the word "rectangle" will be
understood to mean "coordinate rectangle." Rectangles in ℝ² and ℝ³
are illustrated in Fig. 9. A rectangle in ℝ¹ is just an interval.

Figure 9

Let R be a rectangle (open, closed, or neither) defined by Formula (1),
with replacement of any symbols "≤" by "<" permitted. The volume or
content of R, written V(R), is defined by

$$V(R) = (b_{1}-a_{1})(b_{2}-a_{2})\cdots(b_{n}-a_{n}). \tag{2}$$

In the examples shown in Fig. 9, V(R₂) = (4 − 1)(1 − (−1)) = 6 and
V(R₃) = (3 − 1)(3 − 1)(2 − 1) = 4. If, for some i in Formula (1),
a_i = b_i, then R is called degenerate and V(R) = 0. For rectangles in ℝ²,
content is the same thing as area, and we often write A(R) instead of
V(R) to have the notation remind us of area rather than volume.
A subset B of ℝⁿ is called bounded if there is a real number k such that
|x| ≤ k for all x in B. A finite set of (n − 1)-dimensional planes in ℝⁿ
(lines in ℝ²) parallel to the coordinate planes will be called a grid. As
illustrated in Fig. 10, a grid separates ℝⁿ into a finite number of closed,
bounded rectangles R₁, . . . , R_r and a finite number of unbounded regions.
A grid covers a subset B of ℝⁿ if B is contained in the union of the bounded
rectangles R₁, . . . , R_r. Obviously, a set can be covered by a grid if and
only if the set is bounded. As a measure of the fineness of a grid, we take
the maximum of the lengths of the edges of the rectangles R₁, . . . , R_r.
This number is called the mesh of the grid.


We now give the definition of multiple integral, also called the Riemann
integral after Bernhard Riemann (1826-1866). Consider a function
.


Figure 10

f: ℝⁿ → ℝ and a set B such that

(a) B is a bounded subset of the domain of f.

(b) f is bounded on B.

Assertion (b) means that there exists a real number K such that
|f(x)| ≤ K for all x in B. The multiple integral of f over B will be defined
in terms of the function f_B, which is f altered to be zero outside B, that is,

$$f_{B}(\mathbf{x}) = \begin{cases} f(\mathbf{x}), & \text{if } \mathbf{x} \text{ is in } B,\\ 0, & \text{if } \mathbf{x} \text{ is not in } B.\end{cases}$$

Let G be a grid that covers B and has mesh equal to m(G). In each of the
bounded rectangles R_i formed by G, i = 1, . . . , r, choose an arbitrary
point x_i. The sum

$$\sum_{i=1}^{r} f_{B}(\mathbf{x}_{i})\,V(R_{i})$$

is called a Riemann sum for f over B. Its value, for given f and B, depends
on G and x₁, . . . , x_r. If, no matter how we choose grids G with mesh
m(G) tending to zero, it happens that

$$\lim_{m(G)\to 0}\sum_{i=1}^{r} f_{B}(\mathbf{x}_{i})\,V(R_{i})$$

exists and is always the same number, then this limit is the integral of f
over B and is denoted by ∫_B f dV. If the integral exists, f is said to be
integrable over B.
The limit that defines the multiple integral is somewhat different from
the limit of a vector function defined in Chapter 2, Section 2, although the
idea behind it is similar. The defining equation

$$\lim_{m(G)\to 0}\sum_{i=1}^{r} f_{B}(\mathbf{x}_{i})\,V(R_{i}) = \int_{B} f\,dV$$

means that, for any ε > 0, there exists δ > 0 such that if G is any grid
that covers B and has mesh less than δ, and S is an arbitrary Riemann sum
for f_B formed from G, then

$$\Bigl|S - \int_{B} f\,dV\Bigr| < \epsilon.$$

It should be emphasized that the integral is not defined for functions
ℝⁿ → ℝ and sets B unless the boundedness conditions on f and on B are
satisfied. Without these conditions, even the Riemann sums may not be
defined.
If f is a real-valued function of one real variable, that is, if n = 1, and
if B is an interval a ≤ x ≤ b, the Riemann integral of f over B is the
familiar definite integral

$$\int_{a}^{b} f(x)\,dx.$$

Other common notations for the integral of ℝⁿ → ℝ over B are

$$\int_{B} f\,dA \ \text{ and } \ \int_{B} f(x,y)\,dx\,dy, \quad\text{if } n = 2,$$
$$\int_{B} f(x,y,z)\,dx\,dy\,dz, \quad\text{if } n = 3,$$
$$\int_{B} f\,dx_{1}\ldots dx_{n}, \quad\text{for arbitrary } n.$$

In most applications the multiple integral can be replaced by iterated
integrals, as we shall see later. However, for theoretical purposes it is
important to have some well-understood conditions under which multiple
integrals exist. The following theorem, proved in the Appendix, gives
such conditions.

2.1 Theorem

Let f be defined and bounded on a bounded set B in ℝⁿ, and let the
boundary of B be contained in finitely many smooth sets. If f is
continuous on B except perhaps on finitely many smooth sets, then
f is integrable over B. The value of ∫_B f dV is unchanged by changing
the values of f on any smooth set.

By a smooth set in ℝⁿ is meant the image of a closed bounded set under
a continuously differentiable function ℝᵐ → ℝⁿ, m < n. Thus, if n = 2
and m = 1, we may get a smooth curve. A smooth set in ℝ¹ will be
understood to be just a point. To say that the value of ∫_B f dV is unchanged
by changing the values of f on such a set means that f can be assigned
arbitrary values on the set without affecting the existence or the value of
the integral. For instance, we can change the integrand on any finite set
of points without changing the integral. This kind of modification is often
convenient for removing discontinuities.

Example 1. Evaluate the multiple integral

$$\int_{B}(2x+y)\,dx\,dy,$$

where B is the rectangle 0 ≤ x ≤ 1, 0 ≤ y ≤ 2. The existence of the
integral is ensured by Theorem 2.1. For this reason, any sequence of
Riemann sums with mesh tending to zero may be used to evaluate it.

Figure 11

For each n = 1, 2, . . . , consider the grid G_n consisting of the lines
x = i/n, i = 0, . . . , n, and y = j/n, j = 0, . . . , 2n. See Fig. 11(b). The
mesh of G_n is 1/n, and the area of each of the rectangles R_{ij} is 1/n². Setting
x_i = i/n and y_j = j/n, we form the Riemann sum, illustrated in Figure 11(a),

$$\sum_{i=1}^{n}\sum_{j=1}^{2n}(2x_{i}+y_{j})\,A(R_{ij})
= \sum_{i=1}^{n}\sum_{j=1}^{2n}\Bigl(\frac{2i}{n}+\frac{j}{n}\Bigr)\frac{1}{n^{2}}
= \frac{4n^{2}+3n}{n^{2}} = 4 + \frac{3}{n}.$$

Hence,

$$\int_{B}(2x+y)\,dx\,dy = \lim_{n\to\infty}\Bigl(4+\frac{3}{n}\Bigr) = 4.$$
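The Riemann sums of Example 1 are easy to tabulate on a computer. The short Python sketch below (helper name and the list of values of n are arbitrary choices made here for illustration) evaluates the sum at the upper right corner of each subrectangle.

```python
# Riemann sums for Example 1: sample at the upper right corners of the
# grid G_n on the rectangle 0 <= x <= 1, 0 <= y <= 2 (illustrative sketch).

def riemann_sum(n):
    area = 1.0 / n ** 2          # each subrectangle is (1/n) by (1/n)
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, 2 * n + 1):
            total += (2 * i / n + j / n) * area
    return total

for n in (10, 100, 1000):
    print(n, riemann_sum(n))     # the values approach 4 as n grows
```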

A direct evaluation of a multiple integral will be very arduous for most


functions we wish to integrate. Fortunately, in many instances the multiple
integral can be easily evaluated by repeated application of ordinary 1-
dimensional integration instead of by finding the limits of Riemann sums.
The pertinent theorem, which we prove at the end of this section, is the
following.

2.2 Theorem

Let B be a subset of ℝⁿ such that the iterated integral

$$\int dx_{1}\int dx_{2}\cdots\int f\,dx_{n}$$

exists over B. If, in addition, the multiple integral

$$\int_{B} f\,dV$$

exists, then the two integrals are equal.

Since the argument used to prove Theorem 2.2 applies equally well to
any order of iterated integration, we have as an immediate corollary:

2.3 Theorem

If ∫_B f dV exists, and iterated integrals exist for some orders of
integration, then all of these integrals are equal.

Example 2. Evaluate ∫_B (2x + y) dx dy, where B is the rectangle
0 ≤ x ≤ 1, 0 ≤ y ≤ 2. This is the same integral that occurs in Example 1.
Theorem 2.2 is applicable, and we obtain

$$\int_{B}(2x+y)\,dx\,dy = \int_{0}^{1}dx\int_{0}^{2}(2x+y)\,dy
= \int_{0}^{1}(4x+2)\,dx = \bigl[2x^{2}+2x\bigr]_{0}^{1} = 4.$$

Example 3. Let R be the 3-dimensional rectangle defined by −1 ≤
x ≤ 2, 0 ≤ y ≤ 1, 1 ≤ z ≤ 2, and shown in Fig. 12. Consider f(x, y, z) =
xyz. Then

$$\begin{aligned}
\int_{R} f\,dV &= \int_{R} xyz\,dx\,dy\,dz
 = \int_{-1}^{2}dx\int_{0}^{1}dy\int_{1}^{2}xyz\,dz
 = \int_{-1}^{2}x\,dx\int_{0}^{1}y\,dy\int_{1}^{2}z\,dz\\
&= \Bigl(\frac{3}{2}\Bigr)\Bigl(\frac{1}{2}\Bigr)\Bigl(\frac{3}{2}\Bigr) = \frac{9}{8}.
\end{aligned}$$

Example 4. Let f(x, y, z) = xyz, and let the subset B of ℝ³ be defined
by x² + y² + z² ≤ 4, x ≥ 0, y ≥ 0, z ≥ 0. B is the interior and boundary
of one-eighth of the spherical ball of radius 2 with center at the origin,
shown in Fig. 13.

Figure 12                                Figure 13

The integral ∫_B f dV equals the triple iterated integral of
the function f(x, y, z) = xyz over B. For fixed x and y, the variable z
runs from 0 to √(4 − x² − y²), which are the limits of the first integration
with respect to z. The result of this integration is a function of x and y
that next must be integrated over the 2-dimensional subset obtained by
projecting B on the xy-plane, that is, over the region x² + y² ≤ 4, x ≥ 0,
y ≥ 0. For fixed x, the variable y runs from 0 to √(4 − x²); hence, these
are the limits on the integration with respect to y. Finally, x runs from 0
to 2, so we conclude that

$$\int_{B} f\,dV = \int_{0}^{2}dx\int_{0}^{\sqrt{4-x^{2}}}dy\int_{0}^{\sqrt{4-x^{2}-y^{2}}}xyz\,dz.$$

Carrying out the innermost integration gives

$$\int_{B} f\,dV = \frac{1}{2}\int_{0}^{2}x\,dx\int_{0}^{\sqrt{4-x^{2}}}y\,(4-x^{2}-y^{2})\,dy.$$

The last integral simplifies to

$$\int_{0}^{2}\Bigl(2x - x^{3} + \frac{x^{5}}{8}\Bigr)dx = \frac{4}{3}.$$
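The same iterated integral can be checked by repeated one-dimensional numerical integration. The Python sketch below uses the midpoint rule with an arbitrarily chosen number of subdivisions; it is illustrative only.

```python
# Numerical evaluation of the iterated integral of Example 4 by
# repeated one-dimensional midpoint sums (illustrative sketch).
from math import sqrt

def midpoint(g, a, b, n=80):
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def over_z(x, y):
    top = sqrt(max(4 - x * x - y * y, 0.0))
    return midpoint(lambda z: x * y * z, 0.0, top)

def over_y(x):
    return midpoint(lambda y: over_z(x, y), 0.0, sqrt(4 - x * x))

print(midpoint(over_y, 0.0, 2.0))   # close to 4/3
```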

If the constant function 1 is integrable over a subset B of ℝⁿ, the
content or volume of B is denoted by V(B) and defined by

$$V(B) = \int_{B}1\,dV = \int_{B}dV.$$

For sets B in ℝ², we write A(B), for area, instead of V(B). It follows from
the last part of Theorem 2.1 that the content of a continuously differentiable
k-dimensional (k < n) curve or surface S is zero, for

$$V(S) = \int_{S}dV = 0.$$

For some sets B, the integral ∫_B dV does not exist. If this happens, the
content of B is not defined (see Exercise 21). Notice that for rectangles
R, the content V(R) has been defined twice: first as the product of the
lengths of mutually perpendicular edges and second as an integral. That
the two definitions agree follows immediately from Theorems 2.1 and 2.2.

Example 5. Let B be the region in ℝ² under the curve y = f(x) from
x = a to x = b, where f is a nonnegative function. Assuming the existence
of the following integrals, we obtain, using the iterated integral theorem
(Theorem 2.2),

$$A(B) = \int_{B}dA = \int_{a}^{b}dx\int_{0}^{f(x)}dy = \int_{a}^{b}f(x)\,dx.$$

Hence, the above definition of content is consistent with the usual one
for the area under the graph of a nonnegative integrable function of one
variable. If f is integrable over B and also nonnegative on B, we could
similarly show that the volume under the graph of f and above the set B
is the double integral ∫_B f dA.

Example 6. The volume above the disk D defined by x² + y² ≤ 1
and under the graph of f(x, y) = x² + y² (see Fig. 14) is equal to

$$\begin{aligned}
\int_{D}(x^{2}+y^{2})\,dx\,dy &= \int_{-1}^{1}dx\int_{-\sqrt{1-x^{2}}}^{\sqrt{1-x^{2}}}(x^{2}+y^{2})\,dy\\
&= 2\int_{-1}^{1}\Bigl(x^{2}\sqrt{1-x^{2}} + \tfrac{1}{3}(1-x^{2})\sqrt{1-x^{2}}\Bigr)dx\\
&= \frac{4}{3}\int_{0}^{1}\Bigl(\sqrt{1-x^{2}} + 2x^{2}\sqrt{1-x^{2}}\Bigr)dx\\
&= \frac{4}{3}\Bigl(\frac{\pi}{4} + \frac{\pi}{8}\Bigr) = \frac{\pi}{2}.
\end{aligned}$$

Figure 14
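Example 6 also illustrates the role of the function f_D, which is zero outside the disk: a Riemann sum over any covering rectangle approximates the same value. The Python sketch below samples at subrectangle centers of a grid on the square −1 ≤ x, y ≤ 1; grid size and names are chosen only for illustration.

```python
# Riemann-sum sketch for Example 6: integrate f_D, the integrand set to
# zero outside the unit disk, over the covering square -1 <= x, y <= 1.
from math import pi

def f_D(x, y):
    return x * x + y * y if x * x + y * y <= 1 else 0.0

def riemann(n):
    h = 2.0 / n                      # mesh of the grid on the square
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = -1 + (i + 0.5) * h   # center of each subrectangle
            y = -1 + (j + 0.5) * h
            total += f_D(x, y) * h * h
    return total

print(riemann(400), pi / 2)          # the sum approaches pi/2
```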

Example 7. Find the volume of the region B in ℝ³ bounded by the
four planes x = 0, y = 0, z = 0, and x + y + z = 1, shown in Fig. 15.

$$V(B) = \int_{B}dV = \int_{0}^{1}dx\int_{0}^{1-x}dy\int_{0}^{1-x-y}dz.$$

The volume of the region B can be computed directly as a double integral.
The projection of B on the xy-plane is the triangle D bounded by the lines
x = 0, y = 0, x + y = 1. The set B itself can be described as the region
under the graph of the function f(x, y) = 1 − x − y and above D.

Figure 15

Hence, according to the remark at the end of Example 5,

$$V(B) = \int_{D}f\,dA = \int_{0}^{1}dx\int_{0}^{1-x}(1-x-y)\,dy = \frac{1}{6}.$$

Notice that what we have called the content of a subset of ℝⁿ is more
properly called its n-dimensional content. For example, the square defined
by the inequalities 0 ≤ x ≤ 2, 0 ≤ y ≤ 2 in ℝ² has 2-dimensional content
4, whereas the square defined in ℝ³ by 0 ≤ x ≤ 2, 0 ≤ y ≤ 2, z = 0, and
which looks the same, has 3-dimensional content 0. Thus the content of a
set depends on the dimension of the containing Euclidean space with
respect to which it is being measured, as well as on the shape of the set
itself. Having already indicated that 2-dimensional content is called area,
we remark that 1-dimensional content is length.

Some characteristic properties of the Riemann integral are summarized
in the following four theorems.

2.4 Theorem. Linearity

If f and g are integrable over B and a and b are any two real numbers,
then af + bg is integrable over B and

$$\int_{B}(af+bg)\,dV = a\int_{B}f\,dV + b\int_{B}g\,dV.$$

2.5 Theorem. Positivity

If f is nonnegative and integrable over B, then

$$\int_{B}f\,dV \ge 0.$$

2.6 Theorem

If R is a rectangle, then ∫_R dV = V(R) (where the content V(R) is
defined by Equation (2)).

2.7 Theorem

If B is a subset of a bounded set C, then ∫_B f dV exists if and only if
∫_C f_B dV exists. Whenever both integrals exist, they are equal.

Proof of 2A. Let e > be given, and choose d > so that if S x and
S 2 are two Riemann sums forfB and gB respectively, and whose grids
,

have mesh less than d, then

Let S be any Riemann


Jl
fdV <-

sum
2

for (of
and

+
\b\

bg) D
Hwhose
gdV <

grid has mesh less

than 6. Then

S = 2(af+bg)B (x )V(R ) l i

= a ^fB {Xi)V(R ) + b J gB (^W(R )


t t
i i

= aS^ + bS 2 .

Hence,

a\fdV- b\ gdV
Jb Jb

-a\fdV + bS,-b\ gdV


IB JB

<\a\ H - [fdV \b\ gdV


JB "I
<- + - = €.
2 2
Thus

Iim 2(«/+ bg) B {x )V{K)


t
= a \ fdV + b
f g dV,
m(G)-0 i JB JB

and the proof is complete.

Proof of 2.5. Since all the Riemann sums are nonnegative, the limit
must also be nonnegative.

Proof of 2.6. This follows immediately from Theorems 2.1 and 2.2.

Proof of 2.7. The existence and the value of the integral ∫_B f dV
depend only on the function f_B. Similarly, ∫_C f_B dV is defined by
using (f_B)_C, which is equal to f_B.

Many of the important properties of the integral can be derived


directly from the preceding four theorems, without reference to the
original definition. The next two theorems are given as examples.

2.8 Theorem

If f and g are integrable over B and f ≤ g on B, then

$$\int_{B}f\,dV \le \int_{B}g\,dV.$$

Proof. The function g − f is nonnegative and, by Theorem 2.4, is
integrable over B. Hence, by Theorems 2.4 and 2.5,

$$0 \le \int_{B}(g-f)\,dV = \int_{B}g\,dV - \int_{B}f\,dV,$$

from which the conclusion follows.

The next theorem establishes an analog of the equation

$$\int_{a}^{c}f(x)\,dx = \int_{a}^{b}f(x)\,dx + \int_{b}^{c}f(x)\,dx$$

that holds for functions of one variable.

2.9 Theorem

If f is integrable over each of two disjoint sets B₁ and B₂, then f is
integrable over their union and

$$\int_{B_{1}\cup B_{2}}f\,dV = \int_{B_{1}}f\,dV + \int_{B_{2}}f\,dV.$$

Proof. By Theorem 2.7,

f fdV + \ fdV = \ fBl dV+\ fB2 dV.


JBi Jb 2 Jb x <oB 2 JBiSJBi

Since B x and B 2 are disjoint, fB lU B


2
= Sb + /b
x 2
- Hence, by The-
orem 2A,fB uB is integrable over Bx U B2 , and

f fBl dV + f fB2 dV = ( fBl vB dV. 2


Jb 1 ^IB 2 JBiSJB 2 JB^JB 2

Finally, by Theorem 2.7 again, /is integrable over B U B2


r and

f fBl vB.dV = \ fdV.


JBi*JB 2 JBiVB 2

This completes the proof.



The next theorem will show that, for functions / and regions B for
which \sfdV exists, the value of the integral is completely determined by
the properties stated in the four Theorems 2.4-2.7. The theorem is

important because it enables us to identify the multiple integral with other


integrals I, in particular, iterated integrals, which may be computed in any
one of a number of orders. We shall use the symbol I to denote such an
integral.

2.10 Theorem

Let I be an integral of bounded functions ℝⁿ → ℝ over bounded
sets B. Suppose that I has the following characteristic properties:

(a) If I_B f and I_B g are defined and a and b are real numbers, then
I_B(af + bg) is defined and

$$I_{B}(af+bg) = a\,I_{B}f + b\,I_{B}g.$$

(b) If f is nonnegative and I_B f is defined, then I_B f ≥ 0.

(c) If R is a rectangle, then I_R 1 = V(R) (as defined by Equation
(2)).

(d) If B is contained in a bounded set C, then I_B f is defined if and
only if I_C f_B is defined. Whenever both exist, they are equal.

We can then conclude that, if I_B f and ∫_B f dV both exist, they are
equal. (No properties of the integral itself are assumed; we use only
its definition.)

Proof. Suppose $B fdV < IB f. Set

e = IB f~\Jb fdV,
and choose 6 > so that if S is any Riemann sum forfB whose grid
has mesh less than <5, then

fdV < -
j;B 2

Let G be an arbitrary grid that covers B and has mesh less than d,
and denote the closed bounded rectangles formed by G by R u . . .
,

R r . Set

C=R 1
KJ . . . UR r ,

fi = least upper bound of/B in R t


.
/


Consider the function g defined by

g = ZfiXHi-
1=1

The function yM> is the characteristic function of R t


. It is defined by

( \ _ P>
if x is in i^,
Xr,W - 0> otherwise.
|

It follows immediately from properties (c), (d), and (a) that Icg is

defined and that

j cg = i/«*w
i=l

The definition of least upper bound implies that there exists a


Riemann sum for/B on the grid G that is arbitrarily close to Icg.
Hence,

fdV - / cg <-
2

and so, by the definition of e,

leg < hf- 0)


By property "=
(d), IB j /c/^- Moreover the function g has been
constructed so that/B < g. It follows from property (b) (as extended

in Theorem 2.8) that

lBf=IefB<hg- (4)

The inequalities (3) and (4) are contradictory; so we conclude

fdV >IB f.
1
By an entirely analogous argument using the notion of greatest
lower bound instead of least upper bound, we can obtain

fdV <IB f,
1
JB

and this completes the proof.

For an application of Theorem 2.10, take the functions ℝ² → ℝ and
sets B for which the iterated integral ∫ dx ∫ dy over B is defined. Let

$$I_{B}f = \int dx\int f\,dy \qquad(\text{over } B).$$

Verification of the conditions of Theorem 2.10 is straightforward and
reduces to a knowledge of the corresponding properties of the definite
integral for functions of one variable. For example, for integration over
intervals,

$$\int_{\alpha}^{\beta}(af+bg)\,dx = a\int_{\alpha}^{\beta}f\,dx + b\int_{\alpha}^{\beta}g\,dx,$$
$$\int_{\alpha}^{\beta}f\,dx \ge 0, \quad\text{if } f \ge 0,$$
$$\int_{\alpha}^{\beta}1\,dx = \beta - \alpha,$$
$$\int_{\gamma}^{\delta}f\,dx = \int_{\alpha}^{\beta}f_{[\gamma,\delta]}\,dx,
\quad\text{if } \alpha \le \gamma \le \delta \le \beta \text{ and } [\gamma,\delta] \text{ is the interval } \gamma \le x \le \delta.$$

It follows immediately from Theorem 2.10 that if both the iterated integral
and the double integral of f exist over B, then they are equal. This proves
Theorem 2.2 for two variables. The general case can be done by induction.

The possibility of changing order of integration has a number of
consequences other than its obvious convenience for computing multiple
integrals. One of these is the theorem for change of order in partial differentiation, proved in Section 5 of Chapter 3 by other means, and in a
slightly stronger form in Exercise 8 of this section. Another consequence
is the Leibnitz formula for interchanging differentiation and integration.
The theorem is stated here and the proof is outlined in Exercise 7.

2.11 Leibnitz Rule

If (∂g/∂y)(x, y) is continuous for a ≤ x ≤ b and c ≤ y ≤ d, then

$$\frac{d}{dy}\int_{a}^{b}g(x,y)\,dx = \int_{a}^{b}\frac{\partial g}{\partial y}(x,y)\,dx.$$
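The Leibnitz rule is easy to test numerically on a particular integrand. The Python sketch below uses the sample function g(x, y) = sin(xy) and a central difference to approximate the left-hand side; the integrand, interval, and step sizes are arbitrary choices made here for illustration.

```python
# Finite-difference check of the Leibnitz rule (illustrative sketch)
# for the sample integrand g(x, y) = sin(x*y) on a <= x <= b.
from math import sin, cos

a, b, y0, h = 0.0, 1.0, 0.7, 1e-5

def simpson(g, a, b, n=200):
    # composite Simpson rule on [a, b]; n must be even
    s, step = g(a) + g(b), (b - a) / n
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * step)
    return s * step / 3

F = lambda y: simpson(lambda x: sin(x * y), a, b)
lhs = (F(y0 + h) - F(y0 - h)) / (2 * h)           # d/dy of the integral
rhs = simpson(lambda x: x * cos(x * y0), a, b)    # integral of dg/dy
print(lhs, rhs)                                   # nearly equal
```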

EXERCISES
1. Make a drawing of the set B and compute ] B fdA, where

(a) f{x, y) = x 2 -r 3/ and B is the disk jc2 + y* < 1. [Ans. n.]


(b) f(x,y) = \j{x + y) and B is the region bounded by the lines y = x,
x = 1, x = 2, y = 0. [Ans. log 2.]
(c) f(x, y) = x sin xy and B is the rectangle < x < n, <y <\.
[Ans. tt.]
,


(d) fix, y) = x 2 - y 2 and B consists of all (x, y) such that < x < 1 and
x 2 - v
2
> 0.
[Ans. JJ

2. Using the definition of the double integral as a limit of Riemann sums,


compute $ B f(x, y) dx dy, where

(a) f{x, v) =x J- Ay and B is the rectangle < x < 2, < jy < 1.

[Ans. 6.]

(b) 1fix, v) = 3x2 + 2y and £ is the rectangle <x <2,0 <y <l.
' [Ans. 10.]

The following formulas will be useful in doing this problem:

2
"
< = ——
n(n +
2
1)
'

«(« + l)(2n + 1)
2' =
t=i
2
6

n / n \2

3. Find the volume under the graph of/ and above the set B, where
(a) fix, y) = x + y2 and B is the rectangle with corners (1, 1), (1, 3),

(2, 3), and (2, 1).


i^ns. ¥•]
(b) /(x j) =x +y + 2 and B is the region bounded by the curves y2 = x.

and* =2. [A».W^J


(c) /(x,j) = |x + jl and B is the disk x2 + /< 1. [Ans. 4^2/3.]

4. Find by integration the area of the subset of A bounded


2
by the curve
x2 - 2x + 4/ - 8y + 1 = 0. [^'"- 2jt.]

5. Find an approximate value for each integral in Problem 1 by computing a


Riemann sum with an appropriately fine grid.

6. Consider the rectangles

B 1
defined by 0<x<1,0<j<1
B2 defined by 1 < x < 2, -1 <J < 1

and the function


(2x - y, if x < 1

1
U 2
-v, if x>l.
Compute S Bl vB 2 /(*» J) rfjc d>'- ^ Ans ' ^
7. (a) Prove the Leibnitz rule for differentiating
an integral with respect to a
parameter: If ix,y) is continuous
gy on a rectangle a x b, < <
c <y <d, then

iJW.j)*-J>.**
^ec -
2 Multiple Integrals 465

fy pb
[Hint. Interchange the order of integration in dy\ gv (t,y) dt, and
Jc Ja
then differentiate both sides with respect to y.]
(b) Use part (a) and the chain rule to show that if continuous, and A,
gy is

and h 2 are differentiate, then

d r*/ih 2* (y)
1

git,y)dt
»A 2 (tf)

gv (t,y)dt + h'2 (y)g(h 2 (y), y) - h'^gQt^y), y).


Jhh
I j ( !/ )

8. Prove (compare 5.2, Chapter 3): Iffx ,f andfxy are continuous on an open
v ,

set, then/XJ ,
=fyx . [Hint. Apply the Leibnitz rule of Exercise 7(a) to the
equation

f(pc,y) -f(a,y) =\jx {t,y)dt,


and then differentiate both sides with respect to x.]

9. Given that f(x,y, z) = xyz and that


r ri px px+v
f{x, y, z) dx dydz = dx dy xyz dz,

sketch the region B and evaluate the integral. [Ans. -§-8-.]

10. Compute the multiple integral of f(x,y,z, w) = xyzw over the 4-dimen-
sional rectangle

®^x<\, -\<y<2, l<z<2, 2<w<3. [Ans.


ff]
11. Sketch the region B bounded by the surface z = 4 - 4x 2
in Jl 3 - y2 and
the .xy-plane. Set up the volume of B as a triple integral and also as a
double integral. Compute the volume. [Ans. 4n.]

12. Write an expression for the volume of the ball x 2 + y2 + z2 < a2


(a) as a triple integral.
(b) as a double integral.

13. Sketch in ft 3 the two cylindrical solids defined by x 2 + z 2 < 1 and y 2 +


z2 < 1, respectively. Find the volume of their intersection. [Ans.-1-6-.]

14. The 4-dimensional ball B of radius 1 and with center at the origin is the
subset of ft 4 defined by x\ + x\ + x\ + x\ < 1. Set up an expression for
the volume V(B) as a fourfold iterated integral.

15. Use Theorem 2.8 to show that if /and |/| are integrable over B, then
\$BfdV\<$B \f\dV.

16. Let Rn -^—> 3L m be defined on a set B in Jl n . We define



provided that the integrals of the coordinate functions f x , . . .


,fm of/ all

exist.

(a) Show that if &n -^-> 5Lm and #" -^- # m are both integrable over B, then

(of + bg)dV = a( fdV + bj g dV,


J

where a and b are constants.


k is a fixed vector in
Ifki
(b) If & m and ,
Rn — > &m is integrable over B, show
that

k -fdV = k •
fdV.
JB JB

if ft" -?-> R"> and Jl -^U- # are integrable over B, then


n
(c) Show that

Iju/^^l < JB l/l </K. [i/mr: By the Cauchy-Schwarz inequality


/(*) ' §Bf dv ^ l/WI $BfdV\, for all x in #. Integrate with respect to
I

x and apply the result of part (b).]

17. Use the result of Exercise 16(c) to show that if ft" > Xm is continuous
on a set B, and x is interior to B, then

s^/v /</F=/(x ),

where 5r is a ball of radius r centered at x .

18. Let R be a region in Rn having a finite volume. The vector

--ml.*"
is called the centroid of R, and the real number

/(z) = I
|z -z\ 2 dVx
JR
is called the moment of inertia of R about z.

(a) Show that the centroid of a ball is its center.


(b) Show that 7(z) is minimized by taking z to be the centroid of R. [Hint.
Show that
/(z) = 7(0) - V(R) |z |
2
+ V(R) |z - z |
2
.]

19. Prove the analog for multiple integrals of Theorem 8.4 of Chapter 5: If R
has finite volume, and the series ^/a. of continuous functions converges to/
k=l
uniformly on R, then

°°
f c r °° ~

I fitdV- I/, </K


,


20. A function R n - &, integrable over a region R in Jl" is called a probability

density on R if

p(\) > for all x in R. (1)

pdV = \. (2)
j;
If £ is an experiment with possible outcomes in Rn distributed according
to the density/), then the probability that the outcome lies in a set B in
n
is R
defined by

Pr[E in B]
L pdv -

(a) For what constant k is the function R2 — > R,

(k(l -x -y 2 2
),x 2 + y2 < 1

p(x,y) =
x2 +y > 2
1,

a probability density? [Arts, k = 2j-rr.]

(b) If the outcomes of E are distributed according to the density of part (a),
find the probability that E has an x-coordinate bigger than J.

21. Let B be 2
the subset of Jl consisting of allpoints (x, y) such that <y < 1

and x is rational, <x < 1. What is the area of B?

22. On the rectangle <x < 1 and <y < 1, let/(x, _y)
= 1, if x is rational,
and f(x,y) = 2y, if x is irrational. Show that

Jo Jo

but that /is not Riemann integrable over the rectangle.

23. Prove that

$$\int_{0}^{1}dy\int_{0}^{\infty}\bigl(e^{-xy}-2e^{-2xy}\bigr)\,dx
\ne \int_{0}^{\infty}dx\int_{0}^{1}\bigl(e^{-xy}-2e^{-2xy}\bigr)\,dy.$$

SECTION 3

CHANGE OF VARIABLE

The change-of-variable formula for 1-dimensional integrals is

$$\int_{\phi(a)}^{\phi(b)}f(x)\,dx = \int_{a}^{b}f(\phi(u))\,\phi'(u)\,du. \tag{1}$$

For example, taking (/>(w) = sin zv, we obtain

2
= 2
=
J> x f/x cos u du

In this section Equation (1) will be extended to dimensions higher than
one. In n-dimensional space a change of variable is effected by a function
T: 𝒰ⁿ → ℝⁿ. In what follows it will usually be more convenient to consider
the domain space and range space of T as distinct. We therefore regard T
as a transformation from one copy of ℝⁿ, which we label 𝒰ⁿ, to another
copy, which we continue to label ℝⁿ, writing typically T(u) = x where u is
in 𝒰ⁿ and x is in ℝⁿ. The statement of the n-dimensional change-of-variable theorem follows.

3.1 Theorem

Let T: 𝒰ⁿ → ℝⁿ be a continuously differentiable transformation. Let
R be a set in 𝒰ⁿ having a boundary consisting of finitely many
smooth sets. Suppose that R and its boundary are contained in the
interior of the domain of T and that

(a) T is one-to-one on R.

(b) det T′, the Jacobian determinant of T, is different from zero
on R.

Figure 16

Then, if the function f is bounded and continuous on T(R) (the
image of R under T), we have

$$\int_{T(R)}f\,dV = \int_{R}(f\circ T)\,|\det T'|\,dV.$$

Either condition (a) or (b) is allowed to fail on a set of zero
content.

The proof is in the Appendix. Before showing why the formula works,
we give some examples of its application. Notice that the factor φ′ that
occurs in Equation (1) has been replaced in higher dimensions by the
absolute value of the Jacobian determinant of T. Aside from the computation of det T′, the application of the transformation formula is a matter of

finding the geometric relationship between the subset R and its image
T(R) for various transformations T.

Example 1. The integral ∫_P (x + y) dx dy, in which P is the parallelogram shown in Fig. 17, can be transformed into an integral over a rectangle.

Figure 17

This is done by means of the transformation

$$T\begin{pmatrix}u\\v\end{pmatrix} = \begin{pmatrix}u+v\\v\end{pmatrix}.$$

The Jacobian determinant of T is

$$\det T' = \begin{vmatrix}1 & 1\\0 & 1\end{vmatrix} = 1.$$

By the change-of-variable theorem,

$$\int_{P}(x+y)\,dx\,dy = \int_{R}\bigl[(u+v)+v\bigr]\cdot 1\,du\,dv
= \int_{0}^{2}du\int_{0}^{1}(u+2v)\,dv = 4.$$

The transformation T is clearly one-to-one because it is a linear transformation with nonzero determinant. Notice that the region of integration in
the given integral is in the range of the transformation rather than in its
domain.

Example 2. The transformation

$$T\begin{pmatrix}u\\v\end{pmatrix} = \begin{pmatrix}u\cos v\\ u\sin v\end{pmatrix}$$

goes between the regions shown in Fig. 18. The Jacobian is

$$\det T' = \begin{vmatrix}\cos v & -u\sin v\\ \sin v & u\cos v\end{vmatrix} = u.$$

The transformation is one-to-one between R and T(R). This can be seen
geometrically because of the interpretation of v and u as angle and radius.

Figure 18
< 'T)

Figure 19

we get T(R) = B. The corresponding regions are shown in Fig. 19. The
set on which this occurs has zero content, so the change-of-variable
theorem still applies. Of course, the value of neither integral is affected
by including or excluding these points.
these points.

Example 4. Let a function T: 𝒰² → ℝ² be defined by

$$T\begin{pmatrix}u\\v\end{pmatrix} = \begin{pmatrix}u^{2}-v\\ u+v^{2}\end{pmatrix}.$$

The unit square R_uv defined by the inequalities 0 ≤ u ≤ 1, 0 ≤ v ≤ 1 is
carried by T onto the subset R_xy shown in Fig. 20. Corresponding pieces
of the boundaries are indicated in the picture.

Figure 20

The image of each of the
four line segments that comprise the boundary of R_uv is computed as
follows:

(a) If u = 0 and 0 ≤ v ≤ 1, then x = −v and y = v², that is, y = x²
and −1 ≤ x ≤ 0.

(b) If v = 0 and 0 ≤ u ≤ 1, then x = u² and y = u, that is, x = y²
and 0 ≤ y ≤ 1.

(c) If u = 1 and 0 ≤ v ≤ 1, then x = 1 − v and y = 1 + v², that is,
y − 1 = (x − 1)² and 0 ≤ x ≤ 1.

(d) If v = 1 and 0 ≤ u ≤ 1, then x = u² − 1 and y = u + 1, that is,
(y − 1)² = x + 1 and 1 ≤ y ≤ 2.

It is not hard to verify that T is one-to-one on R_uv. Suppose

$$T\begin{pmatrix}u_{1}\\v_{1}\end{pmatrix} = T\begin{pmatrix}u_{2}\\v_{2}\end{pmatrix};$$

then

$$u_{1}^{2}-v_{1} = u_{2}^{2}-v_{2}, \qquad u_{1}+v_{1}^{2} = u_{2}+v_{2}^{2}.$$

Obviously, if u₁ = u₂, then v₁ = v₂. Suppose u₁ < u₂. This implies

$$0 < u_{2}^{2}-u_{1}^{2} = v_{2}-v_{1}, \qquad 0 < u_{2}-u_{1} = v_{1}^{2}-v_{2}^{2}.$$

Hence, v₁ < v₂, whereas v₂² < v₁². This is impossible if both v₁ and v₂ are
nonnegative; so the one-to-one-ness of T on R_uv is established. The
Jacobian determinant of T is

$$\det T' = \begin{vmatrix}2u & -1\\ 1 & 2v\end{vmatrix} = 4uv + 1.$$

We therefore have as an application of the change-of-variable theorem

$$\begin{aligned}
\int_{R_{xy}}x\,dx\,dy &= \int_{R_{uv}}(u^{2}-v)(4uv+1)\,du\,dv
= \int_{0}^{1}dv\int_{0}^{1}(4u^{3}v - 4uv^{2} + u^{2} - v)\,du\\
&= \int_{0}^{1}\Bigl(-2v^{2}+\frac{1}{3}\Bigr)dv = -\frac{1}{3}.
\end{aligned}$$
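The right-hand side of the change-of-variable formula also lends itself to a simple random-sampling check: averaging (f ∘ T)|det T′| over uniformly chosen points of R_uv estimates the integral over R_xy. The Python sketch below is illustrative; the number of trials is arbitrary.

```python
# Monte Carlo sketch of the change-of-variable formula for Example 4:
# the average of (u**2 - v)*(4*u*v + 1) over the unit square estimates
# the integral of x over R_xy (here det T' = 4uv + 1 > 0 on the square).
import random

def estimate(trials=500_000):
    total = 0.0
    for _ in range(trials):
        u, v = random.random(), random.random()
        total += (u * u - v) * (4 * u * v + 1)   # (f o T) * |det T'|
    return total / trials                        # area of R_uv is 1

print(estimate(), -1 / 3)                        # both close to -1/3
```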

To understand why the change-of-variable formula works for a
continuously differentiable vector function T, we need to know what
effect T has on volume. We use the affine approximation to T that replaces
T(u) near u₀ by T(u₀) + T′(u₀)(u − u₀). The way in which T alters volume
will be reflected in the way in which d_{u₀}T alters volume. In fact, translation
of a subset by the vector T(u₀) leaves its volume unchanged, and the
differential d_{u₀}T, being a linear transformation, changes volume in a
particularly simple way. Indeed, under a linear transformation volumes
get multiplied by a constant factor, and the factor of proportionality is
just the absolute value of the determinant of the transformation. For
instance, suppose T is taken to be a linear transformation, and f is the
constant function 1, that is, f(u) = 1 for all u in 𝒰ⁿ. The change-of-variable theorem (Theorem 3.1) then implies the following.

3.2 Theorem

If T is a linear transformation from ℝⁿ to ℝⁿ having matrix A, then
T multiplies volumes by the factor |det A|.

Proof. By the change-of-variable theorem, setting J = det T′, we get

$$V(T(R)) = \int_{T(R)}dV = \int_{R}|J|\,dV = |J|\,V(R).$$

The last step is valid because, for a linear transformation with
matrix A, the Jacobian determinant is the constant det A.
Then |J| = |det A|, which can be taken outside the integral.
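Theorem 3.2 is easy to check experimentally for a particular 2 × 2 matrix. In the Python sketch below (the matrix, bounding box, and trial count are arbitrary choices made for illustration), the area of the image of the unit square is estimated by Monte Carlo and compared with |det A|.

```python
# Sketch: the image of the unit square under a linear map with matrix A
# is a parallelogram whose area equals |det A|.
import random

A = [[2.0, 1.0],
     [1.0, 3.0]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]      # det A = 5

def in_image(x, y):
    # solve A(u, v) = (x, y) and test 0 <= u, v <= 1
    u = (A[1][1] * x - A[0][1] * y) / det
    v = (-A[1][0] * x + A[0][0] * y) / det
    return 0 <= u <= 1 and 0 <= v <= 1

# Monte Carlo area of the image inside the bounding box [0, 3] x [0, 4]
trials, hits = 400_000, 0
for _ in range(trials):
    if in_image(3 * random.random(), 4 * random.random()):
        hits += 1
print(12 * hits / trials, abs(det))              # both close to 5
```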

Example 5. The transformation from 𝒰¹ to ℝ¹ given by x = 3u
is linear. It is therefore its own differential (see Chapter 3,
Section 8, Exercise 9) and has Jacobian determinant J = 3. It is
clear from Fig. 21 that lengths get multiplied by 3 under this
transformation.

Example 6. The transformation T from 𝒰² to ℝ² given by

$$T\begin{pmatrix}u\\v\end{pmatrix} = \begin{pmatrix}u^{2}\\ u+v\end{pmatrix}$$

has as its differential at u₀ = (1, 1) the linear transformation

$$T'\begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}u\\v\end{pmatrix}
= \begin{pmatrix}2 & 0\\ 1 & 1\end{pmatrix}\begin{pmatrix}u\\v\end{pmatrix}.$$

Near u₀ the function T is approximated by the affine transformation

$$A\begin{pmatrix}u\\v\end{pmatrix}
= T\begin{pmatrix}1\\1\end{pmatrix} + T'\begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}u-1\\ v-1\end{pmatrix}
= \begin{pmatrix}2u-1\\ u+v\end{pmatrix}.$$

The square R in the uv-plane in Fig. 22 is carried by T onto the curved

Figure 22

figure on the right. The affine approximation A carries R onto the parallelogram outlined with dashes. Notice that the area of the parallelogram is
roughly equal to that of the curved figure. The exact area of the parallelogram is easily computed to be 1/2, twice the area of R. The important point
is that the affine approximation to T doubles the area of the square, while
T itself approximately doubles that area. The magnification factor, 2, is
given by the Jacobian determinant of T at u₀ = (1, 1). In fact,

$$\det T'\begin{pmatrix}1\\1\end{pmatrix} = \begin{vmatrix}2 & 0\\ 1 & 1\end{vmatrix} = 2.$$

To find the exact area of the image T(R), we use the change-of-variable
theorem. Since u is positive on R, the transformation T is one-to-one
there. The inverse function is given explicitly by

$$T^{-1}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}\sqrt{x}\\ y-\sqrt{x}\end{pmatrix},$$

so T is one-to-one. Moreover, the Jacobian determinant J = 2u is positive
on R. Hence,

$$A(T(R)) = \int_{T(R)}dA = \int_{R}|J|\,dA
= \int_{1}^{3/2}dv\int_{1}^{3/2}2u\,du = \frac{1}{2}\bigl[u^{2}\bigr]_{1}^{3/2} = \frac{5}{8}.$$

To understand the change-of-variable theorem itself, let T be a continuously differentiable transformation and consider the corresponding
regions R and T(R). A 2-dimensional example is illustrated in Fig. 23.

Figure 23


Decompose R into regions R_i by means of coordinate lines. Denoting
approximate equality by the symbol ≈, we have

$$\int_{T(R)}f\,dV = \sum_{i}\int_{T(R_{i})}f\,dV \approx \sum_{i}f_{i}\,V(T(R_{i})), \tag{2}$$

where the number f_i is a value assumed by the function f in T(R_i). We
assume that f_i V(T(R_i)) is a reasonable approximation to ∫_{T(R_i)} f dV. Next,
approximate V(T(R_i)) by |J_i| V(R_i), where J_i is a value assumed by the
Jacobian determinant of T in R_i. Thus we are led to the approximation

$$\sum_{i}f_{i}\,V(T(R_{i})) \approx \sum_{i}f_{i}\,|J_{i}|\,V(R_{i}). \tag{3}$$

But the number f_i is equally well a value of f ∘ T in R_i, which we can write
(f ∘ T)_i, getting from Formulas (2) and (3)

$$\int_{T(R)}f\,dV \approx \sum_{i}(f\circ T)_{i}\,|J_{i}|\,V(R_{i}).$$

Finally, the last sum can be used to approximate ∫_R (f ∘ T)|J| dV. To
make this argument precise is difficult; so the proof given in the Appendix
follows other lines.

The foregoing discussion shows that the Jacobian determinant can be
interpreted as an approximate local magnification factor for volume. A
one-to-one continuously differentiable transformation, looked at as a
coordinate change, leads to another slightly different interpretation of J.

Example 7. Polar coordinate curves in ℝ² bound regions like S in
Fig. 24. Since the Jacobian determinant of the polar coordinate transformation

$$T\begin{pmatrix}r\\ \theta\end{pmatrix} = \begin{pmatrix}r\cos\theta\\ r\sin\theta\end{pmatrix}$$

Figure 24

is

$$J = \begin{vmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{vmatrix} = r,$$

we expect an approximation to the shaded area S = T(R) in the xy-plane
to be r₀ Δr Δθ. Computation of the exact area of S, using the change-of-variable theorem, gives

$$\begin{aligned}
\int_{S}dA = \int_{R}r\,dA &= \int_{r_{0}}^{r_{0}+\Delta r}r\,dr\int_{\theta_{0}}^{\theta_{0}+\Delta\theta}d\theta
= \bigl(\tfrac{1}{2}(r_{0}+\Delta r)^{2} - \tfrac{1}{2}r_{0}^{2}\bigr)\,\Delta\theta\\
&= r_{0}\,\Delta r\,\Delta\theta + \tfrac{1}{2}(\Delta r)^{2}\,\Delta\theta
\approx r_{0}\,\Delta r\,\Delta\theta \qquad(\text{for small }\Delta r,\ \Delta\theta).
\end{aligned}$$

Thus the significance of J in this case is that J Δr Δθ is an approximation
to the area of a polar coordinate "rectangle," or region bounded by polar
coordinate curves, with Δr and Δθ as the difference between pairs of
values of r and θ.

The expression r Δr Δθ is called the area element in polar coordinates.
More generally, if J is the Jacobian determinant of a coordinate change,
then |J| ΔV is called its volume element.

Example 8. Spherical coordinates are introduced in ℝ³ by means of
the transformation

$$T\begin{pmatrix}r\\ \phi\\ \theta\end{pmatrix}
= \begin{pmatrix}r\sin\phi\cos\theta\\ r\sin\phi\sin\theta\\ r\cos\phi\end{pmatrix}.$$

Except for a notational change, the same transformation is considered in
Example 3. The Jacobian determinant is J = r² sin φ. This suggests the
approximation r² sin φ Δr Δφ Δθ for the volume of the spherical coordinate "cube" C shown in Fig. 25. The spherical ball with center at the
origin and radius 1 is defined in ℝ³ by x² + y² + z² ≤ 1 and is denoted
below by B_xyz. With respect to spherical coordinates, the same ball is defined
by the inequalities

$$0 \le r \le 1, \qquad 0 \le \phi \le \pi, \qquad 0 \le \theta \le 2\pi,$$

and is denoted below by B_{rφθ}. Using the change-of-variable theorem, we

Figure 25

compute the volume of the ball to be

$$\begin{aligned}
V(B_{xyz}) = \int_{B_{xyz}}dx\,dy\,dz &= \int_{B_{r\phi\theta}}r^{2}\sin\phi\,dr\,d\phi\,d\theta
= \int_{0}^{1}r^{2}\,dr\int_{0}^{\pi}\sin\phi\,d\phi\int_{0}^{2\pi}d\theta\\
&= \tfrac{1}{3}(1+1)(2\pi) = \frac{4\pi}{3}.
\end{aligned}$$
Notice (as in Example 3) that both conditions (a) and (b) of Theorem 3.1
fail to hold on B_{rφθ}. However, except on a subset of B_{rφθ} having zero
volume, the Jacobian J = r² sin φ is positive, and the coordinate transformation is one-to-one. The change-of-variable formula is therefore
applicable.

EXERCISES
1. Let

2uv

(a) Sketch the image under T of the square in "M 2 with vertices at (1, 1),

(b) Sketch the image under T'\ I of the square in part (a).

(c) Sketch the translate of the image found in part (b) by the vector

1\ /l
7"
1/ \1

Verify that this is the image of the square under the affine approxi-

mation to Tat r
\1>
(d) Find the area of the region sketched in (c). [Ans. 2.]
(e) Find the area of the region sketched in (a). [Ans. ^.]
2. Let
(u\ lu COS V
T
y] \vj \ u sin v

(a) Sketch the image under Tof the square S with vertices at (0, 0), (0, tt/2),

(77/2, 0), and (tt/2, tt/2).

(b) Sketch the image under T'\ I of the square S. What is the area of
\ 0/
the image?
(c) Sketch the image of S under the affine approximation to Tat (77/4, 77/4).

What is the area of the image?


(d) What is the area of the region sketched in (a) ?

3. Let

III COS V

usmv
lu\
Show that T I
J
transforms a rectangle of area A into a region having

area it A.

4. Compute the area of the image of the rectangle in the wt'-plane with vertices
at (0, 0), (0, 1), (2, 0), and (2, 1) under the transformation

lA " sS]
y) \2 i)C)-
5. Consider the transformation T defined by

x\ {u\ lu 2 — v
7

y! W \ 2uv

Let R uv be the quarter of the unit disk lying in the first quadrant, i.e.,

u
2
+ v
2
< 1, u > 0, v > 0.

(a) Sketch the image region Rxy = T(R UV).


C dx dy '
(b) Compute . [Ans. n.]
JRXV V x 2 + y2
'

6. Let the transformation from the wivplane to the xy-p\ane be defined by



x = u + v,
y v. Let R uv be the region bounded by (1) */-axis,
(2) t'-axis, and (3) the line u + v = 2.

(a) Find and sketch the image region Rx


(b) Compute the integral

dxdy
[Ans. 2.]
tRxy Vl
j; + 4x + 4y

7. Let a transformation of the «i>-plane to the .#y-plane be given by

x = u, y = v(l + u 2 ),

and let R uv be the rectangular region given by < u < 3 and < v < 2.

(a) Find and sketch the image region R xy .

d(x,y)
(b) Find
d(u, v)

(c) Transform $Rxy x dx dy to an integral over R uv and compute either one


of them. [Ans. ^.]
8. The transformation u = x 2 — y2 , v = 2xy maps the region D (see sketch)
onto a region R in the w-plane, and it is one-to-one on D.

(a) Find R.
(b) Compute | /e 1 dudv by integrating directly over R, and then by using
the transformation formulas to integrate over D. [Ans. ^p.]
(c) Compute J /e v du dv both directly and by using the change-of-variable
theorem. [Ans. 128.]

9. Let a transformation from the xj-plane to the »r-p!ane be given by

u =x
v =y(l + 2x).

(a) What happens to horizontal lines in the xj-plane?


(b) If D is the rectangular region
< x < 3

1 <y < 3,
find the image region R of D.
(c) Find

\
du dv, v dv du, and u dv du
Jn JR J R
by direct integration, and then by reducing them to integrals over D.
[Ans. 24, 228, 45.]

10. Compute the area bounded by the polar coordinate curves = 0, — n/4,
and r = 2
. [Ans. 7r
5
/(2
10 •
10).]

11. Find the area bounded by the Iemniscate (x 2 V y


2 2
)
= 2a 2 (x 2 — y2) by
changing to polar coordinates. [Ans. 2a 2 .]

12. Compute the volume of the ellipsoid

x 2
v2 z2
2
h—-\
2 2
<
~ 1
a b c

[Use the transformation (x,y, z) = (au,bv,cw) to transform the sphere


u2 + v
2
+ w2 < 1 onto the ellipsoid. Assume the volume of the sphere to
be known.]

13. Evaluate the integral of f(x,y, z) = a over the hemisphere x2 + y 2 +


z2 < 1, x > by changing to spherical coordinates. [Ans. f^a.]

14. (a) Compute the Jacobian of the cylindrical coordinate transformation

(b) Use cylindrical coordinates to compute

x 2 dx dy dz.
\
JO<z<l

15. Prove that the transformation

xx = ux

X2 = Ml + U2

X3 = ll
x + W2 + M3

Xn = «1 + »a + • • + «n

leaves volumes of corresponding regions unchanged.

16. Cutting a solid of revolution R into thin cylindrical shells, with axis the
same as the axis of revolution, leads intuitively to the following formula
for the volume of the solid:

2nrh(r) dr.
I
Here /;(/) is the thickness of the solid at a distance r from its axis, measured
along a line parallel to the axis. Show that introducing cyli ndrical coordinates
in the integral l R dx dy dz leads to the same formula.

17. (a) Let a ball B of radius a have density p at each of its points equal to the
distance of the point from a fixed diameter. Find the total mass of the
ball.

[Hint. Compute the integral $ B pdV by using spherical coordinates.]



(b) Let a cylinder of height h and radius a have a density p equal at each
point to the distance of the point from the axis of the cylinder. Find the
total mass of the cylinder.

SECTION 4

IMPROPER INTEGRALS

The definition of the integral can be extended to functions that are
unbounded and not necessarily zero outside some bounded set. We shall
first consider some examples.

Example 1. The function f(x, y) = 1/(x²y²), defined for x ≥ 1 and
y ≥ 1, has the graph shown in Fig. 26. If B is the set of points (x, y) for

Figure 26

which x ≥ 1 and y ≥ 1, it is natural to define ∫_B f dA in such a way that it
can be called the volume under the graph of f. We can approximate this
volume by computing the volume lying above bounded subrectangles of B.
To be specific, let B_N be the rectangle with corners at (1, 1) and (N, N)
and with edges parallel to the edges of B. For N > 1 we have

$$\int_{B_{N}}f\,dA = \int_{1}^{N}dx\int_{1}^{N}\frac{1}{x^{2}y^{2}}\,dy
= \Bigl(1-\frac{1}{N}\Bigr)\Bigl(1-\frac{1}{N}\Bigr).$$

As N tends to infinity, the rectangles B_N eventually cover every point of B,
and the regions above the B_N fill out the region under the graph of f.
Then, by definition,

$$\int_{B}f\,dA = \lim_{N\to\infty}\int_{B_{N}}f\,dA = 1.$$
Example 2. Let B be the disk x² + y² ≤ 1 in ℝ², and suppose

$$f(x,y) = -\log(x^{2}+y^{2}), \qquad 0 < x^{2}+y^{2} \le 1.$$

Figure 27

The graph of f is shown in Fig. 27. Since f is unbounded near (0, 0), we
cut out from B a disk centered at (0, 0) and with radius ε. Call the part of
B that is left B_ε. We have, using polar coordinates,

$$\begin{aligned}
-\int_{B_{\epsilon}}\log(x^{2}+y^{2})\,dx\,dy &= -\int_{0}^{2\pi}d\theta\int_{\epsilon}^{1}(\log r^{2})\,r\,dr
= -2\pi\bigl[r^{2}\log r - \tfrac{1}{2}r^{2}\bigr]_{\epsilon}^{1}\\
&= \pi + 2\pi\epsilon^{2}\log\epsilon - \pi\epsilon^{2}.
\end{aligned}$$

Since lim_{ε→0} (2πε² log ε − πε²) = 0, we get, by definition,

$$-\int_{B}\log(x^{2}+y^{2})\,dx\,dy = \lim_{\epsilon\to 0}\Bigl(-\int_{B_{\epsilon}}\log(x^{2}+y^{2})\,dx\,dy\Bigr) = \pi.$$

In Example 2, the function −log (x² + y²) becomes unbounded in the
disk x² + y² ≤ 1 only at the point (0, 0). It is of course important to
find all such points in attempting to integrate an unbounded function. In
general, we define an infinite discontinuity point for a function f to be a
point x₀ such that in any neighborhood of x₀, |f| assumes arbitrarily large
values.

Example 3. Consider the function g(x, y, z) = (x² + y² + z² − 1)^{−1/2}
for (x, y, z) satisfying 1 < x² + y² + z² ≤ 2. The domain of g is the
region between the concentric spheres shown in Fig. 28. Every point of the
sphere x² + y² + z² = 1 is an infinite discontinuity point for g. To
define the integral of g, we approximate its domain by shells B_ε determined
by 1 + ε ≤ x² + y² + z² ≤ 2. These shells have the property of filling
out the entire domain of g as ε tends to zero, although none of them

Figure 28

contains an infinite discontinuity point. Introducing spherical coordinates,
we obtain

$$\begin{aligned}
\int_{B_{\epsilon}}(x^{2}+y^{2}+z^{2}-1)^{-1/2}\,dx\,dy\,dz
&= \int_{1+\epsilon}^{2}dr\int_{0}^{2\pi}d\theta\int_{0}^{\pi}(r^{2}-1)^{-1/2}\,r^{2}\sin\phi\,d\phi\\
&= 4\pi\int_{1+\epsilon}^{2}(r^{2}-1)^{-1/2}\,r^{2}\,dr\\
&= 4\pi\Bigl[\tfrac{1}{2}r\sqrt{r^{2}-1} + \tfrac{1}{2}\log\bigl(r+\sqrt{r^{2}-1}\bigr)\Bigr]_{1+\epsilon}^{2}.
\end{aligned}$$

It follows immediately that

$$\lim_{\epsilon\to 0}\int_{B_{\epsilon}}(x^{2}+y^{2}+z^{2}-1)^{-1/2}\,dx\,dy\,dz
= 4\sqrt{3}\,\pi + 2\pi\log\bigl(2+\sqrt{3}\bigr).$$

Before collecting the ideas illustrated above into a general definition,
we make two requirements about the integrand f and the set B over which
it is to be integrated.

1. Let D be the set of points of B at which f is not continuous. The
part of D lying in an arbitrary bounded rectangle is to be contained in
finitely many smooth sets.

2. The part of B lying in an arbitrary bounded rectangle is to have a
boundary consisting of finitely many smooth sets.

Both conditions are satisfied in the three examples considered so far, and
we shall assume that they hold throughout the rest of the section.

In the examples, we have seen that the integral ∫_B f dV can sometimes
be defined when either f or B is unbounded. The extended definition of the
integral will be made in such a way that both phenomena can occur at
once. We proceed as follows. An increasing family {B_N} of subsets of B
will be said to converge to B if every bounded subset of B on which f is
bounded is contained in some one of the sets B_N. Notice that this notion
of convergence depends not only on B but also on f. The index N can be
chosen in any convenient way; it may, for example, tend to ∞ continuously or through integer values, or it may tend to some finite number.
Throughout the rest of this section we shall assume that, in any increasing
family {B_N} converging to B, each of the sets B_N satisfies condition 2.

The integral of f over B is by definition

$$\int_{B}f\,dV = \lim_{N}\int_{B_{N}}f\,dV,$$

provided that the limit is finite and is the same for every increasing family
of bounded sets B_N converging to B. It is assumed that the B_N are chosen
so that the ordinary Riemann integrals ∫_{B_N} f dV (as defined in Section 2)
exist. The integral thus obtained is called the improper Riemann integral
when it is necessary to distinguish it from the Riemann integral of a
bounded function over a bounded set.

Although the requirement that the value of the integral be independent
of the converging family of sets used to define it is a natural one, we shall
see later that it is sometimes interesting to disregard it. Nevertheless, the
next theorem shows that for positive functions the limit of ∫_{B_N} f dV is
always independent of the family of sets.

4.1 Theorem

Let f be nonnegative on B and suppose that

$$\lim_{N}\int_{B_{N}}f\,dV$$

is finite for some particular increasing family of sets B_N converging
to B. Then ∫_B f dV is defined and has the same value,

$$\lim_{N}\int_{C_{N}}f\,dV,$$

for every other family {C_N} converging to B.

Proof. Since/is bounded on each By, we have for each TV an index


K such that
By C CK .

Similarly, there is an index M depending on K such that


CK c B M .

Then, because /is nonnegative,

fdV < f fdV<[ fdV.


JCe JBm
In addition,

|
fdV <lim| fdV
JCy X J By
for all N. Because IcyfdV increases and is bounded above,

limf fdV
A' JCy

exists. The double inequality shows that

liml fdV = \\m\ fdV.


N JBx X JCy
This completes the proof.

Example 4. Let f be defined on the infinite strip S in ℝ², shown in
Fig. 29, by f(x, y) = y^{−1/2} e^{−x}. Clearly, f has an infinite discontinuity at
every point of the positive x-axis.

Figure 29

We define R_N to be the rectangle in S
bounded by the lines x = N and y = 1/N, for N > 1. As N tends to
infinity, R_N will converge to S. We have

$$\int_{R_{N}}f\,dA = \int_{0}^{N}dx\int_{1/N}^{1}y^{-1/2}e^{-x}\,dy.$$

Then

$$\int_{S}f\,dA = \lim_{N\to\infty}\bigl(1-e^{-N}\bigr)\Bigl(2-\frac{2}{\sqrt{N}}\Bigr) = 2.$$


Example 5. The integral of 1/x^a over the positive x-axis, denoted by
∫₀^∞ x^{−a} dx, fails to exist for any a. Consider, for N > 0,

$$\int_{1/N}^{N}x^{-a}\,dx = \begin{cases}\dfrac{N^{1-a}-(1/N)^{1-a}}{1-a}, & a \ne 1,\\[2mm] 2\log N, & a = 1.\end{cases}$$

As N tends to infinity, we get infinity for a limit in every case. However,
it is easy to verify that

$$\int_{1}^{\infty}x^{-a}\,dx = \frac{1}{a-1}, \quad\text{for } a > 1,$$

and

$$\int_{0}^{1}x^{-a}\,dx = \frac{1}{1-a}, \quad\text{for } a < 1.$$

The integral

$$\int_{-1}^{1}\frac{dx}{x}$$

fails to exist if we require that its value be independent of the limit process
by which it is computed. Indeed, if we integrate first over the intervals
[−1, −δ] and [ε, 1] with 0 < δ < 1 and 0 < ε < 1, we get

$$\int_{-1}^{-\delta}\frac{dx}{x} + \int_{\epsilon}^{1}\frac{dx}{x}
= \log|x|\Big|_{-1}^{-\delta} + \log x\Big|_{\epsilon}^{1} = \log\frac{\delta}{\epsilon}.$$

As ε and δ tend to zero, log (δ/ε) can be made to tend to any number by
controlling the limit of the ratio δ/ε. In particular, if we keep ε = δ,
the limit is zero.
For a function f having a graph symmetric about some point x₀, it is
sometimes significant to define ∫_B f dV by a limit using sets B_N that are
also symmetric about x₀. This is what we have done in the previous
example, and in general, we speak of computing a principal value (p.v.)
of the integral. For the integral in the last part of Example 5 we would
write

$$\text{p.v.}\int_{-1}^{1}\frac{dx}{x} = 0.$$

Example 6. Let (r, θ) be polar coordinates in ℝ², and set f(r, θ) =
(sin θ)/r² over the disk D of radius 1 centered at the origin. Clearly, f has
an infinite discontinuity at the origin because, for instance, along the line
θ = π/2, f tends to ∞ and along the line θ = 3π/2, f tends to −∞. (See
Fig. 30 for the graph of f.)

Figure 30

However, ∫_D f dA fails to exist in the ordinary
improper integral sense because the limit obtained from a sequence of
regions in D will depend on the way in which the positive and negative
values of f are balanced. A principal value of the integral can be determined by taking a limit over a family of annular regions. Let D_ε be the
annulus ε ≤ r ≤ 1. Then

$$\text{p.v.}\int_{D}f\,dV = \lim_{\epsilon\to 0}\int_{D_{\epsilon}}\frac{\sin\theta}{r}\,dr\,d\theta
= \lim_{\epsilon\to 0}\int_{0}^{2\pi}\sin\theta\,d\theta\int_{\epsilon}^{1}\frac{dr}{r} = 0.$$

The next theorem is a convenient test for the existence of an improper


integral.

4.2 Theorem

Let f and g have the same infinite discontinuity points. If |f| ≤ g
and ∫_B g dV exists, then so does ∫_B f dV.

Proof. Let {BN } be an increasing family of sets converging to B.


Since/ + |/| < 2 |/| < 2g, we have

f {f+\f\)dV <l[ gdV <l[ gdV.


JBn JB.v JB
-


Then, because/ + 1/1 > 0, the value of the integral j"^ (/ + l/l) dV
increases as J5 A- increases, and we have

limf (f+\f\)dV = l1 <2f gdV.


y JBy JB
Similarly,

limf \f\dV=l 2 <( gdV.


N JBy JB
Finally,

limf fdV = lim(( {f+\f\)dV-[ \f\dv\


N JBy .V \jBy JBy I

= limf (/+ l/l) dV- limf \f\dV = k-l z .

N JBy N JBy
Since the family B N is arbitrary, B J"dV is defined.
jj

Example 7. Let B be the disk x 2 + y 2 < 1 in ft 2 , and let/be defined by


f(x 2
+ >'
2 )" 1/2
, for x > and x2 +j >2
0.

10
2
+ /) 1/2 ,
for x < 0.

Since $B +y(x 2 2 )~ 1/2


dx dy exists, and < {x + j )~
|/(x, y)\ 2 2 1/2
for
< x2 +J < 2
1 , it follows from Theorem 4.2 that j/?/c/x c/y exists.
The computation of the integral of/ is left as an exercise.

~
Example 8. Let j{x) = (— \) n l jn for n — 1 < x < n and » = 1,2,
3, . . The graph ofy is shown in Fig. 31 as far out as x = 4. Then
. .

;(x)rfx = 2— y — =io g 2,
71-+00 JO
and we can write

;(X) </x = log 2,


I
if it is understood that the passage to the limit has been carried out in this
special way. This example shares with the principal-value examples the
property that the value assigned to the integral depends on having taken

Figure 31

a limit over some particular sequence of regions. Such an integral is

called conditionally convergent. For another example see Exercise 8.

EXERCISES
1. In each part determine whether the integral is defined or not. If it is defined,
compute its value.

f=° dx
(a)
TF^TT • Ans W2-]
X2 + 1
\- -

dx
(b)
x- - 1

dx
(c)
Jo v 1 — x1 -

dx dy
(d)

(x — y) dx dy
(e)
|

2 g— , where R is the rectangle max (\x\, \y\) < 1.

f dx dy dz
'
2
J x*+v*+z*>i (* + y1 + z2)2

f ^/a: dy dz
(g)
"
J x*+v*+z*>i xyz
xyi

x ~ v~ z
(h) e~ dx dy dz, where C the infinite column
I
|

max
is

(|.r|, \y\) < 1, z > 0.

2. Prove that
~1 -
(a) T(n) = e- x x n dx = (n 1)!

for h > 1 an integer.

(b) Express J
r e~*(.x:
— j) _1/2 */x </y in terms of r, where T is the region
x >y > 0. [/4«5. 2r(|).]

<^
—-
3. Let 5 be the ball |x| < 1 in A". For what values of a does
,

f:
|
-
exist?
_ Ixl
C
x x dx 1
x dx
C
4. Compute: (a) p. v. (b) p.v.
J —x X~
,
-t I
.

_i 2x - 1

1
f x</.v
5. Compute the values of the function ^-(v) = , taking a principal
J-i v - y -

value of the integral when necessary.

6. Compute the integral of the function/in Example 7 in the text, and compute
\ B (x 2 + f)-1 '* dx dy. [Ans. |»r, 2w.]

7. Show that the integral of the function j in Example 8 of the text depends on
the sequence of sets used to compute the limit. [Suggestion. Take each jB v

to be a disconnected set of intervals.]

8. Let f{x, y) = 2
sin (x 2 + y ) over the quadrant Q defined by x > 0, y > 0.

Show that Iq/cIA converges conditionally. (Suggestion. To get a limit,


integrate over increasing squares. Then integrate over quarter disks.)

9. In what sense does each of the following integrals exist? The possibilities
are ordinary Riemann integral, improper integral, conditionally convergent
integral, or none of these.

/""sin* f"
(a) —x j- dx. (c) sin x dx.
Jit Jo
00 1
f sin x f 1
(b) dx. (d) sin-dx.
Jo x Jo x

10. The integral f(x) dx is said to be Abel summable to the value k if


Jo
/• CO

lim e~ €xf(x) dx = k.
e— 0+ Jo
/*CO (*DO

Find the Abel value of (a) smxdx, (b) cosx^/x.


Jo Jo

~ y2
11. (a) Compute e~ x2 dxdy. (Use polar coordinates.)
Jj? 2

(b) Use the result of part (a) to compute J^ e~ x * dx.


(c) Compute J^« exp (— x\ — ... — x 2 ) dV.

12. (a) Show that the area bounded by the graph of y = \\x, the x-axis, and
x
the line = 1 is infinite,

(b) Compute the volume swept out by rotating the region described in
part (a) about the jc-axis.

13. Let /be positive and unbounded on an unbounded set B in R 2


. Consider
the region C between the graph off and B, and show that if

S B fdA and $c dV
both exist, then they are equal.

14. Show that if


JB \f\
\dV exists, then so does §B fdV. Without conditions
(1)and (2), this is false. For example, let B be the unit interval <x < 1
and /the function
if x is rational.
11,
— 1, if x is irrational.

15. Show that if the ordinary Riemann integral l B fdV exists, then it exists as an
improper integral (given conditions (1) and (2)) and the two integrals are
equal.

16. A nonnegative function 31" — > 31, integrable over a region R in Rn , is

called a probability density if


§R p dV = 1. The mean of/? is defined to be
the vector

M[p] = xp(x) dV,


JR
and the variance of p is the real number

° 2 [p] = I
|x - M[p]\ 2 p(x) dV.
JR
Show that each of the following functions is a probability density, and
compute its mean and variance if they exist.

(a) p(x,y) = (l/27r)e~ (x2+!/2)/2 . [Hint. Use polar coordinates.]


(b) p(x) = (tt(1 + x2))- 1 .

SECTION 5

ESTIMATES OF
In many of the examples of this chapter, numerical evaluation of integrals
INTEGRALS
has been made by using the fundamental theorem of calculus to arrive at a
precise answer in terms of some elementary function. In practice, such a
computation is very often not feasible, and then an estimate for the value
of an integral may have to serve instead. The fundamental inequality
used in making estimates is contained in Theorem 2.8. We repeat it here.

5.1 Iff and g are integrable over B, and/ <g on B, then

f fdV <[ gdV.


JB JB

Example 1. The function of one variable defined by

$f(x) = \frac{\sin x}{(1 + x^2)^2}, \qquad 0 \le x \le 1,$

has an integral which is difficult to compute. Comparing the graphs of $x$
and $\sin x$ for $0 \le x \le 1$ shows that $0 \le \sin x \le x$ there, as we see from Fig. 32.
Then, since $(1 + x^2)^2 > 0$,

$0 \le \frac{\sin x}{(1 + x^2)^2} \le \frac{x}{(1 + x^2)^2}.$

Hence, by 5.1,

$0 \le \int_0^1 \frac{\sin x}{(1 + x^2)^2}\,dx \le \int_0^1 \frac{x}{(1 + x^2)^2}\,dx.$

But the latter integral is easy to compute:

$\int_0^1 \frac{x\,dx}{(1 + x^2)^2} = \left[-\frac{1}{2(1 + x^2)}\right]_0^1 = 0.25;$

Figure 32

hence

$0 \le \int_0^1 \frac{\sin x}{(1 + x^2)^2}\,dx \le 0.25.$

Example 2. Suppose we want to estimate the integral of $f(x, y) = \cos(x + y/2)$
over the square $S\colon 0 \le x \le \tfrac12$, $0 \le y \le \tfrac12$. The graph of $f$ is
shown in Fig. 33, lying between the horizontal planes at $z = 1$ and $z = \cos\tfrac34 > 0.73$. Thus

$0.73 < \cos\left(x + \frac{y}{2}\right) \le 1.$

Figure 33

By 5.1 we have

$\int_S 0.73\,dx\,dy \le \int_S \cos\left(x + \frac{y}{2}\right) dx\,dy \le \int_S 1\,dx\,dy.$

Evaluating the largest and smallest of these integrals is easy because they
are the volumes of solid rectangles with base $S$ and heights 0.73 and 1
respectively. Since the area of $S$ is 0.25, we get

$0.1825 \le \int_S \cos\left(x + \frac{y}{2}\right) dx\,dy \le 0.25.$

The estimates in the two preceding examples are fairly rough because
we have replaced the given integrand $f$ by approximating functions that
differ from $f$ considerably over a relatively large part of the domain of
integration. Of course, the estimates could be improved by choosing
approximating functions that agree with $f$ more closely. However, to be
really useful, the approximating functions themselves should be easy to
integrate. One way to achieve this is to choose for an approximating
function a step function that is constant on each subrectangle of a rectangular
grid in the domain of integration. The step function can be chosen
so that its integral is a Riemann sum, and a sequence of these can be
chosen so as to converge to the value of the integral.

Example 3. The integral

$\int_R (1 + x + y)^3\,dx\,dy,$

where $R$ is the rectangle $0 \le x \le 1$, $0 \le y \le \tfrac12$, can be approximated by
Riemann sums. We subdivide the rectangle by a grid with corners at the
points $(j/N,\, k/2M)$, where $j = 1, \dots, N$ and $k = 1, \dots, M$. Evaluating the
integrand at the upper right corner of each subrectangle gives function
values of the form $(1 + j/N + k/2M)^3$. Each subrectangle has area
$(1/N)(1/2M)$. The corresponding Riemann sum is

$\sum_{j=1}^{N} \sum_{k=1}^{M} \left(1 + \frac{j}{N} + \frac{k}{2M}\right)^{3} \frac{1}{2NM}.$

To simplify the expression, we can take $N = M$; hence the sum takes the form

$S_N = \sum_{j=1}^{N} \sum_{k=1}^{N} \left(1 + \frac{j}{N} + \frac{k}{2N}\right)^{3} \frac{1}{2N^2}.$

Evaluating sums like this for even moderately large values of $N$ is best
done on a computer. We get a table of values of $S_N$ for increasing $N$.
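Such a table is easy to produce. Here is a minimal Python sketch of the computation (the function name and the chosen values of $N$ are ours):

```python
def riemann_sum(N):
    """Upper-right-corner Riemann sum S_N for (1 + x + y)^3 over 0 <= x <= 1, 0 <= y <= 1/2,
    using an N-by-N grid as in Example 3 (so the y-spacing is 1/(2N))."""
    total = 0.0
    for j in range(1, N + 1):
        for k in range(1, N + 1):
            total += (1 + j / N + k / (2 * N)) ** 3
    return total / (2 * N * N)

exact = (2.5**5 - 2**5 - 1.5**5 + 1) / 20   # = 2.953125, by the fundamental theorem of calculus
for N in (10, 50, 200):
    print(N, riemann_sum(N), exact)
```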

5.2 Theorem

Let $f$ be continuous on a closed bounded rectangle $R$ in $\mathbb{R}^n$, defined
by $a_i \le x_i \le b_i$, $i = 1, \dots, n$, and let $f$ have partial derivatives
satisfying

$\left|\frac{\partial f}{\partial x_i}(\mathbf{x})\right| \le M_i, \qquad i = 1, \dots, n.$

Let $R$ be subdivided by a grid $G$ into rectangles of width $h_i = (b_i - a_i)/N_i$
in the $x_i$-coordinate, where $N_i$ is an integer. Then the
error in approximating $\int_R f\,dV$ by a Riemann sum based on $G$ is at
most

$V(R) \sum_{i=1}^{n} h_i M_i,$

where $V(R)$ is the volume of $R$.

Proof. Since $R = R_1 \cup \dots \cup R_N$, where the $R_k$ are the disjoint subrectangles
of the grid, we can write

$\int_R f\,dV = \sum_{k=1}^{N} \int_{R_k} f\,dV.$

Then

$\int_R f\,dV - S_N = \sum_{k=1}^{N} \int_{R_k} \bigl(f - f(\mathbf{x}_k)\bigr)\,dV,$

because $f(\mathbf{x}_k)$ is constant on $R_k$. Hence

$\left|\int_R f\,dV - S_N\right| \le \sum_{k=1}^{N} \int_{R_k} |f - f(\mathbf{x}_k)|\,dV \le \sum_{k=1}^{N} V(R_k) \max_{\mathbf{x}\ \text{in}\ R_k} |f(\mathbf{x}) - f(\mathbf{x}_k)|.$

But by the mean-value theorem, for $\mathbf{x}$ and $\mathbf{y}$ in $R_k$,

$|f(\mathbf{x}) - f(\mathbf{y})| = |f'(\mathbf{z})(\mathbf{x} - \mathbf{y})| \le \sum_{i=1}^{n} \max_{R_k}\left|\frac{\partial f}{\partial x_i}\right| h_i \le \sum_{i=1}^{n} h_i M_i.$

By the previous inequality,

$\left|\int_R f\,dV - S_N\right| \le \sum_{k=1}^{N} V(R_k) \sum_{i=1}^{n} h_i M_i = V(R) \sum_{i=1}^{n} h_i M_i,$

as was to be shown.

Example 4. The integral

$I = \int_R (1 + x + y)^3\,dx\,dy$

of Example 3 has integrand $f(x, y) = (1 + x + y)^3$. We have $f_x(x, y) =
f_y(x, y) = 3(1 + x + y)^2$. The rectangle $R$ determined by $0 \le x \le 1$ and
$0 \le y \le \tfrac12$ is such that

$\max_{(x,y)\ \text{in}\ R} |f_x(x, y)| = \max_{(x,y)\ \text{in}\ R} |f_y(x, y)| = 3\left(\tfrac52\right)^2 = \tfrac{75}{4}.$

Thus $M_1 = M_2 = \tfrac{75}{4}$. If we subdivide into $K$ equal parts along the $x$-axis
and $L$ equal parts along the $y$-axis, we have $h_1 = 1/K$ and $h_2 = 1/2L$.
Then a Riemann sum $S_{KL}$ based on such a grid will satisfy

$|I - S_{KL}| \le \frac12\left(\frac{75}{4}\right)\left(\frac{1}{K} + \frac{1}{2L}\right).$

By choosing $L = 200$ and $K = 400$, we get the error bound

$|I - S_{KL}| < 0.05.$

Integration over nonrectangular regions poses a problem in making
estimates because there is likely to be error not only in approximating the
function being integrated but also in fitting a grid to the region of integration.
A bounded domain of integration $B$ can always be extended to a
rectangular one by the device of extending the integrand $f$ to the function
$f_B$, which is zero outside $B$. However, the error estimate in Theorem 5.2
will then usually no longer be applicable because $f_B$ is, in general, discontinuous
and so will fail to have the required partial derivatives. In
some examples, the change-of-variable theorem can be used to transform
the domain of integration into a rectangular one. Some care should then
be used in the choice of transformation to ensure that it doesn't complicate
the integrand unnecessarily.

Example 5. Let $Q$ be the quarter disk in the first quadrant of $\mathbb{R}^2$
defined by $0 \le x$, $0 \le y$, $x^2 + y^2 \le 1$. Using the polar coordinate
transformation

$x = u \cos v, \qquad y = u \sin v$

for $0 \le u \le 1$ and $0 \le v \le \pi/2$, we can transform $Q$ into the rectangle $R$
shown in Fig. 34.

Figure 34

Given an integral of the form

$I = \int_Q f(x, y)\,dx\,dy,$

the change-of-variable theorem shows that

$I = \int_R f(u \cos v, u \sin v)\,u\,du\,dv,$

where $u$ is the Jacobian determinant of the polar coordinate transformation.
Since $R$ is a rectangle, the estimates of this section apply readily.
For example, if $f(x, y) = (x^2 + y^3)\sqrt{x^2 + y^2}$,

$I = \int_R (u^2 \cos^2 v + u^3 \sin^3 v)\,u^2\,du\,dv.$

Setting $g(u, v) = u^4 \cos^2 v + u^5 \sin^3 v$, we find

$\left|\frac{\partial g}{\partial u}\right| = |4u^3 \cos^2 v + 5u^4 \sin^3 v| \le 9$

and

$\left|\frac{\partial g}{\partial v}\right| = |-2u^4 \cos v \sin v + 3u^5 \sin^2 v \cos v| \le 5$

for $0 \le u \le 1$, $0 \le v \le \pi/2$. Now suppose $S_{NM}$ is a Riemann sum
based on a grid with $N$ equal subdivisions along the $u$-axis and $M$ equal
subdivisions along the $v$-axis. Then Theorem 5.2 shows that

$|I - S_{NM}| \le \frac{15}{N} + \frac{13}{M}.$

By taking $N = M = 60$, we get the error bound

$|I - S_{60,60}| \le \frac{15}{60} + \frac{13}{60} = \frac{28}{60} < 0.47.$
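The transformed integral is also easy to approximate directly on a machine. The sketch below is our own illustration (the choice of corner evaluation points is ours, and the exact value quoted in the comment comes from elementary antiderivatives):

```python
import math

def polar_riemann(N, M):
    """Riemann sum for g(u, v) = u^4 cos^2 v + u^5 sin^3 v over 0 <= u <= 1, 0 <= v <= pi/2,
    evaluating at upper-right corners of an N-by-M grid (one possible choice)."""
    du, dv = 1.0 / N, (math.pi / 2) / M
    total = 0.0
    for i in range(1, N + 1):
        u = i * du
        for j in range(1, M + 1):
            v = j * dv
            total += (u ** 4 * math.cos(v) ** 2 + u ** 5 * math.sin(v) ** 3) * du * dv
    return total

# The exact value is pi/20 + 1/9, roughly 0.268, so S_60,60 lies well inside the 0.47 bound.
print(polar_riemann(60, 60), math.pi / 20 + 1.0 / 9.0)
```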

EXERCISES
1. Find rough estimates above and below for the following integrals by using
the largest and smallest values of the integrand on the domain of integration.

(a) Vl + cos x dx. (d) e x+v dx dy.


Jo Ja;2+w2<i

(b) e*
2
dx. (e) dx\ dy\ Vx + y + zdz.
Jo Jo Jo
ri /»2

(c)
Pdy P cos xy dy. (f) dx\ dy\
r>i

(x 2 + y2 + z 2 )dz.
Jo Jo Jo Ji J-i

2. Estimate each of the following integrals by computing a Riemann sum based
   on a subdivision of the domain of integration into four equal rectangles.

   (a) $\int_0^1 (1 + x^2)\,dx$.    (b) $\int_0^1 dx \int_0^1 (x^2 + y^2)\,dy$.

3. For each integral in Problem 2, estimate the error in making the approximation.
   Use Theorem 5.2.

4. Find estimates for the error in approximating the integrals in Problem 2
   if the number of equal subdivisions is 10 in part (a) and $10^2 = 100$ in part (b).
   Use Theorem 5.2.

5. Use a computer to make the estimates by Riemann sums described in Problem 4.

6. If an integral is approximated by a Riemann sum with function values taken
   at the midpoint of each rectangle of the grid, then the integral is said to be
   approximated by the midpoint rule. It can be shown that, in this case, the
   error estimate of Theorem 5.2 can be replaced by one involving the squares
   $h_i^2$ and bounds $M_i^{(2)} \ge |(\partial^2 f/\partial x_i^2)(\mathbf{x})|$ for $i = 1, 2, \dots, n$. If the integrals in Problem
   2 are estimated using the midpoint rule for 10 equal subdivisions in part (a)
   and $10^2 = 100$ in part (b), find a bound for the error.

7. Transform the integral below by the change-of-variable theorem into one
   over a rectangle. Then use Theorem 5.2 to estimate the error in approximating
   it by a Riemann sum with $10^2 = 100$ equal subdivisions.

   $\int (x + y^2)\,dx\,dy.$

SECTION 6

NUMERICAL INTEGRATION

The estimates of the previous section are rather crude relative to the degree
of accuracy often required in numerical work. The purpose of this section
is to show how greater accuracy can be obtained. Theoretically, Theorem
5.2 may provide an arbitrarily high degree of accuracy. However, to
achieve it, the amount of computing time required may be very great.
Furthermore, so much arithmetic may have to be done that the accumulation
of round-off error becomes unacceptably large. To get
around these problems we need methods capable of giving good accuracy
without so much arithmetic. One such method is Simpson's rule, which
we first describe just for 1-dimensional integrals and then apply to multiple
integrals.
Simpson's rule is motivated by the elementary observation that, for
any quadratic polynomial $q(x) = Ax^2 + Bx + C$,

$\int_a^b q(x)\,dx = \frac{b - a}{6}\left[q(a) + 4q\!\left(\frac{a + b}{2}\right) + q(b)\right]. \qquad (1)$

The proof of this formula is outlined in Example 5, Chapter 2, Section 2.
To apply Equation (1) to approximate $\int_a^b f(x)\,dx$, we set

$x_0 = a, \qquad x_1 = \frac{a + b}{2}, \qquad x_2 = b,$

and find a quadratic polynomial $q$ such that

$q(x_0) = f(x_0), \qquad q(x_1) = f(x_1), \qquad q(x_2) = f(x_2). \qquad (2)$

(Since $q(x)$ has the form $Ax^2 + Bx + C$ for some $A$, $B$, and $C$, to determine
$q$ we need only solve for $A$, $B$, and $C$ in the equations $Ax_k^2 + Bx_k + C = f(x_k)$,
$k = 0, 1, 2$. However this need not be done in practice.)
The approximation we make is then

$\int_a^b f(x)\,dx \approx \int_a^b q(x)\,dx.$

Since by Equation (2), $f$ and $q$ are equal at $x_0$, $x_1$, and $x_2$, we can appeal to
Equation (1) to get

6.1 $\int_a^b f(x)\,dx \approx \frac{b - a}{6}\left[f(a) + 4f\!\left(\frac{a + b}{2}\right) + f(b)\right].$

This formula is called Simpson's three-point approximation. Figure 35
shows that the approximation may be good or bad, depending on the
shape of $f$. In Fig. 35(a), the graph of $q$ lies close to that of $f$ over $[a, b]$,
and in addition there is cancellation between the shaded and unshaded
areas because $q$ lies alternately above and below $f$. In Fig. 35(b), the graphs
are not particularly close, and there is no cancellation.

Figure 35

To improve the approximation, subdivide $[a, b]$ further into an even
number $N$ of equal intervals by a new choice of points $x_0, x_1, \dots, x_N$:

$a = x_0 < x_1 < x_2 < \dots < x_{N-1} < x_N = b, \qquad N \text{ even.}$

Since

$\int_a^b f(x)\,dx = \int_{x_0}^{x_2} f(x)\,dx + \int_{x_2}^{x_4} f(x)\,dx + \dots + \int_{x_{N-2}}^{x_N} f(x)\,dx,$

we can apply the three-point rule $N/2$ times to get

$\int_a^b f(x)\,dx \approx \frac{b - a}{6(N/2)}\left[f(x_0) + 4f(x_1) + f(x_2)\right] + \dots + \frac{b - a}{6(N/2)}\left[f(x_{N-2}) + 4f(x_{N-1}) + f(x_N)\right].$

Combining terms gives

6.2 $\int_a^b f(x)\,dx \approx \frac{b - a}{3N}\left[f(x_0) + 4f(x_1) + 2f(x_2) + \dots + 4f(x_{N-1}) + f(x_N)\right].$

Formula 6.2 is called Simpson's rule.

Example 1. Applying Simpson's rule to $I = \int_0^1 e^{x^2}\,dx$ for small values
of $N$ gives the approximation $S_N$:

$S_N = \frac{1 - 0}{3N}\left[e^0 + 4e^{(1/N)^2} + 2e^{(2/N)^2} + \dots + 4e^{((N-1)/N)^2} + e^1\right].$

We get a short table of values of $S_N$ for increasing $N$.
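A general composite Simpson routine takes only a few lines. Here is a minimal Python sketch (the function name and the test values of $N$ are our own choices):

```python
import math

def simpson(f, a, b, N):
    """Composite Simpson's rule (Formula 6.2); N must be an even number of subintervals."""
    if N % 2 != 0:
        raise ValueError("N must be even")
    h = (b - a) / N
    points = [a + j * h for j in range(N + 1)]
    coeffs = [1] + [4 if j % 2 == 1 else 2 for j in range(1, N)] + [1]
    return (b - a) / (3 * N) * sum(c * f(x) for c, x in zip(coeffs, points))

f = lambda x: math.exp(x * x)
for N in (2, 4, 10, 100):
    print(N, simpson(f, 0.0, 1.0, N))   # converges quickly to about 1.4627
```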

Thus to prove the theorem we need only consider the three-point
rule on $[a, b]$ and show that

$\left|\int_a^b f(x)\,dx - \frac{h}{3}\left[f(a) + 4f(c) + f(b)\right]\right| \le \frac{M}{90}\,h^5,$

where $h = (b - a)/2$ and $c = (b + a)/2$. First let

$E(t) = \int_{c-t}^{c+t} f(x)\,dx - \frac{t}{3}\left[f(c + t) + 4f(c) + f(c - t)\right]$

and

$F(t) = E(t) - \frac{t^5}{h^5}\,E(h).$

Straightforward computation shows that

$F'(t) = \tfrac23\left[f(c + t) - 2f(c) + f(c - t)\right] - \tfrac{t}{3}\left[f'(c + t) - f'(c - t)\right] - \frac{5t^4}{h^5}E(h),$

$F''(t) = \tfrac13\left[f'(c + t) - f'(c - t)\right] - \tfrac{t}{3}\left[f''(c + t) + f''(c - t)\right] - \frac{20t^3}{h^5}E(h),$

$F'''(t) = -\tfrac{t}{3}\left[f'''(c + t) - f'''(c - t)\right] - \frac{60t^2}{h^5}E(h).$

Now we apply the mean-value theorem to $F$ at $t = 0$ and $t = h$.
Since $F(0) = F(h) = 0$, we can conclude that $F'(t_1) = 0$ for
some $t_1$ between 0 and $h$. Since $F'(0) = 0$, we can apply the mean-value
theorem to $F'$ at $t = 0$ and $t = t_1$ to conclude that $F''(t_2) = 0$
for some $t_2$ between 0 and $t_1$. But $F''(0) = 0$ also; so as before we
conclude that $F'''(t_3) = 0$ for some $t_3$ between 0 and $t_2$. This implies
that

$E(h) = -\frac{h^5}{180\,t_3}\left[f'''(c + t_3) - f'''(c - t_3)\right]. \qquad (3)$

Again using the mean-value theorem, we get

$f'''(c + t_3) - f'''(c - t_3) = 2t_3\,f^{(4)}(t_4) \qquad (4)$

for some $t_4$ between $c - t_3$ and $c + t_3$. Since $0 < t_3 < h$, the number
$t_4$ lies in $(a, b)$. We also have from (3) and (4)

$E(h) = -\frac{h^5}{90}\,f^{(4)}(t_4).$

This completes the proof.

Example 2. We apply the previous theorem to

$I = \int_0^1 e^{x^2}\,dx.$

We find $f^{(4)}(x) = 12e^{x^2} + 48x^2e^{x^2} + 16x^4e^{x^2}$; so on $[0, 1]$, we have
$|f^{(4)}(x)| \le 12e + 48e + 16e < 228$. Then the error in replacing the
integral by the Simpson approximation with $N$ subdivisions is at most

$\frac{(1 - 0)^5}{180N^4}\,228 \le \frac{1.3}{N^4}.$

Thus for $N = 10$ the error is at most 0.00013, and for $N = 100$ it is at
most 0.000000013.

To see how to apply Simpson's rule to a multiple integral over some
rectangle, we apply Formula 6.2 repeatedly to an iterated integral in some
order. Thus, if

$\int_R f(x, y)\,dx\,dy = \int_c^d dy \int_a^b f(x, y)\,dx,$

we consider for each $y$ in $[c, d]$ the approximation

$F(y) = \int_a^b f(x, y)\,dx \approx \frac{b - a}{3J}\sum_{j=0}^{J} A_j f(x_j, y), \qquad (5)$

where the numbers $A_j$ follow the Simpson rule pattern $A_0 = 1$, $A_1 = 4$,
$A_2 = 2, \dots, A_{J-1} = 4$, $A_J = 1$, and $J$ is even. The points $x_j$ are evenly
spaced at $x_j = a + j(b - a)/J$, for $j = 0, 1, \dots, J$. We then apply Simpson's
rule to the integral of $F$ with $K$ intervals in $[c, d]$, getting

$\int_c^d F(y)\,dy \approx \frac{d - c}{3K}\sum_{k=0}^{K} B_k F(y_k), \qquad (6)$

where $B_0 = 1$, $B_1 = 4, \dots, B_{K-1} = 4$, $B_K = 1$, as usual. At the risk of
increasing the error, we substitute the approximation in Equation (5)
into the one in (6) to get

6.4 $\int_R f(x, y)\,dx\,dy \approx \frac{(b - a)(d - c)}{9JK}\sum_{k=0}^{K}\sum_{j=0}^{J} A_j B_k\,f(x_j, y_k),$

where $x_j = a + j(b - a)/J$ and $y_k = c + k(d - c)/K$. The products
$A_jB_k$ are simply the products of the usual Simpson coefficients. For
example, if $A_3 = 4$ and $B_2 = 2$, then $A_3B_2 = 8$.

Example 3. Consider

$\int_R (x - y)^3\,dx\,dy,$

written as an iterated integral over a rectangle $R$. Choosing $J = 4$ and $K = 2$,
we find an approximation based on evaluation of the integrand at the
points indicated in Fig. 36, each point carrying the associated product
coefficient $A_jB_k$.

Figure 36
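Formula 6.4 is also straightforward to program. The following Python sketch is our own (and the limits used in the sample call are illustrative, not those of the text):

```python
def simpson_coeffs(n):
    """Simpson coefficients 1, 4, 2, 4, ..., 4, 1 for an even number n of subintervals."""
    return [1] + [4 if j % 2 == 1 else 2 for j in range(1, n)] + [1]

def simpson_2d(f, a, b, c, d, J, K):
    """Two-dimensional Simpson approximation (Formula 6.4) on the rectangle
    a <= x <= b, c <= y <= d; J and K must be even."""
    A, B = simpson_coeffs(J), simpson_coeffs(K)
    total = 0.0
    for k in range(K + 1):
        y = c + k * (d - c) / K
        for j in range(J + 1):
            x = a + j * (b - a) / J
            total += A[j] * B[k] * f(x, y)
    return (b - a) * (d - c) / (9 * J * K) * total

# The integrand of Example 3 on a sample rectangle:
print(simpson_2d(lambda x, y: (x - y) ** 3, 1, 2, 1, 3, 4, 2))
```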

For a triple integral over a solid rectangle $R$ in $\mathbb{R}^3$ defined by $a_i \le x_i \le b_i$,
$i = 1, 2, 3$, we apply the one-variable Simpson's rule three times to get the
3-dimensional Simpson rule:

6.5 $\int_R f(x, y, z)\,dV \approx \frac{(b_1 - a_1)(b_2 - a_2)(b_3 - a_3)}{27JKL}\sum_{l=0}^{L}\sum_{k=0}^{K}\sum_{j=0}^{J} A_j B_k C_l\,f(x_j, y_k, z_l),$

where $x_j = a_1 + j(b_1 - a_1)/J$, $y_k = a_2 + k(b_2 - a_2)/K$, and $z_l =
a_3 + l(b_3 - a_3)/L$. Once again the $A$'s, $B$'s, and $C$'s are the usual
Simpson coefficients; hence, for example, with $A_2 = 2$, $B_3 = 4$, and
$C_3 = 4$, we have $A_2B_3C_3 = 32$. Even in the case $J = K = L = 2$,
the sum in 6.5 has twenty-seven terms, so it is desirable to have a
computer to carry out the arithmetic.

Example 4. To approximate

$\int_{1.1}^{2.3} dz \int_{1.5}^{1.8} dy \int_{1}^{3.6} \log(xyz)\,dx,$

we apply Formula 6.5 to the solid rectangle $1 \le x \le 3.6$, $1.5 \le y \le 1.8$,
$1.1 \le z \le 2.3$. The edges of the rectangle have lengths 2.6, 0.3, and 1.2,
respectively; so we choose $J$, $K$, and $L$ in proportion. For a first approximation
we try $J = 16$, $K = 2$, and $L = 8$. The number of terms in the
sum $S_{16,2,8}$ is then $(17)(3)(9) = 459$. We have

$S_{16,2,8} = \frac{(3.6 - 1)(1.8 - 1.5)(2.3 - 1.1)}{27(16)(2)(8)}\sum_{l=0}^{8}\sum_{k=0}^{2}\sum_{j=0}^{16} A_j B_k C_l \log(x_j y_k z_l),$

where $x_j = 1 + j(3.6 - 1)/16$, $y_k = 1.5 + k(1.8 - 1.5)/2$, and $z_l =
1.1 + l(2.3 - 1.1)/8$. We compute $S_{16,2,8} = 1.66797$, approximately.
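A direct way to reproduce this number is to code Formula 6.5. The sketch below is our own (the comment's exact value comes from elementary antiderivatives of $\ln x$, $\ln y$, $\ln z$, which give roughly 1.66799):

```python
import math

def simpson_coeffs(n):
    return [1] + [4 if j % 2 == 1 else 2 for j in range(1, n)] + [1]

def simpson_3d(f, box, J, K, L):
    """Formula 6.5 on the solid rectangle box = (a1, b1, a2, b2, a3, b3); J, K, L even."""
    a1, b1, a2, b2, a3, b3 = box
    A, B, C = simpson_coeffs(J), simpson_coeffs(K), simpson_coeffs(L)
    total = 0.0
    for l in range(L + 1):
        z = a3 + l * (b3 - a3) / L
        for k in range(K + 1):
            y = a2 + k * (b2 - a2) / K
            for j in range(J + 1):
                x = a1 + j * (b1 - a1) / J
                total += A[j] * B[k] * C[l] * f(x, y, z)
    return (b1 - a1) * (b2 - a2) * (b3 - a3) / (27 * J * K * L) * total

box = (1.0, 3.6, 1.5, 1.8, 1.1, 2.3)
print(simpson_3d(lambda x, y, z: math.log(x * y * z), box, 16, 2, 8))   # about 1.66797
```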
To improve the accuracy in Example 4, we can refine the subdivision.
However, since the arithmetic needed is fairly time-consuming, it is
helpful to have an error estimate like 6.3 to use as a guide in choosing a
grid.

6.6 Theorem

Let $f$ be continuous on a closed, bounded rectangle $R$ in $\mathbb{R}^n$, and
have partial derivatives of order 4 satisfying

$\left|\frac{\partial^4 f}{\partial x_i^4}(\mathbf{x})\right| \le M_i, \qquad i = 1, \dots, n.$

Let $R$ be subdivided by a grid $G$ into rectangles of width

$h_i = \frac{b_i - a_i}{N_i}$

in the $x_i$-coordinate, where each $N_i$ is an even integer. Then the
error in making a Simpson approximation to $\int_R f\,dV$ based on $G$ is
at most

$\frac{V(R)}{180}\sum_{i=1}^{n} h_i^4 M_i.$

Proof. The theorem contains Theorem 6.3 as the special case
$n = 1$, and is proved by reducing it to repeated application of 6.3.
The case $n = 2$ presents the ideas adequately because it is the
essential step in an inductive argument. Setting

$E = \int_c^d dy \int_a^b f(x, y)\,dx - \frac{(b - a)(d - c)}{9JK}\sum_{j=0}^{J}\sum_{k=0}^{K} A_j B_k f(x_j, y_k),$

we add and subtract an intermediate term to get

$E = \int_c^d \left[\int_a^b f(x, y)\,dx - \frac{b - a}{3J}\sum_{j=0}^{J} A_j f(x_j, y)\right] dy + \frac{b - a}{3J}\sum_{j=0}^{J} A_j \int_c^d f(x_j, y)\,dy - \frac{(b - a)(d - c)}{9JK}\sum_{j=0}^{J}\sum_{k=0}^{K} A_j B_k f(x_j, y_k).$

Using the triangle inequality, and interchanging summation and
integration, we have

$|E| \le \int_c^d \left|\int_a^b f(x, y)\,dx - \frac{b - a}{3J}\sum_{j=0}^{J} A_j f(x_j, y)\right| dy + \frac{b - a}{3J}\sum_{j=0}^{J} A_j \left|\int_c^d f(x_j, y)\,dy - \frac{d - c}{3K}\sum_{k=0}^{K} B_k f(x_j, y_k)\right|.$

We now apply 6.3, the single-variable case, to the two expressions
inside the absolute value signs. Thus

$|E| \le \int_c^d \left[\frac{(b - a)^5}{180J^4}M_1\right] dy + \frac{b - a}{3J}\sum_{j=0}^{J} A_j \left[\frac{(d - c)^5}{180K^4}M_2\right].$

But now the expressions in square brackets are independent of $y$
and $j$, respectively. Furthermore, it is easy to check that $\sum_{j=0}^{J} A_j =
3J$. Hence

$|E| \le (d - c)\frac{(b - a)^5}{180J^4}M_1 + (b - a)\frac{(d - c)^5}{180K^4}M_2 = \frac{(d - c)(b - a)}{180}\left[\frac{(b - a)^4}{J^4}M_1 + \frac{(d - c)^4}{K^4}M_2\right],$

which is what we wanted to show.

Example 5. Returning to the integral

$\int_{1.1}^{2.3} dz \int_{1.5}^{1.8} dy \int_{1}^{3.6} \log(xyz)\,dx$

of Example 4, we find

$V(R) = (2.3 - 1.1)(1.8 - 1.5)(3.6 - 1) < 1.$

Also, from the integrand $f(x, y, z) = \log(xyz)$, we find

$\frac{\partial^4 f}{\partial x^4}(x, y, z) = -6x^{-4}, \qquad \frac{\partial^4 f}{\partial y^4}(x, y, z) = -6y^{-4}, \qquad \frac{\partial^4 f}{\partial z^4}(x, y, z) = -6z^{-4}.$

Since $x$, $y$, and $z$ are all greater than or equal to 1 on $R$, the partial
derivatives of order 4 are at most 6 in absolute value. Thus we can choose
$M_1 = M_2 = M_3 = 6$. The numbers $h_1$, $h_2$, and $h_3$ we estimate more carefully,
because they are to be raised to the fourth power. In fact, we take
$h_1 = 2.6/J$, $h_2 = 0.3/K$, and $h_3 = 1.2/L$, and estimate the resulting bound.
Computing this number for $J = 16$, $K = 2$, and $L = 8$ gives $E <$
0.000029. We recall that in Example 4 we computed the approximate value
of the integral by Simpson's rule to be 1.66797. The error estimate now
gives us a possible latitude of $\pm 0.000029$, hence less than $\pm 0.00003$.
Thus Theorem 6.6 indicates that the true value of the integral lies between
1.66794 and 1.66800.

EXERCISES
1. In applying Simpson's rule to the approximation of $%x sin xdx, what is

the smallest reasonable number of subintervals to use in order to take


advantage of the shape of the graph of x sin xl

2. Use Simpson's three-point rule to estimate each of the following integrals.

xi
(a) x* dx. (b) sin x dx. (c) e~ dx.
Ji Jo Jo

3. Use the Simpson's rule with four intervals to estimate each of the following
integrals.

/»1 /»1.4 /«0

x x
(a) sin x dx, (b) log sin x dx. (c) e dx.
Jo Ji J-i

4. Using a computer, apply Simpson's rule over 100 intervals to estimate each
of the following integrals.
2
.
2 1
-

dx f
23
:™dx
f f
(a) e*dx. (b)
-3-TT- < c>
Jo Jo X3 + I Jn X
5. Using Theorem 6.3, estimate the error in each approximation in Exercise 4.

6. Apply Formula 6.4 with J = K= 2 to estimate the following integrals.

(a) (x+y)dxdy. (b) dx\ (x - y) dy.

7. Using a computer and Formula 6.4 for / = K = 8, estimate the following


integrals.

/•l /»2 /»2.1 /M.l


(a) (x + evf dx dy. (b) dx log (x + y) dy.
Jo Ji Ji.i Jo

8. Use Theorem 6.6 to estimate the error in each approximation in Exercise 6.

9. Use Theorem 6.6 to estimate the error in each approximation in Exercise 7.

10. How large should / and K be chosen in Formula 6.4 to ensure that a Simpson
approximation to

dx e*» dy
Jo Jo

gives an error no more than 0.0001 ? Remember that J and K must be even
integers.

11. How large should J, K, and L be chosen in Formula 6.5 to ensure that a
Simpson approximation to

gives an error
H
Jo

no more than 0.0001


Jo

?
dy
/•1.2

e*
yz
dz

12. Let A be the annular region in $\mathbb{R}^2$ defined by $1 \le x^2 + y^2 \le 4$, that is, the
    region between concentric circles of radius 1 and 2 about (0, 0). The
    integral

    $\int_A (x^2 + 2y^2)\,dx\,dy$

    cannot be well approximated directly by Simpson's rule because A is not
    rectangular.

    (a) Transform the integral by the change-of-variable theorem into one over
    a rectangular region.
    (b) Approximate the integral found in part (a), accurate to within 0.005.
7

Vector Field Theory

SECTION 1

GREEN'S THEOREM

The fundamental theorem of calculus says that if $f'$ is integrable for
$a \le t \le b$, then

$\int_a^b f'(t)\,dt = f(b) - f(a). \qquad (1)$

In Section 2 of Chapter 4 the theorem has been extended to line integrals
of a gradient $\nabla f$ by the equation

$\int_\gamma \nabla f(\mathbf{x}) \cdot d\mathbf{x} = f(\mathbf{b}) - f(\mathbf{a}). \qquad (2)$

The main theorems of the present chapter are also variations on the idea
that an integral of some kind of derivative of a function can be evaluated
by using the values of the function itself on a set of lower dimension. We
begin with the version known as Green's theorem.

Let $D$ be a plane region whose boundary is a single curve $\gamma$, parametrized
by a function $g$ in such a way that, as $t$ increases from $a$ to $b$, $g(t)$
traces $\gamma$ once in the counterclockwise direction. An example is shown in
Fig. 1. If $F_1$ and $F_2$ are real-valued functions defined on $D$, including its
boundary, then Green's theorem says that

$\int_D \left(\frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\right) dx_1\,dx_2 = \int_\gamma F_1\,dx_1 + F_2\,dx_2 \qquad (3)$

under appropriate smoothness conditions. The requirement that $\gamma$ be
traced counterclockwise is the analog of the fact that, in Equations (1) and
(2), the differences on the right have to be taken in the proper order. The
analogy of Equation (3) with Equations (1) and (2) can be further
strengthened if we think of the integrand $(\partial F_2/\partial x_1) - (\partial F_1/\partial x_2)$ as a kind
of derivative of the vector field $F = (F_1, F_2)$. Section 8 contains a justification
of this viewpoint.

Figure 1

Example 1. Suppose $D$ is the square defined by $-1 \le x_1 \le 1$,
$-1 \le x_2 \le 1$, and let $F_1$ and $F_2$ be defined on $D$ by $F_1(x_1, x_2) = -x_2e^{x_1}$
and $F_2(x_1, x_2) = x_1e^{x_2}$. Then

$\frac{\partial F_2}{\partial x_1}(x_1, x_2) - \frac{\partial F_1}{\partial x_2}(x_1, x_2) = e^{x_2} + e^{x_1};$

so

$\int_D \left(\frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\right) dx_1\,dx_2 = \int_{-1}^{1}\int_{-1}^{1} (e^{x_2} + e^{x_1})\,dx_1\,dx_2 = 4\left(e - \frac{1}{e}\right).$

The boundary curve $\gamma$ can be parametrized in four pieces $\gamma_i$, $i = 1, 2, 3, 4$,
one for each side of the square, each defined for $-1 \le t \le 1$; for instance
$\gamma_1$, the right-hand side, is given by $g(t) = (1, t)$. Notice that the traversal
of $\gamma$ is counterclockwise, as is shown in Fig. 1.

On the first side of the square we have

$F_1\,dx_1 + F_2\,dx_2 = -x_2e^{x_1}\,dx_1 + x_1e^{x_2}\,dx_2 = e^t\,dt,$

since $x_1 = 1$ and $dx_1/dt = 0$ there; hence

$\int_{\gamma_1} F_1\,dx_1 + F_2\,dx_2 = \int_{-1}^{1} e^t\,dt = e - \frac{1}{e}.$

Similarly, the integrals over the other three sides are also equal to $(e - 1/e)$,
so

$\int_\gamma F_1\,dx_1 + F_2\,dx_2 = 4\left(e - \frac{1}{e}\right).$

Equation (3) is thus verified for this particular example.
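Both sides of Equation (3) can also be checked numerically. The following Python sketch is our own illustration; it traces the four sides counterclockwise with a simple midpoint rule and compares the result with $4(e - 1/e)$:

```python
import math

F = lambda x1, x2: (-x2 * math.exp(x1), x1 * math.exp(x2))   # the field of Example 1

# The four sides of the square, each traced so the whole boundary runs counterclockwise.
sides = [
    (lambda t: (1.0, t),   lambda t: (0.0, 1.0)),   # right side,  from (1,-1) to (1,1)
    (lambda t: (-t, 1.0),  lambda t: (-1.0, 0.0)),  # top side,    from (1,1) to (-1,1)
    (lambda t: (-1.0, -t), lambda t: (0.0, -1.0)),  # left side,   from (-1,1) to (-1,-1)
    (lambda t: (t, -1.0),  lambda t: (1.0, 0.0)),   # bottom side, from (-1,-1) to (1,-1)
]

def line_integral(n=2000):
    total, h = 0.0, 2.0 / n
    for g, gprime in sides:
        for k in range(n):
            t = -1 + (k + 0.5) * h
            (x1, x2), (d1, d2) = g(t), gprime(t)
            F1, F2 = F(x1, x2)
            total += (F1 * d1 + F2 * d2) * h
    return total

print(line_integral(), 4 * (math.e - 1 / math.e))   # both are about 9.40
```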

In the computation of a line integral, a given parametrization can
always be replaced by an equivalent one for which the line integral will
have the same value. In the previous example the boundary curve $\gamma$ was
given what appears to be the simplest parametrization, though any
equivalent one would do. The question becomes more important if the
boundary is presented without any parametrization, but merely as a set.
It may be necessary to choose a parametrization, and if Green's theorem
is to be applied, we shall see that this must be done so that the boundary is
traced just once, and in the proper direction.

The importance of the clockwise versus counterclockwise traversal of
the boundary becomes apparent when we observe that, for any line
integral, a reversal of the direction of the path changes the sign of the
integral. Thus, if $\gamma$ is parametrized by $g(t)$ for $a \le t \le b$, we can denote
by $\gamma^-$ the curve parametrized by $g^-(t) = g(a + b - t)$ for $a \le t \le b$.
It is clear that $\gamma^-$ is the same set as $\gamma$, but is traced in the opposite direction,
that is, from $g(b)$ to $g(a)$ instead of the other way around. Then, since
$(g^-(t))' = -g'(a + b - t)$, we have

$\int_{\gamma^-} F \cdot d\mathbf{x} = -\int_a^b F(g(a + b - t)) \cdot g'(a + b - t)\,dt.$

Changing variable by $t = a + b - u$ gives

$\int_{\gamma^-} F \cdot d\mathbf{x} = -\int_a^b F(g(u)) \cdot g'(u)\,du = -\int_\gamma F \cdot d\mathbf{x}.$

This proves the important formula

1.1 $\int_{\gamma^-} F \cdot d\mathbf{x} = -\int_\gamma F \cdot d\mathbf{x}.$

Green's theorem can be proved most easily for regions $D$ such that $\gamma$,
the boundary of $D$, is crossed at most twice by a line parallel to a coordinate
axis. Such a region is called simple. Thus a coordinate line intersects
the boundary of a simple region either in a line segment or else in at
most two points. In fact, using Equation 1.1, we can extend the theorem
to finite unions of simple regions. A few such are shown in Fig. 2, where
only $D_1$ is simple. In $D_2$ the boundary is shown traced not always counterclockwise,
but rather with the region always to the left as a point traces the
curve. For bounded regions with a single boundary curve, the two
descriptions of the orientation of the boundary amount to the same thing.

Figure 2

1.2 Green's Theorem

Let $D$ be a bounded plane region which is a finite union of simple
regions, each with a boundary consisting of a piecewise smooth
curve. Let $F_1$ and $F_2$ be continuously differentiable real-valued
functions defined on $D$ together with $\gamma$, the boundary of $D$. Then

$\int_D \left(\frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\right) dx_1\,dx_2 = \int_\gamma F_1\,dx_1 + F_2\,dx_2,$

where $\gamma$ is parametrized so that it is traced once, with $D$ on the left.

Proof. Consider first the case in which $D$ is a simple region, with
boundary $\gamma$ parametrized by

$g(t) = (g_1(t), g_2(t)), \qquad a \le t \le b.$

Since

$\int_\gamma F_1\,dx_1 + F_2\,dx_2 = \int_\gamma F_1\,dx_1 + \int_\gamma F_2\,dx_2,$

we can work with each of the terms on the right separately. We have

$\int_\gamma F_1(x_1, x_2)\,dx_1 = \int_a^b F_1(g_1(t), g_2(t))\,g_1'(t)\,dt.$

The curve $\gamma$ consists of the graphs of two functions $u(x_1)$ and $v(x_1)$,
perhaps together with one or two vertical segments, as shown in
Fig. 3. On a vertical segment, $g_1$ is constant, so $g_1' = 0$ there. On
the remaining parts of $\gamma$ we apply the change of variable $x_1 = g_1(t)$
so that, on the top curve, $g_2(t) = u(x_1)$, while on the bottom,
$g_2(t) = v(x_1)$. It follows that

$\int_\gamma F_1(x_1, x_2)\,dx_1 = \int_\beta^\alpha F_1(x_1, u(x_1))\,dx_1 + \int_\alpha^\beta F_1(x_1, v(x_1))\,dx_1,$

where the integration from $\beta$ to $\alpha$ occurs because the graph of $u$ is
traced from right to left. Reversing the limits in the first integral,
we get

$\int_\gamma F_1(x_1, x_2)\,dx_1 = \int_\alpha^\beta \left[-F_1(x_1, u(x_1)) + F_1(x_1, v(x_1))\right] dx_1 = -\int_\alpha^\beta \int_{v(x_1)}^{u(x_1)} \frac{\partial F_1}{\partial x_2}(x_1, x_2)\,dx_2\,dx_1 = -\int_D \frac{\partial F_1}{\partial x_2}\,dx_1\,dx_2.$

Figure 3

A similar proof, referred to Fig. 4, shows that

$\int_\gamma F_2(x_1, x_2)\,dx_2 = \int_D \frac{\partial F_2}{\partial x_1}\,dx_1\,dx_2.$

Combining this equation with the previous ones gives Green's
theorem for the special class of simple regions.

We extend the theorem to a finite union, $D = D_1 \cup \dots \cup D_K$,
of simple regions each with a piecewise smooth boundary curve $\gamma_k$,
$k = 1, \dots, K$. Applying Green's theorem to each simple region $D_k$,
we get

$\int_{D_k}\left(\frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\right) dx_1\,dx_2 = \int_{\gamma_k} F_1\,dx_1 + F_2\,dx_2.$

The sum of the integrals over the $D_k$ is an integral over $D$; so

$\int_D\left(\frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\right) dx_1\,dx_2 = \int_{\gamma_1} F_1\,dx_1 + F_2\,dx_2 + \dots + \int_{\gamma_K} F_1\,dx_1 + F_2\,dx_2.$

Now the boundary of $D$ consists of pieces taken from several of the
curves $\gamma_k$. In addition there may be parts of curves $\gamma_k$ that are not a
part of $\gamma$ but which act as common boundary to two simple regions.
The effect is illustrated in Fig. 5.

Figure 5

A piece $\delta$ of common boundary will be traced in one direction or
the opposite, depending on which simple region it is associated with.
But for a line integral we always have, by Equation 1.1,

$\int_\delta F_1\,dx_1 + F_2\,dx_2 + \int_{\delta^-} F_1\,dx_1 + F_2\,dx_2 = 0.$

Thus, while the parts of the curves $\gamma_k$ that make up $\gamma$ contribute to
$\int_\gamma F_1\,dx_1 + F_2\,dx_2$, the other parts cancel, leaving

$\int_D\left(\frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\right) dx_1\,dx_2 = \int_\gamma F_1\,dx_1 + F_2\,dx_2.$

This completes the proof of Green's theorem.


The last part of the proof just given extends Green's theorem from
simple regions to regions like those shown in Fig. 6. The extension has an
important consequence for line integrals $\int F_1\,dx_1 + F_2\,dx_2$ over two curves
$\gamma$ and $\delta$, whenever the functions $F_1$ and $F_2$ are continuously differentiable
in the region $D$ between $\gamma$ and $\delta$ as well as on the boundary curves. In
Fig. 6(a) the curves are traced in the same direction (counterclockwise in
the figure) and in Fig. 6(b) the curves go from one point to another in the
same direction.

Figure 6

If it happens that the equation

$\frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2} = 0 \qquad (4)$

holds throughout $D$, then we can conclude that

$\int_\gamma F_1\,dx_1 + F_2\,dx_2 = \int_\delta F_1\,dx_1 + F_2\,dx_2.$

The principle is illustrated in the next two examples.

Example 2. Let $F_1$ and $F_2$ be defined by

$F_1(x_1, x_2) = \frac{-x_2}{x_1^2 + x_2^2}, \qquad F_2(x_1, x_2) = \frac{x_1}{x_1^2 + x_2^2}$

for $(x_1, x_2) \ne (0, 0)$. Direct computation shows that these functions
satisfy Equation (4). If $\gamma$ is the ellipse shown in Fig. 7 and defined by

$(x_1, x_2) = (2\cos t, 3\sin t), \qquad 0 \le t \le 2\pi,$

Figure 7

then the line integral $\int_\gamma F_1\,dx_1 + F_2\,dx_2$ would be troublesome to compute
directly, even using tables. However, we can apply Green's
theorem to the region $D$ between $\gamma$ and the circle $c$ of radius 1 about
the origin, parametrized by $(x_1, x_2) = (\cos t, \sin t)$, $0 \le t \le 2\pi$. Because
Equation (4) is satisfied, Green's theorem yields

$\int_{\gamma \cup c^-} F_1\,dx_1 + F_2\,dx_2 = 0,$

where $c^-$ is $c$ traced clockwise, so that $D$ is on its left. The last equation
can be written, by Equation 1.1,

$\int_\gamma F_1\,dx_1 + F_2\,dx_2 = \int_c F_1\,dx_1 + F_2\,dx_2.$

But on $c$ we have $x_1^2 + x_2^2 = 1$, so

$\int_c F_1\,dx_1 + F_2\,dx_2 = \int_c -x_2\,dx_1 + x_1\,dx_2 = \int_0^{2\pi} (\sin^2 t + \cos^2 t)\,dt = 2\pi.$

It is important to observe that Green's theorem could not have been
applied directly to the entire interior of the ellipse because $(\partial F_2/\partial x_1)$ and
$(\partial F_1/\partial x_2)$ fail to exist at the origin.
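The conclusion can be checked by brute force: integrating $F$ directly around the ellipse numerically should again give $2\pi$. A small Python sketch of our own, using the ellipse parametrization of the example:

```python
import math

def ellipse_integral(n=20000):
    """Midpoint-rule approximation of the line integral of F = (-x2, x1)/(x1^2 + x2^2)
    around the ellipse (2 cos t, 3 sin t), 0 <= t <= 2*pi."""
    total, h = 0.0, 2 * math.pi / n
    for k in range(n):
        t = (k + 0.5) * h
        x1, x2 = 2 * math.cos(t), 3 * math.sin(t)
        dx1, dx2 = -2 * math.sin(t), 3 * math.cos(t)
        r2 = x1 * x1 + x2 * x2
        total += (-x2 / r2 * dx1 + x1 / r2 * dx2) * h
    return total

print(ellipse_integral(), 2 * math.pi)   # both about 6.2832
```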

Example 3. The curve $\gamma_1$ given by $g(t) = (t, t^2)$, $0 \le t \le 1$, is shown
in Fig. 8. Suppose that $F(x_1, x_2) = (F_1(x_1, x_2), F_2(x_1, x_2))$ is a continuously
differentiable vector field for $x_1^2 + x_2^2 \le 4$ and satisfies Equation
(4) there. The line integral of $F$ over $\gamma_1$ could perhaps be computed
directly in the form

$\int_{\gamma_1} F \cdot d\mathbf{x} = \int_0^1 \left[F_1(t, t^2) + F_2(t, t^2)(2t)\right] dt.$

However there are other possibilities. For example, the curve $\gamma_2$
can be parametrized by $g_2(t) = (t, t)$, $0 \le t \le 1$. Since we can
apply Green's theorem to the region between $\gamma_1$ and $\gamma_2$, the fact
that

$\int_D \left(\frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\right) dx_1\,dx_2 = 0$

would imply

$\int_{\gamma_1} F \cdot d\mathbf{x} = \int_{\gamma_2} F \cdot d\mathbf{x}.$

Another alternative would be to replace $\gamma_1$ by $\gamma_3$, where $\gamma_3$ is parametrized
in two pieces by

$(t, 0), \quad 0 \le t \le 1, \qquad \text{followed by} \qquad (1, t), \quad 0 \le t \le 1.$

Thus

$\int_{\gamma_3} F \cdot d\mathbf{x} = \int_0^1 F_1(t, 0)\,dt + \int_0^1 F_2(1, t)\,dt.$

This may be easier to compute than either of the integrals over $\gamma_1$ and $\gamma_2$,
although all three are equal.


Line integrals around a closed path, sometimes called a circuit, are of
sufficient importance that they are often distinguished from other integrals
by means of an integral sign like $\oint$. In the plane this notation has the
special advantage that an arrow can be attached to the circle to indicate a
counterclockwise or clockwise traversal of the path.

Example 4. Green's theorem has two distinct but closely related
physical interpretations. We assume $D$ to be a region in $\mathbb{R}^2$ whose boundary
is a single counterclockwise-oriented curve $\gamma$. If $\gamma$ has a smooth parametrization
$g(t) = (g_1(t), g_2(t))$, $a \le t \le b$, and has a nonzero tangent
at each point, we can form the unit tangent and normal vectors

$\mathbf{t}(t) = \frac{g'(t)}{|g'(t)|} = \left(\frac{g_1'(t)}{|g'(t)|}, \frac{g_2'(t)}{|g'(t)|}\right)$

and

$\mathbf{n}(t) = \left(\frac{g_2'(t)}{|g'(t)|}, \frac{-g_1'(t)}{|g'(t)|}\right).$

An example is shown in Fig. 9. If $F = (F_1, F_2)$ is a continuously differentiable
vector field defined on a region containing $D$ and $\gamma$, then the line
integral in Green's theorem can be written in the form

$\oint_\gamma F_1\,dx_1 + F_2\,dx_2 = \int_a^b F(g(t)) \cdot \mathbf{t}(t)\,|g'(t)|\,dt = \int_\gamma F \cdot \mathbf{t}\,ds.$

We define a real-valued function, curl $F$, called the curl of $F$ by

$\operatorname{curl} F(\mathbf{x}) = \frac{\partial F_2}{\partial x_1}(\mathbf{x}) - \frac{\partial F_1}{\partial x_2}(\mathbf{x}).$

Figure 9

Green's theorem then becomes

$\int_D \operatorname{curl} F\,dA = \oint_\gamma F \cdot \mathbf{t}\,ds,$

sometimes called Stokes's theorem. Now interpret $F$ as a force field in the
plane. The line integral represents the work $W(\gamma)$ done in moving a
particle around $\gamma$ in the counterclockwise direction under the influence of $F$.
Stokes's theorem says that $W(\gamma)$ is equal to the integral of the curl of $F$ over
$D$. In particular, if curl $F$ is identically zero in $D$, then $W(\gamma) = 0$ for every
smooth circuit $\gamma$ contained in $D$, whether $\gamma$ is oriented counterclockwise or
not. For this conclusion to hold, it is of course necessary that curl $F$ be
defined throughout the inside of every circuit in $D$ to which Stokes's
theorem is applied. Conversely, it is possible to show that if $W(\gamma) = 0$
for every smooth circuit, then curl $F$ is identically zero. See Section 2.

$F$ can also be interpreted as the velocity field of a fluid flow in $D$. That
is, the vector field $F$ at each point of $D$ represents the speed and direction of
the flow at that point. In this case the line integral in Stokes's theorem is
called the circulation of $F$ around $\gamma$, and Stokes's theorem says that the
circulation of $F$ along $\gamma$ is the integral of the curl of $F$ over $D$. Thus to
say that curl $F$ is identically zero in $D$ is to say that the circulation is zero
around every smooth closed curve with its interior contained in $D$. A
field $F$ for which curl $F$ is zero is called irrotational for this reason.

Now using the unit normal $\mathbf{n}(t)$, we can rewrite Green's theorem
in another way. Instead of applying the fundamental Equation (3) to the
field $F = (F_1, F_2)$, we apply it to the related pair of functions $(-F_2, F_1)$.
Thus the line integral becomes

$\oint_\gamma -F_2\,dx_1 + F_1\,dx_2 = \int_a^b F(g(t)) \cdot \mathbf{n}(t)\,|g'(t)|\,dt = \int_\gamma F \cdot \mathbf{n}\,ds.$

On the other hand, the area integral over $D$ becomes

$\int_D \left(\frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2}\right) dA.$

We define a real-valued function div $F$, called the divergence of $F$, by

$\operatorname{div} F(\mathbf{x}) = \frac{\partial F_1}{\partial x_1}(\mathbf{x}) + \frac{\partial F_2}{\partial x_2}(\mathbf{x});$

thus Green's theorem can be written

$\int_D \operatorname{div} F\,dA = \oint_\gamma F \cdot \mathbf{n}\,ds,$

which, in this form, is called Gauss's theorem.

Using the fluid flow interpretation, in which $F$ represents a velocity
field, the line integral in Gauss's theorem is the integral of the outward
normal coordinate of $F$ over $\gamma$. Hence, this integral is called the total flow,
or flux, of $F$ across $\gamma$ in the outward direction. Gauss's theorem shows that
the flux, $\Phi(\gamma)$, across $\gamma$ is equal to the integral of the divergence of $F$
over the region bounded by $\gamma$. Thus div $F(\mathbf{x})$ measures the rate of change
of the density of the fluid at the point $\mathbf{x}$. If div $F(\mathbf{x})$ is predominantly
positive in $D$, then $\Phi(\gamma)$, the outward flow, will be positive, and a negative
$\Phi(\gamma)$ indicates that more fluid is going into $D$ than is going out. If div $F$
is identically zero, then $F$ is said to represent an incompressible flow.
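Both interpretations are easy to test numerically on a concrete field. The sketch below is our own illustration, using the hypothetical field $F(x_1, x_2) = (x_1^2 - x_2,\ x_1 + x_2^2)$ on the unit disk; here curl $F = 2$, so the circulation should be $2\pi$, while the integral of div $F = 2x_1 + 2x_2$ over the disk is 0, so the outward flux should vanish:

```python
import math

F = lambda x1, x2: (x1 ** 2 - x2, x1 + x2 ** 2)   # illustrative field, not from the text

def circulation_and_flux(n=20000):
    """Midpoint-rule line integrals of F.t ds and F.n ds around the unit circle."""
    circ = flux = 0.0
    h = 2 * math.pi / n
    for k in range(n):
        t = (k + 0.5) * h
        x1, x2 = math.cos(t), math.sin(t)
        d1, d2 = -math.sin(t), math.cos(t)     # g'(t); |g'(t)| = 1 on the unit circle
        F1, F2 = F(x1, x2)
        circ += (F1 * d1 + F2 * d2) * h        # F . t ds
        flux += (F1 * d2 - F2 * d1) * h        # F . n ds, with n = (g2', -g1')/|g'|
    return circ, flux

print(circulation_and_flux(), 2 * math.pi)     # roughly (6.2832, 0.0) and 6.2832
```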

EXERCISES

1. Use Green's theorem to compute the value of the line integral
   $\oint_\gamma x_2\,dx_1 + x_1\,dx_2$, where $\gamma$ is each of the following closed paths.

   (a) The circle given by $g(t) = (\cos t, \sin t)$, $0 \le t \le 2\pi$.
   (b) The square with corners at $(\pm 1, \pm 1)$, traced counterclockwise.
   (c) The square with corners at (0, 0), (1, 0), (1, 1), and (0, 1), traced
   counterclockwise.

2. Using some one of the paths $\gamma_1$, $\gamma_2$, or $\gamma_3$ in Example 3 of the text, compute
   the line integral $\int x_2\,dx_1 + x_1\,dx_2$.

3. Let $\gamma$ be the curve parametrized by $g(t) = (2\cos t, 3\sin t)$, $0 \le t \le 2\pi$.
   Compute $\int_\gamma (2x_1 + x_2)\,dx_1 + (x_1 + 3x_2)\,dx_2$.

4. Evaluate the following line integrals by whatever method seems simplest.

   (a) $\int_\gamma e^x \cos y\,dx + e^x \sin y\,dy$, where $\gamma$ is the triangle with vertices (0, 0),
   (1, 0), (1, $\pi/2$), traced counterclockwise.
   (b) Use the same integrand as in part (a), but change the path to the square
   with corners at (0, 0), (1, 0), (1, 1), and (0, 1), traced counterclockwise.
   (c) $\int_c (x^2 - y^2)\,dx + (x^2 + y^2)\,dy$, where $c$ is the circle of radius 1 centered
   at (0, 0) and traced clockwise.

5. Show that if $D$ is a simple region bounded by a piecewise smooth curve $\gamma$,
   then the area of $D$ is given by

   $A(D) = \frac12 \oint_\gamma (-y\,dx + x\,dy).$

6. Let $f$ be a real-valued function with continuous second-order derivatives
   in an open set $D$ in $\mathbb{R}^2$. Let $F$ be the vector field defined in $D$ by $F(\mathbf{x}) =
   \nabla f(\mathbf{x})$, the gradient of $f$. Show that if $F(\mathbf{x}) = (F_1(\mathbf{x}), F_2(\mathbf{x}))$, then the
   equation $(\partial F_2/\partial x_1) - (\partial F_1/\partial x_2) = 0$ is satisfied in $D$.

7. Let $f(x_1, x_2) = \arctan(x_2/x_1)$, for $x_1 > 0$. Let the vector field $\nabla f(x_1, x_2)$
   have coordinate functions $F_1(x_1, x_2)$ and $F_2(x_1, x_2)$. Show that $F_1$ and $F_2$ can
   be extended in a natural way to all $(x_1, x_2) \ne (0, 0)$. Let $\gamma$ be a smooth
   curve parametrized to run from the point (1, 0) to a fixed point $(x_1, x_2)$,
   but on the way winding $k$ times counterclockwise around the origin.
   Compute the line integral

   $\int_\gamma \nabla f \cdot d\mathbf{x}$

   (a) for $k = 0$; (b) for $k = 1$; (c) for $k$ an arbitrary positive integer. (d)
   What interpretation can be given the line integral if $k$ is a negative integer?

8. Let $F$ be a continuously differentiable vector field defined everywhere but
   at two points $\mathbf{x}_1$ and $\mathbf{x}_2$ in $\mathbb{R}^2$, and satisfying $(\partial F_2/\partial x_1) - (\partial F_1/\partial x_2) = 0$.
   Let $c_1$ and $c_2$ be counterclockwise-oriented circles centered at $\mathbf{x}_1$ and $\mathbf{x}_2$, and
   with radii less than $|\mathbf{x}_1 - \mathbf{x}_2|$. Suppose

   $\oint_{c_k} F \cdot d\mathbf{x} = I_k, \qquad k = 1, 2.$

   Show that if $\gamma$ is any closed smooth path that avoids $\mathbf{x}_1$ and $\mathbf{x}_2$, then

   $\oint_\gamma F \cdot d\mathbf{x} = n_1I_1 + n_2I_2,$

   for some integers $n_1$ and $n_2$.

9. (a) Consider a particle moving in a plane vertical to the surface of the earth
   and subject to the gravitational field $G(x, y) = (0, -mg)$, where $m$ is the mass
   of the particle and $g$ is the acceleration of gravity. Show that as the particle
   moves in the plane, the amount of work done is independent of the path
   between two points and depends only on the initial and final points. In
   particular, the work done in moving along a closed path is zero.
   (b) Replace the field $G$ by a field $F = (F_1, F_2)$ satisfying $(\partial F_2/\partial x_1) =
   (\partial F_1/\partial x_2)$ throughout the plane. Show that the same conclusions hold.

10. (a) Let $g\colon \mathbb{R} \to \mathbb{R}^2$ trace a simple closed curve $\gamma$ in the counterclockwise
    direction. Show that, if a unit tangent vector to $\gamma$ is given by $\mathbf{t}(t) =
    g'(t)/|g'(t)|$, then the outward pointing unit normal to $\gamma$ at $g(t)$ is given by
    $\mathbf{n}(t) = (g_2'(t)/|g'(t)|, -g_1'(t)/|g'(t)|)$, where $g_1$ and $g_2$ are the coordinate
    functions of $g$.
    (b) Show that Green's theorem can be written in the form

    $\int_D \operatorname{curl} F\,dA = \oint_\gamma F \cdot \mathbf{t}\,ds,$

    where $F = (F_1, F_2)$, and $ds$ denotes integration with respect to arc length.
    (c) Show that Green's theorem can also be written in the form

    $\int_D \operatorname{div} F\,dA = \oint_\gamma F \cdot \mathbf{n}\,ds.$

    [Hint. In the previous formula, replace $F_2$ by $F_1$ and $F_1$ by $-F_2$.]

11. Assume that the vector field $F = (F_1, F_2)$ in Exercise 10(c) is a gradient field,
    that is, $F = \nabla f$ for some real-valued $f$. Show that Green's theorem can be
    written in the form

    $\int_D \Delta f\,dA = \oint_\gamma \nabla f \cdot \mathbf{n}\,ds,$

    where $\Delta f = (\partial^2 f/\partial x_1^2) + (\partial^2 f/\partial x_2^2)$, the Laplacian of $f$.

12. (a) Show that if $f$ is a continuous real-valued function defined in an open
    set $D$ of $\mathbb{R}^2$, and $\int_M f\,dA = 0$ for every circular disk $M$ in $D$, then $f$ is
    identically zero in $D$. [Hint. Show that if $f(\mathbf{x}_0) \ne 0$ for some $\mathbf{x}_0$ in $D$,
    then there is a disk $M$ centered at $\mathbf{x}_0$ such that $|f(\mathbf{x})| > \delta$ for some
    $\delta > 0$, and all $\mathbf{x}$ in $M$.]
    (b) Use part (a) and Stokes's theorem to show that if curl $F$ is continuous in
    an open set $D$, and the circulation of $F$ is zero around every smooth
    circuit in $D$, then $F$ is irrotational in $D$, that is, curl $F$ is identically zero
    in $D$.
    (c) Use part (a) and Gauss's theorem to show that if div $F$ is continuous in $D$
    and $\Phi(\gamma) = 0$ for every smooth circuit $\gamma$ in $D$, then $F$ is incompressible.

13. The equations curl $F = 0$ and div $F = 0$ occur in complex variable theory
    in a slightly different form as the Cauchy-Riemann equations. Show that if
    $u(x, y)$ and $v(x, y)$ are the real and imaginary parts, respectively, of the
    following complex valued functions, then the vector field given by $F(x, y) =
    (v(x, y), u(x, y))$ is irrotational and incompressible.

    (a) $(x + iy)^2$.
    (b) $e^{x + iy}$.
    (c) $\frac12\log(x^2 + y^2) + i\arctan(y/x)$, $x > 0$.

SECTION 2

CONSERVATIVE VECTOR FIELDS

The examples of the previous section show that, under certain conditions,
it is possible to alter the path of integration in a line integral in the plane
without affecting the value of the integral. Not all line integrals have this
property, but those that do are particularly important, not only for the
computational reasons already illustrated, but also because of their relation
to the gradient. In fact, we have the following theorem, valid in $\mathbb{R}^n$, which
is a converse to Theorem 2.5 of Chapter 4.

2.1 Theorem

Let $F$ be a continuous vector field defined in a polygonally connected
open subset $D$ of $\mathbb{R}^n$. If the line integral

$\int_\gamma F \cdot d\mathbf{x}$

is independent of the piecewise smooth path $\gamma$ from $\mathbf{x}_0$ to $\mathbf{x}$ in $D$,
then the real-valued function defined by

$f(\mathbf{x}) = \int_{\mathbf{x}_0}^{\mathbf{x}} F \cdot d\mathbf{x}$

is continuously differentiable and satisfies the vector equation
$\nabla f = F$ throughout $D$.

Proof. We have to show that, for each $\mathbf{x}$ in $D$, $\nabla f(\mathbf{x}) = F(\mathbf{x})$.
Since $\mathbf{x}$ is an interior point of $D$, there is a ball of radius $\delta$
centered at $\mathbf{x}$ and contained in $D$. This implies that, for any
unit vector $\mathbf{u}$ and for all real numbers $t$ satisfying $|t| < \delta$,
the vectors $\mathbf{x} + t\mathbf{u}$ are contained in $D$. Since the line integral
is independent of the path, we choose an arbitrary piecewise
smooth path from $\mathbf{x}_0$ to $\mathbf{x}$, lying in $D$, and extend it by a
linear segment to the vector $\mathbf{x} + t\mathbf{u}$, $|t| < \delta$, as shown in Fig. 10.

Figure 10

To show that $f$ is continuously differentiable we observe that

$f(\mathbf{x} + t\mathbf{u}) - f(\mathbf{x}) = \int_{\mathbf{x}_0}^{\mathbf{x}+t\mathbf{u}} F \cdot d\mathbf{x} - \int_{\mathbf{x}_0}^{\mathbf{x}} F \cdot d\mathbf{x} = \int_{\mathbf{x}}^{\mathbf{x}+t\mathbf{u}} F \cdot d\mathbf{x} = \int_0^t F(\mathbf{x} + v\mathbf{u}) \cdot \mathbf{u}\,dv.$

Then taking $\mathbf{u} = \mathbf{e}_j$, the $j$th natural basis vector in $\mathbb{R}^n$, we get

$\frac{\partial f}{\partial x_j}(\mathbf{x}) = \lim_{t \to 0} \frac{f(\mathbf{x} + t\mathbf{e}_j) - f(\mathbf{x})}{t} = \lim_{t \to 0} \frac{1}{t}\int_0^t F(\mathbf{x} + v\mathbf{e}_j) \cdot \mathbf{e}_j\,dv = \left.\frac{d}{dt}\int_0^t F(\mathbf{x} + v\mathbf{e}_j) \cdot \mathbf{e}_j\,dv\right|_{t=0} = F(\mathbf{x}) \cdot \mathbf{e}_j = F_j(\mathbf{x}),$

where $F_j$ is the $j$th coordinate function of $F$. Since $F$ was assumed
continuous, so are the partial derivatives $\partial f/\partial x_j$; therefore $f$
is continuously differentiable on $D$. Finally, the equations
$(\partial f/\partial x_j)(\mathbf{x}) = F_j(\mathbf{x})$, $j = 1, \dots, n$, mean that $\nabla f = F$ in $D$.

A vector field $F$ for which there is a real-valued function $f$ such that
$F = \nabla f$ is called a conservative, or gradient, field. In that case $f$ is called the
potential of $F$. The motivation for this terminology is discussed in the
next example.

Example 1. Suppose that a continuous force field $F$, defined in a region
$D$ of $\mathbb{R}^3$, is such that the work done in moving a particle from one point to
another under the influence of the field is independent of the path taken
between the two points. Thus, if $\mathbf{x}_1$ and $\mathbf{x}_2$ are two points in the field and
$W(\mathbf{x}_1, \mathbf{x}_2)$ represents the work done in going from $\mathbf{x}_1$ to $\mathbf{x}_2$, we can write

$W(\mathbf{x}_1, \mathbf{x}_2) = \int_{\mathbf{x}_1}^{\mathbf{x}_2} F \cdot d\mathbf{x}.$

If the particle follows a particular path given by $g(t)$, then the velocity and
acceleration vectors are $\mathbf{v}(t) = g'(t)$, $\mathbf{a}(t) = g''(t)$, and we have $F(g(t)) =
m\mathbf{a}(t)$, where $m$ is the mass of the particle. Hence,

$W(\mathbf{x}_1, \mathbf{x}_2) = \int_{t_1}^{t_2} m\mathbf{a}(t) \cdot \mathbf{v}(t)\,dt,$

if $g(t_i) = \mathbf{x}_i$. But since $\mathbf{a}(t) = \mathbf{v}'(t)$, and $(d/dt)v^2(t) = 2\mathbf{v}(t) \cdot \mathbf{v}'(t)$, we have

$W(\mathbf{x}_1, \mathbf{x}_2) = \frac{m}{2}\int_{t_1}^{t_2} \frac{d}{dt}\left[v^2(t)\right] dt = \frac{m}{2}\left(v^2(t_2) - v^2(t_1)\right). \qquad (1)$

The function $k(t) = (m/2)v^2(t)$ is called the kinetic energy of the particle at
time $t$.

On the other hand, if we fix a point $\mathbf{x}_0$ in $D$, then by Theorem 2.1, the
equation

$u(\mathbf{x}) = -\int_{\mathbf{x}_0}^{\mathbf{x}} F \cdot d\mathbf{x}$

defines a continuously differentiable function $u$ in $D$. Using the independence
of path to integrate from $\mathbf{x}_1$ to $\mathbf{x}_2$ via $\mathbf{x}_0$, we get

$W(\mathbf{x}_1, \mathbf{x}_2) = \int_{\mathbf{x}_1}^{\mathbf{x}_2} F \cdot d\mathbf{x} = \int_{\mathbf{x}_0}^{\mathbf{x}_2} F \cdot d\mathbf{x} - \int_{\mathbf{x}_0}^{\mathbf{x}_1} F \cdot d\mathbf{x} = -u(\mathbf{x}_2) + u(\mathbf{x}_1). \qquad (2)$

Comparison of Equations (1) and (2) shows that

$u(\mathbf{x}_2) + \frac{m}{2}v^2(t_2) = u(\mathbf{x}_1) + \frac{m}{2}v^2(t_1).$

In other words, along the path traced by $g(t)$, the sum $u(g(t)) + k(t)$ is a
constant, independent of $t$, called the total energy of the path. For this
reason, the function $u(\mathbf{x})$, which is a function of position in $D$, is called the
potential energy of the field $F$. Notice that there is an arbitrary choice
made in defining the potential in that the point $\mathbf{x}_0$ was picked to have zero
potential. The choice of some other point $\mathbf{x}_0'$ would change the function $u$
at most by an additive constant equal to $W(\mathbf{x}_0, \mathbf{x}_0')$. It is the constant total
energy which is "conserved" and which gives rise to the term "conservative field."

For a vector field $F$ defined in a region $D$ of $\mathbb{R}^n$, independence of path
in the line integral $\int_\gamma F \cdot d\mathbf{x}$ has been defined to mean that

2.2 $\int_{\gamma[\mathbf{x}_1, \mathbf{x}_2]} F \cdot d\mathbf{x} = \int_{\beta[\mathbf{x}_1, \mathbf{x}_2]} F \cdot d\mathbf{x},$

where $\gamma[\mathbf{x}_1, \mathbf{x}_2]$ and $\beta[\mathbf{x}_1, \mathbf{x}_2]$ are any two piecewise smooth curves in $D$
having initial point $\mathbf{x}_1$ and terminal point $\mathbf{x}_2$. An alternative formulation
of the independence property is that

2.3 $\oint_\gamma F \cdot d\mathbf{x} = 0$

for every piecewise smooth closed curve $\gamma$ lying in $D$. The equivalence of
the two properties follows from the observations that $\gamma[\mathbf{x}_1, \mathbf{x}_2]$ followed by
$\beta[\mathbf{x}_1, \mathbf{x}_2]$ in reverse direction is a closed path, and that a closed path can be
separated at points $\mathbf{x}_1$ and $\mathbf{x}_2$ into different paths joining $\mathbf{x}_1$ and $\mathbf{x}_2$. The
details of the proof have already been illustrated in Section 1 and will be
left as an exercise.

We can summarize what we have proved about gradient fields in
Theorem 2.5 of Chapter 4, Theorem 2.1 of the present section, and in the
previous remark, as follows.

2.4 Theorem

Let $F$ be a continuous vector field defined in a polygonally connected
open set $D$ of $\mathbb{R}^n$. Then the following are equivalent:

1. $F$ is the gradient of a function $f$, continuously differentiable in $D$.

2. The line integral of $F$ over a path from $\mathbf{x}_1$ to $\mathbf{x}_2$ is independent of
the piecewise smooth curve $\gamma$ from $\mathbf{x}_1$ to $\mathbf{x}_2$, and so can be written

$\int_{\mathbf{x}_1}^{\mathbf{x}_2} F \cdot d\mathbf{x}.$

3. For every piecewise smooth closed curve $\gamma$ lying in $D$,

$\oint_\gamma F \cdot d\mathbf{x} = 0.$

A more intrinsic criterion for deciding if a continuous vector field
is a gradient field or not arises as follows. Suppose first that $F\colon \mathbb{R}^2 \to \mathbb{R}^2$
is continuous on an open set $D$, and that $F$ is a gradient field, that is,
there is a real-valued function $f$ defined on $D$ such that $\nabla f = F$. In terms
of the coordinate functions $F_1$ and $F_2$ of $F$, this means

$\frac{\partial f}{\partial x_1} = F_1 \qquad \text{and} \qquad \frac{\partial f}{\partial x_2} = F_2.$

If $F$ itself is continuously differentiable, we can form the second partials,

$\frac{\partial^2 f}{\partial x_2\,\partial x_1} = \frac{\partial F_1}{\partial x_2} \qquad \text{and} \qquad \frac{\partial^2 f}{\partial x_1\,\partial x_2} = \frac{\partial F_2}{\partial x_1},$

and conclude from their equality that

$\frac{\partial F_1}{\partial x_2} = \frac{\partial F_2}{\partial x_1} \qquad (3)$

throughout $D$. By the definition of curl $F$, Equation (3) can be written
curl $F = 0$. The equation can also be expressed another way: We consider
the more general field $F\colon \mathbb{R}^n \to \mathbb{R}^n$, which we assume continuously differentiable
in an open subset $D$ of $\mathbb{R}^n$. If $F$ is a gradient field, there is an $f$
such that $\nabla f = F$, or, in terms of coordinate functions,

$\frac{\partial f}{\partial x_j} = F_j, \qquad j = 1, \dots, n.$

Differentiating with respect to $x_i$ we get

$\frac{\partial F_j}{\partial x_i} = \frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i} = \frac{\partial F_i}{\partial x_j}. \qquad (4)$

But the functions $\partial F_j/\partial x_i$ are the entries in the $n$-by-$n$ Jacobian matrix
of $F\colon \mathbb{R}^n \to \mathbb{R}^n$, and Equation (4) expresses the fact that this matrix is
symmetric. Hence

2.5 Theorem

If $F\colon \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable gradient field, then $F'$,
the Jacobian matrix of $F$, is symmetric.

Example 2. The converse of Theorem 2.5 is false, as we see by looking
at the example in $\mathbb{R}^2$,

$F(x, y) = \left(\frac{-y}{x^2 + y^2}, \frac{x}{x^2 + y^2}\right),$

defined for all $(x, y) \ne (0, 0)$. It is easy to check that $\partial F_1/\partial y = \partial F_2/\partial x$.
But there is no continuously differentiable $f$ such that $\nabla f = F$. The reason
is that, for $x > 0$, the function $f(x, y) = \arctan(y/x)$ satisfies $\nabla f = F$, but
this $f$ cannot be extended to be a single-valued solution of the equation
in the entire plane with the origin deleted.

Example 2 shows that the nature of the region $D$ on which $F$ is defined
is significant in determining whether $F$ is a gradient field. By making a
special assumption about $D$ we can obtain a partial converse to Theorem
2.5.

2.6 Theorem

Let $R$ be an open coordinate rectangle in $\mathbb{R}^n$, and let $F$ be a continuously
differentiable vector field on $R$. If $F'(\mathbf{x})$, the Jacobian
matrix of $F$, is symmetric on $R$, then $F$ is a gradient field.

Proof. Pick a fixed point $\mathbf{x}_0$ in $R$ and let $\mathbf{x}$ be any other point of $R$.
We consider paths from $\mathbf{x}_0$ to $\mathbf{x}$, each consisting of a sequence of
line segments parallel to the axes and such that each coordinate
variable varies on at most one such segment. Three-dimensional
examples are shown in Fig. 11. The reason for looking at such paths
is to be able to approach $\mathbf{x}$ from any coordinate direction for the
purpose of taking partial derivatives at $\mathbf{x}$. Choose one of these
paths, call it $\gamma_{\mathbf{x}}$, and define a real-valued function $f$ by

$f(\mathbf{x}) = \int_{\gamma_{\mathbf{x}}} F \cdot d\mathbf{x}. \qquad (5)$

Figure 11

While the particular path $\gamma_{\mathbf{x}}$ is only one of several of the same type,
we shall see that any of the other possible choices would lead to the
same value for $f(\mathbf{x})$. The reason is that any one of these paths can be
altered step by step into any one of the others by changes, each of
which leaves the value of the integral (5) unaltered. Each path can
be described by a sequence of coordinate directions, only one of
which is allowed to vary at a time. (For example, the dotted path in
Fig. 11 corresponds to $x_1, x_2, x_3$ and the solid one to $x_2, x_3, x_1$.)
Clearly, changing one such sequence into another can be accomplished
by successively interchanging adjacent variables in pairs until
the desired order is reached. But each interchange replaces a pair of
segments $(\delta_i, \delta_j)$ by another pair $(\delta_i', \delta_j')$ lying in the same 2-dimensional
plane. To see that the replacement leaves the value of the
integral invariant, we form the circuit $\delta$ consisting of the segments $\delta_i$
and $\delta_j$, followed by $\delta_i'$ and $\delta_j'$ in the reverse of their original
directions. On these segments, $x_i$ and $x_j$ are the only variables that
vary, so the circuit integral can be written

$\oint_\delta F \cdot d\mathbf{x} = \oint_\delta F_i\,dx_i + F_j\,dx_j.$

We apply Green's theorem to the 2-dimensional rectangle $R_\delta$
bounded by $\delta$ and get

$\oint_\delta F_i\,dx_i + F_j\,dx_j = \int_{R_\delta}\left(\frac{\partial F_j}{\partial x_i} - \frac{\partial F_i}{\partial x_j}\right) dx_i\,dx_j = 0,$

since by the symmetry assumption, $\partial F_j/\partial x_i - \partial F_i/\partial x_j = 0$ in $R$.
Thus

$\oint_\delta F \cdot d\mathbf{x} = \int F \cdot d\mathbf{x} - \int F \cdot d\mathbf{x} = 0,$

and so the change of path leaves the value of the integral invariant.

Once it has been established that $\mathbf{x}$ can be approached along a
path of integration that varies only in an arbitrary coordinate, say
the $k$th, we have, as in the proof of Theorem 2.1, the equation
$\partial f/\partial x_k(\mathbf{x}) = F_k(\mathbf{x})$, for all $k$. Thus $\nabla f(\mathbf{x}) = F(\mathbf{x})$ for all $\mathbf{x}$ in $R$.
Example 3. Applying Theorem 2.6 to the field

$F(x, y) = \left(\frac{-y}{x^2 + y^2}, \frac{x}{x^2 + y^2}\right), \qquad (x, y) \ne (0, 0),$

of Example 2, we conclude that $F$, when restricted to any coordinate rectangle
not containing the origin, is a gradient field. This is true,
for example, in any of the four half-planes bounded by a coordinate
axis. A potential function $f$ for the half-plane $x > 0$ can be
computed by the line integral

$f(x, y) = \int_{(1,0)}^{(x,y)} \frac{-y\,dx}{x^2 + y^2} + \frac{x\,dy}{x^2 + y^2},$

where the path of integration is any piecewise smooth curve from
(1, 0) to $(x, y)$. A polygonal path from (1, 0) to $(x, 0)$ and from
$(x, 0)$ to $(x, y)$ is particularly simple. On the first segment, the
entire integral is zero because $y$ is identically zero, and on the second segment,
with $x$ constant, the integral reduces to

$\int_0^y \frac{x\,dy}{x^2 + y^2} = \arctan\frac{y}{x}.$

Figure 12

The most general potential of $F$ in the right half-plane differs from this
one by at most a constant. (Why?) The general solution there of $\nabla f = F$
is therefore

$f(x, y) = \arctan\frac{y}{x} + c.$

Compare the method of solution described in Problem 11 of Chapter 4,
Section 2.
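The same polygonal-path computation can be mirrored numerically and compared with $\arctan(y/x)$. A small Python sketch of our own for a sample point in the right half-plane:

```python
import math

F = lambda x, y: (-y / (x * x + y * y), x / (x * x + y * y))

def potential(x, y, n=5000):
    """Line integral of F from (1, 0) to (x, 0) to (x, y), x > 0, by a midpoint rule."""
    total = 0.0
    # horizontal leg: y = 0 there and F1(x, 0) = 0, so this leg contributes nothing
    # vertical leg from (x, 0) to (x, y): only F2 dy survives
    h = y / n
    for k in range(n):
        t = (k + 0.5) * h
        total += F(x, t)[1] * h
    return total

x, y = 1.5, 2.0
print(potential(x, y), math.atan2(y, x))   # both about 0.927
```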

EXERCISES

1. Consider the approximation to the earth's gravitational field acting on a
   particle of mass 1 represented by the vector field $F(x, y, z) = (0, 0, -g)$.

   (a) Find for $F$ the potential function $u(x, y, z)$ that is zero when $(x, y, z) =
   (0, 0, 0)$.
   (b) If a particle of mass 1 has at (0, 0, 0) a velocity vector $(v_1, v_2, v_3)$ with
   $v_3 > 0$, and no force but $F$ acts on the particle, find the path of the
   particle.
   (c) Verify that the sum of potential energy and kinetic energy remains
   constant for the path of part (b).

2. (a) Show that if $F$ and $G$ are gradient fields defined on the same domain $D$,
   then $F + G$ and $cF$ are gradient fields, where $c$ is a constant.
   (b) Let $\mathcal{V}$ be the vector space of gradient fields defined on a domain $D$.
   Show that $\mathcal{V}$ has infinite dimension.

3. Use Theorem 2.4 or 2.6 to decide whether the following vector fields are
   gradient fields.

   (a) $F(x, y) = (x - y, x + 2y)$, for $(x, y)$ in $\mathbb{R}^2$.
   (b) $G(x, y, z) = (y, z, x)$, for $(x, y, z)$ in $\mathbb{R}^3$.

4. Use Theorem 2.5 to show that the vector fields in Problems 3(a) and 3(b) are
   not gradient fields in any open subset of $\mathbb{R}^2$ or $\mathbb{R}^3$, respectively.

5. Show that the vector field of Problem 3(c) is a gradient field in the region
   $y > 0$ of $\mathbb{R}^2$ and find an explicit representation for its potential.

6. Consider the vector field defined in $\mathbb{R}^3$, with the $z$-axis deleted, by

   $F(x, y, z) = \left(\frac{-y}{x^2 + y^2}, \frac{x}{x^2 + y^2}, 0\right).$

   Is $F$ a gradient field?

7. Find a potential for each of the following fields.

   (a) $F(x, y, z) = (2xy, x^2 + z^2, 2yz)$.
   (b) $G(x, y) = (y\cos xy, x\cos xy)$.
   (c) $L(x_1, x_2) = \dfrac{(x_1, x_2)}{x_1^2 + x_2^2}$, with $(x_1, x_2) \ne (0, 0)$.

8. Consider the vector field $F$ which is the gradient of the Newtonian potential
   $f(\mathbf{x}) = -|\mathbf{x}|^{-1}$ for nonzero $\mathbf{x}$ in $\mathbb{R}^3$. Find the work done in moving a particle
   from (1, 1, 1) to (-2, -2, -2) along a smooth curve lying in the domain
   of $F$.

9. Give a detailed proof of the equivalence of Relations 2.2 and 2.3 of the text.

10. In $\mathbb{R}^n$, how many paths can there be from $\mathbf{x}_0$ to $\mathbf{x}$ of the special kind described
    in the proof of Theorem 2.6?

SECTION 3

SURFACE INTEGRALS

In Chapter 3, Section 4, we have defined integrals both of a real-valued
function and of a vector over a smooth curve. Defining an integral over
a surface $S$ leads to a different geometric situation having, however, a close
analogy with the line integral. To begin, we assume that $S$ is parametrized
by a continuously differentiable function $g\colon \mathbb{R}^2 \to \mathbb{R}^3$. We shall write $g$ in
the form

$g(u, v) = (g_1(u, v),\, g_2(u, v),\, g_3(u, v)) \qquad (1)$

with $\mathbf{u} = (u, v)$ in some set $D$ in $\mathbb{R}^2$, which we assume bounded by a piecewise
smooth curve. We further assume that, at each point $g(u, v)$ of $S$,
the tangent vectors defined by the vector partial derivatives

$\frac{\partial g}{\partial u}(u, v), \qquad \frac{\partial g}{\partial v}(u, v)$

determine a 2-dimensional tangent plane to $S$; in other words, that the
two tangents are linearly independent. If $S$ satisfies all the above conditions,
we shall refer to it as a piece of smooth surface.
On a smooth curve, the choice of a parametrization going from one
endpoint to the other establishes an orientation for the curve. Analogously,
on a piece of smooth surface, a one-to-one parametrization determines a
particular normal vector

$\frac{\partial g}{\partial u}(u, v) \times \frac{\partial g}{\partial v}(u, v) \qquad (2)$

at $g(u, v)$. See Fig. 13.

Figure 13

We recall that the length of the cross-product of two vectors $\mathbf{a}$ and $\mathbf{b}$
is the area of the parallelogram spanned by $\mathbf{a}$ and $\mathbf{b}$. In particular,

$\left|\frac{\partial g}{\partial u}(u, v) \times \frac{\partial g}{\partial v}(u, v)\right|$

represents the area of the tangent parallelogram shown in Fig. 13(b). If
we think of constructing such parallelograms at the points $g(u_k, v_k)$
corresponding to the corner points $(u_k, v_k)$ of a grid over $D$, then it is
natural to define the area of $S$ by

3.1 $\sigma(S) = \int_D \left|\frac{\partial g}{\partial u}(u, v) \times \frac{\partial g}{\partial v}(u, v)\right| du\,dv.$

We assume that $g$ is one-to-one. The integral over $D$ exists because $g$ was
assumed continuously differentiable on $D$. Similarly, if $p$ is a continuous
real-valued function defined on $S$, we define the integral of $p$ over $S$ by

3.2 $\int_S p\,d\sigma = \int_D p(g(u, v))\left|\frac{\partial g}{\partial u}(u, v) \times \frac{\partial g}{\partial v}(u, v)\right| du\,dv.$

This definition is the analog of that for a real-valued function over a
smooth curve given by Equation (2) of Chapter 3, Section 4.

Example 1. Let $S$ be parametrized by

$g(u, v) = (u,\, v,\, u^2 + v^2) \qquad \text{for } 1 \le u^2 + v^2 \le 4;$

thus $S$ is actually the graph of $x^2 + y^2$ for $1 \le x^2 + y^2 \le 4$. The surface
is shown in Fig. 14. Then

$\frac{\partial g}{\partial u}(u, v) = (1, 0, 2u), \qquad \frac{\partial g}{\partial v}(u, v) = (0, 1, 2v).$

We have

$\frac{\partial g}{\partial u}(u, v) \times \frac{\partial g}{\partial v}(u, v) = (-2u, -2v, 1).$
Figure 15

The quantity

$F(g(u, v)) \cdot \left(\frac{\partial g}{\partial u}(u, v) \times \frac{\partial g}{\partial v}(u, v)\right)$

is equal to the coordinate of $F(g(u, v))$ in the direction of $\mathbf{n}$, multiplied by
the area of the tangent parallelogram spanned by $\partial g/\partial u$ and $\partial g/\partial v$ at
$g(u, v)$. We define the surface integral of $F$ over $S$ by

3.3 $\int_D F(g(u, v)) \cdot \left(\frac{\partial g}{\partial u}(u, v) \times \frac{\partial g}{\partial v}(u, v)\right) du\,dv,$

and denote it by $\int_S F \cdot dS$ or $\int_S F \cdot \mathbf{n}\,d\sigma$.

It is easy to check that the coordinates of the normal vector are given
by Jacobian determinants as

$\frac{\partial g}{\partial u} \times \frac{\partial g}{\partial v} = \left(\frac{\partial(x_2, x_3)}{\partial(u, v)},\ \frac{\partial(x_3, x_1)}{\partial(u, v)},\ \frac{\partial(x_1, x_2)}{\partial(u, v)}\right),$

where $x_1$, $x_2$, and $x_3$ represent the coordinate functions of $g$. If the vector
field $F$ has coordinate functions $F_1$, $F_2$, $F_3$, then the surface integral is
often written

$\int_S F \cdot dS = \int_D \left[F_1\frac{\partial(x_2, x_3)}{\partial(u, v)} + F_2\frac{\partial(x_3, x_1)}{\partial(u, v)} + F_3\frac{\partial(x_1, x_2)}{\partial(u, v)}\right] du\,dv = \int_S F_1\,dx_2\,dx_3 + F_2\,dx_3\,dx_1 + F_3\,dx_1\,dx_2.$

This last abbreviation is a particularly convenient one, and its significance
is discussed in Section 7.

Example 2. Suppose that a continuous vector field Ji 3 — > 'J\


3
describes
the speed and direction of a fluid flow at each point of a region R in which
it is defined. We shall define, using a surface integral, the flux, or rate of
Sec. 3 Surface Integrals 535

flow per unit of area and time across a piece of smooth surface 5
lying in R. If S is perfectly flat and F is a constant field, then the
flux is equal to where FB is the coordinate of F in the
FD a(S),
direction of a unitnormal to S. Thus, in this case, the flux is equal
to the volume of a tube of fluid illustrated in Fig. 16. Because Fn =
F n, we get the formula

0> =F • na(S)
for the flux.
If S is a piece of smooth surface in R, we partition S along Figure 16
coordinate curves of the form u = const, and v = const, and
assume that, within each part of S so formed, the field F is constant.
Approximating 5 by tangent parallelograms spanned by vectors

Au^ and A,^


du dv

leads to the picture shown in Fig. 17. The approximate flux across a
typical subdivision Sk of S will have the form

®k = F(g(u k )).n k o(Sk )

= *&(»*))• s (u x ^(uk)\ Au A,.


t)
(I

The sum

I% = i F(g(u
fc=i fc=i
k )) •
(f*
\dw
(uA ) x ^
du
(u,))
/
Au A,

becomes a better approximation to what we would like to call the flux across S as the subdivision of S is refined by making finer the corresponding grid G in the parameter domain D. On the other hand, if F is

Figure 17

continuous on S and g is continuously differentiable on D, then

lim_{m(G)→0} Σ_{k=1}^{K} Φ_k = ∫_D F(g(u, v)) · ( ∂g/∂u (u, v) × ∂g/∂v (u, v) ) du dv = ∫_S F · dS,

which is the previously defined integral of F over S. Consequently, we define the flux of F across S by

Φ = ∫_S F · dS.

We remark that the sign of Φ would change if S were reparametrized so that the normal vector determined by the parametrization pointed in the opposite direction.
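The limiting process just described can be imitated numerically. The sketch below is not from the text; it assumes Python with numpy and approximates the flux of the field F(x) = x across the upper unit hemisphere by the Riemann sums above, which should approach the exact value 2π.

```python
import numpy as np

# Riemann-sum approximation of the flux of F(x) = x across the upper unit
# hemisphere, parametrized (as in Example 3 below) by
#   g(phi, theta) = (sin(phi) cos(theta), sin(phi) sin(theta), cos(phi)),
# 0 <= phi <= pi/2, 0 <= theta <= 2*pi.  Here dg/dphi x dg/dtheta points
# outward, F . n = 1 on the sphere, and the exact flux is the area 2*pi.

def flux_sum(n):
    dphi, dth = (np.pi / 2) / n, 2 * np.pi / n
    phi, th = np.meshgrid(np.arange(n) * dphi, np.arange(n) * dth, indexing="ij")
    g_phi = np.stack([np.cos(phi) * np.cos(th), np.cos(phi) * np.sin(th), -np.sin(phi)])
    g_th = np.stack([-np.sin(phi) * np.sin(th), np.sin(phi) * np.cos(th), np.zeros_like(phi)])
    normal = np.cross(g_phi, g_th, axis=0)       # dg/dphi x dg/dtheta at grid corners
    g = np.stack([np.sin(phi) * np.cos(th), np.sin(phi) * np.sin(th), np.cos(phi)])
    integrand = np.sum(g * normal, axis=0)       # F(g) . normal with F(x) = x
    return integrand.sum() * dphi * dth

print(flux_sum(50), flux_sum(400), 2 * np.pi)
```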

Example 3. Let a fluid flow outward from the origin in R³ be given by

F(x₁, x₂, x₃) = ( x₁/(x₁² + x₂² + x₃²), x₂/(x₁² + x₂² + x₃²), x₃/(x₁² + x₂² + x₃²) ).

The flux of F across a sphere S_a centered at the origin and of radius a takes the form

∫_{S_a} F · dS = ∫_{S_a} ( x₁ dx₂ dx₃ + x₂ dx₃ dx₁ + x₃ dx₁ dx₂ ) / (x₁² + x₂² + x₃²).

However, on S_a the denominator is constantly equal to a², so the surface integral takes the simpler form

(1/a²) ∫_{S_a} x₁ dx₂ dx₃ + x₂ dx₃ dx₁ + x₃ dx₁ dx₂.

We can represent S_a parametrically by

(x₁, x₂, x₃) = (a sin φ cos θ, a sin φ sin θ, a cos φ),    0 ≤ φ ≤ π,  0 ≤ θ ≤ 2π,

so that

∂(x₂, x₃)/∂(φ, θ) = a² sin²φ cos θ,
∂(x₃, x₁)/∂(φ, θ) = a² sin²φ sin θ,
∂(x₁, x₂)/∂(φ, θ) = a² sin φ cos φ.

These are the coordinates of a normal vector pointing out from the sphere. Then

∫_{S_a} F · dS = a ∫₀^{2π} dθ ∫₀^{π} ( sin³φ cos²θ + sin³φ sin²θ + sin φ cos²φ ) dφ
             = a ∫₀^{2π} dθ ∫₀^{π} sin φ dφ = 4πa.
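The same computation can be checked symbolically. The following sketch is not from the text; it assumes Python with sympy and reproduces the value 4πa for the flux of Example 3.

```python
import sympy as sp

# A hedged symbolic check (not from the text) that the flux of F(x) = x/|x|^2
# across the sphere of radius a, computed as in Example 3, equals 4*pi*a.
a, phi, theta = sp.symbols('a phi theta', positive=True)
g = sp.Matrix([a*sp.sin(phi)*sp.cos(theta), a*sp.sin(phi)*sp.sin(theta), a*sp.cos(phi)])
normal = g.diff(phi).cross(g.diff(theta))        # outward normal for this parametrization
F_on_S = g / a**2                                # F restricted to the sphere, where |x|^2 = a^2
flux = sp.integrate(sp.integrate(F_on_S.dot(normal), (phi, 0, sp.pi)), (theta, 0, 2*sp.pi))
print(sp.simplify(flux))                         # 4*pi*a
```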

For the purpose of computing a line integral over a piecewise smooth curve, it is natural to orient the smooth pieces of the curve so that the terminal point of one piece is the same as the initial point of the one that follows it. To integrate a vector field over a piecewise smooth surface, we need a notion of orientation for pieces of smooth surface S. If R² -g-> R³ represents S parametrically with g defined on D, then Fig. 13 shows how D and S may be related. The edge of S, corresponding under g to the boundary of D, we shall call the border of S. As a point u moves around the piecewise smooth boundary of D in the counterclockwise direction, its image g(u) traces the border of S with what we shall call its positive orientation. A further geometric picture of the positive orientation can be had by observing that, as the normal vector ∂g/∂u × ∂g/∂v runs around the border in the positive direction, the surface S remains on its left.
A piecewise smooth surface is of course a finite union of pieces of
smooth surface that are joined along common border curves. A piecewise
smooth surface is orientable if, when the border curves of its pieces are
positively oriented, borders common to two pieces are traced in opposite
directions. If one of the two curves is reversed, their parametrizations
must be equivalent. The surfaces pictured in Fig. 18 are assumed to be
representable as piecewise smooth surfaces, and the first two are orientable.
However, the joining together of two rectangular strips, one of them with
a twist, gives a Mobius strip, which is the standard example of a non-
orientable surface.


Figure 18

We define the integral of a continuous vector field over a piecewise smooth surface to be the sum of the integrals over each of its smooth pieces. Thus if S = S₁ ∪ S₂,

∫_S F · dS = ∫_{S₁} F · dS + ∫_{S₂} F · dS.

This definition holds even if S is not orientable, but in practice it is of little interest to integrate a vector field over a nonorientable surface. On the other hand, the integral of a real-valued function over a surface can be computed without regard to orientation. The reason is that in Formulas 3.1 and 3.2 the so-called element of surface area,

dσ = |∂g/∂u × ∂g/∂v| du dv,

does not change when the orientation is reversed by interchanging the roles of u and v. But in Formula 3.3, the vector surface element,

dS = ( ∂g/∂u × ∂g/∂v ) du dv,

does change sign when u and v are interchanged. We observe that the surface element dS can also be written in the form

dS = n dσ,

where n is the unit normal to the surface given by

n = ( ∂g/∂u × ∂g/∂v ) / |∂g/∂u × ∂g/∂v|.

EXERCISES

2y Compute J $ F dS, where •

(a) F(x, y, z) = (x, y, z) and 5" is given by

0<«<1,
6 K ' ' I I ' < v < 2.

\
W /
(b) F(x, y, z) = (x 2 0, 0) and
, S is given by

(u cos »\

u sin i'
\
I
<
— « <
— 1,
'

I' 0<t;<277.

3. Find the total mass of a spherical film having density at each point equal to the linear distance of the point from a single fixed point on the sphere.

4. Let x = g(u, v), for («, v) in D, and x = /7(s, 0, for (s, in fi, be para-
metrizations for the same piece of smooth surface S in :R 3 . If there is a
one-to-one transformation T, wayscontinuously differentiable both
between D and B, such that the Jacobian determinant of Tis positive, and
such that g(u, v) = h(T(u, v)) for (u, v) in D, then g and h are called
equivalent parametrizations of S.

(a) Show that equivalent parametrizations assign the same surface area to
S. [Hint. Use the change-of-variable theorem.]
(b) Show that equivalent parametrizations assign the same value to the
surface integral of a vector field over S.

5. Let the temperature at a point (x, y, z) of a region R be given by a continuously differentiable function T(x, y, z). Then the vector field ∇T is called the temperature gradient, and under certain physical assumptions, ∇T(x, y, z) is proportional to the direction and rate of flow of heat per unit of area at (x, y, z).

(a) If T(x, y, z) = x² + y² for x² + y² ≤ 4, find the total rate of flow of heat across the cylindrical surface x² + y² = 1, 0 ≤ z ≤ 1.

(b) Give an example of a continuously differentiable vector field that cannot be a temperature gradient.
6. The Newtonian potential function (x² + y² + z²)^(−1/2) has as its gradient the attractive force field F of a charged particle at the origin acting on an oppositely charged particle at (x, y, z). The flux of the field across a piece of smooth surface S is defined to be the surface integral of F over S. Show that the flux of F across a sphere of radius a centered at the origin is independent of a.

7. (a) If R² -f-> R is continuously differentiable on a set D bounded by a piecewise smooth curve, show that the area of the graph of f is

σ(S) = ∫_D √( 1 + (f_x)² + (f_y)² ) dx dy.

(b) Find the area of the graph of f(x, y) = x² + y for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. [Ans. √(3/2) + ½ log(√2 + √3).]

8. Show that if R³ -G-> R is continuously differentiable and implicitly determines a piece of smooth surface S on which ∂G/∂z ≠ 0, and which lies over a region D of the xy-plane, then

σ(S) = ∫_D ( √( (∂G/∂x)² + (∂G/∂y)² + (∂G/∂z)² ) / |∂G/∂z| ) dx dy.

Assume that just one point of S lies over each point of D.

9. Prove that the border of a piece of smooth surface is a piecewise smooth


curve.

10. For each of the following sets find a parametrization as a piecewise smooth orientable surface with outward pointing normal.

(a) The cylindrical can with bottom and no top given by x² + y² = 1, 0 ≤ z ≤ 1, and x² + y² ≤ 1, z = 0.

(b) The funnel given by x² + y² − z² = 0, 1 ≤ z ≤ 4, and x² + y² = 1, 0 ≤ z ≤ 1.

(c) The trough given by y − z = 0, 0 ≤ x ≤ 1, 0 ≤ z ≤ 1, and y + z = 0, 0 ≤ x ≤ 1, 0 ≤ z ≤ 1.

11. Let F be the vector field in R³ given by F(x, y, z) = (x, y, 2z − x − y). Find the integral of F over the oriented surfaces of Problem 10.

12. Let F be a continuous fluid flow field and let M be a piecewise smooth
Mobius strip lying in the domain of F. Is it possible to define the flux of
F across Ml
13. Parametrize the set of Problem 10(a) so that it is unoriented, with normals
pointing out on the bottom and in on the sides. Compute the integral of
F(x, y, z) = (x, y, 2z —x — y) over the unoriented surface.

14. Prove that if F and G are continuous vector fields on a piece of smooth surface S, then

∫_S (aF + bG) · dS = a ∫_S F · dS + b ∫_S G · dS,

where a and b are constants.

15. (a) Let F be a continuous vector field on a piece of smooth surface S. Show that

| ∫_S F · dS | ≤ M σ(S),

where M is the maximum of |F(x)| for x on S. [Hint. Write ∫_S F · dS in the form ∫_S F · n dσ.]

(b) Use part (a) to show that if S contracts to a point x₀ in such a way that σ(S) tends to zero, then (1/σ(S)) ∫_S F · dS tends to F(x₀) · n₀, where n₀ is a unit normal to S at x₀.

16. Let f be a real-valued continuously differentiable function of one variable, nonnegative for a ≤ x ≤ b. The graph of f, rotated around the x-axis, generates a surface of revolution S in R³.

(a) Find a parametric representation for S in terms of f.

(b) Prove that σ(S) = 2π ∫_a^b f(x) √( 1 + (f′(x))² ) dx.

17. The solid angle determined by one nappe of a solid cone C in R³, with vertex at the origin, is defined to be the area of the intersection of C with the unit sphere |x| = 1. See Fig. 19.

Figure 19

(a) Show that a suitable reduction of the above definition leads to the usual definition of the angle between two lines.

(b) Compute the solid angle determined by the cone x² + y² ≤ 2z², 0 ≤ z. [Ans. 2π(1 − 3^(−1/2)).]

SECTION 4

STOKES'S THEOREM

An important extension of Green's theorem can be made as follows. Instead of considering a plane region D bounded by a curve, we can think of lifting such a region, together with its boundary curve, into a 2-dimensional surface S in R³. Then S will have as its border a space curve γ corresponding to the boundary of D. The lifting is made precise by defining on D and its piecewise smooth boundary a function R² -g-> R³ having S as the image of D. A typical picture is shown in Fig. 20.

Figure 20

The region D has its boundary oriented counterclockwise, and γ, the border curve of S, inherits what we have called the positive orientation with respect to S. If we parametrize the boundary of D by R -h-> R², for a ≤ t ≤ b, then the composition g(h(t)) will describe the border of S. We shall denote the positively oriented border of S by ∂S.

We can now relate the line integral of a vector field F around ∂S to the surface integral of an associated vector field over S. We assume that

R³ -F-> R³ is a continuously differentiable vector field whose domain contains S. Then the vector field curl F is defined by

curl F(x) = ( ∂F₃/∂x₂ (x) − ∂F₂/∂x₃ (x), ∂F₁/∂x₃ (x) − ∂F₃/∂x₁ (x), ∂F₂/∂x₁ (x) − ∂F₁/∂x₂ (x) ),

where F₁, F₂, and F₃ are the coordinate functions of F. If the domain of F is an open set, then the domain of curl F is the same set. The vector field curl F has been chosen so that if S is a piece of sufficiently smooth surface, then it will turn out that

∫_S curl F · dS = ∮_{∂S} F · dx.    (1)

Notice that if F were essentially a 2-dimensional vector field, with F₃ identically zero and F₁ and F₂ independent of x₃, then only the third coordinate function of curl F would be different from zero, and Equation (1) would reduce to Green's formula.
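The coordinate definition of curl translates directly into a short symbolic routine. The sketch below is not from the text; it assumes Python with sympy, and the two fields it checks are chosen for illustration (the first is the field used in Example 1 below).

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')

def curl(F, vars=(x1, x2, x3)):
    # curl from the coordinate definition above
    F1, F2, F3 = F
    a, b, c = vars
    return (sp.diff(F3, b) - sp.diff(F2, c),
            sp.diff(F1, c) - sp.diff(F3, a),
            sp.diff(F2, a) - sp.diff(F1, b))

print(curl((x3, x1, x2)))                  # (1, 1, 1), as used in Example 1 below
print(curl((-x2, x1, sp.Integer(0))))      # (0, 0, 2): planar field, only the third coordinate is nonzero
```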

Example 1. Let S be the spiral surface parametrized by

g(u, v) = (u cos v, u sin v, v)    for 0 ≤ u ≤ 1,  0 ≤ v ≤ π/2.

Then the border of S consists of three line segments and a spiral curve, shown in Fig. 21 together with the domain D of the parametrization.

Figure 21
Restricting the parametrization of S to the boundary of D gives the following parametrizations of the smooth pieces of the border of S:

γ₁:  g(1 − t, π/2) = (0, 1 − t, π/2),    0 ≤ t ≤ 1,
γ₂:  g(0, π/2 − t) = (0, 0, π/2 − t),    0 ≤ t ≤ π/2,
γ₃:  g(t, 0) = (t, 0, 0),                0 ≤ t ≤ 1,
γ₄:  g(1, t) = (cos t, sin t, t),        0 ≤ t ≤ π/2.

Now let F be the vector field F(x₁, x₂, x₃) = (x₃, x₁, x₂). The line integrals of F over the γ_i are all of the form

∫ x₃ dx₁ + x₁ dx₂ + x₂ dx₃.

It is easy to see that the integrals over γ₁, γ₂, and γ₃ are all zero, while over γ₄ we get

∫_{γ₄} F · dx = ∫₀^{π/2} ( cos²t + sin t − t sin t ) dt = π/4.

On the other hand, curl F(x₁, x₂, x₃) = (1, 1, 1); so the integral of curl F over S is

∫_D ( ∂(x₂, x₃)/∂(u, v) + ∂(x₃, x₁)/∂(u, v) + ∂(x₁, x₂)/∂(u, v) ) du dv
    = ∫₀^1 du ∫₀^{π/2} ( sin v − cos v + u ) dv = π/4.

This verifies Equation (1) for our special example.
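Both sides of Equation (1) for this example can also be checked by machine. The sketch below is not from the text; it assumes Python with sympy and the parametrization g(u, v) = (u cos v, u sin v, v) used above.

```python
import sympy as sp

# Both sides of Equation (1) for Example 1 evaluate to pi/4.
u, v, t = sp.symbols('u v t')
g = sp.Matrix([u*sp.cos(v), u*sp.sin(v), v])
normal = g.diff(u).cross(g.diff(v))            # (sin v, -cos v, u)
curlF = sp.Matrix([1, 1, 1])                    # curl of F = (x3, x1, x2)
surface = sp.integrate(sp.integrate(curlF.dot(normal), (v, 0, sp.pi/2)), (u, 0, 1))

gamma4 = sp.Matrix([sp.cos(t), sp.sin(t), t])   # the spiral piece of the border
F_on_gamma = sp.Matrix([gamma4[2], gamma4[0], gamma4[1]])
line = sp.integrate(F_on_gamma.dot(gamma4.diff(t)), (t, 0, sp.pi/2))

print(surface, line)                            # both print pi/4
```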

The proof that we give of Stokes's theorem depends on an application of Green's theorem to the region D on which the parametrization of S is defined. For this reason we need to assume enough about D to make Green's theorem hold on it. Also, if R² -g-> R³ is the parametrization of S, we shall want the second-order partial derivatives of g to be continuous, that is, g should be twice continuously differentiable on D. These conditions can be relaxed, but to do so makes the proof much more difficult.

4.1 Stokes's Theorem

Let S be a piece of smooth surface in R³, parametrized by a twice continuously differentiable function g. Assume that D, the parameter domain of g, is a finite union of simple regions bounded by a piecewise smooth curve. If F is a continuously differentiable vector field defined on S, then

∫_S curl F · dS = ∮_{∂S} F · dx,

where ∂S is the positively oriented border of S.

Proof. Let F₁, F₂, F₃ be the coordinate functions of F. We shall prove that

∮_{∂S} F₁ dx₁ = ∫_S −(∂F₁/∂x₂) dx₁ dx₂ + (∂F₁/∂x₃) dx₃ dx₁.    (2)

The proofs that

∮_{∂S} F₂ dx₂ = ∫_S −(∂F₂/∂x₃) dx₂ dx₃ + (∂F₂/∂x₁) dx₁ dx₂

and

∮_{∂S} F₃ dx₃ = ∫_S −(∂F₃/∂x₁) dx₃ dx₁ + (∂F₃/∂x₂) dx₂ dx₃

are similar, and addition of the three equations gives Stokes's formula. To prove the top equation, suppose that h(t) = (u(t), v(t)) is a counterclockwise-oriented parametrization of b, the boundary of D. Then g(h(t)) is a piecewise smooth parametrization of the border of S, which by definition is then positively oriented. Writing g₁, g₂, g₃ for the coordinate functions of g, we have

∮_{∂S} F₁ dx₁ = ∫_a^b F₁(g(h(t))) (d/dt) g₁(h(t)) dt
            = ∮_b (F₁ ∘ g) ∂g₁/∂u du + (F₁ ∘ g) ∂g₁/∂v dv.

This last integral is a line integral around the region D in R², and we can apply Green's theorem to it, getting

∮_{∂S} F₁ dx₁ = ∫_D [ ∂/∂u ( (F₁ ∘ g) ∂g₁/∂v ) − ∂/∂v ( (F₁ ∘ g) ∂g₁/∂u ) ] du dv.    (3)

The fact that g is twice continuously differentiable ensures that the integral over D will exist. The same fact enables us to interchange the order of partial differentiation in a computation which shows that

∂/∂u ( (F₁ ∘ g) ∂g₁/∂v ) − ∂/∂v ( (F₁ ∘ g) ∂g₁/∂u )
    = −(∂F₁/∂x₂ ∘ g) ∂(g₁, g₂)/∂(u, v) + (∂F₁/∂x₃ ∘ g) ∂(g₃, g₁)/∂(u, v).    (4)

Substitution of this identity into Equation (3) gives Equation (2), thus completing the proof.

Using Stokes's theorem we can derive an interpretation for the vector field curl F that gives some information about F itself. Let x₀ be a point of an open set on which F is continuously differentiable. Let n₀ be an arbitrary unit vector pointing away from x₀, and construct a disk S_r of radius r centered at x₀ and perpendicular to n₀. This is shown in Fig. 22. Applying Stokes's theorem to F on the surface S_r and its border γ_r gives

∮_{γ_r} F · dx = ∫_{S_r} curl F · dS.

The value of the line integral is called the circulation of F around γ_r, and it measures the strength of the field tangential to γ_r. Thus, for small r, the circulation around γ_r is a measure of the tendency of the field near x₀ to rotate around the axis determined by n₀. On the other hand, the surface integral is, for small enough r, nearly equal to the dot product curl F(x₀) · n₀, multiplied by the area of S_r. See Problem 15 of the previous section. It follows that the circulation around γ_r will tend to be larger if n₀ points in the same direction as curl F(x₀). Thus we can think of curl F(x₀) as determining the axis about which the circulation of F is greatest near x₀. Similarly, |curl F(x₀)| measures the magnitude of the circulation around this axis near x₀.
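The limit statement about circulation can be illustrated numerically. The following sketch is not from the text; it assumes Python with numpy, and the field and center point are chosen only for illustration.

```python
import numpy as np

# For F = (0, x1**2, 0), curl F = (0, 0, 2*x1), so for a small circle gamma_r of
# radius r around x0 = (0.3, -0.1, 0.5) in a horizontal plane (n0 = (0, 0, 1)),
# the circulation divided by the disk area pi*r**2 should approach
# curl F(x0) . n0 = 0.6.

def circulation_over_area(r, center=(0.3, -0.1, 0.5), n=4000):
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x1 = center[0] + r * np.cos(t)
    dx2 = r * np.cos(t) * (2.0 * np.pi / n)     # dx2 along the circle
    circ = np.sum(x1**2 * dx2)                  # line integral of F . dx = x1**2 dx2
    return circ / (np.pi * r**2)

for r in (1.0, 0.1, 0.01):
    print(r, circulation_over_area(r))          # tends to 0.6
```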

The extension of Stokes's theorem to piecewise smooth orientable surfaces is very simple, though a little care is needed in defining the border of such a surface. Figure 23 illustrates the method. The surfaces S₁ and S₂ have their borders joined so as to produce a piecewise smooth positively oriented surface which we denote by S₁ ∪ S₂.

Figure 23

Recall that the surface integral of a vector field F over S₁ ∪ S₂ has already been defined by

∫_{S₁∪S₂} F · dS = ∫_{S₁} F · dS + ∫_{S₂} F · dS.

The piece of common border curve, indicated by a broken line in Fig. 23, will be traced in opposite directions, depending on whether the parametrization induced by S₁ or by S₂ is used. Hence, the respective line integrals of F over the common border will have opposite sign, and when the line integrals over ∂S₁ and ∂S₂ are added, the integrals over the common part will cancel, leaving a line integral over the rest of the borders of S₁ and S₂. It is this remaining part that we call the positively oriented border of S₁ ∪ S₂ and denote by ∂(S₁ ∪ S₂). (Thus in general ∂(S₁ ∪ S₂) ≠ ∂S₁ ∪ ∂S₂.) With this understanding we write Stokes's theorem in the form

∫_S curl F · dS = ∮_{∂S} F · dx,

for a piecewise smooth surface S.

Example 2. A sphere can be considered as a piecewise smooth surface on which all of the border curves cancel one another. In fact, if we parametrize a sphere S_a in R³ by

g(u, v) = (a sin v cos u, a sin v sin u, a cos v),    0 ≤ u ≤ 2π,  0 ≤ v ≤ π,

then the positively oriented "border" of the sphere consists of the half-circle shown in Fig. 24 traced once in each direction. Thus the half-circle corresponds to the segments u = 0 and u = 2π in the parameter domain. (What happens to the segments v = 0 and v = π?) The result is that a line integral over ∂S_a will be zero, and Stokes's theorem applied to a vector field F on S_a gives

∫_{S_a} curl F · dS = 0.

Figure 24

Figure 25

the strict converse is false. Using Stokes's theorem, we can prove another partial converse, in which the local condition is replaced by a different kind of restriction on the domain of the given field.

For this purpose we shall define a simply connected open set B in Rⁿ. Roughly, a set B is simply connected if every closed curve γ in B can be contracted to a point in such a way as to stay within B during the contraction. As γ contracts to a point, it sweeps out a surface S lying in B, and γ is the border of S. The region between two spheres shown in Fig. 25(a) is simply connected. However, the open ball with a hole punched through it is not simply connected, because any surface whose border encircles the hole must lie at least partly outside B. In R², the typical simply connected region is the inside of a closed curve, while the outside of such a curve is not simply connected. In Fig. 26(a), the curve γ is the border of the surface consisting of the part of the plane lying inside γ. However, the presence of the hole in Fig. 26(b) prevents a similar construction. More precisely, we shall say that an open set B is simply connected if every piecewise smooth closed curve γ lying in B is the border of some piecewise smooth orientable surface S lying in B, and with parameter domain a disk in R². We assume for applications that S is parametrized by

Figure 26

twice continuously differentiable functions. All the differentiability conditions assumed above are usually relaxed to just continuity, but, since we shall have no use for the broader definition, we shall not introduce a special term for the one involving differentiability.

Now we can prove the following.

4.3 Theorem

Let F be a continuously differentiable vector field defined on an open set B in R² or R³. If

1. B is simply connected, and

2. curl F is identically zero in B,

then F is a gradient field in B, that is, there is a real-valued function f such that F = ∇f.

Proof. By Theorem 2.4 it is enough to show that ∮_γ F · dx = 0 for every piecewise smooth curve γ lying in B. Because B is simply connected, there is a piecewise smooth surface S of which γ is the border and to which we can apply Stokes's theorem in either 2 or 3 dimensions. Thus

∮_γ F · dx = ∫_S curl F · dS = 0.
EXERCISES
1. Compute curl F if
(a) F(xlt x 2 x 3 ) = (x 2
,
- x\, xa - x\,x x - x 22 ).
(b) F(x,y,z) = (x,2y,3z).

2. Verify by computing both integrals that Stokes's theorem holds for the
vector field F(x lf x2 x3) =
, (x lr x 2 x 3 ) on
, a hemisphere centered at the
origin in Jl 3 .

3. (a) Verify that if F(x₁, x₂, x₃) is independent of x₃ and the third coordinate function of F is identically zero, then Stokes's formula, applied to a planar surface in the x₁x₂-plane, is the same as Green's formula.

(b) Consider the function R² -g-> R³ defined by

g(u, v) = (u cos v, u sin v, 0),    1 ≤ u ≤ 2,  0 ≤ v ≤ 4π.

If S is the image in R³ of g, give a precise description of the border of S.

(c) If F is a continuously differentiable vector field on S of the type described in part (a), use Stokes's theorem to compute the integral of F over the positively oriented border of S.

4. Show that the Stokes formula can be written in the form

∫_S curl F · n dσ = ∮_{∂S} F · t ds,

where n is a unit normal to S and t is a unit tangent to ∂S.

5. Use the result of Exercise 15 of the previous section and Stokes's theorem to prove that if F is a continuously differentiable vector field at x₀, then

lim_{r→0} (1/A(D_r)) ∮_{c_r} F · t ds = curl F(x₀) · n₀,

where D_r is a disk of radius r centered at x₀, n₀ is a unit normal to the disk, and c_r is the boundary of D_r.

^6. Prove that if F is a continuously differentiable vector field such that at


each point x of a piece of smooth surface S, the vector curl F(x) is tangent to
S, then the integral of F around the border of S is zero.

7. Let F be a differentiable vector field defined in an open subset B of R³. Use the decomposition of a square matrix A into symmetric and skew-symmetric parts given by A = ½(A + Aᵗ) + ½(A − Aᵗ) to show that for all y in R³

F′(x)y = S(x)y + ½ curl F(x) × y,

where S(x) is a symmetric matrix.

8. Let F be the gradient field of the Newtonian potential f(x, y, z) = (x² + y² + z²)^(−1/2). Show that near each point of the domain of F the circulation of F is zero.

9. Carry out the computation of the identity in Equation (4) of the


proof of Stokes's theorem.

10. Consider the cylindrical can C of radius 1 having an unspecified


smooth border and an orientation as shown in Fig. 27. Let
F(xlt x2 x3 ) = , (2x1, xl, 3x1). Wf> at is the value of the line integral
of F over the border of C?
11. Compute the where F(x lt x 2 x3) = (xl, — Jcf, xf),
integral of curl F, ,

over the hemisphere x\ x| = 1 x 3 > 0, by considering an


+ x\ — ,

integral over the disk that makes the hemisphere closed.


12. Show that the open subset of R² consisting of R² with the origin deleted is not simply connected by finding a vector field F for which curl F is identically zero, but such that F is not a gradient field. [Hint. See Problem 7 of Section 1.]

Figure 27



Figure 28

13. The open set in R² consisting of R² with two points x₁ and x₂ deleted is not simply connected. However, show that if F is any continuously differentiable vector field in such a region such that curl F = 0 there, then the integral of F over the smooth curve shown in Fig. 28 is equal to zero.

14. In the open subset B of R³ consisting of R³ with the origin deleted, show that a circle centered at the origin is the border of a piecewise smooth surface lying in B.

SECTION 5

GAUSS'S THEOREM

The Gauss theorem (described below) has many applications, some of which are discussed in Section 6. Like Stokes's theorem, it can be looked at as an extension of Green's theorem. We begin with a region R in R³ having as boundary a piecewise smooth surface S. Each piece of S will be parametrized by a continuously differentiable function R² -g-> R³ such that the normal vector ∂g/∂u × ∂g/∂v points away from R at each point of S. The boundary surface S is then said to have positive orientation, and we denote the positively oriented boundary of R by ∂R. To state the theorem, we consider a vector field F, continuously differentiable on R and its boundary. We define the divergence of F to be the real-valued function div F defined on R by

div F(x) = ∂F₁/∂x₁ (x) + ∂F₂/∂x₂ (x) + ∂F₃/∂x₃ (x),

where F₁, F₂, F₃ are the coordinate functions of F. Then the Gauss (or divergence) formula is

∫_R div F dV = ∫_{∂R} F · dS.

The Gauss formula is like Stokes's formula, Green's formula, and the formula

∫ grad f · dx = f(b) − f(a),

in that it relates an integral of some kind of derivative of a function to the behavior of that function on a boundary. In each case the orientation of
the boundary is important. For example, if we apply Gauss's theorem to the region R in R³ given by 1 ≤ |x| ≤ 2, then the oriented boundary ∂R must be such that its normal vectors on the outer sphere point away from the origin, and on the inner sphere point toward it, as shown in Fig. 29. We shall say that ∂R is positively oriented with respect to R if the normal vectors given by the parametrization of ∂R point away from R.

We shall prove Gauss's theorem for the case in which R is a finite union of simple regions, where a simple region in R³ is one whose boundary is crossed by a line parallel to a coordinate axis at most twice. The region between two spheres, shown in Fig. 29, is a union of eight simple regions, one in each octant.

5.1 Gauss's Theorem

Figure 29

Let R be a finite union of simple regions in R³, having a positively oriented piecewise smooth boundary ∂R. If F is a continuously differentiable vector field on R and ∂R, then

∫_R div F dV = ∫_{∂R} F · dS.

Proof. In terms of coordinate functions of F, Gauss's formula reads

∫_R ( ∂F₁/∂x₁ + ∂F₂/∂x₂ + ∂F₃/∂x₃ ) dx₁ dx₂ dx₃ = ∫_{∂R} F₁ dx₂ dx₃ + F₂ dx₃ dx₁ + F₃ dx₁ dx₂.

We assume first that R is a simple region and prove only the equation

∫_R ∂F₂/∂x₂ dx₁ dx₂ dx₃ = ∫_{∂R} F₂ dx₃ dx₁,

the proofs for the terms containing F₁ and F₃ being similar. Addition of the resulting equations will then prove the theorem for simple regions. Because R is simple, ∂R consists of the graphs of two functions, x₂ = s(x₁, x₃) and x₂ = r(x₁, x₃), perhaps together with pieces parallel to the x₂-axis, as shown in Fig. 30. Let

g(u, v) = (g₁(u, v), g₂(u, v), g₃(u, v)),    (u, v) in D,

Figure 30

be a parametrization for ∂R that orients it positively. Then, by the definition of the surface integral,

∫_{∂R} F₂ dx₃ dx₁ = ∫_D F₂(g₁, g₂, g₃) ∂(g₃, g₁)/∂(u, v) du dv,    (1)

and, on the sections of ∂R that are parallel to the x₂-axis, the normal vector to ∂R is perpendicular to the x₂-axis. Hence ∂(g₃, g₁)/∂(u, v), the second coordinate of the normal, is equal to zero, thus eliminating the part of the integral that is not on the graph of r or s. We now apply the change-of-variable theorem to the two remaining parts of the integral in Equation (1). The appropriate transformations are

(x₃, x₁) = (g₃(u, v), g₁(u, v)),

with (u, v) in either D_r or D_s, where D_r and D_s are the parts of D corresponding to the graphs of r and s. The Jacobian determinant ∂(g₃, g₁)/∂(u, v) is positive on the graph of r and negative on the graph of s, because it represents the x₂-coordinate of the outward normal. On D_r we have g₂(u, v) = r(x₁, x₃), while on D_s, g₂(u, v) = s(x₁, x₃). Using these facts, we get from the change-of-variable theorem and Equation (1),

∫_{∂R} F₂ dx₃ dx₁ = ∫_{R₂} F₂(x₁, s(x₁, x₃), x₃)(−1) dx₁ dx₃ + ∫_{R₂} F₂(x₁, r(x₁, x₃), x₃) dx₁ dx₃,

where R₂ is the plane region got by projecting R on the x₁x₃-plane. These last integrals are not surface integrals, but rather integrals over a plane set. Then, by the fundamental theorem of calculus,

∫_{∂R} F₂ dx₃ dx₁ = ∫_{R₂} [ ∫_{s(x₁,x₃)}^{r(x₁,x₃)} ∂F₂/∂x₂ (x₁, x₂, x₃) dx₂ ] dx₁ dx₃
                 = ∫_R ∂F₂/∂x₂ dx₁ dx₂ dx₃.

Similar arguments involving F₁ and F₃ complete the proof for simple regions, since the addition of the three resulting equations gives

∫_{∂R} F₁ dx₂ dx₃ + F₂ dx₃ dx₁ + F₃ dx₁ dx₂ = ∫_R ( ∂F₁/∂x₁ + ∂F₂/∂x₂ + ∂F₃/∂x₃ ) dx₁ dx₂ dx₃.

This is the Gauss formula in coordinate form.

The extension of Gauss's theorem to a finite union R of simple regions is essentially the same as the analogous extension of Green's theorem. In the present case, when two simple regions have a common boundary surface, the respective outward normals will be negatives of one another. The corresponding surface integrals are then negatives of one another, and so cancel out. The remaining surface integrals extend over the surface ∂R.

Example 1. Problem 6 of Section 3 consists of showing that the flux of the gradient field F of the potential function

f(x, y, z) = (x² + y² + z²)^(−1/2)

across a sphere of radius a, centered at the origin, is independent of a. Using Gauss's theorem we can prove something more general, and with a minimum of calculation. Thus, let S₁ and S₂ be any two piecewise smooth closed surfaces, one contained in the other, both containing the origin, and bounding a region R between them; for example, R might be the region between two spheres. A routine calculation of the gradient shows that F(x, y, z) = (x² + y² + z²)^(−3/2) (−x, −y, −z), and then that the divergence of this field is zero, i.e., div F = 0, everywhere except at the origin. In particular, div F = 0 throughout R. Applying Gauss's theorem to R gives

∫_{∂R} F · dS = ∫_R div F dV = 0.

But ∂R consists of S₁ with inward pointing normal and S₂ with outward pointing normal; so, with the understanding that S₁ stands for the inner surface with its outward normal restored, we get

∫_{∂R} F · dS = −∫_{S₁} F · dS + ∫_{S₂} F · dS = 0.

Thus the integrals over the outward-oriented surfaces are equal. To find the actual value, it is enough to compute it for one surface, say a sphere. The result is −4π.
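The two facts used in this example, div F = 0 away from the origin and the value −4π for the flux across a sphere, can be confirmed symbolically. The sketch below is not from the text; it assumes Python with sympy.

```python
import sympy as sp

# The gradient field of (x^2 + y^2 + z^2)^(-1/2) has divergence zero away from
# the origin, and its flux across a sphere of radius a is -4*pi, independent of a.
x, y, z, a, phi, theta = sp.symbols('x y z a phi theta', positive=True)
f = (x**2 + y**2 + z**2)**sp.Rational(-1, 2)
F = [sp.diff(f, v) for v in (x, y, z)]                  # F = -(x, y, z)/|x|^3
print(sp.simplify(sum(sp.diff(Fi, v) for Fi, v in zip(F, (x, y, z)))))   # div F = 0

g = sp.Matrix([a*sp.sin(phi)*sp.cos(theta), a*sp.sin(phi)*sp.sin(theta), a*sp.cos(phi)])
normal = g.diff(phi).cross(g.diff(theta))               # outward normal for this parametrization
F_on_S = -g / a**3                                      # F restricted to the sphere, where |x| = a
flux = sp.integrate(sp.integrate(F_on_S.dot(normal), (phi, 0, sp.pi)), (theta, 0, 2*sp.pi))
print(sp.simplify(flux))                                # -4*pi
```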

The divergence of a vector field F at a point x₀ can be interpreted as a measure of the tendency of the field to radiate outward from x₀, hence the term "divergence." To see this, consider a ball B_r of radius r, centered at an interior point x₀ of the set on which F is continuously differentiable. Dividing both sides of Gauss's formula by V(B_r) gives

(1/V(B_r)) ∫_{B_r} div F dV = (1/V(B_r)) ∫_{∂B_r} F · n dσ,

where n is the outward unit normal. As r tends to zero, the left side tends to div F(x₀). See Problem 17 of Chapter 6, Section 2. On the right side, the integral is the average rate of flow per unit of volume across the sphere of radius r centered at x₀. Hence the limit as r tends to zero is, by definition, the rate per unit of volume of flow outward from x₀. Because of its connection with the interpretation of divergence, Gauss's theorem is often called the divergence theorem.
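The limit of flux divided by volume can be watched numerically. The sketch below is not from the text; it assumes Python with numpy, and the field and center point are chosen only for illustration.

```python
import numpy as np

# For F(x, y, z) = (x**3, 0, 0), div F = 3*x**2, so the flux across a small
# sphere of radius r around x0 = (0.5, 0, 0), divided by the ball's volume,
# should approach div F(x0) = 0.75 as r -> 0.

def flux_over_volume(r, x0=np.array([0.5, 0.0, 0.0]), n=400):
    phi = (np.arange(n) + 0.5) * np.pi / n
    th = (np.arange(2 * n) + 0.5) * np.pi / n
    phi, th = np.meshgrid(phi, th, indexing="ij")
    nrm = np.stack([np.sin(phi) * np.cos(th), np.sin(phi) * np.sin(th), np.cos(phi)])
    pts = x0[:, None, None] + r * nrm                    # points on the sphere
    F = np.stack([pts[0]**3, np.zeros_like(pts[1]), np.zeros_like(pts[2])])
    dA = r**2 * np.sin(phi) * (np.pi / n) * (np.pi / n)  # surface area element
    flux = np.sum(np.sum(F * nrm, axis=0) * dA)
    return flux / (4.0 / 3.0 * np.pi * r**3)

for r in (1.0, 0.1, 0.01):
    print(r, flux_over_volume(r))                        # tends to 0.75
```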

EXERCISES
1. Compute the divergence of the following vector fields:

(a) F(x₁, x₂, x₃) = (x₁², x₂², x₃²).

(b) F(x, y, z) = (sin xy, 0, 0).

(c) F(x₁, x₂, x₃) = (x₂, x₃, x₁).

2. Prove the following identities for any twice continuously differentiable vector
field For real-valued function/.

(a) div (curl F)(x) = 0.


(b) curl (grad/)(x) = 0.
3. (a) Show that for f(x, y, z) = (x² + y² + z²)^(−1/2) the equation

div (grad f)(x) = 0

holds for all x ≠ 0.

(b) Show by example that

div (grad f)(x) ≠ 0

may hold for some twice continuously differentiable function f.

(c) If the operator Δ is defined by

Δf = div (grad f),

find a formula for Δf in terms of partial derivatives of f. (A function such that Δf(x) = 0 for all x in the domain of f is called harmonic, and Δ is called the Laplace operator.)

4. The trace of a square matrix is defined as the sum of the elements on its main
diagonal. If & n — > R n
is a differentiate vector field, we define div F to be
the real-valued function given by

div F(x) = tr F'(x),

where tr A stands for the trace of A. Show that in the 2- and 3-dimensional
cases this definition agrees with those previously given.

5. Use Gauss's theorem to compute

∫_S F · dS

over the sphere of radius 1 centered at the origin in R³, and with outward pointing normal, where F is

(a) F(x₁, x₂, x₃) = (x₁³, x₂³, x₃³).

(b) F(x, y, z) = (xz², 0, z³).

6. Show that for a region R to which Gauss's theorem applies, the volume of R is given by

V(R) = (1/3) ∫_{∂R} x₁ dx₂ dx₃ + x₂ dx₃ dx₁ + x₃ dx₁ dx₂.
7. (a) Use Gauss's theorem to prove that if F is a continuously differentiable
vector field with zero divergence in a region R, then the integral of F
over dR is zero,
(b) Write an intuitive argument, based on the interpretation of the divergence,
for the assertion in part (a).

8. Let S be the ellipsoid

x²/a² + y²/b² + z²/c² = 1,

and let D(x, y, z) be the distance from the origin to the tangent plane to S at (x, y, z).

(a) Show that if F(x, y, z) = (x/a², y/b², z/c²), then F · n = D⁻¹, where n is the outward unit normal to S at (x, y, z).

(b) Show that

∫_S D⁻¹ dσ = (4π/3) ( bc/a + ca/b + ab/c ).

9. A vector field R³ -F-> R³ defined in a region R is called irrotational if curl F(x) = 0 for all x in R, and incompressible if div F(x) = 0 for all x in R. Assume F continuously differentiable.

(a) Show that if F is irrotational, then the circulation of F is zero around every sufficiently small circular path in R.

(b) Show that if F is incompressible, then the flux of F is zero across every sufficiently small sphere with its interior in R.
SECTION 6

THE OPERATORS ∇, ∇×, AND ∇·

To facilitate the application of the Gauss and Stokes theorems, it is helpful to extend the use of the symbol ∇, called "del," that is used in denoting the gradient field of a real-valued function. In terms of the natural basis e₁, e₂, e₃ for R³, we recall that

∇f = (∂f/∂x₁) e₁ + (∂f/∂x₂) e₂ + (∂f/∂x₃) e₃.    (1)

This equation defines ∇ as an operator from real-valued differentiable functions R³ -f-> R to vector fields R³ → R³. If we write

∇ = (∂/∂x₁) e₁ + (∂/∂x₂) e₂ + (∂/∂x₃) e₃,    (2)

then Equation (1) follows by application of both sides of Equation (2) to f.

The formalism described above makes the following definitions natural. If F is a differentiable vector field given by

F(x) = F₁(x)e₁ + F₂(x)e₂ + F₃(x)e₃,

then the operator ∇× is defined by taking the formal cross-product of ∇ and F to get

∇×F = ( ∂F₃/∂x₂ − ∂F₂/∂x₃ ) e₁ + ( ∂F₁/∂x₃ − ∂F₃/∂x₁ ) e₂ + ( ∂F₂/∂x₁ − ∂F₁/∂x₂ ) e₃ = curl F,

and the operator ∇· by taking the formal dot product, so that ∇·F = ∂F₁/∂x₁ + ∂F₂/∂x₂ + ∂F₃/∂x₃ = div F. In this notation Stokes's formula becomes

6.1        ∫_S (∇×F) · dS = ∮_{∂S} F · dx,

and Gauss's formula becomes

6.2        ∫_R ∇·F dV = ∫_{∂R} F · n dσ.

To exploit these formulas fully we need some identities involving ∇. In the formulas below, f and g are real-valued differentiable functions, F and G are differentiable vector fields, and a and b are constants.

∇(af + bg) = a∇f + b∇g    (4)

∇(fg) = f∇g + g∇f    (5)

∇×(aF + bG) = a∇×F + b∇×G    (6)

∇×(fF) = f∇×F + ∇f×F    (7)

∇·(aF + bG) = a∇·F + b∇·G    (8)

∇·(fF) = f∇·F + ∇f·F    (9)

∇·(F×G) = (∇×F)·G − F·(∇×G).    (10)

Each of these formulas is an immediate consequence of the coordinate definitions of the operators. Using the same kind of proof establishes that if f and F are twice differentiable, then

∇·(∇×F) = 0,    (11)

∇×(∇f) = 0,    (12)

∇·∇f = ∇²f,    (13)

where ∇²f is the Laplacian of f, defined by

∇²f = ∂²f/∂x₁² + ∂²f/∂x₂² + ∂²f/∂x₃².

(Equations (11), (12), and (13) are the same as those in Problems 2 and 3 of the previous section, where the alternative symbol Δ was used for the Laplace operator.)
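Several of these identities can be spot-checked by machine for generic smooth f and F. The sketch below is not from the text; it assumes Python with sympy and its sympy.vector module.

```python
import sympy as sp
from sympy.vector import CoordSys3D, gradient, divergence, curl

# Spot-checking identities (7), (9), (11), and (12) for generic smooth f, F1, F2, F3.
N = CoordSys3D('N')
f = sp.Function('f')(N.x, N.y, N.z)
F = (sp.Function('F1')(N.x, N.y, N.z) * N.i +
     sp.Function('F2')(N.x, N.y, N.z) * N.j +
     sp.Function('F3')(N.x, N.y, N.z) * N.k)

check7 = curl(f * F) - f * curl(F) - gradient(f).cross(F)
check9 = divergence(f * F) - f * divergence(F) - gradient(f).dot(F)
print(sp.simplify(check7.to_matrix(N)))               # (7): zero column
print(sp.simplify(check9))                            # (9): 0
print(sp.simplify(divergence(curl(F))))               # (11): 0
print(sp.simplify(curl(gradient(f)).to_matrix(N)))    # (12): zero column
```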
The formulas given above can be used to derive many special cases of the Gauss and Stokes theorems. A particularly important kind arises if the vector field F is assumed to be a gradient ∇f or a multiple f∇g. If we set F = ∇f in Formula 6.2, the result is

∫_R ∇·∇f dV = ∫_{∂R} ∇f · n dσ.    (14)

But by Equation (13), ∇·∇f = ∇²f, and by Equation 2.1 of Chapter 4,
∇f · n = ∂f/∂n. Thus we have

∫_R ∇²f dV = ∫_S ∂f/∂n dσ.    (15)

If we replace F in Formula 6.2 by f∇g, instead of by ∇f, we have from Equation (9)

∇·(f∇g) = f∇·∇g + ∇f·∇g;

and so Gauss's formula yields

6.3        ∫_R f∇²g dV + ∫_R ∇f·∇g dV = ∫_S f ∂g/∂n dσ.

This is called Green's first identity. Because of the symmetry in the middle term, interchange of f and g and subtraction of the corresponding terms gives Green's second identity.

6.4        ∫_R ( f∇²g − g∇²f ) dV = ∫_S ( f ∂g/∂n − g ∂f/∂n ) dσ.

Example 1. Let R be a polygonally connected region in R³ with a piecewise smooth boundary surface S. If h is a real-valued function defined in R, we consider a Poisson equation

∇²u = h,

subject to a preassigned boundary condition, u(x) = φ(x) for x on S. We suppose that there is at least one solution u(x) defined in R and satisfying the boundary condition. We can show, using Green's first identity, that such a solution must be unique. Let us suppose that there were two solutions u₁ and u₂; then the function u defined by u(x) = u₁(x) − u₂(x) would satisfy the Laplace equation ∇²u = 0 in R, together with the boundary condition u(x) = 0 on S. Setting f = g = u in Formula 6.3 gives

∫_R u ∇²u dV + ∫_R |∇u|² dV = ∫_S u ∂u/∂n dσ.

But the first and last terms are zero because ∇²u = 0 in R and u = 0 on S. It follows from ∫_R |∇u|² dV = 0 that ∇u = 0 identically on R and S. Hence u must be a constant in the polygonally connected region R. Finally, u must in fact be identically zero because u(x) = 0 for x on S. We remark that the Laplace equation is the special case of the Poisson equation obtained by taking h identically zero; thus we have proved a uniqueness theorem for the Laplace equation also.


Example 2. As an application of Green's second identity, we consider the following problem. Let D be a differential operator acting on functions f defined in a region R in R³ having a piecewise smooth boundary S. If we let ℱ be the vector space of continuous functions defined on R together with S, we can define an inner product on ℱ by

(f, g) = ∫_R f g dV.

It is easy to check that the integral is indeed an inner product. With respect to the inner product, it is often important to know when the operator D is symmetric in the sense that

(Df, g) = (f, Dg),

whenever Df and Dg are defined and continuous. See Exercise 17 of Chapter 5, Section 2 for the analog for finite dimensional spaces. In general, to solve such a problem it is necessary to impose appropriate boundary conditions on the functions f in ℱ. Thus we can show that the Laplace operator ∇² is a symmetric operator acting on the subspace of ℱ consisting of continuous functions f that satisfy a fixed boundary condition of the form

α₁ f(x) + α₂ ∂f/∂n (x) = 0.    (16)

In Equation (16), α₁ and α₂ are constants, not both zero, and ∂/∂n denotes differentiation with respect to the outward unit normal on S.

We suppose that f and g are two functions satisfying Equation (16) such that ∇²f and ∇²g are continuous. Then Formula 6.4, Green's second identity, shows that (∇²f, g) = (f, ∇²g) if and only if

∫_S ( f ∂g/∂n − g ∂f/∂n ) dσ = 0.    (17)

Since f and g both satisfy Equation (16), we have

α₁ f g + α₂ g ∂f/∂n = 0,
α₁ f g + α₂ f ∂g/∂n = 0,

and subtraction of one from the other gives

α₂ ( g(x) ∂f/∂n (x) − f(x) ∂g/∂n (x) ) = 0,    for all x on S.

If α₂ ≠ 0, this implies g(∂f/∂n) − f(∂g/∂n) = 0 identically on S; so Equation (17) is satisfied. If, on the other hand, α₂ = 0, then α₁ is not zero, and the boundary condition implies that both f and g are identically zero on S. But then Equation (17) is still satisfied. Thus we have shown that ∇² is symmetric on this subspace of ℱ. Notice that ∇²f is not defined for all f in ℱ, but only for sufficiently differentiable f.

From Formula 6.2 we can derive some equations for vector-valued integrals. For the definition and properties of the integral, see Problem 16 of Chapter 6, Section 2. Let k be an arbitrary constant vector and let F(x) = f(x)k, where f is real-valued and continuously differentiable on a region R. Then, because ∇·(fk) = ∇f·k (Verify!), Formula 6.2 becomes

∫_R ∇f·k dV = ∫_{∂R} f k·n dσ.

Since k is constant,

k · ∫_R ∇f dV = k · ∫_{∂R} f n dσ,

and, from the fact that k is arbitrary, we conclude that

∫_R ∇f dV = ∫_{∂R} f n dσ.    (18)

Similarly, replacing F in Formula 6.2 by k×F, where k is a constant vector, we can show that

∫_R ∇×F dV = ∫_{∂R} n×F dσ.    (19)

The proof is left as an exercise.
We conclude the section with a description of the expressions for gradient, divergence, and curl with respect to an orthogonal curvilinear coordinate system. Thus we assume that all coordinate curves intersect at right angles, which means that the natural tangent vectors c₁, c₂, c₃ are perpendicular at each point. If we are given a real-valued function f or a vector field F defined in a region D of R³, then for practical convenience we may want to substitute the curvilinear coordinate variables u₁, u₂, u₃ using a transformation

(x₁, x₂, x₃) = T(u₁, u₂, u₃).

This leads us to consider a new real-valued function f∘T, defined in the region of the u-space corresponding to D. For the vector field, we make an additional change and express it in terms of the unit vectors

(1/h₁)c₁,    (1/h₂)c₂,    (1/h₃)c₃,

where h_i = |c_i|. We get coordinate functions F₁, F₂, F₃ satisfying

F∘T = (F₁/h₁)c₁ + (F₂/h₂)c₂ + (F₃/h₃)c₃.
Rather lengthy computation using the chain rule gives the formulas:

6.5        (∇f)∘T = (1/h₁) ∂(f∘T)/∂u₁ c₁ + (1/h₂) ∂(f∘T)/∂u₂ c₂ + (1/h₃) ∂(f∘T)/∂u₃ c₃

6.6        (∇·F)∘T = (1/(h₁h₂h₃)) ( ∂(h₂h₃F₁)/∂u₁ + ∂(h₃h₁F₂)/∂u₂ + ∂(h₁h₂F₃)/∂u₃ )

6.7        (∇×F)∘T = ±(1/(h₁h₂h₃)) [ ( ∂(h₃F₃)/∂u₂ − ∂(h₂F₂)/∂u₃ ) c₁
                                     + ( ∂(h₁F₁)/∂u₃ − ∂(h₃F₃)/∂u₁ ) c₂
                                     + ( ∂(h₂F₂)/∂u₁ − ∂(h₁F₁)/∂u₂ ) c₃ ]

In the last formula, the plus sign is chosen if the c's form a right-handed system, and the minus sign is chosen otherwise. Of course the c's and the h's may vary from point to point. The formulas are no more complicated than they are because we have assumed the c's perpendicular. As a result, the product h₁h₂h₃ can be interpreted as the volume of the rectangular box spanned by c₁, c₂, and c₃. In the case of rectangular coordinates in R³, the transformation T becomes the identity, and the resulting formulas reduce to the corresponding definitions in terms of rectangular coordinates.

Example 3. Introducing spherical coordinates in R³, we have from Chapter 4, Section 5, Example 4, that

h₁ = 1,    h₂ = r,    h₃ = r sin φ.

Then at a point (x, y, z) in R³ with spherical coordinates (r, φ, θ) we have

∇f(x, y, z) = ∂f/∂r (r, φ, θ) c₁ + (1/r) ∂f/∂φ (r, φ, θ) c₂ + (1/(r sin φ)) ∂f/∂θ (r, φ, θ) c₃

and

∇·F(x, y, z) = (1/(r² sin φ)) ( ∂(r² sin φ F₁)/∂r + ∂(r sin φ F₂)/∂φ + ∂(r F₃)/∂θ )(r, φ, θ).

Using the fact that the Laplacian is defined by ∇²f = ∇·∇f, we find

∇²f(x, y, z) = (1/(r² sin φ)) [ sin φ ∂/∂r ( r² ∂f/∂r (r, φ, θ) ) + ∂/∂φ ( sin φ ∂f/∂φ (r, φ, θ) ) + (1/sin φ) ∂²f/∂θ² (r, φ, θ) ].

In checking this derivation, it is important to remember that Equation 6.6 deals with coordinate functions relative to unit vectors in the direction of the c's and not relative to the c's themselves.
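The spherical form of the Laplacian can be compared with the rectangular one on a test function. The sketch below is not from the text; it assumes Python with sympy, and the test function is chosen only for illustration.

```python
import sympy as sp

# Check that the spherical-coordinate Laplacian of Example 3 agrees with the
# Cartesian Laplacian for the sample function f(x, y, z) = x*z + y**2.
r, phi, theta = sp.symbols('r phi theta', positive=True)
x, y, z = r*sp.sin(phi)*sp.cos(theta), r*sp.sin(phi)*sp.sin(theta), r*sp.cos(phi)

fxyz = lambda X, Y, Z: X*Z + Y**2
fbar = fxyz(x, y, z)                      # f composed with the coordinate transformation

spherical = (1/(r**2*sp.sin(phi))) * (
    sp.sin(phi)*sp.diff(r**2*sp.diff(fbar, r), r)
    + sp.diff(sp.sin(phi)*sp.diff(fbar, phi), phi)
    + sp.diff(fbar, theta, 2)/sp.sin(phi))

X, Y, Z = sp.symbols('X Y Z')
cartesian = sum(sp.diff(fxyz(X, Y, Z), v, 2) for v in (X, Y, Z)).subs({X: x, Y: y, Z: z})
print(sp.simplify(spherical - cartesian))  # 0
```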

EXERCISES
1. Verify the identities (4) through (10) of the text.

2. If F = (F₁, F₂, F₃) is a vector field, define the operator F·∇ to be

F₁ ∂/∂x₁ + F₂ ∂/∂x₂ + F₃ ∂/∂x₃.

(a) If F(x) = x, compute (x·∇)G(x), where G(x₁, x₂, x₃) = (x₁², x₁x₂, x₃).

(b) Show that in general (∇·F)G ≠ (F·∇)G.

(c) If k is a constant vector, show that

∇ × (k × F) = k(∇·F) − (k·∇)F,

where F is a differentiable vector field.

(d) Use part (c) to show that if F and G are differentiable, then

∇ × (F × G) = F(∇·G) − (F·∇)G − G(∇·F) + (G·∇)F.

3. Prove that if k is a constant vector, then

∇ × (k × x) = 2k.

4. Prove that

(a) ∇ (1/|x|) = −x/|x|³,    x ≠ 0.

(b) ∇² (1/|x|) = 0,    x ≠ 0.

5. If T(x) is the steady-state temperature at a point x of an open set R in R³, then the flux of the temperature gradient across any smooth closed surface in R is zero. Use this fact and Equation (15) to show that a steady-state temperature function that is twice continuously differentiable is harmonic, i.e., ∇²T = 0. [Hint. Suppose ∇²T(x₀) > 0. Show that ∇²T(x) > 0 in some ball centered at x₀.]

6. The boundary condition (16) of Example 2 may be generalized to

φ₁(x) f(x) + φ₂(x) ∂f/∂n (x) = 0,

where φ₁ and φ₂ are continuous functions satisfying φ₁²(x) + φ₂²(x) > 0. Show that ∇², the Laplace operator, is still symmetric with this more general condition.

7. Show that if ∇·F is identically zero in a ball B in R³, then the vector field G defined by

G(x) = ∫₀¹ [ F(tx) × (tx) ] dt

satisfies ∇ × G(x) = F(x) in B. The proof consists of the following steps.

(a) If G(x) = ∫₀¹ [ F(tx) × (tx) ] dt, then show that

∇ × G(x) = ∫₀¹ ∇ × [ F(tx) × (tx) ] dt.

[Hint. Apply the Leibnitz rule of Problem 7(a), Chapter 6, Section 2.]

(b) Show that ∇ × [ F(tx) × (tx) ] = 2tF(tx) + t² (d/dt)F(tx) by using the identity of Exercise 2(d).

(c) Show that ∇ × G(x) = F(x).

8. A vector field —
ft 3 defined in a region R is called solenoidal if, in some
Jl 3

neighborhood B of every point of R, the field Fcan be represented as the curl


of another field GB The field F is called incompressible if the divergence of
.

F is zero at every point of R. Show that a continuously differentiable field is

solenoidal if and only if it is incompressible. See Exercise 9 of the previous


section for the interpretation of incompressibility.

9. Consider the Newtonian potential function N(x) = |x|⁻¹ and its associated gradient field ∇N(x). (See Exercise 4.) Show that N(x) can be interpreted as the work done in moving a particle from ∞ to x along some smooth path through the field ∇N.

10. Let R be a bounded region in R³ and let p(x) be the density of material at the point x in R. Then the integral

N_p(x) = ∫_R ( p(y) / |x − y| ) dV

is called the Newtonian potential of the material distributed with density p throughout R.

(a) Show that if p is continuous on an open set R with a smooth boundary, then, for x not in R,

∇² N_p = 0.

That is, show that for x not in R, N_p(x) is harmonic.

(b) Show that under the above assumptions on p and R, the integral N_p(x) exists as an improper integral when x is in R.

11. (a) Use Equations 6.5 and 6.6 to show that the Laplacian in cylindrical coordinates has the form

∇²f(x, y, z) = (1/r²) [ r ∂/∂r ( r ∂f/∂r (r, θ, z) ) + ∂²f/∂θ² (r, θ, z) + r² ∂²f/∂z² (r, θ, z) ].

(b) Show that if f(x, y, z) is independent of z, then in cylindrical coordinates

∇²f = ∂²f/∂r² (r, θ) + (1/r²) ∂²f/∂θ² (r, θ) + (1/r) ∂f/∂r (r, θ).

Compare with Chapter 4, Section 5, Exercise 13.

(c) Show that if f(x, y, z) can be written as a function f(√(x² + y²)), then

∇²f(x, y, z) = ∂²f/∂r² (r) + (1/r) ∂f/∂r (r).
12. Show that if f(x, y, z) can be written as a function f(√(x² + y² + z²)), then

∇²f(x, y, z) = ∂²f/∂r² (r) + (2/r) ∂f/∂r (r).

SECTION 7

DIFFERENTIAL FORMS

Having defined the line integral in Chapter 3, we observed that it could be abbreviated

∫_γ F₁ dx₁ + F₂ dx₂ + F₃ dx₃

in the 3-dimensional case. Our purpose here is to show that the integrand F₁ dx₁ + F₂ dx₂ + F₃ dx₃ has an interpretation which leads to another way of looking at the line integral. From there we can go naturally to a definition of the surface integral.

We shall denote by dx_k the function that assigns to a vector a in Rⁿ its kth coordinate. Thus if a = (a₁, ..., a_k, ..., a_n) is in Rⁿ, then dx_k(a) = a_k. In particular, if a = (−1, 1, 3), then dx₁(a) = −1, dx₂(a) = 1, and dx₃(a) = 3. Geometrically, dx_k(a) is the length, with appropriate sign, of the projection of a on the kth coordinate axis, as shown in Fig. 31. Linear combinations of the functions dx_k with constant coefficients produce new functions

c₁ dx₁ + ... + c_n dx_n.

Going one step further, if F₁, ..., F_n are real-valued functions defined in a

Figure 31

region D of Rⁿ, we can form for each x in D the linear combination

ω_x = F₁(x) dx₁ + ... + F_n(x) dx_n,    (1)

where ω_x acts on vectors a in Rⁿ by

ω_x(a) = F₁(x) dx₁(a) + ... + F_n(x) dx_n(a).    (2)

For example, if in R² we have ω_(x,y) = x² dx + y² dy, then ω_(x,y)(a, b) = ax² + by², and ω_(−1,3)(a, b) = a + 9b. A function ω_x as defined by Formula (2) defines a differential 1-form or, briefly, a 1-form.

Example 1. Let R³ -f-> R be a differentiable function in a region D of R³. Then d_x f, the differential of f at x, is a 1-form in D, for d_x f acting on a can be written, if a = (a₁, a₂, a₃),

d_x f(a) = ∂f/∂x₁ (x) a₁ + ∂f/∂x₂ (x) a₂ + ∂f/∂x₃ (x) a₃
        = ∂f/∂x₁ (x) dx₁(a) + ∂f/∂x₂ (x) dx₂(a) + ∂f/∂x₃ (x) dx₃(a).

Thus the coefficient functions F_k(x) in Formula (1) have the form F_k(x) = (∂f/∂x_k)(x). However, not every 1-form is the differential of a function. Recall that, for sufficiently differentiable f, we have ∂²f/(∂x₂∂x₁) = ∂²f/(∂x₁∂x₂), but that ∂F₁/∂x₂ ≠ ∂F₂/∂x₁ unless F₁ and F₂ are specially related, as they are by the requirement that ∇f = F for some f.

Example 2. If ω_x is a 1-form defined in a region D of R³, and γ is a differentiable curve lying in D and given by R -g-> R³ for a ≤ t ≤ b, we can at each point x = g(t) of γ apply the linear function ω_{g(t)} to the tangent
vector g′(t) at x. The result is a real number which we can express as

ω_{g(t)}(g′(t)) = F₁(g(t)) dx₁(g′(t)) + F₂(g(t)) dx₂(g′(t)) + F₃(g(t)) dx₃(g′(t))
             = F₁(g(t)) g₁′(t) + F₂(g(t)) g₂′(t) + F₃(g(t)) g₃′(t)
             = F(g(t)) · g′(t).

If we write

∫_a^b ω_{g(t)}(g′(t)) dt = ∫_a^b F(g(t)) · g′(t) dt,

we see that the right-hand integral is the line integral of the field F over γ.

The previous example leads us to the natural definition of the integral of a 1-form over a smooth curve γ. If ω_x is defined by Formula (1) in a region D of Rⁿ, and γ, lying in D, is parametrized by R -g-> Rⁿ for a ≤ t ≤ b, we can do either of two things. We can use the coefficient functions F₁, ..., F_n of ω_x to form a vector field F in D and define

∫_γ ω_x = ∫_a^b F(g(t)) · g′(t) dt,

or we can form a partition P of the interval a ≤ t ≤ b at points a = t₀ < t₁ < ... < t_K = b and define

∫_γ ω_x = lim_{m(P)→0} Σ_{k=1}^{K} ω_{g(t_k)}(g′(t_k))(t_k − t_{k−1}).

It is clear that the two formulas give the same definition of ∫_γ ω_x, the integral of the 1-form ω_x over γ.
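The two formulas can be compared numerically for a concrete 1-form and curve. The sketch below is not from the text; it assumes Python with numpy, and the form and the helix are chosen only for illustration.

```python
import numpy as np

# The 1-form w_x = x2 dx1 + x1 dx2 + dx3 integrated over the helix
# g(t) = (cos t, sin t, t), 0 <= t <= 2*pi, once as a line integral of the
# field F = (x2, x1, 1) and once as a Riemann sum over a partition.

def g(t):  return np.array([np.cos(t), np.sin(t), t])
def gp(t): return np.array([-np.sin(t), np.cos(t), 1.0])
def omega(x, a):                      # the 1-form at x applied to the vector a
    return x[1] * a[0] + x[0] * a[1] + a[2]

t = np.linspace(0.0, 2.0 * np.pi, 20001)
field_version = np.trapz([omega(g(s), gp(s)) for s in t], t)

dt = t[1] - t[0]
riemann_version = sum(omega(g(s), gp(s)) * dt for s in t[1:])
print(field_version, riemann_version, 2 * np.pi)   # exact value: 2*pi
```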
Next we define a product of 1-forms which is different from ordinary pointwise multiplication of functions. We first define the product of the basic 1-forms dx₁, dx₂, dx₃ in R³. The product dx₁ ∧ dx₂ is defined so that it is a function on ordered pairs of vectors in R³. Geometrically, dx₁ ∧ dx₂ (a, b) will be the area of the parallelogram spanned by the projections of a and b into the x₁x₂-plane. The sign of the area is determined so that if the projections of a and b have the same orientation as the positive x₁ and x₂ axes, then the area is positive; it is negative when these orientations are opposite. Such a projection is shown in Fig. 32. Thus, if a = (a₁, a₂, a₃) and b = (b₁, b₂, b₃), then

dx₁ ∧ dx₂ (a, b) = det ( a₁ b₁ ; a₂ b₂ ) = a₁b₂ − a₂b₁,

and the determinant automatically gives the area the correct sign. We can use the basic 1-forms dx₁ and dx₂ to write the last equation as

dx₁ ∧ dx₂ (a, b) = det ( dx₁(a) dx₁(b) ; dx₂(a) dx₂(b) ).
Figure 32

The definitions of all possible products of dx₁, dx₂, and dx₃ in either order can thus all be written in one formula:

dx_i ∧ dx_j (a, b) = det ( dx_i(a) dx_i(b) ; dx_j(a) dx_j(b) ),    (3)

with a similar geometric interpretation for each one. For example, we have

dx₂ ∧ dx₃ (a, b) = det ( a₂ b₂ ; a₃ b₃ ),

which is the signed area of the projection of the ab-parallelogram on the x₂x₃-plane.

As a consequence of Equation (3) and the properties of determinants, the following relations hold:

7.1        dx_i ∧ dx_j = −dx_j ∧ dx_i.

7.2        dx_i ∧ dx_i = 0.

The first equation holds because the interchange of adjacent rows of a determinant changes its sign; the second, because a determinant with two equal rows is zero. Similarly, we have, on interchanging columns,

7.3        dx_i ∧ dx_j (b, a) = −dx_i ∧ dx_j (a, b).
If we now ask for the most general linear combination of the functions dx_i ∧ dx_j, it is clear from 7.1 and 7.2 that it can be written in the form

c₁ dx₂ ∧ dx₃ + c₂ dx₃ ∧ dx₁ + c₃ dx₁ ∧ dx₂.

Furthermore, if F = (F₁, F₂, F₃) is a vector field in a region D of R³, we can define for each x in D the function

τ_x = F₁(x) dx₂ ∧ dx₃ + F₂(x) dx₃ ∧ dx₁ + F₃(x) dx₁ ∧ dx₂

of ordered pairs (a, b) of vectors in R³. The function τ_x is called a differential 2-form or 2-form.

Example 3. The 2-form

τ = 2 dx₂ ∧ dx₃ + dx₃ ∧ dx₁ + 5 dx₁ ∧ dx₂

is the same function at every point of R³ because its coefficients are constant. (We have written τ instead of τ_x, and we shall do this whenever explicit mention of the variable x is not needed.) Letting a = (1, 2, 3) and b = (0, 1, 1), we have

τ(a, b) = 2 det ( 2 1 ; 3 1 ) + det ( 3 1 ; 1 0 ) + 5 det ( 1 0 ; 2 1 ) = 2.

We recall, and it is easy to verify directly, that the vector a × b with coordinates

( det ( 2 1 ; 3 1 ), det ( 3 1 ; 1 0 ), det ( 1 0 ; 2 1 ) )

is perpendicular to (1, 2, 3) and to (0, 1, 1), and that its length is equal to the area of the parallelogram P spanned by a and b. Thus τ(a, b) = (2, 1, 5) · (a × b) and is thus the coordinate of (2, 1, 5) in the direction perpendicular to P, multiplied by the area of P. If we interpret (2, 1, 5) as the constant velocity vector of a fluid flow in space, τ(a, b) will be the total flow across P in one unit of time. Such a flow is shown in Fig. 33. For a flow F of constant speed and direction across a flat surface S, the flux is defined to be the normal coordinate F_n of F, times the area of S, and so is equal to (F · n)σ(S).

We can summarize what has just been done by pointing out that we have defined a multiplication, called the exterior product, of basic 1-forms dx₁, dx₂, etc. The resulting products, written dx_i ∧ dx_j, are basic 2-forms, in that every 2-form is a linear combination of them. In R² there is only one basic 2-form, dx₁ ∧ dx₂, while in R³ there are three of them. (How many in R⁴?) From here we can proceed to define the exterior product of any two 1-forms to be the 2-form got by multiplying the 1-forms as if they

Figure 33

were polynomials in the variables dx₁, dx₂, etc., and then using the rules 7.1 and 7.2 to simplify. In practice, the wedges are often omitted from the notation.

Example 4. To save writing subscripts, we shall denote basic 1-forms by dx, dy, dz, even though the latter notation is less convenient for formulating the general definition. We have

(x dx + y² dy) ∧ (dx + x dy)
    = x dx ∧ dx + y² dy ∧ dx + x² dx ∧ dy + xy² dy ∧ dy
    = −y² dx ∧ dy + x² dx ∧ dy + 0
    = (x² − y²) dx ∧ dy.

In three variables we have the example

(dx + dy + dz) ∧ (x dx + z dy)
    = x dx ∧ dx + x dy ∧ dx + x dz ∧ dx + z dx ∧ dy + z dy ∧ dy + z dz ∧ dy
    = −z dy ∧ dz + x dz ∧ dx + (z − x) dx ∧ dy.

Three-forms arise in attempting to define the product of a 2-form and a 1-form. The meaning of the basic 3-form dx₁ ∧ dx₂ ∧ dx₃ is that of a signed volume function. Thus if a = (a₁, a₂, a₃), b = (b₁, b₂, b₃), and c = (c₁, c₂, c₃), we define

dx₁ ∧ dx₂ ∧ dx₃ (a, b, c) = det ( a₁ b₁ c₁ ; a₂ b₂ c₂ ; a₃ b₃ c₃ ),

which is the 3-dimensional oriented volume of the parallelepiped spanned by the vectors a, b, and c. Higher-dimensional forms are defined in Exercise 11.
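The determinant description makes basic forms easy to evaluate by machine. The sketch below is not from the text; it assumes Python with numpy; the function covers basic p-forms for any p (compare Exercise 11 at the end of the section), and the vectors a, b are those of Example 3 while c is an arbitrary third vector added for illustration.

```python
import numpy as np

# Evaluate a basic p-form dx_{k1} ^ ... ^ dx_{kp} on vectors a_1, ..., a_p as
# the determinant of the matrix with entries dx_{ki}(a_j), as in Equation (3).

def basic_form(indices, vectors):
    vectors = np.array(vectors, dtype=float)
    return np.linalg.det(vectors[:, list(indices)].T)   # rows: the chosen coordinates

a, b = (1.0, 2.0, 3.0), (0.0, 1.0, 1.0)
# The 2-form of Example 3: 2 dx2^dx3 + dx3^dx1 + 5 dx1^dx2 (indices counted from 0)
tau = (2 * basic_form((1, 2), [a, b])
       + basic_form((2, 0), [a, b])
       + 5 * basic_form((0, 1), [a, b]))
print(tau)                                               # 2.0

c = (2.0, 1.0, 2.0)                                      # an arbitrary third vector
print(basic_form((0, 1, 2), [a, b, c]))                  # dx1 ^ dx2 ^ dx3 (a, b, c)
```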

A differential form of unspecified dimension will be called a p-form and denoted ω^p, σ^p, etc. As we did with the 1-form, we shall define the integral of a p-form in Rⁿ over the image S of a set in R^p under a continuously differentiable function g. To simplify the discussion, we suppose that R^p -g-> Rⁿ is differentiable on a closed bounded rectangle R in R^p. If ω^p is a p-form with coefficient functions defined on the image S = g(R) in Rⁿ, we can apply ω^p at a point g(u) to the derived vectors (∂g/∂u_k)(u) for k = 1, ..., p. The result looks like

ω^p_{g(u)} ( ∂g/∂u₁ (u), ..., ∂g/∂u_p (u) ).    (4)

Furthermore, if G is a grid over R with corner points u₁, ..., u_N and rectangles R₁, ..., R_N, we can form the sum

Σ_{k=1}^{N} ω^p_{g(u_k)} ( ∂g/∂u₁ (u_k), ..., ∂g/∂u_p (u_k) ) V(R_k),

where V(R_k) is the p-dimensional volume of R_k. We define the integral of ω^p over S by

∫_S ω^p = lim_{m(G)→0} Σ_{k=1}^{N} ω^p_{g(u_k)} ( ∂g/∂u₁ (u_k), ..., ∂g/∂u_p (u_k) ) V(R_k),

provided the limit exists in the Riemann sum sense. If g is continuously differentiable, and ω^p has continuous coefficient functions, then the function of u given by Formula (4) will be continuous on R, and ∫_S ω^p will exist and be equal to the Riemann integral of that function. Thus

∫_S ω^p = ∫_R ω^p_{g(u)} ( ∂g/∂u₁ (u), ..., ∂g/∂u_p (u) ) du₁ ... du_p.
Example 5. If

    ω^2 = F_1(x) dx_2 ∧ dx_3 + F_2(x) dx_3 ∧ dx_1 + F_3(x) dx_1 ∧ dx_2

has continuous coefficients in R^3, and g: R^2 → R^3 is continuously differentiable on R, with coefficient functions g_1, g_2, g_3, then

    ω^2_g(u) ( (∂g/∂u_1)(u), (∂g/∂u_2)(u) )
        = F_1 ∘ g · ∂(g_2, g_3)/∂(u_1, u_2) + F_2 ∘ g · ∂(g_3, g_1)/∂(u_1, u_2) + F_3 ∘ g · ∂(g_1, g_2)/∂(u_1, u_2).

Hence, in this case,

    ∫_S F_1 dx_2 ∧ dx_3 + F_2 dx_3 ∧ dx_1 + F_3 dx_1 ∧ dx_2
        = ∫_R [ F_1 ∘ g · ∂(g_2, g_3)/∂(u_1, u_2) + F_2 ∘ g · ∂(g_3, g_1)/∂(u_1, u_2) + F_3 ∘ g · ∂(g_1, g_2)/∂(u_1, u_2) ] du_1 du_2.

Except for smoothness conditions on S, assumed here for convenience,
this is the same as the definition of the surface integral given in Section 3.
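As a numerical illustration (not from the text; it assumes numpy), the sketch below approximates ∫_S ω^2 for ω^2 = dx ∧ dy over the surface parametrized by g(u, v) = (u, v, u^2 + v^2) on the unit square, by forming the Riemann sum in the definition on an N × N grid. Since ∂(g_1, g_2)/∂(u_1, u_2) = 1, the exact value is 1.

```python
# Riemann-sum sketch for the integral of the 2-form dx ^ dy over the
# image of the unit square under g(u, v) = (u, v, u^2 + v^2).
import numpy as np

def g(u, v):
    return np.array([u, v, u**2 + v**2])

def dg_du(u, v):
    return np.array([1.0, 0.0, 2*u])

def dg_dv(u, v):
    return np.array([0.0, 1.0, 2*v])

def omega(p, a, b):
    # dx ^ dy applied to the pair (a, b): the 2-by-2 determinant of
    # their first two coordinates.
    return a[0]*b[1] - a[1]*b[0]

N = 200
h = 1.0 / N
total = 0.0
for i in range(N):
    for j in range(N):
        u, v = (i + 0.5) * h, (j + 0.5) * h
        total += omega(g(u, v), dg_du(u, v), dg_dv(u, v)) * h * h

print(total)   # approximately 1.0
```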

EXERCISES
1. Find the value of each of the following differential forms acting on the indicated vector or ordered set of vectors.
   (a) dx_1 + 2 dx_2 ;  (1, 1).
   (b) 3 dx − dy + dz ;  (1, −1, 0).
   (c) dx_2 + dx_3 ;  (1, 3, −5).
   (d) dx_1 + 2 dx_2 + . . . + n dx_n ;  (1, −1, 1, −1, . . .).
   (e) 2 dx_1 ∧ dx_2 ;  ((1, 1), (1, −1)).
   (f) dy ∧ dy + 2 dx ∧ dz ;  ((1, 2, 1), (−1, 2, 3)).
   (g) dx ∧ dy ∧ dz ;  ((1, 1, 1), (1, 2, 1), (2, 1, 2)).
   (h) dx_1 ∧ dx_2 ;  ((−2, −3, 0), (2, 0, 2)).

2. Multiply out and simplify the following products.
   (a) (dx_1 + dx_2) ∧ (dx_1 − dx_2).
   (b) (2 dx + 3 dy − 2 dz) ∧ dx.
   (c) (dx + dy) ∧ (dx + dy).
   (d) (x^2 dx + z^2 dz) ∧ (dx − 2 dy).
   (e) (sin z dx + cos x dy) ∧ (dx + dz).

3. Compute ∫_γ y dx + x dy, where γ is given by g(t) = (cos t, sin t), 0 ≤ t ≤ π/2.

4. Compute ∫_γ x_1 dx_1 + x_2 dx_2 + x_3 dx_3, where γ is given by
   (a) g(t) = (−t, t, t^2) for −1 ≤ t ≤ 1.
   (b) h(t) = (t, t, t) for 0 ≤ t ≤ 1.
   (c) k(t) = (t^2, t^2, t^2) for −1 ≤ t ≤ 1.

5. Compute ∫_γ ω, where ω = x_1 dx_1 + x_2 dx_2 + . . . + x_n dx_n and γ is given by g(t) = (t, t, t, . . .) for 0 ≤ t ≤ 1.

6. Let ω and ν be 1-forms defined in a region D of R^n, and let f and g be real-valued functions defined in D.
   (a) Show that fω + gν defines a 1-form in D.
   (b) Show that if ω, ν, and μ are 1-forms in D, then

       (fω + gν) ∧ μ = fω ∧ μ + gν ∧ μ.

7. Prove that if ω and ω̃ are 1-forms and if γ_1 and γ_2 are curves over which ω and ω̃ are integrable, then

       ∫_{γ_1} (aω + bω̃) = a ∫_{γ_1} ω + b ∫_{γ_1} ω̃,

   where a and b are constants, and

       ∫_{γ_1 + γ_2} ω = ∫_{γ_1} ω + ∫_{γ_2} ω.

8. Let f be a continuously differentiable real-valued function defined on a region D of R^n, and let γ be a continuously differentiable curve lying in D with parametrization g: R → R^n, for a ≤ t ≤ b. Show that if g(a) = a and g(b) = b, then

       ∫_γ df = f(b) − f(a).
9. (a) Prove that if ω^1 is a 1-form in a region D of R^n, then for each fixed x_0 in D, ω_{x_0} is a real-valued linear function on all of R^n.
   (b) Prove the converse to part (a), namely, that if ω_{x_0} is a real-valued linear function defined on all of R^n, then ω_{x_0}(a) = Σ_{k=1}^{n} c_k(x_0) dx_k(a) for all a in R^n, where the c_k(x_0) are real numbers.

10. (a) Let τ be a real-valued function of pairs (a, b) of vectors in R^3 such that τ(a, b) = −τ(b, a) and such that τ is linear in a and in b. Show that there is a vector c_τ in R^3 such that τ(a, b) = det (a, b, c_τ) for all a, b in R^3. [Hint. Show first that the result holds if a and b are in a basis for R^3.]
    (b) Use part (a) to show that if τ(a, b) = −τ(b, a), and τ is bilinear, then τ is a 2-form in R^3.

11. For an ordered p-tuple (a_1, a_2, . . . , a_p) of vectors in R^n, where p ≥ 1, define

        dx_{k_1} ∧ dx_{k_2} ∧ . . . ∧ dx_{k_p} (a_1, . . . , a_p) = det ( dx_{k_i}(a_j) ),   i, j = 1, . . . , p.

    This equation defines the basic p-forms in R^n, of which the general p-forms are linear combinations.
    (a) Compute dx_2 ∧ dx_3 ∧ dx_4 + 2 dx_1 ∧ dx_2 ∧ dx_4 (a, b, c), where a = (1, −1, 0, 2), b = (−1, 1, 1, 1), and c = (0, 1, 2, 0).
    (b) Prove that the interchange of adjacent factors in a basic p-form changes the sign of the form.
    (c) Prove that a basic p-form with a repeated factor is zero.
    (d) Prove that the general p-form can be written

        ω^p = Σ_{i_1 < . . . < i_p} f_{i_1 . . . i_p} dx_{i_1} ∧ . . . ∧ dx_{i_p},

    where 1 ≤ i_k ≤ n for k = 1, . . . , p.
    (e) Prove that if p > n, ω^p is identically zero.
    (f) Prove that there are n!/[p!(n − p)!] terms in the p-form of part (d).

12. If ω^p and ω^q are p- and q-forms in R^n with

        ω^p = Σ_{i_1 < . . . < i_p} f_{i_1 . . . i_p} dx_{i_1} ∧ . . . ∧ dx_{i_p}

    and

        ω^q = Σ_{j_1 < . . . < j_q} g_{j_1 . . . j_q} dx_{j_1} ∧ . . . ∧ dx_{j_q},

    define their exterior product ω^p ∧ ω^q by

        ω^p ∧ ω^q = Σ f_{i_1 . . . i_p} g_{j_1 . . . j_q} dx_{i_1} ∧ . . . ∧ dx_{i_p} ∧ dx_{j_1} ∧ . . . ∧ dx_{j_q}.

    (a) Prove that if ω^p ∧ ω^q is reduced to standard form by using Equations 7.1 and 7.2, then it has n!/[(p + q)!(n − p − q)!] terms.
    (b) Prove that ω^p ∧ ω^q = (−1)^{pq} ω^q ∧ ω^p.

13. Show that the definition of the integral of a p-form agrees with that given for a 1-form when p = 1.

14. Compute ∫_S ω^2 if ω^2 = dx ∧ dy + dx ∧ dz, and S is the image of 0 ≤ u ≤ 1, 0 ≤ v ≤ π/2 under g(u, v) = (u cos v, u sin v, v).

15. (a) If ω^3 is the 3-form f dx_1 ∧ dx_2 ∧ dx_3 in R^3, and g: R^3 → R^3 is differentiable, show that

        ω^3_g(u) ( ∂g/∂u_1, ∂g/∂u_2, ∂g/∂u_3 ) = f ∘ g · ∂(g_1, g_2, g_3)/∂(u_1, u_2, u_3).

    (b) Compute the integral ∫_S dx ∧ dy ∧ dz, where S is the image of 0 ≤ u ≤ 1, 0 ≤ v ≤ 1, 0 ≤ w ≤ 1 under g(u, v, w) = (u^2, v^2, w^2).

SECTION 8

THE EXTERIOR DERIVATIVE

The fundamental theorem of calculus states that if (d/dx)f is integrable on [a, b], then

    ∫_a^b (df/dx) dx = f(b) − f(a).    (1)

The Stokes and Gauss formulas,

    ∫_S curl F · n dσ = ∫_{∂S} F · t ds,    (2)

    ∫_R div F dV = ∫_{∂R} F · n dσ,    (3)

are similar in that they express the integral of a kind of derivative of a function in terms of the function itself on a set of lower dimension. Using differential forms, we shall define exterior differentiation, which unifies the above formulas.
The operation of exterior differentiation is defined inductively on differential forms as follows. Let f be a real-valued differentiable function on R^n. Then, by definition,

    df = (∂f/∂x_1) dx_1 + . . . + (∂f/∂x_n) dx_n.

Thus the exterior derivative of f is the particular 1-form that at each point of the domain of f is equal to what we have earlier called the differential of f. To continue, if

    ω^1 = f_1 dx_1 + . . . + f_n dx_n

is a 1-form with differentiable coefficients, then in terms of the 1-forms df_1, . . . , df_n we define

    dω^1 = (df_1) ∧ dx_1 + . . . + (df_n) ∧ dx_n.

Thus dω^1 is a 2-form. In general, if ω^p is a p-form, then dω^p is the (p + 1)-form got by replacing each coefficient function of ω^p by the 1-form that is its exterior derivative. To keep the terminology consistent, we may refer to a real-valued function as a 0-form.

Example 1. If f(x_1, x_2) = x_1^2 + x_2^3, then df is given by

    d(x_1^2 + x_2^3) = 2x_1 dx_1 + 3x_2^2 dx_2.

If ω^1_{(x_1, x_2)} = x_1 x_2 dx_1 + (x_1^2 + x_2^2) dx_2, then dω^1 is given by

    d(x_1 x_2 dx_1 + (x_1^2 + x_2^2) dx_2)
        = (x_2 dx_1 + x_1 dx_2) ∧ dx_1 + (2x_1 dx_1 + 2x_2 dx_2) ∧ dx_2
        = x_1 dx_1 ∧ dx_2.

If ω^2_{(x, y, z)} = xz dx ∧ dy + y^2 z dx ∧ dz, then dω^2 is given by

    d(xz dx ∧ dy + y^2 z dx ∧ dz)
        = (z dx + x dz) ∧ dx ∧ dy + (2yz dy + y^2 dz) ∧ dx ∧ dz
        = (x − 2yz) dx ∧ dy ∧ dz.
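The middle computation in Example 1 is easy to check symbolically. The sketch below (not part of the text; it assumes the sympy library) computes the single coefficient ∂f_2/∂x_1 − ∂f_1/∂x_2 of dx_1 ∧ dx_2 for ω^1 = f_1 dx_1 + f_2 dx_2, which is the two-variable formula derived as Equation (4) a little further on.

```python
# Symbolic check of d(omega^1) for omega^1 = x1*x2 dx1 + (x1^2 + x2^2) dx2:
# the dx1 ^ dx2 coefficient is df2/dx1 - df1/dx2, which should equal x1.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f1, f2 = x1*x2, x1**2 + x2**2
print(sp.simplify(sp.diff(f2, x1) - sp.diff(f1, x2)))   # prints x1
```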

Using the exterior derivative we can state the general Stokes formula in the form

    8.1    ∫_B dω^p = ∫_{∂B} ω^p,

where B is (p + 1)-dimensional and ∂B is its p-dimensional boundary. We shall interpret the formula in several specific cases.

If ω^1 is a 1-form in R^2, then we can write ω^1 in the form

    ω^1 = F_1 dx_1 + F_2 dx_2,

and then

    dω^1 = ( (∂F_1/∂x_1) dx_1 + (∂F_1/∂x_2) dx_2 ) ∧ dx_1 + ( (∂F_2/∂x_1) dx_1 + (∂F_2/∂x_2) dx_2 ) ∧ dx_2
         = ( ∂F_2/∂x_1 − ∂F_1/∂x_2 ) dx_1 ∧ dx_2.    (4)

Substitution of dω^1 and ω^1 into Equation 8.1 gives

    ∫_B ( ∂F_2/∂x_1 − ∂F_1/∂x_2 ) dx_1 ∧ dx_2 = ∫_{∂B} F_1 dx_1 + F_2 dx_2.    (5)

This is almost Green's formula of Section 1, if B is a suitable set in R^2, and ∂B stands for its counterclockwise-oriented boundary curve. However, the left-hand integral in Equation (5) has been defined as an integral over a parametrized set in the previous section, whereas the corresponding integral of Green's formula,

    ∫_D ( ∂F_2/∂x_1 − ∂F_1/∂x_2 ) dx_1 dx_2 = ∮_γ F_1 dx_1 + F_2 dx_2,

is an integral over a set without a parametrization. The difference between the two is such that the parametrized set B may be covered more than once by its parametrization, while the integral over the set D covers each part of
the set only once. See Exercise 9.
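For a concrete check of Equation (5) in the classical setting, the following sketch (not from the text; it assumes numpy) evaluates both sides numerically for F = (−x_2, x_1) on the unit disc; each side equals 2π.

```python
import numpy as np

# Left side: double integral over the unit disc of dF2/dx1 - dF1/dx2 = 2,
# computed in polar coordinates with a midpoint rule.
N = 400
u = (np.arange(N) + 0.5) / N                           # midpoints of [0, 1]
r, theta = np.meshgrid(u, 2 * np.pi * u)
left = np.sum(2.0 * r) * (1.0 / N) * (2 * np.pi / N)   # integrand * r dr dtheta

# Right side: line integral of F1 dx1 + F2 dx2 around the counterclockwise
# unit circle, F = (-x2, x1); the integrand is identically 1.
M = 100000
t = (np.arange(M) + 0.5) * (2 * np.pi / M)
x1, x2 = np.cos(t), np.sin(t)
dx1, dx2 = -np.sin(t), np.cos(t)                       # derivative of the parametrization
right = np.sum((-x2) * dx1 + x1 * dx2) * (2 * np.pi / M)

print(left, right, 2 * np.pi)
```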

If ω^1 is a 1-form in R^3 given by

    ω^1 = F_1 dx_1 + F_2 dx_2 + F_3 dx_3,

then a straightforward calculation like that in Equation (4) yields

    dω^1 = ( ∂F_3/∂x_2 − ∂F_2/∂x_3 ) dx_2 ∧ dx_3 + ( ∂F_1/∂x_3 − ∂F_3/∂x_1 ) dx_3 ∧ dx_1
           + ( ∂F_2/∂x_1 − ∂F_1/∂x_2 ) dx_1 ∧ dx_2.    (6)

Thus the 2-form dω^1 has as coefficient functions the coordinates of the vector field curl F, where F = (F_1, F_2, F_3). It is immediate that the general Stokes formula becomes precisely the Stokes formula of Section 4 if we make B and ∂B stand for a piece of smooth surface S and its positively oriented border ∂S.

The Gauss formula of Section 5 comes from considering a 2-form

    ω^2 = F_1 dx_2 ∧ dx_3 + F_2 dx_3 ∧ dx_1 + F_3 dx_1 ∧ dx_2.

A short computation shows that

    dω^2 = ( ∂F_1/∂x_1 + ∂F_2/∂x_2 + ∂F_3/∂x_3 ) dx_1 ∧ dx_2 ∧ dx_3.    (7)

This 3-form has as coefficient the divergence of the field F = (F_1, F_2, F_3), and substitution into the general Stokes formula gives the Gauss, or divergence, formula of Section 5, except that, as with Green's formula, the volume integral of Gauss's formula is not identical with the integral of a 3-form. See Exercise 10.

The correspondence between a vector field F = (F_1, F_2, F_3) in R^3 and a differential form with coefficient functions F_1, F_2, F_3 has been described in Equations (6) and (7) and can be summarized as follows:

    if ω^2 ↔ F, then dω^2 ↔ div F,
    if ω^1 ↔ F, then dω^1 ↔ curl F.

Finally, if ω^0 ↔ f, then dω^0 ↔ grad f, where f is real-valued.
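As a small illustration of this correspondence (not from the text; it assumes the sympy library), the sketch below evaluates the three coefficients of dω^1 in Equation (6) for one particular field F; they are exactly the components of curl F.

```python
# The dx2^dx3, dx3^dx1, dx1^dx2 coefficients of d(omega^1), computed from
# Equation (6) for F = (x2, x3, x1); they equal curl F = (-1, -1, -1).
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
F1, F2, F3 = x2, x3, x1
print(sp.diff(F3, x2) - sp.diff(F2, x3),
      sp.diff(F1, x3) - sp.diff(F3, x1),
      sp.diff(F2, x1) - sp.diff(F1, x2))   # -1 -1 -1
```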

EXERCISES

1. Compute dω, where ω is

   (b) Use the correspondence between a vector field F = (F_1, F_2, F_3) and the 1-form with coefficients F_1, F_2, F_3 to show that the result of part (a) is equivalent to the relation div (curl F) = 0.
   (c) Show that d(dω^0) = 0 is equivalent to curl (grad f) = 0.

6. (a) Find an example of a 1-form ω^1 such that there is no 0-form ω^0 for which dω^0 = ω^1.
   (b) Show similarly that if ω^2 is a 2-form, there may not exist a 1-form ω^1 such that dω^1 = ω^2.
   (c) Interpret parts (a) and (b) in terms of gradient and curl.

7. Prove that if ω^p is a p-form with twice continuously differentiable coefficients, then d(dω^p) = 0.

8. (a) Let F = (F_1, . . . , F_n) be a vector field in R^n, and consider the 1-form ω^1 = F_1 dx_1 + . . . + F_n dx_n. Show that the condition dω^1 = 0 is equivalent to the requirement that the Jacobian matrix F′ be symmetric.
   (b) Show that if ω^1 is a 1-form with continuously differentiable coefficients in a rectangle R in R^n, then dω^1 = 0 in R implies that there is a function f: R^n → R such that df = ω^1. [Hint. Use Theorem 2.6.]

9. In the general Stokes formula, Equation 8.1, let B be a disc in R^2 parametrized by (x_1, x_2) = (u cos v, u sin v), 0 ≤ u ≤ 1, 0 ≤ v ≤ 4π. Notice that the disc is covered twice by this parametrization. What is the correct parametrization of ∂B to make 8.1 hold for this example? Take ω^p to be a 1-form with continuously differentiable coefficients.

10. In the general Stokes formula, Equation 8.1, let ω^p be a 2-form with continuously differentiable coefficients in the ball |x| ≤ 1 of R^3. Show that it is possible to parametrize the ball B in such a way that the corresponding parametrization of ∂B has an inward pointing normal. Explain the apparent contradiction to the requirement in the Gauss theorem of Section 5 that ∂B have an outward-pointing normal.
Appendix

SECTION 1

INTRODUCTION

The theorems and techniques of calculus depend on both algebraic and topological properties of R^n and of functions from R^n to R^m. The algebraic properties are taken up in Chapter 1, while topological matters are treated in the section of Chapter 3 on limits and continuity. However, at certain points of the later development of the subject, notably in existence theorems for integrals, we need more facts about continuity than we have treated at the beginning of Chapter 3. These topics are discussed briefly in the present section, but without proofs. For example, consider the following theorem.

1.1 Theorem

Let f: R^n → R be continuous on a closed, bounded subset K of R^n. Then f attains both its maximum and minimum on K.

This theorem is used to guarantee the existence of a maximum or a minimum value for a real-valued function. The usual techniques of calculus are useful for finding maxima and minima, but they fail to solve the problem of existence. The following theorem is basic to this entire discussion.

1.2 Theorem

Let K be a closed, bounded subset of R. Then K contains a largest element, called the supremum of K, and a smallest element, called the infimum of K.

Theorem 1.2 can be used to prove Theorem 1.1 as well as the following result, used to establish the existence of the integral of a continuous function.

1.3 Theorem

Let f: R^n → R be continuous on a closed, bounded subset K of R^n. Then f is uniformly continuous on K; that is, given ε > 0, there is a δ > 0 such that |f(x) − f(y)| < ε whenever x and y are in K and |x − y| < δ.

Proofs of these three theorems, and also of Theorem 1.4, can be found in any textbook on real-variable theory. One of the best is Principles of Mathematical Analysis, Second Ed., Walter Rudin, McGraw-Hill, 1964.

Next we describe an intrinsic criterion for the convergence of a sequence x_1, x_2, x_3, . . . of vectors in R^n. We say that a sequence is a Cauchy sequence if, given ε > 0, there is a number N such that

    |x_k − x_l| < ε

whenever k, l ≥ N. While it is quite easy to show directly from the definitions that every convergent sequence is a Cauchy sequence, the converse given below depends on Theorem 1.2.

1.4 Theorem

If x_1, x_2, x_3, . . . is a Cauchy sequence of vectors in R^n, then the sequence has a limit x in R^n, so that

    lim_{k→∞} x_k = x.

Theorem 1.4 is needed to prove Theorem 5.1, which justifies the modified Newton method for solving vector equations.

The definitions of limit point, limit, continuity, interior point, open set, boundary, closed set, and differentiability have been given directly or indirectly in terms of the Euclidean norm on R^n. Relative to the norm, the distance between two points x = (x_1, . . . , x_n) and y = (y_1, . . . , y_n) is the number

    |x − y| = [ (x_1 − y_1)^2 + . . . + (x_n − y_n)^2 ]^{1/2}.

The same definitions can be made provided any norm is given. A norm on V is a real-valued function ‖ ‖ with domain equal to V and with the three properties of Euclidean length:

1.5 Positivity.  ‖x‖ > 0, except that ‖0‖ = 0.

1.6 Homogeneity.  ‖ax‖ = |a| ‖x‖.

1.7 Triangle inequality.  ‖x + y‖ ≤ ‖x‖ + ‖y‖.

One purpose of this section is to prove that, in a finite-dimensional vector space, the limit definitions are independent of the choice of norm. That is, if x_0 is a limit point of a set S with respect to one norm, then it is a limit point of S with respect to every norm, and the same goes for the other definitions referred to above. It follows, in particular, that these basic limit concepts are not dependent upon a Euclidean inner product. An example of a norm on R^n different from the Euclidean norm is the so-called box norm, or maximum norm, defined, for any x = (x_1, . . . , x_n), by

    ‖x‖ = max { |x_1|, . . . , |x_n| }.

If x_0 = (2, 1), the set of all x in R^2 such that ‖x − x_0‖ ≤ 0.5 is the parallelogram shown in Fig. 1. We have purposely drawn nonperpendicular coordinate axes with different scale to emphasize the fact that this norm is not Euclidean.
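The box norm and the Euclidean norm on R^n satisfy the inequalities ‖x‖ ≤ |x| ≤ √n ‖x‖, an instance of the equivalence defined below with k = 1 and K = √n. The following sketch (not from the text; it assumes numpy) checks the inequalities on random vectors.

```python
# Quick numerical check of  ||x||_box <= |x| <= sqrt(n) * ||x||_box  on R^n.
import numpy as np

n = 5
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=n)
    box = np.max(np.abs(x))          # box (maximum) norm
    euclid = np.linalg.norm(x)       # Euclidean norm
    assert box <= euclid <= np.sqrt(n) * box + 1e-12
print("inequalities hold on all samples")
```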


For any two norms ‖ ‖_1 and ‖ ‖_2 on a vector space V, we define ‖ ‖_1 to be equivalent to ‖ ‖_2 if there exist positive real numbers k and K such that, for any x in V,

    k ‖x‖_1 ≤ ‖x‖_2 ≤ K ‖x‖_1.    (1)

It is easy to check that this is a true equivalence relation, that is, it satisfies the three requirements:

Reflexivity.  Every norm is equivalent to itself.

Symmetry.  ‖ ‖_1 is equivalent to ‖ ‖_2 if and only if ‖ ‖_2 is equivalent to ‖ ‖_1.

Transitivity.  If ‖ ‖_1 is equivalent to ‖ ‖_2 and if ‖ ‖_2 is equivalent to ‖ ‖_3, then ‖ ‖_1 is equivalent to ‖ ‖_3.

More important is the fact that equivalent norms result in the same definitions of limit point, limit, interior point, and differentiability. Continuity, open set, boundary, and closed set, which are defined in terms of the preceding concepts, are therefore also independent of a choice between equivalent norms.

To verify the above contention, let ‖ ‖_1 and ‖ ‖_2 be equivalent norms, and suppose that x_0 is a limit point of S with respect to ‖ ‖_1.

Then, for any ε_1 > 0, there exists a point x in S such that 0 < ‖x − x_0‖_1 < ε_1. Thus, if ε_2 > 0 is given arbitrarily, we may set ε_1 = ε_2/K and obtain, by Inequality (1),

    0 < ‖x − x_0‖_2 ≤ K ‖x − x_0‖_1 < Kε_1 = ε_2.

Hence, x_0 is a limit point of S with respect to ‖ ‖_2.

Suppose, next, that with respect to ‖ ‖_1 we have

    lim_{x→x_0} f(x) = y_0.

Then, as we have just proved, x_0 is a limit point of the domain of f with respect to both norms. For any ε_1 > 0, there exists δ_1 > 0 such that if x is in the domain of f and 0 < ‖x − x_0‖_1 < δ_1, then ‖f(x) − y_0‖_1 < ε_1. Let ε_2 > 0 be given arbitrarily, set ε_1 = ε_2/K, and then choose δ_2 = δ_1 k. If 0 < ‖x − x_0‖_2 < δ_2, it follows by Inequality (1) that

    0 < ‖x − x_0‖_1 ≤ (1/k) ‖x − x_0‖_2 < δ_2/k = δ_1.

Hence,

    ‖f(x) − y_0‖_2 ≤ K ‖f(x) − y_0‖_1 < Kε_1 = ε_2,

and we conclude that lim_{x→x_0} f(x) = y_0 with respect to ‖ ‖_2. The arguments for the definitions of interior point and differentiability are similar, and we omit the details.

With respect to a given norm on a vector space V, the ε-ball with center x_0 is the set of all x in V such that ‖x − x_0‖ < ε. In general, an ε-ball doesn't look very round. For example, with respect to the box norm on R^2, every ε-ball is a parallelogram (see Fig. 1). It follows directly from the Inequalities (1) that two norms are equivalent if and only if any ε-ball about x_0 with respect to one norm is contained in some δ-ball about x_0 with respect to the other norm, and vice versa (see Fig. 2). We now turn to the principal theorem.

1.8 Theorem

Any two norms on a finite-dimensional vector space V are equivalent.

Proof. Let ‖ ‖ be an arbitrary norm on V. Choose a basis {x_1, . . . , x_n} for V, and define a Euclidean norm on V by setting

    |x| = √(x_1^2 + . . . + x_n^2)

for any x = x_1 x_1 + . . . + x_n x_n. We shall show that ‖ ‖ is equivalent to | |, that is, there exist positive real numbers k and K such that k |x| ≤ ‖x‖ ≤ K |x|, for all x in V. By the transitivity property of the equivalence relation between norms, it then follows that any two norms on V are equivalent.

For any x = x_1 x_1 + . . . + x_n x_n we have

    ‖x‖ ≤ Σ |x_i| ‖x_i‖ ≤ ( Σ ‖x_i‖ ) max { |x_i| } ≤ ( Σ ‖x_i‖ ) |x| = K |x|,

where K = Σ ‖x_i‖ > 0. We now prove that k exists. We contend that, as a function of x, the real-valued function ‖ ‖ is continuous with respect to the Euclidean norm | |. For instance, if ε > 0 is given, we pick δ = ε/K. Then, if |x − x_0| < δ,

    | ‖x‖ − ‖x_0‖ | ≤ ‖x − x_0‖ ≤ K |x − x_0| < ε.

Let k be the minimum value of the function ‖ ‖ restricted to the Euclidean unit sphere |x| = 1. Then, for any x ≠ 0, it follows that ‖x/|x|‖ ≥ k and, hence, that

    ‖x‖ ≥ k |x|, for any x in V.

This completes the proof of the equivalence of norms.

SECTION 2

THE CHAIN RULE

In Chapter 4 we have presented two different versions of the chain rule, one in the section on the gradient and another in Section 3, devoted entirely to the chain rule. Both of these theorems contain assumptions that certain functions are continuously differentiable. Furthermore, in Theorem 3.1 we assumed that the composition g ∘ f was defined on an open set. Neither of these assumptions needs to be made in order for the chain rule formula to hold. Of course, to conclude that g ∘ f is continuously differentiable, we need to assume that both f and g have continuous derivative matrices. The virtue of the next theorem is that it contains a minimum of assumptions, and the proof is, in style, very much what would be used to prove the theorem for real functions of a real variable.

2.1 Theorem. The Chain Rule

If f: R^n → R^m is differentiable at x_0 and g: R^m → R^p is differentiable at f(x_0), then g ∘ f is differentiable at x_0 and

    (g ∘ f)'(x_0) = g'(f(x_0)) f'(x_0).

Proof. The first thing to prove is that x_0 is an interior point of the domain of g ∘ f. Since g is differentiable at f(x_0), the point y_0 = f(x_0) is by definition an interior point of the domain of g. Hence, there exists a positive real number δ′ such that a point y is in the domain of g whenever |y − y_0| < δ′. The function f, being differentiable at x_0, is also continuous there. Furthermore, x_0 is by definition an interior point of the domain of f. It follows that there exists a positive number δ such that if |x − x_0| < δ, then x is in the domain of f and

    |f(x) − y_0| = |f(x) − f(x_0)| < δ′.

But δ′ has been chosen just so that, if the last inequality holds, then f(x) is in the domain of g. Thus any point x in R^n that satisfies |x − x_0| < δ lies in the domain of the composite function g ∘ f, and the vector x_0 is therefore an interior point of that domain.

It remains to prove that the composite linear function with matrix g'(y_0) f'(x_0) satisfies the criterion for being the differential of g ∘ f at x_0. That is, we must prove that if

    g ∘ f(x) − g ∘ f(x_0) − g'(y_0) f'(x_0)(x − x_0) = |x − x_0| Z(x − x_0),    (1)

then

    lim_{x→x_0} Z(x − x_0) = 0.

Since f and g are differentiable at x_0 and y_0, respectively, there are functions Z_1 and Z_2 such that

    f(x) − f(x_0) = f'(x_0)(x − x_0) + |x − x_0| Z_1(x − x_0),
    g(y) − g(y_0) = g'(y_0)(y − y_0) + |y − y_0| Z_2(y − y_0),

and

    lim_{x→x_0} Z_1(x − x_0) = lim_{y→y_0} Z_2(y − y_0) = 0.

Using the f-equation to substitute into the g-equation, we get

    g ∘ f(x) − g ∘ f(x_0) = g'(y_0)( f'(x_0)(x − x_0) + |x − x_0| Z_1(x − x_0) )
                            + |f(x) − f(x_0)| Z_2( f(x) − f(x_0) ).

From this it follows, by the linearity of the matrix multiplier g'(y_0), that

    g ∘ f(x) − g ∘ f(x_0) − g'(y_0) f'(x_0)(x − x_0)
        = |x − x_0| g'(y_0)[ Z_1(x − x_0) ] + |f(x) − f(x_0)| Z_2( f(x) − f(x_0) ).    (2)

We note that, since f is differentiable at x_0,

    |f(x) − f(x_0)| = | f'(x_0)(x − x_0) + |x − x_0| Z_1(x − x_0) |
                    ≤ k |x − x_0| + |Z_1(x − x_0)| |x − x_0|,

where in the last step we have used the triangle inequality, and Theorem 7.3 of Chapter 3 provides the constant k. This inequality enables us to estimate the right side of Equation (2), showing that its norm is less than or equal to |x − x_0| multiplied by a function that tends to zero as x tends to x_0. In fact, we have the upper estimate

    |x − x_0| { |g'(y_0)[ Z_1(x − x_0) ]| + [ k + |Z_1(x − x_0)| ] |Z_2( f(x) − f(x_0) )| }.

But as x tends to x_0, f(x) tends to f(x_0); so both Z_1(x − x_0) and Z_2( f(x) − f(x_0) ) tend to zero. This shows that Z(x − x_0) itself, defined by Equation (1), tends to zero, thus completing the proof.

SECTION 3

ARC LENGTH FORMULA

To appreciate fully the significance of the connection between arc length and the formula ∫_a^b |f'(t)| dt that is used to compute it, it is necessary to know that the formula doesn't always work. The reason is that there is a continuous curve γ in R^2 which has length 2, but such that if the integral formula is applied to it the result is 1. The construction of such an example is fairly complicated, and showing that the relevant integral has value 1 is itself nontrivial. For the curve γ we can take the graph of the so-called Cantor function. (See R. P. Boas, A Primer of Real Functions, John Wiley & Sons, 1960, p. 131, or B. R. Gelbaum and J. M. H. Olmstead, Counter-examples in Analysis, Holden-Day, Inc., 1964, p. 97.) Once the Cantor function is understood, it is fairly easy to show that its graph has length 2, as defined by the least upper bound of the lengths of inscribed polygons.

For a piecewise smooth curve γ, given by a piecewise continuously differentiable function g: R → R^n for a ≤ t ≤ b, we have the following theorem, stated in Chapter 3. (The graph of the Cantor function is not, of course, a piecewise smooth curve.)

Theorem

Let γ be a piecewise smooth curve as described above. Then l(γ), the length of γ, is finite, and

    l(γ) = ∫_a^b |g'(t)| dt.
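Before the proof, a small numerical illustration (not from the text; it assumes numpy): for the half circle g(t) = (cos t, sin t), 0 ≤ t ≤ π, the inscribed-polygon length and the integral formula both give values close to π.

```python
import numpy as np

# Inscribed-polygon length vs. the integral formula for g(t) = (cos t, sin t).
N = 100000
t = np.linspace(0.0, np.pi, N + 1)
pts = np.column_stack((np.cos(t), np.sin(t)))

polygon = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))  # inscribed polygon
speed = np.sqrt(np.sin(t)**2 + np.cos(t)**2)                    # |g'(t)|, equal to 1 here
integral = np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t))  # trapezoid rule

print(polygon, integral, np.pi)
```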

Proof. We show first that l(γ) ≤ ∫_a^b |g'(t)| dt, noting that since |g'| is continuous on each of finitely many closed intervals, |g'| is bounded, and the integral will be finite. The inequality will be proved if we can show that

    Σ_{k=1}^{K} |g(t_k) − g(t_{k−1})| ≤ ∫_a^b |g'(t)| dt,    (1)

when a = t_0 < t_1 < . . . < t_K = b is an arbitrary partition P of [a, b]. But, by the triangle inequality for the norm, we only increase the sum on the left if we add to the partition all end points of the finitely many intervals on which g' is continuous. So we assume this has been done. Then for each interval [t_{k−1}, t_k] we have

    |g(t_k) − g(t_{k−1})| = | ∫_{t_{k−1}}^{t_k} g'(t) dt | ≤ ∫_{t_{k−1}}^{t_k} |g'(t)| dt.

The equality holds by Exercise 16 of Section 2, Chapter 3, and the inequality holds by Exercise 17 of the same section. Summing over k = 1, . . . , K gives (1).

To prove the reverse inequality, ∫_a^b |g'(t)| dt ≤ l(γ), we shall show that for any η > 0, we can find a partition P of [a, b] such that

    ∫_a^b |g'(t)| dt − η ≤ Σ_{k=1}^{K} |g(t_k) − g(t_{k−1})|.    (2)

This will show that no number smaller than the integral is an upper bound for the sums on the right. We take as an initial partition P_0 all the finitely many endpoints of closed intervals on which g' is continuous. On each such interval, g' is uniformly continuous by Theorem 1.3 of this Appendix. This means that given ε > 0, there is a δ > 0 such that |g'(t) − g'(u)| < ε if |t − u| < δ and t_{k−1} ≤ t, u ≤ t_k. Since there are only finitely many intervals [t_{k−1}, t_k], we can choose a single positive δ that will work for all of them. Now make a partition P fine enough that max (t_k − t_{k−1}) < δ, still including in P all the points of P_0. On each interval of the new partition we have |g'(t)| ≤ |g'(t_k)| + ε, by the uniform continuity of g'. Thus,

    ∫_{t_{k−1}}^{t_k} |g'(t)| dt ≤ [ |g'(t_k)| + ε ] (t_k − t_{k−1}).    (3)

But we also have the identity

    |g'(t_k)| (t_k − t_{k−1}) = | ∫_{t_{k−1}}^{t_k} ( g'(t) + g'(t_k) − g'(t) ) dt |
        ≤ | ∫_{t_{k−1}}^{t_k} g'(t) dt | + | ∫_{t_{k−1}}^{t_k} ( g'(t_k) − g'(t) ) dt |
        ≤ |g(t_k) − g(t_{k−1})| + ∫_{t_{k−1}}^{t_k} |g'(t_k) − g'(t)| dt,

where in the last step we have again used the results of Exercises 16 and 17 of Section 2, Chapter 3. Again using the uniform continuity of g', together with the previous inequality, we get

    |g'(t_k)| (t_k − t_{k−1}) ≤ |g(t_k) − g(t_{k−1})| + ε (t_k − t_{k−1}).

Applying this inequality to Equation (3) gives

    ∫_{t_{k−1}}^{t_k} |g'(t)| dt ≤ |g(t_k) − g(t_{k−1})| + 2ε (t_k − t_{k−1}),

and summing over k gives

    ∫_a^b |g'(t)| dt ≤ Σ_k |g(t_k) − g(t_{k−1})| + 2ε (b − a).

If ε is chosen so that 2ε(b − a) < η, then the desired Inequality (2) will be satisfied for the partition P constructed above.

SECTION 4

CONVERGENCE OF FOURIER SERIES

Theorem 5.2 of Chapter 5 asserts that the Fourier series of a piecewise smooth function f converges pointwise to the average of the right and left limits of f at each point. The assumption that f is piecewise smooth means that the interval [−π, π] can be broken into finitely many subintervals, on each of which f and f' can be extended to be continuous. At the endpoints of an interval [x_k, x_{k+1}] we require f to be continuous if it is given the respective values f(x_k +) = lim_{u→0+} f(x_k + u) and f(x_{k+1} −) = lim_{u→0−} f(x_{k+1} + u). Similarly, we require f' to be continuous on the closed interval if it is given the values of the right and left derivatives, respectively:

    f'(x_k +) = lim_{u→0+} [ f(x_k + u) − f(x_k +) ] / u,

    f'(x_{k+1} −) = lim_{u→0−} [ f(x_{k+1} + u) − f(x_{k+1} −) ] / u.

We restate the convergence theorem as follows.

4.1 Theorem

Let f be piecewise smooth on [−π, π]. Then the Fourier series of f converges at each point x of the interval to (1/2)[ f(x −) + f(x +) ]. In particular, if f is continuous at x, then the series converges to
f(x).
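A numerical illustration of the theorem (not from the text; it assumes numpy): for the square wave equal to −1 on (−π, 0) and +1 on (0, π), the partial sums converge to the function value at a point of continuity and to the average 0 at the jump.

```python
import numpy as np

def s_N(x, N):
    # Partial Fourier sum of the square wave: f ~ (4/pi) * sum over odd k of sin(kx)/k.
    k = np.arange(1, N + 1, 2)
    return (4 / np.pi) * np.sum(np.sin(k * x) / k)

for N in (10, 100, 1000):
    print(N, s_N(np.pi / 2, N), s_N(0.0, N))
# s_N(pi/2) tends to 1 (point of continuity); s_N(0) = 0, the average of the
# one-sided limits -1 and +1 at the jump.
```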

Proof. We have to show that if a_k and b_k are the Fourier coefficients of f, then

    lim_{N→∞} s_N(x) = lim_{N→∞} [ a_0/2 + Σ_{k=1}^{N} ( a_k cos kx + b_k sin kx ) ]
                     = (1/2)[ f(x −) + f(x +) ], for all x in [−π, π].

Replacing a_k and b_k by their definitions, we get

    s_N(x) = (1/π) [ (1/2) ∫_{−π}^{π} f(t) dt
             + Σ_{k=1}^{N} ( cos kx ∫_{−π}^{π} f(t) cos kt dt + sin kx ∫_{−π}^{π} f(t) sin kt dt ) ]
           = (1/π) ∫_{−π}^{π} f(t) [ 1/2 + Σ_{k=1}^{N} cos k(t − x) ] dt.

But trigonometric identities (see Exercise 5 of Chapter 5, Section 5) show that

    1/2 + Σ_{k=1}^{N} cos ku = sin (N + 1/2)u / (2 sin (1/2)u).    (1)

Hence,

    s_N(x) = (1/π) ∫_{−π}^{π} f(t) · sin (N + 1/2)(t − x) / (2 sin (1/2)(t − x)) dt.

We now extend f outside the interval [−π, π] so that it has period 2π, and make the change of variable t = x + u. Then the new interval of integration is [−π − x, π − x]. But since the integrand has period 2π, the value of the integral remains unchanged if we shift back to the interval [−π, π]. Thus we have

    s_N(x) = (1/π) ∫_{−π}^{π} f(x + u) · sin (N + 1/2)u / (2 sin (1/2)u) du.

We shall show that

    lim_{N→∞} s_N^+(x) = (1/2) f(x +),    (2)

where

    s_N^+(x) = (1/π) ∫_0^π f(x + u) · sin (N + 1/2)u / (2 sin (1/2)u) du.    (3)

A similar proof would show that lim_{N→∞} s_N^−(x) = (1/2) f(x −), where s_N^−(x) = (1/π) ∫_{−π}^0 f(x + u) ( sin (N + 1/2)u / 2 sin (1/2)u ) du, and addition of the two equations will finish the proof.

To prove Equation (2), we observe from Equation (1) that

    (1/π) ∫_0^π sin (N + 1/2)u / (2 sin (1/2)u) du = 1/2.

Multiplying both sides by f(x +) and subtracting from Equation (3) gives

    s_N^+(x) − (1/2) f(x +) = (1/π) ∫_0^π [ sin (N + 1/2)u / (2 sin (1/2)u) ] [ f(x + u) − f(x +) ] du.    (4)

The proof will be complete if we show that this last integral tends to zero as N tends to infinity. But g(u) = [ f(x + u) − f(x +) ] / sin (1/2)u is piecewise continuous, so the result is a consequence of

4.2 Riemann's Lemma

If g is continuous on the closed interval [a, b], then

    lim_{k→∞} ∫_a^b g(u) sin ku du = lim_{k→∞} ∫_a^b g(u) cos ku du = 0.

Proof. By Bessel's inequality, proved in Chapter 4, Section 7, we have

    ∫_a^b g^2(u) du ≥ Σ_k ( a_k^2 + b_k^2 ).

Hence, the series on the right converges. We conclude that a_k and b_k tend to zero as k tends to infinity.

SECTION 5

PROOF OF THE INVERSE AND IMPLICIT FUNCTION THEOREMS

We start with a theorem that not only guarantees the existence of an inverse function, but also proves the convergence of the modified Newton iteration given in Equation 9.2 of Chapter 3.

5.1 Theorem

Let f: R^n → R^n be continuously differentiable in some closed ball of radius r centered at x_0. Suppose [f'(x_0)]^{−1} exists and that for some number K, with 0 < K < 1, the maximum absolute value of the entries in the matrix

    I − [f'(x_0)]^{−1} f'(x)

is less than K/n whenever |x_0 − x| ≤ r. If

    |[f'(x_0)]^{−1} f(x_0)| ≤ (1 − K) r,    (1)

then the equation

    x_{k+1} = x_k − [f'(x_0)]^{−1} f(x_k)

defines a sequence x_0, x_1, x_2, . . . converging to a vector x such that f(x) = 0. Furthermore, x is the only solution of f(x) = 0 contained in B_r(x_0).

Proof. Let r be chosen as in the hypotheses. Let g(x) = x − [f'(x_0)]^{−1} f(x). Then for x and x' in B_r(x_0), we have g(x') − g(x) = (x' − x) − [f'(x_0)]^{−1}( f(x') − f(x) ). Applying the mean-value theorem to each coordinate function f_i of f, we get

    f_i(x') − f_i(x) = Σ_{j=1}^{n} (∂f_i/∂x_j)(y_i) (x'_j − x_j),

where x' = (x'_1, . . . , x'_n), x = (x_1, . . . , x_n), and y_i is on the segment joining x and x'. Writing F(x, x') for the matrix

    ( (∂f_i/∂x_j)(y_i) ),

we get

    f(x') − f(x) = F(x, x')(x' − x);

hence

    g(x') − g(x) = (x' − x) − [f'(x_0)]^{−1} F(x, x')(x' − x)
                 = ( I − [f'(x_0)]^{−1} F(x, x') )(x' − x).    (2)

By assumption, the maximum absolute entry in the matrix A = I − [f'(x_0)]^{−1} F(x, x') is less than or equal to K/n for some positive K < 1. But for any matrix A = (a_{ij}), i, j = 1, . . . , n, and any vector y = (y_1, . . . , y_n), we have

    |Ay|^2 = Σ_{i=1}^{n} ( Σ_{j=1}^{n} a_{ij} y_j )^2 ≤ Σ_{i=1}^{n} ( Σ_{j=1}^{n} a_{ij}^2 ) |y|^2,

by the Cauchy-Schwarz inequality. Hence

    |Ay| ≤ |y| √( Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij}^2 ) ≤ |y| √( n^2 max a_{ij}^2 ) = n |y| max |a_{ij}|.

Applying this result to the matrix I − [f'(x_0)]^{−1} F(x, x'), we get from (2):

    |g(x') − g(x)| ≤ n (K/n) |x' − x| = K |x' − x|.    (3)

Now define, for k ≥ 1 and for x_k in the domain of g,

    x_{k+1} = g(x_k).

Then

    |x_{k+1} − x_k| = |g(x_k) − g(x_{k−1})| ≤ K |x_k − x_{k−1}| ≤ K^k |x_1 − x_0|.    (4)

It follows by repeated application of (4) that for k ≥ m ≥ 0 we have

    |x_{k+1} − x_m| ≤ |x_{k+1} − x_k| + |x_k − x_{k−1}| + . . . + |x_{m+1} − x_m|
                   ≤ ( K^k + K^{k−1} + . . . + K^m ) |x_1 − x_0|
                   ≤ ( K^m / (1 − K) ) |x_1 − x_0| = ( K^m / (1 − K) ) |[f'(x_0)]^{−1} f(x_0)|
                   ≤ K^m r.

For m = 0 we get

    |x_{k+1} − x_0| ≤ r;

so, for k = 1, 2, 3, . . . , x_k lies in B_r(x_0) and, hence, in the subset of the domain of f on which the hypotheses of the theorem are satisfied. Thus each x_k is defined and satisfies the Inequality (4). Returning to (4) we see that

    |x_{k+1} − x_m| ≤ K^m r

implies that x_0, x_1, x_2, . . . is a Cauchy sequence which necessarily converges to some vector x satisfying |x − x_0| ≤ r. Since

    x_{k+1} = g(x_k)

and g is continuous, we have

    x = lim x_{k+1} = lim g(x_k) = g(x),

that is,

    x = x − [f'(x_0)]^{−1} f(x).

Thus

    f(x) = 0.

To show that the solution x is unique in B_r(x_0), observe that the inequality (3), rewritten in terms of f, is

    | (x' − x) − [f'(x_0)]^{−1}( f(x') − f(x) ) | ≤ K |x' − x|.

From this it follows by the triangle inequality that

    |x' − x| − |[f'(x_0)]^{−1}( f(x') − f(x) )| ≤ K |x' − x|.

Hence

    (1 − K) |x' − x| ≤ |[f'(x_0)]^{−1}( f(x') − f(x) )|.    (5)

This shows that, for x and x' in B_r(x_0), f(x') − f(x) = 0 implies
x' − x = 0, that is, that f is one-to-one on B_r(x_0).
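The iteration of Theorem 5.1 uses the derivative matrix evaluated once, at x_0, for every step. The following sketch (not from the text; it assumes numpy, and the example system is my own) carries out that modified Newton iteration on a small nonlinear system.

```python
import numpy as np

def f(x):
    # Example system: x^2 + y^2 - 1 = 0, x - y = 0 (solution near (0.707, 0.707)).
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def f_prime(x):
    return np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])

x0 = np.array([0.8, 0.6])
A_inv = np.linalg.inv(f_prime(x0))    # [f'(x0)]^{-1}, computed once and reused
x = x0
for _ in range(50):
    x = x - A_inv @ f(x)              # x_{k+1} = x_k - [f'(x0)]^{-1} f(x_k)

print(x, f(x))   # x is close to (1/sqrt(2), 1/sqrt(2)); f(x) is close to 0
```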

5.2 Inverse Function Theorem

Let f: R^n → R^n be a continuously differentiable function such that f'(x_0) has an inverse. Then there is an open set N containing x_0 such that f(N) is open and such that f, when restricted to N, has a continuously differentiable inverse f^{−1}. In addition,

    (f^{−1})'(y_0) = [f'(x_0)]^{−1},

where y_0 = f(x_0).

Proof, Part 1. We apply the previous theorem to the function f(x) − y, where y is any vector such that the Inequality (1) is satisfied. That is, we assume that y is in the set S of vectors z satisfying

    |[f'(x_0)]^{−1}( f(x_0) − z )| < (1 − K) r.

By the approximation theorem, there is a vector x, unique in B_r(x_0), such that f(x) − y = 0.

Furthermore, Inequality (5) shows that the function f^{−1} defined by f^{−1}(y) = x is continuous, because

    (1 − K) |f^{−1}(y') − f^{−1}(y)| ≤ |[f'(x_0)]^{−1}(y' − y)|.

We define the set N = f^{−1}(S). Clearly, N ⊂ B_r(x_0), and it is easily seen that N and f(N) are open sets because f and f^{−1} are continuous.

Proof, Part 2. For any x in N, the inverse of f'(x) satisfies the condition for being the derivative of f^{−1} at y = f(x). For x' and x in N, let y' = f(x') and y = f(x). Since f is differentiable at x,

    f(x') − f(x) = f'(x)(x' − x) + |x' − x| Z(x' − x),

where lim_{x'→x} Z(x' − x) = 0. Alternatively, we can write

    y' − y = f'(x)( f^{−1}(y') − f^{−1}(y) ) + |x' − x| Z(x' − x).

Applying [f'(x)]^{−1} to both sides gives

    f^{−1}(y') − f^{−1}(y) − [f'(x)]^{−1}(y' − y) = −|x' − x| [f'(x)]^{−1} Z(x' − x).

Lemma 5.3 shows that |x' − x| / |y' − y| remains bounded as y' tends to y. Since [f'(x)]^{−1} Z(x' − x) tends to zero as x' tends to x, the inequality |y' − y| ≥ M |x' − x| resulting from Lemma 5.3 shows that the right side of the above equation tends to zero when divided by |y' − y| and as y' tends to y. This completes the proof of Part 2.

Proof, Part 3. f^{−1} is continuously differentiable. Part 2 shows that f^{−1} is differentiable on f(N) and that

    (f^{−1})'(y) = [f'(x)]^{−1}.

Theorem 8.3 of Chapter 1 shows that the inverse of a matrix A has as entries continuous functions of the entries in A. Since f'(f^{−1}(y)) has continuous entries, so does its inverse. This completes the proof of Part 3 and so of the inverse function theorem.

5.3 Lemma

There is a neighborhood N of x_0 and a number M > 0 such that

    M |x' − x| ≤ |f(x') − f(x)|

for all x and x' in N.

Proof. By Theorem 7.3 of Chapter 3 there is a constant k such that

    |[f'(x_0)]^{−1}( f(x') − f(x) )| ≤ k |f(x') − f(x)|.

Combining this inequality with inequality (5) above gives the desired result with M = (1 − K)/k.

5.4 Implicit-Function Theorem

Let F: R^{n+m} → R^m be a continuously differentiable function. Suppose that for some x_0 in R^n and y_0 in R^m

1. F(x_0, y_0) = 0.

2. F_y(x_0, y_0) has an inverse.

Then there exists a continuously differentiable function f: R^n → R^m defined on some neighborhood N of x_0 such that f(x_0) = y_0 and F(x, f(x)) = 0, for all x in N.

Proof. The proof consists in reducing the theorem to an application of the inverse function theorem. For that purpose we extend F to a function H: R^{n+m} → R^{n+m} by setting H(x, y) = (x, F(x, y)). In terms of the coordinate functions F_1, . . . , F_m of F, the coordinate functions of H are given by

    H_1(x, y) = x_1
    H_2(x, y) = x_2
      . . .
    H_n(x, y) = x_n
    H_{n+1}(x, y) = F_1(x_1, . . . , x_n, y_1, . . . , y_m)
      . . .
    H_{n+m}(x, y) = F_m(x_1, . . . , x_n, y_1, . . . , y_m).

The Jacobian matrix of H at (x_0, y_0) is

    H'(x_0, y_0) = [      I                 0          ]
                   [ F_x(x_0, y_0)    F_y(x_0, y_0)    ],

which is invertible because F_y(x_0, y_0) is.

The function H is certainly continuously differentiable; so we can apply the inverse function theorem at the point (x_0, y_0) to get a function H^{−1} that is inverse to H from some open set N' in R^{n+m} containing H(x_0, y_0) to an open set about (x_0, y_0). Since H(x_0, y_0) = (x_0, F(x_0, y_0)) = (x_0, 0), the set N of all points x in R^n such that (x, 0) is in N' is an open set and contains x_0.

Let G_1 be the function that selects the first n variables of a point in R^{n+m}, and G_2 the function that selects the last m variables. Thus, G_1(x, y) = x and G_2(x, y) = y. Since H(x, y) = (x, F(x, y)), the function H is the identity on x. The same must therefore be true of H^{−1}. Hence,

    G_1 = G_1 ∘ H^{−1}.

We define f by

    f(x) = G_2 ∘ H^{−1}(x, 0), for every x in N.

Then

    H^{−1}(x, 0) = ( G_1 H^{−1}(x, 0), G_2 H^{−1}(x, 0) ) = (x, f(x)).

Applying H to both sides, we get

    (x, 0) = H(x, f(x)) = (x, F(x, f(x))),

for every x in N. The two parts of the first and last pairs must be equal. Hence,

    0 = F(x, f(x)), for every x in N.

Finally, f, being the composition of two continuously differentiable functions, is itself continuously differentiable by the chain rule.
SECTION 6

PROOF OF LAGRANGE'S THEOREM

The proof makes use of the implicit function theorem, and of the fact that, for a linear function L defined in R^n, the dimension of the domain is equal to the dimension of the range plus the dimension of the null space. We begin by restating the theorem.

Lagrange's Theorem

Let the function G: R^n → R^m, n > m, be continuously differentiable and have coordinate functions G_1, G_2, . . . , G_m. Suppose the equations

    G_1(x_1, . . . , x_n) = 0
      . . .
    G_m(x_1, . . . , x_n) = 0

implicitly define a surface S in R^n, and that at a point x_0 of S the matrix G'(x_0) has some m columns linearly independent.

If x_0 is an extreme point of a differentiable function f: R^n → R, when restricted to S, then x_0 is a critical point of the function

    f + λ_1 G_1 + . . . + λ_m G_m

for some constants λ_1, . . . , λ_m.

Proof. The implicit function theorem ensures that there is a parametric representation for S in a neighborhood of x_0. For instance, suppose that for some choice of m variables, say x_1, . . . , x_m, the columns of the matrix

    [ ∂G_1/∂x_1   . . .   ∂G_1/∂x_m ]
    [    . . .                      ]    (1)
    [ ∂G_m/∂x_1   . . .   ∂G_m/∂x_m ]

are independent. Then the matrix has an inverse. Write x_0 = (a_1, . . . , a_n), and set u_0 = (a_1, . . . , a_m) and v_0 = (a_{m+1}, . . . , a_n). By the implicit function theorem, there is a differentiable function h: R^{n−m} → R^m defined on a neighborhood N of v_0 such that h(v_0) = u_0 and G(h(v), v) = 0 for all v in N. The function H: R^{n−m} → R^n defined by

    H(v) = (h(v), v), for all v in N,

is a parametric representation of a part of S containing x_0 = H(v_0).

The surface S has a tangent 𝒯 of dimension n − m at x_0. The reason is that, first of all, the derivative of H at v_0 is the n-by-(n − m) matrix

    [ ∂h_1/∂x_{m+1}   . . .   ∂h_1/∂x_n ]
    [      . . .                        ]
    [ ∂h_m/∂x_{m+1}   . . .   ∂h_m/∂x_n ]
    [       1         . . .       0     ]
    [      . . .                        ]
    [       0         . . .       1     ]

where h_1, . . . , h_m are the coordinate functions of h. In addition, the columns of this matrix are independent because the columns of 0's and 1's in it are independent.

Now compose H with f. Since x_0 is an extreme point of f in S, the point v_0 is an extreme point of f ∘ H. Hence,

    (f ∘ H)'(v_0) = f'(x_0) H'(v_0) = 0.    (2)

Because G is constantly zero on S,

    (G ∘ H)'(v_0) = G'(x_0) H'(v_0) = 0.    (3)

Looking at (2) and (3) together, we see that d_{x_0} f and d_{x_0} G are both zero on the range of d_{v_0} H, which set is the tangent 𝒯. Thus the matrix

    [ ∂f/∂x_1     . . .   ∂f/∂x_n   ]
    [ ∂G_1/∂x_1   . . .   ∂G_1/∂x_n ]
    [    . . .                      ]
    [ ∂G_m/∂x_1   . . .   ∂G_m/∂x_n ]

(with all partial derivatives evaluated at x_0) defines a linear function L: R^n → R^{m+1} that is identically zero on 𝒯. Since the dimension of 𝒯 is n − m, we have

    n − m ≤ dimension of null space of L.

It is always true for a linear function L that

    n = dimension of null space of L + dimension of range of L;

so

    n ≥ n − m + dimension of range of L,

that is,

    m ≥ dimension of range of L.

Recall that the range of L is a subspace of R^{m+1}. Then we can define a linear function Λ: R^{m+1} → R such that Λ is zero on the range of L, but not identically zero on R^{m+1}. In other words, there is a nonzero Λ such that Λ ∘ L = 0. In matrix form Λ = (λ_0, λ_1, . . . , λ_m), and so

    (λ_0, λ_1, . . . , λ_m) [ ∂f/∂x_1     . . .   ∂f/∂x_n   ]
                            [ ∂G_1/∂x_1   . . .   ∂G_1/∂x_n ]  = 0.    (4)
                            [    . . .                      ]
                            [ ∂G_m/∂x_1   . . .   ∂G_m/∂x_n ]

It cannot happen that λ_0 = 0, for then the rows of (1) are dependent, contradicting the fact that (1) has an inverse. Taking λ_0 = 1 (if λ_0 ≠ 1, divide through by λ_0), the condition (4) becomes

    (∂f/∂x_j)(x_0) + λ_1 (∂G_1/∂x_j)(x_0) + . . . + λ_m (∂G_m/∂x_j)(x_0) = 0,

for j = 1, . . . , n. In other words, (f + λ_1 G_1 + . . . + λ_m G_m)'(x_0) = 0. This completes the proof.

SECTION 7

PROOF OF TAYLOR'S THEOREM

The method of proof consists of reducing the problem to the one-variable case and then making an estimate of the size of the integral formula for the remainder.

7.1 Taylor's Theorem

Let f: R^n → R have all derivatives of order N continuous in a neighborhood of x_0. Let T_N(x − x_0) be the Nth degree Taylor expansion of f about x_0. That is,

    T_N(x − x_0) = f(x_0) + d_{x_0} f(x − x_0) + . . . + (1/N!) d_{x_0}^N f(x − x_0).

Then

    lim_{x→x_0} [ f(x) − T_N(x − x_0) ] / |x − x_0|^N = 0,    (1)

and T_N is the only Nth-degree polynomial having this property.

Proof. Let y = x − x_0 and define

    F(t) = f( x_0 + t(x − x_0) ) = f(x_0 + ty).

Then for k = 0, 1, . . . , N, we can apply the chain rule to get

    F^{(k)}(t) = d_{x_0+ty}^k f(y).    (2)

To see this, notice that for k = 0 the formula is true by definition. Assuming it to hold for some k < N, we have

    F^{(k+1)}(t) = (d/dt) d_{x_0+ty}^k f(y)
                 = (d/dt) [ ( y_1 ∂/∂x_1 + . . . + y_n ∂/∂x_n )^k f ]_{x_0+ty}
                 = [ ( y_1 ∂/∂x_1 + . . . + y_n ∂/∂x_n )^{k+1} f ]_{x_0+ty}
                 = d_{x_0+ty}^{k+1} f(y).

This completes the proof of Equation (2) by induction. In particular,

    F^{(k)}(0) = d_{x_0}^k f(y).

From Theorem 3.1 of Chapter 5 we obtain

    F(1) − F(0) − (1/1!) F'(0) − . . . − (1/N!) F^{(N)}(0)
        = (1/(N − 1)!) ∫_0^1 (1 − t)^{N−1} [ F^{(N)}(t) − F^{(N)}(0) ] dt.

In terms of f, this is

    f(x) − T_N(x − x_0) = f(x) − f(x_0) − (1/1!) d_{x_0} f(y) − . . . − (1/N!) d_{x_0}^N f(y)
        = (1/(N − 1)!) ∫_0^1 (1 − t)^{N−1} [ d_{x_0+ty}^N f(y) − d_{x_0}^N f(y) ] dt.

We now estimate this difference. Since ∫_0^1 (1 − t)^{N−1} dt = 1/N,

    |f(x) − T_N(y)| ≤ (1/N!) max_{0≤t≤1} | d_{x_0+ty}^N f(y) − d_{x_0}^N f(y) |
        = (1/N!) max_{0≤t≤1} | Σ_{k_1+...+k_n=N} ( N! / (k_1! · · · k_n!) ) y_1^{k_1} · · · y_n^{k_n}
              [ ∂^N f / ∂x_1^{k_1} · · · ∂x_n^{k_n} (x_0 + ty) − ∂^N f / ∂x_1^{k_1} · · · ∂x_n^{k_n} (x_0) ] |.

Then, since |y_i| ≤ |y|, we have

    |y_1^{k_1} · · · y_n^{k_n}| / |y|^N ≤ 1,

and so

    |f(x) − T_N(y)| / |y|^N
        ≤ (1/N!) Σ_{k_1+...+k_n=N} ( N! / (k_1! · · · k_n!) ) max_{0≤t≤1}
              | ∂^N f / ∂x_1^{k_1} · · · ∂x_n^{k_n} (x_0 + ty) − ∂^N f / ∂x_1^{k_1} · · · ∂x_n^{k_n} (x_0) |.    (3)

By assumption, the derivatives of f through order N are continuous functions at x_0. Then as y tends to zero, each term in the last sum tends to zero, which proves Equation (1). The inequality (3) shows that if f is a polynomial of degree N, then it equals its Nth-degree Taylor expansion, for then all terms on the right are zero.

The proof that T_N is the only Nth-degree polynomial satisfying Equation (1) goes as follows. Let T_N and T'_N be two such polynomials. By (1),

    lim_{y→0} [ T_N(y) − T'_N(y) ] / |y|^N = 0.

Suppose that T_N − T'_N were not identically zero, and let

    P_k(y) + R(y) = T_N(y) − T'_N(y),

where P_k is the polynomial consisting of the terms of lowest degree (say k) that actually occur in T_N − T'_N. Then there is a vector y_0 such that P_k(y_0) ≠ 0. On the other hand, since k ≤ N,

    0 = lim_{t→0} [ T_N(ty_0) − T'_N(ty_0) ] / |ty_0|^k
      = lim_{t→0} [ P_k(ty_0) + R(ty_0) ] / |ty_0|^k
      = P_k(y_0) / |y_0|^k + lim_{t→0} R(ty_0) / |ty_0|^k.

However, because all the terms of R have degree greater than k, the last limit is zero. But then P_k(y_0) = 0, which is a contradiction.

SECTION 8

EXISTENCE OF THE RIEMANN INTEGRAL

Theorem 2.1 of Chapter 6 is as follows.

Theorem

Let f be defined and bounded on a bounded set B in R^n, and let the boundary of B be contained in finitely many smooth sets. If f is continuous on B, except perhaps on finitely many smooth sets, then f is integrable over B. The value of ∫_B f dV is unchanged by changing the values of f on any smooth set.

We recall that a smooth set in R^n is the image of a closed bounded set under a continuously differentiable function g: R^m → R^n, with m < n. We first show why smooth sets are negligible in the domain of integration of f.

8.1 Theorem

Let g: R^m → R^n be continuously differentiable. Then for every closed, bounded subset K in the domain of g, there is a constant M such that

    |g(y) − g(x)| ≤ M |x − y|

for all x and y in K.

Proof. Denote by K × K the subset of R^{2m} consisting of all 2m-tuples (x, y) such that x and y are each in K. It is easy to see that K × K is closed and bounded in R^{2m}. Now consider the equation

    g(y) − g(x) − g'(x)(y − x) = |x − y| Z(y − x).

This equation defines Z except at 0; so we define Z(0) = 0. We first show that Z(y − x) thus defined is a continuous function on K × K.

To solve for Z(y − x), we divide both sides of the defining equation by |x − y|. It is then apparent, because both g and g' are continuous, that Z(y − x) is continuous except perhaps when x = y. If both x and y tend to some point z in the domain of g, then (x, y) tends to (z, z), and we want to show that Z(y − x) tends to zero. We apply the mean-value theorem, Theorem 8.2 of Chapter 2, to the coordinate functions g_k of g, for k = 1, . . . , n. We have for each k,

    g_k(y) − g_k(x) = g'_k(x_k)(y − x),

for x and y in some sufficiently small neighborhood of z and for some x_k on the segment joining x and y. Then, since the Euclidean norm of a vector is at most n times its largest coordinate,

    | |x − y| Z(y − x) | = |g(y) − g(x) − g'(x)(y − x)|
        ≤ n max_{1≤k≤n} |g_k(y) − g_k(x) − g'_k(x)(y − x)|
        = n max_{1≤k≤n} |g'_k(x_k)(y − x) − g'_k(x)(y − x)|
        ≤ n max_{1≤k≤n} Σ_j | ∂g_k/∂x_j (x_k) − ∂g_k/∂x_j (x) | |y − x|.

Hence

    |Z(y − x)| ≤ n max_{1≤k≤n} Σ_j | ∂g_k/∂x_j (x_k) − ∂g_k/∂x_j (x) |.

Since the partial derivatives are assumed continuous, and since each x_k tends to z as x and y do, it follows that Z(y − x) tends to zero as x and y tend to z. This shows that Z(y − x) is continuous on K × K.

Since |Z(y − x)| is continuous, it attains its maximum value M' on K × K; so |Z(y − x)| ≤ M' for all x and y in K. Hence

    |g(y) − g(x) − g'(x)(y − x)| ≤ M' |y − x|.

The inequality |A| − |B| ≤ |A − B| shows that

    |g(y) − g(x)| ≤ M' |y − x| + |g'(x)(y − x)|.    (1)

But

    |g'(x)(y − x)| ≤ n max_{1≤k≤n} Σ_j | ∂g_k/∂x_j (x) | |y − x|,

and the continuity of the partial derivatives on K implies the existence of a constant M'' such that

    |g'(x)(y − x)| ≤ M'' |y − x|.

This inequality, together with (1), implies that

    |g(y) − g(x)| ≤ (M' + M'') |y − x|,

for all x and y in K, as was to be shown.

8.2 Theorem

If S is a smooth set in R^n, then S can be covered by finitely many coordinate rectangles of arbitrarily small total content. The covering can be done in such a way that no point of S lies on the boundary of the union of the set of covering rectangles.

Proof. The case in which S is just a point is trivially true, so we assume m ≥ 1. The smooth set S is the image under a continuously differentiable function g: R^m → R^n, m < n, of a closed bounded set K in R^m. We enclose K in a cube of side length s, and subdivide the cube into smaller cubes of side length s/N, where N is an integer bigger than 1. There are N^m of these little cubes. On each of the little cubes that contain any points of K we have, by Theorem 8.1,

    ‖g(x) − g(y)‖ ≤ M ‖x − y‖ ≤ M s/N,

where M is a constant depending only on K and g. This means that the image under g of the part of K in each little cube is contained in a cube of side length M(s/N). Then the surface S is contained in N^m cubes, each of volume (Ms/N)^n. The total volume of the cubes containing S is at most N^m (Ms/N)^n = (Ms)^n / N^{n−m}. Since n > m, the total volume can be made arbitrarily small by making N large. By enlarging the side length of each covering rectangle to (Ms + 1)/N, the last condition of the theorem can be met.

As a corollary, we get the fact that a smooth set has zero content.

Now we can prove the existence theorem for integrals stated at the beginning of the section. Suppose that f and B are as described in the hypotheses. We must produce a number which we shall prove is the Riemann integral of f over B. Let f_B be the function f extended to be zero outside B. For an arbitrary grid G covering B, let R_k be the kth bounded rectangle of G, and let f_k be the infimum of f_B on R_k. Define the lower sum

    s(G) = Σ_{k=1}^{N} f_k V(R_k).

Similarly, define the upper sum

    S(G) = Σ_{k=1}^{N} f̄_k V(R_k),

where f̄_k is the supremum of f_B on R_k. Then clearly

    s(G) ≤ Σ f_B(x_k) V(R_k) ≤ S(G),    (2)

if the Riemann sum is an arbitrary one formed from the grid G. Furthermore, if G' is a grid consisting of a subdivision of the rectangles of a grid G, we have

    s(G) ≤ s(G') ≤ S(G') ≤ S(G).

In particular, if G and G'' are two grids, and G' contains all the rectangles of both of them, then

    s(G) ≤ s(G') ≤ S(G') ≤ S(G'').    (3)

We define

    I_B f = supremum of s(G),

where the supremum is taken over all grids G covering B. We have from relation (3)

    s(G) ≤ I_B f ≤ S(G),

or

    −S(G) ≤ −I_B f ≤ −s(G).

This inequality added to (2) gives

    | Σ f_B(x_k) V(R_k) − I_B f | ≤ S(G) − s(G),

in which the Riemann sum has been formed from the grid G.

Now all we have to do is show that S(G) − s(G) can be made arbitrarily small if the mesh of G is made small enough. Then, according to the definition of the integral, we will have shown that the integral of f over B exists and is I_B f. Let ε be a positive number. By Theorem 8.2, we can cover the boundary of B, the smooth surfaces containing the discontinuity points of f, and any other smooth surface on which we would like to disregard the values of f, with finitely many open rectangles R'_1, R'_2, . . . , of total content less than ε. On the part of B not covered by these rectangles, f is continuous; so by Theorem 1.3 there is a δ > 0 such that f̄_k − f_k < ε over any rectangle R_k belonging to a grid with mesh less than δ. By making the mesh still smaller, say, less than δ', we can arrive at a mesh size such that the rectangles R'_1, . . . , R'_l are always contained in finitely many rectangles R''_1, . . . , R''_m of any grid with mesh less than δ', and such that the total content of the latter rectangles is less than 2ε. Suppose that the remaining rectangles of such a grid G are R_1, . . . , R_n, that |f| ≤ M on B, and that B is contained in a rectangle of volume C. Then

    S(G) − s(G) = Σ_{k=1}^{m} ( f̄_k − f_k ) V(R''_k) + Σ_{k=1}^{n} ( f̄_k − f_k ) V(R_k)
                ≤ (2M)(2ε) + εC = ε(4M + C).

Thus we have made

    | Σ f_B(x_k) V(R_k) − I_B f | ≤ ε(4M + C)

for any grid of small enough mesh. Since ε can be made arbitrarily small, the proof is complete.

SECTION 9

THE CHANGE-OF-VARIABLE FORMULA FOR INTEGRALS

This section contains a proof of the change-of-variable theorem (Theorem 3.1) of Chapter 6.

Theorem

Let T: R^n → R^n be a continuously differentiable transformation. Let R be a set in R^n having a boundary consisting of finitely many smooth sets. Suppose that R and its boundary are contained in the interior of the domain of T and that

1. T is one-to-one on R.

2. J, the Jacobian determinant of T, is different from zero on R, except perhaps on finitely many smooth sets.

Then, if the function f is bounded and continuous on T(R) (the image of R under T),

    ∫_{T(R)} f dV = ∫_R (f ∘ T) |J| dV.

If f should be discontinuous on a smooth set S contained in R, then the theorem can be applied to R with S deleted. The subsequent inclusion of S and T(S) in the domains of integration will affect neither integral
since these sets have zero content.
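A numerical illustration of the formula (not from the text; it assumes numpy): with T(r, θ) = (r cos θ, r sin θ) and f(x, y) = x^2 + y^2 on the unit disc, both sides equal π/2.

```python
import numpy as np

N = 500
r = (np.arange(N) + 0.5) / N
theta = (np.arange(N) + 0.5) * (2 * np.pi / N)
R, TH = np.meshgrid(r, theta)

# Right side: integral over the (r, theta) rectangle of (f o T)|J| = r^2 * r.
right = np.sum(R**3) * (1.0 / N) * (2 * np.pi / N)

# Left side: integral of f over the disc, done directly on an (x, y) grid.
M = 2000
x = -1 + (np.arange(M) + 0.5) * (2.0 / M)
X, Y = np.meshgrid(x, x)
inside = X**2 + Y**2 <= 1.0
left = np.sum((X**2 + Y**2)[inside]) * (2.0 / M)**2

print(left, right, np.pi / 2)
```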

Proof.† We first consider the special case in which f is the constant function 1 and T is linear. Then, by Theorem 3.9 of Chapter 2, T can, except for trivial interchanges of variables, be written as the product of elementary linear transformations of two types: numerical multiplication of a coordinate,

    M(x_1, . . . , x_n) = (x_1, . . . , a x_k, . . . , x_n);    (1)

addition of a multiple of one coordinate to another,

    A(x_1, . . . , x_k, . . . , x_n) = (x_1, . . . , x_k + r x_l, . . . , x_n).    (2)

By looking at the matrices of these transformations, it is easy to see that det M = a and det A = 1. Once the special case of the theorem has been verified for each of these two types, it follows for arbitrary nonsingular linear transformations by successive application of the product rule for determinants. Let R_k be the projection of R on the subspace perpendicular to the kth coordinate axis. For each point (x_1, . . . , x_{k−1}, x_{k+1}, . . . , x_n) in R_k, let I_k be the set of all x_k such that (x_1, . . . , x_n) is in R. For the linear transformation (1), we have by iterated integration

    ∫_R |J| dV = ∫_{R_k} dV_{n−1} ∫_{I_k} |a| dx_k.

If we denote by |a| I_k the set of all numbers of the form |a| x_k, where x_k is in I_k, we obtain by 1-dimensional change of variable

    ∫_{R_k} dV_{n−1} ∫_{I_k} |a| dx_k = ∫_{R_k} dV_{n−1} ∫_{|a| I_k} du_k = ∫_{M(R)} dV.

For the linear transformation (2), we denote by I_k + r x_l the set of all numbers x_k + r x_l, where x_k is in I_k. Then iterated integration and 1-dimensional change of variable yield

    ∫_R |J| dV = ∫_R dV = ∫_{R_k} dV_{n−1} ∫_{I_k} dx_k = ∫_{R_k} dV_{n−1} ∫_{I_k + r x_l} dx_k = ∫_{A(R)} dV.

This completes the proof of the theorem for linear transformations T and constant functions f.

† The proof we give is one by J. Schwartz, contained in "The formula for change of variable in a multiple integral," American Math. Monthly, vol. 61, no. 2 (February, 1954). See also D. E. Varberg, "Change of variables in multiple integrals," American Math. Monthly, vol. 78, no. 1 (January, 1971).

In proving the general theorem we shall use the following norm for the matrix A = (a_{ij}) of a linear function:

    ‖A‖ = max_{1≤i≤n} Σ_{j=1}^{n} |a_{ij}|.

If we similarly define a norm for vectors x = (x_1, . . . , x_n) by

    ‖x‖ = max_{1≤j≤n} |x_j|,

then it follows immediately that

    ‖Ax‖ ≤ ‖A‖ ‖x‖.

The reason for using this norm is that, if x and y are vectors in a coordinate cube of side length 2s, then ‖x − y‖ ≤ 2s. Similarly, if p is the center of such a cube, then the closed cube consists of all vectors x such that ‖x − p‖ ≤ s.

Suppose now that C is a cube of side length 2s contained in R and with center p. We have by the mean-value theorem, for x in C,

    T_k(x) − T_k(p) = T'_k(y_k)(x − p),

where the T_k are the coordinate functions of T, and y_k is some point on the segment joining x to p. Then

    ‖T(x) − T(p)‖ ≤ max_{y in C} ‖T'(y)‖ ‖x − p‖.

This implies that T(C) is contained in the cube of vectors z defined by

    ‖z − T(p)‖ ≤ s max_{y in C} ‖T'(y)‖.

Raising both sides to the nth power, we conclude that

    V(T(C)) ≤ [ max_{y in C} ‖T'(y)‖ ]^n V(C).    (3)

Notice that if L is an arbitrary one-to-one linear transformation with matrix A, and S is a set bounded by finitely many smooth sets, then V(L(S)) = |det A| V(S). This follows from the special case of the change-of-variable theorem that we have just proved for linear transformations. Now we take S = T(C) and A = [T'(x)]^{−1}. Then, applying (3) with T replaced by (d_x T)^{−1} ∘ T, we get

    |det [T'(x)]^{−1}| V(T(C)) = V( (d_x T)^{−1} ∘ T(C) )
                               ≤ { max_{y in C} ‖[T'(x)]^{−1} T'(y)‖ }^n V(C),

so

    V(T(C)) ≤ |det T'(x)| { max_{y in C} ‖[T'(x)]^{−1} T'(y)‖ }^n V(C).    (4)

Let the cube C be divided into a finite set C_1, . . . , C_N of nonoverlapping cubes with centers x_1, . . . , x_N, and suppose that δ is the maximum side length of all of them. Apply (4) to each C_k, taking x = x_k in each case. Addition gives

    V(T(C)) ≤ Σ_{k=1}^{N} |det T'(x_k)| { max_{y in C_k} ‖[T'(x_k)]^{−1} T'(y)‖ }^n V(C_k).

Since T is continuously differentiable, 7"(y) is a continuous function of y,


and [V {x k)]~ l T' (y) approaches the identity matrix as y tends to x k Then .

there is a function h(d), tending to zero as 6 tends to zero, such that

maxllfT'Cx^rny)!!! < 1 + h(6).


y in Ck
This gives

V(T(Q) < [1 + h(d))l |det T'(xk )\ V(C k ).

As 6 approaches zero, the sum on the right approaches J" c |det T'\ dV
Then the last inequality becomes

V{T{Q) < |det T'\ dV (5)


J

Having proved this last inequality, we use it to prove the formula for more general sets than cubes. We shall assume $f \ge 0$; the general case follows by considering the positive and negative parts of $f$ separately and adding the resulting formula for each part. Let $G$ be a cubical grid covering $R$ and having mesh $\delta$. Let $C_1, \ldots, C_N$ be the cubes of $G$ that are contained in $R$. If we let $R_1$ be the part of $R$ that is not contained in any of the cubes $C_k$, then $R = C_1 \cup \ldots \cup C_N \cup R_1$. Whenever $y_k$ is a point of $C_k$ and $x_k = T(y_k)$, we shall write $f_k$ for the common value $(f \circ T)(y_k) = f(x_k)$. Then, because of (5), we have

\[
\sum_{k=1}^{N} f_k \int_{T(C_k)} dV \le \sum_{k=1}^{N} f_k \int_{C_k} |J|\,dV.
\]

From this it follows that

\[
D = \int_{T(R)} f\,dV - \int_R (f \circ T)\,|J|\,dV
\le \int_{T(R)} f\,dV - \sum_{k=1}^{N} f_k \int_{T(C_k)} dV
+ \sum_{k=1}^{N} f_k \int_{C_k} |J|\,dV - \int_R (f \circ T)\,|J|\,dV.
\]

Since $T(R) = T(C_1) \cup \ldots \cup T(C_N) \cup T(R_1)$, we have

\[
D \le \int_{T(R_1)} f\,dV + \sum_{k=1}^{N} \int_{T(C_k)} (f - f_k)\,dV
+ \sum_{k=1}^{N} \int_{C_k} (f_k - f \circ T)\,|J|\,dV - \int_{R_1} (f \circ T)\,|J|\,dV.
\]

Because $T$ is continuously differentiable, it follows from Theorem 8.1 of this Appendix that there is a constant $B$ for which

\[
\|T(x) - T(y)\| \le B\,\|x - y\|, \quad \text{for all } x \text{ and } y \text{ in } R. \qquad (6)
\]

Now let $\epsilon$ be a positive number. Since $f$ is uniformly continuous on $T(R)$ (apply Theorem 1.3 of this Appendix to $T(R)$ together with its boundary), we can choose $\delta$, the mesh of $G$, small enough so that

\[
|(f \circ T)(y) - f_k| \le \epsilon, \quad \text{for } y \text{ in } C_k,\ k = 1, \ldots, N.
\]

By using (6) and, if necessary, taking $\delta$ still smaller, we can get

\[
|f(x) - f_k| \le \epsilon, \quad \text{for } x \text{ in } T(C_k),\ k = 1, \ldots, N.
\]

Then

\[
D \le \int_{T(R_1)} f\,dV - \int_{R_1} (f \circ T)\,|J|\,dV + \epsilon\{V(T(R)) + V(R)\}.
\]

Since $R$ is assumed to have a volume, there is a mesh such that $V(R_1) \le \epsilon$. Again using (6) and, if necessary, decreasing the mesh again, we can get $V(T(R_1)) \le B^n \epsilon$. Then

\[
D \le \epsilon\{M B^n + M + V(T(R)) + V(R)\},
\]

where $M$ is a number such that $f \le M$ on $T(R)$ and $(f \circ T)\,|J| \le M$ on $R$. Since $\epsilon$ is arbitrary, we must have $D \le 0$, that is,

\[
\int_{T(R)} f\,dV \le \int_R (f \circ T)\,|J|\,dV.
\]
If we apply this last inequality to the situation in which $T$ is replaced by $T^{-1}$, we get

\[
\int_R (f \circ T)\,|J|\,dV \le \int_{T(R)} (f \circ T \circ T^{-1})\,|J \circ T^{-1}|\,|J^{-1}|\,dV,
\]

where $J^{-1}$ is the Jacobian determinant of the transformation $T^{-1}$. But $(J \circ T^{-1})(J^{-1}) = 1$; so

\[
\int_{T(R)} f\,dV \le \int_R (f \circ T)\,|J|\,dV \le \int_{T(R)} f\,dV,
\]

and the desired equality has been proved.
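The formula is also easy to test numerically. The following sketch is added here purely as an illustration; the integrand, grid sizes, and variable names are our own choices and are not part of the original text. It checks the formula for the polar-coordinate map, which carries the rectangle $[0,1] \times [0, 2\pi]$ onto the unit disk with Jacobian determinant $r$; for $f(x, y) = x^2 + y^2$ both sides equal $\pi/2$.

```python
import numpy as np

def f(x, y):
    return x**2 + y**2

# Left side: Riemann sum of f over the unit disk T(R), using midpoints of a
# uniform grid on [-1, 1] x [-1, 1] and an indicator function of the disk.
n = 1000
xs = np.linspace(-1.0, 1.0, n, endpoint=False) + 1.0 / n
X, Y = np.meshgrid(xs, xs)
inside = X**2 + Y**2 <= 1.0
left = np.sum(f(X, Y) * inside) * (2.0 / n) ** 2

# Right side: Riemann sum of (f o T)|J| = r**2 * r over R = [0,1] x [0, 2*pi].
m = 1000
rs = np.linspace(0.0, 1.0, m, endpoint=False) + 0.5 / m
ts = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False) + np.pi / m
Rg, Tg = np.meshgrid(rs, ts)
right = np.sum(f(Rg * np.cos(Tg), Rg * np.sin(Tg)) * Rg) * (1.0 / m) * (2.0 * np.pi / m)

print(left, right, np.pi / 2)  # the three values agree to about two decimals
```

Both sums approach $\pi/2 \approx 1.5708$ as the grids are refined; the left-hand sum converges more slowly because the boundary of the disk is only approximated by grid cells.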
If $J$ is zero on some piece of smooth surface $S$ in $R$, then the above proof breaks down, because $T^{-1}$ may fail to be continuously differentiable. However, by Theorem 8.2 of this Appendix, $S$ can be enclosed in the interior of a union $U$ of finitely many rectangles of arbitrarily small content $v$, and the image surface $T(S)$ will be contained in an image region $T(U)$ having content at most $B^n v$, where $B$ is the constant of relation (6). Then, applying the change-of-variable formula to the region $R$ with $U$ deleted, we get

\[
\int_{T(R)} f\,dV - \int_R (f \circ T)\,|J|\,dV
\]
Index

A
Abel summable integral, 491
Absolute maximum, minimum, 350
Absolute value, 148
Acceleration:
  magnitude of, 209
  normal, 217
  tangential, 217
Acceleration vector, 209
Affine:
  function, 33, 250
  subspace, 116
Amplitude, 159
Angle, 159
Angular momentum, 21
Approximation:
  Fourier, 397
  Legendre, 435
  mean-square, 432
  Simpson's three-point, 499
  Taylor, 380, 383, 385
Arc length, 585
  element of, 216
  parametrization, 216
Area:
  element, 477
  surface, 532
Associated column, row, 98

B
Basis, 6, 12, 111
Bessel's inequality, 431
Bilinear function, symmetric, 361
Binormal vector, 209
Border of surface, 537
Boundary of a set, 248
Bounded function, set, 450
Box norm, 581

C
Cantor function, 585
Cauchy-Schwarz inequality, 41
Cauchy sequence, 580
Center of mass, 220
Centripetal component, 217
Centroid, 466
Chain rule, 287, 290, 583
Change of variable, 467, 605
Characteristic equation, roots, vectors, 168, 171, 370, 371
Characteristic polynomial, 137
Circuit, 518
Circulation, 519, 545
Closed set, 248
Closed surface, 547
Closure of a set, 248
Coefficient matrix, 79
Coefficients:
  Fourier, 161, 397, 430
  Legendre, 435
  multinomial, 385
Cofactor, 52, 69
Column of a matrix, 15
Column vector, n-dimensional, 16
Complex:
  conjugate, 148
  exponential, 150
  inner product, 156
  numbers, 147
  vector space, 147, 153
Component, tangential, normal, 217
Component of a vector, 40
Composition of functions, 31, 121, 288
Conditionally convergent integral, 490
Conjugate homogeneity, 157
Conjugate symmetric, 156
Connected, polygonally, 276
Conservative field, 522, 524
Content, 450, 456
  n-dimensional, 458
Continuous differentiability, 259
Continuous function, 244, 245
Convergence:
  conditional, 490
  Fourier series, 400
  pointwise, 422
  sequence, 264
  uniform, 423
Convex, 14
Coordinate, 40
Coordinate functions, 180
Coordinate map, 182
Coordinate of a vector relative to another vector, 40
Coordinate rectangle, 449
  closed, 449
  open, 450
Coordinates of a vector, 6, 179
Coordinate vector, 179
Covers, 450
Cramer's rule, 71
Critical point, 353
Cross-product, 47, 63
Curl, 518, 542
Curvature, 217
Curvilinear coordinates, 310, 315
Cylindrical coordinates, 318

D
Del, 557
Dependent vectors, 12, 111
Derivative, 205, 227, 253
  directional, 272, 273
Derivative with respect to a vector, 255
Determinant, 51, 52
  expansions, 67
Diagonal form, 363
Diagonal matrix, 22
Differentiable function, 252
Differential, 252
  kth-order, 385
Differential form, 565, 566, 569
Differential operator, 130
Diffusion equation, 413
Dimension, 113
Directional derivative, 272, 273
Direction cosine, 50
Discontinuity, removable, 250
Discontinuity point, infinite, 483
Divergence of a field, 520, 551
Divergence theorem, 520
Domain, space, 1, 2, 199

E
Echelon form, 107
Eigenfunction, 433
Eigenvalues, 168, 370, 433
Eigenvectors, 168, 371, 433
Element:
  arc length, 216
  surface area, 477
  volume, 477
Elementary matrix, 105, 107
Elementary operations, 81
  inverse, 82
  modification, 82
  multiplication, 81
  transformation, 105
  transposition, 82
Entry, matrix, 16
Equilibrium, 91
Equipotential surface, 287
Equivalent norms, 581
Equivalent parametrizations, 218, 539
Equivalent systems of linear equations, 80
Even function, 408
Expansion:
  Fourier, 163, 405
  Legendre, 435
  multinomial, 385
  Taylor, 380, 383, 390
Expansion, determinant, 67
Explicit definition, 196, 198
  of surfaces, 341
Exponential, complex multiplier, 132
Exterior derivative, 574
  differentiation, 575
Exterior product, 569
Extreme point, value, 350, 390

F
Field:
  conservative, 524
  force, 222
  gradient, 524
  vector, 199
Finite dimensional, 114
Flow, steady, 236, 237
Flux, 520, 536, 569
Force vector, 209, 211
  field, 222
Fourier approximation, 397, 435
  coefficient, 161, 397, 430
  expansion, 163, 405
Fourier-Legendre approximation, 435
Fourier series, 397, 399
  convergence of, 587
Function:
  affine, 33
  composite, 31, 288
  continuous, 244
  continuously differentiable, 259
  differentiable, 252
  even, odd, 408
  integrable, 451
  inverses, 32, 325
  limit of, 204, 231, 241
  linear, 25
  one-to-one, 30, 127
  orthogonal, 429
  real-valued, 192
  symmetric bilinear, 361
  translation, 33
Fundamental theorem of calculus, 283

G
Gauss's theorem, 520, 551, 552
Gradient, 278
  field, 524
  local, 549
  temperature, 287, 539
Gram-Schmidt orthogonalization process, 163
Graph, 192, 198
Green's identities, 559
Green's theorem, 510, 513
Grid, 450

H
Harmonic function, 555
Heat equation, 412, 413
Hermitian symmetric, 177
Homogeneous equation, 100, 140
  solution, 140
Homogeneous polynomial, 362, 363
Hyperbolic paraboloid, 364

I
Identity function, 299
Identity matrix, 20
Image, 1
Imaginary part, 147
Implicit definition, 197, 199, 302, 303
Implicit function theorem, 331, 593
Implicitly defined surfaces, 344
Improper Riemann integral, 485
Incompressible field, 556, 564
Incompressible flow, 520
Independent vectors, 12, 111
Infimum, 579
Infinite discontinuity point, 483
Inner product, 37
Inner product, complex, 156
Inner product space, 41
Integrable function, 451
Integral, surface, 530, 534
  line, 221
Integral of function, 451, 485
Integral Remainder Formula, 380
Interior point, 247
Inverse elementary operation, 82
Inverse function, 32, 124, 325
Inverse function theorem, 327, 592
Inverse matrix, 20
Invertible matrix, 20
Irrotational field, 519, 556
Isolated point, 244
Isomorphism, 182
Iterated integral, 440, 441

J
Jacobian determinant, 296
Jacobian matrix, 255

K
Kinetic energy, 524
kth-order differential, 385

L
Lagrange formula, 434
Lagrange multipliers, 355
Lagrange theorem, 595
Laplace equation, 559
Leading entry, 97
Legendre approximation, 435
Legendre equation, polynomial, 435
Leibnitz rule, 463
Length of a curve, 212
Length of a vector, 38
Level set, 197, 199
Limit of function, 204, 231, 241
Limit point, 240
Line, 10, 116
  through the origin, 116
Linear combination, 6, 108
Linear function, 25, 120
  composition, 121
  sum, 120
Linearity theorem, 458
Linearly dependent, independent, 11, 12, 111
Linear momentum, 21
Linear operator, 132, 433
Linear subspace, 108
Line integral, 221
Line segment, 14
Local maximum value, minimum value, 350
Logarithmic potential, 287

M
Magnitude of acceleration, 209
Magnitude of force, 209
Matrix, 15
  diagonal, 22
  elementary, 105, 107
  identity, 20
  inverse, 20
  invertible, 20
  Jacobian, 255
  orthogonal, 42, 43
  product, 18
  symmetric, 363
  transformation, 25
  transpose, 43, 44
Maximum:
  absolute, 350
  local, 350
  norm, 580
Mean, probabilistic, 492
Mean-square approximation, 432
Mean-value theorem, 275
Mesh, 450
Minimum:
  absolute, 350
  local, 350
Minor, 51
Modification, elementary, 82
Moment of inertia, 466
Multinomial coefficients, 385
Multinomial expansion, 385
Multiple integral, 449
Multiplication, elementary, 81
Multipliers, Lagrange, 355

N
Natural basis, 6
Negative definite, 377, 390
Neighborhood, 247
Network, 88
Newtonian potential, 287, 564
Newton's method, modified, 263
Norm, 41
  box, maximum, 580
Normal component, 217
Normal vector, 281
Null space, 109
Numerical multiple, 4, 16
  multiplication, 148

O
Odd function, 408
One-to-one function, 30, 127
Open coordinate rectangle, 450
Open set, 231, 247
Operator, linear, 132, 433
  differential, 130
Orientable surface, 537
Orientation:
  boundary surface, 551
  negative, 62
  positive, 62, 537, 551, 552
  surface, 537
Oriented volume, 65
Origin, 7
Orthogonal functions, 429
Orthogonal matrix, 43
Orthogonal vectors, 42
Orthonormal sequence, 430
Orthonormal set, 43, 160

P
Paraboloid, hyperbolic, 364
Parallel, 10, 116
Parallelepiped, 61
Parallelogram, law, 8
Parametric definition, 196, 198
Parametrization, arc-length, 216
Partial derivative, 227
Particular solution, 140
Phase angle, 159
Piecewise smooth curve, 213, 400
Piecewise smooth surface, 537
Pivotal column, 98
Plane, 11, 116, 337
  through the origin, 116
Pointwise convergence, 422
Poisson equation, 559
Polar angle, 149
  form, 149
Polar coordinates, 3 1
Polygonally connected, 276
Polynomial:
  homogeneous quadratic, 362
  Legendre, 435
  quadratic, 361
  Taylor, 381, 383, 384
  trigonometric, 397
Positive definite, 363, 392
Positive orientation, 537, 551, 552
Positivity theorem, 458
Potential energy, 525
Potential:
  logarithmic, 287
  Newtonian, 287, 564
Potential of a field, 524
Potential function, 287
Principal normal, 209
Principal value, 487
Probability density, 467
Product rule for determinants, 60
Projection, 7, 163

Q
Quadratic form, 362
Quadratic polynomial, 361
  homogeneous, 362
Quadratic surface, 374

R
ball, 240
Random walk, 91
Range, 1, 198
Range space, 2
Rank, 127
Real part, 147
Real-valued function, 192
Rectifiable curve, 212
Reduced matrix, 97
Remainder, Taylor, 380
Removable discontinuity, 250
Resultant force, 90
Riemann integral, 601
  improper, 485
Riemann's lemma, 589
Riemann sum, 451
Row of a matrix, 15
Row vector, n-dimensional, 16

S
Scalar, 6
Separation of variables, 414
Sequence, Cauchy, 580
Series, Fourier, 397, 399
Shape of a matrix, 15
Shear, transformation, 176
Similar, 186
Simple region:
  in ℛ², 513
  in ℛ³, 552
Simply connected region, 548
Simpson's rule, 500
  three-point approximation, 499
Smooth curve, 337
  piecewise, 213, 400, 537
Smooth set, 453, 601
Smooth surface, 337
  piece of, 531
Solenoidal field, 564
Solid angle, 541
Span, 11, 109
Speed, 207
Spherical coordinates, 313
Square matrix, 16
Stokes's formula, 544
Stokes's theorem, 519, 541, 544
Sturm-Liouville operator, 434
Subspace, 108
  affine, 116
Sum:
  of functions, 120
  of matrices, 16
  of n-tuples, 4
Superposition principle, 94, 141
Supremum, 579
Surface:
  area, 532
  equipotential, 287
  explicitly defined, 341
  implicitly defined, 344
  integral, 530, 534
  of revolution, area, 541
  quadratic, 374
  smooth, 337, 531
Symmetric, 173
  with respect to inner product, 433
Symmetric bilinear function, 361
Symmetric linear transformation, 379
Symmetric operator, 560
System of linear equations, 79

T
Tangent:
  line, 206, 253, 281
  plane, 281, 337
  vector, 206
Tangential component, 217
Taylor expansion, 380, 383, 390
Taylor's theorem, 386, 598
Temperature gradient, 287, 539
Torque, 21
Total mass, 215
Trajectory, 237
Transformation:
  elementary, 105
  shear, 176
Transformations, linear and affine, 34
Translation, 250
  of functions, 33
Transpose, matrix, 43, 44
Transposition, elementary, 82
Triangle inequality, 41, 157, 581
Trigonometric polynomial, 397

U
Uniform convergence, 423
Unit vector, 272

V
Variance, 492
Vector, 5
  length, 38
  velocity, 207
Vector field, 199
Vector identities, 558
Vector integral, 211, 465
Vector space, 5
  complex, 147, 153
  linear subspace, subspace, 108
Volume, 450, 456
  oriented, 65
Volume element, 477

W
Wave equation, 412
Weierstrass test, 424
Work, 223

Z
Zero matrix, 17
Zero vector, 5
