Sheldon Axler

Linear Algebra Done Right
Second Edition

Springer
Contents

Preface to the Instructor
Preface to the Student
Acknowledgments

Chapter 1  Vector Spaces
    Complex Numbers
    Definition of Vector Space
    Properties of Vector Spaces
    Subspaces
    Sums and Direct Sums
    Exercises

Chapter 2  Finite-Dimensional Vector Spaces
    Span and Linear Independence
    Bases
    Dimension
    Exercises

Chapter 3  Linear Maps
    Definitions and Examples
    Null Spaces and Ranges
    The Matrix of a Linear Map
    Invertibility
    Exercises

Chapter 4  Polynomials
    Degree
    Complex Coefficients
    Real Coefficients
    Exercises

Chapter 5  Eigenvalues and Eigenvectors

Chapter 6  Inner-Product Spaces
    Inner Products
    Norms
    Orthonormal Bases
    Orthogonal Projections and Minimization Problems
    Linear Functionals and Adjoints
    Exercises

Chapter 7  Operators on Inner-Product Spaces

Chapter 8  Operators on Complex Vector Spaces
    Square Roots
    The Minimal Polynomial
    Jordan Form
    Exercises

Chapter 9  Operators on Real Vector Spaces

Chapter 10  Trace and Determinant

Symbol Index
Index


Preface to the Instructor
You are probably about to teach a course that will give students
their second exposure to linear algebra. During their first brush with
the subject, your students probably worked with Euclidean spaces and
matrices. In contrast, this course will emphasize abstract vector spaces
and linear maps.
The audacious title of this book deserves an explanation. Almost all linear algebra books use determinants to prove that every linear operator on a finite-dimensional complex vector space has an eigenvalue. Determinants are difficult, nonintuitive, and often defined without motivation. To prove the theorem about existence of eigenvalues on complex vector spaces, most books must define determinants, prove that a linear map is not invertible if and only if its determinant equals 0, and then define the characteristic polynomial. This tortuous (torturous?) path gives students little feeling for why eigenvalues must exist.
In contrast, the simple determinant-free proofs presented here offer more insight. Once determinants have been banished to the end of the book, a new route opens to the main goal of linear algebra: understanding the structure of linear operators.
This book starts at the beginning of the subject, with no prerequisites other than the usual demand for suitable mathematical maturity. Even if your students have already seen some of the material in the first few chapters, they may be unaccustomed to working exercises of the type presented here, most of which require an understanding of proofs.
Vector spaces are defined in Chapter 1, and their basic properties are developed.
Linear independence, span, basis, and dimension are defined in Chapter 2, which presents the basic theory of finite-dimensional vector spaces.
The minimal polynomial, characteristic polynomial, and generalized eigenvectors are introduced in Chapter 8. The main achievement of this chapter is the description of a linear operator on a complex vector space in terms of its generalized eigenvectors. This description enables one to prove almost all the results usually proved using Jordan form. For example, these tools are used to prove that every invertible linear operator on a complex vector space has a square root. The chapter concludes with a proof that every linear operator on a complex vector space can be put into Jordan form.
Linear operators on real vector spaces occupy center stage in Chapter 9. Here two-dimensional invariant subspaces make up for the possible lack of eigenvalues, leading to results analogous to those obtained on complex vector spaces.
The trace and determinant are defined in Chapter 10 in terms of the characteristic polynomial (defined earlier without determinants). On complex vector spaces, these definitions can be restated: the trace is the sum of the eigenvalues and the determinant is the product of the eigenvalues (both counting multiplicity). These easy-to-remember definitions would not be possible with the traditional approach to eigenvalues because that method uses determinants to prove that eigenvalues exist. The standard theorems about determinants now become much clearer. The polar decomposition and the characterization of self-adjoint operators are used to derive the change of variables formula for multivariable integrals in a fashion that makes the appearance of the determinant there seem natural.
This book usually develops linear algebra simultaneously for real and complex vector spaces by letting F denote either the real or the complex numbers. Abstract fields could be used instead, but to do so would introduce extra abstraction without leading to any new linear algebra. Another reason for restricting attention to the real and complex numbers is that polynomials can then be thought of as genuine functions instead of the more formal objects needed for polynomials with coefficients in finite fields. Finally, even if the beginning part of the theory were developed with arbitrary fields, inner-product spaces would push consideration back to just real and complex vector spaces.
Even in a book as short as this one, you cannot expect to cover everything. Going through the first eight chapters is an ambitious goal for a one-semester course. If you must reach Chapter 10, then I suggest covering Chapters 1, 2, and 4 quickly (students may have seen this material in earlier courses) and skipping Chapter 9 (in which case you should discuss trace and determinants only on complex vector spaces).
A goal more important than teaching any particular set of theorems
is to develop in students the ability to understand and manipulate the
objects of linear algebra. Mathematics can be learned only by doing;
fortunately, linear algebra has many good homework problems. When
teaching this course, I usually assign two or three of the exercises each
class, due the next class. Going over the homework might take up a
third or even half of a typical class.
A solutions manual for all the exercises is available (without charge)
only to instructors who are using this book as a textbook. To obtain
the solutions manual, instructors should send an e-mail request to me
(or contact Springer if I am no longer around).
Please check my web site for a list of errata (which I hope will be
empty or almost empty) and other information about this book.
I would greatly appreciate hearing about any errors in this book,
even minor ones. I welcome your suggestions for improvements, even
tiny ones. Please feel free to contact me.
Have fun!
Sheldon Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132, USA
e-mail: axler@math.sfsu.edu
www home page: http://math.sfsu.edu/axler
Preface to the Student

You are probably about to begin your second exposure to linear algebra. Unlike your first brush with the subject, which probably emphasized Euclidean spaces and matrices, we will focus on abstract vector spaces and linear maps. These terms will be defined later, so don't worry if you don't know what they mean. This book starts from the beginning of the subject, assuming no knowledge of linear algebra. The key point is that you are about to immerse yourself in serious mathematics, with an emphasis on your attaining a deep understanding of the definitions, theorems, and proofs.
You cannot expect to read mathematics the way you read a novel. If you zip through a page in less than an hour, you are probably going too fast. When you encounter the phrase "as you should verify", you should indeed do the verification, which will usually require some writing on your part. When steps are left out, you need to supply the missing pieces. You should ponder and internalize each definition. For each theorem, you should seek examples to show why each hypothesis is necessary.
Please check my web site for a list of errata (which I hope will be
empty or almost empty) and other information about this book.
I would greatly appreciate hearing about any errors in this book,
even minor ones. I welcome your suggestions for improvements, even
tiny ones.
Have fun!
Sheldon Axler
Mathematics Department
San Francisco State University
San Francisco, CA 94132, USA
e-mail: axler@math.sfsu.edu
www home page: http://math.sfsu.edu/axler
Acknowledgments
I owe a huge intellectual debt to the many mathematicians who created linear algebra during the last two centuries. In writing this book I tried to think about the best way to present linear algebra and to prove its theorems, without regard to the standard methods and proofs used in most textbooks. Thus I did not consult other books while writing this one, though the memory of many books I had studied in the past surely influenced me. Most of the results in this book belong to the common heritage of mathematics. A special case of a theorem may first have been proved in antiquity (which for linear algebra means the nineteenth century), then slowly sharpened and improved over decades by many mathematicians. Bestowing proper credit on all the contributors would be a difficult task that I have not undertaken. In no case should the reader assume that any theorem presented here represents my original contribution.
Many people helped make this a better book. For useful suggestions and corrections, I am grateful to William Arveson (for suggesting
the proof of 5.13), Marilyn Brouwer, William Brown, Robert Burckel,
Paul Cohn, James Dudziak, David Feldman (for suggesting the proof of
8.40), Pamela Gorkin, Aram Harrow, Pan Fong Ho, Dan Kalman, Robert
Kantrowitz, Ramana Kappagantu, Mizan Khan, Mikael Lindström, Jacob Plotkin, Elena Poletaeva, Mihaela Poplicher, Richard Potter, Wade
Ramey, Marian Robbins, Jonathan Rosenberg, Joan Stamm, Thomas
Starbird, Jay Valanju, and Thomas von Foerster.
Finally, I thank Springer for providing me with help when I needed it and for allowing me the freedom to make the final decisions about the content and appearance of this book.
Chapter 1
Vector Spaces
Linear algebra is the study of linear maps on finite-dimensional vector spaces. Eventually we will learn what all these terms mean. In this chapter we will define vector spaces and discuss their elementary properties.
In some areas of mathematics, including linear algebra, better theorems and more insight emerge if complex numbers are investigated along with real numbers. Thus we begin by introducing the complex numbers and their basic properties.
Complex Numbers
The symbol i was first used to denote √−1 by the Swiss mathematician Leonhard Euler in 1777.
You should already be familiar with the basic properties of the set R of real numbers. Complex numbers were invented so that we can take square roots of negative numbers. The key idea is to assume we have a square root of −1, denoted i, and manipulate it using the usual rules of arithmetic. Formally, a complex number is an ordered pair (a, b), where a, b ∈ R, but we will write this as a + bi. The set of all complex numbers is denoted by C:

    C = {a + bi : a, b ∈ R}.

If a ∈ R, we identify a + 0i with the real number a. Thus we can think of R as a subset of C.
Addition and multiplication on C are defined by

    (a + bi) + (c + di) = (a + c) + (b + d)i,
    (a + bi)(c + di) = (ac − bd) + (ad + bc)i;

here a, b, c, d ∈ R. Using multiplication as defined above, you should verify that i^2 = −1. Do not memorize the formula for the product of two complex numbers; you can always rederive it by recalling that i^2 = −1 and then using the usual rules of arithmetic.
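For readers who want to experiment, here is a small computational sketch of the definitions above (in Python, which this book nowhere assumes); it represents a complex number as the ordered pair (a, b) standing for a + bi and checks that the multiplication rule gives i^2 = −1.

    # Complex numbers as ordered pairs (a, b), meaning a + bi.
    def add(z, w):
        a, b = z
        c, d = w
        return (a + c, b + d)

    def mul(z, w):
        a, b = z
        c, d = w
        return (a * c - b * d, a * d + b * c)

    i = (0, 1)
    print(mul(i, i))            # (-1, 0), i.e. i^2 = -1
    print(mul((1, 2), (3, 4)))  # (-5, 10), matching (1 + 2i)(3 + 4i) = -5 + 10i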
You should verify, using the familiar properties of the real numbers, that addition and multiplication on C satisfy the following properties:

commutativity
    w + z = z + w and wz = zw for all w, z ∈ C;

associativity
    (z1 + z2) + z3 = z1 + (z2 + z3) and (z1 z2)z3 = z1(z2 z3) for all z1, z2, z3 ∈ C;

identities
    z + 0 = z and z1 = z for all z ∈ C;

additive inverse
    for every z ∈ C, there exists a unique w ∈ C such that z + w = 0;

multiplicative inverse
    for every z ∈ C with z ≠ 0, there exists a unique w ∈ C such that zw = 1;

distributive property
    λ(w + z) = λw + λz for all λ, w, z ∈ C.
For z ∈ C, we let −z denote the additive inverse of z. Thus −z is the unique complex number such that

    z + (−z) = 0.

Subtraction on C is defined by

    w − z = w + (−z)

for w, z ∈ C.
For z ∈ C with z ≠ 0, we let 1/z denote the multiplicative inverse of z. Thus 1/z is the unique complex number such that

    z(1/z) = 1.

Division on C is defined by

    w/z = w(1/z)

for w, z ∈ C with z ≠ 0.
So that we can conveniently make definitions and prove theorems that apply to both real and complex numbers, we adopt the following notation:

    Throughout this book, F stands for either R or C.
To generalize R2 and R3 to higher dimensions, we first need to discuss the concept of lists. Suppose n is a nonnegative integer. A list of length n is an ordered collection of n objects (which might be numbers, other lists, or more abstract entities) separated by commas and surrounded by parentheses. A list of length n looks like this:

    (x1, . . . , xn).

Thus a list of length 2 is an ordered pair and a list of length 3 is an ordered triple. For j ∈ {1, . . . , n}, we say that xj is the jth coordinate of the list above. Thus x1 is called the first coordinate, x2 is called the second coordinate, and so on.

Many mathematicians call a list of length n an n-tuple.
Sometimes we will use the word list without specifying its length. Remember, however, that by definition each list has a finite length that is a nonnegative integer, so that an object that looks like

    (x1, x2, . . . ),

which might be said to have infinite length, is not a list. A list of length 0 looks like this: (). We consider such an object to be a list so that some of our theorems will not have trivial exceptions.
Two lists are equal if and only if they have the same length and
the same coordinates in the same order. In other words, (x1 , . . . , xm )
equals (y1 , . . . , yn ) if and only if m = n and x1 = y1 , . . . , xm = ym .
Lists differ from sets in two ways: in lists, order matters and repetitions are allowed, whereas in sets, order and repetitions are irrelevant.
For example, the lists (3, 5) and (5, 3) are not equal, but the sets {3, 5}
and {5, 3} are equal. The lists (4, 4) and (4, 4, 4) are not equal (they
do not have the same length), though the sets {4, 4} and {4, 4, 4} both
equal the set {4}.
To define the higher-dimensional analogues of R2 and R3, we will simply replace R with F (which equals R or C) and replace the 2 or 3 with an arbitrary positive integer. Specifically, fix a positive integer n for the rest of this section. We define Fn to be the set of all lists of length n consisting of elements of F:

    Fn = {(x1, . . . , xn) : xj ∈ F for j = 1, . . . , n}.

For example, if F = R and n equals 2 or 3, then this definition of Fn agrees with our previous notions of R2 and R3. As another example, C4 is the set of all lists of four complex numbers:

    C4 = {(z1, z2, z3, z4) : z1, z2, z3, z4 ∈ C}.

If n ≥ 4, we cannot easily visualize Rn as a physical object. The same problem arises if we work with complex numbers: C1 can be thought of as a plane, but for n ≥ 2, the human brain cannot provide geometric models of Cn. However, even if n is large, we can perform algebraic manipulations in Fn as easily as in R2 or R3. For example, addition is defined on Fn by adding corresponding coordinates:
1.1    (x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn).

For an amusing account of how R3 would be perceived by a creature living in R2, read Flatland: A Romance of Many Dimensions, by Edwin A. Abbott.

[Figure: a vector (x1, x2) in R2, drawn as an arrow from the origin, with the x1-axis labeled.]
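The coordinatewise definition of addition is easy to translate into code. The short Python sketch below (an illustration only, not part of the text's formal development) models lists of length n as tuples and adds them coordinate by coordinate; it works equally well over R and over C.

    def vector_add(x, y):
        # coordinatewise addition in F^n; the lists must have the same length
        if len(x) != len(y):
            raise ValueError("vectors must have the same length")
        return tuple(xj + yj for xj, yj in zip(x, y))

    print(vector_add((1, 2, 3), (4, 5, 6)))    # (5, 7, 9)
    print(vector_add((1 + 2j, 0), (3j, 1)))    # ((1+5j), 1): the same rule over C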
Whenever we use pictures in R2 or use the somewhat vague language of points and vectors, remember that these are just aids to our understanding, not substitutes for the actual mathematics that we will develop. Though we cannot draw good pictures in high-dimensional spaces, the elements of these spaces are as rigorously defined as elements of R2. For example, (2, −3, 17, π, √2) is an element of R5, and we may casually refer to it as a point in R5 or a vector in R5 without worrying about whether the geometry of R5 has any physical meaning.
Recall that we defined the sum of two elements of Fn to be the element of Fn obtained by adding corresponding coordinates; see 1.1. In
the special case of R2 , addition has a simple geometric interpretation.
Suppose we have two vectors x and y in R2 that we want to add, as in
the left side of the picture below. Move the vector y parallel to itself so
that its initial point coincides with the end point of the vector x. The
sum x + y then equals the vector whose initial point equals the initial point of x and whose end point equals the end point of the moved
vector y, as in the right side of the picture below.
Mathematical models of the economy often have thousands of variables, say x1, . . . , x5000, which means that we must operate in R5000. Such a space cannot be dealt with geometrically, but the algebraic approach works well. That's why our subject is called linear algebra.

[Figure: two vectors x and y, and their sum x + y obtained by moving y so that its initial point sits at the end point of x.]

You may be familiar with the dot product in R2 or R3, in which we multiply together two vectors and obtain a scalar. Generalizations of the dot product will become important when we study inner products in Chapter 6. You may also be familiar with the cross product in R3, in which we multiply together two vectors and obtain another vector. No useful generalization of this type of multiplication exists in higher dimensions.

[Figure: a vector x together with the scalar multiples (1/2)x and (3/2)x.]
The motivation for the definition of a vector space comes from the important properties possessed by addition and scalar multiplication on Fn. Specifically, addition on Fn is commutative and associative and has an identity, namely, 0. Every element has an additive inverse. Scalar multiplication on Fn is associative, and scalar multiplication by 1 acts as a multiplicative identity should. Finally, addition and scalar multiplication on Fn are connected by distributive properties.
We will define a vector space to be a set V along with an addition and a scalar multiplication on V that satisfy the properties discussed in the previous paragraph. By an addition on V we mean a function that assigns an element u + v ∈ V to each pair of elements u, v ∈ V. By a scalar multiplication on V we mean a function that assigns an element av ∈ V to each a ∈ F and each v ∈ V.
Now we are ready to give the formal definition of a vector space.
A vector space is a set V along with an addition on V and a scalar multiplication on V such that the following properties hold:

commutativity
    u + v = v + u for all u, v ∈ V;

associativity
    (u + v) + w = u + (v + w) and (ab)v = a(bv) for all u, v, w ∈ V and all a, b ∈ F;

additive identity
    there exists an element 0 ∈ V such that v + 0 = v for all v ∈ V;

additive inverse
    for every v ∈ V, there exists w ∈ V such that v + w = 0;

multiplicative identity
    1v = v for all v ∈ V;

distributive properties
    a(u + v) = au + av and (a + b)u = au + bu for all a, b ∈ F and all u, v ∈ V.
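Although the axioms must of course be proved, not sampled, it can be instructive to see them hold numerically. The following Python sketch (a spot check only, using randomly chosen vectors in R^3 and random scalars; none of this is assumed by the book) tests several of the properties listed above.

    import random

    def add(u, v):
        return tuple(uj + vj for uj, vj in zip(u, v))

    def smul(a, v):
        return tuple(a * vj for vj in v)

    def close(u, v, tol=1e-9):
        return all(abs(uj - vj) < tol for uj, vj in zip(u, v))

    for _ in range(1000):
        u, v, w = (tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(3))
        a, b = random.uniform(-1, 1), random.uniform(-1, 1)
        assert close(add(u, v), add(v, u))                             # commutativity
        assert close(add(add(u, v), w), add(u, add(v, w)))             # associativity
        assert close(smul(a * b, v), smul(a, smul(b, v)))              # (ab)v = a(bv)
        assert close(smul(a, add(u, v)), add(smul(a, u), smul(a, v)))  # a(u+v) = au+av
        assert close(smul(a + b, u), add(smul(a, u), smul(b, u)))      # (a+b)u = au+bu
    print("all sampled axiom checks passed")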
The scalar multiplication in a vector space depends upon F. Thus when we need to be precise, we will say that V is a vector space over F instead of saying simply that V is a vector space. For example, Rn is a vector space over R, and Cn is a vector space over C. Frequently, a vector space over R is called a real vector space and a vector space over C is called a complex vector space.
Though Fn is our crucial example of a vector space, not all vector spaces consist of lists. For example, the elements of P(F) consist of functions on F, not lists. In general, a vector space is an abstract entity whose elements might be lists, functions, or weird objects.
Proof: Suppose 0 and 0′ are both additive identities for some vector space V. Then

    0′ = 0′ + 0 = 0,

where the first equality holds because 0 is an additive identity and the second equality holds because 0′ is an additive identity. Thus 0′ = 0, proving that V has only one additive identity.

The symbol ∎ means "end of the proof".
1.4    Proposition: 0v = 0 for every v ∈ V.

Proof: For v ∈ V we have

    0v = (0 + 0)v = 0v + 0v.

Adding the additive inverse of 0v to both sides of the equation above gives 0 = 0v, as desired.

For v ∈ V we also have

    v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = 0.

This equation says that (−1)v, when added to v, gives 0. Thus (−1)v must be the additive inverse of v, as desired.
Subspaces
A subset U of V is called a subspace of V if U is also a vector space (using the same addition and scalar multiplication as on V). For example,

    {(x1, x2, 0) : x1, x2 ∈ F}

is a subspace of F3.

Some mathematicians use the term linear subspace, which means the same as subspace.
If U is a subset of V, then to check that U is a subspace of V we need only check that U satisfies the following:

additive identity
    0 ∈ U;

closed under addition
    u, v ∈ U implies u + v ∈ U;

closed under scalar multiplication
    a ∈ F and u ∈ U implies au ∈ U.

The first condition insures that the additive identity of V is in U. The second condition insures that addition makes sense on U. The third condition insures that scalar multiplication makes sense on U. To show that U is a vector space, the other parts of the definition of a vector space do not need to be checked because they are automatically satisfied. For example, the associative and commutative properties of addition automatically hold on U because they hold on the larger space V. As another example, if the third condition above holds and u ∈ U, then −u (which equals (−1)u by 1.6) is also in U, and hence every element of U has an additive inverse in U.
The three conditions above usually enable us to determine quickly whether a given subset of V is a subspace of V. For example, if b ∈ F, then

    {(x1, x2, x3, x4) ∈ F4 : x3 = 5x4 + b}

is a subspace of F4 if and only if b = 0, as you should verify. As another example, you should verify that

    {p ∈ P(F) : p(3) = 0}

is a subspace of P(F).
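The three conditions are easy to test mechanically in a concrete case. The Python sketch below (illustrative only) checks them for the set {(x1, x2, x3, x4) ∈ R4 : x3 = 5x4 + b}, confirming that closure fails as soon as b ≠ 0.

    def in_U(x, b, tol=1e-9):
        # membership test for U_b = {x in R^4 : x3 = 5*x4 + b}
        return abs(x[2] - (5 * x[3] + b)) < tol

    def check(b):
        zero = (0.0, 0.0, 0.0, 0.0)
        u = (1.0, 2.0, 5.0 + b, 1.0)     # an element of U_b
        v = (0.0, 1.0, 10.0 + b, 2.0)    # another element of U_b
        s = tuple(uj + vj for uj, vj in zip(u, v))
        double = tuple(2 * uj for uj in u)
        return (in_U(zero, b), in_U(s, b), in_U(double, b))

    print(check(b=0))   # (True, True, True): all three conditions hold
    print(check(b=1))   # (False, False, False): not a subspace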
The subspaces of R2 are precisely {0}, R2, and all lines in R2 through the origin. The subspaces of R3 are precisely {0}, R3, all lines in R3 through the origin, and all planes in R3 through the origin. To prove that all these objects are indeed subspaces is easy; the hard part is to show that they are the only subspaces of R2 or R3. That task will be easier after we introduce some additional tools in the next chapter.
When dealing with vector spaces, we are usually interested only in subspaces, as opposed to arbitrary subsets. The union of subspaces is rarely a subspace (see Exercise 9 in this chapter), which is why we usually work with sums rather than unions.

Suppose U1, . . . , Um are subspaces of V. Their sum, denoted U1 + · · · + Um, is defined by

    U1 + · · · + Um = {u1 + · · · + um : u1 ∈ U1, . . . , um ∈ Um}.

Sums of subspaces in the theory of vector spaces are analogous to unions of subsets in set theory: given two subspaces of a vector space, the smallest subspace containing both of them is their sum, just as the smallest subset containing two given subsets of a set is their union.

As an example, suppose

    U = {(x, 0, 0) ∈ F3 : x ∈ F}   and   W = {(y, y, 0) ∈ F3 : y ∈ F}.

Then

1.7    U + W = {(x, y, 0) : x, y ∈ F}.
The symbol ⊕, consisting of a plus sign inside a circle, is used to denote direct sums as a reminder that we are dealing with a special type of sum of subspaces: each element in the direct sum can be represented only one way as a sum of elements from the specified subspaces.

    P(F) = Ue ⊕ Uo.
Sometimes nonexamples add to our understanding as much as examples. Consider the following three subspaces of F3:

    U1 = {(x, y, 0) ∈ F3 : x, y ∈ F};
    U2 = {(0, 0, z) ∈ F3 : z ∈ F};
    U3 = {(0, y, y) ∈ F3 : y ∈ F}.

Clearly F3 = U1 + U2 + U3 because an arbitrary vector (x, y, z) ∈ F3 can be written as

    (x, y, z) = (x, y, 0) + (0, 0, z) + (0, 0, 0),

where the first vector on the right side is in U1, the second vector is in U2, and the third vector is in U3. However, F3 does not equal the direct sum of U1, U2, U3 because the vector (0, 0, 0) can be written in two different ways as a sum u1 + u2 + u3, with each uj ∈ Uj. Specifically, we have

    (0, 0, 0) = (0, 1, 0) + (0, 0, 1) + (0, −1, −1)

and, of course,

    (0, 0, 0) = (0, 0, 0) + (0, 0, 0) + (0, 0, 0),

where the first vector on the right side of each equation above is in U1, the second vector is in U2, and the third vector is in U3.
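A one-line computation confirms the two representations of (0, 0, 0) used above; the sketch below (plain Python, for illustration) adds the vectors coordinatewise.

    def add3(u, v, w):
        return tuple(a + b + c for a, b, c in zip(u, v, w))

    print(add3((0, 1, 0), (0, 0, 1), (0, -1, -1)))  # (0, 0, 0)
    print(add3((0, 0, 0), (0, 0, 0), (0, 0, 0)))    # (0, 0, 0)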
In the example above, we showed that something is not a direct sum
by showing that 0 does not have a unique representation as a sum of
appropriate vectors. The definition of direct sum requires that every
vector in the space have a unique representation as an appropriate sum.
Suppose we have a collection of subspaces whose sum equals the whole
space. The next proposition shows that when deciding whether this
collection of subspaces is a direct sum, we need only consider whether
0 can be uniquely written as an appropriate sum.
1.8    Proposition: Suppose that U1, . . . , Un are subspaces of V. Then V = U1 ⊕ · · · ⊕ Un if and only if both the following conditions hold:

(a)    V = U1 + · · · + Un;

(b)    the only way to write 0 as a sum u1 + · · · + un, where each uj ∈ Uj, is by taking all the uj's equal to 0.
1.9    Proposition: Suppose that U and W are subspaces of V. Then V = U ⊕ W if and only if V = U + W and U ∩ W = {0}.
Exercises
1.
Suppose a and b are real numbers, not both 0. Find real numbers
c and d such that
1/(a + bi) = c + di.
2.    Show that

    (−1 + √3 i) / 2

is a cube root of 1 (meaning that its cube equals 1).
4.
5.
(b)
(c)    {(x1, x2, x3) ∈ F3 : x1 x2 x3 = 0};
(d)    {(x1, x2, x3) ∈ F3 : x1 = 5x3}.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
Chapter 2

Finite-Dimensional Vector Spaces
A linear combination of a list (v1, . . . , vm) of vectors in V is a vector of the form

2.1    a1 v1 + · · · + am vm,

where a1, . . . , am ∈ F. The set of all linear combinations of (v1, . . . , vm) is called the span of (v1, . . . , vm), denoted span(v1, . . . , vm). Thus (7, 2, 9) ∈ span((2, 1, 3), (1, 0, 1)).

Some mathematicians use the term linear span, which means the same as span. Recall that by definition every list has finite length.

You should verify that the span of any list of vectors in V is a subspace of V. To be consistent, we declare that the span of the empty list () equals {0} (recall that the empty set is not a subspace of V).
If (v1, . . . , vm) is a list of vectors in V, then each vj is a linear combination of (v1, . . . , vm) (to show this, set aj = 1 and let the other a's in 2.1 equal 0). Thus span(v1, . . . , vm) contains each vj. Conversely, because subspaces are closed under scalar multiplication and addition, every subspace of V containing each vj must contain span(v1, . . . , vm). Thus the span of a list of vectors in V is the smallest subspace of V containing all the vectors in the list.
If span(v1, . . . , vm) equals V, we say that (v1, . . . , vm) spans V. A vector space is called finite dimensional if some list of vectors in it spans the space. For example, Fn is finite dimensional because

    ((1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1))

spans Fn.
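Whether a given vector lies in the span of a list can be tested numerically: the vector is in the span exactly when appending it to the list does not increase the rank of the corresponding matrix. The sketch below (Python with NumPy, which the book does not assume) checks that (7, 2, 9) ∈ span((2, 1, 3), (1, 0, 1)).

    import numpy as np

    spanning = np.array([[2, 1, 3], [1, 0, 1]], dtype=float)  # rows are the spanning vectors
    target = np.array([7, 2, 9], dtype=float)

    r1 = np.linalg.matrix_rank(spanning)
    r2 = np.linalg.matrix_rank(np.vstack([spanning, target]))
    print(r1, r2, r1 == r2)   # 2 2 True, so (7, 2, 9) lies in the span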
    p(z) = a0 + a1 z + · · · + am z^m
Infinite-dimensional vector spaces, which we will not mention much anymore, are the center of attention in the branch of mathematics called functional analysis. Functional analysis uses tools from both analysis and algebra.
    a0 + a1 z + · · · + am z^m = 0
2.4    Linear Dependence Lemma: If (v1, . . . , vm) is linearly dependent in V and v1 ≠ 0, then there exists j ∈ {2, . . . , m} such that the following hold:

(a)    vj ∈ span(v1, . . . , vj−1);

(b)    if the jth term is removed from (v1, . . . , vm), the span of the remaining list equals span(v1, . . . , vm).

Proof: Because (v1, . . . , vm) is linearly dependent, there exist a1, . . . , am ∈ F, not all 0, such that a1 v1 + · · · + am vm = 0. Let j be the largest element of {2, . . . , m} such that aj ≠ 0 (the hypothesis v1 ≠ 0 guarantees that such a j exists). Then

2.5    vj = −(a1/aj) v1 − · · · − (aj−1/aj) vj−1,

proving (a).
To prove (b), suppose that u ∈ span(v1, . . . , vm). Then there exist c1, . . . , cm ∈ F such that

    u = c1 v1 + · · · + cm vm.

In the equation above, we can replace vj with the right side of 2.5, which shows that u is in the span of the list obtained by removing the jth term from (v1, . . . , vm). Thus (b) holds.
Now we come to a key result. It says that linearly independent lists
are never longer than spanning lists.
2.6    Theorem: In a finite-dimensional vector space, the length of every linearly independent list of vectors is less than or equal to the length of every spanning list of vectors.

Proof: Suppose that (u1, . . . , um) is linearly independent in V and that (w1, . . . , wn) spans V. We need to show that m ≤ n. We do so through the multistep process described below.
Step 1
    The list (w1, . . . , wn) spans V, and thus adjoining any vector to it produces a linearly dependent list. In particular, the list

        (u1, w1, . . . , wn)

    is linearly dependent. Thus by the linear dependence lemma (2.4), we can remove one of the w's so that the list B (of length n) consisting of u1 and the remaining w's spans V.

Step j
    The list B (of length n) from step j − 1 spans V, and thus adjoining any vector to it produces a linearly dependent list. In particular, the list of length (n + 1) obtained by adjoining uj to B, placing it just after u1, . . . , uj−1, is linearly dependent. By the linear dependence lemma (2.4), one of the vectors in this list is in the span of the previous ones, and because (u1, . . . , uj) is linearly independent, this vector must be one of the w's, not one of the u's. We can remove that w from B so that the new list B (of length n) consisting of u1, . . . , uj and the remaining w's spans V.

After step m, we have added all the u's and the process stops. If at any step we added a u and had no more w's to remove, then we would have a contradiction. Thus there must be at least as many w's as u's.
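Theorem 2.6 can be seen concretely in Fn: since the standard spanning list of R2 has length 2, every list of 3 vectors in R2 is linearly dependent. The NumPy sketch below (illustration only) shows that the rank of a 3-by-2 matrix, whose rows are three vectors in R2, never exceeds 2.

    import numpy as np

    rng = np.random.default_rng(0)
    three_vectors = rng.standard_normal((3, 2))   # three vectors in R^2, as rows
    print(np.linalg.matrix_rank(three_vectors))   # at most 2, never 3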
Our intuition tells us that any vector space contained in a finite-dimensional vector space should also be finite dimensional. We now prove that this intuition is correct.

2.7    Proposition: Every subspace of a finite-dimensional vector space is finite dimensional.

Proof: Suppose V is finite dimensional and U is a subspace of V. We need to prove that U is finite dimensional. We do this through the following multistep construction.

Step 1
    If U = {0}, then U is finite dimensional and we are done. If U ≠ {0}, then choose a nonzero vector v1 ∈ U.

Step j
    If U = span(v1, . . . , vj−1), then U is finite dimensional and we are done. If U ≠ span(v1, . . . , vj−1), then choose a vector vj ∈ U such that vj ∉ span(v1, . . . , vj−1).
Bases

A basis of V is a list of vectors in V that is linearly independent and spans V. For example, the list ((1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1)) is a basis of Fn, called the standard basis of Fn.

2.8    Proposition: A list (v1, . . . , vn) of vectors in V is a basis of V if and only if every v ∈ V can be written uniquely in the form

2.9    v = a1 v1 + · · · + an vn,

where a1, . . . , an ∈ F.
This proof is essentially a repetition of the ideas that led us to the definition of linear independence.

Proof: First suppose that (v1, . . . , vn) is a basis of V. Let v ∈ V. Because (v1, . . . , vn) spans V, there exist a1, . . . , an ∈ F such that 2.9 holds. To show that the representation in 2.9 is unique, suppose that b1, . . . , bn are scalars so that we also have

    v = b1 v1 + · · · + bn vn.
    ((1, 2), (3, 6), (4, 7), (5, 9)),

which spans F2. To make sure that you understand the last proof, you should verify that the process in the proof produces ((1, 2), (4, 7)), a basis of F2, when applied to the list above.
Our next result, an easy corollary of the last theorem, tells us that every finite-dimensional vector space has a basis.

2.11    Corollary: Every finite-dimensional vector space has a basis.

Proof: By definition, a finite-dimensional vector space has a spanning list. The previous theorem tells us that any spanning list can be reduced to a basis.

We have crafted our definitions so that the finite-dimensional vector space {0} is not a counterexample to the corollary above. In particular, the empty list () is a basis of the vector space {0} because this list has been defined to be linearly independent and to have span {0}.
Our next theorem is in some sense a dual of 2.10, which said that every spanning list can be reduced to a basis. Now we show that given any linearly independent list, we can adjoin some additional vectors so that the extended list is still linearly independent but also spans the space.
2.12    Theorem: Every linearly independent list of vectors in a finite-dimensional vector space can be extended to a basis of the vector space.

This theorem can be used to give another proof of the previous corollary. Specifically, suppose V is finite dimensional. This theorem implies that the empty list () can be extended to a basis of V. In particular, V has a basis.
Step j
    If wj is in the span of B, leave B unchanged. If wj is not in the span of B, extend B by adjoining wj to it.

After each step, B is still linearly independent because otherwise the linear dependence lemma (2.4) would give a contradiction (recall that (v1, . . . , vm) is linearly independent and any wj that is adjoined to B is not in the span of the previous vectors in B). After step n, the span of B includes all the w's. Thus the B obtained after step n spans V and hence is a basis of V.
As a nice application of the theorem above, we now show that every subspace of a finite-dimensional vector space can be paired with another subspace to form a direct sum of the whole space.

2.13    Proposition: Suppose V is finite dimensional and U is a subspace of V. Then there is a subspace W of V such that V = U ⊕ W.

Using the same basic ideas but considerably more advanced tools, this proposition can be proved without the hypothesis that V is finite dimensional.

To see that U ∩ W = {0}, suppose v ∈ U ∩ W. Then v = a1 u1 + · · · + am um for some a1, . . . , am ∈ F and v = b1 w1 + · · · + bn wn for some b1, . . . , bn ∈ F, where (u1, . . . , um) is a basis of U and (w1, . . . , wn) is the list used to extend it to a basis of V. Subtracting, we get

    a1 u1 + · · · + am um − b1 w1 − · · · − bn wn = 0.

Because (u1, . . . , um, w1, . . . , wn) is linearly independent, this implies that a1 = · · · = am = b1 = · · · = bn = 0. Thus v = 0, completing the proof that U ∩ W = {0}.
Dimension

Though we have been discussing finite-dimensional vector spaces, we have not yet defined the dimension of such an object. How should dimension be defined? A reasonable definition should force the dimension of Fn to equal n. Notice that the basis

    ((1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, . . . , 0, 1))

of Fn has length n.
Because F2 has dimension 2, the last proposition implies that this linearly independent list of length 2 is a basis of F2 (we do not need to bother checking that it spans F2).
The next theorem gives a formula for the dimension of the sum of two subspaces of a finite-dimensional vector space.

2.18    Theorem: If U1 and U2 are subspaces of a finite-dimensional vector space, then

    dim(U1 + U2) = dim U1 + dim U2 − dim(U1 ∩ U2).

This formula is analogous to a familiar counting formula: the number of elements in the union of two finite sets equals the number of elements in the first set, plus the number of elements in the second set, minus the number of elements in the intersection of the two sets.
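A concrete instance of this formula in R3: take U1 = span(e1, e2) and U2 = span(e2, e3), so that dim U1 = dim U2 = 2 and U1 ∩ U2 = span(e2) has dimension 1. The NumPy sketch below (illustration only) computes dim(U1 + U2) as the rank of the combined spanning list and confirms that it equals 2 + 2 − 1 = 3.

    import numpy as np

    U1 = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)   # rows span U1
    U2 = np.array([[0, 1, 0], [0, 0, 1]], dtype=float)   # rows span U2
    dim_sum = np.linalg.matrix_rank(np.vstack([U1, U2]))
    print(dim_sum, dim_sum == 2 + 2 - 1)                 # 3 True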
2.19    Proposition: Suppose V is finite dimensional and U1, . . . , Um are subspaces of V such that

2.20    V = U1 + · · · + Um

and

2.21    dim V = dim U1 + · · · + dim Um.

Then V = U1 ⊕ · · · ⊕ Um.
Exercises
1.
2.
3.
4.
5.
6.
Prove that the real vector space consisting of all continuous real-valued functions on the interval [0, 1] is infinite dimensional.
7.
Prove that V is infinite dimensional if and only if there is a sequence v1, v2, . . . of vectors in V such that (v1, . . . , vn) is linearly
independent for every positive integer n.
8.
9.
10.
11.
12.
13.
14.
15.
You might guess, by analogy with the formula for the number
of elements in the union of three subsets of a finite set, that if U1, U2, U3 are subspaces of a finite-dimensional vector space, then

    dim(U1 + U2 + U3) = dim U1 + dim U2 + dim U3
        − dim(U1 ∩ U2) − dim(U1 ∩ U3) − dim(U2 ∩ U3)
        + dim(U1 ∩ U2 ∩ U3).
Prove this or give a counterexample.
16.
17.
Chapter 3
Linear Maps
So far our attention has focused on vector spaces. No one gets excited about vector spaces. The interesting part of linear algebra is the subject to which we now turn: linear maps.
Let's review our standing assumptions:
Recall that F denotes R or C.
Recall also that V is a vector space over F.
In this chapter we will frequently need another vector space in addition to V. We will call this additional vector space W:

    Let's agree that for the rest of this chapter
    W will denote a vector space over F.
multiplication by x^2
    Define T ∈ L(P(R), P(R)) by

        (Tp)(x) = x^2 p(x)

    for x ∈ R.

integration
    Define T ∈ L(P(R), R) by

        Tp = ∫₀¹ p(x) dx.

backward shift
    Recall that F∞ denotes the vector space of all sequences of elements of F. Define T ∈ L(F∞, F∞) by

        T(x1, x2, x3, . . . ) = (x2, x3, . . . ).

from Fn to Fm
    Define T ∈ L(R3, R2) by

        T(x, y, z) = (2x − y + 3z, 7x + 5y − 6z).

    More generally, let m and n be positive integers, let aj,k ∈ F for j = 1, . . . , m and k = 1, . . . , n, and define T ∈ L(Fn, Fm) by

        T(x1, . . . , xn) = (a1,1 x1 + · · · + a1,n xn, . . . , am,1 x1 + · · · + am,n xn).

    Later we will see that every linear map from Fn to Fm is of this form.
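The general map displayed above is exactly multiplication by the m-by-n matrix of the aj,k, a point made precise later in this chapter. The NumPy sketch below (illustration only; the matrix entries are taken from the R3 → R2 example above) computes T both from the defining formula and as a matrix-vector product.

    import numpy as np

    A = np.array([[2.0, -1.0, 3.0],
                  [7.0, 5.0, -6.0]])      # the a_{j,k} for T(x, y, z) = (2x - y + 3z, 7x + 5y - 6z)

    def T(x):
        # T(x1, ..., xn) = (sum_k a_{1,k} x_k, ..., sum_k a_{m,k} x_k)
        return tuple(sum(A[j, k] * x[k] for k in range(A.shape[1]))
                     for j in range(A.shape[0]))

    x = np.array([1.0, 2.0, 3.0])
    print(T(x))       # (9.0, -1.0)
    print(A @ x)      # [ 9. -1.], the same map as matrix multiplication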
Suppose (v1, . . . , vn) is a basis of V and T : V → W is linear. If v ∈ V, then we can write v in the form

    v = a1 v1 + · · · + an vn.
The linearity of T implies that

    Tv = a1 Tv1 + · · · + an Tvn.

Though linear maps are pervasive throughout mathematics, they are not as ubiquitous as imagined by some confused students who seem to think that cos is a linear map from R to R when they write "identities" such as cos 2x = 2 cos x and cos(x + y) = cos x + cos y.
In particular, the values of Tv1, . . . , Tvn determine the values of T on arbitrary vectors in V.
Linear maps can be constructed that take on arbitrary values on a basis. Specifically, given a basis (v1, . . . , vn) of V and any choice of vectors w1, . . . , wn ∈ W, we can construct a linear map T : V → W such that Tvj = wj for j = 1, . . . , n. There is no choice of how to do this; we must define T by

    T(a1 v1 + · · · + an vn) = a1 w1 + · · · + an wn,

where a1, . . . , an are arbitrary elements of F. Because (v1, . . . , vn) is a basis of V, the equation above does indeed define a function T from V to W. You should verify that the function T defined above is linear and that Tvj = wj for j = 1, . . . , n.
Now we will make L(V, W) into a vector space by defining addition and scalar multiplication on it. For S, T ∈ L(V, W), define a function S + T ∈ L(V, W) in the usual manner of adding functions:

    (S + T)v = Sv + Tv

for v ∈ V. You should verify that S + T is indeed a linear map from V to W whenever S, T ∈ L(V, W). For a ∈ F and T ∈ L(V, W), define a function aT ∈ L(V, W) in the usual manner of multiplying a function by a scalar:

    (aT)v = a(Tv)

for v ∈ V. You should verify that aT is indeed a linear map from V to W whenever a ∈ F and T ∈ L(V, W). With the operations we have just defined, L(V, W) becomes a vector space (as you should verify). Note that the additive identity of L(V, W) is the zero linear map defined earlier in this section.
Usually it makes no sense to multiply together two elements of a vector space, but for some pairs of linear maps a useful product exists. We will need a third vector space, so suppose U is a vector space over F. If T ∈ L(U, V) and S ∈ L(V, W), then we define ST ∈ L(U, W) by

    (ST)(v) = S(Tv)

for v ∈ U. In other words, ST is just the usual composition S ∘ T of two functions, but when both functions are linear, most mathematicians write ST instead of S ∘ T.
Some mathematicians use the term kernel instead of null space.

For T ∈ L(V, W), the null space of T, denoted null T, is the subset of V consisting of those vectors that T maps to 0:

    null T = {v ∈ V : Tv = 0}.
Let's look at a few examples from the previous section. In the differentiation example, we defined T ∈ L(P(R), P(R)) by Tp = p′. The only functions whose derivative equals the zero function are the constant functions, so in this case the null space of T equals the set of constant functions.
In the multiplication by x^2 example, we defined T ∈ L(P(R), P(R)) by (Tp)(x) = x^2 p(x). The only polynomial p such that x^2 p(x) = 0 for all x ∈ R is the 0 polynomial. Thus in this case we have

    null T = {0}.

In the backward shift example, we defined T ∈ L(F∞, F∞) by

    T(x1, x2, x3, . . . ) = (x2, x3, . . . ).

Clearly T(x1, x2, x3, . . . ) equals 0 if and only if x2, x3, . . . are all 0. Thus in this case we have

    null T = {(a, 0, 0, . . . ) : a ∈ F}.
The next proposition shows that the null space of any linear map is a subspace of the domain. In particular, 0 is in the null space of every linear map.

3.1    Proposition: If T ∈ L(V, W), then null T is a subspace of V.
Many mathematicians use the term one-to-one, which means the same as injective.

3.2    Proposition: Let T ∈ L(V, W). Then T is injective if and only if null T = {0}.

Proof: First suppose that T is injective. We want to prove that null T = {0}. We already know that {0} ⊂ null T (by 3.1). To prove the inclusion in the other direction, suppose v ∈ null T. Then

    T(v) = 0 = T(0).

Because T is injective, the equation above implies that v = 0. Thus null T = {0}, as desired.
To prove the implication in the other direction, now suppose that null T = {0}. We want to prove that T is injective. To do this, suppose u, v ∈ V and Tu = Tv. Then

    0 = Tu − Tv = T(u − v).

Thus u − v is in null T, which equals {0}. Hence u − v = 0, which implies that u = v. Hence T is injective, as desired.
For T ∈ L(V, W), the range of T, denoted range T, is the subset of W consisting of those vectors that are of the form Tv for some v ∈ V:

    range T = {Tv : v ∈ V}.

For example, if T ∈ L(P(R), P(R)) is the differentiation map defined by Tp = p′, then range T = P(R) because for every polynomial q ∈ P(R) there exists a polynomial p ∈ P(R) such that p′ = q.
As another example, if T ∈ L(P(R), P(R)) is the linear map of multiplication by x^2 defined by (Tp)(x) = x^2 p(x), then the range of T is the set of polynomials of the form a2 x^2 + · · · + am x^m, where a2, . . . , am ∈ R.
The next proposition shows that the range of any linear map is a subspace of the target space.

Some mathematicians use the word image, which means the same as range.
3.3    Proposition: If T ∈ L(V, W), then range T is a subspace of W.

Proof: Suppose T ∈ L(V, W). Then T(0) = 0 (by 3.1), which implies that 0 ∈ range T.
If w1, w2 ∈ range T, then there exist v1, v2 ∈ V such that Tv1 = w1 and Tv2 = w2. Thus

    T(v1 + v2) = Tv1 + Tv2 = w1 + w2,

and hence w1 + w2 ∈ range T. Thus range T is closed under addition.
If w ∈ range T and a ∈ F, then there exists v ∈ V such that Tv = w. Thus

    T(av) = aTv = aw,

and hence aw ∈ range T. Thus range T is closed under scalar multiplication.
We have shown that range T contains 0 and is closed under addition and scalar multiplication. Thus range T is a subspace of W.
Many mathematicians use the term onto, which means the same as surjective.
3.4    Theorem: If V is finite dimensional and T ∈ L(V, W), then range T is a finite-dimensional subspace of W and

    dim V = dim null T + dim range T.

Proof: Suppose that V is a finite-dimensional vector space and T ∈ L(V, W). Let (u1, . . . , um) be a basis of null T; thus dim null T = m. The linearly independent list (u1, . . . , um) can be extended to a basis (u1, . . . , um, w1, . . . , wn) of V (by 2.12). Thus dim V = m + n, and to complete the proof, we need only show that range T is finite dimensional and dim range T = n. We will do this by proving that (Tw1, . . . , Twn) is a basis of range T.
Let v ∈ V. Because (u1, . . . , um, w1, . . . , wn) spans V, we can write

    v = a1 u1 + · · · + am um + b1 w1 + · · · + bn wn,

where the a's and b's are in F. Applying T to both sides of this equation, we get

    Tv = b1 Tw1 + · · · + bn Twn,

where the terms of the form Tuj disappeared because each uj ∈ null T. The last equation implies that (Tw1, . . . , Twn) spans range T. In particular, range T is finite dimensional.
To show that (Tw1, . . . , Twn) is linearly independent, suppose that c1, . . . , cn ∈ F and

    c1 Tw1 + · · · + cn Twn = 0.

Then

    T(c1 w1 + · · · + cn wn) = 0,

and hence

    c1 w1 + · · · + cn wn ∈ null T.

Because (u1, . . . , um) spans null T, we can write

    c1 w1 + · · · + cn wn = d1 u1 + · · · + dm um,

where the d's are in F. This equation implies that all the c's (and d's) are 0 (because (u1, . . . , um, w1, . . . , wn) is linearly independent). Thus (Tw1, . . . , Twn) is linearly independent and hence is a basis for range T, as desired.
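A numerical illustration of 3.4: for a linear map R5 → R3 given by a matrix with randomly chosen entries, the dimension of the null space plus the dimension of the range equals 5. The sketch below uses SciPy's null_space and NumPy's matrix_rank (neither of which the book assumes).

    import numpy as np
    from scipy.linalg import null_space

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 5))            # a map T : R^5 -> R^3
    dim_null = null_space(A).shape[1]          # number of basis vectors of null T
    dim_range = np.linalg.matrix_rank(A)       # dim range T
    print(dim_null, dim_range, dim_null + dim_range)   # 2 3 5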
Now we can show that no linear map from a finite-dimensional vector space to a "smaller" vector space can be injective, where "smaller" is measured by dimension.

3.5    Corollary: If V and W are finite-dimensional vector spaces such that dim V > dim W, then no linear map from V to W is injective.

Proof: Suppose V and W are finite-dimensional vector spaces such that dim V > dim W. Let T ∈ L(V, W). Then

    dim null T = dim V − dim range T
               ≥ dim V − dim W
               > 0,

where the equality above comes from 3.4. We have just shown that dim null T > 0. This means that null T must contain vectors other than 0. Thus T is not injective (by 3.2).
The next corollary, which is in some sense dual to the previous corollary, shows that no linear map from a finite-dimensional vector space to a "bigger" vector space can be surjective, where "bigger" is measured by dimension.

3.6    Corollary: If V and W are finite-dimensional vector spaces such that dim V < dim W, then no linear map from V to W is surjective.

Proof: Suppose V and W are finite-dimensional vector spaces such that dim V < dim W. Let T ∈ L(V, W). Then

    dim range T = dim V − dim null T
                ≤ dim V
                < dim W,

where the equality above comes from 3.4. We have just shown that dim range T < dim W. This means that range T cannot equal W. Thus T is not surjective.
The last two corollaries have important consequences in the theory of linear equations. To see this, fix positive integers m and n, and let aj,k ∈ F for j = 1, . . . , m and k = 1, . . . , n. Define T : Fn → Fm by

    T(x1, . . . , xn) = ( ∑_{k=1}^n a1,k xk , . . . , ∑_{k=1}^n am,k xk ).
The equation T(x1, . . . , xn) = 0 is equivalent to the homogeneous system of linear equations

    ∑_{k=1}^n a1,k xk = 0
    ...
    ∑_{k=1}^n am,k xk = 0.

Homogeneous, in this context, means that the constant term on the right side of each equation equals 0.

The equation T(x1, . . . , xn) = (c1, . . . , cm) is equivalent to the inhomogeneous system of linear equations

    ∑_{k=1}^n a1,k xk = c1
    ...
    ∑_{k=1}^n am,k xk = cm.

Results about homogeneous systems with more variables than equations and inhomogeneous systems with more equations than variables are often proved using Gaussian elimination. The abstract approach taken here leads to cleaner proofs.
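The corresponding numerical fact is easy to observe: a homogeneous system with more unknowns than equations always has nonzero solutions. The sketch below (Python with SciPy; the coefficients chosen here are arbitrary) produces a basis of the solution space of a system of 2 equations in 4 unknowns.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 3.0, 4.0],
                  [2.0, 0.0, 1.0, -1.0]])      # the coefficients a_{j,k}: 2 equations, 4 unknowns
    N = null_space(A)                          # columns form a basis of the solution space
    print(N.shape[1] > 0)                      # True: nonzero solutions exist
    print(np.allclose(A @ N, 0))               # True: every column of N solves the system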
The Matrix of a Linear Map

Suppose m and n are positive integers. An m-by-n matrix is a rectangular array with m rows and n columns:

3.7        [ a1,1  . . .  a1,n ]
           [  .           .   ]
           [  .           .   ]
           [ am,1  . . .  am,n ]

Note that the first index refers to the row number and the second index refers to the column number. Thus a3,2 refers to the entry in the third row, second column of the matrix above. We will usually consider matrices whose entries are elements of F.
Let T ∈ L(V, W). Suppose that (v1, . . . , vn) is a basis of V and (w1, . . . , wm) is a basis of W. For each k = 1, . . . , n, we can write Tvk uniquely as a linear combination of the w's:

3.8    Tvk = a1,k w1 + · · · + am,k wm.

The m-by-n matrix formed by the a's in 3.8 is called the matrix of T with respect to these bases and is denoted M(T). It is often convenient to display it with the basis vectors labeling the rows and columns:

                 v1  . . .  vk  . . .  vn
        w1   [            a1,k            ]
         .   [             .              ]
         .   [             .              ]
        wm   [            am,k            ]
Note that in the matrix above only the kth column is displayed (and thus
the second index of each displayed a is k). The kth column of M(T )
consists of the scalars needed to write T vk as a linear combination of
the ws. Thus the picture above should remind you that T vk is retrieved
from the matrix M(T ) by multiplying each entry in the kth column by
the corresponding w from the left column, and then adding up the
resulting vectors.
If T is a linear map from Fn to Fm , then unless stated otherwise you
should assume that the bases in question are the standard ones (where
the kth basis vector is 1 in the kth slot and 0 in all the other slots). If
you think of elements of Fm as columns of m numbers, then you can
think of the kth column of M(T ) as T applied to the kth basis vector.
For example, if T ∈ L(F2, F3) is defined by

    T(x, y) = (x + 3y, 2x + 5y, 7x + 9y),

then T(1, 0) = (1, 2, 7) and T(0, 1) = (3, 5, 9), so the matrix of T (with respect to the standard bases) is the 3-by-2 matrix

    [ 1  3 ]
    [ 2  5 ]
    [ 7  9 ].
Suppose S, T ∈ L(V, W). Is the matrix of S + T equal to the sum of the matrices of S and T? The question does not yet make sense because we have not defined the sum of two matrices. Fortunately the obvious definition has the right property: we define the sum of two matrices of the same size by adding corresponding entries:

    [ a1,1  . . .  a1,n ]     [ b1,1  . . .  b1,n ]
    [  .           .   ]  +  [  .           .   ]
    [ am,1  . . .  am,n ]     [ bm,1  . . .  bm,n ]

        [ a1,1 + b1,1  . . .  a1,n + b1,n ]
     =  [      .                   .     ]
        [ am,1 + bm,1  . . .  am,n + bm,n ]

With this definition of matrix addition, you should verify that

3.9    M(T + S) = M(T) + M(S)

whenever T, S ∈ L(V, W).
Still assuming that we have some bases in mind, is the matrix of a scalar times a linear map equal to the scalar times the matrix of the linear map? Again the question does not make sense because we have not defined scalar multiplication on matrices. Fortunately the obvious definition again has the right properties. Specifically, we define the product of a scalar and a matrix by multiplying each entry in the matrix by the scalar:

    c [ a1,1  . . .  a1,n ]     [ ca1,1  . . .  ca1,n ]
      [  .           .   ]  =  [   .            .    ]
      [ am,1  . . .  am,n ]     [ cam,1  . . .  cam,n ]

You should verify that with this definition of scalar multiplication on matrices,

3.10    M(cT) = cM(T)

whenever c ∈ F and T ∈ L(V, W).
So far, however, the right side of this equation does not make sense because we have not yet defined the product of two matrices. We will choose a definition of matrix multiplication that forces the equation above to hold. Let's see how to do this.
Suppose, as above, that (u1, . . . , up) is a basis of U, (v1, . . . , vn) is a basis of V, and (w1, . . . , wm) is a basis of W. Consider T ∈ L(V, W) and S ∈ L(U, V), and let

    M(T) = [ a1,1  . . .  a1,n ]          M(S) = [ b1,1  . . .  b1,p ]
           [  .           .   ]    and          [  .           .   ]
           [ am,1  . . .  am,n ]                 [ bn,1  . . .  bn,p ]

For k ∈ {1, . . . , p}, we have

    TSuk = T( ∑_{r=1}^n br,k vr )
         = ∑_{r=1}^n br,k Tvr
         = ∑_{r=1}^n br,k ∑_{j=1}^m aj,r wj
         = ∑_{j=1}^m ( ∑_{r=1}^n aj,r br,k ) wj.

Thus M(TS) is the m-by-p matrix whose entry in row j, column k, equals

    ∑_{r=1}^n aj,r br,k.
You should find an example to show that matrix multiplication is not commutative. In other words, AB is not necessarily equal to BA, even when both are defined.

We define the product of two matrices only when the number of columns of the first matrix equals the number of rows of the second matrix.
As an example of matrix multiplication, here we multiply together a 3-by-2 matrix and a 2-by-4 matrix, obtaining a 3-by-4 matrix:

    [ 1  2 ]   [ 6  5  4   3 ]     [ 10   7   4  1 ]
    [ 3  4 ]   [ 2  1  0  −1 ]  =  [ 26  19  12  5 ]
    [ 5  6 ]                       [ 42  31  20  9 ]
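The entry-by-entry description of the product (the entry in row j, column k equals the sum over r of aj,r br,k) can be checked directly against this example; the NumPy sketch below (illustration only) computes the product both ways.

    import numpy as np

    A = np.array([[1, 2], [3, 4], [5, 6]])
    B = np.array([[6, 5, 4, 3], [2, 1, 0, -1]])

    C = np.zeros((A.shape[0], B.shape[1]), dtype=int)
    for j in range(A.shape[0]):
        for k in range(B.shape[1]):
            C[j, k] = sum(A[j, r] * B[r, k] for r in range(A.shape[1]))

    print(C)                          # the 3-by-4 matrix displayed above
    print(np.array_equal(C, A @ B))   # True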
Suppose (v1, . . . , vn) is a basis of V. If v ∈ V, then there exist unique scalars b1, . . . , bn such that

3.12    v = b1 v1 + · · · + bn vn.

The matrix of the vector v, denoted M(v), is the n-by-1 matrix

3.13    M(v) = [ b1 ]
               [ .  ]
               [ .  ]
               [ bn ].

Usually the basis is obvious from the context, but when the basis needs to be displayed explicitly use the notation M(v, (v1, . . . , vn)) instead of M(v).
For example, the matrix of a vector x ∈ Fn with respect to the standard basis is obtained by writing the coordinates of x as the entries in an n-by-1 matrix. In other words, if x = (x1, . . . , xn) ∈ Fn, then

    M(x) = [ x1 ]
           [ .  ]
           [ .  ]
           [ xn ].
The next proposition shows how the notions of the matrix of a linear map, the matrix of a vector, and matrix multiplication fit together. In this proposition M(Tv) is the matrix of the vector Tv with respect to the basis (w1, . . . , wm) and M(v) is the matrix of the vector v with respect to the basis (v1, . . . , vn), whereas M(T) is the matrix of the linear map T with respect to the bases (v1, . . . , vn) and (w1, . . . , wm).

3.14    Proposition: Suppose T ∈ L(V, W) and (v1, . . . , vn) is a basis of V and (w1, . . . , wm) is a basis of W. Then

    M(Tv) = M(T)M(v)

for every v ∈ V.
Proof: Let

3.15    M(T) = [ a1,1  . . .  a1,n ]
               [  .           .   ]
               [ am,1  . . .  am,n ].

This means that

3.16    Tvk = ∑_{j=1}^m aj,k wj

for each k. Suppose v = b1 v1 + · · · + bn vn as in 3.12. Then

    Tv = b1 Tv1 + · · · + bn Tvn
       = b1 ∑_{j=1}^m aj,1 wj + · · · + bn ∑_{j=1}^m aj,n wj,

where the first equality comes from 3.12 and the second equality comes from 3.16. The last equation shows that M(Tv), the m-by-1 matrix of the vector Tv with respect to the basis (w1, . . . , wm), is given by the equation

    M(Tv) = [ a1,1 b1 + · · · + a1,n bn ]
            [             .            ]
            [ am,1 b1 + · · · + am,n bn ].

This formula, along with the formulas 3.15 and 3.13 and the definition of matrix multiplication, shows that M(Tv) = M(T)M(v).
Invertibility

A linear map T ∈ L(V, W) is called invertible if there exists a linear map S ∈ L(W, V) such that ST equals the identity map on V and TS equals the identity map on W. A linear map S ∈ L(W, V) satisfying ST = I and TS = I is called an inverse of T (note that the first I is the identity map on V and the second I is the identity map on W).
If S and S′ are inverses of T, then

    S = SI = S(TS′) = (ST)S′ = IS′ = S′,

so S = S′. In other words, if T is invertible, then it has a unique inverse, which we denote by T⁻¹. Rephrasing all this once more, if T ∈ L(V, W) is invertible, then T⁻¹ is the unique element of L(W, V) such that T⁻¹T = I and TT⁻¹ = I. The following proposition characterizes the invertible linear maps.

3.17    Proposition: A linear map is invertible if and only if it is injective and surjective.
Proof: Suppose T ∈ L(V, W). We need to show that T is invertible if and only if it is injective and surjective.
First suppose that T is invertible. To show that T is injective, suppose that u, v ∈ V and Tu = Tv. Then

    u = T⁻¹(Tu) = T⁻¹(Tv) = v,

so u = v. Hence T is injective.
We are still assuming that T is invertible. Now we want to prove that T is surjective. To do this, let w ∈ W. Then w = T(T⁻¹w), which shows that w is in the range of T. Thus range T = W, and hence T is surjective, completing this direction of the proof.
Now suppose that T is injective and surjective. We want to prove that T is invertible. For each w ∈ W, define Sw to be the unique element of V such that T(Sw) = w (the existence and uniqueness of such an element follow from the surjectivity and injectivity of T). Clearly TS equals the identity map on W. To prove that ST equals the identity map on V, let v ∈ V. Then

    T(STv) = (TS)(Tv) = I(Tv) = Tv.

This equation implies that STv = v (because T is injective), and thus ST equals the identity map on V. To complete the proof, we need to show that S is linear. To do this, let w1, w2 ∈ W. Then

    T(Sw1 + Sw2) = T(Sw1) + T(Sw2) = w1 + w2.

Thus Sw1 + Sw2 is the unique element of V that T maps to w1 + w2. By the definition of S, this implies that S(w1 + w2) = Sw1 + Sw2. Hence S satisfies the additive property required for linearity. The proof of homogeneity is similar. Specifically, if w ∈ W and a ∈ F, then

    T(aSw) = aT(Sw) = aw.

Thus aSw is the unique element of V that T maps to aw. By the definition of S, this implies that S(aw) = aSw. Hence S is linear, as desired.
Every finite-dimensional vector space is isomorphic to some Fn, so why not study only Fn? One answer is that an investigation of Fn would soon lead to vector spaces that do not equal Fn. For example, we would encounter the null space and range of linear maps, the set of matrices Mat(n, n, F), and the polynomials Pn(F). Though each of these vector spaces is isomorphic to some Fm, thinking of them that way often adds complexity but no new insight.
    A = [ a1,1  . . .  a1,n ]
        [  .           .   ]
        [ am,1  . . .  am,n ]

    Tvk = ∑_{j=1}^m aj,k wj
Theorem: Suppose V is finite dimensional and T ∈ L(V, V). Then the following are equivalent:

(a)    T is invertible;

(b)    T is injective;

(c)    T is surjective.
Exercises
1.
2.
3.
4.
6.
7.
8.
9.
10.
Prove that there does not exist a linear map from F5 to F2 whose
null space equals
{(x1, x2, x3, x4, x5) ∈ F5 : x1 = 3x2 and x3 = x4 = x5}.
11.
Prove that if there exists a linear map on V whose null space and
range are both finite dimensional, then V is finite dimensional.
12.
13.
14.
15.
16.
17.
18.
19.
20.
    M(T) = [ a1,1  . . .  a1,n ]
           [  .           .   ]
           [ am,1  . . .  am,n ],
21.
22.
23.
24.
25.
Prove that if V is finite dimensional with dim V > 1, then the set
of noninvertible operators on V is not a subspace of L(V ).
26.    Suppose n is a positive integer and aj,k ∈ F for j, k = 1, . . . , n. Prove that the following are equivalent:

    (a)    The trivial solution x1 = · · · = xn = 0 is the only solution to the homogeneous system of equations

            ∑_{k=1}^n a1,k xk = 0
            ...
            ∑_{k=1}^n an,k xk = 0.

    (b)    For every c1, . . . , cn ∈ F, there exists a solution to the system of equations

            ∑_{k=1}^n a1,k xk = c1
            ...
            ∑_{k=1}^n an,k xk = cn.
Chapter 4

Polynomials
Degree

Recall that a function p : F → F is called a polynomial with coefficients in F if there exist a0, . . . , am ∈ F such that

    p(z) = a0 + a1 z + a2 z^2 + · · · + am z^m

for all z ∈ F. A number λ ∈ F is called a root of p if

    p(λ) = 0.

4.1    Proposition: Suppose p ∈ P(F) is a polynomial with degree m ≥ 1. Let λ ∈ F. Then λ is a root of p if and only if there is a polynomial q ∈ P(F) with degree m − 1 such that

4.2    p(z) = (z − λ)q(z)

for all z ∈ F.

Proof: One direction is obvious. Namely, suppose there is a polynomial q ∈ P(F) such that 4.2 holds. Then

    p(λ) = (λ − λ)q(λ) = 0,

and hence λ is a root of p, as desired.
To prove the other direction, suppose that λ ∈ F is a root of p. Let a0, . . . , am ∈ F be such that am ≠ 0 and

    p(z) = a0 + a1 z + a2 z^2 + · · · + am z^m
4.4    Corollary: Suppose a0, . . . , am ∈ F. If

    a0 + a1 z + a2 z^2 + · · · + am z^m = 0

for every z ∈ F, then a0 = · · · = am = 0.

4.5    Division Algorithm: Suppose p, q ∈ P(F), with p ≠ 0. Then there exist polynomials s, r ∈ P(F) such that

4.6    q = sp + r

and deg r < deg p.
Complex Coefficients
So far we have been handling polynomials with complex coefficients and polynomials with real coefficients simultaneously through our convention that F denotes R or C. Now we will see some differences between these two cases. In this section we treat polynomials with complex coefficients. In the next section we will use our results about polynomials with complex coefficients to prove corresponding results for polynomials with real coefficients.
Though this chapter contains no linear algebra, the results so far have nonetheless been proved using algebra. The next result, though called the fundamental theorem of algebra, requires analysis for its proof. The short proof presented here uses tools from complex analysis. If you have not had a course in complex analysis, this proof will almost certainly be meaningless to you. In that case, just accept the fundamental theorem of algebra as something that we need to use but whose proof requires more advanced tools that you may learn in later courses.
4.7    Fundamental Theorem of Algebra: Every nonconstant polynomial with complex coefficients has a root.

This is an existence theorem. The quadratic formula gives the roots explicitly for polynomials of degree 2. Similar but more complicated formulas exist for polynomials of degree 3 and 4. No such formulas exist for polynomials of degree 5 or higher.

Proof: Let p be a nonconstant polynomial with complex coefficients. Suppose that p has no roots. Then 1/p is an analytic function on C. Furthermore, |p(z)| → ∞ as |z| → ∞, which implies that 1/p → 0 as |z| → ∞. Thus 1/p is a bounded analytic function on C. By Liouville's theorem, any such function must be constant. But if 1/p is constant, then p is constant, contradicting our assumption that p is nonconstant.

The fundamental theorem of algebra leads to the following factorization result for polynomials with complex coefficients. Note that in this factorization, the numbers λ1, . . . , λm are precisely the roots of p, for these are the only values of z for which the right side of 4.9 equals 0.
4.8    Corollary: If p ∈ P(C) is a nonconstant polynomial, then p has a unique factorization (except for the order of the factors) of the form

4.9    p(z) = c(z − λ1) · · · (z − λm),

where c, λ1, . . . , λm ∈ C.

Proof: Let p ∈ P(C) and let m denote the degree of p. We will use induction on m. If m = 1, then clearly the desired factorization exists and is unique. So assume that m > 1 and that the desired factorization exists and is unique for all polynomials of degree m − 1.
First we will show that the desired factorization of p exists. By the fundamental theorem of algebra (4.7), p has a root λ. By 4.1, there is a polynomial q with degree m − 1 such that

    p(z) = (z − λ)q(z)

for all z ∈ C. Our induction hypothesis implies that q has the desired factorization, which when plugged into the equation above gives the desired factorization of p.
Now we turn to the question of uniqueness. Clearly c is uniquely determined by 4.9; it must equal the coefficient of z^m in p. So we need only show that except for the order, there is only one way to choose λ1, . . . , λm. If

    (z − λ1) · · · (z − λm) = (z − τ1) · · · (z − τm)

for all z ∈ C, then because the left side of the equation above equals 0 when z = λ1, one of the τ's on the right side must equal λ1. Relabeling, we can assume that τ1 = λ1. Now for z ≠ λ1, we can divide both sides of the equation above by z − λ1, getting

    (z − λ2) · · · (z − λm) = (z − τ2) · · · (z − τm)

for all z ∈ C except possibly z = λ1. Actually the equation above must hold for all z ∈ C because otherwise by subtracting the right side from the left side we would get a nonzero polynomial that has infinitely many roots. The equation above and our induction hypothesis imply that except for the order, the λ's are the same as the τ's, completing the proof of the uniqueness.
Real Coefcients
69
Real Coefficients
Before discussing polynomials with real coefcients, we need to
learn a bit more about the complex numbers.
Suppose z = a + bi, where a and b are real numbers. Then a is
called the real part of z, denoted Re z, and b is called the imaginary
part of z, denoted Im z. Thus for every complex number z, we have
z = Re z + (Im z)i.
, is dened by
The complex conjugate of z C, denoted z
= Re z (Im z)i.
z
For example, 2 + 3i = 2 3i.
The absolute value of a complex number z, denoted |z|, is dened
by
sum of z and z
= 2 Re z for all z C;
z+z
difference of z and z
= 2(Im z)i for all z C;
zz
product of z and z
= |z|2 for all z C;
zz
additivity of complex conjugate
+z
for all w, z C;
w +z =w
multiplicativity of complex conjugate
z
for all w, z C;
wz = w
if and
Note that z = z
only if z is a real
number.
Chapter 4. Polynomials
70
conjugate of conjugate
= z for all z C;
z
multiplicativity of absolute value
|wz| = |w| |z| for all w, z C.
In the next result, we need to think of a polynomial with real coefcients as an element of P(C). This makes sense because every real
number is also a complex number.
A polynomial with real
coefcients may have
If C is a root of p, then so is .
Proof: Let
p(z) = a0 + a1 z + + am zm ,
polynomial 1 + x 2 has
no real roots. The
failure of the
fundamental theorem
a0 + a1 + + am m = 0.
of algebra for R
accounts for the
differences between
+ + am
m = 0,
a0 + a1
where we have used some of the basic properties of complex conjuga is a root of p.
tion listed earlier. The equation above shows that
We want to prove a factorization theorem for polynomials with real
coefcients. To do this, we begin by characterizing the polynomials
with real coefcients and degree 2 that can be written as the product
of two polynomials with real coefcients and degree 1.
4.12
x 2 + x + = (x 1 )(x 2 ),
x 2 + x + = (x +
2
2
) + (
).
2
4
Real Coefcients
71
First suppose that 2 < 4. Then clearly the right side of the
equation above is positive for every x R, and hence the polynomial
x 2 + x + has no real roots. Thus no factorization of the form 4.12,
with 1 , 2 R, can exist.
Conversely, now suppose that 2 4. Thus there is a real number
2
c such that c 2 = 4 . From 4.13, we have
2
) c2
2
= (x + + c)(x + c),
2
2
x 2 + x + = (x +
Here either m or M
may equal 0.
Chapter 4. Polynomials
72
for some polynomial q P(C) with degree two less than the degree
of p. If we can prove that q has real coefcients, then, by using induction on the degree of p, we can conclude that (x ) appears in the
Exercises
Exercises
1.
2.
3.
4.
5.
Prove that every polynomial with odd degree and real coefcients
has a real root.
73
Chapter 5
In Chapter 3 we studied linear maps from one vector space to another vector space. Now we begin our investigation of linear maps from
a vector space to itself. Their study constitutes the deepest and most
important part of linear algebra. Most of the key results in this area
do not hold for innite-dimensional vector spaces, so we work only on
nite-dimensional vector spaces. To avoid trivialities we also want to
eliminate the vector space {0} from consideration. Thus we make the
following assumption:
Recall that F denotes R or C.
Lets agree that for the rest of the book
V will denote a nite-dimensional, nonzero vector space over F.
75
76
Invariant Subspaces
In this chapter we develop the tools that will help us understand the
structure of operators. Recall that an operator is a linear map from a
vector space to itself. Recall also that we denote the set of operators
on V by L(V ); in other words, L(V ) = L(V , V ).
Lets see how we might better understand what an operator looks
like. Suppose T L(V ). If we have a direct sum decomposition
5.1
V = U1 U m ,
where each Uj is a proper subspace of V , then to understand the behavior of T , we need only understand the behavior of each T |Uj ; here
T |Uj denotes the restriction of T to the smaller domain Uj . Dealing
with T |Uj should be easier than dealing with T because Uj is a smaller
vector space than V . However, if we intend to apply tools useful in the
study of operators (such as taking powers), then we have a problem:
T |Uj may not map Uj into itself; in other words, T |Uj may not be an
operator on Uj . Thus we are led to consider only decompositions of
the form 5.1 where T maps each Uj into itself.
The notion of a subspace that gets mapped into itself is sufciently
important to deserve a name. Thus, for T L(V ) and U a subspace
of V , we say that U is invariant under T if u U implies T u U.
In other words, U is invariant under T if T |U is an operator on U. For
example, if T is the operator of differentiation on P7 (R), then P4 (R)
(which is a subspace of P7 (R)) is invariant under T because the derivative of any polynomial of degree at most 4 is also a polynomial with
degree at most 4.
Lets look at some easy examples of invariant subspaces. Suppose
T L(V ). Clearly {0} is invariant under T . Also, the whole space V is
obviously invariant under T . Must T have any invariant subspaces other
than {0} and V ? Later we will see that this question has an afrmative
answer for operators on complex vector spaces with dimension greater
than 1 and also for operators on real vector spaces with dimension
greater than 2.
If T L(V ), then null T is invariant under T (proof: if u null T ,
then T u = 0, and hence T u null T ). Also, range T is invariant under T
(proof: if u range T , then T u is also in range T , by the denition of
range). Although null T and range T are invariant under T , they do not
necessarily provide easy answers to the question about the existence
Invariant Subspaces
77
of invariant subspaces other than {0} and V because null T may equal
{0} and range T may equal V (this happens when T is invertible).
We will return later to a deeper study of invariant subspaces. Now
we turn to an investigation of the simplest possible nontrivial invariant
subspacesinvariant subspaces with dimension 1.
How does an operator behave on an invariant subspace of dimension 1? Subspaces of V of dimension 1 are easy to describe. Take any
nonzero vector u V and let U equal the set of all scalar multiples
of u:
5.2
U = {au : a F}.
T u = u,
78
Some texts dene
eigenvectors as we
have, except that 0 is
declared not to be an
eigenvector. With the
5.4
z = w,
w = z.
Substituting the value for w given by the second equation into the rst
equation gives
z = 2 z.
Now z cannot equal 0 (otherwise 5.5 implies that w = 0; we are looking
for solutions to 5.5 where (w, z) is not the 0 vector), so the equation
above leads to the equation
1 = 2 .
The solutions to this equation are = i or = i. You should be
able to verify easily that i and i are eigenvalues of T . Indeed, the
eigenvectors corresponding to the eigenvalue i are the vectors of the
form (w, wi), with w C, and the eigenvectors corresponding to the
eigenvalue i are the vectors of the form (w, wi), with w C.
Now we show that nonzero eigenvectors corresponding to distinct
eigenvalues are linearly independent.
Invariant Subspaces
5.6
Theorem: Let T L(V ). Suppose 1 , . . . , m are distinct eigenvalues of T and v1 , . . . , vm are corresponding nonzero eigenvectors.
Then (v1 , . . . , vm ) is linearly independent.
Proof: Suppose (v1 , . . . , vm ) is linearly dependent. Let k be the
smallest positive integer such that
5.7
vk span(v1 , . . . , vk1 );
the existence of k with this property follows from the linear dependence
lemma (2.4). Thus there exist a1 , . . . , ak1 F such that
5.8
vk = a1 v1 + + ak1 vk1 .
79
80
and (T m ) = T mn ,
Upper-Triangular Matrices
81
Upper-Triangular Matrices
Now we come to one of the central results about operators on complex vector spaces.
5.10 Theorem: Every operator on a nite-dimensional, nonzero,
complex vector space has an eigenvalue.
82
We often use to
denote matrix entries
that we do not know
about or that are
irrelevant to the
questions being
discussed.
a1,1 . . . a1,n
.
..
5.11
.
..
an,1 . . . an,n
matrix
.
;
..
0
here the denotes the entries in all the columns other than the rst
column. To prove this, let be an eigenvalue of T (one exists by 5.10)
Upper-Triangular Matrices
6 2 7 5
0 6 1 3
0 0 7 9
0 0 0 8
is upper triangular. Typically we represent an upper-triangular matrix
in the form
..
;
.
0
n
the 0 in the matrix above indicates that all entries below the diagonal
in this n-by-n matrix equal 0. Upper-triangular matrices can be considered reasonably simplefor n large, an n-by-n upper-triangular matrix
has almost half its entries equal to 0.
The following proposition demonstrates a useful connection between upper-triangular matrices and invariant subspaces.
5.12 Proposition: Suppose T L(V ) and (v1 , . . . , vn ) is a basis
of V . Then the following are equivalent:
(a)
(b)
(c)
Proof: The equivalence of (a) and (b) follows easily from the definitions and a moments thought. Obviously (c) implies (b). Thus to
complete the proof, we need only prove that (b) implies (c). So suppose
that (b) holds. Fix k {1, . . . , n}. From (b), we know that
83
84
T v1 span(v1 ) span(v1 , . . . , vk );
T v2 span(v1 , v2 ) span(v1 , . . . , vk );
..
.
T vk span(v1 , . . . , vk ).
Thus if v is a linear combination of (v1 , . . . , vk ), then
T v span(v1 , . . . , vk ).
In other words, span(v1 , . . . , vk ) is invariant under T , completing the
proof.
Now we can show that for each operator on a complex vector space,
there is a basis of the vector space with respect to which the matrix
of the operator has only 0s below the diagonal. In Chapter 8 we will
improve even this result.
This theorem does not
hold on real vector
R2 ), then there is no
T u = (T I)u + u.
T uj = (T |U )(uj ) span(u1 , . . . , uj ).
Upper-Triangular Matrices
85
5.15
From 5.14 and 5.15, we conclude (using 5.12) that T has an uppertriangular matrix with respect to the basis (u1 , . . . , um , v1 , . . . , vn ).
How does one determine from looking at the matrix of an operator
whether the operator is invertible? If we are fortunate enough to have
a basis with respect to which the matrix of the operator is upper triangular, then this problem becomes easy, as the following proposition
shows.
5.16 Proposition: Suppose T L(V ) has an upper-triangular matrix
with respect to some basis of V . Then T is invertible if and only if all
the entries on the diagonal of that upper-triangular matrix are nonzero.
Proof: Suppose (v1 , . . . , vn ) is a basis of V with respect to which
T has an upper-triangular matrix
5.17
M T , (v1 , . . . , vn ) =
1
2
..
0
86
Unfortunately no method exists for exactly computing the eigenvalues of a typical operator from its matrix (with respect to an arbitrary
basis). However, if we are fortunate enough to nd a basis with respect to which the matrix of the operator is upper triangular, then the
problem of computing the eigenvalues becomes trivial, as the following
proposition shows.
matrix.
Diagonal Matrices
87
2
.
M T , (v1 , . . . , vn ) =
.
..
0
n
Let F. Then
M T I, (v1 , . . . , vn ) =
2
..
.
n
Diagonal Matrices
A diagonal matrix is a square matrix that is 0 everywhere except
possibly along the diagonal. For example,
8 0 0
0 2 0
0 0 5
is a diagonal matrix. Obviously every diagonal matrix is upper triangular, although in general a diagonal matrix has many more 0s than an
upper-triangular matrix.
An operator T L(V ) has a diagonal matrix
0
1
..
0
n
with respect to a basis (v1 , . . . , vn ) of V if and only
T v1 = 1 v1
..
.
T vn = n vn ;
88
this follows immediately from the denition of the matrix of an operator with respect to a basis. Thus an operator T L(V ) has a diagonal
matrix with respect to some basis of V if and only if V has a basis
consisting of eigenvectors of T .
If an operator has a diagonal matrix with respect to some basis,
then the entries along the diagonal are precisely the eigenvalues of the
operator; this follows from 5.18 (or you may want to nd an easier
proof that works only for diagonal matrices).
Unfortunately not every operator has a diagonal matrix with respect
to some basis. This sad state of affairs can arise even on complex vector
spaces. For example, consider T L(C2 ) dened by
5.19
Diagonal Matrices
89
(b)
(c)
(d)
(e)
Proof: We have already shown that (a) and (b) are equivalent.
Suppose that (b) holds; thus V has a basis (v1 , . . . , vn ) consisting of
eigenvectors of T . For each j, let Uj = span(vj ). Obviously each Uj
is a one-dimensional subspace of V that is invariant under T (because
each vj is an eigenvector of T ). Because (v1 , . . . , vn ) is a basis of V ,
each vector in V can be written uniquely as a linear combination of
(v1 , . . . , vn ). In other words, each vector in V can be written uniquely
as a sum u1 + + un , where each uj Uj . Thus V = U1 Un .
Hence (b) implies (c).
Suppose now that (c) holds; thus there are one-dimensional subspaces U1 , . . . , Un of V , each invariant under T , such that
V = U1 Un .
For each j, let vj be a nonzero vector in Uj . Then each vj is an eigenvector of T . Because each vector in V can be written uniquely as a sum
u1 + +un , where each uj Uj (so each uj is a scalar multiple of vj ),
we see that (v1 , . . . , vn ) is a basis of V . Thus (c) implies (b).
90
At this stage of the proof we know that (a), (b), and (c) are all equivalent. We will nish the proof by showing that (b) implies (d), that (d)
implies (e), and that (e) implies (b).
Suppose that (b) holds; thus V has a basis consisting of eigenvectors
of T . Thus every vector in V is a linear combination of eigenvectors
of T . Hence
5.22
Choose a basis of each null(T j I); put all these bases together to
form a list (v1 , . . . , vn ) of eigenvectors of T , where n = dim V (by 5.23).
To show that this list is linearly independent, suppose
a1 v1 + + an vn = 0,
where a1 , . . . , an F. For each j = 1, . . . , m, let uj denote the sum of
all the terms ak vk such that vk null(T j I). Thus each uj is an
eigenvector of T with eigenvalue j , and
u1 + + um = 0.
Because nonzero eigenvectors corresponding to distinct eigenvalues
are linearly independent, this implies (apply 5.6 to the sum of the
nonzero vectors on the left side of the equation above) that each uj
equals 0. Because each uj is a sum of terms ak vk , where the vk s
were chosen to be a basis of null(T j I), this implies that all the ak s
equal 0. Thus (v1 , . . . , vn ) is linearly independent and hence is a basis
of V (by 2.17). Thus (e) implies (b), completing the proof.
91
Here either m or M
might equal 0.
92
5.25
T 2 u + j T u + j u = 0.
We will complete the proof by showing that span(u, T u), which clearly
has dimension 1 or 2, is invariant under T . To do this, consider a typical
element of span(u, T u) of the form au+bT u, where a, b R. Then
T (au + bT u) = aT u + bT 2 u
= aT u bj T u bj u,
where the last equality comes from solving for T 2 u in 5.25. The equation above shows that T (au + bT u) span(u, T u). Thus span(u, T u)
is invariant under T , as desired.
We will need one new piece of notation for the next proof. Suppose
U and W are subspaces of V with
V = U W.
Each vector v V can be written uniquely in the form
v = u + w,
PU ,W is often called the
projection onto U with
null space W .
on all real vector spaces with dimension 2 less than dim V . Suppose
T L(V ). We need to prove that T has an eigenvalue. If it does, we are
done. If not, then by 5.24 there is a two-dimensional subspace U of V
that is invariant under T . Let W be any subspace of V such that
V = U W;
2.13 guarantees that such a W exists.
Because W has dimension 2 less than dim V , we would like to apply
our induction hypothesis to T |W . However, W might not be invariant
under T , meaning that T |W might not be an operator on W . We will
compose with the projection PW ,U to get an operator on W . Specically,
dene S L(W ) by
Sw = PW ,U (T w)
for w W . By our induction hypothesis, S has an eigenvalue . We
will show that this is also an eigenvalue for T .
Let w W be a nonzero eigenvector for S corresponding to the
eigenvalue ; thus (S I)w = 0. We would be done if w were an
eigenvector for T with eigenvalue ; unfortunately that need not be
true. So we will look for an eigenvector of T in U + span(w). To do
that, consider a typical vector u + aw in U + span(w), where u U
and a R. We have
(T I)(u + aw) = T u u + a(T w w)
= T u u + a(PU ,W (T w) + PW ,U (T w) w)
= T u u + a(PU ,W (T w) + Sw w)
= T u u + aPU ,W (T w).
Note that on the right side of the last equation, T u U (because U
is invariant under T ), u U (because u U), and aPU ,W (T w) U
(from the denition of PU ,W ). Thus T I maps U + span(w) into U.
Because U + span(w) has a larger dimension than U , this means that
(T I)|U +span(w) is not injective (see 3.5). In other words, there exists
a nonzero vector v U + span(w) V such that (T I)v = 0. Thus
T has an eigenvalue, as desired.
93
94
Exercises
1.
2.
3.
4.
5.
Dene T L(F2 ) by
T (w, z) = (z, w).
Find all eigenvalues and eigenvectors of T .
6.
Dene T L(F3 ) by
T (z1 , z2 , z3 ) = (2z2 , 0, 5z3 ).
Find all eigenvalues and eigenvectors of T .
7.
8.
Find all eigenvalues and eigenvectors of the backward shift operator T L(F ) dened by
T (z1 , z2 , z3 , . . . ) = (z2 , z3 , . . . ).
9.
10.
Exercises
11.
12.
13.
Suppose T L(V ) is such that every subspace of V with dimension dim V 1 is invariant under T . Prove that T is a scalar
multiple of the identity operator.
14.
95
p(ST S 1 ) = Sp(T )S 1 .
15.
16.
Show that the result in the previous exercise does not hold if C
is replaced with R.
17.
18.
19.
20.
21.
22.
96
23.
24.
Suppose V is a real vector space and T L(V ) has no eigenvalues. Prove that every subspace of V invariant under T has even
dimension.
Chapter 6
Inner-Product Spaces
In making the denition of a vector space, we generalized the linear structure (addition and scalar multiplication) of R 2 and R 3 . We
ignored other important features, such as the notions of length and
angle. These ideas are embedded in the concept we now investigate,
inner products.
Recall that F denotes R or C.
Also, V is a nite-dimensional, nonzero vector space over F.
97
98
Inner Products
If we think of vectors
as points instead of
arrows, then x
should be interpreted
as the distance from
the point x to the
(x , x )
1
origin.
x1-axis
x1 2 + x2 2 .
Similarly, for x = (x1 , x2 , x3 ) R3 , we have x = x1 2 + x2 2 + x3 2 .
Even though we cannot draw pictures in higher dimensions, the generalization to R n is obvious: we dene the norm of x = (x1 , . . . , xn ) Rn
by
The length of this vector x is
x = x1 2 + + xn 2 .
Inner Products
by abstracting the properties of the dot product discussed in the paragraph above. For real vector spaces, that guess is correct. However,
so that we can make a denition that will be useful for both real and
complex vector spaces, we need to examine the complex case before
making the denition.
Recall that if = a + bi, where a, b R, then the absolute value
of is dened by
|| = a2 + b2 ,
the complex conjugate of is dened by
= a bi,
||2 =
connects these two concepts (see page 69 for the denitions and the
basic properties of the absolute value and complex conjugate). For
z = (z1 , . . . , zn ) Cn , we dene the norm of z by
z = |z1 |2 + + |zn |2 .
The absolute values are needed because we want z to be a nonnegative number. Note that
z2 = z1 z1 + + zn zn .
We want to think of z2 as the inner product of z with itself, as we
did in Rn . The equation above thus suggests that the inner product of
w = (w1 , . . . , wn ) Cn with z should equal
w1 z1 + + wn zn .
If the roles of the w and z were interchanged, the expression above
would be replaced with its complex conjugate. In other words, we
should expect that the inner product of w with z equals the complex
conjugate of the inner product of z with w. With that motivation, we
are now ready to dene an inner product on V , which may be a real or
a complex vector space.
An inner product on V is a function that takes each ordered pair
(u, v) of elements of V to a number u, v F and has the following
properties:
99
100
If z is a complex
number, then the
statement z 0 means
that z is real and
nonnegative.
positivity
v, v 0 for all v V ;
deniteness
v, v = 0 if and only if v = 0;
additivity in rst slot
u + v, w = u, w + v, w for all u, v, w V ;
homogeneity in rst slot
av, w = av, w for all a F and all v, w V ;
conjugate symmetry
v, w = w, v for all v, w V .
Recall that every real number equals its complex conjugate. Thus
if we are dealing with a real vector space, then in the last condition
above we can dispense with the complex conjugate and simply state
that v, w = w, v for all v, w V .
An inner-product space is a vector space V along with an inner
product on V .
The most important example of an inner-product space is Fn . We
can dene an inner product on Fn by
6.1
(w1 , . . . , wn ), (z1 , . . . , zn ) = w1 z1 + + wn zn ,
as you should verify. This inner product, which provided our motivation for the denition of an inner product, is called the Euclidean inner
product on Fn . When Fn is referred to as an inner-product space, you
should assume that the inner product is the Euclidean inner product
unless explicitly told otherwise.
There are other inner products on Fn in addition to the Euclidean
inner product. For example, if c1 , . . . , cn are positive numbers, then we
can dene an inner product on Fn by
(w1 , . . . , wn ), (z1 , . . . , zn ) = c1 w1 z1 + + cn wn zn ,
as you should verify. Of course, if all the cs equal 1, then we get the
Euclidean inner product.
As another example of an inner-product space, consider the vector
space Pm (F) of all polynomials with coefcients in F and degree at
most m. We can dene an inner product on Pm (F) by
Inner Products
1
6.2
p, q =
p(x)q(x) dx,
u
= av,
= au,
v;
here a F and u, v V . Note that in a real vector space, conjugate
homogeneity is the same as homogeneity.
101
102
Norms
For v V , we dene the norm of v, denoted v, by
v = v, v.
For example, if (z1 , . . . , zn ) Fn (with the Euclidean inner product),
then
(z1 , . . . , zn ) = |z1 |2 + + |zn |2 .
As another example, if p Pm (F) (with inner product given by 6.2),
then
1
|p(x)|2 dx.
p =
0
= aav,
v
= |a|2 v2 ;
Some mathematicians
use the term
perpendicular, which
means the same as
orthogonal.
The word orthogonal
comes from the Greek
taking square roots now gives the desired equality. This proof illustrates a general principle: working with norms squared is usually easier
than working directly with norms.
Two vectors u, v V are said to be orthogonal if u, v = 0. Note
that the order of the vectors does not matter because u, v = 0 if
and only if v, u = 0. Instead of saying that u and v are orthogonal,
sometimes we say that u is orthogonal to v. Clearly 0 is orthogonal
to every vector. Furthermore, 0 is the only vector that is orthogonal to
itself.
For the special case where V = R 2 , the next theorem is over 2,500
years old.
word orthogonios,
which means
6.3
right-angled.
6.4
Norms
103
u + v2 = u + v, u + v
= u2 + v2 + u, v + v, u
= u2 + v2 ,
as desired.
v
v
An orthogonal decomposition
To discover how to write u as a scalar multiple of v plus a vector orthogonal to v, let a F denote a scalar. Then
u = av + (u av).
Thus we need to choose a so that v is orthogonal to (u av). In other
words, we want
0 = u av, v = u, v av2 .
The equation above shows that we should choose a to be u, v/v2
(assume that v = 0 to avoid division by 0). Making this choice of a, we
can write
u, v
u, v
6.5
u=
v
+
u
v
.
v2
v2
As you should verify, if v = 0 then the equation above writes u as a
scalar multiple of v plus a vector orthogonal to v.
The equation above will be used in the proof of the next theorem,
which gives one of the most important inequalities in mathematics.
the Pythagorean
theorem holds in real
inner-product spaces.
104
In 1821 the French
mathematician
Augustin-Louis Cauchy
showed that this
inequality holds for the
inner product dened
by 6.1. In 1886 the
German mathematician
Herman Schwarz
6.6
6.7
This inequality is an equality if and only if one of u, v is a scalar multiple of the other.
u, v
v + w,
v2
6.8
|u, v|2
+ w2
v2
|u, v|2
.
v2
Multiplying both sides of this inequality by v2 and then taking square
roots gives the Cauchy-Schwarz inequality 6.7.
Looking at the proof of the Cauchy-Schwarz inequality, note that 6.7
is an equality if and only if 6.8 is an equality. Obviously this happens if
and only if w = 0. But w = 0 if and only if u is a multiple of v (see 6.5).
Thus the Cauchy-Schwarz inequality is an equality if and only if u is a
scalar multiple of v or v is a scalar multiple of u (or both; the phrasing
has been chosen to cover cases in which either u or v equals 0).
The next result is called the triangle inequality because of its geometric interpretation that the length of any side of a triangle is less
than the sum of the lengths of the other two sides.
u+v
Norms
6.9
105
The triangle inequality
can be used to show
6.10
u + v u + v.
6.12
u, v = uv.
106
u
u
v
u+
Orthonormal Bases
A list of vectors is called orthonormal if the vectors in it are pairwise orthogonal and each vector has norm 1. In other words, a list
(e1 , . . . , em ) of vectors in V is orthonormal if ej , ek equals 0 when
j = k and equals 1 when j = k (for j, k = 1, . . . , m). For example, the
standard basis in Fn is orthonormal. Orthonormal lists are particularly
easy to work with, as illustrated by the next proposition.
6.15 Proposition:
in V , then
Orthonormal Bases
6.16 Corollary:
pendent.
107
1 1 1 1
1 1
1
1
1
1
1 1
1 1
1 1
( 2 , 2 , 2 , 2 ), ( 2 , 2 , 2 , 2 ), ( 2 , 2 , 2 , 2 ), ( 2 , 2 , 2 , 2 ) .
The verication that this list is orthonormal is easy (do it!); because we
have an orthonormal list of length four in a four-dimensional vector
space, it must be an orthonormal basis.
In general, given a basis (e1 , . . . , en ) of V and a vector v V , we
know that there is some choice of scalars a1 , . . . , am such that
v = a 1 e1 + + a n en ,
but nding the aj s can be difcult. The next theorem shows, however,
that this is easy for an orthonormal basis.
6.17 Theorem:
Then
6.18
orthonormal bases
and
6.19
for every v V .
The importance of
108
The Danish
mathematician Jorgen
Gram (18501916) and
the German
mathematician Erhard
Schmidt (18761959)
popularized this
algorithm for
constructing
orthonormal lists.
6.21
for j = 1, . . . , m.
Proof: Suppose (v1 , . . . , vm ) is a linearly independent list of vectors in V . To construct the es, start by setting e1 = v1 /v1 . This
satises 6.21 for j = 1. We will choose e2 , . . . , em inductively, as follows. Suppose j > 1 and an orthornormal list (e1 , . . . , ej1 ) has been
chosen so that
6.22
Let
6.23
ej =
Note that vj span(v1 , . . . , vj1 ) (because (v1 , . . . , vm ) is linearly independent) and thus vj span(e1 , . . . , ej1 ). Hence we are not dividing
by 0 in the equation above, and so ej is well dened. Dividing a vector
by its norm produces a new vector with norm 1; thus ej = 1.
Orthonormal Bases
109
vj , ek vj , ek
vj vj , e1 e1 vj , ej1 ej1
= 0.
Thus (e1 , . . . , ej ) is an orthonormal list.
From 6.23, we see that vj span(e1 , . . . , ej ). Combining this information with 6.22 shows that
span(v1 , . . . , vj ) span(e1 , . . . , ej ).
Both lists above are linearly independent (the vs by hypothesis, the es
by orthonormality and 6.16). Thus both subspaces above have dimension j, and hence they must be equal, completing the proof.
Now we can settle the question of the existence of orthonormal
bases.
6.24 Corollary: Every nite-dimensional inner-product space has an
orthonormal basis.
110
(e1 , . . . , em , f1 , . . . , fn );
..
.
.
111
for each j (see 6.21), we conclude that span(e1 , . . . , ej ) is invariant under T for each j = 1, . . . , n. Thus, by 5.12, T has an upper-triangular
matrix with respect to the orthonormal basis (e1 , . . . , en ).
The next result is an important application of the corollary above.
6.28 Corollary: Suppose V is a complex vector space and T L(V ).
Then T has an upper-triangular matrix with respect to some orthonormal basis of V .
This result is
sometimes called
Schurs theorem. The
German mathematician
V = U + U .
result in 1909.
112
6.31
v = v, e1 e1 + + v, em em + v v, e1 e1 v, em em .
u
U U = {0}.
U (U ) .
113
for every v V .
The following problem often arises: given a subspace U of V and
a point v V , nd a point u U such that v u is as small as
possible. The next proposition shows that this minimization problem
is solved by taking u = PU v.
6.36
The remarkable
simplicity of the
solution to this
minimization problem
has led to many
applications of
6.38
inner-product spaces
outside of pure
mathematics.
114
where 6.38 comes from the Pythagorean theorem (6.3), which applies
because v PU v U and PU v u U . Taking square roots gives the
desired inequality.
Our inequality is an equality if and only if 6.37 is an equality, which
happens if and only if PU v u = 0, which happens if and only if
u = PU v.
v
U
PU v
where the s that appear in the exact answer have been replaced with
a good decimal approximation.
By 6.36, the polynomial above should be about as good an approximation to sin x on [ , ] as is possible using polynomials of degree
at most 5. To see how good this approximation is, the picture below
shows the graphs of both sin x and our approximation 6.40 over the
interval [ , ].
1
0.5
-3
-2
-1
-0.5
-1
x5
x3
+
.
3!
5!
To see how good this approximation is, the next picture shows the
graphs of both sin x and the Taylor polynomial 6.41 over the interval
[ , ].
115
116
0.5
-3
-2
-1
-0.5
-1
V = U U .
Note that the proof uses the nite dimensionality of U (to get a basis
of U) but that it works ne regardless of whether or not V is nite
dimensional. Second, note that the denition and properties of PU (including 6.35) require only 6.29 and thus require only that U (but not
necessarily V ) be nite dimensional. Finally, note that the proof of 6.36
does not require the nite dimensionality of V . Conclusion: for v V
and U a subspace of V , the procedure discussed above for nding the
vector u U that makes v u as small as possible works if U is nite
dimensional, regardless of whether or not V is nite dimensional. In
the example above U was indeed nite dimensional (we had dim U = 6),
so everything works as expected.
6.43
is a linear functional on F3 . As another example, consider the innerproduct space P6 (R) (here the inner product is multiplication followed
by integration on [0, 1]; see 6.2). The function : P6 (R) R dened
by
1
6.44
(p) =
p(x)(cos x) dx
0
117
118
Now we prove that only one vector v V has the desired behavior.
Suppose v1 , v2 V are such that
(u) = u, v1 = u, v2
for every u V . Then
0 = u, v1 u, v2 = u, v1 v2
for every u V . Taking u = v1 v2 shows that v1 v2 = 0. In other
words, v1 = v2 , completing the proof of the uniqueness part of the
theorem.
In addition to V , we need another nite-dimensional inner-product
space.
Lets agree that for the rest of this chapter
W is a nite-dimensional, nonzero, inner-product space over F.
The word adjoint has
another meaning in
linear algebra. We will
not need the second
meaning, related to
inverses, in this book.
Just in case you
encountered the
second meaning for
adjoint elsewhere, be
warned that the two
meanings for adjoint
are unrelated to one
another.
119
= av,
T w
= v, aT w,
which shows that aT w plays the role required of T (aw). Because
only one vector can behave that way, we must have
aT w = T (aw).
Thus T is a linear map, as claimed.
You should verify that the function T T has the following properties:
additivity
(S + T ) = S + T for all S, T L(V , W );
conjugate homogeneity
for all a F and T L(V , W );
(aT ) = aT
adjoint of adjoint
(T ) = T for all T L(V , W );
identity
I = I, where I is the identity operator on V ;
120
products
(ST ) = T S for all T L(V , W ) and S L(W , U) (here U is an
inner-product space over F).
The next result shows the relationship between the null space and
the range of a linear map and its adjoint. The symbol means if and
only if; this symbol could also be read to mean is equivalent to.
6.46
(a)
null T = (range T ) ;
(b)
range T = (null T ) ;
(c)
null T = (range T ) ;
(d)
range T = (null T ) .
Proof: Lets begin by proving (a). Let w W . Then
w null T T w = 0
v, T w = 0 for all v V
T v, w = 0 for all v V
w (range T ) .
3 4i
7
5 .
8i
121
we are dealing with orthonormal baseswith respect to nonorthonormal bases, the matrix of T does not necessarily equal the conjugate
transpose of the matrix of T .
6.47 Proposition: Suppose T L(V , W ). If (e1 , . . . , en ) is an orthonormal basis of V and (f1 , . . . , fm ) is an orthonormal basis of W ,
then
M T , (f1 , . . . , fm ), (e1 , . . . , en )
is the conjugate transpose of
M T , (e1 , . . . , en ), (f1 , . . . , fm ) .
Proof: Suppose that (e1 , . . . , en ) is an orthonormal basis of V and
(f1 , . . . , fm ) is an orthonormal basis of W . We write M(T ) instead of the
longer expression M T , (e1 , . . . , en ), (f1 , . . . , fm ) ; we also write M(T )
instead of M T , (f1 , . . . , fm ), (e1 , . . . , en ) .
Recall that we obtain the kth column of M(T ) by writing T ek as a linear combination of the fj s; the scalars used in this linear combination
then become the kth column of M(T ). Because (f1 , . . . , fm ) is an orthonormal basis of W , we know how to write T ek as a linear combination
of the fj s (see 6.17):
T ek = T ek , f1 f1 + + T ek , fm fm .
Thus the entry in row j, column k, of M(T ) is T ek , fj . Replacing T
with T and interchanging the roles played by the es and f s, we see
that the entry in row j, column k, of M(T ) is T fk , ej , which equals
fk , T ej , which equals T ej , fk , which equals the complex conjugate
of the entry in row k, column j, of M(T ). In other words, M(T ) equals
the conjugate transpose of M(T ).
122
Exercises
1.
2.
3.
Prove that
n
aj b j
2
j=1
n
jaj 2
j=1
n
bj 2
j
j=1
u + v = 4,
u v = 6.
6.
u + v2 u v2
4
for all u, v V .
7.
for all u, v V .
Exercises
8.
9.
,...,
, ,
,...,
, ,
2
is an orthonormal list of vectors in C[ , ], the vector space of
continuous real-valued functions on [ , ] with inner product
f , g =
f (x)g(x) dx.
10.
p(x)q(x) dx.
Apply the Gram-Schmidt procedure to the basis (1, x, x 2 ) to produce an orthonormal basis of P2 (R).
11.
12.
13.
123
124
14.
15.
16.
17.
18.
19.
20.
21.
In R 4 , let
U = span (1, 1, 0, 0), (1, 1, 1, 2) .
is as small as possible.
23.
as small as possible. (The polynomial 6.40 is an excellent approximation to the answer to this exercise, but here you are asked to
nd the exact solution, which involves powers of . A computer
that can perform symbolic integration will be useful.)
Exercises
24.
125
25.
27.
28.
29.
30.
31.
(a)
(b)
Prove that
dim null T = dim null T + dim W dim V
and
dim range T = dim range T
for every T L(V , W ).
32.
Chapter 7
Operators on
Inner-Product Spaces
127
128
If F = R, then by
denition every
eigenvalue is real, so
this proposition is
interesting only when
F = C.
.
= v
129
7.2
Proposition: If V is a complex inner-product space and T is an
operator on V such that
T v, v = 0
for all v V , then T = 0.
Proof: Suppose V is a complex inner-product space and T L(V ).
Then
T u, w =
T (u + w), u + w T (u w), u w
4
T (u + iw), u + iw T (u iw), u iw
+
i
4
for every v V .
Proof: Let v V . Then
T v, v T v, v = T v, v v, T v
= T v, v T v, v
= (T T )v, v.
If T v, v R for every v V , then the left side of the equation above
equals 0, so (T T )v, v = 0 for every v V . This implies that
T T = 0 (by 7.2), and hence T is self-adjoint.
Conversely, if T is self-adjoint, then the right side of the equation
above equals 0, so T v, v = T v, v for every v V . This implies that
T v, v R for every v V , as desired.
130
T u, w =
T (u + w), u + w T (u w), u w
;
4
7.6
131
Note that this
proposition implies
T v = T v
for all v V .
operator T .
for all v V ,
where we used 7.4 to establish the second equivalence (note that the
operator T T T T is self-adjoint). The equivalence of the rst and
last conditions above gives the desired result.
Compare the next corollary to Exercise 28 in the previous chapter.
That exercise implies that the eigenvalues of the adjoint of any operator
are equal (as a set) to the complex conjugates of the eigenvalues of the
operator. The exercise says nothing about eigenvectors because an
operator and its adjoint may have different eigenvectors. However, the
next corollary implies that a normal operator and its adjoint have the
same eigenvectors.
7.7
Corollary: Suppose T L(V ) is normal. If v V is an eigenvector of T with eigenvalue F, then v is also an eigenvector of T
with eigenvalue .
Proof: Suppose v V is an eigenvector of T with eigenvalue .
Thus (T I)v = 0. Because T is normal, so is T I, as you should
verify. Using 7.6, we have
132
7.8
Corollary: If T L(V ) is normal, then eigenvectors of T
corresponding to distinct eigenvalues are orthogonal.
Proof: Suppose T L(V ) is normal and , are distinct eigenvalues of T , with corresponding eigenvectors u, v. Thus T u = u and
Thus
T v = v. From 7.7 we have T v = v.
(i, 1) (i, 1)
,
2
2
133
Because every
self-adjoint operator is
normal, the complex
spectral theorem
a1,1 . . . a1,n
..
..
7.10
M T , (e1 , . . . , en ) =
.
. .
0
an,n
We will show that this matrix is actually a diagonal matrix, which means
that (e1 , . . . , en ) is an orthonormal basis of V consisting of eigenvectors
of T .
We see from the matrix above that
T e1 2 = |a1,1 |2
and
T e1 2 = |a1,1 |2 + |a1,2 |2 + + |a1,n |2 .
Because T is normal, T e1 = T e1 (see 7.6). Thus the two equations
above imply that all entries in the rst row of the matrix in 7.10, except
possibly the rst entry a1,1 , equal 0.
Now from 7.10 we see that
T e2 2 = |a2,2 |2
134
This technique of
We will need two lemmas for our proof of the real spectral theorem. You could guess that the next lemma is true and even discover its
proof by thinking about quadratic polynomials with real coefcients.
Specically, suppose , R and 2 < 4. Let x be a real number.
Then
2
2
x 2 + x + = x +
+
2
4
> 0.
135
Proof: As noted above, we can assume that V is a real innerproduct space. Let n = dim V and choose v V with v = 0. Then
(v, T v, T 2 v, . . . , T n v)
cannot be linearly independent because V has dimension n and we have
n + 1 vectors. Thus there exist real numbers a0 , . . . , an , not all 0, such
that
0 = a0 v + a1 T v + + an T n v.
Make the as the coefcients of a polynomial, which can be written in
factored form (see 4.14) as
a0 + a1 x + + an x n
= c(x 2 + 1 x + 1 ) . . . (x 2 + M x + M )(x 1 ) . . . (x m ),
where c is a nonzero real number, each j , j , and j is real, each
j 2 < 4j , m + M 1, and the equation holds for all real x. We then
have
0 = a0 v + a1 T v + + an T n v
= (a0 I + a1 T + + an T n )v
= c(T 2 + 1 T + 1 I) . . . (T 2 + M T + M I)(T 1 I) . . . (T m I)v.
Each T 2 + j T + j I is invertible because T is self-adjoint and each
j 2 < 4j (see 7.11). Recall also that c = 0. Thus the equation above
implies that
0 = (T 1 I) . . . (T m I)v.
Hence T j I is not injective for at least one j. In other words, T has
an eigenvalue.
136
As an illustration of the real spectral theorem, consider the selfadjoint operator T on R 3 whose matrix (with respect to the standard
basis) is
14 13 8
8 .
13 14
8
8
7
You should verify that
,
,
3
6
2
27
0
0
0
9
0
0 .
15
Combining the complex spectral theorem and the real spectral theorem, we conclude that every self-adjoint operator on V has a diagonal
matrix with respect to some orthonormal basis. This statement, which
is the most useful part of the spectral theorem, holds regardless of
whether F = C or F = R.
7.13 Real Spectral Theorem: Suppose that V is a real inner-product
space and T L(V ). Then V has an orthonormal basis consisting of
eigenvectors of T if and only if T is self-adjoint.
Proof: First suppose that V has an orthonormal basis consisting of
eigenvectors of T . With respect to this basis, T has a diagonal matrix.
This matrix equals its conjugate transpose. Hence T = T and so T is
self-adjoint, as desired.
To prove the other direction, now suppose that T is self-adjoint. We
will prove that V has an orthonormal basis consisting of eigenvectors
of T by induction on the dimension of V . To get started, note that our
desired result clearly holds if dim V = 1. Now assume that dim V > 1
and that the desired result holds on vector spaces of smaller dimension.
The idea of the proof is to take any eigenvector u of T with norm 1,
then adjoin to it an orthonormal basis of eigenvectors of T |{u} . Now
for the details, the most important of which is verifying that T |{u} is
self-adjoint (this allows us to apply our induction hypothesis).
Let be any eigenvalue of T (because T is self-adjoint, we know
from the previous lemma that it has an eigenvalue) and let u V
denote a corresponding eigenvector with u = 1. Let U denote the
one-dimensional subspace of V consisting of all scalar multiples of u.
Note that a vector v V is in U if and only if u, v = 0.
Suppose v U . Then because T is self-adjoint, we have
u, T v = T u, v = u, v = u, v = 0,
and hence T v U . Thus T v U whenever v U . In other words,
U is invariant under T . Thus we can dene an operator S L(U ) by
S = T |U . If v, w U , then
Sv, w = T v, w = v, T w = v, Sw,
which shows that S is self-adjoint (note that in the middle equality
above we used the self-adjointness of T ). Thus, by our induction hypothesis, there is an orthonormal basis of U consisting of eigenvectors of S. Clearly every eigenvector of S is an eigenvector of T (because
Sv = T v for every v U ). Thus adjoining u to an orthonormal basis
of U consisting of eigenvectors of S gives an orthonormal basis of V
consisting of eigenvectors of T , as desired.
For T L(V ) self-adjoint (or, more generally, T L(V ) normal
when F = C), the corollary below provides the nicest possible decomposition of V into subspaces invariant under T . On each null(T j I),
the operator T is just multiplication by j .
7.14 Corollary: Suppose that T L(V ) is self-adjoint (or that F = C
and that T L(V ) is normal). Let 1 , . . . , m denote the distinct eigenvalues of T . Then
V = null(T 1 I) null(T m I).
Furthermore, each vector in each null(T j I) is orthogonal to all vectors in the other subspaces of this decomposition.
Proof: The spectral theorem (7.9 and 7.13) implies that V has a
basis consisting of eigenvectors of T . The desired decomposition of V
now follows from 5.21.
The orthogonality statement follows from 7.8.
137
To get an eigenvector
of norm 1, take any
nonzero eigenvector
and divide it by its
norm.
138
(b)
(c)
Proof: First suppose that (a) holds, so that T is normal but not
self-adjoint. Let (e1 , e2 ) be an orthonormal basis of V . Suppose
a c
.
7.16
M T , (e1 , e2 ) =
b d
Then T e1 2 = a2 + b2 and T e1 2 = a2 + c 2 . Because T is normal,
T e1 = T e1 (see 7.6); thus these equations imply that b2 = c 2 .
Thus c = b or c = b. But c = b because otherwise T would be selfadjoint, as can be seen from the matrix in 7.16. Hence c = b, so
a b
.
7.17
M T , (e1 , e2 ) =
b d
139
1 1 2 2 2
1 1 2 2 2
D=
0 0 3 3 3 .
0 0 3 3 3
0 0 3 3 3
We can write this matrix in the form
A B
D=
,
0 C
Often we can
understand a matrix
better by thinking of it
as composed of smaller
matrices. We will use
where
A=
1
1
1
1
,
B=
2
2
2
2
2
2
,
C= 3
3
3
3
3
3 ,
3
140
The next result will play a key role in our characterization of the
normal operators on a real inner-product space.
Without normality, an
easier result also holds:
if T L(V ) and U
(a)
U is invariant under T ;
U is invariant under
(b)
U is invariant under T ;
(c)
(T |U ) = (T )|U ;
(d)
T |U is a normal operator on U;
(e)
T |U is a normal operator on U .
T ; see Exercise 29 in
Chapter 6.
M(T ) =
e1
..
.
em
f1
..
.
fn
e1
...
em f1
...
fn
here A denotes an m-by-m matrix, 0 denotes the n-by-m matrix consisting of all 0s, B denotes an m-by-n matrix, C denotes an n-by-n
matrix, and for convenience the basis has been listed along the top and
left sides of the matrix.
For each j {1, . . . , m}, T ej 2 equals the sum of the squares of the
absolute values of the entries in the j th column of A (see 6.17). Hence
7.19
j=1
T ej 2 =
For each j {1, . . . , m}, T ej 2 equals the sum of the squares of the
absolute values of the entries in the j th rows of A and B. Hence
7.20
T ej 2 =
j=1
T ej 2 =
j=1
T ej 2 .
j=1
This equation, along with 7.19 and 7.20, implies that the sum of the
squares of the absolute values of the entries of B must equal 0. In
other words, B must be the matrix consisting of all 0s. Thus
7.21
M(T ) =
e1
..
.
em
f1
..
.
fn
e1
...
em f1
...
fn
141
142
In proving 7.18 we thought of a matrix as composed of smaller matrices. Now we need to make additional use of that idea. A block diagonal matrix is a square matrix of the form
0
A1
..
,
.
0
Am
where A1 , . . . , Am are square matrices lying along the diagonal and all
the other entries of the matrix equal 0. For example, the matrix
4 0 0 0 0
0 2 3 0 0
7.22
A=
0 3 2 0 0
0 0 0 1 7
0 0 0 7 1
is a block diagonal matrix with
A1
A=
0
where
7.23
A1 =
,
A2 =
2
3
0
A2
A3
3
2
,
A3 =
1
7
7
1
B1
A1
0
0
.
.
..
..
A=
, B =
0
Am
0
Bm
.
A1 B1
0
..
,
7.24
AB =
.
0
Am Bm
143
144
Positive Operators
Many mathematicians
also use the term
positive semidenite
operator, which means
the same as positive
operator.
Positive Operators
145
7.27
(a)
T is positive;
(b)
(c)
(d)
(e)
Proof: We will prove that (a) (b) (c) (d) (e) (a).
First suppose that (a) holds, so that T is positive. Obviously T is
self-adjoint (by the denition of a positive operator). To prove the other
condition in (b), suppose that is an eigenvalue of T . Let v be a nonzero
eigenvector of T corresponding to . Then
0T v, v
= v, v
= v, v,
and thus is a nonnegative number. Hence (b) holds.
Now suppose that (b) holds, so that T is self-adjoint and all the eigenvalues of T are nonnegative. By the spectral theorem (7.9 and 7.13),
146
there is an orthonormal basis (e1 , . . . , en ) of V consisting of eigenvectors of T . Let 1 , . . . , n be the eigenvalues of T corresponding to
e1 , . . . , en , so that each j is a nonnegative number. Dene S L(V )
by
Sej = j ej
Isometries
147
T v = S 2 v = 2 v,
Isometries
An operator S L(V ) is called an isometry if
Sv = v
for all v V . In other words, an operator is an isometry if it preserves
norms. For example, I is an isometry whenever F satises || = 1.
More generally, suppose 1 , . . . , n are scalars with absolute value 1 and
S L(V ) satises S(ej ) = j ej for some orthonormal basis (e1 , . . . , en )
of V . Suppose v V . Then
7.33
and
7.34
148
7.35
An isometry on a real
inner-product space is
often called an
orthogonal operator.
An isometry on a
complex inner-product
space is often called a
unitary operator. We
will use the term
isometry so that our
results can apply to
both real and complex
inner-product spaces.
Comparing 7.34 and 7.35 shows that v = Sv. In other words, S is
an isometry.
For another example, let R. Then the operator on R 2 of counterclockwise rotation (centered at the origin) by an angle of is an
isometry (you should nd the matrix of this operator with respect to
the standard basis of R 2 ).
If S L(V ) is an isometry, then S is injective (because if Sv = 0,
then v = Sv = 0, and hence v = 0). Thus every isometry is
invertible (by 3.21).
The next theorem provides several conditions that are equivalent
to being an isometry. These equivalences have several important interpretations. In particular, the equivalence of (a) and (b) shows that
an isometry preserves inner products. Because (a) implies (d), we see
that if S is an isometry and (e1 , . . . , en ) is an orthonormal basis of V ,
then the columns of the matrix of S (with respect to this basis) are orthonormal; because (e) implies (a), we see that the converse also holds.
Because (a) is equivalent to conditions (i) and (j), we see that in the last
sentence we can replace columns with rows.
7.36
lent:
(a)
S is an isometry;
(b)
(c)
S S = I;
(d)
(e)
(f)
S is an isometry;
Isometries
(g)
(h)
SS = I;
(i)
(j)
149
150
Sv2 = S v, e1 e1 + + v, en en 2
= v, e1 Se1 + + v, en Sen 2
= |v, e1 |2 + + |v, en |2
= v2 ,
where the rst and last equalities come from 6.17. Taking square roots,
we see that S is an isometry, proving that (e) implies (a).
Having shown that (a) (b) (c) (d) (e) (a), we know at this
stage that (a) through (e) are all equivalent to each other. Replacing S
with S , we see that (f) through (j) are all equivalent to each other. Thus
to complete the proof, we need only show that one of the conditions
in the group (a) through (e) is equivalent to one of the conditions in
the group (f) through (j). The easiest way to connect the two groups of
conditions is to show that (c) is equivalent to (h). In general, of course,
S need not commute with S . However, S S = I if and only if SS = I;
this is a special case of Exercise 23 in Chapter 3. Thus (c) is equivalent
to (h), completing the proof.
The last theorem shows that every isometry is normal (see (a), (c),
and (h) of 7.36). Thus the characterizations of normal operators can
be used to give complete descriptions of isometries. We do this in the
next two theorems.
7.37 Theorem: Suppose V is a complex inner-product space and
S L(V ). Then S is an isometry if and only if there is an orthonormal
basis of V consisting of eigenvectors of S all of whose corresponding
eigenvalues have absolute value 1.
Proof: We already proved (see the rst paragraph of this section)
that if there is an orthonormal basis of V consisting of eigenvectors of S
all of whose eigenvalues have absolute value 1, then S is an isometry.
To prove the other direction, suppose S is an isometry. By the complex spectral theorem (7.9), there is an orthonormal basis (e1 , . . . , en )
of V consisting of eigenvectors of S. For j {1, . . . , n}, let j be the
eigenvalue corresponding to ej . Then
|j | = j ej = Sej = ej = 1.
Thus each eigenvalue of S has absolute value 1, completing the proof.
Isometries
151
If R, then the operator on R 2 of counterclockwise rotation (centered at the origin) by an angle of has matrix 7.39 with respect to
the standard basis, as you should verify. The next result states that every isometry on a real inner-product space is composed of pieces that
look like rotations on two-dimensional subspaces, pieces that equal the
identity operator, and pieces that equal multiplication by 1.
7.38 Theorem: Suppose that V is a real inner-product space and
S L(V ). Then S is an isometry if and only if there is an orthonormal
basis of V with respect to which S has a block diagonal matrix where
each block on the diagonal is a 1-by-1 matrix containing 1 or 1 or a
2-by-2 matrix of the form
cos sin
7.39
,
sin
cos
with (0, ).
Proof: First suppose that S is an isometry. Because S is normal,
there is an orthonormal basis of V such that with respect to this basis
S has a block diagonal matrix, where each block is a 1-by-1 matrix or a
2-by-2 matrix of the form
a b
,
7.40
b a
with b > 0 (see 7.25).
If is an entry in a 1-by-1 along the diagonal of the matrix of S (with
respect to the basis mentioned above), then there is a basis vector ej
such that Sej = ej . Because S is an isometry, this implies that || = 1.
Thus = 1 or = 1 because these are the only real numbers with
absolute value 1.
Now consider a 2-by-2 matrix of the form 7.40 along the diagonal of
the matrix of S. There are basis vectors ej , ej+1 such that
Sej = aej + bej+1 .
Thus
1 = ej 2 = Sej 2 = a2 + b2 .
The equation above, along with the condition b > 0, implies that there
exists a number (0, ) such that a = cos and b = sin . Thus the
152
matrix 7.40 has the required form 7.39, completing the proof in this
direction.
Conversely, now suppose that there is an orthonormal basis of V
with respect to which the matrix of S has the form required by the
theorem. Thus there is a direct sum decomposition
V = U1 Um ,
where each Uj is a subspace of V of dimension 1 or 2. Furthermore,
any two vectors belonging to distinct U s are orthogonal, and each S|Uj
is an isometry mapping Uj into Uj . If v V , we can write
v = u1 + + um ,
where each uj Uj . Applying S to the equation above and then taking
norms gives
Sv2 = Su1 + + Sum 2
= Su1 2 + + Sum 2
= u1 2 + + um 2
= v2 .
Thus S is an isometry, as desired.
153
where the rst factor, namely, z/|z|, is an element of the unit circle. Our
analogy leads us to guess that any operator T L(V ) can be written
T = S T T .
= T T T T v, v
= T T v, T T v
= T T v2 .
complex numbers:
every complex number
can be written in the
form ei r , where
[0, 2 ) and r 0.
Note that ei is in the
unit circle,
corresponding to S
Thus
7.42
coordinates for
T v = T T v
for all v V .
7.43
S1 ( T T v) = T v.
The idea of the proof is to extend S1 to an isometry S L(V ) such that
= T T (v1 v2 )
= T T v 1 T T v2
= 0,
where the second equality holds by 7.42. The equation above shows
that T v1 = T v2 , so S1 is indeed well dened. You should verify that S1
is a linear map.
T T being a positive
operator.
154
In the rest of the proof
all we are doing is
extending S1 to an
7.42 and 7.43 imply that S1 u = u for all u range T T . In
particular, S1 is injective. Thus from 3.4, applied to S1 , we have
isometry S on all of V .
S2 : (range T T ) (range T ) by
S2 (a1 e1 + + am em ) = a1 f1 + + am fm .
7.44
S( T T v) = S1 ( T T v) = T v,
so T = S T T , as desired. All that remains is to show that S is an isometry. However, this follows easily from the two uses of the Pythagorean
theorem: if v V has decomposition as in 7.44, then
Sv2 = S1 u + S2 w2
= S1 u2 + S2 w2
= u2 + w2
= v2 ,
where the second equality above holds because S1 u range T and
S2 u (range T ) .
is an orthonormal basis of V with respect to which T T has a diagonal matrix. Warning: there may not exist an orthonormal basis that
(hence self-adjoint) operator T T . For example, the operator T dened by 7.45 on the four-dimensional vector space F4 has four singular
values (they are 3, 3, 2, 0), as we saw in the previous paragraph.
The next result shows that every operator on V has a nice description in terms of its singular values and two orthonormal bases of V .
155
156
7.46 Singular-Value Decomposition: Suppose T L(V ) has singular values s1 , . . . , sn . Then there exist orthonormal bases (e1 , . . . , en )
and (f1 , . . . , fn ) of V such that
T v = s1 v, e1 f1 + + sn v, en fn
7.47
for every v V .
M T , (e1 , . . . , en ), (f1 , . . . , fn ) =
s1
0
..
157
sn
158
Exercises
1.
(b)
0 0 0
0 1 0 .
0 0 0
This matrix equals its conjugate transpose, even though T
is not self-adjoint. Explain why this is not a contradiction.
2.
Prove or give a counterexample: the product of any two selfadjoint operators on a nite-dimensional inner-product space is
self-adjoint.
3.
(a)
(b)
4.
Suppose P L(V ) is such that P 2 = P . Prove that P is an orthogonal projection if and only if P is self-adjoint.
5.
6.
7.
and
range T k = range T
Exercises
8.
9.
10.
11.
12.
159
Exercise 9 strengthens
the analogy (for normal
operators) between
self-adjoint operators
and real numbers.
13.
14.
T v v < ,
then T has an eigenvalue such that | | < .
15.
16.
17.
18.
160
19.
20.
21.
22.
23.
Dene T L(F3 ) by
T (z1 , z2 , z3 ) = (z3 , 2z1 , 3z2 ).
24.
25.
26.
27.
28.
29.
30.
if we write T as the
product of an isometry
and a positive operator
(as in the polar
decomposition), then
the positive operator
must equal T T .
Exercises
31.
32.
Prove that
T v = s1 v, f1 e1 + + sn v, fn en
for every v V .
(b)
v, f1 e1
v, fn en
+ +
s1
sn
for every v V .
33.
34.
161
Chapter 8
Operators on
Complex Vector Spaces
163
164
Generalized Eigenvectors
Unfortunately some operators do not have enough eigenvectors to
lead to a good description. Thus in this section we introduce the concept of generalized eigenvectors, which will play a major role in our
description of the structure of an operator.
To understand why we need more than eigenvectors, lets examine
the question of describing an operator by decomposing its domain into
invariant subspaces. Fix T L(V ). We seek to describe T by nding a
nice direct sum decomposition
8.1
V = U1 Um ,
where each Uj is a subspace of V invariant under T . The simplest possible nonzero invariant subspaces are one-dimensional. A decomposition 8.1 where each Uj is a one-dimensional subspace of V invariant
under T is possible if and only if V has a basis consisting of eigenvectors
of T (see 5.21). This happens if and only if V has the decomposition
8.2
(T I)j v = 0
for some positive integer j. Note that every eigenvector of T is a generalized eigenvector of T (take j = 1 in the equation above), but the
converse is not true. For example, if T L(C3 ) is dened by
Generalized Eigenvectors
165
T (z1 , z2 , z3 ) = (z2 , 0, z3 ),
then T 2 (z1 , z2 , 0) = 0 for all z1 , z2 C. Hence every element of C3
whose last coordinate equals 0 is a generalized eigenvector of T . As
you should verify,
C3 = {(z1 , z2 , 0) : z1 , z2 C} {(0, 0, z3 ) : z3 C},
where the rst subspace on the right equals the set of generalized eigenvectors for this operator corresponding to the eigenvalue 0 and the second subspace on the right equals the set of generalized eigenvectors
corresponding to the eigenvalue 1. Later in this chapter we will prove
that a decomposition using generalized eigenvectors exists for every
operator on a complex vector space (see 8.23).
Though j is allowed to be an arbitrary integer in the denition of a
generalized eigenvector, we will soon see that every generalized eigenvector satises an equation of the form 8.3 with j equal to the dimension of V . To prove this, we now turn to a study of null spaces of
powers of an operator.
Suppose T L(V ) and k is a nonnegative integer. If T k v = 0, then
k+1
v = T (T k v) = T (0) = 0. Thus null T k null T k+1 . In other words,
T
we have
8.4
The next proposition says that once two consecutive terms in this sequence of subspaces are equal, then all later terms in the sequence are
equal.
8.5
Proposition: If T L(V ) and m is a nonnegative integer such
that null T m = null T m+1 , then
null T 0 null T 1 null T m = null T m+1 = null T m+2 = .
Proof: Suppose T L(V ) and m is a nonnegative integer such
that null T m = null T m+1 . Let k be a positive integer. We want to prove
that
null T m+k = null T m+k+1 .
We already know that null T m+k null T m+k+1 . To prove the inclusion
in the other direction, suppose that v null T m+k+1 . Then
166
8.7
Corollary: Suppose T L(V ) and is an eigenvalue of T . Then
the set of generalized eigenvectors of T corresponding to equals
null(T I)dim V .
Proof: If v null(T I)dim V , then clearly v is a generalized
eigenvector of T corresponding to (by the denition of generalized
eigenvector).
Conversely, suppose that v V is a generalized eigenvector of T
corresponding to . Thus there is a positive integer j such that
v null(T I)j .
From 8.5 and 8.6 (with T I replacing T ), we get v null(T I)dim V ,
as desired.
Generalized Eigenvectors
is nilpotent because N = 0. As another example, the operator of differentiation on Pm (R) is nilpotent because the (m + 1)st derivative of
any polynomial of degree at most m equals 0. Note that on this space of
dimension m + 1, we need to raise the nilpotent operator to the power
m + 1 to get 0. The next corollary shows that we never need to use a
power higher than the dimension of the space.
8.8
167
The Latin word nil
means nothing or zero;
the Latin word potent
means power. Thus
nilpotent literally
means zero power.
Proof: We could prove this from scratch, but instead lets make use
of the corresponding result already proved for null spaces. Suppose
m > dim V . Then
dim range T m = dim V dim null T m
= dim V dim null T dim V
= dim range T dim V ,
These inclusions go in
the opposite direction
from the corresponding
inclusions for null
spaces (8.4).
168
where the rst and third equalities come from 3.4 and the second equality comes from 8.6. We already know that range T dim V range T m . We
just showed that dim range T dim V = dim range T m , so this implies that
range T dim V = range T m , as desired.
If T happens to have a
diagonal matrix A with
respect to some basis,
then appears on the
diagonal of A precisely
dim null(T I) times,
as you should verify.
169
..
.
8.11
n1
0
n
Let U = span(v1 , . . . , vn1 ). Clearly U is invariant under T (see 5.12),
and the matrix of T |U with respect to the basis (v1 , . . . , vn1 ) is
8.12
1
..
0
n1
null T n U.
Once this has been veried, we will know that null T n = null(T |U )n , and
hence 8.13 will tell us that 0 appears on the diagonal of 8.11 exactly
dim null T n times, completing the proof in the case where n = 0.
Because M(T ) is given by 8.11, we have
170
M(T ) = M(T ) =
1 n
..
.
n1 n
n n
171
Our denition of
multiplicity has a clear
connection with the
geometric behavior
of T . Most texts dene
multiplicity in terms of
the multiplicity of the
roots of a certain
polynomial dened by
determinants. These
8.16
6 7 7
8.17
0 6 7 ,
0 0 7
then 6 is an eigenvalue of T with multiplicity 2 and 7 is an eigenvalue
of T with multiplicity 1 (this follows from the last theorem).
In each of the examples above, the sum of the multiplicities of the
eigenvalues of T equals 3, which is the dimension of the domain of T .
The next proposition shows that this always happens on a complex
vector space.
172
theorem.
8.19
M(T ) =
1
..
0
Decomposition of an Operator
173
The English
mathematician Arthur
Cayley published three
mathematics papers
8.21
for j = 1, . . . , n.
We will prove 8.21 by induction on j. To get started, suppose j = 1.
Because M T , (v1 , . . . , vn ) is given by 8.19, we have T v1 = 1 v1 , giving
8.21 when j = 1.
Now suppose that 1 < j n and that
0 = (T 1 I)v1
= (T 1 I)(T 2 I)v2
..
.
= (T 1 I) . . . (T j1 I)vj1 .
Because M T , (v1 , . . . , vn ) is given by 8.19, we see that
(T j I)vj span(v1 , . . . , vj1 ).
Thus, by our induction hypothesis, (T 1 I) . . . (T j1 I) applied to
(T j I)vj gives 0. In other words, 8.21 holds, completing the proof.
Decomposition of an Operator
We saw earlier that the domain of an operator might not decompose
into invariant subspaces consisting of eigenvectors of the operator,
even on a complex vector space. In this section we will see that every
operator on a complex vector space has enough generalized eigenvectors to provide a decomposition.
We observed earlier that if T L(V ), then null T is invariant under T . Now we show that the null space of any polynomial of T is also
invariant under T .
before he completed
his undergraduate
degree in 1842. The
Irish mathematician
William Hamilton was
made a professor in
1827 when he was 22
years old and still an
undergraduate!
174
8.22 Proposition:
invariant under T .
V = U 1 Um ;
(b)
(c)
Proof: Note that Uj = null(T j I)dim V for each j (by 8.7). From
8.22 (with p(z) = (z j )dim V ), we get (b). Obviously (c) follows from
the denitions.
To prove (a), recall that the multiplicity of j as an eigenvalue of T
is dened to be dim Uj . The sum of these multiplicities equals dim V
(see 8.18); thus
8.24
Decomposition of an Operator
175
8.27
0
..
0
If V is complex vector
space, a proof of this
176
Now lets think about the matrix of N with respect to this basis. The
rst column, and perhaps additional columns at the beginning, consists
of all 0s because the corresponding basis vectors are in null N. The
next set of columns comes from basis vectors in null N 2 . Applying N
to any such vector, we get a vector in null N; in other words, we get a
vector that is a linear combination of the previous basis vectors. Thus
all nonzero entries in these columns must lie above the diagonal. The
next set of columns come from basis vectors in null N 3 . Applying N
to any such vector, we get a vector in null N 2 ; in other words, we get a
vector that is a linear combination of the previous basis vectors. Thus,
once again, all nonzero entries in these columns must lie above the
diagonal. Continue in this fashion to complete the proof.
Note that in the next theorem we get many more zeros in the matrix
of T than are needed to make it upper triangular.
8.28 Theorem: Suppose V is a complex vector space and T L(V ).
Let 1 , . . . , m be the distinct eigenvalues of T . Then there is a basis
of V with respect to which T has a block diagonal matrix of the form
A1
0
..
Am
8.29
Aj =
j
..
0
Square Roots
177
Square Roots
Recall that a square root of an operator T L(V ) is an operator
S L(V ) such that S 2 = T . As an application of the main structure
theorem from the last section, in this section we will show that every
invertible operator on a complex vector space has a square root.
Every complex number has a square root, but not every operator on
a complex vector space has a square root. An example of an operator
on C3 that has no square root is given in Exercise 4 in this chapter.
The noninvertibility of that particular operator is no accident, as we
will soon see. We begin by showing that the identity plus a nilpotent
operator always has a square root.
8.30 Lemma:
square root.
1 + x:
Because a1 = 1/2, this
formula shows that
1 + x/2 is a good
estimate for 1 + x
when x is small.
178
Proof: Suppose T L(V ) is invertible. Let 1 , . . . , m be the distinct eigenvalues of T , and let U1 , . . . , Um be the corresponding subspaces of generalized eigenvectors. For each j, there exists a nilpotent
operator Nj L(Uj ) such that T |Uj = j I + Nj (see 8.23(c)). Because T
is invertible, none of the j s equals 0, so we can write
Nj
T | Uj = j I +
j
for each j. Clearly Nj /j is nilpotent, and so I + Nj /j has a square
root (by 8.30). Multiplying a square root of the complex number j by
a square root of I + Nj /j , we obtain a square root Sj of T |Uj .
A typical vector v V can be written uniquely in the form
v = u1 + + um ,
where each uj Uj (see 8.23). Using this decomposition, dene an
operator S L(V ) by
Sv = S1 u1 + + Sm um .
You should verify that this operator S is a square root of T , completing
the proof.
By imitating the techniques in this section, you should be able to
prove that if V is a complex vector space and T L(V ) is invertible,
then T has a kth -root for every positive integer k.
179
a polynomial whose
highest degree
coefcient equals 1.
For example,
2 + 3z2 + z8 is a monic
(I, T , T 2 , . . . , T n )
polynomial.
2
A monic polynomial is
(I, T , T 2 , . . . , T m )
180
Note that (z )
divides a polynomial q
if and only if is a
root of q. This follows
immediately from 4.1.
8.35
and deg r < deg p. We have
181
Proof: Let
p(z) = a0 + a1 z + a2 z2 + + am1 zm1 + zm
be the minimal polynomial of T .
First suppose that F is a root of p. Then the minimal polynomial
of T can be written in the form
p(z) = (z )q(z),
where q is a monic polynomial with coefcients in F (see 4.1). Because
p(T ) = 0, we have
0 = (T I)(q(T )v)
for all v V . Because the degree of q is less than the degree of the
minimal polynomial p, there must exist at least one vector v V such
that q(T )v = 0. The equation above thus implies that is an eigenvalue
of T , as desired.
To prove the other direction, now suppose that F is an eigenvalue of T . Let v be a nonzero vector in V such that T v = v. Repeated
applications of T to both sides of this equation show that T j v = j v
for every nonnegative integer j. Thus
0 = p(T )v = (a0 + a1 T + a2 T 2 + + am1 T m1 + T m )v
= (a0 + a1 + a2 2 + + am1 m1 + m )v
= p()v.
Because v = 0, the equation above implies that p() = 0, as desired.
Suppose we are given, in concrete form, the matrix (with respect to
some basis) of some operator T L(V ). To nd the minimal polynomial of T , consider
(M(I), M(T ), M(T )2 , . . . , M(T )m )
for m = 1, 2, . . . until this list is linearly dependent. Then nd the
scalars a0 , a1 , a2 , . . . , am1 F such that
a0 M(I) + a1 M(T ) + a2 M(T )2 + + am1 M(T )m1 + M(T )m = 0.
The scalars a0 , a1 , a2 , . . . , am1 , 1 will then be the coefcients of the
minimal polynomial of T . All this can be computed using a familiar
process such as Gaussian elimination.
182
8.37
0
1
0
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
1
3
6
0
0
0
8.38
Now what about the eigenvalues of this particular operator? From 8.36,
we see that the eigenvalues of T equal the solutions to the equation
z5 6z + 3 = 0.
Unfortunately no solution to this equation can be computed using rational numbers, arbitrary roots of rational numbers, and the usual rules
of arithmetic (a proof of this would take us considerably beyond linear
algebra). Thus we cannot nd an exact expression for any eigenvalues
of T in any familiar form, though numeric techniques can give good approximations for the eigenvalues of T . The numeric techniques, which
we will not discuss here, show that the eigenvalues for this particular
operator are approximately
1.67,
0.51,
1.40,
0.12 + 1.59i,
0.12 1.59i.
Note that the nonreal eigenvalues occur as a pair, with each the complex
conjugate of the other, as expected for the roots of a polynomial with
real coefcients (see 4.10).
Suppose V is a complex vector space and T L(V ). The CayleyHamilton theorem (8.20) and 8.34 imply that the minimal polynomial
of T divides the characteristic polynomial of T . Both these polynomials
are monic. Thus if the minimal polynomial of T has degree dim V , then
it must equal the characteristic polynomial of T . For example, if T is
the operator on C5 whose matrix is given by 8.37, then the characteristic polynomial of T , as well as the minimal polynomial of T , is given
by 8.38.
Jordan Form
183
Jordan Form
We know that if V is a complex vector space, then for every T L(V )
there is a basis of V with respect to which T has a nice upper-triangular
matrix (see 8.28). In this section we will see that we can do even better
there is a basis of V with respect to which the matrix of T contains zeros
everywhere except possibly on the diagonal and the line directly above
the diagonal.
We begin by describing the nilpotent operators. Consider, for example, the nilpotent operator N L(Fn ) dened by
N(z1 , . . . , zn ) = (0, z1 , . . . , zn1 ).
If v = (1, 0, . . . , 0), then clearly (v, Nv, . . . , N n1 v) is a basis of Fn and
(N n1 v) is a basis of null N, which has dimension 1.
As another example, consider the nilpotent operator N L(F5 ) dened by
8.39
N(z1 , z2 , z3 , z4 , z5 ) = (0, z1 , z2 , 0, z4 ).
(b)
Obviously m(v)
depends on N as well
as on v, but the choice
of N will be clear from
the context.
184
The existence of a
(i)
(ii)
8.41
0=
k m(v
r )
ar ,s N s (vr ),
r =1 s=0
k m(v
r )
ar ,s N s+1 (vr )
r =1 s=0
j m(ur )
ar ,s N s (ur ).
r =1 s=0
Jordan Form
The terms on the rst line on the right are all in null N range N; the
terms on the second line are all in W . Thus the last equation and 8.41
imply that
0 = a1,m(v1 ) N m(v1 ) v1 + + aj,m(vj ) N m(vj ) vj
8.43
and
0 = aj+1,0 vj+1 + + ak,0 vk .
8.44
8.45
Clearly (i) implies that
dim range N =
(m(ur ) + 1)
r =0
j
8.46
m(vr ).
r =0
(m(vr ) + 1) = k +
r =0
m(vr )
r =0
185
186
Now (ii) and 8.41 show that the last list above is a basis of null N, completing the proof of (b).
Suppose T L(V ). A basis of V is called a Jordan basis for T if
with respect to this basis T has a block diagonal matrix
0
A1
..
,
0
Am
where each Aj is an upper-triangular matrix of the form
j 1
0
..
..
.
.
.
Aj =
.. 1
0
j
To understand why
each j must be an
eigenvalue of T ,
see 5.18.
The French
mathematician Camille
Jordan rst published a
proof of this theorem
in 1870.
0 1
0
..
..
.
.
.
.. 1
0
0
Jordan Form
187
188
Exercises
1.
Dene T L(C2 ) by
T (w, z) = (z, 0).
Find all generalized eigenvectors of T .
2.
Dene T L(C2 ) by
T (w, z) = (z, w).
Find all generalized eigenvectors of T .
3.
4.
5.
6.
7.
8.
9.
Exercises
10.
189
11.
12.
13.
14.
15.
16.
17.
18.
Dene N L(F5 ) by
N(x1 , x2 , x3 , x4 , x5 ) = (2x2 , 3x3 , x4 , 4x5 , 0).
Find a square root of I + N.
190
19.
20.
Suppose T L(V ) is invertible. Prove that there exists a polynomial p P(F) such that T 1 = p(T ).
21.
22.
23.
24.
25.
p(T )v = 0.
Prove that p divides the minimal polynomial of T .
26.
27.
Give an example of an operator on C4 whose characteristic polynomial equals z(z 1)2 (z 3) and whose minimal polynomial
equals z(z 1)(z 3).
28.
0
a0
1 0
a1
..
.
1
a2
..
..
.
.
0 an2
1 an1
Exercises
29.
Suppose N L(V ) is nilpotent. Prove that the minimal polynomial of N is zm+1 , where m is the length of the longest consecutive string of 1s that appears on the line directly above the
diagonal in the matrix of N with respect to any Jordan basis for N.
30.
31.
191
Chapter 9
Operators on
Real Vector Spaces
193
194
7
1
8
5
2
1
=
6
3
=3
2
1
.
As another example, you should verify that the matrix 01 1
has no
0
eigenvalues if we are thinking of F as the real numbers (by denition,
an eigenvalue must be in F) and has eigenvalues i and i if we are
thinking of F as the complex numbers.
We now have two notions of eigenvalueone for operators and one
for square matrices. As you might expect, these two notions are closely
connected, as we now show.
9.1
Proposition: Suppose T L(V ) and A is the matrix of T with
respect to some basis of V . Then the eigenvalues of T are the same as
the eigenvalues of A.
Proof: Let (v1 , . . . , vn ) be the basis of V with respect to which T
has matrix A. Let F. We need to show that is an eigenvalue of T
if and only if is an eigenvalue of A.
First suppose is an eigenvalue of T . Let v V be a nonzero vector
such that T v = v. We can write
9.2
v = a1 v1 + + an vn ,
9.3
a1
.
x=
.. .
an
195
We have
Ax = M(T )M(v) = M(T v) = M(v) = M(v) = x,
where the second equality comes from 3.14. The equation above shows
that is an eigenvalue of A, as desired.
To prove the implication in the other direction, now suppose is an
eigenvalue of A. Let x be a nonzero n-by-1 matrix such that Ax = x.
We can write x in the form 9.3 for some scalars a1 , . . . , an F. Dene
v V by 9.2. Then
M(T v) = M(T )M(v) = Ax = x = M(v).
where the rst equality comes from 3.14. The equation above implies
that T v = v, and thus is an eigenvalue of T , completing the proof.
Because every square matrix is the matrix of some operator, the
proposition above allows us to translate results about eigenvalues of
operators into the language of eigenvalues of square matrices. For
example, every square matrix of complex numbers has an eigenvalue
(from 5.10). As another example, every n-by-n matrix has at most n
distinct eigenvalues (from 5.9).
A1
..
,
.
0
Am
where A1 , . . . , Am are square matrices lying along the diagonal, all entries below A1 , . . . , Am equal 0, and the denotes arbitrary entries. For
example, the matrix
As usual, we use an
asterisk to denote
entries of the matrix
that play no important
role in the topics under
consideration.
196
A=
4
0
0
0
0
10
3
3
0
0
11
3
3
0
0
12
14
16
5
5
13
25
17
5
5
A=
A1
A2
0
A3
where
A1 =
Every upper-triangular
matrix is also a block
upper-triangular matrix
,
A2 =
3
3
3
3
,
A3 =
5
5
5
5
.
Now we prove that for each operator on a real vector space, we can
nd a basis that gives a block upper-triangular matrix with blocks of
size at most 2-by-2 on the diagonal.
9.4
Theorem: Suppose V is a real vector space and T L(V ).
Then there is a basis of V with respect to which T has a block uppertriangular matrix
matrix is a block
upper-triangular matrix
because we can take
the rst (and only)
block to be the entire
matrix. Smaller blocks
9.5
A1
..
0
Am
197
v = w + u, where
w W and u U,
then PW ,U v = w.
Recall that if
= PU ,W (T w) + Sw
for every w W .
By our induction hypothesis, there is a basis of W with respect to
which S has a block upper-triangular matrix of the form
A2
..
,
.
0
Am
where each Aj is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues.
Adjoin this basis of W to the basis of U chosen above, getting a basis
of V . A minutes thought should convince you (use 9.6) that the matrix
of T with respect to this basis is a block upper-triangular matrix of the
form 9.5, completing the proof.
198
199
(b)
2 1
I),
1 2
200
3
2
5
1
.
1
1
6
1
.
201
null(T 2 + T + I) = {0}.
(Proof: Because V is one-dimensional, there is a constant R such
that T v = v for all v V . Thus (T 2 + T + I)v = (2 + + )v.
However, the inequality 2 < 4 implies that 2 + + = 0, and thus
null(T 2 + T + I) = {0}.)
Now suppose V is a two-dimensional real vector space and T L(V )
has no eigenvalues. If R, then null(T I) equals {0} (because T
has no eigenvalues). If , R with 2 < 4, then null(T 2 + T + I)
equals V if x 2 + x + is the characteristic polynomial of the matrix
of T with respect to some (or equivalently, every) basis of V and equals
{0} otherwise (by 9.7). Note that for this operator, there is no middle
groundthe null space of T 2 + T + I is either {0} or the whole space;
it cannot be one-dimensional.
Now suppose that V is a real vector space of any dimension and
T L(V ). We know that V has a basis with respect to which T has
a block upper-triangular matrix with blocks on the diagonal of size at
most 2-by-2 (see 9.4). In general, this matrix is not uniqueV may
have many different bases with respect to which T has a block uppertriangular matrix of this form, and with respect to these different bases
we may get different block upper-triangular matrices.
We encountered a similar situation when dealing with complex vector spaces and upper-triangular matrices. In that case, though we might
get different upper-triangular matrices with respect to the different
bases, the entries on the diagonal were always the same (though possibly in a different order). Might a similar property hold for real vector
spaces and block upper-triangular matrices? Specically, is the number of times a given 2-by-2 matrix appears on the diagonal of a block
upper-triangular matrix of T independent of which basis is chosen?
Unfortunately this question has a negative answer. For example, the
operator T L(R 2 ) dened by 9.8 has two different 2-by-2 matrices,
as we saw above.
Though the number of times a particular 2-by-2 matrix might appear
on the diagonal of a block upper-triangular matrix of T can depend on
the choice of basis, if we look at characteristic polynomials instead
of the actual matrices, we nd that the number of times a particular
characteristic polynomial appears is independent of the choice of basis.
This is the content of the following theorem, which will be our key tool
in analyzing the structure of an operator on a real vector space.
202
9.9
Theorem: Suppose V is a real vector space and T L(V ).
Suppose that with respect to some basis of V , the matrix of T is
A1
..
,
9.10
.
0
Am
where each Aj is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues.
(a)
(b)
Proof: We will construct one proof that can be used to prove both
(a) and (b). To do this, let , , R with 2 < 4. Dene p P(R)
by
x
if we are trying to prove (a);
p(x) =
2
x + x + if we are trying to prove (b).
Let d denote the degree of p. Thus d = 1 if we are trying to prove (a)
and d = 2 if we are trying to prove (b).
We will prove this theorem by induction on m, the number of blocks
along the diagonal of 9.10. If m = 1, then dim V = 1 or dim V = 2; the
discussion preceding this theorem then implies that the desired result
holds. Thus we can assume that m > 1 and that the desired result
holds when m is replaced with m 1.
For convenience let n = dim V . Consider a basis of V with respect
to which T has the block upper-triangular matrix 9.10. Let Uj denote
the span of the basis vectors corresponding to Aj . Thus dim Uj = 1
if Aj is a 1-by-1 matrix and dim Uj = 2 if Aj is a 2-by-2 matrix. Let
U = U1 + + Um1 . Clearly U is invariant under T and the matrix
of T |U with respect to the obvious basis (obtained from the basis vectors corresponding to A1 , . . . , Am1 ) is
A1
..
.
9.11
.
0
Am1
Actually the induction hypothesis gives 9.12 with exponent dim U instead of n, but then we can replace dim U with n (by 8.6) to get the
statement above.
Suppose um Um . Let S L(Um ) be the operator whose matrix
(with respect to the basis corresponding to Um ) equals Am . In particular, Sum = PUm ,U T um . Now
T um = PU ,Um T um + PUm ,U T um
= U + Sum ,
where U denotes a vector in U. Note that Sum Um ; thus applying
T to both sides of the equation above gives
T 2 um = U + S 2 um ,
where again U denotes a vector in U, though perhaps a different vector
than the previous usage of U (the notation U is used when we want
to emphasize that we have a vector in U but we do not care which
particular vectoreach time the notation U is used, it may denote a
different vector in U). The last two equations show that
9.13
p(T )n um = U + p(S)n um
for some U U.
The proof now breaks into two cases. First consider the case where
the characteristic polynomial of Am does not equal p. We will show
that in this case
9.15
null p(T )n U.
203
204
and hence 9.12 will tell us that precisely (1/d) dim null p(T )n of the
matrices A1 , . . . , Am have characteristic polynomial p, completing the
proof in the case where the characteristic polynomial of Am does not
equal p.
To prove 9.15 (still assuming that the characteristic polynomial of
Am does not equal p), suppose v null p(T )n . We can write v in the
form v = u + um , where u U and um Um . Using 9.14, we have
0 = p(T )n v = p(T )n u + p(T )n um = p(T )n u + U + p(S)n um
for some U U. Because the vectors p(T )n u and U are in U and
p(S)n um Um , this implies that p(S)n um = 0. However, p(S) is invertible (see the discussion preceding this theorem about one- and twodimensional subspaces and note that dim Um 2), so um = 0. Thus
v = u U , completing the proof of 9.15.
Now consider the case where the characteristic polynomial of Am
equals p. Note that this implies dim Um = d. We will show that
9.16
205
T + T + I
is not injective. The previous theorem shows that T can have only
nitely many eigenpairs because each eigenpair corresponds to the
characteristic polynomial of a 2-by-2 matrix on the diagonal of 9.10
and there is room for only nitely many such matrices along that diagonal. Guided by 9.9, we dene the multiplicity of an eigenpair (, )
of T to be
dim null(T 2 + T + I)dim V
.
2
From 9.9, we see that the multiplicity of (, ) equals the number of
times that x 2 +x+ is the characteristic polynomial of a 2-by-2 matrix
on the diagonal of 9.10.
As an example, consider the operator T L(R 3 ) whose matrix (with
respect to the standard basis) equals
3 1 2
3 2 3 .
1 2
0
You should verify that (4, 13) is an eigenpair of T with multiplicity 1;
note that T 2 4T + 13I is not injective because (1, 0, 1) and (1, 1, 0)
are in its null space. Without doing any calculations, you should verify
that T has no other eigenpairs (use 9.9). You should also verify that 1 is
an eigenvalue of T with multiplicity 1, with corresponding eigenvector
(1, 0, 1), and that T has no other eigenvalues.
206
In the example above, the sum of the multiplicities of the eigenvalues of T plus twice the multiplicities of the eigenpairs of T equals 3,
which is the dimension of the domain of T . The next proposition shows
that this always happens on a real vector space.
This proposition shows
that though an
operator on a real
eigenpairs.
9.18
A1
..
0
Am
where each Aj is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues (see 9.4). We dene the characteristic polynomial of T to be the
product of the characteristic polynomials of A1 , . . . , Am . Explicitly, for
each j, dene qj P(R) by
9.19
Note that the roots of
the characteristic
polynomial of T equal
the eigenvalues of T , as
was true on complex
vector spaces.
qj (x) =
x
(x a)(x d) bc
if Aj equals [];
c
if Aj equals a
b d .
207
Now we can prove a result that was promised in the last chapter,
where we proved the analogous theorem (8.20) for operators on complex vector spaces.
9.20 Cayley-Hamilton Theorem: Suppose V is a real vector space
and T L(V ). Let q denote the characteristic polynomial of T . Then
q(T ) = 0.
Proof: Choose a basis of V with respect to which T has a block
upper-triangular matrix of the form 9.18, where each Aj is a 1-by-1
matrix or a 2-by-2 matrix with no eigenvalues. Suppose Uj is the one- or
two-dimensional subspace spanned by the basis vectors corresponding
to Aj . Dene qj as in 9.19. To prove that q(T ) = 0, we need only show
that q(T )|Uj = 0 for j = 1, . . . , m. To do this, it sufces to show that
q1 (T ) . . . qj (T )|Uj = 0
9.21
for j = 1, . . . , m.
We will prove 9.21 by induction on j. To get started, suppose that
j = 1. Because M(T ) is given by 9.18, we have q1 (T )|U1 = 0 (obvious if
dim U1 = 1; from 9.7(a) if dim U1 = 2), giving 9.21 when j = 1.
Now suppose that 1 < j n and that
0 = q1 (T )|U1
0 = q1 (T )q2 (T )|U2
..
.
0 = q1 (T ) . . . qj1 (T )|Uj1 .
If v Uj , then from 9.18 we see that
qj (T )v = u + qj (S)v,
where u U1 + + Uj1 and S L(Uj ) has characteristic polynomial qj . Because qj (S) = 0 (obvious if dim Uj = 1; from 9.7(a) if
dim Uj = 2), the equation above shows that
qj (T )v U1 + + Uj1
whenever v Uj . Thus, by our induction hypothesis, q1 (T ) . . . qj1 (T )
applied to qj (T )v gives 0 whenever v Uj . In other words, 9.21 holds,
completing the proof.
208
Suppose V is a real vector space and T L(V ). Clearly the CayleyHamilton theorem (9.20) implies that the minimal polynomial of T has
degree at most dim V , as was the case on complex vector spaces. If
the degree of the minimal polynomial of T equals dim V , then, as was
also the case on complex vector spaces, the minimal polynomial of T
must equal the characteristic polynomial of T . This follows from the
Cayley-Hamilton theorem (9.20) and 8.34.
Finally, we can now prove a major structure theorem about operators on real vector spaces. The theorem below should be compared
to 8.23, the corresponding result on complex vector spaces.
Either m or M
might be 0.
V = U1 Um V1 VM ;
(b)
(c)
Proof: From 8.22, we get (b). Clearly (c) follows from the denitions.
To prove (a), recall that dim Uj equals the multiplicity of j as an
eigenvalue of T and dim Vj equals twice the multiplicity of (j , j ) as
an eigenpair of T . Thus
9.23
This equation, along with 9.23, shows that dim V = dim U . Because U
is a subspace of V , this implies that V = U. In other words,
V = U1 + + Um + V1 + + VM .
This equation, along with 9.23, allows us to use 2.19 to conclude that
(a) holds, completing the proof.
209
210
Exercises
1.
2.
3.
A1
0
.
..
A=
0
Am
where each Aj is a square matrix. Prove that the set of eigenvalues of A equals the union of the eigenvalues of A1 , . . . , Am .
Clearly Exercise 4 is a
4.
stronger statement
than Exercise 3. Even
so, you may want to do
Exercise 3 rst because
it is easier than
A1
..
,
A=
.
0
Am
where each Aj is a square matrix. Prove that the set of eigenvalues of A equals the union of the eigenvalues of A1 , . . . , Am .
Exercise 4.
5.
6.
A1
..
,
.
0
Am
where each Aj is a 1-by-1 matrix or a 2-by-2 matrix with no eigenvalues.
Exercises
7.
8.
Prove that there does not exist an operator T L(R 7 ) such that
T 2 + T + I is nilpotent.
9.
10.
211
null(T 2 + T + I)k
has even dimension for every positive integer k.
11.
12.
13.
14.
15.
is the matrix of T with respect to some basis of V , then the characteristic polynomial of T equals (z a)(z d) bc.
Suppose V is a real inner-product space and S L(V ) is an isometry. Prove that if (, ) is an eigenpair of S, then = 1.
the eigenvalues of T to
do this exercise. As
usual unless otherwise
be a real or complex
vector space.
Chapter 10
Throughout this book our emphasis has been on linear maps and operators rather than on matrices. In this chapter we pay more attention
to matrices as we dene and discuss traces and determinants. Determinants appear only at the end of this book because we replaced their
usual applications in linear algebra (the denition of the characteristic polynomial and the proof that operators on complex vector spaces
have eigenvalues) with more natural techniques. The book concludes
with an explanation of the important role played by determinants in
the theory of volume and integration.
Recall that F denotes R or C.
Also, V is a nite-dimensional, nonzero vector space over F.
213
214
Change of Basis
The matrix of an operator T L(V ) depends on a choice of basis
of V . Two different bases of V may give different matrices of T . In this
section we will learn how these matrices are related. This information
will help us nd formulas for the trace and determinant of T later in
this chapter.
With respect to any basis of V , the identity operator I L(V ) has a
diagonal matrix
1
0
..
.
0
1
This matrix is called the identity matrix and is denoted I. Note that we
use the symbol I to denote the identity operator (on all vector spaces)
and the identity matrix (of all possible sizes). You should always be
able to tell from the context which particular meaning of I is intended.
For example, consider the equation
M(I) = I;
Some mathematicians
use the terms
nonsingular, which
means the same as
invertible, and
singular, which means
the same as
noninvertible.
on the left side I denotes the identity operator and on the right side I
denotes the identity matrix.
If A is a square matrix (with entries in F, as usual) with the same
size as I, then AI = IA = A, as you should verify. A square matrix A
is called invertible if there is a square matrix B of the same size such
that AB = BA = I, and we call B an inverse of A. To prove that A has
at most one inverse, suppose B and B are inverses of A. Then
B = BI = B(AB ) = (BA)B = IB = B ,
and hence B = B , as desired. Because an inverse is unique, we can use
the notation A1 to denote the inverse of A (if A is invertible). In other
words, if A is invertible, then A1 is the unique matrix of the same size
such that AA1 = A1 A = I.
Recall that when discussing linear maps from one vector space to
another in Chapter 3, we dened the matrix of a linear map with respect
to two basesone basis for the rst vector space and another basis for
the second vector space. When we study operators, which are linear
maps from a vector space to itself, we almost always use the same basis
Change of Basis
for both vector spaces (after all, the two vector spaces in question are
equal). Thus we usually refer to the matrix of an operator with respect
to a basis, meaning that we are using one basis in two capacities. The
next proposition is one of the rare cases where we need to use two
dierent bases even though we have an operator from a vector space
to itself.
Lets review how matrix multiplication interacts with multiplication
of linear maps. Suppose that along with V we have two other nitedimensional vector spaces, say U and W . Let (u1 , . . . , up ) be a basis
of U , let (v1 , . . . , vn ) be a basis of V , and let (w1 , . . . , wm ) be a basis
of W . If T L(U, V ) and S L(V , W ), then ST L(U, W ) and
M ST , (u1 , . . . , up ), (w1 , . . . , wm ) =
M S, (v1 , . . . , vn ), (w1 , . . . , wm ) M T , (u1 , . . . , up ), (v1 , . . . , vn ) .
10.1
215
216
I = M I, (v1 , . . . , vn ), (u1 , . . . , un ) M I, (u1 , . . . , un ), (v1 , . . . , vn ) .
Now interchange the roles of the us and v s, getting
I = M I, (u1 , . . . , un ), (v1 , . . . , vn ) M I, (v1 , . . . , vn ), (u1 , . . . , un ) .
These two equations give the desired result.
Now we can see how the matrix of T changes when we change
bases.
10.3 Theorem: Suppose T L(V ). Let (u1 , . . . , un ) and (v1 , . . . , vn )
be bases of V . Let A = M I, (u1 , . . . , un ), (v1 , . . . , vn ) . Then
M T , (u1 , . . . , un ) = A1 M T , (v1 , . . . , vn ) A.
10.4
10.5
Trace
Lets examine the characteristic polynomial more closely than we
did in the last two chapters. If V is an n-dimensional complex vector
space and T L(V ), then the characteristic polynomial of T equals
(z 1 ) . . . (z n ),
where 1 , . . . , n are the eigenvalues of T , repeated according to multiplicity. Expanding the polynomial above, we can write the characteristic
polynomial of T in the form
10.6
zn (1 + + n )zn1 + + (1)n (1 . . . n ).
Trace
217
Here m or M might
equal 0.
10.7
x n (1 + + m 1 m )x n1 + . . .
m
+ (1) (1 . . . m 1 . . . M ).
In this section we will study the coefcient of zn1 (usually denoted
x
when we are dealing with a real vector space) in the characteristic
polynomial. In the next section we will study the constant term in the
characteristic polynomial.
For T L(V ), the negative of the coefcient of zn1 (or x n1 for real
vector spaces) in the characteristic polynomial of T is called the trace
of T , denoted trace T . If V is a complex vector space, then 10.6 shows
that trace T equals the sum of the eigenvalues of T , counting multiplicity. If V is a real vector space, then 10.7 shows that trace T equals the
sum of the eigenvalues of T minus the sum of the rst coordinates of
the eigenpairs of T , each repeated according to multiplicity.
For example, suppose T L(C3 ) is the operator whose matrix is
3 1 2
10.8
3 2 3 .
1 2
0
T 2 + T + I is not
injective.
n1
Then the eigenvalues of T are 1, 2 + 3i, and 2 3i, each with multiplicity 1, as you can verify. Computing the sum of the eigenvalues, we
have trace T = 1 + (2 + 3i) + (2 3i); in other words, trace T = 5.
As another example, suppose T L(R3 ) is the operator whose matrix is also given by 10.8 (note that in the previous paragraph we were
working on a complex vector space; now we are working on a real vector space). Then 1 is the only eigenvalue of T (it has multiplicity 1)
and (4, 13) is the only eigenpair of T (it has multiplicity 1), as you
should have veried in the last chapter (see page 205). Computing the
sum of the eigenvalues minus the sum of the rst coordinates of the
eigenpairs, we have trace T = 1 (4); in other words, trace T = 5.
218
The reason that the operators in the two previous examples have
the same trace will become clear after we nd a formula (valid on both
complex and real vector spaces) for computing the trace of an operator
from its matrix.
Most of the rest of this section is devoted to discovering how to calculate trace T from the matrix of T (with respect to an arbitrary basis).
Lets start with the easiest situation. Suppose V is a complex vector
space, T L(V ), and we choose a basis of V with respect to which
T has an upper-triangular matrix A. Then the eigenvalues of T are
precisely the diagonal entries of A, repeated according to multiplicity
(see 8.10). Thus trace T equals the sum of the diagonal entries of A.
The same formula works for the operator T L(F3 ) whose matrix is
given by 10.8 and whose trace equals 5. Could such a simple formula
be true in general?
We begin our investigation by considering T L(V ) where V is a
real vector space. Choose a basis of V with respect to which T has a
block upper-triangular matrix M(T ), where each block on the diagonal
is a 1-by-1 matrix containing an eigenvalue of T or a 2-by-2 block with
no eigenvalues (see 9.4 and 9.9). Each entry in a 1-by-1 block on the
diagonal of M(T ) is an eigenvalue of T and thus makes a contribution
to trace T . If M(T ) has any 2-by-2 blocks on the diagonal, consider a
typical one
a c
.
b d
The characteristic polynomial of this 2-by-2 matrix is (xa)(xd)bc,
which equals
x 2 (a + d)x + (ad bc).
You should carefully
review 9.9 to
understand the
relationship between
eigenpairs and
characteristic
polynomials of 2-by-2
blocks.
Trace
219
trace T = trace M T , (v1 , . . . , vn ) , where (v1 , . . . , vn ) is an arbitrary
basis of V . We already know this is true if (v1 , . . . , vn ) is a basis with
respect to which T has an upper-triangular matrix (if V is complex) or
an appropriate block upper-triangular matrix (if V is real). We will need
the following proposition to prove our trace formula for an arbitrary
basis.
10.9
then
Proof: Suppose
a1,1 . . .
.
A = ..
an,1 . . .
a1,n
..
. ,
an,n
b1,1
.
B = ..
bn,1
...
...
b1,n
..
. .
bn,n
aj,k bk,j .
k=1
Thus
trace(AB) =
aj,k bk,j
j=1 k=1
bk,j aj,k
k=1 j=1
k=1
= trace(BA),
as desired.
Now we can prove that the sum of the diagonal entries of the matrix
of an operator is independent of the basis with respect to which the
matrix is computed.
10.10 Corollary: Suppose T L(V ). If (u1 , . . . , un ) and (v1 , . . . , vn )
are bases of V , then
trace M T , (u1 , . . . , un ) = trace M T , (v1 , . . . , vn ) .
220
0 0 0 0 3
1 0 0 0 6
0 1 0 0 0 .
0 0 1 0 0
0 0 0 1 0
Trace
No one knows an exact formula for any of the eigenvalues of this operator. However, we do know that the sum of the eigenvalues equals 0
because the sum of the diagonal entries of the matrix above equals 0.
The theorem above also allows us easily to prove some useful properties about traces of operators by shifting to the language of traces
of matrices, where certain properties have already been proved or are
obvious. We carry out this procedure in the next corollary.
10.12
trace(ST ) = trace(T S)
and
221
222
The statement of this
corollary does not
Determinant of an Operator
Note that det T
depends only on T and
not on a basis of V
because the
characteristic
polynomial of T does
not depend on a choice
of basis.
Determinant of an Operator
has multiplicity 1) and (4, 13) is the only eigenpair of T (it has multiplicity 1). Computing the product of the eigenvalues times the product
of the second coordinates of the eigenpairs, we have det T = (1)(13);
in other words, det T = 13.
The reason that the operators in the two previous examples have the
same determinant will become clear after we nd a formula (valid on
both complex and real vector spaces) for computing the determinant
of an operator from its matrix.
In this section, we will prove some simple but important properties
of determinants. In the next section, we will discover how to calculate
det T from the matrix of T (with respect to an arbitrary basis). We begin
with a crucial result that has an easy proof with our approach.
10.14 Proposition: An operator is invertible if and only if its determinant is nonzero.
Proof: First suppose V is a complex vector space and T L(V ).
The operator T is invertible if and only if 0 is not an eigenvalue of T .
Clearly this happens if and only if the product of the eigenvalues of T
is not 0. Thus T is invertible if and only if det T = 0, as desired.
Now suppose V is a real vector space and T L(V ). Again, T is
invertible if and only if 0 is not an eigenvalue of T . Using the notation
of 10.7, we have
10.15
det T = 1 . . . m 1 . . . M ,
where the s are the eigenvalues of T and the s are the second coordinates of the eigenpairs of T , each repeated according to multiplicity.
For each eigenpair (j , j ), we have j 2 < 4j . In particular, each j
is positive. This implies (see 10.15) that 1 . . . m = 0 if and only if
det T = 0. Thus T is invertible if and only if det T = 0, as desired.
If T L(V ) and , z F, then is an eigenvalue of T if and only if
z is an eigenvalue of zI T . This follows from
(T I) = (zI T ) (z )I.
Raising both sides of this equation to the dim V power and then taking
null spaces of both sides shows that the multiplicity of as an eigenvalue of T equals the multiplicity of z as an eigenvalue of zI T .
223
224
The next lemma gives the analogous result for eigenpairs. We will use
this lemma to show that the characteristic polynomial can be expressed
as a certain determinant.
Real vector spaces are
harder to deal with
than complex vector
spaces. The rst time
considering only
(2x )2 = 4x 2 + 4x + 2
< 4x 2 + 4x + 4
= 4(x 2 + x + ).
special procedures
needed to deal with
real vector spaces.
Determinant of a Matrix
The right side of the equation above is, by denition, the characteristic
polynomial of T , completing the proof when V is a complex vector
space.
Now suppose V is a real vector space. Let 1 , . . . , m denote the
eigenvalues of T and let (1 , 1 ), . . . , (M , M ) denote the eigenpairs
of T , each repeated according to multiplicity. Thus for x R, the
eigenvalues of xI T are x 1 , . . . , x m and, by 10.16, the eigenpairs
of xI T are
(2x 1 , x 2 + 1 x + 1 ), . . . , (2x M , x 2 + M x + M ),
each repeated according to multiplicity. Hence
det(xI T ) = (x 1 ) . . . (x m )(x 2 + 1 x + 1 ) . . . (x 2 + M x + M ).
The right side of the equation above is, by denition, the characteristic
polynomial of T , completing the proof when V is a real vector space.
Determinant of a Matrix
Most of this section is devoted to discovering how to calculate det T
from the matrix of T (with respect to an arbitrary basis). Lets start with
the easiest situation. Suppose V is a complex vector space, T L(V ),
and we choose a basis of V with respect to which T has an uppertriangular matrix. Then, as we noted in the last section, det T equals
the product of the diagonal entries of this matrix. Could such a simple
formula be true in general?
Unfortunately the determinant is more complicated than the trace.
In particular, det T need not equal the product of the diagonal entries
of M(T ) with respect to an arbitrary basis. For example, the operator
on F3 whose matrix equals 10.8 has determinant 13, as we saw in the
last section. However, the product of the diagonal entries of that matrix
equals 0.
For each square matrix A, we want to dene the determinant of A,
denoted det A, in such a way that det T = det M(T ) regardless of which
basis is used to compute M(T ). We begin our search for the correct definition of the determinant of a matrix by calculating the determinants
of some special operators.
Let c1 , . . . , cn F be nonzero scalars and let (v1 , . . . , vn ) be a basis
of V . Consider the operator T L(V ) such that M T , (v1 , . . . , vn )
equals
225
226
10.18
c1
cn
0
c2
0
..
.
..
cn1
here all entries of the matrix are 0 except for the upper-right corner
and along the line just below the diagonal. Lets nd the determinant
of T . Note that
(v1 , T v1 , T 2 v1 , . . . , T n1 v1 ) = (v1 , c1 v2 , c1 c2 v3 , . . . , c1 . . . cn1 vn ).
Thus (v1 , T v1 , . . . , T n1 v1 ) is linearly independent (the cs are all nonzero). Hence if p is a nonzero polynomial with degree at most n 1,
then p(T )v1 = 0. In other words, the minimal polynomial of T cannot
have degree less than n. As you should verify, T n vj = c1 . . . cn vj for
each j, and hence T n = c1 . . . cn I. Thus zn c1 . . . cn is the minimal
polynomial of T . Because n = dim V , we see that zn c1 . . . cn is also
the characteristic polynomial of T . Multiplying the constant term of
this polynomial by (1)n , we get
10.19
det T = (1)n1 c1 . . . cn .
Determinant of a Matrix
into lists of consecutive integers and in each list move the rst term to
the end of that list. For example, taking n = 9, the permutation
10.20
(2, 3, 1, 5, 6, 7, 4, 9, 8)
is obtained from (1, 2, 3), (4, 5, 6, 7), (8, 9) by moving the rst term of
each of these lists to the end, producing (2, 3, 1), (5, 6, 7, 4), (9, 8), and
then putting these together to form 10.20. Let T L(V ) be the operator
such that
10.21
T vk = ck vpk
A=
A1
0
..
AM
where each block is a square matrix of the form 10.18. The eigenvalues
of T equal the union of the eigenvalues of A1 , . . . , AM (see Exercise 3 in
Chapter 9). Recalling that the determinant of an operator on a complex
vector space is the product of the eigenvalues, we see that our denition
of the determinant of a square matrix should force
det A = (det A1 ) . . . (det AM ).
However, we already know how to compute the determinant of each Aj ,
which has the same form as 10.18 (of course with a different value of n).
Putting all this together, we see that we should have
det A = (1)n1 1 . . . (1)nM 1 c1 . . . cn ,
where Aj has size nj -by-nj . The number (1)n1 1 . . . (1)nM 1 is called
the sign of the permutation (p1 , . . . , pn ), denoted sign(p1 , . . . , pn ) (this
is a temporary denition that we will change to an equivalent denition
later, when we dene the sign of an arbitrary permutation).
227
228
To put this into a form that does not depend on the particular permutation (p1 , . . . , pn ), let aj,k denote the entry in row j, column k, of A;
thus
0 if j = pk ;
aj,k =
ck if j = pk .
Then
10.22
det A =
sign(m1 , . . . , mn ) am1 ,1 . . . amn ,n ,
because each summand is 0 except the one corresponding to the permutation (p1 , . . . , pn ).
Consider now an arbitrary matrix A with entry aj,k in row j, column k. Using the paragraph above as motivation, we guess that det A
should be dened by 10.22. This will turn out to be correct. We can
now dispense with the motivation and begin the more formal approach.
First we will need to dene the sign of an arbitrary permutation.
The sign of a permutation (m1 , . . . , mn ) is dened to be 1 if the
number of pairs of integers (j, k) with 1 j < k n such that j appears after k in the list (m1 , . . . , mn ) is even and 1 if the number of
such pairs is odd. In other words, the sign of a permutation equals 1 if
the natural order has been changed an even number of times and equals
1 if the natural order has been changed an odd number of times. For
example, in the permutation (2, 3, . . . , n, 1) the only pairs (j, k) with
j < k that appear with changed order are (1, 2), (1, 3), . . . , (1, n); because we have n 1 such pairs, the sign of this permutation equals
(1)n1 (note that the same quantity appeared in 10.19).
The permutation (2, 1, 3, 4), which is obtained from the permutation
(1, 2, 3, 4) by interchanging the rst two entries, has sign 1. The next
lemma shows that interchanging any two entries of any permutation
changes the sign of the permutation.
10.23 Lemma: Interchanging two entries in a permutation multiplies
the sign of the permutation by 1.
Proof: Suppose we have two permutations, where the second permutation is obtained from the rst by interchanging two entries. If the
two entries that we interchanged were in their natural order in the rst
permutation, then they no longer are in the second permutation, and
Determinant of a Matrix
229
vice versa, for a net change (so far) of 1 or 1 (both odd numbers) in
the number of pairs not in their natural order.
Consider each entry between the two interchanged entries. If an intermediate entry was originally in the natural order with respect to the
rst interchanged entry, then it no longer is, and vice versa. Similarly,
if an intermediate entry was originally in the natural order with respect
to the second interchanged entry, then it no longer is, and vice versa.
Thus the net change for each intermediate entry in the number of pairs
not in their natural order is 2, 0, or 2 (all even numbers).
For all the other entries, there is no change in the number of pairs
not in their natural order. Thus the total net change in the number of
pairs not in their natural order is an odd number. Thus the sign of the
second permutation equals 1 times the sign of the rst permutation.
If A is an n-by-n matrix
10.24
a1,1
.
A = ..
an,1
...
...
a1,n
..
. ,
an,n
sign(m1 , . . . , mn ) am1 ,1 . . . amn ,n .
10.25
det A =
(m1 ,...,mn )perm n
For example, if A is the 1-by-1 matrix [a1,1 ], then det A = a1,1 because perm 1 has only one element, namely, (1), which has sign 1. For
a more interesting example, consider a typical 2-by-2 matrix. Clearly
perm 2 has only two elements, namely, (1, 2), which has sign 1, and
(2, 1), which has sign 1. Thus
a1,1 a1,2
= a1,1 a2,2 a2,1 a1,2 .
10.26
det
a2,1 a2,2
To make sure you understand this process, you should now nd the
formula for the determinant of the 3-by-3 matrix
230
a1,1
..
.
A=
.
0
an,n
The permutation (1, 2, . . . , n) has sign 1 and thus contributes a term
of a1,1 . . . an,n to the sum 10.25 dening det A. Any other permutation
(m1 , . . . , mn ) perm n contains at least one entry mj with mj > j,
which means that amj ,j = 0 (because A is upper triangular). Thus all
the other terms in the sum 10.25 dening det A make no contribution. Hence det A = a1,1 . . . an,n . In other words, the determinant of an
upper-triangular matrix equals the product of the diagonal entries. In
particular, this means that if V is a complex vector space, T L(V ),
and we choose a basis of V with respect to which M(T ) is upper triangular, then det T = det M(T ). Our goal is to prove that this holds for
every basis of V , not just bases that give upper-triangular matrices.
Generalizing the computation from the paragraph above, next we
will show that if A is a block upper-triangular matrix
A1
..
,
A=
.
0
Am
where each Aj is a 1-by-1 or 2-by-2 matrix, then
10.27
Determinant of a Matrix
231
Our goal is to prove that det T = det M(T ) for every T L(V ) and
every basis of V . To do this, we will need to develop some properties of determinants of matrices. The lemma below is the rst of the
properties we will need.
232
We need to introduce notation that will allow us to represent a matrix in terms of its columns. If A is an n-by-n matrix
a1,1 . . . a1,n
.
..
A=
. ,
..
an,1 . . . an,n
then we can think of the kth column of A as an n-by-1 matrix
a1,k
.
ak =
.. .
an,k
We will write A in the form
[ a1
...
an ],
with the understanding that ak denotes the kth column of A. With this
notation, note that aj,k , with two subscripts, denotes an entry of A,
whereas ak , with one subscript, denotes a column of A.
The next lemma shows that a permutation of the columns of a matrix
changes the determinant by a factor of the sign of the permutation.
Some texts dene the
determinant to be the
function dened on the
square matrices that is
linear as a function of
each column separately
and that satises 10.30
and det I = 1. To prove
that such a function
exists and that it is
unique takes a
nontrivial amount of
work.
...
an ] is an n-by-n matrix.
amn ] = sign(m1 , . . . , mn ) det A.
Determinant of a Matrix
det A = det[ a1
...
ak
...
233
an ],
Theorem:
French mathematicians
Jacques Binet and
Proof: Let A = [ a1 . . .
of A. Let
b1,1 . . .
.
B = ..
bn,1 . . .
b1,n
..
. = [ b1
bn,n
...
bn ],
m1 =1
...
Aemn ],
mn =1
where the last equality comes from repeated applications of the linearity of det as a function of one column at a time. In the last sum above,
Augustin-Louis Cauchy.
234
...
Aemn ]
bm1 ,1 . . . bmn ,n sign(m1 , . . . , mn ) det A
= (det A)
sign(m1 , . . . , mn ) bm1 ,1 . . . bmn ,n
Determinant of a Matrix
Proof: Let T L(V ). As noted above, 10.32 implies that det M(T )
is independent of which basis of V we choose. Thus to show that
det T = det M(T )
for every basis of V , we need only show that the equation above holds
for some basis of V . We already did this (on page 230), choosing a basis
of V with respect to which M(T ) is an upper-triangular matrix (if V is a
complex vector space) or an appropriate block upper-triangular matrix
(if V is a real vector space).
If we know the matrix of an operator on a complex vector space, the
theorem above allows us to nd the product of all the eigenvalues without nding any of the eigenvalues. For example, consider the operator
on C5 whose matrix is
0 0 0 0 3
1 0 0 0 6
0 1 0 0 0 .
0 0 1 0 0
0 0 0 1 0
No one knows an exact formula for any of the eigenvalues of this operator. However, we do know that the product of the eigenvalues equals 3
because the determinant of the matrix above equals 3.
The theorem above also allows us easily to prove some useful properties about determinants of operators by shifting to the language of
determinants of matrices, where certain properties have already been
proved or are obvious. We carry out this procedure in the next corollary.
10.34
235
236
Volume
Most applied
mathematicians agree
that determinants
should rarely be used
in serious numeric
calculations.
We proved the basic results of linear algebra before introducing determinants in this nal chapter. Though determinants have value as a
research tool in more advanced subjects, they play little role in basic
linear algebra (when the subject is done right). Determinants do have
one important application in undergraduate mathematics, namely, in
computing certain volumes and integrals. In this nal section we will
use the linear algebra we have learned to make clear the connection
between determinants and these applications. Thus we will be dealing
with a part of analysis that uses linear algebra.
We begin with some purely linear algebra results that will be useful when investigating volumes. Recall that an isometry on an innerproduct space is an operator that preserves norms. The next result
shows that every isometry has determinant with absolute value 1.
10.35 Proposition: Suppose that V is an inner-product space. If
S L(V ) is an isometry, then |det S| = 1.
Proof: Suppose S L(V ) is an isometry. First consider the case
where V is a complex inner-product space. Then all the eigenvalues of S
have absolute value 1 (by 7.37). Thus the product of the eigenvalues
of S, counting multiplicity, has absolute value one. In other words,
|det S| = 1, as desired.
Volume
Now suppose V is a real inner-product space. Then there is an orthonormal basis of V with respect to which S has a block diagonal matrix, where each block on the diagonal is a 1-by-1 matrix containing 1 or −1 or a 2-by-2 matrix of the form

10.36
[ cos θ   −sin θ ]
[ sin θ    cos θ ],

with θ ∈ (0, π) (see 7.38). Note that the constant term of the characteristic polynomial of each matrix of the form 10.36 equals 1 (because cos²θ + sin²θ = 1). Hence the second coordinate of every eigenpair of S equals 1. Thus the determinant of S is the product of 1's and −1's. In particular, |det S| = 1, as desired.
Suppose V is a real inner-product space and S ∈ L(V) is an isometry. By the proposition above, the determinant of S equals 1 or −1. Note that

{v ∈ V : Sv = −v}

is the subspace of V consisting of all eigenvectors of S corresponding to the eigenvalue −1 (or is the subspace {0} if −1 is not an eigenvalue of S). Thinking geometrically, we could say that this is the subspace on which S reverses direction. A careful examination of the proof of the last proposition shows that det S = 1 if this subspace has even dimension and det S = −1 if this subspace has odd dimension.
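For example, let S ∈ L(R²) be the reflection S(x, y) = (x, −y), whose matrix with respect to the standard basis is

[ 1   0 ]
[ 0  −1 ].

Then S is an isometry, the subspace {v ∈ R² : Sv = −v} is the line spanned by (0, 1), which has odd dimension 1, and indeed det S = −1.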
A self-adjoint operator on a real inner-product space has no eigenpairs (by 7.11). Thus the determinant of a self-adjoint operator on a
real inner-product space equals the product of its eigenvalues, counting multiplicity (of course, this holds for any operator, self-adjoint or
not, on a complex vector space).
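For example, the self-adjoint operator on R² whose matrix with respect to the standard basis is

[ 2  1 ]
[ 1  2 ]

has eigenvalues 1 and 3, with eigenvectors (1, −1) and (1, 1), and its determinant equals 2·2 − 1·1 = 3, the product of those eigenvalues.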
Recall that if V is an inner-product space and T ∈ L(V), then T*T is a positive operator and hence has a unique positive square root, denoted √(T*T) (see 7.27 and 7.28). Because √(T*T) is positive, all its eigenvalues are nonnegative (again, see 7.27), and hence its determinant is nonnegative. Thus in the corollary below, no absolute value is needed on the right side of the equation.

Corollary: Suppose V is an inner-product space. If T ∈ L(V), then

|det T| = det √(T*T).

(Another proof of this corollary is suggested in Exercise 24 in this chapter.)

Proof: By the polar decomposition (7.41), there is an isometry S ∈ L(V) such that

T = S √(T*T).

Thus

|det T| = |det S| det √(T*T)
        = det √(T*T),

where the first equality follows from 10.34 and the second equality follows from 10.35.
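For example, let T be the operator on R² whose matrix with respect to the standard basis is

[ 0  2 ]
[ 1  0 ],

so that det T = 0·0 − 2·1 = −2. The matrix of T* (the conjugate transpose) is

[ 0  1 ]
[ 2  0 ],

so the matrix of T*T is

[ 1  0 ]
[ 0  4 ]

and √(T*T) has matrix

[ 1  0 ]
[ 0  2 ].

Thus det √(T*T) = 2 = |det T|, as the corollary asserts; the absolute value on the left side is needed here because det T is negative.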
By the polar decomposition (7.41), every operator T ∈ L(R^n) can be written in the form

T = S √(T*T),

where S is an isometry and √(T*T) is a positive operator. An isometry preserves distances and hence does not change the volume of any subset of R^n. A positive operator stretches R^n along the directions of an orthonormal basis of eigenvectors, with stretching factors equal to the eigenvalues, and hence changes volumes by a factor equal to the product of its eigenvalues, which is its determinant. These observations lead to the following result.

10.38 Theorem: Suppose T ∈ L(R^n). Then

volume T(Ω) = |det T| (volume Ω)

for every Ω ⊂ R^n.
Proof: First consider the case where T ∈ L(R^n) is a positive operator. Let λ1, …, λn be the eigenvalues of T, repeated according to multiplicity. Each of these eigenvalues is a nonnegative number (see 7.27). By the real spectral theorem (7.13), there is an orthonormal basis (e1, …, en) of R^n such that Tej = λj ej for each j. As discussed above, this implies that T changes volumes by a factor of det T.

Now suppose T ∈ L(R^n) is an arbitrary operator. By the polar decomposition (7.41), there is an isometry S ∈ L(R^n) such that
T = S √(T*T).

If Ω ⊂ R^n, then T(Ω) = S(√(T*T)(Ω)). Thus

volume T(Ω) = volume S(√(T*T)(Ω))
            = volume √(T*T)(Ω)
            = (det √(T*T))(volume Ω)
            = |det T|(volume Ω),

where the second equality holds because volumes are not changed by the isometry S (as discussed above), the third equality holds by the first part of this proof applied to the positive operator √(T*T), and the last equality holds by the corollary above.
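For example, if T ∈ L(R²) has matrix

[ 2  0 ]
[ 0  3 ]

with respect to the standard basis, then T maps the unit square [0, 1] × [0, 1] onto the rectangle [0, 2] × [0, 3], whose area is 6 = det T. Replacing the entry 2 with −2 gives an operator that maps the unit square onto [−2, 0] × [0, 3], which still has area 6 = |det T| even though det T = −6.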
Now we turn from volumes to integrals. Suppose Ω is a subset of R^n and σ is a function from Ω to R^n; write σ(x) = (σ1(x), …, σn(x)), where each σj is a function from Ω to R. The function σ is called differentiable at a point x ∈ Ω if there exists an operator σ'(x) ∈ L(R^n), called the derivative of σ at x, such that

σ(x + h) ≈ σ(x) + σ'(x)h

for small h ∈ R^n, with an error that is small even compared to ‖h‖. (If n = 1, then the derivative in this sense is the operator on R of multiplication by the derivative in the usual sense of one-variable calculus.) Thus if Γ is a small subset of Ω containing x, then σ(Γ) is approximately a translate of σ'(x)(Γ), and because translations do not change volumes,

volume σ(Γ) ≈ volume σ'(x)(Γ).

For x ∈ Ω, let Dk σj(x) denote the partial derivative of σj at x with respect to the k-th coordinate. If σ is differentiable at x, then the matrix of σ'(x) with respect to the standard basis of R^n contains Dk σj(x) in row j, column k (we will not prove this). In other words,

10.39
M(σ'(x)) =
[ D1σ1(x) … Dnσ1(x) ]
[    ⋮          ⋮    ]
[ D1σn(x) … Dnσn(x) ].
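For instance, if n = 2 and σ(x1, x2) = (x1², x1x2), then D1σ1(x) = 2x1, D2σ1(x) = 0, D1σ2(x) = x2, and D2σ2(x) = x1, so 10.39 gives

M(σ'(x)) =
[ 2x1   0 ]
[ x2   x1 ]

and det σ'(x) = 2x1².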
Suppose that σ is differentiable at each point of Ω and that σ is injective on Ω. Let f be a real-valued function defined on σ(Ω). Let x ∈ Ω and let Γ be a small subset of Ω containing x. As we noted above,

volume σ(Γ) ≈ volume σ'(x)(Γ),

where the symbol ≈ means approximately equal to. Using 10.38, this becomes

volume σ(Γ) ≈ |det σ'(x)|(volume Γ).

Let y = σ(x). Multiply the left side of the equation above by f(y) and the right side by f(x) (because y = σ(x), these two quantities are equal), getting

10.40
f(y) volume σ(Γ) ≈ f(x) |det σ'(x)|(volume Γ).

Now divide Ω into many small pieces and add the corresponding versions of 10.40, getting

10.41
∫_{σ(Ω)} f(y) dy = ∫_{Ω} f(x) |det σ'(x)| dx.
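When n = 1 and σ is injective and differentiable on an interval Ω, formula 10.41 reduces to the substitution rule of one-variable calculus:

∫_{σ(Ω)} f(y) dy = ∫_{Ω} f(σ(x)) |σ'(x)| dx,

where the absolute value appears because σ(Ω) is being treated simply as a set, with no orientation attached.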
The most familiar examples of the change of variables formula 10.41 come from polar and spherical coordinates. For polar coordinates, define σ(r, θ) = (r cos θ, r sin θ). Then the matrix of σ'(r, θ) is

[ cos θ   −r sin θ ]
[ sin θ    r cos θ ],

whose determinant equals r, which explains the factor of r that appears when computing an integral in polar coordinates. For spherical coordinates, define σ(ρ, φ, θ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ). Then the matrix of σ'(ρ, φ, θ) is

[ sin φ cos θ   ρ cos φ cos θ   −ρ sin φ sin θ ]
[ sin φ sin θ   ρ cos φ sin θ    ρ sin φ cos θ ]
[ cos φ        −ρ sin φ          0             ],

as you should verify. You should also verify that the determinant of the matrix above equals ρ² sin φ, thus explaining why a factor of ρ² sin φ is needed when computing an integral in spherical coordinates.
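For the polar-coordinate matrix, the determinant computation takes a single line:

(cos θ)(r cos θ) − (−r sin θ)(sin θ) = r(cos²θ + sin²θ) = r;

the spherical-coordinate computation is similar but longer.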
Exercises
1.
2. Prove that if A and B are square matrices of the same size and AB = I, then BA = I.

3. Suppose T ∈ L(V) has the same matrix with respect to every basis of V. Prove that T is a scalar multiple of the identity operator.
4.
5.
6.
7.
8.
9.
10.
11.
Exercises
12. Suppose T is the operator on C³ whose matrix is

[ 51  −12  −21 ]
[ 60  −40  −28 ]
[ 57  −68    1 ].

Someone tells you (accurately) that −48 and 24 are eigenvalues of T. Without using a computer or writing anything down, find the third eigenvalue of T.
13.
14.
15.
16.
17. Suppose V is a complex inner-product space, T ∈ L(V), and λ1, …, λn are the eigenvalues of T, repeated according to multiplicity. Suppose

[ a_{1,1} … a_{1,n} ]
[    ⋮         ⋮    ]
[ a_{n,1} … a_{n,n} ]

is the matrix of T with respect to some orthonormal basis of V. Prove that

|λ1|² + ⋯ + |λn|² ≤ Σ_{k=1}^{n} Σ_{j=1}^{n} |a_{j,k}|².
18.
19.
(Exercise 19 fails on infinite-dimensional inner-product spaces, leading to what are called hyponormal operators, which have a well-developed theory.)
20.
21.
22.
A =
[ A1           ]
[      ⋱       ]
[ 0         Am ]
24.
25.
Symbol Index
R, 2
C, 2
F, 3
F^n, 5
F^∞, 10
P(F), 10
, 23
P_m(F), 23
dim V, 31
L(V, W), 38
I, 38
null T, 41
range T, 43
M(T, (v1, …, vn), (w1, …, wm)), 48
M(T), 48
Mat(m, n, F), 50
M(v), 52
M(v, (v1, …, vn)), 52
T⁻¹, 54
L(V), 57
deg p, 66
Re, 69
Im, 69
z̄, 69
|z|, 69
T|_U, 76
M(T, (v1, …, vn)), 82
M(T), 82
P_{U,W}, 92
⟨u, v⟩, 99
‖v‖, 102
U^⊥, 111
P_U, 113
T*, 118
, 120
√T, 146
, 166
, 231
T(Ω), 239
∫ f, 241
D_k, 241
≈, 242
Index
absolute value, 69
addition, 9
adjoint, 118
basis, 27
block diagonal matrix, 142
block upper-triangular matrix, 195
Cauchy-Schwarz inequality, 104
Cayley-Hamilton theorem for complex vector spaces, 173
Cayley-Hamilton theorem for real vector spaces, 207
change of basis, 216
characteristic polynomial of a 2-by-2 matrix, 199
characteristic polynomial of an operator on a complex vector space, 172
characteristic polynomial of an operator on a real vector space, 206
characteristic value, 77
closed under addition, 13
closed under scalar multiplication, 13
complex conjugate, 69
complex number, 2
complex spectral theorem, 133
complex vector space, 10
conjugate transpose, 120
coordinate, 4
cube, 238
cube root of an operator, 159
degree, 22
derivative, 241
determinant of a matrix, 229
determinant of an operator, 222
diagonal matrix, 87
diagonal of a matrix, 83
differentiable, 241
dimension, 31
direct sum, 15
divide, 180
division algorithm, 66
dot product, 98
eigenpair, 205
eigenvalue of a matrix, 194
eigenvalue of an operator, 77
eigenvector, 77
permutation, 226
perpendicular, 102
point, 10
polynomial, 10
positive operator, 144
positive semidefinite operator, 144
product, 41
projection, 92
Pythagorean theorem, 102
range, 43
real part, 69
real spectral theorem, 136
real vector space, 9
root, 64
scalar, 3
scalar multiplication, 9
self-adjoint, 128
sign of a permutation, 228
signum, 228
singular matrix, 214
singular values, 155
span, 22
spans, 22
spectral theorem, 133, 136
square root of an operator, 145, 159
standard basis, 27
subspace, 13
sum of subspaces, 14
surjective, 44
trace of a square matrix, 218
trace of an operator, 217
transpose, 120
triangle inequality, 105
tuple, 4
unitary operator, 148
upper triangular, 83
vector, 6, 10
vector space, 9
volume, 238