

LECTURES ON
LINEAR ALGEBRA

LECTURES ON
LINEAR ALGEBRA

Donald S Passman
University of Wisconsin-Madison, USA

World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI • TOKYO

Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

Library of Congress Control Number: 2022006322

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library.

LECTURES ON LINEAR ALGEBRA

Copyright © 2022 by World Scientific Publishing Co. Pte. Ltd.


All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or
mechanical, including photocopying, recording or any information storage and retrieval system now known or to
be invented, without written permission from the publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center,
Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from
the publisher.

ISBN 978-981-125-484-0 (hardcover)


ISBN 978-981-125-499-4 (paperback)
ISBN 978-981-125-485-7 (ebook for institutions)
ISBN 978-981-125-486-4 (ebook for individuals)

For any available supplementary material, please visit


https://www.worldscientific.com/worldscibooks/10.1142/12793#t=suppl

Printed in Singapore

Preface
These are the notes from a one-year, junior-senior level linear alge-
bra course I offered at the University of Wisconsin-Madison some 40
plus years ago. The notes were actually handwritten on ditto masters,
run off and given to the class at the time of the lectures. Each section
corresponds to about one week or three one-hour lectures.
The course was very enjoyable and I even remember three particular
A students, but not their names. One always had his hand raised to
answer every question I posed to the class. I told him he reminded me
of myself when I was an undergraduate. The second had a wonderful
sense of humor and always included jokes in each of his homework
assignments. The third was a sophomore basketball player. Of course,
the U.W. athletics department just couldn’t accept the fact that he
was a strong math student.
In those years, I believed that elementary row and column opera-
tions should not be part of such a course, but when I began to translate
these notes into TeX, I relented. So I added material on this topic. I
also added a brief final chapter on infinite dimensional vector spaces,
including the existence of bases and dimension. The proofs here might
be considered a bit skimpy.
Most undergraduate linear algebra courses are not as sophisticated
as this one was. Most math graduate students are assumed to have
had a good course in linear algebra, but sadly many have not. Reading
these notes might be an appropriate way to fill in the gap.
D. S. Passman
Madison, Wisconsin
San Diego, California
February 2021

Contents

Preface v
Chapter I. Vector Spaces 1
1. Fields 2
2. Vector Spaces 9
3. Subspaces 17
4. Spanning and Linear Independence 24
5. The Replacement Theorem 31
6. Matrices and Elementary Operations 39
7. Linear Equations and Bases 46
Chapter II. Linear Transformations 53
8. Linear Transformations 54
9. Kernels and Images 61
10. Quotient Spaces 68
11. Matrix Correspondence 77
12. Products of Linear Transformations 86
13. Eigenvalues and Eigenvectors 95
Chapter III. Determinants 103
14. Volume Functions 104
15. Determinants 114
16. Consequences of Uniqueness 123
17. Adjoints and Inverses 133
18. The Characteristic Polynomial 142
19. The Cayley-Hamilton Theorem 150
20. Nilpotent Transformations 157
21. Jordan Canonical Form 165

Chapter IV. Bilinear Forms 171


22. Bilinear Forms 172
23. Symmetric and Skew-Symmetric Forms 178
24. Congruent Matrices 184
25. Inner Product Spaces 190
26. Inequalities 197

27. Real Symmetric Matrices 206


28. Complex Analogs 213
Chapter V. Infinite Dimensional Spaces 219
29. Existence of Bases 220
30. Existence of Dimension 229
Index 239

CHAPTER I

Vector Spaces


1. Fields
Linear algebra is basically the study of linear equations. Let us
start with a very simple example like 4x + 3 = 5 or more generally
ax + b = c. Here of course x is the unknown and a, b and c are known
quantities. Now this is really a simple equation and we all know how
to solve for x. But before we seek the unknown, perhaps we had better
be sure we know the knowns.
First, a, b and c must belong to some system S with an arithmetic.
There is an addition denoted by + and a multiplication indicated by
juxtaposition, that is the elements to be multiplied are written adjacent
to each other. For example, S could be the set of integers, rational
numbers Q, real numbers R or complex numbers C. But, as we will
see, the set of integers is really inadequate.
Let us now try to solve for x. Clearly ax+b = c yields ax = c−b and
then x = (c − b)/a. We see that in order to solve even the simplest of
all equations, S must also have a subtraction (the opposite of addition)
and a division (the opposite of multiplication). Now the quotient of two
integers need not be an integer and we see that the original equation
4x + 3 = 5 in fact has no integer solution (since of course x must be
1/2). Thus the integers are not adequate, but it turns out that Q, R
and C are.
We can now develop three theories of linear algebra, one for each
of Q, R and C. What we would discover is that most theorems proved
in one of these situations would carry over to the other two cases and
the proofs would be the same. Since it is rather silly to do things three
times when one will suffice, we seek the common property of these three
sets that make everything work. The common property is that they
are fields.
Formally, a field F is a set of elements with two operations, addition
and multiplication, defined that satisfy a number of axioms. First there
are the axioms of addition.

A1. For all a, b ∈ F , we have
a + b ∈ F    (closure)
A2. For all a, b, c ∈ F , we have
a + (b + c) = (a + b) + c    (associative law)
A3. For all a, b ∈ F , we have
a + b = b + a    (commutative law)


A4. There exists a unique element 0 ∈ F , called zero, with the


property that for all a ∈ F
a + 0 = 0 + a = a    (zero)
A5. For each a ∈ F there is a unique element −a ∈ F , its negative,
such that
a + (−a) = (−a) + a = 0 (negative)

Let us consider these in more detail. As we said, the system we


work in must have an arithmetic and axiom A1 says that we can add
together any two elements of F and get an element of F . Thus the
sum of two real numbers is real and the sum of two rational numbers
is rational. Associativity is a little harder to understand. We already
know how to add two elements of F , but how do we add three? How
do we make sense out of a + b + c ? We might first add a to b, find the
answer and then add this to c. That is, we find (a + b) + c. On the
other hand, we might add b to c, find the answer and then add this
to a obtaining a + (b + c). Axiom A2 says that either way we do it,
we get the same answer. This therefore allows us to define the sum
of three elements unambiguously. Finally, it is not hard to show (see
Problem 1.4) that associativity allows us to define the sum of any finite
number of elements in an unambiguous manner.
Axiom A3 is a nice assumption. We can add two elements without
worrying about which one comes first. Commutativity and associativ-
ity have many everyday consequences. For example, let us consider a
simple problem in column addition.
1
2
3
4
+ 5

We usually start at the top. We add 1 + 2 = 3, we add this to 3


getting 6, we then add this to 4 getting 10, and finally we add 10 + 5 =
15. Of course, what we have really done is computed the expression
(((1 + 2) + 3) + 4) + 5 = 15. Next, we check our arithmetic by adding
upwards and obtain (((5 + 4) + 3) + 2) + 1 = 15 since 5 + 4 = 9,
9 + 3 = 12, 12 + 2 = 14 and 14 + 1 = 15. Thus the answers agree.
At first glance we think this is obviously what should happen. But at
second glance we see that the equality
(((1 + 2) + 3) + 4) + 5 = (((5 + 4) + 3) + 2) + 1


is in fact miraculous. It is a consequence of the two axioms of associa-


tivity and commutativity.
The final two axioms tell us that F is big enough to allow for
a solution of certain linear equations, namely equations of the form
x + a = b. Thus if a = b then axiom A4 says that x = 0 is a solution,
and if b = 0 then A5 says that x = −a works. In the general case the
solution is x = b + (−a). Let us check that it works. We have
x + a = (b + (−a)) + a
      = b + ((−a) + a)    by the associative law
      = b + 0             by the definition of −a
      = b                 by the definition of 0
We might observe that a short hand notation for b + (−a) is b − a
and thus F has a subtraction. In many ways subtraction is not as nice
an operation as addition. Can we say “subtract a and b”? Certainly
not. We must know which of a or b comes first and this means that
subtraction is not commutative. Can we subtract a column of figures?
Again no, since subtraction is not associative.
The axioms for multiplication are similar to those of addition.

M1. For all a, b ∈ F , we have


ab ∈ F (closure)
M2. For all a, b, c ∈ F , we have
a(bc) = (ab)c (associative law)
M3. For all a, b ∈ F , we have
ab = ba (commutative law)
M4. There exists a unique element 1 ∈ F , called one, with the
property that for all a ∈ F
a1 = 1a = a
Furthermore, 1 ≠ 0.    (one)
M5. For each a ∈ F with a ≠ 0 there is a unique element a⁻¹ ∈ F ,
its inverse, such that
aa⁻¹ = a⁻¹a = 1    (inverse)

Again associativity and commutativity allow us to multiply any


finite number of elements of F unambiguously. Moreover the latter
two axioms allow us to solve the equation ax + b = c with a ≠ 0. The


solution is x = a⁻¹(c + (−b)) which we check as follows. With this x


we have

ax + b = a(a⁻¹(c + (−b))) + b
       = (aa⁻¹)(c + (−b)) + b    by the associative law for multiplication
       = 1(c + (−b)) + b         by the definition of a⁻¹
       = (c + (−b)) + b          by the definition of 1
       = c + ((−b) + b)          by the associative law for addition
       = c + 0                   by the definition of −b
       = c                       by the definition of 0

So the equation has a solution.
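To make the recipe concrete, here is a small computational sketch (my own illustration, not part of the original notes; the function name is invented) that carries out x = a⁻¹(c − b) over the field Q using Python's exact rational arithmetic and checks it on the opening example 4x + 3 = 5.

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Return the x with a*x + b = c over Q, assuming a is nonzero."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return (c - b) / a  # x = a^(-1) (c - b)

a, b, c = Fraction(4), Fraction(3), Fraction(5)
x = solve_linear(a, b, c)
print(x)               # 1/2
print(a * x + b == c)  # True
```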


Finally, there is one more axiom. Up till now addition and multi-
plication have been considered separately. This last axiom intertwines
the two.
D. For all a, b, c ∈ F we have

a(b + c) = ab + ac, and


(b + c)a = ba + ca (distributive law)

The effect of this on the study of linear equations will be apparent in


the next section. However, there are a few simple consequences that
we state below. Note that part (i) involves 0, an additive element,
along with multiplication. Thus, it is clear that some relation between
addition and multiplication must be used in the proof. Namely, we
have to use the distributive law. We will see how it applies early in the
following argument.

Lemma 1.1. Let F be a field and let a, b ∈ F .


i. a0 = 0a = 0
ii. (−a)b = a(−b) = −(ab)

Proof. (i ) To start with we note that 0 + 0 = 0 and thus the


distributive law implies that

a0 = a(0 + 0) = a0 + a0


For convenience, write c = a0. Then c = c + c and by adding −c to


both sides of this equation we obtain
0 = c + (−c)          by definition of −c
  = (c + c) + (−c)    since c = c + c
  = c + (c + (−c))    by associativity
  = c + 0             by definition of −c
  = c                 by definition of 0
Thus a0 = c = 0 and the commutative law yields 0a = a0 = 0.
(ii) We have
(−a)b + ab = ((−a) + a)b    by the distributive law
           = 0b             by definition of −a
           = 0              using (i ) above
Thus (−a)b is an element of F which when added to ab yields 0. By
definition of −(ab), we must have (−a)b = −(ab). Similarly we can
show that a(−b) = −(ab). 
We now consider some examples of fields.
Example 1.1. The set of all rational numbers, that is ordinary
fractions, is a field which we denote by Q.
Example 1.2. The set of all real numbers is a field which we denote
by R. As is well known, the real numbers can be put into one-to-one
correspondence with the points on a line. We start with two points
labeled 0 and 1. Once these are fixed, we can then put in all positive
and negative integer points by measuring off multiples of the length
of the interval [0, 1]. Next, by subdividing all the intervals [0, n] into
m equal pieces, we can designate all the rational points. Finally, we
somehow “fill in all the holes” with the remaining real numbers.

-3 -2 -1 0 1 2 3

Example 1.3. The set of all complex numbers is a field which we


denote by C. This field has a property which will be of interest to us.
Not only do all linear equations in C have solutions in C, as we saw
above, but in fact all polynomial equations have solutions. Namely let
c0 , c1 , . . . , cn ∈ C with n ≥ 1 and cn ≠ 0, and consider the equation
cn xn + cn−1 xn−1 + · · · + c1 x + c0 = 0
in the unknown x. Then there exists x ∈ C that satisfies the equa-
tion. This property characterizes what are known as algebraically closed


fields. At some point, we will see that by assuming that our field F is
algebraically closed, we will obtain much nicer theorems on the struc-
ture of certain functions called linear transformations.
Example 1.4. The above fields all have infinitely many elements.
There are however finite fields. The simplest has just two elements.
Think about the arithmetic of the ordinary integers and the way even
and odd numbers add and multiply. We define a field with two elements
“even” and “odd” with addition and multiplication given by

  +   | even  odd              ×   | even  odd
 even | even  odd             even | even  even
 odd  | odd   even            odd  | even  odd

Here we have described the arithmetic in terms of addition and multi-


plication tables. It is not hard to check that F = {even, odd} is indeed
a field. Note that even plays the role of 0 and odd plays the role of the
element 1.
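Since even plays the role of 0 and odd the role of 1, this arithmetic is just integer arithmetic modulo 2. A minimal sketch (an illustration of mine, not from the text) that reproduces the two tables:

```python
def f2_add(a, b):
    """Addition in the two-element field: integers mod 2."""
    return (a + b) % 2

def f2_mul(a, b):
    """Multiplication in the two-element field: integers mod 2."""
    return (a * b) % 2

names = {0: "even", 1: "odd"}
for a in (0, 1):
    for b in (0, 1):
        print(f"{names[a]} + {names[b]} = {names[f2_add(a, b)]},  "
              f"{names[a]} x {names[b]} = {names[f2_mul(a, b)]}")
# Matches the tables above: odd + odd = even, odd x odd = odd, and so on.
```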
In the remainder of these lectures the field we work with will play
a secondary role. Therefore the reader can, if he or she wishes, assume
that F is one of the more familiar examples Q, R or C. Also the
structure and properties of F are assumed to be known. In particular,
we will perform the necessary arithmetic operations in F in the familiar
fashion, without undue concern with fine points like parentheses or the
order of addition. We know that the associative and commutative laws
guarantee the validity of these simplifications.

Problems

1.1. Discuss why we abstract the concept of a field.


1.2. Convince yourself that Q, R and C are all fields.
Let F be a field and let a1 , a2 , . . . , an ∈ F .
1.3. Consider the column addition of the n field elements listed
above. Find the sums adding up and adding down. Show that these
sums are the same for n = 4. Try the general case.
1.4. Prove that all possible products of a1 , a2 , . . . , an in that order
are equal to (· · · (((a1 a2 )a3 )a4 ) · · · )an . Try this first for small values
of n. This is the generalized associative law that allows us to define
unambiguously the product a1 a2 · · · an .


1.5. Prove that all products of a1 , a2 , . . . , an in any order are equal.


This is the generalized commutative law.
1.6. Let a, b, c ∈ F with a ≠ 0. In the text it was shown that
x = a⁻¹(c + (−b)) = a⁻¹(c − b) is a solution of the equation ax + b = c.
Prove that this is the unique solution.
1.7. Let a, b, c, d, e, f ∈ F . Find the solution to the simultaneous
linear equations
ax + by = e
cx + dy = f
under the assumption that ad − bc ≠ 0. What happens if ad − bc = 0?
1.8. Let Q[i] denote the set of all complex numbers of the form
a + bi with a, b ∈ Q and i = √−1. Prove that Q[i] is a field.
1.9. Let Q[√2] denote the set of all real numbers of the form a + b√2
with a, b ∈ Q. Prove that Q[√2] is a field.
1.10. Construct a field containing precisely three elements. You
may describe the arithmetic by means of addition and multiplication
tables.


2. Vector Spaces
Let F be a fixed field and let e, f, g ∈ F . Consider the linear
equation
ex + f y + gz = 0
with x, y, z as unknowns. We denote solutions of this, with x, y, z ∈ F ,
by ordered triples (x, y, z) and let S denote the set of all such solutions.

Suppose (x, y, z) and (x′, y′, z′) are in S. Then by the distributive
law and the other axioms of F we have
e(x + x′) + f (y + y′) + g(z + z′)
    = (ex + f y + gz) + (ex′ + f y′ + gz′)
    = 0 + 0 = 0
Thus (x + x′, y + y′, z + z′) ∈ S. Now this looks like a summation and
we are therefore tempted to define an addition in S by
(x, y, z) + (x′, y′, z′) = (x + x′, y + y′, z + z′)
Now let (x, y, z) ∈ S and let a ∈ F . Then
e(ax) + f (ay) + g(az) = a(ex + f y + gz) = a0 = 0
and thus (ax, ay, az) ∈ S. Again this looks like a suitable product and
we are therefore tempted to define a scalar multiplication in S, that is
a multiplication of an element of S by an element of F , by means of
a(x, y, z) = (ax, ay, az)
In this way, S is a system that depends on a field F and has defined
on it a nice addition and scalar multiplication. Such systems occur
frequently in mathematics, as our later examples will indicate, and are
known as vector spaces.
Formally, a vector space V over F is a set of elements, called vec-
tors, with an addition and scalar multiplication satisfying a number of
axioms. The axioms of addition say that V has as nice an addition as
does the field F . In fact, the axioms are the same. Observe that we
use Greek letters to denote vectors.

A1. For all α, β ∈ V , we have


α + β ∈ V    (closure)
A2. For all α, β, γ ∈ V , we have
α + (β + γ) = (α + β) + γ (associative law)


A3. For all α, β ∈ V , we have


α + β = β + α    (commutative law)
A4. There exists a unique vector 0 ∈ V , called zero, with the prop-
erty that for all α ∈ V
α + 0 = 0 + α = α    (zero)
A5. For each α ∈ V there is a unique vector −α ∈ V , its negative,
such that
α + (−α) = (−α) + α = 0 (negative)

Note that the same symbol 0 is used to denote the zero in F as well as
the zero vector in V . However this almost never causes confusion.
Thus a vector space V satisfies the same addition axioms as does
a field. This means, for example, that we can define unambiguously
α1 + α2 + · · · + αn the sum of any finite number of vectors and that this
sum is independent of the order in which the summands are written.
It means also that we can always solve equations of the form x + α = β
for x ∈ V . In fact, just about anything true about the addition in fields
carries over to V . It is easy to check that the addition defined on S
satisfies these axioms.
The axioms of multiplication are a little different. Remember we
can only multiply a vector by a field element and the field element
always occurs on the left. (This last part is really just a matter of
convention.) In particular, there is no commutative law and only one
possibility for associativity.

M1. For all a ∈ F and α ∈ V , we have


aα ∈ V (closure)
M2. For all a, b ∈ F and α ∈ V , we have
a(bα) = (ab)α (associative law)
M3. For all α ∈ V , we have
1α = α
where 1 is the one element of F . (unital law)

Again associativity guarantees that we can define unambiguously


the product vector a1 a2 · · · an α ∈ V with a1 , a2 , . . . , an ∈ F and α ∈ V .
The unital axiom assures us that this multiplication is nontrivial and
meaningful. For example if we were to define a ∗ α = 0 for all a ∈ F


and α ∈ V , then this multiplication would satisfy all axioms except


M3. But certainly this ∗ multiplication is trivial and uninteresting.
Now let us consider the following vector equation

ax + α = β

with α, β ∈ V , a ∈ F and a ≠ 0. We claim that x = a⁻¹(β + (−α)) is


a solution and we verify this as follows.

ax + α = a(a⁻¹(β + (−α))) + α
       = (aa⁻¹)(β + (−α)) + α    by the associative law of multiplication
       = 1(β + (−α)) + α         by definition of a⁻¹
       = (β + (−α)) + α          by the unital law
       = β + ((−α) + α)          by the associative law of addition
       = β + 0                   by definition of −α
       = β                       by definition of 0

Therefore x is indeed a solution. Notice how the unital law gets used
above. In the future we denote β + (−α) by β − α.
The remaining two axioms intertwine addition and multiplication.

D1. For all a ∈ F and α, β ∈ V we have

a(α + β) = aα + aβ (distributive law)

D2. For all a, b ∈ F and α ∈ V we have

(a + b)α = aα + bα (distributive law)

Observe that unlike the two distributive laws for fields which are basi-
cally the same, the two laws here are really different. This is because
scalar multiplication is not only not commutative, but in fact it is not
even defined in the other order.
Let us consider some examples of vector spaces.

Example 2.1. The simplest nontrivial example of a vector space is


F itself. Here vector addition is just ordinary addition in V = F and
scalar multiplication is just ordinary multiplication.


Example 2.2. Let U be a set and let V be the set of all functions
from U into F . If α, β ∈ V and a ∈ F , then we define the functions
α + β and aα from U to F by
(α + β)(u) = α(u) + β(u)
(aα)(u) = a(α(u)) for all u ∈ U
It is easy to check that with these definitions V is a vector space over
the field F .
Example 2.3. Now the field C is endowed with a nice arithmetic
and it contains R as a subfield. Thus there is an addition in C and a
natural multiplication of elements of C by elements of R. In this way
we see easily that C is a vector space over R.
More generally, if K is any field containing F , then by suitably
restricting the multiplication we can make K a vector space over F .
This example is not as silly as it first appears. For suppose that we
know a good deal about F , but very little about K. Then it is indeed
possible to use certain vector space results to deduce information about
K. This is in fact a very important tool in the study of fields.
Example 2.4. Perhaps the most popular examples of vector spaces
are as follows. For each integer n ≥ 1, let F n denote the set of n-tuples
of elements of F . Thus F n = {(a1 , a2 , . . . , an ) | ai ∈ F } and of course
(a1 , a2 , . . . , an ) = (b1 , b2 , . . . , bn ) if and only if the corresponding entries
are equal, that is a1 = b1 , a2 = b2 , . . . , an = bn . Now let α, β ∈ F n
with
α = (a1 , a2 , . . . , an ), β = (b1 , b2 , . . . , bn )
and let c ∈ F . Then we define
α + β = (a1 + b1 , a2 + b2 , . . . , an + bn ) ∈ F n
and
cα = (ca1 , ca2 , . . . , can ) ∈ F n
The axioms of F carry over easily to show that F n is a vector space
over F .
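As a small illustration (mine, not from the text), the componentwise operations on F n can be written out directly; here F = Q and vectors are tuples of exact fractions.

```python
from fractions import Fraction

def vec_add(alpha, beta):
    """Componentwise addition in F^n."""
    return tuple(a + b for a, b in zip(alpha, beta))

def scalar_mul(c, alpha):
    """Scalar multiplication in F^n."""
    return tuple(c * a for a in alpha)

alpha = (Fraction(1), Fraction(2), Fraction(3))
beta  = (Fraction(0), Fraction(1), Fraction(-1))
print(vec_add(alpha, beta))               # alpha + beta = (1, 3, 2)
print(scalar_mul(Fraction(1, 2), alpha))  # (1/2)*alpha = (1/2, 1, 3/2)
```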

Example 2.5. Finally, we consider the polynomial ring F [x]. This


is the set of all polynomials
α = α(x) = a0 + a1 x + · · · + an xn
with coefficients ai ∈ F . Observe that α above is also equal to
α = a0 + a1 x + · · · + an xn + 0xn+1 + 0xn+2


or in other words, equality of two polynomials means that their nonzero


coefficients are the same and that their zero coefficients do not mat-
ter. Because of this ambiguity about zero coefficients, it is sometimes
simpler to write typical elements α, β ∈ F [x] as

α = Σ_{i=0}^{∞} ai x^i ,    β = Σ_{i=0}^{∞} bi x^i

with the proviso that only finitely many of their coefficients are not
zero.
Let c ∈ F . Then with α and β as above, we define
α + β = Σ_{i=0}^{∞} (ai + bi ) x^i
cα = Σ_{i=0}^{∞} (c ai ) x^i

In this way, F [x] becomes a vector space.


As usual, we can define the degree of a polynomial by
deg 0 = −∞
deg α = max{i | ai ≠ 0} for α ≠ 0
Clearly
deg(α + β) ≤ max{deg α, deg β}
and from this it follows that Wn , the set of all polynomials of degree
≤ n, is also a vector space under the same operations of addition and
scalar multiplication.
Now as we well know, F [x] has a number of additional properties
above and beyond its being a vector space. Since we will have need for
these later on, it is worthwhile discussing them now. First, polynomials
can be multiplied. Roughly speaking to multiply α and β we just
distribute the terms, using the fact that xi xj = xi+j and then regroup
the terms. The answer is αβ = γ where
γ = Σ_{n=0}^{∞} cn x^n
and
cn = Σ_{i+j=n} ai bj = Σ_{i=0}^{n} ai bn−i = Σ_{j=0}^{n} an−j bj

It is easy to check that with this multiplication F [x] satisfies all the
axioms of a field with the exception of the existence of multiplicative


inverses. In addition, we have


deg αβ = deg α + deg β
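The coefficient formula cn = Σ_{i+j=n} ai bj is a convolution of the two coefficient sequences, so it is straightforward to compute with. A rough sketch (helper names are my own) representing a polynomial by the list of its coefficients, with ai in position i:

```python
from fractions import Fraction

def poly_mul(a, b):
    """Coefficients of the product: c_n = sum over i + j = n of a_i * b_j."""
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def degree(a):
    """Index of the last nonzero coefficient (assumes some coefficient is nonzero)."""
    return max(i for i, ai in enumerate(a) if ai != 0)

alpha = [Fraction(1), Fraction(2)]               # 1 + 2x
beta  = [Fraction(3), Fraction(0), Fraction(1)]  # 3 + x^2
gamma = poly_mul(alpha, beta)                    # 3 + 6x + x^2 + 2x^3
print(degree(gamma) == degree(alpha) + degree(beta))  # True: deg(alpha beta) = deg alpha + deg beta
```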
Now by making F [x] somewhat larger, we can in fact get a field.
Thus let F (x) denote the set of all rational functions, that is quotients
of polynomials with coefficients in F . Then F (x) bears the same re-
lationship to F [x] as Q does to the set of ordinary integers. Namely,
every element of Q is a quotient of integers and every element of F (x)
is a quotient of elements of F [x]. Admittedly there are a number of
technical details to be checked, but nonetheless with some care we
could show that F (x) is indeed a field. Observe that F becomes a
subfield of F (x) when we think of each element a ∈ F as the constant
polynomial a + 0x + 0x2 + 0x3 + · · · .
Finally, the elements of F [x] can be viewed as functions from F to
F . Thus if α is as above and if c ∈ F , then α(c) is given by


α(c) = Σ_{i=0}^{∞} ai c^i

where this is of course a finite sum. It is not hard to see that


(α + β)(c) = α(c) + β(c)
(αβ)(c) = α(c) β(c)
We close with an elementary result.
Lemma 2.1. Let V be a vector space over the field F . Then for all
a ∈ F and α ∈ V , we have
i. 0α = 0 where the second zero is in V .
ii. a0 = 0 where both zeros are in V .
iii. −α = (−1)α.
Proof. Now 0 is an additively defined object and the above are
multiplicative properties. Thus the proof must be based on the only
axioms that intertwine addition and multiplication, namely the dis-
tributive laws.
(i ) In F we have 0 + 0 = 0 and thus
0α = (0 + 0)α = 0α + 0α
Adding −(0α) to both sides yields immediately 0α = 0.
(ii ) In V we have 0 + 0 = 0 and thus
a0 = a(0 + 0) = a0 + a0
Adding −(a0) to both sides then yields a0 = 0.


(iii ) We have
α + (−1)α = (1)α + (−1)α
= (1 + (−1))α = 0α = 0
Thus adding −α to both sides, we obtain (−1)α = −α and the lemma
is proved. 

Problems
2.1. In the text we showed that a solution of the equation
ax + α = β
with a ≠ 0 is x = a⁻¹(β − α). Prove that this is the unique solution.
2.2. Solve the simultaneous vector equations
ax + by = α
cx + dy = β
under the assumption that ad − bc ≠ 0.
2.3. Show that the set of functions from U into F , as described in
Example 2.2, is a vector space.
2.4. Prove that F n is a vector space over the field F . If necessary,
try it first for n = 2 or 3.
2.5. In F n define αi = (0, 0, . . . , 1, . . . , 0) where the 1 occurs as the
ith entry, and all the other entries are 0. Show that every element of
F n can be written uniquely as a sum
α = a1 α1 + a2 α2 + · · · + an αn
for suitable ai ∈ F .
2.6. Clearly R2 is the ordinary Euclidean plane. Let α, β ∈ R2 .
Describe geometrically the quadrilateral with vertices 0, α, α + β and
β. Describe geometrically the relationship between aα and α for any
a ∈ R.
2.7. Let α1 = (1, 1, 2), α2 = (0, 1, 4) and α3 = (1, 0, −1) be vectors
in Q3 . Show that every element α ∈ Q3 can be written uniquely as a
sum α = a1 α1 + a2 α2 + a3 α3 for suitable a1 , a2 , a3 ∈ Q.
2.8. Can we find finitely many polynomials α1 , α2 , . . . , αn ∈ F [x]
with the property that every element α in F [x] is of the form α =
a1 α1 + a2 α2 + · · · + an αn for suitable ai ∈ F ?


2.9. Verify the assertions made in Example 2.5. In particular, check


that polynomial multiplication is associative and check that deg αβ =
deg α + deg β. Also prove that for a ∈ F , (α + β)(a) = α(a) + β(a) and
(αβ)(a) = α(a)·β(a).
2.10. Let α be a nonzero polynomial in F [x] and let a ∈ F with
α(a) = 0. Show that α = (x − a)β for some β ∈ F [x]. Suppose further
that α = (x − a1 )(x − a2 ) · · · (x − an ). Show that a = ai for some i.


3. Subspaces
Perhaps the subtitle of this section should be “new spaces from
old”. In mathematics, as soon as an object is defined, one seeks ways
of constructing new objects from the original and the obvious first step
is to look inside the original for subobjects. Thus we consider subfields
of fields, like the reals inside the complex numbers, and of course we
consider subsets of sets.
Let V be a vector space over F and let W be a subset of V . Then
W is said to be a subspace of V if W is a vector space in its own right
with the same addition and scalar multiplication as in V . Let us make
this last statement more explicit. Suppose α, β ∈ W and a ∈ F . Since
W is a vector space, we can compute α + β and aα in W . Since α and
β also belong to V ⊇ W , we can also compute α + β and aα in V .
Well, the fact that the arithmetic is the same in both W and V means
precisely that the two sums α + β are the same and again that the two
products aα are identical.
At this point, we could start listing examples of vector spaces V
and subspaces W , but in each case we would have to verify that W
satisfies all ten axioms and this is rather tedious. So what we do first
is to decide which of these axioms we really have to check. Suppose W
is a subspace of V . Then W has a zero element and negatives, and of
course the same is true of V .
Lemma 3.1. Let W be a subspace of V . Then the zero element of
W is the same as that of V , and negatives in W are the same as in V .
Proof. Let 0W be the zero element of W . Then in W we have
0W + 0W = 0W . But this is the same addition as in V , so viewing this
as an equation in V and adding −0W (the negative in V ) to both sides,
we get immediately 0W = 0, the zero of V .
Let α ∈ W and let β be its negative in W . Then β + α = 0W = 0.
Again, this is the same addition as in V , so adding −α to both sides
yields β = −α. 
We can now obtain our simplified subspace criteria.
Theorem 3.1. Let W be a subset of a vector space V . Then W is
a subspace of V if and only if
i. 0 ∈ W ,
ii. α, β ∈ W implies that α + β ∈ W , and
iii. α ∈ W and a ∈ F imply that aα ∈ W .
Proof. Suppose first that W is a subspace of V . Then W is a
vector space, so W has a zero element 0W . By the previous lemma,


0W = 0, so 0 = 0W ∈ W . Finally, if α, β ∈ W and a ∈ F , then by the


closure axioms for W , and the fact that the arithmetic is the same in
W and in V , we have α + β ∈ W and aα ∈ W . Thus (i), (ii) and (iii)
are satisfied.
Now suppose W is a subset of vector space V satisfying (i), (ii) and
(iii). We show that W is a vector space in its own right using the same
arithmetic operations as in V . We have to check the ten axioms and
we consider them in four packages.
1. The two closure axioms are of course immediate consequences of
(ii) and (iii).
2. By (i), 0 ∈ W so there exists an element of W , namely 0, with
the property that α + 0 = 0 + α = α for all α ∈ W . Suppose some
second element γ ∈ W satisfies α + γ = γ + α = α for all α ∈ W . Then
since 0 ∈ W , we can set α = 0 here and get immediately γ = 0 + γ = 0.
Thus W has a unique zero element, the zero of V , and the zero axiom
is satisfied.
3. Let α ∈ W . Then by our assumption (iii) and Lemma 2.1(iii),
−α = (−1)α ∈ W . Thus there exists an element of W , namely −α,
with the property that α + (−α) = (−α) + α = 0. Now suppose there
existed some second element γ ∈ W with α + γ = γ + α = 0. Then
viewing this as an equation in V and using the uniqueness of negatives
in V , we see immediately that γ = −α. Thus W has unique negatives
for all its elements and the negative axiom is satisfied.
4. Finally, there remains the commutative law, the two associative
laws, the two distributive laws and the unital law to consider. But we
claim that these are in fact obviously satisfied. For example, consider
the distributive law a(α+β) = aα+aβ with α, β ∈ W . Now if this were
an equation with addition and scalar multiplication in V , then there
would be no question of its validity. But this is in fact such an equation
since the arithmetic in W is precisely the same as in V . Therefore this
axiom is satisfied in W and similarly the other ones hold.
Thus W satisfies all ten axioms and we have proved that W is a
subspace of V . 

We now consider our examples.


Example 3.1. Let V be a vector space. Then V always has two
subspaces namely V itself and {0}. We usually denote this latter sub-
space by 0 and thus this overworked symbol now signifies the zero of F ,
the zero of V , and the zero subspace. Moreover it will also be used to
designate certain additional objects later on. Again this usually causes
no confusion.


Example 3.2. If V = F [x] is the polynomial ring over F in the


variable x, then for each integer n, the set of polynomials of degree ≤ n
is a subspace of V .

Example 3.3. Let U be a set and let V be the set of all functions
from U to F . As we have seen earlier, V is a vector space over F with
addition and scalar multiplication given by

(α + β)(u) = α(u) + β(u)


(aα)(u) = a·α(u) for all u ∈ U

Let u0 be a fixed element of U . Then it is easy to see that the set


W = {α ∈ V | α(u0 ) = 0} is a subspace of V .

Example 3.4. Let V be the set of all real valued functions defined
on the interval [0, 1]. That is, V is the set of all functions from U = [0, 1]
to F = R and hence, as above, V is a vector space over R. Suppose C
denotes the set of all continuous functions in V . Then 0 is continuous,
and the sum of two continuous functions is continuous. Furthermore,
C is closed under scalar multiplication, so C is in fact a subspace of V .

Example 3.5. Now let us consider V = F n for any field F . If W is


the set of all such n-tuples whose first entry is zero, then W is clearly
a subspace of V .
More generally, suppose we are given a set of homogeneous linear
equations over F in n unknowns. For the sake of simplicity, we will
write down just three such equations.

e1 x1 + e2 x2 + · · · + en xn = 0
f1 x1 + f2 x2 + · · · + fn xn = 0
g1 x1 + g2 x2 + · · · + gn xn = 0

Observe that the word homogeneous means that zeros occur on the
right hand side of the equal sign. Let S denote the set of all n-tuples
(x1 , x2 , . . . , xn ) ∈ F n that are solutions to all three equations. We show
that S is a subspace of V .
First 0 = (0, 0, . . . , 0) is clearly in S. Now let α = (x1 , x2 , . . . , xn ) ∈
S, β = (y1 , y2 , . . . , yn ) ∈ S and let a ∈ F . Then

e1 (x1 + y1 ) + e2 (x2 + y2 ) + · · · + en (xn + yn )
    = (e1 x1 + e2 x2 + · · · + en xn ) + (e1 y1 + e2 y2 + · · · + en yn )
    = 0 + 0 = 0


Thus α + β satisfies the first equation and, in a similar manner, also


the other two. In other words, α + β ∈ S. In the same way

e1 (ax1 ) + e2 (ax2 ) + · · · + en (axn ) = a(e1 x1 + e2 x2 + · · · + en xn ) = a0 = 0

so aα satisfies the three equations and aα ∈ S. This implies that S is a


subspace of V . Thus we see that the study of subspaces is a necessary
ingredient in the study of linear equations.
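A quick numerical check of the closure computation (my own example; the particular equation is chosen arbitrarily): if α and β solve a homogeneous equation, then so do α + β and aα.

```python
from fractions import Fraction

# One homogeneous equation in three unknowns: x1 - x2 + x3 = 0.
coeffs = (Fraction(1), Fraction(-1), Fraction(1))

def solves(v):
    """Does the triple v satisfy the homogeneous equation?"""
    return sum(e * x for e, x in zip(coeffs, v)) == 0

alpha = (Fraction(2), Fraction(1), Fraction(-1))  # one solution
beta  = (Fraction(0), Fraction(1), Fraction(1))   # another solution
a = Fraction(5)

alpha_plus_beta = tuple(x + y for x, y in zip(alpha, beta))
a_alpha = tuple(a * x for x in alpha)
print(solves(alpha), solves(beta))               # True True
print(solves(alpha_plus_beta), solves(a_alpha))  # True True
```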

Example 3.6. Suppose F ⊆ K ⊆ L is a chain of three fields, as


for example Q ⊆ R ⊆ C. Then L is a vector space over F and K is
certainly a subspace. Observe that L is also a vector space over field
K and this fact turns out to have many interesting consequences.

Now suppose V is a vector space and that we already know a num-


ber of subspaces. In the following few examples, we discuss ways of
constructing new subspaces from these.

Example 3.7. Let {Wi } be a family of subspaces of V . Then the
intersection W = ∩i Wi is also a subspace, a fact which we now prove.
First, 0 ∈ Wi for all i, so 0 ∈ W . Second, let α, β ∈ W and a ∈ F .
Then for each i, we have α, β ∈ Wi so, since Wi is a subspace, we
have α + β ∈ Wi and aα ∈ Wi . That is, α + β, aα ∈ Wi for all i, so
α + β, aα ∈ ∩i Wi = W . Thus W is a subspace of V and it is clearly
the largest subspace contained in all Wi .

Example 3.8. Let W1 and W2 be subspaces of V . Then, as we


observed above, W1 ∩ W2 is the largest subspace of V contained in
both W1 and W2 . Now let us turn this problem around and find the
smallest subspace which contains both W1 and W2 . Suppose first that
W is any subspace satisfying W ⊇ W1 , W2 . If α1 ∈ W1 and α2 ∈ W2 ,
then α1 , α2 ∈ W , so α1 + α2 ∈ W . Thus W contains the set of all such
sums or in other words W ⊇ W1 + W2 where we define

W1 + W2 = {α1 + α2 | α1 ∈ W1 , α2 ∈ W2 }

We show now that W1 + W2 is in fact a subspace of V and this will


therefore imply that W1 + W2 is the smallest subspace containing both
W1 and W2 .
Let α, β ∈ W1 + W2 and let a ∈ F . Then by definition of W1 + W2
there exist α1 , β1 ∈ W1 and α2 , β2 ∈ W2 with

α = α1 + α2 , β = β1 + β2


Then
α + β = (α1 + α2 ) + (β1 + β2 )
= (α1 + β1 ) + (α2 + β2 ) ∈ W1 + W2
since α1 + β1 ∈ W1 and α2 + β2 ∈ W2 . Also
aα = a(α1 + α2 )
= (aα1 ) + (aα2 ) ∈ W1 + W2
since aα1 ∈ W1 and aα2 ∈ W2 . Finally, 0 = 0 + 0 ∈ W1 + W2 , so
W1 + W2 is indeed a subspace.
Now every element of W1 + W2 can be written as a sum of an
element of W1 and one of W2 , but the summands need not be uniquely
determined. If it happens that all such summands are unique, then
we say that the sum is direct and write W1 ⊕ W2 for W1 + W2 . The
following lemma yields a simple test for deciding when a sum is direct.

Lemma 3.2. Let W1 and W2 be subspaces of V . Then W1 + W2 is


a direct sum if and only if W1 ∩ W2 = 0.
Proof. Suppose first that W1 + W2 is a direct sum and let α ∈
W1 ∩ W2 . Then
α = α + 0 ∈ W1 + W2 (viewing α ∈ W1 and 0 ∈ W2 )
and
α = 0 + α ∈ W1 + W2 (viewing 0 ∈ W1 and α ∈ W2 )
Thus, by uniqueness of summands, we must have α = 0 and hence
W1 ∩ W2 = 0.
Conversely, suppose W1 ∩ W2 = 0 and let
γ = α1 + α2 = β1 + β2 ∈ W1 + W2
with α1 , β1 ∈ W1 and α2 , β2 ∈ W2 . Then
α1 − β1 = β2 − α2
Now the left hand term here is clearly in W1 and the right hand term
is in W2 . Thus this common element is in W1 ∩ W2 = 0. This yields
α1 − β1 = β2 − α2 = 0, so α1 = β1 , α2 = β2 , and the sum is direct. 
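For instance (a small illustration, not in the original text), take V = F 2 , let W1 be the set of all vectors (a, 0) and let W2 be the set of all vectors (0, b). A vector in W1 ∩ W2 has the form (a, 0) = (0, b), which forces a = b = 0, so W1 ∩ W2 = 0 and F 2 = W1 ⊕ W2 . On the other hand, if W2 is replaced by all of V , then W1 ∩ W2 = W1 ≠ 0, and indeed (1, 0) = (1, 0) + (0, 0) = (0, 0) + (1, 0) gives two different decompositions, so that sum is not direct.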
Example 3.9. Of course what we can do with two summands, we
can do with finitely many. Let W1 , W2 , . . . , Wn be subspaces of V and
set
W1 + W2 + · · · + Wn = {α1 + α2 + · · · + αn | αi ∈ Wi }
It follows as above that W = W1 + W2 + · · · + Wn is a subspace of V
and in fact it is the smallest subspace containing W1 , W2 , . . . and Wn .


If, in addition, every element α of W can be written uniquely as


α = α1 + α2 + · · · + αn with αi ∈ Wi , then again we say that the sum is
direct and we write W = W1 ⊕ W2 ⊕ · · · ⊕ Wn . It is not hard to show
that W = W1 + W2 + · · · + Wn is a direct sum if and only if
(W1 ) ∩ W2 = 0
(W1 + W2 ) ∩ W3 = 0
(W1 + W2 + W3 ) ∩ W4 = 0
......
(W1 + W2 + · · · + Wn−1 ) ∩ Wn = 0
Since addition is commutative, this chain of conditions is therefore
equivalent to any similar chain of conditions with the subscripts per-
muted.
Example 3.10. Let us change the problem a little. Suppose α ∈
V . What is the smallest subspace containing α? Well obviously any
subspace containing α must necessarily contain all scalar multiples of
α and thus we define
⟨α⟩ = {aα | a ∈ F }.
Now it is not hard to see that ⟨α⟩ is a subspace. In fact, all the proof
amounts to is observing that
0 = 0α ∈ ⟨α⟩
aα + bα = (a + b)α ∈ ⟨α⟩
a(bα) = (ab)α ∈ ⟨α⟩
for all a, b ∈ F . Thus ⟨α⟩ is a subspace of V and then it is the smallest
subspace containing α.
Finally, let α1 , α2 , . . . , αn ∈ V . Then clearly the smallest subspace
containing all these vectors is ⟨α1 ⟩ + ⟨α2 ⟩ + · · · + ⟨αn ⟩ which we denote
by ⟨α1 , α2 , . . . , αn ⟩. Thus clearly
⟨α1 , α2 , . . . , αn ⟩ = {a1 α1 + a2 α2 + · · · + an αn | ai ∈ F }

Problems
3.1. Let V = F 2 and let α = (1, 0), β = (0, 1) and γ = (1, 1) be
three vectors in V . By computing both sides of each inequality, show
that
⟨α⟩ + (⟨β⟩ ∩ ⟨γ⟩) ≠ (⟨α⟩ + ⟨β⟩) ∩ (⟨α⟩ + ⟨γ⟩)
and
⟨α⟩ ∩ (⟨β⟩ + ⟨γ⟩) ≠ (⟨α⟩ ∩ ⟨β⟩) + (⟨α⟩ ∩ ⟨γ⟩)


Thus appropriate analogs of the distributive law do not hold for these
operations on subspaces.
3.2. Let α1 , α2 , α3 , α4 be four vectors in V that satisfy
α1 − 3α2 + 2α3 − 5α4 = 0
Show that ⟨α1 , α2 , α3 , α4 ⟩ = ⟨α2 , α3 , α4 ⟩.
3.3. Let W1 and W2 be subspaces of V . Show that W1 ∪ W2 is
a subspace if and only if W1 ⊆ W2 or W2 ⊆ W1 . (Hint. If neither
of these two inclusions hold, then we can choose α1 ∈ W1 \ W2 and
α2 ∈ W2 \ W1 . Consider the element α1 + α2 .)

3.4. Let W1 , W2 and W3 be subspaces of V with W1 ⊆ W3 . Prove


that
(W1 + W2 ) ∩ W3 = W1 + (W2 ∩ W3 )
This is called the modular law.
3.5. Let α = (1, 1, 2), β = (0, 1, 3), γ = (2, 0, −1) ∈ Q3 . Show that
Q3 = ⟨α⟩ ⊕ ⟨β⟩ ⊕ ⟨γ⟩
3.6. Prove that the criteria given in Example 3.9 for the sum of
subspaces W1 + W2 + · · · + Wn to be direct are indeed correct.
In each of the remaining problems decide whether there exist finitely
many vectors α1 , α2 , . . . , αm ∈ V with V = ⟨α1 , α2 , . . . , αm ⟩.
3.7. V = F n .
3.8. V = F [x].
3.9. V = C, F = R.
3.10. V = R, F = Q. This requires outside knowledge about the
size of the real numbers.


4. Spanning and Linear Independence


Let us again return to the problem of a single linear equation, say
x1 − x2 + x3 = 0
over the field R of real numbers. Now we seek to solve this equation,
that is to find all its solutions. As before, let
S = {(x1 , x2 , x3 ) ∈ R3 | x1 − x2 + x3 = 0}
Then S is a subspace of R3 and what we want to do is to list all the
elements of S. Now (2, 1, −1) ∈ S and hence so is (2r, r, −r) for every
real number r. Thus S has infinitely many elements, so we cannot hope
to list them all. Instead, we devise another scheme.
Since x3 = x2 − x1 from the above, a typical element of S looks like
(x1 , x2 , x3 ) = (x1 , x2 , x2 − x1 ) = x1 (1, 0, −1) + x2 (0, 1, 1)
Observe that β1 = (1, 0, −1) and β2 = (0, 1, 1) are elements of S and
by the above every element of S can be written as a linear sum of
β1 and β2 with coefficients in R. Moreover, it is easy to see that the
coefficients are unique. What we have found is a basis for S.
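A quick computational sanity check of this parametrization (my own sketch, not part of the notes): every linear combination of β1 and β2 solves the equation, and the coefficients reappear as the first two coordinates, which is why they are unique.

```python
from fractions import Fraction

beta1 = (Fraction(1), Fraction(0), Fraction(-1))
beta2 = (Fraction(0), Fraction(1), Fraction(1))

def in_S(v):
    """Is v = (x1, x2, x3) a solution of x1 - x2 + x3 = 0?"""
    x1, x2, x3 = v
    return x1 - x2 + x3 == 0

def combo(x1, x2):
    """The linear combination x1*beta1 + x2*beta2."""
    return tuple(x1 * b1 + x2 * b2 for b1, b2 in zip(beta1, beta2))

# Every combination lies in S ...
print(all(in_S(combo(Fraction(i), Fraction(j)))
          for i in range(-3, 4) for j in range(-3, 4)))  # True
# ... and the coefficients can be read off the first two entries.
v = combo(Fraction(2), Fraction(-5))
print(v[0] == 2 and v[1] == -5)                          # True
```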
Let V be a vector space over F . A basis B is a subset B =
{β1 , β2 , . . .}, possibly infinite, of V such that every element α ∈ V
can be written uniquely as a finite F -linear sum of elements of B.
That is, for each α ∈ V , there exist finitely many elements of B, say
βi1 , βi2 , . . . , βim such that
α = a1 βi1 + a2 βi2 + · · · + am βim
with ai ∈ F . Moreover, with the exception of terms with zero coeffi-
cients, the choice of βi1 , βi2 , . . . , βim and the ai are uniquely determined
by α. If B = {β1 , β2 , . . . , βn } is a finite set, then the above of course
simplifies to the statement that every element α ∈ V can be written
uniquely as
α = a1 β1 + a2 β2 + · · · + an βn
with ai ∈ F .
Let us consider some examples.
Example 4.1. The space S given above has
B = {β1 = (1, 0, −1), β2 = (0, 1, 1)}
as a basis.
Example 4.2. V = F [x] has B = {1, x, x2 , . . . , xn , . . .} as a basis.
In other words, every element α ∈ F [x] can be written as a finite sum
α = Σ_{i=0}^{m} ai x^i and the only ambiguity that occurs is in the number of
zero-coefficient terms that we allow as summands.


Example 4.3. F n has B = {β1 , β2 , . . . , βn } as a basis where βi =


(0, 0, . . . , 1, . . . , 0) has a 1 in its ith entry and zeros elsewhere.

Example 4.4. If we view C as a vector space over R, then C has
{1, √−1} as a basis.
Example 4.5. If we view Q[√2] as a vector space over Q, then
Q[√2] has {1, √2} as a basis.
We should mention here that vector spaces have very many different
bases and that there is usually no such thing as a canonical or best basis.
In fact, one of the main problems in linear algebra is to find bases in
which the behavior of certain objects can be more easily understood.
However, generally for two different such objects, we get two different
bases.
Now there are really two aspects of the definition of a basis B.
First, each element α ∈ V can be written as a finite F -linear sum of
elements of B, the existence part. Second, the coefficients are unique,
the uniqueness part. Each of these aspects is interesting in its own
right, and we consider them separately.
Let C = {γ1 , γ2 , . . .} be a subset of V . We say that C is a spanning
set for V , or that C spans V , if every element α ∈ V can be written as
a finite F -linear sum
α = a1 γi1 + a2 γi2 + · · · + am γim
of elements of C. Observe that if C = {γ1 , γ2 , . . . , γn } is a finite set,
then C spans V if and only if V = ⟨γ1 , γ2 , . . . , γn ⟩.
Again, let C be as above. We say that C is linearly independent if
for every finite subset {γi1 , γi2 , . . . , γim } of C every equation of the form
a1 γi1 + a2 γi2 + · · · + am γim = 0
has only the trivial solution a1 = a2 = · · · = am = 0. In other words, C
is linearly independent if no nontrivial finite F -linear sum of elements
of C can add to zero. If C is not linearly independent, then we say that
it is linearly dependent. Note that if C = {γ1 , γ2 , . . . , γn } is a finite set,
then C is linearly independent if and only if the equation
a1 γ1 + a2 γ2 + · · · + an γn = 0
can hold only for a1 = a2 = · · · = an = 0. We consider now the
relationship between this concept and the uniqueness aspect of bases.
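Before turning to that relationship, here is a rough computational aside (my own sketch, not a construction from the text): for finitely many vectors in F n the defining condition is a homogeneous linear system in the coefficients, so independence can be decided by elimination. The sketch works over Q with exact arithmetic; the first test set is the one from Problem 2.7.

```python
from fractions import Fraction

def is_independent(vectors):
    """Gaussian elimination over Q: the given vectors are linearly
    independent iff the matrix having them as rows has full row rank."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank == len(vectors)

print(is_independent([(1, 1, 2), (0, 1, 4), (1, 0, -1)]))  # True  (the vectors of Problem 2.7)
print(is_independent([(1, 1, 2), (0, 1, 4), (1, 2, 6)]))   # False (third = first + second)
```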
Lemma 4.1. Let C = {γ1 , γ2 , . . .} be a subset of V . Then C is
linearly independent if and only if every element α ∈ V that can be
written as an F -linear combination of finitely many elements of C
can be written so uniquely.


Proof. Suppose first that uniqueness holds and let {γi1 , γi2 , . . . , γim }
be a finite subset of C. Say a1 , a2 , . . . , am ∈ F with
a1 γi1 + a2 γi2 + · · · + am γim = 0
Since we know that
0γi1 + 0γi2 + · · · + 0γim = 0
uniqueness for the vector α = 0 implies that a1 = a2 = · · · = am = 0
and C is linearly independent.
Conversely suppose that C is linearly independent and suppose α ∈
V can be written as
b1 γi1 + b2 γi2 + · · · + br γir = α
and
c1 γj1 + c2 γj2 + · · · + cs γjs = α
two possibly different finite F -linear combinations of elements of C. By
adding zero terms to each equation, we may assume that the vectors of
C that occur in each equation are the same, and then by renumbering
we may assume that r = s and γik = γjk . Thus, we have
b1 γi1 + b2 γi2 + · · · + br γir = α
and
c1 γi1 + c2 γi2 + · · · + cr γir = α
Subtracting the second equation from the first then yields
(b1 − c1 )γi1 + (b2 − c2 )γi2 + · · · + (br − cr )γir = α − α = 0
Since C is linearly independent, each of these coefficients must vanish.
Thus bi − ci = 0, so bi = ci and the coefficients for α are unique. 
Thus B is a basis if and only if it is a linearly independent spanning
set. We now consider ways to find bases.
Suppose C is a subset of V . If C spans V , then obviously any set
bigger than C also spans V . However, it is not true that we can in-
discriminately remove elements from C and still maintain the spanning
property. We say that C is a minimal spanning set of V if C spans V ,
but for every vector γ ∈ C, the set C \ {γ} does not span.
In the other direction, suppose C is a linearly independent subset
of V . Then clearly, every subset of C is also linearly independent, but
we cannot add vectors indiscriminately to C and still maintain this
property. We say that C is a maximal linearly independent set if C is
linearly independent, but for every vector γ ∈ V \ C, the set C ∪ {γ} is
not linearly independent. The interrelations between these definitions
is given by


Theorem 4.1. Let V be a vector space over F and let C be a subset


of V . The following are equivalent.
i. C is a basis for V .
ii. C is a linearly independent spanning set.
iii. C is a minimal spanning set.
iv. C is a maximal linearly independent set.
Proof. We have already observed that (i ) and (ii) are equivalent.
We prove that (ii) and (iii ) are equivalent and then that (ii ) and (iv )
are equivalent.
(ii) ⇒ (iii ). Since C is a linearly independent spanning set, it is
certainly a spanning set. Suppose C is not minimal. Then we can
choose γ ∈ C such that C \ {γ} spans V . Since γ ∈ V , this means that
we can write γ as a finite F -linear sum of elements of C \ {γ}. That is,
there exists γ1 , γ2 , . . . , γm ∈ C \ {γ} and a1 , a2 , . . . , am ∈ F with
γ = a1 γ1 + a2 γ2 + · · · + am γm
But then
(−1)γ + a1 γ1 + a2 γ2 + · · · + am γm = 0
is a nontrivial F -linear sum of elements of C which adds to zero, and
this contradicts the fact that C is linearly independent. Thus C is a
minimal spanning set.
(iii ) ⇒ (ii ). Since C is a minimal spanning set, it spans V and we
need only show that it is linearly independent. If this is not the case,
then there exist γ1 , γ2 , . . . , γm ∈ C and a1 , a2 , . . . , am ∈ F not all zero,
with
a1 γ1 + a2 γ2 + · · · + am γm = 0
Since the ai are not all zero, say a1 ≠ 0. Then multiplying the above
equation by −a1⁻¹, we see that we can assume that a1 = −1. This
yields
γ1 = a2 γ2 + · · · + am γm (∗)
We show now that C \ {γ1 } spans V . Let α ∈ V . Since C spans V ,
there exist field elements b1 , b2 , . . . , bm and ci1 , ci2 , . . . , cir with
α = b1 γ1 + b2 γ2 + · · · + bm γm + ci1 γi1 + ci2 γi2 + · · · + cir γir
where γi1 , γi2 , . . . , γir are additional elements of C. But, by plugging in
the formula (∗) for γ1 in this equation, we see that
α = b1 (a2 γ2 + · · · + am γm )
      + b2 γ2 + b3 γ3 + · · · + bm γm + ci1 γi1 + ci2 γi2 + · · · + cir γir
  = (b1 a2 + b2 )γ2 + (b1 a3 + b3 )γ3 + · · · + (b1 am + bm )γm
      + ci1 γi1 + ci2 γi2 + · · · + cir γir


is an F -linear combination of elements of C \ {γ1 }. Thus C \ {γ1 } spans


V , and this contradicts the fact that C is a minimal spanning set. We
therefore must have had C linearly independent.
(ii) ⇒ (iv ). Since C is a linearly independent spanning set, it is
certainly independent. Let γ ∈ V \ C. Since C spans V , there exist
γ1 , γ2 , . . . , γm ∈ C and a1 , a2 , . . . , am ∈ F with
γ = a1 γ1 + a2 γ2 + · · · + am γm
or
(−1)γ + a1 γ1 + a2 γ2 + · · · + am γm = 0
But this dependence relation implies that the set C ∪ {γ} is linearly
dependent. Hence C is a maximal linearly independent set.
(iv ) ⇒ (ii ). Finally we are given that C is a maximal linearly
independent set, so we need only show that it spans. Let α ∈ V . If
α ∈ C, then
α = (1)α
shows that α is an F -linear sum of elements of C. Now let α ∉ C so
that C ∪ {α} is properly bigger than C. By the maximality property
of C, we know that this bigger set is linearly dependent, so there exist
γ1 , γ2 , . . . , γm ∈ C and a, a1 , a2 , . . . , am ∈ F not all zero with
aα + a1 γ1 + a2 γ2 + · · · + am γm = 0
Can a = 0? If this were the case, then we could delete the α term
above and have a nontrivial F -linear sum of elements of C adding to
zero, a contradiction since C is linearly independent. Thus a ≠ 0 and
multiplying the above equation by −a⁻¹, we see that we can assume
that a = −1. Thus we have
(−1)α + a1 γ1 + a2 γ2 + · · · + am γm = 0
so
α = a1 γ1 + a2 γ2 + · · · + am γm
and C spans V . This completes the proof. 
Finally, one more definition. Let V be a vector space over F . If
V has a finite spanning set, then we say that V is finite dimensional.
Otherwise, V is infinite dimensional. It is apparent that finite dimen-
sional spaces are in some sense small and therefore this property should
carry over to subspaces. However this fact requires proof and we will
prove it and more in the next section. We close this section with
Corollary 4.1. Let V be a vector space with a finite spanning set
C. Then some subset B of C is a basis. In particular, finite dimensional
vector spaces have bases.


Proof. C has only finitely many subsets and one of these subsets,
namely C itself spans V . Thus we may choose a subset B of C that is a
spanning set of smallest possible size. Clearly B is a minimal spanning
set of V and therefore by Theorem 4.1, B is a basis.
If V is a finite dimensional vector space, then V does have such a
finite spanning set, and hence V does have a basis. 
It is in fact true that every vector space has a basis. However a
proof of this requires going beyond the above finite type argument,
and one must use transfinite methods. We will put this study off until
the end of these notes since, for the most part, we are concerned with
finite dimensional spaces.

Problems
4.1. Find a basis for the solution space of the real linear equation
x1 − 2x2 + x3 − x4 = 0
4.2. Find a basis for the space of simultaneous solutions of the real
linear equations
x1 − 2x2 + x3 − x4 = 0
2x1 − 3x2 − x3 + 2x4 = 0
4.3. For each i ≥ 0, let βi ∈ F [x] be a polynomial of degree i. Prove
that B = {β0 , β1 , β2 , . . .} is a basis for F [x].
4.4. Verify that B = {β1 , β2 , . . . , βn } as given in Example 4.3 is a
basis for F n .
4.5. Let V be a vector space with basis B = {β1 , β2 , β3 , β4 }. Prove
that C = {β1 , β2 − β1 , β3 − β2 , β4 − β3 } is also a basis.
4.6. Prove that the subset
C = {(1, 0, −1, 2), (1, 1, 0, 1), (1, 0, 2, 3), (0, 0, 1, 2)}
of R4 is linearly independent.
4.7. Prove that the subset C = {(1, 0, 2), (1, 1, 0), (1, 0, −1)} of Q3
is a spanning set.
4.8. Let γ ∈ V . Show that {γ} is linearly independent if and only
if γ ≠ 0.
4.9. Let C be a subset of vector space V . Show that C is linearly
independent if and only if C is a basis for some subspace of V .


4.10. Let F ⊆ L ⊆ K be a chain of three fields. Suppose A =


{α1 , α2 , . . . , αn } is a basis for L as a vector space over F and suppose
B = {β1 , β2 , . . . , βm } is a basis for K as a vector space over L. Prove
that
{αi βj | i = 1, 2, . . . , n; j = 1, 2, . . . , m}
is a basis for K as a vector space over F .


5. The Replacement Theorem


In this section, we prove our first real theorem. It is quite elemen-
tary, but nevertheless it tells us almost everything of a theoretical
nature that we might want to know about finite dimensional vector
spaces.
Theorem 5.1 (Replacement Theorem). Let V be a vector space
over the field F , let A = {α1 , α2 , . . . , αn } be a linearly independent
subset of V , and let C = {γ1 , γ2 , . . . , γm } be a spanning set of V . Then
by suitably renumbering the elements of C, if necessary, it follows that
the set
{α1 , α2 , . . . , αn , γn+1 , . . . , γm }
spans V . In particular, m ≥ n.
In other words, we can replace γ1 by α1 , γ2 by α2 , . . . and γn by
αn in C, and the set will still span V . Observe that the renumbering
of C is really necessary. First, the order in which the γ’s are written
is of course accidental, and there is no reason to believe a priori that
the given order might be consistent with that of the α’s to allow the
replacement to work. Second, consider the extreme case where C is a
basis and A ⊆ C. Then C is a minimal spanning set and from this
it is clear that the only possible replacement that can work for all n
requires α1 = γ1 , α2 = γ2 , . . ., αn = γn .
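For instance (a small worked example, not in the original text), take V = F 3 , let A = {α1 } with α1 = (1, 1, 0), and let C = {γ1 , γ2 , γ3 } be the basis of Example 4.3. Writing α1 = 1γ1 + 1γ2 + 0γ3 , the coefficient of γ1 is nonzero, so γ1 = α1 − γ2 is recoverable from the new set, and {α1 , γ2 , γ3 } still spans F 3 .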
Proof. We show by induction on j, for 0 ≤ j ≤ n, that we can
choose elements, which we label γ1 , γ2 , . . . , γj ∈ C so that the set
Cj = {α1 , α2 , . . . , αj , γj+1 , γj+2 , . . . , γm }
spans V .
Observe that the induction starts at j = 0. Here no replacements
are made, so by assumption, C0 = C spans V .
Now let us assume that the first j γ’s have been so labeled and that
the set Cj above spans V . If j = n, we are done. If not then j < n
so αj+1 ∈ V . In particular, since Cj spans V , there exist field elements
a1 , a2 , . . . , aj , bj+1 , bj+2 , . . . , bm with
αj+1 = a1 α1 + a2 α2 + · · · + aj αj
+ bj+1 γj+1 + bj+2 γj+2 + · · · + bm γm (∗)
Can all the bi ’s be zero? If this were the case, then
αj+1 = a1 α1 + a2 α2 + · · · + aj αj
so
0 = a1 α1 + a2 α2 + · · · + aj αj + (−1)αj+1 + 0αj+2 + · · · + 0αn


is a nontrivial F -linear sum of the elements of A that adds to zero.


But A is linearly independent, so this cannot occur. Thus some bi is
nonzero.
By relabeling the elements γj+1 , γj+2 , . . . , γm if necessary, we can
assume that bj+1 = 0. With this assumption, we show that
Cj+1 = {α1 , α2 , . . . , αj , αj+1 , γj+2 , γj+3 , . . . , γm }
spans V . First, since bj+1 ≠ 0, we can solve for γj+1 in (∗) and obtain
γj+1 = a′1 α1 + a′2 α2 + · · · + a′j αj + a′j+1 αj+1
+ b′j+2 γj+2 + b′j+3 γj+3 + · · · + b′m γm (∗∗)
where
a′i = −(bj+1 )−1 ai for i ≤ j
a′j+1 = (bj+1 )−1
b′i = −(bj+1 )−1 bi for i ≥ j + 2
Now let β ∈ V be arbitrary. Since Cj spans V , there exist field
elements c1 , c2 , . . . , cj , dj+1 , dj+2 , . . . , dm with
β = c1 α1 + c2 α2 + · · · + cj αj
+ dj+1 γj+1 + dj+2 γj+2 + · · · + dm γm
Plugging the formula (∗∗) for γj+1 into this equation, we get easily
β = c1 α1 + c2 α2 + · · · + cj αj
+ dj+1 (a′1 α1 + a′2 α2 + · · · + a′j αj + a′j+1 αj+1 )
+ dj+1 (b′j+2 γj+2 + b′j+3 γj+3 + · · · + b′m γm )
+ dj+2 γj+2 + · · · + dm γm
= c′1 α1 + c′2 α2 + · · · + c′j αj + c′j+1 αj+1
+ d′j+2 γj+2 + d′j+3 γj+3 + · · · + d′m γm
where
c′i = ci + dj+1 a′i for i ≤ j
c′j+1 = dj+1 a′j+1
d′i = di + dj+1 b′i for i ≥ j + 2
Thus β is written as an F -linear sum of elements of Cj+1 . Since this is
true for all such β ∈ V , this says that Cj+1 spans V .
Therefore the induction step is verified. The first part of the the-
orem now follows from the case j = n. Finally, since n different γ’s
have been replaced, we must clearly have m ≥ n. This completes the
proof. 
To some readers, the proof of the last statement m ≥ n may be
a little disturbing. Where in the proof do we really show that we do
not run out of elements of C in this process? The answer is precisely
in studying equation (∗). If there were no γ’s left at this stage, then
certainly all the bi ’s would be zero. Thus, when we show that some bi
is not zero, we are in fact also showing that some γi exists with i > j.
The Replacement Theorem has numerous corollaries. We start with
the most important. If A is a subset of V , we let |A| denote its size.
Theorem 5.2. Let V be a finite dimensional vector space over F .
Then there exists a unique integer n ≥ 0 called the dimension of V with
the property that
i. If A is a linearly independent subset of V , then |A| ≤ n.
ii. If B is a basis of V , then |B| = n.
iii. If C is a spanning subset of V , then |C| ≥ n.
Proof. Since V is a finite dimensional vector space, we know that
it has a finite basis B̃. Fix such a basis and let n = |B̃|. We show that
n satisfies (i ), (ii) and (iii ). Let A, B and C be as above.
Let A0 be a finite subset of A. Then A0 is linearly independent
and B̃ spans V . Thus by the Replacement Theorem, |A0 | ≤ |B̃| = n.
Since |A0 | ≤ n for all such finite subsets A0 of A, this clearly implies
that A is finite and |A| ≤ n. Thus (i ) is proved.
If |C| = ∞, then certainly |C| ≥ n. Thus, in proving (iii ), we can
assume that C is finite. Now B̃ is linearly independent and C spans V .
Hence by the Replacement Theorem, |C| ≥ |B̃| = n and (iii ) follows.
Finally, B is linearly independent, so by (i ) with A = B we have
|B| ≤ n. Also B spans, so by (iii ) with C = B we have |B| ≥ n. This
yields |B| = n and (ii ) follows. Note, by (ii ), n is the common size of
all bases of V , and thus clearly n is uniquely determined. 
We use dimF V to denote the dimension of V .
Corollary 5.1. Let dimF V = n < ∞ and let A be a subset of V
of size n. Then the following are equivalent.
i. A is linearly independent.
ii. A is a basis.
iii. A spans V .
Proof. Certainly (ii ) implies (i ) and (iii ). Now assume that A
satisfies (i ). Then A is linearly independent and |A| = n. But there
can be no linearly independent subsets of size n + 1, so A is a maximal
linearly independent subset and hence is a basis for V . Finally, let
A satisfy (iii ). Then A spans V and |A| = n. But there can be no
spanning set of V of size n − 1, so A is a minimal spanning set and
hence it is a basis for V . 
At this point, a natural question to consider is whether a subspace
of a finite dimensional vector space must also be finite dimensional. We
proceed to answer this now.
Lemma 5.1. Let V be a finite dimensional vector space and let A
be a linearly independent subset of V . Then A can be extended to a
basis of V .
Proof. Let A = {α1 , α2 , . . . , αr } and let B = {β1 , β2 , . . . , βn } be a
basis for V of size n = dimF V . Since A is linearly independent and B
spans V , the Replacement Theorem implies that for a suitable ordering
of the β’s, the set
Ã = {α1 , α2 , . . . , αr , βr+1 , βr+2 , . . . , βn }
spans V . Clearly |Ã| ≤ n (since some duplication might occur), but Ã
spans V , so we must have |Ã| ≥ n. Thus Ã is a spanning set of size n
and, by the above corollary, Ã is a basis for V . □
Theorem 5.3. Let W be a subspace of a finite dimensional vector
space V . Then
i. W is finite dimensional and dimF W ≤ dimF V .
ii. dimF W = dimF V if and only if W = V .
iii. Any basis of W can be extended to a basis for V .
iv. There exists a subspace W ′ of V such that V = W ⊕ W ′.
Proof. Say dimF V = n. Let A be a linearly independent subset
of W . Then no nontrivial finite F -linear sum of elements of A can add
to zero. Furthermore, the arithmetic in W is the same as in V , so the
previous statement holds whether we view A as a subset of W or of V .
In particular, A is linearly independent in V . By the definition of the
dimension of V , we have |A| ≤ n.
We start with A0 = ∅, the empty subset of W . Then A0 is linearly
independent in W . If it is maximal with this property, then A0 is a
basis of W , and of course W = 0. If A0 is not maximal, then we can
adjoin a suitable vector of W to get A1 , a linearly independent subset
of W of size 1. If A1 is not maximal, we can adjoin a suitable vector of
W to get A2 , a linearly independent subset of W of size 2. We continue
with this process and, by the argument of the previous paragraph, it
must terminate at some Aj with j ≤ n. Thus Aj is a maximal linearly
independent subset of W , so Aj is a basis for W . This shows that
W has a finite spanning set, so it is finite dimensional. Then, by the
definition of dimension, we have dimF W = j ≤ n. This yields (i ).
Now let A = {α1 , α2 , . . . , αj } be any basis for W . Since A is linearly
independent in V , the above lemma implies that we can extend A to
B = {α1 , α2 , . . . , αj , βj+1 , βj+2 , . . . , βn }, a basis for V , thereby proving
(iii ).
Now if W = V , then certainly dimF W = dimF V . Conversely,
suppose dimF W = dimF V . Then we must have B = A above and in
particular no β’s occur. If γ ∈ V , then γ can be written as an F -linear
sum of elements of B = A. Since A ⊆ W , this implies that γ ∈ W and
thus V ⊆ W . Therefore clearly V = W and (ii ) follows.
Finally, set W ′ = ⟨βj+1 , βj+2 , . . . , βn ⟩. We show that V = W ⊕ W ′.
First, let γ ∈ V . Since B spans V , we have
γ = a1 α1 + a2 α2 + · · · + aj αj + bj+1 βj+1 + bj+2 βj+2 + · · · + bn βn
for suitable elements a1 , a2 , . . . , aj , bj+1 , bj+2 , . . . , bn ∈ F . If we set
α = a1 α1 + a2 α2 + · · · + aj αj and β = bj+1 βj+1 + bj+2 βj+2 + · · · + bn βn ,
then clearly α ∈ W , β ∈ W ′ and γ = α + β ∈ W + W ′. Thus
V = W + W ′.
Now suppose δ ∈ W ∩ W ′. Then from δ ∈ W and δ ∈ W ′, we
deduce that
δ = c1 α 1 + c 2 α 2 + · · · + c j α j
δ = dj+1 βj+1 + dj+2 βj+2 + · · · + dn βn
for suitable c1 , c2 , . . . , cj , dj+1 , dj+2 , . . . , dn ∈ F . Thus
c1 α 1 + c2 α 2 + · · · + cj α j
+ (−dj+1 )βj+1 + (−dj+2 )βj+2 + · · · + (−dn )βn = 0
Since B = {α1 , α2 , . . . , αj , βj+1 , βj+2 , . . . , βn } is linearly independent,
we conclude that all coefficients above must vanish. In particular, c1 =
c2 = · · · = cj = 0, so δ = 0 and W ∩ W ′ = 0. Thus V = W + W ′ =
W ⊕ W ′ and the theorem is proved. □
We remark that W ′ above is called a complement for W in V . Of
course, W ′ is by no means unique. We now consider a general relation
between two subspaces of V .
Theorem 5.4. Let W1 and W2 be two subspaces of a finite dimen-
sional vector space V . Then
dimF (W1 + W2 ) + dimF (W1 ∩ W2 ) = dimF W1 + dimF W2
Proof. We know that all of the above spaces are finite dimen-
sional. Let C = {γ1 , γ2 , . . . , γt } be a basis for W1 ∩ W2 , so that
dimF (W1 ∩ W2 ) = t. Since W1 ∩ W2 is a subspace of W1 , we can
extend C to a basis A = {γ1 , γ2 , . . . , γt , α1 , α2 , . . . , αr } of W1 . Simi-
larly, we can extend C to a basis B = {γ1 , γ2 , . . . , γt , β1 , β2 , . . . , βs } of
W2 . Observe that dimF W1 = t + r and dimF W2 = t + s, so
dimF W1 + dimF W2 − dimF (W1 ∩ W2 )
= (t + r) + (t + s) − t = r + s + t
In other words, what we want to prove is that dimF (W1 +W2 ) = r+s+t.
Now we have a nice set
D = {α1 , α2 , . . . , αr , β1 , β2 , . . . , βs , γ1 , γ2 , . . . , γt }
of r + s + t seemingly distinct vectors that are all clearly contained in
W1 + W2 . Thus the obvious approach is to prove that D is a basis for
W1 + W2 .
First let δ ∈ W1 + W2 . Then δ = α + β with α ∈ W1 and β ∈ W2 .
Since A spans W1 and B spans W2 , we have
α = a1 α1 + · · · + ar αr + c1 γ1 + · · · + ct γt
β = b1 β1 + · · · + bs βs + d1 γ1 + · · · + dt γt
for suitable field elements ai , bi , ci and di . Thus
δ = α + β = a1 α1 + · · · + ar αr + b1 β1 + · · · + bs βs
+ (c1 + d1 )γ1 + · · · + (ct + dt )γt
and D spans W1 + W2 .
Now suppose that some F -linear sum of the elements of D sums to
zero. Say
a1 α1 + · · · + ar αr + b1 β1 + · · · + bs βs + c1 γ1 + · · · + ct γt = 0
Then
a1 α1 + · · · + ar αr + c1 γ1 + · · · + ct γt = (−b1 )β1 + · · · + (−bs )βs
Now the above left-hand side is clearly in W1 and the right-hand side
is in W2 , so this common vector which we call δ is in W1 ∩ W2 . Thus,
since C spans W1 ∩ W2 , we have
δ = d1 γ1 + · · · + dt γt
for suitable di ∈ F . This yields the equations
a1 α1 + · · · + ar αr + c1 γ1 + · · · + ct γt = d1 γ1 + · · · + dt γt
(−b1 )β1 + · · · + (−bs )βs = d1 γ1 + · · · + dt γt
or equivalently
a1 α1 + · · · + ar αr + (c1 − d1 )γ1 + · · · + (ct − dt )γt = 0
b1 β1 + · · · + bs βs + d1 γ1 + · · · + dt γt = 0
Therefore, by the linear independence of A and B, we conclude that
a1 = · · · = ar = 0, b1 = · · · = bs = 0, d1 = · · · = dt = 0
and then c1 = · · · = ct = 0. Thus D is linearly independent and hence a
basis for W1 + W2 . Observe that the above also tells us that all r + s + t
elements of D are distinct so
dimF (W1 + W2 ) = |D| = r + s + t
and the result follows. 
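The dimension formula of Theorem 5.4 is easy to test by machine. The following is a small computational aside, not part of the formal development: a sketch in Python assuming the SymPy library is available, with W1 and W2 taken to be the row spaces of two arbitrarily chosen matrices inside Q5.

# A numerical check of Theorem 5.4, assuming SymPy is available.
from sympy import Matrix

A = Matrix([[1, 0, 2, 0, 1],
            [0, 1, 1, 1, 0]])                 # rows span W1
B = Matrix([[1, 1, 3, 1, 1],
            [0, 0, 0, 1, 2]])                 # rows span W2

dim_W1, dim_W2 = A.rank(), B.rank()
dim_sum = Matrix.vstack(A, B).rank()          # dim (W1 + W2)

# A vector lies in W1 ∩ W2 exactly when it equals u*A and also v*B for
# suitable rows u, v, that is when (u, v) with u*A + v*B = 0 lies in the
# left null space of the stacked matrix M below.
M = Matrix.vstack(A, B)
rows = []
for x in M.T.nullspace():                     # columns x with x^T * M = 0
    u = x[:A.rows, 0].T                       # the part of x multiplying A
    rows.append(u * A)                        # this vector lies in W1 ∩ W2
dim_int = Matrix.vstack(*rows).rank() if rows else 0

assert dim_sum + dim_int == dim_W1 + dim_W2   # Theorem 5.4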
Problems
5.1. Suppose that A and B are finite disjoint index sets. Convince
yourself that for vectors αi ∈ V we have
∑i∈A∪B αi = ∑i∈A αi + ∑i∈B αi
What does this say when A = ∅ is the empty set? What is a basis for
the space V = 0?
Find a basis and the dimension of each of the following F -vector
spaces.
5.2. V = F n .
5.3. V = F [x].
5.4. V = C over the field F = R.
5.5. V is the set of functions from a finite set U into the field F .
Let V be a vector space of finite dimension n.
5.6. Let W1 and W2 be subspaces of V with
dimF W1 + dimF W2 > n
Prove that W1 ∩ W2 ≠ 0.
5.7. Let V = Wk > Wk−1 > · · · > W1 > W0 = 0 be a finite chain
of distinct subspaces of V . Show that k ≤ n.
5.8. Consider the chain of subspaces in the preceding problem.
Show that there exist subspaces U0 , U1 , . . . , Uk of V such that
Wi = U0 ⊕ U1 ⊕ · · · ⊕ Ui
for all i.
5.9. Find two different complements for the subspace W of V where
W = ⟨(1, −1, 2, 0), (1, 1, 1, −1)⟩ and V = Q4 .
5.10. Let F ⊆ L ⊆ K be a chain of fields. Prove that
dimF K = (dimF L)(dimL K)
in the sense that if any two of these three numbers is finite, then so is
the third and the product formula holds. (See Problem 4.10.)
6. Matrices and Elementary Operations
Now we introduce a bit of notation and some computational tech-
niques. We are concerned with the following two rather concrete prob-
lems. Given a finite collection A = {α1 , α2 , . . . , αm } of vectors in F n ,
find a basis for the subspace V = ⟨A⟩ spanned by A. Of course, canon-
ical bases do not exist in general, so we could just try to find a nice
basis or we could look for a basis that is a subset of A.
To start with, let us write
αi = (ai1 , ai2 , . . . , ain ) ∈ F n
for i = 1, 2, . . . , m. Since the entries are double subscripted, it is tempt-
ing to put them into a rectangular array which we call a matrix. We
“hold these entries in place” by surrounding the array A with square
brackets, but we no longer use commas to separate the entries since
this would not work in the vertical direction. Thus we write
    ⎡ a11  a12  . . .  a1n ⎤
    ⎢ a21  a22  . . .  a2n ⎥
A = ⎢ . . . . . . . . . .  ⎥
    ⎣ am1  am2  . . .  amn ⎦
for the corresponding m × n (m by n) matrix. As usual the double
subscripts should be separated by a comma, but we will ignore this if
there is no possibility of confusion. We frequently abbreviate matrix
notation by writing A = [aij ], and we denote the set of all m × n
matrices by F m×n .
This matrix has m rows, labeled 1 through m going down, and n
columns, labeled 1 through n from left to right. Thus aij is the entry in
the ith row and jth column. Now the rows each have n ordered entries
and hence can be viewed as elements of F n . Indeed these correspond to
the original vectors A = {α1 , α2 , . . . , αm }. The subspace of F n spanned
by these row vectors is called the row space of A. Similarly, we note
that the columns of A each have m ordered entries. Thus by mentally
rotating these columns 90◦ , we can view these as vectors in F m . The
subspace of F m spanned by these column vectors is then called the
column space of A.
Now to find a basis for V = ⟨A⟩ we are free to modify the vectors
in A, and this gives rise to corresponding operations on the rows of the
matrix. Indeed, we list the three so-called elementary row operations
that are of interest.
R1. Interchange any pair of rows. More precisely, if we interchange
rows i and k, we write this operation as R1 (i, k).
R2. Multiply the ith row by a nonzero constant c so that the en-
try aij becomes c·aij for all j = 1, 2, . . . , n. We denote this
operation by R2 (i; c).
R3. Finally, for i = k, we add c ∈ F times the kth row to the ith,
so that the entry aij becomes aij + c·akj for all j = 1, 2, . . . , n.
We denote this operation by R3 (i, k; c).
Notice that R2 (i; 1) and R3 (i, k; 0) are both equal to Id, the identity
operation that leaves A unchanged. The key property of all these
operations is that they are invertible. Indeed, one can undo each of
these with another elementary row operation.
Lemma 6.1. Each elementary row operation is invertible with its
inverse being an elementary row operation of the same type.
Proof. Clearly R1 (i, k)R1 (i, k) = Id and R2 (i; c−1 )R2 (i; c) = Id.
Finally since R3 (i, k; c) does not change the kth row of A, we see that
R3 (i, k; −c)R3 (i, k; c) = Id. 
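For readers who wish to experiment, here is a small computational sketch of the three operations and of Lemma 6.1, written in Python with rational entries; the matrix below is an arbitrary example, the 0-based row indices differ from the text, and this aside is not part of the formal development.

# A sketch of the elementary row operations over F = Q, with a matrix stored
# as a list of rows whose entries are fractions.Fraction values.
from fractions import Fraction

def R1(A, i, k):                      # interchange rows i and k
    A[i], A[k] = A[k], A[i]

def R2(A, i, c):                      # multiply row i by the nonzero scalar c
    A[i] = [c * a for a in A[i]]

def R3(A, i, k, c):                   # add c times row k to row i  (i != k)
    A[i] = [a + c * b for a, b in zip(A[i], A[k])]

# Each operation is undone by one of the same type, as in Lemma 6.1:
A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
R3(A, 0, 1, Fraction(5))
R3(A, 0, 1, Fraction(-5))
assert A == [[1, 2], [3, 4]]          # Fraction(1) == 1, and so on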
With this, we can quickly prove that elementary row operations do
not change the row space. Specifically, we have
Lemma 6.2. If A is an m × n matrix over F , and if R is an ele-
mentary row operation, then the row spaces of A and of R(A) are the
same.
Proof. Write B = R(A). By considering the three operations in
turn, we see easily that each row vector of B belongs to the row space
of A and hence the row space of B is contained in the row space of A.
Furthermore, since R−1 (B) = R−1 (R(A)) = A, we obtain the reverse
inclusion and consequently the two row spaces are equal. 
Now let us see what we can do with these operations. To start with,
we describe a fairly nice matrix structure. We say that a matrix A is
in row echelon form if
E1. The zero rows of A, if any, are at the bottom, that is they are
all below the nonzero rows.
E2. Each nonzero row, if any, starts with a leading 1. That is, the
first nonzero entry, from left to right, is a 1.
E3. The leading 1s slope down and to the right. Specifically, if the
rows i and i + 1 are both nonzero, then the leading 1 in row
i+1 is contained in a column strictly to the right of the leading
1 of row i.
For example, we have
Example 6.1. The 5 × 6 real matrix
             ⎡ 0 1 2 3 4 6 ⎤
             ⎢ 0 0 0 1 3 1 ⎥
A = [aij ] = ⎢ 0 0 0 0 1 4 ⎥
             ⎢ 0 0 0 0 0 0 ⎥
             ⎣ 0 0 0 0 0 0 ⎦
is in row echelon form since (1) the two zero rows are at the bottom,
(2) each of the nonzero rows starts with an entry 1, and (3) the leading
1s slant down and to the right. Note that the leading 1s are the entries
a12 , a24 and a35 . On the other hand, a26 = 1 but it is not a leading 1.
It follows from (E1) and (E3) that the entries below each leading
1 are all 0. But what about the entries above the leading 1s? We say
that the matrix A is in reduced row echelon form if it satisfies (E1),
(E2) and (E3), so that it is in row echelon form, and also satisfies
E4. In each column determined by a leading 1, all the remaining
entries are 0.
Example 6.2. The matrix A of Example 6.1 is not in reduced row
echelon form, but the modified matrix
             ⎡ 0 1 2 0 0 6 ⎤
             ⎢ 0 0 0 1 0 1 ⎥
B = [bij ] = ⎢ 0 0 0 0 1 4 ⎥
             ⎢ 0 0 0 0 0 0 ⎥
             ⎣ 0 0 0 0 0 0 ⎦
is in reduced row echelon form since the entries in the second, fourth
and fifth columns, that are not the leading 1s, are all equal to 0.
Reduced row echelon form matrices are important because of the
uniqueness property described below.
Theorem 6.1. Let A ∈ F m×n . Then we have
i. Using a sequence of elementary row operations, the matrix A
can be transformed to a reduced row echelon matrix A′.
ii. The nonzero rows of A′ form a basis for the row space of A.
In particular, the dimension of the row space of A is equal to
the number of nonzero rows of A′.
iii. The matrix A′ is uniquely determined by A, independent of the
sequence of elementary row operations that are used.
Proof. We show how to transform A = [aij ] into a reduced row
echelon matrix. If all entries of A are 0, we are done, so suppose not and
say the sth column is the left-most nonzero column. If ars ≠ 0, then by
interchanging the rth and first rows, if necessary, we can place ars in the
1, s-position. Thus we may assume that a1s ≠ 0 and by multiplying the
first row by (a1s )−1 , we can now assume that a1s = 1. Next, we successively
add −ais times the first row to the ith for i = 2, 3, . . . , m. This leaves
the first row unchanged, but modifies the others so that all remaining
entries in the sth column are 0.
Next, we consider the submatrix consisting of rows 2 through m.
Note that in this matrix the first s columns are 0. Again, if all entries
are 0, we are done. If not let t be the left-most nonzero column, so
that t > s. If art ≠ 0, then interchanging the second and rth rows, if
necessary, we can put art in the 2, t-position. Thus we may assume that
a2t ≠ 0 and by multiplying the second row by (a2t )−1 , we can now assume
that a2t = 1. Next, we successively add −ait times the second row to
the ith for all i = 1, 3, . . . , m. This leaves the sth column unchanged
since a2s = 0, but modifies the tth column so that all entries, including
the first, but not the leading 1, have become 0. We now move on to
the submatrix consisting of rows 3 through m and, by continuing in
this manner, we can clearly transform A into a reduced row echelon
matrix. With this, part (i ) follows.
For parts (ii) and (iii ), let α1 , α2 , . . . , αr be the nonzero row vectors
of A′, in their natural order, with leading 1s in columns s1 , s2 , . . . , sr
respectively. If
β = x1 α1 + x2 α2 + · · · + xr αr
with x1 , x2 , . . . , xr ∈ F , then the sj th entry of β is precisely equal
to xj . In particular, if β = 0, then all xj = 0 and we conclude
that α1 , α2 , . . . , αr are linearly independent. On the other hand, we
know that the row space V of A is equal to the row space of A′, so
α1 , α2 , . . . , αr span V . Thus {α1 , α2 , . . . , αr } is indeed a basis for
V . This proves (ii ) and consequently r = dimF V .
Finally, observe that β above is a typical element of V and our
comments concerning the sj th entry of β imply that αr is the unique
vector in V with a leading 1 and the largest number of preceding 0s.
Thus V , and hence A, determines the row vector αr and also the column
sr . Next consider the nonzero vectors in V with 0 entry in position sr .
If β above is such an element, then xr = 0 and it follows that αr−1 is
the unique such vector with a leading 1 and with the largest number
of preceding 0s. Thus V , and hence A, determines the row vector αr−1
and also the column sr−1 . Continuing in this manner with vectors
αr−2 , . . . , α1 , in turn, we see that V , and hence A, determines all the
rows of A′ in order. Thus (iii ) is proved. □
Example 6.3. The procedure described in the preceding theorem
for converting A into a reduced row echelon matrix is automatic and
can be easily programmed on a computer. However, humans abhor
fractions, so we can sometimes make appropriate choices to avoid cer-
tain divisions that might occur. Since there is a unique answer, we
might as well take a path that is computationally simpler. Consider
for example, the integer matrix
     ⎡ 0 2  4 1 3 ⎤
A0 = ⎢ 0 5 10 3 8 ⎥
     ⎣ 0 3  6 2 2 ⎦
Dividing by 2, 5 or 3 will certainly introduce fractions. So we first
apply R3 (1, 3; −1), namely we subtract the third row from the first.
This yields
     ⎡ 0 −1 −2 −1 1 ⎤
A1 = ⎢ 0  5 10  3 8 ⎥
     ⎣ 0  3  6  2 2 ⎦
and then multiplying the first row by −1, we have
     ⎡ 0 1  2 1 −1 ⎤
A2 = ⎢ 0 5 10 3  8 ⎥
     ⎣ 0 3  6 2  2 ⎦
At this point, we subtract 5 times the first row from the second, and
then 3 times the first row from the third to obtain
     ⎡ 0 1 2  1 −1 ⎤
A3 = ⎢ 0 0 0 −2 13 ⎥
     ⎣ 0 0 0 −1  5 ⎦
In the fourth column, it is certainly better to deal with the −1 entry
rather than dividing by 2. So we interchange rows 2 and 3, and then
multiply the new second row by −1. This yields
     ⎡ 0 1 2  1 −1 ⎤
A4 = ⎢ 0 0 0  1 −5 ⎥
     ⎣ 0 0 0 −2 13 ⎦
Next subtracting the second row from the first, and adding 2 times the
second row to the third yields
     ⎡ 0 1 2 0  4 ⎤
A5 = ⎢ 0 0 0 1 −5 ⎥
     ⎣ 0 0 0 0  3 ⎦
At this point, we have no choice but to divide the last row by 3. For-
tunately all the other entries in that row are 0, so again we do not
introduce fractions in the matrix
     ⎡ 0 1 2 0  4 ⎤
A6 = ⎢ 0 0 0 1 −5 ⎥
     ⎣ 0 0 0 0  1 ⎦
Finally, we subtract 4 times the third row from the first, and add 5
times the third row to the second to obtain the reduced row echelon
matrix
     ⎡ 0 1 2 0 0 ⎤
A7 = ⎢ 0 0 0 1 0 ⎥
     ⎣ 0 0 0 0 1 ⎦
and the procedure is complete.
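The reduction above can be checked by machine. The following brief sketch assumes the Python library SymPy; its rref() method carries out a reduction of the kind described in Theorem 6.1, so by part (iii) of that theorem it should return the same matrix A7 found by hand.

# A quick check of Example 6.3, assuming SymPy is available.
from sympy import Matrix

A0 = Matrix([[0, 2,  4, 1, 3],
             [0, 5, 10, 3, 8],
             [0, 3,  6, 2, 2]])

A7, pivots = A0.rref()    # reduced row echelon form and leading-1 columns
print(A7)                 # Matrix([[0, 1, 2, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]])
print(pivots)             # (1, 3, 4) -- the 0-based columns of the leading 1s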
Problems
6.1. If A is a matrix in row echelon form, prove that the nonzero
rows of A are linearly independent. This can be done without serious
computation.
6.2. List the analogous elementary column operations and prove
the column version of Theorem 6.1.
6.3. Prove that any elementary row operation R and any elemen-
tary column operation C commute in their action on matrices. To be
precise, show that R(C(A)) = C(R(A)) for all matrices A.
6.4. Assume that row subscripts {i1 , i2 } and {k1 , k2 } are disjoint.
Show that the elementary row operations R3 (i1 , k1 ; c1 ) and R3 (i2 , k2 ; c2 )
commute.
6.5. Let A ∈ F m×n . Using a sequence of elementary row and column
operations show that any matrix A can be transformed into a matrix
of the form Dr = [dij ], where d11 = d22 = · · · = drr = 1 and all the
remaining dij are 0.
6.6. Find an example of an integer matrix whose corresponding
reduced row echelon form matrix does not have all integer entries.
6.7. Find a “slanted” basis for the row space of matrix
    ⎡ 1 1  1  2 0 ⎤
A = ⎢ 2 3  0  1 3 ⎥
    ⎢ 1 2 −1 −1 3 ⎥
    ⎣ 0 1 −2 −2 1 ⎦
That is, find a basis that comes from an appropriate reduced row ech-
elon matrix.
6.8. Suppose A ∈ F n×n is a square matrix with linearly independent
rows. If A is converted into the reduced row echelon form matrix A′,
find the structure of A′.
6.9. If we “unwrap” matrices into straight horizontal lines, then
F m×n surely looks like the vector space F mn . With this idea in hand,
define addition and scalar multiplication so that F m×n becomes a vector
space over F .
6.10. If F m×n is viewed as a vector space as above, determine its
dimension and find a nice basis.
7. Linear Equations and Bases
The elementary row operations are of course familiar because we
use them to solve systems of linear equations via Gaussian elimination.
Thus suppose we are given the system
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
·········
am1 x1 + am2 x2 + · · · + amn xn = bm
of m linear equations in the n unknowns x1 , x2 , . . . , xn . Of course, we
assume that the field elements aij and bj are known.
We can clearly group these constants into several appropriate ma-
trices. To start with, we have
    ⎡ a11  a12  · · ·  a1n ⎤          ⎡ b1 ⎤
A = ⎢ a21  a22  · · ·  a2n ⎥ ,    B = ⎢ b2 ⎥
    ⎢ . . . . . . . . . .  ⎥          ⎢ ·  ⎥
    ⎣ am1  am2  · · ·  amn ⎦          ⎣ bm ⎦
where A is the matrix of coefficients and B is the matrix of constants.
Notice that the columns of A are labeled 1, 2, . . . , n and these naturally
correspond to the n unknowns x1 , x2 , . . . , xn .
Furthermore, we can combine the two matrices to form
      ⎡ a11  a12  · · ·  a1n | b1 ⎤
A|B = ⎢ a21  a22  · · ·  a2n | b2 ⎥
      ⎢ . . . . . . . . . .  | ·  ⎥
      ⎣ am1  am2  · · ·  amn | bm ⎦
the so-called augmented matrix with the constants on the right. The
vertical line is of course not formally part of the matrix structure,
but it does help us to better visualize how the matrix is partitioned.
This matrix clearly contains all the information given by the system of
equations, except perhaps for the names of the unknowns. As is to be
expected, we have
Lemma 7.1. Let R be an elementary row operation. Then the set
of solutions to the linear system associated to the augmented matrix
A|B is the same as the set of solutions to the linear system associated
to the augmented matrix R(A|B).
Proof. Write R(A|B) = A′|B ′ and let (x1 , x2 , . . . , xn ) ∈ F n be a
solution to the system of equations
ai1 x1 + ai2 x2 + · · · + ain xn = bi
associated with A|B. In other words, these are honest expressions in F
that hold for all i = 1, 2, . . . , m. Obviously these hold if we interchange
equations i and k or if we multiply equation i by c ∈ F to yield
(c·ai1 )x1 + (c·ai2 )x2 + · · · + (c·ain )xn = c·bi
Finally, if we add c times the kth equation to the ith, we get
(ai1 + c·ak1 )x1 + (ai2 + c·ak2 )x2 + · · · + (ain + c·akn )xn = bi + c·bk
It follows that (x1 , x2 , . . . , xn ) is also a solution of the system associated
with A′|B ′. For the reverse inclusion, we just observe from Lemma 6.1
that A|B = R−1 (A′|B ′). □
In view of the above, we are obviously motivated to convert the
augmented matrix into reduced row echelon form and to consider the
leading 1s.
Example 7.1. Suppose the augmented matrix A|B has been con-
verted to A′|B ′, a matrix in reduced row echelon form, and suppose a
leading 1 occurs in the last column, the column of constants. Then
the corresponding row of A′|B ′ consists of all 0s followed by a 1 at the
end, and the corresponding linear equation is clearly 0 = 1. Of course,
this contradicts (M4) in the definition of a field. It follows that this
system of equations is inconsistent, that is there are no solutions.
Thus it makes sense to assume that the leading 1s occur in the first
n columns, namely the columns of A′ and of A. But, as we observed,
the columns of A correspond to the unknowns x1 , x2 , . . . , xn so we can
introduce the following notation. We say that the unknowns corre-
sponding to the columns with a leading 1 are the bound variables and
the remaining unknowns are the free variables. The key result here is
that solutions always exist and that the free variables can be “freely
chosen”.
Theorem 7.1. Given a system of linear equations with augmented
matrix A|B. Let A′|B ′ be the reduced row echelon form matrix obtained
from A|B, and assume that no leading 1 occurs in the B ′ column. Then
for any choice of free variables in F , there exist unique bound variables
in F that yield a solution to the original system of equations.
Proof. Since the solution set of the systems corresponding to A|B
and to A′|B ′ are identical, we can assume that A = A′ and B = B ′. Let
s1 , s2 , . . . , sr be the column numbers of A corresponding to the leading
1s and let 𝓕 denote the set of remaining column numbers. Then since
A|B is in reduced row echelon form, the r nonzero equations look like
xsi + ∑k∈𝓕 xk aik = bi
for i = 1, 2, . . . , r. In particular, if the free variables are chosen in
any way, then the bound variables are uniquely determined by these
equations and all such equations are satisfied. 
In view of Example 3.5, we know that the solution set of a system
of homogeneous equations in n unknowns is a subspace of F n . A close
look at the proof of the preceding theorem yields the following more
precise formulation of that result.
Corollary 7.1. Given a system of homogeneous linear equations
with matrix of coefficients A, and let A′ = [a′ij ] be the reduced row ech-
elon matrix obtained from A. If s1 , s2 , . . . , sr are the column numbers
of A′ having a leading 1, and if 𝓕 is the complementary set of col-
umn numbers, then the solution set of the given equations has the basis
𝓐 = {αi | i ∈ 𝓕}. Here αi has a 1 as its ith entry, 0 as its kth entry
for i ≠ k ∈ 𝓕, and −a′ji as its sj th entry for j = 1, 2, . . . , r.
Proof. For each i ∈ 𝓕, let αi ∈ F n be the unique solution of the
system of homogeneous linear equations associated with A′ satisfying
xi = 1 and xk = 0 for all i ≠ k ∈ 𝓕. Then αi has a 1 as its ith entry
and a 0 as its kth entry for all i ≠ k ∈ 𝓕. Furthermore, we know
that the solution sets for A and for A′ are identical, and that the jth
nonzero row of A′ gives rise to the homogeneous equation
xsj + ∑k∈𝓕 xk a′jk = 0
so xsj = −a′ji . In other words, αi has −a′ji as its sj th entry for j =
1, 2, . . . , r and αi is now completely understood.
Finally, for all field elements ci with i ∈ 𝓕, we see that
β = ∑i∈𝓕 ci αi
is a solution to the system of homogeneous linear equations with xi = ci
for all i ∈ 𝓕. Thus, by Theorem 7.1, β is the unique such solution.
With this, it follows immediately that 𝓐 = {αi | i ∈ 𝓕} is a basis for
the solution set. 
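As a small computational aside (a sketch assuming the Python library SymPy; the matrix below is an arbitrary example, not one taken from the text), the nullspace() method returns a basis of exactly the form just described: one vector for each free column, with a 1 in that position and the negated bound entries in the positions of the leading 1s.

# A minimal illustration of Corollary 7.1, assuming SymPy is available.
from sympy import Matrix

A = Matrix([[1, 2, 0, 3],
            [0, 0, 1, 4]])        # already in reduced row echelon form
for alpha in A.nullspace():       # one basis vector per free column
    print(alpha.T)
# prints Matrix([[-2, 1, 0, 0]]) and Matrix([[-3, 0, -4, 1]])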
As a consequence, we can easily solve the second problem posed at
the beginning of the previous section. Namely, given a finite subset
A = {α1 , α2 , . . . , αm } of F n , how do we find a subset of A that forms
a basis for V = ⟨A⟩. At the very least, we should start by checking
whether the set A is linearly independent. Indeed, if this occurs then
A is both a spanning and linearly independent subset of V , and hence
a basis for V . Surprisingly, this is all we have to do.
To begin with, let us write
αi = (ai1 , ai2 , . . . , ain ) ∈ F n
for i = 1, 2, . . . , m. In the preceding section, we formed the m × n
matrix A = [aij ] and studied its row space. Here we consider the
equation
x1 α 1 + x2 α 2 + · · · + xm α m = 0
to see whether A is linearly independent or not. By checking the n
entries in F n , this gives rise to the system of linear equations
a11 x1 + a21 x2 + · · · + am1 xm = 0
a12 x1 + a22 x2 + · · · + am2 xm = 0
·········
a1n x1 + a2n x2 + · · · + amn xm = 0
Now the matrix of coefficients of this system is not the matrix A.
Rather its size is n × m, the rows and columns are interchanged and it
is the transpose of A, namely
     ⎡ a11  a21  · · ·  am1 ⎤
AT = ⎢ a12  a22  · · ·  am2 ⎥
     ⎢ . . . . . . . . . .  ⎥
     ⎣ a1n  a2n  · · ·  amn ⎦
Essentially, if A = [aij ] then AT = [a′ij ] where a′ij = aji . It is clear
that the row space of AT is identical to the column space of A. Notice
that the augmented matrix of the above system of linear equations is
AT |Z where Z is the column matrix having all entries equal to 0. The
following result describes how to find a subset of A that forms a basis
for V = A.
Theorem 7.2. Let A = {α1 , α2 , . . . , αm } be a finite subset of F n
and consider the system of n equations in m unknowns determined by
x1 α 1 + x2 α 2 + · · · + xm α m = 0
Then the vectors αi corresponding to the bound variables xi form a basis
for V = ⟨A⟩ ⊆ F n . In particular, the dimension of V is equal to the
number of bound variables.
Proof. Suppose the bound variables have subscripts s1 , s2 , . . . , sr .
We have to show that B = {αs1 , αs2 , . . . , αsr } is linearly independent
and spans V = ⟨A⟩. To start, consider
x s1 α s1 + x s2 α s2 + · · · + x sr α sr = 0
for suitable xsi ∈ F . Notice that this is a special case of the equation
∑j xj αj = 0 but with all free variables equal to 0. Since this equation
has the all 0s solution, the uniqueness aspect of Theorem 7.1 implies
that all xsi = 0 and hence B is linearly independent.
To show that B spans, it clearly suffices to show that each αk ,
with xk a free variable, is contained in the subspace ⟨B⟩. To this
end, since we know that we can choose the free variables arbitrarily,
let xk = −1 and let all other free variables be 0. Then there exist
bound variables xsi that satisfy the linear system, and the equation
∑j xj αj = 0 simplifies to
(−1)αk + ∑ri=1 xsi αsi = 0
But then αk = ∑i xsi αsi ∈ ⟨B⟩ and we see that B spans. It follows
that B is a linearly independent spanning set in V , so B is indeed a
basis for this space. 
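In practice the bound variables, and hence the desired subset of A, can be read off from the reduced row echelon form of AT. The following is a minimal sketch assuming the Python library SymPy, with an arbitrarily chosen set of vectors in Q3.

# A sketch of Theorem 7.2, assuming SymPy is available.
from sympy import Matrix

alphas = [(1, 2, 3), (2, 4, 6), (0, 1, 1), (1, 3, 4)]   # a finite subset A of Q^3
AT = Matrix(alphas).T          # the transpose: its columns are the alpha_i
_, bound = AT.rref()           # 0-based subscripts of the bound variables
basis = [alphas[i] for i in bound]
print(basis)                   # [(1, 2, 3), (0, 1, 1)] -- a basis for the span of A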
If A ∈ F m×n , then the row space of A is a subspace of F n and the
column space of A is a subspace of F m . As a consequence of all of the
above, we have the remarkable
Corollary 7.2. If A ∈ F m×n , then the row space of A and the
column space of A have the same dimension. This common dimension
is called the rank of A.
Proof. Let us start with the matrix AT and via a sequence of
elementary row operations, convert it into a reduced row echelon form
matrix B. If Z is a column matrix having all 0 entries, then clearly
AT |Z converts to B|Z via the same sequence of operations. Of course,
AT |Z is the augmented matrix for the system of n equations in m
unknowns given by
x 1 α 1 + x2 α 2 + · · · + xm α m = 0
where the αi are the row vectors of A.
Note that V = ⟨α1 , α2 , . . . , αm ⟩ is the row space of A, and by Theo-
rem 7.2, the dimension of V is equal to the number of bound variables
xi and hence equal to the number of leading 1s in B. This is surely
equal to the number of nonzero rows of B and hence by Theorem 6.1(ii),
this is the dimension of the row space of AT . Since the row space of
AT is identical to the column space of A, the result follows. 
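Corollary 7.2 is also easy to test numerically; the following one-line check is a sketch assuming the Python library SymPy, applied to an arbitrarily chosen matrix.

# Row rank equals column rank (Corollary 7.2): a quick check assuming SymPy.
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [2, 4, 6],
            [1, 0, 1]])
assert A.rank() == A.T.rank() == 2    # this common value is the rank of A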
Problems
7.1. Using Gaussian elimination, solve the system of real linear
equations given by
5x1 + 5x2 − x3 + 7x4 + 2x5 + 5x6 = 9
x1 + x2 − x3 − x4 + x5 − x6 = 2
4x1 + 4x2 − x3 + 5x4 + 2x5 + 4x6 = 8
Which of the variables are free, which are bound, and what is the
solution when all free variables are set to 0?
7.2. Find a basis for the solution space to the system of homoge-
neous linear equations given by
x1 + 2x2 + x3 + 3x4 + x5 + x6 = 0
2x1 + 4x2 + 2x3 + 6x4 + 3x5 + 5x6 = 0
x1 + 2x2 + x3 + 2x4 + 0x5 − x6 = 0
7.3. Without computation, show that the elementary column op-
erations on A do not change the dimension of the row space of A.
Similarly show that the elementary row operations on A do not change
the dimension of the column space of A.
7.4. Show that the integer r in Problem 6.5 is uniquely determined.
Indeed it is the dimension of the row space of A and of the column
space of A.
7.5. Find a subset of the rows of the matrix A of Problem 6.7
that form a basis for its row space.
7.6. Let A be a square matrix so that A has the same number of
rows as columns. Prove that the rows of A are linearly independent if
and only if the columns are linearly independent.
7.7. Again let A ∈ F n×n be a square matrix and assume that its
rows are linearly independent. Consider the system of linear equations
associated with A|B and let the reduced row echelon matrix of this
augmented matrix be A′|B ′. Using Problem 6.8, show that B ′ describes
the unique solution to the system of equations.
7.8. State and prove the appropriate analog of Corollary 7.1 in the
nonhomogeneous situation.
Let A ∈ F m×n and let B1 , B2 , . . . , Bt be m × 1 column matrices.
Consider the system of equations associated with the augmented ma-
trices A|Bj for j = 1, 2, . . . , t and form the large augmented matrix
A|B, where B is an m × t matrix with columns B1 , B2 , . . . , Bt . Via
elementary row operations, convert A|B to the reduced row echelon
form matrix A′|B ′.
7.9. Explain how A′|B ′ can be used to solve the systems associated
to the various A|Bj for all j.
7.10. The comments in Example 7.1 do not hold precisely in this
context. How can one tell from the matrix A′|B ′ that the system A|Bj
is inconsistent?
CHAPTER II

Linear Transformations
8. Linear Transformations
So far our study of linear algebra has been confined to vector spaces.
But in fact the essence of this subject is the study of certain functions
defined on these spaces. This is analogous to the situation in calculus.
There one first considers the real line, but the main interest of course
is in the study of real valued functions defined on the real line.
Let us consider a set of simultaneous linear equations over a field
F . Say, for simplicity
e 1 x1 + e 2 x2 + e 3 x3 = b
f 1 x1 + f 2 x2 + f 3 x3 = c
We will think of the coefficients e1 , e2 , e3 , f1 , f2 , f3 as being fixed. What
we would like to know is for which choices of b and c do solutions exist
and then how many solutions are there. Now starting with b and c and
finding x1 , x2 , x3 takes a certain amount of work. On the other hand,
starting with x1 , x2 , x3 and finding b and c from the above is trivial.
Of course, if we look at all pairs b, c and all triples x1 , x2 , x3 , then the
above two considerations are really the same. Therefore we take the
second point of view since it is certainly simpler.
What we have here is a map T that takes ordered triples (x1 , x2 , x3 )
with entries in F to ordered pairs (b, c) with entries in F . In other
words,
T : F3 → F2
But T is not any old function. It is defined in a linear fashion and
therefore we expect it to have some nice properties.
Let α = (x1 , x2 , x3 ), β = (y1 , y2 , y3 ) ∈ F 3 with (x1 , x2 , x3 )T = (b, c)
and (y1 , y2 , y3 )T = (b′, c′). Observe that we have written this function
T to the right of its argument. Then
e1 (x1 + y1 ) + e2 (x2 + y2 ) + e3 (x3 + y3 )
= (e1 x1 + e2 x2 + e3 x3 ) + (e1 y1 + e2 y2 + e3 y3 )
= b + b′
and similarly
f1 (x1 + y1 ) + f2 (x2 + y2 ) + f3 (x3 + y3 ) = c + c′
Thus we have
(α + β)T = (x1 + y1 , x2 + y2 , x3 + y3 )T
= (b + b′, c + c′) = αT + βT
Now let a ∈ F . Then
e1 (ax1 )+e2 (ax2 ) + e3 (ax3 )
= a(e1 x1 + e2 x2 + e3 x3 ) = ab
and similarly
f1 (ax1 ) + f2 (ax2 ) + f3 (ax3 ) = ac
Thus we have
(aα)T = (ax1 , ax2 , ax3 )T
= (ab, ac) = a(αT )
and therefore T is a linear transformation.
Let V and W be two vector spaces over the same field F . A linear
transformation from V to W is a function T : V → W satisfying
T1. For all α, β ∈ V , we have
(α + β)T = αT + βT
T2. For all α ∈ V and a ∈ F , we have
(aα)T = a(αT )
In other words, T somehow intertwines the arithmetic of V and
of W . We have already seen that the function T as defined earlier is
indeed a linear transformation. We now consider some more examples.
Example 8.1. For any two vector spaces V, W over F there is
always the map T : V → W given by αT = 0 for all α ∈ V . This is
clearly a linear transformation which we denote, as expected, by 0.
Example 8.2. The identity map I : V → V is a linear transforma-
tion. Here of course αI = α for all α ∈ V .
Example 8.3. Let W be a subspace of V . Then there is a natural
map, the injection of W into V , given by αT = α for all α ∈ W . Ob-
serve that the second α is considered to be a vector in V . T is of course
a linear transformation since W and V share the same arithmetic.
In the other direction, suppose that W has a complement W ′ in
V so that V = W ⊕ W ′. Then each element α ∈ V can be written
uniquely as β + β ′ with β ∈ W and β ′ ∈ W ′. This means that the
projection map P : V → W given by αP = β is well defined, and then
it is easily seen to be a linear transformation. We remark that different
complements for W give rise to different projection maps.

ISTUDY
56 II. LINEAR TRANSFORMATIONS

Example 8.4. Let V be a vector space over the field F . For each
a ∈ F define Ta : V → V by αTa = aα for all α ∈ V . It follows from
the distributive and associative laws that Ta is a linear transformation.
Observe that T0 = 0 and that, by the unital law, T1 = I.
Example 8.5. Let C be considered as a vector space over R. For
each complex number α, let Re(α) denote its real part and Im(α) its
imaginary part. Then both Re : C → R and Im : C → R are linear
transformations of real vector spaces.
Example 8.6. Let C denote the vector space of real valued con-
tinuous functions on the interval [0, 1]. Let S : C → R be defined by
f (x)S = ∫₀¹ f (t) dt
for all f (x) ∈ C. Then S is a linear transformation from C to R.
Example 8.7. Let C be as above and define J : C → C by
(f J)(x) = ∫₀ˣ f (t) dt
Then J is easily seen to be a linear transformation.
Example 8.8. Given integers m, n ≥ 1, we define T : F n → F m
as follows. First fix mn elements aij ∈ F with i = 1, 2, . . . , m and
j = 1, 2, . . . , n. Then set
(c1 , c2 , . . . , cn )T = ( ∑j a1j cj , ∑j a2j cj , . . . , ∑j amj cj )
where each sum runs over j = 1, 2, . . . , n.
This is of course a generalization of our original example and it is surely
a linear transformation.
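As a concrete computational sketch (in Python; the particular coefficients aij are an arbitrary choice, and vectors are written to the left of T as in the text), the defining properties T1 and T2 can be verified directly for such a map.

# A sketch of Example 8.8 with m = 2 and n = 3 over Q.
a = [[1, 0, 2],       # the a_{1j}
     [3, -1, 1]]      # the a_{2j}

def T(c):
    # (c1, ..., cn)T = (sum_j a_{1j} c_j , ..., sum_j a_{mj} c_j)
    return tuple(sum(row[j] * c[j] for j in range(len(c))) for row in a)

alpha, beta = (1, 2, 3), (0, 1, -1)
s = tuple(x + y for x, y in zip(alpha, beta))                            # alpha + beta
assert T(s) == tuple(x + y for x, y in zip(T(alpha), T(beta)))           # property T1
assert T(tuple(5 * x for x in alpha)) == tuple(5 * y for y in T(alpha))  # property T2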
Example 8.9. Let V be the vector space of all functions from
a set U into the field F . Fix u0 ∈ U . Then the evaluation map
E0 : V → F given by αE0 = α(u0 ) is a linear transformation. More
generally, suppose u1 , u2 , . . . , un ∈ U . Then we can define the map
E : V → F n by
αE = (α(u1 ), α(u2 ), . . . , α(un ))
and E is a linear transformation.
Example 8.10. Finally let V = F [x]. We can define the formal
derivative D : V → V by
( ∑i≥0 ai xi ) D = ∑i≥1 iai xi−1
It is easy to see that D is a linear transformation which enjoys most of
the properties of ordinary differentiation. In particular, one can show
that
(αβ)D = (αD)β + α(βD)
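A small computational sketch of D (in Python, with a polynomial a0 + a1 x + · · · + an xn stored as its list of coefficients [a0, a1, . . . , an]) may help make the definition concrete; it is an aside, not part of the formal development.

# The formal derivative D of Example 8.10 acting on coefficient lists.
def D(p):
    return [i * p[i] for i in range(1, len(p))]

assert D([1, 2, 3]) == [2, 6]     # (1 + 2x + 3x^2)D = 2 + 6x
assert D([7]) == []               # constants map to 0, the empty list here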
Before we consider a general means of constructing such maps, let
us first note a few elementary properties of linear transformations.
Lemma 8.1. Let T : V → W be a linear transformation. Then for
all ai ∈ F and αi ∈ V we have
(a1 α1 + a2 α2 + · · · + ak αk )T
= a1 (α1 T ) + a2 (α2 T ) + · · · + ak (αk T )
Moreover 0T = 0 and (−α)T = −(αT ) for all α ∈ V .
Proof. It follows easily by induction that for βi ∈ V we have
(β1 + β2 + · · · + βk )T = β1 T + β2 T + · · · + βk T
Thus
(a1 α1 + a2 α2 + · · · + ak αk )T = (a1 α1 )T + (a2 α2 )T + · · · + (ak αk )T
= a1 (α1 T ) + a2 (α2 T ) + · · · + ak (αk T )
Now let α ∈ V . Then 0 = 0α so
0T = (0α)T = 0(αT ) = 0
Finally
(−α)T = ((−1)α)T = (−1)(αT ) = −(αT )
and the lemma is proved. 
We show now that linear transformations are plentiful.
Theorem 8.1. Let V be a finite dimensional vector space with basis
B = {β1 , β2 , . . . , βn } and let W be another space over the same field F .
Then for any choice of γ1 , γ2 , . . . , γn ∈ W , there exists one and only
one linear transformation T : V → W with βi T = γi for i = 1, 2, . . . , n.
In fact, T is given by
(a1 β1 + a2 β2 + · · · + an βn )T = a1 γ1 + a2 γ2 + · · · + an γn
for all a1 , a2 , . . . , an ∈ F .
Proof. Suppose T : V → W is given with βi T = γi for all i. Then,
by the previous lemma,
(a1 β1 + a2 β2 + · · · + an βn )T
= a1 (β1 T ) + a2 (β2 T ) + · · · + an (βn T )
= a1 γ1 + a2 γ2 + · · · + an γn
Now B spans V , so every element α ∈ V can be written as an F -
linear sum of the βi ’s. Thus the above formula gives αT for all α, and
therefore T is uniquely determined.
Now given γ1 , γ2 , . . . , γn ∈ W , we show that T exists. Obviously,
there is only one way to define T in view of the above and therefore we
set
αT = a1 γ1 + a2 γ2 + · · · + an γn
for α = a1 β1 + a2 β2 + · · · + an βn ∈ V . Note that B is not only a
spanning set but it is in fact a basis for V . That means that α ∈ V can
be written in only one way as above and therefore T is a well defined
function from V to W . It is now a simple matter to see that T is the
required linear transformation.
First, suppose
α = a1 β1 + a2 β2 + · · · + an βn
and
α′ = a′1 β1 + a′2 β2 + · · · + a′n βn
are vectors in V and let b ∈ F . Then
α + α′ = (a1 + a′1 )β1 + (a2 + a′2 )β2 + · · · + (an + a′n )βn
and
bα = (ba1 )β1 + (ba2 )β2 + · · · + (ban )βn
so by definition
(α + α′)T = (a1 + a′1 )γ1 + (a2 + a′2 )γ2 + · · · + (an + a′n )γn
= (a1 γ1 + a2 γ2 + · · · + an γn ) + (a′1 γ1 + a′2 γ2 + · · · + a′n γn )
= αT + α′T
and
(bα)T = (ba1 )γ1 + (ba2 )γ2 + · · · + (ban )γn
= b(a1 γ1 + a2 γ2 + · · · + an γn ) = b(αT )
Thus T is a linear transformation. Finally,
βi = 0β1 + 0β2 + · · · + 1βi + · · · + 0βn
so
βi T = 0γ1 + 0γ2 + · · · + 1γi + · · · + 0γn = γi
and the theorem is proved. 
There are a number of ways of constructing new linear transforma-
tions from old ones. We consider one such now.
Let T : V → W be a linear transformation. As usual we say that
T is one-to-one if for each γ ∈ W there exists at most one α ∈ V with
αT = γ. We say that T is onto if for each γ ∈ W there exists at least
one α ∈ V with αT = γ. Thus if T is one-to-one and onto, then for
each γ ∈ W there exists one and only one α ∈ V with αT = γ. But
this means that α is really a function of γ and we can therefore define
naturally a back map T −1 : W → V given by
γT −1 = α for γ ∈ W
where α is the unique element of V with αT = γ. As we might expect,
T −1 is also a linear transformation.
For brevity we call such a one-to-one and onto linear transformation
an isomorphism.
Lemma 8.2. Let T : V → W be an isomorphism. Then the map
T −1 : W → V is also an isomorphism.
Proof. Let γ1 , γ2 ∈ W and let a ∈ F with γ1 T −1 = α1 and
γ2 T −1 = α2 . Then by definition α1 T = γ1 and α2 T = γ2 . Since T
is a linear transformation, this yields
(α1 + α2 )T = α1 T + α2 T = γ1 + γ2
(aα1 )T = a(α1 T ) = aγ1
Thus again by definition of T −1 we have
(γ1 + γ2 )T −1 = α1 + α2 = γ1 T −1 + γ2 T −1
(aγ1 )T −1 = aα1 = a(γ1 T −1 )
and T −1 is a linear transformation.
Suppose now that γ1 T −1 = γ2 T −1 . Then α1 = α2 so
γ1 = α1 T = α2 T = γ2
−1
and hence T is one-to-one. Finally, let α ∈ V and set γ = αT . Then
by definition of T −1 we have γT −1 = α. Thus T −1 is onto and the
result follows. 
Problems
8.1. Verify that the projection map P of Example 8.3 is a linear
transformation.
8.2. Verify that the map Ta given in Example 8.4 is a linear trans-
formation.
8.3. If you already know calculus, convince yourself that the maps
S and J given in Examples 8.6 and 8.7 are linear transformations.
8.4. Let D : F [x] → F [x] be the formal derivative map. Show that
(αβ)D = α(βD) + (αD)β
for all α, β ∈ F [x]. (Hint. First verify this for α = axn and β = bxm ,
and then prove the result by induction on deg α + deg β.)
8.5. Describe geometrically the linear transformations Ti : R2 → R2
given by
(a, b)T1 = (2a, 2b)
(a, b)T2 = (2a, b)
(a, b)T3 = (−a, −b)
For each Ti find a nonzero vector αi and a scalar ci ∈ R with αi Ti = ci αi .
8.6. Describe geometrically the linear transformation Sθ : R2 → R2
given by
(a, b)Sθ = (a cos θ − b sin θ, a sin θ + b cos θ)
For which angles θ can one find 0 ≠ α ∈ R2 and a ∈ R with αSθ = aα?
8.7. Let T : F 3 → F 2 be given by
(a1 , a2 , a3 )T = (a1 − a2 , a3 − 2a2 + a1 )
Show that T is onto, but not one-to-one.
8.8. Let T : F 3 → F 4 be given by
(a1 , a2 , a3 )T = (a1 − 2a2 + a3 , a1 + a2 , a2 − a3 , a3 )
Prove that T is one-to-one, but not onto.
8.9. Let V be an n-dimensional vector space over F with basis
B = {β1 , β2 , . . . , βn }. Show that the map T : F n → V given by
(a1 , a2 , . . . , an )T = a1 β1 + a2 β2 + · · · + an βn
is an isomorphism. Find T −1 .
8.10. Let V be a finite dimensional vector space and let W1 and
W2 be subspaces of the same dimension. Construct an isomorphism
T : W1 → W2 such that γT = γ for all γ ∈ W1 ∩ W2 .
9. Kernels and Images
Let T : V → W be a linear transformation so that T is a map that
intertwines the arithmetic of V and of W . It is apparent therefore
that T must preserve certain aspects of the vector space structure. For
example, we have already seen that T maps 0 to 0 and negatives to
negatives. It is then natural to ask how it behaves with respect to
subspaces, bases and dimension.
If V0 is a subset of V , we let (V0 )T denote the image of V0 , that is
(V0 )T = {αT | α ∈ V0 }
In the same way, if W0 is a subset of W , we let
(W0 )T⃖ = {α ∈ V | αT ∈ W0 }
denote the complete inverse image of the subset W0 . Thus (V0 )T ⊆ W
and (W0 )T⃖ ⊆ V .
Theorem 9.1. Let T : V → W be a linear transformation with V
and W vector spaces over the same field F .
i. If V ′ is a subspace of V , then (V ′)T is a subspace of W .
ii. If W ′ is a subspace of W , then (W ′)T⃖ is a subspace of V .
Proof. Recall that in order to verify that a subset is a subspace,
it suffices to check that the set is closed under addition and scalar
multiplication and that it contains 0.
(i ) Let β1 , β2 ∈ (V ′)T and let a ∈ F . By definition, there exist
α1 , α2 ∈ V ′ with β1 = α1 T and β2 = α2 T . Then α1 + α2 ∈ V ′ and
aα1 ∈ V ′ so
β1 + β2 = (α1 + α2 )T ∈ (V ′)T
aβ1 = (aα1 )T ∈ (V ′)T
Finally 0 = 0T ∈ (V ′)T , so (V ′)T is a subspace of W .
(ii ) Let α1 , α2 ∈ (W ′)T⃖ and let a ∈ F . Then by definition,
α1 T, α2 T ∈ W ′ so since W ′ is a subspace of W , we have
(α1 + α2 )T = α1 T + α2 T ∈ W ′
(aα1 )T = a(α1 T ) ∈ W ′
Hence, by definition, α1 + α2 , aα1 ∈ (W ′)T⃖ . Finally, 0T = 0 ∈ W ′, so
0 ∈ (W ′)T⃖ and (W ′)T⃖ is a subspace of V . □
Now a vector space always has two subspaces of interest namely
itself and 0. This gives rise to the following definitions.
Let T : V → W be a linear transformation. We set
kernel of T = ker T = (0)T⃖
image of T = im T = (V )T
Then ker T is a subspace of V and im T is a subspace of W .
Theorem 9.2. Let T : V → W be a linear transformation.
i. Let α1 , α2 ∈ V . Then α1 T = α2 T if and only if α1 −α2 ∈ ker T .
ii. T is one-to-one if and only if ker T = 0.
iii. T is onto if and only if im T = W .
Proof. (i ) Since T is a linear transformation, we have α1 T −α2 T =
(α1 − α2 )T . Thus α1 T = α2 T if and only if (α1 − α2 )T = 0 and hence
if and only if α1 − α2 ∈ ker T .
(ii ) Suppose T is one-to-one. Then ker T = (0)T⃖ must consist of
at most one vector. Since 0 ∈ ker T this yields ker T = 0. Conversely
suppose ker T = 0. If α1 T = α2 T , then α1 −α2 ∈ ker T = 0, so α1 = α2 .
(iii ) This follows by definition. 
Let us consider some examples.
Example 9.1. We start with some trivialities. The zero map
0 : V → W certainly satisfies ker 0 = V and im 0 = 0. Also the identity
map I : V → V satisfies ker I = 0, im I = V .
Example 9.2. Suppose that V is the real vector space V = R[x].
If D : V → V is the derivative map given by
( ∑i≥0 ai xi ) D = ∑i≥1 iai xi−1
then clearly ker D is the set of constant polynomials and im D = R[x].


Thus D is onto but not one-to-one. Moreover we conclude, as is well
known, that two polynomials have the same derivative if and only if
they differ by an element of ker D, namely a constant.
Now let S : V → V denote the integral map given by
( ∑i≥0 ai xi ) S = ∑i≥0 ( ai /(i + 1) ) xi+1
Then ker S = 0 and im S is the set of all polynomials with constant
term 0. In particular, S is one-to-one but not onto.
Example 9.3. Let V be a finite dimensional vector space and let
W be a subspace. Then there exists a complementary subspace W ′
for W so that V = W ⊕ W ′. Let P ′ : V → W ′ be the projection map
given by αP ′ = β ′ where α is uniquely written as α = β + β ′ with
β ∈ W and β ′ ∈ W ′. Clearly im P ′ = W ′ and ker P ′ = W . The latter
fact shows that any subspace of V can be the kernel of some linear
transformation. Thus being the kernel of a linear transformation is no
more special than being a subspace.
We now consider bases and dimension. The main result of this
section is
Theorem 9.3. Let T : V → W be a linear transformation with V
and W both finite dimensional over F . Then there exist bases B =
{α1 , . . . , αr , β1 , . . . , βs } of V and C = {γ1 , . . . , γs , δ1 , . . . , δt } of W such
that
i. {α1 , . . . , αr } is a basis for ker T .
ii. βj T = γj for j = 1, . . . , s.
iii. {γ1 , . . . , γs } is a basis for im T .
Thus
dimF ker T + dimF im T = dimF V
Proof. Since ker T is a subspace of V , it has a basis {α1 , . . . , αr }
that extends to a basis B = {α1 , . . . , αr , β1 , . . . , βs } of V . Set γ1 =
β1 T, γ2 = β2 T, . . . , γs = βs T . We show that {γ1 , . . . , γs } is a basis for
im T .
First let γ ∈ im T . Then γ = αT for some α ∈ V and since B is a
basis for V we have
α = a1 α1 + · · · + ar αr + b1 β1 + · · · + bs βs
for suitable ai , bi ∈ F . Thus
γ = αT = a1 (α1 T ) + · · · + ar (αr T ) + b1 (β1 T ) + · · · + bs (βs T )
= b1 γ 1 + · · · + bs γ s
since αi ∈ ker T implies that αi T = 0. We have therefore shown that
{γ1 , . . . , γs } spans im T .
Now suppose b1 γ1 + · · · + bs γs = 0 for some bi ∈ F . Then
(b1 β1 + · · · + bs βs )T = b1 (β1 T ) + · · · + bs (βs T )
= b1 γ 1 + · · · + bs γ s = 0
so b1 β1 + · · · + bs βs ∈ ker T . Now {α1 , . . . , αr } spans ker T so
b1 γ1 + · · · + bs γs = a1 α1 + · · · + ar αr
for suitable ai ∈ F . This yields
(−a1 )α1 + · · · + (−ar )αr + b1 γ1 + · · · + bs γs = 0
so since B is a basis for V all coefficients must be zero. In par-
ticular, b1 = · · · = bs = 0 so {γ1 , . . . , γs } is linearly independent
and therefore a basis for im T . We now extend {γ1 , . . . , γs } to C =
{γ1 , . . . , γs , δ1 , . . . , δt }, a basis for W .
Finally, dimF ker T = r and dimF im T = s, so
dimF V = r + s = dimF ker T + dimF im T
and the result follows. 
If T : V → W , then we define the rank of T to be the dimension of
the image of T . Therefore a restatement of the last part of the previous
theorem is
Corollary 9.1. Let T : V → W be a linear transformation. Then
dimF ker T + rank T = dimF V
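Corollary 9.1 can be illustrated numerically. The sketch below assumes the Python library SymPy and takes T : Q4 → Q3 to be a map of the kind described in Example 8.8, determined by arbitrarily chosen coefficients aij, so that ker T is the solution space of the associated homogeneous equations.

# A numerical check of Corollary 9.1, assuming SymPy is available.
from sympy import Matrix

A = Matrix([[1, 0, 2, 1],
            [0, 1, 1, 1],
            [1, 1, 3, 2]])            # the coefficients a_ij

rank_T    = A.rank()                  # dim_F im T
nullity_T = len(A.nullspace())        # dim_F ker T, the solution space dimension
assert rank_T + nullity_T == A.cols   # equals dim_F V = 4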
Recall that a linear transformation T : V → W is said to be an
isomorphism if it is one-to-one and onto.
Corollary 9.2. If T : V → W is an isomorphism of finite dimen-
sional vector spaces, then dimF V = dimF W .
Proof. This is clear from Theorem 9.3 since dimF ker T = 0 and
dimF im T = dimF W . 
Next, we consider a linear transformation T from V to V . Such a
map T is said to be nonsingular if it is an isomorphism.
Corollary 9.3. Let V be a vector space of dimension n < ∞
and let T : V → V be a linear transformation. Then the following are
equivalent.
i. T is nonsingular.
ii. T is onto, that is rank T = n.
iii. T is one-to-one.
Proof. Clearly (i ) implies (ii ) and (iii ). In the other direction,
we use
dimF ker T + rank T = dimF V
If T is onto, then rank T = dimF V = n, so dimF ker T = 0. This
shows that ker T = 0, so T is also one-to-one and hence nonsingular.
Finally, if T is one-to-one, then ker T = 0, so rank T = n. Since
dimF im T = rank T = dimF V , we have im T = V so T is onto and the
result follows. 
We observed in Example 9.2 that with V = R[x], the linear trans-
formation D is onto but not one-to-one. Also S is one-to-one but not
onto. Thus the above corollary does not hold in general without the
finite dimensionality assumption.
Corollary 9.4. Let T : V → W be a linear transformation with
dimF W < dimF V < ∞. Then ker T ≠ 0.
Proof. We have
dimF V = dimF im T + dimF ker T
Now im T is a subspace of W , so dimF im T ≤ dimF W < dimF V .
This easily yields dimF ker T > 0, so ker T ≠ 0. □
Example 9.4. Let us see now how all this applies to the study
of linear equations. We consider a set of m linear equations in the n
unknowns x1 , x2 , . . . , xn . This is given by
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
······
am1 x1 + am2 x2 + · · · + amn xn = bm
Observe that, as usual, the coefficients aij ∈ F are double subscripted.
The first subscript corresponds to the row or the equation, and the
second subscript corresponds to the column of the unknown.
We think of the set {aij } as being fixed. Then the above equations
define a linear transformation T : F n → F m given by
(x1 , x2 , . . . , xn )T = (b1 , b2 , . . . , bm )
Now the solution set of the homogeneous equations, that is when b1 =
b2 = · · · = bm = 0 is clearly (0)T⃖ , the kernel of T . This is of course
a subspace of F n and hence has a basis, say {α1 , α2 , . . . , αr }. Thus
every solution of the system of homogeneous equations can be written
uniquely as
a1 α1 + a2 α2 + · · · + ar αr
for suitable ai ∈ F . See Corollary 7.1 for efficient ways to construct
such bases.
Now we ask for which (b1 , b2 , . . . , bm ) ∈ F m do solutions exist. But
clearly a solution exists if and only if (b1 , b2 , . . . , bm ) ∈ im T . In par-
ticular, the set of m-tuples of constant terms for which a solution ex-
ists is in fact a subspace of F m . Suppose (b1 , b2 , . . . , bm ) ∈ im T and
let β = (y1 , y2 , . . . , yn ) be a solution to the associated equations so
that βT = (b1 , b2 , . . . , bm ). Then β ′ is also a solution if and only if β ′ − β ∈ ker T or in other words if and only if
β ′ = β + a1 α1 + a2 α2 + · · · + ar αr
for suitable ai ∈ F . Thus once we know one solution for a particular
vector in im T , we can find all such. In addition the above formula
shows that, in some sense, all such vectors in im T give rise to the same
“number” of solutions.
Finally let us consider two applications of the preceding theorems.
First suppose n > m. Then
dimF F n = n > m = dimF F m
and hence ker T ≠ 0. This says that if there are more unknowns
than equations, then there is always a nonzero solution to the set of
homogeneous equations.
Secondly, suppose m = n. Then T is one-to-one if and only if it
is onto. This says that a solution always exists if and only if solutions
that do exist are always unique.
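For readers who want to see this structure computed by machine, the following brief sketch is offered as an illustration only; it is not part of the original notes, and it assumes the Python library sympy is available so that the arithmetic is exact over Q.

```python
# A hypothetical system of two equations in four unknowns over Q.
# Rows of A are the coefficient vectors of the equations.
from sympy import Matrix

A = Matrix([[1, 2, 0, 1],
            [0, 1, 1, 1]])

print(A.rank())        # 2 = dim im T, the dimension of the space of attainable right hand sides
print(A.nullspace())   # a basis of ker T with 4 - 2 = 2 vectors, matching dim ker T + rank T = dim V
```

Any particular solution β of an inhomogeneous system with these coefficients then yields the full solution set β + ker T , exactly as described above.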

Problems
Let T : V → W be a linear transformation with V and W both
vector spaces over F .


9.1. Suppose T is onto. Prove that T and T −1 define a one-to-one
correspondence between subspaces of V that contain ker T and all sub-
spaces of W .
9.2. Let V1 be a subspace of V and let W1 be a subspace of W with
(V1 )T ⊆ W1 . Show that the restriction map T1 : V1 → W1 given by
αT1 = αT for α ∈ V1 is a linear transformation. Moreover show that
ker T1 = V1 ∩ (ker T ).
9.3. Suppose T is onto and let V1 be a complement for ker T in V .
Prove that the restriction map T1 : V1 → W is an isomorphism.
Consider the following set of three linear equations in four unknowns
with coefficients in Q.
2x1 + x2 + x3 − 4x4 = b1
3x1 + x2 + 3x3 − x4 = b2
x1 + x2 − x3 − 7x4 = b3
and let T : Q4 → Q3 be the corresponding linear transformation.
9.4. Find a basis for ker T and extend this to a basis for Q4 .
9.5. Find a basis for im T and extend this to a basis for Q3 . What
is the rank of T ?
9.6. Find all solutions with b1 = 2, b2 = 3, b3 = 1.
9.7. Let V be an F -vector space of dimension n < ∞ and let
T : V → V be a linear transformation. Define subspaces Vj and Wj
inductively by
V0 = V, Vj+1 = (Vj )T
W0 = 0, Wj+1 = (Wj )T −1
Show that Vj ⊇ Vj+1 and Wj+1 ⊇ Wj and deduce from this that V2n =
Vn and W2n = Wn .
Let V be a two-dimensional real vector space with basis {α1 , α2 }.
Let T : V → V be the linear transformation given by
α1 T = 4α1 − 5α2
α2 T = 2α1 − 3α2
9.8. Find nonzero vectors β1 , β2 ∈ V with β1 T = −β1 and β2 T =
2β2 .
9.9. Suppose 0 ≠ γ ∈ V with γT = aγ for some a ∈ R. Show that
a = −1 or 2.
9.10. Prove that {β1 , β2 } is a basis for V and describe T as above
in terms of this basis.
10. Quotient Spaces


In Example 9.3, we showed that if V is a finite dimensional vector
space, then every subspace of V is the kernel of some linear transforma-
tion T : V → V ′ . This fact is also true for infinite dimensional vector
spaces and can be proved using the transfinite methods we will study
at the end of these notes. But there is an alternate approach that is
quite elementary and avoids these infinite methods. Indeed, it applies
to many other algebraic systems where complements do not exist in
general. We consider this below.
To start with, we need a better understanding of equality. For
example, what does it mean when we say
2/3 = 4/6
Surely the fractions are not identical since they have different numer-
ators and different denominators. So equality here must mean “equal
in Q” or perhaps “numerically equal”. Again 4 − 3 = 1 must mean
“equal in the integers Z” or perhaps “numerically equal”. Even worse,
what does 3 = 3 mean? Surely the two 3’s are not absolutely identical
and certainly they are composed of different molecules. Again = must
mean “numerically equal”.
Let A be a set and let ∼ be a relation on A. In other words, for
any two elements of a, b ∈ A, we have either a ∼ b or not. We say that
∼ is an equivalence relation if

E1. a ∼ a for all a ∈ A. (reflexive)


E2. a ∼ b if and only if b ∼ a. (symmetric)
E3. If a ∼ b and b ∼ c, then a ∼ c. (transitive)

Our first example is amusing but surprisingly relevant.


Example 10.1. Suppose we have a large collection of marbles and
we store them in a number of different buckets. Let us say that marbles
a and b are equivalent if and only if they are stored in the same bucket.
Certainly a ∼ a, and a ∼ b if and only if b ∼ a. Finally, since b is in a
unique bucket, we see that a ∼ b and b ∼ c imply that a and c are in
the same bucket as b and hence a ∼ c.

Example 10.2. We can of course formulate the above example
without the use of marbles. Let A be a set and suppose A is given as
A = ∪i∈I Ai
a disjoint union of the nonempty subsets Ai . If a, b ∈ A, we say that a
and b are equivalent, and write a ∼ b, if and only if a and b are in the
same subset Ai . As above, it is easy to verify that ∼ is an equivalence
relation.
Let us return to Example 10.1 and ask, which bucket contains mar-
ble a? We cannot answer this exactly, since we don’t really know what
a bucket is, but we can say that the bucket we are looking for is the
one that contains all the marbles that are equivalent to a. So if we
identify the bucket with its contents, then we can say that the bucket
is equal to {x | x ∼ a}. This motivates the following definition.
Let A be a set with an equivalence relation ∼. For each a ∈ A we
define the equivalence class of a to be
cl(a) = {x ∈ A | x ∼ a}
and we write A/∼ for the set of all these equivalence classes. In par-
ticular, we have a natural map cl : A → A/∼ that is certainly onto.
With this, we obtain a converse to Example 10.2.
Theorem 10.1. If A is a set with an equivalence relation ∼, then
A is the disjoint union of its distinct equivalence classes.

Proof. Notice that a ∈ cl(a) so A is certainly equal to the union
of its equivalence classes. We need only show that distinct classes are
disjoint. To start with, we observe that if a ∼ b, then cl(a) = cl(b).
Indeed, if x ∈ cl(a), then by transitivity x ∼ a and a ∼ b imply that
x ∼ b and hence x ∈ cl(b). Thus cl(a) ⊆ cl(b) and by symmetry we
have the reverse inclusion. Finally, if cl(a) ∩ cl(b) ≠ ∅ and if c is in
the intersection, then c ∼ a and c ∼ b. It follows from symmetry and
transitivity that a ∼ b and hence cl(a) = cl(b), as required. 
Before we apply this to our vector space problem, it is certainly
worthwhile to consider other examples of interest. The first is well
known, but uses a different symbol for equivalence.
Example 10.3. Let A = Z be the ring of rational integers and let
n > 1 be a fixed integer. Then nZ is the set of all Z-multiples of n,
namely those integers divisible by n. If a, b ∈ Z, we write a ≡ b mod n
if and only if n divides a−b. Then ≡ is an equivalence relation. Indeed,
a − a = 0 = n·0 so a ≡ a mod n, and if a − b is divisible by n, then
so is (b − a) = −(a − b). Finally, if a ≡ b mod n and if b ≡ c mod n,
then a − b is a multiple of n and b − c is a multiple of n. Hence
a − c = (a − b) + (b − c) is a multiple of n and a ≡ c mod n.
Now if a ∈ Z then cl(a) is easily seen to be the coset a + nZ
and the distinct classes are cl(0) = nZ, cl(1) = 1 + nZ, . . ., through
cl(n − 1) = (n − 1) + nZ. Indeed, for i = 0, 1, . . . , n − 1, we see that
cl(i) = i + nZ is precisely the set of all integers that leave a remainder
of i when divided by n.
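To see the partition of Theorem 10.1 concretely, here is a tiny computational sketch. It is an added illustration, not part of the original notes; it uses plain Python, whose % operator always returns a remainder in {0, 1, . . . , n − 1}.

```python
# Sort the integers -6, ..., 6 into their congruence classes modulo n = 3.
n = 3
classes = {i: [] for i in range(n)}
for a in range(-6, 7):
    classes[a % n].append(a)

for i, members in classes.items():
    print(f"cl({i}) = {members}")
# cl(0) = [-6, -3, 0, 3, 6]
# cl(1) = [-5, -2, 1, 4]
# cl(2) = [-4, -1, 2, 5]
```

Every integer lands in exactly one class, so Z is indeed the disjoint union of cl(0), cl(1), . . . , cl(n − 1).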
Next, we look at the fraction problem we began with.
Example 10.4. Let us assume that we understand the integers Z
and we want to understand Q. To start with, we consider all “formal
fractions” a/b with a, b ∈ Z and b ≠ 0. Observe that these are just or-
dered pairs (a, b), where we know the numerator a and the denominator
b. Denote by F the set of all such fractions and write a/b ∼ c/d if and
only if ad = bc, the formula one gets by cross multiplying. Notice that
the latter is an equation in Z and hence well understood. We show now
that ∼ is an equivalence relation. Indeed, since multiplication in Z is
commutative, it follows immediately that ∼ is reflexive and symmetric.
For transitivity, suppose a/b ∼ c/d and c/d ∼ e/f . Then ad = bc and
cf = de, so
d(af − be) = daf − dbe
= f (ad − bc) + b(cf − de) = 0
Thus, since d ≠ 0, we have af − be = 0 and hence a/b ∼ e/f .
So, what is Q? Well, if q ∈ Q, then q = a/b for some a, b ∈ Z
with b ≠ 0. Furthermore, q is also equal to another fraction c/d if
and only if the formal fractions a/b and c/d are equivalent. Thus q
really corresponds to the equivalence class of a/b. In other words,
there is a one-to-one correspondence, that is a one-to-one, onto map,
θ : Q → F/∼. Indeed, when Q is constructed from Z, it is taken to
be F/∼. Of course, Q ⊇ Z and Q has an arithmetic defined on it. So
these aspects require additional considerations. We will outline some
of this in Problems 10.5 through 10.8.
The moral here is that we can define a new structure by starting
with something we know, introducing an appropriate equivalence rela-
tion, and then studying the set of equivalence classes. With this idea
in hand, we move on to consider vector spaces. As will be apparent,
the situation here is quite similar to that of Example 10.3 except that
we expand upon it by introducing an arithmetic on the set of classes.
Let V be a vector space over the field F and let W be a fixed
subspace. For α, β ∈ V we write α ∼ β, or more properly α ∼W β if
and only if α − β ∈ W .
Lemma 10.1. With the above notation, we have
i. ∼W is an equivalence relation on V .
ii. If α ∈ V , then cl(α) is equal to the coset α + W .
iii. If α ∼ α′ and β ∼ β ′ , then α + β ∼ α′ + β ′ .
iv. If α ∼ α′ and c ∈ F , then cα ∼ cα′ .

Parts (iii) and (iv) above say that the equivalence relation respects
the arithmetic in V . As we will see, results of this nature allow us to
define an appropriate arithmetic on the set of equivalence classes.

Proof. (i ) It is interesting to observe that the three conditions
for ∼ to be an equivalence relation almost precisely mirror the three
conditions in Theorem 3.1 for W to be a subspace. First, if α ∈ V ,
then α − α = 0 ∈ W , so α ∼ α. Next suppose α ∼ β. Then α − β ∈ W ,
so β − α = (−1)(α − β) ∈ W and β ∼ α. Finally, if α ∼ β and β ∼ γ,
then α − β ∈ W and β − γ ∈ W , so α − γ = (α − β) + (β − γ) ∈ W
and α ∼ γ.
(ii ) For this, we note that β ∼ α if and only if β − α ∈ W and hence
if and only if β ∈ α + W .
(iii ) If α ∼ α′ and β ∼ β ′ , then α − α′ ∈ W and β − β ′ ∈ W so
(α + β) − (α′ + β ′ ) = (α − α′ ) + (β − β ′ ) ∈ W and α + β ∼ α′ + β ′ .
(iv ) Finally, if α ∼ α′ and c ∈ F , then α − α′ ∈ W so cα − cα′ =
c(α − α′ ) ∈ W and cα ∼ cα′ . 

Now let us return to Example 10.1 to better visualize what comes
next. Let us assume that the collection of marbles in that example
actually has an arithmetic, say a multiplication ∗ defined on it. Can
we extend this in a natural manner to a multiplication of the buckets?
The obvious approach is as follows. Let A1 and A2 be buckets and
choose a marble a1 ∈ A1 and a2 ∈ A2 . Then we can multiply a1 and
a2 , see which bucket B contains a1 ∗ a2 , and define A1 ∗ A2 to equal B.
This seems reasonable, but there is an obvious flaw. Namely, suppose
we choose a second marble a1′ in A1 and a second marble a2′ in A2 .
We then multiply these new elements and see which bucket B ′ contains
a1′ ∗ a2′ . If B ′ is always equal to B, then everything is fine and the
definition makes sense. If not, the process just does not work.
So how can we guarantee that all B ′ equal B? Note that a1 and a1′
come from the same bucket, so a1 ∼ a1′ . Similarly a2 ∼ a2′ . Obviously
we need some sort of result which asserts that for all choices of marbles,
a1 ∼ a1′ and a2 ∼ a2′ implies that (a1 ∗ a2 ) ∼ (a1′ ∗ a2′ ). Indeed, if we
have such a result, then a1 ∗ a2 and a1′ ∗ a2′ will always belong to the
same bucket. In other words, we need to show that the relation ∼
respects the multiplication ∗. When this occurs we can indeed define
the multiplication of buckets, and since A1 = cl(a1 ) and A2 = cl(a2 ),
we obtain cl(a1 ) ∗ cl(a2 ) = cl(a1 ∗ a2 ).
Now as we observed, parts (iii) and (iv) of the preceding lemma
show that in a vector space V , the relation ∼W does indeed respect its
arithmetic. Thus, we can define addition and scalar multiplication in
the set of equivalence classes of V , and then show that V /∼ is also an
F -vector space. Once we do this, we write V /∼ as V /W , the quotient
space of V by W .

Theorem 10.2. Let W be a subspace of the F -vector space V and
let ∼ be the equivalence relation on V determined by W . Then
i. V /W = V /∼ is an F -vector space.
ii. The class map T = cl is a linear transformation from V onto
the quotient space V /W .
iii. ker T = W .
In particular, every subspace W of V is the kernel of a suitable linear
transformation.

Proof. Since the equivalence relation ∼W respects addition and
scalar multiplication by the preceding lemma, we can define a cor-
responding addition and scalar multiplication in V /∼. Indeed, if we
write T for the class map T : V → V /∼, then αT + βT = (α + β)T
and c(αT ) = (cα)T for all α, β ∈ V and c ∈ F . In particular, if we
knew that V /∼ was an F -vector space, that is if it satisfied all the
appropriate axioms, then T would be a linear transformation.
Now we know at least that T is onto, and with this we can use
T to transfer the axioms from V to V /∼. To start with, we already
know that closure of addition and of scalar multiplication are satisfied.
Next come the for all axioms. These are the identities that hold for
all elements of V and F . For example, addition is associative in V so
α + (β + γ) = (α + β) + γ for all α, β, γ ∈ V . Thus, applying T to this
expression yields

αT + (βT + γT ) = αT + (β + γ)T = (α + (β + γ))T
= ((α + β) + γ)T = (α + β)T + γT
= (αT + βT ) + γT

But T is onto, so αT, βT and γT are three typical elements of V /∼,
and thus we conclude that the associative law of addition holds in V /∼.
Similarly, the distributive law (c + d)α = cα + dα holds in V , and
applying T yields
(c + d)(αT ) = ((c + d)α)T = (cα + dα)T
= (cα)T + (dα)T = c(αT ) + d(αT )
Again, T is onto, so αT is a typical element of V /∼ and therefore this
distributive law holds in V /∼. The remaining axioms of this nature
obviously carry over in the same way.
Thus, we need only consider the zero axiom and negatives. Since
α + 0 = 0 + α = α, we see that
αT + 0T = 0T + αT = αT
Thus since αT is a typical element of V /∼, we see that 0T plays the
role of 0 in V /∼. In the same way, (−α)T plays the role of −(αT ) since
α + (−α) = (−α) + α = 0 yields
αT + (−α)T = (−α)T + αT = 0T = 0
Thus V /∼ is indeed a vector space over F and T : V → V /∼ is an onto
linear transformation. Finally, since 0 = 0T , we see that α ∈ ker T if
and only if cl(α) = αT = 0 = 0T = cl(0). Obviously, this occurs if and
only if α ∼ 0 and hence if and only if α − 0 ∈ W . Thus ker T = W and
the theorem is proved. 
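For a concrete picture of a quotient space (an example added here, not in the original notes), take V = F 2 and W = {(a, 0) | a ∈ F }. Two vectors are equivalent exactly when they have the same second coordinate, so the equivalence classes are the cosets (0, b) + W , one for each b ∈ F . The arithmetic is computed from representatives, for instance
cl((1, 2)) + cl((5, 3)) = cl((6, 5)) = cl((0, 5))
and the map cl((x, y)) → y is then an isomorphism from V /W onto F .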
Let us return, one last time, to Example 10.1 and suppose that we
have a function f that assigns to each marble some attribute, say its
color. Can we use f to assign a color to each bucket? Obviously, we
can do this if all marbles in the same bucket have the same color. In
other words, we need the equivalence relation ∼ to respect the function
f , so that a ∼ b implies af = bf . Since the bucket containing a is the
equivalence class of a, we can then define a function f¯ on buckets that
sends cl(a) to af .
Finally, we apply this idea to our vector space situation.
Theorem 10.3. Let W be a subspace of the F -vector space V and
let S : V → V ′ be a linear transformation with ker S ⊇ W . Then there
is a natural linear transformation S̄ : V /W → V ′ such that
i. For all α ∈ V ,
αS = (αT )S̄
where T : V → V /W is the class map.
ii. im S̄ = im S and hence S̄ is onto if and only if S is onto.
iii. αT ∈ ker S̄ if and only if α ∈ ker S. Hence S̄ is one-to-one if
and only if ker S = W .
Proof. Let ∼ denote the equivalence relation on V determined by
W , and let T : V → V /W be the class map. We show first that ∼
respects the function S, and this is where the hypothesis ker S ⊇ W
is used. Indeed, if α ∼ β, then α − β ∈ W ⊆ ker S, so αS − βS =
(α − β)S = 0 and αS = βS. As above this allows us to define a map
S̄ : V /W → V ′ by
(αT )S̄ = (cl(α))S̄ = αS
We can now quickly verify that S̄ is a linear transformation. To
this end, let α, β ∈ V and c ∈ F . Then, since both S and T are linear
transformations, we have
(αT + βT )S̄ = ((α + β)T )S̄ = (α + β)S = αS + βS = (αT )S̄ + (βT )S̄
and
(c(αT ))S̄ = ((cα)T )S̄ = (cα)S = c(αS) = c((αT )S̄)
As usual, we note that T is onto so αT and βT are two typical elements
of V /W . With this, we conclude that S̄ is indeed a linear transformation.
Finally, from (αT )S̄ = αS we see that im S̄ = im S. Thus S̄ is onto
if and only if S is onto. Furthermore, αT ∈ ker S̄ if and only if αS = 0
and hence if and only if α ∈ ker S. In particular, S̄ is one-to-one if and
only if (ker S)T = 0 and therefore if and only if ker S ⊆ ker T = W . 
Recall that an isomorphism of vector spaces is a linear transforma-
tion that is a one-to-one correspondence. Namely, it is one-to-one and
onto.
Corollary 10.1. Let V be an F -vector space and let S : V → V ′
be a linear transformation onto V ′ . If W = ker S, then S̄ : V /W → V ′
is an isomorphism of vector spaces.
Proof. We know that S̄ is a linear transformation, and parts (ii)
and (iii) of the preceding theorem imply that S̄ is onto and one-to-one.
Thus S̄ is an isomorphism. 

Problems
10.1. In Example 10.2, where is the disjointness of the various sub-
sets Ai used in the proof that ∼ is an equivalence relation? In particular,
show that transitivity fails if the subsets are not disjoint.
10.2. Example 10.3 studies A = Z and congruence modulo n > 1.


Show that ≡ respects the arithmetic of Z. In other words, verify that
if a ≡ a′ and b ≡ b′ then a + b ≡ a′ + b′ and ab ≡ a′ b′ . Sketch a proof
that Z/nZ = Z/≡ is a commutative ring, namely it satisfies all the
field axioms except the one concerning the existence of multiplicative
inverses.
10.3. If n above is not a prime number show that the ring Z/nZ
contains two nonzero elements that multiply to 0. Prove that neither
of these elements can have an inverse, and conclude that the ring is not
a field.
10.4. Notice that Z/2Z is the finite field with two elements that is
described in Example 1.4. Now it is known that if a, b ∈ Z are relatively
prime, that is they have no nontrivial factors in common, then there
exist x, y ∈ Z with ax + by = 1. Use this to prove that for any prime
p, the ring Z/pZ is a finite field of size p.

10.5. Let F be the set of all formal integer fractions as described
in Example 10.4 and let ∼ be the corresponding equivalence relation.
Define an arithmetic on F by
(a/b) + (c/d) = (ad + bc)/(bd)
(a/b)·(c/d) = (ac)/(bd)
Show that ∼ respects this arithmetic.
10.6. Prove that the addition and multiplication defined above are
commutative and associative. Show that Z embeds in F via the map
ϕ : a → a/1 and that ϕ preserves addition and multiplication.
10.7. Show that the distributive law fails in F. Prove instead that
(a/b)· ((c/d) + (e/f )) ∼ (a/b)·(c/d) + (a/b)·(e/f )
10.8. Starting with the known ring Z, use these ideas to construct
the rational field Q. Sketch a proof that Q satisfies all the appropriate
field axioms, that Q contains a copy of Z and that every element of Q is
a fraction with numerator and denominator in this copy of Z.
10.9. Let V = W ⊕ W ′ be a vector space over F and let P ′ : V → W ′
denote the corresponding projection map described in Example 9.3. If
∼ is the equivalence relation on V determined by W , show that the
linear transformation P̄ ′ : V /W → W ′ , as given by Theorem 10.3, is
an isomorphism that assigns to each equivalence class of V the unique
element of W ′ it contains.
10.10. Let U ⊆ W ⊆ V be a chain of vector spaces and write
S : V → V /U and T : V → V /W for the corresponding class maps.
Let S0 : W → V /U denote the restriction of S to W , as described in
Problem 9.2, and observe, by Theorem 10.3, that S̄0 : W/U → V /U
is an isomorphism to its image (W )S. Now show that T gives rise to
a linear transformation T̄ : V /U → V /W that is onto and has kernel
equal to the image of S̄0 .
11. Matrix Correspondence


We have already made a good deal of progress on the solving as-
pects of linear equations. We observed for example that the solution
set of a system of homogeneous equations has defined on it a natural
arithmetic, and this led to the definition of a vector space. We also
showed that these solution sets had finite bases and therefore that all
elements in it could be easily described. There is of course still more to
do, but for now we look in another direction. We consider the systems
of linear equations or rather their associated linear transformations and
we describe a natural arithmetic on them.
Let V and W be two vector spaces over the same field F and let
L(V, W ) denote the set of all linear transformations from V to W .
Suppose S, T ∈ L(V, W ) and a ∈ F . We can then define the maps
S + T and aS by

α(S + T ) = αS + αT
α(aS) = a(αS) for all α ∈ V

We observe now that these maps are also linear transformations from
V to W .
Let α, β ∈ V and b ∈ F . Then by definition and the fact that both
S and T are linear transformations, we have

(α + β)(S + T ) = (α + β)S + (α + β)T


= (αS + βS) + (αT + βT )
= (αS + αT ) + (βS + βT )
= α(S + T ) + β(S + T )

(bα)(S + T ) = (bα)S + (bα)T


= b(αS) + b(αT )
= b(αS + αT ) = b·α(S + T )

and similarly

(α + β)(aS) = a·(α + β)S


= a(αS + βS)
= a(αS) + a(βS)
= α(aS) + β(aS)
(bα)(aS) = a·(bα)S
= a·b(αS) = b·a(αS)
= b·α(aS)
Thus S + T ∈ L(V, W ) and aS ∈ L(V, W ). In other words, L(V, W ) is
a set with an addition and a scalar multiplication defined on it. It is
in fact a vector space over F .
Theorem 11.1. Let V and W be vector spaces over F . Then with
addition and scalar multiplication defined as above, L(V, W ) is also a
vector space over F .
Proof. We have already verified the closure axioms. The associa-
tive, commutative, distributive and unitary laws are routine to check
and so we relegate these to Problem 11.1 at the end of this section.
Finally the zero map clearly plays the role of 0 ∈ L(V, W ) and −T is
just (−1)T as defined above. The result follows. 
At this point we could go ahead and compute dimF L(V, W ) by
constructing a basis. However we take another approach.
Let m and n be positive integers and consider the space F mn . This
is of course the vector space of mn-tuples over F . Now it doesn’t really
matter how we write these mn entries, namely whether we write them
in a straight line or in a circle or perhaps in a rectangular m by n
array. So for reasons that will be apparent later on we will take this
latter approach and write all such elements as m by n arrays. When
we do this, we of course designate F mn by F m×n and we call this the
space of m × n (m by n) matrices over F . We have seen these matrices
before in Sections 6 and 7 as formal arrays, but without considering
their arithmetic.
Now, if α ∈ F m×n , then α is an m by n array of elements of F ,
where m indicates the number of rows and n the number of columns.
As usual, we write α as
α =
⎡ a11 a12 . . . a1n ⎤
⎢ a21 a22 . . . a2n ⎥
⎢ . . . . . . . . . ⎥
⎣ am1 am2 . . . amn ⎦
in terms of double subscripted entries aij ∈ F . Here i is the row
subscript and runs between 1 and m, and j is the column subscript and
runs between 1 and n. For brevity we sometimes denote the matrix α
by α = [aij ].
Since F m×n is really just F mn in disguise, we know how addition
and scalar multiplication are defined. Never-the-less to avoid confusion
we restate this below. Let α = [aij ] and β = [bij ] be matrices in F m×n
and let c ∈ F . Then
α + β = [aij ] + [bij ] = [aij + bij ]
and
cα = c[aij ] = [caij ]
In other words, the i, j-th entry of α + β is aij + bij and the i, j-th entry
of cα is caij . Clearly
dimF F m×n = dimF F mn = mn
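For instance (a small numerical illustration, not in the original notes), in F 2×2 with F = Q we have
⎡ 1 2 ⎤ + ⎡ 0 1 ⎤ = ⎡ 1 3 ⎤
⎣ 3 4 ⎦   ⎣ 1 0 ⎦   ⎣ 4 4 ⎦
and
5 · ⎡ 1 2 ⎤ = ⎡ 5 10 ⎤
    ⎣ 3 4 ⎦   ⎣ 15 20 ⎦
so both operations are performed entry by entry, exactly as in F 4 .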

Let us return now to L(V, W ). Suppose dimF V = m < ∞,
dimF W = n < ∞ and choose bases A = {α1 , α2 , . . . , αm } of V and
B = {β1 , β2 , . . . , βn } of W . Here we think of A and B as not only
sets, but in fact ordered sets, namely the basis vectors are suitably
numbered 1, 2, . . .. Let T ∈ L(V, W ). Then αi T ∈ W , so αi T can be
written uniquely as an F -linear sum of the β’s. This then gives rise to
the family of equations
α1 T = a11 β1 + a12 β2 + · · · + a1n βn
α2 T = a21 β1 + a22 β2 + · · · + a2n βn
······
αm T = am1 β1 + am2 β2 + · · · + amn βn
for uniquely determined elements aij ∈ F . We then have a map which
we denote by A [ ]B from L(V, W ) to F m×n given by
A [ ]B : T →
⎡ a11 a12 · · · a1n ⎤
⎢ a21 a22 · · · a2n ⎥
⎢ . . . . . . . . . ⎥
⎣ am1 am2 · · · amn ⎦
where the aij are as above. Unlike our usual function on the right
notation, we will use A [T ]B to denote the image of T under this map.
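As a small worked example (added for illustration, not in the original notes), let V be the space of real polynomials of degree at most 2 with ordered basis A = {1, x, x2 }, let W be the space of real polynomials of degree at most 1 with ordered basis B = {1, x}, and let T be differentiation. Then
(1)T = 0 = 0·1 + 0·x
(x)T = 1 = 1·1 + 0·x
(x2 )T = 2x = 0·1 + 2·x
so A [T ]B is the 3 × 2 matrix
⎡ 0 0 ⎤
⎢ 1 0 ⎥
⎣ 0 2 ⎦
with one row for each basis vector of V and one column for each basis vector of W .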
Now L(V, W ) and F m×n are both vector spaces over F , and we have
Theorem 11.2. Let V and W be finite dimensional vector spaces
over the field F with A = {α1 , α2 , . . . , αm } being a basis for V and
B = {β1 , β2 , . . . , βn } being a basis for W . Then
A[ ]B : L(V, W ) → F m×n
as defined above is a rank preserving isomorphism. In particular,
dimF L(V, W ) = mn = (dimF V )(dimF W )
and rank T = rank A [T ]B for all T ∈ L(V, W ).
Proof. We first show that A [ ]B is a linear transformation. Let
T, T ′ ∈ L(V, W ) with A [T ]B = [aij ] and A [T ′ ]B = [aij′ ] and let b ∈ F .
Then by definition of T + T ′ and bT we have for all i
αi (T + T ′ ) = αi T + αi T ′
= (ai1 β1 + ai2 β2 + · · · + ain βn )
+ (ai1′ β1 + ai2′ β2 + · · · + ain′ βn )
= (ai1 + ai1′ )β1 + (ai2 + ai2′ )β2 + · · · + (ain + ain′ )βn
and similarly
αi (bT ) = b(αi T )
= b(ai1 β1 + ai2 β2 + · · · + ain βn )
= (bai1 )β1 + (bai2 )β2 + · · · + (bain )βn
This implies that
A [T + T ′ ]B = [aij + aij′ ]
= [aij ] + [aij′ ] = A [T ]B + A [T ′ ]B
and
A [bT ]B = [baij ]
= b [aij ] = b·A [T ]B
Therefore A [ ]B is a linear transformation.
Recall that we have shown in Theorem 8.1 that for each choice of
γ1 , γ2 , . . . , γm ∈ W there exists one and only one linear transformation
T : V → W with αi T = γi for i = 1, 2, . . . , m. Since the γi are uniquely
determined by
γi = ai1 β1 + ai2 β2 + · · · + ain βn
we see immediately that A [ ]B is one-to-one and onto. Thus A [ ]B is an
isomorphism, and by Corollary 9.2 we conclude that dimF L(V, W ) =
dimF F m×n = mn.
It remains to compare the ranks of T and of A = A [T ]B = [aij ]
and we sketch a proof of their equality. First observe that the map
S : F n → W given by
(b1 , b2 , . . . , bn ) → b1 β1 + b2 β2 + · · · + bn βn
is an isomorphism since B = {β1 , β2 , . . . , βn } is a basis for W . Next
note from Corollary 7.2 that the rank of A is the dimension of its
row space, namely the subspace of F n spanned by the row vectors
ρi = (ai1 , ai2 , . . . , ain ) for i = 1, 2, . . . , m. Furthermore, the rank of T
is the dimension of im T , the subspace of W spanned by the vectors
αi T = ai1 β1 + ai2 β2 + · · · + ain βn = ρi S
again for i = 1, 2, . . . , m. Since S is an isomorphism, it induces an
isomorphism of the subspaces
S : ⟨ρ1 , ρ2 , . . . , ρm ⟩ → ⟨ρ1 S, ρ2 S, . . . , ρm S⟩ = ⟨α1 T, α2 T, . . . , αm T ⟩ = im T
and Corollary 9.2 implies that
rank A = dimF ⟨ρ1 , ρ2 , . . . , ρm ⟩ = dimF im T = rank T
as required. 
We call A [T ]B the matrix associated with T . Obviously different
choices of bases A and B give rise to different matrices.
We close this section with a slightly different phenomenon. Namely
we study a multiplication that can be defined between certain pairs of
matrices. Let us consider two matrices α = [aij ] and β = [bij ] over F
but of possibly different sizes. Say α is m × n and β is r × s. That is
symbolically
α = (an m × n box),        β = (an r × s box)
Then the product αβ is defined if and only if n = r in which case αβ
turns out to be of size m × s. In other words,
(m × n box) · (n × s box) = (m × s box)
and we can think of the intermediate n’s as somehow being cancelled.
Now αβ = [cij ] has the same number of rows as does α and the
same number of columns as does β. Thus it would make sense to have
cij be a function of the ith row of α and the jth column of β. In
addition, the rows of α and the columns of β have the same number of
entries, namely n. Therefore we can and do define
cij = ∑k aik bkj     (sum over k = 1, . . . , n)
We see that this meets all of the above criteria and therefore at least
makes sense.
We repeat the definition formally. Let α = [aij ] be an m × n matrix
over F and let β = [bij ] be an r × s matrix over F . Then the product
αβ = [cij ] is defined if and only if n = r in which case αβ is m × s and
cij = ∑k aik bkj     (sum over k = 1, . . . , n = r)
for all appropriate i and j.
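As a quick check of this definition, here is a direct translation into code. The sketch below is an added illustration, not part of the original notes; mat_mul is simply a name chosen here, and the function computes each cij exactly as the sum above.

```python
def mat_mul(alpha, beta):
    """Product of an m x n matrix and an n x s matrix, entry by entry from the definition."""
    m, n, s = len(alpha), len(alpha[0]), len(beta[0])
    assert len(beta) == n, "the product is defined only when the inner sizes agree"
    return [[sum(alpha[i][k] * beta[k][j] for k in range(n)) for j in range(s)]
            for i in range(m)]

# the 2 x 2 pair used a little later in this section
alpha = [[1, 0], [0, 0]]
beta = [[0, 1], [0, 0]]
print(mat_mul(alpha, beta))   # [[0, 1], [0, 0]]
print(mat_mul(beta, alpha))   # [[0, 0], [0, 0]]
```

The two printed products differ, which is exactly the failure of commutativity discussed next.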
It is natural to ask whether this multiplication has reasonably nice
properties. The answer is “yes and no”. First observe that if αβ is
defined, then βα need not be, so we cannot expect a general commu-
tative law. However even when both αβ and βα are defined and have
the same size, we still need not have αβ = βα. For example, if
α = ⎡ 1 0 ⎤    and    β = ⎡ 0 1 ⎤
    ⎣ 0 0 ⎦               ⎣ 0 0 ⎦
then
αβ = ⎡ 0 1 ⎤    and    βα = ⎡ 0 0 ⎤
     ⎣ 0 0 ⎦                ⎣ 0 0 ⎦
so αβ ≠ βα in this case. Now the sum of two matrices is defined if and
only if they have the same size. With this understanding, we have the
following
Lemma 11.1. Matrix multiplication is associative and distributive.
That is, for all matrices α, β, γ over F of suitable sizes (so that the
arithmetic operations are defined) we have
α(βγ) = (αβ)γ
α(β + γ) = αβ + αγ
(β + γ)α = βα + γα
Proof. We first consider the associative law. Suppose that the left
hand side is defined. Then βγ exists so we must have β of size n × r
and γ of size r × s. Then βγ has size n × s so since α(βγ) exists, α
must have size m × n. With these sizes we see that the right hand side
is also defined. Conversely if the right hand side exists, then so does
the left and the matrix sizes must be as above.
Say α = [aij ], β = [bij ], γ = [cij ]. Let βγ = [eij ] and α(βγ) = [fij ]. Then
eij = ∑k bik ckj     (sum over k = 1, . . . , r)
and
fij = ∑k′ aik′ ek′ j = ∑k′ aik′ (∑k bk′ k ckj ) = ∑k′ ∑k aik′ bk′ k ckj
where k ′ runs from 1 to n and k runs from 1 to r.
Similarly, if (αβ)γ = [fij′ ] then
fij′ = ∑k (∑k′ aik′ bk′ k )ckj = ∑k ∑k′ aik′ bk′ k ckj = fij
and the associative law holds.
Now consider the equation α(β + γ) = αβ + αγ. Clearly the condition
that makes either side defined is that α has size m × n and that
both β and γ have size n × s. Let α = [aij ], β = [bij ], γ = [cij ] and
α(β + γ) = [eij ]. Then
[eij ] = [aij ] ([bij ] + [cij ]) = [aij ][bij + cij ]
so
eij = ∑k aik (bkj + ckj ) = ∑k aik bkj + ∑k aik ckj
with all sums over k = 1, 2, . . . , n.
Now the first term directly above is clearly the i, j-th entry in αβ =
[aij ][bij ] and the second term is the i, j-th entry in αγ = [aij ][cij ].
Thus by definition of addition we have immediately α(β+γ) = αβ+αγ.
The other distributive law follows in a similar manner and the
lemma is proved. 
It will be necessary to develop a certain amount of manipulative
skill in computing matrix products. If α = [aij ] is an m × n matrix
and β = [bij ] is n × r, then αβ = [cij ] is m × r and we have to find
all mr entries. Clearly the easiest way to scan through all these mr
entries is to work column by column or row by row.
Suppose we consider the column by column method. Say we wish to
find the jth column of αβ. As we have already observed, this depends
upon all of α but only the jth column of β. So we start by physically
(or mentally) lifting the jth column of β, turning it on its side and
placing it above the matrix α, as indicated in the following diagram.
( b1j b2j · · · bnj )      (the jth column of β, turned on its side)
⎡ a11 a12 · · · a1n ⎤
⎢ a21 a22 · · · a2n ⎥
⎢ . . . . . . . . . ⎥   = α
⎣ am1 am2 · · · amn ⎦
We then apply the b-row to each of the a-rows in turn, multiplying the
corresponding b- and a-entries and summing these products. The sum
we get from the ith row of α is then
ai1 b1j + ai2 b2j + · · · + ain bnj = cij
Thus, in this way we have found the ith entry of the jth column of αβ,
namely cij .
We can also proceed row by row. Here to find the ith row of the
product, we take the ith row of α, turn it on its side and place it next
to β.
⎡ ai1 ⎤
⎢ ai2 ⎥      (the ith row of α, turned on its side)
⎢ ..  ⎥
⎣ ain ⎦
placed next to
⎡ b11 b12 · · · b1r ⎤
⎢ b21 b22 · · · b2r ⎥
⎢ . . . . . . . . . ⎥   = β
⎣ bn1 bn2 · · · bnr ⎦
We then apply the a-column in turn to each of the b-columns, multi-
plying the corresponding a- and b-entries and summing these products.
The sum we get from the jth column of β is then
ai1 b1j + ai2 b2j + · · · + ain bnj = cij
Thus in this way we find the jth entry of the ith row of αβ.

Problems
11.1. Complete the proof of Theorem 11.1 that L(V, W ) is a vector
space.
Let V be a vector space over R with basis A = {α1 , α2 } and let W
have basis B = {β1 , β2 }. Define α1′ = α1 + 2α2 , α2′ = 2α1 + 3α2 in V
and β1′ = β1 + 2β2 , β2′ = β1 + β2 in W so that A′ = {α1′ , α2′ } is also a
basis for V and B ′ = {β1′ , β2′ } is a second basis for W .
11.2. Let T : V → W be given by
A [T ]B =
⎡ 1 4 ⎤
⎣ 5 2 ⎦
Find A′ [T ]B and A′ [T ]B ′ .

11.3. Let IV : V → V be the identity map. Find A [IV ]A , A [IV ]A′
and A′ [IV ]A . Compute the matrix product A [IV ]A′ · A′ [IV ]A .
11.4. If IW : W → W is the identity map, find B [IW ]B ′ . Compute
the matrix product A′ [IV ]A · A [T ]B · B [IW ]B ′ and compare this to A′ [T ]B ′ .
Let α, β, γ be matrices over F .
11.5. When are both products αβ and βα defined? When do the
products have the same size?
11.6. Show that α(β + γ) is defined if and only if αβ + αγ is defined
and find the common condition on the sizes of α, β, γ that guarantees
this.
11.7. Use the techniques suggested at the end of this section to
perform the matrix multiplication
⎡ ⎤⎡ ⎤
1 −1 2 1 0
⎣3 1 0⎦ ⎣−2 4⎦
2 −4 1 3 −2
Is it more efficient to use the row by row method or the column by
column method?
11.8. Let R = F n×n . Then R has defined on it an addition and a
multiplication. Which of the field axioms does R satisfy? A set with
this sort of arithmetic is known as a ring. In particular, R is the ring
of n × n matrices.

11.9. Recall that for any a ∈ F the linear transformation Ta : V →
V is defined by αTa = aα. If A = {α1 , α2 , . . . , αn } is a basis for V , find
the matrix A [Ta ]A .
11.10. Let T : V → W with
A [T ]B =
⎡  0 2 −1 ⎤
⎢  1 1 −4 ⎥
⎣ −1 3  2 ⎦
If A = {α1 , α2 , α3 }, find ker T .
12. Products of Linear Transformations


Now we have seen that the addition of matrices corresponds to ad-
dition of linear transformations and we have defined a multiplication of
matrices. What does this correspond to? Obviously to a multiplication
of linear transformations.
Let S : U → V and T : V → W be linear transformations. Then we
can define the (composition) product ST : U → W by
α(ST ) = (αS)T
for all α ∈ U . In other words, we first apply S to map U into V , and
then we apply T to map (U )S ⊆ V into W . We first observe that this
product is indeed a linear transformation.
Let α, β ∈ U and a ∈ F . Then
(α + β)(ST ) = ((α + β)S)T by definition of ST
= (αS + βS)T since S is a linear
transformation
= (αS)T + (βS)T since T is a linear
transformation
= α(ST ) + β(ST ) by definition of ST
and similarly
(aα)(ST ) = ((aα)S)T by definition of ST
= (a(αS))T since S is a linear
transformation
= a((αS)T ) since T is a linear
transformation
= a(α(ST )) by definition of ST
Thus ST is a linear transformation.
It is natural to ask whether this multiplication has nice properties
and it does. Of course, there is no reason to expect it to be commutative
since in fact ST may be defined while T S is not. However we do have
Lemma 12.1. Multiplication of linear transformations is associative
and distributive. Furthermore, a(ST ) = (aS)T = S(aT ) for all a ∈ F
and appropriate linear transformations S and T .
Proof. Let us first consider the associative law. Let R : U → V ,
S : V → W and T : W → X be three linear transformations with
U, V, W, X being vector spaces over the same field F . We compare
R(ST ) and (RS)T which both map U to X. Let α ∈ U . Then

α(R(ST )) = (αR)(ST ) by definition of R(ST )


= ((αR)S)T by definition of ST

and

α((RS)T ) = (α(RS))T by definition of (RS)T


= ((αR)S)T by definition of RS

Since these are equal for all α ∈ U , the associative law holds. Observe
that what we have shown above is that no matter how we write the
product RST , the element α(RST ) is found by first applying R, then
S and then T .
Now suppose R : U → V and S, T : V → W . Then for all α ∈ U ,

α(R(S + T )) = (αR)(S + T ) by definition of R(S + T )


= (αR)S + (αR)T by definition of S + T
= α(RS) + α(RT ) by definition of RS, RT
= α(RS + RT ) by definition of RS + RT

Thus R(S + T ) = RS + RT .
Next, let S, T : U → V and R : V → W . Then for all α ∈ U ,

α((S + T )R) = (α(S + T ))R by definition of (S + T )R


= (αS + αT )R by definition of S + T
= (αS)R + (αT )R since R is a linear
transformation
= α(SR) + α(T R) by definition of SR, T R
= α(SR + T R) by definition of SR + T R

Thus (S + T )R = SR + T R and the distributive laws are proved.


Finally, let S : U → V , T : V → W and let a ∈ F . Then for all
α ∈ U , we have

α(S(aT )) = (αS)(aT ) by definition of S(aT )


= a((αS)T ) by definition of aT
= a(α(ST )) by definition of ST
= α(a(ST )) by definition of a(ST )
and
α((aS)T ) = (α(aS))T by definition of (aS)T
= (a(αS))T by definition of aS
= a((αS)T ) since T is linear
= a(α(ST )) by definition of ST
= α(a(ST )) by definition of a(ST )
Thus S(aT ) = a(ST ) = (aS)T , as required. 
Notice that the associative and first distributive law follow formally
from the definition of product and sum of functions. But the second
distributive law requires that R is a linear transformation. Similarly
for the two formulas involving a ∈ F . One is formal, but one requires
linearity. Our main result here is the correspondence between multi-
plication of matrices and of linear transformations.
Theorem 12.1. Let S : U → V and T : V → W be linear trans-
formations with U, V, W finite dimensional vector spaces over the same
field F . Let A be a basis for U , B a basis for V , and C a basis for W .
Then
A [S]B · B [T ]C = A [ST ]C

Proof. Let A = {α1 , α2 , . . . , αm }, B = {β1 , β2 , . . . , βn } and let
C = {γ1 , γ2 , . . . , γr }. Then A [S]B = [aij ] is an m × n matrix, B [T ]C =
[bij ] is n × r, so their product exists and has size m × r, the same size
as A [ST ]C = [cij ].
Now by definition of this correspondence, we have
αi S = ∑k aik βk
and
βk T = ∑j bkj γj
so
αi (ST ) = (αi S)T = (∑k aik βk )T = ∑k aik (βk T ) = ∑k ∑j aik bkj γj = ∑j (∑k aik bkj )γj
On the other hand, by the definition of A [ST ]C = [cij ], we have
αi (ST ) = ∑j cij γj
This yields
cij = ∑k aik bkj
so [cij ] = [aij ]·[bij ] and the result follows. 
This simple result of course explains our definition of matrix mul-
tiplication. Indeed matrix multiplication was defined so that it would
correspond to the composition of linear transformations.
The result also has numerous corollaries. Suppose T : V → W and
T is given in the form of its corresponding matrix A [T ]B for some pair
of bases A and B. Now it is quite possible that if another pair A
and B  were chosen, then the matrix A [T ]B would be so simple that
we could easily visualize the action of T . We will see an example of
this later. A natural problem is then to find the relationship between
A [T ]B and A [T ]B . Recall that matrix multiplication is associative, so
a product of matrices can be defined unambiguously without the use
of parentheses.
Corollary 12.1. Let T : V → W be a linear transformation with
V and W finite dimensional vector spaces over F . Let A, A′ be bases
of V and let B, B ′ be bases for W . Then
A′ [T ]B ′ = A′ [IV ]A · A [T ]B · B [IW ]B ′
where IV : V → V and IW : W → W are the identity maps.
Proof. By the previous theorem
A′ [IV ]A · A [T ]B · B [IW ]B ′ = A′ [IV ]A · A [T ·IW ]B ′
= A′ [IV ·T ·IW ]B ′
Since clearly T = IV T IW , the result follows. 
For obvious reasons, the matrices A′ [IV ]A and B [IW ]B ′ are called the
change of basis matrices. They depend upon A, A′ , B and B ′ , but not
on the linear transformation T itself. Of course, we are now faced with
determining the nature of these matrices. Just how special are they?
In studying these change of basis matrices, we see first of all that
they correspond to linear transformations from a space to itself. So for
the time being, we will restrict all linear transformations to be from V
to V . Let A = {α1 , α2 , . . . , αn } and A′ = {α1′ , α2′ , . . . , αn′ } be bases for
V and let I : V → V be the identity transformation. If A′ [I]A = [aij ],
then
αi′ = αi′ I = ai1 α1 + ai2 α2 + · · · + ain αn
In other words, the entries in the ith row of [aij ] are merely the coefficients
that occur when we write αi′ in terms of the basis A. If A′ ≠ A
then of course [aij ] might be quite complicated. On the other hand if
A′ = A (in the same order) then clearly A [I]A = In is the n × n identity
matrix
In =
⎡ 1 0 · · · 0 ⎤
⎢ 0 1 · · · 0 ⎥
⎢ . . . . . . ⎥
⎣ 0 0 · · · 1 ⎦
That is In = [δij ] where δij = 1 for i = j, and δij = 0 for i ≠ j.
Certainly A [T ]A = In if and only if T = I. Thus we can easily identify
the identity transformation from the matrix A [I]A , but not so easily
from a random A′ [I]A .
Now In is easily seen to be the identity element in F n×n , that is
In α = αIn = α for all α ∈ F n×n . Next let β ∈ F n×n . We say that
β is nonsingular if and only if β has an inverse β −1 ∈ F n×n with
ββ −1 = β −1 β = In .
Lemma 12.2. Let β = A [T ]A with T : V → V . Then β is non-
singular if and only if T is nonsingular and when this occurs we have
β −1 = A [T −1 ]A .
Proof. Suppose T is nonsingular. Then T is one-to-one and onto
and hence T −1 exists. From the definition of T −1 we have clearly
T T −1 = T −1 T = I. Thus
A [T ]A · A [T −1 ]A = A [T T −1 ]A = A [I]A = In
and
A [T −1 ]A · A [T ]A = A [T −1 T ]A = A [I]A = In
Therefore β has an inverse, namely A [T −1 ]A .
Conversely, suppose β is nonsingular and, by Theorem 11.2, let
S : V → V be given by β −1 = A [S]A . Then
A [ST ]A = A [S]A · A [T ]A = β −1 β = In
so ST = I. Similarly
A [T S]A = A [T ]A · A [S]A = ββ −1 = In
so T S = I. Observe that for α ∈ V , (αS)T = αI = α so T is onto.
Also (αT )S = αI = α, so αT = 0 implies α = 0 and T is one-to-one.
Thus T is nonsingular and then clearly S = T −1 , so β −1 = A [T −1 ]A
and the result follows. 
In particular, since I is nonsingular with inverse I −1 = I, the above
shows that every change of basis matrix β = A′ [I]A is nonsingular with
inverse β −1 = A [I]A′ . Now for the converse.
Lemma 12.3. Let β be a nonsingular matrix and let A be a basis
for V . Then there exist bases A′ and A′′ of V with
β = A′ [I]A = A [I]A′′
In particular, a square matrix is a change of basis matrix if and only
if it is nonsingular.
Proof. Choose T : V → V so that β = A [T ]A . Since β is nonsingular,
so is T . Let A′ = (A)T be the image of A under T and let
A′′ = (A)T −1 . Since A spans V , it is clear that A′ spans im T = V , and
A′′ spans im T −1 = V . Since these are spanning sets of size n = dimF V ,
they are therefore bases for V . Now clearly
In = A [T ]A′ = A [T −1 ]A′′
so
β = A [T ]A · In = A [T ]A · A [T −1 ]A′′ = A [I]A′′
and
β = In−1 β = (A [T ]A′ )−1 · A [T ]A = A′ [T −1 ]A · A [T ]A = A′ [I]A
The lemma is proved. 
Let U, V and W be vector spaces over F and suppose T : V →
W is a linear transformation. Then via composition, T induces maps
T1 : L(W, U ) → L(V, U ) and T2 : L(U, V ) → L(U, W ) given by
T1 : S → T S and T2 : S → ST
It follows easily from Lemma 12.1 that both T1 and T2 are linear trans-
formations. We will use this in the special case below where U is the
1-dimensional vector space F .
Let V be a vector space over F . Then L(V, F ) is called the dual
space of V and is denoted by V ∗ . Furthermore, the elements λ ∈ V ∗
are called linear functionals. Of course, these are just linear transfor-
mations so that (α1 + α2 )λ = α1 λ + α2 λ and (cα1 )λ = c(α1 λ) for all
α1 , α2 ∈ V and c ∈ F .
Lemma 12.4. Let dimF V = n and let A = {α1 , α2 , . . . , αn } be a
basis for V . Then the functionals αi∗ : V → F defined by
αi∗ : αj → 1 if j = i,    and    αi∗ : αj → 0 if j ≠ i
form a basis A∗ = {α1∗ , α2∗ , . . . , αn∗ }, the dual basis, for V ∗ . In particu-
lar, dimF V ∗ = dimF V .

Proof. Since A is a basis for V , Theorem 8.1 implies that there


exists one and only one linear functional β : V → F with (αi )β = ai for
each choice of a1 , a2 , . . . , an ∈ F . In particular, it follows that the dual
functionals αi∗ exist. Furthermore, since β = a1 α1∗ + a2 α2∗ + · · · + an αn∗
maps αi to ai for all i, we see that this β must be the unique functional
sending αi to ai . We now conclude easily that A∗ is a basis for V ∗ . 
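For instance (an added illustration, not in the original notes), let V = F 2 with basis A = {α1 , α2 } where α1 = (1, 0) and α2 = (0, 1). Then α1∗ sends (a, b) = aα1 + bα2 to a, and α2∗ sends (a, b) to b, so the dual basis simply reads off coordinates. Moreover, the functional λ with (α1 )λ = c1 and (α2 )λ = c2 is exactly c1 α1∗ + c2 α2∗ .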
 
Recall that if α = [aij ] is the m × n matrix given by
α =
⎡ a11 a12 · · · a1n ⎤
⎢ a21 a22 · · · a2n ⎥
⎢ . . . . . . . . . ⎥
⎣ am1 am2 · · · amn ⎦  ∈ F m×n
then the transpose αT of α is defined to be the n × m matrix
αT =
⎡ a11 a21 · · · am1 ⎤
⎢ a12 a22 · · · am2 ⎥
⎢ . . . . . . . . . ⎥
⎣ a1n a2n · · · amn ⎦  ∈ F n×m
In other words, αT = [aij′ ] where aij′ = aji for all appropriate i and j.
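For example (not in the original notes), if
α = ⎡ 1 2 3 ⎤
    ⎣ 4 5 6 ⎦
is in F 2×3 , then
αT = ⎡ 1 4 ⎤
     ⎢ 2 5 ⎥
     ⎣ 3 6 ⎦
is in F 3×2 ; the rows of α have become the columns of αT .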
Theorem 12.2. Let V and W be finite dimensional F -vector spaces
with bases A and B respectively, and let T : V → W be a linear transfor-
mation. Then the map T ∗ : W ∗ → V ∗ given by λ → T λ for all λ ∈ W ∗
is a linear transformation and the corresponding matrices A [T ]B and

B∗ [T ∗ ]A∗ are transposes of each other.

Proof. We have already observed that T ∗ : W ∗ → V∗ is a linear
transformation. Now, let A [T ]B = [aij ] so that αi T = ∑j aij βj and
let us write B∗ [T ∗ ]A∗ = [a∗ji ] so that βj∗ T ∗ = ∑i a∗ji αi∗ . Then βj∗ T ∗
is the unique functional in V ∗ that sends αi to a∗ji . Furthermore, by
definition, βj∗ T ∗ is the composite map T βj∗ , so this functional sends αi
to (αi T )βj∗ = (∑k aik βk )βj∗ = aij . Thus a∗ji = aij and the matrices
A [T ]B and B∗ [T ∗ ]A∗ are indeed transposes of each other. 
A [T ]B and B∗ [T ]A∗ are indeed transposes of each other. 

Problems
12.1. Suppose T1 : V1 → V2 , T2 : V2 → V3 , . . ., Tk : Vk → Vk+1 are
linear transformations or in fact any functions. Show that the composi-
tion product T1 T2 · · · Tk with any choice of parentheses merely amounts
to first applying T1 , then T2 , . . ., and then Tk .
12.2. Let S : U → V and T : V → W be linear transformations
with U, V and W finite dimensional. Prove that
min{rank S, rank T } ≥ rank ST ≥ rank S + rank T − dim V
For the second inequality, let R be the restriction of T to im S ⊆ V .
Then im R = im ST and ker R = im S ∩ ker T ⊆ ker T , so
dim ker R ≤ dim ker T = dim V − rank T
12.3. Let α ∈ F m×n and β ∈ F n×r so that αβ ∈ F m×r . Use
Theorem 11.2 and the preceding problem with U = F m , V = F n and
W = F r to show that
min{rank α, rank β} ≥ rank αβ ≥ rank α + rank β − n

12.4. Prove that In is the unique identity element of the ring F n×n .
More generally, if α ∈ F m×n and β ∈ F n×k show that αIn = α and
In β = β.
12.5. Show how to deduce the associative and distributive laws of
matrix multiplication from those of linear transformation multiplica-
tion.
 
12.6. Let T : V → W be a linear transformation with A [T ]B = [cij ]
where A = {α1 , α2 , . . . , αm } and B = {β1 , β2 , . . . , βn } are bases. Let
α = a1 α1 + a2 α2 + · · · + am αm ∈ V
and
αT = β = b1 β1 + b2 β2 + · · · + bn βn ∈ W
Show that matrix multiplication yields
( a1 a2 · · · am ) ·
⎡ c11 c12 · · · c1n ⎤
⎢ c21 c22 · · · c2n ⎥
⎢ . . . . . . . . . ⎥
⎣ cm1 cm2 · · · cmn ⎦
= ( b1 b2 · · · bn )
How does this relate to Theorem 12.1?
12.7. Find the rank of the linear transformation T if
A [T ]B =
⎡  1 0 3 −2 ⎤
⎢ −1 2 1  4 ⎥
⎣ −1 6 9  8 ⎦
12.8. Use Lemma 12.1 to prove in detail that the maps T1 and T2
described after Lemma 12.3 are both linear transformations.
12.9. If α and β are matrices of appropriate sizes so that αβ exists,
prove that (αβ)T = β T αT .
12.10. Let β ∈ F n×n . Show that β is nonsingular if and only if β T
is nonsingular.
13. Eigenvalues and Eigenvectors


Suppose T : V → W is a linear transformation. What does it look
like? To start with, by Theorem 9.3, there exist bases A and B of V
and W respectively with
A [T ]B = Dr = [δij ]
where δ11 = δ22 = · · · = δrr = 1 and all remaining δij are 0. Here,
of course, r is the rank of T . In this way, T is clearly described in
the nicest possible manner. But what happens when W = V ? Does
it really make sense to choose two different bases for V in order to
describe T ? The answer is definitely “no”. For example, suppose we
are given A and B, both bases for V , with A [T ]B = Dr . Then we will
obviously want to find some relationship between A and B, and this
clearly reduces the situation to the case of a single basis. Thus our
main problem here is to find a basis A for V so that A [T ]A is as simple
as possible.
For the remainder of this section, V will be a fixed vector space
over F of finite dimension and all linear transformations will map V
to V . Before we concern ourselves with the problem of choosing a nice
basis for T , let us consider what happens to the matrix associated with
T under any sort of change of basis.
Lemma 13.1. Let T : V → V and let A be a basis for V .
i. If B is a second basis for V , then
B [T ]B = β −1 · A [T ]A · β
where β is the nonsingular matrix A [I]B .
ii. Conversely, given a nonsingular matrix β ∈ F n×n there exists
a basis B such that B [T ]B is given as above.

Proof. We know that


B [T ]B = B [I]A · A [T ]A · A [I]B
Moreover if β = A [I]B , then β −1 = B [I]A so
B [T ]B = β −1 · A [T ]A · β
and this yields (i ).
Conversely, if β is any nonsingular matrix, then Lemma 12.3 implies
that there exists a basis B with β = A [I]B . Since β −1 = B [I]A , it then
follows that
β −1 · A [T ]A · β = B [T ]B
and the lemma is proved. 
Let α, γ ∈ F n×n . We say that α and γ are similar if there exists
a nonsingular matrix β with γ = β −1 αβ. Thus the above lemma says
that α and γ are similar if and only if they represent the same linear
transformation but over possibly different bases, that is α = A [T ]A and
γ = B [T ]B .
Now let us consider some examples to see how nice we can hope
A [T ]A to be.
Example 13.1. If I : V → V is the identity map, then certainly
for all bases A we have A [I]A = In where dimF V = n.
Example 13.2. Let Ta : V → V denote scalar multiplication by
a ∈ F . Then for any α ∈ V we have αTa = aα, so clearly
A [Ta ]A =
⎡ a 0 · · · 0 ⎤
⎢ 0 a · · · 0 ⎥
⎢ . . . . . . ⎥
⎣ 0 0 · · · a ⎦  = aIn
Thus all entries down the main diagonal (namely the i, i-entries for
i = 1, 2, . . . , n) are equal to a and the remaining terms are 0.
Example 13.3. The above shows that we cannot hope to always
get matrices of the form Dr . In fact, what happens when A [T ]A = Dr ?
Say A = {α1 , α2 , . . . , αn } and let W = ⟨α1 , α2 , . . . , αr ⟩ and W ′ =
⟨αr+1 , αr+2 , . . . , αn ⟩. Then certainly V = W ⊕ W ′ , αT = α for α ∈ W ,
and αT = 0 for α ∈ W ′ . This shows that T is essentially the projection
from V to W followed by the embedding of W back into V . In other
words, if α ∈ V satisfies α = β + β ′ with β ∈ W and β ′ ∈ W ′ , then
αT = β where we think of β as being an element of V .
These three examples all have one thing in common. The matrices
A [T ]A above are all diagonal matrices, that is their only nonzero entries
occur on the main diagonal. Since, in such matrices, we need only
be concerned with the n diagonal entries, we can use the shorthand
notation diag(a1 , a2 , . . . , an ) for the n × n diagonal matrix
diag(a1 , a2 , . . . , an ) =
⎡ a1 0  · · · 0  ⎤
⎢ 0  a2 · · · 0  ⎥
⎢ . . . . . . .  ⎥
⎣ 0  0  · · · an ⎦
Let T : V → V . If there exists a basis A such that A [T ]A is diagonal,
then we say that T can be diagonalized.
Suppose A [T ]A = diag(a1 , a2 , . . . , an ) where A = {α1 , α2 , . . . , αn }.
Then by definition, we have αi T = ai αi for i = 1, 2, . . . , n. In other



this property are therefore of interest to us and we give them a name.
Suppose α ∈ V, α ≠ 0 and αT = aα for some a ∈ F . Then we say
that α is an eigenvector for T and a is its associated eigenvalue. The
following theorem is now obvious. It is true by definition.
Theorem 13.1. Let T : V → V be a linear transformation. Then
T can be diagonalized if and only if V has a basis consisting entirely of
eigenvectors of T .
Now we ask, can a linear transformation always be diagonalized?
By the above we are asking for a full basis consisting of eigenvectors.
But do eigenvectors necessarily exist and how do we find them? The
answer as we will see in due time is that eigenvectors always exist if
the field F is in some sense big enough, but even then there may not
be enough to yield a basis for V . Let us consider some examples.
Example 13.4. Let V be a 2-dimensional space over R with basis
{α1 , α2 } and suppose T is given by
α1 T = α1 + 2α2
α2 T = −α1 − α2
Let α = a1 α1 + a2 α2 ∈ V with αT = cα for some c ∈ R. Then
ca1 α1 + ca2 α2 = cα = αT
= a1 (α1 + 2α2 ) + a2 (−α1 − α2 )
= (a1 − a2 )α1 + (2a1 − a2 )α2
This yields
ca1 = a1 − a2 , ca2 = 2a1 − a2
so
(1 − c)a1 = a2 , (1 + c)a2 = 2a1
Thus if we assume that α ≠ 0, then we have a1 ≠ 0, a2 ≠ 0 and
a1 1 1+c
= =
a2 1−c 2
But this yields
0 = 2 − (1 + c)(1 − c) = 1 + c2
and there is no such c ∈ R satisfying this equation. On the other hand,
c = √−1 ∈ C is a solution, so the trouble here is that R is just not big
enough.
Example 13.5. Let V be as above, but now let T be given by


α1 T = − α1 + 2α2
α2 T = −2α1 + 3α2
Suppose α = a1 α1 + a2 α2 ∈ V is an eigenvector for T with correspond-
ing eigenvalue c ∈ R. Then
ca1 α1 + ca2 α2 = cα = αT
= a1 (−α1 + 2α2 ) + a2 (−2α1 + 3α2 )
= (−a1 − 2a2 )α1 + (2a1 + 3a2 )α2
Thus
ca1 = −a1 − 2a2 , ca2 = 2a1 + 3a2
so
(c + 1)a1 = −2a2 , (c − 3)a2 = 2a1
Again if α ≠ 0, then a1 ≠ 0, a2 ≠ 0 and
a1 −2 c−3
= =
a2 c+1 2
so
0 = (c + 1)(c − 3) + 4 = (c − 1)2
Thus c = 1, a2 = −a1 and α = a1 (α1 −α2 ). We have therefore found an
eigenvector, namely α1 − α2 , but all other eigenvectors are just scalar
multiples of this one. Hence T cannot be diagonalized.
Suppose T : V → V and we wish to solve the equation αT = aα
for α = 0. If a is known then it is apparent that this is merely a set of
linear equations in the coefficients of α with respect to some basis of V
and this offers no great difficulty. Thus the real problem is to find the
eigenvalues. If S : V → V is a linear transformation, we say that S is
singular if and only if it is not nonsingular. In view of Corollary 9.3 this
occurs if and only if ker S ≠ 0. The following is a trivial reformulation
of the eigenvalue problem.
Theorem 13.2. Let T : V → V be a linear transformation. Then
a ∈ F is an eigenvalue for T if and only if the transformation S =
aI − T is singular. When this occurs then 0 ≠ α ∈ ker S if and only if
α is an eigenvector of T with corresponding eigenvalue a.
Proof. S = aI − T is singular if and only if there exists α ∈ V ,
α ≠ 0 with
0 = αS = α(aI − T ) = aα − αT
so the result is clear. 
Let us try another example, a really ugly one this time.
Example 13.6. Let V be a 3-dimensional vector space over R with


basis {α1 , α2 , α3 } and suppose T : V → V is given by

α1 T = −10α1 + 9α2 + 6α3


α2 T = 32α1 − 23α2 − 18α3
α3 T = −61α1 + 47α2 + 35α3

If a ∈ R is an eigenvalue of T , then S = aI − T is singular. Clearly

α1 S = (a + 10)α1 − 9α2 − 6α3


α2 S = −32α1 + (a + 23)α2 + 18α3
α3 S = 61α1 − 47α2 + (a − 35)α3

Now how do we determine when S is singular? By Corollary 9.3


this occurs if and only if ker S ≠ 0, so choose β = b1 α1 + b2 α2 + b3 α3
and suppose βS = 0. Indeed, in view of the preceding theorem, any
such nonzero β will be an eigenvector of T . We have

0 = βS = b1 (α1 S) + b2 (α2 S) + b3 (α3 S)


= ((a + 10)b1 − 32b2 + 61b3 )α1
+ (−9b1 + (a + 23)b2 − 47b3 )α2
+ (−6b1 + 18b2 + (a − 35)b3 )α3

so we must have

0 = h1 = (a + 10)b1 − 32b2 + 61b3


0 = h2 = −9b1 + (a + 23)b2 − 47b3
0 = h3 = −6b1 + 18b2 + (a − 35)b3

We solve this system of equations in b1 , b2 and b3 by eliminating one


variable at a time. Since we do not know the value of a ∈ R, we will
therefore avoid dividing by any coefficients that depend upon it. Thus
we start by saving the third equation and eliminating b1 from the first
and second. We get

0 = k1 = h 3 = − 6b1 + 18b2 + (a − 35)b3


0 = k2 = (a + 10)h3 + 6h1 = (18a − 12)b2 + (a² − 25a + 16)b3
0 = k3 = 6h2 − 9h3 = (6a − 24)b2 + (33 − 9a)b3


Unfortunately, at this point the coefficients in k2 and k3 all depend


upon a and therefore the elimination procedure hits a slight snag. How-
ever, if we replace k2 by k2 − 3k3 a simplification occurs
0 = ℓ1 = k1 = −6b1 + 18b2 + (a − 35)b3
0 = ℓ2 = k2 − 3k3 = 60b2 + (a² + 2a − 83)b3
0 = ℓ3 = k3 = (6a − 24)b2 + (33 − 9a)b3
Finally, we save the first two equations and eliminate b2 from the
third to obtain
0 = m1 = ℓ1 = −6b1 + 18b2 + (a − 35)b3
0 = m2 = ℓ2 = 60b2 + (a² + 2a − 83)b3
0 = m3 = (a − 4)ℓ2 − 10ℓ3 = (a³ − 2a² − a + 2)b3
Suppose (a³ − 2a² − a + 2) ≠ 0. Then 0 = m3 implies that b3 = 0
and we get in turn b2 = 0 from 0 = m2 and b1 = 0 from 0 = m1 . Thus
for a nonzero solution in the bs we must have
0 = (a³ − 2a² − a + 2) = (a − 1)(a + 1)(a − 2)
so a = 1, −1, or 2. For each of these three values it is now a simple
matter to find nonzero b1 , b2 , b3 and we get
a1 = a = 1 β1 = β = −5α1 + 4α2 + 3α3
a2 = a = −1 β2 = β = −9α1 + 7α2 + 5α3
a3 = a = 2 β3 = β = −7α1 + 5α2 + 4α3
Since βS = 0, it follows from Theorem 13.2 that βi is an eigenvector
for T with corresponding eigenvalue ai . Now it is an easy matter to
check (and in fact a consequence of a later theorem) that β1 , β2 and β3
are linearly independent. Thus V has a basis B = {β1 , β2 , β3 } consisting
of eigenvectors of T , so
B [T ]B = diag(1, −1, 2)
and T is diagonalized. Certainly, T is now easily understood. We
observe that here as in all earlier examples the eigenvalues of T occur
as roots of certain polynomial equations.
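As a quick numerical cross-check of Example 13.6 (an illustrative sketch only, assuming the numpy library), one can hand the matrix of T to a standard eigenvalue routine. Since T acts on row vectors, the row eigenvectors of the matrix A below are the usual column eigenvectors of its transpose.

import numpy as np

A = np.array([[-10,   9,   6],
              [ 32, -23, -18],
              [-61,  47,  35]], dtype=float)   # rows are the images of alpha1, alpha2, alpha3

vals, vecs = np.linalg.eig(A.T)
print(np.round(vals, 6))       # the eigenvalues 1, -1, 2, in some order
# each column of vecs is a normalized multiple of the corresponding
# eigenvector found in the text, e.g. (-5, 4, 3) for the eigenvalue 1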
Now it seems reasonable that the elimination method will probably
work in general. But it is certainly unpleasant. There are difficulties
with coefficients being functions of the possible eigenvalue and there
are also difficulties with zero coefficients that we did not encounter
here. Moreover there are certainly many different possibilities for the
order in which we carry out this procedure so that the answer we get
is not a priori strictly a function of T . Wouldn’t it be nice therefore if


there existed a fixed function of the entries of the matrix A [S]A which
would tell us at a glance just when S is singular? As we will see in the
next chapter, there is a natural candidate for such a function and it
indeed does the job.

Problems

13.1. For α, β ∈ F n×n , let us write α ∼ β if and only if α and β are


similar. Prove that ∼ is an equivalence relation.
Let V be a fixed finite dimensional vector space over F and let
T : V → V . For each a ∈ F we set
Va = {α ∈ V | αT = aα}
13.2. Show that Va is a subspace of V .
13.3. Suppose S : V → V commutes with T , so that ST = T S.
Prove that (Va )S ⊆ Va .
13.4. If a and b are distinct elements of F , show that Va ∩ Vb = 0
and conclude that Va + Vb = Va ⊕ Vb .
13.5. Suppose T satisfies T 2 = T . Show that V = V0 ⊕ V1 and
that T is the projection map, with respect to this direct sum, of V into
V1 ⊆ V .
13.6. Suppose that every vector α ∈ V satisfies αT = aα for some
a ∈ F depending upon α. Show that V = Vb for some b ∈ F so that
T = Tb .
13.7. Assume that V = Va1 ⊕ Va2 ⊕ · · · ⊕ Var for suitable distinct
a1 , a2 , . . . , ar ∈ F . Prove that T can be diagonalized.
13.8. Let f(x) = Σ_{i=0}^{m} ci x^i ∈ F [x] be a polynomial over F and
consider
f(T) = Σ_{i=0}^{m} ci T^i ∈ L(V, V )
where by convention T 0 = I, the identity transformation. If α ∈ Va ,
show that
αf (T ) = f (a)α
Conclude that if f (T ) = 0 and if α = 0, then a must be a root of the
polynomial f .
13.9. Suppose T k = 0 for some integer k ≥ 1. Find all eigenvalues
for T and show that T can be diagonalized if and only if T = 0.


13.10. Let n = dimF V . Prove that there exists a nonzero polyno-
mial f(x) of degree ≤ n² such that f(T) = 0. Hint. Recall that L(V, V )
is a vector space over F of dimension n² and consider the n² + 1 linear
transformations T^0 = I, T^1 = T, T^2 , . . . , T^(n²).

CHAPTER III

Determinants


14. Volume Functions


Let V be an n-dimensional vector space over F and let S : V → V
be a linear transformation. When is S singular? If B = {β1 , β2 , . . . , βn }
is a basis for V , then certainly (B)S = {β1 S, β2 S, . . . , βn S} spans
the image of S and thus S is singular if and only if the n vectors
β1 S, β2 S, . . . , βn S are linearly dependent. So we change the problem.
Given n vectors α1 , α2 , . . . , αn in V , when are they linearly dependent?
We consider some examples.
Example 14.1. Let V = R2 so that V is the Euclidean plane with
each vector corresponding to a point in this plane. Let α1 , α2 ∈ V and
draw the parallelogram whose vertices include the vectors 0, α1 and α2 .
The fourth vertex is of course α1 + α2 . Now clearly α1 and α2 are

[Figure: the parallelogram with vertices 0, α1, α2 and α1 + α2]

dependent if and only if one of them is an R-multiple of the other


and therefore if and only if 0, α1 and α2 are collinear. In this case the
parallelogram reduces to a straight line and hence has zero area. Thus
the vanishing of the area v(α1 , α2 ) of the parallelogram is a necessary
and sufficient condition for the linear dependence of α1 and α2 .
Since the diagonals of a parallelogram bisect each other, we also
note that the midpoint of the line segment joining α1 and α2 is precisely
the point (1/2)(α1 + α2 ), the midpoint of the line joining 0 and α1 + α2 .
Of course, this property carries over to R3 and indeed to all Rn .
Example 14.2. Now let V = R3 so that V is Euclidean 3-space,
with each vector again corresponding to a point in this space, and let
α1 , α2 , α3 ∈ V . Here we draw (on the next page) the parallelepiped


whose vertices include 0, α1 , α2 and α3 . The other vertices are then


α1 + α2 , α1 + α3 , α2 + α3 and α1 + α2 + α3 . We see easily that α1 , α2 , α3
are linearly dependent if and only if the four points 0, α1 , α2 , α3 are
coplanar. Indeed, let us suppose for example that α1 and α2 are lin-
early independent, that α3 = a1 α1 + a2 α2 , and let P denote the plane

[Figure: the parallelepiped with vertices 0, α1, α2, α3, α1 + α2, α1 + α3, α2 + α3 and α1 + α2 + α3]

determined by the triangle with vertices 0, α1 and α2 . For convenience,


set α1 = 2a1 α1 and α2 = 2a2 α2 . Then α1 is on the line joining 0 to α1 ,
and α2 is on the line joining 0 to α2 , so both points are on the plane
P. Finally, α3 = (1/2)(α1 + α2 ) is the midpoint of the line segment
joining α1 and α2 , so α3 is also on the plane P. Note that this occurs
if and only if the volume v(α1 , α2 , α3 ) of the parallelepiped is zero.
More generally, if V = Rn then we are led to consider the n-
dimensional volume of the generalized parallelepiped whose vertices
include the points 0, α1 , α2 , . . . , αn . Of course, we have no idea as to
what the volume really is or in fact whether it exists at all. Indeed,
even if we were willing to believe that such a volume exists in Rn , we
would still have to come up with something that works for any vector
space over any field. So instead, what we will do is use this volume
idea as a guide to discover some appropriate function v(α1 , α2 , . . . , αn )
that will work in general.
Let us consider some properties of n-dimensional volume in Rn .
Actually, we will work in R2 and argue that the analogous situation
probably holds in Rn . First, v(α1 , α2 , . . . , αn ) ∈ R. Of course we might
be tempted to say that the volume is always a nonnegative real number,
but this thought quickly dies. One reason is that not all fields have a


concept of positive and negative. Additional reasons even for the field
R occur later in this section.
Now how does v(α1 , α2 , . . . , αn ) behave as a function of each vari-
able individually? Namely, suppose we fix all but the ith vector. We
then have a map from V to R given by αi → v(α1 , α2 , . . . , αi , . . . , αn ).
What sort of a function is it?
Let a ∈ R and consider v(α1 , α2 , . . . , aαi , . . . , αn ). By multiplying
αi by a we have clearly expanded the parallelepiped linearly in one di-
rection by a factor of a and thus we expect the volume to be multiplied
by a. In other words
v(α1 , α2 , . . . , aαi , . . . , αn ) = a·v(α1 , α2 , . . . , αi , . . . , αn )
This can clearly be seen in the figure below where we have compared
v(3α1 , α2 ) and v(α1 , α2 ). If a is negative, then multiplication by a
reverses the sign of a real number. Again negative volumes seem to
appear. Of course, we could still multiply v(α1 , α2 , . . . , αi , . . . , αn ) by
|a|, the absolute value of a, to keep things positive, but the formula is
just not as nice and besides, absolute values do not exist in all fields
that might be of interest to us.

[Figure: comparison of the parallelograms for v(3α1, α2) and v(α1, α2)]

More complicated is the problem of addition of vectors. Namely,


what can we say about v(α1 , α2 , . . . , αi + αi′ , . . . , αn )? Let us consider
n = 2 and the area v(α1 + α1′ , α2 ) which is the area of the paral-
lelogram ACC′A′ in the diagram below. Since triangles
ABC and A′B′C′ are clearly congruent, we see that v(α1 + α1′ , α2 ) is
equal to the sum of the areas of the two parallelograms ABB′A′ and
BCC′B′. Of course, the area of ABB′A′ is just v(α1 , α2 ). Moreover,


[Figure: the parallelogram ACC′A′ divided into the parallelograms ABB′A′ and BCC′B′, with vertices 0, α1, α1 + α1′, α2, α1 + α2 and α1 + α1′ + α2]

as we see from the diagram below, the area of parallelogram BCC′B′ is
the same as the area of the congruent parallelogram DEE′D′, namely
v(α1′ , α2 ). Thus we have v(α1 + α1′ , α2 ) = v(α1 , α2 ) + v(α1′ , α2 ).

[Figure: the parallelogram BCC′B′ translated to the congruent parallelogram DEE′D′ with area v(α1′, α2)]


More generally, we would expect that


v(α1 , α2 , . . . , αi + αi′ , . . . , αn ) = v(α1 , α2 , . . . , αi , . . . , αn ) + v(α1 , α2 , . . . , αi′ , . . . , αn )
Therefore we see that v(α1 , α2 , . . . , αn ) viewed as a function strictly of
the ith variable is in fact a linear functional from V to R.
Finally, since we have only treated each variable independently so
far, we must now consider them together. We know for example that if
the vectors α1 , α2 , . . . , αn are linearly dependent, then the generalized
parallelepiped collapses to an n-dimensional surface rather than a solid
and thus has zero volume. For later convenience, we will just record a
special case of this. Namely we note that if two vectors αi and αj are
equal, with i ≠ j, then v(α1 , . . . , αi , . . . , αj , . . . , αn ) = 0.
Now we observe that the properties of v listed above translate easily
to an arbitrary vector space to give us a volume function. Let V be a
vector space over F of dimension n < ∞. A volume function

v : V × V × · · · × V → F    (n factors)

is a function of n variable vectors of V with image in F satisfying the


following three axioms.

V1. For all a ∈ F and subscripts i, we have


v(α1 , α2 , . . . , aαi , . . . , αn ) = a·v(α1 , α2 , . . . , αi , . . . , αn )
V2. For all i, we have
v(α1 , α2 , . . . , αi + αi′ , . . . , αn ) = v(α1 , α2 , . . . , αi , . . . , αn ) + v(α1 , α2 , . . . , αi′ , . . . , αn )
V3. For all i ≠ j, if αi = αj then
v(α1 , . . . , αi , . . . , αj , . . . , αn ) = 0

The first two conditions say that v is multilinear while the third
condition says that v is alternating.
Example 14.3. The zero map v : V × V × · · · × V → F is always
trivially a volume function.
Example 14.4. If dimF V = n = 1, then any linear functional
T : V → F is a volume function since the third axiom is vacuously
satisfied.


Example 14.5. Suppose n = 2 and let {β1 , β2 } be a fixed basis for


V . If α1 , α2 ∈ V , then α1 = a11 β1 + a12 β2 and α2 = a21 β1 + a22 β2 for
uniquely determined field elements aij ∈ F . If we set
v(α1 , α2 ) = v(a11 β1 + a12 β2 , a21 β1 + a22 β2 ) = a11 a22 − a12 a21
then it is trivial to see that v : V × V → F is indeed a volume function.
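As a small sketch in plain Python (no libraries assumed), the volume function of Example 14.5 can be written out and its defining properties spot-checked on coordinate pairs taken with respect to the fixed basis {β1, β2}.

def v(alpha1, alpha2):
    # alpha1 = (a11, a12), alpha2 = (a21, a22) are coordinates in the basis {beta1, beta2}
    a11, a12 = alpha1
    a21, a22 = alpha2
    return a11 * a22 - a12 * a21

print(v((1, 0), (0, 1)))                      # 1, the value on the basis
print(v((3, 0), (0, 1)))                      # 3 = 3 * v((1, 0), (0, 1)), axiom V1
print(v((1, 2), (1, 2)))                      # 0, equal arguments, axiom V3
print(v((1, 2), (3, 4)) + v((5, 6), (3, 4)))  # additivity in the first variable, axiom V2
print(v((1 + 5, 2 + 6), (3, 4)))              # same value as the line above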
Let us consider some elementary properties of volume functions. Of
course, for all we know at the moment, some vector spaces V may only
have the zero volume function. For the remainder of this section, let V
be an n-dimensional vector space over the field F and let v be a volume
function.
Lemma 14.1. With the above notation, we have
i. If αi = 0 for some i, then v = 0. In other words,
v(α1 , . . . , 0, . . . , αn ) = 0
ii. If we add a scalar multiple of αj to αi , for j ≠ i, then the value
of v does not change. That is,
v(α1 , . . . , αi + aαj , . . . , αn ) = v(α1 , . . . , αi , . . . , αn )
iii. If the vectors αi and αj , for i ≠ j, are interchanged, then the
value of v is multiplied by −1. Thus,
v(α1 , . . . , αj , . . . , αi , . . . , αn ) = −v(α1 , . . . , αi , . . . , αj , . . . , αn )
Proof. (i ) This is just the fact that the image of 0 under a linear
transformation is always 0.
(ii ) By the axioms for a volume function, we have
v(. . . , αi + aαj , . . . , αj , . . .)
= v(. . . , αi , . . . , αj , . . .) + v(. . . , aαj , . . . , αj , . . .)
= v(. . . , αi , . . . , αj , . . .) + a·v(. . . , αj , . . . , αj , . . .)
= v(. . . , αi , . . . , αj , . . .)
(iii ) Let us replace the ith and jth variables by αi + αj . Then by
linearity and (ii) above, we have
0 = v(. . . , αi + αj , . . . , αi + αj , . . .)
= v(. . . , αi + αj , . . . , αi , . . .) + v(. . . , αi + αj , . . . , αj , . . .)
= v(. . . , αj , . . . , αi , . . .) + v(. . . , αi , . . . , αj , . . .)
and the result follows. 
Lemma 14.2. If the vectors α1 , α2 , . . . , αn are linearly dependent,
then v(α1 , α2 , . . . , αn ) = 0.


Proof. Since the vectors are linearly dependent, for some i, we can
solve for αi in terms of the remaining αj and obtain αi + Σ_{j≠i} aj αj = 0.
We now apply part (ii) of the previous lemma and successively add
a1 α1 , a2 α2 , . . . , an αn (for j ≠ i) to the ith variable without changing v.
Thus
v(α1 , . . . , αi , . . . , αn ) = v(α1 , . . . , αi + Σ_{j≠i} aj αj , . . . , αn )
= v(α1 , . . . , 0, . . . , αn ) = 0
and the lemma is proved. 
This solves part of the problem. Namely if the n vectors are linearly
dependent then the volume function vanishes. What happens if the
vectors are linearly independent? We will show below that if v is not
the identically zero function, then v does not vanish on any linearly
independent set.
Lemma 14.3. Suppose σ is any permutation of the set {1, 2, . . . , n}.
Then
v(ασ(1) , . . . , ασ(i) , . . . , ασ(n) ) = ± v(α1 , . . . , αi , . . . , αn )
where the ± sign is determined solely by σ.
Proof. By Lemma 14.1(iii ), we can interchange any two entries
and only change v by a factor of −1. So we first interchange α1 with
ασ(1) if 1 ≠ σ(1) so that v(ασ(1) , . . . , ασ(i) , . . . , ασ(n) ) now has α1 as its
first variable. Then we interchange α2 if necessary with the second
entry and continue this process. At each step we either leave v alone
or multiply it by −1. This clearly yields the result since the ± sign is
determined in this way by σ. 
It is important to observe that the choice of the ± sign above de-
pends on our specific procedure for reordering the αi s. There is cer-
tainly no a priori reason to believe that a different reordering scheme
will yield the same sign. Indeed this is the fundamental problem in
trying to define a nontrivial volume function. The following argument
may look complicated, but it is really a simple application of the multi-
linearity of v.
Theorem 14.1. Let V be a vector space over the field F having
dimension n < ∞. Let A = {α1 , α2 , . . . , αn } be a subset of V and let
B = {β1 , β2 , . . . , βn } be a basis. If v is a volume function then there
exists c ∈ F depending only upon A and B with
v(α1 , α2 , . . . , αn ) = c · v(β1 , β2 , . . . , βn )


Proof. Since B is a basis for V , we can write



α1 = a11 β1 + a12 β2 + · · · + a1n βn = Σj a1j βj
α2 = a21 β1 + a22 β2 + · · · + a2n βn = Σj a2j βj
············
αn = an1 β1 + an2 β2 + · · · + ann βn = Σj anj βj

for uniquely determined aij ∈ F . Then


   
v(α1 , α2 , . . . , αn ) = v(Σj a1j βj , Σj a2j βj , . . . , Σj anj βj )

Now v is a linear transformation when viewed as a function of each


variable independently so that
  
v(. . . , Σj aij βj , . . .) = Σj aij · v(. . . , βj , . . .)

Thus by expanding v(α1 , α2 , . . . , αn ) in this manner, we get a sum of


n terms from each variable and we therefore obtain

v(α1 , α2 , . . . , αn ) = Σ a1j1 a2j2 · · · anjn · v(βj1 , βj2 , . . . , βjn )
where the sum extends over all n-tuples (j1 , j2 , . . . , jn ), a sum of n^n terms.
Consider one such term
a1j1 a2j2 · · · anjn · v(βj1 , βj2 , . . . , βjn )
Since there are precisely n vectors βj and precisely n entries, we see
that either j1 , j2 , . . . , jn is a permutation of the numbers 1, 2, . . . , n or
else two of the ji s are equal. Of course, in the latter case we see that
v(βj1 , βj2 , . . . , βjn ) = 0 and the entire term vanishes. Thus, by deleting
these zero terms, we have

v(α1 , α2 , . . . , αn ) = Σσ a1σ(1) a2σ(2) · · · anσ(n) · v(βσ(1) , βσ(2) , . . . , βσ(n) )

where the sum is over all n! permutations σ of the set {1, 2, . . . , n}.
Finally, by the previous lemma
v(βσ(1) , βσ(2) , . . . , βσ(n) ) = ± v(β1 , β2 , . . . , βn )


where the ± sign is determined by σ. Thus we get


 
v(α1 , α2 , . . . , αn ) = (Σσ ± a1σ(1) a2σ(2) · · · anσ(n) ) · v(β1 , β2 , . . . , βn )
= c · v(β1 , β2 , . . . , βn )
where
c = Σσ ± a1σ(1) a2σ(2) · · · anσ(n)

depends only on A and B. The theorem is proved. 

As an immediate corollary we have


Corollary 14.1. Let B = {β1 , β2 , . . . , βn } be a basis for V and
let v be a volume function. Then v is uniquely determined by the value
v(β1 , β2 , . . . , βn ). In particular if v(β1 , β2 , . . . , βn ) = 0, then v is iden-
tically 0.

Problems
14.1. Verify that the function defined in Example 14.5 is a volume
function.

Let V be a 3-dimensional vector space over F with basis B =


{β1 , β2 , β3 }. Let α1 , α2 , α3 ∈ V and let v be a volume function.

14.2. Each of the following is equal to ± v(α1 , α2 , α3 ).


v(α1 , α2 , α3 ), v(α2 , α3 , α1 ), v(α3 , α1 , α2 )
v(α1 , α3 , α2 ), v(α2 , α1 , α3 ), v(α3 , α2 , α1 )
Find the correct sign for each.
14.3. Follow the proof of Theorem 14.1 in this special case and
compute c explicitly.
14.4. Let v′ be the function given by v′(α1 , α2 , α3 ) = c where c is
given as in the previous exercise. Prove that v′ is a volume function
with v′(β1 , β2 , β3 ) = 1.

Now let V be an n-dimensional vector space over R and let v be a


volume function. Suppose σ is a permutation of the set {1, 2, . . . , n}.


14.5. If ασ(1) , ασ(2) , . . . , ασ(n) can be reordered to α1 , α2 , . . . , αn in t


interchanges, show that
v(ασ(1) , ασ(2) , . . . , ασ(n) ) = (−1)^t · v(α1 , α2 , . . . , αn )

14.6. Suppose v exists with v(α1 , α2 , . . . , αn ) ≠ 0 for some vectors


α1 , α2 , . . . , αn ∈ V . Show that the ± sign in
v(ασ(1) , ασ(2) , . . . , ασ(n) ) = ± v(α1 , α2 , . . . , αn )
is uniquely determined. What can you conclude about the parity, that
is the oddness or evenness, of the number of interchanges required in
the reordering process?
14.7. Suppose n = 5. Find three different reordering processes for
each of v(α2 , α3 , α4 , α5 , α1 ) and v(α3 , α1 , α2 , α5 , α4 ). Count the number
of interchanges in each case and see whether the parity remains the
same.
14.8. Let V be a vector space over F with basis {β1 , β2 , . . . , βn }
and let v′ be a nonzero volume function. Show that, for each a ∈ F , V
has one and only one volume function v with v(β1 , β2 , . . . , βn ) = a.
14.9. Suppose Rn has an n-dimensional volume v. Explain why v
is unique up to a scalar factor.
14.10. Find the volumes of the geometric figures described in Ex-
amples 14.1 and 14.2.


15. Determinants
We seem to know everything about volume functions except whether
nonzero ones exist. In this section, we prove their existence. There are
several possible ways of doing this. One approach is to study the ±
sign that occurs in
v(ασ(1) , ασ(2) , . . . , ασ(n) ) = ± v(α1 , α2 , . . . , αn )
Indeed, once this sign is properly understood, we would have no dif-
ficulty in defining v. A second approach and the one we take here is
inductive.
Let α be an n × n matrix over F . Then each row of α is an n-tuple
of elements of F and hence can be thought of as being a vector in F n .
In this way, α is composed on n vectors of F n and thus any function
sending α to F can be thought of as a function of n elements of F n . We
can therefore translate the notion of a volume function to this situation
and we call the resulting function the determinant.
Formally the determinant is a map det : F n×n → F satisfying the
following three axioms.

D1. If we consider det α as a function of the ith row of α, for any


i, with all other rows kept fixed, then det is a linear transfor-
mation from F n to F .
D2. If two rows of α are identical, then det α = 0.
D3. det In = 1.

The first two above tell us, as we have already indicated, that det is a
volume function on F n . The third axiom is a normalization. It says
that the volume function evaluated at the basis {β1 , β2 , . . . , βn }, where
βi = (0, 0, . . . , 1, . . . , 0) has a 1 in the ith spot, is equal to 1. Thus, by
Corollary 14.1 we have
Theorem 15.1. If det : F n×n → F exists, then it is unique.
We now proceed to show that det does in fact exist by induction
on n. If α ∈ F n×n then we denote by αij the submatrix of α obtained
by deleting the ith row and jth column. Clearly αij ∈ F (n−1)×(n−1) .
Lemma 15.1. Let n > 1 and suppose that the determinant map
det : F (n−1)×(n−1) → F exists. Fix an integer s with 1 ≤ s ≤ n and
define a function ds : F n×n → F by
ds (α) = Σ_{i=1}^{n} (−1)^{i+s} ais det αis


where α = [aij ]. Then ds (α) is an n × n determinant map.

Proof. We obviously have to check that ds satisfies all three ax-


ioms above. This is not difficult, just tedious. We consider each axiom
in turn.
D1) Fix a row subscript k. We study ds as a function of the kth
row of α. Thus here all matrices of interest will have the same fixed
entries in all rows but the kth. We can indicate this symbolically by
⎡ ⎤
eij
α = ⎣akj ⎦
eij
In other words, all the ij-entries for i = k are eij and these are fixed.
The kj-entries are akj and we will allow these to vary.
Let b ∈ F and set
⎡ ⎤ ⎡ ⎤
eij eij
α = ⎣akj ⎦ , β = ⎣b·akj ⎦
eij eij
Then by definition

ds (β) = (−1)^{k+s} (b·aks ) det βks + Σ_{i≠k} (−1)^{i+s} eis det βis

Now for i ≠ k, the kth row is not deleted in βis . Thus αis and βis agree
in all rows but one and in that row all entries of βis are equal to b times
the corresponding entry in αis . By properties of det we therefore have
det βis = b(det αis ) for i = k. On the other hand, for i = k, the kth
row is deleted so βks = αks . This yields

ds (β) = (−1)^{k+s} (b·aks ) det αks + Σ_{i≠k} (−1)^{i+s} eis ·b(det αis )
= b[(−1)^{k+s} aks det αks + Σ_{i≠k} (−1)^{i+s} eis det αis ]
= b·ds (α)
Now let α, β and γ be the matrices that agree in all rows other than the
kth, with the fixed entries eij there, and whose kth rows are [akj ], [bkj ]
and [akj + bkj ], respectively.
Then

ds (γ) = (−1)^{k+s} (aks + bks ) det γks + Σ_{i≠k} (−1)^{i+s} eis det γis


Now for i ≠ k, the kth row is not deleted in γis . Clearly αis , βis and
γis agree in all the other rows and they add appropriately in this row.
Thus
det γis = det αis + det βis
for i ≠ k. On the other hand, for i = k we have γks = αks = βks . This
yields

ds (γ) = (−1)^{k+s} aks det αks + Σ_{i≠k} (−1)^{i+s} eis det αis
+ (−1)^{k+s} bks det βks + Σ_{i≠k} (−1)^{i+s} eis det βis
= ds (α) + ds (β)
and ds satisfies the first axiom.
D2) Let us suppose that the kth and ℓth rows of α are identical and
say ℓ > k. If α = [aij ], then
ds (α) = Σ_i (−1)^{i+s} ais det αis
Now if i ≠ k, ℓ, then neither row is deleted in αis . This means that
αis has two identical rows and det αis = 0. Therefore only the k and
ℓ terms occur in the above sum and since aks = aℓs , we have
ds (α) = (−1)^{k+s} aks [det αks + (−1)^{ℓ−k} det αℓs ]
How do αks and αℓs compare? In one case we deleted the kth
row and in the other the ℓth. But since these rows are identical, we
see that αks and αℓs have the same rows occurring but in a slightly
different order. In fact, we have pictorially
αks = [row 1; row 2; . . . ; row k (missing); . . . ; row ℓ; . . . ; row n]
αℓs = [row 1; row 2; . . . ; row k; . . . ; row ℓ (missing); . . . ; row n]
Since det is a volume function on F n−1 , interchanging rows merely
multiplies the value det by −1. Now we make αℓs identical to αks by
slowly lowering row k to the ℓth position. We do this by consecutively
interchanging the kth row with rows k + 1, k + 2, . . . , ℓ − 1.

αℓs = [ . . . ; row k; row k+1; row k+2; . . . ; row ℓ−2; row ℓ−1; row ℓ (missing); . . . ]

By doing it this way, we have not changed the ordering of the other
rows and yet we have shifted row k down to position ℓ. Moreover this
was achieved in ℓ − k − 1 interchanges (since, for example, if k = ℓ − 1
then no interchanges are needed). Thus we have
det αℓs = (−1)^{ℓ−k−1} det αks
and
ds (α) = (−1)^{k+s} aks [det αks + (−1)^{ℓ−k} (−1)^{ℓ−k−1} det αks ]
= (−1)^{k+s} aks [det αks − det αks ] = 0

as required.
D3) If In = α = [aij ] then

ds (In ) = Σ_i (−1)^{i+s} ais det αis
Now ais = 0 for i ≠ s and ass = 1, so


ds (In ) = (−1)s+s ass det αss = det αss
Since clearly αss = In−1 and det In−1 = 1, this axiom is satisfied. We
have therefore shown that the function ds satisfies all the appropriate
axioms and thus ds is a determinant function on F n×n . 
It is now an easy matter to prove that determinants exist.
Theorem 15.2. The determinant map det : F n×n → F exists for
all integers n. Moreover for n > 1 and for any integer s with 1 ≤ s ≤ n,
we have
det α = Σ_{i=1}^{n} (−1)^{i+s} ais det αis


where α = [aij ] ∈ F n×n . The latter is called the cofactor expansion of


the determinant with respect to the sth column.

Proof. We proceed by induction on n. If n = 1 then det[a] = a is


easily seen to satisfy all the appropriate axioms.
Now suppose det : F (n−1)×(n−1) → F exists. Fix s with 1 ≤ s ≤ n
and define ds : F n×n → F by
ds (α) = Σ_{i=1}^{n} (−1)^{i+s} ais det αis

where α = [aij ]. By the preceding lemma ds is a determinant function


and hence by the uniqueness Theorem 15.1, ds is the determinant func-
tion. Thus we have shown that det : F n×n → F exists and that det = ds
for any suitable integer s. The theorem now follows by induction. 
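The inductive construction in the proof translates directly into a short recursive routine. The following is an illustrative sketch in plain Python (the function name is ad hoc), expanding along the first column, so s = 1 in the notation of Theorem 15.2; it is written for clarity rather than speed.

def det(a):
    # a is a square matrix given as a list of rows
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for i in range(n):                                  # cofactor expansion down column 1
        minor = [row[1:] for k, row in enumerate(a) if k != i]
        total += (-1) ** i * a[i][0] * det(minor)       # (-1)**i matches (-1)**(i+s) for s = 1
    return total

print(det([[1, 2], [3, 4]]))                    # -2
print(det([[2, 0, 1], [3, 1, 2], [1, -1, 1]]))  # 2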
Corollary 15.1. Let V be an n-dimensional vector space over F
with basis {β1 , β2 , . . . , βn }. Define

v : V × V × · · · × V → F    (n factors)
by v(α1 , α2 , . . . , αn ) = det[aij ] where αi = Σj aij βj . Then v is a
nonzero volume function with v(β1 , β2 , . . . , βn ) = 1.
Proof. We must show that v satisfies all the axioms for a volume
function. Suppose first that αk is replaced by bαk . Then clearly the
kth row of [aij ] is multiplied by b ∈ F so (D1) yields
v(α1 , . . . , bαk , . . . , αn ) = b det[aij ] = b·v(α1 , . . . , αk , . . . , αn )

Secondly, consider v(α1 , . . . , αk + αk′ , . . . , αn ) where αk = Σj akj βj and
αk′ = Σj akj′ βj . Then αk + αk′ = Σj (akj + akj′ )βj . This means that the
corresponding matrices for the definition of v(α1 , . . . , αk , . . . , αn ) and
v(α1 , . . . , αk′ , . . . , αn ) agree except in the kth row and there they add
to give the kth row of the matrix for v(α1 , . . . , αk + αk′ , . . . , αn ). Thus
again by (D1) we have
v(α1 , . . . , αk + αk′ , . . . , αn ) = v(α1 , . . . , αk , . . . , αn ) + v(α1 , . . . , αk′ , . . . , αn )
Next, if the two vectors αr and αs are identical, then certainly the rth
and sth rows of [aij ] are identical so
v(α1 , . . . , αr , . . . , αs , . . . , αn ) = det[aij ] = 0
We have therefore shown that v is a volume function.


Finally
v(β1 , β2 , . . . , βn ) = det In = 1
so v is nonzero and the result follows. 
Corollary 15.2. Let S : V → V be a linear transformation and let
B and B′ be bases for V . Then S is singular if and only if det B′[S]B = 0.

Proof. Let the volume function v be defined as in the preceding


corollary using the basis B = {β1 , β2 , . . . , βn }. Let B′ = {β1′ , β2′ , . . . , βn′ }
and set αi = βi′S, the image of βi′ under S. If
βi′S = αi = Σj aij βj
then B′[S]B = [aij ] and
det B′[S]B = det[aij ] = v(α1 , α2 , . . . , αn )
Now (B′)S = {α1 , α2 , . . . , αn } spans im S. Thus S is singular if


and only if {α1 , α2 , . . . , αn } is linearly dependent and hence if and only
if v(α1 , α2 , . . . , αn ) = 0 since v is a nonzero volume function. This
completes the proof. 

If α = [aij ] is an n × n matrix, then we will use the notation


 
det α = det[aij ] = |aij | =
| a11 a12 · · · a1n |
| a21 a22 · · · a2n |
| . . . . . . . . . |
| an1 an2 · · · ann |

Now the proof of the existence of the determinant is actually construc-


tive. It really gives us a way of computing det for at least small values
of n. Note that the vertical lines here stand for determinant and not
absolute value.
Example 15.1. If n = 1, then |a11 | = a11 . Let n = 2. Then using
the cofactor expansion with respect to the first column, we have
 
| a11 a12 |
| a21 a22 | = a11 |a22 | − a21 |a12 | = a11 a22 − a12 a21

a not unexpected result.


Suppose n = 3 and again use the cofactor expansion down the first
column. Then
|aij | = a11 |a22 a23; a32 a33| − a21 |a12 a13; a32 a33| + a31 |a12 a13; a22 a23|
= a11 (a22 a33 − a32 a23 ) − a21 (a12 a33 − a32 a13 ) + a31 (a12 a23 − a22 a13 )
= a11 a22 a33 − a11 a23 a32 + a13 a21 a32 − a12 a21 a33 + a12 a23 a31 − a13 a22 a31
Finally, we have
Theorem 15.3. Let [aij ] ∈ F n×n . Then

det[aij ] = |aij | = Σ ± a1j1 a2j2 · · · anjn
where the sum is over all n-tuples (j1 , j2 , . . . , jn ) of distinct column sub-
scripts. Here the ± sign is uniquely determined and the main diagonal
term a11 a22 · · · ann occurs with a plus sign.

Proof. We proceed by induction on n. We have already seen in


the above example that the result is true for n = 1, 2, 3. Suppose that
the result holds for (n − 1) × (n − 1) matrices and let α = [aij ] ∈ F n×n .
Then by induction

det αk1 = Σ ± a1j1 · · · a(k−1),j(k−1) a(k+1),j(k+1) · · · anjn
where the sum is over all (n − 1)-tuples (j1 , . . . , j(k−1) , j(k+1) , . . . , jn )
of distinct integers between 2 and n. This follows because the row
subscripts of αk1 omit k and the column subscripts omit 1. Applying
the cofactor expansion of det α with respect to the first column, we
therefore get

det α = Σ_k (−1)^{k+1} ak1 · det αk1
= Σ_k (−1)^{k+1} ak1 (Σ ± a1j1 · · · a(k−1),j(k−1) a(k+1),j(k+1) · · · anjn )
= Σ ± a1j1 a2j2 · · · akjk · · · anjn
Finally the term a11 a22 · · · ann clearly comes from (−1)1+1 a11 times the
diagonal term a22 a33 · · · ann of det α11 and hence it occurs with a plus
sign. The result follows. 


Problems
15.1. Evaluate the determinant of α ∈ R4×4 where
    ⎡ 2   1   0  −1 ⎤
α = ⎢ −1  0   1   3 ⎥
    ⎢ 0   2   7   1 ⎥
    ⎣ 0   0  −1   2 ⎦
15.2. Let α ∈ R3×3 be given by
⎡ ⎤
3 −1 2
α=⎣ 0 2 4⎦
−1 0 1
Evaluate the determinants of α and of αT , the transpose of α. Compare
your results.
15.3. Let α, β ∈ R2×2 be given by
   
α = ⎡ 4 1 ⎤ ,   β = ⎡ 2 −1 ⎤
    ⎣ 1 2 ⎦         ⎣ 1  1 ⎦
Evaluate det α, det β, det(αβ) and det(βα). Compare your results.
15.4. Recall Example 13.6 where S : V → V is given by
         ⎡ a + 10    −9      −6   ⎤
A [S]A = ⎢  −32    a + 23    18   ⎥
         ⎣   61     −47    a − 35 ⎦
Find det A [S]A . For what values of a ∈ R is S singular?
15.5. Let α ∈ F 5×5 be given by
    ⎡ a11 a12 a13 a14 a15 ⎤
    ⎢  0  a22 a23 a24 a25 ⎥
α = ⎢  0   0  a33 a34 a35 ⎥
    ⎢  0   0   0  a44 a45 ⎥
    ⎣  0   0   0   0  a55 ⎦
Find det α.
15.6. Let β ∈ F n×n . Prove that β is nonsingular if and only if
det β ≠ 0.
15.7. Let α, β ∈ F n×n . Is the formula
det(α + β) = det α + det β
true in general?


Let V be a vector space over F of dimension n and let v be a


nonzero volume function. Let T : V → V be a linear transformation.

15.8. Define a function

vT : V × V × · · · × V → F    (n factors)
by
(vT )(α1 , α2 , . . . , αn ) = v(α1 T, α2 T, . . . , αn T )
Show that vT is a volume function.
15.9. If a ∈ F define av by
(av)(α1 , α2 , . . . , αn ) = a·(v(α1 , α2 , . . . , αn ))
Show that av is a volume function and that every volume function on
V is of this form for a unique a ∈ F .
15.10. From the above we see that there exists a unique c ∈ F with
vT = cv. Prove that c = 0 if and only if T is singular.


16. Consequences of Uniqueness


At this point we could use the existence of determinants to further
our study of eigenvalues. However, instead we will devote this section
to a consideration of certain additional properties of determinants. As
we will see, the theorems all follow beautifully from the uniqueness
aspect which we repeat below.
Lemma 16.1. Let d : F n×n → F satisfy
i. d is a linear transformation from F n to F when viewed as a
function of any particular row of the matrix.
ii. d(α) = 0 if two rows of α are identical.
Then for all α, we have
d(α) = c·(det α)
where c = d(In ).
Proof. Observe that the function d′ : F n×n → F given by d′(α) =
d(In )· det α satisfies (i ) and (ii ), and d(In ) = d′(In ). Thus d and d′
are volume functions on F n that agree on a basis. By the uniqueness
theorem, Corollary 14.1, d and d′ are identical.

Now we have treated det α as a function of the rows of α. How does


it behave as a column function? The answer is that it behaves in just
the same way. Observe that the columns are merely n-tuples on their
side, so we can also consider them as elements of F n .
Lemma 16.2. Let det : F n×n → F . Then det is a linear transforma-
tion from F n to F when viewed strictly as a function of one particular
column. Moreover, if two columns of α are identical, then det α = 0.
Proof. This all follows from the expansion of det by cofactors
down a particular column. Let us consider det α as a function of the
kth column. If α = [aij ] then
det α = Σ_{i=1}^{n} (−1)^{i+k} aik det αik

Observe that if we fix all the columns of α other than the kth, then αik
is kept fixed and the above formula is just
det α = Σ_{i=1}^{n} aik ci


for some constants ci ∈ F . Since the map F n → F given by


(a1k , a2k , . . . , ank ) → Σ_{i=1}^{n} aik ci

is clearly a linear functional, the first part of the lemma is proved.


We prove the second part by induction on n. The result is vacuously
true for n = 1. Now let n = 2. Then
 
| a11 a12 |
| a21 a22 | = a11 a22 − a12 a21

so if the two columns are identical, then a11 = a12 , a21 = a22 and
|aij | = 0. Finally, let n > 2 and suppose the result is true for n − 1.
Let α ∈ F n×n with say its rth and sth columns identical. Since n ≥ 3,
we can choose k ≠ r, s and then by expanding det α with respect to
the kth column we have
det α = Σ_{i=1}^{n} (−1)^{i+k} aik det αik

where α = [aij ]. Now in obtaining αik , we deleted neither the rth nor
the sth column. Thus αik has two columns identical and by induction
det αik = 0 for all i. Hence det α = 0 and the lemma is proved. 
Let α = [aij ] ∈ F n×n . Recall that the transpose αT of α is defined
to be the n × n matrix αT = [aij′ ] where aij′ = aji . In other words, αT
is obtained from α by reflecting it about the main diagonal. In this
process we see that rows and columns are interchanged. But one thing
that is not changed is the determinant.
Theorem 16.1. Let α ∈ F n×n . Then
det αT = det α

Proof. We define a function d : F n×n → F by d(α) = det αT and


we show that d is a determinant function. We first consider d as a
function strictly of the kth row of α. But this means that we are
studying det αT as a function of its kth column. Since we know by the
preceding lemma that det as a function of any particular column is a
linear transformation, we conclude that d satisfies (i ) of Lemma 16.1.
Suppose now that two rows of α are identical. Then two columns of
αT are the same and hence 0 = det αT = d(α) again by the preceding
lemma. Finally d(In ) = det InT = det In = 1 so by the uniqueness
Lemma 16.1, we have det αT = d(α) = det α and the result follows. 


Corollary 16.1. det α can be evaluated by cofactors with respect


to the rth row as follows. If α = [aij ] then
det α = Σ_{j=1}^{n} (−1)^{r+j} arj det αrj

Proof. Let αT = [aij′ ] so that aij′ = aji . Clearly (αT )ij = (αji )T so
using the rth column expansion of det αT we obtain
det α = det αT = Σ_{j=1}^{n} (−1)^{j+r} ajr′ det(αT )jr
= Σ_{j=1}^{n} (−1)^{r+j} arj det(αrj )T
= Σ_{j=1}^{n} (−1)^{r+j} arj det αrj
since det(αrj )T = det αrj .


Now it is not true that det(α + β) = det α + det β in general. For
example if    
1 0 0 0
α= , β=
0 0 0 1
then det α = det β = 0 but det(α + β) = det I2 = 1. On the other
hand, determinants do act well with respect to multiplication.
Theorem 16.2. Let α, β ∈ F n×n . Then
det αβ = (det α)(det β)
Proof. Fix β throughout this proof and define d : F n×n → F by
d(α) = det αβ. We observe now the expected properties of d. Note
first that the kth row of αβ depends on the kth row of α and all of β.
Thus if α and α agree on all rows but the kth, then so do αβ and α β.
Suppose α = [aij ], β = [bij ] and αβ = [cij ]. Then
ckj = Σ_{i=1}^{n} aki bij
so the map T from the kth row of α to the kth row of αβ is given by
(ak1 , ak2 , . . . , akn )T = (Σ_i aki bi1 , Σ_i aki bi2 , . . . , Σ_i aki bin )
This is clearly a linear transformation from F n to F n . Thus the map
d(α) viewed strictly as a function of the kth row of α is the composition


of T followed by det viewed as a function of the kth row of αβ. Since


the product of linear transformations is a linear transformation, we see
that d satisfies (i ) of Lemma 16.1.
Suppose now that two rows of α are identical. Then the correspond-
ing two rows of αβ will be identical so d(α) = det αβ = 0. Finally
d(In ) = det In β = det β so by the uniqueness Lemma 16.1 we get
det αβ = d(α) = (det α)·d(In ) = (det α)(det β)
and the theorem is proved. 
In the proof below, the commutativity of multiplication in F is
clearly crucial.
Corollary 16.2. Let α, β ∈ F n×n be similar matrices. Then
det α = det β.

Proof. By definition, there exists a nonsingular n × n matrix γ


with α = γ −1 βγ. Then
det α = det(γ −1 β)γ = (det γ −1 β)(det γ)
= (det γ)(det γ −1 β) = det γ(γ −1 β)
= det β
and the result follows. 
Finally we consider the determinant of certain block matrices. Sup-
pose α ∈ F n×n , β ∈ F m×m and δ ∈ F m×n . Then we can form the
(m + n) × (m + n) block matrix
⎡ ⎤
α 0
γ=⎣ ⎦
δ β

This is of course only a symbolic expression for the matrix. It means


for example that the upper left part of γ has entries identical to those
of α. Observe that the upper right part of γ has all zero entries. We
wish to compute det γ and we start with a special case.
Lemma 16.3. If
⎡ ⎤
In 0
γ=⎣ ⎦
δ β

then det γ = det β.


Proof. We proceed by induction on n. If n = 0, then In and δ do


not occur. Thus γ = β so certainly det γ = det β. Now suppose that
n ≥ 1 and compute det γ by its cofactor expansion with respect to the
first row. Since this row has only one nonzero entry, namely a 1 in the
1, 1-position, we get det γ = det γ11 . But
      ⎡ In−1  0 ⎤
γ11 = ⎣  δ∗   β ⎦

where δ ∗ denotes δ with its first column deleted. By induction, det γ11 =
det β, so the result follows. 
We can now prove
Theorem 16.3. If
⎡ ⎤
α 0
γ=⎣ ⎦
δ β

where α ∈ F n×n , β ∈ F m×m and δ ∈ F m×n , then


det γ = (det α)(det β)
Proof. Fix β and δ and define d : F n×n → F by d(α) = det γ
where γ is as above. Because of the presence of 0s in the upper right
part of γ, we see easily that d(α) is a linear transformation when viewed
as a function of any particular row of α. Secondly, if two rows of α
are identical, then the corresponding rows of γ are the same so d(α) =
det γ = 0. Thus by the uniqueness Lemma 16.1 and the preceding
lemma we have
det γ = d(α) = (det α)·d(In ) = (det α)(det β)
and the theorem is proved. 
The cofactor expansion is useful theoretically, but it is not a partic-
ularly efficient method computationally. Indeed it is better to use a
mixture of elementary row and column operations, along with these
expansions, to evaluate any fixed determinant. To start with, we have
Lemma 16.4. Let α ∈ F n×n and set β = E(α), where E is an
elementary row or column operation.
i. If E interchanges two distinct rows or two distinct columns,
then det β = − det α.


ii. If E multiplies a row or a column of α by c ∈ F , then det β =


c· det α.
iii. If E adds a scalar multiple of one row to another or a scalar
multiple of one column to another, then det β = det α.
Proof. We first quickly consider the row operations. Note that
Corollary 15.1 implies that the determinant map defines a volume func-
tion on the rows of α. Thus (i) and (iii) for row operations follow from
Lemma 14.1. Furthermore, (ii) is immediate since det α is a linear func-
tional on the individual rows of α. Of course, the analogous column
results follow from the fact that det α = det αT . 

It is convenient to view (ii) above as a common factor principle.


Namely, if all entries in a particular row or column of α are divisible by
c ∈ F , then det α is equal to c times the determinant of the modified
matrix with c factored out of that row or column. We now consider
two examples, one rather trivial and the other rather amazing.
Example 16.1. Let α be the 3 × 3 matrix
⎡ ⎤
0 3 2
α = ⎣3 6 9⎦
2 5 9
with integer entries. To evaluate det α, we first factor 3 out of the
second row so that det α = 3· det β where
⎡ ⎤
0 3 2
β = ⎣ 1 2 3⎦
2 5 9
Next, we subtract twice the second row of β from the third, so that
det β = det γ, where
⎡ ⎤
0 3 2
γ = ⎣ 1 2 3⎦
0 1 3
Finally, we evaluate det γ by cofactors with respect to the first column
to obtain
 
det α = 3· det β = 3· det γ = 3·(−1)^{1+2} ·1· |3 2; 1 3|
= 3·(−1)·7 = −21
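The same strategy of row reducing to triangular form while keeping track of the factors from Lemma 16.4 is easy to automate. Here is an illustrative sketch in plain Python, using exact rational arithmetic from the standard fractions module; the function name is ad hoc.

from fractions import Fraction

def det_by_elimination(rows):
    a = [[Fraction(x) for x in row] for row in rows]
    n, sign, det = len(a), 1, Fraction(1)
    for j in range(n):
        p = next((i for i in range(j, n) if a[i][j] != 0), None)
        if p is None:                      # a zero column below the diagonal: det vanishes
            return Fraction(0)
        if p != j:                         # row interchange changes the sign, Lemma 16.4(i)
            a[j], a[p] = a[p], a[j]
            sign = -sign
        det *= a[j][j]                     # each pivot contributes a diagonal factor
        for i in range(j + 1, n):          # clearing below the pivot leaves det unchanged, (iii)
            m = a[i][j] / a[j][j]
            a[i] = [x - m * y for x, y in zip(a[i], a[j])]
    return sign * det

print(det_by_elimination([[0, 3, 2], [3, 6, 9], [2, 5, 9]]))   # -21, the matrix of Example 16.1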
Now for our more serious example.


Example 16.2. The n × n Hilbert Matrix An = [aij ] with rational


entries
aij = 1/(i + j − 1)

occurs in the study of orthogonal polynomials defined on the unit in-


terval of the real line. One can evaluate det An using the theory of
such polynomials or by elementary row and column operations. We
of course take the latter approach and obtain the rather remarkable
formula
det An = (n − 1)!!^4 / (2n − 1)!!
where as usual m! = ∏_{i=1}^{m} i, and where we set m!! = ∏_{i=1}^{m} i!. We
proceed by induction on n and note that this formula holds for n = 1
since, by convention, 0! = 1. As we see, the computations involved are
simple, but a bit tedious.
Now let n > 1 and we apply elementary row operations to put 0
entries in the first n − 1 spots of the last column. Specifically, since
ann = 1/(2n − 1), we subtract (2n − 1)/(n + i − 1) times the nth row
from the ith for i = 1, 2, . . . , n − 1. This of course does not change the
determinant, by Lemma 16.4(iii), but it does transform the matrix An
to the block matrix

⎡ ⎤
Bn−1 0
Bn = ⎣ ⎦
C ann

where Bn−1 = [bij ] and

bij = aij − [(2n − 1)/(n + i − 1)]·anj

Thus, by the cofactor expansion with respect to the last column of Bn ,
we have

det An = det Bn = (−1)^{n+n} ann · det Bn−1 = [1/(2n − 1)]· det Bn−1


We consider the submatrix Bn−1 . Here


bij = aij − [(2n − 1)/(n + i − 1)]·anj
= 1/(i + j − 1) − [(2n − 1)/(n + i − 1)]·[1/(n + j − 1)]
= [(n + i − 1)(n + j − 1) − (2n − 1)(i + j − 1)] / [(n + i − 1)(n + j − 1)(i + j − 1)]
= [n² − n(i + j) + ij] / [(n + i − 1)(n + j − 1)(i + j − 1)]
= [(n − i)/(n + i − 1)]·[(n − j)/(n + j − 1)]·aij
In particular, if we factor (n − i)/(n + i − 1) out of the ith row of Bn−1
for all i and (n − j)/(n + j − 1) out of the jth column of Bn−1 for all
j, we obtain the matrix An−1 . Thus
det Bn−1 = [∏_{i=1}^{n−1} (n − i)/(n + i − 1)]·[∏_{j=1}^{n−1} (n − j)/(n + j − 1)]· det An−1
Now
∏_{i=1}^{n−1} (n − i) = (n − 1)!
and
∏_{i=1}^{n−1} (n + i − 1) = (2n − 2)!/(n − 1)!
so
det An = [1/(2n − 1)]· det Bn−1
= [1/(2n − 1)]·[(n − 1)!^4/(2n − 2)!^2]· det An−1
= [(n − 1)!^4/((2n − 1)!(2n − 2)!)]· det An−1
Thus, by induction,
det An = [(n − 1)!^4/((2n − 1)!(2n − 2)!)]·[(n − 2)!!^4/(2n − 3)!!]
= (n − 1)!!^4/(2n − 1)!!
as required.
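The formula can be confirmed for small n with exact rational arithmetic. The following sketch assumes the sympy library; the helper name superfact is ad hoc and implements the m!! of this example.

from sympy import Matrix, Rational, factorial

def superfact(m):                    # m!! = 1! 2! ... m! in the notation of this example
    p = 1
    for i in range(1, m + 1):
        p *= factorial(i)
    return p

for n in range(1, 6):
    A = Matrix(n, n, lambda i, j: Rational(1, i + j + 1))   # 0-based indices, so the entry is 1/(i + j - 1)
    formula = superfact(n - 1) ** 4 / superfact(2 * n - 1)
    print(n, A.det(), A.det() == formula)                   # True in each case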


Problems
16.1. Show that the 2 × 2 determinant
 
| a11 a12 |
| a21 a22 |

is given by the product a11 a22 , corresponding to the line slanting down
and to the right, minus the product a21 a12 , corresponding to the line
slanting up and to the right.
16.2. We learn this trick in calculus class for evaluating the 3 × 3
determinant |aij |. Copy the first two columns of the matrix to the right
hand side, as indicated, and then draw the six slanting lines.
a11 a12 a13 a11 a12
a21 a22 a23 a21 a22
a31 a32 a33 a31 a32

Show that |aij | is equal to the sum of the three products corresponding
to the lines slanting down and to the right, minus the three products
corresponding to the lines slanting up and to the right.
16.3. Can the above trick work in general for n × n determinants
with n ≥ 4 ? For this, consider Theorem 15.3.
16.4. If a, b, c ∈ F , show that
 
|1 1; a b| = b − a,   and   |1 1 1; a b c; a² b² c²| = (c − a)(c − b)(b − a)
Why do you think these right hand factors occur?
16.5. Let α, β ∈ F 2×2 with
   
α = ⎡ a b ⎤ ,   β = ⎡ e f ⎤
    ⎣ c d ⎦         ⎣ g h ⎦
Compute αβ and then det αβ. Prove directly that det αβ factors as
the product (det α)(det β).
16.6. Let α and β be n×n matrices with integer entries and assume
that αβ = diag(1, 2, . . . , n). Show that det α is an integer dividing n!.
16.7. Compute the determinant
 
|  1  1   2  3 |
|  0  2   4  5 |
| −1  3  −2  1 |
|  2  4   1  9 |


16.8. Let α = [aij ] ∈ F 3×3 and let b1 , b2 , b3 ∈ F . Then the sum


(−1)^{1+2} b1 · det α12 + (−1)^{2+2} b2 · det α22 + (−1)^{3+2} b3 · det α32
is the cofactor expansion with respect to the second column of some
3 × 3 matrix β. Describe β and determine the sum when b1 = a11 ,
b2 = a21 and b3 = a31 .
As in Example 16.2, let An be the n × n Hilbert Matrix and write
det An = 1/an .
 
16.9. Use the fact that the binomial coefficient (m choose i) is an integer,
for all m and i, to prove that an is always an integer.

16.10. Verify the inequality


 
2^{2m} ≥ (2m choose m) ≥ 2^{2m}/(2m + 1)
for all m ≥ 0, and use this to estimate the size of the integer an . It
clearly grows very quickly.


17. Adjoints and Inverses


Let α ∈ F n×n with n ≥ 2. Since the subdeterminants det αij are
double subscripted, it is tempting to place them inside an n×n matrix.
To be more precise, we form the matrix of cofactors [cij ] where cij =
(−1)i+j det αij and then we form the adjoint of α given by
adj α = [aij′ ] = [cij ]T
In other words, the entries of adj α satisfy aij′ = cji = (−1)^{i+j} det αji .
Now it turns out that the row and column cofactor expansions for
suitable determinants translate to yield the following remarkable result.

Theorem 17.1. If α ∈ F n×n with n ≥ 2, then


α·(adj α) = (adj α)·α = (det α)·In
Proof. Let α = [aij ], adj α = [aij′ ] and set (adj α)·α = [dsr ]. Then
for fixed subscripts r, s we have
dsr = Σ_{i=1}^{n} asi′ air = Σ_{i=1}^{n} (−1)^{i+s} air det αis
Now this looks like the cofactor expansion down the sth column for
the determinant of some matrix β. But what is this matrix? The
answer is that β = [bij ] is the matrix obtained from α by replacing
the sth column of α with the rth column of α. Then β and α agree
on all columns but the sth, so βis = αis . Furthermore, bis = air so
Theorem 15.2 implies that
dsr = Σ_{i=1}^{n} (−1)^{i+s} bis det βis = det β
Now if r = s, then β = α and drr = det α. On the other hand, if
r ≠ s, then the rth and sth columns of β are identical, so det β = 0
and dsr = 0. It follows that (adj α)·α = [dsr ] = (det α)·In , as required.
The proof that α·(adj α) = (det α)·In is similar but depends upon
the row cofactor expansions of certain determinants as described in
Corollary 16.1. 
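The construction of adj α and the identity of Theorem 17.1 can be traced step by step in a short sketch, assuming the sympy library (its minor_submatrix deletes one row and one column, exactly as in the definition of αij).

from sympy import Matrix, eye

A = Matrix([[1, 1, 0],
            [2, 3, 1],
            [1, 2, 3]])

n = A.rows
C = Matrix(n, n, lambda i, j: (-1) ** (i + j) * A.minor_submatrix(i, j).det())
adjA = C.T                                   # the adjoint is the transpose of the cofactor matrix

print(A * adjA == A.det() * eye(n))          # True
print(adjA * A == A.det() * eye(n))          # True
print(adjA / A.det() == A.inv())             # True: adj A divided by det A is the inverse of A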
Of course, adj α makes no sense for n = 1, but we could set adj α =
In in this situation. This would salvage the above result in the trivial
n = 1 case, but there is really nothing to be gained by doing so.
As a consequence of the preceding theorem, we obtain a concrete
description of matrix inverses. Furthermore, we show that if a right
or left inverse exists, then a two-sided inverse exists and all of these
inverses are equal.


Corollary 17.1. Let α ∈ F n×n with n ≥ 2. The following are


equivalent.
i. There exists β ∈ F n×n with αβ = In .
ii. There exists γ ∈ F n×n with γα = In .
iii. det α ≠ 0.
iv. There exists α−1 ∈ F n×n with αα−1 = α−1 α = In .
Furthermore, when these occur then
β = γ = α−1 = [1/(det α)] · adj α

Proof. If β exists, then (det α)(det β) = det In = 1 and hence


det α ≠ 0. Thus (i ) implies (iii ), and similarly (ii ) implies (iii ). Next,
if det α ≠ 0, we can set
α−1 = [1/(det α)] · adj α
Then Theorem 17.1 implies that αα−1 = α−1 α = In and (iv ) holds. Of
course, (iv ) implies (i ) and (ii ).
Finally, if β is a right or two-sided inverse of α, then
β = In β = (α−1 α)β = α−1 (αβ) = α−1 In = α−1
Similarly, if γ exists, then γ = α−1 , as required. 
For convenience, we add a few more items to the above list of equiv-
alent statements. Recall that the rank of a matrix is the dimension of
its row space and also the dimension of its column space.
Lemma 17.1. Let α ∈ F n×n and let α′ be the unique reduced row
echelon matrix that is row equivalent to α. Then α′ = In if and only if
det α ≠ 0 if and only if rank α = n.
Proof. It follows from Theorem 6.1 that rank α = rank α′ and that
this rank is equal to the number of nonzero rows of α′. Furthermore,
it is an easy consequence of Lemma 16.4 that det α = 0 if and only if
det α′ = 0. Thus we can now assume that α = α′ is in reduced row
echelon form.
If α = In , then certainly rank α = n and det α = 1 ≠ 0. Conversely,
if either rank α = n or det α ≠ 0, then the last row of α cannot be zero.
It follows that all rows of α are nonzero and therefore α has precisely
n leading 1s. Since these leading 1s slope down and to the right and
since α has n columns, we see that the leading 1s must fill the main
diagonal and hence α = In . 


Since (α−1 )−1 = α, one might guess that adj(adj α) is equal to α


up to a scalar factor. However this is not quite the case since adj α can
be the zero matrix even when α = 0. Indeed, we have
Lemma 17.2. Let α ∈ F n×n with n ≥ 2. Then adj α ≠ 0 if and
only if rank α ≥ n − 1.

Proof. Suppose first that rank α ≥ n − 1. We show that some


entry of adj α is not 0. From the fact that the n rows of α span the
row space, we see that some n − 1 of these rows are part of a basis
for the row space. Hence at least n − 1 of the rows of α are linearly
independent, say these are the first n − 1 rows.
Now let β be the (n − 1) × n matrix obtained from α by deleting
the last row. Since the n − 1 rows of β are linearly independent, we
see that β has rank n − 1. Hence the column space of β has dimension
n − 1, and there are n − 1 columns that form a basis for this space,
say these are the first n − 1 columns. Since αnn is obtained from β by
deleting the last column, it follows that the columns of αnn are linearly
independent. Hence αnn ∈ F (n−1)×(n−1) has rank n−1, and we conclude
from Lemma 17.1 that cnn = det αnn ≠ 0. Thus adj α ≠ 0.
Conversely, suppose rank α ≤ n − 2 and consider the submatrix
αrs . First, let β be the submatrix of α obtained by deleting its rth
row. Since rank α ≤ n − 2 and β has n − 1 rows of α, we see that the
rows of β cannot be linearly independent. Thus rank β ≤ n − 2, so the
column space of β has dimension ≤ n − 2. Since αrs is obtained from
β by deleting its sth column, it follows that the columns of αrs cannot
be linearly independent. Hence αrs ∈ F (n−1)×(n−1) has rank < n − 1, so
Lemma 17.1 implies that det αrs = 0. Thus adj α = 0, as required. 
As will be apparent, we can use matrix inverses to solve certain
systems of linear equations. To be precise, let
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
·········
an1 x1 + an2 x2 + · · · + ann xn = bn
be a system of n linear equations in the n unknowns x1 , x2 , . . . , xn . As
usual, let A = [aij ] ∈ F n×n be the matrix of coefficients, and let
    ⎡ x1  ⎤         ⎡ b1  ⎤
X = ⎢ x2  ⎥ ,   B = ⎢ b2  ⎥
    ⎢ ... ⎥         ⎢ ... ⎥
    ⎣ xn  ⎦         ⎣ bn  ⎦


be the matrices of unknowns and constants, respectively, with these


matrices both in F n×1 . Then the system of equations is clearly equiv-
alent to the matrix equation
AX = B

Now if A is nonsingular, that is if the two-sided inverse A−1 exists,


then AX = B implies that
X = In X = (A−1 A)X = A−1 (AX) = A−1 B
so X can only equal A−1 B. Conversely, if X = A−1 B, then
AX = A(A−1 B) = (AA−1 )B = In B = B
and X is indeed a solution. Notice that we use both sides of the inverse
condition A−1 A = AA−1 = In in the above. Indeed, one side yields
existence and the other uniqueness.
Now we can use the formula for A−1 given in Corollary 17.1 along
with certain cofactor expansions to obtain the following result, known
as Cramer’s Rule. However, we offer a different proof.
Theorem 17.2. Consider the system of n linear equations in the n
unknowns x1 , x2 , . . . , xn described by the matrix equation AX = B. If
A is a nonsingular matrix, then
det Bi
xi = , for i = 1, 2, . . . , n
det A
where Bi is the matrix obtained from A by replacing its ith column by
the column of B.
Proof. Since A is nonsingular, we know that X ∈ F n×1 is uniquely
determined by this system of equations. Furthermore, det A ≠ 0. Let
x1 , x2 , . . . , xn ∈ F be the solution set and consider xi . Then xi · det A =
det Ai , where Ai is obtained from A by multiplying its ith column by
xi . Next, for all k = i, we add xk times the kth column of Ai to the ith
column. This does not change the determinant, but it does transform
the matrix Ai into the matrix Bi . Indeed, the jth entry of the ith
column of this new matrix is

aji xi + Σ_{k≠i} ajk xk = bj

Thus xi · det A = det Ai = det Bi and the lemma is proved since we


know that det A = 0. 


This result is of interest for two reasons. First, if A and B are


integer matrices, then Cramer’s Rule gives us an idea of the possible
denominators that can show up in solutions. Second, unlike the usual
approach to solving linear equations, we can find one unknown here,
without having to find them all. A simple example is as follows.
Example 17.1. Consider the system of 3 linear equations in the 3
unknowns x1 , x2 , x3 with corresponding matrices
⎡ ⎤ ⎡ ⎤
1 1 0 2
A = ⎣ 2 3 1⎦ , B = ⎣8⎦
1 2 3 9
Then det A = 2 and
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 1 0 1 2 0 1 1 2
B 1 = ⎣ 8 3 1⎦ , B 2 = ⎣ 2 8 1⎦ , B 3 = ⎣ 2 3 8⎦
9 2 3 1 9 3 1 2 9
Thus we have x1 = det B1 / det A = −1/2, x2 = det B2 / det A = 5/2
and x3 = det B3 / det A = 3/2.
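The computation in Example 17.1 is easy to reproduce. Here is a minimal sketch in plain Python; the helper det3 is an ad hoc 3 × 3 determinant and fractions supplies exact arithmetic.

from fractions import Fraction

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, 1, 0], [2, 3, 1], [1, 2, 3]]
B = [2, 8, 9]

dA = det3(A)                                                            # 2
for i in range(3):
    Bi = [row[:i] + [B[k]] + row[i + 1:] for k, row in enumerate(A)]    # replace the ith column by B
    print(Fraction(det3(Bi), dA))                                       # -1/2, 5/2, 3/2 in turn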
Now it is tedious enough finding one n × n determinant, so evaluat-
ing n2 determinants is just too much to ask for. Thus we really cannot
use the adjoint formula to compute the inverse of a large square matrix.
Fortunately, there is a simple computational technique that does the
job rather efficiently.
Let A ∈ F n×n and form the n × (2n) augmented matrix A|In . If A
is nonsingular, then A is row equivalent to the identity matrix In , and
using the same operations we see that A|In is row equivalent to In |B for
some matrix B ∈ F n×n . Furthermore, In |B is in reduced row echelon
form, so In |B and hence B are uniquely determined by this process.
Indeed, it turns out that B = A−1 . We consider some examples.
Example 17.2. Let
⎡ ⎤
1 0 1
A = ⎣ 1 1 2⎦
2 1 3
Then the augmented matrix is given by
⎡ ⎤
1 0 1 1 0 0
A|I3 = ⎣1 1 2 0 1 0⎦
2 1 3 0 0 1
and we now apply elementary row operations. To start with, subtract
the first row from the second and subtract twice the first row from the


third. This yields


⎡ ⎤
1 0 1 1 0 0
⎣0 1 1 −1 1 0⎦
0 1 1 −2 0 1

Next, subtracting row 2 from row 3, we get


⎡ ⎤
1 0 1 1 0 0
⎣0 1 1 −1 1 0⎦
0 0 0 −1 −1 1

and this makes no sense. The three zeroes in the last row tell us that
A must have been a singular matrix, so A−1 does not exist.
Example 17.3. Let us start again, this time with
⎡ ⎤
1 0 1

A= 1 1 2⎦
2 1 4
so that
⎡ ⎤
1 0 1 1 0 0
A|I3 = ⎣1 1 2 0 1 0⎦
2 1 4 0 0 1

Again, we subtract the first row from the second and subtract twice
the first row from the third. This yields
⎡ ⎤
1 0 1 1 0 0
⎣0 1 1 −1 1 0⎦
0 1 2 −2 0 1
Next, subtracting row 2 from row 3, gives us
⎡ ⎤
1 0 1 1 0 0
⎣0 1 1 −1 1 0⎦
0 0 1 −1 −1 1

and finally subtracting row 3 from rows 1 and 2 yields


⎡ ⎤
1 0 0 2 1 −1
⎣0 1 0 0 2 −1⎦ = I3 |B
0 0 1 −1 −1 1


Thus ⎡ ⎤
2 1 −1
B=⎣ 0 2 −1⎦
−1 −1 1
must equal A−1 and we can check this by verifying that AB = BA = I3 .
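The row reduction of A|In can of course be mechanized as well. The sketch below (plain Python with exact fractions; the function name is ad hoc) carries the right half of the augmented matrix along with every operation and reports failure when a zero column signals that A is singular.

from fractions import Fraction

def invert(rows):
    n = len(rows)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(rows)]            # the augmented matrix A | I
    for j in range(n):
        p = next((i for i in range(j, n) if a[i][j] != 0), None)
        if p is None:
            return None                            # A is singular
        a[j], a[p] = a[p], a[j]                    # move a nonzero pivot into place
        piv = a[j][j]
        a[j] = [x / piv for x in a[j]]             # scale the pivot row to get a leading 1
        for i in range(n):
            if i != j and a[i][j] != 0:            # clear the rest of column j
                m = a[i][j]
                a[i] = [x - m * y for x, y in zip(a[i], a[j])]
    return [row[n:] for row in a]                  # the right half is the inverse

print(invert([[1, 0, 1], [1, 1, 2], [2, 1, 3]]))   # None: the singular matrix of Example 17.2
print(invert([[1, 0, 1], [1, 1, 2], [2, 1, 4]]))   # the inverse computed in Example 17.3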
We state this procedure as a formal lemma and offer a proof.
Lemma 17.3. If A ∈ F n×n is nonsingular, then the augmented
matrix A|In is row equivalent to the reduced row echelon matrix In |A−1 .

First Proof. To avoid the subscript on the identity matrix, let us


write In = J. We consider the matrix equation AX = J, where X has
size n × n. A solution here is a right inverse for A and hence the unique
two-sided inverse A−1 by Corollary 17.1. Notice that the jth column
of X, namely the column matrix Xj , determines the jth column of the
product AX = J, namely the column matrix Jj . Thus AX = J is
equivalent to the n matrix equations AXj = Jj for j = 1, 2, . . . , n.
Now, for each j, we solve AXj = Jj by using the techniques of
Section 7. Namely, we first form the augmented matrix A|Jj and then
put this into reduced row echelon form. Since A is row equivalent
to In = J, the same elementary row operations will transform A|Jj
to J|Bj for some column matrix Bj and this system of equations has
the same solution set as the original by Lemma 7.1. Writing this new
system as a matrix product yields Bj = JXj = Xj .
Of course, we can solve all n systems simultaneously by forming the
extended augmentation matrix
A|J1 |J2 | · · · |Jn = A|J
and applying the same row operations that transform A into J. We
see that A|J is transformed to
J|B1 |B2 | · · · |Bn = J|B
where Bj is the jth column of B ∈ F n×n . But Bj = Xj for all j, so
B = X = A−1 , as required.
Finally, note that J|B is in reduced row echelon form, so it is the
unique reduced row echelon form matrix obtainable from A|J. Hence it
is independent of the particular sequence of elementary row operations
that are used. 
Notice that the proof above uses a right inverse for A. There is
another proof, interesting in its own right, that uses a left inverse for
A. To start with, we have a result we have seen before.


Lemma 17.4. Let A ∈ F m×n and B ∈ F n×r , so that AB ∈ F m×r .


If E is an elementary row operation, then E(AB) = E(A)·B. In par-
ticular, E(B) = E(In )·B.

Proof. As we know, the rth row of AB is determined by the rth


row of A and all entries of B. In particular, if we interchange two
rows of A, then the two corresponding rows of AB are interchanged.
Furthermore, if we multiply the rth row of A by c ∈ F , then the rth
row of AB is also multiplied by c. The third elementary row operation
can also be visualized, but we offer a more concrete argument. Let
A = [aij ], B = [bij ] and AB = [cij ]. If we add x times the rth row of A
to the sth row, then asj becomes a′sj = asj + x·arj . Thus csj becomes

    c′sj = Σk a′sk bkj = Σk (ask + x·ark )bkj
         = Σk ask bkj + x·Σk ark bkj = csj + x·crj
and this says that the new sth row of AB is the old sth row plus x
times the rth row.
Finally, since B = In B, we see that E(B) = E(In B) = E(In )B and
the lemma is proved. 
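The identity E(B) = E(In )·B can be tested directly on small examples. Here is a minimal sketch for the third type of row operation, assuming SymPy; the helper name add_row is ours, not a library function.

    from sympy import Matrix, eye

    B = Matrix([[1, 2], [3, 4], [5, 6]])

    def add_row(M, r, s, x):
        # the elementary row operation: add x times row r to row s
        N = M.copy()
        N[s, :] = N[s, :] + x * N[r, :]
        return N

    E = add_row(eye(3), 0, 2, 5)            # the operation applied to I3
    print(add_row(B, 0, 2, 5) == E * B)     # True, as Lemma 17.4 predicts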
The various E(In ) above are called elementary matrices. As a con-
sequence, we have the following second proof of Lemma 17.3.
Second Proof. Since A ∈ F n×n is nonsingular, it is row equiv-
alent to In . Thus there exists a sequence E1 , E2 , . . . , Ek of elementary
row operations that transform A to In . In particular, if Ei = Ei (In ) denotes
the corresponding elementary matrix, then the preceding lemma implies that
Ek · · · E2 E1 ·A = Ek · · · E2 E1 (A) = In
and thus Ek · · · E2 E1 is a left inverse for A. By Corollary 17.1, it is the
unique two-sided inverse, namely A−1 . But again, by Lemma 17.4,
Ek · · · E2 E1 = Ek · · · E2 E1 ·In = Ek · · · E2 E1 (In )
In other words, the elementary row operations that transform A into
In , also transform In into A−1 and this is precisely what the process
of Lemma 17.3 asserts. Indeed by considering the augmented matrix
A|In , we apply the same row operations to A and to In , in the same
order, so when A|In becomes In |B, we must have B = A−1 . 


Problems
17.1. Let α ∈ F n×n with n ≥ 2. If det α = 0, show that det(adj α) =
0. For this, note that if det(adj α) ≠ 0, then adj α is invertible, and
then (adj α)α = 0 implies that α = 0. As a consequence, conclude that
for all α, we have det(adj α) = (det α)n−1 .
17.2. If α, β are nonsingular matrices in F n×n , show that (αβ)−1 =
β −1 α−1 and then that adj(αβ) = (adj β)(adj α).
17.3. For  
a b
α=
c d
find adj α and adj(adj α). If α is nonsingular, find α−1 .
17.4. If α ∈ F n×n has rank n − 1, use Problem 12.3 to prove that
adj α has rank 1. In particular, if n ≥ 3, conclude that adj(adj α) = 0.
17.5. Compute the adjoint of the 3 × 3 real matrix
⎡ ⎤
1 1 0
⎣2 3 −1⎦
1 4 −1
Find the inverse of this matrix using Corollary 17.1 and also by using
the technique of Lemma 17.3.
17.6. Solve the system of real linear equations
x1 + x2 + 2x3 = 4
x1 + 2x2 + 2x3 = 9
2x1 + 3x2 + 7x3 = 7
using Cramer’s Rule.
17.7. Prove Cramer’s Rule using the inverse of the matrix A given
by Corollary 17.1 and suitable column cofactor expansions.
17.8. State and prove the column analog of Lemma 17.4. Show that
the associative law of matrix multiplication implies that any elementary
row operation commutes with any elementary column operation. For
this, see Problem 6.3.
17.9. Show that any elementary matrix obtained from an elemen-
tary column operation applied to In is equal to an elementary matrix
obtained from an elementary row operation applied to In .
17.10. Prove that elementary matrices are all nonsingular with ele-
mentary inverses, and that any nonsingular square matrix is a product
of elementary matrices.


18. The Characteristic Polynomial


Now that we know determinants exist, it is a simple matter to solve
the eigenvalue problem. Let T : V → V be a linear transformation. If
a ∈ F is an eigenvalue for T , then the linear transformation S = aI −T
is singular and thus for any basis B of V we have
det B [aI − T ]B = det B [S]B = 0
Therefore what we want to study is det B [aI −T ]B . Now this is clearly a
function of a ∈ F and the easiest way to indicate this fact is to replace a
by some variable like x. But then, what does xI mean? Unfortunately
it doesn’t mean anything. We avoid this difficulty by first considering
matrices.
Let F be a field and, as in Example 2.5, let F [x] be the ring of poly-
nomials over F in the variable x. Then F [x] embeds in a certain field
extension F (x) of F , the so-called field of rational functions. Specifi-
cally, this is the set of all quotients of polynomials with coefficients in
F . More precisely, F (x) is the set of equivalence classes of such frac-
tions. It bears the same relationship to F [x] as the rational field Q does
to the ring of integers Z. Note that F ⊆ F [x] ⊆ F (x) with F being
the set of constant polynomials. With this, we have F n×n ⊆ F (x)n×n .
In other words, F n×n is the set of all n × n matrices over F (x) all of
whose entries lie in F .
Now F is a subfield of F (x). This means that when we add or
multiply two elements of F , it does not matter whether we view these
elements as belonging to F or to F (x). Certainly this property carries
over to say that if we add or multiply two matrices in F n×n , it does
not matter whether we view these as belonging to F n×n or F (x)n×n .
Moreover all our previous results on matrices and determinants hold
for F (x)n×n , since after all they were proved for an arbitrary field.
Let α ∈ F n×n . We define its characteristic polynomial ϕα (x) to be
ϕα (x) = det(xIn − α)
where xIn − α is viewed as a matrix in F (x)n×n . The following lemma
justifies the name. Note that a polynomial is said to be monic if its
leading coefficient is equal to 1.
Lemma 18.1. If α ∈ F n×n , then ϕα (x) is a monic polynomial of
degree n.
Proof. If α = [aij ], then xIn − α = [bij ] where bij = −aij for i ≠ j
and bii = x − aii . By Theorem 15.3, we have

ϕα (x) = |bij | = Σ ±b1j1 b2j2 · · · bnjn


where the sum is over all n-tuples (j1 , j2 , . . . , jn ) of distinct column


subscripts. Since each bij is a polynomial in x of degree ≤ 1, we see
immediately that ϕα (x) ∈ F [x] and has degree ≤ n. Now the only way
we can get a term of degree n in this sum is for each bij to have degree
1 and this occurs only for the term
+b11 b22 · · · bnn = (x − a11 )(x − a22 ) · · · (x − ann )
= xn + smaller degree terms
Thus ϕα (x) has degree n with leading coefficient equal to 1. 
In order to define the characteristic polynomial of a linear transfor-
mation, we need the following.
Lemma 18.2. Let α, β ∈ F n×n be similar matrices. Then ϕα (x) =
ϕβ (x).
Proof. By definition, there exists a nonsingular matrix γ ∈ F n×n
with γ −1 βγ = α. Let us work in F (x)n×n . Since it is easy to see that
(xIn )γ = γ(xIn ) = xγ, we have
γ −1 (xIn − β)γ = γ −1 (xIn )γ − γ −1 βγ
= γ −1 γ(xIn ) − γ −1 βγ = xIn − α
and thus xIn − β and xIn − α are similar matrices in F (x)n×n . By
Corollary 16.2, we conclude that
ϕα (x) = det(xIn − α) = det(xIn − β) = ϕβ (x)
and the lemma is proved. 
Let V be an n-dimensional vector space over F and let T : V → V
be a linear transformation. We define the characteristic polynomial
ϕT (x) of T to be
ϕT (x) = ϕα (x)
where α = A [T ]A for some basis A of V . Observe that if B is a sec-
ond basis and if β = B [T ]B , then α and β are similar matrices by
Lemma 13.1(i). Now by the above, such matrices have the same char-
acteristic polynomial, and thus we see that ϕT (x) is well defined. In
particular, we have proved
Lemma 18.3. Let T : V → V be a linear transformation. Then
ϕT (x) is a monic polynomial of degree n = dimF V . Moreover, this
polynomial can be found by choosing any basis A for V and then setting
ϕT (x) = ϕα (x) where α = A [T ]A .


Recall that if f (x) ∈ F [x], then we can consider f to be a function


f : F → F by evaluation. Namely if a ∈ F , then we obtain f (a) by
replacing x by a in the polynomial expression for f . We have easily for
f (x), g(x) ∈ F [x] and a ∈ F
(f + g)(a) = f (a) + g(a)
(f g)(a) = f (a)g(a)
where the latter is multiplication of polynomials and not composition.
If f (a) = 0, we say that a is a root of f (x).
The main result connecting eigenvalues of T and the characteristic
polynomial ϕT (x) is
Theorem 18.1. Let V be a finite dimensional vector space over
F and let T : V → V be a linear transformation. Then a ∈ F is an
eigenvalue for T if and only if a is a root of ϕT (x).
Proof. Fix a basis B for V and let a ∈ F . Then we know by
Theorem 13.2 that a is an eigenvalue for T if and only if the linear
transformation S = aI − T is singular and hence by Corollary 15.2 if
and only if det B [S]B = 0. Now
B [S]B = B [aI − T ]B = B [aI]B − B [T ]B = aIn − α
where α = B [T ]B . Thus a is an eigenvalue for T if and only if
0 = det B [S]B = det(aIn − α) = ϕT (a)
and therefore if and only if a is a root of ϕT (x). 
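The definition ϕα (x) = det(xIn − α) is easy to compute with mechanically. A small illustrative sketch (SymPy assumed; not part of the notes):

    from sympy import Matrix, symbols, eye, roots

    x = symbols('x')
    alpha = Matrix([[2, 1], [1, 2]])

    phi = (x * eye(2) - alpha).det()    # the characteristic polynomial of alpha
    print(phi.factor())                 # (x - 1)*(x - 3)
    print(roots(phi, x))                # {1: 1, 3: 1}: the eigenvalues, with multiplicities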
In order to better use the above result, we briefly consider roots of
polynomials.
Lemma 18.4. Let f (x) ∈ F [x] be a nonzero polynomial.
i. If a ∈ F , then a is a root of f (x) if and only if we have
f (x) = (x − a)g(x) for some polynomial g(x) ∈ F [x].
ii. f (x) can be written uniquely up to the order of the factors as
the product
f (x) = (x − a1 )(x − a2 ) · · · (x − ar )g(x)
where g(x) ∈ F [x] has no roots in F . Thus {a1 , a2 , . . . , ar } is
the set of roots of f (x) in F .
Proof. (i ) If f (x) = (x−a)g(x), then clearly f (a) = (a−a)g(a) =
0 and a is a root of f (x). Conversely, suppose a is a root. By formal
long division, we divide f (x) by (x − a) and obtain
f (x) = (x − a)g(x) + b


where b is some element of F . Setting x = a in this expression yields


b = 0 and this fact follows.
(ii ) We show the existence of the required representation for f (x)
by induction on the degree of f (x). If f (x) has no roots in F , then
f (x) = f (x) is such a product and we are done. Now suppose f (x)
has a root a1 ∈ F . By (i ), f (x) = (x − a1 )h(x) for some h(x) ∈ F [x].
Since clearly deg h(x) = deg f (x) − 1, we see by induction that h(x) =
(x − a2 )(x − a3 ) · · · (x − ar )g(x) where g(x) has no roots in F . Thus
f (x) = (x − a1 )h(x) = (x − a1 )(x − a2 ) · · · (x − ar )g(x)
and existence is proved.
Now suppose f (x) is written as above. Then clearly the elements
a1 , a2 , . . . , ar are roots of f (x). Conversely suppose a ∈ F is a root of
f (x). Then
0 = f (a) = (a − a1 )(a − a2 ) · · · (a − ar )g(a)
Since g(x) has no roots in F , we see that g(a) = 0 and thus we must
have a − ai = 0 for some i. Therefore a = ai and {a1 , a2 , . . . , ar } is
precisely the set of roots of f (x) in F .
Finally, we show that the above expression for f (x) is unique by
induction on deg f (x). Suppose
f (x) = (x − a1 )(x − a2 ) · · · (x − ar )g(x)
= (x − b1 )(x − b2 ) · · · (x − bs )h(x)
where g(x) and h(x) have no roots in F . If s = 0, then f (x) = h(x)
has no roots in F . This implies that no ai s can occur, so r = 0 and
g(x) = f (x) = h(x). On the other hand, if s > 0, then b1 is a root of
f (x), so b1 ∈ {a1 , a2 , . . . , ar }. Say b1 = a1 . Since f (x) ∈ F [x] ⊆ F (x)
and F (x) is a field, we can cancel the common factor x − a1 = x − b1
and conclude that
(x − a2 ) · · · (x − ar )g(x) = (x − b2 ) · · · (x − bs )h(x)
By induction, uniqueness follows. 
Suppose now that f (x) = (x − a1 )(x − a2 ) · · · (x − ar )g(x) where
g(x) has no roots in F . Then f (x) has r roots, but they may not all
be distinct. Thus, to be precise, we say that f (x) has r roots counting
multiplicities, that is counting a root b the number of times (x − b)
occurs as a factor of f (x).
Corollary 18.1. If T : V → V is a linear transformation with
dimF V = n, then ϕT (x) has at most n roots in F counting multiplici-
ties. Thus T has at most n eigenvalues.


If F = R, then we know that the polynomial f (x) = x2 + 1 has


no roots in R. However it does have roots in C. In fact, the field of
complex numbers has a very nice property, namely it is algebraically
closed. Formally a field F is said to be algebraically closed if every
nonconstant polynomial f (x) ∈ F [x] has at least one root in F . There
are many examples of such fields but they are difficult to construct. So
we will just be content with having the one example F = C.
Lemma 18.5. Let F be an algebraically closed field. If f (x) ∈ F [x]
has degree n > 0, then f (x) has precisely n roots in F counting multi-
plicities.

Proof. Write f (x) as the product


f (x) = (x − a1 )(x − a2 ) · · · (x − ar )g(x)
where g(x) has no roots in F . Since F is algebraically closed, it follows
that g(x) is a constant polynomial. Thus r = n and f (x) has n roots
counting multiplicities. 
Corollary 18.2. Assume that F is an algebraically closed field.
If T : V → V is a linear transformation and dimF V = n, then ϕT (x)
has exactly n roots in F counting multiplicities. Thus T has at least
one eigenvalue and eigenvector.
Unfortunately, the existence of these n eigenvalues, counting mul-
tiplicities, does not guarantee that the transformation T can be diag-
onalized. For example, if
 
0 0
A [T ]A =
1 0

then ϕT (x) = x2 so T has the eigenvalue 0 with multiplicity 2. If T


could be diagonalized, then for some basis B we would have B [T ]B =
diag(0, 0) and T = 0, a contradiction. We do however have
Lemma 18.6. Let a1 , a2 , . . . , ak be distinct eigenvalues for the linear
transformation T with corresponding eigenvectors α1 , α2 , . . . , αk . Then
{α1 , α2 , . . . , αk } is linearly independent.
Proof. We proceed by induction on k. If k = 1, then {α1 } is
linearly independent since α1 = 0 by assumption. Suppose the result
is true for k − 1 and let
0 = b1 α 1 + b2 α 2 + · · · + bk α k


Since αi T = ai αi , we obtain by applying T ,


0 = 0T = (b1 α1 + b2 α2 + · · · + bk αk )T
= b 1 a 1 α 1 + b 2 a2 α 2 + · · · + b k ak α k
Now by subtracting the second displayed equation from ak times the
first, we have
0 = b1 (ak − a1 )α1 + b2 (ak − a2 )α2 + · · · + bk−1 (ak − ak−1 )αk−1
and thus by induction bi (ak − ai ) = 0 for i = 1, 2, . . . , k − 1. Since
the ai s are distinct, ak − ai = 0, so bi = 0 for i = 1, 2, . . . , k − 1. The
first equation then reduces to bk αk = 0 and since αk = 0 we also have
bk = 0. Thus {α1 , α2 , . . . , αk } is linearly independent. 
As a consequence, we conclude that
Theorem 18.2. Let T : V → V be a linear transformation with V
an n-dimensional vector space over F . If ϕT (x) has n distinct roots in
F , then T can be diagonalized.

Proof. Let the roots of ϕT (x) be a1 , a2 , . . . , an ∈ F . Then each


ai is an eigenvalue of T and let αi be some corresponding eigenvec-
tor. Since the ai s are distinct, the preceding lemma implies that
A = {α1 , α2 , . . . , αn } is a linearly independent subset of V of size
n = dimF V . Thus A is a basis for V and V has a basis consisting
entirely of eigenvectors of T . This implies that T can be diagonalized
and in fact A [T ]A = diag(a1 , a2 , . . . , an ). 
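As an illustration, a computer algebra system can carry out such a diagonalization explicitly. The sketch below uses SymPy's diagonalize, which works with column eigenvectors; up to this difference of convention it produces a basis of eigenvectors of the kind used in the proof. SymPy itself is an assumption, not part of the notes.

    from sympy import Matrix

    alpha = Matrix([[2, 1], [1, 2]])     # characteristic polynomial (x - 1)(x - 3)
    P, D = alpha.diagonalize()           # columns of P are eigenvectors of alpha
    print(D)                             # diag(1, 3)
    print(P.inv() * alpha * P == D)      # True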
We close this section with another application of the embedding of
F n×n into F (x)n×n .
Example 18.1. Let a1 , a2 , . . . , an be n elements of F and form the
n × n Vandermonde matrix
    ⎡ 1          1          · · ·   1          ⎤
    ⎢ a1         a2         · · ·   an         ⎥
ν = ⎢ a1^2       a2^2       · · ·   an^2       ⎥
    ⎢ . . . . . . . . .                        ⎥
    ⎣ a1^(n−1)   a2^(n−1)   · · ·   an^(n−1)   ⎦
Notice that if ai = aj for some i = j, then two columns of this matrix
are equal and hence det ν = 0. We claim that in general

det ν = Π_{i>j} (ai − aj )


Since this holds in the trivial situation when two of the parameters are
equal, we can assume in the proof by induction that all ai are distinct.
Obviously this result holds for n = 1 and 2, so let n > 2.
Now work in F (x)n×n and form the n × n matrix
    ⎡ 1          1          · · ·   1        ⎤
    ⎢ a1         a2         · · ·   x        ⎥
α = ⎢ a1^2       a2^2       · · ·   x^2      ⎥
    ⎢ . . . . . . . . .                      ⎥
    ⎣ a1^(n−1)   a2^(n−1)   · · ·   x^(n−1)  ⎦
where an is replaced by the variable x. By considering the column
cofactor expansion with respect to the nth column, we see that det α
is a polynomial in x of degree ≤ n − 1. Indeed, it has degree precisely
n − 1 since the coefficient of xn−1 is cn−1 = det αnn , the Vandermonde
determinant determined by a1 , a2 , . . . , an−1 . Furthermore, cn−1 ≠ 0
since the parameters are distinct.
Finally note that det α is 0 when evaluated at x = a1 , a2 , . . . , an−1
since in each of these cases, two columns of the matrix are equal. Since
these n − 1 roots are all distinct, we must have
det α = cn−1 (x − a1 )(x − a2 ) · · · (x − an−1 )
and evaluating at x = an yields
  
det ν = cn−1 · Π_{n>j} (an − aj ) = Π_{n≠i>j} (ai − aj ) · Π_{n>j} (an − aj )
      = Π_{i>j} (ai − aj )

as required.
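A symbolic check of this determinant formula for a small value of n may be reassuring. The following sketch (SymPy assumed, n = 4 chosen arbitrarily) verifies that det ν equals the product of the differences.

    from sympy import Matrix, symbols, simplify

    a = symbols('a1 a2 a3 a4')
    n = 4
    nu = Matrix(n, n, lambda i, j: a[j]**i)    # row i holds the ith powers

    product = 1
    for i in range(n):
        for j in range(i):
            product *= (a[i] - a[j])           # product over i > j of (ai - aj)

    print(simplify(nu.det() - product))        # 0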

Problems
18.1. If α ∈ F n×n and a ∈ F , show that α(aIn ) = (aIn )α.
Compute the characteristic polynomials of the following two matri-
ces with entries in the field F .
18.2. ⎡ ⎤
a11 a12 a13 a14 a15
⎢ 0 a22 a23 a24 a25 ⎥
⎢ ⎥
α=⎢
⎢0 0 a33 a34 a35 ⎥

⎣0 0 0 a44 a45 ⎦
0 0 0 0 a55


18.3.
        ⎡ 0 1 0 0 ⎤
        ⎢ 0 0 1 0 ⎥
    α = ⎢ 0 0 0 1 ⎥
        ⎣ a b c d ⎦
If α = [aij ] ∈ F n×n , then the trace of α is defined to be
tr α = a11 + a22 + · · · + ann
the sum of the entries of α on the main diagonal.
18.4. Prove that tr : F n×n → F is a linear functional and that
tr(αβ) = tr(βα) for all α, β ∈ F n×n .

18.5. If α ∈ F n×n , show that


ϕα (x) = xn − (tr α)xn−1 + smaller degree terms
Furthermore, show that the constant term of ϕα (x) is (−1)n det α.
Let dimF V = n and let T : V → V be a linear transformation. We
define tr T and det T by
tr T = tr α, det T = det α
where α = A [T ]A for some basis A of V .
18.6. Prove that tr T and det T are well defined, that is they are
independent of the choice of basis A.
18.7. Let F be algebraically closed and let a1 , a2 , . . . , an be the
eigenvalues of T counting multiplicities. Show that
tr T = a1 + a2 + · · · + an , det T = a1 a2 · · · an
18.8. Let V be a 2-dimensional vector space over the real field and
let T : V → V satisfy  
1 −1
A [T ]A =
3 2
Find ϕT (x) and then using T 0 = I, find the matrix of the linear trans-
formation ϕT (T ).
If α ∈ F n×n , define α(x) ∈ F (x)n×n by α(x) = xIn + α.
18.9. Show that α(x) is a nonsingular matrix with entries in F [x]
and satisfying α(0) = α.
18.10. Let α, β ∈ F n×n . Use Problem 17.2 and the above to show
that adj(α(x)β(x)) = (adj β(x))·(adj α(x)). Now evaluate at x = 0 to
conclude that adj(αβ) = (adj β)·(adj α).


19. The Cayley-Hamilton Theorem


There are other properties of characteristic polynomials that are
of interest. We start with a simple question. Is there anything really
special about a characteristic polynomial? First, we know it must have
degree n where n = dimF V and second it must be monic. Are there
other requirements on the polynomial? The answer is “no”.
Let f (x) ∈ F [x] be a monic polynomial of degree n so that

f (x) = a0 + a1 x + · · · + an−1 xn−1 + xn

We define the companion matrix of f to be the n × n F -matrix


⎡ ⎤
0 1 0 0 ··· 0 0
⎢ 0 0 1 0 ··· 0 0 ⎥
⎢ ⎥
⎢ 0 0 0 1 · · · 0 0 ⎥
⎢ ⎥
α=⎢ ...................... ⎥
⎢ 0 0 0 0 · · · 1 0

⎢ ⎥
⎣ 0 0 0 0 ··· 0 1 ⎦
−a0 −a1 −a2 −a3 · · · −an−2 −an−1
Lemma 19.1. Let α be the companion matrix of the monic polyno-
mial f (x) ∈ F [x]. Then
ϕα (x) = f (x)

Proof. Let f and α be as above. To find ϕα (x) we must compute


det β where β ∈ F (x)n×n is given by
⎡ ⎤
x −1 0 0 ··· 0 0
⎢ 0 x −1 0 · · · 0 0 ⎥
⎢ ⎥
⎢0 0 x −1 · · · 0 0 ⎥
⎢ ⎥
β = xIn − α = ⎢ ................... ⎥
⎢0 0 0 0 · · · −1 0

⎢ ⎥
⎣0 0 0 0 ··· x −1 ⎦
a0 a1 a2 a3 · · · an−2 x + an−1

We prove that det β = f (x) by induction on n.


Suppose n = 1. Then we have f (x) = x + an−1 , α = [−an−1 ] and
β = [x + an−1 ], so the result follows in this case. Now suppose that
n > 1 and the result is true for all monic polynomials of degree n − 1.
Since there are only two nonzero entries in the first column of β, it is
a simple matter to apply the cofactor expansion here and obtain

det β = x(det β11 ) + (−1)n+1 a0 (det βn1 )


Now ⎡ ⎤
x −1 0 · · · 0 0
⎢ 0 x −1 · · · 0 0 ⎥
⎢ ⎥
⎢ .............. ⎥
β11 = ⎢ ⎥
⎢0 0 0 · · · −1 0 ⎥
⎣0 0 0 ··· x −1 ⎦
a1 a2 a3 · · · an−2 x + an−1
so β11 = xIn−1 − α∗ where α∗ is the companion matrix of the polynomial

f ∗ (x) = a1 + a2 x + · · · + an−2 xn−3 + an−1 xn−2 + xn−1


Thus by induction, det β11 = f ∗ (x).
On the other hand
⎡ ⎤
−1 0 0 ··· 0 0
⎢ x −1 0 · · · 0 0⎥
⎢ ⎥
⎢0 x −1 · · · 0 0⎥
βn1 = ⎢ ⎥
⎢ ................. ⎥
⎣0 0 0 · · · −1 0 ⎦
0 0 0 · · · x −1
and we have seen such matrices before, where all entries above the main
diagonal are zero or perhaps where all entries below the main diagonal
are zero. We know that the determinant of such a matrix is equal to
the product of its main diagonal entries. However here we can take a
different approach. First add x times the first row to the second, then
add x times the second row to the third, and continue in this manner,
finally adding x times the (n − 2)nd row to the (n − 1)st. In this way,
we have found a new matrix with the same determinant as βn1 and this
new matrix is clearly −In−1 . Thus we have
det βn1 = det(−In−1 ) = (−1)n−1
Putting this all together, we have
ϕα (x) = det β = x(det β11 ) + (−1)n+1 a0 (det βn1 )
= x(a1 + a2 x + · · · + an−1 xn−2 + xn−1 ) + (−1)n+1 a0 (−1)n−1
= a0 + a1 x + a2 x2 + · · · + an−1 xn−1 + xn = f (x)
and the lemma is proved. 
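The lemma is easy to test on examples. The sketch below builds the companion matrix of a sample cubic and recovers the polynomial as its characteristic polynomial; the helper companion is ours, and SymPy is assumed to be available purely for illustration.

    from sympy import Matrix, symbols, eye, zeros

    x = symbols('x')

    def companion(coeffs):
        # coeffs = [a0, a1, ..., a_{n-1}] for the monic f(x) = a0 + a1*x + ... + x**n
        n = len(coeffs)
        C = zeros(n, n)
        C[:n - 1, 1:] = eye(n - 1)              # 1's on the super diagonal
        for j, c in enumerate(coeffs):
            C[n - 1, j] = -c                    # last row: -a0, -a1, ..., -a_{n-1}
        return C

    f = x**3 - 2*x**2 + 5*x - 7
    alpha = companion([-7, 5, -2])
    print((x * eye(3) - alpha).det().expand())  # x**3 - 2*x**2 + 5*x - 7, i.e. f(x)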
We can conclude from the above that a field F is algebraically
closed if and only if all linear transformations on finite dimensional
vector spaces over F have at least one eigenvalue. Indeed, suppose
first that F is algebraically closed. Then we know, by Corollary 18.2,
that T : V → V has n = dimF V eigenvalues counting multiplicities. In
the other direction, let f (x) ∈ F [x] be a nonconstant polynomial and


we wish to show that f (x) has a root in F . To do this, it clearly suffices


to consider any nonzero scalar multiple of f (x) or in other words we
may assume that f (x) is monic of degree n for some n ≥ 1. Let α be
its companion matrix and let T : F n → F n be a linear transformation
with α = A [T ]A for some basis A of F n . By assumption, T has an
eigenvalue in F so ϕT (x) = ϕα (x) = f (x) has a root in F . This implies
that F is algebraically closed.
Now let us turn to a somewhat different problem. Let f (x) =
a0 + a1 x + · · · + an xn ∈ F [x] and let T ∈ L(V, V ). Does it make sense
to substitute T in for x and consider f (T )? The answer is “yes” with
one proviso. We must consider the constant term a0 as a0 x0 and then
put T 0 = I, the identity linear transformation. Since L(V, V ) is a
vector space over F , we then see that
f (T ) = a0 T 0 + a1 T + a2 T 2 + · · · + an T n
is just some element of L(V, V ). We remark that the power T i is of
course given by
T i = T ·T · · · T    (i times)

Furthermore, since multiplication in L(V, V ) is associative and T 0 = I,


we have T i ·T j = T i+j for all i, j ≥ 0.
Lemma 19.2. Let S, T ∈ L(V, V ) where V is a vector space over F .
i. If a, b ∈ F , then (aS)(bT ) = (ab)(ST ).
ii. If h(x) = f (x)g(x) are polynomials in F [x], then by way of
evaluation at T we have h(T ) = f (T )g(T ).
Proof. (i ) This follows immediately from Lemma 12.1. Indeed by
that result, we have
(aS)(bT ) = a(S(bT )) = a(b(ST )) = (ab)(ST )
(ii ) Write
 
f (x) = Σi ai xi ,    g(x) = Σj bj xj

Since the distributive law holds in L(V, V ), we have


    
f (T )g(T ) = (Σi ai T i )(Σj bj T j ) = Σ_{i,j} (ai T i )(bj T j )

Now by (i) above


(ai T i )(bj T j ) = (ai bj )T i T j = (ai bj )T i+j


so we have

f (T )g(T ) = Σ_{i,j} (ai bj )T i+j
            = Σk ( Σ_{i+j=k} ai bj ) T k = h(T )

and the lemma is proved. 


Suppose dimF V = n. Then dimF L(V, V ) = n2 by Theorem 11.2.
Consider the n2 + 1 linear transformations T 0 = I, T 1 , T 2 , . . . , T n² .
Since n2 + 1 > dimF L(V, V ), this set must be linearly dependent.
Thus there exist elements ai ∈ F not all zero with
a0 T 0 + a1 T 1 + a2 T 2 + · · · + an² T n² = 0
In other words, there exists a nonzero polynomial f (x) ∈ F [x] of degree
≤ n2 with f (T ) = 0.
As it turns out, the bound of n2 on deg f (x) is really too large.
Indeed, we can always find a monic polynomial f (x) of degree n with
f (T ) = 0. In fact, we can take f to be the characteristic polynomial
of T . This is the content of the Cayley-Hamilton Theorem below.
Theorem 19.1. Let T : V → V be a linear transformation with V
a finite dimensional vector space over F . Then ϕT (T ) = 0.

Proof. Suppose that dimF V = n. Since ϕT (T ) is a linear trans-


formation, we see that ϕT (T ) = 0 if and only if α·ϕT (T ) = 0 for all
α ∈ V . Now α·ϕT (T ) = 0 is certainly true for α = 0, so we need only
consider the nonzero elements of V . Let α ∈ V with α = 0. Then {α} is
a linearly independent subset of V . It may also be true that {α, αT } is
linearly independent, and then possibly {α, αT, αT 2 }. Since V is finite
dimensional, we see that at some point r, {α, αT, . . . , αT r−1 } is linearly
independent, but {α, αT, . . . , αT r−1 , αT r } is linearly dependent. From
the latter dependence, there exist ai ∈ F not all zero with
a0 α + a1 (αT ) + · · · + ar−1 (αT r−1 ) + ar (αT r ) = 0
Clearly ar ≠ 0 since otherwise we would have a dependence relation
among α, αT, . . . , αT r−1 which is not the case. Thus, by multiplying
through by ar^{−1} if necessary, we may assume that ar = 1 so

a0 α + a1 (αT ) + · · · + ar−1 (αT r−1 ) + αT r = 0


Moreover since ai (αT i ) = α(ai T i ), the above becomes
α(a0 T 0 + a1 T 1 + · · · + ar−1 T r−1 + T r ) = 0


and hence αf (T ) = 0, where


f (x) = a0 + a1 x + · · · + ar−1 xr−1 + xr ∈ F [x]
Now {α, αT, . . . , αT r−1 } is linearly independent, so we can complete
this set to form a basis
A = {α, αT, . . . , αT r−1 , β1 , β2 , . . . , βs }
for V where, of course, r + s = n. We consider the matrix A [T ]A . First,
for 0 ≤ i ≤ r − 2 we have (αT i )T = αT i+1 so
(αT i )T = 0α + 0(αT ) + · · · + 1(αT i+1 ) + · · · + 0(αT r−1 )
+ 0β1 + 0β2 + · · · + 0βs
and for i = r − 1 we have (αT r−1 )T = αT r so
(αT r−1 )T = −a0 α − a1 (αT ) − · · · − ar−1 (αT r−1 )
+ 0β1 + 0β2 + · · · + 0βs
Therefore in block matrix form
⎡ ⎤
A 0
A [T ]A =⎣ ⎦
C B

where B ∈ F s×s , C ∈ F s×r and


⎡ ⎤
0 1 0 0 ··· 0 0
⎢ 0 0 1 0 ··· 0 0 ⎥
⎢ ⎥
⎢ 0 0 0 1 ··· 0 0 ⎥
⎢ ⎥
A=⎢ ...................... ⎥
⎢ 0 0 0 0 ··· 1 0 ⎥
⎢ ⎥
⎣ 0 0 0 0 ··· 0 1 ⎦
−a0 −a1 −a2 −a3 · · · −ar−2 −ar−1
is the companion matrix of the polynomial f (x).
We compute ϕT (x). By definition
ϕT (x) = det(xIn − A [T ]A )
⎡ ⎤
xIr − A 0
= det ⎣ ⎦
−C xIs − B

and thus by Theorem 16.3 we have


ϕT (x) = det(xIr − A)· det(xIs − B) = ϕA (x)ϕB (x)


But A is the companion matrix of f (x), so Lemma 19.1 yields ϕA (x) =


f (x) and hence ϕT (x) = f (x)ϕB (x).
Finally, by part (ii) of Lemma 19.2, ϕT (T ) = f (T )ϕB (T ), so
   
α·ϕT (T ) = α· f (T )ϕB (T ) = α·f (T ) ϕB (T )
= 0·ϕB (T ) = 0
Thus α·ϕT (T ) = 0 for all α ∈ V and we conclude that ϕT (T ) = 0. 
In the same way, if γ ∈ F n×n and f (x) ∈ F [x], then we can substi-
tute γ in for x with the proviso that γ 0 = In . Clearly f (γ) ∈ F n×n .
Corollary 19.1. Let γ ∈ F n×n . Then ϕγ (γ) = 0.
Proof. Let A be a basis for F n and define a linear transformation
T : F n → F n by A [T ]A = γ. Since
γ i = γ·γ · · · γ = A [T ]A ·A [T ]A · · · A [T ]A = A [T i ]A
we have clearly f (γ) = A [f (T )]A for any polynomial f (x). Finally,
ϕT (x) = ϕγ (x) so, by the Cayley-Hamilton Theorem,
ϕγ (γ) = ϕT (γ) = A [ϕT (T )]A = A [0]A = 0
and the result follows. 
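A direct machine check of the Cayley-Hamilton Theorem on a sample matrix may be instructive. In the sketch below (SymPy assumed, illustration only) the substitution γ 0 = I3 is made explicitly.

    from sympy import Matrix, symbols, eye, zeros

    x = symbols('x')
    gamma = Matrix([[1, 1, 0], [2, 3, 1], [1, 2, 3]])

    phi = (x * eye(3) - gamma).det().expand()    # characteristic polynomial of gamma
    value = zeros(3, 3)
    for k in range(4):
        value += phi.coeff(x, k) * gamma**k      # gamma**0 is I3
    print(value)                                 # the 3 x 3 zero matrix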

Problems
A nonempty subset A of F [x] is said to be an ideal of the ring if
i. α, β ∈ A implies that α + β ∈ A.
ii. α ∈ A and β ∈ F [x] implies that αβ ∈ A.
For example A = {0} is an ideal and so is the set of all F [x]-multiples
of any fixed element of the ring. In the following three problems, let A
be an ideal of F [x] with A = {0}.
19.1. Let m be the minimal degree of all nonzero polynomials in A.
Show that A contains a unique monic polynomial μ(x) of degree m. μ
is called the minimal polynomial of A.
19.2. Show that for all f (x) ∈ A, there exists some polynomial
g(x) ∈ F [x] with f (x) = μ(x)g(x). (Hint, apply induction on the de-
gree of f (x). If deg f = n ≥ m and f (x) = an xn + lower degree terms,
then f (x) − an xn−m μ(x) ∈ A has degree less than n.)
19.3. Conclude that A = F [x]μ(x) is the set of all F [x]-multiples
of its minimal polynomial μ(x).


Let V be a finite dimensional vector space over F and let T : V → V


be a linear transformation. Define
AT = {f (x) ∈ F [x] | f (T ) = 0}
19.4. Show that AT is an ideal of F [x] with AT = {0}. We call the
minimal polynomial μT (x) of AT the minimal polynomial of T . Prove
that μT (x) divides the characteristic polynomial ϕT (x).

19.5. Let A be a basis for V and let α = A [T ]A . Prove that f (x) ∈


AT if and only if f (α) = 0.
19.6. Let f (x), g(x) ∈ F [x].
i. If f (x) and g(x) have no roots in F , show that the same is true
for f (x)g(x). 
ii. If f (x) is monic and divides (x − a1 )(x − a2 ) · · · (x − an ), prove that f (x) is a
partial product of the (x − ai ) factors.
Using the fact that μT (x) divides ϕT (x), find the minimal polyno-
mials of the linear transformations below given by their corresponding
matrices.
19.7.
              ⎡ 1 0 0 0 ⎤
              ⎢ 0 1 0 0 ⎥
    A [T ]A = ⎢ 0 0 1 0 ⎥ ∈ R4×4
              ⎣ 0 0 0 2 ⎦
19.8.
A [T ]A = diag(a1 , a2 , . . . , an )
with a1 , a2 , . . . , an distinct elements of F .
19.9.
              ⎡ 0 0 0 0 ⎤
              ⎢ 1 0 0 0 ⎥
    A [T ]A = ⎢ 0 1 0 0 ⎥ ∈ F 4×4
              ⎣ 0 0 1 0 ⎦
19.10.
              ⎡ 2 0 0 ⎤
    A [T ]A = ⎢ 1 2 0 ⎥ ∈ R3×3
              ⎣ 0 1 2 ⎦


20. Nilpotent Transformations


Let T : V → V be a linear transformation and suppose that its
characteristic polynomial is given by
ϕT (x) = (x − a1 )(x − a2 ) · · · (x − an )
If a1 , a2 , . . . , an are all distinct elements of F , then as we have seen,
T can be diagonalized. What can we conclude if all the roots are
not distinct? Obviously the worst offenders are those T that satisfy
a1 = a2 = · · · = an and we consider the case now where this common
value is 0. Thus ϕT (x) = xn and, by the Cayley-Hamilton Theorem,
we have T n = ϕT (T ) = 0. In other words, T is a nilpotent transforma-
tion. Formally, a nilpotent transformation is a linear transformation
T : V → V satisfying T m = 0 for some integer m. Let us consider some
examples.
Example 20.1. Let V denote the subspace of R[x] consisting of all
polynomials of degree < n, for some fixed n, and let D : V → V be
the derivative map. Since applying D lowers the degree of a nonzero
polynomial and since dimR V = n, we see that Dn = 0. Now V has a
basis
A = { xn−1/(n − 1)! , xn−2/(n − 2)! , . . . , x2/2! , x, 1 }
and since (xi /i!)D = xi−1 /(i − 1)! for i ≥ 1, and 1D = 0, we see that
⎡ ⎤
0 1 0 0 ··· 0
⎢0 0 1 0 · · · 0⎥
⎢ ⎥
⎢0 0 0 1 · · · 0⎥
A [D]A = ⎢ ⎥
⎢ ............ ⎥
⎣0 0 0 0 · · · 1⎦
0 0 0 0 ··· 0
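The matrix A [D]A above is easily checked to be nilpotent by machine as well. A tiny sketch (SymPy assumed, illustration only), which simply raises the matrix of Example 20.1 to the nth power:

    from sympy import zeros

    n = 5
    D = zeros(n, n)                 # the matrix of D in Example 20.1
    for i in range(n - 1):
        D[i, i + 1] = 1             # 1's on the super diagonal

    print(D**(n - 1))               # still nonzero: a single 1 in the upper right corner
    print(D**n == zeros(n, n))      # True, so the matrix is nilpotent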
Example 20.2. Again let V be as above and define Δ : V → V by
α(x)Δ = α(x + 1) − α(x)
Then Δ is called a difference operator and it is in some sense a finite
analog of the derivative. Observe that for i > 0
xi Δ = (x + 1)i − xi
has degree i − 1, so applying Δ always lowers the degree of a nonzero
polynomial. This therefore implies that Δn = 0. Now for i ≥ 0, define
x(i) by x(i) = x(x − 1) · · · (x − i + 1) with x(1) = x and x(0) = 1. Then


for i > 0, we have


x(i) Δ = (x + 1)x(x − 1) · · · (x − i + 2) − x(x − 1) · · · (x − i + 1)
       = [(x + 1) − (x − i + 1)]·x(x − 1) · · · (x − i + 2)
       = i·x(i−1)
Since
B = { x(n−1)/(n − 1)! , x(n−2)/(n − 2)! , . . . , x(2)/2! , x(1) , x(0) }
is also a basis for V , we see easily that
⎡ ⎤
0 1 0 0 ··· 0
⎢0 0 1 0 · · · 0⎥
⎢ ⎥
⎢0 0 0 1 · · · 0⎥
B [Δ]B = ⎢ ⎥
⎢ ............ ⎥
⎣0 0 0 0 · · · 1⎦
0 0 0 0 ··· 0
Example 20.3. Let us return to the formal derivative map, but this
time over a different field. Indeed, let F be the field ofi two elements.
Thus F = {0, 1} and 1 + 1 = 0. If α(x) = i≥0 ai x ∈ F [x], then

α(x)D = i≥1 iai xi−1 and

α(x)D2 = i(i − 1)ai xi−2
i≥2

Now i(i− 1) is an even integer, so i(i− 1) is a sum of an even number of


identity elements of F . Since 1 + 1 = 0, we therefore have i(i − 1) = 0
in F and hence α(x)D2 = 0. Thus D2 = 0. Now suppose V is the
subspace of F [x] consisting of all polynomials of degree ≤ 4. Then
C = {x4 , x3 , x2 , x, 1} is a basis for V and
⎡ ⎤
0 0 0 0 0
⎢0 0 1 0 0 ⎥
⎢ ⎥

C [D]C = ⎢0 0 0 0 0⎥

⎣0 0 0 0 1 ⎦
0 0 0 0 0
Thus we have seen in all of the above examples that a basis B
exists for which B [T ]B is a matrix with zero entries everywhere but on
the diagonal directly above the main diagonal. Moreover on that super
diagonal the entries are either 0 or 1. We will see in the following that
this is always the case for nilpotent transformations.


Let a ∈ F and let s ≥ 1 be an integer. Then we call the s × s


F -matrix
⎡ ⎤
a 1 0 0 ··· 0
⎢0 a 1 0 ··· 0⎥
⎢ ⎥
⎢0 0 a 1 ··· 0⎥
⎢ ⎥
⎢ ............ ⎥
⎣0 0 0 0 ··· 1⎦
0 0 0 0 ··· a
an (a, s)-Jordan block and we denote it by b(a, s). Specifically, b(a, s)
has all main diagonal entries equal to a, all super diagonal entries equal
to 1, and zeros elsewhere. Of course, if s = 1, then b(a, 1) = [a]. Next,
we denote by
diag(b(a1 , s1 ), b(a2 , s2 ), . . . , b(ar , sr ))
the n × n block diagonal matrix given pictorially by
⎡ ⎤
b(a1 , s1 ) 0
⎢ b(a2 , s2 ) ⎥
⎢ . ⎥
⎣ .. ⎦
0 b(ar , sr )
where n = s1 + s2 + · · · + sr . The main result of this section is
Theorem 20.1. Let V be a finite dimensional vector space over F
and let T : V → V be a nilpotent linear transformation. Then there
exists a basis B of V such that
B [T ]B = diag(b(0, s1 ), b(0, s2 ), . . . , b(0, sr ))
with s1 ≥ s2 ≥ · · · ≥ sr .
The proof of this theorem is elementary, but somewhat tedious. If
B is a basis for V with B [T ]B having the appropriate form, then it is
easy to see that B ∩ ker T is a basis for ker T , and that the size of
B ∩ ker T is equal to r, the number of blocks in the block diagonal
matrix. Thus it is reasonable to suppose that the proof should start
with a basis for ker T and then use each basis member to construct a
corresponding “block” of vectors in B. As will be apparent, we have to
be a bit careful in the choice of this starting basis for ker T .
Proof. We first define a descending chain of subspaces of V as
follows. Let Vi = im T i = (V )T i so that Vi is a subspace of V and
V0 = V . Moreover for i > 0
Vi = (V )T i = (V T )T i−1 ⊆ (V )T i−1 = Vi−1


so Vi ⊆ Vi−1 . Finally, T is nilpotent, so T n+1 = 0 for some n + 1 ≥ 1


and hence Vn+1 = 0. Thus we have
V = V0 ⊇ V1 ⊇ V2 ⊇ · · · ⊇ Vn ⊇ Vn+1 = 0
Next, let W = ker T and set Wi = W ∩ Vi . Then clearly
W = W0 ⊇ W1 ⊇ W2 ⊇ · · · ⊇ Wn ⊇ Wn+1 = 0
At this point, we choose a basis for W consistent with this chain of
subspaces. Specifically, we start with a basis for Wn , extend it to
a basis for Wn−1 , extend that to a basis for Wn−2 and continue this
process. In this way, we obtain a basis
{αij | i = 0, 1, . . . , n; j = 0, 1, . . . , f (i)}
for W such that αij ∈ Wi \ Wi+1 and such that, for each integer q,
{αij | i ≥ q} is a basis for Wq .
Since αij ∈ Vi = im T i , we can choose βij ∈ V with βij T i = αij .
For each such i, j we then obtain i + 1 potentially different vectors of
V , namely βij , βij T, βij T 2 , . . . , βij T i = αij . We show now that the set
of vectors
B = {βij T k | i = 0, 1, . . . , n; j = 0, 1, . . . , f (i); k = 0, 1, . . . , i}
is a basis for V .
We first prove that the vectors in B are distinct and linearly inde-
pendent. Thus suppose

Σ_{i,j,k} aijk (βij T k ) = 0

with aijk ∈ F . If some aijk is not zero, let


m = max{i − k | aijk ≠ 0}
so that m ≥ 0 since we always have i ≥ k. In particular, if i − k > m
then aijk = 0, but for some i, j, k with i − k = m we have aijk ≠ 0.
We now apply T m to the given linear dependence. Since T m is a linear
transformation, we obtain
  
Σ_{i,j,k} aijk (βij T k+m ) = ( Σ_{i,j,k} aijk (βij T k ) ) T m = 0T m = 0

Let us consider each term aijk (βij T k+m ) in this sum. If i − k > m
then, by assumption, aijk = 0 so this term is 0. On the other hand, if
i − k < m, then i < k + m so
βij T k+m = (βij T i )T ·T k+m−i−1
= (αij T )·T k+m−i−1 = 0T k+m−i−1 = 0


since βij T i = αij ∈ W = ker T . Thus the only terms that occur in the
above sum have i − k = m, so i = k + m and
aijk (βij T k+m ) = aijk (βij T i ) = aijk αij
Thus we have
    Σ_{i,j,k; k=i−m} aijk αij = 0
But {αij } is a basis for W and there is at most one subscript k for each
i, j, so we conclude that all aijk = 0 with i − k = m, a contradiction by
the definition of m. This proves that B is a linearly independent set of
distinct vectors.
We show now, by inverse induction on q = n + 1, n, . . . , 0, that the
subset of B given by Bq = {βij T k | k ≥ q} spans Vq = im T q = (V )T q .
Of course, Bq ⊆ Vq since all the exponents of T in the vectors of Bq are
at least equal to q. If q = n+1, then Vq = 0 and Bq = ∅, so the result is
trivially true. Let us now suppose that Bq+1 spans Vq+1 and let γ ∈ Vq .
Since γ ∈ Vq = im T q , it follows easily that γT ∈ im T q+1 = Vq+1 and
thus, by induction, 
γT = Σ_{i,j,k; k≥q+1} bijk (βij T k )
for suitable bijk ∈ F . Note that q ≥ 0 implies that k ≥ q + 1 ≥ 1 and
thus if we define 
δ = γ − Σ_{i,j,k; k≥q+1} bijk (βij T k−1 )
then δT = 0.
Two facts about δ ∈ V are now apparent. First, δT = 0 so δ ∈
ker T = W . Second, δ ∈ Vq since γ ∈ Vq and since k−1 ≥ q implies that
βij T k−1 ∈ Vq . Thus δ ∈ W ∩ Vq = Wq . Now we know that {αij | i ≥ q}
is a basis for Wq and therefore, for suitable elements cij ∈ F , we have
 
δ = Σ_{i,j; i≥q} cij αij = Σ_{i,j; i≥q} cij (βij T i )

Since

γ = δ + Σ_{i,j,k; k≥q+1} bijk (βij T k−1 )
  = Σ_{i,j; i≥q} cij (βij T i ) + Σ_{i,j,k; k≥q+1} bijk (βij T k−1 )


and since all exponents of T are at least q in size, we conclude that Bq


spans Vq and the induction step is proved. Finally, V = V0 , so B = B0
spans V and therefore B is a basis for V .
We finish the proof by writing T in terms of the basis B. However,
this requires that we first choose an ordering for the members of the
basis. We do this in such a way that βij , βij T, βij T 2 , · · · , βij T i occur
consecutively. We then have to decide in which order these βij “blocks”
occur and we do this in such a way that the larger i subscripts appear
ahead of the smaller ones. It is now a simple matter to write down the
matrix B [T ]B . Observe that for k < i we have (βij T k )T = βij T k+1 , and
that for k = i we have (βij T k )T = αij T = 0 since αij ∈ W = ker T .
Thus the contribution of this “block” of the basis to the matrix B [T ]B
of T is the (i + 1) × (i + 1) matrix
⎡ ⎤
0 1 0 0 ··· 0
⎢0 0 1 0 · · · 0⎥
⎢ ⎥
⎢0 0 0 1 · · · 0⎥
b(0, i + 1) = ⎢ ⎥
⎢ ............ ⎥
⎣ 0 0 0 0 · · · 1⎦
0 0 0 0 ··· 0
and this occurs as a block on the main diagonal.
Thus we see finally that

B [T ]B = diag(b(0, s1 ), b(0, s2 ), . . . , b(0, sr ))


where each Jordan block b(0, st ) corresponds to some “block”
βij , βij T, βij T 2 , · · · , βij T i
of the basis. Therefore st = i + 1 and, since we have ordered the βij s
in such a way that the larger i subscripts appear first, we see that
s1 ≥ s2 ≥ · · · ≥ sr . The theorem is proved. 

It is interesting to observe that the integers s1 , s2 , . . . , sr are actually


uniquely determined by T . A proof of this fact is given in the exercises
at the end of this section.
Corollary 20.1. Let V be a finite dimensional vector space over
F and let T : V → V be a linear transformation. Let a ∈ F and suppose
that T − aI is nilpotent. Then there exists a basis B of V such that

B [T ]B = diag(b(a, s1 ), b(a, s2 ), . . . , b(a, sr ))


with s1 ≥ s2 ≥ · · · ≥ sr .


Proof. If S = T − aI, then by the preceding theorem we can find


a basis B with
B [S]B = diag(b(0, s1 ), b(0, s2 ), . . . , b(0, sr ))
and with s1 ≥ s2 ≥ · · · ≥ sr . Now T = aI + S so
B [T ]B = B [aI]B + B [S]B = aIn + B [S]B
where n = dimF V . Thus, since clearly aIs +b(0, s) = b(a, s), the result
follows. 

Problems
20.1. Let F be a field with 1 + 1 + 1 = 0. Discuss the formal
derivative map and the difference map on the full polynomial ring F [x]
and show that these maps are nilpotent.
Recall that R is a commutative ring if it is a set with an addition and
multiplication that satisfies all the axioms for a field with the possible
exception of the existence of multiplicative inverses.
20.2. If p ∈ Z is a prime, show that p divides the binomial coefficients
p!/(i!(p − i)!) for i = 1, 2, . . . , p − 1. Now let R be a commutative ring with
p = 1 + 1 + · · · + 1 = 0    (p times)

Deduce that (a + b)p = ap + bp for all a, b ∈ R.


20.3. Let V be an F -vector space and let T : V → V be a nilpotent
linear transformation. Let the integer m ≥ 1 be minimal with T m = 0.
If f (x) ∈ F [x] with f (T ) = 0, show that f (x) = xm g(x) for some
polynomial g(x). See Problems 19.4 and 19.6.
Now let V be a finite dimensional vector space over F with basis B
and let T : V → V be a linear transformation.
20.4. If B [T ]B = b(0, n) show that

rank T k = n − k for k ≤ n,    and rank T k = 0 for k > n

20.5. Now suppose that


B [T ]B = diag(b(0, s1 ), b(0, s2 ), . . . , b(0, sr ))


and let ns denote the number of block sizes sj that are equal to s. Use
the preceding result to show that

rank T k = Σ_{s≥k} ns (s − k) = nk+1 + 2nk+2 + 3nk+3 + · · ·

and deduce that


rank T k − rank T k+1 = nk+1 + nk+2 + nk+3 + · · ·
As a check, we observe that
r = n1 + n2 + n3 + · · · = rank T 0 − rank T 1
= dimF V − dimF (V )T = dimF ker T
20.6. Deduce from the above that for s ≥ 1
ns = (rank T s−1 − rank T s ) − (rank T s − rank T s+1 )
= rank T s−1 − 2 rank T s + rank T s+1
and therefore conclude that the integers s1 , s2 , . . . , sr are uniquely de-
termined by T .
Let T : V → V . If V = U ⊕ W is the direct sum of two nonzero
subspaces with (U )T ⊆ U and (W )T ⊆ W , then we say that V is
decomposable with respect to T . Otherwise it is indecomposable.
20.7. Suppose V is finite dimensional and decomposable as above.
Let A be a basis for U and let C be a basis for W . If B = A ∪ C,
describe the matrix B [T ]B . What happens when V is a finite direct
sum of T -stable subspaces?
20.8. Let V be finite dimensional and assume that T is nilpotent.
Show that V is indecomposable with respect to T if and only if there
exists a basis B with B [T ]B = b(0, n). Hint. If B [T ]B = b(0, n), first
observe that ker T is 1-dimensional.
Let S, T : V → V be commuting linear transformations so that
ST = T S.
←−
20.9. If W = (V )S k or if W = (0)S k , show that (W )T ⊆ W .
20.10. Suppose S and T are both nilpotent. Prove that S + T is
also a nilpotent transformation. What happens if S and T do not
commute?


21. Jordan Canonical Form


Let T : V → V be a linear transformation with characteristic poly-
nomial given by ϕT (x) = (x − a1 )(x − a2 ) · · · (x − an ). If all the eigen-
values ai are distinct then, as we have seen previously, T can be diag-
onalized. If all the roots ai are identical, then the results of the last
section imply that T can be diagonalized by Jordan block matrices.
These assumptions are obviously the extreme cases and in this section
we finally prove a general result. The proof will be by induction on
dimF V and therefore we will have to consider T acting on subspaces
of V .
Suppose T : V → V and let W be a subspace of V . We say that W
is T -invariant or T -stable if (W )T ⊆ W . In this case, by restricting our
attention to W , we can clearly view T : W → W as a linear transfor-
mation. To avoid confusion, we denote this restricted transformation
by TW . Let us consider some simple examples.
Example 21.1. Suppose α is an eigenvector of T with correspond-
ing eigenvalue a. If W = α, then W is a 1-dimensional T -invariant
subspace of V . It is easy to see that TW : W → W is just scalar multi-
plication by a ∈ F .
Example 21.2. Let W = im T = (V )T . Then (W )T = ((V )T )T ⊆


(V )T , so W is T -invariant. In addition if W  = ker T = (0) T , then
(W  )T = 0 ⊆ W  , so W  is also T -stable. Clearly TW  = 0.
We say that V is decomposable with respect to T if V = U ⊕ W is a
direct sum of two nonzero T -invariant subspaces U and W . If no such
decomposition exists, then V is indecomposable. If V is decomposable,
it is a simple matter to describe T in terms of its action on the T -stable
subspaces.
Lemma 21.1. Let T : V → V be a linear transformation on a finite
dimensional vector space V and suppose that V = U ⊕ W is a direct
sum of two nonzero T -invariant subspaces. Let A be a basis for U and
let C be a basis for W . Then B = A ∪ C is a basis for V and we have
pictorially
⎡ ⎤
A [TU ]A 0
B [T ]B =
⎣ ⎦
0 C [TW ]C

Proof. Let A = {α1 , α2 , . . . , αr } and C = {γ1 , γ2 , . . . , γs }. Then


certainly
B = A ∪ C = {α1 , α2 , . . . , αr , γ1 , γ2 , . . . , γs }


is a basis for V = U ⊕ W . Let A [TU ]A = [aij ] and C [TW ]C = [cij ]. We


compute B [T ]B . To start with, αi ∈ U so
  
αi T = αi TU = Σj aij αj = Σj aij αj + Σk 0γk

Also, γi ∈ W so
  
γi T = γi TW = Σj cij γj = Σk 0αk + Σj cij γj

This shows that ⎡ ⎤


[aij ] 0
B [T ]B =⎣ ⎦
0 [cij ]

and the lemma is proved. 


This result will obviously be of use to us provided that we can
handle the indecomposable situation. The key to this is the following.
Lemma 21.2. Let T : V → V and suppose that dimF V = n < ∞.
←−
Set U = V T n and W = (0)T n . Then U and W are both T -invariant
subspaces of V , TU is nonsingular, TW is nilpotent, and V = U ⊕ W .
←−
Proof. Let W = (0)T n . Then
(W T )T n = (W T n )T = 0T = 0
←−
so W T ⊆ (0)T n = W and W is T -invariant. Since W T n = 0, we
conclude that TW is nilpotent.
Now consider the subspaces of V given by Ui = V T i for i =
0, 1, 2, . . .. Since
Ui+1 = V T i+1 = (V T )T i ⊆ V T i = Ui
we have V = U0 ⊇ U1 ⊇ U2 ⊇ . . .. If Ui+1 is properly contained in Ui ,
then dimF Ui+1 < dimF Ui . Since dimF V = n, it follows that for some
integer k ≤ n we must have Uk+1 = Uk . This then implies that
Uk+2 = Uk+1 T = Uk T = Uk+1 = Uk
and continuing in this manner we see that for all j ≥ k we have Uj = Uk .
In particular, since k ≤ n we therefore have
Un T = Un+1 = Un
Let U = Un . Then from the above U T = U so U is T -invariant and
furthermore TU maps U onto U . Since U is finite dimensional, we
conclude that TU is nonsingular.


It remains to show finally that V = U ⊕ W . First let α ∈ U ∩ W .


Since α ∈ W , we have αT n = 0. But α ∈ U and TU is nonsingular,
so we have α = 0. Therefore U ∩ W = 0 and U + W = U ⊕ W .
Next observe that since TU : U → U is onto, so is (TU )n : U → U . Let
β ∈ V . Then βT n ∈ U and by the above remark, there exists α ∈ U
with βT n = αT n . Then (β − α)T n = 0, so β − α = γ ∈ W and
β = α + γ ∈ U + W . Thus
V =U +W =U ⊕W
and the lemma is proved. 
We now come to the main result of this section. We obtain a de-
scription, the Jordan canonical form, of all linear transformations on a
finite dimensional vector space over an algebraically closed field.
Theorem 21.1. Let V be a finite dimensional vector space over an
algebraically closed field F and let T : V → V be a linear transforma-
tion. Then there exists a basis B of V such that
B [T ]B = diag(b(a1 , s1 ), b(a2 , s2 ), . . . , b(ar , sr ))
for suitable integers si ≥ 1 and elements ai ∈ F .
Proof. We proceed by induction on n = dimF V , the result being
clear for n = 1. Suppose first that V is decomposable with respect to
T and say V = U ⊕ W with each of U and W a proper T -invariant
subspace of V . Since dimF U < n and dimF W < n, there exist, by
induction, bases A of U and C of W with
A [TU ]A = diag(b(a1 , s1 ), b(a2 , s2 ), . . . , b(at , st ))
C [TW ]C = diag(b(at+1 , st+1 ), b(at+2 , st+2 ), . . . , b(ar , sr ))
for suitable integers si ≥ 1 and elements ai ∈ F . If B = A ∪ C then by
Lemma 21.1, B is a basis for V and
⎡ ⎤
A [TU ]A 0
B [T ]B =
⎣ ⎦
0 C [TW ]C

= diag(b(a1 , s1 ), b(a2 , s2 ), . . . , b(ar , sr ))


Thus the result follows in this case.
Now suppose that V is indecomposable with respect to T . Since F
is algebraically closed, we can choose a ∈ F to be an eigenvalue for T
and set S = T − aI. By definition S is a singular linear transformation.
←−
Let U = V S n and W = (0)S n be as in the preceding lemma. Then
U and W are both S-invariant subspaces with V = U ⊕ W . We now


observe that U and W are T -invariant. To this end, suppose that α ∈ U


and γ ∈ W . Since αS ∈ U and γS ∈ W , we have
αT = α(S + aI) = αS + aα ∈ U
γT = γ(S + aI) = γS + aγ ∈ W
Therefore V = U ⊕ W is a direct sum of T -invariant subspaces.
But V is indecomposable with respect to T so we conclude that
either U = 0 and V = W or U = V and W = 0. By the preceding
lemma, SU is nonsingular. Since we know that S : V → V is singular,
we cannot have U = V . Thus U = 0 and V = W and therefore S = SW
is nilpotent. We have shown that S = T − aI is nilpotent and thus, by
Corollary 20.1, there exists a basis B of V such that
B [T ]B = diag(b(a, s1 ), b(a, s2 ), . . . , b(a, sr ))
with s1 ≥ s2 ≥ · · · ≥ sr . The theorem is proved. 
We remark that the above Jordan blocks for T are in fact unique up
to the order in which they occur. However, we will not prove this. Also
if B [T ]B is written as above, then the elements of B have a rather nice
property. Indeed, if β ∈ B then there exists a ∈ F with β(T −aI)m = 0
for some m ≥ 1. This property characterizes what are called generalized
eigenvectors of T . Therefore an immediate consequence of the above
theorem is
Corollary 21.1. Let V be a finite dimensional vector space over
an algebraically closed field F . If T : V → V is a linear transformation,
then V has a basis B consisting of generalized eigenvectors of T .
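Computer algebra systems can produce this form directly. The sketch below (SymPy assumed, illustration only) computes the Jordan form of the matrix from Problem 19.10, whose only eigenvalue is 2. SymPy works with column vectors, but the change of basis still appears as γ −1 αγ, the same notion of similarity used in these notes.

    from sympy import Matrix

    alpha = Matrix([[2, 0, 0], [1, 2, 0], [0, 1, 2]])   # the matrix of Problem 19.10

    P, J = alpha.jordan_form()        # J = P**-1 * alpha * P
    print(J)                          # a single Jordan block b(2, 3)
    print(P.inv() * alpha * P == J)   # True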
This concludes our work on finding a basis B that makes the matrix
B [T ]B as simple as possible. We have obviously achieved our goal in case
the field F is algebraically closed, but we have done little or nothing
for nonalgebraically closed fields. Nevertheless a theory does exist in
this more general situation and one gets the so-called rational canonical
form.
The key to this form is a closer study of the polynomial ring F [x].
In the algebraically closed case, every polynomial f (x) ∈ F [x] can be
factored as
f (x) = b(x − a1 )(x − a2 ) · · · (x − an )
with b and the ai in F . However this is no longer true in general as the
examples
f (x) = x2 + 1 ∈ R[x]
g(x) = x2 − 2 ∈ Q[x]


show. Even so, one can prove that F [x] has divisibility and factoriza-
tion properties very similar to those of the ordinary integers.
If f (x) is a monic polynomial in F [x], let c(f ) denote its companion
matrix. We recall that if f (x) = xs then
b(0, s) = c(xs )
Now suppose that T : V → V is a linear transformation with charac-
teristic polynomial ϕT (x) = xn . Then our main result on nilpotent
transformations, Theorem 20.1, states that there exists a basis B such
that
B [T ]B = diag(c(xs1 ), c(xs2 ), . . . , c(xsr ))
where diag has the obvious meaning. Of course
xs1 xs2 · · · xsr = xn = ϕT (x)

This is precisely what happens in general in the rational canonical


form. Let T be given. We can then show that there exists a basis B
for V such that
B [T ]B = diag(c(f1 ), c(f2 ), . . . , c(fr ))
where the fi are monic polynomials with
ϕT (x) = f1 f2 · · · fr
We can further restrict the fi to look even more like xsi . Namely
the fi can be taken to be powers of irreducible polynomials in F [x],
that is polynomials that cannot be factored in F [x] as a product of
polynomials of smaller degree.

Problems
Let V be a finite dimensional vector space over the algebraically
closed field F and let T : V → V . For each a ∈ F let
Va = {α ∈ V | α(T − aI)n = 0 for some n ≥ 1}
21.1. Prove that Va is a subspace of V invariant under T . (Va is the
space of generalized eigenvectors for T with eigenvalue a.)
21.2. Show that Va = 0 if and only if a is an eigenvalue of T .
21.3. Let f (x) be a nonzero polynomial in F [x] and let a ∈ F with
f (a) ≠ 0. Suppose α ∈ V satisfies
αf (T ) = 0 and
α(T − aI)n = 0 for some n ≥ 1


Show that α = 0. (Hint. First prove this for n = 1. In general, set


β = α(T − aI)n−1 and consider βf (T ) and β(T − aI).)
21.4. Let α ∈ Va1 + Va2 + · · · + Var . Show that there exist integers
n1 , n2 , . . . , nr ≥ 1 with
α(T − a1 I)n1 (T − a2 I)n2 · · · (T − ar I)nr = 0
21.5. Use the above and Corollary 21.1 to conclude that
V = V a1 ⊕ V a2 ⊕ · · · ⊕ V at
where a1 , a2 , . . . , at are the distinct eigenvalues of T .
21.6. If ki = dimF Vai show that

ϕT (x) = Πi (x − ai )ki

Lemma 21.2 can be generalized to infinite dimensional vector spaces


under suitable circumstances. Let V be an arbitrary vector space over
F and let T : V → V be a linear transformation. Suppose there exists
an integer n ≥ 1 with
U = (V )T n = (V )T 2n
←− ←−
W = (0)T n = (0)T 2n
21.7. Show that U and W are both T -invariant subspaces of V and
that TU is onto and TW is nilpotent.
21.8. Prove that V = U + W .
21.9. Prove that U ∩ W = 0 so that V = U ⊕ W . (Hint. Let α ∈
U ∩ W . Since α ∈ U = U T n we have α = βT n . Then 0 = αT n = βT 2n
and we can deduce information about β.)
21.10. Show that TU is one-to-one and hence nonsingular.

CHAPTER IV

Bilinear Forms


22. Bilinear Forms


There are many other aspects of linear algebra. The one we will
consider for the remainder of this course has its roots in geometry, in
the study of conic sections. Let us consider the real Euclidean plane
R2 with (x, y)-coordinate axes. Then the conic sections are graphs of
equations of the form
ax2 + 2bxy + cy 2 + dx + ey + f = 0
As one sees in analytic geometry, the main interest here is in the qua-
dratic part
Q(x, y) = ax2 + 2bxy + cy 2
and this is a function from R2 into the real numbers R.
On the other hand, since this is a course in linear algebra, Q must
somehow be related to a linear function. It is indeed. Suppose α1 =
(x1 , y1 ) and α2 = (x2 , y2 ) are vectors in R2 . We define
B(α1 , α2 ) = ax1 x2 + bx1 y2 + by1 x2 + cy1 y2
and we now have a new function, this time from R2 × R2 to R. Now
what happens when we set α1 = α2 = α = (x, y)? Then
B(α, α) = ax2 + 2bxy + cy 2 = Q(x, y)
and we are back where we started.
Why then do we introduce B? The reason is that it is bilinear. Let
α1 , α2 be as above and set α2 = (x2 , y2 ). Then
B(α1 , α2 + α2 ) = ax1 (x2 + x2 ) + bx1 (y2 + y2 )
+ by1 (x2 + x2 ) + cy1 (y2 + y2 )
= (ax1 x2 + bx1 y2 + by1 x2 + cy1 y2 )
+ (ax1 x2 + bx1 y2 + by1 x2 + cy1 y2 )
= B(α1 , α2 ) + B(α1 , α2 )
Moreover if r ∈ R then
B(α1 , rα2 ) = ax1 (rx2 ) + bx1 (ry2 ) + by1 (rx2 ) + cy1 (ry2 )
= r(ax1 x2 + bx1 y2 + by1 x2 + cy1 y2 )
= rB(α1 , α2 )
Of course a similar result holds if we study B as a function of its first
variable.
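It is often convenient to record the coefficients a, b, c of such a form in a symmetric 2 × 2 matrix. The sketch below (SymPy assumed; arranging the coefficients in a matrix is only an illustration here) checks both the formula for B(α1 , α2 ) and the identity B(α, α) = Q(x, y).

    from sympy import Matrix, symbols, expand

    a, b, c, x, y, x1, y1, x2, y2 = symbols('a b c x y x1 y1 x2 y2')
    M = Matrix([[a, b], [b, c]])             # the coefficients arranged symmetrically

    def B(u, v):
        # the bilinear form as a row vector times M times a column vector
        return (Matrix([list(u)]) * M * Matrix(list(v)))[0, 0]

    print(expand(B((x1, y1), (x2, y2))))     # a*x1*x2 + b*x1*y2 + b*x2*y1 + c*y1*y2
    print(expand(B((x, y), (x, y))))         # a*x**2 + 2*b*x*y + c*y**2, which is Q(x, y)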
Let V be a vector space over a field F . A bilinear form is a map
B: V × V → F


such that B is a linear transformation when viewed as a function of


each of the two variables individually. Let us consider some examples.

Example 22.1. We always have the zero map, namely


B(α, β) = 0 for all α, β ∈ V
Example 22.2. If dimF V = 2, then
det : V × V → F
is certainly bilinear.
Example 22.3. Let V = F [x] and for each a ∈ F let Ea denote the
evaluation map given by αEa = α(a). For any a, b ∈ F we can define
Ba,b by
Ba,b (α, β) = (αEa )(βEb ) = α(a)β(b)
Example 22.4. Let V be the real vector space of all continuous
real valued functions on the interval 0 ≤ x ≤ 1. Let ω(x) be some fixed
member of V and define
B(α, β) = ∫_0^1 ω(t)α(t)β(t) dt

In Fourier analysis, ω is usually called the weight function.

Example 22.5. Let V be as above and now let κ(x, y) be some fixed
continuous real valued function defined on the unit square 0 ≤ x ≤ 1,
0 ≤ y ≤ 1. Then we can set
B(α, β) = ∫_0^1 ∫_0^1 κ(u, v)α(u)β(v) du dv

Here κ is usually called the kernel function.
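Since these last two examples are given by integrals, they are easy to approximate numerically. The Python sketch below uses simple Riemann sums; the particular functions α, β, ω and κ are merely illustrative choices.

    import numpy as np

    t = np.linspace(0.0, 1.0, 2001)
    dt = t[1] - t[0]

    alpha = lambda x: np.sin(np.pi * x)
    beta  = lambda x: x**2
    omega = lambda x: 1.0 + x                   # a sample weight function
    kappa = lambda x, y: np.exp(-(x - y)**2)    # a sample kernel function

    # Example 22.4:  B(α, β) = ∫ ω(t) α(t) β(t) dt
    B1 = np.sum(omega(t) * alpha(t) * beta(t)) * dt

    # Example 22.5:  B(α, β) = ∫∫ κ(u, v) α(u) β(v) du dv
    U, V = np.meshgrid(t, t, indexing="ij")
    B2 = np.sum(kappa(U, V) * alpha(U) * beta(V)) * dt * dt

    print(B1, B2)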


In our study of bilinear forms, we will be concerned with the equa-
tion B(α, β) = 0. Suppose α, β is a solution of the above so that
B(α, β) = 0. Is it necessarily true that B(β, α) = 0? Definitely not.
For example, let V and Ba,b be as in Example 22.3 with a ≠ b. If α = 1
and β = x − b, then
Ba,b (α, β) = 1·(b − b) = 0
but
Ba,b (β, α) = (a − b)·1 ≠ 0


In this unpleasant situation, the theory just does not work well. There-
fore, we will usually restrict our attention to normal bilinear forms, that
is bilinear forms B satisfying
B(α, β) = 0 if and only if B(β, α) = 0
There are two important special cases here.
First, we say that a bilinear form B is symmetric if
B(α, β) = B(β, α) for all α, β ∈ V
Obviously such a form is also normal. Second, we say that B is skew-
symmetric if
B(α, α) = 0 for all α ∈ V
The following lemma explains the name and shows that these forms
are also normal.
Lemma 22.1. Let B be a skew-symmetric bilinear form on V . Then
for all α, β ∈ V we have
B(β, α) = −B(α, β)
Hence B is normal.
Proof. We have B(α + β, α + β) = 0 and therefore
0 = B(α + β, α + β) = B(α, α + β) + B(β, α + β)
= B(α, α) + B(α, β) + B(β, α) + B(β, β)
= B(α, β) + B(β, α)
since also B(α, α) = B(β, β) = 0. Thus B(β, α) = −B(α, β) and this
clearly implies that B is normal. 
We now see that these are not only some examples of normal forms,
but they are in fact the only examples.
Theorem 22.1. Let B : V × V → F be a normal bilinear form.
Then B is either symmetric or skew-symmetric.
Proof. Let σ, τ, η ∈ V . Then by linearity
B(σ, B(σ, η)τ − B(σ, τ )η)
= B(σ, η)B(σ, τ ) − B(σ, τ )B(σ, η) = 0
Hence, since B is normal, we have
0 = B(B(σ, η)τ − B(σ, τ )η, σ)
= B(σ, η)B(τ, σ) − B(σ, τ )B(η, σ)


and we have shown that


B(σ, η)B(τ, σ) = B(η, σ)B(σ, τ ) (∗)
for all σ, τ, η ∈ V .
Let us suppose that B is not skew-symmetric. Then we can choose
a fixed γ ∈ V with B(γ, γ) ≠ 0. If α ∈ V , then setting σ = η = γ,
τ = α in (∗) yields
B(γ, γ)B(α, γ) = B(γ, γ)B(γ, α)
so since B(γ, γ) ≠ 0 we have B(α, γ) = B(γ, α) for all α ∈ V .
Now let α, β ∈ V . Since B(γ, γ) ≠ 0 and |F | ≥ 2, we can certainly
find a ∈ F with
B(α + aγ, γ) = B(α, γ) + aB(γ, γ) ≠ 0
Then setting σ = α + aγ, τ = β and η = γ in (∗) yields
B(α + aγ, γ)B(β, α + aγ) = B(γ, α + aγ)B(α + aγ, β)
But as we have seen before
B(γ, α + aγ) = B(α + aγ, γ) ≠ 0
so this common factor cancels and we obtain
B(β, α + aγ) = B(α + aγ, β)
Finally
B(β, α + aγ) = B(β, α) + aB(β, γ) and
B(α + aγ, β) = B(α, β) + aB(γ, β)
so since B(β, γ) = B(γ, β) we conclude that B(β, α) = B(α, β). Since
this is true for all α, β ∈ V , B is a symmetric bilinear form and the
result follows. 
Let B : V × V → F be a normal bilinear form. If α, β ∈ V , we
say that α and β are perpendicular and write α ⊥ β if B(α, β) = 0.
Obviously α ⊥ β if and only if β ⊥ α. If S is a nonempty subset of V ,
we write
S ⊥ = {β ∈ V | β ⊥ α for all α ∈ S}
We now list a number of trivial observations.
Lemma 22.2. Let B : V × V → F be a normal bilinear form and
let S, S1 and S2 be nonempty subsets of V . Then
i. S ⊥ is a subspace of V .
ii. S ⊆ S ⊥⊥ .
iii. S1 ⊆ S2 implies that S2⊥ ⊆ S1⊥ .
iv. (S1 ∪ S2 )⊥ = S1⊥ ∩ S2⊥ .


v. ⟨S⟩⊥ = S ⊥ .
Proof. If α ∈ V , then {α}⊥ is clearly the kernel of the linear
transformation β → B(α, β) and is therefore a subspace of V . Since
clearly
S ⊥ = ∩_{α∈S} {α}⊥

we see that S ⊥ is a subspace. This yields (i ). Parts (ii), (iii ) and (iv )
are of course obvious.
Now ⟨S⟩ ⊇ S so by (iii ) we have S ⊥ ⊇ ⟨S⟩⊥ . To obtain the reverse
inclusion, we note that S ⊥⊥ ⊇ S and since S ⊥⊥ is a subspace, we must
have S ⊥⊥ ⊇ ⟨S⟩. This says that every element of S ⊥ is perpendicular
to every element of ⟨S⟩ and hence S ⊥ ⊆ ⟨S⟩⊥ . This yields (v ). 
Finally we show
Theorem 22.2. Let V be a finite dimensional vector space over F
and let B : V × V → F be a normal bilinear form. If W is a subspace
of V , then
dimF W + dimF W ⊥ ≥ dimF V
In particular, if W ∩ W ⊥ = 0, then V = W ⊕ W ⊥ .
Proof. If W = 0, then W ⊥ = V and the result is clear. Now
suppose dimF W = s ≥ 1 and let S = {α1 , α2 , . . . , αs } be a basis
for W . Then by the previous lemma, W ⊥ = S ⊥ . Now W ⊥ = S ⊥ is
certainly the kernel of the linear transformation T : V → F^s given by
β → (B(α1 , β), B(α2 , β), . . . , B(αs , β)). Hence since
dimF im T ≤ dimF F s = s = dimF W
Theorem 9.3 yields
dimF V = dimF im T + dimF ker T
≤ dimF W + dimF W ⊥
as required.
Finally if W ∩ W ⊥ = 0, then W + W ⊥ = W ⊕ W ⊥ is a subspace of
V of dimension dimF W + dimF W ⊥ ≥ dimF V , so W ⊕ W ⊥ = V and
the theorem is proved. 
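The proof is quite concrete: once a basis of W is fixed, W ⊥ is the null space of a single matrix. The Python sketch below (the form and the subspace are arbitrary sample choices, and the null space is computed with scipy) illustrates the inequality dimF W + dimF W ⊥ ≥ dimF V for V = R3.

    import numpy as np
    from scipy.linalg import null_space

    M = np.array([[0., 1., 2.],
                  [1., 0., 0.],
                  [2., 0., 1.]])       # a symmetric, hence normal, bilinear form
    A = np.array([[1., 0., 0.]])       # coordinate rows of a basis of W

    # beta lies in W^perp exactly when A @ M @ beta = 0
    Wperp = null_space(A @ M)
    print(A.shape[0] + Wperp.shape[1])  # dim W + dim W^perp = 1 + 2 >= 3 = dim V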

Problems
22.1. Suppose we defined a skew-symmetric bilinear form by the
relation B(α, β) = −B(β, α). Would we get the same answer?


22.2. Consider the bilinear form defined in Example 22.5. What


do you think are the appropriate assumptions to make on the kernel
function κ(x, y) to guarantee that B is symmetric or skew-symmetric?
22.3. Discuss how one might define a multilinear form in n variables
on V . What would the corresponding definitions be for symmetric or
skew-symmetric forms?
22.4. Suppose n = dimF V . What are the skew-symmetric mul-
tilinear forms in n variables on V ? Can you think of a symmetric
analog?
22.5. Let V = F [x] and define B : V × V → F by
B(α, β) = α(0)β(0)
For each α ∈ V find {α}⊥ . What is V ⊥ ?

22.6. Give an example to show that dimF W + dimF W ⊥ could be


strictly larger than dimF V .
If B, B ′ : V × V → F are bilinear forms, define B + B ′ by
(B + B ′ )(α, β) = B(α, β) + B ′ (α, β)
Similarly, for a ∈ F define aB by
(aB)(α, β) = a·B(α, β)
22.7. Show that B + B ′ and aB are bilinear forms.
22.8. If B and B ′ are both symmetric (or skew-symmetric) prove
that the same is true for B + B ′ and aB. What can one say about the
sum of two normal forms?
22.9. Define B t : V × V → F by B t (α, β) = B(β, α). Discuss the
bilinear forms B + B t and B − B t .

22.10. Assume that F is a field with 2 = 1 + 1 ≠ 0. Prove that


every bilinear form is a sum of a symmetric and a skew-symmetric form
and uniquely so. What happens when 1 + 1 = 0?


23. Symmetric and Skew-Symmetric Forms


Let B : V × V → F be a normal bilinear form on the vector space
V over F . If W is a subspace of V , then it is clear that B defines, by
restriction, a normal bilinear form BW : W × W → F . That is, for all
α, β ∈ W , BW (α, β) is just B(α, β). It is necessary now to introduce a
certain amount of notation.
First, it is quite possible that BW is the zero form. That is, for all
α, β ∈ W
BW (α, β) = B(α, β) = 0
In this case, we say that W is an isotropic subspace of V . Otherwise
W is said to be nonisotropic.
Now there is a natural way of finding a particular isotropic subspace.
We define the radical of V to be
rad V = radB V = V ⊥
Observe that if α ∈ V ⊥ and β ∈ V , then B(α, β) = 0 and therefore
all the more so we have B(α, β) = 0 if β ∈ V ⊥ . Thus we see that the
radical of V is an example of an isotropic subspace.
If rad V = V ⊥ = 0 we say that V is nonsingular. In the same way,
if W is a subspace of V then W is nonsingular if W is nonsingular with
respect to the restricted bilinear form BW : W × W → F . Clearly
radBW W = W ∩ W ⊥
where W ⊥ is defined in V . Thus we see that W is a nonsingular
subspace if and only if W ∩ W ⊥ = 0. Let us consider some examples.
Example 23.1. Suppose B is a symmetric form and assume that
for some α ∈ V we have B(α, α) ≠ 0. Then certainly α ≠ 0 so W = ⟨α⟩
is a one-dimensional subspace of V . If β, γ ∈ W are both nonzero, then
β = bα and γ = cα for some nonzero b, c ∈ F and thus
BW (β, γ) = B(bα, cα) = bcB(α, α) ≠ 0
We conclude that W is a nonisotropic and in fact a nonsingular sub-
space of V . Using the geometric designation for one-dimensional spaces,
we call such a subspace W a nonisotropic line.
Example 23.2. Now let B be skew-symmetric and let U = ⟨α⟩ be
a one-dimensional subspace of V . If β, γ ∈ U then β = bα, γ = cα for
some b, c ∈ F and thus
BU (β, γ) = B(bα, cα) = bcB(α, α) = 0
since B is skew-symmetric. This shows that U is an isotropic sub-
space and therefore that a skew-symmetric form does not give rise to


nonisotropic lines. However, we do get interesting nonisotropic planes,


that is nonisotropic 2-dimensional subspaces.
Let W be a 2-dimensional subspace of V with basis {α, β} and
suppose that
B(α, α) = 0, B(α, β) = 1
B(β, α) = −1, B(β, β) = 0
Then we say that W is a hyperbolic plane. We observe now that W is
nonsingular. Let γ ∈ W and write γ = aα + bβ with a, b ∈ F . Then
by the above
B(α, γ) = B(α, aα + bβ)
= aB(α, α) + bB(α, β) = b
and
B(β, γ) = B(β, aα + bβ)
= aB(β, α) + bB(β, β) = −a
Thus if γ ∈ rad W = W ∩ W ⊥ , then a = b = 0 and γ = 0.
We see below that the above two examples are typical of the non-
singular subspaces that exist. In dealing with symmetric forms it is
necessary to eliminate certain fields from consideration. Namely we
must assume that 2 = 1 + 1 ≠ 0 in F .
Lemma 23.1. Let B be a normal bilinear form on the vector space
V and assume that V is nonisotropic.
i. If B is symmetric and 1 + 1 ≠ 0 in F , then V contains a
nonisotropic line.
ii. If B is skew-symmetric, then V contains a hyperbolic plane.
Furthermore, these subspaces are nonsingular.
Proof. (i ) Here we assume that B is symmetric. Suppose first
that there exists α ∈ V with B(α, α) ≠ 0. Then as we have seen
in Example 23.1, W = ⟨α⟩ is a nonisotropic line and a nonsingular
subspace of V .
We must now consider the case in which B(α, α) = 0 for all α ∈ V .
Then by definition, B is also skew-symmetric and by Lemma 22.1 we
have B(α, β) = −B(β, α) for all α, β ∈ V . On the other hand, B is
symmetric so
B(α, β) = −B(β, α) = −B(α, β)
and 2B(α, β) = 0. Since 2 ≠ 0 in F we conclude that B(α, β) = 0 for
all α, β ∈ V and hence V is isotropic, a contradiction.


(ii) Here B is skew-symmetric and V is nonisotropic so we can


choose α, β ∈ V with B(α, β) ≠ 0. Certainly α and β are nonzero and
if b ∈ F then B(α, bβ) = bB(α, β). We can conclude two facts from
this latter observation. First by replacing β by some bβ if necessary we
can assume that B(α, β) = 1. Second, we see that α and β are linearly
independent since otherwise α = bβ for some 0 ≠ b ∈ F and then
B(α, α) = B(α, bβ) = bB(α, β) ≠ 0
which contradicts the fact that B is skew-symmetric. Therefore W =
⟨α, β⟩ is 2-dimensional and we may assume that B(α, β) = 1. Since
B is skew-symmetric we then have B(α, α) = 0, B(β, β) = 0 and
B(β, α) = −B(α, β) = −1. Thus W is a hyperbolic plane. Finally as
we saw in Example 23.2, such planes are always nonsingular and the
lemma is proved. 

Of course these lines and planes are rather small subspaces of V . We


now consider a way to build larger subspaces. Let U, W be subspaces of
V . We say that U and W are perpendicular and write U ⊥ W if for all
α ∈ U , β ∈ W we have α ⊥ β. Clearly U ⊥ W if and only if U ⊆ W ⊥
or equivalently W ⊆ U ⊥ . Now let W1 , W2 , . . . , Wk be subspaces of V .
We say that W1 + W2 + · · · + Wk is a perpendicular direct sum if first
of all it is a direct sum and secondly if for all i ≠ j we have Wi ⊥ Wj .
Lemma 23.2. Suppose B : V × V → F is a normal bilinear form
and let
W = W1 + W2 + · · · + Wk
be a perpendicular direct sum. If each Wi is nonsingular, then so is W .
Proof. Let α = α1 + α2 + · · · + αk ∈ rad W with each αi ∈ Wi .
We will show that each αj is zero. To this end, fix j and let βj be any
element of Wj . Since Wj ⊆ W , we have B(α, βj ) = 0. Now
0 = B(α, βj ) = B(α1 + α2 + · · · + αk , βj )
= B(α1 , βj ) + B(α2 , βj ) + · · · + B(αk , βj )
and thus since Wi ⊥ Wj for all i ≠ j we have B(αj , βj ) = 0 for all
βj ∈ Wj . Therefore αj ∈ rad Wj = 0 and, since this is true for all αj ,
we have α = 0 and W is nonsingular. 

We now come to the main result of this section.


Theorem 23.1. Let V be a finite dimensional vector space over F
and let B : V × V → F be a normal bilinear form.


i. If B is symmetric and if 1 + 1 ≠ 0 in F , then


V = W 1 + W 2 + · · · + Wk + U
can be written as a perpendicular direct sum where the sub-
spaces Wi are nonisotropic lines and U is isotropic.
ii. If B is skew-symmetric, then
V = W 1 + W2 + · · · + Wk + U
is a perpendicular direct sum where the subspaces Wi are now
hyperbolic planes and U is an isotropic subspace.

Proof. We prove both parts simultaneously. Let us consider all


subspaces W of V that can be written as
W = W1 + W2 + · · · + Wk
a perpendicular direct sum of the subspaces Wi . Here Wi is a non-
isotropic line in case B is symmetric or Wi is a hyperbolic plane in case
B is skew-symmetric. Certainly W = 0 is such a subspace if we view
the empty sum as 0.
Now among all such subspaces let us choose one, say W as given
above, of maximal dimension. By Lemmas 23.1 and 23.2 we see that
W is a nonsingular subspace and thus by Theorem 22.2 we have V =
W ⊕ W ⊥ . Suppose W ⊥ is a nonisotropic subspace of V . Then by
Lemma 23.1 again, there exists a subspace Wk+1 ⊆ W ⊥ that is either
a nonisotropic line if B is symmetric or a hyperbolic plane if B is
skew-symmetric. But then W ∩ Wk+1 = 0 and Wk+1 ⊆ W ⊥ imply that
W ′ = W ⊕ Wk+1 = W1 + W2 + · · · + Wk + Wk+1
is an appropriate perpendicular direct sum with W ′ > W and hence
dimF W ′ > dimF W , a contradiction. Therefore we conclude that W ⊥
is isotropic and since W ⊕ W ⊥ is a perpendicular direct sum we have
V = W ⊕ W ⊥ = W 1 + W2 + · · · + Wk + W ⊥
and the result follows. 
It is interesting to observe that the above result essentially deter-
mines B. First let B be symmetric and let V be decomposed as above.
Let Wi = ⟨δi ⟩ and set di = B(δi , δi ). If α, β ∈ V , then we can write
these elements uniquely as
α = a1 δ1 + a2 δ2 + · · · + ak δk + α′
β = b1 δ1 + b2 δ2 + · · · + bk δk + β ′


with α′ , β ′ ∈ U . From the perpendicularity of the sum for V , we have
easily
B(α, β) = a1 b1 B(δ1 , δ1 ) + a2 b2 B(δ2 , δ2 ) + · · · + ak bk B(δk , δk )
+ B(α′ , β ′ )
= a1 b1 d1 + a2 b2 d2 + · · · + ak bk dk
Similarly if B is skew-symmetric, let Wi = ⟨γi , δi ⟩ with B(γi , δi ) = 1,
B(δi , γi ) = −1. If α, β ∈ V , then we can write
α = Σ_{i=1}^{k} (ai γi + ãi δi ) + α′
β = Σ_{i=1}^{k} (bi γi + b̃i δi ) + β ′
with α′ , β ′ ∈ U . Again the perpendicularity of the sum for V yields
B(α, β) = Σ_{i=1}^{k} B(ai γi + ãi δi , bi γi + b̃i δi ) + B(α′ , β ′ )
= Σ_{i=1}^{k} (ai b̃i − ãi bi )
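Anticipating the matrix point of view of the next section, the last formula is easy to test on a single hyperbolic plane: with respect to the basis {γ, δ} the form has matrix H below, and a short computation in Python (with arbitrarily chosen coordinates) confirms B(α, β) = a b̃ − ã b.

    import numpy as np

    H = np.array([[0., 1.],
                  [-1., 0.]])          # B(γ,δ) = 1, B(δ,γ) = −1, B(γ,γ) = B(δ,δ) = 0

    a, at = 2.0, -1.0                  # α = a γ + ã δ   (at stands for ã)
    b, bt = 5.0,  3.0                  # β = b γ + b̃ δ   (bt stands for b̃)

    alpha = np.array([a, at])
    beta  = np.array([b, bt])

    print(alpha @ H @ beta)            # B(α, β) computed from the matrix
    print(a*bt - at*b)                 # the displayed formula; both give 11.0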

Problems
Let V be a vector space over F and let B : V × V → F be a normal
bilinear form.
23.1. Let S be a nonempty subset of V . Show that S ⊥⊥⊥ = S ⊥ .
23.2. Let V = W1 + W2 + · · · + Wk be a perpendicular direct sum.
Prove that
rad V = rad W1 + rad W2 + · · · + rad Wk
23.3. In Theorem 23.1, show that the subspace U is in fact equal
to rad V .
23.4. Let V = W1 + W2 + · · · + Wk be a sum of subspaces with
Wi ⊥ Wj for all i ≠ j and with each Wi nonsingular. Prove that the
above sum is direct.
23.5. Let F be a field with 1+1 = 0. What is the difference between
symmetric and skew-symmetric forms?


23.6. Again let F be a field with 1 + 1 = 0 and suppose that B is


symmetric. Find an appropriate decomposition for the finite dimen-
sional vector space V .
A quadratic form on V is a map Q : V → F satisfying
i. Q(aα) = a2 Q(α) for all a ∈ F and α ∈ V .
ii. If R(α, β) is defined by
R(α, β) = Q(α + β) − Q(α) − Q(β)
then R is a bilinear form.
23.7. Let B be a bilinear form and set Q(α) = B(α, α). Prove that
Q is a quadratic form. What is the corresponding bilinear form R?
23.8. Show that R is symmetric and R(α, α) = 2Q(α) for all α ∈ V .
23.9. Suppose F is a field with 1 + 1 ≠ 0. Show that Q ↔ R is
a one-to-one correspondence between quadratic forms and symmetric
bilinear forms.
23.10. Let V be finite dimensional and let 1 + 1 ≠ 0 in F . Using
Theorem 23.1 show that there exists a basis {α1 , α2 , . . . , αn } of V and
field elements d1 , d2 , . . . , dn so that if α = Σi ai αi , then Q(α) = Σi ai^2 di .


24. Congruent Matrices


So far bilinear forms have really just been abstract objects for us.
As with linear transformations we can achieve a concrete realization
by fixing a basis for the vector space. It turns out that we can also
associate these forms in a rather natural way with a matrix.
Let B : V ×V → F be a bilinear form and let C = {γ1 , γ2 , . . . , γn } be
an ordered basis for V . Then for all i, j we have field elements B(γi , γj )
and, since this is a doubly subscripted system, it seems reasonable to
make these the entries of a suitable n × n matrix. Thus we define the
matrix of B with respect to C to be
[B]C = [B(γi , γj )]

Theorem 24.1. Let C = {γ1 , γ2 , . . . , γn } be a fixed basis for V .


Then the map B → [B]C yields a one-to-one correspondence between
the set of all bilinear forms B on V and the set F n×n of all n × n
matrices over F .
Proof. We first show that the map is onto. Let [cij ] ∈ F^{n×n} and
define a map B : V × V → F as follows. If α = Σi ai γi , β = Σi bi γi
are vectors in V , then
B(α, β) = Σi,j ai cij bj

Since the a’s and b’s occur linearly in this formula, it is clear that B is
a bilinear form. Finally since γi = 0γ1 + 0γ2 + · · · + 1γi + · · · + 0γn we
have B(γi , γj ) = cij so this particular B maps to the matrix [cij ] and
the map is onto.
We show now that the map is one-to-one or in other words that B
is uniquely determined by [B]C . Let α, β ∈ V and write α = Σi ai γi
and β = Σi bi γi . Then by the linear properties of B, we have
B(α, β) = B(Σi ai γi , Σj bj γj )
= Σi ai B(γi , Σj bj γj )
= Σi,j ai B(γi , γj )bj

Now the a’s and b’s depend only upon α, β and C and the field elements
B(γi , γj ) depend only upon [B]C . Thus we see that C and [B]C uniquely
determine B and the theorem is proved. 


Let us record the above formula for later use. Thus if B : V × V → F
is a bilinear form and if α = Σi ai γi and β = Σi bi γi then
B(α, β) = Σi,j ai B(γi , γj )bj (∗)
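In matrix language, formula (∗) says that B(α, β) is the product of the coordinate row of α, the matrix [B]C and the coordinate column of β (compare Problem 24.3 below). A quick Python check with an arbitrary sample matrix:

    import numpy as np

    M = np.array([[1., 2., 0.],
                  [0., 1., 3.],
                  [4., 0., 1.]])       # a sample [B]_C

    a = np.array([1., -1., 2.])        # coordinates of α
    b = np.array([0.,  3., 1.])        # coordinates of β

    lhs = sum(a[i] * M[i, j] * b[j] for i in range(3) for j in range(3))   # formula (∗)
    rhs = a @ M @ b                                                        # row · matrix · column
    print(lhs, rhs)                    # the two agree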

Now what sort of matrices do the symmetric and skew-symmetric


forms correspond to? Recall that if [cij ] is a matrix then its transpose
is defined to be [cij ]T = [dij ] where dij = cji . In other words, the
transpose of a matrix is essentially its mirror image about the main
diagonal. Let [cij ] ∈ F n×n . We say that [cij ] is a symmetric matrix if
[cij ]T = [cij ] or equivalently if cij = cji for all i, j. We say that [cij ] is a
skew-symmetric matrix if [cij ]T = −[cij ] and if all the diagonal entries
are zero or equivalently if cij = −cji for all i, j and cii = 0 for all i. Of
course we now have
Theorem 24.2. Under the correspondence B ↔ [B]C symmetric
bilinear forms correspond to symmetric matrices and skew-symmetric
forms correspond to skew-symmetric matrices.
Proof. Let C = {γ1 , γ2 , . . . , γn }. Suppose first that B is symmet-
ric. Then B(γi , γj ) = B(γj , γi ) so [B]C = [B(γi , γj )] is a symmetric
matrix. Conversely let us assume that the matrix is symmetric and let
α and β be given as in equation (∗). Then by that equation we have
B(α, β) = Σi,j ai B(γi , γj )bj
= Σi,j bj B(γj , γi )ai = B(β, α)

and B is symmetric.
Now let B be skew-symmetric. Then by Lemma 22.1 we have
B(γi , γi ) = 0 and B(γi , γj ) = −B(γj , γi ) and hence [B(γi , γj )] is a
skew-symmetric matrix. Finally let us suppose that the matrix is skew-
symmetric and let α = β in equation (∗). Then
B(α, α) = Σi,j ai B(γi , γj )aj

Since the diagonal entries of the matrix are 0, the terms with i = j do
not appear in the above sum. Thus we may assume that i ≠ j and
then the contribution from the pair {i, j} is
ai B(γi , γj )aj + aj B(γj , γi )ai
= ai aj (B(γi , γj ) + B(γj , γi )) = 0


since the matrix is skew-symmetric. This shows that B(α, α) = 0 for


all α ∈ V and hence B is skew-symmetric. 
Now as we saw at the end of the last section certain bases exist for
V that make symmetric and skew-symmetric forms look nice. It is
certainly natural to ask how the matrix [B]C changes under a change
of basis. The answer is given below.
Theorem 24.3. Let B : V × V → F be a bilinear form and let A
and C be two ordered bases for the finite dimensional vector space V .
Then
[B]A = A [I]C ·[B]C ·(A [I]C )T
where A [I]C is the usual change of basis matrix.
Proof. Let A = {α1 , α2 , . . . , αn } and C = {γ1 , γ2 , . . . , γn }. Recall
that A [I]C = [aij ] means that αi = Σj aij γj . Then by equation (∗) we
have
B(αr , αs ) = Σi,j ari B(γi , γj )asj
and the right hand side is clearly the r, s-entry of the matrix product
[aij ]·[B(γi , γj )]·[aij ]T = A [I]C ·[B]C ·(A [I]C )T . The result follows. 
Suppose that α and γ are matrices in F n×n . We say that α and γ
are congruent if there exists a nonsingular matrix β with α = βγβ T .
As an immediate consequence of the above we have
Lemma 24.1. Let α, γ ∈ F n×n and let V be some n-dimensional
vector space over F , for example V = F n . Then α and γ are congruent
matrices if and only if there exists a bilinear form B : V × V → F and
bases A and C with α = [B]A and γ = [B]C .
Proof. If α = [B]A and γ = [B]C then by the preceding theorem,
α and γ are certainly congruent. Conversely assume that α = βγβ T
for some nonsingular matrix β. Let C be a fixed ordered basis for V .
Then by Theorem 24.1 we see that there exists a bilinear form B with
[B]C = γ. Now β is a nonsingular matrix so by Lemma 12.3 there exists
a basis A of V with β = A [I]C . Thus
α = βγβ T = A [I]C ·[B]C ·(A [I]C )T = [B]A
by the above theorem and the result follows. 
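The change of basis rule is easy to test numerically. The sketch below builds a random sample matrix [B]C and a random nonsingular P = A [I]C , whose rows are the C-coordinates of the basis A, and compares P ·[B]C ·P T with the entries B(αr , αs ) computed directly.

    import numpy as np

    rng = np.random.default_rng(1)
    B_C = rng.integers(-3, 4, size=(3, 3)).astype(float)   # sample [B]_C
    P   = rng.integers(-3, 4, size=(3, 3)).astype(float)   # sample change of basis matrix
    while abs(np.linalg.det(P)) < 1e-9:                     # insist that P be nonsingular
        P = rng.integers(-3, 4, size=(3, 3)).astype(float)

    B_A = P @ B_C @ P.T
    direct = [[P[r] @ B_C @ P[s] for s in range(3)] for r in range(3)]   # B(α_r, α_s)
    print(np.allclose(B_A, direct))    # True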
We can now combine all of the above ideas to prove some results
about symmetric and skew-symmetric matrices.
Theorem 24.4. Let α be a matrix in F n×n .


i. If α is symmetric and 1 + 1 ≠ 0 in F , then α is congruent to


a diagonal matrix.
ii. If α is skew-symmetric, then α is congruent to a block diagonal
matrix of the form
diag(H, H, . . . , H, Z)
where
H = ⎡ 0  1 ⎤
    ⎣−1  0 ⎦
corresponds to the hyperbolic plane, and Z is a suitable zero
matrix.
Specifically, the block diagonal matrix in (ii ) above looks like
⎡ 0  1                         ⎤
⎢−1  0                         ⎥
⎢       0  1                   ⎥
⎢      −1  0                   ⎥
⎢             . . .            ⎥
⎢                   0  1       ⎥
⎢                  −1  0       ⎥
⎢                         0    ⎥
⎣                           . . . ⎦
with zeros in all the remaining entries.

Proof. Let V be some n-dimensional vector space over F , say


for example V = F n . Fix a basis A of V so that by Theorem 24.1,
α = [B]A for some bilinear form B.
(i ) Suppose first that α is a symmetric matrix and 1 + 1 ≠ 0 in F .
Then by Theorem 24.2, B is a symmetric bilinear form. Therefore by
Theorem 23.1, we can write
V = W1 + W2 + · · · + Wk + U
a perpendicular direct sum of the nonisotropic lines Wi and also the
isotropic space U . Let Wi = ⟨γi ⟩ and let {γk+1 , . . . , γn } be a basis for
U . Then C = {γ1 , . . . , γk , . . . , γn } is a basis for V and we consider [B]C .
Let i ≠ j. If either i ≤ k or j ≤ k, then B(γi , γj ) = 0 since the above
is a perpendicular direct sum. On the other hand, if i > k and j > k,
then γi , γj ∈ U so again B(γi , γj ) = 0 since U is isotropic. Thus we have
shown that [B]C has all zeros off the main diagonal. By Theorem 24.3,
α = [B]A is congruent to the diagonal matrix [B]C .
(ii ) Now let α be a skew-symmetric matrix. Then by Theorem 24.2,
B is a skew-symmetric bilinear form. Therefore by Theorem 23.1, we


can write
V = W 1 + W2 + · · · + Wk + U
a perpendicular direct sum of the hyperbolic planes Wi and also the
isotropic space U . Let Wi = ⟨γi , δi ⟩ with B(γi , δi ) = 1, B(δi , γi ) = −1
and let {β1 , β2 , . . . , βs } be a basis for U . Then certainly
C = {γ1 , δ1 , γ2 , δ2 , . . . , γk , δk , β1 , β2 , . . . , βs }
is a basis for V . We compute the matrix [B]C . Since the above is a
perpendicular direct sum and since U is an isotropic subspace we have
easily
B(βi , γj ) = 0 = B(γj , βi )
B(βi , δj ) = 0 = B(δj , βi )
B(βi , βj ) = 0 = B(βj , βi )
for all i, j. Furthermore, for all i ≠ j we have B(γi , γj ) = 0 = B(δi , δj )
and B(γi , δj ) = 0 = B(δj , γi ). Thus the only nonzero entries of [B]C
occur when we consider two elements of each Wi . Here we have
B(γi , γi ) = 0 B(γi , δi ) = 1
B(δi , γi ) = −1 B(δi , δi ) = 0
Therefore [B]C has the indicated block diagonal form with precisely k
blocks equal to H. Finally, by Theorem 24.3, α = [B]A is congruent to
the matrix [B]C , as required. 
Finally we should consider whether the matrices we get in the above
theorem are actually unique. For symmetric matrices α we get diagonal
matrices but the diagonal entries are by no means unique. For certain
fields we can suitably restrict the entries to get a uniqueness theorem
but the problem is certainly not solved in general. On the other hand,
if α is skew-symmetric then the only parameter here is the number of
H blocks that appear and this is in fact uniquely determined.
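The proof of part (i ) can also be carried out as a computation: performing an elementary row operation and the same column operation on a real symmetric matrix is a congruence, and repeating this reduces the matrix to diagonal form. The Python sketch below is one such reduction over R (an illustration only, not the general field argument); it returns a diagonal D and a nonsingular P with D = P ·M ·P T .

    import numpy as np

    def congruent_diagonalize(M, tol=1e-12):
        M = M.astype(float).copy()
        n = M.shape[0]
        P = np.eye(n)
        for i in range(n):
            if abs(M[i, i]) < tol:
                j = next((j for j in range(i, n) if abs(M[j, j]) > tol), None)
                if j is not None:                      # swap rows and columns i, j
                    M[[i, j]] = M[[j, i]]; M[:, [i, j]] = M[:, [j, i]]; P[[i, j]] = P[[j, i]]
                else:
                    j = next((j for j in range(i + 1, n) if abs(M[i, j]) > tol), None)
                    if j is None:
                        continue                       # row and column i are already zero
                    M[i] += M[j]; M[:, i] += M[:, j]; P[i] += P[j]   # makes M[i,i] = 2 M[i,j] nonzero
            for k in range(i + 1, n):                  # clear the rest of row and column i
                c = M[k, i] / M[i, i]
                M[k] -= c * M[i]; M[:, k] -= c * M[:, i]; P[k] -= c * P[i]
        return M, P

    A = np.array([[0., 1., -2.], [1., 2., 1.], [-2., 1., 3.]])   # the matrix of Problem 24.4
    D, P = congruent_diagonalize(A)
    print(np.allclose(P @ A @ P.T, D), np.round(np.diag(D), 3))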

Problems
24.1. Let V be the real subspace of R[x] consisting of all polyno-
mials of degree < n and let B : V × V → R be defined by B(α, β) =
∫_0^1 α(x)β(x) dx. If C is the basis of V given by {1, x, x2 , . . . , xn−1 } show
that the matrix [B]C is the Hilbert matrix of Example 16.2.
24.2. Prove that the map B : V × V → F defined in the first para-
graph of the proof of Theorem 24.1 is a bilinear form.


24.3. Let V be a vector space over F with basis C = {γ1 , γ2 , . . . , γn }
and let B : V × V → F be a bilinear form. Let α = Σi ai γi and
β = Σi bi γi be vectors in V . Then the matrix product
[a1 a2 . . . an ]·[B]C ·[b1 b2 . . . bn ]T
is a 1 × 1 matrix. Show that its entry is precisely B(α, β).
24.4. Let V be a 3-dimensional Q-vector space with basis C =
{γ1 , γ2 , γ3 } and let B : V × V → Q be the symmetric bilinear form
given by
        ⎡ 0  1 −2 ⎤
[B]C =  ⎢ 1  2  1 ⎥
        ⎣−2  1  3 ⎦
Find a basis A such that [B]A is diagonal.
24.5. Let V be a 4-dimensional Q-vector space with basis C =
{γ1 , γ2 , γ3 , γ4 } and let B : V × V → Q be the skew-symmetric bilinear
form given by
        ⎡ 0  1  2  3 ⎤
[B]C =  ⎢−1  0 −1 −2 ⎥
        ⎢−2  1  0  0 ⎥
        ⎣−3  2  0  0 ⎦
Find a basis A such that [B]A is block diagonal.
24.6. Let α ∼ β indicate that the matrices α and β are congruent.
Show that ∼ is an equivalence relation, that is show
i. α ∼ α for all α (reflexive law)
ii. α ∼ β implies β ∼ α (symmetric law)
iii. α ∼ β and β ∼ γ imply α ∼ γ (transitive law)
24.7. Let α ∈ F n×n be the diagonal matrix α = diag(a1 , a2 , . . . , an )
and let b1 , b2 , . . . , bn be nonzero elements of F . Show that α is congruent
to the matrix β = diag(a1 b1^2 , a2 b2^2 , . . . , an bn^2 ).
24.8. Let F be a field like the complex numbers in which every
element has a square root and suppose 1 + 1 ≠ 0 in F . Using the above
prove that every symmetric matrix is congruent to a diagonal matrix
with diagonal entries 0 and 1 only.
24.9. Prove that every symmetric matrix over the real numbers is
congruent to a diagonal matrix with diagonal entries 0, 1 and −1 only.
24.10. Let α ∈ F n×n be a skew-symmetric matrix and suppose α
is congruent to a block matrix with k blocks H = ( 0 1 ; −1 0 ). If C is any
basis for V = F n and if [B]C = α, show that n − 2k = dimF rad B.
Conclude that k is uniquely determined by α.


25. Inner Product Spaces


When we restrict our attention to certain special fields F , bilinear
forms can become even more interesting. Here we will assume that
F = R and start by posing a simple question about Rn . Let α =
(a1 , a2 , . . . , an ) and β = (b1 , b2 , . . . , bn ) be elements of Rn . What is the
angle θ between the line 0α joining 0 to α and the line 0β? This is
actually a 2-dimensional problem since 0, α and β are contained in the
subspace W = ⟨α, β⟩.
Let a denote the length of the line segment 0α, b the length of the
line segment 0β and c the length of αβ as indicated on the diagram.
(Figure: the triangle with vertices 0, α and β; the sides 0α, 0β and αβ have lengths a, b and c, and θ is the angle at the vertex 0.)

Then by the Pythagorean Theorem (and induction on n)


a^2 = a1^2 + a2^2 + · · · + an^2
b^2 = b1^2 + b2^2 + · · · + bn^2
Furthermore, since c corresponds to the vector α − β, we have
c^2 = (a1 − b1 )^2 + (a2 − b2 )^2 + · · · + (an − bn )^2
Now by the Law of Cosines
c2 = a2 + b2 − 2ab cos θ
so
ab cos θ = (1/2)(a^2 + b^2 − c^2 )
= a1 b1 + a2 b2 + · · · + an bn
and we have essentially found the angle θ.
Let us observe that the function of α and β given by the above right
hand side is certainly a symmetric bilinear form which we now denote
by α • β. In other words,
α • β = a1 b1 + a2 b2 + · · · + an bn


Moreover
α • α = a1^2 + a2^2 + · · · + an^2
and thus since F = R we have α • α ≥ 0 and α • α = 0 if and only if
α = 0. In this way, Rn becomes an inner product space.
Formally an inner product space is a vector space V over R with
an inner product or dot product α • β defined on it satisfying
1. The inner product • maps V × V to R and is a symmetric
bilinear form.
2. For all α ∈ V , α • α ≥ 0 and α • α = 0 if and only if α = 0.
Let us consider some examples.
Example 25.1. We have as above V = Rn and α • β defined by
α • β = a1 b1 + a2 b2 + · · · + an bn
where α = (a1 , a2 , . . . , an ) and β = (b1 , b2 , . . . , bn ).

Example 25.2. Let V be a 3-dimensional vector space over R with


basis C = {γ1 , γ2 , γ3 } and let α • β = B(α, β) where
        ⎡ 2 −1  0 ⎤
[B]C =  ⎢−1  2 −2 ⎥
        ⎣ 0 −2  4 ⎦
Then • = B is certainly a symmetric bilinear form. Now suppose α ∈ V
with α = a1 γ1 + a2 γ2 + a3 γ3 . Then

α • α = B(α, α) = Σi,j ai B(γi , γj )aj
= 2a1^2 − 2a1 a2 + 2a2^2 − 4a2 a3 + 4a3^2
= a1^2 + (a1 − a2 )^2 + (a2 − 2a3 )^2
Thus we see that α • α ≥ 0 and that α • α = 0 if and only if
a1 = 0, a1 − a2 = 0, and a2 − 2a3 = 0
and hence if and only if α = 0.
Example 25.3. Let V = R[x] and define
α • β = ∫_{−1}^{1} α(x)β(x) dx
If α = Σi ai x^i and β = Σi bi x^i , then
α • β = Σ_{i+j even} [2/(i + j + 1)] ai bj

Observe that α • α = ∫_{−1}^{1} α(x)^2 dx ≥ 0 and clearly α • α = 0 if and only
if α = 0.

Example 25.4. Let V be the subspace of all continuous real valued


functions spanned by the trigonometric functions sin x, sin 2x, sin 3x, . . .
and 1, cos x, cos 2x, cos 3x, . . .. For α, β ∈ V we define
α • β = ∫_{−π}^{π} α(x)β(x) dx

Since V consists of continuous functions of period 2π, we see easily that


V is an inner product space.

Now let V be an inner product space. We say that α and β are


orthogonal or perpendicular and write α ⊥ β if α • β = 0. We observe
that in our original example

ab cos θ = α • β

and therefore if α, β ≠ 0, then α • β = 0 if and only if cos θ = 0. But


cos θ = 0 means that θ = ±π/2 and hence geometrically that 0α and
0β are perpendicular lines.
If U and W are subspaces of V , we say 
that U and W are orthogonal
if α ⊥ β for all α ∈ U , β ∈ W . Similarly, Σi Wi is an orthogonal direct
sum if the sum is direct and if Wi ⊥ Wj for all i ≠ j.

Lemma 25.1. Let V be a finite dimensional inner product space


over R and let W be a subspace of V .
i. If W ≠ 0, then W is nonisotropic.
ii. W is nonsingular and hence V = W ⊕ W ⊥ .
iii. V = Σ_1^n Wi is an orthogonal direct sum of n = dimR V non-
isotropic lines.

Proof. Most of this follows by virtue of the fact that the inner
product is a symmetric bilinear form.
(i ) If W ≠ 0 we can choose 0 ≠ α ∈ W . By definition α • α > 0
and hence W is nonisotropic.
(ii ) If α ∈ rad W then we must have α ⊥ α and thus α = 0.
Therefore rad W = 0 and W is nonsingular so Theorem 22.2 implies
that V = W ⊕ W ⊥ .
(iii ) By Theorem 23.1 we have V = Σ_1^k Wi + U , an orthogonal
direct sum of the nonisotropic lines Wi and the isotropic subspace U .
Since (i ) implies that U = 0, the lemma is proved. 


In our original example in Rn , if α = (a1 , a2 , . . . , an ) then the length


of the line segment 0α is given by
√(a1^2 + a2^2 + · · · + an^2 ) = √(α • α)
In general this is a parameter which in some sense measures the size
of α. If V is any inner product space and if α ∈ V , then we define the
norm of α to be
∥α∥ = √(α • α)
Observe that by definition we always have α • α ≥ 0 so it does make
sense to take its square root in R. If ∥α∥ = 1 we say that α is a normal
vector.
Let S be a subset of V . Then S is said to be orthogonal if for all
α, β ∈ S with α ≠ β we have α ⊥ β. S is orthonormal if in addition
for all α ∈ S we have ∥α∥ = 1.
Lemma 25.2. Let α ∈ V and a ∈ R. Then ∥aα∥ = |a|·∥α∥. In
particular, if α ≠ 0 and a = 1/∥α∥, then ∥aα∥ = 1.
Proof. We have
∥aα∥ = √((aα) • (aα)) = √(a^2 (α • α)) = |a|·∥α∥
In particular, if α ≠ 0 then ∥α∥ > 0 and ∥aα∥ = 1 when a = 1/∥α∥. 
We now come to one of the important facts about finite dimensional
inner product spaces.
Theorem 25.1. Let V be a finite dimensional inner product space.
Then V has an orthonormal basis.

Proof. By part (iii ) of Lemma 25.1, V = Σ_1^n Wi is an orthogonal
direct sum of nonisotropic lines Wi = ⟨αi ⟩. It then follows easily that
{α1 , α2 , . . . , αn } is an orthogonal basis for V . Finally, by the preceding
lemma, for each αi there exists ai = 1/∥αi ∥ ∈ R with ∥ai αi ∥ = 1. Then
{a1 α1 , a2 α2 , . . . , an αn } is certainly an orthonormal basis for V and the
theorem is proved. 
We remark that the above result is decidedly false in general if V is
allowed to be infinite dimensional. In the finite dimensional case there
is actually a constructive procedure for finding an orthonormal basis
known as the Gram-Schmidt method. It works as follows.
Theorem 25.2. Let V be an inner product space over R with basis
{α1 , α2 , . . . , αn }. We define vectors γ1 , γ2 , . . . , γn inductively by γ1 =


α1 and
γi+1 = αi+1 − Σ_{k=1}^{i} [(αi+1 • γk )/(γk • γk )] γk

Then, for all i
Ci = { (1/∥γ1 ∥)γ1 , (1/∥γ2 ∥)γ2 , . . . , (1/∥γi ∥)γi }
is an orthonormal basis for ⟨α1 , α2 , . . . , αi ⟩. In particular Cn is an
orthonormal basis for V .

Proof. We remark first that in the above formula for γi+1 we have
denominators of the form (γk •γk ). Therefore we will have to show that
each γk is nonzero.
The proof of the theorem proceeds by induction on i. When i = 1
we certainly have γ1 = α1 ≠ 0 so C1 = { (1/∥γ1 ∥)γ1 } is an orthonormal basis
for ⟨α1 ⟩ by Lemma 25.2.
Now assume that Ci is an orthonormal basis for ⟨α1 , α2 , . . . , αi ⟩. In
particular each γk with k ≤ i is nonzero. Thus it makes sense to define
γi+1 = αi+1 − Σ_{k=1}^{i} [(αi+1 • γk )/(γk • γk )] γk

Now observe that αi+1 can be solved from this equation and hence
αi+1 ∈ ⟨γ1 , γ2 , . . . , γi+1 ⟩. But we also know, by induction, that αk ∈
⟨γ1 , γ2 , . . . , γi ⟩ for all k ≤ i. Hence
⟨α1 , α2 , . . . , αi+1 ⟩ ⊆ ⟨γ1 , γ2 , . . . , γi+1 ⟩

Now {α1 , α2 , . . . , αi+1 } is linearly independent, so the left hand sub-


space above has dimension i + 1. Since the right hand subspace is
spanned by i + 1 vectors, it has dimension at most i + 1. The inclusion
therefore becomes an equality and this implies that the i + 1 vectors
{γ1 , γ2 , . . . , γi+1 } are linearly independent and in particular γi+1 ≠ 0.
Moreover we now see that Ci+1 is a basis for ⟨α1 , α2 , . . . , αi+1 ⟩. We need
only show that it is orthonormal.
By Lemma 25.2 we know that Ci+1 consists of normal vectors and
moreover by induction γj ⊥ γk for all j < k ≤ i. Thus we need only
show that γj ⊥ γi+1 for all j ≤ i. To this end, fix some j ≤ i. Then by


the definition of γi+1 we have
γi+1 • γj = ( αi+1 − Σ_{k=1}^{i} [(αi+1 • γk )/(γk • γk )] γk ) • γj
= αi+1 • γj − Σ_{k=1}^{i} [(αi+1 • γk )/(γk • γk )] (γk • γj )
Since γk • γj = 0 in the above sum for all k ≠ j, we have
γi+1 • γj = αi+1 • γj − [(αi+1 • γj )/(γj • γj )] (γj • γj ) = 0
so γi+1 ⊥ γj and the theorem follows by induction. 
It is clear that the above result offers an elementary method for
constructing an orthonormal basis.
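For V = Rn with the usual dot product the procedure is easily programmed. The following Python sketch applies the formula of Theorem 25.2 to the rows of a matrix and then normalizes each γi .

    import numpy as np

    def gram_schmidt(A):
        gammas = []
        for alpha in A.astype(float):
            gamma = alpha.copy()
            for g in gammas:                           # subtract the projections onto earlier γ_k
                gamma -= (alpha @ g) / (g @ g) * g
            gammas.append(gamma)
        return np.array([g / np.linalg.norm(g) for g in gammas])

    A = np.array([[1., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 1.]])                       # rows: a sample starting basis of R^3
    C = gram_schmidt(A)
    print(np.allclose(C @ C.T, np.eye(3)))             # the rows of C are orthonormal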

Problems
25.1. Prove that an orthogonal set of nonzero vectors is always
linearly independent.

25.2. We consider the inner product space V = R[x] with


α • β = ∫_{−1}^{1} α(x)β(x) dx
Let D : R[x] → R[x] denote the usual derivative d/dx and note that
if α(x) is divisible by γ(x)k with k ≥ 1 then α(x)D is divisible by
γ(x)k−1 . For each integer n define αn ∈ R[x] by
αn (x) = (x^2 − 1)^n D^n
Show that αn is a polynomial of degree n and that αn • αm = 0 for all
m ≠ n. (Hint. If m < n evaluate the integral αm • αn by parts m + 1
times, differentiating αm and integrating αn .)
25.3. Prove that {α0 , α1 , α2 , . . .} is an orthogonal basis for the inner
product space V = R[x] given above. The αn (x) are suitable scalar
multiples of the so-called Legendre polynomials.
25.4. Let m and n be integers. Evaluate the integrals
∫_{−π}^{π} sin mx· sin nx dx,  ∫_{−π}^{π} sin mx· cos nx dx,  ∫_{−π}^{π} cos mx· cos nx dx
using appropriate trigonometric identities.
25.5. Let V be as in Example 25.4. Use the above to find an or-
thonormal basis for V .


25.6. Let V be a finite dimensional inner product space and let


W be a subspace. Show that any orthonormal basis for W can be
extended to an orthonormal basis for V .
25.7. Use the Gram-Schmidt method to find an orthonormal basis
for the inner product space of Example 25.2.

25.8. Suppose {α1 , α2 , . . . , αn } is an orthonormal basis for V . What


new orthonormal basis does the Gram-Schmidt method construct?
Let Z+ denote the set of positive integers and let V be the set of
all bounded functions from Z+ to R. That is α ∈ V if and only if
α : Z+ → R and there exists a finite bound M (depending on α) with
|α(n)| ≤ M for all n ∈ Z+ . For α, β ∈ V define
α • β = Σ_{n=1}^{∞} α(n)β(n)/2^n
25.9. Prove that V is a vector space over R and in fact an inner
product space.
25.10. Let W be the set of all elements α ∈ V such that α(n) = 0
for all but finitely many n ∈ Z+ . Show that W is a proper subspace of
V and find W ⊥ . Do you think V can have an orthonormal basis? See
Problem 26.10.


26. Inequalities
Let V be a finite dimensional inner product space. Then we know
that V has an orthonormal basis and we expect this basis to completely
determine the inner product structure. Indeed we have
Lemma 26.1. Let V be an inner product space with orthonormal
basis {γ1 , γ2 , . . . , γn }. If α = Σi ai γi and β = Σi bi γi are vectors in V ,
then ai = α • γi and bi = β • γi so
α = Σi (α • γi )γi
α • β = Σi ai bi = Σi (α • γi )(β • γi )
∥α∥^2 = Σi ai^2 = Σi (α • γi )^2

Proof. By linearity we have
α • β = (Σi ai γi ) • (Σj bj γj )
= Σi,j ai bj (γi • γj )
But γi • γj = 0 for i ≠ j and γi • γi = 1, so the above becomes
α • β = Σi ai bi
In particular, if β = γi this yields α • γi = ai and similarly β • γi = bi .
Thus α = Σi (α • γi )γi and
α • β = Σi (α • γi )(β • γi )
Finally if β = α then
∥α∥^2 = α • α = Σi ai^2 = Σi (α • γi )^2
and the lemma is proved. 
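For V = Rn these formulas say that the coordinates of α with respect to an orthonormal basis are just dot products, and they are easily confirmed numerically. In the sketch below the orthonormal basis is produced, for convenience, by a QR factorization of a random matrix.

    import numpy as np

    Q = np.linalg.qr(np.random.default_rng(2).normal(size=(3, 3)))[0]
    C = Q.T                                   # rows of C: an orthonormal basis of R^3
    alpha = np.array([1.0, -2.0, 0.5])

    coeffs = C @ alpha                        # a_i = α • γ_i
    print(np.allclose(alpha, coeffs @ C))     # α = Σ (α • γ_i) γ_i
    print(np.isclose(alpha @ alpha, np.sum(coeffs**2)))   # ∥α∥^2 = Σ (α • γ_i)^2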


Let us return for a moment to Example 25.1 and the comments at
the beginning of the preceding section. Thus V = Rn and
ab cos θ = α • β
where α = (a1 , a2 , . . . , an ), β = (b1 , b2 , . . . , bn ), a is the length of 0α, b
is the length of 0β and θ is the angle between the two line segments.


Clearly
a = √(a1^2 + a2^2 + · · · + an^2 ) = ∥α∥
b = √(b1^2 + b2^2 + · · · + bn^2 ) = ∥β∥
so
cos θ = (α • β)/(∥α∥·∥β∥)
Finally | cos θ| ≤ 1 so we have shown here that
1 ≥ (α • β)/(∥α∥·∥β∥)
While the above inequality is of interest, the above proof is certainly
unsatisfying. It is based on certain geometric reasoning that just is not
set on as firm a foundation as our algebraic reasoning. Admittedly,
geometry can be axiomatized in such a way that the above proof will
have validity in our situation, but most of us have not seen this formal
approach to the subject. Thus it is best for us to look elsewhere for
a proof of this inequality. In fact, we will offer three entirely different
proofs of the result, the first of which is probably best since it avoids
the use of bases.
The following is called the Cauchy-Schwarz inequality.
Theorem 26.1. Let V be an inner product space and let α, β ∈ V .
Then
∥α∥·∥β∥ ≥ |α • β|
Moreover equality occurs if and only if one of α or β is a scalar multiple
of the other.
Proof. We observe that the theorem is trivially true if one of α
or β is zero. Thus we can certainly assume that both α and β are
nonzero. Moreover if α and β are scalar multiples of each other then
we easily get equality here. Finally since α and β are contained in
α, β, a finite dimensional subspace of V , we may clearly assume that
V is finite dimensional.
First Proof. Let x ∈ R. Then
0 ≤ ∥α − xβ∥^2 = (α − xβ) • (α − xβ)
= ∥α∥^2 − 2x (α • β) + x^2 ∥β∥^2
What we have here is a parabola
y = ∥α∥^2 − 2x (α • β) + x^2 ∥β∥^2


in the (x, y)-plane that does not go below the x-axis. We obviously
derive the most information from this by choosing x to be that real
number that minimizes y, namely we take x = (α • β)/∥β∥^2 . Then
0 ≤ ∥α∥^2 − 2 [(α • β)/∥β∥^2 ] (α • β) + [(α • β)^2 /∥β∥^4 ] ∥β∥^2
= ∥α∥^2 − (α • β)^2 /∥β∥^2
This yields (α • β)^2 ≤ ∥α∥^2 ∥β∥^2 and the first part follows. Moreover we
have equality if and only if, for this particular x, we have ∥α − xβ∥^2 = 0
and hence if and only if α = xβ. 

Second Proof. We may suppose that V is finite dimensional so
that by Theorem 25.1, V has an orthonormal basis {γ1 , γ2 , . . . , γn }. If
α = Σi ai γi and β = Σi bi γi then by Lemma 26.1
2 [ ∥α∥^2 ∥β∥^2 − (α • β)^2 ]
= (Σi ai^2 )(Σj bj^2 ) + (Σi bi^2 )(Σj aj^2 ) − 2 (Σi ai bi )(Σj aj bj )
= Σi,j (ai^2 bj^2 + bi^2 aj^2 − 2ai bi aj bj )
= Σi,j (ai bj − bi aj )^2 ≥ 0
Moreover if say a1 ≠ 0 then equality implies that ai b1 − bi a1 = 0 for
all i so since β ≠ 0 we have b1 ≠ 0 and then ai /a1 = bi /b1 . Thus
(1/a1 )α = (1/b1 )β. 

Third Proof. Again we may suppose that V is finite dimensional


so we can extend {β} to a basis {β1 , β2 , . . . , βn } of V with β1 = β.
We then apply the Gram-Schmidt procedure to this basis and get an
orthonormal basis {γ1 , γ2 , . . . , γn } of V with

γ1 = (1/∥β1 ∥)β1 = (1/∥β∥)β


Then by Lemma 26.1



∥α∥^2 = Σi (α • γi )^2 ≥ (α • γ1 )^2
= (α • (1/∥β∥)β)^2 = (1/∥β∥^2 )(α • β)^2
and the inequality is proved. Finally, if equality occurs then we must
have ∥α∥^2 = (α • γ1 )^2 so (α • γi ) = 0 for all i > 1 and α = (α • γ1 )γ1 is
a scalar multiple of β. 
This completes all three proofs of the theorem. 
The following application is known as the Triangle Inequality for
reasons that will be explained later.
Theorem 26.2. Let V be an inner product space and let α, β ∈ V .
Then
∥α + β∥ ≤ ∥α∥ + ∥β∥
Proof. We have
∥α + β∥^2 = (α + β) • (α + β) = (α • α) + 2(α • β) + (β • β)
Since α • β ≤ ∥α∥ ∥β∥ this yields
∥α + β∥^2 ≤ ∥α∥^2 + 2∥α∥ ∥β∥ + ∥β∥^2 = (∥α∥ + ∥β∥)^2
and thus, by taking square roots, we have ∥α + β∥ ≤ ∥α∥ + ∥β∥. 
Suppose we return to our original example concerning Rn in this
section and let α = (a1 , a2 , . . . , an ) and β = (b1 , b2 , . . . , bn ). Then the
length of the line segment αβ is
√((a1 − b1 )^2 + (a2 − b2 )^2 + · · · + (an − bn )^2 ) = ∥α − β∥
Now let γ be a third vector in Rn and form the triangle with α, β and
γ as vertices. Since a straight line is the shortest distance between two
points, it follows that the sum of the lengths of two sides of the triangle
is greater than the length of the third. In particular we must have
∥α − γ∥ + ∥γ − β∥ ≥ ∥α − β∥
Since (α − γ) + (γ − β) = α − β, this statement is an immediate
consequence of the above Triangle Inequality.
Finally we consider an elementary formula that is known as the
Parallelogram Law.
Lemma 26.2. Let V be an inner product space and let α, β ∈ V .
Then
∥α − β∥^2 + ∥α + β∥^2 = 2∥α∥^2 + 2∥β∥^2


Proof. We merely observe that


∥α − β∥^2 + ∥α + β∥^2 = (α − β) • (α − β) + (α + β) • (α + β)
= ∥α∥^2 − 2(α • β) + ∥β∥^2
+ ∥α∥^2 + 2(α • β) + ∥β∥^2
= 2∥α∥^2 + 2∥β∥^2
and the result follows. 
Again consider Rn and nonzero vectors α and β. Then the quadri-
lateral drawn below with vertices 0, α, α + β and β is clearly a parallel-
ogram. Thus there are two sides of length ∥α∥ and two of length ∥β∥.
On the other hand the diagonals have lengths ∥α + β∥ and ∥α − β∥.
Thus the above lemma asserts that the sum of the squares of the lengths
of the four sides of a parallelogram is equal to the sum of the squares
of the lengths of the two diagonals.

(Figure: the parallelogram with vertices 0, α, α + β and β.)

It is interesting to observe that this essentially trivial parallelogram


law really characterizes the norm map. The following proof is not really
algebraic. It uses limit properties of the real numbers.
Theorem 26.3. Let V be a vector space over R and let ∥ ∥ : V → R
satisfy
i. ∥α∥ ≥ 0 for all α and ∥α∥ = 0 if and only if α = 0.
ii. ∥rα∥ = |r|·∥α∥ for all r ∈ R.
iii. ∥α − β∥^2 + ∥α + β∥^2 = 2∥α∥^2 + 2∥β∥^2 for all α, β ∈ V .
Then V is an inner product space with ∥ ∥ as the associated norm.


Proof. Using (iii ) we define a function V × V → R by the two


equivalent formulas
α • β = (1/2)[ ∥α + β∥^2 − ∥α∥^2 − ∥β∥^2 ]
= (1/2)[ ∥α∥^2 + ∥β∥^2 − ∥α − β∥^2 ]
and we show that this is an inner product. Certainly α • β = β • α and
α • α = (1/2)[ ∥α∥^2 + ∥α∥^2 − ∥0∥^2 ] = ∥α∥^2
so by (i) α • α ≥ 0 and α • α = 0 if and only if α = 0. It remains to
show that α • β is bilinear.
For convenience, let us rewrite (iii ) as
∥σ − τ ∥^2 + ∥σ + τ ∥^2 = 2∥σ∥^2 + 2∥τ ∥^2
and we apply this identity three times with the substitutions: (1) σ =
α + β + γ, τ = α ; (2) σ = α + β, τ = α + γ ; (3) σ = β, τ = γ. We
obtain
(1) 2∥α + β + γ∥^2 + 2∥α∥^2 = ∥2α + β + γ∥^2 + ∥β + γ∥^2
(2) ∥2α + β + γ∥^2 + ∥β − γ∥^2 = 2∥α + β∥^2 + 2∥α + γ∥^2
(3) 2∥β∥^2 + 2∥γ∥^2 = ∥β + γ∥^2 + ∥β − γ∥^2
Adding these, cancelling the terms ∥2α + β + γ∥^2 and ∥β − γ∥^2 , and
dividing by 2 yields easily
∥α + β + γ∥^2 + ∥α∥^2 + ∥β∥^2 + ∥γ∥^2 (∗)
= ∥α + β∥^2 + ∥α + γ∥^2 + ∥β + γ∥^2
Now, by definition, we have
2 [α • (β + γ)] − 2(α • β) − 2(α • γ)
= [ ∥α + β + γ∥^2 − ∥α∥^2 − ∥β + γ∥^2 ]
− [ ∥α + β∥^2 − ∥α∥^2 − ∥β∥^2 ]
− [ ∥α + γ∥^2 − ∥α∥^2 − ∥γ∥^2 ]
= [ ∥α + β + γ∥^2 + ∥α∥^2 + ∥β∥^2 + ∥γ∥^2 ]
− [ ∥α + β∥^2 + ∥α + γ∥^2 + ∥β + γ∥^2 ]
= 0
by equation (∗) above. Thus
α • (β + γ) = α • β + α • γ


and by symmetry we also have

(β + γ) • α = β • α + γ • α

Let m, n be positive integers. Then induction yields α • (nβ) =


n(α • β) so
m [α • (n/m)β] = α • nβ = n(α • β)
and
α • (n/m)β = (n/m)(α • β)
Moreover, by definition and (ii ) we have
α • (−β) = (1/2)[ ∥α∥^2 + ∥−β∥^2 − ∥α + β∥^2 ]
= −(1/2)[ ∥α + β∥^2 − ∥α∥^2 − ∥β∥^2 ] = −(α • β)
and this shows that for all b ∈ Q, α • bβ = b(α • β). Moreover, for all
a ∈ Q, symmetry yields aα • β = a(α • β).
Let r ∈ R with r ≠ 0 and let a ∈ Q with a > 0. Then by (ii )
rα • aβ = (1/2)[ ∥rα∥^2 + ∥aβ∥^2 − ∥rα − aβ∥^2 ]
≤ (1/2)[ ∥rα∥^2 + ∥aβ∥^2 ]
= (1/2) r^2 ∥α∥^2 + (1/2) a^2 ∥β∥^2
Since rα • aβ = a(rα • β) and a > 0 this yields
rα • β ≤ (r^2 /2a) ∥α∥^2 + (a/2) ∥β∥^2
If we now choose a so that |r|/2 < a < 2|r| the above easily becomes
rα • β ≤ |r| [ ∥α∥^2 + ∥β∥^2 ]
Moreover replacing β by −β and using ∥−β∥ = ∥β∥ we get
−(rα • β) = rα • (−β) ≤ |r| [ ∥α∥^2 + ∥β∥^2 ]
and thus
|(rα • β)| ≤ |r| [ ∥α∥^2 + ∥β∥^2 ] (∗∗)


Finally fix α, β ∈ V and s ∈ R. We choose c ∈ Q with c “close to


but not equal to” s and obtain
|sα • β − s(α • β)| = | [sα • β − s(α • β)] − [cα • β − c(α • β)] |
= |(s − c)α • β − (s − c)(α • β)|
≤ |(s − c)α • β| + |s − c||α • β|
≤ |s − c| [ ∥α∥^2 + ∥β∥^2 + |α • β| ]

by equation (∗∗) with r = s − c. Now letting c ∈ Q approach s, we


observe that the above right hand side approaches 0 and we conclude
that
(sα) • β = s(α • β)
By symmetry
α • (sβ) = s(α • β)
so • is a bilinear form and the result follows. 

Problems
26.1. Prove geometrically that the sum of the squares of the lengths
of the four sides of a parallelogram is equal to the sum of the squares
of the lengths of the two diagonals. (Hint. An obvious approach is to
try two applications of the Law of Cosines.)
Let V be an inner product space.
26.2. Show that equality occurs in the Cauchy-Schwarz inequality
if α = aβ or β = bα for some a, b ∈ R.
 
26.3. Prove that | ∥α∥ − ∥β∥ | ≤ ∥α − β∥ for all α, β ∈ V .
A linear transformation T : V → V is said to be unitary if ∥αT ∥ =
∥α∥ for all α ∈ V .
26.4. Let T : V → V be a linear transformation. Show that T is
unitary if and only if (αT ) • (βT ) = α • β for all α, β ∈ V .
26.5. Let C = {γ1 , γ2 , . . . , γn } be an orthonormal basis for V . Prove
that T : V → V is unitary if and only if (C)T = {γ1 T, γ2 T, . . . , γn T } is
also an orthonormal basis.
26.6. Let C be as above. Show that T is unitary if and only if
C [T ]C ·(C [T ]C )^T = In , where ( )^T is the matrix transpose map.


Let V denote the vector space of all bounded real valued functions
α : Z+ → R as defined immediately before Problem 25.9 and recall that
α • β = Σ_{n=1}^{∞} α(n)β(n)/2^n
Then we know that V is an inner product space. Suppose by way of
contradiction that V has an orthonormal basis C.
26.7. Show that V is infinite dimensional and therefore that C has
a countably infinite subset {γ1 , γ2 , . . . , γi , . . .}.
26.8. For each γi as above, let ci > 0 denote a finite upper bound
for {|γi (n)| | n ∈ Z+ }. Now define α : Z+ → R by
α(n) = Σ_{i=1}^{∞} γi (n)/(2^i ci )
(Note that we are not adding infinitely many functions since this is not
defined on V , but rather we are adding infinitely many real function
values.) Prove that α ∈ V .
26.9. For all i show that
α • γi = 1/(2^i ci ) ≠ 0
(Hint. α • γi is a double sum and can be computed by interchanging
the order of summation and using the values of the various γj • γi .)
26.10. Use the preceding problem and the ideas of Lemma 26.1 to
deduce that when α is written in terms of the basis C that all γi must
occur. Conclude that V does not have an orthonormal basis.


27. Real Symmetric Matrices


Here we study real symmetric matrices. As in the preceding sec-
tions, it will be easier to study certain vector spaces and then translate
the results into matrix language. We will consider two different prop-
erties of the matrix, namely its behavior as a bilinear form and then
as a linear transformation.
Let B be a bilinear form on the real vector space V and let W
be a subspace. We say that W is a positive semidefinite subspace if
B(β, β) ≥ 0 for all β ∈ W . We say that W is positive definite if in
addition B(β, β) = 0 only when β = 0. Analogously, we say that W is
negative semidefinite if B(β, β) ≤ 0 for all β ∈ W and W is negative
definite if in addition B(β, β) = 0 only when β = 0. The following is
known as Sylvester’s Law of Inertia.
Theorem 27.1. Let B : V × V → R be a symmetric bilinear form
on the n-dimensional real vector space V . Then V has a basis C =
{γ1 , γ2 , . . . , γn } with
[B]C = diag(1, 1, . . . , 1, −1, −1, . . . , −1, 0, 0, . . . , 0)
Moreover the number of 1’s, −1’s and 0’s that occur on the diagonal
are uniquely determined by B.
Proof. By Theorem 23.1, we can write
V = W1 + W2 + · · · + Wk + U
a perpendicular direct sum of the nonisotropic lines Wi and the isotropic
subspace U . Let Wi = ⟨αi ⟩ and let {αk+1 , . . . , αn } be a basis for U .
It follows that A = {α1 , α2 , . . . , αk , . . . , αn } is a basis for V and that
[B]A is diagonal with diagonal entries B(αi , αi ) = ai . Furthermore, if
0 = ci ∈ R and if γi = ci αi , then C = {γ1 , γ2 , . . . , γn } is also a basis for
V . Clearly
[B]C = diag( B(γ1 , γ1 ), B(γ2 , γ2 ), . . . , B(γn , γn ) )
= diag( c1^2 a1 , c2^2 a2 , . . . , cn^2 an )
Now we can certainly choose ci ≠ 0 so that ci^2 ai = 1, −1 or 0 accordingly
as ai > 0, ai < 0 or ai = 0 and we do so. Finally, we order the vectors
in C so that the +1 entries come first, then the −1’s and lastly the 0’s.
In this way
[B]C = diag( 1, . . . , 1, −1, . . . , −1, 0, . . . , 0 )
with r entries equal to 1, s entries equal to −1 and t entries equal to 0.
Let r, s and t be as above. We show that these parameters are
uniquely determined by B. To this end, let us first make a notational


change so that the orthogonal basis C is written as


C = {α1 , α2 , . . . , αr , β1 , β2 , . . . , βs , γ1 , γ2 , . . . , γt }
with B(αi , αi ) = 1, B(βi , βi ) = −1 and B(γi , γi ) = 0. Set
W + = ⟨α1 , α2 , . . . , αr ⟩,  W − = ⟨β1 , β2 , . . . , βs , γ1 , γ2 , . . . , γt ⟩

Suppose δ ∈ W + . Then δ = Σi di αi so
B(δ, δ) = Σi B(di αi , di αi ) = Σi di^2

Thus B(δ, δ) ≥ 0 and B(δ, δ) = 0 if and only if δ = 0. In other words,


W + is a positive definite subspace and hence V has a positive definite
subspace of dimension r.  
Now let δ ∈ W − . Then δ = Σi di βi + Σj ej γj so
B(δ, δ) = Σi B(di βi , di βi ) + Σj B(ej γj , ej γj ) = −Σi di^2 ≤ 0

and thus W − is a negative semidefinite subspace of dimension equal to


s + t = n − r. Finally let W be any positive definite subspace of V .
If δ ∈ W ∩ W − , then B(δ, δ) ≥ 0 since δ ∈ W and B(δ, δ) ≤ 0 since
δ ∈ W − . Thus B(δ, δ) = 0 and since W is positive definite we have
δ = 0. Therefore W ∩ W − = 0 so
dimR W + dimR W − ≤ dimR V = n
and hence dimR W ≤ r since dimR W − = n − r. It follows that r is the
largest dimension of a positive definite subspace of V and therefore r
is uniquely determined by B.
We obtain s in a similar manner by reversing the roles of the α’s
and the β’s. We set
W − = ⟨β1 , β2 , . . . , βs ⟩,  W + = ⟨α1 , α2 , . . . , αr , γ1 , γ2 , . . . , γt ⟩
and as above we conclude that s is the largest dimension of a negative
definite subspace of V . Thus s is uniquely determined by B and since
t = n − r − s, the theorem is proved. 
We can now translate the above into a result on matrices.
Corollary 27.1. Let α ∈ Rn×n be a symmetric matrix. Then α is
congruent to a diagonal matrix with diagonal entries 1, −1 and 0 only.
Moreover the number of 1’s, −1’s and 0’s is uniquely determined by α.
Proof. Choose some n-dimensional real vector space V , say V =
Rn , and fix a basis A of V . Then, by Theorems 24.1 and 24.2, there
exists a unique bilinear form B : V × V → R with [B]A = α. By the
preceding theorem, there exists a basis C of V such that [B]C is diagonal


with diagonal entries 1, −1 and 0 and with the number of these entries
determined by B and hence by α. Since α = [B]A is congruent to [B]C
by Lemma 24.1, we see that α is congruent to an appropriate diagonal
matrix.
Conversely assume that α is congruent to a suitable diagonal matrix
γ and let V = Rn have basis A. Then B can be chosen so that α = [B]A
and B depends only upon α. Since α is congruent to γ, Lemma 24.1
implies that [B]C = γ for some basis C of V . The preceding theorem
now implies that the number of diagonal entries of γ equal to 1, −1 or
0 is uniquely determined by B and hence by α. 
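Over R there is a convenient computational shortcut for the three counts. By Theorem 27.2 and Corollary 27.2 below, a real symmetric matrix can be written as Q·diag(λ1 , . . . , λn )·QT with Q orthogonal; since QT = Q−1 this is also a congruence, so by the present corollary the counts are just the numbers of positive, negative and zero eigenvalues. A Python sketch (a numerical shortcut, not the argument given above):

    import numpy as np

    def inertia(A, tol=1e-9):
        eig = np.linalg.eigvalsh(A)               # eigenvalues of the symmetric matrix A
        return (int(np.sum(eig > tol)),           # number of +1 entries
                int(np.sum(eig < -tol)),          # number of −1 entries
                int(np.sum(np.abs(eig) <= tol)))  # number of 0 entries

    A = np.array([[0., 1., -2.], [1., 2., 1.], [-2., 1., 3.]])
    print(inertia(A))                             # (2, 1, 0) for this sample matrix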

We now move on to consider linear transformation properties of


real symmetric matrices. Again we start by studying vector spaces and
linear transformations. Let V be an inner product space. A linear
transformation T : V → V is said to be symmetric if for all α, β ∈ V
we have
αT • β = α • βT
Recall that the complex numbers are algebraically closed, so every
nonconstant polynomial in C[x] has a root in C. It follows that every
nonzero polynomial in C[x] can be factored into a product of linear
factors. For the real numbers we have
Lemma 27.1. Every nonzero polynomial in R[x] can be factored into
a product of linear and quadratic factors.
Proof. We proceed by induction on the degree of the polynomial
f , the result being trivial for deg f = 0 or 1. Now let f (x) ∈ R[x] be
the nonzero polynomial
f (x) = Σ_{i=0}^{n} ai x^i

of degree n > 1 and assume that the result holds for all polynomials
of smaller degree. We embed R in the complex numbers C and then
R[x] ⊆ C[x].
Let ¯ : C → C denote complex conjugation. Since C is algebraically
closed, we can let b ∈ C be a root of f . Then since the coefficients ai
are real, we have
0 = 0̄ = ( Σi ai b^i )¯ = Σi āi b̄^i
= Σi ai b̄^i = f (b̄)


and b̄ is also a root of f . By Lemma 18.4 we have f (x) = (x − b)g(x)


for some g(x) ∈ C[x]. Indeed, if b is real, then that lemma implies that
g(x) ∈ R[x], and the result follows by induction since deg g < deg f .
If b is not real, then b̄ ≠ b is also a root of f , so b̄ is a root of g.
Hence g(x) = (x − b̄)h(x) and f (x) = (x − b)(x − b̄)h(x). Finally, we
see that
(x − b)(x − b̄) = x^2 − (b + b̄)x + b b̄
is a real quadratic polynomial. Furthermore by long division, h(x) ∈
R[x] and the result again follows by induction since deg h < deg f . 
With this lemma, we can obtain
Lemma 27.2. Let V be a finite dimensional inner product space and
let T : V → V be a symmetric linear transformation. Then T has an
eigenvalue in R.
Proof. By the Cayley-Hamilton Theorem, there exists a nonzero
polynomial f (x) ∈ R[x] of minimal degree with f (T ) = 0. Clearly
deg f ≥ 1, and by the previous lemma, f (x) = g(x)h(x) where h(x)
is monic and either linear or quadratic. Now deg g < deg f , so by the
minimal nature of deg f we have g(T ) = 0. Hence there exists γ ∈ V
with α = γg(T ) = 0. Lemma 19.2 now implies that
αh(T ) = γg(T )h(T ) = γf (T ) = 0
If h(x) = x − a is linear then
0 = αh(T ) = αT − aα
and a ∈ R is an eigenvalue for T .
Suppose now that h(x) = x2 + 2ax + b is quadratic. Then
0 = αh(T ) = αT 2 + 2aαT + bα
and hence
0 = ( αT^2 + 2a(αT ) + bα ) • α
= (αT^2 • α) + 2a(αT • α) + b(α • α)
Note that T is symmetric, so αT^2 • α = αT • αT = ∥αT ∥^2 and the
above yields
0 = ∥αT ∥^2 + 2a(αT • α) + b∥α∥^2 (∗)
On the other hand
0 ≤ ∥αT + aα∥^2 = (αT + aα) • (αT + aα)
= ∥αT ∥^2 + 2a(αT • α) + a^2 ∥α∥^2


and subtracting (∗) from the above yields


0 ≤ (a² − b)‖α‖²
Since α ≠ 0 we have ‖α‖² > 0 and we conclude that a² − b ≥ 0.
Therefore the numbers c1 = −a + √(a² − b) and c2 = −a − √(a² − b) are
real and these are the roots of h(x) so h(x) = (x − c1 )(x − c2 ). This
shows that f (x) has a linear factor and as we have seen this implies
that T has a real eigenvalue. 
The existence of this one eigenvalue allows us to deduce a good deal
of information about T . We have
Theorem 27.2. Let V be a finite dimensional inner product space
and let T : V → V be a symmetric linear transformation. Then V has
an orthonormal basis consisting of eigenvectors of T .

Proof. We proceed by induction on dimR V , the result being trivial
if dimR V = 1. Let dimR V = n > 1 and suppose that the result is
true for all smaller dimensional spaces. Let T : V → V be symmetric.
By the preceding lemma, T has an eigenvalue a ∈ R with correspond-
ing eigenvector α. Since bα is also an eigenvector for any b ∈ R with
b ≠ 0, we can certainly assume that ‖α‖ = 1. If W = α⊥ then by
Lemma 25.1 we have V = ⟨α⟩ ⊕ W , an orthogonal direct sum. Clearly
dimR W = n − 1 and W is an inner product space.
We show now that (W )T ⊆ W . To this end, let β ∈ W . Since T is
symmetric and β ∈ α⊥ we have
α • βT = αT • β = (aα) • β = 0
Hence βT ∈ α⊥ = W . Thus TW : W → W and clearly TW is sym-
metric. By induction W has an orthonormal basis {α2 , α3 , . . . , αn }
consisting of eigenvectors for TW and hence for T . Setting α1 = α it
follows that {α1 , α2 , . . . , αn } is an orthonormal basis for V consisting
of eigenvectors for T and the result follows. 
Finally we have
Corollary 27.2. Let α ∈ Rn×n be a symmetric matrix. Then α
is similar to a diagonal matrix.
Proof. Let V be some n-dimensional inner product space, for ex-
ample we could take V = Rn with the usual dot product. If C =
{γ1 , γ2 , . . . , γn } is an orthonormal basis for V , then we can find a linear
transformation T : V → V such that C [T ]C = α. We show now that T
is symmetric.

Write α = [aij ] and let β = ∑i bi γi and δ = ∑j dj γj be vectors in
V . Then
βT = ∑i,j bi aij γj
δT = ∑i,j dj aji γi

so since C is orthonormal we have
βT • δ = (∑i,j bi aij γj ) • (∑j dj γj )
= ∑i,j bi aij dj

and
β • δT = (∑i bi γi ) • (∑i,j dj aji γi )
= ∑i,j bi aji dj

But α is symmetric, so aij = aji and thus βT • δ = β • δT as required.


Finally by the previous theorem, V has an orthonormal basis A
consisting of eigenvectors of T and hence A [T ]A is diagonal. Since
α = C [T ]C is similar to A [T ]A , the proof is complete. 
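
The diagonalization in Corollary 27.2 is easy to carry out by machine. The following is only an illustrative sketch in Python, assuming the NumPy library is available; numpy.linalg.eigh is one standard routine that returns real eigenvalues together with an orthonormal basis of eigenvectors for a real symmetric matrix, and the particular matrix used is an arbitrary example.

    import numpy as np

    # an arbitrary real symmetric matrix
    alpha = np.array([[2.0, 1.0, 0.0],
                      [1.0, 3.0, 1.0],
                      [0.0, 1.0, 2.0]])

    # eigh is intended for symmetric (Hermitian) input; it returns real
    # eigenvalues and a matrix Q whose columns are orthonormal eigenvectors
    eigenvalues, Q = np.linalg.eigh(alpha)

    assert np.allclose(Q.T @ Q, np.eye(3))                     # orthonormal basis
    assert np.allclose(Q.T @ alpha @ Q, np.diag(eigenvalues))  # diagonal form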

Problems
27.1. Let α ∈ Rn×n . Show that α is similar to a diagonal matrix if
and only if it is similar to a symmetric matrix.
27.2. Let V be an inner product space with orthonormal basis C =
{γ1 , γ2 , . . . , γn } and let α ∈ Rn×n . Define the linear transformations T
and Tᵀ by
C [T ]C = α,    C [Tᵀ]C = αᵀ
Prove that βT • δ = β • δTᵀ for all vectors β, δ ∈ V .
Let V be an n-dimensional real vector space and let B : V ×V → R
be a symmetric bilinear form. Suppose there exists a basis C of V with
[B]C = diag(1, 1, . . . , 1, −1, −1, . . . , −1, 0, 0, . . . , 0)
where there are r entries equal to 1, s entries equal to −1 and t entries
equal to 0.

27.3. Let U be an isotropic subspace of V . Prove that dimR U ≤
n − r and n − s.


27.4. Show that there exists an isotropic subspace U of dimension
n − max{r, s} = t + min{r, s}.
Let V be an n-dimensional real vector space having the basis C =
{γ1 , γ2 , . . . , γn } and let B : V × V → R be a symmetric bilinear form.
Suppose α is the real symmetric matrix given by [B]C = α. We obtain
a condition on α equivalent to B being an inner product.
27.5. Suppose B is an inner product. Deduce that α is congruent
to the identity matrix and hence that det α > 0.
27.6. For each i, let Vi ⊆ V be given by Vi = ⟨γ1 , γ2 , . . . , γi ⟩ so that
Ci = {γ1 , γ2 , . . . , γi } is a basis for Vi . Show that the matrix [BVi ]Ci is
equal to αi , the square submatrix of α formed by the first i rows and
first i columns of α.
27.7. If B is an inner product, then obviously so is BW for all sub-
spaces W of V . Using this, conclude that det αi > 0 for all subscripts
i = 1, 2, . . . , n.
27.8. Let W = Vn−1 and suppose that BW is an inner product.
Then certainly V = W ⊕ W ⊥ with W ⊥ = ⟨β⟩ for some nonzero vector
β. Let B be the basis of V given by B = {γ1 , γ2 , . . . , γn−1 , β} and
find the matrix [B]B . Show that B is an inner product if and only if
B(β, β) > 0 and hence if and only if det[B]B > 0.
27.9. Given the same situation as in Problem 27.8, use the fact that
[B]B is congruent to α to deduce that B is an inner product if and only
if det α > 0.

27.10. Prove by induction on n that B is an inner product if and
only if det αi > 0 for i = 1, 2, . . . , n.
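
The criterion of Problems 27.5–27.10 is also easy to test numerically. The following Python sketch, assuming the NumPy library, simply checks that every leading principal minor det αi is positive; the function name is just a label chosen for this illustration.

    import numpy as np

    def defines_inner_product(alpha, tol=1e-12):
        # alpha is a real symmetric matrix; the form B it represents is an
        # inner product exactly when det(alpha_i) > 0 for i = 1, ..., n
        alpha = np.asarray(alpha, dtype=float)
        n = alpha.shape[0]
        return all(np.linalg.det(alpha[:i, :i]) > tol for i in range(1, n + 1))

    print(defines_inner_product([[2.0, 1.0], [1.0, 2.0]]))   # True
    print(defines_inner_product([[1.0, 2.0], [2.0, 1.0]]))   # False, det = -3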


28. Complex Analogs


When dealing with complex vector spaces, somewhat different bi-
linear forms and inner products occur. They depend upon the fact that
C has a conjugation map ¯ : C → C such that aā = |a|² ≥ 0 for all
a ∈ C. Most of the theory follows in a completely analogous manner
and therefore we will confine our proofs to the few special situations in
which differences occur.
Let V be a complex vector space. A map B : V × V → C is said to
be a Hermitian bilinear form if for all α, β, γ ∈ V and a, b ∈ C we have
H1. B(α + β, γ) = B(α, γ) + B(β, γ)
B(γ, α + β) = B(γ, α) + B(γ, β)

H2. B(aα, β) = aB(α, β)


B(α, bβ) = b̄ B(α, β)
Thus we observe that B differs from the usual bilinear form just in the
last equation where the scalar b factors out as b̄, its complex conjugate.
We consider some examples.
Example 28.1. Let V = Cn . Then we define a dot product α • β =
B(α, β) by
α • β = B(α, β) = ∑i ai b̄i
where α = (a1 , a2 , . . . , an ) and β = (b1 , b2 , . . . , bn ). It is easy to see
that this is a Hermitian bilinear form. It has a number of additional
properties we will find of interest later.
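
In coordinates this product is easy to compute. The short Python sketch below, assuming the NumPy library and using an arbitrary pair of vectors, conjugates the second factor exactly as in Example 28.1.

    import numpy as np

    alpha = np.array([1 + 2j, 3 - 1j])
    beta = np.array([2 - 1j, 1 + 1j])

    # alpha . beta = sum of a_i times the conjugate of b_i
    print(np.sum(alpha * np.conj(beta)))

    # alpha . alpha is real and nonnegative: |1 + 2i|^2 + |3 - i|^2 = 15
    print(np.sum(alpha * np.conj(alpha)).real)
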
Example 28.2. More generally, let V be an n-dimensional C-vector
space with basis {γ1 , γ2 , . . . , γn } and let [cij ] ∈ Cn×n . Then B : V ×V →
C given by
B(∑i ai γi , ∑j bj γj ) = ∑i,j ai cij b̄j
is a Hermitian bilinear form.
In other words, we can easily construct such forms by taking an
ordinary bilinear form and somehow conjugating the second factor.
Theorem 28.1. Let V be an n-dimensional C-vector space with
basis C = {γ1 , γ2 , . . . , γn }. Then the map
B → [B]C = [B(γi , γj )]
yields a one-to-one correspondence between Hermitian bilinear forms
B and all n × n matrices over C.


Let us observe an interesting difference here.


Lemma 28.1. Suppose B : V × V → C is a nonzero Hermitian
bilinear form. Then there exists α ∈ V with B(α, α) ≠ 0.
Proof. Since B is nonzero there exists β, γ ∈ V with B(β, γ) ≠ 0.
Now let x be a complex number of absolute value 1 and set α = xβ + γ.
Then since xx̄ = 1, we have
xB(α, α) = xB(xβ + γ, xβ + γ)
= x²x̄B(β, β) + x²B(β, γ) + xx̄B(γ, β) + xB(γ, γ)
= x²B(β, γ) + x(B(β, β) + B(γ, γ)) + B(γ, β)
Since B(β, γ) ≠ 0, the above right hand quadratic can have at most
two roots and hence x ∈ C exists with |x| = 1 and B(α, α) ≠ 0. 
We say that B is normal if B(α, β) = 0 implies B(β, α) = 0. One
example is as follows. A form B : V × V → C is said to be Hermitian
symmetric if for all α, β ∈ V we have
B(β, α) = B(α, β)‾
Of course, any scalar multiple of a Hermitian symmetric form is normal.
Conversely, we have
Theorem 28.2. Let B : V ×V → C be a normal Hermitian bilinear
form. Then B is a scalar multiple of a Hermitian symmetric form.
A proof of this result is outlined in the first four problems of this
section. Now the structure of this special type of form is easily ob-
tained. Again we write α ⊥ β if B(α, β) = 0.
Theorem 28.3. Let V be a finite dimensional vector space over C
and let B : V × V → C be a normal Hermitian bilinear form. Then
V = W1 + W2 + · · · + Wk + U
is a perpendicular direct sum of the nonisotropic lines Wi and the
isotropic subspace U .
If α = [aij ] ∈ Cn×n we define its conjugate transpose α∗ to be the
matrix α∗ = [bij ] where bij = āji . The matrix α is said to be Hermitian
symmetric if α∗ = α. We have of course
Theorem 28.4. Let V be a finite dimensional complex vector space
with basis C and let B : V × V → C be a Hermitian bilinear form.
Then B is Hermitian symmetric if and only if α = [B]C is a Hermitian
symmetric matrix.
Furthermore, the formula for change of basis is


Theorem 28.5. Let V be a finite dimensional complex vector space


with bases A and C and let B : V × V → C be a Hermitian bilinear
form. Then
[B]C = (C [I]A )·[B]A ·(C [I]A )∗
Now Example 28.1 is of course an example of a complex inner prod-
uct space. Formally, such a space is a complex vector space V with a
complex inner product or complex dot product α • β defined on it and
satisfying
C1. B(α, β) = α • β is a Hermitian symmetric form.
C2. α • α is real and nonnegative.
C3. α • α = 0 if and only if α = 0.
Again we define the norm of a vector α ∈ V to be ‖α‖ = √(α • α) ≥ 0.
The relationship between the inner product and the norm is not as
simple here. We have
Lemma 28.2. Let V be a complex inner product space. If α, β ∈ V ,
then
α • β = (1/2)(‖α + β‖² − ‖α‖² − ‖β‖²)
+ (1/2)(‖α + iβ‖² − ‖α‖² − ‖β‖²) i
where of course i = √−1.
Proof. We have
‖α + β‖² = (α + β) • (α + β) = α • α + α • β + β • α + β • β
= ‖α‖² + 2 Re(α • β) + ‖β‖²
Thus the real part of α • β is given by
Re(α • β) = (1/2)(‖α + β‖² − ‖α‖² − ‖β‖²)
In addition, we obtain the imaginary part using
Im(α • β) = Re(−i(α • β)) = Re(α • iβ)
= (1/2)(‖α + iβ‖² − ‖α‖² − ‖β‖²)
since ‖iβ‖ = ‖β‖. 
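
Since the signs and the factor of i in Lemma 28.2 are easy to get wrong, a numerical spot check may be reassuring. The Python sketch below, assuming NumPy and using two arbitrary vectors, compares both sides of the identity with the dot product of Example 28.1.

    import numpy as np

    def cdot(a, b):
        # the complex dot product of Example 28.1: conjugate the second factor
        return np.sum(a * np.conj(b))

    def norm_sq(a):
        return cdot(a, a).real

    alpha = np.array([1 + 1j, 2 - 3j])
    beta = np.array([-2 + 1j, 1j])

    lhs = cdot(alpha, beta)
    rhs = 0.5 * (norm_sq(alpha + beta) - norm_sq(alpha) - norm_sq(beta)) \
        + 0.5 * (norm_sq(alpha + 1j * beta) - norm_sq(alpha) - norm_sq(beta)) * 1j

    assert np.isclose(lhs, rhs)
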
If V is a finite dimensional complex inner product space, then V
has an orthonormal basis. In fact the Gram-Schmidt method yields a
constructive proof of this result.


Theorem 28.6. Let V be a complex inner product space with basis
A = {α1 , α2 , . . . , αn }. We define vectors γ1 , γ2 , . . . , γn inductively by
γ1 = α1 and
γi+1 = αi+1 − ∑ₖ₌₁ⁱ ((αi+1 • γk )/(γk • γk )) γk
Then,
C = { (1/‖γ1 ‖)γ1 , (1/‖γ2 ‖)γ2 , . . . , (1/‖γn ‖)γn }
is an orthonormal basis for V .
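
The construction of Theorem 28.6 is entirely algorithmic and can be run by machine. The following Python sketch, assuming NumPy and an arbitrary starting basis of C3, follows the formulas of the theorem with the dot product of Example 28.1.

    import numpy as np

    def gram_schmidt(basis):
        # orthonormalize a linearly independent list of complex vectors
        gammas = []
        for alpha in basis:
            gamma = alpha.astype(complex)
            for g in gammas:
                # subtract the component of alpha along each earlier gamma
                gamma = gamma - (np.sum(alpha * np.conj(g)) / np.sum(g * np.conj(g))) * g
            gammas.append(gamma)
        # divide each gamma by its norm
        return [g / np.sqrt(np.sum(g * np.conj(g)).real) for g in gammas]

    C = gram_schmidt([np.array([1, 1j, 0]), np.array([1j, 1, 1]), np.array([0, 0, 1 + 1j])])
    for i, u in enumerate(C):
        for j, v in enumerate(C):
            assert np.isclose(np.sum(u * np.conj(v)), 1.0 if i == j else 0.0)
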
In terms of such an orthonormal basis, the inner product is easily
describable.
Lemma 28.3. Let V be a complex inner product space with orthonor-
mal basis C = {γ1 , γ2 , . . . , γn }. If α = ∑i ai γi and β = ∑i bi γi are
vectors in V , then ai = α • γi and bi = β • γi so
α = ∑i (α • γi )γi
α • β = ∑i ai b̄i = ∑i (α • γi )(γi • β)
‖α‖² = ∑i |ai |² = ∑i |α • γi |²

The Cauchy-Schwarz inequality follows from a minor modification
of the original proof.
Theorem 28.7. Let V be a complex inner product space and let
α, β ∈ V . Then
‖α‖·‖β‖ ≥ |α • β|
Moreover equality occurs if and only if α is a scalar multiple of β or β
is a scalar multiple of α.
Proof. We may assume that α and β are nonzero vectors and set
γ = ‖β‖²α − (α • β)β. Then
0 ≤ ‖γ‖² = (‖β‖²α − (α • β)β) • (‖β‖²α − (α • β)β)
= ‖β‖⁴(α • α) − ‖β‖²(α • β)(α • β)‾ − (α • β)‖β‖²(β • α)
+ (α • β)(α • β)‾(β • β)
= ‖β‖²·(‖β‖²‖α‖² − |α • β|²)
and we have ‖α‖²·‖β‖² ≥ |α • β|². Moreover if equality occurs then
γ = 0 and hence α and β are multiples of each other. 


An easy analog of Sylvester’s Law of Inertia holds.


Theorem 28.8. Let V be a finite dimensional complex vector space
and let B : V × V → C be a Hermitian symmetric bilinear form. Then
there exists a basis C = {γ1 , γ2 , . . . , γn } such that
[B]C = diag(1, 1, . . . , 1, −1, −1, . . . , −1, 0, 0, . . . , 0)
Furthermore the number of 1’s, −1’s and 0’s that occur on the diagonal
is uniquely determined by B.
Finally let V be a complex inner product space. We say that
T : V → V is a Hermitian symmetric linear transformation if for all
α, β ∈ V
αT • β = α • βT
Theorem 28.9. Let V be a finite dimensional inner product space
over the complex numbers and let T : V → V be Hermitian symmetric.
Then all eigenvalues of T are real and V has an orthonormal basis
consisting of eigenvectors of T .
Proof. Let a ∈ C be an eigenvalue for T with eigenvector α. Then
aα • α = αT • α = α • αT = α • aα
so
a(α • α) = ā(α • α)
Since α ≠ 0 we have a = ā and hence a is real. Since C is algebraically
closed, T does have an eigenvalue. The remainder of the proof follows
as in the real case. 
Corollary 28.1. Every Hermitian symmetric matrix is similar to
a diagonal matrix with real entries.
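
Corollary 28.1 can also be checked numerically. The sketch below, in Python with NumPy and an arbitrary Hermitian symmetric matrix, again uses numpy.linalg.eigh, which accepts complex Hermitian input and returns real eigenvalues together with an orthonormal basis of eigenvectors.

    import numpy as np

    alpha = np.array([[2.0, 1 - 1j],
                      [1 + 1j, 3.0]])
    assert np.allclose(alpha, alpha.conj().T)      # alpha* = alpha

    eigenvalues, U = np.linalg.eigh(alpha)
    print(eigenvalues)                             # real, as in Theorem 28.9
    assert np.allclose(U.conj().T @ alpha @ U, np.diag(eigenvalues))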

Problems
Let B : V × V → C be a normal Hermitian bilinear form. In the
next four problems, we prove Theorem 28.2.
28.1. For σ, τ, η ∈ V , derive the identity
B(η, σ)B(τ, η) = B(σ, η)B(η, τ ) (∗)
(Hint. See the proof of Theorem 22.1.)
28.2. Assume that there exists γ ∈ V with B(γ, γ) > 0 and fix this
element. Setting τ = η = γ, σ = β in (∗), deduce that
B(γ, β) = B(β, γ)
for all β ∈ V .


28.3. Let α, β ∈ V and choose c ∈ C so that B(γ, cγ + α) = 0.


Setting σ = β, η = cα + γ, τ = γ in (∗), deduce that
B(α, β) = B(β, α)
and hence that B is Hermitian symmetric.
28.4. Finish the proof of Theorem 28.2.
28.5. Verify that Example 28.1 yields a complex inner product
space.
28.6. Find complex analogs for the second and third proofs of the
Cauchy-Schwarz inequality.
28.7. Do the Triangle inequality and the Parallelogram law hold for
complex inner product spaces?

28.8. Let α, β ∈ Cn×n and let a ∈ C. Prove that


(α + β)∗ = α∗ + β ∗
(α β)∗ = β ∗ α∗
(a α)∗ = ā α∗
28.9. Let V be a complex inner product space, let T : V → V be
a linear transformation and let C = {γ1 , γ2 , . . . , γn } be an orthonormal
basis for V . Show that T is Hermitian symmetric if and only if the
matrix C [T ]C is Hermitian symmetric.
28.10. Finish the proof of Theorem 28.9 and prove Corollary 28.1.

CHAPTER V

Infinite Dimensional Spaces


29. Existence of Bases


The goal now is to study infinite dimensional vector spaces, to show
that bases exist and that dimension makes sense. To do this, we need
a better understanding of set theory.
Of course, set theory, like all of mathematics, is based on certain
axioms. Most of these are intuitive. For example, we can take unions
and intersections, and we can use set builder notation as long as we
are careful. The latter means that we can define a set as long as we
know the objects we are putting into it. Thus we cannot look at the
set of all sets because one of the objects in this set would be itself and
we don’t know it. Of course we can look at the collection of all sets
as long as we do not view this collection as a set. Similarly, we cannot
define a set A by A = {x | x ∉ A} since again we are trying to define
A in terms of A. These sorts of constructions lead to contradictions.
But as long as we are careful, set theory behaves itself.
Another thing we do is choose elements from a set. For example, if
V ≠ 0, we take an element 0 ≠ α ∈ V . We can do this because the set
V \ {0} is given to be nonempty. Similarly if B : V × V → F is nonzero
then we can choose α, β ∈ V with B(α, β) ≠ 0. Again this follows
because the set {(σ, τ ) ∈ V × V | B(σ, τ ) ≠ 0} is nonempty. But can
we make such choices simultaneously from infinitely many sets? The
answer is “yes” and “no”.
The “no” answer comes from the fact that basic set theory does
not allow us to do it. Indeed there are so-called models of set theory
where this kind of choice is not possible. The “yes” answer comes
about because we usually add an additional axiom to basic set theory
known as the Axiom of Choice that allows this to occur. The way the
axiom is formulated is interesting. Suppose we are given a family of
nonempty sets {Ai | i ∈ I} where I is the index set or set of labels.
For convenience, let A = ⋃i∈I Ai . Then a choice function is a map
f : I → A such that f (i) ∈ Ai for all i. Thus the existence of such a
function f gives us a way to make a simultaneous choice of elements
one from each Ai . The Axiom of Choice says that such choice functions
exist. We state it below as a theorem.
Theorem 29.1. Let A be a set and let {Ai | i ∈ I} be a family
of nonempty subsets of A indexed by the set I. Then there exists a
function f : I → A such that f (i) ∈ Ai for all i ∈ I.
Proof. This is an axiom that we assume. 
Now it is not immediately clear how useful this axiom really is.
But it turns out that there are a dozen or so equivalent formulations


that are more useful. Among these are: the Compactness theorem,
the Hausdorff maximal principle, Kuratowski’s lemma, Tukey’s lemma,
the Well-ordering principle, Zermelo’s postulate, and Zorn’s lemma.
Algebraists tend to use Zorn’s lemma or Well-ordering, so we will restrict
our attention to these. The fact that Zorn’s lemma is equivalent to the
Axiom of Choice means that one can prove Zorn’s lemma using the
Axiom of Choice and conversely one can prove the Axiom of Choice
using Zorn’s lemma.
We need some definitions.
Definition 29.1. We say that (A, ≤) is a partially ordered set or
a poset if the inequality ≤ is defined on A. By this we mean that for
some pairs a, b ∈ A we have a ≤ b. Furthermore, for all a, b, c ∈ A, the
inequality satisfies
PO1. a ≤ a.
PO2. a ≤ b and b ≤ c imply a ≤ c.
PO3. a ≤ b and b ≤ a imply a = b.
Of course the second condition above is the transitive law. As usual,
we will use b ≥ a to mean a ≤ b, and we will use a < b or b > a to
mean less than or equal to, but not equal.
An element m ∈ A is said to be maximal if there are no properly
larger elements in A. That is, if m ≤ a then m = a. Similarly n ∈ A
is minimal if there are no properly smaller elements in A.
A standard example here is as follows. Let F be a set and let
A be the set of all subsets of F . Then A is a partially ordered set
with inequality being setwise inclusion ⊆. Obviously F is the unique
maximal member of A and ∅ is the unique minimal member of A.
Definition 29.2. Next, (A, ≤) is said to be a linearly ordered set
or a totally ordered set if
LO1. (A, ≤) is a partially ordered set.
LO2. For all a, b ∈ A we have a ≤ b or b ≤ a.
Thus this occurs when A is partially ordered and all pairs of elements
are comparable. Of course both the set of rational numbers Q and the
set of real numbers R are totally ordered with the usual inequality.
Definition 29.3. Finally, (A, ≤) is a well-ordered set if
WO1. (A, ≤) is a linearly ordered set.
WO2. Every nonempty subset of A has a minimal member.
Notice that Z+ the set of positive integers is well-ordered, but Q+
the set of positive rationals is not. Indeed, for example, the subset
B = {q ∈ Q+ | q > 1} does not have a minimal member. Indeed, the
lower bound q = 1 is not in B.


Suppose (A, ≤) is a partially ordered set and let B be a subset of


A. Then an element u ∈ A is an upper bound for B if b ≤ u for all
b ∈ B. Note that u need not be an element of B. We can now state
Zorn’s lemma and prove it using the Axiom of Choice. The proof of
the claim below is a bit tedious. The remainder of the argument is more
routine.
Theorem 29.2. Let (A, ≤) be a nonempty partially ordered set and
assume that every linearly ordered subset of A has an upper bound in
A. Then A has a maximal element.
Proof. Assume by way of contradiction that A has no maximal
elements.
Let B be any linearly ordered subset of A. Then by assumption,
B has an upper bound u ∈ A. Furthermore, u is not maximal, by
the above, so there exists v ∈ A with v properly larger than u, that is
u < v. Thus for all b ∈ B we have b ≤ u < v so v is a “strict” upper
bound for B. We have shown that each such B has a strict upper
bound. Hence by the Axiom of Choice, Theorem 29.1, there exists a
choice function f from the set of linearly ordered subsets of A to A
itself so that f (B) is a strict upper bound for B. Note that B above is
allowed to be the empty set.
Again suppose that B is a linearly ordered subset of A. If e is any
element of B, then
Be = {b ∈ B | b < e}
is a linearly ordered subset of B and hence of A. Clearly e is a strict
upper bound for Be , but there is no reason to believe that e = f (Be ).
When this happens for all such elements e then B becomes rather
special. Specifically, we say that B is a Z-subset of A if
i. B is well-ordered.
ii. For all e ∈ B we have e = f (Be ).
Note that (i) above is a stronger assumption than B just being linearly
ordered. Notice also that {f (∅)} is a Z-subset of size 1.
Claim. Let B ≠ C be two Z-subsets. Then either C = Bb ⊆ B for
some b ∈ B or B = Cc ⊆ C for some c ∈ C.
Proof of the claim. Since B ≠ C we cannot have both B ⊆ C
and C ⊆ B. By symmetry we can assume that B ⊈ C. Now B is well-
ordered, so we can choose b to be a minimal element in the nonempty
subset B \ (B ∩ C). Clearly minimality then implies that Bb ⊆ C. If
C = Bb , then the claim is proved. If not, we can choose c to be a
minimal element of the nonempty set C \ Bb . Again minimality yields


Cc ⊆ Bb . Note that b ∈ B \ Bb so b ∈ B \ Cc and hence, since B


is well-ordered, we can choose d to be minimal in B \ Cc with d ≤ b.
Again, minimality and transitivity yield Bd ⊆ Cc so

B d ⊆ Cc ⊆ B b ⊆ C

We are given d ≤ b. If b = d, then Bd = Bb = Cc and hence by


condition (ii ), b = f (Bb ) = f (Cc ) = c, a contradiction since b ∉ C. On
the other hand, if d < b then d ∈ Bb ⊆ C. But d ∉ Cc and C is linearly
ordered so c ≤ d. Now d ∈ C, so Cd is defined and Cc ⊆ Cd ∩ B ⊆ Bd .
Thus Bd = Cc and again (ii) yields d = f (Bd ) = f (Cc ) = c and c ∈ B.
Now c = d < b and c ∈ B so c ∈ Bb and this contradicts the definition
of the element c. Thus the claim is proved. 

Finally, let M be the union of all Z-subsets of A. We show that M


is a Z-subset. First let x, y ∈ M so that x ∈ X and y ∈ Y where X and
Y are Z-subsets. By the Claim we know that X and Y are comparable
sets and by symmetry we can assume that X ⊆ Y . Then x, y ∈ Y and
Y is linearly ordered, so we conclude that x ≤ y or y ≤ x. Thus M is
linearly ordered.
Next we observe that if X is a Z-subset of A and if x ∈ X, then
Mx = Xx . Clearly Xx ⊆ Mx . For the other inclusion, let y ∈ Mx ⊆ M ,
so that y ∈ Y for some Z-subset Y . If Y ⊆ X then y ∈ X and hence
y ∈ Xx . On the other hand, if Y ⊈ X, then the Claim implies that
X = Yt for some t ∈ Y . But then x ∈ X = Yt so x < t and y ∈ Y \ Yt
so t ≤ y. Thus x < t ≤ y and this contradicts the fact that y < x.
With this, we see that M satisfies condition (ii ). Indeed, let x ∈ M
and as above note that Mx = Xx for some Z-set X containing x. Then
f (Mx ) = f (Xx ) = x since the Z-subset X satisfies (ii ). Finally, let S
be a nonempty subset of M and choose a Z-subset X with S ∩ X ≠ ∅.
Since X is well-ordered, S ∩ X has a minimal element x. In particular,
S ∩ Xx = ∅ so S ∩ Mx = ∅ and thus x is a minimal element of S.
We conclude that M is a Z-subset of A and hence clearly the largest
Z-subset of A.
Since M is a linearly ordered subset of A, v = f (M ) exists and we
form N = M ∪ {v}. Note that v is strictly larger than all elements of
M so N is strictly larger than M . Now N is surely linearly ordered and
well-ordered since M is. Furthermore, by definition of v, it is clear that
N satisfies (ii ). Thus N is a Z-subset of A strictly larger than M , and
this contradicts the maximality of M . We conclude that our original
assumption is false, so A has a maximal element and the theorem is
proved. 


The Well-ordering principle below says that well-orderings abound.


We prove it easily using a standard Zorn’s lemma argument. If A is a
linearly ordered set, we say that B ⊆ A is an initial segment if for all
a ∈ A and b ∈ B, the inequality a ≤ b implies that a ∈ B. In other
words, B is the set of “small” elements of A.
Theorem 29.3. Any set can be well-ordered.
Proof. Let A be the given set. We consider the subsets of A that
can be well-ordered. Specifically, let W be the set of all pairs (B, ≤)
where B is a subset of A and ≤ defines a well-ordering on B. Next
we introduce an ordering on these pairs by defining (B, ≤) ≼ (C, ≤) to
mean that B ⊆ C, both ≤’s agree on B, and B is an initial segment
of C with respect to ≤. Since an initial segment of an initial segment
is an initial segment, it follows easily that W is a partially ordered set
under the inequality ≼.
Now suppose X = {(Xi , ≤i ) | i ∈ I} is a nonempty linearly ordered
subset of W where we have written ≤i for the inequality defined on
Xi . Let X = ⋃i∈I Xi ⊆ A. Our goal is to define an ordering ≤ on
X, to show that (X, ≤) ∈ W and that (X, ≤) is an upper bound for
X . To start with, let x, y, z ∈ X. Then x ∈ Xi , y ∈ Xj and z ∈ Xk
for suitable i, j, k ∈ I. Since X is linearly ordered, one of these three
subsets is largest, say Xm , and then x, y, z ∈ Xm . In other words, any
three elements of X are contained in a common Xm .
Now we define ≤ on X. To this end, let x, y ∈ X and let x, y ∈ Xr
for some r. Then we say that x ≤ y if and only if x ≤r y. Notice
that if x, y ∈ Xs for some other s, then either (Xr , ≤r ) ≼ (Xs , ≤s ) or
(Xs , ≤s ) ≼ (Xr , ≤r ). In either case, ≤r and ≤s agree on the smaller
set and hence x ≤r y if and only if x ≤s y. Thus x ≤ y is well defined,
independent of the choice of r.
Next we show that (Xi , ≤i ) ≼ (X, ≤) for all i. To this end, note
that by definition the two inequalities agree on the smaller set Xi . Next
suppose x ∈ Xi and y ∈ X with y ≤ x. Then y ∈ Xj for some j and
either Xj ⊆ Xi or Xi ⊆ Xj . In the latter case we know that Xi is an
initial segment of Xj , so either inclusion implies that y ∈ Xi . Thus Xi
is indeed an initial segment of X and (Xi , ≤i ) ≼ (X, ≤).
Finally we show that (X, ≤) ∈ W and for this we need ≤ to be a
well-ordering. Since each (Xi , ≤i ) is a linear ordering and since we can
put three elements of X in a common Xi , it follows easily that ≤ is
a linear ordering. Thus we need only check the existence of minimal
elements. To this end, let S be a nonempty subset of X and choose
Xr so that S ∩ Xr ≠ ∅. Since Xr is well-ordered, S ∩ Xr has a minimal
element, say x. But Xr is an initial segment of X and hence x is clearly


a minimal element of S. We conclude that (X, ≤) ∈ W is an upper


bound for X and Zorn’s lemma (Theorem 29.2) implies that W has a
maximal member (B, ≤).
If B ≠ A we can choose c ∈ A \ B and extend B to the strictly larger
set C = B∪{c}. Furthermore, we can extend the ordering on B to C by
defining b < c for all b ∈ B. In this way, C becomes a well-ordered set
with B as an initial segment. Thus clearly (B, ≤) ≺ (C, ≤) ∈ W and
this contradicts the maximality of (B, ≤). We conclude that A = B is
well-ordered. 
As an easy consequence, we have
Theorem 29.4. The Axiom of Choice, Zorn’s lemma and the Well-
ordering principle are all equivalent.

Proof. We have already shown that the Axiom of Choice (Theo-


rem 29.1) implies Zorn’s lemma (Theorem 29.2). Furthermore, Zorn’s
lemma implies the Well-ordering principle (Theorem 29.3). It remains
to show that Well-ordering implies the Axiom of Choice and this is
basically trivial. Indeed, let A be a set and let {Ai | i ∈ I} be a family
of nonempty subsets of A indexed by the set I. Well-order the set A
and observe that each nonempty subset B of A has a unique minimal
element in B which we denote by min(B). Then obviously the function
f : I → A given by f (i) = min(Ai ) is a choice function. 
We can now apply these set theoretic assumptions to linear algebra.
Let V be a vector space over a field F . Since we can only do finite
arithmetic in V , as in Section 4, we say that L ⊆ V is a linearly
independent subset if for all finitely many elements α1 , α2 , . . . , αn ∈ L,
any linear dependence is trivial. That is
a1 α1 + a2 α2 + · · · + an αn = 0
implies that the field elements a1 = a2 = · · · = an = 0. Note that the
empty set ∅ is linearly independent. Similarly the subset S spans V
if every element of V can be written as a finite linear combination of
elements of S. That is, V = ⟨S⟩. Of course, a basis B is a linearly
independent spanning set. As we know, if B is a basis then
i. Every element of V can be written uniquely as a finite linear
combination of elements of B.
ii. B is a maximal linearly independent set.
iii. B is a minimal spanning set.
The goal now is to prove that bases exist and (ii) above offers a hint
as to how to prove the result. Clearly we apply Zorn’s lemma.


Theorem 29.5. Let V be a vector space over the field F and let
L ⊆ S ⊆ V be given where L is a linearly independent set and where S
spans V . Then there exists a basis B of V with L ⊆ B ⊆ S.

Proof. Consider the set X of all linearly independent subsets X of


V with L ⊆ X ⊆ S. Then X is nonempty since L ∈ X . Furthermore,
X is a partially ordered set using set inclusion ⊆.
Suppose Y = {Yi | i ∈ I} is a nonempty linearly ordered subset of
X . We show that Y = ⋃i∈I Yi is an upper bound for Y in X . First since
L ⊆ Yi ⊆ S for all i ∈ I, we have L ⊆ Y ⊆ S. Next, let α1 , α2 , . . . , αn
be finitely many vectors in Y and choose subscripts i1 , i2 , . . . , in ∈ I
with αj ∈ Yij . Since Y is linearly ordered under set inclusion, one of the
subscripts, say im , corresponds to the largest Yij . Then αj ∈ Yij ⊆ Yim
for all j = 1, 2, . . . , n. But Yim is a linearly independent set and hence
α1 , α2 , . . . , αn are linearly independent vectors. We conclude that Y is
a linearly independent set sandwiched between L and S, so Y ∈ X .
Clearly Y is an upper bound for Y in X .
We have shown that every linearly ordered subset of X has an upper
bound in X and hence Zorn’s lemma (Theorem 29.2) implies that X
contains a maximal member B. We show that B is a basis for V and
we already know that B is linearly independent. Thus it suffices to
show that B spans V .
Of course, ⟨B⟩ ⊇ B. Next let γ ∈ S \ B. Then B < B ∪ {γ} ⊆ S
so the maximality of B implies that B ∪ {γ} is linearly dependent. In
other words there exists a dependence relation of the form
b1 β1 + b2 β2 + · · · + bn βn + c γ = 0
with β1 , β2 , . . . , βn ∈ B and coefficients in F not all 0. If c = 0 then
the above becomes
b1 β1 + b2 β2 + · · · + bn βn = 0
a linear dependence relation for elements of B. But B is a linearly
independent set so we have b1 = b2 = · · · = bn = 0, a contradiction.
Thus c ≠ 0, so we can divide by c and conclude that
γ = (−b1 /c)β1 + (−b2 /c)β2 + · · · + (−bn /c)βn
and hence γ ∈ ⟨B⟩. Thus ⟨B⟩ ⊇ S and since ⟨B⟩ is a subspace of V we
have ⟨B⟩ ⊇ ⟨S⟩. But S spans V so ⟨S⟩ = V and therefore ⟨B⟩ = V . In
other words, B is a linearly independent spanning set for V and hence
B is a basis for V . 
As an immediate consequence we have


Corollary 29.1. Let V be a vector space over a field F . Then


i. V has a basis.
ii. Any linearly independent subset of V is contained in a basis,
namely a maximal linearly independent set.
iii. Any spanning set for V contains a basis, namely a minimal
spanning set.

Proof. Take L = ∅ and S = V in the previous theorem. These are


linearly independent and spanning sets, respectively, so a basis exists.
Next if L is given, take S = V and conclude that there exists a basis
B ⊇ L. Finally if S is given, take L = ∅ and deduce that a basis B
exists with B ⊆ S. 
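
In the finite dimensional case no form of the Axiom of Choice is needed: one simply runs through the spanning set and keeps each vector that is independent of the vectors already chosen. The Python sketch below, assuming NumPy and working with vectors in Rn given as arrays, is only a finite dimensional illustration of Theorem 29.5; a rank computation stands in for the test of linear independence.

    import numpy as np

    def extend_to_basis(L, S):
        # L is a linearly independent list contained in a spanning list S;
        # greedily add vectors of S that increase the rank
        B = list(L)
        rank = np.linalg.matrix_rank(np.array(B)) if B else 0
        for v in S:
            new_rank = np.linalg.matrix_rank(np.array(B + [v]))
            if new_rank > rank:        # v is not in the span of B, so keep it
                B.append(v)
                rank = new_rank
        return B

    L = [np.array([1.0, 0.0, 0.0])]
    S = [np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0]),
         np.array([2.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]
    print(len(extend_to_basis(L, S)))  # 3, a basis of R^3 containing L
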
It remains to show that all bases for V have the same “size”, what-
ever that means, and we consider this in the next section.

Problems
Let (A, ≤) be a partially ordered set.
29.1. Show that the following two conditions are equivalent.
i. A satisfies the maximal condition (MAX), which means that
every nonempty subset of A has a maximal member.
ii. A satisfies the ascending chain condition (ACC), namely A has
no strictly ascending chain
a1 < a2 < · · · < an < · · ·
indexed by the positive integers.
29.2. Show that the following two conditions are equivalent.
i. A satisfies the minimal condition (MIN), which means that
every nonempty subset of A has a minimal member.
ii. A satisfies the descending chain condition (DCC), namely A
has no strictly descending chain
a1 > a2 > · · · > an > · · ·
indexed by the positive integers.
29.3. Show that A is well-ordered if and only if every nonempty sub-
set has a unique minimal member. (Hint. To prove that the ordering
is linear consider subsets of size 2.)
29.4. Define the relation ≼ on A by a ≼ b if and only if b ≤ a.
Show that (A, ≼) is also a partially ordered set that has the effect of
interchanging maximal and minimal and also upper bound and lower


bound. Use this to translate Zorn’s lemma into a result guaranteeing


the existence of a minimal member of A.
29.5. Take a close look at the proof of Theorem 29.2. Did we actu-
ally prove the slightly stronger result, namely if all well-ordered subsets
of A have upper bounds in A, then A has a maximal element?
29.6. Let B ⊆ A. We say that B is a chain if it is linearly ordered,
that is if all elements of B are comparable. In the other extreme, B is
said to be an anti-chain if no distinct elements are comparable. Prove
that A has a maximal chain and a maximal anti-chain.
Now let V be a vector space over F .
29.7. Let (A, ≤) be the partially ordered set of subspaces of V with
inequality corresponding to set inclusion ⊆. If V has an infinite linearly
independent subset, show that A does not satisfy ACC or DCC. (Hint.
As we will see in Proposition 30.1, V has a linearly independent subset
{α1 , α2 , . . . , αn , . . .} indexed by the positive integers Z+ .)
29.8. Verify that bases satisfy the three conditions listed just before
Theorem 29.5.
29.9. Find an example of V and a descending chain
S1 ⊇ S2 ⊇ · · · ⊇ Sn ⊇ · · ·
of spanning sets of V such that the intersection S = ⋂∞n=1 Sn does not
span V . (Hint. Surprisingly we can take V to be 1-dimensional.)
29.10. Let W be a subspace of V . Show that there exists a subspace
U of V with V = W ⊕ U . (Hint. Start with a basis for W .)


30. Existence of Dimension


One can define the dimension of an infinite dimensional vector space
but as we will see, it is really less useful than in the finite dimensional
case. Obviously, we start by first understanding what the size of a basis
might mean.
Let A be a finite set and observe that we find |A| = n by counting
the elements of A. We say “first element, second element, . . . , nth
element”. Now what we are actually doing is setting up a one-to-
one correspondence between the elements of A and the elements of
the set {1, 2, . . . , n}. In fact, that correspondence naturally appears
when we write A = {a1 , a2 , . . . , an }. Indeed, i ↔ ai is precisely the
correspondence. Furthermore, if |B| = n and B = {b1 , b2 , . . . , bn } then
we have a one-to-one correspondence between the elements of A and of
B given by ai ↔ bi . Let us formalize and extend these definitions.
Let A and B be sets. Then the map f : A → B is a one-to-one
correspondence if f is one-to-one and onto. Clearly this occurs if and
only if there exists a back map f −1 : B → A such that f −1 f = 1A and
f f −1 = 1B where 1A is the identity map on A and 1B is the identity
map on B. We write A ∼ B if the maps f and f −1 exist.
Lemma 30.1. With the above notation we have
i. A ∼ A for all A (reflexive law)
ii. A ∼ B implies B ∼ A (symmetric law)
iii. A ∼ B and B ∼ C imply A ∼ C (transitive law)
iv. Suppose A = ⋃˙ i∈I Ai and B = ⋃˙ i∈I Bi are disjoint unions with
Ai ∼ Bi for all i ∈ I. Then A ∼ B.
Proof. For (i ) we use 1A : A → A and (ii ) follows since a one-to-
one correspondence yields maps in both directions. Since the composi-
tion of two one-to-one correspondences is a one-to-one correspondence,
(iii ) is immediate. Finally, for (iv ) we define a map from A to B by
defining it on each of the disjoint subsets Ai . In particular, if I is finite
or if there is a canonical choice for the maps fi : Ai → Bi , then the
result is clear. In the general case, we need the Axiom of Choice to
choose a fixed one-to-one correspondence fi for each i ∈ I. 

As in Section 10, the first three rules above tell us that ∼ is an
equivalence relation. We then say that sets A and B have the same
equivalence relation. We then say that sets A and B have the same
size if and only if A ∼ B. In some sense, the size of A, namely |A|, is
the collection of all sets B with A ∼ B namely the equivalence class
of A. Thus sizes don’t really have names except in a few special cases.
For example, as above, A has size n if A ∼ {1, 2, . . . , n}. Next A is


countably infinite if A ∼ N where N = Z+ = {1, 2, . . . , n, . . .} is the set


of natural numbers and A has the size of the continuum if A ∼ R.
Again, let A and B be sets. We need two more set-theoretic defini-
tions. First, we want size to be compatible with set inclusion. Thus we
say |B| ≤ |A| if and only if there exists a one-to-one map f : B → A.
When this occurs then f gives rise to a one-to-one correspondence be-
tween B and f (B) and hence B ∼ f (B) ⊆ A. It is clear that the
inequality |B| ≤ |A| depends only upon the size of the sets and not on
the sets themselves.
Next, we define |A| + |B| to be the size of A ∪ B if A and B are
disjoint. More generally, let A′ = {a′ | a ∈ A} and B ∗ = {b∗ | b ∈ B} so
that A ∼ A′ and B ∼ B ∗ . Furthermore, A′ and B ∗ are disjoint because
the elements of A′ end in ′ while the elements of B ∗ end in ∗ , and thus
we can define |A| + |B| = |A′ ∪ B ∗ |. It follows from parts (iii ) and (iv )
of the previous lemma that if Ā ∼ A and B̄ ∼ B, then Ā′ ∪ B̄ ∗ ∼ A′ ∪ B ∗
and hence that |A| + |B| only depends upon |A| and |B|. Furthermore,
it is clear that |A| + |B| = |B| + |A| and |A| ≤ |A| + |B|. Note that the
latter makes sense since |A| + |B| is the size of a set.
It is interesting to observe some elementary properties of the set N.
Surprisingly part (iii ) below uses the Axiom of Choice.
Lemma 30.2. Let N denote the set of natural numbers.
i. If F is a finite set then |F | + |N| = |N|.
ii. |N| + |N| = |N|.
iii. Let A = ⋃n∈N An be a countable union of the finite sets An .
Then either A is finite or A ∼ N.
Proof. (i ) If |F | = n, then F ∼ {1, 2, . . . , n}, and clearly N ∼
{n + 1, n + 2, . . .}. Thus |F | + |N| is the size of the set
{1, 2, . . . , n} ∪ {n + 1, n + 2, . . .} = N
(ii) Now N ∼ E, where E is the set of even positive integers and
N ∼ O where O is the set of odd positive integers. Thus |N| + |N| is
the size of the set E ∪ O = N.
(iii ) For each n, let Bn = A1 ∪ A2 ∪ · · · ∪ An so that A = ⋃n Bn .
Furthermore, the Bn ’s form an ascending chain of finite sets starting
with B0 = ∅. If sn = |Bn |, then Bn \ Bn−1 ∼ (sn−1 , sn ], where the
latter is the half-open interval {x ∈ N | sn−1 < x ≤ sn }. If the sn ’s are
bounded, then A is finite. Otherwise
A = ⋃˙ n∈N (Bn \ Bn−1 ) ∼ ⋃˙ n∈N (sn−1 , sn ] = N
where ⋃˙ as usual indicates disjoint union. 
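
The proof of part (iii ) is really an enumeration procedure: list the new elements of B1, then those of B2 \ B1, and so on. A small Python sketch of this idea, with the sets An given as an arbitrary list of finite sets of integers, is given below.

    def enumerate_union(finite_sets):
        # yield each element of the union exactly once, in the order in which
        # it first appears in B_1, B_2 \ B_1, B_3 \ B_2, ...
        seen = set()
        for A_n in finite_sets:
            for a in sorted(A_n):
                if a not in seen:
                    seen.add(a)
                    yield a

    print(list(enumerate_union([{1}, {1, 2, 3}, {2, 4}])))   # [1, 2, 3, 4]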


We again use the Axiom of Choice in the next construction to obtain


an extension of the above result.
Proposition 30.1. Suppose A is any infinite set. Then A contains
a copy of N. Consequently, |N| ≤ |A| and we have
i. If F is a finite set then |F | + |A| = |A|.
ii. |N| + |A| = |A|.
Proof. If B is any finite subset of A, then A \ B is nonempty.
Hence, by the Axiom of Choice, there exists a choice function f from
the set of all finite subsets of A to A itself such that f (B) ∈ A \ B.
With this and mathematical induction, we define a sequence of distinct
elements an ∈ A by a1 = f (∅) and an+1 = f ({a1 , a2 , . . . , an }). Clearly
N ∼ {a1 , a2 , . . . , an , . . .} ⊆ A and hence |N| ≤ |A|.
Now let N̄ denote this copy of N in A. Then A is the disjoint union
A = N̄ ∪˙ (A \ N̄ ). In particular, if F is any finite set disjoint from A,
then the previous lemma yields
F ∪˙ A = (F ∪˙ N̄ ) ∪˙ (A \ N̄ ) ∼ N̄ ∪˙ (A \ N̄ ) = A
and similarly if M ∼ N is disjoint from A then M ∪˙ N̄ ∼ N̄ implies
M ∪˙ A = (M ∪˙ N̄ ) ∪˙ (A \ N̄ ) ∼ N̄ ∪˙ (A \ N̄ ) = A
as required. 
The following result is the Schroeder-Bernstein theorem.
Theorem 30.1. Let A and B be sets with |A| ≤ |B| and |B| ≤ |A|.
Then |A| = |B|.
Proof. We can better appreciate this proof if we think pictorially.
To start with, |A| ≤ |B| means that there exists a one-to-one function
f : A → B. Similarly |B| ≤ |A| yields a one-to-one function g : B → A.
Since these functions are one-to-one, there exist back maps which we
write as f −1 : B → A and g −1 : A → B with the understanding that
f −1 is only defined on f (A) ⊆ B and g −1 is only defined on g(B) ⊆ A.
Note that f −1 f = 1A and f f −1 = 1f (A) . Similarly we have g −1 g = 1B
and gg −1 = 1g(B) .
We can of course assume that A and B are disjoint. Our goal is to
partition A ∪ B into a family of “horizontal lines” H with the property
that H ∩ A ∼ H ∩ B. To start with, take a0 ∈ A and move to the right
by applying f , then g, then f , . . . , as in the diagram below. Or we
could have started with b0 ∈ B and moved to the right by applying g,
then f , then g, . . . , again as in the diagram below.
−−g−→ A −−f−→ B −−g−→ A −−f−→ B −−g−→ A −−f−→


We can certainly continue infinitely in this direction.


Next we move to the left from a0 by applying g −1 , then f −1 , then
g −1 , . . . , as in the diagram below. Or we could move to the left from
b0 by first applying f −1 , then g −1 and then f −1 . . . , again as in the
diagram. In either case, we can keep going until we are “blocked”,
that is until we reach an element of A not in the domain of g −1 or an
element of B not in the domain of f −1 .
←−g⁻¹−− A ←−f⁻¹−− B ←−g⁻¹−− A ←−f⁻¹−− B ←−g⁻¹−− A ←−f⁻¹−−
Notice that the elements to the right are all in f (A) or g(B) and thus
can be moved to the left, eventually winding up where we started.
Similarly, we can certainly apply the functions f and g alternately to
the elements on the left and again return to the starting point. With
this, it is clear that any element in the horizontal line H completely
determines H and hence that different horizontal lines are disjoint. In
other words, A ∪ B is partitioned by the family of horizontal lines.
Now any of these horizontal lines certainly extends infinitely to the
right. But there are three different behaviors on the left. First, there
are the lines that extend infinitely to the left and we call these the
unblocked lines. The remaining ones are blocked and are either A-
blocked if their left most member is in A or B-blocked if their left most
member is in B. Note that the elements in the blocked lines are all
distinct since they are each at a different distance from the blocking
member. On the other hand, it is possible for the unblocked lines to
contain only finitely many distinct elements.
It remains to show that H ∩ A ∼ H ∩ B for any such H and
this is clear pictorially. If H is A-blocked, the required one-to-one
correspondence H ∩A → H ∩B is a single shift to the right, namely the
function f . If H is B-blocked, we use a single shift to the left, namely
g −1 . Finally, if H is unblocked there are choices, but we again take f ,
the single shift to the right. By combining these, we construct a one-to-
one correspondence h : A → B by h(a) = g −1 (a) if a determines a B-
blocked line and h(a) = f (a) otherwise. This completes the proof. 
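
Although the theorem concerns arbitrary sets, the construction in the proof is completely explicit, and when A and B are finite it can be carried out by a short program. In the Python sketch below the injections f and g are given as dictionaries and the example data is arbitrary; each element of A is classified by tracing its horizontal line to the left, and then f or g −1 is applied accordingly.

    def schroeder_bernstein(A, B, f, g):
        # f: A -> B and g: B -> A are one-to-one maps given as dicts;
        # return a one-to-one correspondence h: A -> B
        f_inv = {v: k for k, v in f.items()}    # defined only on f(A)
        g_inv = {v: k for k, v in g.items()}    # defined only on g(B)

        def is_B_blocked(a):
            # trace the line of a to the left; with finite sets a line either
            # stops in A, stops in B, or closes up into an (unblocked) cycle
            seen = set()
            side, x = 'A', a
            while (side, x) not in seen:
                seen.add((side, x))
                if side == 'A':
                    if x not in g_inv:
                        return False            # blocked at an element of A
                    side, x = 'B', g_inv[x]
                else:
                    if x not in f_inv:
                        return True             # blocked at an element of B
                    side, x = 'A', f_inv[x]
            return False                        # an unblocked (cyclic) line

        return {a: (g_inv[a] if is_B_blocked(a) else f[a]) for a in A}

    A, B = [0, 1, 2], ['x', 'y', 'z']
    f = {0: 'x', 1: 'y', 2: 'z'}
    g = {'x': 1, 'y': 2, 'z': 0}
    h = schroeder_bernstein(A, B, f, g)
    assert sorted(h.values()) == sorted(B)      # h is onto, hence a bijection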

Next we need two applications of Zorn’s lemma with proofs that


are similar but somewhat easier than the proof of the Well-ordering
principle (Theorem 29.3). The first part shows that all set sizes are
comparable. Of course, once we know this, then it makes sense to
speak about the maximum of two such sizes.
Theorem 30.2. Let A and B be any two sets.
i. Either |A| ≤ |B| or |B| ≤ |A|.


ii. If at least one of A or B is infinite then


|A| + |B| = max{|A|, |B|}

Proof. (i ) Consider the set C of all ordered pairs (C, f ) where


C ⊆ A and f : C → B is a one-to-one function. We write (C, f ) ≼
(D, g) if and only if C ⊆ D and both functions agree on the smaller set
C. In this way, C becomes a partially ordered set with unique smallest
element C = ∅ and the obvious embedding C → B. Now suppose
L = {(Ci , fi ) | i ∈ L} is a linearly ordered subset of C. Then we obtain
an upper bound for L by taking C = ⋃i∈L Ci ⊆ A and f : C → B to
agree with fi on Ci . It is easy to see that f is a well-defined, one-to-
one function. It now follows from Zorn’s lemma that C has a maximal
member (M, g).
If M = A, then g : A → B is one-to-one and |A| ≤ |B|. On
the other hand, if g(M ) = B, then g −1 : B → A is one-to-one and
|B| ≤ |A|. Finally if M < A and g(M ) < B, then we can choose
a ∈ A \ M , b ∈ B \ g(M ) and extend the function g to M̄ = M ∪ {a} by
defining g(a) = b. But then (M̄ , g) ∈ C is strictly larger than (M, g), a
contradiction.
(ii) Suppose A is an infinite set. We first show that |A| + |A| = |A|.
To this end, recall that we have maps ′ : A → A′ and ∗ : A → A∗ and
that |A| + |A| = |A′ ∪ A∗ |. Now consider the set C of all pairs (C, f )
where C ⊆ A and f : C ′ ∪ C ∗ → C is a one-to-one correspondence.
Again we define (C, f ) ≼ (D, g) if and only if C ⊆ D and f and g
agree on the smaller set C ′ ∪ C ∗ . In this way, C becomes a partially
ordered set with unique smallest element corresponding to C = ∅. If
L = {(Ci , fi ) | i ∈ L} is a linearly ordered subset of C, then we obtain
an upper bound for L by taking C = ⋃i∈L Ci ⊆ A and f : C ′ ∪ C ∗ → C
to agree with fi on Ci′ ∪ Ci∗ . It is easy to see that f is a well-defined,
one-to-one correspondence. It now follows from Zorn’s lemma that C
has a maximal member (M, g).
Note that (M, g) ∈ C implies that |M | + |M | = |M |. If A \ M is
infinite, then by Proposition 30.1, A \ M has a copy N̄ of the natural
numbers N. But then we can extend g to the set M̄ ′ ∪ M̄ ∗ corresponding
to M̄ = M ∪ N̄ using the one-to-one correspondence between N̄ ′ ∪ N̄ ∗
and N̄ given by Lemma 30.2. Of course, this contradicts the maximal-
ity of (M, g). Thus A \ M is finite, M is infinite and |M | = |A| by
Proposition 30.1. It then follows that
|A| + |A| = |M | + |M | = |M | = |A|
as required.


Finally, suppose |B| ≤ |A|. Then


|A| ≤ |A| + |B| ≤ |A| + |A| = |A|
so the Schroeder-Bernstein theorem (Theorem 30.1) implies that
|A| + |B| = |A| = max{|A|, |B|}
and the theorem is proved. 
The goal now is to use this machinery to show that all bases of a
vector space have the same size. To start with, we need
Lemma 30.3. Let V be a nonzero vector space and let B and B̄ be
bases for V . Then there exist nonempty subsets C of B and C̄ of B̄
such that C ∼ C̄ and ⟨C⟩ = ⟨C̄⟩.
Proof. Since B is a basis of V , each β̄ ∈ B̄ can be written uniquely
as a finite linear combination, with nonzero coefficients, of elements of
B. Let us denote this finite subset of B by B(β̄), the support of β̄ in B.
Similarly each β ∈ B has a unique finite support B̄(β) ⊆ B̄. Clearly
β̄ ∈ ⟨B(β̄)⟩ and β ∈ ⟨B̄(β)⟩.
We now construct a sequence of finite subsets Cn of B and C̄n of B̄
as follows. To start with, choose β1 ∈ B and take C1 = {β1 }. Similarly,
C̄1 = {β̄1 } for some β̄1 ∈ B̄. Now suppose that we have finite sets
Cn ⊆ B and C̄n ⊆ B̄. Then we let Cn+1 = ⋃ B(β̄) where the union is
over the finitely many elements β̄ ∈ C̄n and similarly C̄n+1 = ⋃ B̄(β)
where the union is over the finitely many elements β ∈ Cn . Clearly
⟨Cn ⟩ ⊆ ⟨C̄n+1 ⟩ and ⟨C̄n ⟩ ⊆ ⟨Cn+1 ⟩.
Finally let C = ⋃∞n=1 Cn and C̄ = ⋃∞n=1 C̄n . Then
W = ⟨C⟩ = ⟨C̄⟩
so C and C̄ are bases of W since they span and since they are both
linearly independent subsets of the corresponding bases B and B̄. If
either C or C̄ is finite, then W is finite dimensional and Theorem 5.2
implies that C ∼ C̄. On the other hand if both C and C̄ are infinite,
then part (iii ) of Lemma 30.2 yields C ∼ N ∼ C̄. 
As a consequence of the above and Zorn’s lemma, we obtain
Theorem 30.3. Let V be a vector space over the field F . Then any
two bases of V have the same size.
Proof. Let B and B̄ be bases of V . We consider the family C of all
pairs (C, f ) where C ⊆ B, f : C → B̄ is a one-to-one map and where
⟨C⟩ = ⟨f (C)⟩. As usual we write (C, f ) ≼ (D, g) if C ⊆ D and if f
and g agree on the smaller set C. Then C becomes a partially ordered
set under ≼ and C has the unique minimum member given by C = ∅.
Suppose L = {(Ci , fi ) | i ∈ L} is a linearly ordered subset of
C. Then it is easy to see that L has an upper bound (C, f ) where
C = ⋃ Ci and where f : C → B̄ agrees with fi on Ci . We can now
apply Zorn’s lemma (Theorem 29.2) to conclude that C has a maximal
member (M, g) and we set M̄ = g(M ) ⊆ B̄. Since (M, g) ∈ C it follows
that W = ⟨M ⟩ = ⟨M̄ ⟩ and M ∼ M̄ . If W = V , then since B and B̄
are minimal spanning sets of V , we conclude that M = B, M̄ = B̄ and
therefore B ∼ B̄.
On the other hand, suppose by way of contradiction that W ≠
V and, using Theorem 10.2, let T : V → V /W be the natural linear
transformation onto the quotient space V /W with kernel W . Since
W = ⟨M ⟩ = ⟨M̄ ⟩, it follows easily that T is one-to-one on B \ M and
on B̄ \ M̄ . Furthermore, the images T (B \ M ) and T (B̄ \ M̄ ) are both
bases for the space V /W . (Note that, to be consistent with our recent
function notation, we are writing T on the left here.)
Applying the preceding lemma to these two bases for V /W , we
conclude that there exist nonempty subsets C of B \ M and C̄ of
B̄ \ M̄ such that T (C) ∼ T (C̄) and ⟨T (C)⟩ = ⟨T (C̄)⟩. Properties
of T then imply that C ∼ C̄. Furthermore, by considering complete
inverse images of subspaces, we see that ⟨M ∪ C⟩ = ⟨M̄ ∪ C̄⟩. Finally
using C ∼ C̄, we see that the one-to-one correspondence g : M → M̄
extends to a one-to-one correspondence h : M ∪ C → M̄ ∪ C̄. But
then (M ∪ C, h) ∈ C is properly larger than (M, g), contradicting the
maximality of (M, g). This completes the proof. 

We can now, of course, define the dimension of V to be the common


size of all bases of V . We mention just one consequence of this definition
and of some of the theorems above.

Corollary 30.1. Let V be an infinite dimensional vector space


over the field F and let T : V → W be a linear transformation. Then

dimF V = dimF ker T + dimF im T


= max{dimF ker T, dimF im T }

Proof. Let A be a basis for ker T . Then A is a linearly indepen-


dent subset of V and hence, by Corollary 29.1, A extends to the disjoint
union A ∪˙ B, a basis for V . Then clearly T : B → T (B) is a one-to-one


correspondence and T (B) is a basis for im T = T (V ). Thus


dim V = |A ∪˙ B| = |A| + |B| = |A| + |T (B)|
= dim ker T + dim im T
The result follows from Theorem 30.2. 
Since infinite dimensional vector spaces can have proper subspaces
of the same dimension, it is clear that dimension is a less useful tool in
this infinite context.

Problems
30.1. Let V ≠ 0 be a finite dimensional vector space over Q. Show
that V ∼ N. (Hint. Let {γ1 , γ2 , . . . , γk } be a basis for V and for each
natural number n let Vn be the set of all vectors
(a1 /b1 )γ1 + (a2 /b2 )γ2 + · · · + (ak /bk )γk
with ai , bi ∈ Z and |ai |, |bi | ≤ n for all i. Now apply Lemma 30.2.)
30.2. If A is a set, then its power set P(A) is the set of all subsets
of A. Observe that A ∼ Ā implies that P(A) ∼ P(Ā). Now prove
that |A| ≠ |P(A)|. (Hint. If f : A → P(A) is given, define a subset B
of A so that b ∈ B if and only if b ∉ f (b). Is B in the image of f ?)
30.3. Show that |R| ≤ |P(Q)| = |P(N)|. (Hint. Each real number
r determines a “cut” {q ∈ Q | r < q} in the rational line.)
30.4. Conversely show that |P(N)| ≤ |R| and deduce that equality
occurs. (Hint. We construct a map f from P(N) to R, or actually to
elements in the half open interval [0, 1), as follows. For each subset B
of N, let f (B) be the real number whose nth decimal digit, after the
decimal point, is equal to 1 if n ∈ B and equal to 0 otherwise. Thus
f (B) = ∑n∈B 1/10ⁿ

What happens if the 10 in the denominator is replaced by 3 or by 2?)


30.5. Both R and C are vector spaces over Q. Prove that they both
have the same Q-dimension. (Hint. First argue that these dimensions
are infinite.)
30.6. Let f : R → R satisfy f (x + y) = f (x) + f (y) for all x, y ∈ R.
If f is a continuous function prove that f (x) = ax for a = f (1) ∈ R.
Show that there are numerous discontinuous examples.


30.7. Explain how Proposition 30.1 is actually used in the solu-


tion to Problems 26.7–26.10. Compare those arguments to the simpler
solution of Problem 30.2 that is suggested above.
30.8. Consider the proof of Theorem 30.1, the Schroeder-Bernstein
theorem, and give an example of a situation where the unblocked hor-
izontal lines have only finitely many distinct elements.
30.9. The proofs of both parts of Theorem 30.2 are a bit skimpy.
Carefully verify the fact that the linearly ordered sets L have upper
bounds in C.
30.10. Verify the material in the proof of Theorem 30.3 concerning
the linear transformation T : V → V /W .

Index

A-blocked line, 232 associated matrix, 184


addition of sizes, 230 Hermitian symmetric, 217
adjoint, 135 normal, 173–175, 179, 180
adjoint matrix, 133 scalar multiplication, 177
algebraically closed field, 6, 146, 167, skew-symmetric, 174, 176, 177, 179
208 symmetric, 174, 177, 179
alternating function, 108 block diagonal matrix, 159, 186
angle, 190 blocked, 232
anti-chain, 228 bound variables, 47, 49, 51
area, 104
ascending chain condition, 227 Cauchy-Schwarz inequality, 198, 204,
associative law, 86 216
associative law of addition, 2, 9 Cayley-Hamilton theorem, 153, 157,
associative law of multiplication, 4, 209
10 matrix version, 155
augmented matrix, 46, 137, 139 chain, 228
Axiom of Choice, 220, 222, 225, change of basis, 95, 186, 214
229–231 bilinear form, 186
axioms of linear transformation, 95
bilinear form, 173 change of basis matrix, 89, 91, 186
determinant, 114 characteristic polynomial, 142, 150,
field, 2 153, 156
Hermitian bilinear form, 213 linear transformation, 143
linear transformation, 55 similar matrices, 143
vector space, 9 choice function, 220
volume function, 108 class map, 72, 73
closure of addition, 2, 9
B-blocked line, 232 closure of multiplication, 4, 10
back map, 59, 229 cofactor expansion, 118
bases column, 118
same size, 234 row, 125
basis, 24, 33, 34, 225–227 column space, 39, 50
existence of, 226, 227 column vector, 39
basis for solution set, 48 commutative law of addition, 2, 10
basis for solution space, 50 commutative law of multiplication, 4
bilinear form, 172, 183, 185, 186 commutative ring, 163
addition, 177 companion matrix, 150

comparable elements, 221 dot product, 191


complement, 35 dual basis, 92
complete inverse image, 61 dual space, 91
complex conjugation, 213
complex dot product, 215 eigenvalues, 97, 98, 146
complex inner product, 215 real, 217
complex inner product space, 215, eigenvectors, 97, 98, 146, 210, 217
216 generalized, 168
complex numbers, 146 linearly independent, 146
complex vector space, 213 elementary column operations, 44
congruent matrices, 186, 190, 207 determinant, 127
conic section, 172 elementary matrices, 140, 141
quadratic part, 172 elementary row operations, 39, 40,
conjugate transpose matrix, 214 46, 140
continuous functions, 192 determinant, 127
continuum, 230 inverse, 40
coset, 69, 70 equivalence classes, 69, 70, 142, 229
countably infinite, 230 equivalence relation, 68, 70, 101,
Cramer’s Rule, 136, 141 190, 229
cut set, 236 Euclidean plane, 172
evaluation at T , 152
decomposable, 164, 165 evaluation map, 56, 144
derivative, 157 even integers, 230
formal, 56 existence of basis, 226, 227
descending chain condition, 227 existence of inverses, 4
determinant, 114, 115, 119, 120 existence of negatives, 3, 10, 73
axioms, 114 existence of one, 4
cofactor expansion, 125 existence of solutions, 136
elementary operations, 127 existence of zero, 3, 10, 73
existence, 114, 117
main diagonal term, 120 field, 2
matrix multiplication, 125 algebraically closed, 6, 146
similar matrices, 126 finite, 75
singular matrix, 119 field axioms, 2
transpose, 124 field of rationals, 75
uniqueness, 114, 123 finite dimensional vector space, 28
determinant of linear finite field, 75
transformation, 149 for all axioms, 72
diagonal matrix, 96, 186, 190, 207, formal derivative, 56, 60
210, 217 formal fraction, 70
diagonalized, 96, 146, 147, 157 free variables, 47, 51
difference operator, 157 function
dimension, 33 identity, 229
definition of, 33, 235 one-to-one, 229
of intersection of subspaces, 35 onto, 229
of sum of subspaces, 35
direct sum, 21 Gaussian elimination, 46, 51
distinct roots, 147 generalized associative law, 7
distributive law, 5, 11, 86 generalized commutative law, 7


generalized eigenvectors, 168, 169
  basis, 168
Gram-Schmidt method, 193, 196, 215

half open interval, 230
Hermitian bilinear form, 213
  axioms, 213
  normal, 214
  symmetric, 214
Hermitian symmetric bilinear form, 214, 215, 217
Hermitian symmetric linear transformation, 217, 218
Hermitian symmetric matrix, 214, 217
Hilbert matrix, 129, 132, 188
homogeneous linear equations, 19, 48
homogeneous equations, 65
horizontal line, 231
hyperbolic plane, 179, 181, 186, 188

ideal, 155, 156
identity element, 90
identity function, 229
identity map, 55, 89, 96
identity matrix, 90
identity operation, 40
image, 61, 62, 165
  basis, 63
  dimension, 63, 64, 235
indecomposable, 164, 165
inequality of size, 230
infinite dimensional vector space, 28
initial segment, 224
inner product, 191, 212
inner product space, 191, 197
interchanges
  number of, 113
  parity, 113
inverse of matrix, 90
irreducible polynomial, 169
isomorphism, 59, 74
isotropic subspace, 178, 181, 186, 188, 211, 214

Jordan blocks, 159, 168
  unique, 168
Jordan canonical form, 167

kernel, 62, 72, 165
  basis, 63
  dimension, 63, 235
kernel function, 173

Law of Cosines, 190, 204
leading 1, 40
Legendre polynomials, 195
linear equations
  homogeneous, 48
  system of, 46, 49
linear functional, 92
linear transformation, 55, 57, 153
  axioms, 55
  class map, 72
  Hermitian symmetric, 217, 218
  image, 62, 63
  kernel, 62, 63
  matrix of, 79
  nonsingular, 64
  one-to-one, 62, 64
  onto, 62
  rank, 64
  symmetric, 208–210
  unitary, 204
  zero, 55
linear transformations
  addition, 77
  composition, 86
  multiplication, 86, 88
  scalar multiplication, 77
  space of, 78
linearly dependent set, 25
linearly independent set, 25, 31, 33, 225, 227
  maximal, 225, 227
linearly ordered set, 221

main diagonal, 96, 159
map
  one-to-one, 59
  onto, 59
matrices
  congruent, 186
  similar, 96
  space of, 78
matrix, 39
  adjoint, 133
  associated with T, 81
  augmented, 46
  block diagonal, 186
  column space, 39
  diagonal, 186, 210, 217
  Hermitian symmetric, 214
  inverse, 90, 134
  nonsingular, 90
  rank, 50, 134
  reduced row echelon form, 41–43
  row echelon form, 41
  row space, 39
  scalar multiplication, 79
  singular, 98
  skew-symmetric, 186
  square, 51
  symmetric, 186, 210
  transpose, 49, 124
matrix addition, 79
matrix multiplication, 81–84, 88, 89
matrix of coefficients, 46, 135
matrix of cofactors, 133
matrix of constants, 46, 136
matrix of unknowns, 136
matrix transpose, 185
maximal condition, 227
maximal element, 221
maximal linearly independent set, 26, 34, 225, 227
maximum size, 232, 233
minimal condition, 227
minimal element, 221
minimal polynomial, 155
minimal polynomial of T, 156
minimal spanning set, 26, 34, 225, 227
modular law, 23
monic polynomial, 142, 150
multilinear function, 108
multiplication of linear transformations, 86, 88
multiplication of matrices, 88

natural numbers, 230
negative definite subspace, 206, 207
negative semidefinite subspace, 206, 207
nilpotent transformation, 157, 162
nonisotropic line, 178, 181, 186, 192, 214
nonisotropic plane, 179
nonisotropic subspace, 178, 192
nonsingular linear transformation, 64
nonsingular matrix, 90, 91, 95
nonsingular subspace, 178, 180, 192
norm, 193, 215
normal bilinear form, 173–175, 179, 180
normal Hermitian bilinear form, 214
normal vector, 193
numerically equal, 68

odd integers, 230
one-to-one, 73
one-to-one correspondence, 229
one-to-one function, 59, 229
onto, 73
onto function, 59, 229
orthogonal direct sum, 192
orthogonal polynomials, 129
orthogonal sets, 193
orthogonal subspaces, 192
orthogonal vectors, 192
orthonormal basis, 193, 195, 197, 204, 210, 216, 217
orthonormal set, 193

parabola, 198
parallelepiped, 104
  generalized, 105
parallelogram, 104, 201
Parallelogram Law, 200, 201, 204, 218
partially ordered set, 221
permutation, 111
perpendicular, 175
  sets, 175
  vectors, 175
perpendicular direct sum, 180, 181, 186, 188, 214
perpendicular subspaces, 180, 192
perpendicular vectors, 192
polynomial
  root, 144
polynomial ring, 12, 56, 101, 142, 169
poset, 221
positive definite subspace, 206, 207
positive semidefinite subspace, 206, 207
power set, 236
projection map, 55

quadratic form, 183
quadrilateral, 201
quotient space, 72, 74

radical, 178
rank, 64, 134, 135
  linear transformation, 64
rank of linear transformation, 79
rank of matrix, 50, 79, 93
rational canonical form, 168
rational field, 75
rational functions, 142
real eigenvalue, 209
real numbers, 146, 190
real polynomial ring, 191, 208
real symmetric matrix, 207
reduced row echelon form, 41–43, 134, 139
reflexive law, 68, 190, 229
Replacement Theorem, 31, 33
respect the function, 73
respects the arithmetic, 71
restriction map, 66
ring, 85
ring of matrices, 85
roots counting multiplicities, 145, 146
roots of polynomial, 144
row echelon form, 40, 41
  reduced, 41
row space, 39, 50
row vector, 39

scalar multiplication, 10
Schroeder-Bernstein theorem, 231, 234, 237
set theory, 220
similar matrices, 96, 101, 210, 217
singular matrix, 98
size
  addition, 230
  inequality, 230
size of set, 229
sizes
  comparable, 232
  maximum, 232
skew-symmetric bilinear form, 174, 176, 177, 179
skew-symmetric matrix, 185, 186
solution set
  basis, 48
solution space
  basis, 50
spanning set, 25, 31, 33, 225, 227
  minimal, 225, 227
square bracket, 39
square matrix, 51
subspace, 17
subspace criteria, 17
super diagonal, 158, 159
Sylvester’s Law of Inertia, 206, 217
symmetric bilinear form, 174, 177, 179, 191
symmetric law, 68, 190, 229
symmetric linear transformation, 208–210
symmetric matrix, 185, 186, 207, 210
system of linear equations, 46, 49, 65, 135
  inconsistent, 47

T-invariant subspace, 165
T-stable subspace, 165
totally ordered set, 221
trace of linear transformation, 149
trace of matrix, 149
transitive law, 68, 190, 221, 229
transpose matrix, 49, 50, 92
Triangle inequality, 200, 218
trigonometric functions, 192

unblocked line, 232, 237
uniqueness of solutions, 66, 136
unital law, 10, 56
unitary linear transformation, 204
upper bound, 222

Vandermonde matrix, 147
variables
  bound, 47, 51
  free, 47, 51
vector, 9
vector space, 9
  axioms, 9
  dimension, 33, 235
  finite dimensional, 28
  infinite dimensional, 28
volume, 105
  n-dimensional, 105
volume function, 108, 109, 114, 118
  existence, 118
  linearly dependent vectors, 109

weight function, 173
well-ordered set, 221
Well-ordering principle, 221, 224, 225

Z-set, 222
Zorn’s lemma, 221, 222, 224–226, 233, 235