
CM222, Linear Algebra

1. Vector Spaces
Definition 1.1. A real vector space (or a vector space over R) is a
set V together with two operations + and . called addition and scalar
multiplication respectively, such that:
(i) addition is a binary operation on V which makes V into an abelian group,
and
(ii) the operation of scalar multiplying an element v ∈ V by an element
λ ∈ R gives an element λ.v of V .
In addition, the following axioms must be satisfied:
(a) 1.v = v for all v ∈ V ;
(b) if λ, μ ∈ R and v ∈ V , then λ.(μ.v) = (λμ).v;
(c) if λ ∈ R and v, w ∈ V , then λ.(v + w) = λ.v + λ.w;
(d) if λ, μ ∈ R and v ∈ V , then (λ + μ).v = λ.v + μ.v.
We refer to the elements of V as vectors.


The definition of a complex vector space is exactly the same except
that the field R of scalars is replaced by the complex field C.
In fact one can have vector spaces over any field; in this course we will not
consider any other fields (but most of what we do will hold with no change
for any field).
Informally, a vector space is a set of elements that can be added and
multiplied by scalars and obey the usual rules.
Example 1.2. Rn = { (x1 , x2 , . . . , xn ) | x1 , x2 , . . . , xn ∈ R } with its usual
operations of vector addition and scalar multiplication is a real vector space.
(We could just as well use column vector notation.)
Example 1.3. {0} (with the output of any operation being 0 of course) is
a real vector space.
Example 1.4. C is a vector space over R (with the usual operations, so
addition is defined by (x1 + iy1 ) + (x2 + iy2 ) = (x1 + x2 ) + i(y1 + y2 ) and
scalar multiplication is defined by λ.(x + iy) = (λx) + i(λy)).
We usually omit the symbol . in the notation for scalar multiplication.
Note also that since V is an abelian group under +, it has an additive
identity (or zero) element, which we'll denote as 0V or simply 0. As usual,
the additive inverse of a vector v ∈ V is denoted −v.
Example 1.5. If m and n are positive integers, we let Mm,n (R) denote the
set of m × n real matrices. We can add two such matrices, the sum being
defined entrywise by the formula

  ( a11 ... a1n )   ( b11 ... b1n )   ( a11 + b11 ... a1n + b1n )
  (  .  ...  .  ) + (  .  ...  .  ) = (     .      ...     .     )
  ( am1 ... amn )   ( bm1 ... bmn )   ( am1 + bm1 ... amn + bmn )

We can abbreviate the above notation by writing the first matrix above as
A = (aij )i,j , meaning that A is the matrix whose entry in row i, column j (or
(i, j)-entry) is aij for i = 1, . . . , m and j = 1, . . . , n. Similarly writing B =
(bij )i,j , the formula becomes A + B = (aij + bij )i,j . Scalar multiplication of
matrices is defined by λA = (λaij )i,j for λ ∈ R and A = (aij )i,j ∈ Mm,n (R).
Recall that Mm,n (R) is an abelian group by standard properties of matrix
addition. It is also easy to see that Mm,n (R) is a vector space: a) It is clear
that 1.A = A from the definition of scalar multiplication. b) If λ, μ ∈ R
and A = (aij )i,j , then
λ(μA) = λ(μaij )i,j = (λ(μaij ))i,j = ((λμ)aij )i,j = (λμ)A
(the third equality follows from associativity of multiplication on R, the rest
from the definition of scalar multiplication). The proofs of axioms c) and d)
are similar, but using the distributive law on R.
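A quick numerical illustration of these operations, as a minimal NumPy sketch (the sample entries are ours, not from the notes): addition and scalar multiplication in Mm,n (R) are carried out entry by entry.

```python
import numpy as np

# Two sample elements of M2,3(R); the entries are illustrative only.
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[0., 1., 0.],
              [2., 0., 2.]])

print(A + B)      # entrywise sum (a_ij + b_ij)
print(2.0 * A)    # scalar multiple (2 * a_ij)
```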
Example 1.6. Let F denote the set of functions f : R → R. Recall that
we can add two functions by adding their values; i.e., if f, g ∈ F, then f + g
is the function defined by (f + g)(x) = f (x) + g(x). Similarly if λ ∈ R and
f ∈ F, then we define the scalar multiple of f by λ by multiplying its values
by λ; i.e., (λf )(x) = λ(f (x)). The proof that F is a real vector space is left
as an exercise. Some interesting subsets of F are also vector spaces, e.g.,
D = { f : R → R | f is differentiable };
P = { f : R → R | f is defined by a polynomial }.
We have the following basic properties, which we leave as an exercise to
deduce from the axioms:
Proposition 1.7. Suppose that V is a real vector space with zero element
0. Then
(1) 0.v = 0 for all v ∈ V ;
(2) λ.0 = 0 for all λ ∈ R;
(3) (−1).v = −v for all v ∈ V .
Definition 1.8. If S is a subset of a vector space V , then the span of S,
denoted by span S, is the set of all finite sums of the form λ1 v1 + λ2 v2 +
· · · + λk vk where λi ∈ R (some may be zero) and vi ∈ S. We call a sum of
the form λ1 v1 + λ2 v2 + · · · + λk vk a linear combination of elements of S;
thus span S is the set of all linear combinations of elements of S. If the span
of a set S is the whole vector space V (i.e. if span S = V ) then S is called
a spanning set.
In practice, the set S will usually be finite, say S = {s1 , s2 , . . . , sm }. In
that case
span S = { λ1 s1 + λ2 s2 + · · · + λm sm | λ1 , λ2 , . . . , λm ∈ R }.
(Note that every vector of the form on the right is in span S, by the definition
of span S, since each si ∈ S. On the other hand, if v ∈ span S, then v =
λ1 v1 + λ2 v2 + · · · + λk vk for some λi ∈ R and vi ∈ S. Since S = {s1 , . . . , sm },
we know that each vi = sji for some ji ∈ {1, 2, . . . , m}. Some of the sj might
appear multiple times, others not at all, but we can combine like terms, add
0.sj 's where necessary and apply the axioms and Prop. 1.7(1) to write v in
the required form.)
By convention, the span of the empty set is {0}.
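Numerically, membership in a span can be tested by comparing ranks: v lies in the span of the columns of a matrix S exactly when adjoining v does not increase the rank. A minimal NumPy sketch (the helper name in_span and the sample vectors are ours, not from the notes):

```python
import numpy as np

def in_span(S, v, tol=1e-10):
    """True if v is a linear combination of the columns of S."""
    S = np.atleast_2d(np.asarray(S, dtype=float))
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    # adjoining v does not raise the rank exactly when v is already in the column span
    return np.linalg.matrix_rank(np.hstack([S, v]), tol=tol) == np.linalg.matrix_rank(S, tol=tol)

S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])          # columns span the x-y plane inside R^3
print(in_span(S, [2.0, -3.0, 0.0]))  # True
print(in_span(S, [0.0, 0.0, 1.0]))   # False
```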
Example 1.9. For i = 1, . . . , n, let si = (0, . . . , 0, 1, 0, . . . , 0) (with a 1 in
the ith place, and 0s elsewhere). Then S = {s1 , . . . , sn } spans Rn .
Example 1.10. {1, i} spans C (as a real vector space).
Example 1.11. Let Eij ∈ Mm,n (R) be the matrix in which the (i, j)-entry is
1 and the rest of the entries are 0. Then { Eij | i = 1, . . . , m, j = 1, . . . , n }
spans Mm,n (R), since any matrix A = (aij )i,j can be written as a linear
combination of the Eij 's:
A = Σi,j aij Eij .

Example 1.12. The set of functions {1, x, x2 , x3 , . . .} spans the space P
of polynomial functions mentioned in Example 1.6 (since, by definition, a
polynomial is a linear combination of such functions).
Lemma 1.13. Suppose that S, S′ are subsets of a vector space V . If S′ ⊆
span S, then span S′ ⊆ span S.
(The proof is straightforward: if v ∈ span S′ , then v is a linear combination
of elements of S′ . But each element of S′ is a linear combination of
elements of S; substituting and expanding gives v as a linear combination
of elements of S.)
Corollary 1.14. If S′ ⊆ span S, then span(S ∪ S′ ) = span S.
Definition 1.15. A subset S of a vector space V is linearly dependent
if there exist distinct elements s1 , s2 , . . . , sk ∈ S and scalars λ1 , λ2 , . . . , λk ,
not all equal to 0, such that λ1 s1 + λ2 s2 + · · · + λk sk = 0. If S is not linearly
dependent we say that S is linearly independent.

By convention, the empty set is linearly independent.


Again in practice S will usually be finite, say S = {s1 , s2 , . . . , sm } with
s1 , . . . , sm distinct. Then S is linearly independent if
(∗)   λ1 s1 + λ2 s2 + · · · + λm sm = 0 implies that λ1 = λ2 = · · · = λm = 0 .

Example 1.16. The subset S = {s1 , s2 , . . . , sn } of Rn in Example 1.9 is
linearly independent.
Example 1.17. The subset S = {(0, 1), (1, 0), (1, 1)} of R2 is linearly
dependent since 1(0, 1) + 1(1, 0) + (−1)(1, 1) = (0, 0).
Example 1.18. Let F be the set of functions as in Example 1.6. The
subset S = {f, g, h} where f (x) = x + 1, g(x) = x + 2, h(x) = x + 3 is
linearly dependent (exercise).
It is sometimes convenient to be able to use the characterization of linear
independence in (∗) above without demanding that s1 , s2 , . . . , sm be distinct.
Note that if they are not distinct, i.e., si = sj for some i ≠ j, then
1.si + (−1)sj = 0, so we automatically view {s1 , s2 , . . . , sm } as being linearly
dependent if there is any repetition among the si . For example, we view
{(0, 1), (0, 1)} as linearly dependent, even though the set S with single element
(0, 1) is linearly independent. It should be clear from the context whether
we mean the sequence of elements or the set.
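Linear independence of finitely many vectors can be checked numerically by a rank computation: the vectors are independent exactly when the matrix with those vectors as rows has full row rank. A minimal NumPy sketch (the helper name is ours):

```python
import numpy as np

def is_linearly_independent(vectors, tol=1e-10):
    """Vectors (given as rows) are independent iff the matrix has full row rank."""
    M = np.asarray(vectors, dtype=float)
    return np.linalg.matrix_rank(M, tol=tol) == M.shape[0]

print(is_linearly_independent([[0, 1], [1, 0], [1, 1]]))  # False (Example 1.17)
print(is_linearly_independent([[0, 1], [1, 0]]))          # True
```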
Lemma 1.19. Suppose that s1 , s2 , . . . , sm+1 are elements of a vector space V
and that S = {s1 , s2 , . . . , sm } is linearly independent. Then sm+1 ∈ span S
if and only if {s1 , s2 , . . . , sm , sm+1 } is linearly dependent.
Proof. Suppose first that sm+1 ∈ span S. Then sm+1 = λ1 s1 + · · · + λm sm
for some λi ∈ R. Therefore
λ1 s1 + · · · + λm sm + (−1)sm+1 = 0,
so {s1 , s2 , . . . , sm , sm+1 } is linearly dependent. (Note that the coefficient of
sm+1 is −1 ≠ 0.)
Conversely suppose that {s1 , s2 , . . . , sm , sm+1 } is linearly dependent. This
means that
λ1 s1 + · · · + λm sm + λm+1 sm+1 = 0
for some λ1 , . . . , λm+1 ∈ R with some λi ≠ 0. Note that λm+1 ≠ 0 (otherwise
we'd have λ1 s1 + · · · + λm sm = 0, contradicting that S is linearly
independent). It follows that
sm+1 = −(λ1 /λm+1 )s1 − · · · − (λm /λm+1 )sm ,
so sm+1 ∈ span S.

Definition 1.20. A subset S of a vector space V is a basis of V if it is
linearly independent and spans V . If V has a finite basis we say that V is
finite-dimensional.

Example 1.21. The sets in Examples 1.9, 1.10 and 1.11 are all bases of the
vector spaces being considered; they are all therefore finite-dimensional.
Example 1.22. The set {1, x, x2 , . . .} is a basis for P (which is not
finite-dimensional).
Example 1.23. The set in Example 1.17 spans R2 but is not a basis of R2
since it is not linearly independent.
Theorem 1.24. If S = {v1 , v2 , . . . , vm } spans V then there is a basis of V
which is a subset of S.
Proof. (Sketch) If S is linearly independent, then it is a basis and we are
done, so suppose S is linearly dependent. This means that
λ1 v1 + · · · + λm vm = 0
for some λ1 , . . . , λm ∈ R with some λi ≠ 0. We can reorder the vi and
so assume that λm ≠ 0. As in the proof of Lemma 1.19, we get that
vm ∈ span S′ where S′ = {v1 , . . . , vm−1 }. By Corollary 1.14, we see that
span S′ = span(S′ ∪ {vm }) = span S = V , so S′ spans V .
If S′ is linearly independent, then we are done. Otherwise repeat the
process, which must eventually terminate since S is finite.
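The procedure in the proof can be mirrored numerically: run through the spanning set and keep each vector that is not already in the span of those kept so far. A minimal NumPy sketch (the helper name and the sample set are ours):

```python
import numpy as np

def extract_basis(spanning_set, tol=1e-10):
    """Keep each vector that is not in the span of the vectors already kept."""
    basis = []
    for v in spanning_set:
        candidate = np.array(basis + [v], dtype=float)
        if np.linalg.matrix_rank(candidate, tol=tol) == len(candidate):
            basis.append(v)
    return basis

S = [[0, 1], [1, 0], [1, 1]]   # spans R^2 but is linearly dependent
print(extract_basis(S))        # [[0, 1], [1, 0]]
```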

Corollary 1.25. If V is spanned by a finite set, then V is finite-dimensional.
Theorem 1.26. If V is finite-dimensional and S = {v1 , v2 , . . . , vk } is linearly independent, then there is a basis of V which contains S.
Proof. Since V is finite-dimensional, V is spanned by a finite set S′ =
{s1 , . . . , sn } for some s1 , . . . , sn ∈ V . The idea of the proof is to show that
we can choose elements from S′ to add to S to form a basis.
If S spans V , then S itself is a basis, so there is nothing to prove.
So suppose S does not span V . Then S′ ⊄ span S (since if S′ ⊆ span S,
then Lemma 1.13 would show that V = span S′ ⊆ span S, contradicting
the assumption that S does not span V ). So we can choose some si1 ∈ S′
such that si1 ∉ span S. By Lemma 1.19, S1 = {v1 , . . . , vk , si1 } is linearly
independent.
If S1 = {v1 , . . . , vk , si1 } spans V , then S1 is a basis containing S, and we
are done. If S1 does not span V , then repeat the process using S1 instead of
S to get si2 ∈ S′ such that S2 = {v1 , . . . , vk , si1 , si2 } is linearly independent.
(Note that si2 ≠ si1 since si2 ∉ span S1 .) If S2 spans V we are done; if not
then repeat the process, which must yield a spanning set in at most n steps
since we keep choosing distinct si from S′ and S′ itself spans V .

Theorem 1.27. Suppose that S = {v1 , . . . , vn } spans V and S′ = {w1 , . . . , wm }
is a set of m linearly independent vectors in V . Then m ≤ n.
Proof. The idea of the proof is to show that we can replace each wi with
some vi (reordering the vi if necessary), leading to a contradiction if m > n.
If each vi ∈ span{w2 , . . . , wm }, then S ⊆ span{w2 , . . . , wm } implies that
V = span S ⊆ span{w2 , . . . , wm } (Lemma 1.13) and therefore that w1 ∈
span{w2 , . . . , wm }, contradicting that S′ is linearly independent (Lemma 1.19).
Therefore vi ∉ span{w2 , . . . , wm } for some i; reordering we can assume
i = 1. Applying Lemma 1.19 again, we see that {v1 , w2 , . . . , wm } is linearly
independent.
Now repeat the process with {v1 , w3 , . . . , wm } instead of {w2 , . . . , wm }
to get that vi ∉ span{v1 , w3 , . . . , wm } for some i ≠ 1; reordering we can
assume i = 2, so {v1 , v2 , w3 , . . . , wm } is linearly independent.
Continuing in this way, we get that if m > n, then {v1 , . . . , vn , wn+1 , . . . , wm }
is linearly independent, but wn+1 ∈ V = span{v1 , . . . , vn }, contradicting
Lemma 1.19.

This easily gives:
Theorem 1.28. [BASIS THEOREM] Every basis of a finite-dimensional
vector space has the same number of elements.
Definition 1.29. If V is a finite-dimensional vector space, the dimension
of V (denoted dim V ) is the number of elements in any basis of V .
For example, Rn has dimension n, C has dimension 2 (as a vector space
over R), and Mm,n (R) has dimension mn.
Let V be a finite-dimensional vector space with basis S = {v1 , v2 , . . . , vn }.
Then any element v ∈ V can be written uniquely in the form
v = λ1 v1 + λ2 v2 + · · · + λn vn
where the λi are scalars. (It can be written this way since S spans, and
uniquely since S is linearly independent.) The n-tuple (λ1 , λ2 , . . . , λn )t is
called the n-tuple of coordinates of v with respect to the basis {v1 , v2 , . . . , vn }.
(Coordinates of vectors are normally regarded as column vectors but are
sometimes written in rows, or using transpose notation, to save space.)
Note that if V = Rn and S = {s1 , . . . , sn } is its standard basis (Example
1.9), then the n-tuple of coordinates of a vector is given by its usual
coordinates (since v = (λ1 , . . . , λn ) = λ1 s1 + · · · + λn sn ). The coordinates
with respect to a different basis will be different.
Example 1.30. Let v1 = (2, 1)t and v2 = (3, 1)t in R2 . Then {v1 , v2 } is
linearly independent and spans, and therefore forms a basis. The standard
basis vector s1 = (1, 0)t has coordinates (1, 0)t with respect to the standard
basis, but has coordinates (−1, 1)t with respect to {v1 , v2 } (since
s1 = −v1 + v2 ). The vector v1 has coordinates (2, 1)t with respect to the
standard basis, and has coordinates (1, 0)t with respect to {v1 , v2 }.

Suppose that S = {v1 , . . . , vn } and S′ = {v1′ , . . . , vn′ } are bases for V , and
that v ∈ V has coordinates (λ1 , . . . , λn )t with respect to S and (λ1′ , . . . , λn′ )t
with respect to S′ (where t denotes the transpose). We can relate the
coordinate vectors with respect to the different bases as follows: Note that
each vj ∈ S has an n-tuple of coordinates (t1j , . . . , tnj ) with respect to S′ .
Define the transition matrix from S to S′ as the n × n-matrix T = (tij )i,j .
Then
T (λ1 , . . . , λn )t = (λ1′ , . . . , λn′ )t ,
i.e., multiplication by T converts the coordinates with respect to S into the
coordinates with respect to S′ . (Proof:
v = Σj λj vj = Σj λj Σi tij vi′ = Σi ( Σj tij λj ) vi′ ,
so λi′ = Σj tij λj for i = 1, . . . , n.)
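A minimal NumPy sketch of coordinates and the transition matrix, using the basis of Example 1.30 (the variable names are ours): if the basis vectors are the columns of B, then the coordinates of x with respect to that basis solve Bc = x, so the transition matrix from the standard basis is B inverse.

```python
import numpy as np

# Basis S' = {v1, v2} from Example 1.30, written as the columns of B.
B = np.array([[2.0, 3.0],
              [1.0, 1.0]])

# Coordinates of x with respect to S' solve B @ c = x, so the transition
# matrix from the standard basis to S' is the inverse of B.
T = np.linalg.inv(B)

x = np.array([1.0, 0.0])    # the standard basis vector s1
print(T @ x)                # [-1.  1.]   (so s1 = -v1 + v2)
print(B @ (T @ x))          # [1. 0.]     recovers x from its S'-coordinates
```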

Definition 1.31. A subset W of a vector space V is called a subspace if
it is a vector space (with the operations as in V ).
Equivalently, a subset W of V is a subspace if it is non-empty and closed
under the operations of addition and scalar multiplication. (See the
exercises.)
For example,
- The set of solutions in Rn to a system of homogeneous linear equations
is a subspace of Rn .
- The set { f ∈ P | f has degree ≤ n } is a subspace of P, which is a
subspace of D, which is a subspace of F (see Example 1.6).
- If S is any subset of V , then span S is a subspace of V (exercise).
Lemma 1.32. If W is a subspace of an n-dimensional vector space V then
W is finite-dimensional with dimension m ≤ n. The case m = n holds if
and only if W = V .
Proof. We need to show that W has a basis with at most n elements. If
W = {0}, then dim W = 0 (the empty set is a basis) and the lemma holds.
If W ≠ {0}, then W contains some non-zero vector w1 , and S1 = {w1 } is
linearly independent. If S1 spans W , then S1 is a basis, so dim W = 1. If
S1 does not span W , then there is some w2 ∉ span S1 , and by Lemma 1.19,
S2 = {w1 , w2 } is linearly independent. Continuing in this way, we either
get that Sm = {w1 , . . . , wm } is a basis for W for some m < n, or that
Sn = {w1 , . . . , wn } is linearly independent, in which case Sn is a basis for
V , so V = span Sn ⊆ W , so W = V and has dimension n.

2. Linear Maps
Definition 2.1. Let V and W be vector spaces. A map (or function or
transformation) f : V → W is said to be linear if it preserves (or respects)
addition and scalar multiplication, that is
(i) f (v1 + v2 ) = f (v1 ) + f (v2 ) for all v1 , v2 ∈ V and
(ii) f (λv) = λf (v) for all scalars λ and v ∈ V .

Example 2.2. Let V = Rn and W = Rm (viewed as column vectors). If A is
an m × n-matrix (over R), then f (v) = Av defines a linear map f : V → W .
Example 2.3. Differentiation d/dx defines a linear map from D to F.

Definition 2.4. A map f : V → W is said to be an isomorphism (of real
vector spaces) if f is linear and bijective. If there is an isomorphism from
V to W , then we say V is isomorphic to W .
Proposition 2.5. Suppose U, V, W are vector spaces and f : V → W and
g : U → V are linear maps. Then
(1) The composite f ∘ g from U to W is linear.
(2) If f is an isomorphism, then so is its inverse map from W to V .
The proof is left as an exercise.
Example 2.6. Suppose that V has dimension n and {v1 , . . . , vn } is a basis
for V . Define f : Rn → V by f ((λ1 , . . . , λn )t ) = λ1 v1 + · · · + λn vn . Then f
is an isomorphism, the inverse being the map sending v to its coordinates.
This shows that every vector space of dimension n is isomorphic to Rn .
Definition 2.7. If f : V → W is a linear map, the kernel (or null space)
ker(f ) and the image (or range) im(f ) of f are defined by
(i) ker(f ) = { v ∈ V | f (v) = 0 } ,
(ii) im(f ) = { w ∈ W | w = f (v) for some v ∈ V } .
Proposition 2.8. If f : V → W is a linear map, then
(1) ker(f ) is a subspace of V ;
(2) im(f ) is a subspace of W ;
(3) ker(f ) = {0} if and only if f is injective.
The proof is left as an exercise.

Example 2.9. Let f : V → W be as in Example 2.2. Then ker(f ) is the
null space of A (the set of solutions to Av = 0) and im(f ) is the span of the
columns of A.
Lemma 2.10. Suppose that f : V → W is a linear map, {w1 , . . . , wr } is a
basis for im(f ) and {v1 , . . . , vk } is a basis for ker(f ). For i = 1, . . . , r, let
ui ∈ V be such that f (ui ) = wi . Then S = {u1 , . . . , ur , v1 , . . . , vk } is a basis
for V .

Proof. We must show that 1) S is linearly independent, and 2) S spans V .
1) Suppose that λ1 u1 + · · · + λr ur + μ1 v1 + · · · + μk vk = 0. We must show
that λ1 = · · · = λr = μ1 = · · · = μk = 0. Applying f to the given equation
gives f (λ1 u1 + · · · + λr ur + μ1 v1 + · · · + μk vk ) = f (0) = 0. Since f is linear,
we have
f (λ1 u1 + · · · + λr ur + μ1 v1 + · · · + μk vk )
= λ1 f (u1 ) + · · · + λr f (ur ) + μ1 f (v1 ) + · · · + μk f (vk )
= λ1 w1 + · · · + λr wr = 0.
Since {w1 , . . . , wr } is linearly independent, we get λ1 = · · · = λr = 0.
Therefore the given equation becomes μ1 v1 + · · · + μk vk = 0. Since {v1 , . . . , vk }
is linearly independent, it follows that μ1 = · · · = μk = 0 as well.
2) Suppose that v ∈ V . Since f (v) ∈ im(f ) = span{w1 , . . . , wr }, we
have f (v) = λ1 w1 + · · · + λr wr for some λ1 , . . . , λr ∈ R. Letting v′ =
v − (λ1 u1 + · · · + λr ur ), we have
f (v′ ) = f (v) − (λ1 f (u1 ) + · · · + λr f (ur ))
= f (v) − λ1 w1 − · · · − λr wr = 0.
Therefore v′ ∈ ker(f ) = span{v1 , . . . , vk }, so we have v′ = μ1 v1 + · · · + μk vk
for some μ1 , . . . , μk ∈ R. It follows that
v = λ1 u1 + · · · + λr ur + μ1 v1 + · · · + μk vk
is in the span of S.

Definition 2.11. If V and W are finite-dimensional and f : V → W is a
linear map, we define the rank of f (denoted rank(f )) to be the dimension
of im(f ), and the nullity of f (denoted nullity(f )) to be the dimension of
ker(f ).
It follows immediately from the lemma that:
Theorem 2.12. If V and W are finite-dimensional and f : V → W is
linear, then
rank(f ) + nullity(f ) = dim(V ) .
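A quick numerical check of the rank-nullity theorem for a map x ↦ Ax, as a minimal NumPy sketch (the 3 × 5 matrix below is an illustration of ours, not from the notes):

```python
import numpy as np

A = np.array([[1., 2., 0., 1., 3.],
              [0., 1., 1., 0., 1.],
              [1., 3., 1., 1., 4.]])   # third row = first + second, so rank 2

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank            # dimension of the solution space of Ax = 0
print(rank, nullity, rank + nullity == A.shape[1])   # 2 3 True
```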
Suppose that f : V → W is a linear map, and we are given bases S =
{v1 , . . . , vn } for V and S′ = {w1 , . . . , wm } for W . For j = 1, . . . , n, let
(a1j , . . . , amj )t be the coordinates of f (vj ) ∈ W . Define the matrix of f
with respect to S and S′ to be the m × n-matrix Af = (aij )i,j . Then the
map f can be described on coordinates as multiplication (on the left) by
Af , in the following sense:
Lemma 2.13. If v ∈ V has coordinates (λ1 , . . . , λn )t , then f (v) has
coordinates Af (λ1 , . . . , λn )t .
The proof is straightforward: Af is defined so that f (vj ) = Σi aij wi .
That v has coordinates (λ1 , . . . , λn )t means that v = Σj λj vj . Therefore
f (v) = Σj λj f (vj ) = Σi,j aij λj wi
has ith coordinate Σj aij λj .

Example 2.14. For f : V → W as in Example 2.2, the matrix of f with
respect to the standard basis is just A.
If we fix bases for V and W , then each matrix A ∈ Mm,n (R) uniquely
determines a linear map f : V → W whose matrix is A (the proof is an
exercise). Recall that the choices of bases determine isomorphisms φ : Rn → V
and ψ : Rm → W (where φ(x) is the vector in V whose coordinates are x,
and ψ(y) is the vector in W with coordinates y). If f has matrix A, then
ψ−1 ∘ f ∘ φ is the map sending x to Ax.
Proposition 2.15. If U , V and W are vector spaces with bases {u1 , . . . , uk },
{v1 , . . . , vn } and {w1 , . . . , wm }, and f : V → W and g : U → V are linear
maps, then Af ∘g = Af Ag .
Proof. If u ∈ U has coordinates x = (λ1 , . . . , λk )t (with respect to the chosen
basis), then g(u) has coordinates Ag x, and f (g(u)) has coordinates Af (Ag x) =
(Af Ag )x. Therefore Af Ag is the matrix of f ∘ g.

Note that in the proposition, we are assuming the matrices are always
with respect to the chosen bases for U , V and W . Choosing different
bases will give different matrices:

Proposition 2.16. Suppose that S and S′ are bases for V , and T and T′
are bases for W . If a linear map f : V → W has matrix Af with respect to
S and T , then the matrix of f with respect to S′ and T′ is A′f = QAf P−1 ,
where P is the transition matrix from S to S′ and Q is the transition matrix
from T to T′ .
Proof. If v ∈ V has coordinates x = (λ1 , . . . , λn )t with respect to S′ , then
v has coordinates P−1 x with respect to S, so f (v) has coordinates Af P−1 x
with respect to T , and QAf P−1 x with respect to T′ .

Example 2.17. Let A = ( 1 −1 ; 1 0 ) (writing matrices row by row, with
rows separated by semicolons) and define f : R2 → R2 by f (x) = Ax. Then
the matrix of f with respect to the standard basis S is just A. The matrix
of f with respect to the basis S′ = {(2, 1)t , (3, 1)t } is given by A′ = P−1 AP ,
where P = ( 2 3 ; 1 1 ) is the transition matrix from S′ to S, so
A′ = ( −1 3 ; 1 −2 ) ( 1 −1 ; 1 0 ) ( 2 3 ; 1 1 ) = ( 5 7 ; −3 −4 ).
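A minimal NumPy sketch verifying the change-of-basis computation above (the entries of A and P follow the reconstruction in Example 2.17):

```python
import numpy as np

A = np.array([[1., -1.],
              [1.,  0.]])
P = np.array([[2., 3.],
              [1., 1.]])   # transition matrix from S' to S (columns are the S'-basis vectors)

A_prime = np.linalg.inv(P) @ A @ P
print(A_prime)             # [[ 5.  7.]
                           #  [-3. -4.]]
```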
Recall that if f : V → W is a linear map, then ker(f ) = { v ∈ V | f (v) =
0 } is a subspace of V of dimension nullity(f ), and im(f ) = { f (v) ∈ W | v ∈
V } is a subspace of W of dimension rank(f ). If f : Rn → Rm is defined by
f (x) = Ax (see Example 2.9), then we also write nullity(A) for the nullity
of f (the dimension of the solution space of Ax = 0) and rank(A) for the
rank of f (the dimension of the span in Rm of the columns of A, so this
is the column-rank of A, but we'll see shortly this is the same as the
row-rank).
Theorem 2.12 says that
dim(V ) = rank(f ) + nullity(f ).
(assuming V and W are finite-dimensional). We know that:
f is injective if and only if nullity(f ) = 0;
f is surjective if and only if rank(f ) = dim(W );
so f is an isomorphism if and only if nullity(f ) = 0 and rank(f ) =
dim(W ).
It follows easily that:
Proposition 2.18. If f : V W is an isomorphism then dim(V ) =
dim(W ). If dim(V ) = dim(W ), then the following are equivalent:
(1) f is injective;
(2) f is surjective;
(3) f is an isomorphism.
An n × n-matrix A is invertible if and only if there is an n × n-matrix B
such that AB = BA = In . Because of the correspondence between matrices
and linear maps, A is invertible if and only if the associated linear map
f : Rn → Rn (sending x to Ax) is an isomorphism. So in that context
Prop. 2.18 says that:
A is invertible ⟺ nullity(A) = 0 ⟺ rank(A) = n.

(Such an A is also called non-singular.)

Composing with an isomorphism doesn't change rank or nullity:
Lemma 2.19. Suppose that f : V → W and g : U → V are linear maps of
finite-dimensional vector spaces.
(1) If f is an isomorphism, then
nullity(f ∘ g) = nullity(g) and rank(f ∘ g) = rank(g).
(2) If g is an isomorphism, then
nullity(f ∘ g) = nullity(f ) and rank(f ∘ g) = rank(f ).
Proof. 1) We show that ker(f ∘ g) = ker(g). If u ∈ ker(f ∘ g), then f (g(u)) =
0. Since f is injective, this implies that g(u) = 0, so u ∈ ker(g). On the
other hand if u ∈ ker(g), then g(u) = 0 so f (g(u)) = 0 and u ∈ ker(f ∘ g).
It follows that nullity(f ∘ g) = nullity(g). The assertion about ranks follows
from Thm. 2.12. (Note that in this part we only needed that f is injective.)
2) Similarly to 1), we see that since g is surjective, im(f ∘ g) = im(f ),
and we get the assertion about ranks. The assertion about nullity then
follows from Thm. 2.12 (and the fact that dim(U ) = dim(V ) since g is an
isomorphism).
Applying this to multiplication by matrices, we see that if A ∈ Mm,n (R),
and P is an invertible n × n-matrix and Q is an invertible m × m-matrix,
then A has the same rank and nullity as QAP . Note also that the rank and
nullity of f : V → W are the same as those of its matrix Af (for any choice
of bases for V and W ). Lemma 2.10 shows that we can choose the bases so
that Af has a very simple form:
Lemma 2.20. Suppose that f : V → W is a linear map of finite-dimensional
vector spaces. Then there are bases {v1 , v2 , . . . , vn } of V and {w1 , w2 , . . . , wm }
of W such that the matrix of f with respect to the chosen bases has the
partitioned form
Af = ( Ir 0 ; 0 0 )
where r = rank(f ), Ir is the r × r identity matrix and the 0's denote zero
matrices of appropriate sizes. (If r = m then the bottom row is absent; if
r = n then the right hand column is absent; if r = 0 then Af is the zero
matrix.)
Proof. Let S = {u1 , . . . , ur , v1 , . . . , vk } be the basis for V in Lemma 2.10.
So {w1 , . . . , wr } (where each wi = f (ui )) is a basis for im(f ), and {v1 , . . . , vk }
is a basis for ker(f ). Since {w1 , . . . , wr } is linearly independent, it can
be extended to a basis T = {w1 , . . . , wm } for W (by Thm. 1.26). Now
f (u1 ) has coordinates (1, 0, . . . , 0)t with respect to T , f (u2 ) has coordinates
(0, 1, 0, . . . , 0)t . . . and f (ur ) has coordinates (0, 0, . . . , 0, 1, 0, . . . , 0)t (with a 1
in the rth coordinate). Also, f (v1 ) = . . . = f (vk ) have coordinates (0, . . . , 0),
so the matrix of f is exactly what we want.

Corollary 2.21. If A ∈ Mm,n (R), then there are invertible matrices P ∈
Mn,n (R) and Q ∈ Mm,m (R) such that
A = Q ( Ir 0 ; 0 0 ) P.

Recall the definition and a few basic facts about transpose matrices: If
A = (aij )i,j is an m × n-matrix, then At denotes its n × m transpose matrix
whose (j, i)-entry is aij . In other words the rows of At are the columns of A
and vice-versa.
If A ∈ Mm,n (R) and B ∈ Mn,k (R), then (AB)t = B t At . If P is an
invertible n × n-matrix, then so is P t (since if P Q = QP = In , then
P t Qt = Qt P t = In ).
Note that the span (in Rn ) of the rows of A is the same as the span of
the columns of At , so the row-rank of A is the same as the (column-)rank
of At .
Corollary 2.22. rank(A) = rank(At ), so the dimension of the span of the
columns of A is the same as the dimension of the span of the rows of A.
Proof. Write A = Q ( Ir 0 ; 0 0 ) P as in Corollary 2.21. Then
At = P t ( Ir 0 ; 0 0 ) Qt .
Since P t and Qt are invertible, we get rank(At ) = r = rank(A).

3. Equations and Matrices


Recall that if A is an m × n-matrix, then Ax = 0 defines a homogeneous
system of linear equations (with m equations in n unknowns). The set
of solutions is precisely the kernel of the corresponding linear map
f : Rn → Rm . So Theorem 2.12 gives:
Theorem 3.1. [Homogeneous Linear Equations.] Let A be an m × n matrix.
The homogeneous system of linear equations Ax = 0 has a non-trivial
solution if and only if rank(A) < n. The dimension of the solution space is
n − rank(A).
Consequently, if m < n (i.e. if there are more unknowns than equations)
then there is always a non-trivial solution.
Recall the usual method for finding the space of solutions is to apply
elementary row operations to put A into Row Echelon Form, from which
the solution space can be read off. Recall the three types of elementary row
operations:
(I) Interchange two rows.
(II) Multiply one row by a non-zero scalar.
(III) Add a scalar multiple of one row to another row.
Example 3.2. Let
A = ( 0 1 −1 1 −1 ; −1 1 −3 0 2 ; 1 0 2 1 −3 ; 2 1 3 0 −1 ).
Denoting the ith row by Ri , apply the row operation exchanging R1 with R2 ,
and then multiply (the new) R1 by −1 to get
( 1 −1 3 0 −2 ; 0 1 −1 1 −1 ; 1 0 2 1 −3 ; 2 1 3 0 −1 ).
Then (sometimes combining elementary row operations to save writing. . . )
R3 ← R3 − R1 , R4 ← R4 − 2R1 gives
( 1 −1 3 0 −2 ; 0 1 −1 1 −1 ; 0 1 −1 1 −1 ; 0 3 −3 0 3 );
R3 ← R3 − R2 , R4 ← R4 − 3R2 gives
( 1 −1 3 0 −2 ; 0 1 −1 1 −1 ; 0 0 0 0 0 ; 0 0 0 −3 6 );
R3 ↔ R4 , R3 ← (1/3)R3 gives
( 1 −1 3 0 −2 ; 0 1 −1 1 −1 ; 0 0 0 −1 2 ; 0 0 0 0 0 );
R2 ← R2 + R3 , R1 ← R1 + R2 gives
( 1 0 2 0 −1 ; 0 1 −1 0 1 ; 0 0 0 −1 2 ; 0 0 0 0 0 ).
The resulting matrix defines a system of equations equivalent to the original
one, whose solutions have x3 and x5 arbitrary, with
x1 = −2x3 + x5 ,   x2 = x3 − x5 ,   x4 = 2x5 .
We can therefore describe the general solution in the form
x = x3 (−2, 1, 1, 0, 0)t + x5 (1, −1, 0, 2, 1)t .
The solution space therefore has these two vectors as a basis. (Note that
this procedure computes the rank and nullity as well. In this case the rank
is 3 and nullity is 2.)
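A basis of the solution space of a homogeneous system can also be computed numerically from the singular value decomposition. A minimal NumPy sketch (the helper name and the small illustrative matrix below are ours, not the matrix of Example 3.2):

```python
import numpy as np

def null_space_basis(A, tol=1e-10):
    """Orthonormal basis of the solution space of A x = 0, via the SVD."""
    A = np.asarray(A, dtype=float)
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T                 # columns span ker(A)

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])           # rank 1, so nullity 2
N = null_space_basis(A)
print(N.shape[1])                      # 2
print(np.allclose(A @ N, 0))           # True
```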
To see that row operations yield an equivalent system of equations, we can
make the following observation: the effect of applying a row operation to a
matrix A is the same as that of multiplying A on the left by the corresponding
elementary matrix:
(I) EI (p, q) (interchanging row p and row q): all the diagonal entries of
EI (p, q) except the pth and the q th are 1, the (p, q)th and the (q, p)th
entries are 1 and all the remaining entries are 0.
(II) EII (p, λ) (multiplying row p by a non-zero scalar λ): all the diagonal
entries of EII (p, λ) except the pth are 1, the (p, p)th entry is λ and
all the remaining entries are 0.
(III) EIII (p, q, λ) (adding λ times row p to row q): all the diagonal entries
of EIII (p, q, λ) are 1, the (q, p)th entry is λ and all the remaining
entries are 0.
Note that each of these matrices is invertible: The inverses of EI (p, q),
EII (p, λ) and EIII (p, q, λ) are EI (p, q), EII (p, λ−1 ) and EIII (p, q, −λ)
respectively.
Therefore the effect of applying a sequence of elementary row operations
to A is to replace A by Ek Ek−1 · · · E2 E1 A where E1 , . . . , Ek are the
corresponding elementary matrices. Thus we are replacing A by QA where
Q = Ek Ek−1 · · · E2 E1 is invertible. Note that if Ax = 0 then QAx = 0, and
if QAx = 0 then Ax = Q−1 QAx = 0. So the two systems of equations have
the same solutions.
Similarly column operations correspond to multiplying A on the right by
elementary matrices:
(I) EI (p, q) interchanges column p and column q;
(II) EII (p, λ) multiplies column p by λ;
(III) EIII (p, q, λ) adds λ times column q to column p.
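A minimal NumPy sketch of elementary matrices of types I and III and their action by left multiplication (the function names and the sample matrix are ours; rows are 0-indexed in the code):

```python
import numpy as np

def E_I(n, p, q):
    """Elementary matrix that swaps rows p and q."""
    E = np.eye(n)
    E[[p, q]] = E[[q, p]]
    return E

def E_III(n, p, q, lam):
    """Elementary matrix that adds lam times row p to row q."""
    E = np.eye(n)
    E[q, p] = lam
    return E

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
print(E_I(3, 0, 1) @ A)          # rows 0 and 1 swapped
print(E_III(3, 0, 2, -5.0) @ A)  # row 2 replaced by row 2 - 5*row 0
```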
Recall that the image of the map f : Rn → Rm defined by f (x) = Ax is
given by the span of the columns of A, and this is the set of b ∈ Rm such that
Ax = b has solutions. Applying column operations amounts to multiplying
A on the right by an invertible matrix P , and Ax = b has solutions if and
only if AP y = b has solutions (set x = P y and y = P−1 x). This can also
be thought of as saying that column operations don't change the span of the
columns. So for example, you could use column operations to find a basis
for the image of a linear map.
Example 3.3. To find a basis for the image of the map f : R3 → R4 defined
by the matrix
( 1 2 1 ; 1 1 2 ; 2 1 1 ; 1 0 1 ),
apply column operations; these give a matrix whose non-zero columns are
(1, 1, 2, 1)t and (0, 3, 3, 2)t , so the rank of f is two, and a basis for the image
is {(1, 1, 2, 1)t , (0, 3, 3, 2)t }.
Note that row operations could be used to compute the rank of A, but that
would change the span of the columns of A (while preserving the span of its
rows).
Note that Ax = b has solutions (i.e., b is in the span of the columns of
A) if and only if the rank of the augmented matrix (A|b) is the same as the
rank of A. One can use row operations on this augmented matrix to find
the set of solutions to the system. We use the augmented matrix (A|b)
to represent the system, and apply row operations to obtain an augmented
matrix (A′ |b′ ) with A′ in row echelon form. Note that the new matrix
represents an equivalent system since applying the same series of row
operations to both A and b amounts to multiplying them both on the left by
the same invertible matrix, say Q, and the equation Ax = b is equivalent to
the equation QAx = Qb. In particular if A′ is in row echelon form, one can
immediately tell whether A′ x = b′ = (b′1 , . . . , b′m )t has a solution, according
to whether b′i = 0 for all i > r, where r is the rank of A′ . Furthermore if
there are any solutions, they can easily be read off from (A′ |b′ ).
Example 3.4. Consider the system Ax = (1, 2, 1, 1)t where A is as in
Example 3.2. The same sequence of row operations, applied now to the
augmented matrix (A|b), gives
( 1 0 2 0 −1 | 1 ; 0 1 −1 0 1 | 1 ; 0 0 0 −1 2 | 0 ; 0 0 0 0 0 | 0 ).
The solutions are therefore given by
x1 = 1 − 2x3 + x5 ,   x2 = 1 + x3 − x5 ,   x4 = 2x5
with x3 , x5 arbitrary. We can therefore describe the general solution as
x = (1, 1, 0, 0, 0)t + x3 (−2, 1, 1, 0, 0)t + x5 (1, −1, 0, 2, 1)t .
Note the structure of the set of solutions of a non-homogeneous system.
If x0 is a fixed solution, then Ax = b = Ax0 if and only if A(x − x0 ) = 0,
if and only if x = x0 + y for some solution y of the homogeneous system
Ay = 0. We sum this up as follows:
Theorem 3.5. [Non-Homogeneous Linear Equations.] Let A be an m × n
matrix and b be a non-zero vector in Rm . The non-homogeneous system of
linear equations Ax = b has a solution if and only if rank(A) = rank(A|b)
where (A|b) is the m × (n + 1) augmented matrix obtained by adjoining b to
A as the (n + 1)th column. If a solution x0 to the system exists then every
solution is of the form x0 + y where y is any solution of the homogeneous
system Ay = 0. In particular, the system has a unique solution if and only
if rank(A) = rank(A|b) = n.
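The rank criterion of Theorem 3.5 is easy to test numerically. A minimal NumPy sketch (the helper name and sample system are ours):

```python
import numpy as np

def has_solution(A, b, tol=1e-10):
    """Ax = b is solvable iff adjoining b to A does not increase the rank."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return np.linalg.matrix_rank(np.hstack([A, b]), tol=tol) == np.linalg.matrix_rank(A, tol=tol)

A = np.array([[1., 2.],
              [2., 4.]])            # rank 1
print(has_solution(A, [1., 2.]))    # True  (b lies in the span of the columns)
print(has_solution(A, [1., 0.]))    # False (b is not in the column span)
```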
Row operations can also be used to find the inverse of an invertible
n × n-matrix A. Since A has rank n, we can get it into the row echelon
form In by row operations. So starting with the augmented matrix (A|In )
and applying row operations yields (In |B), where QA = In and QIn = B.
Therefore Q = B, and so BA = In and B = A−1 .
Example 3.6. To find the inverse of the matrix A = ( 1 2 3 ; 1 1 1 ; 0 1 0 ),
apply row operations to the augmented matrix ( A | I3 ), reducing the left
half to I3 ; this gives ( I3 | A−1 ) with
A−1 = ( −1/2 3/2 −1/2 ; 0 0 1 ; 1/2 −1/2 −1/2 ).
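A minimal NumPy sketch of the Gauss-Jordan procedure just described (the function name is ours; it assumes A is invertible, and the entries of A follow the reconstruction in Example 3.6):

```python
import numpy as np

def inverse_by_row_reduction(A):
    """Gauss-Jordan elimination on the augmented matrix (A | I)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                        # the augmented matrix (A | I_n)
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))    # choose a non-zero pivot
        M[[col, pivot]] = M[[pivot, col]]                # row swap (type I)
        M[col] /= M[col, col]                            # scale the pivot row (type II)
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]           # clear the column (type III)
    return M[:, n:]                                      # the right half is the inverse

A = np.array([[1., 2., 3.],
              [1., 1., 1.],
              [0., 1., 0.]])
B = inverse_by_row_reduction(A)
print(B)
print(np.allclose(B @ A, np.eye(3)))   # True
```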

For another algorithmic application of row and column operations, suppose
we are given a linear map f : V → W of finite-dimensional vector
spaces, and we want to find bases for V and W with respect to which the
matrix of f is as in Lemma 2.20. If A is the matrix of f with respect to
given bases for V and W , then this amounts to finding invertible matrices
Q and P so that
QAP = ( Ir 0 ; 0 0 )
(the transition matrices to the desired bases being P−1 and Q). To achieve
this, first apply row operations to the augmented matrix (A|Im ) to obtain a
matrix (A′ |Q) where A′ = QA is in row echelon form. Next consider the
vertically augmented matrix ( A′ ; In ), with In written below A′ . Applying
column operations, it is easy to see we can get the top matrix into the desired
form A″ = ( Ir 0 ; 0 0 ). Since applying column operations amounts to
multiplying the top and bottom parts of the augmented matrix by the same
invertible matrix P , we get that the resulting augmented matrix is ( A″ ; P )
where A″ = A′ P = QAP . Note that there are (infinitely) many possible P
and Q such that QAP has the desired form. For example, if A = 0 then any
invertible P and Q work; if A is square and invertible, then given a P and
Q which work (so QAP = In ), we can choose any invertible R and replace
Q by RQ and P by P R−1 .

Example 3.7. Suppose A = ( 1 0 1 ; 2 1 0 ). Subtracting 2R1 from R2
gives
( 1 0 1 | 1 0 ; 2 1 0 | 0 1 ) → ( 1 0 1 | 1 0 ; 0 1 −2 | −2 1 ),
so we can let A′ = ( 1 0 1 ; 0 1 −2 ) and Q = ( 1 0 ; −2 1 ). Then applying
column operations (C3 ← C3 − C1 and then C3 ← C3 + 2C2 ) to the
vertically augmented matrix ( A′ ; I3 ) turns the top into ( 1 0 0 ; 0 1 0 ),
so we have
P = ( 1 0 −1 ; 0 1 2 ; 0 0 1 ).
4. Determinants
We recall the definition of the determinant and review its main properties.
If A is a square matrix, then the determinant of A is a certain scalar
associated to A, denoted det(A) (or simply |A|). We can define the determinant
inductively by first defining the determinant of any 1 × 1-matrix. Then we
define the determinant of an n × n-matrix assuming we have already defined
the determinant of any (n − 1) × (n − 1)-matrix.
Definition 4.1. The determinant of an n × n-matrix A = (aij ) is defined
as follows.
(1) The determinant of the 1 × 1 matrix (a) is a.
(2) Suppose that determinants have been defined for (n − 1) × (n − 1)-
matrices. For integers i, j with 1 ≤ i ≤ n, 1 ≤ j ≤ n the (i, j)-minor
of A is defined to be the determinant of the (n − 1) × (n − 1) matrix
obtained by deleting the ith row and the j th column of A.
(3) The determinant of A is defined by
det(A) = a11 M11 − a12 M12 + . . . + (−1)n+1 a1n M1n = Σnj=1 (−1)j+1 a1j M1j ,
where M1j denotes the (1, j)-minor of A. We also write |A| for
det(A).
For example if n = 2, then M11 = det(a22 ) = a22 and M12 = det(a21 ) =
a21 , so we get the usual formula det ( a11 a12 ; a21 a22 ) = a11 a22 − a12 a21 .
For n = 3, we have (using the | · | notation)
M11 = | a22 a23 ; a32 a33 | = a22 a33 − a23 a32 ,
M12 = | a21 a23 ; a31 a33 | = a21 a33 − a23 a31 ,
M13 = | a21 a22 ; a31 a32 | = a21 a32 − a22 a31 ,
so
det(A) = a11 a22 a33 − a11 a23 a32 − a12 a21 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31 .

Note that this coincides with the description of the determinant of a 3 × 3-
matrix given by repeating the first two columns of A after the matrix:
a11 a12 a13 | a11 a12
a21 a22 a23 | a21 a22
a31 a32 a33 | a31 a32
forming the products along each diagonal and adding them with appropriate
signs (+ for the three ↘-sloping diagonals, − for the three ↗-sloping
diagonals).
Example 4.2. The determinant of A = ( 2 1 1 ; 3 4 1 ; 1 −6 2 ) is
16 + 1 − 18 − 4 + 12 − 6 = 1.
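A minimal NumPy sketch of the inductive definition (expansion along the first row), checked on the matrix of Example 4.2 as reconstructed above (the function name is ours):

```python
import numpy as np

def det_by_expansion(A):
    """Determinant via expansion along the first row, as in Definition 4.1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete first row and column j
        total += (-1) ** j * A[0, j] * det_by_expansion(minor)
    return total

A = np.array([[2., 1., 1.],
              [3., 4., 1.],
              [1., -6., 2.]])
print(det_by_expansion(A))    # 1.0
print(np.linalg.det(A))       # approximately 1.0
```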
For n = 4, the definition of the determinant yields an expression with 24
terms, but fortunately there's a more practical way to compute the
determinant of a large matrix using the effect of row operations on the determinant,
summed up as follows:
Proposition 4.3. Suppose that A is an n × n-matrix.
(1) Exchanging two rows of A multiplies det A by −1.
(2) Multiplying a row of A by λ ∈ R multiplies det A by λ.
(3) Adding a multiple of one row to another does not change det A.
We will prove the proposition later, after we explain some of its many
consequences. First we recall how it can be used in practice to compute
determinants. Simply apply row operations, keeping track of their effect
on the determinant, until the matrix is upper-triangular, and we know the
determinant of an upper-triangular matrix is the product of its diagonal
entries (see the exercises).
Example 4.4. We give an example computing the determinant of a 4 × 4-
matrix. By a series of row operations of type III (the last one being
R2 ← R2 + R3 ), which do not change the determinant, the matrix is reduced
to a form from which the determinant is easily read off; the value is 13.
We now deduce some general consequences of the proposition (which also
follow easily from the definition of the determinant).
Lemma 4.5. If A has a row of zeroes, then det(A) = 0.
Proof. Multiplying the zero row by any λ doesn't change A, so part (2) of
Prop. 4.3 gives λ det A = det A for all λ ∈ R, so det A = 0.

The following is not at all easily seen from the definition, but follows fairly
quickly from Prop. 4.3.

Theorem 4.6. If A and B are n × n-matrices, then det(AB) = det(A) det(B).
Proof. Recall that applying a row operation to B gives the matrix EB
where E is the corresponding elementary matrix, so the proposition says
that
(I) if E = EI (p, q), then det(EB) = − det(B);
(II) if E = EII (p, λ), then det(EB) = λ det(B);
(III) if E = EIII (p, q, λ), then det(EB) = det(B).
In particular, taking B = In gives det(E) = −1, λ or 1, according to whether
E is of type I, II or III. It follows that det(EB) = det(E) det(B) for all B.
Recall that if A is invertible, then A can be written as a product E1 E2 · · · Ek
of elementary matrices. It follows then that
det(A) = det(E1 E2 · · · Ek ) = det(E1 ) det(E2 · · · Ek )
= · · · = det(E1 ) det(E2 ) · · · det(Ek ).
(In particular, note that det(A) ≠ 0 if A is invertible.) Similarly
det(AB) = det(E1 E2 · · · Ek B) = det(E1 ) det(E2 · · · Ek B)
= · · · = det(E1 ) det(E2 ) · · · det(Ek ) det(B) = det(A) det(B).
If A is not invertible, then the associated row echelon matrix can be
written as QA for some invertible matrix Q. Since rank(A) < n, we know
that QA has a zero row, so by Lemma 4.5 we have det(QA) = 0. Since Q is
invertible, det(Q) ≠ 0 and det(QA) = det(Q) det(A), so det(A) = 0. Note
also that AB is not invertible, so the same argument shows det(AB) = 0,
and therefore det(AB) = det(A) det(B) in this case as well.
In the course of the proof of Theorem 4.6, we saw also:
Theorem 4.7. det A ≠ 0 if and only if A is invertible.
Corollary 4.8. det(At ) = det(A).
Proof. If A is invertible, then A = E1 · · · Ek for some elementary matrices
E1 , . . . , Ek . If E is an elementary matrix, then E t is an elementary matrix of
the same type and det(E) = det(E t ). Since At = Ekt · · · E1t , it follows that
det(A) = det(E1 ) · · · det(Ek ) = det(Ekt ) · · · det(E1t ) = det(At ).
If A is not invertible, then neither is At , so det(A) = det(At ) = 0.

Since applying column operations amounts to multiplying (on the right)
by an elementary matrix, we get:
Corollary 4.9. Suppose that A is an n × n-matrix.
(1) Exchanging two columns of A multiplies det A by −1.
(2) Multiplying a column of A by λ ∈ R multiplies det A by λ.
(3) Adding a multiple of one column to another does not change det A.
Corollary 4.10. If A = (aij ) is upper-triangular or lower-triangular, then
det A = a11 a22 · · · ann .
Proof. This was an exercise if A is upper-triangular. If A is lower-triangular,
then At is upper-triangular with the same diagonal entries as A. The formula
then follows from Corollary 4.8.

The rest of this section consists of the proof of Prop. 4.3, and
an alternate definition of the determinant; these are included for
the sake of completeness, but are not required for the exam.
Rather than prove the three properties in Prop. 4.3 directly, we will show
they follow from three related properties, and prove those instead:
Proposition 4.11. Suppose that A is an n × n-matrix.
(I) If A has two identical rows, then det A = 0.
(II) Multiplying a row of A by λ ∈ R multiplies det A by λ.
(III) Suppose that A, A′ and A″ are n × n-matrices which are identical
except in the ith row, and that the ith row of A″ is the sum of the ith rows
of A and A′ (i.e., a″ij = aij + a′ij for j = 1, . . . , n). Then
det(A″ ) = det(A) + det(A′ ).
Note that in (III), A″ is not the sum of A and A′ ; it is only the ith row
that is being described as a sum. Parts (II) and (III) taken together can
be thought of as saying that det is linear on each row.
Before proving Prop. 4.11, we show how Prop. 4.3 follows from it. Clearly
(II) implies (2).
Next we prove (3). Let A′ be the matrix identical to A, except that its
qth row is replaced by λ times the pth row of A. Then by (II) and (I) we
have det(A′ ) = λ det(B) = 0, where B is the matrix identical to A except
that its qth row is replaced by a second copy of the pth row (so B has two
identical rows). Applying (III), we see therefore that det(A″ ) = det(A),
where A″ is obtained by applying the row operation of type (III) to A.
We now prove (1). Let A′ be the matrix gotten from A by interchanging
rows p and q. Let B be the matrix gotten from A by replacing its pth row
by the sum of the pth and qth rows of A (leaving the qth row as it is), and
let B′ be gotten from A′ in the same way (so B′ has pth row equal to the
same sum, and qth row equal to the pth row of A). Note that by (III)
(which we have now proved) together with (I), we have
det(B) = det(A) and det(B′ ) = det(A′ ).
The matrices B and B′ are identical except in the qth row, and the sum
of their qth rows is the same as their common pth row, so by (III) and (I)
we have det(A) + det(A′ ) = det(B) + det(B′ ) = 0. Therefore
det(A′ ) = − det(A).
Proof. (of Prop. 4.11) We proceed by induction on n.
For n = 1, note that there is nothing to prove for (I), that (II) just says
that det(λa) = λa = λ det(a), and (III) says that det(a + a′ ) = a + a′ =
det(a) + det(a′ ).
Suppose now that n > 1, and that Prop. 4.11 (and therefore also Prop. 4.3)
holds with n replaced by n − 1.
First we prove (II). Let A′ be the matrix gotten by multiplying the ith
row of A by λ. If i = 1, then the definition of the determinant gives
det(A′ ) = (λa11 )M11 − (λa12 )M12 + · · · + (−1)n+1 (λa1n )M1n = λ det(A).
If 2 ≤ i ≤ n, then we get
det(A′ ) = a11 M′11 − a12 M′12 + · · · + (−1)n+1 a1n M′1n ,
where M′1j is the (1, j)-minor of A′ . However M′1j is the determinant of
the same (n − 1) × (n − 1) matrix as in the definition of M1j , except that
one of the rows has been multiplied by λ. So by the induction hypothesis
M′1j = λM1j for j = 1, . . . , n, and it follows that det(A′ ) = λ det(A).
Next we prove (III). If i = 1, then
det(A″ ) = a″11 M11 − a″12 M12 + · · · + (−1)n+1 a″1n M1n
= (a11 + a′11 )M11 − (a12 + a′12 )M12 + · · · + (−1)n+1 (a1n + a′1n )M1n
= det(A) + det(A′ ).
If 2 ≤ i ≤ n, then
det(A″ ) = a11 M″11 − a12 M″12 + · · · + (−1)n+1 a1n M″1n ,
where M″1j is the (1, j)-minor of A″ , which is the determinant of an
(n − 1) × (n − 1)-matrix which is identical to the one in the definition of the
minors of A and A′ except in one row, where it is their sum. So it follows
from the induction hypothesis that M″1j = M1j + M′1j for each j, and therefore
det(A″ ) = det(A) + det(A′ ).
Finally we prove (I). This is easy if n = 2, so assume n > 2. Suppose
that the first two rows of A are identical, so a1j = a2j for j = 1, . . . , n.
If 1 ≤ j < k ≤ n, then let Njk denote the determinant of the
(n − 2) × (n − 2)-matrix gotten by deleting the first two rows of A, and
the jth and kth columns of A. Applying the definition of the determinant
to the minors M1j in the definition of det(A), we see that each M1j is the
alternating sum
a21 N1j − a22 N2j + · · · + (−1)j a2,j−1 Nj−1,j + (−1)j+1 a2,j+1 Nj,j+1 + · · · + (−1)n a2n Njn .
In the resulting expansion of det(A), each Njk (where j < k) will appear
twice: once in the expansion of M1j with coefficient (−1)j+1 (−1)k a1j a2k ,
and once in the expansion of M1k with coefficient (−1)k+1 (−1)j+1 a1k a2j .
Since we are assuming the first two rows are identical, we have a1j = a2j
and a1k = a2k . Since the signs are opposite, these two terms cancel. Thus
all terms cancel, and it follows that det(A) = 0.
Now suppose instead that row 1 and row q are identical for some q > 2.
Let A′ be the matrix gotten from A by interchanging rows 2 and q. Then
det(A′ ) = a11 M′11 − a12 M′12 + · · · + (−1)n+1 a1n M′1n ,
where each M′1j is the determinant of the same matrix as in the definition
of M1j , but with two rows interchanged. By Prop. 4.3 (1) for n − 1, we
therefore have M′1j = −M1j , and it follows that det(A′ ) = − det(A). But
we have already shown that det(A′ ) = 0, so det(A) = 0 as well.
Finally suppose that row p and row q are identical, and p and q are
both greater than 1. Then the minors M1j are determinants of (n − 1) ×
(n − 1)-matrices, each of which has two identical rows. So by the induction
hypothesis, each M1j = 0, and therefore det(A) = 0.
This finishes the proof of Prop. 4.11, and thus the proof of all the stated
properties of the determinant.

Finally, we remark that there is an alternative definition of the determinant
using permutations. This is provided for your general interest and will not
be covered in the course or on the examination. It is somewhat more
sophisticated but, once mastered, the proofs of the properties are more
transparent. The definition requires some knowledge about permutations.
Recall that a bijective mapping of {1, . . . , n} onto itself is called a
permutation. The set of all such permutations, with composition as the operation
of product, forms a group denoted by Sn (i.e. σσ′ = σ ∘ σ′ ). Here are the
main properties of permutations.
- Every permutation σ ∈ Sn can be written as a product σ = τ1 τ2 . . . τs
where each τi is a transposition (i.e. a permutation that exchanges
two elements of {1, 2, . . . , n} and leaves the rest fixed).
- Although there are many such expressions for σ and the number s
of transpositions varies, nevertheless, for given σ, s is either always
even (in which case we say that σ is an even permutation) or always
odd (when we say σ is odd). (See the exercises for a proof of this
fact using the definition and properties of the determinant we already
know about.)
- We define the sign sgn(σ) of σ to be 1 if σ is even and −1 if σ is
odd (i.e. sgn(σ) = (−1)s ).
- The map sgn : σ ↦ sgn(σ) is a homomorphism from Sn to the
multiplicative group {1, −1}, i.e. sgn(σσ′ ) = sgn(σ) sgn(σ′ ) for σ, σ′ ∈ Sn .
The second definition of determinant is based on the idea that each term
in the evaluation of a determinant is a product of n entries with exactly one
from each row and from each column. Thus from the first row a term might
have a1n1 where n1 is any integer between 1 and n, then from the second
a2n2 , but now n2 ≠ n1 , and so on. It is clear that the mapping i ↦ ni is
a permutation of the integers 1 to n and that there is one such term for
each such permutation σ. If the permutation σ is even we attach a plus sign
to the term a1σ(1) a2σ(2) · · · anσ(n) and if σ is odd, we attach a minus sign.
The sum of these signed terms is the determinant. Here is the formal
definition: The determinant of an n × n matrix A = (aij ) is defined by
det(A) = Σσ∈Sn sgn(σ) a1σ(1) a2σ(2) . . . anσ(n)
where the sum is taken over the group Sn of all permutations of the integers
{1, 2, 3, . . . , n}.
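A minimal Python sketch of the permutation definition (the function names are ours; it is only practical for small n, since there are n! terms), checked against Example 4.2:

```python
import numpy as np
from itertools import permutations

def sgn(perm):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    """det(A) = sum over permutations s of sgn(s) * a_{1,s(1)} ... a_{n,s(n)}."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return sum(sgn(p) * np.prod([A[i, p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = np.array([[2., 1., 1.],
              [3., 4., 1.],
              [1., -6., 2.]])
print(det_by_permutations(A))   # 1.0 (agrees with Example 4.2)
```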
Here is a sketch of a proof that the two definitions agree: Let us
provisionally call det′ (A) the determinant of A as defined using permutations, and
continue to use det(A) for the determinant as we originally defined it
inductively. We first show that det′ (A) has some of the same properties as det(A),
and then explain how to deduce that det(A) = det′ (A). It is straightforward
to check that det′ (A) satisfies (II) and (III) in Prop. 4.11. In fact (I) is
not so difficult either: let τ denote the transposition interchanging p and q.
If apj = aqj for all j, then one sees that for any permutation σ, the terms in
the permutation definition indexed by σ and στ are identical, except that
they have opposite signs and therefore cancel.
Having proved det′ (A) satisfies these properties, one finds that det′ (A) has
all the same properties we proved about det. In particular, if E is elementary,
we find that det(E) = det′ (E), and therefore det′ (EA) = det(E) det′ (A). If
A is invertible, then we can write it as a product of elementary matrices to
deduce in this case that det(A) = det′ (A). If A is not invertible, then there
is an invertible matrix P so that P A has a zero row. It is easy to see that
in this case det′ (P A) = 0, so that det′ (A) = 0, and so det′ (A) = det(A) in
all cases.
5. Similarity and diagonalization
We now consider linear maps f : V → V from a finite-dimensional vector
space V to itself, also called linear operators on V . We consider matrices
of such maps with respect to the same basis of V as both the domain and
co-domain. We then have that Af ∘g = Af Ag , where all matrices are with
respect to the same basis for V . (Note that all the matrices are square.)
Changing the basis for V from S to S′ (in both domain and co-domain)
replaces the matrix Af by A′f = QAf Q−1 where Q is the transition matrix
from S to S′ .
Definition 5.1. If A and B are n × n-matrices, then we say that A is similar
to B if there exists an n × n invertible matrix Q such that B = QAQ−1 . If
A is similar to B, then we write A ∼ B.
So matrices representing the same linear operator, but with respect to
different bases, are similar. Also, since every invertible matrix is a transition
matrix, any matrix similar to Af is the matrix of f with respect to some
basis. One can check that similarity is an equivalence relation (exercise).

Example 5.2. The matrix A = ( 0 1 ; 1 0 ) is similar to B = ( 1 0 ; 0 −1 )
since B = QAQ−1 with Q = ( 1 1 ; 1 −1 ).

Definition 5.3. We say that a linear map f : V → V is diagonalizable if
the matrix of f with respect to some basis is diagonal. An n × n-matrix A
is diagonalizable if A is similar to a diagonal matrix.

A linear map f is diagonalizable if and only if Af is diagonalizable, and
a matrix A is diagonalizable if and only if the corresponding map on Rn
(defined by sending v to Av) is diagonalizable (exercise).
Recall the notion of an eigenvector:
Definition 5.4. A non-zero vector v ∈ Rn is an eigenvector for A with
eigenvalue λ ∈ R if Av = λv. Similarly if f : V → V is a linear map, then
we say a non-zero vector v ∈ V is an eigenvector for f with eigenvalue λ if
f (v) = λv.
Theorem 5.5. A linear map f : V → V is diagonalizable if and only if V has
a basis consisting of eigenvectors for f . A matrix A is diagonalizable if and
only if Rn has a basis consisting of eigenvectors for A.
Proof. Suppose that f is diagonalizable. Then V has a basis {v1 , . . . , vn }
with respect to which
Af = ( λ1 0 · · · 0 ; 0 λ2 · · · 0 ; . . . ; 0 0 · · · λn ).
This means that f (v1 ) = λ1 v1 , f (v2 ) = λ2 v2 , . . . , f (vn ) = λn vn ; i.e., that
v1 , v2 , . . . , vn are eigenvectors. Conversely if V has a basis consisting of
eigenvectors, then we see in the same way that the matrix of f with respect
to this basis is diagonal.
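A minimal NumPy sketch of diagonalization via an eigenvector basis, using the matrix of Example 5.2 (the variable names are ours):

```python
import numpy as np

A = np.array([[0., 1.],
              [1., 0.]])
eigenvalues, Q = np.linalg.eig(A)   # the columns of Q are eigenvectors of A
D = np.linalg.inv(Q) @ A @ Q        # change of basis to the eigenvector basis
print(eigenvalues)                  # [ 1. -1.]
print(np.round(D, 10))              # diagonal matrix diag(1, -1)
```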
Example 5.6. The matrix ( 0 1 ; 1 0 ) is diagonalizable, and {(1, 1)t , (1, −1)t }
is a basis of eigenvectors. The matrix ( 1 1 ; 0 1 ) is not diagonalizable. The
matrix ( 0 1 ; −1 0 ) is not diagonalizable over R, but it is diagonalizable
over C, a basis of eigenvectors being given by (1, i)t (with eigenvalue i) and
(1, −i)t (with eigenvalue −i). For the reason illustrated by the last example,
it is often more convenient to work over C than over R. We'll return to this
point later.
Lemma 5.7. If f : V → V is a linear map, and v1 , v2 , . . . , vk ∈ V are
eigenvectors for f with distinct eigenvalues λ1 , λ2 , . . . , λk , then {v1 , . . . , vk }
is linearly independent.
Proof. We prove the lemma by induction on k. If k = 1, then {v1 } is
linearly independent since v1 ≠ 0.
Now suppose that k > 1 and the lemma is true with k replaced by k − 1.
In particular {v1 , . . . , vk−1 } is linearly independent. Suppose that
Σki=1 μi vi = 0. Then also f ( Σki=1 μi vi ) = 0, but since f is linear,
f ( Σki=1 μi vi ) = Σki=1 μi f (vi ).
Since f (vi ) = λi vi for i = 1, . . . , k, we get
μ1 λ1 v1 + · · · + μk−1 λk−1 vk−1 + μk λk vk = 0.
On the other hand, multiplying Σki=1 μi vi = 0 by λk gives
μ1 λk v1 + · · · + μk−1 λk vk−1 + μk λk vk = 0.
Subtracting one equation from the other gives
μ1 (λ1 − λk )v1 + · · · + μk−1 (λk−1 − λk )vk−1 = 0.
But since {v1 , . . . , vk−1 } is linearly independent, we get
μ1 (λ1 − λk ) = μ2 (λ2 − λk ) = · · · = μk−1 (λk−1 − λk ) = 0.
Since the λi are distinct, we know that λ1 − λk , λ2 − λk , . . . , λk−1 − λk are
all non-zero. Therefore μ1 = μ2 = · · · = μk−1 = 0. Finally, since this gives
μk vk = 0, and vk ≠ 0, we conclude that μk = 0 as well.

Note that there's an equivalent form of the lemma with f replaced by an
n × n matrix and v1, . . . , vk replaced by vectors in R^n (or C^n).
Since a set of n linearly independent vectors in an n-dimensional space is
necessarily a basis, it follows from Theorem 5.5 and the lemma that:


Corollary 5.8. If f (or A) has n distinct eigenvalues, then it is diagonalizable.


A key tool for studying eigenvalues, eigenvectors and diagonalization of a
matrix is its characteristic polynomial. Recall the definition:
Definition 5.9. The characteristic polynomial of an n × n matrix A is
the polynomial pA(x) = det(xI - A) (in the variable x). The characteristic
equation of A is the equation det(xI - A) = 0.
It is easy to see from the permutation definition of the determinant that
pA(x) is a polynomial of degree n. See the exercises for a proof using the
inductive definition.
The connection with eigenvalues is given by:
Lemma 5.10. λ is an eigenvalue for A if and only if pA(λ) = 0.
Proof. λ is an eigenvalue for A
⟺ Av = λv for some v ≠ 0
⟺ (λI - A)v = 0 for some v ≠ 0
⟺ nullity(λI - A) > 0
⟺ rank(λI - A) < n
⟺ det(λI - A) = 0
⟺ pA(λ) = 0.



Example 5.11. The characteristic polynomial of \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} is
\begin{vmatrix} x-1 & -1 \\ 0 & x-1 \end{vmatrix} = (x - 1)^2,
and its only eigenvalue is 1.
The characteristic polynomial of \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} is
\begin{vmatrix} x & -1 \\ -1 & x \end{vmatrix} = x^2 - 1 = (x + 1)(x - 1),
so its eigenvalues are ±1.
The characteristic polynomial of \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} is
\begin{vmatrix} x & -1 \\ 1 & x \end{vmatrix} = x^2 + 1 = (x - i)(x + i),
so its eigenvalues are the complex numbers ±i (and it has no real eigenvalues).
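These characteristic polynomials are easy to cross-check numerically. The following sketch (an added illustration in Python/NumPy) uses np.poly, which returns the coefficients of det(xI - A) with the highest degree first, and np.roots to recover the eigenvalues.

import numpy as np

for A in (np.array([[1, 1], [0, 1]]),
          np.array([[0, 1], [1, 0]]),
          np.array([[0, 1], [-1, 0]])):
    coeffs = np.poly(A)                 # characteristic polynomial coefficients
    print(coeffs, np.roots(coeffs))     # e.g. [1, -2, 1] with roots [1, 1] for the first matrix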
Everything we've said so far works in exactly the same way for complex
vector spaces and matrices as for real vector spaces and matrices. However
the last example shows that some real matrices are only diagonalizable when
viewed as complex matrices. Roughly speaking, the process of diagonalizing
matrices works better over C than over R because factorization of polynomials
works better. The reason for this is the following important property
of the complex numbers, called the Fundamental Theorem of Algebra. We
state it without proof (the proof being outside the scope of this course).
Theorem 5.12. If f(x) = x^n + a_{n-1}x^{n-1} + · · · + a_1x + a_0 with a_{n-1}, . . . , a_1, a_0 ∈ C,
then
f(x) = (x - λ1)(x - λ2) · · · (x - λn)
for some λ1, λ2, . . . , λn ∈ C.

In other words, any polynomial over C of degree n factors completely into
linear factors, i.e., it has n roots, counting multiplicity.
For the rest of the section, we will be working with complex
vector spaces and matrices unless otherwise stated.
Definition 5.13. The algebraic multiplicity of an eigenvalue λ of A is
the multiplicity of λ as a root of the characteristic equation, i.e., if
pA(x) = (x - λ1)^{m1}(x - λ2)^{m2} · · · (x - λk)^{mk}
where λ1, . . . , λk are the distinct eigenvalues for A, then the algebraic multiplicity
of each λi is mi. The geometric multiplicity of λ is the nullity
of (λI - A).


Example 5.14. The algebraic multiplicity of the eigenvalue 1 of \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
is 2 (since the characteristic polynomial is (x - 1)^2). The geometric multiplicity
is the nullity of \begin{pmatrix} 0 & -1 \\ 0 & 0 \end{pmatrix}, which is 1.
For the 2 × 2 identity matrix, the characteristic polynomial is also (x - 1)^2,
so the only eigenvalue is 1, with algebraic multiplicity 2, which is also its
geometric multiplicity.
For the matrix \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, the algebraic multiplicity of each eigenvalue is
1, as is its geometric multiplicity.
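Both multiplicities can be computed directly, as in the following illustrative sketch (Python/NumPy, added here only as an aside): the algebraic multiplicity is counted from the eigenvalues, and the geometric multiplicity is n minus the rank of λI - A.

import numpy as np

def multiplicities(A, lam, tol=1e-8):
    eigs = np.linalg.eigvals(A)
    algebraic = int(np.sum(np.abs(eigs - lam) < tol))
    n = A.shape[0]
    geometric = n - np.linalg.matrix_rank(lam * np.eye(n) - A, tol=tol)
    return algebraic, geometric

print(multiplicities(np.array([[1.0, 1.0], [0.0, 1.0]]), 1.0))   # (2, 1)
print(multiplicities(np.eye(2), 1.0))                             # (2, 2)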
Proposition 5.15. Suppose that B is similar to A. Then
(1) pA(x) = pB(x) (in particular, A and B have the same eigenvalues);
(2) the algebraic multiplicity of an eigenvalue λ of A is the same as its
algebraic multiplicity as an eigenvalue of B;
(3) the geometric multiplicity of an eigenvalue λ of A is the same as its
geometric multiplicity as an eigenvalue of B.
Proof. Since B = QAQ^{-1} for some invertible matrix Q, we have
Q(xI - A)Q^{-1} = (xQ - QA)Q^{-1} = xQQ^{-1} - QAQ^{-1} = xI - B.
Therefore,
(1) pB(x) = det(xI - B) = det(Q(xI - A)Q^{-1}) = det(Q) det(xI - A) det(Q^{-1}) = det(Q) det(Q^{-1}) pA(x) = pA(x).
(2) This is immediate from (1) and the definition of algebraic multiplicity.
(3) From above, we see that λI - B = Q(λI - A)Q^{-1}. Since Q is invertible,
it follows that nullity(λI - B) = nullity(λI - A).

When working with a fixed matrix A, we will write mλ for the algebraic
multiplicity of λ and nλ for the geometric multiplicity of λ.
Theorem 5.16. A is diagonalizable if and only if mλ = nλ for all eigenvalues
λ.
Proof. ⇒: If A is diagonalizable, then A ≃ D for some diagonal matrix
D = \begin{pmatrix} λ1 & 0 & \cdots & 0 \\ 0 & λ2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λn \end{pmatrix}.
By Prop. 5.15, we just have to show that mλ = nλ for the matrix D. Since
pD(x) = (x - λ1)(x - λ2) · · · (x - λn), we see that mλ is the number of times
that λ = λi. On the other hand nλ is the nullity of the matrix
λI - D = \begin{pmatrix} λ-λ1 & 0 & \cdots & 0 \\ 0 & λ-λ2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ-λn \end{pmatrix},
which is also the number of times that λ = λi.
⇐: Conversely, suppose that mλ = nλ for all eigenvalues λ. We will show
that C^n has a basis consisting of eigenvectors for A. Denote the distinct
eigenvalues λ1, λ2, . . . , λk, write mi for the algebraic multiplicity of λi, and
ni for the geometric multiplicity. So for each i = 1, . . . , k, the kernel of
λiI - A has dimension ni. Let Si = {vi1, . . . , vini} be a basis for ker(λiI - A).
Note that each vij is an eigenvector for A with eigenvalue λi. We will prove
that
S = S1 ∪ S2 ∪ · · · ∪ Sk = {v11, . . . , v1n1, v21, . . . , v2n2, . . . , vk1, . . . , vknk}
is a basis for C^n.
We first prove that S is linearly independent. Suppose that α11, . . . , α1n1,
. . . , αk1, . . . , αknk ∈ C are such that
α11v11 + · · · + α1n1v1n1 + α21v21 + · · · + α2n2v2n2 + · · · + αk1vk1 + · · · + αknkvknk = 0.
Let w1 = α11v11 + · · · + α1n1v1n1, . . . , wk = αk1vk1 + · · · + αknkvknk.
Each wi that is non-zero is an eigenvector for A with eigenvalue λi. Since
the λi are distinct and w1 + · · · + wk = 0, we see by Lemma 5.7 that
w1 = w2 = · · · = wk = 0. Since each Si is linearly independent, it follows
that αij = 0 for all j = 1, . . . , ni.
We have now shown that S is linearly independent, so to prove that S is
a basis it suffices to prove that S has n elements. By construction, S has
n1 + n2 + · · · + nk elements, we assumed that m1 = n1, . . . , mk = nk, and
m1 + m2 + · · · + mk = n
(by the Fundamental Theorem of Algebra). Since C^n has a basis of eigenvectors, A is diagonalizable by Theorem 5.5.

Recall that we are assuming A is a complex matrix. The results apply to
real matrices, provided they are viewed as complex matrices. In particular,
diagonalizability means as a complex matrix, and the equality mλ = nλ must
hold for all eigenvalues λ, including those that are complex but not real.


Example 5.17. For A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, we had mλ = 2, nλ = 1 for the (only)
eigenvalue λ = 1.
While it is not the case that every matrix is similar to a diagonal matrix,
the following is true:
Theorem 5.18. Every square matrix is similar to an upper-triangular matrix (over C).
Proof. We prove this by induction on n. For n = 1 there is nothing to
prove since every 1 × 1 matrix is upper-triangular.
Suppose that n > 1 and the theorem is true for (n - 1) × (n - 1) matrices.
Let v be an eigenvector for A with eigenvalue λ ∈ C. (By the Fundamental
Theorem of Algebra, pA(x) has a root in C, so A has an eigenvalue in
C.) Let S be any basis for C^n with v as its first vector, and let Q be the
transition matrix from the standard basis to S, so the matrix of f(x) = Ax
with respect to S is B = QAQ^{-1}. Since Av = λv, the first column of B is
(λ, 0, . . . , 0)^t. Therefore
B = QAQ^{-1} = \begin{pmatrix} λ & r \\ 0 & A_1 \end{pmatrix}
for some row vector r of length n - 1 and some (n - 1) × (n - 1) matrix A1.
By the induction hypothesis, there is an invertible (n - 1) × (n - 1) matrix
Q1 so that T1 = Q1A1Q1^{-1} is upper-triangular. Let R = \begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix}. Then
R is invertible, and therefore so is RQ, and
(RQ)A(RQ)^{-1} = RBR^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix} \begin{pmatrix} λ & r \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & Q_1^{-1} \end{pmatrix} = \begin{pmatrix} λ & rQ_1^{-1} \\ 0 & T_1 \end{pmatrix}
is upper-triangular.
Note that if A is similar to the upper-triangular matrix
T = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & t_{nn} \end{pmatrix},
then pA(x) = pT(x) = (x - t_{11})(x - t_{22}) · · · (x - t_{nn}).
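A unitary version of this triangularization (proved later as Theorem 6.40) is available numerically as the Schur decomposition. The sketch below (Python with SciPy, purely an illustration) computes A = ZTZ* and reads the eigenvalues off the diagonal of T, as in the remark above.

import numpy as np
from scipy.linalg import schur

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 2.0]])
T, Z = schur(A, output='complex')             # T upper-triangular, Z unitary, A = Z T Z*
print(np.allclose(Z @ T @ Z.conj().T, A))     # True
print(np.diag(T))                             # eigenvalues of A on the diagonal of T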

Corollary 5.19. If λ is an eigenvalue for A, then nλ ≤ mλ.
Proof. By the theorem, we can assume A is the upper-triangular matrix T.
Then mλ is the number of times λ appears on the diagonal of T, and nλ is
the nullity of λI - T. This matrix is upper-triangular and has n - mλ non-zero
diagonal entries. The rows with non-zero diagonal entries are linearly
independent, so rank(λI - T) ≥ n - mλ, and nλ = n - rank(λI - T) ≤ mλ.

Note that if f(x) = a_m x^m + a_{m-1} x^{m-1} + · · · + a_1 x + a_0 is a polynomial,
say with coefficients in C, and A is a square matrix, then we can evaluate
the polynomial at x = A by defining
f(A) = a_m A^m + a_{m-1} A^{m-1} + · · · + a_1 A + a_0 I.
The following theorem states that when we substitute a matrix into its
characteristic polynomial, we get the 0 matrix.
Theorem 5.20. [Cayley-Hamilton] Every square matrix A satisfies its characteristic equation; i.e., pA (A) = 0.
Proof. Let λ1, λ2, . . . , λn be the eigenvalues of A repeated according to
algebraic multiplicity. From Theorem 5.18 there is an invertible matrix
R such that A = RTR^{-1} where T is an upper-triangular matrix with
λ1, λ2, . . . , λn as its diagonal entries. The characteristic polynomial p of
A and of T is (x - λ1)(x - λ2) · · · (x - λn). Writing this in the form
p(x) = x^n + a_{n-1}x^{n-1} + · · · + a_1x + a_0, we see that
p(A) = p(RTR^{-1})
= (RTR^{-1})^n + a_{n-1}(RTR^{-1})^{n-1} + · · · + a_1 RTR^{-1} + a_0 I
= RT^nR^{-1} + a_{n-1}RT^{n-1}R^{-1} + · · · + a_1 RTR^{-1} + a_0 RIR^{-1}
= R(T^n + a_{n-1}T^{n-1} + · · · + a_1 T + a_0 I)R^{-1}
= Rp(T)R^{-1},
so it is sufficient to show that p(T) = (T - λ1I)(T - λ2I) · · · (T - λnI) = 0.
Note that the ith diagonal entry of T - λiI is 0.
We complete the proof by showing that if T1, T2, . . . , Tn are upper-triangular
matrices such that the ith diagonal entry of Ti is 0, then T1T2 · · · Tn = 0. The
proof is by induction on the size of the matrix. The result is obvious for
n = 1. Suppose the result is true for (n - 1) × (n - 1) matrices. Write the
Ti as partitioned matrices
Ti = \begin{pmatrix} T_i' & v_i \\ 0 & t_i \end{pmatrix}, for i = 1, . . . , n - 1, and Tn = \begin{pmatrix} T_n' & v_n \\ 0 & 0 \end{pmatrix},
where each Ti′ is an (n-1) × (n-1) upper-triangular matrix with 0 in the ith
diagonal place. Therefore, by the induction hypothesis, T1′T2′ · · · T_{n-1}′ = 0
and so
T1T2 · · · T_{n-1} = \begin{pmatrix} T_1'T_2' \cdots T_{n-1}' & v \\ 0 & t \end{pmatrix} = \begin{pmatrix} 0 & v \\ 0 & t \end{pmatrix}
where v and t are some vector and scalar (whose values are not important).
Then
T1T2 · · · T_{n-1}Tn = \begin{pmatrix} 0 & v \\ 0 & t \end{pmatrix} \begin{pmatrix} T_n' & v_n \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = 0
as required.




Example 5.21. The matrix A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} has characteristic polynomial
pA(x) = x^2 + 1 and satisfies pA(A) = 0 since A^2 + I = -I + I = 0.
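The Cayley-Hamilton theorem is easy to test numerically for any particular matrix. Here is an illustrative sketch (Python/NumPy, added as an aside) that evaluates pA(A) by Horner's rule from the coefficients returned by np.poly.

import numpy as np

def char_poly_at_matrix(A):
    """Evaluate the characteristic polynomial of A at A itself."""
    coeffs = np.poly(A)                        # p_A(x) coefficients, highest degree first
    n = A.shape[0]
    result = np.zeros((n, n), dtype=complex)
    for c in coeffs:
        result = result @ A + c * np.eye(n)    # Horner's rule with a matrix argument
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(np.allclose(char_poly_at_matrix(A), 0))  # True: A^2 + I = 0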

Lemma 5.22. There is a unique monic polynomial mA (x) of lowest degree


such that mA (A) = 0. The polynomial mA (x) has the following properties:
(1) If q(x) is a polynomial such that q(A) = 0, then mA (x) divides q(x).
(2) The roots of mA (x) = 0 are precisely the eigenvalues of A.
(3) If A and B are similar, then mA (x) = mB (x).
Proof. The Cayley-Hamilton Theorem shows that there is some monic
polynomial q(x) such that q(A) = 0, namely q(x) = pA (x). (Recall that
a monic polynomial is one whose leading coefficient is 1.) Therefore there
must be a monic polynomial q(x) of lowest degree such that q(A) = 0. We
denote it mA (x). The uniqueness will follow from part (1).
(1) Suppose that q(x) is a polynomial such that q(A) = 0. By the division
algorithm for polynomials, we have q(x) = mA(x)f(x) + r(x) for
some polynomials f(x), r(x) where deg(r(x)) < deg(mA(x)) (or r(x) = 0).
Substituting x = A gives
0 = q(A) = mA(A)f(A) + r(A) = r(A)
since mA(A) = 0. If r(x) ≠ 0, then r(x) = b_d x^d + · · · + b_0 for some
d < deg(mA(x)) and b_d ≠ 0, in which case b_d^{-1} r(x) is a monic polynomial
of degree less than deg(mA(x)) such that b_d^{-1} r(A) = 0, contradicting our
definition of mA(x). Therefore r(x) = 0, and q(x) = mA(x)f(x) is divisible
by mA(x).
The uniqueness of mA(x) follows since if q(x) is a monic polynomial of
the same degree as mA(x) and q(A) = 0, then by (1), q(x) = mA(x)f(x)
where f(x) is a scalar c (since q(x) and mA(x) have the same degree). Since
mA(x) and q(x) are both monic, we must have c = 1, and q(x) = mA(x).

(2) By (1), we know that mA(x) divides pA(x). Therefore if mA(λ) = 0,
then pA(λ) = 0, so λ is an eigenvalue of A. Conversely, suppose that λ is an
eigenvalue of A. We must show that mA(λ) = 0. Let v be an eigenvector
with eigenvalue λ. Note that since Av = λv, we have A^2v = A(Av) =
A(λv) = λ(Av) = λ^2v, and similarly A^i v = λ^i v for all i ≥ 0. So writing
mA(x) = x^m + b_{m-1}x^{m-1} + · · · + b_0, we see that
mA(A)v = (A^m + b_{m-1}A^{m-1} + · · · + b_0 I)v
= A^m v + b_{m-1}A^{m-1}v + · · · + b_0 v
= λ^m v + b_{m-1}λ^{m-1}v + · · · + b_0 v
= (λ^m + b_{m-1}λ^{m-1} + · · · + b_0)v
= mA(λ)v.
Since mA(A) = 0, we therefore have 0 = mA(A)v = mA(λ)v, and since
v ≠ 0, it follows that the scalar mA(λ) is 0.
(3) is left as an exercise.

Definition 5.23. The polynomial mA(x) defined by the preceding lemma
is called the minimal polynomial of A.
Example 5.24. Note that the lemma implies that mA(x) has the same roots
as pA(x), but not necessarily with the same multiplicity; the multiplicity of
the eigenvalue λ as a root of mA(x) may be less than as a root of pA(x). In
particular, if A has n distinct eigenvalues (i.e., pA(x) has no repeated roots),
then mA(x) = pA(x), but this may fail if pA(x) has repeated roots. For
example, if n = 2 and A = I, then pA(x) = (x - 1)^2, but mA(x) = x - 1. On
the other hand if A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, then pA(x) = (x - 1)^2, but mA(x) ≠ x - 1
since A - I ≠ 0; therefore mA(x) = (x - 1)^2.
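For these 2 × 2 matrices the minimal polynomial can be found by direct matrix arithmetic: test whether the candidate divisor A - I already vanishes. The sketch below (Python/NumPy, illustrative only) reproduces the two cases of Example 5.24.

import numpy as np

I2 = np.eye(2)
A = np.array([[1.0, 1.0], [0.0, 1.0]])

# For the identity, A - I = 0 already, so m_I(x) = x - 1.
print(np.allclose(I2 - I2, 0))               # True

# For A, A - I is non-zero but (A - I)^2 = 0, so m_A(x) = (x - 1)^2 = p_A(x).
print(np.allclose(A - I2, 0))                # False
print(np.allclose((A - I2) @ (A - I2), 0))   # True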
We give one more criterion for diagonalizability, now in terms of the
minimal polynomial of A.
Theorem 5.25. A square matrix A is diagonalizable if and only if its minimal polynomial mA (x) has no repeated roots.
Proof. Suppose that A is diagonalizable. We must show that its minimal
polynomial is q(x) = (x - λ1) · · · (x - λk) where λ1, . . . , λk are the distinct
eigenvalues of A (i.e., listed without any repetition). Since each eigenvalue
is a root of mA(x), we know that mA(x) is divisible by q(x), so it suffices
to show that q(A) = 0 as this implies (by Lemma 5.22) that q(x) is also
divisible by mA(x).
If v is an eigenvector for A, then Av = λv for some root λ of q(x). So as in
the proof of part (2) of Lemma 5.22, we see that q(A)v = q(λ)v = 0. Since
A is diagonalizable, C^n has a basis {v1, . . . , vn} consisting of eigenvectors
for A, and we have seen that for each i = 1, . . . , n, q(A)vi = 0, i.e., vi is in
the kernel of q(A). Therefore nullity(q(A)) = n, so q(A) = 0.
Conversely, suppose that mA(x) = (x - λ1) · · · (x - λk) where λ1, . . . , λk
are the distinct eigenvalues. This means that
(A - λ1I) · · · (A - λkI) = 0.
Recall that if B and C are n × n matrices, then nullity(BC) ≤ nullity(B) +
nullity(C) (see Sheet 3, Exercise 9f, which in fact shows this is true for any
matrices B, C such that BC is defined). Therefore
n = nullity(0) = nullity((A - λ1I) · · · (A - λkI)) ≤ n1 + n2 + · · · + nk,
where ni is the geometric multiplicity of the eigenvalue λi. On the other
hand we know that n = m1 + m2 + · · · + mk where mi is the algebraic
multiplicity of λi, and that ni ≤ mi for each i (Cor. 5.19). Putting these
inequalities together, we see that the only possibility is that ni = mi for
each i, and hence by Thm. 5.16 A is diagonalizable.

We finish the section with a discussion of Jordan canonical form, which
is not covered on the exam. This gives a standard form for a matrix
similar to A, which is as close as possible to being diagonal.
Theorem 5.26. Every square matrix A is similar (over C) to a partitioned
matrix of the form
T = \begin{pmatrix} T_1 & 0 & \cdots & 0 \\ 0 & T_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T_k \end{pmatrix},
where each Ti is a square matrix of the form
\begin{pmatrix} λ_i & 1 & 0 & \cdots & 0 \\ 0 & λ_i & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_i & 1 \\ 0 & 0 & \cdots & 0 & λ_i \end{pmatrix}
for some eigenvalue λi of A. Moreover the matrix T as above is unique, up
to permuting the order of T1, . . . , Tk.

The form of the matrix in the theorem is called Jordan canonical form.
We remark that there can be repetition among the eigenvalues λ1, . . . , λk
appearing in the expression, and the sizes of the T1, T2, . . . , Tk may be
different from each other; also some of the Ti may be 1 × 1, in which case we
have Ti = (λi). We will sketch the proof, but first we illustrate the meaning
of the theorem by listing the possible forms for T in the case n = 3.

If pA(x) has distinct roots λ1, λ2, λ3, then A is diagonalizable and
T = \begin{pmatrix} λ1 & 0 & 0 \\ 0 & λ2 & 0 \\ 0 & 0 & λ3 \end{pmatrix}.
So in this case T1 = (λ1), T2 = (λ2), T3 = (λ3).
If pA(x) = (x - λ1)(x - λ2)^2 with λ1 ≠ λ2, then
T = \begin{pmatrix} λ1 & 0 & 0 \\ 0 & λ2 & 0 \\ 0 & 0 & λ2 \end{pmatrix} or \begin{pmatrix} λ1 & 0 & 0 \\ 0 & λ2 & 1 \\ 0 & 0 & λ2 \end{pmatrix}.
In the first case k = 3, T1 = (λ1) and T2 = T3 = (λ2); in the second
case k = 2, T1 = (λ1) and T2 = \begin{pmatrix} λ2 & 1 \\ 0 & λ2 \end{pmatrix}.
If pA(x) = (x - λ)^3, then
T = \begin{pmatrix} λ & 0 & 0 \\ 0 & λ & 0 \\ 0 & 0 & λ \end{pmatrix}, \begin{pmatrix} λ & 0 & 0 \\ 0 & λ & 1 \\ 0 & 0 & λ \end{pmatrix}, or \begin{pmatrix} λ & 1 & 0 \\ 0 & λ & 1 \\ 0 & 0 & λ \end{pmatrix}.
In the first case k = 3, T1 = T2 = T3 = (λ); in the second case k = 2,
T1 = (λ) and T2 = \begin{pmatrix} λ & 1 \\ 0 & λ \end{pmatrix}; in the last case k = 1 and T = T1.
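The three shapes in the last case are distinguished by how quickly the powers of A - λI die out. As an illustrative numerical sketch (Python/NumPy, not part of the course material), here is the check for a single 3 × 3 Jordan block with λ = 2; the nullities of (A - λI)^i, used for uniqueness at the end of the proof sketch, identify the block structure.

import numpy as np

lam = 2.0
J = np.array([[lam, 1, 0],
              [0, lam, 1],
              [0, 0, lam]])          # a single 3x3 Jordan block
N = J - lam * np.eye(3)              # nilpotent part

print(np.allclose(N @ N, 0))         # False: (J - 2I)^2 != 0
print(np.allclose(N @ N @ N, 0))     # True:  (J - 2I)^3 == 0
print([3 - np.linalg.matrix_rank(np.linalg.matrix_power(N, k)) for k in (1, 2, 3)])  # nullities 1, 2, 3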

We now sketch the proof of the theorem. By Theorem 5.18, we can assume
A is upper-triangular. We will first reduce to the case where A has only
one eigenvalue. Write pA(x) = (x - λ1)^{n1} · · · (x - λr)^{nr} where λ1, . . . , λr
are distinct, and consider the matrices Ai = (A - λiI)^{ni} for i = 1, . . . , r.
Then considering the diagonal entries, we see that rank Ai ≥ n - ni, so
nullity(Ai) ≤ ni. On the other hand, by the Cayley-Hamilton Theorem
A1A2 · · · Ar = 0, so an argument like the one in the proof of Thm. 5.25 shows
that in fact nullity(Ai) = ni. Let Vi denote the null space of Ai and suppose
that vi ∈ Vi for i = 1, . . . , r are such that v1 + v2 + · · · + vr = 0. Suppose
that vi ≠ 0 for some i. If j ≠ i, then vi cannot be an eigenvector with
eigenvalue λj (else we would have Ai vi = (A - λiI)^{ni} vi = (λj - λi)^{ni} vi ≠ 0).
So (A - λjI)vi ≠ 0, but (A - λjI) commutes with Ai, so (A - λjI)vi is a
non-zero vector in Vi. Inductively, we find that Aj vi is a non-zero vector
in Vi, and in fact, letting A′ = A1A2 · · · A_{i-1}A_{i+1} · · · Ar, we get that A′vi
is non-zero. On the other hand Aj vj = 0, so that A′vj = 0 for j ≠ i, so
applying A′ to v1 + v2 + · · · + vr = 0 gives a contradiction. It follows then
as in the proof of Thm. 5.16 that if Si is a basis for Vi for i = 1, . . . , r, then
S = S1 ∪ · · · ∪ Sr is a basis for C^n. Since vi ∈ Vi implies Avi ∈ Vi, it follows
that the matrix for A with respect to such a basis will be block diagonal
with the ith diagonal block Bi being a matrix for the linear map defined by
A on Vi with respect to Si. Moreover Bi satisfies (Bi - λiI)^{ni} = 0, so λi is
the only eigenvalue of Bi.
It suffices to prove that each Bi has the required form, so we are reduced
to the case where A has only one eigenvalue λ, so we will just write λ instead
of λi. Replacing A by A - λI, we can even assume λ = 0 from now on. (If
A - λI is similar to T, then A is similar to T + λI.) Suppose then that
mA(x) = x^d, so A^d = 0, but A^{d-1} ≠ 0. Let v_1^{(1)}, v_2^{(1)}, . . . , v_{s_1}^{(1)} be a basis for
the image of A^{d-1} (viewed as the linear map v ↦ Av), and for i = 1, . . . , s_1,
choose w_i^{(1)} so that A^{d-1}w_i^{(1)} = v_i^{(1)}. Consider the linear map f_1 from the
image of A^{d-2} to the image of A^{d-1} defined by multiplication by A. Then
v_1^{(1)}, v_2^{(1)}, . . . , v_{s_1}^{(1)} are linearly independent vectors in the kernel, and so can
be extended to a basis of the kernel of f_1:
v_1^{(1)}, v_2^{(1)}, . . . , v_{s_1}^{(1)}, v_1^{(2)}, v_2^{(2)}, . . . , v_{s_2}^{(2)}.
By Lemma 2.10, these s_1 + s_2 vectors, together with
A^{d-2}w_1^{(1)}, A^{d-2}w_2^{(1)}, . . . , A^{d-2}w_{s_1}^{(1)},
form a basis for the image of A^{d-2}. Now choose w_i^{(2)} so that A^{d-2}w_i^{(2)} = v_i^{(2)} for
i = 1, . . . , s_2. Iterating the process for j = 3, . . . , d for the map f_{j-1}(v) = Av
from the image of A^{d-j} to the image of A^{d-j+1}, we get vectors
w_1^{(1)}, . . . , w_{s_1}^{(1)}, w_1^{(2)}, . . . , w_{s_2}^{(2)}, . . . , w_1^{(j)}, . . . , w_{s_j}^{(j)}
with the following properties:
• The vectors v_i^{(t)} = A^{d-t}w_i^{(t)}, for 1 ≤ t ≤ j, 1 ≤ i ≤ s_t, form a basis
for ker f_{j-1}.
• The vectors A^u w_i^{(t)} for 1 ≤ t ≤ j, d - j ≤ u ≤ d - t, 1 ≤ i ≤ s_t, form
a basis for the image of A^{d-j}.
In particular, for j = d, this gives a basis for C^n. Now divide the basis into
its k = s_1 + s_2 + · · · + s_d subsets of the form
v_i^{(t)} = A^{d-t}w_i^{(t)}, A^{d-t-1}w_i^{(t)}, . . . , Aw_i^{(t)}, w_i^{(t)}
(where t = 1, . . . , d, and i = 1, . . . , s_t for each t). Note that the effect of A
on each such subset is precisely that of a matrix Ti as in the statement of
the theorem, so the matrix of A with respect to this basis has the required
form (with λ = 0).
The uniqueness follows from the fact that the sizes of the blocks for each
eigenvalue λ are determined by the nullities of the maps (A - λI)^i for
1 ≤ i ≤ n_λ.
6. Inner products
Definition 6.1. The inner product of two vectors u = (u1, . . . , un), v =
(v1, . . . , vn) ∈ C^n is defined by
⟨u, v⟩ = Σ_{i=1}^n u_i \bar{v}_i = u_1\bar{v}_1 + u_2\bar{v}_2 + · · · + u_n\bar{v}_n.
Note that ⟨u, v⟩ = u^t \bar{v} = u · \bar{v}, where \bar{v} = (\bar{v}_1, . . . , \bar{v}_n) is the complex
conjugate of v (taken coordinatewise) and · denotes the dot (or scalar) product
of the two vectors. Thus ⟨u, v⟩ is a scalar in C.
Example 6.2. If u = (3, 1 + i, -1) and v = (1 - 2i, -i, 2), then
⟨u, v⟩ = 3(1 + 2i) + (1 + i)i + (-1)2 = 7i.
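This computation can be reproduced directly from Definition 6.1; the following sketch (Python/NumPy, an added illustration) forms the sum of u_i times the conjugate of v_i.

import numpy as np

u = np.array([3, 1 + 1j, -1])
v = np.array([1 - 2j, -1j, 2])

inner = np.sum(u * np.conj(v))    # <u, v> = sum of u_i * conjugate(v_i)
print(inner)                      # 7j
print(np.sum(v * np.conj(u)))     # <v, u> = -7j, the complex conjugate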

If u, v ∈ R^n, then the same definition applies, simply giving the real
number ⟨u, v⟩ = u^t v = u · v, since \bar{v} = v.
Proposition 6.3. If u, v, w ∈ C^n and α, β ∈ C, then
(1) ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩;
(2) ⟨v, u⟩ = \overline{⟨u, v⟩};
(3) ⟨u, u⟩ ∈ R, and ⟨u, u⟩ > 0, unless u = 0, in which case ⟨u, u⟩ = 0.
The proof is straightforward and is left as an exercise. Note that according
to (2), the order of u and v matters; interchanging the vectors replaces the
inner product by its complex conjugate, so for example if u = (3, 1 + i, -1)
and v = (1 - 2i, -i, 2), then ⟨u, v⟩ = 7i, but ⟨v, u⟩ = -7i.
Definition 6.4. If v ∈ C^n, then the norm (or length) of v is defined as
||v|| = √⟨v, v⟩. We say u is orthogonal to v if ⟨u, v⟩ = 0.
Note that ||v|| is a non-negative real number. For the vectors from
Example 6.2, with u = (3, 1 + i, -1) and v = (1 - 2i, -i, 2), we have
||u|| = √12 = 2√3 and ||v|| = √10.
More generally, suppose V is any finite-dimensional complex vector space
equipped with a map V × V → C denoted ⟨ , ⟩; i.e., a rule associating a
complex number ⟨u, v⟩ to any elements u, v ∈ V. Then V is an inner
product space if it satisfies (1), (2) and (3) of the preceding proposition.
In particular, there is the notion of the norm of a vector in V defined by
the formula ||v|| = √⟨v, v⟩ (a non-negative real number by property (3));
there is also a notion of orthogonality as for C^n. Note that C^n, with the
inner product defined above, is an inner product space. There is a similar
notion of a real inner product space, which is a real vector space V equipped
with an inner product ⟨v, w⟩ satisfying analogous axioms. Everything we do
for (complex) inner product spaces carries over without change, except that
complex conjugation plays no role.
Proposition 6.5. If V is an inner product space, then for u, v, w ∈ V and
α, β ∈ C, we have
(1) ⟨u, αv + βw⟩ = \bar{α}⟨u, v⟩ + \bar{β}⟨u, w⟩;
(2) ||αv|| = |α| ||v||;
(3) ⟨0, v⟩ = ⟨v, 0⟩ = 0;
(4) [Pythagorean Theorem] if ⟨u, v⟩ = 0, then ||u + v||^2 = ||u||^2 + ||v||^2.
The proofs of these are also straightforward and left as exercises. Note
that the proposition applies to V = C^n with its usual inner product. The
case of (4) for u, v ∈ R^2 is the usual Pythagorean Theorem for triangles in
the plane.
Theorem 6.6. Suppose that V is an inner product space and u, v ∈ V.
Then
(1) [Cauchy-Schwarz Inequality] |⟨u, v⟩| ≤ ||u|| ||v||;
(2) [Triangle Inequality] ||u + v|| ≤ ||u|| + ||v||.
Proof. (1) If v = 0, then ⟨u, v⟩ = 0, so the assertion is obvious. If v ≠ 0,
then ||v||^2 ≠ 0, and we let
w = u - (⟨u, v⟩/||v||^2) v.
Then ⟨w, v⟩ = ⟨u, v⟩ - (⟨u, v⟩/||v||^2)⟨v, v⟩ = ⟨u, v⟩ - ⟨u, v⟩ = 0. Therefore w is also
orthogonal to v′ = (⟨u, v⟩/||v||^2)v, and since u = w + v′, the Pythagorean Theorem
implies that
||u||^2 = ||w||^2 + ||v′||^2 = ||w||^2 + |⟨u, v⟩|^2/||v||^2.
It follows that |⟨u, v⟩|^2/||v||^2 ≤ ||u||^2, and (1) follows on taking square roots and
multiplying through by ||v||.
(2) We have
||u + v||^2 = ⟨u + v, u + v⟩
= ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
= ||u||^2 + ⟨u, v⟩ + \overline{⟨u, v⟩} + ||v||^2
= ||u||^2 + 2Re(⟨u, v⟩) + ||v||^2,
where Re(z) = (z + \bar{z})/2 is the real part of z. Since Re(z) ≤ |z| for any z ∈ C,
this gives (using part (1))
||u + v||^2 ≤ ||u||^2 + 2|⟨u, v⟩| + ||v||^2
≤ ||u||^2 + 2||u|| ||v|| + ||v||^2
= (||u|| + ||v||)^2,
and (2) follows on taking square roots.

Example 6.7. Again using the vectors u and v from Example 6.2, we
have |⟨u, v⟩| = 7, ||u|| = 2√3, ||v|| = √10 and u + v = (4 - 2i, 1, 1),
so ||u + v|| = √22. The inequalities in Theorem 6.6 are 7 ≤ 2√30 and
√22 ≤ 2√3 + √10.
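The same numbers can be checked in floating point. This small sketch (Python/NumPy, illustrative only) evaluates both sides of the Cauchy-Schwarz and triangle inequalities for the vectors of Example 6.2.

import numpy as np

u = np.array([3, 1 + 1j, -1])
v = np.array([1 - 2j, -1j, 2])

inner = np.sum(u * np.conj(v))
print(abs(inner), np.linalg.norm(u) * np.linalg.norm(v))             # 7.0 <= 2*sqrt(30) ~ 10.95
print(np.linalg.norm(u + v), np.linalg.norm(u) + np.linalg.norm(v))  # sqrt(22) ~ 4.69 <= ~6.62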
Definition 6.8. A set of vectors {u1, u2, . . . , uk} in an inner product space
V is orthonormal if
(1) ⟨ui, ui⟩ = 1 for i = 1, . . . , k;
(2) ⟨ui, uj⟩ = 0 if i ≠ j (with i, j ∈ {1, . . . , k}).
Note that the first condition is equivalent to ||ui|| = 1 for i = 1, . . . , k.

Example 6.9. Let u1 = (1/2)(1, -i, -1, 1)^t, u2 = (1/√2)(1, 0, 1, 0)^t and
u3 = (1/2)(i, -1, -i, -i)^t. Then {u1, u2, u3} is an orthonormal subset of C^4.

Lemma 6.10. An orthonormal set is linearly independent.


Proof. Suppose {u1, . . . , uk} is orthonormal and α1, . . . , αk ∈ C are such
that
α1u1 + α2u2 + · · · + αkuk = 0.
We must show that αi = 0 for i = 1, . . . , k. Since ⟨0, ui⟩ = 0 for each i, we
know that
0 = ⟨α1u1 + α2u2 + · · · + αkuk, ui⟩ = α1⟨u1, ui⟩ + α2⟨u2, ui⟩ + · · · + αk⟨uk, ui⟩.
But ⟨uj, ui⟩ = 0 unless j = i, so the only surviving term on the right is
αi⟨ui, ui⟩ = αi, which must therefore be 0.


If an orthonormal set spans V , then (being linearly independent) it is


necessarily a basis, called an orthonormal basis.

Example 6.11. The standard basis {e1 , e2 , . . . , en }, where ei is the vector


(0, 0, . . . , 0, 1, 0, . . . , 0)t (with the 1 in the ith place) is an orthonormal basis
for Cn . For another example of an orthonormal basis, take {u1 , u2 , u3 , u4 }
where u1, u2, u3 are as in Example 6.9 and u4 = (1/√2)(0, i, 0, 1)^t.
Lemma 6.12. Let (α1, α2, . . . , αn)^t and (β1, β2, . . . , βn)^t be the coordinates
of v and w with respect to an orthonormal basis {u1, u2, . . . , un} of an inner
product space V. Then
(1) αi = ⟨v, ui⟩ for i = 1, . . . , n;
(2) ⟨v, w⟩ = α1\bar{β}1 + · · · + αn\bar{β}n.

Proof. (1) For (α1, α2, . . . , αn)^t to be the coordinates of v means that v =
α1u1 + α2u2 + · · · + αnun. Therefore
⟨v, ui⟩ = α1⟨u1, ui⟩ + α2⟨u2, ui⟩ + · · · + αn⟨un, ui⟩,
and since ⟨uj, ui⟩ = 0 unless i = j, in which case it is 1, we get ⟨v, ui⟩ = αi.
(2) Since w = β1u1 + β2u2 + · · · + βnun, we have
⟨v, w⟩ = \bar{β}1⟨v, u1⟩ + \bar{β}2⟨v, u2⟩ + · · · + \bar{β}n⟨v, un⟩,
which by (1) is α1\bar{β}1 + · · · + αn\bar{β}n.

Example 6.13. Let {u1, u2, u3, u4} be the orthonormal basis of C^4 from
Example 6.11. If v = (1 + i, 2, 0, -i)^t, then the coordinates of v are given
by α1 = ⟨v, u1⟩ = (1 + 2i)/2, α2 = ⟨v, u2⟩ = (1 + i)/√2, α3 = ⟨v, u3⟩ = -i/2
and α4 = ⟨v, u4⟩ = -3i/√2.
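Lemma 6.12 says these coordinates are just inner products against the basis vectors, and that they reassemble v. The sketch below (Python/NumPy, an added illustration; any orthonormal basis of C^4 would do in place of this one) verifies both statements.

import numpy as np

u1 = np.array([1, -1j, -1, 1]) / 2
u2 = np.array([1, 0, 1, 0]) / np.sqrt(2)
u3 = np.array([1j, -1, -1j, -1j]) / 2
u4 = np.array([0, 1j, 0, 1]) / np.sqrt(2)
B = np.column_stack([u1, u2, u3, u4])
print(np.allclose(B.conj().T @ B, np.eye(4)))    # the basis is orthonormal

v = np.array([1 + 1j, 2, 0, -1j])
alphas = np.array([np.sum(v * np.conj(u)) for u in (u1, u2, u3, u4)])  # alpha_i = <v, u_i>
print(alphas)                                    # (1+2i)/2, (1+i)/sqrt(2), -i/2, -3i/sqrt(2)
print(np.allclose(B @ alphas, v))                # v = sum of alpha_i u_i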


Definition 6.14. For a complex n × n matrix A = (aij), the adjoint of A,
denoted A*, is the matrix whose (i, j)-entry is \bar{a}_{ji}; in other words A* = \bar{A}^t.
A complex n × n matrix U is unitary if UU* = I; in other words, U is
invertible and U^{-1} = U*. A real n × n matrix P is orthogonal if PP^t = I;
in other words P is orthogonal if P is invertible and P^{-1} = P^t. (Note that
if A is real, then A* = A^t, so a real matrix is orthogonal if and only if it is
unitary.)


Example 6.15. The matrix U = (1/5)\begin{pmatrix} 3 & 4i \\ -4 & 3i \end{pmatrix} is unitary since U* =
(1/5)\begin{pmatrix} 3 & -4 \\ -4i & -3i \end{pmatrix} = U^{-1}.
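Unitarity is just the matrix identity UU* = I, so it is a one-line check in code. The sketch below (Python/NumPy, illustrative only) verifies it for the matrix of Example 6.15.

import numpy as np

U = np.array([[3, 4j], [-4, 3j]]) / 5
print(np.allclose(U @ U.conj().T, np.eye(2)))      # True: U U* = I
print(np.allclose(np.linalg.inv(U), U.conj().T))   # True: U^{-1} = U*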
Proposition 6.16. Let A and U be complex n × n matrices, and P a real
n × n matrix. Then
(1) (A*)* = A;
(2) if U is unitary, then so are U* = U^{-1}, \bar{U} and U^t = \bar{U}^{-1};
(3) if P is real orthogonal, then so is P^t = P^{-1}.

The proof is left as an exercise.


Proposition 6.17. Suppose that S = {u1, . . . , un} and S′ = {v1, . . . , vn}
are bases for C^n and that {u1, . . . , un} is orthonormal. Then {v1, . . . , vn}
is orthonormal if and only if the transition matrix from S′ to S is unitary.
Proof. Let U = (α_{ij}) be the transition matrix from S′ to S. Then the jth
column (α_{1j}, . . . , α_{nj})^t of U is given by the coordinates of vj with respect
to S. Therefore the ith row of U* is (\bar{α}_{1i}, . . . , \bar{α}_{ni}). It follows that the
(i, j)-entry of U*U is
α_{1j}\bar{α}_{1i} + α_{2j}\bar{α}_{2i} + · · · + α_{nj}\bar{α}_{ni},
which by Lemma 6.12 is precisely ⟨vj, vi⟩. So U*U = I is equivalent to
having ⟨vi, vi⟩ = 1 for all i, and ⟨vj, vi⟩ = 0 whenever i ≠ j.

Corollary 6.18. A complex n × n matrix is unitary if and only if its columns
form an orthonormal basis for C^n.
Proof. The basis S = {e1, . . . , en} is orthonormal, and the transition matrix
from S′ = {v1, . . . , vn} to S is the matrix whose columns are precisely
the vectors in S′.
Example 6.19. From Example 6.11, we see that the matrix
(1/2)\begin{pmatrix} 1 & \sqrt{2} & i & 0 \\ -i & 0 & -1 & i\sqrt{2} \\ -1 & \sqrt{2} & -i & 0 \\ 1 & 0 & -i & \sqrt{2} \end{pmatrix}
is unitary.

Analogous statements are true for real matrices; in particular, a real n × n
matrix is orthogonal if and only if its columns form an orthonormal basis
for R^n.


Definition 6.20. If A and B are complex square matrices, then we say
that A is unitarily similar to B if A = U^{-1}BU = U*BU for some unitary
matrix U, and A is unitarily diagonalizable if it is unitarily similar to a
diagonal matrix. If A and B are real square matrices, then we say that A is
orthogonally similar to B if A = P^{-1}BP = P^tBP for some orthogonal
matrix P, and A is orthogonally diagonalizable if it is orthogonally
similar to a diagonal matrix.
It is easy to see that unitary (and orthogonal) similarity are equivalence
relations using the fact that if U and U′ are unitary, then so are U^{-1} and
UU′ (see the exercises).
Theorem 6.21. A complex (or real) matrix A is unitarily (or orthogonally)
diagonalizable if and only if there is an orthonormal basis consisting
of eigenvectors for A.
Proof. This is clear from the definitions and Corollary 6.18, since U*AU is
diagonal if and only if the columns of U are eigenvectors for A.



Example 6.22. The matrix A = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} is unitarily diagonalizable since
(1/√2)(1, 1)^t and (1/√2)(1, -1)^t form an orthonormal basis of eigenvectors with
eigenvalues i and -i. More explicitly, we have U*AU = D where U =
(1/√2)\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} and D = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}.
Definition 6.23. A square matrix A is
(1) symmetric if A = A^t;
(2) self-adjoint (or Hermitian) if A* = A;
(3) normal if AA* = A*A.
Example 6.24. If A is self-adjoint, then A is normal. The matrix A =
\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} is not self-adjoint, but it is normal since A* = -A, so A*A =
AA* = -A^2.
Proposition 6.25.
(1) If a real matrix is orthogonally diagonalizable, then it is symmetric.
(2) If a complex matrix is unitarily diagonalizable, then it is normal.
Proof. (1) If A is orthogonally diagonalizable, then A = PDP^t for some
orthogonal P and diagonal D. Then A^t = (PDP^t)^t = (P^t)^tD^tP^t = PDP^t =
A, so A is symmetric.
(2) If A is unitarily diagonalizable, then A = UDU* for some unitary U
and diagonal D. Then A* = (UDU*)* = (U*)*D*U* = UD*U*, so
AA* = (UDU*)(UD*U*) = U(DD*)U*
and similarly A*A = U(D*D)U*. Since D is diagonal, DD* = D*D, so A is
normal.


We will later prove the converse to both parts of the proposition. First
we note the behavior of adjoint and unitary matrices with respect to the
inner product.
Proposition 6.26. Suppose that A is a complex n × n matrix.
(1) If ⟨Av, w⟩ = 0 for all v, w ∈ C^n, then A = 0.
(2) If v, w ∈ C^n, then ⟨Av, w⟩ = ⟨v, A*w⟩.
(3) If U is a unitary n × n matrix and v, w ∈ C^n, then
⟨Uv, Uw⟩ = ⟨v, w⟩ and ||Uv|| = ||v||.

Proof. (1) Let ei be the ith standard basis vector for i = 1, . . . , n. Then
Aej is the jth column of A = (aij), so Aej = Σ_{k=1}^n a_{kj} e_k. Therefore
⟨Aej, ei⟩ = ⟨Σ_{k=1}^n a_{kj} e_k, ei⟩ = Σ_{k=1}^n a_{kj}⟨e_k, e_i⟩.
The only term that survives on the right is for k = i, giving ⟨Aej, ei⟩ = aij.
Applying ⟨Av, w⟩ = 0 with v = ej, w = ei therefore gives aij = 0 for all
i, j, so A = 0.
(2) Since A* = \bar{A}^t, we have
⟨v, A*w⟩ = v^t \overline{A^*w} = v^t \overline{A^*} \bar{w} = v^t A^t \bar{w} = (Av)^t \bar{w} = ⟨Av, w⟩.
(3) Applying part (2) with A = U and with Uw in place of w gives
⟨Uv, Uw⟩ = ⟨v, U*Uw⟩ = ⟨v, w⟩
since U*U = I. The assertion about ||Uv|| follows from the case v = w and
taking square roots.

Analogous statements hold for real matrices. (See the exercises.)
Theorem 6.27. If A is self-adjoint, then
(1) the eigenvalues of A are real;
(2) eigenvectors with distinct eigenvalues are orthogonal.
Proof. (1) Suppose λ ∈ C is an eigenvalue of A, so Av = λv for some
v ≠ 0. Then ⟨Av, v⟩ = ⟨λv, v⟩ = λ⟨v, v⟩. Since A = A*, part (2) of
Prop. 6.26 implies that
⟨Av, v⟩ = ⟨v, A*v⟩ = ⟨v, Av⟩ = ⟨v, λv⟩ = \bar{λ}⟨v, v⟩.
Since λ⟨v, v⟩ = \bar{λ}⟨v, v⟩ and ⟨v, v⟩ ≠ 0, it follows that λ = \bar{λ}, so λ ∈ R.
(2) Suppose that Av = λv and Aw = μw for some non-zero v, w ∈ C^n,
and some λ ≠ μ ∈ C. Then ⟨Av, w⟩ = ⟨λv, w⟩ = λ⟨v, w⟩, and
⟨Av, w⟩ = ⟨v, Aw⟩ = ⟨v, μw⟩ = \bar{μ}⟨v, w⟩ = μ⟨v, w⟩
since μ ∈ R. Since λ⟨v, w⟩ = μ⟨v, w⟩ and λ ≠ μ, it follows that ⟨v, w⟩ = 0.
Note that the theorem applies to real symmetric matrices.


The theorem has the following corollary (which we will improve upon
later, removing the assumption that A has n eigenvalues).
Corollary 6.28.
(1) If A is self-adjoint with n distinct eigenvalues, then A is unitarily
diagonalizable.
(2) If A is real symmetric with n distinct eigenvalues, then A is orthogonally diagonalizable.
Proof. (1) Let {v1, v2, . . . , vn} be a basis consisting of eigenvectors for the
distinct eigenvalues λ1, λ2, . . . , λn. Let ui = vi/||vi||, for i = 1, . . . , n. Then
each ui is an eigenvector with eigenvalue λi. By Thm. 6.27 we have ⟨ui, uj⟩ =
0 for i ≠ j. Since ||ui|| = 1 for i = 1, . . . , n, we see that {u1, u2, . . . , un} is
an orthonormal basis of eigenvectors for A, so A is unitarily diagonalizable.
The proof of (2) is the same.



Example 6.29. Let A be the self-adjoint matrix \begin{pmatrix} 1 & -2i \\ 2i & 4 \end{pmatrix}. Then
det(xI - A) = x^2 - 5x = x(x - 5), so the eigenvalues are the real numbers 0
and 5. For λ1 = 0, we have the eigenvector v1 = (2i, 1)^t, and for λ2 = 5,
we have the eigenvector v2 = (1, 2i)^t. Then v1 is orthogonal to v2 and an
orthonormal basis is given by dividing each by its norm √5. Thus if we
take U = (1/√5)\begin{pmatrix} 2i & 1 \\ 1 & 2i \end{pmatrix}, then U is unitary, and U*AU = \begin{pmatrix} 0 & 0 \\ 0 & 5 \end{pmatrix}.
Before treating the general cases of normal and real symmetric matrices,
we establish the following useful algorithm for constructing orthonormal
sets.
Theorem 6.30 (Gram-Schmidt Process). Suppose that {v1, v2, . . . , vk} is a
linearly independent subset of an inner product space V. Let u1, u2, . . . , uk
be vectors defined inductively as follows:
w1 = v1, u1 = w1/||w1||,
w2 = v2 - ⟨v2, u1⟩u1, u2 = w2/||w2||,
w3 = v3 - ⟨v3, u1⟩u1 - ⟨v3, u2⟩u2, u3 = w3/||w3||,
. . .
wk = vk - Σ_{i=1}^{k-1}⟨vk, ui⟩ui, uk = wk/||wk||.
Then {u1, . . . , uk} is orthonormal and has the same span as {v1, . . . , vk}.


Proof. We prove the theorem by induction on k. For k = 1, we have
||u1|| = 1, so {u1} is orthonormal, and since u1 is a non-zero multiple of v1,
they have the same span.
Now suppose k > 1 and the theorem holds with k replaced by k - 1. Then
{u1, . . . , u_{k-1}} is orthonormal and has the same span as {v1, . . . , v_{k-1}}.
We must first check that wk ≠ 0 in order for the formula defining uk to
make sense. Suppose then that wk = 0. Then
vk = Σ_{i=1}^{k-1}⟨vk, ui⟩ui ∈ span{u1, . . . , u_{k-1}} = span{v1, . . . , v_{k-1}},
which contradicts the assumption that {v1, . . . , v_{k-1}, vk} is linearly independent
(recall Lemma 1.19).
Next we check that {u1, . . . , uk} is orthonormal. Since {u1, . . . , u_{k-1}}
is orthonormal, we already know that ||ui|| = 1 for i = 1, . . . , k - 1 and
⟨ui, uj⟩ = 0 if i ≠ j ∈ {1, . . . , k - 1}. It is clear that ||uk|| = 1, so we only
need to check that ⟨wk, uj⟩ = 0 for j = 1, . . . , k - 1 (as this implies that
⟨uk, uj⟩ = 0 and ⟨uj, uk⟩ = \overline{⟨uk, uj⟩} = 0 for j = 1, . . . , k - 1). But for each
such j, we have
⟨wk, uj⟩ = ⟨vk - Σ_{i=1}^{k-1}⟨vk, ui⟩ui, uj⟩
= ⟨vk, uj⟩ - Σ_{i=1}^{k-1}⟨vk, ui⟩⟨ui, uj⟩.
By the induction hypothesis, ⟨ui, uj⟩ = 0 unless i = j, in which case it is 1,
so the right hand side becomes ⟨vk, uj⟩ - ⟨vk, uj⟩ = 0, as required.
Finally we must check that span{u1, . . . , uk} = span{v1, . . . , vk}. By the
induction hypothesis, we know span{u1, . . . , u_{k-1}} = span{v1, . . . , v_{k-1}}, so
in particular ui ∈ span{v1, . . . , v_{k-1}} ⊆ span{v1, . . . , vk} for i = 1, . . . , k - 1.
By construction, we have wk ∈ span{u1, . . . , u_{k-1}, vk} ⊆ span{v1, . . . , v_{k-1}, vk},
and therefore so is uk. It follows that span{u1, . . . , uk} ⊆ span{v1, . . . , vk}.
Since we have shown that {u1, . . . , uk} is orthonormal, we know it is linearly
independent, so span{u1, . . . , uk} has dimension k. We know span{v1, . . . , vk}
also has dimension k, so in fact span{u1, . . . , uk} = span{v1, . . . , vk}.

The Gram-Schmidt Process works in exactly the same way for real inner
product spaces. For R^n, it has the following geometric interpretation: first
rescale v1 to get a vector u1 of length 1. Then the orthogonal projection
of v2 onto the line spanned by u1 is ⟨v2, u1⟩u1; subtract this from v2 to get
w2, and rescale to get u2. Then the orthogonal projection of v3 to the plane
spanned by u1 and u2 is ⟨v3, u1⟩u1 + ⟨v3, u2⟩u2; subtract this from v3 and
rescale to get u3, and so on.
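The inductive definition in Theorem 6.30 translates directly into code. The following sketch (Python/NumPy, an added illustration) implements it for vectors in C^n with the inner product of Definition 6.1, and tests it on the vectors used in the next example.

import numpy as np

def inner(u, v):
    """<u, v> = sum of u_i * conjugate(v_i), as in Definition 6.1."""
    return np.sum(u * np.conj(v))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    us = []
    for v in vectors:
        w = v - sum(inner(v, u) * u for u in us)   # subtract projections onto earlier u_i
        us.append(w / np.sqrt(inner(w, w).real))   # w is non-zero by linear independence
    return us

v1 = np.array([1, 0, 1j]); v2 = np.array([1, 1j, 1]); v3 = np.array([0, 1, -1 + 1j])
U = np.column_stack(gram_schmidt([v1, v2, v3]))
print(np.allclose(U.conj().T @ U, np.eye(3)))      # the columns are orthonormal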

Example 6.31. Let us apply the Gram-Schmidt Process to the vectors
v1 = (1, 0, i)^t, v2 = (1, i, 1)^t and v3 = (0, 1, -1 + i)^t in C^3. Then w1 = v1,
u1 = (1/√2)v1 = (1/√2)(1, 0, i)^t, and
w2 = v2 - ⟨v2, u1⟩u1 = v2 - (1/2)⟨v2, w1⟩w1.
Since ⟨v2, w1⟩ = 1 - i, this gives w2 = ((1 + i)/2, i, (1 - i)/2)^t, so
u2 = (1/√2)((1 + i)/2, i, (1 - i)/2)^t.
Now
w3 = v3 - (1/2)⟨v3, w1⟩w1 - (1/2)⟨v3, w2⟩w2 = (-1/2, (1 + i)/2, i/2)^t
happens to have ||w3|| = 1, so let u3 = w3.
We record some consequences of the Gram-Schmidt process.
Corollary 6.32.
(1) Every subspace of C^n has an orthonormal basis;
(2) if v ∈ C^n with ||v|| = 1, then v is the first column of a unitary
matrix;
(3) if Q is an invertible n × n complex matrix, then Q = UT for some
unitary matrix U and upper-triangular matrix T.
Proof. (1) Apply the Gram-Schmidt process to any basis {v1, . . . , vk} for
the subspace.
(2) Extend {v} to a basis {v1, . . . , vn} for C^n with v1 = v (by Thm. 1.26)
and then apply the Gram-Schmidt process to obtain an orthonormal basis
{u1, . . . , un}. Note that since ||v1|| = 1, we have u1 = v1 = v, and since
{u1, . . . , un} is orthonormal, the matrix whose columns are u1, . . . , un is
unitary.
(3) Let v1, v2, . . . , vn be the columns of Q. Since Q is invertible,
rank Q = n, so the columns span C^n, hence form a basis. Now apply the
Gram-Schmidt process to S = {v1, v2, . . . , vn} to get an orthonormal basis
S′ = {u1, u2, . . . , un}. Then the matrix U whose columns are u1, u2, . . . , un
is unitary, and T = U^{-1}Q is the transition matrix from S to S′. Recall
that the jth column of T is given by the coordinates of vj with respect to
S′. Since vj is in the span of {u1, . . . , uj}, its ith coordinate is 0 for i > j,
which means that T is upper-triangular.
The analogous statements hold for vectors and subspaces of R^n and real
matrices, replacing unitary with orthogonal throughout; the proofs are the
same as for C.
We discuss some examples before proceeding with the proof of unitary
(resp. orthogonal) diagonalizability of normal (resp. real symmetric) matrices.
Example 6.33. Let V be the null space in C^4 of the matrix
A = \begin{pmatrix} 1 & i & 0 & 1-i \\ 1 & 0 & -2 & -i \end{pmatrix}.
We can find an orthonormal basis for V as follows. First find a basis for V
in the usual way, by applying row operations to A to get
\begin{pmatrix} 1 & 0 & -2 & -i \\ 0 & 1 & -2i & -i \end{pmatrix}.
We see therefore that v1 = (2, 2i, 1, 0)^t, v2 = (i, i, 0, 1)^t is a basis. Applying
Gram-Schmidt to get an orthonormal basis gives u1 = (1/3)(2, 2i, 1, 0)^t and
w2 = (i, i, 0, 1)^t - (1/9)(2 + 2i)(2, 2i, 1, 0)^t = (1/9)(5i - 4, 5i + 4, -2 - 2i, 9)^t.
Normalizing then gives u2 = (1/(3√19))(5i - 4, 5i + 4, -2 - 2i, 9)^t.

Example 6.34. Let v = (1/3)(1, 2, 2)^t ∈ R^3. Then ||v|| = 1, and we can
extend {v} to a basis using, for example, the standard basis vectors e1, e2.
Applying the Gram-Schmidt process to {v, e1, e2} gives
w2 = e1 - ⟨e1, v⟩v = (1, 0, 0)^t - (1/9)(1, 2, 2)^t = (2/9)(4, -1, -1)^t.
Then ||w2|| = (2/9) · 3√2, so we let
u2 = w2/||w2|| = (1/(3√2))(4, -1, -1)^t.
Now ⟨e2, v⟩ = 2/3 and ⟨e2, u2⟩ = -1/(3√2), so we let
w3 = e2 - ⟨e2, v⟩v - ⟨e2, u2⟩u2
= (0, 1, 0)^t - (2/9)(1, 2, 2)^t + (1/18)(4, -1, -1)^t = (1/2)(0, 1, -1)^t,
giving u3 = (1/√2)(0, 1, -1)^t. We thus obtain an orthogonal matrix
\begin{pmatrix} 1/3 & 4/(3\sqrt{2}) & 0 \\ 2/3 & -1/(3\sqrt{2}) & 1/\sqrt{2} \\ 2/3 & -1/(3\sqrt{2}) & -1/\sqrt{2} \end{pmatrix}
with first column v = (1/3)(1, 2, 2)^t.

Example 6.35. Let Q be the matrix whose columns are the vectors v1, v2, v3
from Example 6.31, so
Q = \begin{pmatrix} 1 & 1 & 0 \\ 0 & i & 1 \\ i & 1 & -1+i \end{pmatrix}.
We then have the matrix
U = \begin{pmatrix} 1/\sqrt{2} & (1+i)/(2\sqrt{2}) & -1/2 \\ 0 & i/\sqrt{2} & (1+i)/2 \\ i/\sqrt{2} & (1-i)/(2\sqrt{2}) & i/2 \end{pmatrix}
whose columns are the vectors u1, u2, u3 gotten from the Gram-Schmidt
Process. Now
T = U^{-1}Q = U^*Q = \begin{pmatrix} \sqrt{2} & (1-i)/\sqrt{2} & (1+i)/\sqrt{2} \\ 0 & \sqrt{2} & (-1-i)/\sqrt{2} \\ 0 & 0 & 1 \end{pmatrix}
is upper-triangular.

So far the only examples of inner product spaces we've considered were
R^n and C^n. For more examples, note that a subspace of an inner product
space is still an inner product space. For more interesting examples, one can
consider spaces of functions.
Example 6.36. Let V be the set of real polynomials of degree at most n.
Define an inner product on V by
⟨f, g⟩ = ∫_0^1 f(x)g(x) dx.
This satisfies the conditions in the definition of a real inner product space,
since
(1) ∫_0^1 (αf(x) + βg(x))h(x) dx = α∫_0^1 f(x)h(x) dx + β∫_0^1 g(x)h(x) dx shows
that ⟨αf + βg, h⟩ = α⟨f, h⟩ + β⟨g, h⟩;
(2) ⟨g, f⟩ = ∫_0^1 g(x)f(x) dx = ⟨f, g⟩;
(3) ⟨f, f⟩ = ∫_0^1 f(x)^2 dx > 0 unless f(x) = 0.

In the context of abstract inner product spaces, the Gram-Schmidt Process


has the following corollary.
Corollary 6.37. Every inner product space has an orthonormal basis.
Proof. Let {v1 , . . . , vn } be any basis for V , and apply the Gram-Schmidt
Process to get an orthonormal basis.

We already saw how this works if V is a subspace of C^n (see Example 6.33).
Consider an example where V is a space of polynomials, as in Example 6.36.
Example 6.38. Let V be the space of real polynomials of degree at most 2.
Let us take the basis {f1, f2, f3} where f1(x) = 1, f2(x) = x and f3(x) = x^2,
and apply the Gram-Schmidt Process to get an orthonormal basis. Since
||f1|| = (∫_0^1 1 dx)^{1/2} = 1, we do not need to rescale, and we let h1 = f1. Since
⟨f2, h1⟩ = ∫_0^1 x dx = 1/2, we set g2(x) = x - 1/2. Now
||g2||^2 = ∫_0^1 (x - 1/2)^2 dx = [ (1/3)(x - 1/2)^3 ]_0^1 = 1/12,
so ||g2|| = 1/(2√3). Our second basis vector is therefore
h2(x) = g2(x)/||g2|| = 2√3 (x - 1/2) = √3(2x - 1).
Since ⟨f3, h1⟩ = ∫_0^1 x^2 dx = 1/3 and
⟨f3, h2⟩ = √3 ∫_0^1 x^2(2x - 1) dx = √3 ∫_0^1 (2x^3 - x^2) dx
= √3 [ (1/2)x^4 - (1/3)x^3 ]_0^1 = √3 (1/2 - 1/3) = √3/6,
we set
g3(x) = x^2 - 1/3 - (√3/6)·√3(2x - 1) = x^2 - x + 1/6.
Then another integral calculation gives ||g3||^2 = 1/180, so for the third
vector in the orthonormal basis, take
h3(x) = g3(x)/||g3|| = √5(6x^2 - 6x + 1).

Example 6.39. If we momentarily drop the assumption that V be finite-dimensional,
we can consider more interesting examples. Let V be the space
of continuous complex-valued functions on the real unit interval [0, 1] (so
f ∈ V means that f(x) = s(x) + it(x) where s and t are continuous real-valued
functions on [0, 1]). Define
⟨f, g⟩ = ∫_0^1 f(x)\overline{g(x)} dx
for f, g ∈ V. This satisfies the axioms to be an inner product on V. (Since
⟨f, f⟩ is the integral of the continuous non-negative real-valued function
|f(x)|^2 on [0, 1], its value is positive unless |f(x)|^2, and hence f(x), are
identically 0.) Consider the functions
fn(x) = e^{2πinx} = cos(2πnx) + i sin(2πnx)
for n ∈ Z. Then \overline{fn(x)} = e^{-2πinx} = cos(2πnx) - i sin(2πnx), |fn(x)|^2 = 1,
||fn||^2 = ∫_0^1 1 dx = 1, and if m ≠ n, then
⟨fm, fn⟩ = ∫_0^1 e^{2πimx} e^{-2πinx} dx
= ∫_0^1 e^{2πi(m-n)x} dx
= ∫_0^1 (cos(2π(m - n)x) + i sin(2π(m - n)x)) dx
= ∫_0^1 cos(2π(m - n)x) dx + i ∫_0^1 sin(2π(m - n)x) dx
= 0.
Therefore {fn | n ∈ Z} is an example of an infinite orthonormal set. The set is
linearly independent, by the same argument as for finite orthonormal sets.

Now we return to the task of proving unitary diagonalizability. We first


prove:
Theorem 6.40.
(1) Every square matrix is unitarily similar to an upper-triangular matrix.
(2) Every square matrix with only real eigenvalues is orthogonally similar
to an upper-triangular matrix.
Proof. (1) Recall from Theorem 5.18 that if A is a square matrix, then A
is similar to an upper-triangular matrix, so Q^{-1}AQ = T is upper-triangular
for some invertible Q and upper-triangular T. By Corollary 6.32(3), Q = UT′
for some unitary U and upper-triangular T′. It follows that
T = Q^{-1}AQ = (UT′)^{-1}A(UT′) = (T′)^{-1}U^{-1}AUT′,
and therefore U*AU = U^{-1}AU = T′T(T′)^{-1} is upper-triangular.
(2) If A is a real square matrix with only real eigenvalues, then the proof
of Theorem 5.18 goes through in exactly the same way to show that A is
similar (over R) to an upper-triangular matrix, and the proof of Corollary 6.32
goes through to show that if Q is real invertible, then Q = PT for some
orthogonal P and upper-triangular T. The proof of (2) is then the same as
(1).

Theorem 6.41.
(1) A matrix is unitarily diagonalizable if and only if it is normal.
(2) A real matrix is orthogonally diagonalizable if and only if it is symmetric.
Proof. (1) We already saw in Prop. 6.25 that if A is unitarily diagonalizable,
then A is normal. We must prove the converse.
Suppose then that A is normal. By Thm. 6.40, we know that U*AU = T for
some unitary matrix U and upper-triangular matrix T. Since AA* = A*A
and T* = (U*AU)* = U*A*(U*)* = U*A*U, it follows that
TT* = (U*AU)(U*A*U) = U*A(UU*)A*U = U*(AA*)U
= U*(A*A)U = U*A*UU*AU = T*T;
i.e., T is normal.
We complete the proof by showing that a normal upper-triangular n × n
matrix is diagonal. We prove this by induction on n. The case n = 1 is
obvious. Suppose then that n > 1 and the statement is true for (n - 1) ×
(n - 1) matrices. Let T be a normal upper-triangular n × n matrix. Since
T is upper-triangular, we can write
T = \begin{pmatrix} λ & v^t \\ 0 & T_1 \end{pmatrix},
where λ ∈ C, v ∈ C^{n-1} and T1 is an (n - 1) × (n - 1) matrix. Then
T* = \begin{pmatrix} \bar{λ} & 0 \\ \bar{v} & T_1^* \end{pmatrix}, so
TT* = \begin{pmatrix} |λ|^2 + v^t\bar{v} & v^tT_1^* \\ T_1\bar{v} & T_1T_1^* \end{pmatrix} and T*T = \begin{pmatrix} |λ|^2 & \bar{λ}v^t \\ λ\bar{v} & \bar{v}v^t + T_1^*T_1 \end{pmatrix}.
Since T is normal, TT* = T*T, and equating the upper-left entries gives
|λ|^2 + ||v||^2 = |λ|^2, so ||v||^2 = 0, which implies that v = 0. Therefore
T = \begin{pmatrix} λ & 0 \\ 0 & T_1 \end{pmatrix}, and T1T1* = T1*T1. Now T1 is a normal upper-triangular
(n - 1) × (n - 1) matrix, so by the induction hypothesis, T1 is diagonal, and
it follows that so is T.
(2) Again, we've already seen one implication in Prop. 6.25, namely that
if A is real and orthogonally diagonalizable, then A is symmetric. We must
prove the converse, so suppose that A is symmetric. By Thm. 6.27, the
eigenvalues of A are all real, so Thm. 6.40 implies that P^tAP = T for some
orthogonal P and upper-triangular T. Now T^t = P^tA^tP = P^tAP = T, so T
is symmetric, and a symmetric upper-triangular matrix is clearly diagonal.



Finding the matrix U (or P) as in the theorem amounts to finding an
orthonormal basis of eigenvectors. If the eigenvalues are distinct then the
eigenvectors are automatically orthogonal (see Theorem 6.27 for the real
symmetric/orthogonal case; the normal/unitary case is similar), so only need
to be rescaled to obtain an orthonormal basis (see Example 6.29). If the
characteristic polynomial has repeated roots, then for each repeated root λ,
we apply the Gram-Schmidt Process to the linearly independent eigenvectors
with that eigenvalue to obtain an orthonormal basis for the kernel of A - λI.
Example 6.42. Let A denote the matrix
\begin{pmatrix} 3 & -2 & 4 \\ -2 & 6 & 2 \\ 4 & 2 & 3 \end{pmatrix}.
Since A is symmetric, we know by the theorem that it is orthogonally diagonalizable.
The characteristic polynomial of A is
pA(x) = (x - 3)^2(x - 6) + 16 + 16 - 16(x - 6) - 4(x - 3) - 4(x - 3)
= x^3 - 12x^2 + 21x + 98.
We see that x = -2 is a root and pA(x) factors as (x + 2)(x^2 - 14x + 49) =
(x + 2)(x - 7)^2. Applying row operations to A + 2I gives
\begin{pmatrix} 5 & -2 & 4 \\ -2 & 8 & 2 \\ 4 & 2 & 5 \end{pmatrix} → \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1/2 \\ 0 & 0 & 0 \end{pmatrix},
so that (2, 1, -2)^t is an eigenvector with eigenvalue -2. Normalizing gives
u1 = (1/3)(2, 1, -2)^t. Applying row operations to A - 7I gives
\begin{pmatrix} -4 & -2 & 4 \\ -2 & -1 & 2 \\ 4 & 2 & -4 \end{pmatrix} → \begin{pmatrix} 1 & 1/2 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
so that (1, 0, 1)^t and (-1, 2, 0)^t are linearly independent eigenvectors with
eigenvalue 7. Applying the Gram-Schmidt Process to these gives u2 =
(1/√2)(1, 0, 1)^t and u3 = (1/(3√2))(-1, 4, 1)^t. Therefore setting
P = \begin{pmatrix} 2/3 & 1/\sqrt{2} & -1/(3\sqrt{2}) \\ 1/3 & 0 & 4/(3\sqrt{2}) \\ -2/3 & 1/\sqrt{2} & 1/(3\sqrt{2}) \end{pmatrix}
gives an orthogonal matrix such that P^tAP is the diagonal matrix with
entries -2, 7, 7 on the diagonal.
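The whole computation can be confirmed with NumPy's symmetric eigensolver, whose eigenvector matrix is orthogonal. A short illustrative sketch (the eigenvectors it returns may differ from those above by sign or, within the repeated eigenvalue 7, by an orthogonal change of basis):

import numpy as np

A = np.array([[3.0, -2.0, 4.0],
              [-2.0, 6.0, 2.0],
              [4.0, 2.0, 3.0]])

eigenvalues, P = np.linalg.eigh(A)       # for symmetric A: A = P diag(eigenvalues) P^t
print(eigenvalues)                       # approximately [-2., 7., 7.]
print(np.allclose(P.T @ P, np.eye(3)))   # P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))   # P^t A P is diagonal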
