Linear Algebra Lecture Notes
1. Vector Spaces
Definition 1.1. A real vector space (or a vector space over R) is a
set V together with two operations + and · called addition and scalar
multiplication respectively, such that:
(i) addition is a binary operation on V which makes V into an abelian group,
and
(ii) the operation of scalar multiplying an element v ∈ V by an element
λ ∈ R gives an element λ·v of V.
In addition, the following axioms hold:
(a) 1·v = v for all v ∈ V;
(b) if λ, µ ∈ R and v ∈ V, then λ·(µ·v) = (λµ)·v;
(c) if λ ∈ R and v, w ∈ V, then λ·(v + w) = λ·v + λ·w;
(d) if λ, µ ∈ R and v ∈ V, then (λ + µ)·v = λ·v + µ·v.
In Mm,n(R), the set of m × n matrices with real entries, addition is defined entrywise:
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots &        &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
+
\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots &        &        & \vdots \\
b_{m1} & b_{m2} & \cdots & b_{mn}
\end{pmatrix}
=
\begin{pmatrix}
a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\
a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\
\vdots        &               &        & \vdots \\
a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn}
\end{pmatrix}.
\]
We can abbreviate the above notation by writing the first matrix above as
A = (aij )i,j , meaning that A is the matrix whose entry in row i, column j (or
(i, j)-entry) is aij for i = 1, . . . , m and j = 1, . . . , n. Similarly writing B =
(bij)i,j, the formula becomes A + B = (aij + bij)i,j. Scalar multiplication of
matrices is defined by λA = (λaij)i,j for λ ∈ R and A = (aij)i,j ∈ Mm,n(R).
Recall that Mm,n (R) is an abelian group by standard properties of matrix
addition. It is also easy to see that Mm,n (R) is a vector space: a) It is clear
that 1·A = A from the definition of scalar multiplication. b) If λ, µ ∈ R
and A = (aij)i,j, then
λ(µA) = λ(µaij)i,j = (λ(µaij))i,j = ((λµ)aij)i,j = (λµ)A
(the third equality follows from associativity of multiplication on R, the rest
from the definition of scalar multiplication). The proofs of axioms c) and d)
are similar, but using the distributive law on R.
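For instance, a quick concrete check of axiom b) with λ = 2, µ = 3:
\[
2\left(3\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\right)
= 2\begin{pmatrix} 3 & 6 \\ 0 & 3 \end{pmatrix}
= \begin{pmatrix} 6 & 12 \\ 0 & 6 \end{pmatrix}
= (2\cdot 3)\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}.
\]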
Example 1.6. Let F denote the set of functions f : R → R. Recall that
we can add two functions by adding their values; i.e., if f, g ∈ F, then f + g
is the function defined by (f + g)(x) = f(x) + g(x). Similarly if λ ∈ R and
f ∈ F, then we define the scalar multiple λf of f by multiplying its values
by λ; i.e., (λf)(x) = λ(f(x)). The proof that F is a real vector space is left
as an exercise. Some interesting subsets of F are also vector spaces, e.g.,
D = { f : R → R | f is differentiable };
P = { f : R → R | f is defined by a polynomial }.
We have the following basic properties, which we leave as an exercise to
deduce from the axioms:
Proposition 1.7. Suppose that V is a real vector space with zero element
0. Then
(1) 0·v = 0 for all v ∈ V;
(2) λ·0 = 0 for all λ ∈ R;
(3) (−1)·v = −v for all v ∈ V.
Definition 1.8. If S is a subset of a vector space V, then the span of S,
denoted by span S, is the set of all finite sums of the form λ1v1 + λ2v2 +
··· + λkvk where λi ∈ R (some may be zero) and vi ∈ S. We call a sum of
this form a linear combination of elements of S.
λ1s1 + λ2s2 + ··· + λmsm = 0 implies that λ1 = λ2 = ··· = λm = 0,
so sm+1 ∈ span S.
Example 1.21. The sets in Examples 1.9, 1.10 and 1.11 are all bases of the
vector spaces being considered; they are all therefore finite-dimensional.
Example 1.22. The set {1, x, x2, . . .} is a basis for P (which is not finite-dimensional).
Example 1.23. The set in Example 1.17 spans R2 but is not a basis of R2
since it is not linearly independent.
Theorem 1.24. If S = {v1 , v2 , . . . , vm } spans V then there is a basis of V
which is a subset of S.
Proof. (Sketch) If S is linearly independent, then it is a basis and we are
done, so suppose S is linearly dependent. This means that
λ1v1 + ··· + λmvm = 0
for some λ1, . . . , λm ∈ R with some λi ≠ 0. We can reorder the vi and λi
so assume that λm ≠ 0. As in the proof of Lemma 1.19, we get that
vm ∈ span S′ where S′ = {v1, . . . , vm−1}. By Corollary 1.14, we see that
span S′ = span(S′ ∪ {vm}) = span S = V, so S′ spans V.
If S′ is linearly independent, then we are done. Otherwise repeat the
process, which must eventually terminate since S is finite.
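For instance, in R² the set S = {(1, 0), (0, 1), (1, 1)} spans but is linearly dependent, since
\[
1\cdot(1,0) + 1\cdot(0,1) - 1\cdot(1,1) = (0,0).
\]
Discarding (1, 1), which lies in the span of the other two, leaves {(1, 0), (0, 1)}, which still spans R² and is linearly independent, hence a basis, illustrating the procedure in the proof.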
Corollary 1.25. If V is spanned by a finite set, then V is finite-dimensional.
Theorem 1.26. If V is finite-dimensional and S = {v1 , v2 , . . . , vk } is linearly independent, then there is a basis of V which contains S.
Proof. Since V is finite-dimensional, V is spanned by a finite set S′ =
{s1, . . . , sn} for some s1, . . . , sn ∈ V. The idea of the proof is to show that
we can choose elements from S′ to add to S to form a basis.
If S spans V, then S itself is a basis, so there is nothing to prove.
So suppose S does not span V. Then S′ ⊄ span S (since if S′ ⊆ span S,
then Lemma 1.13 would show that V = span S′ ⊆ span S, contradicting
the assumption that S does not span V). So we can choose some si1 ∈ S′
such that si1 ∉ span S. By Lemma 1.19, S1 = {v1, . . . , vk, si1} is linearly
independent.
If S1 = {v1, . . . , vk, si1} spans V, then S1 is a basis containing S, and we
are done. If S1 does not span V, then repeat the process using S1 instead of
S to get si2 ∈ S′ such that S2 = {v1, . . . , vk, si1, si2} is linearly independent.
(Note that si2 ≠ si1 since si2 ∉ span S1.) If S2 spans V we are done; if not
then repeat the process, which must yield a spanning set in at most n steps
since we keep choosing distinct si from S′, and S′ itself spans V.
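For instance, to extend S = {(1, 1, 0)} to a basis of R³, take S′ to be the standard basis. Since (1, 0, 0) ∉ span S, adjoin it; then (0, 1, 0) = (1, 1, 0) − (1, 0, 0) already lies in the span, so it is skipped; finally (0, 0, 1) is not in the span, so adjoin it, giving the basis {(1, 1, 0), (1, 0, 0), (0, 0, 1)}.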
Theorem 1.27. Suppose that S = {v1, . . . , vn} spans V and S′ = {w1, . . . , wm}
is a set of m linearly independent vectors in V . Then m n.
Proof. The idea of the proof is to show that we can replace each wi with
vi (reordering the vi if necessary), leading to a contradiction if m > n.
of coordinates of v with respect to the basis {v1 , v2 , . . . , vn }. (Coordinates
of vectors are normally regarded as column vectors but sometimes written
in rows, or using transpose notation, to save space.)
Note that if V = Rn and S = {s1, . . . , sn} is its standard basis (Example 1.9), then the n-tuple of coordinates of a vector is given by its usual
coordinates (since v = (λ1, . . . , λn)t = λ1s1 + ··· + λnsn). The coordinates
with respect to a different basis will be different.
Example 1.30. Let v1 = (2, 1)t and v2 = (3, 1)t in R². Then {v1, v2} is
linearly independent and spans, and therefore forms a basis. The standard
basis vector s1 = (1, 0)t has coordinates (1, 0)t with respect to the
standard basis, but has coordinates (−1, 1)t with respect to {v1, v2} (since
s1 = −v1 + v2). The vector v1 has coordinates (2, 1)t with respect to the
standard basis, and has coordinates (1, 0)t with respect to {v1, v2}.
Suppose that S = {v1, . . . , vn} and S′ = {v′1, . . . , v′n} are bases for V, and
that v ∈ V has coordinates (λ1, . . . , λn)t with respect to S and (λ′1, . . . , λ′n)t
with respect to S′ (where t denotes the transpose). We can relate the
coordinate vectors with respect to the different bases as follows: Note that
each vj ∈ S has an n-tuple of coordinates (t1j, . . . , tnj) with respect to S′.
Define the transition matrix from S to S′ as the n × n-matrix T = (tij)i,j.
Then
\[
T \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix}
= \begin{pmatrix} \lambda'_1 \\ \vdots \\ \lambda'_n \end{pmatrix},
\]
i.e., multiplication by T converts the coordinates with respect to S into the
coordinates with respect to S′. (Proof:
\[
v = \sum_{j=1}^{n} \lambda_j v_j
  = \sum_{j=1}^{n} \lambda_j \sum_{i=1}^{n} t_{ij} v'_i
  = \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} t_{ij}\lambda_j \Big) v'_i,
\]
so \(\lambda'_i = \sum_{j=1}^{n} t_{ij}\lambda_j\) for i = 1, . . . , n.)
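For example, with the bases of Example 1.30, the transition matrix from the standard basis S to S′ = {v1, v2} has as its columns the coordinates of s1 and s2 with respect to S′: s1 = −v1 + v2 and s2 = 3v1 − 2v2, so
\[
T = \begin{pmatrix} -1 & 3 \\ 1 & -2 \end{pmatrix},
\qquad
T\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix},
\]
consistent with v1 having coordinates (1, 0)t with respect to {v1, v2}.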
d/dx is in the span of S.
\[
\sum_{j=1}^{n} \lambda_j f(v_j) = \sum_{i,j} a_{ij}\lambda_j w_i,
\]
so the i-th coordinate of f(v) with respect to the chosen basis of W is \(\sum_{j=1}^{n} a_{ij}\lambda_j\).
Note that in the proposition, we are assuming the matrices are always
with respect to the chosen bases for U, V and W. Choosing different
bases will give different matrices:
Proposition 2.16. Suppose that S and S′ are bases for V, and T and T′
are bases for W. If a linear map f : V → W has matrix Af with respect to
S and T, then the matrix of f with respect to S′ and T′ is A′f = QAfP⁻¹,
where P is the transition matrix from S to S′ and Q is the transition matrix
from T to T′.
Proof. If v ∈ V has coordinates x = (λ1, . . . , λk)t with respect to S′, then
v has coordinates P⁻¹x with respect to S, so f(v) has coordinates AfP⁻¹x
with respect to T, and QAfP⁻¹x with respect to T′.
Example 2.17. Let A = \(\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}\) and define f : R² → R² by f(x) =
Ax. Then the matrix of f with respect to the standard basis S is just A.
The matrix of f with respect to the basis S′ = {(2, 1)t, (3, 1)t} is given by
A′ = P⁻¹AP, where P = \(\begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix}\) is the transition matrix from S′ to S,
so
\[
A' = \begin{pmatrix} -1 & 3 \\ 1 & -2 \end{pmatrix}
\begin{pmatrix} 1 & -1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 2 & 3 \\ 1 & 1 \end{pmatrix}
= \begin{pmatrix} 5 & 7 \\ -3 & -4 \end{pmatrix}.
\]
Recall that if f : V → W is a linear map, then ker(f) = { v ∈ V | f(v) =
0 } is a subspace of V of dimension nullity(f), and im(f) = { f(v) ∈ W | v ∈
V } is a subspace of W of dimension rank(f). If f : Rn → Rm is defined by
f(x) = Ax (see Example 2.9), then we also write nullity(A) for the nullity
of f (the dimension of the solution space of Ax = 0) and rank(A) for the
rank of f (the dimension of the span in Rm of the columns of A, so this
is the "column-rank" of A, but we'll see shortly this is the same as the
"row-rank").
Theorem 2.12 says that
dim(V ) = rank(f ) + nullity(f ).
(assuming V and W are finite-dimensional). We know that:
f is injective if and only if nullity(f ) = 0;
f is surjective if and only if rank(f ) = dim(W );
so f is an isomorphism if and only if nullity(f ) = 0 and rank(f ) =
dim(W ).
It follows easily that:
Proposition 2.18. If f : V W is an isomorphism then dim(V ) =
dim(W ). If dim(V ) = dim(W ), then the following are equivalent:
(1) f is injective;
(2) f is surjective;
(3) f is an isomorphism.
An n × n-matrix A is invertible if and only if there is an n × n-matrix B
such that AB = BA = In . Because of the correspondence between matrices
and linear maps, A is invertible if and only if the associated linear map
f : Rn → Rn (sending x to Ax) is an isomorphism. So in that context
Prop. 2.18 says that:
A is invertible ⟺ nullity(A) = 0 ⟺ rank(A) = n.
and rank(f ∘ g) = rank(g).
nullity(f ∘ g) = nullity(f) and rank(f ∘ g) = rank(f).
then A has the same rank and nullity as QAP . Note also that the rank and
nullity of f : V W are the same as those of its matrix Af (for any choice
of bases for V and W ). Lemma 2.10 shows that we can choose the bases so
that Af has a very simple form:
Lemma 2.20. Suppose that f : V → W is a linear map of finite-dimensional
vector spaces. Then there are bases {v1 , v2 , . . . , vn } of V and {w1 , w2 , . . . , wm }
of W such that the matrix of f with respect to the chosen bases has the partitioned matrix form
\[
A_f = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}
\]
where r = rank(f), Ir is the r × r identity matrix and 0 denotes zero matrices
of appropriate sizes. (If r = m then the bottom row is absent; if r = n then
the right hand column is absent; if r = 0 then Af is the zero matrix.)
Proof. Let S = {u1 , . . . , ur , v1 , . . . , vk } be the basis for V in Lemma 2.10.
So {w1 , . . . , wr } (where each wi = f (ui )) is a basis for im(f ), and {v1 , . . . , vk }
is a basis for ker(f ). Since {w1 , . . . , wr } is linearly independent, it can
be extended to a basis T = {w1 , . . . , wm } for W (by Thm. 1.26). Now
f (u1 ) has coordinates (1, 0, . . . , 0)t with respect to T , f (u2 ) has coordinates
(0, 1, 0, . . . , 0)t . . . and f (ur ) has coordinates (0, 0, . . . , 0, 1, 0, . . . , 0)t (with a 1
in the rth coordinate). Also, f (v1 ) = . . . = f (vk ) have coordinates (0, . . . , 0),
so the matrix of f is exactly what we want.
Corollary 2.21. If A ∈ Mm,n(R), then there are invertible matrices P ∈
Mn,n(R) and Q ∈ Mm,m(R) such that
\[
A = Q \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} P.
\]
Recall the definition and a few basic facts about transpose matrices: If
A = (aij)i,j is an m × n-matrix, then At denotes its n × m transpose matrix
whose (j, i)-entry is aij. I.e.,
\[
\text{if } A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots &        &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix},
\quad\text{then}\quad
A^t = \begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{m1} \\
\vdots &        &        & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{mn}
\end{pmatrix}.
\]
Proof. Write
\[
A = Q \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} P
\]
as in Corollary 2.21. Then
\[
A^t = P^t \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} Q^t.
\]
Since Pt and Qt are invertible, we get rank(At) = r = rank(A).
Example 3.2. Let A be the 4 × 5 coefficient matrix of a homogeneous system
Ax = 0 (its entries are omitted here). Denoting the i-th row by Ri, apply the
row operation exchanging R1 with R2, and then multiply (the new) R1 by −1.
Then (sometimes combining elementary row operations to save writing. . . ) apply
in turn
R3 → R3 − R1, R4 → R4 − 2R1;  R3 ↔ R4, R4 → (1/3)R4;
R3 → R3 − R2, R4 → R4 − 3R2;  R2 → R2 + R3;  R1 → R1 + R2.
The resulting matrix is in reduced row echelon form, and the corresponding
equations express the pivot variables in terms of the free variables x3 and x5.
Setting (x3, x5) = (1, 0) and (x3, x5) = (0, 1) in turn gives two solution vectors.
The solution space therefore has these two vectors as a basis. (Note that
this procedure computes the rank and nullity as well. In this case the rank
is 3 and nullity is 2.)
To see that row operations yield an equivalent system of equations, we can
make the following observation: the effect of applying a row operation to a
matrix A is the same as that of multiplying A on the left by the corresponding
elementary matrix:
(I) EI(p, q) (interchanging row p and row q): all the diagonal entries of
EI(p, q) except the pth and the qth are 1, the (p, q)th and the (q, p)th
entries are 1 and all the remaining entries are 0.
(II) EII(p, λ) (multiplying row p by a non-zero scalar λ): all the diagonal
entries of EII(p, λ) except the pth are 1, the (p, p)th entry is λ and
all the remaining entries are 0.
(III) EIII(p, q, λ) (adding λ times row p to row q): all the diagonal entries
of EIII(p, q, λ) are 1, the (q, p)th entry is λ and all the remaining
entries are 0.
Note that each of these matrices is invertible: The inverses of EI(p, q),
EII(p, λ) and EIII(p, q, λ) are EI(p, q), EII(p, λ⁻¹) and EIII(p, q, −λ) respectively.
Therefore the effect of applying a sequence of elementary row operations
to A is to replace A by Ek Ek−1 ··· E2 E1 A where E1, . . . , Ek are the corresponding elementary matrices. Thus we are replacing A by QA where
Q = Ek Ek−1 ··· E2 E1 is invertible. Note that if Ax = 0 then QAx = 0, and
if QAx = 0 then Ax = Q⁻¹QAx = 0. So the two systems of equations have
the same solutions.
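For instance, for 3 × 3 matrices these elementary matrices are:
\[
E_I(1,2) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\quad
E_{II}(2,\lambda) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & 1 \end{pmatrix},
\quad
E_{III}(1,3,\lambda) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \lambda & 0 & 1 \end{pmatrix},
\]
and multiplying a matrix A with 3 rows on the left by these performs the corresponding row operations: EI(1,2)A exchanges rows 1 and 2, EII(2,λ)A multiplies row 2 by λ, and EIII(1,3,λ)A adds λ times row 1 to row 3.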
Similarly column operations correspond to multiplying A on the right by
elementary matrices:
(I) EI(p, q) interchanges column p and column q;
(II) EII(p, λ) multiplies column p by λ;
(III) EIII(p, q, λ) adds λ times column q to column p.
Recall that the image of the map f : Rn → Rm defined by f(x) = Ax is
given by the span of the columns of A, and this is the set of b ∈ Rm such that
Ax = b has solutions. Applying column operations amounts to multiplying
A on the right by an invertible matrix P, and Ax = b has solutions if and
only if APy = b has solutions (set x = Py and y = P⁻¹x). This can also
be thought of as saying that column operations don't change the span of the
columns. So for example, you could use column operations to find a basis
for the image of a linear map.
Example 3.3. To find a basis for the image of the map f : R³ → R⁴ defined
by the matrix
\[
\begin{pmatrix}
1 & 2 & -1 \\
1 & -1 & 2 \\
2 & 1 & 1 \\
1 & 0 & 1
\end{pmatrix},
\]
applying column operations gives
\[
\begin{pmatrix}
1 & 0 & 0 \\
1 & 3 & 0 \\
2 & 3 & 0 \\
1 & 2 & 0
\end{pmatrix},
\]
so the rank of f is two, and a basis for the image is {(1, 1, 2, 1)t , (0, 3, 3, 2)t }.
Note that row operations could be used to compute the rank of A, but that
would change the span of the columns of A (while preserving the span of its
rows).
Note that Ax = b has solutions (i.e., b is in the span of the columns of
A) if and only if the rank of the augmented matrix (A|b) is the same as the
rank of A. One can use row operations on this augmented matrix to find
the set of solutions to the system. We use the augmented matrix (A|b)
to represent the system, and apply row operations to obtain an augmented
matrix (A′|b′) with A′ in row echelon form. Note that the new matrix
represents an equivalent system since applying the same series of row operations to both A and b amounts to multiplying them both on the left by
the same invertible matrix, say Q, and the equation Ax = b is equivalent to
the equation QAx = Qb. In particular if A′ is in row echelon form, one can
immediately tell whether A′x = b′ = (b′1, . . . , b′m) has a solution, according
to whether b′i = 0 for all i > r, where r is the rank of A′. Furthermore if
there are any solutions, they can easily be read off from (A′|b′).
Example 3.4. Consider the system Ax = (1, 2, 1, 1)t where A is as in
Example 3.2. The same sequence of row operations applied now to the
augmented matrix (A | b) puts it into reduced row echelon form (the row
reduction proceeds exactly as in Example 3.2), from which the pivot variables
can again be read off in terms of the free variables x3 and x5. The general
solution then has the form
\[
x = x_0 + \alpha v_1 + \beta v_2, \qquad \alpha, \beta \in \mathbb{R},
\]
where x0 is a particular solution and {v1, v2} is the basis of the solution
space of Ax = 0 found in Example 3.2.
Example 3.6. To find the inverse of the matrix
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix},
\]
apply row operations to the augmented matrix (A | I3), reducing the left-hand
block to the identity (the intermediate steps are omitted); the right-hand block
is then the inverse, giving
\[
A^{-1} = \begin{pmatrix}
-\tfrac{1}{2} & \tfrac{3}{2} & -\tfrac{1}{2} \\
0 & 0 & 1 \\
\tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2}
\end{pmatrix}.
\]
For another algorithmic application of row and column operations, suppose we are given a linear map f : V → W of finite-dimensional vector
spaces, and we want to find bases for V and W with respect to which the
matrix of f has the form of Lemma 2.20
(the transition matrices to the desired bases being P⁻¹ and Q). To achieve
this, first apply row operations to the augmented matrix (A|Im) to obtain a
matrix (A′|Q) where A′ = QA is in row echelon form. Next consider the vertically augmented matrix
\[
\begin{pmatrix} A' \\ \hline I_n \end{pmatrix}.
\]
Applying column operations, it is easy to
see we can get the top matrix into the desired form A″ = \(\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}\). Since
applying column operations amounts to multiplying the top and bottom
parts of the augmented matrix by the same invertible matrix P, we get that
the resulting augmented matrix is \(\begin{pmatrix} A'' \\ \hline P \end{pmatrix}\) where A″ = A′P = QAP. Note
that there are (infinitely) many possible P and Q such that QAP has the
desired form. For example, if A = 0 then any invertible P and Q work; if A
is square and invertible, then given a P and Q which work (so QAP = In),
we can choose any invertible R and replace Q by RQ and P by PR⁻¹.
Example 3.7. Suppose A = \(\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix}\). Subtracting 2R1 from R2 gives
\[
\left(\begin{array}{ccc|cc}
1 & 0 & 1 & 1 & 0 \\
2 & 1 & 0 & 0 & 1
\end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|cc}
1 & 0 & 1 & 1 & 0 \\
0 & 1 & -2 & -2 & 1
\end{array}\right),
\]
so we can let A′ = \(\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & -2 \end{pmatrix}\) and Q = \(\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}\). Then applying
column operations gives
\[
\begin{pmatrix}
1 & 0 & 1 \\
0 & 1 & -2 \\
\hline
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\longrightarrow
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & -2 \\
\hline
1 & 0 & -1 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\longrightarrow
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
\hline
1 & 0 & -1 \\
0 & 1 & 2 \\
0 & 0 & 1
\end{pmatrix},
\]
so we have P = \(\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}\).
4. Determinants
We recall the definition of the determinant and review its main properties.
If A is a square matrix, then the determinant of A is a certain scalar associated to A, denoted det(A) (or simply |A|). We can define the determinant
inductively by first defining the determinant of any 1 × 1-matrix. Then we
define the determinant of an n × n-matrix A = (aij)i,j by
\[
\det(A) = a_{11}M_{11} - a_{12}M_{12} + \cdots + (-1)^{n+1}a_{1n}M_{1n}
        = \sum_{j=1}^{n} (-1)^{j+1} a_{1j} M_{1j},
\]
where M1j denotes the (1, j)-minor of A. We also write |A| for det(A).
For example if n = 2, then M11 = det(a22) = a22 and M12 = det(a21) = a21,
so we get the usual formula
\[
\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11}a_{22} - a_{12}a_{21}.
\]
For n = 3, we have (using the | · | notation)
\[
M_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{23}a_{32},
\quad
M_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = a_{21}a_{33} - a_{23}a_{31},
\quad
M_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = a_{21}a_{32} - a_{22}a_{31},
\]
so
\[
\det(A) = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.
\]
Note that this coincides with the description of the determinant of a 3 × 3-matrix given by repeating the first two columns of A after the matrix:
forming the products along each diagonal and adding them with appropriate signs (+ for the three ↘-sloping diagonals, − for the three ↗-sloping
diagonals).
Example 4.2. The determinant of A = \(\begin{pmatrix} 2 & 1 & 1 \\ 3 & 4 & 1 \\ 1 & -6 & 2 \end{pmatrix}\) is
16 + 1 − 18 − 4 + 12 − 6 = 1.
(by a series of row operations of type III, the last one being R2 → R2 + R3)
the matrix is brought to upper-triangular form without changing the determinant, and the determinant is then the product of the diagonal entries,
giving the value 13 (the intermediate determinants are omitted).
Proof. Multiplying the zero row by any λ doesn't change A, so part (2) of
Prop. 4.3 gives λ det A = det A for all λ ∈ R, so det A = 0.
The following is not at all easily seen from the definition, but follows fairly
quickly from Lemma 4.3.
(III) Suppose that
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots &        &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix},
\quad
A' = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots &        &        & \vdots \\
a'_{i1} & a'_{i2} & \cdots & a'_{in} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix},
\quad
A'' = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots &        &        & \vdots \\
a''_{i1} & a''_{i2} & \cdots & a''_{in} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]
are identical except in the i-th row, and that the i-th row of A″ is the sum
of the i-th rows of A and A′ (i.e., a″ij = aij + a′ij for j = 1, . . . , n).
Then det(A″) = det(A) + det(A′).
Note that in (III), A00 is not the sum of A and A0 ; it is only the ith row
that is being described as a sum. Parts (II) and (III) taken together can
be thought of as saying that det is linear on each row.
Before proving Prop. 4.11, we show how Prop. 4.3 follows from it. Clearly
(II) implies (2).
Next we prove (3). Let A′ be the matrix identical to A, except that its
q-th row is replaced by λ times the p-th row of A. Then by (II) and (I), we
have
\[
|A'| =
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots &        &        & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pn} \\
\vdots &        &        & \vdots \\
\lambda a_{p1} & \lambda a_{p2} & \cdots & \lambda a_{pn} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
= \lambda
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots &        &        & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pn} \\
\vdots &        &        & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pn} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
= 0.
\]
Applying (III), we see therefore that det(A″) = det(A) where A″ is gotten
by applying the row operation of type (III) to A.
\[
|A| =
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots &        &        & \vdots \\
a_{p1}+a_{q1} & a_{p2}+a_{q2} & \cdots & a_{pn}+a_{qn} \\
\vdots &        &        & \vdots \\
a_{q1} & a_{q2} & \cdots & a_{qn} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix},
\qquad
|A'| =
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots &        &        & \vdots \\
a_{p1}+a_{q1} & a_{p2}+a_{q2} & \cdots & a_{pn}+a_{qn} \\
\vdots &        &        & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pn} \\
\vdots &        &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}.
\]
These two matrices are identical except in the q-th row, and the sum of
their q-th rows is the same as the p-th row, so by (III) and (I), we have
det(A) + det(A′) = 0. Therefore det(A′) = −det(A).
Proof. (of Prop. 4.11) We proceed by induction on n.
For n = 1, note that there is nothing to prove for (I), that (II) just says
that det(λa) = λa = λ det(a), and (III) says that det(a + a′) = a + a′ =
det(a) + det(a′).
Suppose now that n > 1, and that Prop. 4.11 (and therefore also Prop. 4.3)
hold with n replaced by n − 1.
First we prove (II). Let A′ be the matrix gotten by multiplying the i-th
row of A by λ. If i = 1, then the definition of the determinant gives
\[
\det(A') = (\lambda a_{11})M_{11} - (\lambda a_{12})M_{12} + \cdots + (-1)^{n+1}(\lambda a_{1n})M_{1n} = \lambda\det(A).
\]
If 2 ≤ i ≤ n, then we get
\[
\det(A') = a_{11}M'_{11} - a_{12}M'_{12} + \cdots + (-1)^{n+1}a_{1n}M'_{1n},
\]
\[
\det(A'') = a_{11}M''_{11} - a_{12}M''_{12} + \cdots + (-1)^{n+1}a_{1n}M''_{1n},
\]
where each M′1j is the determinant of the same matrix as in the definition
of M1j, but with two rows interchanged. By Prop. 4.3 (1) for n − 1, we
therefore have M′1j = −M1j, and it follows that det(A) = −det(A′). But
we have already shown that det(A′) = 0, so det(A) = 0 as well.
Finally suppose that row p and row q are identical, and p and q are
both greater than 1. Then the minors M1j are determinants of (n − 1) ×
(n − 1)-matrices, each of which has two identical rows. So by the induction
hypothesis, each M1j = 0, and therefore det(A) = 0.
This finishes the proof of Prop. 4.11, and thus the proof of all the stated
properties of the determinant.
Finally, we remark that there is an alternative definition of the determinant using permutations. This is also provided for your interest and will not
be covered on the examination.
The second definition is given purely for your general interest and will
not be covered in the course. It is somewhat more sophisticated but, once
mastered, the proofs of the properties are more transparent. The definition requires some knowledge about permutations. Recall that a bijective
mapping of {1, . . . , n} onto itself is called a permutation. The set of all
such permutations with composition as the operation of product forms a
group denoted by Sn (i.e. the product σσ′ is the composite σ ∘ σ′). Here are
the main properties of permutations.
Every permutation σ ∈ Sn can be written as a product σ = τ1τ2 · · · τs
where each τi is a transposition (i.e. a permutation that exchanges
two elements of {1, 2, . . . , n} and leaves the rest fixed).
Although there are many such expressions for σ and the number s
of transpositions varies, nevertheless, for given σ, s is either always
even (in which case we say that σ is an even permutation) or always
odd (when we say σ is odd). (See the exercises for a proof of this
where the sum is taken over the group Sn of all permutations of the integers
{1, 2, 3, . . . , n}.
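For reference, the sum in question is the standard expansion over permutations:
\[
\sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)},
\]
where sgn(σ) is +1 if σ is even and −1 if σ is odd. For n = 2, the two permutations are the identity and the transposition exchanging 1 and 2, and the sum is a11a22 − a12a21, agreeing with the formula obtained earlier from the inductive definition.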
Here is a sketch of a proof that the two definitions agree: Let us provisionally call det′(A) the determinant of A as defined using permutations, and
continue to use det(A) for the determinant as we originally defined it inductively. We first show that det′(A) has some of the same properties as det(A),
and then explain how to deduce that det(A) = det′(A). It is straightforward
to check that det′(A) satisfies (II) and (III) in Prop. 4.11. In fact (I) is
not so difficult either: let τ denote the transposition interchanging p and q.
If apj = aqj for all j, then one sees that for any permutation σ, the terms in
the permutation definition indexed by σ and στ are identical, except that
they have opposite signs and therefore cancel.
Having proved det′(A) satisfies these properties, one finds that det′(A) has
all the same properties we proved about det. In particular, if E is elementary,
we find that det(E) = det′(E), and therefore det′(EA) = det(E) det′(A). If
A is invertible, then we can write it as a product of elementary matrices to
deduce in this case that det(A) = det′(A). If A is not invertible, then there
is an invertible matrix P so that PA has a zero row. It is easy to see that
in this case det′(PA) = 0, so that det′(A) = 0, and so det′(A) = det(A) in
all cases.
5. Similarity and diagonalization
We now consider linear maps f : V → V from a finite-dimensional vector
space V to itself, also called linear operators on V. We consider matrices
of such maps with respect to the same basis of V as both the domain and
\[
A_f = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}
\]
Example 5.6. The matrix \(\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\) is diagonalizable, and {(1, 1)t, (1, −1)t}
is a basis of eigenvectors. The matrix \(\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\) is not diagonalizable. The
matrix \(\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\) is not diagonalizable over R, but it is diagonalizable
over C, a basis of eigenvectors being given by (1, −i)t (with eigenvalue i) and
(1, i)t (with eigenvalue −i). For the reason illustrated by the last example,
it is often more convenient to work over C than over R. We'll return to this
point later.
λ1α1v1 + ··· + λk−1αk−1vk−1 + λkαkvk = 0.
On the other hand, multiplying \(\sum_{i=1}^{k} \alpha_i v_i = 0\) by λk gives
λkα1v1 + ··· + λkαk−1vk−1 + λkαkvk = 0.
Av = λv for some v ≠ 0
⟺ (λI − A)v = 0 for some v ≠ 0
⟺ nullity(λI − A) > 0
⟺ rank(λI − A) < n
⟺ det(λI − A) = 0
⟺ pA(λ) = 0.
Example 5.11. The characteristic polynomial of \(\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\) is
\[
\begin{vmatrix} x-1 & -1 \\ 0 & x-1 \end{vmatrix} = (x-1)^2.
\]
The characteristic polynomial of \(\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\) is
\[
\begin{vmatrix} x & -1 \\ -1 & x \end{vmatrix} = x^2 - 1 = (x+1)(x-1).
\]
The characteristic polynomial of \(\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\) is
\[
\begin{vmatrix} x & 1 \\ -1 & x \end{vmatrix} = x^2 + 1 = (x-i)(x+i),
\]
so its eigenvalues are the complex numbers ±i (and it has no real eigenvalues).
Everything we've said so far works in exactly the same way for complex
vector spaces and matrices as for real vector spaces and matrices. However
the last example shows that some real matrices are only diagonalizable when
for some λ1, λ2, . . . , λn ∈ C.
\[
D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}.
\]
By Prop. 5.15, we just have to show that mλ = nλ for the matrix D. Since
pD(x) = (x − λ1)(x − λ2)···(x − λn), we see that mλ is the number of times
that λ = λi. On the other hand nλ is the nullity of the matrix
\[
\lambda I - D = \begin{pmatrix}
\lambda - \lambda_1 & 0 & \cdots & 0 \\
0 & \lambda - \lambda_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \lambda - \lambda_n
\end{pmatrix},
\]
is a basis for Cn.
We first prove that S is linearly independent. Suppose that α11, . . . , α1n1,
. . . , αk1, . . . , αknk ∈ C are such that
α11v11 + ··· + α1n1v1n1 + α21v21 + ··· + α2n2v2n2 + ··· + αk1vk1 + ··· + αknkvknk = 0.
Let w1 = α11v11 + ··· + α1n1v1n1, . . . , wk = αk1vk1 + ··· + αknkvknk.
Each wi that is non-zero is an eigenvector for A with eigenvalue λi. Since
the λi are distinct and w1 + ··· + wk = 0, we see by Lemma 5.7 that
w1 = w2 = ··· = wk = 0. Since each Si is linearly independent it follows
that αij = 0 for all j = 1, . . . , ni.
is upper-triangular.
\[
T = \begin{pmatrix}
t_{11} & t_{12} & \cdots & t_{1n} \\
0 & t_{22} & \cdots & t_{2n} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & t_{nn}
\end{pmatrix}
\]
\[
\begin{aligned}
p(RTR^{-1}) &= (RTR^{-1})^n + a_{n-1}(RTR^{-1})^{n-1} + \cdots + a_1 RTR^{-1} + a_0 I \\
&= RT^nR^{-1} + a_{n-1}RT^{n-1}R^{-1} + \cdots + a_1 RTR^{-1} + a_0 RIR^{-1} \\
&= R(T^n + a_{n-1}T^{n-1} + \cdots + a_1 T + a_0 I)R^{-1} \\
&= R\,p(T)R^{-1},
\end{aligned}
\]
Write the Ti as partitioned matrices
\[
T_i = \begin{pmatrix} T'_i & v_i \\ 0 & t_i \end{pmatrix} \text{ for } i = 1, \ldots, n-1,
\quad\text{and}\quad
T_n = \begin{pmatrix} T'_n & v_n \\ 0 & 0 \end{pmatrix},
\]
where each T′i is an (n−1) × (n−1) upper-triangular matrix with 0 in the i-th
diagonal place. Therefore, by the induction hypothesis, T′1T′2 ··· T′n−1 = 0,
and so
\[
T_1 T_2 \cdots T_{n-1}
= \begin{pmatrix} T'_1 T'_2 \cdots T'_{n-1} & v \\ 0 & t \end{pmatrix}
= \begin{pmatrix} 0 & v \\ 0 & t \end{pmatrix}
\]
where v and t are some vector and scalar (whose values are not important).
Then
\[
T_1 T_2 T_3 \cdots T_n
= \begin{pmatrix} 0 & v \\ 0 & t \end{pmatrix}
\begin{pmatrix} T'_n & v_n \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = 0
\]
as required.
For example, the matrix A = \(\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\) has characteristic polynomial
pA(x) = x² + 1 and satisfies pA(A) = 0 since A² + I = −I + I = 0.
\[
T = \begin{pmatrix}
T_1 & 0 & \cdots & 0 \\
0 & T_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & T_k
\end{pmatrix},
\]
where each Ti is a square matrix of the form
\[
T_i = \begin{pmatrix}
\lambda_i & 1 & 0 & \cdots & 0 \\
0 & \lambda_i & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_i & 1 \\
0 & 0 & \cdots & 0 & \lambda_i
\end{pmatrix}.
\]
The form of the matrix in the theorem is called Jordan canonical form.
We remark that there can be repetition among the eigenvalues λ1, . . . , λk
appearing in the expression, and the sizes of the T1, T2, . . . , Tk may be different from each other; also some of the Ti may be 1 × 1, in which case we
have Ti = (λi). We will sketch the proof, but first we illustrate the meaning
of the theorem by listing the possible forms for T in the case n = 3.
If pA(x) = (x − λ1)(x − λ2)(x − λ3) with λ1, λ2, λ3 distinct, then
\[
T = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}.
\]
So in this case T1 = (λ1), T2 = (λ2), T3 = (λ3).
If pA(x) = (x − λ1)(x − λ2)² with λ1 ≠ λ2, then
\[
T = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_2 \end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 1 \\ 0 & 0 & \lambda_2 \end{pmatrix}.
\]
In the first case k = 3, T1 = (λ1) and T2 = T3 = (λ2); in the second
case k = 2, T1 = (λ1) and T2 = \(\begin{pmatrix} \lambda_2 & 1 \\ 0 & \lambda_2 \end{pmatrix}\).
If pA(x) = (x − λ)³, then
\[
T = \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix},
\quad
\begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix},
\quad\text{or}\quad
\begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}.
\]
We now sketch the proof of the theorem. By Theorem 5.18, we can assume
A is upper-triangular. We will first reduce to the case where A has only
one eigenvalue. Write pA(x) = (x − λ1)^{n1} ··· (x − λr)^{nr} where λ1, . . . , λr
are distinct, and consider the matrices Ai = (A − λiI)^{ni} for i = 1, . . . , r.
Then considering the diagonal entries, we see that rank Ai ≤ n − ni, so
nullity(Ai) ≥ ni. On the other hand, by the Cayley-Hamilton Theorem
A1A2 ··· Ar = 0, so an argument like the one in the proof of Thm. 5.25 shows
that in fact nullity(Ai) = ni. Let Vi denote the null space of Ai and suppose
that vi ∈ Vi for i = 1, . . . , r are such that v1 + v2 + ··· + vr = 0. Suppose
that vi ≠ 0 for some i. If j ≠ i, then vi cannot be an eigenvector with
eigenvalue λj (else we would have Aivi = (A − λiI)^{ni}vi = (λj − λi)^{ni}vi ≠ 0).
So (A − λjI)vi ≠ 0, but (A − λjI) commutes with Ai, so (A − λjI)vi is a
non-zero vector in Vi. Inductively, we find that Ajvi is a non-zero vector
in Vi, and in fact, letting A′ = A1A2 ··· Ai−1Ai+1 ··· Ar, we get that A′vi
is non-zero. On the other hand Ajvj = 0, so that A′vj = 0 for j ≠ i, so
applying A′ to v1 + v2 + ··· + vr = 0 gives a contradiction. It follows then
as in the proof of Thm. 5.16 that if Si is a basis for Vi for i = 1, . . . , r, then
S = S1 ∪ ··· ∪ Sr is a basis for Cn. Since vi ∈ Vi implies Avi ∈ Vi, it follows
that the matrix for A with respect to such a basis will be block diagonal
with the i-th diagonal block Bi being a matrix for the linear map defined by
A on Vi with respect to Si. Moreover Bi satisfies (Bi − λiI)^{ni} = 0, so λi is
the only eigenvalue of Bi.
It suffices to prove that each Bi has the required form, so we are reduced
to the case where A has only one eigenvalue, so we will just write λ instead
of λi. Replacing A by A − λI, we can even assume λ = 0 from now on. (If
A − λI is similar to T, then A is similar to T + λI.) Suppose then that
mA(x) = x^d, so A^d = 0, but A^{d−1} ≠ 0. Let v1^(1), v2^(1), . . . , vs1^(1) be a basis for
the image of A^{d−1} (viewed as the linear map v ↦ Av), and for i = 1, . . . , s1,
choose wi^(1) so that A^{d−1}wi^(1) = vi^(1). Consider the linear map f1 from the
image of A^{d−2} to the image of A^{d−1} defined by multiplication by A. Then
v1^(1), v2^(1), . . . , vs1^(1) are linearly independent vectors in the kernel, and so can
be extended to a basis of the kernel of f1:
v1^(1), v2^(1), . . . , vs1^(1), v1^(2), v2^(2), . . . , vs2^(2).
By Lemma 2.10, these s1 + s2 vectors, together with the vectors A^{d−2}wi^(1)
for i = 1, . . . , s1, form a basis for the image of A^{d−2}. Now choose wi^(2) so
A^{d−2}wi^(2) = vi^(2) for i = 1, . . . , s2. Iterating the process for j = 3, . . . , d for
the map fj−1(v) = Av from the image of A^{d−j} to the image of A^{d−j+1},
we get vectors
w1^(1), w2^(1), . . . , ws1^(1), w1^(2), w2^(2), . . . , ws2^(2), . . . , w1^(j), w2^(j), . . . , wsj^(j)
with the following properties:
\[
\langle u, v \rangle = \sum_{i=1}^{n} u_i \overline{v_i} = u_1\overline{v_1} + u_2\overline{v_2} + \cdots + u_n\overline{v_n}.
\]
Note that ⟨u, v⟩ = uᵗv̄ = u · v̄ where v̄ = (v̄1, . . . , v̄n) is the complex conjugate of v (taken coordinatewise) and · denotes the dot (or scalar) product
of the two vectors. Thus ⟨u, v⟩ is a scalar in C.
Example 6.2. If u = (3, 1 + i, −1) and v = (1 − 2i, −i, 2), then
⟨u, v⟩ = 3(1 + 2i) + (1 + i)i + (−1)2 = 7i.
Note that ||v|| is a non-negative real number. For the vectors from
Example 6.2, with u = (3, 1 + i, −1) and v = (1 − 2i, −i, 2), we have
||u|| = √12 = 2√3 and ||v|| = √10.
More generally, suppose V is any finite-dimensional complex vector space
equipped with a map V × V → C denoted ⟨·, ·⟩; i.e., a rule associating a
complex number ⟨u, v⟩ to any elements u, v ∈ V. Then V is an inner
product space if it satisfies (1), (2) and (3) of the preceding proposition.
In particular, there is the notion of the norm of a vector in V defined by
the formula ||v|| = √⟨v, v⟩ (a non-negative real number by property (3));
there is also a notion of orthogonality as for Cn. Note that Cn, with the
inner product defined above, is an inner product space. There is a similar
notion of a real inner product space, which is a real vector space V equipped
with an inner product ⟨v, w⟩ satisfying analogous axioms. Everything we do
for (complex) inner product spaces carries over without change, except that
complex conjugation plays no role.
Proposition 6.5. If V is an inner product space, then for u, v, w ∈ V and
λ ∈ C, we have
(1) ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩;
(2) ||λv|| = |λ| ||v||;
(3) ⟨0, v⟩ = ⟨v, 0⟩ = 0;
(4) [Pythagorean Theorem] if ⟨u, v⟩ = 0, then ||u + v||² = ||u||² + ||v||².
The proofs of these are also straightforward and left as exercises. Note
that the proposition applies to V = Cn with its usual inner product.
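For instance, a quick check of (4) in C²: if u = (1, 0) and v = (0, 1 + i), then ⟨u, v⟩ = 0, and indeed
\[
||u + v||^2 = |1|^2 + |1 + i|^2 = 1 + 2 = ||u||^2 + ||v||^2.
\]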
u − (⟨u, v⟩/||v||²)v is orthogonal to v′ = (⟨u, v⟩/||v||²)v, so the Pythagorean
Theorem implies that
\[
||u||^2 \;\ge\; ||v'||^2 \;=\; \frac{|\langle u, v\rangle|^2}{||v||^4}\,\langle v, v\rangle \;=\; \frac{|\langle u, v\rangle|^2}{||v||^2}.
\]
For (2), we compute
\[
\begin{aligned}
||u + v||^2 &= \langle u + v, u + v\rangle \\
&= \langle u, u\rangle + \langle u, v\rangle + \langle v, u\rangle + \langle v, v\rangle \\
&= ||u||^2 + \langle u, v\rangle + \overline{\langle u, v\rangle} + ||v||^2 \\
&= ||u||^2 + 2\,\mathrm{Re}(\langle u, v\rangle) + ||v||^2,
\end{aligned}
\]
where Re(z) = ½(z + z̄) is the real part of z. Since Re(z) ≤ |z| for any z ∈ C,
this gives (using part (1))
\[
||u + v||^2 \le ||u||^2 + 2|\langle u, v\rangle| + ||v||^2
\le ||u||^2 + 2\,||u||\,||v|| + ||v||^2 = (||u|| + ||v||)^2,
\]
and (2) follows on taking square roots.
(Example: an orthonormal subset {u1, u2, u3} of C⁴ with u1 = (1/√2)(1, 0, 1, 0)t; the coordinates of a vector v with respect to it are λ1 = ⟨v, u1⟩ = (1 + 2i)/2, λ2 = ⟨v, u2⟩ = (1 + i)/√2 and λ3 = ⟨v, u3⟩ = i/2.)
(Example of a unitary matrix; the displayed entries are omitted.)
Definition 6.20. If A and B are complex square matrices, then we say
that A is unitarily similar to B if A = U⁻¹BU = U*BU for some unitary
matrix U, and A is unitarily diagonalizable if it is unitarily similar to a
diagonal matrix. If A and B are real square matrices, then we say that A is
orthogonally similar to B if A = P⁻¹BP = PᵗBP for some orthogonal
matrix P, and A is orthogonally diagonalizable if it is orthogonally
similar to a diagonal matrix.
It is easy to see that unitary (and orthogonal) similarity are equivalence
relations using the fact that if U and U′ are unitary, then so are U⁻¹ and
UU′ (see the exercises).
Theorem 6.21. A complex (or real) matrix A is unitarily (or orthogonally) diagonalizable if and only if there is an orthonormal basis consisting
of eigenvectors for A.
Proof. This is clear from the definitions and Corollary 6.18 since U*AU is
diagonal if and only if the columns of U are eigenvectors for A.
Example 6.22. The matrix A = \(\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}\) is unitarily diagonalizable since
(1/√2)(1, 1)t and (1/√2)(1, −1)t form an orthonormal basis of eigenvectors with
eigenvalues i and −i. More explicitly, we have U*AU = D where
\[
U = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\quad\text{and}\quad
D = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}.
\]
Definition 6.23. A square matrix A is
(1) symmetric if A = Aᵗ;
(2) self-adjoint (or Hermitian) if A* = A;
(3) normal if AA* = A*A.
Example 6.24. If A is self-adjoint, then A is normal. The matrix A =
\(\begin{pmatrix} i & 0 \\ 0 & i \end{pmatrix}\) is not self-adjoint, but it is normal since A* = −A, so A*A =
AA* = −A².
Proposition 6.25.
(1) If a real matrix is orthogonally diagonalizable, then it is symmetric.
(2) If a complex matrix is unitarily diagonalizable, then it is normal.
Proof. (1) If A is orthogonally diagonalizable, then A = PDPᵗ for some orthogonal P and diagonal D. Then Aᵗ = (PDPᵗ)ᵗ = (Pᵗ)ᵗDᵗPᵗ = PDPᵗ =
A, so A is symmetric.
(2) If A is unitarily diagonalizable, then A = UDU* for some unitary U
and diagonal D. Then A* = (UDU*)* = (U*)*D*U* = UD*U*, so
AA* = (UDU*)(UD*U*) = U(DD*)U*
and similarly A*A = U(D*D)U*. Since D is diagonal, DD* =
D*D, so A is normal.
We will later prove the converse to both parts of the proposition. First
we note the behavior of adjoint and unitary matrices with respect to the
inner product.
Proposition 6.26. Suppose that A is a complex n × n-matrix.
(1) If ⟨Av, w⟩ = 0 for all v, w ∈ Cn, then A = 0.
(2) If v, w ∈ Cn, then ⟨Av, w⟩ = ⟨v, A*w⟩.
(3) If U is a unitary n × n matrix and v, w ∈ Cn, then
⟨Uv, Uw⟩ = ⟨v, w⟩ and ||Uv|| = ||v||.
Proof. (1) For the standard basis vectors ei and ej we have
\[
\langle Ae_j, e_i\rangle = \sum_{k=1}^{n} a_{kj}\langle e_k, e_i\rangle.
\]
The only term that survives on the right is for k = i, giving ⟨Aej, ei⟩ = aij.
Applying ⟨Av, w⟩ = 0 with v = ej, w = ei therefore gives aij = 0 for all
i, j, so A = 0.
(2) Since A* = Āᵗ, we have
\[
\langle v, A^*w\rangle = v^t\overline{A^*w} = v^t\overline{A^*}\,\overline{w} = v^tA^t\overline{w} = (Av)^t\overline{w} = \langle Av, w\rangle.
\]
(3) We have ⟨Uv, Uw⟩ = (Uv)ᵗ(Uw)‾ = vᵗUᵗŪw̄ = vᵗw̄ = ⟨v, w⟩,
since U*U = I. The assertion about ||Uv|| follows from the case v = w and
taking square roots.
Theorem 6.27. If A is self-adjoint, then
(1) the eigenvalues of A are real;
(2) eigenvectors with distinct eigenvalues are orthogonal.
Proof. (1) Suppose λ ∈ C is an eigenvalue of A, so Av = λv for some
v ≠ 0. Then ⟨Av, v⟩ = ⟨λv, v⟩ = λ⟨v, v⟩. Since A = A*, part (2) of
Prop. 6.26 implies that
⟨Av, v⟩ = ⟨v, A*v⟩ = ⟨v, Av⟩ = ⟨v, λv⟩ = λ̄⟨v, v⟩.
Since λ⟨v, v⟩ = λ̄⟨v, v⟩ and ⟨v, v⟩ ≠ 0, it follows that λ = λ̄, so λ ∈ R.
(2) Suppose that Av = λv and Aw = µw for some non-zero v, w ∈ Cn,
and some λ ≠ µ ∈ C. Then ⟨Av, w⟩ = ⟨λv, w⟩ = λ⟨v, w⟩, and
⟨Av, w⟩ = ⟨v, Aw⟩ = ⟨v, µw⟩ = µ̄⟨v, w⟩ = µ⟨v, w⟩
The theorem has the following corollary (which we will improve upon
later, removing the assumption that A has n eigenvalues).
Corollary 6.28.
(1) If A is self-adjoint with n distinct eigenvalues, then A is unitarily
diagonalizable.
(2) If A is real symmetric with n distinct eigenvalues, then A is orthogonally diagonalizable.
Proof. (1) Let {v1, v2, . . . , vn} be a basis consisting of eigenvectors for the
distinct eigenvalues λ1, λ2, . . . , λn. Let ui = vi/||vi||, for i = 1, . . . , n. Then
each ui is an eigenvector with eigenvalue λi. By Thm. 6.27 we have ⟨ui, uj⟩ =
0 for i ≠ j. Since ||ui|| = 1 for i = 1, . . . , n, we see that {u1, u2, . . . , un} is
an orthonormal basis of eigenvectors for A, so A is unitarily diagonalizable.
The proof of (2) is the same.
Example 6.29. Let A be the self-adjoint matrix \(\begin{pmatrix} 1 & -2i \\ 2i & 4 \end{pmatrix}\). Then
det(xI − A) = x² − 5x = x(x − 5), so the eigenvalues are the real numbers 0
and 5. For λ1 = 0, we have the eigenvector v1 = (2i, 1)t, and for λ2 = 5,
we have the eigenvector v2 = (1, 2i)t. Then v1 is orthogonal to v2 and an
orthonormal basis is given by dividing each by its norm √5. Thus if we take
\[
U = \frac{1}{\sqrt{5}}\begin{pmatrix} 2i & 1 \\ 1 & 2i \end{pmatrix},
\]
then U is unitary, and U*AU = \(\begin{pmatrix} 0 & 0 \\ 0 & 5 \end{pmatrix}\).
Before treating the general cases of normal and real symmetric matrices,
we establish the following useful algorithm for constructing orthonormal
sets.
Theorem 6.30 (Gram-Schmidt Process). Suppose that {v1 , v2 , . . . , vk } is a
linearly independent subset of an inner product space V . Let u1 , u2 , . . . , uk
be vectors defined inductively as follows:
Let
\[
\begin{aligned}
w_1 &= v_1, & u_1 &= \frac{w_1}{||w_1||}, \\
w_2 &= v_2 - \langle v_2, u_1\rangle u_1, & u_2 &= \frac{w_2}{||w_2||}, \\
w_3 &= v_3 - \langle v_3, u_1\rangle u_1 - \langle v_3, u_2\rangle u_2, & u_3 &= \frac{w_3}{||w_3||}, \\
&\ \,\vdots & &\ \,\vdots \\
w_k &= v_k - \sum_{i=1}^{k-1} \langle v_k, u_i\rangle u_i, & u_k &= \frac{w_k}{||w_k||}.
\end{aligned}
\]
Note first that wk ≠ 0: otherwise
\[
v_k = \sum_{i=1}^{k-1} \langle v_k, u_i\rangle u_i \in \operatorname{span}\{u_1, \ldots, u_{k-1}\} = \operatorname{span}\{v_1, \ldots, v_{k-1}\},
\]
which contradicts the assumption that {v1, . . . , vk−1, vk} is linearly independent (recall Lemma 1.19).
Next we check that {u1, . . . , uk} is orthonormal. Since {u1, . . . , uk−1}
is orthonormal, we already know that ||ui|| = 1 for i = 1, . . . , k − 1 and
⟨ui, uj⟩ = 0 if i ≠ j ∈ {1, . . . , k − 1}. It is clear that ||uk|| = 1, so we only
need to check that ⟨wk, uj⟩ = 0 for j = 1, . . . , k − 1 (as this implies that
⟨uk, uj⟩ = 0 and ⟨uj, uk⟩ = \(\overline{\langle u_k, u_j\rangle}\) = 0 for j = 1, . . . , k − 1). But for each
such j, we have
\[
\langle w_k, u_j\rangle = \Big\langle v_k - \sum_{i=1}^{k-1}\langle v_k, u_i\rangle u_i,\; u_j\Big\rangle
= \langle v_k, u_j\rangle - \sum_{i=1}^{k-1}\langle v_k, u_i\rangle\langle u_i, u_j\rangle.
\]
Now
w3 = v3 − ⟨v3, w1⟩w1 − ⟨v3, w2⟩w2
happens to have ||w3 || = 1, so let u3 = w3 .
We record some consequences of the Gram-Schmidt process.
Corollary 6.32.
(1) Every subspace of Cn has an orthonormal basis;
(2) if v ∈ Cn with ||v|| = 1, then v is the first column of a unitary
matrix;
(3) if Q is an invertible n × n complex matrix, then Q = UT for some
unitary matrix U and upper-triangular matrix T.
Proof. (1) Apply the Gram-Schmidt process to any basis {v1 , . . . , vk } for
the subspace.
(2) Extend {v} to a basis {v1, . . . , vn} for Cn with v1 = v (by Thm. 1.26)
and then apply the Gram-Schmidt process to obtain an orthonormal basis
{u1 , . . . , un }. Note that since ||v1 || = 1, we have u1 = v1 = v, and since
{u1 , . . . , un } is orthonormal, the matrix whose columns are u1 , . . . , un is
unitary.
(3) Let v1, v2, . . . , vn be the columns of Q. If Q is invertible, then
rank Q = n, so the columns span Cn, hence form a basis. Now apply the
Gram-Schmidt process to S = {v1, v2, . . . , vn} to get an orthonormal basis
S′ = {u1, u2, . . . , un}. Then the matrix U whose columns are u1, u2, . . . , un
is unitary. Then T = U⁻¹Q is the transition matrix from S to S′. Recall
that the j-th column of T is given by the coordinates of vj with respect to
S′. Since vj is in the span of {u1, . . . , uj}, its i-th coordinate is 0 for i > j,
which means that T is upper-triangular.
The analogous statements hold for vectors and subspaces of Rn and real
matrices, replacing "unitary" with "orthogonal" throughout; the proofs are
the same as for C.
We discuss some examples before proceeding with the proof of unitary
(resp. orthogonal) diagonalizability of normal (resp. real symmetric) matrices.
Example 6.33. Let V be the null space in C⁴ of the matrix
\[
A = \begin{pmatrix} 1 & i & 0 & i+1 \\ -1 & 0 & 2 & -i \end{pmatrix}.
\]
We can find an orthonormal basis for V as follows. First find a basis for V
in the usual way, by applying row operations to A to get
\[
\begin{pmatrix} 1 & 0 & -2 & i \\ 0 & 1 & -2i & -i \end{pmatrix}.
\]
Taking u1 = v1/||v1|| and w2 = v2 − ⟨v2, u1⟩u1, one finds ||w2|| = √19/3, and
normalizing then gives u2 = w2/||w2||, so that {u1, u2} is an orthonormal basis for V.
In a similar example over R, one extends the unit vector v = (1/3)(1, 2, 2)t to an
orthonormal basis of R³: applying the Gram-Schmidt process to {v, e1, e2} gives
w2 = e1 − ⟨e1, v⟩v = (1/9)(8, −2, −2)t with ||w2|| = 2√2/3, so we let
u2 = w2/||w2|| = (1/(3√2))(4, −1, −1)t.
Now ⟨e2, v⟩ = 2/3 and ⟨e2, u2⟩ = −1/(3√2), so we let u3 = (1/√2)(0, 1, −1)t,
and the matrix with columns v, u2, u3, namely
\[
\begin{pmatrix}
\tfrac{1}{3} & \tfrac{4}{3\sqrt{2}} & 0 \\[2pt]
\tfrac{2}{3} & -\tfrac{1}{3\sqrt{2}} & \tfrac{1}{\sqrt{2}} \\[2pt]
\tfrac{2}{3} & -\tfrac{1}{3\sqrt{2}} & -\tfrac{1}{\sqrt{2}}
\end{pmatrix},
\]
is orthogonal.
Example 6.35. Let Q be the matrix whose columns are the vectors v1 , v2 , v3
from Example ??, so
\[
Q = \begin{pmatrix} 1 & 1 & 0 \\ 0 & i & 1 \\ i & 1 & 1+i \end{pmatrix}.
\]
Applying the Gram-Schmidt process to the columns of Q gives a unitary matrix U
whose columns are the resulting orthonormal vectors, and T = U*Q is then
upper-triangular, so Q = UT as in Corollary 6.32 (3).
So far the only examples of inner product spaces we've considered were
Rn and Cn . For more examples, note that a subspace of an inner product
space is still an inner product space. For more interesting examples, one can
consider spaces of functions.
Example 6.36. Let V be the set of real polynomials of degree at most n.
Define an inner product on V by
\[
\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.
\]
This satisfies the conditions in the definition of a real inner product space,
since
(1) ∫₀¹ (f(x) + g(x))h(x) dx = ∫₀¹ f(x)h(x) dx + ∫₀¹ g(x)h(x) dx shows
that ⟨f + g, h⟩ = ⟨f, h⟩ + ⟨g, h⟩;
(2) ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx = ⟨g, f⟩;
(3) ⟨f, f⟩ = ∫₀¹ f(x)² dx > 0 unless f(x) = 0.
\[
h_2(x) = \frac{g_2(x)}{||g_2||} = 2\sqrt{3}\left(x - \tfrac{1}{2}\right) = \sqrt{3}(2x - 1).
\]
Since ⟨f3, h1⟩ = ∫₀¹ x² dx = 1/3 and
\[
\langle f_3, h_2\rangle = \sqrt{3}\int_0^1 x^2(2x-1)\,dx = \sqrt{3}\int_0^1 (2x^3 - x^2)\,dx
= \sqrt{3}\left[\tfrac{1}{2}x^4 - \tfrac{1}{3}x^3\right]_0^1 = \tfrac{\sqrt{3}}{6},
\]
we set
\[
g_3(x) = x^2 - \tfrac{1}{3} - \tfrac{1}{2}(2x - 1) = x^2 - x + \tfrac{1}{6}.
\]
Then another integral calculation gives ||g3||² = 1/180, so for the third
vector in the orthonormal basis, take
\[
h_3(x) = \frac{g_3(x)}{||g_3||} = \sqrt{5}(6x^2 - 6x + 1).
\]
Example 6.39. If we momentarily drop the assumption that V be finite-dimensional, we can consider more interesting examples. Let V be the space
of continuous complex-valued functions on the real unit interval [0, 1] (so
f ∈ V means that f(x) = s(x) + it(x) where s and t are continuous real-valued functions on [0, 1]). Define
\[
\langle f, g\rangle = \int_0^1 f(x)\overline{g(x)}\,dx
\]
(2) If A is a real square matrix with only real eigenvalues, then the proof
of Theorem 5.18 goes through in exactly the same way to show that A is
similar (over R) to an upper-triangular matrix, and the proof of Corollary ??
goes through to show that if Q is real invertible, then Q = P T for some
orthogonal P and upper-triangular T . The proof of (2) is then the same as
(1).
Theorem 6.41.
(1) A matrix is unitarily diagonalizable if and only if it is normal.
(2) A real matrix is orthogonally diagonalizable if and only if it is symmetric.
Proof. (1) We already saw in Prop. 6.25 that if A is unitarily diagonalizable,
then A is normal. We must prove the converse.
Suppose then that A is normal. By Thm. ??, we know that U*AU = T for
some unitary matrix U and upper-triangular matrix T. Since AA* = A*A
and T* = (U*AU)* = U*A*(U*)* = U*A*U, it follows that
TT* = (U*AU)(U*A*U) = U*A(UU*)A*U = U*(AA*)U
    = U*(A*A)U = U*A*UU*AU = T*T;
i.e., T is normal.
We complete the proof by showing that a normal upper-triangular n × n
matrix is diagonal. We prove this by induction on n. The case n = 1 is
obvious. Suppose then that n > 1 and the statement is true for (n − 1) ×
(n − 1) matrices. Let T be a normal upper-triangular n × n matrix. Since
T is upper-triangular, we can write
\[
T = \begin{pmatrix} \lambda & v^t \\ 0 & T_1 \end{pmatrix},
\]
where λ ∈ C, v ∈ Cn−1 and T1 is an (n − 1) × (n − 1) matrix. Then
\[
T^* = \begin{pmatrix} \bar\lambda & 0 \\ \bar v & T_1^* \end{pmatrix},
\quad\text{so}\quad
TT^* = \begin{pmatrix} |\lambda|^2 + v^t\bar v & v^tT_1^* \\ T_1\bar v & T_1T_1^* \end{pmatrix}
\quad\text{and}\quad
T^*T = \begin{pmatrix} |\lambda|^2 & \bar\lambda v^t \\ \lambda\bar v & \bar v v^t + T_1^*T_1 \end{pmatrix}.
\]
\[
A = \begin{pmatrix} 3 & -2 & 4 \\ -2 & 6 & 2 \\ 4 & 2 & 3 \end{pmatrix}.
\]
Since A is symmetric, we know by the theorem that it is orthogonally diagonalizable. The characteristic polynomial of A is
\[
p_A(x) = (x-3)^2(x-6) + 16 + 16 - 16(x-6) - 4(x-3) - 4(x-3)
       = x^3 - 12x^2 + 21x + 98 = (x-7)^2(x+2).
\]
For the eigenvalue −2, row-reducing A + 2I gives
\[
\begin{pmatrix} 5 & -2 & 4 \\ -2 & 8 & 2 \\ 4 & 2 & 5 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1/2 \\ 0 & 0 & 0 \end{pmatrix},
\]
so u1 = (1/3)(2, 1, −2)t is a unit eigenvector with eigenvalue −2. For the eigenvalue 7,
\[
\begin{pmatrix} 4 & 2 & -4 \\ 2 & 1 & -2 \\ -4 & -2 & 4 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 1/2 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]
so that (1, 0, 1)t and (−1, 2, 0)t are linearly independent eigenvectors with
eigenvalue 7. Applying the Gram-Schmidt Process to these gives u2 =
(1/√2)(1, 0, 1)t and u3 = (1/(3√2))(−1, 4, 1)t. Therefore setting
\[
P = \begin{pmatrix}
\tfrac{2}{3} & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{3\sqrt{2}} \\[2pt]
\tfrac{1}{3} & 0 & \tfrac{4}{3\sqrt{2}} \\[2pt]
-\tfrac{2}{3} & \tfrac{1}{\sqrt{2}} & \tfrac{1}{3\sqrt{2}}
\end{pmatrix},
\]
P is orthogonal and PᵗAP is diagonal with diagonal entries −2, 7, 7.