
Linear Algebra

Kjell Elfström

Copyright © Kjell Elfström
First edition, December 2020
Contents

1 Matrices
  1.1 Definition
  1.2 Addition and Multiplication of Matrices
  1.3 The Transpose of a Matrix
  1.4 The Inverse of a Matrix
  Exercises
2 Linear Spaces
  2.1 Definition
  2.2 Bases
  2.3 More on Matrices
  2.4 Direct Sums
  2.5 The Rank-Nullity Theorem
  Exercises
3 Inner Product Spaces
  3.1 Definition
  3.2 Orthonormal Bases
  3.3 Orthogonal Complement
  3.4 The Rank of a Matrix
  3.5 The Method of Least Squares
  Exercises
4 Determinants
  4.1 Multilinear Forms
  4.2 Definition of Determinants
  4.3 Properties of Determinants
  Exercises
5 Linear Transformations
  5.1 Matrix Representations of Linear Transformations
  5.2 Change of Basis
  5.3 Projections and Reflections
  5.4 Isometries
  Exercises
6 Eigenvalues and Eigenvectors
  6.1 Definition
  6.2 Diagonalisability
  6.3 Recurrence Equations
  6.4 The Spectral Theorem
  6.5 Systems of Linear Differential Equations
  6.6 The Vibrating String
  Exercises
7 Quadratic Forms
  7.1 Bilinear Forms
  7.2 Definition of Quadratic Forms
  7.3 The Spectral Theorem Applied to Quadratic Forms
  7.4 Quadratic Equations
  7.5 Sylvester's Law of Inertia
  Exercises
Answers to Exercises
Index

1 Matrices
1.1 Definition
A matrix is a rectangular array of real numbers:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
The size of a matrix with m rows and n columns is m × n, which is read as ‘m by n’.
The number aik in row i and column k is called an entry. We shall also use the more
compact notation A = [aik ]m×n , A = [aik ], A = [Aik ]m×n or A = [Aik ]. If m = n, we say
that A is a square matrix of order n, and then the entries aii are said to form the main
diagonal of A. By a column matrix we shall mean an m × 1 matrix with a single column,
and a row matrix is a 1 × n matrix having a single row.
Example 1.1.
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 \\ 2 & 5 \\ 3 & 7 \end{bmatrix}, \quad C = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix}.$$
The sizes of these matrices are 2 × 3, 3 × 2, 3 × 1 and 3 × 3, respectively. D is a square
matrix of order 3, and its main diagonal comprises the entries 1, 3 and 5.

1.2 Addition and Multiplication of Matrices


Definition 1.2. When the matrices A = [aik ]m×n and B = [bik ]m×n are of the same size, we define their sum as
$$A + B = [a_{ik} + b_{ik}]_{m\times n} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{bmatrix}.$$
The m × n zero matrix is the matrix
$$0_{m\times n} = [0]_{m\times n} = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.$$

When the size of the zero matrix is clear from context, we shall simply denote it by 0.

Example 1.3.
$$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 1 & 1 \\ 2 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1+1 & 2+1 & 3+1 \\ 0+2 & 1+1 & 1+2 \end{bmatrix} = \begin{bmatrix} 2 & 3 & 4 \\ 2 & 2 & 3 \end{bmatrix},$$
$$0 + \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 0+1 & 0+2 \\ 0+2 & 0+3 \\ 0+3 & 0+4 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix},$$
whereas
$$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{bmatrix}$$
is not defined.

Definition 1.4. Let A = [aik ]m×n be a matrix and s a real number. We define the product of s and A as
$$sA = [sa_{ik}]_{m\times n} = \begin{bmatrix} sa_{11} & sa_{12} & \cdots & sa_{1n} \\ sa_{21} & sa_{22} & \cdots & sa_{2n} \\ \vdots & \vdots & & \vdots \\ sa_{m1} & sa_{m2} & \cdots & sa_{mn} \end{bmatrix}.$$

This operation is called scalar multiplication of matrices. We also define −A as (−1)A.

Example 1.5.
$$3\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 6 & 9 \\ 0 & 3 & 3 \end{bmatrix}.$$

The reader should be able to verify the following rules of calculation.

Theorem 1.6. Below, s and t are real numbers, and A, B and C are matrices of the
same size.
(i) A + B = B + A.
(ii) A + (B + C) = (A + B) + C.
(iii) 0 + A = A.
(iv) A + (−A) = 0.
(v) s(A + B) = sA + sB.
(vi) (st)A = s(tA).
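These rules are easy to spot-check numerically. The following small Python sketch (NumPy assumed available; the particular matrices and scalars are our own arbitrary choices, not taken from the text) verifies each identity for one instance:

    import numpy as np

    # Arbitrary matrices of the same size and arbitrary scalars.
    A = np.array([[1., 2., 3.], [0., 1., 1.]])
    B = np.array([[1., 1., 1.], [2., 1., 2.]])
    C = np.array([[0., 2., 1.], [3., 0., 1.]])
    s, t = 3.0, -2.0

    assert np.allclose(A + B, B + A)                  # (i)
    assert np.allclose(A + (B + C), (A + B) + C)      # (ii)
    assert np.allclose(np.zeros_like(A) + A, A)       # (iii)
    assert np.allclose(A + (-A), np.zeros_like(A))    # (iv)
    assert np.allclose(s * (A + B), s * A + s * B)    # (v)
    assert np.allclose((s * t) * A, s * (t * A))      # (vi)

Checking a single instance proves nothing, of course; the point is only to make the rules concrete.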

Our next objective is to define multiplication of matrices in a way that enables us to write systems of linear equations in a compact manner. Therefore, we begin by defining the product AB of an m × n matrix A = [aik ]m×n and an n × 1 matrix B = [bik ]n×1 .


Note that the number of columns of A and the number of rows of B are equal. We define
$$AB = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_{11}b_1 + a_{12}b_2 + \cdots + a_{1n}b_n \\ a_{21}b_1 + a_{22}b_2 + \cdots + a_{2n}b_n \\ \vdots \\ a_{m1}b_1 + a_{m2}b_2 + \cdots + a_{mn}b_n \end{bmatrix}.$$

As we can see, the product AB is a column matrix of size m × 1. Now let
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad\text{and}\quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}.$$

Then
$$AX = Y \;\Longleftrightarrow\; \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \;\Longleftrightarrow\; \begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = y_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = y_2 \\ \quad\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = y_m. \end{cases}$$

Thus we have achieved the goal we set for ourselves. We shall now extend the definition to n × p matrices B. Denote by Bk the kth column of B. The matrix AB is the m × p matrix comprising the columns ABk , k = 1, 2, . . . , p.

Definition 1.7. Let A = [aik ]m×n and B = [bik ]n×p be matrices of size m × n and n × p, respectively. The product AB is the m × p matrix C = [cik ]m×p for which
$$c_{ik} = a_{i1}b_{1k} + a_{i2}b_{2k} + \cdots + a_{in}b_{nk} = \sum_{j=1}^{n} a_{ij}b_{jk}, \qquad 1 \le i \le m,\; 1 \le k \le p.$$

Hence, the entry in position i, k of the product AB is the sum of the products of the
corresponding entries of row i of A and column k of B.
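The defining formula translates directly into a triple loop. Here is a minimal Python sketch (matrices as plain lists of rows; the function name mat_mul is ours, not the text's):

    def mat_mul(A, B):
        # Product of an m x n matrix A and an n x p matrix B, as in Definition 1.7.
        m, n, p = len(A), len(A[0]), len(B[0])
        assert len(B) == n, "columns of A must match rows of B"
        C = [[0] * p for _ in range(m)]
        for i in range(m):
            for k in range(p):
                # c_ik = a_i1*b_1k + a_i2*b_2k + ... + a_in*b_nk
                C[i][k] = sum(A[i][j] * B[j][k] for j in range(n))
        return C

    # The 3 x 3 times 3 x 2 product computed in the example that follows:
    print(mat_mul([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 2], [1, 2], [1, 2]]))
    # [[6, 12], [15, 30], [24, 48]]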

Example 1.8.
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1\cdot1+2\cdot1+3\cdot1 & 1\cdot2+2\cdot2+3\cdot2 \\ 4\cdot1+5\cdot1+6\cdot1 & 4\cdot2+5\cdot2+6\cdot2 \\ 7\cdot1+8\cdot1+9\cdot1 & 7\cdot2+8\cdot2+9\cdot2 \end{bmatrix} = \begin{bmatrix} 6 & 12 \\ 15 & 30 \\ 24 & 48 \end{bmatrix},$$


whereas
$$\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$
is undefined. As we see, BA need not be defined even though AB is.
$$\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ 2 & -1 \end{bmatrix} = \begin{bmatrix} 1\cdot2+1\cdot2 & 1\cdot(-1)+1\cdot(-1) \\ 2\cdot2+2\cdot2 & 2\cdot(-1)+2\cdot(-1) \end{bmatrix} = \begin{bmatrix} 4 & -2 \\ 8 & -4 \end{bmatrix},$$
$$\begin{bmatrix} 2 & -1 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} 2\cdot1+(-1)\cdot2 & 2\cdot1+(-1)\cdot2 \\ 2\cdot1+(-1)\cdot2 & 2\cdot1+(-1)\cdot2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0.$$

Here we see that the commutative law and the cancellation law fail for matrix multiplication.

Theorem 1.9. In the identities below, s ∈ R and A, B, C are matrices. All members
of an identity are defined whenever any member is defined.
(i) A(B + C) = AB + AC.
(ii) (A + B)C = AC + BC.
(iii) A(BC) = (AB)C.
(iv) s(AB) = (sA)B = A(sB).

Proof. (i) In order that any member, hence both members, be defined, it is necessary and sufficient that the sizes of A, B and C are m × n, n × p and n × p, respectively. We then have
$$(A(B+C))_{ik} = \sum_{\nu=1}^{n} A_{i\nu}(B+C)_{\nu k} = \sum_{\nu=1}^{n} A_{i\nu}(B_{\nu k} + C_{\nu k}) = \sum_{\nu=1}^{n} A_{i\nu}B_{\nu k} + \sum_{\nu=1}^{n} A_{i\nu}C_{\nu k} = (AB)_{ik} + (AC)_{ik} = (AB + AC)_{ik}$$

for all i and k such that 1 ≤ i ≤ m and 1 ≤ k ≤ p. This proves the statement.
The proof of (ii) is similar and is left to the reader.
(iii) We may here assume that the sizes of A, B and C are m × n, n × p and p × q, respectively. Then
$$(A(BC))_{ik} = \sum_{\nu=1}^{n} A_{i\nu}(BC)_{\nu k} = \sum_{\nu=1}^{n} A_{i\nu}\sum_{\mu=1}^{p} B_{\nu\mu}C_{\mu k} = \sum_{\nu=1}^{n}\sum_{\mu=1}^{p} A_{i\nu}B_{\nu\mu}C_{\mu k} = \sum_{\mu=1}^{p}\sum_{\nu=1}^{n} A_{i\nu}B_{\nu\mu}C_{\mu k} = \sum_{\mu=1}^{p}\Bigl(\sum_{\nu=1}^{n} A_{i\nu}B_{\nu\mu}\Bigr)C_{\mu k} = \sum_{\mu=1}^{p}(AB)_{i\mu}C_{\mu k} = ((AB)C)_{ik}.$$

The simple proof of (iv) is also left to the reader.


Definition 1.10. The unit matrix of order n is the square matrix
$$I^{(n)} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}.$$
The entries along the main diagonal are ones and the remaining entries are zeros. When
the order is clear from context, we shall simply write this matrix as I.

The following theorem is an immediate consequence of the definition.


Theorem 1.11. Let A be an m × n matrix. Then I (m) A = AI (n) = A.

1.3 The Transpose of a Matrix


Definition 1.12. Let
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$
be an m × n matrix. The transpose of A is the n × m matrix
$$A^t = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}.$$

Hence, the transpose of A is the matrix whose columns are the rows of A.
Theorem 1.13. Below, s ∈ R and A and B are matrices. Both sides of an identity are
defined whenever any side is defined.
(i) (A + B)t = At + B t .
(ii) (sA)t = sAt .
(iii) (AB)t = B t At .

Proof. The first two statements are simple consequences of the definition. To prove the last statement, suppose that the sizes of A and B are m × n and n × p, respectively. Then
$$((AB)^t)_{ik} = (AB)_{ki} = \sum_{j=1}^{n} A_{kj}B_{ji} = \sum_{j=1}^{n} (A^t)_{jk}(B^t)_{ij} = (B^tA^t)_{ik},$$

which proves the claim.

Definition 1.14. A matrix A is said to be symmetric if At = A.
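Statement (iii) of Theorem 1.13, the reversal of the factors under transposition, is easy to confirm numerically. A sketch assuming NumPy (the sizes are chosen arbitrarily, not taken from the text):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(3, 4))      # a 3 x 4 matrix
    B = rng.integers(-5, 5, size=(4, 2))      # a 4 x 2 matrix

    assert np.array_equal((A @ B).T, B.T @ A.T)   # (AB)^t = B^t A^t

    S = A.T @ A                # A^t A is symmetric in the sense of Definition 1.14
    assert np.array_equal(S, S.T)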


1.4 The Inverse of a Matrix


Definition 1.15. Let A be a square matrix of order n. We say that A is invertible if
there exists a square matrix B of order n such that
AB = BA = I.

Suppose that AB = BA = I and AC = CA = I. We then have


B = IB = (CA)B = C(AB) = CI = C.
This shows that the matrix B in the definition is unique when it exists.
Definition 1.16. Let A be an invertible square matrix. The inverse A−1 of A is the
unique matrix B in the definition above.

Example 1.17. Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} -5 & 2 \\ 3 & -1 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 1(-5)+2\cdot3 & 1\cdot2+2(-1) \\ 3(-5)+5\cdot3 & 3\cdot2+5(-1) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$
and
$$BA = \begin{bmatrix} (-5)\cdot1+2\cdot3 & (-5)\cdot2+2\cdot5 \\ 3\cdot1+(-1)\cdot3 & 3\cdot2+(-1)\cdot5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I.$$
Hence A is invertible with inverse A−1 = B.

Example 1.18. Let
$$A = \begin{bmatrix} 2 & -1 \\ 2 & -1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}.$$
As we saw in Example 1.8, AB = 0. From this it follows that A cannot be invertible. If
it were, we should have 0 = A−1 0 = A−1 (AB) = (A−1 A)B = IB = B, which is false.

Theorem 1.19. Let A be a square matrix of order n. Then A is invertible if and only
if, for every n × 1 matrix Y , there is a unique n × 1 matrix X such that AX = Y .

Proof. First assume that A is invertible with inverse A−1 . If AX = Y , then we have
X = IX = A−1 AX = A−1 Y , which shows that the equation AX = Y can have no more
than one solution X. Since AA−1 Y = IY = Y , we see that X = A−1 Y is, in fact, a
solution. This proves the implication to the right.
To show the converse, we assume that, for every n × 1 matrix Y , there is a unique
n × 1 matrix X such that AX = Y . Let Ik be the kth column of the unit matrix I.
By assumption, there is an n × 1 matrix Bk such that ABk = Ik . Let B be the matrix
comprising the columns Bk , k = 1, 2, . . . , n. Then AB = I. It remains to show that
BA = I. Let C = BA. Then we have AC = A(BA) = (AB)A = IA = A = AI. Hence
ACk = AIk for k = 1, 2, . . . , n, and it follows from the uniqueness that Ck = Ik for
k = 1, 2, . . . , n. From this we conclude that BA = C = I.


Sometimes the following simple theorem proves useful.

Theorem 1.20. Let A and B be m × n matrices. Then AX = BX for all n × 1 matrices X if and only if A = B.

Proof. If AX = BX for all n × 1 matrices X, then AIk = BIk for all columns Ik of the
n × n unit matrix I. Hence, A = AI = BI = B. The reverse implication is immediate.

Theorems 1.19 and 1.20 can be used to devise a method for finding the inverse when it
exists or disclosing its non-existence. If we find that the system AX = Y has a unique
solution X = BY , then A is invertible by Theorem 1.19. Hence also X = A−1 Y is a
solution. It follows that BY = A−1 Y for all n × 1 matrices Y and so, by Theorem 1.20,
B = A−1 . If, instead, we find that the system has not a unique solution for some
right-hand side Y , then A is not invertible by Theorem 1.19.

Example 1.21. Let us determine whether the matrix
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 4 & 6 \end{bmatrix}$$

is invertible. We have
$$AX = Y \;\Longleftrightarrow\; \begin{cases} x_1 + 2x_2 + 3x_3 = y_1 \\ x_1 + 3x_2 + 5x_3 = y_2 \\ x_1 + 4x_2 + 6x_3 = y_3 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 + 2x_2 + 3x_3 = y_1 \\ x_2 + 2x_3 = -y_1 + y_2 \\ 2x_2 + 3x_3 = -y_1 + y_3 \end{cases}$$
$$\Longleftrightarrow\; \begin{cases} x_1 + 2x_2 + 3x_3 = y_1 \\ x_2 + 2x_3 = -y_1 + y_2 \\ -x_3 = y_1 - 2y_2 + y_3 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 = 2y_1 - y_3 \\ x_2 = y_1 - 3y_2 + 2y_3 \\ x_3 = -y_1 + 2y_2 - y_3. \end{cases}$$

From this it follows that A is invertible and that
$$A^{-1} = \begin{bmatrix} 2 & 0 & -1 \\ 1 & -3 & 2 \\ -1 & 2 & -1 \end{bmatrix}.$$

Of course, it suffices to keep track of the coefficients of the xi and the yi . Thus the above computations can be written as
$$AX = Y \;\Longleftrightarrow\; \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 1 & 3 & 5 & 0 & 1 & 0 \\ 1 & 4 & 6 & 0 & 0 & 1 \end{array}\right] \;\Longleftrightarrow\; \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 2 & 3 & -1 & 0 & 1 \end{array}\right]$$
$$\Longleftrightarrow\; \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 & -2 & 1 \end{array}\right] \;\Longleftrightarrow\; \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & 0 & -1 \\ 0 & 1 & 0 & 1 & -3 & 2 \\ 0 & 0 & 1 & -1 & 2 & -1 \end{array}\right].$$


Example 1.22. The matrix
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 5 \\ 1 & 4 & 7 \end{bmatrix}$$
differs from the matrix in the previous example only in its lower right position. This time we get
$$AX = Y \;\Longleftrightarrow\; \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 1 & 3 & 5 & 0 & 1 & 0 \\ 1 & 4 & 7 & 0 & 0 & 1 \end{array}\right] \;\Longleftrightarrow\; \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 2 & 4 & -1 & 0 & 1 \end{array}\right] \;\Longleftrightarrow\; \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 0 & 0 & 1 & -2 & 1 \end{array}\right].$$

We see that the system has an infinite number of solutions when y1 − 2y2 + y3 = 0 and
no solutions otherwise. Hence A is not invertible.
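Both examples can be reproduced numerically. Assuming NumPy is available, numpy.linalg.inv returns the inverse computed in Example 1.21 and raises LinAlgError for the singular matrix of Example 1.22 (a sketch, not part of the text):

    import numpy as np

    A1 = np.array([[1, 2, 3], [1, 3, 5], [1, 4, 6]], dtype=float)
    A2 = np.array([[1, 2, 3], [1, 3, 5], [1, 4, 7]], dtype=float)

    print(np.linalg.inv(A1))      # [[ 2.  0. -1.], [ 1. -3.  2.], [-1.  2. -1.]]
    try:
        np.linalg.inv(A2)
    except np.linalg.LinAlgError:
        print("the matrix of Example 1.22 is singular")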

Theorem 1.23. Let A and B be invertible square matrices. Then AB and At are
invertible, (AB)−1 = B −1 A−1 and (At )−1 = (A−1 )t .

Proof. The statements follow from
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I,$$
$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I$$
and
$$A^t(A^{-1})^t = (A^{-1}A)^t = I^t = I,$$
$$(A^{-1})^t A^t = (AA^{-1})^t = I^t = I.$$

Exercises
1.1. Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 1 \\ 2 & 1 & 3 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix}.$$
Perform the following operations or explain why they are not defined.
(a) A + B, (b) A + C, (c) AB, (d) BA,
(e) AC, (f) CA, (g) C(A + B), (h) C(A + 2B).
1.2. Let
$$A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix}.$$
Compute A^2 − B^2 and (A + B)(A − B) and explain what you observe.


1.3. Let
$$A = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix}.$$
Find all 2 × 2 matrices B such that AB = BA = 0.

1.4. (a) Let A and B be n × n matrices such that AB = BA. Show that the binomial theorem
$$(A+B)^n = \sum_{k=0}^{n} \binom{n}{k} A^{n-k}B^k$$
holds for A and B. Explain why the commutativity condition is necessary.
(b) Compute (I + A)^6 where
$$A = \begin{bmatrix} 0 & 2 & 3 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{bmatrix}.$$

1.5. Let A, B and C be the matrices in Exercise 1.1. Compute At B t and (At + B t )C t .
1.6. Show that At A and AAt are symmetric matrices for all matrices A.
1.7. A matrix A is said to be skew-symmetric if At = −A.
(a) Let A be a square matrix. Show that A + At is symmetric and that A − At
is skew-symmetric.
(b) Show that, for every square matrix A, there exist unique matrices B and C
such that A = B + C, B is symmetric and C is skew-symmetric.
1.8. Find the inverse of each matrix below or explain why it does not exist.
$$\text{(a)}\ \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 1 & 1 & 1 \end{bmatrix}, \quad \text{(b)}\ \begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \\ 1 & 1 & 4 \end{bmatrix}, \quad \text{(c)}\ \begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \\ 2 & 1 & 4 \end{bmatrix}.$$
1.9. Compute the inverses of A, A^t and A^2 where
$$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 2 \\ 2 & 1 & 2 \end{bmatrix}.$$

1.10. For which values of a is the matrix
$$\begin{bmatrix} 2 & 0 & 1 \\ 1 & 3 & 2 \\ 2 & 4 & a \end{bmatrix}$$
invertible? Find the inverse for those values.


1.11. Solve the matrix equation AXB = C where
$$A = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 & 2 \\ 1 & 1 & 2 \\ 2 & 1 & 1 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}.$$

1.12. Let A and B be n × n matrices such that I + AB is invertible. Show that I + BA is invertible with inverse I − B(I + AB)^{-1} A.
1.13. A nilpotent matrix is a square matrix A such that A^k = 0 for some positive integer k. The smallest such k is called the index of A. Show that
$$I + A + A^2 + \cdots + A^k$$
is invertible if A is a nilpotent matrix of index k. What is the inverse?

2 Linear Spaces
2.1 Definition
Definition 2.1. Let V be a non-empty set, and let there be defined two operations
V × V → V and R × V → V called addition and scalar multiplication, respectively.
We denote the value of the addition at (u, v) by u + v and the value of the scalar
multiplication at (s, u) by su. We call an element of V a vector and real numbers are
usually called scalars. We say that the set V , together with the addition and scalar
multiplication, forms a linear space provided that the following conditions are met:

(i) u + v = v + u for all u and v in V .


(ii) (u + v) + w = u + (v + w) for all u, v and w in V .
(iii) There exists an element 0 ∈ V such that u + 0 = u for all u ∈ V .
(iv) For every u ∈ V , there exists an element −u ∈ V such that u + (−u) = 0.
(v) s(u + v) = su + sv for all s ∈ R and all u and v in V .
(vi) (s + t)u = su + tu for all s and t in R and all u ∈ V .
(vii) s(tu) = (st)u for all s and t in R and all u ∈ V .
(viii) 1u = u for all u ∈ V .

The conditions are called the axioms for linear spaces. We call 0 the zero vector,
and −u is called the additive inverse of u. The linear space is in fact a triple (V, +, ·)
where + and · denote the addition and scalar multiplication, respectively. When it is
clear from context which operations + and · are intended, we shall, by abuse of language,
use V to denote also the linear space.
The following statements follow from the definition.

Theorem 2.2.
(i) The vector 0 is uniquely determined.
(ii) For every u ∈ V , the vector −u is uniquely determined.
(iii) 0u = 0 for all u ∈ V .
(iv) s0 = 0 for all s ∈ R.
(v) (−1)u = −u for all u ∈ V .

Proof. (i) Suppose that 01 and 02 both satisfy Axiom (iii) in the definition. Then it
follows from that axiom and Axiom (i) that 01 = 01 + 02 = 02 .
(ii) Suppose that u + v = 0 and u + w = 0. Then it follows from Axioms (i), (ii)
and (iii) that v = v + 0 = v + (u + w) = (v + u) + w = 0 + w = w + 0 = w.
(iii) 0u = 0u + 0 = 0u + 0u + (−(0u)) = (0 + 0)u + (−(0u)) = 0u + (−(0u)) = 0.
(iv) s0 = s0 + 0 = s0 + s0 + (−(s0)) = s(0 + 0) + (−(s0)) = s0 + (−(s0)) = 0.

(v) u + (−1)u = 1u + (−1)u = (1 − 1)u = 0u = 0, and hence (−1)u = −u by the


uniqueness of the additive inverse.
Example 2.3. The set Rn of n-tuples (x1 , x2 , . . . , xn ) of real numbers, together with
the addition and scalar multiplication defined by
(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn ),
s(x1 , x2 , . . . , xn ) = (sx1 , sx2 , . . . , sxn ),
is a linear space. The zero vector is 0 = (0, 0, . . . , 0) and the additive inverse of the
vector x = (x1 , x2 , . . . , xn ) is −x = (−x1 , −x2 , . . . , −xn ). We leave it to the reader to
verify the axioms for a linear space.
Example 2.4. The set of vectors in space, together with the ordinary vector addition
and scalar multiplication, forms a linear space.
Example 2.5. The set P of polynomials over R, together with addition of polynomi-
als and multiplication by a number, forms a linear space. The zero vector is the zero
polynomial.
Example 2.6. The set Pn of polynomials over R of degree at most n, together with
addition of polynomials and multiplication by a number, forms a linear space.
Example 2.7. The set of polynomials of degree exactly n, together with addition of
polynomials and multiplication by a number, does not form a linear space. One reason
for this is that addition is not a function into the set. The sum of two polynomials
of degree n need not be a polynomial of degree n. For example, the sum of the two
polynomials 1 + x + x2 and 1 + x − x2 of degree 2 is the polynomial 2 + 2x of degree 1.
Another reason is that the zero polynomial does not belong to the set.
Example 2.8. Let C(I) be the set of continuous functions u : I → R where I is an
interval. The sum of two functions u and v in C(I) is defined by (u+v)(x) = u(x)+v(x),
x ∈ I, and the scalar multiplication is defined by (su)(x) = su(x), x ∈ I. In this way
we get a linear space. The zero vector is the function 0 defined by 0(x) = 0, x ∈ I,
and −u is defined by (−u)(x) = −u(x), x ∈ I. Here it is essential that the sum of two
continuous functions is continuous and that su is continuous when u is.
Example 2.9. The set Mm×n of matrices of size m × n forms a linear space together
with addition and scalar multiplication of matrices.
Definition 2.10. Let U and V be linear spaces. We say that U is a subspace of V if
U ⊆ V and the addition and scalar multiplication of U agree with those of V on the sets
U × U and R × U , respectively.
Theorem 2.11. Let U be a subset of a linear space V . Then the operations on V make
U a subspace of V if and only if
(i) U 6= ∅,
(ii) u + v ∈ U whenever u and v belong to U ,
(iii) su ∈ U whenever s ∈ R and u ∈ U .


Proof. First assume that U is a subspace of V . Then U is non-empty by definition.


Hence (i) is satisfied. Since the restrictions of the operations + and · to U × U and
R × U are functions into U , conditions (ii) and (iii) are satisfied.
To show the converse, we assume that the three conditions are met. From (ii) and
(iii) it follows that the restrictions of the operations to U × U and R × U are functions
into U . Since vectors in U are also vectors in V , all the axioms for linear spaces hold for U
except possibly Axioms (iii) and (iv). Since U 6= ∅, U contains at least one vector u.
By condition (iii), 0 = 0u ∈ U . Hence Axiom (iii) holds. It also follows from condition
(iii) that −u = (−1)u ∈ U if u is any vector in U , whence Axiom (iv) is satisfied.

Corollary 2.12. Let U be a non-empty subset of a linear space V . Then U is a subspace of V if and only if su + tv ∈ U for all u and v in U and all real numbers s and t.

Proof. Assume that U is a subspace of V . If u and v belong to U and s and t are real numbers, then, by (iii) of Theorem 2.11, su ∈ U and tv ∈ U . Hence, by (ii) of Theorem 2.11, su + tv ∈ U .
Assume that su + tv ∈ U for all u and v in U and all real numbers s and t. If u
and v belong to U , then it follows that u + v = 1u + 1v ∈ U . Assume that u ∈ U and
s ∈ R. Then su = su + 0u ∈ U . Hence, the conditions of Theorem 2.11 are satisfied,
which shows that U is a subspace.

Note that it follows from the proof of Theorem 2.11 that the zero vector of a linear
space V is also the zero vector of its subspaces and that the inverse of a vector in a
subspace of V is the inverse of that vector in V .
Example 2.13. Let V be any linear space. If 0 is the zero vector of V , then U = {0} is a
subset of V . In fact, U is a subspace by Corollary 2.12. Firstly, U is non-empty. Secondly,
if u and v belong to U , then both are the zero vector. Hence, su + tv = s0 + t0 = 0 ∈ U
for all real numbers s and t. We call U the zero subspace of V .

Example 2.14. Let U = {x ∈ Rn ; a1 x1 + a2 x2 + · · · + an xn = 0}. Then U is a subspace of Rn . To show this, we first observe that 0 = (0, 0, . . . , 0) ∈ U . Hence U is non-empty.
Let x and y belong to U and s and t be real numbers. Then

0 = a1 x 1 + a2 x 2 + · · · + a n x n ,
0 = a1 y 1 + a2 y 2 + · · · + a n y n .

Hence

0 = s0 + t0 = s(a1 x1 + a2 x2 + · · · + an xn ) + t(a1 y1 + a2 y2 + · · · + an yn )
= a1 (sx1 + ty1 ) + a2 (sx2 + ty2 ) + · · · + an (sxn + tyn ),

and it follows that sx + ty = (sx1 + ty1 , sx2 + ty2 , . . . , sxn + tyn ) ∈ U .

Example 2.15. The linear space Pn of polynomials of degree at most n is a subspace of the linear space P of all polynomials. If m ≤ n, then Pm is a subspace of Pn .


Example 2.16. If we regard polynomials over R as functions on R, then we can regard P as a subspace of the space C(R) of continuous functions on R.

By abuse of notation, we shall frequently regard elements of Rn as row or column matrices and vice versa. For example, when A is an m × n matrix, x = (x1 , . . . , xn ) ∈ Rn and y = (y1 , . . . , ym ) ∈ Rm , Ax = y means that
$$A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$

Definition 2.17. Let A be an m×n matrix. The kernel ker A of A is the set of solutions
in Rn of the equation Ax = 0. The image im A of A is the set of vectors y ∈ Rm for
which the equation Ax = y has a solution.

Theorem 2.18. Let A be an m × n matrix. Then ker A and im A are subspaces of Rn and Rm , respectively.

Proof. Obviously, 0 = (0, 0, . . . , 0) ∈ ker A. Hence ker A 6= ∅. Suppose that x and y


belong to ker A. Then Ax = Ay = 0, and so A(sx + ty) = sAx + tAy = s0 + t0 = 0.
Hence, sx + ty belongs to ker A. This shows that ker A is a subspace of Rn .
The vector 0 = (0, 0, . . . , 0) ∈ Rm belongs to im A since x = 0 = (0, 0, . . . , 0) ∈ Rn is
a solution of the equation Ax = 0. Hence im A 6= ∅. Suppose that u and v belong to
im A. Then there exist x and y in Rn such that Ax = u and Ay = v. From this we get
A(sx + ty) = sAx + tAy = su + tv, and so su + tv ∈ im A.

Example 2.19. The plane a1 x1 + a2 x2 + a3 x3 = 0 through the origin can be regarded as the kernel of the matrix [a1 a2 a3]. This plane also has a parametric equation
$$\begin{cases} x_1 = u_1t_1 + v_1t_2 \\ x_2 = u_2t_1 + v_2t_2 \\ x_3 = u_3t_1 + v_3t_2 \end{cases} \;\Longleftrightarrow\; \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \\ u_3 & v_3 \end{bmatrix} \begin{bmatrix} t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
Hence, the plane is the image of the matrix
$$\begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \\ u_3 & v_3 \end{bmatrix}.$$
The intersection of the planes a1 x1 + a2 x2 + a3 x3 = 0 and b1 x1 + b2 x2 + b3 x3 = 0 through the origin is the kernel of the matrix
$$\begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{bmatrix}.$$

If the two planes are not identical, their intersection is a line through the origin. Since
lines have parametric equations, this line can also be regarded as the image of a matrix.
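Definition 2.17 is easy to explore numerically. For two concrete, non-identical planes through the origin (the normal vectors below are our own choice, not taken from the text), SciPy's null_space returns a direction vector of the line of intersection, i.e. a basis of the kernel of the 2 × 3 coefficient matrix:

    import numpy as np
    from scipy.linalg import null_space

    # Planes x1 + 2x2 + 3x3 = 0 and 2x1 + x2 + x3 = 0 through the origin.
    M = np.array([[1., 2., 3.],
                  [2., 1., 1.]])
    direction = null_space(M)          # a single column spanning ker M
    print(direction.ravel())
    assert np.allclose(M @ direction, 0)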


2.2 Bases
Definition 2.20. Let u and u1 , u2 , . . . , uk be vectors in a linear space V . We say that
u is a linear combination of the ui if there exist real numbers s1 , s2 , . . . , sk such that

u = s 1 u1 + s 2 u2 + · · · + s k uk .

Definition 2.21. We say that the vectors u1 , u2 , . . . , uk in a linear space V span V if


every vector of V is a linear combination of them. One can also say that they generate V
and call them generators of V .

Definition 2.22. Let u1 , u2 , . . . , uk be vectors in a linear space V . We call the set of all
linear combinations of those vectors the span of them and denote it by [u1 , u2 , . . . , uk ].

Theorem 2.23. Let u1 , u2 , . . . , uk be vectors in a linear space V . Then their span U is a subspace of V and U is spanned by u1 , u2 , . . . , uk .

Proof. U ≠ ∅ since 0 = 0u1 + 0u2 + · · · + 0uk ∈ U . If u and v belong to U , then u = s1 u1 + s2 u2 + · · · + sk uk and v = t1 u1 + t2 u2 + · · · + tk uk . Therefore,

su + tv = (ss1 + tt1 )u1 + (ss2 + tt2 )u2 + · · · + (ssk + ttk )uk ∈ U,

and hence, by Corollary 2.12, U is a subspace of V . The fact that the vectors span U
follows directly from the definition.

Example 2.24. The plane in Example 2.19 is the span of the two vectors u = (u1 , u2 , u3 )
and v = (v1 , v2 , v3 ). The line there is the span of a single vector w = (w1 , w2 , w3 ).

Example 2.25. Let A be an m × n matrix. A vector y ∈ Rm belongs to im A if and only if there exists a vector x ∈ Rn such that Ax = y. Since this can be written as
x1 A1 +x2 A2 +· · ·+xn An = y, we see that y ∈ im A if and only if y is a linear combination
of the columns of A. Hence, im A = [A1 , A2 , . . . , An ] is spanned by the columns of the
matrix.

Definition 2.26. We say that the vectors u1 , u2 , . . . , uk in a linear space V are linearly
dependent if there exist real numbers s1 , s2 , . . . , sk , not all zero, such that

s1 u1 + s2 u2 + · · · + sk uk = 0.

If the vectors are not linearly dependent, we say that they are linearly independent.

Hence, the vectors u1 , u2 , . . . , uk are linearly independent if and only if

s 1 u1 + s 2 u2 + · · · + s k uk = 0 ⇒ s1 = s2 = · · · = sk = 0.

Also note that a single vector u is linearly dependent if and only if it is the zero vector.


Example 2.27. Here we want to find out if the vectors u1 = (1, 1, 1), u2 = (1, 3, 1),
u3 = (1, 4, 3) in R3 are linearly dependent. We solve the equation s1 u1 + s2 u2 + s3 u3 = 0:
$$\begin{cases} s_1 + s_2 + s_3 = 0 \\ s_1 + 3s_2 + 4s_3 = 0 \\ s_1 + s_2 + 3s_3 = 0 \end{cases} \;\Longleftrightarrow\; \begin{cases} s_1 + s_2 + s_3 = 0 \\ 2s_2 + 3s_3 = 0 \\ 2s_3 = 0 \end{cases} \;\Longleftrightarrow\; s_1 = s_2 = s_3 = 0.$$

Hence, the vectors are linearly independent.

Example 2.28. Consider the vectors u1 = (1, 1, 1), u2 = (1, 3, 5), u3 = (1, 4, 7) in R3 .
This time the equation s1 u1 + s2 u2 + s3 u3 = 0 is equivalent to
$$\begin{cases} s_1 + s_2 + s_3 = 0 \\ s_1 + 3s_2 + 4s_3 = 0 \\ s_1 + 5s_2 + 7s_3 = 0 \end{cases} \;\Longleftrightarrow\; \begin{cases} s_1 + s_2 + s_3 = 0 \\ 2s_2 + 3s_3 = 0 \\ 4s_2 + 6s_3 = 0 \end{cases} \;\Longleftrightarrow\; \begin{cases} s_1 + s_2 + s_3 = 0 \\ 2s_2 + 3s_3 = 0. \end{cases}$$

Since this equation has non-trivial solutions, the vectors u1 , u2 , u3 are linearly dependent.
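A convenient computational test, which anticipates the notion of rank treated in Section 3.4: k vectors in Rn are linearly independent exactly when the matrix having them as rows has rank k. A NumPy sketch applied to the two examples above:

    import numpy as np

    U1 = np.array([[1, 1, 1], [1, 3, 1], [1, 4, 3]])   # Example 2.27
    U2 = np.array([[1, 1, 1], [1, 3, 5], [1, 4, 7]])   # Example 2.28

    print(np.linalg.matrix_rank(U1))   # 3 -> linearly independent
    print(np.linalg.matrix_rank(U2))   # 2 -> linearly dependent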

Theorem 2.29. Let k ≥ 2. Then the vectors u1 , u2 , . . . , uk in a linear space V are linearly dependent if and only if one of them is a linear combination of the others.

Proof. Suppose that the vectors are linearly dependent. Then there exist real numbers s1 , . . . , si , . . . , sk , where si ≠ 0, such that s1 u1 + · · · + si ui + · · · + sk uk = 0. Dividing by si and moving terms, we find that
$$u_i = \sum_{\substack{1 \le j \le k \\ j \ne i}} \frac{-s_j}{s_i}\, u_j$$
is a linear combination of the other vectors.
Now suppose that
$$u_i = \sum_{\substack{1 \le j \le k \\ j \ne i}} s_j u_j$$
is a linear combination of the other vectors. Then s1 u1 + · · · + sk uk = 0 where the coefficient si = −1 of ui is non-zero. Hence the vectors are linearly dependent.

Definition 2.30. Let u1 , . . . , uk be vectors in a linear space V . We say that these vectors form a basis for V if they span V and are linearly independent.

Example 2.31. The vectors


ε1 = (1, 0, 0, . . . , 0)
ε2 = (0, 1, 0, . . . , 0)
ε3 = (0, 0, 1, . . . , 0)
..
.
εn = (0, 0, 0, . . . , 1)


of Rn form a basis for Rn . Firstly, they span Rn since any vector

x = (x1 , x2 , x3 , . . . , xn ) = x1 ε1 + x2 ε2 + x3 ε3 + · · · + xn εn

of Rn is a linear combination of them. Secondly, they are linearly independent since

x1 ε1 + x2 ε2 + x3 ε3 + · · · + xn εn = (x1 , x2 , x3 , . . . , xn ) = 0

only if x1 = x2 = x3 = · · · = xn = 0.

Definition 2.32. The basis ε1 , . . . , εn in Example 2.31 is called the standard basis for
the linear space Rn .

Theorem 2.33. Let u1 , . . . , uk be vectors in a linear space V . Then they form a basis
for V if and only if every vector u ∈ V can be written as a linear combination

u = x1 u1 + · · · + xk uk

with unique coefficients x1 , . . . , xk .

Proof. First assume that the vectors form a basis for V and let u ∈ V . Then, by the
definition of bases, u = x1 u1 +· · ·+xk uk is a linear combination of the vectors. It remains
to show that the coefficients are uniquely determined. Assume, to that end, that we also
have u = y1 u1 + · · · + yk uk . Then 0 = u − u = (x1 − y1 )u1 + · · · + (xk − yk )uk , and
it follows from the linear independence of u1 , . . . , uk that x1 − y1 = · · · = xk − yk = 0,
whence xi = yi for i = 1, . . . , k.
To show the converse, we assume that every vector of V has a unique representation as
in the theorem. Then, certainly, every vector of V is a linear combination of u1 , . . . , uk .
If x1 u1 + · · · + xk uk = 0, then the uniqueness and the fact that 0u1 + · · · + 0uk = 0 give
that x1 = · · · = xk = 0. Hence u1 , . . . , uk are also linearly independent, and therefore
they form a basis for V .

Definition 2.34. Let u1 , . . . , uk be a basis for a linear space V and let u be a vector
of V . If u = x1 u1 + · · · + xk uk , we call (x1 , . . . , xk ) the coordinates of u with respect to
the basis u1 , . . . , uk .

Example 2.35. The coordinates of x = (x1 , . . . , xn ) ∈ Rn with respect to the standard basis in Example 2.31 are (x1 , . . . , xn ). Hence, with respect to that specific basis, the
coordinates of a vector in Rn are its components.

Example 2.36. The polynomials 1, x, x2 , . . . , xn form a basis for the linear space Pn
of polynomials of degree at most n. This is so because every polynomial in Pn can be
written as p = a0 + a1 x + a2 x2 + · · · + an xn with unique coefficients a0 , a1 , a2 , . . . , an .
The coordinates of p with respect to this basis are (a0 , a1 , a2 , . . . , an ).

Example 2.37. The linear space P of all polynomials has no basis. No finite collection
p1 , . . . , pk of polynomials span P since a polynomial of degree greater than the maximum
degree of the pi cannot be a linear combination of them.


Example 2.38. We set out to find bases for ker A and im A where
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 3 & 4 & 5 & 6 \\ 2 & 5 & 7 & 9 & 11 \end{bmatrix}.$$
We begin by solving the equation Ax = 0:
$$\begin{cases} x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 = 0 \\ x_1 + 3x_2 + 4x_3 + 5x_4 + 6x_5 = 0 \\ 2x_1 + 5x_2 + 7x_3 + 9x_4 + 11x_5 = 0 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 = 0 \\ x_2 + x_3 + x_4 + x_5 = 0 \\ x_2 + x_3 + x_4 + x_5 = 0 \end{cases}$$
$$\Longleftrightarrow\; \begin{cases} x_1 = -r - 2s - 3t \\ x_2 = -r - s - t \\ x_3 = r \\ x_4 = s \\ x_5 = t \end{cases} \;\Longleftrightarrow\; x = ru + sv + tw$$
where u = (−1, −1, 1, 0, 0), v = (−2, −1, 0, 1, 0) and w = (−3, −1, 0, 0, 1). This shows
that x ∈ ker A if and only if x = ru + sv + tw. Hence the vectors u, v and w span
ker A. The generators obtained when solving a system in the usual way will always be
linearly independent. The free variables x3 , x4 and x5 in the last system correspond
to the parameters r, s and t in the solution, which in turn correspond to the patterns
(1, 0, 0), (0, 1, 0) and (0, 0, 1) in the third, fourth and fifth positions of the generators.
Hence ru + sv + tw = 0 if and only if (∗, ∗, r, s, t) = 0 which implies that r = s = t = 0.
Thus (−1, −1, 1, 0, 0), (−2, −1, 0, 1, 0) and (−3, −1, 0, 0, 1) form a basis for ker A.
We can use the same computations to find a basis for im A. Since im A is spanned by
the columns A1 , A2 , A3 , A4 , A5 of A, every vector y ∈ im A can be written as
y = x1 A1 + x2 A2 + x3 A3 + x4 A4 + x5 A5 . (2.1)
From the solution of the system we have
(−r − 2s − 3t)A1 + (−r − s − t)A2 + rA3 + sA4 + tA5 = 0.
By setting r = 1, s = 0, t = 0, we see that A3 = r1 A1 + r2 A2 is a linear combination
of A1 , A2 . By setting r = 0, s = 1, t = 0, we see that also A4 = s1 A1 + s2 A2 is
a linear combination of A1 , A2 . Finally, by setting r = 0, s = 0, t = 1, we see that
A5 = t1 A1 + t2 A2 is a linear combination of A1 , A2 . Substituting these expressions for
A3 , A4 and A5 into (2.1) and collecting terms, we find that y is a linear combination of
A1 , A2 , which therefore span im A. The computations also reveal that these vectors are
linearly independent. In fact,
$$x_1A_1 + x_2A_2 = 0 \;\Longleftrightarrow\; x_1A_1 + x_2A_2 + 0A_3 + 0A_4 + 0A_5 = 0 \;\Longleftrightarrow\; \begin{cases} x_1 = -r - 2s - 3t \\ x_2 = -r - s - t \\ 0 = r \\ 0 = s \\ 0 = t \end{cases}$$
from which it follows that x1 = x2 = 0. Hence, A1 , A2 form a basis for im A.
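The computation in Example 2.38 can be repeated with exact arithmetic in SymPy, whose Matrix class offers nullspace() and columnspace() methods (a sketch assuming SymPy is installed; the kernel basis it returns corresponds to the same choice of free variables as above):

    from sympy import Matrix

    A = Matrix([[1, 2, 3, 4, 5],
                [1, 3, 4, 5, 6],
                [2, 5, 7, 9, 11]])

    print(A.nullspace())     # three vectors spanning ker A
    print(A.columnspace())   # the pivot columns A1, A2, a basis for im A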


Lemma 2.39. Let u1 , . . . , uj , uj+1 , . . . , uk be vectors and assume that 1 ≤ j < k and
that u1 , . . . , uj are linearly dependent. Then the vectors u1 , . . . , uj , uj+1 , . . . , uk are
also linearly dependent.

Proof. By the assumption, there exist real numbers s1 , . . . , sj , not all zero, such that
s1 u1 + · · · + sj uj = 0. But then s1 u1 + · · · + sj uj + 0uj+1 + · · · + 0uk = 0, and at least
one of these coefficients is non-zero.

Theorem 2.40. If the vectors v 1 , . . . , v k belong to the span [u1 , . . . , uj ] and 1 ≤ j < k,
then v 1 , . . . , v k are linearly dependent.

Proof. By Lemma 2.39, it is sufficient to prove the statement for k = j + 1. We use


induction on j. If v 1 and v 2 belong to [u1 ], then v 1 = s1 u1 and v 2 = s2 u1 for some real
numbers s1 and s2 . If s2 = 0, then v 2 = 0 and hence 0v 1 + 1v 2 = 0, which shows that
v 1 , v 2 are linearly dependent in this case. If s2 6= 0, the linear dependence follows from
the equality s2 v 1 − s1 v 2 = 0. This shows the statement for j = 1.
Let p ≥ 2 and suppose that the statement holds for j = p − 1 and that v 1 , . . . , v p+1 belong to the span [u1 , . . . , up ]. Then
$$\begin{cases} a_{11}u_1 + \cdots + a_{1(p-1)}u_{p-1} + a_{1p}u_p = v_1 \\ \quad\vdots \\ a_{p1}u_1 + \cdots + a_{p(p-1)}u_{p-1} + a_{pp}u_p = v_p \\ a_{(p+1)1}u_1 + \cdots + a_{(p+1)(p-1)}u_{p-1} + a_{(p+1)p}u_p = v_{p+1}. \end{cases}$$

If a1p = · · · = app = a(p+1)p = 0, then v 1 , . . . , v p+1 belong to the span [u1 , . . . , up−1 ]
and are therefore linearly dependent by hypothesis and Lemma 2.39. Hence, renaming
the vectors if needed, we may assume that a(p+1)p 6= 0. We can then eliminate up from
all equations except the last one by adding suitable multiples of that equation to them.
Doing so, we obtain
$$\begin{cases} b_{11}u_1 + \cdots + b_{1(p-1)}u_{p-1} = v_1 + c_1v_{p+1} \\ \quad\vdots \\ b_{p1}u_1 + \cdots + b_{p(p-1)}u_{p-1} = v_p + c_pv_{p+1}. \end{cases}$$

Hence, the vectors v 1 +c1 v p+1 , . . . , v p +cp v p+1 belong to [u1 , . . . , up−1 ] and are therefore
linearly dependent by the induction hypothesis. Thus there exist scalars s1 , . . . , sp , not
all zero, such that

s1 (v 1 + c1 v p+1 ) + · · · + sp (v p + cp v p+1 ) = 0,

and hence
s1 v 1 + · · · + sp v p + (s1 c1 + · · · + sp cp )v p+1 = 0.
This shows that v 1 , . . . , v p+1 are linearly dependent.


Theorem 2.41. Let u1 , . . . , um and v 1 , . . . , v n be bases for the same linear space V .
Then m = n.

Proof. By assumption, V = [u1 , . . . , um ] and v 1 , . . . , v n are linearly independent. Hence


n ≤ m by Theorem 2.40. We also have V = [v 1 , . . . , v n ], and u1 , . . . , um are linearly
independent. Hence, by the same theorem, m ≤ n. The two inequalities together yield
m = n.

Definition 2.42. Let V be a linear space. If V = {0}, we say that the dimension of
V is zero. If V has a basis consisting of n ≥ 1 vectors, we say that the dimension of V
is n. In these two cases we say that V is finite-dimensional and denote the dimension of
V by dim V . In the remaining case where V 6= {0} and has no basis, we say that V is
infinite-dimensional.

Example 2.43. We saw in Example 2.31 that dim Rn = n. From Example 2.36 we have
dim Pn = n + 1, and by Example 2.37, P is infinite-dimensional. The dimensions of the
kernel and image in Example 2.38 are 3 and 2, respectively.

Lemma 2.44. Let u1 , . . . , uk , uk+1 be vectors in a linear space V . If u1 , . . . , uk are linearly independent and u1 , . . . , uk , uk+1 are linearly dependent, then uk+1 is a linear combination of u1 , . . . , uk .

Proof. By assumption, there exist real numbers s1 , . . . , sk , sk+1 , not all zero, such that
s1 u1 + · · · + sk uk + sk+1 uk+1 = 0. If sk+1 = 0, then s1 u1 + · · · + sk uk = 0 where at
least one of the coefficients is non-zero. Since this contradicts the linear independence of
u1 , . . . , uk , we must have sk+1 6= 0. Hence, dividing by sk+1 and moving terms, we can
express uk+1 as a linear combination of u1 , . . . , uk .

Theorem 2.45. Let U ≠ {0} be a subspace of a finite-dimensional linear space V . If u1 , . . . , uk are linearly independent vectors of U , then there exists a basis for U containing those vectors.

Proof. Set n = dim V and consider any finite set S of linearly independent vectors of
U containing the vectors u1 , . . . , uk . Then, by Theorem 2.40, S cannot contain more
than n vectors, for the vectors in S are also vectors of V and V is spanned by n vectors.
Therefore, among all such sets S, there is a set S0 = {u1 , . . . , um } with a maximum
number of vectors. If u is any other vector of U , it follows from the maximality of S0
that u1 , . . . , um , u must be linearly dependent. Hence, by Lemma 2.44, u is a linear
combination of the vectors in S0 . This shows that u1 , . . . , um form a basis for U .

Corollary 2.46. If U is a subspace of a finite-dimensional linear space V , then U is finite-dimensional and dim U ≤ dim V .

Proof. The claim is trivial when U = {0}. Otherwise, U contains a non-zero vector u1 ,
which by Theorem 2.45 can be extended to a basis for U . The inequality now follows
from Theorem 2.40.


In particular, any subspace of Rn is finite-dimensional. The dimension of a subspace of R3 must be 0, 1, 2 or 3. Hence, the only subspaces of R3 are {0}, lines and planes through the origin and R3 itself.
Example 2.47. The linear space C(R) cannot be finite-dimensional, for if it were, then
its subspace P of polynomials would also be finite-dimensional, which it is not.

Lemma 2.48. If uk+1 is a linear combination of the vectors u1 , . . . , uk , then

[u1 , . . . , uk , uk+1 ] = [u1 , . . . , uk ].

Proof. If u ∈ [u1 , . . . , uk ], then u = s1 u1 + · · · + sk uk + 0uk+1 ∈ [u1 , . . . , uk+1 ]. The


assumption means that uk+1 = t1 u1 + · · · + tk uk . Therefore, if u ∈ [u1 , . . . , uk+1 ], then
u = s1 u1 +· · ·+sk uk +sk+1uk+1 = (s1 +t1 sk+1 )u1 +· · ·+(sk +tk sk+1 )uk ∈ [u1 , . . . , uk ].

Theorem 2.49. Let V ≠ {0} be a linear space and assume that the vectors u1 , . . . , uk
span V . Then there exists a basis for V comprising the vectors in a subset of {u1 , . . . , uk }.

Proof. Among all non-empty subsets S of {u1 , . . . , uk } with the property that the vectors
in S span V there must be a set S0 with a minimum number of vectors. Suppose that the
vectors in S0 are linearly dependent. Then S0 contains more than one vector, and one
vector u ∈ S0 is a linear combination of the other vectors in S0 . Hence, by Lemma 2.48,
the vectors in S0 \{u} span V . Since this contradicts the minimality of S0 , the vectors
in S0 are linearly independent and hence form a basis for V .

Theorem 2.50. Let u1 , . . . , un be n vectors in an n-dimensional linear space V . Then these vectors span V if and only if they are linearly independent.

Proof. Suppose that u1 , . . . , un span V . If they are also linearly dependent, then, by
using Theorem 2.49, we can obtain a basis for V by removing one or more of the vectors
u1 , . . . , un . Since this contradicts the fact that all bases for an n-dimensional space
consist of n vectors, u1 , . . . , un must be linearly independent.
Now suppose that u1 , . . . , un are linearly independent. If they do not span V , then,
by using Theorem 2.45, we can obtain a basis for V with more than n vectors. We get
the same contradiction as before. Hence u1 , . . . , un span V .

Hence, in order to find out whether n vectors of an n-dimensional space form a basis for
that space, it is enough to check if they are linearly independent or to check if they span
the space.
Corollary 2.51. Let U be a subspace of V and assume that the two spaces have the
same finite dimension. Then U = V .

Proof. Assume that the common dimension is n. If n = 0, the statement is trivial.


Otherwise we can find a basis for U consisting of n vectors. These vectors are linearly
independent and belong also to V . Hence, they span V . Therefore, every vector of V is
a linear combination of vectors of U , and hence belongs to U .


Example 2.52. We want to show that the vectors e1 = (1, 2, 1), e2 = (1, 1, 2) and
e3 = (1, 4, 0) form a basis for R3 and find the coordinates of u = (3, 7, 3) with respect
to that basis. The equation s1 e1 + s2 e2 + s3 e3 = 0 is equivalent to
$$\begin{cases} s_1 + s_2 + s_3 = 0 \\ 2s_1 + s_2 + 4s_3 = 0 \\ s_1 + 2s_2 = 0 \end{cases} \;\Longleftrightarrow\; \begin{cases} s_1 + s_2 + s_3 = 0 \\ -s_2 + 2s_3 = 0 \\ s_2 - s_3 = 0 \end{cases} \;\Longleftrightarrow\; \begin{cases} s_1 + s_2 + s_3 = 0 \\ -s_2 + 2s_3 = 0 \\ s_3 = 0. \end{cases}$$

From this we see that s1 = s2 = s3 = 0. Hence the vectors are linearly independent.
Since the number of vectors is 3 and dim R3 = 3, they must form a basis for R3 . In order
to find the coordinates (x1 , x2 , x3 ) of u, we solve the equation x1 e1 + x2 e2 + x3 e3 = u,
which is equivalent to
$$\begin{cases} x_1 + x_2 + x_3 = 3 \\ 2x_1 + x_2 + 4x_3 = 7 \\ x_1 + 2x_2 = 3 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 + x_2 + x_3 = 3 \\ -x_2 + 2x_3 = 1 \\ x_2 - x_3 = 0 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 + x_2 + x_3 = 3 \\ -x_2 + 2x_3 = 1 \\ x_3 = 1 \end{cases} \;\Longleftrightarrow\; x_1 = x_2 = x_3 = 1.$$

This shows that the coordinates of u with respect to e1 , e2 , e3 are (1, 1, 1). Note that the
coordinates of u with respect to the ordinary basis ε1 , ε2 , ε3 for R3 are (3, 7, 3). When
more than one basis are involved, some care must be taken when stating the coordinates
of a vector.
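Finding coordinates with respect to a basis amounts to solving a linear system, which is a single call in NumPy once the basis vectors are placed as the columns of a matrix (a sketch; the variable names are ours):

    import numpy as np

    # The basis vectors e1, e2, e3 of Example 2.52 as columns.
    E = np.array([[1, 1, 1],
                  [2, 1, 4],
                  [1, 2, 0]], dtype=float)
    u = np.array([3, 7, 3], dtype=float)

    coords = np.linalg.solve(E, u)   # coordinates of u with respect to e1, e2, e3
    print(coords)                    # approximately [1. 1. 1.]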

Example 2.53. We solve the problem in the previous example by, instead, showing that
e1 , e2 , e3 span R3 and using the fact that dim R3 = 3. We must then show that any
vector y = (y1 , y2 , y3 ) of R3 is a linear combination of e1 , e2 , e3 . We check if this holds
by solving the equation x1 e1 + x2 e2 + x3 e3 = y:
$$\begin{cases} x_1 + x_2 + x_3 = y_1 \\ 2x_1 + x_2 + 4x_3 = y_2 \\ x_1 + 2x_2 = y_3 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 + x_2 + x_3 = y_1 \\ -x_2 + 2x_3 = y_2 - 2y_1 \\ x_2 - x_3 = y_3 - y_1 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 + x_2 + x_3 = y_1 \\ -x_2 + 2x_3 = y_2 - 2y_1 \\ x_3 = y_3 + y_2 - 3y_1 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 = 8y_1 - 2y_2 - 3y_3 \\ x_2 = -4y_1 + y_2 + 2y_3 \\ x_3 = -3y_1 + y_2 + y_3. \end{cases}$$

Indeed, the equation has a solution (x1 , x2 , x3 ) for every y. Therefore, the vectors span
R3 and thus form a basis for R3 . We can now find the coordinates of u by substituting
its components y1 = 3, y2 = 7 and y3 = 3 in the last system above. This gives the same
result as before, namely (x1 , x2 , x3 ) = (1, 1, 1).

Example 2.54. We use the corollary to prove that [u1 , u2 ] = [v 1 , v 2 ] where u1 , u2 , v 1 and v 2 are (1, 1, 1, 1), (2, 3, 1, −1), (4, 5, 3, 1) and (1, 0, 2, 4), respectively. It is plain that u1 , u2 are linearly independent and that v 1 , v 2 are linearly independent. Hence,

both spaces have dimension 2. Therefore, by the corollary, it is sufficient to show that
one of the spaces is a subspace of the other. To do so, we begin by solving the equation
x1 u1 + x2 u2 + y1 v 1 + y2 v 2 = 0.
$$\begin{cases} x_1 + 2x_2 + 4y_1 + y_2 = 0 \\ x_1 + 3x_2 + 5y_1 = 0 \\ x_1 + x_2 + 3y_1 + 2y_2 = 0 \\ x_1 - x_2 + y_1 + 4y_2 = 0 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 + 2x_2 + 4y_1 + y_2 = 0 \\ x_2 + y_1 - y_2 = 0 \\ -x_2 - y_1 + y_2 = 0 \\ -3x_2 - 3y_1 + 3y_2 = 0 \end{cases} \;\Longleftrightarrow\; \begin{cases} x_1 + 2x_2 + 4y_1 + y_2 = 0 \\ x_2 + y_1 - y_2 = 0. \end{cases}$$
From this we see that we can choose any values for y1 and y2 and then solve for x1 and x2 .
By choosing y1 = −1, y2 = 0, we see that there are numbers x1 and x2 such that
v 1 = x1 u1 + x2 u2 . Hence, v 1 is a linear combination of u1 and u2 and therefore belongs
to [u1 , u2 ]. By choosing y1 = 0, y2 = −1, we see that also v 2 is a linear combination
of u1 and u2 and therefore belongs to [u1 , u2 ]. Hence, every linear combination of
v 1 and v 2 , and therefore every vector of [v 1 , v 2 ], belongs to [u1 , u2 ]. This shows that
[v 1 , v 2 ] is a subspace of [u1 , u2 ] and therefore, by the corollary, [u1 , u2 ] = [v 1 , v 2 ].
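A quick numerical cross-check of this conclusion (using rank, which the text introduces only in Section 3.4, so this is a shortcut rather than the argument above): stacking u1 and u2 as rows gives rank 2, and adjoining v 1 and v 2 does not raise the rank, so v 1 and v 2 already lie in [u1 , u2 ].

    import numpy as np

    u1, u2 = [1, 1, 1, 1], [2, 3, 1, -1]
    v1, v2 = [4, 5, 3, 1], [1, 0, 2, 4]

    print(np.linalg.matrix_rank(np.array([u1, u2])))            # 2
    print(np.linalg.matrix_rank(np.array([u1, u2, v1, v2])))    # still 2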
Suppose that the dimension of the linear space V is n > 0 and that e1 , . . . , en form
a basis for V . If u and v are vectors of V having coordinates x = (x1 , . . . , xn ) and
y = (y1 , . . . , yn ), respectively, then u = x1 e1 + · · · + xn en and v = y1 e1 + · · · + yn en .
Hence, the coordinates of
u + v = (x1 + y1 )e1 + · · · + (xn + yn )en and su = sx1 e1 + · · · + sxn en
are
x + y = (x1 + y1 , . . . , xn + yn ) and sx = (sx1 , . . . , sxn ).
By means of a basis for V , we may therefore identify V with Rn . The two spaces behave
the same in every linear respect. We exploit this fact in the next example.
Example 2.55. Let V be the space P2 of polynomials of degree at most 2 and consider
the polynomials p1 = 1 + 2x + x2 , p2 = 1 + x + 2x2 and p3 = 1 + 4x. We intend to
show that these vectors form a basis for P2 and find the coordinates with respect to that
basis of p = 3 + 7x + 3x2 . The three polynomials π 1 = 1, π 2 = x, π 3 = x2 form a basis
for P2 . With respect to this basis, the coordinates of the vectors p1 , p2 , p3 and p are
e1 = (1, 2, 1), e2 = (1, 1, 2), e3 = (1, 4, 0) and u = (3, 7, 3), respectively. Hence, it is
enough to show that e1 , e2 , e3 form a basis for R3 and find the coordinates of u with
respect to this basis. We did this in Example 2.52 and found that the coordinates of
u with respect to e1 , e2 , e3 are (1, 1, 1). Hence, p1 , p2 , p3 form a basis for P2 and the
coordinates of p with respect to this basis are (1, 1, 1). Indeed, p = p1 + p2 + p3 .

2.3 More on Matrices


Let A be an n×n matrix and denote its columns as usual by A1 , . . . , An . By the definition
of matrix multiplication, Ax = y is equivalent to x1 A1 + · · · + xn An = y for all x and y
in Rn . We shall use this fact in the proof of the following theorem.


Theorem 2.56. The following statements are equivalent for an n × n matrix A.


(i) The columns of A are linearly independent.
(ii) The columns of A span Rn .
(iii) The columns of A form a basis for Rn .
(iv) The equation Ax = 0 has only the trivial solution x = 0.
(v) The equation Ax = y has some solution x ∈ Rn for every y ∈ Rn .
(vi) The equation Ax = y has a unique solution x ∈ Rn for every y ∈ Rn .
(vii) A is invertible.
(viii) At is invertible.
(ix) The rows of A are linearly independent.
(x) The rows of A span Rn .
(xi) The rows of A form a basis for Rn .

Proof. The first three statements are equivalent by Theorem 2.50. The equivalences
(i) ⇔ (iv), (ii) ⇔ (v) and (iii) ⇔ (vi) follow from the observation made before the theorem.
The equivalence of (vi) and (vii) is the content of Theorem 1.19. The equivalence of (vii)
and (viii) follows from Theorem 1.23. The last three statements are equivalent to (viii)
since the rows of A are the columns of At .

Corollary 2.57. Let A and B be n × n matrices. Then AB = I if and only if BA = I.

Proof. Assume that AB = I. It then follows that if Bx = 0, then x = ABx = A0 = 0.


Hence, by the equivalence of (iv) and (vii) in Theorem 2.56, B has an inverse B −1 .
Multiplying both sides of the equality AB = I from the right by B −1 , we get A = B −1
and hence BA = BB −1 = I. The converse now follows by interchanging A and B.

After this corollary it is sufficient to show one of the equalities AB = I and BA = I in order to show that a square matrix A is invertible with inverse B. In Example 1.17 we had to show both.

2.4 Direct Sums


Definition 2.58. Let U ′ and U ′′ be subspaces of a linear space V . We say that V is
the sum of U ′ and U ′′ and write V = U ′ + U ′′ if every vector u ∈ V can be written as
u = u′ + u′′ where u′ ∈ U ′ and u′′ ∈ U ′′ . If u′ and u′′ are uniquely determined by u for
every vector u ∈ V , we say that V is the direct sum of U ′ and U ′′ and write V = U ′ ⊕ U ′′ .

Definition 2.59. Assume that V = U ′ ⊕ U ′′ is the direct sum of U ′ and U ′′ and let
u ∈ V . The unique vectors u′ and u′′ such that u = u′ + u′′ are called the projections
of u on U ′ along U ′′ and of u on U ′′ along U ′ , respectively.


[Figure: the vector u decomposed as u = u′ + u′′, with u′ lying in U ′ and u′′ lying in U ′′.]

Theorem 2.60. Using the same notation as in Definition 2.59, we have

(su + tv)′ = su′ + tv′ and (su + tv)′′ = su′′ + tv ′′ .

Proof. The statement follows from the uniqueness and the fact that

(su + tv)′ + (su + tv)′′ = su + tv = s(u′ + u′′ ) + t(v ′ + v′′ )


= (su′ + tv′ ) + (su′′ + tv′′ ).

Theorem 2.61. Let U ′ and U ′′ be subspaces of a linear space V . Then V = U ′ ⊕ U ′′ if and only if V = U ′ + U ′′ and U ′ ∩ U ′′ = {0}.

Proof. Assume that V = U ′ ⊕ U ′′ . Then, by definition, V = U ′ + U ′′ . If u ∈ U ′ ∩ U ′′ , we


use the fact that u = u + 0 = 0 + u and the uniqueness to conclude that u = 0. Hence,
U ′ ∩ U ′′ = {0}.
To prove the converse, we assume that u = u′ + u′′ = v ′ + v ′′ where u′ and v ′ belong
to U ′ and u′′ and v ′′ belong to U ′′ . Then u′ − v′ = v ′′ − u′′ ∈ U ′ ∩ U ′′ = {0}, and hence
u′ − v ′ = v ′′ − u′′ = 0. This proves the uniqueness, and therefore V = U ′ ⊕ U ′′ .

Theorem 2.62. Let V be a finite-dimensional linear space and assume that V = U ′ ⊕ U ′′ where U ′ and U ′′ are subspaces of V . Then dim U ′ + dim U ′′ = dim V .

Proof. If one of the subspaces is the zero subspace, then the other subspace equals V , and
the statement is trivial. We may therefore assume that none of the subspaces is the zero
subspace. Since both subspaces are finite-dimensional, we can choose bases e1 , . . . , ek and
f 1 , . . . , f m for U ′ and U ′′ , respectively. We show that dim V = k + m = dim U ′ + dim U ′′
by showing that e1 , . . . , ek , f 1 , . . . , f m form a basis for V . If u ∈ V , we can write
u = u′ + u′′ where u′ ∈ U ′ and u′′ ∈ U ′′ . Since u′ and u′′ are linear combinations
of e1 , . . . , ek and f 1 , . . . , f m , respectively, it follows that u is a linear combination of
the vectors e1 , . . . , ek , f 1 , . . . , f m . It remains to show that these vectors are linearly
independent. Suppose that

s1 e1 + · · · + sk ek + t1 f 1 + · · · + tm f m = 0.

Then s1 e1 + · · · + sk ek = −(t1 f 1 + · · · + tm f m ) ∈ U ′ ∩ U ′′ = {0}, and hence

s1 e1 + · · · + sk ek = 0 and t1 f 1 + · · · + tm f m = 0.

Since e1 , . . . , ek are linearly independent and f 1 , . . . , f m are linearly independent, it


follows that s1 = · · · = sk = t1 = · · · = tm = 0.


Example 2.63. Let V be 3-space and consider the plane U ′ and the line U ′′ defined
by x1 + 2x2 + 3x3 = 0 and x = t(2, 1, 2), respectively. It is easily checked that the
intersection of the two subspaces is the zero space. Hence, their sum is direct, and
therefore its dimension is 2 + 1 = 3. Thus V = U ′ ⊕ U ′′ . Let u = (2, 3, 4). In order to
find u′ and u′′ , we form the line x = t(2, 1, 2) + (2, 3, 4) through u. Its intersection with
the plane is given by

2t + 2 + 2(t + 3) + 3(2t + 4) = 10t + 20 = 0 ⇔ t = −2.

Hence, u′ = −2(2, 1, 2) + (2, 3, 4) = (−2, 1, 0) and u′′ = u − u′ = (4, 2, 4).
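The decomposition of Example 2.63 can also be computed by solving one 3 × 3 linear system: write u = a p1 + b p2 + t d, where p1 , p2 span the plane U ′ and d spans the line U ′′ . A NumPy sketch (the spanning vectors p1 , p2 of the plane are our own choice):

    import numpy as np

    p1, p2 = [-2, 1, 0], [-3, 0, 1]      # span the plane x1 + 2x2 + 3x3 = 0
    d = [2, 1, 2]                        # spans the line U''
    u = np.array([2, 3, 4], dtype=float)

    M = np.array([p1, p2, d], dtype=float).T        # columns p1, p2, d
    a, b, t = np.linalg.solve(M, u)                 # u = a*p1 + b*p2 + t*d
    u_plane = a * np.array(p1) + b * np.array(p2)   # projection on U' along U''
    u_line = t * np.array(d)                        # projection on U'' along U'
    print(u_plane, u_line)               # approximately [-2. 1. 0.] and [4. 2. 4.]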

2.5 The Rank-Nullity Theorem


Definition 2.64. Let U and V be linear spaces. A linear transformation F from U to V
is a function F : U → V such that

F (su + tv) = sF (u) + tF (v)

for all u and v in U and all real numbers s and t. If U = V , we also say that F is a
linear transformation on U .

The single condition in the definition can be replaced with the two conditions

F (u + v) = F (u) + F (v) and F (su) = sF (u).

For if the condition in the definition holds, then

F (u + v) = F (1u + 1v) = 1F (u) + 1F (v) = F (u) + F (v)

and
F (su) = F (su + 0u) = sF (u) + 0F (u) = sF (u).
Conversely, if the two conditions hold, then

F (su + tv) = F (su) + F (tv) = sF (u) + tF (v).

Also note that F (0) = F (00) = 0F (0) = 0.

Definition 2.65. Let F be a linear transformation from U to V . We define the kernel and image of F by
ker F = {u ∈ U ; F (u) = 0} and im F = {v ∈ V ; v = F (u) for some u ∈ U }.

As in the proof of Theorem 2.18, one sees that ker F and im F are subspaces of U and V ,
respectively. We also note that if A is an m × n matrix, then F (x) = Ax defines a linear
transformation F from Rn to Rm whose kernel and image agree with the kernel and
image of A.


Example 2.66. Let U = C^n(R) be the space of n times continuously differentiable functions on R and let V = C(R). If u ∈ U , define F (u) by
$$F(u)(t) = u^{(n)}(t) + a_1u^{(n-1)}(t) + \cdots + a_{n-1}u'(t) + a_nu(t), \qquad t \in \mathbb{R}.$$
Then F is a linear transformation from U to V . The kernel of F is the set of solutions of the homogeneous linear differential equation
$$u^{(n)}(t) + a_1u^{(n-1)}(t) + \cdots + a_{n-1}u'(t) + a_nu(t) = 0, \qquad t \in \mathbb{R}.$$

When n = 2 and the roots λ1 and λ2 of the characteristic equation are real and unequal,
ker F is spanned by e1 and e2 defined by e1 (t) = e^{λ1 t} and e2 (t) = e^{λ2 t}. The vectors e1 and e2 are linearly independent, for if
$$s_1e^{\lambda_1 t} + s_2e^{\lambda_2 t} = 0, \qquad t \in \mathbb{R},$$
then, by taking the derivative,
$$\lambda_1 s_1e^{\lambda_1 t} + \lambda_2 s_2e^{\lambda_2 t} = 0, \qquad t \in \mathbb{R}.$$
By inserting t = 0, we get
$$\begin{cases} s_1 + s_2 = 0 \\ \lambda_1 s_1 + \lambda_2 s_2 = 0. \end{cases}$$
Since λ1 ≠ λ2 , this is possible only if s1 = s2 = 0. Thus we have shown that ker F is two-dimensional with basis e1 , e2 .

Theorem 2.67 (Rank-nullity theorem). Let F be a linear transformation from U to V where U and V are linear spaces and dim U = n. Then im F is finite-dimensional and
$$\dim \ker F + \dim \operatorname{im} F = n.$$

Proof. If ker F = U , then im F = {0}, and the statement holds trivially. Otherwise,
ker F ≠ U , and hence U ≠ {0}. We can therefore choose a possibly empty set of basis
vectors {e1 , . . . , ek } for ker F and extend it to a basis e1 , . . . , ek , ek+1 , . . . , en for U . Set
f i = F (ei ) for i = 1, . . . , n. Then f i ∈ im F for i = 1, . . . , n and f i = 0 for i = 1, . . . , k.
We show that dim im F = n − k = n − dim ker F by showing that f k+1 , . . . , f n form a
basis for im F .
First, we show that they are linearly independent. Suppose that

sk+1 f k+1 + · · · + sn f n = 0.

Then, by the definition of the f i ,

F (sk+1 ek+1 + · · · + sn en ) = sk+1 F (ek+1 ) + · · · + sn F (en ) = 0.

Hence, sk+1 ek+1 + · · · + sn en ∈ ker F . If ker F = {0}, this implies that

sk+1 ek+1 + · · · + sn en = 0,


and consequently sk+1 = · · · = sn = 0 since the ei are linearly independent. If


ker F ≠ {0}, then sk+1 ek+1 + · · · + sn en is a linear combination of e1 , . . . , ek . From
this we get sk+1 = · · · = sn = 0 by once again using the fact that the ei are linearly
independent. Hence, f k+1 , . . . , f n are linearly independent.
We complete the proof by showing that f k+1 , . . . , f n span im F . Let v be any vector
of im F . Then v = F (u) for some u ∈ U . Since e1 , . . . , en form a basis for U , we can
write u = x1 e1 + · · · + xk ek + xk+1 ek+1 + · · · + xn en . Hence,

v = F (x1 e1 + · · · + xk ek + xk+1 ek+1 + · · · + xn en )


= x1 f 1 + · · · + xk f k + xk+1 f k+1 + · · · + xn f n
= xk+1 f k+1 + · · · + xn f n

is a linear combination of f k+1 , . . . , f n .

Example 2.68. We use the theorem to find the dimensions of ker A and im A where

    A = [ 1 −1  1 −1   1 ]
        [ 2  3  4  2   5 ]
        [ 3  7  7  5   9 ]
        [ 4  6  8  4  10 ] .

We begin by solving the equation Ax = 0:

    Ax = 0  ⇔  [ 1 −1  1 −1   1 | 0 ]     [ 1 −1  1 −1  1 | 0 ]
               [ 2  3  4  2   5 | 0 ]  ⇔  [ 0  5  2  4  3 | 0 ]
               [ 3  7  7  5   9 | 0 ]     [ 0 10  4  8  6 | 0 ]
               [ 4  6  8  4  10 | 0 ]     [ 0 10  4  8  6 | 0 ]

            ⇔  [ 1 −1  1 −1  1 | 0 ]
               [ 0  5  2  4  3 | 0 ]

            ⇔  x = r(−7, −2, 5, 0, 0) + s(1, −4, 0, 5, 0) + t(−8, −3, 0, 0, 5).
Hence, dim ker A = 3, and by the rank-nullity theorem, dim im A = 5 − 3 = 2. In fact,


in order to find the dimension of ker A, there is no need to find the solution x. It
is enough to establish that there are three free variables in the last system. Knowing
that dim im A = 2, we can easily find a basis for im A. We can choose any two non-
proportional columns of A, for example the first two columns.
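
As a numerical cross-check (not part of the text), the dimensions just found can be confirmed with NumPy's rank routine; dim ker A then follows from the rank-nullity theorem.

```python
import numpy as np

A = np.array([[1, -1, 1, -1,  1],
              [2,  3, 4,  2,  5],
              [3,  7, 7,  5,  9],
              [4,  6, 8,  4, 10]], dtype=float)

rank = np.linalg.matrix_rank(A)           # dim im A
print("dim im A  =", rank)                # 2
print("dim ker A =", A.shape[1] - rank)   # 5 - 2 = 3
```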

Theorem 2.69. Let G : U → V and F : V → W be linear transformations. Then their


composition F ◦ G : U → W is also a linear transformation.

Proof. Let u and v be vectors of U and s and t real numbers. Then

(F ◦ G)(su + tv) = F (G(su + tv)) = F (sG(u) + tG(v)) = sF (G(u)) + tF (G(v))


= s(F ◦ G)(u) + t(F ◦ G)(v).

We shall usually omit the circle and write F ◦ G as F G.


Definition 2.70. A linear transformation F : U → V is said to be invertible if it is onto


and one-to-one.

Theorem 2.71. Let F : U → V be an invertible linear transformation. Then its inverse


function F −1 : V → U is a linear transformation.

Proof. Let u and v be vectors of V and s and t real numbers. Set u′ = F −1 (u) and
v ′ = F −1 (v). The linearity of F then yields su + tv = sF (u′ ) + tF (v ′ ) = F (su′ + tv ′ ),
and hence
sF −1 (u) + tF −1 (v) = su′ + tv′ = F −1 (su + tv).

Lemma 2.72. Let F : U → V be a linear transformation. Then F is one-to-one if and


only if ker F = {0}.

Proof. If F is one-to-one, then the only solution of the equation F (u) = 0 is u = 0,


and hence ker F = {0}. Conversely, suppose that ker F = {0}. If F (u) = F (v), then
F (u − v) = 0. Therefore u − v ∈ ker F = {0}, whence u = v. This means that F is
one-to-one.

Theorem 2.73. Let F : U → V be a linear transformation where U and V are finite-


dimensional linear spaces of the same dimension. Then F is one-to-one if and only if F
is onto.

Proof. Let n be the common dimension of U and V . Then the rank-nullity theorem and
Corollary 2.51 yield

F is one-to-one ⇔ ker F = {0} ⇔ dim ker F = 0


⇔ dim im F = n ⇔ im F = V ⇔ F is onto.

Exercises
2.1. Which of the following sets are subspaces of R2 ? Justify your answers.
(a) {x ∈ R2 ; x1 = 2x2 }, (b) {x ∈ R2 ; (x1 , x2 ) = t(1, 2) + (1, 1), t ∈ R},
(c) {x ∈ R2 ; x1 = x22 }, (d) {x ∈ R2 ; (x1 , x2 ) = t(1, 2), t ∈ R}.
2.2. Show that the set of symmetric n × n matrices is a subspace of Mn×n .
2.3. Express the plane through the origin and the two points (1, 1, 0) and (2, 0, 1) as
the image of a matrix and as the kernel of a matrix.
2.4. Which of the following sets of vectors are linearly dependent?
(a) (1, 2, 3), (2, 3, 3), (2, 5, 7) in R3 ,
(b) (1, 2, 3, 1), (2, 3, 2, 3), (1, 1, −1, 2) in R4 ,
(c) (1, 2, 3, 1, 2), (2, 3, 2, 3, 1), (1, 1, −1, 2, 3) in R5 .


2.5. Consider the following vectors in R4 .

u1 = (1, 1, 1, 2), u2 = (1, 2, 3, 4), u3 = (2, 1, 2, 3), u4 = (5, 1, 3, 5).

Is u1 a linear combination of u2 , u3 and u4 ? Are u1 , u2 , u3 and u4 linearly


dependent?
2.6. Show that the vectors u1 , u2 and u3 in C(R) defined by

u1 (t) = sin t, u2 (t) = sin 2t, u3 (t) = sin 3t

are linearly independent.


2.7. Find bases for ker A and im A where
   
1 1 3 3 1 3 4 1
(a) A = 1 −1 1 −1, (b) A = 1 1 2 −1.
2 1 5 4 2 5 7 2
2.8. Find a basis for the span of the vectors

(1, 3, 2, 1), (1, 2, 1, 1), (1, 1, 0, 1), (1, 2, 2, 1), (3, 4, 1, 3).

2.9. Show that the vectors

(1, 0, 1, 0), (1, 1, 1, 1), (2, 1, −1, 2), (1, −2, −2, 1)

form a basis for R4 . What are the coordinates of (2, 2, −1, 1) with respect to this
basis?
2.10. Find the dimensions of the spans of the following vectors in R4 .
(a) (2, 1, 0, 1), (1, 0, 1, 2), (1, 1, −1, −1),
(b) (1, 2, 3, 1), (1, 1, 1, 2), (1, 1, −1, 1).
2.11. Show that u1 = (1, 1, 1, −1), u2 = (1, 2, 3, 4) span the same subspace of R4 as
v 1 = (−1, 1, 3, 11), v 2 = (3, 1, −1, −13).
2.12. Let u1 , . . . , un be a basis for a linear space. What is the dimension of the subspace
[u1 − u2 , u2 − u3 , . . . , un−1 − un , un − u1 ]?
2.13. Let A and B be n × n matrices such that A2 − AB = I. Show that A2 − BA = I.
2.14. For which of the subspaces U and V of R4 is the sum U + V direct?
(a) U = {x ∈ R4 ; x1 + x2 + x3 + x4 = 0}, V = {x ∈ R4 ; x1 − x2 + x3 − x4 = 0}.
(b) U = {(t, 0, −t, t) ∈ R4 ; t ∈ R}, V = {x ∈ R4 ; x1 + 2x2 + 3x3 + 4x4 = 0}.
2.15. Show that R3 = U ⊕ V where

U = {x ∈ R3 ; x1 + x2 + 3x3 = 0}, V = {t(1, 1, 2) ∈ R3 ; t ∈ R}.

Find the projections of u = (4, 5, 5) on U along V and on V along U .


2.16. Which of the following functions F : R3 → R2 are linear transformations?


(a) F (x1 , x2 , x3 ) = (x1 + 2x2 + 1, x2 − 2x3 − 1),
(b) F (x1 , x2 , x3 ) = (x1 + x2 , x2 − x3 ),
(c) F (x1 , x2 , x3 ) = (x1 x2 , x2 x3 ).
Justify your answers.
2.17. Find the dimensions of ker A and im A where
 
1 1 1 2 1
1 2 1 3 2
A= 1 1 2 1
.
0
1 2 2 2 1

2.18. Consider the linear transformation F from R2 to R3 defined by

F (x1 , x2 ) = (x1 + x2 , 2x1 + x2 , 3x1 + x2 ).

Is F one-to-one? Is it onto?
2.19. Show that the linear transformation F on R3 defined by

F (x1 , x2 , x3 ) = (x1 + 2x2 + 2x3 , 2x1 + 2x2 + x3 , 3x1 + x2 − x3 )

is invertible. Find the inverse transformation.


2.20. Let P be the space of polynomials over R and let F be the linear transformation
on P defined by F (p) = p′ where p′ denotes the derivative of the polynomial p.
Is F one-to-one? Is F onto? What are the kernel and image of F ?

3 Inner Product Spaces
3.1 Definition
Definition 3.1. Let V be a linear space, and let there be defined a function V × V → R
whose value at (u, v) is denoted by hu, vi. We call the function an inner product on V
if the following conditions are satisfied.
(i) hsu + tv, wi = shu, wi + thv, wi for all u, v and w in V and all s and t in R.
(ii) hu, vi = hv, ui for all u and v in V .
(iii) hu, ui ≥ 0 for all u ∈ V with equality only if u = 0.
A linear space, furnished with an inner product, is called an inner product space. We
call kuk = √hu, ui the norm or length of a vector u in an inner product space.

When we talk about a subspace of an inner product space V , we shall assume that it is
equipped with the same inner product as V .
Note that Axiom (i) means that u 7→ hu, wi is a linear transformation from V to R
for every fixed w ∈ V . Hence, Axiom (i) is equivalent to hu + v, wi = hu, wi + hv, wi
and hsu, wi = shu, wi for all u, v, w in V and all s ∈ R.
Theorem 3.2.
(i) hw, su + tvi = shw, ui + thw, vi for all u, v and w in V and all s and t in R.
(ii) h0, ui = hu, 0i = 0 for all u ∈ V .
(iii) ksuk = |s| kuk for all u ∈ V and s ∈ R.

Proof. (i) We have

hw, su + tvi = hsu + tv, wi = shu, wi + thv, wi = shw, ui + thw, vi

by Axioms (i) and (ii).


(ii) By the same two axioms,

hu, 0i = h0, ui = h0u, ui = 0hu, ui = 0.


(iii) ksuk = √hsu, sui = √(s2 hu, ui) = |s| kuk.

Hence, also u 7→ hw, ui is a linear transformation from V to R for every w ∈ V .


Definition 3.3. The dot product on Rn is defined by

(x1 , x2 , . . . , xn ) · (y1 , y2 , . . . , yn ) = x1 y1 + x2 y2 + · · · + xn yn .

We leave the simple proof of the following theorem to the reader.



Theorem 3.4. hx, yi = x · y is an inner product on Rn . The corresponding norm is


    k(x1 , . . . , xn )k = √(x1² + · · · + xn²).

Example 3.5. Also hx, yi = x1 y1 + 2x2 y2 + 3x3 y3 + · · · + nxn yn is an inner product


on Rn .

When we mention the inner product space Rn without further specification, it is always
assumed that the inner product is the dot product.
Example 3.6. Let V = C[0, 1]. Then hu, vi = ∫₀¹ u(x)v(x) dx is an inner product on V .
Axioms (i) and (ii) are easily verified. It is also clear that hu, ui = ∫₀¹ (u(x))² dx ≥ 0
for all u ∈ V . Suppose that u ≠ 0. Then u(a) ≠ 0 for some a ∈ [0, 1]. Hence
(u(a))² = b > 0. Since u is continuous, u² is continuous. Hence, there exists a real
number δ > 0 such that (u(x))² > b/2 when |x − a| < δ and x ∈ [0, 1]. We may clearly
assume that δ < 1/2. At least one of the intervals [a, a + δ] and [a − δ, a] is contained in
the interval [0, 1]. Therefore ∫₀¹ (u(x))² dx ≥ δb/2 > 0. This shows that also Axiom (iii)
holds.

Example 3.7. Let V be the linear space of Riemann integrable functions in [0, 1]. Then
hu, vi = ∫₀¹ u(x)v(x) dx is not an inner product. Let, for example, u be the function
defined by u(0) = 1 and u(x) = 0 for x ∈ (0, 1]. Then hu, ui = 0 despite the fact that
u is not the zero vector.

Example 3.8. Let l2 be the set of all infinite sequences x = (xn )∞n=1 of real numbers
for which ∑_{n=1}^{∞} xn² is convergent. Let x = (xn )∞n=1 and y = (yn )∞n=1 be two elements of
l2 and s a real number. We define x + y = (xn + yn )∞n=1 and sx = (sxn )∞n=1 . Clearly
∑_{n=1}^{∞} (sxn )² is convergent, which shows that sx ∈ l2 . Since

    (xn + yn )² ≤ (xn + yn )² + (xn − yn )² = 2xn² + 2yn²,

it follows from the comparison theorem for positive series that ∑_{n=1}^{∞} (xn + yn )² is con-
vergent. Hence, the two operations turn l2 into a linear space. Since |xn |² = xn², we have
(|xn |)∞n=1 ∈ l2 . Hence, using the fact that

    |xn yn | = ((|xn | + |yn |)² − (|xn | − |yn |)²)/4,

it follows that ∑_{n=1}^{∞} xn yn is absolutely convergent, and hence convergent. Therefore, we
can make l2 an inner product space by defining hx, yi = ∑_{n=1}^{∞} xn yn .

Definition 3.9. We say that two vectors u and v in an inner product space are ortho-
gonal if hu, vi = 0. The vector e is called a unit vector if kek = 1.

If u ≠ 0, we can form the vector e = (1/kuk)u. Then kek = (1/kuk)kuk = 1. Hence e is a
unit vector. We say that we normalise u to e.


Theorem 3.10 (Pythagorean theorem). If u and v are orthogonal vectors in an


inner product space, then ku + vk2 = kuk2 + kvk2 .

Proof. Since hu, vi = 0, we have

ku + vk2 = hu + v, u + vi = hu, ui + 2hu, vi + hv, vi = kuk2 + kvk2 .

Theorem 3.11 (Cauchy–Schwarz inequality). If u and v are vectors in an inner


product space, then |hu, vi| ≤ kukkvk.

Proof. If v = 0, then both sides are zero and the inequality holds. Assume that v ≠ 0.
Then we have

0 ≤ ku − tvk2 = hu − tv, u − tvi = kuk2 − 2thu, vi + t2 kvk2


 
      = kvk2 (t − hu, vi/kvk2 )2 + kuk2 − hu, vi2 /kvk2

for any real number t. For t = hu, vi/kvk2 , this yields

    kuk2 − hu, vi2 /kvk2 ≥ 0.

Consequently, hu, vi2 ≤ kuk2 kvk2 , and hence |hu, vi| ≤ kukkvk.

In ordinary 3-space one starts out with the notions of length and angle and defines the
inner product of two non-zero vectors u and v by hu, vi = kukkvk cos θ. In this general
setting we started out with an inner product and defined the length, or norm as we
also call it. When u and v are non-zero vectors, the Cauchy–Schwarz inequality can be
written as
hu, vi
−1 ≤ ≤ 1.
kukkvk
This enables us to complete the situation by also defining angles.
Definition 3.12. Let u and v be non-zero vectors of an inner product space. The angle
between u and v is the unique real number θ for which
hu, vi
cos θ = , 0 ≤ θ ≤ π.
kukkvk
The inner product on ordinary 3-space is an inner product in the sense of this chapter.
By defining angles as we do here, we see that we get our old angles back in 3-space.
Example 3.13. The lengths of the vectors u = (4, 3, −1, −1) and v = (1, 1, −1, −1) in
R4 are
    kuk = √(4² + 3² + (−1)² + (−1)²) = √27 = 3√3,
    kvk = √(1² + 1² + (−1)² + (−1)²) = √4 = 2.


Their inner product is

hu, vi = 4 · 1 + 3 · 1 + (−1)(−1) + (−1)(−1) = 9.

Hence, the angle θ between the two vectors is given by



    cos θ = hu, vi/(kukkvk) = 9/((3√3) · 2) = √3/2,

and is π/6.

Example 3.14. Consider the functions u and v in C[0, 1] defined by u(x) = 1 and
v(x) = 6x − 2. With the inner product hu, vi = ∫₀¹ u(x)v(x) dx we have
    kuk2 = ∫₀¹ (u(x))² dx = ∫₀¹ 1 dx = 1,
    kvk2 = ∫₀¹ (v(x))² dx = ∫₀¹ (36x² − 24x + 4) dx = 4,
    hu, vi = ∫₀¹ u(x)v(x) dx = ∫₀¹ (6x − 2) dx = 1.

Consequently, the angle θ between u and v is given by


    cos θ = 1/(1 · 2) = 1/2,

and is therefore π/3.

You cannot find the angle in the example in a figure depicting the graphs of the two
functions. Instead, it is small if the functions are close to being directly proportional and
large if they are close to being inversely proportional.
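
The two angle computations above are easy to reproduce numerically. The sketch below is only an illustration: it uses Python with NumPy, which the text does not assume, and the integral inner product on C[0, 1] is approximated by an average over a fine grid.

```python
import numpy as np

def angle(inner, u, v):
    # cos(theta) = <u, v> / (||u|| ||v||), as in Definition 3.12
    return np.arccos(inner(u, v) / np.sqrt(inner(u, u) * inner(v, v)))

# Example 3.13: the dot product on R^4.
dot = lambda x, y: float(np.dot(x, y))
print(angle(dot, np.array([4, 3, -1, -1]), np.array([1, 1, -1, -1])), np.pi / 6)

# Example 3.14: the integral inner product on C[0, 1], approximated on a grid.
x = np.linspace(0.0, 1.0, 200001)
integral_ip = lambda f, g: float(np.mean(f(x) * g(x)))   # ~ integral of f*g over [0, 1]
u = lambda t: np.ones_like(t)
v = lambda t: 6 * t - 2
print(angle(integral_ip, u, v), np.pi / 3)
```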
Theorem 3.15 (Triangle inequality). Let u and v be vectors in an inner product
space. Then ku + vk ≤ kuk + kvk.

Proof. The Cauchy–Schwarz inequality gives that

ku + vk2 = hu + v, u + vi = kuk2 + kvk2 + 2hu, vi ≤ kuk2 + kvk2 + 2kukkvk


= (kuk + kvk)2 .

The inequality now follows from the fact that both ku + vk and kuk + kvk are non-
negative.

3.2 Orthonormal Bases


Theorem 3.16. If the vectors u1 , . . . , uk are pairwise orthogonal and non-zero, then
they are linearly independent.


Proof. If s1 u1 + · · · + sk uk = 0, then
0 = h0, ui i = hs1 u1 + · · · + si ui + · · · + sk uk , ui i
= s1 hu1 , ui i + · · · + si hui , ui i + · · · + sk huk , ui i = si hui , ui i.
Since ui ≠ 0, we have hui , ui i = kui k2 ≠ 0, and hence si = 0.
Definition 3.17. Let V be an inner product space. We say that the vectors e1 , . . . , ek
in V form an orthonormal set if
    hei , ej i = 1 when i = j,   and   hei , ej i = 0 when i ≠ j.
If the vectors also form a basis for V , we say that they form an orthonormal basis for V .
By Theorem 3.16, the vectors of an orthonormal set form a basis for V if and only if they
span V .
Theorem 3.18. Let V be an inner product space with orthonormal basis e1 , . . . , en and
assume that the coordinates of u and v with respect to this basis are x = (x1 , . . . , xn )
and y = (y1 , . . . , yn ), respectively. Then hu, vi = x · y and kuk2 = kxk2 where the last
norm is the ordinary norm in Rn ,
Proof. We have

    hu, vi = hx1 e1 + · · · + xn en , y1 e1 + · · · + yn en i = ∑_{i=1}^{n} ∑_{j=1}^{n} xi yj hei , ej i = ∑_{i=1}^{n} xi yi = x · y,

and from this it follows that kuk2 = hu, ui = x · x = kxk2 .


On page 23, we remarked that a non-zero n-dimensional linear space V can be identified
with the linear space Rn by means of a basis for V . This theorem allows us to identify
a non-zero n-dimensional inner product space V with the inner product space Rn by
means of an orthonormal basis for V . As will be seen in Corollary 3.22, every non-zero
finite-dimensional inner product space has an orthonormal basis.
Theorem 3.19. If e1 , . . . , en form an orthonormal basis for an inner product space V ,
then the coordinates of a vector u are given by xi = hu, ei i, i = 1, . . . , n.
Proof. If u = x1 e1 + · · · + xn en , then
hu, ei i = hx1 e1 + · · · + xn en , ei i = x1 he1 , ei i + · · · + xi hei , ei i + · · · + xn hen , ei i = xi .
Theorem 3.20. Let u1 , . . . , uk be pairwise orthogonal non-zero vectors and let v be a
vector in an inner product space. Then there exist unique numbers s1 , . . . , sk such that
u = s 1 u1 + · · · + s k uk + v
is orthogonal to the vectors u1 , . . . , uk . The si are given by
    si = −hv, ui i/kui k2 .


Proof. The vector u = s1 u1 + · · · + sk uk + v is orthogonal to ui if and only if

0 = hu, ui i = si hui , ui i + hv, ui i

and this is in turn equivalent to


    si = −hv, ui i/hui , ui i = −hv, ui i/kui k2 .

Assume that the vectors v 1 , . . . , v n form a basis for an inner product space V . These
vectors can then be used to construct an orthonormal basis for V . To do so, we first
construct a basis for V consisting of pairwise orthogonal vectors u1 , . . . , un . First, set

u1 = v 1 .

If n = 1, we are done. Otherwise, by using Theorem 3.20, we can find a number s12 such
that
u2 = s12 u1 + v 2
is orthogonal to u1 . Then u2 must be non-zero, for if u2 = 0, then v 1 and v 2 would be
linearly dependent. If n > 2, we find numbers s13 and s23 such that

u3 = s13 u1 + s23 u2 + v 3

is orthogonal to u1 and u2 . Since v 1 , v 2 , v 3 are linearly independent, u3 ≠ 0. When


k − 1 vectors ui are constructed, and k ≤ n, we form the next vector uk by requiring
that
uk = s1k u1 + s2k u2 + · · · + sk−1k uk−1 + v k
be orthogonal to u1 , . . . , uk−1 . Owing to the linear independence of the v i , uk must be
non-zero. In this way we get n pairwise orthogonal non-zero vectors u1 , . . . , un . Next,
we normalise these vectors by setting
    ei = (1/kui k) ui , i = 1, . . . , n.
By Theorem 3.16, the ei are linearly independent, and since the v i are linear combinations
of the ei and span V , the vectors ei also span V . Hence, e1 , . . . , en form an orthonormal
basis for V . We could instead have argued that the ei are n linearly independent vectors
in the n-dimensional space V . This algorithm for finding an orthonormal basis is called
the Gram–Schmidt orthogonalisation process and can be summarised as follows.
Theorem 3.21. If v 1 , . . . , v n form a basis for an inner product space V , and

u1 = v 1
u2 = s12 u1 + v 2
u3 = s13 u1 + s23 u2 + v 3
..
.
un = s1n u1 + s2n u2 + · · · + sn−1n un−1 + v n


where
    sik = −hui , v k i/kui k2 , 1 ≤ i < k ≤ n,

then the vectors

    ei = (1/kui k) ui , i = 1, . . . , n,
form an orthonormal basis for V .
[Figure: one step of the Gram–Schmidt process, showing u2 = s12 u1 + v 2 orthogonal to u1 = v 1 .]

Corollary 3.22. Every non-zero finite-dimensional inner product space has an ortho-
normal basis.

Example 3.23. We demonstrate the process by finding an orthonormal basis for the
subspace of R4 spanned by v 1 = (1, 1, 1, 1), v 2 = (1, 2, 2, 1), v 3 = (2, 3, 1, 6). We set
u1 = v 1 . Then we determine the number r so that

u2 = ru1 + v 2

is orthogonal to u1 . By multiplying by u1 , we get

0 = hu1 , u2 i = rhu1 , u1 i + hu1 , v 2 i


= (1 · 1 + 1 · 1 + 1 · 1 + 1 · 1)r + 1 · 1 + 1 · 2 + 1 · 2 + 1 · 1 = 4r + 6.

Hence, r = −3/2, and we get

    u2 = ru1 + v 2 = −(3/2)(1, 1, 1, 1) + (1, 2, 2, 1) = (1/2)(−1, 1, 1, −1).

To avoid fractional numbers, we can replace u2 with 2u2 . This works, because also
u1 and 2u2 are orthogonal, and 2u2 is a linear combination of v 1 and v 2 . Hence, we set
u2 = (−1, 1, 1, −1). Next, we set out to find numbers s and t that make

u3 = su1 + tu2 + v3

orthogonal to u1 and u2 . Multiplying by u1 , we get

0 = hu1 , u3 i = shu1 , u1 i + thu1 , u2 i + hu1 , v 3 i


= (1 · 1 + 1 · 1 + 1 · 1 + 1 · 1)s + 1 · 2 + 1 · 3 + 1 · 1 + 1 · 6 = 4s + 12,

since hu1 , u2 i = 0. We get s = −3. Next, we multiply by u2 and get

0 = hu2 , u3 i = shu2 , u1 i + thu2 , u2 i + hu2 , v 3 i


= (−1 · (−1) + 1 · 1 + 1 · 1 − 1 · (−1))t − 1 · 2 + 1 · 3 + 1 · 1 − 1 · 6 = 4t − 4.


Hence, t = 1, and we get

u3 = −3(1, 1, 1, 1) + (−1, 1, 1, −1) + (2, 3, 1, 6) = (−2, 1, −1, 2).



Now it only remains to normalise the vectors. Since ku1 k = 2, ku2 k = 2 and ku3 k = √10,
we obtain the orthonormal basis

    e1 = (1/2)(1, 1, 1, 1), e2 = (1/2)(−1, 1, 1, −1), e3 = (1/√10)(−2, 1, −1, 2).
Instead of deriving this result from scratch, we could of course have used the formulae
of Theorem 3.21. Methods are, however, easier to remember than formulae.

The Gram–Schmidt process works also if the vi merely span V . As long as the ui
constructed so far are non-zero, they are linearly independent and span the same subspace
as the corresponding vectors vi . If ui = 0, then either i = 1 and v 1 = 0 or v i is a linear
combination of v 1 , . . . , v i−1 . Hence, ui and vi can be discarded. One can then repeat
the last step using the next vector and proceed from there.
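
A compact implementation of the process is given below. It is only a sketch in Python/NumPy (not part of the text); as discussed above, vectors that turn out to be (numerically) zero are simply discarded, so the input only needs to span the subspace.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        u = np.array(v, dtype=float)
        for e in basis:
            u -= np.dot(u, e) * e          # remove the component along e
        norm = np.linalg.norm(u)
        if norm > tol:                     # keep u only if it is not (numerically) zero
            basis.append(u / norm)
    return basis

# The vectors of Example 3.23; the output matches e1, e2, e3 found there.
for e in gram_schmidt([(1, 1, 1, 1), (1, 2, 2, 1), (2, 3, 1, 6)]):
    print(e)
```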

Definition 3.24. We say that an n × n matrix is orthogonal if its columns form an


orthonormal set in Rn .

The orthogonality means that At A = I. Since At A = I if and only if AAt = I, we get


the following theorem.

Theorem 3.25. Let A be a square matrix. Then the following statements are equivalent.
(i) A is orthogonal.
(ii) At A = I.
(iii) AAt = I.
(iv) The rows of A form an orthonormal set.

3.3 Orthogonal Complement


Definition 3.26. Let S be a subset of an inner product space V . The orthogonal com-
plement of S is the set S ⊥ = {v ∈ V ; hu, vi = 0 for all u ∈ S}.

Theorem 3.27. If S is a subset of an inner product space V , then S ⊥ is a subspace


of V .

Proof. Clearly, 0 ∈ S ⊥ , and hence S ⊥ ≠ ∅. Suppose that v and w belong to S ⊥ . Then


hu, vi = hu, wi = 0 for all u ∈ S. Therefore, hu, sv + twi = shu, vi + thu, wi = 0 for all
u ∈ S, which means that sv + tw ∈ S ⊥ .

Lemma 3.28. Let u1 , . . . , uk be vectors of an inner product space. Then

[u1 , . . . , uk ]⊥ = {u1 , . . . , uk }⊥ .


Proof. Since {u1 , . . . , uk } ⊆ [u1 , . . . , uk ], the inclusion [u1 , . . . , uk ]⊥ ⊆ {u1 , . . . , uk }⊥ is


clear. Assume that v ∈ {u1 , . . . , uk }⊥ . If u ∈ [u1 , . . . , uk ], then u = s1 u1 + · · · + sk uk ,
and hence hu, vi = s1 hu1 , vi + · · · + sk huk , vi = 0. This shows that v ∈ [u1 , . . . , uk ]⊥ ,
and therefore also the reverse inclusion holds.

Theorem 3.29. Let U be a finite-dimensional subspace of an inner product space V .


Then V = U ⊕ U ⊥ .

Proof. If U = {0}, then U ⊥ = V , and a vector u ∈ V can be written as u = 0 + u.


Assume that U 6= {0}. Then, by Corollary 3.22, there exists an orthonormal basis
e1 , . . . , en for U . We define u′ and u′′ by

u′ = hu, e1 ie1 + · · · + hu, en ien , u′′ = u − u′ .

Then u′ ∈ U and u = u′ + u′′ . By using Lemma 3.28 and the fact that

hei , u′′ i = hei , ui − hei , hu, e1 ie1 + · · · + hu, en ien i = hei , ui − hei , ui = 0, i = 1, . . . , n,

we also see that u′′ ∈ U ⊥ . If u ∈ U ∩ U ⊥ , then u is orthogonal to itself, and hence


kuk2 = hu, ui = 0. This shows that U ∩ U ⊥ = {0}. The conclusion of this theorem now
follows from Theorem 2.61.

Theorem 3.30. Let U be a subspace of an inner product space V . If V = U ⊕ U ⊥ , then


(U ⊥ )⊥ = U .

Proof. Let u ∈ U . Then, by the definition of U ⊥ , hu, vi = 0 for every v ∈ U ⊥ . Hence


u ∈ (U ⊥ )⊥ . This shows that U ⊆ (U ⊥ )⊥ . Assume that u ∈ (U ⊥ )⊥ . By the assumption
that V = U ⊕ U ⊥ , we can write u = u′ + u′′ where u′ ∈ U and u′′ ∈ U ⊥ . The vector u′′
is orthogonal to both u and u′ . It follows that hu′′ , u′′ i = hu′′ , u − u′ i = 0, which shows
that u′′ = 0. Hence u = u′ ∈ U . This shows the reverse inclusion (U ⊥ )⊥ ⊆ U . Hence
(U ⊥ )⊥ = U .

Definition 3.31. Assume that V = U ⊕ U ⊥ and let u ∈ V . The unique vectors u′ ∈ U


and u′′ ∈ U ⊥ for which u = u′ + u′′ are called the orthogonal projections of u on U
and U ⊥ , respectively.

[Figure: the decomposition u = u′ + u′′ with u′ ∈ U and u′′ ∈ U ⊥ .]

Note that the conclusion V = U ⊕ U ⊥ of Theorem 3.29 need not be true if we drop
the assumption that U be finite-dimensional, and if V ≠ U ⊕ U ⊥ , the conclusion of


Theorem 3.30 need not be true. Consider the space l2 defined in Example 3.8 and let U
be the subspace of l2 consisting of those sequences (xn )∞n=1 for which only a finite number
of components are non-zero. Then the vector εn having a one in position n and zeros
elsewhere belongs to U . If x ∈ U ⊥ , then xn = hx, εn i = 0 for all n, and hence x = 0.
This means that U ⊥ = {0}, whence (U ⊥ )⊥ = {0}⊥ = l2 ≠ U .
Let U be a finite-dimensional subspace of an inner product space V and assume that
e1 , . . . , en form an orthonormal basis for U . Then the proof of Theorem 3.29 shows that
the orthogonal projection u′ on U of a vector u ∈ V is given by

u′ = hu, e1 ie1 + · · · + hu, en ien . (3.1)

Example 3.32. Let U be the subspace of R4 spanned by the vectors u1 = (3, 0, 4, 0)


and u2 = (0, 3, 0, −4). We set out to find the orthogonal projection of u = (1, 1, −1, −1)
on U . We note that the vectors u1 and u2 are orthogonal to each other. Hence, we can
obtain an orthonormal basis for U by normalising these vectors to
    e1 = (1/5)(3, 0, 4, 0), e2 = (1/5)(0, 3, 0, −4).

Using the above formula, we get

    u′ = hu, e1 ie1 + hu, e2 ie2 = −(1/5)e1 + (7/5)e2 = −(1/25)(3, 0, 4, 0) + (7/25)(0, 3, 0, −4)
       = (1/25)(−3, 21, −4, −28).
Example 3.33. This time we want to find the orthogonal projection of u = (1, 2, 1, 2)
on the subspace U of R4 spanned by v 1 = (1, 1, 1, 1) and v 2 = (1, 1, −1, 3). We begin by
applying the Gram–Schmidt process to v 1 and v 2 . We set u1 = v 1 and u2 = su1 + v 2 ,
and we see that hu1 , u2 i = 0 if s = −1. Hence, we can choose u1 = (1, 1, 1, 1) and
u2 = (0, 0, −2, 2). By normalising these vectors to
    e1 = (1/2)(1, 1, 1, 1), e2 = (1/√2)(0, 0, −1, 1),

we get an orthonormal basis for U . Hence,

    u′ = 3e1 + (1/√2)e2 = (3/2)(1, 1, 1, 1) + (1/2)(0, 0, −1, 1) = (1/2)(3, 3, 2, 4).

Example 3.34. Here, we want to find the orthogonal projection of u = (5, 7, 3) on the
plane U defined by x1 + 2x2 + 3x3 = 0. One way of solving this problem would be to
first find a basis v1 , v 2 for U , then apply the Gram–Schmidt process to v 1 and v 2 to get
an orthonormal basis e1 , e2 for U and finally use the formula.
To avoid this rather cumbersome procedure, we use the fact that U ⊥ is spanned by
the single vector (1, 2, 3). Hence
    e = (1/√14)(1, 2, 3)


constitutes an orthonormal basis for U ⊥ . By using the formula, we find that the ortho-
gonal projection of u on U ⊥ is
    u′′ = hu, eie = (28/√14)e = (2, 4, 6).
The orthogonal projection of u on U is, therefore,

u′ = u − u′′ = (5, 7, 3) − (2, 4, 6) = (3, 3, −3).

As we saw in this example, it may be worthwhile to give some thought to which of the
vectors u′ and u′′ should be computed first.
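
Numerically, projections such as those in Examples 3.32–3.34 can be computed by first producing an orthonormal basis (here via NumPy's QR factorisation) and then applying Formula (3.1). The sketch below assumes the spanning vectors are linearly independent and is an illustration, not part of the text.

```python
import numpy as np

def project(u, spanning_vectors):
    """Orthogonal projection of u on the span of the given (independent) vectors."""
    A = np.column_stack([np.asarray(v, dtype=float) for v in spanning_vectors])
    Q, _ = np.linalg.qr(A)          # columns of Q: an orthonormal basis for im A
    return Q @ (Q.T @ np.asarray(u, dtype=float))

# Example 3.33: projection of (1, 2, 1, 2) on [(1, 1, 1, 1), (1, 1, -1, 3)].
print(project([1, 2, 1, 2], [(1, 1, 1, 1), (1, 1, -1, 3)]))    # [1.5 1.5 1.  2. ]

# Example 3.34: project on the normal of the plane and subtract, as in the text.
u = np.array([5.0, 7.0, 3.0])
print(u - project(u, [(1, 2, 3)]))                              # [ 3.  3. -3.]
```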

Now assume that V = U ⊕ U ⊥ . Let u ∈ V and let w be any vector in U . Then u′ − w


is orthogonal to u′′ . Hence, by the Pythagorean theorem,

ku − wk2 = ku′ − w + u′′ k2 = ku′ − wk2 + ku′′ k2 .

Therefore, ku − wk is as small as possible when w = u′ , and then ku − wk = ku′′ k.


Since ku − wk > ku − u′ k when w ∈ U and w ≠ u′ , u′ is the unique closest vector to u
in U . We call ku′′ k the distance from u to U .

Example 3.35. Consider once again the subspace U and the vector u in Example 3.33.
We found there that the vector closest to u in U is u′ = (1/2)(3, 3, 2, 4). The distance from
u to U is

    ku′′ k = ku − u′ k = k(1, 2, 1, 2) − (1/2)(3, 3, 2, 4)k = k(1/2)(−1, 1, 0, 0)k = 1/√2.

Example 3.36. Consider the plane U defined by ax + by + cz = 0 and let u = (x, y, z)


be a vector in R3 . Then
    e = (1/√(a² + b² + c²)) (a, b, c)

forms an orthonormal basis for U ⊥ . Hence,

    u′′ = hu, eie = ((ax + by + cz)/√(a² + b² + c²)) e.

Consequently, the distance from u to the plane is

    ku′′ k = (|ax + by + cz|/√(a² + b² + c²)) kek = |ax + by + cz|/√(a² + b² + c²).

Theorem 3.37. Let V be a finite-dimensional inner product space and let U be a sub-
space of V . Then dim U + dim U ⊥ = dim V .

Proof. The statement follows directly from Theorems 3.29 and 2.62.


3.4 The Rank of a Matrix


Let A be an m × n matrix and denote its rows by R1 , . . . , Rm . Since the rows of A
are the columns of At , we have im At = [R1 , . . . , Rm ]. By definition, x ∈ ker A if
and only if Ax = 0. Clearly, Ax = 0 means that x is orthogonal to all the rows
Ri of A. By Lemma 3.28, this is equivalent to x being orthogonal to all vectors of
im At = [R1 , . . . , Rm ], which by definition is equivalent to x ∈ (im At )⊥ . Thus, we have
shown that
ker A = (im At )⊥ . (3.2)
Replacing A with At and using the fact that (At )t = A, we also get

ker At = (im A)⊥ . (3.3)

By applying the rank-nullity theorem to A, Theorem 3.37 to Rn = im At ⊕ (im At )⊥ and


using (3.2), we get

dim im A = n − dim ker A = n − dim (im At )⊥ ,


dim im At = n − dim (im At )⊥ .

Hence, dim im A = dim im At . This means that the maximum number of linearly inde-
pendent columns of A equals the maximum number of linearly independent rows of A.
Definition 3.38. The common value of dim im A and dim im At is called the rank of A.
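
The equality of the two dimensions is easy to observe experimentally; the following one-off NumPy check (an illustration only) compares the ranks of a random matrix and its transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 7)).astype(float)

r = np.linalg.matrix_rank(A)
print(r == np.linalg.matrix_rank(A.T))    # True: dim im A = dim im A^t
print(A.shape[1] - r)                     # dim ker A, by the rank-nullity theorem
```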

3.5 The Method of Least Squares


Let A be an m × n matrix and y ∈ Rm . If y ∈ / im A, then the system Ax = y has no
solutions. Often, however, it is interesting to find an approximate solution x by requiring
that Ax be as close as possible to y. The distance from Ax to y can be measured in
several different ways. We choose to define this distance as kAx − yk where the norm is
the ordinary norm in Rm .
We have Ax ∈ im A for all x ∈ Rn , and we know that the orthogonal projection

y ′ of y on im A is the vector in im A that is closest to y. Hence, we can find our
vectors x by solving the system Ax = y ′ . The method indicated here involves two
steps: first finding y ′ and then solving Ax = y ′ . We shall devise a method that involves
only one step. Let y ′ and y ′′ be the orthogonal projections of y on im A and (im A)⊥ ,
respectively. By (3.3), (im A)⊥ = ker At , and hence At y ′′ = 0. Since y ′′ = y − y ′ , this
yields At y = At y ′ . Therefore, Ax = y ′ implies that

AtAx = At y ′ = At y.

Conversely, if At Ax = At y, then At (Ax − y′ ) = At Ax − At y ′ = At Ax − At y = 0. Hence,


Ax − y ′ ∈ ker At = (im A)⊥ . Since also Ax − y ′ ∈ im A, we must have Ax − y ′ = 0 or,
equivalently, Ax = y ′ . Hence, the so-called normal equations

AtAx = At y


give the same result as the two-step method. This method is called the method of least
squares and its name stems from the fact that the approximate solutions minimise the
sum
kAx − yk2 = ((Ax)1 − y 1 )2 + · · · + ((Ax)m − y m )2

of squares.

[Figure: the decomposition y = y ′ + y ′′ with y ′ ∈ im A and y ′′ ∈ ker At = (im A)⊥ .]

Note that y ∈ im A if the system has solutions. Hence, in this case y ′ = y and the
solutions in the sense of least squares are the ordinary solutions.

Example 3.39. We seek the solution in the sense of least squares of the system

 x1 + x2 = 6
 4x1 − x2 = 8
 3x1 + 2x2 = 5 .

We set

    A = [ 1   1 ]           [ 6 ]
        [ 4  −1 ]   and Y = [ 8 ] .
        [ 3   2 ]           [ 5 ]

Then

    At A = [ 26  3 ]          At Y = [ 53 ]
           [  3  6 ]   and           [  8 ] .

The normal equations for this system are, therefore,


 
    26x1 + 3x2 = 53            x1 = 2
                         ⇔
     3x1 + 6x2 = 8             x2 = 1/3 .

Hence, the solution in the sense of least squares is given by x1 = 2, x2 = 1/3.

Note that At A is a symmetric matrix and its entry in position i, k is the dot product
Ai · Ak . This observation might save you some time and effort.
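
The computation in Example 3.39 is reproduced below in Python/NumPy (an illustration, not part of the text); solving the normal equations directly agrees with NumPy's built-in least-squares routine.

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [4.0, -1.0],
              [3.0,  2.0]])
Y = np.array([6.0, 8.0, 5.0])

x = np.linalg.solve(A.T @ A, A.T @ Y)          # the normal equations A^t A x = A^t Y
print(x)                                       # [2.         0.33333333]
print(np.linalg.lstsq(A, Y, rcond=None)[0])    # same result
```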
The following example illustrates a common application of the method of least squares.


Example 3.40. We have reason to believe that some process is described by a linear
model y = at + b where y is some quantity and, for example, t is time. Measurements
give the following data:
t 0 1 2 3
y 1.5 2.9 5.3 6.6
From these data we want to estimate the values of a and b. Ideally, we should be able
to solve for a and b in the following system of equations.


 b = 1.5

 a + b = 1.5 is replaced below; the system reads
  b = 1.5,  a + b = 2.9,  2a + b = 5.3,  3a + b = 6.6 .
This is, however, seldom possible owing to measure errors or perhaps to the fact that we
are mistaken in our assumption about the model. We decide, instead, to minimise the
distance in the sense of least squares between the vectors (b, a + b, 2a + b, 3a + b) and
(1.5, 2.9, 5.3, 6.6). That is to say that we decide to solve the system in the same sense.
We can write the system as AX = Y where

    A = [ 0  1 ]         [ 1.5 ]         [ a ]
        [ 1  1 ] ,   Y = [ 2.9 ] ,   X = [ b ] .
        [ 2  1 ]         [ 5.3 ]
        [ 3  1 ]         [ 6.6 ]
We get

    At A = [ 14  6 ]          At Y = [ 33.3 ]
           [  6  4 ] ,               [ 16.3 ] ,
and find that the solution of the normal equations At AX = At Y is given by a = 1.77,
b = 1.42.

The applicability of the method in the last example has nothing to do with the assumption
that the model is linear. It would have worked equally well under the assumption that
y = at2 + bt + c. The important thing here is that the coefficients appear linearly in the
expression. Thus, the method is not directly applicable to the exponential model y = ceat .
However, in this case there is a way to bypass this limitation. This is demonstrated in
the next example.
Example 3.41. Assume that the model is y = ceat and that we have the following data:
t 0 1 2 3
y 3 3 5 9
Taking the logarithm, we get the equivalent relation ln y = at + ln c. Setting z = ln y
and b = ln c, we can write this as z = at + b. We compute the values of z and construct
a new table:
t 0 1 2 3
z ln 3 ln 3 ln 5 ln 9


The matrix A is the same matrix as in the previous example. Hence, also At A is the
same. From now on, the problem is not well suited for manual calculations. By means
of some numerical software, we should be able to find that
 
    At Z = [ 10.90916185 ]
           [  6.003887068 ] .

The solutions of the normal equations At AX = At Z are now given by a = 0.3806662490,


b = 0.9299723935. Hence, c = eb = 2.534439210.

The approach in the above example usually serves its purpose well. Note, however, that
minimising the distance between the vectors comprising the logarithmic values is not the
same as minimising the distance between the vectors themselves.
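
The logarithmic transformation of Example 3.41 is equally easy to carry out numerically. The sketch below (NumPy, illustrative only) fits z = at + b to z = ln y and then recovers c = e^b.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([3.0, 3.0, 5.0, 9.0])

A = np.column_stack([t, np.ones_like(t)])      # the same matrix A as in Example 3.40
a, b = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
print(a, np.exp(b))                            # approximately 0.3807 and 2.5344
```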
The method of least squares also gives us a means to compute orthogonal projections.
Let U = [u1 , . . . , un ] be a subspace of an m-dimensional inner product space V . By
introducing an orthonormal basis for V , we can regard V as Rm and the ui as elements
of Rm . If A is the m × n matrix having the ui as columns, then U = im A. Hence, if
x is any solution of the normal equations At Ax = At u, then u′ = Ax is the orthogonal
projection of u on U . If At Ax = 0, then Ax ∈ im A∩ (im A)⊥ = {0}, and hence Ax = 0.
If the ui are linearly independent, this implies that At Ax = 0 has the unique solution
x = 0 and therefore that At A is invertible. Hence, if the ui are linearly independent, we
have the following formula:

u′ = Ax = A(At A)−1 At u. (3.4)

If the basis u1 , . . . , un above is an orthonormal basis for U , then At A = I (n) , and (3.4)
reads
u′ = AAt u.
This is Formula (3.1) written in the language of matrices.
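
Formula (3.4) translates directly into code. The sketch below (NumPy, not part of the text) assumes the columns are linearly independent, as required for At A to be invertible, and reproduces Example 3.32.

```python
import numpy as np

def project_onto_columns(u, columns):
    """u' = A (A^t A)^{-1} A^t u, computed by solving A^t A x = A^t u."""
    A = np.column_stack([np.asarray(c, dtype=float) for c in columns])
    x = np.linalg.solve(A.T @ A, A.T @ np.asarray(u, dtype=float))
    return A @ x

# Example 3.32: projection of (1, 1, -1, -1) on [(3, 0, 4, 0), (0, 3, 0, -4)].
print(project_onto_columns([1, 1, -1, -1], [(3, 0, 4, 0), (0, 3, 0, -4)]))
# [-0.12  0.84 -0.16 -1.12], i.e. (1/25)(-3, 21, -4, -28)
```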

Exercises
3.1. Find the angle between the two vectors u = (−1, 1, 1, −1, 0) and v = (0, 2, 1, 0, 2)
in R5 .
3.2. Show that the four points (1, 1, 2, 2), (2, 2, 3, 3), (3, 1, 4, 2), (2, 0, 3, 1) in R4 are
the vertices of a square.
3.3. Show that the parallelogram identity

2(kuk2 + kvk2 ) = ku + vk2 + ku − vk2

holds in every inner product space.


3.4. Show that an equilateral triangle in an inner product space V is equiangular
by showing that the angle between two non-zero vectors u and v of V is π3 if
kuk = kvk = ku − vk.


3.5. Find an orthonormal basis for the subspace of R4 spanned by (2, 1, 1, 1), (1, 2, 3, 0)
and (1, 1, 1, 1).
3.6. Find an orthonormal basis for the subspace of R4 given by

x1 + 2x2 − 2x3 − x4 = 0.

3.7. Determine the constants a, b and c so that the matrix


 
2 1 2
1
2 −2 −1
3
a b c

is orthogonal.
3.8. Let A and B be orthogonal n × n matrices. Show that AB is an orthogonal
matrix.
3.9. (a) Show that hp, qi = ∫₀¹ p(x)q(x) dx defines an inner product on P2 .
(b) Use the Gram–Schmidt process to find an orthonormal basis for P2 equipped
with this inner product.
3.10. Find an orthonormal basis for the orthogonal complement of the subspace of R4
spanned by (1, 1, 1, 1) and (0, 1, 2, 1).
3.11. Find the orthogonal projections of u = (1, 2, 3, 4) on the subspaces of R4 spanned
by
(a) (1, 1, 1, 1) and (1, −1, 1, −1),
(b) (1, 1, 1, 1) and (1, 1, 1, 0),
and compute the distances from u to the two subspaces.
3.12. Show that the vectors e1 = (1/√3)(1, 1, 1, 0) and e2 = (1/3)(0, 2, −2, 1) form an orthonor-
mal set in R4 . Extend this set to an orthonormal basis for R4 by first finding a
basis v1 , v 2 for the orthogonal complement of U = [e1 , e2 ] and then applying the
Gram–Schmidt process to v 1 and v 2 .
3.13. Let e1 , . . . , en be an orthonormal set in an inner product space V and let u be
any vector in V .
(a) Show that

    ku′ k2 = ∑_{k=1}^{n} hu, ek i2

where u′ is the orthogonal projection of u on [e1 , . . . , en ].


(b) Show Bessel’s inequality

    ∑_{k=1}^{n} hu, ek i2 ≤ kuk2 .


3.14. (a) Show that the vectors e1 , e2 , e3 , . . . in C[0, π] defined by

    ej (t) = √(2/π) sin jt, 0 ≤ t ≤ π, j = 1, 2, 3, . . .
are pairwise orthogonal and of norm 1 with respect to the inner product
    hu, vi = ∫₀^π u(t)v(t) dt.
Hint: The identity
2 sin kt sin jt = cos (k − j)t − cos (k + j)t
might prove useful.
(b) Find the orthogonal projection of u defined by u(t) = t, 0 ≤ t ≤ π, on the
span [e1 , e2 ].
3.15. Find the orthogonal projection of u = (3, 2, 1, 4, 5, 6) on the kernel of
 
1 1 1 1 1 1
.
1 1 −1 −1 1 1

3.16. (a) Find the rank of the matrix


 
1 1 1
2 −1 5
A= 
3 2 4 .
4 3 5

(b) Find bases for im A and im At .


3.17. Find the column vector X ∈ R3 that minimises the distance between AX and Y
where    
1 1 1 1
1 2 2 0
A= 2
 and Y =   .
1 2 1
2 1 0 0
3.18. Find the polynomial y = at + b that is the best least-squares fit to the following
data.
t 0 1 2 3
y 5 2 1 0
3.19. Find the polynomial y = at2 + bt + c that is the best least-squares fit to the
following data.
t −1 0 1 2
y 2 2 4 6
3.20. Make use of Formula (3.4) to find the orthogonal projection of u = (1, 2, 3, 4) on
the subspace of R4 spanned by (1, 1, 1, 1) and (1, 1, 1, −1).

4 Determinants

4.1 Multilinear Forms


Definition 4.1. Let V be a linear space. A function F : V n → R, where the Cartesian
product V n = V × V × · · · × V contains n copies of V , is said to be an n-multilinear form
on V provided that the function u 7→ F (u1 , . . . , ui−1 , u, ui+1 , . . . , un ) is a linear trans-
formation V → R for every index i and every (n − 1)-tuple (u1 , . . . , ui−1 , ui+1 , . . . , un ).
We say that F is alternating if F (u1 , . . . , un ) = 0 whenever there exists an index i,
1 ≤ i ≤ n − 1, such that ui = ui+1 .

Hence, F is an n-multilinear form if

F (. . . , ui−1 , su + tv, ui+1 , . . . ) = sF (. . . , ui−1 , u, ui+1 , . . . ) + tF (. . . , ui−1 , v, ui+1 , . . . )

and alternating if F (u1 , . . . , un ) = 0 whenever two adjacent vectors are equal.

Theorem 4.2. Let F be an n-multilinear alternating form on a linear space V . If i ≠ j,


then
F (. . . , ui , . . . , uj , . . . ) = −F (. . . , uj , . . . , ui , . . . );
that is, F changes by a sign if two vectors are interchanged. If i ≠ j and ui = uj , then
F (u1 , . . . , un ) = 0.

Proof. We begin by showing the first statement in the special case where j = i + 1. Since
F is alternating, we have

F (. . . , ui + uj , ui + uj , . . . ) = 0.

Using the multilinearity and the alternating property, we get

0 = F (. . . , ui + uj , ui + uj , . . . ) = F (. . . , ui , ui + uj , . . . ) + F (. . . , uj , ui + uj , . . . )
= F (. . . , ui , ui , . . . ) + F (. . . , ui , uj , . . . ) + F (. . . , uj , ui , . . . ) + F (. . . , uj , uj , . . . )
= F (. . . , ui , uj , . . . ) + F (. . . , uj , ui , . . . ).

Hence, F (. . . , ui , uj , . . . ) = −F (. . . , uj , ui , . . . ).
For the last statement, we interchange successively adjacent vectors until we obtain an
n-tuple of vectors having two equal adjacent vectors. Since the resulting function value
is zero and can differ from the original value only by a sign, also the original function
value must be zero.

To show the first statement in general, we assume that i ≠ j. It then follows from the
last statement and the multilinearity that

0 = F (. . . , ui + uj , . . . , ui + uj , . . . )
= F (. . . , ui , . . . , ui , . . . ) + F (. . . , ui , . . . , uj , . . . )
+ F (. . . , uj , . . . , ui , . . . ) + F (. . . , uj , . . . , uj , . . . )
= F (. . . , ui , . . . , uj , . . . ) + F (. . . , uj , . . . , ui , . . . ).

Corollary 4.3. The value of F does not change if a multiple of the vector in one position
is added to the vector in another position.

Proof. If i ≠ j, we have

F (. . . , ui , . . . , uj + sui , . . . ) = F (. . . , ui , . . . , uj , . . . ) + sF (. . . , ui , . . . , ui , . . . )
= F (. . . , ui , . . . , uj , . . . ).

Theorem 4.4. Let V be an n-dimensional linear space with basis e1 , . . . , en and let F
be an n-multilinear alternating form on V . If F (e1 , . . . , en ) = 0, then F (u1 , . . . , un ) = 0
for all n-tuples (u1 , . . . , un ) of vectors in V .

Proof. Let i1 , . . . , in be any indices such that 1 ≤ ij ≤ n for j = 1, . . . , n and consider


F (ei1 , . . . , ein ). If two of the indices are equal, then F (ei1 , . . . , ein ) = 0 by the second
statement of Theorem 4.2. If the indices are distinct, we can successively interchange
adjacent vectors until we get the n-tuple (e1 , . . . , en ). Hence, by the first statement of
the same theorem, F (ei1 , . . . , ein ) = ±F (e1 , . . . , en ) = 0. Let u1 , . . . , un be vectors of V .
We can then write ui = xi1 e1 + · · · + xin en . Using the linearity in the first argument,
we get

    F (u1 , u2 , . . . , un ) = F (x11 e1 + · · · + x1n en , u2 , . . . , un ) = ∑_{i1 =1}^{n} x1i1 F (ei1 , u2 , . . . , un ).

By using the linearity in each of the remaining arguments, we eventually get


X
F (u1 , u2 , . . . , un ) = x1i1 x2i2 · · · xnin F (ei1 , ei2 , . . . , ein )

where the sum is taken over all n-tuples (i1 , . . . , in ) of indices. The assertion now follows
from the fact that all terms of the sum are zero.

Corollary 4.5. Let V be an n-dimensional linear space with basis e1 , . . . , en and let
F and G be n-multilinear alternating forms on V . If F (e1 , . . . , en ) = G(e1 , . . . , en ),
then F (u1 , . . . , un ) = G(u1 , . . . , un ) for all n-tuples (u1 , . . . , un ) of vectors in V .

Proof. It is plain that F − G is an n-multilinear alternating form on V . Hence, the


statement follows from the preceding theorem.


4.2 Definition of Determinants


Let Mn be the set of n × n matrices. By a determinant of order n we shall mean a
mapping D : Mn → R with the following properties:

(i) D([. . . , sAk + tA′k , . . . ]) = sD([. . . , Ak , . . . ]) + tD([. . . , A′k , . . . ]),


(ii) D(A) = 0 if two adjacent columns of A are equal,
(iii) D(I) = 1.

Consider the linear space V of n×1 columns. The columns Ik , k = 1, . . . , n, of the unit
matrix I form a basis for V . When D is viewed as a function V n → R, conditions (i)
and (ii) mean that D is an n-multilinear alternating form on V . By Corollary 4.5 and
condition (iii), a determinant is uniquely determined if it exists.
We shall now define determinants of all orders recursively. Let A be an n × n matrix
where n ≥ 2. By Aik we mean the (n − 1) × (n − 1) matrix obtained from A by deleting
row i and column k. Hence, if
 
a11 ··· a1(k−1) a1k a1(k+1) ··· a1n
 .. .. .. .. .. 
 . . . . . 
 
a(i−1)1 · · · a(i−1)(k−1) a(i−1)k a(i−1)(k+1) · · · a(i−1)n 
 
A=
 ai1 ··· ai(k−1) aik ai(k+1) ··· ain  ,
a(i+1)1 · · · a(i+1)(k−1) a(i+1)k a(i+1)(k+1) · · · a(i+1)n 
 
 .. .. .. .. .. 
 . . . . . 
an1 ··· an(k−1) ank an(k+1) ··· ann

then
 
a11 ··· a1(k−1) a1(k+1) ··· a1n
 .. .. .. .. 
 . . . . 
 
a(i−1)1 · · · a(i−1)(k−1) a(i−1)(k+1) · · · a(i−1)n 
Aik = 
a(i+1)1 · · · a(i+1)(k−1) a(i+1)(k+1) · · · a(i+1)n  .
 
 .. .. .. .. 
 . . . . 
an1 ··· an(k−1) an(k+1) ··· ann

Theorem 4.6. Let n ≥ 2 be an integer and i an integer such that 1 ≤ i ≤ n. If Dn−1


is a determinant of order n − 1, then the mapping defined by

    Dn (A) = ∑_{j=1}^{n} (−1)i+j aij Dn−1 (Aij )
           = (−1)i+1 ai1 Dn−1 (Ai1 ) + (−1)i+2 ai2 Dn−1 (Ai2 ) + · · · + (−1)i+n ain Dn−1 (Ain )

is a determinant of order n.


Proof. We denote Dn−1 by D in this proof. Assume that A = [aik ]n×n and let A′ and B
be the matrices obtained from A by replacing its kth column with

    [ a′1k ]       [ sa1k + ta′1k ]
    [  ..  ]       [      ..      ]
    [ a′ik ]  and  [ saik + ta′ik ]
    [  ..  ]       [      ..      ]
    [ a′nk ]       [ sank + ta′nk ] ,
respectively. Plainly, Bik = Aik = A′ik and bik = saik + ta′ik . If j ≠ k, then bij = aij = a′ij .
In this case we also have D(Bij ) = sD(Aij ) + tD(A′ij ) since D is assumed to satisfy
condition (i). Therefore,

    Dn (B) = ∑_{j=1}^{n} (−1)i+j bij D(Bij ) = (−1)i+k bik D(Bik ) + ∑_{j≠k} (−1)i+j bij D(Bij )
           = (−1)i+k (saik + ta′ik )D(Bik ) + ∑_{j≠k} (−1)i+j bij (sD(Aij ) + tD(A′ij ))
           = (−1)i+k (saik D(Aik ) + ta′ik D(A′ik )) + ∑_{j≠k} (−1)i+j (saij D(Aij ) + ta′ij D(A′ij ))
           = s ∑_{j=1}^{n} (−1)i+j aij D(Aij ) + t ∑_{j=1}^{n} (−1)i+j a′ij D(A′ij ) = sDn (A) + tDn (A′ ).

Hence, the mapping Dn meets condition (i).


Assume that the columns Ak and Ak+1 of the n × n matrix A are equal. If j 6= k and
j 6= k + 1, then two adjacent columns of Aij are equal. Hence D(Aij ) = 0 in this case.
It is also clear that Aik = Ai(k+1) and that aik = ai(k+1) . Therefore,

Dn (A) = (−1)i+k aik D(Aik ) + (−1)i+k+1 ai(k+1) D(Ai(k+1) ) = 0.

This shows that Dn satisfies condition (ii).


The ij-entry δij of the unit matrix equals 1 when i = j and 0 otherwise. We also see
that Iii is the unit matrix of order n − 1. Hence,

    Dn (I) = ∑_{j=1}^{n} (−1)i+j δij D(Iij ) = (−1)i+i δii D(Iii ) = D(Iii ) = 1

and, therefore, also condition (iii) is met.

Definition 4.7. We define the function Dn : Mn → R recursively as follows. For 1 × 1
matrices A = [a], we set D1 (A) = a. When A is an n × n matrix where n ≥ 2, we set

    Dn (A) = ∑_{j=1}^{n} (−1)1+j a1j Dn−1 (A1j )
           = a11 Dn−1 (A11 ) − a12 Dn−1 (A12 ) + · · · + (−1)1+n a1n Dn−1 (A1n ).


Theorem 4.8. For every positive integer n, the function Dn is a determinant.

Proof. The function D1 satisfies condition (ii) for the simple reason that a square matrix
of order 1 has no adjacent columns. The other two conditions are trivially satisfied.
Hence D1 is a determinant. The statement now follows by induction on n, the induction
step being supplied by Theorem 4.6.

Hence, determinants exist of all orders and are unique. In order not to overload the
notation, we shall from now on denote determinants of all orders by D. Other notations
used for D(A), where A = [aik ]n×n , are det A and
    | a11  a12  · · ·  a1n |
    | a21  a22  · · ·  a2n |
    |  ..               .. |
    | an1  an2  · · ·  ann | .
For 2 × 2 matrices, the definition yields
    | a11  a12 |
    | a21  a22 | = (−1)1+1 a11 |a22 | + (−1)1+2 a12 |a21 | = a11 a22 − a12 a21 .

Do not mistake |a22 | and |a21 | for absolute values here. To avoid this ambiguity, we shall
never again use this notation for determinants of order 1.
Example 4.9. Using the definition and the above formula for determinants of order 2,
we find that the determinant of the matrix

    A = [ 1 2 3 ]
        [ 4 5 6 ]
        [ 7 8 9 ]

is

    det A = | 1 2 3 |               | 5 6 |               | 4 6 |               | 4 5 |
            | 4 5 6 | = (−1)2 · 1 · | 8 9 | + (−1)3 · 2 · | 7 9 | + (−1)4 · 3 · | 7 8 |
            | 7 8 9 |

          = 5 · 9 − 6 · 8 − 2(4 · 9 − 6 · 7) + 3(4 · 8 − 5 · 7) = 0.

For a general 3 × 3 matrix

    A = [ a11  a12  a13 ]
        [ a21  a22  a23 ]
        [ a31  a32  a33 ] ,
we obtain
    D(A) = a11 | a22  a23 | − a12 | a21  a23 | + a13 | a21  a22 |
               | a32  a33 |       | a31  a33 |       | a31  a32 |
= a11 (a22 a33 − a23 a32 ) − a12 (a21 a33 − a23 a31 ) + a13 (a21 a32 − a22 a31 )
= a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − (a11 a23 a32 + a12 a21 a33 + a13 a22 a31 ).


The reader probably recognises this as Sarrus’s rule for determinants of order 3. Hence,
for determinants of order 2 and 3, our definition agrees with the ones usually stated for
such determinants.
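
The recursive definition can be transcribed almost literally into a short program. The Python sketch below (not part of the text) expands along the first row exactly as in Definition 4.7; it is exponential in n and meant only as an illustration, not as a practical algorithm.

```python
def det(A):
    """Determinant by expansion along the first row (Definition 4.7)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]   # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det(minor)          # (-1)**j is the sign (-1)^(1+(j+1))
    return total

print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0, as in Example 4.9
```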

4.3 Properties of Determinants


Theorem 4.10. Let A be an n × n matrix, where n ≥ 2, and i an integer such that
1 ≤ i ≤ n. Then

    D(A) = ∑_{j=1}^{n} (−1)i+j aij D(Aij )
         = (−1)i+1 ai1 D(Ai1 ) + (−1)i+2 ai2 D(Ai2 ) + · · · + (−1)i+n ain D(Ain ).

Proof. By Theorem 4.6, the right-hand side is a determinant of order n. The equality
therefore follows from the uniqueness of determinants.

Theorem 4.11. Let A be a square matrix. Then D(A) = D(At ).

Proof. Define D ′ (A) = D(At ) for n × n matrices A. We intend to show that D ′ is a


determinant of order n. It will then follow from the uniqueness of the determinant that
D ′ (A) = D(A) for all square matrices A of order n. The statement holds for n = 1
since in this case At = A. We proceed by showing that each of the defining conditions is
satisfied by D ′ for every n ≥ 2.
To show condition (i) for D ′ we must show that D is linear in each row. Assume that
A = [aik ]n×n and let A′ and B be the matrices obtained from A by replacing its ith row
with

    [ a′i1  · · ·  a′in ]   and   [ sai1 + ta′i1  · · ·  sain + ta′in ] ,
respectively. Then Aij = A′ij = Bij for j = 1, . . . , n, and hence, by Theorem 4.10,

    D(B) = ∑_{j=1}^{n} (−1)i+j (saij + ta′ij )D(Bij )
         = s ∑_{j=1}^{n} (−1)i+j aij D(Aij ) + t ∑_{j=1}^{n} (−1)i+j a′ij D(A′ij ) = sD(A) + tD(A′ ).

Hence, D ′ meets condition (i).


Condition (ii) for D ′ means that D(A) = 0 if two adjacent rows of A are equal. For
n = 2, this follows from the formula D(A) = a11 a22 − a12 a21 . We proceed by induction
on n. Assume that n ≥ 3 and that the statement holds for orders less than n. Let i be
the index of a row other than the two equal rows. Then, for every j = 1, . . . , n, Aij has
two equal adjacent rows and hence, by hypothesis, D(Aij ) = 0. It now follows from
Theorem 4.10 that D(A) = 0.
Condition (iii) is satisfied by D ′ since D ′ (I) = D(I t ) = D(I) = 1.


Since the columns of A are the rows of At , Theorems 4.10 and 4.11 yield the following
theorem.
Theorem 4.12. Let A be an n × n matrix, where n ≥ 2, and j an integer such that
1 ≤ j ≤ n. Then
    D(A) = ∑_{i=1}^{n} (−1)i+j aij D(Aij )
         = (−1)1+j a1j D(A1j ) + (−1)2+j a2j D(A2j ) + · · · + (−1)n+j anj D(Anj ).
Theorem 4.13. The value of a determinant does not change when a multiple of a column
is added to another column or when a multiple of a row is added to another row.
Proof. The statement concerning columns is contained in Corollary 4.3. Since the rows
of a matrix are the columns of its transpose, the statement about rows follows from
Theorem 4.11.
Theorem 4.14. D(A) = 0 if two columns of A are equal or two rows of A are equal.
D(A) changes by a sign if two columns of A are interchanged or two rows of A are
interchanged.
Proof. The statements about columns are translations into the language of determinants
of the statements of Theorem 4.2. Their row analogues follow from Theorem 4.11.
Theorems 4.10, 4.12, 4.13 and 4.14 provide efficient tools for evaluation of determinants.
The formulae of the first two theorems are called expansion along a row and column,
respectively.
Using the recursive definition of determinants amounts to successive expansions along
the first row. A determinant of order 4 first splits into 4 determinants of order 3, and
then each of these splits into 3 determinants of order 2. Hence, we must evaluate 12
determinants of order 2.
By Theorem 4.13, a determinant can be transformed into one containing a row or
column having at most one non-zero entry. The expansion along the transformed row
or column contains at most one non-zero term. Correct use of the tools decreases the
workload significantly.
Example 4.15. We demonstrate the tools by, once again, evaluating the determinant
in Example 4.9. For the sake of easy calculations we choose to produce zeros in the third
column. By subtracting twice the first row from the second row and thrice the first row
from the last row, we obtain
    | 1 2 3 |   | 1 2 3 |
    | 4 5 6 | = | 2 1 0 | .
    | 7 8 9 |   | 4 2 0 |

Expanding along column 3, we get the same value as before:

    | 1 2 3 |                 | 2 1 |                 | 1 2 |                 | 1 2 |
    | 2 1 0 | = (−1)1+3 · 3 · | 4 2 | + (−1)2+3 · 0 · | 4 2 | + (−1)3+3 · 0 · | 2 1 | = 3(2 · 2 − 1 · 4) = 0.
    | 4 2 0 |


The zero terms are written out for the convenience of the reader. In fact, the whole
evaluation fits on a single line:

    | 1 2 3 |   | 1 2 3 |                 | 2 1 |
    | 4 5 6 | = | 2 1 0 | = (−1)1+3 · 3 · | 4 2 | = 3(2 · 2 − 1 · 4) = 0.
    | 7 8 9 |   | 4 2 0 |

The reader is discouraged from using Sarrus’s rule. The reason for this is twofold.
Firstly, it applies only to determinants of order 3. Secondly, it requires unnecessarily
long calculations.

Example 4.16. Consider the determinant

    |  3  8 4  6 |
    |  5  3 2  4 |
    |  7 11 4  3 |
    | 11 13 6 10 | .

Column 3 is best suited for elimination since its entries are integral multiples of its entry
in row 2. We subtract twice the second row from the first and third rows and thrice the
same row from the last row and get

    | −7 2 0 −2 |
    |  5 3 2  4 |
    | −3 5 0 −5 |
    | −4 4 0 −2 | .

Expansion along the third column yields

                  | −7 2 −2 |      | −7 2 −2 |
    (−1)2+3 · 2 · | −3 5 −5 | = −2 | −3 5 −5 | .
                  | −4 4 −2 |      | −4 4 −2 |

We now choose to eliminate in row 3 for the same reason as before. Hence, we subtract
twice the last column from the first column and add twice the same column to the second
column. Thus we get
       | −3 −2 −2 |
    −2 |  7 −5 −5 | .
       |  0  0 −2 |

Expanding along the third row and then using the formula for determinants of order 2,
we obtain
    −2 · (−1)3+3 · (−2) · | −3 −2 |
                          |  7 −5 | = 4((−3)(−5) − (−2) · 7) = 116.


An upper triangular square matrix has zeros below its main diagonal. Likewise, a
lower triangular square matrix has zeros above its main diagonal. A diagonal square
matrix has zeros outside its main diagonal. Hence, a diagonal matrix is upper and lower
triangular. The determinant of a matrix of any of these kinds equals the product of the
diagonal entries of that matrix. For an upper triangular matrix, this can be seen by
successively expanding along the first column:

    | a11  a12  a13  · · ·  a1n |        | a22  a23  · · ·  a2n |
    |  0   a22  a23  · · ·  a2n |        |  0   a33  · · ·  a3n |
    |  0    0   a33  · · ·  a3n | = a11  |  ..               .. | = · · · = a11 a22 a33 · · · ann .
    |  ..                    .. |        |  0    0   · · ·  ann |
    |  0    0    0   · · ·  ann |

Example 4.17. Here we evaluate the determinant

    | x  1  1  · · ·  1 |
    | 1  x  1  · · ·  1 |
    | 1  1  x  · · ·  1 |
    | ..              .. |
    | 1  1  1  · · ·  x |

of order n. Observing that all the row sums are equal, we add the last n − 1 columns to
the first column and obtain
    | x + n − 1  1  1  · · ·  1 |
    | x + n − 1  x  1  · · ·  1 |
    | x + n − 1  1  x  · · ·  1 | .
    |     ..                 .. |
    | x + n − 1  1  1  · · ·  x |

Now, subtracting the first row from the other rows, we get

    | x + n − 1    1      1    · · ·    0   |
    |     0      x − 1    0    · · ·    0   |
    |     0        0    x − 1  · · ·    0   | .
    |     ..                           ..   |
    |     0        0      0    · · ·  x − 1 |

Since this matrix is triangular, the determinant equals (x + n − 1)(x − 1)n−1 .
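
A quick numerical spot check of this closed form is possible with NumPy (illustrative only; floating-point determinants are approximate).

```python
import numpy as np

def both(n, x):
    A = np.full((n, n), 1.0)     # all entries 1
    np.fill_diagonal(A, x)       # x on the diagonal
    return np.linalg.det(A), (x + n - 1) * (x - 1) ** (n - 1)

print(both(5, 2.5))              # the two values agree (about 32.906)
```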

Theorem 4.18 (Product theorem). Let A and B be square matrices of order n.


Then D(AB) = D(A)D(B).

Proof. Let, for a fixed matrix A, the mappings D ′ and D ′′ from Mn to R be defined by
D ′ (B) = D(A)D(B) and D ′′ (B) = D(AB). When viewed as functions V n → R, where


V is the linear space of n×1 columns, these mappings are n-multilinear alternating forms
on V . Since D is such a form, this statement is trivial for D ′ . For D ′′ , it follows from
the fact that D ′′ (B1 , . . . , Bn ) = D(AB1 , . . . , ABn ). Therefore, and since D ′ (I) = D ′′ (I),
Corollary 4.5 yields that D ′ (B) = D ′′ (B) for all n × n matrices B.

Theorem 4.19. A square matrix A is invertible if and only if D(A) ≠ 0, and in that
case D(A−1 ) = (D(A))−1 .

Proof. Assume that A is invertible with inverse A−1 . Then it follows from Theorem 4.18
that D(A)D(A−1 ) = D(AA−1 ) = D(I) = 1, whence D(A) ≠ 0 and D(A−1 ) = (D(A))−1 .
The converse amounts to saying that D(A) = 0 if A is not invertible. Assume that A is
not invertible and let n be the order of A. If n = 1, then A = [0] and hence D(A) = 0 in
this case. Otherwise, the columns of A are linearly dependent, and therefore one column
Ak is a linear combination of the other columns. Thus,

Ak = s1 A1 + · · · + sk−1 Ak−1 + sk+1 Ak+1 + · · · + sn An

for some real numbers s1 , . . . , sk−1 , sk+1 , . . . , sn , and therefore, by linearity in the kth
argument,
X
D(A) = D(A1 , . . . , Ak−1 , Ak , Ak+1 , . . . , An ) = si D(A1 , . . . , Ak−1 , Ai , Ak+1 , . . . , An ).
i6=k

Each determinant in the sum has two equal columns, and is therefore equal to zero by
Theorem 4.14. Hence, D(A) = 0 also in this case.

Example 4.20. Let us, for every a ∈ R, find the dimensions of ker A and im A where

    A = [ a 2 1 2 ]
        [ 2 a 2 1 ]
        [ 1 2 a 2 ]
        [ 2 1 2 a ] .

By Theorem 4.19, A is invertible if and only if D(A) ≠ 0, and then dim ker A = 0 and
dim im A = 4. We observe that the column sums are equal. We choose to add the first
three rows to the last row. Thus we get

\[
D(A) =
\begin{vmatrix} a & 2 & 1 & 2 \\ 2 & a & 2 & 1 \\ 1 & 2 & a & 2 \\ a+5 & a+5 & a+5 & a+5 \end{vmatrix}
=
\begin{vmatrix} a & 2-a & 1-a & 2-a \\ 2 & a-2 & 0 & -1 \\ 1 & 1 & a-1 & 1 \\ a+5 & 0 & 0 & 0 \end{vmatrix}
\]
\[
= -(a+5)\begin{vmatrix} 2-a & 1-a & 2-a \\ a-2 & 0 & -1 \\ 1 & a-1 & 1 \end{vmatrix}
= -(a+5)\begin{vmatrix} 2-a & 1-a & 0 \\ a-2 & 0 & 1-a \\ 1 & a-1 & 0 \end{vmatrix}
\]
\[
= (a+5)(1-a)\begin{vmatrix} 2-a & 1-a \\ 1 & a-1 \end{vmatrix}
= (a+5)(a-1)^2(a-3).
\]


If a = −5, a = 1 or a = 3, then D(A) = 0 and A is not invertible. It remains to determine


the dimensions in these three cases. We do this by solving the system AX = 0.
When a = −5, this system is
   
\[
\begin{pmatrix} -5 & 2 & 1 & 2 & 0 \\ 2 & -5 & 2 & 1 & 0 \\ 1 & 2 & -5 & 2 & 0 \\ 2 & 1 & 2 & -5 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} -5 & 2 & 1 & 2 & 0 \\ 12 & -9 & 0 & -3 & 0 \\ -24 & 12 & 0 & 12 & 0 \\ 12 & -3 & 0 & -9 & 0 \end{pmatrix}
\]
\[
\Leftrightarrow
\begin{pmatrix} -5 & 2 & 1 & 2 & 0 \\ 12 & -9 & 0 & -3 & 0 \\ 0 & -6 & 0 & 6 & 0 \\ 0 & 6 & 0 & -6 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} -5 & 2 & 1 & 2 & 0 \\ 4 & -3 & 0 & -1 & 0 \\ 0 & -1 & 0 & 1 & 0 \end{pmatrix}
\Leftrightarrow
X = t\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}.
\]

Hence, dim ker A = 1 and, by the rank-nullity theorem, dim im A = 3.


When a = 1, we get
   
\[
\begin{pmatrix} 1 & 2 & 1 & 2 & 0 \\ 2 & 1 & 2 & 1 & 0 \\ 1 & 2 & 1 & 2 & 0 \\ 2 & 1 & 2 & 1 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 1 & 2 & 1 & 2 & 0 \\ 2 & 1 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 1 & 2 & 1 & 2 & 0 \\ 0 & -3 & 0 & -3 & 0 \end{pmatrix}
\Leftrightarrow
X = s\begin{pmatrix} -1 \\ 0 \\ 1 \\ 0 \end{pmatrix} + t\begin{pmatrix} 0 \\ -1 \\ 0 \\ 1 \end{pmatrix}.
\]

In this case, dim ker A = dim im A = 2.


When a = 3, the system reads
   
\[
\begin{pmatrix} 3 & 2 & 1 & 2 & 0 \\ 2 & 3 & 2 & 1 & 0 \\ 1 & 2 & 3 & 2 & 0 \\ 2 & 1 & 2 & 3 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 3 & 2 & 1 & 2 & 0 \\ -4 & -1 & 0 & -3 & 0 \\ -8 & -4 & 0 & -4 & 0 \\ -4 & -3 & 0 & -1 & 0 \end{pmatrix}
\]
\[
\Leftrightarrow
\begin{pmatrix} 3 & 2 & 1 & 2 & 0 \\ -4 & -1 & 0 & -3 & 0 \\ 0 & -2 & 0 & 2 & 0 \\ 0 & -2 & 0 & 2 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 3 & 2 & 1 & 2 & 0 \\ -4 & -1 & 0 & -3 & 0 \\ 0 & -1 & 0 & 1 & 0 \end{pmatrix}
\Leftrightarrow
X = t\begin{pmatrix} -1 \\ 1 \\ -1 \\ 1 \end{pmatrix}.
\]

This time, dim ker A = 1 and dim im A = 3.


To sum up, we have found that dim ker A = 1, dim im A = 3 when a = −5 or a = 3,
dim ker A = dim im A = 2 when a = 1, and dim ker A = 0, dim im A = 4 for all other
values of a.
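These dimensions can be double-checked numerically. The helper below is our own sketch (assuming NumPy): it computes dim im A as the rank of A and obtains dim ker A from the rank–nullity theorem.

```python
import numpy as np

def dims(a: float):
    """Return (dim ker A, dim im A) for the matrix A of Example 4.20."""
    A = np.array([[a, 2, 1, 2],
                  [2, a, 2, 1],
                  [1, 2, a, 2],
                  [2, 1, 2, a]], dtype=float)
    rank = np.linalg.matrix_rank(A)
    return 4 - rank, rank          # rank-nullity: dim ker A + dim im A = 4

for a in (-5, 1, 3, 0):
    print(a, dims(a))              # expected: (1, 3), (2, 2), (1, 3), (0, 4)
```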

Example 4.21. A plane is parallel to the vectors u = (1, 2, 3) and v = (1, −1, 2) and
passes through the point Q = (2, 1, 4). A point P = (x, y, z) lies in the plane if and only if the vectors $\overrightarrow{QP} = (x - 2, y - 1, z - 4)$, u and v are linearly dependent. This in turn is


equivalent to the determinant of the matrix with columns equal to the coordinate vectors
being zero. Hence, the equation of the plane is

\[
0 = \begin{vmatrix} x-2 & 1 & 1 \\ y-1 & 2 & -1 \\ z-4 & 3 & 2 \end{vmatrix}
= (x-2)\begin{vmatrix} 2 & -1 \\ 3 & 2 \end{vmatrix}
+ (y-1)(-1)\begin{vmatrix} 1 & 1 \\ 3 & 2 \end{vmatrix}
+ (z-4)\begin{vmatrix} 1 & 1 \\ 2 & -1 \end{vmatrix}
= 7(x-2) + 1(y-1) - 3(z-4) = 7x + y - 3z - 3.
\]
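The coefficients can also be checked with a cross product, since the first-column cofactors of the determinant above are exactly the components of u × v. A short NumPy sketch (our own, not part of the text):

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([1, -1, 2])
Q = np.array([2, 1, 4])

n = np.cross(u, v)     # normal vector of the plane; equals (7, 1, -3)
d = -n @ Q             # constant term
print(n, d)            # 7x + y - 3z - 3 = 0, as above
```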

Theorem 4.22 (Cramer’s rule). Let A be a square matrix of order n and assume that
AX = Y where X and Y are column vectors. Then

D(A1 , . . . , Ai−1 , Y, Ai+1 , . . . , An ) = xi D(A).

In particular, if A is invertible, then the unique solution of the system AX = Y is given


by
\[ x_i = \frac{D(A_1, \ldots, A_{i-1}, Y, A_{i+1}, \ldots, A_n)}{D(A)}, \qquad i = 1, \ldots, n. \]

Proof. Let d = D(A1 , . . . , Ai−1 , Y, Ai+1 , . . . , An ). The equality AX = Y can be written


as Y = x1 A1 + · · · + xn An . Hence,
\[
d = D\Bigl(A_1, \ldots, A_{i-1}, \sum_{k=1}^{n} x_k A_k, A_{i+1}, \ldots, A_n\Bigr)
= \sum_{k=1}^{n} x_k D(A_1, \ldots, A_{i-1}, A_k, A_{i+1}, \ldots, A_n)
= x_i D(A_1, \ldots, A_{i-1}, A_i, A_{i+1}, \ldots, A_n) = x_i D(A).
\]

The second equality follows from the linearity in the ith argument and the third equality
follows from the fact that the kth determinant in the sum has two equal columns when
k ≠ i. The last statement of the theorem should now be obvious.
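Translated into code, Cramer's rule amounts to replacing one column of A by Y at a time. The helper below is our own NumPy sketch on an arbitrary example, not an implementation taken from the text.

```python
import numpy as np

def cramer_solve(A: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Solve AX = Y by Cramer's rule; A must be square with det A != 0."""
    detA = np.linalg.det(A)
    X = np.empty(A.shape[1])
    for i in range(A.shape[1]):
        Ai = A.copy()
        Ai[:, i] = Y                       # replace the ith column by Y
        X[i] = np.linalg.det(Ai) / detA
    return X

A = np.array([[2.0, 1.0], [1.0, 3.0]])     # arbitrary invertible example
Y = np.array([3.0, 5.0])
print(cramer_solve(A, Y), np.linalg.solve(A, Y))   # both give [0.8 1.4]
```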

Definition 4.23. Let A be an n × n matrix. The adjugate à of A is I if n = 1 and


otherwise
 
\[
\widetilde{A} = \bigl[(-1)^{k+i} D(A_{ki})\bigr]_{n\times n}
= \begin{pmatrix}
(-1)^{1+1} D(A_{11}) & \cdots & (-1)^{n+1} D(A_{n1}) \\
\vdots & & \vdots \\
(-1)^{1+n} D(A_{1n}) & \cdots & (-1)^{n+n} D(A_{nn})
\end{pmatrix}.
\]

Note the reversal of indices.

Theorem 4.24. Let A = [aik ]n×n be a square matrix. Then AÃ = ÃA = D(A)I. In
particular, if A is invertible, then

\[ A^{-1} = \frac{1}{D(A)}\widetilde{A}. \]


Proof. The assertions are trivial for n = 1. Otherwise, the ikth entry of AÃ is
\[ b_{ik} = \sum_{j=1}^{n} a_{ij}(-1)^{k+j} D(A_{kj}). \]

If k = i, this sum is the expansion of D(A) along the ith row. Hence, bii = D(A) for
i = 1, . . . , n. If k ≠ i, let A′ be the matrix obtained from A by replacing the kth row by
the ith row and leaving all other rows unchanged. Then A′kj = Akj , and hence
\[ b_{ik} = \sum_{j=1}^{n} a_{ij}(-1)^{k+j} D(A'_{kj}). \]

This is the expansion of D(A′) along the ith row. Since two rows of A′ are equal, we have D(A′) = 0. Hence, b_{ik} = D(A′) = 0 if i ≠ k. This shows that AÃ = D(A)I. It follows from the definition of the adjugate that $\widetilde{A}^{\,t} = \widetilde{A^{t}}$. Hence
\[ (\widetilde{A}A)^{t} = A^{t}\widetilde{A}^{\,t} = A^{t}\widetilde{A^{t}} = D(A^{t})I = D(A)I, \]

and consequently, ÃA = D(A)I.


If A is invertible, then D(A) ≠ 0. Therefore
\[ A\Bigl(\frac{1}{D(A)}\widetilde{A}\Bigr) = I, \]
and the last assertion follows.

The methods mentioned earlier for solving systems of equations and finding inverses
usually involve much shorter calculations than the methods of the last two theorems. One
exception to this is the following formula for the inverse of an invertible 2 × 2 matrix:
\[
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1}
= \frac{1}{a_{11}a_{22} - a_{12}a_{21}}
\begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}.
\]
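The adjugate is also easy to form in code. The sketch below (our own helper, assuming NumPy) builds Ã entry by entry from cofactors, with the index reversal of Definition 4.23, and checks the identity AÃ = D(A)I on the matrix of Exercise 4.19(b).

```python
import numpy as np

def adjugate(A: np.ndarray) -> np.ndarray:
    """Adjugate of a square matrix of order n >= 2, built from cofactors."""
    n = A.shape[0]
    adj = np.empty((n, n))
    for i in range(n):
        for k in range(n):
            minor = np.delete(np.delete(A, k, axis=0), i, axis=1)  # remove row k, column i
            adj[i, k] = (-1) ** (k + i) * np.linalg.det(minor)     # note the index reversal
    return adj

A = np.array([[1.0, 1.0, 2.0], [0.0, 3.0, 1.0], [2.0, 1.0, 0.0]])
print(np.allclose(A @ adjugate(A), np.linalg.det(A) * np.eye(3)))   # True
```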

Example 4.25. Let A be an invertible square matrix with integer entries. Then A−1
has integer entries if and only if D(A) = ±1.
First assume that A−1 has integer entries. Then D(A) and D(A−1 ) are integers. By
the product theorem, D(A)D(A−1 ) = 1, and hence D(A) = ±1.
Since the entries of A are integers, also the entries of its adjugate are integers. If
D(A) = ±1, it therefore follows from Theorem 4.24 that the entries of A−1 are integers.

Exercises
4.1. Evaluate the following determinants.
\[
\text{(a) } \begin{vmatrix} 1 & 1 & 2 \\ 3 & 2 & 9 \\ 7 & 2 & 5 \end{vmatrix}, \qquad
\text{(b) } \begin{vmatrix} 666 & 667 & 669 \\ 667 & 668 & 670 \\ 669 & 670 & 671 \end{vmatrix}, \qquad
\text{(c) } \begin{vmatrix} 1 & a & b+c \\ 1 & b & c+a \\ 1 & c & a+b \end{vmatrix}.
\]


4.2. Evaluate the following determinants.


\[
\text{(a) } \begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 3 & 1 & 1 \\ 4 & 1 & 1 & 1 \end{vmatrix}, \qquad
\text{(b) } \begin{vmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 2 & 3 \\ 3 & 4 & 1 & 2 \\ 2 & 3 & 4 & 1 \end{vmatrix}, \qquad
\text{(c) } \begin{vmatrix} 0 & 0 & a & b \\ 0 & a & b & b \\ a & b & b & b \\ b & b & b & b \end{vmatrix}.
\]
4.3. Solve the following equations.
4.3. Solve the following equations.
\[
\text{(a) } \begin{vmatrix} x & 0 & 0 & 2 \\ 0 & x & 0 & 2 \\ 0 & 0 & x & 2 \\ 2 & 2 & 2 & x \end{vmatrix} = 0, \qquad
\text{(b) } \begin{vmatrix} x & 1 & 1 & 1 \\ 1 & x & 1 & 1 \\ 1 & 1 & x & 1 \\ 1 & 1 & 1 & x \end{vmatrix} = 0.
\]
4.4. The numbers 17887, 22041, 34503, 46159, 54777 are divisible by 31. Show that
the determinant
1 7 8 8 7
2 2 0 4 1
3 4 5 0 3
4 6 1 5 9
5 4 7 7 7
is divisible by 31.
4.5. Show that the determinant of a skew-symmetric matrix of odd order equals zero.
4.6. Evaluate the following determinants of order n.
1 1 1 ··· 1 1 1 1 1 ··· 1 1
1 2 1 ··· 1 1 1 1 0 ··· 0 0
1 1 3 ··· 1 1 0 1 1 ··· 0 0
(a) . . . .. .. , (b) . . . .. .. ,
.. .. .. . . .. .. .. . .
1 1 1 ··· n − 1 1 0 0 0 ··· 1 0
1 1 1 ··· 1 n 0 0 0 ··· 1 1
1 2 0 0 ··· 0 0 0 0 1 0 0 ··· 0 0 0
1 3 2 0 ··· 0 0 0 2 0 1 0 ··· 0 0 0
0 1 3 2 ··· 0 0 0 0 2 0 1 ··· 0 0 0
(c) . . . . .. .. .. , (d) . . . . .. .. .. .
.. .. .. .. . . . .. .. .. .. . . .
0 0 0 0 ··· 1 3 2 0 0 0 0 ··· 2 0 1
0 0 0 0 ··· 0 1 3 0 0 0 0 ··· 0 2 0
4.7. Evaluate the determinant

0 1 2 ··· n − 1 3
1 0 1 ··· n − 2 2
2 1 0 ··· n − 3 . 1
.. .. .. .. ..
. . . . .
n − 1 n − 2 n − 3 n − 4 ··· 0


4.8. Show that


1 2 3 ··· n
2 3 4 ··· 1
n(n−1) nn + nn−1
3 4 5 ··· 2 = (−1) 2 .
.. .. .. .. 2
. . . .
n 1 2 ··· n − 1

4.9. Show that


x 0 0 0 ··· 0 a0
−1 x 0 0 ··· 0 a1
0 −1 x 0 ··· 0 a2 = xn + an−1 xn−1 + · · · + a0 .
.. .. .. .. .. ..
. . . . . .
0 0 0 0 · · · −1 an−1 + x

4.10. (a) Let x1 , x2 , y1 , y2 be real numbers. Show that

1 −(x1 + x2 ) x1 x2 0
0 1 −(x1 + x2 ) x1 x2
= (x1 − y1 )(x1 − y2 )(x2 − y1 )(x2 − y2 ).
1 −(y1 + y2 ) y1 y2 0
0 1 −(y1 + y2 ) y1 y2

(b) Show that the quadratic polynomials a0 x2 + a1 x + a2 and b0 x2 + b1 x + b2


have a common zero if and only if

a0 a1 a2 0
0 a0 a1 a2
= 0.
b0 b1 b2 0
0 b0 b1 b2

Hint: Use the relationship between roots and coefficients.


4.11. The Vandermonde determinant of order n is
1 x1 x21 · · · x1n−1
1 x2 x22 · · · x2n−1
Vn (x1 , x2 , . . . , xn ) = . . .. .. .
.. .. . .
1 xn x2n · · · xnn−1

Show by induction on n that


Y
Vn (x1 , x2 , . . . , xn ) = (xk − xj ).
j<k

Hint for the induction step: Perform column operations to obtain a determinant
with zeros in all positions of the first row except the first position.


4.12. Show that the determinant of an orthogonal matrix equals 1 or −1.


4.13. Suppose that there exist invertible n×n matrices A and B such that AB = −BA.
Show that n is even.
4.14. Find, for every real constant a, the dimensions of im A and ker A where
 
\[ A = \begin{pmatrix} 1 & 2 & 2a-1 \\ 1 & a & 3 \\ 2a-3 & 2 & 3 \end{pmatrix}. \]

4.15. Show that the matrix
\[ A = \begin{pmatrix} 1 & 3 & 4 & 2 \\ 4 & 1 & 2 & 3 \\ 2 & 3 & 1 & 4 \\ 3 & 2 & 4 & 1 \end{pmatrix} \]
is invertible and compute det A^{-1}.
4.16. For which values of x do the following vectors form a basis for R4 ?

(2, 1, 1, x), (1, 2, x, 1), (1, x, 2, 1), (x, 1, 1, 2).

4.17. Let
\[ A = \begin{pmatrix} a & b & c & d \\ b & -a & d & -c \\ c & -d & -a & b \\ d & c & -b & -a \end{pmatrix}. \]
Show that det A = 0 only if a = b = c = d = 0. Hint: Consider AA^t.
4.18. Let V be the volume of the parallelepiped spanned by the vectors u, v and w in
3-space. Show that
\[
V^2 = \begin{vmatrix}
\langle u, u\rangle & \langle u, v\rangle & \langle u, w\rangle \\
\langle v, u\rangle & \langle v, v\rangle & \langle v, w\rangle \\
\langle w, u\rangle & \langle w, v\rangle & \langle w, w\rangle
\end{vmatrix}.
\]
Hint: Choose an orthonormal basis and use the product theorem.
4.19. Use Theorem 4.24 to find the inverses of the following matrices.
 
\[
\text{(a) } \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad
\text{(b) } \begin{pmatrix} 1 & 1 & 2 \\ 0 & 3 & 1 \\ 2 & 1 & 0 \end{pmatrix}.
\]

5 Linear Transformations
According to Definition 2.64 on page 26, a linear transformation F from U to V is a
function F : U → V such that

F (su + tv) = sF (u) + tF (v)

for all u and v in U and all real numbers s and t. If U = V , we also say that F is a
linear transformation on V . We shall here study linear transformations on a linear space
in more detail.

5.1 Matrix Representations of Linear Transformations


Definition 5.1. Let V be a finite-dimensional linear space with basis e1 , . . . , en and let
F be a linear transformation on V . The n×n matrix A whose columns are the coordinate
vectors of the images F (e1 ), . . . , F (en ) with respect to the basis e1 , . . . , en is called the
matrix of F with respect to the basis e1 , . . . , en .

Assume that the coordinates of u, F (u) and F (ek ), k = 1, . . . , n, are


     
\[
X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \qquad\text{and}\qquad
A_k = \begin{pmatrix} a_{1k} \\ \vdots \\ a_{nk} \end{pmatrix},
\]

respectively. By this assumption and the linearity of F , we have

F (u) = F (x1 e1 + · · · + xn en ) = x1 F (e1 ) + · · · + xn F (en ).

Hence, by the uniqueness of coordinates,

Y = x1 A1 + · · · + xn An = AX

where A is the matrix of F with respect to the basis e1 , . . . , en .

Definition 5.2. Let V be a linear space. The linear transformation I on V defined by


I(u) = u for u ∈ V is called the identity mapping on V .

Example 5.3. Let I be the identity mapping on an n-dimensional linear space V and
let e1 , . . . , en be any basis for V . Since I(ek ) = ek for k = 1, . . . , n, the matrix of I with
respect to the basis is the unit matrix I of order n.

Example 5.4. Let there be given an orthonormal basis e1 , e2 for the plane and let F
be rotation about the origin through an angle θ. It is apparent from the figure below
that F is a linear transformation from the plane to itself. We also see that

F (e1 ) = (cos θ)e1 + (sin θ)e2 ,


F (e2 ) = (− sin θ)e1 + (cos θ)e2 .

[Figure: on the left, the images F(u), F(v), F(u + v) and F(su) illustrate the linearity of F; on the right, the rotated basis vectors F(e1) and F(e2) make the angle θ with e1 and e2.]

Hence, the matrix of F with respect to e1 , e2 is


 
\[ A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

If θ = π/3, then
\[ A = \frac{1}{2}\begin{pmatrix} 1 & -\sqrt{3} \\ \sqrt{3} & 1 \end{pmatrix}, \]
and the vector u with coordinates (1, 2) is mapped to the vector F(u) with coordinates
\[ A\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 - 2\sqrt{3} \\ \sqrt{3} + 2 \end{pmatrix}. \]
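A small numerical sketch of this computation (our own NumPy code, not part of the text):

```python
import numpy as np

theta = np.pi / 3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # rotation through the angle theta

u = np.array([1.0, 2.0])
print(A @ u)   # [(1 - 2*sqrt(3))/2, (sqrt(3) + 2)/2], approximately [-1.232, 1.866]
```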

Example 5.5. Let e1 , e2 , e3 be an orthonormal, positively oriented basis for 3-space.


Let F be rotation about the x3 -axis through an angle θ. The rotation appears anti-
clockwise on looking down the e3 -axis towards the origin if θ > 0, clockwise if θ < 0.
Then

F (e1 ) = (cos θ)e1 + (sin θ)e2 ,


F (e2 ) = (− sin θ)e1 + (cos θ)e2 ,
F (e3 ) = e3 .

The matrix is therefore
\[ \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

Example 5.6. Consider the linear transformation F on P3 defined by

\[ F(p) = \frac{d^2 p}{dx^2} - \frac{dp}{dx}. \]


The basis vectors e1 = 1, e2 = x, e3 = x2 , e4 = x3 are mapped to

F (e1 ) = 0,
F (e2 ) = −1 = −e1 ,
F (e3 ) = 2 − 2x = 2e1 − 2e2 ,
F (e4 ) = 6x − 3x2 = 6e2 − 3e3 .

Hence, the matrix of F with respect to the given basis is


 
\[ \begin{pmatrix} 0 & -1 & 2 & 0 \\ 0 & 0 & -2 & 6 \\ 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \]

Example 5.7. Let there be given an orthonormal basis e1 , e2 , e3 for 3-space and let F
be orthogonal projection on the plane 2x1 + 2x2 + x3 = 0. By Theorem 2.60, F is a
linear transformation. The vector e = (1/3)(2, 2, 1) is a unit normal vector to the plane.
The orthogonal projections of the basis vectors on the normal of the plane are
\[ e''_1 = \langle e_1, e\rangle e = \frac{1\cdot 2 + 0\cdot 2 + 0\cdot 1}{3}\cdot\frac{1}{3}(2,2,1) = \frac{2}{9}(2,2,1), \]
\[ e''_2 = \langle e_2, e\rangle e = \frac{0\cdot 2 + 1\cdot 2 + 0\cdot 1}{3}\cdot\frac{1}{3}(2,2,1) = \frac{2}{9}(2,2,1), \]
\[ e''_3 = \langle e_3, e\rangle e = \frac{0\cdot 2 + 0\cdot 2 + 1\cdot 1}{3}\cdot\frac{1}{3}(2,2,1) = \frac{1}{9}(2,2,1). \]
Hence, their projections on the plane are
\[ e'_1 = e_1 - e''_1 = (1,0,0) - \frac{2}{9}(2,2,1) = \frac{1}{9}(5,-4,-2), \]
\[ e'_2 = e_2 - e''_2 = (0,1,0) - \frac{2}{9}(2,2,1) = \frac{1}{9}(-4,5,-2), \]
\[ e'_3 = e_3 - e''_3 = (0,0,1) - \frac{1}{9}(2,2,1) = \frac{1}{9}(-2,-2,8), \]
and the matrix of F with respect to the given basis is
 
\[ \frac{1}{9}\begin{pmatrix} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{pmatrix}. \]
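The same matrix is obtained from the rank-one formula P = I − e e^t for orthogonal projection along the unit normal e; a NumPy sketch of ours:

```python
import numpy as np

n = np.array([2.0, 2.0, 1.0])          # normal of the plane 2x1 + 2x2 + x3 = 0
e = n / np.linalg.norm(n)              # unit normal

P = np.eye(3) - np.outer(e, e)         # u  ->  u - <u, e> e
print(9 * P)                           # [[ 5. -4. -2.] [-4.  5. -2.] [-2. -2.  8.]]
```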

Let F be a linear transformation on V , e1 , . . . , en a basis for V and A an n × n matrix.


If Ax = y whenever

F (x1 e1 + · · · + xn en ) = y1 e1 + · · · + yn en ,

then A is the matrix of F with respect to the basis. In fact, by setting x = εi we find
that the ith column of A is the coordinate vector of F (ei ). We use this observation in
the next example.


Example 5.8. Let us find the matrix of projection on the plane U ′ : x1 + 2x2 + 3x3 = 0
along the line U ′′ : x = t(2, 1, 2) in Example 2.63. Let the coordinates of u be (ξ1 , ξ2 , ξ3 )
and form the line x = (ξ1 , ξ2 , ξ3 ) + t(2, 1, 2). The image u′ of u is the intersection of this
line with the plane. The intersection is given by
\[ \xi_1 + 2t + 2(\xi_2 + t) + 3(\xi_3 + 2t) = 0 \quad\Leftrightarrow\quad t = -\frac{\xi_1 + 2\xi_2 + 3\xi_3}{10}. \]
[Figure: the line U′′, the plane U′, and the decomposition u = u′ + u′′ with u′ ∈ U′ and u′′ ∈ U′′.]

Hence,
\[
u' = (\xi_1, \xi_2, \xi_3) + t(2,1,2)
= \frac{1}{10}\bigl((10\xi_1, 10\xi_2, 10\xi_3) - (2\xi_1 + 4\xi_2 + 6\xi_3,\; \xi_1 + 2\xi_2 + 3\xi_3,\; 2\xi_1 + 4\xi_2 + 6\xi_3)\bigr)
\]
\[
= \frac{1}{10}(8\xi_1 - 4\xi_2 - 6\xi_3,\; -\xi_1 + 8\xi_2 - 3\xi_3,\; -2\xi_1 - 4\xi_2 + 4\xi_3).
\]
The matrix is therefore
\[ \frac{1}{10}\begin{pmatrix} 8 & -4 & -6 \\ -1 & 8 & -3 \\ -2 & -4 & 4 \end{pmatrix}. \]
Theorem 5.9. Let F and G be linear transformations on a linear space V with basis
e1 , . . . , en and assume that the matrices of F and G are A and B, respectively. Then
the composition F G is a linear transformation on V with matrix AB.
Proof. F G is a linear transformation by Theorem 2.69. Assume that w = F G(u) where
the coordinates of w and u are z and x, respectively. Set v = G(u) and let the coordin-
ates of v be y. Then w = F (v), z = Ay, y = Bx, and therefore z = ABx.
From Theorem 2.73 we know that a linear transformation F on a finite-dimensional linear
space V is one-to-one if and only if it is onto. In that case Theorem 2.71 yields that F −1
is a linear transformation on V .
Theorem 5.10. Let F be a linear transformation on a linear space V with basis
e1 , . . . , en and let A be its matrix. Then F is invertible if and only if A is invertible, and
in that case the matrix of F −1 is A−1 .
Proof. The invertibility of F means that the equation F (u) = v has a unique solution
for every v ∈ V . This is equivalent to the equation Ax = y having a unique solution
for every y ∈ Rn , which by Theorem 1.19 means that A is invertible. Assume that F
is invertible and let B be the matrix of F −1 . Since F F −1 = I is the identity mapping,
Theorem 5.9 gives that AB = I, and hence B = A−1 .


Example 5.11. Let F1 and F2 be rotations in 2-space about the origin through the
angles θ1 and θ2 , respectively. Clearly, F = F1 F2 is rotation about the origin through
the angle θ = θ1 + θ2 . From this and Example 5.4, we get
    
\[
\begin{pmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{pmatrix}
\begin{pmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{pmatrix}
= \begin{pmatrix} \cos(\theta_1+\theta_2) & -\sin(\theta_1+\theta_2) \\ \sin(\theta_1+\theta_2) & \cos(\theta_1+\theta_2) \end{pmatrix}.
\]

This can, of course, also be shown by using the angle addition formulae. In particular,
F1 F2 = I when θ1 = −θ2 . Hence,
\[
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}^{-1}
= \begin{pmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{pmatrix}
= \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}
\]

for all real numbers θ.

5.2 Change of Basis


Let e1 , . . . , en and e′1 , . . . , e′n be two bases for the linear space V and suppose that

e′1 = t11 e1 + t21 e2 + · · · + tn1 en ,


e′2 = t12 e1 + t22 e2 + · · · + tn2 en ,
..
.
e′n = t1n e1 + t2n e2 + · · · + tnn en .

Assume that the coordinates of u with respect to the two bases are x = (x1 , . . . , xn ) and
x′ = (x′1 , . . . , x′n ), respectively. Then

u = x1 e1 + · · · + xn en and u = x′1 e′1 + · · · + x′n e′n .

The second equality gives that

u = x′1 (t11 e1 + t21 e2 + · · · + tn1 en ) + · · · + x′n (t1n e1 + t2n e2 + · · · + tnn en )


= (x′1 t11 + · · · + x′n t1n )e1 + · · · + (x′1 tn1 + · · · + x′n tnn )en .

Since the coordinates of u are unique, the first equality now gives that

x1 = t11 x′1 + t12 x′2 + · · · + t1n x′n ,


x2 = t21 x′1 + t22 x′2 + · · · + t2n x′n ,
..
.
xn = tn1 x′1 + tn2 x′2 + · · · + tnn x′n .

Hence, x = T x′ where T is the matrix whose columns are the coordinate vectors of
e′1 , . . . , e′n with respect to the basis e1 , . . . , en . Conversely, if x = T x′ whenever

x1 e1 + · · · + xn en = x′1 e′1 + · · · + x′n e′n ,


then the kth column of T must be the coordinate vector of e′k with respect to e1 , . . . , en .
This can be seen by setting x′ = εk . We call T the transition matrix from basis e′1 , . . . , e′n
to basis e1 , . . . , en .
If S is the transition matrix from a basis e′′1 , . . . , e′′n to e′1 , . . . , e′n and if

x1 e1 + · · · + xn en = x′1 e′1 + · · · + x′n e′n = x′′1 e′′1 + · · · + x′′n e′′n ,

then x = T x′ = T Sx′′ . Hence T S is the transition matrix from e′′1 , . . . , e′′n to e1 , . . . , en .


If e′′k = ek for k = 1, . . . , n, then T S = I is the transition matrix from e1 , . . . , en to itself.
This shows that the transition matrix T from e′1 , . . . , e′n to e1 , . . . , en is invertible and
that its inverse S = T −1 is the transition matrix from e1 , . . . , en to e′1 , . . . , e′n .

Example 5.12. Consider the bases e1 = (1, 2, 1), e2 = (1, 1, 2), e3 = (1, 4, 0) and
ε1 , ε2 , ε3 for R3 in Example 2.52. Here

e1 = 1ε1 + 2ε2 + 1ε3 ,


e2 = 1ε1 + 1ε2 + 2ε3 ,
e3 = 1ε1 + 4ε2 + 0ε3 .

The transition matrix from e1 , e2 , e3 to ε1 , ε2 , ε3 is therefore


 
\[ T = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & 4 \\ 1 & 2 & 0 \end{pmatrix}. \]

In order to find the coordinates x = (x1 , x2 , x3 ) of u = (3, 7, 3) = 3ε1 + 7ε2 + 3ε3 with
respect to e1 , e2 , e3 , we can solve the system
   
\[ T\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \\ 3 \end{pmatrix}. \]

We did that in Example 2.52, where we obtained x = (1, 1, 1). Another possibility would
be to compute the transition matrix
 
\[ T^{-1} = \begin{pmatrix} 8 & -2 & -3 \\ -4 & 1 & 2 \\ -3 & 1 & 1 \end{pmatrix} \]

from ε1 , ε2 , ε3 to e1 , e2 , e3 and then get the coordinates


    
\[ \begin{pmatrix} 8 & -2 & -3 \\ -4 & 1 & 2 \\ -3 & 1 & 1 \end{pmatrix}\begin{pmatrix} 3 \\ 7 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \]

by matrix multiplication.
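Both routes are easy to carry out numerically; the following sketch (our own NumPy code) solves Tx = u and compares with multiplication by T^{-1}.

```python
import numpy as np

T = np.array([[1.0, 1.0, 1.0],     # columns: e1, e2, e3 expressed in the standard basis
              [2.0, 1.0, 4.0],
              [1.0, 2.0, 0.0]])
u = np.array([3.0, 7.0, 3.0])

x = np.linalg.solve(T, u)          # coordinates of u with respect to e1, e2, e3
print(x)                           # [1. 1. 1.]
print(np.linalg.inv(T) @ u)        # the same coordinates via the inverse transition matrix
```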


Theorem 5.13. Let V be an inner product space. If T is the transition matrix from an
orthonormal basis e′1 , . . . , e′n to an orthonormal basis e1 , . . . , en , then T is orthogonal.

Proof. Let ti = (t1i , . . . , tni ) be the coordinates of e′i with respect to e1 , . . . , en . Since
e1 , . . . , en is an orthonormal basis, it follows from Theorem 3.18 that he′i , e′j i = ti · tj .
Since also e′1 , . . . , e′n is an orthonormal basis, we find that t1 , . . . , tn , and hence the
columns of T , form an orthonormal set in Rn .

Let F be a linear transformation on a linear space V and assume that the matrix of
F is A with respect to a basis e1 , . . . , en for V . Then Ax = y whenever
F (x1 e1 + · · · + xn en ) = y1 e1 + · · · + yn en .
Let us introduce a new basis e′1 , . . . , e′n for V . Denote the transition matrix from
e′1 , . . . , e′n to e1 , . . . , en by T . If
F (x′1 e′1 + · · · + x′n e′n ) = y1′ e′1 + · · · + yn′ e′n ,
then AT x′ = T y ′ and hence T −1AT x′ = y ′ . This shows that the matrix of F with
respect to the new basis is A′ = T −1AT . Thus we have proved the following theorem.
Theorem 5.14. Let F : V → V be a linear transformation with matrix A with respect
to a basis e1 , . . . , en and let T be the transition matrix from a basis e′1 , . . . , e′n to
e1 , . . . , en . Then the matrix of F with respect to e′1 , . . . , e′n is
A′ = T −1AT. (5.1)

When V is an inner product space and both bases are orthonormal, it follows from
Theorem 5.13 that (5.1) can also be written as A′ = T tAT .
If A and A′ are the matrices of F with respect to two bases, then A′ = T −1AT for
some matrix T , and hence det A′ = det (T −1AT ) = det T −1 det A det T = det A. This
justifies the following definition.
Definition 5.15. Let F be a linear transformation on a finite-dimensional non-zero
linear space V . We define the determinant det F as det A where A is the matrix of F
with respect to any basis for V .

Example 5.16. Let e1 , e2 , e3 be an orthonormal, positively oriented basis for 3-space.


Let F be rotation about the line x = t(2, 1, 2) through the angle π/3 and suppose that the rotation appears anticlockwise on looking from the point (2, 1, 2) towards the origin. We set out to find the matrix A of F with respect to the given basis. To that end we introduce a new orthonormal, positively oriented basis e′1, e′2, e′3. We choose e′3 as the unit direction vector (1/3)(2, 1, 2) of the line. Then we take e′2 as any unit vector orthogonal to e′3, for example e′2 = (1/3)(1, 2, −2). Finally, we set e′1 = e′2 × e′3 = (1/3)(2, −2, −1). By
Example 5.5, the matrix of F with respect to e′1 , e′2 , e′3 is
   √ 
cos π3 − sin π3 0 1 − 3 0
1 √
A′ =  sin π3 cos π3 0 =  3 1 0 .
2
0 0 1 0 0 2


The transition matrix from e′1 , e′2 , e′3 to e1 , e2 , e3 is


 
\[ T = \frac{1}{3}\begin{pmatrix} 2 & 1 & 2 \\ -2 & 2 & 1 \\ -1 & -2 & 2 \end{pmatrix}. \]

Since both bases are orthonormal, A′ = T^tAT and hence
\[
A = TA'T^{t} = \frac{1}{18}\begin{pmatrix}
13 & 2 - 6\sqrt{3} & 4 + 3\sqrt{3} \\
2 + 6\sqrt{3} & 10 & 2 - 6\sqrt{3} \\
4 - 3\sqrt{3} & 2 + 6\sqrt{3} & 13
\end{pmatrix}.
\]
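The change of basis itself is mechanical and can be reproduced numerically. In the sketch below (our own NumPy code) the columns of T are the new basis vectors and A = TA′T^t; as a check, the axis (2, 1, 2) is fixed by A.

```python
import numpy as np

c, s = np.cos(np.pi / 3), np.sin(np.pi / 3)
A_prime = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])               # rotation about the e3'-axis

T = np.column_stack([np.array([2, -2, -1]) / 3.0,   # e1'
                     np.array([1,  2, -2]) / 3.0,   # e2'
                     np.array([2,  1,  2]) / 3.0])  # e3', the rotation axis

A = T @ A_prime @ T.T
print(np.allclose(A @ np.array([2.0, 1.0, 2.0]), [2.0, 1.0, 2.0]))   # the axis is fixed: True
print(18 * A)                                        # compare with the matrix above
```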

5.3 Projections and Reflections


Suppose that V = U ′ ⊕ U ′′ is the direct sum of two subspaces of V and let u′ ∈ U ′ and
u′′ ∈ U ′′ be the projections of u defined on page 24. The function P : V → V defined
by P (u) = u′ for u ∈ V is called the projection of V on U ′ along U ′′ . By Theorem 2.60,
P is a linear transformation on V .
Since P (u′ + u′′ ) = u′ for all u′ ∈ U ′ and all u′′ ∈ U ′′ , we see that im P = U ′ and
ker P = U ′′ .

Theorem 5.17. Let P be a linear transformation on a linear space V . Then P is a


projection if and only if P 2 = P .

Proof. Suppose that P is projection on U ′ along U ′′ and let u ∈ V . Then

P 2 (u) = P (P (u)) = P (P (u′ + u′′ )) = P (u′ ) = P (u).

To show the converse, we assume that P 2 = P . Let u be any vector of V and set
u′ = P (u) and u′′ = u − u′ . Then u = u′ + u′′ and u′ ∈ im P . Since

P (u′′ ) = P (u) − P (u′ ) = P (u) − P 2 (u) = P (u) − P (u) = 0,

we also see that u′′ ∈ ker P . Hence V = im P + ker P . We show that V = im P ⊕ ker P
by showing that im P ∩ ker P = {0}. If u ∈ im P ∩ ker P , then u = P (v) for some v ∈ V
and u = P (v) = P 2 (v) = P (u) = 0. Hence, V = im P ⊕ ker P by Theorem 2.61. Now
let u = u′ + u′′ where u′ ∈ im P and u′′ ∈ ker P . Then u′ = P (v) for some v ∈ V ,
and therefore P (u) = P (u′ ) + P (u′′ ) = P (P (v)) = P (v) = u′ . This shows that P is
projection on im P along ker P .

Corollary 5.18. Let V be a linear space with basis e1 , . . . , en and let A be the matrix
of a linear transformation P on V . Then P is a projection if and only if A2 = A.

Note that if P is projection on U ′ along U ′′ , then I − P is projection on U ′′ along U ′ .


Hence U ′ = im P = ker (I − P ) = ker (P − I). We use this observation in the next
example.


Example 5.19. We set out to show that


 
\[ A = \frac{1}{3}\begin{pmatrix} 4 & 3 & 2 \\ -2 & -3 & -4 \\ 1 & 3 & 5 \end{pmatrix} \]

is the matrix of a projection on U ′ along U ′′ where U ′ and U ′′ are subspaces of R3 . By


the corollary, it is sufficient to show that A2 = A, and indeed,
    
\[
A^2 = \frac{1}{9}\begin{pmatrix} 4 & 3 & 2 \\ -2 & -3 & -4 \\ 1 & 3 & 5 \end{pmatrix}
\begin{pmatrix} 4 & 3 & 2 \\ -2 & -3 & -4 \\ 1 & 3 & 5 \end{pmatrix}
= \frac{1}{9}\begin{pmatrix} 12 & 9 & 6 \\ -6 & -9 & -12 \\ 3 & 9 & 15 \end{pmatrix} = A.
\]

Since U ′ = ker (P − I), we get U ′ by solving the system (A − I)x = 0 ⇔ 3(A − I)x = 0.
 
\[
\begin{pmatrix} 1 & 3 & 2 & 0 \\ -2 & -6 & -4 & 0 \\ 1 & 3 & 2 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 1 & 3 & 2 & 0 \end{pmatrix}.
\]

Hence, U ′ is the plane x1 + 3x2 + 2x3 = 0. In order to find U ′′ = ker A, we solve the
system Ax = 0.
   
\[
\begin{pmatrix} 4 & 3 & 2 & 0 \\ -2 & -3 & -4 & 0 \\ 1 & 3 & 5 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 4 & 3 & 2 & 0 \\ 2 & 0 & -2 & 0 \\ -3 & 0 & 3 & 0 \end{pmatrix}
\Leftrightarrow
x = t(1, -2, 1).
\]

Thus we have shown that A is the matrix of projection on the plane x1 + 3x2 + 2x3 = 0
along the line x = t(1, −2, 1).
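A quick numerical double-check of this example (our own NumPy sketch):

```python
import numpy as np

A = np.array([[4, 3, 2], [-2, -3, -4], [1, 3, 5]]) / 3.0

print(np.allclose(A @ A, A))                      # A^2 = A: a projection
print(np.allclose(A @ [1, -2, 1], [0, 0, 0]))     # the line x = t(1, -2, 1) is mapped to 0
print(np.allclose(A @ [3, -1, 0], [3, -1, 0]))    # (3, -1, 0) lies in x1 + 3x2 + 2x3 = 0 and is fixed
```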

Definition 5.20. Assume that V = U ′ ⊕ U ′′ and let P ′ be projection on U ′ along U ′′ .


The linear transformation R′ = 2P ′ − I is called reflection in U ′ along U ′′ .

Let P ′′ be projection on U ′′ along U ′ and let R′′ be reflection in U ′′ along U ′ . Then


I = P ′ + P ′′ , and hence R′ = 2(I − P ′′ ) − I = I − 2P ′′ = −R′′ . This means that R′ also
can be described as reflection in U ′′ along U ′ followed by reflection in the origin.

[Figure: u = u′ + u′′ with u′ ∈ U′ and u′′ ∈ U′′; the reflections R′(u) = u′ − u′′ and R′′(u) = u′′ − u′ = −R′(u).]

Theorem 5.21. Let R be a linear transformation on a linear space V . Then R is a


reflection if and only if R2 = I.


Proof. Assume that R = 2P − I is a reflection. Then R^2 = 4P^2 − 4P + I = I since P^2 = P. Assume that R^2 = I, and set P = (1/2)(R + I). Then R = 2P − I and P is a projection since P^2 = (1/4)(R^2 + 2R + I) = (1/2)(R + I) = P. Hence R is a reflection.

Corollary 5.22. Let V be a linear space with basis e1 , . . . , en and let A be the matrix
of a linear transformation R on V . Then R is a reflection if and only if A2 = I.

Theorem 5.23. Assume that V = U ′ ⊕ U ′′ and let R be reflection in U ′ along U ′′ . Then


U ′ = ker(R − I) and U ′′ = ker(R + I).

Proof. The statement follows from the fact that ker(R− I) = ker(−2P ′′ ) = ker(P ′′ ) = U ′
and ker(R + I) = ker(2P ′ ) = ker(P ′ ) = U ′′ .

Theorem 5.24. Let V = U ′ ⊕ U ′′ and assume that n = dim V > 0 and k = dim U ′ .
The determinant of the reflection F in U ′ along U ′′ is then (−1)n−k .

Proof. If k = n, then F = I, and the statement is true. If k = 0, then F = −I, and


det F = det (−I) = (−1)n det I = (−1)n . Otherwise, we can choose a basis e1 , . . . , ek
for U ′ and a basis ek+1 , . . . , en for U ′′ . Then e1 , . . . , en form a basis for V . We have
F (ei ) = ei for i = 1, . . . , k and F (ei ) = −ei for i = k + 1, . . . , n. The matrix A of F with
respect to this basis is a diagonal matrix with k diagonal entries 1 and n − k entries −1.
Therefore, det F = det A = 1k · (−1)n−k = (−1)n−k .

Definition 5.25. Let V be an inner product space and U a subspace. If V = U ⊕ U ⊥ ,


we call projection on U along U ⊥ orthogonal projection on U . We also call reflection
in U along U ⊥ orthogonal reflection in U .

Lemma 5.26. Assume that V = U ⊕W where V is an inner product space and U and W
are subspaces of V . If hu, wi = 0 for all u ∈ U and w ∈ W , then W = U ⊥ .

Proof. It is true that W ⊆ U ⊥ , for if w ∈ W , then hu, wi = 0 for all u ∈ U . It remains


to show the reverse inclusion. Assume that v ∈ U ⊥ . By assumption, v = u + w for some
vectors u ∈ U and w ∈ W . Since 0 = hu, vi = hu, ui + hu, wi = hu, ui, we have u = 0
and, therefore, v = w ∈ W . Hence U ⊥ ⊆ W .

Definition 5.27. Let F be a linear transformation on an inner product space V . We


say that F is symmetric if
hF (u), vi = hu, F (v)i
for all vectors u and v in V .

Theorem 5.28. Let F be a linear transformation on an inner product space V . Then


(i) F is an orthogonal projection if and only if F is a projection and F is symmetric,
(ii) F is an orthogonal reflection if and only if F is a reflection and F is symmetric.


Proof. (i) Assume that F is orthogonal projection on U . If u and v are vectors of V , we


write u = u′ + u′′ and v = v ′ + v ′′ where u′ and v′ belong to U and u′′ and v ′′ belong
to U ⊥ . Since F (u) = u′ and F (v) = v ′ are orthogonal to u′′ and v′′ , we have
hF (u), vi = hu′ , v ′ + v ′′ i = hu′ , v ′ i + hu′ , v ′′ i = hu′ , v ′ i,
hu, F (v)i = hu′ + u′′ , v ′ i = hu′ , v ′ i + hu′′ , v ′ i = hu′ , v ′ i.
Hence, hF (u), vi = hu, F (v)i for all u and v in V , and F is symmetric.
To prove the converse, we assume that F is a symmetric projection on im F along
ker F where V = im F ⊕ ker F . Assume that u = F (v) ∈ im F and w ∈ ker F . Then
hu, wi = hF (v), wi = hv, F (w)i = hv, 0i = 0,
and it follows from Lemma 5.26 that ker F = (im F )⊥ . Consequently, F is an orthogonal
projection.
(ii) The statement about reflections follows from the fact that R = 2P −I is symmetric
if and only if P is symmetric.

If we think of vectors x ∈ Rn as column matrices, then x · y = xt y.


Lemma 5.29. Let A and B be n × n matrices. Then A = B if and only if xt Ay = xt By
for all x and y in Rn .

Proof. If xt Ay = xt By for all x and y in Rn , this is true for x = εi and y = εk . Hence,


Aik = εti Aεk = εti Bεk = Bik for all indices i, k. The reverse implication is trivial.

Theorem 5.30. Let F be a linear transformation on an inner product space V with


orthonormal basis e1 , . . . , en and let A be the matrix of F with respect to that basis.
Then F is symmetric if and only if A is symmetric.

Proof. If x and y are the coordinates of u and v, respectively, then


hF (u), vi = hu, F (v)i ⇔ (Ax)t y = xt Ay ⇔ xt At y = xt Ay.
Hence, F is symmetric if and only if xt At y = xt Ay for all x and y in Rn . By the lemma,
this is equivalent to At = A.

Example 5.31. Let there be given an orthonormal basis e1 , e2 , e3 for a 3-dimensional


inner product space V and consider the linear transformation F with matrix
 
\[ A = \frac{1}{9}\begin{pmatrix} 7 & 4 & -4 \\ 4 & 1 & 8 \\ -4 & 8 & 1 \end{pmatrix}. \]
We wish to show that F is an orthogonal reflection. Since A is a symmetric matrix and
the basis is orthonormal, F is a symmetric linear transformation. Since
    
\[
A^2 = \frac{1}{81}\begin{pmatrix} 7 & 4 & -4 \\ 4 & 1 & 8 \\ -4 & 8 & 1 \end{pmatrix}
\begin{pmatrix} 7 & 4 & -4 \\ 4 & 1 & 8 \\ -4 & 8 & 1 \end{pmatrix}
= \frac{1}{81}\begin{pmatrix} 81 & 0 & 0 \\ 0 & 81 & 0 \\ 0 & 0 & 81 \end{pmatrix} = I,
\]


F is also a reflection. Hence, F is an orthogonal reflection. We also want to find the


subspace U = ker (F − I) of reflection. To do so, we solve the system (A − I)x = 0.
 
\[
\begin{pmatrix} -2 & 4 & -4 & 0 \\ 4 & -8 & 8 & 0 \\ -4 & 8 & -8 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 1 & -2 & 2 & 0 \end{pmatrix}.
\]

Hence, F is orthogonal reflection in the plane x1 − 2x2 + 2x3 = 0.
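Again the claims are easy to verify numerically (our own sketch, assuming NumPy):

```python
import numpy as np

A = np.array([[7, 4, -4], [4, 1, 8], [-4, 8, 1]]) / 9.0

print(np.allclose(A, A.T))                        # symmetric
print(np.allclose(A @ A, np.eye(3)))              # A^2 = I: a reflection
print(np.allclose(A @ [1, -2, 2], [-1, 2, -2]))   # the normal (1, -2, 2) is reversed
```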

Note that the classes of linear transformations we have discussed so far are not ex-
clusive. For example, the identity mapping I on a 2-dimensional inner product space V
is rotation through the angle 0 about the origin, orthogonal projection on V as well as
orthogonal reflection in V .

5.4 Isometries
Definition 5.32. Let F be a linear transformation on an inner product space V . We
say that F is an isometry if kF (u)k = kuk for all u ∈ V .

Theorem 5.33. F is an isometry if and only if hF (u), F (v)i = hu, vi for all u and v
in V .

Proof. If hF (u), F (v)i = hu, vi for all u and v, then in particular,

kF (u)k2 = hF (u), F (u)i = hu, ui = kuk2 , u ∈ V,

and hence F is an isometry.


Suppose that F is an isometry. Then it follows from the identity

ku + vk2 = kuk2 + kvk2 + 2hu, vi

that

2hF (u), F (v)i = kF (u) + F (v)k2 − kF (u)k2 − kF (v)k2


= kF (u + v)k2 − kF (u)k2 − kF (v)k2
= ku + vk2 − kuk2 − kvk2 = 2hu, vi,

and therefore hF (u), F (v)i = hu, vi for all u and v.

Hence, an isometry preserves lengths and inner products. Since

\[ \frac{\langle F(u), F(v)\rangle}{\|F(u)\|\,\|F(v)\|} = \frac{\langle u, v\rangle}{\|u\|\,\|v\|} \]

for non-zero vectors u and v, it also preserves angles.


Theorem 5.34. Let A be the matrix of a linear transformation F on an inner product


space V with respect to an orthonormal basis e1 , . . . , en for V . Then F is an isometry if
and only if A is an orthogonal matrix.

Proof. Let u and v be vectors of V and x and y their coordinates. Since the basis is
orthonormal, hu, vi = xt y = xt Iy and hF (u), F (v)i = (Ax)t Ay = xt At Ay. Hence,
by Theorem 5.33, F is an isometry if and only if xt At Ay = xt Iy for all x and y. By
Lemma 5.29, this in turn is equivalent to At A = I.

Corollary 5.35. If F is an isometry on a finite-dimensional non-zero inner product


space, then det F = ±1.

Proof. The matrix A of F with respect to an orthonormal basis is orthogonal. Hence,

(det A)2 = det At det A = det (At A) = det I = 1

from which it follows that det F = det A = ±1.

Note that A need not be orthogonal even if det A = ±1. Hence, the converse of the
statement of the theorem is not true.
Theorem 5.36. Let F be orthogonal reflection in a subspace U of an inner product
space V = U ⊕ U ⊥ . Then F is an isometry.

Proof. We have F = I − 2P where P is orthogonal projection on U ⊥ . Hence

kF (u)k2 = ku − P (u) − P (u)k2 = ku − P (u)k2 + kP (u)k2 = kuk2

by the Pythagorean theorem.

Theorem 5.37. Let F be an isometry on an inner product space V . Then F is an


orthogonal reflection if and only if F is symmetric.

Proof. If F is an orthogonal reflection, then F is symmetric by Theorem 5.28. To prove


the converse, we assume that F is a symmetric isometry. By the same theorem, it is then
sufficient to show that F is a reflection. Hence, by Theorem 5.21, it is sufficient to show
that F 2 = I, and this follows from the fact that

kF 2 (u) − uk2 = kF 2 (u)k2 + kuk2 − 2hF 2 (u), ui = kuk2 + kuk2 − 2hF (u), F (u)i
= 2kuk2 − 2kF (u)k2 = 0, u ∈ V.

5.4.1 Isometries in Two Dimensions


Let F be an isometry on a 2-dimensional inner product space V and choose an orthonor-
mal basis e1 , e2 for V . Denote by θ the angle between e1 and F (e1 ). Since e1 and e2 are
orthogonal and of length 1, so are F (e1 ) and F (e2 ). This leaves us with the following
two possibilities:


[Figure: the two possibilities for F(e1), F(e2) relative to e1, e2: a rotation through the angle θ, and a reflection in the line through the origin making the angle θ/2 with e1.]

In the first case, F is rotation through the angle θ about the origin. As we saw in
Example 5.4, the matrix of F is
 
\[ A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]
Hence, det F = det A = cos^2 θ + sin^2 θ = 1 in this case.
In the second case, F is reflection in the bisector of the angle between e1 and F (e1 ).
Comparing with the previous case, we understand that F (e2 ) here is obtained from F (e2 )
there by a change of sign. Hence, the matrix is
 
\[ A = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}, \]
and det F = det A = −1.
Theorem 5.38. An isometry F on a 2-dimensional inner product space is either a ro-
tation about the origin or reflection in a line. In the first case, det F = 1, and in the
second case, det F = −1.

5.4.2 Isometries in Three Dimensions


Theorem 5.39. An isometry F on a 3-dimensional inner product space V is either
rotation about a line through the origin or rotation about such a line followed by reflection
in the origin. In the first case, det F = 1, and in the second case, det F = −1.
Proof. Let A be the matrix of F with respect to a basis for V and λ a real number. The
determinant
\[
\det(F - \lambda I) = \det(A - \lambda I) =
\begin{vmatrix}
a_{11}-\lambda & a_{12} & a_{13} \\
a_{21} & a_{22}-\lambda & a_{23} \\
a_{31} & a_{32} & a_{33}-\lambda
\end{vmatrix}
\]
is a polynomial of degree 3 in λ. Hence, it has a real zero λ. For this λ there exists
a non-zero x ∈ R3 such that (A − λI)x = 0 or, equivalently, Ax = λx. Thus there
exists a non-zero vector v ∈ V such that F (v) = λv. Since F is an isometry, we have
|λ| kvk = kλvk = kF (v)k = kvk, whence λ = ±1. It follows that dim (ker (F − I)) > 0
or dim (ker (F + I)) > 0. Let U = ker (F − I) and n = dim U .
If n = 3, then F (u) = u for all u ∈ V . Hence, in this case, F = I is the identity
mapping, which can be regarded as rotation about any line through the angle 0.
Assume that n = 2 and let u be a non-zero vector orthogonal to U . Then F (u) is
orthogonal to U and kF (u)k = kuk since F preserves inner products and norms. Since


u ∉ U, we have F(u) ≠ u, and it follows that F(u) = −u. Hence F is orthogonal


reflection in the plane U . In this case, F can also be regarded as rotation through the
angle π about the line through the origin orthogonal to U followed by reflection in the
origin.
Now assume that n = 1 so that U is a line through the origin. If u ∈ U ⊥ , then
F (u) ∈ U ⊥ since F preserves inner products. The restriction of F to the two-dimensional
space U ⊥ is therefore an isometry on U ⊥ . This restriction is therefore either a rotation
about the origin or reflection in a line contained in U ⊥ . In the latter case, there would
exist a non-zero vector u ∈ U ⊥ such that F (u) = u, contradicting the assumption that
n = 1. Hence F is rotation about the line U in this case.
If n = 0, then dim (ker (−F − I)) = dim (ker (F + I)) > 0. It follows from what
we have proved that −F is rotation about a line or rotation about a line followed by
reflection in the origin, and so is F = −(−F ).
When F is a rotation, its matrix with respect to an orthonormal basis for V consisting
of two vectors e1 and e2 orthogonal to the line and one vector e3 parallel to the line is
of the form
\[ A = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Hence, det F = det A = 1 when F is a rotation. In the other case, −F is a rotation, and
therefore det F = (−1)3 det (−F ) = (−1) · 1 = −1.

Suppose that we are given the matrix A of a linear transformation F on a 3-dimensional


inner product space V with respect to an orthonormal basis. If we are asked to show
that F is an isometry and to find its geometric meaning, we can proceed as follows:

• Establish that F is an isometry by showing that A is orthogonal.

• Find U = ker (F − I) by solving the system (A − I)x = 0.


• Let n = dim U and take the appropriate action.
n = 3: In this case F = I, and we have completed the task.
n = 2: In this case F is orthogonal reflection in the plane U .
n = 1: In this case, F is rotation about the line U. Let w be a direction vector of U, take any vector u orthogonal to w and compute v = F(u). The angle of rotation then equals the angle between u and v. If the vectors u, v, w are positively oriented, then the rotation appears anticlockwise on looking in the opposite direction of the vector w. Otherwise, it appears clockwise. [Figure: the line U with direction vector w and the vectors u and v = F(u) in U⊥ separated by the angle θ.]
n = 0: Set G = −F , find U = ker (G − I) = ker (F + I) and set n = dim U .
Then n = 3 or n = 1. If n = 3, then F = −I. If n = 1, then F is the
rotation G followed by reflection in the origin.


Example 5.40. Consider the linear transformation F with matrix


 
\[ A = \frac{1}{3}\begin{pmatrix} 2 & -1 & 2 \\ 2 & 2 & -1 \\ -1 & 2 & 2 \end{pmatrix} \]

with respect to an orthonormal, positively oriented basis for 3-space. Since AAt = I,
A is orthogonal, and therefore F is an isometry. We solve the system Ax = x.
\[
\begin{cases}
2x_1 - x_2 + 2x_3 = 3x_1 \\
2x_1 + 2x_2 - x_3 = 3x_2 \\
-x_1 + 2x_2 + 2x_3 = 3x_3
\end{cases}
\Leftrightarrow
\begin{pmatrix} -1 & -1 & 2 & 0 \\ 2 & -1 & -1 & 0 \\ -1 & 2 & -1 & 0 \end{pmatrix}
\Leftrightarrow
x = t(1, 1, 1).
\]
Hence, F is a rotation about the line U = [(1, 1, 1)]. We set w = (1, 1, 1). A vector
orthogonal to w is, for example, u = (1, −1, 0). We have v = F (u) = (1, 0, −1), and the
angle θ between u and v is given by
\[ \cos\theta = \frac{\langle u, v\rangle}{\|u\|\,\|v\|} = \frac{1}{2}. \]
The angle of rotation is therefore θ = π/3. The determinant of the matrix having columns
u, v, w is
\[ \begin{vmatrix} 1 & 1 & 1 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \end{vmatrix} = 3 > 0. \]
Therefore, the rotation appears anticlockwise on looking from the point (1, 1, 1) towards
the origin.
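The checklist above is easy to script; the following rough NumPy sketch (ours, not from the text) repeats the analysis for this particular matrix.

```python
import numpy as np

A = np.array([[2, -1, 2], [2, 2, -1], [-1, 2, 2]]) / 3.0

print(np.allclose(A @ A.T, np.eye(3)))            # orthogonal, hence an isometry
print(np.allclose(A @ [1, 1, 1], [1, 1, 1]))      # the axis (1, 1, 1) is fixed

u = np.array([1.0, -1.0, 0.0])                    # any vector orthogonal to the axis
v = A @ u
cos_theta = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_theta)))           # 60 degrees, i.e. theta = pi/3
print(np.linalg.det(np.column_stack([u, v, [1.0, 1.0, 1.0]])))  # 3 > 0: anticlockwise from (1,1,1)
```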

Example 5.41. Consider a linear transformation F with matrix


 
\[ A = \frac{1}{3}\begin{pmatrix} 2 & -1 & a \\ 2 & 2 & b \\ -1 & 2 & c \end{pmatrix} \]
with respect to an orthonormal basis for a 3-dimensional inner product space. Let us
determine the values of a, b and c for which the transformation is a rotation. A necessary
condition is that A be orthogonal. Since the first two columns are orthogonal and of
length 1, we need only require that the last column be orthogonal to the first two columns
and of length 1. The orthogonality condition means that

\[ \begin{cases} 2a + 2b - c = 0 \\ -a + 2b + 2c = 0 \end{cases} \quad\Leftrightarrow\quad (a, b, c) = t(2, -1, 2). \]
The length condition is now satisfied if and only if kt(2, −1, 2)k = 3 |t| = 3, which is
equivalent to t = ±1. Hence,
 
\[ A = \frac{1}{3}\begin{pmatrix} 2 & -1 & 2t \\ 2 & 2 & -t \\ -1 & 2 & 2t \end{pmatrix} \]


where t = ±1. We have


\[
\det A = \Bigl(\frac{1}{3}\Bigr)^{3}\begin{vmatrix} 2 & -1 & 2t \\ 2 & 2 & -t \\ -1 & 2 & 2t \end{vmatrix}
= \frac{t}{27}\begin{vmatrix} 2 & -1 & 2 \\ 2 & 2 & -1 \\ -1 & 2 & 2 \end{vmatrix}
= \frac{t}{27}\cdot 27 = t.
\]

By Theorem 5.39, F is a rotation if and only if t = 1. Hence (a, b, c) = (2, −1, 2).

Example 5.42. We wish to determine a, b and c so that


 
\[ A = \frac{1}{9}\begin{pmatrix} a & b & c \\ 4 & 7 & -4 \\ 8 & -4 & 1 \end{pmatrix} \]

becomes the matrix of an orthogonal reflection F in a plane in R3 with respect to the


basis ε1 , ε2 , ε3 . As in the previous example, we first make sure that A is orthogonal.
This time we use the fact that the matrix is orthogonal if and only if its rows form
an orthonormal basis for R3 and get (a, b, c) = ±(1, 4, 8). By Theorem 5.37, F is an
orthogonal reflection if and only if A is symmetric. Hence (a, b, c) = (1, 4, 8). We still do
not know whether the subspace of reflection is a plane. However, by solving the system
Ax = x, we find that F actually is reflection in the plane 2x1 − x2 − 2x3 = 0.
We could instead have used Theorem 5.24. We have det A = −1. Hence, F is reflection
in a plane or in {0}. Since A ≠ −I, the subspace of reflection must be a plane.

Exercises
5.1. Let (x1 , x2 ) be the coordinates with respect to an orthonormal basis e1 , e2 for
2-space. Find the matrices with respect to that basis of the following linear
transformations on 2-space.
(a) Rotation a quarter-turn in the direction from the x1 -axis to the x2 -axis.
(b) Orthogonal projection on the line 3x1 = 4x2 .
5.2. Find the matrices of the following linear transformations on 3-space with respect
to an orthonormal, positively oriented basis e1 , e2 , e3 .
π
(a) Anticlockwise rotation about the x2 -axis through the angle 6.
(b) Orthogonal projection on the plane x1 + 2x2 − 2x3 = 0.
5.3. Let e1 , e2 be a basis for 2-space. Find the matrix with respect to that basis of
projection on the line x1 = x2 along the x1 -axis.
5.4. Let e1 , e2 , e3 be a basis for 3-space.
(a) Find the matrix of projection on the plane x1 + x2 + x3 = 0 along the line
x = t(1, 2, 3).
(b) Find the matrix of projection on the line along the plane.


5.5. Find, with respect to the basis e1 = 1, e2 = x, e3 = x2 , e4 = x3 , the matrix of


d
the linear transformation F on P3 defined by F (p) = dx ((x − 1)p).
5.6. Let F be a linear transformation with matrix
 
1 2 3
1 −1 −1
2 1 3

with respect to a basis e1 , e2 , e3 . Find the matrix of F with respect to the basis

e′1 = e1 + e2 − e3 ,
e′2 = 2e1 + e2 + e3 ,
e′3 = 2e1 + e2 + 2e3 .

5.7. Let e1 , e2 , e3 be an orthonormal, positively oriented basis for 3-space. Let F


be rotation about the line x = t(1, 1, 0) through the angle π6 and suppose that
the rotation appears anticlockwise on looking from the point (1,1,0) towards the
origin. Find the matrix of F with respect to the given basis.
5.8. (a) What is the determinant of a rotation about the origin in 2-space?
(b) What is the determinant of a rotation about a line in 3-space?
Hint: Introduce suitable bases.
5.9. Let F be the linear transformation on 3-space with matrix
 
−1 4 2
1 −1 −1
−3 6 4

with respect to a basis e1 , e2 , e3 . Show that F is projection on a plane along a


line and find the plane and the line.
5.10. Show that the linear transformation with the following matrix with respect to a
basis for R4 is projection on a subspace U ′ along a subspace U ′′ . Find U ′ and U ′′ .
 
0 2 −2 0
1 3 −2 2 −3 .

2  3 −4 4 −3
−2 2 −2 2

5.11. Show that the linear transformation with matrix


 
0 −1 −2
−3 −2 −6
1 1 3

with respect to a basis for R3 is reflection in a subspace U ′ along a subspace U ′′


and find U ′ and U ′′ .


5.12. Let e1 , e2 , e3 be an orthonormal basis for R3 and consider the plane with equation
x1 + 2x2 − 2x3 = 0. Find the matrix of orthogonal reflection in that plane with
respect to the given basis.
5.13. Let e1 , e2 , e3 be a basis for R3 . Find the matrix with respect to that basis of
reflection in the plane 2x1 − x2 − 3x3 = 0 along the line x = t(1, −2, 1).
5.14. Let I be the unit matrix of order n and B an n × 1 column vector of unit length.
Explain the geometric meaning of the so-called Householder matrix I − 2BB t .
5.15. The matrices below are matrices of linear transformations on 2-space with respect
to an orthonormal basis. Show that the linear transformations are isometries and
find their geometric meaning.
√   √   
1 3 √1 1 −1 3 1 1 −1
(a) , (b) √ , (c) √ .
2 −1 3 2 3 1 2 1 1
5.16. The matrices below are matrices of linear transformations on 3-space with respect
to an orthonormal, positively oriented basis. Show that the linear transformations
are isometries and find their geometric meaning.
     
8 −4 1 1 4 8 0 1 0
1 1
(a) −1 −4 −8, (b) 4 7 −4, (c) 0 0 1,
9 9
4 7 −4 8 −4 1 1 0 0
   
−6 2 3 −2 1 −2
1 1
(d)  2 −3 6, (e) −2 −2 1 .
7 3
3 6 2 1 −2 −2
5.17. (a) Let F and G be rotations about lines in 3-space. Show, for example by using
determinants, that F G and GF are also rotations about lines.
(b) Let F and G be orthogonal reflections in two different planes through the
origin in 3-space. Show that F G and GF are rotations about the line of
intersection between the planes. What are the angles of rotation?

6 Eigenvalues and Eigenvectors

6.1 Definition
Definition 6.1. Let F be a linear transformation on a linear space V . We say that
λ ∈ R is an eigenvalue of F if there exists a non-zero vector u ∈ V such that F (u) = λu.
A non-zero vector u ∈ V for which F (u) = λu is called an eigenvector of F belonging
to the eigenvalue λ. By an eigenvalue and an eigenvector of an n × n matrix A we shall
mean an eigenvalue and an eigenvector of the linear transformation x 7→ Ax on Rn .

Definition 6.2. Let A be a square matrix. The polynomial det (A − λI) is called the
characteristic polynomial of A.

Definition 6.3. Let A = [aik ] be an n × n matrix. The trace of A is defined by


\[ \operatorname{tr} A = \sum_{i=1}^{n} a_{ii}. \]

Theorem 6.4. Let A be an n × n matrix. If the complex zeros of det (A − λI) are
λ1 , . . . , λn , where each zero is counted as many times as its multiplicity, then

det (A − λI) = (−λ)n + bn−1 (−λ)n−1 + · · · + b0

where bn−1 = tr A = λ1 + · · · + λn and b0 = det A = λ1 · · · λn .

Proof. Let A(k) be the matrix obtained from I by replacing the kth column of I with
Ak . Then by multilinearity,

\[
\det(A - \lambda I) = \det[A_1 - \lambda I_1, \ldots, A_n - \lambda I_n]
= (-\lambda)^n \det I + (-\lambda)^{n-1}\bigl(\det A^{(1)} + \cdots + \det A^{(n)}\bigr) + \cdots + \det A
= (-\lambda)^n + (-\lambda)^{n-1}\operatorname{tr} A + \cdots + \det A.
\]

This shows that det (A − λI) is of the form stated and that bn−1 = tr A and b0 = det A.
The equalities bn−1 = λ1 +· · ·+λn and b0 = λ1 · · · λn follow from the relationship between
roots and coefficients.

Theorem 6.5. Let A be an n × n matrix. Then λ is an eigenvalue of A if and only if


det (A − λI) = 0. The eigenvectors of A belonging to the eigenvalue λ are the non-zero
vectors x given by (A − λI)x = 0.

Proof. We have det (A − λI) = 0 if and only if the system Ax − λx = (A − λI)x = 0


has a non-zero solution x. The second statement follows directly from the definition.

Example 6.6. The eigenvalues of
\[ A = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix} \]
are given by
\[ 0 = \det(A - \lambda I) = \begin{vmatrix} 1-\lambda & 2 \\ 2 & -2-\lambda \end{vmatrix} = \lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2). \]
Hence, the eigenvalues of A are λ1 = −3 and λ2 = 2. The eigenvectors belonging to the
eigenvalue λ1 are the non-zero vectors x given by
   
\[
\begin{pmatrix} 1-\lambda_1 & 2 & 0 \\ 2 & -2-\lambda_1 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 4 & 2 & 0 \\ 2 & 1 & 0 \end{pmatrix}
\Leftrightarrow x = t(1, -2),
\]
and those belonging to λ2 are the non-zero vectors given by
\[
\begin{pmatrix} 1-\lambda_2 & 2 & 0 \\ 2 & -2-\lambda_2 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} -1 & 2 & 0 \\ 2 & -4 & 0 \end{pmatrix}
\Leftrightarrow x = t(2, 1).
\]
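For a numerical cross-check (our own NumPy sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, -2.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)    # -3 and 2 (possibly in the other order)
print(eigvecs)    # columns: unit eigenvectors, proportional to (1, -2) and (2, 1)
```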
Theorem 6.7. Let F be a linear transformation on a linear space V with basis e1 , . . . , en
and let A be the matrix of F with respect to that basis. Then λ is an eigenvalue of F
if and only if λ is an eigenvalue of A. Moreover, if x ∈ Rn is the coordinate vector of
u ∈ V , then u is an eigenvector of F belonging to λ if and only if x is an eigenvector
of A belonging to λ.

Proof. If x ∈ Rn is the coordinate vector of u ∈ V , then Ax = λx if and only if


F (u) = λu.

Example 6.8. Let V be a 2-dimensional linear space endowed with a basis e1 , e2 , and
consider the linear transformation F on V whose matrix is
 
\[ A = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}. \]
The eigenvalues of F are given by
\[ 0 = \det(A - \lambda I) = \begin{vmatrix} -\lambda & -1 \\ -1 & -\lambda \end{vmatrix} = \lambda^2 - 1 = (\lambda+1)(\lambda-1), \]
and are therefore λ1 = −1 and λ2 = 1. The coordinate vectors of the eigenvectors
belonging to λ1 satisfy the system
 
\[ \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \end{pmatrix} \Leftrightarrow x = t(1, 1). \]
Hence, the eigenvectors u belonging to that eigenvalue are u = t(e1 + e2), t ≠ 0. In the same way, we find that the eigenvectors belonging to λ2 are u = t(e1 − e2), t ≠ 0.


6.2 Diagonalisability
Let F be a linear transformation on a linear space V and suppose there exists a basis
f 1 , . . . , f n for V consisting of eigenvectors of F belonging to the eigenvalues λ1 , . . . , λn .
Since F (f i ) = λi f i for i = 1, . . . , n, the matrix of F with respect to that basis is the
diagonal matrix
\[ D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \]
Let A be the matrix of F with respect to any basis e1 , . . . , en for V . If T is the n × n
matrix whose ith column is the coordinate vector of f i with respect to e1 , . . . , en , then
T −1AT = D. According to the following definition, A is diagonalisable.
Definition 6.9. Let A be an n × n matrix. We say that A is diagonalisable if there
exists an invertible n × n matrix T such that T −1AT is a diagonal matrix.

Theorem 6.10. Let F be a linear transformation on a linear space V and let A be the
matrix of F with respect to a basis e1 , . . . , en for V . Then A is diagonalisable if and
only if there exists a basis for V consisting of eigenvectors of F .

Proof. We have already shown that A is diagonalisable if such a basis exists. To show the
converse, we assume that A is diagonalisable, which means that there exist an invertible
matrix T and a diagonal matrix D such that T −1AT = D. Since T is invertible, its
columns t1 , . . . , tn form a basis for Rn . Hence, the vectors f 1 , . . . , f n with coordinates
t1 , . . . , tn with respect to e1 , . . . , en form a basis for V . Let the diagonal entries of D
be λ1 , . . . , λn . Since AT = T D, we have Ati = λi ti , and hence F (f i ) = λi f i , for
i = 1, . . . , n. This shows that f 1 , . . . , f n are also eigenvectors of F .

Corollary 6.11. Let A be an n × n matrix. Then A is diagonalisable if and only if there


exists a basis e1 , . . . , en for Rn consisting of eigenvectors of A.

Example 6.12. The eigenvectors e1 = (1, −2) and e2 = (2, 1) of the matrix
 
\[ A = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix} \]
in Example 6.6 form a basis for R^2. Hence, A is diagonalisable, and we have T^{-1}AT = D where
\[ T = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} -3 & 0 \\ 0 & 2 \end{pmatrix}. \]

Example 6.13. Consider the matrix



\[ A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}. \]


We have
\[ \det(A - \lambda I) = \begin{vmatrix} 1-\lambda & 0 \\ 1 & 1-\lambda \end{vmatrix} = (1-\lambda)^2. \]
The only eigenvalue of A is therefore λ = 1. The eigenvectors x belonging to this eigenvalue are given by
\[ \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} \Leftrightarrow x = t(0, 1). \]
Hence, no basis for R2 consisting of eigenvectors of A exists, and A is not diagonalisable.

Example 6.14. Let
\[ A = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. \]
The characteristic polynomial det (A − λI) = (1 − λ)2 + 1 has no real zeros at all. Hence,
A has no eigenvectors, and is therefore not diagonalisable.

Theorem 6.15. Let F be a linear transformation on a linear space V , and suppose


that e1 , . . . , ek are eigenvectors belonging to distinct eigenvalues λ1 , . . . , λk of F . Then
e1 , . . . , ek are linearly independent.

Proof. We use induction on k. The statement holds for k = 1 since eigenvectors are
non-zero. Assume that the statement holds for k eigenvectors and let e1 , . . . , ek , ek+1 be
eigenvectors belonging to distinct eigenvalues λ1 , . . . , λk , λk+1 . Suppose that

s1 e1 + · · · + sk ek + sk+1 ek+1 = 0. (6.1)

Multiplying both sides of (6.1) by λk+1 , we get

s1 λk+1 e1 + · · · + sk λk+1 ek + sk+1 λk+1 ek+1 = 0, (6.2)

and taking F of both sides of (6.1) yields

s1 λ1 e1 + · · · + sk λk ek + sk+1 λk+1 ek+1 = 0. (6.3)

Subtracting corresponding sides of (6.2) from (6.3) , we now get

s1 (λ1 − λk+1 )e1 + · · · + sk (λk − λk+1 )ek = 0.

By hypothesis, e1 , . . . , ek are linearly independent. Therefore, si (λi − λk+1 ) = 0 for


i = 1, . . . , k. Since λi 6= λk+1 for i = 1, . . . , k, we must have s1 = · · · = sk = 0, and
hence by (6.1), sk+1 = 0.

Corollary 6.16. Let F be a linear transformation on a non-zero n-dimensional linear


space V . If e1 , . . . , en are eigenvectors of F belonging to n distinct eigenvalues, then
e1 , . . . , en form a basis for V .


Example 6.17. Let
\[ A = \begin{pmatrix} 6 & -2 & -2 \\ 1 & 2 & -1 \\ 3 & -2 & 1 \end{pmatrix}. \]

We set out to compute A^n for every positive integer n. We have
\[
\begin{vmatrix} 6-\lambda & -2 & -2 \\ 1 & 2-\lambda & -1 \\ 3 & -2 & 1-\lambda \end{vmatrix}
= \begin{vmatrix} 6-\lambda & -2 & -2 \\ 1 & 2-\lambda & -1 \\ \lambda-3 & 0 & 3-\lambda \end{vmatrix}
= \begin{vmatrix} 6-\lambda & -2 & 4-\lambda \\ 1 & 2-\lambda & 0 \\ \lambda-3 & 0 & 0 \end{vmatrix}
= -(\lambda-2)(\lambda-3)(\lambda-4).
\]

Hence, the eigenvalues of A are λ1 = 2, λ2 = 3 and λ3 = 4. The eigenvectors belonging


to λ1 are given by
   
\[
\begin{pmatrix} 4 & -2 & -2 & 0 \\ 1 & 0 & -1 & 0 \\ 3 & -2 & -1 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 1 & 0 & -1 & 0 \\ 1 & 0 & -1 & 0 \\ 3 & -2 & -1 & 0 \end{pmatrix}
\Leftrightarrow x = t(1, 1, 1).
\]

For λ2 , we have
 
\[
\begin{pmatrix} 3 & -2 & -2 & 0 \\ 1 & -1 & -1 & 0 \\ 3 & -2 & -2 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & -1 & -1 & 0 \end{pmatrix}
\Leftrightarrow x = t(0, 1, -1),
\]

and for λ3 ,
   
\[
\begin{pmatrix} 2 & -2 & -2 & 0 \\ 1 & -2 & -1 & 0 \\ 3 & -2 & -3 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 0 & 2 & 0 & 0 \\ 1 & -2 & -1 & 0 \\ 0 & 4 & 0 & 0 \end{pmatrix}
\Leftrightarrow x = t(1, 0, 1).
\]

By Corollary 6.16, the eigenvectors (1, 1, 1), (0, 1, −1), (1, 0, 1) form a basis for R3 . Thus
T −1AT = D, and hence A = T DT −1 , where
   
\[
T = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\quad\text{and}\quad
D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{pmatrix}.
\]

We compute the inverse T −1 and get


 
\[ T^{-1} = \begin{pmatrix} -1 & 1 & 1 \\ 1 & 0 & -1 \\ 2 & -1 & -1 \end{pmatrix}. \]


It follows that
\[ A^n = (TDT^{-1})^n = TDT^{-1}TDT^{-1}\cdots TDT^{-1} = TD^nT^{-1} \]
\[
= \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 2^n & 0 & 0 \\ 0 & 3^n & 0 \\ 0 & 0 & 4^n \end{pmatrix}
\begin{pmatrix} -1 & 1 & 1 \\ 1 & 0 & -1 \\ 2 & -1 & -1 \end{pmatrix}
= \begin{pmatrix}
-2^n + 2\cdot 4^n & 2^n - 4^n & 2^n - 4^n \\
-2^n + 3^n & 2^n & 2^n - 3^n \\
-2^n - 3^n + 2\cdot 4^n & 2^n - 4^n & 2^n + 3^n - 4^n
\end{pmatrix}.
\]

Definition 6.18. Let F be a linear transformation on a linear space V . If λ is an


eigenvalue of F , the eigenspace of F associated with λ is

{u ∈ V ; F (u) = λu} = ker (F − λI).

Note that 0, albeit not an eigenvector, belongs to an eigenspace.

Definition 6.19. Let F be a linear transformation on a finite-dimensional space V , and


let λ be an eigenvalue of F . The algebraic multiplicity of λ is the multiplicity of µ = λ
as a zero of the polynomial det (F − µI), and its geometric multiplicity is the dimension
of ker (F − λI).

Theorem 6.20. Let F be a linear transformation on a finite-dimensional space V , and


let λ be an eigenvalue of F . Then the geometric multiplicity of λ is less than or equal to
the algebraic multiplicity of λ.

Proof. Assume that dim V = n. We choose a basis e1 , . . . , ek for ker (F − λI) and extend
it to a basis e1 , . . . , ek , ek+1 , . . . , en for V . Since F (ei ) = λei for i = 1, . . . , k, the matrix
of F with respect to this basis is of the form
 
\[
A = \begin{pmatrix}
\lambda & 0 & \cdots & 0 & a_{1(k+1)} & \cdots & a_{1n} \\
0 & \lambda & \cdots & 0 & a_{2(k+1)} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \lambda & a_{k(k+1)} & \cdots & a_{kn} \\
0 & 0 & \cdots & 0 & a_{(k+1)(k+1)} & \cdots & a_{(k+1)n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & a_{n(k+1)} & \cdots & a_{nn}
\end{pmatrix}.
\]

Successively expanding along the first column, we get

det (F − µI) = det (A − µI) = (λ − µ)k f (µ)

where f is a polynomial of degree n − k. Hence, the algebraic multiplicity of λ is greater


than or equal to its geometric multiplicity k.


Corollary 6.21. Let F be a linear transformation on a non-zero n-dimensional linear


space V . If the polynomial det (F − λI) has non-real zeros, then V has no basis consisting
of eigenvectors of F . If all the distinct zeros λ1 , . . . , λk are real, then there exists a basis
for V of eigenvectors of F if and only if the algebraic and geometric multiplicities of λi
are equal for i = 1, . . . , k.

Proof. We have n = deg (det (F − λI)) = dim V . By Theorem 6.15, an eigenvector basis
exists if and only if the sum of the geometric multiplicities of the distinct eigenvalues
equals n. By Theorem 6.20, this cannot happen if det (F − λI) has non-real zeros, and if
all zeros are real, then the sum of the geometric multiplicities of the distinct eigenvalues
equals n if and only if the algebraic and geometric multiplicities of each eigenvalue are
equal.

Example 6.22. Let
\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \]
We have
\[ \det(A - \lambda I) = \begin{vmatrix} 1-\lambda & 0 & 0 \\ 1 & 1-\lambda & 0 \\ 0 & 0 & 2-\lambda \end{vmatrix} = (1-\lambda)^2(2-\lambda). \]
The algebraic multiplicities of the eigenvalues λ1 = 1 and λ2 = 2 are 2 and 1, respectively.
Hence the geometric multiplicity of λ2 is 1 since it cannot be less than 1. The geometric
multiplicity of λ1 is either 1 or 2. Since
   
\[
\begin{pmatrix} 1-\lambda_1 & 0 & 0 & 0 \\ 1 & 1-\lambda_1 & 0 & 0 \\ 0 & 0 & 2-\lambda_1 & 0 \end{pmatrix}
\Leftrightarrow
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\Leftrightarrow x = t(0, 1, 0),
\]
the geometric multiplicity of λ1 equals 1. Hence, A is not diagonalisable.

Example 6.23. Consider the matrix
\[ A = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
We have
\[ \det(A - \lambda I) = \begin{vmatrix} -\lambda & 1 & 0 \\ 1 & -\lambda & 0 \\ 0 & 0 & 1-\lambda \end{vmatrix} = -(\lambda-1)^2(\lambda+1). \]
The algebraic multiplicities of λ1 = 1 and λ2 = −1 are 2 and 1, respectively. The
eigenvectors belonging to λ1 are given by
 
−1 1 0 0
 1 −1 0 0 ⇔ x = s(1, 1, 0) + t(0, 0, 1).
0 0 0 0


Hence, the geometric multiplicity of λ1 equals 2. As a basis for the eigenspace associated
with λ1 we can choose e1 = (1, 1, 0), e2 = (0, 0, 1). This time it is worthwhile to find
also the eigenvectors belonging to λ2 .
 
\[ \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \end{pmatrix} \Leftrightarrow x = t(1, -1, 0). \]

We can, therefore, choose e3 = (1, −1, 0) as a basis for the eigenspace associated with λ2 .
The eigenvectors e1 , e2 , e3 form a basis for R3 since eigenvectors belonging to different
eigenvalues are linearly independent. Hence, A is diagonalisable, and T −1AT = D where
   
\[
T = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix}
\quad\text{and}\quad
D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.
\]

6.3 Recurrence Equations


Let A be a diagonalisable n × n matrix, and suppose that we are given a recurrence
equation
uk+1 = Auk .
Since A is diagonalisable, Rn has a basis e1 , . . . , en consisting of eigenvectors of A. Let
λ1 , . . . , λn be the corresponding eigenvalues. We can then write u0 = c1 e1 + · · · + cn en ,
where (c1 , . . . , cn ) are the coordinates of u0 with respect to the basis e1 , . . . , en . From
this we get

u1 = Au0 = A(c1 e1 + · · · + cn en) = c1 λ1 e1 + · · · + cn λn en,
u2 = Au1 = A(c1 λ1 e1 + · · · + cn λn en) = c1 λ1^2 e1 + · · · + cn λn^2 en,
⋮
uk = c1 λ1^k e1 + · · · + cn λn^k en.

Hence, if we know u0 , the eigenvalues and the eigenvectors, we know uk for all k.
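As a computational aside (not part of the text), the whole procedure fits in a few lines of NumPy; the function below assumes that A is diagonalisable and uses numpy.linalg.eig to produce an eigenvector basis.

```python
import numpy as np

def recurrence_term(A, u0, k):
    """Return u_k = A^k u0 via an eigenvector basis of A (A assumed diagonalisable)."""
    lam, T = np.linalg.eig(A)      # columns of T are eigenvectors, A T = T diag(lam)
    c = np.linalg.solve(T, u0)     # coordinates of u0 with respect to that basis
    return T @ (c * lam**k)        # c1*lam1^k*e1 + ... + cn*lamn^k*en

# Data of Example 6.24 below: u0 = (3, 1) and A = [[1, 2], [2, 1]].
print(recurrence_term(np.array([[1., 2.], [2., 1.]]), np.array([3., 1.]), 5))
```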
Example 6.24. Let the numbers ak and bk be defined by

\begin{cases} a_{k+1} = a_k + 2b_k \\ b_{k+1} = 2a_k + b_k \end{cases}, \qquad a_0 = 3, \; b_0 = 1.

If we set uk = (ak, bk) and

A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix},

the recurrence relation can be written as uk+1 = Auk, u0 = (3, 1). Let us first find the eigenvalues and eigenvectors of A. We have

\begin{vmatrix} 1−λ & 2 \\ 2 & 1−λ \end{vmatrix} = (1 − λ)^2 − 4 = (λ + 1)(λ − 3).


Thus the eigenvalues are λ1 = −1 and λ2 = 3. For λ1, the eigenvectors are given by

\begin{pmatrix} 2 & 2 & 0 \\ 2 & 2 & 0 \end{pmatrix} ⇔ x = t(1, −1),

and for λ2 by

\begin{pmatrix} −2 & 2 & 0 \\ 2 & −2 & 0 \end{pmatrix} ⇔ x = t(1, 1).

The eigenvectors e1 = (1, −1) and e2 = (1, 1) form a basis for R2. We have

c1 e1 + c2 e2 = u0 ⇔ \begin{pmatrix} 1 & 1 & 3 \\ −1 & 1 & 1 \end{pmatrix} ⇔ c1 = 1, c2 = 2.

Hence,

u_k = \begin{pmatrix} a_k \\ b_k \end{pmatrix} = c1 λ1^k e1 + c2 λ2^k e2 = (−1)^k \begin{pmatrix} 1 \\ −1 \end{pmatrix} + 2 · 3^k \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

Example 6.25. Consider the recurrence relation

an+3 = −2an+2 + an+1 + 2an , a0 = −2, a1 = −1, a2 = 1.

Setting bn = an+1 and cn = an+2, we get

\begin{cases} a_{n+1} = b_n \\ b_{n+1} = c_n \\ c_{n+1} = 2a_n + b_n − 2c_n \end{cases}, \qquad a_0 = −2, \; b_0 = −1, \; c_0 = 1.

Provided that

A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 1 & −2 \end{pmatrix}

is diagonalisable, we can now apply the method used in the previous example. It turns out that the eigenvectors e1 = (1, −2, 4), e2 = (1, −1, 1), e3 = (1, 1, 1) belonging to the eigenvalues λ1 = −2, λ2 = −1, λ3 = 1 form a basis for R3. Setting

(a0, b0, c0) = d1 e1 + d2 e2 + d3 e3

and solving for d1, d2, d3, we get d1 = 1, d2 = −2, d3 = −1. Hence

\begin{pmatrix} a_n \\ b_n \\ c_n \end{pmatrix} = d1 λ1^n e1 + d2 λ2^n e2 + d3 λ3^n e3 = (−2)^n \begin{pmatrix} 1 \\ −2 \\ 4 \end{pmatrix} − 2 · (−1)^n \begin{pmatrix} 1 \\ −1 \\ 1 \end{pmatrix} − \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},

and in particular,

a_n = (−2)^n − 2 · (−1)^n − 1.


Example 6.26. The numbers an defined by an+2 = an+1 + an, a0 = 0, a1 = 1, are known as the Fibonacci numbers. As in the previous example, we set bn = an+1 and get

\begin{pmatrix} a_{n+1} \\ b_{n+1} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} a_n \\ b_n \end{pmatrix}.

We have

\begin{vmatrix} −λ & 1 \\ 1 & 1−λ \end{vmatrix} = λ^2 − λ − 1 = 0 ⇔ λ = λ1 = \frac{1 + \sqrt{5}}{2} or λ = λ2 = \frac{1 − \sqrt{5}}{2}.

The relationship between roots and coefficients gives that λ1 λ2 = −1 and λ1 + λ2 = 1. Hence,

\begin{pmatrix} −λ1 & 1 & 0 \\ 1 & 1−λ1 & 0 \end{pmatrix} ⇔ \begin{pmatrix} −λ1 & 1 & 0 \\ 1 & λ2 & 0 \end{pmatrix} ⇔ \begin{pmatrix} −λ1 & 1 & 0 \\ λ1 & −1 & 0 \end{pmatrix} ⇔ x = t(1, λ1).

An eigenvector belonging to λ1 is therefore e1 = (1, λ1). By symmetry, we find that e2 = (1, λ2) is an eigenvector belonging to λ2. Setting (a0, b0) = (0, 1) = c1 e1 + c2 e2, we get

\begin{pmatrix} 1 & 1 & 0 \\ λ1 & λ2 & 1 \end{pmatrix} ⇔ c1 = \frac{1}{λ1 − λ2} = \frac{1}{\sqrt{5}} and c2 = −c1 = −\frac{1}{\sqrt{5}}.

Therefore

\begin{pmatrix} a_n \\ b_n \end{pmatrix} = \frac{1}{\sqrt{5}} λ1^n \begin{pmatrix} 1 \\ λ1 \end{pmatrix} − \frac{1}{\sqrt{5}} λ2^n \begin{pmatrix} 1 \\ λ2 \end{pmatrix}.

In particular, we have

a_n = \frac{1}{\sqrt{5}} \left( \left(\frac{1 + \sqrt{5}}{2}\right)^n − \left(\frac{1 − \sqrt{5}}{2}\right)^n \right).
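A quick numerical check (not part of the text, using NumPy) confirms that Binet's formula above reproduces the Fibonacci numbers generated directly from the recurrence.

```python
import numpy as np

phi = (1 + np.sqrt(5)) / 2          # lambda_1
psi = (1 - np.sqrt(5)) / 2          # lambda_2

def fib_binet(n):
    return (phi**n - psi**n) / np.sqrt(5)

a, b = 0, 1                          # a_0 = 0, a_1 = 1
for n in range(10):
    assert round(fib_binet(n)) == a  # Binet's formula agrees with the recurrence
    a, b = b, a + b
print("Binet's formula matches the first ten Fibonacci numbers.")
```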

6.4 The Spectral Theorem


So far, we have only discussed real linear spaces. If complex scalars are allowed in
Definition 2.1, we get a complex linear space. The set Cn of n-tuples of complex numbers
together with the addition

(x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn )

and the multiplication

s(x1 , . . . , xn ) = (sx1 , . . . , sxn ), s ∈ C,

then forms a complex linear space. The dot product on Cn is defined by

x · y = x_1\overline{y_1} + · · · + x_n\overline{y_n}


and the norm by

‖x‖ = \sqrt{x · x} = \sqrt{|x_1|^2 + · · · + |x_n|^2}.

If we replace condition (ii) in Definition 3.1 with the condition ⟨u, v⟩ = \overline{⟨v, u⟩}, we get a complex inner product. The dot product on Cn is then a complex inner product. We
can also allow complex numbers in the theory of matrices and determinants. Let A be
a complex n × n matrix. Then Ax = 0 has a non-zero solution x ∈ Cn if and only if
det A = 0. In particular, if λ ∈ C, then there exists a non-zero vector x ∈ Cn such that
Ax = λx if and only if det (A − λI) = 0. Let A be a complex matrix. By \overline{A} we shall mean the matrix obtained from A by taking the complex conjugates of the entries of A. We say that a square matrix A is Hermitian if A^t = \overline{A}.
Lemma 6.27. Let A be a Hermitian n×n matrix. Then all the zeros of the characteristic
polynomial det (A − λI) are real.

Proof. Let λ be a zero of the characteristic polynomial. Then there exists a non-zero
vector x ∈ Cn such that
Ax = λx. (6.4)
Taking the complex conjugate and using the assumption on A, we get

A^t \overline{x} = \overline{A}\,\overline{x} = \overline{Ax} = \overline{λx} = \overline{λ}\,\overline{x}. (6.5)

By (6.4),

(Ax) · x = (Ax)^t \overline{x} = (λx)^t \overline{x} = λ x^t \overline{x} = λ‖x‖^2,

and by (6.5),

(Ax) · x = (Ax)^t \overline{x} = x^t A^t \overline{x} = x^t \overline{λ}\,\overline{x} = \overline{λ} x^t \overline{x} = \overline{λ}‖x‖^2.

Therefore λ‖x‖^2 = \overline{λ}‖x‖^2, and since x ≠ 0, we find that λ = \overline{λ}. Hence λ is real.
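Lemma 6.27 is easy to illustrate numerically. The sketch below (NumPy, not part of the text) builds an arbitrary Hermitian matrix and prints its eigenvalues, which come out real.

```python
import numpy as np

A = np.array([[2.0, 1 + 1j, 0.0],
              [1 - 1j, 3.0, 2j],
              [0.0, -2j, 1.0]])
assert np.allclose(A.conj().T, A)    # A equals its conjugate transpose, i.e. A is Hermitian
print(np.linalg.eigvalsh(A))         # eigvalsh returns real eigenvalues
```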

From now on, linear spaces and matrices are real.


Lemma 6.28. Let A be a symmetric n × n matrix. Then A has an eigenvalue.

Proof. Since A is real, A is Hermitian. Therefore, by Lemma 6.27, the characteristic


polynomial det (A − λI) has real zeros. Hence, A has at least one eigenvalue.

Theorem 6.29 (Spectral theorem). Let F be a symmetric linear transformation on


a non-zero finite-dimensional inner product space V . Then there exists an orthonormal
basis for V of eigenvectors of F .

Proof. We prove the statement by induction on dim V = n. If n = 1, then V has an


orthonormal basis e. Since F (e) ∈ V , we have F (e) = λe where λ is the coordinate of
F (e) with respect to e. Hence, the statement holds for n = 1.
Suppose that dim V = n ≥ 2 and that the statement holds for symmetric linear
transformations on inner product spaces of dimension less than n. We can choose an

97
6 Eigenvalues and Eigenvectors

orthonormal basis for V . Since F is symmetric, the n × n matrix A of F with respect to


that basis is symmetric by Theorem 5.30. Hence, by Lemma 6.28, F has an eigenvalue λn .
Let en be an eigenvector of length 1 belonging to λn . Set V ′ = [en ]⊥ and let F ′ be the
restriction of F to V ′ . If u ∈ V ′ , then hu, en i = 0. Since F is symmetric, we have
hF (u), en i = hu, F (en )i = hu, λen i = λhu, en i = 0,
which means that F (u) ∈ V ′ . Hence, F ′ is a symmetric linear transformation on V ′ .
Since dim V ′ = n − 1, it follows from the induction hypothesis that V ′ has an orthonor-
mal basis e1 , . . . , en−1 of eigenvectors of F ′ , hence also of F . Since these vectors are
orthogonal to en , we find that e1 , . . . , en form an orthonormal basis for V .
The converse of the spectral theorem also holds. For if there exists an orthonormal
basis for V of eigenvectors of F , then the matrix with respect to that basis is a diagonal
matrix D. Since D is symmetric and the basis is orthonormal, F is a symmetric linear
transformation.
Corollary 6.30. Let A be an n × n matrix. Then A is symmetric if and only if Rn has
an orthonormal basis of eigenvectors of A.
Theorem 6.31. Let A be a symmetric n × n matrix. Then there exist an orthogonal
matrix T and a diagonal matrix D such that T tAT = D.
Proof. A is the matrix of the linear transformation x 7→ Ax on Rn with respect to
the orthonormal basis ε1 , . . . , εn . Let D be the diagonal matrix of A with respect to an
orthonormal basis e1 , . . . , en for Rn of eigenvectors of A and let T be the transition matrix
from e1 , . . . , en to ε1 , . . . , εn . Then T is orthogonal. Hence, D = T −1AT = T tAT .
Theorem 6.32. Let F be a symmetric linear transformation on an inner product space
and let e1 and e2 be eigenvectors belonging to two different eigenvalues λ1 and λ2 . Then
he1 , e2 i = 0.
Proof. We have
λ1 he1 , e2 i = hλ1 e1 , e2 i = hF (e1 ), e2 i = he1 , F (e2 )i = he1 , λ2 e2 i = λ2 he1 , e2 i.
Since λ1 6= λ2 , it follows that he1 , e2 i = 0.
Example 6.33. The matrix

A = \begin{pmatrix} 3 & 1 & 2 \\ 1 & 3 & 2 \\ 2 & 2 & 6 \end{pmatrix}

is symmetric. Our goal is to find an orthogonal matrix T and a diagonal matrix D so that D = T^tAT.

\begin{vmatrix} 3−λ & 1 & 2 \\ 1 & 3−λ & 2 \\ 2 & 2 & 6−λ \end{vmatrix} = \begin{vmatrix} 3−λ & 1 & 2 \\ λ−2 & 2−λ & 0 \\ 2 & 2 & 6−λ \end{vmatrix} = \begin{vmatrix} 3−λ & 4−λ & 2 \\ λ−2 & 0 & 0 \\ 2 & 4 & 6−λ \end{vmatrix} = −(λ − 2)^2 (λ − 8).


The eigenvalues are λ1 = 2 and λ2 = 8. The algebraic multiplicity of λ1 is 2, and since A is symmetric, also its geometric multiplicity must be 2. We seek its eigenspace.

\begin{pmatrix} 1 & 1 & 2 & 0 \\ 1 & 1 & 2 & 0 \\ 2 & 2 & 4 & 0 \end{pmatrix} ⇔ x1 + x2 + 2x3 = 0 ⇔ x = s(1, −1, 0) + t(2, 0, −1).

The vectors v1 = (1, −1, 0) and v2 = (2, 0, −1) form a basis for this eigenspace. We apply the Gram–Schmidt process to them. We set u1 = v1 and u2 = su1 + v2 and get s = −1 and u2 = −u1 + v2 = (1, 1, −1). Normalising u1 and u2, we get the orthonormal basis e1 = (1/\sqrt{2})(1, −1, 0), e2 = (1/\sqrt{3})(1, 1, −1) for the eigenspace ker (A − λ1 I). By Theorem 6.32, we know that the eigenvectors belonging to λ2 are orthogonal to e1 and e2. Hence, the unit normal vector e3 = (1/\sqrt{6})(1, 1, 2) of the plane x1 + x2 + 2x3 = 0 must be an eigenvector belonging to λ2. With

T = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6} \\ −1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6} \\ 0 & −1/\sqrt{3} & 2/\sqrt{6} \end{pmatrix} = \frac{1}{\sqrt{6}} \begin{pmatrix} \sqrt{3} & \sqrt{2} & 1 \\ −\sqrt{3} & \sqrt{2} & 1 \\ 0 & −\sqrt{2} & 2 \end{pmatrix} and D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{pmatrix},

we therefore have D = T^tAT.

6.5 Systems of Linear Differential Equations


Consider the system

\begin{cases} x_1'(t) = a_{11} x_1(t) + a_{12} x_2(t) + · · · + a_{1n} x_n(t) \\ x_2'(t) = a_{21} x_1(t) + a_{22} x_2(t) + · · · + a_{2n} x_n(t) \\ \quad\vdots \\ x_n'(t) = a_{n1} x_1(t) + a_{n2} x_2(t) + · · · + a_{nn} x_n(t) \end{cases}

of first-order linear differential equations. For every t ∈ R, x(t) = (x1 (t), . . . , xn (t))
and x′ (t) = (x′1 (t), . . . , x′n (t)) are elements of Rn , and the system can be written as
x′ (t) = Ax(t) where A = [aik ]n×n . Suppose that Rn has a basis e1 , . . . , en consisting of
eigenvectors of A and let y(t) = (y1 (t), . . . , yn (t)) be the coordinates of x(t) with respect
to that basis. Then
x(t) = y1 (t)e1 + · · · + yn (t)en ,
and hence
x′ (t) = y1′ (t)e1 + · · · + yn′ (t)en .
If the eigenvalue associated with ei is λi for i = 1, . . . , n, then

x′ (t) = Ax(t) = A(y1 (t)e1 + · · · + yn (t)en ) = λ1 y1 (t)e1 + · · · + λn yn (t)en ,

99
6 Eigenvalues and Eigenvectors

and therefore

y1′ (t)e1 + · · · + yn′ (t)en = λ1 y1 (t)e1 + · · · + λn yn (t)en .

Since the coordinates are unique, we get

yi′ (t) = λi yi (t), i = 1, . . . , n. (6.6)

Hence,
yi (t) = ci eλi t , i = 1, . . . , n,
for some constants ci , and thus

x(t) = c1 eλ1 t e1 + · · · + cn eλn t en .
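The formula for x(t) translates directly into code. The following sketch (NumPy, not part of the text) assumes that A is diagonalisable with real eigenvalues and evaluates the solution at a given time t; with the data of Example 6.34 below and t = 0 it returns the initial values.

```python
import numpy as np

def solve_linear_system(A, x0, t):
    """Evaluate x(t) = c1 e^{lam1 t} e1 + ... + cn e^{lamn t} en for x' = Ax, x(0) = x0."""
    lam, T = np.linalg.eig(A)          # columns of T are eigenvectors of A
    c = np.linalg.solve(T, x0)         # x0 = c1 e1 + ... + cn en
    return T @ (c * np.exp(lam * t))

A = np.array([[1., 3., 2.], [-1., 2., 1.], [4., -1., -1.]])
print(solve_linear_system(A, np.array([6., -3., 6.]), 0.0))   # gives (6, -3, 6)
```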

Example 6.34. Let us solve the following system of differential equations.

\begin{cases} x_1'(t) = x_1(t) + 3x_2(t) + 2x_3(t) \\ x_2'(t) = −x_1(t) + 2x_2(t) + x_3(t) \\ x_3'(t) = 4x_1(t) − x_2(t) − x_3(t) \end{cases}

Proceeding as usual, we find that the eigenvalues of

A = \begin{pmatrix} 1 & 3 & 2 \\ −1 & 2 & 1 \\ 4 & −1 & −1 \end{pmatrix}

are λ1 = −2, λ2 = 1, λ3 = 3 with associated eigenvectors e1 = (1, 1, −3), e2 = (1, −2, 3), e3 = (1, 0, 1). Since the eigenvalues are distinct, the eigenvectors form a basis for R3. By applying the above method, we therefore get

\begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix} = c_1 e^{−2t} \begin{pmatrix} 1 \\ 1 \\ −3 \end{pmatrix} + c_2 e^{t} \begin{pmatrix} 1 \\ −2 \\ 3 \end{pmatrix} + c_3 e^{3t} \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.

Given the initial conditions x1(0) = 6, x2(0) = −3 and x3(0) = 6, we obtain

\begin{pmatrix} 6 \\ −3 \\ 6 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ 1 \\ −3 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ −2 \\ 3 \end{pmatrix} + c_3 \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.

Solving for c1, c2 and c3, we get c1 = 1, c2 = 2, c3 = 3. Hence, the solution of this initial value problem is

\begin{pmatrix} x_1(t) \\ x_2(t) \\ x_3(t) \end{pmatrix} = e^{−2t} \begin{pmatrix} 1 \\ 1 \\ −3 \end{pmatrix} + 2e^{t} \begin{pmatrix} 1 \\ −2 \\ 3 \end{pmatrix} + 3e^{3t} \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.


Exactly the same method can be used to solve a system of the form

\begin{cases} x_1''(t) = a_{11} x_1(t) + a_{12} x_2(t) + · · · + a_{1n} x_n(t) \\ x_2''(t) = a_{21} x_1(t) + a_{22} x_2(t) + · · · + a_{2n} x_n(t) \\ \quad\vdots \\ x_n''(t) = a_{n1} x_1(t) + a_{n2} x_2(t) + · · · + a_{nn} x_n(t) \end{cases}

Equation (6.6) is now replaced by y_i''(t) = λ_i y_i(t), i = 1, . . . , n. We can solve these second-order equations and proceed as before.

Example 6.35. Consider the system

\begin{cases} x_1''(t) = x_1(t) + x_2(t) \\ x_2''(t) = 3x_1(t) − x_2(t) \end{cases}

This time, the eigenvalues and associated eigenvectors are λ1 = −2, λ2 = 2, e1 = (1, −3), e2 = (1, 1). We have

y_1''(t) = −2y_1(t) ⇔ y_1(t) = c_1 \sin \sqrt{2}\,t + c_2 \cos \sqrt{2}\,t

and

y_2''(t) = 2y_2(t) ⇔ y_2(t) = d_1 e^{\sqrt{2}\,t} + d_2 e^{−\sqrt{2}\,t}.

Hence,

\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \bigl(c_1 \sin \sqrt{2}\,t + c_2 \cos \sqrt{2}\,t\bigr) \begin{pmatrix} 1 \\ −3 \end{pmatrix} + \bigl(d_1 e^{\sqrt{2}\,t} + d_2 e^{−\sqrt{2}\,t}\bigr) \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

6.6 The Vibrating String


Consider a massless elastic string of length n + 1 stretched between two points. Suppose
that n point masses, each of mass m, are attached to the string at distances 1 from each
other and from the ends. If the point masses perform small transverse oscillations, the
tension T in the string is approximately constant. Disregarding gravity and using the
notation in the figure below, we see that the mass at the point j is influenced in the
y-direction by the force

−T sin αj−1 + T sin αj , j = 1, . . . , n.

Since y0 = yn+1 = 0 and the displacements are supposed to be small, we have

sin αj ≈ tan αj = yj+1 − yj , j = 0, . . . , n.


[Figure: a string with n = 3 point masses, showing the displacements y1, y2, y3, the angles α0, . . . , α3 between the string segments and the horizontal axis, and the tension T along each segment.]
Hence, the force exerted on the mass at j is approximately
−T (yj − yj−1 ) + T (yj+1 − yj ) = T (yj−1 − 2yj + yj+1 ), j = 1, . . . , n.
According to Newton’s second law, force equals mass times acceleration. Therefore,
T (yj−1 (t) − 2yj (t) + yj+1 (t)) = myj′′ (t), j = 1, . . . , n.
Setting q = \sqrt{T/m}, we can write this as

y_j''(t) = q^2 (y_{j−1}(t) − 2y_j(t) + y_{j+1}(t)), \quad j = 1, . . . , n,


and thus

\begin{pmatrix} y_1''(t) \\ y_2''(t) \\ y_3''(t) \\ y_4''(t) \\ \vdots \\ y_n''(t) \end{pmatrix} = q^2 \begin{pmatrix} −2 & 1 & 0 & 0 & \cdots & 0 \\ 1 & −2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & −2 & 1 & \cdots & 0 \\ 0 & 0 & 1 & −2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & −2 \end{pmatrix} \begin{pmatrix} y_1(t) \\ y_2(t) \\ y_3(t) \\ y_4(t) \\ \vdots \\ y_n(t) \end{pmatrix}.
Since this n×n matrix A is symmetric, Rn has an orthonormal basis e1 , . . . , en consisting
of eigenvectors of A. Let λ1 , . . . , λn be the associated eigenvalues. If (z1 (t), . . . , zn (t))
are the coordinates of (y1 (t), . . . , yn (t)) with respect to e1 , . . . , en , we get
z_j''(t) = λ_j z_j(t), \quad j = 1, . . . , n.

We shall show in Section 7.5 that the eigenvalues are negative. Thus, with k_j = \sqrt{−λ_j}, we have

z_j''(t) + k_j^2 z_j(t) = 0, \quad j = 1, . . . , n.

The solutions of this second-order differential equation are given by

z_j(t) = a_j e^{ik_j t} + b_j e^{−ik_j t} = c_j \sin (k_j t + δ_j), \quad j = 1, . . . , n.

Hence,

y(t) = (y_1(t), . . . , y_n(t)) = c_1 \sin (k_1 t + δ_1)e_1 + · · · + c_n \sin (k_n t + δ_n)e_n.

The solution y(t) = c_j \sin (k_j t + δ_j)e_j, for which all coefficients except c_j are zero, is called an eigenmode with eigenfrequency k_j/(2π).
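The eigenfrequencies are easy to compute numerically for any number of point masses. The sketch below (NumPy, not part of the text) builds the tridiagonal matrix above and returns the frequencies k_j/(2π); for n = 2 it gives the values q/(2π) and √3·q/(2π) found in Example 6.36 below.

```python
import numpy as np

def eigenfrequencies(n, q=1.0):
    """Eigenfrequencies k_j/(2 pi), where k_j = sqrt(-lambda_j) and lambda_j are the
    eigenvalues of the tridiagonal matrix of the system y'' = q^2 A y."""
    A = q**2 * (np.diag(-2.0 * np.ones(n))
                + np.diag(np.ones(n - 1), 1)
                + np.diag(np.ones(n - 1), -1))
    lam = np.linalg.eigvalsh(A)            # all eigenvalues are negative
    return np.sqrt(-lam) / (2 * np.pi)

print(eigenfrequencies(2))                 # the two eigenfrequencies of Example 6.36
```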


Example 6.36. Consider a string with two point masses. The corresponding matrix is

A = q^2 \begin{pmatrix} −2 & 1 \\ 1 & −2 \end{pmatrix}.

Its eigenvalues are λ1 = −q^2 and λ2 = −3q^2 with associated eigenvectors

e1 = (1, 1) and e2 = (1, −1).

The two eigenmodes

c_1 \sin (qt + δ_1)e_1 and c_2 \sin (\sqrt{3}\,qt + δ_2)e_2

are shown below.

[Figure: the two eigenmodes of a string with two point masses; in the first the masses move together (y1 = y2), in the second they move in opposite directions (y1 = −y2).]

Exercises
6.1. Find the eigenvalues and eigenvectors of the following matrices.

(a) \begin{pmatrix} 5 & −2 \\ 6 & −2 \end{pmatrix}, (b) \begin{pmatrix} 3 & −2 \\ 4 & −1 \end{pmatrix},

(c) \begin{pmatrix} −1 & 2 & 1 \\ 2 & −1 & 1 \\ −1 & 1 & 2 \end{pmatrix}, (d) \begin{pmatrix} 4 & −1 & 0 \\ 4 & 0 & 0 \\ 2 & −1 & 2 \end{pmatrix}.

6.2. Let A be an invertible square matrix. Show that if λ is an eigenvalue of A, then


λ−1 is an eigenvalue of A−1 .

6.3. (a) What eigenvalues can a projection have?


(b) What eigenvalues can a reflection have?

6.4. Let A be a square matrix such that the sum of the entries in each row equals λ.
Show that λ is an eigenvalue of A.

6.5. Let F be a linear transformation on a linear space V and assume that every non-
zero vector of V is an eigenvector of F . Show that there exists a real number λ
such that F = λI.


6.6. Determine, for each of the following matrices A, whether it is diagonalisable, and when it is, find a matrix T such that T^{−1}AT is a diagonal matrix.

(a) \begin{pmatrix} −3 & 3 & 1 \\ 4 & −1 & −2 \\ −14 & 9 & 6 \end{pmatrix}, (b) \begin{pmatrix} 3 & −1 & −1 \\ 4 & −1 & −2 \\ −2 & 1 & 2 \end{pmatrix}, (c) \begin{pmatrix} 1 & 4 & 6 \\ −3 & −7 & −7 \\ 4 & 8 & 7 \end{pmatrix}.

6.7. Find a diagonalisable 3 × 3 matrix with eigenvalues 1, 2 and 4 and associated


eigenvectors (1, 1, 1), (1, 2, −1) and (1, 0, 2), respectively.

6.8. Compute A^n for every positive integer n, where

A = \begin{pmatrix} 2 & 2 \\ 3 & 1 \end{pmatrix}.

6.9. Let A be a diagonalisable square matrix with non-negative eigenvalues. Show


that there exists a square matrix B such that B^2 = A.

6.10. Let A be a diagonalisable square matrix. Show that A^t is diagonalisable with the same eigenvalues as A.

6.11. Let V be an n-dimensional inner product space, where n > 0, and let F be the
linear transformation on V defined by

F (u) = hu, cib − hb, ciu

where b and c are vectors of V for which hb, ci 6= 0. Show that V has a basis
consisting of eigenvectors of F and find the matrix of F with respect to some such
basis.

6.12. Solve the recurrence problem

\begin{cases} a_{n+1} = 2a_n + 2b_n \\ b_{n+1} = 10a_n + 3b_n \end{cases}, \qquad a_0 = 4, \; b_0 = 1.

6.13. Solve the recurrence problem

\begin{cases} a_{n+1} = a_n + 2b_n + 2c_n \\ b_{n+1} = 3a_n + b_n + 9c_n \\ c_{n+1} = 2a_n + 2b_n + c_n \end{cases}, \qquad a_0 = 0, \; b_0 = 5, \; c_0 = 8.

6.14. Solve the recurrence problem

an+3 = an+2 + 4an+1 − 4an , a0 = 2, a1 = 7, a2 = 5.


6.15. Find, for each of the following matrices A, an orthogonal matrix T such that T^tAT is a diagonal matrix.

(a) \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 4 \\ 3 & 4 & 1 \end{pmatrix}, (b) \begin{pmatrix} 13 & −4 & −2 \\ −4 & 13 & 2 \\ −2 & 2 & 10 \end{pmatrix}.
6.16. Let A be an invertible square matrix.
(a) Show that the eigenvalues of the symmetric matrix A^tA are positive. Hint: x^tA^tAx = ‖Ax‖^2.
(b) Show that there exists a unique symmetric matrix B with positive eigenvalues such that B^2 = A^tA. Hint: Show that Bx = \sqrt{λ}\,x if A^tAx = λx.
(c) Show that A = QB where Q is an orthogonal matrix and B a symmetric matrix with positive eigenvalues. Hint: Try Q = (A^{−1})^t B.
6.17. Solve the following initial value problem.

\begin{cases} x_1'(t) = x_1(t) + 3x_2(t) + 2x_3(t) \\ x_2'(t) = 3x_1(t) − 4x_2(t) + 3x_3(t) \\ x_3'(t) = 2x_1(t) + 3x_2(t) + x_3(t) \end{cases}, \qquad \begin{cases} x_1(0) = 8 \\ x_2(0) = −5 \\ x_3(0) = 10 \end{cases}

6.18. Find the general solution of the following system of differential equations.

\begin{cases} x_1''(t) = x_1(t) + 2x_2(t) + x_3(t) \\ x_2''(t) = 2x_1(t) + x_2(t) + x_3(t) \\ x_3''(t) = 3x_1(t) + 3x_2(t) + 4x_3(t) \end{cases}

6.19. Find the eigenfrequencies and describe the corresponding eigenmodes for a string
with three point masses.

7 Quadratic Forms

7.1 Bilinear Forms


Definition 7.1. Let V be a linear space. A bilinear form on V is a 2-multilinear form
on V .

The definition means that a bilinear form b on V is a function b : V × V → R such that

b(su + tv, w) = sb(u, w) + tb(v, w) and b(w, su + tv) = sb(w, u) + tb(w, v)

for all vectors u, v and w of V and all real numbers s and t.

Example 7.2. The function b : Rn × Rn → R defined by


b(x, y) = \sum_{i=1}^{n} \sum_{k=1}^{n} b_{ik} x_i y_k

where b_{ik}, i = 1, . . . , n, k = 1, . . . , n, are real numbers is a bilinear form on Rn as is easily verified. If we think of x and y as columns, we can write

b(x, y) = x^t By

where B = [bik ].

Theorem 7.3. Let b be a bilinear form on a linear space V with basis e1 , . . . , en . If the
coordinates of u and v with respect to that basis are x and y, respectively, then

b(u, v) = xt By

where

B = \begin{pmatrix} b(e_1, e_1) & \cdots & b(e_1, e_n) \\ \vdots & & \vdots \\ b(e_n, e_1) & \cdots & b(e_n, e_n) \end{pmatrix}.

Proof. We have by assumption that

u = \sum_{i=1}^{n} x_i e_i \quad and \quad v = \sum_{k=1}^{n} y_k e_k.

Hence, by bilinearity,

b(u, v) = b\Bigl(\sum_{i=1}^{n} x_i e_i, \sum_{k=1}^{n} y_k e_k\Bigr) = \sum_{i=1}^{n} x_i\, b\Bigl(e_i, \sum_{k=1}^{n} y_k e_k\Bigr) = \sum_{i=1}^{n} \sum_{k=1}^{n} x_i y_k\, b(e_i, e_k) = x^t By.

Definition 7.4. Let b be a bilinear form on a linear space V and let e1 , . . . , en be a


basis for V . The matrix B in Theorem 7.3 is called the matrix of b with respect to the
basis e1 , . . . , en .

By using the basis ε1 , . . . , εn for Rn , we see that every bilinear form on Rn is of the form
described in Example 7.2.
Definition 7.5. A bilinear form b on a linear space V is said to be symmetric if

b(u, v) = b(v, u)

for all u and v in V .

Theorem 7.6. Let b be a bilinear form on a linear space V with basis e1 , . . . , en and
let B be the matrix of b with respect to that basis. Then b is symmetric if and only if B
is symmetric.

Proof. Let x and y be the coordinates of u and v, respectively. Then b(u, v) = xt By and
b(v, u) = y t Bx = xt (y t B)t = xt B t y. Hence, b is symmetric if and only if xt By = xt B t y
for all x and y in Rn , and this is equivalent to B = B t .

Theorem 7.7. Let b be a bilinear form on a linear space V with bases e1 , . . . , en and
e′1 , . . . , e′n . If B is the matrix of b with respect to e1 , . . . , en and T is the transition
matrix from e′1 , . . . , e′n to e1 , . . . , en , then the matrix of b with respect to e′1 , . . . , e′n is

B ′ = T tBT.

Proof. If the coordinates of u and v with respect to the bases are x, y and x′ , y ′ , then

b(u, v) = xt By = (T x′ )t BT y′ = (x′ )t T tBT y′ .

This shows that the matrix with respect to e′1 , . . . , e′n is B ′ = T tBT .
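For concreteness, here is a small numerical illustration (NumPy, not part of the text) of the last two theorems: a bilinear form on R2 is evaluated as x^tBy, and its matrix with respect to a new basis is obtained as T^tBT. The matrices B and T below are arbitrary choices made only for the example.

```python
import numpy as np

B = np.array([[1., 2.],              # matrix of b with respect to a basis e1, e2
              [0., 3.]])
x, y = np.array([1., 2.]), np.array([3., -1.])
print(x @ B @ y)                     # b(u, v) when u, v have coordinates x and y

T = np.array([[1., 1.],              # transition matrix: e1' = e1, e2' = e1 + e2
              [0., 1.]])
print(T.T @ B @ T)                   # matrix of b with respect to e1', e2'
```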

7.2 Definition of Quadratic Forms


Definition 7.8. Let V be a linear space. A function q : V → R is a quadratic form
on V if there exists a bilinear form b on V such that

q(u) = b(u, u), u ∈ V.


Example 7.9. The function q : R3 → R defined by

q(x) = x_1^2 + 2x_2^2 + x_3^2 + 2x_1 x_2 + 4x_1 x_3 + 6x_2 x_3

is a quadratic form on R3 since q(x) = b(x, x) where b is the bilinear form on R3 defined
by
b(x, y) = x1 y1 + 2x2 y2 + x3 y3 + 2x1 y2 + 4x1 y3 + 6x2 y3 .
We also have q(x) = c(x, x) where c is the symmetric bilinear form on R3 defined by

c(x, y) = x1 y1 + 2x2 y2 + x3 y3 + x1 y2 + x2 y1 + 2x1 y3 + 2x3 y1 + 3x2 y3 + 3x3 y2 .

Theorem 7.10. Let q be a quadratic form on a linear space V . Then there exists a
unique symmetric bilinear form c on V such that q(u) = c(u, u) for all u ∈ V .

Proof. Let b be any bilinear form on V such that q(u) = b(u, u) for all u ∈ V and define c by c(u, v) = (b(u, v) + b(v, u))/2. Then c is symmetric and q(u) = c(u, u) for all u ∈ V. To show the uniqueness, we let c be any symmetric bilinear form on V for which q(u) = c(u, u) for all u ∈ V. Then

q(u + v) = c(u + v, u + v) = c(u, u) + c(v, v) + c(u, v) + c(v, u) = q(u) + q(v) + 2c(u, v).

Consequently,

c(u, v) = \frac{q(u + v) − q(u) − q(v)}{2}
is uniquely determined by q.
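The uniqueness formula can be tried out directly. The sketch below (NumPy, not part of the text) applies it to the quadratic form of Example 7.9 above and confirms that the recovered bilinear form is symmetric.

```python
import numpy as np

def q(x):
    x1, x2, x3 = x
    return x1**2 + 2*x2**2 + x3**2 + 2*x1*x2 + 4*x1*x3 + 6*x2*x3

def c(u, v):
    # The symmetric bilinear form associated with q.
    return (q(u + v) - q(u) - q(v)) / 2

u, v = np.array([1., 0., 2.]), np.array([0., 1., -1.])
print(c(u, v), c(v, u))              # both equal 3
```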

Definition 7.11. Let q be a quadratic form on a linear space V with basis e1 , . . . , en


and let b be the symmetric bilinear form associated with q. The matrix of b with respect
to e1 , . . . , en is called the matrix of q with respect to e1 , . . . , en .

Hence, the matrix of a quadratic form is symmetric.

Example 7.12. The matrix with respect to the basis ε1 , ε2 , ε3 of the quadratic form in
Example 7.9 is

\begin{pmatrix} 1 & 1 & 2 \\ 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}.

7.3 The Spectral Theorem Applied to Quadratic Forms


Definition 7.13. Let q be a quadratic form on a linear space V with basis e1 , . . . , en .
If the matrix of q with respect to that basis is a diagonal matrix, the basis is said to
diagonalise q.


If the basis e1 , . . . , en diagonalises q and


 
B = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix}

is the corresponding diagonal matrix, then

q(x_1 e_1 + x_2 e_2 + · · · + x_n e_n) = x^t Bx = λ_1 x_1^2 + λ_2 x_2^2 + · · · + λ_n x_n^2.

Theorem 7.14. If q is a quadratic form on a non-zero finite-dimensional linear space V ,


then there exists a basis for V that diagonalises q.

Proof. Let e1 , . . . , en be any basis for V and let B be the matrix of q with respect to
that basis. Since B is a symmetric matrix, it follows from Theorem 6.31 that there exist
an orthogonal matrix T and a diagonal matrix D such that T tBT = D. Let e′k be the
vector in V whose coordinate vector with respect to e1 , . . . , en is the kth column of T .
Then the matrix of q with respect to the basis e′1 . . . , e′n is D.

If V is an inner product space and we start out with an orthonormal basis e1 , . . . , en in


the above proof, then also e′1 , . . . , e′n is an orthonormal basis since T is orthogonal. Thus
we get the following theorem.

Theorem 7.15. If q is a quadratic form on a non-zero finite-dimensional inner product


space V , then there exists an orthonormal basis for V that diagonalises q.

Theorem 7.16. Let q be a quadratic form on an inner product space V with ortho-
normal bases e1 , . . . , en and f 1 , . . . , f n . If

q(x_1 e_1 + x_2 e_2 + · · · + x_n e_n) = λ_1 x_1^2 + λ_2 x_2^2 + · · · + λ_n x_n^2,
q(x_1 f_1 + x_2 f_2 + · · · + x_n f_n) = µ_1 x_1^2 + µ_2 x_2^2 + · · · + µ_n x_n^2

for all x ∈ Rn where λ1 ≤ λ2 ≤ · · · ≤ λn and µ1 ≤ µ2 ≤ · · · ≤ µn , then λi = µi for


i = 1, 2, . . . , n.

Proof. It suffices to show that the matrices B and C of q with respect to the two bases
have the same characteristic polynomial. Since both bases are orthonormal, the transition
matrix T from f 1 , . . . , f n to e1 , . . . , en is orthogonal. Hence, C = T tBT = T −1BT , and
it follows that

det (C − λI) = det (T −1BT − T −1 λIT ) = det (T −1 (B − λI)T ) = det (B − λI).

Definition 7.17. If q is a quadratic form on a non-zero finite-dimensional inner product


space V , then the eigenvalues of q are the eigenvalues of the matrix of q with respect to
any orthonormal basis for V .


Example 7.18. Consider the quadratic form

q(x) = 3x_2^2 + 3x_3^2 + 4x_1 x_2 + 4x_1 x_3 − 2x_2 x_3

on R3. The matrix of q with respect to the orthonormal basis ε1, ε2, ε3 is

B = \begin{pmatrix} 0 & 2 & 2 \\ 2 & 3 & −1 \\ 2 & −1 & 3 \end{pmatrix}.

The eigenvalues are λ1 = −2, λ2 = 4, λ3 = 4, and

e_1' = \frac{1}{\sqrt{6}}(2, −1, −1), \quad e_2' = \frac{1}{\sqrt{2}}(0, 1, −1), \quad e_3' = \frac{1}{\sqrt{3}}(1, 1, 1)

form an orthonormal basis for R3 of eigenvectors of B. Hence, if x = x_1'e_1' + x_2'e_2' + x_3'e_3', then

q(x) = −2(x_1')^2 + 4(x_2')^2 + 4(x_3')^2.
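Numerically, the diagonal representation of Example 7.18 is obtained with numpy.linalg.eigh; the sketch below (not part of the text) also checks that q(x) = x^tBx agrees with the diagonal expression in the new coordinates.

```python
import numpy as np

B = np.array([[0., 2., 2.],
              [2., 3., -1.],
              [2., -1., 3.]])
lam, T = np.linalg.eigh(B)           # eigenvalues in ascending order: -2, 4, 4
x = np.array([1., 2., 3.])           # an arbitrary vector
x_new = T.T @ x                      # coordinates with respect to the eigenvector basis
print(lam)
print(x @ B @ x, np.sum(lam * x_new**2))   # the two values agree
```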

Theorem 7.19. Let q be a quadratic form on an n-dimensional inner product space V


and let λ1 ≤ λ2 ≤ · · · ≤ λn be its eigenvalues ordered in ascending order. Then

λ_1 = \min_{‖u‖=1} q(u) \quad and \quad λ_n = \max_{‖u‖=1} q(u).

Proof. There exists an orthonormal basis that diagonalises q, and its vectors e1 , . . . , en
can be ordered so that ei corresponds to λi for i = 1, . . . , n. Let u be any vector of
length 1 in V . If (x1 , . . . , xn ) are the coordinates of u with respect to e1 , . . . , en , then
x_1^2 + · · · + x_n^2 = ‖u‖^2 = 1 and q(u) = λ_1 x_1^2 + · · · + λ_n x_n^2. Hence,

λ_1 = λ_1(x_1^2 + · · · + x_n^2) ≤ q(u) ≤ λ_n(x_1^2 + · · · + x_n^2) = λ_n

with equality in the left inequality if x_2 = · · · = x_n = 0 and in the right inequality if x_1 = · · · = x_{n−1} = 0.

Corollary 7.20. Using the same notation as in Theorem 7.19, we have

λ_1 = \min_{u ≠ 0} \frac{q(u)}{‖u‖^2} \quad and \quad λ_n = \max_{u ≠ 0} \frac{q(u)}{‖u‖^2}.

Proof. When u ranges over all non-zero vectors, \frac{1}{‖u‖}u ranges over all vectors of length 1. The statement of the corollary now follows from the fact that

q\Bigl(\frac{1}{‖u‖}u\Bigr) = \frac{q(u)}{‖u‖^2}.


Example 7.21. Consider anew the quadratic form

q(x) = 3x_2^2 + 3x_3^2 + 4x_1 x_2 + 4x_1 x_3 − 2x_2 x_3

in Example 7.18 with eigenvalues −2 ≤ 4 ≤ 4. Its minimum and maximum values subject to the constraint x_1^2 + x_2^2 + x_3^2 = 1 are −2 and 4, respectively. When

x_1^2 + x_2^2 + x_3^2 = (x_1')^2 + (x_2')^2 + (x_3')^2 = 1,

we have equality in the inequality

−2 ≤ −2(x_1')^2 + 4(x_2')^2 + 4(x_3')^2

if and only if (x_1', x_2', x_3') = ±(1, 0, 0). Hence, the minimum value is attained at the two points ±\frac{1}{\sqrt{6}}(2, −1, −1). In the inequality

−2(x_1')^2 + 4(x_2')^2 + 4(x_3')^2 ≤ 4,

we have equality if and only if x_1' = 0 and (x_2')^2 + (x_3')^2 = 1. The maximum value is, therefore, attained at all points on the unit circle centred at the origin and lying in the plane 2x_1 − x_2 − x_3 = 0 perpendicular to the vector e_1'.

7.4 Quadratic Equations


7.4.1 Two Variables
The general quadratic equation in two variables is of the form
a_{11} x_1^2 + a_{12} x_1 x_2 + a_{21} x_2 x_1 + a_{22} x_2^2 + b_1 x_1 + b_2 x_2 = c

where the quadratic form q(x_1, x_2) = a_{11} x_1^2 + a_{12} x_1 x_2 + a_{21} x_2 x_1 + a_{22} x_2^2 is not identically zero.
Consider first the equation

q(x_1, x_2) = c.

By choosing an orthonormal basis e_1', e_2' for R2 that diagonalises q, we get an equation

λ_1 (x_1')^2 + λ_2 (x_2')^2 = c.

Since q is not identically zero, we must have λ_1 ≠ 0 or λ_2 ≠ 0.
Suppose that λ_1 > 0 and λ_2 > 0. If c < 0, the solution set is empty. If c = 0, the only solution is (0, 0). If c > 0, the equation can be written as

\frac{(x_1')^2}{a_1^2} + \frac{(x_2')^2}{a_2^2} = 1

where

a_1 = \sqrt{\frac{c}{λ_1}} \quad and \quad a_2 = \sqrt{\frac{c}{λ_2}}.

We recognise this as the equation of an ellipse with centre at the origin and whose axes are spanned by the vectors e_1' and e_2'. If λ_1 = λ_2, the ellipse is a circle with radius a_1.


[Figure: an ellipse, a hyperbola and a parabola, each drawn in the x1x2-plane together with the rotated axes x1', x2'.]

Suppose that λ_1 > 0 and λ_2 < 0. If c = 0, the solution set

\sqrt{λ_1}\,x_1' = ±\sqrt{−λ_2}\,x_2'

consists of two lines intersecting at the origin. If c > 0, the equation can be written as

\frac{(x_1')^2}{a_1^2} − \frac{(x_2')^2}{a_2^2} = 1

where

a_1 = \sqrt{\frac{c}{λ_1}} \quad and \quad a_2 = \sqrt{\frac{c}{−λ_2}}.

Hence, the solution set is a hyperbola with centre at the origin. Its transverse axis is spanned by e_1' and its conjugate axis by e_2'. If c < 0, the equation can be written as

−\frac{(x_1')^2}{a_1^2} + \frac{(x_2')^2}{a_2^2} = 1.

This is also the equation of a hyperbola with centre at the origin, but now the transverse axis is spanned by e_2' and the conjugate axis by e_1'.
All the remaining cases with non-zero eigenvalues can be brought back to one of the
previous cases by changing the signs of both sides of the equation or reindexing the
eigenvalues and eigenvectors or both.
If one eigenvalue is zero, the solution set is empty or consists of one or two lines
depending on the value of c.
Also in the general case

q(x1 , x2 ) + b1 x1 + b2 x2 = c,

we diagonalise q. With respect to the new basis, the equation becomes

λ_1 (x_1')^2 + λ_2 (x_2')^2 + b_1' x_1' + b_2' x_2' = c'.

If both eigenvalues are non-zero, we can complete the two squares and get

λ_1\Bigl(x_1' + \frac{b_1'}{2λ_1}\Bigr)^2 + λ_2\Bigl(x_2' + \frac{b_2'}{2λ_2}\Bigr)^2 = c' + \frac{(b_1')^2}{4λ_1} + \frac{(b_2')^2}{4λ_2}.


By placing the origin at the point

\Bigl(−\frac{b_1'}{2λ_1}, −\frac{b_2'}{2λ_2}\Bigr),

we get an equation of the form

λ_1 (x_1'')^2 + λ_2 (x_2'')^2 = c''

bringing us back to the cases already discussed.
If λ_1 ≠ 0 and λ_2 = 0, the equation is

λ_1 (x_1')^2 + b_1' x_1' + b_2' x_2' = c'.

Completing the square, we get

λ_1\Bigl(x_1' + \frac{b_1'}{2λ_1}\Bigr)^2 + b_2' x_2' = c' + \frac{(b_1')^2}{4λ_1}.

If b_2' = 0, the solution set is empty or consists of one line or two parallel lines. Otherwise, it is a parabola with vertex at

\Bigl(−\frac{b_1'}{2λ_1}, \frac{4λ_1 c' + (b_1')^2}{4λ_1 b_2'}\Bigr)

and symmetry axis parallel to e_2'.
The case where λ_1 = 0 and λ_2 ≠ 0 can be brought back to the previous case by reindexing the eigenvectors and eigenvalues.
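For the pure quadratic case q(x1, x2) = c with c > 0 and non-zero eigenvalues, the classification above depends only on the signs of the eigenvalues. A minimal sketch (NumPy, not part of the text):

```python
import numpy as np

def classify_conic(B, c):
    """Type of the curve q(x) = c for a symmetric 2x2 matrix B, assuming c > 0 and
    that both eigenvalues are non-zero."""
    lam = np.linalg.eigvalsh(B)          # ascending order
    if lam[0] > 0:
        return "ellipse"                 # both eigenvalues positive
    if lam[1] < 0:
        return "empty set"               # both eigenvalues negative
    return "hyperbola"                   # eigenvalues of opposite signs

# 18 x1^2 + 12 x2^2 - 8 x1 x2 = 40 (cf. Exercise 7.8) has matrix [[18, -4], [-4, 12]].
print(classify_conic(np.array([[18., -4.], [-4., 12.]]), 40.0))   # ellipse
```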

7.4.2 Three Variables


The general quadratic equation in three variables is of the form

q(x1 , x2 , x3 ) + b1 x1 + b2 x2 + b3 x3 = c

where q is a quadratic form on R3 , not identically zero. Also now we begin by studying
the equation
q(x1 , x2 , x3 ) = c.
We can write this equation as

λ_1 (x_1')^2 + λ_2 (x_2')^2 + λ_3 (x_3')^2 = c

where (x′1 , x′2 , x′3 ) are the coordinates of x with respect to an orthonormal basis e′1 , e′2 , e′3
that diagonalises q. At least one eigenvalue is non-zero.
Suppose first that the eigenvalues are positive. If c < 0, the solution set is empty, and
if c = 0, the only solution is (0, 0, 0). If c > 0, the surface is called an ellipsoid. The
intersection between the surface and any of the coordinate planes x′i = 0 is an ellipse. In
fact, the intersection between the surface and the plane x′i = d is an ellipse if λi d2 < c.


[Figure: the quadric surfaces, namely the ellipsoid, hyperboloid of one sheet, cone, hyperboloid of two sheets, elliptic cylinder, hyperbolic cylinder, elliptic paraboloid, hyperbolic paraboloid and parabolic cylinder.]

Assume that λ1 > 0, λ2 > 0 and λ3 < 0. If c > 0, the surface is a hyperboloid of
one sheet. The intersection between the surface and a plane x′3 = d is an ellipse. The
intersections between the surface and planes of the form x′1 = d or x′2 = d are hyperbolae.
If c = 0, the surface is a cone. The intersection between the surface and the plane x′3 = d
is an ellipse when d 6= 0 and (0, 0, 0) when d = 0. The intersection between the surface
and one of the coordinate planes x′1 = 0 and x′2 = 0 consists of two intersecting lines. If
c < 0, the surface is a hyperboloid of two sheets. The intersection between the surface
and the plane x′3 = d is empty when λ3 d2 > c, consists of one point when λ3 d2 = c and is
an ellipse when λ3 d2 < c. The intersection between the surface and one of the coordinate
planes x′1 = 0 and x′2 = 0 is a hyperbola.
Suppose that λ1 > 0, λ2 > 0 and λ3 = 0. If c < 0, the solution set is empty. If c = 0,
the solution set consists of the x′3 -axis. If c > 0, the surface is an elliptic cylinder.


Assume that λ1 > 0, λ2 < 0 and λ3 = 0. If c = 0, the solution set consists of two
intersecting planes. Otherwise, the surface is a hyperbolic cylinder.
If λ1 6= 0 and λ2 = λ3 = 0, the solution set is empty, a plane or two parallel planes,
depending on the value of c.
After diagonalisation, the general equation becomes

λ1 (x′1 )2 + λ2 (x′2 )2 + λ3 (x′3 )2 + b′1 x′1 + b′2 x′2 + b′3 x′3 = c′ .

Completing squares takes us back to the previous cases when the eigenvalues are non-zero.
When λ1 6= 0, λ2 6= 0 and λ3 = 0, we get an equation of the form

λ1 (x′′1 )2 + λ2 (x′′2 )2 + b′′3 x′′3 = c′′ .

We need only consider the case where b′′3 6= 0. If λ1 and λ2 have the same sign, the
surface is an elliptic paraboloid, otherwise a hyperbolic paraboloid.
When λ1 6= 0 and λ2 = λ3 = 0, the equation becomes

λ1 (x′′1 )2 + b′′2 x′′2 + b′′3 x′′3 = c′′ .

If at least one of b′′2 and b′′3 is non-zero, the surface is a parabolic cylinder.
Above we have regarded spheres as ellipsoids. In general, if two or more eigenvalues
are equal and the quadratic equation represents a surface, we call that surface a surface
of revolution.
Example 7.22. We set out to find the type of the surface

q(x_1, x_2, x_3) = x_1^2 − 4x_2^2 + x_3^2 + 6x_1 x_2 + 4x_1 x_3 + 6x_2 x_3 = 110.

We also wish to find the points on the surface closest to the origin and the distance from those points to the origin. The eigenvalues of

B = \begin{pmatrix} 1 & 3 & 2 \\ 3 & −4 & 3 \\ 2 & 3 & 1 \end{pmatrix}

are λ1 = −6, λ2 = −1 and λ3 = 5. The surface is therefore a hyperboloid of two sheets. Let e_1', e_2', e_3' be an orthonormal basis of eigenvectors associated with λ1, λ2, λ3 and (x_1', x_2', x_3') the coordinates of x = (x_1, x_2, x_3) with respect to that basis. Then

q(x) = −6(x_1')^2 − (x_2')^2 + 5(x_3')^2.

For x on the surface, we have

‖x‖^2 = (x_1')^2 + (x_2')^2 + (x_3')^2 ≥ \frac{−6(x_1')^2 − (x_2')^2 + 5(x_3')^2}{5} = \frac{q(x)}{5} = \frac{110}{5} = 22

with equality if and only if x_1' = x_2' = 0, x_3' = ±\sqrt{22}. Hence, the minimum distance from a point on the surface to the origin is \sqrt{22} and is attained at (x_1', x_2', x_3') = ±(0, 0, \sqrt{22}). A unit eigenvector associated with λ3 is e_3' = (1/\sqrt{22})(3, 2, 3). The points on the surface closest to the origin are therefore (x_1, x_2, x_3) = ±\sqrt{22}\,e_3' = ±(3, 2, 3).


7.5 Sylvester’s Law of Inertia


Definition 7.23. Let q be a quadratic form on a linear space V .
• q is positive definite if q(u) > 0 for all non-zero vectors u ∈ V .
• q is positive semidefinite if q(u) ≥ 0 for all u ∈ V and q(v) = 0 for some non-zero
vector v ∈ V .
• q is negative definite if q(u) < 0 for all non-zero vectors u ∈ V .
• q is negative semidefinite if q(u) ≤ 0 for all u ∈ V and q(v) = 0 for some non-zero
vector v ∈ V .
• q is indefinite if q(u) > 0 and q(v) < 0 for some vectors u and v in V .

If a quadratic form q is diagonalised with respect to two orthonormal bases for an inner
product space V , then the coefficients in the two representations are the same according
to Theorem 7.16. For any two diagonalising bases, this need not be true. Let

q(u) = λ_1 x_1^2 + · · · + λ_n x_n^2

be the diagonal representation of q with respect to a basis e1 , . . . , en that diagonalises q.


Then q is positive definite if and only if λi > 0 for i = 1, . . . , n. Hence, if all coefficients
are positive in one diagonal representation of q, then they are positive in every diagonal
representation of q. Below, we shall see that this statement can be strengthened.
If q is a quadratic form on a linear space V and U is a subspace of V , then the
restriction q ↾U of q to U is clearly a quadratic form on U . We say that q is positive or
negative definite on U if q ↾U is positive or negative definite, respectively.
Definition 7.24. Let q be a quadratic form on a finite-dimensional linear space V . The
positive index σ+ of inertia of q is the maximum dimension of subspaces of V on which
q is positive definite. The negative index σ− of inertia of q is the maximum dimension of
subspaces of V on which q is negative definite. The signature of q is the pair (σ+ , σ− ).

Theorem 7.25 (Sylvester’s law of inertia). Let q be a quadratic form on a linear


space V and suppose that the basis e1 , . . . , en for V diagonalises q. If

q(x_1 e_1 + x_2 e_2 + · · · + x_n e_n) = λ_1 x_1^2 + λ_2 x_2^2 + · · · + λ_n x_n^2

for all x ∈ Rn , then the number of positive λi equals σ+ and the number of negative λi
equals σ− .

Proof. After reindexing the basis vectors and coefficients if necessary, we may assume
that λ1 , . . . , λk are positive and λk+1 , . . . , λn are non-positive. Set U+ = [e1 , . . . , ek ]
and U− = [ek+1 , . . . , en ]. We use here the convention that the subspace spanned by no
vectors is the zero space {0}. Then q is positive definite on U+ . Let U be any subspace
of V on which q is positive definite. Since q(u) > 0 for all non-zero vectors u ∈ U and
q(u) ≤ 0 for all vectors u ∈ U− , we must have U ∩ U− = {0}. Hence, by Theorem 2.62,

dim U + n − k = dim U + dim U− = dim (U + U− ) ≤ dim V = n,


and therefore dim U ≤ k. This shows that k is the maximum dimension of subspaces
on which q is positive definite. We now obtain the statement about σ− by applying the
statement about σ+ to the quadratic form −q.

Example 7.26. We wish to find the type of the surface

x_1^2 + 6x_1 x_2 − 4x_1 x_3 + 7x_2^2 − 4x_2 x_3 + 2x_3^2 = 1.

This time, as it turns out, the matrix

B = \begin{pmatrix} 1 & 3 & −2 \\ 3 & 7 & −2 \\ −2 & −2 & 2 \end{pmatrix}

of the quadratic form is not well suited for manual computation of eigenvalues. Instead,
we set out to find the representation with respect to some diagonalising basis, not ne-
cessarily orthonormal. Then we can use Theorem 7.25 to find the number of positive
and negative eigenvalues of the form. We find a diagonal representation by completing
squares as follows.

q(x) = x_1^2 + 6x_1 x_2 − 4x_1 x_3 + 7x_2^2 − 4x_2 x_3 + 2x_3^2
= x_1^2 + (6x_2 − 4x_3)x_1 + 7x_2^2 − 4x_2 x_3 + 2x_3^2
= (x_1 + (3x_2 − 2x_3))^2 − (3x_2 − 2x_3)^2 + 7x_2^2 − 4x_2 x_3 + 2x_3^2
= (x_1 + 3x_2 − 2x_3)^2 − 2x_2^2 + 8x_2 x_3 − 2x_3^2
= (x_1 + 3x_2 − 2x_3)^2 − 2(x_2 − 2x_3)^2 + 8x_3^2 − 2x_3^2
= (x_1 + 3x_2 − 2x_3)^2 − 2(x_2 − 2x_3)^2 + 6x_3^2.

Setting

\begin{cases} x_1' = x_1 + 3x_2 − 2x_3 \\ x_2' = x_2 − 2x_3 \\ x_3' = x_3 \end{cases}

we obtain

q(x) = (x_1')^2 − 2(x_2')^2 + 6(x_3')^2.
The coefficient matrix

\begin{pmatrix} 1 & 3 & −2 \\ 0 & 1 & −2 \\ 0 & 0 & 1 \end{pmatrix}
is clearly invertible, whence it is the transition matrix from ε1, ε2, ε3 to some basis e1, e2, e3. Its inverse T is then the transition matrix from e1, e2, e3 to ε1, ε2, ε3. Hence, the matrix of q with respect to the basis e1, e2, e3 is

T^tBT = D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & −2 & 0 \\ 0 & 0 & 6 \end{pmatrix}.


The representation of q with respect to the diagonalising basis e1 , e2 , e3 has two positive
and one negative coefficients. Hence, q has two positive and one negative eigenvalues.
Therefore and since the right-hand side of the equation of the surface is positive, the
surface is a hyperboloid of one sheet. The reader should be aware that the coefficients
1, −2 and 6 are not eigenvalues of B and hence not of q.
When carried out correctly, the above method always yields an invertible coefficient
matrix and hence a basis that diagonalises q. Sometimes, however, there are no squares
to complete as in the following example.
Example 7.27. The quadratic form
q(x) = x1 x2 + x1 x3 + x2 x3
on R3 has no squares. As a remedy for this we begin with the following change of
coordinates.

\begin{cases} x_1 = x_1' + x_2' \\ x_2 = x_1' − x_2' \\ x_3 = x_3' \end{cases}

Since the coefficient matrix is invertible, this yields a change of basis. We get

q(x) = (x_1')^2 − (x_2')^2 + 2x_1' x_3'.

Now we can proceed as in the previous example.

q(x) = (x_1')^2 − (x_2')^2 + 2x_1' x_3' = (x_1' + x_3')^2 − (x_2')^2 − (x_3')^2 = (x_1'')^2 − (x_2'')^2 − (x_3'')^2.
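As a numerical cross-check of Example 7.26 (not part of the text, using NumPy), the eigenvalue signs of B confirm the signature (σ+, σ−) = (2, 1), even though the eigenvalues themselves have nothing to do with the coefficients 1, −2 and 6 produced by completing squares.

```python
import numpy as np

B = np.array([[1., 3., -2.],
              [3., 7., -2.],
              [-2., -2., 2.]])
lam = np.linalg.eigvalsh(B)
print(lam)
print("signature:", (int(np.sum(lam > 0)), int(np.sum(lam < 0))))   # (2, 1)
```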
The method used in the above two examples works well for finding the type of a surface
but is useless for exploring metric properties of the surface. For example, it cannot be
used to find the points on the surface closest to the origin. The reason for this is that
the diagonalising basis need not be orthonormal. Nor can it reveal whether the surface
is a surface of revolution. Even if two coefficients happen to be equal in the diagonal
representation, nothing says that two eigenvalues of the quadratic form must be equal.
The following result is an immediate consequence of Definition 7.24 but can also be
regarded as a corollary to Theorem 7.25.
Corollary 7.28. Let q be a quadratic form on an n-dimensional linear space V . Then
the following statements hold.
• q is positive definite if and only if σ+ = n.
• q is positive semidefinite if and only if σ+ < n and σ− = 0.
• q is negative definite if and only if σ− = n.
• q is negative semidefinite if and only if σ+ = 0 and σ− < n.
• q is indefinite if and only if σ+ > 0 and σ− > 0.
Definition 7.29. The sign function on R is defined by


sgn(x) = \begin{cases} 1, & if x > 0, \\ 0, & if x = 0, \\ −1, & if x < 0. \end{cases}


Let q be a quadratic form on an n-dimensional linear space V with matrix

B = \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nn} \end{pmatrix}

with respect to some basis e1, . . . , en for V and let D be the matrix of q with respect to some basis for V that diagonalises q. Then D = T^tBT for some invertible matrix T, whence

det D = det(T^tBT) = (det T)^2 det B.

Since (det T)^2 > 0, it follows that sgn(det D) = sgn(det B).
Assume that det B ≠ 0. Then σ_+ + σ_− = n. If λ1, . . . , λn are the diagonal entries of D, then σ_+ of the λi are positive and σ_− of them are negative. Since det D = \prod_{i=1}^{n} λ_i, we get

sgn(det B) = (−1)^{σ_−}.

Set U_0 = {0} and U_m = [e_1, . . . , e_m] for m = 1, . . . , n. Let, for m = 0, . . . , n, q_m be the restriction of q to U_m, and denote by σ_+^{(m)} and σ_−^{(m)} the indices of inertia of q_m. For m = 1, . . . , n, the matrix of q_m with respect to the basis e_1, . . . , e_m is

B_m = \begin{pmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mm} \end{pmatrix}.

Let d_0 = 1 and suppose that d_m = det B_m ≠ 0 for m = 1, . . . , n. Then

σ_+^{(m)} + σ_−^{(m)} = m \quad and \quad sgn(d_m) = (−1)^{σ_−^{(m)}}, \quad m = 0, . . . , n.

Suppose that 0 ≤ m ≤ n − 1. A subspace U of U_m is also a subspace of U_{m+1}. If q_m is positive definite or negative definite on U, then q_{m+1} is positive definite or negative definite, respectively, on U. Hence, σ_+^{(m)} ≤ σ_+^{(m+1)} and σ_−^{(m)} ≤ σ_−^{(m+1)}, and therefore

σ_−^{(m+1)} = σ_−^{(m)} \quad or \quad σ_−^{(m+1)} = σ_−^{(m)} + 1.

In the first case, d_m and d_{m+1} have the same sign, and in the second case, d_m and d_{m+1} have opposite signs. Hence, σ_− equals the number of sign changes in the sequence d_0, d_1, . . . , d_n.
Summing up, we have the following theorem.

Theorem 7.30. Let q be a quadratic form on a non-zero n-dimensional linear space V


and let B be its matrix with respect to a basis for V . Set d0 = 1 and suppose that
dm = det Bm 6= 0 for m = 1, . . . , n. Then σ− equals the number of sign changes in the
sequence d0 , d1 , . . . , dn .


Example 7.31. Consider once again the quadratic form

x_1^2 + 6x_1 x_2 − 4x_1 x_3 + 7x_2^2 − 4x_2 x_3 + 2x_3^2

in Example 7.26 with matrix

B = \begin{pmatrix} 1 & 3 & −2 \\ 3 & 7 & −2 \\ −2 & −2 & 2 \end{pmatrix}.

Here

d_0 = 1, \quad d_1 = 1, \quad d_2 = \begin{vmatrix} 1 & 3 \\ 3 & 7 \end{vmatrix} = 7 − 9 = −2, \quad d_3 = \begin{vmatrix} 1 & 3 & −2 \\ 3 & 7 & −2 \\ −2 & −2 & 2 \end{vmatrix} = −12.

Since all the determinants are non-zero and there is only one change of sign in the
sequence 1, 1, −2, −12, we see that σ+ = 2 and σ− = 1.
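Theorem 7.30 is also convenient to apply by machine. The sketch below (NumPy, not part of the text) computes the leading principal minors and counts the sign changes; for the matrix of Example 7.31 it returns σ− = 1.

```python
import numpy as np

def negative_index(B):
    """sigma_-: the number of sign changes in d0 = 1, d1, ..., dn (all assumed non-zero)."""
    n = B.shape[0]
    d = [1.0] + [np.linalg.det(B[:m, :m]) for m in range(1, n + 1)]
    return sum(1 for m in range(n) if d[m] * d[m + 1] < 0)

B = np.array([[1., 3., -2.],
              [3., 7., -2.],
              [-2., -2., 2.]])
print(negative_index(B))             # 1
```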

Example 7.32. We can now fulfil the promise made in Section 6.6. Let B_n be the symmetric n × n matrix

\begin{pmatrix} −2 & 1 & 0 & \cdots & 0 \\ 1 & −2 & 1 & \cdots & 0 \\ 0 & 1 & −2 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & −2 \end{pmatrix}.

Then d_1 = det B_1 = −2, d_2 = det B_2 = 3 and, for n ≥ 3,

d_n = det B_n = \begin{vmatrix} −2 & 1 & 0 & \cdots & 0 \\ 1 & −2 & 1 & \cdots & 0 \\ 0 & 1 & −2 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & −2 \end{vmatrix} = −2\begin{vmatrix} −2 & 1 & \cdots & 0 \\ 1 & −2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & −2 \end{vmatrix} − \begin{vmatrix} 1 & 0 & \cdots & 0 \\ 1 & −2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & −2 \end{vmatrix} = −2d_{n−1} − d_{n−2}.

We can now prove by induction that d_n = (−1)^n (n + 1). The statement holds for n = 1 and n = 2, and if it holds for k < n where n ≥ 3, then

d_n = −2d_{n−1} − d_{n−2} = −2(−1)^{n−1} n − (−1)^{n−2}(n − 1) = (−1)^n (2n − (n − 1)) = (−1)^n (n + 1).

Hence, there are n sign changes in the sequence d0 , d1 , . . . , dn , and therefore all the
eigenvalues of Bn are negative.


Exercises
7.1. Find, for each of the following quadratic forms q on R3 , an orthonormal basis
for R3 that diagonalises q and find the corresponding diagonal representation.
(a) q(x) = 6x_1^2 + 3x_2^2 + 3x_3^2 − 4x_1 x_2 + 4x_1 x_3 − 2x_2 x_3,
(b) q(x) = x_1^2 + x_2^2 + x_3^2 − 2x_2 x_3.
7.2. Find the maximum and minimum values of
q(x_1, x_2, x_3) = 7x_1^2 + 3x_2^2 + 7x_3^2 + 2x_1 x_2 + 4x_2 x_3
subject to the constraint x_1^2 + x_2^2 + x_3^2 = 1. Also find the points where they occur.
7.3. Find the maximum and minimum values of
q(x_1, x_2, x_3) = x_1^2 + 2x_2^2 + 2x_3^2 + 8x_1 x_2 + 8x_1 x_3 + 6x_2 x_3
subject to the constraint x_1^2 + x_2^2 + x_3^2 ≤ 9.
7.4. (a) Find the minimum value of r(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 subject to the constraint
q(x_1, x_2, x_3) = x_1^2 + 3x_2^2 + x_3^2 + 2x_1 x_2 − 2x_1 x_3 − 2x_2 x_3 = 1.

(b) Does r(x_1, x_2, x_3) have a maximum value in the set where q(x_1, x_2, x_3) = 1?
7.5. Find the least value of a for which
3x_1^2 + 5x_2^2 + 3x_3^2 − 2x_1 x_2 + 2x_1 x_3 − 2x_2 x_3 ≤ a(x_1^2 + x_2^2 + x_3^2)
for all x ∈ R3.
7.6. Let A be a square matrix. Show that if λ is the least eigenvalue of the symmetric matrix A^tA, then
\min_{‖x‖=1} ‖Ax‖ = \sqrt{λ}.

7.7. Find the least possible distance between the points (x1 , x2 , x3 ) and (−x3 , x1 , x2 )
on the unit sphere.
7.8. Show that the curve described by the equation
18x_1^2 + 12x_2^2 − 8x_1 x_2 = 40
with respect to an orthonormal coordinate system for 2-space is an ellipse. Find
the lengths and directions of the semi-major and semi-minor axes.
7.9. A quadratic surface has, with respect to an orthonormal coordinate system for
3-space, the equation
3x_1^2 + 3x_2^2 − 8x_1 x_2 + 4x_1 x_3 − 4x_2 x_3 = 1.
Identify the type of surface and find the least distance from a point on the surface
to the origin.


7.10. A quadratic surface has, with respect to an orthonormal coordinate system for
3-space, the equation

11x_1^2 + 11x_2^2 + 14x_3^2 − 2x_1 x_2 − 8x_1 x_3 − 8x_2 x_3 = 1.

Show that it is an ellipsoid and find the points on the surface closest to the origin
and furthest from the origin.

7.11. A quadratic surface has, with respect to an orthonormal coordinate system for
3-space, the equation

2x_1^2 − x_2^2 − x_3^2 + 4x_1 x_2 − 4x_1 x_3 + 8x_2 x_3 = 1.

Show that it is a surface of revolution, identify its type and find the axis of
revolution.

7.12. A quadratic surface has, with respect to an orthonormal coordinate system for
3-space, the equation

x_1^2 + 6x_2^2 + x_3^2 + 6x_1 x_2 + 4x_1 x_3 + 6x_2 x_3 = 81.

Identify the type of surface and determine the distance from the surface to the
origin.

7.13. A quadratic surface has, with respect to an orthonormal coordinate system for
3-space, the equation

2x_1^2 + 3x_2^2 + 3x_3^2 − 2x_1 x_2 − 2x_1 x_3 − 4x_2 x_3 = 25.

Identify the type of surface and find the least distance from a point on the surface
to the origin.

7.14. Find the signatures of the following quadratic forms on R3 .


(a) x_1^2 + 5x_2^2 + 11x_3^2 − 4x_1 x_2 + 6x_1 x_3 − 10x_2 x_3,
(b) x_1^2 + 3x_2^2 + x_3^2 − 4x_1 x_2 + 2x_1 x_3 − 6x_2 x_3,
(c) 2x_1 x_2 − 3x_1 x_3 − x_2 x_3.

7.15. Which of the following quadratic forms on R3 are positive definite?


(a) (2x_1 + x_2 + x_3)^2 + (x_1 + 2x_2 + 2x_3)^2 + (x_1 − x_2 − x_3)^2,
(b) (x_1 + 2x_2 + x_3)^2 + (x_1 + x_2 + x_3)^2 + (x_1 − x_2 + 2x_3)^2.

7.16. Identify the types of the following surfaces.


(a) x_1^2 + 3x_2^2 + 2x_3^2 − 2x_1 x_2 − 2x_1 x_3 + 6x_2 x_3 = 1,
(b) x_1^2 + x_2^2 + x_3^2 + 2x_1 x_2 − 4x_1 x_3 − 4x_2 x_3 = 1.


7.17. Identify, for each value of the real constant a, the type of the surface

x_1^2 + (2a + 1)x_2^2 + ax_3^2 + 2ax_2 x_3 = 1.

7.18. Determine whether there is a change of coordinates that takes the quadratic form

q(x_1, x_2, x_3) = x_1^2 + 2x_1 x_2 − 4x_1 x_3 − 2x_2 x_3

to the quadratic form r in the following two cases.


(a) r(y_1, y_2, y_3) = y_1^2 + 3y_2^2 + y_3^2 − 4y_1 y_2 − 4y_1 y_3 + 6y_2 y_3,
(b) r(y_1, y_2, y_3) = y_1^2 + 5y_2^2 + 3y_3^2 − 4y_1 y_2 − 4y_1 y_3 + 10y_2 y_3.
7.19. Let a, b and c be non-zero real numbers. Prove that the equation

ax1 x2 + bx1 x3 + cx2 x3 = 1

is the equation of a hyperboloid. State conditions on a, b and c in order that the


hyperboloid be of one sheet and two sheets, respectively.
7.20. Find the signature of the quadratic form

q(x_1, x_2, x_3) = 4x_1^2 + 10x_2^2 + 3x_3^2 + 4x_1 x_2 + 8x_1 x_3 + 10x_2 x_3

by means of Theorem 7.30.


7.21. Show that the matrix  
3 2 1
B = 2 4 1
1 1 1
has exactly one eigenvalue in the open interval (1, 2) by studying the signatures
of the quadratic forms with matrices B − I and B − 2I.

Answers to Exercises
   
2 2 4 9 7 12

1.1. (a) 0 3 3, (b) not defined, (c) 4 4 6 ,
3 2 2 0 1 −1
 
2 3 2  
1 7 9
(d) 0 5 6, (e) not defined, (f) ,
3 6 2
4 8 5
   
7 12 16 13 17 23
(g) , (h) .
11 11 13 19 16 24
   
2 2 −2 4 −4 2
1.2. A −B = , (A + B)(A − B) = .
−5 −5 −6 −3
 
2 −4
1.3. B=t , t ∈ R.
−1 2
 
1 12 138
1.4. (b) 0 1 24 .
0 0 1
   
2 0 4 7 11
1.5. At B t = 3 5 8 , (At + B t )C t = 12 11 .
2 6 5 16 13
   
−1 1 1 −10 5 5
1
1.8. (a)  1 −2 1 , (b) not invertible, (c)  0 2 −1.
5
0 1 −1 5 −3 −1
     
0 −1 1  0 2 −1  −3 −2 4
−1 −1
1.9. A−1 =  2 2 −3 , At = −1 2 0  , A2 = 7 2 −7.
−1 0 1 1 −3 1 −1 1 0
 
3a − 8 4 −3
1  4 − a 2a − 2 −3 , a 6= 3.
1.10.
6(a − 3)
−2 −8 6
 
−2 6 −5
1.11. X= .
1 −4 4
2.1. The sets in (a) and (d) are subspaces, the sets in (b) and (c) are not.
 
1 2  
2.3. E.g. im 1 0 = ker 1 −1 −2 .
0 1
2.4. Only the set in (b) is linearly dependent.
2.5. No. Yes.
2.7. (a) ker A: E.g. (−2, −1, 1, 0), (−1, −2, 0, 1). im A: E.g. (1, 1, 2), (1, −1, 1).
(b) ker A: E.g. (−1, −1, 1, 0). im A: E.g. (3, 1, 5), (4, 2, 7), (1, −1, 2).
2.8. E.g. (1, 1, 0, 1), (1, 2, 2, 1), (3, 4, 1, 3).
2.9. (1, −2, 2, −1).
2.10. (a) 2, (b) 3.
2.12. n − 1.
2.14. Only the sum in (b) is direct.
2.15. The projection on U along V is (1, 2, −1). The projection on V along U is (3, 3, 6).
2.16. Only the function in (b) is a linear transformation.
2.17. dim ker A = 2, dim im A = 3.
2.18. F is one-to-one but not onto.
2.19. F −1 (x1 , x2 , x3 ) = (3x1 − 4x2 + 2x3 , −5x1 + 7x2 − 3x3 , 4x1 − 5x2 + 2x3 ).
2.20. F is not one-to-one but onto. The kernel is the set of constant polynomials. The
image is P .
π
3.1. 3.

3.5. E.g. √1 (2, 1, 1, 1), √1 (−1, 1, 2, −1), √1 (−2, 1, 0, 3).


7 7 14

3.6. E.g. √1 (−2, 1, 0, 0), √1 (2, 4, 5, 0), √1 (1, 2, −2, 9).


5 45 90

3.7. (a, b, c) = ±(1, 2, −2).


√ √
3.9. (b) E.g. 1, 3 (2x − 1), 5 (6x2 − 6x + 1).
3.10. E.g. √1 (1, −2, 1, 0), √1 (1, 1, 1, −3).
6 12

3.11. (a) (2, 3, 2, 3), 2.



(b) (2, 2, 2, 4), 2.
3.12. E.g. √1 (1, 1, 1, 0), 1 (0, 2, −2, 1), √1 (1, −1, 0, 2), √1 (3, −1, −2, −2).
3 3 6 18

3.14. (b) 2 sin t − sin 2t.


1
3.15. 2 (−2, −4, −3, 3, 2, 4).

3.16. (a) 2.
(b) E.g. (1, 2, 3, 4), (1, −1, 2, 3) and (1, 1, 1), (2, −1, 5), respectively.
1 23

1 t
3.17. 3 − 42 2 .

3.18. y = − 85 t + 22
5 .

3.19. y = 12 t2 + 9
10 t + 23
10 .

3.20. (2, 2, 2, 4).


4.1. (a) 24, (b) 1, (c) 0.
4.2. (a) 6, (b) −160, (c) b(b − a)3 .
√ √
4.3. (a) x = 0, x = 2 3 or x = −2 3. (b) x = 1 or x = −3.
4.6. (a) (n − 1)!, (b) 1 if n is odd, 0 if n is even,
(c) 1, (d) 0 if n is odd, (−2)n/2 if n is even.
4.7. (−1)n−1 (n − 1)2n−2 .
4.14. dim ker A = 1, dim im A = 2 if a = −2, dim ker A = 2, dim im A = 1 if a = 2,
dim ker A = 0, dim im A = 3 if a 6= −2 and a 6= 2.
1
4.15. − 50 .
4.16. x 6= −4, x 6= 0 and x 6= 2.
 
  1 −2 5
1 −4 2 1 
4.19. (a) , (b) −2 4 1 .
2 3 −1 11
6 −1 −3
   
0 −1 1 16 12
5.1. (a) , (b) .
1 0 25 12 9
√   
3 0 1 8 −2 2
1  1
5.2. (a) 0 2 √0 , (b) −2 5 4.
2 9
−1 0 3 2 4 5
 
0 1
5.3. .
0 1
   
5 −1 −1 1 1 1
1 1
5.4. (a) −2 4 −2, (b) 2 2 2.
6 6
−3 −3 3 3 3 3
 
1 −1 0 0
0 2 −2 0 
5.5. 
0
.
0 3 −3
0 0 0 4

 
2 −7 −12
5.6. −4 13 23 .
3 −6 −12
 √ √ √ 
2 + √3 2 − √3 √2
1
5.7. 2 −√ 3 2+ 
√ 3 −√ 2 .
4
− 2 2 2 3
5.8. (a) 1, (b) 1.
5.9. U ′ : x1 − 2x2 − x3 = 0, U ′′ : x = t(2, −1, 3).
5.10. U ′ : x = s(0, 3, 3, −2) + t(1, −1, −2, 1), U ′′ : x = s(1, 0, 0, 1) + t(0, 1, 1, 0).
5.11. U ′ : x1 + x2 + 2x3 = 0, U ′′ : x = t(1, 3, −1).
 
7 −4 4
1
5.12. −4 1 8.
9
4 8 1
 
−3 2 6
5.13.  8 −3 −12.
−4 2 7
π
5.15. (a) Rotation about the origin through the angle 6 in the direction from e2 to-
wards e1 .

(b) Orthogonal reflection in the line x2 = 3 x1 .
π
(c) Rotation about the origin through the angle 4 in the direction from e1 to-
wards e2 .
5.16. (a) Rotation about the line x = t(5, −1, 1) through the angle 2π
3 in the anticlock-
wise direction when looking from the point (5, −1, 1) towards the origin.
(b) Orthogonal reflection in the plane 2x1 − x2 − 2x3 = 0.
(c) Rotation about the line x = t(1, 1, 1) through the angle 2π
3 in the clockwise
direction when looking from the point (1, 1, 1) towards the origin.
(d) Rotation about the line x = t(1, 2, 3) through the angle π.
(e) Rotation about the line x = t(1, 1, 1) through the angle π3 in the anticlockwise
direction when looking from the point (1, 1, 1) towards the origin followed by
reflection in the origin.
6.1. (a) 1: t(1, 2), t 6= 0, 2: t(2, 3), t 6= 0.
(b) No eigenvalues.
(c) −3: t(9, −11, 4), t 6= 0, 1: t(1, 1, 0), t 6= 0, 2: t(1, 1, 1), t 6= 0.
(d) 2: s(0, 0, 1) + t(1, 2, 0), s 6= 0 or t 6= 0.
6.3. (a) 0 and 1. (b) −1 and 1.

6.6. (a) Diagonalisable, e.g.
   
1 1 1 −1 0 0
T = 0 1 2  , T −1AT =  0 1 0 .
2 1 −1 0 0 2

(b) Diagonalisable, e.g.


  
0 1 −1 1 0 0
T = −1 2 −2 , T −1AT = 0 1 0 .
1 0 1 0 0 2

(c) Not diagonalisable.


 
12 −7 −4
6.7.  4 −1 −2 .
16 −11 −4
 
1 2 · (−1)n + 3 · 4n −2 · (−1)n + 2 · 4n
6.8. .
5 −3 · (−1)n + 3 · 4n 3 · (−1)n + 2 · 4n
6.11. E.g.  
0 0 0 ··· 0
0 λ 0 · · · 0
 
0 0 λ · · · 0
 
 .. .. .. .. 
. . . .
0 0 0 ··· λ
where λ = −hb, ci.
     
an n 2 n 2
6.12. = (−2) +7 .
bn −4 5
       
an 3 −7 4
6.13.  bn  = (−3)n −9 + (−1)n  6  + 7n 8.
cn 3 1 4
6.14. an = −(−2)n + 1 + 2 · 2n .
6.15. (a) E.g. √   
3 −4√ 2 3 −4 0 0
1
T = √ 4 3 2 4 , T tAT =  0 1 0 .
5 2 −5 0 5 0 0 6

(b) E.g.    
1 2 −2 9 0 0
1
T =  2 1 2 , T tAT = 0 9 0  .
3
−2 2 1 0 0 18

       
x1 (t) 1 −1 3
6.17. x2 (t) = 3e−6t −3 + e−t  0  + 2e5t 2.
x3 (t) 1 1 3
    
x1 (t) 1 1 1 a1 cos t + b1 sin t
6.18. x2 (t) = −1 1 1  a√2 et + b2 e−t√ .
x3 (t) 0 −2 3 a3 e 6 t + b3 e− 6 t
6.19. Eigenfrequencies p p
√ √ √
2− 2q q 2 q 2+ 2
, ,
2π 2π 2π
with associated eigenvectors
     
√1 1 1

 2 ,  0  , − 2 ,
1 −1 1
respectively.
7.1. (a) E.g. √1 (0, 1, 1), √1 (1, 1, −1), √1 (2, −1, 1), 2(x′ )2 + 2(x′ )2 + 8(x′3 )2 .
2 3 6 1 2
1 1 ′ 2 ′ 2
(b) E.g. √ (0, 1, 1), (1, 0, 0), √ (0, −1, 1), (x ) + 2(x ) .
2 2 2 3

7.2. Minimum value 2 at ± √130 (1, −5, 2), maximum value 8 at ± √16 (1, 1, 2).
7.3. Minimum value −27, maximum value 81.
1
7.4. (a) 4. (b) No.
7.5. 6.
7.7. 1.
7.8. The semi-major
√ axis has length 2 and direction (1, 2). The semi-minor axis has
length 2 and direction (−2, 1).
7.9. Hyperboloid of two sheets. √1 .
8
1
7.10. The points closest to the origin are ± 6√ 3
(1, 1, −2). The points furthest from the
1
origin are ± 3√ 2
(1, 1, 1).

7.11. Hyperboloid of one sheet. (1, −2, 2).


7.12. Hyperbolic cylinder. 3.

7.13. Elliptic cylinder. 5.
7.14. (a) (3, 0), (b) (2, 1), (c) (1, 2).
7.15. Only the quadratic form in (b) is positive definite.
7.16. (a) Hyperboloid of one sheet.
(b) Hyperbolic cylinder.

7.17. Ellipsoid if a > 0, elliptic cylinder if a = 0, hyperboloid of one sheet if −1 < a < 0,
hyperbolic cylinder if a = −1 and hyperboloid of two sheets if a < −1.
7.18. (a) Yes. (b) No.
7.19. Two sheets if abc > 0, one sheet if abc < 0.
7.20. (2, 1).

Index
A dot product, 33
addition
of matrices, 1 E
of vectors, 11 eigenfrequency, 102
additive inverse, 11 eigenmode, 102
adjugate, 62 eigenspace, 92
algebraic multiplicity, 92 eigenvalue
alternating, 51 of a linear transformation, 87
angle, 35 of a quadratic form, 110
eigenvector, 87
B ellipse, 112
basis, 16 ellipsoid, 114
orthonormal, 37 elliptic cylinder, 115
Bessel’s inequality, 48 elliptic paraboloid, 116
bilinear form, 107 entry, 1
expansion along a row or column, 57
C
Cauchy–Schwarz inequality, 35 F
characteristic polynomial, 87 finite-dimensional, 20
column matrix, 1
composition of linear transformations, 28 G
cone, 115 generate, 15
coordinate, 17 generator, 15
Cramer’s rule, 62 geometric multiplicity, 92
cylinder Gram–Schmidt orthogonalisation, 38
elliptic, 115
hyperbolic, 116 H
parabolic, 116 Hermitian, 97
Householder matrix, 85
D hyperbola, 113
determinant, 53 hyperbolic cylinder, 116
of a linear transformation, 73 hyperbolic paraboloid, 116
diagonal matrix, 59 hyperboloid
diagonalisable matrix, 89 of one sheet, 115
diagonalisation of a quadratic form, 109 of two sheets, 115
dimension, 20
direct sum, 24 I
distance, 43 identity mapping, 67

image negative semidefinite, 117


of a linear transformation, 26 nilpotent matrix, 10
of a matrix, 14 norm, 33
indefinite, 117 normal equations, 44
index of inertia, 117 normalisation of a vector, 34
infinite-dimensional, 20
inner product, 33 O
inner product space, 33 order
inverse of a determinant, 53
of a linear transformation, 29 of a square matrix, 1
of a matrix, 6 orthogonal
invertible matrix, 40
linear transformation, 29 vectors, 34
matrix, 6 orthogonal complement, 40
isometry, 78 orthogonal projection, 41, 76
orthogonal reflection, 76
K orthonormal basis, 37
kernel orthonormal set, 37
of a linear transformation, 26
of a matrix, 14 P
parabola, 114
L
parabolic cylinder, 116
length, see norm
paraboloid
linear combination, 15
elliptic, 116
linear space, 11
hyperbolic, 116
linear transformation, 26
positive definite, 117
linearly dependent, 15
positive semidefinite, 117
linearly independent, 15
product of matrices, 3
lower triangular matrix, 59
product theorem, 59
M projection, 24, 74
main diagonal, 1 orthogonal, 41, 76
matrix, 1 Pythagorean theorem, 35
of a bilinear form, 108
of a linear transformation, 67 Q
of a quadratic form, 109 quadratic form, 108
method of least squares, 45
multilinear form, 51 R
multiplication of matrices, 3 rank, 44
multiplicity rank-nullity theorem, 27
algebraic, 92 reflection, 75
geometric, 92 orthogonal, 76
revolution, surface of, 116
N rotation, 68
negative definite, 117 row matrix, 1

134
S
scalar, 11
scalar multiplication
of matrices, 2
of vectors, 11
sign function, 119
signature, 117
size of a matrix, 1
skew-symmetric matrix, 9
span, 15
spectral theorem, 97
square matrix, 1
standard basis, 17
subspace, 12
sum
direct, 24
of matrices, 1
of subspaces, 24
of vectors, 11
surface of revolution, 116
Sylvester’s law of inertia, 117
symmetric
bilinear form, 108
linear transformation, 76
matrix, 5

T
trace, 87
transition matrix, 72
transpose, 5
triangle inequality, 36

U
unit matrix, 5
unit vector, 34
upper triangular matrix, 59

V
vector, 11

Z
zero matrix, 1
zero subspace, 13
zero vector, 11

