Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

La PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 208

Lecture Notes on Linear Algebra

Arbind K Lal Sukant Pati


T

July 10, 2018


AF
DR
2

DR
AF
T
Contents

1 Introduction to Matrices 5
1.1 Definition of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Special Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Operations on Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 Multiplication of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.2 Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3 Some More Special Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.3.1 Submatrix of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2 System of Linear Equations 27


T
AF

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1.1 Elementary Row Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 30
DR

2.2 Row-Reduced Echelon Form (RREF) . . . . . . . . . . . . . . . . . . . . . . . . . 33


2.3 Rank of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4 Solution set of a Linear System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5 Square Matrices and Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.5.1 Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.5.2 Adjugate (classical Adjoint) of a Matrix . . . . . . . . . . . . . . . . . . . 53
2.5.3 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.6 Miscellaneous Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3 Vector Spaces 63
3.1 Vector Spaces: Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . 63
3.1.1 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.1.2 Linear Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.2 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.2.1 Basic Results on Linear Independence . . . . . . . . . . . . . . . . . . . . 77
3.2.2 Application to Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.2.3 Linear Independence and Uniqueness of Linear Combination . . . . . . . 80
3.3 Basis of a Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.3.1 Main Results associated with Bases . . . . . . . . . . . . . . . . . . . . . 84

3
4 CONTENTS

3.3.2 Constructing a Basis of a Finite Dimensional Vector Space . . . . . . . . 85


3.4 Fundamental Subspaces Associated with a Matrix . . . . . . . . . . . . . . . . . 87
3.5 Ordered Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

4 Linear Transformations 101


4.1 Definitions and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.2 Rank-Nullity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.2.1 Algebra of Linear Transformations . . . . . . . . . . . . . . . . . . . . . . 110
4.3 Matrix of a linear transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.4 Similarity of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.5 Dual Space* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

5 Inner Product Spaces 125


5.1 Definition and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
5.1.1 Cauchy Schwartz Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.1.2 Angle between two Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.1.3 Normed Linear Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.2 Gram-Schmidt Orthonormalization Process . . . . . . . . . . . . . . . . . . . . . 133
T

QR Decomposition∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
AF

5.2.1
5.3 Orthogonal Projections and Applications . . . . . . . . . . . . . . . . . . . . . . . 142
DR

5.3.1 Orthogonal Projections as Self-Adjoint Operators* . . . . . . . . . . . . . 146


5.4 Orthogonal Operator and Rigid Motion∗ . . . . . . . . . . . . . . . . . . . . . . . 149
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

6 Eigenvalues, Eigenvectors and Diagonalizability 155


6.1 Introduction and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.1.1 Spectrum of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.2 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
6.2.1 Schur’s Unitary Triangularization . . . . . . . . . . . . . . . . . . . . . . . 170
6.2.2 Diagonalizability of some Special Matrices . . . . . . . . . . . . . . . . . . 172
6.2.3 Cayley Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
6.3 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6.3.1 Sylvester’s law of inertia . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.3.2 Applications in Eculidean Plane and Space . . . . . . . . . . . . . . . . . 183

7 Appendix 189
7.1 Uniqueness of RREF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
7.2 Permutation/Symmetric Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.3 Properties of Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
7.4 Dimension of W1 + W2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
CONTENTS 5

7.5 When does Norm imply Inner Product . . . . . . . . . . . . . . . . . . . . . . . . 200


7.6 Roots of a Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7.7 Variational characterizations of Hermitian Matrices . . . . . . . . . . . . . . . . . 203

T
AF
DR
6 CONTENTS

T
AF
DR
Chapter 1

Introduction to Matrices

1.1 Definition of a Matrix


Definition 1.1.1. A rectangular array of numbers is called a matrix.

The horizontal arrays of a matrix are called its rows and the vertical arrays are called its
columns. Let A be a matrix having m rows and n columns. Then, A is said to have order
m × n or is called a matrix of size m × n and can be represented in either of the following forms:
   
a11 a12 · · · a1n a11 a12 ··· a1n
T

   
 a21 a22 · · · a2n   a21 a22 · · · a2n 
AF

A= . or A =  . ,
   
 .. .. . . .. . .. .. .. 
. . .  . . . . 
DR

 
 
am1 am2 · · · amn am1 am2 · · · amn

where aij is the entry at the intersection of the ith row and j th column. One writes A ∈ Mm,n (F)
to mean that A is an m × n matrix with entries from the set F, or in short A = [aij ]. We write
A[i, :] to denote the i-th row of A, A[:, j] to denote the j-th column of A and aij or (A)ij , for
the (i, j)-th entry of A. " # " #
1 3+i 7 7
For example, if A = then A[1, :] = [1 3 + i 7], A[:, 3] = and
4 5 6 − 5i 6 − 5i
a22 = 5. Sometimes commas are inserted to differentiate between entries of a row vector. Thus,
A[1, :] may also be written as [1, 3 + i, 7]. A matrix having only one column is called a column
vector and a matrix with only one row is called a row vector. All our vectors will be column
vectors and will be represented by bold letters.

Example 1.1.2. Consider a system " of linear


# equations 2x + 5y = 7 and 3x + 2y = 6. Then,
2 5 7
we identify it with the matrix A = . Here the variable/unknown x is associated with
3 2 6
A[:, 1] and y is associated with A[:, 2].

Definition 1.1.3. Two matrices A = [aij ], B = [bij ] ∈ Mm,n (C) are said to be equal if aij = bij ,
for each i = 1, 2, . . . , m and j = 1, 2, . . . , n.

In other words, two matrices are said to be equal if they have the same order and their
corresponding entries are equal.

7
8 CHAPTER 1. INTRODUCTION TO MATRICES

1.1.1 Special Matrices

Definition 1.1.4. Let A = [aij ] be an m × n matrix with aij ∈ F.

1. Then A is called a zero-matrix, denoted 0"(order # is mostly clear


" from the
# context), if
0 0 0 0 0
aij = 0 for all i and j. For example, 02×2 = and 02×3 = .
0 0 0 0 0

2. Then A is called a square matrix if m = n and is denoted by A ∈ Mn (F).

3. Let A ∈ Mn (F).

(a) Then, the entries a11 , a22 , . . . , ann are called the diagonal entries of A. They consti-
tute the principal diagonal of A.

(b) Then, A is said to be a diagonal matrix, , denoted


" #diag(a11 , . . . , ann ), if aij = 0
4 0
for i 6= j. For example, the zero matrix 0n and are diagonal matrices.
0 1
(c) Then, A = diag(1, . . . , 1) is called the
 identity
 matrix, denoted In , or in short I.
" # 1 0 0
1 0  
For example, I2 = and I3 = 0 1 0

.
0 1
0 0 1
T

(d) If A = αI, for some α ∈ F, then A is called a scalar matrix.


AF

(e) Then, A is said to be an upper triangular matrix if aij = 0 for i > j.


DR

(f) Then, A is said to be a lower triangular matrix if aij = 0 for i < j.

(g) Then, A is said to be triangular if it is an upper or a lower triangular matrix.


   
0 1 4 0 0 0
   
For example, 0 3 −1  is upper triangular, 1 0 0 is lower triangular and the
  
0 0 −2 0 1 1
matrices 0, I are upper as well as lower triangular matrices.

4. An m × n matrix A = [aij ] is said to have an uppertriangular form if aij = 0 for all


a11 a12 · · · a1n
 
0 0 1
  " #
 0 a22 · · · a2n   0 1 0

1 2 0 0 1
  
i > j. For example, the matrices  . ,  and

 .. .. .. .. 
 . . .  
 0 0 2 
 0 0 0 1 1
0 0 · · · ann 0 0 0
have upper triangular forms.

5. For 1 ≤ i ≤ n, define ei = In [:, i], a matrix of order n × 1. Then the column matrices
e1 , . . . , en are called the standard unit vectors or the standard basis of Mn,1 (C) or
Cn . The dependence of n is omitted as it is understood
  from the context. For example,
" # 1
1
if e1 ∈ C then, e1 = and if e1 ∈ C then e1 = 
2 3
 
0.

0
0
1.2. OPERATIONS ON MATRICES 9

1.2 Operations on Matrices


Definition 1.2.1. Let A = [aij ] ∈ Mm,n (C). Then
1. the transpose of A, denoted AT , is an n × m matrix with (AT )ij = aji , for all i, j.

2. the conjugate transpose of A, denoted A∗ , is an n × m matrix with (A∗ )ij = aji (the
complex-conjugate of aji ), for all i, j.
" # " # " #
1 4+i 1 0 1 0
If A = then AT = and A∗ = . Note that A∗ 6= AT .
0 1−i 4+i 1−i 4−i 1+i
Note that if x is a column vector then xT and x∗ are row vectors.

Theorem 1.2.2. For any matrix A, (A∗ )∗ = A and (AT )T = A.

Proof. Let A = [aij ], A∗ = [bij ] and (A∗ )∗ = [cij ]. Clearly, the order of A and (A∗ )∗ is the
same. Also, by definition cij = bji = aij = aij for all i, j.

Definition 1.2.3. Let A = [aij ], B = [bij ] ∈ Mm,n (C) and k ∈ C.


1. . Then the sum of A and B, denoted A + B, is defined to be the matrix C = [cij ] ∈
Mm,n (C) with cij = aij + bij for all i, j.

2. Then, the product of k ∈ C with A, denoted kA, equals kA = [kaij ] = [aij k] = Ak.
T
AF

" # " # " # " #


1 4 5 1 −4 6 2 0 11 5 20 25
If A = ,B= then A + B = and 5A = .
DR

0 1 2 1 1 7 1 2 9 0 5 10

Theorem 1.2.4. Let A, B, C ∈ Mm,n (C) and let k, ` ∈ C. Then

1. A + B = B + A (commutativity).

2. (A + B) + C = A + (B + C) (associativity).

3. k(`A) = (k`)A.

4. (k + `)A = kA + `A.

Proof. (1). Let A = [aij ] and B = [bij ]. Then by definition

A + B = [aij ] + [bij ] = [aij + bij ] = [bij + aij ] = [bij ] + [aij ] = B + A

as complex numbers commute. The other parts are left for the reader.

Definition 1.2.5. Let A ∈ Mm,n (C). Then

1. the matrix 0m×n satisfying A + 0 = 0 + A = A is called the additive identity.

2. the matrix B with A + B = 0 is called the additive inverse of A, denoted −A = (−1)A.

Exercise 1.2.6. 1. Find a few non zero, non-identity matrices A satisfying


10 CHAPTER 1. INTRODUCTION TO MATRICES

(a) AT = A.
 
1 −1 2
T
 
−1
Ans: A =  3 5 =A
2 5 −1
(b) AT = −A.
 
0 −1 2
T
 
Ans: A = 
1 0  = −A .
5
−2 −5 0

2. Find a few non zero, non-identity matrices A with complex entries satisfying

(a) A∗ = A.

(b) A∗ = −A.
   
−1 + i 2 − i
1 0 −1 + i 2 − i
 = A∗ , A =  1 + i  = −A∗ .
   
Ans: A = 
−1 − i 3 i   0 −i 
2+i −i −1 −2 − i −i 0

3. Suppose A = [aij ], B = [bij ] ∈ Mm,n (C).

(a) If A + B = 0 then show that B = (−1)A = [−aij ].


T

(b) If A + B = A then show that B = 0.


AF
DR

 
1 + i −1 " #
  2 3 −1 ∗ ∗
4. Let A = 
 2 3 and B = 1 1 − i 2 . Compute A + B and B + A .
i 1

5. Write the 3 × 3 matrices A = [aij ] satisfying

(a) aij = 1 if i 6= j and 2 otherwise.


(b) aij = 1 if | i − j | ≤ 1 and 0 otherwise.
(c) aij = i + j.
(d) aij = 2i+j .
       
2 1 1 1 1 0 2 3 4 22 23 24
       
Ans: a) A = 1 2 1 , b) A = 1 1 1, c) A = 3 4 5, d) A = 23 24 25 .
       
1 1 2 0 1 1 4 5 6 24 25 26

1.2.1 Multiplication of Matrices

Definition 1.2.7. Let A = [aij ] ∈ Mm,n (C) and B = [bij ] ∈ Mn,r (C). Then, the product of A
and B, denoted AB, is a matrix C = [cij ] ∈ Mm,r (C) with
n
X
cij = aik bkj = ai1 b1j + ai2 b2j + · · · + ain bnj , 1 ≤ i ≤ m, 1 ≤ j ≤ r.
k=1
1.2. OPERATIONS ON MATRICES 11

Thus, AB is defined if and onlyif the number


 of columns of A = the number of rows of
" # α β γ δ
a b c  
B. If A = and B = x y z t then
d e f
u v w s
" #
aα + bx + cu aβ + by + cv aγ + bz + cw aδ + bt + cs
AB = . (1.2.1)
dα + ex + f u dβ + ey + f v dγ + ez + f w dδ + et + f s

Note that the rows of the matrix AB can be written directly as

(AB)[1, :] = a [α, β, γ, δ] + b [x, y, z, t] + c [u, v, w, s] = aB[1, :] + bB[2, :] + cB[3, :]


X 3
= a11 B[1, :] + a12 B[2, :] + a13 B[3, :] = a1i B[i, :] (1.2.2)
i=1
(AB)[2, :] = dB[1, :] + eB[2, :] + f B[3, :] = a21 B[1, :] + a22 B[2, :] + a23 B[3, :]
X3
= a2i B[i, :] (1.2.3)
i=1

and similarly, the columns of the matrix AB can be written directly as


3
" #
aα + bx + cu X
(AB)[:, 1] = = α A[:, 1] + x A[:, 2] + u A[:, 3] = A[:, j] bj1 , (1.2.4)
dα + ex + f u j=1
T
AF

3
P
(AB)[:, k] = A[:, j] bjk for k = 2, 3, 4.
DR

j=1

Remark 1.2.8. Observe the following:

1. In the above example, while AB is defined, the product BA is not defined. However, for
square matrices A and B of the same order, both the product AB and BA are defined.

2. The product AB corresponds to operating (adding or subtracting certain multiples) on the


rows of B (see Equation (1.2.3)). This is called the row method for calculating the
matrix product.

3. The product AB also corresponds to operating (adding or subtracting certain multiples)


on the columns of A (see Equation (1.2.4)). This is called the column method for
calculating the matrix product.

4. Let A ∈ Mm,n (C) and B ∈ Mn,p (C). Then (AB)[i, :] = A[i, :]B = ai1 B[1, :]+· · ·+ain B[n, :]
and (AB)[:, j] = AB[:, j] = A[:, 1]b1j + · · · + A[:, n]bnj .
   
1 2 0 1 0 −1
   
Example 1.2.9. Let A = 1 0 1 and B = 0 0
 
. Use the row/column method
1
0 −1 1 0 −1 1

1. to find the second row of AB.


Solution: (AB)[2, :] = A[2, :]B = 1 · [1, 0, −1] + 0 · [0, 0, 1] + 1 · [0, −1, 1] = [1, −1, 0].
12 CHAPTER 1. INTRODUCTION TO MATRICES

2. to find the third column of AB.        


1 2 0 1
       
Solution: (AB)[:, 3] = A B[:, 3] = −1 · 1 + 1 ·  0  + 1 · 1 = 0
      
.
0 −1 1 0
Exercise 1.2.10. 1. Let A ∈ Mn (C), D = diag(d1 , d2 , . . . , dn ) and e1 , . . . , en ∈ Mn,1 (C)
(see Definition 5). Then verify that
(a) Ae1 = A[:, 1], . . . , Aen = A[:, n].
(b) eT1 A = e∗1 A = A[1, :], . . . , eTn A = e∗n A = A[n, :].
(c) (DA)[i, :] = di A[i, :], for 1 ≤ i ≤ n, and
(d) (AD)[:, j] = dj A[:, j], for 1 ≤ j ≤ n. In particular, if D = αI is a scalar matrix then
DA = αA = AD.

Ans: Just use matrix multiplication to get the required results.


   
x1 y1
 .  . n n
.. , y =  ..  ∈ Mn,1 (C). Then y∗ x = ∗x = |xi |2 ,
P P
2. Let x = 
    y i x i , x
i=1 i=1
xn yn
 
  |x1 |2 x1 x2 · · · x1 xn
x1 y1 x1 y2 · · · x1 yn
 x2 x1 |x2 |2 · · · x2 xn 
 

 . . .  ∗
xy =  . .. · · · ..  and xx =  .

.. .. 

.
 . ..
T

  .. . . . 
AF

xn y1 xn y2 · · · xn yn
 
xn x1 xn x2 · · · |xn | 2
DR

Ans: Just use matrix multiplication to get the required results.


3. Let A be an upper triangular matrix. If A∗ A = AA∗ then prove that A is a diagonal
matrix. The same holds for lower triangular matrix.
 
a11 a12 · · · a1n
 
 0 a22 · · · a2n 
Ans: Let A =  . be an upper triangular matrix. Then (A∗ A)11 = |a11 |2
 
 .. .. .. .. 
 . . . 

0 0 · · · ann
and (AA )11 = |a11 | + |a12 |2 + · · · + |a1n |2 . Thus, A∗ A = AA∗ implies |a11 |2 + |a12 |2 + · · · +
∗ 2

|a1n |2 = |a11 |2 . Hence, a12 = 0, . . . , a1n = 0. Now, use (A∗ A)22 = (AA∗ )22 to conclude
a23 = 0, . . . , a2n = 0 and so on.

Definition 1.2.11. Two square matrices A and B are said to commute if AB = BA.

Remark 1.2.12. Note that if A is a square matrix of order n and if B is a scalar matrix of
order n then "AB =# BA. In general,
" # the matrix product is not
" commutative.
# " # For example,
1 1 1 0 2 0 1 1
consider A = and B = . Then, verify that AB = 6= = BA.
0 0 1 0 0 0 1 1

Theorem 1.2.13. Let A ∈ Mm,n (C), B ∈ Mn,p (C) and C ∈ Mp,q (C).

1. Then (AB)C = A(BC), i.e., the matrix multiplication is associative.


1.2. OPERATIONS ON MATRICES 13

2. For any k ∈ C, (kA)B = k(AB) = A(kB).

3. Then A(B + C) = AB + AC, i.e., multiplication distributes over addition.

4. If A ∈ Mn (C) then AIn = In A = A.


p
P n
P
Proof. (1). Verify that (BC)kj = bk` c`j and (AB)i` = aik bk` . Therefore,
`=1 k=1

n
X n
X p
X p
n X
X
   
A(BC) ij = aik BC kj
= aik bk` c`j = aik bk` c`j
k=1 k=1 `=1 k=1 `=1
Xn Xp p
X Xn X T
   
= aik bk` c`j = aik bk` c`j = AB c = (AB)C
i` `j ij
.
k=1 `=1 `=1 k=1 `=1

Using a similar argument, the next part follows. The other parts are left for the reader.

Exercise 1.2.14. 1. Let L1 , L2 ∈ Mn (C) be lower triangular matrices and U1 , U2 ∈ Mn (C)


be upper triangular matrices. If D ∈ Mn (C) is a diagonal matrix then
(a) L1 L2 is a lower triangular matrix.
(b) U1 U2 is an upper triangular matrix.
(c) DL1 and L1 D are lower triangular matrices.
(d) DU1 and U1 D are upper triangular matrices.
T
AF

Ans: Just use matrix multiplication to get the required results.


DR

2. Let A ∈ Mm,n (C). If Ax = 0 for all x ∈ Mn,1 (C) then A = 0, the zero matrix.
Ans: Take x = ei . Then 0 = Ax = Aei = A[:, i]. Hence the i-th column of A is the zero
vector. Thus, as we vary i in {1, 2, . . . , n}, we see that all the columns of A are zero.

3. Let A, B ∈ Mm,n (C). If Ax = Bx, for all x ∈ Mn,1 (C) then prove that A = B.
Ans: Take C = A − B. Now use (2) above to show that C = 0 and conclude that A = B.

4. Let A ∈ Mm,n (C) and B ∈ Mn,p (C).

(a) Prove that (AB)∗ = B ∗ A∗ .


Ans: By definition (AB)∗ = (AB)T = B T AT = B T AT = B ∗ A∗ .
(b) If A[1, :] = 0T then (AB)[1, :] = 0T .
Ans: By definition (AB)[1, :] = A[1, :]B = 0T B = 0T .
(c) If B[:, 1] = 0 then (AB)[:, 1] = 0.
Ans: By definition (AB)[:, 1] = AB[:, 1] = A0 = 0.
(d) If A[i, :] = A[j, :] for some i and j then (AB)[i, :] = (AB)[j, :].
Ans: By definition (AB)[i, :] = A[i, :]B = A[j, :]B = (AB)[j, :].
(e) If B[:, i] = B[:, j] for some i and j then (AB)[:, i] = (AB)[:, j].
Ans: By definition (AB)[:, i] = AB[:, i] = AB[:, j] = (AB)[:, j].
14 CHAPTER 1. INTRODUCTION TO MATRICES

5. Construct matrices A and B that satisfy the following statements.

(a) The product AB is defined but BA is not defined.


Ans: Let A be a 2 × 3 matrix and B be a 3 × 1 matrix.
(b) The products AB and BA are defined but they have different orders.
Ans: Let A be a 2 × 3 matrix and B be a 3 × 2 matrix.
(c) The products AB and BA are defined, they have the same order but AB 6= BA.
" # " # " #
1 1 1 −1 2 −2
Ans: Let A = and B = . Then AB = whereas BA =
1 1 1 −1 2 −2
" #
0 0
.
0 0
 
" # 0 1 1
0 1 
. Guess a formula for An and B n and prove it?

(d) Let A = and B =  0 0 1
0 0  
0 0 0
Ans: An = 0 for n ≥ 2 and B n = 0 for n ≥ 3.
   
" # 1 1 1 1 1 1
1 1    
 and C = 1 1 1. Is it true that A2 −2A+I = 0?
(e) Let A = ,B= 0 1 1
0 1    
0 0 1 1 1 1
What is B 3 − 3B 2 + 3B − I? Is C 2 = 3C?
T

Ans: Yes, all the three statements are TRUE.


AF
DR

6. Let A and B be two m × n matrices. Then, prove that (A + B)∗ = A∗ + B ∗ .


Ans: (A + B)∗ = (A + B)T = AT + B T = AT + B T = A∗ + B ∗ .

7. Find A ∈ M2 (C) such that A 6= 0 but A2 = 0.


Ans: See Exercise 5d.

8. Find A ∈ M2 (C) such that A 6= 0, I2 but A2 = A.


" #
1 1
2 2
Ans: Let A = 1 1
.
2 2

9. Find A, B, C ∈ M2 (C) such that AB = AC but B 6= C (cancellation law doesn’t hold).


" # " # " #
1 −1 1 1 2 −3
Ans: Let A = ,B = and C = . Then AB = 0 = AC.
1 −1 1 1 2 −3
" # " #
0 −1 0 −1
10. Let S = and T = . Determine all m, n ∈ N such that S m = I and
1 1 1 0
T n = I.
Ans: Verify S 6m = I and T 4m = I for all m ∈ N.
 
0 1 0
  2 3 3 3 2
11. Let A = 
0 0 1. Compute A and A . Is A = I? Determine aA + bA + cA .
1 0 0
1.2. OPERATIONS ON MATRICES 15
   
0 0 1 a b c
   
Ans: A2 =  1 0 0  and A3 = I. So, aA3 + bA + cA2 =  c b a. Such matrices are
  
0 1 0 a b c
called circulant matrices.
   
1 1 + i −2 1 0
   
12. Let A =   1 −2 i  and B =  0
  1. Compute

−i 1 1 −1 + i 1

(a) A − A∗ , A + A∗ , (3AB)∗ − 4B ∗ A and 3A − 2A∗ .


(b) (AB)[1, :], (AB)[3, :], (AB)[:, 1] and (AB)[:, 2].
(c) (B ∗ A∗ )[:, 1], (B ∗ A∗ )[:, 3], (B ∗ A∗ )[1, :] and (B ∗ A∗ )[2, :].

1.2.2 Inverse of a Matrix

Definition 1.2.15. Let A ∈ Mn (C). Then

1. B ∈ Mn (C) is said to be a left inverse of A if BA = In .

2. C ∈ Mn (C) is called a right inverse of A if AC = In .

3. A is invertible (has an inverse) if there exists B ∈ Mn (C) such that AB = BA = In .


T
AF

Lemma 1.2.16. Let A ∈ Mn (C). If there exist B, C ∈ Mn (C) such that AB = In and CA = In
DR

then B = C, i.e., If A has a left inverse and a right inverse then they are equal.

Proof. Note that C = CIn = C(AB) = (CA)B = In B = B.

Remark 1.2.17. Lemma 1.2.16 implies that whenever A is invertible, the inverse is unique.
Thus, we denote the inverse of A by A−1 . That is, AA−1 = A−1 A = I.
" #
a b
Example 1.2.18. 1. Let A = .
c d
" #
d −b
(a) If ad − bc 6= 0. Then, verify that A−1 = 1
ad−bc .
a −c
" # " #
2 3 7 −3
(b) In particular, the inverse of equals 12 .
4 7 −4 2
(c) If ad − bc = 0 then prove that either A[1, :] = 0∗ or A[:, 1] = 0 or A[2, :] = αA[1, :] or
A[:, 2] = αA[:, 1] for some α ∈ C. Hence, prove that A is not invertible.
" # " # " #
1 2 1 0 4 2
(d) Matrices , and do not have inverses. Justify your answer.
0 0 4 0 6 3
   
1 2 3 −2 0 1
. Then A−1 =  0  (verify AA−1 = A−1 A = I3 ).
   
2. Let A =  2 3 4  3 −2 
3 4 6 1 −2 1
16 CHAPTER 1. INTRODUCTION TO MATRICES
   
1 1 1 1 1 2
   
3. Prove that the matrices A = 
1 1 1 and B = 1 0 1 are not invertible.
  
1 1 1 0 1 1
Solution: Suppose there exists C such that CA = AC = I. Then, using matrix product

A[1, :]C = (AC)[1, :] = I[1, :] = [1, 0, 0] and A[2, :]C = (AC)[2, :] = I[2, :] = [0, 1, 0].

But A[1, :] = A[2, :] and thus [1, 0, 0] = [0, 1, 0], a contradiction.


Similarly, if there exists D such that BD = DB = I then

DB[:, 1] = (DB)[:, 1] = I[:, 1], DB[:, 2] = (DB)[:, 2] = I[:, 2] and DB[:, 3] = I[:, 3].

But B[:, 3] = B[:, 1] + B[:, 2] and hence I[:, 3] = I[:, 1] + I[:, 2], a contradiction.

Theorem 1.2.19. Let A and B be two invertible matrices. Then,

1. (A−1 )−1 = A.

2. (AB)−1 = B −1 A−1 .

3. (A∗ )−1 = (A−1 )∗ .

Proof. (1). Let B = A−1 . Then AB = BA = I. Thus, by definition, B is invertible and


T
AF

B −1 = A. Or equivalently, (A−1 )−1 = A.


(2). By associativity (AB)(B −1 A−1 ) = A(BB −1 )A−1 = I = (B −1 A−1 )(AB).
DR

(3). As AA−1 = A−1 A = I, we get (AA−1 )∗ = (A−1 A)∗ = I ∗ . Or equivalently, (A−1 )∗ A∗ =


A∗ (A−1 )∗ = I. Thus, by definition (A∗ )−1 = (A−1 )∗ .
We will again come back to the study of invertible matrices in Sections 2.2 and 2.5.1.

Exercise 1.2.20. 1. If A is an invertible matrix then (A−1 )r = A−r , for all r ∈ N.

2. If A1 , . . . , Ar are invertible matrices then B = A1 A2 · · · Ar is also invertible.


Ans: Use Theorem 1.2.19.2 repeatedly.
" # " #
cos(θ) sin(θ) cos(θ) − sin(θ)
3. Find the inverse of and .
sin(θ) − cos(θ) sin(θ) cos(θ)
" # " #
cos(θ) sin(θ) cos(θ) − sin(θ)
Ans: If A = then A−1 = A and if B = then
sin(θ) − cos(θ) sin(θ) cos(θ)
" #
cos(θ) sin(θ)
B −1 = .
− sin(θ) cos(θ)

4. Let A ∈ Mn (C) be an invertible matrix. Then

(a) A[i, :] 6= 0T , for any i.


(b) A[:, j] 6= 0, for any j.
(c) A[i, :] 6= A[j, :], for any i and j.
1.2. OPERATIONS ON MATRICES 17

(d) A[:, i] 6= A[:, j], for any i and j.


(e) A[3, :] 6= αA[1, :] + βA[2, :], for any α, β ∈ C, whenever n ≥ 3.
(f ) A[:, 3] 6= αA[:, 1] + βA[:, 2], for any α, β ∈ C, whenever n ≥ 3.
Ans: As A is invertible, there exists B ∈ Mn (C) such that AB = BA = In . Therefore,
(a) if A[i, :] = 0T then eTi = In [i, :] = (AB)[i, :] = A[i, :]B = 0T B = 0T .
(b) if A[:, j] = 0 then ej = In [:, j] = (BA)[:, j] = BA[:, j] = B0 = 0.
(c) if A[i, :] = A[j, :] then
eTi = In [i, :] = (AB)[i, :] = A[i, :]B = A[j, :]B = (AB)[j, :] = In [j, :] = eTj .
(d) if A[:, i] = A[:, j] then
ei = In [:, i] = (BA)[:, i] = BA[:, i] = BA[:, j] = (BA)[:, j] = In [:, j] = ej .
(e) if A[3, :] = αA[1, :] + βA[2, :] then

eT3 = In [3, :] = (AB)[3, :] = A[3, :]B = (αA[1, :] + βA[2, :]) B


= αA[1, :]B + βA[2, :]B = α(AB)[1, :] + β(AB)[2, :]
= αIn [1, :] + βIn [2, :] = αeT1 + βeT2 .

(f) if A[:, 3] = αA[:, 1] + βA[:, 2] then

e3 = In [:, 3] = (BA)[:, 3] = BA[:, 3] = B (αA[:, 1] + βA[:, 2])


T

= αBA[:, 1] + βBA[:, 2] = α(BA)[:, 1] + β(BA)[:, 2]


AF

= αIn [:, 1] + βIn [:, 2] = αe1 + βe2 .


DR

" #
1 2
5. Determine A that satisfies (I + 3A)−1 = .
2 1
" # " #!−1 " #
−1 4 −2 −1 1 2 −1 1 −2
as (I +3A) = (I + 3A)−1

Ans: A = = = .
9 −2 4 2 1 3 −2 1
 
−2 0 1
6. Determine A that satisfies (I − A)−1 = 
 
0 3 −2 . [See Example 1.2.18.2].
1 −2 1
     
1 2 3 1 2 3 0 −2 −3
     
Ans: Example 1.2.18.2 gives I −A =  2 3 4  ⇒ A = I − 2 3 4 = −2 −2 −4.
    
3 4 6 3 4 6 −3 −4 −5

1 2
7. Let A be an invertible matrix satisfying A3 + A − 2I = 0. Then A−1 =

A +I .
2
Ans: As A is invertible, multiplying by A−1 gives A2 + I − 2A−1 = 0. Hence, the result.

8. Let A = [aij ] be an invertible matrix and B = [pi−j aij ], for some p ∈ C, p 6= 0. Then
B −1 = [pi−j (A−1 )ij ].
Ans: Note that B = DAD−1 , where D = diag(p, p2 , . . . , pn ) is a diagonal matrix. As
p 6= 0, D is invertible. Hence B −1 is invertible and B −1 = (DAD−1 )−1 = DA−1 D−1 .
18 CHAPTER 1. INTRODUCTION TO MATRICES

1.3 Some More Special Matrices


Definition 1.3.1. 1. For 1 ≤ k ≤ m and 1 ≤ ` ≤ n, define ek` ∈ Mm,n (C) by
(
1, if (k, `) = (i, j)
(ek` )ij =
0, otherwise.

Then, the matrices ek` for 1 ≤ k ≤ m and 1 ≤ ` ≤ n are called the standard basis
elements for Mm,n (C).
" " # # " # " #
1 0 0
1 h i 0 1 0 1 h i
So, if ek` ∈ M2,3 (C) then e11 = = 1 0 0 , e12 = = 0 1 0
0 0 0 0 0 0 0 0
" # " #
0 0 0 0 h i
and e22 = = 0 1 0 .
0 1 0 1
In particular, if eij ∈ Mn (C) then eij = ei eTj = ei e∗j , for 1 ≤ i, j ≤ n.

2. Let A ∈ Mn (R). Then


" #
1 3
(a) A is called symmetric if AT = A. For example, A = .
3 2
" #
0 3
(b) A is called skew-symmetric if AT = −A. For example, A = .
−3 0
T

" #
1 1 1
AF

(c) A is called orthogonal if AAT = AT A = I. For example, A = √ .


2 1 −1
DR

(d) A is said to be a permutation matrix if A has exactly one non-zero entry, "namely
#
0 1
1, in each row and column. For example, In for each positive integer n, ,
1 0
     
0 1 0 0 0 1 0 1 0
     
0 0 1, 0 1 0 and 1 0 0 are permutation matrices. Verify that per-
     
1 0 0 1 0 0 0 0 1
mutation matrices are Orthogonal matrices.

3. Let A ∈ Mn (C). Then


" #
1 i
(a) A is called normal if A∗ A = AA∗ . For example, is a normal matrix.
i 1
" #
1 1+i
(b) A is called Hermitian if A∗ = A. For example, A = .
1−i 2
" #
0 1+i
(c) A is called skew-Hermitian if A∗ = −A. For example, A = .
−1 + i 0
" #
∗ ∗ 1 1+i 1
(d) A is called unitary if AA = A A = I. For example, A = √ .
3 −1 1 − i
Verify that Hermitian, skew-Hermitian and Unitary matrices are normal matrices.

4. A vector u ∈ Mn,1 (C) such that u∗ u = 1 is called a unit vector.


1.3. SOME MORE SPECIAL MATRICES 19
" #
1 0
5. A matrix A is called idempotent if A2 = A. For example, A = is idempotent.
1 0

6. An idempotent matrix which is also Hermitian is called a projection matrix. For example,
if u ∈ Mn,1 (C) is a unit vector then A = uu∗ is a Hermitian, idempotent matrix. Thus A
is a projection matrix.

Verify that u∗ (x − Ax) = u∗ x − u∗ Ax = u∗ x − u∗ (uu∗ )x = 0 (as u∗ u = 1), for any


x ∈ C3 . Thus, with respect to the dot product in R3 , Ax is the foot of the perpendicular
1
from the point x on the vector u. In particular, if u = √ [1, 2, −1]T and A = uuT . Then,
6
for any vector x = [x1 , x2 , x3 ]T ∈ M3,1 (R),

x1 + 2x2 − x3 x1 + 2x2 − x3
Ax = (uuT )x = u(uT x) = √ u= [1, 2, −1]T .
6 6

7. Fix a unit vector a ∈ Mn,1 (R) and let A = 2aaT − In . Then, verify that A ∈ Mn (R) and
Ay = 2(aT y)a − y, for all y ∈ Rn . This matrix is called the reflection matrix about the
line containing the points 0 and a.

8. Let A ∈ Mn (C). Then, A is said to be nilpotent if there exists a positive integer n


such that An = 0. The least positive integer k for which Ak = 0 is called the order of
nilpotency. For example, if A = [aij ] ∈ Mn (C) with aij equal to 1 if i − j = 1 and 0,
T

otherwise then An = 0 and A` 6= 0 for 1 ≤ ` ≤ n − 1.


AF
DR

Exercise 1.3.2. 1. Consider the matrices eij ∈ Mn (C) for 1 ≤ i, j, ≤ n. Is e12 e11 = e11 e12 ?
What about e12 e22 and e22 e12 ?

Ans: Note e11 = e1 eT1 and e12 = e1 eT2 . Thus e12 e11 = (e1 eT2 )(e1 eT1 ) = e1 (eT2 e1 )eT1 = 0
as eT2 e1 = 0. Where as e11 e12 = (e1 eT1 )(e1 eT2 ) = e1 (eT1 e1 )eT2 = e1 eT2 = e12 .

2. Let {u1 , u2 , u3 } be three vectors in R3 such that u∗i ui = 1, for 1 ≤ i ≤ 3, and u∗i uj = 0
whenever i 6= j. Prove the following.

(a) If U = [u1 u2 u3 ] then U ∗ U = I. What about U U ∗ = u1 u∗1 + u2 u∗2 + u3 u∗3 ?


   
u∗1 h u
i  1 1
∗u u ∗u
1 2 u∗u
1 3

  
Ans: U U = u2  u1 u2 u3 = u2 u1 u2 u2 u∗2 u3 
 ∗   ∗ ∗
 = I3 .
u3∗ ∗ ∗ ∗
u3 u1 u3 u2 u3 u3
Check (U U ) = U (U U )U = U U and U U ∗ is Hermitian. So, U U ∗ is a projection
∗ 2 ∗ ∗ ∗

matrix. It will be shown later that U U ∗ = I3 .


(b) If A = ui u∗i , for 1 ≤ i ≤ 3 then A2 = A. Is A Hermitian? Is A a projection matrix?
Ans: A2 = (ui u∗i )(ui u∗i ) = ui (u∗i ui )u∗i = ui u∗i = A. Clearly, A is Hermitian. Thus,
A is a projection.
(c) If A = ui u∗i + uj u∗j , for i 6= j then A2 = A. Is A a projection matrix?
Ans: A2 = (ui u∗i + uj u∗j )(ui u∗i + uj u∗j ) = ui u∗i + uj u∗j = A as u∗i uj = 0 = u∗j ui .
Clearly, A is Hermitian. So, A is a projection matrix.
20 CHAPTER 1. INTRODUCTION TO MATRICES

3. Let A, B ∈ Mn (C) be two unitary matrices. Then both AB and BA are unitary matrices.

4. Let A ∈ Mn (C) be a Hermitian matrix.


(a) Then the diagonal entries of A are necessarily real numbers.
Ans: Note that aii = e∗i Aei = e∗i A∗ ei = (e∗i Aei )∗ = aii . Thus aii = aii ⇒ aii ∈ R.
(b) For each B ∈ Mn (C) prove that B ∗ AB is a Hermitian matrix.
Ans: (B ∗ AB)∗ = B ∗ A∗ B = B ∗ AB.
(c) Further if A2 = 0 then show that A = 0.
Ans: 0 = A2 = A∗ A. So, 0 = (A∗ A)11 = |a11 |2 + |a21 |2 + · · · + |an1 |2 implies ai1 = 0
for 1 ≤ i ≤ n. Similarly, use 0 = (A∗ A)ii for i ≥ 2 to get other entries as zero.
(d) Then x∗ Ax is a real number, for any x ∈ Mn,1 (C).
Ans: As x∗ Ax is a scalar, x∗ Ax = (x∗ Ax)∗ = x∗ A∗ x = x∗ Ax ⇒ x∗ Ax ∈ R.

5. Let A ∈ Mn (C). If x∗ Ax ∈ R for every x ∈ Mn,1 (C) then A is a Hermitian matrix. [Hint:
Use ej , ej + ek and ej + iek of Mn,1 (C) for x.]
Ans: Taking x = ei gives aii = e∗i Aei = x∗ Ax ∈ R. So, aii ∈ R.
Taking x = ei + iej , gives x∗ Ax = aii − iaji + iaij + ajj , a real number. As aii , ajj ∈ R,
aij − aji is a purely imaginary number, i.e., they have the same real part. Similarly, taking
x = ei + ej gives aij + aji ∈ R, i.e., they have opposite imaginary parts. So aij = aji .
T

6. Let A and B be Hermitian matrices. Then, prove that AB is Hermitian if and only if
AF

AB = BA.
DR

7. Let A ∈ Mn (C) be a skew-Hermitian matrix. Then prove that


(a) the diagonal entries of A are either zero or purely imaginary.
(b) for each B ∈ Mn (C) prove that B ∗ AB is a skew-Hermitian matrix.
Ans: Note that −aii = e∗i (−A)ei = e∗i A∗ ei = aii . Thus −aii = aii and hence aii is
either zero or purely imaginary. (B ∗ AB)∗ = B ∗ A∗ B = −(B ∗ AB).

8. Let A be a complex square matrix. Then S1 = 21 (A + A∗ ) is Hermitian, S2 = 12 (A − A∗ )


is skew-Hermitian, and A = S1 + S2 .

9. Let A, B be skew-Hermitian matrices with AB = BA. Is the matrix AB Hermitian or


skew-Hermitian?
Ans: (AB)∗ = B ∗ A∗ = (−B)(−A) = BA = AB.

10. Let A be a nilpotent matrix. Prove that there exists a matrix B such that B(I + A) = I =
(I + A)B. [If Ak = 0 then look at I − A + A2 − · · · + (−1)k−1 Ak−1 ].
Ans: Verify (I + A)(I − A + · · · + (−1)k−1 Ak−1 ) = (I − A + · · · + (−1)k−1 Ak−1 )(I + A) = I.
   
1 0 0 1 0 0
   
0 cos θ − sin θ and B = 0 cos θ
11. Let A =  sin θ, for θ ∈ [−π, π). Are they
 
0 sin θ cos θ 0 sin θ − cos θ
orthogonal?
1.3. SOME MORE SPECIAL MATRICES 21

Ans: Yes, as AAT = I = AT A and B T B = I = BB T .

1.3.1 Submatrix of a Matrix

Definition 1.3.3. For k ∈ N, let [k] = {1, . . . , k}. Also, let A ∈ Mm×n (C).
1. Then, a matrix obtained by deleting some of the rows and/or columns of A is said to be
a submatrix of A.
2. If S ⊆ [m] and T ⊆ [n] then by A(S|T) , we denote the submatrix obtained from A by
deleting the rows with indices in S and columns with indices in T . By A[S, T ], we mean
A(S c |T c ), where S c = [m] \ S and T c = [n] \ T . Whenever, S or T consist of a single
element, then we just write the element. If S = [m], then A[S, T ] = A[:, T ] and if T = [n]
then A[S, T ] = A[S, :] which matches with our notation in Definition 1.1.1.
3. If m = n, the submatrix A[S, S] is called a principal submatrix of A.
" # " #
1 4 5 1 5
Example 1.3.4. 1. Let A = . Then, A[{1, 2}, {1, 3}] = A[:, {1, 3}] = ,
0 1 2 0 2
" #
1
A[1, 1] = [1], A[2, 3] = [2], A[{1, 2}, 1] = A[:, 1] = , A[1, {1, 3}] = [1 5] and A are a few
0
" # " #
1 4 1 4
submatrices of A. But the matrices and are not submatrices of A.
1 0 0 2
T

 
1 2 3
AF

" #
  1 3
2. Take A =  5 6 7, S = {1, 3} and T = {2, 3}. Then, A[S, S] = 9 7 , A[T, T ] =

DR

9 8 7
" #
6 7 h i h i
, A(S | S) = 6 and A(T | T ) = 1 are principal submatrices of A.
8 7

Let A ∈ Mn,m (C) and B ∈ Mm,p (C). Then the product AB" is
# defined. Suppose r < m.
H
Then A and B can be decomposed as A = [P Q] and B = , where P ∈ Mn,r (C) and
K
H ∈ Mr,p (C) so that AB = P H + QK. This is proved next.

Theorem 1.3.5. Let the matrices A, B, P, H, Q and K be defined as above. Then

AB = P H + QK.

Proof. Verify that the matrix products P H and QK are valid. Further, their sum is defined
as P H, QK ∈ Mn,p (C). Now, let P = [Pij ], Q = [Qij ], H = [Hij ], and K = [Kij ]. Then, for
1 ≤ i ≤ n and 1 ≤ j ≤ p, we have
m
X r
X m
X r
X m
X
(AB)ij = aik bkj = aik bkj + aik bkj = Pik Hkj + Qik Kkj
k=1 k=1 k=r+1 k=1 k=r+1
= (P H)ij + (QK)ij = (P H + QK)ij .

Thus, the required result follows.


22 CHAPTER 1. INTRODUCTION TO MATRICES

Remark 1.3.6. Theorem 1.3.5 is very useful due to the following reasons:

1. The order of the matrices P, Q, H and K are smaller than that of A or B.

2. The matrices P, Q, H and K can be further partitioned so as to form blocks that are either
identity or zero or matrices that have certain nice properties. So, such a partition may
be quite useful during different matrix operations. Examples of such partitions appear
throughout the notes.

3. Suppose one wants to prove a result for a square matrix A. If we want to prove it using
induction then we can prove it for the 1 × 1 matrix (the initial step of induction). Then
assume the result to hold for all k × k submatrices
" # A or just the first k × k principal
of
B x
submatrix of A. At the next step write A = , where B is a k × k matrix. Then
xT a
the result holds for B and then one can proceed to prove it for A.

Exercise 1.3.7. 1. Complete the proofs of Theorems 1.2.4 and 1.2.13.


" # " # " # " #
x1 y1 cos α − sin α cos(2θ) sin(2θ)
2. Let x = ,y= ,A= and B = .
x2 y2 sin α cos α sin(2θ) − cos(2θ)

(a) Then y = Ax gives the counter-clockwise rotation through an angle α.


T

" # " # " # " #


1 cos α 0 − sin α
AF

Ans: Note that A sends the vector to and the vector to


0 sin α 1 cos α
DR

which are counter-clockwise rotations by α of the respective vectors.

(b) Then y = Bx gives the reflection about the line y = tan(θ)x.


" #
a
Ans: Let y = tan(θ)x be the line `1 . Then is a general point on `1 . Further,
a tan θ
" # " #
a a
B = . So, B fixes every point on `1 .
a tan θ a tan θ
" #
a
Now let `2 be the line which passes through and is perpendicular to `1 . A
a tan θ
" #
a sec2 θ − y tan θ
general point on `2 is . Then
y
" # " #
a sec2 θ − y tan θ 2a − a sec2 θ + y tan θ
B = .
y 2a tan θ − y
" # " #
2a − a sec2 θ + y tan θ a
Note that lies on `2 and is the mid-point of the two
2a tan θ − y a tan θ
" # " # " #
2a − a sec2 θ + y tan θ a sec2 θ − y tan θ 2a − a sec2 θ + y tan θ
points and . Thus,
2a tan θ − y y 2a tan θ − y
" #
a sec2 θ − y tan θ
is the reflection of about the line `1 .
y
1.3. SOME MORE SPECIAL MATRICES 23

(c) Let α = θ and compute y = (AB)x and y = (BA)x. Do they correspond to reflec-
tion? If yes, then about which line(s)?
" # " #
cos(3θ) sin(3θ) cos(θ) sin(θ)
Ans: Note AB = and BA = . So, the lines
sin(3θ) − cos(3θ) sin(θ) − cos(θ)
   
3θ θ
are y = tan x and y = tan x.
2 2
(d) Further, if y = Cx gives the counter-clockwise rotation through β and y = Dx gives
the reflections about the line y = tan(δ) x. Then prove that
i. AC = CA and y = (AC)x gives " the counter-clockwise rotation
# through α + β.
cos(α + β) − sin(α + β)
Ans: Verify that AC = CA =
sin(α + β) cos(α + β)
ii. y = (BD)x and
" y = (DB)x give rotations. # Which"angles do they represent? #
cos 2(θ − δ) − sin 2(θ − δ) cos 2(δ − θ) − sin 2(δ − θ)
Ans: BD = , DB = .
sin 2(θ − δ) cos 2(θ − δ) sin 2(δ − θ) cos 2(δ − θ)

3. Let A ∈ Mn (C). If AB = BA for all B ∈ Mn (C) then A is a scalar matrix, i.e., A = αI


for some α ∈ C (use the matrices eij in Definition 1.3.1.1).
Ans: Let B = eij = ei eTj for i 6= j. Then AB = Aei eTj = A[:, i]eTj and BA = ei eTj A =
ei A[j, :]. But,
 
0
T

 . 
 . 
AF

j-th  . 
 
↓  0 
DR

 
A[:, i]eTj = [0, · · · , 0, A[:, i], 0, · · · , 0] and ei A[j, :] = A[j, :] ←i-th .
 
 
 0 
 
 . 
 . 
 . 
0
Hence aij = 0, if i 6= j and ajj = aii .

4. Consider the two coordinate transformations


x1 = a11 y1 + a12 y2 y1 = b11 z1 + b12 z2
and .
x2 = a21 y1 + a22 y2 y2 = b21 z1 + b22 z2

(a) Compose the two transformations to express x1 , x2 in terms of z1 , z2 .


" # " #" # " # " # " #" # " #
x1 a11 a12 y1 y1 y1 b11 b12 z1 z1
Ans: Note = =A and = =B .
x2 a21 a22 y2 y2 y2 b21 b22 z2 z2
" # " #
x1 z1
Then = AB .
x2 z2
(b) Does the composition of two transformations obtained in the previous part correspond
to multiplying two matrices? Give reasons for your answer.
Ans: Yes, see the above solution.

5. For An×n = [aij ], the trace of A, denoted tr(A), is defined by tr(A) = a11 + a22 + · · · + ann .
24 CHAPTER 1. INTRODUCTION TO MATRICES
" # " #
3 2 4 −3
(a) Compute tr(A) for A = and A = .
2 2 −5 1
Ans: 3 + 2 = 5 and 4 + 1 = 5.
" # " # " # " # " #
1 1 1 1 1 1
(b) Let A be a matrix with A =2 and A =3 . If B = then
2 2 −2 −2 2 −2
compute tr(AB). What about tr(A)?
" # " #
2 3 a b
Ans: Verify AB = . So tr(AB) = −4. Let A = . Then, the
4 −6 c d
given conditions imply a + 2b = 2, c + 2d = 4, a − 2b = 3 and c − 2d = −6. Thus
5 5
tr(A) = a + d = + = 5.
2 2
(c) Let A and B be two square matrices of the same order. Then

i. tr(A + B) = tr(A) + tr(B).


n
P n
P n
P
Ans: tr(A + B) = (A + B)ii = (A)ii + (B)ii = tr(A) + tr(B).
i=1 i=1 i=1
ii. tr(AB) = tr(BA).
P n n P
P n n P
P n n
P
Ans: tr(AB) = (AB)ii = aij bji = bji aij = (BA)jj = tr(BA).
i=1 i=1 j=1 j=1 i=1 j=1

(d) Does there exist matrices A, B ∈ Mn (C) such that AB − BA = cI, for some c 6= 0?
Ans: No. Note that tr(AB − BA) = 0, where as, for c 6= 0, tr(c I) = nc 6= 0.
T
AF

6. Let J ∈ Mn (R) be a matrix having each entry 1.


DR

(a) Verify that J = 11T , where 1 is a column vector having all entries 1.

(b) Verify that J 2 = nJ.

(c) Also, for any α1 , α2 , β1 , β2 ∈ R, verify that there exist α3 , β3 ∈ R such that

(α1 In + β1 J) · (α2 In + β2 J) = α3 In + β3 J.

(d) Let α, β ∈ R such that α 6= 0 and α + nβ 6= 0. Now, define A = αIn + βJ. Then,
use the above to prove that A is invertible.
Ans: J 2 = (11T )(11T ) = 1(1T 1)1T = n11T = nJ.
Note that in part (6c), α3 = α1 α2 and β3 = α1 β2 + α2 β1 + nβ1 β2 . So, using the third
1 β
part B = I − J is the inverse of A.
α α(α + nβ)

" #
1 2 3
7. Let A = .
2 1 1

(a) Find a matrix B such that AB = I2 .

(b) What can you say about the number of such matrices? Give reasons for your answer.

(c) Does there exist a matrix C such that CA = I3 ? Give reasons for your answer.
1.3. SOME MORE SPECIAL MATRICES 25
 
−1 + k 2+z
1 
Ans: Take G =  2 − 5k −1 − 5z , for k, z arbitrary. Then AG = I2 . Does there
3 
3k 3z
 
−8/35 3/5
 
exists a value of z for which G = 
 1/7 0 ? Note that for this choice of G, one has
11/35 −1/5
AGA = A, GAG = G, (AG)T = AG and (GA)T = GA. The matrices G which satisfy the
above are called pseudo inverse of A.
" #
P Q
8. Let A = . If P, Q and R are Hermitian, is the matrix A Hermitian?
Q R
" # " #
∗ P ∗ Q∗ n P Q
Ans: Yes, as A = = .
Q∗ R∗ Q R
" #
A11 x
9. Let A = , where A11 ∈ Mn (C) is invertible and c ∈ C.
y∗ c

(a) If p = c − y∗ A−1
11 x is non zero, then verify that
" # " #
A−1
11 0 1 A−111 x
h i
B= + y∗ A−1
11 −1
0 0 p −1
T

is the inverse of A.
AF

Ans: Just multiply and verify.


DR

   
0 −1 2 0 −1 2
   
(b) Use the above to find the inverse of 
 1 4 
1  and  3
 1 .
4 
−2 1 1 −2 5 −3
" # " #" #
1 1 h i 1 1 2
Ans: A−1
11 = , p = 1 − −2 1 = 15. So, the inverse equals
−1 0 −1 0 4
       
1 1 0 6 h 1 1 0 −18 −12 −6
 −1 0 0 + 1 −2 −3 −2 −1 =  −1 0 0 + 1 
    i    
  15     15  6 2 2 

0 0 0 −1 0 0 0 3 2 1
   
−1/5 1/5 −2/5 −23/33 7/33 −2/11
   
=−3/5 4/15 2/15 . For the second matrix the inverse is  1/33 4/33 2/11 
 

1/5 2/15 1/15 17/33 2/33 1/11

10. Let x ∈ Mn,1 (R) be a unit vector.

(a) Define A = In − 2xxT . Prove that A is symmetric and A2 = I. The matrix A is


commonly known as the Householder matrix.
Ans: A2 = (In − 2xxT )(In − 2xxT ) = In − 4xxT + 4xxT = In as xT x = 1.
(b) Let α 6= 1 be a real number and define A = In − αxxT . Prove that A is symmetric
and invertible. [The inverse is also of the form In + βxxT , for some β.]
α
Ans: Just multiply and verify that β = as xT x = 1.
α−1
26 CHAPTER 1. INTRODUCTION TO MATRICES

11. Let A ∈ Mn (R) be an invertible matrix and let x, y ∈ Mn,1 (R). Also, let β ∈ R such that
α = 1 + βyT A−1 x 6= 0. Then, verify the famous Shermon-Morrison formula
β −1 T −1
(A + βxyT )−1 = A−1 − A xy A .
α
This formula gives the information about the inverse when an invertible matrix is modified
by a rank (see Definition 2.3.1) one matrix.
Ans: Just multiply and verify.

12. Suppose the matrices B and C are invertible and the involved partitioned products are
defined, then verify that that
" #−1 " #
A B 0 C −1
= .
C 0 B −1 −B −1 AC −1

Ans: Just multiply and verify.

13. Let A ∈ Mm,n (C). Then, a matrix G ∈ Mn,m (C) is called a generalized inverse (for
short, g-inverse) of A if AGA
" = A. # For example, a generalized inverse of the matrix
1 − 2α
A = [1, 2] is a matrix G = , for all α ∈ R. A generalized inverse G is called a
α
pseudo inverse or a Moore-Penrose inverse if GAG = G and the matrices AG and
T

2
GA are symmetric. Check that for α = the matrix G is a pseudo inverse of A. Further,
AF

5
2
DR

among all the g-inverses, the inverse with the least euclidean norm also has α = .
5

1.4 Summary
In this chapter, we started with the definition of a matrix and came across lots of examples.
We recall these examples as they will be used in later chapters to relate different ideas:

1. The zero matrix of size m × n, denoted 0m×n or 0.

2. The identity matrix of size n × n, denoted In or I.

3. Triangular matrices.

4. Hermitian/Symmetric matrices.

5. Skew-Hermitian/skew-symmetric matrices.

6. Unitary/Orthogonal matrices.

7. Idempotent matrices.

8. Nilpotent matrices.

We also learnt product of two matrices. Even though it seemed complicated, it basically
tells that multiplying by a matrix on the
1.4. SUMMARY 27

1. left of A is same as operating on (playing with) the rows of A.

2. right of A is same as operating on (playing with) the columns of A.

The matrix multiplication is not commutative. We also defined the inverse of a matrix. Further,
there were exercises that informs us that the rows and columns of invertible matrices cannot
have certain properties.

T
AF
DR
28 CHAPTER 1. INTRODUCTION TO MATRICES

T
AF
DR
Chapter 2

System of Linear Equations

This chapter starts with understanding the effect of elementary row operations on the solution
set of a system of linear equations. This helps us to conclusively give necessary and sufficient
conditions for a system of linear equations to have either a unique solution, no solution or an
infinite number of solutions.

2.1 Introduction
We start this section with our understanding of the system of linear equations in at most 2
T

variables/unknowns.
AF
DR

Example 2.1.1. Let us look at some examples of linear systems.

1. Suppose a, b ∈ R. Consider the system ax = b in the variable x. If

(a) a 6= 0 then the system has a unique solution x = ab .


(b) a = 0 and
i. b 6= 0 then the system has no solution.
ii. b = 0 then the system has infinite number of solutions, namely all x ∈ R.

2. Recall that the linear system ax + by = c for (a, b) 6= (0, 0), in the variables x and y,
represents a line in R2 . So, let us consider the points of intersection of the two lines

a1 x + b1 y = c1 , a2 x + b2 y = c2 , (2.1.1)

where a1 , a2 , b1 , b2 , c1 , c2 ∈ R with (a1 , b1 ), (a2 , b2 ) 6= (0, 0) (see Figure 2.1 for illustration
of different cases).

(a) Unique
" # Solution
" # (a1 b2 − a2 b1 6= 0): The linear system x − y = 3 and 2x + 3y = 11
x 4
has = as the unique solution.
y 1
(b) No Solution (a1 b2 − a2 b1 = 0 but a1 c2 − a2 c1 6= 0): The linear system x + 2y = 1
and 2x + 4y = 3 represent a pair of parallel lines which have no point of intersection.

29
30 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

❵✶
❵✶
❵✷ ❵✶ ✄☎❞ ❵✷
✝ ❵✷

◆♦ ❙♦❧ t✐♦♥ ■♥☞♥✐t✂ ◆ ♠❜✂✁ ♦❢ ❙♦❧ t✐♦♥s ❯♥✐✞ ✂ ❙♦❧ t✐♦♥✿ ■♥t✂✁s✂❝t✐♥❣ ▲✐♥✂s
P❛✐✁ ♦❢ P❛✁❛❧❧✂❧ ❧✐♥✂s ❈♦✐♥❝✐✆✂♥t ▲✐♥✂s ✟ ✿ P♦✐♥t ♦❢ ■♥t✂✁s✂❝t✐♦♥

Figure 2.1: Examples in 2 dimension.

(c) Infinite Number of Solutions (a1 b2 − a2 b1 = 0 and a1 c2 − a2 c1 = 0): The linear


system "x #+ 2y"= 1 and # 2x
"+ # 4y =
" 2 #represent the same line. So, the solution set
x 1 − 2y 1 −2
equals = = +y with y arbitrary. Observe that the vector
y y 0 1
" #
1
i. corresponds to the solution x = 1, y = 0 of the given system.
0
" #
−2
ii. gives x = −2, y = 1 as the solution of x + 2y = 0, 2x + 4y = 0.
1
(d) If the linear system ax + by = c has
i. (a, b) = (0, 0) and c 6= 0 then ax + by = c has no solution.
T
AF

ii. (a, b, c) = (0, 0, 0) then ax+by = c has infinite number of solutions, namely
whole of R2 .
DR

Let us now look at different interpretations of the solution concept.

Example 2.1.2. Observe the following of the linear system in Example 2.1.1.2a.
" #
4
1. corresponds to the point of intersection of the corresponding two lines.
1
" #
1 −1
2. Using matrix multiplication, the given system equals Ax = b, where A = ,
2 3
" # " # " #" # " #
x 3 3 1 3 4
x= and b = . So, the solution is x = A−1 b = 15 = .
y 11 −2 1 11 1
" # " # " #
1 −1 3
3. Re-writing Ax = b as x+ y= gives us 4 · (1, 2)T + 1 · (−1, 3)T = (3, 11)T .
2 3 11
This corresponds to addition of vectors in the Euclidean plane.

Thus, there are three ways of looking at the linear system Ax = b, where, as the name suggests,
one of the ways is looking at the point of intersection of planes, the other is the vector sum
approach and the third is the matrix multiplication approach. We will see that all the three
approaches are fundamental to the understanding of linear algebra.

Definition 2.1.3. A system of m linear equations in n variables x1 , x2 , . . . , xn is a set of


equations of the form
2.1. INTRODUCTION 31

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
.. ..
. . (2.1.2)
am1 x1 + am2 x2 + · · · + amn xn = bm

where for 1 ≤ i ≤ m and 1 ≤ j ≤ n; aij , bi ∈ R. Linear System (2.1.2) is called homogeneous


if b1 = 0 = b2 = · · · = bm and non-homogeneous, otherwise.
 
a11 a12 · · · a1n    
  x1 b1
 a21 a22 · · · a2n   .   . 
Definition 2.1.4. Let A =  . ,x= .   . 
 .  and b =  . . Then, (2.1.2)
 
 .. .. .. .. 
. . . 

xn bm

am1 am2 · · · amn
can be re-written as Ax = b. In this setup, the matrix A is called the coefficient matrix and
the block matrix [A b] is called the augmented matrix of the linear system (2.1.2).

Remark 2.1.5. Consider the linear system Ax = b, where A ∈ Mm,n (C), b ∈ Mm,1 (C) and
x ∈ Mn,1 (C). If [A b] is the augmented matrix and xT = [x1 , . . . , xn ] then,

1. for j = 1, 2, . . . , n, the variable xj corresponds to the column ([A b])[:, j].


T
AF

2. the vector b = ([A b])[:, n + 1].


DR

3. for i = 1, 2, . . . , m, the ith equation corresponds to the row ([A b])[i, :].

Definition 2.1.6. A solution of Ax = b is a vector y such that Ay indeed equals b. The


set of all solutions is called 
the solution
 set of thesystem.
 For example, the solution set of
1 1 1 1  0 

 

   
Ax = b, with A =  1 4 2 and b = 0 equals −1.
    


4 1 1 1  2  

Definition 2.1.7. Consider a linear system Ax = b. Then, this linear system is called consis-
tent if it admits a solution and is called inconsistent if it admits no solution. For example,
the homogeneous system Ax = 0 is always consistent as 0 is a solution whereas, verify that the
system x + y = 2, 2x + 2y = 3 is inconsistent.

Definition 2.1.8. Consider a linear system Ax = b. Then, the corresponding linear system
Ax = 0 is called the associated homogeneous system. 0 is always a solution of the associated
homogeneous system.

The readers are advised to supply the proof of the next theorem that gives information
about the solution set of a homogeneous system.

Theorem 2.1.9. Consider a homogeneous linear system Ax = 0.

1. Then, x = 0, the zero vector, is always a solution, called the trivial solution.
32 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

2. Let u 6= 0 be a solution of Ax = 0. Then, y = cu is also a solution, for all c ∈ C.


A nonzero solution is called a non-trivial solution. Note that, in this case, the system
Ax = 0 has an infinite number of solutions.
k
P
3. Let u1 , . . . , uk be solutions of Ax = 0. Then, ai ui is also a solution of Ax = 0, for
i=1
each choice of ai ∈ C, 1 ≤ i ≤ k.
" # " #
1 1 1
Remark 2.1.10. 1. Let A = . Then, x = is a non-trivial solution of Ax = 0.
1 1 −1

2. Let u 6= v be solutions of a non-homogeneous system Ax = b. Then, xh = u − v


is a solution of the associated homogeneous system Ax = 0. That is, any two distinct
solutions of Ax = b differ by a solution of the associated homogeneous system Ax = 0.
Or equivalently, the solution set of Ax = b is of the form, {x0 + xh }, where x0 is a
particular solution of Ax = b and xh is a solution of the associated homogeneous system
Ax = 0.
Exercise 2.1.11. 1. Consider a system of 2 equations in 3 variables. If this system is
consistent then how many solutions does it have?

Ans: Since there are two intersecting (system is consistent) planes in R3 they will intersect
in a line. So, infinite number of solutions.
T
AF

2. Give a linear system of 3 equations in 2 variables such that the system is inconsistent
whereas it has 2 equations which form a consistent system.
DR

Ans: x + y = 2, x + 2y = 3, 2x + 3y = 4.

3. Give a linear system of 4 equations in 3 variables such that the system is inconsistent
whereas it has three equations which form a consistent system.

Ans: x + y + z = 3, x + 2y + 3z = 6, 2x + 3y + 4z = 4, 2x + 2y + z = 5.

4. Let Ax = b be a system of m equations in n variables, where A ∈ Mm,n (C).


(a) Can the system, Ax = b have exactly two distinct solutions for any choice of m and
n? Give reasons for your answer.
(b) Can the system Ax = b have only a finitely many (greater than 1) solutions for any
choice of m and n? Give reasons for your answer.
Ans: No. Let x1 , x2 be two solutions. Define z = ax1 + (1 − a)x2 for a ∈ R. Then
Az = aAx1 + (1 − a)Ax2 = ab + (1 − a)b = b.

2.1.1 Elementary Row Operations

A system of linear equations can be solved by people differently. But, the final solution remains
the same. In this section, we use a systematic way to solve any linear system which is popularly
known as the Guass Elimination method.

Example 2.1.12. Solve the linear system y + z = 2, 2x + 3z = 5, x + y + z = 3.


2.1. INTRODUCTION 33
 
0 1 1 2
 
Solution: Let B0 = [A b], the augmented matrix. Then, B0 = 
2 0 3 5. We now

1 1 1 3
systematically proceed to get the solution.

1. Interchange 1-st and 2-nd equations (interchange B0 [1, :] and B0 [2, :] to get B1 ).
 
2x + 3z = 5 2 0 3 5
 
y+z =2 B1 =   0 1 1 2 .

x+y+z =3 1 1 1 3

1 1
2. In the new system, multiply 1-st equation by 2 (multiply B1 [1, :] by to get B2 ).
2
 
x + 32 z = 5
2 1 0 32 5
2

y+z =2 B2 = 
0 1 1 .
2
x+y+z =3 1 1 1 3

3. In the new system, replace 3-rd equation by 3-rd equation minus 1-st equation (replace
B2 [3, :] by B2 [3, :] − B2 [1, :] to get B3 ).
 
x + 23 z = 5
2 1 0 23 5
2
T


y+z =2 B3 = 
0 1 1 .
2
AF

y − 12 z = 1
2 0 1 − 21 1
2
DR

4. In the new system, replace 3-rd equation by 3-rd equation minus 2-nd equation (replace
B3 [3, :] by B3 [3, :] − B3 [2, :] to get B4 ).
 
x + 23 z = 5
2 1 0 23 5
2
 
y+z =2 B4 = 
0 1 1 2 .
− 32 z = − 32 0 0 − 32 3
−2

−2 −2
5. In the new system, multiply 3-rd equation by (multiply B4 [3, :] by to get B5 ).
3 3
 
x + 32 z = 5
2 1 0 23 52
 
y+z =2 B5 = 
0 1 1 2  .

z =1 0 0 1 1

The last equation gives z = 1. Using this, the second equation gives y = 1. Finally, the
first equation gives x = 1. Hence, the solution set is {[x, y, z]T | [x, y, z] = [1, 1, 1]}, a unique
solution.
In Example 2.1.12, observe how each operation on the linear system corresponds to a similar
operation on the rows of the augmented matrix. We use this idea to define elementary row
operations and the equivalence of two linear systems.

Definition 2.1.13. Let A ∈ Mm,n (C). Then, the elementary row operations are
34 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

1. Eij : Interchange the i-th and j-th rows, namely, interchange A[i, :] and A[j, :].

2. Ek (c) for c 6= 0: Multiply the k-th row by c, namely, multiply A[k, :] by c.

3. Eij (c) for c 6= 0: Replace the i-th row by i-th row plus c-times the j-th row, namely,
replace A[i, :] by A[i, :] + cA[j, :].

Definition 2.1.14. Two matrices are said to be row equivalent if one can be obtained from
the other by a finite number of elementary row operations.

Definition 2.1.15. The linear systems Ax = b and Cx = d are said to be row equivalent if
their respective augmented matrices, [A b] and [C d], are row equivalent.

Thus, note that the linear systems at each step in Example 2.1.12 are row equivalent to each
other. We now prove that the solution set of two row equivalent linear systems are same.

Lemma 2.1.16. Let Cx = d be the linear system obtained from Ax = b by application of a


single elementary row operation. Then, Ax = b and Cx = d have the same solution set.

Proof. We prove the result for the elementary row operation Ejk (c) with c 6= 0. The reader is
advised to prove the result for the other two elementary operations.
In this case, the systems Ax = b and Cx = d vary only in the j th equation. So, we
need to show that y satisfies the j th equation of Ax = b if and only if y satisfies the j th
T

equation of Cx = d. So, let yT = [α , . . . , α ]. Then, the j th and k th equations of Ax = b are


AF

1 n
aj1 α1 + · · · + ajn αn = bj and ak1 α1 + · · · + akn αn = bk . Therefore, we see that αi ’s satisfy
DR

(aj1 + cak1 )α1 + · · · + (ajn + cakn )αn = bj + cbk . (2.1.3)

Also, by definition the j th equation of Cx = d equals

(aj1 + cak1 )x1 + · · · + (ajn + cakn )xn = bj + cbk . (2.1.4)

Therefore, using Equation (2.1.3), we see that yT = [α1 , . . . , αn ] is also a solution for Equation
(2.1.4). Now, use a similar argument to show that if zT = [β1 , . . . , βn ] is a solution of Cx = d
then it is also a solution of Ax = b. Hence, the required result follows.
The readers are advised to use Lemma 2.1.16 as an induction step to prove the next result.

Theorem 2.1.17. Let Ax = b and Cx = d be two row equivalent linear systems. Then, they
have the same solution set.

The exercise below shows that every square matrix is row equivalent to an upper triangular
matrix.

Exercise 2.1.18. Let A = [aij ] ∈ Mn (R). Then there exists an orthogonal matrix U such that
U A is upper triangular. The proof uses the following ideas.
1. If A[1, :] = 0 then proceed to the next column. So, let A[:, 1] 6= 0. If a11 = 0 then apply a
permutation matrix P (an orthogonal matrix, see Definition 1.3.1.2d) to get B = P A such
that the (1, 1)-th entry of B is non zero. Hence, without loss of generality, let a11 6= 0.
2.2. ROW-REDUCED ECHELON FORM (RREF) 35

2. Let [w1 , . . . , wn ]T = w ∈ Rn with w1 6= 0. Then use the Householder matrix (see 1.3.7.10a)
H such that Hw = αe1 for some α ∈ R, i.e., find x ∈ Rn such that (In − 2xxT )w = αe1 .
w − αe1 1
Ans: Given condition implies w − αe1 = 2(xT w)x. So x = T
. As is scalar,
2x w 2xT w
1 − 2wT w
use x = w + αe1 to find a choice of α. Show that for α = , Hw = −αe1 .
2w1
" #
α ∗
3. So, Part 2 gives an orthogonal matrix H1 with H1 A = .
0 A1
4. Now, use induction to get H2 ∈ Mn−1 (R) to get H2 A1 = T1 , an upper triangular matrix.
" # " #
1 0T α ∗
5. Define H = H1 . Then H is an orthogonal matrix and HA = , an upper
0 H2 0 T1
triangular matrix.

2.2 Row-Reduced Echelon Form (RREF)


In the previous section, we saw that two row equivalent linear systems have the same solution
set. Sometimes it helps to imagine an elementary row operation as left multiplication by a suit-
able matrix, known as an elementary matrix. In this section, we show that the product of such
matrices can be used to obtain a matrix which has certain nice properties. This will also help
us to understand the Gauss Elimination method and the Gauss-Jordan method. This under-
T

standing will be used to define the row-rank of a matrix in the next section and in subsequent
AF

sections we use them to obtain results for a system of linear equations.


DR

Definition 2.2.1. A matrix E ∈ Mn (C) is called an elementary matrix if it is obtained by


applying exactly one elementary row operation to the identity matrix In .

Remark 2.2.2. The elementary matrices are of three types and they correspond to elementary
row operations.

1. Eij = In −ei eTi −ej eTj +ei eTj +ej eTi : Matrix obtained by applying elementary row operation
Eij to In .

2. Ek (c) = In + (c − 1)ek eTk for c 6= 0: Matrix obtained by applying elementary row operation
Ek (c) to In .

3. Eij (c) = In + c ei eTj for c 6= 0: Matrix obtained by applying elementary row operation
Eij (c) to In .

Thus, when an elementary matrix is multiplied on the left of a matrix A, it gives the same result
as that of applying the corresponding elementary row operation on A.

Example 2.2.3.
 1.  for n= 3 and c ∈C, c 6= 0, one has
 In particular,  
1 0 0 c 0 0 1 0 0 1 0 0
       
E23 = 
0 0 1 , E1 (c) = 0 1 0, E31 (c) = 0 1 0 and E23 (c) = 0 1 c .
      
0 1 0 0 0 1 c 0 1 0 0 1
36 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

2. Verify that the transpose of an elementary matrix is again an elementary matrix of similar
type (see the above examples).
 
1 2 3
 
3. Let A =  2 0 3.

3 4 5
(a) If B1 is obtained
 from A by applying the elementary row operation E23 then B1 =

1 2 3
 
E23 A = 3 4 5.

2 0 3
(b) If B is obtained from
 A by applying
 the elementary row operation E31 (−3) then
1 2 3
 
B = E31 (−3)A = 2 0
 3.
0 −2 −4
(c) If C is obtained from
 B by applying
 the elementary row operation E21 (−2) then
1 2 3
 
C = E21 (−2)A = 0 −4 −3

.
0 −2 −4
   
1 3 2 −8 2 3
   
2 3 0, AE31 (−3) =  −7 0 3.
(d) Where as AE23 =    
T

3 5 4 −12 4 5
AF

Exercise 2.2.4. 1. Which of the following matrices are elementary?


DR

           
1
2 0 1 0 0 1 −1 0 1 0 0 0 0 1 0 0 1
  2         
0 1 0 ,  0 1 0 , 0 1 0 , 5 1 0 , 0 1 0 , 1 0 0 .
           
0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0
" #
2 1
2. Find some elementary matrices E1 , . . . , Ek such that Ek · · · E1 = I2 .
1 2
 
1 1 1
 
3. Find some elementary matrices F1 , . . . , F` such that F` · · · F1 
0 1 1  = I3 .

0 0 3
Exercise 2.2.5. Show that each elementary matrix is invertible. Further, the inverse is an
elementary matrix of the same type.
Ans: Verify that (Eij )−1 = Eij as Eij Eij = I = Eij Eij . If c 6= 0 then (Ek (c))−1 =
Ek (1/c) as Ek (c)Ek (1/c) = I = Ek (1/c)Ek (c) and (Eij (c))−1 = Eij (−c) as Eij (c)Eij (−c) = I =
Eij (−c)Eij (c).

Proposition 2.2.6. Let A and B be two row equivalent matrices. Then, there exists elementary
matrices E1 , . . . , Ek such that B = E1 · · · Ek A.

Proof. By the definition of row equivalence, B can be obtained from A by a finite number of
elementary row operations. But by Remark 2.2.2, each elementary row operation corresponds
to left multiplication by an elementary matrix. Thus, the required result follows.
2.2. ROW-REDUCED ECHELON FORM (RREF) 37

We now give an alternate prove of Theorem 2.1.17.

Theorem 2.2.7. Let Ax = b and Cx = d be two row equivalent linear systems. Then they
have the same solution set.

Proof. Let E1 , . . . , Ek be the elementary matrices such that E1 · · · Ek [A b] = [C d]. Put


E = E1 · · · Ek . Then, by Exercise 2.2.5

EA = C, Eb = d, A = E −1 C and b = E −1 d. (2.2.1)

Now assume that Ay = b holds. Then, by Equation (2.2.1)

Cy = EAy = Eb = d. (2.2.2)

On the other hand if Cz = d holds then using Equation (2.2.1), we have

Az = E −1 Cz = E −1 d = b. (2.2.3)

Therefore, using Equations (2.2.2) and (2.2.3) the required result follows.
The following result is a particular case of Theorem 2.2.7.

Corollary 2.2.8. Let A and B be two row equivalent matrices. Then, the systems Ax = 0 and
T
AF

Bx = 0 have the same solution set.


   
DR

1 0 0 1 0 a
   
Example 2.2.9. Are the matrices A =  0 1 0  and B = 0 1 b row equivalent?
  
0 0 1 0 0 0
 
a
 
Solution: No, as  b 

 is a solution of Bx = 0 but it isn’t a solution of Ax = 0.
−1

Definition 2.2.10. Let A be a nonzero matrix. Then, in each nonzero row of A, the left most
nonzero entry is called a pivot/leading entry. The column containing the pivot is called a
pivotal column. If aij is a pivot then we denote it by aij . For example, the entries a12 and
 
0 3 4 2
 
a23 are pivots in A = 
0 0 0 0. Thus, columns 2 and 3 are pivotal columns.

0 0 2 1

Definition 2.2.11. A matrix is in row echelon form (REF) (ladder like)

1. if the zero rows are at the bottom;

2. if the pivot of the (i + 1)-th row, if it exists, comes to the right of the pivot of the i-th
row.

3. if the entries below the pivot in a pivotal column are 0.


38 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

Example 2.2.12. 1. The following matrices


 are in echelon form.

    1 2 0 5  
0 2 4 2 1 1 0 2 3   1 0 0
    0 2 0 6  
,  0 ,  0
 and
.
0 0 1 1  0 0 3 4  0 1 0
 0 0 1 
0 0 0 0 0 0 0 0 1 0 0 1
 
0 0 0 0
2. The
 following matrices
 are not in echelon 
form (determine the rule(s) that fail).
0 1 4 2 1 1 0 2 3
   
0
 0 0 0 and  0
  0 0 0 1.
0 0 1 1 0 0 0 1 4

Definition 2.2.13. A matrix C is said to be in row-reduced echelon form (RREF)

1. if C is already in echelon form,

2. if the pivot of each nonzero row is 1,

3. if every other entry in each pivotal column is zero.

A matrix in RREF is also called a row-reduced echelon matrix.


Example 2.2.14. 1. The following matrices
 are in RREF.

    1 0 0 5  
0 1 0 −2 0 1 3 0   1 1 0 0 0
 0 1 0 6
T

    
, 0 0 0 1 ,  0
 and
.
0 0 1 1    0 0 0 1 0
0 1 2
AF

 
0 0 0 0 0 0 0 0 0 0 0 0 1
 
0 0 0 0
DR

2. The
 following matrices
  are not inRREF
 (determinethe rule(s) that fail).
0 3 3 0 0 1 3 0 0 1 3 1
     
, 0 , 0 .
0 0 0 1  0 0 0  0 0 1

0 0 0 0 0 0 0 1 0 0 0 0

Let A ∈ Mm,n (C). We now present an algorithm, commonly known as the Gauss-Jordan
Elimination (GJE), to compute the RREF of A.
1. Input: A.
2. Output: a matrix B in RREF such that A is row equivalent to B.
3. Step 1: Put ‘Region’ = A.
4. Step 2: If all entries in the Region are 0, STOP. Else, in the Region, find the leftmost
nonzero column and find its topmost nonzero entry. Suppose this nonzero entry is aij = c
(say). Box it. This is a pivot.
5. Step 3: Interchange the row containing the pivot with the top row of the region. Also,
make the pivot entry 1 by dividing this top row by c. Use this pivot to make other entries
in the pivotal column as 0.
6. Step 4: Put Region = the submatrix below and to the right of the current pivot. Now,
go to step 2.
Important: The process will stop, as we can get at most min{m, n} pivots.
2.2. ROW-REDUCED ECHELON FORM (RREF) 39
 
0 2 3 7
 
1 1 1 1
Example 2.2.15. Apply GJE to 
1

 3 4 8

0 0 0 1
1. Region = A as A 6= 0.
   
1 1 1 1 1 1 1 1
   
0 2 3 7. Also, E31 (−1)E12 A =  0 2 3 7

2. Then, E12 A = 
1 3
 = B (say).
 4 8 0 2
 3 7
0 0 0 1 0 0 0 1
 
  1 1 1 1
2 3 7  3 7
  1
0 1 2

2
3. Now, Region =  2 3 7 6= 0. Then, E2 ( 2 )B =  0 = C(say). Then,
 
2 3 7
0 0 1
 
0 0 0 1
 
1 0 −1 2
−5
2

0 3 7 
1 2

2 
E12 (−1)E32 (−2)C =  0 = D(say).
 0 0 0 
0 0 0 1
 
1 0 −1 2
−5
2
T

" #  3 7 
0 0 0 1
AF

2

2 
4. Now, Region = . Then, E34 D =  . Now, multiply on the left
0 1 0
 0 0 1
DR


0 0 0 0
 
1 0 − 12 0

0 3 
5 −7 1 2 0
by E13 ( 2 ) and E23 ( 2 ) to get 
 , a matrix in RREF. Thus, A is row
 0 0 0 1 

0 0 0 0
 
1 0 − 12 0

0 3 
1 2 0 
equivalent to F , where F = RREF(A) =  0
.
 0 0 1
0 0 0 0

Exercise 2.2.16. 1. Let Ax = b be a linear system of m equations in 2 variables. What


are the possible choices for RREF([A b]), if m ≥ 1?
   
x1 x1
   
 x2  x2 
2. Let A =    and B =  x  be two matrices, where x1 , x2 , x3 are any

 x 3

  3
2x1 − 5x2 + πx3 0
three row vectors of the same size. Then, prove that RREF(A) = RREF(B).

3. Let A ∈ Mn (C). If A is not a scalar matrix, i.e., A 6= αI, for any α ∈ C then prove that
there exists a non-singular matrix S such that SAS −1 = B with B = [bij ] and b11 = 0.
40 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

Ans: If A has a non-zero entry in the first row, say a1i 6= 0, (the first column, say aj1 6= 0)
a11 a11
then take S = In + ei eT1 (S = In − e1 eTj ).
a1i aj1

4. Find the row-reduced echelon form of the following matrices:


 
      −1 −1 −2 3
0 0 1 0 1 1 3 0 −1 1 
3 −3 −3

      3
1 0 3 , 0 0 1 3 , −2 0 3 ,  .
      1 1 2 2 
3 0 7 1 1 0 0 −5 1 0
 
−1 −1 2 −2

The proof of the next result is beyond the scope of this book and hence is omitted.

Theorem 2.2.17. Let A and B be two row equivalent matrices in RREF. Then A = B.

As an immediate corollary, we obtain the following important result.

Corollary 2.2.18. The RREF of a matrix A is unique.

Proof. Suppose there exists a matrix A with two different RREFs, say B and C. As the RREFs
are obtained by left multiplication of elementary matrices, there exist elementary matrices
E1 , . . . , Ek and F1 , . . . , F` such that B = E1 · · · Ek A and C = F1 · · · F` A. Let E = E1 · · · Ek
and F = F1 · · · F` . Thus, B = EA = EF −1 C.
T

As inverse of an elementary matrix is an elementary matrix, F −1 is a product of elementary


AF

matrices and hence B and C are row equivalent. As B and C are in RREF, using Theorem 2.2.17,
DR

B = C.

Remark 2.2.19. Let A ∈ Mm,n (C).

1. Then, by Corollary 2.2.18, it’s RREF is unique.

2. Let A ∈ Mm,n (C). Then, the uniqueness of RREF implies that RREF(A) is independent
of the choice of the row operations used to get the final matrix which is in RREF.

3. Let B = EA, for some elementary matrix E. Then, RREF(A) = RREF(B).

Proof. Let E1 , . . . , Ek and F1 , . . . , F` be elementary matrices such that RREF(A) =


E1 · · · Ek A and RREF(B) = F1 · · · F` B. Then,

RREF(B) = F1 · · · F` B = (F1 · · · F` )EA = (F1 · · · F` )E(Ek−1 · · · E1−1 )RREF(A).

Thus, the matrices RREF(A) and RREF(B) are row equivalent. Since they are also in
RREF by Theorem 2.2.17, RREF(A) = RREF(B).

4. Then, there exists an invertible matrix P , a product of elementary matrices, such that
P A = RREF(A).

Proof. By definition, RREF(A) = E1 · · · Ek A, for certain elementary matrices E1 , . . . , Ek .


Take P = E1 · · · Ek . Then, P is invertible (product of invertible matrices is invertible)
and P A = RREF(A).
2.2. ROW-REDUCED ECHELON FORM (RREF) 41

5. Let F = RREF(A) and B = [A[:, 1], . . . , A[:, s]], for some s ≤ n. Then,

RREF(B) = [F [:, 1], . . . , F [:, s]].

Proof. By Remark 2.2.19.4, there exist an invertible matrix P , such that

F = P A = [P A[:, 1], . . . , P A[:, n]] = [F [:, 1], . . . , F [:, n]].

Thus, P B = [P A[:, 1], . . . , P A[:, s]] = [F [:, 1], . . . , F [:, s]]. As F is in RREF, it’s first s
columns are also in RREF. Hence, by Corollary 2.2.18, RREF(P B) = [F [:, 1], . . . , F [:, s]].
Now, a repeated use of Remark 2.2.19.3 gives RREF(B) = [F [:, 1], . . . , F [:, s]]. Thus, the
required result follows.

Example 2.2.20. Consider a linear system Ax = b, where A ∈ M3 (C) and A[:, 1] 6= 0.Then,
verify that the 7 different choices for [C d] = RREF([A b]) are
     
1 0 0 d1 x d
     1
1. 0 1 0 d2 . Here, Ax = b is consistent. The unique solution equals y  = d2 
    
.
0 0 1 d3 z d3
     
1 0 α 0 1 α 0 0 1 α β 0
     
T

2. 0 1
 β 0 , 0 0 1 0 or 0
   
. Here, Ax = b is inconsistent for any
0 0 1
AF

0 0 0 1 0 0 0 1 0 0 0 0
choice of α, β as RREF([A b]) has a row of [0 0 0 1]. This corresponds to solving
DR

0 · x + 0 · y + 0 · z = 1, an equation which has no solution.


     
1 0 α d1 1 α 0 d1 1 α β d1
     
3. 
 0 1 β d2
 , 0 0 1 d2  or 0 0 0 0 . Here, Ax = b is consistent and has
    
0 0 0 0 0 0 0 0 0 0 0 0
infinite number of solutions for every choice of α, β as RREF([A b]) has no row of
the form [0 0 0 1].

Proposition 2.2.21. Let A ∈ Mn (C). Then, A is invertible if and only if RREF(A) = In , i.e.,
every invertible matrix is a product of elementary matrices.

Proof. If RREF(A) = In then In = E1 · · · Ek A, for some elementary matrices E1 , . . . , Ek . As


Ei ’s are invertible, E1−1 = E2 · · · Ek A, E2−1 E1−1 = E3 · · · Ek A and so on. Finally, one obtains
A = Ek−1 · · · E1−1 . A similar calculation now gives AE1 · · · Ek = In . Hence, by definition of
invertibility A−1 = E1 · · · Ek .
Now, let A be invertible with B = RREF(A) = E1 · · · Ek A, for some elementary matrices
E1 , . . . , Ek . As A and Ei ’s are invertible, the matrix B is invertible. Hence, B doesn’t have any
zero row. Thus, all the n rows of B have pivots. Therefore, B has n pivotal columns. As B
has exactly n columns, each column is a pivotal column and hence B = In . Thus, the required
result follows.
As a direct application of Proposition 2.2.21 and Remark 2.2.19.3 one obtains the following.
42 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

Theorem 2.2.22. Let A ∈ Mm,n (C). Then, for any invertible matrix S, RREF(SA) =
RREF(A).

" #A ∈ Mn (C) be an invertible matrix. Then, for any matrix


Proposition 2.2.23. Let " # B, define
h i A h i In
C = A B and D = . Then, RREF(C) = In A−1 B and RREF(D) = .
B 0
Proof. Using matrix product,
h i h i
A−1 C = A−1 A A−1 B = In A−1 B .
h i h i
As In A−1 B is in RREF, by Remark 2.2.19.1, RREF(C) = In A−1 B .
" #
A−1 0
For the second part, note that the matrix X = is an invertible matrix. Thus,
−BA−1 In
" #
In
by Proposition 2.2.21, X is a product of elementary matrices. Now, verify that XD = .
0
" #
In
As is in RREF, a repeated application of Remark 2.2.19.1 gives the required result.
0
As an application of Proposition 2.2.23, we have the following observation.
Let A ∈ Mn (C). Suppose we start with C = [A In ] and compute RREF(C). If RREF(C) =
[G H] then, either G = In or G 6= In . Thus, if G = In then we must have H = A−1 . If G 6= In
T

then, A is not invertible. We explain this with an example.


AF

 
0 0 1
DR

 
Example 2.2.24. Use GJE to find the inverse of A =  0 1 1.

1 1 1
 
0 0 1 1 0 0
 
Solution: Applying GJE to [A | I3 ] =   0 1 1 0 1 0  gives

1 1 1 0 0 1
   
1 1 1 0 0 1 1 1 0 −1 0 1
E13   E13 (−1),E23 (−2)  
[A | I3 ] → 0 1 1 0 1
 0
 → 0 1 0 −1 1 0
 
0 0 1 1 0 0 0 0 1 1 0 0
 
1 0 0 0 −1 1
E12 (−1)  
→ 0 1 0 −1 1

.
0
0 0 1 1 0 0
 
0 −1 1
Thus, A−1 = 
 
−1 1 .
0
1 0 0

Exercise
 2.2.25.
 Find
 the inverse
 of the following
 matrices
 using
 GJE.
1 2 3 1 3 3 2 1 1 0 0 2
       
(i) 
1 3 2 (ii) 2 3 2 (iii) 1
   2 1  (iv) 0 2 1.
 
2 4 7 2 4 7 1 1 2 2 1 1
2.3. RANK OF A MATRIX 43

2.3 Rank of a Matrix


Definition 2.3.1. Let A ∈ Mm,n (C). Then, the rank of A, denoted Rank(A), is the number
of pivots in the RREF(A). For example, Rank(In ) = n and Rank(0) = 0.

Remark 2.3.2. Before proceeding further, for A ∈ Mm,n (C), we observe the following.
1. The number of pivots in the RREF(A) is same as the number of pivots in REF of A.
Hence, we need not compute the RREF(A) to determine the rank of A.
2. Since, the number of pivots cannot be more than the number of rows or the number of
columns, one has Rank(A) ≤ min{m, n}.
" # " #
A 0 RREF(A) 0
3. If B = then Rank(B) = Rank(A) as RREF(B) = .
0 0 0 0
" #
A11 A12
4. If A = then, by definition
A21 A22
h i h i
Rank(A) ≤ Rank A11 A12 + Rank A21 A22 .

Further, using Remark 2.2.19,


h i
(a) Rank(A) ≥ Rank A11 A12 .
T

h i
(b) Rank(A) ≥ Rank A21 A22 .
AF

" #!
A11
DR

(c) Rank(A) ≥ Rank .


A21

We now illustrate the calculation of the rank by giving a few examples.

Example 2.3.3. Determine the rank of the following matrices.


" # " #
1 1 −1 1
1. Let A = and B = . Then, Rank(A) = Rank(B) = 1. Also, verify that
2 2 1 −1
" #
1 1
AB = 0 and BA = . So, Rank(AB) = 0 6= 1 = Rank(BA).
−1 −1

2. Let A = diag(d1 , . . . , dn ). Then, Rank(A) equals the number of nonzero di ’s.


 
1 2 1 1 1
 
3. Let A = 
2 3 1 2 2 . Then, Rank(A) = 2 as it’s REF has two pivots.

1 1 0 1 1

We now show that the rank doesn’t change if a matrix is multiplied on the left by an
invertible matrix.

Lemma 2.3.4. Let A ∈ Mm,n (C). If S is an invertible matrix then Rank(SA) = Rank(A).

Proof. By Theorem 2.2.22, RREF(A) = RREF(SA). Hence, Rank(SA) = Rank(A).


We now have the following result.
44 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

Corollary 2.3.5. Let A ∈ Mm,n (C) and B ∈ Mn,q (C). Then, Rank(AB) ≤ Rank(A).
In particular, if B ∈ Mn (C) is invertible then Rank(AB) = Rank(A).

Proof. Let Rank(A) = r." Then,


# there exists an
" invertible
# # P and A1 ∈ Mr,n (C) such
" matrix
A1 A1 A1 B
that P A = RREF(A) = . Then, P AB = B= . So, using Lemma 2.3.4 and
0 0 0
Remark 2.3.2.2, we get
" #!
A1 B
Rank(AB) = Rank(P AB) = Rank = Rank(A1 B) ≤ r = Rank(A). (2.3.4)
0

In particular, if B is invertible then, using Equation (2.3.4), we get

Rank(A) = Rank(ABB −1 ) ≤ Rank(AB)

and hence the required result follows.

Theorem 2.3.6. Let A ∈ Mm,n (C). If Rank(A) = r then, there exist invertible matrices P and
Q such that " #
Ir 0
P AQ= .
0 0

Proof. Let C = RREF(A). Then, by Remark 2.2.19.4 there exists as invertible matrix P such
T

that C = P A. Note that C has r pivots and they appear in columns, say i1 < i2 < · · · < ir .
AF

Now, let D = CE1i1 E2i2 · · · Erir . As Ejij ’s are elementary matrices that interchange the
DR

" #
Ir B
columns of C, one has D = , where B ∈ Mr,n−r (C).
0 0
" #
Ir −B
Put Q1 = E1i1 E2i2 · · · Erir . Then, Q1 is invertible. Let Q2 = . Then, verify that
0 In−r
Q2 is invertible and
" #" # " #
Ir B Ir −B Ir 0
CQ1 Q2 = DQ2 = = .
0 0 0 In−r 0 0
" #
Ir 0
Thus, if we put Q = Q1 Q2 then Q is invertible and P AQ = CQ = CQ1 Q2 = and
0 0
hence, the required result follows.
We now prove the following result.

Proposition 2.3.7. Let A ∈ Mn (C) be an invertible matrix and let S be any subset of {1, 2, . . . , n}.
Then Rank(A[S, :]) = |S| and Rank(A[:, S]) = |S|.

Proof. Without loss of generality, let S = {1, 2, . . . , r} and S c = {r + 1, . . . , n}. Let us


write A1 = A[S, :] and A2 = A[S c , :]. Since A is invertible, RREF(A) = In . Hence, by
Remark 2.2.19.4, there exists an invertible matrix P such that P A = In . Thus,
" #
h i h i Ir 0
P A1 P A2 = P A1 A2 = P A = In = .
0 In−r
2.3. RANK OF A MATRIX 45
" # " #
Ir 0
Thus, P A1 = and P A2 = . So, using Corollary 2.3.5, Rank(A1 ) = r.
0 In−r
For the second part, let B1 = A[:, S], B2 = A[:, S c ] and let Rank(B1 ) = t < s. Then, by
Remark 2.2.19.4, there exists an invertible matrix Q and a matrix C in RREF which has exactly
t pivots such that " #
C
QB1 = RREF(B1 ) = . (2.3.5)
0
As
" t <# s, QB
" 1#has at least one zero
" row. # As P A = In by Proposition 2.2.21 AP = In . Hence,
B1 P B1 Is 0
= P = AP = In = . Thus,
B2 P B2 0 In−s
h i h i
B1 P = Is 0 and B2 P = 0 In−s . (2.3.6)

Hence, using Equations (2.3.5) and (2.3.6), we see that


" # " #
CP C h i h i
= P = QB1 P = Q Is 0 = Q 0 .
0 0

Thus, Q has a zero row, a contradiction to Q being invertible. Hence, Rank(B1 ) = s.


As a direct corollary of Theorem 2.3.6 and Proposition 2.3.7, we have the following result
which improves Corollary 2.3.5.
T
AF

Corollary 2.3.8. Let A ∈ Mm,n (C). hIf Rank(A)


i = r < n then, there exists an invertible matrix
Q and B ∈ Mm,r (C) such that AQ = B 0 , where Rank(B) = r.
DR

" #
Ir 0
Proof. By Theorem 2.3.6, there exist invertible matrices P and Q such that P AQ = .
0 0
h i
If P −1 = B C , where B ∈ Mm,r (C) and C ∈ Mm,m−r (C) then,
" # " #
Ir 0 h i I 0
r
h i
AQ = P −1 = B C = B 0 .
0 0 0 0
h i
Now, by Proposition 2.3.7, Rank(B) = r = Rank(A) as the matrix P −1 = B C is an invertible
matrix. Thus, the required result follows.
As an application of Corollary 2.3.8, we have the following result.

Corollary 2.3.9. Let A ∈ Mm,n (C) and B ∈ Mn,p (C). Then, Rank(AB) ≤ Rank(B).

Proof. Let Rank(B) = r. Then, by Corollary


h i 2.3.8, there exists an invertible matrixh Q and
i a
matrix C ∈ Mn,r (C) such that BQ = C 0 and Rank(C) = r. Hence, ABQ = A C 0 =
h i
AC 0 . Thus, using Corollary 2.3.5 and Remark 2.3.2.2, we get
h i
Rank(AB) = Rank(ABQ) = Rank AC 0 = Rank(AC) ≤ r = Rank(B).

We end this section by relating the rank of the sum of two matrices with sum of their ranks.
46 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

Proposition 2.3.10. Let A, B ∈ Mm,n (C). Then, prove that Rank(A + B) ≤ Rank(A) +
k
xi yi∗ , for some xi , yi ∈ C, for 1 ≤ i ≤ k, then Rank(A) ≤ k.
P
Rank(B). In particular, if A =
i=1

Proof. Let Rank(A) = r. Then, # exists an invertible matrix P and a matrix A1 ∈ Mr,n (C)
" there
A1
such that P A = RREF(A) = . Then,
0
" # " # " #
A1 B1 A1 + B 1
P (A + B) = P A + P B = + = .
0 B2 B2

Now using Corollary 2.3.5, Remark 2.3.2.4 and the condition Rank(A) = Rank(A1 ) = r, the
number of rows of A1 , we have

Rank(A + B) = Rank(P (A + B)) ≤ r + Rank(B2 ) ≤ r + Rank(B) = Rank(A) + Rank(B).

Thus, the required result follows. The other part follows, as Rank(xi yi∗ ) = 1, for 1 ≤ i ≤ k.
" # " #
2 4 8 1 0 0
Exercise 2.3.11. 1. Let A = and B = . Find P and Q such that
1 3 2 0 1 0
B = P AQ.

2. Let A ∈ Mm,n (C). If Rank(A) = r then, prove that A = BC, where B ∈ Mm,r (C) and
C ∈ Mr,n (C) and Rank(B) = Rank(C) = r. Now, use matrix product to give the existence
T

r
AF

of xi ∈ Cm and yi ∈ Cn such that A = xi yi∗ .


P
i=1
DR

3. If Rank(A)
" = r# then prove" that #there exist invertible
" matrices
# Bi , Ci such that
R1 R2 S1 0 A1 0
B1 A = , AC1 = and B2 AC2 = , where the (1, 1) block
0 0 S3 0 0 0
of each matrix has size r × r. Also, prove that A1 is an invertible matrix.

4. Prove that if Rank(A) = Rank(AB) then A = ABX, for some matrix X. Similarly, if
Rank(A) = Rank(BA) then A =" Y BA,# for some matrix Y . [Hint: Choose " invertible
#
A1 0 −1 A 2 A 3
matrices P, Q satisfying P AQ = , P (AB) = (P AQ)(Q B) = . Now,
0 0 0 0
" #
C 0
find an invertible matrix R such that P (AB)R = . Use the above result to show
0 0
" #
C −1 A1 0
that C is invertible. Then X = R Q−1 gives the required result.]
0 0
" # " # " #
A1 0 A11 0 B11 A11 + B12 A21 0
Ans: P AQ = ⇒ AQ = ⇒ BAQ = . Thus,
0 0 A21 0 B21 A11 + B22 A21 0
" #
C 0
there exists an invertible matrix P1 such that P1 BAQ = for some invertible matrix
0 0
" #
A C −1 0
1
C. Define Y = P −1 P1 and compute Y BA.
0 0

5. Let M and N be invertible matrices. Then prove that Rank(M AN ) = Rank(A).


2.4. SOLUTION SET OF A LINEAR SYSTEM 47

6. Let A be an m × n matrix with Rank(A) = m. Then prove the following:


(a) There
h exists
i an invertible matrix P and a permutation matrix Q such that P AQ =
Im 0 .
(b) As Q is a permutation matrix Q is an orthogonal matrix, i.e., QQT = I = QT Q.
(c) P (AAT )P T = (P AQ)(QT AT P T ) = (P AQ)(P AQ)T = Im . Hence Rank(AAT ) = m.

2.4 Solution set of a Linear System


Definition 2.4.1. Consider the linear system Ax = b. If RREF([A b]) = [C d]. Then,
the variables corresponding to the pivotal columns of C are called the basic variables and the
variables that are not basic are called free variables.

Example 2.4.2. 1. If the system Ax = b in n variables is consistent and RREF(A) has r


nonzero rows then, Ax = b has r basic variables and n − r free variables.
 
1 0 0 1
 
2. Let RREF([A b]) =  0 1 1 2. Hence, x and y are basic variables and z is the free

0 0 0 0
variable. Thus, the solution set of Ax = b is given by

[x, y, z]T | [x, y, z] = [1, 2 − z, z] = [1, 2, 0] + z[0, −1, 1], with z arbitrary.

T
AF

 
1 0 0 0
DR

 
3. Let RREF([A b]) =  0 1 1 0. Then, the system Ax = b has no solution as

0 0 0 1
(RREF([A b]))[3, :] = [0 0 0 1].

We now prove the main result in the theory of linear systems. Before doing so, we look at
the following example.

Example 2.4.3. Consider a linear system Ax = b. Suppose RREF([A b]) = [C d], where
 
1 0 2 −1 0 0 2 8
 
0 1 1 3 0 0 5 1
 
0 0 0 0 1 0 −1 2
[C d] =  .
 
0 0 0 0 0 1 1 4
 
0 0 0 0 0 0 0 0
 

0 0 0 0 0 0 0 0

Then to get the solution set, we observe the following.

1. C has 4 pivotal columns, namely, the columns 1, 2, 5 and 6. Thus, x1 , x2 , x5 and x6 are
basic variables.

2. Hence, the remaining variables x3 , x4 and x7 are free variables.


48 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

Therefore, the solution set is given by


           
x1 8 − 2x3 + x4 − 2x7 8 −2 1 −2
           
x2  1 − x3 − 3x4 − 5x7  1 −1 −3 −5
           
x x 0 1 0 0
           
 3  3       
           
x4  =  x4  = 0 + x3  0  + x4  1  + x7  0  ,
           
           
x5   2 + x7  2 0 0 1
           
x  
 6  4 − x 7
 
  4   0
 
  0
 
 −1
 
x7 x7 0 0 0 1

where x3 , x4 and


 x7 are 
arbitrary.
    
8 −2 1 −2
       
1 −1 −3 −5
       
0 1 0 0
       
       
Let x0 = 0 , u1 =  0  , u2 =  1  and u3 = 
     
 0 . In this example, verify that

       
2 0 0 1
       
4 0 0 −1
       
0 0 0 1
Cx0 = d, and for 1 ≤ i ≤ 3, Cui = 0. Hence, it follows that Ax0 = d, and for 1 ≤ i ≤ 3,
Aui = 0.
T

Theorem 2.4.4. Let Ax = b be a linear system in n variables with RREF([A b]) = [C d]


AF

with Rank(A) = r and Rank([A b]) = ra .


DR

1. Then, the system Ax = b is inconsistent if r < ra

2. Then, the system Ax = b is consistent if r = ra .

(a) Further, Ax = b has a unique solution if r = n.


(b) Further, Ax = b has infinite number of solutions if r < n. In this case, there
exist vectors x0 , u1 , . . . , un−r ∈ Rn with Ax0 = b and Aui = 0, for 1 ≤ i ≤ n − r.
Furthermore, the solution set is given by

{x0 + k1 u1 + k2 u2 + · · · + kn−r un−r | ki ∈ C, 1 ≤ i ≤ n − r}.

Proof. Part 1: As r < ra , by Remark 2.2.19.5 ([C d])[r + 1, :] = [0T 1]. Note that this row
corresponds to the linear equation

0 · x1 + 0 · x2 + · · · + 0 · xn = 1

which clearly has no solution. Thus, by definition and Theorem 2.1.17, Ax = b is inconsistent.
Part 2: As r = ra , by Remark 2.2.19.5, [C d] doesn’t have a row of the form [0T 1].
Further, the number of pivots in [C d] and that in C is same, namely, r pivots. Suppose the
pivots appear in columns i1 , . . . , ir with 1 ≤ i1 < · · · < ir ≤ n. Thus, the variables xij , for
1 ≤ j ≤ r, are basic variables and the remaining n − r variables, say xt1 , . . . , xtn−r , are free
2.4. SOLUTION SET OF A LINEAR SYSTEM 49

variables with t1 < · · · < tn−r . Since C is in RREF, in terms of the free variables and basic
variables, the `-th row of [C d], for 1 ≤ ` ≤ r, corresponds to the equation
n−r
X n−r
X
x i` + c`tk xtk = d` ⇔ xi` = d` − c`tk xtk .
k=1 k=1

Thus, the system Cx = d is consistent. Hence, by Theorem 2.1.17 the system Ax = b is


consistent and the solution set of the system Ax = b and Cx = d are the same. Therefore, the
solution set of the system Cx = d (or equivalently Ax = b) is given by
 n−r
P 
  d1 − c1tk xtk        
x i1  k=1
 d 1 c1t 1 c1t 2 c1tn−r
 .   ..   .. 
    .   .   . 
 .   . . .
 .   .  .  .   .   . 
     
   n−r
        
 x ir   P d
  
r
crt  crt  crt 
   dr − crtk xtk 
    1   2   n−r 
 
 xt1  =  k=1
= 0  + xt1  1  + xt2  0  + · · · + xtn−r  0 . (2.4.7)
          

 x  
  xt1 
         
t
 2    0  0   1   0 
 .   xt2         
 .     ..   . 
.
 . 
.
 . 
.
 .    .  .   .   . 
       
..
.

xtn−r 0 0 0 1
 
xtn−r
Part 2a: As r = n, there are no free variables. Hence, xi = di , for 1 ≤ i ≤ n, is the unique
solution.      
d1 c1t1 c1tn−r
T

.  .   . 
.  .   . 
AF

.  .   . 
     
dr  crt  crt 
DR

   1  n−r 
Part 2b: Define x0 =  0  and u1 =  1 , . . . , un−r =  0 . Then, it can be easily
     
     
0  0   0 
     
.  .   . 
.  .   . 
.  .   . 
0 0 1
verified that Ax0 = b and, for 1 ≤ i ≤ n−r, Aui = 0. Also, by Equation (2.4.7) the solution set
has indeed the required form, where ki corresponds to the free variable xti . As there is at least
one free variable the system has infinite number of solutions. Thus, the proof of the theorem is
complete.

Exercise 2.4.5. Consider the linear system given below. Use GJE to find the RREF of it’s
augmented matrix. Now, use the technique used in the previous theorem to find the solution of
the linear system
x +y −2u +v = 2
z +u +2v = 3
v +w = 3
v +2w = 5
Let A ∈ Mm,n (C). Then, Rank(A) ≤ m. Thus, using Theorem 2.4.4 the next result follows.

Corollary 2.4.6. Let A ∈ Mm,n (C). If Rank(A) = r < min{m, n} then Ax = 0 has infinitely
many solutions. In particular, if m < n, then Ax = 0 has infinitely many solutions. Hence, in
either case, the homogeneous system Ax = 0 has at least one non-trivial solution.
50 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

Remark 2.4.7. Let A ∈ Mm,n (C). Then, Theorem 2.4.4 implies that Ax = b is consistent
if and only if Rank(A) = Rank([A b]). Further, the vectors associated to the free variables in
Equation (2.4.7) are solutions to the associated homogeneous system Ax = 0.

We end this subsection with some applications.

Example 2.4.8. 1. Determine the equation of the line/circle that passes through the points
(−1, 4), (0, 1) and (1, 4).
Solution: The general equation of a line/circle in Euclidean plane is given by a(x2 +
y 2 ) + bx + cy + d = 0, where a, b, c and d are variables. Since this curve passes through
the given points,we get a homogeneous
 system in 3 equations and 4 variables, namely
  a
(−1)2 + 42 −1 4 1  
  b  3 16
 (0)2 + 12 0 1 1  c  = 0. Solving this system, we get [a, b, c, d] = [ 13 d, 0, − 13 d, d].
 

2
1 +4 2 1 4 1
 
d
Hence, choosing d = 13, the required circle is given by 3(x2 + y 2 ) − 16y + 13 = 0.

2. Determine the equation of the plane that contains the points (1, 1, 1), (1, 3, 2) and (2, −1, 2).
Solution: The general equation of a plane in space is given by ax + by + cz + d = 0,
where a, b, c and d are variables. Since this plane passes through the 3 given points, we
get a homogeneous system in 3 equations and 4 variables. So, it has a non-trivial solution,
T

namely [a, b, c, d] = [− 43 d, − d3 , − 32 d, d]. Hence, choosing d = 3, the required plane is given


AF

by −4x − y + 2z + 3 = 0.
DR

 
2 3 4
 
3. Let A = 
0 −1 0 . Then, find a non-trivial solution of Ax = 2x. Does there exist a

0 −3 4
nonzero vector y ∈ R3 such that Ay = 4y?
Solution: Solving for Ax = 2x is equivalentto solving (A − 2I)x = 0. The augmented
0 3 4 0
T
 
matrix of this system equals 0 −3 0 0

. Verify that x = [1, 0, 0] is a nonzero
0 4 2 0
solution.
  other part, the augmented matrix for solving (A − 4I)y = 0 equals
For the
−2 3 4 0
 0 −5 0 0. Thus, verify that yT = [2, 0, 1] is a nonzero solution.
 
 
0 −3 0 0

Exercise 2.4.9. 1. Let A ∈ Mn (C). If A2 x = 0 has a non trivial solution then show that
Ax = 0 also has a non trivial solution.

2. Prove that 5 distinct points are needed to specify a general conic, namely, ax2 + by 2 +
cxy + dx + ey + f = 0, in the Euclidean plane.

3. Let u = (1, 1, −2)T and v = (−1, 2, 3)T . Find condition on x, y and z such that the system
cu + dv = (x, y, z)T in the variables c and d is consistent.
2.5. SQUARE MATRICES AND LINEAR SYSTEMS 51

4. For what values of c and k, the following systems have i) no solution, ii) a unique
solution and iii) infinite number of solutions.

(a) x + y + z = 3, x + 2y + cz = 4, 2x + 3y + 2cz = k.
(b) x + y + z = 3, x + y + 2cz = 7, x + 2y + 3cz = k.
(c) x + y + 2z = 3, x + 2y + cz = 5, x + 2y + 4z = k.

5. Find the condition(s) on x, y, z so that the systems given below (in the variables a, b and
c) is consistent?

(a) a + 2b − 3c = x, 2a + 6b − 11c = y, a − 2b + 7c = z.
(b) a + b + 5c = x, a + 3c = y, 2a − b + 4c = z.

6. Determine the equation of the curve y = ax2 + bx + c that passes through the points
(−1, 4), (0, 1) and (1, 4).

7. Solve the linear systems


x + y + z + w = 0, x − y + z + w = 0 and −x + y + 3z + 3w = 0, and
x + y + z = 3, x + y − z = 1, x + y + 4z = 6 and x + y − 4z = −1.

8. For what values of a, does the following systems have i) no solution, ii) a unique solution
T

and iii) infinite number of solutions.


AF

(a) x + 2y + 3z = 4, 2x + 5y + 5z = 6, 2x + (a2 − 6)z = a + 20.


DR

(b) x + y + z = 3, 2x + 5y + 4z = a, 3x + (a2 − 8)z = 12.

9. Consider the linear system Ax = b in m equations and 3 variables. Then, for each of the
given solution set, determine the possible choices of m? Further, for each choice of m,
determine a choice of A and b.
(a) (1, 1, 1)T is the only solution.
(b) {(1, 1, 1)T + c(1, 2, 1)T |c ∈ R} as the solution set.
(c) {c(1, 2, 1)T |c ∈ R} as the solution set.
(d) {(1, 1, 1)T + c(1, 2, 1)T + d(2, 2, −1)T |c, d ∈ R} as the solution set.
(e) {c(1, 2, 1)T + d(2, 2, −1)T |c, d ∈ R} as the solution set.

2.5 Square Matrices and Linear Systems


In this section the coefficient matrix of the linear system Ax = b will be a square matrix. We
start with proving a few equivalent conditions that relate different ideas.

Theorem 2.5.1. Let A ∈ Mn (C). Then, the following statements are equivalent.

1. A is invertible.

2. RREF(A) = In .
52 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

3. A is a product of elementary matrices.

4. The homogeneous system Ax = 0 has only the trivial solution.

5. Rank(A) = n.

Proof. 1 ⇔ 2 Already done in Proposition 2.2.21.


2⇔3 Again, done in Proposition 2.2.21.
3 =⇒ 4 Let A = E1 · · · Ek , for some elementary matrices E1 , . . . , Ek . Then, by previous
equivalence A is invertible. So, A−1 exists and A−1 A = In . Hence, if x0 is any solution of the
homogeneous system Ax = 0 then,

x0 = In · x0 = (A−1 A)x0 = A−1 (Ax0 ) = A−1 0 = 0.

Thus, 0 is the only solution of the homogeneous system Ax = 0.


4 =⇒ 5 Let if possible Rank(A) = r < n. Then, by Corollary 2.4.6, the homogeneous
system Ax = 0 has infinitely many solution. A contradiction. Thus, A has full rank.
5 =⇒ 2 Suppose Rank(A) = n. So, RREF(A) has n pivotal columns. But, RREF(A)
has exactly n columns and hence each column is a pivotal column. Thus, RREF(A) = In .
We end this section by giving two more equivalent conditions for a matrix to be invertible.

Theorem 2.5.2. The following statements are equivalent for A ∈ Mn (C).


T
AF

1. A is invertible.
DR

2. The system Ax = b has a unique solution for every b.

3. The system Ax = b is consistent for every b.

Proof. 1 =⇒ 2 Note that x0 = A−1 b is the unique solution of Ax = b.


2 =⇒ 3 The system is consistent as Ax = b has a solution.
3 =⇒ 1 For 1 ≤ i ≤ n, define eTi = In [i, :]. By assumption, the linear system Ax = ei
has a solution, say xi , for 1 ≤ i ≤ n. Define a matrix B = [x1 , . . . , xn ]. Then,

AB = A[x1 , x2 . . . , xn ] = [Ax1 , Ax2 . . . , Axn ] = [e1 , e2 . . . , en ] = In .

Therefore, n = Rank(In ) = Rank(AB) ≤ Rank(A) and hence Rank(A) = n. Thus, by Theo-


rem 2.5.1, A is invertible.
We now give an immediate application of Theorem 2.5.2 and Theorem 2.5.1 without proof.

Theorem 2.5.3. The following two statements cannot hold together for A ∈ Mn (C).

1. The system Ax = b has a solution for every b.

2. The system Ax = 0 has a non-trivial solution.

As an immediate consequence of Theorem 2.5.1, the readers should prove that one needs to
compute either the left or the right inverse to prove invertibility of A ∈ Mn (C).
2.5. SQUARE MATRICES AND LINEAR SYSTEMS 53

Corollary 2.5.4. Let A ∈ Mn (C). Then the following holds.

1. If there exists C such that CA = In then A−1 exists.

2. If there exists B such that AB = In then A−1 exists.

Exercise 2.5.5. 1. Let A be a square matrix. Then, prove that A is invertible ⇔ AT is


invertible ⇔ AT A is invertible ⇔ AAT is invertible.

2. [Theorem of the Alternative] The following two statements cannot hold together for
A ∈ Mn (C) and b ∈ Rn .

(a) The system Ax = b has a solution.


(b) The system yT A = 0T , yT b 6= 0 has a solution.

3. Let A and B be two matrices having positive entries and of orders 1 × n and n × 1,
respectively. Which of BA or AB is invertible? Give reasons.

4. Let A ∈ Mn,m (C) and B ∈ Mm,n (C).

(a) Then, prove that I − BA is invertible if and only if I − AB is invertible [use Theo-
rem 2.5.1.4].
(b) If I − AB is invertible then, prove that (I − BA)−1 = I + B(I − AB)−1 A.
T
AF

(c) If I − AB is invertible then, prove that (I − BA)−1 B = B(I − AB)−1 .


(d) If A, B and A + B are invertible then, prove that (A−1 + B −1 )−1 = A(A + B)−1 B.
DR

5. Let bT = [1, 2, −1, −2]. Suppose A is a 4 × 4 matrix such that the linear system Ax = b
has no solution. Mark each of the statements given below as true or false?

(a) The homogeneous system Ax = 0 has only the trivial solution.


(b) The matrix A is invertible.
(c) Let cT = [−1, −2, 1, 2]. Then, the system Ax = c has no solution.
(d) Let B = RREF(A). Then,
i. B[4, :] = [0, 0, 0, 0].
ii. B[4, :] = [0, 0, 0, 1].
iii. B[3, :] = [0, 0, 0, 0].
iv. B[3, :] = [0, 0, 0, 1].
v. B[3, :] = [0, 0, 1, α], where α is any real number.

2.5.1 Determinant
 
1 2 3 " #
  1 2
1 3 2 then A(1 | 2) = 2 7
Recall the notations used in Section 1.3.1 on Page 19 . If A =  
2 4 7
and A({1, 2} | {1, 3}) = [4]. We are ready to give an inductive definition of the determinant of
54 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

a square matrix. The advanced students can find an alternate definition of the determinant in
Appendix 7.2.22, where it is proved that the definition given below corresponds to the expansion
of determinant along the first row.

Definition 2.5.6. Let A be a square matrix of order n. Then, the determinant of A, denoted
det(A) (or | A | ) is defined by

 a,
 if A = [a] (corresponds to n = 1),
det(A) = n
(−1)1+j a1j det A(1 | j) ,
P 

 otherwise.
j=1

Example 2.5.7. 1. Let A = [−2]. Then, det(A) = | A | = −2.


" #
a b
2. Let A = . Then, det(A) = | A | = a det(A(1 | 1)) − b det(A(1 | 2)) = ad − bc.
c d
" #
1 2 1 2
For example, if A = then det(A) = = 1 · 5 − 2 · 3 = −1.

3 5 3 5

3. Let A = [aij ] be a 3 × 3 matrix. Then,

det(A) = | A | = a11 det(A(1 | 1)) − a12 det(A(1 | 2)) + a13 det(A(1 | 3))

a
22 a23
a
21 a23
a
21 a22

= a11 − a12 + a 13
T

a32 a33 a31 a33 a31 a32


AF

= a11 (a22 a33 − a23 a32 ) − a12 (a21 a33 − a31 a23 ) + a13 (a21 a32 − a31 a22 ).
DR

 
1 2 3
3 1

2 1

2 3

 
2 3 1, | A | = 1 · 2 2 − 2 · 1 2 + 3 · 1 2 = 4 − 2(3) + 3(1) = 1.
For A =  

1 2 2

Exercise
 2.5.8.
 Find the determinant
 of the following matrices.
1 2 7 8 3 0 0 1  
    1 a a2
0 4 3 2 0 2 0 5  
i) 

 ii) 6 −7 1 0 iii) 1
    b b2 
.
0 0 2 3
1 c c2
 
0 0 0 5 3 2 0 6

Definition 2.5.9. A matrix A is said to be a singular if det(A) = 0 and is called non-


singular if det(A) 6= 0.

The next result relates the determinant with row operations. For proof, see Appendix 7.3.

Theorem 2.5.10. Let A be an n × n matrix.

1. If B = Eij A, for 1 ≤ i 6= j ≤ n, then det(B) = − det(A).

2. If B = Ei (c)A, for c 6= 0, 1 ≤ i ≤ n, then det(B) = c det(A).

3. If B = Eij (c)A, for c 6= 0 and 1 ≤ i 6= j ≤ n, then det(B) = det(A).

4. If A[i, :]T = 0, for 1 ≤ i, j ≤ n then det(A) = 0.


2.5. SQUARE MATRICES AND LINEAR SYSTEMS 55

5. If A[i, :] = A[j, :] for 1 ≤ i 6= j ≤ n then det(A) = 0.

6. If A is a triangular matrix with d1 , . . . , dn on the diagonal then det(A) = d1 · · · dn .

As det(In ) = 1, we have the following result.

Corollary 2.5.11. Fix a positive integer n.

1. Then det(Eij ) = −1.

2. If c 6= 0 then det(Ek (c)) = c.

3. If c 6= 0 then det(Eij (c)) = 1.


 
2 2 6 1 1 3 1 1 3


  E1 ( 21 ) E21 (−1)E31 (−1)
1 3 2. Then A → 1 3 2 →
Example 2.5.12. Let A =   0 2 −1 .

0 0 −1

1 1 2 1 1 2
Thus, using Theorem 2.5.10, det(A) = 2 · (1 · 2 · (−1)) = −4, where the first 2 appears from the
1
elementary matrix E1 ( ).
2
Exercise 2.5.13. Prove the following without computing the determinant (use Theorem 2.5.10).
h i
1. Let A = u v 2u + 3v , where u, v ∈ C3 . Then, det(A) = 0.
    
a b c a b c a e αa + βe + h
T

T
     
2. Let A =  e f g , B =  e
  f g and C =  b f αb + βf + j  for some
 
AF

h j ` αh αj α` c g αc + βg + `
DR

complex numbers α and β. Then, det(B) = α det(A) and det(C) = det(A).

By Theorem 2.5.10.6 det(In ) = 1. The next result about the determinant of elementary
matrices is an immediate consequence of Theorem 2.5.10 and hence the proof is omitted.

Remark 2.5.14. Theorem 2.5.10.1 implies that the determinant can be calculated by expanding
along any row. Hence, the readers are advised to verify that
n
X
det(A) = (−1)k+j akj det(A(k | j)), for 1 ≤ k ≤ n.
j=1

Example
2.5.15.
Using Remark 2.5.14, one has
2 2 6 1
2 2 1 2 2 6


0 0 2 1 2+3

0 1 2 0 = (−1)
· 2 · 0 1 0 + (−1)2+4 · 0 1 2 = −2 · 1 + (−8) = −10.

1 2 1 1 2 1
1 2 1 1

2.5.2 Adjugate (classical Adjoint) of a Matrix

Definition 2.5.16. Let A ∈ Mn (C). Then, the cofactor matrix, denoted Cof(A), is an Mn (C)
matrix with Cof(A) = [Cij ], where

Cij = (−1)i+j det (A(i | j)) , for 1 ≤ i, j ≤ n.

And, the Adjugate (classical Adjoint) of A, denoted Adj(A), equals CofT (A).
56 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS
 
1 2 3
 
Example 2.5.17. Let A = 
2 3 1 .

1 2 4
1. Then,
 
C C21 C31
T
 11 
Adj(A) = Cof (A) = C12 C22
 C32 

C13 C23 C33
 
(−1)1+1 det(A(1|1)) (−1)2+1 det(A(2|1)) (−1)3+1 det(A(3|1))
 
=  1+2 det(A(1|2)) (−1)2+2 det(A(2|2)) (−1)3+2 det(A(3|2))
(−1) 
(−1)1+3 det(A(1|3)) (−1)2+3 det(A(2|3)) (−1)3+3 det(A(3|3))
 
10 −2 −7
 
−7 1
=  5 .
1 0 −1
   
−1 0 0 det(A) 0 0
   
 0 −1 0  =  0
Now, verify that AAdj(A) =  det(A) 0  = Adj(A)A.
 
0 0 −1 0 0 det(A)
 
x − 1 −2 −3
T

 
2. Consider xI3 − A =  −2 x − 3 −1  . Then,
AF


−1 −2 x − 4
DR

   
C C21 C31 x2 − 7x + 10 2x − 2 3x − 7
 11   
Adj(xI − A) = C12 C22
 C32  =  2x − 7 x 2 − 5x + 1 x + 5 
  
C13 C23 C33 x+1 2x x2 − 4x − 1
 
−7 2 3
2 2
 
= x I + x 2
 −5 1   + Adj(A) = x I + Bx + C(say).
1 2 −4

Hence, we observe that Adj(xI − A) = x2 I + Bx + C is a polynomial in x with coefficients


as matrices. Also, note that (xI − A)Adj(xI − A) = (x3 − 8x2 + 10x − det(A))I3 . Thus,
we see that
(xI − A)(x2 I + Bx + C) = (x3 − 8x2 + 10x − det(A))I3 .

That is, we have obtained a matrix equality and hence, replacing x by A makes sense. But,
then the LHS is 0. So, for the RHS to be zero, we must have A3 −8A2 +10A−det(A)I = 0
(this equality is famously known as the Cayley-Hamilton Theorem).

The next result relates adjugate matrix with the inverse, in case det(A) 6= 0.

Theorem 2.5.18. Let A ∈ Mn (C).


n n
aij (−1)i+j det(A(i|j)) = det(A), for 1 ≤ i ≤ n.
P P
1. Then, aij Cij =
j=1 j=1
2.5. SQUARE MATRICES AND LINEAR SYSTEMS 57

n n
aij (−1)i+j det(A(`|j)) = 0, for i 6= `.
P P
2. Then, aij C`j =
j=1 j=1

3. Thus, A(Adj(A)) = det(A)In . Hence,


1
whenever det(A) 6= 0 one has A−1 = Adj(A). (2.5.1)
det(A)
Proof. Part 1: It follows directly from Remark 2.5.14 and the definition of the cofactor.
Part 2: Fix positive integers i, ` with 1 ≤ i 6= ` ≤ n and let B = [bij ] be a square matrix
with B[`, :] = A[i, :] and B[t, :] = A[t, :], for t 6= `. As ` 6= i, B[`, :] = B[i, :] and thus, by
Theorem 2.5.10.5, det(B) = 0. As A(` | j) = B(` | j), for 1 ≤ j ≤ n, using Remark 2.5.14
n
X n
 X
(−1)`+j b`j det B(` | j) = (−1)`+j aij det B(` | j)

0 = det(B) =
j=1 j=1
n
X n
X
(−1)`+j aij det A(` | j) =

= aij C`j . (2.5.2)
j=1 j=1

This completes the proof of Part 2.


Part 3: Using Equation (2.5.2) and Remark 2.5.14, observe that
n n
(
if i 6= j,
 
 X  X 0,
A Adj(A) = aik Adj(A) kj = aik Cjk =
ij k=1 k=1
det(A), if i = j.
 
1
Thus, A(Adj(A)) = det(A)In . Therefore, if det(A) 6= 0 then A det(A) Adj(A) = In . Hence,
T

by Proposition 2.2.21, A−1 = det(A)


1
Adj(A).
AF

   
1 −1 0 −1 1 −1
DR

   
Example 2.5.19. For A =  0 1 1  , Adj(A) =  1 1 −1  and det(A) = −2. Thus,
   
1 2 1 −1 −3 1
 
1/2 −1/2 1/2
by Theorem 2.5.18.3, A−1 = 
 
−1/2 −1/2 1/2 .

1/2 3/2 −1/2
Let A be a non-singular matrix. Then, by Theorem 2.5.18.3, A−1 = det(A)
1
Adj(A). Thus
 
A Adj(A) = Adj(A) A = det(A) In and this completes the proof of the next result
Corollary 2.5.20. Let A be a non-singular matrix. Then,
n
(
X det(A), if j = k,
Cik aij =
i=1 0, if j 6= k.
The next result gives another equivalent condition for a square matrix to be invertible.
Theorem 2.5.21. A square matrix A is non-singular if and only if A is invertible.
Proof. Let A be non-singular. Then, det(A) 6= 0 and hence A−1 = 1
det(A) Adj(A).
Now, let us assume that A is invertible. Then, using Theorem 2.5.1, A = E1 · · · Ek , a
product of elementary matrices. Also, by Corollary 2.5.11, det(Ei ) 6= 0, for 1 ≤ i ≤ k. Thus, a
repeated application of Parts 1, 2 and 3 of Theorem 2.5.10 gives det(A) 6= 0.
The next result relates the determinant of a matrix with the determinant of its transpose.
Thus, the determinant can be computed by expanding along any column as well.
58 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

Theorem 2.5.22. Let A be a square matrix. Then, det(A) = det(AT ).

Proof. If A is a non-singular, Corollary 2.5.20 gives det(A) = det(AT ).


If A is singular then, by Theorem 2.5.21, A is not invertible. So, AT is also not invertible
and hence by Theorem 2.5.21, det(AT ) = 0 = det(A).
The next result relates the determinant of product of two matrices with their determinants.

Theorem 2.5.23. Let A and B be square matrices of order n. Then,

det(AB) = det(A) · det(B) = det(BA).

Proof. Case 1: Let A be non-singular. Then, by Theorem 2.5.18.3, A is invertible and by


Theorem 2.5.1, A = E1 · · · Ek , a product of elementary matrices. Thus, a repeated application
of Parts 1, 2 and 3 of Theorem 2.5.10 gives the desired result as

det(AB) = det(E1 · · · Ek B) = det(E1 ) det(E2 · · · Ek B) = det(E1 ) det(E2 ) det(E3 · · · Ek B)


= · · · = det(E1 ) · · · det(Ek ) det(B) = · · · = det(E1 E2 · · · Ek ) det(B)
= det(A) det(B).

Case 2: Let A be singular. Then, by Theorem 2.5.21 A is not " invertible.


# So, by"Proposi-
#
C1 −1 C1
tion 2.2.21 there exists an invertible matrix P such that P A = . So, A = P . As
T

0 0
AF

P is invertible, using Part 1, we have


DR

" #! ! " #! " #!


C 1 C 1 B C 1 B
det(AB) = det P −1 B = det P −1 = det(P −1 ) · det
0 0 0
= det(P ) · 0 = 0 = 0 · det(B) = det(A) det(B).

Thus, the proof of the theorem is complete.

Example 2.5.24. Let A be an orthogonal matrix then, by definition, AAT = I. Thus, by


Theorems 2.5.23 and 2.5.22

1 = det(I) = det(AAT ) = det(A) det(AT ) = det(A) det(A) = (det(A))2 .


" # " #
a b a2 + b2 ac + bd
Hence det A = ±1. In particular, if A = then I = AAT = .
c d ac + bd c2 + d2
1. Thus, a2 + b2 = 1 and hence there exists θ ∈ [−pi, π) such that a = cos θ and b = sin θ.
2. As ac + bd = 0, we get c = r sin θ and d = −r cos θ, for some r ∈ R. But, c2 + d2 = 1
implies that either c = sin θ and d = − cos θ or c = − sin θ and d = cos θ.
" # " #
cos θ sin θ cos θ sin θ
3. Thus, A = or A = .
sin θ − cos θ − sin θ cos θ
" #
cos θ sin θ
4. For A = , det(A) = −1. Then A represents a reflection about the line
sin θ − cos θ
y = mx. Determine m? (see Exercise 2.2b).
2.5. SQUARE MATRICES AND LINEAR SYSTEMS 59
" #
cos θ sin θ
5. For A = , det(A) = 1. Then A represents a rotation through the angle α.
− sin θ cos θ
Determine α? (see Exercise 2.2a).
Exercise 2.5.25. 1. Let A ∈ Mn (C) be an upper triangular matrix with nonzero entries on
the diagonal. Then, prove that A−1 is also an upper triangular matrix.
2. [LU decomposition of an invertible matrix] Let A ∈ Mn (R) such that det(A[S|S]) 6= 0
for all S ⊆ {1, 2, . . . , n}. Then there exists an invertible lower triangular matrix L such
that LA is an invertible upper triangular matrix. The proof uses the following ideas.
h i
(a) Let u ∈ Rn with uT = u1 · · · un and u1 6= 0. Then there exists an invertible
lower triangular matrix L such that Lu = u1 e1 .
 
" # u2
1 0T 1  .. 
Ans: Define L = , where x = −  . Then verify that Lu = u1 e1 .
x In−1 u1  . 
un
(b) As a11 = det(A[S|S]) 6=" 0 for S #= {1}, Part 2a gives an invertible lower triangular
a11 ∗
matrix L1 with L1 A = .
0 A1
(c) Deduce that det(A) = a11 det(A1 ). So det(A1 [S|S]) 6= 0 for all S ⊆ {1, 2, . . . , n − 1}.
(d) Now, use induction to get L2 ∈ Mn−1 (R), an invertible lower triangular matrix, such
that L2 A1 ="T1 , an#invertible upper triangular
" # matrix.
T
T

1 0 α ∗
(e) Define L = L1 . Then LA = , is an upper triangular matrix with L
AF

0 L2 0 T1
as an invertible lower triangular matrix.
DR

" #
α ∗
(f ) Since L−1 is also a lower triangular matrix, A = L−1 . Thus, A is a product
0 T1
of a "lower triangular
# invertible matrix and an upper triangular invertible matrix
α ∗
U= .
0 T1
3. Let A ∈ Mn (C). Then, det(A) = 0 if
(a) either A[i, :]T = 0T or A[:, i] = 0, for some i, 1 ≤ i ≤ n,
(b) or A[i, :] = cA[j, :], for some c ∈ C and for some i 6= j,
(c) or A[:, i] = cA[:, j], for some c ∈ C and for some i 6= j,
(d) or A[i, :] = c1 A[j1 , :] + c2 A[j2 , :] + · · · + ck A[jk , :], for some rows i, j1 , . . . , jk of A and
some ci ’s in C,
(e) or A[:, i] = c1 A[:, j1 ] + c2 A[:, j2 ] + · · · + ck A[:, jk ], for some columns i, j1 , . . . , jk of A
and some ci ’s in C.
   
a b c a e 102 a + 10e + h
C. Without
   
2
4. Let A =  e f g  and B =  b f 10 b + 10f + j , where a, b . . . , ` ∈
  
h j ` c g 102 c + 10g + `

3 1 1


computing deduce that det(A) = det(B). Hence, conclude that 17 divides 4 8 1 .

0 7 9
60 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

2.5.3 Cramer’s Rule

We start with a corollary which is a direct application of Theorems 2.5.2 and 2.5.21.

Corollary 2.5.26. Let A be a square matrix. Then, the following statements are equivalent:

1. A is invertible.

2. The linear system Ax = b has a unique solution for every b.

3. det(A) 6= 0.

Thus, Ax = b has a unique solution for every b if and only if det(A) 6= 0. The next theorem
gives a direct method of finding the solution of the linear system Ax = b when det(A) 6= 0.

Theorem 2.5.27 (Cramer’s Rule). Let A be an n × n non-singular matrix. Then, the unique
solution of the linear system Ax = b with xT = [x1 , . . . , xn ] is given by

det(Aj )
xj = , for j = 1, 2, . . . , n,
det(A)

where Aj is the matrix obtained from A by replacing A[:, j] by b.

Proof. Since det(A) 6= 0, A is invertible. Thus, there exists an invertible matrix P such that
P A = In and P [A | b] = [I | P b]. Then A−1 = P . Let d = P b = A−1 b. Then, Ax = b has the
T
AF

unique solution xj = dj , for 1 ≤ j ≤ n. Also, [e1 , . . . , en ] = I = P A = [P A[:, 1], . . . , P A[:, n]].


Thus,
DR

P Aj = P [A[:, 1], . . . , A[:, j − 1], b, A[:, j + 1], . . . , A[:, n]]


= [P A[:, 1], . . . , P A[:, j − 1], P b, P A[:, j + 1], . . . , P A[:, n]]
= [e1 , . . . , ej−1 , d, ej+1 , . . . , en ].

dj det(P Aj ) det(P ) det(Aj ) det(Aj )


Thus, det(P Aj ) = dj , for 1 ≤ j ≤ n. Also, dj = = = = .
1 det(P A) det(P ) det(A) det(A)
det(Aj )
Hence, xj = and the required result follows.
det(A)
   
1 2 3 1
   
Example 2.5.28. Solve Ax = b using Cramer’s rule, where A = 2 3 1 and b = 1
  
.
1 2 2 1
T
Solution: Check that det(A) = 1 and x = [−1, 1, 0] as

1 2 3 1 1 3 1 2 1


x1 = 1 3 1 = −1, x2 = 2 1 1 = 1, and x3 = 2 3 1 = 0.

1 2 2 1 1 2 1 2 1

2.6 Miscellaneous Exercises


Exercise 2.6.1. 1. Let A be a unitary matrix then what can you say about | det(A) |?
2.6. MISCELLANEOUS EXERCISES 61

2. Let A ∈ Mn (C). Prove that the following statements are equivalent:

(a) A is not invertible.


(b) Rank(A) 6= n.
(c) det(A) = 0.
(d) A is not row-equivalent to In .
(e) The homogeneous system Ax = 0 has a non-trivial solution.
(f ) The system Ax = b is either inconsistent or it has an infinite number of solutions.
(g) A is not a product of elementary matrices.

3. Let A be a Hermitian matrix. Prove that det A is a real number.

4. Let A ∈ Mn (C). Then, A is invertible if and only if Adj(A) is invertible.

5. Let A and B be invertible matrices. Prove that Adj(AB) = Adj(B)Adj(A).


" #
A B
6. Let A be an n × n invertible matrix and let P = . Then, show that Rank(P ) = n
C D
if and only if D = CA−1 B.
7. Let A be a 2 × 2 matrix with tr(A) = 0 and det(A) = 0. Then, A is a nilpotent matrix.
T
AF

8. Determine necessary and sufficient condition for a triangular matrix to be invertible.


DR

" # " #
−1 A11 A12 B11 B12
9. Suppose A = B with A = and B = . Also, assume that A11 is
A21 A22 B21 B22
invertible and define P = A22 − A21 A−1
11 A12 . Then, prove that
" #" # " #
I 0 A11 A12 A11 A12
(a) = ,
−A21 A−1
11 I A21 A22 0 A22 − A21 A−1 11 A12
" #
A−1
11 + (A −1
11 A 12 )P −1 (A A−1 )
21 11 −(A −1
11 A 12 )P −1
(b) P is invertible and B = .
−P −1 (A21 A−1 11 ) P −1

10. Let A and B be two non-singular matrices. Are the matrices A + B and A − B non-
singular? Justify your answer.

11. For what value(s) of λ does the following systems have non-trivial solutions? Also, for
each value of λ, determine a non-trivial solution.

(a) (λ − 2)x + y = 0, x + (λ + 2)y = 0.


(b) λx + 3y = 0, (λ + 6)y = 0.

12. Let a1 , . . . , an ∈ C and define A = [aij ]n×n with aij = aj−1


i . Prove that det(A) =
Q
(aj − ai ). This matrix is usually called the van der monde matrix.
1≤i<j≤n

13. Let A = [aij ] ∈ Mn (C) with aij = max{i, j}. Prove that det A = (−1)n−1 n.
62 CHAPTER 2. SYSTEM OF LINEAR EQUATIONS

14. Solve the following linear system by Cramer’s rule.


i) x + y + z − w = 1, x + y − z + w = 2, 2x + y + z − w = 7, x + y + z + w = 3.
ii) x − y + z − w = 1, x + y − z + w = 2, 2x + y − z − w = 7, x − y − z + w = 3.

15. Let p ∈ C, p 6= 0. Let A = [aij ], B = [bij ] ∈ Mn (C) with bij = pi−j aij , for 1 ≤ i, j ≤ n.
Then, compute det(B) in terms of det(A).

16. The position of an element aij of a determinant is called even or odd according as i + j is
even or odd. Prove that if all the entries in

(a) odd positions are multiplied with −1 then the value of determinant doesn’t change.
(b) even positions are multiplied with −1 then the value of determinant
i. does not change if the matrix is of even order.
ii. is multiplied by −1 if the matrix is of odd order.

2.7 Summary
In this chapter, we started with a system of m linear equations in n variables and formally
wrote it as Ax = b and in turn to the augmented matrix [A | b]. Then, the basic operations on
equations led to multiplication by elementary matrices on the right of [A | b]. These elementary
T

matrices are invertible and applying the GJE on a matrix A, resulted in getting the RREF of
AF

A. We used the pivots in RREF matrix to define the rank of a matrix. So, if Rank(A) = r and
DR

Rank([A | b]) = ra

1. if r < ra then the linear system Ax = b is inconsistent.

2. if r = ra then the linear system Ax = b is consistent. Further,

(a) if r = n then the system Ax = b has a unique solution.

(b) if r < n then the system Ax = b has an infinite number of solutions.

We have also seen that the following conditions are equivalent for A ∈ Mn (C).

1. A is invertible.

2. The homogeneous system Ax = 0 has only the trivial solution.

3. The row reduced echelon form of A is I.

4. A is a product of elementary matrices.

5. The system Ax = b has a unique solution for every b.

6. The system Ax = b has a solution for every b.

7. Rank(A) = n.

8. det(A) 6= 0.
2.7. SUMMARY 63

So, overall we have learnt to solve the following types of problems:

1. Solving the linear system Ax = b. This idea will lead to the question “is the vector b a
linear combination of the columns of A”?

2. Solving the linear system Ax = 0. This will lead to the question “are the columns of A
linearly independent/dependent”? In particular, we will see that

(a) if Ax = 0 has a unique solution then the columns of A are linearly independent.
(b) if Ax = 0 has a non-trivial solution then the columns of A are linearly dependent.
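The rank conditions summarized above translate directly into a small computation. The following minimal sketch, assuming numpy is available (the particular A and b are our own illustrations), classifies a system Ax = b exactly as described:

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],
                  [1., 0., 1.]])
    b = np.array([1., 2., 1.])

    r  = np.linalg.matrix_rank(A)
    ra = np.linalg.matrix_rank(np.column_stack([A, b]))
    n  = A.shape[1]

    if r < ra:
        print('inconsistent')
    elif r == n:
        print('unique solution')
    else:
        print('infinitely many solutions')   # here r = ra = 2 < n = 3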


Chapter 3

Vector Spaces

In this chapter, we will mainly be concerned with finite dimensional vector spaces over R or C. Please note that the real and complex numbers have the property that any pair of elements can be added, subtracted or multiplied. Also, division by a nonzero element is allowed. Such sets in mathematics are called fields. So, Q, R and C are examples of fields, and each of them has infinitely many elements. But, in mathematics, we do have fields that have only finitely many elements. For example, consider the set Z5 = {0, 1, 2, 3, 4}. In Z5 , we define addition and multiplication, respectively, as

    + | 0 1 2 3 4          · | 0 1 2 3 4
    --+----------         --+----------
    0 | 0 1 2 3 4          0 | 0 0 0 0 0
    1 | 1 2 3 4 0          1 | 0 1 2 3 4
    2 | 2 3 4 0 1          2 | 0 2 4 1 3
    3 | 3 4 0 1 2          3 | 0 3 1 4 2
    4 | 4 0 1 2 3          4 | 0 4 3 2 1

Then, we see that the elements of Z5 can be added, subtracted and multiplied. Note that 4
behaves as −1 and 3 behaves as −2. Thus, 1 behaves as −4 and 2 behaves as −3. Also, we see
that in this multiplication 2 · 3 = 1 and 4 · 4 = 1. Hence,
1. the division by 2 is similar to multiplying by 3,
2. the division by 3 is similar to multiplying by 2, and
3. the division by 4 is similar to multiplying by 4.

Thus, Z5 indeed behaves like a field. So, in this chapter, F will represent a field.
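The two tables above, and the inverses just described, can be generated with a few lines of code. A minimal sketch, assuming Python is available (the variable names are ours):

    p = 5
    add = [[(a + b) % p for b in range(p)] for a in range(p)]
    mul = [[(a * b) % p for b in range(p)] for a in range(p)]
    for row in add:
        print(row)
    for row in mul:
        print(row)

    # every nonzero element has a multiplicative inverse, so Z5 is a field
    inv = {a: next(b for b in range(1, p) if (a * b) % p == 1) for a in range(1, p)}
    print(inv)   # {1: 1, 2: 3, 3: 2, 4: 4}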

3.1 Vector Spaces: Definition and Examples


Let A ∈ Mm,n (F) and let V denote the solution set of the homogeneous system Ax = 0. Then,
by Theorem 2.1.9, V satisfies:

1. 0 ∈ V as A0 = 0.

2. if x ∈ V then αx ∈ V, for all α ∈ F. In particular, for α = −1, −x ∈ V.


3. if x, y ∈ V then, for any α, β ∈ F, αx + βy ∈ V.

We see that the solution set of a homogeneous linear system satisfies certain properties which
are also satisfied by the Euclidean plane, R2 , or the Euclidean space, R3 . In this chapter, our
aim is to understand sets that satisfy such properties. We start with the formal definition.

Definition 3.1.1. A vector space V over F, denoted V(F) or in short V (if the field F is clear from the context), is a non-empty set satisfying the following conditions:

1. Vector Addition: To every pair u, v ∈ V there corresponds a unique element u ⊕ v ∈ V (called the addition of vectors) such that

(a) u ⊕ v = v ⊕ u (Commutative law).
(b) (u ⊕ v) ⊕ w = u ⊕ (v ⊕ w) (Associative law).
(c) V has a unique element, denoted 0, called the zero vector, that satisfies u ⊕ 0 = u, for every u ∈ V (called the additive identity).
(d) for every u ∈ V there is an element w ∈ V that satisfies u ⊕ w = 0.

2. Scalar Multiplication: For each u ∈ V and α ∈ F, there corresponds a unique element α ⊙ u in V (called the scalar multiplication) such that

(a) α ⊙ (β ⊙ u) = (α · β) ⊙ u for every α, β ∈ F and u ∈ V (· is multiplication in F).
(b) 1 ⊙ u = u for every u ∈ V, where 1 ∈ F.

3. Distributive Laws: relating vector addition with scalar multiplication

For any α, β ∈ F and u, v ∈ V, the following distributive laws hold:

(a) α ⊙ (u ⊕ v) = (α ⊙ u) ⊕ (α ⊙ v).
(b) (α + β) ⊙ u = (α ⊙ u) ⊕ (β ⊙ u) (+ is addition in F).

Remark 3.1.2. [Real / Complex Vector Space]


1. The elements of F are called scalars.
2. The elements of V are called vectors.
3. We denote the zero element of F by 0; the zero element of V is also denoted by 0, and the context will make clear which is meant.
4. Observe that Condition 3.1.1.1d implies that for every u ∈ V, the vector w ∈ V such that u ⊕ w = 0 holds is unique. For if w1 , w2 ∈ V with u ⊕ wi = 0, for i = 1, 2, then by commutativity and associativity of vector addition, we see that

w1 = w1 ⊕ 0 = w1 ⊕ (u ⊕ w2 ) = (w1 ⊕ u) ⊕ w2 = 0 ⊕ w2 = w2 .

Hence, we represent this unique vector by −u and call it the additive inverse.
5. If V is a vector space over R then V is called a real vector space.
6. If V is a vector space over C then V is called a complex vector space.
7. In general, a vector space over R or C is called a linear space.
3.1. VECTOR SPACES: DEFINITION AND EXAMPLES 67

Some interesting consequences of Definition 3.1.1 are stated next. Intuitively, they seem obvious, but for a better understanding of the given conditions, it is desirable to go through the proofs.

Theorem 3.1.3. Let V be a vector space over F. Then,

1. u ⊕ v = u implies v = 0.

2. α ⊙ u = 0 if and only if either u = 0 or α = 0.

3. (−1) ⊙ u = −u, for every u ∈ V.

Proof. Part 1: By Condition 3.1.1.1d, for each u ∈ V there exists −u ∈ V such that −u ⊕ u = 0. Hence, u ⊕ v = u is equivalent to

−u ⊕ (u ⊕ v) = −u ⊕ u ⇐⇒ (−u ⊕ u) ⊕ v = 0 ⇐⇒ 0 ⊕ v = 0 ⇐⇒ v = 0.

Part 2: As 0 = 0 ⊕ 0, using Condition 3.1.1.3a, we have

α ⊙ 0 = α ⊙ (0 ⊕ 0) = (α ⊙ 0) ⊕ (α ⊙ 0).

Thus, using Part 1, α ⊙ 0 = 0 for any α ∈ F. In the same way, using Condition 3.1.1.3b,

0 ⊙ u = (0 + 0) ⊙ u = (0 ⊙ u) ⊕ (0 ⊙ u).

Hence, using Part 1, one has 0 ⊙ u = 0 for any u ∈ V.
Now suppose α ⊙ u = 0. If α = 0 then the proof is over. Therefore, assume that α ≠ 0, α ∈ F. Then, α^{-1} ∈ F and

0 = α^{-1} ⊙ 0 = α^{-1} ⊙ (α ⊙ u) = (α^{-1} · α) ⊙ u = 1 ⊙ u = u

as 1 ⊙ u = u for every vector u ∈ V (see Condition 3.1.1.2b). Thus, if α ≠ 0 and α ⊙ u = 0 then u = 0.
Part 3: As 0 = 0 ⊙ u = (1 + (−1)) ⊙ u = u ⊕ ((−1) ⊙ u), the uniqueness of the additive inverse (see Remark 3.1.2.4) gives (−1) ⊙ u = −u.

Example 3.1.4. The readers are advised to justify the statements given below.

1. Let V = {0}. Then, V is a real as well as a complex vector space.

2. Let A ∈ Mm,n (F) with Rank(A) = r ≤ n. Then, using Theorem 2.4.4, the solution set of
the homogeneous system Ax = 0 is a vector space over F.

3. Consider R with the usual addition and multiplication. That is, a ⊕ b = a + b and a ⊙ b = a · b. Then, R forms a real vector space.

4. Let R2 = {(x1 , x2 )T | x1 , x2 ∈ R}. Then, for x1 , x2 , y1 , y2 ∈ R and α ∈ R, define

(x1 , x2 )T ⊕ (y1 , y2 )T = (x1 + y1 , x2 + y2 )T and α ⊙ (x1 , x2 )T = (αx1 , αx2 )T .

Verify that R2 is a real vector space.



5. Let Rn = {(a1 , . . . , an )T | ai ∈ R, 1 ≤ i ≤ n}. For u = (a1 , . . . , an )T , v = (b1 , . . . , bn )T ∈ Rn and α ∈ R, define

u ⊕ v = (a1 + b1 , . . . , an + bn )T and α ⊙ u = (αa1 , . . . , αan )T

(called componentwise operations). Then, Rn is a real vector space. The vector space Rn is called the real vector space of n-tuples.


Recall that the symbol i represents the complex number √−1.

6. Consider C = {x + iy | x, y ∈ R}, the set of complex numbers. Let z1 = x1 + iy1 and z2 = x2 + iy2 and define z1 ⊕ z2 = (x1 + x2 ) + i(y1 + y2 ). For scalar multiplication,

(a) let α ∈ R and define α ⊙ z1 = (αx1 ) + i(αy1 ). Then, C is a vector space over R (called the real vector space).
(b) let α + iβ ∈ C and define (α + iβ) ⊙ (x1 + iy1 ) = (αx1 − βy1 ) + i(αy1 + βx1 ). Then, C forms a vector space over C (called the complex vector space).

7. Let Cn = {(z1 , . . . , zn )T | zi ∈ C, 1 ≤ i ≤ n}. For z = (z1 , . . . , zn )T , w = (w1 , . . . , wn )T ∈ Cn and α ∈ F, define

z ⊕ w = (z1 + w1 , . . . , zn + wn )T , and α ⊙ z = (αz1 , . . . , αzn )T .

Then, verify that Cn forms a vector space over C (called the complex vector space) as well as over R (called the real vector space). Unless specified otherwise, Cn will be considered a complex vector space.

Remark 3.1.5. If F = C then i(1, 0) = (i, 0) is allowed. Whereas, if F = R then i(1, 0) doesn’t make sense as i ∉ R.

8. Fix m, n ∈ N and let Mm,n (C) = {Am×n = [aij ] | aij ∈ C}. For A, B ∈ Mm,n (C) and
α ∈ C, define (A + αB)ij = aij + αbij . Then, Mm,n (C) is a complex vector space. If
m = n, the vector space Mm,n (C) is denoted by Mn (C).

9. Let S be a non-empty set and let R^S = {f | f is a function from S to R}. For f, g ∈ R^S and α ∈ R, define (f + αg)(x) = f (x) + αg(x), for all x ∈ S. Then, R^S is a real vector space. In particular, for S = N, observe that R^N consists of all real sequences and forms a real vector space.

10. Fix a, b ∈ R with a < b and let C([a, b], R) = {f : [a, b] → R | f is continuous}. Then,
C([a, b], R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ [a, b], is a real vector space.

11. Let C(R, R) = {f : R → R | f is continuous}. Then, C(R, R) is a real vector space, where
(f + αg)(x) = f (x) + αg(x), for all x ∈ R.

12. Fix a < b ∈ R and let C 2 ((a, b), R) = {f : (a, b) → R | f ′′ is continuous}. Then, C 2 ((a, b), R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ (a, b), is a real vector space.

13. Fix a < b ∈ R and let C ∞ ((a, b), R) = {f : (a, b) → R | f is infinitely differentiable}.
Then, C ∞ ((a, b), R) with (f + αg)(x) = f (x) + αg(x), for all x ∈ (a, b) is a real vector
space.

14. Fix a < b ∈ R. Then, V = {f : (a, b) → R | f ′′ + f ′ + 2f = 0} is a real vector space.

15. Let R[x] = {a0 + a1 x + · · · + an xn | ai ∈ R, for 0 ≤ i ≤ n}. Now, let p(x), q(x) ∈ R[x].
Then, we can choose m such that p(x) = a0 + a1 x + · · · + am xm and q(x) = b0 + b1 x +
· · · + bm xm , where some of the ai ’s or bj ’s may be zero. Then, we define

p(x) + q(x) = (a0 + b0 ) + (a1 + b1 )x + · · · + (am + bm )xm

and αp(x) = (αa0 ) + (αa1 )x + · · · + (αam )xm , for α ∈ R. With these operations “com-
ponentwise addition and multiplication”, it can be easily verified that R[x] forms a real
vector space.

16. Fix n ∈ N and let R[x; n] = {p(x) ∈ R[x] | p(x) has degree ≤ n}. Then, with componen-
twise addition and multiplication, the set R[x; n] forms a real vector space.

17. Let C[x] = {a0 + a1 x + · · · + an xn | ai ∈ C, for 0 ≤ i ≤ n}. Then, under componentwise addition and multiplication, the set C[x] forms a real/complex vector space. Further, C[x; n], the set of complex polynomials of degree less than or equal to n, also forms a real/complex vector space.

18. Let V = {A = [aij ] ∈ Mn (C) | a11 = 0}. Then, V is a complex vector space.

19. Let V = {A = [aij ] ∈ Mn (C) | A = A∗ }. Then, verify that V is a real vector space but
not a complex vector space.

20. Let V and W be vector spaces over F, with operations (+, •) and (⊕, ⊙), respectively. Let V × W = {(v, w) | v ∈ V, w ∈ W}. Then, V × W forms a vector space over F, if for every (v1 , w1 ), (v2 , w2 ) ∈ V × W and α ∈ F, we define

(v1 , w1 ) ⊕′ (v2 , w2 ) = (v1 + v2 , w1 ⊕ w2 ), and
α ◦ (v1 , w1 ) = (α • v1 , α ⊙ w1 ).

Here, v1 + v2 and w1 ⊕ w2 on the right hand side mean vector addition in V and W, respectively. Similarly, α • v1 and α ⊙ w1 correspond to scalar multiplication in V and W, respectively.

21. Let Q be the set of scalars. Then,

(a) R is a vector space over Q. In this space, all the irrational numbers are vectors but not scalars.
(b) V = {a + b√2 : a, b ∈ Q} is a vector space.
(c) V = {a + b√2 + c√3 + d√6 : a, b, c, d ∈ Q} is a vector space.
(d) V = {a + b√−3 : a, b ∈ Q} is a vector space.

22. Let R+ = {x ∈ R | x > 0}. Then,



(a) R+ is not a vector space under the usual operations of addition and scalar multiplication.
(b) R+ is a real vector space with 1 as the additive identity if we define

u ⊕ v = u · v and α ⊙ u = u^α, for all u, v ∈ R+ and α ∈ R.

23. For any α ∈ R and x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 , define

x ⊕ y = (x1 + y1 + 1, x2 + y2 − 3)T and α ⊙ x = (αx1 + α − 1, αx2 − 3α + 3)T .

Then, R2 is a real vector space with (−1, 3)T as the additive identity.

24. Recall the field Z5 = {0, 1, 2, 3, 4} given on the first page of this chapter. Then, V =
{(a, b) | a, b ∈ Z5 } is a vector space over Z5 having 25 elements/vectors.

Note that all our vector spaces, except the last two, are linear spaces.

From now on, we will use ‘u + v’ for ‘u ⊕ v’ and ‘αu or α · u’ for ‘α ⊙ u’.

Exercise 3.1.6. 1. Verify that the vector spaces mentioned in Example 3.1.4 do satisfy all the conditions for vector spaces.

2. Does R with x ⊕ y = x − y and α ⊙ x = −αx, for all x, y, α ∈ R, form a vector space?

3. Let V = R2 . For x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 and α ∈ R, define

(a) x ⊕ y = (x1 + y1 , 0)T and α ⊙ x = (αx1 , 0)T .
(b) x ⊕ y = (x1 + y1 , x2 + y2 )T and α ⊙ x = (αx1 , 0)T .

Then, does V form a vector space under either of the two pairs of operations?

4. Does the set V given below form a real vector space, a complex vector space, or both? Give reasons for your answer.

(a) Let V = { [a b; c d] | a, b, c, d ∈ C, a + c = 0 }.
(b) Let V = { [a b; c d] | a = b, a, b, c, d ∈ C }.
(c) Let V = {(x, y, z)T | x + y + z = 1}.
(d) Let V = {(x, y)T ∈ R2 | x · y = 0}.
(e) Let V = {(x, y)T ∈ R2 | x = y 2 }.
(f ) Let V = {α(1, 1, 1)T + β(1, 1, −1)T | α, β ∈ R}.

3.1.1 Subspaces

Definition 3.1.7. Let V be a vector space over F. Then, a non-empty subset S of V is called a
subspace of V if S is also a vector space with vector addition and scalar multiplication inherited
from V.

Example 3.1.8. 1. The vector space R[x; n] is a subspace of R[x].



2. Is V = {xp(x) | p(x) ∈ R[x]} a subspace of R[x]?

3. Let V be a vector space. Then V and {0} are subspaces, called trivial subspaces.

4. The real vector space R has no non-trivial subspace. To check this, let V ≠ {0} be a vector subspace of R. Then, there exists x ∈ R, x ≠ 0 such that x ∈ V. Now, using scalar multiplication, we see that {αx | α ∈ R} ⊆ V. As x ≠ 0, the set {αx | α ∈ R} = R. This in turn implies that V = R.

5. W = {x ∈ R3 | [1, 2, −1]x = 0} is a plane in R3 containing 0 (hence a subspace).


" #
1 1 1
6. W = {x ∈ R3 | x = 0} is a line in R3 containing 0 (hence a subspace).
1 −1 −1

7. Verify that C 2 ((a, b), R) is a subspace of C((a, b), R).

8. Verify that W = {(x, 0)T ∈ R2 | x ∈ R} is a subspace of R2 .

9. Is the set of sequences converging to 0 a subspace of the set of all bounded sequences?

10. Let V be the vector space of Example 3.1.4.23. Then,


(a) S = {(x, 0)T | x ∈ R} is not a subspace of V as (x, 0)T ⊕ (y, 0)T = (x + y + 1, −3)T ∉ S.
(b) Verify that W = {(x, 3)T | x ∈ R} is a subspace of V.

11. The vector space R+ defined in Example 3.1.4.22 is not a subspace of R.



Let V(F) be a vector space and W ⊆ V, W ≠ ∅. We now prove a result which implies that to check that W is a subspace, we need only verify one condition.

Theorem 3.1.9. Let V(F) be a vector space and W ⊆ V, W ≠ ∅. Then, W is a subspace of V if and only if αu + βv ∈ W whenever α, β ∈ F and u, v ∈ W. Note that the vector addition and scalar multiplication are inherited from V(F).

Proof. Let W be a subspace of V and let u, v ∈ W. Then, for every α, β ∈ F, αu, βv ∈ W and
hence αu + βv ∈ W.
Now, we assume that αu + βv ∈ W, whenever α, β ∈ F and u, v ∈ W. To show, W is a
subspace of V:

1. Taking α = 1 and β = 1, we see that u + v ∈ W, for every u, v ∈ W.

2. Taking α = 0 and β = 0, we see that 0 ∈ W.

3. Taking β = 0, we see that αu ∈ W, for every α ∈ F and u ∈ W. Hence, using Theo-


rem 3.1.3.3, −u = (−1)u ∈ W as well.

4. The commutative and associative laws of vector addition hold as they hold in V.

5. The conditions related with scalar multiplication and the distributive laws also hold as
they hold in V.

Thus, one obtains the required result.

Exercise 3.1.10. 1. Determine all the subspaces of R and R2 .

2. Prove that a line in R2 is a subspace if and only if it passes through (0, 0)T ∈ R2 .

3. Fix n ∈ N. In the examples given below, is W a subspace of Mn (R), where

(a) W = {A ∈ Mn (R) | A is upper triangular}?


(b) W = {A ∈ Mn (R) | A is symmetric}?
(c) W = {A ∈ Mn (R) | A is skew-symmetric}?
(d) W = {A ∈ Mn (R) | A is a diagonal matrix}?
(e) W = {A ∈ Mn (R) | trace(A) = 0}?
(f ) W = {A ∈ Mn (R) | AT = 2A}?

4. Fix n ∈ N. Then, is W = {A = [aij ] ∈ Mn (C) | a11 + a22 = 0} a subspace of the complex vector space Mn (C)? What if Mn (C) is a real vector space?
5. Are all the sets given below subspaces of C([−1, 1])?

(a) W = {f ∈ C([−1, 1]) | f (1/2) = 0}.


(b) W = {f ∈ C([−1, 1]) | f (−1/2) = 0, f (1/2) = 0}.
6. Are all the sets given below subspaces of R[x]? Recall that the degree of the zero polynomial is assumed to be −∞.

(a) W = {f (x) ∈ R[x] | deg(f (x)) = 3}.


(b) W = {f (x) ∈ R[x] | deg(f (x)) ≤ 0}.
(c) W = {f (x) ∈ R[x] | f (0) = 0}.

7. Which of the following are subspaces of Rn (R)?

(a) {(x1 , x2 , . . . , xn )T | x1 ≥ 0}.


(b) {(x1 , x2 , . . . , xn )T | x1 is rational}.
(c) {(x1 , x2 , . . . , xn )T | | x1 | ≤ 1}.

8. Among the following, determine the subspaces of the complex vector space Cn .

(a) {(z1 , z2 , . . . , zn )T | z1 is real }.


(b) {(z1 , z2 , . . . , zn )T | z1 + z2 = z3 }.
(c) {(z1 , z2 , . . . , zn )T | | z1 |=| z2 |}.

9. Prove that the following sets are not subspaces of Mn (R).

(a) G = {A ∈ Mn (R) | det(A) = 0}.


(b) G = {A ∈ Mn (R) | det(A) = 1}.

3.1.2 Linear Span

Definition 3.1.11. Let V be a vector space over F. Then, for any u1 , . . . , un ∈ V and α1 , . . . , αn ∈ F, the vector α1 u1 + · · · + αn un = Σ_{i=1}^{n} αi ui is said to be a linear combination of the vectors u1 , . . . , un .

Example 3.1.12. 1. (3, 4, 3) is a linear combination of (1, 1, 1) and (1, 2, 1) as (3, 4, 3) =


2(1, 1, 1) + (1, 2, 1).

2. (3, 4, 5) is not a linear combination of (1, 1, 1) and (1, 2, 1) as the linear system (3, 4, 5) =
a(1, 1, 1) + b(1, 2, 1), in the variables a and b has no solution.

3. Is (4, 5, 5) a linear combination of e1T = (1, 0, 0), e2T = (0, 1, 0) and e3T = (0, 0, 1)?
Solution: (4, 5, 5) is a linear combination as (4, 5, 5) = 4e1T + 5e2T + 5e3T .

4. Is (4, 5, 5) a linear combination of (1, 0, 0), (2, 1, 0) and (3, 3, 1)?


Solution: (4, 5, 5) is a linear combination if the linear system

a(1, 0, 0) + b(2, 1, 0) + c(3, 3, 1) = (4, 5, 5) (3.1.1)

in the variables a, b, c ∈ R has a solution. Clearly, Equation (3.1.1) has the solution a = 9, b = −10 and c = 5 (a numerical check of this computation appears after this example).
5. Is 4 + 5x + 5x2 + x3 a linear combination of the polynomials p1 (x) = 1, p2 (x) = 2 + x2 and p3 (x) = 3 + 3x + x2 + x3 ?
Solution: The polynomial 4 + 5x + 5x2 + x3 is a linear combination if the linear system

a p1 (x) + b p2 (x) + c p3 (x) = 4 + 5x + 5x2 + x3    (3.1.2)

in the variables a, b, c ∈ R has a solution. Comparing coefficients of x3 forces c = 1, while comparing coefficients of x forces 3c = 5, a contradiction. So the system has no solution. Thus, 4 + 5x + 5x2 + x3 is not a linear combination of the given set of polynomials.
6. Is [1 3 4; 3 3 6; 4 6 5] a linear combination of the vectors I3 , [0 1 1; 1 1 2; 1 2 0] and [0 1 2; 1 0 2; 2 2 4]?
Solution: Verify that

[1 3 4; 3 3 6; 4 6 5] = I3 + 2 · [0 1 1; 1 1 2; 1 2 0] + [0 1 2; 1 0 2; 2 2 4].

Hence, it is indeed a linear combination of the given vectors of M3 (R).
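As promised in part 4 above, here is a quick numerical check. A minimal sketch, assuming numpy is available; the columns of M are the three given vectors.

    import numpy as np

    M = np.array([[1., 2., 3.],
                  [0., 1., 3.],
                  [0., 0., 1.]])
    v = np.array([4., 5., 5.])

    coeffs = np.linalg.solve(M, v)
    print(coeffs)                      # [ 9. -10.  5.], i.e. a = 9, b = -10, c = 5
    assert np.allclose(M @ coeffs, v)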

Exercise 3.1.13. 1. Let x ∈ R3 . Prove that xT is a linear combination of (1, 0, 0), (2, 1, 0) and (3, 3, 1). Is this linear combination unique? That is, does there exist (a, b, c) ≠ (e, f, g) with xT = a(1, 0, 0) + b(2, 1, 0) + c(3, 3, 1) = e(1, 0, 0) + f (2, 1, 0) + g(3, 3, 1)?

2. Find condition(s) on x, y, z ∈ R such that


(a) (x, y, z) is a linear combination of (1, 2, 3), (−1, 1, 4) and (3, 3, 2).
(b) (x, y, z) is a linear combination of (1, 2, 1), (1, 0, −1) and (1, 1, 0).

(c) (x, y, z) is a linear combination of (1, 1, 1), (1, 1, 0) and (1, −1, 0).

Definition 3.1.14. Let V be a vector space over F and S ⊆ V. Then, the linear span of S,
denoted LS(S), is defined as

LS(S) = {α1 u1 + · · · + αn un | αi ∈ F, ui ∈ S, for 1 ≤ i ≤ n}.

That is, LS(S) is the set of all possible linear combinations of finitely many vectors of S. If S
is an empty set, we define LS(S) = {0}.

Example 3.1.15. For the set S given below, determine LS(S).

1. S = {(1, 0)T , (0, 1)T } ⊆ R2 .


Solution: LS(S) = {a(1, 0)T + b(0, 1)T | a, b ∈ R} = {(a, b)T | a, b ∈ R} = R2 .

2. S = {(1, 1, 1)T , (2, 1, 3)T }. What does LS(S) represent in R3 ?
Solution: LS(S) = {a(1, 1, 1)T + b(2, 1, 3)T | a, b ∈ R} = {(a + 2b, a + b, a + 3b)T | a, b ∈ R}. Note that LS(S) represents a plane passing through the points (0, 0, 0)T , (1, 1, 1)T and (2, 1, 3)T . To get the equation of the plane, we proceed as follows:
Find conditions on x, y and z such that (a + 2b, a + b, a + 3b) = (x, y, z). Or equivalently, find conditions on x, y and z such that the system a + 2b = x, a + b = y and a + 3b = z, in the unknowns a and b, has a solution. The RREF of the augmented matrix equals

[1 0 | 2y − x; 0 1 | x − y; 0 0 | z + y − 2x].

Thus, the required condition on x, y and z is given by z + y − 2x = 0. Hence,

LS(S) = {a(1, 1, 1)T + b(2, 1, 3)T | a, b ∈ R} = {(x, y, z)T ∈ R3 | 2x − y − z = 0}.

(A symbolic computation of this condition appears after this example.)

3. S = {1 + 2x + 3x2 , 1 + x + 2x2 , 1 + 2x + x3 }.
Solution: To understand LS(S), we need to find condition(s) on α, β, γ, δ such that the
linear system

a(1 + 2x + 3x2 ) + b(1 + x + 2x2 ) + c(1 + 2x + x3 ) = α + βx + γx2 + δx3

in the unknowns a, b, c is consistent. An application of the GJE method gives α + β − γ − 3δ = 0 as the required condition. Thus,

LS(S) = {α + βx + γx2 + δx3 ∈ R[x] | α + β − γ − 3δ = 0}.


    
 0 1 1 0 1 2

 

4. S = I3 , 1 1 2 , 1 0 2 ⊆ M3 (R).
    
 
 1 2 0 2 2 4 
Solution: To get the equation, we need to find conditions on aij ’s such that the system
   
α β+γ β + 2γ a11 a12 a13
   
β+γ α + β 2β + 2γ  = a21 a22 a23 ,
   
β + 2γ 2β + 2γ α + 2γ a31 a32 a33
3.1. VECTOR SPACES: DEFINITION AND EXAMPLES 75

in the unknowns α, β, γ is always consistent. Now, verify that the required condition
equals
a22 + a33 − a13
LS(S) = {A = [aij ] ∈ M3 (R) | A = AT , a11 = ,
2
a22 − a33 + 3a13 a22 − a33 + 3a13

a12 = , a23 = .
4 2
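The condition found in part 2 above can also be produced symbolically. A minimal sketch, assuming sympy is available (the symbol names are our own choices):

    import sympy as sp

    a, b, x, y, z = sp.symbols('a b x y z')

    # solve a + 2b = x and a + b = y (the first two coordinates), then
    # substitute into a + 3b = z to recover the equation of the plane
    sol = sp.solve([sp.Eq(a + 2*b, x), sp.Eq(a + b, y)], [a, b])
    print(sol)                                 # {a: 2*y - x, b: x - y}

    condition = sp.expand((a + 3*b).subs(sol) - z)
    print(condition)                           # 2*x - y - z, so LS(S) is 2x - y - z = 0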

Exercise 3.1.16. Determine the equation of the geometrical object represented by LS(S).

1. S = {π} ⊆ R.

2. S = {(x, y)T : x, y < 0} ⊆ R2 .

3. S = {(x, y)T : either x ≠ 0 or y ≠ 0} ⊆ R2 .

4. S = {(1, 0, 1)T , (0, 1, 0)T , (2, 0, 2)T } ⊆ R3 . Give two examples of vectors u, v different
from the given set such that LS(S) = LS(u, v).

5. S = {(x, y, z)T : x, y, z > 0} ⊆ R3 .


     
 0 1 0 0 0 1 0 1 1 

 

0 1, −1 0 0 ⊆ M3 (R).
   
6. S = −1 0 1,  0
     
 
 0 −1 0

−1 −1 0 −1 0 0 


7. S = {(1, 2, 3, 4)T , (−1, 1, 4, 5)T , (3, 3, 2, 3)T } ⊆ R4 .



8. S = {1 + 2x + x2 , x, 1 + x2 } ⊆ C[x; 2]. Give two examples of polynomials p(x), q(x) different from the given set such that LS(S) = LS(p(x), q(x)).

9. S = {1 + 2x + 3x2 , −1 + x + 4x2 , 3 + 3x + 2x2 } ⊆ C[x; 2].

10. S = {1, x, x2 , . . .} ⊆ C[x].

Definition 3.1.17. Let V be a vector space over F. Then, V is called finite dimensional if there exists S ⊆ V such that S has a finite number of elements and V = LS(S). If no such S exists then V is called infinite dimensional.

Example 3.1.18. 1. {(1, 2)T , (2, 1)T } spans R2 . Thus, R2 is finite dimensional.

2. {1, 1 + x, 1 − x + x2 , x3 , x4 , x5 } spans C[x; 5]. Thus, C[x; 5] is finite dimensional.

3. Fix n ∈ N. Then, C[x; n] is finite dimensional as C[x; n] = LS({1, x, x2 , . . . , xn }).

4. C[x] is not finite dimensional as the degree of a polynomial can be any large positive
integer. Indeed, verify that C[x] = LS({1, x, x2 , . . . , xn , . . .}).

5. The vector space R over Q is infinite dimensional. An argument to justify it will be


given later. The same argument also implies that the vector space C over Q is infinite
dimensional.

Lemma 3.1.19 (Linear Span is a Subspace). Let V be a vector space over F and S ⊆ V. Then,
LS(S) is a subspace of V.

Proof. By definition, 0 ∈ LS(S). So, LS(S) is non-empty. Let u, v ∈ LS(S). To show,


au + bv ∈ LS(S) for all a, b ∈ F. As u, v ∈ LS(S), there exist n ∈ N, vectors wi ∈ S and
scalars αi , βi ∈ F such that u = α1 w1 + · · · + αn wn and v = β1 w1 + · · · + βn wn . Hence,

au + bv = (aα1 + bβ1 )w1 + · · · + (aαn + bβn )wn ∈ LS(S)

as aαi + bβi ∈ F for 1 ≤ i ≤ n. Thus, by Theorem 3.1.9, LS(S) is a vector subspace.

Exercise 3.1.20. Let V be a vector space over F and W ⊆ V.


1. Then, LS(W ) = W if and only if W is a subspace of V.
2. If W is a subspace of V and S ⊆ W then LS(S) is a subspace of W as well.

Theorem 3.1.21. Let V be a vector space over F and S ⊆ V. Then, LS(S) is the smallest
subspace of V containing S.

Proof. For every u ∈ S, u = 1 · u ∈ LS(S). Thus, S ⊆ LS(S). Need to show that LS(S) is the
smallest subspace of V containing S. So, let W be any subspace of V containing S. Then, by
Exercise 3.1.20, LS(S) ⊆ W and hence the result follows.

Definition 3.1.22. Let V be a vector space over F.



1. Let S and T be two subsets of V. Then, the sum of S and T , denoted S + T , equals {s + t | s ∈ S, t ∈ T }. For example,


(a) if V = R, S = {0, 1, 2, 3, 4, 5, 6} and T = {5, 10, 15} then S + T = {5, 6, . . . , 21}.
(b) if V = R2 , S = {(1, 1)T } and T = {(−1, 1)T } then S + T = {(0, 2)T }.
(c) if V = R2 , S = {(1, 1)T } and T = LS((−1, 1)T ) then S + T = {(1, 1)T + c(−1, 1)T | c ∈ R}.

2. Let P and Q be two subspaces of R2 . Then, P + Q = R2 , if

(a) P = {(x, 0)T | x ∈ R} and Q = {(0, x)T | x ∈ R} as (x, y) = (x, 0) + (0, y).
(b) P = {(x, 0)T | x ∈ R} and Q = {(x, x)T | x ∈ R} as (x, y) = (x − y, 0) + (y, y).
(c) P = LS((1, 2)T ) and Q = LS((2, 1)T ) as (x, y) = ((2y − x)/3)(1, 2) + ((2x − y)/3)(2, 1).
We leave the proof of the next result to the readers.

Lemma 3.1.23. Let P and Q be two subspaces of a vector space V over F. Then, P + Q is a
subspace of V. Furthermore, P + Q is the smallest subspace of V containing both P and Q.

Exercise 3.1.24. 1. Let a ∈ R2 , a ≠ 0. Then, show that {x ∈ R2 | aT x = 0} is a non-trivial subspace of R2 . Geometrically, what does this set represent in R2 ?

2. Find all subspaces of R3 .


(" # ) (" # )
a b a 0
3. Let U = | a, b, c ∈ R and W = | a, d ∈ R be subspaces of M2 (R).
c 0 0 d
Determine U ∩ W. Is M2 (R) = U ∪ W? What is U + W?

4. Let W and U be two subspaces of a vector space V over F.

(a) Prove that W ∩ U is a subspace of V.

(b) Give examples of W and U such that W ∪ U is not a subspace of V.

(c) Determine conditions on W and U such that W ∪ U is a subspace of V.

(d) Prove that LS(W ∪ U) = W + U.

5. Prove that {(x, y, z)T ∈ R3 | ax + by + cz = d} is a subspace of R3 if and only if d = 0.

6. Determine all subspaces of the vector space in Example 3.1.4.23.

7. Let S = {x1 , x2 , x3 , x4 }, where x1 = (1, 0, 0)T , x2 = (1, 1, 0)T , x3 = (1, 2, 0)T and x4 =
(1, 1, 1)T . Then, determine all xi such that LS(S) = LS(S \ {xi }).

8. Let W = LS((1, 0, 0)T , (1, 1, 0)T ) and U = LS((1, 1, 1)T ). Prove that W + U = R3 and W ∩ U = {0}. If v ∈ R3 , determine w ∈ W and u ∈ U such that v = w + u. Is it necessary that w and u are unique?



9. Let W = LS((1, −1, 0), (1, 1, 0)) and U = LS((1, 1, 1), (1, 2, 1)). Prove that W + U = R3
and W ∩ U 6= {0}. Find v ∈ R3 such that v = w + u, for 2 different choices of w ∈ W
and u ∈ U. That is, the choice of vectors w and u is not unique.

Let V be a vector space over either R or C. Then, we have learnt the following:

1. for any S ⊆ V, LS(S) is again a vector space. Moreover, LS(S) is the smallest subspace
containing S.

2. if S = ∅ then LS(S) = {0}.

3. if S has at least one nonzero vector then LS(S) contains infinitely many vectors.

Therefore, the following questions arise:

1. Are there conditions under which LS(S1 ) = LS(S2 ), for S1 6= S2 ?

2. Is it always possible to find S so that LS(S) = V?

3. Suppose we have found S ⊆ V such that LS(S) = V. Can we find S such that no proper
subset of S spans V?

We try to answer these questions in the subsequent sections.



3.2 Linear Independence


Definition 3.2.1. Let S = {u1 , . . . , um } be a non-empty subset of a vector space V over F.
Then, S is said to be linearly independent if the linear system

α1 u1 + α2 u2 + · · · + αm um = 0, (3.2.1)

in the variables αi ’s, 1 ≤ i ≤ m, has only the trivial solution. If Equation (3.2.1) has a
non-trivial solution then S is said to be linearly dependent.
If S has infinitely many vectors then S is said to be linearly independent if for every
finite subset T of S, T is linearly independent.

Observe that we are solving a linear system over F. Hence, linear independence and depen-
dence depend on F, the set of scalars.
Example 3.2.2. 1. Is the set S a linearly independent set? Give reasons.
(a) Let S = {1 + 2x + x2 , 2 + x + 4x2 , 3 + 3x + 5x2 } ⊆ R[x; 2].
Solution: Consider the system [1 + 2x + x2  2 + x + 4x2  3 + 3x + 5x2 ] [a; b; c] = 0, or equivalently a(1 + 2x + x2 ) + b(2 + x + 4x2 ) + c(3 + 3x + 5x2 ) = 0, in the variables a, b and c. As two polynomials are equal if and only if their coefficients are equal, the above system reduces to the homogeneous system a + 2b + 3c = 0, 2a + b + 3c = 0, a + 4b + 5c = 0. The corresponding coefficient matrix has rank 2 < 3, the number of variables. Hence, the system has a non-trivial solution. Thus, S is a linearly dependent subset of R[x; 2].
(b) S = {1, sin(x), cos(x)} is a linearly independent subset of C([−π, π], R) over R as the system

[1 sin(x) cos(x)] [a; b; c] = 0 ⇔ a · 1 + b · sin(x) + c · cos(x) = 0,    (3.2.2)

in the variables a, b and c has only the trivial solution. To verify this, evaluate Equation (3.2.2) at −π/2, 0 and π/2 to get the homogeneous system a − b = 0, a + c = 0, a + b = 0. Clearly, this system has only the trivial solution.
(c) Let S = {(0, 1, 1)T , (1, 1, 0)T , (1, 0, 1)T }.
Solution: Consider the system [(0, 1, 1)T (1, 1, 0)T (1, 0, 1)T ] [a; b; c] = (0, 0, 0)T in the variables a, b and c. As the rank of the coefficient matrix is 3, the number of variables, the system has only the trivial solution. Hence, S is a linearly independent subset of R3 .
(d) Consider C as a complex vector space and let S = {1, i}.
Solution: Since C is a complex vector space, i · 1 + (−1) · i = i − i = 0 gives a non-trivial linear combination that equals 0. So, S is a linearly dependent subset of the complex vector space C.

(e) Consider C as a real vector space and let S = {1, i}.
Solution: Consider the linear system a · 1 + b · i = 0, in the variables a, b ∈ R. Since a, b ∈ R, equating real and imaginary parts, we get a = b = 0. So, S is a linearly independent subset of the real vector space C.

2. Let A ∈ Mm,n (C). If Rank(A) < m then the rows of A are linearly dependent.
Solution: As Rank(A) < m, there exists an invertible matrix P such that P A = [C; 0]. Thus, 0T = (P A)[m, :] = Σ_{i=1}^{m} pmi A[i, :]. As P is invertible, at least one pmi ≠ 0. Thus, the required result follows.
3. Let A ∈ Mm,n (C). If Rank(A) < n then the columns of A are linearly dependent.
Solution: As Rank(A) < n, by Corollary 2.3.8, there exists an invertible matrix Q such that AQ = [B 0]. Thus, 0 = (AQ)[:, n] = Σ_{i=1}^{n} qin A[:, i]. As Q is invertible, at least one qin ≠ 0. Thus, the required result follows.
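The rank computations used in the solutions above are easy to reproduce numerically. A minimal sketch, assuming numpy is available (the matrices are those of Example 3.2.2):

    import numpy as np

    # part (c): the columns of M are the given vectors; full column rank
    # (3) means only the trivial solution, i.e. linear independence
    M = np.column_stack([(0., 1., 1.), (1., 1., 0.), (1., 0., 1.)])
    print(np.linalg.matrix_rank(M))    # 3

    # part (a): coefficient matrix of the system on the polynomial
    # coefficients; rank 2 < 3 signals linear dependence
    P = np.array([[1., 2., 3.],
                  [2., 1., 3.],
                  [1., 4., 5.]])
    print(np.linalg.matrix_rank(P))    # 2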

3.2.1 Basic Results on Linear Independence

The reader is expected to supply the proof of parts that are not given.

Proposition 3.2.3. Let V be a vector space over F.



1. Then, 0, the zero-vector, cannot belong to a linearly independent set.



2. Then, every subset of a linearly independent set in V is also linearly independent.

3. Then, a set containing a linearly dependent set of V is also linearly dependent.

Proof. Let 0 ∈ S. Then, 1 · 0 = 0. That is, a non-trivial linear combination of some vectors in
S is 0. Thus, the set S is linearly dependent.
We now prove a couple of results which will be very useful in the next section.

Proposition 3.2.4. Let S be a linearly independent subset of a vector space V over F. If


T1 , T2 are two subsets of S such that T1 ∩ T2 = ∅ then, LS(T1 ) ∩ LS(T2 ) = {0}. That is, if
v ∈ LS(T1 ) ∩ LS(T2 ) then v = 0.

Proof. Let v ∈ LS(T1 ) ∩ LS(T2 ). Then, there exist vectors u1 , . . . , uk ∈ T1 , w1 , . . . , wℓ ∈ T2 and scalars αi ’s and βj ’s such that v = Σ_{i=1}^{k} αi ui and v = Σ_{j=1}^{ℓ} βj wj . Thus, we see that Σ_{i=1}^{k} αi ui + Σ_{j=1}^{ℓ} (−βj )wj = 0. As T1 ∩ T2 = ∅, this is a linear combination of distinct vectors in T1 ∪ T2 ⊆ S, a linearly independent set. Hence, each of the αi ’s and βj ’s is zero. That is, v = 0.
We now prove another useful result.

Theorem 3.2.5. Let S = {u1 , . . . , uk } be a non-empty subset of a vector space V over F. If T ⊆ LS(S) has more than k vectors then T is a linearly dependent subset of V.

Proof. Let T = {w1 , . . . , wm }. As wi ∈ LS(S), there exist aij ∈ F such that

wi = ai1 u1 + · · · + aik uk , for 1 ≤ i ≤ m.

So, writing A = [aij ] for the m × k matrix of these coefficients,

[w1 ; . . . ; wm ] = [a11 u1 + · · · + a1k uk ; . . . ; am1 u1 + · · · + amk uk ] = A [u1 ; . . . ; uk ].

As m > k, using Corollary 2.4.6, the linear system xT A = 0T has a non-trivial solution, say y ≠ 0, i.e., yT A = 0T . Thus,

yT [w1 ; . . . ; wm ] = yT (A [u1 ; . . . ; uk ]) = (yT A) [u1 ; . . . ; uk ] = 0T [u1 ; . . . ; uk ] = 0T .

As y ≠ 0, a non-trivial linear combination of the vectors in T is 0. Thus, the set T is a linearly dependent subset of V.

Corollary 3.2.6. Fix n ∈ N. Then, any subset S of Rn with | S | ≥ n + 1 is linearly dependent.

Proof. Observe that Rn = LS({e1 , . . . , en }), where ei = In [:, i], is the i-th column of In . Hence,
using Theorem 3.2.5, the required result follows.

Theorem 3.2.7. Let S be a linearly independent subset of a vector space V over F. Then, for any v ∈ V the set S ∪ {v} is linearly dependent if and only if v ∈ LS(S).



Proof. Let us assume that S ∪ {v} is linearly dependent. Then, there exist vi ’s in S such that the linear system

α1 v1 + · · · + αp vp + αp+1 v = 0    (3.2.3)

in the variables αi ’s has a non-trivial solution, say αi = ci , for 1 ≤ i ≤ p + 1. We claim that cp+1 ≠ 0.
For, if cp+1 = 0 then a non-trivial solution of Equation (3.2.3) gives a non-trivial solution of the linear system α1 v1 + · · · + αp vp = 0 in the variables α1 , . . . , αp . This contradicts Proposition 3.2.3.2 as {v1 , . . . , vp } ⊆ S, a linearly independent set. Thus, cp+1 ≠ 0 and we get

v = −(1/cp+1 ) (c1 v1 + · · · + cp vp ) ∈ LS(v1 , . . . , vp )

as −ci /cp+1 ∈ F, for 1 ≤ i ≤ p. That is, v is a linear combination of v1 , . . . , vp .
Now, assume that v ∈ LS(S). Then, there exist vi ∈ S and ci ∈ F such that v = Σ_{i=1}^{p} ci vi . Thus, the linear system α1 v1 + · · · + αp vp + αp+1 v = 0 in the variables αi ’s has the non-trivial solution [c1 , . . . , cp , −1]. Hence, S ∪ {v} is linearly dependent.
We now state a very important corollary of Theorem 3.2.7 without proof. This result can
also be used as an alternative definition of linear independence and dependence.

Corollary 3.2.8. Let V be a vector space over F and let S be a subset of V containing a
non-zero vector u1 .

1. If S is linearly dependent then, there exists k such that LS(u1 , . . . , uk ) = LS(u1 , . . . , uk−1 ).
Or equivalently, if S is a linearly dependent set then there exists a vector uk , for k ≥ 2,
which is a linear combination of the previous vectors.

2. If S is linearly independent then v ∈ V \ LS(S) if and only if S ∪ {v} is also a linearly independent subset of V.

3. If S is linearly independent then, LS(S) = V if and only if each proper superset of S is


linearly dependent.

3.2.2 Application to Matrices

We start with our understanding of the RREF.

Theorem 3.2.9. Let A ∈ Mm,n (C). Then, the rows of A corresponding to the pivotal rows of
RREF(A) are linearly independent. Also, the columns of A corresponding to the pivotal columns
of RREF(A) are linearly independent.

Proof. Let RREF(A) = B. Then, the pivotal rows of B are linearly independent due to the
pivotal 1’s. Now, let B1 be the submatrix of B consisting of the pivotal rows of B. Also, let
A1 be the submatrix of A whose rows corresponds to the rows of B1 . As the RREF of a matrix
is unique (see Corollary 2.2.18) there exists an invertible matrix Q such that QA1 = B1 . So, if there exists c ≠ 0 such that cT A1 = 0T then

0T = cT A1 = cT (Q−1 B1 ) = (cT Q−1 )B1 = dT B1 ,

with dT = cT Q−1 ≠ 0T as Q is an invertible matrix (see Theorem 2.5.1). This contradicts the
linear independence of the rows of B1 .
Let B[:, i1 ], . . . , B[:, ir ] be the pivotal columns of B. Then, they are linearly independent
due to pivotal 1’s. As B = RREF(A), there exists an invertible matrix P such that B = P A.
Then, the corresponding columns of A satisfy

[A[:, i1 ], . . . , A[:, ir ]] = [P −1 B[:, i1 ], . . . , P −1 B[:, ir ]] = P −1 [B[:, i1 ], . . . , B[:, ir ]].


   
As P is invertible, the systems [A[:, i1 ], . . . , A[:, ir ]] [x1 ; . . . ; xr ] = 0 and [B[:, i1 ], . . . , B[:, ir ]] [x1 ; . . . ; xr ] = 0
are row-equivalent. Thus, they have the same solution set. Hence, {A[:, i1 ], . . . , A[:, ir ]} is
linearly independent if and only if {B[:, i1 ], . . . , B[:, ir ]} is linear independent. Thus, the required
result follows.
The next result follows directly from Theorem 3.2.9 and hence the proof is left to readers.

Corollary 3.2.10. The following statements are equivalent for A ∈ Mn (C).


1. A is invertible.
2. The columns of A are linearly independent.

3. The rows of A are linearly independent.

We give an example for better understanding.


   
Example 3.2.11. Let A = [1 1 1 0; 1 0 −1 1; 2 1 0 1; 1 1 1 2] with RREF(A) = B = [1 0 −1 0; 0 1 2 0; 0 0 0 1; 0 0 0 0].
1. Then, B[:, 3] = −B[:, 1] + 2B[:, 2]. Thus, A[:, 3] = −A[:, 1] + 2A[:, 2].
2. As the 1-st, 2-nd and 4-th columns of B are linearly independent, the set {A[:, 1], A[:
, 2], A[:, 4]} is linearly independent.
3. Also, note that during the application of GJE, the 3-rd and 4-th rows were interchanged.
Hence, the rows A[1, :], A[2, :] and A[4, :] are linearly independent.
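For matrices such as the one above, the RREF and the pivotal columns can be computed mechanically. A minimal sketch, assuming sympy is available:

    # rref() returns the row-reduced echelon form together with the
    # indices of the pivotal columns (0-based).
    import sympy as sp

    A = sp.Matrix([[1, 1, 1, 0],
                   [1, 0, -1, 1],
                   [2, 1, 0, 1],
                   [1, 1, 1, 2]])

    B, pivots = A.rref()
    print(B)        # matches the matrix B displayed above
    print(pivots)   # (0, 1, 3): the 1-st, 2-nd and 4-th columns are pivotal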

3.2.3 Linear Independence and Uniqueness of Linear Combination

We end this section with a result which states that a linear combination with respect to a linearly independent set is unique.

Lemma 3.2.12. Let S be a linearly independent subset of a vector space V over F. Then, each
v ∈ LS(S) is a unique linear combination of vectors from S.
Proof. Suppose there exists v ∈ LS(S) with v ∈ LS(T1 ) and v ∈ LS(T2 ) for some T1 , T2 ⊆ S. Let T1 = {v1 , . . . , vk } and T2 = {w1 , . . . , wℓ }, for some vi ’s and wj ’s in S. Define T = T1 ∪ T2 . Then,

T is a subset of S. Hence, using Proposition 3.2.3, the set T is linearly independent. Let T =
{u1 , . . . , up }. Then, there exist αi ’s and βj ’s in F, not all zero, such that v = α1 u1 + · · · + αp up
as well as v = β1 u1 + · · · + βp up . Equating the two expressions for v gives

(α1 − β1 )u1 + · · · + (αp − βp )up = 0. (3.2.4)

As T is a linearly independent subset of V, the system c1 u1 + · · · + cp up = 0, in the variables


c1 , . . . , cp , has only the trivial solution. Thus, in Equation (3.2.4), αi − βi = 0, for 1 ≤ i ≤ p.
Thus, for 1 ≤ i ≤ p, αi = βi and the required result follows.

Exercise 3.2.13. 1. Suppose V is a vector space over R as well as over C. Then, prove that
{u1 , . . . , uk } is a linearly independent subset of V over C if and only if {u1 , . . . , uk , iu1 , . . . , iuk }
is a linearly independent subset of V over R.

2. Is the set {1, x, x2 , . . .} a linearly independent subset of the vector space C[x] over C?

3. Is the set {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n} a linearly independent subset of the vector space


Mm,n (C) over C (see Definition 1.3.1.1)?

4. Let W be a subspace of a vector space V over F. For u, v ∈ V \ W, define K = LS(W, u)


and M = LS(W, v). Then, prove that v ∈ K if and only if u ∈ M .

5. Prove that

(a) the rows/columns of A ∈ Mn (C) are linearly independent if and only if det(A) ≠ 0.
(b) the rows/columns of A ∈ Mn (C) span Cn if and only if A is an invertible matrix.
(c) the rows/columns of a skew-symmetric matrix A of odd order are linearly dependent.

6. Let V and W be subspaces of Rn such that V + W = Rn and V ∩ W = {0}. Prove that


each u ∈ Rn is uniquely expressible as u = v + w, where v ∈ V and w ∈ W.

7. Let S1 = {u1 , . . . , un } and S2 = {w1 , . . . , wn } be subsets of a complex vector space V. Also, let [w1 · · · wn ] = [u1 · · · un ] A for some matrix A ∈ Mn (C).
(a) If A = [aij ] is invertible then S1 is linearly independent if and only if S2 is linearly independent.
Hint: Suppose S2 is linearly independent and consider the linear system Σ_{i=1}^{n} αi ui = 0 in the variables αi ’s. Then

0 = Σ_{i=1}^{n} αi ui = Σ_{i=1}^{n} αi ( Σ_{j=1}^{n} (A−1 )ji wj ) = Σ_{j=1}^{n} ( Σ_{i=1}^{n} (A−1 )ji αi ) wj .

As S2 is linearly independent, Σ_{i=1}^{n} (A−1 )ji αi = 0, for 1 ≤ j ≤ n. Or equivalently, A−1 [α1 ; . . . ; αn ] = 0. Thus αi = 0 for all i and hence the set S1 is linearly independent.
(b) If S2 is linearly independent then prove that A is invertible. Further, in this case, the set S1 is necessarily linearly independent.
Hint: Suppose A is not invertible. Then there exists x0 = [x01 , . . . , x0n ]T ≠ 0 such that Ax0 = 0. Thus, we have obtained x0 ≠ 0 such that

[w1 · · · wn ] x0 = ([u1 · · · un ] A) x0 = [u1 · · · un ] (Ax0 ) = [u1 · · · un ] 0 = 0,

a contradiction to S2 being a linearly independent set.
a contradiction to S2 being a linearly independent set.

8. Let S = {u1 , . . . , un } ⊆ Cn and T = {Au1 , . . . , Aun }, for some matrix A ∈ Mn (C).


(a) If S is linearly dependent then prove that T is linear dependent.
(b) If S is linearly independent then prove that T is linearly independent for every in-
vertible matrix A.
(c) If T is linearly independent then S is linearly independent. Further, in this case, the
matrix A is necessarily invertible.

9. Let S = {(1, 1, 1, 1)T , (1, −1, 1, 2)T , (1, 1, −1, 1)T } ⊆ R4 . Does (1, 1, 2, 1)T ∈ LS(S)? Fur-
thermore, determine conditions on x, y, z and u such that (x, y, z, u)T ∈ LS(S).

10. Show that S = {(1, 2, 3)T , (−2, 1, 1)T , (8, 6, 10)T } ⊆ R3 is linearly dependent.

11. Find u, v, w ∈ R4 such that {u, v, w} is linearly dependent whereas {u, v}, {u, w} and
{v, w} are linearly independent.

12. Let A ∈ Mn (R). Suppose x, y ∈ Rn \ {0} such that Ax = 3x and Ay = 2y. Then, prove
that x and y are linearly independent.
 
13. Let A = [2 1 3; 4 −1 3; 3 −2 5]. Determine x, y, z ∈ R3 \ {0} such that Ax = 6x, Ay = 2y and Az = −2z. Use the vectors x, y and z obtained above to prove the following.

(a) A2 v = 4v, where v = cy + dz for any c, d ∈ R.
(b) The set {x, y, z} is linearly independent.
(c) Let P = [x, y, z] be a 3 × 3 matrix. Then, P is invertible.
(d) Let D = [6 0 0; 0 2 0; 0 0 −2]. Then, AP = P D.

3.3 Basis of a Vector Space


Definition 3.3.1. Let S be a subset of a set T . Then, S is said to be a maximal subset of
T having property P if
1. S has property P and
2. no proper superset of S in T has property P .

Example 3.3.2. Let T = {2, 3, 4, 7, 8, 10, 12, 13, 14, 15}. Then, a maximal subset of T of consecutive integers is S = {2, 3, 4}. Other maximal subsets are {7, 8}, {10} and {12, 13, 14, 15}. Note that {12, 13} is not maximal. Why?

Definition 3.3.3. Let V be a vector space over F. Then, S is called a maximal linearly
independent subset of V if
1. S is linearly independent and
2. no proper superset of S in V is linearly independent.
Example 3.3.4. 1. In R3 , the set S = {e1 , e2 } is linearly independent but not maximal as
S ∪ {(1, 1, 1)T } is a linearly independent set containing S.
2. In R3 , S = {(1, 0, 0)T , (1, 1, 0)T , (1, 1, −1)T } is a maximal linearly independent set as S is
linearly independent and any collection of 4 or more vectors from R3 is linearly dependent
(see Corollary 3.2.6).
3. Let S = {v1 , . . . , vk } ⊆ Rn . Now, form the matrix A = [v1 , . . . , vk ] and let B =
RREF(A). Then, using Theorem 3.2.9, we see that if B[:, i1 ], . . . , B[:, ir ] are the pivotal
columns of B then {vi1 , . . . , vir } is a maximal linearly independent subset of S.
4. Is the set {1, x, x2 , . . .} a maximal linearly independent subset of C[x] over C?
5. Is the set {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n} a maximal linearly independent subset of Mm,n (C)
over C?

Theorem 3.3.5. Let V be a vector space over F and S a linearly independent set in V. Then,
S is maximal linearly independent if and only if LS(S) = V.

Proof. Let v ∈ V. As S is linearly independent, using Corollary 3.2.8.2, the set S ∪ {v} is
linearly independent if and only if v ∈ V \ LS(S). Thus, the required result follows.
Let V = LS(S) for some set S with | S | = k. Then, using Theorem 3.2.5, we see that if
T ⊆ V is linearly independent then | T | ≤ k. Hence, a maximal linearly independent subset
of V can have at most k vectors. Thus, we arrive at the following important result.

Theorem 3.3.6. Let V be a vector space over F and let S and T be two finite maximal linearly
independent subsets of V. Then, | S | = | T | .

Proof. By Theorem 3.3.5, S and T are maximal linearly independent if and only if LS(S) =
V = LS(T ). Now, use the previous paragraph to get the required result.
Let V be a finite dimensional vector space. Then, by Theorem 3.3.6, the number of vectors
in any two maximal linearly independent set is the same. We use this number to define the
dimension of a vector space. We do so now.

Definition 3.3.7. Let V be a finite dimensional vector space over F. Then, the number of
vectors in any maximal linearly independent set is called the dimension of V, denoted dim(V).
By convention, dim({0}) = 0.
Example 3.3.8. 1. As {1} is a maximal linearly independent subset of R, dim(R) = 1.

2. As {e1 , e2 , e3 } ⊆ R3 is maximal linearly independent, dim(R3 ) = 3.



3. As {e1 , . . . , en } is a maximal linearly independent subset in Rn , dim(Rn ) = n.



4. As {e1 , . . . , en } is a maximal linearly independent subset in Cn over C, dim(Cn ) = n.



5. Using Exercise 3.2.13.1, {e1 , . . . , en , ie1 , . . . , ien } is a maximal linearly independent subset
in Cn over R. Thus, as a real vector space, dim(Cn ) = 2n.

6. As {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a maximal linearly independent subset of Mm,n (C) over


C, dim(Mm,n (C)) = mn.

Definition 3.3.9. Let V be a vector space over F. Then, a maximal linearly independent subset
of V is called a basis/Hamel basis of V. The vectors in a basis are called basis vectors. By
convention, a basis of {0} is the empty set.

Existence of Hamel basis


Definition 3.3.10. Let V be a vector space over F. Then, a subset S of V is called minimal
spanning if LS(S) = V and no proper subset of S spans V.

Remark 3.3.11 (Standard Basis). The readers should verify the statements given below.

1. All the maximal linearly independent set given in Example 3.3.8 form the standard basis
of the respective vector space.

2. {1, x, x2 , . . .} is the standard basis of R[x] over R.

3. Fix a positive integer n. Then, {1, x, x2 , . . . , xn } is the standard basis of R[x; n] over R.

4. Let V = {A ∈ Mn (R) | A = AT }. Then, V is a vector space over R with standard basis


{eii , eij + eji | 1 ≤ i < j ≤ n}.

5. Let V = {A ∈ Mn (R) | AT = −A}. Then, V is a vector space over R with standard basis
{eij − eji | 1 ≤ i < j ≤ n}.
Example 3.3.12. 1. Note that {−2} is a basis and a minimal spanning subset in R.

2. Let u1 , u2 , u3 ∈ R2 . Then, {u1 , u2 , u3 } can neither be a basis nor a minimal spanning


subset of R2 .

3. {(1, 1, −1)T , (1, −1, 1)T , (−1, 1, 1)T } is a basis and a minimal spanning subset of R3 .

4. Let V = {(x, y, 0)T | x, y ∈ R} ⊆ R3 . Then, B = {(1, 0, 0)T , (1, 3, 0)T } is a basis of V.

5. Let V = {(x, y, z)T ∈ R3 | x + y − z = 0} ⊆ R3 . Each element (x, y, z)T ∈ V satisfies x + y − z = 0, or equivalently z = x + y. Hence, we see that

(x, y, z) = (x, y, x + y) = (x, 0, x) + (0, y, y) = x(1, 0, 1) + y(0, 1, 1).

Hence, {(1, 0, 1)T , (0, 1, 1)T } forms a basis of V.

6. Let S = {a1 , . . . , an }. Then, R^S is a real vector space (see Example 3.1.4.9). For 1 ≤ i ≤ n, define the functions

ei (aj ) = 1 if j = i, and ei (aj ) = 0 otherwise.

Then, prove that B = {e1 , . . . , en } is a linearly independent subset of R^S over R. Is it a basis of R^S over R? What can you say if S is a countable set?

7. Let S = Rn and consider the vector space R^S (see Example 3.1.4.9). For 1 ≤ i ≤ n, define the functions ei (x) = ei (x1 , . . . , xn ) = xi . Then, verify that {e1 , . . . , en } is a linearly independent subset of R^S over R. Is it a basis of R^S over R?

8. Let S = {v1 , . . . , vk } ⊆ Rn . Define A = [v1 , . . . , vk ]. Then, using Example 3.3.4.3,


we see that dim(LS(S)) = Rank(A). Further, using Theorem 3.2.9, the columns of A
corresponding to the pivotal columns in RREF(A) form a basis of LS(S).

9. Recall the vector space C[a, b], where a < b ∈ R. For each α ∈ [a, b], define

fα (x) = x − α, for all x ∈ [a, b].

Suppose α, β and γ are three distinct real numbers in [a, b]. Then prove that {fα , fβ , fγ } is a linearly dependent subset of C[a, b].

Ans: Write β = aα + (1 − a)γ with a = (β − γ)/(α − γ), i.e., β is an affine combination of α and γ. Then a fα + (1 − a) fγ = fβ .

3.3.1 Main Results associated with Bases

Theorem 3.3.13. Let V be a non-zero vector space over F. Then, the following statements are
equivalent.

1. B is a basis (maximal linearly independent subset) of V.

2. B is linearly independent and spans V.

3. B is a minimal spanning set in V.

Proof. 1 ⇒ 2 By definition, every basis is a maximal linearly independent subset of V.


Thus, using Corollary 3.2.8.2, we see that B spans V.
2 ⇒ 3 Let S be a linearly independent set that spans V. As S is linearly independent, for any x ∈ S, x ∉ LS(S − {x}). Hence, LS(S − {x}) ⊊ LS(S) = V.
3 ⇒ 1 If B is linearly dependent then using Corollary 3.2.8.1, B is not minimal
spanning. A contradiction. Hence, B is linearly independent.
We now need to show that B is a maximal linearly independent set. Since LS(B) = V, for
any x ∈ V \ B, using Corollary 3.2.8.2, the set B ∪ {x} is linearly dependent. That is, every
proper superset of B is linearly dependent. Hence, the required result follows.
Now, using Lemma 3.2.12, we get the following result.

Remark 3.3.14. Let B be a basis of a vector space V over F. Then, for each v ∈ V, there exist unique ui ∈ B and unique αi ∈ F, for 1 ≤ i ≤ n, such that v = Σ_{i=1}^{n} αi ui .

The next result is generally known as “every linearly independent set can be extended to
form a basis of a finite dimensional vector space”.

Theorem 3.3.15. Let V be a vector space over F with dim(V) = n. If S is a linearly independent subset of V then there exists a basis T of V such that S ⊆ T .

Proof. If LS(S) = V, we are done. Else, choose u1 ∈ V \ LS(S). Then, by Corollary 3.2.8.2, the set S ∪ {u1 } is linearly independent. We repeat this process until the enlarged set spans V; since dim(V) = n, the process stops after at most n vectors have been collected. By Theorem 3.3.13, the resulting set T is indeed a required basis.

3.3.2 Constructing a Basis of a Finite Dimensional Vector Space

We end this section with an algorithm which is based on the proof of the previous theorem.

Step 1: Let v1 ∈ V with v1 6= 0. Then, {v1 } is linearly independent.

Step 2: If V = LS(v1 ), we have got a basis of V. Else, pick v2 ∈ V \ LS(v1 ). Then, by


Corollary 3.2.8.2, {v1 , v2 } is linearly independent.

Step i: Either V = LS(v1 , . . . , vi ) or LS(v1 , . . . , vi ) ⊊ V. In the first case, {v1 , . . . , vi } is


a basis of V. Else, pick vi+1 ∈ V \ LS(v1 , . . . , vi ). Then, by Corollary 3.2.8.2, the set
{v1 , . . . , vi+1 } is linearly independent.

This process will finally end as V is a finite dimensional vector space.
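In Rn , the construction above can be carried out by a rank test: a new vector is kept exactly when it lies outside the span of the vectors kept so far. A minimal sketch, assuming numpy is available (the function name and sample vectors are our own choices):

    import numpy as np

    def grow_basis(vectors):
        # keep v exactly when it increases the rank, i.e. when
        # v lies outside LS(basis); mirrors Steps 1, 2, ..., i above
        basis = []
        for v in vectors:
            if np.linalg.matrix_rank(np.column_stack(basis + [v])) > len(basis):
                basis.append(v)
        return basis

    S = [np.array([1., 0., 1.]), np.array([2., 0., 2.]),   # dependent pair
         np.array([0., 1., 0.]), np.array([0., 0., 1.])]
    print(grow_basis(S))   # keeps (1,0,1), (0,1,0) and (0,0,1)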

Exercise 3.3.16. 1. Let B = {u1 , . . . , un } be a basis of a vector space V over F. Then, does the condition Σ_{i=1}^{n} αi ui = 0 in the αi ’s imply that αi = 0, for 1 ≤ i ≤ n?

2. Let S = {v1 , . . . , vp } be a subset of a vector space V over F. Suppose LS(S) = V but S


is not a linearly independent set. Then, does this imply that each v ∈ V is expressible
in more than one way as a linear combination of vectors from S? Is it possible to get a
subset T of S such that T is a basis of V over F? Give reasons for your answer.

3. Let V be a vector space of dimension n. Then,

(a) prove that any set consisting of n linearly independent vectors forms a basis of V.
(b) prove that if S is a subset of V having n vectors with LS(S) = V then, S forms a
basis of V.

4. Let {v1 , . . . , vn } be a basis of Cn . Then, prove that the two matrices B = [v1 , . . . , vn ] and C = [v1T ; . . . ; vnT ] (the matrix whose rows are v1T , . . . , vnT ) are invertible.

5. Let A ∈ Mn (C) be an invertible matrix. Then, prove that the rows/columns of A form a
basis of Cn over C.

6. Let W1 and W2 be two subspaces of a finite dimensional vector space V such that W1 ⊆ W2 .
Then, prove that W1 = W2 if and only if dim(W1 ) = dim(W2 ).
7. Let W1 be a non-trivial subspace of a finite dimensional vector space V over F. Then, prove that there exists a subspace W2 of V such that

W1 ∩ W2 = {0}, W1 + W2 = V and dim(W2 ) = dim(V) − dim(W1 ).

Also, prove that for each v ∈ V there exist unique vectors w1 ∈ W1 and w2 ∈ W2 with
v = w1 + w2 . The subspace W2 is called the complementary subspace of W1 in V.

8. Let V be a finite dimensional vector space over F. If W1 and W2 are two subspaces of V
such that W1 ∩W2 = {0} and dim(W1 )+dim(W2 ) = dim(V) then prove that W1 +W2 = V.

9. Consider the vector space C([−π, π]) over R. For each n ∈ N, define en (x) = sin(nx).
Then, prove that S = {en | n ∈ N} is linearly independent. [Hint: Need to show that every
finite subset of S is linearly independent. So, on the contrary assume that there exists ` ∈ N and
functions ek1 , . . . , ek` such that α1 ek1 + · · · + α` ek` = 0, for some αt 6= 0 with 1 ≤ t ≤ `. But,
the above system is equivalent to looking at α1 sin(k1 x) + · · · + α` sin(k` x) = 0 for all x ∈ [−π, π].
Now in the integral

∫_{−π}^{π} sin(mx) (α1 sin(k1 x) + · · · + αℓ sin(kℓ x)) dx = ∫_{−π}^{π} sin(mx) · 0 dx = 0

replace m with the ki ’s to show that αi = 0, for all i, 1 ≤ i ≤ ℓ. This gives the required contradiction.]

10. Is the set {1, sin(x), cos(x), sin(2x), cos(2x), sin(3x), cos(3x), . . .} a linearly independent subset of the vector space C([−π, π], R) over R?

11. Find a basis of R3 containing the vector (1, 1, −2)T .



12. Find a basis of R3 containing the vector (1, 1, −2)T and (1, 2, −1)T .

13. Is it possible to find a basis of R4 containing the vectors (1, 1, 1, −2)T , (1, 2, −1, 1)T and
(1, −2, 7, −11)T ?

14. Show that B = {(1, 0, 1)T , (1, i, 0)T , (1, 1, 1 − i)T } is a basis of C3 over C.

15. Find a basis of C3 over R containing the basis B given in Exercise 3.3.16.14.

16. Determine a basis and dimension of W = {(x, y, z, w)T ∈ R4 | x + y − z + w = 0}.

17. Find a basis of V = {(x, y, z, u) ∈ R4 | x − y − z = 0, x + z − u = 0}.


 
18. Let A = [1 0 1 1 0; 0 1 2 3 0; 0 0 0 0 1]. Find a basis of V = {x ∈ R5 | Ax = 0}.

19. Let uT = (1, 1, −2), vT = (−1, 2, 3) and wT = (1, 10, 1). Find a basis of LS(u, v, w).
Determine a geometrical representation of LS(u, v, w).

20. Is the set W = {p(x) ∈ R[x; 4] | p(−1) = p(1) = 0} a subspace of R[x; 4]? If yes, find its
dimension.

3.4 Fundamental Subspaces Associated with a Matrix


In this section, we will study results that are intrinsic to the understanding of linear algebra from the point of view of matrices. So, we start with defining the four fundamental subspaces associated with a matrix.

Definition 3.4.1. Let A ∈ Mm,n (C). Then, we define the four fundamental subspaces associ-
ated with A as
1. Col(A) = {Ax | x ∈ Cn } is a subspace of Cm , called the Column space, and is the
linear span of the columns of A.
2. Row(A) = Col(AT ) = {AT x | x ∈ Cm } is a subspace of Cn , called the row space of A
and is the linear span of the rows of A.
3. Col(A∗ ) = {A∗ x | x ∈ Cm }.
4. Null(A) = {x ∈ Cn | Ax = 0}, called the Null space of A.
5. Null(A∗ ) = {x ∈ Cm | A∗ x = 0}.

Remark 3.4.2. Let A ∈ Mm,n (C).


1. Then, Col(A) is a subspace of Cm and Col(A∗ ) is a subspace of Cn .
2. Then, Null(A) is a subspace of Cn and Null(A∗ ) is a subspace of Cm .

Example 3.4.3. 1. Compute the fundamental subspaces for A = [1 1 1 −2; 1 2 −1 1; 1 −2 7 −11].
Solution: Verify the following

(a) Row(A) = {(x, y, z, u)T ∈ C4 | 3x − 2y = z, 5x − 3y + u = 0}.


(b) Col(A) = {(x, y, z)T ∈ C3 | 4x − 3y − z = 0}.
(c) Null(A) = {(x, y, z, u)T ∈ C4 | x + 3z − 5u = 0, y − 2z + 3u = 0}.
(d) Null(AT ) = {(x, y, z)T ∈ C3 | x + 4z = 0, y − 3z = 0}.
 
2. Let A = [1 1 0 −1; 1 −1 1 2; 2 0 1 1]. Then, verify that

(a) Col(A) = {x = (x1 , x2 , x3 )T ∈ R3 | x1 + x2 − x3 = 0}.


(b) Row(A) = {x = (x1 , x2 , x3 , x4 )T ∈ R4 | x1 − x2 − 2x3 = 0, x1 − 3x2 − 2x4 = 0}.
(c) Null(A) = LS({(1, −1, −2, 0)T , (1, −3, 0, −2)T }).
(d) Null(AT ) = LS((1, 1, −1)T ).
 
3. Let A = [1 1 0 1 1 0 −1; 0 0 1 2 3 0 −2; 0 0 0 0 0 1 1]. Find a basis and dimension of Null(A).
Solution: Writing the basic variables x1 , x3 and x6 in terms of the free variables x2 , x4 , x5 and x7 , we get x1 = x7 − x2 − x4 − x5 , x3 = 2x7 − 2x4 − 3x5 and x6 = −x7 . Hence, the

solution set has the form



           
(x1 , x2 , x3 , x4 , x5 , x6 , x7 )T = x2 (−1, 1, 0, 0, 0, 0, 0)T + x4 (−1, 0, −2, 1, 0, 0, 0)T + x5 (−1, 0, −3, 0, 1, 0, 0)T + x7 (1, 0, 2, 0, 0, −1, 1)T . (3.4.1)

Now, let uT1 = (−1, 1, 0, 0, 0, 0, 0), uT2 = (−1, 0, −2, 1, 0, 0, 0), uT3 = (−1, 0, −3, 0, 1, 0, 0) and uT4 = (1, 0, 2, 0, 0, −1, 1). Then, S = {u1 , u2 , u3 , u4 } is a basis of Null(A). The
reasons for S to be a basis are as follows:

(a) By Equation (3.4.1) Null(A) = LS(S).


(b) For linear independence, the homogeneous system c1 u1 + c2 u2 + c3 u3 + c4 u4 = 0 in
the variables c1 , c2 , c3 and c4 has only the trivial solution as
i. u4 is the only vector with a nonzero entry at the 7-th place (u4 corresponds to
x7 ) and hence c4 = 0.
ii. u3 is the only vector with a nonzero entry at the 5-th place (u3 corresponds to
x5 ) and hence c3 = 0.
iii. Similar arguments hold for the variables c2 and c1 .
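The basis S obtained above can also be cross-checked mechanically. The following sketch uses Python with SymPy (an assumption; the notes themselves use no software, and any CAS with a null-space routine would do):

    from sympy import Matrix

    A = Matrix([[1, 1, 0, 1, 1, 0, -1],
                [0, 0, 1, 2, 3, 0, -2],
                [0, 0, 0, 0, 0, 1, 1]])
    basis = A.nullspace()        # a basis of Null(A), computed from RREF(A)
    print(len(basis))            # 4, one vector per free variable x2, x4, x5, x7
    for v in basis:
        assert A * v == Matrix([0, 0, 0])   # every basis vector solves Ax = 0

SymPy's basis vectors may differ from u1 , . . . , u4 in order or sign, but they span the same subspace Null(A).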

Remark 3.4.4. Let A ∈ Mm,n (R). Then, in Example 3.4.3, observe that the direction ratios of the normal vectors of Col(A) match with vectors in Null(AT ). Similarly, the direction ratios of the normal vectors of Row(A) match with vectors in Null(A). Are these true in the general
setting? Do similar relations hold if A ∈ Mm,n (C)? We will come back to these spaces again
and again.

Exercise 3.4.5. 1. For the matrices given below, determine Col(A), Row(A), Null(A) and Null(AT ). Further, find the dimensions of all the vector subspaces so obtained.

A = [1 2 1 3 2; 0 2 2 2 4; 2 −2 4 0 8; 4 2 5 6 10], B = [2 4 0 6; −1 0 −2 5; −3 −5 1 −4; −1 −1 1 2] and C = [1 i 2i; i −2 −3; 1 1 1 + i].
Ans: Verify that Col(C) = {(x1 , x2 , x3 ) ∈ C3 | (2 + i)x1 − (1 − i)x2 − x3 = 0}.
Col(C ∗ ) = {(x1 , x2 , x3 ) ∈ C3 | ix1 − x2 + x3 = 0}. Null(C) = LS((i, 1, −1)T ).
Null(C ∗ ) = LS((−2 + i, 1 + i, 1)T ).

2. Let A = [X Y ]. Then, determine the condition under which Col(X) = Col(Y ).

The next result is a re-writing of the results on system of linear equations. We give the
proof for the sake of completeness.

Lemma 3.4.6. Let A ∈ Mm×n (C) and let E be an elementary matrix. If



1. B = EA then

(a) Null(A) = Null(B), Row(A) = Row(B). Thus, the dimensions of the corre-
sponding spaces are equal.
(b) Null(Ā) = Null(B̄), Row(Ā) = Row(B̄), where Ā denotes the entry-wise conjugate of A. Thus, the dimensions of the corresponding spaces are equal.

2. B = AE then

(a) Null(A∗ ) = Null(B ∗ ), Col(A) = Col(B). Thus, the dimensions of the corre-
sponding spaces are equal.
(b) Null(AT ) = Null(B T ), Col(A) = Col(B). Thus, the dimensions of the corre-
sponding spaces are equal.

Proof. Part 1a: Let x ∈ Null(A). Then, Bx = EAx = E0 = 0. So, Null(A) ⊆ Null(B).
Further, if x ∈ Null(B), then Ax = (E −1 E)Ax = E −1 (EA)x = E −1 Bx = E −1 0 = 0. Hence,
Null(B) ⊆ Null(A). Thus, Null(A) = Null(B).
Let us now prove Row(A) = Row(B). So, let xT ∈ Row(A). Then, there exists y ∈ Cm
such that xT = yT A. Thus, xT = yT E −1 EA = (yT E −1 )B and hence xT ∈ Row(B). That is, Row(A) ⊆ Row(B). A similar argument gives Row(B) ⊆ Row(A) and hence the required
result follows.
Part 1b: As E is invertible, the conjugate matrix Ē is also invertible and B̄ = Ē Ā. Thus, an argument similar to the previous part gives us the required result.

For Part 2, note that B ∗ = E ∗ A∗ and E ∗ is invertible. Hence, an argument similar to the
first part gives the required result.
Let W1 and W2 be two subspaces of a vector space V over F. Then, recall that (see
Exercise 3.1.24.4d) W1 + W2 = {u + v | u ∈ W1 , v ∈ W2 } = LS(W1 ∪ W2 ) is the smallest
subspace of V containing both W1 and W2 . We now state a result similar to the counting formula | A | + | B | = | A ∪ B | + | A ∩ B |, which holds whenever the sets A and B are finite (for a proof, see Appendix 7.4.1).

Theorem 3.4.7. Let V be a finite dimensional vector space over F. If W1 and W2 are two
subspaces of V then

dim(W1 ) + dim(W2 ) = dim(W1 + W2 ) + dim(W1 ∩ W2 ). (3.4.2)

For better understanding, we give an example for finite subsets of Rn . The example uses
Theorem 3.2.9 to obtain bases of LS(S), for different choices of S. The readers are advised to see
Example 3.2.9 before proceeding further.

Example 3.4.8. Let V = {(v, w, x, y, z)T ∈ R5 | v + x + z = 3y} and W = {(v, w, x, y, z)T ∈


R5 | w − x = z, v = y}. Find bases of V and W containing a basis of V ∩ W.
Solution: One can first find a basis of V ∩ W and then heuristically add a few vectors to get
bases for V and W, separately.
Alternatively, first find bases of V, W and V ∩ W, say BV , BW and B, respectively. Now, consider S = B ∪ BV . This set is linearly dependent. So, obtain a linearly independent subset of S that contains all the elements of B. Similarly, proceed for T = B ∪ BW .


So, we first find a basis of V ∩ W. Note that (v, w, x, y, z)T ∈ V ∩ W if v, w, x, y and z satisfy
v + x − 3y + z = 0, w − x − z = 0 and v = y. The solution of the system is given by

(v, w, x, y, z)T = (y, 2y, x, y, 2y − x)T = y(1, 2, 0, 1, 2)T + x(0, 0, 1, 0, −1)T .

Thus, B = {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T } is a basis of V∩W. Similarly, a basis of V is given
by C = {(−1, 0, 1, 0, 0)T , (0, 1, 0, 0, 0)T , (3, 0, 0, 1, 0)T , (−1, 0, 0, 0, 1)T } and that of W is given by
D = {(1, 0, 0, 1, 0)T , (0, 1, 1, 0, 0)T , (0, 1, 0, 0, 1)T }. To find the required bases, form a matrix whose rows are the vectors in B, C and D (see Equation (3.4.3)) and apply row operations other than the row exchanges Eij . Then, after a few row operations, we get
   
[ 1 2 0 1  2;        [1 2 0 1  2;
  0 0 1 0 −1;         0 0 1 0 −1;
 −1 0 1 0  0;         0 1 0 0  0;
  0 1 0 0  0;         0 0 0 1  3;
  3 0 0 1  0;    →    0 0 0 0  0;      (3.4.3)
 −1 0 0 0  1;         0 0 0 0  0;
  1 0 0 1  0;         0 1 0 0  1;
  0 1 1 0  0;         0 0 0 0  0;
  0 1 0 0  1]         0 0 0 0  0]

Thus, a required basis of V is {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T , (0, 1, 0, 0, 0)T , (0, 0, 0, 1, 3)T }. Sim-
ilarly, a required basis of W is {(1, 2, 0, 1, 2)T , (0, 0, 1, 0, −1)T , (0, 1, 0, 0, 1)T }.
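As a sanity check of Equation (3.4.2) for this example, the dimensions can be computed directly from the defining linear constraints. A minimal sketch, assuming SymPy is available:

    from sympy import Matrix

    # V : v + x + z = 3y;  W : w - x = z, v = y  (coordinates (v, w, x, y, z))
    CV = Matrix([[1, 0, 1, -3, 1]])
    CW = Matrix([[0, 1, -1, 0, -1], [1, 0, 0, -1, 0]])

    dimV = 5 - CV.rank()                    # = 4
    dimW = 5 - CW.rank()                    # = 3
    dimVW = 5 - CV.col_join(CW).rank()      # dim(V ∩ W) = 2, stacking all constraints
    # By Theorem 3.4.7, dim(V + W) = dimV + dimW - dimVW = 5, i.e., V + W = R^5.
    print(dimV, dimW, dimVW, dimV + dimW - dimVW)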

Exercise 3.4.9. 1. Give an example to show that if A and B are equivalent then Col(A)
need not equal Col(B).

2. Let V = {(x, y, z, w)T ∈ R4 | x + y − z + w = 0, x + y + z + w = 0, x + 2y = 0} and


W = {(x, y, z, w)T ∈ R4 | x − y − z + w = 0, x + 2y − w = 0} be two subspaces of R4 .
Think of a method to find bases and dimensions of V, W, V ∩ W and V + W.

3. Let W1 and W2 be two subspaces of a vector space V. If dim(W1 ) + dim(W2 ) > dim(V),
then prove that dim(W1 ∩ W2 ) ≥ 1.

4. Let A ∈ Mm×n (C) with m < n. Prove that the columns of A are linearly dependent.

We now prove the rank-nullity theorem and give some of its consequences.

Theorem 3.4.10 (Rank-Nullity Theorem). Let A ∈ Mm×n (C). Then,

dim(Col(A)) + dim(Null(A)) = n. (3.4.4)

Proof. Let dim(Null(A)) = r ≤ n and let B = {u1 , . . . , ur } be a basis of Null(A). Since B is


a linearly independent set in Cn , extend it to get C = {u1 , . . . , un } as a basis of Cn . Then,

Col(A) = LS(Au1 , . . . , Aun )



= LS(0, . . . , 0, Aur+1 , . . . , Aun ) = LS(Aur+1 , . . . , Aun ).

So, D = {Aur+1 , . . . , Aun } spans Col(A). We further need to show that D is linearly indepen-
dent. So, consider the linear system

α1 Aur+1 + · · · + αn−r Aun = 0 ⇔ A(α1 ur+1 + · · · + αn−r un ) = 0 (3.4.5)

in the variables α1 , . . . , αn−r . Thus, α1 ur+1 + · · · + αn−r un ∈ Null(A) = LS(B). Therefore,


there exist scalars βi , 1 ≤ i ≤ r, such that α1 ur+1 + · · · + αn−r un = β1 u1 + · · · + βr ur . Or equivalently,

β1 u1 + · · · + βr ur − α1 ur+1 − · · · − αn−r un = 0. (3.4.6)

Equation (3.4.6) is a linear system in vectors from C with αi ’s and βj ’s as unknowns. As C is a


linearly independent set, the only solution of Equation (3.4.6) is

αi = 0, for 1 ≤ i ≤ n − r and βj = 0, for 1 ≤ j ≤ r.

In other words, we have shown that the only solution of Equation (3.4.5) is the trivial solution.
Hence, {Aur+1 , . . . , Aun } is a basis of Col(A). Thus, the required result follows.
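For a concrete instance of Equation (3.4.4), take the matrix of Example 3.4.3.1. The sketch below (SymPy assumed) counts the two dimensions separately and checks that they add up to the number of columns:

    from sympy import Matrix

    A = Matrix([[1, 1, 1, -2], [1, 2, -1, 1], [1, -2, 7, -11]])
    rank = A.rank()                  # dim(Col(A)) = 2
    nullity = len(A.nullspace())     # dim(Null(A)) = 2
    assert rank + nullity == A.cols  # 2 + 2 == 4 == n, as Theorem 3.4.10 predicts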
Theorem 3.4.10 is part of what is known as the fundamental theorem of linear algebra (see
Theorem 3.4.13). The following are some of the consequences of the rank-nullity theorem. The
proofs are left as an exercise for the reader.

Exercise 3.4.11. 1. Let A ∈ Mm,n (C).

(a) If n > m then the system Ax = 0 has infinitely many solutions,


(b) If n < m then there exists b ∈ Cm \ {0} such that Ax = b is inconsistent.

2. The following statements are equivalent for an m × n matrix A.

(a) Rank (A) = k.


(b) There exist a set of k rows of A that are linearly independent.
(c) There exist a set of k columns of A that are linearly independent.
(d) dim(Col(A)) = k.
(e) There exists a k × k submatrix B of A with det(B) ≠ 0. Further, the determinant of
every (k + 1) × (k + 1) submatrix of A is zero.
(f ) There exists a linearly independent subset {b1 , . . . , bk } of Rm such that the system
Ax = bi , for 1 ≤ i ≤ k, is consistent.
(g) dim(Null(A)) = n − k.
3. Let A = x1 y1∗ + · · · + xk yk∗ , for some xi ∈ Cm and yi ∈ Cn . Does this imply that Rank(A) ≤ k?

We end this section by proving the fundamental theorem of linear algebra. We start with

the following result.



Lemma 3.4.12. Let A ∈ Mm,n (R). Then, Null(A) = Null(AT A).

Proof. Let x ∈ Null(A). Then, Ax = 0. So, (AT A)x = AT (Ax) = AT 0 = 0. Thus,


x ∈ Null(AT A). That is, Null(A) ⊆ Null(AT A).
Suppose that x ∈ Null(AT A). Then, (AT A)x = 0 and 0 = xT 0 = xT (AT A)x =
(Ax)T (Ax) = ‖Ax‖2 . Thus, Ax = 0 and the required result follows.
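A quick numerical illustration of Lemma 3.4.12, again with SymPy (an assumption), using the matrix of Example 3.4.3.1:

    from sympy import Matrix

    A = Matrix([[1, 1, 1, -2], [1, 2, -1, 1], [1, -2, 7, -11]])
    B = A.T * A
    assert A.rank() == B.rank()      # equal null spaces force equal ranks
    for v in A.nullspace():
        assert B * v == Matrix([0, 0, 0, 0])   # Null(A) ⊆ Null(A^T A)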
Let u, v ∈ Cn . Then u is said to be orthogonal to v if u∗ v = 0 (dot product in case the
vectors are from Rn ). For a subset S of Cn , the orthogonal complement of S, denoted S ⊥
is defined as
S ⊥ = {x ∈ Cn : x∗ s = 0 for all s ∈ S}.

As an exercise prove the following:


1. S ⊥ is a subspace of Cn .

2. S ⊥ = (LS(S))⊥

Theorem 3.4.13 (Fundamental Theorem of Linear Algebra). Let A ∈ Mn (C). Then,

1. dim(Null(A)) + dim(Col(A)) = n.
2. Null(A) = (Col(A∗ ))⊥ and Null(A∗ ) = (Col(A))⊥ .

3. dim(Col(A)) = dim(Col(A∗ )). Or equivalently, Row-rank(A) = Column-rank(A).



Proof. Part 1: Proved in Theorem 3.4.10.


Part 2: We first prove that Null(A) ⊆ Col(A∗ )⊥ . Let x ∈ Null(A). Then, Ax = 0 and

0 = u∗ 0 = u∗ (Ax) = u∗ Ax = (A∗ u)∗ x, for all u ∈ Cn .

But Col(A∗ ) = {A∗ u | u ∈ Cn }. Thus, x ∈ Col(A∗ )⊥ and Null(A) ⊆ Col(A∗ )⊥ .


We now prove that Col(A∗ )⊥ ⊆ Null(A). Let x ∈ Col(A∗ )⊥ . Then, for every y ∈ Cn ,
A∗ y ∈ Col(A∗ ) and hence

0 = (A∗ y)∗ x = y∗ (A∗ )∗ x = y∗ Ax.

In particular, for y = Ax ∈ Cn , we get ‖Ax‖2 = 0. Hence Ax = 0. That is, x ∈ Null(A).


Thus, the proof of the first equality in Part 2 is over. We omit the second equality as it proceeds
on the same lines as above.
Part 3: Use the first two parts to get the required result.
Hence the proof of the fundamental theorem is complete.
Remark 3.4.14. Theorem 3.4.13.2 implies that Null(A) = (Col(A∗ ))⊥ . This statement is
just stating the usual fact that if x ∈ Null(A) then Ax = 0 and hence the usual dot product of
every row of A with x equals 0.

As an implication of Theorem 3.4.13.2 and Theorem 3.4.13.3, we show the existence of an



invertible linear map T : Col(A∗ ) → Col(A).



Corollary 3.4.15. Let A ∈ Mn (C). Then, the function T : Col(A∗ ) → Col(A) defined by
T (x) = Ax is invertible.

Proof. In view of Theorem 3.4.13.3 and the rank-nullity theorem, we just need to show that
the map is one-one. So, suppose that there exist x, y ∈ Col(A∗ ) such that T (x) = T (y).
Or equivalently, Ax = Ay. Thus, x − y ∈ Null(A) = (Col(A∗ ))⊥ (by Theorem 3.4.13.2).
Therefore, x − y ∈ (Col(A∗ ))⊥ ∩ Col(A∗ ) = {0}. Thus, x = y and hence the map is one-one.
Thus, the required result follows.
The readers should look at Example 3.4.3 and Remark 3.4.4. We give one more example.
 
Example 3.4.16. Let A = [1 1 0; 2 1 1; 3 2 1]. Then, verify that
1. {(0, 1, 1)T , (1, 1, 2)T } is a basis of Col(A).
2. {(1, 1, −1)T } is a basis of Null(AT ).
3. Null(AT ) = (Col(A))⊥ .
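These claims are easy to test by hand or by machine. For instance (SymPy assumed), one can confirm that (1, 1, −1)T lies in Null(AT ) and is orthogonal to every column of A:

    from sympy import Matrix

    A = Matrix([[1, 1, 0], [2, 1, 1], [3, 2, 1]])
    w = Matrix([1, 1, -1])                 # claimed basis vector of Null(A^T)
    assert A.T * w == Matrix([0, 0, 0])
    for j in range(A.cols):
        assert w.dot(A.col(j)) == 0        # w ⟂ Col(A), i.e., Null(A^T) ⊆ (Col(A))^⊥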
Exercise 3.4.17. 1. Find distinct subspaces W1 and W2
(a) in R2 such that W1 and W2 are orthogonal but are not orthogonal complements of each other.
(b) in R3 such that W1 ≠ {0} and W2 ≠ {0} are orthogonal, but are not orthogonal complements of each other.

2. Let A ∈ Mm,n (C). Then, Null(A) = Null(A∗ A).

3. Let A ∈ Mm,n (R). Then, Col(A) = Col(AT A).

4. Let A ∈ Mm,n (R). Then, Rank(A) = n if and only if Rank(AT A) = n.

5. Let A ∈ Mm,n (R). Then, for every

(a) x ∈ Rn , x = u + v, where u ∈ Col(AT ) and v ∈ Null(A) are unique.


(b) y ∈ Rm , y = w + z, where w ∈ Col(A) and z ∈ Null(AT ) are unique.

For more information related with the fundamental theorem of linear algebra the interested
readers are advised to see the article “The Fundamental Theorem of Linear Algebra, Gilbert
Strang, The American Mathematical Monthly, Vol. 100, No. 9, Nov., 1993, pp. 848 - 855.”

3.5 Ordered Bases


Let V be a vector space of dimension n over F. Then, in the previous class, we learnt that
V is isomorphic to Fn . So, it should be possible to write the elements of V as n-tuples.
Further, our problem may require us to look at a subspace W of V whose dimension is very
small as compared to the dimension of V. It may also be possible that a basis of W may not
look like a standard basis where the coefficient of ei gave the i-th component of the vector. In

the above cases, it is helpful to fix the vectors in a particular order and then concentrate only

on the coefficients of the vectors as was done for the system of linear equations where we didn’t

worry about the unknowns. We start with the following example. Note that we will be using
‘small brackets’ in place of ‘braces’ to represent a basis.

Example 3.5.1. 1. Let f (x) = 1 − x2 ∈ R[x; 2]. If B = (1, x, x2 ) is a basis of R[x; 2] then

f (x) = [1 x x2 ] (1, 0, −1)T .

2. Consider the first three Legendre polynomials


P0 (x) = 1, P1 (x) = x and P2 (x) = (3/2)x2 − 1/2, for all x ∈ [−1, 1].
Then B = (P0 (x), P1 (x), P2 (x)) forms a basis of R[x; 2]. These polynomials have been
defined for all positive integers n and are very helpful in numerical computations due to
the following properties:

(a) deg(Pi (x)) = i, for i ≥ 0;


(b) Pi (1) = 1, for i ≥ 0;
(c) ∫_{−1}^{1} Pi (x)Pj (x)dx = 0, whenever i ≠ j; and
(d) ∫_{−1}^{1} (Pi (x))2 dx = 2/(2i + 1), for i ≥ 0.
  
Here, a0 + a1 x + a2 x2 = (a0 + a2/3)P0 (x) + a1 P1 (x) + (2a2/3)P2 (x), so its coordinate vector with respect to B is (a0 + a2/3, a1 , 2a2/3)T .

3. Let V = {(u, v, w, x, y)T ∈ R5 | w − x = u, v = y, u + v + x = 3y}. Then, verify that


B = ((−1, 0, 0, 1, 0)T , (2, 1, 2, 0, 1)T ) = (u1 , u2 ), say, can be taken as a basis of V. In this case, (7, 5, 10, 3, 5)T = [u1 , u2 ] (3, 5)T = [u2 , u1 ] (5, 3)T .
So, from the above, we conclude the following: Let V be a vector space of dimension n over
F. If we fix a basis, say, B = (u1 , u2 , . . . , un ) of V and v ∈ V with v = α1 u1 + · · · + αn un , then

v = [u1 , u2 , . . . , un ] (α1 , α2 , . . . , αn )T = [u2 , u1 , . . . , un ] (α2 , α1 , . . . , αn )T .
Note the change in the first two components of the column vectors which are elements of Fn .
So, a change in the position of the vectors ui ’s gives a change in the column vector. Hence, if
we fix the order of the basis vectors ui 's then, with respect to this order, all vectors can be
thought of as elements of Fn . To clarify, we have the following definition.

Definition 3.5.2. Let W be a vector space over F with a basis B = {u1 , . . . , um }. Then, an

ordered basis for W is a basis B together with a one-to-one correspondence between B and

{1, 2, . . . , m}. Since there is an order among the elements of B, we write B = (u1 , . . . , um ). The
matrix B = [u1 , . . . , um ] is an element of Wm and is generally called the basis matrix.

Example 3.5.3. Note that in Example 3.5.1 the matrices [1, x, x2 ], [P0 (x), P1 (x), P2 (x)] and
[u1 , u2 ] were basis matrices corresponding to different vector spaces.

Definition 3.5.4. Let B = [v1 , . . . , vm ] be the basis matrix corresponding to an ordered basis
B of W. Since B is a basis of W, for each v ∈ W, there exist βi , 1 ≤ i ≤ m, such that v = β1 v1 + · · · + βm vm = B (β1 , . . . , βm )T . The vector (β1 , . . . , βm )T , denoted [v]B , is called the coordinate vector of
v with respect to B. Thus,
 
v = B[v]B = [v1 , . . . , vm ] [v]B , or equivalently, v = [v]TB (v1 , . . . , vm )T . (3.5.1)
The last expression is generally viewed as a symbolic expression.

Example 3.5.5. Consider Example 3.5.1. Then


 
1. for f (x) = 1 − x2 ∈ R[x; 2] with B = (1, x, x2 ) as an ordered basis, [f (x)]B = (1, 0, −1)T .

2. for (7, 5, 10, 3, 5)T ∈ V = {(u, v, w, x, y)T ∈ R5 | w − x = u, v = y, u + v + x = 3y} with B = ((−1, 0, 0, 1, 0)T , (2, 1, 2, 0, 1)T ) as an ordered basis of V, [(7, 5, 10, 3, 5)T ]B = (3, 5)T .
Remark 3.5.6. 1. Let B be an ordered basis of a vector space V over F of dimension n.
(a) Then [αv + w]B = α[v]B + [w]B , for all α ∈ F and v, w ∈ V.
(b) Further, let S = {w1 , . . . , wm } ⊆ V. Then, observe that S is linearly independent if
and only if {[w1 ]B , . . . , [wm ]B } is linearly independent in Fn .

2. Suppose V = Fn in Definition 3.5.4. Then, B = [v1 , . . . , vn ] is an n × n invertible matrix


(see Exercise 3.3.16.4). Thus, v = B[v]B implies

[v]B = B −1 v for every v ∈ V. (3.5.2)

Example 3.5.7. Let B = ((1, 1)T , (1, 2)T ) be an ordered basis of R2 . Then, the basis matrix B = [(1, 1)T , (1, 2)T ] is invertible and, using Equation (3.5.2), [(π, e)T ]B = B −1 (π, e)T .

Exercise 3.5.8. Recall that any real square matrix can be written as a sum of a Hermitian and a skew-Hermitian matrix. Thus, M3 (R) = U + W, where U = {A ∈ M3 (R) | AT = A} and W = {A ∈ M3 (R) | AT = −A}. Let A = [1 2 3; 2 1 3; 3 1 4]. Then A = X + Y for some X ∈ U and Y ∈ W.

1. If B = (e11 , e12 , e13 , . . . , e33 ) is an ordered basis of M3 (R) then


[A]TB = [1 2 3 2 1 3 3 1 4].

2. If C = (e11 , e12 + e21 , e13 + e31 , e22 , e23 + e32 , e33 ) is an ordered basis of U then [X]TC = [1 2 3 1 2 4].

3. If D = (e12 − e21 , e13 − e31 , e23 − e32 ) is an ordered basis of W then [Y ]TD = [0 0 1].
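The decomposition and the coordinate vectors above can be verified with a few lines of Python (SymPy assumed):

    from sympy import Matrix

    A = Matrix([[1, 2, 3], [2, 1, 3], [3, 1, 4]])
    X = (A + A.T) / 2        # the symmetric part, X ∈ U
    Y = (A - A.T) / 2        # the skew-symmetric part, Y ∈ W
    assert X + Y == A and X.T == X and Y.T == -Y
    print(X)   # rows (1, 2, 3), (2, 1, 2), (3, 2, 4)  ->  [X]_C = (1, 2, 3, 1, 2, 4)^T
    print(Y)   # rows (0, 0, 0), (0, 0, 1), (0, -1, 0) ->  [Y]_D = (0, 0, 1)^T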

Definition 3.5.9. Let V be a vector space over F with dim(V) = n. Let A = [v1 , . . . , vn ] and
B = [u1 , . . . , un ] be basis matrices corresponding to the ordered bases A and B, respectively, of
V. Thus, using Equation (3.5.1), we have

A = [v1 , . . . , vn ] = [B[v1 ]B , . . . , B[vn ]B ] = B [[v1 ]B , . . . , [vn ]B ] = B[A]B , (3.5.3)

where [A]B = [[v1 ]B , . . . , [vn ]B ]. The matrix [A]B is called the matrix of A with respect to
the ordered basis B or the change of basis matrix from A to B.

We now summarize the above discussion which helps us to understand the name ‘change of
basis matrix’ for the matrix [A]B .

Theorem 3.5.10. Let V be a vector space over F with dim(V) = n. Further, let A =
(v1 , . . . , vn ) and B = (u1 , . . . , un ) be two ordered bases of V.

1. Then the matrix [A]B is invertible.


2. Similarly, the matrix [B]A is invertible.
3. Moreover, [x]B = [A]B [x]A , for all x ∈ V, i.e., [A]B takes coordinate vector of x with
respect to A to the coordinate vector of x with respect to B.
4. Similarly, [x]A = [B]A [x]B , for all x ∈ V.
5. Furthermore, ([A]B )−1 = [B]A .

Proof. Part 1: Note that using Equation (3.5.3), we have [v1 , . . . , vn ] = [u1 , . . . , un ][A]B . Hence,
by Exercise 3.2.13.7, the matrix [A]B is invertible, which proves Part 1. A similar argument
gives Part 2.
Part 3: Note that using Equation (3.5.1), B[x]B = x = A[x]A for all x ∈ V. Therefore,
using Equation (3.5.3), we get B[x]B = (B[A]B ) [x]A . As B is invertible, [x]B = [A]B [x]A . This
completes the proof of Part 3. We leave the proof of other parts to the reader.
Example 3.5.11. 1. Let V = Cn , A = [v1 , . . . , vn ] and B = (e1 , . . . , en ) be the standard
ordered basis. Then A = [v1 , . . . , vn ] = [[v1 ]B , . . . , [vn ]B ] = [A]B .
2. Suppose A = ((1, 0, 0)T , (1, 1, 0)T , (1, 1, 1)T ) and B = ((1, 1, 1)T , (1, −1, 1)T , (1, 1, 0)T ) are
 

two ordered bases of R3 . Then, we verify the statements in the previous result.
   −1    
x 1 1 1 x x−y
T

       
(a) Using Equation (3.5.2),  y  = 0 1 1 y  =  y − z .
AF

       
z 0 0 1 z z
A
(b) Similarly, [(x, y, z)T ]B = B −1 (x, y, z)T = (1/2)(−x + y + 2z, x − y, 2x − 2z)T .
   
(c) [A]B = [−1/2 0 1; 1/2 0 0; 1 1 0], [B]A = [0 2 0; 0 −2 1; 1 1 0] and [A]B [B]A = I3 .
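A short computational check of parts (a)-(c), with each ordered basis stored as the columns of its basis matrix (SymPy assumed):

    from sympy import Matrix, eye

    A = Matrix([[1, 1, 1], [0, 1, 1], [0, 0, 1]])   # columns: the basis A
    B = Matrix([[1, 1, 1], [1, -1, 1], [1, 1, 0]])  # columns: the basis B

    A_B = B.inv() * A        # [A]_B, the coordinates of the columns of A w.r.t. B
    B_A = A.inv() * B        # [B]_A
    assert A_B * B_A == eye(3)               # Theorem 3.5.10.5
    x = Matrix([3, 1, 2])                    # any x: [x]_B = [A]_B [x]_A
    assert B.inv() * x == A_B * (A.inv() * x)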

Remark 3.5.12. Let V be a vector space over F with A = (v1 , . . . , vn ) as an ordered basis.
Then, by Theorem 3.5.10, [v]A is an element of Fn , for each v ∈ V. Therefore,
1. if F = R then, the elements of V correspond to vectors in Rn .
2. if F = C then, the elements of V correspond to vectors in Cn .

Exercise 3.5.13. Let A = ((1, 2, 0)T , (1, 3, 2)T , (0, 1, 3)T ) and B = ((1, 2, 1)T , (0, 1, 2)T , (1, 4, 6)T )

be two ordered bases of R3 . Then, determine [A]B , [B]A and verify that [A]B [B]A = I3 .

3.6 Summary
In this chapter, we defined vector spaces over F. The set F was either R or C. To define a vector
space, we start with a non-empty set V of vectors and F the set of scalars. We also needed to
do the following:

1. first define vector addition and scalar multiplication and

2. then verify the conditions in Definition 3.1.1.

If all conditions in Definition 3.1.1 are satisfied then V is a vector space over F. If W was a
non-empty subset of a vector space V over F then for W to be a subspace, we only need to check
whether the vector addition and scalar multiplication inherited from that in V hold in W.
We then learnt linear combination of vectors and the linear span of vectors. It was also shown
that the linear span of a subset S of a vector space V is the smallest subspace of V containing
S. Also, to check whether a given vector v is a linear combination of u1 , . . . , un , we needed to
solve the linear system c1 u1 + · · · + cn un = v in the variables c1 , . . . , cn . Or equivalently, the
system Ax = b, where in some sense A[:, i] = ui , 1 ≤ i ≤ n, xT = [c1 , . . . , cn ] and b = v. It
was also shown that the geometrical representation of the linear span of S = {u1 , . . . , un } is
equivalent to finding conditions in the entries of b such that Ax = b was always consistent.
Then, we learnt linear independence and dependence. A set S = {u1 , . . . , un } is linearly
independent set in the vector space V over F if the homogeneous system Ax = 0 has only the
trivial solution in F. Else, S is linearly dependent, where, as before, the columns of A correspond
to the vectors ui ’s.
We then talked about the maximal linearly independent set (coming from the homogeneous
system) and the minimal spanning set (coming from the non-homogeneous system) and culmi-
nating in the notion of the basis of a finite dimensional vector space V over F. The following

important results were proved.



1. A linearly independent set can be extended to form a basis of V.

2. Any two bases of V have the same number of elements.

This number was defined as the dimension of V, denoted dim(V).


Now let A ∈ Mn (R). Then, combining a few results from the previous chapter, we have the
following equivalent conditions.

1. A is invertible.

2. The homogeneous system Ax = 0 has only the trivial solution.

3. RREF(A) = In .

4. A is a product of elementary matrices.

5. The system Ax = b has a unique solution for every b.

6. The system Ax = b has a solution for every b.

7. Rank(A) = n.

8. det(A) 6= 0.

9. Col(AT ) = Row(A) = Rn .

10. Rows of A form a basis of Rn .

11. Col(A) = Rn .

12. Columns of A form a basis of Rn .

13. Null(A) = {0}.

Chapter 4

Linear Transformations

4.1 Definitions and Basic Properties


Let V be a vector space over F with dim(V) = n. Also, let B be an ordered basis of V. Then, in
the last section of the previous chapter, it was shown that for each x ∈ V, the coordinate vector
[x]B is a column vector of size n and has entries from F. So, in some sense, each element of V
looks like elements of Fn . In this chapter, we concretize this idea. We also show that matrices
give rise to functions between two finite dimensional vector spaces. To do so, we start with the
definition of functions over vector spaces that commute with the operations of vector addition

and scalar multiplication.



Definition 4.1.1. Let V and W be vector spaces over F. A function (map) f : V → W is called

a linear transformation if for all α ∈ F and u, v ∈ V the function f satisfies

f (α · u) = α ⊙ f (u) and f (u + v) = f (u) ⊕ f (v),

where +, · are the binary operations in V and ⊕, ⊙ are the binary operations in W. By L(V, W), we
denote the set of all linear transformations from V to W. In particular, if W = V then the linear
transformation f is called a linear operator and the corresponding set of linear operators is
denoted by L(V).

Definition 4.1.2. Let g, h ∈ L(V, W). Then g and h are said to be equal if g(x) = h(x), for
all x ∈ V.

We now give examples of linear transformations.

Example 4.1.3. 1. Let V be a vector space. Then, the maps Id, 0 ∈ L(V), where
(a) Id(v) = v, for all v ∈ V, is commonly called the identity operator.
(b) 0(v) = 0, for all v ∈ V, is commonly called the zero operator.

2. Let V and W be vector spaces over F. Then, 0 ∈ L(V, W), where 0(v) = 0, for all v ∈ V,
is commonly called the zero transformation.

3. The map f (x) = x, for all x ∈ R, is an element of L(R) as f (ax) = ax = af (x) and
f (x + y) = x + y = f (x) + f (y).


4. The map f (x) = (x, 3x)T , for all x ∈ R, is an element of L(R, R2 ) as f (λx) = (λx, 3λx)T =
λ(x, 3x)T = λf (x) and f (x + y) = (x + y, 3(x + y))T = (x, 3x)T + (y, 3y)T = f (x) + f (y).

5. Let V, W and Z be vector spaces over F. Then, for any T ∈ L(V, W) and S ∈ L(W, Z),
the map S ◦ T ∈ L(V, Z), defined by (S ◦ T )(v) = S(T (v)) for all v ∈ V, is called the


composition of maps. Observe that for each u, v ∈ V and α, β ∈ F,


 
(S ◦ T )(αv + βu) = S(T (αv + βu)) = S(αT (v) + βT (u)) = αS(T (v)) + βS(T (u)) = α(S ◦ T )(v) + β(S ◦ T )(u).

Hence S ◦ T , in short ST , is an element of L(V, Z).

6. Fix a ∈ Rn and define f (x) = aT x, for all x ∈ Rn . Then f ∈ L(Rn , R). In particular, if
x = [x1 , . . . , xn ]T then, for all x ∈ Rn ,
(a) f (x) = x1 + · · · + xn = 1T x is a linear transformation.

(b) f1 (x) = x1 = eT1 x is a linear transformation.

(c) fi (x) = xi = eTi x is a linear transformation, for 1 ≤ i ≤ n.

7. Define f : R2 → R3 by f ((x, y)T ) = (x + y, 2x − y, x + 3y)T . Then f ∈ L(R2 , R3 ). Here




f (e1 ) = (1, 2, 1)T and f (e2 ) = (1, −1, 3)T .



8. Let A ∈ Mm×n (C). Define fA (x) = Ax, for every x ∈ Cn . Then, fA ∈ L(Cn , Cm ). Thus,

for each A ∈ Mm,n (C), there exists a linear transformation fA ∈ L(Cn , Cm ).

9. Define f : Rn+1 → R[x; n] by f ((a1 , . . . , an+1 )T ) = a1 + a2 x + · · · + an+1 xn , for each




(a1 , . . . , an+1 ) ∈ Rn+1 . Then f is a linear transformation.

10. Fix A ∈ Mn (C). Now, define fA : Mn (C) → Mn (C) and gA : Mn (C) → C by fA (B) =
AB and gA (B) = Tr(AB), for every B ∈ Mn (C). Then, fA and gA are both linear
transformations.

Are the maps f (B) = A∗ B, g(B) = BA, h(B) = tr(A∗ B) and t(B) = tr(BA), for every
B ∈ Mn (C) linear?

11. Is the map T : R[x; n] → R[x; n + 1] defined by T (f (x)) = xf (x), for all f (x) ∈ R[x; n] a
linear transformation?

12. The maps T, S : R[x] → R[x] defined by T (f (x)) = (d/dx) f (x) and S(f (x)) = ∫_0^x f (t)dt, for all f (x) ∈ R[x], are linear transformations. Is it true that T S = Id? What about ST ?

13. Recall the vector space RN in Example 3.1.4.9. Now, define maps T, S : RN → RN
by T ({a1 , a2 , . . .}) = {0, a1 , a2 , . . .} and S({a1 , a2 , . . .}) = {a2 , a3 , . . .}. Then, T and S,
commonly called the shift operators, are linear operators with exactly one of ST or T S
as the Id map.

14. Recall the vector space C(R, R) (see Example 3.1.4.11). Define T : C(R, R) → C(R, R)
by T (f (x)) = ∫_0^x f (t)dt. For example, T (sin(x)) = ∫_0^x sin(t)dt = 1 − cos(x), for all x ∈ R.
Then, verify that T is a linear transformation.
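Item 14 lends itself to a direct symbolic check. A minimal sketch, assuming SymPy (the helper T below is our own illustration, not part of the notes):

    from sympy import sin, cos, symbols, integrate, simplify

    x, t = symbols('x t')

    def T(f):
        # T(f)(x) = integral of f over [0, x], as in item 14
        return integrate(f.subs(x, t), (t, 0, x))

    assert simplify(T(sin(x)) - (1 - cos(x))) == 0
    f, g = sin(x), x**2
    assert simplify(T(2*f + 5*g) - (2*T(f) + 5*T(g))) == 0   # linearity on a sample pair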

Remark 4.1.4. Let A ∈ Mn (C) and define TA : Cn → Cn by TA (x) = Ax, for every x ∈ Cn .
Then, verify that TAk (x) = (TA ◦ TA ◦ · · · ◦ TA )(x), the composition taken k times, equals Ak x, for any positive integer k.

We now prove that any linear transformation sends the zero vector to a zero vector.

Proposition 4.1.5. Let T ∈ L(V, W). Suppose that 0V is the zero vector in V and 0W is the
zero vector of W. Then T (0V ) = 0W .

Proof. Since 0V = 0V + 0V , we get T (0V ) = T (0V + 0V ) = T (0V ) + T (0V ). As T (0V ) ∈ W,

0W + T (0V ) = T (0V ) = T (0V ) + T (0V ).

Hence, T (0V ) = 0W .
From now on 0 will be used as the zero vector of the domain and codomain. We now consider
a few more examples.
Example 4.1.6. 1. Does there exist a linear transformation T : V → W such that T (v) ≠ 0,
for all v ∈ V?

Solution: No, as T (0) = 0 (see Proposition 4.1.5).



2. Does there exist a linear transformation T : R → R such that T (x) = x2 , for all x ∈ R?

Solution: No, as T (ax) = (ax)2 = a2 x2 = a2 T (x) ≠ aT (x), unless a = 0, 1.



3. Does there exist a linear transformation T : R → R such that T (x) = √x, for all x ∈ R?
Solution: No, as T (ax) = √(ax) = √a √x ≠ a √x = aT (x), unless a = 0, 1.
4. Does there exist a linear transformation T : R → R such that T (x) = sin(x), for all x ∈ R?
Solution: No, as T (ax) ≠ aT (x) in general.
5. Does there exist a linear transformation T : R → R such that T (5) = 10 and T (10) = 5?
Solution: No, as T (10) = T (5 + 5) = T (5) + T (5) = 10 + 10 = 20 ≠ 5.
6. Does there exist a linear transformation T : R → R such that T (5) = π and T (e) = π?
Solution: No, as 5T (1) = T (5) = π implies that T (1) = π/5. So, T (e) = eT (1) = eπ/5 ≠ π.
7. Does there exist a linear transformation f : R2 → R2 such that f ((x, y)T ) = (x + y, 2)T ?
Solution: No, as f (0) ≠ 0.
8. Does there exist a linear transformation f : R2 → R2 such that f ((x, y)T ) = (x + y, xy)T ?
Solution: No, as f ((2, 2)T ) = (4, 4)T ≠ 2(2, 1)T = 2f ((1, 1)T ).
9. Define a map T : C → C by T (z) = z̄, the complex conjugate of z. Is T a linear operator on C viewed as a vector space over R?
Solution: Yes: for any α ∈ R, the conjugate of αz equals α times the conjugate of z, so T (αz) = αT (z); similarly, T (z + w) = T (z) + T (w).

The next result states that a linear transformation is known if we know its image on a basis
of the domain space.

Lemma 4.1.7. Let V and W be vector spaces over F and let T ∈ L(V, W). Then T is determined
if the images of T on the basis vectors of V are known.

In particular, if V is finite dimensional and B = (v1 , . . . , vn ) is an ordered basis of V over F then T (v) = [T (v1 ) · · · T (vn )] [v]B .

Proof. Let B be a basis of V over F. Then, for each v ∈ V, there exist vectors u1 , . . . , uk in B
and scalars c1 , . . . , ck ∈ F such that v = c1 u1 + · · · + ck uk . Thus

T (v) = T (c1 u1 + · · · + ck uk ) = T (c1 u1 ) + · · · + T (ck uk ) = c1 T (u1 ) + · · · + ck T (uk ).

Or equivalently, whenever
   
v = [u1 , . . . , uk ] (c1 , . . . , ck )T then T (v) = [T (u1 ) · · · T (uk )] (c1 , . . . , ck )T . (4.1.1)

Thus, the image of T on v just depends on where the basis vectors are mapped. This completes
the first part.  
For the second part, let v = c1 v1 + · · · + cn vn . Then [v]B = (c1 , . . . , cn )T and hence, using Equation (4.1.1), we have T (v) = [T (v1 ) · · · T (vn )] [v]B . Thus, the required result follows.

As another application of Lemma 4.1.7, we have the following result.

Corollary 4.1.8. Let V and W be vector spaces over F and let T : V → W be a linear
transformation. If B is a basis of V then, Rng(T ) = LS(T (x)|x ∈ B).

Recall that by Example 4.1.3.6, for each a ∈ Rn , the map T (x) = aT x, for each x ∈ Rn , is
a linear transformation from Rn to R. We now show that these are the only ones.

Corollary 4.1.9. [Riesz Representation Theorem] Let T ∈ L(Rn , R). Then, there exists
a ∈ Rn such that T (x) = aT x.

Proof. By Lemma 4.1.7, T is known if we know the image of T on {e1 , . . . , en }, the standard
basis of Rn . So, for 1 ≤ i ≤ n, let T (ei ) = ai , for some ai ∈ R. Now define a = [a1 , . . . , an ]T
and x = [x1 , . . . , xn ]T ∈ Rn . Then
T (x) = T (x1 e1 + · · · + xn en ) = x1 T (e1 ) + · · · + xn T (en ) = x1 a1 + · · · + xn an = aT x, for all x ∈ Rn .

Thus, the required result follows.

Example 4.1.10. In each of the examples given below, state whether a linear transformation
exists or not. If yes, give at least one linear transformation. If not, then give the condition due
to which a linear transformation doesn’t exist.

1. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((1, −1)T ) = (5, 10)T ?
Solution: Yes, as the set {(1, 1)T , (1, −1)T } is a basis of R2 . Write B = [(1, 1)T , (1, −1)T ]. Then, using Equation (4.1.1) and [v]B = B −1 v, we get

T ((x, y)T ) = [T ((1, 1)T ) T ((1, −1)T )] [(x, y)T ]B = [1 5; 2 10] · (1/2)(x + y, x − y)T = (3x − 2y, 6x − 4y)T .

2. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((5, 5)T ) = (5, 10)T ?
Solution: Yes, as (5, 10)T = T ((5, 5)T ) = 5T ((1, 1)T ) = 5(1, 2)T = (5, 10)T .
To construct one such linear transformation, note that B = ((1, 1)T , (1, 0)T ) is a basis of
R2 . Pick v ∈ R2 and define T ((1, 0)T ) = v = (v1 , v2 )T . Then, as above, we get

T ((x, y)T ) = [T ((1, 1)T ) T ((1, 0)T )] [(x, y)T ]B = [(1, 2)T v] (y, x − y)T = y(1, 2)T + (x − y)v.
B

3. T : R2 → R2 such that T ((1, 1)T ) = (1, 2)T and T ((5, 5)T ) = (5, 11)T ?
Solution: No, as T ((5, 5)T ) = 5T ((1, 1)T ) = 5(1, 2)T = (5, 10)T ≠ (5, 11)T , a contradiction.

4. T : R2 → R2 such that Rng(T ) = {T (x) | x ∈ R2 } = LS{(1, π)T }?


Solution: Yes. Define T (e1 ) = (1, π)T and T (e2 ) = a(1, π)T , for some a ∈ R.

5. T : R2 → R2 such that Rng(T ) = R2 ?


Solution: Yes. Let {u, v} be a basis of R2 and define T (e1 ) = u and T (e2 ) = v.
6. T : R2 → R2 such that Rng(T ) = {T (x) | x ∈ R2 } = {0}?
Solution: Yes. Define T (e1 ) = 0 and T (e2 ) = 0.
7. T : R2 → R2 such that Ker(T ) = {x ∈ R2 | T (x) = 0} = LS{(1, π)T }?
Solution: Yes. Take {(1, π)T , u} as a basis of R2 and define T ((1, π)T ) = 0 and T (u) = u.
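The constructions in items 1 and 2 become mechanical once the basis matrix is written down. The sketch below redoes item 1 numerically (NumPy assumed; the matrix M it builds is the standard matrix of T ):

    import numpy as np

    B = np.array([[1.0, 1.0], [1.0, -1.0]])       # columns (1,1)^T and (1,-1)^T
    images = np.array([[1.0, 5.0], [2.0, 10.0]])  # T(1,1) = (1,2), T(1,-1) = (5,10)

    M = images @ np.linalg.inv(B)                 # [T] = [images] B^{-1}
    print(M)                                      # [[3, -2], [6, -4]], i.e. T(x,y) = (3x-2y, 6x-4y)
    assert np.allclose(M @ np.array([1.0, 1.0]), [1, 2])
    assert np.allclose(M @ np.array([1.0, -1.0]), [5, 10])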

Exercise 4.1.11. 1. Use matrices to construct linear operators T, S : R3 → R3 that satisfy:

(a) T ≠ 0, T ◦ T = T 2 ≠ 0, T ◦ T ◦ T = T 3 = 0.
(b) T ≠ 0, S ≠ 0, S ◦ T = ST ≠ 0, T ◦ S = T S = 0.
(c) S ◦ S = S 2 = T 2 = T ◦ T , S ≠ T .
(d) T ◦ T = T 2 = Id, T ≠ Id.

2. Let T : Rn → Rn be a linear operator with T ≠ 0 and T 2 = 0. Prove that there exists a


vector x ∈ Rn such that the set {x, T (x)} is linearly independent.

3. Fix a positive integer p and let T : Rn → Rn be a linear operator with T k ≠ 0 for


1 ≤ k ≤ p and T p+1 = 0. Then prove that there exists a vector x ∈ Rn such that the set
{x, T (x), . . . , T p (x)} is linearly independent.

4. Fix x0 ∈ Rn with x0 ≠ 0. Now, define T ∈ L(Rn , Rm ) by T (x0 ) = y0 , for some y0 ∈ Rm .


Define T −1 (y0 ) = {x ∈ Rn : T (x) = y0 }. Then prove that x ∈ T −1 (y0 ) if and only if
x − x0 ∈ T −1 (0). Further, T −1 (y0 ) is a subspace of Rn if and only if y0 = 0.

5. Let V and W be vector spaces over F. If {v1 , . . . , vn } is a basis of V and {w1 , . . . , wn } ⊆ W


then prove that there exists a unique T ∈ L(V, W) with T (vi ) = wi , for i = 1, . . . , n.

6. Prove that there exist infinitely many linear transformations T : R3 → R2 such that T ((1, −1, 1)T ) = (1, 2)T and T ((−1, 1, 2)T ) = (1, 0)T .

7. Let V be a vector space and let a ∈ V. Then the map Ta : V → V defined by Ta (x) = x+a,
for all x ∈ V is called the translation map. Prove that Ta ∈ L(V) if and only if a = 0.

8. Does there exist a linear transformation T : R3 → R2 such that

(a) T ((1, 0, 1)T ) = (1, 2)T , T ((0, 1, 1)T ) = (1, 0)T and T ((1, 1, 1)T ) = (2, 3)T ?
(b) T ((1, 0, 1)T ) = (1, 2)T , T ((0, 1, 1)T ) = (1, 0)T and T ((1, 1, 2)T ) = (2, 3)T ?

9. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x + 3y + 4z, x + y + z, x + y + 3z)T . Find


the value of k for which there exists a vector x ∈ R3 such that T (x) = (9, 3, k)T .

10. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x − 2y + 2z, −2x + 5y + 2z, 8x + y + 4z)T .
Find x ∈ R3 such that T (x) = (1, 1, −1)T .

11. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x + y + 3z, 4x − y + 3z, 3x − 2y + 5z)T .



Determine x, y, z ∈ R3 \ {0} such that T (x) = 6x, T (y) = 2y and T (z) = −2z. Is the set
{x, y, z} linearly independent?

12. Let T : R3 → R3 be defined by T ((x, y, z)T ) = (2x + 3y + 4z, −y, −3y + 4z)T . Determine
x, y, z ∈ R3 \ {0} such that T (x) = 2x, T (y) = 4y and T (z) = −z. Is the set {x, y, z}
linearly independent?

13. Let n ∈ N. Does there exist a linear transformation T : R3 → Rn such that T ((1, 1, −2)T ) =
x, T ((−1, 2, 3)T ) = y and T ((1, 10, 1)T ) = z

(a) with z = x + y?
(b) with z = cx + dy, for some c, d ∈ R?

14. For each matrix A given below, define T ∈ L(R2 ) by T (x) = Ax. What do these linear
operators signify geometrically?
(a) A ∈ { (1/2)[√3 −1; 1 √3], (1/√2)[1 −1; 1 1], (1/2)[1 −√3; √3 1], [0 −1; 1 0], [cos(2π/3) − sin(2π/3); sin(2π/3) cos(2π/3)] }.

(b) A ∈ { [−1 0; 0 1], [1 0; 0 −1], (1/2)[1 1; 1 −1], (1/5)[1 2; 2 4], [0 0; 0 1], [1 0; 0 0] }.

(c) A ∈ { (1/2)[√3 1; 1 −√3], (1/√2)[1 1; 1 −1], (1/2)[1 √3; √3 −1], [cos(2π/3) sin(2π/3); sin(2π/3) − cos(2π/3)] }.

15. Find all functions f : R2 → R2 that fix the line y = x and send (x1 , y1 ), for x1 ≠ y1 , to its mirror image along the line y = x. Or equivalently, f satisfies

(a) f (x, x) = (x, x) and


(b) f (x, y) = (y, x) for all (x, y) ∈ R2 .

16. Consider the space C3 over C. If f ∈ L(C3 ) with f (x) = x, f (y) = (1 + i)y and f (z) =
(2 + 3i)z, for x, y, z ∈ C3 \ {0} then prove that {x, y, z} forms a basis of C3 .

4.2 Rank-Nullity Theorem


The readers are advised to see Exercise 3.2.13.8 and Theorem 3.4.10 for clarity and similarity
with the results in this section. To start with, we define two spaces related with a linear
transformation.

Definition 4.2.1. Let V and W be vector spaces over F and let T : V → W be a linear
transformation. Then the set
1. {T (v)|v ∈ V} is called the range space of T , denoted Rng(T ).
2. {v ∈ V|T (v) = 0} is called the kernel of T , denoted Ker(T ). In certain books, it is also
called the null space of T .

Example 4.2.2. Determine Rng(T ) and Ker(T ) of the following linear transformations.

1. f ∈ L(R3 , R4 ), where f ((x, y, z)T ) = (x − y + z, y − z, x, 2x − 5y + 5z)T .


Solution: Consider the standard basis {e1 , e2 , e3 } of R3 . Then

Rng(f ) = LS(f (e1 ), f (e2 ), f (e3 )) = LS((1, 0, 1, 2)T , (−1, 1, 0, −5)T , (1, −1, 0, 5)T )


= LS((1, 0, 1, 2)T , (1, −1, 0, 5)T ) = {λ(1, 0, 1, 2)T + β(1, −1, 0, 5)T | λ, β ∈ R}


= {(λ + β, −β, λ, 2λ + 5β) : λ, β ∈ R}


= {(x, y, z, w)T ∈ R4 | x + y − z = 0, 5y − 2z + w = 0},

Ker(f ) = {(x, y, z)T ∈ R3 : f ((x, y, z)T ) = 0}


= {(x, y, z)T ∈ R3 : (x − y + z, y − z, x, 2x − 5y + 5z)T = 0}
= {(x, y, z)T ∈ R3 : x − y + z = 0, y − z = 0, x = 0, 2x − 5y + 5z = 0}
= {(x, y, z)T ∈ R3 : y − z = 0, x = 0}
= {(0, z, z)T ∈ R3 : z ∈ R} = LS((0, 1, 1)T )

2. Let B ∈ M2 (R). Now, define a map T : M2 (R) → M2 (R) by T (A) = BA − AB, for all
A ∈ M2 (R). Determine Rng(T ) and Ker(T ).
Solution: Note that A ∈ Ker(T ) if and only if A commutes with B. In particular,
{I, B, B 2 , . . .} ⊆ Ker(T ). For example, if B is a scalar matrix then, Ker(T ) = M2 (R).
For computing, Rng(T ), recall that {eij |1 ≤ i, j ≤ 2} is a basis of M2 (R). So,

(a) if B = cI2 then Rng(T ) = {0}.
(b) if B = [1 2; 2 4] then T (e11 ) = [0 −2; 2 0], T (e12 ) = [−2 −3; 0 2], T (e21 ) = [2 0; 3 −2] and T (e22 ) = [0 2; −2 0]. Thus, Rng(T ) = LS([0 2; −2 0], [2 3; 0 −2], [−2 0; −3 2]).
(c) for B = [1 2; 2 3], verify that Rng(T ) = LS([0 2; −2 0], [2 2; 0 −2], [−2 0; −2 2]).
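Since T here is a linear operator on the 4-dimensional space M2 (R), it has a 4 × 4 matrix in the basis (e11 , e12 , e21 , e22 ), and its rank and nullity give dim(Rng(T )) and dim(Ker(T )). A sketch for part (b), assuming SymPy:

    from sympy import Matrix

    B = Matrix([[1, 2], [2, 4]])
    E = [Matrix(2, 2, lambda i, j: 1 if (i, j) == pos else 0)
         for pos in [(0, 0), (0, 1), (1, 0), (1, 1)]]     # e11, e12, e21, e22

    # Matrix of T(A) = BA - AB w.r.t. the basis (e11, e12, e21, e22):
    M = Matrix.hstack(*[(B*e - e*B).reshape(4, 1) for e in E])
    print(M.rank(), len(M.nullspace()))   # 2 and 2: dim Rng(T) = dim Ker(T) = 2

Here Ker(T ) is 2-dimensional because the matrices commuting with this B are exactly LS(I, B).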

Exercise 4.2.3. 1. Let V and W be vector spaces over F and let T ∈ L(V, W). Then
(a) Rng(T ) is a subspace of W.
(b) Ker(T ) is a subspace of V.

Furthermore, if V is finite dimensional then


(a) dim(Ker(T )) ≤ dim(V).
(b) dim(Rng(T )) is finite and whenever dim(W) is finite dim(Rng(T )) ≤ dim(W).

2. Which of the following maps are linear transformations? In case, the map is a linear
transformation, determine its range space and the null space.

(a) Let V = R2 and W = R3 with T ((x, y)T ) = (x + y + 1, 2x − y, x + 3y)T .


(b) Let V = W = R2 with T ((x, y)T ) = (x − y, x2 − y 2 )T .

(c) Let V = W = R2 with T ((x, y)T ) = (x − y, | x | )T .



(d) Let V = R2 and W = R4 with T ((x, y)T ) = (x + y, x − y, 2x + y, 3x − 4y)T .


(e) Let V = W = R4 with T ((x, y, z, w)T ) = (z, x, w, y)T .

3. Which of the following maps T : M2 (R) → M2 (R) are linear operators? In case, T is a
linear operator, determine Rng(T ) and Ker(T ).
(a) T (A) = AT .
(b) T (A) = I + A.
(c) T (A) = A2 .
(d) T (A) = BAB −1 , for some fixed B ∈ M2 (R).

4. Describe Ker(D) and Rng(D), where D ∈ L(R[x; n]) is defined by D(f (x)) = f ′ (x),
the differentiation with respect to x. Note that Rng(D) ⊆ R[x; n − 1].

5. Define T ∈ L(R[x]) by T (f (x)) = xf (x), for all f (x) ∈ R[x]. What can you say about
Ker(T ) and Rng(T )?

6. For T in Example 4.2.2, compute dim(Ker(T )) and dim(Rng(T )).

7. Define T ∈ L(R3 ) by T (e1 ) = e1 + e3 , T (e2 ) = e2 + e3 and T (e3 ) = −e3 . Then

(a) determine T ((x, y, z)T ), for x, y, z ∈ R.


(b) determine Null(T ) and Rng(T ).

(c) is it true that T 2 = Id?

8. Find T ∈ L(R3 ) for which Rng(T ) = LS((1, 2, 0)T , (0, 1, 1)T , (1, 3, 1)T ).


Theorem 4.2.4. Let V and W be vector spaces over F and let T ∈ L(V, W).
1. If S ⊆ V is linearly dependent then T (S) = {T (v) | v ∈ S} is linearly dependent.
2. Suppose S ⊆ V such that T (S) is linearly independent then S is linearly independent.

Proof. As S is linearly dependent, there exist k ∈ N and vi ∈ S, for 1 ≤ i ≤ k, such that the system x1 v1 + · · · + xk vk = 0, in the unknowns xi 's, has a non-trivial solution, say xi = ai ∈ F, 1 ≤ i ≤ k. Thus a1 v1 + · · · + ak vk = 0. Then the ai 's also give a non-trivial solution to the system y1 T (v1 ) + · · · + yk T (vk ) = 0, where the yi 's are unknowns, as a1 T (v1 ) + · · · + ak T (vk ) = T (a1 v1 + · · · + ak vk ) = T (0) = 0. Hence the
required result follows.
The second part is left as an exercise for the reader.

Definition 4.2.5. Let V and W be vector spaces over F and let T ∈ L(V, W). If dim(V) is finite then we define Rank(T ) = dim(Rng(T )) and Nullity(T ) = dim(Ker(T )).

We now prove the rank-nullity Theorem. The proof of this result is similar to the proof of
Theorem 3.4.10. We give it again for the sake of completeness.

Theorem 4.2.6 (Rank-Nullity Theorem). Let V and W be vector spaces over F. If dim(V) is
finite and T ∈ L(V, W) then

Rank(T ) + Nullity(T ) = dim(Rng(T )) + dim(Ker(T )) = dim(V).

Proof. By Exercise 4.2.3.1.1a, dim(Ker(T )) ≤ dim(V). Let B be a basis of Ker(T ). We extend


it to form a basis C of V. As T (v) = 0, for all v ∈ B, using Corollary 4.1.8, we get

Rng(T ) = LS({T (v)|v ∈ C}) = LS({T (v)|v ∈ C \ B}).

We claim that {T (v) | v ∈ C \ B} is a linearly independent subset of W.


On the contrary, assume that there exist v1 , . . . , vk ∈ C \ B and a = [a1 , . . . , ak ]T such that a ≠ 0 and a1 T (v1 ) + · · · + ak T (vk ) = 0. Thus T (a1 v1 + · · · + ak vk ) = a1 T (v1 ) + · · · + ak T (vk ) = 0, i.e., a1 v1 + · · · + ak vk ∈ Ker(T ). Hence, there exist b1 , . . . , bℓ ∈ F and u1 , . . . , uℓ ∈ B such that a1 v1 + · · · + ak vk = b1 u1 + · · · + bℓ uℓ . Or equivalently, the system x1 v1 + · · · + xk vk + y1 u1 + · · · + yℓ uℓ = 0, in the unknowns xi 's and yj 's, has the non-trivial solution [a1 , . . . , ak , −b1 , . . . , −bℓ ]T (non-trivial as a ≠ 0). Hence, S = {v1 , . . . , vk , u1 , . . . , uℓ } is a linearly dependent subset of V, a contradiction to S ⊆ C. Thus,

dim(Rng(T )) + dim(Ker(T )) = |C \ B| + |B| = |C| = dim(V).

Thus, we have proved the required result.


As an immediate corollary, we have the following result. The proof is left for the reader.

Corollary 4.2.7. Let V and W be finite dimensional vector spaces over F and let T ∈ L(V, W).
If dim(V) = dim(W) then the following statements are equivalent.
1. T is one-one.
2. Ker(T ) = {0}.
3. T is onto.
4. dim(Rng(T )) = dim(V).

Corollary 4.2.8. Let V be a vector space over F with dim(V) = n. If S, T ∈ L(V) then
1. Nullity(T ) + Nullity(S) ≥ Nullity(ST ) ≥ max{Nullity(T ), Nullity(S)}.
2. min{Rank(S), Rank(T )} ≥ Rank(ST ) ≥ Rank(S) + Rank(T ) − n.

Proof. The proof of Part 2 is omitted as it directly follows from Part 1 and Theorem 4.2.6.
Part 1: We first prove the second inequality. Suppose v ∈ Ker(T ). Then

(ST )(v) = S(T (v)) = S(0) = 0

implies Ker(T ) ⊆ Ker(ST ). Thus Nullity(T ) ≤ Nullity(ST ).


By Theorem 4.2.6, Nullity(S) ≤ Nullity(ST ) ⇔ Rank(S) ≥ Rank(ST ). This holds as
Rng(T ) ⊆ V implies Rng(ST ) = S(Rng(T )) ⊆ S(V) = Rng(S).

To prove the first inequality, let {v1 , . . . , vk } be a basis of Ker(T ). Then {v1 , . . . , vk } ⊆

Ker(ST ). So, let us extend it to get a basis {v1 , . . . , vk , u1 , . . . , uℓ } of Ker(ST ).



Now, proceeding as in the proof of the rank-nullity theorem, we see that {T (u1 ), . . . , T (uℓ )} is a linearly independent subset of Ker(S). Hence, Nullity(S) ≥ ℓ and therefore, we get Nullity(ST ) = k + ℓ ≤ Nullity(T ) + Nullity(S).
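The inequalities in Corollary 4.2.8 are easy to probe numerically, since every matrix gives a linear operator (Example 4.1.3.8). A sketch with NumPy (assumed), using deliberately rank-deficient diagonal matrices so that both bounds in Part 2 are sharp:

    import numpy as np

    S = np.diag([1., 1., 1., 0., 0.])
    T = np.diag([0., 0., 1., 1., 1.])
    n = 5
    r = np.linalg.matrix_rank
    # Corollary 4.2.8.2: min{Rank(S), Rank(T)} >= Rank(ST) >= Rank(S) + Rank(T) - n
    assert min(r(S), r(T)) >= r(S @ T) >= r(S) + r(T) - n   # here 3 >= 1 >= 1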
Exercise 4.2.9. 1. Let A ∈ Mn (R) with A2 = A. Define T ∈ L(Rn ) by T (v) = Av for all
v ∈ Rn . Then prove that

(a) T 2 = T , or equivalently, (T (Id − T ))(x) = 0, for all x ∈ Rn .


(b) Null(T ) ∩ Rng(T ) = {0}.
(c) Rn = Rng(T ) + Null(T ).

2. Let z1 , z2 , . . . , zk be k distinct complex numbers. Define T ∈ L(C[x; n], Ck ) by T (P (z)) = (P (z1 ), . . . , P (zk ))T , for all P (z) ∈ C[x; n]. Determine Rank(T ).

4.2.1 Algebra of Linear Transformations

We start with the following definition.

Definition 4.2.10. Let V, W be vector spaces over F and let S, T ∈ L(V, W). Then, we define
the point-wise
1. sum of S and T , denoted S + T , by (S + T )(v) = S(v) + T (v), for all v ∈ V.
2. scalar multiplication, denoted cT for c ∈ F, by (cT )(v) = c (T (v)), for all v ∈ V.

Theorem 4.2.11. Let V and W be vector spaces over F. Then L(V, W) is a vector space over
F. Furthermore, if dim V = n and dim W = m, then dim L(V, W) = mn.

Proof. It can be easily verified that under point-wise addition and scalar multiplication, defined
above, L(V, W) is indeed a vector space over F. We now prove the other part. So, let us
assume that B = {v1 , . . . , vn } and C = {w1 , . . . , wm } are bases of V and W, respectively. For
1 ≤ i ≤ n, 1 ≤ j ≤ m, we define the functions fij on the basis vectors of V by
fij (vk ) = wj , if k = i, and fij (vk ) = 0, if k ≠ i.

For other vectors of V, we extend the definition by linearity, i.e., if v = α1 v1 + · · · + αn vn then

fij (v) = fij (α1 v1 + · · · + αn vn ) = α1 fij (v1 ) + · · · + αn fij (vn ) = αi fij (vi ) = αi wj . (4.2.1)

Thus fij ∈ L(V, W). We now show that {fij |1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis of L(V, W).
As a first step, we show that the fij 's are linearly independent. So, consider the linear system Σi,j cij fij = 0, in the unknowns cij 's, for 1 ≤ i ≤ n, 1 ≤ j ≤ m. Using the point-wise addition
and scalar multiplication, we get
0 = 0(vk ) = (Σi,j cij fij )(vk ) = Σi,j cij fij (vk ) = ck1 w1 + · · · + ckm wm .

But, the set {w1 , . . . , wm } is linearly independent and hence the only solution equals ckj = 0,
for 1 ≤ j ≤ m. Now, as we vary vk from v1 to vn , we see that cij = 0, for 1 ≤ j ≤ m and
1 ≤ i ≤ n. Thus, we have proved the linear independence.
Now, let us prove that LS ({fij |1 ≤ i ≤ n, 1 ≤ j ≤ m}) = L(V, W). So, let f ∈ L(V, W).
Then, for 1 ≤ s ≤ n, f (vs ) ∈ W and hence there exist βst 's such that f (vs ) = βs1 w1 + · · · + βsm wm . So, if v = α1 v1 + · · · + αn vn ∈ V then, using Equation (4.2.1), we get

f (v) = f (Σs αs vs ) = Σs αs f (vs ) = Σs αs (Σt βst wt ) = Σs,t βst (αs wt ) = Σs,t βst fst (v) = (Σs,t βst fst )(v),

where s runs over 1, . . . , n and t runs over 1, . . . , m.

Since the above is true for every v ∈ V, LS ({fij |1 ≤ i ≤ n, 1 ≤ j ≤ m}) = L(V, W) and thus
the required result follows.
Before proceeding further, recall the following definition about a function.

Definition 4.2.12. Let f : S → T be any function.


1. Then, a function g : T → S is called a left inverse of f if (g ◦ f )(x) = x, for all x ∈ S.
That is, g ◦ f = Id, the identity function on S.

2. Then, a function h : T → S is called a right inverse of f if (f ◦ h)(y) = y, for all y ∈ T .


That is, f ◦ h = Id, the identity function on T .
3. Then f is said to be invertible if it has a right inverse and a left inverse.

Remark 4.2.13. Let f : S → T be invertible. Then, it can be easily shown that any right
inverse and any left inverse are the same. Thus, the inverse function is unique and is denoted
by f −1 . It is well known that f is invertible if and only if f is both one-one and onto.

Lemma 4.2.14. Let V and W be vector spaces over F and let T ∈ L(V, W). If T is one-one
and onto then, the map T −1 : W → V is also a linear transformation. The map T −1 is called
the inverse linear transform of T and is defined by T −1 (w) = v, whenever T (v) = w.

Proof. As T is one-one and onto, by Theorem 4.2.6, dim(V) = dim(W). So, by
Corollary 4.2.7, for each w ∈ W there exists a unique v ∈ V such that T (v) = w. Thus, one
defines T −1 (w) = v.
We need to show that T −1 (α1 w1 + α2 w2 ) = α1 T −1 (w1 ) + α2 T −1 (w2 ), for all α1 , α2 ∈ F
and w1 , w2 ∈ W. Note that by previous paragraph, there exist unique vectors v1 , v2 ∈ V such
that T −1 (w1 ) = v1 and T −1 (w2 ) = v2 . Or equivalently, T (v1 ) = w1 and T (v2 ) = w2 . So,
T (α1 v1 + α2 v2 ) = α1 w1 + α2 w2 , for all α1 , α2 ∈ F. Hence, for all α1 , α2 ∈ F, we get

T −1 (α1 w1 + α2 w2 ) = α1 v1 + α2 v2 = α1 T −1 (w1 ) + α2 T −1 (w2 ).



Thus, the required result follows.



Example 4.2.15. 1. Let T : R2 → R2 be given by (x, y) ↦ (x + y, x − y). Then, verify that T −1 is given by (x, y) ↦ ((x + y)/2, (x − y)/2).

2. Let T ∈ L(Rn , R[x; n − 1]) be given by (a1 , . . . , an ) ↦ a1 + a2 x + · · · + an xn−1 , for (a1 , . . . , an ) ∈ Rn . Then, T −1 maps the polynomial a1 + a2 x + · · · + an xn−1 to (a1 , . . . , an ). Verify that T −1 ∈ L(R[x; n − 1], Rn ).

Definition 4.2.16. Let V and W be vector spaces over F and let T ∈ L(V, W). Then, T is said
to be singular if {0} ⊊ Ker(T ), i.e., Ker(T ) contains a non-zero vector. If Ker(T ) = {0}
then, T is called non-singular.
Example 4.2.17. Let T ∈ L(R2 , R3 ) be defined by T ((x, y)T ) = (x, y, 0)T . Then, verify that T is
non-singular. Is T invertible?

We now prove a result that relates non-singularity with linear independence.

Theorem 4.2.18. Let V and W be vector spaces over F and let T ∈ L(V, W). Then the
following statements are equivalent.

1. T is one-one.

2. T is non-singular.

3. Whenever S ⊆ V is linearly independent then T (S) is necessarily linearly independent.

Proof. 1⇒2 Let T be singular. Then, there exists v ≠ 0 such that T (v) = 0 = T (0). This
implies that T is not one-one, a contradiction.
2⇒3 Let S ⊆ V be linearly independent. If possible, let T (S) be linearly dependent. Then, there exist v1 , . . . , vk ∈ S and α = (α1 , . . . , αk )T ≠ 0 such that α1 T (v1 ) + · · · + αk T (vk ) = 0. Thus, T (α1 v1 + · · · + αk vk ) = 0. But T is non-singular and hence we get α1 v1 + · · · + αk vk = 0 with α ≠ 0, a
contradiction to S being a linearly independent set.
3⇒1 Suppose that T is not one-one. Then, there exist x, y ∈ V such that x ≠ y but
T (x) = T (y). Thus, we have obtained S = {x − y}, a linearly independent subset of V with
T (S) = {0}, a linearly dependent set. A contradiction to our assumption. Thus, the required
result follows.

Definition 4.2.19. Let V and W be vector spaces over F and let T ∈ L(V, W). Then, T is
said to be an isomorphism if T is one-one and onto. The vector spaces V and W are said to
be isomorphic, denoted V ∼
= W, if there is an isomorphism from V to W.

We now give a formal proof of the statement in Remark 3.5.12.


Theorem 4.2.20. Let V be an n-dimensional vector space over F. Then V ≅ Fn .

Proof. Let {v1 , . . . , vn } be a basis of V and {e1 , . . . , en } the standard basis of Fn . Now define T (vi ) = ei , for 1 ≤ i ≤ n, and T (α1 v1 + · · · + αn vn ) = α1 e1 + · · · + αn en , for α1 , . . . , αn ∈ F. Then, it is easy to observe that T ∈ L(V, Fn ), T is one-one and onto. Hence, T is an isomorphism.
observe that T ∈ L(V, Fn ), T is one-one and onto. Hence, T is an isomorphism.
As a direct application, using a countability argument, one obtains the following result.

Corollary 4.2.21. The vector space R over Q is not finite dimensional. Similarly, the vector
space C over Q is not finite dimensional.

We now summarize the different definitions related with a linear operator on a finite dimen-
sional vector space. The proof basically uses the rank-nullity theorem and they appear in some
form in previous results. Hence, we leave the proof for the reader.

Theorem 4.2.22. Let V be a finite dimensional vector space over F with dim V = n. Then the
following statements are equivalent for T ∈ L(V).
1. T is one-one.
2. Ker(T ) = {0}.
3. Rank(T ) = n.
4. T is onto.
5. T is an isomorphism.
6. If {v1 , . . . , vn } is a basis for V then so is {T (v1 ), . . . , T (vn )}.

7. T is non-singular.
8. T is invertible.
Exercise 4.2.23. 1. Let V and W be vector spaces over F and let T ∈ L(V, W). If dim(V)
is finite then prove that

(a) T cannot be onto if dim(V) < dim(W).


(b) T cannot be one-one if dim(V) > dim(W).

2. Let A ∈ Mn (C). Then, the function T : Col(A∗ ) → Col(A) defined by T (x) = Ax is


invertible. [Hint: Use Theorem 3.4.13.3 and the rank-nullity theorem]
Ans: In view of Theorem 3.4.13.3 and the rank-nullity theorem, we just need to show that
the map is one-one. So, suppose that there exist x, y ∈ Col(A∗ ) such that T (x) = T (y).
Or equivalently, Ax = Ay. Thus, x − y ∈ Null(A) = (Col(A∗ ))⊥ (by Theorem 3.4.13.2).
Therefore, x − y ∈ (Col(A∗ ))⊥ ∩ Col(A∗ ) = {0}. Thus, x = y and hence the map is
one-one. Thus, the required result follows.

4.3 Matrix of a linear transformation


In Example 4.1.3.8, we saw that for each A ∈ Mm×n (C) there exists a linear transformation
T ∈ L(Cn , Cm ) given by T (x) = Ax, for each x ∈ Cn . In this section, we prove that if V

and W are vector spaces over F with dimensions n and m, respectively, then any T ∈ L(V, W)

corresponds to a set of m × n matrices. Before proceeding further, the readers should recall the

results on ordered basis (see Section 3.5).


So, let A = (v1 , . . . , vn ) and B = (w1 , . . . , wm ) be ordered bases of V and W, respectively.
Also, let A = [v1 , . . . , vn ] and B = [w1 , . . . , wm ] be the basis matrix of A and B, respectively.
Then, using Equation (3.5.1), v = A[v]A and w = B[w]B , for all v ∈ V and w ∈ W. Thus, we
see that for T ∈ L(V, W) and x ∈ V (T is determined by its image on basis vectors),
B[T (x)]B = T (x) = T ([v1 , . . . , vn ][x]A ) = [T (v1 ) · · · T (vn )] [x]A = [B[T (v1 )]B · · · B[T (vn )]B ] [x]A = B [[T (v1 )]B · · · [T (vn )]B ] [x]A .

As B is an invertible matrix, we cancel it to get [T (x)]B = [[T (v1 )]B , . . . , [T (vn )]B ] [x]A , for each x ∈ V. Note that the matrix [[T (v1 )]B · · · [T (vn )]B ], denoted T [A, B], is an m × n
matrix and is unique with respect to the ordered basis B as the i-th column equals [T (vi )]B , for
1 ≤ i ≤ n. So, we immediately have the following definition and result.

Definition 4.3.1. Let A = (v1 , . . . , vn ) and B = (w1 , . . . , wm ) be ordered bases of V and W,


respectively. If T ∈ L(V, W) then the matrix T [A, B] is called the coordinate matrix of T or
the matrix of the linear transformation T with respect to the basis A and B, respectively.

When there is no mention of bases, we take it to be the standard ordered bases and denote
the corresponding matrix by [T ]. Also, note that for each x ∈ V, the matrix T [A, B][x]A is
the coordinate vector of T (x). Thus, the matrix T [A, B] takes coordinate vector of the domain
points to the coordinate vector of its images. The above discussion is stated as the next result.
4.3. MATRIX OF A LINEAR TRANSFORMATION 117

Theorem 4.3.2. Let A = (v1 , . . . , vn ) and B = (w1 , . . . , wm ) be ordered bases of V and W,


respectively. If T ∈ L(V, W) then there exists a matrix S ∈ Mm×n (F) with
h i
S = T [A, B] = [T (v1 )]B · · · [T (vn )]B and [T (x)]B = S [x]A , for all x ∈ V.

Remark 4.3.3. Let V and W be vector spaces over F with ordered bases A1 = (v1 , . . . , vn )
and B1 = (w1 , . . . , wm ), respectively. Also, for α ∈ F with α 6= 0, let A2 = (αv1 , . . . , αvn ) and
B1 = (αw1 , . . . , αwm ) be another set of ordered bases of V and W, respectively. Then, for any
T ∈ L(V, W)
h i h i
T [A2 , B2 ] = [T (αv1 )]B2 · · · [T (αv1 )]B2 = [T (vn )]B1 · · · [T (v1 )]B1 = T [A1 , B1 ].

Thus, we see that the same matrix can be the matrix representation of T for two different pairs
of bases.

We now give a few examples to understand the above discussion and Theorem 4.3.2.

Q = (0, 1)
Q′ = (− sin θ, cos θ)

P ′ = (x′ , y ′ )

θ P = (cos θ, sin θ)
T
AF

θ P = (x, y)
P = (1, 0)
θ α
DR

O O

Figure 4.1: Counter-clockwise Rotation by an angle θ

Example 4.3.4. 1. Let T ∈ L(R2 ) represent a counterclockwise rotation by an angle θ, 0 ≤


θ < 2π. Then, using Figure 4.1, x = OP cos α and y = OP sin α, verify that
" # " # " # " #" #
x0 OP 0 cos(α + θ) OP cos α cos θ − sin α sin θ cos θ − sin θ x
= =  = .
y0 OP 0 sin(α + θ) OP sin α cos θ + cos α sin θ sin θ cos θ y

Or equivalently, the matrix in the standard ordered basis of R2 equals


" #
h i cos θ − sin θ
[T ] = T (e1 ), T (e2 ) = . (4.3.1)
sin θ cos θ

2. Let T ∈ L(R2 ) with T ((x, y)T ) = (x + y, x − y)T .


" #
h i 1 1
(a) Then [T ] = [T (e1 )] [T (e2 )] = .
1 −1
" # " #!
1 1
(b) On the image space take the ordered basis B = , . Then
0 1
"" # " # # " #
h i 1 1 0 2
[T ] = [T (e1 )]B [T (e2 )]B = = .
1 −1 1 −1
B B
118 CHAPTER 4. LINEAR TRANSFORMATIONS
" # " #!
−1 3
(c) In the above, let the ordered basis of the domain space be A = , . Then
1 1
"" " ## " " ## # "" # " # # " #
−1 3 0 4 2 2
T [A, B] = T T = = .
1 1 −2 2 −2 2
B B B B

3. Let A = (e1 , e2 ) and B = (e1 + e2 , e1 − e2 ) be two ordered bases of R2 . Then Compute


T [A, A] and T [B, B], where T ((x,"y)T ) = #(x + y, x − 2y)T . " #
1 1 −1 −1 1 1 1
Solution: Let A = Id2 and B = . Then, A = Id2 and B = . So,
1 −1 2 1 −1
"" " #!# " " #!# # "" # " # # " #
1 0 1 1 1 1
T [A, A] = T , T = , = and
0 1 1 −2 1 −2
"" " #!#A " " #!# A
# "" A # " #A # " #
1 3
1 1 2 0 2 2
T [B, B] = T , T = , = 3 3
1 −1 −1 3 2 −2
B B B B
" # " # " # " #
2 2 0 0
as = B −1 and = B −1 . Also, verify that T [B, B] = B −1 T [A, A]B.
−1 −1 3 3
B B

4. Let T ∈ L(R3 , R2 ) be defined by T ((x, y, z)T ) = (x + y − z, x + z)T . Determine [T ].


By definition
"" # " # " ## " #
1 1 −1 1 1 −1
[T ] = [[T (e1 )], [T (e2 )], [T (e3 )]] = , , = .
T

1 0 1 1 0 1
AF

5. Define T ∈ L(C3 ) by T (x) = x, for all x ∈ C3 . Determine the coordinate matrix with
DR

 
respect to the ordered basis A = e1 , e2 , e3 and B = (1, 0, 0), (1, 1, 0), (1, 1, 1) .
By definition, verify that
        
1 0 0 1 −1 0
        
T [A, B] = [[T (e1 )]B , [T (e2 )]B , [T (e3 )]B ] = 0 , 1 , 0
    

 = 0 1 −1
  
0 0 1 0 0 1
B B B
and         
1 1 1 1 1 1
        
T [B, A] = 
0 , 1 , 1
       = 0 1 1 .
  
0 0 1 0 0 1
A A A
Thus, verify that T [B, A]−1 = T [A, B] and T [A, A] = T [B, B] = I3 as the given map is
indeed the identity map.

6. Fix S ∈ Mn (C) and define T ∈ L(Cn ) by T (x) = Sx, for all x ∈ Cn . If A is the standard
basis of Cn then [T ] = S as

[T ][:, i] = [T (ei )]A = [S(ei )]A = [S[:, i]]A = S[:, i], for 1 ≤ i ≤ n.

7. Fix S ∈ Mm,n (C) and define T ∈ L(Cn , Cm ) by T (x) = Sx, for all x ∈ Cn . Let A and B
be the standard ordered bases of Cn and Cm , respectively. Then T [A, B] = S as

(T [A, B])[:, i] = [T (ei )]B = [S(ei )]B = [S[:, i]]B = S[:, i], for 1 ≤ i ≤ n.
4.4. SIMILARITY OF MATRICES 119

8. Fix S ∈ Mn (C) and define T ∈ L(Cn ) by T (x) = Sx, for all x ∈ Cn . Let A = (v1 , . . . , vn )
and B = (u1 , . . . , un ) be two ordered basses of Cn with respective basis matrices A and
B. Then
h i h i
T [A, B] = [T (v1 )]B · · · [T (v1 )]B = −1 −1
B T (v1 ) · · · B T (v1 )
h i h i
= B −1 Sv1 · · · B −1 Sv1 = B −1 S v1 · · · vn = B −1 SA.

In particular, if

(a) A = B then T [A, A] = A−1 SA. Thus, if S = In so that T = Id then Id[A, A] = In .


(b) S = In so that T = Id then Id[A, B] = B −1 A, an invertible matrix. Similarly,
Id[B, A] = A−1 B. So, Id[B, A] · Id[A, B] = (A−1 B)(B −1 A) = In .

9. Let T (x, y)T = (x + y, x − y)T and A = (e1 , e1 + e2 ) be the ordered basis of R2 . Then,


using Example 4.3.4.8a we obtain


" #" # " #
h i−1 h i 1 −1 1 2 0 2
T [A, A] = e1 e1 + e2 T (e1 ) T (e1 + e2 ) = = .
0 1 1 0 1 0

Example 4.3.5. [Finding T from T [A, B]]


T

1. Let V and W be vector spaces over F with ordered bases A and B, respectively. Suppose
AF

we are given the matrix S = T [A, B]. Then determine the corresponding T ∈ L(V, W).
DR

Solution: Let B be the basis matrix corresponding to the ordered basis B. Then, using
Equation (3.5.1) and Theorem 4.3.2, we see that

T (v) = B[T (v)]B = BT [A, B][v]A = BS[v]A .

2. In particular, if V = W = Fn and A = B then we see that

T (v) = BSB −1 v. (4.3.2)

Exercise 4.3.6. 1. Let T ∈ L(R2 ) represent the reflection about the line y = mx. Find [T ].

2. Let T ∈ L(R3 ) represent the reflection about the X-axis. Find [T ].

3. Let T ∈ L(R3 ) represent the counterclockwise rotation around the positive Z-axis by an
angle θ, 0 ≤ θ < 2π. Findits matrix with respect to the standard ordered basis of R3 .
cos θ − sin θ 0
 
[Hint: Is  sin θ cos θ 0 the required matrix?]

0 0 1

4. Define a function D ∈ L(R[x; n]) by D(f (x)) = f 0 (x). Find the matrix of D with respect
to the standard ordered basis of R[x; n]. Observe that Rng(D) ⊆ R[x; n − 1].
120 CHAPTER 4. LINEAR TRANSFORMATIONS

T [B, C]m×n S[C, D]p×m


(V, B, n) (W, C, m) (Z, D, p)

(ST )[B, D]p×n = S[C, D] · T [B, C]

Figure 4.2: Composition of Linear Transformations

4.4 Similarity of Matrices


Let V be a vector space over F with dim(V) = n and ordered basis B. Then any T ∈ L(V)
corresponds to a matrix in Mn (F). What happens if the ordered basis needs to change? We
answer this in this subsection.

Theorem 4.4.1 (Composition of Linear Transformations). Let V, W and Z be finite dimen-


sional vector spaces over F with ordered bases B, C and D, respectively. Also, let T ∈ L(V, W)
and S ∈ L(W, Z). Then S ◦ T = ST ∈ L(V, Z) (see Figure 4.2). Then

(ST ) [B, D] = S[C, D] · T [B, C].

Proof. Let B = (u1 , . . . , un ), C = (v1 , . . . , vm ) and D = (w1 , . . . , wp ) be the ordered bases of


V, W and Z, respectively. Then using Theorem 4.3.2, we have
T

(ST )[B, D] = [[ST (u1 )]D , . . . , [ST (un )]D ] = [[S(T (u1 ))]D , . . . , [S(T (un ))]D ]
AF

= [S[C, D] [T (u1 )]C , . . . , S[C, D] [T (un )]C ]


DR

= S[C, D] [[T (u1 )]C , . . . , [T (un )]C ] = S[C, D] · T [B, C].

Hence, the proof of the theorem is complete.


As an immediate corollary of Theorem 4.4.1 we have the following result.

Theorem 4.4.2 (Inverse of a Linear Transformation). Let V is a vector space with dim(V) = n.
If T ∈ L(V) is invertible then for any ordered basis B and C of the domain and co-domain,
respectively, one has (T [C, B])−1 = T −1 [B, C]. That is, the inverse of the coordinate matrix of
T is the coordinate matrix of the inverse linear transform.

Proof. As T is invertible, T T −1 = Id. Thus, Example 4.3.4.8a and Theorem 4.4.1 imply

In = Id[B, B] = (T T −1 )[B, B] = T [C, B] · T −1 [B, C].

Hence, by definition of inverse, T −1 [B, C] = (T [C, B])−1 and the required result follows.

Exercise 4.4.3. Find the matrix of the linear transformations given below.

1. Let B = x1 , x2 , x3 be an ordered basis of R3 . Now, define T ∈ L(R3 ) by T (x1 ) = x2 ,




T (x2 ) = x3 and T (x3 ) = x1 . Determine T [B, B]. Is T invertible?

2. Let B = 1, x, x2 , x3 be an ordered basis of R[x; 3] and define T ∈ L(R[x; 3]) by T (1) = 1,




T (x) = 1 + x, T (x2 ) = (1 + x)2 and T (x3 ) = (1 + x)3 . Prove that T is invertible. Also,
find T [B, B] and T −1 [B, B].
4.4. SIMILARITY OF MATRICES 121

T [B, B]
(V, B) (V, B)

[C, B] [B, C]

(V, C) (V, C)
T [C, C]

Figure 4.3: T [C, C] = Id[B, C] · T [B, B] · Id[C, B] - Similarity of Matrices

Let V be a finite dimensional vector space. Then, the next result answers the question “what
happens to the matrix T [B, B] if the ordered basis B changes to C?”

Theorem 4.4.4. Let B = (u1 , . . . , un ) and C = (v1 , . . . , vn ) be two ordered bases of V and Id
the identity operator. Then, for any linear operator T ∈ L(V)

T [C, C] = Id[B, C] · T [B, B] · Id[C, B] = (Id[C, B])−1 · T [B, B] · Id[C, B]. (4.4.1)

Proof. As Id is an identity operator, T [B, C] as (Id ◦ T ◦ Id)[B, C] (see Figure 4.3 for clarity).
Thus, using Theorem 4.4.1, we get

T [B, C] = (Id ◦ T ◦ Id)[B, C] = Id[B, C] · T [B, B] · Id[C, B].


T
AF

Hence, using Theorem 4.4.2, the required result follows.


DR

Let V be a vector space and let T ∈ L(V). If dim(V) = n then every ordered basis B of V
gives an n × n matrix T [B, B]. So, as we change the ordered basis, the coordinate matrix of
T changes. Theorem 4.4.4 tells us that all these matrices are related by an invertible matrix.
Thus, we are led to the following definitions.

Definition 4.4.5. Let V be a vector space with ordered bases B and C. If T ∈ L(V) then,
T [C, C] = Id[B, C] · T [B, B] · Id[C, B]. The matrix Id[B, C] is called the change of basis matrix
(also, see Theorem 3.5.10) from B to C.

Definition 4.4.6. Let X, Y ∈ Mn (C). Then, X and Y are said to be similar if there exists a
non-singular matrix P such that P −1 XP = Y ⇔ XP = P Y .

Example 4.4.7. Let B = 1 + x, 1 + 2x + x2 , 2 + x and C = 1, 1 + x, 1 + x + x2 be ordered


 

bases of R[x; 2]. Then, verify that Id[B, C]−1 = Id[C, B], as
 
−1 1 −2
Id[C, B] = [[1]B , [1 + x]B , [1 + x + x2 ]B ] = 
 
 0 0 1  and

1 0 1
 
0 −1 1
Id[B, C] = [[1 + x]C , [1 + 2x + x2 ]C , [2 + x]C ] = 
 
1 1 1 .
 
0 1 0
122 CHAPTER 4. LINEAR TRANSFORMATIONS

Exercise 4.4.8. 1. Let V be a vector space with dim(V) = n. Let T ∈ L(V) satisfy
T n−1 6= 0 but Tn = 0. Then, use Exercise 4.1.11.3 to get an ordered basis B =
u, T (u), . . . , T n−1 (u) of V.

 
0 0 0 ··· 0
1 0 0 · · · 0
 
 
(a) Now, prove that T [B, B] = 0
 1 0 · · · 0 .
. . . . . .. 
. . . .
. 
0 0 ··· 1 0
(b) Let A ∈ Mn (C) satisfy An−1 6= 0 but An = 0. Then, prove that A is similar to the
matrix given in Part 1a.

2. Let A be an ordered basis of a vector space V over F with dim(V) = n. Then prove that
the set of all possible matrix representations of T is given by (also see Definition 4.4.5)

{S · T [A, A] · S −1 | S ∈ Mn (F) is an invertible matrix}.

3. Let B1 (α, β) = {(x, y)T ∈ R2 : (x − α)2 + (y − β)2 ≤ 1}. Then, can we get a linear
transformation T ∈ L(R2 ) such that T (S) = W , where S and W are given below?
(a) S = B1 (0, 0) and W = B1 (1, 1).
T

(b) S = B1 (0, 0) and W = B1 (.3, 0).


AF

(c) S = B1 (0, 0) and W = hull(±e1 , ±e2 ), where hull means the convex hull.
DR

(d) S = B1 (0, 0) and W = {(x, y)T ∈ R2 : x2 + y 2 /4 = 1}.


(e) S = hull(±e1 , ±e2 ) and W is the interior of a pentagon.

4. Let V, W be vector spaces over F with dim(V) = n and dim(W) = m and ordered bases
B and C, respectively. Define IB,C : L(V, W) → Mm,n (F) by IB,C (T ) = T [B, C]. Show
that IB,C is an isomorphism. Thus, when bases are fixed, the number of m × n matrices
is same as the number of linear transformations.
5. Define T ∈ L(R3 ) by T ((x, y, z)T ) = (x + y + 2z, x − y − 3z, 2x + 3y + z)T . Let B be the

standard ordered basis and C = (1, 1, 1), (1, −1, 1), (1, 1, 2) be another ordered basis of
R3 . Then find

(a) matrices T [B, B] and T [C, C].


(b) the matrix P such that P −1 T [B, B] P = T [C, C].

4.5 Dual Space*


Definition 4.5.1. Let V be a vector space over F. Then a map T ∈ L(V, F) is called a linear
functional on V.
Example 4.5.2. 1. Let a ∈ Cn be fixed. Then, T (x) = a∗ x is a linear function from Cn to
C.
4.5. DUAL SPACE* 123

2. Define T (A) = tr(A), for all A ∈ Mn (R). Then, T is a linear functional from Mn (R) to R.
Rb
3. Define T (f ) = f (t)dt, for all f ∈ C([a, b], R). Then, T is a linear functional from
a
L(C([a, b], R) to R.
Rb
4. Define T (f ) = t2 f (t)dt, , for all f ∈ C([a, b], R). Then, T is a linear functional from
a
L(C([a, b], R) to R.
5. Define T : C3 → C by T ((x, y, z)T ) = x. Is it a linear functional?
6. Let B be a basis of a vector space V over F. For a fixed element u ∈ B, define
(
1 if x = u
T (x) =
0 if x ∈ B \ u.

Now, extend T linearly to all of V. Does, T give rise to a linear functional?

Definition 4.5.3. Let V be a vector space over F. Then L(V, F) is called the dual space of
V and is denoted by V∗ . The double dual space of V, denoted V∗∗ , is the dual space of V∗ .

We first give an immediate corollary of Theorem 4.2.20.

Corollary 4.5.4. Let V and W be vector spaces over F with dim V = n and dim W = m.
1. Then L(V, W) ∼
= Fmn . Moreover, {fij |1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis of L(V, W).
T
AF

2. In particular, if W = F then L(V, F) = V∗ ∼ = Fn . Moreover, if({v1 , . . . , vn } is a basis of


1, if k = i
DR

V then the set {fi |1 ≤ i ≤ n} is a basis of V∗ , where fi (vk ) = The basis


0, k 6= i.
{fi |1 ≤ i ≤ n} is called the dual basis of Fn .

Exercise 4.5.5. Let V be a vector space. Suppose there exists v ∈ V such that f (v) = 0, for
all f ∈ V∗ . Then prove that v = 0.

So, we see that V∗ can be understood through a basis of V. Thus, one can understand V∗∗
again via a basis of V∗ . But, the question arises “can we understand it directly via the vector
space V itself?” We answer this in affirmative by giving a canonical isomorphism from V to V∗∗ .
To do so, for each v ∈ V, we define a map Lv : V∗ → F by Lv (f ) = f (v), for each f ∈ V∗ . Then
Lv is a linear functional as

Lv (αf + g) = (αf + g) (v) = αf (v) + g(v) = αLv (f ) + Lv (g).

So, for each v ∈ V, we have obtained a linear functional Lv ∈ V∗∗ . Note that, if v 6= w then,
Lv 6= Lw . Indeed, if Lv = Lw then, Lv (f ) = Lw (f ), for all f ∈ V∗ . Thus, f (v) = f (w), for all
f ∈ V∗ . That is, f (v − w) = 0, for each f ∈ V∗ . Hence, using Exercise 4.5.5, we get v − w = 0,
or equivalently, v = w.
We use the above argument to give the required canonical isomorphism.

Theorem 4.5.6. Let V be a vector space over F. If dim(V) = n then the canonical map
T : V → V∗∗ defined by T (v) = Lv is an isomorphism.
124 CHAPTER 4. LINEAR TRANSFORMATIONS

Proof. Note that for each f ∈ V∗ ,

Lαv+u (f ) = f (αv + u) = αf (v) + f (u) = αLv (f ) + Lu (f ) = (αLv + Lu ) (f ).

Thus, Lαv+u = αLv +Lu . Hence, T (αv+u) = αT (v)+T (u). Thus, T is a linear transformation.
For verifying T is one-one, assume that T (v) = T (u), for some u, v ∈ V. Then, Lv = Lu . Now,
use the argument just before this theorem to get v = u. Therefore, T is one-one.
Thus, T gives an inclusion (one-one) map from V to V∗∗ . Further, applying Corollary 4.5.4.2
to V∗ , gives dim(V∗∗ ) = dim(V∗ ) = n. Hence, the required result follows.
We now give a few immediate consequences of Theorem 4.5.6.

Corollary 4.5.7. Let V be a vector space of dimension n with basis B = {v1 , . . . , vn }.


1. Then, a basis of V∗∗ , the double dual of V, equals D = {Lv1 , . . . , Lvn }. Thus, for each
T ∈ V∗∗ there exists x ∈ V such that T (f ) = f (x), for all f ∈ V∗ . Or equivalently, there
exists x ∈ V such that T = Tx .

2. If C = {f1 , . . . , fn } is the dual basis of V∗ defined using the basis B (see Corollary 4.5.4.2)
then D is indeed the dual basis of V∗∗ obtained using the basis C of V∗ . Thus, each basis
of V∗ is the dual basis of some basis of V.

Proof. Part 1 is direct as T : V → V∗∗ was a canonical inclusion map. For Part 2, we need to
T

show that
AF

( (
1, if j = i 1, if j = i
DR

Lvi (fj ) = or equivalently fj (vi ) =


0, if j 6= i 0, if j 6= i

which indeed holds true using Corollary 4.5.4.2.


Let V be a finite dimensional vector space. Then Corollary 4.5.7 implies that the spaces V
and V∗ are naturally dual to each other.
We are now ready to prove the main result of this subsection. To start with, let V and W
be vector spaces over F. Then, for each T ∈ L(V, W), we want to define a map Tb : W∗ → V∗ .
So, if g ∈ W∗ then, Tb(g) a linear functional
  from V to F. So, we need to be evaluate T (g) at
b
an element of V. Thus, we define Tb(g) (v) = g (T (v)), for all v ∈ V. Now, we note that
Tb ∈ L(W∗ , V∗ ), as for every g, h ∈ W∗ ,
   
Tb(αg + h) (v) = (αg + h) (T (v)) = αg (T (v)) + h (T (v)) = αTb(g) + Tb(h) (v),

for all v ∈ V implies that Tb(αg + h) = αTb(g) + Tb(h).

Theorem 4.5.8. Let V and W be vector spaces over F with ordered bases A = (v1 , . . . , vn )
and B = (w1 , . . . , wm ), respectively. Also, let A∗ = (f1 , . . . , fn ) and B ∗ = (g1 , . . . , gm ) be the
corresponding ordered bases of the dual spaces V∗ and W∗ , respectively. Then,

Tb[B ∗ , A∗ ] = (T [A, B])T ,

the transpose of the coordinate matrix T .


4.6. SUMMARY 125
hh i h i i
Proof. Note that we need to compute Tb[B ∗ , A∗ ] = Tb(g1 ) ∗
, . . . , T
b(gm ) and prove that
A A∗
it equals the transpose of the matrix T [A, B]. So, let
 
a11 a12 ··· a1n
 
 a21 a22 · · · a2n 
T [A, B] = [[T (v1 )]B , . . . , [T (vn )]B ] =  . .
 
 .. .. .. .. 
 . . . 

am1 am2 · · · amn

Thus, to prove the required result, we need to show that


 
aj1
n
 
h i  aj2  X
T (gj ) ∗ = [f1 , . . . , fn ]  .  = ajk fk , for 1 ≤ j ≤ m. (4.5.1)
b  
A  .. 
  k=1
ajn

n n
 
P P
Now, recall that the functionals fi ’s and gj ’s satisfy αk fk (vt ) = αk (fk (vt )) = αt ,
k=1 k=1
for 1 ≤ t ≤ n and [gj (w1 ), . . . , gj (wm )] = eTj , a row vector with 1 at the j-th place and 0,
elsewhere. So, let B = [w1 , . . . , wm ] and evaluate Tb(gj ) at vt ’s, the elements of A.
 
Tb(gj ) (vt ) = gj (T (vt )) = gj (B [T (vt )]B ) = [gj (w1 ), . . . , gj (wm )] [T (vt )]B
T

 
a1t
AF

n
  !
 a2t  X
= eTj  .  = ajt =
DR

ajk fk (vt ).
 
 .. 
  k=1
amt
n
P
Thus, the linear functional Tb(gj ) and ajk fk are equal at vt , for 1 ≤ t ≤ n, the basis vectors
k=1
n
of V. Hence Tb(gj ) =
P
ajk fk which gives Equation (4.5.1).
k=1

Remark 4.5.9. The proof of Theorem 4.5.8 also shows the following.
1. For each T ∈ L(V, W) there exists a unique map Tb ∈ L(W∗ , V∗ ) such that
 
Tb(g) (v) = g (T (v)) , for each g ∈ W∗ .

2. The coordinate matrices T [A, B] and Tb[B ∗ , A∗ ] are transpose of each other, where the
ordered bases A∗ of V∗ and B ∗ of W∗ correspond, respectively, to the ordered bases A of
V and B of W.
3. Thus, the results on matrices and its transpose can be re-written in the language a vector
space and its dual space.

4.6 Summary
126 CHAPTER 4. LINEAR TRANSFORMATIONS

T
AF
DR
Chapter 5

Inner Product Spaces

5.1 Definition and Basic Properties


Recall the dot product in R2 and R3 . Dot product helped us to compute the length of vectors
and angle between vectors. This enabled us to rephrase geometrical problems in R2 and R3
in the language of vectors. We generalize the idea of dot product to achieve similar goal for a
general vector space over R or C. So, in this chapter F will denote either R or C.

Definition 5.1.1. Let V be a vector space over F. An inner product over V, denoted by
h , i, is a map from V × V to F satisfying
T
AF

1. hau + bv, wi = ahu, wi + bhv, wi, for all u, v, w ∈ V and a, b ∈ F,


DR

2. hu, vi = hv, ui, the complex conjugate of hu, vi, for all u, v ∈ V and

3. hu, ui ≥ 0 for all u ∈ V. Furthermore, equality holds if and only if u = 0.

Remark 5.1.2. Using the definition of inner product, we immediately observe that
1. hv, αwi = hαw, vi = αhw, vi = αhv, wi, for all α ∈ F and v, w ∈ V.
2. If hu, vi = 0 for all v ∈ V then in particular hu, ui = 0. Hence, u = 0.

Definition 5.1.3. Let V be a vector space with an inner product h , i. Then, (V, h , i) is called
an inner product space (in short, ips).

Example 5.1.4. Examples 1 and 2 that appear below are called the standard inner product
or the dot product on Rn and Cn , respectively. Whenever an inner product is not clearly
mentioned, it will be assumed to be the standard inner product.

1. For u = (u1 , . . . , un )T , v = (v1 , . . . , vn )T ∈ Rn define hu, vi = u1 v1 + · · · + un vn = vT u.


Then, h , i is indeed an inner product and hence Rn , h , i is an ips.


2. For u = (u1 , . . . , un )∗ , v = (v1 , . . . , vn )∗ ∈ Cn define hu, vi = u1 v1 + · · · + un vn = v∗ u.


Then, Cn , h , i is an ips.


" #
4 −1
3. For x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 and A = , define hx, yi = yT Ax. Then,
−1 2
h , i is an inner product as hx, xi = (x1 − x2 )2 + 3x21 + x22 .

127
128 CHAPTER 5. INNER PRODUCT SPACES
" #
a b
4. Fix A = with a, c > 0 and ac > b2 . Then, hx, yi = yT Ax is an inner product on
b c
h i2
bx2
R2 as hx, xi = ax21 + 2bx1 x2 + cx22 = a x1 + 1
ac − b2 x22 .
 
a + a

5. Verify that for x = (x1 , x2 , x3 )T , y = (y1 , y2 , y3 )T ∈ R3 , hx, yi = 10x1 y1 + 3x1 y2 + 3x2 y1 +


2x2 y2 + x2 y3 + x3 y2 + x3 y3 defines an inner product.

6. For x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 , we define three maps that satisfy at least one
condition out of the three conditions for an inner product. Determine the condition which
is not satisfied. Give reasons for your answer.

(a) hx, yi = x1 y1 .
(b) hx, yi = x21 + y12 + x22 + y22 .
(c) hx, yi = x1 y13 + x2 y23 .

7. Let A ∈ Mn (C) be a Hermitian matrix. Then, for x, y ∈ Cn , define hx, yi = y∗ Ax. Then,
h , i satisfies hx, yi = hy, xi and hx + αz, yi = hx, yi + αhz, yi, for all x, y, z ∈ Cn and
α ∈ C. Does there exist conditions on A such that hx, xi ≥ 0 for all x ∈ C? This will be
answered in affirmative in the chapter on eigenvalues and eigenvectors.

8. For A, B ∈ Mn (R), define hA, Bi = tr(BT A). Then,


T
AF

hA + B, Ci = tr CT (A + B) = tr(CT A) + tr(CT B) = hA, Ci + hB, Ci and




hA, Bi = tr(BT A) = tr( (BT A)T ) = tr(AT B) = hB, Ai.


DR

n n n
If A = [aij ] then hA, Ai = tr(AT A) = (AT A)ii = a2ij and therefore,
P P P
aij aij =
i=1 i,j=1 i,j=1
hA, Ai > 0 for all nonzero matrix A.
R1
9. Consider the complex vector space C[−1, 1] and define hf, gi = f (x)g(x)dx. Then,
−1
R1
(a) hf , f i = | f (x) |2 dx ≥ 0 as | f (x) |2 ≥ 0 and this integral is 0 if and only if f ≡ 0
−1
as f is continuous.
R1 R1 R1
(b) hg, f i = g(x)f (x)dx = g(x)f (x)dx = f (x)g(x)dx = hf , gi.
−1 −1 −1
R1 R1
(c) hf + g, hi = (f + g)(x)h(x)dx = [f (x)h(x) + g(x)h(x)]dx = hf , hi + hg, hi.
−1 −1
R1 R1
(d) hαf , gi = (αf (x))g(x)dx = α f (x)g(x)dx = αhf , gi.
−1 −1
(e) Fix an ordered basis B = [u1 , . . . , un ] of a complex vector space V. Then, for any
   
a1 b1
. . n
u, v ∈ V, with [u]B =  . .
P
 .  and [v]B =  . , define hu, vi = ai bi . Then, h , i is
i=1
an bn
indeed an inner product in V. So, any finite dimensional vector space can be endowed
with an inner product.
5.1. DEFINITION AND BASIC PROPERTIES 129

5.1.1 Cauchy Schwartz Inequality

As hu, ui > 0, for all u 6= 0, we use inner product to define length of a vector.

Definition 5.1.5. Let V be a vector space over F. Then, for any vector u ∈ V, we define the
p
length (norm) of u, denoted kuk = hu, ui, the positive square root. A vector of norm 1 is
u
called a unit vector. Thus, is called the unit vector in the direction of u.
kuk
1. Let V be an ips and u ∈ V. Then, for any scalar α, kαuk = α · kuk.

Example 5.1.6.
√ √
2. Let u = (1, −1, 2, −3)T ∈ R4 . Then, kuk = 1 + 1 + 4 + 9 = 15. Thus, √1 u
15
and
− √115 u are vectors of norm 1. Moreover √1 u
15
is a unit vector in the direction of u.

Exercise 5.1.7. 1. Let u = (−1, 1, 2, 3, 7)T ∈ R5 . Find all α ∈ R such that kαuk = 1.

2. Let u = (−1, 1, 2, 3, 7)T ∈ C5 . Find all α ∈ C such that kαuk = 1.

3. Prove that kx + yk2 + kx − yk2 = 2 kxk2 + kyk2 , for all xT , yT ∈ Rn . This equality is


called the Parallelogram Law as in a parallelogram the sum of square of the lengths
of the diagonals is equal to twice the sum of squares of the lengths of the sides.

4. Apollonius’ Identity: Let the length of the sides of a triangle be a, b, c ∈ R and that of
the median be d ∈ R. If the median is drawn on the side with length a then prove that
 a 2 
T

2 2
b +c =2 d + 2 .
AF

2
DR

5. Let u = (1, 2)T , v = (2, −1)T ∈ R2 . Then, does there "exist an


# inner product in R such
2

a b
that kuk = 1, kvk = 1 and hu, vi = 0? [Hint: Let A = and define hx, yi = yT Ax.
b c
Use given conditions to get a linear system of 3 equations in the variables a, b, c.]

6. Let x = (x1 , x2 )T , y = (y1 , y2 )T ∈ R2 . Then, hx, yi = 3x1 y1 − x1 y2 − x2 y1 + x2 y2 defines


an inner product. Use this inner product to find

(a) the angle between e1 = (1, 0)T and e2 = (0, 1)T .


(b) v ∈ R2 such that hv, e1 i = 0.
(c) x, y ∈ R2 such that kxk = kyk = 1 and hx, yi = 0.

7. Under the standard inner product in Mm,n (R), Rm and Rn , prove that
m n
(a) for A ∈ Mm,n (R), kAk2 = tr(AT A) = kA[k, :]k2 = kA[:, `]k2 .
P P
k=1 `=1
(b) for A ∈ Mm,n (R) and x ∈ Rn , kAxk ≤ kAk · kxk.
m m n
Ans: kAxk2 = | (Ax)k |2 = | A[k, :]T x |2 = |hx, A[k, :]i|2
P P  P
k=1 k=1 k=1
m m
kxk2 · kA[k, :]k2 = kxk2 kA[k, :]k2 = kxk2 kAk2 .
P P

k=1 k=1

A very useful and a fundamental inequality, commonly called the Cauchy-Schwartz inequal-
ity, concerning the inner product is proved next.
130 CHAPTER 5. INNER PRODUCT SPACES

Theorem 5.1.8 (Cauchy-Bunyakovskii-Schwartz inequality). Let V be an inner product space


over F. Then, for any u, v ∈ V
| hu, vi | ≤ kuk kvk. (5.1.1)

Moreover, equality holds in Inequality


 (5.1.1)
 if and only if u and v are linearly dependent.
u u
Furthermore, if u 6= 0 then v = v, .
kuk kuk
Proof. If u = 0 then Inequality (5.1.1) holds. Hence, let u 6= 0. Then, by Definition 5.1.1.3,
hv, ui
hλu + v, λu + vi ≥ 0 for all λ ∈ F and v ∈ V. In particular, for λ = − ,
kuk2

0 ≤ hλu + v, λu + vi = λλkuk2 + λhu, vi + λhv, ui + kvk2


hv, ui hv, ui hv, ui hv, ui | hv, ui |2
= 2 2
kuk2 − 2
hu, vi − 2
hv, ui + kvk2 = kvk2 − .
kuk kuk kuk kuk kuk2

Or, in other words | hv, ui |2 ≤ kuk2 kvk2 and the proof of the inequality is over.
Now, note that equality holds in Inequality (5.1.1) if and only if hλu + v, λu + vi = 0, or
equivalently, λu + v = 0. Hence, u and v are linearly dependent. Moreover,

0 = h0, ui = hλu + v, ui = λhu, ui + hv, ui

hv, ui
 
u u
implies that v = −λu = − 2
u = v, .
kuk kuk kuk
T

n 2  n  n
AF


Corollary 5.1.9. Let x, y ∈ R . Then
n
P P 2 P 2
xi y i ≤ xi yi .
i=1 i=1 i=1
DR

5.1.2 Angle between two Vectors

Let V be a real vector space. Then, for u, v ∈ V, the Cauchy-Schwartz inequality implies that
hu,vi
−1 ≤ kuk kvk ≤ 1. We use this together with the properties of the cosine function to define the
angle between two vectors in an inner product space.

Definition 5.1.10. Let V be a real vector space. If θ ∈ [0, π] is the angle between u, v ∈ V\{0}
then we define
hu, vi
cos θ = .
kuk kvk
Example 5.1.11. 1. Take (1, 0)T , (1, 1)T ∈ R2 . Then, cos θ = √1 .
2
So θ = π/4.
2. Take (1, 1, 0)T , (1, 1, 1)T ∈ R3 . Then, angle between them, say β = cos−1 √2 .
6

3. Angle depends on the IP. Take hx, yi = 2x1 y1 + x1 y2 + x2 y1 + x2 y2 on R2 . Then, angle


between (1, 0)T , (1, 1)T ∈ R2 equals cos−1 √3 .
10

4. As hx, yi = hy, xi for any real vector space, the angle between x and y is same as the
angle between y and x.
 
1 1
5. Let a, b ∈ R with a, b > 0. Then, prove that (a + b) + ≥ 4.
a b
6. For
n1  ≤ i ≤ n, let ai ∈ R with ai > 0. Then, use Corollary 5.1.9 to show that
n 1
≥ n2 .
P P
ai
i=1 i=1 ai
5.1. DEFINITION AND BASIC PROPERTIES 131

n( | z1 |2 + · · · + | zn |2 ), for z1 , . . . , zn ∈ C. When does


p
7. Prove that | z1 + · · · + zn | ≤
the equality hold?
8. Let V be an ips. If u, v ∈ V with kuk = 1, kvk = 1 and hu, vi = 1 then prove that u = αv
for some α ∈ F. Is α = 1?

a
b

A B
c

Figure 5.1: Triangle with vertices A, B and C

We will now prove that if A, B and C are the vertices of a triangle (see Figure 5.1) and a, b
b2 +c2 −a2
and c, respectively, are the lengths of the corresponding sides then cos(A) = 2bc . This in
turn implies that the angle between vectors has been rightly defined.

Lemma 5.1.12. Let A, B and C be the vertices of a triangle (see Figure 5.1) with corresponding
side lengths a, b and c, respectively, in a real inner product space V then

b2 + c2 − a2
cos(A) = .
T

2bc
AF

Proof. Let 0, u and v be the coordinates of the vertices A, B and C, respectively, of the triangle
DR

ABC. Then, AB ~ = u, AC
~ = v and BC ~ = v − u. Thus, we need to prove that

kvk2 + kuk2 − kv − uk2


cos(A) = ⇔ kvk2 + kuk2 − kv − uk2 = 2 kvk kuk cos(A).
2kvkkuk

Now, by definition kv−uk2 = kvk2 +kuk2 −2hv, ui and hence kvk2 +kuk2 −kv−uk2 = 2 hu, vi.
As hv, ui = kvk kuk cos(A), the required result follows.

Definition 5.1.13. Let V be an inner product space over R. Then,


1. the vectors u, v ∈ V are called orthogonal/perpendicular if hu, vi = 0.
2. Let S ⊆ V. Then, the orthogonal complement of S in V, denoted S ⊥ , equals

S ⊥ = {v ∈ V : hv, wi = 0, for all w ∈ S}.

Example 5.1.14. 1. 0 is orthogonal to every vector as h0, xi = 0 for all x ∈ V.

2. If V is a vector space over R or C then 0 is the only vector that is orthogonal to itself.

3. Let V = R.

(a) S = {0}. Then, S ⊥ = R.


(b) S = R, Then, S ⊥ = {0}.
(c) Let S be any subset of R containing a nonzero real number. Then, S ⊥ = {0}.
132 CHAPTER 5. INNER PRODUCT SPACES

4. Let u = (1, 2)T . What is u⊥ in R2 ?


Solution: {(x, y)T ∈ R2 | x + 2y = 0}. Is this Null(u)? Note that (2, −1)T is a basis of
u⊥ and for any vector x ∈ R2 ,

2x1 − x2
 
u u x1 + 2x2
x = hx, ui 2
+ x − hx, ui 2
= (1, 2)T + (2, −1)T
kuk kuk 5 5

is a decomposition of x into two vectors, one parallel to u and the other parallel to u⊥ .

5. Fix u = (1, 1, 1, 1)T , v = (1, 1, −1, 0)T ∈ R4 . Determine z, w ∈ R4 such that u = z + w


with the condition that z is parallel to v and w is orthogonal to v.
Solution: As z is parallel to v, z = kv = (k, k, −k, 0)T , for some k ∈ R. Since w is
orthogonal to v the vector w = (a, b, c, d)T satisfies a + b − c = 0. Thus, c = a + b and

(1, 1, 1, 1)T = u = z + w = (k, k, −k, 0)T + (a, b, a + b, d)T .

Comparing the corresponding coordinates, gives the linear system d = 1, a + k = 1,


b + k = 1 and a + b − k = 1 in the variables a, b, d and k. Thus, solving for a, b, d and k
1 1
gives z = (1, 1, −1, 0)T and w = (2, 2, 4, 3)T .
3 3
6. Let x, y ∈ Rn then prove that

(a) hx, yi = 0 ⇐⇒ kx − yk2 = kxk2 + kyk2 (Pythagoras Theorem).


T

Solution: Use kx − yk2 = kxk2 + kyk2 − 2hx, yi to get the required result follows.
AF

(b) kxk = kyk ⇐⇒ hx + y, x − yi = 0 (x and y form adjacent sides of a rhombus as the


DR

diagonals x + y and x − y are orthogonal).


Solution: Use hx + y, x − yi = kxk2 − kyk2 to get the required result follows.
(c) 4hx, yi = kx + yk2 − kx − yk2 (polarization identity in Rn ).
Solution: Just expand the right hand side to get the required result follows.
(d) kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 (parallelogram law: the sum of squares
of the diagonals of a parallelogram equals twice the sum of squares of its sides).
Solution: Just expand the left hand side to get the required result follows.

7. Let P = (1, 1, 1)T , Q = (2, 1, 3)T and R = (−1, 1, 2)T be three vertices of a triangle in R3 .
Compute the angle between the sides P Q and P R.
Solution: Method 1: Note that P~Q = (2, 1, 3)T − (1, 1, 1)T = (1, 0, 2)T , P~R =
~ = (−3, 0, −1)T . As hP~Q, P~Ri = 0, the angle between the sides
(−2, 0, 1)T and RQ
π
P Q and P R is .
2
√ √ √
Method 2: kP Qk = 5, kP Rk = 5 and kQRk = 10. As kQRk2 = kP Qk2 + kP Rk2 ,
π
by Pythagoras theorem, the angle between the sides P Q and P R is .
2
Exercise 5.1.15. 1. Let V be an ips.
(a) If S ⊆ V then S ⊥ is a subspace of V and S ⊥ = (LS(S))⊥ .
(b) Furthermore, if V is finite dimensional then S ⊥ and LS(S) are complementary. That
is, V = LS(S) + S ⊥ . Equivalently, hu, wi = 0, for all u ∈ LS(S) and w ∈ S ⊥ .
5.1. DEFINITION AND BASIC PROPERTIES 133

2. Consider R3 with the standard inner product. Find


(a) S ⊥ for S = {(1, 1, 1)T , (0, 1, −1)T } and S = LS((1, 1, 1)T , (0, 1, −1)T ).
(b) vectors v, w ∈ R3 such that v, w, u = (1, 1, 1)T are mutually orthogonal.
(c) the line passing through (1, 1, −1)T and parallel to (a, b, c) 6= 0.
(d) the plane containing (1, 1 − 1) with (a, b, c) 6= 0 as the normal vector.
(e) the area of the parallelogram with three vertices 0T , (1, 2, −2)T and (2, 3, 0)T .
(f ) the area of the parallelogram when kxk = 5, kx − yk = 8 and kx + yk = 14.
(g) the plane containing (2, −2, 1)T and perpendicular to the line with parametric equa-
tion x = t − 1, y = 3t + 2, z = t + 1.
(h) the plane containing the lines (1, 2, −2) + t(1, 1, 0) and (1, 2, −2) + t(0, 1, 2).
(i) k such that cos−1 (hu, vi) = π/3, where u = (1, −1, 1)T and v = (1, k, 1)T .
(j) the plane containing (1, 1, 2)T and orthogonal to the line with parametric equation
x = 2 + t, y = 3 and z = 1 − t.
(k) a parametric equation of a line containing (1, −2, 1)T and orthogonal to x+3y +2z =
1.

3. Let P = (3, 0, 2)T , Q = (1, 2, −1)T and R = (2, −1, 1)T be three points in R3 . Then,
(a) find the area of the triangle with vertices P, Q and R.
T

(b) find the area of the parallelogram built on vectors P~Q and QR.
~
AF

(c) find a nonzero vector orthogonal to the plane of the above triangle.
DR


(d) find all vectors x orthogonal to P~Q and QR
~ with kxk = 2.

(e) the volume of the parallelepiped built on vectors P~Q and QR


~ and x, where x is one
of the vectors found in Part 3d. Do you think the volume would be different if you
choose the other vector x?

4. Let p1 be a plane containing A = (1, 2, 3)T and (2, −1, 1)T as its normal vector. Then,

(a) find the equation of the plane p2 that is parallel to p1 and contains (−1, 2, −3)T .
(b) calculate the distance between the planes p1 and p2 .

5. In the parallelogram ABCD, ABkDC and ADkBC and A = (−2, 1, 3)T , B = (−1, 2, 2)T
and C = (−3, 1, 5)T . Find the

(a) coordinates of the point D,


(b) cosine of the angle BCD.
(c) area of the triangle ABC
(d) volume of the parallelepiped determined by AB, AD and (0, 0, −7)T .

6. Let W = {(x, y, z, w)T ∈ R4 : x + y + z − w = 0}. Find a basis of W⊥ .

7. Recall the ips Mn (R) (see Example 5.1.4.8). If W = {A ∈ Mn (R) | AT = A} then W⊥ ?


134 CHAPTER 5. INNER PRODUCT SPACES

5.1.3 Normed Linear Space

To proceed further, recall that a vector space over R or C was a linear space.

Definition 5.1.16. Let V be a linear space.


1. Then, a norm on V is a function f (x) = kxk from V to R such that
(a) kxk ≥ 0 for all x ∈ V and if kxk = 0 then x = 0.
(b) kαxk = | α | kxk for all α ∈ F and x ∈ V.
(c) kx + yk ≤ kxk + kyk for all x, y ∈ V (triangle inequality).

2. A linear space with a norm on it is called a normed linear space (nls).

Theorem 5.1.17. Let V be a normed linear space and x, y ∈ V. Then, kxk − kyk ≤ kx − yk.

Proof. As kxk = kx − y + yk ≤ kx − yk + kyk one has kxk − kyk ≤ kx − yk. Similarly, one
obtains kyk − kxk ≤ ky − xk = kx − yk. Combining the two, the required result follows.
1. On R3 , kxk = x21 + x22 + x23 is a norm. Also, observe that this norm
p
Example 5.1.18.
p
corresponds to hx, xi, where h, i is the standard inner product.
2. Let V be an ips. Is it true that f (x) = hx, xi is a norm?
p

Solution: Yes. The readers should verify the first two conditions. For the third condition,
recalling the Cauchy-Schwartz inequality, we get
T

f (x + y)2 = hx + y, x + yi = hx, xi + hx, yi + hy, xi + hy, yi


AF

≤ kxk2 + kxkkyk + kxkkyk + kyk2 = (f (x) + f (y))2 .


DR

p
Thus, kxk = hx, xi is a norm, called the norm induced by the inner product h·, ·i.
Exercise 5.1.19. 1. Let V be an ips. Then,

4hx, yi = kx + yk2 − kx − yk2 + ikx + iyk2 − ikx − iyk2 (Polarization Identity).

2. Consider the complex vector space Cn . If x, y ∈ Cn then prove that

(a) If x 6= 0 then kx + ixk2 = kxk2 + kixk2 , even though hx, ixi =


6 0.
(b) hx, yi = 0 whenever kx + yk2 = kxk2 + kyk2 and kx + iyk2 = kxk2 + kiyk2 .

3. Let A ∈ Mn (C) satisfy kAxk ≤ kxk for all x ∈ Cn . Then, prove that if α ∈ C with
| α | > 1 then A − αI is invertible.

The next result is stated without proof as the proof is beyond the scope of this book.

Theorem 5.1.20. Let k · k be a norm on a nls V. Then, k · k is induced by some inner product
if and only if k · k satisfies the parallelogram law: kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 .
Example 5.1.21. 1. For x = (x1 , x2 )T ∈ R2 , we define kxk1 = |x1 | + |x2 |. Verify that
kxk1 is indeed a norm. But, for x = e1 and y = e2 , 2(kxk2 + kyk2 ) = 4 whereas

kx + yk2 + kx − yk2 = k(1, 1)k2 + k(1, −1)k2 = (|1| + |1|)2 + (|1| + | − 1|)2 = 8.

So, the parallelogram law fails. Thus, kxk1 is not induced by any inner product in R2 .
5.2. GRAM-SCHMIDT ORTHONORMALIZATION PROCESS 135

2. Does there exist an inner product in R2 such that kxk = max{|x1 |, |x2 |}?
3. If k · k is a norm in V then d(x, y) = kx − yk, for x, y ∈ V, defines a distance function as
(a) d(x, x) = 0, for each x ∈ V.
(b) using the triangle inequality, for any z ∈ V, we have

d(x, y) = kx−yk = k (x − z)−(y − z) k ≤ k (x − z) k+k (y − z) k = d(x, z)+d(z, y).

5.2 Gram-Schmidt Orthonormalization Process


We start with the definition of an orthonormal set.

Definition 5.2.1. Let V be an ips. Then, a non-empty set S = {v1 , . . . , vn } ⊆ V is called an


orthogonal set if vi and vj are mutually orthogonal, for 1 ≤ i 6= j ≤ n, i.e.,

hui , uj i = 0, for 1 ≤ i < j ≤ n.

Further, if kvi k = 1, for 1 ≤ i ≤ n, Then S is called an orthonormal set. If S is also a


basis of V then S is called an orthonormal basis of V.

Example 5.2.2. 1. A few orthonormal sets in R2 are


 1 1  1 1
(1, 0)T , (0, 1)T , √ (1, 1)T , √ (1, −1)T and √ (2, 1)T , √ (1, −2)T .

2 2 5 5
T

2. Let S = {e1 , . . . , en } be the standard basis of Rn . Then, S is an orthonormal set as


AF

(a) kei k = 1, for 1 ≤ i ≤ n.


DR

(b) hei , ej i = 0, for 1 ≤ i 6= j ≤ n.


h iT h iT h iT 
3. The set 1 1
√ , −√ , √
3 3
1
3
1 1 2 1 1
, 0, √2 , √2 , √6 , √6 , − √6 is an orthonormal set in R3 .


4. Recall that hf (x), g(x)i = f (x)g(x)dx defines the standard inner product in C[−π, π].
−π
Consider S = {1} ∪ {em | m ≥ 1} ∪ {fn | n ≥ 1}, where 1(x) = 1, em (x) = cos(mx) and
fn (x) = sin(nx), for all m, n ≥ 1 and for all x ∈ [−π, π]. Then,
(a) S is a linearly independent set.
(b) k1k2 = 2π, kem k2 = π and kfn k2 = π.
(c) the functions in S are orthogonal.
     
1 1 1
Hence, √ ∪ √ em | m ≥ 1 ∪ √ fn | n ≥ 1 is an orthonormal set in C[−π, π].
2π π π
To proceed further, we consider a few examples for better understanding.

Example 5.2.3. Which point on the plane P is closest to the point, say Q?
Solution: Let y be the foot of the perpendicular from Q on P . Thus, by Pythagoras
Theorem, this point is unique. So, the question arises: how do we find y?
−→ →
− −→
Note that yQ gives a normal vector of the plane P . Hence, Q = y + yQ. So, need to


decompose Q into two vectors such that one of them lies on the plane P and the other is
orthogonal to the plane.
136 CHAPTER 5. INNER PRODUCT SPACES

P lane − P
0 y

Thus, we see that given u, v ∈ V \ {0}, we need to find two vectors, say y and z, such that
y is parallel to u and z is perpendicular to u. Thus, y = u cos(θ) and z = u sin(θ), where θ is
the angle between u and v.

R v P
⃗ =v− ⟨v,u⟩ ⃗ =
OQ ⟨v,u⟩
u
OR ∥u∥2
u ∥u∥2
u
Q
O θ

Figure 5.2: Decomposition of vector v


u
We do this as follows (see Figure 5.2). Let û = be the unit vector in the direction
kuk
~
of u. Then, using trigonometry, cos(θ) = kOQk ~ ~
~ k . Hence kOQk = kOP k cos(θ). Now using
k OP
~ hv,ui hv,ui
Definition 5.1.10, kOQk = kvk kvk kuk = kuk , where the absolute value is taken as the
length/norm is a positive quantity. Thus,
T
AF

 
~ = kOQk
~ û = u u
OQ v, .
kuk kuk
DR

   
~ u u u u ~
Hence, y = OQ = v, kuk and z = v − v, . In literature, the vector y = OQ
kuk kuk kuk
is called the orthogonal projection of v on u, denoted Proju (v). Thus,

hv, ui
 
u u ~
Proju (v) = v, and kProju (v)k = kOQk = . (5.2.1)
kuk kuk kuk

~ = kP~Qk =
Moreover, the distance of u from the point P equals kORk v − hv, u
i u
kuk kuk .

Example 5.2.4. 1. Determine the foot of the perpendicular from the point (1, 2, 3) on the
XY -plane.
Solution: Verify that the required point is (1, 2, 0)?
2. Determine the foot of the perpendicular from the point Q = (1, 2, 3, 4) on the plane
generated by (1, 1, 0, 0), (1, 0, 1, 0) and (0, 1, 1, 1).
Answer: (x, y, z, w) lies on the plane x−y −z +2w = 0 ⇔ h(1, −1, −1, 2), (x, y, z, w)i = 0.
So, the required point equals
1 1
(1, 2, 3, 4) − h(1, 2, 3, 4), √ (1, −1, −1, 2)i √ (1, −1, −1, 2)
7 7
4 1
= (1, 2, 3, 4) − (1, −1, −1, 2) = (3, 18, 25, 20).
7 7
5.2. GRAM-SCHMIDT ORTHONORMALIZATION PROCESS 137

3. Determine the projection of v = (1, 1, 1, 1)T on u = (1, 1, −1, 0)T .


u 1 T
Solution: By Equation (5.2.1), we have Projv (u) = hv, ui = 3 (1, 1, −1, 0) and
kuk2
w = (1, 1, 1, 1)T − Projv (u) = 31 (2, 2, 4, 3)T is orthogonal to u.
4. Let u = (1, 1, 1, 1)T , v = (1, 1, −1, 0)T , w = (1, 1, 0, −1)T ∈ R4 . Write v = v1 + v2 , where
v1 is parallel to u and v2 is orthogonal to u. Also, write w = w1 + w2 + w3 such that
w1 is parallel to u, w2 is parallel to v2 and w3 is orthogonal to both u and v2 .
Solution: Note that
u 1 1 T is parallel to u.
(a) v1 = Proju (v) = hv, ui kuk2 = 4 u = 4 (1, 1, 1, 1)

(b) v2 = v − 14 u = 41 (3, 3, −5, −1)T is orthogonal to u.

Note that Proju (w) is parallel to u and Projv2 (w) is parallel to v2 . Hence, we have
u 1 1 T is parallel to u,
(a) w1 = Proju (w) = hw, ui kuk2 = 4 u = 4 (1, 1, 1, 1)

(b) w2 = Projv2 (w) = hw, v2 i kvv22k2 = 7


44 (3, 3, −5, −1)
T is parallel to v2 and
(c) w3 = w − w1 − w2 = 3 T
11 (1, 1, 2, −4) is orthogonal to both u and v2 .

We now prove the most important initial result of this section.

Theorem 5.2.5. Let S = {u1 , . . . , un } be an orthonormal subset of an ips V(F).

1. Then, S is a linearly independent subset of V.


T

n
αi ui , for some αi ’s in F. Then,
P
2. Suppose v ∈ LS(S) with v =
AF

i=1
(a) αi = hv, ui i.
DR

n n
(b) kvk2 = k αi ui k2 = | αi |2 .
P P
i=1 i=1
n
3. Let z ∈ V and w =
P
hz, ui iui . Then, z = w + (z − w) with hz − w, wi = 0, i.e.,
i=1
z − w ∈ LS(S)⊥ . Further, kzk2 = kwk2 + kz − wk2 ≥ kwk2 .

4. Let dim(V) = n. Then, hv, ui i = 0 for all i = 1, 2, . . . , n if and only if v = 0.

Proof. Part 1: Consider the linear system c1 u1 + · · · + cn un = 0 in the variables c1 , . . . , cn . As


h0, ui = 0 and huj , ui i = 0, for all j 6= i, we have
n
X
0 = h0, ui i = hc1 u1 + · · · + cn un , ui i = cj huj , ui i = ci hui , ui i = ci .
j=1

Hence, ci = 0, for 1 ≤ i ≤ n. Thus, the above linear system has only the trivial solution. So,
the set S is linearly independent.
n
P Pn
Part 2: Note that hv, ui i = h αj uj , ui i = j=1 αj huj , ui i = αi hui , ui i = αi . This
j=1
completes the first sub-part. For the second sub-part, we have
n
* n n
+ n
* n
+
X X X X X
k αi ui k2 = α i ui , α i ui = αi ui , αj uj
i=1 i=1 i=1 i=1 j=1
n
X n
X n
X n
X
= αi αj hui , uj i = αi αi hui , ui i = | αi |2 .
i=1 j=1 i=1 i=1
138 CHAPTER 5. INNER PRODUCT SPACES

Part 3: Note that for 1 ≤ i ≤ n,


* n +
X
hz − w, ui i = hz, ui i − hw, ui i = hz, ui i − hz, uj iuj , ui
j=1
n
X
= hz, ui i − hz, uj ihuj , ui i = hz, ui i − hz, ui i = 0.
j=1

So, z − w ∈ LS(S)⊥ . Hence, hz − w, wi = 0 as w ∈ LS(S). Further,

kzk2 = kw + (z − w)k2 = kwk2 + kz − wk2 ≥ kwk2 .

Part 4: Follows directly using Part 2b as {u1 , . . . , un } is a basis of V.


A rephrasing of Theorem 5.2.5.2b gives a generalization of the pythagoras theorem, popularly
known as the Parseval’s formula. The proof is left as an exercise for the reader.

Theorem 5.2.6. Let V be a finite dimensional ips with an orthonormal basis {v1 , · · · , vn }.
Then, for each x, y ∈ V,
n
X
hx, yi = hx, vi ihy, vi i.
i=1
n
Furthermore, if x = y then kxk2 = | hx, vi i |2 (generalizing the Pythagoras Theorem).
P
T

i=1
AF

As a corollary to Theorem 5.2.5, we have the following result.


DR

Theorem 5.2.7 (Bessel’s Inequality). Let V be an ips with {v1 , · · · , vn } as an orthogonal set.
n n
X | hz, vk i |2 2
X hz, vk i
Then, 2
≤ kzk , for each z ∈ V. Equality holds if and only if z = vk .
kvk k kvk k2
k=1 k=1

vk
Proof. For 1 ≤ k ≤ n, define uk = and use Theorem 5.2.5.4 to get the required result.
kvk k
 
Remark 5.2.8. Using Theorem 5.2.5, we see that if B = v1 , . . . , vn is an ordered orthonormal
 
hu, v1 i
 . 
basis of an ips V then for each u ∈ V, [u]B =  . 
 . . Thus, in place of solving a linear
hu, vn i
system to get the coordinates of a vector, we just need to compute the inner product with basis
vectors.

Exercise 5.2.9. 1. Find v, w ∈ R3 such that v, w, (1, −1, −2)T are mutually orthogonal.
" " # " ## "" ## " x+y #
1 1 x √
2. Let B = √1
2
, √1
2
be an ordered basis of R2 . Then, = 2
x−y
.
1 −1 y √
B 2
    
   √ 
1 1 1 2 3
 −1 
 3 1, 2 −1, 6  1  of R , [(2, 3, 1) ]B =  2 .
T
 1   1   1   3
3. For the ordered basis B =  √   √   √   √ 
1 0 −2 √3
6
5.2. GRAM-SCHMIDT ORTHONORMALIZATION PROCESS 139

In view of the importance of Theorem 5.2.5, we inquire into the question of extracting an
orthonormal basis from a given basis. The process of extracting an orthonormal basis from a
finite linearly independent set is called the Gram-Schmidt Orthonormalization process.
We first consider a few examples. Note that Theorem 5.2.5 also gives us an algorithm for doing
so, i.e., from the given vector subtract all the orthogonal projections/components. If the new
vector is nonzero then this vector is orthogonal to the previous ones. The proof follows directly
from Theorem 5.2.5 but we give it again for the sake of completeness.

Theorem 5.2.10 (Gram-Schmidt Orthogonalization Process). Let V be an ips. If {v1 , . . . , vn }


is a set of linearly independent vectors in V then there exists an orthonormal set {w1 , . . . , wn }
in V. Furthermore, LS(w1 , . . . , wi ) = LS(v1 , . . . , vi ), for 1 ≤ i ≤ n.

Proof. Note that for orthonormality, we need kwi k = 1, for 1 ≤ i ≤ n and hwi , wj i = 0, for
1 ≤ i 6= j ≤ n. Also, by Corollary 3.2.8.2, vi ∈
/ LS(v1 , . . . , vi−1 ), for 2 ≤ i ≤ n, as {v1 , . . . , vn }
is a linearly independent set. We are now ready to prove the result by induction.
v1
Step 1: Define w1 = then LS(v1 ) = LS(w1 ).
kv1 k
u2
Step 2: Define u2 = v2 − hv2 , w1 iw1 . Then, u2 6= 0 as v2 6∈ LS(v1 ). So, let w2 = .
ku2 k
Note that {w1 , w2 } is orthonormal and LS(w1 , w2 ) = LS(v1 , v2 ).
Step 3: For induction, assume that we have obtained an orthonormal set {w1 , . . . , wk−1 } such
that LS(v1 , . . . , vk−1 ) = LS(w1 , . . . , wk−1 ). Now, note that
T

k−1 k−1
AF

P P
uk = v k − hvk , wi iwi = vk − Projwi (vk ) 6= 0 as vk ∈
/ LS(v1 , . . . , vk−1 ). So, let us put
i=1 i=1
uk
DR

wk = . Then, {w1 , . . . , wk } is orthonormal as kwk k = 1 and


kuk k
k−1
X k−1
X
kuk khwk , w1 i = huk , w1 i = hvk − hvk , wi iwi , w1 i = hvk , w1 i − h hvk , wi iwi , w1 i
i=1 i=1
k−1
X
= hvk , w1 i − hvk , wi ihwi , w1 i = hvk , w1 i − hvk , w1 i = 0.
i=1

Similarly, hwk , wi i = 0, for 2 ≤ i ≤ k − 1. Clearly, wk = uk /kuk k ∈ LS(w1 , . . . , wk−1 , vk ). So,


wk ∈ LS(v1 , . . . , vk ).
k−1
P
As vk = kuk kwk + hvk , wi iwi , we get vk ∈ LS(w1 , . . . , wk ). Hence, by the principle of
i=1
mathematical induction LS(w1 , . . . , wk ) = LS(v1 , . . . , vk ) and the required result follows.
We now illustrate the Gram-Schmidt process with a few examples.

Example 5.2.11. 1. Let S = {(1, −1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)} ⊆ R4 . Find an orthonor-
mal set T such that LS(S) = LS(T ).
Solution: Let v1 = (1, 0, 1, 0)T , v2 = (0, 1, 0, 1)T and v3 = (1, −1, 1, 1)T . Then,
w1 = √1 (1, 0, 1, 0)T . As hv2 , w1 i = 0, we get w2 = √12 (0, 1, 0, 1)T . For the third vec-
2
tor, let u3 = v3 − hv3 , w1 iw1 − hv3 , w2 iw2 = (0, −1, 0, 1)T . Thus, w3 = √1 (0, −1, 0, 1)T .
2
h iT h iT h iT h iT
2. Let S = {v1 = 2 0 0 , v2 = 32 2 0 , v3 = 12 3
2 0 , v4 = 1 1 1 }. Find
an orthonormal set T such that LS(S) = LS(T ).
140 CHAPTER 5. INNER PRODUCT SPACES

h iT
Solution: Take w1 = kvv11 k = 1 0 0 = e1 . For the second vector, consider u2 =
h iT h iT
v2 − 32 w1 = 0 2 0 . So, put w2 = kuu22 k = 0 1 0 = e2 .
2
hv3 , wi iwi = (0, 0, 0)T . So, v3 ∈ LS((w1 , w2 )). Or
P
For the third vector, let u3 = v3 −
i=1
equivalently, the set {v1 , v2 , v3 } is linearly dependent.
2
P
So, for again computing the third vector, define u4 = v4 − hv4 , wi iwi . Then, u4 =
i=1
v4 − w1 − w2 = e3 . So w4 = e3 . Hence, T = {w1 , w2 , w4 } = {e1 , e2 , e3 }.

3. Find an orthonormal set in R3 containing (1, 2, 1)T .


Solution: Let (x, y, z)T ∈ R3 with (1, 2, 1), (x, y, z) = 0. Thus,

(x, y, z) = (−2y − z, y, z) = y(−2, 1, 0) + z(−1, 0, 1).

Observe that (−2, 1, 0) and (−1, 0, 1) are orthogonal to (1, 2, 1) but are themselves not
orthogonal.
Method 1: Apply Gram-Schmidt process to { √16 (1, 2, 1)T , (−2, 1, 0)T , (−1, 0, 1)T } ⊆ R3 .
Method 2: Valid only in R3 using the cross product of two vectors.
−1
In either case, verify that { √16 (1, 2, 1), √ 5
(2, −1, 0), √−1
30
(1, 2, −5)} is the required set.
T

We now state two immediate corollaries without proof.


AF

Corollary 5.2.12. Let V 6= {0} be an ips. If


DR

1. V is finite dimensional then V has an orthonormal basis.


2. S is a non-empty orthonormal set and dim(V) is finite then S can be extended to form an
orthonormal basis of V.

Remark 5.2.13. Let S = {v1 , . . . , vn } 6= {0} be a non-empty subset of a finite dimensional


vector space V. If we apply Gram-Schmidt process to
1. S then we obtain an orthonormal basis of LS(v1 , . . . , vn ).
2. a re-arrangement of elements of S then we may obtain another orthonormal basis of
LS(v1 , . . . , vn ). But, observe that the size of the two bases will be the same.
Exercise 5.2.14. 1. Let V be an ips with B = {v1 , . . . , vn } as a basis. Then, prove that B
n
is orthonormal if and only if for each x ∈ V, x =
P
hx, vi ivi . [Hint: Since B is a basis,
i=1
each x ∈ V has a unique linear combination in terms of vi ’s.]
2. Let S be a subset of V having 101 elements. Suppose that the application of the Gram-
Schmidt process yields u5 = 0. Does it imply that LS(v1 , . . . , v5 ) = LS(v1 , . . . , v4 )? Give
reasons for your answer.
k
3. Let B = {v1 , . . . , vn } be an orthonormal set in Rn . For 1 ≤ k ≤ n, define Ak = vi viT .
P
i=1
Then, prove that ATk = Ak and A2k = Ak . Thus, Ak ’s are projection matrices.
4. Determine an orthonormal basis of R4 containing (1, −2, 1, 3)T and (2, 1, −3, 1)T .
5.2. GRAM-SCHMIDT ORTHONORMALIZATION PROCESS 141

5. Let x ∈ Rn with kxk = 1.

(a) Then, prove that {x} can be extended to form an orthonormal basis of Rn .
(b) Let the extended basis be {x,x2 , . . . , xn } and B = [e 1 , . . . , en ] the standard ordered
basis of Rn . Prove that A = [x]B , [x2 ]B , . . . , [xn ]B is an orthogonal matrix.

6. Let v, w ∈ Rn , n ≥ 1 with kuk = kwk = 1. Prove that there exists an orthogonal matrix
A such that Av = w. Prove also that A can be chosen such that det(A) = 1.
7. Let (V, h , i) be an n-dimensional ips. If u ∈ V with kuk = 1 then give reasons for the
following statements.

(a) Let S ⊥ = {v ∈ V | hv, ui = 0}. Then, dim(S ⊥ ) = n − 1.


(b) Let 0 6= β ∈ F. Then S = {v ∈ V : hv, ui = β} is not a subspace of V.
(c) Let v ∈ V. Then v = v0 + hv, uiu for a vector v0 ∈ S ⊥ . Thus V = LS(u, S ⊥ ).

5.2.1 QR Decomposition∗

The next result gives the proof of the QR decomposition for real matrices. The readers are
advised to prove similar results for matrices with complex entries. This decomposition and its
generalizations are helpful in the numerical calculations related with eigenvalue problems (see
T

Chapter 6).
AF

Theorem 5.2.1 (QR Decomposition). Let A ∈ Mn (R) be invertible. Then, there exist matrices
DR

Q and R such that Q is orthogonal and R is upper triangular with A = QR. Furthermore, if
det(A) 6= 0 then the diagonal entries of R can be chosen to be positive. Also, in this case, the
decomposition is unique.

Proof. As A is invertible, it’s columns form a basis of Rn . So, an application of the Gram-
Schmidt orthonormalization process to {A[:, 1], . . . , A[:, n]} gives an orthonormal basis {v1 , . . . , vn }
of Rn satisfying

LS(A[:, 1], . . . , A[:, i]) = LS(v1 , . . . , vi ), for 1 ≤ i ≤ n.

Since A[:, i] ∈ LS(v1 , . . . , vi ), for 1 ≤ i ≤ n, there existαji ∈ R, 1 ≤ j ≤ i,


 such that
  α11 α12 · · · α1n
α1i  
 .   0 α22 · · · α2n 
A[:, i] = [v1 , . . . , vi ] .. . Thus, if Q = [v1 , . . . , vn ] and R = 
 . .. ..

..  then
   .. . . . 
αii
 
0 0 ··· αnn
1. Q is an orthogonal matrix (see Exercise 5.4.8.5),
2. R is an upper triangular matrix, and
3. A = QR.

Thus, this completes the proof of the first part. Note that
1. αii 6= 0, for 1 ≤ i ≤ n, as A[:, 1] 6= 0 and A[:, i] ∈
/ LS(v1 , . . . , vi−1 ).
142 CHAPTER 5. INNER PRODUCT SPACES

2. if αii < 0, for some i, 1 ≤ i ≤ n then we can replace vi in Q by −vi to get a new Q ad R
in which the diagonal entries of R are positive.

Uniqueness: Suppose Q1 R1 = Q2 R2 for some orthogonal matrices Qi ’s and upper triangular


matrices Ri ’s with positive diagonal entries. As Qi ’s and Ri ’s are invertible, we get Q−1
2 Q1 =
R2 R1−1 . Now, using

1. Exercises 2.5.25.1, 1.2.14.1, the matrix R2 R1−1 is an upper triangular matrix.

2. Exercises 1.3.2.3, Q−1


2 Q1 is an orthogonal matrix.

So, the matrix R2 R1−1 is an orthogonal upper triangular matrix and hence, by Exercise 1.2.10.3,
R2 R1−1 = In . So, R2 = R1 and therefore Q2 = Q1 .
Let A be an n × k matrix with Rank(A) = r. Then, by Remark 5.2.13, an application
of the Gram-Schmidt orthonormalization process to columns of A yields an orthonormal set
{v1 , . . . , vr } ⊆ Rn such that

LS(A[:, 1], . . . , A[:, j]) = LS(v1 , . . . , vi ), for 1 ≤ i ≤ j ≤ k.

Hence, proceeding on the lines of the above theorem, we have the following result.

Theorem 5.2.2 (Generalized QR Decomposition). Let A be an n × k matrix of rank r. Then,


T

A = QR, where
AF
DR

1. Q = [v1 , . . . , vr ] is an n × r matrix with QT Q = Ir ,

2. LS(A[:, 1], . . . , A[:, j]) = LS(v1 , . . . , vi ), for 1 ≤ i ≤ j ≤ k and

3. R is an r × k matrix with Rank(R) = r.


 
1 0 1 2

1 −1 1

0
Example 5.2.3. 1. Let A = 
1
. Find an orthogonal matrix Q and an upper
 0 1 1 
0 1 1 1
triangular matrix R such that A = QR.
Solution: From Example 5.2.11, we know that w1 = √1 (1, 0, 1, 0)T , w2 = √1 (0, 1, 0, 1)T
2 2
and w3 = √1 (0, −1, 0, 1)T . We now compute w4 . If v4 = (2, 1, 1, 1)T then
2

1
u4 = v4 − hv4 , w1 iw1 − hv4 , w2 iw2 − hv4 , w3 iw3 = (1, 0, −1, 0)T .
2

Thus, w4 = √1 (−1, 0, 1, 0)T . Hence, we see that A = QR with


2   √ √
√1 √1 2 − √32

0 0 2 0
 2 2
√ √ 
1 −1
 0 √2 √2 0 

     0 2 0 − 2
Q= w1 , . . . , w4 =  1 −1
 and R =  √ .
√
 2 0 0 √2   0 0 2 0 
  
0 √ 1 √1
0 0 0 0 √1
2 2 2
5.2. GRAM-SCHMIDT ORTHONORMALIZATION PROCESS 143
 
1 1 1 0

−1 0 −2 1

2. Let A = 
 . Find a 4 × 3 matrix Q satisfying QT Q = I3 and an upper
 1 1 1 0

1 0 2 1
triangular matrix R such that A = QR.
Solution: Let us apply the Gram-Schmidt orthonormalization process to the columns of
A. As v1 = (1, −1, 1, 1)T , we get w1 = 21 v1 . Let v2 = (1, 0, 1, 0)T . Then,

1
u2 = v2 − hv2 , w1 iw1 = (1, 0, 1, 0)T − w1 = (1, 1, 1, −1)T .
2
Hence, w2 = 12 (1, 1, 1, −1)T . Let v3 = (1, −2, 1, 2)T . Then,

u3 = v3 − hv3 , w1 iw1 − hv3 , w2 iw2 = v3 − 3w1 + w2 = 0.

So, we again take v3 = (0, 1, 0, 1)T . Then,

u3 = v3 − hv3 , w1 iw1 − hv3 , w2 iw2 = v3 − 0w1 − 0w2 = v3 .

So, w3 = √1 (0, 1, 0, 1)T . Hence,


2

1 1
 
2 2 0  
 −1 1 2 1 3 0
√1 

 2 2
T

 
2  and R = 
Q = [v1 , v2 , v3 ] = 
 1 1
0 1 −1 0  .
0 √ 
AF

 
 2 2
1 −1 1
0 0 0 2

2 2
DR

The readers are advised to check the following:

(a) Rank(A) = 3,
(b) A = QR with QT Q = I3 , and
(c) R is a 3 × 4 upper triangular matrix with Rank(R) = 3.

Remark 5.2.4. Let A ∈ Mm,n (R).


 
hv1 , A[:, 1]i hv1 , A[:, 2]i hv1 , A[:, 3]i ···
 
 0 hv2 , A[:, 2]i hv2 , A[:, 3]i · · ·
1. If A = QR with Q = [v1 , . . . , vn ] then R =  .
 
 0 0 hv3 , A[:, 3]i · · ·
.. .. ..
 
. . .
In case Rank(A) < n then a slight modification gives the matrix R.
2. Further, let m = n and Rank(A) = n .
(a) Then, AT A is invertible (see Exercise 3.4.17.4).
(b) By Theorem 5.2.2, A = QR with Q a matrix of size m × n and R an upper triangular
matrix of size n × n. Also, QT Q = In and Rank(R) = n.
(c) Thus, AT A = RT QT QR = RT R. As AT A is invertible, the matrix RT R is invertible.
Since R is a square matrix, by Exercise 2.5.5.1, the matrix R itself is invertible.
Hence, (RT R)−1 = R−1 (RT )−1 .
144 CHAPTER 5. INNER PRODUCT SPACES

(d) So, if Q = [v1 , . . . , vn ] then

A(AT A)−1 AT = QR(RT R)−1 RT QT = (QR)(R−1 (RT )−1 )RT QT = QQT .

(e) Hence, using Theorem 5.3.7, we see that the matrix


 
v1T n
..  X
P = A(AT A)−1 AT = QQT = [v1 , . . . , vn ] vi viT

=
 . 
i=1
vnT

is the orthogonal projection matrix on Col(A).

3. Further, let Rank(A) = r < n. If j1 , . . . , jr are the pivot columns of A then Col(A) =
Col(B), where B = [A[:, j1 ], . . . , A[:, jr ]] is an m × r matrix with Rank(B) = r. So, using
Part 2e we see that B(B T B)−1 B T is the orthogonal projection matrix on Col(A). So,
compute RREF of A and choose columns of A corresponding to the pivot columns.

5.3 Orthogonal Projections and Applications


Till now, our main interest was to understand the linear system Ax = b, for A ∈ Mm,n (C), x ∈
Cn and b ∈ Cm , from different view points. But, in most practical situations the system has
no solution. So, we are interested in finding a point x0 ∈ Rn such that the err = b − Ax0 is
T
AF

the least. Thus, we consider the problem of finding x0 ∈ Rn such that


DR

kerrk = kb − Ax0 k = min{kb − Axk : x ∈ Rn },

i.e., we try to find the vector x0 ∈ Rn which is nearest to b.


To begin with, recall the following result from Page 135.

Theorem 5.3.1 (Decomposition). Let V be an ips having W as a finite dimensional subspace.


k
Suppose {f1 , . . . , fk } is an orthonormal basis of W. Then, for each b ∈ V, y =
P
hb, fi ifi is the
i=1
closest point in W from b. Furthermore, b − y ∈ W⊥ .

We now give a definition and then an implication of Theorem 5.3.1.

Definition 5.3.2. Let W be a finite dimensional subspace of an ips V. Then, by Theorem 5.3.1,
for each v ∈ V there exist unique vectors w ∈ W and u ∈ W⊥ with v = w + u. We thus define
the orthogonal projection of V onto W, denoted PW , by

PW : V → W by PW (v) = w.

The vector w is called the projection of v on W.

Remark 5.3.3. Let A ∈ Mm,n (R) and W = Col(A). Then, to find the orthogonal projection
PW (b), we can use either of the following ideas:
k
P
1. Determine an orthonormal basis {f1 , . . . , fk } of Col(A) and get PW (b) = hb, fi ifi .
i=1
5.3. ORTHOGONAL PROJECTIONS AND APPLICATIONS 145

2. By Theorem 3.4.13.2, Col(A) = Null(AT )⊥ . Hence, for b ∈ Rm there exists unique


u ∈ Col(A) and v ∈ Null(AT ) such that b = u + v. Thus, using Definition 5.3.2 and
Theorem 5.3.1, PW (b) = u.

Before proceeding to projections, we give an application of Theorem 5.3.1 to a linear system.

Corollary 5.3.4. Let A ∈ Mm,n (R) and b ∈ Rm . Then, every least square solution of Ax = b
is a solution of the system AT Ax = AT b. Conversely, every solution of AT Ax = AT b is a least
square solution of Ax = b.

Proof. As b ∈ Rm , by Remark 5.3.3, there exists y ∈ Col(A) and v ∈ Null(AT ) such that
b = y + v and min{kb − wk | w ∈ Col(A)} = kb − yk. As y ∈ Col(A), there exists x0 ∈ Rn
such that Ax0 = y, i.e., x0 is the least square solution of Ax = b. Hence,

(AT A)x0 = AT (Ax0 ) = AT y = AT (b − v) = AT b − 0 = AT b.

Conversely, let x1 ∈ Rn be a solution of AT Ax = AT b, i.e., AT (Ax1 − b) = 0. To show

min{kb − Axk | x ∈ Rn } = kb − Ax1 k.

Note that AT (Ax1 − b) = 0 implies

0 = (x − x1 )T AT (Ax1 − b) = (Ax − Ax1 )T (Ax1 − b) = hAx1 − b, Ax − Ax1 i.


T
AF

Thus, the vectors b − Ax1 and Ax1 − Ax are orthogonal and hence
DR

kb − Axk2 = kb − Ax1 + Ax1 − Axk2 = kb − Ax1 k2 + kAx1 − Axk2 ≥ kb − Ax1 k2 .

Hence, the required result follows.


The above corollary gives the following result.

Corollary 5.3.5. Let A ∈ Mm,n (R) and b ∈ Rm . If


1. AT A is invertible then the least square solution of Ax = b equals x = (AT A)−1 AT b.
2. AT A is not invertible then the least square solution of Ax = b equals x = (AT A)− AT b,
where (AT A)− is the pseudo-inverse of AT A.

Proof. Part 1 directly follows from Corollary 5.3.5. For Part 2, let b = y + v, for y ∈ Col(A)
and v ∈ Null(AT ). As y ∈ Col(A), there exists x0 ∈ Rn such that Ax0 = y. Thus, by
Remark 5.3.3, AT b = AT (y + v) = AT y = AT Ax0 . Now, using the definition of pseudo-inverse
(see Exercise 1.3.7.13), we see that

(AA A) (AT A)− AT b = (AT A)(AT A)− (AT A)x0 = (AT A)x0 = AT b.


Thus, we see that (AT A)− AT b is a solution of the system AT Ax = AT b. Hence, by Corol-
lary 5.3.4, the required result follows.
We now give a few examples to understand projections.

Example 5.3.6. Use the fundamental theorem of linear algebra to compute the vector of the
orthogonal projection.
146 CHAPTER 5. INNER PRODUCT SPACES

1. Determine the projection of (1, 1, 1, 1, 1)T on Null ([1, −1, 1, −1, 1]).
Solution: Here A = [1, −1, 1, −1, 1]. So, a basis of Col(AT ) equals {(1, −1, 1, −1, 1)T }
and that of Null(A) equals {(1, 1, 0, 0, 0)T , (1, 0, −1, 0, 0)T , (1, 0, 0, 1, 0)T , (1, 0, 0, 0, −1)T }.
Note that Null(A) and Col(AT ) are orthogonal and hence (1, 1, 1, 1, 1)T = y + z, where
y ∈ Null(A) and z ∈ Col(AT ) = LS([1, −1, 1, −1, 1]T ).
     
1 1 1 1 1 1 6
     
1 0 0 0 −1 1 −4
    1 
So, taking B = 0 −1 0 0 1  and solving Bx = 1 gives x =  6 .
     
    5 
0 0 1 0 −1 1 −4
     
0 0 0 −1 1 1 1
1
Thus, z = x5 (1, −1, 1, −1, 1)T = (1, −1, 1, −1, 1)T and the projection vector y equals
5  
1 1 1 1  
  6/5
1 0 0 0 
1  −4/5
T T

(1, 1, 1, 1, 1) − z = (4, 6, 4, 6, 4) , which is also equal to 0 −1 0 0  .
 
5   6/5 

0 0 1 0  
  −4/5
0 0 0 −1
2. Determine the projection of (1, 1, 1)T on Null ([1, 1, −1]).
Solution: Here A = [1, 1, −1]. So, a basis of Null(A) equals {(1, −1, 0)T , (1, 0, 1)T } and
T

that of 
Col(A T ) equals {(1, 1, −1)T }. Then, the solution of the linear system
AF

    
1 1 1 1 −2
    1  
DR

Bx =  1, where B = −1 0 1  equals x = 3  4 . Thus, the projection is


    
1 0 1 −1 1
1 2
(−2)(1, −1, 0)T + 4(1, 0, 1)T = (1, 1, 2)T .

3 3
3. Determine the projection of (1, 1, 1)T on Col [1, 2, 1]T .


Solution: Here, AT = [1, 2, 1], a basis of Col(A) equals {(1, 2, 1)T } and that of Null(AT )
equals {(1, T T
 0, −1) , (2, −1,
 0) }. Then,
 using the solution of the linear system
1 1 2 1
    2 T
Bx = 1, where B =  0 −1 2
  
 gives 3 (1, 2, 1) as the required vector.
1 −1 0 1

To use the first idea in Remark 5.3.3, we prove the following result which helps us to get
the matrix of the orthogonal projection from an orthonormal basis.

Theorem 5.3.7. Let {f1 , . . . , fk } be an orthonormal basis of a finite dimensional subspace W


k
of an ips V. Then PW = fi fi∗ .
P
i=1

Proof. Let v ∈ V. Then,


k k k
!
X X X
PW v = fi fi∗ v= fi (fi∗ v) = hv, fi ifi .
i=1 i=1 i=1

As PW v is the only closet point (see Theorem 5.3.1), the required result follows.
5.3. ORTHOGONAL PROJECTIONS AND APPLICATIONS 147

Example 5.3.8. In each of the following, determine the matrix of the orthogonal projection.
Also, verify that PW + PW⊥ = I. What can you say about Rank(PW⊥ ) and Rank(PW )? Also,
verify the orthogonal projection vectors obtained in Example 5.3.6.
1. W = {(x1 , . . . , x5 )T ∈ R5 | x1 − x2 + x3 −x4 +x5= 0}=Null  ([1,−1, 1, −1,
 1]).



   1 0 1 −2 


      
1 0 −1 2

        


       1   
Solution: An orthonormal basis of W is √12 0, √12 1, √16  0 , √  3  . Thus,
       
       30   


 0 1 0 −3

        

−2 −2 
 
 0 0
   
4 1 −1 1 −1 1 −1 1 −1 1
   
 1 4 1 −1 1 −1 1 −1 1 −1
4 1  1 
fi fiT = −1
P
PW = 1 4 1 −1 and P =  1 −1 1 −1 1.
 
W⊥
i=1 5   5  
 1 −1 1 4 1 −1 1 −1 1 −1 
   
−1 1 −1 1 4 1 −1 1 −1 1
2. W = {(x, y, z)T ∈ R3 | x + y − z = 0} = Null ([1, 1, −1]).
 
⊥ 1 1
Solution: Note {(1, 1, −1)} is a basis of W and √ (1, −1, 0), √ (1, 1, 2) an or-
2 6
thonormal basis of W. So,
   
1 1 −1 2 −1 1
1 1
T

 
PW⊥ =  1 1 −1 and PW =  −1 2 1 .
3 3
AF

  
−1 −1 1 1 1 2
DR

Verify that PW + PW⊥ = I3 , Rank(PW⊥ ) = 2 and Rank(PW ) = 1.


3. W = LS( (1, 2, 1) ) = Col [1, 2, 1]T ⊆ R3 .


Solution: Using Example 5.2.11.3 and Equation (5.2.1)

W⊥ = LS({(−2, 1, 0), (−1, 0, 1)}) = LS({(−2, 1, 0), (1, 2, −5)}).


   
1 2 1 5 −2 −1
= 61  1
   
So, PW 2 4 2 and PW⊥ = 6 −2 2 −2.
 
1 2 1 −1 −2 5

We advise the readers to give a proof of the next result.

Theorem 5.3.9. Let {f1 , . . . , fk } be an orthonormal basis of a subspace W of Rn . If {f1 , . . . , fn }


k n
is an extended orthonormal basis of Rn , PW = fi fiT and PW⊥ = fi fiT then prove that
P P
i=1 i=k+1
1. In − PW = PW⊥ .
2. (PW )T = PW and (PW⊥ )T = PW⊥ . That is, PW and PW⊥ are symmetric.
3. (PW )2 = PW and (PW⊥ )2 = PW⊥ . That is, PW and PW⊥ are idempotent.
4. PW ◦ PW⊥ = PW⊥ ◦ PW = 0.
Exercise 5.3.10. 1. Let W = {(x, y, z, w) ∈ R4 : x = y, z = w} be a subspace of R4 .
Determine the matrix of the orthogonal projection.
148 CHAPTER 5. INNER PRODUCT SPACES

2. Let PW1 and PW2 be the orthogonal projections of R2 onto W1 = {(x, 0) : x ∈ R} and
W2 = {(x, x) : x ∈ R}, respectively. Note that PW1 ◦ PW2 is a projection onto W1 . But,
it is not an orthogonal projection. Hence or otherwise, conclude that the composition of
two orthogonal projections need not be an orthogonal projection?
" #
1 1
3. Let A = . Then, A is idempotent but not symmetric. Now, define P : R2 → R2 by
0 0
P (v) = Av, for all v ∈ R2 . Then,

(a) P is idempotent.
(b) Null(P ) ∩ Rng(P ) = Null(A) ∩ Col(A) = {0}.
(c) R2 = Null(P ) + Rng(P ). But, (Rng(P ))⊥ = (Col(A))⊥ 6= Null(A).
(d) Since (Col(A))⊥ 6= Null(A), the map P is not an orthogonal projector. In this
case, P is called a projection of R2 onto Rng(P ) along Null(P ).

4. Find all 2 × 2 real matrices A such that A2 = A. Hence, or otherwise, determine all
projection operators of R2 .
5. Let W be an (n − 1)-dimensional subspace of Rn with ordered basis BW = [f1 , . . . , fn−1 ].
Suppose B = [f1 , . . . , fn−1 , fn ] is an orthogonal ordered basis of Rn obtained by extending
n−1
BW . Now, define a function Q : Rn → Rn by Q(v) = hv, fn ifn −
P
hv, fi ifi . Then,
i=1
T
AF

(a) Q fixes every vector in W⊥ .


(b) Q sends every vector w ∈ W to −w.
DR

(c) Q ◦ Q = In .

The function Q is called the reflection operator with respect to W⊥ .

5.3.1 Orthogonal Projections as Self-Adjoint Operators*

Theorem 5.3.9 implies that the matrix of the projection operator is symmetric. We use this
idea to proceed further.

Definition 5.3.11. Let V be an ips with inner product h , i. A linear operator P : V → V is


called self-adjoint if hP (v), ui = hv, P (u)i, for every u, v ∈ V.

A careful understanding of the examples given below shows that self-adjoint operators and
Hermitian matrices are related. It also shows that the vector spaces Cn and Rn can be decom-
posed in terms of the null space and column space of Hermitian matrices. They also follow
directly from the fundamental theorem of linear algebra.

Example 5.3.12. 1. Let A be an n × n real symmetric matrix. If P : Rn → Rn is defined


by P (x) = Ax, for every x ∈ Rn then

(a) P is a self adjoint operator as A = AT , for every x, y ∈ Rn , implies

hP (x), yi = (yT )Ax = (yT )AT x = (Ay)T x = hx, Ayi = hx, P (y)i.
5.3. ORTHOGONAL PROJECTIONS AND APPLICATIONS 149

(b) Null(P ) = (Rng(P ))⊥ as A = AT . Thus, Rn = Null(P ) ⊕ Rng(P ).

2. Let A be an n × n Hermitian matrix. If P : Cn → Cn is defined by P (z) = Az, for all


z ∈ Cn then using similar arguments (see Example 5.3.12.1) prove the following:

(a) P is a self-adjoint operator.


(b) Null(P ) = (Rng(P ))⊥ as A = A∗ . Thus, Cn = Null(P ) ⊕ Rng(P ).

We now state and prove the main result related with orthogonal projection operators.

Theorem 5.3.13. Let V be a finite dimensional ips. If V = W ⊕ W⊥ then the orthogonal


projectors PW : V → V on W and PW⊥ : V → V on W⊥ satisfy

1. Null(PW ) = {v ∈ V : PW (v) = 0} = W⊥ = Rng(PW⊥ ).

2. Rng(PW ) = {PW (v) : v ∈ V} = W = Null(PW⊥ ).

3. PW ◦ PW = PW , PW⊥ ◦ PW⊥ = PW⊥ (Idempotent).

4. PW⊥ ◦ PW = 0V and PW ◦ PW⊥ = 0V , where 0V (v) = 0, for all v ∈ V

5. PW + PW⊥ = IV , where IV (v) = v, for all v ∈ V.

6. The operators PW and PW⊥ are self-adjoint.


T
AF

Proof. Part 1: As V = W⊕W⊥ , for each u ∈ W⊥ , one uniquely writes u = 0+u, where 0 ∈ W
and u ∈ W⊥ . Hence, by definition, PW (u) = 0 and PW⊥ (u) = u. Thus, W⊥ ⊆ Null(PW ) and
DR

W⊥ ⊆ Rng(PW⊥ ).
Now suppose that v ∈ Null(PW ). So, PW (v) = 0. As V = W ⊕ W⊥ , v = w + u, for unique
w ∈ W and unique u ∈ W⊥ . So, by definition, PW (v) = w. Thus, w = PW (v) = 0. That is,
v = 0 + u = u ∈ W⊥ . Thus, Null(PW ) ⊆ W⊥ .
A similar argument implies Rng(PW⊥ ) ⊆ W ⊥ and thus completing the proof of the first
part.
Part 2: Use an argument similar to the proof of Part 1.
Part 3, Part 4 and Part 5: Let v ∈ V. Then, v = w + u, for unique w ∈ W and unique
u ∈ W⊥ . Thus, by definition,

(PW ◦ PW )(v) = PW PW (v) = PW (w) = w and PW (v) = w

(PW⊥ ◦ PW )(v) = PW⊥ PW (v) = PW⊥ (w) = 0 and
(PW ⊕ PW⊥ )(v) = PW (v) + PW⊥ (v) = w + u = v = IV (v).

Hence, PW ◦ PW = PW , PW⊥ ◦ PW = 0V and IV = PW ⊕ PW⊥ .


Part 6: Let u = w1 + x1 and v = w2 + x2 , for unique w1 , w2 ∈ W and unique x1 , x2 ∈ W⊥ .
Then, by definition, hwi , xj i = 0 for 1 ≤ i, j ≤ 2. Thus,

hPW (u), vi = hw1 , vi = hw1 , w2 i = hu, w2 i = hu, PW (v)i

and the proof of the theorem is complete.


150 CHAPTER 5. INNER PRODUCT SPACES

Remark 5.3.14. Theorem 5.3.13 gives us the following:

1. The orthogonal projectors PW and PW⊥ are idempotent and self-adjoint.

2. Let v ∈ V. Then, v −PW (v) = (IV −PW )(v) = PW⊥ (v) ∈ W⊥ . Thus, hv −PW (v), wi = 0,
for every v ∈ V and w ∈ W.

3. As PW (v) − w ∈ W, for each v ∈ V and w ∈ W, we have

kv − wk2 = kv − PW (v) + PW (v) − wk2


= kv − PW (v)k2 + kPW (v) − wk2 + 2hv − PW (v), PW (v) − wi
= kv − PW (v)k2 + kPW (v) − wk2 .

Therefore, kv − wk ≥ kv − PW (v)k and equality holds if and only if w = PW (v). Since


PW (v) ∈ W, we see that

d(v, W) = inf {kv − wk : w ∈ W } = kv − PW (v)k.

That is, PW (v) is the vector nearest to v ∈ W. This can also be stated as: the vector
PW (v) solves the following minimization problem:

inf kv − wk = kv − PW (v)k.
w∈W
T
AF

The next theorem is a generalization of Theorem 5.3.13. We omit the proof as the arguments
DR

are similar and uses the following:


Let V be a finite dimensional ips with V = W1 ⊕ · · · ⊕ Wk , for certain subspaces Wi ’s of V.
Then, for each v ∈ V there exist unique vectors v1 , . . . , vk such that

1. vi ∈ Wi , for 1 ≤ i ≤ k,

2. hvi , vj i = 0 for each vi ∈ Wi , vj ∈ Wj , 1 ≤ i 6= j ≤ k and

3. v = v1 + · · · + vk .

Theorem 5.3.15. Let V be a finite dimensional ips with subspaces W1 , . . . , Wk of V such that
V = W1 ⊕ · · · ⊕ Wk . Then, for each i, j, 1 ≤ i 6= j ≤ k, there exist orthogonal projectors
PWi : V → V of V onto Wi satisfying the following:

1. Null(PWi ) = Wi⊥ = W1 ⊕ W2 ⊕ · · · ⊕ Wi−1 ⊕ Wi+1 ⊕ · · · ⊕ Wk .

2. Rng(PWi ) = Wi .

3. PWi ◦ PWi = PWi .

4. PWi ◦ PWj = 0V .

5. PWi is a self-adjoint operator, and

6. IV = PW1 ⊕ PW2 ⊕ · · · ⊕ PWk .


5.4. ORTHOGONAL OPERATOR AND RIGID MOTION∗ 151

5.4 Orthogonal Operator and Rigid Motion∗


We now give the definition and a few properties of an orthogonal operator.

Definition 5.4.1. Let V be a vector space. Then, a linear operator T : V → V is said to be an


orthogonal operator if kT (x)k = kxk, for all x ∈ V.

Example 5.4.2. Each T ∈ L(V) given below is an orthogonal operator.


1. Fix a unit vector a ∈ V and define T (x) = 2hx, aia − x, for all x ∈ V.


Solution: Note that Proja (x) = hx, aia. So, hx, aia, x − hx, aia = 0. Also, by
Pythagoras theorem kx − hx, aiak2 = kxk2 − (hx, ai)2 . Thus,

kT (x)k2 = k(hx, aia) + (hx, aia − x)k2 = khx, aiak2 + kx − hx, aiak2 = kxk2 .
" #" #
cos θ − sin θ x
2. Let n = 2, V = R2 and 0 ≤ θ < 2π. Now define T (x) = .
sin θ cos θ y

We now show that an operator is orthogonal if and only if it preserves the angle.

Theorem 5.4.3. Let T ∈ L(V). Then, the following statements are equivalent.
1. T is an orthogonal operator.
2. hT (x), T (y)i = hx, yi, for all x, y ∈ V. That is, T preserves inner product.
T
AF

Proof. 1 ⇒ 2 Let T be an orthogonal operator. Then, kT (x + y)k2 = kx + yk2 . So,


DR

kT (x)k2 + kT (y)k2 + 2hT (x), T (y)i = kT (x) + T (y)k2 = kT (x + y)k2 = kxk2 + kyk2 + 2hx, yi.
Thus, using definition again hT (x), T (y)i = hx, yi.
2 ⇒ 1 If hT (x), T (y)i = hx, yi, for all x, y ∈ V then T is an orthogonal operator as
kT (x)k2 = hT (x), T (x)i = hx, xi = kxk2 .
As an immediate corollary, we obtain the following result.

Corollary 5.4.4. Let T ∈ L(V). Then, T is an orthogonal operator if and only if “for every
orthonormal basis {u1 , . . . , un } of V, {T (u1 ), . . . , T (un )} is an orthonormal basis of V”. Thus,
if B is an orthonormal ordered basis of V then T [B, B] is an orthogonal matrix.

Definition 5.4.5. Let V be a vector space. Then, a map T : V → V is said to be an isometry


or a rigid motion if kT (x) − T (y)k = kx − yk, for all x, y ∈ V. That is, an isometry is
distance preserving.

Observe that if T and S are two rigid motions then ST is also a rigid motion. Furthermore,
it is clear from the definition that every rigid motion is invertible.

Example 5.4.6. The maps given below are rigid motions/isometry.


1. Let V be a linear space with norm k · k. If a ∈ V then the translation map Ta : V → V
(see Exercise 7), defined by Ta (x) = x + a for all x ∈ V, is an isometry/rigid motion as

kTa (x) − Ta (y)k = k (x + a) − (y + a) k = kx − yk.


152 CHAPTER 5. INNER PRODUCT SPACES

2. Let V be an ips. Then, using Theorem 5.4.3, we see that every orthogonal operator is an
isometry.

We now prove that every rigid motion that fixes origin is an orthogonal operator.

Theorem 5.4.7. Let V be a real ips. Then, the following statements are equivalent for any
map T : V → V.
1. T is a rigid motion that fixes origin.
2. T is linear and hT (x), T (y)i = hx, yi, for all x, y ∈ V (preserves inner product).
3. T is an orthogonal operator.

Proof. We have already seen the equivalence of Part 2 and Part 3 in Theorem 5.4.3. Let us now
prove the equivalence of Part 1 and Part 2/Part 3.
If T is an orthogonal operator then T (0) = 0 and kT (x) − T (y)k = kT (x − y)k = kx − yk.
This proves Part 3 implies Part 1.
We now prove Part 1 implies Part 2. So, let T be a rigid motion that fixes 0. Thus,
T (0) = 0 and kT (x) − T (y)k = kx − yk, for all x, y ∈ V. Hence, in particular for y = 0, we
have kT (x)k = kxk, for all x ∈ V. So,
T

kT (x)k2 + kT (y)k2 − 2hT (x), T (y)i = hT (x) − T (y), T (x) − T (y)i = kT (x) − T (y)k2
AF

= kx − yk2 = hx − y, x − yi
DR

= kxk2 + kyk2 − 2hx, yi.

Thus, using kT (x)k = kxk, for all x ∈ V, we get hT (x), T (y)i = hx, yi, for all x, y ∈ V. Now,
to prove T is linear, we use hT (x), T (y)i = hx, yi in 3-rd and 4-th line to get

kT (x + y) − (T (x) + T (y)) k2 = hT (x + y) − (T (x) + T (y)) , T (x + y) − (T (x) + T (y))i


= hT (x + y), T (x + y)i − 2 hT (x + y), T (x)i
−2 hT (x + y), T (y)i + hT (x) + T (y), T (x) + T (y)i
= hx + y, x + yi − 2hx + y, xi − 2hx + y, yi
+hT (x), T (x)i + 2hT (x), T (y)i + hT (y), T (y)i
= −hx + y, x + yi + hx, xi + 2hx, yi + hy, yi = 0.

Thus, T (x+y)−(T (x) + T (y)) = 0 and hence T (x+y) = T (x)+T (y). A similar calculation
gives T (αx) = αT (x) and hence T is linear.
Exercise 5.4.8. 1. Let A, B ∈ Mn (C). Then, A and B are said to be
(a) Orthogonally Congruent if B = S T AS, for some invertible matrix S.
(b) Unitarily Congruent if B = S ∗ AS, for some invertible matrix S.

Prove that Orthogonal and Unitary congruences are equivalence relations on Mn (R) and
Mn (C), respectively.
5.4. ORTHOGONAL OPERATOR AND RIGID MOTION∗ 153

2. Let x ∈ C2 . Identify it with the complex number x = x1 + ix2 . If we rotate x by a


counterclockwise rotation θ, 0 ≤ θ < 2π then, we have

xeiθ = (x1 + ix2 ) (cos θ + i sin θ) = x1 cos θ − x2 sin θ + i[x1 sin θ + x2 cos θ].

Thus, the corresponding vector in R2 is


" # " #" #
x1 cos θ − x2 sin θ cos θ − sin θ x1
= .
x1 sin θ + x2 cos θ sin θ cos θ x2
" #
cos θ − sin θ
Is the matrix, , the matrix of the corresponding rotation? Justify.
sin θ cos θ
" #
cos θ sin θ
3. Let A ∈ M2 (R) and T (θ) = , for θ ∈ R. Then, A is an orthogonal matrix
− sin θ cos θ
" #
0 1
if and only if A = T (θ) or A = T (θ), for some θ ∈ R.
1 0
" #
a b
Ans: To see this assume that A = is orthogonal. Thus a2 + b2 = c2 + d2 = 1 and
c d
ac + bd = ab + cd = 0. Note that (b − c)(d − a) = ac + bd − ab − cd = 0 and so either b = c
or a = d.
" #
0 1
T

Without loss we assume a = d, otherwise we consider A. If a =6 0, then from


AF

1 0
" #
cos θ ± sin θ
DR

0 = ac + bd = a(c + b), we get c = −b. So A = . If a = d = 0,


∓ sin θ cos θ
then b, c ∈ {−1, 1}. In both the cases A = T (θ), for some θ.
4. Let A ∈ Mn (C). Then, the following statements are equivalent.

(a) A is an orthogonal matrix.


(b) A−1 = AT .
(c) AT is orthogonal.
(d) the columns of A form an orthonormal basis of the real vector space Rn .
(e) the rows of A form an orthonormal basis of the real vector space Rn .
(f ) for any two vectors x, y ∈ Cn , hAx, Ayi = hx, yi Orthogonal matrices preserve
angle.
(g) for any vector x ∈ Cn , kAxk = kxk Orthogonal matrices preserve length.

5. Let U be an n × n matrix. Then, prove that the following statements are equivalent.

(a) U is a unitary matrix.


(b) U −1 = U ∗ .
(c) U ∗ is unitary.
(d) the columns of U form an orthonormal basis of the complex vector space Cn .
154 CHAPTER 5. INNER PRODUCT SPACES

(e) the rows of U form an orthonormal basis of the complex vector space Cn .
(f ) for any two vectors x, y ∈ Cn , hU x, U yi = hx, yi Unitary matrices preserve
angle.
(g) for any vector x ∈ Cn , kU xk = kxk Unitary matrices preserve length.

Ans: Part 5a⇔ Part 5g. If U is unitary, then kxk2 = x∗ x = x∗ U ∗ U x = kU xk2 . Conversely,
we have
hU ∗ U x, xi = hU x, U xi = kU xk2 = kxk2 = hx, xi, for all x.

That is h(U ∗ U − I)x, xi = 0, for all x. Put B = U ∗ U − I. Now, taking x = ei , we see that
B(i, i) = 0. For i 6= j, taking x = ei + ej , we get

x∗ Bx = B(i, i) + B(i, j) + B(j, i) + B(j, j) = 0,

so that B(i, j) + B(j, i) = 0. Taking x = ei + iej (here i2 = −1), we get

x∗ Bx = B(i, i) + iB(i, j) − iB(j, i) + B(j, j) = 0,

so that B(i, j) − B(j, i) = 0. Thus B = 0 and so U ∗ U = I. The rest is exercise.


6. Let A be an n × n orthogonal matrix. Then, prove that det(A) = ±1.
7. Let A be an n × n upper triangular matrix. If A is also an orthogonal matrix then A is a
diagonal matrix with diagonal entries ±1.
T
AF

8. Prove that in M5 (R), there are infinitely many orthogonal matrices of which only finitely
many are diagonal (in fact, there number is just 32).
DR

9. Prove that permutation matrices are real orthogonal.


10. Let A, B ∈ Mn (C) be two unitary matrices. Then, prove that AB and BA are unitary
matrices.
|aij |2 = |bij |2 .
P P
11. If A = [aij ] and B = [bij ] are unitarily equivalent then prove that
ij ij
Ans: Notice that
X X X X
|bij |2 = kB[:, i]k2 = k(U B)[:, i]k2 = k(AU )[:, i]k2 (5.4.1)
i i i
X X X
= k(AU )[i, :]k2 = kA[i, :]k2 = |aij |2 . (5.4.2)
i i

We have used that U gives an isometry.

Alternate. An alternate proof for the above is the following:


X X
|aij |2 = tr(A∗ A) = tr(UB∗ U∗ UBU∗ ) = tr(U[B∗ BU∗ ]) = tr([B∗ BU∗ ]U) = tr(B∗ B) = |bij |2 .

12. Let U be a unitary matrix and for every x ∈ Cn , define

kxk1 = max{|xi | : xT = [x1 , . . . , xn ]}.

Then, is it necessary that kU xk1 = kxk1 ?


Ans: No. You may use rotation matrices to see this.
5.5. SUMMARY 155

5.5 Summary
In the previous chapter, we learnt that if V is vector space over F with dim(V) = n then V
basically looks like Fn . Also, any subspace of Fn is either Col(A) or Null(A) or both, for some
matrix A with entries from F.
So, we started this chapter with inner product, a generalization of the dot product in R3
or Rn . We used the inner product to define the length/norm of a vector. The norm has the
property that “the norm of a vector is zero if and only if the vector itself is the zero vector”.
We then proved the Cauchy-Bunyakovskii-Schwartz Inequality which helped us in defining the
angle between two vector. Thus, one can talk of geometrical problems in Rn and proved some
geometrical results.
We then independently defined the notion of a norm in Rn and showed that a norm is
induced by an inner product if and only if the norm satisfies the parallelogram law (sum of
squares of the diagonal equals twice the sum of square of the two non-parallel sides).
The next subsection dealt with the fundamental theorem of linear algebra where we showed
that if A ∈ Mm,n (C) then
1. dim(Null(A)) + dim(Col(A)) = n.
⊥ ⊥
2. Null(A) = Col(A∗ ) and Null(A∗ ) = Col(A) .
3. dim(Col(A)) = dim(Col(A∗ )).
T
AF

We then saw that having an orthonormal basis is an asset as determining the


DR

1. coordinates of a vector boils down to computing the inner product.


2. projection of a vector on a subspace boils down to finding an orthonormal basis of the
subspace and then summing the corresponding rank 1 matrices.

So, the question arises, how do we compute an orthonormal basis? This is where we came
across the Gram-Schmidt Orthonormalization process. This algorithm helps us to determine
an orthonormal basis of LS(S) for any finite subset S of a vector space. This also lead to the
QR-decomposition of a matrix.
Thus, we observe the following about the linear system Ax = b. If
1. b ∈ Col(A) then we can use the Gauss-Jordan method to get a solution.
2. b ∈
/ Col(A) then in most cases we need a vector x such that the least square error between
b and Ax is minimum. We saw that this minimum is attained by the projection of b on
Col(A). Also, this vector can be obtained either using the fundamental theorem of linear
algebra or by computing the matrix B(B T B)−1 B T , where the columns of B are either
the pivot columns of A or a basis of Col(A).
156 CHAPTER 5. INNER PRODUCT SPACES

T
AF
DR
Chapter 6

Eigenvalues, Eigenvectors and


Diagonalizability

6.1 Introduction and Definitions

In this chapter, every matrix is an element of Mn (C) and x = (x1 , . . . , xn )T ∈ Cn , for some
n ∈ N. We start with a few examples to motivate this chapter.
" # " # " #
1 2 9 −2 x
Example 6.1.1. 1. Let A = ,B= and x = .
T

2 1 −2 6 y
AF

" # " # " #


1 1 1
DR

(a) Then A magnifies the nonzero vector three times as A =3 and behaves
1 1 1
" # " # " # " #
1 1 1 1
by changing the direction of as A = −1 . Further, the vectors
−1 −1 −1 1
" #
1
and are orthogonal.
−1
" # " # " # " # " # " #
1 −2 1 1 2 2
(b) B magnifies both the vectors and as B =5 and B = 10 .
2 1 2 2 −1 −1
" # " #
1 2
Here again, the vectors and are orthogonal.
2 −1
(x + y)2 (x − y)2
(c) xT Ax = 3 − . Here, the displacements occur along perpendicular
2 2 " # " #
1 1
lines x + y = 0 and x − y = 0, where x + y = (x, y) and x − y = (x, y) .
1 −1
(x + 2y)2 (2x − y)2
Whereas xT Bx = 5 + 10 . Here also the maximum/minimum
5 5
displacements occur
" # along the orthogonal
" # lines x + 2y = 0 and 2x − y = 0, where
1 2
x + 2y = (x, y) and 2x − y = (x, y) .
2 −1
(d) the curve xT Ax = 10 represents a hyperbola, where as the curve xT Bx = 10 repre-
sents an ellipse (see Figure 6.1 drawn using the package “Sagemath”).

157
158 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

Figure 6.1: A Hyperbola and two Ellipses (first one has orthogonal axes)
.

" #
1 2
2. Let C = , a non-symmetric matrix. Then, does there exist a nonzero x ∈ C2 which
1 3
gets magnified by C?
So, we need x 6= 0 and α ∈ C such that Cx = αx ⇔ [αI2 − C]x = 0. As x 6= 0,
[αI2 − C]x = 0 has a solution if and only if det[αI − A] = 0. But,
" #!
α−1 −2
det[αI − A] = det = α2 − 4α + 1.
−1 α − 3
T
AF

" √ #
√ √ 1+ 3 −2
So, α = 2± 3. For α = 2+ 3, verify that the x 6= 0 that satisfies √ x=
DR

−1 3−1
"√ # "√ #
3−1 √ 3+1
0 equals x = . Similarly, for α = 2 − 3, the vector x = satisfies
1 −1
" √ #
1− 3 −2
√ x = 0. In this example,
−1 − 3 − 1
"√ # "√ #
3−1 3+1
(a) we still have magnifications in the directions and
.
1 −1

(b) the maximum/minimum displacements do not occur along the lines ( 3−1)x+y = 0

and ( 3 + 1)x − y = 0 (see the third curve in Figure 6.1).
√ √
(c) the lines ( 3 − 1)x + y = 0 and ( 3 + 1)x − y = 0 are not orthogonal.

3. Let A be a real symmetric matrix. Consider the following problem:

Maximize (Minimize) xT Ax such that x ∈ Rn and xT x = 1.

To solve this, consider the Lagrangian

n X
n n
!
X X
T T
L(x, λ) = x Ax − λ(x x − 1) = aij xi xj − λ x2i −1 .
i=1 j=1 i=1
6.1. INTRODUCTION AND DEFINITIONS 159

Partially differentiating L(x, λ) with respect to xi for 1 ≤ i ≤ n, we get


∂L
= 2a11 x1 + 2a12 x2 + · · · + 2a1n xn − 2λx1 ,
∂x1
.. ..
.=.
∂L
= 2an1 x1 + 2an2 x2 + · · · + 2ann xn − 2λxn .
∂xn
Therefore, to get the points of extremum, we solve for

∂L T
 
T ∂L ∂L ∂L
0 = , ,..., = = 2(Ax − λx).
∂x1 ∂x2 ∂xn ∂x
Thus, to solve the extremal problem, we need λ ∈ R, x ∈ Rn such that x 6= 0 and
Ax = λx.

We observe the following about the matrices A, B and C that appear in Example 6.1.1.
√ √
1. det(A) = −3 = 3 × −1, det(B) = 50 = 5 × 10 and det(C) = 1 = (2 + 3) × (2 − 3).
√ √
2. tr(A) = 2 = 3 − 1, tr(B) = 15 = 5 + 10 and det(C) = 4 = (2 + 3) + (2 − 3).
(" # " #) (" # " #) ("√ # "√ #)
1 1 1 2 3−1 3+1
3. The sets , , , and , are linearly indepen-
1 −1 2 −1 1 −1
dent.
" # " #
1 1
4. If v1 = and v2 = and S = [v1 , v2 ] then
T

1 −1
AF

" # " #
3 0 3 0
(a) AS = [Av1 , Av2 ] = [3v1 , −v2 ] = S ⇔ S −1 AS = = diag(3, −1).
DR

0 −1 0 −1
1 1
(b) Let u1 = √ v1 and u2 = √ v2 . Then, u1 and u2 are orthonormal unit vectors,
2 2
i.e., if U = [u1 , u2 ] then I = U U ∗ = u1 u∗1 + u2 u∗2 and A = 3u1 u∗1 − u2 u∗2 .
" # " #
1 2
5. If v1 = and v2 = and S = [v1 , v2 ] then
2 −1
" # " #
5 0 5 0
(a) AS = [Av1 , Av2 ] = [5v1 , 10v2 ] = S ⇔ S −1 AS = = diag(3, −1).
0 10 0 10
1 1
(b) Let u1 = √ v1 and u2 = √ v2 . Then, u1 and u2 are orthonormal unit vectors,
5 5
i.e., if U = [u1 , u2 ] then I = U U ∗ = u1 u∗1 + u2 u∗2 and A = 5u1 u∗1 + 10u2 u∗2 .
"√ # "√ #
3−1 3+1
6. If v1 = and v2 = and S = [v1 , v2 ] then
1 −1
" √ #
−1 2+ 3 0 √ √
S CS = √ = diag(2 + 3, 2 − 3).
0 2− 3

Thus, we see that given A ∈ Mn (C), the number λ ∈ C and x ∈ Cn , x 6= 0 satisfying


Ax = λx have certain nice properties. For example, there exists a basis of C2 in which the
matrices A, B and C behave like diagonal matrices. To understand the ideas better, we start
with the following definitions.
160 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

Definition 6.1.2. [Eigenvalues, Eigenvectors and Eigenspace] Let A ∈ Mn (C). Then,


1. the equation
Ax = λx ⇔ (A − λIn )x = 0 (6.1.1)

is called the eigen-condition.


2. an α ∈ C is called a characteristic value/root or eigenvalue or latent root of A if
there exists x 6= 0 satisfying Ax = αx.
3. an x 6= 0 satisfying Equation (6.1.1) is called a characteristic vector or eigenvector
or invariant/latent vector of A corresponding to λ.
4. the tuple (α, x) with x 6= 0 and Ax = αx is called an eigen-pair or characteristic-pair.
5. for an eigenvalue α ∈ C, Null(A − αI) = {x ∈ Rn |Ax = αx} is called the eigenspace
or characteristic vector space of A corresponding to α.

Theorem 6.1.3. Let A ∈ Mn (C) and α ∈ C. Then, the following statements are equivalent.
1. α is an eigenvalue of A.
2. det(A − αIn ) = 0.

Proof. We know that α is an eigenvalue of A if any only if the system (A − αIn )x = 0 has a
non-trivial solution. By Theorem 2.4.4 this holds if and only if det(A − αI) = 0.
T
AF

Definition 6.1.4. [Characteristic Polynomial / Equation, Spectrum and Spectral Radius]


Let A ∈ Mn (C). Then,
DR

1. det(A−λI) is a polynomial of degree n in λ and is called the characteristic polynomial


of A, denoted PA (λ), or in short P (λ).
2. the equation PA (λ) = 0 is called the characteristic equation of A.
3. The multi-set (collection with multiplicities) {α ∈ C : PA (α) = 0} is called the spectrum
of A, denoted σ(A). Hence, σ(A) contains all the eigenvalues of A.
4. The Spectral Radius, denoted ρ(A) of A ∈ Mn (C), equals max{|α| : α ∈ σ(A)}.

We thus observe the following.

Remark 6.1.5. Let A ∈ Mn (C).

1. Then, A is singular if and only if 0 ∈ σ(A).

2. Further, if α ∈ σ(A) then the following statements hold.


(a) {0} $ Null(A − αI). Therefore, if Rank(A − αI) = r then r < n. Hence, by
Theorem 2.4.4, the system (A − αI)x = 0 has n − r linearly independent solutions.
(b) x ∈ Null(A − αI) if and only if cx ∈ Null(A − αI), for c 6= 0.
r
P
(c) If x1 , . . . , xr ∈ Null(A − αI) are linearly independent then ci xi ∈ Null(A − αI),
i=1
for all ci ∈ C. Hence, if S is a collection of eigenvectors then, we necessarily want
the set S to be linearly independent.
6.1. INTRODUCTION AND DEFINITIONS 161

(d) Thus, an eigenvector v of A is in some sense a line ` = Span({v}) that passes


through 0 and v and has the property that the image of ` is either ` itself or 0.

3. Since the eigenvalues of A are roots of the characteristic equation, A has exactly n eigen-
values, including multiplicities.

4. If the entries of A are real and α ∈ σ(A) is also real then the corresponding eigenvector
has real entries.

5. Further, if (α, x) is an eigenpair for A and f (A) = b0 I + b1 A + · · · + bk Ak is a polynomial


in A then (f (α), x) is an eigenpair for f (A).

Almost all books in mathematics differentiate between characteristic value and eigenvalue
as the ideas change when one moves from complex numbers to any other scalar field. We give
the following example for clarity.

Remark 6.1.6. Let A ∈ M2 (F). Then, A induces a map T ∈ L(F2 ) defined by T (x) = Ax, for
all x ∈ F2 . We use this idea to understand the difference.
" #
0 1
1. Let A = . Then, pA (λ) = λ2 + 1. So, ±i are the roots of P (λ) = 0 in C. Hence,
−1 0

(a) A has (i, (1, i)T ) and (−i, (i, 1)T ) as eigen-pairs or characteristic-pairs.
T

(b) A has no characteristic value over R.


AF

" #
1 2 √
DR

2. Let A = . Then, 2 ± 3 are the roots of the characteristic equation. Hence,


1 3
(a) A has characteristic values or eigenvalues over R.
(b) A has no characteristic value over Q.

Let us look at some more examples.

Example 6.1.7. 1. Let A = diag(d1 , . . . , dn ) with di ∈ C, 1 ≤ i ≤ n. Then, p(λ) =


n
Q
(λ − di ) and thus verify that (d1 , e1 ), . . . , (dn , en ) are the eigen-pairs.
i=1
n
Q
2. Let A = (aij ) be an n × n triangular matrix. Then, p(λ) = (λ − aii ) and thus verify
i=1
that σ(A) = {a11 , a22 , . . . , ann }. What can you say about the eigen-vectors of an upper
triangular matrix if the diagonal entries are all distinct?
" #
1 1
3. Let A = . Then, p(λ) = (1 − λ)2 . Hence, σ(A) = {1, 1}. But the complete solution
0 1
of the system (A − I2 )x = 0 equals x = ce1 , for c ∈ C. Hence using Remark 6.1.5.2, e1 is
an eigenvector. Therefore, 1 is a repeated eigenvalue whereas there is only one
eigenvector.
" #
1 0
4. Let A = . Then, 1 is a repeated eigenvalue of A. In this case, (A − I2 )x = 0 has a
0 1
solution for every x ∈ C2 . Hence, any two linearly independent vectors xT , yT ∈ C2
162 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

gives (1, x) and (1, y) as the two eigen-pairs for A. In general, if S = {x1 , . . . , xn } is a
basis of Cn then (1, x1 ), . . . , (1, xn ) are eigen-pairs of In , the identity matrix.
" # " #! " #!
1 −1 i 1
5. Let A = . Then, 1 + i, and 1 − i, are the eigen-pairs of A.
1 1 1 i
 
0 1 0
 
6. Let A = 0 0 1. Then, σ(A) = {0, 0, 0} with e1 as the only eigenvector.
 
0 0 0
   
0 1 0 0 0 x1
   
0 0 1 0 0 x2 
   
7. Let A = 0 0 0 0 0. Then, σ(A) = {0, 0, 0, 0, 0}. Note that Ax3  = 0 implies
   
   
0 0 0 0 1 x 
   4
0 0 0 0 0 x5
x2 = 0 = x3 = x5 . Thus, e1 and e4 are the only eigenvectors. Note that the diagonal
blocks of A are nilpotent matrices.

Exercise 6.1.8. 1. Let A ∈ Mn (R). Then, prove that

(a) if α ∈ σ(A) then αk ∈ σ(Ak ), for all k ∈ N.


T

(b) if A is invertible and α ∈ σ(A) then αk ∈ σ(Ak ), for all k ∈ Z.


AF

2. "
Find eigen-pairs
# over C, for each
# of"the following matrices:
DR

" # " #
1 1+i i 1+i cos θ − sin θ cos θ sin θ
, , and .
1−i 1 −1 + i i sin θ cos θ sin θ − cos θ
n
3. Let A = [aij ] ∈ Mn (C) with
P
aij = a, for all 1 ≤ i ≤ n. Then, prove that a is an
j=1
eigenvalue of A with corresponding eigenvector 1 = [1, 1, . . . , 1]T .

4. Prove that the matrices A and AT have the same set of eigenvalues. Construct a 2 × 2
matrix A such that the eigenvectors of A and AT are different.
" # " # " #
1 1 T T 1 1
Ans: A = . Then 0 ∈ σ(A). Verify that 1 A = 01 and A =0 .
−1 −1 −1 −1

5. Prove that λ ∈ C is an eigenvalue of A if and only if λ ∈ C is an eigenvalue of A∗ .

6. Let A be an idempotent matrix. Then, prove that its eigenvalues are either 0 or 1 or both.

7. Let A be a nilpotent matrix. Then, prove that its eigenvalues are all 0.

8. Let J = 11T ∈ Mn (C). Then, J is a matrix with each entry 1. Show that
(a) (n, 1) is an eigenpair for J.
(b) 0 ∈ σ(J) with multiplicity n − 1. Find a set of n − 1 linearly independent eigenvectors
for 0 ∈ σ(J).
6.1. INTRODUCTION AND DEFINITIONS 163
" #
B 0
9. Let B ∈ Mn (C) and C ∈ Mm (C). Now, define the Direct Sum B ⊕ C = . Then,
0 C
prove that
" #!
x
(a) if (α, x) is an eigen-pair for B then α, is an eigen-pair for B ⊕ C.
0
" #!
0
(b) if (β, y) is an eigen-pair for C then β, is an eigen-pair for B ⊕ C.
y

Definition 6.1.9. Let A ∈ L(Cn ). Then, a vector y ∈ Cn \ {0} satisfying y∗ A = λy∗ is called
a left eigenvector of A for λ.
" # " #
1 1 1
Example 6.1.10. 1. Let A = . Then, x = is a left eigenvector of A corre-
−1 −1 1
" #!
1
sponding to the eigenvalue 0 and 0, y = is a (right) eigenpair of A.
−1
" # " #! " #!
1 1 1 1
2. Let A = . Then, 0, x = and 3, y = are (right) eigen-pairs of
2 2 −1 2
" #! " #!
1 2
A. Also, 3, u = and 0, v = are left eigen-pairs of A. Note that x is
1 −1
orthogonal to u and y is orthogonal to v. This is true in general and is proved next.
T

3. Let S be a nonsingular matrix such that its columns are left eigenvectors of A. Then,
AF

prove that the columns of (S ∗ )−1 are right eigenvectors of A.


DR

Ans: We have S ∗ A = ΛS ∗ and hence A(S ∗ )−1 = (S ∗ )−1 Λ.

Theorem 6.1.11. [Principle of bi-orthogonality] Let (λ, x) be a (right) eigenpair and (µ, y)
be a left eigenpair of A, where λ 6= µ. Then, y is orthogonal to x.

Proof. Verify that µy∗ x = (y∗ A)x = y∗ (λx) = λy∗ x. Thus, y∗ x = 0.

Exercise 6.1.12. Let Ax = λx and x∗ A = µx∗ . Then µ = λ.

Ans: Note λ(x∗ x) = x∗ (λx) = x∗ (Ax) = (x∗ A)x = (µx∗ )x = µ(x∗ x).

Definition 6.1.13. [Eigenvalues of a linear Operator] Let T ∈ L(Cn ). Then, α ∈ C is called


an eigenvalue of T if there exists v ∈ Cn with v 6= 0 such that T (v) = αv.

Proposition 6.1.14. Let T ∈ L(Cn ) and let B be an ordered basis in Cn . Then, (α, v) is an
eigenpair for T if and only if (α, [v]B ) is an eigenpair of A = T [B, B].

Proof. Note that, by definition, T (v) = αv if and only if [T v]B = [αv]B . Or equivalently,
α ∈ σ(T ) if and only if A[v]B = α[v]B . Thus, the required result follows.

Remark 6.1.15. [A linear operator on an infinite dimensional space may not have any
eigenvalue] Let V be the space of all real sequences (see Example 3.1.4.9). Now, define a linear
operator T ∈ L(V) by
T (a0 , a1 , . . .) = (0, a1 , a2 , . . .).
164 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

We now show that T doesn’t have any eigenvalue.


Solution: Let if possible α be an eigenvalue of T with corresponding eigenvector x =
(x1 , x2 , . . .). Then, the eigen-condition T (x) = αx implies that

(0, x1 , x2 , . . .) = α(x1 , x2 , . . .) = (αx1 , αx2 , . . .).

So, if α 6= 0 then x1 = 0 and this in turn implies that x = 0, a contradiction. If α = 0 then


(0, x1 , x2 , . . .) = (0, 0, . . .) and we again get x = 0, a contradiction. Hence, the required result
follows.

Theorem 6.1.16. Let λ1 , . . . , λn , not necessarily distinct, be the A = [aij ] ∈ Mn (C). Then,
n
Q Pn n
P
det(A) = λi and tr(A) = aii = λi .
i=1 i=1 i=1
Proof. Since λ1 , . . . , λn are the eigenvalues of A, by definition,
n
Y
det(A − xIn ) = (−1)n (x − λi ) (6.1.2)
i=1

is an identity in x as polynomials. Therefore, by substituting x = 0 in Equation (6.1.2), we get


det(A) = (−1)n (−1)n ni=1 λi = ni=1 λi . Also,
Q Q
 
a11 − x a12 ··· a1n
 
 a21 a22 − x · · · a2n 
det(A − xIn ) =  . (6.1.3)
 
 .. .. .. .. 
. . .
T


 
AF

an1 an2 · · · ann − x


= a0 − xa1 + · · · + (−1)n−1 xn−1 an−1 + (−1)n xn
DR

(6.1.4)

for some a0 , a1 , . . . , an−1 ∈ C. Then, an−1 , the coefficient of (−1)n−1 xn−1 , comes from the term

(a11 − x)(a22 − x) · · · (ann − x).


n
P
So, an−1 = aii = tr(A), the trace of A. Also, from Equation (6.1.2) and (6.1.4), we have
i=1
n
Y
n−1 n−1 n n n
a0 − xa1 + · · · + (−1) x an−1 + (−1) x = (−1) (x − λi ).
i=1

Therefore, comparing the coefficient of (−1)n−1 xn−1 , we have


n n
( )
X X
tr(A) = an−1 = (−1) (−1) λi = λi .
i=1 i=1

Hence, we get the required result.

Exercise 6.1.17. 1. Let A be a 3 × 3 orthogonal matrix (AAT = I). If det(A) = 1, then


prove that there exists v ∈ R3 \ {0} such that Av = v.

2. Let A ∈ M2n+1 (R) with AT = −A. Then, prove that 0 is an eigenvalue of A.

3. Let A ∈ Mn (C). Then, A is invertible if and only if 0 is not an eigenvalue of A.

4. Let A ∈ Mn (C) satisfy kAxk ≤ kxk for all x ∈ Cn . Then, prove that if α ∈ C with
| α | > 1 then A − αI is invertible.
6.1. INTRODUCTION AND DEFINITIONS 165

6.1.1 Spectrum of a Matrix

Definition 6.1.18. [Algebraic, Geometric Multiplicity] Let A ∈ Mn (C). Then,


1. the multiplicity of α ∈ σ(A) is called the algebraic multiplicity of A, denoted Alg.Mulα (A).
2. for α ∈ σ(A), dim(Null(A−αI)) is called the geometric multiplicity of A, Geo.Mulα (A).

We now state the following observations.

Remark 6.1.19. Let A ∈ Mn (C).


1. Then, for each α ∈ σ(A), using Theorem 2.4.4 dim(Null(A − αI)) ≥ 1. So, we have at
least one eigenvector.
2. If the algebraic multiplicity of α ∈ σ(A) is r ≥ 2 then the Example 6.1.7.7 implies that we
need not have r linearly independent eigenvectors.

Theorem 6.1.20. Let A and B be two similar matrices. Then,


1. α ∈ σ(A) if and only if α ∈ σ(B).
2. for each α ∈ σ(A), Alg.Mulα (A) = Alg.Mulα (B) and Geo.Mulα (A) = Geo.Mulα (B).

Proof. Since A and B are similar, there exists an invertible matrix S such that A = SBS −1 .
So, α ∈ σ(A) if and only if α ∈ σ(B) as
T

det(A − xI) = det(SBS −1 − xI) = det S(B − xI)S −1



AF

= det(S) det(B − xI) det(A−1 ) = det(B − xI). (6.1.5)


DR

Note that Equation (6.1.5) also implies that Alg.Mulα (A) = Alg.Mulα (B). We will now
show that Geo.Mulα (A) = Geo.Mulα (B).
So, let Q1 = {v1 , . . . , vk } be a basis of Null(A − αI). Then, B = SAS −1 implies that
Q2 = {Sv1 , . . . , Svk } ⊆ Null(B − αI). Since Q1 is linearly independent and S is invertible,
we get Q2 is linearly independent. So, Geo.Mulα (A) ≤ Geo.Mulα (B). Now, we can start
with eigenvectors of B and use similar arguments to get Geo.Mulα (B) ≤ Geo.Mulα (A) and
hence the required result follows.
Remark 6.1.21. 1. Let A = S −1 BS. Then, from the proof of Theorem 6.1.20, we see that
x is an eigenvector of A for λ if and only if Sx is an eigenvector of B for λ.
2. Let A and B be two similar
" matrices then
# " σ(A)
# = σ(B). But, the converse is not true.
0 0 0 1
For example, take A = and B = .
0 0 0 0
3. Let A ∈ Mn (C). Then, for any invertible matrix B, the matrices AB and BA =
B(AB)B −1 are similar. Hence, in this case the matrices AB and BA have
(a) the same set of eigenvalues.
(b) Alg.Mulα (AB) = Alg.Mulα (BA), for each α ∈ σ(A).
(c) Geo.Mulα (AB) = Geo.Mulα (BA), for each α ∈ σ(A).

We will now give a relation between the geometric multiplicity and the algebraic multiplicity.
166 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

Theorem 6.1.22. Let A ∈ Mn (C). Then, for α ∈ σ(A), Geo.Mulα (A) ≤ Alg.Mulα (A).

Proof. Let Geo.Mulα (A) = k. Suppose Q1 = {v1 , . . . , vk } is an orthonormal basis of Null(A−


αI). Extend Q1 to get {v1 , . . . , vk , vk+1 , . . . , vn } as an orthonormal basis of Cn . Put P =
[v1 , . . . , vk , vk+1 , . . . , vn ]. Then, P ∗ = P −1 and

P ∗ AP = P ∗ [Av1 , . . . , Avk , Avk+1 , . . . , Avn ]


   
v1∗ α ··· 0 ∗ ··· ∗
 ..
 
..

.0 ∗ · · · ∗
  
 . 0
   
 v∗  0 · · · α ∗ · · · ∗
=  ∗ k  [αv1 , . . . , αvk , ∗, . . . , ∗] =  .
   
vk+1 
 
0
 · · · 0 ∗ · · · ∗


 .
..
 .
 ..


   
vn∗ 0 ··· 0 ∗ ··· ∗

Now, if we denote the lower diagonal submatrix as D then

PA (x) = det(A − xI) = det(P ∗ AP − xI) = (α − x)k det(D − xI). (6.1.6)

So, Alg.Mulα (A) = Alg.Mulα (P ∗ AP ) ≥ k = Geo.Mulα (A).

Remark 6.1.23. Note that in the proof of Theorem 6.1.22, the remaining eigenvalues of A are
the eigenvalues of D (see Equation (6.1.6)). This technique is called deflation.
T
AF

 
1 2 3
DR

√1
 
Exercise 6.1.24. 1. Let A = 3 2 1. Notice that x1 = 3 1 is an eigenvector for A.

2 3 1
h i
Find an ordered basis {x1 , x2 , x3 } of C3 . Put X = x1 x2 x3 . Compute X −1 AX to
get a block-triangular matrix. Can you now find the remaining eigenvalues of A?
 
6 − √12 − √36
√ 
Ans: X −1 AX = 

 0 −1 − . The other eigenvalues are −1 ± i.
3
1
0 √3 −1

2. Let A ∈ Mm×n (R) and B ∈ Mn×m (R).


(a) If α ∈ σ(AB) and α 6= 0 then
i. α ∈ σ(BA).
ii. Alg.Mulα (AB) = Alg.Mulα (BA).
iii. Geo.Mulα (AB) =" Geo.Mul #" α (BA).# " #" # " #
I 0 0 B BA B I 0 0 B
Ans: Verify that = =
−A I 0 AB 0 0 −A I 0 0
Let {u1 , . . . , uk } be k linearly independent eigenvalues of AB corresponding to α.
Then, {Bu1 , . . . , Buk } are k linearly independent eigenvalues of BA corresponding
Pk k
P
to α as α 6= 0 and cj Buj = 0 implies cj uj = 0, a contradiction.
j=1 j=1
(b) If 0 ∈ σ(AB) and n = m then Alg.Mul0 (AB) = Alg.Mul0 (BA) as there are n
eigenvalues, counted with multiplicity.
6.2. DIAGONALIZATION 167

(c) Give an example to show that Geo.Mul0 (AB) need not equal Geo.Mul0 (BA) even
when n = m.

3. Let A ∈ Mn (R) be an invertible matrix and let x, y ∈ Rn with x 6= 0 and yT A−1 x 6= 0.


Define B = xyT A−1 . Then, prove that

(a) λ0 = yT A−1 x is an eigenvalue of B of multiplicity 1.

(b) 0 is an eigenvalue of B of multiplicity n − 1 [Hint: Use Exercise 6.1.24.2a].

(c) 1 + αλ0 is an eigenvalue of I + αB of multiplicity 1, for any α ∈ R.

(d) 1 is an eigenvalue of I + αB of multiplicity n − 1, for any α ∈ R.

(e) det(A + αxyT ) equals (1 + αλ0 ) det(A), for any α ∈ R. This result is known as the
Shermon-Morrison formula for determinant.

4. Let A, B ∈ M2 (R) such that det(A) = det(B) and tr(A) = tr(B).

(a) Do A and B have the same set of eigenvalues?

(b) Give examples to show that the matrices A and B need not be similar.

5. Let A, B ∈ Mn (R). Also, let (λ1 , u) and (λ2 , v) are eigen-pairs of A and B, respectively.
T

(a) If u = αv for some α ∈ R then (λ1 + λ2 , u) is an eigen-pair for A + B.


AF

(b) Give an example to show that if u and v are linearly independent then λ1 + λ2 need
DR

not be an eigenvalue of A + B.

6. Let A ∈ Mn (R) be an invertible matrix with eigen-pairs (λ1 , u1 ), . . . , (λn , un ). Then, prove
that B = [u1 , . . . , un ] forms a basis of Rn . If [b]B = (c1 , . . . , cn )T then the system Ax = b
has the unique solution
c1 c2 cn
x= u1 + u2 + · · · + un .
λ1 λ2 λn

6.2 Diagonalization

Let A ∈ Mn (C) and let T ∈ L(Cn ) be defined by T (x) = Ax, for all x ∈ Cn . In this section,
we first find conditions under which one can obtain a basis B of Cn such that T [B, B] (see
Theorem 4.4.4) is a diagonal matrix. And, then it is shown that normal matrices satisfy the
above conditions. To start with, we have the following definition.

Definition 6.2.1. [Matrix Diagonalizability] A matrix A is said to be diagonalizable if A


is similar to a diagonal matrix. Or equivalently, P −1 AP = D ⇔ AP = P D, for some diagonal
matrix D and invertible matrix P .

Example 6.2.2. 1. Let A be an n × n diagonalizable matrix. Then, by definition, A is


similar to a diagonal matrix, say D = diag(d1 , . . . , dn ). Thus, by Remark 6.1.21, σ(A) =
σ(D) = {d1 , . . . , dn }.
168 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
" #
0 1
2. Let A = . Then, A cannot be diagonalized.
0 0
Solution: Suppose A is diagonalizable. Then, A is similar to D = diag(d1 , d2 ). Thus,
by Theorem 6.1.20, {d1 , d2 } = σ(D) = σ(A) = {0, 0}. Hence, D = 0 and therefore,
A = SDS −1 = 0, a contradiction.
 
2 1 1
 
3. Let A = 
0 2 1. Then, A cannot be diagonalized.

0 0 2
Solution: Suppose A is diagonalizable. Then, A is similar to D = diag(d1 , d2 , d3 ). Thus,
by Theorem 6.1.20, {d1 , d2 , d3 } = σ(D) = σ(A) = {2, 2, 2}. Hence, D = 2I3 and therefore,
A = SDS −1 = 2I3 , a contradiction.
" # " #! " #!
0 1 i −i
4. Let A = . Then, i, and −i, are two eigen-pairs of A. Define
−1 0 1 1
" # " #
i −i −i 0
U = √12 . Then, U ∗ U = I2 = U U ∗ and U ∗ AU = .
1 1 0 i

Theorem 6.2.3. Let A ∈ Mn (R).


1. Let S be an invertible matrix such that S −1 AS = diag(d1 , . . . , dn ). Then, for 1 ≤ i ≤ n,
the i-th column of S is an eigenvector of A corresponding to di .
T

2. Then, A is diagonalizable if and only if A has n linearly independent eigenvectors.


AF

Proof. Let S = [u1 , . . . , un ]. Then, AS = SD gives


DR

[Au1 , . . . , Aun ] = A [u1 , . . . , un ] = AS = SD = S diag(d1 , . . . , dn ) = [d1 u1 , . . . , dn un ] .

Or equivalently, Aui = di ui , for 1 ≤ i ≤ n. As S is invertible, {u1 , . . . , un } are linearly


independent. Hence, (di , ui ), for 1 ≤ i ≤ n, are eigen-pairs of A. This proves Part 1 and “only
if” part of Part 2.
Conversely, let {u1 , . . . , un } be n linearly independent eigenvectors of A corresponding to
eigenvalues α1 , . . . , αn . Then, by Corollary 3.2.10, S = [u1 , . . . , un ] is non-singular and
 
α1 0 0
 
 0 α2 0 
AS = [Au1 , . . . , Aun ] = [α1 u1 , . . . , λn un ] = [u1 , . . . , un ]  . .  = SD,
 
 .. . . ... 
 
0 0 αn

where D = diag(α1 , . . . , αn ). Therefore, S −1 AS = D and hence A is diagonalizable.


Definition 6.2.4. 1. A matrix A ∈ Mn (C) is called defective if for some α ∈ σ(A),
Geo.Mulα (A) < Alg.Mulα (A).
2. A matrix A ∈ Mn (C) is called non-derogatory if Geo.Mulα (A) = 1, for each α ∈ σ(A).

As a direct consequence of Theorem 6.2.3, we obtain the following result.

Corollary 6.2.5. Let A ∈ Mn (C). Then,


6.2. DIAGONALIZATION 169

1. A is non-defective if and only if A is diagonalizable.


2. A has distinct eigenvalues if and only if A is non-derogatory and non-defective.

Theorem 6.2.6. Let (α1 , v1 ), . . . , (αk , vk ) be k eigen-pairs of A ∈ Mn (C) with αi ’s distinct.


Then, {v1 , . . . , vk } is linearly independent.

Proof. Suppose {v1 , . . . , vk } is linearly dependent. Then, there exists a smallest ` ∈ {1, . . . , k −
1} and β 6= 0 such that v`+1 = β1 v1 + · · · + β` v` . So,

α`+1 v`+1 = α`+1 β1 v1 + · · · + α`+1 β` v` . (6.2.1)

and

α`+1 v`+1 = Av`+1 = A (β1 v1 + · · · + β` v` ) = α1 β1 v1 + · · · + α` β` v` . (6.2.2)

Now, subtracting Equation (6.2.2) from Equation (6.2.1), we get

0 = (α`+1 − α1 ) β1 v1 + · · · + (α`+1 − α` ) β` v` .

So, v` ∈ LS(v1 , . . . , v`−1 ), a contradiction to the choice of `. Thus, the required result follows.
An immediate corollary of Theorem 6.2.3 and Theorem 6.2.6 is stated next without proof.

Corollary 6.2.7. Let A ∈ Mn (C) have n distinct eigenvalues. Then, A is diagonalizable.


T
AF

The converse of Theorem 6.2.6 is not true as In has n linearly independent eigenvectors
corresponding to the eigenvalue 1, repeated n times.
DR

Corollary 6.2.8. Let α1 , . . . , αk be k distinct eigenvalues A ∈ Mn (C). Also, for 1 ≤ i ≤ k, let


Pk
dim(Null(A − αi In )) = ni . Then, A has ni linearly independent eigenvectors.
i=1

Proof. For 1 ≤ i ≤ k, let Si = {ui1 , . . . , uini } be a basis of Null(A − αi In ). Then, we need to


k
 k 
S Q
prove that Si is linearly independent. To do so, denote pj (A) = (A − αi In ) / (A − αj In ),
i=1 i=1
for 1 ≤ j ≤ k. Then, note that pj (A) is a polynomial in A of degree k − 1 and

 0, if u ∈ Null(A − αi In ), for some i 6= j
pj (A)u = Q
(αj − αi )u if u ∈ Null(A − αj In ) (6.2.3)

i6=j

k
S
So, to prove that Si is linearly independent, consider the linear system
i=1

c11 u11 + · · · + c1n1 u1n1 + · · · + ck1 uk1 + · · · + cknk uknk = 0

in the variables cij ’s. Now, applying the matrix pj (A) and using Equation (6.2.3), we get
Y 
(αj − αi ) cj1 uj1 + · · · + cjnj ujnj = 0.
i6=j
Q
But (αj − αi ) 6= 0 as αi ’s are distinct. Hence, cj1 uj1 + · · · + cjnj ujnj = 0. As Sj is a basis
i6=j
of Null(A − αj In ), we get cjt = 0, for 1 ≤ t ≤ nj . Thus, the required result follows.
170 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

Corollary 6.2.9. Let A ∈ Mn (C) with distinct eigenvalues α1 , . . . , αk . Then, A is diagonaliz-


able if and only if Geo.Mulαi (A) = Alg.Mulαi (A), for each 1 ≤ i ≤ k.
k
P
Proof. Let Alg.Mulαi (A) = mi . Then, mi = n. Let Geo.Mulαi (A) = ni , for 1 ≤
i=1
k
P
i ≤ k. Then, by Corollary 6.2.8 A has ni linearly independent eigenvectors. Also, by
i=1
Theorem 6.1.22, ni ≤ mi , for 1 ≤ i ≤ mi .
Now, let A be diagonalizable. Then, by Theorem 6.2.3, A has n linearly independent
k
P k
P
eigenvectors. So, n = ni . As ni ≤ mi and mi = n, we get ni = mi .
i=1 i=1
Now, assume that Geo.Mulαi (A) = Alg.Mulαi (A), for 1 ≤ i ≤ k. Then, for each
k
P k
P
i, 1 ≤ i ≤ n, A has ni = mi linearly independent eigenvectors. Thus, A has ni = mi = n
i=1 i=1
linearly independent eigenvectors. Hence by Theorem 6.2.3, A is diagonalizable.
       
2 1 1 1 1
       
Example 6.2.10. Let A =   1 2 1 . Then, 1,  0  and 2,  1  are the only
      
0 −1 1 −1 −1
eigen-pairs. Hence, by Theorem 6.2.3, A is not diagonalizable.

Exercise 6.2.11. 1. Let A be diagonalizable. Then, prove that A + αI is diagonalizable for


every α ∈ C.
T
AF

2. Let A be an strictly upper triangular matrix. Then, prove that A is not diagonalizable.
DR

3. Let A be an n×n matrix with λ ∈ σ(A) with alg.mulλ (A) = m. If Rank[A−λI] 6= n−m
then prove that A is not diagonalizable.

4. If σ(A) = σ(B) and both A and B are diagonalizable then prove that A is similar to B.
That is, they are two basis representation of the same linear transformation.

5. Let A and B be two similar matrices such that A is diagonalizable. Prove that B is
diagonalizable.
" #
A 0
6. Let A ∈ Mn (R) and B ∈ Mm (R). Suppose C = . Then, prove that C is diagonal-
0 B
izable if and only if both A and B are diagonalizable.
 
2 1 1
 
7. Is the matrix A =  1 2 1 diagonalizable?

1 1 2

8. Let Jn be an n × n matrix with all entries 1. Then, Geo.Mul1 (Jn ) = Alg.Mul1 (Jn ) = 1
and Geo.Mul0 (Jn ) = Alg.Mul0 (Jn ) = n − 1.

9. Let A = [aij ] ∈ Mn (R), where aij = a, if i = j and b, otherwise. Then, verify that
A = (a − b)In + bJn . Hence, or otherwise determine the eigenvalues and eigenvectors of
Jn . Is A diagonalizable?
6.2. DIAGONALIZATION 171

10. Let T : R5 −→ R5 be a linear operator with Rank(T − I) = 3 and

Null(T ) = {(x1 , x2 , x3 , x4 , x5 ) ∈ R5 | x1 + x4 + x5 = 0, x2 + x3 = 0}.

(a) Determine the eigenvalues of T ?


(b) For each distinct eigenvalue α of T , determine Geo.Mulα (T ).
(c) Is T diagonalizable? Justify your answer.

11. Let A ∈ Mn (R) with A 6= 0 but A2 = 0. Prove that A cannot be diagonalized.

12. Are the followingmatrices diagonalizable?


1 3 2 1    
  1 0 −1 1 −3 3 " #
0 2 3 1     2 i
i)   , ii) 0 0 1  , iii) 0 −5 6 and iv) .
0 0 −1 1     i 0
0 2 0 0 −3 4
 
0 0 0 4

13. Let A ∈ Mn (C).


(a) Then, prove that Rank(A) = 1 if and only if A = xy∗ , for some non-zero vectors
x, y ∈ Cn .
(b) If Rank(A) = 1 then
i. A has at most one nonzero eigenvalue of algebraic multiplicity 1.
T
AF

ii. find this eigenvalue and its geometric multiplicity.


iii. when is A diagonalizable?
DR

Ans: (a) A has a nonzero row, call it y ∗ . Other rows are scalar multiples of this row. So
A = xy∗ .
(b.i) Note that Ax = (xy∗ )x = (y∗ x)x. Thus, α = y∗ x ∈ σ(A).
(b.ii) Since y 6= 0, let {z1 , . . . , zn−1 } be an orthonormal basis of y⊥ . Then, Azi = xy∗ zi = 0,
hence the geometric multiplicity of 0 is at least n − 1. So, if y∗ x 6= 0, then the geometric
multiplicity of y∗ x is 1. If y∗ x = 0, then the geometric multiplicity of 0 could be n − 1 or n.
(b.iii) A is not diagonalizable if and only if y∗ x = 0 and the geometric multiplicity of the
eigenvalue 0 is n − 1. Or equivalently, A is diagonalizable if and only if tr(A) 6= 0.

14. Let u, v ∈ Cn such that {u, v} is a linearly independent set. Define A = uvT + vuT .
(a) Then prove that A is a symmetric matrix.
(b) Then prove that dim(Ker(A)) = n − 2.
(c) Then 0 ∈ σ(A) and has multiplicity n − 2.
(d) Determine the other eigenvalues of A.
" #
vT
Ans: (a) AT = (uvT + vuT )T = vuT + uvT
= A. Also, A = [u, v] T .
u
⊥ ⊥

(b) Let w ∈ {u, v} . Then Aw = 0 and dim {u, v} = n − 2.
(c) Hence, 0 is an eigenvalue with multiplicity n − 2.
(d) As the eigenvalues of AB and BA are same (except for the multiplicity of the
172 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY
" #
vT u vT v
eigenvalue 0), consider the 2 × 2 matrix . The eigenvalue of this 2 × 2
uT u u T v
matrix gives the other eigenvalues.
k
15. Let A ∈ Mn (C). If Rank(A) = k then there exists xi , yi ∈ Cn such that A = xi yi∗ . Is
P
i=1
the converse true?

6.2.1 Schur’s Unitary Triangularization

We now prove one of the most important results in diagonalization, called the Schur’s Lemma
or the Schur’s unitary triangularization.

Lemma 6.2.12 (Schur’s unitary triangularization (SUT)). Let A ∈ Mn (C). Then, there exists
a unitary matrix U such that A is similar to an upper triangular matrix. Further, if A ∈ Mn (R)
and σ(A) have real entries then U is a real orthogonal matrix.

Proof. We prove the result by induction on n. The result is clearly true for n = 1. So, let n > 1
and assume the result to be true for k < n and prove it for n.
Let (λ1 , x1 ) be an eigen-pair of A with kx1 k = 1. Now, extend it to form an orthonormal
basis {x1 , x2 , . . . , un } of Cn and define X = [x1 , x2 , . . . , un ]. Then, X is a unitary matrix and
 
x∗
 1∗ 
T

" #
 x2  λ ∗
AF

1
X ∗ AX = X ∗ [Ax1 , Ax2 , . . . , Axn ] =  .  [λ1 x1 , Ax2 , . . . , Axn ] = , (6.2.4)
 
 ..  0 B
DR

 
x∗n

where B ∈ Mn−1 (C). Now, by induction hypothesis there exists a unitary" matrix # U ∈ Mn−1 (C)
such that U ∗ BU = T is an upper triangular matrix. Define U b = X 1 0 . Then, using
0 U
Exercise 5.4.8.10, the matrix U is unitary and
b
" # " # " #" #" #
 ∗ 1 0 1 0 1 0 λ 1 ∗ 1 0
Ub AU b = X ∗ AX =
0 U∗ 0 U 0 U∗ 0 B 0 U
" #" # " # " #
λ1 ∗ 1 0 λ1 ∗ λ1 ∗
= = = .
0 U ∗B 0 U 0 U ∗ BU 0 T
" #
λ1 ∗
Since T is upper triangular, is upper triangular.
0 T
Further, if A ∈ Mn (R) and σ(A) has real entries then x1 ∈ Rn with Ax1 = λ1 x1 . Now, one
uses induction once again to get the required result.

Remark 6.2.13. Let A ∈ Mn (C). Then, by Schur’s Lemma there exists a unitary matrix U
such that U ∗ AU = T = [tij ], a triangular matrix. Thus,

{α1 , . . . , αn } = σ(A) = σ(U ∗ AU ) = {t11 , . . . , tnn }. (6.2.5)

Furthermore, we can get the αi ’s in the diagonal of T in any prescribed order.


6.2. DIAGONALIZATION 173

Definition 6.2.14. [Unitary Equivalence] Let A, B ∈ Mn (C). Then, A and B are said to be
unitarily equivalent/similar if there exists a unitary matrix U such that A = U ∗ BU .

Remark 6.2.15. We know that if two matrices are unitarily equivalent then they are necessarily
similar as U ∗ = U −1 , for every unitary matrix U . But, similarity doesn’t imply unitary equiva-
lence (see Exercise 6.2.17.6). In numerical calculations, unitary transformations are preferred
as compared to similarity transformations due to the following main reasons:

1. Exercise 5.4.8.5g implies that kAxk = kxk, whenever A is a normal matrix. This need
not be true under a similarity change of basis.

2. As U −1 = U ∗ , for a unitary matrix, unitary equivalence is computationally simpler.

3. Also, computation of “conjugate transpose” doesn’t create round-off error in calculation.


" # " #
3 2 1 1
Example 6.2.16. Consider the two matrices A = and B = . Then, we show
−1 0 0 2
that they are similar but not unitarily similar.
Solution: Note that σ(A) = σ(B) = {1, 2}. As the eigenvalues are distinct, by Theo-
rem 6.2.7, the matrices A and B are diagonalizable and "hence#there exists invertible matrices
1 0
S and T such that A = SΛS −1 , B = T ΛT −1 , where Λ = . Thus, A = ST −1 B(ST −1 )−1 .
0 2
|aij |2 6= |bij |2 and hence by Exercise 5.4.8.11, they
T

P P
That is, A and B are similar. But,
AF

cannot be unitarily similar.


DR

Exercise 6.2.17. 1. If A is unitarily similar to an upper triangular matrix T = [tij ] then


|tij |2 = tr(A∗ A) − |λi |2 .
P P
prove that
i<j
2. Use the exercises given below to conclude that the upper triangular matrix obtained in the
“Schur’s Lemma” need not be unique.
 √   √ 
2 −1 3 2 2 1 3 2
 √   √ 
(a) Prove that B = 0 1
 2 and C = 0 1 − 2 are unitarily equivalent.
  
0 0 3 0 0 3
 √   √ 
2 0 3 2 2 0 3 2
 √   √ 
(b) Prove that D = 1 1 2  and E = −1 1 − 2 are unitarily equivalent.
   
0 0 1 0 0 1
   
2 1 4 1 1 4
   
(c) Let A1 = 
0 1 2 and A2 = 0 2 2. Then, prove that
  
0 0 1 0 0 3
i. A1 and D are unitarily equivalent.
ii. A2 and B are unitarily equivalent.
iii. Do the above results contradict Exercise 5.4.8.5c? Give reasons for your answer.
   √ 
1 1 1 2 −1 2
   
3. Prove that A = 0 2 1 and B = 0 1
 0  are unitarily equivalent.

0 0 3 0 0 3
174 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

4. Let A be a normal matrix. If all the eigenvalues of A are 0 then prove that A = 0. What
happens if all the eigenvalues of A are 1?
5. Let A ∈ Mn (C). Then, Prove that if x∗ Ax = 0, for all x ∈ Cn , then A = 0. Do these
results hold for arbitrary matrices?
" # " #
4 4 10 9
6. Show that the matrices A = and B = are similar. Is it possible to find
0 4 −4 −2
a unitary matrix U such that A = U ∗ BU ?
" #
3 2
Ans: Take S = . Then S −1 BS = A. There doesn’t exist an unitary matrix as the
−2 0
sum of the squares of the matrix entries are NOT equal.

We now use Lemma 6.2.12 to give another proof of Theorem 6.1.16.


n
Corollary 6.2.18. Let A ∈ Mn (C). If σ(A) = {α1 , . . . , αn } then det(A) =
Q
αi and tr(A) =
i=1
n
P
αi .
i=1

Proof. By Schur’s Lemma there exists a unitary matrix U such that U ∗ AU = T = [tij ], a
n
Q n
Q
triangular matrix. By Remark 6.2.13, σ(A) = σ(T ). Hence, det(A) = det(T ) = tii = αi
i=1 i=1
n n
and tr(A) = tr(A(UU∗ )) = tr(U∗ (AU)) = tr(T) =
P P
tii = αi .
T

i=1 i=1
AF

6.2.2 Diagonalizability of some Special Matrices


DR

We now use Schur’s unitary triangularization Lemma to state the main theorem of this subsec-
tion. Also, recall that A is said to be a normal matrix if AA∗ = A∗ A.

Theorem 6.2.19 (Spectral Theorem for Normal Matrices). Let A ∈ Mn (C). If A is a normal
matrix then there exists a unitary matrix U such that U ∗ AU = diag(α1 , . . . , αn ).

Proof. By Schur’s Lemma there exists a unitary matrix U such that U ∗ AU = T = [tij ], a
triangular matrix. Since A is a normal

T ∗ T = (U ∗ AU )∗ (U ∗ AU ) = U ∗ A∗ AU = U ∗ AA∗ U = (U ∗ AU )(U ∗ AU )∗ = T T ∗ .

Thus, we see that T is an upper triangular matrix with T ∗ T = T T ∗ . Thus, by Exercise 1.2.10.3,
T is a diagonal matrix and this completes the proof.

Exercise 6.2.20. Let A ∈ Mn (C). If A is either a Hermitian, skew-Hermitian or Unitary


matrix then A is a normal matrix.

We re-write Theorem 6.2.19 in another form to indicate that A can be decomposed into
linear combination of orthogonal projectors onto eigen-spaces. Thus, it is independent of the
choice of eigenvectors.

Remark 6.2.21. Let A ∈ Mn (C) be a normal matrix with eigenvalues α1 , . . . , αn .


1. Then, there exists a unitary matrix U = [u1 , . . . , un ] such that
6.2. DIAGONALIZATION 175

(a) In = u1 u∗1 + · · · + un u∗n .


(b) the columns of U form a set of orthonormal eigenvectors for A (use Theorem 6.2.3).
(c) A = A · In = A (u1 u∗1 + · · · + un u∗n ) = α1 u1 u∗1 + · · · + αn un u∗n .

2. Let α1 , . . . , αk be the distinct eigenvalues of A. Also, let Wi = Null(A − αi In ), for


1 ≤ i ≤ k, be the corresponding eigen-spaces.

(a) Then, we can group the ui ’s such that they form an orthonormal basis of Wi , for
1 ≤ i ≤ k. Hence, Cn = W1 ⊕ · · · ⊕ Wk .
(b) If Pαi is the orthogonal projector onto Wi , for 1 ≤ i ≤ k then A = α1 P1 + · · · + αk Pk .
Thus, A depends only on eigen-spaces and not on the computed eigenvectors.

We now give the spectral theorem for Hermitian matrices.

Theorem 6.2.22. [Spectral Theorem for Hermitian Matrices] Let A ∈ Mn (C) be a Hermitian
matrix. Then,

1. the eigenvalues αi , for 1 ≤ i ≤ n, of A are real.

2. there exists a unitary matrix U , say U = [u1 , . . . , un ] such that

(a) In = u1 u∗1 + · · · + un u∗n .


T

(b) {u1 , . . . , un } forms a set of orthonormal eigenvectors for A.


AF

(c) A = α1 u1 u∗1 +· · ·+αn un u∗n , or equivalently, U ∗ AU = D, where D = diag(α1 , . . . , αn ).


DR

Proof. The second part is immediate from Theorem 6.2.19 as Hermitian matrices are also normal
matrices. For Part 1, let (α, x) be an eigen-pair. Then, Ax = αx. As A is Hermitian A∗ = A.
Thus, x∗ A = x∗ A∗ = (Ax)∗ = (αx)∗ = αx∗ . Hence, using x∗ A = αx∗ , we get

αx∗ x = x∗ (αx) = x∗ (Ax) = (x∗ A)x = (αx∗ )x = αx∗ x.

As x is an eigenvector, x 6= 0. Hence, kxk2 = x∗ x 6= 0. Thus α = α, i.e., α ∈ R.


As an immediate corollary of Theorem 6.2.22 and the second part of Lemma 6.2.12, we give
the following result without proof.

Corollary 6.2.23. Let A ∈ Mn (R) be symmetric. Then, A = U diag(α1 , . . . , αn ) U ∗ , where

1. the αi ’s are all real,

2. the columns of U can be chosen to have real entries,

3. the eigenvectors that correspond to the columns of U form an orthonormal basis of Rn .

Exercise 6.2.24. 1. Let A be a skew-symmetric matrix. Then, the eigenvalues of A are


either zero or purely imaginary and A is unitarily diagonalizable.

2. Let A be a skew-Hermitian matrix. Then, A is unitarily diagonalizable.


176 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

3. Characterize all normal matrices in M2 (R).


" #" # " # " #" # " #
a b a c a2 + b2 ac + bd a c a b a2 + c2 ab + cd
Ans: = and = .
c d b d ac + bd c2 + d2 b d c d ab + cd b2 + d2
From b2 = c2 , we have either b = c, in which case AT = A or b = −c 6=" 0, in which
# case
2 1
a = d. If a = d = 0, we get AT = −A. We could have other matrices like .
−1 2

4. Let σ(A) = {λ1 , . . . , λn }. Then, prove that the following statements are equivalent.
(a) A is normal.
(b) A is unitarily diagonalizable.
|aij |2 = |λi |2 .
P P
(c)
i,j i
(d) A has n orthonormal eigenvectors.

Ans: In view of earlier results, we only prove c) ⇒ b). By Schur’ theorem, there exists
a unitary matrix U such that U ∗ AU = T is upper triangular. As U ∗ AU = T , we have
|aij |2 = |tij |2 = |tii |2 . So tij = 0, for all i < j.
P P P
i,j i,j i

5. Let A be a normal matrix with (λ, x) as an eigen-pair. Then,

(a) (A∗ )k x for k ∈ Z+ is also an eigenvector corresponding to λ.


T

(b) (λ, x) is an eigen-pair for A∗ . [Hint: Verify kA∗ x − λxk2 = kAx − λxk2 .]
AF

6. Let A be an n × n unitary matrix. Then,


DR

(a) |λ| = 1 for any eigenvalue λ of A.


(b) the eigenvectors x, y corresponding to distinct eigenvalues are orthogonal.

7. Let A be a 2 × 2 orthogonal matrix. Then, prove the following:


" #
cos θ − sin θ
(a) if det(A) = 1 then A = , for some θ, 0 ≤ θ < 2π. That is, A
sin θ cos θ
counterclockwise rotates every point in R2 by an angle θ.
" #
cos θ sin θ
(b) if det A = −1 then A = , for some θ, 0 ≤ θ < 2π. That is, A
sin θ − cos θ
reflects every point in R2 about a line passing through origin. Determine "this line.
#
1 0
Or equivalently, there exists a non-singular matrix P such that P −1 AP = .
0 −1

8. Let A be a 3 × 3 orthogonal matrix. Then, prove the following:

(a) if det(A) = 1 then A is a rotation about a fixed axis, in the sense that A has an
eigen-pair (1, x) such that the restriction of A to the plane x⊥ is a two dimensional
rotation in x⊥ .
(b) if det A = −1 then A corresponds to a reflection through a plane P , followed by a
rotation about the line through origin that is orthogonal to P .
6.2. DIAGONALIZATION 177

9. Let A be a normal matrix. Then, prove that Rank(A) equals the number of nonzero
eigenvalues of A.

10. [Equivalent characterizations of Hermitian matrices] Let A ∈ Mn (C). Then, the fol-
lowing statements are equivalent.

(a) The matrix A is Hermitian.


(b) The number x∗ Ax is real for each x ∈ Cn .
(c) The matrix A is normal and has real eigenvalues.
(d) The matrix S ∗ AS is Hermitian for each S ∈ Mn (C).

Ans: i)⇒ii),iii),iv) can be shown easily.

ii)⇒i). Taking x = ei + ıej , we have x∗ Ax = aii − ıaji + ıaij + ajj ∈ R. As aii , ajj ∈ R, we
see that aij − aji is a purely imaginary number, i.e., they have the same real part. Similarly,
taking x = ei + ej , we see that aij + aji ∈ R, that is, they have opposite imaginary parts. So
aij = aji .

iii)⇒i). Suppose that A∗ A = AA∗ and λ(A) ∈ R. By Spectral theorem A = U ∗ ΛU , for some
unitary matrix, where Λ is a real matrix. Taking conjugate transpose, we see that A∗ = A.

iv)⇒i). Follows by taking S = I.


T
AF

6.2.3 Cayley Hamilton Theorem


DR

Let A ∈ Mn (C). Then, in Theorem 6.1.16, we saw that

PA (x) = det(A − xI) = (−1)n xn − an−1 xn−1 + an−2 xn−2 + · · · + (−1)n−1 a1 x + (−1)n a0


(6.2.6)
for certain ai ∈ C, 0 ≤ i ≤ n − 1. Also, if α is an eigenvalue of A then PA (α) = 0. So,
xn − an−1 xn−1 + an−2 xn−2 + · · · + (−1)n−1 a1 x + (−1)n a0 = 0 is satisfied by n complex numbers.
It turns out that the expression

An − an−1 An−1 + an−2 An−2 + · · · + (−1)n−1 a1 A + (−1)n a0 I = 0

holds true as a matrix identity. This is a celebrated theorem called the Cayley Hamilton
Theorem. We give a proof using Schur’s unitary triangularization. To do so, we look at
multiplication of certain upper triangular matrices.

Lemma 6.2.25. Let A1 , . . . , An ∈ Mn (C) be upper triangular matrices such that the (i, i)-th
entry of Ai equals 0, for 1 ≤ i ≤ n. Then, A1 A2 · · · An = 0.

Proof. We use induction to prove that the first k columns of A1 A2 · · · Ak is 0, for 1 ≤ k ≤ n.


The result is clearly true for k = 1 as the first column of A1 is 0. For clarity, we show that the
first two columns of A1 A2 is 0. Let B = A1 A2 . Then, by matrix multiplication

B[:, i] = A1 [:, 1](A2 )1i + A1 [:, 2](A2 )2i + · · · + A1 [:, n](A2 )ni = 0 + · · · + 0 = 0
178 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

as A1 [:, 1] = 0 and (A2 )ji = 0, for i = 1, 2 and j ≥ 2. So, assume that the first n − 1 columns
of C = A1 · · · An−1 is 0 and let B = CAn . Then, for 1 ≤ i ≤ n, we see that

B[:, i] = C[:, 1](An )1i + C[:, 2](An )2i + · · · + C[:, n](An )ni = 0 + · · · + 0 = 0

as C[:, j] = 0, for 1 ≤ j ≤ n − 1 and (An )ni = 0, for i = n − 1, n. Thus, by induction hypothesis


the required result follows.

Exercise 6.2.26. Let A, B ∈ Mn (C) be upper triangular matrices with the top leading principal
submatrix of A of size k being 0. If B[k + 1, k + 1] = 0 then prove that the leading principal
submatrix of size k + 1 of AB is 0.

We now prove the Cayley Hamilton Theorem using Schur’s unitary triangularization.

Theorem 6.2.27 (Cayley Hamilton Theorem). Let A ∈ Mn (C). Then, A satisfies its charac-
teristic equation. That is, if PA (x) = det(A−xIn ) = a0 −xa1 +· · ·+(−1)n−1 an−1 xn−1 +(−1)n xn
then
An − an−1 An−1 + an−2 An−2 + · · · + (−1)n−1 a1 A + (−1)n a0 I = 0

holds true as a matrix identity.


n
Q
Proof. Let σ(A) = {α1 , . . . , αn } then PA (x) = (x − αi ). And, by Schur’s unitary triangular-
i=1
ization there exists a unitary matrix U such that U ∗ AU = T , an upper triangular matrix with
T
AF

tii = αi , for 1 ≤ i ≤ n. Now, observe that if Ai = T − αi I then the Ai ’s satisfy the conditions
of Lemma 6.2.25. Hence,
DR

(T − α1 I) · · · (T − αn I) = 0.

Therefore,
n
Y n
Y h i
PA (A) = (A − αi I) = (U T U ∗ − αi U IU ∗ ) = U (T − α1 I) · · · (T − αn I) U ∗ = U 0U ∗ = 0.
i=1 i=1

Thus, the required result follows.


We now give some examples and then implications of the Cayley Hamilton Theorem.
" #
1 2
Remark 6.2.28. 1. Let A = . Then, PA (x) = x2 + 2x − 5. Hence, verify that
1 −3
" # " # " #
2 3 −4 1 2 1 0
A + 2A − 5I2 = +2 −5 = 0.
−2 11 1 −3 0 1
" #
1 1 3 2
Further, verify that A−1 = (A + 2I2 ) = . Furthermore, A2 = −2A + 5I
5 5 1 −1
implies that

A3 = A(A2 ) = A(−2A+5I) = −2A2 +5I = −2(−2A+5I)+5I = 4A−10I +5I = 4A−5I.

We can keep using the above technique to get Am as a linear combination of A and I, for
all m ≥ 1.
6.2. DIAGONALIZATION 179
" #
3 1
2. Let A = . Then, PA (t) = t(t − 3) − 2 = t2 − 3t − 2. So, using PA (A) = 0, we have
2 0
A−3I
A−1 = 2 .Further, A2 = 3A + 2I implies that A3 = 3A2 + 2A = 3(3A + 2I) + 2A =
11A + 6I. So, as above, Am is a combination of A and I, for all m ≥ 1.
" #
0 1
3. Let A = . Then, PA (x) = x2 . So, even though A 6= 0, A2 = 0.
0 0
 
0 0 1
  3 3
4. For A = 
0 0 0, PA (x) = x . Thus, by the Cayley Hamilton Theorem A = 0. But,

0 0 0
it turns out that A2 = 0.
 
1 0 0
  3
5. For A = 0 1 1, note that PA (t) = (t − 1) . So PA (A) = 0. But, observe that if

0 0 1
q(t) = (t − 1)2 then q(A) is also 0.

6. Let A ∈ Mn (C) with PA (x) = a0 − xa1 + · · · + (−1)n−1 an−1 xn−1 + (−1)n xn .

(a) Then, for any ` ∈ N, the division algorithm gives α0 , α1 , . . . , αn−1 ∈ C and a poly-
nomial f (x) with coefficients from C such that
T

x` = f (x)PA (x) + α0 + xα1 + · · · + xn−1 αn−1 .


AF

Hence, by the Cayley Hamilton Theorem, A` = α0 I + α1 A + · · · + αn−1 An−1 .


DR

i. Thus, to compute any power of A, one needs to apply the division algorithm to
get αi ’s and know Ai , for 1 ≤ i ≤ n − 1. This is quite helpful in numerical
computation as computing powers takes much more time than division.
ii. Note that LS I, A, A2 , . . . is a subspace of Mn (C). Also, dim (Mn (C)) = n2 .


But, the above argument implies that dim LS I, A, A2 , . . . ≤ n.


 

iii. In the language of graph theory, it says the following: “Let G be a graph on n
vertices and A its adjacency matrix. Suppose there is no path of length n − 1 or
less from a vertex v to a vertex u in G. Then, G doesn’t have a path from v to u
of any length. That is, the graph G is disconnected and v and u are in different
components of G.”
(b) Suppose A is non-singular. Then, by definition a0 = det(A) 6= 0. Hence,
1 
A−1 = a1 I − a2 A + · · · + (−1)n−2 an−1 An−2 + (−1)n−1 An−1 .

a0
This matrix identity can be used to calculate the inverse.
(c) The above also implies that if A is invertible then A−1 ∈ LS I, A, A2 , . . . . That is,


A−1 is a linear combination of the vectors I, A, . . . , An−1 .


     
2 3 4 −1 −1 1 1 −2 −1
     
5 6 7,  1 −1 1 and
Exercise 6.2.29. Find the inverse of     −2 1 −1 by the
 
1 1 2 0 1 1 0 −1 2
Cayley Hamilton Theorem.
180 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

Exercise 6.2.30. Miscellaneous Exercises:


1. Let A, B ∈ M2 (C) such that A = AB − BA. Then, prove that A2 = 0.
" # " #!
0 B x
2. Let B be an m×n matrix and A = T
. Then, prove that λ, is an eigen-pair
B 0 y
" #!
x
if and only if −λ, is an eigen-pair.
−y
" #
B C
3. Let B, C ∈ Mn (R). Define A = . Then, prove the following:
−C B
" #
x
(a) if s is a real eigenvalue of A with corresponding eigenvector then s is also an
y
" #
−y
eigenvalue corresponding to the eigenvector .
x
" #
x + iy
(b) if s + it is a complex eigenvalue of A with corresponding eigenvector then
−y + ix
" #
x − iy
s − it is also an eigenvalue of A with corresponding eigenvector .
−y − ix
(c) (s + it, x + iy) is an eigen-pair of B+iC if and only if (s − it, x − iy) is an eigen-pair
of B − iC.
" #!
T

x + iy
(d) s + it, is an eigen-pair of A if and only if (s + it, x + iy) is an eigen-
AF

−y + ix
pair of B + iC.
DR

(e) det(A) = | det(B + iC)|2 .

The next section deals with quadratic forms which helps us in better understanding of conic
sections in analytic geometry.

6.3 Quadratic Forms


Definition 6.3.1. [Positive, Semi-positive and Negative definite matrices] Let A ∈ Mn (C).
Then, A is said to be
1. positive semi-definite (psd) if x∗ Ax ∈ R and x∗ Ax ≥ 0, for all x ∈ Cn .
2. positive definite (pd) if x∗ Ax ∈ R and x∗ Ax > 0, for all x ∈ Cn \ {0}.
3. negative semi-definite (nsd) if x∗ Ax ∈ R and x∗ Ax ≤ 0, for all x ∈ Cn .
4. negative definite (nd) if x∗ Ax ∈ R and x∗ Ax < 0, for all x ∈ Cn \ {0}.
5. indefinite if x∗ Ax ∈ R and there exist x, y ∈ Cn such that x∗ Ax < 0 < y∗ Ay.

Lemma 6.3.2. Let A ∈ Mn (C). Then A is Hermitian if and only if at least one of the following
statements hold:
1. S ∗ AS is Hermitian for all S ∈ Mn .
2. A is normal and has real eigenvalues.
6.3. QUADRATIC FORMS 181

3. x∗ Ax ∈ R for all x ∈ Cn .

Proof. Let S ∈ Mn , (S ∗ AS)∗ = S ∗ A∗ S = S ∗ AS. Thus S ∗ AS is Hermitian.


Suppose A = A∗ . Then, A is clearly normal as AA∗ = A2 = A∗ A. Further, if (λ, x) is an
eigenpair then λx∗ x = x∗ Ax ∈ R implies λ ∈ R.
For the last part, note that x∗ Ax ∈ C. Thus x∗ Ax = (x∗ Ax)∗ = x∗ A∗ x = x∗ Ax, we get
Im(x∗ Ax) = 0. Thus, x∗ Ax ∈ R.
If S ∗ AS is Hermitian for all S ∈ Mn then taking S = In gives A is Hermitian.
If A is normal then A = U ∗ diag(λ1 , . . . , λn )U for some unitary matrix U . Since λi ∈ R,
A∗ = (U ∗ diag(λ1 , . . . , λn )U )∗ = U ∗ diag(λ1 , . . . , λn )U = U ∗ diag(λ1 , . . . , λn )U = A. So, A is
Hermitian.
If x∗ Ax ∈ R for all x ∈ Cn then aii = e∗i Aei ∈ R. Also, aii +ajj +aij +aji = (ei +ej )∗ A(ei +
ej ) ∈ R. So, Im(aij ) = −Im(aji ). Similarly, aii + ajj + iaij − iaji = (ei + iej )∗ A(ei + iej ) ∈ R
implies that Re(aij ) = Re(aji ). Thus, A = A∗ .

Remark 6.3.3. Let A ∈ Mn (R). Then the condition x∗ Ax ∈ R in Definition 6.3.9 is always
true and hence doesn’t put any restriction on the matrix A. So, in Definition 6.3.9, we assume
that AT = A, i.e., A is a symmetric
" matrix.
# " #
2 1 3 1+i
Example 6.3.4. 1. Let A = or A = . Then, A is positive definite.
1 2 1−i 4
" # "√ #
T

1 1 2 1+i
2. Let A = or A = √ . Then, A is positive semi-definite but not positive
AF

1 1 1−i 2
definite.
DR

" # " #
−2 1 −2 1 − i
3. Let A = or A = . Then, A is negative definite.
1 −2 1 + i −2
" # " #
−1 1 −2 1 − i
4. Let A = or A = . Then, A is negative semi-definite.
1 −1 1 + i −1
" # " #
0 1 1 1+i
5. Let A = or A = . Then, A is indefinite.
1 −1 1−i 1
Theorem 6.3.5. Let A ∈ Mn (C). Then, the following statements are equivalent.
1. A is positive semi-definite.
2. A∗ = A and each eigenvalue of A is non-negative.
3. A = B ∗ B for some B ∈ Mn (C).

Proof. 1 ⇒ 2: Let A be positive semi-definite. Then, by Lemma 6.3.2 A is Hermitian. If


(α, v) is an eigen-pair of A then αkvk2 = v∗ Av ≥ 0. So, α ≥ 0.
2 ⇒ 3: Let σ(A) = {α1 , . . . , αn }. Then, by spectral theorem, there exists a unitary
matrix U such that U ∗ AU = D with D = diag(α1 , . . . , αn ). As αi ≥ 0, for 1 ≤ i ≤ n, define
1 √ √ 1 1
D 2 = diag( α1 , . . . , αn ). Then, A = U D 2 [D 2 U ∗ ] = B ∗ B.
3 ⇒ 1: Let A = B ∗ B. Then, for x ∈ Cn , x∗ Ax = x∗ B ∗ Bx = kBxk2 ≥ 0. Thus, the
required result follows.
A similar argument gives the next result and hence the proof is omitted.
182 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

Theorem 6.3.6. Let A ∈ Mn (C). Then, the following statements are equivalent.
1. A is positive definite.
2. A∗ = A and each eigenvalue of A is positive.
3. A = B ∗ B for a non-singular matrix B ∈ Mn (C).

Remark 6.3.7. Let A ∈ Mn (C) be a Hermitian matrix with eigenvalues λ1 ≥ λ2 ≥ · · · ≥


λn . Then, there exists a unitary matrix U = [u1 , u2 , . . . , un ] and a diagonal matrix D =
diag(λ1 , λ2 , . . . , λn ) such that A = U DU ∗ . Now, for 1 ≤ i ≤ n, define αi = max{λi , 0} and
βi = min{λi , 0}. Then
1. for D1 = diag(α1 , α2 , . . . , αn ), the matrix A1 = U D1 U ∗ is positive semi-definite.
2. for D2 = diag(β1 , β2 , . . . , βn ), the matrix A2 = U D2 U ∗ is positive semi-definite.
3. A = A1 − A2 . The matrix A1 is generally called the positive semi-definite part of A.

Definition 6.3.8. [Multilinear Function] Let V be a vector space over F. Then,


1. for a fixed m ∈ N, a function f : Vm → F is called an m-multilinear function if f is
linear in each component. That is,

f (v1 , . . . , vi−1 , (vi + αu), vi+1 . . . , vm ) = f (v1 , . . . , vi−1 , vi , vi+1 . . . , vm )


+αf (v1 , . . . , vi−1 , u, vi+1 . . . , vm )
T

for α ∈ F, u ∈ V and vi ∈ V, for 1 ≤ i ≤ m.


AF

2. An m-multilinear form is also called an m-form.


DR

3. A 2-form is called a bilinear form.

Definition 6.3.9. [Sesquilinear, Hermitian and Quadratic Forms] Let A = [aij ] ∈ Mn (C)
be a Hermitian matrix and let x, y ∈ Cn . Then, a sesquilinear form in x, y ∈ Cn is defined
as H(x, y) = y∗ Ax. In particular, H(x, x), denoted H(x), is called a Hermitian form. In
case A ∈ Mn (R), H(x) is called a quadratic form.

Remark 6.3.10. Observe that


1. if A = In then the bilinear/sesquilinear form reduces to the standard inner product.
2. H(x, y) is ‘linear’ in the first component and ‘conjugate linear’ in the second component.
3. the quadratic form H(x) is a real number. Hence, for α ∈ R, the equation H(x) = α,
represents a conic in Rn .
Example 6.3.11. 1. Let vi ∈ Cn , for 1 ≤ i ≤ n. Then, f (v1 , . . . , vn ) = det ([v1 , . . . , vn ])
is an n-form on Cn .
2. Let A ∈ Mn (R). Then, f (x, y) = yT Ax, for x, y ∈ Rn , is a bilinear form on Rn .
" # " #
1 2−i ∗ x
3. Let A = . Then, A = A and for x = , verify that
2+i 2 y

H(x) = x∗ Ax = |x|2 + 2|y|2 + 2Re ((2 − i)xy)

where ‘Re’ denotes the real part of a complex number, is a sesquilinear form.
6.3. QUADRATIC FORMS 183

6.3.1 Sylvester’s law of inertia

The main idea of this section is to express H(x) as sum or difference of squares. Since H(x) is
a quadratic in x, replacing x by cx, for c ∈ C, just gives a multiplication factor by |c|2 . Hence,
one needs to study only the normalized vectors. Let us consider Example 6.1.1 again. There
we see that
(x + y)2 (x − y)2
xT Ax = 3 − = (x + 2y)2 − 3y 2 , and (6.3.1)
2 2
(x + 2y)2 (2x − y)2 2y 50y 2
xT Bx = 5 + 10 = (3x − )2 + . (6.3.2)
5 5 3 9
Note that both the expressions in Equation (6.3.1) is the difference of two non-negative terms.
Whereas, both the expressions in Equation (6.3.2) consists of sum of two non-negative terms.
Is this just a coincidence?
In general, let A ∈ Mn (C) be a Hermitian matrix. Then, by Theorem 6.2.22, σ(A) =
{α1 , . . . , αn } ⊆ R and there exists a unitary matrix U such that U ∗ AU = D = diag(α1 , . . . , αn ).
Let x = U z. Then, kxk = 1 and U is unitary implies that kzk = 1. If z = (z1 , . . . , zn )∗ then
n p r
∗ ∗ ∗
X
2
X √ 2
X p 2
H(x) = z U AU z = z Dz = αi |zi | = | αi zi | − |αi | zi , (6.3.3)

i=1 i=1 i=p+1

where α1 , . . . , αp > 0, αp+1 , . . . , αr < 0 and αr+1 , . . . , αn = 0. Thus, we see that the possible
values of H(x) seem to depend only on the eigenvalues of A. Since U is an invertible matrix,
T
AF

the components zi ’s of z = U −1 x = U ∗ x are commonly known as the linearly independent


linear forms. Note that each zi is a linear expression in the components of x. Also, note
DR

that in Equation (6.3.3), p corresponds to the number of positive eigenvalues and r − p to the
number of negative eigenvalues. For a better understanding, we define the following numbers.

Definition 6.3.12. [Inertia and Signature of a Matrix] Let A ∈ Mn (C) be a Hermitian


matrix. The inertia of A, denoted i(A), is the triplet (i+ (A), i− (A), i0 (A)), where i+ (A) is the
number of positive eigenvalues of A, i− (A) is the number of negative eigenvalues of A and i0 (A)
is the nullity of A. The difference i+ (A) − i− (A) is called the signature of A.

Exercise 6.3.13. Let A ∈ Mn (C) be a Hermitian matrix. If the signature and the rank of A
is known then prove that one can find out the inertia of A.

As a next result, we show that in any expression of H(x) as a sum or difference of n absolute
squares of linearly independent linear forms, the number p (respectively, r − p) gives the number
of positive (respectively, negative) eigenvalues of A. This is popularly known as the ‘Sylvester’s
law of inertia’.

Lemma 6.3.14. [Sylvester’s Law of Inertia] Let A ∈ Mn (C) be a Hermitian matrix and let
x ∈ Cn . Then, every Hermitian form H(x) = x∗ Ax, in n variables can be written as

H(x) = |y1 |2 + · · · + |yp |2 − |yp+1 |2 − · · · − |yr |2

where y1 , . . . , yr are linearly independent linear forms in the components of x and the integers
p and r satisfying 0 ≤ p ≤ r ≤ n, depend only on A.
184 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

Proof. Equation (6.3.3) implies that H(x) has the required form. We only need to show that
p and r are uniquely determined by A. Hence, let us assume on the contrary that there exist
p, q, r, s ∈ N with p > q such that

H(x) = |y1 |2 + · · · + |yp |2 − |yp+1 |2 − · · · − |yr |2 (6.3.4)


= |z1 |2 + · · · + |zq |2 − |zq+1 |2 − · · · − |zs |2 , (6.3.5)
   
" # " # y1 z1
Y1 Z1 . .
where y = = M x, z = = N x with Y1 =  . .
Y2 Z2  .  and Z1 =  .  for some invertible
yp zq
matrices M and N . Now" the invertibility
# of M and N implies z = By,
" # some
for " invertible
# " matrix
#
B1 B2 Z1 B1 B2 Y1
B. Decompose B = , where B1 is a q × p matrix. Then = . As
B3 B4 Z2 B3 B4 Y2
 
y˜1
.
p > q, the homogeneous linear system B1 Y1 = 0 has a nontrivial solution, say Y f1 =  .  and
.
y˜p
" #
Y
f1
consider ye = . Then for this choice of y e , Z1 = 0 and thus, using Equations (6.3.4) and
0
(6.3.5), we have

H(ỹ) = |y˜1 |2 + |y˜2 |2 + · · · + |y˜p |2 − 0 = 0 − (|zq+1 |2 + · · · + |zs |2 ).


T
AF

Now, this can hold only if Y


f1 = 0, a contradiction to Y
f1 being a non-trivial solution. Hence
p = q. Similarly, the case r > s can be resolved. This completes the proof of the lemma.
DR

Remark 6.3.15. Since A is Hermitian, Rank(A) equals the number of nonzero eigenvalues.
Hence, Rank(A) = r. The number r is called the rank and the number r − 2p is called the
inertial degree of the Hermitian form H(x).

We now look at another form of the Sylvester’s law of inertia. We start with the following
definition.

Definition 6.3.16. [Star Congruence] Let A, B ∈ Mn (C). Then, A is said to be ∗-congruent


(read star-congruent) to B if there exists an invertible matrix S such that A = S ∗ BS.

Theorem 6.3.17. [Second Version: Sylvester’s Law of Inertia] Let A, B ∈ Mn (C) be


Hermitian. Then, A is ∗-congruent to B if and only if i(A) = i(B).

Proof. By spectral theorem U ∗ AU = ΛA and V ∗ BV = ΛB , for some unitary matrices U, V


and diagonal matrices ΛA , ΛB of the form diag(+, · · · , +, −, · · · , −, 0, · · · , 0). Thus, there exist
invertible matrices S, T such that S ∗ AS = DA and T ∗ BT = DB , where DA , DB are diagonal
matrices of the form diag(1, · · · , 1, −1, · · · , −1, 0, · · · , 0).
If i(A) = i(B), then it follows that DA = DB , i.e., S ∗ AS = T ∗ BT and hence A =
(T S −1 )∗ B(T S −1 ).
Conversely, suppose that A = P ∗ BP , for some invertible matrix P , and i(B) = (k, l, m).
As T ∗ BT = DB , we have, A = P ∗ (T ∗ )−1 DB T −1 P = (T −1 P )∗ DB (T −1 P ). Now, let X =
(T −1 P )−1 . Then, A = (X −1 )∗ DB X −1 and we have the following observations.
6.3. QUADRATIC FORMS 185

1. As rank and nullity do not change under similarity transformation, i0 (A) = i0 (DB ) = m
as i(B) = (k, l, m).
2. Using i(B) = (k, l, m), we also have

X[:, k + 1]∗ AX[:, k + 1] = X[:, k + 1]∗ (X −1 )∗ DB (X −1 ) X[:, k + 1] = e∗k+1 DB ek+1 = −1.




Similarly, X[:, k + 2]∗ AX[:, k + 2] = · · · = X[:, k + l]∗ AX[:, k + l] = −1. As the vectors
X[:, k + 1], . . . , X[:, k + l] are linearly independent, using 7.7.10, we see that A has at least
l negative eigenvalues.
3. Similarly, X[:, 1]∗ AX[:, 1] = · · · = X[:, k]∗ AX[:, k] = 1. As X[:, 1], . . . , X[:, k] are linearly
independent, using 7.7.10 again, we see that A has at least k positive eigenvalues.

Thus, it now follows that i(A) = (k, l, m).

6.3.2 Applications in Eculidean Plane and Space

We now obtain conditions on the eigenvalues of A, corresponding to the associated quadratic


form, to characterize conic sections in R2 , with respect to the standard inner product.

Definition 6.3.18. [Associated Quadratic Form] Let f (x, y) = ax2 +2hxy+by 2 +2f x+2gy+c
be a general quadratic in x and y, with coefficients from R. Then,
" #" #
h i a h x
H(x) = xT Ax = x, y = ax2 + 2hxy + by 2
T

h b y
AF

is called the associated quadratic form of the conic f (x, y) = 0.


DR

Proposition 6.3.19. Consider the quadratic f (x, y) = ax2 + 2hxy + by 2 + 2gx + 2f y + c, for
a, b, c, g, f, h ∈ R. If (a, b, h) 6= (0, 0, 0) then f (x, y) = 0 represents

1. a parabola or a pair of parallel lines if ab − h2 = 0,

2. a hyperbola or a pair of perpendicular lines if ab − h2 < 0,

3. an ellipse or a circle or a point (point of intersection of a pair of perpendicular lines) if


ab − h2 > 0.
" #
a h
Proof. Consider the associated quadratic ax2 + 2hxy + by 2 with A = as the associated
h b
symmetric matrix. Then, by Corollary 6.2.23, A = U diag(α1 , α2 )U T , where U = [u1 , u2 ] is an
orthogonal matrix, with (α1 , u1 ) and (α2 , u2 ) as eigen-pairs of A. As (a, b, h) 6= (0, 0, 0) at least
one of α1 , α2 6= 0. Also,
" # " # " #" #
T
h i α1 0 T x h i α
1 0 u
x Ax = x, y U U = u v = α1 u2 + α2 v 2 ,
0 α2 y 0 α2 v
" #
u
where = U T x. The lines u = 0, v = 0 are the two linearly independent linear forms, which
v
correspond to two perpendicular lines passing through the origin in the (x, y)-plane. In terms
of u, v, f (x, y) reduces to f (u, v) = α1 u2 + α2 v 2 + d1 u + d2 v + c, for some choice of d1 , d2 ∈ R.
We now look at different cases:
186 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

1. if α1 = 0 and α2 6= 0 then ab − h2 = det(A) = α1 α2 = 0. In this case,

d2 2
 
f (u, v) = 0 ⇔ α2 v + = c1 − d1 u,
2α2

for some c1 ∈ R.
d2
(a) If d1 = 0, the quadratic corresponds to either the same line v + = 0, two parallel
2α2
lines or two imaginary lines, depending on whether c1 = 0, c1 α2 > 0 and c1 α2 < 0,
respectively.
(b) If d1 6= 0, the quadratic corresponds to a parabola of the form V 2 = 4aU , for some
translate U = u + α and V = v + β.

2. If α1 α2 < 0 then ab − h2 = det(A) = λ1 λ2 < 0. If α2 = −β2 < 0, for β2 >


0 then the quadratic reduces to α1 (u + d1 )2 − β2 (v + d2 )2 = d3 , or equivalently, to
√ √  √ √
α1 (u + d1 ) + β2 (v + d2 ) · α1 (u + d1 ) − β2 (v + d2 ) = d3 , for some d1 , d2 , d3 ∈ R.


Thus, the quadratic corresponds to


(a) a pair of perpendicular lines u + d1 = 0 and v + d2 = 0 whenever d3 = 0.
(b) a hyperbola with orthogonal principal axes u + d1 = 0 and v + d2 = 0 whenever
d3 6= 0. In particular, if d3 > 0 then the corresponding equation equals

α1 (u + d1 )2 α2 (v + d2 )2
− = 1.
T

d3 d3
AF

3. If α1 α2 > 0 then ab − h2 = det(A) = α1 α2 > 0. Here, the quadratic reduces to α1 (u +


DR

d1 )2 + α2 (v + d2 )2 = d3 , for some d1 , d2 , d3 ∈ R. Thus, the quadratic corresponds to


(a) a point which is the point of intersection of the pair of orthogonal lines u + d1 = 0
and v + d2 = 0 if d3 = 0.
(b) an empty set if α1 d3 < 0.
(c) an ellipse or circle with u + d1 = 0 and v + d2 = 0 as the orthogonal principal axes
if α1 d3 > 0 with the corresponding equation

α1 (u + d1 )2 α2 (v + d2 )2
+ = 1.
d3 d3

Thus, we have considered all the possible cases and the required result follows.
" # " #
u T x
Remark 6.3.20. Observe that the linearly independent forms =U are functions of
v y
the eigenvectors u1 and u2 . Further, the linearly independent forms together with the shifting
of the origin give us the principal axes of the corresponding conic.

Example 6.3.21. 1. Let"H(x)# = x2 + y 2 + 2xy be the associated


" √ quadratic
#! form
" for√a class
#!
1 1 1/ 2 1/ 2
of curves. Then, A = and the eigen-pairs are 2, √ and 0, √ .
1 1 1/ 2 −1/ 2
In particular, for
6.3. QUADRATIC FORMS 187

(a) f (x, y) = x2 + 2xy + y 2 − 8x − 8y + 16, we have f (x, y) = 0 ⇔ (x + y − 4)2 = 0, just


one line.
(b) f (x, y) = x2 + 2xy + y 2 − 8x − 8y, we have f (x, y) = 0 ⇔ (x + y − 8) · (x + y) = 0, a
pair of parallel lines.
(c) f (x, y) = x2 + 2xy + y 2 − 6x − 10y − 3, we have

x+y 2 x−y 2 √ √ x−y


       
x+y
f (x, y) = 0 ⇔ 2 √ +0 √ =8 2 √ −2 2 √ +3
2 2 2 2
x+y−4 2 √ x − y − 19/2
   
⇔ √ =− 2 √ ,
2 2
a parabola with principal axes x + y = 4, 2x − 2y = 19 and directrix x − y = 10.

T
AF

Figure 6.2: Conic x2 + 2xy + y 2 − 6x − 10y = 3


DR

2 2
2. Let H(x)
" # − 5y + 20xy be the associated
= 10x " quadratic
√ #!
form for "
a class of#!

curves. Then
10 10 2/ 5 1/ 5
A= and the eigen-pairs are 15, √ and −10, √ . So, for
10 −5 1/ 5 −2/ 5
(a) f (x, y) = 10x2 − 5y 2 + 20xy + 16x − 2y + 1, we have f (x, y) = 0 ⇔ 3(2x + y + 1)2 −
2(x − 2y − 1)2 = 0, a pair of perpendicular lines.
(b) f (x, y) = 10x2 − 5y 2 + 20xy + 16x − 2y + 19, we have
2 2
x − 2y − 1
 
2x + y + 1
f (x, y) = 0 ⇔ − √ = 1,
3 6
a hyperbola.
(c) f (x, y) = 10x2 − 5y 2 + 20xy + 16x − 2y − 17, we have
2 2
x − 2y − 1
 
2x + y + 1
f (x, y) = 0 ⇔ √ − = 1,
6 3

a hyperbola.

3. Let H(x) = 2 2
" # 6x + 9y + 4xy be the associated
" √quadratic
#! form"for a class
√ #!
of curves. Then,
6 2 1/ 5 2/ 5
A= , and the eigen-pairs are 10, √ and 5, √ . So, for
2 9 2/ 5 −1/ 5
188 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

Figure 6.3: Conic 10x2 − 5y 2 + 20xy + 16x − 2y = c, c = −1, c = −19 and c = 17

(a) f (x, y) = 6x2 + 9y 2 + 4xy + 10y − 53, we have

x + 2y + 1 2 2x − y − 1 2
   
f (x, y) = 0 ⇔ + √ = 1,
5 5 2
an ellipse.
T
AF
DR

Figure 6.4: Conic 6x2 + 9y 2 + 4xy + 10y = 53


.

Exercise 6.3.22. Sketch the graph of the following surfaces:

1. x2 + 2xy + y 2 + 6x + 10y = 3.
Ans: a parabola.

2. 2x2 + 6xy + 3y 2 − 12x − 6y = 5.


Ans: a hyperbola.

3. 4x2 − 4xy + 2y 2 + 12x − 8y = 10.


Ans: an ellipse.

4. 2x2 − 6xy + 5y 2 − 10x + 4y = 7.


Ans: an ellipse.
6.3. QUADRATIC FORMS 189

As a last application,
 we consider
 a quadratic
 in 3 variables,
 namely x1 , x2 and x3 . To do
a h g x l y
   1    1
so, let A = 
h b f , x = x2 , b = m and y = y2  with
      
g f c x3 n y3

f (x1 , x2 , x3 ) = xT Ax + 2bT x + q
= ax21 + bx22 + cx23 + 2hx1 x2 + 2gx1 x3 + 2f x2 x3
+2lx1 + 2mx2 + 2nx3 + q (6.3.6)

Then, we observe the following:

1. As A is symmetric, P T AP = diag(α1 , α2 , α3 ), where P = [u1 , u2 , u3 ] is an orthogonal


matrix and (αi , ui ), for i = 1, 2, 3 are eigen-pairs of A.

2. Let y = P T x. Then, f (x1 , x2 , x3 ) reduces to

g(y1 , y2 , y3 ) = α1 y12 + α2 y22 + α3 y32 + 2l1 y1 + 2l2 y2 + 2l3 y3 + q. (6.3.7)

3. Depending on the values of αi ’s, rewrite g(y1 , y2 , y3 ) to determine the center and the
planes of symmetry of f (x1 , x2 , x3 ) = 0.

Example 6.3.23. Determine the following quadrics f (x, y, z) = 0, where


T
AF

1. f (x, y, z) = 2x2 + 2y 2 + 2z 2 + 2xy + 2xz + 2yz + 4x + 2y + 4z + 2.


DR

2. f (x, y, z) = 3x2 − y 2 + z 2 + 10.

3. f (x, y, z) = 3x2 − y 2 + z 2 − 10.

4. f (x, y, z) = 3x2 − y 2 + z − 10.


     
2 1 1 2 √1 √1 √1
 13 −1
2 6
√1 
   
Solution: (1) Here A =  1 2 1, b = 1 and q = 2. So, verify P =  3
   √ √
2 6
and
1 1 2 2 √1 −2
3
0 √
6
P T AP = diag(4, 1, 1). Hence, f (x, y, z) = 0 reduces to

x+y+z 2 x−y 2 x + y − 2z 2
     
4 √ + √ + √ = −(4x + 2y + 4z + 2).
3 2 6

4(x + y + z) + 5 2 x−y+1 2 x + y − 2z − 1 2
     
√ √ √ 9
Or equivalently to 4 + + = 12 . So, the
4 3 2 6
principal axes of the quadric (an ellipsoid) are 4(x + y + z) = −5, x − y = 1 and x + y − 2z = 1.
y2 3x2 z2
Part 2 Here f (x, y, z) = 0 reduces to 10 − 10 − 10 = 1 which is the equation of a
hyperboloid consisting of two sheets with center 0 and the axes x, y and z as the principal axes.
3x2 y2 z2
Part 3 Here f (x, y, z) = 0 reduces to 10 − 10 + 10 = 1 which is the equation of a
hyperboloid consisting of one sheet with center 0 and the axes x, y and z as the principal axes.
Part 4 Here f (x, y, z) = 0 reduces to z = y 2 −3x2 +10 which is the equation of a hyperbolic
paraboloid.
190 CHAPTER 6. EIGENVALUES, EIGENVECTORS AND DIAGONALIZABILITY

T
AF
DR

Figure 6.5: Ellipsoid, hyperboloid of two sheets and one sheet, hyperbolic paraboloid
.
Chapter 7

Appendix

7.1 Uniqueness of RREF


Definition 7.1.1. Fix n ∈ N. Then, for each f ∈ Sn , we associate an n × n matrix, denoted
P f = [pij ], such that pij = 1, whenver f (j) = i and 0, otherwise. The matrix P f is called the
Permutation " matrix
# corresponding to the permutation f . For example, I2 , corresponding
0 1
to Id2 , and = E12 , corresponding to the permutation (1, 2), are the two permutation
1 0
matrices of order 2 × 2.
T

Remark 7.1.2. Recall that in Remark 7.2.16.1, it was observed that each permutation is a
AF

product of n transpositions, (1, 2), . . . , (1, n).


DR

1. Verify that the elementary matrix Eij is the permutation matrix corresponding to the
transposition (i, j) .
2. Thus, every permutation matrix is a product of elementary matrices E1j , 1 ≤ j ≤ n.
   
1 0 0 0 1 0
   
3. For n = 3, the permutation matrices are I3 , 0 0 1 = E23 = E12 E13 E12 , 1
   0 0=
0 1 0 0 0 1
     
0 1 0 0 0 1 0 0 1
     
E12 , 
0 0 1 = E12 E13 , 1 0 0 = E13 E12 and 0 1 0 = E13 .
    
1 0 0 0 1 0 1 0 0
4. Let f ∈ Sn and P f = [pij ] be the corresponding permutation matrix. Since pij = δi,j and
{f (1), . . . , f (n)} = [n], each entry of P f is either 0 or 1. Furthermore, every row and
column of P f has exactly one nonzero entry. This nonzero entry is a 1 and appears at
the position pi,f (i) .
5. By the previous paragraph, we see that when a permutation matrix is multiplied to A
(a) from left then it permutes the rows of A.
(b) from right then it permutes the columns of A.

6. P is a permutation matrix if and only if P has exactly one 1 in each row and column.
Solution: If P has exactly one 1 in each row and column, then P is a square matrix, say

191
192 CHAPTER 7. APPENDIX

n × n. Now, apply GJE to P . The occurrence of exactly one 1 in each row and column
implies that these 1’s are the pivots in each column. We just need to interchange rows to
get it in RREF. So, we need to multiply by Eij . Thus, GJE of P is In and P is indeed a
product of Eij ’s. The other part has already been explained earlier.

We are now ready to prove Theorem 2.2.17.

Theorem 7.1.3. Let A and B be two matrices in RREF. If they are row equivalent then A = B.

Proof. Note that the matrix A = 0 if and only if B = 0. So, let us assume that the matrices
A, B 6= 0. Also, the row-equivalence of A and B implies that there exists an invertible matrix
C such that A = CB, where C is product of elementary matrices.
Since B is in RREF, either B[:, 1] = 0T or B[:, 1] = (1, 0, . . . , 0)T . If B[:, 1] = 0T then
A[:, 1] = CB[:, 1] = C0 = 0. If B[:, 1] = (1, 0, . . . , 0)T then A[:, 1] = CB[:, 1] = C[:, 1]. As C is
invertible, the first column of C cannot be the zero vector. So, A[:, 1] cannot be the zero vector.
Further, A is in RREF implies that A[:, 1] = (1, 0, . . . , 0)T . So, we have shown that if A and B
are row-equivalent then their first columns must be the same.
Now, let us assume that the first k − 1 columns of A and B are equal and it contains r
pivotal columns. We will now show that the k-th column is also the same.
Define Ak = [A[:, 1], . . . , A[:, k]] and Bk = [B[:, 1], . . . , B[:, k]]. Then, our assumption implies
that A[:, i] = B[:, i], for 1 ≤ i ≤ k − 1. Since, the first k − 1 columns contain r pivotal columns,
there exists a permutation matrix P such that
T
AF

" # " #
Ir W A[:, k] Ir W B[:, k]
Ak P = and Bk P = .
DR

0 0 0 0

" # If the k-th columns of A and B are pivotal columns then by definition of RREF, A[:, k] =
0
= B[:, k], where 0 is a vector of size r and e1 = (1, 0, . . . , 0)T . So, we need to consider two
e1
cases depending on whether both are non-pivotal or one is pivotal and the other is not.
As A = CB, we get Ak = CBk and
" # " #" # " #
Ir W A[:, k] C1 C2 Ir W B[:, k] C1 C1 W CB[:, k]
= Ak P = CBk P = = .
0 0 C3 C4 0 0 C3 C3 W
" #
I r C2
So, we see that C1 = Ir , C3 = 0 and A[:, k] = B[:, k].
0 C4
Case 1: Neither A[:, k] nor B[:, k] are pivotal. Then
" # " # " #" # " #
X I r C2 I r C2 Y Y
= A[:, k] = B[:, k] = = .
0 0 C4 0 C4 0 0
Thus, X = Y and in this case the k-th columns are equal.
Case 2: A[:, k] is pivotal but B[:, k] in non-pivotal. Then
" # " # " #" # " #
0 I r C2 Ir C2 Y Y
= A[:, k] = B[:, k] = = ,
e1 0 C4 0 C4 0 0
a contradiction as e1 6= 0. Thus, this case cannot arise.
Therefore, combining both the cases, we get the required result.
7.2. PERMUTATION/SYMMETRIC GROUPS 193

7.2 Permutation/Symmetric Groups


Definition 7.2.1. For a positive integer n, denote [n] = {1, 2, . . . , n}. A function f : A → B is
called
1. one-one/injective if f (x) = f (y) for some x, y ∈ A necessarily implies that x = y.
2. onto/surjective if for each b ∈ B there exists a ∈ A such that f (a) = b.
3. a bijection if f is both one-one and onto.

Example 7.2.2. Let A = {1, 2, 3}, B = {a, b, c, d} and C = {α, β, γ}. Then, the function
1. j : A → B defined by j(1) = a, j(2) = c and j(3) = c is neither one-one nor onto.
2. f : A → B defined by f (1) = a, f (2) = c and f (3) = d is one-one but not onto.
3. g : B → C defined by g(a) = α, g(b) = β, g(c) = α and g(d) = γ is onto but not one-one.
4. h : B → A defined by h(a) = 2, h(b) = 2, h(c) = 3 and h(d) = 1 is onto.
5. h ◦ f : A → A is a bijection.
6. g ◦ f : A → C is neither one-one not onto.

Remark 7.2.3. Let f : A → B and g : B → C be functions. Then, the composition of


functions, denoted g ◦ f , is a function from A to C defined by (g ◦ f )(a) = g(f (a)). Also, if
1. f and g are one-one then g ◦ f is one-one.
T
AF

2. f and g are onto then g ◦ f is onto.


DR

Thus, if f and g are bijections then so is g ◦ f .

Definition 7.2.4. A function f : [n] → [n] is called a permutation on n elements if f is a


bijection. For example, f, g : [2] → [2] defined by f (1) = 1, f (2) = 2 and g(1) = 2, g(2) = 1 are
permutations.

Exercise 7.2.5. Let S3 be the set consisting of all permutation on 3 elements. Then, prove
that S3 has 6 elements. Moreover, they are one of the 6 functions given below.
1. f1 (1) = 1, f1 (2) = 2 and f1 (3) = 3.
2. f2 (1) = 1, f2 (2) = 3 and f2 (3) = 2.
3. f3 (1) = 2, f3 (2) = 1 and f3 (3) = 3.
4. f4 (1) = 2, f4 (2) = 3 and f4 (3) = 1.
5. f5 (1) = 3, f5 (2) = 1 and f5 (3) = 2.
6. f6 (1) = 3, f6 (2) = 2 and f6 (3) = 1.

Remark 7.2.6. Let f : [n] → [n] be a bijection. Then, the inverse of f , denote f −1 , is defined
by f −1 (m) = ` whenever f (`) = m for m ∈ [n] is well defined and f −1 is a bijection. For
example, in Exercise 7.2.5, note that fi−1 = fi , for i = 1, 2, 3, 6 and f4−1 = f5 .

Remark 7.2.7. Let Sn = {f : [n] → [n] : σ is a permutation}. Then, Sn has n! elements and
forms a group with respect to composition of functions, called product, due to the following.
194 CHAPTER 7. APPENDIX

1. Let f ∈ Sn . Then,
!
1 2 ··· n
(a) f can be written as f = , called a two row notation.
f (1) f (2) · · · f (n)
(b) f is one-one. Hence, {f (1), f (2), . . . , f (n)} = [n] and thus, f (1) ∈ [n], f (2) ∈ [n] \
{f (1)}, . . . and finally f (n) = [n]\{f (1), . . . , f (n−1)}. Therefore, there are n choices
for f (1), n − 1 choices for f (2) and so on. Hence, the number of elements in Sn
equals n(n − 1) · · · 2 · 1 = n!.

2. By Remark 7.2.3, f ◦ g ∈ Sn , for any f, g ∈ Sn .

3. Also associativity holds as f ◦ (g ◦ h) = (f ◦ g) ◦ h for all functions f, g and h.

4. Sn has a special permutation called the identity permutation, denoted Idn , such that
Idn (i) = i, for 1 ≤ i ≤ n.

5. For each f ∈ Sn , f −1 ∈ Sn and is called the inverse of f as f ◦ f −1 = f −1 ◦ f = Idn .

Lemma 7.2.8. Fix a positive integer n. Then, the group Sn satisfies the following:

1. Fix an element f ∈ Sn . Then, Sn = {f ◦ g : g ∈ Sn } = {g ◦ f : g ∈ Sn }.

2. Sn = {g −1 : g ∈ Sn }.

Proof. Part 1: Note that for each α ∈ Sn the functions f −1 ◦α, α◦f −1 ∈ Sn and α = f ◦(f −1 ◦α)
T
AF

as well as α = (α ◦ f −1 ) ◦ f .
Part 2: Note that for each f ∈ Sn , by definition, (f −1 )−1 = f . Hence the result holds.
DR

Definition 7.2.9. Let f ∈ Sn . Then, the number of inversions of f , denoted n(f ), equals

n(f ) = | {(i, j) : i < j, f (i) > f (j) } |


= | {j : i + 1 ≤ j ≤ n, f (j) < f (i)} | using two row notation. (7.2.1)
!
1 2 3 4
Example 7.2.10. 1. For f = , n(f ) = | {(1, 2), (1, 3), (2, 3)} | = 3.
3 2 1 4

2. In Exercise 7.2.5, n(f5 ) = 2 + 0 = 2.


!
1 2 3 4 5 6 7 8 9
3. Let f = . Then, n(f ) = 3 + 1 + 1 + 1 + 0 + 3 + 2 + 1 = 12.
4 2 3 5 1 9 8 7 6
Definition 7.2.11. [Cycle Notation] Let f ∈ Sn . Suppose there exist r, 2 ≤ r ≤ n and
i1 , . . . , ir ∈ [n] such that f (i1 ) = i2 , f (i2 ) = i3 , . . . , f (ir ) = i1 and f (j) = j for all j 6= i1 , . . . , ir .
Then, we represent such a permutation
! by f = (i1 , i2 , . . . , ir! ) and call it an r-cycle. For
1 2 3 4 5 1 2 3 4 5
example, f = = (1, 4, 5) and = (2, 3).
4 2 3 5 1 1 3 2 4 5
Remark 7.2.12. 1. One also write the r-cycle (i1 , i2 , . . . , ir ) as (i2 , i3 , . . . , ir , i1 ) and so on.
For example, (1, 4, 5) = (4, 5, 1) = (5, 1, 4).
!
1 2 3 4 5
2. The permutation f = is not a cycle.
4 3 2 5 1
7.2. PERMUTATION/SYMMETRIC GROUPS 195

3. Let f = (1, 3, 5, 4) and g = (2, 4, 1) be two cycles. Then, their product, denoted f ◦ g or
(1, 3, 5, 4)(2, 4, 1) equals (1, 2)(3, 5, 4). The calculation proceeds as (the arrows indicate the
images):
1 → 2. Note (f ◦ g)(1) = f (g(1)) = f (2) = 2.
2 → 4 → 1 as (f ◦ g)(2) = f (g(2)) = f (4) = 1. So, (1, 2) forms a cycle.
3 → 5 as (f ◦ g)(3) = f (g(3)) = f (3) = 5.
5 → 4 as (f ◦ g)(5) = f (g(5)) = f (5) = 4.
4 → 1 → 3 as (f ◦ g)(4) = f (g(4)) = f (1) = 3. So, the other cycle is (3, 5, 4).
4. Let f = (1, 4, 5) and g = (2, 4, 1) be two permutations. Then, (1, 4, 5)(2, 4, 1) = (1, 2, 5)(4) =
(1, 2, 5) as 1 → 2, 2 → 4 → 5, 5 → 1, 4 → 1 → 4 and
(2, 4, 1)(1, 4, 5) = (1)(2, 4, 5) = (2, 4, 5) as 1 → 4 → 1, 2 → 4, 4 → 5, 5 → 1 → 2.
!
1 2 3 4 5
5. Even though is not a cycle, verify that it is a product of the cycles
4 3 2 5 1
(1, 4, 5) and (2, 3).

Definition 7.2.13. A permutation f ∈ Sn is called a transposition if there exist m, r ∈ [n]


such that f = (m, r).

Remark 7.2.14. Verify that


1. (2, 4, 5) = (2, 5)(2, 4) = (4, 2)(4, 5) = (5, 4)(5, 2) = (1, 2)(1, 5)(1, 4)(1, 2).
T

2. in general, the r-cycle (i1 , . . . , ir ) = (1, i1 )(1, ir )(1, ir−1 ) · · · (1, i2 )(1, i1 ).
AF

3. So, every r-cycle can be written as product of transpositions. Furthermore, they can be
DR

written using the n transpositions (1, 2), (1, 3), . . . , (1, n).

With the above definitions, we state and prove two important results.

Theorem 7.2.15. Let f ∈ Sn . Then, f can be written as product of transpositions.

Proof. Note that using use Remark 7.2.14, we just need to show that f can be written as
product of disjoint cycles.
Consider the set S = {1, f (1), f (2) (1) = (f ◦ f )(1), f (3) (1) = (f ◦ (f ◦ f ))(1), . . .}. As S is an
infinite set and each f (i) (1) ∈ [n], there exist i, j with 0 ≤ i < j ≤ n such that f (i) (1) = f (j) (1).
Now, let j1 be the least positive integer such that f (i) (1) = f (j1 ) (1), for some i, 0 ≤ i < j1 .
Then, we claim that i = 0.
For if, i − 1 ≥ 0 then j1 − 1 ≥ 1 and the condition that f is one-one gives
   
f (i−1) (1) = (f −1 ◦ f (i) )(1) = f −1 f (i) (1) = f −1 f (j1 ) (1) = (f −1 ◦ f (j1 ) )(1) = f (j1 −1) (1).

Thus, we see that the repetition has occurred at the (j1 − 1)-th instant, contradicting the
assumption that j1 was the least such positive integer. Hence, we conclude that i = 0. Thus,
(1, f (1), f (2) (1), . . . , f (j1 −1) (1)) is one of the cycles in f .
Now, choose i1 ∈ [n] \ {1, f (1), f (2) (1), . . . , f (j1 −1) (1)} and proceed as above to get another
cycle. Let the new cycle by (i1 , f (i1 ), . . . , f (j2 −1) (i1 )). Then, using f is one-one follows that

1, f (1), f (2) (1), . . . , f (j1 −1) (1) ∩ i1 , f (i1 ), . . . , f (j2 −1) (i1 ) = ∅.
 
196 CHAPTER 7. APPENDIX

So, the above process needs to be repeated at most n times to get all the disjoint cycles. Thus,
the required result follows.

Remark 7.2.16. Note that when one writes a permutation as product of disjoint cycles, cycles
of length 1 are suppressed so as to match Definition 7.2.11. For example, the algorithm in the
proof of Theorem 7.2.15 implies
1. Using Remark 7.2.14.3, we see that every permutation can be written as product of the n
transpositions (1, 2), (1, 3), . . . , (1, n).
!
1 2 3 4 5
2. = (1)(2, 4, 5)(3) = (2, 4, 5).
1 4 3 5 2
!
1 2 3 4 5 6 7 8 9
3. = (1, 4, 5)(2)(3)(6, 9)(7, 8) = (1, 4, 5)(6, 9)(7, 8).
4 2 3 5 1 9 8 7 6
Note that Id3 = (1, 2)(1, 2) = (1, 2)(2, 3)(1, 2)(1, 3), as well. The question arises, is it
possible to write Idn as a product of odd number of transpositions? The next lemma answers
this question in negative.

Lemma 7.2.17. Suppose there exist transpositions fi , 1 ≤ i ≤ t, such that

Idn = f1 ◦ f2 ◦ · · · ◦ ft ,

then t is even.
T

Proof. We will prove the result by mathematical induction. Observe that t 6= 1 as Idn is not a
AF

transposition. Hence, t ≥ 2. If t = 2, we are done. So, let us assume that the result holds for
DR

all expressions in which the number of transpositions t ≤ k. Now, let t = k + 1.


Suppose f1 = (m, r) and let `, s ∈ [n] \ {m, r}. Then, the possible choices for the com-
position f1 ◦ f2 are (m, r)(m, r) = Idn , (m, r)(m, `) = (r, `)(r, m), (m, r)(r, `) = (`, r)(`, m)
and (m, r)(`, s) = (`, s)(m, r). In the first case, f1 and f2 can be removed to obtain Idn =
f3 ◦ f4 ◦ · · · ◦ ft , where the number of transpositions is t − 2 = k − 1 < k. So, by mathematical
induction, t − 2 is even and hence t is also even.
In the remaining cases, the expression for f1 ◦ f2 is replaced by their counterparts to obtain
another expression for Idn . But in the new expression for Idn , m doesn’t appear in the first
transposition, but appears in the second transposition. The shifting of m to the right can
continue till the number of transpositions reduces by 2 (which in turn gives the result by
mathematical induction). For if, the shifting of m to the right doesn’t reduce the number
of transpositions then m will get shifted to the right and will appear only in the right most
transposition. Then, this expression for Idn does not fix m whereas Idn (m) = m. So, the later
case leads us to a contradiction. Hence, the shifting of m to the right will surely lead to an
expression in which the number of transpositions at some stage is t − 2 = k − 1. At this stage,
one applies mathematical induction to get the required result.

Theorem 7.2.18. Let f ∈ Sn . If there exist transpositions g1 , . . . , gk and h1 , . . . , h` with

f = g1 ◦ g2 ◦ · · · ◦ gk = h1 ◦ h2 ◦ · · · ◦ h`

then, either k and ` are both even or both odd.


7.2. PERMUTATION/SYMMETRIC GROUPS 197

Proof. As g1 ◦ · · · ◦ gk = h1 ◦ · · · ◦ h` and h−1 = h for any transposition h ∈ Sn , we have

Idn = g1 ◦ g2 ◦ · · · ◦ gk ◦ h` ◦ h`−1 ◦ · · · ◦ h1 .

Hence by Lemma 7.2.17, k + ` is even. Thus, either k and ` are both even or both odd.

Definition 7.2.19. [Even and Odd Permutation] A permutation f ∈ Sn is called an


1. even permutation if f can be written as product of even number of transpositions.
2. odd permutation if f can be written as a product of odd number of transpositions.

Definition 7.2.20. Observe that if f and g are both even or both odd permutations, then f ◦ g
and g ◦ f are both even. Whereas, if one of them is odd and the other even then f ◦ g and g ◦ f
are both odd. We use this to define a function sgn : Sn → {1, −1}, called the signature of a
permutation, by (
1 if f is an even permutation
sgn(f ) = .
−1 if f is an odd permutation
Example 7.2.21. Consider the set Sn . Then,

1. by Lemma 7.2.17, Idn is an even permutation and sgn(Idn ) = 1.

2. a transposition, say f , is an odd permutation and hence sgn(f ) = −1

3. using Remark 7.2.20, sgn(f ◦ g) = sgn(f ) · sgn(g) for any two permutations f, g ∈ Sn .
T

We are now ready to define determinant of a square matrix A.


AF

Definition 7.2.22. Let A = [aij ] be an n × n matrix with complex entries. Then, the deter-
DR

minant of A, denoted det(A), is defined as


X X n
Y
det(A) = sgn(g)a1g(1) a2g(2) . . . ang(n) = sgn(g) aig(i) . (7.2.2)
g∈Sn σ∈Sn i=1
" #
1 2
For example, if S2 = {Id, f = (1, 2)} then for A = , det(A) = sgn(Id) · a1Id(1) a2Id(2) +
2 1
sgn(f ) · a1f (1) a2f (2) = 1 · a11 a22 + (−1)a12 a21 = 1 − 4 = −3.

Observe that det(A) is a scalar quantity. Even though the expression for det(A) seems
complicated at first glance, it is very helpful in proving the results related with “properties of
determinant”. We will do so in the next section. As another examples, we verify that this
definition also matches for 3 × 3 matrices. So, let A = [aij ] be a 3 × 3 matrix. Then, using
Equation (7.2.2),
X 3
Y
det(A) = sgn(σ) aiσ(i)
σ∈Sn i=1
3
Y 3
Y 3
Y
= sgn(f1 ) aif1 (i) + sgn(f2 ) aif2 (i) + sgn(f3 ) aif3 (i) +
i=1 i=1 i=1
3
Y 3
Y 3
Y
sgn(f4 ) aif4 (i) + sgn(f5 ) aif5 (i) + sgn(f6 ) aif6 (i)
i=1 i=1 i=1
= a11 a22 a33 − a11 a23 a32 − a12 a21 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31 .
198 CHAPTER 7. APPENDIX

7.3 Properties of Determinant


Theorem 7.3.1 (Properties of Determinant). Let A = [aij ] be an n × n matrix.

1. If A[i, :] = 0T for some i then det(A) = 0.

2. If B = Ei (c)A, for some c 6= 0 and i ∈ [n] then det(B) = c det(A).

3. If B = Eij A, for some i 6= j then det(B) = − det(A).

4. If A[i, :] = A[j, :] for some i 6= j then det(A) = 0.

5. Let B and C be two n×n matrices. If there exists m ∈ [n] such that B[i, :] = C[i, :] = A[i, :]
for all i 6= m and C[m, :] = A[m, :] + B[m, :] then det(C) = det(A) + det(B).

6. If B = Eij (c), for c 6= 0 then det(B) = det(A).

7. If A is a triangular matrix then det(A) = a11 · · · ann , the product of the diagonal entries.

8. If E is an n × n elementary matrix then det(EA) = det(E) det(A).

9. A is invertible if and only if det(A) 6= 0.

10. If B is an n × n matrix then det(AB) = det(A) det(B).


T
AF

11. If AT denotes the transpose of the matrix A then det(A) = det(AT ).


DR

Proof. Part 1: Each term in the expansion of det(A) contains exactly one entry from each row.
So, each term has a factor from A[i, :] = 0^T and is therefore zero. Thus, det(A) = 0.
Part 2: By assumption, B[k, :] = A[k, :] for k ≠ i and B[i, :] = cA[i, :]. So,

det(B) = ∑_{σ∈Sn} sgn(σ) (∏_{k≠i} b_{kσ(k)}) b_{iσ(i)} = ∑_{σ∈Sn} sgn(σ) (∏_{k≠i} a_{kσ(k)}) c a_{iσ(i)}
       = c ∑_{σ∈Sn} sgn(σ) ∏_{k=1}^{n} a_{kσ(k)} = c det(A).

Part 3: Let τ = (i, j). Then, sgn(τ) = −1 and, by Lemma 7.2.8, Sn = {σ ◦ τ : σ ∈ Sn}. As
B = Eij A, we have B[i, :] = A[j, :], B[j, :] = A[i, :] and B[k, :] = A[k, :] for k ≠ i, j. So,

det(B) = ∑_{σ∈Sn} sgn(σ) ∏_{m=1}^{n} b_{mσ(m)} = ∑_{σ∈Sn} sgn(σ ◦ τ) ∏_{m=1}^{n} b_{m,(σ◦τ)(m)}
       = sgn(τ) · ∑_{σ∈Sn} sgn(σ) (∏_{k≠i,j} b_{kσ(k)}) b_{iσ(j)} b_{jσ(i)}
       = − ∑_{σ∈Sn} sgn(σ) (∏_{k≠i,j} a_{kσ(k)}) a_{jσ(j)} a_{iσ(i)} = − ∑_{σ∈Sn} sgn(σ) ∏_{k=1}^{n} a_{kσ(k)}
       = − det(A),

where we have used b_{iσ(j)} = a_{jσ(j)}, b_{jσ(i)} = a_{iσ(i)} and b_{kσ(k)} = a_{kσ(k)} for k ≠ i, j.

Part 4: As A[i, :] = A[j, :], A = Eij A. Hence, by Part 3, det(A) = − det(A). Thus, det(A) = 0.

Part 5: By assumption, C[i, :] = B[i, :] = A[i, :] for i ≠ m and C[m, :] = A[m, :] + B[m, :]. So,

det(C) = ∑_{σ∈Sn} sgn(σ) ∏_{i=1}^{n} c_{iσ(i)} = ∑_{σ∈Sn} sgn(σ) (∏_{i≠m} c_{iσ(i)}) c_{mσ(m)}
       = ∑_{σ∈Sn} sgn(σ) (∏_{i≠m} c_{iσ(i)}) (a_{mσ(m)} + b_{mσ(m)})
       = ∑_{σ∈Sn} sgn(σ) ∏_{i=1}^{n} a_{iσ(i)} + ∑_{σ∈Sn} sgn(σ) ∏_{i=1}^{n} b_{iσ(i)} = det(A) + det(B).

Part 6: By assumption, B[k, :] = A[k, :] for k ≠ i and B[i, :] = A[i, :] + cA[j, :]. So,

det(B) = ∑_{σ∈Sn} sgn(σ) ∏_{k=1}^{n} b_{kσ(k)} = ∑_{σ∈Sn} sgn(σ) (∏_{k≠i} b_{kσ(k)}) b_{iσ(i)}
       = ∑_{σ∈Sn} sgn(σ) (∏_{k≠i} a_{kσ(k)}) (a_{iσ(i)} + c a_{jσ(i)})
       = ∑_{σ∈Sn} sgn(σ) ∏_{k=1}^{n} a_{kσ(k)} + c ∑_{σ∈Sn} sgn(σ) (∏_{k≠i} a_{kσ(k)}) a_{jσ(i)}
       = det(A) + c · 0 = det(A),

where the second sum is the determinant of the matrix obtained from A by replacing its i-th
row with its j-th row; this matrix has two equal rows, so the sum vanishes by Part 4.

Part 7: Suppose A is upper triangular, so that a_{mℓ} = 0 whenever m > ℓ. If σ ∈ Sn and
σ ≠ Idn, then there exists m ∈ [n] (depending on σ) such that σ(m) < m, and hence
a_{mσ(m)} = 0. So, for each σ ≠ Idn, ∏_{i=1}^{n} a_{iσ(i)} = 0, and only σ = Idn contributes to the sum.
Hence, det(A) = ∏_{i=1}^{n} a_{ii}. The argument for a lower triangular A is similar.

Part 8: Using Part 7, det(In) = 1. By definition, Eij = Eij In, Ei(c) = Ei(c) In and
Eij(c) = Eij(c) In, for c ≠ 0. Thus, using Parts 2, 3 and 6, we get det(Ei(c)) = c, det(Eij) = −1
and det(Eij(c)) = 1. Also, again using Parts 2, 3 and 6, we get det(EA) = det(E) det(A) for
every elementary matrix E.
Part 9: Suppose A is invertible. Then, by Theorem 2.5.1, A = E1 · · · Ek, for some elementary
matrices E1, . . . , Ek. So, a repeated application of Part 8 implies det(A) = det(E1) · · · det(Ek) ≠ 0
as det(Ei) ≠ 0 for 1 ≤ i ≤ k.
Now, suppose that det(A) ≠ 0. We need to show that A is invertible. On the contrary, assume
that A is not invertible. Then, by Theorem 2.5.1, Rank(A) < n. So, by Proposition 2.2.21,
there exist elementary matrices E1, . . . , Ek such that E1 · · · Ek A = [B; 0], a matrix whose last
row is zero. Therefore, by Part 1 and a repeated application of Part 8,

det(E1) · · · det(Ek) det(A) = det(E1 · · · Ek A) = det([B; 0]) = 0.

As det(Ei) ≠ 0, for 1 ≤ i ≤ k, we have det(A) = 0, a contradiction. Thus, A is invertible.


Part 10: Let A be invertible. Then, by Theorem 2.5.1, A = E1 · · · Ek, for some elementary
matrices E1, . . . , Ek. So, applying Part 8 repeatedly gives det(A) = det(E1) · · · det(Ek) and

det(AB) = det(E1 · · · Ek B) = det(E1) · · · det(Ek) det(B) = det(A) det(B).

In case A is not invertible, by Part 9, det(A) = 0. Also, AB is not invertible (if AB were
invertible, then A would be invertible, by the rank argument). So, again by Part 9, det(AB) = 0.
Thus, det(AB) = det(A) det(B).
Part 11: Let B = [bij] = A^T. Then, bij = aji, for 1 ≤ i, j ≤ n. By Lemma 7.2.8, we know that
Sn = {σ^{-1} : σ ∈ Sn}. As σ ◦ σ^{-1} = Idn, sgn(σ) = sgn(σ^{-1}). Hence,

det(B) = ∑_{σ∈Sn} sgn(σ) ∏_{i=1}^{n} b_{iσ(i)} = ∑_{σ∈Sn} sgn(σ) ∏_{i=1}^{n} a_{σ(i),i}
       = ∑_{σ^{-1}∈Sn} sgn(σ^{-1}) ∏_{i=1}^{n} a_{iσ^{-1}(i)} = det(A).
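
Having proved these properties, a quick numerical sanity check is instructive. The following NumPy sketch (an aside, not part of the proofs) verifies Parts 3, 6, 10 and 11 on random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Part 3: interchanging two rows flips the sign of the determinant.
As = A.copy(); As[[0, 2]] = As[[2, 0]]
assert np.isclose(np.linalg.det(As), -np.linalg.det(A))

# Part 6: adding a multiple of one row to another leaves det unchanged.
Aa = A.copy(); Aa[1] += 5.0 * Aa[3]
assert np.isclose(np.linalg.det(Aa), np.linalg.det(A))

# Part 10: det(AB) = det(A) det(B);  Part 11: det(A^T) = det(A).
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))
```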

Remark 7.3.2. 1. As det(A) = det(A^T), we observe that in Theorem 7.3.1, the condition
on “row” can be replaced by the condition on “column”.

2. Let A = [aij ] be a matrix satisfying a1j = 0, for 2 ≤ j ≤ n. Let B = A(1 | 1), the submatrix
of A obtained by removing the first row and the first column. Then det(A) = a11 det(B).
Proof: As a1j = 0 for 2 ≤ j ≤ n, a term sgn(σ) ∏ a_{iσ(i)} can be nonzero only when
σ(1) = 1. Now, let σ ∈ Sn with σ(1) = 1. Then, σ fixes 1, and hence a disjoint cycle
representation of σ only involves the numbers {2, 3, . . . , n}. That is, we can think of σ as
an element of Sn−1. Hence,

det(A) = ∑_{σ∈Sn} sgn(σ) ∏_{i=1}^{n} a_{iσ(i)} = ∑_{σ∈Sn, σ(1)=1} sgn(σ) ∏_{i=1}^{n} a_{iσ(i)}
       = a11 ∑_{σ∈Sn, σ(1)=1} sgn(σ) ∏_{i=2}^{n} a_{iσ(i)} = a11 ∑_{σ∈Sn−1} sgn(σ) ∏_{i=1}^{n−1} b_{iσ(i)} = a11 det(B).

We now relate this definition of the determinant with the one given in Definition 2.5.6.

Theorem 7.3.3. Let A be an n × n matrix. Then, det(A) = ∑_{j=1}^{n} (−1)^{1+j} a_{1j} det(A(1 | j)), where
recall that A(1 | j) is the submatrix of A obtained by removing the 1st row and the j-th column.
 
Proof. For 1 ≤ j ≤ n, define the n × n matrix

Bj = [ 0    0    · · ·  a_{1j}  · · ·  0
       a21  a22  · · ·  a2j    · · ·  a2n
       ..   ..          ..            ..
       an1  an2  · · ·  anj    · · ·  ann ],

that is, Bj[1, :] has a_{1j} in the j-th position and zeros elsewhere, while Bj[i, :] = A[i, :] for
i ≥ 2. Also, for each matrix Bj, we define the n × n matrix Cj by
1. Cj[:, 1] = Bj[:, j],
2. Cj[:, i] = Bj[:, i − 1], for 2 ≤ i ≤ j and
3. Cj[:, k] = Bj[:, k] for k ≥ j + 1.

Also, observe that the Bj's have been defined to satisfy B1[1, :] + · · · + Bn[1, :] = A[1, :] and
Bj[i, :] = A[i, :] for all i ≥ 2 and 1 ≤ j ≤ n. Thus, by Theorem 7.3.1.5,

det(A) = ∑_{j=1}^{n} det(Bj).    (7.3.1)

Let us now compute det(Bj), for 1 ≤ j ≤ n. Note that Cj is obtained from Bj by j − 1
successive column interchanges that move the j-th column to the first position. So, by
Theorem 7.3.1.3 and Remark 7.3.2.1, det(Bj) = (−1)^{j−1} det(Cj). Since Cj[1, :] = (a_{1j}, 0, . . . , 0)
and Cj(1 | 1) = A(1 | j), Remark 7.3.2.2 gives det(Cj) = a_{1j} det(A(1 | j)). So, using
Equation (7.3.1), we have

det(A) = ∑_{j=1}^{n} (−1)^{j−1} det(Cj) = ∑_{j=1}^{n} (−1)^{j+1} a_{1j} det(A(1 | j)).

Thus, the determinant defined in Definition 2.5.6 agrees with the determinant of Definition 7.2.22.
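
The expansion of Theorem 7.3.3 translates directly into a recursive procedure. The sketch below (illustrative only; the name det_cofactor is ours, and the recursion takes exponential time, so it is practical only for small matrices) implements it and compares with numpy.linalg.det.

```python
import numpy as np

def det_cofactor(A):
    # Expansion along the first row (Theorem 7.3.3):
    # det(A) = sum_j (-1)^{1+j} a_{1j} det(A(1|j)).
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # A(1|j)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)     # (-1)^j = (-1)^{1+j} for 1-based j
    return total

A = np.random.rand(4, 4)
print(det_cofactor(A), np.linalg.det(A))  # agree up to rounding
```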

7.4 Dimension of W1 + W2
Theorem 7.4.1. Let V be a finite dimensional vector space over F and let W1 and W2 be two
subspaces of V. Then,

dim(W1 ) + dim(W2 ) = dim(W1 + W2 ) + dim(W1 ∩ W2 ). (7.4.1)

Proof. Since W1 ∩ W2 is a vector subspace of V , let B = {u1 , . . . , ur } be a basis of W1 ∩ W2 .


As W1 ∩ W2 is a subspace of both W1 and W2, let us extend the basis B to form a basis
B1 = {u1, . . . , ur, v1, . . . , vs} of W1 and a basis B2 = {u1, . . . , ur, w1, . . . , wt} of W2.
We now prove that D = {u1, . . . , ur, v1, . . . , vs, w1, . . . , wt} is a basis of W1 + W2. To
do this, we show that

1. D is a linearly independent subset of V and

2. LS(D) = W1 + W2.

The second part can be easily verified. For the first part, consider the linear system

α1 u1 + · · · + αr ur + β1 v1 + · · · + βs vs + γ1 w1 + · · · + γt wt = 0    (7.4.2)

in the variables αi's, βj's and γk's. We re-write the system as

α1 u1 + · · · + αr ur + β1 v1 + · · · + βs vs = −(γ1 w1 + · · · + γt wt)

and call the common value v. Then, v = ∑_{i=1}^{r} αi ui + ∑_{j=1}^{s} βj vj ∈ LS(B1) = W1. Also,
v = −∑_{k=1}^{t} γk wk ∈ LS(B2) = W2. Hence, v ∈ W1 ∩ W2 and therefore, there exist scalars
δ1, . . . , δr such that v = ∑_{j=1}^{r} δj uj. Comparing the two expressions for v as elements of
LS(B2), we get

δ1 u1 + · · · + δr ur + γ1 w1 + · · · + γt wt = 0.

So, using Exercise 3.3.16.1, δi = 0, for 1 ≤ i ≤ r, and γk = 0, for 1 ≤ k ≤ t. Thus, the
system (7.4.2) reduces to

α1 u1 + · · · + αr ur + β1 v1 + · · · + βs vs = 0,

which has αi = 0 for 1 ≤ i ≤ r and βj = 0 for 1 ≤ j ≤ s as the only solution, since B1 is
linearly independent. Hence, the linear system (7.4.2) has only the trivial solution. Therefore,
the set D is linearly independent and is indeed a basis of W1 + W2. We now count the vectors
in the sets B, B1, B2 and D: dim(W1 + W2) = r + s + t = (r + s) + (r + t) − r, which is the
required result.
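
As an illustrative aside (not part of the notes), the count in Theorem 7.4.1 can be checked numerically: if the columns of S1 and S2 are bases of W1 and W2, then dim(W1 + W2) is the rank of the concatenated matrix, and dim(W1 ∩ W2) is the nullity of [S1 | −S2], since a solution (a, b) of S1 a = S2 b picks out a vector of the intersection.

```python
import numpy as np

rng = np.random.default_rng(1)
# Columns of S1 span W1, columns of S2 span W2 (subspaces of R^6);
# we make them share one direction so the intersection is nontrivial.
S1 = rng.standard_normal((6, 3))
S2 = np.hstack([S1[:, :1], rng.standard_normal((6, 2))])

rank = np.linalg.matrix_rank
dim_W1, dim_W2 = rank(S1), rank(S2)          # both 3: full column rank
dim_sum = rank(np.hstack([S1, S2]))          # dim(W1 + W2)
dim_cap = S1.shape[1] + S2.shape[1] - rank(np.hstack([S1, -S2]))  # dim(W1 ∩ W2)
assert dim_W1 + dim_W2 == dim_sum + dim_cap  # Equation (7.4.1)
print(dim_W1, dim_W2, dim_sum, dim_cap)      # e.g. 3 3 5 1
```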

7.5 When does Norm imply Inner Product

In this section, we prove the following result. A generalization of this result to complex vector
spaces is left as an exercise for the reader as it requires similar ideas.

Theorem 7.5.1. Let V be a real vector space. A norm ‖·‖ is induced by an inner product if
and only if, for all x, y ∈ V, the norm satisfies

‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖²  (parallelogram law).    (7.5.1)

Proof. Suppose that ‖·‖ is indeed induced by an inner product. Then, by Exercise 5.1.7.3, the
result follows.
So, let us assume that ‖·‖ satisfies the parallelogram law. We need to define an inner
product. We claim that the function f : V × V → R defined by

f(x, y) = (1/4)(‖x + y‖² − ‖x − y‖²), for all x, y ∈ V,

satisfies the required conditions for an inner product. So, let us proceed to do so.

Step 1: Clearly, for each x ∈ V, f(x, 0) = 0 and f(x, x) = (1/4)‖x + x‖² = ‖x‖². Thus,
f(x, x) ≥ 0. Further, f(x, x) = 0 if and only if x = 0.

Step 2: By definition f (x, y) = f (y, x) for all x, y ∈ V.


Step 3: Now note that, by the parallelogram law, ‖x + y‖² − ‖x − y‖² = 2(‖x + y‖² − ‖x‖² − ‖y‖²).
Or equivalently,

2f(x, y) = ‖x + y‖² − ‖x‖² − ‖y‖², for x, y ∈ V.    (7.5.2)

Thus, for x, y, z ∈ V, we have

4(f(x, y) + f(z, y)) = ‖x + y‖² − ‖x − y‖² + ‖z + y‖² − ‖z − y‖²
 = 2(‖x + y‖² + ‖z + y‖² − ‖x‖² − ‖z‖² − 2‖y‖²)    [by (7.5.2)]
 = ‖x + z + 2y‖² + ‖x − z‖² − (‖x + z‖² + ‖x − z‖²) − 4‖y‖²    [parallelogram law]
 = ‖x + z + 2y‖² − ‖x + z‖² − ‖2y‖²
 = 2f(x + z, 2y), using Equation (7.5.2).    (7.5.3)

Now, substituting z = 0 in Equation (7.5.3) and using f(x, 0) = 0 together with Equation (7.5.2),
we get 2f(x, y) = f(x, 2y) and hence 4f(x + z, y) = 2f(x + z, 2y) = 4(f(x, y) + f(z, y)). Thus,

f(x + z, y) = f(x, y) + f(z, y), for all x, y, z ∈ V.    (7.5.4)

Step 4: Using Equation (7.5.4), f(x, y) = f(y, x) and the principle of mathematical induction,
it follows that nf(x, y) = f(nx, y), for all x, y ∈ V and n ∈ N. Another application of
Equation (7.5.4), together with f(0, y) = 0, implies that nf(x, y) = f(nx, y), for all x, y ∈ V and
n ∈ Z. Also, for m ≠ 0,

m f((n/m)x, y) = f(m · (n/m)x, y) = f(nx, y) = n f(x, y).

Hence, we see that for all x, y ∈ V and a ∈ Q, f(ax, y) = a f(x, y).

Step 5: Fix u, v ∈ V and define a function g : R → R by

g(x) = f(xu, v) − x f(u, v) = (1/2)(‖xu + v‖² − ‖xu‖² − ‖v‖²) − (x/2)(‖u + v‖² − ‖u‖² − ‖v‖²).

Then, by the previous step, g(x) = 0, for all x ∈ Q. So, if g is a continuous function, then
continuity implies g(x) = 0 for all x ∈ R, and hence f(xu, v) = x f(u, v), for all x ∈ R.
Note that the second term of g(x) is a constant multiple of x and hence continuous. For a
similar reason, it is enough to show that g1(x) = ‖xu + v‖, for fixed vectors u, v ∈ V, is
continuous. To do so, note that

‖x1 u + v‖ = ‖(x1 − x2)u + x2 u + v‖ ≤ ‖(x1 − x2)u‖ + ‖x2 u + v‖.

Thus, |‖x1 u + v‖ − ‖x2 u + v‖| ≤ ‖(x1 − x2)u‖ = |x1 − x2| ‖u‖. Hence, taking the limit as
x1 → x2, we get lim_{x1→x2} ‖x1 u + v‖ = ‖x2 u + v‖.
Thus, we have proved the continuity of g, and the proof of the required result is complete.
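
As a brief computational aside (not part of the proof), the following NumPy sketch verifies the parallelogram law for the Euclidean norm, recovers the inner product via the polarization formula f(x, y) = (‖x + y‖² − ‖x − y‖²)/4, and shows that the 1-norm fails the law, so it cannot be induced by any inner product.

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.standard_normal(5), rng.standard_normal(5)

norm2 = np.linalg.norm  # Euclidean norm: induced by the dot product
lhs = norm2(x + y)**2 + norm2(x - y)**2
rhs = 2 * norm2(x)**2 + 2 * norm2(y)**2
assert np.isclose(lhs, rhs)                     # parallelogram law (7.5.1)

# Polarization recovers the inner product:
f = (norm2(x + y)**2 - norm2(x - y)**2) / 4
assert np.isclose(f, x @ y)

# The 1-norm violates the law (take x = e1, y = e2), so Theorem 7.5.1
# says it is not induced by any inner product.
e1, e2 = np.eye(2)
n1 = lambda v: np.abs(v).sum()
print(n1(e1 + e2)**2 + n1(e1 - e2)**2, 2 * n1(e1)**2 + 2 * n1(e2)**2)  # 8.0 vs 4.0
```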

7.6 Roots of Polynomials

The main aim of this section is to prove the continuous dependence of the zeros of a polynomial
on its coefficients and to recall Descartes' rule of signs.

Definition 7.6.1. [Jordan Curves] A curve in C is a continuous function f : [a, b] → C,
where [a, b] ⊆ R.

1. If the function f is one-one on [a, b) and also on (a, b], then it is called a simple curve.
2. If f(b) = f(a), then it is called a closed curve.
3. A closed simple curve is called a Jordan curve.
4. The derivative (integral) of a curve f = u + iv is defined componentwise. If f′ is continuous
on [a, b], we say f is a C¹-curve (at end points we consider one-sided derivatives and
continuity).
5. A C¹-curve on [a, b] is called a smooth curve if f′ is never zero on (a, b).
6. A piecewise smooth curve is called a contour.
7. A simple closed curve is said to be positively oriented if, while traveling on it, the
interior of the curve always stays to the left. (Camille Jordan proved that such a curve
always divides the plane into two connected regions, one of which is bounded and the
other unbounded. The bounded region is considered as the interior of the curve.)

We state the famous Rouché's theorem of complex analysis without proof.

Theorem 7.6.2. [Rouché's Theorem] Let C be a positively oriented simple closed contour.
Also, let f and g be two analytic functions on R_C, the union of the interior of C and the curve
C itself. Assume also that |f(x)| > |g(x)|, for all x ∈ C. Then, f and f + g have the same
number of zeros in the interior of C.

Corollary 7.6.3. [Alen Alexanderian, The University of Texas at Austin, USA.] Let P(t) =
tⁿ + a_{n−1}t^{n−1} + · · · + a0 have distinct roots λ1, . . . , λm with multiplicities α1, . . . , αm, respectively.
Take any ε > 0 for which the balls B_ε(λi) are disjoint. Then, there exists a δ > 0 such that the
polynomial q(t) = tⁿ + a′_{n−1}t^{n−1} + · · · + a′_0 has exactly αi roots (counted with multiplicity) in
B_ε(λi), whenever |aj − a′_j| < δ for all j.

Proof. For such an ε > 0 and 1 ≤ i ≤ m, let Ci = {z ∈ C : |z − λi| = ε}. Now, for each i, 1 ≤ i ≤ m,
take νi = min_{z∈Ci} |P(z)|, ρi = max_{z∈Ci} [1 + |z| + · · · + |z|^{n−1}] and choose δ > 0 such that
ρi δ < νi for each i. Then, for a fixed j and z ∈ Cj, we have

|q(z) − P(z)| = |(a′_{n−1} − a_{n−1})z^{n−1} + · · · + (a′_0 − a_0)| ≤ δ ρj < νj ≤ |P(z)|.

Hence, by Rouché's theorem, P(z) and q(z) have the same number of zeros inside Cj, for each
j = 1, . . . , m. That is, the zeros of q(t) are within the ε-neighborhoods of the zeros of P(t).
As a direct application, we obtain the following corollary.

Corollary 7.6.4. Eigenvalues of a matrix are continuous functions of its entries.

Proof. The eigenvalues of a matrix are the roots of its characteristic polynomial, whose
coefficients are polynomial (hence continuous) functions of the matrix entries. Thus, the
result follows from Corollary 7.6.3.
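
As a small numerical illustration (an aside; pairing the two spectra by sorting works here because the eigenvalues of a generic random matrix are well separated), a tiny perturbation of the entries moves the eigenvalues only slightly:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
E = rng.standard_normal((5, 5))   # perturbation direction
eps = 1e-8

lam  = np.sort_complex(np.linalg.eigvals(A))
lam2 = np.sort_complex(np.linalg.eigvals(A + eps * E))
print(np.max(np.abs(lam - lam2)))  # of order eps: eigenvalues barely move
```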


Remark 7.6.5. 1. [Sign changes in a polynomial] Let P(x) = ∑_{i=0}^{n} ai x^{n−i} be a real polynomial,
with a0 ≠ 0. Read the coefficients from the left: a0, a1, . . .. We say the sign changes
of the ai occur at m1 < m2 < · · · < mk to mean that a_{m1} is the first coefficient after a0 with sign
opposite to a0; a_{m2} is the first after a_{m1} with sign opposite to a_{m1}; and so on.

2. [Descartes' Rule of Signs] Let P(x) = ∑_{i=0}^{n} ai x^{n−i} be a real polynomial. Then, the
maximum number of positive roots of P(x) = 0 is the number of changes in sign of the
coefficients, and the maximum number of negative roots is the number of sign changes
in P(−x) = 0.
Proof. Assume that a0, a1, · · · , an has k > 0 sign changes and let b > 0. Then, the coefficients
of (x − b)P(x) are

a0, a1 − ba0, a2 − ba1, · · · , an − ba_{n−1}, −ban.

This list has at least k + 1 changes of sign. To see this, assume that a0 > 0 and an ≠ 0,
and let the sign changes of the ai occur at m1 < m2 < · · · < mk. Then, setting

c0 = a0, c1 = a_{m1} − b a_{m1−1}, c2 = a_{m2} − b a_{m2−1}, · · · , ck = a_{mk} − b a_{mk−1}, c_{k+1} = −b an,

we see that ci > 0 when i is even and ci < 0 when i is odd. That proves the claim.
Now, assume that P(x) = 0 has k positive roots b1, b2, · · · , bk. Then,

P(x) = (x − b1)(x − b2) · · · (x − bk)Q(x),

where Q(x) is a real polynomial. By the previous observation, the coefficient list of
(x − bk)Q(x) has at least one change of sign, that of (x − b_{k−1})(x − bk)Q(x) has at least
two, and so on. Thus, the coefficients of P(x) have at least k changes of sign. The rest of the
proof is similar.
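
The rule is easy to test computationally. The sketch below (illustrative; the helper sign_changes is ours) counts sign changes in the coefficient list and compares against the actual roots found by numpy.roots.

```python
import numpy as np

def sign_changes(coeffs):
    # Count sign changes in the coefficient list, skipping zero coefficients.
    signs = [np.sign(c) for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# P(x) = x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3): three positive roots.
p = [1, -6, 11, -6]
print(sign_changes(p))    # 3: the Descartes bound, attained here
print(np.roots(p))        # [3. 2. 1.]

# P(-x) = -x^3 - 6x^2 - 11x - 6 has no sign changes: no negative roots.
print(sign_changes([-1, -6, -11, -6]))  # 0
```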

7.7 Variational characterizations of Hermitian Matrices


Let A ∈ Mn (C) be a Hermitian matrix. Then, by Theorem 6.2.22, we know that all the
eigenvalues of A are real. So, we write λi (A) to mean the i-th smallest eigenvalue of A. That
is, the i-th from the left in the list λ1 (A) ≤ λ2 (A) ≤ · · · ≤ λn (A).

Lemma 7.7.1. [Rayleigh-Ritz Ratio] Let A ∈ Mn(C) be a Hermitian matrix. Then,

1. λ1(A) x∗x ≤ x∗Ax ≤ λn(A) x∗x, for each x ∈ Cⁿ.

2. λ1(A) = min_{x≠0} (x∗Ax)/(x∗x) = min_{‖x‖=1} x∗Ax.

3. λn(A) = max_{x≠0} (x∗Ax)/(x∗x) = max_{‖x‖=1} x∗Ax.

Proof. Proof of Part 1: By the spectral theorem (see Theorem 6.2.22), there exists a unitary matrix
U such that A = U D U∗, where D = diag(λ1(A), . . . , λn(A)) is a real diagonal matrix. Thus,
the set {U[:, 1], . . . , U[:, n]} is a basis of Cⁿ. Hence, for each x ∈ Cⁿ, there exist scalars αi
such that x = ∑ αi U[:, i]. So, note that x∗x = ∑ |αi|² and

λ1(A) x∗x = λ1(A) ∑ |αi|² ≤ ∑ |αi|² λi(A) = x∗Ax ≤ λn(A) ∑ |αi|² = λn(A) x∗x.

For Part 2 and Part 3, note that the bounds in Part 1 are attained at x = U[:, 1] and x = U[:, n], respectively.
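
As a quick numerical sanity check (an aside, not part of the notes), the Rayleigh quotient of a random Hermitian matrix always lands between the extreme eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (M + M.conj().T) / 2               # a Hermitian matrix
lam = np.linalg.eigvalsh(A)            # ascending: lam[0] <= ... <= lam[-1]

for _ in range(100):
    x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    r = (x.conj() @ A @ x).real / (x.conj() @ x).real   # Rayleigh quotient
    assert lam[0] - 1e-12 <= r <= lam[-1] + 1e-12       # Lemma 7.7.1.1
```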
As an immediate corollary, we state the following result.
Corollary 7.7.2. Let A ∈ Mn(C) be a Hermitian matrix and let α = (x∗Ax)/(x∗x), for some
nonzero x ∈ Cⁿ. Then, A has an eigenvalue in the interval (−∞, α] and an eigenvalue in the
interval [α, ∞).

We now generalize the second and third parts of Lemma 7.7.1.

Proposition 7.7.3. Let A ∈ Mn(C) be a Hermitian matrix with A = U D U∗, where U is a
unitary matrix and D is a diagonal matrix consisting of the eigenvalues λ1 ≤ λ2 ≤ · · · ≤ λn.
Then, for any positive integer k, 1 ≤ k ≤ n,

λk = min_{‖x‖=1, x⊥U[:,1],...,U[:,k−1]} x∗Ax = max_{‖x‖=1, x⊥U[:,n],...,U[:,k+1]} x∗Ax.

Proof. Let x ∈ Cⁿ be such that x is orthogonal to U[:, 1], . . . , U[:, k − 1]. Then, we can write
x = ∑_{i=k}^{n} αi U[:, i], for some scalars αi. In that case,

λk x∗x = λk ∑_{i=k}^{n} |αi|² ≤ ∑_{i=k}^{n} |αi|² λi = x∗Ax,

and equality occurs for x = U[:, k]. This proves the first equality; the second one follows by a
similar argument.

Theorem 7.7.4. [Courant-Fischer] Let A ∈ Mn(C) be a Hermitian matrix with eigenvalues
λ1 ≤ λ2 ≤ · · · ≤ λn. Then,

λk = max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1} x∗Ax = min_{wn,...,wk+1} max_{‖x‖=1, x⊥wn,...,wk+1} x∗Ax.

Proof. Let A = U D U∗, where U is a unitary matrix and D = diag(λ1, . . . , λn). Now, choose any
set of k − 1 vectors from Cⁿ, say S = {w1, . . . , wk−1}. As dim(S⊥) ≥ n − k + 1 and the subspace
W = LS(U[:, 1], . . . , U[:, k]) has dimension k, we have dim(S⊥) + dim(W) > n, and hence S⊥
intersects W nontrivially. So, pick a unit vector x ∈ S⊥ ∩ W and write x = ∑_{i=1}^{k} αi U[:, i].
Then, x∗Ax = ∑_{i=1}^{k} |αi|² λi ≤ λk. Thus,

min_{‖x‖=1, x⊥w1,...,wk−1} x∗Ax ≤ λk, for each choice of k − 1 vectors w1, . . . , wk−1.

But, by Proposition 7.7.3, equality holds for the set {U[:, 1], . . . , U[:, k − 1]}, which proves the
first equality. A similar argument gives the second equality and hence the proof is omitted.

Theorem 7.7.5. [Weyl Interlacing Theorem] Let A, B ∈ Mn(C) be Hermitian matrices.
Then, λk(A) + λ1(B) ≤ λk(A + B) ≤ λk(A) + λn(B). In particular, if B = P∗P, for some
matrix P, then λk(A + B) ≥ λk(A). Moreover, for z ∈ Cⁿ, λk(A + zz∗) ≤ λk+1(A).

Proof. As A and B are Hermitian matrices, the matrix A + B is also Hermitian. Hence, by
the Courant-Fischer theorem and Lemma 7.7.1.1,

λk(A + B) = max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1} x∗(A + B)x
          ≤ max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1} [x∗Ax + λn(B)] = λk(A) + λn(B)

and

λk(A + B) = max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1} x∗(A + B)x
          ≥ max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1} [x∗Ax + λ1(B)] = λk(A) + λ1(B).

If B = P∗P, then λ1(B) = min_{‖x‖=1} x∗(P∗P)x = min_{‖x‖=1} ‖Px‖² ≥ 0. Thus,

λk(A + B) ≥ λk(A) + λ1(B) ≥ λk(A).

Finally, for z ∈ Cⁿ, we have

λk(A + zz∗) = max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1} [x∗Ax + |x∗z|²]
            ≤ max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1,z} [x∗Ax + |x∗z|²]
            = max_{w1,...,wk−1} min_{‖x‖=1, x⊥w1,...,wk−1,z} x∗Ax
            ≤ max_{w1,...,wk−1,wk} min_{‖x‖=1, x⊥w1,...,wk−1,wk} x∗Ax = λk+1(A).
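
As an illustrative aside (not part of the notes), the Weyl bounds are easy to verify on random Hermitian matrices:

```python
import numpy as np

def hermitian(n, rng):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

rng = np.random.default_rng(5)
A, B = hermitian(6, rng), hermitian(6, rng)
lamA  = np.linalg.eigvalsh(A)       # ascending eigenvalues
lamB  = np.linalg.eigvalsh(B)
lamAB = np.linalg.eigvalsh(A + B)

tol = 1e-10
for k in range(6):
    # lam_k(A) + lam_1(B) <= lam_k(A+B) <= lam_k(A) + lam_n(B)
    assert lamA[k] + lamB[0] - tol <= lamAB[k] <= lamA[k] + lamB[-1] + tol
```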

Theorem 7.7.6. [Cauchy Interlacing Theorem] Let A ∈ Mn(C) be a Hermitian matrix.
Define Â = [A y; y∗ a], for some a ∈ R and y ∈ Cⁿ. Then,

λk(Â) ≤ λk(A) ≤ λk+1(Â).

Proof. Note that

λk+1(Â) = max_{w1,...,wk∈C^{n+1}} min_{‖x‖=1, x⊥w1,...,wk} x∗Âx
        ≤ max_{w1,...,wk∈C^{n+1}} min_{‖x‖=1, x_{n+1}=0, x⊥w1,...,wk} x∗Âx
        = max_{w1,...,wk∈Cⁿ} min_{‖x‖=1, x⊥w1,...,wk} x∗Ax = λk+1(A)

and

λk+1(Â) = min_{w1,...,wn−k∈C^{n+1}} max_{‖x‖=1, x⊥w1,...,wn−k} x∗Âx
        ≥ min_{w1,...,wn−k∈C^{n+1}} max_{‖x‖=1, x_{n+1}=0, x⊥w1,...,wn−k} x∗Âx
        = min_{w1,...,wn−k∈Cⁿ} max_{‖x‖=1, x⊥w1,...,wn−k} x∗Ax = λk(A).

Replacing k + 1 by k in the first inequality gives λk(Â) ≤ λk(A), and the second gives
λk(A) ≤ λk+1(Â).
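
A short numerical check of the interlacing (an aside; here A is obtained from Â by deleting the last row and column, matching the bordered form in the theorem):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
Ahat = (M + M.conj().T) / 2          # Hermitian of size n + 1 = 6
A = Ahat[:5, :5]                     # the bordered n x n block

mu  = np.linalg.eigvalsh(Ahat)       # mu[0] <= ... <= mu[5]
lam = np.linalg.eigvalsh(A)          # lam[0] <= ... <= lam[4]
tol = 1e-10
for k in range(5):
    # lam_k(Ahat) <= lam_k(A) <= lam_{k+1}(Ahat), in 0-based indexing
    assert mu[k] - tol <= lam[k] <= mu[k + 1] + tol
```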

As an immediate corollary, one has the following result.

Corollary 7.7.7. [Inclusion principle] Let A ∈ Mn(C) be a Hermitian matrix and let r be a
positive integer with 1 ≤ r ≤ n. If B is an r × r principal submatrix of A, then λk(A) ≤ λk(B) ≤
λk+n−r(A).

Theorem 7.7.8. [Poincaré Separation Theorem] Let A ∈ Mn(C) be a Hermitian matrix and
let {u1, . . . , ur} ⊆ Cⁿ be an orthonormal set for some positive integer r, 1 ≤ r ≤ n. If further
B = [bij] is the r × r matrix with bij = u∗i A uj, 1 ≤ i, j ≤ r, then λk(A) ≤ λk(B) ≤ λk+n−r(A).

Proof. Let us extend the orthonormal set {u1, . . . , ur} to an orthonormal basis, say {u1, . . . , un},
of Cⁿ and write U = [u1 · · · un]. Then, B is an r × r principal submatrix of U∗AU. Thus, by
the inclusion principle, λk(U∗AU) ≤ λk(B) ≤ λk+n−r(U∗AU). But, we know that σ(U∗AU) = σ(A)
and hence the required result follows.
The proof of the next result is left for the reader.

Corollary 7.7.9. Let A ∈ Mn(C) be a Hermitian matrix and let r be a positive integer with
1 ≤ r ≤ n. Then,

λ1(A) + · · · + λr(A) = min_{U∗U=Ir} tr(U∗AU)  and  λ_{n−r+1}(A) + · · · + λn(A) = max_{U∗U=Ir} tr(U∗AU).
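
As an illustrative aside (not part of the notes), Corollary 7.7.9 can be tested numerically: the extremes of tr(U∗AU) over orthonormal r-frames are attained at eigenvector frames, and any other frame gives a value in between.

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
A = (M + M.conj().T) / 2
lam, V = np.linalg.eigh(A)            # ascending eigenvalues, unitary V
r = 3

lo, hi = lam[:r].sum(), lam[-r:].sum()   # claimed min and max of tr(U*AU)
# The extremes are attained at the corresponding eigenvector frames:
assert np.isclose(np.trace(V[:, :r].conj().T @ A @ V[:, :r]).real, lo)
assert np.isclose(np.trace(V[:, -r:].conj().T @ A @ V[:, -r:]).real, hi)
# Any other orthonormal r-frame gives a trace in between:
Q, _ = np.linalg.qr(rng.standard_normal((6, r)))
t = np.trace(Q.conj().T @ A @ Q).real
assert lo - 1e-10 <= t <= hi + 1e-10
```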

Corollary 7.7.10. Let A ∈ Mn(C) be a Hermitian matrix and let W be a k-dimensional subspace
of Cⁿ. Suppose there exists a real number c such that x∗Ax ≥ c x∗x, for each x ∈ W. Then,
λ_{n−k+1}(A) ≥ c. In particular, if x∗Ax > 0 for each nonzero x ∈ W, then λ_{n−k+1}(A) > 0. Note
that a k-dimensional subspace need not contain an eigenvector of A. For example, the line
y = 2x does not contain an eigenvector of the diagonal matrix diag(1, 2).

Proof. Let {x1, . . . , x_{n−k}} be a basis of W⊥, so that {x ∈ Cⁿ : x ⊥ x1, . . . , x_{n−k}} = W. Then,

λ_{n−k+1}(A) = max_{w1,...,w_{n−k}} min_{‖x‖=1, x⊥w1,...,w_{n−k}} x∗Ax ≥ min_{‖x‖=1, x⊥x1,...,x_{n−k}} x∗Ax ≥ c.

Now assume that x∗Ax > 0 holds for each nonzero x ∈ W and, on the contrary, that
λ_{n−k+1}(A) = 0 (by the first part with c = 0, λ_{n−k+1}(A) ≥ 0). Then, it follows that
min_{‖x‖=1, x∈W} x∗Ax = 0. Now, define f : Cⁿ → R by f(x) = x∗Ax.
Then, f is a continuous function and the unit sphere of W is compact, so f attains its
minimum there. That is, there exists y ∈ W with ‖y‖ = 1 such that y∗Ay = 0, a contradiction.
Thus, the required result follows.
