Abstract Algebra
(For B.A. and B.Sc. IIIrd year students of All Colleges affiliated to Lucknow University, Lucknow)
By
(Lucknow Edition)
Dedicated
to
Lord
Krishna
Authors & Publishers
Preface
This book on ABSTRACT ALGEBRA has been specially written according
to the Latest Unified Syllabus to meet the requirements of the B.A. and B.Sc.
Part-III Students of all colleges affiliated to Lucknow University, Lucknow.
The subject matter has been discussed in such a simple way that the students
will find no difficulty in understanding it. The proofs of the various theorems and
examples have been given in minute detail. Each chapter of this book
contains complete theory and a fairly large number of solved examples.
Sufficient problems have also been selected from various university examination
papers. At the end of each chapter an exercise containing objective questions has
been given.
We have tried our best to keep the book free from misprints. The authors
shall be grateful to readers who point out errors and omissions which, in spite
of all care, might have remained.
The authors hope that the present book will be warmly received
by the students and teachers. We shall indeed be very thankful to our colleagues
for recommending this book to their students.
The authors wish to express their thanks to Mr. S.K. Rastogi, M.D.,
Mr. Sugam Rastogi, Executive Director, Mrs. Kanupriya Rastogi, Director and
entire team of KRISHNA Prakashan Media (P) Ltd., Meerut for bringing
out this book in the present nice form.
The authors will feel amply rewarded if the book serves the purpose for
which it is meant. Suggestions for the improvement of the book are always
welcome.
B.A./B.Sc. Paper-II
Unit-I
Automorphism, inner automorphism, automorphism groups and their
computations. Conjugacy relations, Normaliser, Counting principle and the
class equation of a finite group, Center of group of prime power order, Sylow's
theorems, Sylow p-subgroup.
Unit-II
Prime and maximal ideals, Euclidean Rings, Principal ideal rings, Polynomial
Rings, Polynomial over the Rational Field, The Eisenstein Criterion, Polynomial
Rings over Commutative Rings, unique factorization domain; R a unique
factorization domain implies so is R[x₁, x₂, …, xₙ].
Unit-III
Direct sum, Quotient space, Linear transformations and their representation as
matrices, The Algebra of linear transformations, rank nullity theorem, change of
basis, linear functional, Dual space, Bidual space and natural isomorphism,
transpose of a linear transformation, Characteristic values, annihilating
polynomials, diagonalisation, Cayley Hamilton Theorem, Invariant subspaces,
Primary decomposition theorem.
Unit-IV
Inner product spaces, Cauchy-Schwarz inequality, orthogonal vectors,
Orthogonal complements, Orthonormal sets and bases, Bessel's inequality for
finite dimensional spaces, Gram-Schmidt orthogonalization process, Bilinear,
Quadratic and Hermitian forms.
Brief Contents
Dedication
Preface
Syllabus (Lucknow University, w.e.f. 2014-15)
Brief Contents
1. Group Automorphisms
2. Rings
ABSTRACT ALGEBRA
Chapters
1. Group Automorphisms
2. Rings
3. Linear Transformations
5. Linear Functionals
6. Characteristic Values and Annihilating Polynomials
Group Automorphisms
Example 4: Let G be a finite abelian group of order n, where n is odd and > 1. Then show
that G has a non-trivial automorphism.
Solution: Define a mapping f : G → G such that f(x) = x⁻¹, ∀ x ∈ G.
Then f is an automorphism of G. [See Ex. 3]
We shall show that the mapping f is a non-trivial automorphism of G i.e., f ≠ I,
where I is the identity mapping of G i.e., I : G → G such that I(x) = x, ∀ x ∈ G.
Suppose f = I .
Then f(x) = I(x), ∀ x ∈ G
⇒ x⁻¹ = x, ∀ x ∈ G ⇒ x⁻¹x = x², ∀ x ∈ G
⇒ x² = e, ∀ x ∈ G, where e is the identity of G
⇒ o(x) | 2, ∀ x ∈ G ⇒ o(x) = 1 or 2, ∀ x ∈ G.
Since o (G) > 1, therefore G must have an element x such that x ≠ e.
Then x ≠ e ⇒ o ( x) = 2 .
But in a group G, o ( x)| o (G), V x ∈ G.
∴ o ( x) = 2 ⇒ 2 | o (G) ⇒ o (G) is even,
which is a contradiction.
∴ our assumption that f = I is wrong.
Hence, f ≠ I i. e., f is a non-trivial automorphism of G.
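This example can be tested mechanically on a small group. The following Python sketch (an illustration of ours, not part of the original text) takes G = Z₉, the additive group of integers mod 9, in which the map x ↦ −x plays the role of x ↦ x⁻¹:

n = 9  # any odd order > 1 will do
G = list(range(n))
f = {x: (-x) % n for x in G}   # the "inversion" map of the abelian group Z_9

# f is a homomorphism: f(x + y) = f(x) + f(y) in Z_9
assert all(f[(x + y) % n] == (f[x] + f[y]) % n for x in G for y in G)
# f is a bijection of G onto G
assert sorted(f.values()) == G
# f is non-trivial: it moves at least one element
assert any(f[x] != x for x in G)
print("f(x) = -x mod", n, "is a non-trivial automorphism")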
Example 5: Let G be a group, H a subgroup of G, f an automorphism of G. Let
f ( H) = { f (h) : h ∈ H }. (Lucknow 2007)
Prove that f ( H) is a subgroup of G.
Solution: Let a, b be any two elements of f ( H). Then
a = f (h1 ) and b = f (h2 ) where h1 , h2 ∈ H.
Now h₁, h₂ ∈ H ⇒ h₁h₂⁻¹ ∈ H [∵ H is a subgroup]
⇒ f(h₁h₂⁻¹) ∈ f(H)
⇒ f(h₁) f(h₂⁻¹) ∈ f(H) [∵ f is an automorphism]
⇒ f(h₁) [f(h₂)]⁻¹ ∈ f(H) ⇒ ab⁻¹ ∈ f(H).
∴ f ( H) is a subgroup of G.
Note: Some authors use the symbol hf in place of f (h) to denote the image of an
element.
Example 6:Let G be a group, f an automorphism of G, N a normal subgroup of G. Prove
that f (N ) is a normal subgroup of G. (Lucknow 2010)
Solution: First show as in Ex. 5 that f (N ) is a subgroup of G.
Now to show that f (N ) is a normal subgroup of G.
Let x ∈ G and k ∈ f (N ). Then x = f ( y) where y ∈ G because f is a function of
G onto G. Also k = f (n) where n ∈ N .
= a⁻¹(a x a⁻¹)a = x.
∴ f_a f_{a⁻¹} is the identity function on G.
∴ f_{a⁻¹} = (f_a)⁻¹.
∴ φ is onto I(G).
Now to prove that
φ(ab) = φ(a) φ(b) ∀ a, b ∈ G.
We have φ(ab) = f_{(ab)⁻¹} = f_{b⁻¹a⁻¹} = f_{a⁻¹} f_{b⁻¹} = φ(a) φ(b).
Now to show that Z is the kernel of φ.
The identity function i on G is the identity of the group I(G).
Let K be the kernel of φ.
Then we have z ∈ K ⇔ φ(z) = i ⇔ f_{z⁻¹} = i ⇔ f_{z⁻¹}(x) = i(x) ∀ x ∈ G
⇔ (z⁻¹)⁻¹ x z⁻¹ = x ∀ x ∈ G ⇔ z x z⁻¹ = x ∀ x ∈ G
⇔ z x = x z ∀ x ∈ G ⇔ z ∈ Z.
∴ K = Z. Hence the theorem.
= (a^{m₂})^{m₁} = [f_{m₂}(a)]^{m₁} = f_{m₂}(a^{m₁})
= f_{m₂}[f_{m₁}(a)] = (f_{m₂} ∘ f_{m₁})(a).
Now two automorphisms of a cyclic group are equal if the image of a generator of
the group under each of them is the same.
Hence f_{m₁} ∘ f_{m₂} = f_{m₂} ∘ f_{m₁}.
Therefore the group of automorphisms of a cyclic group is abelian.
Example 7: Let G be a finite abelian additive group and n be a positive integer relatively
prime to o (G ). Prove that the mapping σ : G → G given by σ ( x) = nx is an automorphism
of G.
Solution: The mapping σ is one-one: Let x, y be any two elements of G. Then
σ ( x) = σ ( y) ⇒ nx = ny
⇒ n ( x − y) = 0 , where 0 is the identity of the group G
⇒ o ( x − y)| n
⇒ o(x − y) | n and o(x − y) | o(G) [∵ o(x − y) | o(G)]
⇒ o(x − y) = 1 [∵ if o(x − y) > 1, it would be a common divisor of n
and o(G), contradicting that they are relatively prime]
⇒ x − y = 0, the identity of G
⇒ x = y.
Therefore the mapping σ is one-one.
The mapping σ is also onto G: Since G is finite and σ is one-one, therefore σ
must be onto G.
Finally if x, y ∈ G , then σ ( x + y) = n ( x + y) = nx + ny = σ ( x) + σ ( y).
Hence, σ is an automorphism of G.
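A small numerical illustration of Example 7 (a sketch of ours): take G = Z₁₀ written additively and n = 3, so that (n, o(G)) = 1, and verify that σ(x) = 3x mod 10 is one-one, onto and a homomorphism.

from math import gcd

m, n = 10, 3                     # o(G) = 10 and n = 3 are relatively prime
assert gcd(m, n) == 1
G = list(range(m))
sigma = {x: (n * x) % m for x in G}

assert sorted(sigma.values()) == G          # sigma is one-one and onto
assert all(sigma[(x + y) % m] == (sigma[x] + sigma[y]) % m
           for x in G for y in G)           # sigma is a homomorphism
print("sigma(x) = 3x is an automorphism of Z_10")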
Example 8: Let R⁺ be the multiplicative group of all positive real numbers. Show that the
mapping f : R⁺ → R⁺ defined by f(x) = x², ∀ x ∈ R⁺, is an automorphism.
(Lucknow 2009)
Solution: We have R⁺ = { x : x ∈ R and x > 0 }.
The mapping f is one-one : Let x, y ∈ R⁺.
Then f(x) = f(y) ⇒ x² = y² ⇒ x = y. [∵ x > 0 and y > 0]
∴ f is one-one.
The mapping f is onto : Let x be any element of R⁺. Then √x ∈ R⁺ and
f(√x) = (√x)² = x.
∴ f is onto.
The mapping f is a homomorphism : Let x, y be any two elements of R⁺. Then
f(xy) = (xy)² = x²y² = f(x) f(y).
∴ f is a homomorphism.
Hence, f is an automorphism of R⁺.
Example 9: Find the automorphism group of A3 , where A3 is the alternating group of degree
3 on three symbols.
Solution. Let A3 be the alternating group on three symbols a , b , c . Then
A3 = {e , f , g},
where e = the identity permutation, f = (a b c ) and g = (a c b).
Let I be the identity mapping of A3 i.e.,
I ( e) = e , I ( f ) = f and I ( g) = g.
Then obviously I is an automorphism of A3 .
Now consider the mapping T : A3 → A3 defined by
T (e) = e, T ( f ) = g, T ( g) = f .
Obviously T is a one-one mapping of A3 onto A3 .
We have T ( e f ) = T ( f ) = g = e g = T ( e) T ( f ),
T(fe) = T(f) = g = ge = T(f) T(e),
T (e g) = T ( g) = f = e f = T (e) T ( g),
T ( ge) = T ( g) T ( e).
Also, T(fg) = T(e) = e [∵ fg = (abc)(acb) = e, the identity permutation]
and T(f) T(g) = gf = (acb)(abc) = e.
∴ T ( f g) = T ( f ) T ( g).
Similarly T ( g f ) = T ( g) T ( f ). [Note that A3 is an abelian group]
∴ the mapping T is a homomorphism.Thus the mapping T is also an
automorphism of A3 .
Hence, the automorphism group Aut ( A3 ) of A3 contains only two elements,
namely I and T.
Thus Aut ( A3 ) = {I , T }.
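The conclusion Aut(A3) = {I, T} can also be reached by brute force. The Python sketch below (ours; it models e, f, g as permutations of {0, 1, 2}) enumerates all six bijections of A3 onto itself and keeps those that are homomorphisms; exactly two survive.

from itertools import permutations

def compose(p, q):               # (p o q)(i) = p(q(i)) on {0, 1, 2}
    return tuple(p[q[i]] for i in range(3))

e, f, g = (0, 1, 2), (1, 2, 0), (2, 0, 1)
A3 = [e, f, g]

autos = []
for images in permutations(A3):  # the six bijections of A3 onto itself
    T = dict(zip(A3, images))
    if all(T[compose(a, b)] == compose(T[a], T[b]) for a in A3 for b in A3):
        autos.append(T)
print(len(autos))                # 2: the identity I and the swap T of f and g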
Example 10: If f : G → G such that f(a) = aⁿ, ∀ a ∈ G, is an automorphism of G,
show that aⁿ⁻¹ ∈ Z for all a ∈ G, where Z is the centre of the group G.
Solution: Let x , a be any two elements of G. We have
f(a⁻ⁿ x aⁿ) = (a⁻ⁿ x aⁿ)ⁿ [By def. of the mapping f]
= a⁻ⁿ xⁿ aⁿ [∵ if a, b are any elements of a group G,
then (b⁻¹ab)ⁿ = b⁻¹aⁿb for any integer n]
= (a⁻¹)ⁿ xⁿ aⁿ
= f(a⁻¹) f(x) f(a) [By def. of the mapping f]
= f(a⁻¹ x a). [∵ the mapping f is a homomorphism]
Thus f(a⁻ⁿ x aⁿ) = f(a⁻¹ x a).
Since the mapping f is one-one, therefore a⁻ⁿ x aⁿ = a⁻¹ x a
⇒ x aⁿ⁻¹ = aⁿ⁻¹ x, ∀ x, a ∈ G
⇒ aⁿ⁻¹ ∈ Z, ∀ a ∈ G.
Example 11: Let f : G → G be a homomorphism i.e., f is an endomorphism of G. Suppose
f commutes with every inner automorphism of G. Show that
(i) K = { x ∈ G : f²(x) = f(x) } is a normal subgroup of G.
(ii) G / K is abelian.
Solution: (i) Let e be the identity of the group G.
We have f 2 (e) = f ( f (e)) = f (e).
∴ e ∈ K and so K ≠ ∅.
Now let x, y be any two elements of K. Then f ( x) = f 2 ( x) and f ( y) = f 2 ( y). We
have
f 2 ( x y −1 ) = f ( f ( x y −1 ))
= f [ f ( x) f ( y −1 )] [∵ f is a homomorphism]
= f(f(x) [f(y)]⁻¹) [∵ f is a homomorphism ⇒ f(y⁻¹) = [f(y)]⁻¹]
= f ( f ( x)) f ([ f ( y)] −1 ) [∵ f is a homomorphism]
= f 2 ( x) [ f ( f ( y))] −1 [∵ f is a homomorphism
⇒ f ([ f ( y)] − 1 ) = [ f ( f ( y))] − 1 ]
= f 2 ( x) [ f 2 ( y)] −1
= f ( x) [ f ( y)] −1 [∵ x , y ∈ K ⇒ f 2 ( x) = f ( x) and
f 2 ( y) = f ( y) ]
= f ( x) f ( y − 1 ) {∵ f is a homomorphism
⇒ [ f ( y)] − 1 = f ( y − 1 ) }
= f ( xy − 1 ). [∵ f is a homomorphism]
∴ xy −1 ∈ K.
Thus K ≠ ∅ and x , y ∈ K ⇒ x y −1 ∈ K.
∴ K is a subgroup of G.
Now to show that K is normal in G. Let g ∈ G and x ∈ K. Then
f 2 ( g x g − 1 ) = f [ f ( g x g −1 )]
= f [ f f g ( x)], where f g is the inner automorphism
corresponding to g
= f(g) f(f(x)) f(g⁻¹) [∵ f is a homomorphism]
= f(g) f²(x) f(g⁻¹)
= f(g) f(x) f(g⁻¹) [∵ x ∈ K ⇒ f²(x) = f(x)]
= f(g x g⁻¹). [∵ f is a homomorphism]
∴ g x g⁻¹ ∈ K for all x ∈ K, g ∈ G.
∴ K is a normal subgroup of G.
(ii) To show that G / K is abelian.
By definition of a quotient group, we have G / K = { K x : x ∈ G}.
We have G / K is abelian ⇔ K x K y = K y K x , V x , y ∈ G
⇔ K x y = K y x, V x , y ∈ G
⇔ x y ( y x) −1 ∈ K, V x , y ∈ G
⇔ x y x −1 y −1 ∈ K, V x, y ∈ G. ...(1)
Now for all x , y ∈ G, we have
f 2 ( x y x −1 y −1 ) = f [ f ( x y x −1 y −1 )]
= f [ f ( x y x −1 ) f ( y −1 )] [∵ f is a homomorphism]
= f[f(f_x(y)) f(y⁻¹)], where f_x is the inner
automorphism corresponding to x
= f[f_x(f(y)) f(y⁻¹)] [∵ f f_x = f_x f]
= f(x f(y) x⁻¹ [f(y)]⁻¹)
= f[x f_{f(y)}(x⁻¹)], where f_{f(y)} is the inner automorphism
corresponding to f(y)
= f(x) f[f_{f(y)}(x⁻¹)] [∵ f is a homomorphism]
= f(x) f_{f(y)}(f(x⁻¹)) [∵ f f_{f(y)} = f_{f(y)} f]
= f(x) f(y) f(x⁻¹) [f(y)]⁻¹
= f(x) f(y) f(x⁻¹) f(y⁻¹)
= f(x y x⁻¹ y⁻¹).
∴ x y x −1 y −1 ∈ K.
Hence, by virtue of (1), G / K is abelian.
Example 12: Let G be a group and f an automorphism of G. If, for a ∈ G , we have
N (a) = { x ∈ G : xa = a x}, prove that N ( f (a)) = f (N (a)).
Example 13: Let G be an infinite cyclic group. Determine Aut G, the group of all
automorphisms of G.
Solution: Let G = (a) be an infinite cyclic group generated by a.
Let f ∈ Aut G i.e., let f be an automorphism of G.
First we shall show that f (a) is also a generator of G i.e., G = ( f (a)), the cyclic
group generated by f (a).
Let x be any element of G.
Since the mapping f : G → G is onto, therefore there exists y ∈ G such that
x = f ( y).
But y ∈ G ⇒ y = a^r for some integer r [∵ a is a generator of G]
∴ x = f(y) = f(a^r) = (f(a))^r.
∴ f (a) is a generator of G i.e., G = ( f (a)) .
But the infinite cyclic group G = (a) has only two generators, namely a and a⁻¹.
∴ f(a) = a or f(a) = a⁻¹.
Thus f has only two choices and so
o (Aut G) ≤ 2. ...(1)
Define a mapping T : G → G such that
T(x) = x⁻¹, ∀ x ∈ G.
Then T ∈ Aut G.
Also T ≠ I, as T = I ⇒ T(x) = x, ∀ x ∈ G
⇒ x⁻¹ = x, ∀ x ∈ G
⇒ a⁻¹ = a ⇒ a² = e ⇒ o(a) is finite, which is a contradiction because the
generator a of an infinite cyclic group cannot be of finite order.
∴ T ≠ I.
Thus G has at least two automorphisms.
∴ o (Aut G) ≥ 2. ...(2)
From (1) and (2), we have o (Aut G ) = 2.
In fact, we have Aut G = {I, T}, where T(x) = x⁻¹ ∀ x ∈ G.
Since o (Aut G) = 2, therefore Aut G is a cyclic group of order 2.
We know that any cyclic group of order n is isomorphic to the group
Z n = {0, 1, 2, … , n − 1} under addition modulo n.
Hence, Aut G ≅ Z 2 , where Z 2 = {0, 1} is a group under addition modulo 2.
Example 14: Let G be a finite cyclic group of order n. Determine Aut G, the group of all
automorphisms of G.
Solution: Let G = (a) be a finite cyclic group of order n generated by a.
We have o(a) = o(G) = n, so that aⁿ = e.
Let f ∈ Aut G i.e., let f be an automorphism of G.
First we shall show that f(a) is also a generator of G i.e., G = (f(a)), the cyclic
group generated by f(a).
Let x be any element of G.
Since the mapping f : G → G is onto, therefore ∃ y ∈ G such that x = f ( y).
But y ∈ G ⇒ y = a^r for some integer r. [∵ a is a generator of G]
∴ x = f(y) = f(a^r) = (f(a))^r.
∴ f(a) is a generator of G i.e., G = (f(a)).
But the finite cyclic group G has only φ (n) generators, where φ (n) is Euler’s
φ -function i.e., φ (n) denotes the number of integers less than n and prime to n.
∴ if f is an automorphism of G, then f has only φ (n) choices.
∴ o (Aut G) ≤ φ (n). ...(1)
Define a mapping f_m : G → G such that
f_m(x) = x^m, (m, n) = 1, 1 ≤ m < n.
Here (m, n) denotes the H.C.F. of m and n and (m , n) = 1 means that m and n are
co-prime.
We claim that the mapping f m is an automorphism of G.
f_m is one-one : Let a^r, a^s ∈ G, where 1 ≤ r ≤ n, 1 ≤ s ≤ n and r ≥ s.
Then f_m(a^r) = f_m(a^s) ⇒ (a^r)^m = (a^s)^m
⇒ a^{rm} = a^{sm} ⇒ a^{(r−s)m} = e
⇒ n | (r − s)m.
But m is prime to n and 0 ≤ r − s < n.
∴ n | (r − s)m ⇒ n | (r − s) ⇒ r − s = 0 ⇒ r = s ⇒ a^r = a^s.
Thus f_m(a^r) = f_m(a^s) ⇒ a^r = a^s.
∴ f m is one-one.
f_m is onto : Since G is finite and f_m is one-one, therefore f_m must be onto G.
f_m is a homomorphism : Let a^r, a^s ∈ G, where 1 ≤ r ≤ n, 1 ≤ s ≤ n. Then
f_m(a^r a^s) = f_m(a^{r+s}) = f_m(a^{nu+k}),
where r + s = nu + k, u is some integer and 0 ≤ k < n
= f_m(a^{nu} a^k) = f_m(a^k) [∵ a^{nu} = (aⁿ)^u = e^u = e]
= (a^k)^m = a^{mk} = a^{m(r+s−nu)}
= a^{m(r+s)} a^{−mnu}
= a^{m(r+s)} [∵ a^{−mnu} = (aⁿ)^{−mu} = e^{−mu} = e]
= a^{mr} a^{ms} = (a^r)^m (a^s)^m
= f_m(a^r) f_m(a^s).
∴ f_m is a homomorphism.
∴ f_m is an automorphism of G i.e., f_m ∈ Aut G.
Thus f_m ∈ Aut G, where (m, n) = 1, 1 ≤ m < n.
We now show that f_r ≠ f_s for all r, s (r ≠ s), 1 ≤ r, s < n, where (r, n) = 1 and
(s, n) = 1.
Assume that r > s. Suppose, if possible, f_r = f_s. Then
f_r = f_s ⇒ f_r(a) = f_s(a)
⇒ a^r = a^s ⇒ a^{r−s} = e
⇒ o(a) | (r − s) ⇒ n | (r − s)
⇒ n ≤ r − s < n.
This is a contradiction.
∴ f_r ≠ f_s for all r, s (r ≠ s), 1 ≤ r, s < n, where r and s are relatively prime to n.
This shows that G has at least φ(n) automorphisms, i.e., o(Aut G) ≥ φ(n).
Hence, from (1), o(Aut G) = φ(n).
We now show that the group of automorphisms of a cyclic group is abelian. Two cases arise.
Case I: If G is an infinite cyclic group, then its group of automorphisms is given by
Aut G = {I, T}, where T(x) = x⁻¹ ∀ x ∈ G.
Since o (Aut G) = 2, therefore Aut G is abelian.
Case II. If G is a cyclic group of finite order n, then
Aut G = {f_m : f_m(x) = x^m, 1 ≤ m < n, (m, n) = 1}.
Let f_r, f_s ∈ Aut G. Then f_r(a) = a^r, f_s(a) = a^s.
Now (f_r f_s)(a) = f_r(f_s(a)) = f_r(a^s) = (a^s)^r = a^{sr} = a^{rs}
= (a^r)^s = f_s(a^r) = f_s(f_r(a)) = (f_s f_r)(a).
Now two automorphisms of a cyclic group are equal if the image of a generator of
the group under each of them is the same.
Hence f_r f_s = f_s f_r, ∀ f_r, f_s ∈ Aut G.
∴ in this case also Aut G is abelian.
Hence, the group of automorphisms of a cyclic group is abelian.
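Case II can be checked concretely. For a finite cyclic group realised additively as Zₙ, the map f_m becomes x ↦ mx with (m, n) = 1, and composing f_r with f_s multiplies r and s modulo n, which is commutative. A small sketch of ours, with n = 12:

from math import gcd

n = 12
units = [m for m in range(1, n) if gcd(m, n) == 1]
print(len(units))            # phi(12) = 4, the order of Aut(Z_12)

# f_r o f_s sends x to r*s*x mod n, so commutativity of Aut G reduces
# to commutativity of multiplication mod n
assert all((r * s) % n == (s * r) % n for r in units for s in units)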
Example 16: Let φ (n) be the Euler φ -function. For any integers a > 1, n > 0, show that
n | φ (a n − 1).
Solution: Let G = (b) be the cyclic group generated by b, where
o (b) = o (G) = a n − 1.
Consider the mapping f_a : G → G defined by
f_a(x) = x^a, ∀ x ∈ G.
Since (a, aⁿ − 1) = 1, therefore f_a ∈ Aut G.
If x ∈ G, then f_a²(x) = f_a(f_a(x)) = f_a(x^a) = (x^a)^a = x^{a²}.
In general, f_a^r(x) = x^{a^r}, for every positive integer r.
∴ f_aⁿ(x) = x^{aⁿ} = x · x^{aⁿ−1} = x · x^{o(G)} = x e = x [∵ x^{o(G)} = e, ∀ x ∈ G]
= I(x), ∀ x ∈ G.
∴ f_aⁿ = I. ...(1)
Again if f_a^m = I, then
f_a^m(x) = I(x) = x, ∀ x ∈ G ⇒ f_a^m(b) = b [∵ b ∈ G]
⇒ b^{a^m} = b ⇒ b^{a^m − 1} = e ⇒ o(b) | (a^m − 1)
⇒ (aⁿ − 1) | (a^m − 1) ⇒ aⁿ − 1 ≤ a^m − 1
⇒ aⁿ ≤ a^m ⇒ n ≤ m ⇒ m ≥ n. ...(2)
From (1) and (2), we conclude that o ( f a ) = n.
Also, o(Aut G) = φ(aⁿ − 1).
∴ f_a ∈ Aut G ⇒ o(f_a) | o(Aut G) ⇒ n | φ(aⁿ − 1).
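The divisibility n | φ(aⁿ − 1) is easy to spot-check numerically. A small Python sketch of ours, using a naive φ-function:

from math import gcd

def phi(k):                       # naive Euler phi-function
    return sum(1 for i in range(1, k + 1) if gcd(i, k) == 1)

for a in (2, 3, 4):
    for n in (1, 2, 3, 4, 5):
        assert phi(a**n - 1) % n == 0
print("n | phi(a^n - 1) holds for all tested pairs")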
Comprehensive Exercise 1
Answers 1
1. False.
2. (i) Every abelian group, (ii) the symmetric group P3 .
where the summation runs over one element a in each conjugate class containing more than one
element.
Proof: The class equation of G is
o(G) = o(Z) + Σ_{a∉Z} o(G)/o[N(a)],
the summation being extended over one element a in each conjugate class.
Now a ∈ Z ⇔ o [N (a)] = o (G) ⇔ o (G) / o [N (a)] = 1 ⇔ the conjugate class of a in
G contains only one element. Thus the number of conjugate classes each having
only one element is equal to o (Z ). If a is an element of any one of these conjugate
classes, we have o (G) / o [N (a)] = 1.
Hence, the class equation of G takes the desired form
o(G) = o(Z) + Σ_{a∉Z} o(G)/o[N(a)],
where the summation runs over one element a in each conjugate class containing
more than one element.
Now ∀ a ∈ G, N(a) is a subgroup of G. Therefore by Lagrange's theorem o[N(a)]
is a divisor of o(G). Also a ∉ Z ⇒ N(a) ≠ G ⇒ o[N(a)] < o(G). Therefore if
a ∉ Z, then o[N(a)] must be of the form p^{n_a} where n_a is some integer such that
1 ≤ n_a < n. Suppose there are exactly z elements in Z i.e., let o(Z) = z. Then the
class equation (1) gives
pⁿ = z + Σ pⁿ/p^{n_a}, where each n_a is some integer such that 1 ≤ n_a < n.
∴ z = pⁿ − Σ pⁿ/p^{n_a}. ...(2)
Example 17: Write all the conjugate classes in S3, find the c_a's and verify the class
equation.
Solution: The symmetric group on 3 symbols 1, 2, 3 is given by
S3 = {(1), (1, 2), (2, 3), (3, 1), (1, 2, 3), (1, 3, 2)}.
The three conjugate classes of S3 are
C(a) = {(1)}, where a = (1), the identity permutation;
C(b) = {(123), (132)}, where b = (123);
C(c) = {(12), (23), (31)}, where c = (12).
∴ c_a = 1, c_b = 2, c_c = 3.
Also, we have Z (S3 ) = {(1)} = C (a).
Here o (Z (S3 )) = 1.
Hence, the class equation of S3 is
o(S3) = o(Z(S3)) + Σ_a o(S3)/o[N(a)],
where the summation is taken over a set of representatives for
the distinct conjugacy classes having more than one member,
i.e., o(S3) = o(Z(S3)) + o(S3)/o[N(b)] + o(S3)/o[N(c)]
= 1 + 6/3 + 6/2 = 1 + 2 + 3 = 6,
since N(b) = {(1), (123), (132)} and N(c) = {(1), (12)}.
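The classes and the equation 6 = 1 + 2 + 3 can be recomputed by brute force. In the sketch below (our illustration), elements of S3 are tuples giving the images of 0, 1, 2:

from itertools import permutations

S3 = list(permutations(range(3)))

def mul(p, q):                          # (pq)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    r = [0] * 3
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

# the conjugate class of a is the set of all x a x^(-1), x in S3
classes = {frozenset(mul(mul(x, a), inv(x)) for x in S3) for a in S3}
print(sorted(len(c) for c in classes))  # [1, 2, 3], and 1 + 2 + 3 = 6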
Example 18: Let Z be the centre of a group G. If a ∈ Z , then prove that the cyclic subgroup
{ a} of G generated by a is a normal subgroup of G.
Solution: We have
Z = { z ∈ G : z x = xz V x ∈ G }. Let a ∈ Z and let H = { a} be the cyclic
subgroup of G generated by a. Let h be any element of H. Then h = a n for some
integer n.
Let x be any element of G. We have
x h x⁻¹ = x aⁿ x⁻¹ = aⁿ x x⁻¹ [∵ a ∈ Z ⇒ aⁿ ∈ Z ⇒ aⁿ x = x aⁿ]
= aⁿ e = aⁿ ∈ H.
∴ H is a normal subgroup of G.
Example 19: Let a be any element of G. Show that the cyclic subgroup of G generated by a is
a normal subgroup of the normalizer of a.
Solution: We have the normalizer of a = N (a) = { x ∈ G : xa = a x }.
Let H be the cyclic subgroup of G generated by a. Also let h be any element of H.
Then h = a n where n is some integer. We have
a n a = a n + 1 = aa n .
∴ a n = h ∈ N (a).
Now N (a) and H are subgroups of G. Also h ∈ H ⇒ h ∈ N (a).
Therefore H ⊆ N (a). Hence H is a subgroup of N (a).
Now to prove that H is a normal subgroup of N (a). Let x be any element of N (a) and
h = a n be any element of H. We have
x h x⁻¹ = x aⁿ x⁻¹ = (x a x⁻¹)ⁿ = (a x x⁻¹)ⁿ [∵ x ∈ N(a) ⇒ a x = x a]
= (ae)ⁿ = aⁿ ∈ H.
∴ H is a normal subgroup of N (a).
Example 20: Show that two elements are conjugate if and only if they can be put in the
form x y and y x respectively where x and y are suitable elements of G.
Solution: Let a, b be two conjugate elements of a group G.
Then a = c −1 bc for some c ∈ G.
Let c −1 b = x and c = y. Then a = xy.
Also y x = c (c −1 b) = (c c −1 ) b = eb = b.
Conversely suppose that a = x y and b = y x. We have
b = y x ⇒ y −1 b = y −1 y x ⇒ y −1 b = x .
Now a = x y ⇒ a = y −1 by ⇒ a and b are conjugate elements.
Example 21: Give an example to show that in a group G the normalizer of an element is not
necessarily a normal subgroup of G.
Solution: Consider the group S3 , the symmetric group of permutations on three
symbols a, b, c. We have S3 = { I , (ab), (bc ), (ca), (abc ), (acb)}. Let N (ab) denote the
normalizer of the element (ab) ∈ S3 . We shall show that N (ab) is not a normal
subgroup of S3 . Let us calculate the elements of N (ab). Obviously (ab) ∈ N (ab).
Also I ∈ N (ab) because I (ab) = (ab) I .
Now (bc ) (ab) = (abc ) and (ab) (bc ) = (acb). Thus (bc ) does not commute with (ab).
Therefore (bc ) ∉ N (ab).
Again (ca) (ab) = (acb) and (ab) (ca) = (abc ).
Thus (ca) (ab) ≠ (ab) (ca) and therefore (ca) ∉ N (ab). Similarly we can verify that
(abc ) ∉ N (ab) and (acb) ∉ N (ab).
Hence N (ab) = { I , (ab) }.
Now we shall show that N (ab) is not a normal subgroup of S3 . Take the element
(bc ) ∈ S3 and the element (ab) ∈ N (ab). We have
(bc ) (ab) (bc ) −1 = (bc ) (ab) (cb) = (abc ) (cb) = (ac ) ∉ N (ab).
Therefore N (ab) is not a normal subgroup of S3 .
Example 22:. Let Z denote the centre of a group G. If G / Z is cyclic prove that G is
abelian.
Solution: It is given that G / Z is cyclic. Let Zg be a generator of the cyclic group
G / Z where g is some element of G.
Let a, b ∈ G. Then to prove that ab = ba. Since a ∈ G, therefore Z a ∈ G / Z . But
G / Z is cyclic having Z g as a generator. Therefore there exists some integer m such
that Z a = (Z g) m = Z g m, because Z is a normal subgroup of G. Now a ∈ Za.
Therefore
Z a = Z g m ⇒ a ∈ Z g m ⇒ a = z1 g m for some z1 ∈ Z .
Similarly b = z 2 g n where z 2 ∈ Z and n is some integer.
Now ab = (z1 g m ) (z 2 g n ) = z1 g m z 2 g n
= z1 z 2 g m g n [∵ z 2 ∈ Z ⇒ z 2 g m = g m z 2 ]
= z1 z 2 g m + n .
Again ba = z 2 g n z1 g m = z 2 z1 g n g m = z 2 z1 g n + m = z1 z 2 g m + n .
[∵ z1 ∈ Z ⇒ z1 z 2 = z 2 z1 ]
∴ ab = ba.
Since ab = ba V a, b ∈ G, therefore G is abelian.
Example 23: If p is a prime number and G is a non-abelian group of order p³, show that the
centre of G has exactly p elements.
Solution: Let Z denote the centre of G. Since o(G) = p³ where p is a prime number,
therefore Z ≠ {e} i.e., o(Z) > 1. But Z is a subgroup of G, therefore o(Z) must be a
divisor of o(G) i.e., o(Z) must be a divisor of p³. Since p is prime, therefore
o(Z) = p, p² or p³.
If o(Z) = p³ = o(G), then Z = G and so G is abelian, which contradicts the
hypothesis that G is non-abelian. So o(Z) cannot be p³.
If o(Z) = p², then o(G/Z) = o(G)/o(Z) = p³/p² = p i.e., G/Z is a group of
prime order p and so is cyclic. But if G/Z is cyclic, then G is abelian, which again
contradicts the hypothesis. So o(Z) cannot be p².
Hence the only possibility is that o (Z ) = p i. e., the centre of G has exactly p
elements.
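A concrete check of Example 23 with p = 2 (a sketch of ours): the dihedral group of order 8 = 2³ is non-abelian, and its centre turns out to have exactly 2 elements. Elements are modelled as pairs (i, j) standing for rⁱsʲ with the relation s r = r⁻¹ s.

def mul(a, b):                     # product in D4: r^i s^j, with s r = r^(-1) s
    (i1, j1), (i2, j2) = a, b
    i = (i1 + (-i2 if j1 else i2)) % 4
    return (i, (j1 + j2) % 2)

D4 = [(i, j) for i in range(4) for j in range(2)]            # order 8 = 2^3
assert any(mul(a, b) != mul(b, a) for a in D4 for b in D4)   # non-abelian
centre = [z for z in D4 if all(mul(z, x) == mul(x, z) for x in D4)]
print(len(centre))                 # 2, namely (0, 0) = e and (2, 0) = r^2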
Assuming that the theorem is true for groups of order less than that of G, we shall
show that it is also true for G. To start the induction we see that the theorem is
obviously true if o (G ) = 1 .
Let o (G ) = pm n, where p is not a divisor of n. If m = 0, the theorem is obviously true.
If m = 1, the theorem is true by Cauchy’s theorem. So let m > 1. Then G is a group of
Example 25: If a group G has only one p-Sylow subgroup H, then H is normal in G.
Solution: Suppose a group G has only one p-Sylow subgroup H. Let x be any
element of G. Then by previous example, x −1 Hx is also a p -Sylow subgroup of G.
But H is the only p -Sylow subgroup of G. Therefore
x −1 Hx = H V x ∈ G ⇒ H is a normal subgroup of G.
Comprehensive Exercise 2
Answers 2
5. f₁ = I, f₂ = (12), f₃ = (23), f₄ = (31), f₅ = (123), f₆ = (132);
C(f₁) = {f₁}, C(f₂) = C(f₃) = C(f₄) = {f₂, f₃, f₄},
C(f₅) = C(f₆) = {f₅, f₆}.
True or False
Write ‘T’ for true and ‘F’ for false statement.
1. The relation of conjugacy is an equivalence relation on the group G.
2. If a ∈ G, then the set N (a) = { x ∈ G : a x = xa } is always a normal subgroup
of G.
3. If o (G) = p n where p is a prime number, then the centre Z = { e }.
4. A group of order 121 is abelian.
A nswers
True or False
1. T. 2. F. 3. F. 4. T.
Rings
Let S + be the set of all positive integers in S. Since S + is not empty, therefore by the
well ordering principle S + must possess a least positive integer. Let s be this least
element. We will now show that S is the principal ideal generated by s i.e., S = (s).
Suppose now that n is any integer in S. Then by division algorithm, there exist
integers q and r such that n = qs + r with 0 ≤ r < s.
Now s ∈ S, q ∈ I ⇒ qs ∈ S [ ∵ S is an ideal ]
and n ∈ S, qs ∈ S ⇒ n − qs ∈ S
[ ∵ S is a subgroup of the additive group of I ]
⇒ r ∈S [∵ n − qs = r ]
But 0 ≤ r < s and s is the least positive integer such that s ∈ S. Hence r must be 0.
∴ n = qs.
Thus n ∈ S ⇒ n = qs for some q ∈ I.
Hence S is a principal ideal of I generated by s.
Since S was an arbitrary ideal in the ring of integers, therefore the ring of integers is
a principal ideal ring.
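The proof above is constructive: the least positive element of a non-zero ideal S of the integers generates S. For the ideal of all integer combinations ax + by of two fixed integers, that least positive element is their g.c.d., as the following sketch of ours suggests (it only samples combinations in a finite window, which is an assumption of the sketch, not of the theorem):

a, b = 84, 30
S = {a * x + b * y for x in range(-20, 21) for y in range(-20, 21)}
s = min(c for c in S if c > 0)
print(s)                                # 6, the g.c.d. of 84 and 30
assert all(c % s == 0 for c in S)       # every sampled element lies in (s)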
Theorem 2: Every field is a principal ideal ring.
Proof: A field has no proper ideals. The only ideals of a field are (i) the null ideal
which is a principal ideal generated by 0 and (ii) the field itself which is also a
principal ideal generated by 1. Thus a field is always a principal ideal ring.
element of the ring. The symbol ‘+' connecting various terms in f(x) has no
connection with the addition of the ring. This symbol has been used here only to
connect different terms. Also x is not an element of R. The powers of x have nothing
to do with the powers of an element of R. The different powers of x only tell us the
ordered place of different coefficients. There is no harm if we represent this
polynomial f(x) by the infinite ordered set (a₀, a₁, a₂, ……), where
a₀, a₁, a₂, …… are elements of R and only a finite number of them are not equal to
zero. Since from high school onwards we have represented a polynomial with an
indeterminate x, we have preferred this way of representing polynomials.
Remark: The polynomial a0 x 0 + a1 x + a2 x 2 + ……over a ring R can also be
written as
a0 + a1 x + a2 x 2 + …
Set of all polynomials over a ring : Let R be an arbitrary ring and x an
indeterminate. The set of all polynomials f ( x) ,
f(x) = Σ_{n=0}^{∞} aₙxⁿ = a₀x⁰ + a₁x + a₂x² + …
where the aₙ's are elements of the ring R and only a finite number of them are not equal to zero,
is called R [ x ] .
We shall make a ring out of R [ x ]. Then R [ x ] will be called the ring of all
polynomials over the ring R. For this we shall define equality, addition and
multiplication of two elements of R [ x ] .
Definition:
Suppose R is an arbitrary ring and f ( x) = a0 x 0 + a1 x + a2 x 2 + a3 x 3 + … and
g ( x) = b0 x 0 + b1 x + b2 x 2 + b3 x 3 + … are any elements of R [ x]. Then
(a) f(x) = g(x) if and only if aₙ = bₙ ∀ non-negative integer n. Thus two
polynomials are equal iff their corresponding coefficients are equal.
(b) f ( x) + g ( x) = c 0 x 0 + c1 x + c 2 x 2 + c 3 x 3 + …
where c n = a n + b n for every non-negative integer n. Thus in order to add two
polynomials we should add the coefficients of like powers of x.
Since cₙ ∈ R and only a finite number of the cₙ's can be different from zero, therefore
f ( x) + g ( x) is an element of R [ x]. Thus R [ x] is closed with respect to
addition of polynomials as defined above.
(c) f ( x) g ( x) = d0 x 0 + d1 x + d2 x 2 + d3 x 3 + …
where dₙ = a₀bₙ + a₁bₙ₋₁ + a₂bₙ₋₂ + … + aₙb₀
for every non-negative integer n. Thus dₙ is the sum of all
the products of the type aᵢbⱼ with i and j non-negative integers whose sum is n.
Since dₙ ∈ R and only a finite number of the dₙ's can be different from zero, therefore
f ( x) g ( x) is an element of R [ x]. Thus R [ x] is closed with respect to multiplication
of polynomials as defined above.
We have d0 = a0 b0 , d1 = a0 b1 + a1 b0 , d2 = a0 b2 + a1 b1 + a2 b0 ,
d3 = a0 b3 + a1 b2 + a2 b1 + a3 b0 and so on.
Therefore in order to multiply two polynomials f ( x) and g ( x), we should first write
f ( x) g ( x) = (a0 x 0 + a1 x + a2 x 2 + …) (b0 x 0 + b1 x + b2 x 2 + …).
Now we should multiply different powers of the indeterminate x and using the
relation x i x j = x i + j we should collect coefficients of different powers of x.
Zero Polynomial: The polynomial
f ( x) = Σ a n x n = a0 x 0 + a1 x + a2 x 2 + a3 x 3 + …
in which all the coefficients a0 , a1 , a2 ,…… are equal to 0 is called the zero
polynomial over the ring R.
Degree of a polynomial:
Let f(x) = a₀x⁰ + a₁x + a₂x² + a₃x³ + … + aₙxⁿ + …
be a polynomial over an arbitrary ring R. We say that n is the degree of the polynomial f ( x) if
and only if a n ≠ 0 and a m = 0 for all m > n. We shall write deg f ( x) to denote the
degree of f ( x). Thus the degree of f ( x) is the largest non-negative integer i for
which the ith coefficient of f ( x) is not 0. If in the polynomial f ( x), a0 (i.e., the
coefficient of x 0 ) is not 0 and all the other coefficients are 0, then according to our
definition, the degree of f ( x) will be zero. Also according to our definition, if there
is no non-zero coefficient in f ( x), then its degree will remain undefined. Thus we
do not define the degree of the zero polynomial. Also it is obvious that every
non-zero polynomial will possess a unique degree.
Note: If f ( x) = a0 x 0 + a1 x + a2 x 2 + … + a n x n + … is a polynomial of degree
n i. e., if a n ≠ 0 and a m = 0 for all m > n, then it is convenient to write
f(x) = Σ_{i=0}^{n} aᵢxⁱ = a₀x⁰ + a₁x + a₂x² + … + aₙxⁿ.
It will remain understood that all the terms in f ( x) which follow the term a n x n ,
have zero coefficients. Also we shall call a n x n as the leading term and a n as the
leading coefficient of the polynomial. The term a0 x 0 is called the constant term
and a0 is called the zero th coefficient of f ( x).
For example f ( x) = 2 x 0 + 3 x − 4 x 2 + 4 x 3 − 8 x 4 is a polynomial of degree 4 over
the ring of integers. Here − 8 is the leading coefficient and 2 is the zero th
coefficient. The coefficients of all terms which contain powers of x greater than 4
will be regarded as zero. Similarly g(x) = 3x⁰ is a polynomial of degree zero over
the ring of integers. In this polynomial the coefficients of x, x², x³, … are all equal to
zero.
Now let
f ( x) = Σ a i x i = a0 x 0 + a1 x + a2 x 2 + …,
g ( x) = b0 x 0 + b1 x + b2 x 2 + …,
h ( x) = c 0 x 0 + c1 x + c 2 x 2 + … be any arbitrary elements of R [ x].
Commutativity of addition: We have
f(x) + g(x) = (a₀ + b₀)x⁰ + (a₁ + b₁)x + (a₂ + b₂)x² + …
= (b₀ + a₀)x⁰ + (b₁ + a₁)x + (b₂ + a₂)x² + … = g(x) + f(x).
Associativity of addition: We have
[ f ( x) + g ( x)] + h ( x) = Σ (a i + b i ) x i + Σ c i x i = Σ [(a i + b i ) + c i ] x i
= Σ [a i + (b i + c i )] x i = Σ a i x i + Σ (b i + c i ) x i
= f ( x) + [ g ( x) + h ( x)].
Existence of additive identity: Let 0 ( x) be the zero polynomial over R i.e.,
0 ( x) = 0 x 0 + 0 x + 0 x 2 + …
Then f ( x) + 0 ( x) = (a0 + 0) x 0 + (a1 + 0) x + (a2 + 0) x 2 + …
= a0 x 0 + a1 x + a2 x 2 + … = f ( x).
∴ the zero polynomial 0 ( x) is the additive identity.
Existence of additive inverse: Let − f ( x) be the polynomial over R defined as
− f ( x) = (− a0 ) x 0 + (− a1 ) x + (− a2 ) x 2 + …
Then − f ( x) + f ( x) = (− a0 + a0 ) x 0 + (− a1 + a1 ) x + (− a2 + a2 ) x 2 + …
= 0 x 0 + 0 x + 0 x 2 + … = 0 ( x) = the additive identity.
∴ each member of R [ x] possesses additive inverse.
Associativity of Multiplication: We have
f(x) g(x) = (a₀x⁰ + a₁x + a₂x² + …) (b₀x⁰ + b₁x + b₂x² + …)
= d₀x⁰ + d₁x + d₂x² + … + d_l x^l + …,
where d_l = Σ_{i+j=l} aᵢbⱼ.
Now [f(x) g(x)] h(x) = (d₀x⁰ + d₁x + d₂x² + …) (c₀x⁰ + c₁x + c₂x² + …)
= e₀x⁰ + e₁x + e₂x² + … + eₙxⁿ + …,
where eₙ = the coeff. of xⁿ in [f(x) g(x)] h(x)
= Σ_{l+k=n} d_l c_k = Σ_{l+k=n} [(Σ_{i+j=l} aᵢbⱼ) c_k] = Σ_{i+j+k=n} aᵢbⱼc_k.
Example 1: Show that if a ring R has no zero divisors, then the ring R [ x] has also no zero
divisors.
Solution: It is given that a ring R has no zero divisors and we have to show that the
ring R [ x] has also no zero divisors.
Let f ( x) = a0 + a1 x + a2 x 2 + .... + a m x m , a m ≠ 0
and g ( x) = b0 + b1 x + b2 x 2 + … + b n x n , b n ≠ 0
be two non-zero elements of R [ x].
Then f ( x) g ( x) cannot be the zero polynomial i.e., the zero element of R [ x]. The
reason is that at least one coefficient of f ( x) g ( x) namely a m b n of x m + n is ≠ 0
because a m , b n are non-zero elements of R and R is without zero divisors.
Thus in R [ x] the product of no two non-zero elements can be the zero element.
Hence the ring R [ x] has no zero divisors.
Example 2: Consider the following polynomials over the ring (I₈, +₈, ×₈):
f(x) = 2 + 6x + 4x², g(x) = 2x + 4x², h(x) = 2 + 4x
and find (i) deg [f(x) + g(x)], (ii) deg [f(x) g(x)], (iii) deg [h(x) h(x)].
Solution: (i) We have f(x) + g(x) = (2 + 6x + 4x²) + (0 + 2x + 4x²)
= (2 +₈ 0) + (6 +₈ 2)x + (4 +₈ 4)x² = 2 + 0x + 0x² = 2.
Thus f(x) + g(x) is a non-zero constant polynomial and so
deg [f(x) + g(x)] = 0.
(ii) We have f(x) g(x) = (2 + 6x + 4x²)(2x + 4x²)
= (2 ×₈ 2)x + [(2 ×₈ 4) +₈ (6 ×₈ 2)]x²
+ [(6 ×₈ 4) +₈ (4 ×₈ 2)]x³ + (4 ×₈ 4)x⁴
= 4x + (0 +₈ 4)x² + (0 +₈ 0)x³ + 0x⁴
= 4x + 4x² + 0x³ + 0x⁴ = 4x + 4x².
∴ deg [f(x) g(x)] = 2.
(iii) We have h(x) h(x) = (2 + 4x)(2 + 4x)
= (2 ×₈ 2) + [(2 ×₈ 4) +₈ (4 ×₈ 2)]x + (4 ×₈ 4)x²
= 4 + (0 +₈ 0)x + 0x² = 4 + 0x + 0x² = 4.
Thus h(x) h(x) is a non-zero constant polynomial and so deg [h(x) h(x)] = 0.
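Example 2 can be reproduced mechanically by treating a polynomial over I₈ as its list of coefficients. A sketch of ours:

MOD = 8                                 # coefficients live in I_8

def padd(f, g):
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(x + y) % MOD for x, y in zip(f, g)]

def pmul(f, g):                         # convolution of coefficients mod 8
    d = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            d[i + j] = (d[i + j] + x * y) % MOD
    return d

def deg(f):                             # None for the zero polynomial
    nz = [i for i, x in enumerate(f) if x]
    return max(nz) if nz else None

f = [2, 6, 4]                           # 2 + 6x + 4x^2
g = [0, 2, 4]                           # 2x + 4x^2
h = [2, 4]                              # 2 + 4x
print(deg(padd(f, g)), deg(pmul(f, g)), deg(pmul(h, h)))   # 0 2 0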
(a) any non-zero element in R is either a unit or can be written as the product of a finite
number of irreducible (prime) elements of R;
(b) the decomposition in part (a) is unique upto the order and associates of the irreducible
elements.
Thus if R is a unique factorization domain and if a ≠ 0 is a non-unit in R, then a can
be expressed as a product of a finite number of prime elements of R. Also if
a = p₁p₂p₃ … pₙ = p₁′p₂′p₃′ … pₘ′
where the pᵢ and pⱼ′ are prime elements of R, then m = n and each pᵢ, 1 ≤ i ≤ n, is an
associate of some pⱼ′, 1 ≤ j ≤ m, and conversely each pₖ′ is an associate of some pₗ.
p1 ( x) p2 ( x) … pm ( x) = q1 ( x) q2 ( x) … q n ( x).
Now p1 ( x)| p1 ( x) p2 ( x) … pm ( x).
Therefore p1 ( x)| q1 ( x) q2 ( x) … q n ( x).
By corollary to theorem 2 of this article p1 ( x) must divide at least one of
q1 ( x), q2 ( x), … , q n ( x). Since F [ x ] is a commutative ring, therefore without loss of
generality we may suppose that p1 ( x) divides q1 ( x). But p1 ( x) and q1 ( x) are both
irreducible polynomials in F [ x ] and p1 ( x)| q1 ( x). Therefore p1 ( x) and q1 ( x) must
be associates and we have q1 ( x) = u p1 ( x) where u is a unit in F [ x ] i. e., u is a
non-zero element of F. Since q1 ( x) and p1 ( x) are monic therefore u must be equal to
1 and we have p1 ( x) = q1 ( x). Thus we have
p1 ( x) p2 ( x) … pm ( x) = p1 ( x) q2 ( x) … q n ( x).
Cancelling 0 ≠ p1 ( x) from both sides, we get
p2 ( x) p3 ( x) … pm ( x) = q2 ( x) q3 ( x) … q n ( x) ...(1)
Now we can repeat the above argument on the relation (1) with p2 ( x). If n > m, then
after m steps the left hand side becomes 1 while the right hand side reduces to a
product of a certain number of q( x) (the excess of n over m). But the q( x) are
irreducible polynomials so they are not units of F [ x ] i. e., they are not
polynomials of zero degree.
So their product will be a polynomial of degree ≥ 1. So it cannot be equal to 1.
Therefore n cannot be greater than m. Then n ≤ m. Similarly interchanging the roles
of p( x) and q ( x), we get m ≤ n. Hence m = n.
Also in the above process we have shown that every p( x) is equal to some q ( x) and
conversely every q ( x) is equal to some p( x). Hence the theorem has been completely
established.
Thus we can say that the ring of polynomials over a field is a unique factorization domain.
Illustration: Show that the polynomial x² + x + 4 is irreducible over F, the field of integers
modulo 11.
Solution: The field F is ({0, 1, ..., 10}, +₁₁, ×₁₁).
Let f(x) = x² + x + 4.
If a ∈ F, then by aⁿ we shall mean a ×₁₁ a ×₁₁ … upto n factors.
Now f(0) = 0² +₁₁ 0 +₁₁ 4 = 4, f(1) = 1² +₁₁ 1 +₁₁ 4 = 6,
f(2) = 2² +₁₁ 2 +₁₁ 4 = 10, f(3) = 3² +₁₁ 3 +₁₁ 4 = 5, f(4) = 2,
f(5) = 1, f(6) = 6² +₁₁ 6 +₁₁ 4 = 2, f(7) = 5, f(8) = 10, f(9) = 6,
f(10) = 4.
Since f(a) ≠ 0 ∀ a ∈ F, therefore by the factor theorem x − a does not divide
f(x) for any a ∈ F. Thus f(x), being of degree 2, has no proper divisors in F[x]. Hence f(x) is
irreducible over F.
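The table of values in this Illustration is easy to regenerate (a sketch of ours): since f(x) = x² + x + 4 is quadratic, showing that it has no root in Z₁₁ already rules out every linear factor x − a.

p = 11
values = {a: (a * a + a + 4) % p for a in range(p)}
print(values)                    # reproduces f(0) = 4, f(1) = 6, ..., f(10) = 4
assert 0 not in values.values()  # no root, hence no linear factor x - a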
Comprehensive Exercise 1
Answers 1
1. x⁴ + 4 = (x + 1)(x + 2)(x + 3)(x + 4).
2. x² + 1 = (x + 2)(x + 3).
3. x = 3, because 3 ×₇ 3 = 2.
4. Not irreducible over the reals.
5. (i) 3x³ + 4x².
(ii) 1 + 4x² + 2x³ + x⁴ + 2x⁵ + x⁶ + 3x⁷ + 4x⁸ + x⁹.
6. (i) 2.
(ii) 4 + 4x + 4x² + x³ + 3x⁴ + 4x⁷ + 6x⁸ + x¹⁰.
(iii) 3x⁷ + 5x³ + 4x + 2.
7. (i) f(x) + g(x) = x⁶ + 3x⁵ + 5x² + 6x + 6.
f(x) + g(x) = x⁸ + 5x⁷ + 3x⁶ + 5x⁵ + 4x⁴ + 5x³ + 5x² + 6x + 1.
Note that in Z₇ we have −3 = 4, −1 = 6, etc.
(ii) q(x) = x⁴ + x³ + x² + x + 5, r(x) = 4x + 3.
Theorem 1: Let R be a commutative ring and S an ideal of R. Then the ring of residue
classes R / S is an integral domain if and only if S is a prime ideal.
Proof: Let R be a commutative ring and S an ideal of R. Then
R / S = { S + a : a ∈ R }.
Let S + a, S + b be any two elements of R / S. Then a, b ∈ R.
We have (S + a) (S + b) = S + ab
= S + ba [∵ R is a commutative ring ]
= (S + b) (S + a).
∴ R / S is a commutative ring.
Now let S be a prime ideal of R. Then we are to prove that R / S is an integral
domain. For this we are to show that R / S is without zero divisors. The zero
element of the ring R / S is the residue class S itself. Let S + a, S + b be any two
elements of R / S.
Then (S + a) (S + b) = S ( the zero element of R / S)
⇒ S + ab = S ⇒ ab ∈ S
⇒ either a or b is in S, since S is a prime ideal
⇒ either S + a = S or S + b = S [Note that a ∈ S ⇔ S + a = S]
⇒ either S + a or S + b is the zero element of R / S.
∴ R / S is without zero divisors.
Since R / S is a commutative ring without zero divisors, therefore R / S is an
integral domain.
Conversely, let R / S be an integral domain. Then we are to prove that S is a prime
ideal of R. Let a, b be any two elements in R such that ab ∈ S. We have
ab ∈ S ⇒ S + ab = S ⇒ (S + a) (S + b) = S.
Since R / S is an integral domain, therefore it is without zero divisors. Therefore
(S + a) (S + b) = S (the zero element of R / S)
⇒ either S + a or S + b is zero ⇒ either S + a = S or S + b = S
⇒ either a ∈ S or b ∈ S ⇒ S is a prime ideal.
This completes the proof of the theorem.
Note: If R is a ring with unity, then R / S is also a ring with unity. The residue class
S + 1 is the unity element of R / S. Therefore if we define an integral domain as a
commutative ring with unity and without zero divisors, even then the above
theorem will be true. But in that case R must be a commutative ring with unity.
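Theorem 1 in miniature (an illustration of ours): in the ring of integers the ideal (n) is prime exactly when Z/nZ has no zero divisors; compare n = 7 with n = 6.

def has_zero_divisors(n):        # does Z/nZ contain a, b != 0 with ab = 0?
    return any((a * b) % n == 0 for a in range(1, n) for b in range(1, n))

print(has_zero_divisors(7))      # False: (7) is prime, Z/7Z is a domain
print(has_zero_divisors(6))      # True:  2 * 3 = 0 in Z/6Z, (6) is not prime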
Theorem 2: Let R be a commutative ring with unity. Then every maximal ideal of R is a
prime ideal.
Proof: R is a commutative ring with unit element. Let S be a maximal ideal of
R.Then R / S is a field.
Now every field is an integral domain. Therefore R / S is also an integral domain.
Hence by theorem 1, S is a prime ideal of R. This completes the proof of the
theorem.
But it should be noted that the converse of the above theorem is not true i.e., a
prime ideal is not necessarily a maximal ideal.
Example 3: Let R be the field of real numbers and S the set of all those polynomials
f ( x) ∈ R [ x ] such that f (0) = 0 = f (1). Prove that S is an ideal of R [ x ]. Is the
residue class ring R [ x ] / S an integral domain? Give reasons for your answer.
Solution: Let f ( x), g( x) be any elements of S. Then
f (0) = 0 = f (1) and g (0) = 0 = g (1).
Let h ( x) = f ( x) − g ( x).
Then h (0) = f (0) − g (0) = 0 − 0 = 0 and h (1) = f (1) − g (1) = 0 − 0 = 0.
Thus h (0) = 0 = h (1). Therefore h( x) ∈ S.
Thus f ( x), g ( x) ∈ S ⇒ h ( x) = f ( x) − g ( x) ∈ S.
Further let f ( x) be any element of S and r ( x) be any element of R [ x ].
Then f (0) = 0 = f (1), by definition of S.
Let t ( x) = r ( x) f ( x) = f ( x) r ( x). [∵ R [ x ] is a commutative ring ]
Then t (0) = r (0) f (0) = r (0) . 0 = 0
and t (1) = r (1) f (1) = r (1) . 0 = 0.
∴ t ( x) ∈ S.
Thus r ( x) ∈ R [ x ], f ( x) ∈ S ⇒ r ( x) f ( x) ∈ S.
Hence S is an ideal of R [ x ].
Now we claim that S is not a prime ideal of R [ x ]. Let f ( x) = x ( x − 1). Then
f (0) = 0 (0 − 1) = 0, and f (1) = 1(1 − 1) = 0.
Thus f ( x) = x ( x − 1) is an element of S.
Now let p( x) = x, q ( x) = x − 1.
We have p(1) = 1 ≠ 0. Therefore p( x) ∉ S. Also q (0) = 0 − 1 = − 1 ≠ 0.
Therefore q ( x) ∉ S. Thus x ( x − 1) ∈ S while neither x ∈ S nor x − 1∈ S. Hence S is
not a prime ideal of R [ x ].
Since S is not a prime ideal of R [ x ], therefore the residue class ring R [ x ] / S is
not an integral domain.
Example 4: Let R be the ring of all real valued continuous functions defined on the closed
interval [0, 1]. Let M = { f(x) ∈ R : f(1/3) = 0 }.
Show that M is a maximal ideal of R.
Solution: First of all we observe that M is non-empty because the real valued
function e ( x) on [0, 1] defined by
e ( x) = 0 V x ∈ [0, 1]
belongs to M.
Now let f(x), g(x) be any two elements of M. Then
f(1/3) = 0, g(1/3) = 0, by definition of M.
Let h(x) = f(x) − g(x).
Then h(1/3) = f(1/3) − g(1/3) = 0 − 0 = 0.
Therefore h( x) ∈ M.
Thus f ( x), g ( x) ∈ M ⇒ h ( x) = f ( x) − g ( x) ∈ M.
Further let f(x) be any element of M and r(x) be any element of R. Then
f(1/3) = 0, by definition of M.
Let t(x) = r(x) f(x) = f(x) r(x). [∵ R is a commutative ring]
Then t(1/3) = r(1/3) f(1/3) = r(1/3) · 0 = 0. Therefore t(x) ∈ M.
Thus r ( x) ∈ R, f ( x) ∈ M ⇒ r ( x) f ( x) ∈ M.
Hence M is an ideal of R.
Clearly M ≠ R because i( x) ∈ R given by i( x) = 1 V x ∈ [0, 1] does not belong to M.
The ring R is with unity and the element i( x) is its unity element.
Let N be an ideal of R properly containing M i.e., M ⊆ N and M ≠ N. Then M
will be a maximal ideal of R if N = R, which will be so if the unity i(x) of R
belongs to N. Since M is a proper subset of N, therefore there exists λ(x) ∈ N such
that λ(x) ∉ M. This means λ(1/3) ≠ 0. Put λ(1/3) = c, where c ≠ 0.
Let us define β(x) ∈ R by β(x) = c ∀ x ∈ [0, 1]. Now consider µ(x) ∈ R given by
µ(x) = λ(x) − β(x).
We have µ(1/3) = λ(1/3) − β(1/3) = c − c = 0.
Therefore µ( x) ∈ M and so µ ( x) also belongs to N because N is a super-set of M.
Now N is an ideal of R and λ ( x), µ ( x) are in N. Therefore λ ( x) − µ ( x) = β ( x) is also
an element of N.
Now define γ ( x) ∈ R by γ ( x) = 1 / c V x ∈ [0, 1]. Since N is an ideal of R, therefore
γ ( x) ∈ R and β ( x) ∈ N ⇒ γ ( x) β( x) ∈ N . We shall show that γ ( x) β ( x) = i( x).
For every x ∈ [0, 1], we have γ(x) β(x) = (1/c) c = 1.
Therefore γ ( x) β ( x) = i( x), by definition of i( x).
Thus the unity element i( x) of R belongs to N and consequently N = R.
Hence M is a maximal ideal of R.
Example 5: If R is a finite commutative ring (i.e., has only a finite number of elements) with
unit element, prove that every prime ideal of R is a maximal ideal of R.
Solution: Let R be a finite commutative ring with unit element. Let S be a prime
ideal of R. Then to prove that S is a maximal ideal of R.
Since S is a prime ideal of R, therefore the residue class ring R / S is an integral
domain. Now
R / S = { S + a : a ∈ R }.
Since R is a finite ring, therefore R / S is a finite integral domain. But every finite
integral domain is a field. Therefore R / S is a field. Since R is a commutative ring
with unity and R / S is a field, therefore S is a maximal ideal of R.
Example 6: Give an example of a ring in which some prime ideal is not a maximal ideal.
Solution: Let I [ x ] be the ring of polynomials over the ring of integers I. Let S be the
principal ideal of I [ x ] generated by x i. e., let S = ( x).We shall show that ( x) is prime
but not maximal.
We have S = ( x) = { x f ( x) : f ( x) ∈ I [ x ] } .
First we shall prove that S is prime.
Let a ( x), b ( x) ∈ I [ x ] be such that a ( x) b ( x) ∈ S. Then there exists a polynomial
c ( x) ∈ I [ x ] such that
x c ( x) = a ( x) b ( x). ...(1)
Let a(x) = a₀ + a₁x + a₂x² + …, b(x) = b₀ + b₁x + b₂x² + …,
c(x) = c₀ + c₁x + c₂x² + … .
Then (1) becomes
x (c 0 + c1 x + …) = (a0 + a1 x + …) (b0 + b1 x + … ).
Equating the constant term on both sides, we get
a₀b₀ = 0 ⇒ a₀ = 0 or b₀ = 0. [∵ I is without zero divisors]
Now a₀ = 0 ⇒ a(x) = a₁x + a₂x² + …
⇒ a(x) = x(a₁ + a₂x + …) ⇒ a(x) ∈ (x).
Similarly b₀ = 0 ⇒ b(x) = b₁x + b₂x² + …
⇒ b(x) = x(b₁ + b₂x + …) ⇒ b(x) ∈ (x).
Thus a ( x) b ( x) ∈ ( x) ⇒ either a ( x) ∈ ( x) or b ( x) ∈ ( x).
Hence ( x) is a prime ideal.
Now we shall show that ( x) is not a maximal ideal of I [ x ]. For this we must show an
ideal N of I [ x ] such that ( x) is properly contained in N, while N itself is properly
contained in I [ x ]. The ideal N = ( x, 2) serves this purpose.
Obviously ( x) ⊆ ( x, 2). In order to show that ( x) is properly contained in ( x, 2) we
must show an element of ( x, 2) which is not in ( x). Clearly 2 ∈ ( x, 2). We shall show
that 2 ∉( x). Let 2 ∈ ( x). Then we can write,
2 = x f ( x) for some f ( x) ∈ I [ x ].
Let f ( x) = a0 + a1 x + …
Then 2 = x f ( x) ⇒ 2 = x (a0 + a1 x + ...)
⇒ 2 = a0 x + a1 x 2 + … ⇒ 2 = 0 + a0 x + a1 x 2 + …
⇒ 2=0 [by equality of two polynomials]
But 2 ≠ 0 in the ring of integers. Hence 2 ∉( x). Thus ( x) is properly contained in
( x, 2).
Now obviously ( x, 2) ⊆ I [ x ]. In order to show that ( x, 2) is properly contained in
I[x] we must show an element of I[x] which is not in (x, 2). Clearly 1 ∈ I[x]. We
shall show that 1 ∉( x, 2). Let 1 ∈ ( x, 2). Then we have a relation of the form
1 = x f ( x) + 2 g( x), where f ( x), g( x) ∈ I [ x ].
Let f ( x) = a0 + a1 x + … , g( x) = b0 + b1 x + …
Then 1 = x (a0 + a1 x + …) + 2 (b0 + b1 x + ...)
⇒ 1 = 2 b0 [Equating constant term on both sides]
But there is no integer b0 such that 1 = 2 b0 .
Hence 1 ∉( x, 2). Thus ( x, 2) is properly contained in I [ x ].
Therefore ( x) is not a maximal ideal of I [ x ].
[d (− 5) = | − 5 | = 5, d (− 1) = | − 1| = 1, d (4) = | 4 | = 4 etc.]
Further if a, b ∈ I and are both non-zero, then
| ab | = | a || b | ⇒ | ab | ≥ | a | [ ∵ | b | ≥ 1 if 0 ≠ b ∈ I ]
⇒ d (ab) ≥ d (a).
Finally we know that if a ∈ I and 0 ≠ b ∈ I, then there exist two integers q and r
such that
a = qb + r where 0 ≤ r < | b |
i. e., where either r = 0 or 1≤ r < | b |
i. e., where either r = 0 or d (r) < d (b).
It should be noted that d (b) = | b | and if r is a positive integer then r = | r | = d (r).
Therefore the ring of integers is a Euclidean ring.
Illustration 2: The ring of polynomials over a field is a Euclidean ring.
Solution: Let F [ x ] be the ring of polynomials over a field F. Let the d function on
the non-zero polynomials in F [ x ] be defined as
d [ f ( x)] = deg f ( x), V 0 ≠ f ( x) ∈ F [ x ].
Now if 0 ≠ f ( x) ∈ F [ x ], then deg f ( x) is a non-negative integer.
Thus we have assigned a non-negative integer to every non-zero element f ( x) in
F [ x ].
Further if f ( x), g( x) ∈ F [ x ] and are both non-zero polynomials, then
deg [ f ( x) g ( x)] = deg f ( x) + deg g ( x)
⇒ deg [ f ( x) g ( x)] ≥ deg f ( x) [∵ deg g ( x) ≥ 0 ]
⇒ d [ f ( x) g ( x)] ≥ d [ f ( x)].
Finally we know that if f ( x) ∈ F [ x ] and 0 ≠ g( x) ∈ F [ x ], then there exist two
polynomials q ( x) and r ( x) in F [ x ] such that
f ( x) = q ( x) g ( x) + r ( x)
where either r ( x) = 0 or deg r ( x) < deg g ( x)
i. e., where either r ( x) = 0 or d [r ( x)] < d [ g ( x)].
Hence the ring of polynomials over a field is a Euclidean ring .
Illustration 3: Every field is a Euclidean ring.
Solution: Let F be any field. Let the d function on the non-zero elements of F be
defined as
d (a) = 0 V 0 ≠ a ∈ F.
Thus we have assigned the integer zero to every non-zero element in F.
If a and b are non-zero elements in F then ab is also a non-zero element in F. We
have therefore
d (ab) = 0 = d (a).
Thus we have d (ab) ≥ d (a).
Finally if a ∈ F and 0 ≠ b ∈ F, then we can write
a = (ab⁻¹)b + 0
i.e., a = qb + r, where q = ab⁻¹ and r = 0.
Hence every field is a Euclidean ring.
Illustration 4: The ring of Gaussian integers is a Euclidean ring.
Solution: Let (G, +, .) be the ring of Gaussian integers where
G = { x + iy : x, y ∈ I }.
Let the d function on the non-zero elements of G be defined as
d(x + iy) = x² + y², ∀ 0 + i0 ≠ x + iy ∈ G.
Now if x + iy is a non-zero element of G, then x² + y² is a non-negative integer.
Thus we have assigned a non-negative integer to every non-zero element of G.
If x + iy and m + in are two non-zero elements of G, then
d[(x + iy)(m + in)] = d[(xm − ny) + i(my + xn)]
= (xm − ny)² + (my + xn)² = x²m² + n²y² + m²y² + x²n²
= (x² + y²)(m² + n²)
≥ x² + y². [∵ m² + n² ≥ 1]
Thus d[(x + iy)(m + in)] ≥ d(x + iy).
Now to show the existence of division algorithm in G.
Let α ∈ G and let β be a non-zero element of G. Let α = x + iy and β = m + in.
Define a complex number λ by the equation
λ = α/β = (x + iy)/(m + in) = (x + iy)(m − in)/(m² + n²) = p + iq,
where p, q are rational numbers.
Here λ is not necessarily a Gaussian integer.
Also division by β is possible since β ≠ 0.
Let p′ and q ′ be the nearest integers to p and q respectively.
Then obviously | p − p′ | ≤ 1/2, | q − q′ | ≤ 1/2.
Let λ ′ = p′ + iq ′. Then λ ′ is a Gaussian integer.
Now λ = α/β ⇒ α = λβ ⇒ α = λ′β + λβ − λ′β.
Thus α = λ ′ β + (λ − λ ′) β. ...(1)
Since α, β, λ ′ are Gaussian integers, therefore from (1) it implies that (λ − λ ′) β is
also a Gaussian integer.
Now if p and q are integers then p = p′, q = q ′.
So λ − λ ′ = ( p − p′) + i(q − q ′) = 0 + i0. Thus (λ − λ ′) β = 0 + i0.
If p and q are not both integers, then (λ − λ′)β is a non-zero Gaussian integer and
we have d[(λ − λ′)β] = [(p − p′)² + (q − q′)²] d(β) ≤ (1/4 + 1/4) d(β) < d(β).
Thus in either case α = λ′β + r, where r = (λ − λ′)β and either r = 0 or d(r) < d(β).
Hence the ring of Gaussian integers is a Euclidean ring.
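The division step of Illustration 4 translates directly into code. The sketch below (ours, using Python's built-in complex numbers to model Gaussian integers, which is an assumption of the sketch) rounds α/β to the nearest Gaussian integer λ′ and checks that the remainder r = α − λ′β satisfies d(r) < d(β).

def divmod_gauss(alpha, beta):
    lam = alpha / beta                                # exact quotient p + iq
    lam = complex(round(lam.real), round(lam.imag))   # nearest Gaussian integer
    return lam, alpha - lam * beta

def d(z):                                             # d(x + iy) = x^2 + y^2
    return round(z.real) ** 2 + round(z.imag) ** 2

alpha, beta = complex(7, 3), complex(2, -1)
lam, r = divmod_gauss(alpha, beta)
assert alpha == lam * beta + r
assert r == 0 or d(r) < d(beta)
print(lam, r)                                         # (2+3j) (-0-1j)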
Proof: If the greatest common divisor of a and b is 1, then by theorem 3 there exist
elements λ and µ in R such that
1 = λ a + µ b. ...(1)
Multiplying both members of (1) by c, we get
c = λ a c + µ b c. ...(2)
But a | bc , so there exists an element q ∈ R such that bc = qa.
Substituting this value of bc in (2), we get
c = λ a c + µ q a = ( λ c + µ q ) a,
which shows that a is a divisor of c. Hence the theorem.
Theorem 5: If p is a prime element in the Euclidean ring R and p| ab where a, b ∈ R then p
divides at least one of a or b.
Proof: If p divides a, there is nothing to prove. So suppose that p does not
divide a. Since p is prime and p does not divide a, therefore p and a are
relatively prime i. e., the greatest common divisor of p and a is 1. Hence by
theorem 4, we get that p| b.
Corollary: If p is a prime element in the Euclidean ring R and p divides the product
a1 a2 … a n of elements in R , then p divides at least one of a1 , a2 , ..., a n .
The result follows immediately by repeated application of theorem 5.
Theorem 6: Let R be a Euclidean ring. Let a and b be two non-zero elements in R. Then
(i) if b is a unit in R, d (ab) = d (a).
(ii) if b is not a unit in R, d (ab) > d (a).
Proof: (i) By the definition of Euclidean ring, we have
d (ab) ≥ d (a). ...(1)
Now suppose that b is a unit in R.Then b is inversible and b −1 exists. We can write
a = (ab) b −1 .
∴ d (a) = d [(ab) b −1 ].
But by the definition of Euclidean ring, we have
d [(ab) b −1 ] ≥ d (ab).
∴ d (a) ≥ d (ab). ...(2)
From (1) and (2), we conclude that
d (ab) = d (a).
(ii) Suppose now that b is not a unit in R. Since a and b are non-zero elements
of the Euclidean ring R, therefore ab is also a non-zero element of R. Now a ∈ R
and 0 ≠ ab ∈ R, therefore by definition of Euclidean ring there exist
elements q and r in R such that
a = q(ab) + r ...(3)
where either r = 0 or d (r) < d (ab).
If r = 0, then
a = qab ⇒ a − qab = 0 ⇒ a(1 − qb) = 0
⇒ 1 − qb = 0 [∵ a ≠ 0 and R is free of zero divisors]
⇒ qb = 1 ⇒ b is inversible ⇒ b is a unit in R.
Thus we get a contradiction. Hence r cannot be zero. Therefore we must have
d (r) < d (ab) i. e., d (ab) > d (r). ...(4)
Also from (3), we have r = a − qab = a (1 − qb).
∴ d (r) = d [a(1 − qb)].
But d [a(1 − qb)] ≥ d (a).
∴ d (r) ≥ d (a). ...(5)
From (4) and (5), we conclude that d (ab) > d (a).
Theorem 7: The necessary and sufficient condition that the non-zero element a in the
Euclidean ring R is a unit is that
d (a) = d (1). (Lucknow 2011)
Proof: Let a be a unit in R. Then to prove that d (a) = d (1).
By the definition of Euclidean ring
d (1 a ) ≥ d (1) ⇒ d ( a ) ≥ d (1). ...(1)
Since a is a unit in R, therefore a⁻¹ exists and we have
1 = aa⁻¹ ⇒ d(1) = d(aa⁻¹).
But d ( aa −1 ) ≥ d ( a ).
∴ d (1) ≥ d ( a ). ...(2)
From (1) and (2), we conclude that d ( a ) = d (1).
Conversely let d ( a ) = d (1). Then to prove that a is a unit in R. If a is not a unit in R,
then by theorem 6, we have
d (1a ) > d (1) ⇒ d ( a ) > d (1).
Thus we get a contradiction. Hence a must be a unit in R.
Theorem 8: Let R be a Euclidean ring . Then every non-zero element in R is either a unit in
R or can be written as a product of a finite number of prime elements of R .
Proof: Let a be a non-zero element of R . We are to prove that either a is a unit in R
or it can be written as a product of a finite number of prime elements of R . We shall
prove the result by induction on d ( a ) i. e., by induction on the d- value of a.
Let us first start the induction. We have a = 1 a. Therefore d ( a ) ≥ d (1). Thus 1 is
an element in R which has the minimal d - value. If d ( a ) = d (1),then a is a unit in
R. [See theorem 7]. Thus the result of the theorem is true if d ( a ) = d (1) and so we
have started the induction.
Now assume as our induction hypothesis that the theorem is true for all non-zero
elements x ∈ R such that d ( x ) < d ( a ). Then we shall show that the theorem is
Theorem 10: An ideal S of the Euclidean ring R is maximal iff S is generated by some prime
element of R .
Proof: We know that every ideal of a Euclidean ring R is a principal ideal. Suppose
S is an ideal of R generated by p so that S = ( p). Now we are to prove that
(i) S is maximal if p is a prime element of R.
(ii) p is prime if S is maximal.
First we shall prove (i). Let p be a prime element of R such that ( p) = S . Let T be an
ideal of R such that S ⊆ T ⊆ R. Since T is also a principal ideal of R, so let T = (q)
where q ∈ R.
Now S ⊆ T ⇒ ( p) ⊆ (q) ⇒ p ∈ (q)
⇒ p = x q for some x ∈ R ⇒ q | p.
Since p is prime, therefore either q should be a unit in R or q should be an
associate of p.
If q is a unit in R, then T = (q) = R.
If q is an associate of p, then T = (q) = ( p) = S .
Thus either T = R or T = S .
Now we shall prove (ii). Let ( p) = S be a maximal ideal.We are to show that p is
prime. Let us suppose that p is composite i. e., p is not prime.
Let p = mn where neither m nor n is a unit in R.
Now p = mn ⇒ m | p ⇒ ( p) ⊆ ( m ).
But ( m ) ⊆ R. Therefore we have ( p) ⊆ ( m ) ⊆ R.
But ( p) is a maximal ideal, therefore we should have either
( m ) = ( p) or ( m ) = R.
If R = ( m ), then R ⊆ (m).
∴ 1 ∈ R ⇒ 1 ∈ (m) ⇒ 1 = ym for some y ∈ R
⇒ m is inversible ⇒ m is a unit in R.
Thus we get a contradiction.
If (m) = ( p), then m ∈ ( p). Therefore m = l p for some l ∈ R.
∴ p = mn = l pn = pln .
∴ p (1 − ln) = 0 ⇒ 1 − ln = 0 [∵ p ≠ 0 and R is without zero divisors]
⇒ ln = 1 ⇒ n is inversible ⇒ n is a unit in R.
This is again a contradiction. Hence p must be a prime element of R.
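Theorem 10 can be seen at work in the Euclidean ring Z, where the ideal S = (n) is maximal exactly when the quotient Z/(n) is a field. The sketch below is illustrative only; the helpers is_field and is_prime are ours, not the book's.

```python
# Theorem 10 specialised to the Euclidean ring Z: the ideal (n) is maximal
# exactly when Z/(n) is a field, i.e., when every non-zero class is invertible.

from math import gcd

def is_field(n):
    """Z/(n) is a field iff every non-zero residue has a multiplicative inverse."""
    return all(gcd(a, n) == 1 for a in range(1, n))

def is_prime(n):
    return n > 1 and all(n % k for k in range(2, int(n ** 0.5) + 1))

for n in range(2, 30):
    assert is_field(n) == is_prime(n)
print("for 2 <= n < 30, Z/(n) is a field exactly when n is prime")
```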
⇒ bc = k q₁ q₂ … qₛ. ...(3)
Since each element of R can be uniquely expressed as the product of a finite number
of prime elements of R, therefore each of the prime elements q₁, q₂, …, qₛ must
occur as a factor of either b or c. But none of q₁, q₂, …, qₛ can be a factor of b
because otherwise a and b will not remain relatively prime. Therefore each of
q₁, q₂, …, qₛ must be a factor of c. Hence
q₁ q₂ … qₛ is a divisor of c ⇒ a | c.
Note: In a similar manner we can prove that if a1 , … , a n are any n elements of a
unique factorization domain, they possess a greatest common divisor which will be
unique apart from the distinction between associates. Thus if g1 , g 2 are two
greatest common divisors of these n elements, then by the definition of greatest
common divisor, we have
g1 | g 2 and g2 | g1
⇒ g1 and g 2 are associates
⇒ g1 = u g 2 where u is a unit in R.
Thus the greatest common divisor of some elements is unique within units of R.
Theorem 2: If a is a prime element of a unique factorization domain R and b, c are
any elements of R, then
a | bc ⇒ a | b or a | c .
Proof: If a | b, then obviously the theorem is proved. So let a be not a divisor of b.
Since a is a prime element of R and a is not a divisor of b, therefore we claim that a
and b are relatively prime. Since a is a prime element of R, therefore the only
divisors of a are the associates of a or the units of R. Now an associate of a cannot be
a divisor of b otherwise a itself will be a divisor of b while we have assumed that a is
not a divisor of b. Thus the units of R are the only divisors of a which also divide b.
Therefore the greatest common divisor of a and b is a unit of R.
Since a and b are relatively prime, therefore by theorem 1, we have
a | bc ⇒ a | c .
This completes the proof of the theorem.
Polynomial rings over unique factorization domains: Let R be a unique
factorization domain. Since R is an integral domain with unity, therefore R [ x] is
also an integral domain with unity. Also any unit, (inversible element) in R [ x]
must already be a unit in R . Thus the only units in R [ x] are the units of R . A
polynomial p ( x) ∈ R [ x] is irreducible over R i.e., irreducible as an element of R [ x] if
whenever p( x) = a ( x) b ( x) with a ( x), b ( x) ∈ R [ x] , then one of a ( x) or b ( x) is a
unit in R [ x] i.e., a unit in R. For example, if I is the ring of integers, then I is a
unique factorization domain. The polynomial 2x² + 4 ∈ I [x] is a reducible
element of I [x]. We have 2x² + 4 = 2 (x² + 2). Neither 2 nor x² + 2 is a unit in I [x].
If possible, let
f ( x) = h f 2 ( x) where h ∈ R and f 2 ( x) ∈ R [ x] is primitive.
Then g f1 ( x) = h f 2 ( x) ...(1)
Since f1 ( x) and f 2 ( x) are both primitive, therefore the content of the polynomial
on the left hand side of (1) is g and the content of the polynomial on the right hand
side of (1) is h. But the content of a polynomial is unique upto associates.
Therefore g and h are associates
⇒ g = hu where u is some unit in R
⇒ hu f1 ( x) = h f 2 ( x)
⇒ u f1 ( x) = f 2 ( x) [by left cancellation law in the integral
domain R [ x] , since h ≠ 0]
⇒ f1 ( x) and f 2 ( x) are associates.
Hence the theorem.
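The content and the primitive part of a polynomial are easy to compute when R is the ring of integers. A minimal sketch, assuming coefficients are stored as a list [a0, a1, …, an]; the helper names are ours, not the book's.

```python
# Content and primitive part of a polynomial in Z[x], with coefficients stored
# as a list [a0, a1, ..., an]. The content is the gcd of the coefficients and is
# unique up to sign (i.e., up to associates), as in Theorem 3.

from math import gcd
from functools import reduce

def content(f):
    return reduce(gcd, (abs(c) for c in f))

def primitive_part(f):
    g = content(f)
    return [c // g for c in f]

f = [4, 6, 10]                         # 4 + 6x + 10x^2 = 2(2 + 3x + 5x^2)
print(content(f), primitive_part(f))   # 2 [2, 3, 5]
```

Replacing f by −f changes the content only by the unit −1, in line with the uniqueness up to associates proved above.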
Theorem 4: If R is a unique factorization domain, then the product of two primitive
polynomials in R [ x] is again a primitive polynomial in R [ x ].
Proof: Let
f (x) = a0 + a1 x + … + a_n x^n and g (x) = b0 + b1 x + … + b_m x^m
be two primitive polynomials in R [x].
Let h (x) = f (x) g (x) = c0 + c1 x + … + c_{m+n} x^{m+n}.
Suppose h ( x) is not primitive. Then all the coefficients of h ( x) must be divisible by
some prime element p of R. Since f ( x) is primitive, therefore the prime element p
must not divide some coefficient of f ( x) . Let a i be the first coefficient of f ( x)
which p does not divide. Similarly let b j be the first coefficient of g ( x) which p
does not divide. In f (x) g (x), the coefficient of x^{i+j} is
c_{i+j} = a_i b_j + (a_{i−1} b_{j+1} + a_{i−2} b_{j+2} + … + a0 b_{i+j})
+ (a_{i+1} b_{j−1} + a_{i+2} b_{j−2} + … + a_{i+j} b0).
From this relation, we get
a_i b_j = c_{i+j} − (a_{i−1} b_{j+1} + a_{i−2} b_{j+2} + … + a0 b_{i+j})
− (a_{i+1} b_{j−1} + a_{i+2} b_{j−2} + … + a_{i+j} b0). ...(1)
Now by our choice of a_i, p is a divisor of each of the elements a0, a1, …, a_{i−1}.
Therefore p | (a_{i−1} b_{j+1} + a_{i−2} b_{j+2} + … + a0 b_{i+j}).
Similarly, by our choice of b_j, p is a divisor of each of the elements
b0, b1, …, b_{j−1}. Therefore
p | (a_{i+1} b_{j−1} + a_{i+2} b_{j−2} + … + a_{i+j} b0).
Also by assumption p | c_{i+j}.
Hence from (1), we get
p | a_i b_j ⇒ p | a_i or p | b_j, since p is a prime element of R.
But this is impossible because according to our assumption p is not a divisor of a_i
and also p is not a divisor of b_j.
Hence h ( x) must be primitive. This proves the theorem.
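Theorem 4 can be checked on concrete integer polynomials: multiplying two polynomials of content 1 again yields content 1. A small sketch under the same list-of-coefficients convention as before (illustrative, not from the text):

```python
# Gauss's lemma (Theorem 4) checked on a small example in Z[x]: the product of
# two primitive polynomials is again primitive (content 1).

from math import gcd
from functools import reduce

def content(f):
    return reduce(gcd, (abs(c) for c in f))

def poly_mul(f, g):
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

f = [2, 3]        # 2 + 3x, content 1
g = [5, 0, 7]     # 5 + 7x^2, content 1
assert content(f) == content(g) == 1
print(content(poly_mul(f, g)))   # 1, so f(x) g(x) is primitive
```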
Note: In the above theorem if we take the ring of integers I in place of the unique
factorization domain R, then the field of quotients of I is the field of rational
numbers. The statement of the theorem will be as follows :
Let f ( x) = a0 + a1 x + … + a n x n be a polynomial with integer coefficients. If p is a prime
number such that
p| a0 , p| a1 , ..., p| a n − 1
whereas p is not a divisor of a n and p2 is not a divisor of a0 , then f ( x) is irreducible over the
field of rational numbers.
There will be no difference in proof.
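The integer form of the criterion stated in this note translates directly into a test on the coefficient list [a0, a1, …, an]. The sketch below is illustrative only; eisenstein_applies is our own name, and a False answer means only that this particular prime p gives no information.

```python
# A direct implementation of the Eisenstein criterion stated above for
# f(x) = a0 + a1 x + ... + an x^n with integer coefficients (list [a0, ..., an]).

def eisenstein_applies(f, p):
    """True if p | a0, ..., a_{n-1}, p does not divide a_n, and p^2 does not divide a0."""
    *lower, an = f
    return (all(c % p == 0 for c in lower)
            and an % p != 0
            and lower[0] % (p * p) != 0)

# x^4 + 10x + 5 is irreducible over Q by Eisenstein with p = 5:
print(eisenstein_applies([5, 10, 0, 0, 1], 5))   # True
# For 2x^2 + 4 the test fails at p = 2: p divides the leading
# coefficient 2, and p^2 = 4 divides a0 = 4.
print(eisenstein_applies([4, 0, 2], 2))          # False
```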
Comprehensive Exercise 2
True or False
Write ‘T’ for true and ‘F’ for false statement.
1. Every field is a principal ideal ring.
2. The polynomial domain F [ x] over a field F is a field.
3. The polynomial ring I[ x] over the ring of integers is not a principal ideal
ring.
4. The field of real numbers is a prime field.
5. Every prime integer is a prime Gaussian integer.
6. Every Euclidean ring is a unique factorization domain.
7. The ring of Gaussian integers is not a Euclidean ring.
8. Every field is a Euclidean ring.
Answers
True or False
1. T. 2. F. 3. T. 4. F. 5. F.
6. T. 7. F. 8. T.
3
Linear Transformations
Theorem: Let W be a subspace of a vector space V (F). Then the set V/W of all cosets W + α, α ∈ V, is a vector space over F for the compositions
(W + α) + (W + β) = W + (α + β) and a (W + α) = W + aα.
Proof: We have α, β ∈ V ⇒ α + β ∈ V.
Also a ∈ F, α ∈ V ⇒ aα ∈ V .
Therefore W + (α + β) ∈ V / W and also W + aα ∈ V / W. Thus V / W is closed
with respect to addition of cosets and scalar multiplication as defined above. Now
first of all we shall show that these two compositions are well defined i. e., are
independent of the particular representation chosen to denote a coset.
Let W + α = W + α ′ , α, α ′ ∈ V
and W + β = W + β ′ , β, β ′ ∈ V .
We have W + α = W + α′ ⇒ α − α′ ∈ W
and W + β = W + β ′ ⇒ β − β ′ ∈ W.
Now W is a subspace, therefore
α − α ′ ∈ W, β − β ′ ∈ W
⇒ (α − α ′ ) + ( β − β ′ ) ∈ W
⇒ (α + β) − (α ′ + β ′ ) ∈ W
⇒ W + (α + β) = W + (α ′ + β ′ )
⇒ (W + α) + (W + β) = (W + α ′ ) + (W + β ′ ).
Therefore addition in V /W is well defined.
Again a ∈ F, α − α ′ ∈ W ⇒ a (α − α ′ ) ∈ W
⇒ aα − aα ′ ∈ W ⇒ W + aα = W + aα ′ .
∴ scalar multiplication in V /W is also well defined.
Commutativity of addition: Let W + α , W + β be any two elements of V / W.
Then
(W + α) + (W + β) = W + (α + β) = W + ( β + α)
= (W + β) + (W + α).
Associativity of addition: Let W + α , W + β, W + γ be any three elements of
V / W. Then
(W + α) + [(W + β) + (W + γ )] = (W + α) + [W + ( β + γ )]
= W + [α + ( β + γ )]
= W + [(α + β) + γ ]
= [W + (α + β)] + (W + γ )
= [(W + α) + (W + β)] + (W + γ ).
Existence of additive identity: If 0 is the zero vector of V, then
W + 0 = W ∈ V / W.
If W + α is any element of V / W, then
(W + 0) + (W + α) = W + (0 + α) = W + α .
∴ W + 0 = W is the additive identity.
Existence of additive inverse: If W + α is any element of V / W, then
W + (− α) = W − α ∈ V / W.
92
Also we have
(W + α) + (W − α) = W + (α − α) = W + 0 = W.
∴ W − α is the additive inverse of W + α.
Thus V / W is an abelian group with respect to addition composition. Further we
observe that if
a, b ∈ F and W + α, W + β ∈ V / W, then
1. a [(W + α) + (W + β)] = a [W + (α + β)]
= W + a (α + β) = W + (aα + aβ)
= (W + aα) + (W + aβ)
= a (W + α) + a (W + β).
2. (a + b) (W + α) = W + (a + b) α
= W + (aα + bα)
= (W + aα) + (W + bα)
= a (W + α) + b (W + α).
3. (ab) (W + α) = W + (ab) α = W + a (bα)
= a (W + bα) = a [b (W + α)].
4. 1 (W + α) = W + 1α = W + α.
∴ V / W is a vector space over F for these two compositions. The vector space V / W
is called the Quotient Space of V relative to W. The coset W is the zero vector of
this vector space.
Let a1 (W + β1 ) + a2 (W + β 2 ) + … + a l (W + β l ) = W
⇒ (W + a1 β1 ) + (W + a2 β 2 ) + … + (W + a l β l ) = W
⇒ W + (a1 β1 + a2 β 2 + … + a l β l ) = W + 0
⇒ a1 β1 + a2 β 2 + … + a l β l ∈ W
⇒ a1 β1 + a2 β 2 + … + a l β l = b1 α1 + b2 α 2 + … + b m α m
[∵ any vector in W can be expressed as a linear
combination of its basis vectors]
⇒ a1 β1 + a2 β 2 + … + a l β l − b1 α1 − b2 α 2 − … − b m α m = 0
⇒ a1 = 0, a2 = 0, …, a l = 0 since the vectors
β1 , β 2 , … β l , α1 , α 2 , … , α m are linearly independent.
∴ The set S1 is linearly independent.
Now to show that L (S1 ) = V / W. Let W + α be any element of V / W. The vector
α ∈ V can be expressed as
α = c1 α1 + c 2 α 2 + … + c m α m + d1 β1 + d2 β 2 +… + dl β l
= γ + d1 β1 + d2 β 2 + … + dl β l where
γ = c1 α1 + c 2 α 2 + … + c m α m ∈ W.
So W + α = W + (γ + d1 β1 + d2 β 2 + … + dl β l )
= (W + γ ) + d1 β1 + d2 β 2 + … + dl β l
= W + (d1 β1 + d2 β 2 + … + dl β l ) [ ∵ γ ∈ W ⇒ W + γ = W ]
= (W + d1 β1 ) + (W + d2 β 2 ) + … + (W + dl β l )
= d1 (W + β1 ) + d2 (W + β 2 ) + … + dl (W + β l ).
Thus any element W + α of V / W can be expressed as a linear combination of
elements of S1 .
∴ V / W = L (S1 ). ∴ S1 is a basis of V / W.
∴ dim V / W = l. Hence the theorem.
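Numerically, the dimension count dim V/W = dim V − dim W (here l = dim V − m) can be checked with a rank computation. A minimal sketch assuming numpy is available; the subspace W of R⁴ chosen below is an arbitrary example of ours.

```python
# A numerical illustration of dim V/W = dim V - dim W (the theorem just proved,
# with dim V = m + l and dim W = m): take V = R^4 and W spanned by two vectors.

import numpy as np

dim_V = 4
W_basis = np.array([[1, 0, 1, 0],
                    [0, 1, 1, 0]])           # alpha_1, alpha_2 spanning W

dim_W = np.linalg.matrix_rank(W_basis)
print("dim W =", dim_W)                       # 2
print("dim V/W =", dim_V - dim_W)             # 2 = l
```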
Theorem 1: Let W1, …, Wk be subspaces of a vector space V (F) and let W = W1 + … + Wk. Then the following three statements are equivalent:
(i) W1, …, Wk are independent, i.e., α1 + … + αk = 0 with αi ∈ Wi implies each αi = 0;
(ii) each vector α ∈ W can be uniquely expressed as α = α1 + … + αk with αi ∈ Wi;
(iii) Wj ∩ (W1 + … + W_{j−1}) = {0} for 2 ≤ j ≤ k.
Proof: In order to prove the equivalence of the three statements we shall prove
that (i) ⇒ (ii), (ii) ⇒ (iii) and (iii) ⇒ (i).
(i) ⇒ (ii). Suppose W1 , … , Wk are independent. Let α ∈ W.
Since W = W1 + … + Wk , therefore we can write
α = α1 + … + α k with α i in Wi .
Suppose that also α = β1 + … + β k with β i in Wi .
Thenα1 + … + α k = β1 + … + β k
⇒ (α1 − β1 ) + … + (α k − β k ) = 0 with α i − β i in Wi as Wi is a subspace
⇒ αi − βi = 0 [ ∵ W1 , … , Wk are independent]
⇒ α i = β i , i = 1, … , k.
Therefore the α i ’s are uniquely determined by α.
(ii) ⇒ (iii). Let α ∈ W j ∩ (W1 + … + W j−1 ).
Then α ∈ W j and α ∈ W1 + … + W j − 1 .
Now α ∈ W1 + … + W j − 1 implies that there exist vectors α1 , … , α j − 1 with α i in Wi
such that
α = α1 + … + α j − 1 .
Also α ∈ Wj .
Therefore we get two expressions for α as a sum of vectors, one in each Wi . These
are
α = α1 + … + α j − 1 + 0 + … + 0
in which the vector belonging to W j is 0
and α = 0 + … + 0 + α + 0 + … + 0
in which the vector belonging to W j is α.
Since the expression for α is given to be unique, therefore we must have
α1 = … = α j − 1 = 0 = α.
Thus W j ∩ (W1 + … + W j − 1 ) = {0}.
(iii) ⇒ (i).
Let α1 + … + α k = 0 where α i ∈ Wi , i = 1, … , k. …(1)
Then we are to prove that each α i = 0.
Suppose that for some i we have α i ≠ 0.
Let j be the largest integer i between 1 and k such that α_i ≠ 0. Obviously j must be ≥ 2
and at the most j can be equal to k. Then (1) reduces to
α1 + … + α j = 0, α j ≠ 0
⇒ α j = − α1 − … − α j − 1
⇒ α j ∈ W1 + … + W j − 1 [∵ − α1 − … − α j − 1 ∈ W1 + … + W j − 1 ]
⇒ α_j ∈ W_j ∩ (W1 + … + W_{j−1}) [∵ also α_j ∈ W_j]
⇒ α j = 0.
Thus we get a contradiction. Hence each α i = 0.
Note: If any (and hence all) of the three conditions of theorem 1 hold for
W1 , … , Wk ,then we shall say that W is the direct sum of W1 , … , Wk and we write
W = W1 ⊕ … ⊕ Wk .
Theorem 2: Let V ( F ) be a vector space. Let W1 , … , Wn be subspaces of V.Suppose that
V = W1 + … + Wn
and that Wi ∩ (W1 + … + Wi − 1 + Wi + 1 + … + Wn ) = {0}
for every i = 1, 2, … , n. Prove that V is the direct sum of W1 , … , Wn .
Proof: In order to prove that V is the direct sum of W1 , … , Wn , we should prove
that each vector α ∈ V can be uniquely expressed as
α = α1 + … + α n where α i ∈ Wi , i = 1, … , n.
Since V = W1 + … + Wn , therefore any vector α in V can be written as
α = α1 + … + α n where α i ∈ Wi . …(1)
To show that α1 , … , α n are unique.
Let α = β1 + … + β n where β i ∈ Wi . …(2)
From (1) and (2), we get
α1 + … + αn = β1 + … + βn
⇒ (α1 − β1) + … + (α_{i−1} − β_{i−1}) + (αi − βi)
+ (α_{i+1} − β_{i+1}) + … + (αn − βn) = 0. …(3)
Now each Wi is a subspace of V. Therefore α i − β i and also its additive inverse
β i − α i ∈ Wi , i = 1, … , n.
From (3), we get
(α i − β i ) = ( β1 − α1 ) + … + ( β i − 1 − α i − 1 )
+ ( β i + 1 − α i + 1 ) + … + ( β n − α n ). …(4)
Now the vector on the right hand side of (4) and consequently the vector α i − β i is
in W1 + … + Wi − 1 + Wi + 1 + … + Wn .
Also α i − β i ∈ Wi .
∴ α i − β i ∈ Wi ∩ (W1 + … + Wi − 1 + Wi + 1 + … + Wn ).
But for every i = 1, … , n, it is given that
Wi ∩ (W1 + … + Wi − 1 + Wi + 1 + … + Wn ) = {0}.
Therefore α i − β i = 0, i = 1, … , n
⇒ α i = β i , i = 1, … , n
⇒ the expression (1) for α is unique.
Hence V is the direct sum of W1 , … , Wn .
Theorem 3: Let V ( F ) be a finite dimensional vector space and let W1 , … , Wk be
subspaces of V. Then the following two statements are equivalent.
(i) V is the direct sum of W1 , … , Wk .
(ii) If Bi is a basis of Wi, i = 1, …, k, then the union
B = B1 ∪ B2 ∪ … ∪ Bk is also a basis for V.
= α1 + α2 + … + αk …(2)
where αi = a_1^i α_1^i + … + a_{n_i}^i α_{n_i}^i ∈ Wi, the vectors α_1^i, …, α_{n_i}^i being those of Bi.
Thus each vector in V can be expressed as a sum of vectors one in each Wi.
Now V will be the direct sum of W1, …, Wk if the expression (2) for α is unique. Let
α = β1 + β2 + … + βk …(3)
where βi = b_1^i α_1^i + … + b_{n_i}^i α_{n_i}^i ∈ Wi.
From (2) and (3), we get
α1 + … + αk = β1 + … + βk
⇒ (α1 − β1) + … + (αi − βi) + … + (αk − βk) = 0
⇒ Σ_{i=1}^{k} (αi − βi) = 0
⇒ Σ_{i=1}^{k} [(a_1^i − b_1^i) α_1^i + … + (a_{n_i}^i − b_{n_i}^i) α_{n_i}^i] = 0
⇒ a_1^i − b_1^i = … = a_{n_i}^i − b_{n_i}^i = 0, i = 1, …, k
[∵ ∪_{i=1}^{k} Bi is linearly independent, being a basis of V]
⇒ a_1^i = b_1^i, …, a_{n_i}^i = b_{n_i}^i, i = 1, …, k
⇒ αi = βi, i = 1, …, k
⇒ the expression (2) for α is unique.
Hence V is the direct sum of W1 , … , Wk .
Note: While proving this theorem we have proved that if a finite dimensional vector
space V ( F ) is the direct sum of its subspaces W1 , … , Wk , then
dim V = dim W1 + … + dim Wk .
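A classical instance of this dimension count is Exercise 1 below: every square real matrix splits uniquely into a symmetric part and an antisymmetric part. The following sketch (illustrative only; numpy assumed) performs the decomposition for n = 3, where 9 = 6 + 3.

```python
# The decomposition behind Exercise 1 below: every square matrix A splits
# uniquely as A = (A + A^T)/2 + (A - A^T)/2, a symmetric part in W1 plus an
# antisymmetric part in W2, and W1 ∩ W2 = {0}, so V = W1 ⊕ W2.

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

S = (A + A.T) / 2          # symmetric component
K = (A - A.T) / 2          # antisymmetric component

assert np.allclose(S, S.T) and np.allclose(K, -K.T)
assert np.allclose(S + K, A)
print("dim V = 9, dim W1 = 6, dim W2 = 3, and 6 + 3 = 9")
```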
Comprehensive Exercise 1
1. Let V be the vector space of square matrices of order n over the field R. Let W1
and W2 be the subspaces of symmetric and antisymmetric matrices
respectively. Show that V = W1 ⊕ W2 .
2. Let V be the vector space of all functions from the real field R into R. Let U be
the subspace of even functions and W the subspace of odd functions. Show
that V = U ⊕ W.
3. Let W1 , W2 and W3 be the following subspaces of R 3 :
W1 = {(a, b, c ) : a + b + c = 0}, W2 = {(a, b, c ) : a = c },
W3 = {(0, 0, c ) : c ∈ R }.
Show that (i) R 3 = W1 + W2 ; (ii) R 3 = W1 + W3 ; (iii) R 3 = W2 + W3 .
When is the sum direct ?
Answers 1
= aT (a1 , b1 , c1 ) + bT (a2 , b2 , c 2 )
= aT (α) + bT ( β).
∴ T is a linear transformation from V3 (R) into V2 (R).
Illustration 2: Let V ( F ) be the vector space of all m × n matrices over the field F. Let P
be a fixed m × m matrix over F, and let Q be a fixed n × n matrix over F.The correspondence T
from V into V defined by
T (A) = PAQ ∀ A ∈ V
is a linear operator on V.
If A is an m × n matrix over the field F, then PAQ is also an m × n matrix over the
field F. Therefore T is a function from V into V. Now let A, B ∈ V and a, b ∈ F. Then
T (aA + bB) = P (aA + bB ) Q [by def. of T ]
= (aPA + bPB) Q = a PAQ + b PBQ = aT ( A) + bT ( B ).
∴ T is a linear transformation from V into V. Thus T is a linear operator on V.
Illustration 3: Let V ( F ) be the vector space of all polynomials over the field F. Let
f (x) = a0 + a1 x + a2 x² + … + a_n x^n ∈ V be a polynomial of degree n in the
indeterminate x. Let us define
Df (x) = a1 + 2a2 x + … + n a_n x^{n−1} if n ≥ 1,
and Df (x) = 0 if f (x) is a constant polynomial.
Then the correspondence D from V into V is a linear operator on V.
If f ( x) is a polynomial over the field F, then Df ( x) as defined above is also a
polynomial over the field F. Thus if f ( x) ∈ V , then Df ( x) ∈ V . Therefore D is a
function from V into V.
Also if f ( x), g ( x) ∈ V and a, b ∈ F, then
D [a f ( x) + bg ( x)] = a Df ( x) + b Dg ( x).
∴ D is a linear transformation from V into V.
The operator D on V is called the differentiation operator. It should be noted that
for polynomials the definition of differentiation can be given purely algebraically,
and does not require the usual theory of limiting processes.
Illustration 4: Let V (R) be the vector space of all continuous functions from R into R. If
f ∈ V and we define T by
(Tf) (x) = ∫₀ˣ f (t) dt ∀ x ∈ R,
then T is a linear operator on V.
(iii) T (α − β) = T (α) − T (β) ∀ α, β ∈ U.
(iv) T (a1α1 + a2α2 + … + a_nα_n) = a1T (α1) + a2T (α2) + … + a_nT (α_n),
where α1, α2, …, α_n ∈ U and a1, a2, …, a_n ∈ F.
Proof: (i) Let α ∈ U. Then T (α) ∈ V . We have
T (α) + 0 = T (α) [∵ 0 is zero vector of V and T (α) ∈ V ]
= T (α + 0) [∵ 0 is zero vector of U ]
= T (α) + T (0) [ ∵ T is a linear transformation]
Now in the vector space V, we have
T (α) + 0 = T (α) + T (0)
⇒ 0 = T (0), by left cancellation law for addition in V.
Note: When we write T (0) = 0, there should be no confusion about the vector 0.
Here T is a function from U into V. Therefore if 0 ∈ U, then its image under
T i. e., T (0) ∈ V . Thus in T (0) = 0, the zero on the right hand side is zero vector of
V.
(ii) We have T [α + (− α)] = T (α) + T (− α)
[ ∵ T is a linear transformation]
But T [α + (− α)] = T (0) = 0 ∈ V . [by (i)]
Thus in V, we have
T (α) + T (− α) = 0
⇒ T (− α) = − T (α).
(iii) T (α − β) = T [α + (− β)]
= T (α) + T (− β) [ ∵ T is linear]
= T (α) + [− T ( β)] [by (ii)]
= T (α) − T ( β).
(iv) We shall prove the result by induction on n, the number of vectors in the
linear combination a1α1 + a2 α 2 + … + a n α n . Suppose
T (a1α1 + a2 α 2 + … + a n − 1 α n − 1 ) = a1 T (α1 ) + a2 T (α 2 )
+ … + a n − 1 T (α n − 1 ). …(1)
Then T (a1α1 + a2α2 + … + a_nα_n)
= T [(a1α1 + a2α2 + … + a_{n−1}α_{n−1}) + a_nα_n]
= T (a1α1 + a2α2 + … + a_{n−1}α_{n−1}) + a_n T (α_n) [∵ T is linear]
= [a1 T (α1) + a2 T (α2) + … + a_{n−1} T (α_{n−1})] + a_n T (α_n) [by (1)]
= a1 T (α1 ) + a2 T (α 2 ) + … + a n − 1 T (α n − 1 ) + a n T (α n ).
Now the proof is complete by induction since the result is true when the number of
vectors in the linear combination is 1.
Note: On account of this property sometimes we say that a linear transformation
preserves linear combinations.
= a0 + b 0 = 0 + 0 = 0 ∈ V .
∴ aα1 + bα 2 ∈ N (T ).
Thus a, b ∈ F and α1 , α 2 ∈ N ( T ) ⇒ aα1 + bα 2 ∈ N ( T ). Therefore N ( T ) is a
subspace of U.
Note:If in place of the vector space V, we take the vector space U i. e., if T is a linear
transformation on an n-dimensional vector space U, even then as a special case of
the above theorem,
ρ ( T ) + ν ( T ) = n.
= a (a1 + b1 , a1 − b1 , b1 ) + b (a2 + b2 , a2 − b2 , b2 )
= aT (α) + bT ( β).
∴ T is a linear transformation from V2 (R) into V3 (R).
Now {(1, 0), (0, 1)} is a basis for V2 (R).
We have T (1, 0) = (1 + 0, 1 − 0, 0) = (1, 1, 0)
and T (0, 1) = (0 + 1, 0 − 1, 1) = (1, − 1, 1).
The vectors T (1, 0), T (0, 1) span the range of T.
Thus the range of T is the subspace of V3 (R) spanned by the vectors
(1, 1, 0), (1, − 1, 1).
Now the vectors (1, 1, 0), (1, − 1, 1) ∈ V3 (R) are linearly independent because if
x, y ∈ R, then
x (1, 1, 0) + y (1, − 1, 1) = (0, 0, 0)
⇒ ( x + y, x − y, y) = (0, 0, 0)
⇒ x + y = 0, x − y = 0, y = 0 ⇒ x = 0, y = 0.
∴ the vectors (1, 1, 0), (1, − 1, 1) form a basis for the range of T.
Hence rank T = dim of range of T = 2.
Nullity of T = dim of V2 (R) − rank T = 2 − 2 = 0.
∴ null space of T must be the zero subspace of V2 (R).
Alternatively, (a, b) ∈ null space of T
⇒ T (a, b) = (0, 0, 0)
⇒ (a + b, a − b, b) = (0, 0, 0)
⇒ a + b = 0, a − b = 0, b = 0
⇒ a = 0, b = 0.
∴ (0, 0) is the only element of V2 (R) which belongs to the null space of T.
∴ null space of T is the zero subspace of V2 (R).
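The rank and nullity computed in this example can be re-derived from the matrix of T with respect to the standard bases, whose columns are T(1, 0) and T(0, 1). A minimal numerical sketch (numpy assumed; illustrative only):

```python
# Checking rank T = 2 and nullity T = 0 for the map T(a, b) = (a+b, a-b, b)
# of the example above, using the matrix of T with respect to standard bases.

import numpy as np

T = np.array([[1, 1],      # columns are T(1,0) = (1,1,0) and T(0,1) = (1,-1,1)
              [1, -1],
              [0, 1]])

rank = np.linalg.matrix_rank(T)
nullity = T.shape[1] - rank          # rank-nullity: dim V2(R) = rank + nullity
print("rank =", rank, " nullity =", nullity)   # rank = 2, nullity = 0
```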
Example 7: Let V be the vector space of all n × n matrices over the field F and let B be a
fixed n × n matrix. If
T (A) = AB − BA ∀ A ∈ V,
verify that T is a linear transformation from V into V.
Solution: If A ∈ V , then T ( A) = AB − BA ∈ V because AB − BA is also an n × n
matrix over the field F. Thus T is a function from V into V.
Let A1 , A2 ∈ V and a, b ∈ F.
Then aA1 + bA2 ∈ V
and T (aA1 + bA2 ) = (aA1 + bA2 ) B − B (aA1 + bA2 )
= aA1 B + bA2 B − aBA1 − bBA2
= a ( A1 B − BA1 ) + b ( A2 B − BA2 )
= aT ( A1 ) + bT ( A2 ).
∴ T is a linear transformation from V into V.
Example 8: Let V be an n-dimensional vector space over the field F and let T be a linear
transformation from V into V such that the range and null space of T are identical. Prove
that n is even. Give an example of such a linear transformation.
Solution: Let N be the null space of T. Then N is also the range of T.
Now ρ ( T ) + ν ( T ) = dim V
i. e., dim of range of T + dim of null space of T = dim V = n
i. e., 2 dim N = n [∵ range of T = null space of T = N ]
i. e., n is even.
Example of such a transformation:
Let T : V2 (R) → V2 (R) be defined by
T (a, b) = (b, 0) ∀ a, b ∈ R.
Let α = (a1 , b1 ), β = (a2 , b2 ) ∈ V2 (R) and let x, y ∈ R.
Then T ( xα + yβ) = T ( x (a1 , b1 ) + y (a2 , b2 )) = T ( xa1 + ya2 , xb1 + yb2 )
= ( xb1 + yb2 , 0) = ( xb1 , 0) + ( yb2 , 0) = x (b1 , 0) + y (b2 , 0)
= xT (a1 , b1 ) + yT (a2 , b2 ) = xT (α) + yT ( β).
∴ T is a linear transformation from V2 (R) into V2 (R).
Now {(1, 0), (0, 1)} is a basis of V2 (R).
We have T (1, 0) = (0, 0) and T (0, 1) = (1, 0).
Thus the range of T is the subspace of V2 (R) spanned by the vectors (0, 0) and (1, 0).
The vector (0, 0) can be omitted from this spanning set because it is zero vector.
Therefore the range of T is the subspace of V2 (R) spanned by the vector (1, 0). Thus
the range of T = { a (1, 0) : a ∈ R} = {(a, 0) : a ∈ R}.
Now let (a, b) ∈ N (the null space of T ).
Then (a, b) ∈ N ⇒ T (a, b) = (0, 0) ⇒ (b, 0) = (0, 0) ⇒ b = 0.
∴ null space of T = {(a, 0) : a ∈ R}.
Thus range of T = null space of T.
Also we observe that dim V2 (R) = 2 which is even.
Example 9: Let V be a vector space and T a linear transformation from V into V. Prove
that the following two statements about T are equivalent :
(i) The intersection of the range of T and the null space of T is the zero subspace of
V i. e., R ( T ) ∩ N ( T ) = { 0}.
(ii) T [T (α)] = 0 ⇒ T (α) = 0.
Solution: First we shall show that (i) ⇒ (ii).
We have T [T (α)] = 0 ⇒ T (α) ∈ N (T )
⇒ T (α) ∈ R ( T ) ∩ N ( T ) [ ∵ α ∈ V ⇒ T (α) ∈ R (T )]
⇒ T (α) = 0 because R (T ) ∩ N (T ) = { 0}.
Now we shall show that (ii) ⇒ (i).
Let α ≠ 0 and α ∈ R ( T ) ∩ N ( T ).
Then α ∈ R ( T ) and α ∈ N ( T ).
Since α ∈ N ( T ), therefore T (α) = 0. …(1)
Also α ∈ R ( T ) ⇒ ∃ β ∈ V such that T ( β) = α.
Now T ( β) = α
⇒ T [T ( β)] = T (α) = 0 [From (1)]
Thus ∃ β ∈ V such that T [T ( β)] = 0 but T ( β) = α ≠ 0.
This contradicts the given hypothesis (ii).
Therefore there exists no α ∈ R ( T ) ∩ N ( T ) such that α ≠ 0.
Hence R ( T ) ∩ N ( T ) = {0}.
Example 10: Consider the basis S = {α1 , α 2 , α 3 } of R3 where
α1 = (1, 1, 1), α 2 = (1, 1, 0), α 3 = (1, 0, 0).
Express (2 , − 3, 5) in terms of the basis α1 , α 2 , α 3 .
Let T : R 3 → R 2 be defined as
T (α1 ) = (1, 0), T (α 2 ) = (2 , − 1), T (α 3 ) = (4, 3).
Find T (2 , − 3, 5).
Solution: Let (2 , − 3, 5) = aα1 + bα 2 + cα 3
= a (1, 1, 1) + b (1, 1, 0) + c (1, 0, 0).
Then a + b + c = 2 , a + b = − 3, a = 5.
Solving these equations, we get
a = 5, b = − 8, c = 5.
∴ (2 , − 3, 5) = 5α1 − 8α 2 + 5α 3 .
Now T (2 , − 3, 5) = T (5α1 − 8α 2 + 5α 3 )
= 5T (α1 ) − 8T (α 2 ) + 5T (α 3 )
[ ∵ T is a linear transformation]
= 5 (1, 0) − 8 (2 , − 1) + 5 (4, 3)
= (5, 0) − (16, − 8) + (20, 15)
= (9, 23).
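The arithmetic of Example 10 can be confirmed by solving the coordinate equations and forming the corresponding combination of the prescribed images. A short sketch (numpy assumed; illustrative only):

```python
# Verifying Example 10 numerically: solve for the coordinates of (2, -3, 5) in
# the basis {alpha_1, alpha_2, alpha_3} and apply T to the coordinates.

import numpy as np

basis = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0]]).T   # columns alpha_1..alpha_3
coords = np.linalg.solve(basis, np.array([2, -3, 5]))
print(coords)                                            # [ 5. -8.  5.]

T_images = np.array([[1, 0], [2, -1], [4, 3]])           # rows are T(alpha_i)
print(coords @ T_images)                                 # [ 9. 23.]
```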
Comprehensive Exercise 2
12. Let V be the vector space of 2 × 2 matrices over R and let
M = [ 1 −1 ]
    [ −2  2 ].
Let T : V → V be the linear function defined by T ( A) = MA for A ∈ V .
Find a basis and the dimension of (i) the kernel of T and (ii) the range of T.
13. Let V be the vector space of 2 × 2 matrices over R and let
M = [ 1 2 ]
    [ 0 3 ]. Let
T : V → V be the linear transformation defined by T ( A) = AM − MA. Find
a basis and the dimension of the kernel of T.
14. Let V be the space of n × 1 matrices over a field F and let W be the space of
m × 1 matrices over F. Let A be a fixed m × n matrix over F and let T be the
linear transformation from V into W defined by T ( X ) = AX .
Prove that T is the zero transformation if and only if A is the zero matrix.
15. Let U ( F) and V ( F) be two vector spaces and let T1 , T2 be two linear
transformations from U to V. Let x, y be two given elements of F. Then the
mapping T defined as T (α) = x T1 (α) + y T2 (α) ∀ α ∈ U is a linear
transformation from U into V.
Answers 2
12. (i) The matrices
[ 1 0 ]   [ 0 1 ]
[ 1 0 ] , [ 0 1 ]
form a basis of the kernel of T and dim (kernel T) = 2.
(ii) The matrices
[ 1  0 ]   [ 0  1 ]
[ −2 0 ] , [ 0 −2 ]
form a basis for R (T) and dim R (T) = 2.
13. The matrices
[ 1 −1 ]   [ 1 0 ]
[ 0  0 ] , [ 0 1 ]
form a basis of the kernel of T and dim (kernel T) = 2.
V, i.e., 0̂ (α) = 0 ∈ V ∀ α ∈ U.
Then 0̂ ∈ L (U, V). If T ∈ L (U, V) and α ∈ U, we have
(0̂ + T) (α) = 0̂ (α) + T (α) [by (1)]
= 0 + T (α) = T (α), so that 0̂ + T = T and 0̂ is the additive identity.
Further, for T ∈ L (U, V) the mapping −T defined by (−T) (α) = −T (α) is also in L (U, V), and
∴ −T + T = 0̂ for every T ∈ L (U, V).
Thus each element in L (U, V ) possesses additive inverse.
Therefore L (U, V ) is an abelian group with respect to addition defined in it.
Further we make the following observations :
(i) Let c ∈ F and T1 , T2 ∈ L (U, V ). If α is any element in U, we have
[c (T1 + T2 )] (α) = c [(T1 + T2 ) (α)] [by (2) i. e., by def. of scalar
multiplication in L (U, V )]
= c [T1 (α) + T2 (α)] [by (1)]
= cT1 (α) + cT2 (α)
[ ∵ c ∈ F and T1 (α), T2 (α) ∈ V
which is a vector space]
= (cT1 ) (α) + (cT2 ) (α) [by (2)]
= (cT1 + cT2 ) (α) [by (1)]
∴ c (T1 + T2 ) = cT1 + cT2 .
(ii) Let a, b ∈ F and T ∈ L (U, V ). If α ∈ U, we have
[(a + b) T ] (α) = (a + b) T (α) [by (2)]
= aT (α) + bT (α) [ ∵ V is a vector space]
= (aT ) (α) + (bT ) (α) [by (2)]
= (aT + bT ) (α) [by (1)]
∴ (a + b) T = aT + bT .
(iii) Let a, b ∈ F and T ∈ L (U, V ). If α ∈ U, we have
[(ab) T ] (α) = (ab) T (α) [by (2)]
= a [bT (α)] [ ∵ V is a vector space]
= a [(bT ) (α)] [by (2)]
= [a (bT )] (α) [by (2)]
∴ (ab) T = a (bT ).
(iv) Let 1∈ F and T ∈ L (U, V ). If α ∈ U, we have
(1T ) (α) = 1T (α) [by (2)]
= T (α) [ ∵ V is a vector space]
∴ 1T = T .
Hence L (U, V ) is a vector space over the field F.
Note: If in place of the vector space V,we take U,then we observe that the set of all
linear operators on U forms a vector space with respect to addition and scalar
multiplication defined as above.
Dimension of L (U , V ): Now we shall prove that if U ( F ) and V ( F ) are finite
dimensional, then the vector space of linear transformations from U into V is also
finite dimensional. For this purpose we shall require an important result which
we prove in the following theorem :
Theorem 2: Let U be a finite dimensional vector space over the field F and let
B = {α1 , α 2 , … , α n }be an ordered basis for U. Let V be a vector space over the same field F
and let β1 , … , β n be any vectors in V. Then there exists a unique linear transformation T from
U into V such that
T (α i ) = β i , i = 1, 2 , … , n.
Proof: Existence of T:
Let α ∈ U.
Since B = {α1 , α 2 , … , α n} is a basis for U, therefore there exist unique scalars
x1 , x2 , … , x n such that
α = x1α1 + x2 α 2 + … + x n α n .
For this vector α, let us define
T (α) = x1β1 + x2 β 2 + … + x n β n .
Obviously T (α) as defined above is a unique element of V. Therefore T is a
well-defined rule for associating with each vector α in U a unique vector T (α) in V.
Thus T is a function from U into V.
The unique representation of α i ∈ U as a linear combination of the vectors
belonging to the basis B is
α i = 0α1 + 0α 2 + … + 1α i + 0α i + 1 + … + 0α n .
Therefore according to our definition of T, we have
T (α i ) = 0β1 + 0β 2 + … + 1β i + 0β i + 1 + … + 0β n
i. e., T (α i ) = β i , i = 1, 2 , … , n.
Now to show that T is a linear transformation.
Let a, b ∈ F and α, β ∈ U. Let
α = x1α1 + … + x n α n and β = y1α1 + … + y n α n .
Then T (aα + bβ) = T [a ( x1α1 + … + x n α n ) + b ( y1α1 + … + y n α n )]
= T [(ax1 + by1 ) α1 + … + (ax n + by n ) α n ]
= (ax1 + by1 ) β1 + … + (ax n + by n ) β n [by def. of T ]
= a ( x1 β1 + … + x n β n ) + b ( y1 β1 + … + y n β n )
= aT (α) + bT ( β) [by def. of T ]
∴ T is a linear transformation from U into V. Thus there exists a linear
transformation T from U into V such that
T (α i ) = β i , i = 1, 2 , … , n.
Uniqueness of T: Let T ′ be a linear transformation from U into V such that
T ′ (α i ) = β i , i = 1, 2 , … , n.
For the vector α = x1α1 + … + x n α n ∈ U, we have
T ′ (α) = T ′ ( x1α1 + … + x n α n )
= x1T ′ (α1 ) + … + x n T ′ (α n )
[ ∵ T ′ is a linear transformation]
122
= x1 β1 + … + x n β n [by def. of T ′]
= T (α). [by def. of T ]
Thus T ′ (α) = T (α) ∀ α ∈ U.
∴ T ′ = T.
This shows the uniqueness of T.
Note: From this theorem we conclude that if T is a linear transformation from a
finite dimensional vector space U ( F ) into a vector space V ( F ), then T is
completely defined if we mention under T the images of the elements of a basis set
of U. If S and T are two linear transformations from U into V such that
S (α_i) = T (α_i) ∀ α_i belonging to a basis of U, then
S (α) = T (α) ∀ α ∈ U, i.e., S = T.
Thus two linear transformations from U into V are equal if they agree on a basis of U.
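This note has a direct computational reading: to evaluate T at any vector it suffices to store the images of a basis. A minimal sketch (numpy assumed; the basis of R² and the images in R³ below are arbitrary choices of ours):

```python
# Theorem 2 in computational form: a linear map is pinned down by the images of
# a basis. Given images beta_i = T(alpha_i), evaluate T at any alpha by solving
# for the coordinates x_i and forming x_1 beta_1 + ... + x_n beta_n.

import numpy as np

alphas = np.array([[1, 1], [0, 1.]])         # basis alpha_1=(1,0), alpha_2=(1,1) as columns
betas  = np.array([[1, 2, 3], [0, 1, 0.]])   # prescribed images T(alpha_i) as rows

def T(v):
    x = np.linalg.solve(alphas, v)   # unique coordinates x_i of v in the basis
    return x @ betas                 # T(v) = x_1 beta_1 + x_2 beta_2

print(T(np.array([2, 3])))           # [-1.  1. -3.]
```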
Theorem 3: Let U be an n-dimensional vector space over the field F, and let V be an
m-dimensional vector space over F. Then the vector space L (U, V ) of all linear
transformations from U into V is finite dimensional and is of dimension mn.
Proof: Let B = { α1 , α 2 , … , α n }and B ′ = { β1 , β 2 , … , β m }be ordered bases for
U and V respectively. By theorem 2, there exists a unique linear transformation T11
from U into V such that
T11 (α1 ) = β1 , T11 (α 2 ) = 0, … , T11 (α n ) = 0
where β1 , 0, … , 0 are vectors in V.
In fact, for each pair of integers ( p, q) with 1≤ p ≤ m and 1≤ q ≤ n, there exists a
unique linear transformation Tpq from U into V such that
T_pq (α_i) = 0 if i ≠ q, and T_pq (α_i) = β_p if i = q.
= Σ_{p=1}^{m} Σ_{q=1}^{n} a_pq δ_iq β_p [From (1)]
= Σ_{p=1}^{m} a_pi β_p [on summing with respect to q; remember
that δ_iq = 1 when q = i and δ_iq = 0 when q ≠ i]
= T (α_i). [From (2)]
Thus S (α i ) = T (α i ) V α i ∈ B. Therefore S and T agree on a basis of U. So we
must have S = T . Thus T is also a linear combination of the elements of B1 .
Therefore L (U, V ) is a linear span of B1 .
(ii) Now we shall show that B1 is linearly independent. For b_pq ’s ∈ F, let
Σ_{p=1}^{m} Σ_{q=1}^{n} b_pq T_pq = 0̂, i.e., the zero vector of L (U, V)
⇒ Σ_{p=1}^{m} Σ_{q=1}^{n} b_pq T_pq (α_i) = 0̂ (α_i) ∀ α_i ∈ B
⇒ Σ_{p=1}^{m} Σ_{q=1}^{n} b_pq T_pq (α_i) = 0 ∈ V [∵ 0̂ is the zero transformation]
⇒ Σ_{p=1}^{m} Σ_{q=1}^{n} b_pq δ_iq β_p = 0 ⇒ Σ_{p=1}^{m} b_pi β_p = 0
⇒ b_1i β1 + b_2i β2 + … + b_mi β_m = 0, 1 ≤ i ≤ n
⇒ b_1i = 0, b_2i = 0, …, b_mi = 0, 1 ≤ i ≤ n
[∵ β1, β2, …, β_m are linearly independent]
⇒ b_pq = 0 where 1 ≤ p ≤ m and 1 ≤ q ≤ n
⇒ B1 is linearly independent. Therefore B1 is a basis of L (U, V).
∴ dim L (U, V ) = number of elements in B1 = mn.
Corollary: The vector space L (U, U ) of all linear operators on an n-dimensional vector
space U is of dimension n2 .
Note: Suppose U (F) is an n-dimensional vector space and V (F) is an
m-dimensional vector space. If U ≠ {0} and V ≠ {0}, then n ≥ 1 and m ≥ 1. Therefore
L (U, V) does not just consist of the element 0̂, because the dimension of L (U, V) is
mn ≥ 1.
Example 12: Let V (R) be the vector space of all polynomial functions in x with coefficients as
elements of the field R of real numbers. Let D and T be two linear operators on V defined by
D (f (x)) = (d/dx) f (x) …(1)
and T (f (x)) = ∫₀ˣ f (x) dx …(2)
for every f (x) ∈ V.
Then show that DT = I (identity operator) and TD ≠ I.
Solution: Let f (x) = a0 + a1 x + a2 x² + … ∈ V.
We have (DT) (f (x)) = D [T (f (x))]
= D ∫₀ˣ f (x) dx = D ∫₀ˣ (a0 + a1 x + a2 x² + …) dx
= D [a0 x + (a1/2) x² + (a2/3) x³ + …]
= a0 + a1 x + a2 x² + … = f (x) = I [f (x)].
Thus we have (DT) [f (x)] = I [f (x)] ∀ f (x) ∈ V. Therefore DT = I.
Now (TD) (f (x)) = T [D f (x)]
= T ((d/dx) (a0 + a1 x + a2 x² + …)) = T (a1 + 2a2 x + …)
= ∫₀ˣ (a1 + 2a2 x + …) dx = [a1 x + a2 x² + …]₀ˣ
= a1 x + a2 x² + …
≠ f (x) unless a0 = 0.
Thus ∃ f ( x) ∈ V such that (TD ) [ f ( x)] ≠ I [ f ( x)].
∴ TD ≠ I .
Hence TD ≠ DT ,
showing that product of linear operators is not in general commutative.
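Example 12 is easy to replay on coefficient lists [a0, a1, a2, …], where D differentiates and T integrates from 0. The sketch below is illustrative only (finite lists stand in for polynomials); it shows DT reproducing f while TD loses the constant term.

```python
# Example 12 on coefficient lists [a0, a1, a2, ...]: D differentiates and T
# integrates from 0; DT is the identity, but TD kills the constant term.

def D(f):
    return [i * c for i, c in enumerate(f)][1:] or [0]

def T(f):
    return [0] + [c / (i + 1) for i, c in enumerate(f)]

f = [7, 1, 3]                 # 7 + x + 3x^2
print(D(T(f)))                # [7.0, 1.0, 3.0]  -> DT = I
print(T(D(f)))                # [0, 1.0, 3.0]    -> TD drops a0, so TD != I
```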
Example 13: Let V (R) be the vector space of all polynomials in x with coefficients in the
field R. Let D and T be two linear transformations on V defined as
D [f (x)] = (d/dx) f (x) ∀ f (x) ∈ V and T [f (x)] = x f (x) ∀ f (x) ∈ V.
Then show that DT ≠ TD.
Solution: We have
(DT) [f (x)] = D [T (f (x))] = D [x f (x)] = (d/dx) [x f (x)]
= f (x) + x (d/dx) f (x). …(1)
Also (TD) [f (x)] = T [D (f (x))] = T ((d/dx) f (x))
= x (d/dx) f (x). …(2)
From (1) and (2), we see that ∃ f ( x) ∈ V such that
( DT ) ( f ( x)) ≠ (TD ) ( f ( x)) ⇒ DT ≠ TD.
Also we see that
( DT − TD ) ( f ( x)) = ( DT ) ( f ( x)) − (TD ) ( f ( x))
= f ( x) = I ( f ( x)).
∴ DT − TD = I .
Theorem 2: Let V ( F ) be a vector space and A, B, C be linear transformations on V.
Then
(i) A 0̂ = 0̂ = 0̂ A (ii) AI = A = IA
(iii) A ( BC ) = ( AB ) C (iv) A ( B + C ) = AB + AC
(v) ( A + B) C = AC + BC
(vi) c ( AB ) = (cA ) B = A (cB ) where c is any element of F.
Proof: Just for the sake of convenience we first mention here our definitions of
addition, scalar multiplication and product of linear transformations :
( A + B ) (α) = A (α) + B (α) …(1)
(cA ) (α) = cA (α) …(2)
( AB ) (α) = A [ B (α)] …(3)
∀ α ∈ V and ∀ c ∈ F.
Now we shall prove the above results.
(i) We have ∀ α ∈ V, (A 0̂) (α) = A [0̂ (α)] [by (3)]
= A (0) [∵ 0̂ is the zero transformation]
= 0 [∵ A is linear, so A (0) = 0]
= 0̂ (α).
∴ A 0̂ = 0̂. [by def. of equality of two functions]
Similarly we can show that 0̂ A = 0̂.
(ii) We have ∀ α ∈ V,
( AI ) (α) = A [I (α)]
= A (α) [∵ I is identity transformation]
∴ AI = A.
Similarly we can show that IA = A.
(iii) We have ∀ α ∈ V,
[ A ( BC )] (α) = A [( BC ) (α)] [by (3)]
= A [ B (C (α))] [by (3)]
= ( AB ) [C (α)] [by (3)]
= [( AB ) C ] (α). [by (3)]
∴ A ( BC ) = ( AB ) C.
127
(iv) We have ∀ α ∈ V,
[ A ( B + C )] (α) = A [( B + C ) (α)] [by (3)]
= A [ B (α) + C (α)] [by (1)]
= A [ B (α)] + A [C (α)] [ ∵ A is a linear
transformation and B (α), C (α) ∈ V ]
= ( AB ) (α) + ( AC ) (α) [by (3)]
= ( AB + AC ) (α) [by (1)]
∴ A ( B + C ) = AB + AC.
(v) We have ∀ α ∈ V,
[( A + B) C ] (α) = ( A + B ) [C (α)] [by (3)]
= A [C (α)] + B [C (α)] [by (1) since C (α) ∈ V ]
= ( AC ) (α) + ( BC ) (α) [by (3)]
= ( AC + BC ) (α) [by (1)]
∴ ( A + B ) C = AC + BC.
(vi) We have ∀ α ∈ V,
[c ( AB )] (α) = c [( AB ) (α)] [by (2)]
= c [ A ( B (α))] [by (3)]
= (cA ) [ B (α)] [by (2) since B (α) ∈ V ]
= [(cA) B ] (α) [by (3)]
∴ c ( AB ) = (cA ) B.
Again [c ( AB )] (α) = c [( AB ) (α)] [by (2)]
= c [ A ( B (α))] [by (3)]
= A [cB (α)]
[ ∵ A is a linear transformation and B (α) ∈ V ]
= A [(cB ) (α)] [by (2)]
= [ A (cB )] (α). [by (3)]
∴ c ( AB ) = A (cB ).
R5 . a ∈ R ⇒ ∃ − a ∈ R such that
(− a) + a = 0.
R6. R is closed with respect to multiplication, i.e.,
ab ∈ R ∀ a, b ∈ R.
R7. (ab) c = a (bc) ∀ a, b, c ∈ R.
R8. Multiplication is distributive with respect to addition, i.e.,
a (b + c) = ab + ac and (a + b) c = ac + bc ∀ a, b, c ∈ R.
Ring with unity element: Definition:
If in a ring R there exists an element 1∈ R such that
1a = a = a1 ∀ a ∈ R,
then R is called a ring with unity element. The element 1 is called the unity element of
the ring.
Theorem: The set L (V , V ) of all linear transformations from a vector space V ( F ) into
itself is a ring with unity element with respect to addition and multiplication of linear
transformations defined as below :
(S + T) (α) = S (α) + T (α)
and (ST) (α) = S [T (α)] ∀ S, T ∈ L (V, V) and ∀ α ∈ V.
Proof: The students should themselves write the complete proof of this theorem.
We have proved all the steps here and there. They should show here that all the ring
postulates are satisfied in the set L (V, V). The transformation 0̂ will act as the zero
element and the identity transformation I will act as the unity element of this ring.
Proof: The students should write the complete proof here. All the necessary steps
have been proved here and there.
3.17 Polynomials
Let T be a linear transformation on a vector space V (F). Then TT is also a linear
transformation on V. We shall write T¹ = T and T² = TT. Since the product of
linear transformations is an associative operation, therefore if m is a positive
integer, we shall define
T^m = TTT … upto m times.
Obviously T^m is a linear transformation on V.
Also we define T⁰ = I (identity transformation).
If m and n are non-negative integers, it can be easily seen that
T^m T^n = T^{m+n} and (T^m)^n = T^{mn}.
The set L (V , V ) of all linear transformations on V is a vector space over the field F.
If a0 , a1 , … , a n ∈ F, then
p (T) = a0 I + a1 T + a2 T² + … + a_n T^n ∈ L (V, V)
i. e., p (T ) is also a linear transformation on V because it is a linear combination over
F of elements of L (V , V ).We call p (T ) as a polynomial in linear transformation T.
The polynomials in a linear transformation behave like ordinary polynomials.
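When T is given by a matrix relative to some basis, p(T) can be computed by accumulating powers of that matrix. A minimal sketch (numpy assumed; the function name and the sample operator are our own choices):

```python
# Evaluating a polynomial in a linear operator, with the operator given by its
# matrix: p(T) = a0 I + a1 T + ... + an T^n is again a linear operator on V.

import numpy as np

def poly_in_operator(coeffs, T):
    """coeffs = [a0, a1, ..., an]; returns the matrix of p(T)."""
    n = T.shape[0]
    result = np.zeros_like(T)
    power = np.eye(n)                   # T^0 = I
    for a in coeffs:
        result = result + a * power
        power = power @ T
    return result

T = np.array([[0., 1.], [1., 0.]])
print(poly_in_operator([1, 0, 1], T))   # I + T^2 = 2I, since T^2 = I here
```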
(ii) We have (aA) ((1/a) A⁻¹) = a (1/a) (AA⁻¹) = 1I = I.
Also ((1/a) A⁻¹) (aA) = (1/a) [A⁻¹ (aA)] = (1/a) [a (A⁻¹A)] = (1/a) a (A⁻¹A) = 1I = I.
Thus (aA) ((1/a) A⁻¹) = I = ((1/a) A⁻¹) (aA).
∴ by theorem 3, aA is invertible and
(aA)⁻¹ = (1/a) A⁻¹.
(iii) Since A is invertible, therefore
AA⁻¹ = I = A⁻¹A.
∴ by theorem 3, A⁻¹ is invertible and
A = (A⁻¹)⁻¹.
Singular and Non-singular transformations:
Definition: Let T be a linear transformation from a vector space U ( F ) into a vector space
V ( F ). Then T is said to be non-singular if the null space of T (i. e., ker T ) consists of the
zero vector alone i. e., if
α ∈ U and T (α) = 0 ⇒ α = 0.
If there exists a vector 0 ≠ α ∈ U such that T (α) = 0, then T is said to be singular.
Theorem 7: Let T be a linear transformation from a vector space U ( F ) into a vector
space V ( F ). Then T is non-singular if and only if T is one-one.
Proof: Given that T is non-singular. Then to prove that T is one-one.
Let α1 , α 2 ∈ U. Then
T (α1 ) = T (α 2 )
⇒ T (α1 ) − T (α 2 ) = 0 ⇒ T (α1 − α 2 ) = 0
⇒ α1 − α 2 = 0 [ ∵ T is non-singular]
⇒ α1 = α 2 .
∴ T is one-one.
Conversely let T be one-one. We know that T (0) = 0. Since T is one-one, therefore
α ∈ U and T (α) = 0 = T (0) ⇒ α = 0. Thus the null space of T consists of zero
vector alone. Therefore T is non-singular.
Theorem 8: Let T be a linear transformation from U into V. Then T is non-singular if
and only if T carries each linearly independent subset of U onto a linearly independent subset
of V.
Proof: First suppose that T is non-singular.
Let B = {α1 , α 2 , … , α n}
be a linearly independent subset of U. Then image of B under T is the subset B ′ of V
given by
B ′ = {T (α1 ), T (α 2 ), … , T (α n )}.
To prove that B ′ is linearly independent.
Let a1 , a2 , … , a n ∈ F and let
134
a1T (α1 ) + … + a n T (α n ) = 0
⇒ T (a1α1 + … + a n α n ) = 0 [ ∵ T is linear]
⇒ a1α1 + … + a n α n = 0 [ ∵ T is non-singular]
⇒ a i = 0, i = 1, 2 , … , n [ ∵ α1 , … , α n are linearly independent]
Thus the image of B under T is linearly independent.
Conversely suppose that T carries independent subsets onto independent subsets.
Then to prove that T is non-singular.
Let α ≠ 0 ∈ U. Then the set S = {α} consisting of the one non-zero vector α is
linearly independent. The image of S under T is the set
S ′ = {T (α)}.
It is given that S ′ is also linearly independent. Therefore T (α) ≠ 0 because the set
consisting of zero vector alone is linearly dependent. Thus
0 ≠ α ∈ U ⇒ T (α) ≠ 0.
This shows that the null space of T consists of the zero vector alone. Therefore T is
non-singular.
Theorem 9: Let U and V be finite dimensional vector spaces over the field F such that dim
U = dim V. If T is a linear transformation from U into V, the following are equivalent.
(i) T is invertible.
(ii) T is non-singular.
(iii) The range of T is V.
(iv) If {α1 , … , α n} is any basis for U, then
{T (α1 ), … , T (α n )} is a basis for V.
(v) There is some basis {α1 , … , α n} for U such that
{T (α1 ), … , T (α n )} is a basis for V.
Proof: (i) ⇒ (ii).
If T is invertible, then T is one-one. Therefore T is non-singular.
(ii) ⇒ (iii).
Let T be non-singular. Let {α1 , … , α n } be a basis for U. Then {α1 , … , α n } is a
linearly independent subset of U. Since T is non-singular therefore
{T (α1 ), … , T (α n )} is a linearly independent subset of V and it contains n vectors.
Since dim V is also n, therefore this set of vectors is a basis for V. Now let β be any
vector in V. Then there exist scalars a1 , … , a n ∈ F such that
β = a1T (α1 ) + … + a n T (α n ) = T (a1 α1 + … + a n α n )
which shows that β is in the range of T because
a1α1 + … + a n α n ∈ U.
Thus every vector in V is in the range of T. Hence range of T is V.
(iii) ⇒ (iv).
135
Now suppose that range of T is V i. e., T is onto. If {α1 , … , α n} is any basis for U,
then the vectors T (α1 ), … , T (α n ) span the range of T which is equal to V. Thus
the vectors T (α1 ), … , T (α n ) which are n in number span V whose dimension is
also n. Therefore {T (α1 ), … , T (α n )} must be a basis set for V.
(iv) ⇒ (v).
Since U is finite dimensional, therefore there exists a basis for U. Let {α1 , … , α n} be
a basis for U. Then {T (α1 ), … , T (α n )} is a basis for V as it is given in (iv).
(v) ⇒ (i).
Suppose there is some basis {α1 , … , α n } for U such that {T (α1 ), … , T (α n ) } is a
basis for V. The vectors {T (α1 ), … , T (α n ) } span the range of T. Also they span V.
Therefore the range of T must be all of V i. e., T is onto.
If α = c1 α1 + … + c n α n is in the null space of T, then
T (c1 α1 + … + c n α n ) = 0
⇒ c1T (α1 ) + … + c n T (α n ) = 0
⇒ c i = 0, 1 ≤ i ≤ n because T (α1 ), … , T (α n ) are linearly independent
⇒ α = 0.
∴ T is non-singular and consequently T is one-one. Hence T is invertible.
TS = 0̂ but ST ≠ 0̂.
Solution: Consider the linear transformations T and S on V2(R) defined by
T (a, b) = (a, 0) ∀ (a, b) ∈ V2(R)
and S (a, b) = (0, a) ∀ (a, b) ∈ V2(R).
Then (TS) (a, b) = T [S (a, b)] = T (0, a) = (0, 0) = 0̂ (a, b) ∀ (a, b) ∈ V2(R).
∴ TS = 0̂.
Again (ST) (a, b) = S [T (a, b)] = S (a, 0) = (0, a), which is not the zero vector whenever a ≠ 0.
Thus ST ≠ 0̂.
Example 19: Let V be a vector space over the field F and T a linear operator on V. If
T² = 0̂, what can you say about the relation of the range of T to the null space of T? Give an
example of a linear operator T on V2(R) such that T ≠ 0̂ but T² = 0̂.
Solution: We have T² = 0̂
⇒ T² (α) = 0̂ (α) ∀ α ∈ V
⇒ T [T (α)] = 0 ∀ α ∈ V
⇒ T (α) ∈ null space of T ∀ α ∈ V.
But T (α) ∈ range of T ∀ α ∈ V.
∴ T² = 0̂ ⇒ range of T ⊆ null space of T.
For the second part of the question, consider the linear transformation T on V2(R)
defined by
T (a, b) = (0, a) ∀ (a, b) ∈ V2(R).
Then obviously T ≠ 0̂.
We have T² (a, b) = T [T (a, b)] = T (0, a) = (0, 0) = 0̂ (a, b) ∀ (a, b) ∈ V2(R).
∴ T² = 0̂.
Example 20: If T : U → V is a linear transformation and U is finite dimensional, show
that U and range of T have the same dimension iff T is non-singular. Determine all
non-singular linear transformations
T : V4 (R) → V3 (R).
Solution: We know that
dim U = rank (T ) + nullity (T )
= dim of range of T + dim of null space of T.
∴ dim U = dim of range of T
iff dim of null space of T is zero
Solution: If A² − A + I = 0̂, then A² − A = −I.
First we shall prove that A is one-one. Let α1, α2 ∈ V.
Then A (α1) = A (α2) …(1)
⇒ A [A (α1)] = A [A (α2)]
⇒ A² (α1) = A² (α2) …(2)
⇒ A² (α1) − A (α1) = A² (α2) − A (α2) [From (2) and (1)]
⇒ (A² − A) (α1) = (A² − A) (α2)
⇒ (−I) (α1) = (−I) (α2) ⇒ −[I (α1)] = −[I (α2)]
⇒ −α1 = −α2 ⇒ α1 = α2.
∴ A is one-one.
Now to prove that A is onto.
Let α ∈ V. Then α − A (α) ∈ V .
We have A [α − A (α)] = A (α) − A² (α) = (A − A²) (α)
= I (α) [∵ A² − A = −I ⇒ A − A² = I]
= α.
Thus α ∈ V ⇒ ∃ α − A (α) ∈ V such that A [α − A (α)] = α.
∴ A is onto.
Hence A is invertible.
Example 23: Let V be a finite dimensional vector space and T be a linear operator on V.
Suppose that rank (T 2 ) = rank ( T ). Prove that the range and null space of T are disjoint
i. e., have only the zero vector in common.
Solution: We have
dim V = rank ( T ) + nullity ( T ) and dim V = rank ( T 2 ) + nullity (T 2 ).
Since rank ( T ) = rank ( T 2 ), therefore we get nullity (T ) = nullity (T 2 )
i. e., dim of null space of T = dim of null space of T 2 .
Now T (α) = 0
⇒ T [T (α)] = T (0) ⇒ T 2 (α) = 0.
∴ α ∈ null space of T ⇒ α ∈ null space of T 2 .
∴ null space of T ⊆ null space of T 2 .
But null space of T and null space of T 2 are both subspaces of V and have the same
dimension.
∴ null space of T = null space of T 2 .
∴ null space of T² ⊆ null space of T
i.e., T² (α) = 0 ⇒ T (α) = 0.
∴ range and null space of T are disjoint. [See Example 5 after 3.12]
Comprehensive Exercise 3
8. For the linear operator T of Solved Example 16, after article 3.18, prove
that (T² − I) (T − 3I) = 0̂.
9. Let T and U be the linear operators on R 2 defined by T (a, b) = (b, a) and
U (a, b) = (a, 0). Give rules like the ones defining T and U for each of the linear
transformations (U + T), UT, TU, T², U².
10. Let T be the (unique) linear operator on C 3 for which
T (1, 0, 0) = (1, 0, i), T (0, 1, 0) = (0, 1, 1), T (0, 0, 1) = (i, 1, 0).
Show that T is not invertible.
11. Show that if two linear transformations of a finite dimensional vector space
coincide on a basis of that vector space, then they are identical.
T ≠ 0̂, T² ≠ 0̂ but T³ = 0̂.
14. Let T be a linear transformation from a vector space U into a vector space V
with Ker T ≠ 0. Show that there exist vectors α1 and α 2 in U such that
α1 ≠ α 2 and Tα1 = Tα 2 .
15. Let T be a linear transformation from V3 (R) into V2 (R), and let S be a linear
transformation from V2 (R) into V3 (R). Prove that the transformation ST is
not invertible.
16. Let A and B be linear transformations on a finite dimensional vector space
V and let AB = I . Then A and B are both invertible and A −1 = B. Give an
example to show that this is false when V is not finite dimensional.
17. If A and B are linear transformations (on the same vector space) and if
AB = I , then A is called a left inverse of B and B is called a right inverse of A.
Prove that if A has exactly one right inverse, say B, then A is invertible.
18. Prove that the set of invertible linear operators on a vector space V with the
operation of composition forms a group. Check if this group is
commutative.
19. Let V and W be vector spaces over the field F and let U be an isomorphism
of V onto W. Prove that T → UTU −1 is an isomorphism of L (V , V ) onto
L (W, W).
20. If {α1 , … , α k } and {β1 , … β k } are linearly independent sets of vectors in a
finite dimensional vector space V, then there exists an invertible linear
transformation T on V such that
T (α i ) = β i , i = 1, … , k.
143
Answers 3
1. T (x1, x2) = (x1 a + x2 c, x1 b + x2 d)   2. T (x1, x2) = (x1 − x2, x1 + 2x2)
3. T (a, b) = 5a − 2b   4. T (a, b, c) = (a + 2b, 2a − b, −4a − 3b)
5. T (a, b, c) = (a + 2b, −a + 3b, 2a − b, 3a)
6. T⁻¹ (p, q) = (q, p − q)   7. T⁻¹ (x, y, z) = (x + (1/2) y, (1/2) z, (1/2) x − (1/2) y)
9. (U + T) (a, b) = (a + b, a); (UT) (a, b) = (b, 0); (TU) (a, b) = (0, a);
T² (a, b) = (a, b); U² (a, b) = (a, 0)
18. Not commutative
5. Let U and V be vector spaces over the field F and let T be a linear
transformation from U into V. Suppose that U is finite dimensional. Then
rank (T ) + nullity (T ) = …… .
6. Let V be an n-dimensional vector space over the field F and let T be a linear
transformation from V into V such that the range and null space of T are
identical. Then n is …… .
True or False
Write ‘T’ for true and ‘F’ for false statement.
1. If a finite dimensional vector space V ( F ) is a direct sum of two subspaces W1
and W2 , then dim V = dim W1 + dim W2 .
4. Two linear transformations from U into V are equal if they agree on a basis of
U.
5. For two linear operators T and S on R², TS = 0̂ ⇒ ST = 0̂.
Answers
True or False
1. T 2. F 3. F 4. T
5. F 6. F 7. T
4
Matrices and Linear Transformations
4.1 Matrix
Definition: Let F be any field. A set of mn elements of F arranged in the form of a
rectangular array having m rows and n columns is called an m × n matrix over the field F.
An m × n matrix is usually written as
A = [ a11 a12 … a1n ]
    [ a21 a22 … a2n ]
    [ …   …   …  …  ]
    [ am1 am2 … amn ].
In a compact form the above matrix is represented by A = [a_ij]_{m×n}. The element a_ij
is called the (i, j)th element of the matrix A. In this element the first suffix i denotes
the number of the row and the second suffix j the number of the column in which the element occurs.
If in a matrix A the number of rows is equal to the number of columns and is equal
to n, then A is called a square matrix of order n and the elements a ij for which i = j
constitute its principal diagonal.
Unit matrix: A square matrix each of whose diagonal elements is equal to1and each of whose
non-diagonal elements is equal to zero is called a unit matrix or an identity matrix. We shall
denote it by I. Thus if I is unit matrix of order n, then I = [δ ij ] n × n where δ ij is
Kronecker delta.
Diagonal matrix: A square matrix is said to be a diagonal matrix if all the elements lying
above and below the principal diagonal are equal to 0. For example,
[ 0   0   0  0 ]
[ 0  2+i  0  0 ]
[ 0   0   0  0 ]
[ 0   0   0  5 ]
is a diagonal matrix of order 4 over the field of complex numbers.
Null matrix: The m × n matrix whose elements are all zero is called the null
matrix (or zero matrix) of the type m × n.
Equality of two matrices: Definition:
Let A = [a_ij]_{m×n} and B = [b_ij]_{m×n}. Then
A = B if a_ij = b_ij for each pair of subscripts i and j.
Addition of two matrices: Definition:
Let A = [a ij ] m × n , B = [b ij ] m × n . Then we define
A + B = [a ij + b ij ] m × n .
Multiplication of a matrix by a scalar: Definition:
Let A = [a ij ] m × n and a ∈ F i. e., a be a scalar. Then we define
aA = [aa ij ] m × n .
Multiplication of two matrices: Definition:
Let A = [a ij ] m × n , B = [b jk ] n × p i. e.,the number of columns in the matrix
A is equal to the number of rows in the matrix B. Then we define
AB = [ Σ_{j=1}^{n} a_ij b_jk ]_{m×p}, i.e., AB is an m × p matrix whose (i, k)th
element is equal to Σ_{j=1}^{n} a_ij b_jk.
If A and B are both square matrices of order n, then both the products AB and BA
exist but in general AB ≠ BA.
Transpose of a matrix: Definition:
Let A = [a ij ] m × n . The n × m matrix AT obtained by interchanging the rows and
columns of A is called the transpose of A. Thus AT = [b ij ] n × m , where b ij = a ji , i. e.,
the (i, j) th element of AT is the ( j, i) th element of A. If A is an m × n matrix and B is
an n × p matrix, it can be shown that ( AB)T = BT AT . The transpose of a matrix A
is also denoted by A t or by A′ .
Determinant of a square matrix: Let Pn denote the group of all permutations of
degree n on the set {1, 2 , … , n}. If θ ∈ Pn , then θ (i) will denote the image of i under θ.
The symbol (− 1) θ for θ ∈ Pn will mean + 1if θ is an even permutation and − 1if θ is an
odd permutation.
Definition: Let A = [a_ij]_{n×n}. Then the determinant of A, written as det A or | A | or
| a_ij |_{n×n}, is the element
det A = Σ_{θ ∈ Pn} (−1)^θ a_{1θ(1)} a_{2θ(2)} … a_{nθ(n)} of F.
Aij = cofactor of a ij in A
= (− 1) i + j . [determinant of the matrix of order n − 1 obtained
by deleting the row and column of A passing through a ij ].
It should be noted that
Σ_{i=1}^{n} a_ik A_ij = 0 if k ≠ j, and = det A if k = j.
Adjoint of a square matrix: Definition: Let A = [a ij ] n × n .
The n × n matrix which is the transpose of the matrix of cofactors of A is called the
adjoint of A and is denoted by adj A.
It should be remembered that
A (adj A) = (adj A) A = (det A) I
where I is the unit matrix of order n.
Inverse of a square matrix: Definition: Let A be a square matrix of order n. If
there exists a square matrix B of order n such that
AB = I = BA
then A is said to be invertible and B is called the inverse of A.
Also we write B = A −1 .
The following results should be remembered :
(i) The necessary and sufficient condition for a square matrix A to be invertible
is that det A ≠ 0.
D (α1) = D (x⁰) = 0 = 0x⁰ + 0x¹ + 0x² + 0x³ = 0α1 + 0α2 + 0α3 + 0α4
D (α2) = D (x¹) = x⁰ = 1x⁰ + 0x¹ + 0x² + 0x³ = 1α1 + 0α2 + 0α3 + 0α4
D (α3) = D (x²) = 2x¹ = 0x⁰ + 2x¹ + 0x² + 0x³ = 0α1 + 2α2 + 0α3 + 0α4
D (α4) = D (x³) = 3x² = 0x⁰ + 0x¹ + 3x² + 0x³ = 0α1 + 0α2 + 3α3 + 0α4.
∴ the matrix of D relative to the ordered basis B is
[D ; B] = [ 0 1 0 0 ]
          [ 0 0 2 0 ]
          [ 0 0 0 3 ]
          [ 0 0 0 0 ]  (a 4 × 4 matrix).
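The same matrix [D ; B] can be generated mechanically: column j holds the coordinates of D(α_j) in the basis B. A short sketch (numpy assumed; illustrative only):

```python
# Reconstructing [D ; B] for B = {1, x, x^2, x^3}: column j holds the
# coordinates of D(alpha_j) in the basis B, reproducing the matrix above.

import numpy as np

n = 4
M = np.zeros((n, n))
for j in range(n):            # alpha_{j+1} = x^j, and D(x^j) = j x^(j-1)
    if j > 0:
        M[j - 1, j] = j
print(M)
```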
Theorem 1: Let U be an n-dimensional vector space over the field F and let V be an
m-dimensional vector space over F. Let B and B ′ be ordered bases for U and V respectively.
Then corresponding to every matrix [a ij ] m × n of mn scalars belonging to F there corresponds a
unique linear transformation T from U into V such that
[T ; B ; B ′ ] = [a ij ] m × n .
Proof: Let B = {α1 , α 2 , … , α n} and B ′ = { β1 , β 2 , … , β m }.
Now Σ_{i=1}^{m} a_i1 β_i, Σ_{i=1}^{m} a_i2 β_i, …, Σ_{i=1}^{m} a_in β_i are n vectors in V.
Since B is a basis for U, therefore by the theorem 2 of article 3.13 of chapter 3 there
exists a unique linear transformation T from U into V such that
T (α_j) = Σ_{i=1}^{m} a_ij β_i where j = 1, 2, …, n. …(1)
and B ′ = { β1 , β 2 , … , β m }.
Then A = [T ; B ; B′] = [a_ij]_{m×n},
where T (α_j) = Σ_{i=1}^{m} a_ij β_i, j = 1, 2, …, n. …(1)
If α = x1 α1 + … + x_n α_n ∈ U, then
T (α) = Σ_{j=1}^{n} x_j T (α_j) = Σ_{i=1}^{m} Σ_{j=1}^{n} a_ij x_j β_i. …(2)
The co-ordinate matrix of T (α) with respect to the ordered basis B′ is an m × 1 matrix.
From (2), we see that the ith entry of this column matrix [T (α)]_{B′}
= Σ_{j=1}^{n} a_ij x_j.
(ii) We have 0̂ (α_j) = 0, j = 1, 2, …, n
= 0α1 + 0α2 + … + 0α_n
= Σ_{i=1}^{n} o_ij α_i, where each o_ij = 0.
∴ [0̂ ; B] = [o_ij]_{n×n} = the null matrix of the type n × n.
∴ matrix of T + S relative to B, B′ = [a_ij + b_ij]_{m×n}
= [a_ij]_{m×n} + [b_ij]_{m×n}.
∴ [T + S ; B ; B′] = [T ; B ; B′] + [S ; B ; B′].
(ii) We have (cT) (α_j) = cT (α_j), j = 1, 2, …, n
= c Σ_{i=1}^{m} a_ij β_i = Σ_{i=1}^{m} (c a_ij) β_i.
Theorem 5: Let U, V and W be finite dimensional vector spaces over the field F ; let T be a
linear transformation from U into V and S a linear transformation from V into W.Further let
B, B ′ and B ′ ′ be ordered bases for spaces U, V and W respectively. If A is the matrix of T
relative to the pair B, B ′ and D is the matrix of S relative to the pair B ′ , B ′ ′ then the matrix
of the composite transformation ST relative to the pair B, B ′ ′ is the product matrix C = DA.
Proof: Let dim U = n, dim V = m and dim W = p. Further let
B = {α1 , α 2 , … , α n }, B ′ = { β1 , β 2 , … , β m }
and B′ ′ = {γ 1 , γ 2 , … , γ p }.
Let A = [a_ij]_{m×n}, D = [d_ki]_{p×m} and C = [c_kj]_{p×n}. Then
T (α_j) = Σ_{i=1}^{m} a_ij β_i, j = 1, 2, …, n, …(1)
S (β_i) = Σ_{k=1}^{p} d_ki γ_k, i = 1, 2, …, m, …(2)
and (ST) (α_j) = Σ_{k=1}^{p} c_kj γ_k, j = 1, 2, …, n. …(3)
∴ [c_kj]_{p×n} = [ Σ_{i=1}^{m} d_ki a_ij ]_{p×n}
= [d_ki]_{p×m} [a_ij]_{m×n}, by def. of product of two matrices.
Thus C = DA.
Note: If U = V = W, then the statement and proof of the above theorem will be as
follows :
Let V be an n-dimensional vector space over the field F ; let T and S be linear transformations
of V. Further let B be an ordered basis for V. If A is the matrix of T relative to B, and D is the
matrix of S relative to B,then the matrix of the composite transformation ST relative to B is the
product matrix
C = DA i. e., [ST ] B = [S ] B [T ] B .
Proof: Let B = {α1, α2, …, α_n}.
Let A = [a_ij]_{n×n}, D = [d_ki]_{n×n} and C = [c_kj]_{n×n}. Then
T (α_j) = Σ_{i=1}^{n} a_ij α_i, j = 1, 2, …, n, …(1)
S (α_i) = Σ_{k=1}^{n} d_ki α_k, i = 1, 2, …, n, …(2)
and (ST) (α_j) = Σ_{k=1}^{n} c_kj α_k, j = 1, 2, …, n. …(3)
We have (ST) (α_j) = S [T (α_j)]
= S [ Σ_{i=1}^{n} a_ij α_i ] = Σ_{i=1}^{n} a_ij S (α_i) = Σ_{i=1}^{n} a_ij Σ_{k=1}^{n} d_ki α_k
= Σ_{k=1}^{n} [ Σ_{i=1}^{n} d_ki a_ij ] α_k. …(4)
∴ from (3) and (4), we have
c_kj = Σ_{i=1}^{n} d_ki a_ij.
∴ [c_kj]_{n×n} = [ Σ_{i=1}^{n} d_ki a_ij ]_{n×n} = [d_ki]_{n×n} [a_ij]_{n×n}.
∴ C = DA.
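Theorem 5 (and the note above) can be checked numerically: applying the matrix of S to the matrix of T times a coordinate vector agrees with applying the single matrix C = DA. A minimal sketch with randomly chosen integer matrices (numpy assumed; the dimensions are arbitrary):

```python
# Theorem 5 checked numerically: with coordinates taken in fixed bases, the
# matrix of a composite transformation is the product of the matrices, C = DA.

import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 4))      # matrix of T : U -> V  (dim U = 4, dim V = 3)
D = rng.integers(-3, 4, size=(2, 3))      # matrix of S : V -> W  (dim W = 2)

x = rng.integers(-3, 4, size=4)           # coordinates [alpha]_B of a vector in U
assert np.array_equal(D @ (A @ x), (D @ A) @ x)
print("applying S after T agrees with the single matrix C = DA")
```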
Theorem 6: Let U be an n-dimensional vector space over the field F and let V be an
m-dimensional vector space over F. For each pair of ordered bases B, B ′ for U and V
respectively, the function which assigns to a linear transformation T its matrix relative to B,
B ′ is an isomorphism between the space L (U, V ) and the space of all m × n matrices over the
field F.
Proof. Let B = {α1 , … , α n } and B′ = { β1 , … , β m }.
Let M be the vector space of all m × n matrices over the field F. Let
ψ : L (U, V ) → M such that
ψ(T) = [T ; B ; B′] ∀ T ∈ L(U, V).
Let T1 , T2 ∈ L (U, V ) ; and let
[T1 ; B ; B′ ] = [a ij ] m × n and [T2 ; B ; B ′ ] = [b ij ] m × n .
Then T_1(α_j) = Σ_{i=1}^m a_{ij} β_i and T_2(α_j) = Σ_{i=1}^m b_{ij} β_i, j = 1, 2, …, n.
Now ψ(T_1) = ψ(T_2) ⇒ [a_{ij}]_{m×n} = [b_{ij}]_{m×n} ⇒ a_{ij} = b_{ij} for all i, j
⇒ T_1(α_j) = T_2(α_j) for j = 1, …, n
⇒ T_1 = T_2. [∵ T_1 and T_2 agree on a basis for U]
∴ ψ is one-one.
ψ is onto:
Let [c_{ij}]_{m×n} ∈ M. Then there exists a linear transformation T from U into V such that
T(α_j) = Σ_{i=1}^m c_{ij} β_i, j = 1, 2, …, n.
We have [T ; B ; B′] = [c_{ij}]_{m×n} ⇒ ψ(T) = [c_{ij}]_{m×n}.
∴ ψ is onto.
ψ is a linear transformation:
If a, b ∈ F, then
ψ (aT1 + bT2 ) = [aT1 + bT2 ; B ; B′ ] [by def. of ψ]
= [aT1 ; B ; B′ ] + [bT2 ; B ; B′ ] [by theorem 4]
= a [T1 ; B ; B′ ] + b [T2 ; B ; B′ ] [by theorem 4]
= aψ (T1 ) + bψ (T2 ), by def. of ψ.
∴ ψ is a linear transformation.
Hence ψ is an isomorphism from L (U, V ) onto M.
Note: It should be noted that in the above theorem, if U = V, then ψ also preserves products and the identity, i.e.,
ψ(T_1T_2) = ψ(T_1) ψ(T_2)
and ψ(I) = I, the unit matrix.
Theorem 7: Let T be a linear operator on an n-dimensional vector space V and let B be an
ordered basis for V. Prove that T is invertible iff [T ] B is an invertible matrix. Also if T is
invertible, then
[T^{-1}]_B = ([T]_B)^{-1},
i. e., the matrix of T −1 relative to B is the inverse of the matrix of T relative to B.
= Σ_{i=1}^n Σ_{j=1}^n a_{ij} y_j α_i.
Also α = Σ_{i=1}^n x_i α_i.
∴ x_i = Σ_{j=1}^n a_{ij} y_j, i = 1, 2, …, n.
Similarity of matrices: Definition: Let A and B be square matrices of order n over the field F. Then B is said to be similar to A if there exists an n × n invertible square matrix C with elements in F such that
B = C^{-1}AC.
Theorem 10: The relation of similarity is an equivalence relation in the set of all n × n
matrices over the field F.
Proof: If A and B are two n × n matrices over the field F, then B is said to be
similar to A if there exists an n × n invertible matrix C over F such that
B = C −1 AC.
Reflexive: Let A be any n × n matrix over F. We can write A = I^{-1}AI, where I is the n × n unit matrix over F.
∴ A is similar to A because I is definitely invertible.
Symmetric: Let A be similar to B. Then there exists an n × n invertible matrix P
over F such that
A = P −1 BP
⇒ PAP −1 = P ( P −1 BP ) P −1
⇒ PAP −1 = B ⇒ B = PAP −1
⇒ B = ( P −1 ) −1 AP −1
[∵ P is invertible means P −1 is invertible and ( P −1 ) −1 = P ]
⇒ B is similar to A.
Transitive: Let A be similar to B and B be similar to C. Then
A = P −1 BP and B = Q −1 CQ,
where P and Q are invertible n × n matrices over F.
We have A = P −1 BP = P −1 (Q −1 CQ ) P
= (P^{-1}Q^{-1}) C (QP)
= (QP)^{-1} C (QP) [∵ P and Q are invertible means QP is invertible and (QP)^{-1} = P^{-1}Q^{-1}]
∴ A is similar to C.
Hence similarity is an equivalence relation on the set of n × n matrices over the field
F.
Theorem 11: Similar matrices have the same determinant.
Proof: Let B be similar to A.Then there exists an invertible matrix C such that
B = C −1 AC
⇒ det B = det (C −1 AC ) ⇒ det B = (det C −1 ) (det A) (det C )
⇒ det B = (det C −1 ) (det C ) (det A) ⇒ det B = (det C −1 C ) (det A)
⇒ det B = (det I ) (det A) ⇒ det B = 1 (det A) ⇒ det B = det A.
Similarity of linear transformations: Definition: Let A and B be linear
transformations on a vector space V ( F ). Then B is said to be similar to A if there exists an
invertible linear transformation C on V such that
B = CAC −1 .
Theorem 12: The relation of similarity is an equivalence relation in the set of all linear
transformations on a vector space V ( F ).
Proof: If A and B are two linear transformations on the vector space V ( F ), then
B is said to be similar to A if there exists an invertible linear transformation C on V
such that
B = CAC −1 .
Reflexive: Let A be any linear transformation on V. We can write
A = IAI −1 ,
where I is identity transformation on V.
∴ A is similar to A because I is definitely invertible.
Symmetric: Let A be similar to B. Then there exists an invertible linear
transformation P on V such that
A = PBP −1
⇒ P −1 AP = P −1 ( PBP −1 ) P
⇒ P −1 AP = B ⇒ B = P −1 AP
⇒ B = P −1 A ( P −1 ) −1
⇒ B is similar to A.
Transitive: Let A be similar to B and B be similar to C.
Then A = PBP −1 ,
and B = QCQ −1 ,
where P and Q are invertible linear transformations on V.
= Σ_{i=1}^n Σ_{k=1}^n p_{ik} c_{kj} α_i. …(6)
From (5) and (6), we have
Σ_{i=1}^n Σ_{k=1}^n a_{ik} p_{kj} α_i = Σ_{i=1}^n Σ_{k=1}^n p_{ik} c_{kj} α_i
⇒ Σ_{k=1}^n a_{ik} p_{kj} = Σ_{k=1}^n p_{ik} c_{kj} for all i, j
⇒ [a_{ik}]_{n×n} [p_{kj}]_{n×n} = [p_{ik}]_{n×n} [c_{kj}]_{n×n}
[by def. of matrix multiplication]
⇒ AP = PC
⇒ P −1 AP = P −1 PC [∵ P −1 exists]
⇒ P −1 AP = IC ⇒ P −1 AP = C
⇒ C is similar to A.
Note: Suppose B and B′ are two ordered bases for an n-dimensional vector space
V ( F ).Let T be a linear operator on V.Suppose A is the matrix of T relative to B and
C is the matrix of T relative to B′ . If P is the transition matrix from the basis B to the
basis B′ , then C = P −1 AP.
This result enables us to find the matrix of T relative to the basis B′ when we already know the matrix of T relative to the basis B.
Theorem 14: Let V be an n-dimensional vector space over the field F and T1 and T2 be
two linear operators on V. If there exist two ordered bases B and B ′ for V such that
[T1 ] B = [T2 ] B ′ , then show that T2 is similar to T1 .
Proof: Let B = {α1 , … , α n } and B′ = { β1 , … , β n }.
Let [T1 ] B = [T2 ] B ′ = A = [a ij ] n × n . Then
T_1(α_j) = Σ_{i=1}^n a_{ij} α_i, j = 1, 2, …, n, …(1)
and T_2(β_j) = Σ_{i=1}^n a_{ij} β_i, j = 1, 2, …, n. …(2)
Let S be the linear operator on V defined by S(α_j) = β_j, j = 1, 2, …, n; S is invertible since it maps the basis B onto the basis B′. Then
(T_2S)(α_j) = T_2(β_j) = Σ_{i=1}^n a_{ij} β_i [From (2)] …(4)
and Σ_{i=1}^n a_{ij} β_i = Σ_{i=1}^n a_{ij} S(α_i) [by def. of S]
= S(Σ_{i=1}^n a_{ij} α_i) [∵ S is linear]
= S[T_1(α_j)] [From (1)]
= (ST_1)(α_j). …(5)
From (4) and (5), we have
(T2 S ) (α j ) = (ST1 ) (α j ), j = 1, 2, … , n.
Since T2 S and ST1 agree on a basis for V, therefore we have T2 S = ST1
⇒ T2 SS −1 = ST1 S − 1 ⇒ T2 I = ST1 S −1
⇒ T2 = ST1 S −1 ⇒ T2 is similar to T1 .
Determinant of a linear transformation on a finite dimensional vector
space: Let T be a linear operator on an n-dimensional vector space V ( F ).If B and
B′ are two ordered bases for V, then [T ] B and [T ] B ′
are similar matrices. Also similar matrices have the same determinant. This enables
us to make the following definition :
Definition: Let T be a linear operator on an n-dimensional vector space V ( F ). Then the
determinant of T is the determinant of the matrix of T relative to any ordered basis for V.
By the above discussion the determinant of T as defined by us will be a unique
element of F and thus our definition is sensible.
Scalar Transformation: Definition: Let V ( F ) be a vector space. A linear
transformation T on V is said to be a scalar transformation of V if
T(α) = cα ∀ α ∈ V,
where c is a fixed scalar in F.
Also then we write T = c and we say that the linear transformation T is equal to the
scalar c.
Also obviously if the linear transformation T is equal to the scalar c, then we have
T = cI , where I is the identity transformation on V.
Trace of a Matrix: Definition: Let A be a square matrix of order n over a field F. The
sum of the elements of A lying along the principal diagonal is called the trace of A. We shall
write the trace of A as trace A or tr A. Thus if A = [a_{ij}]_{n×n}, then
tr A = Σ_{i=1}^n a_{ii} = a_{11} + a_{22} + … + a_{nn}.
In the following two theorems we have given some fundamental properties of the
trace function.
Theorem 15: Let A and B be two square matrices of order n over a field F and λ ∈ F.Then
(i) tr (λ A) = λ tr A ;
(ii) tr ( A + B ) = tr A + tr B ;
(iii) tr ( AB ) = tr ( BA).
Proof: (i) We have λA = [λa_{ij}]_{n×n}.
∴ tr(λA) = Σ_{i=1}^n λa_{ii} = λ Σ_{i=1}^n a_{ii} = λ tr A.
(ii) We have A + B = [a_{ij} + b_{ij}]_{n×n}.
∴ tr(A + B) = Σ_{i=1}^n (a_{ii} + b_{ii}) = Σ_{i=1}^n a_{ii} + Σ_{i=1}^n b_{ii} = tr A + tr B.
(iii) We have AB = [c_{ij}]_{n×n} where c_{ij} = Σ_{k=1}^n a_{ik} b_{kj}.
Also BA = [d_{ij}]_{n×n} where d_{ij} = Σ_{k=1}^n b_{ik} a_{kj}.
Now tr(AB) = Σ_{i=1}^n c_{ii} = Σ_{i=1}^n Σ_{k=1}^n a_{ik} b_{ki}
= Σ_{k=1}^n Σ_{i=1}^n b_{ki} a_{ik} = Σ_{k=1}^n d_{kk} = tr(BA).
              [ 3   3   3]
∴ [T]_{B′} = [−6  −6  −2].
              [ 6   5  −1]
Example 4: Let T be the linear operator on R3 defined by
T ( x1 , x2 , x3 ) = (3 x1 + x3 , − 2 x1 + x2 , − x1 + 2 x2 + 4 x3 ).
What is the matrix of T in the ordered basis {α1 , α 2 , α 3 } where
α1 = (1, 0, 1), α 2 = (− 1, 2, 1) and α 3 = (2, 1, 1) ?
Solution: By def. of T, we have
T (α1 ) = T (1, 0, 1) = (4, − 2, 3).
Now our aim is to express (4, − 2, 3) as a linear combination of the vectors in the
basis B = {α1 , α 2 , α 3 }. Let
(a, b, c ) = x α1 + yα 2 + zα 3
= x (1, 0, 1) + y (− 1, 2, 1) + z (2, 1, 1)
= ( x − y + 2z , 2 y + z , x + y + z ).
Then x − y + 2z = a, 2 y + z = b, x + y + z = c .
Solving these equations, we have
x = (−a − 3b + 5c)/4, y = (b + c − a)/4, z = (b − c + a)/2. …(1)
Putting a = 4, b = − 2, c = 3 in (1), we get
x = 17/4, y = −3/4, z = −1/2.
∴ T(α_1) = (17/4)α_1 − (3/4)α_2 − (1/2)α_3.
Also T (α 2 ) = T (− 1, 2, 1) = (− 2, 4, 9).
Putting a = −2, b = 4, c = 9 in (1), we get x = 35/4, y = 15/4, z = −7/2.
∴ T(α_2) = (35/4)α_1 + (15/4)α_2 − (7/2)α_3.
Finally T (α 3 ) = T (2, 1, 1) = (7, − 3, 4).
Putting a = 7, b = −3, c = 4 in (1), we get x = 11/2, y = −3/2, z = 0.
∴ T(α_3) = (11/2)α_1 − (3/2)α_2 + 0α_3.
            [17/4   35/4   11/2]
∴ [T]_B = [−3/4   15/4   −3/2].
            [−1/2   −7/2     0 ]
We shall now find a formula for T −1 . Let α = (a, b, c ) be any vector belonging to R3 .
Then
[T^{-1}(α)]_B = [T^{-1}]_B [α]_B [See Note of Theorem 2, article 4.2]
        [ 4    2   −1] [a]          [ 4a + 2b − c  ]
= (1/9) [ 8   13   −2] [b] = (1/9) [ 8a + 13b − 2c].
        [−3   −6    3] [c]          [−3a − 6b + 3c ]
Since B is the standard ordered basis for R³,
∴ T^{-1}(α) = T^{-1}(a, b, c) = (1/9)(4a + 2b − c, 8a + 13b − 2c, −3a − 6b + 3c).
Example 6: Let T be the linear operator on R3 defined by
T ( x1 , x2 , x3 ) = (3 x1 + x3 , − 2 x1 + x2 , − x1 + 2 x2 + 4 x3 ).
(i) What is the matrix of T in the standard ordered basis B for R3 ?
(ii) Find the transition matrix P from the ordered basis B to the ordered basis
B′ = {α1 , α 2 , α 3 } where α1 = (1, 0, 1), α 2 = (− 1, 2, 1), and α 3 = (2, 1, 1). Hence find the
matrix of T relative to the ordered basis B ′ .
Solution: (i) Let A = [T ] B . Then
     [ 3   0   1]
A = [−2   1   0]. [For calculation work see Example 5]
     [−1   2   4]
(ii) Since B is the standard ordered basis, therefore the transition matrix P from B
to B′ can be immediately written as
     [1  −1   2]
P = [0   2   1].
     [1   1   1]
Now [T]_{B′} = P^{-1}[T]_B P. [See note of theorem 13, article 4.2]
In order to compute the matrix P^{-1}, we find that det P = −4.
Therefore P^{-1} = (1/det P) Adj P = −(1/4) [1 3 −5; 1 −1 −1; −2 −2 2].
                    [ 1   3  −5] [ 3   0   1] [1  −1   2]
∴ [T]_{B′} = −(1/4) [ 1  −1  −1] [−2   1   0] [0   2   1]
                    [−2  −2   2] [−1   2   4] [1   1   1]
          [ 2   −7  −19] [1  −1   2]
= −(1/4) [ 6   −3   −3] [0   2   1]
          [−4    2    6] [1   1   1]
          [−17  −35  −22]   [17/4   35/4   11/2]
= −(1/4) [  3  −15    6] = [−3/4   15/4   −3/2].
          [  2   14    0]   [−1/2   −7/2     0 ]
[Note that this result tallies with that of Example 4.]
Example 7: Let T be a linear operator on R2 defined by :
T ( x, y) = (2 y, 3 x − y).
Find the matrix representation of T relative to the basis { (1, 3), (2, 5) }.
Solution: Let α_1 = (1, 3) and α_2 = (2, 5). By def. of T, we have
T(α_1) = (6, 0) = −30α_1 + 18α_2 and T(α_2) = (10, 1) = −48α_1 + 29α_2,
so that [T]_B = [−30 −48; 18 29] relative to B = {α_1, α_2}.
Example 8: Let V be the vector space of all 2 × 2 matrices over F and let T be the linear operator on V defined by T(A) = [1 1; 1 1] A. Find the matrix of T relative to the ordered basis B = {α_1, α_2, α_3, α_4}, where
α_1 = [1 0; 0 0], α_2 = [0 1; 0 0], α_3 = [0 0; 1 0], α_4 = [0 0; 0 1].
Solution: By def. of T, we have
T(α_1) = [1 1; 1 1][1 0; 0 0] = [1 0; 1 0] = 1α_1 + 0α_2 + 1α_3 + 0α_4,
T(α_2) = [1 1; 1 1][0 1; 0 0] = [0 1; 0 1] = 0α_1 + 1α_2 + 0α_3 + 1α_4,
T(α_3) = [1 1; 1 1][0 0; 1 0] = [1 0; 1 0] = 1α_1 + 0α_2 + 1α_3 + 0α_4,
and T(α_4) = [1 1; 1 1][0 0; 0 1] = [0 1; 0 1] = 0α_1 + 1α_2 + 0α_3 + 1α_4.
            [1  0  1  0]
∴ [T]_B = [0  1  0  1].
            [1  0  1  0]
            [0  1  0  1]
Example 9: If the matrix of a linear transformation T on V_2(C), with respect to the ordered basis B = {(1, 0), (0, 1)}, is [1 1; 1 1], what is the matrix of T with respect to the ordered basis B′ = {(1, 1), (1, −1)}?
Solution: Let us first define T explicitly. It is given that
[T]_B = [1 1; 1 1].
∴ T (1, 0) = 1 (1, 0) + 1 (0, 1) = (1, 1), and T (0, 1) = 1 (1, 0) + 1 (0, 1) = (1, 1).
If (a, b) ∈ V2 (C), then we can write (a, b) = a (1, 0) + b (0, 1).
∴ T (a, b) = aT (1, 0) + bT (0, 1)
= a (1, 1) + b (1, 1) = (a + b, a + b).
This is the explicit expression for T.
Now let us find the matrix of T with respect to B′ .
We have T (1, 1) = (2, 2).
Let (2, 2) = x (1, 1) + y (1, − 1) = ( x + y, x − y).
Then x + y = 2, x − y = 2
⇒ x = 2 , y = 0.
∴ (2, 2) = 2 (1, 1) + 0 (1, − 1).
Also T (1, − 1) = (0, 0) = 0 (1, 1) + 0 (1, − 1).
∴ [T]_{B′} = [2 0; 0 0].
Note: If P is the transition matrix from the basis B to the basis B′, then
P = [1 1; 1 −1].
We can compute [T ] B ′ by using the formula
[T ] B ′ = P −1 [T ] B P.
Example 10: Show that the vectors α1 = (1, 0, − 1), α 2 = (1, 2, 1), α 3 = (0, − 3, 2) form
a basis for R3 . Express each of the standard basis vectors as a linear combination of
α1 , α 2 , α 3 .
Solution: Let a, b, c be scalars, i.e., real numbers, such that
aα_1 + bα_2 + cα_3 = 0,
i.e., a(1, 0, −1) + b(1, 2, 1) + c(0, −3, 2) = (0, 0, 0),
i.e., (a + b + 0c, 0a + 2b − 3c, −a + b + 2c) = (0, 0, 0),
i.e., a + b + 0c = 0, 0a + 2b − 3c = 0, −a + b + 2c = 0. …(1)
The coefficient matrix A of these equations is
     [ 1   1   0]
A = [ 0   2  −3].
     [−1   1   2]
We have det A = |A| = 1(4 + 3) − 1(0 − 3) + 0 = 7 + 3 = 10.
Since det A ≠ 0, therefore the matrix A is non-singular and rank A = 3 i. e., equal to
the number of unknowns a, b, c . Hence a = 0, b = 0, c = 0 is the only solution of the
equations (1). Therefore the vectors α1 , α 2 , α 3 are linearly independent over R.
Since dim R3 = 3, therefore the set {α1 , α 2 , α 3 } containing three linearly
independent vectors forms a basis for R3 .
Now let B = { e1 , e2 , e3 } be the standard ordered basis for R3 .
Then e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1).
Let B′ = {α1 , α 2 , α 3 }.
We have α1 = (1, 0, − 1) = 1 e1 + 0 e2 − 1e3
α 2 = (1, 2, 1) = 1e1 + 2e2 + 1e3
α 3 = (0, − 3, 2) = 0 e1 − 3e2 + 2e3 .
Therefore the transition matrix P from the basis B to the basis B′ is
     [ 1   1   0]
P = [ 0   2  −3].
     [−1   1   2]
Let us find the matrix P −1 . For this let us first find Adj. P.
The cofactors of the elements of the first row of P are
|2 −3; 1 2|, −|0 −3; −1 2|, |0 2; −1 1|, i.e., 7, 3, 2.
The cofactors of the elements of the second row of P are
−|1 0; 1 2|, |1 0; −1 2|, −|1 1; −1 1|, i.e., −2, 2, −2.
The cofactors of the elements of the third row of P are
|1 0; 2 −3|, −|1 0; 0 −3|, |1 1; 0 2|, i.e., −3, 3, 2.
                                      [ 7   3   2]   [7  −2  −3]
∴ Adj P = transpose of the matrix [−2   2  −2] = [3   2   3].
                                      [−3   3   2]   [2  −2   2]
∴ P^{-1} = (1/det P) Adj P = (1/10) [7 −2 −3; 3 2 3; 2 −2 2].
Now e_1 = 1e_1 + 0e_2 + 0e_3.
∴ the coordinate matrix of e_1 relative to the basis B is [1; 0; 0].
∴ the co-ordinate matrix of e_1 relative to the basis B′ is
[e_1]_{B′} = P^{-1}[1; 0; 0] = (1/10)[7 −2 −3; 3 2 3; 2 −2 2][1; 0; 0] = (1/10)[7; 3; 2] = [7/10; 3/10; 2/10].
∴ e_1 = (7/10)α_1 + (3/10)α_2 + (2/10)α_3.
Also [e_2]_B = [0; 1; 0] and [e_3]_B = [0; 0; 1].
∴ [e_2]_{B′} = P^{-1}[0; 1; 0] and [e_3]_{B′} = P^{-1}[0; 0; 1].
Thus [e_2]_{B′} = (1/10)[−2; 2; −2] and [e_3]_{B′} = (1/10)[−3; 3; 2].
∴ e_2 = −(2/10)α_1 + (2/10)α_2 − (2/10)α_3
and e_3 = −(3/10)α_1 + (3/10)α_2 + (2/10)α_3.
Example 11: Let A be an m × n matrix with real entries. Prove that A = 0 (null matrix) if
and only if trace ( A t A) = 0.
Solution: Let A = [a_{ij}]_{m×n}. Then A^t = [b_{ij}]_{n×m}, where b_{ij} = a_{ji}.
Now A^tA is a matrix of the type n × n. Let A^tA = [c_{ij}]_{n×n}. Then
c_{ii} = the sum of the products of the corresponding elements of the ith row of A^t and the ith column of A
= b_{i1}a_{1i} + b_{i2}a_{2i} + … + b_{im}a_{mi}
= a_{1i}a_{1i} + a_{2i}a_{2i} + … + a_{mi}a_{mi} [∵ b_{ij} = a_{ji}]
= a_{1i}² + a_{2i}² + … + a_{mi}².
Now trace(A^tA) = Σ_{i=1}^n c_{ii} = Σ_{i=1}^n (a_{1i}² + a_{2i}² + … + a_{mi}²),
a sum of squares of real numbers. If A = O, then every a_{ij} = 0 and so trace(A^tA) = 0. Conversely, if trace(A^tA) = 0, then each a_{ij}² = 0, i.e., each a_{ij} = 0, so that A = O.
Example 12: Show that the only matrix similar to the identity matrix I is I itself.
Solution: The identity matrix I is invertible and we can write I = I^{-1}II.
Therefore I is similar to I. Further let B be a matrix similar to I. Then there exists an
invertible matrix P such that
B = P −1 IP
⇒ B = P −1 P [ ∵ P −1 I = P −1 ]
⇒ B = I.
Hence the only matrix similar to I is I itself.
Example 13: If T and S are similar linear transformations on a finite dimensional vector
space V ( F ), then det T = det S.
Solution: Since T and S are similar, therefore there exists an invertible linear
transformation P on V such that T = PSP −1 .
Therefore det T = det ( PSP −1 ) = (det P) (det S ) (det P −1 )
= (det P) (det P −1 ) (det S ) = [det ( PP −1 )] (det S )
= (det I ) (det S ) = 1 (det S ) = det S.
Comprehensive Exercise 1
(ii) Find the transition matrix P from the ordered basis B to the ordered
basis B ′ = {α1 , α 2 } where α1 = (1, 1), α 2 = (− 1, 0). Hence find the
matrix of T relative to the ordered basis B′.
6. Let T be the linear operator on R 2 defined by T (a, b) = (a, 0). Write the
matrix of T in the standard ordered basis B = {(1, 0), (0, 1)}.If
B′ = {(1, 1), (2, 1)} is another ordered basis for R², find the transition matrix P
from the basis B to the basis B ′. Hence find the matrix of T relative to the
basis B ′.
7. The matrix of a linear transformation T on V_3(C) relative to the basis
B = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is
[ 0   1   1]
[ 1   0  −1].
[−1  −1   0]
What is the matrix of T relative to the basis
B′ = {(0, 1, − 1), (1, − 1, 1), (− 1, 0, 1)}?
8. Find the matrix relative to the basis
α_1 = (2/3, 2/3, −1/3), α_2 = (1/3, −2/3, −2/3), α_3 = (2/3, −1/3, 2/3)
of R³, of the linear transformation T : R³ → R³ whose matrix relative to the standard ordered basis is
[2  0  0]
[0  4  0].
[0  0  3]
9. Find the matrix representation of the linear mappings relative to the usual
bases for R n .
(i) T : R 3 → R 2 defined by T ( x, y, z ) = (2 x − 4 y + 9z , 5 x + 3 y − 2z ).
(ii) T : R → R 2 defined by T ( x) = (3 x, 5 x).
(iii) T : R 3 → R 3 defined by T ( x, y, z ) = ( x, y, 0).
(iv) T : R 3 → R 3 defined by T ( x, y, z ) = (z , y + z , x + y + z ).
10. Let B = {(1, 0),(0, 1)} and B′ = {(1, 2), (2, 3)} be any two bases of R 2 and
T ( x, y) = (2 x − 3 y, x + y).
(i) Find the transition matrices P and Q from B to B′ and from B′ to B
respectively.
(ii) Verify that [α]_B = P[α]_{B′} ∀ α ∈ R².
(iii) Verify that P −1 [T ] B P = [T ] B ′ .
11. Let V be the space of all 2 × 2 matrices over the field F and let P be a fixed 2 × 2
matrix over F. Let T be the linear operator on V defined by
T(A) = PA ∀ A ∈ V. Prove that trace(T) = 2 trace(P).
12. Let V be the space of 2 × 2 matrices over R and let M = [1 2; 3 4].
Let T be the linear operator on V defined by T ( A) = MA. Find the trace
of T.
13. Find the trace of the operator T on R 3 defined by
T ( x, y, z ) = (a1 x + a2 y + a3 z , b1 x + b2 y + b3 z , c1 x + c 2 y + c 3 z ).
14. Show that the only matrix similar to the zero matrix is the zero matrix itself.
Answers 1
3 0 13
2 1 2 2 4
0 1 1 1 5
1. 2. 1
2 4
−2 2 0 1
0 1 −
2
1, 0, 1 , 3 3 − 2
3. 4.
2 2 1 2
5. (i) [4 −2; 2 1]  (ii) P = [1 −1; 1 0]; [T]_{B′} = [3 −2; 1 2]
6. [T]_B = [1 0; 0 0]; P = [1 2; 1 1]; [T]_{B′} = [−1 −2; 1 2]
7. [T]_{B′} = [1 0 −2; 0 0 0; 0 0 −1]
8. [3 −2/3 −2/3; −2/3 10/3 0; −2/3 0 8/3]
9. (i) [2 −4 9; 5 3 −2]  (ii) [3; 5]
(iii) [1 0 0; 0 1 0; 0 0 0]  (iv) [0 0 1; 0 1 1; 1 1 1]
10. (i) P = [1 2; 2 3]; Q = [−3 2; 2 −1]
12. trace (T ) = 10
13. trace (T ) = a1 + b2 + c 3
(a) [1 0; 0 0] (b) [0 1; 0 0]
(c) [0 0; 1 0] (d) [0 0; 0 1].
2. The transition matrix P from the standard ordered basis to the ordered basis
{(1, 1), (− 1, 0)} is
(a) [1 1; −1 0] (b) [1 −1; 1 0]
(c) [0 1; 1 0] (d) [1 0; 0 1].
3. Let V be the vector space of 2 × 2 matrices over R and let
M = [1 2; 3 4].
Let T be the linear operator on V defined by T ( A) = MA. Then the trace of T
is
(a) 5 (b) 10
(c) 0 (d) None of these.
Fill in the Blank(s)
Fill in the blanks ‘‘……’’ so that the following statements are complete and
correct.
1. If T is a linear operator on R² defined by
T(x, y) = (x − y, y), then T²(x, y) = …… .
2. Let A be a square matrix of order n over a field F. The sum of the elements of A
lying along the principal diagonal is called the …… of A.
3. Let T and S be similar linear operators on the finite dimensional vector space
V ( F ), then det (T ) …… det (S ).
True or False
Write ‘T’ for true and ‘F’ for false statement.
1. The relation of similarity is an equivalence relation in the set of all linear
transformations on a vector space V ( F ).
2. Similar matrices have the same trace.
Answers
True or False
1. T 2. T
¨
5
Linear Functionals
Illustration 1: Let Vn ( F ) be the vector space of ordered n-tuples of the elements of the
field F.
Let x_1, x_2, …, x_n be n fixed elements of F. If
α = (a_1, a_2, …, a_n) ∈ V_n(F),
define f(α) = x_1a_1 + x_2a_2 + … + x_na_n. Then f is a linear functional on V_n(F).
Thus the trace of A is the scalar obtained by adding the elements of A lying along
the principal diagonal.
The trace function is a linear functional on V because if
a, b ∈ F and A = [a ij ] n × n , B = [b ij ] n × n ∈ V , then
tr(aA + bB) = tr(a[a_{ij}]_{n×n} + b[b_{ij}]_{n×n}) = tr([a a_{ij} + b b_{ij}]_{n×n})
= Σ_{i=1}^n (a a_{ii} + b b_{ii}) = a Σ_{i=1}^n a_{ii} + b Σ_{i=1}^n b_{ii} = a(tr A) + b(tr B).
Hence the trace function is a linear functional on V.
(0̂ + f)(α) = 0̂(α) + f(α) [by (1)]
= 0 + f(α) = f(α).
∴ 0̂ + f = f ∀ f ∈ V′.
∴ 0̂ is the additive identity in V′.
Existence of additive inverse of each element in V ′.
Let f ∈ V′. Let us define −f as follows:
(−f)(α) = −f(α) ∀ α ∈ V.
Then −f ∈ V′. If α ∈ V, we have
(−f + f)(α) = (−f)(α) + f(α) [by (1)]
= −f(α) + f(α) [by def. of −f]
= 0 = 0̂(α). [by def. of 0̂]
∴ −f + f = 0̂ for every f ∈ V′.
Thus each element in V ′ possesses additive inverse. Therefore V ′ is an abelian
group with respect to addition defined in it.
Further we make the following observations :
(i) Let c ∈ F and f1 , f 2 ∈ V ′ . If α is any element in V, we have
[c ( f1 + f 2 )] (α) = c [( f1 + f 2 ) (α)] [by (2)]
= c [ f1 (α) + f 2 (α)] [by (1)]
= cf1 (α) + cf 2 (α)
= (cf1 ) (α) + (cf 2 ) (α) [by (2)]
= (cf1 + cf 2 ) (α) [by (1)]
∴ c ( f1 + f 2 ) = cf1 + cf 2 .
(ii) Let a, b ∈ F and f ∈ V ′ . If α ∈ V, we have
[(a + b) f ] (α) = (a + b) f (α) [by (2)]
= af (α) + bf (α) [∵ F is a field]
= (af ) (α) + (bf ) (α) [by (2)]
= (af + bf ) (α) [by (1)]
∴ (a + b) f = af + bf .
(iii) Let a, b ∈ F and f ∈ V ′ . If α ∈ V, we have
[(ab) f ] (α) = (ab) f (α) [by (2)]
= a [bf (α)] [∵ multiplication in F is associative]
= a [(bf ) (α)] [by (2)]
= [a (bf )] (α) [by (2)]
∴ (ab) f = a (bf ).
⇒ c j = 0, j = 1, 2 , … , n
⇒ f1 , f 2 , … , f n are linearly independent.
In the second place, we shall show that the linear span of B′ is equal to V ′ .
Let f be any element of V ′ . The linear functional f will be completely
determined if we define it on a basis for V. So let
f (α i ) = a i , i = 1, 2, … , n. …(2)
We shall show that f = a_1f_1 + … + a_nf_n = Σ_{i=1}^n a_i f_i.
i =1
We know that two linear functionals on V are equal if they agree on a basis of V. So
let α j ∈ B where j = 1, … , n. Then
(Σ_{i=1}^n a_i f_i)(α_j) = Σ_{i=1}^n a_i f_i(α_j) = Σ_{i=1}^n a_i δ_{ij} [from (1)]
= a_j = f(α_j). [from (2)]
Thus f and Σ_{i=1}^n a_i f_i agree on the basis B, and therefore f = Σ_{i=1}^n a_i f_i, i.e., each f in V′ is a linear combination of f_1, …, f_n.
∴ V ′ = linear span of B′ . Hence B′ is a basis for V ′ .
Now dim V ′ = number of distinct elements in B′ = n.
= c_j, j = 1, 2, …, n.
∴ f = Σ_{i=1}^n f(α_i) f_i.
Theorem 4:Let V be an n-dimensional vector space over the field F. If α is a non-zero vector
in V, there exists a linear functional f on V such that f (α) ≠ 0.
Proof: Since α ≠ 0, therefore {α}is a linearly independent subset of V. So it can be
extended to form a basis for V. Thus there exists a basis B = {α1 , … , α n }for V such
that α1 = α.
If B′ = { f1 , … , f n } is the dual basis, then
f1 (α) = f1 (α1 ) = 1 ≠ 0.
Thus there exists linear functional f1 such that
f1 (α) ≠ 0.
Corollary: Let V be an n-dimensional vector space over the field F. If
f(α) = 0 ∀ f ∈ V′, then α = 0.
Proof:Suppose α ≠ 0. Then there is a linear functional f on V such that f (α) ≠ 0.
This contradicts the hypothesis that
f(α) = 0 ∀ f ∈ V′.
Hence we must have α = 0.
Theorem 5: Let V be an n-dimensional vector space over the field F. If α, β are any two
different vectors in V,then there exists a linear functional f on V such that f (α) ≠ f ( β).
Proof: We have α ≠ β ⇒ α − β ≠ 0.
Now α − β is a non-zero vector in V. Therefore by theorem 4, there exists a linear
functional f on V such that
f (α − β) ≠ 0 ⇒ f (α) − f ( β) ≠ 0 ⇒ f (α) ≠ f ( β)
Hence the result.
5.5 Reflexivity
Second dual space (or Bi-dual space): We know that every vector space V
possesses a dual space V ′ consisting of all linear functionals on V. Now V ′ is also a
vector space. Therefore it will also possess a dual space (V ′ )′ consisting of all linear
functionals on V ′. This dual space of V ′ is called the Second dual space or
Bi-dual space of V and for the sake of simplicity we shall denote it by V ′ ′ .
If V is finite-dimensional, then
dim V = dim V ′ = dim V ′ ′
showing that they are isomorphic to each other.
Theorem 1: Let V be a finite dimensional vector space over the field F. If α is any vector in
V, the function Lα on V ′ defined by
L_α(f) = f(α) ∀ f ∈ V′
is a linear functional on V′, i.e., L_α ∈ V′′.
Also the mapping α ↦ L_α is an isomorphism of V onto V′′.
Example 2: Let B = {α_1, α_2, α_3}, where α_1 = (1, 1, 1), α_2 = (1, 1, −1), α_3 = (1, −1, −1), be a basis of V_3(C). If {f_1, f_2, f_3} is the dual basis, and if α = (0, 1, 0), find f_1(α), f_2(α) and f_3(α).
Solution: Let α = a1α1 + a2 α 2 + a3 α 3 . Then
f1 (α) = a1 , f 2 (α) = a2 , f 3 (α) = a3 .
Now α = a1α1 + a2 α 2 + a3 α 3
⇒ (0, 1, 0) = a1 (1, 1, 1) + a2 (1, 1, − 1) + a3 (1, − 1, − 1)
⇒ (0, 1, 0) = (a1 + a2 + a3 , a1 + a2 − a3 , a1 − a2 − a3 )
⇒ a_1 + a_2 + a_3 = 0, a_1 + a_2 − a_3 = 1, a_1 − a_2 − a_3 = 0
⇒ a_1 = 0, a_2 = 1/2, a_3 = −1/2.
∴ f_1(α) = 0, f_2(α) = 1/2, f_3(α) = −1/2.
Example 3: If f is a non-zero linear functional on a vector space V and if x is an arbitrary
scalar, does there necessarily exist a vector α in V such that f (α) = x ?
Solution: f is a non-zero linear functional on V. Therefore there must be some
non-zero vector β in V such that
f ( β) = y where y is a non-zero element of F.
If x is any element of F, then
x = (xy^{-1})y = (xy^{-1})f(β) = f[(xy^{-1})β]. [∵ f is a linear functional]
Thus there exists α = (xy^{-1})β ∈ V such that f(α) = x.
Note: If f is a non-zero linear functional on V ( F ), then f is necessarily a
function from V onto F.
Important Note: In some books f (α) is written as [α, f ].
If possible, let
α = c ′ α 0 + β ′ where c ′ ∈ F and β′ ∈ N.
Then c α 0 + β = c ′ α 0 + β ′ …(1)
⇒ (c − c ′ ) α 0 + ( β − β ′ ) = 0
⇒ f [(c − c ′ ) α 0 + ( β − β ′ )] = f (0)
⇒ (c − c ′ ) f (α 0 ) + f ( β − β ′ ) = 0
⇒ (c − c ′ ) f (α 0 ) = 0
[ ∵ β, β ′ ∈ N ⇒ β − β ′ ∈ N and thus f ( β − β ′ ) = 0]
⇒ (c − c ′ ) = 0 [ ∵ f (α 0 ) is a non-zero element of F ]
c = c′.
Putting c = c ′ in (1), we get cα 0 + β = cα 0 + β ′ ⇒ β = β ′ .
Hence c and β are unique.
Example 6: If f and g are in V ′ such that f (α) = 0 ⇒ g (α) = 0, prove that g = kf for
some k ∈ F.
Solution: It is given that f (α) = 0 ⇒ g (α) = 0. Therefore if α belongs to the null
space of f , then α also belongs to the null space of g. Thus null space of f is a subset
of the null space of g.
(i) If f is zero linear functional, then null space of f is equal to V. Therefore in this
case V is a subset of null space of g. Hence null space of g is equal to V. So g is also
zero linear functional. Hence g = kf for every k ∈ F, and in particular g = kf for some k ∈ F.
(ii) Let f be non-zero linear functional on V. Then there exists a non-zero vector
α 0 ∈ V such that f (α 0 ) = y where y is a non-zero element of F.
Let k = g(α_0)/f(α_0).
If α ∈ V, then we can write
α = c α 0 + β where c ∈ F and β ∈ null space of f .
We have g (α) = g (c α 0 + β) = cg (α 0 ) + g ( β)
= cg (α 0 )
[ ∵ β ∈ null space of f ⇒ f ( β) = 0 and so g ( β) = 0]
Also (kf)(α) = k f(α) = k f(cα_0 + β)
= k[c f(α_0) + f(β)]
= kc f(α_0) [∵ f(β) = 0]
= c · (g(α_0)/f(α_0)) · f(α_0) = c g(α_0).
Thus g(α) = (kf)(α) ∀ α ∈ V.
∴ g = k f.
Comprehensive Exercise 1
Answers 1
5.6 Annihilators
Definition:If V is a vector space over the field F and S is a subset of V,the annihilator of S is
the set S^0 of all linear functionals f on V such that
f(α) = 0 ∀ α ∈ S.
Sometimes A (S ) is also used to denote the annihilator of S.
Thus S^0 = {f ∈ V′ : f(α) = 0 ∀ α ∈ S}.
It should be noted that the annihilator of S is defined for an arbitrary subset S of V; S need not be a subspace of V.
If S = the zero subspace of V, then S^0 = V′.
If S = V, then S^0 = V^0 = the zero subspace of V′.
If V is finite dimensional and S contains a non-zero vector, then S^0 ≠ V′. For if 0 ≠ α ∈ S, then there is a linear functional f on V such that f(α) ≠ 0. Thus there is f ∈ V′ such that f ∉ S^0, and therefore S^0 ≠ V′.
Theorem 1: If S is any subset of a vector space V ( F ), then S 0 is a subspace of V ′ .
Proof: We have 0̂(α) = 0 ∀ α ∈ S, so 0̂ ∈ S^0 and S^0 is non-empty.
Let f, g ∈ S^0. Then f(α) = 0 ∀ α ∈ S, and g(α) = 0 ∀ α ∈ S.
If a, b ∈ F, then for every α ∈ S,
(af + bg)(α) = (af)(α) + (bg)(α) = a f(α) + b g(α) = a0 + b0 = 0.
∴ af + bg ∈ S^0.
Thus a, b ∈ F and f, g ∈ S^0 ⇒ af + bg ∈ S^0.
∴ S^0 is a subspace of V′.
Dimension of annihilator:
Theorem 2: Let V be a finite dimensional vector space over the field F, and let W be a
subspace of V. Then
dim W + dim W 0 = dim V .
Proof: If W is zero subspace of V, then W 0 = V ′ .
∴ dim W 0 = dim V ′ = dim V .
Also in this case dim W = 0. Hence the result.
Similarly the result is obvious when W = V .
Let us now suppose that W is a proper subspace of V.
Let dim V = n, and dim W = m where 0 < m < n.
Let B1 = {α1 , … , α m }be a basis for W. Since B1 is a linearly independent subset of V
also, therefore it can be extended to form a basis for V. Let
B = {α1 , … , α m , α m + 1 , … , α n } be a basis for V.
Let B′ = { f1 , … , f m , f m + 1 , … , f n }be the dual basis of B. Then B′ is a basis for V ′
such that f i (α j ) = δ ij .
We claim that S = { f m + 1 , … , f n } is a basis for W 0 .
Since B′ is a basis for V′, each f ∈ V′ can be written as f = Σ_{i=1}^n x_i f_i. …(1)
Now f ∈ W^0 ⇒ f(α) = 0 ∀ α ∈ W
⇒ f(α_j) = 0 for each j = 1, …, m [∵ α_1, …, α_m are in W]
⇒ Σ_{i=1}^n x_i f_i(α_j) = 0 [From (1)]
⇒ Σ_{i=1}^n x_i δ_{ij} = 0
⇒ x_j = 0 for each j = 1, …, m.
Putting x1 = 0, x2 = 0, … , x m = 0 in (1), we get
f = xm + 1 f m + 1 + … + xn f n
= a linear combination of the elements of S.
∴ f ∈ L(S).
Thus f ∈ W^0 ⇒ f ∈ L(S). ∴ W^0 ⊆ L(S).
Now we shall show that L(S) ⊆ W^0.
Let g ∈ L(S). Then g is a linear combination of f_{m+1}, …, f_n.
Let g = Σ_{k=m+1}^n y_k f_k. …(2)
If α is any vector in W, then α = Σ_{j=1}^m c_j α_j. …(3)
We have g(α) = g(Σ_{j=1}^m c_j α_j) [From (3)]
= Σ_{j=1}^m c_j g(α_j) [∵ g is a linear functional]
= Σ_{j=1}^m c_j Σ_{k=m+1}^n y_k f_k(α_j) [From (2)]
= Σ_{j=1}^m c_j Σ_{k=m+1}^n y_k δ_{kj}
= Σ_{j=1}^m c_j · 0 = 0. [∵ δ_{kj} = 0 if k ≠ j, which is so for each j = 1, …, m and each k = m+1, …, n]
Thus g(α) = 0 ∀ α ∈ W, i.e., g ∈ W^0. ∴ L(S) ⊆ W^0.
Hence W^0 = L(S), and since S is a linearly independent set spanning W^0, S is a basis for W^0. Consequently
dim W^0 = n − m, so that dim W + dim W^0 = m + (n − m) = n = dim V.
Example 7: If S1 and S2 are two subsets of a vector space V such that S1 ⊆ S2 , then show
that S2 0 ⊆ S10 .
Solution: Let f ∈ S_2^0. Then f(α) = 0 ∀ α ∈ S_2
⇒ f(α) = 0 ∀ α ∈ S_1 [∵ S_1 ⊆ S_2]
⇒ f ∈ S_1^0.
∴ S_2^0 ⊆ S_1^0.
Example 8: Let V be a vector space over the field F. If S is any subset of V, then show that
S^0 = [L(S)]^0.
Solution: We know that S ⊆ L (S ).
∴ [ L (S )]0 ⊆ S 0 . …(1)
Now let f ∈ S^0. Then f(α) = 0 ∀ α ∈ S.
If β is any element of L(S), then β = Σ_{i=1}^n x_i α_i, where each α_i ∈ S.
We have f(β) = Σ_{i=1}^n x_i f(α_i) = 0, since each f(α_i) = 0.
Thus f(β) = 0 ∀ β ∈ L(S).
∴ f ∈ [L(S)]^0. Therefore S^0 ⊆ [L(S)]^0. …(2)
From (1) and (2), we conclude that S^0 = [L(S)]^0.
Example 9: Let V be a finite-dimensional vector space over the field F. If S is any subset of
V, then S^{00} = L(S).
Example 10: If W_1 and W_2 are subspaces of a finite-dimensional vector space V, then (a) (W_1 + W_2)^0 = W_1^0 ∩ W_2^0 and (b) (W_1 ∩ W_2)^0 = W_1^0 + W_2^0.
Solution: (a) Let f ∈ W_1^0 ∩ W_2^0. If α is any vector in W_1 + W_2, then we can write
α = α_1 + α_2 where α_1 ∈ W_1, α_2 ∈ W_2.
We have f (α) = f (α1 + α 2 ) = f (α1 ) + f (α 2 )
=0 +0 [ ∵ α1 ∈ W1 and f ∈ W10 ⇒ f (α1 ) = 0
and similarly f (α 2 ) = 0]
= 0.
Thus f(α) = 0 ∀ α ∈ W_1 + W_2.
∴ f ∈ (W_1 + W_2)^0.
∴ W_1^0 ∩ W_2^0 ⊆ (W_1 + W_2)^0. …(1)
Now we shall prove that
(W_1 + W_2)^0 ⊆ W_1^0 ∩ W_2^0.
We have W_1 ⊆ W_1 + W_2. ∴ (W_1 + W_2)^0 ⊆ W_1^0. …(2)
Similarly, W_2 ⊆ W_1 + W_2. ∴ (W_1 + W_2)^0 ⊆ W_2^0. …(3)
From (2) and (3), we have (W_1 + W_2)^0 ⊆ W_1^0 ∩ W_2^0. …(4)
From (1) and (4), we have (W_1 + W_2)^0 = W_1^0 ∩ W_2^0.
(b) Let us use the result (a) for the vector space V′ in place of the vector space V. Thus replacing W_1 by W_1^0 and W_2 by W_2^0 in (a), we get
(W_1^0 + W_2^0)^0 = W_1^{00} ∩ W_2^{00}
⇒ (W_1^0 + W_2^0)^0 = W_1 ∩ W_2 [∵ W_1^{00} = W_1 etc.]
⇒ (W_1^0 + W_2^0)^{00} = (W_1 ∩ W_2)^0
⇒ W_1^0 + W_2^0 = (W_1 ∩ W_2)^0.
Example 11: If W_1 and W_2 are subspaces of a vector space V, and if V = W_1 ⊕ W_2, then V′ = W_1^0 ⊕ W_2^0.
Solution: We have to show that
(i) W_1^0 ∩ W_2^0 = {0̂},
and (ii) V′ = W_1^0 + W_2^0, i.e., each f ∈ V′ can be written as f_1 + f_2, where f_1 ∈ W_1^0, f_2 ∈ W_2^0.
(i) Let f ∈ W_1^0 ∩ W_2^0. If α is any vector in V, then, V being the direct sum of W_1 and W_2, we can write
α = α_1 + α_2 where α_1 ∈ W_1, α_2 ∈ W_2.
We have f(α) = f(α_1 + α_2)
= f(α_1) + f(α_2) [∵ f is a linear functional]
= 0 + 0 [∵ f ∈ W_1^0 and α_1 ∈ W_1 ⇒ f(α_1) = 0, and similarly f(α_2) = 0]
= 0.
Thus f(α) = 0 ∀ α ∈ V. ∴ f = 0̂.
∴ W_1^0 ∩ W_2^0 = {0̂}.
(ii) Now to prove that V ′ = W10 + W2 0 .
Let f ∈ V ′ .
If α ∈ V, then α can be uniquely written as
α = α1 + α 2 where α1 ∈ W1 , α 2 ∈ W2 .
For each f , let us define two functions f1 and f 2 from V into F such that
f1 (α) = f1 (α1 + α 2 ) = f (α 2 ) …(1)
and f 2 (α) = f 2 (α1 + α 2 ) = f (α1 ). …(2)
First we shall show that f1 is a linear functional on V. Let a, b ∈ F and
α = α1 + α 2 , β = β1 + β 2 ∈ V where α1 , β1 ∈ W1 and α 2 , β 2 ∈ W2 . Then
f1 (aα + bβ) = f1 [a (α1 + α 2 ) + b ( β1 + β 2 )]
= f1 [(aα1 + bβ1 ) + (aα 2 + bβ 2 )]
= f (aα 2 + bβ 2 )
[ ∵ aα1 + bβ1 ∈ W1 , aα 2 + bβ 2 ∈ W2 ]
= af (α 2 ) + bf ( β 2 ) [ ∵ f is linear functional]
= af1 (α) + bf1 ( β) [From (1)]
∴ f1 is linear functional on V i. e., f1 ∈ V ′ .
Now we shall show that f1 ∈ W10 .
Let α1 be any vector in W1 . Then α1 is also in V. We can write
α1 = α1 + 0, where α1 ∈ W1 , 0 ∈ W2 .
∴ from (1), we have
f1 (α1 ) = f1 (α1 + 0) = f (0) = 0.
Thus f1 (α1 ) = 0 V α1 ∈ W1 .
∴ f1 ∈ W10 .
Similarly we can show that f 2 is a linear functional on V and f 2 ∈ W2 0 .
Now we claim that f = f1 + f 2 .
If α = α_1 + α_2 is any vector in V, then (f_1 + f_2)(α) = f_1(α) + f_2(α) = f(α_2) + f(α_1) = f(α_1 + α_2) = f(α). ∴ f = f_1 + f_2, where f_1 ∈ W_1^0 and f_2 ∈ W_2^0. Hence V′ = W_1^0 + W_2^0, and therefore V′ = W_1^0 ⊕ W_2^0.
It should be noted that TW is quite a different object from T because the domain
of TW is W while the domain of T is V.
Invariance can be considered for several linear transformations also. Thus W is
invariant under a set of linear transformations if it is invariant under each
member of the set.
Matrix interpretation of invariance: Let V be a finite dimensional vector
space over the field F and let T be a linear operator on V. Suppose V has a
subspace W which is invariant under T. Then we can choose suitable ordered basis
B for V so that the matrix of T with respect to B takes some particular simple
form.
Let B1 = {α1 , … , α m} be an ordered basis for W where dim W = m. We can extend
B1 to form a basis for V. Let
B = {α1 , … , α m , α m + 1 , … , α n}
be an ordered basis for V where dim V = n.
Let A = [a ij ] n × n be the matrix of T with respect to the ordered basis B. Then
T(α_j) = Σ_{i=1}^n a_{ij} α_i, j = 1, 2, …, n. …(1)
Since W is invariant under T, for 1 ≤ j ≤ m the vector T(α_j) lies in W and is therefore a linear combination of α_1, …, α_m alone:
T(α_j) = Σ_{i=1}^m a_{ij} α_i, j = 1, 2, …, m. …(2)
In other words, in the relation (1) the scalars a_{ij} are all zero if 1 ≤ j ≤ m and m + 1 ≤ i ≤ n.
Therefore the matrix A takes the simple form
     [M   C]
A = [O   D],
where M is an m × m matrix, C is an m × (n − m) matrix, O is the null matrix of the type (n − m) × m and D is an (n − m) × (n − m) matrix.
From the relation (2) it is obvious that the matrix M is nothing but the matrix of the
induced operator TW on W relative to the ordered basis B1 for W.
Reducibility: Definition: Let W1 and W2 be two subspaces of a vector space V and
let T be a linear operator on V. Then T is said to be reduced by the pair (W1 , W2 ) if
(i) V = W1 ⊕ W2 ,
(ii) both W1 and W2 are invariant under T.
It should be noted that if a subspace W1 of V is invariant under T, then there are
many ways of finding a subspace W2 of V such that V = W1 ⊕ W2 , but it is not
necessary that some W2 will also be invariant under T. In other words among the
collection of all subspaces invariant under T we may not be able to select any two
other than V and the zero subspace with the property that V is their direct sum.
The definition of reducibility can be extended to more than two subspaces. Thus let
W1 , … , Wk be k subspaces of a vector space V and let T be a linear operator on V. Then T is
said to be reduced by (W1 , … , Wk ) if
(i) V is the direct sum of the subspaces W1 , … , Wk ,
and (ii) Each of the subspaces Wi is invariant under T.
Direct sum of linear operators: Definition:
Suppose T is a linear operator on the vector space V. Let
V = W1 ⊕ … ⊕ Wk
be a direct sum decomposition of V in which each subspace Wi is invariant under T.
Then T induces a linear operator Ti on each Wi by restricting its domain from V to
Wi . If α ∈ V, then there exist unique vectors α1 , … , α k with α i in Wi such that
α = α1 + … + α k
⇒ T (α) = T (α1 + … + α k )
⇒ T (α) = T (α1 ) + … + T (α k ) [ ∵ T is linear]
⇒ T (α) = T1 (α1 ) + … + Tk (α k ) [∵ if α i ∈ Wi , then by def. of
Ti , we have T (α i ) = Ti (α i )]
Thus we can find the action of T on V with the help of independent action of the
operators Ti on the subspaces Wi . In such situation we say that the operator T is the
direct sum of the operators T1 ,… , Tk . It should be noted carefully that T is a
linear operator on V, while the Ti are linear operators on the various
subspaces Wi .
[T − cI]_B = [0 −1; 1 0] − c[1 0; 0 1] = [−c −1; 1 −c].
Now det [−c −1; 1 −c] = c² + 1 ≠ 0 for any real number c.
∴ [−c −1; 1 −c], i.e., [T − cI]_B, is invertible.
Consequently T − cI is invertible, which contradicts the result that T − cI is not invertible.
Hence no proper subspace W of V_2(R) can be invariant under T.
Let g ∈ N(T′), the null space of T′. Then T′g = 0̂, where 0̂ is the zero element of U′, i.e., 0̂ is the zero functional on U. Therefore from (1), we get
[Tα, g] = [α, 0̂] ∀ α ∈ U
⇒ [Tα, g] = 0 ∀ α ∈ U [∵ 0̂(α) = 0 ∀ α ∈ U]
⇒ g(β) = 0 ∀ β ∈ R(T) [∵ R(T) = {β ∈ V : β = T(α) for some α ∈ U}]
⇒ g ∈ [R(T)]^0.
∴ N(T′) ⊆ [R(T)]^0.
Now let g ∈ [R(T)]^0, which is a subspace of V′. Then
g(β) = 0 ∀ β ∈ R(T)
⇒ [Tα, g] = 0 ∀ α ∈ U [∵ ∀ α ∈ U, Tα ∈ R(T)]
⇒ [α, T′g] = 0 ∀ α ∈ U [From (1)]
⇒ T′g = 0̂ (the zero functional on U) ⇒ g ∈ N(T′).
∴ [R(T)]^0 ⊆ N(T′).
Hence [R(T)]^0 = N(T′).
(ii) Suppose U and V are finite dimensional. Let dim U = n, dim V = m. Let
r = ρ (T ) = the dimension of R (T ).
Now R (T ) is a subspace of V. Therefore
dim R(T) + dim [R(T)]^0 = dim V. [See Th. 2 of article 5.6]
∴ dim [R(T)]^0 = dim V − dim R(T) = dim V − r = m − r.
By part (i) of this theorem, [R(T)]^0 = N(T′).
∴ dim N(T′) = m − r ⇒ nullity of T′ = m − r.
But T′ is a linear transformation from V′ into U′.
∴ ρ(T′) + ν(T′) = dim V′
or ρ(T′) = dim V′ − ν(T′) = dim V − nullity of T′ = m − (m − r) = r.
∴ ρ (T ) = ρ (T ′ ) = r.
(iii) T ′ is a linear transformation from V ′ into U ′. Therefore R (T ′ ) is a subspace
of U ′. Also [N (T )]0 is a subspace of U ′ because N (T ) is a subspace of U. First
we shall show that R (T ′ ) ⊆ [N (T )]0 .
Let f ∈ R (T ′ ).
Then f = T ′ g for some g ∈ V ′ .
If α is any vector in N (T ), then Tα = 0.
We have
[α , f ] = [α , T ′ g] = [Tα , g] = [0, g] = 0.
Thus f(α) = 0 ∀ α ∈ N(T). Therefore f ∈ [N(T)]^0.
∴ R(T′) ⊆ [N(T)]^0
⇒ R(T′) is a subspace of [N(T)]^0.
Now dim N(T) + dim [N(T)]^0 = dim U. [Theorem 2 of article 5.6]
∴ dim [N(T)]^0 = dim U − dim N(T)
= dim R(T) [∵ dim U = dim R(T) + dim N(T)]
= ρ(T) = ρ(T′) = dim R(T′).
Thus dim R(T′) = dim [N(T)]^0 and R(T′) ⊆ [N(T)]^0.
∴ R(T′) = [N(T)]^0.
Note: If T is a linear transformation on a vector space V, then in the proof of the
above theorem we should replace U by V and m by n.
Theorem 3 : Let U and V be finite-dimensional vector spaces over the field F. Let B be an
ordered basis for U with dual basis B ′ , and let B1 be an ordered basis for V with dual basis
B1 ′. Let T be a linear transformation from U into V. Let A be the matrix of T relative to B, B1
and let C be the matrix of T ′ relative to B1 ′ , B ′. Then C = A′ i. e., the matrix C is the
transpose of the matrix A.
Proof : Let dim U = n, dim V = m .
Let B = {α_1, …, α_n}, B′ = {f_1, …, f_n},
B_1 = {β_1, …, β_m}, B_1′ = {g_1, …, g_m}.
Now T is a linear transformation from U into V and T ′ is that from V ′ into U ′ .The
matrix A of T relative to B, B1 will be of the type m × n. If A = [a ij ] m × n , then by
definition
T(α_j), or simply Tα_j, = Σ_{i=1}^m a_{ij} β_i, j = 1, 2, …, n. …(1)
The matrix C of T ′ relative to B1 ′ , B′ will be of the type n × m. If C = [c ji ] n × m,
then by definition
T′(g_i), or simply T′g_i, = Σ_{j=1}^n c_{ji} f_j, i = 1, 2, …, m. …(2)
Now T′g_i is a linear functional on U, and every f ∈ U′ can be written as
f = Σ_{j=1}^n f(α_j) f_j. [See theorem 3 of Chapter 5, article 5.4]
∴ T′g_i = Σ_{j=1}^n (T′g_i)(α_j) f_j. …(3)
Also (T′g_i)(α_j) = g_i(Tα_j) = g_i(Σ_{k=1}^m a_{kj} β_k) = Σ_{k=1}^m a_{kj} g_i(β_k) = Σ_{k=1}^m a_{kj} δ_{ik}
= a_{ij}. [On summing with respect to k and remembering that δ_{ik} = 1 when k = i and δ_{ik} = 0 when k ≠ i]
Putting this value of (T′g_i)(α_j) in (3), we get
T′g_i = Σ_{j=1}^n a_{ij} f_j. …(4)
Since f1 , ... , f n are linearly independent, therefore from (2) and (4), we get
c ji = a ij .
Hence by definition of transpose of a matrix, we have
C = A′ .
Note: If T is a linear transformation on a finite-dimensional vector space V, then
in the above theorem we put U = V and m = n. Also according to our convention
we take B1 = B. The students should write the complete proof themselves.
Theorem 4 : Let A be any m × n matrix over the field F. Then the row rank of A is equal to
the column rank of A.
Proof : Let A = [ a ij ] m × n . Let
Proof: (i) If 0̂ is the zero transformation on V, then by the definition of the adjoint of a linear transformation, we have
[α, 0̂′g] = [0̂α, g] for every g in V′ and α in V.
[Here 0̂′ denotes the adjoint of 0̂, and on the right 0̂α = 0.]
Thus we have
[α, 0̂′g] = [α, 0̂g] for all g in V′ and α in V. [Here 0̂ is the zero transformation on V′]
∴ 0̂′ = 0̂.
(ii) If I is the identity transformation on V, then by the definition of the adjoint of
a linear transformation, we have
[α , I ′ g] = [Iα, g] for every g in V ′ and α in V
= [α , g] for every g in V ′ and α in V [∵ I (α) = α V α ∈ V ]
[Note that T′g ∈ V′]
= [α, (aT′)g] [by def. of scalar multiplication of T′ by a]
∴ (aT)′ = aT′.
(vi) Suppose T is an invertible linear operator on V. If T^{-1} is the inverse of T, we have
T^{-1}T = I = TT^{-1}
⇒ (T^{-1}T)′ = I′ = (TT^{-1})′
⇒ T′(T^{-1})′ = I = (T^{-1})′T′ [Using results (ii) and (iv)]
∴ T′ is invertible and (T′)^{-1} = (T^{-1})′.
(vii) V is a finite dimensional vector space. T is a linear operator on V , T ′ is a linear
operator on V ′ and (T ′ )′ or T ′ ′ is a linear operator on V ′ ′ . We have identified
V ′ ′ with V through natural isomorphism α ↔ Lα where α ∈ V and Lα ∈ V ′ ′ .
Here Lα is a linear functional on V ′ and is such that
Lα ( g) = g (α) ∀ g ∈ V ′ . …(1)
Through this natural isomorphism we shall take α = Lα and thus T ′ ′ will be
regarded as a linear operator on V.
Now T ′ is a linear operator on V ′ .Therefore by the definition of adjoint, we have
[ g, T ′ ′ Lα ] = [ g, (T ′ )′ Lα ] = [T ′ g, Lα ] for every g ∈ V ′ and α ∈ V.
Now T ′ g is an element of V ′ . Therefore from (1), we have
[T ′ g, Lα ] = [α , T ′ g]
[Note that from (1), Lα ( T ′ g ) = ( T ′ g ) α]
= [Tα , g]. [by def. of adjoint]
Again T ′ ′ Lα is an element of V ′ ′ . Therefore from (1), we have
[ g, T ′ ′ Lα ] = [β , g] where β ∈ V and β ↔ T ′ ′ Lα
under natural isomorphism
= [T ′ ′ α , g] [∵ β = T ′ ′ Lα = T ′ ′ α when we regard
T ′ ′ as linear operator on V in place of V ′ ′ ]
Thus, we have
[Tα , g] = [T ′ ′ α , g] for every g in V ′ and α in V
⇒ g (Tα) = g (T ′ ′ α) for every g in V ′ and α in V
⇒ g (Tα − T ′ ′ α) = 0 for every g in V ′ and α in V
⇒ Tα − T ′ ′ α = 0 for every α in V
⇒ (T − T ′ ′ ) α = 0 for every α in V
⇒ T − T′′ = 0̂
⇒ T = T′′.
Example 17 : Let V be a finite dimensional vector space over the field F. Show that
T → T ′ is an isomorphism of L (V , V ) onto L (V ′ , V ′ ).
Solution : Let dim V = n.
Then dim V ′ = n.
Comprehensive Exercise 2
1. Let V be a finite dimensional vector space over the field F. If W1 and W2 are
subspaces of V, then W_1^0 = W_2^0 iff W_1 = W_2.
2. If W1 and W2 are subspaces of a finite-dimensional vector space V and if
V = W1 ⊕ W2 , then
(i) W_1′ is isomorphic to W_2^0. (ii) W_2′ is isomorphic to W_1^0.
3. Let W be the subspace of R³ spanned by (1, 1, 0) and (0, 1, 1). Find a basis of the
annihilator of W.
annihilator of W.
4. Let W be the subspace of R 4 spanned by (1, 2, − 3, 4), (1, 3, − 2, 6) and
(1, 4, − 1, 8). Find a basis of the annihilator of W.
5. If the set S = {Wi } is the collection of subspaces of a vector space V which are
invariant under T, then show that W = ∩i Wi is also invariant under T.
6. Prove that the subspace spanned by two subspaces each of which is invariant
under some linear operator T, is itself invariant under T.
7. Let V be a vector space over the field F, and let T be a linear operator on V and
let f (t) be a polynomial in the indeterminate t over the field F. If W is the null
space of the operator f (T ), then W is invariant under T.
8. Let T be the linear operator on R2 , the matrix of which in the standard
ordered basis is
     [1  −1]
A = [2   2].
(a) Prove that the only subspaces of R2 invariant under T are R2 and the
zero subspace.
(b) If U is the linear operator on C 2 , the matrix of which in the standard
ordered basis is A, show that U has one dimensional invariant
subspaces.
9. Let T be the linear operator on R2 , the matrix of which in the standard
ordered basis is
     [2  1]
     [0  2].
Answers 2
3. Basis is {f}, where f(x, y, z) = x − y + z.
4. Basis is {f_1, f_2}, where f_1(x, y, z, w) = 5x − y + z and f_2(x, y, z, w) = 2y − w.
6. If S is any subset of a vector space V (F), then S^0 is a subspace of
(a) V (b) V ′
(c) V ′ ′ (d) None of these.
7. Let V be a finite dimensional vector space over the field F. If S is any subset of
V, then S^{00} =
(a) S (b) L(S)
(c) [L(S)]^0 (d) None of these.
True or False
Write ‘T’ for true and ‘F’ for false statement.
1. Let V ( F ) be a vector space. A linear functional on V is a vector valued
function.
2. Every finite dimensional vector space is not reflexive.
3. If V (F) is a vector space and f : V → F is the mapping defined by f(α) = 0 ∀ α ∈ V, then f is a
linear functional.
4. If V is finite-dimensional and W is a subspace of V, then W′ is isomorphic to
V′/W^0.
5. If T is a linear transformation from a vector space U into a vector space V then
ρ (T ′ ) ≠ ρ (T ).
Answers
True or False
1. F 2. F 3. T 4. T 5. F
¨
6
Characteristic Values and Annihilating Polynomials
Definition: Let T be a linear operator on an n-dimensional vector space V over the field F.
Then a scalar c ∈ F is called a characteristic value of T if there is a non-zero vector α
in V such that Tα = cα. Also if c is a characteristic value of T, then any non-zero vector α in V
such that Tα = cα is called a characteristic vector of T belonging to the
characteristic value c.
Characteristic values are sometimes also called proper values, eigen values, or
spectral values. Similarly characteristic vectors are called proper vectors, eigen
vectors, or spectral vectors.
Proof: Suppose A and B are similar matrices. Then there exists an invertible
matrix P such that
B = P −1 AP.
We have B − xI = P^{-1}AP − xI = P^{-1}AP − P^{-1}(xI)P [∵ P^{-1}(xI)P = xP^{-1}IP = xI]
= P^{-1}(A − xI)P.
∴ det(B − xI) = det P^{-1} · det(A − xI) · det P
= det P^{-1} · det P · det(A − xI) = det(P^{-1}P) · det(A − xI)
= det I · det(A − xI) = 1 · det(A − xI) = det(A − xI).
Thus the matrices A and B have the same characteristic polynomial and
consequently they will have the same characteristic values.
If c is an eigenvalue of A and X is a corresponding eigenvector, then AX = cX , and
hence
B ( P −1 X ) = ( P −1 AP) P −1 X = P −1 AX = P −1 (cX ) = c ( P −1 X ).
∴ P −1 X is an eigenvector of B corresponding to c. This completes the proof of the
theorem.
Now suppose that T is a linear operator on an n-dimensional vector space V. If B_1, B_2 are any two ordered bases for V, then we know that the matrices [T]_{B_1} and [T]_{B_2} are similar. Also similar matrices have the same characteristic polynomial. This enables us to define sensibly the characteristic polynomial of T.
The characteristic values of T are the roots of the equation
det(T − xI) = 0. …(1)
The equation (1) is of degree n in x. If the field F is algebraically closed, i.e., if every polynomial equation over F possesses a root, then T will definitely have at least one characteristic value. If the field F is not algebraically closed, then T may or may not have a characteristic value according as the equation (1) has or has not a root in F.
Since the equation (1) is of degree n in x, therefore if T has a characteristic value
then it cannot have more than n distinct characteristic values. The field of complex
numbers is algebraically closed. By fundamental theorem of algebra we know that
every polynomial equation over the field of complex numbers is solvable.
Therefore if F is the field of complex numbers then T will definitely have at least
one characteristic value. The field of real numbers is not algebraically closed. If F is
the field of real numbers, then T may or may not have a characteristic value.
Illustration: Consider the linear operator T on V_2(R) which is represented in the standard ordered basis by the matrix
     [0  −1]
A = [1   0].
The characteristic polynomial for T (or for A) is
det(A − xI) = |−x  −1; 1  −x| = x² + 1.
The polynomial equation x² + 1 = 0 has no roots in R. Therefore T has no characteristic values.
However if T is a linear operator on V_2(C), then the characteristic equation of T has two distinct roots i and −i in C. In this case T has two characteristic values, i and −i.
i. e., if ( A − cI ) X = O, …(1)
where X = [α] B = a column matrix of the type n × 1and O is the null matrix of the
type n × 1.Thus to find the coordinate matrix of α with respect to B, we should solve
the matrix equation (1) for X.
Equality of Matric Polynomials: Two matric polynomials are equal iff the
coefficients of the like powers of x are the same.
Lemma: Every square matrix over the field F whose elements are ordinary polynomials in x over F can essentially be expressed as a matric polynomial in x of degree m, where m is the highest power of x occurring in any element of the matrix.
We shall explain this by the following illustration. Consider the matrix
     [1 + 2x + 3x²    x²          4 − 6x       ]
A = [1 + x³          3 + 4x²     1 − 2x + 4x³ ]
     [2 − 3x + 2x³    5           6            ]
     [1  0  4]     [ 2  0  −6]      [3  1  0]      [0  0  0]
  = [1  3  1] + x [ 0  0  −2] + x² [0  4  0] + x³ [1  0  4].
     [2  5  6]     [−3  0   0]      [0  0  0]      [2  0  0]
Let f(x) = det(A − xI) = a_0 + a_1x + a_2x² + … + a_nx^n (say), …(1)
where the a_i’s are in F.
The characteristic equation of A is f ( x) = 0
i. e., a0 + a1 x + a2 x 2 + … + a n x n = 0.
Since the elements of the matrix A − xI are polynomials at most of the first degree
in x, therefore the elements of the matrix adj ( A − x I ) are ordinary polynomials in x
of degree n − 1 or less. Note that the elements of the matrix adj ( A − xI ) are the
cofactors of the elements of the matrix A − xI . Therefore adj ( A − xI ) can be
written as a matrix polynomial in x in the form
adj(A − xI) = B_0 + B_1x + B_2x² + … + B_{n−1}x^{n−1}, …(2)
where the Bi ’s are square matrices of order n over F with elements independent of x.
Now by the property of adjoints, we know that
( A − xI ). adj. ( A − xI ) = {det. ( A − xI )} I .
∴ (A − xI){B_0 + xB_1 + x²B_2 + … + x^{n−1}B_{n−1}}
= {a_0 + a_1x + … + a_nx^n} I [from (1) and (2)]
Equating the coefficients of like powers of x on both sides, we get
AB_0 = a_0I,
AB_1 − IB_0 = a_1I,
AB_2 − IB_1 = a_2I,
…………………
AB_{n−1} − IB_{n−2} = a_{n−1}I,
−IB_{n−1} = a_nI.
Premultiplying these equations successively by I , A, A2 , … , A n and adding, we get
a0 I + a1 A + a2 A2 + … + a n A n = O,
where O is the null matrix of order n.
Thus f ( A) = O.
Now f(T) = a_0I + a_1T + a_2T² + … + a_nT^n.
∴ [f(T)]_B = [a_0I + a_1T + a_2T² + … + a_nT^n]_B
= a_0[I]_B + a_1[T]_B + a_2[T²]_B + … + a_n[T^n]_B
= f(A).
∴ f(A) = O ⇒ [f(T)]_B = O = [0̂]_B ⇒ f(T) = 0̂
⇒ a_0I + a_1T + a_2T² + … + a_nT^n = 0̂. …(3)
Corollary: We have
f(x) = a_0 + a_1x + … + a_nx^n = det(A − xI).
∴ f(0) = a_0 = det A = det T.
If T is non-singular, then T is invertible and det T ≠ 0, i.e., a_0 ≠ 0.
Then from (3), we get
a_0I = −(a_1T + a_2T² + … + a_nT^n)
⇒ I = −[(a_1/a_0)I + (a_2/a_0)T + … + (a_n/a_0)T^{n−1}] T
⇒ T^{-1} = −[(a_1/a_0)I + (a_2/a_0)T + … + (a_n/a_0)T^{n−1}].
              [c_1   0   …   0 ]
              [ 0   c_2  …   0 ]
P^{-1}AP = [ …    …   …   … ]
              [ 0    0   …  c_n]
if and only if the jth column of P is a characteristic vector of A corresponding to the characteristic value c_j of A (j = 1, 2, …, n).
Theorem 10: A linear operator T on an n-dimensional vector space V ( F ) is
diagonalizable if and only if its matrix A relative to any ordered basis B of V is
diagonalizable.
Proof: Suppose T is diagonalizable. Then T has n linearly independent
characteristic vectors α1 , α 2 , … , α n in V. Suppose X1 , X 2 , … , X n are the
co-ordinate vectors of α1 , α 2 , … , α n relative to the basis B. Then X1 , … , X n are
also linearly independent since V is isomorphic to Vn ( F ) by isomorphism which
takes a vector in V to its co-ordinate vector in Vn ( F ). Under an isomorphism a
linearly independent set is mapped onto a linearly independent set. Further
X1 , … , X n are the characteristic vectors of the matrix A [see theorem 6]. Therefore
the matrix A is diagonalizable. [See theorem 9].
Conversely suppose the matrix A is diagonalizable. Then A has n linearly
independent characteristic vectors X1 , … , X n in Vn ( F ). If α1 , … , α n are the
vectors in V having X1 , … , X n as their coordinate vectors, then α1 , … , α n will be n
linearly independent characteristic vectors of T. So T is diagonalizable.
Theorem 11: Let T be any linear operator on a finite dimensional vector space V, let
c1 , c 2 , … , c k be the distinct characteristic values of T, and let Wi be the null space of
(T − c i I ). Then the subspaces W1 , … , Wk are independent.
Further show that if in addition T is diagonalizable, then V is the direct sum of the subspaces
W1 , … , Wk .
Proof: By definition of Wi , we have
Wi = {α : α ∈ V and (T − c i I ) α = 0 i. e., Tα = c i α}.
Now let α i be in Wi , i = 1, … , k, and suppose that
α1 + α 2 + … + α k = 0. …(1)
Let j be any integer between 1 and k and let
U_j = Π_{1≤i≤k, i≠j} (T − c_iI).
    |1−x   0   …   0  |
    | 0   1−x  …   0  |
=  | …    …   …   …  | = (1 − x)^n.
    | 0    0   …  1−x |
(ii) If 0̂ is the zero operator on V, then [0̂]_B = O, i.e., the null matrix of order n.
    |3−x   1    1  |
 = |3−x  1−x   1  |  [C_1 → C_1 + C_2 + C_3]
    |3−x   1   1−x |
             |1   1    1  |
 = (3 − x) |1  1−x   1  |
             |1   1   1−x |
             |1   1    1 |
 = (3 − x) |0  −x    0 |  [R_2 → R_2 − R_1, R_3 → R_3 − R_1]
             |0   0   −x |
 = (3 − x)x².
∴ the characteristic equation of A is (3 − x)x² = 0.
The only roots of this equation are x = 3, 0.
∴ 0 and 3 are the only characteristic values of A.
Let X = [x_1; x_2; x_3] be the coordinate matrix of a characteristic vector corresponding to the characteristic value x = 0. Then X will be given by a non-zero solution of the equation
(A − 0I)X = O,
i.e., [1 1 1; 1 1 1; 1 1 1][x_1; x_2; x_3] = [0; 0; 0], or x_1 + x_2 + x_3 = 0.
This equation has two linearly independent solutions, e.g.,
X_1 = [1; 0; −1] and X_2 = [0; 1; −1].
Every non-zero multiple of these column matrices X1 and X 2 is a characteristic
vector of A corresponding to the characteristic value 0.
The characteristic space of this characteristic value will be the subspace W spanned
by these two vectors X1 and X 2 . Any non-zero vector in W will be a characteristic
vector corresponding to this characteristic value.
To find the characteristic vectors corresponding to the characteristic value 3 we
consider the equation
( A − 3I ) X = O
−2 1 1 x1 0
i. e., 1 −2 1 x =0
2
1 1 − 2 x3 0
− 2 x1 + x2 + x3 0
i. e., x −2 x + x =0
1 2 3
x1 + x2 − 2 x3 0
i. e., − 2 x1 + x2 + x3 = 0, x1 − 2 x2 + x3 = 0, x1 + x2 − 2 x3 = 0.
Solving these equations, we get x1 = x2 = x3 = k.
k
∴ X = k , where k ≠ 0.
k
(b) Let A = [1 1 1; 0 1 1; 0 0 1].
The characteristic equation of A is (1 − x)³ = 0.
Let X = [x_1; x_2; x_3] be the coordinate matrix of a characteristic vector corresponding to the characteristic value 1. Then X will be given by a non-zero solution of the equation
(A − I)X = O,
i.e., [0 1 1; 0 0 1; 0 0 0][x_1; x_2; x_3] = [0; 0; 0], i.e., x_2 + x_3 = 0, x_3 = 0.
∴ x_1 = k, x_2 = 0, x_3 = 0.
Thus X = [k; 0; 0], where k ≠ 0.
Example 5: Let T be the linear operator on R³ which is represented in the standard basis by the matrix
[−9    4   4]
[−8    3   4].
[−16   8   7]
Show that T is diagonalizable.
Solution: The characteristic equation of A is
|−9−x    4     4  |
|−8     3−x    4  | = 0
|−16     8    7−x |
or, applying C_1 → C_1 + C_2 + C_3,
|−1−x    4     4  |
|−1−x   3−x    4  | = 0
|−1−x    8    7−x |
             |1    4     4  |
or −(1 + x) |1   3−x    4  | = 0
             |1    8    7−x |
             |1    4     4  |
or (1 + x) |0  −1−x    0  | = 0, applying R_2 → R_2 − R_1, R_3 → R_3 − R_1,
             |0    4    3−x |
or (1 + x)(1 + x)(3 − x) = 0.
The roots of this equation are −1, −1, 3.
∴ the eigenvalues of the matrix A are −1, −1, 3.
The characteristic vectors X of A corresponding to the eigenvalue −1 are given by the equation
(A − (−1)I)X = O, or (A + I)X = O,
or [−8 4 4; −8 4 4; −16 8 8][x_1; x_2; x_3] = [0; 0; 0].
These equations are equivalent to the equations
[−8 4 4; 0 0 0; 0 0 0][x_1; x_2; x_3] = [0; 0; 0], applying R_2 → R_2 − R_1, R_3 → R_3 − 2R_1.
The matrix of coefficients of these equations has rank 1. Therefore these equations have two linearly independent solutions. We see that these equations reduce to the single equation
−2x_1 + x_2 + x_3 = 0.
Obviously X_1 = [1; 1; 1] and X_2 = [0; 1; −1] are two linearly independent solutions of this equation. Therefore X_1 and X_2 are two linearly independent eigenvectors of A corresponding to the eigenvalue −1.
Now the eigenvectors of A corresponding to the eigenvalue 3 are given by
(A − 3I)X = O,
i.e., [−12 4 4; −8 0 4; −16 8 4][x_1; x_2; x_3] = [0; 0; 0].
These equations are equivalent to the equations
[−12 4 4; 4 −4 0; −4 4 0][x_1; x_2; x_3] = [0; 0; 0], applying R_2 → R_2 − R_1, R_3 → R_3 − R_1.
The matrix of coefficients of these equations has rank 2. Therefore these equations
243
will have a non-zero solution. Also these equations will have 3 − 2 = 1 linearly
independent solution. These equations can be written as
− 12 x1 + 4 x2 + 4 x3 = 0
4 x1 − 4 x2 = 0
− 4 x1 + 4 x2 = 0.
From these, we get x1 = x2 = 1, say.
Then x3 = 2 .
1
∴ X3 = 1
2
is an eigenvector of A corresponding to the eigenvalue 3.
1 0 1
Now let P = 1 1 1⋅
1 −1 2
We have det P = 1 ≠ 0. Therefore the matrix P is invertible. Therefore the columns
of P are linearly independent vectors belonging to R3 . Since the matrix A has three
linearly independent eigenvectors in R3 , therefore it is diagonalizable.
Consequently the linear operator T is diagonalizable. Also the diagonal form D of
A is given by
−1 0 0
−1
P AP = 0 −1 0 = D.
0 0 3
1 2
Example 6: Prove that the matrix A= is not diagonalizable over the
0 1
field C.
Solution: The characteristic equation of A is
1− x 2
= 0 or (1 − x)2 = 0.
0 1− x
The roots of this equation are 1, 1. Therefore the only distinct eigenvalue of A is 1.
The eigenvectors of A corresponding to this eigenvalue are given by
0 2 x1 0
0 = or 0 x1 + 2 x2 = 0.
0 x2 0
This equation has only one linearly independent solution. We see that
1
X =
0
is the only linearly independent eigenvector of A. Since A has not two linearly
independent eigenvectors, therefore it is not diagonalizable.
244
Comprehensive Exercise 1
1. Show that the characteristic values of a diagonal matrix are precisely the
elements in the diagonal. Hence show that if a matrix B is similar to a
diagonal matrix D, then the diagonal elements of D are the characteristic
values of B.
2. Let T be a linear operator on a finite dimensional vector space V. Then show
that 0 is a characteristic value of T iff T is not invertible.
3. Suppose S and T are two linear operators on a finite dimensional vector space
V. If S and T have the same characteristic polynomial, then
det S = det T.
4. Find all (complex) proper values and proper vectors of the following
matrices:
0 1 1 0 1 1
(a) (b) (c) ⋅
0 0 0 i 0 i
4 2 −2 − 17 18 −6
(c) − 5 3 2 (d) − 18 19 −6⋅
− 2 4 1 − 9 9 2
Answers 1
4. (a) 0 is the only characteristic value of A. The corresponding characteristic
vectors are given by [k 0]′ where k is any non-zero scalar.
(b) Characteristic values are1, i. Characteristic vectors corresponding to the
value 1 are given by [k 0]′ and corresponding to the value i are given by
[0, c ]′ , where k and c are any non-zero scalars.
246
1 2
8. 1, 2, 2. Corresponding eigenspaces are spanned by 0 , 1
0 0
respectively.
11. 1, 2, 2.
12. Not similar over R to a diagonal matrix ; similar over C to a diagonal matrix.
14. Yes.
15. Not similar over the field R as well as C to a diagonal matrix.
2 0 1 −3
16. (a) D = , P= ,
0 −7 −1 2
3 + 4i 0 1 −1
(b) D = , P= ,
0 3 − 4i i i
1 0 0 2 1 0
(c) D = 0 2 0 ,P= 1 1 1 ,
0 0 5 4 2 1
−2 0 0 2 1 −1
(d) D = 0 1
0 ,P=2 1 0 ⋅
0 0 1 1 0 3
−1 −2
and g ( x) = x r + b1 x r + b2 x r + … + br −1 x + br
be two minimal polynomials of A.Then both f ( x) and g ( x) annihilate A.Therefore
we have f ( A) = O and g ( A) = O. These give
−1
A r + a1 A r + … + ar −1 A + a r I = O, …(1)
−1
and A r + b1 A r + … + br −1 A + b r I = O. …(2)
Subtracting (1) from (2), we get
−1
(b1 − a1 ) A r + … + (b r − a r ) I = O. …(3)
r −1
From (3), we see that the polynomial (b1 − a1 ) x + … + (b r − a r ) also
annihilates A. Since its degree is less than r, therefore it must be a zero polynomial.
This gives b1 − a1 = 0, b2 − a2 = 0, … , b r − a r = 0. Thus a1 = b1 , … , a r = b r .
Therefore f ( x) = g ( x) and thus the minimal polynomial of A is unique.
Theorem 2: The minimal polynomial of a matrix (linear operator) is a divisor of every
polynomial that annihilates the matrix (linear operator).
Proof: Suppose m ( x) is the minimal polynomial of a matrix A. Let h ( x) be any
polynomial that annihilates A. Since m ( x) and h ( x) are two polynomials, therefore
by the division algorithm there exist two polynomials q ( x) and r ( x) such that
h ( x) = m ( x) q ( x) + r ( x), …(1)
where either r ( x) is a zero polynomial or its degree is less than the degree of m ( x).
Putting x = A on both sides of (1), we get
h ( A) = m ( A) q ( A) + r ( A)
⇒ O = O q ( A) + r ( A) [∵ both m ( x) and h ( x) annihilate A]
⇒ r ( A) = O.
Thus r ( x) is a polynomial which also annihilates A. If r ( x) ≠ 0, then it is a non-zero
polynomial of degree smaller than the degree of the minimal polynomial m ( x) and
thus we arrive at a contradiction that m ( x) is the minimal polynomial of A.
Therefore r ( x) must be a zero polynomial. Then (1) gives
h ( x) = m ( x) q ( x) ⇒ m ( x) is a divisor of h ( x).
p (T ) β = 0, V β ∈ V ⇒ p (T ) = ^
0.
∴ p ( x) annihilates T and so p ( x) is the minimal polynomial for T.
Thus we have proved that if T is a diagonalizable linear operator, the minimal
polynomial for T is a product of distinct linear factors.
Corollary:If the roots of the characteristic equation of a linear operator T are all distinct say
c1 , c 2 , … , c n , then the minimal polynomial for T is the polynomial
p ( x) = ( x − c1 ) … ( x − c n ).
Proof: Since the roots of the characteristic equation of T are all distinct,
therefore T is diagonalizable. Hence by the above theorem, the minimal
polynomial for T is the polynomial
( x − c1 ) ( x − c 2 ) … ( x − c n ).
Example 7: Let V be a finite-dimensional vector space. What is the minimal polynomial for
the identity operator on V ? What is the minimal polynomial for the zero operator ?
Solution: We have I − 1 I = I − I = ^
0. Therefore the monic polynomial x − 1
annihilates the identity operator I and it is the polynomial of lowest degree that
annihilates I. Hence x − 1 is the minimal polynomial for I.
Again we see that the monic polynomial x annihilates the zero operator ^
0 and it is
the polynomial of lowest degree that annihilates ^
0. Hence x is the minimal
^
polynomial for 0.
Solution: Since T k = ^
0, therefore the polynomial x k annihilates T. So the
minimal polynomial for T is a divisor of x k . Let x r be the minimal polynomial for T
where r ≤ n. Then T r = ^
0.
7 − x 4 −1
Solution: We have | A − xI | = 4 7− x −1
−4 −4 4 − x
7 − x 4 −1
= 4 7− x − 1 , by R3 + R2
0 3− x 3 − x
7 − x 4 − 1
= (3 − x) 4 7− x − 1
0 1 1
7 − x 4 −5
= (3 − x) 4 7− x x − 8 , by C3 − C2
0 1 0
7 − x −5
= − (3 − x) , expanding along third row
4 x − 8
3 − x 3 − x
= − (3 − x) , by R1 − R2
4 x − 8
1 1
= − (3 − x)2 = − (3 − x)2 ( x − 12).
4 x − 8
Therefore the roots of the equation | A − xI | = 0 are x = 3, 3, 12. These are the
characteristic roots of A.
Let us now find the minimal polynomial of A. We know that each characteristic
root of A is also a root of its minimal polynomial. So if m ( x) is the minimal
polynomial for A, then both x − 3 and x − 12 are factors of m ( x). Let us try whether
the polynomial h ( x) = ( x − 3) ( x − 12) = x 2 − 15 x + 36 annihilates A or not.
69 60 − 15
We have A = 60
2
69 − 15 ⋅
− 60 − 60 24
69 60 − 15 7 4 −1
∴ A − 15 A + 36I = 60
2
69 − 15 − 15 4 7 −1
− 60 − 60 24 − 4 −4 4
36 0 0
+ 0 36 0
0 0 36
252
105 60 − 15 105 60 − 15
= 60 105 − 15 − 60 105 − 15 = O.
− 60 − 60 60 − 60 − 60 60
∴ h ( x) annihilates A. Thus h ( x) is the monic polynomial of lowest degree which
annihilates A. Hence h ( x) is the minimal polynomial for A.
Note: In order to find the minimal polynomial of a matrix A, we should not forget
that each characteristic root of A must also be a root of the minimal polynomial.
We should try to find the monic polynomial of lowest degree which annihilates A
and which has also the characteristic roots of A as its roots.
Comprehensive Exercise 2
Answers 2
2. (9 − x) ( x 2 − 5 x − 2) ; ( x − 9) ( x 2 − 5 x − 2).
253
1 0
2. The characteristic values of the matrix are
0 i
(a) 1, i (b) 1, − i
(c) − 1, i (d) − 1, − i.
3. Let V be a finite-dimensional vector space. The minimal polynomial for the
zero operator is
(a) − x (b) x
(c) x − 1 (d) none of these.
0 1
4. The minimal polynomial of the real matrix is
1 0
(a) x 2 + 1 (b) x 2 − 1
(c) x + 1 (d) x − 1.
1 2 3
5. The eigen values of the triangular matrix 0 2 3 are
0 0 3
(a) 1, 2, 3 (b) 2, 3, 4
(c) 1, 3, 4 (d) none of these.
1 4
6. Let A = ⋅ The eigen values of A are
2 3
(a) 1, 5 (b) − 1, 5
(c) 1, − 5 (d) − 1, − 5.
1 1 0
7. The only eigen value of the matrix 0 1 0 is
0 0 1
(a) 1 (b) 2
(c) 3 (d) 4.
254
A nswers
True or False
1. F. 2. T. 3. T. 4. T. 5. F.
6. T.
¨
255
7
I nner P roduct S paces
7.1 Introduction
hroughout this chapter we shall deal only with real or complex vector spaces.
T Thus if V is the vector space over the field F, then F will not be an arbitrary
field. In this chapter F will be either the field R of real numbers or the field C of
complex numbers.
Before defining inner product and inner product spaces, we shall just give some
important properties of complex numbers.
Let z ∈ C i.e., let z be a complex number. Then z = x + iy where x, y ∈ R and
i = √ (− 1) . Here x is called the real part of z and y is called the imaginary part of z.
We write x = Re z , and y = Im z . The modulus of the complex number z = x + iy is
the non-negative real number √ ( x 2 + y 2 ) and is denoted by | z |. Also if z = x + iy
is a complex number, then the complex number z = x − iy is called the conjugate
complex of z. If z = z , then x + iy = x − iy and therefore y = 0. Thus z = z implies
that z is real. Obviously we have
(i) z + z = 2 x = 2 Re z , (ii) z − z = 2iy = 2i Im z ,
2 2 2
(iii) zz = x + y =| z | , (iv) | z | = 0 ⇔ x = 0, y = 0
i.e., | z | = 0 ⇔ z = 0,
256
Let us see that all the postulates of an inner product hold in (1).
(i) Conjugate symmetry : From the definition of inner product given in (1), we
have (β, α) = b1 a1 + … + b n a n .
∴ (β, α) = (b1 a1 + … + b n a n ) = (b1 a1 ) + … + (b n a n )
= b1 ( a1 ) + … + b n ( a n ) = b1 a1 + … + b n a n
= a1 b1 + … + a n b n [∵ multiplication in C is commutative]
= (α, β).
Thus (α, β) = (β, α).
(ii) Linearity : Let γ = (c1 , … , c n ) ∈ Vn (C) and let a, b ∈ C.
We have aα + bβ = a (a1 , … , a n ) + b (b1 , … , b n )
= (aa1 + bb1 , … , aa n + bb n ).
∴ (aα + bβ, γ ) = (aa1 + bb1 ) c1 + … + (aa n + bb n ) c n [by (1)]
= (aa1 c1 + … + aa n c n ) + (bb1 c1 + … + bb n c n )
= a (a1 c1 + … + a n c n ) + b (b1 c1 + … + b n c n )
= a (α , γ ) + b (β , γ ) [by (1)]
(iii) Non-negativity :
(α , α) = a1 a1 + … + a n a n [by (1)]
2 2
= | a1 | + … + | a n | . ...(2)
Now a i is a complex number. Therefore | a i | 2 ≥ 0. Thus (2) is a sum of n
non-negative real numbers and therefore it is ≥ 0. Thus (α , α) ≥ 0. Also (α , α) = 0
⇒ | a1 |2 + … + | a n |2 = 0
⇒ each | a i | 2 = 0 and so each a i = 0 ⇒ α = 0.
Hence the product defined in (1) is an inner product on Vn (C) and with respect to
this inner product Vn (C) is an inner product space.
If α, β are two vectors in Vn (C), then the standard inner product of α and β is also
called the dot product of α and β and is denoted by α . β. Thus if
α = (a1 , … , a n ), β = (b1 , … , b n ) then α . β = a1 b1 + a2 b2 + … + a n b n .
Illustration 2: If α = (a1 , a2 ), β = (b1 , b2 ) ∈ V2 (R), let us define
(α , β) = a1 b1 − a2 b1 − a1 b2 + 4a2 b2 . ...(1)
We shall show that all the postulates of an inner product hold good in (1).
258
We shall show that all the postulates of an inner product hold in (1).
(i) Conjugate Symmetry: We have
1
( g (t), f (t)) = ∫0 g (t) f (t) dt [from (1)]
Since | f (t)|2 ≥ 0 for every t lying in the closed interval [0, 1], therefore (2) ≥ 0.
Thus ( f (t), f (t)) ≥ 0
Also ( f (t), f (t)) = 0
1
⇒ ∫0 | f (t)|2 dt = 0
= a (β, α) + b (γ , α) = a (β, α) + b (γ , α)
= a (α, β) + b (α, γ ).
Note 1: If F = R, then the result (ii) can be simply read as
(α , aβ + b γ ) = a (α , β) + b (α , γ ).
Note 2: Similarly it can be proved that (α , aβ − b γ ) = a (α , β) − b (α , γ ).
Also (α , β + γ ) = (α , 1β + 1γ ) = 1 (α, β) + 1 (α, γ ) = (α , β) + (α , γ ).
Theorem 2: In an inner product space V ( F ), prove that
(i) ||α|| ≥ 0; and ||α|| = 0; if and only if α = 0.
(ii) || aα|| = | a |.|| α||.
Proof: (i) We have ||α|| = √ (α, α) [by def. of norm]
2
⇒ ||α|| = (α, α)
⇒ ||α||2 ≥ 0 [∵ (α, α) ≥ 0]
⇒ ||α||2 ≥ 0
Also (α, α) = 0 iff α = 0
∴ || α ||2 = 0 iff α = 0 i.e., ||α|| = 0 iff α = 0.
Thus in an inner product space, || α|| > 0 iff α ≠ 0.
(ii) We have || aα|| 2 = (aα, aα) [by def. of norm]
= a (α, aα) [by linearity property]
= aa (α, α) [by theorem 1]
2 2
= | a| .|| α|| .
2 2 2
Thus ||aα|| = |a| ⋅ ||α||
Taking square root, we get || aα|| = | a |.|| α||.
1
Note: If α is any non-zero vector of an inner product space V, then α is a unit
|| α ||
vector in V. We have ||α||≠ 0 because α ≠ 0.
1
Therefore α ∈ V.
||α ||
1 1 1 1 1 1
Now α, α = α, α = (α, α) = ||α ||2 = 1.
||α || ||α || ||α || ||α || ||α ||2 ||α ||2
α α
Therefore = 1 and thus is unit vector.
||α || ||α ||
For example if α = (2, 1, 2) is a vector in V3 (R) with standard inner product, then
||α|| = √ (α, α) = √ (4 + 1 + 4) = 3 .
Therefore (2, 1, 2) i.e., , , is a unit vector.
1 2 1 2
3 3 2 3
261
1 2
Also |( f (t), g(t))|2 = ∫0 f (t) g (t) dt .
n n
(α ,β) = Σ x j α j , β = Σ x j (α j , β),
j =1 j =1
by linearity property of the inner product
n n n n
= Σ x j α j , Σ y i α i = Σ xj Σ y i (α j , α i )
j =1 i =1 j =1 i =1
Now suppose that out of the n scalars x1 , … , x n we take x i = 1 and each of the
remaining n − 1scalars is taken as 0. Then from (3) we conclude that g ii > 0. Thus
g ii > 0 for each i = 1, … , n. Hence each entry along the principal diagonal of the
matrix G is positive.
Example 1: Show that we can always define an inner product on a finite dimensional vector
space real or complex.
Solution: Let V be a finite dimensional vector space over the field F real or complex.
Let B = { α1 , ..., α n } be a basis for V.
Let α, β ∈ V. Then we can write α = a1α1 + .... + a nα n ,
and β = b1α1 + .... + b nα n
where a1 , … , a n and b1 , … , b n are uniquely determined elements of F.Let us define
(α, β) = a1 b1 + .... + a n b n . ...(1)
We shall show that (1) satisfies all the conditions for an inner product.
(i) Conjugate Symmetry: We have
(β, α) = b1 a1 + … + b n a n .
∴ (β, α) = (b1 a1 + … + b n a n ) = b1 a1 + … + b n a n
= a1 b1 + … + a n b n = (α , β).
(ii) Linearity: Let γ = c1α1 + .... + c nα n ∈ V and a, b ∈ F. We have
aα + bβ = a (a1α1 + .... + a nα n ) + b (b1α1 + .... + b nα n )
= (aa1 + bb1 ) α1 + … + (aa n + bb n ) α n .
∴ (aα + bβ, γ ) = (aa1 + bb1 ) c1 + … + (aa n + bb n ) c n
= a (a1 c1 + … + a n c n ) + b (b1 c1 + .... + b n c n )
= a (α, γ ) + b (β, γ ).
(iii) Non-negativity: We have
(α, α) = a1 a1 + … + a n a n = | a1 | 2 + .... + | a n | 2 ≥ 0.
Also (α, α) = 0 ⇒ | a1 | 2 + .... + | a n | 2 = 0
⇒ | a1 | 2 = 0, … ,| a n | 2 = 0
⇒ a1 = 0, … , a n = 0 ⇒ α = 0.
Hence (1) is an inner product on V.
Example 2: In V2 ( F ) define for α = (a1 , a2 ) and β = (b1 , b2 ) ,
(α, β) = 2a1 b1 + a1 b2 + a2 b1 + a2 b2 .
Show that this defines an inner product on V2 ( F ).
267
Comprehensive Exercise 1
(a) (α, β) = x1 y1 + 2 x1 y2 + 2 x2 y1 + 5 x2 y2
(b) (α, β) = x12 − 2 x1 y2 − 2 x2 y1 + y12 .
(c) (α, β) = 2 x1 y1 + 5 x2 y2 .
(d) (α, β) = x1 y1 − 2 x1 y2 − 2 x2 y1 + 4 x2 y2 .
4. Let α = (a1 , a2 ) and β = (b1 , b2 ) be any two vectors ∈ V2 (C). Prove that
(α, β) = a1 b1 + (a1 + a2 ) (b1 + b2 ) defines an inner product in V2 (C). Show
that the norm of the vector (3, 4) in this inner product space is √ (58).
5. Let V be an inner product space.
(a) Show that (0, β) = 0 for all β in V.
(b) Show that if (α, β) = 0 for all β in V, then α = 0.
6. Let V be an inner product space, and α, β be vectors in V. Show that α = β if
and only if (α, γ ) = (β, γ ) for every γ in V.
7. Normalize each of the following vectors in the Euclidean space R3 :
(ii) , , − ⋅
1 2 1
(i) (2, 1, − 1),
2 3 4
8. Let V (R) be a vector space of polynomials with inner product defined by
1
( f (t), g (t)) = ∫0 f (t) g(t) dt.
A nswers 1
3. (a) and (c) are inner products, (b) and (d) are not
(i)
2 1 1 1
7. , ,− (ii) (6, 8, − 3)
√6 √6 √6 √ (109)
7 1
8. ;
4 √3
271
7.4 Orthogonality
(Lucknow 2008)
Definition: Let α and β be vectors in an inner product space V. Then α is said to be
orthogonal to β if (α , β) = 0.
The relation of orthogonality in an inner product space is symmetric. We have α is
orthogonal to β ⇒ (α, β) = 0 ⇒ (α, β) = 0 ⇒ (β, α) = 0 ⇒ β is orthogonal to α.
So we can say that two vectors α and β in an inner product space are orthogonal if
(α, β) = 0.
Note 1: If α is orthogonal to β, then every scalar multiple of α is orthogonal to β. Let k be any
scalar. Then
(kα, β) = k (α, β) = k 0 = 0. [∵ (α, β) = 0]
Therefore kα is orthogonal to β.
Note 2: The zero vector is orthogonal to every vector. For every vector α in V, we have
(0, α) = 0.
Note 3: The zero vector is the only vector which is orthogonal to itself.
We have α is orthogonal to α ⇒ (α, α) = 0
⇒ α = 0, by def. of an inner product space.
Definition: A vector α is said to be orthogonal to a set S if it is orthogonal to each vector in S.
Similarly two subspaces are called orthogonal if every vector in each is orthogonal to every
vector in the other.
Orthogonal set. Definition: Let S be a set of vectors in an inner product space V. Then S
is said to be an orthogonal set provided that any two distinct vectors in S are orthogonal.
Theorem 1: Let S = {α1 , … , α m } be an orthogonal set of non-zero vectors in an inner
product space V. If a vector β in V is in the linear span of S, then
m (β, α k )
β= Σ αk .
k = 1 || α || 2
k
Theorem 2: Any orthogonal set of non-zero vectors in an inner product space V is linearly
independent.
Proof: Let S be an orthogonal set of non-zero vectors in an inner product space V.
Let S1 = {α1 , … , α m } be a finite subset of S containing m distinct vectors. Let
m
Σ c j α j = c1 α1 + … + c m α m = 0. ...(1)
j =1
⇒ ck = 0 [∵ α k ≠ 0 ⇒ || α k ||2 ≠ 0]
∴ The set S1 is linearly independent. Thus every finite subset of S is linearly
independent. Therefore S is linearly independent.
Orthonormal set. Definition: Let S be a set of vectors in an inner product space V. Then
S is said to be an orthonormal set if
(i) α ∈ S ⇒ || α|| = 1 i.e., (α, α) = 1,
and (ii) α, β ∈ S and α ≠ β ⇒ (α, β) = 0.
Thus an orthonormal set is an orthogonal set with the additional property that
each vector in it is of length 1. In other words a set S consisting of mutually orthogonal unit
vectors is called an orthonormal set. Obviously an orthonormal set cannot contain zero
vector because || 0 || = 0.
A finite set S = {α1 , … , α m } is orthonormal if
(α i , α j ) = δ ij where δ ij = 1 if i = j and δ ij = 0 if i ≠ j.
Existence of an orthonormal set: Every inner product space V which is not equal to zero
space possesses an orthonormal set.
α
Let 0 ≠ α ∈ V. Then || α || ≠ 0. The set containing only one vector is
|| α||
necessarily an orthonormal set.
273
α α 1 α 1 1
We have , = α, = , (α, α)
|| α || || α || || α || || α || || α || || α ||
1
= || α|| 2 = 1.
|| α|| 2
Theorem 3: Let S = {α1 , ..., α m }be an orthonormal set of vectors in an inner product space
m
V. If a vector β is in the linear span of S, then β = Σ (β, α k ) α k .
k =1
spanned by S.
Proof: We have for each k where 1≤ k ≤ m,
m
(γ , α k ) = β − Σ (β, α i ) α i , α k
i =1
m
= (β, α k ) − Σ (β, α i ) α i , α k
i =1 [by linearity of inner product]
m
= (β, α k ) − Σ (β, α i ) (α i , α k ) [by linearity of inner product]
i =1
m
= (β, α k ) − Σ (β, α i ) δ ik
i =1 [∵ α i , α k belong to an orthonormal set]
= (β, α k ) − (β, α k ) [∵ δ ik = 1 if i = k and δ ik = 0 if i ≠ k]
= 0.
Hence the first part of the theorem.
274
Now let δ be any vector in the subspace spanned by S i.e., let δ ∈ L (S). Then
m
δ= Σ a i α i where each a i is some scalar.
i =1
m m m
We have (γ , δ) = γ, Σ a i α i = Σ a i (γ , α i ) = Σ a i 0 = 0.
i =1 i =1 i =1
We know that δ is orthogonal to each of the vectors α1 , … , α n i.e., (δ, α i ) = 0 for each
i = 1, ..., n. Therefore according to the given statement δ = 0. This gives
n
γ = Σ (γ , α i ) α i . Thus every vector γ in V can be expressed as a linear
i =1
(iv) ⇒ (v).
n
It is given that if β is in V, then β = Σ (β, α i ) α i . If γ is another vector in V, then
i =1
n
γ = Σ (γ , α i ) α i .
i =1
n n
We have (β, γ ) = Σ (β, α i ) α i , Σ (γ , α j ) α j
i =1 j =1
n n n
= Σ Σ (β,α i ) (γ , α j ) (α i ,α j ) = Σ (β,α i )(γ ,α i )
i =1 j =1 i =1
[On summing with respect to j]
276
n
= Σ (β, α i ) (α i , γ ).
i =1
(v) ⇒ (vi).
n
It is given that if β and γ are in V, then (β, γ ) = Σ (β , α i ) (α i , γ ).
i =1
(vi) ⇒ (i).
n
It is given that if β is in V, then|| β ||2 = Σ |(β,α i )|2 . . To prove that S is a complete
i =1
orthonormal set.
Let S be not a complete orthonormal set i.e., let S be contained in a larger
orthonormal set S1 .
Then there exists a vector α 0 in S1 such that|| α 0|| = 1and α 0 is orthogonal to each
of the vectors α1 , … , α n . Since α 0 is in V, therefore from the given condition,
n
we have ||α 0||2 = Σ |(α 0 ,α i )|2 = 0.
i =1
Thus we have constructed an orthonormal set {α1 }containing one vector. Also α1 is
in the linear span of β1 .
Now let γ 2 = β 2 − (β 2 , α1 ) α1 . By theorem 4, γ 2 is orthogonal to α1 . Also γ 2 ≠ 0
because if γ 2 = 0,then β 2 is a scalar multiple of α1 and therefore of β1 . But this is not
possible because the vectors β1 and β 2 are linearly independent. Hence γ 2 ≠ 0. Let
γ2
us now put α 2 = ⋅ Then|| α 2 || = 1. Also α 2 is orthogonal to α1 because α 2 is
|| γ 2 ||
simply a scalar multiple of γ 2 which is orthogonal to α1 . Further α 2 ≠ α1 . For
otherwise β 2 will become a scalar multiple of β1 . Thus {α1 , α 2 } is an orthonormal
set containing two distinct vectors such that α1 is in the linear span of β1 and α 2 is
in the linear span of β1 , β 2 .
The way ahead is now clear. Suppose that we have constructed an orthonormal set
{α1 , … , α k } of k (where k < n) distinct vectors such that each α j ( j = 1, ..., k) is a
linear combination of β1 , … , β j . Consider the vector
γ k + 1 = β k + 1 − (β k + 1 , α1 ) α1 − (β k + 1 , α 2 ) α 2 − … − (β k + 1 , α k ) α k ...(1)
278
γ3
α3 = where γ 3 = β 3 − (β 3 , α1 ) α1 − (β 3 , α 2 ) α 2 ,
|| γ 3||
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
γn
αn = where γ n = β n − (β n , α1 ) α1 − (β n , α 2 ) α 2 − …
|| γ n ||
− (β n , α n − 1 ) α n − 1 .
Now we shall give an example to illustrate the Gram-Schmidt process.
Example 7: Apply the Gram-Schmidt process to the vectors β1 = (1, 0, 1),β 2 = (1, 0, − 1),
β 3 = (0, 3, 4) , to obtain an orthonormal basis for V3 (R) with the standard inner product.
(Lucknow 2009)
2
Solution: We have || β1|| = (β1 , β1 ) = 1 . 1 + 0 . 0 + 1 . 1 = (1) + (0)2 + (1)2 = 2.
2
β1 1 1 , 0, 1 .
Let α1 = = (1, 0, 1) =
|| β1 || √ 2 √2 √ 2
Now let γ 2 = β 2 − (β 2 , α1 ) α1 .
1 1
We have (β 2 , α1 ) = 1 . + 0 . 0 + (− 1) . = 0.
√2 √2
γ 2 = (1, 0, − 1) − 0
1 1
∴ ,0 , = (1, 0, − 1).
√2 √ 2
Now || γ 2||2 = (γ 2 , γ 2) = (1)2 + (0)2 + (−1)2 = 2.
γ2
(1, 0, − 1) =
1 1 1
Let α2 = = ,0 , − ⋅
|| γ 2|| √2 √2 √ 2
Now let γ 3 = β 3 − (β 3 , α1 ) α1 − (β 3 , α 2 ) α 2 .
1
(β 3 , α1 ) = (0, 3, 4),
1 1 1
We have ,0 , = 0 ⋅ + 3⋅0 + 4 ⋅ = 2 √ 2.
√2 √ 2 √2 √2
1
(β 3 , α 2 ) = (0, 3, 4),
1 1 1
, 0 ,− = 0 ⋅ + 3⋅0 − 4 ⋅ = 2 √ 2.
√2 √2 √2 √2
γ 3 = (0, 3, 4) − 2 √ 2
1 1 1 , 0 ,− 1
∴ ,0 , + 2√2
√2 √ 2 √2 √ 2
∴ = (0, 3, 4) − (2, 0, 2) + (2, 0, − 2) = (0, 3, 0).
Now || γ 3 ||2 = (γ 3 , γ 3) = (0)2 + (3)2 + (0)2 = 9.
γ3 1
Put α3 = = (0, 3, 0) = (0, 1, 0).
|| γ 3|| 3
1
{α1 , α 2 , α 3 } i.e.,
1 1 1
Now ,0 , , ,0 , − , (0, 1, 0)
√ 2 √ 2 √ 2 √ 2
is the required orthonormal basis for V3 (R).
280
m m
We have || γ | |2 = (γ , γ ) = β − Σ (β,α i ) α i ,β − Σ (β,α j ) α j
i =1 j =1
m m
= (β, β) − Σ (β, α i ) (α i ,β) − Σ (β, α j )(β, α j )
i =1 j =1
m m
+ Σ Σ (β,α i ) (β, α j ) (α i , α j )
i =1 j =1
m m m
= (β, β) − Σ (β, α i ) (β, α i ) − Σ (β, α j ) (β, α j ) + Σ (β, α i ) (β, α i )
i =1 j =1 i =1
Now || γ ||2 ≥ 0.
m m
∴ || β ||2 − Σ |(β,α i )|2 ≥ 0 or Σ |(β,α i )|2 ≤ || β ||2 ,
i =1 i =1
m
If the equality holds i.e., if Σ |(β,α i )|2 = || β ||2 then from (1) we have|| γ || 2 = 0.
i =1
m
This implies that γ = 0 i.e., β = Σ (β, α i ) α i .
i =1
αi 1
Also (β, δ i ) = β, = (β,α i ).
|| α i || || α i ||
|(β, α i )| 2
|(β, δ i )| 2 = ⋅ ...(2)
|| α i || 2
From (1) and (2), we get the required result.
Corollary: If V is finite dimensional and if {α1 , … , α m } is an orthonormal set in V such
m
that Σ | (β, α i )| 2 = || β ||2 for every β ∈ V, prove that {α1 , … , α m }must be a basis of V.
i =1
Therefore V = W + W ⊥ .
Now we shall prove that the subspaces W and W ⊥ are disjoint. Let α ∈ W ∩ W ⊥ .
283
Example 8: State whether the following statement is true or false. Give reasons to support
your answer.
α is an element of an n-dimensional unitary space V and α is perpendicular to n linearly
independent vectors from V, then α = 0.
284
⇒ (W1 ∩ W2 ) ⊥ = W1 ⊥ + W2 ⊥ .
Example 15: Find two mutually orthogonal vectors each of which is orthogonal to the vector
α = (4, 2, 3) of V3 (R) with respect to standard inner product.
Solution: Let β = ( x1 , x2 , x3 ) be any vector orthogonal to the vector (4, 2, 3). Then
4 x1 + 2 x2 + 3 x3 = 0.
Obviously β = (3, − 3, − 2) is a solution of this equation. We now require a third
vector γ = ( y1 , y2 , y3 ) orthogonal to both α and β. This means γ must be a
solution vector of the system of equations
4 y1 + 2 y2 + 3 y3 = 0, 3 y1 − 3 y2 − 2 y3 = 0.
Obviously γ = (5, 17, − 18) is a solution of these equations. Thus, β and γ are
orthogonal to each other and to α. The solution is, of course, by no means unique.
Comprehensive Exercise 2
1. Let V3 (R) be the inner product space relative to the standard inner product.
Then find
(a) two linearly independent vectors each of which is orthogonal to the vector
(1, 1, 2).
(b) two mutually orthogonal vectors, each of which is orthogonal to ( 5, 2, − 1).
(c) two mutually orthogonal unit vectors, each of which is orthogonal to
( 2, − 1, 3).
(d) the projections of the vector ( 3, 4, 1) onto the space spanned by (1, 1, 1) and on
its orthogonal complement.
2. Let V3 (R) be the inner product space with respect to the standard inner
product and let W be the subspace of V3 (R) spanned by the vector
α = (2, − 1, 6). Find the projections of the vector β = (4, 1, 2) on W and W ⊥ .
1 2 2 2 1 2 2 2 1
3. Verify that the vectors ( , − , − ) , ( , − , ) and ( , , − ) from an
3 3 3 3 3 3 2 3 3
orthonormal basis for V3 (R) relative to the standard inner product.
4. Given the basis (1, 0, 0), (1, 1, 0), (1, 1, 1) for V3 (R), construct from it by the
Gram-Schmidt process an orthonormal basis relative to the standard inner
product.
5. Given the basis (2, 0, 1), (3, − 1, 5), and (0, 4, 2) for V3 (R), construct from it by
the Gram-Schmidt process an orthonormal basis relative to the standard
inner product.
6. Find an orthonormal basis of the vector space V of all real polynomials of
degree not greater than two, in which the inner product is defined as
288
1
(φ ( x), ψ( x)) = ∫ −1 φ ( x) Ψ ( x) dx , where φ ( x), ψ( x) ∈ V .
7. If α and β are vectors in a real inner product space, and if|| α || = || β || , then
α − β and α + β are orthogonal. Interpret the result geometrically.
8. If V is an inner product space, then prove that
(i) {0} ⊥ = V, (ii) V ⊥
= { 0}.
9. If V is an inner product space and S, S1 , S2 are subsets of V, then
(i) S1 ⊆ S2 ⇒ S2 ⊥ ⊆ S1 ⊥ (ii) S ⊥ = [ L(S)] ⊥
(iii) L (S ) ⊆ S ⊥ ⊥
(iv) L (S) = S ⊥ ⊥ if V is finite dimensional.
10. Let W be a subspace of an inner product space V. If {α1 , … , α n } is a basis for
W, then β ∈ W ⊥ if and only if (β, α i ) = 0 V i = 1, 2, ..., n.
11. Let V be a finite-dimensional inner product space, and let {α1 , … , α n } be an
orthonormal basis for V. Show that for any vectors α, β in V
n
(α, β) = Σ (α, α k ) (β, α k ).
k =1
A nswers 2
1 √3 √5
13. , , −
2 2 1
6. , x, (3 x 2 − 1)
√ 2 √ 2 √ 8 3 3 3
True or False
Write ‘T’ for true and ‘F’ for false statement.
1. If α = (a1 , … , a n ) and β = (b1 , … , b n ) are two vectors in Vn (C), then the
standard inner product of α and β is
α . β = a1 b1 + a2 b2 + … + a n b n .
2. In an inner product space V ( F ), (α , aβ + b γ ) = a (α , β) + b (α , γ ).
3. In an inner product space V ( F ), the distance d (α, β) from α to β is given by
d (α, β) = || α − β ||.
4. If α and β are vectors in an inner product space, then
|| α + β ||2 + || α − β ||2 = ||α ||2 + || β ||2
5. Any orthogonal set of non-zero vectors in an inner product space V is linearly
dependent.
6. Every finite-dimensional inner product space has an orthonormal basis.
7. If V is an inner product space, then V ⊥ = { 0}.
A nswers
True or False
1. F 2. T 3. T 4. F
5. F 6. T 7. T
¨
291
8
Bilinear, Quadratic and
Hermitian Forms
Example 1: Suppose V is a vector space over the field F.Let L1 , L2 be linear functionals on
V. Let f be a function from V × V into F defined as
f (α , β) = L1 (α) L 2 ( β).
Then f is a bilinear form on V.
Solution: If α, β ∈ V , then L1 (α), L 2 ( β) are scalars. We have
f (aα1 + bα 2 , β) = L1 (aα1 + bα 2 ) L2 ( β)
= [a L1 (α1 ) + b L1 (α 2 )] L2 ( β)
= aL1 (α1 ) L 2 ( β) + bL1 (α 2 ) L2 ( β)
= af (α1 , β) + bf (α 2 , β).
Also f (α, aβ1 + bβ 2 ) = L1 (α) L 2 (aβ1 + bβ 2 )
= L1 (α) [a L 2 ( β1 ) + b L 2 ( β 2 )]
= aL1 (α) L 2 ( β1 ) + b L1 (α) L 2 ( β 2 )
= a f (α, β1 ) + b f (α, β 2 ).
Hence f is a bilinear form on V.
Example 2:Suppose V is a vector space over the field F.Let T be a linear operator on V and f
a bilinear form on V. Suppose g is a function from V × V into F defined as
g (α , β) = f (Tα , T β).
Then g is a bilinear form on V.
293
= af (α1 , β) + bf (α 2 , β).
294
n m
g (α , β) = g Σ x i α i , Σ y j β j
i =1 j =1
n m
= Σ Σ x i y j g (α i , β j ) [ ∵ g is a bilinear form]
i =1 j =1
n m
= Σ Σ x i y j a ij [from (2)]
i =1 j =1
f (α i , α j ) = a ij , i = 1, … , n ; j = 1, … , n.
We shall denote this matrix A by [ f ] B .
Rank of a bilinear form: Definition: The rank of a bilinear form is defined as the
rank of the matrix of the form in any ordered basis.
Let us describe all bilinear forms on a finite-dimensional vector space V of
dimension n.
n n
If α = Σ x i α i , and β = Σ y j α j are vectors in V, then
i =1 j =1
n n n n
f (α, β) = f Σ x i α i , Σ y j α j = Σ Σ x i y j f (α i , α j )
i =1 j =1 i =1 j =1
n n
= Σ Σ x i y j a ij = X ′ AY,
i =1 j =1
295
where X and Y are coordinate matrices of α and β in the ordered basis B and X′ is the
transpose of the matrix X. Thus f (α, β) = [ α ] B ′ A [ β ] B .
From the definition of the matrix of a bilinear form, we note that if f is a bilinear
form on an n-dimensional vector space V over the field F and B is an ordered basis
of V,then there exists a unique n × n matrix A = [ a ij ] n × n over the field F such that
A = [ f ]B .
Conversely, if A = [ a ij ] n × n be an n × n matrix over the field F, then from
theorem 1, we see that there exists a unique bilinear form f on V such that
[ f ] B = [ a ij ] n × n .
n n
If α = Σ x i α i , β = Σ y j α j are vectors in V, then the bilinear form f is de-
i =1 j =1
n n
fined as f (α, β) = Σ Σ x i y j a ij = X ′ AY, …(1)
i =1 j =1
where X, Y are the coordinate matrices of α, β in the ordered basis B. Hence the
bilinear forms on V are precisely those obtained from an n × n matrix as in (1).
f (( x1 , y1 ), ( x2 , y2 )) = x1 y1 + x2 y2 . (Lucknow 2011)
x1 y1
x y
2 2
Let X = ... and Y = ... be any two elements of Vm and Vn respectively so that
... ...
x m y n
Let b (X, Y ) = X T AY ,
where X T = [ x1 x2 … x m ],
T
Y = [ y1 y2 … y n]
be a given bilinear form over the field F.
If the matrix A is of rank r, then there exist non-singular matrices P and Q of orders
m and n respectively such that
I r O
P T AQ =
O O
is in the normal form.
If we transform the vectors X and Y to U and V by the transformations
X = PU, Y = QV,
then the bilinear form b (U, V ) equivalent to the bilinear form b (X, Y) is given by
I r O
b (U, V ) = U T (P T AQ ) V = U T V
O O
= u1 v1 + u2 v2 + … + ur vr ,
where U = [u1 u2 … um ]T and V = [v1 v2 … vn ]T .
The bilinear form b (U, V ) = u1 v1 + … + ur vr is called the equivalent canonical form
or the equivalent normal form of the bilinear form
b (X, Y ) = X T AY .
Congruent Matrices. Definition: A square matrix B of order n over a field F is said to
be congruent to another square matrix A of order n over F, if there exists a non-singular matrix
P over F such that
B = P T AP.
Cogradient Transformations: Definition : Let X and Y be vectors belonging to
the same vector space Vn over a field F and let A be a square matrix of order n over F.Let
b (X, Y ) = X T AY
be a bilinear form on the vector space Vn over F.
Let B be a non-singular matrix of order n and let the vectors X and Y be transformed to the
vectors U and V by the transformations
X = BU, Y = BV.
Then the bilinear form b (X, Y ) transforms to the equivalent bilinear form
b (U, V ) = U T (BT AB ) V = U T DV,
where D = BT AB and U and V are both n-vectors.
The bilinear form U T DV is said to be congruent to the bilinear form X T AY . Under such
circumstances when X and Y are subjected to the same transformation X = BU and Y = BV,
we say that X and Y are transformed cogradiently.
Here, the matrix A is congruent to B because D = BT AB,where B is non- singular.
299
Example 4: Find the matrix A of each of the following bilinear forms b (X, Y ) = X T AY .
(i) 3 x1 y1 + x1 y2 − 2 x2 y1 + 3 x2 y2 − 3 x1 y3
(ii) 2 x1 y1 + x1 y2 + x1 y3 + 3 x2 y1 − 2 x2 y3 + x3 y2 − 5 x3 y3
Which of the above forms is symmetric ?
Solution: (i) The element a ij of the matrix A is the coefficient of x i y j in
the given bilinear form.
3 1 −3
∴ A = .
−2 3 0
The matrix A is not a symmetric matrix. So the given bilinear form is not
symmetric.
(ii) The element a ij of the matrix A which occurs in the i th row and the jth column
is the coefficient of x i y j in the given bilinear form.
2 1 1
∴ A = 3 0 −2 .
0 1 −5
The matrix A is not symmetric. So the given bilinear form is not symmetric.
Example 5: Transform the bilinear form X T AY to the equivalent canonical form where
2 1 1
A = 4 2 2.
1 2 2
Performing R1 ↔ R3 , we get
1 2 2 0 0 1 1 0 0
4 2 2 = 0 1 0 A0 1 0 ⋅
2 1 1 1 0 0 0 0 1
Performing R2 → R2 − 4 R1 , R3 → R3 − 2 R1 , we get
1 2 2 0 0 1 1 0 0
0 −6 −6 = 0 1 −4 A 0 1 0 ⋅
0 −3 −3 1 0 −2 0 0 1
300
Performing R3 → R3 − R2 , we get
1 0 0 0 0 1 1 − 2 − 2
0 1 1 = 0 − 16 2
1 0⋅
3 A0
0 0 0 − 13 1
6
0 0 0 1
Performing C3 → C3 − C2 , we get
1 0 0 0 1 1 − 2 0
0
0 1 0 = 0 − 16
2 1 − 1 ⋅
A0
3
0 0 0 − 13 0 0 0
1
6
1
0 0 −
1
3
I 2 O 1 1
∴ P T AQ = , where P = 0 −
O O 6 6
1 2
0
3
1 −2 0
and Q = 0 1 − 1 ⋅
0 0 0
Hence if the vectors X = [ x1 x2 x3 ]T and Y = [ y1 y2 y3 ]T are
transformed to the vectors U = [u1 u2 u3 ]T and V = [v1 v2 v3 ]T
respectively by the transformations X = PU, Y = QV,
then the bilinear form
2 1 1 y1
[ x1 x2 x3 ] 4 2 2 y2
1 2 2 y3
where a ij ' s are all real numbers, is called a real quadratic form in the n variables
x1 , x2 ,…, x n .
For example,
(i) 2 x 2 + 7 xy + 5 y 2 is a real quadratic form in the two variables x and y.
(ii) 2 x 2 − y 2 + 2z 2 − 2 yz − 4zx + 6 xy is a real quadratic form in the three
variables x, y and z.
(iii) x12 − 2 x2 2 + 4 x3 2 − 4 x4 2 − 2 x1 x2 + 3 x1 x4 + 4 x2 x3 − 5 x3 x4 is a real
quadratic form in the four variables x1 , x2 , x3 and x4 .
Theorem: Every quadratic form over a field F in n variables x1 , x2 ,……, x n can be
expressed in the form X ′ BX where X = [ x1 , x2 ,……, x n ]T is a column vector and B is a
symmetric matrix of order n over the field F.
n n
Proof: Let Σ Σ a ij x i x j , …(1)
i =1 j =1
If we identify a1 × 1matrix with its single element i. e.,if we regard a1 × 1matrix equal
to its single element, then we have
n n n n
XT BX = Σ Σ b ij x i x j = Σ Σ a ij x i x j .
i =1 j =1 i =1 j =1
then there exists a unique symmetric matrix B of order n such that φ = XT BX where
X = [ x1 x2 …… x n ]T . The symmetric matrix B is called the matrix of the quadratic
n n
form Σ Σ a ij x i x j .
i =1 j =1
Since every quadratic form can always be so written that matrix of its coefficients is
a symmetric matrix, therefore we shall be considering quadratic forms which are so
adjusted that the coefficient matrix is symmetric.
Thus we have seen that there exists a one-to-one correspondence between the set of
all quadratic forms in n variables over a field F and the set of all n-rowed symmetric
matrices over F.
Let A = [a ij ] n × n be any given square matrix of order n over the field F. Then any polynomial
of the form
n n
q ( x1 , x2 , … , x n ) = X T AX = Σ Σ a ij x i x j
i =1 j =1
Example 6: Write down the matrix of each of the following quadratic forms and verify that
they can be written as matrix products XT AX :
(i) x12 − 18 x1 x2 + 5 x2 2 .
(ii) x12 + 2 x2 2 − 5 x3 2 − x1 x2 + 4 x2 x3 − 3 x3 x1 .
Solution: (i) The given quadratic form can be written as
x1 x1 − 9 x1 x2 − 9 x2 x1 + 5 x2 x2 .
1 −9
Let A be the matrix of this quadratic form. Then A = ⋅
−9 5
x1
Let X = ⋅ Then X′ = [ x1 x2 ] ⋅
x2
1 −9
We have X ′ A = [ x1 x2 ] = [ x1 − 9 x2 − 9 x1 + 5 x2 ] .
−9 5
x1
∴ X ′ AX = [ x1 − 9 x2 − 9 x1 + 5 x2 ]
x2
= x1 ( x1 − 9 x2 ) + x2 (−9 x1 + 5 x2 )
= x12 − 9 x1 x2 − 9 x2 x1 + 5 x2 2
= x12 − 18 x1 x2 + 5 x2 2 .
(ii) The given quadratic form can be written as
1 3 1 3
x1 x1 − x1 x2 − x1 x3 − x2 x1 + 2 x2 x2 + 2 x2 x3 − x3 x1
2 2 2 2
+ 2 x3 x2 − 5 x3 x3 .
Let A be the matrix of this quadratic form. Then
304
1 − 12 − 32
A = − 12 2 2 ⋅
−3 2 −5
2
Obviously A is a symmetric matrix.
x1
Let X = x2 ⋅ Then X′ = [ x1 x2 x3 ].
x3
1 − 12 − 32
We have X ′ A = [ x1 x2 x3 ] − 12 2 2
−3 2 −5
2
1 3 1 3
= [ x1 − x2 − x3 − x1 + 2 x2 + 2 x3 −
x1 + 2 x2 − 5 x3 ].
2 2 2 2
x1
− x1 + 2 x2 + 2 x3 − x1 + 2 x2 − 5 x3 ] x2 .
1 3 1 3
∴ X ′AX = [x1 − x2 − x3
2 2 2 2
x3
1 3 1 3
= x1 ( x1 − x2 − x3 ) + x2 (− x1 + 2 x2 + 2 x3 )+ x3 (− x1 + 2 x2 − 5 x3 )
2 2 2 2
1 3 1 3
= x12 2
− x1 x2 − x1 x3 − x2 x1 + 2 x2 + 2 x2 x3 − x3 x1
2 2 2 2
+ 2 x3 x2 − 5 x3 2
= x 1 2 + 2 x22 − 5 x32 − x1 x2 + 4 x2 x3 − 3 x3 x1 .
a h g
A=h b f ⋅
g f c
Example 8: Write down the quadratic forms corresponding to the following matrices :
0 1 2 3
0 5 − 1 1 2 3 4
(i) 5 1 6 (ii) ⋅
2 3 4 5
−1 6 2 3 4 5 6
Solution: (i) Let X = [ x1 x2 x3 ]T and A denote the given symmetric
matrix. Then XT AX is the quadratic form corresponding to this matrix. We have
0 5 − 1
X A = [ x1 x2 x3 ] 5 1 6
T
−1 6 2
= [5 x2 − x3 5 x1 + x2 + 6 x3 − x1 + 6 x2 + 2 x3 ].
∴ X T AX = x1 (5 x2 − x3 ) + x2 (5 x1 + x2 + 6 x3 ) + x3 (− x1 + 6 x2 + 2 x3 )
= x2 2 + 2 x3 2 + 10 x1 x2 − 2 x1 x3 + 12 x2 x3 .
(ii) Let X = [ x1 x2 x3 x4 ]T and A denote the given symmetric matrix.
Then X T AX is the quadratic form corresponding to this matrix. We have
0 1 2 3
1 2 3 4
X T A = [ x1 x2 x3 x4 ]
2 3 4 5
3 4 5 6
= [x2 + 2 x3 + 3 x4 x1 + 2 x2 + 3 x3 + 4 x4 2 x1 + 3 x2 + 4 x3 + 5 x4
3 x1 + 4 x2 + 5 x3 + 6 x4 ]
T
∴ X AX = x1 ( x2 + 2 x3 + 3 x4 ) + x2 ( x1 + 2 x2 + 3 x3 + 4 x4 )
+ x3 (2 x1 + 3 x2 + 4 x3 + 5 x4 ) + x4 (3 x1 + 4 x2 + 5 x3 + 6 x4 )
= 2 x2 2 + 4 x3 2 + 6 x4 2 + 2 x1 x2 + 4 x1 x3 + 6 x1 x4
+ 6 x2 x3 + 8 x2 x4 + 10 x3 x4 .
C omprehensive E xercise 1
A nswers 1
4 1 0
(ii) A = 1 −2 −4 ⋅ The given bilinear form is symmetric
0 −4 7
1 − 1 − 1 1 − 1 − 1
3.
P= 0 1 1 , Q = 0
1 − 1
0 0 1 0 0 1
0 1 3
a h
4. (i)
(ii) 1 0 −2
h b
3 −2 0
1 0 0 2 2 0
(iii) 0 5 0 (iv) 2 0 −3
0 0 −7 0 −3 −7
1 −1 0 3
2
a11 a12 a31
− 1 − 2 2 0
5. (i) a12 a22 a23 ⋅ (ii)
0 2 4 − 52
a31 a23 a33 3
2 0 − 52 −4
0 12 12 12 1 0 0 0
1 0 0 −1 0
2
0 12 12
(iii) 1 1 (iv)
0 1 0 −1 0 − 12
21 21 1 2 0 0 − 1
2 2 2 0 2
0
A1 O
P ′AP =
O O
where A1 is a non-singular diagonal matrix of order r over F and each O is a null matrix of
suitable size.
Or
Every symmetric matrix of rank r is congruent to a diagonal matrix, r, of whose diagonal
elements only are non-zero.
Proof : We shall prove the theorem by induction on n, the order of the given
matrix. If n = 1, the theorem is obviously true. Let us suppose that the theorem is
true for all symmetric matrices of order n − 1. Then we shall show that it is also true
for an n × n symmetric matrix A.
Let A = [ a ij ] n × n be a symmetric matrix of rank r over a field F. First we shall show
that there exists a matrix B = [ b ij ] n × n over F congruent to A such that b11 ≠ 0.
Case 1: If a11 ≠ 0, then we take B = A.
Case 2:If a11 = 0, but some diagonal element of A, say, a ii ≠ 0, then applying the
congruent operation Ri ↔ R1 , Ci ↔ C1 to A, we obtain a matrix B congruent
to A such that
b11 = a ii ≠ 0.
Case 3: Suppose that each diagonal element of A is 0. Since A is a non-zero
matrix, let a ij be a non-zero element of A. Then a ij = a ji ≠ 0.
Applying the congruent operation Ri → Ri + R j , Ci → Ci + C j to A, we obtain a
matrix D = [ dij ] n × n congruent to A such that dii = a ij + a ji = 2 a ij ≠ 0.
Now applying the congruent operation Ri ↔ R1 , Ci ↔ C1 to Dwe obtain a matrix
B = [ b ij ] n × n
congruent to D and, therefore, also congruent to A such that b11 = dii ≠ 0.
Thus there always exists a matrix B = [ b ij ] n × n
congruent to a symmetric matrix, such that the leading element of B is not zero.
Since B is congruent to a symmetric matrix, therefore B itself is a symmetric matrix.
Since b11 ≠ 0,therefore all elements in the first row and first column of B, except the
leading element, can be made 0 by suitable congruent operations. We thus have a
matrix
a11 0 ... 0
0
C= ,
⋮ B1
0
congruent to B and, therefore, also congruent to A such that B1 is a square matrix of
order n − 1. Since C is congruent to a symmetric matrix A, therefore C is also a
symmetric matrix and consequently B1 is also a symmetric matrix. Thus B1 is a
symmetric matrix of order n − 1.
313
which is equivalent to
315
y3
1 2
x1 = y1 + y2 −
3 7
1
x2 = y2 + y3 …(2)
7
x3 = y3 .
The transformation (2) will reduce the quadratic form (1) to the diagonal form
7 16
Y ′ P ′APY = 6 y12 + y2 2 + y3 2 .
3 7
The rank of the quadratic form X ′ AX is 3. So it has been reduced to a form which is
a sum of three squares.
1 1 1 1
,…, , ,…, , 1, … , 1.
√ λ1 √ λ p √ (− λ p + 1 ) √ (− λ r )
316
1 1 1 1
Then S = diag ,…, , ,…, , 1, … , 1
√ λ 1 √ λ p √ (− λ p + 1 ) √ (− λ r )
is a real non-singular diagonal matrix and S ′ = S.
If we take P = QS, then P is also real non-singular matrix and we have
P′ AP = (QS) ′ A (QS) = S ′ Q′ AQS = S ′ DS = SDS
= diag. [1, … , 1, − 1, … , − 1, 0, … , 0]
so that 1 and − 1 appear p and r − p times respectively.
Corollary: If X ′AX is a real quadratic form of rank r in n variables, then there exists a real
non-singular linear transformation X = PY which transforms X ′AX to the form
Y′ P′ APY = y12 + … + y p 2 − y p + 12 − … − y r 2 .
Canonical or Normal form of a real quadratic form: Definition: If X ′AX is
a real quadratic form in n variables, then there exists real non-singular linear transformation
X = PY which transforms X ′ AX to the form
y12 + … + y p 2 − y p + 12 − … − y r 2 .
In the new form the given quadratic form has been expressed as a sum and difference of the
squares of new variables. This latter expression is called the canonical form or normal form of
the given quadratic form.
If φ = X ′ AX is a real quadratic form of rank r, then A is a matrix of rank r. If the real
non-singular linear transformation X = PY reduces φ to normal form, then P′ AP is a
diagonal matrix having 1 and − 1as its non-zero diagonal elements. Since P′ AP is
also of rank r,therefore it will have precisely r non-zero diagonal elements. Thus the
number of terms in each normal form of a given real quadratic form is the same.
Now we shall prove that the number of positive terms in any two normal
reductions of a real quadratic form is the same.
Theorem 2:The number of positive terms in any two normal reductions of a real quadratic
form is the same.
Proof: Let φ = X ′ AX be a real quadratic form of rank r in n variables. Suppose the
real non-singular linear transformations
X = PY and X = QZ
transform φ to the normal forms
y12 + … + y p 2 − y p + 12 − … − y r 2 …(1)
2 2 2 2
and z1 + … + z q − z q +1 − … − z r …(2)
respectively.
To prove that p = q.
Let p < q. Obviously y1 , … , y n , z1 , … , z n are linear homogeneous functions of
x1 , … , x n . Since q > p, therefore q − p > 0. So n − (q − p) is less than n. Therefore
(n − q) + p is less than n.
317
Now y1 = 0, y2 = 0, … , y p = 0, z q + 1 = 0, z q + 2 = 0, … , z n = 0 are (n − q) + p
linear homogeneous equations in n unknowns x1 , … , x n . Since the number of
equations is less than the number of unknowns n, therefore these equations must
possess a non-zero solution. Let x1 = a1 , … , x n = a n be a non-zero solution of these
equations and let X1 = [a1 , … , a n ]′ . Let Y = [b1 , … , b n ]′ = Y1 and Z = [c1 , … , c n ]′
when X = X1 . Then b1 = 0, … , b p = 0 and c q + 1 = 0, … , c n = 0. Putting
Y = [b1 , … , b n ]′ in (1) and Z = [c1 , … , c n ]′ in (2), we get two values of φ when
X = X1 .
These must be equal. Therefore we have
− b p + 12 − … − b r 2 = c12 + … + c q 2
⇒ b p + 1 = 0, … , b r = 0
⇒ Y1 = O
⇒ P −1 X 1 = O [ ∵ X1 = PY1 ]
⇒ X1 = O,
which is a contradiction since X1 is a non-zero vector.
Thus we cannot have p < q. Similarly, we cannot have q < p. Hence we must have
p = q.
Corollary: The number of negative terms in any two normal reductions of a real quadratic
form is the same. Also the excess of the number of positive terms over the number of negative
terms in any two normal reductions of a real quadratic form is the same.
Signature and index of a real quadratic form:
Definition: Let y12 + … + y p 2 − y p + 12 − … − y r 2 be a normal form of a real
quadratic form X ′ AX of rank r. The number p of positive terms in a normal form of X ′ AX is
called the index of the quadratic form. The excess of the number of positive terms over the
number of negative terms in a normal form of
X ′AX i. e., p − (r − p) = 2 p − r
is called the signature of the quadratic form and is usually denoted by s.
Thus s = 2 p − r.
In terms of signature theorem 2 may be stated as follows :
Theorem 3: Sylvester’s Law of Inertia: The signature of a real quadratic form is
invariant for all normal reductions.
Proof: For its proof give definition of signature and the proof of theorem 2.
Theorem 4: Two real quadratic forms in n variables are real equivalent if and only if they
have the same rank and index (or signature).
Proof: Suppose X ′ AX and Y ′ BY are two real quadratic forms in the same
number of variables.
Let us first assume that the two forms are equivalent. Then there exists a real
non-singular linear transformation X = PY which transforms X ′ AX to Y′ BY i. e.,
B = P′ AP.
318
Corollary 2:Two real quadratic forms in n variables are complex equivalent if and only if
they have the same rank.
Orthogonal reduction of a real quadratic form:
Theorem 6: If φ = X ′AX be a real quadratic form of rank r in n variables, then there exists
a real orthogonal transformation X = PY which transforms φ to the diagonal form
λ 1 y12 + … + λ r y r 2 ,
where λ 1 , … , λ r are the, r, non-zero eigenvalues of A, n − r eigenvalues of A being equal to
zero.
Proof: Since A is a real symmetric matrix, therefore there exists a real orthogonal
matrix P such that P −1 AP = D, where D is a diagonal matrix whose diagonal
elements are the eigenvalues of A.
Since A is of rank r, therefore P −1 AP = D is also of rank r. So D has precisely r
non-zero diagonal elements. Consequently A has exactly r non-zero eigenvalues,
the remaining n − r eigenvalues of A being zero. Let D = diag. [λ 1 , …, λ r , … 0, … 0].
Since P −1 = P ′, therefore P −1 AP = D
⇒ P ′ AP = D ⇒ A is congruent to D.
Now consider the real orthogonal transformation X = PY. We have
X ′ AX = (PY) ′ A (PY) = Y ′ P ′ AP Y = Y ′ DY
= λ 1 y12 + … + λ r y r 2 .
Hence the result.
Theorem 7: Every real quadratic form X ′ AX in n variables is real equivalent to the form
y12 + … + y p 2 − y p + 12 − … − y r 2 ,
where r is the rank of A and p is the number of positive eigenvalues of A.
Proof: A is a real symmetric matrix. Therefore there exists a real orthogonal
matrix Q such that Q −1 AQ = Q ′ AQ = D,
where D is a diagonal matrix whose diagonal elements are the eigenvalues of A.
Since A is of rank r, therefore D is also of rank r. So D has exactly r non-zero diagonal
elements. Consequently A has exactly r non-zero eigenvalues, the remaining n − r
eigenvalues of A being zero. Let D = diag. [λ 1 , λ 2 … , λ r , 0, … 0].
Let λ 1 , … , λ p be positive and λ p + 1 , … λ r be negative. Let S be the n × n real
diagonal matrix with diagonal elements
1 1 1 1
,…, , ,…, , 1, … , 1.
√ λ1 √ λ p √ (− λ p + 1 ) √ (− λ r )
Then S is non-singular and S ′ = S. If we take P = QS, then P is also a real
non-singular matrix and we have
P′ AP = (QS) ′ A (QS) = S ′ Q′ AQS = SDS
= diag. [1, … , 1, − 1, … , − 1, 0, … , 0]
so that 1 and − 1 appear p and r − p times respectively.
320
Example 10: Reduce each of the following quadratic forms in three variables to real
canonical form and find its rank and signature. Also write in each case the linear
transformation which brings about the normal reduction.
(i) 2 x12 + x2 2 − 3 x3 2 − 8 x2 x3 − 4 x3 x1 + 12 x1 x2 .
(ii) 6 x12 + 3 x2 2 + 14 x3 2 + 4 x2 x3 + 18 x3 x1 + 4 x1 x2 .
Solution: (i) The matrix A of the given quadratic form is
2 6 −2
A = 6 1 −4 ⋅
−2 −4 −3
321
2 0 0 1 0 0 1 − 3 11 17
0 − 17 0 = − 3 1 0 A 0 1 17 ⋅
2
0 0 − 1781 11 2 0 1
17 17 1 0
1 1 1 1
Performing R1 → R1 , C1 → C1 ; R2 → R2 , C2 → C2 ;
√2 √2 √ 17 √ 17
and R3 → √ (17 / 81) R3 , C3 → √ (17 / 81) C3 , we get
1 0 0 a 0 0 a − 3b 11
17
c
0 − 1 0 = − 3b
b 0 A 0 b 2
c ,
17
0 0 − 1 11
17
c 2 c
17
c 0 0 c
a − 3b 11
17
c
P = 0 b 2
17
c , X = [ x1 x2 x3 ]′ , Y = [ y1 y2 y3 ]′ ,
0 0 c
transforms the given quadratic form to the normal form
y12 − y2 2 − y3 2 . …(1)
The rank r of the given quadratic form = the number of non-zero terms in its normal
form (1) = 3.
322
The signature of the given quadratic form = the excess of the number of positive
terms over the number of negative terms in its normal form = 1 − 2 = − 1.
The index of the given quadratic form = the number of positive terms in its normal
form = 1.
The linear transformation X = PY which brings about this normal reduction is
given by
11 2
x1 = ay1 − 3by2 + cy3 , x2 = by2 + cy3 , x3 = cy3 .
7 17
(ii) The matrix of the given quadratic form is
6 2 9
A = 2 3 2 ⋅
9 2 14
6 2 9 1 0 0 1 0 0
We write 2 3 2 = 0 1 0 A 0 1 0 ⋅
9 2 14 0 0 1 0 0 1
Performing congruence operations
1 1
R2 → R2 − R1 , C2 → C2 − C1 ;
3 3
3 3
and R3 → R3 − R1 , C3 → C3 − C1 ,
2 2
we get
6 0 0 1 0 0 1 − 13 − 32
1
0 3 − 1 = − 3 1 0 A 1 0⋅
7
0
0 − 1 1 − 3 0 1 0 0 1
2 2
3 3
Performing R3 → R3 + R2 , C3 → C3 + C2 , we get
7 7
6 0 0 1 0 0 1 − 13 23
14
1
7
0 3 0 = − 3 1 0 A 0 1 3
7 .
0 0 1 23 3 1 0 0 1
14 14 7
Performing R1 → (1/ √ 6) R1 , C1 → (1/ √ 6)C1 ; R2 → √ (3 / 7) R2 , C2 → √ (3 / 7) C2 ;
R3 → (1/14) R3 , C3 → (1/ √ 14) C3 , we get
1 0 0 a 0 0 a − 13 b 23 c
14
0 1 0 = − 1 b b 0 A 0
b 3 c
3 7
0 0 1 14
23 c 3 c 1
7
0 0 1
1 , 1
where a= b = √ (3 / 7), c = ⋅
√6 √ 14
Thus the linear transformation X = PY where
323
a − 13 b 23
14
c
P = 0 b 3
7
c
0 0 1
transforms the given quadratic form to the normal form
y12 + y22 + y32 .
The rank of the given quadratic form is 3 and its signature is
3 − 0 = 3.
Example 11: Find an orthogonal matrix P that will diagonalize the real symmetric matrix
0 1 1
A = 1 0 − 1 ⋅
1 −1 0
or − x1 + x2 + x3 = 0.
Two orthogonal solutions are
X1 = [1, 0, 1]′ and X 2 = [1, 2, − 1]′.
An eigenvector corresponding to the eigenvalue −2, is found by solving
2 x1 + x2 + x3 = 0, x1 + 2 x2 − x3 = 0
to be X 3 = [−1, 1, 1]′ .The required matrix P is therefore a matrix whose columns are
unit vectors which are scalar multiples of X1 , X 2 and X 3 .
1/ √ 2 1/ √ 6 −1/ √ 3
∴ P= 0 2 / √ 6 1/ √ 3
1/ √ 2 −1/ √ 6 1/ √ 3
We have P ′ AP = diag. [1, 1, − 2]
324
or (2 − λ ) [(6 − λ ) (4 − λ ) − 8] = 0
or (2 − λ ) (λ2 − 10 λ + 16) = 0
or (2 − λ ) (λ − 2) (λ − 8) = 0.
∴ the eigenvalues of A are 2 , 2 , 8.
The eigenvalue 8 is of algebraic multiplicity 1. So there will be only one linearly
independent eigenvector corresponding to this value.
325
x
Let X3 = y be another eigenvector of A corresponding to the eigenvalue 2
z
and let X 3 be orthogonal to X 2 .
Then 2x − y + z = 0 [ ∵ X 3 is a solution of (3)]
and 0 + y+ z =0 [ ∵ X 2 and X 3 are orthogonal]
Obviously y = 1, z = − 1, x = 1 is a solution.
1
∴ X3 = 1 ⋅
−1
8 0 0
P −1
AP = P AP = 0 2 0 ⋅
T
0 0 2
det P = 1 ≠ 0.
The quadratic form q now becomes
q = y1 ( y1 + y2 ) + 2 y1 y3 + 3 ( y1 + y2 ) y3
= y12 + y1 y2 + 5 y1 y3 + 3 y2 y3
= { y12 + y1 ( y2 + 5 y3 )} + 3 y2 y3
1 1
= { y1 + ( y2 + 5 y3 )}2 − ( y2 2 + 10 y2 y3 + 25 y3 2 ) + 3 y2 y3
2 4
2
= y1 + y3 − ( y2 2 + 10 y2 y3 − 12 y2 y3 ) −
1 5 1 25
y2 + y3 2
2 2 4 4
2
= y1 + y3 − ( y2 2 − 2 y2 y3 ) −
1 5 1 25
y2 + y3 2
2 2 4 4
2
= y1 + y3 − ( y2 − y3 )2 +
1 5 1 1 25
y2 + y3 2 − y3 2
2 2 4 4 4
2
= y1 + y3 − ( y2 − y3 )2 − 6 y3 2 .
1 5 1
y2 +
2 2 4
1 5 1 1 5
Now put z1 = y1 + y2 + y3 = x1 + x2 + x3
2 2 2 2 2
z 2 = y2 − y3 = x2 − x1 − x3
z 3 = y3 = x3
so that the non-singular transformation Z = QX i. e.,
z1 12 12 52 x1
z = −1 1 −1 x
2 2
z 3 0 0 1 x3
with det Q = 1 will reduce the given quadratic form to the diagonal form
1
q = z12 − z 2 2 − 6z 3 2 .
4
329
Comprehensive Exercise 2
1. Write the matrix and find the rank of each of the following quadratic forms :
(i) x 12 − 2 x1 x2 + 2 x 22
(ii) 4 x 12 + x 22 − 8 x 32 + 4 x1 x2 − 4 x1 x3 + 8 x2 x3 .
2. Determine a non-singular matrix P such that P ′ AP is a diagonal matrix, where
0 1 2
A = 1 0 3 ⋅
2 3 0
3. Reduce each of the following quadratic forms in three variables to real
canonical form and find its rank and signature. Also write in each case the
linear transformation which brings about the normal reduction.
(i) x 2 − 2 y 2 + 3z 2 − 4 yz + 6zx.
(ii) x 2 + 2 y 2 + 2z 2 − 2 xy − 2 yz + zx.
4. Reduce the following quadratic form to canonical form and find its rank and
signature :
x 2 + 4 y 2 + 9z 2 + t 2 − 12 yz + 6zx − 4 xy − 2 xt − 6zt.
5. Using Lagrange’s reduction reduce the quadratic form
5 −2 0 x1
[ x1 x2 x3 ] − 2 6 2 x2
0 2 7 x3
to a diagonal form.
6. Using Lagrange’s reduction reduce the quadratic form
3 1 0 0 x1
1 3 0 0 x
[ x1 x2 x3 x4 ] 2
0 0 3 − 1 x3
0 0 −1 3 x
4
to a diagonal form.
7. Using Lagrange’s reduction reduce the quadratic form
q ( x1 , x2 , x3 ) = ( x1 + x2 + x3 ) x2 to a diagonal form.
8. Using Lagrange’s reduction reduce the quadratic form
q ( x1 , x2 , x3 ) = x1 x2 + x2 x3 + x3 x1 to a diagonal form.
9. Using Lagrange’s reduction reduce the quadratic form
2 2 2
q ( x1 , x2 , x3 ) = x1 + 4 x2 + 16 x3 + 4 x1 x2 + 8 x1 x3 + 17 x2 x3
to a diagonal form.
330
A nswers 2
4 2 −2
1 − 1
1. (i) , rank 2 (ii) 2 1 4 , rank 3
−1 2
−2 4 −8
1 − 12 −3
2. P=1 1
2
−2
0 0 1
3. (i) The linear transformation X = PY where
1 0 − 32
P = 0 1 − 12 , X = [x y z ]′ , Y = [ y1 y2 y3 ]′
√2
0 0 1
2
transforms the given quadratic form to the normal form y12 − y22 − y32
The rank of the given quadratic form is 3 and its signature is −1
(ii) The linear transformation X = PY where
1 1 0
P = 0 1 1/ √ 6
0 0 √ (2 / 3)
2 2 2
transforms the given quadratic form to the normal form y1 + y2 + y3
Rank of the given quadratic form is 3 and its signature is 3
2 2 2
4. y1 − y2 + y4
Rank of the given quadratic form is 3 and its signature is 1
1 2 10 2 81 2 1 2 8 2 1 2 8 2
5. y + y + y 6. y + y + y + y
5 1 13 2 13 3 3 1 3 2 3 3 3 4
1 2 2 1 2 2 2
7. ( y − y2 ) 8. (z1 − z 2 ) − z 3
4 1 4
2 1 2 2
9. z1 + (z 2 − z 3 )
4
(i) Positive Definite (PD) if φ ≥ 0 for all real values of the variables x1 , … , x n and
φ = 0 only if X = O i. e., φ = 0 ⇒ x1 = x2 = … = x n = 0.
For example the quadratic form x12 − 4 x1 x2 + 5 x2 2 in two variables is positive
definite because it can be written as
( x1 − 2 x2 )2 + x2 2 ,
which is ≥ 0 for all real values of x1 , x2 and
( x1 − 2 x2 )2 + x2 2 = 0 ⇒ x1 − 2 x2 = 0, x2 = 0
⇒ x1 = 0, x2 = 0.
Similarly the quadratic form x12 + x2 2 + x3 2 in three variables is a positive
definite form.
(ii) Negative definite (ND) if φ ≤ 0 for all real values of the variables x1 , … , x n and
φ = 0 only if x1 = x2 = … = x n = 0.
For example − x12 − x2 2 − x3 2 is a negative definite form in three variables.
(iii) Positive semi-definite (PSD) if φ ≥ 0 for all real values of the variables x1 , … , x n
and φ = 0 for some non-zero real vector X i. e., φ = 0 for some real values of the variables
x1 , x2 , … , x n not all zero.
For example the quadratic form
x12 + x2 2 + 2 x3 2 − 2 x1 x3 − 2 x2 x3
is positive semi-definite because it can be written in the form
( x1 − x3 )2 + ( x2 − x3 )2 ,
which is ≥ 0 for all real values of x1 , x2 , x3 but is zero for non-zero values also, for
example, x1 = x2 = x3 = 1.
Similarly the quadratic form x12 + x2 2 + 0 x3 2 in three variables x1 , x2 , x3 is
positive semi-definite. It is non-negative for all real values of x1 , x2 , x3 and it is zero
for values x1 = 0, x2 = 0, x3 = 2 which are not all zero.
(iv) Negative semi-definite (NSD) if φ ≤ 0 for all real values of the variables x1, …, xn and φ = 0 for some values of the variables x1, …, xn not all zero.
For example, the quadratic form −x1² − x2² − 0x3² in three variables x1, x2, x3 is negative semi-definite.
(v) Indefinite (I) if φ takes positive as well as negative values for real values of the variables x1, …, xn.
For example, the quadratic form x1² − x2² + x3² in three variables is indefinite. It takes the positive value 1 when x1 = 1, x2 = 1, x3 = 1, and it takes the negative value −1 when x1 = 0, x2 = 1, x3 = 0.
Note 1: The above five classes of real quadratic forms are mutually exclusive and
are called value classes of real quadratic forms. Every real quadratic form must
belong to one and only one value class.
Note 2: A form which is positive definite or negative definite is called definite, and a form which is positive semi-definite or negative semi-definite is called semi-definite.
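These five classes can also be read off from the signs of the eigenvalues of the symmetric matrix A of the form, a criterion stated later in this chapter (for the Hermitian analogue, see Theorem 9 below). As a quick numerical illustration, here is a minimal Python sketch of ours, not from the text; classify_form is a made-up helper name.

    import numpy as np

    def classify_form(A, tol=1e-10):
        """Value class of the real quadratic form X'AX, A real symmetric,
        from the signs of the eigenvalues of A."""
        lam = np.linalg.eigvalsh(np.asarray(A, dtype=float))
        pos = int(np.sum(lam > tol))
        neg = int(np.sum(lam < -tol))
        zero = len(lam) - pos - neg
        if pos and neg:
            return "indefinite"
        if pos:
            return "positive definite" if zero == 0 else "positive semi-definite"
        if neg:
            return "negative definite" if zero == 0 else "negative semi-definite"
        return "zero form"

    # x1^2 + x2^2 + 2x3^2 - 2x1x3 - 2x2x3, the example of class (iii):
    print(classify_form([[1, 0, -1], [0, 1, -1], [-1, -1, 2]]))  # positive semi-definite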
Non-negative definite quadratic form:
Definition: A real quadratic form φ = X′AX in n variables x1, …, xn is said to be non-negative definite if it takes only non-negative values for all real values of x1, …, xn.
Thus φ is non-negative definite if φ ≥ 0 for all real values of x1, …, xn. A non-negative definite quadratic form may be positive definite or positive semi-definite. It is positive definite if it takes the value 0 only when x1 = x2 = … = xn = 0.
Classification of real symmetric matrices:
Definite, semi-definite and indefinite real symmetric matrices:
Definition: A real symmetric matrix A is said to be definite, semi-definite or indefinite if the corresponding quadratic form X′AX is definite, semi-definite or indefinite respectively.
Positive definite real symmetric matrix:
Definition: A real symmetric matrix A is said to be positive definite if the corresponding quadratic form X′AX is positive definite.
Non-negative definite real symmetric matrix:
Definition: A real symmetric matrix A is said to be non-negative definite if the associated quadratic form X′AX is non-negative definite.
Theorem 1: Real equivalent real quadratic forms have the same value class.
Proof: Let φ = X′AX and ψ = Y′BY be two real equivalent real quadratic forms. Then there exists a real non-singular matrix P such that P′AP = B and (P⁻¹)′BP⁻¹ = A. The real non-singular linear transformation X = PY transforms the quadratic form φ into the quadratic form ψ, and the inverse transformation Y = P⁻¹X transforms ψ into φ. The two quadratic forms therefore have the same ranges of values, and the vectors X and Y at which φ and ψ take the same value are connected by the relations X = PY and Y = P⁻¹X: the vector Y for which ψ has the same value as φ has for the vector X is Y = P⁻¹X, and the vector X for which φ has the same value as ψ has for the vector Y is X = PY.
Now we shall discuss the five cases separately.
Case I: φ is positive definite if and only if ψ is positive definite.
Suppose φ is positive definite.
Then φ ≥ 0 and φ = 0 ⇒ X = O.
Since φ and ψ have the same ranges of values, therefore
φ ≥ 0 ⇒ ψ ≥ 0.
Also ψ = 0 ⇒ Y′BY = 0
⇒ (PY)′A(PY) = 0 [∵ φ has the same value for the vector PY as ψ has for the vector Y]
⇒ PY = O [∵ φ is positive definite means X′AX = 0 ⇒ X = O]
⇒ P⁻¹(PY) = P⁻¹O
⇒ Y = O.
Thus ψ is also positive definite.
Conversely suppose that ψ is positive definite.
Then ψ ≥ 0 and ψ = 0 ⇒ Y = O.
Since φ and ψ have the same ranges of values, therefore
ψ ≥ 0 ⇒ φ ≥ 0.
Also φ = 0 ⇒ X′AX = 0
⇒ (P⁻¹X)′B(P⁻¹X) = 0 [∵ ψ has the same value for the vector P⁻¹X as φ has for the vector X]
⇒ P⁻¹X = O [∵ ψ is positive definite]
⇒ P(P⁻¹X) = PO
⇒ X = O.
Thus φ is also positive definite.
Case II: φ is negative definite if and only if ψ is negative definite.
The proof is the same as in case I.
The only difference is that we are to replace the expressions φ ≥ 0, ψ ≥ 0 by the
expressions φ ≤ 0, ψ ≤ 0.
Case III: φ is positive semi-definite if and only if ψ is positive semi-definite.
Since φ and ψ have the same ranges of values, therefore φ ≥ 0 if and only if ψ ≥ 0.
Further, since P is non-singular, therefore
X ≠ O ⇒ Y = P⁻¹X ≠ O and Y ≠ O ⇒ X = PY ≠ O.
Also the vectors X and Y for which φ and ψ have the same values are connected by the relations X = PY and Y = P⁻¹X. Therefore φ = 0 for some non-zero vector X if and only if ψ = 0 for some non-zero vector Y.
Hence φ is positive semi-definite if and only if ψ is positive semi-definite.
Case IV: φ is negative semi-definite if and only if ψ is negative semi-definite.
For proof replace the expressions φ ≥ 0, ψ ≥ 0 in case III by the expressions
φ ≤ 0, ψ ≤ 0.
Case V: φ is indefinite if and only if ψ is indefinite. Since φ and ψ have the same ranges
of values, therefore the result follows immediately.
Thus the proof of the theorem is complete.
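Numerically, Theorem 1 can be observed through the inertia of the matrices of two real equivalent forms, i.e., the counts of positive, negative and zero eigenvalues, which a congruence B = P′AP with P non-singular leaves unchanged (Sylvester's law of inertia). A small sketch of this check, ours and not from the text:

    import numpy as np

    def inertia(M, tol=1e-9):
        """Counts (positive, negative, zero) of the eigenvalues of a symmetric M."""
        lam = np.linalg.eigvalsh(M)
        return (int(np.sum(lam > tol)), int(np.sum(lam < -tol)),
                int(np.sum(np.abs(lam) <= tol)))

    rng = np.random.default_rng(0)
    A = np.array([[1.0, 0, -1], [0, 1, -1], [-1, -1, 2]])  # positive semi-definite
    P = rng.standard_normal((3, 3))                        # almost surely non-singular
    B = P.T @ A @ P                                        # a real equivalent form
    print(inertia(A), inertia(B))                          # (2, 0, 1) (2, 0, 1)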
Criterion for the value class of a real quadratic form in terms of its rank and signature:
Theorem 2: Suppose r is the rank and s is the signature of a real quadratic form
φ = X ′ AX in n variables. Then φ is (i) positive definite if and only if s = r = n, (ii) negative
definite if and only if − s = r = n, (iii) positive semi-definite if and only if s = r < n, (iv)
negative semi-definite if and only if − s = r < n ; and (v) indefinite if and only if | s | ≠ r.
Proof: Let
\[ \psi = y_1^2 + \ldots + y_p^2 - y_{p+1}^2 - \ldots - y_r^2 \qquad \ldots(1) \]
be the real canonical form of the real quadratic form φ of rank r and signature s. Then s = 2p − r. Since φ and ψ are real equivalent real quadratic forms, they have the same value class.
(i) Suppose s = r = n. Then p = n and the real canonical form of φ becomes y1² + … + yn². But this is a positive definite quadratic form. So φ is also positive definite.
Conversely, suppose that φ is positive definite. Then ψ is also a positive definite form in n variables. So we must have ψ = y1² + … + yn².
Hence r = n, p = n, 2p − r = s = n.
(ii) Suppose −s = r = n. Then s = 2p − r gives p = 0. The real canonical form of φ becomes −y1² − … − yn², which is negative definite, and so φ is also negative definite.
Conversely, if φ is negative definite, then ψ is also negative definite and so we must have ψ = −y1² − … − yn².
Hence r = n, p = 0, 2p − r = s = −n, i.e., −s = n.
(iii) Suppose s = r < n. Then s = 2p − r gives p = r and the real canonical form of φ becomes y1² + … + yr², where r < n. But this is a positive semi-definite form in n variables. So φ is also positive semi-definite.
Conversely, if φ is positive semi-definite, then ψ is also a positive semi-definite form in n variables. So we must have ψ = y1² + … + yr², where r < n.
Therefore p = r < n and s = 2p − r = r. Thus s = r < n.
(iv) Suppose −s = r < n. Then s = 2p − r gives p = 0 and the real canonical form of φ becomes −y1² − … − yr², where r < n. This is a negative semi-definite form in n variables. So φ is also negative semi-definite.
Conversely, if φ is negative semi-definite, then ψ is also a negative semi-definite form in n variables. So we must have ψ = −y1² − … − yr², where r < n.
Therefore p = 0 and s = 2p − r = −r. Thus −s = r < n.
(v) Suppose |s| ≠ r. Then |2p − r| ≠ r. Therefore p ≠ 0 and p ≠ r, and so 0 < p < r. In this case the canonical form of φ has positive as well as negative terms and so it is an indefinite form. Consequently φ is also indefinite.
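Theorem 2 reduces the classification to the two integers r and s, and its translation into code is immediate. A minimal sketch of the statement, ours and not from the text, assuming r ≥ 1 so the zero form is excluded:

    def value_class(n, r, s):
        """Value class of a real quadratic form in n variables of rank r and
        signature s, per Theorem 2 (assumes r >= 1)."""
        if abs(s) != r:
            return "indefinite"
        if s == r:
            return "positive definite" if r == n else "positive semi-definite"
        return "negative definite" if r == n else "negative semi-definite"

    print(value_class(3, 3, -1))  # indefinite        (exercise 3(i): r = 3, s = -1)
    print(value_class(3, 3, 3))   # positive definite (exercise 3(ii): r = 3, s = 3)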
\[ \begin{bmatrix} P' & O \\ C' & 1 \end{bmatrix} S \begin{bmatrix} P & C \\ O & 1 \end{bmatrix} = \begin{bmatrix} P' & O \\ C' & 1 \end{bmatrix} \begin{bmatrix} B & B_1 \\ B_1' & \lambda \end{bmatrix} \begin{bmatrix} P & C \\ O & 1 \end{bmatrix} \]
\[ = \begin{bmatrix} P'B & P'B_1 \\ C'B + B_1' & C'B_1 + \lambda \end{bmatrix} \begin{bmatrix} P & C \\ O & 1 \end{bmatrix} = \begin{bmatrix} P'BP & P'BC + P'B_1 \\ C'BP + B_1'P & C'BC + B_1'C + C'B_1 + \lambda \end{bmatrix} \]
\[ = \begin{bmatrix} I_m & O \\ O & B_1'C + \lambda \end{bmatrix} \qquad [\because P'BP = I_m,\ C = -B^{-1}B_1,\ C' = -B_1'B^{-1}] \]
Thus
\[ \begin{bmatrix} P' & O \\ C' & 1 \end{bmatrix} S \begin{bmatrix} P & C \\ O & 1 \end{bmatrix} = \begin{bmatrix} I_m & O \\ O & B_1'C + \lambda \end{bmatrix}. \]
Taking determinants of both sides, we get
|P′| |S| |P| = |I_m| |B1′C + λ| = B1′C + λ,
because B1′C + λ is a 1 × 1 matrix.
∴ |P|² |S| = B1′C + λ [∵ |P| = |P′|].
Since |S| > 0 and |P| ≠ 0, therefore B1′C + λ is positive. Let B1′C + λ = α², where α is real and non-zero. Then
\[ \begin{bmatrix} P' & O \\ C' & 1 \end{bmatrix} S \begin{bmatrix} P & C \\ O & 1 \end{bmatrix} = \begin{bmatrix} I_m & O \\ O & \alpha^2 \end{bmatrix}. \]
Pre-multiplying and post-multiplying both sides by
\[ \begin{bmatrix} I_m & O \\ O & \alpha^{-1} \end{bmatrix}, \]
we get
\[ \begin{bmatrix} I_m & O \\ O & \alpha^{-1} \end{bmatrix} \begin{bmatrix} P' & O \\ C' & 1 \end{bmatrix} S \begin{bmatrix} P & C \\ O & 1 \end{bmatrix} \begin{bmatrix} I_m & O \\ O & \alpha^{-1} \end{bmatrix} = I_{m+1}. \]
Now let
\[ Q = \begin{bmatrix} P & C \\ O & 1 \end{bmatrix} \begin{bmatrix} I_m & O \\ O & \alpha^{-1} \end{bmatrix}. \]
Then Q is non-singular, as it is the product of two non-singular matrices. Also
\[ Q' = \begin{bmatrix} I_m & O \\ O & \alpha^{-1} \end{bmatrix} \begin{bmatrix} P' & O \\ C' & 1 \end{bmatrix}. \]
Therefore we have Q′SQ = I_{m+1}.
Thus the real symmetric matrix S of order m + 1 is congruent to I_{m+1}, and so the quadratic form corresponding to S is positive definite.
The proof is now complete by induction.
Corollary: The real quadratic form
\[ q(x_1, \ldots, x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \]
is positive definite if and only if all the leading principal minors of the symmetric matrix A = [a_{ij}] are positive.
Theorem 5: A real quadratic form XᵀAX is positive semi-definite if the leading principal minors A1, …, An−1 of A are positive and det A = 0.
Theorem 6: A real symmetric matrix A is indefinite if and only if at least one of the following conditions is satisfied:
(a) A has a negative principal minor of even order.
(b) A has a positive principal minor of odd order and a negative principal minor of odd order.
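The corollary and Theorems 5 and 6 are straightforward to apply numerically by computing the leading principal minors. A minimal sketch, ours and not from the text:

    import numpy as np

    def leading_principal_minors(A):
        """The determinants of the k x k top-left submatrices, k = 1, ..., n."""
        A = np.asarray(A, dtype=float)
        return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

    A = [[5, -2, 0], [-2, 6, 2], [0, 2, 7]]        # matrix of exercise 5 above
    m = leading_principal_minors(A)
    print(np.round(m, 6))                          # [  5.  26. 162.]
    print(all(x > 0 for x in m))                   # True: the form is positive definite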
Solution: Since A is a positive definite real symmetric matrix, the eigenvalues λ1, …, λn of A are all real and positive. Also there exists an orthogonal matrix P such that P⁻¹AP = D = diag. [λ1, …, λn].
= XᵀAX, where
\[ A = \begin{bmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \end{bmatrix} \]
is a symmetric matrix.
The leading principal minors of A are
\[ A_1 = -1, \qquad A_2 = \begin{vmatrix} -1 & 1 \\ 1 & -2 \end{vmatrix} = 2 - 1 = 1, \]
\[ A_3 = \begin{vmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \end{vmatrix} = \begin{vmatrix} -1 & 0 & 0 \\ 1 & -1 & 1 \\ 0 & 1 & -2 \end{vmatrix} \quad (C_2 \to C_2 + C_1) \]
\[ = -1(2 - 1) = -1. \]
We see that A1 < 0, A2 > 0, A3 < 0.
So the given quadratic form is negative definite.
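The eigenvalues confirm the conclusion; a quick check of ours, not from the text:

    import numpy as np

    A = np.array([[-1, 1, 0], [1, -2, 1], [0, 1, -2]])
    print(np.linalg.eigvalsh(A))   # all three eigenvalues negative: negative definite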
Comprehensive Exercise 3
6. Show that a real symmetric matrix is positive definite if and only if all its
characteristic roots are positive.
7. Show that a real symmetric matrix A is positive definite iff A⁻¹ exists and is positive definite.
Answers 3
10. positive semi-definite. 11. negative semi-definite.
12. indefinite.
Hermitian form:
Definition: If H = [h_{ij}] is a Hermitian matrix of order n, then the expression
\[ h = X^\theta H X = \sum_{i=1}^{n} \sum_{j=1}^{n} h_{ij}\, \bar x_i x_j \]
is called a Hermitian form of order n in the n complex variables x1, x2, …, xn. The Hermitian matrix H is called the matrix of this Hermitian form.
If H is real, the Hermitian form is called a real Hermitian form. Also, det H is defined as the discriminant of the Hermitian form, and a Hermitian form is called singular if its determinant is zero; otherwise it is called non-singular.
Note: A matrix H over the complex field C is called a Hermitian matrix if H^θ = H, where H^θ denotes the conjugate transpose of the matrix H, i.e., \( H^\theta = (\bar H)' = (\bar H)^T = \overline{(H')} \).
Some authors denote the conjugate transpose of a matrix H by the symbol H*. A real symmetric matrix A is always a Hermitian matrix.
Theorem 1: A Hermitian form X^θHX assumes only real values for all complex n-vectors X.
Proof: Suppose X^θHX is a Hermitian form. Then H is a Hermitian matrix and so H^θ = H.
Since X^θHX is a 1 × 1 matrix, it is symmetric and so (X^θHX)′ = X^θHX.
Now
\[ \overline{X^\theta H X} = \overline{(X^\theta H X)'} = (X^\theta H X)^\theta = X^\theta H^\theta (X^\theta)^\theta = X^\theta H X. \]
Thus X^θHX and its conjugate are equal.
∴ X^θHX is a real 1 × 1 matrix.
Hence X^θHX has only real values.
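Theorem 1 says the 1 × 1 value X^θHX has zero imaginary part for every complex X, which is easy to observe numerically; a sketch of ours, not from the text:

    import numpy as np

    rng = np.random.default_rng(1)
    H = np.array([[2, 1 - 2j], [1 + 2j, -2]])     # Hermitian: H equals its conjugate transpose
    x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    val = x.conj() @ H @ x                        # the scalar X^theta H X
    print(val.imag)                               # 0 up to rounding error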
Theorem 2 : The determinant and every leading principal minor of a Hermitian matrix H
are real.
Proof: We have
\[ \overline{|H|} = |\bar H| = |(\bar H)'| = |H^\theta| = |H| \qquad [\because H^\theta = H,\ H\ \text{being Hermitian}]. \]
∴ |H| is real.
Since every leading principal sub-matrix of a Hermitian matrix is Hermitian, we conclude that every leading principal minor of a Hermitian matrix is real.
This completes the proof of the theorem.
Theorem 3: A Hermitian form X^θHX remains Hermitian under a non-singular linear transformation of coordinates defined by
X = PY,
where P is a non-singular matrix over the complex field C.
Proof: Substituting X = PY in h = X^θHX, we get
\[ h = (PY)^\theta H (PY) = Y^\theta (P^\theta H P) Y = Y^\theta Q Y, \quad \text{where } Q = P^\theta H P. \]
We have
\[ Q^\theta = (P^\theta H P)^\theta = P^\theta H^\theta (P^\theta)^\theta = P^\theta H P = Q. \]
∴ Q is Hermitian.
Hence the transformed form Y^θQY is Hermitian.
Remark: It is easily seen that under this non-singular transformation the rank of a Hermitian form remains invariant. We know that the rank of a matrix is not altered by pre-multiplication or post-multiplication by a non-singular matrix. Since P and P^θ are both non-singular, therefore rank H = rank (P^θHP) = rank Q.
Hence rank X^θHX = rank Y^θQY.
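Both conclusions of Theorem 3 and the remark are easy to check numerically; our sketch, with an arbitrarily chosen non-singular P:

    import numpy as np

    H = np.array([[2, 1 - 2j], [1 + 2j, -2]])
    P = np.array([[1, 1j], [0, 2]])                       # non-singular over C
    Q = P.conj().T @ H @ P                                # matrix of the transformed form
    print(np.allclose(Q, Q.conj().T))                     # True: Q is again Hermitian
    print(np.linalg.matrix_rank(H), np.linalg.matrix_rank(Q))   # 2 2: rank unchanged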
Theorem 7: A Hermitian form X^θHX is negative definite if and only if the leading principal minors of H are alternately negative and positive, i.e., iff
\[ h_{11} < 0, \quad \begin{vmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{vmatrix} > 0, \quad \begin{vmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{vmatrix} < 0, \ \ldots. \]
Theorem 8 : A Hermitian form X θ H X is negative definite if and only if all the
principal minors of H of even order are positive and those of odd order are negative.
Theorem 9 : A Hermitian form X θ H X is
(i) Positive definite if and only if all the eigenvalues of H are positive.
(ii) Negative definite if and only if all the eigenvalues of H are negative.
(iii) Positive semi-definite iff all the eigenvalues of H are ≥ 0 with at least one zero eigenvalue.
(iv) Negative semi-definite if and only if all the eigenvalues of H are ≤ 0 with at least one zero
eigenvalue.
(v) Indefinite iff H has at least one positive eigenvalue and one negative eigenvalue.
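Theorem 9 is the practical test. For instance (our sketch, not from the text), for the matrix of part (ii) of the example below:

    import numpy as np

    H = np.array([[-2, 1 + 2j, 0], [1 - 2j, -4, 0], [0, 0, 0]])
    print(np.linalg.eigvalsh(H))   # approx [-5.449, -0.551, 0]: all <= 0 with a zero
                                   # eigenvalue, so the form is negative semi-definite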
The signature s of the given Hermitian form = the number of positive terms in its
diagonal form − the number of negative terms in its diagonal form = 1 − 1 = 0.
(ii) The characteristic equation of the matrix H of the given Hermitian form X^θHX is |H − λI| = 0, i.e.,
\[ \begin{vmatrix} -2-\lambda & 1+2i & 0 \\ 1-2i & -4-\lambda & 0 \\ 0 & 0 & -\lambda \end{vmatrix} = 0 \]
or −λ[(−2 − λ)(−4 − λ) − (1 − 2i)(1 + 2i)] = 0
or −λ[(λ + 2)(λ + 4) − (1 + 4)] = 0
or −λ(λ² + 6λ + 8 − 5) = 0, or λ(λ² + 6λ + 3) = 0.
∴ λ = 0 or λ = (−6 ± √(36 − 12))/2 = (−6 ± √24)/2, i.e., −3 ± √6.
The eigenvalues −3 + √6 and −3 − √6 are both negative.
Thus the eigenvalues of the matrix H are all ≤ 0 and at least one of them is zero.
Hence the given Hermitian form X^θHX is negative semi-definite.
Remark: Order of the given Hermitian form = order of the matrix H = 3.
Rank r of the given Hermitian form = 2.
Index p of the given Hermitian form = 0.
Signature s of the given Hermitian form = 0 − 2 = −2.
A diagonal form of the given Hermitian form is
−(3 − √6) ȳ1y1 − (3 + √6) ȳ2y2.
Its canonical form or normal form is
−z̄1z1 − z̄2z2.
Example 24: Determine whether or not the following Hermitian forms in C² are equivalent.
(i) 2x̄1x1 + 3i x̄2x1 − 3i x̄1x2 − x̄2x2 and (2 + i) x̄2x1 + (2 − i) x̄1x2.
(ii) x̄1x1 + 2x̄2x2 + (1 + 2i) x̄1x2 + (1 − 2i) x̄2x1 and i x̄1x2 − i x̄2x1.
Solution: (i) Of the two given Hermitian forms, the first is
2x̄1x1 − 3i x̄1x2 + 3i x̄2x1 − x̄2x2 = X^θH1X, where
\[ H_1 = \begin{bmatrix} 2 & -3i \\ 3i & -1 \end{bmatrix} \]
is a Hermitian matrix.
The characteristic equation of H1 is |H1 − λI| = 0, i.e.,
\[ \begin{vmatrix} 2-\lambda & -3i \\ 3i & -1-\lambda \end{vmatrix} = 0 \]
or (2 − λ)(−1 − λ) + 9i² = 0
or −2 − λ + λ² − 9 = 0
or λ² − λ − 11 = 0.
∴ λ = (1 ± √45)/2, and since √45 > 1, one of these eigenvalues is positive and the other is negative.
The rank r of the Hermitian form X^θH1X = the number of non-zero eigenvalues of H1 = 2, and the index p of the Hermitian form X^θH1X = the number of positive eigenvalues of H1 = 1.
The second Hermitian form is
(2 + i) x̄2x1 + (2 − i) x̄1x2 = X^θH2X, where
\[ H_2 = \begin{bmatrix} 0 & 2-i \\ 2+i & 0 \end{bmatrix} \]
is a Hermitian matrix.
The characteristic equation of H2 is |H2 − λI| = 0, i.e.,
\[ \begin{vmatrix} -\lambda & 2-i \\ 2+i & -\lambda \end{vmatrix} = 0 \]
or λ² − (2 + i)(2 − i) = 0, or λ² − 5 = 0.
∴ λ² = 5, or λ = ±√5.
∴ the eigenvalues of H2 are +√5, −√5.
The rank r of the Hermitian form X^θH2X = the number of non-zero eigenvalues of H2 = 2, and the index p of the Hermitian form X^θH2X = the number of positive eigenvalues of H2 = 1.
Thus the Hermitian forms X^θH1X and X^θH2X have the same rank and the same index. Hence they are equivalent.
(ii) Of the two given Hermitian forms, the first is
x̄1x1 + (1 + 2i) x̄1x2 + (1 − 2i) x̄2x1 + 2x̄2x2 = X^θH1X, where
\[ H_1 = \begin{bmatrix} 1 & 1+2i \\ 1-2i & 2 \end{bmatrix} \]
is a Hermitian matrix.
The characteristic equation of H1 is |H1 − λI| = 0, i.e.,
\[ \begin{vmatrix} 1-\lambda & 1+2i \\ 1-2i & 2-\lambda \end{vmatrix} = 0 \]
or (1 − λ)(2 − λ) − (1 − 2i)(1 + 2i) = 0
or λ² − 3λ + 2 − 5 = 0, or λ² − 3λ − 3 = 0.
∴ λ = (3 ± √(9 + 12))/2 = (3 ± √21)/2.
∴ the eigenvalues of H1 are (3 + √21)/2 and (3 − √21)/2.
The eigenvalue (3 + √21)/2 is positive and the eigenvalue (3 − √21)/2 is negative.
The rank r of the Hermitian form X^θH1X = the number of non-zero eigenvalues of H1 = 2, and the index p of this Hermitian form = the number of positive eigenvalues of H1 = 1.
The second Hermitian form is
0·x̄1x1 + i x̄1x2 − i x̄2x1 + 0·x̄2x2 = X^θH2X, where
\[ H_2 = \begin{bmatrix} 0 & i \\ -i & 0 \end{bmatrix} \]
is a Hermitian matrix.
The characteristic equation of H2 is |H2 − λI| = 0, i.e.,
\[ \begin{vmatrix} -\lambda & i \\ -i & -\lambda \end{vmatrix} = 0, \]
or λ² + i² = 0, or λ² − 1 = 0.
∴ λ² = 1, or λ = ±1.
∴ the eigenvalues of H2 are 1, −1.
The rank r of the Hermitian form X θ H2 X = the number of non-zero eigenvalues
of H2 = 2 and its index p = the number of positive eigenvalues of H2 = 1.
Thus the Hermitian forms X θ H1 X and X θ H2 X have the same rank and the
same index. Hence, they are equivalent.
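The rank-and-index comparison used in Example 24 takes one line numerically; our sketch, with rank_index a made-up helper:

    import numpy as np

    def rank_index(H, tol=1e-10):
        """(rank, index) of the Hermitian form X^theta H X from the eigenvalues of H."""
        lam = np.linalg.eigvalsh(H)
        return (int(np.sum(np.abs(lam) > tol)), int(np.sum(lam > tol)))

    H1 = np.array([[1, 1 + 2j], [1 - 2j, 2]])
    H2 = np.array([[0, 1j], [-1j, 0]])
    print(rank_index(H1), rank_index(H2))   # (2, 1) (2, 1): the forms are equivalent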
Example 25: Reduce the Hermitian form X^θHX, where
\[ H = \begin{bmatrix} 2 & 1-2i \\ 1+2i & -2 \end{bmatrix}, \]
to canonical form by a unitary transformation, and hence find the rank and signature of the given Hermitian form.
Solution: The characteristic equation of H is |H − λI| = 0, i.e.,
\[ \begin{vmatrix} 2-\lambda & 1-2i \\ 1+2i & -2-\lambda \end{vmatrix} = 0 \]
or (λ − 2)(λ + 2) − (1 + 2i)(1 − 2i) = 0
or λ² − 4 − (1 + 4) = 0, or λ² − 9 = 0.
∴ the eigenvalues of H are −3, 3.
The eigenvector X = [x1 x2]′ corresponding to the eigenvalue −3 is given by the equation (H + 3I)X = O, i.e.,
\[ \begin{bmatrix} 5 & 1-2i \\ 1+2i & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \]
i.e., 5x1 + (1 − 2i)x2 = 0 and (1 + 2i)x1 + x2 = 0.
Obviously x1 = 1 − 2i, x2 = −5 is a solution.
∴ X1 = [1 − 2i  −5]′ is an eigenvector corresponding to the eigenvalue λ = −3.
Corresponding to the eigenvalue 3, the eigenvector is given by the equation (H − 3I)X = O, i.e.,
\[ \begin{bmatrix} -1 & 1-2i \\ 1+2i & -5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \]
i.e., −x1 + (1 − 2i)x2 = 0 and (1 + 2i)x1 − 5x2 = 0.
Obviously x1 = 5, x2 = 1 + 2i is a solution.
∴ X2 = [5  1 + 2i]′ is an eigenvector corresponding to λ = 3.
Length of the vector X1 = √(|1 − 2i|² + |−5|²) = √(5 + 25) = √30,
and length of the vector X2 = √(|5|² + |1 + 2i|²) = √(25 + 5) = √30.
∴ the unitary matrix P that will transform H to diagonal form is
\[ P = \begin{bmatrix} \frac{1}{\sqrt{30}}X_1 & \frac{1}{\sqrt{30}}X_2 \end{bmatrix} = \begin{bmatrix} (1-2i)/\sqrt{30} & 5/\sqrt{30} \\ -5/\sqrt{30} & (1+2i)/\sqrt{30} \end{bmatrix}. \]
Also P⁻¹ = P^θ, and
\[ P^{-1}HP = P^\theta H P = \begin{bmatrix} -3 & 0 \\ 0 & 3 \end{bmatrix} = \mathrm{diag}\,[-3,\ 3]. \]
The unitary transformation X = PY will transform the given Hermitian form to the equivalent diagonal form
\[ Y^\theta\, \mathrm{diag}\,[-3,\ 3]\, Y = -3\bar y_1 y_1 + 3\bar y_2 y_2. \]
The canonical form of the given Hermitian form is
−z̄1z1 + z̄2z2.
The rank of the given Hermitian form = the number of non-zero eigenvalues of its matrix H = 2.
The signature of the given Hermitian form = the number of positive eigenvalues of H − the number of negative eigenvalues of H = 1 − 1 = 0.
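Example 25 can be checked with numpy.linalg.eigh, which returns exactly such a unitary matrix of orthonormal eigenvectors; our sketch:

    import numpy as np

    H = np.array([[2, 1 - 2j], [1 + 2j, -2]])
    lam, P = np.linalg.eigh(H)                 # columns of P: orthonormal eigenvectors
    print(lam)                                 # [-3.  3.]
    print(np.allclose(P.conj().T @ H @ P, np.diag(lam)))   # True: P^theta H P is diagonal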
Comprehensive Exercise 4
Answers 4
1. (i) Positive semi-definite. (ii) Negative definite.
(iii) Positive definite.
2. The unitary matrix P that will transform H to diagonal form is
\[ P = \begin{bmatrix} -i/\sqrt{6} & 1/\sqrt{2} & i/\sqrt{3} \\ 2/\sqrt{6} & 0 & 1/\sqrt{3} \\ -i/\sqrt{6} & -1/\sqrt{2} & i/\sqrt{3} \end{bmatrix}. \]
The unitary transformation X = PY will transform the given Hermitian form to the equivalent diagonal form 2ȳ2y2 + 3ȳ3y3.
The rank of the given Hermitian form is 2 and its signature is 2.
8. Two real quadratic forms in n variables are equivalent if and only if they have
the same …… and the same index.
9. The number of positive terms in any two normal reductions of a real
quadratic form is the …… .
10. The rank of a real quadratic form XᵀAX is equal to the number of …… eigenvalues of the symmetric matrix A.
11. The real quadratic form XᵀAX is positive definite if and only if the eigenvalues of the matrix A are all …… .
12. The real quadratic form XᵀAX is positive definite if XᵀAX …… 0, for all non-zero vectors X.
13. The eigenvalues of a Hermitian matrix are always …… .
14. A Hermitian form X θ H X always takes …… values for all complex
n-vectors X.
15. A Hermitian form X θ H X is negative definite iff the eigenvalues of the
matrix H are all …… .
True or False
Write ‘T’ for true and ‘F’ for false statement.
1. The bilinear form
3 x1 y1 + x1 y2 + x2 y1 − 2 x2 y2 − 4 x2 y3 − 4 x3 y2 + 3 x3 y3
is symmetric.
Answers
1. \( \begin{bmatrix} 2 & 1 & -3 \\ -2 & 3 & 0 \end{bmatrix} \)
2. symmetric 3. quadratic form
4. symmetric 5. diag. [d1, d2, d3, d4, d5]
6. λ1x1² + λ2x2² + … + λn xn²
7. 2x1² + 3x2² + 4x3² + 2x1x2 + 10x1x3 − 4x2x3
8. rank 9. same 10. non-zero
11. positive 12. > 13. real
14. real 15. negative
True or False
1. T 2. T 3. F
4. F 5. T 6. F
7. F 8. T 9. T
10. F